[Python] [Data Processing] Plotting the Distribution of Multi-Dimensional Data

2023-08-07 23:42:04

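Tips:

What %matplotlib inline is in Matplotlib and how to use it: https://blog.csdn.net/liangzuojiayi/article/details/78183783
load_iris loads the iris dataset bundled with sklearn (the task is to tell which of three species a flower belongs to from the length and width of its sepals and petals). Data format: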

data.feature_names (same as data['feature_names']):
	['sepal length (cm)',
	 'sepal width (cm)',
	 'petal length (cm)',
	 'petal width (cm)']
data.target_names:
	array(['setosa', 'versicolor', 'virginica'], dtype='<U10')
data['data']:
	array([[5.1, 3.5, 1.4, 0.2],
	       [4.9, 3. , 1.4, 0.2],
	       ...,
	       [6.5, 3. , 5.2, 2. ],
	       [6.2, 3.4, 5.4, 2.3],
	       [5.9, 3. , 5.1, 1.8]])
data['target']:
	array([0, 0, 0, 0, ..., 2, 2, 2])
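
The fields listed above can be inspected directly. The following is a minimal sketch, assuming scikit-learn is installed; the variable names (feature_names, target_names, X, y) are chosen here for illustration and do not come from the original post.

```python
from sklearn.datasets import load_iris

# load_iris returns a Bunch, which supports both attribute and key access
data = load_iris()

feature_names = list(data.feature_names)            # the four measurement names
target_names = [str(n) for n in data.target_names]  # the three species names
X = data.data      # measurements, one row per flower
y = data.target    # integer class index for each row

print(feature_names)
print(target_names)
print(X.shape, y.shape)
print(X[:2])       # the first two rows match the listing above
```

data['data'] and data.data refer to the same array, so either spelling can be used.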

sklearn.datasets can load many other datasets as well.
t-SNE: https://blog.csdn.net/hustqb/article/details/78144384 explains in detail how t-SNE works, its strengths and weaknesses, and how to use it.
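
To tie the two tips together, here is a small sketch that runs t-SNE on the iris data described above. The settings (n_components=2, random_state=0) are assumptions made for this example, not values taken from the original post, and the exact coordinates depend on the scikit-learn version.

```python
from sklearn.datasets import load_iris
from sklearn.manifold import TSNE

iris = load_iris()

# Assumed settings: 2 output dimensions and a fixed seed for repeatable runs
tsne = TSNE(n_components=2, random_state=0)

# fit_transform maps each 4-dimensional sample to a 2-dimensional point
embedding = tsne.fit_transform(iris.data)

print(embedding.shape)
print(embedding[:3])
```

Each row of embedding corresponds to the flower in the same row of iris.data, so iris.target can be used to color the points.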

Code:

- Plot the data distribution of the handwritten digit images

from time import time
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.manifold import TSNE
import pandas as pd

digits = datasets.load_digits(n_class=10)
df = pd.DataFrame(digits.data)
label = digits.target
df['label']  = label
print(type(digits.data))
df
# original data

<class 'numpy.ndarray'>

	0	1	2	3	4	5	6	7	8	9	...	55	56	57	58	59	60	61	62	63	label
0	0.0	0.0	5.0	13.0	9.0	1.0	0.0	0.0	0.0	0.0	...	0.0	0.0	0.0	6.0	13.0	10.0	0.0	0.0	0.0	0
1	0.0	0.0	0.0	12.0	13.0	5.0	0.0	0.0	0.0	0.0	...	0.0	0.0	0.0	0.0	11.0	16.0	10.0	0.0	0.0	1
2	0.0	0.0	0.0	4.0	15.0	12.0	0.0	0.0	0.0	0.0	...	0.0	0.0	0.0	0.0	3.0	11.0	16.0	9.0	0.0	2
3	0.0	0.0	7.0	15.0	13.0	1.0	0.0	0.0	0.0	8.0	...	0.0	0.0	0.0	7.0	13.0	13.0	9.0	0.0	0.0	3
4	0.0	0.0	0.0	1.0	11.0	0.0	0.0	0.0	0.0	0.0	...	0.0	0.0	0.0	0.0	2.0	16.0	4.0	0.0	0.0	4
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
1792	0.0	0.0	4.0	10.0	13.0	6.0	0.0	0.0	0.0	1.0	...	0.0	0.0	0.0	2.0	14.0	15.0	9.0	0.0	0.0	9
1793	0.0	0.0	6.0	16.0	13.0	11.0	1.0	0.0	0.0	0.0	...	0.0	0.0	0.0	6.0	16.0	14.0	6.0	0.0	0.0	0
1794	0.0	0.0	1.0	11.0	15.0	1.0	0.0	0.0	0.0	0.0	...	0.0	0.0	0.0	2.0	9.0	13.0	6.0	0.0	0.0	8
1795	0.0	0.0	2.0	10.0	7.0	0.0	0.0	0.0	0.0	0.0	...	0.0	0.0	0.0	5.0	12.0	16.0	12.0	0.0	0.0	9
1796	0.0	0.0	10.0	14.0	8.0	1.0	0.0	0.0	0.0	2.0	...	0.0	0.0	1.0	8.0	12.0	14.0	12.0	1.0	0.0	8

1797 rows × 65 columns

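# NOTE: the original post never defines tsne. The parameters here are an assumption,
# so the exact numbers in the output below may differ from run to run or version to version.
tsne = TSNE(n_components=2, random_state=0)
result = tsne.fit_transform(digits.data)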
result

array([[ -4.2510934,  57.605927 ],
       [ 27.768238 , -18.912882 ],
       [ 19.440983 ,  -7.737709 ],
       ...,
       [ 10.630893 , -12.436025 ],
       [-18.820362 ,  28.899649 ],
       [  6.5873857,  -8.608063 ]], dtype=float32)

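# draw the 2-dimensional picture

x_min, x_max = np.min(result), np.max(result)

# min-max normalization: rescale every coordinate into the range [0, 1]
result = (result - x_min) / (x_max - x_min)
fig = plt.figure()
# subplot(111) selects the first cell of a 1x1 grid; the first two digits are the numbers of rows and columns, so larger values give smaller subplots
ax = plt.subplot(111)
for i in range(result.shape[0]):
    # draw each sample as its digit label, colored by class
    plt.text(result[i, 0], result[i, 1], str(label[i]), color=plt.cm.Set1(label[i] / 10.), fontdict={'weight': 'bold', 'size': 9})
plt.xticks([])
plt.yticks([])
plt.title('hello')
# plt.show() does not take the figure as an argument
plt.show()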
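
The normalization step in the code above uses one global minimum and maximum for both columns, which keeps the relative scale of the two axes but means one axis may not fill the whole [0, 1] range. The sketch below contrasts that with scaling each column separately. The function names minmax_global and minmax_per_axis are hypothetical helpers written for this example, and the sample rows are copied from the t-SNE output shown earlier.

```python
import numpy as np

def minmax_global(points):
    """Scale with a single min/max over all coordinates (the approach used above)."""
    lo, hi = np.min(points), np.max(points)
    return (points - lo) / (hi - lo)

def minmax_per_axis(points):
    """Scale each column independently so that both axes span [0, 1]."""
    lo = points.min(axis=0)
    hi = points.max(axis=0)
    return (points - lo) / (hi - lo)

# Three sample rows copied from the t-SNE output shown earlier
sample = np.array([[-4.2510934, 57.605927],
                   [27.768238, -18.912882],
                   [19.440983, -7.737709]], dtype=np.float32)

g = minmax_global(sample)
p = minmax_per_axis(sample)
print(g)   # overall values lie in [0, 1], but a single column may not reach both ends
print(p)   # every column reaches exactly 0 and 1
```

Either variant works with the plt.text loop, since both keep all coordinates inside the default [0, 1] axes limits.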

Result:

[Figures omitted: the plots produced by the code above]
