【python】【Data Processing】Plotting the distribution of multidimensional data

2023-08-07 23:42:04

Tips:

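What %matplotlib inline means in Matplotlib and how to use it: https://blog.csdn.net/liangzuojiayi/article/details/78183783
load_iris loads the iris dataset bundled with sklearn (the task is to tell which species a flower belongs to from the length and width of its sepals and petals). The data format is: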

data.feature_names (or data['feature_names']):
    ['sepal length (cm)',
     'sepal width (cm)',
     'petal length (cm)',
     'petal width (cm)']
data.target_names:
    array(['setosa', 'versicolor', 'virginica'], dtype='<U10')
data['data']:
    array([[5.1, 3.5, 1.4, 0.2],
           [4.9, 3. , 1.4, 0.2],
           ......
           [6.5, 3. , 5.2, 2. ],
           [6.2, 3.4, 5.4, 2.3],
           [5.9, 3. , 5.1, 1.8]])
data['target']:
    array([0, 0, 0, 0, ......, 2, 2, 2])
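To check these fields yourself, here is a minimal sketch (assuming a reasonably recent scikit-learn; the variable name first is only illustrative). The object returned by load_iris is a Bunch, so attribute access and key access return the same thing:

```python
from sklearn.datasets import load_iris

# load_iris() returns a Bunch: data.feature_names and data['feature_names'] are the same object
data = load_iris()

print(data.feature_names)   # the four measurement names
print(data.target_names)    # ['setosa' 'versicolor' 'virginica']
print(data.data.shape)      # (150, 4): 150 flowers x 4 measurements
print(data.target.shape)    # (150,): one class index (0, 1 or 2) per flower

# first sample and the species name of its class index
first = data.data[0]
print(first, data.target_names[data.target[0]])
```

data.target holds indices into data.target_names, which is why indexing target_names with a target value gives the species name.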

sklearn.datasets can load many other datasets as well.
t-SNE: https://blog.csdn.net/hustqb/article/details/78144384 explains the principle, the pros and cons, and the usage of t-SNE in detail.
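As a quick illustration, a sketch that loads a few of the small datasets bundled with scikit-learn (no download needed) and prints their sizes; the choice of datasets and the loaders dict are just examples. return_X_y=True makes each loader return a (features, labels) pair instead of a Bunch:

```python
from sklearn import datasets

# a few small datasets that ship with scikit-learn (illustrative selection)
loaders = {
    "iris": datasets.load_iris,
    "digits": datasets.load_digits,
    "wine": datasets.load_wine,
    "breast_cancer": datasets.load_breast_cancer,
}

shapes = {}
for name, loader in loaders.items():
    # return_X_y=True -> (feature matrix, label vector) instead of a Bunch
    X, y = loader(return_X_y=True)
    shapes[name] = X.shape
    print(f"{name:>14}: X {X.shape}, {len(set(y))} classes")
```

Larger datasets are exposed through the fetch_* functions (for example fetch_openml), which download the data on first use.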
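Tying t-SNE back to the iris data above, here is a minimal sketch that embeds the four measurements into 2-D and scatters the result by species. The Agg backend, perplexity=30, init="pca", random_state=0 and the file name iris_tsne.png are assumptions chosen for a reproducible, headless run; in a notebook you would keep the default backend and call plt.show():

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script also runs without a display
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.manifold import TSNE

iris = load_iris()

# perplexity is related to the number of nearest neighbours each point considers;
# scikit-learn requires it to be smaller than the number of samples
tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0)
embedding = tsne.fit_transform(iris.data)  # one 2-D point per flower

fig, ax = plt.subplots()
for class_idx, class_name in enumerate(iris.target_names):
    mask = iris.target == class_idx  # rows belonging to this species
    ax.scatter(embedding[mask, 0], embedding[mask, 1], s=12, label=class_name)
ax.legend()
ax.set_title("t-SNE of the iris dataset")
fig.savefig("iris_tsne.png")  # hypothetical output file name
```

Drawing one scatter call per class gives each species its own colour and legend entry without building a colour list by hand.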

Code:

– Plot the data distribution of the handwritten digit images

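import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import datasets
from sklearn.manifold import TSNE

# load the 8x8 handwritten digit images: 1797 samples, 64 pixel features each
digits = datasets.load_digits(n_class=10)
df = pd.DataFrame(digits.data)
label = digits.target
df['label'] = label      # append the true digit (0-9) as the last column
print(type(digits.data))
df                       # in Jupyter, the last expression is displayed as a table
# original data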

<class 'numpy.ndarray'>

           0     1     2     3     4     5     6     7     8     9   ...    55    56    57    58    59    60    61    62    63 label
0        0.0   0.0   5.0  13.0   9.0   1.0   0.0   0.0   0.0   0.0   ...   0.0   0.0   0.0   6.0  13.0  10.0   0.0   0.0   0.0     0
1        0.0   0.0   0.0  12.0  13.0   5.0   0.0   0.0   0.0   0.0   ...   0.0   0.0   0.0   0.0  11.0  16.0  10.0   0.0   0.0     1
2        0.0   0.0   0.0   4.0  15.0  12.0   0.0   0.0   0.0   0.0   ...   0.0   0.0   0.0   0.0   3.0  11.0  16.0   9.0   0.0     2
3        0.0   0.0   7.0  15.0  13.0   1.0   0.0   0.0   0.0   8.0   ...   0.0   0.0   0.0   7.0  13.0  13.0   9.0   0.0   0.0     3
4        0.0   0.0   0.0   1.0  11.0   0.0   0.0   0.0   0.0   0.0   ...   0.0   0.0   0.0   0.0   2.0  16.0   4.0   0.0   0.0     4
...      ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...
1792     0.0   0.0   4.0  10.0  13.0   6.0   0.0   0.0   0.0   1.0   ...   0.0   0.0   0.0   2.0  14.0  15.0   9.0   0.0   0.0     9
1793     0.0   0.0   6.0  16.0  13.0  11.0   1.0   0.0   0.0   0.0   ...   0.0   0.0   0.0   6.0  16.0  14.0   6.0   0.0   0.0     0
1794     0.0   0.0   1.0  11.0  15.0   1.0   0.0   0.0   0.0   0.0   ...   0.0   0.0   0.0   2.0   9.0  13.0   6.0   0.0   0.0     8
1795     0.0   0.0   2.0  10.0   7.0   0.0   0.0   0.0   0.0   0.0   ...   0.0   0.0   0.0   5.0  12.0  16.0  12.0   0.0   0.0     9
1796     0.0   0.0  10.0  14.0   8.0   1.0   0.0   0.0   0.0   2.0   ...   0.0   0.0   1.0   8.0  12.0  14.0  12.0   1.0   0.0     8

1797 rows × 65 columns
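Each row of the table above is one 8x8 grey-level image (pixel values 0 to 16) flattened row by row into 64 columns. A small sketch that undoes the flattening for the first row, assuming only the standard load_digits attributes (first_image is an illustrative name):

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits(n_class=10)

# digits.images keeps the 8x8 layout; digits.data is the same pixels flattened
print(digits.images.shape)  # (1797, 8, 8)
print(digits.data.shape)    # (1797, 64)

# reshape one 64-value row back into its 8x8 image
first_image = digits.data[0].reshape(8, 8)
print(first_image.astype(int))
print("label:", digits.target[0])
```

This is why t-SNE is needed here: each sample is a point in a 64-dimensional space, which cannot be plotted directly.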

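# t-SNE: reduce the 64-dimensional pixel vectors to 2 dimensions
tsne = TSNE(n_components=2, init='pca', random_state=0)
result = tsne.fit_transform(digits.data)
result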

array([[ -4.2510934,  57.605927 ],
       [ 27.768238 , -18.912882 ],
       [ 19.440983 ,  -7.737709 ],
       ...,
       [ 10.630893 , -12.436025 ],
       [-18.820362 ,  28.899649 ],
       [  6.5873857,  -8.608063 ]], dtype=float32)
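The scikit-learn TSNE documentation recommends first reducing data with a very large number of features (for example to 50 dimensions) using PCA, to suppress noise and speed up t-SNE. With only 64 features this is optional for digits; the sketch below just shows the pattern, and n_components=30 for PCA is an arbitrary assumption:

```python
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.pipeline import make_pipeline

digits = datasets.load_digits(n_class=10)

# PCA keeps the 30 highest-variance directions (64 -> 30),
# then t-SNE maps those 30 dimensions to 2-D for plotting
reducer = make_pipeline(
    PCA(n_components=30, random_state=0),
    TSNE(n_components=2, init="pca", random_state=0),
)
result = reducer.fit_transform(digits.data)
print(result.shape)
```

TSNE has no transform method, so it can only be the last step of the pipeline and the pipeline must be used through fit_transform.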

# draw the 2-D plot

x_min, x_max = np.min(result), np.max(result)

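# min-max scaling maps every coordinate into [0, 1] (one global min/max is used for both axes);
# this matters because plt.text() does not update the axis limits, which default to 0-1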
result = (result - x_min)/(x_max-x_min)
fig = plt.figure()
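# subplot(111): a grid of 1 row x 1 column, 1st cell; the first two digits give the grid's
# rows and columns, so larger numbers mean more, smaller subplots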
ax = plt.subplot(111)
for i in range(result.shape[0]):
    plt.text(result[i,0], result[i,1], str(label[i]), color=plt.cm.Set1(label[i] / 10.), fontdict={'weight': 'bold','size': 9})
plt.xticks([])
plt.yticks([])
plt.title('hello')
plt.show()
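The plotting steps above can be packaged into a reusable function. This is a sketch under a few assumptions: plot_embedding and the file name digits_tsne.png are hypothetical, the Agg backend is used so it runs headless, and each axis is scaled separately (the code above uses one global min/max for both axes):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; in Jupyter keep the default and call plt.show()
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets
from sklearn.manifold import TSNE


def plot_embedding(points, labels, title):
    """Draw each 2-D point as its class label (hypothetical helper)."""
    # per-axis min-max scaling into [0, 1]: ax.text() does not change the
    # axis limits, so the labels must fit the default 0-1 view
    lo, hi = points.min(axis=0), points.max(axis=0)
    scaled = (points - lo) / (hi - lo)

    fig, ax = plt.subplots()
    for (x, y), lab in zip(scaled, labels):
        ax.text(x, y, str(lab), color=plt.cm.Set1(lab / 10.0),
                fontdict={"weight": "bold", "size": 9})
    ax.set_xticks([])
    ax.set_yticks([])
    ax.set_title(title)
    return fig, scaled


digits = datasets.load_digits(n_class=10)
result = TSNE(n_components=2, init="pca", random_state=0).fit_transform(digits.data)
fig, scaled = plot_embedding(result, digits.target, "t-SNE of the digits dataset")
fig.savefig("digits_tsne.png")  # hypothetical output file name
```

Scaling each axis on its own stretches the cloud to fill the whole frame; the global version keeps the original aspect ratio of the embedding. Either way, the scaling is what makes the text labels land inside the visible axes.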

Result:

[Figure: 2-D t-SNE distribution of the handwritten digit images]
