Cluster data generators in sklearn for machine learning

Meaning of the parameters (these correspond to sklearn.datasets.make_blobs):

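n_samples: int, optional (default=100). Total number of samples to generate.

n_features: int, optional (default=2). Number of features for each sample.

centers: int or array of shape [n_centers, n_features], optional (default=3). Number of cluster centers (classes) to generate, or the fixed center locations.

cluster_std: float or sequence of floats, optional (default=1.0). Standard deviation of each cluster (not the variance). For example, to generate two clusters where one is more spread out than the other, set cluster_std to [1.0, 3.0].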

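center_box: pair of floats (min, max), optional (default=(-10.0, 10.0)). Bounding box for each cluster center when the centers are generated at random.

shuffle: boolean, optional (default=True). Whether to shuffle the samples.

random_state: int, RandomState instance or None, optional (default=None). Seed or generator for the random number generation; pass an int for reproducible output.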

Returns:

X : array of shape [n_samples, n_features]

The generated samples.


y : array of shape [n_samples]

The integer labels for cluster membership of each sample.
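The parameters and return values above can be tried out with a short script. This is a minimal sketch, assuming the function being described is sklearn.datasets.make_blobs; the two center coordinates, the sample count, and the seed are arbitrary values chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs

# Two clusters at fixed (illustrative) centers; the second gets a larger spread.
centers = np.array([[0.0, 0.0], [10.0, 10.0]])

X, y = make_blobs(
    n_samples=200,           # total number of samples across all clusters
    n_features=2,            # matches the width of `centers`
    centers=centers,         # explicit center locations instead of a count
    cluster_std=[1.0, 3.0],  # one standard deviation per cluster
    shuffle=True,
    random_state=0,          # arbitrary seed, only for reproducibility
)

print(X.shape, y.shape)

# Per-cluster empirical standard deviation along each feature axis.
for label in np.unique(y):
    print(label, X[y == label].std(axis=0))
```

The printed per-cluster spread should come out near the requested cluster_std values, which is a quick way to confirm that the parameter is a standard deviation rather than a variance.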

1) make_classification

sklearn.datasets.make_classification(n_samples=100, n_features=20, n_informative=2, n_redundant=2,
    n_repeated=0, n_classes=2, n_clusters_per_class=2, weights=None,
    flip_y=0.01, class_sep=1.0, hypercube=True, shift=0.0, scale=1.0,
    shuffle=True, random_state=None)

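Typically used to generate data for classification algorithms.

n_features: total number of features. It is made up of n_informative + n_redundant + n_repeated features; any features left over (n_features - n_informative - n_redundant - n_repeated) are filled with random noise.

n_informative: number of informative features.

n_redundant: number of redundant features, generated as random linear combinations of the informative features.

n_repeated: number of duplicated features, drawn at random from the informative and redundant features.

n_classes: number of classes.

n_clusters_per_class: number of clusters that each class is made up of.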

The returned y holds the labels of the generated sample dataset.
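The feature breakdown described above can be checked directly. This is a minimal sketch with illustrative parameter values; it sets shuffle=False so that the columns stay in the documented order (informative, then redundant, then repeated, then noise), and flip_y=0.0 so that no labels are randomly flipped.

```python
import numpy as np
from sklearn.datasets import make_classification

n_informative, n_redundant, n_repeated = 3, 2, 1

X, y = make_classification(
    n_samples=500,
    n_features=10,            # 3 + 2 + 1 = 6 structured columns, 4 noise columns
    n_informative=n_informative,
    n_redundant=n_redundant,
    n_repeated=n_repeated,
    n_classes=2,
    n_clusters_per_class=2,
    flip_y=0.0,               # no label noise
    shuffle=False,            # keep the column order described above
    random_state=0,           # arbitrary seed
)

n_struct = n_informative + n_redundant

# Redundant columns are linear combinations of the informative ones,
# so informative + redundant together still span only n_informative dimensions.
rank = np.linalg.matrix_rank(X[:, :n_struct], tol=1e-8)
print("rank of informative + redundant block:", rank)

# The repeated column is an exact copy of one of the earlier columns.
repeated = X[:, n_struct]
is_copy = any(np.array_equal(repeated, X[:, j]) for j in range(n_struct))
print("repeated column duplicates an earlier one:", is_copy)

print("class counts:", np.bincount(y))
```

With the default shuffle=True the same columns are present but in a random order, so the column positions used here no longer apply.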

2) make_circles and make_moons

sklearn.datasets.make_circles(n_samples=100, shuffle=True, noise=None, random_state=None, factor=0.8)
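A short sketch of both generators follows. The factor, noise, sample counts, and seed are illustrative choices, not recommendations; factor is the ratio of the inner circle's radius to the outer circle's radius.

```python
import numpy as np
from sklearn.datasets import make_circles, make_moons

# Two concentric circles, noise-free so the radii are exact.
X_c, y_c = make_circles(n_samples=200, shuffle=True, noise=None,
                        random_state=0, factor=0.5)
radii = np.linalg.norm(X_c, axis=1)
print("outer circle radius (label 0):", radii[y_c == 0].mean())
print("inner circle radius (label 1):", radii[y_c == 1].mean())

# Two interleaving half circles with a little Gaussian noise.
X_m, y_m = make_moons(n_samples=200, shuffle=True, noise=0.05, random_state=0)
print(X_m.shape, np.unique(y_m))
```

Both datasets are two-dimensional with binary labels and are not linearly separable, which makes them convenient for trying out clustering and non-linear classification methods.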

3) make_gaussian_quantiles and make_hastie_10_2

sklearn.datasets.make_gaussian_quantiles(mean=None, cov=1.0, n_samples=100, n_features=2, n_classes=3,
    shuffle=True, random_state=None)
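A minimal sketch of both generators, with illustrative sample counts and seed. make_gaussian_quantiles draws points from a single isotropic Gaussian and labels them by distance from the mean, so the classes form nested shells of roughly equal size. make_hastie_10_2 produces 10 standard Gaussian features with a label of 1 when the sum of squared features exceeds 9.34 and -1 otherwise.

```python
import numpy as np
from sklearn.datasets import make_gaussian_quantiles, make_hastie_10_2

# Three classes separated by concentric spheres around the origin (mean=None).
X_q, y_q = make_gaussian_quantiles(mean=None, cov=1.0, n_samples=300,
                                   n_features=2, n_classes=3,
                                   shuffle=True, random_state=0)
dist = np.linalg.norm(X_q, axis=1)
for label in range(3):
    members = dist[y_q == label]
    print(label, len(members), members.min(), members.max())

# Binary problem with labels in {-1, 1} and 10 features.
X_h, y_h = make_hastie_10_2(n_samples=1000, random_state=0)
expected = np.where((X_h ** 2).sum(axis=1) > 9.34, 1.0, -1.0)
print(X_h.shape, np.unique(y_h), np.array_equal(y_h, expected))
```

The distance ranges printed for make_gaussian_quantiles should not overlap between classes, which shows the shell structure.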