Hierarchical Clustering

Clustering, in one sentence, is the extraction of natural groupings of similar data objects.

There are a couple of general ideas that occur quite frequently with respect to clustering:

  • The clusters should be naturally occurring in data.
  • The clustering should discover hidden patterns in the data.
  • Data points within the cluster should be similar.
  • Data points in two different clusters should not be similar.

Common algorithms used for clustering include K-Means, DBSCAN, and Gaussian Mixture Models.
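To make the idea concrete, here is a minimal sketch of flat (non-hierarchical) clustering with K-Means via scikit-learn; the toy data and the choice of two clusters are illustrative assumptions, not from the original.

```python
# Minimal K-Means sketch (assumes scikit-learn is installed).
# The data below forms two well-separated groups by construction.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],   # one natural group
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])  # another group

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # points in the same group share a label
```

Note that K-Means requires the number of clusters up front; the hierarchical methods below avoid committing to a single flat partition.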

Hierarchical Clustering

As mentioned before, hierarchical clustering builds on these clustering techniques to find a hierarchy of clusters, where the hierarchy resembles a tree structure, called a dendrogram.

Hierarchical clustering is the hierarchical decomposition of the data based on group similarities.
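A sketch of what the dendrogram encodes, using SciPy (an assumption; the original does not name a library): `linkage` returns a matrix in which each row records one merge in the tree.

```python
# Build the linkage matrix underlying a dendrogram (assumes SciPy).
# The 1-D toy points are illustrative.
import numpy as np
from scipy.cluster.hierarchy import linkage

X = np.array([[0.0], [0.1], [5.0], [5.1], [10.0]])

# Each row of Z is one merge: (cluster i, cluster j, distance, new size).
# For n points there are n - 1 merges, so Z has n - 1 rows.
Z = linkage(X, method="average")
print(Z)
```

Passing `Z` to `scipy.cluster.hierarchy.dendrogram` would draw the tree itself.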

Finding hierarchical clusters

There are two top-level methods for finding these hierarchical clusters:

  • Agglomerative clustering uses a bottom-up approach, wherein each data point starts in its own cluster. These clusters are then joined greedily, by repeatedly merging the two most similar clusters.
  • Divisive clustering uses a top-down approach, wherein all data points start in the same cluster. You can then use a parametric clustering algorithm like K-Means to split the cluster in two, and recursively split each resulting cluster until you reach the desired number of clusters.
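The bottom-up approach can be sketched with scikit-learn's `AgglomerativeClustering`; the data and the `average` linkage choice are illustrative assumptions.

```python
# Bottom-up (agglomerative) clustering sketch (assumes scikit-learn).
import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],   # tight group near origin
              [9.0, 9.0], [9.1, 8.9], [8.9, 9.1]])  # tight group far away

# Each point starts as its own cluster; the two most similar clusters
# are merged greedily until only n_clusters remain.
agg = AgglomerativeClustering(n_clusters=2, linkage="average")
labels = agg.fit_predict(X)
print(labels)
```

The divisive direction has no dedicated scikit-learn class; as described above, it can be approximated by calling K-Means with k=2 recursively on each resulting cluster.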

Both of these approaches rely on constructing a pairwise distance (or similarity) matrix between all of the data points, usually computed with cosine or Jaccard distance.
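Such a pairwise matrix can be computed with SciPy's `pdist` (an illustrative choice; any pairwise-distance routine works). Cosine distance is shown here; `pdist` also accepts `metric="jaccard"` for binary data.

```python
# Pairwise cosine-distance matrix sketch (assumes SciPy).
import numpy as np
from scipy.spatial.distance import pdist, squareform

X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])

# pdist returns the condensed upper triangle; squareform expands it
# into the full symmetric n x n matrix with zeros on the diagonal.
D = squareform(pdist(X, metric="cosine"))
print(np.round(D, 3))
```

Orthogonal vectors (rows 0 and 2) get the maximum cosine distance of 1, while each point is at distance 0 from itself.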
