Machine Learning（2）Estimate the probability density -- Mixture of Gaussians

Machine Learning（2）Mixture of Gaussians

Chenjing Ding

2018/02/21

notation	meaning
$M$	the number of mixture components
$p(j)$	weight of mixture component $j$
$p(x|\theta_j)$	mixture component
$p(x|\theta)$	mixture density
$\theta_j$	parameters of the $j$-th component

1. Mixture of Multivariate Gaussians

In some cases a single Gaussian distribution cannot represent $p(x|\theta)$ (see the red model in figure 1), so in this chapter we estimate a mixture density of multivariate Gaussians instead.

1.1 Obtain mixture of density

Weight of mixture component: $p(j) = \pi_j$

Mixture component: $p(x|\theta_j)$

Mixture density: $p(x|\theta) = \sum_{j=1}^{M} p(x|\theta_j)\, p(j)$
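As a minimal sketch of the mixture density above, assuming each component $p(x|\theta_j)$ is a multivariate Gaussian with $\theta_j = (\mu_j, \Sigma_j)$. The function names `gaussian_pdf` and `mixture_density` and the example parameters are illustrative and not part of the original text.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Density of a multivariate Gaussian N(x | mean, cov) at a single point x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    mean = np.atleast_1d(np.asarray(mean, dtype=float))
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    d = mean.shape[0]
    diff = x - mean
    # Solve cov * z = diff instead of forming the explicit inverse.
    maha = diff @ np.linalg.solve(cov, diff)
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return float(np.exp(-0.5 * maha) / norm)

def mixture_density(x, weights, means, covs):
    """p(x | theta) = sum_j p(x | theta_j) p(j)."""
    return sum(w * gaussian_pdf(x, m, c) for w, m, c in zip(weights, means, covs))

# Illustrative two-component 1-D mixture (made-up parameters).
weights = [0.3, 0.7]          # p(j), must be non-negative and sum to 1
means = [[-2.0], [1.0]]       # mu_j
covs = [[[0.5]], [[1.0]]]     # Sigma_j
print(mixture_density([0.0], weights, means, covs))
```

Because the weights sum to 1 and every component is a valid density, the mixture is itself a valid density.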

figure1 mixture of density

2. Maximum Likelihood

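Maximum likelihood is used to estimate $\mu_j$.

Problem with estimating $\mu_j$: the maximum-likelihood solution for $\mu_j$ is expressed in terms of a quantity $\gamma_j(x_n)$ that itself depends on the parameters of all components, so there is no closed-form solution.

$\gamma_j(x_n)$ represents the "responsibility" of component $j$ for the mixture density given $x_n$. If we can estimate $\gamma_j(x_n)$, then we can obtain $\mu_j$, and K-Means clustering is helpful for this.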
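The derivation behind this can be sketched as follows, assuming Gaussian components $p(x|\theta_j) = \mathcal{N}(x|\mu_j, \Sigma_j)$. This is the standard mixture-of-Gaussians result and agrees with the update formulas in section 4.2.

```latex
\begin{aligned}
E &= -\ln L(\theta) = -\sum_{n=1}^{N} \ln p(x_n|\theta)
   = -\sum_{n=1}^{N} \ln \sum_{j=1}^{M} \pi_j \, \mathcal{N}(x_n|\mu_j, \Sigma_j) \\
\frac{\partial E}{\partial \mu_j}
  &= -\sum_{n=1}^{N}
     \frac{\pi_j \, \mathcal{N}(x_n|\mu_j,\Sigma_j)}
          {\sum_{k=1}^{M} \pi_k \, \mathcal{N}(x_n|\mu_k,\Sigma_k)}
     \, \Sigma_j^{-1}(x_n - \mu_j)
   = -\Sigma_j^{-1} \sum_{n=1}^{N} \gamma_j(x_n)\,(x_n - \mu_j) = 0 \\
\Rightarrow \quad \mu_j &= \frac{\sum_{n=1}^{N} \gamma_j(x_n)\, x_n}{\sum_{n=1}^{N} \gamma_j(x_n)},
\qquad
\gamma_j(x_n) = \frac{\pi_j \, \mathcal{N}(x_n|\mu_j,\Sigma_j)}
                     {\sum_{k=1}^{M} \pi_k \, \mathcal{N}(x_n|\mu_k,\Sigma_k)}
\end{aligned}
```

The last line shows the circular dependency: $\gamma_j(x_n)$ on the right-hand side is computed from $\mu_j$ itself.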

3. K-Means cluster

K-Means clustering assigns each data point to one of K clusters according to its distance to the mean of each cluster.

3.1 steps

step1: Initialization: pick K arbitrary centroids (cluster means)

step2: Assign each sample to the closest centroid.

step3: Adjust the centroids to be the means of the samples assigned to them.

step4: Go back to step2 and repeat until the centroids no longer change in step3.
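The four steps above can be sketched in NumPy as follows. This is a minimal illustration and not a tuned implementation: the function names `assign` and `kmeans`, the `max_iter` cap, the choice of initial centroids (K random samples) and the toy data are all assumptions made for the example.

```python
import numpy as np

def assign(X, centroids):
    """Index of the closest centroid for every sample (squared Euclidean distance)."""
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)  # shape (N, K)
    return dists.argmin(axis=1)

def kmeans(X, k, max_iter=100, seed=0):
    """Plain K-Means; returns (centroids, labels)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # step1: pick K distinct samples as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # step2: assign each sample to the closest centroid.
        labels = assign(X, centroids)
        # step3: move each centroid to the mean of its assigned samples
        # (an empty cluster keeps its old centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # step4: stop once the centroids no longer change.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, assign(X, centroids)

# Toy data: two well-separated 2-D blobs (made-up example).
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
               rng.normal(5.0, 0.5, size=(50, 2))])
centroids, labels = kmeans(X, k=2)
print(centroids)
```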

figure2 the process of K-Means cluster (K = 2)

3.2 Objective function

K-Means optimizes the following objective function:

$$L = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk}\, \|x_n - \mu_k\|^2, \qquad r_{nk} = \begin{cases} 1, & k = \arg\min_{k'} \|x_n - \mu_{k'}\|^2 \\ 0, & \text{else} \end{cases}$$

$r_{nk}$ is an indicator variable that checks whether $\mu_k$ is the nearest cluster center to point $x_n$.
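A small worked example of this objective, as a sketch: the helper name `kmeans_objective` and the three sample points are made up for illustration.

```python
import numpy as np

def kmeans_objective(X, centroids):
    """L = sum_n sum_k r_nk * ||x_n - mu_k||^2 with hard assignments r_nk."""
    X = np.asarray(X, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)  # shape (N, K)
    # r_nk is 1 for the nearest centroid of x_n and 0 elsewhere.
    r = np.zeros_like(dists)
    r[np.arange(len(X)), dists.argmin(axis=1)] = 1.0
    return float((r * dists).sum()), r

X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0]])
centroids = np.array([[0.0, 1.0], [10.0, 0.0]])
L, r = kmeans_objective(X, centroids)
print(L)  # 2.0: the first two points are each at squared distance 1 from centroid 0
print(r)  # one 1 per row, marking the nearest centroid
```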

3.3 Advantages and Disadvantages

Advantages:

simple and fast to compute
converges to a local minimum of the within-cluster squared error

Disadvantages:

sensitive to initialization
sensitive to outliers
difficult to set K properly
only detects spherical clusters

figure3 the problem of K-Means cluster (K = 2)

4. EM Algorithm

Once K-Means clustering has given us the mean of each cluster (the covariance can be computed from the points assigned to that cluster), we have $\theta_j = (\mu_j, \Sigma_j)$ and can estimate $\gamma_j(x_n)$, the "responsibility" of component $j$ for the mixture density.

4.1 K-Means Clustering Revisited

step1: Initialization: pick K arbitrary centroids [compute $\theta_j^0 = (\mu_j^0, \Sigma_j^0)$]

step2: Assign each sample to the closest centroid. [compute $\gamma_j(x_n)$ ⇒ E-step]

step3: Adjust the centroids to be the means of the samples assigned to them. [compute $\theta_j^\tau = (\mu_j^\tau, \Sigma_j^\tau)$ ⇒ M-step]

step4: Go to step2 (until no change)

The process is almost the same as K-Means clustering, but in K-Means each point belongs to exactly one cluster (a hard assignment), so there is no concept like $\gamma_j(x_n)$.

4.2 E-step & M-step

E-step: softly assign samples to mixture components

$$\gamma_j(x_n) = \frac{p(j)\, p(x_n|\theta_j)}{\sum_{k=1}^{M} p(x_n|\theta_k)\, p(k)}, \qquad \forall j = 1, \dots, M, \quad \forall n = 1, \dots, N$$

M-step: re-estimate the parameters (separately for each mixture component) based on the soft assignments.

$$\hat{N}_j = \sum_{n=1}^{N} \gamma_j(x_n)$$

$$\hat{p}(j) = \frac{\hat{N}_j}{N}$$

$$\hat{\mu}_j^{new} = \frac{\sum_{n=1}^{N} \gamma_j(x_n)\, x_n}{\sum_{n=1}^{N} \gamma_j(x_n)}$$

$$\hat{\Sigma}_j^{new} = \frac{1}{\hat{N}_j} \sum_{n=1}^{N} \gamma_j(x_n)\, (x_n - \hat{\mu}_j^{new})(x_n - \hat{\mu}_j^{new})^T$$
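The two steps can be sketched in NumPy as below, assuming Gaussian components. The function names (`gaussian_pdf`, `e_step`, `m_step`, `fit_gmm`), the fixed iteration count and the hand-picked initial parameters are assumptions for illustration; the sketch omits a convergence check and the regularization discussed in section 4.4.

```python
import numpy as np

def gaussian_pdf(X, mean, cov):
    """N(x_n | mean, cov) for every row x_n of X; returns shape (N,)."""
    d = X.shape[1]
    diff = X - mean
    maha = np.einsum("ni,ni->n", diff @ np.linalg.inv(cov), diff)
    return np.exp(-0.5 * maha) / np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))

def e_step(X, weights, means, covs):
    """gamma[n, j] = p(j) p(x_n|theta_j) / sum_k p(k) p(x_n|theta_k)."""
    joint = np.stack([w * gaussian_pdf(X, m, c)
                      for w, m, c in zip(weights, means, covs)], axis=1)
    return joint / joint.sum(axis=1, keepdims=True)

def m_step(X, gamma):
    """Re-estimate weights, means and covariances from the soft assignments."""
    N_j = gamma.sum(axis=0)                    # effective number of points per component
    weights = N_j / len(X)
    means = (gamma.T @ X) / N_j[:, None]
    covs = []
    for j in range(gamma.shape[1]):
        diff = X - means[j]
        covs.append((gamma[:, j, None] * diff).T @ diff / N_j[j])
    return weights, means, np.array(covs)

def fit_gmm(X, weights, means, covs, n_iter=50):
    """Alternate E-step and M-step for a fixed number of iterations."""
    for _ in range(n_iter):
        gamma = e_step(X, weights, means, covs)
        weights, means, covs = m_step(X, gamma)
    return weights, means, covs

# Toy 1-D data drawn from two Gaussians (made-up example).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-3.0, 1.0, size=(200, 1)),
               rng.normal(3.0, 1.0, size=(200, 1))])
# Rough initial guess (hypothetical values; K-Means could supply these instead).
w, mu, cov = fit_gmm(X,
                     np.array([0.5, 0.5]),
                     np.array([[-1.0], [1.0]]),
                     np.array([[[1.0]], [[1.0]]]))
print(w, mu.ravel(), cov.ravel())
```

Each row of `gamma` sums to 1, which is what makes the assignment "soft": a sample contributes to every component in proportion to its responsibility.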

4.3 Advantages

Very general, can represent any (continuous) distribution.
Once trained, very fast to evaluate.
Can be updated online.

4.4 Caveats

Introduce regularization

instead of $\Sigma^{-1}$, use $(\Sigma + \sigma I)^{-1}$, so that a component collapsing onto a few points ($|\Sigma| \to 0$) cannot make $p(x_n|\theta_j)$ go to infinity
Initialize with K-Means to get better results

Typical steps:

Run K-Means several times (e.g. 10~100 runs)

Pick the best result (lowest error, i.e. the lowest value of the objective in section 3.2)

Use this result to initialize EM
EM for MoG is computationally expensive
Need to select the number of mixture components $M$ properly ⇒ model selection problem
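The regularization caveat can be illustrated with a short sketch. The helper name `regularize_cov` and the value of `sigma` are assumptions chosen for the example; a suitable `sigma` depends on the scale of the data.

```python
import numpy as np

def regularize_cov(cov, sigma=1e-6):
    """Return cov + sigma * I so that the matrix stays invertible."""
    cov = np.asarray(cov, dtype=float)
    return cov + sigma * np.eye(cov.shape[0])

# A degenerate covariance: the component has collapsed onto a line.
cov = np.array([[1.0, 1.0],
                [1.0, 1.0]])
print(np.linalg.det(cov))          # determinant is zero, so cov has no inverse

reg = regularize_cov(cov, sigma=1e-3)
print(np.linalg.det(reg))          # small but strictly positive
print(np.linalg.inv(reg))          # the inverse now exists
```

In the M-step this would be applied to every re-estimated $\Sigma_j$ before it is inverted in the next E-step.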
