這裡寫自定義目錄标題
- 假設
- EM算法的流程
- EM算法的證明
-
- 命題
- 證明
假設
假設 Y Y Y 是觀察值, Z Z Z 是隐變量, θ \theta θ 是模型參數。
令 L ( θ ) = ln P ( Y ∣ θ ) L (\theta) = \ln P(Y | \theta) L(θ)=lnP(Y∣θ)。
EM算法的流程
- 初始化 θ 0 \theta ^{0} θ0。
-
假設已知 θ i \theta ^{i} θi ,則
2.1 計算隐變量 Z ∼ P ( Z ∣ Y , θ i ) Z\sim P(Z | Y, \theta ^{i}) Z∼P(Z∣Y,θi) 的分布。
2.2 E步驟:求期望 q ( θ , θ i ) = E z ∼ P ( Z ∣ Y , θ i ) ln P ( Y , Z ∣ θ ) q (\theta, \theta ^{i}) = \mathop { \mathbb{ E } } _{z \sim P(Z | Y, \theta ^{i}) } \ln P(Y, Z | \theta) q(θ,θi)=Ez∼P(Z∣Y,θi)lnP(Y,Z∣θ)
2.3 M步驟:求期望最大值時的 θ \theta θ, 即 θ i + 1 = arg max θ q ( θ , θ i ) \theta ^{i + 1} = \argmax \limits _{\theta} q(\theta, \theta ^{i}) θi+1=θargmaxq(θ,θi)
- 若滿足某條件,則結束算法。
EM算法的證明
命題
EM算法裡的 θ 0 , θ 1 , θ 2 , ⋯ \theta ^{0}, \theta ^{1}, \theta ^{2}, \cdots θ0,θ1,θ2,⋯,滿足 L ( θ i + 1 ) ≥ L ( θ i ) L (\theta ^{i + 1} ) \ge L (\theta ^{i} ) L(θi+1)≥L(θi)。
證明
L ( θ ) = ln P ( Y ∣ θ ) L (\theta) = \ln P(Y | \theta) L(θ)=lnP(Y∣θ)
= ln ( ∑ z P ( Y , Z ∣ θ ) ) = \ln \left ( \sum \limits _{z} P(Y, Z | \theta) \right ) =ln(z∑P(Y,Z∣θ))
= ln ( ∑ z P ( Z ∣ Y , θ i ) ⋅ P ( Y , Z ∣ θ ) P ( Z ∣ Y , θ i ) ) = \ln \left ( \sum \limits _{z} P(Z | Y, \theta ^{i}) \cdot \dfrac { P(Y, Z | \theta) } { P(Z | Y, \theta ^{i}) } \right ) =ln(z∑P(Z∣Y,θi)⋅P(Z∣Y,θi)P(Y,Z∣θ))
≥ ∑ z [ P ( Z ∣ Y , θ i ) ⋅ ln ( P ( Y , Z ∣ θ ) P ( Z ∣ Y , θ i ) ) ] \ge \sum \limits _{z} \left [ P(Z | Y, \theta ^{i}) \cdot \ln \left (\dfrac { P(Y, Z | \theta) } { P(Z | Y, \theta ^{i}) } \right ) \right ] ≥z∑[P(Z∣Y,θi)⋅ln(P(Z∣Y,θi)P(Y,Z∣θ))]
令 f ( θ , θ i ) = ∑ z [ P ( Z ∣ Y , θ i ) ⋅ ln ( P ( Y , Z ∣ θ ) P ( Z ∣ Y , θ i ) ) ] f(\theta, \theta ^{i}) = \sum \limits _{z} \left [ P(Z | Y, \theta ^{i}) \cdot \ln \left (\dfrac { P(Y, Z | \theta) } { P(Z | Y, \theta ^{i}) } \right ) \right ] f(θ,θi)=z∑[P(Z∣Y,θi)⋅ln(P(Z∣Y,θi)P(Y,Z∣θ))]
則 f ( θ i , θ i ) = ∑ z [ P ( Z ∣ Y , θ i ) ⋅ ln ( P ( Y , Z ∣ θ i ) P ( Z ∣ Y , θ i ) ) ] f(\theta^{i}, \theta ^{i}) = \sum \limits _{z} \left [ P(Z | Y, \theta ^{i}) \cdot \ln \left (\dfrac { P(Y, Z | \theta ^{i}) } { P(Z | Y, \theta ^{i}) } \right ) \right ] f(θi,θi)=z∑[P(Z∣Y,θi)⋅ln(P(Z∣Y,θi)P(Y,Z∣θi))]
= ∑ z P ( Z ∣ Y , θ i ) ⋅ ln P ( Y ∣ θ i ) = \sum \limits _{z} P(Z | Y, \theta ^{i}) \cdot \ln P(Y | \theta ^{i}) =z∑P(Z∣Y,θi)⋅lnP(Y∣θi)
= ln P ( Y ∣ θ i ) = \ln P(Y | \theta ^{i}) =lnP(Y∣θi)
= L ( θ i ) = L (\theta ^{i}) =L(θi)
且 L ( θ ) ≥ f ( θ , θ i ) L (\theta) \ge f(\theta, \theta ^{i}) L(θ)≥f(θ,θi)
令 θ i + 1 = arg max θ f ( θ , θ i ) \theta ^{i + 1} = \argmax \limits _{\theta} f(\theta, \theta ^{i}) θi+1=θargmaxf(θ,θi)
則 L ( θ i + 1 ) ≥ f ( θ i + 1 , θ i ) = max θ f ( θ , θ i ) ≥ f ( θ i , θ i ) = L ( θ i ) L (\theta ^{i + 1} ) \ge f(\theta ^{i + 1} , \theta ^{i}) = \max \limits _{\theta} f(\theta, \theta ^{i}) \ge f(\theta^{i}, \theta ^{i}) = L (\theta ^{i}) L(θi+1)≥f(θi+1,θi)=θmaxf(θ,θi)≥f(θi,θi)=L(θi)
由于 f ( θ , θ i ) = ∑ z [ P ( Z ∣ Y , θ i ) ⋅ ln ( P ( Y , Z ∣ θ ) P ( Z ∣ Y , θ i ) ) ] f(\theta, \theta ^{i}) = \sum \limits _{z} \left [ P(Z | Y, \theta ^{i}) \cdot \ln \left (\dfrac { P(Y, Z | \theta) } { P(Z | Y, \theta ^{i}) } \right ) \right ] f(θ,θi)=z∑[P(Z∣Y,θi)⋅ln(P(Z∣Y,θi)P(Y,Z∣θ))]
= ∑ z ( P ( Z ∣ Y , θ i ) ⋅ ln P ( Y , Z ∣ θ ) ) − ∑ z ( P ( Z ∣ Y , θ i ) ⋅ ln P ( Z ∣ Y , θ i ) ) = \sum \limits _{z} \left ( P(Z | Y, \theta ^{i}) \cdot \ln P(Y, Z | \theta) \right ) - \sum \limits _{z} \left ( P(Z | Y, \theta ^{i}) \cdot \ln P(Z | Y, \theta ^{i}) \right ) =z∑(P(Z∣Y,θi)⋅lnP(Y,Z∣θ))−z∑(P(Z∣Y,θi)⋅lnP(Z∣Y,θi))
是以 θ i + 1 = arg max θ f ( θ , θ i ) = arg max θ ∑ z ( P ( Z ∣ Y , θ i ) ⋅ ln P ( Y , Z ∣ θ ) ) \theta ^{i + 1} = \argmax \limits _{\theta} f(\theta, \theta ^{i}) = \argmax \limits _{\theta} \sum \limits _{z} \left ( P(Z | Y, \theta ^{i}) \cdot \ln P(Y, Z | \theta) \right ) θi+1=θargmaxf(θ,θi)=θargmaxz∑(P(Z∣Y,θi)⋅lnP(Y,Z∣θ))
= arg max θ E z ∼ P ( Z ∣ Y , θ i ) ln P ( Y , Z ∣ θ ) = \argmax \limits _{\theta} \mathop { \mathbb{ E } } _{z \sim P(Z | Y, \theta ^{i}) } \ln P(Y, Z | \theta) =θargmaxEz∼P(Z∣Y,θi)lnP(Y,Z∣θ)
令 q ( θ , θ i ) = E z ∼ P ( Z ∣ Y , θ i ) ln P ( Y , Z ∣ θ ) q (\theta, \theta ^{i}) = \mathop { \mathbb{ E } } _{z \sim P(Z | Y, \theta ^{i}) } \ln P(Y, Z | \theta) q(θ,θi)=Ez∼P(Z∣Y,θi)lnP(Y,Z∣θ)
則 θ i + 1 = arg max θ q ( θ , θ i ) \theta ^{i + 1} = \argmax \limits _{\theta} q (\theta, \theta ^{i}) θi+1=θargmaxq(θ,θi)
且 L ( θ i + 1 ) ≥ L ( θ i ) L (\theta ^{i + 1} ) \ge L (\theta ^{i} ) L(θi+1)≥L(θi)