天天看點

EM算法與證明假設EM算法的流程EM算法的證明

這裡寫自定義目錄标題

  • 假設
  • EM算法的流程
  • EM算法的證明
    • 命題
    • 證明

假設

假設 Y Y Y 是觀察值, Z Z Z 是隐變量, θ \theta θ 是模型參數。

令 L ( θ ) = ln ⁡ P ( Y ∣ θ ) L (\theta) = \ln P(Y | \theta) L(θ)=lnP(Y∣θ)。

EM算法的流程

  1. 初始化 θ 0 \theta ^{0} θ0。
  2. 假設已知 θ i \theta ^{i} θi ,則

    2.1 計算隐變量 Z ∼ P ( Z ∣ Y , θ i ) Z\sim P(Z | Y, \theta ^{i}) Z∼P(Z∣Y,θi) 的分布。

    2.2 E步驟:求期望 q ( θ , θ i ) = E z ∼ P ( Z ∣ Y , θ i ) ln ⁡ P ( Y , Z ∣ θ ) q (\theta, \theta ^{i}) = \mathop { \mathbb{ E } } _{z \sim P(Z | Y, \theta ^{i}) } \ln P(Y, Z | \theta) q(θ,θi)=Ez∼P(Z∣Y,θi)​lnP(Y,Z∣θ)

    2.3 M步驟:求期望最大值時的 θ \theta θ, 即 θ i + 1 = arg max ⁡ θ q ( θ , θ i ) \theta ^{i + 1} = \argmax \limits _{\theta} q(\theta, \theta ^{i}) θi+1=θargmax​q(θ,θi)

  3. 若滿足某條件,則結束算法。

EM算法的證明

命題

EM算法裡的 θ 0 , θ 1 , θ 2 , ⋯ \theta ^{0}, \theta ^{1}, \theta ^{2}, \cdots θ0,θ1,θ2,⋯,滿足 L ( θ i + 1 ) ≥ L ( θ i ) L (\theta ^{i + 1} ) \ge L (\theta ^{i} ) L(θi+1)≥L(θi)。

證明

L ( θ ) = ln ⁡ P ( Y ∣ θ ) L (\theta) = \ln P(Y | \theta) L(θ)=lnP(Y∣θ)

= ln ⁡ ( ∑ z P ( Y , Z ∣ θ ) ) = \ln \left ( \sum \limits _{z} P(Y, Z | \theta) \right ) =ln(z∑​P(Y,Z∣θ))

= ln ⁡ ( ∑ z P ( Z ∣ Y , θ i ) ⋅ P ( Y , Z ∣ θ ) P ( Z ∣ Y , θ i ) ) = \ln \left ( \sum \limits _{z} P(Z | Y, \theta ^{i}) \cdot \dfrac { P(Y, Z | \theta) } { P(Z | Y, \theta ^{i}) } \right ) =ln(z∑​P(Z∣Y,θi)⋅P(Z∣Y,θi)P(Y,Z∣θ)​)

≥ ∑ z [ P ( Z ∣ Y , θ i ) ⋅ ln ⁡ ( P ( Y , Z ∣ θ ) P ( Z ∣ Y , θ i ) ) ] \ge \sum \limits _{z} \left [ P(Z | Y, \theta ^{i}) \cdot \ln \left (\dfrac { P(Y, Z | \theta) } { P(Z | Y, \theta ^{i}) } \right ) \right ] ≥z∑​[P(Z∣Y,θi)⋅ln(P(Z∣Y,θi)P(Y,Z∣θ)​)]

令 f ( θ , θ i ) = ∑ z [ P ( Z ∣ Y , θ i ) ⋅ ln ⁡ ( P ( Y , Z ∣ θ ) P ( Z ∣ Y , θ i ) ) ] f(\theta, \theta ^{i}) = \sum \limits _{z} \left [ P(Z | Y, \theta ^{i}) \cdot \ln \left (\dfrac { P(Y, Z | \theta) } { P(Z | Y, \theta ^{i}) } \right ) \right ] f(θ,θi)=z∑​[P(Z∣Y,θi)⋅ln(P(Z∣Y,θi)P(Y,Z∣θ)​)]

則 f ( θ i , θ i ) = ∑ z [ P ( Z ∣ Y , θ i ) ⋅ ln ⁡ ( P ( Y , Z ∣ θ i ) P ( Z ∣ Y , θ i ) ) ] f(\theta^{i}, \theta ^{i}) = \sum \limits _{z} \left [ P(Z | Y, \theta ^{i}) \cdot \ln \left (\dfrac { P(Y, Z | \theta ^{i}) } { P(Z | Y, \theta ^{i}) } \right ) \right ] f(θi,θi)=z∑​[P(Z∣Y,θi)⋅ln(P(Z∣Y,θi)P(Y,Z∣θi)​)]

= ∑ z P ( Z ∣ Y , θ i ) ⋅ ln ⁡ P ( Y ∣ θ i ) = \sum \limits _{z} P(Z | Y, \theta ^{i}) \cdot \ln P(Y | \theta ^{i}) =z∑​P(Z∣Y,θi)⋅lnP(Y∣θi)

= ln ⁡ P ( Y ∣ θ i ) = \ln P(Y | \theta ^{i}) =lnP(Y∣θi)

= L ( θ i ) = L (\theta ^{i}) =L(θi)

且 L ( θ ) ≥ f ( θ , θ i ) L (\theta) \ge f(\theta, \theta ^{i}) L(θ)≥f(θ,θi)

令 θ i + 1 = arg max ⁡ θ f ( θ , θ i ) \theta ^{i + 1} = \argmax \limits _{\theta} f(\theta, \theta ^{i}) θi+1=θargmax​f(θ,θi)

則 L ( θ i + 1 ) ≥ f ( θ i + 1 , θ i ) = max ⁡ θ f ( θ , θ i ) ≥ f ( θ i , θ i ) = L ( θ i ) L (\theta ^{i + 1} ) \ge f(\theta ^{i + 1} , \theta ^{i}) = \max \limits _{\theta} f(\theta, \theta ^{i}) \ge f(\theta^{i}, \theta ^{i}) = L (\theta ^{i}) L(θi+1)≥f(θi+1,θi)=θmax​f(θ,θi)≥f(θi,θi)=L(θi)

由于 f ( θ , θ i ) = ∑ z [ P ( Z ∣ Y , θ i ) ⋅ ln ⁡ ( P ( Y , Z ∣ θ ) P ( Z ∣ Y , θ i ) ) ] f(\theta, \theta ^{i}) = \sum \limits _{z} \left [ P(Z | Y, \theta ^{i}) \cdot \ln \left (\dfrac { P(Y, Z | \theta) } { P(Z | Y, \theta ^{i}) } \right ) \right ] f(θ,θi)=z∑​[P(Z∣Y,θi)⋅ln(P(Z∣Y,θi)P(Y,Z∣θ)​)]

= ∑ z ( P ( Z ∣ Y , θ i ) ⋅ ln ⁡ P ( Y , Z ∣ θ ) ) − ∑ z ( P ( Z ∣ Y , θ i ) ⋅ ln ⁡ P ( Z ∣ Y , θ i ) ) = \sum \limits _{z} \left ( P(Z | Y, \theta ^{i}) \cdot \ln P(Y, Z | \theta) \right ) - \sum \limits _{z} \left ( P(Z | Y, \theta ^{i}) \cdot \ln P(Z | Y, \theta ^{i}) \right ) =z∑​(P(Z∣Y,θi)⋅lnP(Y,Z∣θ))−z∑​(P(Z∣Y,θi)⋅lnP(Z∣Y,θi))

是以 θ i + 1 = arg max ⁡ θ f ( θ , θ i ) = arg max ⁡ θ ∑ z ( P ( Z ∣ Y , θ i ) ⋅ ln ⁡ P ( Y , Z ∣ θ ) ) \theta ^{i + 1} = \argmax \limits _{\theta} f(\theta, \theta ^{i}) = \argmax \limits _{\theta} \sum \limits _{z} \left ( P(Z | Y, \theta ^{i}) \cdot \ln P(Y, Z | \theta) \right ) θi+1=θargmax​f(θ,θi)=θargmax​z∑​(P(Z∣Y,θi)⋅lnP(Y,Z∣θ))

= arg max ⁡ θ E z ∼ P ( Z ∣ Y , θ i ) ln ⁡ P ( Y , Z ∣ θ ) = \argmax \limits _{\theta} \mathop { \mathbb{ E } } _{z \sim P(Z | Y, \theta ^{i}) } \ln P(Y, Z | \theta) =θargmax​Ez∼P(Z∣Y,θi)​lnP(Y,Z∣θ)

令 q ( θ , θ i ) = E z ∼ P ( Z ∣ Y , θ i ) ln ⁡ P ( Y , Z ∣ θ ) q (\theta, \theta ^{i}) = \mathop { \mathbb{ E } } _{z \sim P(Z | Y, \theta ^{i}) } \ln P(Y, Z | \theta) q(θ,θi)=Ez∼P(Z∣Y,θi)​lnP(Y,Z∣θ)

則 θ i + 1 = arg max ⁡ θ q ( θ , θ i ) \theta ^{i + 1} = \argmax \limits _{\theta} q (\theta, \theta ^{i}) θi+1=θargmax​q(θ,θi)

且 L ( θ i + 1 ) ≥ L ( θ i ) L (\theta ^{i + 1} ) \ge L (\theta ^{i} ) L(θi+1)≥L(θi)

繼續閱讀