
Kullback-Leibler divergence (KL divergence) between two multivariate Gaussian distributions

The two Gaussian distributions over $x \in \mathbb{R}^n$ are:

$$p(x)=N(x;\mu,\Sigma)=\frac{1}{(2\pi)^{\frac{n}{2}}|\Sigma|^{\frac{1}{2}}}\exp\left\{-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right\}$$

$$q(x)=N(x;m,L)=\frac{1}{(2\pi)^{\frac{n}{2}}|L|^{\frac{1}{2}}}\exp\left\{-\frac{1}{2}(x-m)^TL^{-1}(x-m)\right\}$$

Properties of the matrix trace (tr):

$$\operatorname{tr}(\alpha A+\beta B)=\alpha\operatorname{tr}(A)+\beta\operatorname{tr}(B) \qquad \text{①}$$

$$\operatorname{tr}(A)=\operatorname{tr}(A^T) \qquad \text{②}$$

$$\operatorname{tr}(AB)=\operatorname{tr}(BA) \qquad \text{③}$$

$$\operatorname{tr}(ABC)=\operatorname{tr}(BCA)=\operatorname{tr}(CAB) \qquad \text{④ (follows from ③)}$$

An important formula: for a column vector $\lambda$, the quadratic form $\lambda^TA\lambda$ is a scalar and therefore equals its own trace, so by ③

$$\lambda^TA\lambda=\operatorname{tr}(\lambda^TA\lambda)=\operatorname{tr}(A\lambda\lambda^T) \qquad \text{⑤}$$
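The trace identities ④ and ⑤ can be sanity-checked on small integer matrices. The following is a minimal sketch in plain Python; the matrices `A`, `B`, `C` and the vector `lam` are arbitrary illustrative values, and `matmul` and `trace` are hypothetical helper names introduced only for this check.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def trace(a):
    """Sum of the diagonal entries of a square matrix."""
    return sum(a[i][i] for i in range(len(a)))

# Arbitrary illustrative 2x2 matrices (not taken from the derivation).
A = [[2, 1], [0, 3]]
B = [[1, 4], [5, 6]]
C = [[0, 1], [2, 7]]

# Property ④: the trace is invariant under cyclic permutations.
cyclic = (
    trace(matmul(matmul(A, B), C)),
    trace(matmul(matmul(B, C), A)),
    trace(matmul(matmul(C, A), B)),
)

# Property ⑤: lambda^T A lambda = tr(A lambda lambda^T).
lam = [[1], [2]]      # column vector lambda
lam_T = [[1, 2]]      # row vector lambda^T
quad = matmul(matmul(lam_T, A), lam)[0][0]
quad_as_trace = trace(matmul(A, matmul(lam, lam_T)))

print(cyclic, quad, quad_as_trace)
```

Integer entries keep the comparison exact, so the three cyclic traces and the two forms of the quadratic form can be compared with `==` rather than with a tolerance.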

Properties of the expectation $E$ and the covariance $\Sigma$ of a multivariate distribution with mean $\mu = E(x)$:

$$E(xx^T)=\Sigma+\mu\mu^T \qquad \text{⑥}$$

Proof: using $E(x)=\mu$,

$$\begin{aligned}
\Sigma&=E\big[(x-\mu)(x-\mu)^T\big] \\
&=E\big(xx^T-x\mu^T-\mu x^T+\mu\mu^T\big) \\
&=E\big(xx^T\big)-\mu\mu^T-\mu\mu^T+\mu\mu^T \\
&=E\big(xx^T\big)-\mu\mu^T
\end{aligned}$$

$$E\big(x^TAx\big)=\operatorname{tr}\big(A\Sigma\big)+\mu^TA\mu \qquad \text{⑦}$$

Proof:

$$\begin{aligned}
E\big(x^TAx\big)&=E\big[\operatorname{tr}(x^TAx)\big] \\
&=E\big[\operatorname{tr}(Axx^T)\big] \\
&=\operatorname{tr}\big[E(Axx^T)\big] \\
&=\operatorname{tr}\big[A\,E(xx^T)\big] \\
&=\operatorname{tr}\big[A(\Sigma+\mu\mu^T)\big] \\
&=\operatorname{tr}(A\Sigma)+\operatorname{tr}(A\mu\mu^T) \\
&=\operatorname{tr}(A\Sigma)+\operatorname{tr}(\mu^TA\mu) \\
&=\operatorname{tr}(A\Sigma)+\mu^TA\mu
\end{aligned}$$
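Identities ⑥ and ⑦ use only the mean and covariance, so they hold for any distribution with finite second moments, not just for Gaussians. The sketch below checks them exactly on a small discrete distribution with equal weights. The support `points` and the matrix `A` are made-up illustrative values, and `Fraction` is used so that both sides can be compared without rounding error.

```python
from fractions import Fraction

# Hypothetical 2-D distribution: five equally likely points.
points = [(0, 0), (2, 0), (0, 4), (3, 4), (1, 2)]
w = Fraction(1, len(points))
A = [[2, 1], [1, 3]]          # arbitrary illustrative matrix
dims = range(2)

# Mean vector mu and covariance matrix Sigma of the discrete distribution.
mu = [sum(w * x[i] for x in points) for i in dims]
sigma = [[sum(w * (x[i] - mu[i]) * (x[j] - mu[j]) for x in points)
          for j in dims] for i in dims]

# Identity ⑥: E(x x^T) = Sigma + mu mu^T.
second_moment = [[sum(w * x[i] * x[j] for x in points) for j in dims]
                 for i in dims]
sigma_plus_outer = [[sigma[i][j] + mu[i] * mu[j] for j in dims] for i in dims]

def quad(v, a):
    """Quadratic form v^T a v."""
    return sum(v[i] * a[i][j] * v[j] for i in dims for j in dims)

# Identity ⑦: E(x^T A x) = tr(A Sigma) + mu^T A mu.
lhs = sum(w * quad(x, A) for x in points)
tr_a_sigma = sum(A[i][j] * sigma[j][i] for i in dims for j in dims)
rhs = tr_a_sigma + quad(mu, A)

print(lhs, rhs)
```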

Definition of the KL divergence:

$$KL(p\|q)=E_p\left[\log\frac{p(x)}{q(x)}\right]$$
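Because the definition is an expectation under $p$, it can be estimated by drawing samples from $p$ and averaging $\log p(x) - \log q(x)$. The sketch below does this for a one-dimensional example, assuming illustrative parameters $p = N(0, 1)$ and $q = N(1, 2^2)$. The function names are hypothetical, and the standard one-dimensional closed form $\log\frac{\sigma_q}{\sigma_p} + \frac{\sigma_p^2 + (\mu_p-\mu_q)^2}{2\sigma_q^2} - \frac{1}{2}$ is included only as a reference value.

```python
import math
import random

def log_normal_pdf(x, mean, std):
    """Log-density of a one-dimensional Gaussian N(mean, std^2)."""
    return (-0.5 * math.log(2 * math.pi) - math.log(std)
            - 0.5 * ((x - mean) / std) ** 2)

def kl_monte_carlo(mu_p, std_p, mu_q, std_q, num_samples=100_000, seed=0):
    """Estimate KL(p||q) = E_p[log p(x) - log q(x)] by sampling x from p."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        x = rng.gauss(mu_p, std_p)
        total += (log_normal_pdf(x, mu_p, std_p)
                  - log_normal_pdf(x, mu_q, std_q))
    return total / num_samples

def kl_closed_form_1d(mu_p, std_p, mu_q, std_q):
    """Reference value: standard closed form for two 1-D Gaussians."""
    return (math.log(std_q / std_p)
            + (std_p ** 2 + (mu_p - mu_q) ** 2) / (2 * std_q ** 2)
            - 0.5)

estimate = kl_monte_carlo(0.0, 1.0, 1.0, 2.0)
reference = kl_closed_form_1d(0.0, 1.0, 1.0, 2.0)
print(estimate, reference)
```

The estimate fluctuates with the seed and the sample count, so it should agree with the reference only up to Monte Carlo error.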

The ratio of the two densities is

$$\begin{aligned}
\frac{p(x)}{q(x)}&=\frac{\frac{1}{(2\pi)^{\frac{n}{2}}|\Sigma|^{\frac{1}{2}}}\exp\left\{-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right\}}{\frac{1}{(2\pi)^{\frac{n}{2}}|L|^{\frac{1}{2}}}\exp\left\{-\frac{1}{2}(x-m)^TL^{-1}(x-m)\right\}} \\
&=\left(\frac{|L|}{|\Sigma|}\right)^{\frac{1}{2}}\exp\left\{-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)-\left[-\frac{1}{2}(x-m)^TL^{-1}(x-m)\right]\right\} \\
&=\left(\frac{|L|}{|\Sigma|}\right)^{\frac{1}{2}}\exp\left\{\frac{1}{2}\left[(x-m)^TL^{-1}(x-m)-(x-\mu)^T\Sigma^{-1}(x-\mu)\right]\right\}
\end{aligned}$$

Taking the logarithm:

$$\begin{aligned}
\log\frac{p(x)}{q(x)}&=\log\left(\left(\frac{|L|}{|\Sigma|}\right)^{\frac{1}{2}}\exp\left\{\frac{1}{2}\left[(x-m)^TL^{-1}(x-m)-(x-\mu)^T\Sigma^{-1}(x-\mu)\right]\right\}\right) \\
&=\frac{1}{2}\log\frac{|L|}{|\Sigma|}+\frac{1}{2}\left[(x-m)^TL^{-1}(x-m)-(x-\mu)^T\Sigma^{-1}(x-\mu)\right]
\end{aligned}$$

Taking the expectation under $p$, and using $E_p(x)=\mu$ and $E_p\big[(x-\mu)(x-\mu)^T\big]=\Sigma$:

$$\begin{aligned}
E_p\left[\log\frac{p(x)}{q(x)}\right]
&=E_p\left(\frac{1}{2}\log\frac{|L|}{|\Sigma|}+\frac{1}{2}\left[(x-m)^TL^{-1}(x-m)-(x-\mu)^T\Sigma^{-1}(x-\mu)\right]\right) \\
&=\frac{1}{2}E_p\left(\log\frac{|L|}{|\Sigma|}\right)+\frac{1}{2}E_p\left((x-m)^TL^{-1}(x-m)-(x-\mu)^T\Sigma^{-1}(x-\mu)\right) \\
&=\frac{1}{2}\log\frac{|L|}{|\Sigma|}+\frac{1}{2}E_p\left(\operatorname{tr}\left[L^{-1}(x-m)(x-m)^T\right]-\operatorname{tr}\left[\Sigma^{-1}(x-\mu)(x-\mu)^T\right]\right) && \text{(property ⑤)} \\
&=\frac{1}{2}\log\frac{|L|}{|\Sigma|}+\frac{1}{2}\operatorname{tr}\left(E_p\left[L^{-1}(x-m)(x-m)^T\right]\right)-\frac{1}{2}\operatorname{tr}\left(E_p\left[\Sigma^{-1}(x-\mu)(x-\mu)^T\right]\right) && \text{(property ①)} \\
&=\frac{1}{2}\log\frac{|L|}{|\Sigma|}+\frac{1}{2}\operatorname{tr}\left(L^{-1}E_p\left[xx^T-mx^T-xm^T+mm^T\right]\right)-\frac{1}{2}\operatorname{tr}\left(\Sigma^{-1}E_p\left[(x-\mu)(x-\mu)^T\right]\right) \\
&=\frac{1}{2}\log\frac{|L|}{|\Sigma|}+\frac{1}{2}\operatorname{tr}\left(L^{-1}\left[\Sigma+\mu\mu^T-m\mu^T-\mu m^T+mm^T\right]\right)-\frac{1}{2}\operatorname{tr}\left(\Sigma^{-1}\Sigma\right) && \text{(property ⑥)} \\
&=\frac{1}{2}\log\frac{|L|}{|\Sigma|}+\frac{1}{2}\operatorname{tr}\left(L^{-1}\left[\Sigma+(\mu-m)(\mu-m)^T\right]\right)-\frac{n}{2} \\
&=\frac{1}{2}\left\{\log\frac{|L|}{|\Sigma|}-n+\operatorname{tr}\left(L^{-1}\Sigma\right)+\operatorname{tr}\left(L^{-1}(\mu-m)(\mu-m)^T\right)\right\} \\
&=\frac{1}{2}\left\{\log\frac{|L|}{|\Sigma|}-n+\operatorname{tr}\left(L^{-1}\Sigma\right)+(\mu-m)^TL^{-1}(\mu-m)\right\} && \text{(property ⑤)}
\end{aligned}$$

The expectation removes every dependence on $x$, so the result is a function of the parameters only:

$$KL(p\|q)=\frac{1}{2}\left\{\log\frac{|L|}{|\Sigma|}-n+\operatorname{tr}\left(L^{-1}\Sigma\right)+(\mu-m)^TL^{-1}(\mu-m)\right\}$$
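The closed form $\frac{1}{2}\{\log\frac{|L|}{|\Sigma|} - n + \operatorname{tr}(L^{-1}\Sigma) + (\mu-m)^TL^{-1}(\mu-m)\}$, with the quadratic term evaluated at the mean $\mu$ of $p$, can be turned into a short function. The following is one possible sketch in plain Python. It assumes both covariance matrices are symmetric positive definite and uses Cholesky factors $\Sigma = C_pC_p^T$ and $L = C_qC_q^T$, so that $\operatorname{tr}(L^{-1}\Sigma) = \|C_q^{-1}C_p\|_F^2$ and no explicit inverse is formed. The function names are hypothetical.

```python
import math

def cholesky(a):
    """Lower-triangular c with c c^T = a (a symmetric positive definite)."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(c[i][k] * c[j][k] for k in range(j))
            c[i][j] = math.sqrt(s) if i == j else s / c[j][j]
    return c

def solve_lower(c, b):
    """Solve c y = b by forward substitution (c lower-triangular)."""
    y = []
    for i, row in enumerate(c):
        y.append((b[i] - sum(row[k] * y[k] for k in range(i))) / row[i])
    return y

def kl_gaussians(mu, sigma, m, l):
    """KL( N(mu, sigma) || N(m, l) ) from the closed-form expression."""
    n = len(mu)
    cp, cq = cholesky(sigma), cholesky(l)
    # log(|L| / |Sigma|) from the diagonals of the Cholesky factors.
    log_det_ratio = 2.0 * sum(math.log(cq[i][i]) - math.log(cp[i][i])
                              for i in range(n))
    # tr(L^{-1} Sigma) = squared Frobenius norm of Cq^{-1} Cp.
    trace_term = 0.0
    for j in range(n):
        column = solve_lower(cq, [cp[i][j] for i in range(n)])
        trace_term += sum(v * v for v in column)
    # (mu - m)^T L^{-1} (mu - m) = squared norm of Cq^{-1} (mu - m).
    diff = solve_lower(cq, [mu[i] - m[i] for i in range(n)])
    mahalanobis = sum(v * v for v in diff)
    return 0.5 * (log_det_ratio - n + trace_term + mahalanobis)

# Illustrative parameters (assumed values, not from the derivation).
kl_same = kl_gaussians([0.0, 0.0], [[2.0, 0.5], [0.5, 1.0]],
                       [0.0, 0.0], [[2.0, 0.5], [0.5, 1.0]])
kl_diag = kl_gaussians([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]],
                       [1.0, 0.0], [[4.0, 0.0], [0.0, 1.0]])
kl_full = kl_gaussians([0.0, 0.0], [[2.0, 0.5], [0.5, 1.0]],
                       [1.0, -1.0], [[1.0, 0.2], [0.2, 1.5]])
print(kl_same, kl_diag, kl_full)
```

For the diagonal example the formula can be evaluated by hand as $\frac{1}{2}\{\log 4 - 2 + 1.25 + 0.25\} = \log 2 - 0.25$, which gives a quick check on the implementation. With NumPy available, the same quantities could be computed with library linear-algebra routines instead of the hand-written helpers.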
