Problem statement
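This post does not explain the FM (factorization machine) model itself; it only works through the model's theoretical derivation in vector and matrix form. Most derivations found online proceed element by element, which makes them bloated. Here the partial derivatives are derived using matrices and vectors instead. In a polynomial model, the combination of features $x_i$ and $x_j$ is represented by $x_i x_j$. For simplicity we consider the second-order polynomial model. The model expression is as follows (only the second-order part is discussed):
$$L = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} w_{ij} \cdot x_i \cdot x_j$$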
Let $W = V^T \cdot V$, where $V_{[k\times n]} = [V_1, V_2, \cdots, V_n]$ and each $V_i$ is a column vector of shape $[k, 1]$. Then $w_{ij}$ is the $(i,j)$ element of the matrix $W_{[n \times n]}$. Also let the column vector $\xi = [x_1, \cdots, x_n]^T$. Our goal is to find the partial derivative of $L$ with respect to $V$.
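Note that the $w_{ij}$ appearing in $L$ are the elements of the upper triangle of $W$, excluding the diagonal. Because $W$ is symmetric, the full quadratic form $\xi^T W \xi$ counts every off-diagonal pair twice and also contains the diagonal terms $w_{ii} x_i^2 = x_i V_i^T V_i x_i$. Subtracting the diagonal terms and halving gives:
$$\therefore L = \frac{1}{2}\xi^T V^T V \xi - \frac{1}{2} \sum_{i=1}^{n} x_i V_i^T V_i x_i$$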
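The identity above can be checked numerically. The following is a minimal sketch, assuming NumPy and arbitrary illustrative sizes `k = 3`, `n = 5` with random data; none of these values come from the original post.

```python
import numpy as np

rng = np.random.default_rng(0)
k, n = 3, 5                      # illustrative sizes (assumed)
V = rng.normal(size=(k, n))      # columns are V_1, ..., V_n
xi = rng.normal(size=n)          # feature vector xi

W = V.T @ V                      # w_ij = V_i^T V_j

# Element-wise definition: sum over the strict upper triangle of W.
L_pairs = sum(W[i, j] * xi[i] * xi[j]
              for i in range(n - 1) for j in range(i + 1, n))

# Matrix form: 1/2 xi^T V^T V xi - 1/2 sum_i x_i V_i^T V_i x_i.
L_matrix = 0.5 * xi @ W @ xi - 0.5 * np.sum(np.diag(W) * xi**2)

print(np.isclose(L_pairs, L_matrix))
```

The two values should agree up to floating-point rounding, since the matrix form is only a regrouping of the same terms.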
Derivation
Let:
$$\left\{ \begin{aligned} L_1 &= \frac{1}{2}\xi^T V^T V \xi\\ L_2 &= \frac{1}{2} \sum_{i=1}^{n} x_i V_i^T V_i x_i \end{aligned} \right.$$
Then:
$$\begin{aligned} \frac{\partial L_1}{\partial V} &= \frac{\partial L_1}{\partial (V\xi)} \cdot \frac{\partial V\xi}{\partial V}\\ &= V\xi \cdot \xi^T \end{aligned}$$
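One way to justify this chain-rule step is with differentials. This is a standard argument sketched here for convenience, not part of the original derivation; it assumes the convention $dL = \mathrm{Tr}\big((\partial L/\partial V)^T\, dV\big)$.

$$
\begin{aligned}
dL_1 &= \tfrac{1}{2}\, d\big((V\xi)^T (V\xi)\big) = (V\xi)^T\, dV\, \xi \\
&= \mathrm{Tr}\big(\xi\, (V\xi)^T\, dV\big) = \mathrm{Tr}\Big(\big(V\xi\, \xi^T\big)^T dV\Big)
\end{aligned}
$$

Reading off the factor in front of $dV$ gives $\partial L_1 / \partial V = V\xi\, \xi^T$, a $k \times n$ matrix with the same shape as $V$.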
Meanwhile:
$$\frac{\partial L_2}{\partial V_i} = V_i x_i^2$$
$$\therefore \frac{\partial L_2}{\partial V} = [V_1 x_1^2, \cdots, V_i x_i^2, \cdots, V_n x_n^2]$$
The final result is:
$$\therefore \frac{\partial L}{\partial V} = V\xi \cdot \xi^T - [V_1 x_1^2, \cdots, V_i x_i^2, \cdots, V_n x_n^2]$$
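A result like this can be sanity-checked against finite differences. The sketch below assumes NumPy; the function names `fm_pairwise`, `fm_grad`, and `numeric_grad` and the sizes are hypothetical, chosen only for illustration.

```python
import numpy as np

def fm_pairwise(V, xi):
    """L = 1/2 * (||V xi||^2 - sum_i x_i^2 ||V_i||^2)."""
    Vxi = V @ xi
    return 0.5 * (Vxi @ Vxi - np.sum((V * xi) ** 2))

def fm_grad(V, xi):
    """Analytic gradient V xi xi^T - [V_1 x_1^2, ..., V_n x_n^2], shape (k, n)."""
    return np.outer(V @ xi, xi) - V * xi**2

def numeric_grad(f, V, eps=1e-6):
    """Central finite differences, perturbing one entry of V at a time."""
    G = np.zeros_like(V)
    for idx in np.ndindex(*V.shape):
        E = np.zeros_like(V)
        E[idx] = eps
        G[idx] = (f(V + E) - f(V - E)) / (2 * eps)
    return G

rng = np.random.default_rng(1)
V = rng.normal(size=(3, 5))      # illustrative k = 3, n = 5 (assumed)
xi = rng.normal(size=5)

analytic = fm_grad(V, xi)
numeric = numeric_grad(lambda M: fm_pairwise(M, xi), V)
max_err = np.max(np.abs(analytic - numeric))
print(max_err)
```

Because $L$ is quadratic in $V$, the central difference has no truncation error, so any remaining discrepancy should come from floating-point rounding alone.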
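Summary
The whole derivation uses only two pieces of knowledge: the derivative of a scalar with respect to a vector and the derivative of a scalar with respect to a matrix. This post does not derive either of them; see the chain rule for partial derivatives.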
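Follow-up
The above is the simplest derivation I could manage, but it still feels unsatisfying: when computing $\partial L_2 / \partial V$, I fell back on splitting the matrix $V$ into its columns, differentiating with respect to each $V_i$, and then reassembling the results into the derivative with respect to $V$. Below, the derivative is taken with a purely matrix-based method. This actually makes the problem more complicated, but the aim is to verify that differentiating directly with respect to the matrix is feasible.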
Construct:
$$X = \begin{bmatrix} x_1 & 0 & \cdots & 0 \\ 0 & x_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & x_n \end{bmatrix}_{n\times n}$$
$$\begin{aligned} \therefore L_2 &= \frac{1}{2} \sum_{i=1}^{n} x_i V_i^T V_i x_i \\ &= \frac{1}{2}\mathrm{Tr}(X V^T V X) \qquad \because X = X^T\\ &= \frac{1}{2}\mathrm{Tr}(X^T V^T V X) \end{aligned}$$
Let $Z = V \cdot X$, so that $L_2 = \frac{1}{2} \cdot \mathrm{Tr}(Z^T Z)$. Then:
$$\frac{\partial L_2}{\partial Z} = Z$$
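The trace form of $L_2$ can be compared with the column-wise sum numerically. This is a minimal sketch assuming NumPy and random illustrative data; the sizes are not from the original post.

```python
import numpy as np

rng = np.random.default_rng(2)
k, n = 3, 5                      # illustrative sizes (assumed)
V = rng.normal(size=(k, n))
xi = rng.normal(size=n)
X = np.diag(xi)                  # diagonal matrix with x_1, ..., x_n

# Column-wise definition: 1/2 * sum_i x_i V_i^T V_i x_i.
L2_sum = 0.5 * sum(xi[i] * (V[:, i] @ V[:, i]) * xi[i] for i in range(n))

# Trace form with Z = V X: 1/2 * Tr(Z^T Z).
Z = V @ X
L2_trace = 0.5 * np.trace(Z.T @ Z)

print(np.isclose(L2_sum, L2_trace))
```

Here `Z.T @ Z` is an $n \times n$ matrix whose $i$-th diagonal entry is $x_i^2 V_i^T V_i$, which is why the trace recovers the sum.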
Therefore:
$$\begin{aligned} \frac{\partial L_2}{\partial V} &= \frac{\partial L_2}{\partial Z} \cdot \frac{\partial Z}{\partial V}\\ &= Z \cdot X^T \qquad \because X = X^T\\ &= Z \cdot X \\ &= V \cdot X^2 \end{aligned}$$
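The order of the factors in $Z \cdot X^T$ can be justified with differentials. The following is a sketch of the standard argument, assuming the convention $dL = \mathrm{Tr}\big((\partial L/\partial V)^T\, dV\big)$; it is not taken from the original post.

$$
\begin{aligned}
dL_2 &= \tfrac{1}{2}\,\mathrm{Tr}\big(dZ^T Z + Z^T dZ\big) = \mathrm{Tr}\big(Z^T dZ\big), \qquad dZ = dV \cdot X \\
&= \mathrm{Tr}\big(Z^T\, dV\, X\big) = \mathrm{Tr}\big(X Z^T\, dV\big) = \mathrm{Tr}\Big(\big(Z X^T\big)^T dV\Big)
\end{aligned}
$$

The first line gives $\partial L_2 / \partial Z = Z$, and the second gives $\partial L_2 / \partial V = Z X^T$.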
This is very close to the conclusion. $X$ is a diagonal matrix whose diagonal entries are the variables $x_i$ of $\xi$. $X^2$ is also diagonal, and right-multiplying $V$ by it scales the columns of $V$: column $V_i$ is multiplied by $x_i^2$.
$$V \cdot X^2 = [V_1, V_2, \cdots, V_n] \cdot X^2 = [V_1 x_1^2, \cdots, V_i x_i^2, \cdots, V_n x_n^2]$$
So, written in matrix form:
$$\therefore \frac{\partial L}{\partial V} = V\xi \cdot \xi^T - V \cdot X^2$$
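In code, the matrix form and the column-wise form should produce the same array. The sketch below assumes NumPy and random illustrative data; the variable names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
k, n = 4, 6                      # illustrative sizes (assumed)
V = rng.normal(size=(k, n))
xi = rng.normal(size=n)
X = np.diag(xi)

# Matrix form: V xi xi^T - V X^2, with X built explicitly.
grad_matrix = np.outer(V @ xi, xi) - V @ X @ X

# Column-wise form: broadcasting scales column i of V by x_i^2.
grad_columns = np.outer(V @ xi, xi) - V * xi**2

same = np.allclose(grad_matrix, grad_columns)
print(same)
```

The broadcast version avoids materializing the $n \times n$ matrix $X$, which may matter when $n$ is large; the explicit version mirrors the formula more directly.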