
Derivative of the Softmax Regression Loss Function

The cost function for softmax regression:

$$J(\theta) = -\frac{1}{m}\left[\sum_{i=1}^{m}\sum_{j=1}^{k}1\{y^{(i)}=j\}\log \frac{e^{\theta_j^T X^{(i)}}}{\sum_{l=1}^{k}e^{\theta_l^T X^{(i)}}}\right]$$

Here $1\{y^{(i)}=j\}$ is the indicator function: $1\{y^{(i)}=j\}=1$ when sample $i$ belongs to class $j$, and $1\{y^{(i)}=j\}=0$ otherwise.
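As a concrete reference, here is a minimal NumPy sketch of this cost function. The name `softmax_cost` and the data layout (`Theta` as a $k \times n$ matrix with one row $\theta_j$ per class, `X` as an $m \times n$ matrix with one sample per row, `y` as integer labels in $0..k-1$) are illustrative assumptions, not from the original text:

```python
import numpy as np

def softmax_cost(Theta, X, y):
    """J(theta): average negative log-probability of the true class.

    Theta: (k, n) matrix, row j is theta_j; X: (m, n) samples as rows;
    y: (m,) integer class labels in 0..k-1 (assumed layout).
    """
    m = X.shape[0]
    scores = X @ Theta.T                          # scores[i, j] = theta_j^T X^(i)
    scores -= scores.max(axis=1, keepdims=True)   # stability shift; cancels in the ratio
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    # the indicator 1{y^(i)=j} keeps only the log-probability of the true class
    return -log_probs[np.arange(m), y].sum() / m
```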

Taking the derivative of the loss function:

$$
\begin{aligned}
\nabla_{\theta_j}J(\theta)
&=-\frac{1}{m}\sum_{i=1}^{m}\nabla_{\theta_j}\left[\sum_{j'=1}^{k}1\{y^{(i)}=j'\}\log \frac{e^{\theta_{j'}^T X^{(i)}}}{\sum_{l=1}^{k}e^{\theta_l^T X^{(i)}}}\right]\\
&=-\frac{1}{m}\sum_{i=1}^{m}\nabla_{\theta_j}\left[\sum_{j'=1}^{k}1\{y^{(i)}=j'\}\,\theta_{j'}^T X^{(i)}-\log\sum_{l=1}^{k}e^{\theta_l^T X^{(i)}}\right]\\
&=-\frac{1}{m}\sum_{i=1}^{m}\left[1\{y^{(i)}=j\}\cdot X^{(i)}-\frac{e^{\theta_j^T X^{(i)}}}{\sum_{l=1}^{k}e^{\theta_l^T X^{(i)}}}\cdot X^{(i)}\right]\\
&=-\frac{1}{m}\sum_{i=1}^{m}\left[\left(1\{y^{(i)}=j\}-\frac{e^{\theta_j^T X^{(i)}}}{\sum_{l=1}^{k}e^{\theta_l^T X^{(i)}}}\right)\cdot X^{(i)}\right]
\end{aligned}
$$

In the first step, the log of the quotient is split using $\log\frac{a}{b}=\log a-\log b$ (here $\log$ is taken to be $\ln$); since exactly one indicator is nonzero for each sample, $\sum_{j'=1}^{k}1\{y^{(i)}=j'\}=1$, so the $\log\sum_{l=1}^{k}e^{\theta_l^T X^{(i)}}$ term can be pulled out of the sum.

The second step uses the derivative of the logarithm:

$$(\ln x)' = \frac{1}{x}$$

and the derivative of the exponential:

$$(e^x)'=e^x$$

Note that differentiating with respect to $\theta_j$ targets a single component $j$ of $\theta$: in $\sum_{j'=1}^{k}1\{y^{(i)}=j'\}\,\theta_{j'}^T X^{(i)}$ only the $j'=j$ term survives (the other $\theta_{j'}$ terms differentiate to $0$), while the $\log\sum_{l=1}^{k}e^{\theta_l^T X^{(i)}}$ term always contributes, because the sum over $l$ contains $\theta_j$ regardless of the label.
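Combining these two derivative facts with the chain rule, the log-partition term differentiates to

$$\nabla_{\theta_j}\log\sum_{l=1}^{k}e^{\theta_l^T X^{(i)}}=\frac{1}{\sum_{l=1}^{k}e^{\theta_l^T X^{(i)}}}\cdot e^{\theta_j^T X^{(i)}}\cdot X^{(i)},$$

which is exactly the second term inside the brackets in the derivation above.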

If this step is still hard to follow, see the worked differentiation examples in: Derivative of the Softmax Regression Loss Function.

For each sample, the estimated probability that it belongs to class $j$ is:

$$P(y^{(i)}=j\mid X^{(i)};\theta)=\frac{e^{\theta_j^T X^{(i)}}}{\sum_{l=1}^{k}e^{\theta_l^T X^{(i)}}}$$
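In code, these class probabilities can be computed for all samples at once. This sketch uses the same assumed `Theta`/`X` layout as above, with a max shift for numerical stability (the shift cancels in the ratio):

```python
import numpy as np

def softmax_probs(Theta, X):
    """Return the (m, k) matrix P with P[i, j] = P(y^(i)=j | X^(i); theta)."""
    scores = X @ Theta.T                          # scores[i, j] = theta_j^T X^(i)
    scores -= scores.max(axis=1, keepdims=True)   # stability shift, cancels in the ratio
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)
```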

So the final result is:

$$
\begin{aligned}
\nabla_{\theta_j}J(\theta)&=-\frac{1}{m}\sum_{i=1}^{m}\left[\left(1\{y^{(i)}=j\}-\frac{e^{\theta_j^T X^{(i)}}}{\sum_{l=1}^{k}e^{\theta_l^T X^{(i)}}}\right)\cdot X^{(i)}\right]\\
&=-\frac{1}{m}\sum_{i=1}^{m}\left[X^{(i)}\cdot\left(1\{y^{(i)}=j\}-P(y^{(i)}=j\mid X^{(i)};\theta)\right)\right]
\end{aligned}
$$
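This final formula translates directly into a vectorized gradient, and the derivation can be sanity-checked numerically against the `softmax_cost` sketch above (the random test setup here is purely illustrative):

```python
import numpy as np

def softmax_grad(Theta, X, y):
    """(k, n) gradient: row j is -1/m * sum_i X^(i) * (1{y^(i)=j} - P(y^(i)=j|X^(i);theta))."""
    m = X.shape[0]
    P = softmax_probs(Theta, X)        # (m, k) class probabilities, from the sketch above
    Y = np.zeros_like(P)
    Y[np.arange(m), y] = 1.0           # one-hot rows encode the indicator 1{y^(i)=j}
    return -(Y - P).T @ X / m          # row j stacks the per-sample terms for class j

# finite-difference check of one entry against the analytic gradient
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
y = rng.integers(0, 4, size=5)
Theta = rng.normal(size=(4, 3))
eps = 1e-6
Tp, Tm = Theta.copy(), Theta.copy()
Tp[2, 1] += eps
Tm[2, 1] -= eps
numeric = (softmax_cost(Tp, X, y) - softmax_cost(Tm, X, y)) / (2 * eps)
assert abs(numeric - softmax_grad(Theta, X, y)[2, 1]) < 1e-6
```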

Note that $\theta_j$ here denotes a vector (the parameter vector for class $j$), so $\nabla_{\theta_j}J(\theta)$ is a vector as well.

The parameters can then be updated with the gradient descent rule:

$$\theta_j := \theta_j-\alpha\nabla_{\theta_j}J(\theta)$$
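Iterating this update gives a bare-bones training loop; the step size `alpha` and iteration count below are arbitrary illustrative choices, reusing `softmax_grad` and the toy `X`, `y`, `Theta` from the check above:

```python
alpha = 0.5
for _ in range(200):
    Theta = Theta - alpha * softmax_grad(Theta, X, y)   # theta_j := theta_j - alpha * grad_j
```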

For a more detailed treatment of softmax regression, see the UFLDL tutorial.
