Hinge loss function
$$
[z]_{+} = \begin{cases} z, & z > 0 \\ 0, & z \le 0 \end{cases}
$$
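The positive-part operator above can be written in a couple of lines. This is only a minimal sketch; the function name `positive_part` is an illustrative choice, not something from the original post.

```python
def positive_part(z: float) -> float:
    """Return [z]_+ : z itself when z > 0, otherwise 0."""
    return z if z > 0 else 0.0
```

For example, `positive_part(-2.0)` gives `0.0`, while `positive_part(3.5)` gives `3.5`.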
The SVM objective function
For a linear SVM, the primal optimization problem is:
$$
\begin{aligned}
\min_{w,b,\xi}\quad & \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{N}\xi_i \\
\text{s.t.}\quad & y_i(w\cdot x_i+b)\ge 1-\xi_i, \quad i=1,2,\dots,N \\
& \xi_i\ge 0, \quad i=1,2,\dots,N
\end{aligned}
$$
In the primal problem, when $y_i(w\cdot x_i+b)\ge 1$ the data point lies on the correct side of the margin boundary, so the optimal slack is $\xi_i = 0$. When $y_i(w\cdot x_i+b)\lt 1$, the optimal slack is $\xi_i = 1 - y_i(w\cdot x_i+b)$. Both cases are covered by the single expression $\xi_i = [1 - y_i(w\cdot x_i+b)]_{+}$.
Substituting this expression for $\xi_i$ into the objective and dividing by the constant $C > 0$, the SVM optimization problem becomes:
$$
\min_{w,b}\ \sum_{i=1}^{N}\left[1-y_i(w\cdot x_i+b)\right]_{+} + \lambda \|w\|^2
$$
where $\lambda = (2C)^{-1}$.
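The equivalence between the two SVM forms can be checked numerically. The sketch below assumes the slack variables take their optimal values $\xi_i = [1 - y_i(w\cdot x_i+b)]_+$; all function names and the toy data are hypothetical and chosen only for illustration. With $\lambda = (2C)^{-1}$, the primal objective equals $C$ times the hinge-form objective, so both have the same minimizer.

```python
import math


def positive_part(z):
    """[z]_+ : z when z > 0, otherwise 0."""
    return z if z > 0 else 0.0


def score(w, b, x):
    """Linear score w . x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def primal_objective(w, b, X, y, C):
    """1/2 ||w||^2 + C * sum(xi_i), with each xi_i set to its optimal value."""
    slack = [positive_part(1.0 - yi * score(w, b, xi)) for xi, yi in zip(X, y)]
    return 0.5 * sum(wj * wj for wj in w) + C * sum(slack)


def hinge_objective(w, b, X, y, lam):
    """sum([1 - y_i (w . x_i + b)]_+) + lam * ||w||^2."""
    losses = [positive_part(1.0 - yi * score(w, b, xi)) for xi, yi in zip(X, y)]
    return sum(losses) + lam * sum(wj * wj for wj in w)


# Toy data (made up for this example): three points in 2-D with labels in {-1, +1}.
X = [[1.0, 2.0], [-1.0, -1.5], [0.5, -0.5]]
y = [1, -1, 1]
w, b, C = [0.4, -0.3], 0.1, 2.0

primal = primal_objective(w, b, X, y, C)
hinge_form = hinge_objective(w, b, X, y, lam=1.0 / (2.0 * C))

# The two objectives differ only by the constant factor C.
print(math.isclose(primal, C * hinge_form))
```

Because a positive constant factor does not change where the minimum is attained, either form can be handed to an optimizer.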
The logistic regression objective function
For logistic regression, the sigmoid function has the form
$$
f(a) = \frac{1}{1+\exp\{-(w\cdot x + b)\}} = \frac{1}{1+\exp(-a)}
$$
where $a = w\cdot x + b$.
Properties of the sigmoid function:
- Symmetry: $f(-a) = 1 - f(a)$
- Derivative: $\frac{\partial f}{\partial a} = f(1-f)$
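Both properties are easy to confirm numerically. The following is a minimal sketch using only the standard library; the helper names and the central-difference step `h` are assumptions made for this example.

```python
import math


def sigmoid(a):
    """Sigmoid 1 / (1 + exp(-a)), evaluated in a way that avoids overflow."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def numeric_derivative(func, a, h=1e-6):
    """Central-difference approximation of d func / d a."""
    return (func(a + h) - func(a - h)) / (2.0 * h)


for a in (-3.0, -0.5, 0.0, 1.2, 4.0):
    f = sigmoid(a)
    # Symmetry: f(-a) = 1 - f(a)
    symmetric = math.isclose(sigmoid(-a), 1.0 - f, abs_tol=1e-12)
    # Derivative: f'(a) = f(a) * (1 - f(a))
    derivative_ok = math.isclose(numeric_derivative(sigmoid, a), f * (1.0 - f), abs_tol=1e-7)
    print(a, symmetric, derivative_ok)
```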
The model parameters are estimated by maximum likelihood, so we first construct the likelihood function. The logistic regression model is $p(y=1|x)=\pi(x)$, $p(y=0|x)=1-\pi(x)$, so the likelihood is:
$$
Z = \prod_{i=1}^{N}[\pi(x_i)]^{y_i}[1-\pi(x_i)]^{1-y_i}
$$
Taking the negative logarithm of the likelihood gives the error function, which is the cross-entropy error function:
$$
L_1 = -\ln Z = -\sum_{i=1}^{N}\left[y_i\ln\pi(x_i)+(1-y_i)\ln(1-\pi(x_i))\right]
$$
Dropping the leading minus sign gives the log-likelihood. The optimization problem is unchanged as long as minimization is replaced by maximization, so minimizing $L_1$ is the same as maximizing:
$$
L_2 = \sum_{i=1}^{N}\left[y_i\ln\pi(x_i)+(1-y_i)\ln(1-\pi(x_i))\right]
$$
For logistic regression it is convenient to relabel the target variable from $y \in \{0,1\}$ to $y \in \{-1,1\}$ and rewrite the maximum-likelihood objective in terms of the new labels. We have $p(y=1|a) = f(a)$ and $p(y=-1|a) = 1-f(a) = f(-a)$, so by the symmetry of the sigmoid function:
$$
p(y|a) = f(ay) = \frac{1}{1+\exp(-ay)}
$$
Taking the negative logarithm of the likelihood built from this expression and adding a regularization term gives the error function:
$$
L = \sum_{i=1}^{N}\ln\left(1+\exp(-a_i y_i)\right)+\lambda \|w\|^2
$$
Substituting $a_i = w\cdot x_i + b$ back in:
$$
\min_{w,b}\ L = \sum_{i=1}^{N}\ln\left(1+\exp\left(-(w\cdot x_i+b)\,y_i\right)\right)+\lambda \|w\|^2
$$
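As a sanity check on the relabeling step, the per-sample cross-entropy with labels in {0, 1} should match the per-sample logistic loss with labels in {-1, +1}. The sketch below compares the two for a few scores; the function names and the mapping `y = 2 * t - 1` between the two label conventions are assumptions made for this example.

```python
import math


def sigmoid(a):
    """Sigmoid 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + math.exp(-a))


def cross_entropy(a, t):
    """Per-sample cross-entropy for a label t in {0, 1} and score a."""
    p = sigmoid(a)
    return -(t * math.log(p) + (1 - t) * math.log(1.0 - p))


def logistic_loss(a, y):
    """Per-sample loss ln(1 + exp(-a * y)) for a label y in {-1, +1}."""
    return math.log1p(math.exp(-a * y))


for a in (-2.0, -0.3, 0.0, 0.7, 3.0):
    for t in (0, 1):
        y = 2 * t - 1  # map {0, 1} to {-1, +1}
        same = math.isclose(cross_entropy(a, t), logistic_loss(a, y), rel_tol=1e-9, abs_tol=1e-12)
        print(a, t, y, same)
```

The regularization term $\lambda \|w\|^2$ is left out here because it is identical in both formulations.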
Compare this with the SVM objective:
$$
\min_{w,b}\ \sum_{i=1}^{N}\left[1-y_i(w\cdot x_i+b)\right]_{+} + \lambda \|w\|^2
$$
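So SVM and logistic regression have objective functions of the same form: a sum of per-sample losses plus the regularizer $\lambda \|w\|^2$. They differ only in the per-sample loss, which is the hinge loss for SVM and the logistic loss for logistic regression.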
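To make the comparison concrete, both per-sample losses can be written as functions of the margin $m = y(w\cdot x + b)$. This is a rough sketch with hypothetical function names. It shows that the hinge loss is exactly zero once the margin reaches 1, while the logistic loss stays positive and only decays toward zero.

```python
import math


def hinge_loss(m):
    """SVM per-sample loss [1 - m]_+ as a function of the margin m."""
    return max(0.0, 1.0 - m)


def logistic_loss(m):
    """Logistic regression per-sample loss ln(1 + exp(-m))."""
    return math.log1p(math.exp(-m))


def compare(margins):
    """Return (margin, hinge loss, logistic loss) triples for the given margins."""
    return [(m, hinge_loss(m), logistic_loss(m)) for m in margins]


for m, h, l in compare([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]):
    print(f"margin={m:+.1f}  hinge={h:.4f}  logistic={l:.4f}")
```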
Appendix:
The relationship between logistic regression and the maximum entropy model