天天看點

機器學習——軟間隔SVM硬間隔SVM的問題軟間隔SVM軟間隔SVM的求解

以下内容均為個人了解,如有錯誤,歡迎指出

文章目錄

  • 硬間隔SVM的問題
  • 軟間隔SVM
      • 什麼是軟間隔SVM
      • 軟間隔SVM的數學表達式
  • 軟間隔SVM的求解
      • SMO算法求解軟間隔SVM
      • 選擇兩個α的方法(未解釋原因,隻闡明步驟)

硬間隔SVM的問題

硬間隔要求間隔之間不存在任何點,這點要求非常苛刻,這也導緻了硬間隔SVM對于異常點非常敏感,由于噪聲的因素,可能屬于A類的點分布在B類中(異常點),此時硬間隔将無法找到一個劃分超平面,是以,我們導出了軟間隔SVM。

軟間隔SVM

什麼是軟間隔SVM

軟間隔SVM允許部分點分布在間隔内部,此時可以解決硬間隔SVM的問題(隻需将異常點放到間隔内部即可),因為間隔内部的點對于SVM的思想來說是一種錯誤,是以我們希望位于間隔内部的點盡可能少,其實是一種折中,即在錯誤較少的情況下獲得不錯的劃分超平面

軟間隔SVM的數學表達式

回顧一下硬間隔SVM的數學表達式為

min ⁡ w , b ∣ ∣ w ∣ ∣ 2 2 s . t .   y i ( w T x i + b ) ≥ 1 , i = 1 , 2..... m \begin{aligned} & \min \limits_{w,b} \frac{||w||^2}{2} \\ \\ &s.t.\ y_i(w^Tx_i+b)\geq1,i=1,2.....m \end{aligned} ​w,bmin​2∣∣w∣∣2​s.t. yi​(wTxi​+b)≥1,i=1,2.....m​

位于間隔内部的點滿足 y i ( w T x i + b ) < 1 y_i(w^Tx_i+b)<1 yi​(wTxi​+b)<1,我們為每個點 ( x i , y i ) (x_i,y_i) (xi​,yi​)引入一個松弛變量 ε i \varepsilon_i εi​,對于間隔内部的點,滿足

ε i > 0 y i ( w T x i + b ) + ε i ≥ 1 \begin{aligned} & \varepsilon_i>0 \\ \\ & y_i(w^Tx_i+b)+\varepsilon_i\geq 1 \end{aligned} ​εi​>0yi​(wTxi​+b)+εi​≥1​

而間隔外的點和支援向量,隻需滿足 ε i = 0 \varepsilon_i=0 εi​=0即可,綜上,對于軟間隔SVM,我們的限制條件變為

ε i ≥ 0 y i ( w T x i + b ) + ε i ≥ 1    i = 1 , 2..... m \begin{aligned} & \varepsilon_i\geq0 \\ \\ & y_i(w^Tx_i+b)+\varepsilon_i\geq 1 \ \ i=1,2.....m \end{aligned} ​εi​≥0yi​(wTxi​+b)+εi​≥1  i=1,2.....m​

由于我們希望位于間隔内的點盡可能少,為了展現這個特點,我們将優化目标變為

min ⁡ ∣ ∣ w ∣ ∣ 2 2 + C ∑ i = 1 m ε i s . t   ε i ≥ 0 y i ( w T x i + b ) + ε i ≥ 1    i = 1 , 2..... m (式1.0) \begin{aligned} &\min \frac{||w||^2}{2}+C\sum_{i=1}^m\varepsilon_i\\ &s.t \ \varepsilon_i\geq0 \\ \\ &y_i(w^Tx_i+b)+\varepsilon_i\geq 1 \ \ i=1,2.....m \tag{式1.0} \end{aligned} ​min2∣∣w∣∣2​+Ci=1∑m​εi​s.t εi​≥0yi​(wTxi​+b)+εi​≥1  i=1,2.....m​(式1.0)

其中,C>0為懲罰參數,為我們事先指定,C越大,位于間隔内部的點越少,感性一點的解釋是, ∣ ∣ w ∣ ∣ 2 2 + C ∑ i = 1 m ε i \frac{||w||^2}{2}+C\sum_{i=1}^m\varepsilon_i 2∣∣w∣∣2​+C∑i=1m​εi​由兩部分組成, ∣ ∣ w ∣ ∣ 2 2 \frac{||w||^2}{2} 2∣∣w∣∣2​和 C ∑ i = 1 m ε i C\sum_{i=1}^m\varepsilon_i C∑i=1m​εi​,C越大,則 C ∑ i = 1 m ε i C\sum_{i=1}^m\varepsilon_i C∑i=1m​εi​對于取值的占比越大,此時為了讓式子取最小,則會有盡可能多的 ε i \varepsilon_i εi​趨近于0

軟間隔SVM的求解

由于 ε i   ( i = 1 , 2..... m ) \varepsilon_i \ (i=1,2.....m) εi​ (i=1,2.....m)為仿射函數,是以,我們仍然可以對式1.0使用拉格朗日對偶,則有下列過程

L ( w , b , α , β , ε ) = ∣ ∣ w ∣ ∣ 2 2 + C ∑ i = 1 m ε i + ∑ i = 1 m α i ( 1 − ε i − y i ( w T x i + b ) ) − ∑ i = 1 m β i ε i (式1.1) \begin{aligned} L(w,b,\alpha,\beta,\varepsilon)=\frac{||w||^2}{2}+C\sum_{i=1}^m\varepsilon_i+\sum_{i=1}^m\alpha_i(1-\varepsilon_i-y_i(w^Tx_i+b))-\sum_{i=1}^m \beta_i\varepsilon_i \tag{式1.1} \end{aligned} L(w,b,α,β,ε)=2∣∣w∣∣2​+Ci=1∑m​εi​+i=1∑m​αi​(1−εi​−yi​(wTxi​+b))−i=1∑m​βi​εi​​(式1.1)

拉格朗日對偶函數為 max ⁡ α , β min ⁡ w , b , ε L ( w , b , α , β , ε ) \max \limits_{\alpha,\beta} \min \limits_{w,b,\varepsilon}L(w,b,\alpha,\beta,\varepsilon) α,βmax​w,b,εmin​L(w,b,α,β,ε),若固定 α , β \alpha,\beta α,β,式1.1為 w , b , ε w,b,\varepsilon w,b,ε的凸函數,對 w , b , ε w,b,\varepsilon w,b,ε求導得

∂ L ( w , b , α , β , ε ) ∂ w = { w 1 − ∑ i = 1 m α i y i x i 1 w 2 − ∑ i = 1 m α i y i x i 2 . . . . . w n − ∑ i = 1 m α i y i x i 1 } = w − ∑ i = 1 m α i y i x i = 0 \frac{\partial L(w,b,\alpha,\beta,\varepsilon)}{\partial w}= \left\{ \begin{matrix} & w_1-\sum_{i=1}^m \alpha_iy_ix_{i1} \\ & w_2-\sum_{i=1}^m \alpha_iy_ix_{i2} \\ & .....\\ & w_n-\sum_{i=1}^m \alpha_iy_ix_{i1} \end{matrix} \right\}=w-\sum_{i=1}^m\alpha_iy_ix_i =0\\ ∂w∂L(w,b,α,β,ε)​=⎩⎪⎪⎨⎪⎪⎧​​w1​−∑i=1m​αi​yi​xi1​w2​−∑i=1m​αi​yi​xi2​.....wn​−∑i=1m​αi​yi​xi1​​⎭⎪⎪⎬⎪⎪⎫​=w−i=1∑m​αi​yi​xi​=0

∂ L ( w , b , α , β , ε ) ∂ b = ∑ i = 1 m α i y i = 0 \frac{\partial L(w,b,\alpha,\beta,\varepsilon)}{\partial b}=\sum_{i=1}^m \alpha_iy_i=0 ∂b∂L(w,b,α,β,ε)​=i=1∑m​αi​yi​=0

α i + β i = C    ( i = 1 , 2 , . . . , m ) \alpha_i+\beta_i=C \ \ (i=1,2,...,m) αi​+βi​=C  (i=1,2,...,m)

将上述式子代入式1.0,可得

min ⁡ 1 2 ∑ i = 1 m ∑ j = 1 m α i α j y i y j x i T x j − ∑ i = 1 m α i s . t   α i ≥ 0       β i ≥ 0   ( i = 1 , 2 , . . . , m )        α i + β i = C \begin{aligned} & \min \frac{1}{2}\sum_{i=1}^m\sum_{j=1}^m\alpha_i\alpha_jy_iy_jx_i^Tx_j-\sum_{i=1}^m\alpha_i\\ \\ & s.t \ \alpha_i\geq0\\ \\ & \ \ \ \ \ \beta_i\geq0 \ (i=1,2,...,m)\\ \\ & \ \ \ \ \ \ \alpha_i+\beta_i=C \end{aligned} ​min21​i=1∑m​j=1∑m​αi​αj​yi​yj​xiT​xj​−i=1∑m​αi​s.t αi​≥0     βi​≥0 (i=1,2,...,m)      αi​+βi​=C​

由于 β i = C − α i \beta_i=C-\alpha_i βi​=C−αi​,是以有 C − α i ≥ 0 C-\alpha_i\geq0 C−αi​≥0,即 C ≥ α i C\geq\alpha_i C≥αi​,則最終的優化對象為

min ⁡ 1 2 ∑ i = 1 m ∑ j = 1 m α i α j y i y j x i T x j − ∑ i = 1 m α i s . t   0 ≤ α i ≤ C ( i = 1 , 2 , . . . , m ) (式1.2) \begin{aligned} & \min \frac{1}{2}\sum_{i=1}^m\sum_{j=1}^m\alpha_i\alpha_jy_iy_jx_i^Tx_j-\sum_{i=1}^m\alpha_i\\ \\ & s.t \ 0 \leq \alpha_i\leq C (i=1,2,...,m) \end{aligned}\tag{式1.2} ​min21​i=1∑m​j=1∑m​αi​αj​yi​yj​xiT​xj​−i=1∑m​αi​s.t 0≤αi​≤C(i=1,2,...,m)​(式1.2)

接下來,我們使用SMO算法對其進行優化,我們将展現SMO算法的具體細節

SMO算法求解軟間隔SVM

我們選擇 α 1 , α 2 \alpha_1,\alpha_2 α1​,α2​作為可變元,剩下的 α i ( i = 3 , . . . . m ) \alpha_i (i=3,....m) αi​(i=3,....m)不變,式1.3的原型為

1 2 ∑ i = 1 m α i y i x i ∑ j = 1 m α j y j x j − ∑ i = 1 m α i = 1 2 ( α 1 y 1 x 1 + α 2 y 2 x 2 + . . . . + α m y m x m ) ( α 1 y 1 x 1 + α 2 y 2 x 2 + . . . . + α m y m x m ) − ∑ i = 1 m α i = 1 2 α 1 2 y 1 2 x 1 T x 1 + 1 2 α 2 2 y 2 2 x 2 T x 2 + α 1 α 2 y 1 y 2 x 1 T x 2 + α 1 y 1 x 1 T ∑ i = 3 m α i y i x i + α 2 y 2 x 2 T ∑ i = 3 m α i y i x i − α 1 − α 2 + H + K ( 式 1.3 ) \begin{aligned} &\frac{1}{2}\sum_{i=1}^m\alpha_iy_ix_i \sum_{j=1}^m\alpha_jy_jx_j -\sum_{i=1}^m\alpha_i\\ =&\frac{1}{2}(\alpha_1y_1x_1+\alpha_2y_2x_2+....+\alpha_my_mx_m)(\alpha_1y_1x_1+\alpha_2y_2x_2+....+\alpha_my_mx_m)-\sum_{i=1}^m\alpha_i\\ =&\frac{1}{2}\alpha_1^2y_1^2x_1^Tx_1+\frac{1}{2}\alpha_2^2y_2^2x_2^Tx_2+\alpha_1\alpha_2y_1y_2x_1^Tx_2+\alpha_1y_1x_1^T\sum_{i=3}^m\alpha_iy_ix_i+ \\ &\alpha_2y_2x_2^T\sum_{i=3}^m\alpha_iy_ix_i-\alpha_1-\alpha_2+H+K(式1.3) \end{aligned} ==​21​i=1∑m​αi​yi​xi​j=1∑m​αj​yj​xj​−i=1∑m​αi​21​(α1​y1​x1​+α2​y2​x2​+....+αm​ym​xm​)(α1​y1​x1​+α2​y2​x2​+....+αm​ym​xm​)−i=1∑m​αi​21​α12​y12​x1T​x1​+21​α22​y22​x2T​x2​+α1​α2​y1​y2​x1T​x2​+α1​y1​x1T​i=3∑m​αi​yi​xi​+α2​y2​x2T​i=3∑m​αi​yi​xi​−α1​−α2​+H+K(式1.3)​

其中

H = 1 2 ∑ i = 3 m ∑ j = 3 m α i y i α j y j x i T x j K = − ∑ i = 3 m α i \begin{aligned} &H=\frac{1}{2}\sum_{i=3}^m\sum_{j=3}^m\alpha_iy_i\alpha_jy_jx_i^Tx_j\\ &K=-\sum_{i=3}^m\alpha_i \end{aligned} ​H=21​i=3∑m​j=3∑m​αi​yi​αj​yj​xiT​xj​K=−i=3∑m​αi​​

設 k i j = x i T x j k_{ij}=x_i^Tx_j kij​=xiT​xj​,則式1.3變為

1 2 k 11 α 1 2 + 1 2 k 22 α 2 2 + α 1 α 2 y 1 y 2 k 12 + α 1 y 1 ∑ i = 3 m α i y i k i 1 + α 2 y 2 ∑ i = 3 m α i y i k i 2 + H + K − α 1 − α 2 (式1.4) \frac{1}{2}k_{11}\alpha_1^2+\frac{1}{2}k_{22}\alpha_2^2+\alpha_1\alpha_2y_1y_2k_{12}+\alpha_1y_1\sum_{i=3}^m\alpha_iy_ik_{i1}\\ +\alpha_2y_2\sum_{i=3}^m\alpha_iy_ik_{i2}+H+K-\alpha_1-\alpha_2\tag{式1.4} 21​k11​α12​+21​k22​α22​+α1​α2​y1​y2​k12​+α1​y1​i=3∑m​αi​yi​ki1​+α2​y2​i=3∑m​αi​yi​ki2​+H+K−α1​−α2​(式1.4)

由于 α 1 y 1 + α 2 y 2 + ∑ i = 3 m α i y i = 0 \alpha_1y_1+\alpha_2y_2+\sum_{i=3}^m\alpha_iy_i=0 α1​y1​+α2​y2​+∑i=3m​αi​yi​=0,是以有 α 1 y 1 + α 2 y 2 = − ∑ i = 3 m α i y i \alpha_1y_1+\alpha_2y_2=-\sum_{i=3}^m\alpha_iy_i α1​y1​+α2​y2​=−∑i=3m​αi​yi​,令 ξ = − ∑ i = 3 m α i y i \xi=-\sum_{i=3}^m\alpha_iy_i ξ=−∑i=3m​αi​yi​,則 α 1 = ξ y 1 − α 2 y 1 y 2 \alpha_1=\xi y_1-\alpha_2y_1y_2 α1​=ξy1​−α2​y1​y2​,将其代入式1.3可得

1 2 k 11 ( ξ − α 2 y 2 ) 2 + 1 2 k 22 α 2 2 + α 2 y 2 k 12 ( ξ − α 2 y 2 ) + α 2 y 2 v 2 + ( ξ − α 2 y 2 ) v 1 − ( α 1 + ξ y 1 − α 2 y 1 y 2 ) (式1.5) \frac{1}{2}k_{11}(\xi-\alpha_2y_2)^2+\frac{1}{2}k_{22}\alpha_2^2+\alpha_2y_2k_{12}(\xi-\alpha_2y_2)+\alpha_2y_2v_2+(\xi-\alpha_2y_2)v_1\\ -(\alpha_1+\xi y_1-\alpha_2y_1y_2)\tag{式1.5} 21​k11​(ξ−α2​y2​)2+21​k22​α22​+α2​y2​k12​(ξ−α2​y2​)+α2​y2​v2​+(ξ−α2​y2​)v1​−(α1​+ξy1​−α2​y1​y2​)(式1.5)

其中 v 1 = ∑ i = 3 m α i y i k i 1 v 2 = ∑ i = 3 m α i y i k i 2 v_1=\sum_{i=3}^m\alpha_iy_ik_{i1}\\ v_2=\sum_{i=3}^m\alpha_iy_ik_{i2} v1​=i=3∑m​αi​yi​ki1​v2​=i=3∑m​αi​yi​ki2​

式1.5是一個二次函數,其二次項系數為 1 2 k 11 + 1 2 k 22 − k 12 \frac{1}{2}k_{11}+\frac{1}{2}k_{22}-k_{12} 21​k11​+21​k22​−k12​,到這裡其實就是一個高中數學題了,我們将問題重新表述一遍,即

min ⁡   1 2 k 11 ( ξ − α 2 y 2 ) 2 + 1 2 k 22 α 2 2 + α 2 y 2 k 12 ( ξ − α 2 y 2 ) + α 2 y 2 v 2 + ( ξ − α 2 y 2 ) v 1 − ( α 1 + ξ y 1 − α 2 y 1 y 2 ) s . t   0 ≤ α 1 ≤ C ⟹ 0 ≤ ξ y 1 − α 2 y 1 y 2 ≤ C 0 ≤ α 2 ≤ C \begin{aligned} \min \ & \frac{1}{2}k_{11}(\xi-\alpha_2y_2)^2+\frac{1}{2}k_{22}\alpha_2^2+\alpha_2y_2k_{12}(\xi-\alpha_2y_2)+\alpha_2y_2v_2+(\xi-\alpha_2y_2)v_1-(\alpha_1+\xi y_1-\alpha_2y_1y_2)\\ \\ s.t \ & 0 \leq \alpha_1\leq C \Longrightarrow 0 \leq \xi y_1-\alpha_2y_1y_2 \leq C\\ \\ &0 \leq \alpha_2\leq C \end{aligned} min s.t ​21​k11​(ξ−α2​y2​)2+21​k22​α22​+α2​y2​k12​(ξ−α2​y2​)+α2​y2​v2​+(ξ−α2​y2​)v1​−(α1​+ξy1​−α2​y1​y2​)0≤α1​≤C⟹0≤ξy1​−α2​y1​y2​≤C0≤α2​≤C​

其二次項系數為 1 2 k 11 + 1 2 k 22 − k 12 \frac{1}{2}k_{11}+\frac{1}{2}k_{22}-k_{12} 21​k11​+21​k22​−k12​,即在定義域内求二次函數極值的問題

我們對二次項次數進行分類讨論(絕大多數資料都會畫圖,這裡我們直接用解高中題的思路就可以求解了)

1. 1 2 k 11 + 1 2 k 22 − k 12 > 0 \frac{1}{2}k_{11}+\frac{1}{2}k_{22}-k_{12}>0 21​k11​+21​k22​−k12​>0,式1.5為開口向上的二次函數

此時對式1.5求導置0可得 α 2 \alpha_2 α2​的新值為:

α 2 n e w = y 2 ( − k 12 ξ + k 11 ξ − v 1 + v 2 + y 2 − y 1 ) k 11 + k 22 − 2 k 12 \alpha_2^{new}=\frac{y_2(-k_{12}\xi+k_{11}\xi-v_1+v_2+y_2-y_1)}{k_{11}+k_{22}-2k_{12}} α2new​=k11​+k22​−2k12​y2​(−k12​ξ+k11​ξ−v1​+v2​+y2​−y1​)​

由于 v 1 = ∑ i = 3 m α i y i k i 1 , v 2 = ∑ i = 3 m α i y i k i 2 , ξ = α 1 y 1 + α 2 y 2 v_1=\sum_{i=3}^m\alpha_iy_ik_{i1},v_2=\sum_{i=3}^m\alpha_iy_ik_{i2},\xi=\alpha_1y_1+\alpha_2y_2 v1​=∑i=3m​αi​yi​ki1​,v2​=∑i=3m​αi​yi​ki2​,ξ=α1​y1​+α2​y2​,設 g ( x i ) = ∑ j = 1 m α j y j k j i + b g(x_i)=\sum_{j=1}^m\alpha_jy_jk_{ji}+b g(xi​)=∑j=1m​αj​yj​kji​+b故有

y 2 ( − k 12 ξ + k 11 ξ − v 1 + v 2 + y 2 − y 1 ) = y 2 ( y 2 − y 1 + g ( x 1 ) − g ( x 2 ) + ∑ j = 1 2 y j α j k j 1 − ∑ j = 1 2 y j α j k j 2 + ( k 11 − k 12 ) ( α 1 y 1 + α 2 y 2 ) ) = y 2 ( y 2 − y 1 + g ( x 1 ) + g ( x 2 ) + ∑ j = 1 2 α j y j k j 2 − ∑ j = 1 2 α j y j k j 1 + α 1 y 1 k 11 − α j y 1 k 12 + α 2 y 2 k 11 − α 2 y 2 k 12 ) = y 2 ( y 2 − y 1 + g ( x 1 ) − g ( x 2 ) + α 2 y 2 k 22 + α 2 y 2 k 11 − 2 α 2 y 2 k 12 ) \begin{aligned} &y_2(-k_{12}\xi+k_{11}\xi-v_1+v_2+y_2-y_1)\\ =&y_2(y_2-y_1+g(x_1)-g_(x_2)+\sum_{j=1}^2y_j\alpha_jk_{j1}-\sum_{j=1}^2y_j\alpha_jk_{j2}+(k_{11}-k_{12})(\alpha_1y_1+\alpha_2y_2))\\ =&y_2(y_2-y_1+g(x_1)+g(x_2)+\sum_{j=1}^2\alpha_jy_jk_{j2}-\sum_{j=1}^2\alpha_jy_jk_{j1}+\alpha_1y_1k_{11}-\alpha_jy_1k_{12}+\alpha_2y_2k_{11}-\alpha_2y_2k_{12})\\ =&y_2(y_2-y_1+g(x_1)-g(x_2)+\alpha_2y_2k_{22}+\alpha_2y_2k_{11}-2\alpha_2y_2k_{12}) \end{aligned} ===​y2​(−k12​ξ+k11​ξ−v1​+v2​+y2​−y1​)y2​(y2​−y1​+g(x1​)−g(​x2​)+j=1∑2​yj​αj​kj1​−j=1∑2​yj​αj​kj2​+(k11​−k12​)(α1​y1​+α2​y2​))y2​(y2​−y1​+g(x1​)+g(x2​)+j=1∑2​αj​yj​kj2​−j=1∑2​αj​yj​kj1​+α1​y1​k11​−αj​y1​k12​+α2​y2​k11​−α2​y2​k12​)y2​(y2​−y1​+g(x1​)−g(x2​)+α2​y2​k22​+α2​y2​k11​−2α2​y2​k12​)​

是以有

α 2 n e w = y 2 ( y 2 − y 1 + g ( x 1 ) − g ( x 2 ) + α 2 y 2 k 22 + α 2 y 2 k 11 − 2 α 2 y 2 k 12 ) k 11 + k 22 − 2 k 12 = y 2 ( E 1 − E 2 ) k 11 + k 22 − 2 k 12 + α 2 \begin{aligned} \alpha_2^{new}=&\frac{y_2(y_2-y_1+g(x_1)-g(x_2)+\alpha_2y_2k_{22}+\alpha_2y_2k_{11}-2\alpha_2y_2k_{12})}{k_{11}+k_{22}-2k_{12}}\\ =&\frac{y_2(E_1-E_2)}{k_{11}+k_{22}-2k_{12}}+\alpha_2 \end{aligned} α2new​==​k11​+k22​−2k12​y2​(y2​−y1​+g(x1​)−g(x2​)+α2​y2​k22​+α2​y2​k11​−2α2​y2​k12​)​k11​+k22​−2k12​y2​(E1​−E2​)​+α2​​

其中, E i = g ( x i ) − y i E_i=g(x_i)-y_i Ei​=g(xi​)−yi​

由于大部分求解思路

當 y 1 = 1 , y 2 = 1 當y_1=1,y_2=1 當y1​=1,y2​=1,首先先确定定義域,我們有 ξ − C ≤ α 2 ≤ ξ 0 ≤ α 2 ≤ C \begin{aligned} &\xi-C\leq\alpha_2\leq\xi \\ & 0\leq \alpha_2\leq C \end{aligned} ​ξ−C≤α2​≤ξ0≤α2​≤C​

定義域為 [ max ⁡ ( 0 , ξ − C ) , min ⁡ ( C , ξ ) ] [\max(0,\xi-C),\min(C,\xi)] [max(0,ξ−C),min(C,ξ)],是以,我們有

α 2 n e w = { max ⁡ ( 0 , ξ − C )    y 2 ( E 1 − E 2 ) k 11 + k 22 − 2 k 12 + α 2 < max ⁡ ( 0 , ξ − C ) y 2 ( E 1 − E 2 ) k 11 + k 22 − 2 k 12 + α 2     max ⁡ ( 0 , ξ − C ) ≤ y 2 ( E 1 − E 2 ) k 11 + k 22 − 2 k 12 + α 2 ≤ max ⁡ ( 0 , ξ − C ) min ⁡ ( C , ξ )     y 2 ( E 1 − E 2 ) k 11 + k 22 − 2 k 12 + α 2 > min ⁡ ( C , ξ ) \alpha_2^{new}=\left\{ \begin{aligned} &\max(0,\xi-C) \ \ \frac{y_2(E_1-E_2)}{k_{11}+k_{22}-2k_{12}}+\alpha_2<\max(0,\xi-C)\\ &\frac{y_2(E_1-E_2)}{k_{11}+k_{22}-2k_{12}}+\alpha_2 \ \ \ \max(0,\xi-C)\leq\frac{y_2(E_1-E_2)}{k_{11}+k_{22}-2k_{12}}+\alpha_2\leq\max(0,\xi-C)\\ &\min(C,\xi)\ \ \ \frac{y_2(E_1-E_2)}{k_{11}+k_{22}-2k_{12}}+\alpha_2>\min(C,\xi) \end{aligned} \right. α2new​=⎩⎪⎪⎪⎪⎪⎪⎨⎪⎪⎪⎪⎪⎪⎧​​max(0,ξ−C)  k11​+k22​−2k12​y2​(E1​−E2​)​+α2​<max(0,ξ−C)k11​+k22​−2k12​y2​(E1​−E2​)​+α2​   max(0,ξ−C)≤k11​+k22​−2k12​y2​(E1​−E2​)​+α2​≤max(0,ξ−C)min(C,ξ)   k11​+k22​−2k12​y2​(E1​−E2​)​+α2​>min(C,ξ)​

當 y 1 , y 2 y_1,y_2 y1​,y2​取其他值時,也具有類似的結構,在此不贅述

2. 1 2 k 11 + 1 2 k 22 − k 12 = 0 \frac{1}{2}k_{11}+\frac{1}{2}k_{22}-k_{12}=0 21​k11​+21​k22​−k12​=0,此時式1.5為一個一次函數,依據斜率取對應的定義域邊界值

3. 1 2 k 11 + 1 2 k 22 − k 12 < 0 \frac{1}{2}k_{11}+\frac{1}{2}k_{22}-k_{12}<0 21​k11​+21​k22​−k12​<0,此時式1.5為一個開口向下的二次函數,求解思路與 1 2 k 11 + 1 2 k 22 − k 12 > 0 \frac{1}{2}k_{11}+\frac{1}{2}k_{22}-k_{12}>0 21​k11​+21​k22​−k12​>0一緻,在此不贅述

選擇兩個α的方法(未解釋原因,隻闡明步驟)

以下内容摘自關于SVM數學細節邏輯的個人了解(三) :SMO算法了解

找第一個參數的具體過程是這樣的:

①周遊一遍整個資料集,對每個不滿足KKT條件的參數,選作第一個待修改參數

②在上面對整個資料集周遊一遍後,選擇那些參數滿足的子集,開始周遊,如果發現一個不滿足KKT條件的,作為第一個待修改參數,然後找到第二個待修改的參數并修改,修改完後,重新開始周遊這個子集

③周遊完子集後,重新開始①②,直到在執行①和②時沒有任何修改就結束

找第二個參數的過程是這樣的:

①啟發式找,找能讓 ∣ E 1 − E 2 ∣ |E_1-E_2| ∣E1​−E2​∣最大的 α 2 \alpha_2 α2​

繼續閱讀