
《統計學習方法》 (Statistical Learning Methods), Chapter 7 — Support Vector Machines: linearly separable SVMs and hard-margin maximization; linear SVMs and soft-margin maximization; nonlinear SVMs and kernel functions; the sequential minimal optimization (SMO) algorithm

Linearly Separable SVMs and Hard-Margin Maximization

Linearly Separable Support Vector Machine

  • Definition: given a linearly separable training set, the separating hyperplane learned by maximizing the margin (or, equivalently, by solving the corresponding convex quadratic program) is

    $$w^* \cdot x + b^* = 0$$

    together with the classification decision function

    $$f(x) = \mathrm{sign}(w^* \cdot x + b^*)$$

    This model is called the linearly separable support vector machine.

Functional Margin and Geometric Margin

  • Functional margin: for a given training set $T$ and hyperplane $(w, b)$, the functional margin of $(w, b)$ with respect to a sample point $(x_i, y_i)$ is

    $$\hat{\gamma}_i = y_i(w \cdot x_i + b)$$

    The functional margin of $(w, b)$ with respect to $T$ is the minimum over all sample points $(x_i, y_i)$ in $T$:

    $$\hat{\gamma} = \min_{i=1,...,N} \hat{\gamma}_i$$

  • Geometric margin: for a given training set $T$ and hyperplane $(w, b)$, the geometric margin of $(w, b)$ with respect to a sample point $(x_i, y_i)$ is

    $$\gamma_i = y_i\left(\frac{w}{\|w\|} \cdot x_i + \frac{b}{\|w\|}\right)$$

    The geometric margin of $(w, b)$ with respect to $T$ is the minimum over all sample points:

    $$\gamma = \min_{i=1,...,N} \gamma_i$$

    The two margins are therefore related by

    $$\gamma_i = \frac{\hat{\gamma}_i}{\|w\|}, \qquad \gamma = \frac{\hat{\gamma}}{\|w\|}$$

Margin Maximization

The maximum-margin hyperplane is found by solving

$$\max_{w,b} \ \gamma$$

$$s.t. \ \ y_i\left(\frac{w}{\|w\|} \cdot x_i + \frac{b}{\|w\|}\right) \ge \gamma, \quad i = 1, 2, ..., N$$

which is equivalent to

$$\max_{w,b} \ \frac{\hat{\gamma}}{\|w\|}$$

$$s.t. \ \ y_i(w \cdot x_i + b) \ge \hat{\gamma}, \quad i = 1, 2, ..., N$$

Because rescaling $(w, b)$ rescales $\hat{\gamma}$ proportionally, its particular value does not affect the solution, so we may set $\hat{\gamma} = 1$. The problem then becomes

$$\min_{w,b} \ \frac{1}{2}\|w\|^2$$

$$s.t. \ \ y_i(w \cdot x_i + b) - 1 \ge 0, \quad i = 1, 2, ..., N$$

This is a convex optimization problem.

The Algorithm

Input: a linearly separable training set $T = \{(x_1,y_1),(x_2,y_2),...,(x_N,y_N)\}$, where $x_i \in R^n$, $y_i \in \{-1,+1\}$, $i = 1, 2, ..., N$

Output: the maximum-margin separating hyperplane and the classification decision function

(1) Construct and solve the optimization problem

$$\min_{w,b} \ \frac{1}{2}\|w\|^2$$

$$s.t. \ \ y_i(w \cdot x_i + b) - 1 \ge 0, \quad i = 1, 2, ..., N$$

obtaining the solution $(w^*, b^*)$.

(2) The separating hyperplane is

$$w^* \cdot x + b^* = 0$$

and the classification decision function is

$$f(x) = \mathrm{sign}(w^* \cdot x + b^*)$$

  • Existence and uniqueness of the maximum-margin separating hyperplane:

    (1) Existence

    Since the data are linearly separable, a feasible solution exists; since the objective is bounded below, an optimal solution $(w^*, b^*)$ must exist. Because the data contain both positive and negative samples, $w^* \ne 0$. This proves existence.

    (2) Uniqueness

    First we show $w^*$ is unique. Suppose there are two optimal solutions $(w_1^*, b_1^*)$ and $(w_2^*, b_2^*)$; clearly $\|w_1^*\| = \|w_2^*\| = c$.

    Let $w = \frac{w_1^* + w_2^*}{2}$, $b = \frac{b_1^* + b_2^*}{2}$. Then $(w, b)$ is feasible, so $c \le \|w\| \le \frac{1}{2}\|w_1^*\| + \frac{1}{2}\|w_2^*\| = c$, and hence $\|w\| = \frac{1}{2}\|w_1^*\| + \frac{1}{2}\|w_2^*\|$.

    Equality in the triangle inequality forces $w_1^* = \lambda w_2^*$ with $|\lambda| = 1$. If $\lambda = -1$, then $\|w\| = 0$, contradicting feasibility; therefore $\lambda = 1$, i.e. $w_1^* = w_2^*$, so $w^*$ is unique.

    Next we show $b^*$ is unique. The two solutions can now be written $(w^*, b_1^*)$ and $(w^*, b_2^*)$.

    Let $x_1', x_2'$ be points of $\{x_i \mid y_i = +1\}$ at which the constraints of $(w^*, b_1^*)$ and $(w^*, b_2^*)$ respectively hold with equality, and let $x_1'', x_2''$ be the corresponding points of $\{x_i \mid y_i = -1\}$.

    Then

    $$b_1^* = -\frac{1}{2}(w^* \cdot x_1' + w^* \cdot x_1''), \qquad b_2^* = -\frac{1}{2}(w^* \cdot x_2' + w^* \cdot x_2'')$$

    $$b_1^* - b_2^* = -\frac{1}{2}\left[w^* \cdot (x_1' - x_2') + w^* \cdot (x_1'' - x_2'')\right]$$

    From feasibility,

    $$w^* \cdot x_2' + b_1^* \ge 1 = w^* \cdot x_1' + b_1^*$$

    $$w^* \cdot x_1' + b_2^* \ge 1 = w^* \cdot x_2' + b_2^*$$

    Adding these two inequalities gives $w^* \cdot (x_1' - x_2') = 0$, and similarly $w^* \cdot (x_1'' - x_2'') = 0$.

    Therefore $b_1^* = b_2^*$, completing the proof.

  • Support vectors and margin boundary

    The points satisfying $y_i(w \cdot x_i + b) = 1$ are called support vectors. They lie on the hyperplanes

    $$H_1: w \cdot x + b = +1$$

    $$H_2: w \cdot x + b = -1$$

    $H_1$ and $H_2$ are called the margin boundaries, and the width between them is $\frac{2}{\|w\|}$.
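As a quick numeric check of the width formula $\frac{2}{\|w\|}$, using the same illustrative hyperplane $w = (0.5, 0.5)$ as before (an assumed example, not data from the text):

```python
import math

def margin_width(w):
    # distance between H1: w.x + b = +1 and H2: w.x + b = -1 is 2 / ||w||
    return 2.0 / math.sqrt(sum(wj * wj for wj in w))

print(margin_width((0.5, 0.5)))   # 2 / sqrt(0.5) = 2*sqrt(2)
```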

The Dual Algorithm

To solve the optimization problem, first define the Lagrangian

$$L(w, b, a) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^N a_i y_i (w \cdot x_i + b) + \sum_{i=1}^N a_i, \qquad a_i \ge 0, \ i = 1, 2, ..., N$$

where $a = (a_1, a_2, ..., a_N)^T$ is the vector of Lagrange multipliers.

The original problem is then equivalent to

$$\max_a \min_{w,b} L(w, b, a)$$

(1) To find $\min_{w,b} L(w, b, a)$, set the partial derivatives with respect to $w$ and $b$ to zero:

$$\nabla_w L(w, b, a) = w - \sum_{i=1}^N a_i y_i x_i = 0$$

$$\nabla_b L(w, b, a) = -\sum_{i=1}^N a_i y_i = 0$$

which give

$$w = \sum_{i=1}^N a_i y_i x_i, \qquad \sum_{i=1}^N a_i y_i = 0$$

Substituting back,

$$L(w, b, a) = \frac{1}{2}\sum_{i=1}^N\sum_{j=1}^N a_i a_j y_i y_j (x_i \cdot x_j) - \sum_{i=1}^N a_i y_i \left(\left(\sum_{j=1}^N a_j y_j x_j\right) \cdot x_i + b\right) + \sum_{i=1}^N a_i$$

$$= -\frac{1}{2}\sum_{i=1}^N\sum_{j=1}^N a_i a_j y_i y_j (x_i \cdot x_j) + \sum_{i=1}^N a_i$$

(2) Maximizing this over $a$ and flipping the sign gives the dual problem

$$\min_a \ \frac{1}{2}\sum_{i=1}^N\sum_{j=1}^N a_i a_j y_i y_j (x_i \cdot x_j) - \sum_{i=1}^N a_i$$

$$s.t. \ \ \sum_{i=1}^N a_i y_i = 0$$

$$a_i \ge 0, \quad i = 1, 2, ..., N$$

From its solution $a^*$ we recover

$$w^* = \sum_{i=1}^N a_i^* y_i x_i$$

$$b^* = y_j - \sum_{i=1}^N a_i^* y_i (x_i \cdot x_j) \quad \text{for any } j \text{ with } a_j^* > 0$$

$$f(x) = \mathrm{sign}\left(\sum_{i=1}^N a_i^* y_i (x \cdot x_i) + b^*\right)$$

Algorithm

Input: a linearly separable training set $T = \{(x_1,y_1),(x_2,y_2),...,(x_N,y_N)\}$, where $x_i \in R^n$, $y_i \in \{-1,+1\}$, $i = 1, 2, ..., N$

Output: the separating hyperplane and the classification decision function

(1) Solve

$$\min_a \ \frac{1}{2}\sum_{i=1}^N\sum_{j=1}^N a_i a_j y_i y_j (x_i \cdot x_j) - \sum_{i=1}^N a_i$$

$$s.t. \ \ \sum_{i=1}^N a_i y_i = 0$$

$$a_i \ge 0, \quad i = 1, 2, ..., N$$

for $a^*$.

(2) Compute

$$w^* = \sum_{i=1}^N a_i^* y_i x_i$$

$$b^* = y_j - \sum_{i=1}^N a_i^* y_i (x_i \cdot x_j) \quad \text{for some } j \text{ with } a_j^* > 0$$

(3) The separating hyperplane is $w^* \cdot x + b^* = 0$, with classification decision function

$$f(x) = \mathrm{sign}\left(\sum_{i=1}^N a_i^* y_i (x \cdot x_i) + b^*\right)$$
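Step (2) of the algorithm can be sketched directly: given a dual solution $a^*$, recover $w^*$ and $b^*$ and build the decision function. The toy dataset and the multipliers `a = [0.25, 0.0, 0.25]` below are assumed inputs (in practice $a^*$ would come from a QP solver or from SMO).

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def recover_primal(a, X, y):
    n = len(X[0])
    # w* = sum_i a_i y_i x_i
    w = [sum(a[i] * y[i] * X[i][d] for i in range(len(X))) for d in range(n)]
    # pick any j with a_j > 0; b* = y_j - sum_i a_i y_i (x_i . x_j)
    j = next(i for i in range(len(a)) if a[i] > 0)
    b = y[j] - sum(a[i] * y[i] * dot(X[i], X[j]) for i in range(len(X)))
    return w, b

def decision(w, b, x):
    # f(x) = sign(w . x + b)
    return 1 if dot(w, x) + b >= 0 else -1

X = [(3.0, 3.0), (4.0, 3.0), (1.0, 1.0)]
y = [1, 1, -1]
a = [0.25, 0.0, 0.25]      # assumed dual optimum for this toy set

w, b = recover_primal(a, X, y)
print(w, b)                        # [0.5, 0.5] -2.0
print(decision(w, b, (4.0, 4.0)))  # 1
```

Only points with $a_i^* > 0$ (the support vectors) contribute to $w^*$ and $b^*$, which is why the second point, with $a_2^* = 0$, drops out.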

Linear SVMs and Soft-Margin Maximization

Linear Support Vector Machine

  • Definition: given a training set that is not linearly separable, the separating hyperplane obtained by solving the convex quadratic program below (soft-margin maximization) is

    $$w^* \cdot x + b^* = 0$$

    with classification decision function

    $$f(x) = \mathrm{sign}(w^* \cdot x + b^*)$$

    This model is called the linear support vector machine.

    Each constraint is relaxed with a slack variable $\xi_i$:

    $$y_i(w \cdot x_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0$$

    and the objective becomes

    $$\frac{1}{2}\|w\|^2 + C\sum_{i=1}^N \xi_i, \qquad C > 0$$

    The full problem is

    $$\min_{w,b,\xi} \ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^N \xi_i$$

    $$s.t. \ \ y_i(w \cdot x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, 2, ..., N$$

The Dual Algorithm

By duality, the Lagrangian is

$$L(w, b, \xi, a, \mu) = \frac{1}{2}\|w\|^2 + C\sum_{i=1}^N \xi_i - \sum_{i=1}^N a_i\big(y_i(w \cdot x_i + b) - 1 + \xi_i\big) - \sum_{i=1}^N \mu_i \xi_i, \qquad a_i \ge 0, \ \mu_i \ge 0$$

Setting the partial derivatives to zero:

$$\nabla_w L(w, b, \xi, a, \mu) = w - \sum_{i=1}^N a_i y_i x_i = 0$$

$$\nabla_b L(w, b, \xi, a, \mu) = -\sum_{i=1}^N a_i y_i = 0$$

$$\nabla_{\xi_i} L(w, b, \xi, a, \mu) = C - a_i - \mu_i = 0$$

Substituting back gives the dual problem

$$\min_a \ \frac{1}{2}\sum_{i=1}^N\sum_{j=1}^N a_i a_j y_i y_j (x_i \cdot x_j) - \sum_{i=1}^N a_i$$

$$s.t. \ \ \sum_{i=1}^N a_i y_i = 0$$

$$0 \le a_i \le C, \quad i = 1, 2, ..., N$$

From its solution,

$$w^* = \sum_{i=1}^N a_i^* y_i x_i$$

$$b^* = y_j - \sum_{i=1}^N y_i a_i^* (x_i \cdot x_j) \quad \text{for some } j \text{ with } 0 < a_j^* < C$$

Algorithm:

Input: a training set $T = \{(x_1,y_1),(x_2,y_2),...,(x_N,y_N)\}$, where $x_i \in R^n$, $y_i \in \{-1,+1\}$, $i = 1, 2, ..., N$

Output: the separating hyperplane and the classification decision function

(1) Choose a penalty parameter $C > 0$ and solve

$$\min_a \ \frac{1}{2}\sum_{i=1}^N\sum_{j=1}^N a_i a_j y_i y_j (x_i \cdot x_j) - \sum_{i=1}^N a_i$$

$$s.t. \ \ \sum_{i=1}^N a_i y_i = 0$$

$$0 \le a_i \le C, \quad i = 1, 2, ..., N$$

for $a^*$.

(2) Compute

$$w^* = \sum_{i=1}^N a_i^* y_i x_i$$

$$b^* = y_j - \sum_{i=1}^N a_i^* y_i (x_i \cdot x_j) \quad \text{for some } j \text{ with } 0 < a_j^* < C$$

(3) The separating hyperplane is $w^* \cdot x + b^* = 0$, with classification decision function

$$f(x) = \mathrm{sign}\left(\sum_{i=1}^N a_i^* y_i (x \cdot x_i) + b^*\right)$$

Support Vectors

  • $0 < a_i^* < C$: $x_i$ lies exactly on the margin boundary
  • $a_i^* = C$, $0 < \xi_i < 1$: $x_i$ is classified correctly and lies between the margin boundary and the separating hyperplane
  • $a_i^* = C$, $\xi_i = 1$: $x_i$ lies on the separating hyperplane
  • $a_i^* = C$, $\xi_i > 1$: $x_i$ lies on the misclassified side of the separating hyperplane

Hinge Loss

The soft-margin problem can be rewritten with the objective

$$\sum_{i=1}^N [1 - y_i(w \cdot x_i + b)]_+ + \lambda\|w\|^2$$

which is equivalent to the linear support vector machine:

Take $\xi_i = [1 - y_i(w \cdot x_i + b)]_+$, so the problem reads

$$\min_{w,b} \ \sum_{i=1}^N \xi_i + \lambda\|w\|^2$$

With $\lambda = \frac{1}{2C}$ this becomes

$$\min_{w,b} \ \frac{1}{C}\left(C\sum_{i=1}^N \xi_i + \frac{1}{2}\|w\|^2\right)$$

which is the soft-margin objective up to the positive constant factor $\frac{1}{C}$.
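The equivalence above is an algebraic identity, so it can be verified numerically for any $(w, b)$ and any data. The points, weights, and $C$ below are arbitrary illustrative values.

```python
def hinge(z):
    # [z]_+ = max(0, z)
    return max(0.0, z)

def margins(w, b, X, y):
    return [yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            for xi, yi in zip(X, y)]

def objective_hinge(w, b, X, y, lam):
    # sum_i [1 - y_i (w . x_i + b)]_+ + lam * ||w||^2
    return (sum(hinge(1.0 - m) for m in margins(w, b, X, y))
            + lam * sum(wj * wj for wj in w))

def objective_soft_margin(w, b, X, y, C):
    # (1/C) * (C * sum_i xi_i + ||w||^2 / 2), with xi_i the hinge slacks
    xis = [hinge(1.0 - m) for m in margins(w, b, X, y)]
    return (1.0 / C) * (C * sum(xis) + 0.5 * sum(wj * wj for wj in w))

X = [(1.0, 2.0), (2.0, 0.5), (-1.0, -1.0)]
y = [1, -1, -1]
w, b, C = (0.3, -0.4), 0.1, 2.0

# with lambda = 1/(2C) the two objectives agree
print(abs(objective_hinge(w, b, X, y, 1.0 / (2.0 * C))
          - objective_soft_margin(w, b, X, y, C)) < 1e-9)
```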

Nonlinear SVMs and Kernel Functions

The Kernel Trick

For problems that are not linearly separable in the input space, we apply the kernel trick.

Let $\phi(x)$ be the mapping from the input space to the feature space, and define

$$K(x, z) = \phi(x) \cdot \phi(z)$$

In the dual problem, every inner product $x_i \cdot x_j$ is replaced by $K(x_i, x_j)$.

Positive Definite Kernels

$K(x, z)$ is a positive definite kernel if and only if, for any finite set of points $x_1, ..., x_m$, its Gram matrix

$$K = [K(x_i, x_j)]_{m \times m}$$

is positive semidefinite.

Common Kernel Functions

  • Polynomial kernel

    $$K(x, z) = (x \cdot z + 1)^p$$

  • Gaussian kernel

    $$K(x, z) = \exp\left(-\frac{\|x - z\|^2}{2\sigma^2}\right)$$

  • String kernel

    $$K_n(s, t) = \sum_{u \in \Sigma^n} [\phi_n(s)]_u [\phi_n(t)]_u = \sum_{u \in \Sigma^n} \sum_{(i,j): s(i) = t(j) = u} \lambda^{l(i) + l(j)}$$

    where $0 < \lambda \le 1$, the inner sum runs over subsequences of $s$ and $t$ equal to $u$, and $l(i)$ is the span length of the subsequence indexed by $i$:

    $$l(i) = i_{|u|} - i_1 + 1, \qquad 1 \le i_1 < i_2 < ... < i_{|u|} \le |s|$$
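The polynomial and Gaussian kernels above are one-liners; a minimal sketch (the parameter defaults `p=2` and `sigma=1.0` are arbitrary choices, not values from the text):

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def poly_kernel(x, z, p=2):
    # K(x, z) = (x . z + 1)^p
    return (dot(x, z) + 1.0) ** p

def gauss_kernel(x, z, sigma=1.0):
    # K(x, z) = exp(-||x - z||^2 / (2 sigma^2))
    sq = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-sq / (2.0 * sigma ** 2))

print(poly_kernel((1.0, 2.0), (3.0, 1.0)))   # (5 + 1)^2 = 36.0
print(gauss_kernel((1.0, 0.0), (1.0, 0.0)))  # 1.0 at x == z
```

Either function can be passed wherever $K(x_i, x_j)$ appears in the dual problem below; the linear kernel is recovered by `dot` itself.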

Nonlinear Support Vector Machine

  • Definition: from a nonlinearly separable training set, the classification decision function learned via a kernel function and soft-margin maximization (a convex quadratic program)

    $$f(x) = \mathrm{sign}\left(\sum_{i=1}^N a_i^* y_i K(x, x_i) + b^*\right)$$

    is called the nonlinear support vector machine, where $K(x, z)$ is a positive definite kernel.

    Algorithm:

    Input: a training set $T = \{(x_1,y_1),(x_2,y_2),...,(x_N,y_N)\}$, where $x_i \in R^n$, $y_i \in \{-1,+1\}$, $i = 1, 2, ..., N$

    Output: the classification decision function

    (1) Choose a kernel $K$ and a penalty parameter $C > 0$, and solve

    $$\min_a \ \frac{1}{2}\sum_{i=1}^N\sum_{j=1}^N a_i a_j y_i y_j K(x_i, x_j) - \sum_{i=1}^N a_i$$

    $$s.t. \ \ \sum_{i=1}^N a_i y_i = 0$$

    $$0 \le a_i \le C, \quad i = 1, 2, ..., N$$

    for $a^*$.

    (2) Compute

    $$b^* = y_j - \sum_{i=1}^N a_i^* y_i K(x_i, x_j) \quad \text{for some } j \text{ with } 0 < a_j^* < C$$

    (Note that $w^*$ lives in the feature space and is not computed explicitly for a general kernel.)

    (3) The classification decision function is

    $$f(x) = \mathrm{sign}\left(\sum_{i=1}^N a_i^* y_i K(x_i, x) + b^*\right)$$

Sequential Minimal Optimization

SMO repeatedly selects two variables that violate the KKT conditions and optimizes over them while holding the rest fixed, until the stopping criterion is met or all variables satisfy the KKT conditions; a point at which the KKT conditions all hold is an optimal solution of the dual.

Solving the Two-Variable Quadratic Subproblem

Suppose $a_1, a_2$ are selected. The subproblem is

$$\min_{a_1, a_2} \ W(a_1, a_2) = \frac{1}{2}K_{11}a_1^2 + \frac{1}{2}K_{22}a_2^2 + y_1y_2K_{12}a_1a_2 - (a_1 + a_2) + y_1a_1\sum_{i=3}^N y_ia_iK_{i1} + y_2a_2\sum_{i=3}^N y_ia_iK_{i2}$$

$$s.t. \ \ a_1y_1 + a_2y_2 = -\sum_{i=3}^N y_ia_i = \zeta$$

$$0 \le a_i \le C, \quad i = 1, 2$$

where $K_{ij} = K(x_i, x_j)$.

The new value of $a_2$ must satisfy

$$L \le a_2^{new} \le H$$

  • If $y_1 \ne y_2$: $L = \max(0, a_2^{old} - a_1^{old})$, $H = \min(C, C + a_2^{old} - a_1^{old})$
  • If $y_1 = y_2$: $L = \max(0, a_2^{old} + a_1^{old} - C)$, $H = \min(C, a_2^{old} + a_1^{old})$

    Let $a_2^{new,unc}$ denote the unclipped solution, i.e. the minimizer when the box constraint is ignored. Define

    $$g(x) = \sum_{i=1}^N a_iy_iK(x_i, x) + b$$

    $$E_i = g(x_i) - y_i = \left(\sum_{j=1}^N a_jy_jK(x_j, x_i) + b\right) - y_i, \qquad i = 1, 2$$

    Then

    $$a_2^{new,unc} = a_2^{old} + \frac{y_2(E_1 - E_2)}{\eta}$$

    where

    $$\eta = K_{11} + K_{22} - 2K_{12}$$

    Clipping to the box gives

    $$a_2^{new} = \begin{cases} H & a_2^{new,unc} > H \\ a_2^{new,unc} & L \le a_2^{new,unc} \le H \\ L & a_2^{new,unc} < L \end{cases}$$

    and the equality constraint then gives

    $$a_1^{new} = a_1^{old} + y_1y_2(a_2^{old} - a_2^{new})$$

  • Proof of the update formula:

    Let

    $$v_i = \sum_{j=3}^N a_jy_jK(x_i, x_j) = g(x_i) - \sum_{j=1}^2 a_jy_jK(x_i, x_j) - b$$

    The subproblem objective becomes

    $$W(a_1, a_2) = \frac{1}{2}K_{11}a_1^2 + \frac{1}{2}K_{22}a_2^2 + y_1y_2K_{12}a_1a_2 - (a_1 + a_2) + y_1v_1a_1 + y_2v_2a_2$$

    From the equality constraint, $a_1 = (\zeta - y_2a_2)y_1$, so

    $$W(a_2) = \frac{1}{2}K_{11}(\zeta - a_2y_2)^2 + \frac{1}{2}K_{22}a_2^2 + y_2K_{12}(\zeta - a_2y_2)a_2 - (\zeta - a_2y_2)y_1 - a_2 + v_1(\zeta - a_2y_2) + y_2v_2a_2$$

    Differentiating and setting to zero:

    $$\frac{\partial W}{\partial a_2} = K_{11}a_2 + K_{22}a_2 - 2K_{12}a_2 - K_{11}\zeta y_2 + K_{12}\zeta y_2 + y_1y_2 - 1 - v_1y_2 + y_2v_2 = 0$$

    Substituting $\zeta = a_1^{old}y_1 + a_2^{old}y_2$ and the definitions of $E_1, E_2$, and writing

    $$\eta = K_{11} + K_{22} - 2K_{12}$$

    this yields

    $$a_2^{new,unc} = a_2^{old} + \frac{y_2(E_1 - E_2)}{\eta}$$
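The clipped two-variable update above can be sketched in a few lines. The inputs (old multipliers, labels, errors $E_i$, kernel entries, and $C$) are assumed given, and the sketch assumes the usual case $\eta > 0$:

```python
def smo_pair_update(a1, a2, y1, y2, E1, E2, K11, K22, K12, C):
    # box bounds [L, H] for a2_new
    if y1 != y2:
        L, H = max(0.0, a2 - a1), min(C, C + a2 - a1)
    else:
        L, H = max(0.0, a2 + a1 - C), min(C, a2 + a1)
    eta = K11 + K22 - 2.0 * K12          # assumed > 0 here
    a2_unc = a2 + y2 * (E1 - E2) / eta   # unclipped minimizer
    a2_new = min(H, max(L, a2_unc))      # clip to [L, H]
    a1_new = a1 + y1 * y2 * (a2 - a2_new)  # keeps a1*y1 + a2*y2 constant
    return a1_new, a2_new

# toy numbers, not taken from any specific dataset
print(smo_pair_update(0.2, 0.3, 1, -1, 0.5, -0.5, 2.0, 2.0, 1.0, 1.0))
```

A quick sanity check on the output is that $a_1^{new}y_1 + a_2^{new}y_2$ equals $a_1^{old}y_1 + a_2^{old}y_2$, since the update moves along the equality constraint.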

Choosing the Variables

  1. Choosing the first variable

    The KKT conditions are

    $$a_i = 0 \iff y_ig(x_i) \ge 1$$

    $$0 < a_i < C \iff y_ig(x_i) = 1$$

    $$a_i = C \iff y_ig(x_i) \le 1$$

    First scan the points with $0 < a_i < C$ for violations of the second condition; if none violate it, scan the entire training set for violators of the other conditions.

  2. Choosing the second variable

    Given the first variable, choose $a_2$ so that its update step is as large as possible, i.e. maximize $|E_1 - E_2|$:

  • If $E_1 > 0$, choose the smallest $E_2$
  • If $E_1 < 0$, choose the largest $E_2$

    Prefer points on the margin boundary; if none yields sufficient progress, scan the entire training set; if that also fails, discard the current first variable and choose a new $a_1$.

  3. Computing $b$ and $E_i$

    By the KKT conditions, if $0 < a_1^{new} < C$,

    $$\sum_{i=1}^N a_iy_iK_{i1} + b = y_1$$

    so

    $$b_1^{new} = y_1 - \sum_{i=3}^N a_iy_iK_{i1} - a_1^{new}y_1K_{11} - a_2^{new}y_2K_{21}$$

    Also

    $$E_1 = \sum_{i=3}^N a_iy_iK_{i1} + a_1^{old}y_1K_{11} + a_2^{old}y_2K_{21} + b^{old} - y_1$$

    Combining the two,

    $$b_1^{new} = -E_1 - y_1K_{11}(a_1^{new} - a_1^{old}) - y_2K_{21}(a_2^{new} - a_2^{old}) + b^{old}$$

    Similarly, if $0 < a_2^{new} < C$,

    $$b_2^{new} = -E_2 - y_1K_{12}(a_1^{new} - a_1^{old}) - y_2K_{22}(a_2^{new} - a_2^{old}) + b^{old}$$

    If both $a_1^{new}$ and $a_2^{new}$ lie strictly inside $(0, C)$, then $b_1^{new} = b_2^{new}$.

    If $a_1^{new}$ and $a_2^{new}$ are both $0$ or $C$, take $b^{new} = \frac{b_1^{new} + b_2^{new}}{2}$.

    Finally, update

    $$E_i^{new} = \sum_{j \in S} y_ja_jK(x_i, x_j) + b^{new} - y_i$$

    where $S$ is the set of support vectors.
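The $b$ update rule above translates directly into code. A minimal sketch with assumed inputs (the kernel entries, errors, and multipliers are toy numbers):

```python
def new_b(b_old, E1, E2, y1, y2, a1_old, a1_new, a2_old, a2_new,
          K11, K12, K21, K22, C):
    # b1_new / b2_new from the formulas combining E_i and the a-updates
    b1 = -E1 - y1 * K11 * (a1_new - a1_old) - y2 * K21 * (a2_new - a2_old) + b_old
    b2 = -E2 - y1 * K12 * (a1_new - a1_old) - y2 * K22 * (a2_new - a2_old) + b_old
    if 0 < a1_new < C:        # a1_new interior: b1 is exact
        return b1
    if 0 < a2_new < C:        # a2_new interior: b2 is exact
        return b2
    return (b1 + b2) / 2.0    # both at a bound: take the midpoint

print(new_b(0.0, 0.5, -0.5, 1, -1, 0.2, 0.0, 0.3, 0.1,
            2.0, 1.0, 1.0, 2.0, 1.0))
```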

The SMO Algorithm

Input: a training set $T = \{(x_1,y_1),(x_2,y_2),...,(x_N,y_N)\}$, $x_i \in R^n$, $y_i \in \{-1,+1\}$, and a tolerance $\epsilon$

Output: an approximate solution $\hat{a}$

(1) Initialize $a^{(0)} = 0$, $k = 0$.

(2) Select the two variables $a_1^{(k)}, a_2^{(k)}$ as described above and solve the two-variable subproblem for $a_1^{(k+1)}, a_2^{(k+1)}$.

(3) Stop if the following conditions hold within tolerance $\epsilon$:

$$\sum_{i=1}^N a_iy_i = 0, \qquad 0 \le a_i \le C, \quad i = 1, 2, ..., N$$

$$y_i \cdot g(x_i) \begin{cases} \ge 1 & \{x_i \mid a_i = 0\} \\ = 1 & \{x_i \mid 0 < a_i < C\} \\ \le 1 & \{x_i \mid a_i = C\} \end{cases}$$

where

$$g(x_i) = \sum_{j=1}^N a_jy_jK(x_j, x_i) + b$$

Otherwise set $k = k + 1$ and return to (2).

(4) Return $\hat{a} = a^{(k+1)}$.
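The per-point part of the stopping test in step (3) can be sketched as a small predicate. The tolerance handling below (treating $a_i$ within `eps` of a bound as being at that bound) is one reasonable choice, not the only one:

```python
def violates_kkt(a_i, y_i, g_i, C, eps=1e-3):
    # g_i is g(x_i); m = y_i * g(x_i) is the (functional) margin of x_i
    m = y_i * g_i
    if a_i < eps:             # a_i == 0   requires  y_i g(x_i) >= 1
        return m < 1.0 - eps
    if a_i > C - eps:         # a_i == C   requires  y_i g(x_i) <= 1
        return m > 1.0 + eps
    return abs(m - 1.0) > eps  # 0 < a_i < C requires y_i g(x_i) == 1

print(violates_kkt(0.0, 1, 1.2, C=1.0))  # False: margin >= 1 holds at a_i = 0
print(violates_kkt(0.5, 1, 1.2, C=1.0))  # True: interior a_i needs margin == 1
```

SMO stops once no training point violates this predicate (and the equality constraint $\sum_i a_i y_i = 0$ holds, which the pairwise updates preserve by construction).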
