
Notes on 《統計學習方法》 (*Statistical Learning Methods*): Deriving the AdaBoost Formulas via the Forward Stagewise Algorithm

The decision function (base classifier) is denoted $G_m(x)$ and takes values in $\{-1,+1\}$.

Basic form:

$$f_M(x)=\sum_{m=1}^{M}\alpha_m G_m(x)$$

Cost function:

$$Loss(y,f(x))=\sum_i^N\exp(-y_i f(x_i))$$
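As a quick numerical companion to these two definitions, here is a minimal numpy sketch; the names `classifiers` and `alphas` and the $\{-1,+1\}$ label convention are illustrative assumptions, not code from the book:

```python
import numpy as np

def additive_predict(X, classifiers, alphas):
    """f_M(x) = sum_m alpha_m * G_m(x); each G_m maps X to {-1, +1}."""
    return sum(a * G(X) for a, G in zip(alphas, classifiers))

def exp_loss(y, fx):
    """Loss(y, f(x)) = sum_i exp(-y_i * f(x_i)), with y_i in {-1, +1}."""
    return np.sum(np.exp(-y * fx))
```

The exponential loss penalizes a confidently wrong prediction ($y_i f(x_i)\ll 0$) exponentially hard, which is what drives the sample reweighting derived below.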

Suppose that, by some means, $m-1$ rounds of iteration have produced the $f_{m-1}(x)$ that minimizes $Loss(y,f_{m-1}(x))$.

We now want the $m$-th iteration to produce the $f_m(x)$ that minimizes $Loss(y,f_m(x))$.

From the basic form of the prediction function, we obtain the following recursion:

$$f_m(x)=f_{m-1}(x)+\alpha_m G_m(x)$$

and hence the loss function takes the following form:

$$Loss(y,f_m(x))=\sum_i^N\exp(-y_i f_m(x_i))=\sum_i^N\exp\{-y_i[f_{m-1}(x_i)+\alpha_m G_m(x_i)]\}$$

The next goal is to compute the $\alpha_m$ and $G_m(x)$ that minimize $Loss(y,f_m(x))$:

$$(\alpha_m,G_m(x))=\mathop{\arg\min}\limits_{\alpha_m,G_m(x)}Loss(y,f_m(x))$$

$$(\alpha_m,G_m(x))=\mathop{\arg\min}\limits_{\alpha_m,G_m(x)}\sum_i^N\exp\{-y_i[f_{m-1}(x_i)+\alpha_m G_m(x_i)]\}\tag{1}$$

Since the premise of our analysis is that $f_{m-1}(x)$ is already the optimal function, and $f_{m-1}(x)$ clearly does not depend on $\alpha_m$ or $G_m(x)$, let

$$\bar{w}_{m,i}=e^{-y_i f_{m-1}(x_i)}\tag{2}$$

Substituting (2) into (1) gives

$$(\alpha_m,G_m(x))=\mathop{\arg\min}\limits_{\alpha_m,G_m(x)}\sum_i^N\bar{w}_{m,i}\exp[-y_i\alpha_m G_m(x_i)]\tag{3}$$
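A useful side observation, which follows directly from the definitions above: combining (2) with the recursion $f_m(x)=f_{m-1}(x)+\alpha_m G_m(x)$ yields the multiplicative weight update familiar from AdaBoost,

$$\bar{w}_{m+1,i}=e^{-y_i f_m(x_i)}=e^{-y_i f_{m-1}(x_i)}\,e^{-y_i\alpha_m G_m(x_i)}=\bar{w}_{m,i}\,e^{-y_i\alpha_m G_m(x_i)}$$

so the weights can be maintained incrementally rather than recomputed from $f_{m-1}$ at every round.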

Define the indicator function

$$I(\text{condition})$$

which equals 1 when the condition holds and 0 otherwise.

Equation (3) can now be rewritten in the following form:

$$\begin{aligned} (\alpha_m,G_m(x)) &=\mathop{\arg\min}\limits_{\alpha_m,G_m(x)}\sum_i^N\bar{w}_{m,i}\exp[-y_i\alpha_m G_m(x_i)]\\ &\mathop{=}\limits^{①}\mathop{\arg\min}\limits_{\alpha_m,G_m(x)}\Big[\sum_{y_i=G_m(x_i)}\bar{w}_{m,i}e^{-\alpha_m}+\sum_{y_i\neq G_m(x_i)}\bar{w}_{m,i}e^{\alpha_m}\Big] \end{aligned}\tag{4}$$

Step ① holds because $y_i,G_m(x_i)\in\{-1,+1\}$: the product $y_i G_m(x_i)$ equals $1$ when $y_i=G_m(x_i)$ and $-1$ otherwise, so the exponent $-y_i\alpha_m G_m(x_i)$ is $-\alpha_m$ on correctly classified samples and $+\alpha_m$ on misclassified ones.

Observe that $e^{-\alpha_m}$ and $e^{\alpha_m}$ do not depend on $G_m(x)$, so the expression above can be rewritten as follows:

$$\begin{aligned} (\alpha_m,G_m(x)) &=\mathop{\arg\min}\limits_{\alpha_m,G_m(x)}\Big[\sum_{y_i=G_m(x_i)}\bar{w}_{m,i}e^{-\alpha_m}+\sum_{y_i\neq G_m(x_i)}\bar{w}_{m,i}e^{\alpha_m}\Big]\\ &=\mathop{\arg\min}\limits_{\alpha_m,G_m(x)}\Big[e^{-\alpha_m}\sum_{y_i=G_m(x_i)}\bar{w}_{m,i}+e^{\alpha_m}\sum_{y_i\neq G_m(x_i)}\bar{w}_{m,i}\Big]\\ &=\mathop{\arg\min}\limits_{\alpha_m,G_m(x)}\Big[e^{-\alpha_m}\sum_i^N\bar{w}_{m,i}I(y_i=G_m(x_i))+e^{\alpha_m}\sum_i^N\bar{w}_{m,i}I(y_i\neq G_m(x_i))\Big] \end{aligned}\tag{5}$$

Define the normalized, weighted "error rate", with weights $\bar{w}_{m,i}$, as follows:

$$e_m=\frac{\sum_i^N\bar{w}_{m,i}I(y_i\neq G_m(x_i))}{\sum_i^N\bar{w}_{m,i}}\tag{6}$$
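In code, (6) is simply a weighted misclassification rate. A minimal sketch, assuming `w_bar`, `y`, and `G_x` are numpy arrays holding the unnormalized weights $\bar{w}_{m,i}$, the labels $y_i$, and the outputs $G_m(x_i)$:

```python
import numpy as np

def weighted_error(w_bar, y, G_x):
    """e_m: share of the total weight carried by misclassified samples (eq. 6)."""
    return np.sum(w_bar * (y != G_x)) / np.sum(w_bar)
```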

Using (6), equation (5) can be rewritten as follows. After dividing by the constant $\sum_i^N\bar{w}_{m,i}$, the weight fraction of misclassified samples is $e_m$, and since every sample is either correctly classified or misclassified, the correctly classified fraction is $1-e_m$:

$$(\alpha_m,G_m(x))=\mathop{\arg\min}\limits_{\alpha_m,G_m(x)}\big[e^{-\alpha_m}(1-e_m)+e^{\alpha_m}e_m\big]\tag{7}$$

Note: the objective before and after this rewriting differs by the constant factor $\sum_i^N\bar{w}_{m,i}$, which has no effect on the optimization result.

For convenience in the subsequent analysis, let

$$L(e_m,\alpha_m)=e^{-\alpha_m}(1-e_m)+e^{\alpha_m}e_m\tag{8}$$

In (8), the coefficients $e^{-\alpha_m}$ and $e^{\alpha_m}$ depend only on $\alpha_m$, while $e_m$ depends only on $G_m(x)$, so the two can be treated separately. Taking the partial derivative with respect to $e_m$:

$$\frac{\partial L(e_m,\alpha_m)}{\partial e_m}=e^{\alpha_m}-e^{-\alpha_m}$$

When $\alpha_m>0$, we have $e^{\alpha_m}>1>e^{-\alpha_m}$, and therefore

$$\frac{\partial L(e_m,\alpha_m)}{\partial e_m}>0$$

Therefore, to minimize the loss we must minimize the error; by (6) and the definition of the decision function $G_m(x)$, minimizing $e_m$ means choosing the particular $G_m(x)$ that minimizes the total weight of the misclassified samples.

Now take the partial derivative with respect to $\alpha_m$:

$$\frac{\partial L(e_m,\alpha_m)}{\partial \alpha_m}=e^{-\alpha_m}(e_m-1)+e^{\alpha_m}e_m$$

Setting $\frac{\partial L(e_m,\alpha_m)}{\partial \alpha_m}=0$, i.e.:

$$e^{-\alpha_m}(e_m-1)+e^{\alpha_m}e_m=0$$

Multiplying both sides by $e^{\alpha_m}$ gives $e^{2\alpha_m}e_m=1-e_m$, i.e. $e^{2\alpha_m}=\frac{1-e_m}{e_m}$, and therefore

$$\alpha_m=\frac{1}{2}\log\frac{1-e_m}{e_m}$$
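As a quick numerical check of this closed form (with $e_m=0.3$ as an arbitrary example value), a brute-force grid minimization of (8) lands on the same $\alpha_m$:

```python
import numpy as np

e_m = 0.3                                    # arbitrary example error rate
alphas = np.linspace(-2.0, 2.0, 100001)      # brute-force search grid
L = np.exp(-alphas) * (1 - e_m) + np.exp(alphas) * e_m   # objective (8)
print(alphas[np.argmin(L)])                  # ~0.4236, grid minimizer
print(0.5 * np.log((1 - e_m) / e_m))         # ~0.4236, closed form
```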

Choosing the optimal $G_m(x)$ ensures $e_m<0.5$ (any classifier with weighted error above $0.5$ can be flipped into one with error below $0.5$), and therefore $\alpha_m>0$, consistent with the assumption made above.
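Putting the pieces together, the sketch below follows the derivation step by step: pick the decision stump $G_m$ that minimizes the weighted error (6), set $\alpha_m$ by the closed form just derived, and update the weights multiplicatively as implied by (2). The stump representation and all helper names are illustrative assumptions, not the book's reference implementation:

```python
import numpy as np

def fit_stump(X, y, w):
    """Exhaustive search for the 1-D threshold stump minimizing e_m (eq. 6)."""
    best_err, best_params = np.inf, None
    for j in range(X.shape[1]):                  # feature index
        for thr in np.unique(X[:, j]):           # candidate threshold
            for sign in (1, -1):                 # stump polarity
                pred = np.where(X[:, j] <= thr, sign, -sign)
                err = np.sum(w * (pred != y)) / np.sum(w)
                if err < best_err:
                    best_err, best_params = err, (j, thr, sign)
    return best_err, best_params

def adaboost(X, y, M=20, eps=1e-12):
    """y in {-1, +1}. Returns (alphas, stumps) defining f_M(x)."""
    w = np.ones(len(y))                          # bar-w_{1,i} = 1, since f_0 = 0
    alphas, stumps = [], []
    for _ in range(M):
        e_m, (j, thr, sign) = fit_stump(X, y, w)
        alpha = 0.5 * np.log((1 - e_m + eps) / (e_m + eps))  # closed-form alpha_m
        pred = np.where(X[:, j] <= thr, sign, -sign)
        w = w * np.exp(-y * alpha * pred)        # weight update implied by (2)
        alphas.append(alpha)
        stumps.append((j, thr, sign))
    return alphas, stumps

def predict(X, alphas, stumps):
    """sign(f_M(x)) with f_M(x) = sum_m alpha_m * G_m(x)."""
    fx = sum(a * np.where(X[:, j] <= thr, s, -s)
             for a, (j, thr, s) in zip(alphas, stumps))
    return np.sign(fx)
```

The `eps` guard only protects the degenerate cases $e_m\in\{0,1\}$; at $e_m=0.5$ (a weighted random guess) the formula correctly gives $\alpha_m=0$.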
