判決函數 G m ( x ) G_m(x) Gm(x)
基 本 形 式 : f M ( x ) = ∑ m = 1 M α m G m ( x ) 代 價 函 數 : L o s s ( y , f ( x ) ) = ∑ i N exp ( − y i f ( x i ) ) 基本形式: f_M(x)=\sum_{m=1}^{M}\alpha_mG_m(x) \\ 代價函數: Loss(y,f(x))=\sum_i^N\exp({-y_if(x_i)}) \\ 基本形式:fM(x)=m=1∑MαmGm(x)代價函數:Loss(y,f(x))=i∑Nexp(−yif(xi))
假定我們通過某種手段,通過m-1輪疊代計算得到 f m − 1 ( x ) f_{m-1}(x) fm−1(x)使得 L o s s ( x , f m − 1 ( x ) ) Loss(x,f_{m-1}(x)) Loss(x,fm−1(x))最小
現準備通過第m輪疊代計算使得 L o s s ( x , f m ( x ) ) Loss(x,f_{m}(x)) Loss(x,fm(x))最小的 f m ( x ) f_m(x) fm(x)
根據預測函數的基本形式可以得到如下遞歸形式:
f m ( x ) = f m − 1 ( x ) + α m G m ( x ) f_m(x)=f_{m-1}(x)+\alpha_mG_m(x) \\ fm(x)=fm−1(x)+αmGm(x)
進而得到如下損失函數形式
L o s s ( y , f m ( x ) ) = ∑ i N exp ( − y i f m ( x i ) ) = ∑ i N exp { − y i [ f m − 1 ( x i ) + α m G m ( x i ) ] } Loss(y,f_m(x))=\sum_i^N\exp({-y_if_m(x_i)})=\sum_i^N\exp\{{-y_i[f_{m-1}(x_i)+\alpha_mG_m(x_i)]}\} \\ Loss(y,fm(x))=i∑Nexp(−yifm(xi))=i∑Nexp{−yi[fm−1(xi)+αmGm(xi)]}
接下來的目标就是計算使得 L o s s ( y , f m ( x ) ) Loss(y,f_m(x)) Loss(y,fm(x))最小的 α m , G m ( x ) \alpha_m,G_m(x) αm,Gm(x)
( α m , G m ( x ) ) = arg min α m , G m ( x ) L o s s ( y , f m ( x ) ) (\alpha_m,G_m(x))=\mathop{\arg\min}\limits_{\alpha_m,G_m(x)}Loss(y,f_m(x)) (αm,Gm(x))=αm,Gm(x)argminLoss(y,fm(x))
即
( α m , G m ( x ) ) = arg min α m , G m ( x ) ∑ i N exp { − y i [ f m − 1 ( x i ) + α m G m ( x i ) ] } (1) (\alpha_m,G_m(x))=\mathop{\arg\min}\limits_{\alpha_m,G_m(x)}\sum_i^N\exp\{{-y_i[f_{m-1}(x_i)+\alpha_mG_m(x_i)]}\}\tag{1} (αm,Gm(x))=αm,Gm(x)argmini∑Nexp{−yi[fm−1(xi)+αmGm(xi)]}(1)
由于我們分析的前提是預設 f m − 1 ( x ) f_{m-1}(x) fm−1(x)已經是最佳函數,且顯然 f m − 1 ( x ) f_{m-1}(x) fm−1(x)的取值與 α m , G m ( x ) \alpha_m,G_m(x) αm,Gm(x)無關,是以令
w ˉ m , i = e − y i f m − 1 ( x i ) (2) \bar{w}_{m,i}=e^{-y_if_{m-1}(x_i)}\tag{2} wˉm,i=e−yifm−1(xi)(2)
将 ( 2 ) (2) (2)帶入 ( 1 ) (1) (1)得到
( α m , G m ( x ) ) = arg min α m , G m ( x ) ∑ i N w ˉ m , i exp [ − y i α m G m ( x i ) ] (3) (\alpha_m,G_m(x))=\mathop{\arg\min}\limits_{\alpha_m,G_m(x)}\sum_i^N\bar{w}_{m,i}\exp{[-y_i\alpha_mG_m(x_i)]}\tag{3} (αm,Gm(x))=αm,Gm(x)argmini∑Nwˉm,iexp[−yiαmGm(xi)](3)
定義訓示函數
I ( condition ) I(\text{condition}) I(condition)
當condition滿足時函數值取1,否則取0。
此時(3)式可以改寫為如下形式
( α m , G m ( x ) ) = arg min α m , G m ( x ) ∑ i N w ˉ m , i exp [ − y i α m G m ( x i ) ] = ① arg min α m , G m ( x ) [ ∑ y i = G m ( x i ) w ˉ m , i e − α m + ∑ y i ≠ G m ( x i ) w ˉ m , i e α m ] (5) \begin{aligned} (\alpha_m,G_m(x)) & =\mathop{\arg\min}\limits_{\alpha_m,G_m(x)}\sum_i^N\bar{w}_{m,i}\exp{[-y_i\alpha_mG_m(x_i)]}\\ & \mathop{=}\limits^{①} \mathop{\arg\min}\limits_{\alpha_m,G_m(x)}[\sum_{y_i=G_m(x_i)}\bar{w}_{m,i}e^{-\alpha_m}+ \sum_{y_i ≠G_m(x_i)}\bar{w}_{m,i}e^{\alpha_m}]\\ \tag{5} \end{aligned} (αm,Gm(x))=αm,Gm(x)argmini∑Nwˉm,iexp[−yiαmGm(xi)]=①αm,Gm(x)argmin[yi=Gm(xi)∑wˉm,ie−αm+yi=Gm(xi)∑wˉm,ieαm](5)
觀察可知 e − α m , e α m e^{-\alpha_m},e^{\alpha_m} e−αm,eαm是與 G m ( x ) G_m(x) Gm(x)無關的變量,是以上式可以改寫為如下形式
( α m , G m ( x ) ) = arg min α m , G m ( x ) [ ∑ y i = G m ( x i ) w ˉ m , i e − α m + ∑ y i ≠ G m ( x i ) w ˉ m , i e α m ] = arg min α m , G m ( x ) [ e − α m ∑ y i = G m ( x i ) w ˉ m , i + e α m ∑ y i ≠ G m ( x i ) w ˉ m , i ] = arg min α m , G m ( x ) [ e − α m ∑ i N w ˉ m , i I ( y i = G m ( x i ) ) ] + e α m ∑ i N w ˉ m , i I ( y i ≠ G m ( x i ) ) ] (6) \begin{aligned} (\alpha_m,G_m(x)) & = \mathop{\arg\min}\limits_{\alpha_m,G_m(x)}[\sum_{y_i=G_m(x_i)}\bar{w}_{m,i}e^{-\alpha_m}+ \sum_{y_i ≠G_m(x_i)}\bar{w}_{m,i}e^{\alpha_m}]\\ & = \mathop{\arg\min}\limits_{\alpha_m,G_m(x)}[e^{-\alpha_m}\sum_{y_i=G_m(x_i)}\bar{w}_{m,i}+ e^{\alpha_m}\sum_{y_i ≠G_m(x_i)}\bar{w}_{m,i}]\\ & = \mathop{\arg\min}\limits_{\alpha_m,G_m(x)}[e^{-\alpha_m}\sum_{i}^N\bar{w}_{m,i}I(y_i = G_m(x_i))]+ e^{\alpha_m}\sum_{i}^N\bar{w}_{m,i}I(y_i \neq G_m(x_i))]\\ \tag{6} \end{aligned} (αm,Gm(x))=αm,Gm(x)argmin[yi=Gm(xi)∑wˉm,ie−αm+yi=Gm(xi)∑wˉm,ieαm]=αm,Gm(x)argmin[e−αmyi=Gm(xi)∑wˉm,i+eαmyi=Gm(xi)∑wˉm,i]=αm,Gm(x)argmin[e−αmi∑Nwˉm,iI(yi=Gm(xi))]+eαmi∑Nwˉm,iI(yi=Gm(xi))](6)
定義帶權重 w m i w_{mi} wmi、歸一化的“誤差函數”如下
e m = ∑ i N w ˉ m , i I ( y i ≠ G m ( x i ) ) ∑ i N w ˉ m , i (7) e_m=\frac{\sum_{i}^N\bar{w}_{m,i}I(y_i \neq G_m(x_i))}{\sum_{i}^N\bar{w}_{m,i}}\tag{7} em=∑iNwˉm,i∑iNwˉm,iI(yi=Gm(xi))(7)
根據(7)式,可以改寫(6)式為如下形式。
( α m , G m ( x ) ) = arg min α m , G m ( x ) [ e − α m ( 1 − e m ) + e α m e m ] (8) (\alpha_m,G_m(x))=\mathop{\arg\min}\limits_{\alpha_m,G_m(x)}[e^{-\alpha_m}(1-e_m)+ e^{\alpha_m}e_m]\tag{8} (αm,Gm(x))=αm,Gm(x)argmin[e−αm(1−em)+eαmem](8)
注:實際上優化目标函數改寫前後相差一個常數倍數 ∑ i N w ˉ m , i \sum_{i}^N\bar{w}_{m,i} ∑iNwˉm,i,這實際上對優化結果沒有影響。
為了友善後續分析,令
L ( e m , α m ) = e − α m ( 1 − e m ) + e α m e m (9) L(e_m,\alpha_m)=e^{-\alpha_m}(1-e_m)+ e^{\alpha_m}e_m \tag{9} L(em,αm)=e−αm(1−em)+eαmem(9)
(9)式中 C C C隻與 α m \alpha_m αm有關, e m e_m em則是隻與 G m ( x ) G_m(x) Gm(x)有關,是以對 e m e_m em求偏微分得到:
∂ L ( e m , α m ) ∂ e m = e α m − e − α m \frac{\partial L(e_m,\alpha_m)}{\partial e_m}=e^{\alpha_m}-e^{-\alpha_m} ∂em∂L(em,αm)=eαm−e−αm
當 α m > 0 \alpha_m>0 αm>0時, e − α m < 1 e^{-\alpha_m}<1 e−αm<1,是以
∂ L ( e m , α m ) ∂ e m > 0 \frac{\partial L(e_m,\alpha_m)}{\partial e_m}>0 ∂em∂L(em,αm)>0
是以若希望使損失函數最小,要使得誤差最小;根據式 ( 7 ) (7) (7)和判決函數 G m ( x ) G_m(x) Gm(x),使 e m e_m em最小就是選擇特定的 G m ( x ) G_m(x) Gm(x),使得誤判權重最小。
再對 α m \alpha_m αm求偏導
∂ L ( e m , α m ) ∂ α m = e − α m ( e m − 1 ) + e α m e m \frac{\partial L(e_m,\alpha_m)}{\partial \alpha_m}=e^{-\alpha_m}(e_m-1)+e^{\alpha_m}e_m ∂αm∂L(em,αm)=e−αm(em−1)+eαmem
令 ∂ L ( e m , α m ) ∂ α m = 0 \frac{\partial L(e_m,\alpha_m)}{\partial \alpha_m}=0 ∂αm∂L(em,αm)=0,即:
e − α m ( e m − 1 ) + e α m e m = 0 e^{-\alpha_m}(e_m-1)+e^{\alpha_m}e_m=0 e−αm(em−1)+eαmem=0
解得 α m = 1 2 log 1 − e m e m \alpha_m=\frac{1}{2}\log{\frac{1-e_m}{e_m}} αm=21logem1−em
當選取最優的 e m e_m em時,一定會使得 e m < 0.5 e_m<0.5 em<0.5,是以 α m > 0 \alpha_m>0 αm>0,與假設相符。