Regularization on GBDT

2017-12-03 23:50:00

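A previous post briefly covered how XGBoost's implementation differs from an ordinary GBDT implementation. This post tries to summarize the regularization techniques used in GBDT.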

In machine learning, early stopping is a form of regularization used to avoid overfitting when training a learner with an iterative method, such as gradient descent.

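Concretely, part of the samples is held out as a validation set. While the model is iteratively fitted to the training set, training stops once the error rate on the validation set no longer decreases. In other words, early stopping controls the number of iterations (the number of trees).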
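The stopping rule can be sketched in a few lines. This is a minimal illustration, not any library's implementation: `best_iteration`, the `patience` parameter, and the list of validation errors are all hypothetical names and numbers chosen for the example.

```python
def best_iteration(val_errors, patience=3):
    """Return how many trees to keep under a simple early-stopping rule.

    val_errors[m - 1] is the validation error after m trees (made-up input).
    Training stops once the error has not improved for `patience` rounds.
    """
    best_err = float("inf")
    best_m = 0
    for m, err in enumerate(val_errors, start=1):
        if err < best_err:
            best_err, best_m = err, m  # new best: remember this round
        elif m - best_m >= patience:
            break  # no improvement for `patience` rounds: stop training
    return best_m


# Hypothetical validation errors recorded after each boosting round.
errors = [0.30, 0.24, 0.21, 0.20, 0.21, 0.22, 0.23, 0.19]
n_trees = best_iteration(errors, patience=3)
```

With a small `patience` the late improvement in the last round is never seen, which is the usual trade-off of this rule. As far as I know, scikit-learn's gradient boosting estimators expose a similar rule through the `n_iter_no_change` and `validation_fraction` parameters.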

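Shrinkage multiplies the output of each tree by a factor $\nu$ ($0 < \nu < 1$), where $\sum_{j=1}^{J_m} \gamma_{jm} I(x \in R_{jm})$ is the output of the $m$-th tree and $f_m(x)$ is the ensemble of the first $m$ trees:

$$f_m(x) = f_{m-1}(x) + \nu \cdot \sum_{j=1}^{J_m} \gamma_{jm} I(x \in R_{jm})$$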

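The ESL book puts it this way:

The parameter $\nu$ can be regarded as controlling the learning rate of the boosting procedure.

$\nu$ and the number of iterations $M$ (the number of trees) trade off against each other. The recommendation is to set $\nu$ small (for example 0.1) and $M$ larger. This usually gives better accuracy, at the cost of longer training time (proportional to $M$).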

Below is the snippet from Sklearn's implementation that sets this parameter; XGBoost is similar:
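As a stand-in for that snippet, here is a toy sketch (not Sklearn's source code) of the shrinkage update for squared-error loss on one-dimensional data. The helpers `fit_stump`, `boost`, and `mse`, and the data, are assumptions made up for illustration.

```python
def fit_stump(xs, targets):
    """Least-squares regression stump on 1-D inputs (toy weak learner)."""
    best = None
    for t in sorted(set(xs))[:-1]:  # thresholds that keep both sides non-empty
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, lmean, rmean)
    _, t, lmean, rmean = best
    return lambda x: lmean if x <= t else rmean


def boost(xs, ys, n_trees, nu):
    """Squared-error boosting; returns the training predictions f_M(x_i)."""
    preds = [0.0] * len(xs)
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        # f_m(x) = f_{m-1}(x) + nu * (output of the m-th tree)
        preds = [p + nu * tree(x) for p, x in zip(preds, xs)]
    return preds


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0.0, 0.8, 0.9, 0.1, -0.8, -1.0, -0.3, 0.7]
mse_few = mse(ys, boost(xs, ys, n_trees=10, nu=0.1))
mse_many = mse(ys, boost(xs, ys, n_trees=200, nu=0.1))
```

With a small $\nu$ each tree moves the fit only a little, so more trees are needed to reach the same training error. In scikit-learn's gradient boosting estimators the two knobs are `learning_rate` and `n_estimators`; XGBoost calls the shrinkage factor `eta` (alias `learning_rate`).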

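Subsampling comes from the idea of bootstrap averaging (bagging). In GBDT, each round builds its tree on a fraction $\eta$ of the training set, drawn at random without replacement; a typical value of $\eta$ is 0.5. This both regularizes the model and reduces computation time.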

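In fact, the XGBoost and Sklearn implementations both borrow from random forests: besides sampling at the sample level, they also sample features. That is, when building a tree, the best split is searched for only among a randomly selected subset of feature columns. Below is the snippet with the relevant parameter settings in Sklearn: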
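As a stand-in for that snippet, the two kinds of sampling can be sketched with the standard library alone. This is an illustration rather than either library's code: `sample_rows_and_features` and `feature_fraction` are hypothetical names, and the sketch draws the feature subset once per tree for simplicity.

```python
import random


def sample_rows_and_features(n_rows, n_features, eta=0.5,
                             feature_fraction=0.5, rng=None):
    """Pick the rows and feature columns one tree is allowed to see."""
    rng = rng or random.Random()
    n_r = max(1, int(eta * n_rows))
    n_f = max(1, int(feature_fraction * n_features))
    # random.sample draws WITHOUT replacement, unlike a bootstrap sample
    rows = sorted(rng.sample(range(n_rows), n_r))
    cols = sorted(rng.sample(range(n_features), n_f))
    return rows, cols


rng = random.Random(0)
per_round = [sample_rows_and_features(10, 4, eta=0.5, feature_fraction=0.5, rng=rng)
             for _ in range(3)]  # a fresh subset for each boosting round
for m, (rows, cols) in enumerate(per_round, start=1):
    print(f"round {m}: rows={rows} features={cols}")
```

In scikit-learn's gradient boosting estimators the corresponding arguments are `subsample` (row fraction) and `max_features` (features considered when looking for a split); XGBoost has `subsample` and `colsample_bytree`, among other column-sampling options.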

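Adding the complexity of the tree model explicitly to the optimization objective as a regularization term is what sets XGBoost's implementation apart.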

$$\mathcal{L}^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t)$$

where

$$\Omega(f) = \gamma T + \frac{1}{2} \lambda \|w\|^2$$

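My personal view is that adding tree complexity to the optimization objective as a regularization term is more direct than controlling the complexity of each round's tree by hand through parameters. This may be an important reason why XGBoost performs better than ordinary GBDT implementations. Unfortunately, Sklearn has no corresponding implementation for now.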
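To make the penalty $\Omega(f)$ concrete, here is a small sketch that evaluates it for one tree, where $T$ is the number of leaves and $w$ the vector of leaf weights. The second helper uses the closed-form leaf weight $w^* = -G / (H + \lambda)$ from the XGBoost paper's derivation, which this post does not walk through; the function names and numbers are made up for illustration.

```python
def omega(leaf_weights, gamma, lam):
    """Omega(f) = gamma * T + 0.5 * lam * ||w||^2 for a single tree."""
    n_leaves = len(leaf_weights)  # T in the formula
    return gamma * n_leaves + 0.5 * lam * sum(w * w for w in leaf_weights)


def optimal_leaf_weight(grad_sum, hess_sum, lam):
    """Closed-form leaf weight -G / (H + lambda) (from the XGBoost paper).

    grad_sum and hess_sum are the sums of first and second derivatives of
    the loss over the samples that fall into the leaf.
    """
    return -grad_sum / (hess_sum + lam)


weights = [0.5, -1.0, 2.0]                      # hypothetical leaf weights
penalty = omega(weights, gamma=1.0, lam=2.0)    # 1*3 + 0.5*2*(0.25 + 1 + 4)

w_plain = optimal_leaf_weight(grad_sum=-4.0, hess_sum=2.0, lam=0.0)
w_reg = optimal_leaf_weight(grad_sum=-4.0, hess_sum=2.0, lam=2.0)
```

A larger $\lambda$ pulls each leaf weight toward zero, while $\gamma$ charges a fixed price per leaf. In XGBoost these two constants are exposed as the `gamma` and `lambda` parameters (`reg_lambda` in the scikit-learn style wrapper).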

The DART paper points out that GBDT can suffer from over-specialization:

Trees added at later iterations tend to impact the prediction of only a few instances, and they make negligible contribution towards the prediction of all the remaining instances. We call this issue of subsequent trees affecting the prediction of only a small fraction of the training instances over-specialization.

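In other words, the trees from early iterations contribute most of the prediction, while later trees concentrate on correcting the errors of a small fraction of the samples. Shrinkage can alleviate over-specialization, but not very well. The authors want to use Dropout to balance the contribution of all trees to the prediction, with the effect shown in the figure below: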

The concrete approach is as follows:

DART diverges from MART at two places. First, when computing the gradient that the next tree will fit, only a random subset of the existing ensemble is considered. The second place at which DART diverges from MART is when adding the new tree to the ensemble where DART performs a normalization step.

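Put simply, each newly added tree does not fit the residual of the ensemble of all previous trees, but the residual of an ensemble of randomly drawn trees; in addition, the output of the new tree is normalized.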
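The two steps can be sketched on toy data. This is my reading of the procedure for squared-error loss, not the authors' code: I assume that at least one existing tree is dropped each round, that with $k$ dropped trees the new tree gets weight $1/(k+1)$, and that each dropped tree is rescaled by $k/(k+1)$. The helpers `fit_stump`, `dart`, and `predict` are hypothetical.

```python
import random


def fit_stump(xs, targets):
    """Least-squares regression stump on 1-D inputs (toy weak learner)."""
    best = None
    for t in sorted(set(xs))[:-1]:  # thresholds that keep both sides non-empty
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, lmean, rmean)
    _, t, lmean, rmean = best
    return lambda x: lmean if x <= t else rmean


def dart(xs, ys, n_trees, drop_rate, seed=0):
    rng = random.Random(seed)
    trees, weights = [], []
    for _ in range(n_trees):
        # 1) Drop a random subset of the existing trees (assumed: at least one).
        dropped = {i for i in range(len(trees)) if rng.random() < drop_rate}
        if trees and not dropped:
            dropped = {rng.randrange(len(trees))}
        # Prediction of the sub-ensemble that was NOT dropped.
        partial = [sum(w * t(x) for i, (t, w) in enumerate(zip(trees, weights))
                       if i not in dropped) for x in xs]
        # The new tree fits the residual of that sub-ensemble only.
        tree = fit_stump(xs, [y - p for y, p in zip(ys, partial)])
        # 2) Normalization: rescale dropped trees, down-weight the new tree.
        k = len(dropped)
        for i in dropped:
            weights[i] *= k / (k + 1)
        trees.append(tree)
        weights.append(1.0 / (k + 1))
    return trees, weights


def predict(trees, weights, x):
    return sum(w * t(x) for t, w in zip(trees, weights))


xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0.0, 0.8, 0.9, 0.1, -0.8, -1.0, -0.3, 0.7]
trees, weights = dart(xs, ys, n_trees=20, drop_rate=0.3)
train_preds = [predict(trees, weights, x) for x in xs]
```

The normalization keeps the combined effect of the new tree and the dropped trees on the same scale as the dropped trees alone, so the ensemble does not overshoot. In XGBoost this booster is selected with `booster="dart"` and tuned through parameters such as `rate_drop` and `skip_drop`.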

How much this new approach actually improves GBDT remains to be explored and tested.
