Regularization on GBDT

2017-12-03 23:50:00

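An earlier post briefly covered how XGBoost's implementation differs from an ordinary GBDT implementation. This post tries to summarize the regularization techniques used in GBDT.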

In machine learning, early stopping is a form of regularization used to avoid overfitting when training a learner with an iterative method, such as gradient descent.

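Concretely, a portion of the samples is held out as a validation set. While the model is iteratively fit to the training set, training stops once the error rate on the validation set no longer decreases. In other words, early stopping controls the number of iterations (the number of trees).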
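The stopping rule described above can be sketched in a few lines of plain Python. This is only an illustration: the `patience` parameter (how many non-improving rounds to tolerate) and the error values are made up for the example and do not come from any particular library.

```python
def early_stopping_round(val_errors, patience=3):
    """Return how many trees to keep, given per-round validation errors.

    val_errors[m] is the validation error after m + 1 trees have been added.
    Training counts as stopped once the error has failed to improve for
    `patience` consecutive rounds (`patience` is a hypothetical knob).
    """
    best_error = float("inf")
    best_round = 0
    rounds_without_improvement = 0
    for m, err in enumerate(val_errors):
        if err < best_error:
            best_error = err
            best_round = m + 1  # number of trees, 1-based
            rounds_without_improvement = 0
        else:
            rounds_without_improvement += 1
            if rounds_without_improvement >= patience:
                break  # later rounds are never trained
    return best_round


# Hypothetical validation curve: improves, then starts to overfit.
errors = [0.30, 0.22, 0.18, 0.17, 0.175, 0.18, 0.19, 0.16]
n_trees = early_stopping_round(errors, patience=3)
print(n_trees)
```

Note that the last value in `errors` is never looked at, because training has already stopped by then. Sklearn's gradient boosting estimators offer a comparable mechanism through the `n_iter_no_change` and `validation_fraction` parameters.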

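Shrinkage multiplies the output of each tree by a factor $\nu$ ($0 < \nu < 1$), where $\sum_{j=1}^{J_m} \gamma_{jm} I(x \in R_{jm})$ is the output of the $m$-th tree and $f_m(x)$ is the ensemble of the first $m$ trees:

$$f_m(x) = f_{m-1}(x) + \nu \cdot \sum_{j=1}^{J_m} \gamma_{jm} I(x \in R_{jm})$$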

The ESL book puts it this way:

The parameter $\nu$ can be regarded as controlling the learning rate of the boosting procedure

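$\nu$ and the number of iterations $M$ (the number of trees) trade off against each other. The recommendation is to set $\nu$ fairly small (e.g. 0.1) and $M$ fairly large. This generally gives good accuracy, at the cost of longer training time (proportional to $M$).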

In the Sklearn implementation this factor is set through the `learning_rate` parameter; XGBoost is similar.
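To make the shrinkage update concrete, here is a small plain-Python sketch. It is not the Sklearn or XGBoost code: it assumes squared loss, 1-D inputs, and depth-1 trees (stumps), and the helper names `fit_stump`, `boost`, and `mse` as well as the toy data are invented for the illustration.

```python
def fit_stump(xs, residuals):
    """Fit a two-leaf regression tree to the residuals on 1-D inputs."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # keep the right leaf non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        gamma_left = sum(left) / len(left)
        gamma_right = sum(right) / len(right)
        sse = (sum((r - gamma_left) ** 2 for r in left)
               + sum((r - gamma_right) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, gamma_left, gamma_right)
    _, threshold, gamma_left, gamma_right = best
    return lambda x: gamma_left if x <= threshold else gamma_right


def boost(xs, ys, n_trees, nu):
    """Squared-loss boosting: f_m(x) = f_{m-1}(x) + nu * tree_m(x)."""
    trees = []
    preds = [0.0] * len(xs)  # f_0(x) = 0
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        # Shrinkage: only a fraction nu of the new tree's output is added.
        preds = [p + nu * tree(x) for p, x in zip(preds, xs)]
    return trees, preds


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Toy data, made up for the example.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 6.0]

_, preds_few = boost(xs, ys, n_trees=10, nu=0.1)
_, preds_many = boost(xs, ys, n_trees=200, nu=0.1)
print(mse(ys, preds_few), mse(ys, preds_many))
```

With a small `nu`, ten trees leave a large training error and many more rounds are needed to close the gap, which is the tradeoff between $\nu$ and $M$ in miniature. Training error alone says nothing about generalization, so this sketch shows only the cost side of the tradeoff.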

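Subsampling originates from the idea of bootstrap averaging (bagging). In GBDT, each round's tree is built on a fraction $\eta$ of the training set, drawn at random without replacement; a typical value of $\eta$ is 0.5. This both regularizes the model and reduces computation time.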

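In fact, both the XGBoost and Sklearn implementations borrow from random forests: besides sampling at the sample level, they also sample features. That is, when building a tree, the best split is searched only among a randomly selected subset of the feature columns. In Sklearn the relevant parameters are `subsample` and `max_features`.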
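The two kinds of sampling can be illustrated with the standard library alone. The function below is a hypothetical helper, not library code, and it simplifies things by drawing the feature subset once per tree; real implementations may redraw features at every split or level.

```python
import random


def sample_rows_and_features(n_rows, n_features,
                             row_fraction=0.5, feature_fraction=0.5, rng=None):
    """Pick the row and feature indices that one tree is allowed to see.

    Both draws are without replacement, as in the description above.
    """
    rng = rng or random.Random()
    n_row_sample = max(1, int(row_fraction * n_rows))
    n_feature_sample = max(1, int(feature_fraction * n_features))
    rows = sorted(rng.sample(range(n_rows), n_row_sample))
    features = sorted(rng.sample(range(n_features), n_feature_sample))
    return rows, features


rng = random.Random(0)  # fixed seed so the draws are reproducible
for round_index in range(3):
    # A fresh subset is drawn for every boosting round.
    rows, features = sample_rows_and_features(10, 4, rng=rng)
    print(round_index, rows, features)
```

Each round then fits its tree using only `rows` and `features`, so every tree sees half of the data and does roughly half of the split-search work.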

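Explicitly adding the complexity of the tree model to the optimization objective as a regularization term is what sets the XGBoost implementation apart.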

$$L^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t)$$

where

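$$\Omega(f) = \gamma T + \frac{1}{2} \lambda \lVert w \rVert^2$$

Here $T$ is the number of leaves in the tree and $w$ is the vector of leaf weights.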

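My own view is that adding the tree complexity to the optimization objective as a regularization term is more direct than controlling the complexity of each round's tree yourself through parameters, and this may be an important reason why XGBoost performs better than ordinary GBDT implementations. Unfortunately, Sklearn has no corresponding implementation for now.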
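As a small worked example of the penalty term, the sketch below evaluates $\Omega(f)$ and the regularized objective for two candidate trees. It assumes squared-error loss for $l$, and all numbers and function names are made up for illustration; it is not XGBoost code.

```python
def omega(leaf_weights, gamma, lam):
    """Complexity penalty gamma * T + 0.5 * lam * ||w||^2, T = number of leaves."""
    n_leaves = len(leaf_weights)
    return gamma * n_leaves + 0.5 * lam * sum(w * w for w in leaf_weights)


def regularized_objective(ys, prev_preds, tree_outputs, leaf_weights, gamma, lam):
    """Squared-error loss of (previous prediction + new tree) plus omega."""
    loss = sum((y - (p + f)) ** 2
               for y, p, f in zip(ys, prev_preds, tree_outputs))
    return loss + omega(leaf_weights, gamma, lam)


# Hypothetical setup: 4 samples, previous ensemble predicts 2.0 everywhere.
ys = [1.0, 1.0, 3.0, 3.0]
prev_preds = [2.0, 2.0, 2.0, 2.0]

# Two candidate trees that produce identical outputs on the training data,
# one with 2 leaves and one with 4 leaves.
small_tree_weights = [-1.0, 1.0]
big_tree_weights = [-1.0, -1.0, 1.0, 1.0]
tree_outputs = [-1.0, -1.0, 1.0, 1.0]

obj_small = regularized_objective(ys, prev_preds, tree_outputs,
                                  small_tree_weights, gamma=1.0, lam=1.0)
obj_big = regularized_objective(ys, prev_preds, tree_outputs,
                                big_tree_weights, gamma=1.0, lam=1.0)
print(obj_small, obj_big)
```

Both trees fit the training data equally well, so the loss term cannot tell them apart; the penalty alone makes the two-leaf tree the better choice under this objective.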

The DART paper points out that GBDT suffers from a problem called over-specialization:

Trees added at later iterations tend to impact the prediction of only a few instances, and they make negligible contribution towards the prediction of all the remaining instances. We call this issue of subsequent trees affecting the prediction of only a small fraction of the training instances over-specialization.

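In other words, trees from earlier iterations contribute most of the predicted value, while later trees concentrate on correcting the errors of a small portion of the samples. Shrinkage can mitigate over-specialization, but not very well. The authors use Dropout to balance the contributions of all trees to the prediction, with the effect shown in the figure below: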

The concrete procedure is as follows:

DART diverges from MART at two places. First, when computing the gradient that the next tree will fit, only a random subset of the existing ensemble is considered. The second place at which DART diverges from MART is when adding the new tree to the ensemble where DART performs a normalization step.

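Put simply, each newly added tree fits not the residual of the full ensemble of all previous trees, but the residual of an ensemble of a randomly drawn subset of them; in addition, the output of the new tree is normalized.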
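A rough sketch of one such round is given below, for squared loss and in plain Python. Several details are assumptions on my part rather than something stated above: the scaling factors ($1/(k+1)$ for the new tree and $k/(k+1)$ for each of the $k$ dropped trees) follow my reading of the DART paper, dropping at least one tree when none is selected is likewise taken from there, and `fit_tree` is a hypothetical callback standing in for a real tree learner.

```python
import random


def dart_step(tree_outputs, tree_weights, ys, fit_tree, drop_rate, rng):
    """One DART-style round for squared loss (illustrative only).

    tree_outputs[t][i] is tree t's raw output on sample i and tree_weights[t]
    is its current scale. `fit_tree` (hypothetical) takes the residuals and
    returns the new tree's outputs on the training samples.
    """
    n_trees = len(tree_outputs)
    dropped = [t for t in range(n_trees) if rng.random() < drop_rate]
    if n_trees > 0 and not dropped:
        dropped = [rng.randrange(n_trees)]  # assumed: drop at least one tree
    kept = [t for t in range(n_trees) if t not in dropped]

    # First difference from MART: residuals use only the kept trees.
    preds = [sum(tree_weights[t] * tree_outputs[t][i] for t in kept)
             for i in range(len(ys))]
    residuals = [y - p for y, p in zip(ys, preds)]
    new_outputs = fit_tree(residuals)

    # Second difference: normalization. The new tree and the dropped trees
    # aim at the same gap, so both are scaled down to avoid overshooting.
    k = len(dropped)
    for t in dropped:
        tree_weights[t] *= k / (k + 1)
    tree_outputs.append(new_outputs)
    tree_weights.append(1 / (k + 1))
    return dropped


def fit_constant(residuals):
    """Stand-in learner: a 'tree' that predicts the mean residual everywhere."""
    mean = sum(residuals) / len(residuals)
    return [mean] * len(residuals)


rng = random.Random(0)
ys = [1.0, 2.0, 3.0, 4.0]
tree_outputs, tree_weights = [], []
for _ in range(5):
    dart_step(tree_outputs, tree_weights, ys, fit_constant, drop_rate=0.5, rng=rng)
print(tree_weights)
```

Because dropped trees keep getting rescaled, no single early tree ends up dominating the prediction, which is the balancing effect the authors are after.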

How much this new approach actually improves GBDT remains for everyone to explore and try.
