`xgboost`

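xgboost stands for eXtreme Gradient Boosting. Designed by Tianqi Chen, it aims to push boosted trees past their computational limits, with the engineering goals of fast computation and strong performance. Compared with traditional gradient boosting, XGBoost makes many improvements and is widely regarded as a state-of-the-art estimator with very high performance on both classification and regression.
xgboost documentation: https://xgboost.readthedocs.io/en/latest/index.html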

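1. Using the `xgboost` library

Option 1: [use the xgboost library's own modeling workflow]

The core pieces are the DMatrix class, which loads the data, and the train() function, which runs the training. Unlike sklearn, where all parameters are passed to the estimator class, the xgboost library expects you to first define the parameter set as a dictionary and then pass that dictionary to train() for training [xgboost has so many parameters that writing them all inside xgb.train() is error-prone].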

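Option 2: [use the sklearn API in the XGB library]

Differences between the two approaches
- Modeling with xgboost's own workflow and modeling with the classes in its sklearn API give similar model quality, but the native xgboost library runs faster (especially for cross-validation) and its tuning tools are simpler than sklearn's.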

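2. Gradient boosting

XGBoost is built on gradient boosting.
Gradient boosting:
- It is one of the most powerful techniques for building predictive models and the representative algorithm of the boosting family of ensemble methods. Ensemble methods build multiple weak estimators on the data and aggregate their results to get better regression or classification performance than a single model.
  - There are many ways to combine weak estimators; for example, bagging builds multiple parallel, independent weak estimators at once, while boosting builds weak estimators one at a time and accumulates them over many iterations.
For regression and classification based on gradient boosting, the modeling process is roughly as follows
- Start by building one tree, then iterate, adding one tree in each iteration, until the many trees together form a strong ensemble estimator.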
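
The iterative process above can be sketched with plain gradient boosting on squared error, where each new tree is fit to the current residuals. This is a simplified illustration built on sklearn's `DecisionTreeRegressor`, not XGBoost's actual algorithm (which also uses second-order gradient information and regularization); the data, tree depth, and learning rate are assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic one-dimensional regression data (illustrative assumption)
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=300)

learning_rate = 0.1
n_trees = 50

# Start from a constant prediction: the mean of the targets
base = y.mean()
pred = np.full_like(y, base)
trees = []
mse_history = [np.mean((y - pred) ** 2)]

for _ in range(n_trees):
    # For squared error, the negative gradient is simply the residual
    residual = y - pred
    tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, residual)
    # Add the new tree's shrunken output to the running prediction
    pred += learning_rate * tree.predict(X)
    trees.append(tree)
    mse_history.append(np.mean((y - pred) ** 2))

def predict(X_new):
    """Ensemble prediction: base value plus the shrunken sum over all trees."""
    return base + learning_rate * sum(t.predict(X_new) for t in trees)

print(mse_history[0], mse_history[-1])
```

Each pass adds one tree, and the training error recorded in `mse_history` shrinks as the ensemble grows.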
  

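3. `XGB` algorithm principles

The weak learners built in XGB are CART trees, which means every tree in XGBoost is binary.
The prediction in XGB is obtained by directly summing the leaf weights across all weak learners.

Suppose the XGB ensemble contains K decision trees. The prediction the whole model gives for sample i is:

$$\hat{y}_i^{(K)} = \sum_{k=1}^{K} f_k(x_i)$$

Here $\hat{y}_i^{(K)}$ is the cumulative sum of leaf weights over the K trees, i.e. the prediction returned by the XGB model; K is the total number of trees; and $f_k(x_i)$ is the leaf weight returned by the k-th decision tree for sample i (the result returned by the k-th tree).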
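
To make the summation concrete, here is a toy sketch in which three hand-written stumps stand in for trained trees. The split thresholds and leaf weights are hypothetical values invented for illustration, and the sketch ignores details of a real model such as its base score.

```python
# Each "tree" maps a sample to the weight of the leaf it lands in.
# All thresholds and leaf weights below are hypothetical.
def tree_1(x):
    return 0.8 if x[0] < 2.0 else -0.4

def tree_2(x):
    return 0.3 if x[1] < 1.0 else -0.1

def tree_3(x):
    return 0.05 if x[0] < 3.5 else -0.2

trees = [tree_1, tree_2, tree_3]  # K = 3

def predict(x):
    """Prediction for one sample: the sum of f_k(x) over all K trees."""
    return sum(f_k(x) for f_k in trees)

x_i = [1.5, 4.0]
# tree_1 gives 0.8, tree_2 gives -0.1, tree_3 gives 0.05
y_hat = predict(x_i)
print(y_hat)  # approximately 0.75
```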

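4. Parameters

Random sampling of training rows: subsample
- Controls random sampling of the training data. In both xgb and sklearn the parameter defaults to 1 and cannot be 0, so it does not switch sampling on or off as such; it only sets roughly what fraction of the samples is drawn. It is usually tuned when the sample size is large.
Step size for iterating trees: eta
- The native library calls this parameter eta; the sklearn API calls it learning_rate. The larger learning_rate is, the faster the iteration proceeds and the sooner the algorithm reaches its limit, possibly without converging to the true optimal loss. The smaller learning_rate is, the more likely the model finds a more precise optimum, but iteration slows down and costs more computation time and resources.
Choosing the weak estimator: booster
- Selects the type of weak estimator, for example gbtree, gblinear, or dart.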


5. Examples

xgb

Comparison with random forest

from xgboost import XGBRegressor as XGBR
from sklearn.ensemble import RandomForestRegressor as RFR
from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; needs an older version
from sklearn.metrics import mean_squared_error as MSE
from sklearn.model_selection import cross_val_score, train_test_split

# Load the Boston housing data and hold out 30% as a test set
data = load_boston()
feature = data.data
target = data.target
x_train, x_test, y_train, y_test = train_test_split(feature, target, test_size=0.3, random_state=420)

# xgboost
reg = XGBR(n_estimators=100).fit(x_train, y_train)
reg.score(x_test, y_test)  # R^2 on the test set: 0.9050988968414799
MSE(y_test, reg.predict(x_test))  # 7.466827353555599
cross_val_score(reg, x_train, y_train, cv=5).mean()  # mean 5-fold R^2: 0.7995062821902295

# random forest
rfr = RFR(n_estimators=100).fit(x_train, y_train)
rfr.score(x_test, y_test)  # R^2 on the test set: 0.8732682957243149
MSE(y_test, rfr.predict(x_test))  # 12.016450651315784
cross_val_score(rfr, x_train, y_train, cv=5).mean()  # mean 5-fold R^2: 0.8011385218554434

n_estimators

from xgboost import XGBRegressor as XGBR
from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; needs an older version
from sklearn.model_selection import train_test_split
%matplotlib inline
import matplotlib.pyplot as plt

data = load_boston()
feature = data.data
target = data.target
x_train, x_test, y_train, y_test = train_test_split(feature, target, test_size=0.3, random_state=420)

# Train one model per tree count and record its test-set R^2
model_count = []
scores = []
for i in range(100, 300):
    model = XGBR(n_estimators=i).fit(x_train, y_train)
    score = model.score(x_test, y_test)
    scores.append(score)
    model_count.append(i)
plt.plot(model_count, scores)


subsample

from xgboost import XGBRegressor as XGBR
from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; needs an older version
from sklearn.model_selection import train_test_split
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt

data = load_boston()
feature = data.data
target = data.target
x_train, x_test, y_train, y_test = train_test_split(feature, target, test_size=0.3, random_state=420)

# Sweep subsample from 0.05 to 1 and record the test-set R^2 for each value
subs = []
scores = []
for i in np.linspace(0.05, 1, 20):
    model = XGBR(n_estimators=182, subsample=i).fit(x_train, y_train)
    score = model.score(x_test, y_test)
    subs.append(i)
    scores.append(score)
plt.plot(subs, scores)


learning_rate

from xgboost import XGBRegressor as XGBR
from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; needs an older version
from sklearn.model_selection import train_test_split
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt

data = load_boston()
feature = data.data
target = data.target
x_train, x_test, y_train, y_test = train_test_split(feature, target, test_size=0.3, random_state=420)

# Sweep learning_rate from 0.05 to 1 and record the test-set R^2 for each value
rates = []
scores = []
for i in np.linspace(0.05, 1, 20):
    model = XGBR(n_estimators=182, subsample=0.9, learning_rate=i).fit(x_train, y_train)
    score = model.score(x_test, y_test)
    rates.append(i)
    scores.append(score)
plt.plot(rates, scores)


booster

from xgboost import XGBRegressor as XGBR
from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; needs an older version
from sklearn.model_selection import train_test_split

data = load_boston()
feature = data.data
target = data.target
x_train, x_test, y_train, y_test = train_test_split(feature, target, test_size=0.3, random_state=420)

# Fit one model per booster type and print its test-set R^2
for booster in ['gbtree', 'gblinear', 'dart']:
    reg = XGBR(n_estimators=180, learning_rate=0.1, random_state=420, booster=booster).fit(x_train, y_train)
    print(booster)
    print(reg.score(x_test, y_test))
# gbtree
# 0.9260984369386971
# gblinear
# 0.6499751962020093
# dart
# 0.9260984459922119
