[Machine Learning with R] 7 - Regression Trees and Model Trees

1. Understanding regression trees and model trees

Decision trees can also be used for numeric prediction:

Regression trees: predict with the average value of the training examples that reach a leaf; no linear regression is involved.
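Model trees: at each leaf, a multiple linear regression model is built from the examples that reach that node. Because a model tree can have many leaves, and therefore many such models, it is harder to understand than an equivalent regression tree, but it may be more accurate.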

Adding regression to trees:

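In classification trees, homogeneity is measured by entropy; in numeric decision trees, it is measured by a statistic such as the variance, the standard deviation, or the mean absolute deviation.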
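To make the contrast concrete, here is a minimal base-R sketch of both kinds of homogeneity measure. The helpers `entropy()` and `node_spread()` are hypothetical names written for this illustration, not functions from rpart or any other package; in both cases a lower value means a more homogeneous node.

```r
# Hypothetical helper: entropy (in bits) of a vector of class labels,
# the homogeneity measure used by classification trees
entropy <- function(labels) {
  p <- table(labels) / length(labels)  # proportions of the classes present
  -sum(p * log2(p))
}

# Hypothetical helper: dispersion statistics for a numeric target,
# the kind of measure used by regression and model trees
node_spread <- function(y) {
  c(variance     = var(y),
    sd           = sd(y),
    mean_abs_dev = mean(abs(y - mean(y))))  # mean absolute deviation from the mean
}

entropy(c("a", "a", "b", "b"))  # 1: an evenly mixed two-class node
entropy(c("a", "a", "a", "a"))  # 0: a pure node
node_spread(c(5, 5, 5, 5))      # all 0: every value identical
node_spread(c(3, 5, 7, 9))      # variance about 6.67, sd about 2.58, mean_abs_dev 2
```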

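Standard deviation reduction (SDR) is a common splitting criterion. It compares the standard deviation of the target before a split with the size-weighted standard deviations of the groups after it: SDR = sd(T) − Σᵢ (|Tᵢ| / |T|) × sd(Tᵢ), where T is the set of values before the split and T₁, T₂, … are the groups the split produces.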
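A minimal base-R sketch of SDR for one candidate split: the standard deviation of the target before the split, minus the standard deviations of the resulting groups weighted by their share of the cases. The target values and the two candidate splits are a hypothetical toy example, and `sdr()` is a helper defined only for this illustration.

```r
# Standard deviation reduction of a split (hypothetical helper).
# y:     target values at the node before the split
# parts: list of target vectors, one per group produced by the split
sdr <- function(y, parts) {
  weighted_sd <- sum(sapply(parts, function(p) length(p) / length(y) * sd(p)))
  sd(y) - weighted_sd
}

# Hypothetical toy target values at a node
tee <- c(1, 1, 1, 2, 2, 3, 4, 5, 5, 6, 6, 7, 7, 7, 7)

# Candidate split on feature A: the first 9 cases vs. the last 6
sdr_a <- sdr(tee, list(tee[1:9], tee[10:15]))
# Candidate split on feature B: the first 7 cases vs. the last 8
sdr_b <- sdr(tee, list(tee[1:7], tee[8:15]))

round(c(A = sdr_a, B = sdr_b), 2)  # A 1.20, B 1.39
```

With these numbers the split on B wins, because it lowers the standard deviation by more.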

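For example, suppose the SDR is 1.2 for a split on feature A and 1.4 for a split on feature B. Splitting on B reduces the standard deviation more (the resulting groups are more homogeneous), so the tree splits on B first. If the tree stopped after this single split, it would be a regression tree: a new case is predicted with the mean of the group (leaf) it falls into. A model tree goes one step further: within each of the two groups, it fits a linear regression of the outcome on feature A (B is of little use there, since the cases in each group were already separated by their value of B). A new case is then predicted with whichever of the two linear models belongs to its group.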
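The difference between the two kinds of leaf can be sketched in base R. The data frame `toy`, its columns `A`, `B` and `y`, and the two prediction helpers are hypothetical; `y` is made exactly linear in `A` within each group so the results are easy to check. This hand-built single split only illustrates the idea; it is not how M5' or any library actually grows a model tree.

```r
# Hypothetical toy data: B (0/1) is the splitting feature,
# A is the numeric feature used inside each leaf
toy <- data.frame(
  A = c(1:7, 1:8),
  B = c(rep(0, 7), rep(1, 8))
)
toy$y <- ifelse(toy$B == 0, 0.5 + 0.5 * toy$A, 8 - 0.25 * toy$A)

# Split on B: one data frame per leaf, named "0" and "1"
leaves <- split(toy, toy$B)

# Regression-tree leaves store a single value: the mean outcome
leaf_means <- sapply(leaves, function(d) mean(d$y))
# Model-tree leaves store a linear model of the outcome on A
leaf_models <- lapply(leaves, function(d) lm(y ~ A, data = d))

predict_regression_tree <- function(newdata) {
  unname(leaf_means[as.character(newdata$B)])
}
predict_model_tree <- function(newdata) {
  sapply(seq_len(nrow(newdata)), function(i) {
    row <- newdata[i, , drop = FALSE]
    unname(predict(leaf_models[[as.character(row$B)]], newdata = row))
  })
}

new_cases <- data.frame(A = c(2, 6), B = c(0, 1))
predict_regression_tree(new_cases)  # 2.5 and 6.875: the leaf means
predict_model_tree(new_cases)       # 1.5 and 6.5: each leaf's line evaluated at A
```

Both trees route a case to the same leaf; they differ only in what the leaf returns.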

2. Example: applying regression trees and model trees

Estimating the quality of wines

1) Collecting the data

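The white wine data contain 11 chemical properties (such as acidity, sugar content, pH, and density) for 4,898 wine samples, plus a column with each wine's quality rating.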

Data download:

Link: https://pan.baidu.com/s/1pN_PtZOYjOz2I-KJqSq6pw  Extraction code: 6swg

2) Exploring and preparing the data

## Step 2: Exploring and preparing the data ----
wine <- read.csv("whitewines.csv")

# examine the wine data
str(wine)

# the distribution of quality ratings
hist(wine$quality)

# summary statistics of the wine data
summary(wine)

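# split into training (3750 rows) and test (1148 rows) sets;
# this sequential split assumes the rows are already in random order
wine_train <- wine[1:3750, ]
wine_test <- wine[3751:4898, ]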
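The sequential split above relies on the rows already being in random order; if the file were sorted (for example by quality), the test set would not be representative. A hedged alternative is to draw the training rows at random. The sketch below runs on a synthetic stand-in data frame `wine_demo` (hypothetical values, not the real wine data) with the same 4,898 rows; with the real data you would use `wine` instead.

```r
# Synthetic stand-in for the wine data (hypothetical values, same row count)
set.seed(123)  # make the random split reproducible
wine_demo <- data.frame(
  alcohol = runif(4898, min = 8, max = 14),
  quality = sample(3:9, 4898, replace = TRUE)
)

# Draw 3750 row indices at random for training; the rest form the test set
train_idx <- sample(nrow(wine_demo), 3750)
wine_demo_train <- wine_demo[train_idx, ]
wine_demo_test  <- wine_demo[-train_idx, ]

c(train = nrow(wine_demo_train), test = nrow(wine_demo_test))  # 3750 and 1148
```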

3) Training a model on the data

## Step 3: Training a model on the data ----
# regression tree using rpart
library(rpart)
m.rpart <- rpart(quality ~ ., data = wine_train)

# get basic information about the tree
m.rpart

# get more detailed information about the tree
summary(m.rpart)

# use the rpart.plot package to create a visualization
library(rpart.plot)

# a basic decision tree diagram
rpart.plot(m.rpart, digits = 3)

# a few adjustments to the diagram
rpart.plot(m.rpart, digits = 4, fallen.leaves = TRUE, type = 3, extra = 101)

alcohol is the first variable used in the tree, which suggests it is the single most important predictor of wine quality.
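The root split is a useful hint, but rpart also stores an overall score per variable in the fitted object's `variable.importance` component (the sum of the variable's split improvements, plus a share for its appearances as a surrogate split). A minimal sketch, run on the built-in `mtcars` data so it works without the wine file; the same two lookups apply to `m.rpart`.

```r
library(rpart)

# Fit a regression tree on a built-in dataset (stand-in for the wine data)
fit <- rpart(mpg ~ ., data = mtcars)

# Variable used at the root split: the first row of the tree's frame
root_var <- as.character(fit$frame$var[1])
root_var

# rpart's overall importance score for each variable
round(fit$variable.importance, 1)
```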

4) Evaluating model performance

① Compare the range of the predicted values with that of the true values, and measure their correlation

② Measure performance with the mean absolute error

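Mean absolute error (MAE): the average distance between each prediction and the true value, ignoring direction, i.e. MAE = (1/n) Σᵢ |actualᵢ − predictedᵢ|.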
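Why look at both measures? Correlation only tells whether the predictions move together with the true values, not whether they are close to them. A tiny base-R illustration with made-up numbers:

```r
actual    <- c(4, 5, 6, 7, 8)
predicted <- actual + 1  # made-up predictions, each exactly 1 point too high

cor(actual, predicted)         # 1: perfectly correlated
mean(abs(actual - predicted))  # 1: yet every prediction misses by 1 point
```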

## Step 4: Evaluate model performance ----

# generate predictions for the testing dataset
p.rpart <- predict(m.rpart, wine_test)

# compare the distribution of predicted values vs. actual values
summary(p.rpart)
summary(wine_test$quality)

# compare the correlation
cor(p.rpart, wine_test$quality)

# function to calculate the mean absolute error
MAE <- function(actual, predicted) {
  mean(abs(actual - predicted))  
}

# mean absolute error between predicted and actual values
MAE(wine_test$quality, p.rpart)

# baseline: mean absolute error if every prediction were the training mean
mean_train <- mean(wine_train$quality) # result = 5.87
MAE(wine_test$quality, mean_train)

5) Improving model performance

A regression tree predicts with a single value at each leaf. A model tree improves on this by replacing the leaves with regression models.

The M5' algorithm (M5-prime): the M5P() function in the RWeka package

## Step 5: Improving model performance ----
# train a M5' Model Tree
library(RWeka)
m.m5p <- M5P(quality ~ ., data = wine_train)

# display the tree
m.m5p

# get a summary of the model's performance
summary(m.m5p)

# generate predictions for the model
p.m5p <- predict(m.m5p, wine_test)

# summary statistics about the predictions
summary(p.m5p)

# correlation between the predicted and true values
cor(p.m5p, wine_test$quality)

# mean absolute error of predicted and true values
# (uses a custom function defined above)
MAE(wine_test$quality, p.m5p)

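The splits look similar to those of the regression tree, but instead of ending in a numeric prediction, each branch ends in a linear model (LM1, LM2, ... LM163).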

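Compared with the regression tree, the model tree improves on all three measures: the range of its predictions, the correlation with the true values, and the mean absolute error.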

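PS: The output of regression trees and model trees takes some effort to interpret; the explanation in this post is fairly brief.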
