[Machine Learning with R] 7 - Regression Trees and Model Trees

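1. Understanding regression trees and model trees

Decision trees for numeric prediction:

Regression trees: predictions are the average value of the cases that reach a leaf node; no linear regression is involved.
Model trees: at each leaf node, a multiple linear regression model is built from the cases that reach that node. So the more leaf nodes there are, the larger the model tree becomes, and it is harder to understand than the equivalent regression tree, although the model may be more accurate.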

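Adding regression to decision trees:

In classification trees, homogeneity is measured by entropy; in numeric decision trees, it is measured by statistics such as variance, standard deviation, or mean absolute deviation.

Standard deviation reduction (SDR): a common splitting criterion. It is the standard deviation of the outcome before the split minus the size-weighted standard deviations of the outcome in each partition after the split.

For example, suppose the SDRs computed for feature A and feature B are 1.2 and 1.4. A split on feature B reduces the standard deviation more (the partitions are more homogeneous), so feature B is used first. A regression tree stops there and predicts the mean of each partition. A model tree goes one step further: in each of the two partitions it fits a linear regression of the outcome on feature A, and a new case is predicted by whichever of the two linear models matches its partition.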
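The SDR comparison can be sketched in a few lines of R. The outcome vector `tee` and the two candidate partitions are illustrative values chosen so that the results round to the 1.2 and 1.4 quoted in the text; they are an assumption, not data from this post, and `sdr()` is a hypothetical helper rather than a function from any package.

```r
# SDR = sd(T) - sum_i (|T_i| / |T|) * sd(T_i)
sdr <- function(parent, ...) {
  children <- list(...)
  weighted_sd <- sum(sapply(children, function(ch) {
    length(ch) / length(parent) * sd(ch)
  }))
  sd(parent) - weighted_sd
}

# outcome values before any split (illustrative)
tee <- c(1, 1, 1, 2, 2, 3, 4, 5, 5, 6, 6, 7, 7, 7, 7)

# partitions produced by a split on feature A
at1 <- c(1, 1, 1, 2, 2, 3, 4, 5, 5)
at2 <- c(6, 6, 7, 7, 7, 7)

# partitions produced by a split on feature B
bt1 <- c(1, 1, 1, 2, 2, 3, 4)
bt2 <- c(5, 5, 6, 6, 7, 7, 7, 7)

sdr_a <- sdr(tee, at1, at2)  # about 1.20
sdr_b <- sdr(tee, bt1, bt2)  # about 1.39

# a regression tree that splits on B predicts the mean of each partition
leaf_means <- c(mean(bt1), mean(bt2))  # 2.00 and 6.25

print(c(sdr_a = sdr_a, sdr_b = sdr_b))
print(leaf_means)
```

Feature B wins because its SDR is larger, and the two leaf means are the only values this one-split regression tree can predict.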

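2. Regression tree and model tree example

Rating wine quality

1) Collecting the data

The white wine data contains 4,898 wine samples with 11 chemical features (such as acidity, sugar content, pH, and density), plus a column with the quality rating.

Data download:

Link: https://pan.baidu.com/s/1pN_PtZOYjOz2I-KJqSq6pw Extraction code: 6swg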

2) Exploring and preparing the data

## Step 2: Exploring and preparing the data ----
wine <- read.csv("whitewines.csv")

# examine the wine data
str(wine)

# the distribution of quality ratings
hist(wine$quality)

# summary statistics of the wine data
summary(wine)

wine_train <- wine[1:3750, ]
wine_test <- wine[3751:4898, ]
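Taking the first 3,750 rows for training only works if the rows are already in random order. When that is not known, a random split is a reasonable alternative. The sketch below uses a small made-up data frame (`demo`) in place of whitewines.csv so that it runs on its own; the seed and the 75% training share are assumptions.

```r
set.seed(123)  # arbitrary seed, only for reproducibility

# stand-in for the wine data (made-up values)
demo <- data.frame(
  alcohol = runif(100, min = 8, max = 14),
  quality = sample(3:9, 100, replace = TRUE)
)

# draw 75% of the row indices at random for training
n_train <- floor(0.75 * nrow(demo))
train_idx <- sample(nrow(demo), n_train)

demo_train <- demo[train_idx, ]
demo_test <- demo[-train_idx, ]

print(dim(demo_train))
print(dim(demo_test))
```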

3) Training a model on the data

## Step 3: Training a model on the data ----
# regression tree using rpart
library(rpart)
m.rpart <- rpart(quality ~ ., data = wine_train)

# get basic information about the tree
m.rpart

# get more detailed information about the tree
summary(m.rpart)

# use the rpart.plot package to create a visualization
library(rpart.plot)

# a basic decision tree diagram
rpart.plot(m.rpart, digits = 3)

# a few adjustments to the diagram
rpart.plot(m.rpart, digits = 4, fallen.leaves = TRUE, type = 3, extra = 101)

alcohol is the first variable used in the decision tree, so it is the single most important predictor of wine quality.
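The root split is a useful hint, but a fitted rpart object also stores a per-variable importance score in its `variable.importance` element, which can be inspected directly. A minimal sketch on the built-in `mtcars` data, used here only so the example runs without whitewines.csv; for the wine model the corresponding element would be `m.rpart$variable.importance`.

```r
library(rpart)

# small regression tree on a built-in dataset (stand-in for the wine data)
fit <- rpart(mpg ~ ., data = mtcars)

# importance scores stored in the fitted object
imp <- fit$variable.importance

# variable used at the root split (first row of the tree frame)
root_var <- as.character(fit$frame$var[1])

print(round(imp, 1))
print(root_var)
```

Comparing `root_var` with the top entries of `imp` shows whether the first split and the overall importance ranking agree.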

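4) Evaluating the model

① Range of predicted vs. actual values, and their correlation

② Measuring performance with the mean absolute error

Mean absolute error (MAE): how far, on average, the predictions are from the actual values.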
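A tiny worked example makes the definition concrete. The ratings below are made up for illustration, and `mae_demo` is a hypothetical name chosen to avoid clashing with the function defined in the script.

```r
# mean of the absolute differences between actual and predicted values
mae_demo <- function(actual, predicted) {
  mean(abs(actual - predicted))
}

actual <- c(5, 6, 7, 6)
predicted <- c(5.5, 6, 6, 6.5)

errors <- abs(actual - predicted)         # 0.5 0.0 1.0 0.5
mae_value <- mae_demo(actual, predicted)  # (0.5 + 0 + 1 + 0.5) / 4 = 0.5

print(errors)
print(mae_value)
```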

## Step 4: Evaluate model performance ----

# generate predictions for the testing dataset
p.rpart <- predict(m.rpart, wine_test)

# compare the distribution of predicted values vs. actual values
summary(p.rpart)
summary(wine_test$quality)

# compare the correlation
cor(p.rpart, wine_test$quality)

# function to calculate the mean absolute error
MAE <- function(actual, predicted) {
  mean(abs(actual - predicted))  
}

# mean absolute error between actual and predicted values
MAE(wine_test$quality, p.rpart)

# baseline: mean absolute error when always predicting the training mean
mean(wine_train$quality) # result = 5.87
MAE(wine_test$quality, 5.87)
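The baseline can also be computed from the training mean directly instead of a rounded constant. The sketch below shows the full comparison on the built-in `mtcars` data, which stands in for the wine data so the example is self-contained; the seed, the 24/8 split, and the name `mae` are assumptions.

```r
library(rpart)

mae <- function(actual, predicted) mean(abs(actual - predicted))

set.seed(42)  # arbitrary seed
idx <- sample(nrow(mtcars), 24)
train <- mtcars[idx, ]
test <- mtcars[-idx, ]

# regression tree fitted on the training rows only
fit <- rpart(mpg ~ ., data = train)
mae_tree <- mae(test$mpg, predict(fit, test))

# baseline: always predict the mean of the training outcome
baseline <- mean(train$mpg)
mae_base <- mae(test$mpg, baseline)

print(c(tree = mae_tree, baseline = mae_base))
```

A model is only worth keeping if its MAE is clearly below the baseline MAE on the test set.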

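5) Improving model performance

A regression tree predicts with a single value at each leaf node. A model tree can improve on it by replacing the leaf nodes with regression models.

M5' algorithm (M5-prime): the RWeka::M5P function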
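The difference between a constant leaf and a leaf model can be shown by hand with base R. This is not the M5' algorithm, only a one-split illustration on made-up, noise-free data: the split point at `x < 5` and the two line equations are assumptions.

```r
mae <- function(actual, predicted) mean(abs(actual - predicted))

# piecewise-linear data with a single break at x = 5 (made up)
d <- data.frame(x = seq(0, 10, by = 0.5))
d$y <- ifelse(d$x < 5, 2 + 0.5 * d$x, 10 - 0.3 * d$x)
d$leaf <- ifelse(d$x < 5, "left", "right")

# regression-tree style: each leaf predicts its mean outcome
pred_mean <- ave(d$y, d$leaf)

# model-tree style: each leaf gets its own linear regression
pred_lm <- numeric(nrow(d))
for (leaf in unique(d$leaf)) {
  in_leaf <- d$leaf == leaf
  leaf_fit <- lm(y ~ x, data = d[in_leaf, ])
  pred_lm[in_leaf] <- predict(leaf_fit, newdata = d[in_leaf, ])
}

mae_mean <- mae(d$y, pred_mean)
mae_lm <- mae(d$y, pred_lm)

print(c(leaf_mean = mae_mean, leaf_lm = mae_lm))
```

Because the data are exactly linear within each leaf, the leaf models fit almost perfectly while the leaf means cannot follow the slope. On real data the gap is smaller.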

## Step 5: Improving model performance ----
# train a M5' Model Tree
library(RWeka)
m.m5p <- M5P(quality ~ ., data = wine_train)

# display the tree
m.m5p

# get a summary of the model's performance
summary(m.m5p)

# generate predictions for the model
p.m5p <- predict(m.m5p, wine_test)

# summary statistics about the predictions
summary(p.m5p)

# correlation between the predicted and true values
cor(p.m5p, wine_test$quality)

# mean absolute error of predicted and true values
# (uses a custom function defined above)
MAE(wine_test$quality, p.m5p)

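The splits are similar to those of the regression tree, but each node ends in a linear model (LM1, LM2 ... LM163) rather than a single numeric prediction.

The model tree improves on the regression tree in prediction range, correlation, and mean absolute error.

PS: the output of regression trees and model trees is fairly hard to interpret, and this post only covers it briefly.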
