# Evaluating a Random Forest Model with Out-of-Bag Error

# Evaluate a random forest model with out-of-bag error, using the out-of-bag samples
from sklearn.ensemble import RandomForestClassifier
from sklearn import datasets

iris = datasets.load_iris()
features = iris.data
target = iris.target

randomforest = RandomForestClassifier(
    random_state=0, n_estimators=1000, oob_score=True, n_jobs=-1)

model = randomforest.fit(features, target)
# View the out-of-bag score
randomforest.oob_score_
0.9533333333333334
Discussion
In random forests, each decision tree is trained on a bootstrapped subset of observations, that is, a sample drawn with replacement from the training data. This means that for every tree there is a separate subset of observations that was not used to train that tree. These are called out-of-bag (OOB) observations. We can use OOB observations as a test set to evaluate the performance of our random forest.
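
To make the idea of out-of-bag observations concrete, the following is a minimal sketch of a single bootstrap draw using NumPy. It is an illustration only, not scikit-learn's internal sampling code; the variable names and the seed are assumptions chosen for this example.

```python
import numpy as np

# Illustrative only: mimic one tree's bootstrap sample.
# The seed and variable names are assumptions, not scikit-learn internals.
rng = np.random.default_rng(0)
n_observations = 150  # same number of rows as the iris dataset

# A bootstrap sample draws n indices with replacement
bootstrap_indices = rng.integers(0, n_observations, size=n_observations)

# Any observation that was never drawn is out-of-bag for this tree
in_bag = np.zeros(n_observations, dtype=bool)
in_bag[bootstrap_indices] = True
oob_indices = np.flatnonzero(~in_bag)

# Fraction of observations left out of this particular sample
oob_fraction = oob_indices.size / n_observations

# Probability that a given observation is never drawn: (1 - 1/n)^n,
# which approaches 1/e (roughly 0.368) as n grows
expected_fraction = (1 - 1 / n_observations) ** n_observations

print(oob_fraction, expected_fraction)
```

On average, a little over a third of the observations are out-of-bag for any given tree, so with many trees nearly every observation is held out by at least some of them.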

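For every observation, the learning algorithm compares the observation's true value with the prediction from the subset of trees that were not trained on that observation. These comparisons are aggregated into an overall score, which provides a single measure of the random forest's performance. In scikit-learn this score is stored in `oob_score_` and, for a classifier, is the accuracy by default, so the OOB error is one minus this value. OOB score estimation is an alternative to cross-validation.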
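
As a rough check of the description above, the sketch below recomputes the OOB accuracy by hand from the fitted model's `oob_decision_function_` attribute and, for comparison, runs 5-fold cross-validation on the same data. The choice of 200 trees and 5 folds is an assumption made to keep the example quick; it is not taken from the recipe above.

```python
import numpy as np
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

iris = datasets.load_iris()
features, target = iris.data, iris.target

# Assumption: 200 trees keeps this sketch fast; the recipe above uses 1000
model = RandomForestClassifier(
    random_state=0, n_estimators=200, oob_score=True, n_jobs=-1)
model.fit(features, target)

# oob_decision_function_ holds, for each observation, class probabilities
# averaged over only the trees that did not see that observation
oob_predictions = model.classes_[
    np.argmax(model.oob_decision_function_, axis=1)]

# Compare each true value with its out-of-bag prediction
manual_oob_accuracy = np.mean(oob_predictions == target)
oob_error = 1 - model.oob_score_

# Cross-validation estimate for comparison (5 folds is an assumption)
cv_accuracy = cross_val_score(
    RandomForestClassifier(random_state=0, n_estimators=200, n_jobs=-1),
    features, target, cv=5).mean()

print(manual_oob_accuracy, model.oob_score_, oob_error, cv_accuracy)
```

The hand-computed accuracy should match `oob_score_`, while the cross-validation estimate will typically be close but not identical. The practical difference is cost: the OOB score comes from a single fit, whereas k-fold cross-validation refits the forest k times.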