Python model training with warm_start: combining random forest models in scikit-learn

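I believe this can be done by modifying the estimators_ and n_estimators attributes on the RandomForestClassifier object. Each tree in the forest is stored as a DecisionTreeClassifier object, and the list of these trees is stored in the estimators_ attribute. To keep the object consistent, it also makes sense to update n_estimators so that it matches the new number of trees.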
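To make the two attributes concrete, here is a minimal sketch that fits a small forest and inspects them. The dataset and the `random_state=0` seed are illustrative choices, not part of the original answer.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# A deliberately small forest; random_state is fixed only for repeatability.
rf = RandomForestClassifier(n_estimators=5, random_state=0)
rf.fit(X, y)

# The fitted trees live in a plain Python list on the model.
print(type(rf.estimators_).__name__)      # list
print(type(rf.estimators_[0]).__name__)   # DecisionTreeClassifier

# After a normal fit, the list length and n_estimators agree.
print(len(rf.estimators_), rf.n_estimators)
```

Because estimators_ is an ordinary list, appending trees from another fitted forest is mechanically simple; keeping n_estimators in sync is what stops the object from reporting a stale tree count.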

The advantage of this approach is that you can build a number of small forests in parallel on several machines and then combine them.
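As a rough sketch of that workflow, the snippet below trains several small forests concurrently and then merges them. It uses joblib threads on one machine as a stand-in for separate machines, and `fit_small_forest` and `merge_forests` are hypothetical helper names introduced only for this illustration. It assumes every forest was trained on the same features and saw the same set of classes.

```python
from copy import deepcopy

from joblib import Parallel, delayed
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def fit_small_forest(X, y, seed):
    # Hypothetical helper: one small forest per worker, each with its own seed.
    return RandomForestClassifier(n_estimators=5, random_state=seed).fit(X, y)


def merge_forests(forests):
    # Hypothetical helper: copy the first forest so the inputs stay untouched,
    # then append the trees of the remaining forests.
    merged = deepcopy(forests[0])
    for other in forests[1:]:
        merged.estimators_ += other.estimators_
    merged.n_estimators = len(merged.estimators_)
    return merged


# Synthetic data, used only to keep the sketch self-contained.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Threads stand in for remote workers here.
forests = Parallel(n_jobs=2, prefer="threads")(
    delayed(fit_small_forest)(X, y, seed) for seed in range(4)
)

merged = merge_forests(forests)
print(merged.n_estimators)  # 4 forests x 5 trees each
```

Copying the first forest before merging is a design choice of this sketch: it leaves the individual forests usable afterwards, at the cost of one extra copy.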

Here is an example using the iris dataset:

from functools import reduce

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def generate_rf(X_train, y_train, X_test, y_test):
    # train one small forest and report its accuracy on the test split
    rf = RandomForestClassifier(n_estimators=5, min_samples_leaf=3)
    rf.fit(X_train, y_train)
    print("rf score", rf.score(X_test, y_test))
    return rf

def combine_rfs(rf_a, rf_b):
    # append rf_b's trees to rf_a (rf_a is modified in place) and update the count
    rf_a.estimators_ += rf_b.estimators_
    rf_a.n_estimators = len(rf_a.estimators_)
    return rf_a

iris = load_iris()
X, y = iris.data[:, [0, 1, 2]], iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.33)

# in the line below, we create 10 random forest classifier models
rfs = [generate_rf(X_train, y_train, X_test, y_test) for _ in range(10)]

# in this step below, we combine the list of random forest models into one giant model
rf_combined = reduce(combine_rfs, rfs)

# the combined model scores better than *most* of the component models
print("rf combined score", rf_combined.score(X_test, y_test))
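The title mentions warm_start, which the example above does not use. For comparison, here is a minimal sketch of growing a single forest incrementally with the `warm_start` parameter of RandomForestClassifier. It covers a different situation from merging: the extra trees are trained in the same process rather than on separate machines. The seed and tree counts are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# warm_start=True keeps the trees from earlier fit() calls.
rf = RandomForestClassifier(n_estimators=5, warm_start=True, random_state=0)
rf.fit(X, y)
first_trees = list(rf.estimators_)

# Raising n_estimators and refitting trains only the additional trees.
rf.set_params(n_estimators=10)
rf.fit(X, y)

print(len(rf.estimators_))
# The original five trees are kept as-is at the front of the list.
print(all(a is b for a, b in zip(first_trees, rf.estimators_[:5])))
```

With warm_start, scikit-learn manages estimators_ and n_estimators itself, so no attribute editing is needed; the manual merge remains the option when the forests are trained independently.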