
[Machine Learning] Cross-Validation and K-Fold Cross-Validation

http://www.anc.ed.ac.uk/rbf/intro/node16.html

If data is not scarce then the set of available input-output measurements can be divided into two parts - one part for training and one part for testing. In this way several different models, all trained on the training set, can be compared on the test set. This is the basic form of cross-validation.

If data is not scarce, split the dataset into two parts: one for training and one for testing. Several different models, all trained on the training set, can then be compared on the test set. This is the basic form of cross-validation.
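
A minimal sketch of this basic train/test form of cross-validation, assuming scikit-learn (the library the GridSearchCV post below refers to) and its built-in iris dataset; the estimator and the 80/20 split are illustrative choices, not something the quoted text prescribes.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Small built-in dataset, used here purely for illustration.
X, y = load_iris(return_X_y=True)

# One particular division into training and test parts.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Any model trained on the training set can then be scored on the held-out test set.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))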

A better method, which is intended to avoid the possible bias introduced by relying on any one particular division into test and train components, is to partition the original set in several different ways and to compute an average score over the different partitions.

To avoid the possible bias introduced by relying on any one particular train/test split, a better approach is to partition the original data in several different ways and compute an average score over the different partitions.
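
A sketch of the "average over several different partitions" idea, under the same scikit-learn assumption: ShuffleSplit draws several independent random partitions and cross_val_score computes one score per partition, which we then average (the number of splits and the test size are arbitrary illustrative values).

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ShuffleSplit, cross_val_score

X, y = load_iris(return_X_y=True)

# Partition the data in several different random ways (5 splits here, an arbitrary
# choice) and compute one score per partition, then average them.
splitter = ShuffleSplit(n_splits=5, test_size=0.2, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=splitter)
print("scores per partition:", scores)
print("average score:", scores.mean())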

This leads to k-fold cross-validation.

https://randomforests.wordpress.com/2014/02/02/basics-of-k-fold-cross-validation-and-gridsearchcv-in-scikit-learn/

K-Fold Cross Validation is used to validate your model through generating different combinations of the data you already have. For example, if you have 100 samples, you can train your model on the first 90, and test on the last 10. Then you could train on samples 1-80 & 90-100, and test on samples 80-90. Then repeat. This way, you get different combinations of train/test data, essentially giving you ‘more’ data for validation from your original data. 

K-fold cross-validation validates a model by generating different combinations of the data you already have. For example, with 100 samples you can train on the first 90 and test on the last 10; then train on samples 1-80 and 91-100 and test on samples 81-90; and so on. This gives you different train/test combinations, effectively squeezing "more" validation out of your original data.
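
The sketch below shows how k-fold splitting generates those train/test combinations, again assuming scikit-learn; with 10 folds each pass trains on 9/10 of the data and tests on the remaining 1/10, mirroring the 100-samples example above (the dataset and estimator are illustrative assumptions).

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = load_iris(return_X_y=True)

# 10 folds: each iteration trains on 9/10 of the data and tests on the remaining 1/10.
kfold = KFold(n_splits=10, shuffle=True, random_state=0)
scores = []
for train_idx, test_idx in kfold.split(X):
    model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))

print("fold scores:", np.round(scores, 3))
print("mean over folds:", np.mean(scores))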

K-fold cross-validation also matters for grid search.

We’ll now check out GridSearchCV. This allows us to create a special model that will find its optimal parameter values.

Grid search (GridSearchCV) lets us build a model that searches for its own optimal parameter values.
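
A minimal GridSearchCV sketch, assuming scikit-learn; the RandomForestClassifier and the parameter grid are illustrative assumptions inspired by the linked post, not its exact code. Each parameter combination is scored with k-fold cross-validation and the best-scoring combination is retained.

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# Candidate parameter values to search over (illustrative choices).
param_grid = {"n_estimators": [50, 100, 200], "max_depth": [None, 3, 5]}

# Every parameter combination is scored with 5-fold cross-validation,
# and the best combination is kept in best_params_.
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best cross-validated score:", search.best_score_)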
