
[Machine Learning] Cross-Validation and K-Fold Cross-Validation

http://www.anc.ed.ac.uk/rbf/intro/node16.html

If data is not scarce then the set of available input-output measurements can be divided into two parts - one part for training and one part for testing. In this way several different models, all trained on the training set, can be compared on the test set. This is the basic form of cross-validation.

If data is not scarce, split the dataset into two parts: one for training and one for testing. Several different models, all trained on the training set, can then be compared on the test set. This is the basic form of cross-validation.
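
A minimal sketch of this basic train/test form of cross-validation, assuming scikit-learn (the library the GridSearchCV post below refers to) and its built-in iris dataset; the estimator and the 80/20 split are illustrative choices, not something the quoted text prescribes.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Small built-in dataset, used here purely for illustration.
X, y = load_iris(return_X_y=True)

# One particular division into training and test parts.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Any model trained on the training set can then be scored on the held-out test set.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))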

A better method, which is intended to avoid the possible bias introduced by relying on any one particular division into test and train components, is to partition the original set in several different ways and to compute an average score over the different partitions.

To avoid the possible bias introduced by relying on any one particular train/test split, a better approach is to partition the original data in several different ways and compute an average score over the different partitions.
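
A sketch of the "average over several different partitions" idea, under the same scikit-learn assumption: ShuffleSplit draws several independent random partitions and cross_val_score computes one score per partition, which we then average (the number of splits and the test size are arbitrary illustrative values).

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ShuffleSplit, cross_val_score

X, y = load_iris(return_X_y=True)

# Partition the data in several different random ways (5 splits here, an arbitrary
# choice) and compute one score per partition, then average them.
splitter = ShuffleSplit(n_splits=5, test_size=0.2, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=splitter)
print("scores per partition:", scores)
print("average score:", scores.mean())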

This leads to k-fold cross-validation.

https://randomforests.wordpress.com/2014/02/02/basics-of-k-fold-cross-validation-and-gridsearchcv-in-scikit-learn/

K-Fold Cross Validation is used to validate your model through generating different combinations of the data you already have. For example, if you have 100 samples, you can train your model on the first 90, and test on the last 10. Then you could train on samples 1-80 & 90-100, and test on samples 80-90. Then repeat. This way, you get different combinations of train/test data, essentially giving you ‘more’ data for validation from your original data. 

K-fold cross-validation validates a model by generating different combinations of the data you already have. For example, with 100 samples you can train on the first 90 and test on the last 10; then train on samples 1-80 and 91-100 and test on samples 81-90; and so on. This gives you different train/test combinations, effectively squeezing "more" validation out of your original data.
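
The sketch below shows how k-fold splitting generates those train/test combinations, again assuming scikit-learn; with 10 folds each pass trains on 9/10 of the data and tests on the remaining 1/10, mirroring the 100-samples example above (the dataset and estimator are illustrative assumptions).

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = load_iris(return_X_y=True)

# 10 folds: each iteration trains on 9/10 of the data and tests on the remaining 1/10.
kfold = KFold(n_splits=10, shuffle=True, random_state=0)
scores = []
for train_idx, test_idx in kfold.split(X):
    model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))

print("fold scores:", np.round(scores, 3))
print("mean over folds:", np.mean(scores))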

K-fold cross-validation also matters for grid search.

We’ll now check out GridSearchCV. This allows us to create a special model that will find its optimal parameter values.

Grid search (GridSearchCV) lets us build a model that searches for its own optimal parameter values.
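
A minimal GridSearchCV sketch, assuming scikit-learn; the RandomForestClassifier and the parameter grid are illustrative assumptions inspired by the linked post, not its exact code. Each parameter combination is scored with k-fold cross-validation and the best-scoring combination is retained.

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# Candidate parameter values to search over (illustrative choices).
param_grid = {"n_estimators": [50, 100, 200], "max_depth": [None, 3, 5]}

# Every parameter combination is scored with 5-fold cross-validation,
# and the best combination is kept in best_params_.
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best cross-validated score:", search.best_score_)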
