what’s cross validation?
Cross-validation is a technique for assessing how the results of a statistical analysis generalize to an independent data set. It is mainly used when the goal is prediction and we need to estimate how accurately a predictive model will perform in practice. The main reason to use cross-validation rather than conventional validation (a single split into separate training and test sets) is that there is often not enough data to set a large part of it aside: with a small data set, a fixed split leaves too few samples for fitting the model, for testing it, or both. Cross-validation avoids this loss by letting every sample be used for both training and testing, in different rounds.
Cross-validation is also known as rotation estimation.
summary of cross validation
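- assesses how well the results of a statistical analysis generalize to an independent data set
- gives a more reliable evaluation of a predictive model than a single train/test split
- most useful when there is not enough data to set aside a separate test set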
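To make the idea concrete, here is a minimal sketch of cross-validation with scikit-learn's cross_val_score. The iris data set and the decision tree are assumptions chosen only for illustration; they are not from the original notes.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Illustrative data set and model (assumptions, not from the original notes).
X, y = load_iris(return_X_y=True)
model = DecisionTreeClassifier(random_state=0)

# 5-fold cross-validation: the model is fitted 5 times, each time on 4/5 of
# the data, and scored on the remaining 1/5.
scores = cross_val_score(model, X, y, cv=5)

print("accuracy per fold:", scores)
print("mean accuracy: %.3f" % scores.mean())
```

Every sample ends up in a validation fold exactly once, so no data is permanently lost to a test split.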
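what’s grid search?
Grid search is a method for parameter selection: you list candidate values for each hyperparameter, every combination of those values is tried, and the combination with the best score (usually a cross-validated score) is kept.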
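The "grid" is just every combination of the candidate values. A small sketch in plain Python shows how the candidates are enumerated. The grid below is a hypothetical example, and the scoring step is left out on purpose.

```python
from itertools import product

# Hypothetical grid for illustration: two hyperparameters, a few values each.
param_grid = {"max_depth": [1, 3, 5], "criterion": ["gini", "entropy"]}

# Every combination of values is one candidate; grid search tries them all.
names = list(param_grid)
candidates = [dict(zip(names, values)) for values in product(*param_grid.values())]

print(len(candidates))  # 3 * 2 = 6 combinations
for candidate in candidates:
    print(candidate)
```

The number of candidates is the product of the list lengths, so the grid grows quickly as more parameters are added.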
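kfold?
k-fold is a way to split the data into k folds of roughly equal size. The model is trained k times; each time a different fold is held out for validation and the remaining k-1 folds are used for training.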
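A quick way to see what the split looks like is to print the indices that scikit-learn's KFold produces. The six-sample toy array here is an assumption made only to keep the output small.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(12).reshape(6, 2)  # 6 toy samples, purely illustrative

kfold = KFold(n_splits=3)  # no shuffling, so folds are consecutive blocks
folds = list(kfold.split(X))

for train_idx, val_idx in folds:
    print("train:", train_idx, "validate:", val_idx)
```

Each sample index shows up in exactly one validation fold and in the training part of the other two rounds.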
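what’s the role of the training/validation/test sets?
The training set is used to fit the model, the validation set is used to compare models and tune hyperparameters, and the test set is held back for a final, unbiased estimate of performance. With k-fold cross-validation there is no fixed validation set: each fold of the training data takes a turn as the validation set.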
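The three roles can be sketched as follows. This is one possible arrangement, not the only one: the data set, the 80/20 split, and the model settings are assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)  # illustrative data, not from the original notes

# Test set: hold out 20% and do not touch it until the very end.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Validation: k-fold cross-validation on the training portion only.
model = DecisionTreeClassifier(max_depth=3, random_state=0)
val_scores = cross_val_score(model, X_train, y_train, cv=5)
print("validation accuracy: %.3f" % val_scores.mean())

# Final check: fit on all training data, score once on the test set.
model.fit(X_train, y_train)
test_score = model.score(X_test, y_test)
print("test accuracy: %.3f" % test_score)
```

Because the test set plays no part in choosing the model, its score is a fairer estimate of how the model behaves on new data.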
practice
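from sklearn.metrics import accuracy_score, make_scorer
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.tree import DecisionTreeClassifier

# Assumes X_train, y_train, X_test, y_test already exist (e.g. from train_test_split).
# The parameter names below match DecisionTreeClassifier, so that model is assumed here.
classifier = DecisionTreeClassifier()
kfold = KFold(n_splits=10)
parameters = {"max_depth": [1, 3, 5, 15, None], "criterion": ["gini", "entropy"], "splitter": ["random", "best"]}
scoring_fnc = make_scorer(accuracy_score)
print("parameters:", parameters)
# scoring is passed by keyword; recent scikit-learn releases reject it as a positional argument
grid = GridSearchCV(classifier, parameters, scoring=scoring_fnc, cv=kfold)
grid = grid.fit(X_train, y_train)
best_clf = grid.best_estimator_  # refitted on all of X_train with the best parameters
print('best score: %f' % grid.best_score_)  # mean cross-validated accuracy
print('best parameters:')
for key in parameters.keys():
    print('\t%s: %s' % (key, best_clf.get_params()[key]))
print('test score: %f' % best_clf.score(X_test, y_test))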
the code is here.
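Besides the single best result, it can help to look at how every candidate scored. The sketch below reads the cv_results_ attribute of a fitted GridSearchCV; the data set and the smaller grid are assumptions made so the example runs on its own.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)  # illustrative data, not from the original notes

param_grid = {"max_depth": [1, 3, 5], "criterion": ["gini", "entropy"]}
grid = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, scoring="accuracy", cv=5)
grid.fit(X, y)

# cv_results_ holds one entry per parameter combination.
results = grid.cv_results_
for params, mean_score in zip(results["params"], results["mean_test_score"]):
    print(params, "mean cv accuracy: %.3f" % mean_score)
```

The largest mean score in this table is the value reported as best_score_.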
summary
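- grid search finds the best parameter combination among the candidates in the grid
- cv evaluates the model more reliably when there is not enough data for a separate validation set
- grid search and cv usually go together: each parameter combination is scored with cv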
reference
- https://amueller.github.io/ml-training-intro/slides/03-cross-validation-grid-search.html#1
- https://stackabuse.com/cross-validation-and-grid-search-for-model-selection-in-python/
- https://towardsdatascience.com/why-and-how-to-cross-validate-a-model-d6424b45261f