Lecture - 5 - Validation
• The training set is used to fit the models; the validation set is used to estimate
prediction error for model selection; the test set is used for assessment of the
generalization error of the final chosen model.
• Ideally, the test set should be kept in a “vault” and be brought out only at the
end of the data analysis.
• The model is fit on the training set, and the fitted model is then used to predict the
responses for the observations in the validation set.
• The resulting validation set error rate—typically assessed using MSE in the case of a
quantitative response—provides an estimate of the test error rate
• Model choices, such as setting the tuning parameters of the model, should be made using
the validation set, never the test set.
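The three-way split can be sketched as follows. This is a minimal illustration and not the course notebook: the synthetic data, the 60/20/20 split, the candidate values of k, and the helper names `knn_predict` and `mse` are all assumptions made for the example. The tuning parameter k of a simple k-nearest-neighbours regressor is chosen by validation MSE, and the test set is touched only once at the end.

```python
import math
import random


def knn_predict(train, x, k):
    """Average the responses of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mse(train, data, k):
    """Mean squared error of k-NN (fit on `train`) over the points in `data`."""
    return sum((y - knn_predict(train, x, k)) ** 2 for x, y in data) / len(data)


rng = random.Random(0)  # fixed seed so the split is reproducible

# Synthetic data (assumed for illustration): y = sin(x) + Gaussian noise
xs = [rng.uniform(0, 6) for _ in range(300)]
data = [(x, math.sin(x) + rng.gauss(0, 0.3)) for x in xs]

# Assumed 60/20/20 split into training, validation, and test sets
rng.shuffle(data)
train, val, test = data[:180], data[180:240], data[240:]

# Model selection: compare candidate values of k on the validation set only
candidate_ks = [1, 3, 5, 10, 20, 40]
val_errors = {k: mse(train, val, k) for k in candidate_ks}
best_k = min(val_errors, key=val_errors.get)

# Assessment: the test set comes out of the "vault" once, for the chosen model
test_error = mse(train, test, best_k)

print("validation MSE by k:", val_errors)
print("chosen k:", best_k, "test MSE:", test_error)
```

Because the test set plays no part in choosing k, its MSE is not biased by the selection step, which is the reason for keeping it separate from the validation set.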
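• In leave-one-out cross-validation (LOOCV), each of the n observations is held out in
turn, the model is fit on the remaining n - 1 observations, and the error on the held-out
observation is recorded. The LOOCV estimate for the test MSE is the average of these n
error estimates:
$$\mathrm{CV}_{(n)} = \frac{1}{n} \sum_{i=1}^{n} \mathrm{MSE}_i,$$
where $\mathrm{MSE}_i = (y_i - \hat{y}_i)^2$ and $\hat{y}_i$ is the prediction for
observation $i$ from the model fit without it.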
• The k-fold CV error rate and validation set error rates are defined analogously
• Accuracy or other measures can also be used similarly (see Jupyter notebook
KNN_Validation)
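As a rough sketch of how the cross-validation estimates can be computed (separate from the KNN_Validation notebook), the code below averages the per-fold MSEs for a least-squares line. The synthetic data and the helper names `fit_line` and `kfold_cv_mse` are assumptions made for illustration. LOOCV is treated as the special case where the number of folds equals n.

```python
import random


def fit_line(points):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def kfold_cv_mse(points, k):
    """Average of the k held-out-fold MSEs; k = len(points) gives LOOCV."""
    folds = [points[i::k] for i in range(k)]  # every k-th point goes to fold i
    fold_mses = []
    for i, held_out in enumerate(folds):
        # Fit on all folds except the i-th
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        intercept, slope = fit_line(train)
        # Evaluate on the held-out fold
        errors = [(y - (intercept + slope * x)) ** 2 for x, y in held_out]
        fold_mses.append(sum(errors) / len(errors))
    return sum(fold_mses) / k


rng = random.Random(1)  # fixed seed for reproducibility

# Synthetic data (assumed for illustration): y = 1 + 2x + Gaussian noise
xs = [rng.uniform(0, 10) for _ in range(50)]
data = [(x, 1 + 2 * x + rng.gauss(0, 1)) for x in xs]

loocv = kfold_cv_mse(data, len(data))  # n folds of one observation each
cv5 = kfold_cv_mse(data, 5)            # 5 folds of 10 observations each

print("LOOCV estimate of test MSE:", loocv)
print("5-fold CV estimate of test MSE:", cv5)
```

Swapping the squared error for a misclassification indicator (or for accuracy) gives the corresponding estimate for a classifier, with the fold structure unchanged.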