Cross Validation
Outline
• Context
• Different Approaches of Cross-Validation
  – Validation Set Approach
  – Leave-One-Out Cross-Validation
  – 𝐾-Fold Cross-Validation
• An application
Context
Advertising Data Set
The Advertising data set consists of the sales (in
thousands of units) of a particular product in 200
different markets.
[Diagram: the quantitative predictors TV, Radio and Newspaper enter a linear regression model, which outputs predicted Sales.]
Possible Models

Model   Predictors
1       TV
2       Radio
3       Newspaper
4       TV and Radio
5       TV and Newspaper
6       Radio and Newspaper
7       TV, Radio and Newspaper
Model Selection
[Table: for each model, its predictors, 𝑅², Adjusted 𝑅² and 𝑅𝑆𝐸.]
Validation Set Approach
• Randomly split the observations into a training data set and a validation data set.
• Fit all the candidate models (Models (1) to (10)) using the training data set.
• Use the fitted models to predict 𝑚𝑝𝑔 for the validation data set.
• The model with the lowest validation (testing) MSE is the winner!
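The steps above can be sketched in a few lines of Python. This is a minimal illustration on synthetic data: the real Auto data set is not assumed to be available, and the candidate models are stand-in polynomial fits of degree 1 to 3.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the Auto data: mpg falls with horsepower.
n = 200
x = rng.uniform(40, 230, n)
y = 40 - 0.1 * x + rng.normal(0, 3, n)

# Step 1: randomly split into training and validation halves.
idx = rng.permutation(n)
train, valid = idx[: n // 2], idx[n // 2 :]

# Steps 2-3: fit each candidate model on the training half,
# then compute its MSE on the validation half.
mses = {}
for degree in range(1, 4):
    coefs = np.polyfit(x[train], y[train], degree)
    pred = np.polyval(coefs, x[valid])
    mses[degree] = np.mean((y[valid] - pred) ** 2)

# Step 4: the model with the lowest validation MSE wins.
best = min(mses, key=mses.get)
```

The same loop extends directly to the seven Advertising models by swapping the polynomial degrees for different predictor subsets.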
Auto Data
• Left: validation error rate for a single split into the training and validation data sets.
• Right: the validation method repeated 10 times, each time with a different random split.
Validation Set Approach: Advantages
Conceptually simple
Easy to implement
Validation Set Approach: Disadvantages
The validation MSE can be highly variable (see the right-hand panel of the figure in the previous slide).
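This variability is easy to demonstrate with a short simulation; the sketch below uses synthetic data rather than the real Advertising or Auto data, and simply repeats the random split 10 times.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(40, 230, n)
y = 40 - 0.1 * x + rng.normal(0, 3, n)

# Repeat the validation-set approach 10 times; each random split
# yields a different validation MSE for the same model.
mses = []
for _ in range(10):
    idx = rng.permutation(n)
    train, valid = idx[: n // 2], idx[n // 2 :]
    coefs = np.polyfit(x[train], y[train], 1)
    pred = np.polyval(coefs, x[valid])
    mses.append(np.mean((y[valid] - pred) ** 2))

spread = max(mses) - min(mses)  # nonzero: the estimate depends on the split
```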
Leave-One-Out Cross-Validation (LOOCV)
• Leave out a single observation 𝑖 and fit the model on the remaining 𝑛 − 1 observations.
• Validate the model using the left-out observation and compute the corresponding 𝑀𝑆𝐸ᵢ.
• Repeat this for each of the 𝑛 observations.
• The MSE for the model is computed as 𝐶𝑉(𝑛) = (1/𝑛) Σᵢ₌₁ⁿ 𝑀𝑆𝐸ᵢ.
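A minimal LOOCV sketch, again on synthetic data with a simple linear fit standing in for the model:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
x = rng.uniform(0, 10, n)
y = 2 + 3 * x + rng.normal(0, 1, n)

# LOOCV: leave out observation i, fit on the other n - 1 observations,
# and score the fit on the single held-out point.
errors = []
for i in range(n):
    mask = np.arange(n) != i
    coefs = np.polyfit(x[mask], y[mask], 1)
    pred = np.polyval(coefs, x[i])
    errors.append((y[i] - pred) ** 2)

cv_n = np.mean(errors)  # CV(n) = (1/n) * sum of MSE_i
```

Note there is no random split inside the loop, so rerunning this always gives the same CV(n).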
LOOCV: Advantages
LOOCV has less bias
• We repeatedly fit the statistical learning method using training data
that contains 𝑛 − 1 obs., i.e. almost all the data set is used.
LOOCV produces a less variable MSE
• The validation approach produces a different MSE each time it is applied, due to the randomness of the splitting process, while performing LOOCV multiple times always yields the same result, because each split leaves out exactly one fixed observation.
LOOCV: Disadvantages
• LOOCV is computationally intensive.
We fit each model 𝑛 times!
𝐾-fold Cross Validation
• We divide the data set into 𝐾 different parts (e.g., 𝐾 = 5, or 𝐾 = 10, etc.).
• We then remove the first part, fit the model on the remaining 𝐾 − 1 parts, and compute the MSE on the first part.
• We then repeat this 𝐾 times, taking out a different part each time.
• The 𝐾-fold CV error is given by 𝐶𝑉(𝐾) = (1/𝐾) Σᵢ₌₁ᴷ 𝑀𝑆𝐸ᵢ.
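The same recipe in code, with 𝐾 = 5 folds on synthetic data and a linear fit as a stand-in model:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
x = rng.uniform(0, 10, n)
y = 2 + 3 * x + rng.normal(0, 1, n)

K = 5
# Shuffle the observation indices once, then cut them into K parts.
folds = np.array_split(rng.permutation(n), K)

# For each fold k: hold it out, fit on the remaining K - 1 folds,
# and compute MSE_k on the held-out fold.
fold_mses = []
for k in range(K):
    valid = folds[k]
    train = np.concatenate([folds[j] for j in range(K) if j != k])
    coefs = np.polyfit(x[train], y[train], 1)
    pred = np.polyval(coefs, x[valid])
    fold_mses.append(np.mean((y[valid] - pred) ** 2))

cv_k = np.mean(fold_mses)  # CV(K) = (1/K) * sum of MSE_k
```

Setting K = n in this loop reproduces LOOCV, at the cost of fitting the model n times instead of K times.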
Auto Data: LOOCV & K-fold CV
• LOOCV is a special case of 𝐾-fold CV, where 𝐾 = 𝑛.
• Both are stable, but LOOCV is more computationally intensive!
Homework
• Obtain the cross-validation errors of all 7 models for the Advertising data set.
Reading Material
• James, G., Witten, D., Hastie, T. & Tibshirani, R. (2021). An Introduction to
Statistical Learning: with Applications in R. New York: Springer-Verlag.
Chapter 2: Sub-section 2.2.1
Chapter 5: Section 5.1, Sub-sections 5.1.1, 5.1.2, 5.1.3, 5.3.1, 5.3.2, 5.3.3.