ML 2024 Part 1: Cross Validation

A. Colin Cameron
University of California - Davis

April 2024
Course Outline
1. Variable selection and cross validation
2. Shrinkage methods
- ridge, lasso, elastic net
3. ML for causal inference using lasso
- OLS with many controls, IV with many instruments
4. Other methods for prediction
- nonparametric regression, principal components, splines
- neural networks
- regression trees, random forests, bagging, boosting
5. More ML for causal inference
- ATE with heterogeneous effects and many controls
6. Classification and unsupervised learning
- classification (categorical y) and unsupervised learning (no y)
1. Introduction
Overview
1 Introduction
2 Model Selection using Predictive Ability
  2.1 Generated data example
  2.2 Mean squared error
  2.3 Information criteria and related penalty measures
  2.4 Cross-validation overview
  2.5 Single-split cross validation
  2.6 K-fold cross validation
  2.7 Leave-one-out cross validation
  2.8 Stepwise selection and best subsets selection
  2.9 Selection using statistical significance
Terminology
2. Model Selection using predictive ability 2.1 Generated data example
. * Summarize data
. summarize
. correlate
(obs=40)

             x1      x2      x3       y
x1       1.0000
x2       0.5077  1.0000
x3       0.4281  0.2786  1.0000
y        0.4740  0.3370  0.2046  1.0000
[OLS regression output with robust standard errors: coefficient table (Coef., Std. Err., t, P>|t|, 95% Conf. Interval) not recovered.]
2. Model Selection using predictive ability 2.2 Mean squared error
2. Model Selection using predictive ability 2.3 Information criteria and related penalty measures
The sign can be reversed (so AIC and BIC are positive).
Econometricians use $\hat{\beta}$ and $\hat{\sigma}^2 = \text{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - x_i'\hat{\beta})^2$
- then $\text{AIC} = \frac{n}{2}\ln 2\pi + \frac{n}{2}\ln\hat{\sigma}^2 + \frac{n}{2} + 2k$.
Machine learners use $\tilde{\beta}$ and $\tilde{\sigma}^2 = \frac{1}{n-p}\sum_{i=1}^{n}(y_i - x_i'\tilde{\beta}_p)^2$
- where $\tilde{\beta}_p$ is obtained from OLS in the largest model under consideration, which has $p$ regressors including the intercept.
Also a finite-sample correction is
- $\text{AICC} = \text{AIC} + 2(K+1)(K+2)/(N-K-2)$.
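As a numerical cross-check on the algebra, for the normal linear model the econometricians' expression equals $-\ln L + 2k$ evaluated at the MLE. A minimal Python sketch, using hypothetical simulated data rather than the slides' dataset:

```python
import numpy as np

rng = np.random.default_rng(10101)
n, k = 40, 2                              # sample size; k parameters (intercept + slope)
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / n            # MLE variance: (1/n) * sum of squared residuals

# Expression on the slide: (n/2) ln 2*pi + (n/2) ln sigma2_hat + n/2 + 2k
aic_slide = 0.5 * n * np.log(2 * np.pi) + 0.5 * n * np.log(sigma2_hat) + 0.5 * n + 2 * k

# Normal log-likelihood evaluated at the MLE
loglik = -0.5 * n * np.log(2 * np.pi) - 0.5 * n * np.log(sigma2_hat) - 0.5 * n
print(np.isclose(aic_slide, -loglik + 2 * k))   # True
```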
. global xlist1                           // model 1: intercept only
. global xlist2 x1
. global xlist3 x2
. global xlist4 x3
. global xlist5 x1 x2
. global xlist6 x2 x3
. global xlist7 x1 x3
. global xlist8 x1 x2 x3
.
. * Full sample estimates with AIC, BIC, Cp, R2adj penalties
. quietly regress y $xlist8
. scalar s2full = e(rss)/(e(N)-e(rank))   // full-model s^2 used in Mallows Cp below
. forvalues k = 1/8 {
2. quietly regress y ${xlist`k'}
3. scalar mse`k' = e(rss)/e(N)
4. scalar r2adj`k' = e(r2_a)
5. scalar aic`k' = -2*e(ll) + 2*e(rank)
6. scalar bic`k' = -2*e(ll) + e(rank)*ln(e(N))
7. scalar cp`k' = e(rss)/s2full - e(N) + 2*e(rank)
8. display "Model " "${xlist`k'}" _col(15) " MSE=" %8.5f mse`k' ///
> " R2adj=" %6.3f r2adj`k' " AIC=" %7.2f aic`k' ///
> " BIC=" %7.2f bic`k' " Cp=" %6.3f cp`k'
9. }
Model MSE=11.27186 R2adj= 0.000 AIC= 212.41 BIC= 214.10 Cp= 9.199
Model x1 MSE= 8.73891 R2adj= 0.204 AIC= 204.23 BIC= 207.60 Cp= 0.593
Model x2 MSE= 9.99158 R2adj= 0.090 AIC= 209.58 BIC= 212.96 Cp= 5.838
Model x3 MSE=10.80016 R2adj= 0.017 AIC= 212.70 BIC= 216.08 Cp= 9.224
Model x1 x2 MSE= 8.59796 R2adj= 0.196 AIC= 205.58 BIC= 210.64 Cp= 2.002
Model x2 x3 MSE= 9.84189 R2adj= 0.080 AIC= 210.98 BIC= 216.05 Cp= 7.211
Model x1 x3 MSE= 8.73887 R2adj= 0.183 AIC= 206.23 BIC= 211.29 Cp= 2.592
Model x1 x2 x3 MSE= 8.59740 R2adj= 0.174 AIC= 207.57 BIC= 214.33 Cp= 4.000
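For readers without Stata, the same penalty measures can be reproduced in Python. The sketch below uses hypothetical simulated data (the same structure: n = 40, three correlated regressors, names x1, x2, x3 kept for comparability) and the same formulas as the loop above, with s2full = RSS/(n - p) from the largest model:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(10101)
n = 40
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
x3 = 0.4 * x1 + rng.normal(size=n)
y = 2.0 + 1.0 * x1 + rng.normal(size=n, scale=3.0)
cols = {"x1": x1, "x2": x2, "x3": x3}

def ols_stats(names):
    """OLS of y on an intercept plus the named regressors; return RSS and rank."""
    X = np.column_stack([np.ones(n)] + [cols[v] for v in names])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ b) ** 2), X.shape[1]

rss_full, p_full = ols_stats(["x1", "x2", "x3"])
s2full = rss_full / (n - p_full)          # full-model error variance estimate

results = {}
for size in range(4):                     # all 8 subsets, from intercept-only to full
    for names in combinations(["x1", "x2", "x3"], size):
        rss, k = ols_stats(list(names))
        sigma2 = rss / n
        ll = -0.5 * n * (np.log(2 * np.pi) + np.log(sigma2) + 1)
        aic = -2 * ll + 2 * k             # same formula as e(ll)-based Stata loop
        bic = -2 * ll + k * np.log(n)
        cp = rss / s2full - n + 2 * k
        results[names] = (rss / n, aic, bic, cp)
        print(names, f"MSE={rss / n:8.4f} AIC={aic:7.2f} BIC={bic:7.2f} Cp={cp:6.3f}")
```

Note that Cp of the full model is exactly the number of its parameters (here 4), matching the Cp=4.000 row in the output above.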
2. Model Selection using predictive ability 2.4 Cross-validation overview
2. Model Selection using predictive ability 2.5 Single-split cross validation
. tabulate dtrain

     dtrain |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |          8       20.00       20.00
          1 |         32       80.00      100.00
------------+-----------------------------------
      Total |         40      100.00
. * Single split validation - training and test MSE for the 8 possible models
. forvalues k = 1/8 {
2. qui reg y ${xlist`k'} if dtrain==1
3. qui predict y`k'hat
4. qui gen y`k'errorsq = (y`k'hat - y)^2
5. qui sum y`k'errorsq if dtrain == 1
6. scalar mse`k'train = r(mean)
7. qui sum y`k'errorsq if dtrain == 0
8. qui scalar mse`k'test = r(mean)
9. display "Model " "${xlist`k'}" _col(16) ///
> " Training MSE = " %7.3f mse`k'train " Test MSE = " %7.3f mse`k'test
10. }
Model Training MSE = 10.124 Test MSE = 16.280
Model x1 Training MSE = 7.478 Test MSE = 13.871
Model x2 Training MSE = 8.840 Test MSE = 14.803
Model x3 Training MSE = 9.658 Test MSE = 15.565
Model x1 x2 Training MSE = 7.288 Test MSE = 13.973
Model x2 x3 Training MSE = 8.668 Test MSE = 14.674
Model x1 x3 Training MSE = 7.474 Test MSE = 13.892
Model x1 x2 x3 Training MSE = 7.288 Test MSE = 13.980
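The single-split exercise translates directly. A minimal Python sketch on hypothetical simulated data, with an 80/20 train/test split as in the tabulation above (nested models only, for brevity):

```python
import numpy as np

rng = np.random.default_rng(10101)
n = 40
x1, x2, x3 = rng.normal(size=(3, n))
y = 2.0 + 1.0 * x1 + rng.normal(size=n, scale=3.0)
cols = {"x1": x1, "x2": x2, "x3": x3}

train = rng.permutation(n) < 32           # 32 training, 8 test observations
models = [[], ["x1"], ["x1", "x2"], ["x1", "x2", "x3"]]

mse = {}
for names in models:
    X = np.column_stack([np.ones(n)] + [cols[v] for v in names])
    b, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)  # fit on training data only
    err2 = (y - X @ b) ** 2
    mse[tuple(names)] = (err2[train].mean(), err2[~train].mean())
    print(names, "train MSE = %.3f  test MSE = %.3f" % mse[tuple(names)])
```

As in the table above, training MSE can only fall as regressors are added to a nested sequence, while test MSE need not.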
2. Model Selection using predictive ability 2.6 K-fold cross validation
splitsample command
. * Split sample into five equal size parts using the splitsample command
. splitsample, nsplit(5) generate(snum) rseed(10101)
. tabulate snum

       snum |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          8       20.00       20.00
          2 |          8       20.00       40.00
          3 |          8       20.00       60.00
          4 |          8       20.00       80.00
          5 |          8       20.00      100.00
------------+-----------------------------------
      Total |         40      100.00
. matrix allmses = J(5,1,.)
. forvalues i = 1/5 {
2. qui reg y x1 x2 x3 if snum != `i'
3. qui predict y`i'hat
4. qui gen y`i'errorsq = (y`i'hat - y)^2
5. qui sum y`i'errorsq if snum ==`i'
6. matrix allmses[`i',1] = r(mean)
7. }
. matrix list allmses

allmses[5,1]
c1
r1 13.980321
r2 6.4997357
r3 9.3623792
r4 6.413401
r5 12.23958
. * Compute the average MSE over the five folds and standard deviation
. svmat allmses, names(vallmses)
. quietly summarize vallmses1
. display "CV5 = " %5.3f r(mean) " with st. dev. = " %5.3f r(sd)
CV5 = 9.699 with st. dev. = 3.389
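A compact Python sketch of the whole 5-fold procedure on hypothetical simulated data (folds assigned pseudo-randomly; the sample standard deviation, ddof=1, corresponds to Stata's r(sd)):

```python
import numpy as np

rng = np.random.default_rng(10101)
n, K = 40, 5
x1, x2, x3 = rng.normal(size=(3, n))
y = 2.0 + 1.0 * x1 + rng.normal(size=n, scale=3.0)
X = np.column_stack([np.ones(n), x1, x2, x3])

folds = rng.permutation(n) % K            # assign each observation to one of 5 folds
fold_mses = []
for i in range(K):
    test = folds == i
    b, *_ = np.linalg.lstsq(X[~test], y[~test], rcond=None)  # fit on the other 4 folds
    fold_mses.append(np.mean((y[test] - X[test] @ b) ** 2))  # MSE on held-out fold

fold_mses = np.array(fold_mses)
print(f"CV5 = {fold_mses.mean():.3f} with st. dev. = {fold_mses.std(ddof=1):.3f}")
```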
RMSE by fold (square root of each fold's MSE)

           RMSE
est1   3.739027
est2   2.549458
est3   3.059801
est4   2.532469
est5   3.498511
2. Model Selection using predictive ability 2.7 Leave-one-out cross validation
$\text{CV}_n = \frac{1}{n}\sum_{i=1}^{n}\text{MSE}_{(-i)} = \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - \hat{y}_{(-i)}\bigr)^2$
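For OLS the n refits are unnecessary: the leave-one-out residual equals $e_i/(1-h_{ii})$, where $h_{ii}$ is the i-th diagonal of the hat matrix. A Python sketch on hypothetical simulated data verifying the explicit loop against this shortcut:

```python
import numpy as np

rng = np.random.default_rng(10101)
n = 40
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

# Explicit LOOCV: refit n times, each time predicting the held-out observation
cv_loop = 0.0
for i in range(n):
    keep = np.arange(n) != i
    b, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    cv_loop += (y[i] - X[i] @ b) ** 2
cv_loop /= n

# Shortcut: one full-sample fit plus the leverages h_ii
b, *_ = np.linalg.lstsq(X, y, rcond=None)
h = np.einsum("ij,ji->i", X, np.linalg.solve(X.T @ X, X.T))  # diag of X(X'X)^{-1}X'
cv_short = np.mean(((y - X @ b) / (1 - h)) ** 2)
print(np.isclose(cv_loop, cv_short))   # True
```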
[Comparison table (Method, Value): entries not recovered.]
Response: y
Selected predictors: x1 x2 x3
Optimal models:
1 : x1
2 : x1 x2
3 : x1 x2 x3
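Listings like the one above can be produced by brute-force best subsets: for each model size, keep the subset with the smallest RSS. A Python sketch on hypothetical simulated data (the selected subsets depend on the draw, so they need not match the output above):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(10101)
n = 40
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
x3 = 0.4 * x1 + rng.normal(size=n)
y = 2.0 + 1.0 * x1 + 0.5 * x2 + rng.normal(size=n, scale=3.0)
cols = {"x1": x1, "x2": x2, "x3": x3}

def rss(names):
    """RSS from OLS of y on an intercept plus the named regressors."""
    X = np.column_stack([np.ones(n)] + [cols[v] for v in names])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ b) ** 2)

optimal = {}
for size in (1, 2, 3):
    best = min(combinations(cols, size), key=rss)   # lowest-RSS subset of this size
    optimal[size] = best
    print(f"{size} : {' '.join(best)}")
```

A criterion such as AIC, BIC, or cross validation is then applied only to these size-specific winners to choose the final model.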
2. Model Selection using predictive ability 2.8 Forwards selection, backwards selection and best subsets
2. Model Selection using predictive ability 2.9 Selection using Statistical Significance
References
ISLR2: Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani
(2021), An Introduction to Statistical Learning: with Applications in R, 2nd ed.,
Springer.
- PDF and $40 softcover book at
  https://link.springer.com/book/10.1007/978-1-0716-1418-1
ISLP: Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani and
Jonathan Taylor (2023), An Introduction to Statistical Learning: with Applications
in Python, Springer.