
MH4510 - Statistical Learning and Data Mining - AY1819 S1 Lab 07

MH4510 - Regularization Method


Matthew Zakharia Hadimaja

28th September 2018 (Fri)


Course instructor : PUN Chi Seng
Lab instructor : Matthew Zakharia Hadimaja

References
Chapter 6.6, [ISLR] An Introduction to Statistical Learning (with Applications in R). The book is freely available at http://www-bcf.usc.edu/~gareth/ISL/.
To see the help file of a function funcname, type ?funcname.

1. Preparation

Load dataset
library(ISLR)
data(Hitters)
Hitters <- na.omit(Hitters)

glmnet expects its inputs in a specific format, so we construct them first.
# x, the predictor, has to be a numerical matrix
# model.matrix converts factors to a set of dummy variables
x <- model.matrix(Salary ~ ., Hitters)[, -1]
head(x)
# y, the output, has to be a vector
y <- Hitters$Salary
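
The cross-validation code later in this lab indexes a training set with a vector called train, which is not defined in this extract. A minimal sketch of a random half/half split (the seed, the index name train, and the 50/50 proportion are assumptions, following the usual ISLR convention):

# Assumed: random 50/50 split of the rows into a training index and a test set
set.seed(1)
train <- sample(1:nrow(x), nrow(x) / 2)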

2. Ridge Regression

The penalty is defined as $\frac{1-\alpha}{2}\|\beta\|_2^2 + \alpha\|\beta\|_1$. Therefore, for ridge regression we set alpha = 0.
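
For reference (a sketch based on the glmnet documentation, not shown in the original handout), the full objective minimised for a Gaussian response is

$$\min_{\beta_0,\,\beta}\;\frac{1}{2n}\sum_{i=1}^{n}\bigl(y_i-\beta_0-x_i^{\top}\beta\bigr)^2\;+\;\lambda\Bigl[\tfrac{1-\alpha}{2}\|\beta\|_2^2+\alpha\|\beta\|_1\Bigr]$$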
library(glmnet)
grid <- 10 ^ seq(10, -2, length = 100) # lambda from 10^10 to 10^-2, logarithmically scaled
ridge.mod <- glmnet(x, y, alpha = 0, lambda = grid)
names(ridge.mod) # read ?glmnet for details
dim(coef(ridge.mod))
par(mfrow = c(1,2))
plot(ridge.mod, xvar = 'norm')
plot(ridge.mod, xvar = 'lambda')


Large vs small lambda

Large lambda
ridge.mod$lambda[50]
coef(ridge.mod)[, 50]
sqrt(sum(coef(ridge.mod)[-1, 50] ^ 2)) # l2-norm of the coefficients (excluding the intercept)

Small lambda
ridge.mod$lambda[60]
coef(ridge.mod)[, 60]
sqrt(sum(coef(ridge.mod)[-1, 60] ^ 2)) # l2-norm of the coefficients (excluding the intercept)

Predict the coefficients at a new lambda value, supplied through the argument s.


predict(ridge.mod, s = 50, type = "coefficients")[1:20, ]

For lambda = 0 or lambda = Inf, what model does the algorithm produce?
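
One way to explore this question empirically (a sketch, not part of the original handout): compare the ridge coefficients at a very large lambda and at lambda = 0 with an intercept-only model and the least-squares fit. Depending on the glmnet version, exact = TRUE may require the original x and y to be supplied.

# Sketch: ridge coefficients at extreme lambda values
predict(ridge.mod, s = 1e10, type = "coefficients")[1:20, ] # close to the intercept-only model
predict(ridge.mod, s = 0, exact = TRUE, x = x, y = y,
        type = "coefficients")[1:20, ]                      # close to the least-squares fit
coef(lm(Salary ~ ., data = Hitters))                        # compare with lm()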

3. LASSO

Same as ridge regression, but now with alpha = 1. Notice that some coefficients can be exactly zero.
lasso.mod <- glmnet(x, y, alpha = 1, lambda = grid)
par(mfrow = c(1,2))
plot(lasso.mod, xvar = 'norm')
plot(lasso.mod, xvar = 'lambda')

Large vs small lambda

Large lambda
lasso.mod$lambda[50]
coef(lasso.mod)[, 50]
sqrt(sum(coef(lasso.mod)[-1, 50] ^ 2)) # l2-norm of the coefficients (excluding the intercept)

Small lambda
lasso.mod$lambda[80]
coef(lasso.mod)[, 80]
sqrt(sum(coef(lasso.mod)[-1, 80] ^ 2)) # l2-norm of the coefficients (excluding the intercept)

Predict the coefficients at a new lambda value, supplied through the argument s.


predict(lasso.mod, s = 50, type = "coefficients")[1:20, ]
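
A quick check of the sparsity (a sketch, not in the original handout): count how many coefficients are exactly zero at this value of s.

# Sketch: how many coefficients are exactly zero at s = 50?
lasso.coef50 <- predict(lasso.mod, s = 50, type = "coefficients")[1:20, ]
sum(lasso.coef50 == 0)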

Cross validation

Use cross-validation to choose the best lambda.


lasso.cv <- cv.glmnet(x[train, ], y[train], alpha = 1)
plot(lasso.cv)
(lasso.bestlam <- lasso.cv$lambda.min)
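
Besides lambda.min, cv.glmnet also stores lambda.1se, the largest lambda whose CV error is within one standard error of the minimum; it typically gives a sparser model. A short optional sketch:

lasso.cv$lambda.1se # the more conservative "one-standard-error" choice of lambda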

Refit using the whole training set


lasso.cvmod <- glmnet(x[train, ], y[train], alpha = 1, lambda = lasso.bestlam)


lasso.coef <- coef(lasso.cvmod)
lasso.coef

Predict on the test set


lasso.pred <- predict(lasso.cvmod, newx = x[-train, ])
mean((lasso.pred - y[-train]) ^ 2)
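
As a baseline comparison (a sketch, not part of the original handout), the test MSE of an intercept-only model that always predicts the training mean of Salary:

# Sketch: test MSE when predicting every test observation by the training mean
mean((mean(y[train]) - y[-train]) ^ 2)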

4. Tutorial

Explain how K-fold cross-validation is implemented for ridge regression / LASSO with scaling. Specify how the cross-validation function is computed and how the scaling is implemented.
Below is pseudocode for CV without scaling; note that it does not describe how the cv.glmnet function is actually implemented. Modify the pseudocode to answer the question above (a runnable R sketch of the unscaled loop is given after the list).
Suppose L is a vector containing the lambda values to try, and X is our data.

1. set the CV error for each lambda, CV[lambda] = 0, for every lambda in L
2. split X into training set Tr and test set Te
3. split Tr into K random parts of equal size, Tr[k], k = 1, 2, ..., K
4. for k in 1:K
   1. set Tr[-k] as the k-th pseudo training set, pTr[k]
   2. set Tr[k] as the k-th pseudo test set, pTe[k]
   3. for lambda in L
      1. perform ridge regression / LASSO on pTr[k] with lambda
      2. evaluate the test error on pTe[k]
      3. CV[lambda] = CV[lambda] + test error
5. choose the lambda that minimises CV[lambda], call it lambda*
6. refit the model on the whole of Tr with lambda*
7. check the performance on Te
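
The following is a minimal R sketch of the unscaled CV loop above, written for LASSO (alpha = 1). It reuses x, y, grid and train from earlier sections; the fold assignment, the use of MSE as the test error, and all variable names are assumptions, and this is not how cv.glmnet is implemented internally.

# Sketch: K-fold cross-validation for LASSO, following the pseudocode above
K <- 10
L <- grid                                              # lambda values to try
folds <- sample(rep(1:K, length.out = length(train)))  # random fold label for each row of Tr
cv.err <- rep(0, length(L))                            # CV[lambda] = 0 for every lambda in L

for (k in 1:K) {
  pTr <- train[folds != k]                             # k-th pseudo training set
  pTe <- train[folds == k]                             # k-th pseudo test set
  fit <- glmnet(x[pTr, ], y[pTr], alpha = 1, lambda = L)
  pred <- predict(fit, newx = x[pTe, ], s = L)         # one column of predictions per lambda
  cv.err <- cv.err + colMeans((pred - y[pTe]) ^ 2)     # accumulate the test error
}

best.lam <- L[which.min(cv.err)]                       # lambda* minimising CV[lambda]
final.fit <- glmnet(x[train, ], y[train], alpha = 1, lambda = best.lam)
mean((predict(final.fit, newx = x[-train, ]) - y[-train]) ^ 2)  # performance on Te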
