
MH4510 - Statistical Learning and Data Mining - AY1819 S1 Lab 07

MH4510 - Regularization Method


Matthew Zakharia Hadimaja

28th September 2018 (Fri)


Course instructor : PUN Chi Seng
Lab instructor : Matthew Zakharia Hadimaja

References
Chapter 6.6, [ISLR] An Introduction to Statistical Learning (with Applications in R). The book is freely available at http://www-bcf.usc.edu/~gareth/ISL/.
To see the help file of a function funcname, type ?funcname.

1. Preparation

Load dataset
library(ISLR)
data(Hitters)
Hitters <- na.omit(Hitters)

glmnet expects its inputs in a specific format, so we construct them first.
# x, the predictor, has to be a numerical matrix
# model.matrix converts factors to a set of dummy variables
x <- model.matrix(Salary ~ ., Hitters)[, -1]
head(x)
# y, the output, has to be a vector
y <- Hitters$Salary
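
The cross-validation code later in this lab indexes a training set with a vector called train, which is not defined in this extract. A minimal sketch of a random half/half split (the seed, the index name train, and the 50/50 proportion are assumptions, following the usual ISLR convention):

# Assumed: random 50/50 split of the rows into a training index and a test set
set.seed(1)
train <- sample(1:nrow(x), nrow(x) / 2)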

2. Ridge Regression

The penalty is defined as $\frac{1-\alpha}{2}\|\beta\|_2^2 + \alpha\|\beta\|_1$. Therefore, for ridge regression we set alpha = 0.
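
For reference (a sketch based on the glmnet documentation, not shown in the original handout), the full objective minimised for a Gaussian response is

$$\min_{\beta_0,\,\beta}\;\frac{1}{2n}\sum_{i=1}^{n}\bigl(y_i-\beta_0-x_i^{\top}\beta\bigr)^2\;+\;\lambda\Bigl[\tfrac{1-\alpha}{2}\|\beta\|_2^2+\alpha\|\beta\|_1\Bigr]$$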
library(glmnet)
grid <- 10 ^ seq(10, -2, length = 100) # lambda from 10^10 to 10^-2, logarithmically scaled
ridge.mod <- glmnet(x, y, alpha = 0, lambda = grid)
names(ridge.mod) # read ?glmnet for details
dim(coef(ridge.mod))
par(mfrow = c(1,2))
plot(ridge.mod, xvar = 'norm')
plot(ridge.mod, xvar = 'lambda')


Large vs small lambda

Large lambda
ridge.mod$lambda[50]
coef(ridge.mod)[, 50]
sqrt(sum(coef(ridge.mod)[-1, 50] ^ 2)) # l2-norm of the coefficients (excluding the intercept)

Small lambda
ridge.mod$lambda[60]
coef(ridge.mod)[, 60]
sqrt(sum(coef(ridge.mod)[-1, 60] ^ 2)) # l2-norm of the coefficients (excluding the intercept)

Predict the coefficients at a new lambda value, supplied through the argument s.


predict(ridge.mod, s = 50, type = "coefficients")[1:20, ]

For lambda = 0 or lambda = Inf, what model does the algorithm produce?
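
One way to explore this question empirically (a sketch, not part of the original handout): compare the ridge coefficients at a very large lambda and at lambda = 0 with an intercept-only model and the least-squares fit. Depending on the glmnet version, exact = TRUE may require the original x and y to be supplied.

# Sketch: ridge coefficients at extreme lambda values
predict(ridge.mod, s = 1e10, type = "coefficients")[1:20, ] # close to the intercept-only model
predict(ridge.mod, s = 0, exact = TRUE, x = x, y = y,
        type = "coefficients")[1:20, ]                      # close to the least-squares fit
coef(lm(Salary ~ ., data = Hitters))                        # compare with lm()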

3. LASSO

Same as ridge regression, but now with alpha = 1. Notice that some coefficients can be exactly zero.
lasso.mod <- glmnet(x, y, alpha = 1, lambda = grid)
par(mfrow = c(1,2))
plot(lasso.mod, xvar = 'norm')
plot(lasso.mod, xvar = 'lambda')

Large vs small lambda

Large lambda
lasso.mod$lambda[50]
coef(lasso.mod)[, 50]
sqrt(sum(coef(lasso.mod)[-1, 50] ^ 2)) # l2-norm of the coefficients (excluding the intercept)

Small lambda
lasso.mod$lambda[80]
coef(lasso.mod)[, 80]
sqrt(sum(coef(lasso.mod)[-1, 80] ^ 2)) # l2-norm of the coefficients (excluding the intercept)

Predict the coefficients at a new lambda value, supplied through the argument s.


predict(lasso.mod, s = 50, type = "coefficients")[1:20, ]
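
A quick check of the sparsity (a sketch, not in the original handout): count how many coefficients are exactly zero at this value of s.

# Sketch: how many coefficients are exactly zero at s = 50?
lasso.coef50 <- predict(lasso.mod, s = 50, type = "coefficients")[1:20, ]
sum(lasso.coef50 == 0)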

Cross validation

Use cross-validation to choose the best lambda.


lasso.cv <- cv.glmnet(x[train, ], y[train], alpha = 1)
plot(lasso.cv)
(lasso.bestlam <- lasso.cv$lambda.min)
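
Besides lambda.min, cv.glmnet also stores lambda.1se, the largest lambda whose CV error is within one standard error of the minimum; it typically gives a sparser model. A short optional sketch:

lasso.cv$lambda.1se # the more conservative "one-standard-error" choice of lambda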

Refit using the whole training set


lasso.cvmod <- glmnet(x[train, ], y[train], alpha = 1, lambda = lasso.bestlam)


lasso.coef <- coef(lasso.cvmod)
lasso.coef

Predict on the test set


lasso.pred <- predict(lasso.cvmod, newx = x[-train, ])
mean((lasso.pred - y[-train]) ^ 2)
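
As a baseline comparison (a sketch, not part of the original handout), the test MSE of an intercept-only model that always predicts the training mean of Salary:

# Sketch: test MSE when predicting every test observation by the training mean
mean((mean(y[train]) - y[-train]) ^ 2)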

4. Tutorial

Explain how K-fold cross-validation is implemented for ridge regression / LASSO with scaling. Specify how the cross-validation function is computed and how the scaling is implemented.
Below is pseudocode for CV without scaling; note that it does not describe how the cv.glmnet function is actually implemented. Modify the pseudocode to answer the question above (a runnable R sketch of the unscaled loop is given after the list).
Suppose L is a vector containing the lambda values to try, and X is our data.

1. set the CV error for each lambda, CV[lambda] = 0, for every lambda in L
2. split X into training set Tr and test set Te
3. split Tr into K random parts of equal size, Tr[k], k = 1, 2, ..., K
4. for k in 1:K
   1. set Tr[-k] as the k-th pseudo training set, pTr[k]
   2. set Tr[k] as the k-th pseudo test set, pTe[k]
   3. for lambda in L
      1. perform ridge regression / LASSO on pTr[k] with lambda
      2. evaluate the test error on pTe[k]
      3. CV[lambda] = CV[lambda] + test error
5. choose the lambda that minimises CV[lambda], call it lambda*
6. refit the model on the whole of Tr with lambda*
7. check the performance on Te
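
The following is a minimal R sketch of the unscaled CV loop above, written for LASSO (alpha = 1). It reuses x, y, grid and train from earlier sections; the fold assignment, the use of MSE as the test error, and all variable names are assumptions, and this is not how cv.glmnet is implemented internally.

# Sketch: K-fold cross-validation for LASSO, following the pseudocode above
K <- 10
L <- grid                                              # lambda values to try
folds <- sample(rep(1:K, length.out = length(train)))  # random fold label for each row of Tr
cv.err <- rep(0, length(L))                            # CV[lambda] = 0 for every lambda in L

for (k in 1:K) {
  pTr <- train[folds != k]                             # k-th pseudo training set
  pTe <- train[folds == k]                             # k-th pseudo test set
  fit <- glmnet(x[pTr, ], y[pTr], alpha = 1, lambda = L)
  pred <- predict(fit, newx = x[pTe, ], s = L)         # one column of predictions per lambda
  cv.err <- cv.err + colMeans((pred - y[pTe]) ^ 2)     # accumulate the test error
}

best.lam <- L[which.min(cv.err)]                       # lambda* minimising CV[lambda]
final.fit <- glmnet(x[train, ], y[train], alpha = 1, lambda = best.lam)
mean((predict(final.fit, newx = x[-train, ]) - y[-train]) ^ 2)  # performance on Te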
