Ridge Regression
Let us first implement it on the problem above and check whether it performs better than our linear regression model.
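As a minimal sketch in scikit-learn, this might look like the following, assuming the preprocessed Big Mart predictors and target have already been split into x_train, x_cv, y_train and y_cv (hypothetical names, not defined in this excerpt):

from sklearn.linear_model import Ridge

ridge = Ridge(alpha=0.05)           # alpha controls the regularization strength
ridge.fit(x_train, y_train)         # learn the coefficients on the training set
pred = ridge.predict(x_cv)          # predictions on the validation set
print(ridge.score(x_cv, y_cv))      # R-squared on the validation set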
We can see a slight improvement in our model, because the R-squared value has increased. Note that alpha is a hyperparameter of Ridge, which means it is not learned automatically by the model; it has to be set manually.
Here we have used alpha = 0.05. Now let us try different values of alpha and plot the coefficients for each case.
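A sketch of that sweep, using matplotlib and the same assumed x_train and y_train as above (the grid of alpha values is only illustrative):

import matplotlib.pyplot as plt
from sklearn.linear_model import Ridge

alphas = [0.01, 0.05, 0.5, 1, 5, 10]                    # illustrative grid
for a in alphas:
    coefs = Ridge(alpha=a).fit(x_train, y_train).coef_  # refit for each alpha
    plt.plot(coefs, marker='o', label='alpha = ' + str(a))
plt.xlabel('coefficient index')
plt.ylabel('coefficient value')
plt.legend()
plt.show()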
You can see that as we increase the value of alpha, the magnitude of the coefficients decreases: the values approach zero but never become exactly zero.
But if you calculate R-squared for each alpha, you will see that its value is maximum at alpha = 0.05. So we have to choose alpha wisely, by iterating through a range of values and using the one that gives us the lowest error.
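One way to do that iteration, as a sketch, is to let scikit-learn's RidgeCV pick the best value from a grid by cross-validation (the grid and the x_train / x_cv names are again only illustrative assumptions):

from sklearn.linear_model import RidgeCV

ridge_cv = RidgeCV(alphas=[0.001, 0.01, 0.05, 0.1, 1.0, 10.0], cv=5)
ridge_cv.fit(x_train, y_train)
print(ridge_cv.alpha_)               # alpha with the best cross-validated score
print(ridge_cv.score(x_cv, y_cv))    # R-squared of the refitted model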
So now you have an idea of how to implement it, but let us also take a look at the mathematics behind it. Until now, our goal was simply to minimize the cost function so that the predicted values come as close as possible to the desired result.
Now take another look at the cost function for ridge regression, written out below.
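In standard notation, with θ for the coefficients, m training examples and n features, the ridge cost function can be written as:

J(\theta) = \sum_{i=1}^{m} \left( y^{(i)} - \theta^{T} x^{(i)} \right)^{2} + \lambda \sum_{j=1}^{n} \theta_{j}^{2}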
If you notice, we come across an extra term here, known as the penalty term. The λ given here is what the alpha parameter in the Ridge function denotes. So by changing the value of alpha, we are basically controlling the penalty term: the higher the value of alpha, the bigger the penalty, and therefore the more the magnitudes of the coefficients are reduced.
Important Points:
- Ridge shrinks the coefficients: their magnitudes decrease as alpha (λ) increases, but they never become exactly zero.
- The penalty term is the sum of the squares of the coefficients (an L2 penalty), scaled by λ.
- Because no coefficient is driven exactly to zero, ridge does not perform feature selection.
Now let us consider another type of regression technique which also makes use of
regularization.
LASSO (Least Absolute Shrinkage and Selection Operator) is quite similar to ridge, but let's understand the difference between them by implementing it on our Big Mart problem.
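A sketch with scikit-learn's Lasso, on the same assumed split as before (the alpha value is only illustrative):

import numpy as np
from sklearn.linear_model import Lasso

lasso = Lasso(alpha=0.01)
lasso.fit(x_train, y_train)
print(lasso.score(x_cv, y_cv))                   # R-squared on the validation set
print(np.sum(lasso.coef_ == 0), 'coefficients are exactly zero')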
We can see that even at small values of alpha, the magnitudes of the coefficients have reduced a lot. By looking at the plots, can you figure out a difference between ridge and lasso? With ridge, as we increased the value of alpha, the coefficients were approaching zero; with lasso, even at smaller alphas, some of the coefficients are reduced to exactly zero. Therefore lasso selects only some of the features and reduces the coefficients of the others to zero. This property is known as feature selection, and it is absent in the case of ridge.
The mathematics behind lasso regression is quite similar to that of ridge; the only difference is that instead of adding the squares of θ, we add the absolute values of θ.
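In the same notation as before, the lasso cost function can be written as:

J(\theta) = \sum_{i=1}^{m} \left( y^{(i)} - \theta^{T} x^{(i)} \right)^{2} + \lambda \sum_{j=1}^{n} \lvert \theta_{j} \rvert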
Here too, λ is the hyperparameter, and its value corresponds to the alpha parameter in the Lasso function.
Important Points:
- The penalty term is the sum of the absolute values of the coefficients (an L1 penalty), scaled by λ (alpha).
- Even at small values of alpha, lasso can drive some coefficients to exactly zero, so it performs feature selection, which ridge does not.
Now that you have a basic understanding of ridge and lasso regression, let's think of an example where we have a large dataset, say with 10,000 features, and we know that some of the independent features are correlated with other independent features. Which regression would you use, Ridge or Lasso?
Let's discuss them one by one. If we apply ridge regression, it will retain all of the features but shrink their coefficients. The problem is that the model will still remain complex, since all 10,000 features are kept, and this may lead to poor model performance.
What if, instead of ridge, we apply lasso regression to this problem? The main problem with lasso is that when we have correlated variables, it retains only one of them and sets the coefficients of the other correlated variables to zero. That may lead to some loss of information, resulting in lower accuracy for our model.
So what is the solution to this problem? We have another type of regression, known as elastic net regression, which is basically a hybrid of ridge and lasso regression. Let's try to understand it.
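As a sketch, scikit-learn's ElasticNet mixes the two penalties: l1_ratio = 0 corresponds to pure ridge and l1_ratio = 1 to pure lasso (the values below are only illustrative, on the same assumed split as before):

from sklearn.linear_model import ElasticNet

enet = ElasticNet(alpha=0.01, l1_ratio=0.5)   # blend of L1 and L2 penalties
enet.fit(x_train, y_train)
print(enet.score(x_cv, y_cv))                 # R-squared on the validation set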
install.packages("glmnet")
library(glmnet)
train<-train[c(-1)]
Y<-train[c(11)]
set.seed(567)
part <- sample(2, nrow(X), replace = TRUE, prob = c(0.7, 0.3))
X_train<- X[part == 1,]
X_cv<- X[part == 2,]
#ridge regression
m<-mean((Y_cv - ridge.pred)^2)
m
out = glmnet(X[X_train,],Y[X_train],alpha = 0)
ridge.coef<-predict(ridge_reg, type = "coefficients", s = bestlam)[1:40,]
ridge.coef
train<-train[c(-1)]
Y<-train[c(11)]
set.seed(567)
part <- sample(2, nrow(X), replace = TRUE, prob = c(0.7, 0.3))
X_train<- X[part == 1,]
X_cv<- X[part == 2,]
#lasso regression
lasso_reg <- glmnet(X[X_train,], Y[X_train], alpha = 1, lambda = lambda)
lasso.pred <- predict(lasso_reg, s = bestlam, newx = X[X_cv,])
m<-mean((lasso.pred-Y_cv)^2)
m