
Machine Learning

Session 17-18

Prof. Dr.Ijaz Hussain

Statistics, QAU

May 13, 2024

Shrinkage Methods

The subset selection methods involve using least squares to fit a linear
model that contains a subset of the predictors.
As an alternative, we can fit a model containing all p predictors using
a technique that constrains or regularizes the coefficient estimates, or
equivalently, that shrinks the coefficient estimates towards zero.
It may not be immediately obvious why such a constraint should im-
prove the fit, but it turns out that shrinking the coefficient estimates
can significantly reduce their variance.
The two best-known techniques for shrinking the regression coefficients
towards zero are ridge regression and the lasso.

Ridge Regression
The least squares fitting procedure estimates β0, β1, ..., βp using the values that minimize

RSS = \sum_{i=1}^{n}\Bigl(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\Bigr)^2

Ridge regression is very similar to OLS, except that the coefficients are
estimated by minimizing a slightly different quantity.
Ridge regression coefficient estimates β̂^R are the values that minimize

RSS^R = \sum_{i=1}^{n}\Bigl(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\Bigr)^2 + \lambda\sum_{j=1}^{p}\beta_j^2 = RSS + \lambda\sum_{j=1}^{p}\beta_j^2

where λ ≥ 0 is a tuning parameter, to be determined separately.
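A minimal sketch of these two objectives in Python (assuming NumPy and scikit-learn are available; the simulated data and the value of λ are illustrative, not from the slides):

```python
import numpy as np
from sklearn.linear_model import Ridge

# Simulated data (hypothetical): 100 observations, 5 predictors
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 1.0 + X @ np.array([3.0, -2.0, 0.0, 1.5, 0.0]) + rng.normal(size=100)

# Least squares: minimize RSS over (beta_0, beta_1, ..., beta_p)
X1 = np.column_stack([np.ones(len(y)), X])        # prepend an intercept column
beta_ols, *_ = np.linalg.lstsq(X1, y, rcond=None)
rss = np.sum((y - X1 @ beta_ols) ** 2)

# Ridge: minimize RSS + lambda * sum(beta_j^2); scikit-learn calls lambda "alpha"
lam = 10.0
ridge = Ridge(alpha=lam).fit(X, y)
ridge_obj = np.sum((y - ridge.predict(X)) ** 2) + lam * np.sum(ridge.coef_ ** 2)

print(f"RSS at the least squares fit: {rss:.2f}")
print(f"Ridge objective at lambda={lam}: {ridge_obj:.2f}")
```

Note that scikit-learn's Ridge penalizes only the slope coefficients and not the intercept, which matches the convention discussed below.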
Ridge Regression...
The above equation represents a trade-off between two different criteria.
1. As with least squares, ridge regression seeks coefficient estimates that fit the data well, by making the RSS small.
2. The second term, λ\sum_{j=1}^{p}\beta_j^2, called a shrinkage penalty, is small when β1, ..., βp are close to zero, and so it has the effect of shrinking the estimates of βj towards zero.
The tuning parameter λ serves to control the relative impact of these
two terms on the regression coefficient estimates.
When λ = 0 , the penalty term has no effect, and ridge regression will
produce the least squares estimates.
However, as λ → ∞ , the impact of the shrinkage penalty grows, and
the ridge regression coefficient estimates will approach zero.
Unlike least squares, which generates only one set of coefficient estimates, ridge regression will produce a different set of coefficient estimates, β̂_λ^R, for each value of λ.
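For instance, a small sketch (scikit-learn assumed; the λ grid and data are illustrative) that produces one coefficient vector per value of λ:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, -2.0, 0.0, 1.5, 0.0]) + rng.normal(size=100)

# One distinct coefficient vector beta_hat^R_lambda for every value of lambda
for lam in [0.01, 0.1, 1.0, 10.0, 100.0, 1e4]:
    coefs = Ridge(alpha=lam).fit(X, y).coef_
    print(f"lambda = {lam:>8}: {np.round(coefs, 3)}")
# Small lambda reproduces (almost) the least squares fit;
# large lambda shrinks every coefficient toward zero.
```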
Ridge Regression...

Selecting a good value for λ is critical; it can be determined using cross-validation methods.
Note that in the equation above, the shrinkage penalty is applied to β1, ..., βp, but not to the intercept β0.
We want to shrink the estimated association of each variable with the
response.
However, we do not want to shrink the intercept, which is simply a
measure of the mean value of the response when xi1 = xi2 = ... =
xip = 0.
If we assume that the variables—that is, the columns of the data matrix X—have been centered to have mean zero before ridge regression is performed, then the estimated intercept will take the form β̂_0 = ȳ = \sum_{i=1}^{n} y_i / n.
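A quick numerical check of this fact (a sketch on simulated data; scikit-learn's Ridge is assumed, which does not penalize the intercept):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(2)
X = rng.normal(loc=5.0, size=(200, 3))       # columns deliberately not centered at zero
y = 2.0 + X @ np.array([1.0, -1.0, 0.5]) + rng.normal(size=200)

Xc = X - X.mean(axis=0)                      # center each column to have mean zero
fit = Ridge(alpha=50.0).fit(Xc, y)           # the intercept itself is not penalized

print(fit.intercept_, y.mean())              # these two values agree (up to rounding)
```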

Ridge regression predictions on a simulated data set

Figure: Squared bias (black), variance (green), and test mean squared error (purple), as a function of λ (left) and of ∥β̂_λ^R∥_2 / ∥β̂∥_2 (right).

The horizontal dashed lines indicate the minimum possible MSE.
The purple crosses indicate the ridge regression models for which the MSE is smallest.
Ridge regression predictions on a simulated data set

The right-hand panel of the Figure displays the same curves as the left-
hand panel, this time plotted against the l2 norm of the ridge regression
coefficient estimates divided by the l2 norm of the least squares esti-
mates.
Now as we move from left to right, the fits become more flexible, and
so the bias decreases and the variance increases.

Why Does Ridge Regression Improve Over Least Squares?

Ridge regression’s advantage over least squares is rooted in the bias-variance trade-off.
As λ increases, the flexibility of the ridge regression fit decreases, leading to decreased variance but increased bias.
This is illustrated in the left-hand panel of the previously displayed
Figure , using a simulated data set containing p = 45 predictors and
n = 50 observations.
The green curve in the left-hand panel of the Figure displays the vari-
ance of the ridge regression predictions as a function of λ.
At the least squares coefficient estimates, which correspond to ridge
regression with λ = 0, the variance is high but there is no bias.
But as λ increases, the shrinkage of the ridge coefficient estimates
leads to a substantial reduction in the variance of the predictions, at
the expense of a slight increase in bias.

Why Does Ridge Regression Improve Over Least Squares?

Recall that the test mean squared error (MSE), plotted in purple, is
closely related to the variance plus the squared bias.
For values of λ up to about 10, the variance decreases rapidly, with
very little increase in bias, plotted in black.
Consequently, the MSE drops considerably as λ increases from 0 to 10.
Beyond this point, the decrease in variance due to increasing λ slows,
and the shrinkage on the coefficients causes them to be significantly
underestimated, resulting in a large increase in the bias.
The minimum MSE is achieved at approximately λ = 30.
Interestingly, because of its high variance, the MSE associated with
the least squares fit, when λ = 0, is almost as high as that of the null
model for which all coefficient estimates are zero, when λ = ∞.
However, for an intermediate value of λ, the MSE is considerably lower.

Computational Advantages of Ridge Regression

Ridge regression also has substantial computational advantages over best subset selection, which requires searching through 2^p models.
As we discussed previously, even for moderate values of p, such a search
can be computationally infeasible.
In contrast, for any fixed value of λ, ridge regression only fits a single
model, and the model-fitting procedure can be performed quite quickly.
In fact, one can show that the computations required to solve the ridge regression equation, simultaneously for all values of λ, are almost identical to those for fitting a model using least squares.
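One way to see this in code (a sketch of the standard SVD trick, not a derivation from the slides): after centering, a single SVD X = U D Vᵀ gives the ridge solution for every λ as β̂_λ^R = V diag(d_j / (d_j² + λ)) Uᵀ y, so an entire grid of λ values costs little more than one least squares fit.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, -2.0, 0.0, 1.5, 0.0]) + rng.normal(size=100)

# Center X and y so that no intercept is needed, then factor X once
Xc, yc = X - X.mean(axis=0), y - y.mean()
U, d, Vt = np.linalg.svd(Xc, full_matrices=False)
Uty = U.T @ yc

# Ridge coefficient vectors for a whole grid of lambdas, reusing the same SVD
lambdas = np.logspace(-2, 4, 50)
betas = np.array([Vt.T @ (d / (d**2 + lam) * Uty) for lam in lambdas])
print(betas.shape)   # (50, 5): one coefficient vector per value of lambda
```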

The Lasso Regression
Unlike best subset, forward stepwise, and backward stepwise selection,
which will generally select models that involve just a subset of the
variables, ridge regression includes all p predictors in the final model.
The penalty term λ\sum_{j=1}^{p}\beta_j^2 will shrink all of the coefficients towards zero, but it will not set any of them exactly to zero (unless λ = ∞).
This may not be a problem for prediction accuracy, but it can create
a challenge in model interpretation in settings in which the number of
variables p is quite large.
The lasso is a relatively recent alternative to ridge regression that overcomes this disadvantage.
The lasso coefficients, β̂_λ^L, minimize the quantity

RSS^L = \sum_{i=1}^{n}\Bigl(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\Bigr)^2 + \lambda\sum_{j=1}^{p}|\beta_j| = RSS + \lambda\sum_{j=1}^{p}|\beta_j|
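A minimal lasso fit with scikit-learn (a sketch; note that sklearn's Lasso minimizes (1/(2n))·RSS + α∥β∥₁, a rescaling of the objective above, so its α plays the role of λ/(2n)):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 10))
beta_true = np.zeros(10)
beta_true[:3] = [3.0, -2.0, 1.5]             # only 3 of the 10 predictors matter
y = X @ beta_true + rng.normal(size=100)

lasso = Lasso(alpha=0.5).fit(X, y)           # alpha corresponds to lambda/(2n) here
print(np.round(lasso.coef_, 3))
print("coefficients set exactly to zero:", int(np.sum(lasso.coef_ == 0)))
```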

The Lasso Regression....

Comparing the ridge objective with the lasso objective, it can be observed that the lasso and ridge regression have similar formulations.
The only difference is that the βj2 term in the ridge regression penalty
has been replaced by |βj | in the lasso penalty.
In statistical parlance, the lasso uses an l1 (pronounced “ell 1”) penalty
instead of an l2 penalty.
The l1 norm of a coefficient vector β is given by ∥β∥_1 = \sum_{j=1}^{p}|\beta_j|.
As with ridge regression, the lasso shrinks the coefficient estimates
towards zero.
However, in the case of the lasso, the l1 penalty has the effect of forcing
some of the coefficient estimates to be exactly equal to zero when the
tuning parameter λ is sufficiently large.

The Lasso Regression....

Hence, much like best subset selection, the lasso performs variable
selection.
As a result, models generated from the lasso are generally much easier
to interpret than those produced by ridge regression.
We say that the lasso yields sparse models—that is, models that involve
only a subset of the variables.
As in ridge regression, selecting a good value of λ for the lasso is
critical; it can be determined by using cross-validation.

Example Lasso

Figure:

When λ = 0, the lasso simply gives the least squares fit, and when λ becomes sufficiently large, the lasso gives the null model in which all coefficient estimates equal zero.
However, in between these two extremes, the ridge regression and lasso models can give quite different coefficient estimates.
Example Lasso....

Moving from left to right in the right-hand panel, we observe that at first the lasso results in a model that contains only the rating predictor.
Then student and limit enter the model almost simultaneously, shortly
followed by income.
Eventually, the remaining variables enter the model.
Hence, depending on the value of λ, the lasso can produce a model
involving any number of variables.
In contrast, ridge regression will always include all of the variables in
the model, although the magnitude of the coefficient estimates will
depend on λ.

Comparing the Lasso and Ridge Regression
It is clear that the lasso has a major advantage over ridge regression,
in that it produces simpler and more interpretable models that involve
only a subset of the predictors.
However, which method leads to better prediction accuracy? The answer cannot be generalized.
The lasso implicitly assumes that a number of the coefficients truly
equal zero.
If the response is a function of many of the predictors, then ridge regression will outperform the lasso in terms of prediction error.
If the response is a function of only a few variables, e.g. 2 out of 40, then the lasso will tend to outperform ridge regression in terms of bias, variance, and MSE.
It can be concluded that neither ridge regression nor the lasso will
universally dominate the other.
In general, one might expect the lasso to perform better in a setting where a relatively small number of predictors have substantial coefficients, and the remaining predictors have coefficients that are very small or equal to zero.
Comparing the Lasso and Ridge Regression
Ridge regression will perform better when the response is a function of
many predictors, all with coefficients of roughly equal size.
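This contrast can be illustrated with a small simulation (a sketch with hypothetical settings; which method wins in each scenario will vary somewhat with the random seed, but the sparse case typically favors the lasso and the dense case ridge):

```python
import numpy as np
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.model_selection import train_test_split

def test_mse(beta, n=200, p=40, seed=0):
    """Fit cross-validated ridge and lasso, return their held-out test MSEs."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, p))
    y = X @ beta + rng.normal(size=n)
    Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.5, random_state=seed)
    ridge = RidgeCV(alphas=np.logspace(-3, 3, 50)).fit(Xtr, ytr)
    lasso = LassoCV(cv=5).fit(Xtr, ytr)
    return (np.mean((yte - ridge.predict(Xte)) ** 2),
            np.mean((yte - lasso.predict(Xte)) ** 2))

p = 40
sparse_truth = np.zeros(p); sparse_truth[:2] = 3.0   # response depends on 2 of 40 predictors
dense_truth = np.full(p, 0.5)                        # all 40 predictors matter equally

print("sparse truth (ridge MSE, lasso MSE):", test_mse(sparse_truth))
print("dense truth  (ridge MSE, lasso MSE):", test_mse(dense_truth))
```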
However, the number of predictors that is related to the response is
never known a priori for real data sets.
A technique such as cross-validation can be used in order to determine
which approach is better on a particular data set.
As with ridge regression, when the least squares estimates have exces-
sively high variance, the lasso solution can yield a reduction in variance
at the expense of a small increase in bias, and consequently can gen-
erate more accurate predictions.
Unlike ridge regression, the lasso performs variable selection, and hence
results in models that are easier to interpret.
There are very efficient algorithms for fitting both ridge and lasso mod-
els; in both cases the entire coefficient paths can be computed with
about the same amount of work as a single least squares fit.
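For example, scikit-learn's lasso_path computes the entire lasso coefficient path in one call (a sketch; the data are illustrative):

```python
import numpy as np
from sklearn.linear_model import lasso_path

rng = np.random.default_rng(6)
X = rng.normal(size=(100, 8))
beta = np.zeros(8); beta[:3] = [3.0, -2.0, 1.0]
y = X @ beta + rng.normal(size=100)

# A single call returns coefficients along an entire decreasing grid of penalties
alphas, coefs, _ = lasso_path(X, y, n_alphas=50)
print(alphas.shape, coefs.shape)   # (50,), (8, 50): one column of coefficients per penalty
```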
Comparison of Ridge Regression and Lasso Regression

Figure:

Left: Plots of squared bias (black), variance (green), and test MSE
(purple) for the lasso on a simulated data set.
Right: Comparison of squared bias, variance, and test MSE between
lasso (solid) and ridge (dotted).
A Simple Special Case for Ridge Regression and the Lasso
In order to obtain a better understanding about the behavior of ridge
regression and the lasso, consider a simple special case with n = p, and
X a diagonal matrix with 1’s on the diagonal and 0’s in all off-diagonal
elements.
To simplify the problem further, assume also that we are performing
regression without an intercept.
With these assumptions, the usual least squares problem simplifies to finding β1, ..., βp that minimize

\sum_{j=1}^{p}(y_j - \beta_j)^2

In this case, the least squares solution is β̂_j = y_j.
And in this setting, ridge regression amounts to finding β1, ..., βp such that

\sum_{j=1}^{p}(y_j - \beta_j)^2 + \lambda\sum_{j=1}^{p}\beta_j^2

is minimized.
A Simple Special Case for Ridge Regression and the
Lasso...

The lasso amounts to finding the coefficients such that

\sum_{j=1}^{p}(y_j - \beta_j)^2 + \lambda\sum_{j=1}^{p}|\beta_j|

is minimized.
One can show that in this setting, the ridge regression estimates take the form

\hat{\beta}_j^R = \frac{y_j}{1 + \lambda}

and the lasso estimates take the form

\hat{\beta}_j^L = \begin{cases} y_j - \lambda/2 & \text{if } y_j > \lambda/2 \\ y_j + \lambda/2 & \text{if } y_j < -\lambda/2 \\ 0 & \text{otherwise} \end{cases}
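These two formulas translate directly into code (a small NumPy sketch, not tied to any library routine):

```python
import numpy as np

def ridge_shrink(y, lam):
    """Ridge estimate in this orthonormal special case: shrink every y_j proportionally."""
    return y / (1.0 + lam)

def soft_threshold(y, lam):
    """Lasso estimate in the same case: move y_j toward zero by lam/2, clipping at zero."""
    return np.sign(y) * np.maximum(np.abs(y) - lam / 2.0, 0.0)

y = np.array([-3.0, -0.4, 0.2, 1.0, 4.0])
print(ridge_shrink(y, lam=2.0))      # every entry shrunk by the same factor 1/(1 + lambda)
print(soft_threshold(y, lam=2.0))    # entries with |y_j| <= lambda/2 become exactly zero
```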

Example: The ridge regression and lasso

Figure:

Left: The ridge regression coefficient estimates are shrunken proportionally towards zero, relative to the least squares estimates.
Right: The lasso coefficient estimates are soft-thresholded towards
zero.
Example: The ridge regression and lasso...

We can see from above figure that ridge regression and the lasso per-
form two very different types of shrinkage.
In ridge regression, each least squares coefficient estimate is shrunken
by the same proportion.
In contrast, the lasso shrinks each least squares coefficient towards zero
by a constant amount, λ/2; the least squares coefficients that are less
than λ/2 in absolute value are shrunken entirely to zero.
The type of shrinkage performed by the lasso in this simple setting is known as soft-thresholding.
The fact that some lasso coefficients are shrunken entirely to zero ex-
plains why the lasso performs feature selection.

Example: The ridge regression and lasso...

In the case of a more general data matrix X, the story is a little more complicated than what is depicted in the Figure above, but the main ideas still hold approximately.
Ridge regression more or less shrinks every dimension of the data by the same proportion, whereas the lasso more or less shrinks all coefficients toward zero by a similar amount, and sufficiently small coefficients are shrunken all the way to zero.

Bayesian Interpretation for Ridge Regression and the Lasso

We now show that one can view ridge regression and the lasso through
a Bayesian lens.
A Bayesian viewpoint for regression assumes that the coefficient vector
β has some prior distribution, say p(β), where β = (β0 , β1 , ..., βp )T .
The likelihood of the data can be written as f (Y |X , β), where X =
(X1 , ..., Xp ).
Multiplying the prior distribution by the likelihood gives us (up to a proportionality constant) the posterior distribution, which takes the form

p (β|X , Y ) ∝ f (Y |X , β) p (β|X ) = f (Y |X , β) p (β)

where the proportionality above follows from Bayes’ theorem, and the
equality above follows from the assumption that X is fixed.

Bayesian Interpretation for Ridge Regression and the Lasso
We assume the usual linear model,
Y = β0 + X1 β1 + · · · + Xp βp + ϵ,
and suppose that the errors are independent and drawn from a normal
distribution.
Furthermore, assume that p(β) = \prod_{j=1}^{p} g(\beta_j), for some density function g. It turns out that ridge regression and the lasso follow naturally from two special cases of g:
1. If g is a Gaussian distribution with mean zero and standard deviation
a function of λ, then it follows that the posterior mode for β—that
is, the most likely value for β, given the data—is given by the ridge
regression solution. (In fact, the ridge regression solution is also the
posterior mean.)
2. If g is a double-exponential (Laplace) distribution with mean zero and
scale parameter a function of λ, then it follows that the posterior mode
for β is the lasso solution. (However, the lasso solution is not the pos-
terior mean, and in fact, the posterior mean does not yield a sparse
coefficient vector.)
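To make case 1 above concrete, here is the standard calculation (a sketch under the stated assumptions: normal errors with variance σ² and a Gaussian prior with standard deviation τ; the identification λ = σ²/τ² is the usual one and is not stated explicitly in the slides):

-\log p(\beta \mid X, Y) = \frac{1}{2\sigma^2}\sum_{i=1}^{n}\Bigl(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\Bigr)^2 + \frac{1}{2\tau^2}\sum_{j=1}^{p}\beta_j^2 + \text{const}

Multiplying by 2σ² (which does not change the minimizer) gives RSS + (σ²/τ²)\sum_{j=1}^{p}\beta_j^2, so the posterior mode is exactly the ridge solution with λ = σ²/τ². Replacing the Gaussian prior by a Laplace prior, whose log-density is proportional to −|β_j|, turns the penalty into λ\sum_{j=1}^{p}|\beta_j|, i.e. the lasso, as in case 2.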
Posterior of Ridge regression and Lasso Regression

Figure:

Left: Ridge regression is the posterior mode for β under a Gaussian prior.
Right: The lasso is the posterior mode for β under a double-exponential
prior.
Selecting the Tuning Parameter

Just as the subset selection approaches require a method to determine which of the models under consideration is best, implementing ridge regression and the lasso requires a method for selecting a value of the tuning parameter λ or, equivalently, the value of the constraint.
Cross-validation provides a simple way to tackle this problem.
We choose a grid of λ values, and compute the cross-validation error
for each value of λ.
We then select the tuning parameter value for which the cross-validation
error is smallest.
Finally, the model is re-fit using all of the available observations and
the selected value of the tuning parameter.
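A sketch of this procedure with scikit-learn (the grid, data, and fold count are illustrative; RidgeCV and LassoCV wrap exactly this choose-by-CV-then-refit logic):

```python
import numpy as np
from sklearn.linear_model import LassoCV, RidgeCV

rng = np.random.default_rng(5)
X = rng.normal(size=(150, 20))
beta = np.zeros(20); beta[:4] = [2.0, -1.5, 1.0, 0.5]
y = X @ beta + rng.normal(size=150)

lambdas = np.logspace(-3, 3, 100)                  # grid of candidate tuning parameters

ridge = RidgeCV(alphas=lambdas).fit(X, y)          # leave-one-out CV over the grid, then refit
lasso = LassoCV(alphas=lambdas, cv=10).fit(X, y)   # 10-fold CV over the grid, then refit

print("selected lambda (ridge):", ridge.alpha_)
print("selected lambda (lasso):", lasso.alpha_)
```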

References

The material used in these slides is borrowed from the following books. These slides may be used only for academic purposes.
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning: with applications in R. Springer.
Hastie, T., Tibshirani, R., & Friedman, J. H. (2009). The elements of statistical learning: data mining, inference, and prediction (2nd ed.). New York: Springer.

