Module 4: Regression Shrinkage Methods
Description
Many standard estimators can be improved, in terms of mean squared error (MSE), by shrinking
them towards zero (or any other finite constant value). In other words, the improvement in the
estimate from the corresponding reduction in the width of the confidence interval can outweigh
the worsening of the estimate introduced by biasing the estimate towards zero (see bias-variance
trade-off).
Assume that the expected value of the raw estimate is not zero and consider other estimators
obtained by multiplying the raw estimate by a certain parameter. A value for this parameter can
be specified so as to minimize the MSE of the new estimate. For this value of the parameter, the
new estimate will have a smaller MSE than the raw one. Thus it has been improved. An effect
here may be to convert an unbiased raw estimate to an improved biased one.
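As a small worked sketch of this claim (for a single parameter under squared-error loss): suppose the raw estimate \hat{\theta} is unbiased for \theta \neq 0 with variance \sigma^2, and consider the scaled estimator c\hat{\theta}. Then

MSE(c) = E\left[(c\hat{\theta} - \theta)^2\right] = c^2 \sigma^2 + (c - 1)^2 \theta^2,

which is minimized at

c^* = \frac{\theta^2}{\theta^2 + \sigma^2} < 1, \qquad MSE(c^*) = \frac{\sigma^2 \theta^2}{\theta^2 + \sigma^2} < \sigma^2 = MSE(1).

So shrinking towards zero by the factor c^* trades a small bias for a larger reduction in variance. (In practice c^* depends on the unknown \theta and \sigma^2, so it must itself be estimated.)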
Prediction:
Linear regression:
In a prediction context, there is less concern about the values of the individual components on the right-hand side; rather, interest is in their total contribution to the prediction.
Variable Selection:
o The desire for a parsimonious regression model (one that is simpler and easier to
interpret);
The notion of what makes a variable "important" is still not well understood, but one
interpretation (Breiman, 2001) is that a variable is important if dropping it seriously
affects prediction accuracy.
Selecting variables in regression models is a complicated problem, and there are many
conflicting views on which type of variable selection procedure is best, e.g. LRT, F-test,
AIC, and BIC.
Backward elimination: eliminate the least important variable from those currently selected.
Forward selection: add the most important variable from those remaining (a minimal sketch of this idea is given after these notes).
A hybrid version incorporates ideas from both main types: it alternates backward and forward steps, and stops when all variables have either been retained for inclusion or removed.
There is no guarantee that the subsets obtained from stepwise procedures will contain the
same variables or even be the "best" subset.
When there are more variables than observations (p > n), backward elimination is
typically not a feasible procedure.
A stepwise procedure produces a single answer (one specific subset) to the variable selection problem, although several different subsets may be equally good for regression purposes.
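As a concrete illustration of forward selection, here is a minimal NumPy sketch that greedily adds the predictor giving the best Gaussian AIC and stops when no addition improves it. The scoring rule, function names, and use of plain least squares are illustrative assumptions, not the course's prescribed procedure.

import numpy as np

def gaussian_aic(rss, n, k):
    # AIC for a Gaussian linear model, up to an additive constant:
    # n * log(RSS / n) + 2 * (number of fitted coefficients)
    return n * np.log(rss / n) + 2 * k

def forward_selection(X, Y):
    # X: n x p centered design matrix, Y: centered response (NumPy arrays).
    n, p = X.shape
    selected, remaining = [], list(range(p))
    best = gaussian_aic(np.sum(Y ** 2), n, 0)       # null model: no predictors
    while remaining:
        scores = []
        for j in remaining:
            cols = selected + [j]
            beta = np.linalg.lstsq(X[:, cols], Y, rcond=None)[0]
            rss = np.sum((Y - X[:, cols] @ beta) ** 2)
            scores.append((gaussian_aic(rss, n, len(cols)), j))
        cand_score, cand_j = min(scores)
        if cand_score >= best:                      # no candidate improves AIC: stop
            break
        best = cand_score
        selected.append(cand_j)
        remaining.remove(cand_j)
    return selected

Backward elimination follows the same pattern starting from the full model, and the hybrid scheme alternates add and drop steps with the same stopping rule.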
Scott Zeger on "how to pick the wrong model": turn your scientific problem over to a computer that, knowing nothing about your science or your question, is very good at optimizing AIC, BIC, ...
Objectives
Ridge Regression
Ridge regression addresses some of the shortcomings of linear regression. It is an extension of the OLS method with an additional constraint. The OLS estimates are unconstrained and may have large magnitude, and therefore large variance. In ridge regression, a penalty is applied to the coefficients so that they are shrunk towards zero, which also has the effect of reducing the variance and hence the prediction error. As in the OLS approach, we choose the ridge coefficients to minimize a penalized residual sum of squares (RSS). In contrast to OLS, ridge regression provides biased estimators that have low variance.
One way out of this high-variance situation is to abandon the requirement of an unbiased estimator.
We assume only that X's and Y have been centered so that we have no need for a constant term
in the regression:
X is an n by p matrix with centered columns,
Y is a centered n-vector.
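As a minimal sketch of this preprocessing step in NumPy (the data here is simulated purely for illustration):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                  # illustrative n x p design matrix
Y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=50)

Xc = X - X.mean(axis=0)                       # n x p matrix with centered columns
Yc = Y - Y.mean()                             # centered n-vector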
When initially developing predictive models, we often need to compute the coefficients, as they are not explicitly stated in the training data. To estimate them, we can use the standard ordinary least squares (OLS) matrix coefficient estimator:

\hat{\beta}_{OLS} = (X^T X)^{-1} X^T Y
Hoerl and Kennard (1970) proposed that potential instability in the LS estimator could be remedied by adding a small constant value λ to the diagonal entries of the matrix X^T X before taking its inverse; the result is the ridge regression estimator given below.
Understanding this formula's operations requires familiarity with matrix notation. Suffice it to say that the formula finds the best-fitting line for a given dataset by calculating coefficients for each independent variable that collectively result in the smallest residual sum of squares (also called the sum of squared errors).
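A minimal sketch of this estimator in NumPy, assuming centered data as above (simulated here for illustration; np.linalg.solve is used instead of an explicit matrix inverse):

import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 4
X = rng.normal(size=(n, p))
X -= X.mean(axis=0)                                 # centered columns
beta_true = np.array([2.0, 0.0, -1.0, 0.5])
Y = X @ beta_true + rng.normal(size=n)
Y -= Y.mean()                                       # centered response

# OLS estimator: beta_hat = (X^T X)^{-1} X^T Y
beta_ols = np.linalg.solve(X.T @ X, X.T @ Y)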
Residual sum of squares (RSS) measures how well a linear regression model matches the training data. It is represented by the formulation

RSS = \sum_{i=1}^{n} \left( y_i - x_i^T \beta \right)^2

This formula measures model prediction accuracy against the ground-truth values in the training data. If RSS = 0, the model perfectly predicts the dependent variable. A score of zero is not always desirable, however, as it can indicate overfitting to the training data, particularly if the training dataset is small. Multicollinearity among the predictors may be one cause of this.
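Continuing in the same vein, here is a short self-contained sketch of computing RSS for a fitted coefficient vector (the small arrays are illustrative):

import numpy as np

# Toy data; in practice beta_hat would come from the OLS fit above.
X = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, -1.0], [4.0, 1.5]])
Y = np.array([2.0, 1.0, 4.0, 5.0])
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

residuals = Y - X @ beta_hat                  # y_i - x_i^T beta
rss = float(np.sum(residuals ** 2))           # residual sum of squares
print(rss)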
• Ridge regression modifies OLS by calculating coefficients that account for potentially correlated predictors. Specifically, ridge regression corrects for high-value coefficients by introducing a regularization term (often called the penalty term) into the RSS function. This penalty term is the sum of the squares of the model’s coefficients. It is represented in the formulation

\sum_{j=1}^{p} \beta_j^2
The L2 penalty term is appended to the RSS function, resulting in a new formulation, the ridge regression estimator. Therein, its effect on the model is controlled by the hyperparameter lambda (λ):

\hat{\beta}_{ridge} = \arg\min_{\beta} \left\{ \sum_{i=1}^{n} \left( y_i - x_i^T \beta \right)^2 + \lambda \sum_{j=1}^{p} \beta_j^2 \right\} = (X^T X + \lambda I)^{-1} X^T Y
Remember that coefficients mark a given predictor’s (i.e. independent variable’s) effect on the predicted value (i.e. dependent variable). Once added to the RSS formula, the L2 penalty term counteracts especially high coefficients by pulling all coefficient values towards zero; in statistics, this is called coefficient shrinkage. The ridge estimator above thus calculates new regression coefficients that minimize the penalized RSS, moderating every predictor’s effect and reducing overfitting to the training data.
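Here is a minimal sketch of the closed-form ridge estimator in NumPy, with two nearly collinear predictors simulated to show the stabilizing effect; the value λ = 10 is arbitrary:

import numpy as np

def ridge_estimator(X, Y, lam):
    # Closed-form ridge coefficients: (X^T X + lambda * I)^{-1} X^T Y
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Y)

rng = np.random.default_rng(0)
n = 60
z = rng.normal(size=n)
X = np.column_stack([z + 0.05 * rng.normal(size=n),   # two nearly collinear columns
                     z + 0.05 * rng.normal(size=n),
                     rng.normal(size=n)])
X -= X.mean(axis=0)
Y = X @ np.array([1.0, 1.0, -0.5]) + rng.normal(size=n)
Y -= Y.mean()

beta_ols = np.linalg.solve(X.T @ X, X.T @ Y)          # tends to be large and unstable here
beta_ridge = ridge_estimator(X, Y, lam=10.0)          # shrunk towards zero, lower variance

With nearly collinear columns, the OLS coefficients on the correlated pair tend to be large and of offsetting sign, while the ridge coefficients stay small and similar.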
Note that ridge regression does not shrink every coefficient by the same amount. Rather, coefficients are shrunk in proportion to their initial size: as λ increases, high-value coefficients lose more (in absolute terms) than low-value coefficients, and are thus penalized more heavily.
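A small sketch of this behaviour, sweeping λ over an arbitrary grid and printing the ridge coefficients for a model with one large, one moderate, and one small true effect:

import numpy as np

rng = np.random.default_rng(0)
n, p = 80, 3
X = rng.normal(size=(n, p))
X -= X.mean(axis=0)
Y = X @ np.array([5.0, 1.0, 0.2]) + rng.normal(size=n)
Y -= Y.mean()

for lam in [0.0, 1.0, 10.0, 100.0, 1000.0]:
    beta = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Y)
    print(f"lambda = {lam:7.1f}  coefficients = {np.round(beta, 3)}")

# As lambda grows, every coefficient moves towards zero; the largest one loses
# the most in absolute terms, but none is set exactly to zero.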