
Statistics 512: Applied Linear Models

Topic 5a
Topic Overview

This topic will cover:
- Ridge Regression

Ridge Regression (Section 11.2)


Some Remedial Measures for Multicollinearity

- Restrict the use of the regression model to inference on values of the predictor
  variables that follow the same pattern of multicollinearity.
  For example, suppose a model has three predictors: $X_1, X_2, X_3$. The distribution
  of $(X_1, X_2, X_3)$ is $N(\mu, \Sigma)$ for some mean vector $\mu$ and covariance
  matrix $\Sigma$. If future predictor values come from this distribution, then even if
  there is serious multicollinearity, inferences for predictions using this model are
  still useful.
- If the model is a polynomial regression model, use centered variables.
- Drop one or more predictor variables (i.e., variable selection). Standard errors on
  the remaining parameter estimates decrease. However, how can we tell whether the
  dropped variable(s) carried any useful information? If a dropped variable is
  important, the remaining parameter estimates become biased.
- Sometimes, observations can be designed to break the multicollinearity.
- Get coefficient estimates from additional data from other contexts. For instance, if
  the model is
  $$Y_i = \beta_0 + \beta_1 X_{i,1} + \beta_2 X_{i,2} + \varepsilon_i,$$
  and you have an estimator $b_1$ (for $\beta_1$) based on another data set, you can
  estimate $\beta_2$ by regressing the adjusted variable $Y_i' = Y_i - b_1 X_{i,1}$ on
  $X_{i,2}$. (Common example: in economics, using cross-sectional data to estimate
  parameters for a time-dependent model. A minimal SAS sketch follows this list.)
- Use the first few principal components (or factor loadings) of the predictor
  variables. (Limitation: may lose interpretability.)
- Biased Regression or Coefficient Shrinkage (Example: Ridge Regression)
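To make the "borrowed coefficient" idea above concrete, here is a minimal SAS sketch. Everything in it is hypothetical: the data set extdata, the variables y, x1, x2, and the externally obtained estimate b1 = 0.8 are illustrative, not from the text.

* Hypothetical sketch: b1 = 0.8 is assumed to come from another data set;
data adjusted;
    set extdata;            /* hypothetical data set containing y, x1, x2 */
    y_adj = y - 0.8*x1;     /* subtract the externally estimated effect of x1 */
run;
proc reg data = adjusted;
    model y_adj = x2;       /* the slope here estimates beta_2 */
run;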

Two Equivalent Formulations of Ridge Regression


Ridge regression shrinks the estimators by penalizing their size. (Penalty: $\lambda \sum_j \beta_j^2$.)

Penalized Residual Sum of Squares:

$$\hat{\beta}^{\text{ridge}} = \arg\min_{\beta} \left\{ \sum_{i=1}^{N} \Big( Y_i - \beta_0 - \sum_{j=1}^{p} x_{i,j}\beta_j \Big)^2 + \lambda \sum_{j=1}^{p} \beta_j^2 \right\}$$
- $\lambda$ controls the amount of shrinkage of the parameter estimates.
- Larger $\lambda$ means greater shrinkage (toward zero).
Equivalent Representation:

$$\hat{\beta}^{\text{ridge}} = \arg\min_{\beta} \sum_{i=1}^{N} \Big( Y_i - \beta_0 - \sum_{j=1}^{p} x_{i,j}\beta_j \Big)^2 \quad \text{subject to} \quad \sum_{j=1}^{p} \beta_j^2 \le s.$$
There is a direct relationship between $\lambda$ and $s$ (although we will usually talk about $\lambda$).
The intercept $\beta_0$ is not subject to the shrinkage penalty.
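One way to see the relationship (a standard Lagrangian-duality observation, not spelled out in the text): for a given $\lambda$, the penalized solution also solves the constrained problem with the bound set to the size of that solution,

$$s = \sum_{j=1}^{p} \Big( \hat{\beta}_j^{\text{ridge}}(\lambda) \Big)^2,$$

so larger $\lambda$ corresponds to smaller $s$.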
Matrix Representation of Solution

$$\hat{\beta}^{\text{ridge}} = (X'X + \lambda I)^{-1} X'y$$
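As a check on this formula, here is a minimal SAS/IML sketch (not from the text) that computes the ridge solution directly for the body fat example introduced below. It applies the correlation transformation first, since the formula is meant for standardized variables; lambda = 0.02 is illustrative, and the resulting coefficients are on the standardized scale.

proc iml;
use bodyfat;                     /* data set created in the example below */
read all var {skinfold thigh midarm} into X;
read all var {fat} into y;
close bodyfat;
n = nrow(X);
/* correlation transformation: center and scale so Xs`*Xs is a correlation matrix */
Xs = (X - repeat(mean(X), n, 1)) / repeat(sqrt(n-1) # std(X), n, 1);
ys = (y - repeat(mean(y), n, 1)) / repeat(sqrt(n-1) # std(y), n, 1);
lambda = 0.02;                   /* illustrative value of the ridge constant */
/* closed-form ridge solution: (X'X + lambda*I)^{-1} X'y */
b_ridge = inv(Xs` * Xs + lambda * I(ncol(X))) * Xs` * ys;
print b_ridge;
quit;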

KNNL Example (page 256)

SAS code in ridge.sas

- 20 healthy female subjects, ages 25-34
- Y is fraction body fat
- X1 is triceps skinfold thickness
- X2 is thigh circumference
- X3 is midarm circumference

Conclusion from the previous analysis: we could get a good model with thigh only, or with midarm and skinfold thickness only.
Input the data

data bodyfat;
    infile 'H:\System\Desktop\CH07TA01.dat';
    input skinfold thigh midarm fat;
run;
proc print data = bodyfat;
run;
proc reg data = bodyfat;
    model fat = skinfold thigh midarm;
run;

                          Analysis of Variance
                             Sum of        Mean
Source             DF      Squares      Square    F Value    Pr > F
Model               3    396.98461   132.32820      21.52    <.0001
Error              16     98.40489     6.15031
Corrected Total    19    495.38950

Root MSE            2.47998     R-Square    0.8014
Dependent Mean     20.19500     Adj R-Sq    0.7641
Coeff Var          12.28017

                      Parameter Estimates
                    Parameter     Standard
Variable     DF      Estimate        Error    t Value    Pr > |t|
Intercept     1     117.08469     99.78240       1.17      0.2578
skinfold      1       4.33409      3.01551       1.44      0.1699
thigh         1      -2.85685      2.58202      -1.11      0.2849
midarm        1      -2.18606      1.59550      -1.37      0.1896

None of the individual p-values is significant, even though the overall F-test is highly significant, a classic symptom of multicollinearity.

          Pearson Correlation Coefficients, N = 20

            skinfold      thigh     midarm        fat
skinfold     1.00000    0.92384    0.45778    0.84327
thigh        0.92384    1.00000    0.08467    0.87809
midarm       0.45778    0.08467    1.00000    0.14244
fat          0.84327    0.87809    0.14244    1.00000
Try Ridge Regression

proc reg data = bodyfat
    outest = bfout ridge = 0 to 0.1 by 0.003;
    model fat = skinfold thigh midarm / noprint;
    plot / ridgeplot nomodel nostat;
run;
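A usage note on this call (standard PROC REG behavior, not spelled out in the text): RIDGE= refits the model once for every ridge constant k in the list, and OUTEST= collects all of the fits in the data set bfout. In bfout, _RIDGE_ records k, rows with _TYPE_ = 'RIDGE' contain the coefficient estimates, and rows with _TYPE_ = 'RIDGEVIF' contain the corresponding VIFs; the PROC PRINT and PROC GPLOT steps below subset on _TYPE_ for exactly this reason.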


Ridge Trace

Each value of $\lambda$ (called the ridge constant k in SAS) gives different values of the parameter estimates. (Note the instability of the estimate values for small $\lambda$.)

How to Choose $\lambda$

Things to look for:
- variance inflation factors (VIF) close to 1;
- stable estimated coefficients;
- only a modest change in $R^2$ or $\hat{\sigma}$.
Graph the VIFs:

title2 'Variance Inflation Factors';
proc gplot data = bfout;
    plot (skinfold thigh midarm) * _RIDGE_ / overlay;
    where _TYPE_ = 'RIDGEVIF';
run;

Print the VIF values, and the estimates and standard errors, for different values of $\lambda$:

proc print data = bfout;
    var _RIDGE_ skinfold thigh midarm;
    where _TYPE_ = 'RIDGEVIF';
run;
proc print data = bfout;
    var _RIDGE_ _RMSE_ Intercept skinfold thigh midarm;
    where _TYPE_ = 'RIDGE';
run;
                Variance Inflation Factors

Obs    _RIDGE_    skinfold      thigh     midarm
  2      0.000     708.843    564.343    104.606
  4      0.002      50.559     40.448      8.280
  6      0.004      16.982     13.725      3.363
  8      0.006       8.503      6.976      2.119
 10      0.008       5.147      4.305      1.624
 12      0.010       3.486      2.981      1.377
 14      0.012       2.543      2.231      1.236
 16      0.014       1.958      1.764      1.146
 18      0.016       1.570      1.454      1.086
 20      0.018       1.299      1.238      1.043
 22      0.020       1.103      1.081      1.011
 24      0.022       0.956      0.963      0.986
 26      0.024       0.843      0.872      0.966
 28      0.026       0.754      0.801      0.949
 30      0.028       0.683      0.744      0.935
 32      0.030       0.626      0.697      0.923

Note that at RIDGE = 0.020, the VIFs are close to 1.


                        Parameter Estimates

Obs    _RIDGE_     _RMSE_    Intercept    skinfold      thigh     midarm
  3      0.000    2.47998      117.085     4.33409   -2.85685   -2.18606
  5      0.002    2.54921       22.277     1.46445   -0.40119   -0.67381
  7      0.004    2.57173        7.725     1.02294   -0.02423   -0.44083
  9      0.006    2.58174        1.842     0.84372    0.12820   -0.34604
 11      0.008    2.58739       -1.331     0.74645    0.21047   -0.29443
 13      0.010    2.59104       -3.312     0.68530    0.26183   -0.26185
 15      0.012    2.59360       -4.661     0.64324    0.29685   -0.23934
 17      0.014    2.59551       -5.637     0.61249    0.32218   -0.22278
 19      0.016    2.59701       -6.373     0.58899    0.34131   -0.21004
 21      0.018    2.59822       -6.946     0.57042    0.35623   -0.19991
 23      0.020    2.59924       -7.403     0.55535    0.36814   -0.19163
 25      0.022    2.60011       -7.776     0.54287    0.37786   -0.18470
 27      0.024    2.60087       -8.083     0.53233    0.38590   -0.17881
 29      0.026    2.60156       -8.341     0.52331    0.39265   -0.17372
 31      0.028    2.60218       -8.559     0.51549    0.39837   -0.16926
 33      0.030    2.60276       -8.746     0.50864    0.40327   -0.16531

Note that at _RIDGE_ = 0.020, the RMSE has increased by only about 5% relative to OLS (2.59924 vs. 2.47998), so the SSE has increased by about 10% (SSE is proportional to the square of the RMSE, and $1.05^2 \approx 1.10$), and the parameter estimates are closer to making sense.
Conclusion

So the solution at $\lambda = 0.02$, with parameter estimates $(-7.4, 0.56, 0.37, -0.19)$, seems to make the most sense.
Notes

- The book makes a big deal about standardizing the variables; SAS does this for you in the RIDGE= option.
- Why ridge regression? The estimates tend to be more stable, particularly outside the region of the observed predictor values: they are less affected by small changes in the data. (Ordinary least squares estimates can be highly unstable when there is a lot of multicollinearity.)
- Major drawback: ordinary inference procedures don't work so well.
- Other procedures use different penalties, e.g., the lasso penalty $\lambda \sum_j |\beta_j|$.
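For comparison (standard material beyond the text), the lasso objective replaces the squared penalty with an absolute-value penalty:

$$\hat{\beta}^{\text{lasso}} = \arg\min_{\beta} \left\{ \sum_{i=1}^{N} \Big( Y_i - \beta_0 - \sum_{j=1}^{p} x_{i,j}\beta_j \Big)^2 + \lambda \sum_{j=1}^{p} |\beta_j| \right\}$$

Unlike the ridge penalty, the absolute-value penalty can shrink coefficients exactly to zero, so the lasso also performs variable selection.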
