
Statistics 512: Applied Linear Models

Topic 5a
Topic Overview

This topic will cover:
- Ridge Regression

Ridge Regression (Section 11.2)


Some Remedial Measures for Multicollinearity

- Restrict the use of the regression model to inference on values of the predictor
  variables that follow the same pattern of multicollinearity.
  For example, suppose a model has three predictors: $X_1, X_2, X_3$. The distribution
  of $(X_1, X_2, X_3)$ is $N(\mu, \Sigma)$ for some mean vector $\mu$ and covariance
  matrix $\Sigma$. If future predictor values come from this distribution, then even if
  there is serious multicollinearity, inferences for predictions using this model are
  still useful.
- If the model is a polynomial regression model, use centered variables.
- Drop one or more predictor variables (i.e., variable selection). Standard errors on
  the remaining parameter estimates decrease. However, how can we tell whether the
  dropped variable(s) carried any useful information? If a dropped variable is
  important, the remaining parameter estimates become biased.
- Sometimes, observations can be designed to break the multicollinearity.
- Get coefficient estimates from additional data from other contexts. For instance, if
  the model is
  $$Y_i = \beta_0 + \beta_1 X_{i,1} + \beta_2 X_{i,2} + \varepsilon_i,$$
  and you have an estimator $b_1$ (for $\beta_1$) based on another data set, you can
  estimate $\beta_2$ by regressing the adjusted variable $Y_i' = Y_i - b_1 X_{i,1}$ on
  $X_{i,2}$. (Common example: in economics, using cross-sectional data to estimate
  parameters for a time-dependent model. A minimal SAS sketch follows this list.)
- Use the first few principal components (or factor loadings) of the predictor
  variables. (Limitation: may lose interpretability.)
- Biased Regression or Coefficient Shrinkage (Example: Ridge Regression)
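To make the "borrowed coefficient" idea above concrete, here is a minimal SAS sketch. Everything in it is hypothetical: the data set extdata, the variables y, x1, x2, and the externally obtained estimate b1 = 0.8 are illustrative, not from the text.

* Hypothetical sketch: b1 = 0.8 is assumed to come from another data set;
data adjusted;
    set extdata;            /* hypothetical data set containing y, x1, x2 */
    y_adj = y - 0.8*x1;     /* subtract the externally estimated effect of x1 */
run;
proc reg data = adjusted;
    model y_adj = x2;       /* the slope here estimates beta_2 */
run;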

Two Equivalent Formulations of Ridge Regression


Ridge regression shrinks the estimators by penalizing their size. (Penalty: $\lambda \sum_j \beta_j^2$.)

Penalized Residual Sum of Squares:

$$\hat{\beta}^{\text{ridge}} = \arg\min_{\beta} \left\{ \sum_{i=1}^{N} \Big( Y_i - \beta_0 - \sum_{j=1}^{p} x_{i,j}\beta_j \Big)^2 + \lambda \sum_{j=1}^{p} \beta_j^2 \right\}$$
- $\lambda$ controls the amount of shrinkage of the parameter estimates.
- Larger $\lambda$ means greater shrinkage (toward zero).
Equivalent Representation:

$$\hat{\beta}^{\text{ridge}} = \arg\min_{\beta} \sum_{i=1}^{N} \Big( Y_i - \beta_0 - \sum_{j=1}^{p} x_{i,j}\beta_j \Big)^2 \quad \text{subject to} \quad \sum_{j=1}^{p} \beta_j^2 \le s.$$
There is a direct relationship between $\lambda$ and $s$ (although we will usually talk about $\lambda$).
The intercept $\beta_0$ is not subject to the shrinkage penalty.
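One way to see the relationship (a standard Lagrangian-duality observation, not spelled out in the text): for a given $\lambda$, the penalized solution also solves the constrained problem with the bound set to the size of that solution,

$$s = \sum_{j=1}^{p} \Big( \hat{\beta}_j^{\text{ridge}}(\lambda) \Big)^2,$$

so larger $\lambda$ corresponds to smaller $s$.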
Matrix Representation of Solution

$$\hat{\beta}^{\text{ridge}} = (X'X + \lambda I)^{-1} X'y$$
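As a check on this formula, here is a minimal SAS/IML sketch (not from the text) that computes the ridge solution directly for the body fat example introduced below. It applies the correlation transformation first, since the formula is meant for standardized variables; lambda = 0.02 is illustrative, and the resulting coefficients are on the standardized scale.

proc iml;
use bodyfat;                     /* data set created in the example below */
read all var {skinfold thigh midarm} into X;
read all var {fat} into y;
close bodyfat;
n = nrow(X);
/* correlation transformation: center and scale so Xs`*Xs is a correlation matrix */
Xs = (X - repeat(mean(X), n, 1)) / repeat(sqrt(n-1) # std(X), n, 1);
ys = (y - repeat(mean(y), n, 1)) / repeat(sqrt(n-1) # std(y), n, 1);
lambda = 0.02;                   /* illustrative value of the ridge constant */
/* closed-form ridge solution: (X'X + lambda*I)^{-1} X'y */
b_ridge = inv(Xs` * Xs + lambda * I(ncol(X))) * Xs` * ys;
print b_ridge;
quit;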

KNNL Example (page 256)

SAS code in ridge.sas

- 20 healthy female subjects, ages 25-34
- Y is fraction body fat
- X1 is triceps skinfold thickness
- X2 is thigh circumference
- X3 is midarm circumference

Conclusion from the previous analysis: we could get a good model with thigh only, or with midarm and skinfold thickness only.
Input the data

data bodyfat;
    infile 'H:\System\Desktop\CH07TA01.dat';
    input skinfold thigh midarm fat;
run;
proc print data = bodyfat;
run;
proc reg data = bodyfat;
    model fat = skinfold thigh midarm;
run;

                          Analysis of Variance
                             Sum of        Mean
Source             DF      Squares      Square    F Value    Pr > F
Model               3    396.98461   132.32820      21.52    <.0001
Error              16     98.40489     6.15031
Corrected Total    19    495.38950

Root MSE            2.47998     R-Square    0.8014
Dependent Mean     20.19500     Adj R-Sq    0.7641
Coeff Var          12.28017

                      Parameter Estimates
                    Parameter     Standard
Variable     DF      Estimate        Error    t Value    Pr > |t|
Intercept     1     117.08469     99.78240       1.17      0.2578
skinfold      1       4.33409      3.01551       1.44      0.1699
thigh         1      -2.85685      2.58202      -1.11      0.2849
midarm        1      -2.18606      1.59550      -1.37      0.1896

None of the individual p-values is significant, even though the overall F-test is highly significant, a classic symptom of multicollinearity.

          Pearson Correlation Coefficients, N = 20

            skinfold      thigh     midarm        fat
skinfold     1.00000    0.92384    0.45778    0.84327
thigh        0.92384    1.00000    0.08467    0.87809
midarm       0.45778    0.08467    1.00000    0.14244
fat          0.84327    0.87809    0.14244    1.00000
Try Ridge Regression

proc reg data = bodyfat
    outest = bfout ridge = 0 to 0.1 by 0.003;
    model fat = skinfold thigh midarm / noprint;
    plot / ridgeplot nomodel nostat;
run;
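A usage note on this call (standard PROC REG behavior, not spelled out in the text): RIDGE= refits the model once for every ridge constant k in the list, and OUTEST= collects all of the fits in the data set bfout. In bfout, _RIDGE_ records k, rows with _TYPE_ = 'RIDGE' contain the coefficient estimates, and rows with _TYPE_ = 'RIDGEVIF' contain the corresponding VIFs; the PROC PRINT and PROC GPLOT steps below subset on _TYPE_ for exactly this reason.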


Ridge Trace

Each value of $\lambda$ (called the ridge constant k in SAS) gives different values of the parameter estimates. (Note the instability of the estimate values for small $\lambda$.)

How to Choose $\lambda$

Things to look for:
- variance inflation factors (VIF) close to 1;
- stable estimated coefficients;
- only a modest change in $R^2$ or $\hat{\sigma}$.
Graph the VIFs:

title2 'Variance Inflation Factors';
proc gplot data = bfout;
    plot (skinfold thigh midarm) * _RIDGE_ / overlay;
    where _TYPE_ = 'RIDGEVIF';
run;

Print the VIF values, and the estimates and standard errors, for different values of $\lambda$:

proc print data = bfout;
    var _RIDGE_ skinfold thigh midarm;
    where _TYPE_ = 'RIDGEVIF';
run;
proc print data = bfout;
    var _RIDGE_ _RMSE_ Intercept skinfold thigh midarm;
    where _TYPE_ = 'RIDGE';
run;
                Variance Inflation Factors

Obs    _RIDGE_    skinfold      thigh     midarm
  2      0.000     708.843    564.343    104.606
  4      0.002      50.559     40.448      8.280
  6      0.004      16.982     13.725      3.363
  8      0.006       8.503      6.976      2.119
 10      0.008       5.147      4.305      1.624
 12      0.010       3.486      2.981      1.377
 14      0.012       2.543      2.231      1.236
 16      0.014       1.958      1.764      1.146
 18      0.016       1.570      1.454      1.086
 20      0.018       1.299      1.238      1.043
 22      0.020       1.103      1.081      1.011
 24      0.022       0.956      0.963      0.986
 26      0.024       0.843      0.872      0.966
 28      0.026       0.754      0.801      0.949
 30      0.028       0.683      0.744      0.935
 32      0.030       0.626      0.697      0.923

Note that at RIDGE = 0.020, the VIFs are close to 1.


                        Parameter Estimates

Obs    _RIDGE_     _RMSE_    Intercept    skinfold      thigh     midarm
  3      0.000    2.47998      117.085     4.33409   -2.85685   -2.18606
  5      0.002    2.54921       22.277     1.46445   -0.40119   -0.67381
  7      0.004    2.57173        7.725     1.02294   -0.02423   -0.44083
  9      0.006    2.58174        1.842     0.84372    0.12820   -0.34604
 11      0.008    2.58739       -1.331     0.74645    0.21047   -0.29443
 13      0.010    2.59104       -3.312     0.68530    0.26183   -0.26185
 15      0.012    2.59360       -4.661     0.64324    0.29685   -0.23934
 17      0.014    2.59551       -5.637     0.61249    0.32218   -0.22278
 19      0.016    2.59701       -6.373     0.58899    0.34131   -0.21004
 21      0.018    2.59822       -6.946     0.57042    0.35623   -0.19991
 23      0.020    2.59924       -7.403     0.55535    0.36814   -0.19163
 25      0.022    2.60011       -7.776     0.54287    0.37786   -0.18470
 27      0.024    2.60087       -8.083     0.53233    0.38590   -0.17881
 29      0.026    2.60156       -8.341     0.52331    0.39265   -0.17372
 31      0.028    2.60218       -8.559     0.51549    0.39837   -0.16926
 33      0.030    2.60276       -8.746     0.50864    0.40327   -0.16531

Note that at _RIDGE_ = 0.020, the RMSE has increased by only about 5% relative to OLS (2.59924 vs. 2.47998), so the SSE has increased by about 10% (SSE is proportional to the square of the RMSE, and $1.05^2 \approx 1.10$), and the parameter estimates are closer to making sense.
Conclusion

So the solution at $\lambda = 0.02$, with parameter estimates $(-7.4, 0.56, 0.37, -0.19)$, seems to make the most sense.
Notes

- The book makes a big deal about standardizing the variables; SAS does this for you in the RIDGE= option.
- Why ridge regression? The estimates tend to be more stable, particularly outside the region of the observed predictor values: they are less affected by small changes in the data. (Ordinary least squares estimates can be highly unstable when there is a lot of multicollinearity.)
- Major drawback: ordinary inference procedures don't work so well.
- Other procedures use different penalties, e.g., the lasso penalty $\lambda \sum_j |\beta_j|$.
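For comparison (standard material beyond the text), the lasso objective replaces the squared penalty with an absolute-value penalty:

$$\hat{\beta}^{\text{lasso}} = \arg\min_{\beta} \left\{ \sum_{i=1}^{N} \Big( Y_i - \beta_0 - \sum_{j=1}^{p} x_{i,j}\beta_j \Big)^2 + \lambda \sum_{j=1}^{p} |\beta_j| \right\}$$

Unlike the ridge penalty, the absolute-value penalty can shrink coefficients exactly to zero, so the lasso also performs variable selection.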
