Applied Statistics II Chapter 8 Multiple Linear Regression: Jian Zou
Outline:
- The MLR Model
- Multicollinearity
Multiple Linear Regression
The MLR Model
Y = β0 + β1 X1(Z1, Z2, …, Zp) + β2 X2(Z1, Z2, …, Zp) + … + βq Xq(Z1, Z2, …, Zp) + ε.
The MLR Model
Here are some examples:
Y = β0 + β1 Z1 + β2 Z1² + ε,
(p = 1, q = 2, X1 = Z1, X2 = Z1²)

Y = β0 + β1 Z1 + β2 Z2 + β3 Z1² + β4 Z1 Z2 + β5 Z2² + ε,
(p = 2, q = 5, X1 = Z1, X2 = Z2, X3 = Z1², X4 = Z1 Z2, X5 = Z2²)

Y = β0 + β1 log(Z2) + β2 √(Z1 Z2) + ε.
(p = 2, q = 2, X1 = log(Z2), X2 = √(Z1 Z2))

In terms of its regressors, every such model has the form
Y = β0 + β1 X1 + β2 X2 + … + βq Xq + ε.
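The distinction between predictors (the Z's) and regressors (the X's) is easy to see in code. Here is a minimal numpy sketch, with illustrative synthetic data, building the q = 5 regressors of the second example above from its p = 2 predictors:

```python
import numpy as np

# Illustrative predictor data: p = 2 predictors observed on n = 50 cases
rng = np.random.default_rng(1)
z1, z2 = rng.normal(size=(2, 50))

# The q = 5 regressors, each a function of the predictors (z1, z2)
X = np.column_stack([z1, z2, z1**2, z1 * z2, z2**2])

# Prepend a column of ones for the intercept beta0
X = np.column_stack([np.ones(len(z1)), X])
print(X.shape)  # (50, 6): intercept plus q = 5 regressors
```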
Example 1
The Response Surface
Y = β0 + β1 X1(Z1, Z2, …, Zp) + β2 X2(Z1, Z2, …, Zp) + … + βq Xq(Z1, Z2, …, Zp),
is called the response surface of the model.
Interpreting the Response Surface
E(Y | X1 = x1, X2 = x2, …, Xq = xq) = β0 + β1 x1 + β2 x2 + … + βq xq.
If it is possible for the Xi to simultaneously take the value 0, then β0 is the value of the response surface when all Xi equal 0. Otherwise, β0 has no separate interpretation of its own.
Interpreting the Response Surface
E(Y | Z1 = z1, Z2 = z2, …, Zp = zp) = β0 + β1 X1(z1, z2, …, zp) + β2 X2(z1, z2, …, zp) + … + βq Xq(z1, z2, …, zp).
Interpreting the Response Surface
The effect of the predictor Zi on the mean response is measured by the partial derivative
(∂/∂zi) E(Y | Z1 = z1, Z2 = z2, …, Zp = zp).
Some Response Surface Examples
E (Y | Z1 = z1 , Z2 = z2 ) = β0 + β1 z1 + β2 z2 ,
Some Response Surface Examples
E (Y | Z1 = z1 , Z2 = z2 ) = β0 + β1 z1 + β2 z2 + β3 z1 z2 ,
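For this interaction model, the effect of z1 on the mean response is no longer constant: differentiating with respect to z1 gives

```latex
\frac{\partial}{\partial z_1} E(Y \mid Z_1 = z_1, Z_2 = z_2) = \beta_1 + \beta_3 z_2 ,
```

so the slope in z1 changes with the level of z2.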
Some Response Surface Examples
E(Y | Z1 = z1, Z2 = z2) = β0 + β1 z1 + β2 z2 + β3 z1² + β4 z2² + β5 z1 z2,
Example 1, Continued
The fitted response surface for the tool life data was
fitted ln(ToolLife) = −17.5985 + 96.1106 Speed^(−0.25) + 0.0164 Feed^(−1).
The Modeling Process
Model Specification
For the MLR model, model specification means specifying the form
of the model: the response, predictors and regressors.
Multivariable Visualization
Fitting the MLR Model
As we did for the SLR model, we use least squares to fit the MLR model. This means finding estimators of the model parameters β0, β1, …, βq and σ².
Fitting the MLR Model
SSE(b0, b1, …, bq) = Σ_{i=1}^{n} [Yi − (b0 + b1 Xi1 + b2 Xi2 + … + bq Xiq)]².
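In practice the minimizing values are found numerically. A small numpy sketch on synthetic data (names and true coefficients are illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X1, X2 = rng.normal(size=(2, n))
# Simulate an MLR model with beta = (1, 2, -3) and error sd 0.1
y = 1.0 + 2.0 * X1 - 3.0 * X2 + rng.normal(scale=0.1, size=n)

# Design matrix: column of ones for the intercept, then the regressors
X = np.column_stack([np.ones(n), X1, X2])

# np.linalg.lstsq finds the b minimizing SSE(b0, b1, ..., bq)
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
sse = np.sum((y - X @ beta_hat) ** 2)
print(beta_hat)  # close to [1, 2, -3]
```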
Example 3
Let’s see what happens when we identify and fit a model to data in
CARS93A. The scatterplot array on the next slide shows the
response, highway mpg (HIGHMPG) and three potential predictors,
displacement (DISPLACE), horsepower (HP) and rpm (RPM).
[Scatterplot array: HIGHMPG (20–50) against DISPLACE (1.0–5.7), HP (55–300), and RPM (3800–6500).]
Assessing Model Fit
Example 3, Continued
Let’s look at the residuals from the fit to the data in CARS93A.
[Plots of the Studentized residuals (studres) against DISPLACE, HP, RPM, and the fitted values.]
[Histogram and normal quantile plot of the Studentized residuals, with summary statistics:]

Moments
N 93.0000         Sum Wgts 93.0000
Mean -0.0006      Sum -0.0527
Std Dev 1.0248    Variance 1.0502
Skewness 0.5658   Kurtosis 1.5809
USS 96.6185       CSS 96.6185
CV -180831.14     Std Mean 0.1063

Quantiles
100% Max 3.6778   99.0% 3.6778
75% Q3 0.6858     97.5% 1.9703
50% Med -0.0734   95.0% 1.5264
25% Q1 -0.6849    90.0% 1.0344
0% Min -2.5365    10.0% -1.0406
Range 6.2144      5.0% -1.7875
Q3-Q1 1.3707      2.5% -1.8808
Mode -0.7288      1.0% -2.5365
Interpretation of the Fitted Model
Denote the least squares estimates by β̂0, β̂1, …, β̂q. The fitted response surface is
Ŷ = β̂0 + β̂1 X1(Z1, Z2, …, Zp) + β̂2 X2(Z1, Z2, …, Zp) + … + β̂q Xq(Z1, Z2, …, Zp).
If we feel that this model fits the data well, then for purposes of
interpretation, we regard the fitted model as the actual response
surface, and we interpret it exactly as we would interpret the
response surface.
Example 3, Continued
Let’s interpret the fitted model for the fit to the data in CARS93A.
Recall that it is
The Analysis of Variance (ANOVA)
SSTO, the total sum of squares, can be broken down into two pieces: SSR, the regression sum of squares, and SSE, the error sum of squares, so that SSTO = SSR + SSE.
The Analysis of Variance (ANOVA)
Analysis of Variance
Source DF SS MS F Stat Prob > F
Model q SSR MSR F=MSR/MSE p-value
Error n−q−1 SSE MSE
C Total n−1 SSTO
Example 3, Continued
Here’s the ANOVA table for the original fit to the CARS93A data.
Sum of Mean
Source DF Squares Square F Value Pr > F
Model 6 1614.28441 269.04740 23.11 < .0001
Error 86 1001.02742 11.63985
Corrected Total 92 2615.31183
Comparison of Fitted Models
Residual Analysis
Principle of Parsimony
The Coefficient of Multiple Determination
R² = SSR/SSTO = 1 − SSE/SSTO.
R² is
- the proportion of variation in the response explained by the regression.
- the proportion by which the unexplained variation in the response is reduced by the regression.
The Adjusted Coefficient of Multiple Determination
Ra² = 1 − [SSE/(n − q − 1)] / [SSTO/(n − 1)].
Ra² can be used to help implement the Principle of Parsimony.
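Both formulas can be checked directly against the Model 1 ANOVA table shown earlier (SSR = 1614.28441, SSE = 1001.02742, n = 93, q = 6):

```python
# Sums of squares from the Model 1 ANOVA table for the CARS93A data
ssr, sse = 1614.28441, 1001.02742
ssto = ssr + sse          # 2615.31183, the corrected total
n, q = 93, 6

r2 = ssr / ssto
ra2 = 1 - (sse / (n - q - 1)) / (ssto / (n - 1))
print(round(r2, 4), round(ra2, 4))  # 0.6172 0.5905
```

These match the R² and Ra² values reported for Model 1 below.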
Example 3, Continued
Let’s fit a second model to the data in CARS93A, and compare its
fit to the first model we considered.
We’d like to get rid of the squared terms in the model, which
complicate interpretation. One way we might do this is to
transform the data.
The scatterplot array on the next slide plots RPM and the natural
logs of MPG, DISPLACEMENT and HP. The transformations have
made all the relations nearly linear.
[Scatterplot array: L_HIGHMP (2.9957–3.9120) against L_DISPLA (0.0000–1.7405), L_HP (4.0073–5.7038), and RPM (3800–6500).]
Example 3, Continued
Model   R²       Ra²
1       0.6172   0.5905
2       0.5905   0.5572
Example 3, Continued
The plots of the Studentized residuals on the next two slides give
no reason to doubt the adequacy of the model fit. In fact, all
normality tests (even Shapiro-Wilk) fail to reject the null
hypothesis of normality.
[Plots of the Studentized residuals against L_DISPLA, L_HP, RPM, and the fitted values.]
[Histogram and normal quantile plot of the Studentized residuals for Model 2, with summary statistics:]

Moments
N 93.0000         Sum Wgts 93.0000
Mean -0.0006      Sum -0.0593
Std Dev 1.0169    Variance 1.0341
Skewness -0.2701  Kurtosis 0.7683
USS 95.1413       CSS 95.1413
CV -159449.64     Std Mean 0.1055
Inference for the MLR Model: The F Test
Example 3, Continued
The value F* of the F test statistic and its p-value are usually included in the ANOVA table output by a computer program. Here is the F test for the first model for the CARS93A data.
Sum of Mean
Source DF Squares Square F Value Pr > F
Model 6 1614.28441 269.04740 23.11 < .0001
Error 86 1001.02742 11.63985
Corrected Total 92 2615.31183
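The F statistic is simply MSR/MSE, and its p-value is the upper-tail area under the F distribution with (q, n − q − 1) = (6, 86) degrees of freedom. A quick scipy check of the table's values:

```python
from scipy import stats

msr, mse = 269.04740, 11.63985     # from the ANOVA table above
f_stat = msr / mse
# Upper-tail area of F(6, 86): the p-value of the overall F test
p_value = stats.f.sf(f_stat, 6, 86)
print(round(f_stat, 2), p_value < 0.0001)  # 23.11 True
```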
Tests for Individual Regressors
Example 3, Continued
Here are the tests for the model 1 fit to the CARS93A data.
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
Intercept 1 −5.67592 25.94414 −0.22 0.8273
D 1 −6.54110 3.19907 −2.04 0.0439
D2 1 1.08102 0.40608 2.66 0.0093
H 1 −0.11772 0.05551 −2.12 0.0368
H2 1 0.00017020 0.00013181 1.29 0.2001
R 1 0.01902 0.00990 1.92 0.0580
R2 1 −0.00000156 9.535905E−7 −1.64 0.1048
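Each t value in the table is the estimate divided by its standard error, and each p-value is two-sided from a t distribution with n − q − 1 = 86 degrees of freedom. Checking the D2 row with scipy:

```python
from scipy import stats

estimate, se = 1.08102, 0.40608        # D2 row of the table above
t_val = estimate / se                  # about 2.66, matching the table
p_val = 2 * stats.t.sf(abs(t_val), 86) # two-sided p-value, near 0.0093
print(round(t_val, 2), round(p_val, 4))
```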
Confidence Intervals for MLR Model
Prediction Interval for a Future Observation
A level (1 − α) prediction interval for a future observation Y_new, taken at regressor values X1,0, …, Xq,0, is
Ŷ_new ± t(n−q−1, 1−α/2) σ̂(Y_new − Ŷ_new),
where
Ŷ_new = β̂0 + β̂1 X1,0 + … + β̂q Xq,0,
and
σ̂(Y_new − Ŷ_new) = √(MSE + σ̂²(Ŷ0)),
with σ̂²(Ŷ0) the estimated variance of the fitted value at the new regressor values.
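As a numerical illustration on synthetic data (not the CARS93A fit), these pieces can be assembled with numpy and scipy; here σ̂²(Ŷ0) is computed as MSE · x0ᵀ(XᵀX)⁻¹x0 for the regressor vector x0 of the new observation:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, q = 40, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, q))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.5, size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
mse = resid @ resid / (n - q - 1)

x0 = np.array([1.0, 0.5, -0.5])        # intercept plus new regressor values
y_hat_new = x0 @ beta_hat
var_fit = mse * x0 @ np.linalg.inv(X.T @ X) @ x0   # sigma-hat^2 of fitted value
se_pred = np.sqrt(mse + var_fit)       # sigma-hat(Y_new - Y-hat_new)

t_crit = stats.t.ppf(0.975, n - q - 1)
lo, hi = y_hat_new - t_crit * se_pred, y_hat_new + t_crit * se_pred
print(lo, hi)   # a 95% prediction interval for Y_new
```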
Example 3, Continued
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
Intercept 1 −5.67592 25.94414 −0.22 0.8273
D 1 −6.54110 3.19907 −2.04 0.0439
D2 1 1.08102 0.40608 2.66 0.0093
H 1 −0.11772 0.05551 −2.12 0.0368
H2 1 0.00017020 0.00013181 1.29 0.2001
R 1 0.01902 0.00990 1.92 0.0580
R2 1 −0.00000156 9.535905E−7 −1.64 0.1048
So, for example, a 95% confidence interval for the coefficient of D2 is (after obtaining t86,0.975 ≈ 1.99 from the t-table)
1.08102 ± 1.99 × 0.40608 = (0.273, 1.889).
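The same interval can be computed with the exact t quantile from scipy (the t-table value 1.99 is a rounding of t86,0.975 ≈ 1.988):

```python
from scipy import stats

estimate, se = 1.08102, 0.40608        # D2 row of the table above
t_crit = stats.t.ppf(0.975, 86)        # about 1.988
lo, hi = estimate - t_crit * se, estimate + t_crit * se
print(round(lo, 3), round(hi, 3))      # roughly (0.274, 1.888)
```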
Multicollinearity
Example 3, Continued
For the first model for the CARS93A data, there is a large amount
of multicollinearity as the “Uncentered” columns in the table below
show. To try to alleviate this, we centered the predictors prior to
forming the model. The table shows that this has greatly reduced
the multicollinearity.
Uncentered Centered
Variable Tolerance VIF Tolerance VIF
D 0.0115 87.0459 0.0690 14.5023
D2 0.0177 56.4137 0.2916 3.4297
H 0.0150 66.7970 0.0925 10.8154
H2 0.0223 44.7906 0.3247 3.0543
R 0.0036 275.9746 0.2430 4.1149
R2 0.0036 278.7026 0.7164 1.3959
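A VIF is computed as 1/(1 − Rj²), where Rj² comes from regressing the j-th regressor on all the others; tolerance is its reciprocal. A numpy sketch on synthetic data (mimicking a predictor and its square, as with D and D2) shows how centering reduces the VIFs:

```python
import numpy as np

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 is the coefficient of
    determination from regressing column j of X on the other
    columns plus an intercept."""
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        b, *_ = np.linalg.lstsq(Z, y, rcond=None)
        r = y - Z @ b
        r2 = 1 - (r @ r) / np.sum((y - y.mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(0)
z = rng.uniform(1.0, 5.7, size=93)      # a predictor like DISPLACE
X_raw = np.column_stack([z, z**2])      # regressor and its square
zc = z - z.mean()
X_cen = np.column_stack([zc, zc**2])    # centered before squaring
print(vif(X_raw), vif(X_cen))           # centering gives much smaller VIFs
```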
Empirical Model Building