
Applied Statistics II

Chapter 8 Multiple Linear Regression

Jian Zou

WPI Mathematical Sciences

The MLR Model

The Response Surface

The Modeling Process

Fitting the MLR Model

Assessing Model Fit

Interpretation of the Fitted Model

The Analysis of Variance (ANOVA)

Comparison of Fitted Models

Inference for the MLR Model: The F Test

Multicollinearity

Multiple Linear Regression

In Chapter 7, we studied simple linear regression (SLR) models: models relating a response to a single regressor. Now we turn to multiple linear regression (MLR) models: models relating a response to more than one regressor.

The MLR Model

We assume the response Y is related to p predictors Z1, Z2, ..., Zp through q regressors X1(Z1, Z2, ..., Zp), X2(Z1, Z2, ..., Zp), ..., Xq(Z1, Z2, ..., Zp) and the linear relation

Y = β0 + β1·X1(Z1, Z2, ..., Zp) + β2·X2(Z1, Z2, ..., Zp) + · · · + βq·Xq(Z1, Z2, ..., Zp) + ε,

where ε is a random error.
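As a concrete illustration, here is a minimal Python sketch that simulates data from one such model, with p = 1 predictor and q = 2 regressors; every number in it is invented for illustration:

import numpy as np

rng = np.random.default_rng(0)
n = 50
Z1 = rng.uniform(1, 5, n)                          # a single predictor (p = 1)
X1, X2 = Z1, Z1**2                                 # two regressors built from it (q = 2)
beta0, beta1, beta2, sigma = 2.0, 1.5, -0.3, 0.5   # made-up parameter values
eps = rng.normal(0, sigma, n)                      # the random error
Y = beta0 + beta1*X1 + beta2*X2 + eps              # the linear relation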

The MLR Model
Here are some examples:

Y = β0 + β1·Z1 + β2·Z1² + ε,
(p = 1, q = 2, X1 = Z1, X2 = Z1²)

Y = β0 + β1·Z1 + β2·Z2 + β3·Z1² + β4·Z1·Z2 + β5·Z2² + ε,
(p = 2, q = 5, X1 = Z1, X2 = Z2, X3 = Z1², X4 = Z1·Z2, X5 = Z2²)

Y = β0 + β1·log(Z2) + β3·√(Z1·Z2) + ε.
(p = 2, q = 2, X1 = log(Z2), X2 = √(Z1·Z2))

We will write these models generically as

Y = β0 + β1·X1 + β2·X2 + · · · + βq·Xq + ε.

Example 1

In a metal cutting experiment, the objective was to develop model


for the life of a prototype tool in terms of two important machining
productivity predictor variables, cutting speed and feed rate. The
data are found in TOOL LIFE.

By exploring the data, and using some establilshed methodologies,


the experimenters found a satisfactory model of the form

ln(ToolLife) = β0 + β1 Speed −.25 + β2 Feed −1 + 

Here, in addition to a transformed response, there are two


predictors (Speed and Feed) and two regressors (Speed −.25 and
Feed −1 ).
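A minimal Python sketch of how such a fit could be carried out, assuming the data supply arrays of speeds, feed rates, and tool lives; the numbers below are invented for illustration and are not the actual TOOL LIFE data:

import numpy as np

# Invented illustration values; the real data are in TOOL LIFE
speed = np.array([400., 500., 600., 700., 800., 900.])
feed  = np.array([0.010, 0.012, 0.014, 0.016, 0.018, 0.020])
life  = np.array([30., 22., 15., 11., 8., 6.])

# Build the two regressors from the two predictors
X = np.column_stack([np.ones_like(speed), speed**(-0.25), feed**(-1.0)])
y = np.log(life)                                  # transformed response
betahat, *_ = np.linalg.lstsq(X, y, rcond=None)   # least squares estimates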

The Response Surface

The surface defined by the deterministic part of the multiple linear regression model,

Y = β0 + β1·X1(Z1, Z2, ..., Zp) + β2·X2(Z1, Z2, ..., Zp) + · · · + βq·Xq(Z1, Z2, ..., Zp),

is called the response surface of the model.

Interpreting the Response Surface

When considered as a function of the regressors, the response surface is defined by the functional relationship

E(Y | X1 = x1, X2 = x2, ..., Xq = xq) = β0 + β1·x1 + β2·x2 + · · · + βq·xq.

If it is possible for the Xi to simultaneously take the value 0, then β0 is the value of the response surface when all Xi equal 0. Otherwise, β0 has no separate interpretation of its own.

Interpreting the Response Surface

For i = 1, ..., q, βi is interpreted as the change in the expected response per unit change in the regressor Xi, when all other regressors are held constant.

Note that sometimes this interpretation is impossible. For example, if X1 = Z1 and X2 = Z1³, we cannot change X2 while holding X1 constant.

Interpreting the Response Surface

As a function of the predictors, the response surface is defined by the functional relationship

E(Y | Z1 = z1, Z2 = z2, ..., Zp = zp) = β0 + β1·X1(z1, z2, ..., zp) + β2·X2(z1, z2, ..., zp) + · · · + βq·Xq(z1, z2, ..., zp).

Interpreting the Response Surface

If the regressors are differentiable functions of the predictors, the instantaneous rate of change of the surface in the direction of predictor Zi, at the point (z1, z2, ..., zp), is

∂/∂zi E(Y | Z1 = z1, Z2 = z2, ..., Zp = zp).

Some Response Surface Examples

Additive Model: For the model

E(Y | Z1 = z1, Z2 = z2) = β0 + β1·z1 + β2·z2,

the change in expected response per unit change in zi is

∂/∂zi E(Y | Z1 = z1, Z2 = z2) = ∂/∂zi (β0 + β1·z1 + β2·z2) = βi,  i = 1, 2.

Some Response Surface Examples

Two-Predictor Interaction Model: For the two-predictor interaction model

E(Y | Z1 = z1, Z2 = z2) = β0 + β1·z1 + β2·z2 + β3·z1·z2,

the change in expected response per unit change in z1 is

∂/∂z1 E(Y | Z1 = z1, Z2 = z2) = β1 + β3·z2,

and the change in expected response per unit change in z2 is

∂/∂z2 E(Y | Z1 = z1, Z2 = z2) = β2 + β3·z1.

Some Response Surface Examples

Full Quadratic Model: For the full quadratic model

E(Y | Z1 = z1, Z2 = z2) = β0 + β1·z1 + β2·z2 + β3·z1² + β4·z2² + β5·z1·z2,

the change in expected response per unit change in z1 is

∂/∂z1 E(Y | Z1 = z1, Z2 = z2) = β1 + 2β3·z1 + β5·z2,

and the change in expected response per unit change in z2 is

∂/∂z2 E(Y | Z1 = z1, Z2 = z2) = β2 + 2β4·z2 + β5·z1.
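These derivatives are easy to verify symbolically; the following short sketch uses the sympy library, with b0, ..., b5 standing in for β0, ..., β5:

import sympy as sp

z1, z2 = sp.symbols('z1 z2')
b = sp.symbols('b0:6')        # b0, ..., b5 stand in for beta_0, ..., beta_5
E = b[0] + b[1]*z1 + b[2]*z2 + b[3]*z1**2 + b[4]*z2**2 + b[5]*z1*z2
print(sp.diff(E, z1))          # b1 + 2*b3*z1 + b5*z2
print(sp.diff(E, z2))          # b2 + 2*b4*z2 + b5*z1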

Example 1, Continued

The fitted response surface for the tool life data was

estimated ln(ToolLife) = −17.5985 + 96.1106·Speed^(−0.25) + 0.0164·Feed^(−1).

On this surface, the change in estimated log tool life
- per unit change in the regressor X1 = Speed^(−0.25) is β̂1 = 96.1106;
- per unit change in Speed is ∂/∂Speed [96.1106·Speed^(−0.25)] = −24.0277·Speed^(−1.25).

The Modeling Process

The modeling process involves the following steps:


1. Model Specification
2. Model Fitting
3. Model Assessment
4. Model Validation

Model Specification

For the MLR model, model specification means specifying the form
of the model: the response, predictors and regressors.

When the data arise from well-understood phenomena, this can


often be done using experience and theory.

Often, however, one must look to the data to suggest an


appropriate form for the model. This is where a technique known
as multivariable visualization can help.

Multivariable Visualization

Multivariable visualization begins with a number of standard statistical tools, such as histograms, to look at each variable individually, or scatterplots, to look at pairs of variables. But the true power of multivariable visualization can be found only in a set of sophisticated statistical tools which make use of multiple dynamically-linked displays. (You won't find these in Microsoft Excel!) Two such tools are:
- Scatterplot arrays
- Rotating 3-D plots

Fitting the MLR Model

As we did for the SLR model, we use least squares to fit the MLR model. This means finding estimators of the model parameters β0, β1, ..., βq and σ².

Fitting the MLR Model

The least squares estimators of the βs are those values of b0, b1, ..., bq, denoted β̂0, β̂1, ..., β̂q, which minimize

SSE(b0, b1, ..., bq) = Σ_{i=1}^{n} [Yi − (b0 + b1·Xi1 + b2·Xi2 + · · · + bq·Xiq)]².

NOTE: We could take derivatives with respect to each of the bi and obtain the normal equations by setting the results to 0, but the resulting formulas are very messy and we will not present them here. In fact, the best way to present them is to use vectors and matrices, which we will leave for a more advanced course.

Fitting the MLR Model

The fitted values are

Ŷi = β̂0 + β̂1 Xi1 + β̂2 Xi2 + · · · + β̂q Xiq ,

and the residuals are


ei = Yi − Ŷi .
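For concreteness, here is a minimal Python sketch of the whole fitting step on simulated data; numpy's least squares routine stands in for the matrix solution alluded to in the note above, and all numbers are invented:

import numpy as np

rng = np.random.default_rng(1)
n, q = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, q))])  # intercept column plus q regressors
beta = np.array([1.0, 2.0, -1.0, 0.5])                      # made-up true coefficients
Y = X @ beta + rng.normal(scale=0.3, size=n)

betahat, *_ = np.linalg.lstsq(X, Y, rcond=None)  # minimizes SSE(b0, ..., bq)
Yhat = X @ betahat                               # fitted values
e = Y - Yhat                                     # residuals
mse = e @ e / (n - q - 1)                        # estimates sigma^2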

Example 3

Let's see what happens when we identify and fit a model to the data in CARS93A. The scatterplot array on the next slide shows the response, highway mpg (HIGHMPG), and three potential predictors: displacement (DISPLACE), horsepower (HP), and rpm (RPM).

[Scatterplot array of HIGHMPG against DISPLACE, HP, and RPM]
Example 3, Continued

There seems to be a curvilinear relation between the response and each potential predictor, so we will begin by trying a model with linear and squared terms for each predictor. The resulting fitted model is (letting Y denote highway mpg, H denote HP, R denote RPM, and D denote displacement):

Ŷ = −5.6759 − 0.1177·H − 6.5411·D + 0.0190·R + 1.0810·D² + 0.0002·H² − (1.6 × 10⁻⁶)·R².

Assessing Model Fit

Residuals and Studentized residuals are the primary tools for assessing model fit. We look for outliers and other deviations from the model assumptions.
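One common variant is the internally Studentized residual, which rescales each residual by its estimated standard deviation; the sketch below computes it via the hat matrix (assuming internal Studentization, which may differ from the exact variant the software used here):

import numpy as np

def studentized_residuals(X, Y):
    """Internally Studentized residuals; X includes an intercept column."""
    n, k = X.shape                        # k = q + 1
    H = X @ np.linalg.inv(X.T @ X) @ X.T  # hat matrix
    e = Y - H @ Y                         # ordinary residuals
    mse = e @ e / (n - k)
    return e / np.sqrt(mse * (1.0 - np.diag(H)))

# Made-up data; values beyond about +/- 2 deserve a closer look
rng = np.random.default_rng(2)
X = np.column_stack([np.ones(30), rng.normal(size=30)])
Y = X @ np.array([1.0, 2.0]) + rng.normal(size=30)
r = studentized_residuals(X, Y)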

Example 3, Continued

Let’s look at the residuals from the fit to the data in CARS93A.

The next slide shows scatterplots of Studentized residuals versus


each predictor and the fitted values. These plots show a fairly
random pattern (or non-pattern) for the residuals.

The slide after that shows a distribution analysis of the Studentized residuals. The normal quantile plot seems to show no major departures from normality, though the Shapiro-Wilk test, one of the tests for normality, does suggest the normality assumption is questionable, perhaps because of the two large outliers.

On the whole, however, the model seems to fit reasonably well.

[Plots of Studentized residuals versus DISPLACE, HP, RPM, and the fitted values]
[Histogram, boxplot, and normal quantile plot of the Studentized residuals, with the following summary statistics]

Moments
N 93             Sum Wgts 93
Mean −0.0006     Sum −0.0527
Std Dev 1.0248   Variance 1.0502
Skewness 0.5658  Kurtosis 1.5809
USS 96.6185      CSS 96.6185
CV −180831.14    Std Mean 0.1063

Quantiles
100% Max 3.6778    99.0% 3.6778
75% Q3 0.6858      97.5% 1.9703
50% Med −0.0734    95.0% 1.5264
25% Q1 −0.6849     90.0% 1.0344
0% Min −2.5365     10.0% −1.0406
Range 6.2144       5.0% −1.7875
Q3−Q1 1.3707       2.5% −1.8808
Mode −0.7288       1.0% −2.5365

Tests for Normality
Test                 Statistic   p-value
Shapiro-Wilk         0.969600    0.0287
Kolmogorov-Smirnov   0.078465    >.1500
Cramer-von Mises     0.088748    0.1604
Anderson-Darling     0.623321    0.1018
Interpretation of the Fitted Model

The fitted model is

Ŷ = β̂0 + β̂1·X1(Z1, ..., Zp) + β̂2·X2(Z1, ..., Zp) + · · · + β̂q·Xq(Z1, ..., Zp).

If we feel that this model fits the data well, then for purposes of interpretation, we regard the fitted model as the actual response surface, and we interpret it exactly as we would interpret the response surface.

Example 3, Continued

Let's interpret the fitted model for the fit to the data in CARS93A. Recall that it is

Ŷ = −5.6759 − 0.1177·H − 6.5411·D + 0.0190·R + 1.0810·D² + 0.0002·H² − (1.6 × 10⁻⁶)·R².

The interpretation of the effect of H is that the change in predicted MPG per unit change in H is −0.1177 + 0.0004·H, assuming D and R remain constant.

Does the intercept −5.6759 have an interpretation? (Presumably not as a separate quantity, since H = D = R = 0 is impossible for a real car.)

The Analysis of Variance (ANOVA)

Total variation in the response (about its mean) is measured by

SSTO = Σ_{i=1}^{n} (Yi − Ȳ)².

This is the variation, or uncertainty of prediction, if no regressor variables are used.

SSTO can be broken down into two pieces: SSR, the regression sum of squares, and SSE, the error sum of squares, so that SSTO = SSR + SSE.

The Analysis of Variance (ANOVA)

SSE = Σ_{i=1}^{n} ei² is the total sum of the squared residuals. It measures the variation of the response unaccounted for by the fitted model or, equivalently, the uncertainty of predicting the response using the fitted model.

SSR = SSTO − SSE is the variability explained by the fitted model or, equivalently, the reduction in uncertainty of prediction due to using the fitted model.

The Analysis of Variance (ANOVA)

The degrees of freedom for an SS is the number of independent pieces of data making up the SS.

For SSTO, SSE, and SSR the degrees of freedom are n − 1, n − q − 1, and q, respectively. These add just as the SSs do.

An SS divided by its degrees of freedom is called a mean square.

The ANOVA table summarizes the SSs, degrees of freedom, and mean squares.
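As a sketch of how these quantities relate, the following Python function computes the SS decomposition and mean squares for a fitted model; X is assumed to carry an intercept column, and this mirrors the definitions above rather than any particular software's output:

import numpy as np

def anova(X, Y):
    """SSTO = SSR + SSE for a least squares fit; X includes an intercept column."""
    n, k = X.shape
    q = k - 1
    betahat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    Yhat = X @ betahat
    ssto = np.sum((Y - Y.mean())**2)
    sse = np.sum((Y - Yhat)**2)          # unexplained variation
    ssr = ssto - sse                     # variation explained by the fit
    msr, mse = ssr / q, sse / (n - q - 1)
    return ssr, sse, ssto, msr, mse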

The Analysis of Variance (ANOVA)

The ANOVA table looks like this:

Analysis of Variance
Source DF SS MS F Stat Prob > F
Model q SSR MSR F=MSR/MSE p-value
Error n−q−1 SSE MSE
C Total n−1 SSTO

Example 3, Continued

Here’s the ANOVA table for the original fit to the CARS93A data.

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              6       1614.28441     269.04740     23.11   <.0001
Error             86       1001.02742      11.63985
Corrected Total   92       2615.31183

Comparison of Fitted Models

There are a number of tools we can use to compare models. Among these are:
- Residual analysis
- The principle of parsimony (simplicity of description)
- The coefficient of multiple determination, and its adjusted cousin

Residual Analysis

Residual analysis, as we have seen, can pinpoint model deficiencies.


In doing so, it points up the relative deficiencies (or merits) of the
models being compared.

Principle of Parsimony

The Principle of Parsimony, also known as Occam's Razor, states that, all other things being equal, the simplest explanation of a phenomenon is the best. In MLR this can mean:
- Among models with equal fit and predictive power, the one with the fewest parameters is best.
- Among models with equal fit and predictive power, the one with the easiest interpretation is best.

The Coefficient of Multiple Determination

This quantity, R², is defined as

R² = SSR/SSTO = 1 − SSE/SSTO.

R² is
- the proportion of variation in the response explained by the regression;
- the proportion by which the unexplained variation in the response is reduced by the regression.

R² is also the square of the Pearson correlation between Y and Ŷ.

The Adjusted Coefficient of Multiple Determination

One problem with using R² to measure the quality of model fit is that it can always be increased by adding another regressor.

The Adjusted Coefficient of Multiple Determination, Ra², is a measure that adjusts R² for the number of regressors in the model. It is defined as

Ra² = 1 − [SSE/(n − q − 1)] / [SSTO/(n − 1)].

Ra² can be used to help implement the Principle of Parsimony.
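A small Python sketch of both definitions; Yhat denotes the fitted values, and nothing here is specific to any particular package:

import numpy as np

def r2_and_adjusted(Y, Yhat, q):
    """R^2 and Ra^2 for a fitted model with q regressors."""
    n = len(Y)
    ssto = np.sum((Y - np.mean(Y))**2)
    sse = np.sum((Y - Yhat)**2)
    r2 = 1.0 - sse / ssto
    ra2 = 1.0 - (sse / (n - q - 1)) / (ssto / (n - 1))   # penalizes extra regressors
    return r2, ra2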

Example 3, Continued

Let’s fit a second model to the data in CARS93A, and compare its
fit to the first model we considered.

Here’s our reasoning in selecting the second model: simplify.

We’d like to get rid of the squared terms in the model, which
complicate interpretation. One way we might do this is to
transform the data.

The scatterplot array on the next slide plots RPM and the natural
logs of MPG, DISPLACEMENT and HP. The transformations have
made all the relations nearly linear.

[Scatterplot array of L_HIGHMP (log MPG), L_DISPLA (log displacement), L_HP (log HP), and RPM]
Example 3, Continued

So we fit the model (the prefix L denotes natural log):

estimated L_Y = 4.5490 − 0.3550·L_H + 0.0292·L_D + (9.9 × 10⁻⁵)·R.

Both R² and Ra² favor the first model:

Model    R²       Ra²
1        0.6172   0.5905
2        0.5905   0.5572

Example 3, Continued

The plots of the Studentized residuals on the next two slides give
no reason to doubt the adequacy of the model fit. In fact, all
normality tests (even Shapiro-Wilk) fail to reject the null
hypothesis of normality.

Two additional considerations:


1. The first model uses the original units, which might be
considered an advantage.
2. However, if one doesn’t mind working with transformed data,
the form of the second model is simpler (no squared terms).

[Plots of Studentized residuals versus L_DISPLA, L_HP, RPM, and the fitted values]
[Histogram, boxplot, and normal quantile plot of the Studentized residuals for the second model, with the following summary statistics]

Moments
N 93              Sum Wgts 93
Mean −0.0006      Sum −0.0593
Std Dev 1.0169    Variance 1.0341
Skewness −0.2701  Kurtosis 0.7683
USS 95.1413       CSS 95.1413
CV −159449.64     Std Mean 0.1055

Quantiles
100% Max 3.0226    99.0% 3.0226
75% Q3 0.8253      97.5% 1.8550
50% Med 0.0408     95.0% 1.3645
25% Q1 −0.6438     90.0% 1.0620
0% Min −2.8522     10.0% −1.0487
Range 5.8748       5.0% −2.0432
Q3−Q1 1.4691       2.5% −2.4337
Mode −0.4507       1.0% −2.8522

Tests for Normality
Test                 Statistic   p-value
Shapiro-Wilk         0.980247    0.1717
Kolmogorov-Smirnov   0.056201    >.1500
Cramer-von Mises     0.059467    >.2500
Anderson-Darling     0.522011    0.1881
Inference for the MLR Model: The F Test

Once we have identified and fit a tentative model, we will want to know if there is a statistically significant relation between the response and the regressors. The tool to answer this question is the F test.

- The Hypotheses:
  H0: β1 = β2 = · · · = βq = 0 (i.e., no relationship)
  Ha: not H0
- The Test Statistic: F = MSR/MSE
- The P-Value: P(F_{q,n−q−1} > F*), the area under the density curve of the F_{q,n−q−1} distribution that exceeds F*, the observed value of the test statistic.
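As a quick check of the mechanics, here is a Python sketch that reproduces the F test from summary numbers; the values are taken from the CARS93A ANOVA table on the next slide, and scipy's F survival function supplies the tail area:

from scipy import stats

# Numbers from the CARS93A ANOVA table repeated on the next slide
q, n = 6, 93
msr, mse = 269.04740, 11.63985
f_star = msr / mse                            # about 23.11
p_value = stats.f.sf(f_star, q, n - q - 1)    # P(F_{6,86} > F*), far below 0.0001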

Example 3, Continued

The value F* of the F test statistic and its p-value are usually included in the ANOVA table output by a computer program. Here is the F test for the first model for the CARS93A data.

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              6       1614.28441     269.04740     23.11   <.0001
Error             86       1001.02742      11.63985
Corrected Total   92       2615.31183

Tests for Individual Regressors

If the F test fails to reject H0 , our task is done: there is no reason


to pursue the regression model further, as there is no evidence of a
relation between the response and the regressors.

We may choose to reformulate the model or to revise our theories,


conduct further experiments, or take some other course of action.

Tests for Individual Regressors

If the F test does reject H0 , however, we will conclude that there is


evidence of a relation between the response and the regressors, and
we will probably want to explore that relationship more closely.

Specifically, for the MLR model, we will be interested in which


regressors do show evidence of a relationship with the response,
and which do not.

Tests for Individual Regressors

To test for the significance of the relationship between the response and regressor Xi, we can conduct the following t-test:

- The Hypotheses:
  H0: βi = 0
  Ha: βi ≠ 0
- The Test Statistic: t = β̂i / σ̂(β̂i), where σ̂(β̂i) is the estimated standard error of β̂i.
- The P-Value: 2·min{p−, p+}, where t* is the observed value of the test statistic, p− = P(t_{n−q−1} < t*) is the area under the t_{n−q−1} density curve below t*, and p+ = P(t_{n−q−1} > t*) is the area above t*.
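A sketch of the same computation in Python, using the D coefficient from the CARS93A parameter table shown on the next slide; scipy's t distribution supplies the tail areas:

from scipy import stats

# Coefficient for D in the CARS93A table: estimate -6.54110, std error 3.19907, 86 df
t_star = -6.54110 / 3.19907                  # about -2.04
df = 86                                      # n - q - 1
p_minus = stats.t.cdf(t_star, df)            # area below t*
p_plus = stats.t.sf(t_star, df)              # area above t*
p_value = 2 * min(p_minus, p_plus)           # about 0.0439, matching the table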

Example 3, Continued

Here are the tests for the model 1 fit to the CARS93A data.

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1        −5.67592            25.94414       −0.22     0.8273
D            1        −6.54110             3.19907       −2.04     0.0439
D2           1         1.08102             0.40608        2.66     0.0093
H            1        −0.11772             0.05551       −2.12     0.0368
H2           1         0.00017020          0.00013181     1.29     0.2001
R            1         0.01902             0.00990        1.92     0.0580
R2           1        −0.00000156          9.535905E−7   −1.64     0.1048

Confidence Intervals for MLR Model

Confidence Interval for Model Coefficients: A level L confidence interval for βi has endpoints

β̂i ± σ̂(β̂i)·t_{n−q−1,(1+L)/2}.

Confidence Interval for the Mean Response: A level L confidence interval for the mean response at regressor values X10, X20, ..., Xq0 has endpoints

Ŷ0 ± σ̂(Ŷ0)·t_{n−q−1,(1+L)/2},

where

Ŷ0 = β̂0 + β̂1·X10 + · · · + β̂q·Xq0,

and σ̂(Ŷ0) is the estimated standard error of Ŷ0.
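For instance, a Python sketch of the coefficient interval, using the D2 estimate and standard error from the parameter table presented earlier; the degrees of freedom, 86, are n − q − 1 for that fit:

from scipy import stats

level = 0.95
df = 86                                       # n - q - 1 for the CARS93A fit
t_crit = stats.t.ppf((1 + level) / 2, df)     # t_{n-q-1,(1+L)/2}, about 1.99
est, se = 1.08102, 0.40608                    # coefficient and std error for D2
ci = (est - t_crit * se, est + t_crit * se)   # about (0.27, 1.89)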

Prediction Interval for a Future Observation

A level L prediction interval for a new response at regressor values X10, X20, ..., Xq0 has endpoints

Ŷnew ± σ̂(Ynew − Ŷnew)·t_{n−q−1,(1+L)/2},

where

Ŷnew = β̂0 + β̂1·X10 + · · · + β̂q·Xq0,

and

σ̂(Ynew − Ŷnew) = √(MSE + σ̂²(Ŷnew)).
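A Python sketch of the prediction interval under these formulas; the function assumes X carries an intercept column and that x0 begins with a 1 (a sketch of the formulas above, not a reference implementation):

import numpy as np
from scipy import stats

def prediction_interval(X, Y, x0, level=0.95):
    """Level-`level` prediction interval at regressor values x0 (leading 1 for the intercept)."""
    n, k = X.shape
    betahat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    e = Y - X @ betahat
    mse = e @ e / (n - k)
    var_mean = mse * (x0 @ np.linalg.inv(X.T @ X) @ x0)  # sigma-hat^2(Y-hat_new)
    se_pred = np.sqrt(mse + var_mean)                    # sigma-hat(Y_new - Y-hat_new)
    t_crit = stats.t.ppf((1 + level) / 2, n - k)
    yhat = x0 @ betahat
    return yhat - t_crit * se_pred, yhat + t_crit * se_pred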

Example 3, Continued

We illustrate by computing confidence and prediction intervals for


the original fit to the CARS93A data.

Confidence intervals for the βi are easily computed using a t-table


and the estimates and their standard errors from the table
(presented earlier) of parameter estimates.

Example 3, Continued

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1        −5.67592            25.94414       −0.22     0.8273
D            1        −6.54110             3.19907       −2.04     0.0439
D2           1         1.08102             0.40608        2.66     0.0093
H            1        −0.11772             0.05551       −2.12     0.0368
H2           1         0.00017020          0.00013181     1.29     0.2001
R            1         0.01902             0.00990        1.92     0.0580
R2           1        −0.00000156          9.535905E−7   −1.64     0.1048

So, for example, a 95% confidence interval for the coefficient for D2 is (after obtaining t_{86,0.975} ≈ 1.99 from the t-table): 1.08102 ± 1.99 × 0.40608, or approximately (0.27, 1.89).

Multicollinearity

Multicollinearity is correlation among the regressors. Two negative consequences of multicollinearity are:
- large sampling variability for the β̂i;
- questionable interpretation of β̂i as the change in expected response per unit change in Xi.

Multicollinearity

There are a number of measures used to detect multicollinearity. One of the simplest is Ri², the coefficient of multiple determination obtained from regressing Xi on the other X's. Ri² is a measure of how highly correlated Xi is with the other X's.

Multicollinearity

Ri² provides two related measures of multicollinearity:

- Tolerance: TOLi = 1 − Ri². Small TOLi indicates Xi is highly correlated with the other X's. We should begin getting concerned if TOLi < 0.1.
- VIF: VIFi = 1/TOLi. VIF stands for variance inflation factor. Large VIFi indicates Xi is highly correlated with the other X's. We should begin getting concerned if VIFi > 10.
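A Python sketch of both measures, computing each Ri² by regressing Xi on the other X's as described above; X is assumed to hold only the regressor columns, without an intercept:

import numpy as np

def tolerance_and_vif(X):
    """TOL_i and VIF_i for each column of X (regressors only, no intercept column)."""
    n, q = X.shape
    results = []
    for i in range(q):
        xi = X[:, i]
        others = np.column_stack([np.ones(n), np.delete(X, i, axis=1)])
        coef, *_ = np.linalg.lstsq(others, xi, rcond=None)  # regress X_i on the other X's
        resid = xi - others @ coef
        r2_i = 1.0 - resid @ resid / np.sum((xi - xi.mean())**2)
        tol = 1.0 - r2_i
        results.append((tol, 1.0 / tol))
    return results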

Multicollinearity

Remedial measures for multicollinearity include:

- Center or standardize the Zi prior to computing the Xi.
  - This is particularly useful if the regressor Xi is a power or product of the predictors Zi.
  - Sometimes this is not possible; for example, if Xi = ln(Zi).
- Drop the offending Xi.

Example 3, Continued

For the first model for the CARS93A data, there is a large amount of multicollinearity, as the "Uncentered" columns in the table below show. To try to alleviate this, we centered the predictors prior to forming the model. The table shows that this has greatly reduced the multicollinearity.

                Uncentered               Centered
Variable   Tolerance        VIF     Tolerance      VIF
D             0.0115    87.0459        0.0690  14.5023
D2            0.0177    56.4137        0.2916   3.4297
H             0.0150    66.7970        0.0925  10.8154
H2            0.0223    44.7906        0.3247   3.0543
R             0.0036   275.9746        0.2430   4.1149
R2            0.0036   278.7026        0.7164   1.3959

Empirical Model Building

Selection of variables in empirical model building is an important task. We consider only one of many possible methods: backward elimination, which consists of starting with all possible Xi in the model and eliminating the non-significant ones one at a time, until we are satisfied with the remaining model.
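A minimal Python sketch of backward elimination based on the individual t tests from earlier in the chapter; the p-value threshold alpha = 0.05 is an assumption, and real implementations differ in their stopping rules:

import numpy as np
from scipy import stats

def backward_elimination(X, Y, names, alpha=0.05):
    """Repeatedly drop the regressor with the largest p-value above alpha.
    X holds the regressors (no intercept column); names labels its columns."""
    X, names = X.copy(), list(names)
    while names:
        n = len(Y)
        Xd = np.column_stack([np.ones(n), X])
        k = Xd.shape[1]
        xtx_inv = np.linalg.inv(Xd.T @ Xd)
        betahat = xtx_inv @ Xd.T @ Y
        e = Y - Xd @ betahat
        mse = e @ e / (n - k)
        se = np.sqrt(mse * np.diag(xtx_inv))
        p = 2 * stats.t.sf(np.abs(betahat / se), n - k)  # two-sided p-values
        worst = int(np.argmax(p[1:]))                    # skip the intercept
        if p[1:][worst] <= alpha:
            break                                        # all remaining regressors significant
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return names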

Example 3, Continued

Here’s an example of empirical model building using backward


elimination to select regressors for the CARS93A data.
