Multiple Regression
y -- response variable
x1, x2, ..., xk -- a set of explanatory variables
In this chapter, all variables are assumed to be quantitative. Chapters 12-14 show how to also incorporate categorical variables in a regression model.
Multiple regression equation (population):
E(y) = α + β1x1 + β2x2 + ... + βkxk
Parameter Interpretation
α = E(y) when x1 = x2 = ... = xk = 0.
β1, β2, ..., βk are called partial regression coefficients.
Controlling for the other predictors in the model, there is a linear relationship between E(y) and x1 with slope β1.
i.e., if x1 goes up 1 unit with the other x's held constant, the change in E(y) is
[α + β1(x1 + 1) + β2x2 + ... + βkxk] - [α + β1x1 + β2x2 + ... + βkxk] = β1.
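A quick symbolic check of this cancellation (not from the slides; a minimal sketch with sympy for the k = 2 case, using hypothetical symbol names):

```python
import sympy as sp

# Difference in E(y) when x1 increases by 1 unit, other terms held fixed;
# the alpha and beta2*x2 terms cancel, leaving beta1.
alpha, beta1, beta2, x1, x2 = sp.symbols("alpha beta1 beta2 x1 x2")
change = (alpha + beta1*(x1 + 1) + beta2*x2) - (alpha + beta1*x1 + beta2*x2)
print(sp.simplify(change))   # prints beta1
```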
Prediction equation
With sample data, we get least squares estimates of the parameters by minimizing
SSE = sum of squared prediction errors (residuals) = Σ(observed y - predicted y)²
to get a sample prediction equation
ŷ = a + b1x1 + b2x2 + ... + bkxk
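A minimal sketch of the least squares computation in Python, using small made-up numbers (not the textbook data) just to show how minimizing SSE gives a, b1, b2:

```python
import numpy as np

# Made-up data: y = response, x1 and x2 = two quantitative predictors.
y  = np.array([25., 30., 22., 35., 28., 31.])
x1 = np.array([40., 60., 20., 80., 50., 70.])
x2 = np.array([60., 40., 70., 30., 55., 45.])

X = np.column_stack([np.ones_like(x1), x1, x2])   # design matrix with intercept column
coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None) # least squares estimates (a, b1, b2)
a, b1, b2 = coef

y_hat = X @ coef                                  # prediction equation applied to the data
sse = np.sum((y - y_hat) ** 2)                    # sum of squared residuals (minimized)
print(a, b1, b2, sse)
```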
Example: Mental impairment study
y = mental impairment (summarizes extent of psychiatric symptoms, including aspects of anxiety and depression, based on questions in a Health Opinion Survey with possible responses hardly ever, sometimes, often)
Ranged from 17 to 41 in the sample, mean = 27, s = 5.
x1 = life events score (composite measure of the number and severity of life events in the previous 3 years)
Ranges from 0 to 100, sample mean = 44, s = 23
x2 = socioeconomic status (composite index based on occupation, income, and education)
Ranges from 0 to 100, sample mean = 57, s = 25
Data (n = 40) at www.stat.ufl.edu/~aa/social/data.html
and p. 327 of text
Other explanatory variables in study (not used here) include age,
marital status, gender, race
Bivariate regression analyses give prediction equations:
ŷ = 23.3 + 0.090x1 (life events)
ŷ = 32.2 - 0.086x2 (SES)
Correlation matrix
The prediction equation for the multiple regression analysis is:
ŷ = 28.23 + 0.103x1 - 0.097x2
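The same fits in Python with statsmodels, a sketch that assumes the data have been saved locally as mental.csv with columns named impair, life, and ses (the file at the course website may use different names or format):

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("mental.csv")   # hypothetical local copy of the n = 40 data set

fit_life = smf.ols("impair ~ life", data=df).fit()        # bivariate: life events only
fit_ses  = smf.ols("impair ~ ses",  data=df).fit()        # bivariate: SES only
fit_both = smf.ols("impair ~ life + ses", data=df).fit()  # multiple regression

print(df[["impair", "life", "ses"]].corr())   # correlation matrix
print(fit_both.params)                        # a, b1, b2 in y-hat = a + b1*life + b2*ses
```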
Inferences for individual regression coefficients (Need all predictors in the model?)
To test the partial effect of xi, controlling for the other explanatory variables in the model, test H0: βi = 0 using test statistic
t = (bi - 0)/se, df = n - (k + 1)
which is df2 from the F test (and in the df column of the ANOVA table, in the Residual row).
A CI for βi has the form bi ± t(se), with the t-score from the t table also having df = n - (k + 1), for the desired confidence level.
Software provides estimates, standard errors, t test statistics, and P-values for the tests (two-sided by default).
In SPSS, check Confidence intervals under Statistics in the Linear Regression dialog box to get CIs for regression parameters (95% by default).
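Continuing the statsmodels sketch above (fit_both is the two-predictor fit), the pieces of the usual software output can be pulled out like this:

```python
# Each coefficient gets an estimate, se, t = b_i/se with df = n - (k + 1),
# a two-sided P-value, and a 95% CI of the form b_i +/- t*se.
print(fit_both.bse)                   # standard errors
print(fit_both.tvalues)               # t statistics
print(fit_both.pvalues)               # two-sided P-values
print(fit_both.conf_int(alpha=0.05))  # 95% confidence intervals
```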
Example: Effect of SES on mental impairment, controlling for life events
H0: β2 = 0,  Ha: β2 ≠ 0
Test statistic t = b2/se = -0.097/0.029 = -3.35, df = n - (k + 1) = 40 - 3 = 37.
Software reports P-value = 0.002.
Conclude there is very strong evidence that SES has a negative effect on mental impairment, controlling for life events. (We would reject H0 at standard significance levels, such as 0.05.)
Likewise for the test of H0: β1 = 0 (P-value = 0.003), but life events has a positive effect on mental impairment, controlling for SES.
A 95% CI for β2 is b2 ± t(se), which is -0.097 ± 2.03(0.029), or (-0.16, -0.04).
This does not contain 0, in agreement with rejecting H0 for two-sided Ha at the 0.05 significance level.
Perhaps simpler to interpret is the corresponding CI of (-16, -4) for the change in mean mental impairment for an increase of 100 units in SES (from the minimum of 0 to the maximum of 100).
(The CI is relatively wide because of the relatively small n = 40.)
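A sketch of the hand computation of this CI, using the numbers from the slide and scipy for the t critical value:

```python
from scipy import stats

b2, se, n, k = -0.097, 0.029, 40, 2
df = n - (k + 1)                             # 37
t_crit = stats.t.ppf(0.975, df)              # about 2.03 for 95% confidence
print(b2 - t_crit * se, b2 + t_crit * se)    # roughly (-0.16, -0.04)
```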
Why bother with F test? Why not go right to the t tests?
A caution: Overlapping variables (multicollinearity)
It is possible to get a small P-value in the F test of H0: β1 = β2 = ... = βk = 0 yet not get a small P-value for any of the t tests of the individual H0: βi = 0.
Likewise, it is possible to get a small P-value in a bivariate test for a predictor but not for its partial test controlling for other variables.
This happens when the partial variability explained uniquely by a predictor is small (i.e., each xi can be predicted well using the other predictors). (picture)
Example (purposely absurd): y = height, x1 = length of right leg, x2 = length of left leg
When multicollinearity occurs,
se values for individual bi may be large (and individual t statistics not significant)
R² may be nearly as large when some predictors are dropped from the model
It is advisable to simplify the model by dropping some nearly redundant explanatory variables.
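One common diagnostic for nearly redundant predictors (not covered in these slides) is the variance inflation factor (VIF); a minimal sketch with simulated "two legs" style data:

```python
import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)                     # e.g., right leg length
x2 = x1 + rng.normal(scale=0.05, size=100)    # nearly a copy (left leg length)
X = np.column_stack([np.ones(100), x1, x2])   # intercept plus two overlapping predictors

for j in (1, 2):                              # VIF far above 10 signals severe multicollinearity
    print(f"VIF for x{j}: {variance_inflation_factor(X, j):.1f}")
```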
Modeling interaction between predictors
Recall that the multiple regression model
E(y) = α + β1x1 + β2x2 + ... + βkxk
assumes the partial slope relating y to each xi is the same at all values of the other predictors (i.e., it assumes no interaction between pairs of predictors). (recall picture showing parallelism)
For a model allowing interaction between x1 and x2, the effect of x1 may change as x2 changes.
Simplest interaction model: Introduce cross-product terms for predictors
Ex: k = 2 explanatory variables: E(y) = α + β1x1 + β2x2 + β3(x1x2)
This is a special case of the multiple regression model
E(y) = α + β1x1 + β2x2 + β3x3  with x3 = x1x2
(create x3 in the Transform menu with the Compute Variable option in SPSS)
Example: For the mental impairment data, we get
ŷ = 26.0 + 0.156x1 - 0.060x2 - 0.00087x1x2
SPSS output for interaction model (need more decimal places! Highlight the table and repeatedly click on the value.)
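A sketch of fitting the interaction model with statsmodels, continuing the earlier hypothetical mental.csv setup; in the formula shorthand, life * ses expands to the two main effects plus the life:ses cross product:

```python
fit_int = smf.ols("impair ~ life * ses", data=df).fit()
print(fit_int.params)    # a, b1, b2, and b3 = coefficient of the life:ses term
print(fit_int.pvalues)   # includes the P-value for the t test of H0: beta3 = 0
```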
Prediction equation for ŷ as a function of x1 at fixed values of x2:
x2 = 0:   ŷ = 26.0 + 0.156x1 - 0.060(0) - 0.00087x1(0) = 26.0 + 0.16x1
x2 = 50:  ŷ = 26.0 + 0.156x1 - 0.060(50) - 0.00087x1(50) = 23.0 + 0.11x1
x2 = 100: ŷ = 26.0 + 0.156x1 - 0.060(100) - 0.00087x1(100) = 20.0 + 0.07x1
The higher the value of SES, the weaker the relationship between y = mental impairment and x1 = life events (plausible for these variables). (picture)
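A short sketch reproducing these conditional prediction equations from the fitted coefficients on the slide:

```python
# Interaction-model estimates from the slide.
a, b1, b2, b3 = 26.0, 0.156, -0.060, -0.00087

for ses in (0, 50, 100):
    intercept = a + b2 * ses     # constant part at this SES value
    slope = b1 + b3 * ses        # coefficient of life events at this SES value
    print(f"SES = {ses:3d}:  y-hat = {intercept:.1f} + {slope:.2f} x1")
```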
Comments about interaction model
Note that E(y) = α + β1x1 + β2x2 + β3x1x2 = (α + β2x2) + (β1 + β3x2)x1
i.e., E(y) is a linear function of x1:
E(y) = (constant with respect to x1) + (coefficient of x1)x1
where the coefficient of x1 is (β1 + β3x2).
For fixed x2, the slope of the relationship between E(y) and x1 depends on the value of x2.
To model interaction with k > 2 explanatory variables, take a cross product for each pair; e.g., for k = 3:
E(y) = α + β1x1 + β2x2 + β3x3 + β4x1x2 + β5x1x3 + β6x2x3
To test H0: no interaction in the model E(y) = α + β1x1 + β2x2 + β3x1x2, test H0: β3 = 0 using test statistic t = b3/se.
Example: t = -0.00087/0.0013 = -0.67, df = n - 4 = 36.
P-value = 0.51 for Ha: β3 ≠ 0.
Insufficient evidence to conclude that interaction exists.
(It is significant for the entire data set, with n > 1000.)
With several predictors, often some interaction terms are needed but not others. E.g., we could end up using a model such as
E(y) = α + β1x1 + β2x2 + β3x3 + β4x1x2
Be careful not to misinterpret main effect terms when there is interaction between them in the model.
Comparing two regression models
How do we test whether a model gives a better fit than a simpler model containing only a subset of the predictors?
Example: Compare
E(y) = α + β1x1 + β2x2 + β3x3 + β4x1x2 + β5x1x3 + β6x2x3
to
E(y) = α + β1x1 + β2x2 + β3x3
to test H0: no interaction by testing H0: β4 = β5 = β6 = 0.
An F test compares the models by comparing their SSE values or, equivalently, their R² values.
The more complex (complete) model is better if its SSE is sufficiently smaller (or, equivalently, if its R² value is sufficiently larger) than the SSE (or R²) value for the simpler (reduced) model.
Denote the SSE values for the complete and reduced models by SSEc and SSEr. Denote the R² values by R²c and R²r.
The test statistic for comparing the models is
F = [(SSEr - SSEc)/df1] / [SSEc/df2] = [(R²c - R²r)/df1] / [(1 - R²c)/df2]
df1 = number of extra parameters in the complete model,
df2 = n - (k + 1) = df2 for the F test that all β terms in the complete model = 0
(e.g., df2 = n - 7 for the model above)
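A sketch of this complete-vs-reduced F test in statsmodels, illustrated with the earlier hypothetical fits (fit_both as the reduced main-effects model and fit_int as the complete model with the interaction term):

```python
import statsmodels.api as sm

# Built-in comparison: list the reduced model first, then the complete model.
print(sm.stats.anova_lm(fit_both, fit_int))   # reports F and its P-value

# Equivalent hand computation from the SSE values.
sse_r, sse_c = fit_both.ssr, fit_int.ssr
df1 = fit_both.df_resid - fit_int.df_resid    # number of extra parameters
df2 = fit_int.df_resid                        # n - (k + 1) for the complete model
F = ((sse_r - sse_c) / df1) / (sse_c / df2)
print(F)
```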