
Multiple Regression Analysis

Copyright © Tieming Ji
Spring 2013
University of Missouri at Columbia
Model:

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + e_i, \qquad e_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$$

Key Point: $\beta_j$, $j = 1, \ldots, k$, is the change in the mean of $Y$ for a unit increase in $x_j$ with all other variables held constant.
Model Assumptions:
Independence;
Normality;
Constant Variance;
Linearity.
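
To make the setup concrete, here is a minimal sketch (an assumed NumPy simulation, not part of the original slides; the coefficient values are purely hypothetical) that generates data satisfying all four assumptions:

```python
# Hypothetical simulation of y_i = b0 + b1*x_i1 + b2*x_i2 + e_i,
# with e_i ~ i.i.d. N(0, sigma^2); values below are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n = 100
beta = np.array([2.0, 1.5, -0.5])   # hypothetical beta_0, beta_1, beta_2
sigma = 1.0

X = np.column_stack([
    np.ones(n),                     # intercept column for beta_0
    rng.normal(size=n),             # x_i1
    rng.normal(size=n),             # x_i2
])
e = rng.normal(0.0, sigma, size=n)  # independent, normal, constant variance
y = X @ beta + e                    # linear mean function
```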
Least Squares Estimation

How do we estimate $\beta_0, \beta_1, \ldots, \beta_k$? Minimize the sum of squared deviations of the observations from the fitted values:

$$\min \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \min_{\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k} \sum_{i=1}^{n} \big( y_i - (\hat\beta_0 + \hat\beta_1 x_{i1} + \cdots + \hat\beta_k x_{ik}) \big)^2$$

The minimizers are the least squares estimates $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k$.

Theorem: The LSEs $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k$ are the best linear unbiased estimators of $\beta_0, \beta_1, \ldots, \beta_k$.
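
Continuing the simulated data above, a minimal sketch of computing the least squares estimates numerically (np.linalg.lstsq solves the same minimization via a stable matrix factorization):

```python
# Least squares: find beta_hat minimizing sum_i (y_i - x_i' beta)^2.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Equivalent when X'X is invertible: solve the normal equations (X'X)b = X'y.
# beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
```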
Conclusions:

The fitted values (predicted values) are
$$\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_{i1} + \cdots + \hat\beta_k x_{ik}.$$

Residuals are $\hat{e}_i = y_i - \hat{y}_i$, $i = 1, \ldots, n$.

$\hat\beta_j$, $j = 1, \ldots, k$, is a linear function of the $y_i$'s, so $\hat\beta_j \sim N(\beta_j, \sigma^2_{\hat\beta_j})$.

The multiple correlation coefficient
$$r_{Y,\hat{Y}} = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}})}{\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2} \, \sqrt{\sum_{i=1}^{n} (\hat{y}_i - \bar{\hat{y}})^2}}$$
measures how well the response variable is predicted by the model.
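
These quantities follow directly from the sketch above (np.corrcoef returns the same $r$):

```python
# Fitted values, residuals, and the multiple correlation coefficient r_{Y,Yhat}.
y_hat = X @ beta_hat
resid = y - y_hat
r = (np.sum((y - y.mean()) * (y_hat - y_hat.mean()))
     / np.sqrt(np.sum((y - y.mean())**2) * np.sum((y_hat - y_hat.mean())**2)))
# Same value: np.corrcoef(y, y_hat)[0, 1]; note r**2 equals the model's R^2.
```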
ANOVA Table

Source   d.f.         SS                                             MSS
model    k            SSR = $\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$   MSR = SSR / k
error    n - (k+1)    SSE = $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$       MSE = SSE / (n - (k+1))
total    n - 1        SST = $\sum_{i=1}^{n} (y_i - \bar{y})^2$

$$R^2 = \frac{SSR}{SST} = \frac{SST - SSE}{SST}.$$

MSE $= S^2$ is the unbiased estimator of $\sigma^2$.
ANOVA Test

Hypotheses:
$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$;
$H_a$: at least one of the $\beta_j$'s, $j = 1, \ldots, k$, is not equal to 0.

Test statistic: $F_{\text{obs}} = \text{MSR}/\text{MSE}$ from the ANOVA table.

Decision (at level $\alpha$):
(1) reject $H_0$ if $F_{\text{obs}} \geq F_{k,\, n-(k+1),\, 1-\alpha}$; or
(2) reject $H_0$ if the p-value is less than or equal to $\alpha$.

The ANOVA test is also a model selection: $H_0$ corresponds to the intercept-only model $y_i = \mu + e_i$, while $H_a$ says the model contains at least one $\beta_j$, $j = 1, \ldots, k$.
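
As a sketch, the F statistic and its p-value can be computed with SciPy (an assumed dependency, not mentioned in the slides):

```python
# ANOVA F test of H0: beta_1 = ... = beta_k = 0 at level alpha.
from scipy import stats

F_obs = MSR / MSE
alpha = 0.05
F_crit = stats.f.ppf(1 - alpha, k, n - (k + 1))   # F_{k, n-(k+1), 1-alpha}
p_value = stats.f.sf(F_obs, k, n - (k + 1))       # P(F >= F_obs)
reject = (F_obs >= F_crit) or (p_value <= alpha)  # the two rules agree
```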
Model Selection

Suppose we have decided to use a linear model; we still need to choose the predictors that enter the linear function. We may consider:
What explanatory variables to include;
What forms (e.g., transformations, interactions) of the explanatory variables to include.
Criteria:
Overall criterion: model simplicity + fitting quality.
A larger $R^2$ means a better fit to the observations.
$R^2$ always increases as more explanatory variables are added to the model. The incremental increase in $R^2$ from adding more explanatory variables may be neither statistically significant nor practically important (see the sketch below).
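
A minimal sketch of this point, continuing the example: appending a pure-noise predictor can only raise $R^2$, never lower it.

```python
# Adding an irrelevant (pure noise) predictor: R^2 still does not decrease.
X_noise = np.column_stack([X, rng.normal(size=n)])
beta_noise, *_ = np.linalg.lstsq(X_noise, y, rcond=None)
SSE_noise = np.sum((y - X_noise @ beta_noise)**2)
R2_noise = 1 - SSE_noise / SST
assert R2_noise >= R2 - 1e-12   # least squares can only improve the in-sample fit
```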
Model Selection
[Figure: three example fits, panels (a)-(c)]
(a) Poor fit: $R^2$ is low.
(b) Adequate fit: robust for future data.
(c) Extreme overfitting: $R^2 = 1$, but not robust for prediction.