The Simple Linear Regression Model (Part 2)
Goodness of Fit
• 1. The sum of squared residuals (SSR) represents the degree to which our model missed the data. A lower SSR means a "better" fit.

$$SSR = \sum_{i=1}^{N} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{N} \hat{u}_i^2$$

• 2. The total sum of squares (SST) represents the total variation of Yi about its sample mean; it is proportional to var(Yi).

$$SST = \sum_{i=1}^{N} (Y_i - \bar{Y})^2$$
• 3. The explained sum of squares (SSE) represents the deviation of the fitted values from the mean.

$$SSE = \sum_{i=1}^{N} (\hat{Y}_i - \bar{Y})^2$$
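To make the three sums concrete, here is a minimal Python sketch (not from the slides) that fits a simple OLS line to made-up data with numpy and computes SSR, SST, and SSE:

```python
# Minimal sketch: SSR, SST, and SSE for a simple OLS fit on made-up data.
import numpy as np

# Hypothetical sample (any X, Y data would do)
X = np.array([1100.0, 2550.0, 800.0, 1600.0, 3200.0])
Y = np.array([1050.0, 1900.0, 950.0, 1400.0, 2900.0])

# OLS slope and intercept for Y = b0 + b1*X + u
b1, b0 = np.polyfit(X, Y, deg=1)        # polyfit returns highest degree first
Y_hat = b0 + b1 * X
u_hat = Y - Y_hat                        # OLS residuals

SSR = np.sum(u_hat ** 2)                 # sum of squared residuals
SST = np.sum((Y - Y.mean()) ** 2)        # total sum of squares
SSE = np.sum((Y_hat - Y.mean()) ** 2)    # explained sum of squares

print(SSR, SST, SSE, SSR + SSE)          # with an intercept, SSR + SSE matches SST (up to rounding)
```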
Coefficient of Determination
$$R^2 = \frac{SSE}{SST} = \frac{\sum_{i=1}^{N} (\hat{Y}_i - \bar{Y})^2}{\sum_{i=1}^{N} (Y_i - \bar{Y})^2} = 1 - \frac{SSR}{SST} = 1 - \frac{\sum_{i=1}^{N} (Y_i - \hat{Y}_i)^2}{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}$$
[Figure: diagram relating the pieces of variation (SSE, SSR) to R²]
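The two expressions for R² agree because, for OLS with an intercept, the total variation splits exactly into explained and residual parts. A short derivation (not shown on the slide):

```latex
% Decomposition SST = SSE + SSR for OLS with an intercept
\begin{align*}
Y_i - \bar{Y} &= (\hat{Y}_i - \bar{Y}) + \hat{u}_i \\
\sum_{i=1}^{N}(Y_i - \bar{Y})^2
  &= \sum_{i=1}^{N}(\hat{Y}_i - \bar{Y})^2
   + 2\sum_{i=1}^{N}\hat{u}_i(\hat{Y}_i - \bar{Y})
   + \sum_{i=1}^{N}\hat{u}_i^2 \\
  &= SSE + SSR .
\end{align*}
```

The cross term vanishes because the OLS first-order conditions give $\sum_i \hat{u}_i = 0$ and $\sum_i \hat{u}_i X_i = 0$, so $R^2 = SSE/SST = 1 - SSR/SST$.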
Coefficient of Determination

• A value of R² that is nearly equal to zero indicates a poor fit of the OLS line: very little of the variation in Yi is captured by the variation in the fitted values Ŷi. However, in some instances a value such as R² = 0.07 is acceptable as long as the coefficients make sense.

• In panel and cross-sectional data, R² tends to be lower.

• In time-series data, R² tends to be higher.
Back to Our Example - SSR

i    Yi      Xi      Ŷi       Yi − Ŷi    (Yi − Ŷi)²
1    1050    1100    880.8    169.2      28615.19
2    1900    2550    2195.1   -295.1     87102.18

$$SSE = \sum_{i=1}^{N} (\hat{Y}_i - \bar{Y})^2 = 27{,}588{,}479.68$$

$$SST = \sum_{i=1}^{N} (Y_i - \bar{Y})^2 = 27{,}778{,}489$$

$$SSR = \sum_{i=1}^{N} (Y_i - \hat{Y}_i)^2 = 190{,}010.19$$

$$R^2 = \frac{SSE}{SST} = \frac{27{,}588{,}479.68}{27{,}778{,}489} \approx 0.993
\qquad
R^2 = 1 - \frac{SSR}{SST} = 1 - \frac{190{,}010.19}{27{,}778{,}489} \approx 0.993$$
[Figure: scatter plot illustrating the decomposition at observation 2 (X₂ = 2550, Y₂ = 1900, Ŷ₂ ≈ 2194, Ȳ = 3041) into its "SST", "SSE", and "SSR" components]
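A quick Python check of the arithmetic on this slide, using the SSE, SST, and SSR values reported above:

```python
# Values taken directly from the example slide
SSE = 27_588_479.68
SST = 27_778_489.0
SSR = 190_010.19

print(SSE / SST)      # ~0.993  (R^2 = SSE/SST)
print(1 - SSR / SST)  # ~0.993  (R^2 = 1 - SSR/SST)
```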
More on R²

• Another way to think about R² is as a measure of how well your model performs relative to the simplest model, wherein the values of Yi are predicted using only the sample mean and no explanatory variables.
• Note that if you have no explanatory variables, the least squares estimate of β₀ will be the mean of Yi:

$$\min_{a} \sum_{i=1}^{N} (Y_i - a)^2$$

$$-2\sum_{i=1}^{N} (Y_i - a) = 0
\;\Rightarrow\; \sum_{i=1}^{N} Y_i - Na = 0
\;\Rightarrow\; a = \frac{\sum_{i=1}^{N} Y_i}{N} = \bar{Y}$$
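A minimal numeric sketch (made-up data, not from the slides) confirming that the sample mean minimizes the sum of squared deviations:

```python
# The sample mean minimizes sum_i (Y_i - a)^2 over a.
import numpy as np

Y = np.array([3.0, 7.0, 1.0, 9.0, 5.0])   # any made-up sample

def ssq(a):
    return np.sum((Y - a) ** 2)

grid = np.linspace(Y.min(), Y.max(), 1001)
best = grid[np.argmin([ssq(a) for a in grid])]

print(best, Y.mean())   # the grid minimizer sits at (about) the sample mean
```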
Standard Error of the Regression (SER) - another measure of goodness of fit

The SER measures the spread of the distribution of u. The SER is (almost) the sample standard deviation of the OLS residuals:

$$SER = \sqrt{\frac{1}{n-2}\sum_{i=1}^{n} (\hat{u}_i - \bar{\hat{u}})^2}
      = \sqrt{\frac{1}{n-2}\sum_{i=1}^{n} \hat{u}_i^2}$$

Here n − 2 adjusts for the number of estimated coefficients (slope and intercept), and the second equality holds because the average residual across the sample, $\bar{\hat{u}} = \frac{1}{n}\sum_{i=1}^{n} \hat{u}_i$, equals zero.
$$SER = \sqrt{\frac{1}{n-2}\sum_{i=1}^{n} \hat{u}_i^2}$$

The SER:

• has the units of u, which are the units of Y

• measures the average "size" of the OLS residual (the average "mistake" made by the OLS regression line)

The root mean squared error (RMSE) is closely related to the SER:

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \hat{u}_i^2}$$
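A small Python sketch (not from the slides) computing the SER and RMSE from a vector of OLS residuals; the residual values below are hypothetical:

```python
# SER and RMSE from OLS residuals (simple regression: 2 estimated coefficients).
import numpy as np

def ser(u_hat):
    """Standard error of the regression: sqrt(SSR / (n - 2))."""
    n = len(u_hat)
    return np.sqrt(np.sum(u_hat ** 2) / (n - 2))

def rmse(u_hat):
    """Root mean squared error: sqrt(SSR / n)."""
    n = len(u_hat)
    return np.sqrt(np.sum(u_hat ** 2) / n)

u_hat = np.array([169.2, -295.1, 40.0, -10.0, 96.0])   # hypothetical residuals
print(ser(u_hat), rmse(u_hat))
```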
SER from previous example:

• Since we already calculated SSR = 190,010.19, and our n = 9,

$$SER = \sqrt{\frac{1}{n-2}\,SSR} = \sqrt{\frac{1}{9-2}(190{,}010.19)} \approx 164.76$$
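Checking the arithmetic in Python with the slide's numbers (SSR = 190,010.19, n = 9):

```python
import math
print(math.sqrt(190_010.19 / (9 - 2)))   # ~164.76
```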
• SLR.1) The relationship between Xi and Yi is linear in parameters.

$Y_i = \beta_0 + \beta_1^2 X_i + u_i$ is not.
• What if this assumption does not hold?

[Figure: scatter plot with E(ui) > 0, comparing the OLS fitted line with a better-fitting line]
Assumption SLR.3

• Another important implication of the zero conditional mean assumption is that E(ui | Xi) = 0 implies that COV(Xi, ui) = 0.
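A short way to see this implication, using the law of iterated expectations (a sketch, not shown on the slide):

```latex
% E(u_i | X_i) = 0 implies Cov(X_i, u_i) = 0
\begin{align*}
E(u_i)     &= E\bigl[E(u_i \mid X_i)\bigr] = 0, \\
E(X_i u_i) &= E\bigl[X_i \, E(u_i \mid X_i)\bigr] = 0, \\
\mathrm{Cov}(X_i, u_i) &= E(X_i u_i) - E(X_i)\,E(u_i) = 0 .
\end{align*}
```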
• The OLS slope estimator

$$\hat{\beta}_1 = \frac{\sum_{i}(Y_i - \bar{Y})(X_i - \bar{X})}{\sum_{i}(X_i - \bar{X})^2}$$

is not defined if $\sum_{i}(X_i - \bar{X})^2 = 0$, that is, if there is no sample variation in Xi.
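A tiny Python illustration (made-up data) of why the denominator matters: if every observation has the same X value, the denominator is zero and the slope estimate cannot be computed:

```python
import numpy as np

X = np.array([5.0, 5.0, 5.0, 5.0])   # every observation has the same X
Y = np.array([1.0, 2.0, 3.0, 4.0])

denom = np.sum((X - X.mean()) ** 2)
print(denom)                          # 0.0 -> beta1-hat is undefined
```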
Sampling Distribution

Each potential set of observations will have associated with it a new "best fit" line, and new estimates of β₀ and β₁.
Sampling Distribution

So, we observe one possible estimate of the true underlying (population) parameter. We don't know what the value of that parameter is, but we do know something about its relationship to the distribution from which our estimate arose...
Properties of OLS Sampling Distribution

• 1. The distribution of β̂₁ is approximately normal in large samples.

• 2. The distribution of β̂₁ is centered about the true value of β₁.

• 3. The variance of the distribution of β̂₁ decreases as the sample size increases.

  – With a smaller variance, the estimates cluster more tightly about the mean.
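These three properties can be seen in a small Monte Carlo sketch (not from the slides); the "true" parameter values and the data-generating process below are assumptions chosen only for illustration:

```python
# Monte Carlo sketch: the sampling distribution of beta1-hat is roughly normal,
# centered at the true beta1, and its spread shrinks as n grows.
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1 = 2.0, 0.5               # assumed "true" population parameters

def draw_beta1_hat(n):
    X = rng.uniform(0, 10, size=n)
    u = rng.normal(0, 1, size=n)      # E(u | X) = 0 by construction
    Y = beta0 + beta1 * X + u
    # OLS slope: sum (Y - Ybar)(X - Xbar) / sum (X - Xbar)^2
    return np.sum((Y - Y.mean()) * (X - X.mean())) / np.sum((X - X.mean()) ** 2)

for n in (25, 100, 400):
    draws = np.array([draw_beta1_hat(n) for _ in range(2000)])
    print(n, draws.mean(), draws.std())
# The mean stays near 0.5 while the standard deviation falls as n increases.
```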
Properties

• 1. It can be shown that the Central Limit Theorem applies to the OLS estimates, and therefore we may assume that when n > 100, β̂₁ is normally distributed.
Properties

• 2. Saying that the distribution of β̂₁ is centered about the true value of β₁ is another way of saying that β̂₁ is an unbiased estimate of β₁.