3-TheSimpleLinearRegressionModelPart2

The document discusses the concept of goodness of fit in the context of the Simple Linear Regression Model, focusing on the sum of squared residuals (SSR), total sum of squares (SST), and explained sum of squares (SSE). It introduces the coefficient of determination (R²) as a measure of how well the model explains the variation in the dependent variable, with values closer to 1 indicating a better fit. Additionally, it covers the standard error of the regression (SER) and the necessary assumptions for ordinary least squares (OLS) estimation to be valid.


The Simple Linear Regression Model (Part 2)
Goodness of Fit

Given that the line represented by the OLS estimates of the slope and intercept is the "optimal choice" in that it minimizes SSR, we still need a way to compare goodness of fit across several models.
Goodness of Fit
• 1. The sum of squared residuals (SSR) represents the degree to which our model missed the data. A lower SSR means a "better" fit.

SSR = Σᵢ (Yi − Ŷi)² = Σᵢ ûi²   (sums run over i = 1, …, N)

• However, the value of SSR depends on the scale of the data, so it does not allow for consistent comparison across equations.
• 2. What represents the degree to which our model succeeds?

– For each observation, we are trying to explain the deviation from the mean of the dependent variable.

– For the sample as a whole, we look at the sum of squared deviations from the mean, which is the Total Sum of Squares, or SST:

SST = Σᵢ (Yi − Ȳ)²

which is proportional to the sample variance of Yi.
• 3. The Explained Sum of Squares (SSE) represents the deviation of the fitted values from the mean:

SSE = Σᵢ (Ŷi − Ȳ)²

• A "perfect fit" would happen when SSE = SST, i.e. each fitted Ŷi equals the observed Yi.

• Note that SST = SSE + SSR for OLS.
• 4. The COEFFICIENT OF DETERMINATION, or R², is the percentage of the total variation (SST) that is explained by the model (SSE).
– The ratio of the explained variation to the total variation
– A measure of goodness of fit
Coefficient of Determination
R² = SSE/SST = Σᵢ (Ŷi − Ȳ)² / Σᵢ (Yi − Ȳ)² = 1 − SSR/SST = 1 − Σᵢ (Yi − Ŷi)² / Σᵢ (Yi − Ȳ)²

(sums run over i = 1, …, N)

• SSE/SST is the fraction of the sample variation in Y that is explained by X.
• The closer the value is to 1, the better the fit.
Venn Diagram of R²

Picture two overlapping circles, "variation in Yi" and "variation in Xi", partitioned into regions A, B, and C, where B is the overlap:

A + B = Σ (Yi − Ȳ)² = SST
B + C = Σ (Xi − X̄)²
B corresponds to Σ (Yi − Ȳ)(Xi − X̄)
A = Σ (Yi − Ŷi)² = SSR
B / (A + B) = R²
B / (B + C) corresponds to β̂1

The greater B (the overlap), the better the fit.
• 5. By definition, R² will be between zero and 1, simply because SST will never be less than SSR, and SSE can be no greater than SST.
– An R² = 1 indicates that all observations lie exactly on the regression line: OLS provides a perfect fit to the data. This essentially never happens, and if you see it, there is something wrong.
Coefficient of Determination
• A value of R² that is nearly equal to zero indicates a poor fit of the OLS line: very little of the variation in Yi is captured by the variation in Ŷi. However, in some instances a value such as R² = 0.07 is acceptable as long as the coefficients make sense.
• In panel and cross-sectional data, R² tends to be lower.
• In time-series data, R² tends to be higher.
Back to Our Example - SSR

  i     Yi     Xi      Ŷi     Yi − Ŷi   (Yi − Ŷi)²
  1   1050   1100    880.8     169.2     28615.19
  2   1900   2550   2195.1    -295.1     87102.18
  3   1560   1700   1424.7     135.3     18310.33
  4   2760   3400   2965.6    -205.6     42262.00
  5   6500   7200   6409.9      90.1      8113.30
  6   5000   5600   4959.7      40.3      1626.19
  7   3400   3900   3418.8     -18.8       352.73
  8   4000   4500   3962.6      37.4      1396.85
  9   1200   1400   1152.8      47.2      2231.42
Mean  3041                     SSR =   190,010.19
Calculate SST

  i     Yi     Xi   Yi − Ȳ    (Yi − Ȳ)²
  1   1050   1100   -1991     3,964,523
  2   1900   2550   -1141     1,302,135
  3   1560   1700   -1481     2,193,690
  4   2760   3400    -281        79,023
  5   6500   7200    3459    11,963,912
  6   5000   5600    1959     3,837,246
  7   3400   3900     359       128,801
  8   4000   4500     959       919,468
  9   1200   1400   -1841     3,389,690
Mean  3041          SST =    27,778,489
Calculate SSE

  i     Yi     Xi     Ŷi    Ŷi − Ȳ     (Ŷi − Ȳ)²
  1   1050   1100    880.6  -2160.3   4,666,772.31
  2   1900   2550   2194     -846.0     715,682.72
  3   1560   1700   1424    -1616.4   2,612,835.57
  4   2760   3400   2964      -75.5       5,705.37
  5   6500   7200   6407     3368.8  11,348,914.56
  6   5000   5600   4958     1918.6   3,680,883.41
  7   3400   3900   3417      377.7     142,634.58
  8   4000   4500   3961      921.5     849,188.95
  9   1200   1400   1152    -1888.3   3,565,862.21
Mean  3041           SSE =  27,588,479.68
Example R²

R² = SSE/SST = Σᵢ (Ŷi − Ȳ)² / Σᵢ (Yi − Ȳ)² = 27,588,479.68 / 27,778,489 = 0.993

Equivalently,

R² = 1 − SSR/SST = 1 − Σᵢ (Yi − Ŷi)² / Σᵢ (Yi − Ȳ)² = 1 − 190,010.19 / 27,778,489 = 0.993

100 · R² is the percentage of the sample variation in Y that is explained by X: variation in X explains 99.3% of the variation in Y.
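The decomposition can be checked numerically. The following sketch (in Python, which the slides do not use; variable names are my own) recomputes SST, SSE, SSR, and R² from the nine observations in the example tables:

```python
# The nine observations from the example (Y = dependent, X = independent).
X = [1100, 2550, 1700, 3400, 7200, 5600, 3900, 4500, 1400]
Y = [1050, 1900, 1560, 2760, 6500, 5000, 3400, 4000, 1200]

n = len(Y)
x_bar = sum(X) / n
y_bar = sum(Y) / n  # about 3041

# OLS slope and intercept via the closed-form expressions.
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / \
     sum((x - x_bar) ** 2 for x in X)
b0 = y_bar - b1 * x_bar
Y_hat = [b0 + b1 * x for x in X]

# The three sums of squares and the identity SST = SSE + SSR.
SST = sum((y - y_bar) ** 2 for y in Y)
SSE = sum((yh - y_bar) ** 2 for yh in Y_hat)
SSR = sum((y - yh) ** 2 for y, yh in zip(Y, Y_hat))
R2 = SSE / SST

print(round(SST), round(R2, 3))  # 27778489 0.993, matching the slides
```

The intermediate sums agree with the tables up to the rounding used on the slides.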
Example R² Graph

[Figure: scatter plot of Y against X with the fitted line (Y-intercept −116) and the sample mean line Ȳ = 3041. At observation 2 (X₂ = 2550, Y₂ = 1900, Ŷ₂ = 2194), the total deviation Y₂ − Ȳ ("SST") splits into the explained part Ŷ₂ − Ȳ ("SSE") and the residual Y₂ − Ŷ₂ ("SSR").]
More on R²
• Another way to think about R² is as a measure of how well your model performs relative to the simplest model, wherein the values of Yi are predicted using only the sample mean and no explanatory variables.
• Note that if you have no explanatory variables, the least squares estimate of b0 will be the mean of Yi:

• Let a be an unknown value. Minimize the sum of squared deviations of Yi from a:

∂/∂a Σᵢ (Yi − a)² = Σᵢ 2(Yi − a)(−1) = 0  ⟹  Σᵢ Yi − Na = 0

⟹  Σᵢ Yi = Na  ⟹  a = (Σᵢ Yi) / N = Ȳ
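This result can be checked numerically. The sketch below (my own, reusing the Y values from the earlier example) confirms that the sample mean gives a smaller sum of squared deviations than any perturbed candidate:

```python
# Claim: a = mean(Y) minimizes ssd(a) = sum((Yi - a)^2).
Y = [1050, 1900, 1560, 2760, 6500, 5000, 3400, 4000, 1200]

def ssd(a, ys):
    """Sum of squared deviations of ys from the candidate value a."""
    return sum((y - a) ** 2 for y in ys)

y_bar = sum(Y) / len(Y)

# The mean beats every perturbed candidate a = y_bar + delta.
for delta in (-500.0, -1.0, 1.0, 500.0):
    assert ssd(y_bar, Y) < ssd(y_bar + delta, Y)

print(round(ssd(y_bar, Y)))  # this minimum is exactly SST: 27778489
```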
Standard Error of the Regression (SER) –
another measure of goodness of fit

The SER measures the spread of the distribution of u. The SER is (almost) the sample standard deviation of the OLS residuals:

SER = √[ (1/(n − 2)) Σᵢ (ûi − ū)² ] = √[ (1/(n − 2)) Σᵢ ûi² ]

where ū is the average residual across the sample. The second equality holds because the average OLS residual is zero: ū = (1/n) Σᵢ ûi = 0. The divisor n − 2 is the sample size less the number of estimated parameters (the slope coefficient and the intercept).
SER = √[ (1/(n − 2)) Σᵢ ûi² ]

The SER:
• has the units of u, which are the units of Y
• measures the average "size" of the OLS residual (the average "mistake" made by the OLS regression line)

The root mean squared error (RMSE) is closely related to the SER:

RMSE = √[ (1/n) Σᵢ ûi² ]

This measures the same thing as the SER – the minor difference is division by n instead of n − 2.
SER from previous example:
• Since we already calculated SSR = 190,010.19, and n = 9, we can find the SER by dividing SSR by n − 2 and taking the square root:

SER = √[ SSR/(n − 2) ] = √[ 190,010.19/(9 − 2) ] ≈ 164.76

This means that the average deviation of the predicted from the actual value of Yi is about $165.
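Plugging the example numbers into the SER and RMSE formulas (a Python sketch; the variable names are my own):

```python
import math

SSR = 190010.19  # sum of squared residuals from the example
n = 9            # number of observations

ser = math.sqrt(SSR / (n - 2))  # divides by n - 2: slope and intercept are estimated
rmse = math.sqrt(SSR / n)       # RMSE divides by n instead

print(round(ser, 2), round(rmse, 2))  # 164.76 145.3
```

As expected, the two measures are close; the gap shrinks as n grows.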
Initial OLS Assumptions
• In order to draw any specific conclusions from our OLS estimates (i.e., run hypothesis tests with known distributions), we must make some assumptions about the mathematical properties of the estimates and the OLS estimator.
Assumptions
• SLR.1.) The relationship between Xi and Yi is linear in parameters.
• SLR.2.) (Xi, Yi) are independent and identically distributed (iid) draws from a joint distribution.
• SLR.3.) The error term ui has a zero mean conditional on Xi.
• SLR.4.) The independent variable Xi varies across observations.
Assumption SLR.1
Linear in Parameters.
• Yi is a linear function of b0 and b1, but not necessarily of Xi.
• Yi = b0 + b1·Xi² + ui or f(Yi) = b0 + b1·g(Xi) + ui are OK, but . . .
• Yi = b0 + b1²·Xi + ui is not.
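To see why linearity in parameters is the operative requirement, here is a sketch (my own hypothetical, noise-free data) that fits Yi = b0 + b1·Xi² + ui by ordinary least squares after substituting the transformed regressor Zi = Xi²:

```python
# Hypothetical data generated exactly as Y = 2 + 3*X^2 (no noise, for clarity).
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2 + 3 * x ** 2 for x in X]

# The model is nonlinear in X but linear in (b0, b1),
# so OLS applies after the substitution Z = X^2.
Z = [x ** 2 for x in X]
z_bar = sum(Z) / len(Z)
y_bar = sum(Y) / len(Y)

b1 = sum((z - z_bar) * (y - y_bar) for z, y in zip(Z, Y)) / \
     sum((z - z_bar) ** 2 for z in Z)
b0 = y_bar - b1 * z_bar

print(b0, b1)  # recovers the true parameters: 2.0 3.0
```

No such substitution can rescue Yi = b0 + b1²·Xi + ui, because that model is nonlinear in the parameter b1 itself.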
Assumption SLR.2
• (Xi, Yi) are i.i.d.
• This essentially means that observations are randomly drawn from a population.
• Think of data from a well-designed random survey.
• If this assumption is violated, we cannot extrapolate sample findings to the overall population.
Assumption SLR.3
E(ui | Xi) = 0.
• To make sense of this, we need to remember that each observation of ui is a single draw from an underlying distribution.
• This assumption simply states that this distribution, associated with each value of Xi, is centered around zero.
• What if this assumption does not hold?

[Figure: scatter plot of Y against X with E(ui) > 0, showing the OLS line and a second line, labeled "Better than OLS", that tracks the data more closely.]
Assumption SLR.3 (continued)
Another important implication of the zero conditional mean assumption:

E(ui | Xi) = 0 implies that COV(Xi, ui) = 0.
29
Assumption SLR.4
• Xi is not constant across observations.
• A mathematical necessity, given our formula for the OLS estimate:

β̂1 = Σ (Yi − Ȳ)(Xi − X̄) / Σ (Xi − X̄)²  is not defined if  Σ (Xi − X̄)² = 0.
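The division-by-zero problem can be made concrete with a small sketch (the function name is my own):

```python
def ols_slope(xs, ys):
    """Closed-form OLS slope; returns None when X has no variation."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:  # SLR.4 violated: the slope formula would divide by zero
        return None
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx

print(ols_slope([1, 2, 3], [2, 4, 6]))  # 2.0
print(ols_slope([5, 5, 5], [2, 4, 6]))  # None: X is constant across observations
```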
Sampling Distribution of the OLS Estimators
• A key concept of estimation is the idea that β̂0 and β̂1 are random variables derived from the sampling distribution of the error term ui.
• We have to imagine the data that we observe (Yi and Xi) to be the result of one of an infinite number of possible outcomes.
Sampling Distribution
Each potential set of observations carries with it a new "best fit" line, and new estimates of β0 and β1.
Sampling Distribution
So, we observe one possible estimate of the true underlying (population) parameter. We don't know what the value of that parameter is, but we do know something about its relationship to the distribution from which our estimate arose.
Properties of the OLS Sampling Distribution
• 1. The distribution of β̂1 is approximately normal in large samples.
• 2. The distribution of β̂1 is centered about the true value of β1.
• 3. The variance of the distribution of β̂1 decreases as the sample size increases.
– The distribution becomes more tightly concentrated about its mean.
Properties
• 1. It can be shown that the Central Limit Theorem applies to the OLS estimates, and therefore we may assume that when n > 100, β̂1 is approximately normally distributed.
– Therefore, the distribution will be symmetric about its mean (see next slide), with a known probability density function.
Properties
• 2. Saying that the distribution of β̂1 is centered about the true value of β1 is another way of saying that β̂1 is an unbiased estimate of β1:

Mean of β̂1 = E(β̂1) = β1
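Unbiasedness can be illustrated with a small Monte Carlo sketch (my own setup: true β0 = 2, β1 = 0.5, standard normal errors). Averaging the slope estimates across many simulated samples recovers the true β1:

```python
import random

random.seed(0)
TRUE_B0, TRUE_B1 = 2.0, 0.5
X = list(range(50))                      # fixed regressor values
x_bar = sum(X) / len(X)
sxx = sum((x - x_bar) ** 2 for x in X)

estimates = []
for _ in range(2000):                    # 2000 simulated samples
    # Each sample draws fresh errors ui, producing new Yi and a new slope estimate.
    Y = [TRUE_B0 + TRUE_B1 * x + random.gauss(0, 1) for x in X]
    y_bar = sum(Y) / len(Y)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / sxx
    estimates.append(b1)

mean_b1 = sum(estimates) / len(estimates)
print(round(mean_b1, 2))  # close to the true slope, 0.5
```

Increasing the per-sample size n (or the number of replications) tightens the spread of the estimates around β1, matching properties 2 and 3 above.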
