3-TheSimpleLinearRegressionModelPart2

The document discusses the concept of goodness of fit in the context of the Simple Linear Regression Model, focusing on the sum of squared residuals (SSR), total sum of squares (SST), and explained sum of squares (SSE). It introduces the coefficient of determination (R²) as a measure of how well the model explains the variation in the dependent variable, with values closer to 1 indicating a better fit. Additionally, it covers the standard error of the regression (SER) and the necessary assumptions for ordinary least squares (OLS) estimation to be valid.


The Simple Linear Regression Model (Part 2)
Goodness of Fit

Given that the line represented by the OLS estimates of the slope and intercept is the "optimal choice" in that it minimizes SSR, we still need a way to compare goodness of fit across several models.
Goodness of Fit
• 1. The sum of squared residuals (SSR) represents the degree to which our model missed the data. A lower SSR means a "better" fit.

SSR = Σᵢ (Yi − Ŷi)² = Σᵢ ûi²   (sums run over i = 1, …, N)

• However, the value of SSR depends on the scale of the data, so it does not allow for consistent comparison across equations.
• 2. What represents the degree to which our model succeeds?

– For each observation, we are trying to explain the deviation from the mean of the dependent variable.

– For the sample as a whole, we look at the sum of squared deviations from the mean, which is the Total Sum of Squares, or SST:

SST = Σᵢ (Yi − Ȳ)²

which is proportional to the sample variance of Yi.
• 3. The Explained Sum of Squares (SSE) represents the deviation of the fitted values from the mean:

SSE = Σᵢ (Ŷi − Ȳ)²

• A "perfect fit" would happen when SSE = SST, i.e. each fitted Ŷi equals the observed Yi.

• Note that SST = SSE + SSR for OLS.
• 4. The COEFFICIENT OF DETERMINATION, or R², is the percentage of the total variation (SST) that is explained by the model (SSE).
– The ratio of the explained variation to the total variation
– A measure of goodness of fit
Coefficient of Determination
R² = SSE/SST = Σᵢ (Ŷi − Ȳ)² / Σᵢ (Yi − Ȳ)² = 1 − SSR/SST = 1 − Σᵢ (Yi − Ŷi)² / Σᵢ (Yi − Ȳ)²

(sums run over i = 1, …, N)

• SSE/SST is the fraction of the sample variation in Y that is explained by X.
• The closer the value is to 1, the better the fit.
Venn Diagram of R²

Picture two overlapping circles, "variation in Yi" and "variation in Xi", partitioned into regions A, B, and C, where B is the overlap:

A + B = Σ (Yi − Ȳ)² = SST
B + C = Σ (Xi − X̄)²
B corresponds to Σ (Yi − Ȳ)(Xi − X̄)
A = Σ (Yi − Ŷi)² = SSR
B / (A + B) = R²
B / (B + C) corresponds to β̂1

The greater B (the overlap), the better the fit.
• 5. By definition, R² will be between zero and 1, simply because SST will never be less than SSR, and SSE can be no greater than SST.
– An R² = 1 indicates that all observations lie exactly on the regression line: OLS provides a perfect fit to the data. This essentially never happens, and if you see it, there is something wrong.
Coefficient of Determination
• A value of R² that is nearly equal to zero indicates a poor fit of the OLS line: very little of the variation in Yi is captured by the variation in Ŷi. However, in some instances a value such as R² = 0.07 is acceptable as long as the coefficients make sense.
• In panel and cross-sectional data, R² tends to be lower.
• In time-series data, R² tends to be higher.
Back to Our Example - SSR

  i     Yi     Xi      Ŷi     Yi − Ŷi   (Yi − Ŷi)²
  1   1050   1100    880.8     169.2     28615.19
  2   1900   2550   2195.1    -295.1     87102.18
  3   1560   1700   1424.7     135.3     18310.33
  4   2760   3400   2965.6    -205.6     42262.00
  5   6500   7200   6409.9      90.1      8113.30
  6   5000   5600   4959.7      40.3      1626.19
  7   3400   3900   3418.8     -18.8       352.73
  8   4000   4500   3962.6      37.4      1396.85
  9   1200   1400   1152.8      47.2      2231.42
Mean  3041                     SSR =   190,010.19
Calculate SST

  i     Yi     Xi   Yi − Ȳ    (Yi − Ȳ)²
  1   1050   1100   -1991     3,964,523
  2   1900   2550   -1141     1,302,135
  3   1560   1700   -1481     2,193,690
  4   2760   3400    -281        79,023
  5   6500   7200    3459    11,963,912
  6   5000   5600    1959     3,837,246
  7   3400   3900     359       128,801
  8   4000   4500     959       919,468
  9   1200   1400   -1841     3,389,690
Mean  3041          SST =    27,778,489
Calculate SSE

  i     Yi     Xi     Ŷi    Ŷi − Ȳ     (Ŷi − Ȳ)²
  1   1050   1100    880.6  -2160.3   4,666,772.31
  2   1900   2550   2194     -846.0     715,682.72
  3   1560   1700   1424    -1616.4   2,612,835.57
  4   2760   3400   2964      -75.5       5,705.37
  5   6500   7200   6407     3368.8  11,348,914.56
  6   5000   5600   4958     1918.6   3,680,883.41
  7   3400   3900   3417      377.7     142,634.58
  8   4000   4500   3961      921.5     849,188.95
  9   1200   1400   1152    -1888.3   3,565,862.21
Mean  3041           SSE =  27,588,479.68
Example R²

R² = SSE/SST = Σᵢ (Ŷi − Ȳ)² / Σᵢ (Yi − Ȳ)² = 27,588,479.68 / 27,778,489 = 0.993

Equivalently,

R² = 1 − SSR/SST = 1 − Σᵢ (Yi − Ŷi)² / Σᵢ (Yi − Ȳ)² = 1 − 190,010.19 / 27,778,489 = 0.993

100 · R² is the percentage of the sample variation in Y that is explained by X: variation in X explains 99.3% of the variation in Y.
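The decomposition can be checked numerically. The following sketch (in Python, which the slides do not use; variable names are my own) recomputes SST, SSE, SSR, and R² from the nine observations in the example tables:

```python
# The nine observations from the example (Y = dependent, X = independent).
X = [1100, 2550, 1700, 3400, 7200, 5600, 3900, 4500, 1400]
Y = [1050, 1900, 1560, 2760, 6500, 5000, 3400, 4000, 1200]

n = len(Y)
x_bar = sum(X) / n
y_bar = sum(Y) / n  # about 3041

# OLS slope and intercept via the closed-form expressions.
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / \
     sum((x - x_bar) ** 2 for x in X)
b0 = y_bar - b1 * x_bar
Y_hat = [b0 + b1 * x for x in X]

# The three sums of squares and the identity SST = SSE + SSR.
SST = sum((y - y_bar) ** 2 for y in Y)
SSE = sum((yh - y_bar) ** 2 for yh in Y_hat)
SSR = sum((y - yh) ** 2 for y, yh in zip(Y, Y_hat))
R2 = SSE / SST

print(round(SST), round(R2, 3))  # 27778489 0.993, matching the slides
```

The intermediate sums agree with the tables up to the rounding used on the slides.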
Example R² Graph

[Figure: scatter plot of Y against X with the fitted line (Y-intercept −116) and the sample mean line Ȳ = 3041. At observation 2 (X₂ = 2550, Y₂ = 1900, Ŷ₂ = 2194), the total deviation Y₂ − Ȳ ("SST") splits into the explained part Ŷ₂ − Ȳ ("SSE") and the residual Y₂ − Ŷ₂ ("SSR").]
More on R²
• Another way to think about R² is as a measure of how well your model performs relative to the simplest model, wherein the values of Yi are predicted using only the sample mean and no explanatory variables.
• Note that if you have no explanatory variables, the least squares estimate of b0 will be the mean of Yi:

• Let a be an unknown value. Minimize the sum of squared deviations of Yi from a:

∂/∂a Σᵢ (Yi − a)² = Σᵢ 2(Yi − a)(−1) = 0  ⟹  Σᵢ Yi − Na = 0

⟹  Σᵢ Yi = Na  ⟹  a = (Σᵢ Yi) / N = Ȳ
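This result can be checked numerically. The sketch below (my own, reusing the Y values from the earlier example) confirms that the sample mean gives a smaller sum of squared deviations than any perturbed candidate:

```python
# Claim: a = mean(Y) minimizes ssd(a) = sum((Yi - a)^2).
Y = [1050, 1900, 1560, 2760, 6500, 5000, 3400, 4000, 1200]

def ssd(a, ys):
    """Sum of squared deviations of ys from the candidate value a."""
    return sum((y - a) ** 2 for y in ys)

y_bar = sum(Y) / len(Y)

# The mean beats every perturbed candidate a = y_bar + delta.
for delta in (-500.0, -1.0, 1.0, 500.0):
    assert ssd(y_bar, Y) < ssd(y_bar + delta, Y)

print(round(ssd(y_bar, Y)))  # this minimum is exactly SST: 27778489
```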
Standard Error of the Regression (SER) –
another measure of goodness of fit

The SER measures the spread of the distribution of u. The SER is (almost) the sample standard deviation of the OLS residuals:

SER = √[ (1/(n − 2)) Σᵢ (ûi − ū)² ] = √[ (1/(n − 2)) Σᵢ ûi² ]

where ū is the average residual across the sample. The second equality holds because the average OLS residual is zero: ū = (1/n) Σᵢ ûi = 0. The divisor n − 2 is the sample size less the number of estimated parameters (the slope coefficient and the intercept).
SER = √[ (1/(n − 2)) Σᵢ ûi² ]

The SER:
• has the units of u, which are the units of Y
• measures the average "size" of the OLS residual (the average "mistake" made by the OLS regression line)

The root mean squared error (RMSE) is closely related to the SER:

RMSE = √[ (1/n) Σᵢ ûi² ]

This measures the same thing as the SER – the minor difference is division by n instead of n − 2.
SER from previous example:
• Since we already calculated SSR = 190,010.19, and n = 9, we can find the SER by dividing SSR by n − 2 and taking the square root:

SER = √[ SSR/(n − 2) ] = √[ 190,010.19/(9 − 2) ] ≈ 164.76

This means that the average deviation of the predicted from the actual value of Yi is about $165.
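Plugging the example numbers into the SER and RMSE formulas (a Python sketch; the variable names are my own):

```python
import math

SSR = 190010.19  # sum of squared residuals from the example
n = 9            # number of observations

ser = math.sqrt(SSR / (n - 2))  # divides by n - 2: slope and intercept are estimated
rmse = math.sqrt(SSR / n)       # RMSE divides by n instead

print(round(ser, 2), round(rmse, 2))  # 164.76 145.3
```

As expected, the two measures are close; the gap shrinks as n grows.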
Initial OLS Assumptions
• In order to draw any specific conclusions from our OLS estimates (i.e., run hypothesis tests with known distributions), we must make some assumptions about the mathematical properties of the estimates and the OLS estimator.
Assumptions
• SLR.1.) The relationship between Xi and Yi is linear in parameters.
• SLR.2.) (Xi, Yi) are independent and identically distributed (iid) draws from a joint distribution.
• SLR.3.) The error term ui has a zero mean conditional on Xi.
• SLR.4.) The independent variable Xi varies across observations.
Assumption SLR.1
Linear in Parameters.
• Yi is a linear function of b0 and b1, but not necessarily of Xi.
• Yi = b0 + b1·Xi² + ui or f(Yi) = b0 + b1·g(Xi) + ui are OK, but . . .
• Yi = b0 + b1²·Xi + ui is not.
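To see why linearity in parameters is the operative requirement, here is a sketch (my own hypothetical, noise-free data) that fits Yi = b0 + b1·Xi² + ui by ordinary least squares after substituting the transformed regressor Zi = Xi²:

```python
# Hypothetical data generated exactly as Y = 2 + 3*X^2 (no noise, for clarity).
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2 + 3 * x ** 2 for x in X]

# The model is nonlinear in X but linear in (b0, b1),
# so OLS applies after the substitution Z = X^2.
Z = [x ** 2 for x in X]
z_bar = sum(Z) / len(Z)
y_bar = sum(Y) / len(Y)

b1 = sum((z - z_bar) * (y - y_bar) for z, y in zip(Z, Y)) / \
     sum((z - z_bar) ** 2 for z in Z)
b0 = y_bar - b1 * z_bar

print(b0, b1)  # recovers the true parameters: 2.0 3.0
```

No such substitution can rescue Yi = b0 + b1²·Xi + ui, because that model is nonlinear in the parameter b1 itself.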
Assumption SLR.2
• (Xi, Yi) are i.i.d.
• This essentially means that observations are randomly drawn from a population.
• Think of data from a well-designed random survey.
• If this assumption is violated, we cannot extrapolate sample findings to the overall population.
Assumption SLR.3
E(ui | Xi) = 0.
• To make sense of this, we need to remember that each observation of ui is a single draw from an underlying distribution.
• This assumption simply states that this distribution, associated with each value of Xi, is centered around zero.
• What if this assumption does not hold?

[Figure: scatter plot of Y against X with E(ui) > 0, showing the OLS line and a second line, labeled "Better than OLS", that tracks the data more closely.]
Assumption SLR.3 (continued)
Another important implication of the zero conditional mean assumption:

E(ui | Xi) = 0 implies that COV(Xi, ui) = 0.
29
Assumption SLR.4
• Xi is not constant across observations.
• A mathematical necessity, given our formula for the OLS estimate:

β̂1 = Σ (Yi − Ȳ)(Xi − X̄) / Σ (Xi − X̄)²  is not defined if  Σ (Xi − X̄)² = 0.
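The division-by-zero problem can be made concrete with a small sketch (the function name is my own):

```python
def ols_slope(xs, ys):
    """Closed-form OLS slope; returns None when X has no variation."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:  # SLR.4 violated: the slope formula would divide by zero
        return None
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx

print(ols_slope([1, 2, 3], [2, 4, 6]))  # 2.0
print(ols_slope([5, 5, 5], [2, 4, 6]))  # None: X is constant across observations
```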
Sampling Distribution of the OLS Estimators
• A key concept of estimation is the idea that β̂0 and β̂1 are random variables derived from the sampling distribution of the error term ui.
• We have to imagine the data that we observe (Yi and Xi) to be the result of one of an infinite number of possible outcomes.
Sampling Distribution
Each potential set of observations carries with it a new "best fit" line, and new estimates of β0 and β1.
Sampling Distribution
So, we observe one possible estimate of the true underlying (population) parameter. We don't know what the value of that parameter is, but we do know something about its relationship to the distribution from which our estimate arose.
Properties of the OLS Sampling Distribution
• 1. The distribution of β̂1 is approximately normal in large samples.
• 2. The distribution of β̂1 is centered about the true value of β1.
• 3. The variance of the distribution of β̂1 decreases as the sample size increases.
– The distribution becomes more tightly concentrated about its mean.
Properties
• 1. It can be shown that the Central Limit Theorem applies to the OLS estimates, and therefore we may assume that when n > 100, β̂1 is approximately normally distributed.
– Therefore, the distribution will be symmetric about its mean (see next slide), with a known probability density function.
Properties
• 2. Saying that the distribution of β̂1 is centered about the true value of β1 is another way of saying that β̂1 is an unbiased estimate of β1:

Mean of β̂1 = E(β̂1) = β1
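Unbiasedness can be illustrated with a small Monte Carlo sketch (my own setup: true β0 = 2, β1 = 0.5, standard normal errors). Averaging the slope estimates across many simulated samples recovers the true β1:

```python
import random

random.seed(0)
TRUE_B0, TRUE_B1 = 2.0, 0.5
X = list(range(50))                      # fixed regressor values
x_bar = sum(X) / len(X)
sxx = sum((x - x_bar) ** 2 for x in X)

estimates = []
for _ in range(2000):                    # 2000 simulated samples
    # Each sample draws fresh errors ui, producing new Yi and a new slope estimate.
    Y = [TRUE_B0 + TRUE_B1 * x + random.gauss(0, 1) for x in X]
    y_bar = sum(Y) / len(Y)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / sxx
    estimates.append(b1)

mean_b1 = sum(estimates) / len(estimates)
print(round(mean_b1, 2))  # close to the true slope, 0.5
```

Increasing the per-sample size n (or the number of replications) tightens the spread of the estimates around β1, matching properties 2 and 3 above.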
