
ECON 7310 Elements of Econometrics

Lecture 2: Linear Regression with One Regressor

1 / 28
Outline:

▶ The population linear regression model (LRM)
▶ The ordinary least squares (OLS) estimator and the sample regression line
▶ Measures of fit of the sample regression
▶ The least squares assumptions
▶ The sampling distribution of the OLS estimator

2 / 28
Linear Regression

▶ Linear regression lets us estimate the slope of the population regression line.
▶ The slope of the population regression line is the expected effect on Y of a unit change in X.
▶ Ultimately our aim is to estimate the causal effect on Y of a unit change in X – but for now, just think of the problem of fitting a straight line to data on two variables, Y and X.

3 / 28
Linear Regression

▶ The problem of statistical inference for linear regression is, at a general level, the same as for estimation of the mean or of the difference between two means.
▶ Statistical, or econometric, inference about the slope entails:
▶ Estimation: How should we draw a line through the data to estimate the population slope? Answer: ordinary least squares (OLS). What are the advantages and disadvantages of OLS?
▶ Hypothesis testing: How do we test whether the slope is zero?
▶ Confidence intervals: How do we construct a confidence interval for the slope?

4 / 28
The Linear Regression Model (SW Section 4.1)

▶ The population regression line:

Test Score = β0 + β1 STR


▶ β1 = slope of population regression line
= change in test score for a unit change in student-teacher ratio (STR)
▶ Why are β0 and β1 “population” parameters?
▶ We would like to know the population value of β1 .
▶ We don’t know β1 , so must estimate it using data.

5 / 28
The Population Linear Regression Model

Consider
Yi = β0 + β1 Xi + ui
for i = 1, . . . , n
▶ We have n observations, (Xi, Yi), i = 1, . . . , n.
▶ X is the independent variable or regressor or right-hand-side variable
▶ Y is the dependent variable or left-hand-side variable
▶ β0 = intercept
▶ β1 = slope
▶ ui = the regression error
▶ The regression error consists of omitted factors. In general, these
omitted factors are other factors that influence Y , other than the variable
X . The regression error also includes error in the measurement of Y .
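
A minimal R sketch of this setup on simulated data (the parameter values, sample size, and error distribution below are purely illustrative assumptions, not estimates from any real data set):

  # Simulate n observations from the population linear regression model Y = b0 + b1*X + u
  set.seed(7310)                    # arbitrary seed for reproducibility
  n  <- 100
  b0 <- 2;  b1 <- 0.5               # hypothetical population intercept and slope
  X  <- runif(n, 0, 10)             # regressor
  u  <- rnorm(n, mean = 0, sd = 1)  # regression error (stands in for omitted factors)
  Y  <- b0 + b1 * X + u             # dependent variable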

6 / 28
The population regression model in a picture

▶ [Figure: observations on Y and X (n = 7), the population regression line, and the regression error (the “error term”).]
7 / 28
The Ordinary Least Squares Estimator (SW Section 4.2)

▶ How can we estimate β0 and β1 from data? Recall that the least squares estimator of µY solves

    min_m  Σ_{i=1}^n (Yi − m)²

▶ By analogy, we will focus on the least squares (“ordinary least squares” or “OLS”) estimator of the unknown parameters β0 and β1. The OLS estimator solves

    min_{b0, b1}  Σ_{i=1}^n [Yi − (b0 + b1 Xi)]²

▶ In fact, we estimate the conditional expectation function E[Y|X] under the assumption that E[Y|X] = β0 + β1 X.
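
As a quick numerical check of the first minimization problem (a sketch on simulated data, not the textbook data), the minimizer over m is the sample mean:

  # The sample mean solves min_m sum_i (Yi - m)^2
  set.seed(1)
  Y <- rnorm(50, mean = 10, sd = 3)
  sse <- function(m) sum((Y - m)^2)            # objective function
  optimise(sse, interval = range(Y))$minimum   # numerical minimizer ...
  mean(Y)                                      # ... agrees with the sample mean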

8 / 28
Mechanics of OLS

▶ The population regression line:

Test Score = β0 + β1 STR

9 / 28
Mechanics of OLS

▶ The OLS estimator minimizes the average squared difference between the actual values of Yi and the prediction (“predicted value”) based on the estimated line.
▶ This minimization problem can be solved using calculus (Appendix 4.2).
▶ The result is the OLS estimators of β0 and β1.

10 / 28
OLS estimator, predicted values, and residuals

▶ The OLS estimators are

    β̂1 = Σ_{i=1}^n (Xi − X̄)(Yi − Ȳ) / Σ_{i=1}^n (Xi − X̄)²

    β̂0 = Ȳ − β̂1 X̄

▶ These are estimates of the unknown population parameters β0 and β1.
▶ The OLS predicted (fitted) values Ŷi and residuals ûi are

    Ŷi = β̂0 + β̂1 Xi
    ûi = Yi − Ŷi

▶ The estimated intercept β̂0, slope β̂1, and residuals ûi are computed from a sample of n observations (Xi, Yi), i = 1, . . . , n.
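
A sketch of these formulas in R on simulated data (the data-generating process below is a made-up example); the hand-computed estimates match coef(lm(...)):

  set.seed(2)
  X <- runif(200, 10, 30)
  Y <- 700 - 2 * X + rnorm(200, sd = 15)   # illustrative data-generating process
  beta1_hat <- sum((X - mean(X)) * (Y - mean(Y))) / sum((X - mean(X))^2)
  beta0_hat <- mean(Y) - beta1_hat * mean(X)
  c(beta0_hat, beta1_hat)
  coef(lm(Y ~ X))                          # same two numbers
  Y_hat <- beta0_hat + beta1_hat * X       # predicted (fitted) values
  u_hat <- Y - Y_hat                       # residuals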

11 / 28
Predicted values & residuals

▶ One of the districts in the data set is Antelope, CA, for which
STR = 19.33 and TestScore = 657.8

predicted value: 698.9 − 2.28 × 19.33 = 654.8

residual: 657.8 − 654.8 = 3.0
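
The same arithmetic in R, plugging Antelope's STR into the estimated line:

  y_hat_antelope <- 698.9 - 2.28 * 19.33   # predicted value, about 654.8
  657.8 - y_hat_antelope                   # residual, about 3.0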

12 / 28
OLS regression: R output

TestScore = 698.93 − 2.28 × STR


We will discuss the rest of this output later.
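
The R output itself is not reproduced here. A minimal sketch of how it can be generated, assuming the California school district data are available as the CASchools data set in the AER package (the variable construction below follows SW):

  # install.packages("AER")   # if needed
  library(AER)
  data("CASchools")
  CASchools$STR       <- CASchools$students / CASchools$teachers   # student-teacher ratio
  CASchools$TestScore <- (CASchools$read + CASchools$math) / 2     # average test score
  fit <- lm(TestScore ~ STR, data = CASchools)
  summary(fit)   # intercept about 698.93, slope about -2.28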

13 / 28
Measures of Fit (SW Section 4.3)

▶ Two regression statistics provide complementary measures of how well the regression line “fits” or explains the data:
▶ The regression R² measures the fraction of the variance of Y that is explained by X; it is unit free and ranges between zero (no fit) and one (perfect fit)
▶ The standard error of the regression (SER) measures the magnitude of a typical regression residual in the units of Y.

14 / 28
Regression R²

▶ The sample variance of Yi is (1/n) Σ_{i=1}^n (Yi − Ȳ)².
▶ The sample variance of Ŷi is (1/n) Σ_{i=1}^n (Ŷi − Ȳ)², where in fact the sample mean of Ŷi equals Ȳ.
▶ R² is simply the ratio of those two sample variances.
▶ Formally, we define R² as follows (two equivalent definitions):

    R² := Explained Sum of Squares (ESS) / Total Sum of Squares (TSS) = Σ_{i=1}^n (Ŷi − Ȳ)² / Σ_{i=1}^n (Yi − Ȳ)²

    R² := 1 − Residual Sum of Squares (RSS) / Total Sum of Squares (TSS) = 1 − Σ_{i=1}^n ûi² / Σ_{i=1}^n (Yi − Ȳ)²

▶ R² = 0 ⇐⇒ ESS = 0 and R² = 1 ⇐⇒ ESS = TSS. Also, 0 ≤ R² ≤ 1.
▶ For regression with a single X, R² = the square of the sample correlation coefficient between X and Y.
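
A sketch of both definitions in R on simulated data (illustrative values only); they agree with each other, with summary(fit)$r.squared, and, for a single regressor, with cor(X, Y)²:

  set.seed(3)
  X <- rnorm(150);  Y <- 1 + 2 * X + rnorm(150, sd = 3)   # illustrative data
  fit   <- lm(Y ~ X)
  Y_hat <- fitted(fit);  u_hat <- resid(fit)
  ESS <- sum((Y_hat - mean(Y))^2)
  RSS <- sum(u_hat^2)
  TSS <- sum((Y - mean(Y))^2)
  c(ESS / TSS, 1 - RSS / TSS, summary(fit)$r.squared, cor(X, Y)^2)   # all equal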

15 / 28
The Standard Error of the Regression (SER)

▶ The SER measures the spread of the distribution of u. The SER is (almost) the sample standard deviation of the OLS residuals:

    SER := √[ (1/(n − 2)) Σ_{i=1}^n ûi² ]

▶ The SER:
▶ has the units of ui, which are the units of Yi
▶ measures the average “size” of the OLS residual (the average “mistake” made by the OLS regression line)
▶ The root mean squared error (RMSE) is closely related to the SER:

    RMSE := √[ (1/n) Σ_{i=1}^n ûi² ]

▶ When n is large, SER ≈ RMSE.¹

¹ Here, n − 2 is the degrees of freedom – we need to subtract 2 because there are two parameters to estimate. For details, see SW Section 18.4.
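
A sketch computing the SER and RMSE from the OLS residuals in R (simulated, illustrative data); the SER coincides with summary(fit)$sigma:

  set.seed(4)
  X <- runif(120);  Y <- 5 + 3 * X + rnorm(120, sd = 2)   # illustrative data
  fit   <- lm(Y ~ X)
  u_hat <- resid(fit)
  n     <- length(u_hat)
  SER  <- sqrt(sum(u_hat^2) / (n - 2))
  RMSE <- sqrt(sum(u_hat^2) / n)
  c(SER, summary(fit)$sigma, RMSE)   # SER equals sigma; RMSE is close when n is large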
16 / 28
Example of the R 2 and the SER

▶ TestScore = 698.9 − 2.28 × STR, R² = 0.05, SER = 18.6


▶ STR explains only a small fraction of the variation in test scores.
▶ Does this make sense?
▶ Does this mean the STR is unimportant in a policy sense?

17 / 28
Least Squares Assumptions (SW Section 4.4)

▶ What, in a precise sense, are the properties of the sampling distribution of the OLS estimator? When will it be unbiased? What is its variance?
▶ To answer these questions, we need to make some assumptions about how Y and X are related to each other, and about how they are collected (the sampling scheme).
▶ These assumptions – there are three – are known as the Least Squares Assumptions.

18 / 28
Least Squares Assumptions (SW Section 4.4)

Yi = β0 + β1 Xi + ui , i = 1, . . . , n

1. The conditional distribution of u given X has mean zero, that is, E(u|X = x) = 0.
▶ This implies that OLS estimators are unbiased
2. (Xi, Yi), i = 1, . . . , n, are i.i.d.
▶ This is true if (X, Y) are collected by simple random sampling
▶ This delivers the sampling distribution of β̂0 and β̂1
3. Large outliers in X and/or Y are rare.
▶ Technically, X and Y have finite fourth moments
▶ Outliers can result in meaningless values of β̂1

19 / 28
Least squares assumption #1: E(u|X = x) = 0.

For any given value of X , the mean of u is zero:

Example: TestScorei = β0 + β1 STRi + ui , ui = other factors


▶ What are some of these “other factors”?
▶ Is E(u|X = x) = 0 plausible for these other factors?

20 / 28
Least squares assumption #1: E(u|X = x) = 0 (continued)

▶ A benchmark for thinking about this assumption is to consider an ideal randomized controlled experiment:
▶ X is randomly assigned to people (students randomly assigned to
different size classes; patients randomly assigned to medical
treatments). Randomization is done by computer – using no information
about the individual.
▶ Because X is assigned randomly, all other individual characteristics –
the things that make up u – are distributed independently of X , so u and
X are independent
▶ Thus, in an ideal randomized controlled experiment, E(u|X = x) = 0
(that is, LSA #1 holds)
▶ In actual experiments, or with observational data, we will need to think
hard about whether E(u|X = x) = 0 holds.

21 / 28
Least squares assumption #2: (Xi , Yi ), i = 1, · · · , n are i.i.d.

▶ This arises automatically if the entity (individual, district) is sampled by simple random sampling:
▶ The entities are selected from the same population, so (Xi , Yi ) are
identically distributed for all i = 1, . . . , n.
▶ The entities are selected at random, so the values of (X , Y ) for different
entities are independently distributed.
▶ The main place we will encounter non-i.i.d. sampling is when data are
recorded over time for the same entity (panel data and time series data)
– we will deal with that complication when we cover panel data.

22 / 28
Least squares assumption #3: Large outliers are rare
Technical statement: E(X⁴) < ∞ and E(Y⁴) < ∞

▶ A large outlier is an extreme value of X or Y.
▶ On a technical level, if X and Y are bounded, then they have finite fourth moments. (Standardized test scores automatically satisfy this; STR, family income, etc. satisfy this too.)
▶ The substance of this assumption is that a large outlier can strongly
influence the results – so we need to rule out large outliers.
▶ Look at your data! If you have a large outlier, is it a typo? Does it belong
in your data set? Why is it an outlier?

23 / 28
OLS can be sensitive to an outlier

▶ Is the lone point an outlier in X or Y?
▶ In practice, outliers are often data glitches (coding or recording problems). Sometimes they are observations that really shouldn’t be in your data set. Plot your data before running regressions!
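
A small illustration of this sensitivity on simulated data (the outlier below is artificial, added only for the example): a single extreme point noticeably changes the OLS slope.

  set.seed(5)
  X <- rnorm(50);  Y <- 1 + 2 * X + rnorm(50)
  coef(lm(Y ~ X))["X"]                     # slope close to the true value 2
  X_out <- c(X, 10);  Y_out <- c(Y, -30)   # append one artificial large outlier
  coef(lm(Y_out ~ X_out))["X_out"]         # slope pulled far away from 2
  # plot(X_out, Y_out)                     # always look at the data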

24 / 28
The Sampling Distribution of the OLS Estimator (SW Section 4.5)

The OLS estimator is computed from a sample of data. A different sample yields a different value of β̂1. This is the source of the “sampling uncertainty” of β̂1. We want to:
▶ quantify the sampling uncertainty associated with β̂1
▶ use β̂1 to test hypotheses such as β1 = 0
▶ construct a confidence interval for β1
▶ All these require figuring out the sampling distribution of the OLS estimator.

25 / 28
Sampling Distribution of β̂1

▶ We can show that β̂1 is unbiased, i.e., E[β̂1] = β1. Similarly for β̂0.
▶ We do not derive V(β̂1), as it requires some tedious algebra; moreover, we do not need to memorize its formula. Here, we just emphasize two aspects of V(β̂1).
▶ First, V(β̂1) is inversely proportional to n, just like the variance of the sample mean, V(Ȳn). Combined with E[β̂1] = β1, this suggests that β̂1 converges in probability to β1, i.e., β̂1 is consistent. That is, as the sample size grows, β̂1 gets closer to β1.
▶ Second, V(β̂1) is inversely proportional to the variance of X; see the graphs and the simulation sketch below.
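
A Monte Carlo sketch of these two points (all numbers below are illustrative assumptions): the spread of β̂1 across repeated samples falls as n grows and as the variance of X grows.

  set.seed(6)
  # Standard deviation of the slope estimate across 2000 simulated samples
  sd_beta1 <- function(n, x_sd) {
    sd(replicate(2000, {
      X <- rnorm(n, sd = x_sd)
      Y <- 1 + 2 * X + rnorm(n, sd = 3)
      coef(lm(Y ~ X))["X"]
    }))
  }
  sd_beta1(n = 50,  x_sd = 1)   # baseline
  sd_beta1(n = 200, x_sd = 1)   # larger n          -> smaller spread
  sd_beta1(n = 50,  x_sd = 3)   # more X variation  -> smaller spread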

26 / 28
Sampling Distribution of β̂1

[Figure: two panels – low X variation ⇒ low precision; high X variation ⇒ high precision.]

▶ Intuitively, if there is more variation in X, then there is more information in the data that you can use to fit the regression line.

27 / 28
Sampling Distribution of β̂1

▶ The exact sampling distribution is complicated – it depends on the population distribution of (Y, X) – but when n is large we get some simple (and good) approximations:
▶ Let SE(β̂1) be the standard error (SE) of β̂1, i.e., a consistent estimator of the standard deviation of β̂1, which is √V(β̂1).
▶ Then, it turns out that

    (β̂1 − β1) / SE(β̂1)  ∼  N(0, 1)   (approximately)

▶ Using this approximate distribution, we can conduct statistical inference about β1, i.e., hypothesis testing and confidence intervals ⇒ Ch. 5.
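
A sketch of this approximation via simulation (all values below are illustrative assumptions): across repeated samples, the standardized slope estimate behaves roughly like a standard normal draw, with SE(β̂1) read off summary(fit)$coefficients.

  set.seed(7)
  beta1 <- 2                                  # assumed true slope
  tstats <- replicate(2000, {
    X <- runif(200, 10, 30)
    Y <- 5 + beta1 * X + rnorm(200, sd = 4)
    est <- summary(lm(Y ~ X))$coefficients    # columns: Estimate, Std. Error, ...
    (est["X", "Estimate"] - beta1) / est["X", "Std. Error"]
  })
  c(mean(tstats), sd(tstats))                 # approximately 0 and 1
  # hist(tstats, freq = FALSE); curve(dnorm(x), add = TRUE)   # visual check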

28 / 28
