
Econometrics I

Chapter 3
Linear Regression with One Regressor

Prof. Miguel Ángel Borrella Mas

School of Economics and Business Administration


Universidad de Navarra

Academic year 2022-23


Outline

1 Introduction

2 The (simple) Linear Regression Model

3 The Ordinary Least Squares estimator

4 Measures of fit

5 The Least Squares Assumptions

6 Sampling distribution of the OLS Estimators




Learning objectives

• Ask a question [simple, with one independent variable]: we want to study the causal effect of "A" on "B"

• Set up a simple linear model to answer this question

• Answer the question using data and a statistical package (Stata)



Example
Question from previous chapter: Does class-size affect student
performance?

• What are our priors? −→ Smaller class sizes are better for
  learning outcomes (?)
• We are interested in

      β1 = Change in Test Score / Change in Class Size = △Test Score / △Class Size

• In words: β1 measures the change in Test Score due to a unit
  change in Class Size
• Mathematically:
  • β1 = slope of a straight line relating test scores and class size:

      Test Score = β0 + β1 · Class Size

  • β0 = intercept of the straight line
Example (2)

• But: The average test score in district i does not only depend
on the average class size
• It also depends on other factors such as:
• Quality of the teachers
• Student background
• Quality of text books
• ...

• The equation describing the linear relation between Test Score
  and Class Size is better written as:

      Test Scorei = β0 + β1 Class Sizei + ui

  where ui lumps together all other district characteristics that
  affect average test scores



Statistical inference for linear regression

Statistical (or econometric) inference about the slope entails:


1 Estimation:
• How should we draw a line through the data to estimate the
population slope? −→ Ordinary Least Squares (OLS!)
• What are the advantages and disadvantages of OLS?
2 Hypothesis testing:
• How to test whether the slope is zero?
3 Confidence intervals:
• How to construct a confidence interval for the slope?



Outline

1 Introduction

2 The (simple) Linear Regression Model

3 The Ordinary Least Squares estimator

4 Measures of fit

5 The Least Squares Assumptions

6 Sampling distribution of the OLS Estimators



Population regression line

The population regression line is the expected value of Y given X

E(Y | X)

• The slope (or marginal effect) is the difference in the expected


values of Y, for two values of X that differ by one unit
• The estimated regression can be used either for:
• causal inference (learning about the causal effect on Y of a
change in X)
• prediction (predicting the value of Y given X, for an
observation not in the data set)

• Causal inference and prediction place different requirements


on the data – but both use the same regression toolkit



General form

General form of the population regression line


Yi = β0 + β1 Xi + ui , i = 1, . . . , n
where:

• Subscript i = observation index [n paired observations (Xi , Yi )]


• Yi = Dependent variable
• Xi = Independent variable or regressor
• β0 = Population intercept (unknown!)
• β1 = Population slope (unknown!)
• ui = Regression error term −→ Omitted factors that influence
Y , other than the variable X. Also includes error in the
measurement of Y



Interpretation

      △Y/△X = β1    as long as    △u/△X = 0
• By how much does Y change if X is increased by 1 unit?
• It is only correct if all other things remain equal when X is
increased by 1 unit
• Conditional mean independence: E(u | X) = 0
• (Explanatory variable must not contain information about the
mean of the unobserved factors)
• Can we test this?

• Condition unlikely to hold


• Simple linear regression model is rarely applicable in practice
• But its discussion is useful for pedagogical reasons



Example
A simple linear wage equation:

      Wagei = β0 + β1 Educi + ui

• β1 = measures the change in hourly wage associated with one additional
  year of education
• ui = Includes factors such as:
• Labor force experience
• Tenure with current employer
• Work ethic
• Ability

• What about the conditional mean independence?
  −→ Again unlikely to hold:
• Individuals with more education will also be more intelligent
(more able) on average!
Outline

1 Introduction

2 The (simple) Linear Regression Model

3 The Ordinary Least Squares estimator

4 Measures of fit

5 The Least Squares Assumptions

6 Sampling distribution of the OLS Estimators



Intuition
In general we do not know β0 and β1 −→ We have to estimate them
using a random sample of data
Question: How to find the line that fits the data best?

OLS estimators: choose the regression coefficients such that the
estimated regression line is as close as possible to the observed data
Graphically: [scatterplot of the observed data with a fitted regression line]


Regression model
Method to estimate β0 and β1 :

LEAST SQUARES PRINCIPLE

Mathematical procedure that uses the data to position a line with
the objective of minimizing the sum of the squares of the vertical
distances between the actual Y values and the predicted values of Y

      min_{β0,β1} S(β0 , β1 ) = ∑_{i=1}^n ûi² = ∑_{i=1}^n (Yi − Ŷi )²

where the ûi are called the residuals:

• Difference between the observed Y-value and the predicted
  Y-value for a given X-value on the line
• ûi = Yi − β̂0 − β̂1 Xi
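The course works in Stata; purely as an illustration, here is a minimal Python (numpy) sketch on simulated data, with illustrative variable names, that checks the least squares principle: a brute-force search over candidate intercepts and slopes cannot improve on the closed-form OLS solution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (illustrative only): Y = 2 + 0.5*X + noise
n = 200
X = rng.normal(10, 2, size=n)
Y = 2.0 + 0.5 * X + rng.normal(0, 1, size=n)

def ssr(b0, b1):
    """Sum of squared residuals S(b0, b1) = sum of (Y_i - b0 - b1*X_i)^2."""
    return np.sum((Y - b0 - b1 * X) ** 2)

# Closed-form OLS solution
b1_ols = np.cov(X, Y, ddof=1)[0, 1] / np.var(X, ddof=1)
b0_ols = Y.mean() - b1_ols * X.mean()

# Brute-force grid search around the OLS solution: no candidate beats it
grid0 = np.linspace(b0_ols - 1, b0_ols + 1, 201)
grid1 = np.linspace(b1_ols - 0.5, b1_ols + 0.5, 201)
best = min((ssr(a, b), a, b) for a in grid0 for b in grid1)

print("OLS closed form:", b0_ols, b1_ols, ssr(b0_ols, b1_ols))
print("Grid search    :", best[1], best[2], best[0])
```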
Regression equation

Regression equation
An equation that expresses the linear relationship between two
variables

      Ŷi = β̂0 + β̂1 Xi

where:
• Ŷi = estimated value of Y for a selected value of X
• β̂0 = Y-intercept: estimated value of Y when X = 0
• β̂1 = slope: average change in the dependent variable Y for
  each one-unit change (increase or decrease) in the
  independent variable X
• Xi = value of the independent variable that is selected



OLS estimators of β1 and β0

OLS estimator of β1:

      β̂1 = rxy (sy /sx ) = sxy /sx² = (∑_{i=1}^n Xi Yi − n X̄ Ȳ ) / (∑_{i=1}^n Xi² − n X̄²)

where:
• rxy = correlation coefficient between X and Y
• sy = standard deviation of Y
• sx = standard deviation of X
• sxy = covariance between X and Y

OLS estimator of β0:

      β̂0 = Ȳ − β̂1 X̄

where:
• Ȳ = sample mean of Y
• X̄ = sample mean of X
• β̂1 = estimated slope of the regression line
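As an illustration (simulated data with illustrative names, not the California dataset), a short Python sketch verifies that the three expressions for β̂1 above agree, and cross-checks them against a library fit.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated (X, Y) sample; variable names and numbers are illustrative only
n = 500
X = rng.normal(20, 2, size=n)                 # a "class size"-like regressor
Y = 700 - 2.3 * X + rng.normal(0, 15, size=n)

xbar, ybar = X.mean(), Y.mean()
s_x, s_y = X.std(ddof=1), Y.std(ddof=1)
s_xy = np.cov(X, Y, ddof=1)[0, 1]
r_xy = np.corrcoef(X, Y)[0, 1]

# Three equivalent expressions for the OLS slope
b1_a = r_xy * s_y / s_x
b1_b = s_xy / s_x**2
b1_c = (np.sum(X * Y) - n * xbar * ybar) / (np.sum(X**2) - n * xbar**2)

b0 = ybar - b1_a * xbar                       # OLS intercept

print(b1_a, b1_b, b1_c)                       # the three expressions agree
print(np.polyfit(X, Y, 1))                    # cross-check: [slope, intercept]
```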
Why use OLS estimators?

• OLS is, as in the case of the sample average, the estimator
  that searches for the line that best "fits" the scatterplot:
  • Notice: if the "line" is just an intercept (Y does not depend
    on X), then the OLS estimator is just the sample average of
    Y1 , . . . , Yn −→ (Ȳ )

• Like Ȳ , the OLS estimator has some desirable properties:
  • Under certain assumptions, it is unbiased −→ E(β̂1 ) = β1
  • Its sampling distribution has lower variance than other
    candidate estimators of β1
  • Under certain assumptions, it is consistent −→ β̂1 →p β1
    (convergence in probability)



Example
Application to the California Test Score – Class Size data

• The sample mean of district average test scores Ȳ = 654.16

• It can also be obtained by OLS, by regressing Test Score on a constant only:



Example

• Estimated slope = β̂1 = rxy (sy /sx ) = −0.2264 ∗ (19.053/1.8918) = −2.28

• Estimated intercept = β̂0 = Ȳ − β̂1 X̄ = 654.1565 + 2.28 ∗ 19.64 = 698.9

• Estimated regression line:  \widehat{Test Score} = 698.9 − 2.28 ∗ STR
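A quick check, using only the sample statistics quoted on this slide (rxy ≈ −0.2264, sy ≈ 19.053, sx ≈ 1.8918, Ȳ ≈ 654.1565, X̄ ≈ 19.64), that they reproduce the reported slope and intercept; Python is used purely as a calculator.

```python
# Sample statistics as reported on the slide
r_xy, s_y, s_x = -0.2264, 19.053, 1.8918
y_bar, x_bar = 654.1565, 19.64

b1 = r_xy * s_y / s_x          # estimated slope, about -2.28
b0 = y_bar - b1 * x_bar        # estimated intercept, about 698.9
print(round(b1, 2), round(b0, 1))
```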
Example

• Interpretation of the estimated slope and intercept:


• The slope: Districts with one more student per teacher have,
on average, test scores that are 2.28 points lower
  • That is, we estimate △Test Score / △STR = −2.28

• The intercept: taken literally, it means that, according to this
  estimated line, districts with zero students per teacher would
  have a (predicted) test score of 698.9
• BUT: This interpretation of the intercept makes no sense
here!
1 It extrapolates the line outside the range of the data (the
intercept is not itself economically meaningful)
2 What does it mean for class size to be zero?



Example (2)

In Stata:

One of the districts in the dataset is Antelope, CA, for which:

ST R = 19.33 and Test Score = 657.8

• Predicted value: ŶAntelope = 698.9 − 2.28 ∗ 19.33 = 654.8


• Residual: ûAntelope = 657.8 − 654.8 = 3.0

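The same prediction and residual can be reproduced directly from the fitted line reported on the slide; a tiny Python check (all numbers taken from the slide):

```python
# Fitted line and values reported on the slide for Antelope, CA
b0, b1 = 698.9, -2.28
STR, test_score = 19.33, 657.8

y_hat = b0 + b1 * STR          # predicted test score, about 654.8
u_hat = test_score - y_hat     # residual, about 3.0
print(round(y_hat, 1), round(u_hat, 1))
```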


Outline

1 Introduction

2 The (simple) Linear Regression Model

3 The Ordinary Least Squares estimator

4 Measures of fit

5 The Least Squares Assumptions

6 Sampling distribution of the OLS Estimators



Introduction

How well does the estimated regression line “fit” or explain the data?
1 Does the regressor X account for much or for little variation
in Y ? −→ The R2 measures the fraction of the variance of Y
that is explained by X
• It is unitless
• Ranges between 0 (no fit) and 1 (perfect fit)

2 Are the observations in the scatter plot clustered closely


around the regression line?
−→ The standard error of the regression (SER) measures
how far Yi typically is from its predicted value



The R2

The R2 is the fraction of the sample variance of Yi “explained” by


the regression

Yi = Ŷi + ûi = OLS prediction + OLS residual

• Sample Var(Y) = Sample Var(Ŷ ) + Sample Var(û)
• Total sum of squares (TSS) = explained sum of squares (ESS) + sum of squared residuals (SSR)

      R² = ESS/TSS = [∑_{i=1}^n (Ŷi − Ȳ )²] / [∑_{i=1}^n (Yi − Ȳ )²] = 1 − SSR/TSS

  (using the fact that the sample mean of the Ŷi equals Ȳ )

• If R2 = 0, Xi explains none of the variation in Yi


• If R2 = 1, Xi explains all of the variation in Yi (Yi = Ŷi )
• In practice, 0 < R2 < 1
• With one regressor: R2 = the square of the correlation
coefficient between X and Y

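A minimal Python sketch (simulated data, illustrative only) decomposes the variation in Y and checks the equalities above: R² = ESS/TSS = 1 − SSR/TSS, and, with one regressor, R² equals the squared correlation between X and Y.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400
X = rng.normal(0, 1, size=n)
Y = 1.0 + 0.8 * X + rng.normal(0, 2, size=n)

# OLS fit, predictions and residuals
b1 = np.cov(X, Y, ddof=1)[0, 1] / np.var(X, ddof=1)
b0 = Y.mean() - b1 * X.mean()
Y_hat = b0 + b1 * X
u_hat = Y - Y_hat

TSS = np.sum((Y - Y.mean()) ** 2)
ESS = np.sum((Y_hat - Y_hat.mean()) ** 2)
SSR = np.sum(u_hat ** 2)

# All three numbers coincide (up to floating-point error)
print(ESS / TSS, 1 - SSR / TSS, np.corrcoef(X, Y)[0, 1] ** 2)
```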


SE of the regression
The SER is an estimator of the standard deviation of the regression
error ui
      SER = sû = √[ (1/(n−2)) ∑_{i=1}^n (ûi − ū)² ] = √[ (1/(n−2)) ∑_{i=1}^n ûi² ]

      where ū denotes the sample mean of the residuals ûi

• The second equality holds because ū = (1/n) ∑_{i=1}^n ûi = 0
• The divisor n − 2 is used because 2 degrees of freedom were
lost in estimating the two regression coefficients β0 and β1
• It measures the spread of the observations around the
regression line in the units of the dependent variable
• In other words: It measures the average “size” of the OLS
residual (the average “mistake” made by the OLS regression)



Example

The slope is statistically significant & large in a policy sense, but:


• STR explains only a small fraction of the variation in test
scores −→ R2 = 5.12%
• Large spread −→ SER = 18.6 (measured in test-score points)
• In Stata: SER = RMSE = √[ (1/n) ∑_{i=1}^n ûi² ]
  • The distinction (divisor n instead of n − 2) is negligible if n is large enough
• Does this make sense? Does this mean the STR is
unimportant in a policy sense?
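To see why the distinction between the n − 2 and n divisors is negligible in moderately large samples, here is a small Python sketch on simulated data (illustrative only, not the California data):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 420                                  # sample size of a similar order to the CA data
X = rng.normal(20, 2, size=n)
Y = 700 - 2.3 * X + rng.normal(0, 18, size=n)

b1 = np.cov(X, Y, ddof=1)[0, 1] / np.var(X, ddof=1)
b0 = Y.mean() - b1 * X.mean()
u_hat = Y - (b0 + b1 * X)

SER  = np.sqrt(np.sum(u_hat**2) / (n - 2))   # divisor n - 2 (degrees-of-freedom correction)
RMSE = np.sqrt(np.sum(u_hat**2) / n)         # divisor n, as in the formula above

print(SER, RMSE, SER - RMSE)                 # the difference is tiny when n is large
```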
Outline

1 Introduction

2 The (simple) Linear Regression Model

3 The Ordinary Least Squares estimator

4 Measures of fit

5 The Least Squares Assumptions

6 Sampling distribution of the OLS Estimators



Introduction

So far: OLS is a way to draw a straight line through the data on
Y and X. But:

1 Under what conditions does the slope of this line have a


causal interpretation? That is, when will the OLS estimator
be unbiased for the causal effect on Y of X?
2 What is the variance of the OLS estimator over repeated
samples?

To answer these questions −→ we need to make some assumptions about
how Y and X are related to each other, and about how the data are
collected (the sampling scheme)
• These assumptions are known as the Least Squares
Assumptions for Causal Inference



Reminder!

The causal effect on Y of a unit change in X is the expected


difference in Y as measured in a randomized controlled experiment

• With a binary treatment:


• The causal effect is the expected difference in means between
the treatment and control groups (remember chapter 2b!)
• It requires random assignment or as-if random assignment
• Random assignment ensures that the treatment (X) is
uncorrelated with all other determinants of Y, so that there are
no confounding variables

• The least squares assumptions for causal inference generalize


the binary treatment case to regression



General assumptions

General assumptions for the linear regression model:

1 Assumption SLR.1 (Linear in parameters) −→ In the


population, the relationship between Y and X is linear

Y = β0 + β1 X + U

2 Assumption SLR.2 (Sample variation in the regressor) −→


The values of the regressor are not all the same (otherwise it
would be impossible to study how different values of X lead to
different values of Y)



Specific assumptions
Specific assumptions for the linear regression model:
3 Assumption SLR.3 (Zero conditional mean) −→ The value
of the regressor must contain no information about the mean
of the unobserved factors
E (ui | Xi ) = 0
4 Assumption SLR.4 (Simple random sampling) −→
(Xi , Yi ), i = 1, . . . , n are i.i.d. and then each data point
follows the population equation
5 Assumption SLR.5 (Outliers unlikely) −→ X and Y have
finite fourth moments
6 Assumption SLR.6 (Homoskedasticity) −→ The value of the
regressor must contain no information about the variability of
the unobserved factors
V ar (ui | Xi ) = σ 2
SLR.3
For any given value of X, the mean of u is zero: E (ui | Xi = xi ) = 0

• This implies that β̂1 is an unbiased estimator of the causal effect β1

      E(Yi | Xi ) = E(β0 + β1 Xi + ui | Xi ) = β0 + β1 Xi + E(ui | Xi ) = β0 + β1 Xi

• Our example: Test Scorei = β0 + β1 Class Sizei + ui −→
  What could the other factors included in ui be?
• Parental involvement
• Outside learning opportunities (extra math classes...)
• Home environment conducive to reading
• Family income is a useful proxy for many such factors

• SLR.3 then requires E(family income | STR) = constant, which implies
  that family income and STR are uncorrelated. . .
• Is E (ui | Xi = xi ) = 0 plausible for these other factors?
SLR.3 (2)

The benchmark for understanding this assumption is to consider an


ideal RCT:

• X is randomly assigned to people


• Students randomly assigned to different size classes
• Randomization is done by computer —using no information
about the individual

• Because X is assigned randomly, all other individual


characteristics (things included in u) are distributed
independently of X −→ u and X are independent
• Then, in an ideal RCT: E (ui | Xi = xi ) = 0 (SLR.3 holds)



SLR.3 (3)
With observational data, we need to think hard about whether
E (ui | Xi = xi ) = 0 holds. Example:
• Suppose that:
• Districts with wealthy inhabitants have small classes and
good teachers
• These districts have a lot of money which they can use to hire
more and better teachers
• Districts with poor inhabitants have large classes and bad
teachers
• These districts have little money and can hire only few and not
very good teachers
• In this case, class size is related to teacher quality
• Teacher quality likely affects test scores −→ Within ui
• This implies a violation of SLR.3
E(ui | Class sizei = small) ≠ E(ui | Class sizei = large) ≠ 0
SLR.4

The assumption that (Xi , Yi ), i = 1, . . . , n, are i.i.d. arises automatically
if the entities (individuals, districts) are sampled by simple random sampling:

• The entities are selected from the same population, so


(Xi , Yi ) are identically distributed for all i = 1, . . . , n
• The entities are selected at random, so the values of (X, Y )
for different entities are independently distributed

Examples of a violation of simple random sampling:


• Panel data and time series data (Data recorded over time)
• Observations on children from the same mother (not
independent)



SLR.5

Large outliers are rare −→ E(X⁴) < ∞ and E(Y⁴) < ∞

• Outliers are observations that have values far outside the


usual range of the data
• Another way to state the assumption is that X and Y have finite
  kurtosis
• Large outliers can make OLS regression results misleading
• Look at your data! If you have a large outlier, is it a typo?
• Does it belong in your data set? Why is it an outlier?

• The assumption is necessary to justify the large-sample
  approximation to the sampling distribution of the OLS
  estimators



SLR.5 (2)

A large outlier can strongly influence the results:
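A minimal simulated example in Python of how a single extreme observation can move the OLS slope substantially (all numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50
X = rng.normal(0, 1, size=n)
Y = 1.0 + 0.5 * X + rng.normal(0, 1, size=n)   # true slope is 0.5

def ols_slope(x, y):
    """OLS slope: sample covariance divided by sample variance of x."""
    return np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

print("slope without outlier:", ols_slope(X, Y))

# Append one extreme observation (e.g. a data-entry error)
X_out = np.append(X, 10.0)
Y_out = np.append(Y, -50.0)
print("slope with one outlier:", ols_slope(X_out, Y_out))
```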



SLR.6

Homoskedasticity graphically:
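The original slide illustrates this with a figure; as a stand-in, a small Python simulation makes the same point: under homoskedasticity the conditional variance of u is the same for low and high values of X, while under heteroskedasticity it is not (the error structures below are assumed, for illustration only).

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
X = rng.uniform(0, 10, size=n)

u_homo = rng.normal(0, 2, size=n)      # Var(u|X) = 4 for every value of X
u_het  = rng.normal(0, 0.5 * X)        # Var(u|X) = (0.5*X)^2 grows with X

# Compare the variance of u for small vs large X
for name, u in (("homoskedastic", u_homo), ("heteroskedastic", u_het)):
    print(name, round(u[X < 2].var(), 2), round(u[X > 8].var(), 2))
```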



Outline

1 Introduction

2 The (simple) Linear Regression Model

3 The Ordinary Least Squares estimator

4 Measures of fit

5 The Least Squares Assumptions

6 Sampling distribution of the OLS Estimators



Introduction

The OLS estimator is computed from a sample of data
−→ A different sample yields a different value of β̂1 (this is the source of
the "sampling uncertainty" of β̂1 ). We therefore want to:

• Quantify the sampling uncertainty associated with β̂1
• Use β̂1 to test hypotheses such as H0 : β1 = 0
• Construct a confidence interval for β1
• Goal: To study the sampling distribution of β̂1
1 Probability framework for linear regression
2 Distribution of OLS estimator



Probability Framework for Linear Regression

The probability framework is summarized by the OLS assumptions:

1 Population → The group of interest (ex: all possible school


districts)
2 Random variables → X, Y (ex: Test Score, STR) (SLR.2)
3 Joint distribution of X and Y → We assume:
• Population regression function is linear (SLR.1)
• E (ui | Xi = xi ) = 0 (SLR.3)
• X, Y have nonzero finite fourth moments (SLR.5)
4 Simple random sampling → Data collection by this method
implies (Xi , Yi ), i = 1, . . . , n are i.i.d. (SLR.4)



Reminder

Recall the summary of the sampling distribution of Ȳ :

• For (Y1 , . . . , Yn ) i.i.d. with 0 < σ²Y < ∞, Ȳ is the

      Best       −→ Var(Ȳ ) = σ²Y /n ≤ Var(µ̂Y ) for any other linear unbiased estimator µ̂Y
      Linear     −→ a linear function of the data: µ̂Y = (1/n) ∑_{i=1}^n Yi
      Unbiased   −→ E(Ȳ ) = µY
      Estimator of µY

• Moreover, by the CLT:

      [Ȳ − E(Ȳ )] / √Var(Ȳ ) ≃ N (0, 1)



Sampling distribution of β̂1

Like Ȳ , β̂1 (remember: it is a function of sample averages!) has a


sampling distribution:

1 What is E(β̂1 )?
−→ If E(β̂1 ) = β1 , then OLS is unbiased (good thing!)
2 What is V ar(β̂1 )? (measure of sampling uncertainty)
−→ We need to derive a formula in order to compute the SE
of β̂1
3 What is the distribution of β̂1 in small samples?
−→ It is very complicated in general
4 What is the distribution of β̂1 in large samples?
−→ By the CLT, β̂1 is (approx) normally distributed



Preliminary algebra
Some (needed!) preliminary algebra:

      Yi = β0 + β1 Xi + ui
      Ȳ  = β0 + β1 X̄ + ū
      Hence:  Yi − Ȳ = β1 (Xi − X̄) + (ui − ū)

Thus:

      β̂1 = [∑_{i=1}^n (Xi − X̄)(Yi − Ȳ )] / [∑_{i=1}^n (Xi − X̄)²]

         = [∑_{i=1}^n (Xi − X̄)(β1 (Xi − X̄) + (ui − ū))] / [∑_{i=1}^n (Xi − X̄)²]

         = β1 [∑_{i=1}^n (Xi − X̄)²] / [∑_{i=1}^n (Xi − X̄)²] + [∑_{i=1}^n (Xi − X̄)(ui − ū)] / [∑_{i=1}^n (Xi − X̄)²]



Preliminary algebra (2)

      β̂1 = β1 + [∑_{i=1}^n (Xi − X̄)(ui − ū)] / [∑_{i=1}^n (Xi − X̄)²]

It can be shown that:

      ∑_{i=1}^n (Xi − X̄)(ui − ū) = ∑_{i=1}^n (Xi − X̄) ui

Finally:

      β̂1 − β1 = [∑_{i=1}^n (Xi − X̄) ui ] / [∑_{i=1}^n (Xi − X̄)²]



What is E(β̂1 )?

      E(β̂1 − β1 ) = E[ ∑_{i=1}^n (Xi − X̄) ui / ∑_{i=1}^n (Xi − X̄)² ]

Using the LIE (law of iterated expectations):

      E(β̂1 − β1 ) = E[ ∑_{i=1}^n (Xi − X̄) E(ui | Xi ) / ∑_{i=1}^n (Xi − X̄)² ]

      E(β̂1 − β1 ) = 0, because SLR.3: E(ui | Xi = xi ) = 0

• Thus SLR.3 implies that E(β̂1 ) = β1 —just like Ȳ !


• That is, β̂1 is an unbiased estimator of β1



What is V ar(β̂1 )?
Rewrite:

      β̂1 − β1 = [∑_{i=1}^n (Xi − X̄) ui ] / [∑_{i=1}^n (Xi − X̄)²] = [(1/n) ∑_{i=1}^n vi ] / [((n − 1)/n) s²X ]

      where vi = (Xi − X̄) ui

If n is large enough: s²X ≈ σ²X and (n − 1)/n ≈ 1. Then:

      β̂1 − β1 ≈ [(1/n) ∑_{i=1}^n vi ] / σ²X

      Var(β̂1 − β1 ) = Var( [(1/n) ∑_{i=1}^n vi ] / σ²X )

      Var(β̂1 ) = (1/n) · Var[(Xi − X̄) ui ] / (σ²X )²

• Var(β̂1 ) is inversely proportional to n —just like Ȳ !
What is the distribution of β̂1 ?

The exact sampling distribution is complicated – it depends on the
population distribution of (Y, X). But when n is large, we get
some simple (and good) approximations

• Since Var(β̂1 ) < ∞ and β̂1 →p β1
• We can use the CLT to obtain the (approximate) distribution
• Remember the previous slide:

      β̂1 − β1 = [(1/n) ∑_{i=1}^n vi ] / [((n − 1)/n) s²X ] ≈ [(1/n) ∑_{i=1}^n vi ] / σ²X

      where vi = (Xi − X̄) ui



What is the distribution of β̂1 ? (2)

• When n is large: vi = (Xi − X̄) ui ≈ (Xi − µX ) ui
  • vi is i.i.d. (why?)
  • Var(vi ) < ∞ (why?)
  • By the CLT: (1/n) ∑_{i=1}^n vi ≃ N (0, σ²v /n)

Then, β̂1 is approximately distributed:

      β̂1 ≃ N ( β1 , σ²v / (n σ⁴X ) )

      where vi ≈ (Xi − µX ) ui

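A Monte Carlo sketch in Python (simulated population satisfying SLR.1–SLR.5, illustrative parameter values): across many samples the average of β̂1 is close to β1, its variance is close to the large-sample formula above, and its distribution is approximately normal.

```python
import numpy as np

rng = np.random.default_rng(6)
beta0, beta1 = 2.0, -1.5          # illustrative population coefficients
n, reps = 200, 5000               # sample size and number of Monte Carlo draws

slopes = np.empty(reps)
for r in range(reps):
    X = rng.normal(5, 2, size=n)              # sigma_X = 2
    u = rng.normal(0, 3, size=n)              # E(u|X) = 0, homoskedastic, sigma_u = 3
    Y = beta0 + beta1 * X + u
    slopes[r] = np.cov(X, Y, ddof=1)[0, 1] / np.var(X, ddof=1)

print("mean of beta1_hat:", slopes.mean())    # close to beta1 = -1.5 (unbiased)
print("simulated Var    :", slopes.var())
# Large-sample formula; with homoskedastic u it reduces to sigma_u^2 / (n * sigma_X^2)
print("formula Var      :", 3**2 / (n * 2**2))
# Rough normality check: about 95% of draws within 1.96 standard deviations
z = (slopes - slopes.mean()) / slopes.std()
print("share within 1.96 sd:", np.mean(np.abs(z) < 1.96))
```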


Extra (1): Proof of consistency
Consistency means β̂1 →p β1 , or plim β̂1 = β1

      plim β̂1 = plim [ ∑_{i=1}^n (Xi − X̄)(Yi − Ȳ ) / ∑_{i=1}^n (Xi − X̄)² ]

      plim β̂1 = β1 + plim [ (1/n) ∑_{i=1}^n (Xi − X̄) ui ] / [ (1/n) ∑_{i=1}^n (Xi − X̄)² ]

      where the numerator →p 0 and the denominator →p Var(X)

• Then, plim β̂1 = β1 if E(ui | Xi = xi ) = 0


• Unbiasedness & consistency both rely on SLR.3
• But consistency implies that the sampling distribution becomes
  more and more tightly concentrated around β1 as the sample
  size n becomes larger and larger

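A short Python simulation (illustrative parameter values) shows consistency in action: as n grows, the sampling distribution of β̂1 stays centered at β1 and its standard deviation shrinks roughly like 1/√n.

```python
import numpy as np

rng = np.random.default_rng(7)
beta0, beta1, reps = 2.0, -1.5, 2000     # illustrative values

for n in (50, 500, 5000):
    slopes = np.empty(reps)
    for r in range(reps):
        X = rng.normal(5, 2, size=n)
        Y = beta0 + beta1 * X + rng.normal(0, 3, size=n)
        slopes[r] = np.cov(X, Y, ddof=1)[0, 1] / np.var(X, ddof=1)
    # mean stays near beta1; standard deviation shrinks roughly like 1/sqrt(n)
    print(n, round(slopes.mean(), 3), round(slopes.std(), 4))
```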


Extra (2): Variance of X vs Variance of β̂1

      Var(β̂1 ) = (1/n) · Var[(Xi − µX ) ui ] / (σ²X )²

      where σ²X = Var(Xi )

• The variance of X appears (squared) in the denominator −→


Increasing the spread of X decreases the variance of β̂1
• Intuition: More variation in X implies more info in the data
that can be used to fit the regression line. Graphically:



Extra (3): (What about SLR.6?)

Under SLR.6:

      Var(β̂1 ) = σ²u / (n σ²X ) = σ²u / ∑_{i=1}^n (Xi − X̄)²

• Same notion: Larger sampling variability of βˆ1 if variability


of unobserved factors is higher; lower if variation in the
regressor is larger
• As usual, σ²u is unknown −→ use σ̂²u = s²û = SSR/(n − 2) = SER²

• The homoskedasticity-only SE of β̂1 is then:

      SE(β̂1 ) = sû / √[ ∑_{i=1}^n (Xi − X̄)² ]

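A minimal Python sketch (simulated homoskedastic data, illustrative only) computes the homoskedasticity-only standard error exactly as in the formula above.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 300
X = rng.normal(10, 3, size=n)
Y = 4.0 + 1.2 * X + rng.normal(0, 2, size=n)   # homoskedastic errors (illustrative)

# OLS fit and residuals
b1 = np.cov(X, Y, ddof=1)[0, 1] / np.var(X, ddof=1)
b0 = Y.mean() - b1 * X.mean()
u_hat = Y - (b0 + b1 * X)

s_u = np.sqrt(np.sum(u_hat**2) / (n - 2))            # SER (estimate of sigma_u)
se_b1 = s_u / np.sqrt(np.sum((X - X.mean())**2))     # homoskedasticity-only SE

print("beta1_hat:", b1, "SE(beta1_hat):", se_b1)
```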


Extra (3): (What about SLR.6?)

Difference between homoskedastic and heteroskedastic (robust) SE



Summary

Parallel conclusions hold for the OLS estimator β̂1 (and also for β̂0 ):

• Under SLR.1-SLR.5, β̂1 is the

      Best       −→ Var(β̂1 ) ≤ Var(β̃1 ) for any other linear unbiased estimator β̃1 −→ Efficient!
      Linear     −→ a weighted average of Y1 , . . . , Yn with weights depending on X1 , . . . , Xn
      Unbiased   −→ E(β̂1 ) = β1
      Estimator of β1

• Moreover, by the CLT:

      [β̂1 − E(β̂1 )] / √Var(β̂1 ) ≃ N (0, 1)



Summary (2)

If SLR.1-SLR.5 hold, then in large samples β̂1 and β̂0 have a jointly
normal sampling distribution:

1 The large-sample normal distribution of β̂1 is N (β1 , σ²β̂1 ),
  where the variance of this distribution is:

      Var(β̂1 ) = (1/n) · Var[(Xi − µX ) ui ] / (σ²X )²

2 The large-sample normal distribution of β̂0 is N (β0 , σ²β̂0 ),
  where the variance of this distribution is:

      Var(β̂0 ) = (1/n) · Var(Hi ui ) / [E(Hi²)]²    where Hi = 1 − (µX / E(Xi²)) · Xi

Ready to turn to hypothesis tests & confidence intervals!


Econometrics I
End chapter 3

Prof. Miguel Ángel Borrella Mas

School of Economics and Business Administration


Universidad de Navarra
