EC501 Lecture 02
Marcus Chambers
Department of Economics
University of Essex
19 October 2023
Outline
Review
Review
We motivated the ordinary least squares (OLS) estimator by
choosing a linear combination of the regressors that provides a
‘good’ approximation of the dependent variable.
Our measure of ‘good’ was the sum of squared residuals, where the residual for observation i is eᵢ = yᵢ − xᵢ′β̃:
S(β̃) = Σᵢ₌₁ᴺ eᵢ² = e′e = (y − Xβ̃)′(y − Xβ̃).
The result is
b = (X′X)⁻¹X′y.
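A minimal R sketch of this formula (simulated data; the numbers are purely illustrative), checked against R’s built-in lm():

set.seed(1)
N <- 100
x <- rnorm(N)
y <- 1 + 2 * x + rnorm(N)                 # true intercept 1, slope 2
X <- cbind(1, x)                          # design matrix with a constant
b <- drop(solve(t(X) %*% X, t(X) %*% y))  # b = (X'X)^{-1} X'y
cbind(manual = b, lm = coef(lm(y ~ x)))   # the two columns agree

Solving the linear system with solve(A, b) avoids forming the inverse explicitly, although both routes give the same estimate here.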
Properties of b?
The linear regression model
The linear regression model takes the form
yᵢ = xᵢ′β + ϵᵢ, i = 1, …, N,
or, stacking all N observations, y = Xβ + ϵ.
Random sampling
The origins of the linear regression model lie in the sciences
where the xi variables are determined in a laboratory setting.
The xi variables are fixed in repeated samples, so that the only
source of randomness is ϵi, leading to different values for yi
across samples.
This can be hard to justify in Economics where it is more
common to regard both xi and ϵi as changing across samples.
This leads to different observed values of yi and xi each time a
new sample is drawn.
A random sample implies that each observation, (yi , xi ), is an
independent drawing from the population.
We will use this idea as a basis for a set of statistical
assumptions.
Assumptions
Our assumptions concern the linear model
yᵢ = xᵢ′β + ϵᵢ, i = 1, …, N.
The Gauss-Markov assumptions are:
(A1) E{ϵᵢ} = 0, i = 1, …, N;
(A2) the vector ϵ is independent of the matrix X;
(A3) V{ϵᵢ} = σ² for all i (homoskedasticity);
(A4) cov{ϵᵢ, ϵⱼ} = 0 for i ≠ j (no correlation across observations).
Assumption A2
Under assumption (A2) the matrix X and vector ϵ are
independent.
This means that knowledge of X tells us nothing about the
distribution of ϵ (and vice versa).
It implies that
E{ϵ|X} = E{ϵ} = 0,
where the second equality follows from (A1).
Under (A1) and (A2) the linear regression model is a model for
the conditional mean of yᵢ, because
E{yᵢ|xᵢ} = xᵢ′β.
Small N
b = (X′X)⁻¹X′y
= (X′X)⁻¹X′(Xβ + ϵ)
= β + (X′X)⁻¹X′ϵ.
E{b}
Taking expectations, E{b} = β + E{(X′X)⁻¹X′ϵ}. But
E{(X′X)⁻¹X′ϵ} = E{(X′X)⁻¹X′}E{ϵ} = 0,
where the factorisation uses (A2) and E{ϵ} = 0 follows from (A1).
Hence E{b} = β and the OLS estimator b is said to be an
unbiased estimator of β.
In repeated sampling the OLS estimator will be equal to β ‘on
average.’
Note that unbiasedness does not require (A3) and (A4).
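A small Monte Carlo sketch of this repeated-sampling idea (the design below is assumed, for illustration only): hold X fixed, redraw ϵ each time, and average b across samples.

set.seed(123)
N <- 50
beta <- c(1, 2)
X <- cbind(1, runif(N))                # regressors held fixed across samples
b <- replicate(10000, {
  y <- X %*% beta + rnorm(N)           # fresh errors each sample
  drop(solve(t(X) %*% X, t(X) %*% y))  # OLS estimate for this sample
})
rowMeans(b)                            # approximately (1, 2): unbiasedness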
V{b|X}
The conditional covariance matrix of b is:
V{b|X} = E{(b − β)(b − β)′|X}
= (X′X)⁻¹X′E{ϵϵ′|X}X(X′X)⁻¹
= σ²(X′X)⁻¹,
using b − β = (X′X)⁻¹X′ϵ and the fact that (A2)–(A4) imply E{ϵϵ′|X} = σ²I.
The unconditional covariance matrix is
V{b} = σ²E{(X′X)⁻¹}.
Gauss-Markov Theorem
Under Assumptions (A1)–(A4), the OLS estimator b of β is the
Best Linear Unbiased Estimator (BLUE) in the sense that it has
minimum variance within the class of linear unbiased estimators (LUEs).
What does this mean?
Take any other LUE, call it b̃; then
V{b̃|X} ≥ V{b|X},
in the sense that V{b̃|X} − V{b|X} is positive semi-definite.
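An illustration of the theorem by simulation (the design is assumed): compare full-sample OLS with another LUE, here the OLS estimator computed from only the first half of the sample. Both are unbiased, but the full-sample estimator has the smaller variance, as the theorem predicts.

set.seed(7)
N <- 100
beta <- 0.5
x <- rnorm(N)                      # single regressor, no intercept
draws <- replicate(5000, {
  y <- beta * x + rnorm(N)
  c(ols  = sum(x * y) / sum(x^2),                    # full-sample OLS slope
    half = sum(x[1:50] * y[1:50]) / sum(x[1:50]^2))  # LUE using half the data
})
rowMeans(draws)                    # both close to 0.5 (unbiased)
apply(draws, 1, var)               # var(half) exceeds var(ols)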
Normality of ϵ
ϵᵢ ∼ NID(0, σ²), (A5′)
that is, the error terms are independent drawings from a normal distribution with mean zero and variance σ².
Normality of b
b ∼ N(β, σ²(X′X)⁻¹)
because b is linear in ϵ.
Each element of b is also normally distributed:
bₖ ∼ N(βₖ, σ²cₖₖ), k = 1, …, K,
where cₖₖ is the (k, k) element of (X′X)⁻¹.
Estimation of σ² = V{ϵᵢ}
We usually estimate variances by sample averages but ϵi is
unobserved.
Instead we can base an estimator on the residuals:
s² = (1/(N − K)) Σᵢ₌₁ᴺ eᵢ².
The estimated covariance matrix of b is then
V̂{b} = s²(X′X)⁻¹.
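A sketch computing s² and V̂{b} by hand for the wage regression shown next, assuming the wage1 data are available (for example from the wooldridge package):

library(wooldridge)              # assumed source of the wage1 data
fit1 <- lm(lwage ~ educ, data = wage1)
e  <- resid(fit1)
N  <- length(e)
K  <- length(coef(fit1))
s2 <- sum(e^2) / (N - K)         # s^2 = sum(e_i^2) / (N - K)
X  <- model.matrix(fit1)
Vb <- s2 * solve(t(X) %*% X)     # Vhat{b} = s^2 (X'X)^{-1}
sqrt(diag(Vb))                   # reproduces the Std. Error column below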
Returning to the R output for a regression of individuals’ log wages
(lwage) on years of education from last week:
> fit1 <- lm(lwage~educ, data=wage1)
> summary(fit1)
Call:
lm(formula = lwage ~ educ, data = wage1)
Residuals:
Min 1Q Median 3Q Max
-2.21158 -0.36393 -0.07263 0.29712 1.52339
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.583773 0.097336 5.998 3.74e-09 ***
educ 0.082744 0.007567 10.935 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Large N
Convergence
Convergence of random variables
Slutsky’s Theorem
If plim b = β then b is a consistent estimator of β.
Consistency can be thought of as a large sample version of
unbiasedness and is a minimum requirement for an estimator.
A useful property of the plim operator is:
Slutsky’s Theorem
If g(·) is a continuous function and plim xN = c, then plim g(xN) = g(c).
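A quick numerical illustration (the numbers are assumed): xN below is the mean of N draws from a normal distribution with mean 2, so plim xN = 2, and for the continuous function g(x) = exp(x) we expect g(xN) to settle near exp(2) ≈ 7.39.

set.seed(1)
for (N in c(10, 1000, 100000)) {
  xN <- mean(rnorm(N, mean = 2))              # plim xN = 2
  cat("N =", N, " exp(xN) =", exp(xN), "\n")  # approaches exp(2)
}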
[Figure: densities of an estimator for N = 10, N = 100 and N = 1000; horizontal axis: estimator value (−1 to 1), vertical axis: density.]
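A simulation sketch reproducing the pattern in the figure (the data-generating process is assumed): the sampling distribution of the OLS slope estimator concentrates around the true value as N grows.

set.seed(42)
R <- 5000                          # Monte Carlo replications
for (N in c(10, 100, 1000)) {
  b <- replicate(R, {
    x <- rnorm(N)
    y <- 0.5 * x + rnorm(N)        # true slope 0.5
    sum(x * y) / sum(x^2)          # OLS slope (no intercept)
  })
  cat("N =", N, " sd of estimates =", round(sd(b), 3), "\n")
}

The standard deviation of the estimates falls roughly at rate 1/√N, which is the concentration visible in the figure.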
Large N assumptions
Properties of b
We begin by writing
b = β + ((1/N)X′X)⁻¹((1/N)X′ϵ)
= β + ((1/N) Σᵢ₌₁ᴺ xᵢxᵢ′)⁻¹ ((1/N) Σᵢ₌₁ᴺ xᵢϵᵢ).
Taking probability limits and applying Slutsky’s Theorem gives
plim(b − β) = (plim (1/N) Σᵢ₌₁ᴺ xᵢxᵢ′)⁻¹ plim (1/N) Σᵢ₌₁ᴺ xᵢϵᵢ.
Large sample results
It is reasonable to assume that sample averages converge to
their population values and so
plim (1/N) Σᵢ₌₁ᴺ xᵢϵᵢ = E{xᵢϵᵢ}
and
plim (1/N) Σᵢ₌₁ᴺ xᵢxᵢ′ = E{xᵢxᵢ′} ≡ Σₓₓ.
Provided Σₓₓ is invertible and E{xᵢϵᵢ} = 0,
plim(b − β) = Σₓₓ⁻¹E{xᵢϵᵢ} = 0,
so b is a consistent estimator of β.
Large sample approximation
Under the large-N assumptions a central limit theorem gives
√N(b − β) → N(0, σ²Σₓₓ⁻¹) in distribution.
For a large but finite sample size we can use this result to
approximate the distribution of b as
b ∼ᵃ N(β, σ²Σₓₓ⁻¹/N),
where ∼ᵃ means ‘is approximately distributed as.’
Our best estimate of Σₓₓ is X′X/N and we estimate σ² using s².
Hence we have the familiar result
b ∼ᵃ N(β, s²(X′X)⁻¹).
Summary
• Gauss-Markov assumptions
• statistical properties of OLS: small N and large N
• Next week:
  • goodness-of-fit
  • hypothesis testing (t and F tests)