
EC501 Econometric Methods

2. Linear Regression: Statistical Properties

Marcus Chambers

Department of Economics
University of Essex

19 October 2023

Outline

Review

The linear regression model: assumptions

Statistical properties of OLS: small N

Statistical properties of OLS: large N

Reference: Verbeek, chapter 2.

Review
We motivated the ordinary least squares (OLS) estimator by
choosing a linear combination of the regressors that provides a
‘good’ approximation of the dependent variable.
Our measure of ‘good’ was in terms of the sum of squared
residuals, where the residual for observation i is

ei = yi − β̃1 − β̃2 xi2 − . . . − β̃K xiK , i = 1, . . . , N.

The OLS estimator is obtained as b = arg minβ̃ S(β̃) where

S(β̃) = ∑ᵢ₌₁ᴺ eᵢ² = e′e = (y − Xβ̃)′(y − Xβ̃).

The result is
b = (X ′ X)−1 X ′ y.
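As an illustration, the following sketch (using simulated data, so all variable names and parameter values are hypothetical) computes b by the matrix formula and checks that it matches the coefficients reported by lm():

set.seed(1)
N <- 200
x2 <- rnorm(N); x3 <- rnorm(N)
y  <- 1 + 0.5 * x2 - 2 * x3 + rnorm(N)   # simulated population relationship

X <- cbind(1, x2, x3)                    # N x K regressor matrix (K = 3)
b <- solve(t(X) %*% X, t(X) %*% y)       # b = (X'X)^{-1} X'y

b                                        # matrix-formula OLS
coef(lm(y ~ x2 + x3))                    # the same coefficients from lm()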

Properties of b?

But: what are the (statistical) properties of b?


To answer this question we need to move beyond thinking of
OLS in a purely algebraic sense.
Instead of describing the properties of a given sample we shall
think in terms of a statistical model relating y to x2 , . . . , xK .
We try to learn something about this relationship from our
observed sample.
The statistical properties (assumptions) of the model then
determine the statistical properties of b.

The linear regression model
The linear regression model takes the form

yi = β1 + β2 xi2 + . . . + βK xiK + ϵi or yi = xi′ β + ϵi ,

where ϵi is an error term or disturbance.


This is a population relationship between y and x and is
assumed to hold for any possible observation.
Our goal is to estimate the population parameters, β1 , . . . , βK ,
based on our sample, (yi , xi ; i = 1, . . . , N).
We regard yi and ϵi (and usually xi ) as random variables that
are part of a sample derived from the population.
Recall that we can write the model in matrix form as
y = Xβ + ϵ, (1)
where the dimensions are y (N × 1), X (N × K), β (K × 1) and ϵ (N × 1).

Random sampling
The origins of the linear regression model lie in the sciences
where the xi variables are determined in a laboratory setting.
The xi variables are fixed in repeated samples so that the only
source of randomness is ϵi leading to different values for yi
across samples.
This can be hard to justify in Economics where it is more
common to regard both xi and ϵi as changing across samples.
This leads to different observed values of yi and xi each time a
new sample is drawn.
A random sample implies that each observation, (yi , xi ), is an
independent drawing from the population.
We will use this idea as a basis for a set of statistical
assumptions.

Assumptions
Our assumptions concern the linear model

yi = xi′ β + ϵi , i = 1, . . . , N.

The Gauss-Markov conditions are:


E{ϵi } = 0, i = 1, . . . , N; (A1)
{ϵ1 , . . . , ϵN } and {x1 , . . . , xN } are independent; (A2)
V{ϵi } = σ 2 , i = 1, . . . , N; (A3)
cov{ϵi , ϵj } = 0, i, j = 1, . . . , N, i ̸= j. (A4)

Note that we also need N > K and X ′ X to be invertible – here we need X to have rank K, i.e. the columns of X are linearly independent (M4).
What do these conditions imply?
Assumptions A1, A3 and A4
Assumption (A1) suggests that the regression line holds on
average (more on this shortly).
Assumption (A3) states that all disturbances have the same
variance - this is known as homoskedasticity (which rules out
heteroskedasticity, or non-constant variances, which we shall
deal with later).
Assumption (A4) tells us that all pairs, ϵi and ϵj , are
uncorrelated (this is essentially just random sampling), thereby
ruling out autocorrelation.
In terms of the N × 1 vector ϵ, these assumptions imply (see
S12) that

E{ϵ} = 0 (N × 1) and V{ϵ} = σ 2 IN (N × N),

where IN is the N × N identity matrix.

Assumption A2
Under assumption (A2) the matrix X and vector ϵ are
independent.
This means that knowledge of X tells us nothing about the
distribution of ϵ (and vice versa).
It implies that

E{ϵ|X} = E{ϵ} = 0 and V{ϵ|X} = V{ϵ} = σ 2 IN .

Under (A1) and (A2) the linear regression model is a model for
the conditional mean of yi , because

E{yi |xi } = E{xi′ β + ϵi |xi } = xi′ β + E{ϵi |xi } = xi′ β

in view of E{ϵi |xi } = 0.


Assumptions (A1)–(A4) jointly determine the properties of b.

Small N

We shall begin by taking the sample size, N, to be a finite number (but recall N > K).
First, note that the OLS vector b is a linear function of y:

b = (X ′ X)−1 X ′ y

= (X ′ X)−1 X ′ (Xβ + ϵ) (using y = Xβ + ϵ)

= (X ′ X)−1 X ′ Xβ + (X ′ X)−1 X ′ ϵ

= β + (X ′ X)−1 X ′ ϵ (because (X ′ X)−1 X ′ X = IK ).

It is, therefore, also a linear function of the unobservable random vector ϵ.

E{b}

The expected value of b is

E{b} = E{β + (X ′ X)−1 X ′ ϵ} = β + E{(X ′ X)−1 X ′ ϵ}.

But
E{(X ′ X)−1 X ′ ϵ} = E{(X ′ X)−1 X ′ }E{ϵ} = 0
by (A2) and E{ϵ} = 0 by (A1).
Hence E{b} = β and the OLS estimator b is said to be an
unbiased estimator of β.
In repeated sampling the OLS estimator will be equal to β ‘on
average.’
Note that unbiasedness does not require (A3) and (A4).
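A small Monte Carlo sketch of this idea (simulated data and illustrative parameter values, not part of the original slides): holding X fixed and redrawing ϵ many times, the average of b across replications should be close to β.

set.seed(2)
N <- 100; beta <- c(1, 0.5)
x <- rnorm(N)
X <- cbind(1, x)                          # regressors held fixed across samples

b_store <- replicate(5000, {
  eps <- rnorm(N)                         # fresh disturbances each replication
  y   <- X %*% beta + eps
  drop(solve(t(X) %*% X, t(X) %*% y))     # OLS estimate for this sample
})
rowMeans(b_store)                         # close to c(1, 0.5): b is unbiased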

V{b|X}
The conditional covariance matrix of b is:
V{b|X} = E{(b − β)(b − β)′ |X}

= E{(X ′ X)−1 X ′ ϵϵ′ X(X ′ X)−1 |X}

= (X ′ X)−1 X ′ E{ϵϵ′ |X}X(X ′ X)−1

= σ 2 (X ′ X)−1 X ′ X(X ′ X)−1 as E{ϵϵ′ |X} = σ 2 IN

= σ 2 (X ′ X)−1 .

We will denote this as V{b} = σ 2 (X ′ X)−1 for convenience.


The unconditional variance matrix is actually V{b} = σ²E{(X ′ X)⁻¹}, which is rather more complicated!
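Continuing in the same illustrative spirit (simulated design and hypothetical values), the covariance matrix of b across Monte Carlo replications should be close to σ²(X′X)⁻¹:

set.seed(2)
N <- 100; beta <- c(1, 0.5); sigma2 <- 4
x <- rnorm(N)
X <- cbind(1, x)

sigma2 * solve(t(X) %*% X)                # theoretical V{b|X} = sigma^2 (X'X)^{-1}

b_store <- replicate(5000, {
  y <- X %*% beta + rnorm(N, sd = sqrt(sigma2))
  drop(solve(t(X) %*% X, t(X) %*% y))     # OLS estimate for this sample
})
cov(t(b_store))                           # Monte Carlo covariance: close to the above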


Gauss-Markov Theorem
Clearly OLS is a Linear Unbiased Estimator (LUE).
But how does OLS compare to other LUEs?

Gauss-Markov Theorem
Under Assumptions (A1)–(A4), the OLS estimator b of β is the
Best Linear Unbiased Estimator (BLUE) in the sense that it has
minimum variance within the class of LUEs.
What does this mean?
Take any other LUE, call it b̃; then

V{b̃|X} ≥ V{b|X}

in the sense that the matrix V{b̃|X} − V{b|X} is positive semi-definite; see (M10).

Normality of ϵ

Sometimes it is appropriate to actually specify the distribution of the random disturbance vector ϵ.
A common assumption, that incorporates (A1), (A3) and (A4),
is:
ϵ ∼ N(0, σ 2 IN ). (A5)
This is equivalent to

ϵi ∼ NID(0, σ 2 ), (A5′ )

where NID denotes ‘normally and independently distributed.’


This also implies that yi ∼ NID(xi′ β, σ 2 ) (conditional on X) which
is not always appropriate.

Normality of b

Under (A2) and (A5) it follows that

b ∼ N(β, σ 2 (X ′ X)−1 )

because b is linear in ϵ.
Each element of b is also normally distributed:

bk ∼ N(βk , σ 2 ckk ), k = 1, . . . , K,

where ckk denotes the (k, k) (diagonal) element of (X ′ X)−1 .
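A sketch of this result (simulated data with normal errors; the numbers are illustrative only): the Monte Carlo distribution of the slope estimate b2 has mean β2, standard deviation σ√c22 and a normal shape.

set.seed(3)
N <- 50; beta <- c(1, 0.5); sigma <- 2
x <- rnorm(N)
X <- cbind(1, x)
c22 <- solve(t(X) %*% X)[2, 2]            # (2,2) element of (X'X)^{-1}

b2 <- replicate(5000, {
  y <- X %*% beta + rnorm(N, sd = sigma)  # normal disturbances, as in (A5)
  coef(lm(y ~ x))[2]
})
c(mean(b2), sd(b2))                       # compare with beta_2 and sigma*sqrt(c22):
c(beta[2], sigma * sqrt(c22))
qqnorm(b2); qqline(b2)                    # points close to the line: normal shape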


These results motivate statistical tests based on b but, in
practice, we don’t know σ 2 .
We therefore estimate σ 2 using the data – how do we do this?

Estimation of σ 2 = V{ϵi }
We usually estimate variances by sample averages but ϵi is
unobserved.
Instead we can base an estimator on the residuals:
s² = (1/(N − K)) ∑ᵢ₌₁ᴺ eᵢ².

This estimator is unbiased (i.e. E{s2 } = σ 2 ).


Note the degrees of freedom adjustment – the denominator is
N − K rather than N − 1.
This is because we have estimated K parameters in order to
obtain the residuals (ei = yi − xi′ b).
The estimated variance matrix of b is then

V̂{b} = s2 (X ′ X)−1 .

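A sketch of this calculation (simulated data; names are illustrative): form s² from the lm() residuals with the N − K divisor and check that s²(X′X)⁻¹ reproduces vcov() and the reported standard errors.

set.seed(4)
N <- 200; K <- 2
x <- rnorm(N)
y <- 1 + 0.5 * x + rnorm(N)
fit <- lm(y ~ x)

e  <- resid(fit)
s2 <- sum(e^2) / (N - K)                  # unbiased estimator of sigma^2
X  <- cbind(1, x)
Vb <- s2 * solve(t(X) %*% X)              # estimated variance matrix of b

sqrt(diag(Vb))                            # standard errors of b1 and b2
sqrt(diag(vcov(fit)))                     # the same values from lm()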
Returning to the R output for a regression of individuals’ wages
on years of education from last week:
> fit1 <- lm(lwage~educ, data=wage1)
> summary(fit1)

Call:
lm(formula = lwage ~ educ, data = wage1)

Residuals:
Min 1Q Median 3Q Max
-2.21158 -0.36393 -0.07263 0.29712 1.52339

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.583773 0.097336 5.998 3.74e-09 ***
educ 0.082744 0.007567 10.935 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4801 on 524 degrees of freedom


Multiple R-squared: 0.1858,  Adjusted R-squared: 0.1843
F-statistic: 119.6 on 1 and 524 DF, p-value: < 2.2e-16

Here, s = 0.4801 (implying s² = 0.2305), while the standard errors (√(s²ckk)) are 0.0973 and 0.0076 for b1 and b2, respectively.

Large N

The Gauss-Markov assumptions ensure that exact finite sample results hold for b (e.g. unbiasedness and, with (A5), normality).
If we wish to relax some of these assumptions then exact finite
sample results are typically not available.
For example, if (A2) doesn’t hold, then b will generally be biased.
We therefore use results for large N to find out the asymptotic
properties as N → ∞.
For large enough N we treat the asymptotic results as holding
approximately.

Convergence

Consider a sequence of numbers indexed by N, e.g.

{xN = e⁻ᴺ} = {1/e, 1/e², 1/e³, . . . , 1/eᴺ, . . .}.

We can define the limit of this sequence as N → ∞:

limN→∞ xN = limN→∞ e⁻ᴺ = 0.

The sequence {xN } is said to converge to zero.


But what happens if the elements of the sequence are random
variables?

Convergence of random variables

The sequence of random variables {xN } is said to converge in probability to a constant c if

limN→∞ P{|xN − c| > δ} = 0 for all δ > 0;

(see, for example, (2.69) on p.34 of Verbeek).


This is written xN →ᵖ c or plim xN = c.
In words: for any positive number δ, as N gets larger and larger, the probability that the distance between xN and c is larger than δ converges to zero.
Note that δ can be arbitrarily small.
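A quick numerical sketch of the definition (simulated draws, illustrative only): for the sample mean of χ²(1) variables, the probability of being more than δ = 0.1 away from the population mean of 1 shrinks towards zero as N grows.

set.seed(5)
delta <- 0.1
for (N in c(10, 100, 1000, 10000)) {
  xbar <- replicate(2000, mean(rchisq(N, df = 1)))   # population mean is 1
  cat("N =", N, " P(|xbar - 1| > 0.1) approx.",
      mean(abs(xbar - 1) > delta), "\n")
}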

Slutsky’s Theorem
If plim b = β then b is a consistent estimator of β.
Consistency can be thought of as a large sample version of
unbiasedness and is a minimum requirement for an estimator.
A useful property of the plim operator is:

Slutsky’s Theorem
If g(·) is a continuous function and plim xN = c, then

plim g(xN ) = g(plim xN ) = g(c);

(see, for example, (2.71) on p.34 of Verbeek).

This is not a property shared by the expectations operator; in general, E{g(x)} ≠ g{E(x)} for a random variable x.
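A numerical illustration of this last point (simulated values, hypothetical), taking g(x) = x²: for a χ²(1) variable E{g(x)} = 3 while g(E{x}) = 1, yet by Slutsky the square of a sample mean still converges to the square of the population mean.

set.seed(6)
x <- rchisq(1e6, df = 1)                  # E{x} = 1, V{x} = 2
mean(x^2)                                 # E{g(x)}, roughly 3
mean(x)^2                                 # g(E{x}), roughly 1: not equal

# Slutsky: g(sample mean) converges to g(plim sample mean) = 1^2 = 1
sapply(c(10, 1000, 100000), function(N) mean(rchisq(N, df = 1))^2)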
Convergence to a constant
[Figure: densities of an estimator for N = 10, N = 100 and N = 1000; horizontal axis ‘Estimator’, vertical axis ‘Density’.]
Convergence to a constant c (here, c = 0) is illustrated above by the variance of the distribution becoming smaller as N increases.
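A figure of this kind can be reproduced with the sketch below (simulated sampling densities of a sample mean, so the details are illustrative rather than those behind the original plot):

set.seed(7)
dens <- lapply(c(10, 100, 1000), function(N)
  density(replicate(5000, mean(rnorm(N)))))            # sampling density of the mean

plot(dens[[3]], xlim = c(-1, 1), xlab = "Estimator", ylab = "Density",
     main = "Convergence to a constant (c = 0)")
lines(dens[[2]], lty = 2)
lines(dens[[1]], lty = 3)
legend("topright", legend = c("N=1000", "N=100", "N=10"), lty = 1:3)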

Large N assumptions

What can we say about b in large samples? Is it consistent?


It is convenient to make the following assumptions:
(1/N) X ′ X = (1/N) ∑ᵢ₌₁ᴺ xi xi′ →ᵖ Σxx (finite, nonsingular); (A6)
E{xi ϵi } = 0, i = 1, . . . , N. (A7)

In (A6) the matrix Σxx can be regarded as E(xi xi′ ).


Assumption (A7) states that xi and ϵi are uncorrelated.
What do these conditions imply for b?

Properties of b

We begin by writing

b = β + ((1/N) X ′ X)⁻¹ (1/N) X ′ ϵ = β + ((1/N) ∑ᵢ₌₁ᴺ xi xi′)⁻¹ (1/N) ∑ᵢ₌₁ᴺ xi ϵi.

Applying the plim operator and using Slutsky we find

plim(b − β) = (plim (1/N) ∑ᵢ₌₁ᴺ xi xi′)⁻¹ plim (1/N) ∑ᵢ₌₁ᴺ xi ϵi.

The first term converges to Σxx⁻¹ using (A6).

Large sample results
It is reasonable to assume that sample averages converge to
their population values and so
plim (1/N) ∑ᵢ₌₁ᴺ xi ϵi = E{xi ϵi }.

But E{xi ϵi } = 0 under (A7) and so

plim(b − β) = Σxx⁻¹ E{xi ϵi } = 0.

Hence b is a consistent estimator of β.


It is also possible to show that, as N → ∞,

√N (b − β) → N(0, σ²Σxx⁻¹),

where → means ‘is asymptotically distributed as’.
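A sketch of consistency at work (simulated data, illustrative values): the OLS slope estimate settles down around the true value 0.5 as N grows.

set.seed(8)
for (N in c(50, 500, 5000, 50000)) {
  x <- rnorm(N)
  y <- 1 + 0.5 * x + rnorm(N)
  cat("N =", N, " slope estimate:", round(coef(lm(y ~ x))[2], 4), "\n")
}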

Large sample approximation

For a large but finite sample size we can use this result to
approximate the distribution of b as
b ∼ᵃ N(β, σ²Σxx⁻¹/N),

where ∼ᵃ means ‘is approximately distributed as.’
Our best estimate of Σxx is X ′ X/N and we estimate σ 2 using s2 .
Hence we have the familiar result
b ∼ᵃ N(β, s²(X ′ X)⁻¹).

But note this is only approximate as it is based on weaker assumptions than Gauss-Markov.

Summary

• Gauss-Markov assumptions
• statistical properties of OLS: small N and large N

• Next week:
• goodness-of-fit
• hypothesis testing (t and F tests)

