Sampling Distributions of the OLS Estimators


Multiple linear regression:

hypothesis testing
BY SHARPER SIKOTA, LECTURER, KWAME NKRUMAH UNIVERSITY

WOOLDRIDGE CHAPTERS 4 – 9
Cheat Sheet Time

Make yourself a cheat sheet with the formulae for:

The population model, fitted values, residuals, the OVB formula,
SST, SSR, SSE, R-squared, the relationship between SST, SSE and
SSR, the variance of β̂, VIF, and whatever else seems appropriate.
Then keep it handy while studying. It will prove useful!
Chapter 4 Roadmap

1. Sampling Distributions of the OLS Estimators
2. Testing Hypotheses about a Single Population Parameter: The t Test
3. Confidence Intervals
4. Testing Hypotheses about a Single Linear Combination of the Parameters
5. Testing Multiple Linear Restrictions: The F Test
6. Reporting Regression Results

Logwage = β0 + β1female + β2educ + β3age + β4african + β5coloured +
β6indas + β7depressed + β8hhrural + u

Now we can ask questions
What is the distribution of β̂? (4.1)
What is the true coefficient on female? (4.2)
Is a coefficient equal to a particular value? (4.2, 4.3)
Are two or more coefficients related somehow? (4.4)
Should all the variables have been included? (4.5)
NB! What is the best way to report regressions? (4.6)


MLR Assumptions 1-5

MLR.1 Linear in parameters: y = β0 + β1x1 + β2x2 + ... + βkxk + u
MLR.2 Random sampling
MLR.3 No perfect collinearity
MLR.4 Zero conditional mean: E(u | x1, x2, …, xk) = 0
MLR.5 Homoskedasticity: Var(u | x1, x2, …, xk) = σ2

Distribution of the OLS Estimator

➢ What does it mean to talk about the distribution of β̂?

In one sample we estimate y = β̂0 + β̂1x1 + û.
➢ Surely I just get one value for β̂1?
➢ Yes. You do. In that sample. What about in another sample?
➢ Thought experiment: (1) draw many, many samples, (2)
estimate β̂1 in each sample, and (3) examine the distribution of
all of those values for β̂1 (sketched below).
➢ Is it normal? Uniform? Poisson?

If our estimator is good, this distribution (1) has low variance, and
(2) is centered around the true population value of β1.
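
A minimal Stata sketch of this thought experiment (a hypothetical simulation, not from the lecture: the true values β0 = 2, β1 = 0.5, the sample size n = 100, and the variable names are all made up for illustration):

* Simulate the sampling distribution of beta1-hat.
* True model: y = 2 + 0.5*x + u, with u ~ Normal(0, 1).
clear all
set seed 12345

program define onedraw, rclass
    drop _all
    set obs 100                  // sample size n = 100
    generate x = rnormal()
    generate u = rnormal(0, 1)
    generate y = 2 + 0.5*x + u
    regress y x
    return scalar b1 = _b[x]     // this sample's estimate of beta1
end

simulate b1 = r(b1), reps(1000) nodots: onedraw
summarize b1                     // mean should be close to the true 0.5
histogram b1, normal             // shape should be close to the normal overlay

Each replication draws a fresh sample and re-estimates the model; the histogram of the 1,000 estimates is the (simulated) sampling distribution of β̂1.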

Distribution of the OLS Estimator
▪ How are the sample coefficients distributed?

▪ We need to know this to perform hypothesis tests.

▪ We add a new normality assumption about u:

MLR.6 (Normality): u is independent of the x's and u ~ Normal(0, σ2)

▪ MLR.6 will help us figure out how β̂ is distributed.

MLR.6 implies MLR.4 and MLR.5

▪ u is independent of the x's, so E(u|x) = E(u).
We know E(u) = 0, so E(u|x) = 0, so MLR.6 → MLR.4.

▪ u is independent of the x's, so Var(u|x) = Var(u).
MLR.6 also says Var(u) = σ2, so Var(u|x) = σ2, which is MLR.5
(homoskedasticity), so MLR.6 → MLR.5.

Classical Linear Model (CLM)

MLR.1 Linear in parameters: y = β0 + β1x1 + β2x2 + ... + βkxk + u
MLR.2 Random sampling
MLR.3 No perfect collinearity
MLR.4 Zero conditional mean: E(u | x1, x2, …, xk) = 0
MLR.5 Homoskedasticity: Var(u | x1, x2, …, xk) = σ2
MLR.6 Normality: u ~ Normal(0, σ2)

What does the CLM imply?

MLR.1 – MLR.6 together are the classical linear model (CLM) assumptions.

Under the CLM, the OLS estimator has the lowest variance of ALL the
unbiased estimators of βi (i.e. not just the linear ones).

Given MLR.1 – MLR.6 we also know:

y|x ~ Normal(β0 + β1x1 + β2x2 + ... + βkxk, σ2)

Why? We need some proof (next slide).

Proof: Var(y|x) = Var(u|x) = σ2

y = β0 + β1x1 + β2x2 + ... + βkxk + u

Let A = β0 + β1x1 + β2x2 + ... + βkxk.
Therefore y = A + u,
and Var(y|x) = Var(A + u|x).

Var(y|x) = Var(A|x) + Var(u|x)
(by independence of A and u)

Var(y|x) = 0 + Var(u|x) = 0 + σ2 = σ2
by the laws of conditional variance. (Proved)

NB: I did this in Chapter 3; including it here for convenience.

Why is Var(y|x) = Var(u|x) = σ2?

Well, y = β0 + β1x1 + β2x2 + ... + βkxk + u. This is unwieldy, so let:
A = β0 + β1x1 + β2x2 + ... + βkxk
So y = A + u, and Var(y|x) = Var(A + u|x).

Var(y|x) = Var(A|x) + Var(u|x)

Why? We can only do this if A and u are independent. Are they? Yes, if
E(u|x) = E(u) = 0 holds. If A and u were related, we would have a
Cov(A, u) term in the mix somewhere too (remember the laws of how to
find Var(A + B) for any random variables A and B).

Why are they independent? Because E(u|x) = E(u) implies u and x are
unrelated, so u and any function of x (which is what A is) should also
be unrelated.

So now what?

Var(y|x) = 0 + Var(u|x) = 0 + σ2

Why is Var(A|x) = 0? Because once you are given the values of x (this
is what conditioning on x means), there is no variance in an expression
of x: it is a constant, and thus has zero variance.
Why is E(y|x) = β0 + β1x1 + β2x2 + ... + βkxk?

Given: y = β0 + β1x1 + β2x2 + ... + βkxk + u

E(y|x) = E(β0 + β1x1 + β2x2 + ... + βkxk + u | x)
E(y|x) = E(β0 + β1x1 + β2x2 + ... + βkxk | x) + E(u|x)
because the expected value of a sum = the sum of the expected values.

But x and u are independent, so E(u|x) = E(u).
So: E(y|x) = E(β0 + β1x1 + β2x2 + ... + βkxk | x) + E(u)
But E(u) = 0 (implied by the zero conditional mean assumption, MLR.4).
Therefore:
E(y|x) = E(β0 + β1x1 + β2x2 + ... + βkxk | x) = β0 + β1x1 + β2x2 + ... + βkxk

because E(f(x)|x) = f(x) (laws of conditional expectations).
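
A quick numerical check of both results (a hypothetical simulation; the values β0 = 1, β1 = 0.5, x = 2 and σ = 3 are made up for illustration):

* Condition on x = 2 and compare the simulated mean and variance
* of y with the theory: E(y|x=2) = 1 + 0.5*2 = 2, Var(y|x=2) = 9.
clear
set seed 12345
set obs 10000
generate u = rnormal(0, 3)     // u ~ Normal(0, 9)
generate y = 1 + 0.5*2 + u     // x is held fixed at 2
summarize y                    // mean near 2, Std. Dev. near 3 (Var near 9)
histogram y, normal            // and y|x is normal, as the next slide argues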

y|x ~ Normal(β0 + β1x1 + β2x2 + ... + βkxk, σ2)

▪ Why? You proved: 1. Var(y|x) = σ2
▪ And you know: 2. E(y|x) = β0 + β1x1 + β2x2 + ... + βkxk
(because of the independence of x and u)
▪ So why is 3. y|x normally distributed?
▪ Because y = f(the x's, u),
therefore y|x = f(a constant, u).
▪ Remember: conditioning on x (i.e. being given the values of x)
means any function of x is a constant with zero variance.
▪ The only variation in y is from u, which is normally distributed (by
MLR.6), → y|x ~ Normal

Are the errors really normal?

u = the sum of many things affecting y.
Maybe, according to the Central Limit Theorem:
the (standardized) sum of many independent random variables tends
towards the standard normal distribution.

This won't hold if:
▪ u is a complicated function of the left-out x's (i.e. not linear),
▪ OR the x's have weird distributions.

▪ It's an empirical question whether the u's are normal.
▪ What if they're not?
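
A quick way to see the CLT at work (a hypothetical demo, not from the lecture): a sum of 30 independent uniform draws already looks close to normal.

* CLT demo: sum 30 independent uniforms and inspect the shape.
clear
set seed 12345
set obs 10000
generate s = 0
forvalues i = 1/30 {
    quietly replace s = s + runiform()   // add one more independent term
}
histogram s, normal                      // close to the normal overlay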

Are Wages Normally Distributed? What about Log Wages?

[Figures: histograms of monthly wages and of log wages. The raw wages
are heavily skewed; taking logs brings the distribution much closer to
normal.]
Who is in the regression?
Stata can predict yhat for you if you have data for all the x's,
even if you don't have data for lmonthlywage. A minimal sketch:
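
This assumes a dataset in memory with the variables from the example model above; the regression and variable names come from that model, not from actual course data.

* Fitted values are computed wherever the x's are nonmissing,
* even for observations with a missing lmonthlywage.
regress lmonthlywage female educ age african coloured indas depressed hhrural
predict yhat, xb                                  // the linear prediction
count if missing(lmonthlywage) & !missing(yhat)   // obs predicted without a wage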

Graph the Residuals
* first recover the residuals from the regression above
predict uhat, residuals
histogram uhat, scheme(s1manual)

Which y's are Normally Distributed?

● E.g. is wage | educ, exper, tenure normally distributed?
No: wage is bounded at zero.
● A categorical y with few values is probably not normal,
e.g. the number of times arrested.
● Taking the log can make a variable more normal; if log(wage) is
normal, we say wage is log-normal. You just saw this (and see the
sketch below).
● Otherwise, in large samples it's OK to assume normality
(Chapter 5). Phew!
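
A minimal sketch of the log-normal point (hypothetical numbers: the "wages" are generated as the exponential of a normal, so they are log-normal by construction):

* Simulated log-normal wages: skewed in levels, normal in logs.
clear
set seed 12345
set obs 10000
generate wage = exp(rnormal(8, 0.6))   // log-normal by construction
generate lwage = ln(wage)
histogram wage, normal                 // right-skewed: poor normal fit
histogram lwage, normal                // close to the normal overlay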
Given MLR.1-6, β̂ is Normal. Conclusions?

● We care because our hypothesis testing depends on the
distribution of β̂.

We also get told:
● Any linear combination of β̂1, β̂2, …, β̂k is normal.
● Any subset of β̂1, β̂2, …, β̂k is jointly normal.

Cool!
What is Joint Normality?
▪ E.g. if x1, x2, x3 are jointly normal, any linear combination of x1,
x2, and x3 is normal. So the expression
a1x1 + a2x2 + a3x3,
where a1, a2 and a3 are any constants, is a new random variable
which is normally distributed (see the sketch below).

▪ NB: the reverse isn't true.
▪ If you take a collection of normally distributed x variables, a
linear combination of those variables may not be normal.
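
A minimal sketch of the first point (hypothetical constants a1 = 2, a2 = -3, a3 = 0.5; the simulated normals are independent, hence jointly normal, so the result applies):

* A linear combination of jointly normal variables is normal.
clear
set seed 12345
set obs 10000
generate x1 = rnormal()                // Normal(0, 1)
generate x2 = rnormal(1, 2)            // Normal(1, 4)
generate x3 = rnormal(-1, 0.5)         // Normal(-1, 0.25)
generate z = 2*x1 - 3*x2 + 0.5*x3      // the linear combination
histogram z, normal                    // matches the normal overlay
sktest z                               // skewness/kurtosis normality test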

Following Joint Normality
▪ Why isn't a linear combination of normal variables
always normal?
▪ Surely it should be? No. Not necessarily.
▪ It will definitely be normal if they are jointly normal (as
opposed to only individually normal).
▪ But what is true is:

Property Normal.4:
Any linear combination of independent, identically
distributed normal random variables has a normal
distribution.
You are told β̂ is normal. Proof?

▪ You could do this: you have the tools.
▪ The errors are independent, identically distributed (iid)
Normal(0, σ2) random variables.
▪ VERY brief summary: each β̂j is a linear combination of the
ui's, and thus β̂j is normal (sketched below).
▪ NB: the textbook has an error. It refers to equation 3.62, but
it is actually another equation; find it in the Chapter 3
appendices.
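
A sketch of the argument for the simple regression case (this decomposition is standard; the multiple regression version in the Chapter 3 appendices has the same structure):

β̂1 = β1 + Σi wi·ui,  where  wi = (xi − x̄) / Σj (xj − x̄)2

Conditional on the x's, the wi are constants, so β̂1 is a linear
combination of iid Normal(0, σ2) errors, and by Property Normal.4 it
is normally distributed.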

NOT EXAMINABLE

