
EC212: Introduction to Econometrics

Multiple Regression: Estimation


(Wooldridge, Ch. 3)

Tatiana Komarova

London School of Economics

Summer 2021

1
1. Motivation for multiple regression
(Wooldridge, Ch. 3.1)

2
Example: Wage equation

• Extend simple regression for wage as

log(wage) = β0 + β1 educ + β2 IQ + u

where IQ is IQ score

• Primarily interested in β1 but β2 is of some interest, too

• Now IQ is taken out of the error term. If IQ is good proxy for


intelligence, this may lead to more convincing estimate of
causal effect of schooling

3
Model with two regressors

• Generally, we can write regression model with two regressors

y = β0 + β1 x1 + β2 x2 + u

where β0 is intercept, β1 measures change in y with respect


to x1 , holding other factors (u and x2 ) fixed, and β2 measures
change in y with respect to x2 , holding other factors (u and
x1 ) fixed

• In this model, key assumption about how u is related to x1


and x2 is
E (u|x1 , x2 ) = 0
i.e. for any values of x1 and x2 in population, conditional
expectation of u is zero

4
Back to example

• In wage example, this assumption is E (u|educ, IQ) = 0. Now


u no longer contains intelligence (we hope). So this condition
has better chance of being true. In simple regression, we had
to assume IQ and educ are unrelated to justify leaving IQ in
error term

• Other factors such as experience and “motivation” are part of


u. Motivation is very difficult to measure. Experience is easier:

log(wage) = β0 + β1 educ + β2 IQ + β3 exper + u

5
Model with k regressors

• Multiple linear regression model is written as

y = β0 + β1 x1 + · · · + βk xk + u

where β0 is intercept and β1 , . . . , βk are slope parameters


(i.e. (k + 1) unknown parameters in total)

• Key assumption is

E (u|x1 , . . . , xk ) = 0

• Provided we are careful, we can make this condition closer to


being true by “controlling for” more variables. In wage
example, we “control for” IQ when estimating return to
education

6
• Multiple regression allows us to incorporate different factors to
explain behavior of y

• Also multiple regression is useful to allow more flexible


functional forms. For example

log(wage) = β0 + β1 educ + β2 IQ + β3 exper + β4 exper² + u

so that exper is allowed to have quadratic effect on log(wage)

• In this case, we set x1 = educ, x2 = IQ, x3 = exper , and


x4 = exper². Note that x4 is a nonlinear function of x3

• See App. A.4 for review on quadratic function

7
• We already know that 100 · β1 is percent change in wage
when educ increases by one year. 100 · β2 has similar
interpretation (for one point increase in IQ)

• β3 and β4 are harder to interpret, but we can use calculus to


get slope of log(wage) with respect to exper

∂ log(wage)/∂exper = β3 + 2β4 exper
• Multiply by 100 to get percentage effect
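
• A minimal Stata sketch of this calculation (assuming WAGE2.dta is in memory; the generated variable expersq and the evaluation point exper = 10 are illustrative choices, not from the slides):

. gen expersq = exper^2
. reg lwage educ IQ exper expersq
. display 100*(_b[exper] + 2*_b[expersq]*10)   // approx. % effect of one more year of exper, evaluated at exper = 10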

8
2. Mechanics and interpretation of
OLS
(Wooldridge, Ch. 3.2)

9
OLS for multiple regression

• Suppose we have x1 and x2 (k = 2) along with y . We want to


fit the equation

ŷi = β̂0 + β̂1 xi1 + β̂2 xi2

by data {(yi , xi1 , xi2 ) : i = 1, . . . , n}

• Now regressors have two subscripts: i is observation number


and the second subscript (1 or 2 in this case) labels the
particular regressor. For example,

xi1 = educi for i = 1, . . . , n


xi2 = IQi for i = 1, . . . , n

10
• As in simple regression case, there are different ways to derive OLS
estimator. We choose β̂0 , β̂1 , and β̂2 (so three unknowns) to
minimize sum of squared residuals
∑_{i=1}^n (yi − β̂0 − β̂1 xi1 − β̂2 xi2)²

• Case with k regressors is easy to state: choose k + 1 values


β̂0 , β̂1 , . . . , β̂k to minimize
∑_{i=1}^n (yi − β̂0 − β̂1 xi1 − · · · − β̂k xik)²

• Later we discuss condition to have unique solution. Stata is


good at finding solution
• Terminology: We say β̂0 , β̂1 , . . . , β̂k are the OLS estimates
from the regression
y on x1 , x2 , . . . , xk
11
OLS regression line

• OLS regression line is written as

ŷ = β̂0 + β̂1 x1 + · · · + β̂k xk

• Slope coefficients now explicitly have ceteris paribus


interpretations

• For example, if k = 2, then

∆ŷ = β̂1 ∆x1 + β̂2 ∆x2

which allows us to compute how predicted y changes when x1


and x2 change by any amount

12
• What if we “hold x2 fixed”? Then

β̂1 = ∆ŷ/∆x1   if ∆x2 = 0

i.e., β̂1 is slope of ŷ with respect to x1 when x2 is held fixed

• Similarly
β̂2 = ∆ŷ/∆x2   if ∆x1 = 0
• We call β̂1 and β̂2 partial effects
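
• As a worked example, using the wage estimates reported on the following slides (β̂1 = 0.039 for educ, β̂2 = 0.0059 for IQ): holding IQ fixed (∆x2 = 0), one more year of education gives ∆ŷ = 0.039, i.e. predicted wage about 3.9% higher; holding educ fixed, a 10-point increase in IQ gives ∆ŷ = 0.0059 · 10 ≈ 0.059, about 5.9%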

13
Example: Regress log(wage) on educ (WAGE2.dta)

. reg lwage educ

Source SS df MS Number of obs = 935


F( 1, 933) = 100.70
Model 16.1377042 1 16.1377042 Prob > F = 0.0000
Residual 149.518579 933 .160255712 R-squared = 0.0974
Adj R-squared = 0.0964
Total 165.656283 934 .177362188 Root MSE = .40032

lwage Coef. Std. Err. t P>|t| [95% Conf. Interval]

educ .0598392 .0059631 10.03 0.000 .0481366 .0715418


_cons 5.973063 .0813737 73.40 0.000 5.813366 6.132759

14
Multiple regression: log(wage) on educ and IQ
. reg lwage educ IQ

Source SS df MS Number of obs = 935


F( 2, 932) = 69.42
Model 21.4779447 2 10.7389723 Prob > F = 0.0000
Residual 144.178339 932 .154697788 R-squared = 0.1297
Adj R-squared = 0.1278
Total 165.656283 934 .177362188 Root MSE = .39332

lwage Coef. Std. Err. t P>|t| [95% Conf. Interval]

educ .0391199 .0068382 5.72 0.000 .0256998 .05254


IQ .0058631 .0009979 5.88 0.000 .0039047 .0078215
_cons 5.658288 .0962408 58.79 0.000 5.469414 5.847162

. corr educ IQ
(obs=935)

educ IQ

educ 1.0000
IQ 0.5157 1.0000
15
• Results

log(wage)^ = 5.973 + 0.060 educ
log(wage)^ = 5.658 + 0.039 educ + 0.0059 IQ

• Estimated return to one year of education falls from 6.0% to


3.9% when we control for differences in IQ

• To interpret multiple regression, we do this thought


experiment: Take two people A and B with same IQ score.
Suppose person B has one more year of schooling than person
A. Then we predict B's wage to be about 3.9% higher

• Simple regression does not allow us to compare people with


same IQ score. Larger estimated return from simple regression
is because we are attributing part of IQ effect to education

16
• Not surprisingly, there is nontrivial positive correlation
between educ and IQ: Corr (educi , IQi ) = 0.516

• Multiple regression “partials out” other regressors when looking


at effect of educ. We can show that β̂1 measures effect of
educ on log(wage) once correlation between educ and IQ is
partialled out

• Another IQ point is worth much less than one year of


education. Holding educ fixed, 10 more IQ points increases
predicted wage by about 5.9%

• Beauty of multiple regression is that it gives us ceteris paribus


interpretation without having to find two people with same
value of IQ who differ in education by one year. OLS
automatically does it for us

17
Fitted values and residuals

• For each i, fitted value is

ŷi = β̂0 + β̂1 xi1 + · · · + β̂k xik

and residual is
ûi = yi − ŷi

18
Algebraic properties

• (1) Residuals always average to zero


∑_{i=1}^n ûi = 0

• (2) Each regressor has zero sample covariance (or


correlation) with residuals
∑_{i=1}^n xij ûi = 0 for j = 1, . . . , k

• These properties follow from the first order conditions of OLS


• These properties imply, e.g., that the sample average of the ŷi equals ȳ, and that ∑_{i=1}^n ŷi ûi = 0
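
• A minimal Stata check of these properties (assuming a regression such as reg lwage educ IQ on WAGE2.dta has just been run):

. predict lwagehat, xb               // fitted values
. predict uhat, resid                // OLS residuals
. summarize uhat                     // sample mean is zero (up to rounding)
. correlate uhat educ IQ lwagehat    // residuals uncorrelated with regressors and fitted values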

19
Goodness-of-fit
• As with simple regression, it can be shown that

SST = SSE + SSR

where SST , SSE and SSR are total, explained and residual
sum of squares

• We define R-squared as before

R² = SSE/SST = 1 − SSR/SST
• Property: 0 ≤ R² ≤ 1, but using same dependent
variable, R² never falls when another regressor is added
to regression (adding another x cannot increase SSR)

• Thus, if we focus on R 2 , we might include silly variables


20
Adjusted R 2

• One way to overcome this problem with R² is to use the adjusted R²

R̄² = 1 − [SSR/(n − k − 1)] / [SST/(n − 1)]
• When more regressors are added, SSR falls, but so does
df = n − k − 1. R̄ 2 can increase or decrease

• Goodness-of-fit of different multiple regression models can be


compared by R̄ 2

• See Ch. 6.3 for further discussion
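
• Stata reports R̄² as "Adj R-squared"; as a sketch, it can also be computed from the stored results of reg (here using the multiple regression of log(wage) on educ and IQ shown earlier):

. quietly reg lwage educ IQ
. scalar sst = e(mss) + e(rss)                     // total sum of squares
. display 1 - (e(rss)/e(df_r))/(sst/(e(N) - 1))    // adjusted R-squared by the formula above
. display e(r2_a)                                  // Stata's stored value, 0.1278 here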

21
Compare simple and multiple regression estimates

• Compare simple and multiple OLS regression lines

ỹ = β̃0 + β̃1 x1
ŷ = β̂0 + β̂1 x1 + β̂2 x2

where tilde (~) denotes simple regression and hat (ˆ) denotes
multiple regression (estimated on the same data)

• Question: Is there simple relationship between β̃1 (which does


not control for x2 ) and β̂1 (which does)?

22
• Yes, but we need to define another simple regression. Let δ̃1
be the slope from regression

xi2 on xi1

(note: x2 plays the role of dependent variable)

• It is always true for any sample that

β̃1 = β̂1 + β̂2 δ̃1

23
Case 1: β̂2 > 0, x1 & x2 are positively correlated

• Positive correlation between x1 and x2 implies δ̃1 > 0. Thus, if


β̂2 > 0, then β̂2 δ̃1 > 0 and

β̃1 = β̂1 + β̂2 δ̃1


= β̂1 + (+)(+) > β̂1

i.e. slope estimate of x1 gets smaller if x2 is added

24
Example: log(wage) = β0 + β1 educ + β2 IQ + u

• We got β̃1 = .060 > .039 = β̂1

. reg IQ educ

Source SS df MS Number of obs = 935


F( 1, 933) = 338.02
Model 56280.9277 1 56280.9277 Prob > F = 0.0000
Residual 155346.531 933 166.502177 R-squared = 0.2659
Adj R-squared = 0.2652
Total 211627.459 934 226.581862 Root MSE = 12.904

IQ Coef. Std. Err. t P>|t| [95% Conf. Interval]

educ 3.533829 .1922095 18.39 0.000 3.156616 3.911042


_cons 53.68715 2.622933 20.47 0.000 48.53962 58.83469

• Indeed δ̃1 = 3.53 > 0 from above
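
• Plugging the reported estimates into β̃1 = β̂1 + β̂2 δ̃1 confirms the identity: 0.03912 + 0.00586 × 3.534 ≈ 0.0598 = β̃1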

25
Case 2: β̂2 > 0, x1 & x2 are negatively correlated

• Negative correlation between x1 and x2 implies δ̃1 < 0. Thus,


if β̂2 > 0, then β̂2 δ̃1 < 0 and

β̃1 = β̂1 + β̂2 δ̃1


= β̂1 + (+)(−) < β̂1

i.e. slope estimate of x1 gets larger if x2 is added

26
Regression through the origin

• Occasionally, one wants to impose that predicted y is zero


when all xj ’s are zero. This means intercept should be set to
zero, rather than estimated

ỹ = β̃1 x1 + · · · + β̃k xk

• Cost of imposing zero intercept when population intercept is


not zero (i.e. β0 ≠ 0) is severe: all slope estimators are
biased, in general

• Estimating intercept when we do not need to (i.e. β0 = 0)


does not cause bias in slope estimators. We already know this
from simple regression: Nothing in SLR.1-4 prevents
population parameters from being zero

27
3. Expected value of OLS estimators
(Wooldridge, Ch. 3.3)

28
Statistical properties of OLS

• As with simple regression, there is set of assumptions under


which OLS is unbiased

• We also explicitly consider bias caused by omitting regressor


appearing in population model

29
Assumptions MLR (Multiple Linear Regression)
• Assumption MLR.1 (Linear in parameters) In population,
it holds
y = β0 + β1 x1 + · · · + βk xk + u
where βj ’s are parameters and u is error term

• Assumption MLR.2 (Random Sampling) We have a


random sample {(yi , xi1 , . . . , xik ) : i = 1, . . . , n} of size n from
population

• Assumption MLR.3 (No perfect collinearity) No


regressor is constant, and there are no exact linear
relationships among them

• Assumption MLR.4 (Zero conditional mean)

E (u|x1 , . . . , xk ) = 0 for all (x1 , . . . , xk )


30
Assumption MLR.1

• Assumption MLR.1 (Linear in parameters)

In population, it holds

y = β0 + β1 x1 + · · · + βk xk + u

where βj ’s are parameters and u is error term

• y and xj ’s can be nonlinear functions of underlying variables


(e.g. log(y ) and xj2 ), so the model is flexible

31
Assumption MLR.2

• Assumption MLR.2 (Random Sampling)

We have a random sample {(yi , xi1 , . . . , xik ) : i = 1, . . . , n} of


size n from population

• As with SLR.2, this assumption introduces data and implies


data are representative sample from population

• By MLR.1-2, we can write

yi = β0 + β1 xi1 + · · · + βk xik + ui

for i = 1, . . . , n

32
Assumption MLR.3

• Assumption MLR.3 (No perfect collinearity)

No regressor is constant, and there are no exact linear


relationships among them

• The need to rule out cases where {xij : i = 1, . . . , n} has no


variation is clear from simple regression

• New part to this assumption because of multiple regressors:


We must rule out the (extreme) case that one of the regressors is an
exact linear function of others

33
Perfect collinearity

• If, say, xi1 is an exact linear function of xi2 , . . . , xik in sample,


we say model suffers from perfect collinearity

• Under perfect collinearity, there are no unique OLS estimators.


Stata will indicate the problem

• Usually perfect collinearity arises from bad specification of


model. Small sample size can also be reason (e.g. unluckily
educi = 2experi for all i)

34
Example: Same variable in different units

• Do not include same variable in model measured in different


units

• For example, in CEO salary equation, it would make no sense


to include firm sales measured in dollars along with sales
measured in millions of dollars (no new information)

• Another example: Return on equity should be included as


percent or proportion, but not both

35
• Also be careful with functional forms

• For example, following does not work

log(cons) = β0 + β1 log(inc) + β2 log(inc²) + u

because log(inc²) = 2 log(inc)

• Instead we probably mean something like

log(cons) = β0 + β1 log(inc) + β2 [log(inc)]² + u

With this choice, x2 = x1² is an exact nonlinear function of


x1 , but this is allowed in MLR.3

36
One more example
• Consider

voteA = β0 + β1 exA + β2 exB + β3 exTotal + u

where exA and exB are campaign spendings by Candidates A


and B, exTotal is total spending

• Problem is: by definition

exA + exB = exTotal

• One of three variables has to be dropped (Stata automatically


does this, but better to do by yourself)
• On the other hand, share of expenditure shareA = exA/(exA + exB)
can be included along with exA and exB because shareA is
nonlinear function of exA and exB
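
• As a hypothetical illustration (variable names as in the voting example above, not an actual dataset), Stata detects this automatically: if exTotal is generated as exA + exB and all three are included, one regressor is dropped with a note such as "omitted because of collinearity":

. gen exTotal = exA + exB        // exact linear combination of exA and exB
. reg voteA exA exB exTotal      // Stata omits one of the collinear regressors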
37
Further remark on MLR.3

• Key: MLR.3 does not say regressors have to be uncorrelated.


MLR.3 only rules out perfect correlation in sample, i.e.
correlations of ±1

• Again in practice violations of MLR.3 are rare unless mistake


has been made in specifying model

• In equation like

log(wage) = β0 + β1 educ + β2 IQ + β3 exper + u

we fully expect correlation among regressors

• Multiple regression allows us to estimate ceteris paribus


effects even under correlation among xj ’s

38
Assumption MLR.4

• Assumption MLR.4 (Zero conditional mean)

E (u|x1 , . . . , xk ) = 0 for all (x1 , . . . , xk )

• If u is correlated with any of xj ’s, MLR.4 is violated

• Often hope is that if our focus is on, say, x1 , we can include


enough other variables in x2 , . . . , xk to make MLR.4 true or
close to true

• When MLR.4 holds, we say x1 , . . . , xk are exogenous


regressors

• If xj is correlated with u, we often say xj is an endogenous


regressor (although this name comes from another context)

39
Example: Effects of class size on student performance

• Consider regression for standardized test score

score = β0 + β1 classize + β2 income + u

• Even at same income level, families differ in their interest and


concern about children’s education. Family support and
student motivation are in u. Are these correlated with class
size even though we have included income? (Probably)

40
Theorem: Unbiasedness of OLS

• Under Assumptions MLR.1-4, OLS estimators are unbiased

E (β̂j ) = βj

for each j = 0, 1, . . . , k

• This result holds for any value of βj , including zero

• See Appendix 3A for proof

41
Inclusion of irrelevant variables

• It is important to see that the unbiasedness result allows for βj to


be any value, including zero

• Consider

log(wage) = β0 + β1 educ + β2 exper + β3 motheduc + u

where MLR.1-4 hold

• Suppose that β3 = 0, but we do not know that. We estimate


full model by OLS

log(wage)^ = β̂0 + β̂1 educ + β̂2 exper + β̂3 motheduc

42
• We automatically know from unbiasedness result that

E (β̂j ) = βj for j = 0, 1, 2
E (β̂3 ) = 0

• Including irrelevant variables (regressors with zero coefficients)


does not cause bias in any coefficients

• In other words, overspecifying the model causes no bias

43
Omitted variable bias (OVB)

• Leaving a variable out when it should be included in multiple


regression is serious problem

• Consider the case where correct model has two explanatory


variables (satisfying MLR.1-4)

y = β0 + β1 x1 + β2 x2 + u

• If we regress y on x1 and x2 , we know resulting OLS


estimators will be unbiased. But suppose we omit x2 and use
simple regression of y on x1

ỹ = β̃0 + β̃1 x1

• In most cases, we omit x2 because we cannot collect data on it

44
Derivation of OVB

• We can easily derive bias in β̃1 conditional on the sample


outcomes X = {(xi1 , xi2 ) : i = 1, . . . , n}

• We already have relationship between β̃1 and multiple


regression estimator β̂1

β̃1 = β̂1 + β̂2 δ̃1

where β̂2 is multiple regression estimator of β2 and δ̃1 is slope


coefficient in auxiliary regression

x2 on x1

45
• Now use the fact that β̂1 and β̂2 are unbiased conditional on X

E (β̂1 ) = β1
E (β̂2 ) = β2

• Since δ̃1 is a function of {(xi1 , xi2 ) : i = 1, . . . , n}, conditional


on X

E (β̃1 ) = E (β̂1 ) + E (β̂2 )δ̃1


= β1 + β2 δ̃1

• Therefore, conditional on X

Bias(β̃1 ) = E (β̃1 ) − β1 = β2 δ̃1

• Recall that δ̃1 has same sign as sample correlation


Corr (xi1 , xi2 )

46
When does β̃1 happen to be unbiased?

• Simple regression estimator β̃1 is unbiased in two cases

• (1) β2 = 0. But this means x2 does not appear in model, so


simple regression is right thing to do

• (2) δ̃1 = 0 or Corr (xi1 , xi2 ) = 0

• If β2 ≠ 0 and Corr (xi1 , xi2 ) ≠ 0, then β̃1 is generally biased

• We do not know β2 and only have vague idea about size of δ̃1 .
But we can often guess sign of bias

47
Bias in simple regression estimator of β1

• OVB formula: Conditional on X

Bias(β̃1 ) = E (β̃1 ) − β1 = β2 δ̃1

• Sign of bias

                    Corr(x1, x2) > 0     Corr(x1, x2) < 0
  β2 > 0            Positive Bias        Negative Bias
  β2 < 0            Negative Bias        Positive Bias
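
• A minimal simulation sketch (not from the slides) illustrating the first cell of the table, with β2 > 0 and Corr(x1, x2) > 0, so the short regression is biased upward:

. clear
. set obs 10000
. set seed 123
. gen x1 = rnormal()
. gen x2 = 0.5*x1 + rnormal()           // x1 and x2 positively correlated
. gen y = 1 + 2*x1 + 3*x2 + rnormal()   // true model: beta1 = 2, beta2 = 3
. reg y x1                              // short regression: slope near 2 + 3(0.5) = 3.5
. reg y x1 x2                           // long regression: slope on x1 near 2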

48
Example: Omitted ability bias

• Consider

log(wage) = β0 + β1 educ + β2 abil + u

where abil is “ability”

• Essentially by definition β2 > 0. We also think

Corr (educ, abil) > 0

so that higher ability people get more education on average

49
• In this scenario

E (β̃1 ) = β1 + β2 δ̃1
= β1 + (+)(+) > β1

so there is upward bias in simple regression. Failure to control


for ability leads to (on average) overestimating return to
education

• Remember, for particular sample, we never know whether


β̃1 > β1. But we should be very hesitant to trust a procedure
that produces a large bias on average

50
Example: Effects of tutoring program on student
performance
• Consider
GPA = β0 + β1 tutor + β2 abil + u
where tutor is hours spent in tutoring.
• Again β2 > 0. Suppose that students with lower ability tend
to use more tutoring

Corr (tutor , abil) < 0

• In this scenario,

E (β̃1 ) = β1 + β2 δ̃1
= β1 + (+)(−) < β1

so that failure to account for ability leads to underestimating the


effect of tutoring
51
4. Variance of OLS estimators
(Wooldridge, Ch. 3.4)

52
Assumptions so far

• MLR.1: y = β0 + β1 x1 + · · · + βk xk + u

• MLR.2: random sampling from the population

• MLR.3: no perfect collinearity in the sample

• MLR.4: E (u|x1 , . . . , xk ) = 0

• Under MLR.3 we can compute OLS estimates

• Other assumptions ensure that OLS is unbiased

• To get Var (β̂j ), we add simplifying assumption,


homoskedasticity

53
Assumption MLR.5

• Assumption MLR.5 (Homoskedasticity)

Variance of u does not change with any of x1 , . . . , xk

Var (u|x1 , . . . , xk ) = Var (u) = σ 2

• This assumption can never be guaranteed. We impose this for


now to get simple formulas

• MLR.1-4 imply

E (y |x1 , . . . , xk ) = β0 + β1 x1 + · · · + βk xk

and when we add MLR.5

Var (y |x1 , . . . , xk ) = Var (u|x1 , . . . , xk ) = σ 2

54
Example: Savings equation

• Consider savings equation

sav = β0 + β1 inc + β2 famsize + β3 pareduc + u

where famsize is size of family and pareduc is total parents’


education

• MLR.5 means that the variance in sav cannot depend on income,


family size, or parents' education

• Later we will show how to relax MLR.5, and how to test


whether it is true

55
Formula for Var (β̂j )
• Focus on slope (different formula is needed for intercept)

• As before, we compute variance conditional on the values of


regressors

• We need to define two quantities associated with each xj .


First is total variation of xj in sample
SSTj = ∑_{i=1}^n (xij − x̄j)²

• Second is measure of correlation between xj and other


regressors, which is

Rj² = R² from the regression of xj on the other regressors

(note: y plays no role here)


56
Theorem: Sampling variance of OLS estimators

• Under Assumptions MLR.1-5, and conditional on X,

Var(β̂j) = σ² / [SSTj(1 − Rj²)]

for j = 1, . . . , k, where
SSTj = ∑_{i=1}^n (xij − x̄j)²
Rj² = R² from the regression of xj on the other regressors

57
Remark on theorem

• All five assumptions are needed to get this formula

• Note: Rj2 = 1 is ruled out by Assumption MLR.3

• Any value 0 ≤ Rj2 < 1 is permitted. As Rj2 gets closer to one,


xj is more linearly related to other regressors
• If MLR.5 is violated, variance formula Var(β̂j) = σ²/[SSTj(1 − Rj²)] is
generally incorrect for all j = 1, . . . , k

58
Remark on variance formula

• Variance formula

Var(β̂j) = σ² / [SSTj(1 − Rj²)]

has three components

• σ 2 and SSTj are familiar from simple regression. The third


component 1 − Rj2 is new to multiple regression

• As error variance σ 2 = Var (ui ) decreases, Var (β̂j ) decreases.


One way to reduce error variance is to take more stuff out of
the error, i.e. add more regressors

59
Effect of SSTj

• As total sample variation SSTj in xj increases, Var (β̂j )


decreases. It is easier to estimate how xj affects y if we see
more variation in xj

• As we mentioned earlier, SSTj /n (or SSTj /(n − 1)) is sample


variance of {xij : i = 1, . . . , n}. So we can say

SSTj ≈ nσj2

where σj2 = Var (xj ) is variance of xj

• We can increase SSTj by increasing sample size

60
Effect of Rj2

• As Rj2 → 1, Var (β̂j ) → ∞. Rj2 measures how linearly related


xj is to other regressors

• We get smallest variance for β̂j when Rj2 = 0

Var(β̂j) = σ²/SSTj

which looks just like simple regression formula

• If xj is unrelated to all other regressors, it is easier to estimate


its ceteris paribus effect on y

• Rj2 = 0 is very rare. In fact, Rj2 ≈ 1 is somewhat common.


This can cause problems for getting sufficiently precise
estimate of βj

61
Multicollinearity

• Loosely, Rj2 “close” to one is called the “problem” of


multicollinearity

• Unfortunately, we cannot define what we mean by “close” that


is relevant for all situations. We have ruled out the case of
perfect collinearity Rj2 = 1

• Here is important point: One often hears discussions of


multicollinearity as if high correlation among regressors is
violation of an assumption we made. But it does not violate
any of Assumptions MLR.1-5

• So regardless of multicollinearity, we still have E (β̂j ) = βj and


variance formula is correct

62
• In fact, formula is doing its job: It shows that if Rj2 is “close”
to one, Var (β̂j ) might be very large

• If Rj2 is “close” to one, xj does not have much sample variation


separate from other regressors. We are trying to estimate
effect of xj on y , holding x1 , . . . , xj−1 , xj+1 , . . . , xk fixed, but
data may not allow us to do that very precisely

• Because multicollinearity violates none of our assumptions, it


is essentially impossible to state hard rules about when it is a
“problem”

63
• Value of Rj² per se is not important. Ultimately what is
important is Var(β̂j)

• For Var (β̂j ), large Rj2 can be offset by large SSTj , which
grows roughly linearly with sample size n

• At this point, we have no way of knowing whether Var (β̂j ) is


“too large” for the estimate β̂j to be useful. Only when we
discuss confidence intervals and hypothesis testing (in Ch. 4)
will this become apparent
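
• In practice, one way to inspect the Rj² (a side note, not covered in these slides): after reg, estat vif reports the variance inflation factor 1/(1 − Rj²) for each regressor, and Rj² itself comes from the auxiliary regression directly:

. quietly reg lwage educ IQ exper
. estat vif                       // VIF_j = 1/(1 - Rj^2) for each regressor
. quietly reg educ IQ exper       // auxiliary regression for educ
. display e(r2)                   // Rj^2 for educ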

64
Correlation among control variables

• Consider
y = β0 + β1 x1 + β2 x2 + β3 x3 + u
where β1 is coefficient of interest. Assume x2 and x3 act as
controls so that we hope to get good ceteris paribus estimate
of the effect of x1. Such controls are often highly correlated. (E.g. x2 and
x3 are different test scores)

• Key is: correlation between x2 and x3 has nothing to do with


Var (β̂1 ). It is only correlation of x1 with (x2 , x3 ) that matters

65
Example

• To determine whether communities with larger minority


populations are discriminated against in lending

percapproved = β0 + β1 percminority
+β2 avginc + β3 avghouseval + u,

where β1 is of interest

• avginc and avghouseval might be highly correlated. But we


do not care whether we can precisely estimate β2 or β3

66
Variance in misspecified models

• As with bias calculations, we can study variances of OLS


estimators in misspecified models

• Consider
y = β0 + β1 x1 + β2 x2 + u
where Assumptions MLR.1-5 hold true

• We run “short” regression, y on x1 , and also “long” regression,


y on x1 , x2

ỹ = β̃0 + β̃1 x1
ŷ = β̂0 + β̂1 x1 + β̂2 x2

67
• From previous analysis, we know: conditional on X,

Var(β̂1) = σ² / [SST1(1 − R1²)]

• What about simple regression OLS? We can show: conditional


on X,
Var(β̃1) = σ²/SST1
• Whenever xi1 and xi2 are correlated, then R1² > 0 and

Var(β̃1) = σ²/SST1 < σ²/[SST1(1 − R1²)] = Var(β̂1)

• By omitting x2 , in fact we get estimator with smaller


variance, even though it is biased (bias-variance tradeoff)

68
Two cases: y = β0 + β1 x1 + β2 x2 + u

• (1) If β2 ≠ 0, then
β̃1 is biased, β̂1 is unbiased, but Var (β̃1 ) < Var (β̂1 )

• (2) If β2 = 0, then
β̃1 and β̂1 are both unbiased and Var (β̃1 ) < Var (β̂1 )

• Case 2 is clear. If β2 = 0, x2 has no (partial) effect on y .


When x2 is correlated with x1 , including it along with x1
makes it more difficult to estimate partial effect of x1 . Simple
regression is clearly preferred

69
Case 1

• Case 1 is more difficult, but there is reason to prefer unbiased


estimator β̂1

• Bias in β̃1 does not systematically change with sample size.


We should assume bias is as large when n = 1000 as when
n = 10

• By contrast, variances of β̃1 and β̂1 both shrink at the rate


1/n. With large sample size, difference between Var (β̃1 ) and
Var (β̂1 ) is less important

70
Estimation of σ 2 and Standard error
• We still need to estimate σ 2 = Var (u). For multiple
regression, its unbiased estimator is
σ̂² = [1/(n − k − 1)] ∑_{i=1}^n ûi² = SSR/(n − k − 1)

• In Stata, square root σ̂ is reported as “Root MSE”

• Note: SSR falls when new regressor is added, but degrees of


freedom n − (k + 1) falls too. So σ̂ can increase or decrease
when new variable is added
• Standard error of slope β̂j is computed as

se(β̂j) = σ̂ / √[SSTj(1 − Rj²)]

• Critical to report this along with coefficient estimate
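
• A sketch verifying this formula against the Stata output on the next slide (WAGE2.dta; the reported se for educ is .007348):

. quietly reg lwage educ IQ exper
. scalar sighat = e(rmse)                   // sigma-hat (Root MSE)
. quietly reg educ IQ exper                 // auxiliary regression for educ
. scalar r2j = e(r2)                        // Rj^2 for educ
. quietly summarize educ
. scalar sstj = r(Var)*(r(N) - 1)           // SSTj = (n - 1) times sample variance
. display sighat/sqrt(sstj*(1 - r2j))       // matches the reported .007348 up to rounding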


71
Example: Wage equation

. reg lwage educ IQ exper

Source SS df MS Number of obs = 935


F( 3, 931) = 60.10
Model 26.876768 3 8.95892266 Prob > F = 0.0000
Residual 138.779515 931 .149065 R-squared = 0.1622
Adj R-squared = 0.1595
Total 165.656283 934 .177362188 Root MSE = .38609

lwage Coef. Std. Err. t P>|t| [95% Conf. Interval]

educ .057108 .007348 7.77 0.000 .0426875 .0715285


IQ .0057856 .0009797 5.91 0.000 .003863 .0077082
exper .0195249 .0032444 6.02 0.000 .0131579 .025892
_cons 5.198085 .1215426 42.77 0.000 4.959556 5.436614

72
5. Efficiency of OLS: Gauss-Markov
theorem
(Wooldridge, Ch. 3.5)

73
Efficiency of OLS

• Why do we use the OLS estimator β̂j, rather than some other estimation
method, say β̌j?

• First criterion to compare these estimators would be


unbiasedness. If β̌j is biased, we prefer OLS

• Suppose β̌j is unbiased. Then we prefer estimator with smaller


variance (called efficiency)

• Under MLR.1-5, OLS estimator has smallest variance in


certain class of estimators

74
Gauss-Markov theorem

• Terminology: Estimator β̃j of βj is called linear estimator if


it takes the form of
β̃j = ∑_{i=1}^n wij yi

where the weights wij (i = 1, . . . , n) can be any functions of regressors X

• OLS estimator β̂j can be written in this way, i.e., OLS


estimator is an example of linear estimator
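
• For example, in simple regression (k = 1), β̂1 = ∑_{i=1}^n (xi − x̄) yi / SSTx with SSTx = ∑_{i=1}^n (xi − x̄)², so the OLS slope is linear in the yi with weights wi1 = (xi − x̄)/SSTx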

• Gauss-Markov theorem: Under MLR.1-5, OLS estimator is


the Best Linear Unbiased Estimator (BLUE)

• Here “best” means “smallest variance”

75
• G-M theorem says: under MLR.1-5, if we take any linear
unbiased estimator β̃j , then conditional on X

Var (β̂j ) ≤ Var (β̃j )

for j = 0, . . . , k

• Recall β̂j is unbiased under MLR.1-4

• Implication of G-M theorem: If we insist on linear unbiased


estimators, then we need look no further than OLS

• See Appendix 3A for proof

• If MLR.5 fails, β̂j is not BLUE in general

76
