
ECON2280 Introductory Econometrics

First Term, 2024-2025

Multiple Regression Analysis: Estimation

Fall, 2024

1 / 51
Motivation for Multiple Regression

2 / 51
Motivation for Multiple Regression

▶ The multiple linear regression (MLR) model is defined as

y = β0 + β1 x1 + · · · + βk xk + u,

which tries to explain variable y in terms of variables x1 , · · · , xk .


▶ The terminology for y , (x1 , · · · , xk ), u, and (β0 , β1 , · · · , βk ) is the
same as in the SLR model.
▶ Motivations:
– Incorporate more explanatory factors into the model;
– Explicitly hold fixed other factors that otherwise are in u;
– Allow for more flexible functional forms.

3 / 51
Example: Wage Equation

▶ Suppose
wage = β0 + β1 educ + β2 exper + u

– wage: hourly wage


– educ: years of education
– exper : years of labor market experience
– u: all other factors affecting wage

▶ Now, β1 measures the effect of education, explicitly holding experience
fixed.
▶ If exper is omitted, then E [u|educ] ̸= 0 because educ and exper
are correlated =⇒ β̂1 is biased.

4 / 51
Example: Family Income and Family Consumption
▶ Suppose
cons = β0 + β1 inc + β2 inc 2 + u,

– cons: family consumption


– inc: family income
– inc 2 : family income squared
– u: all other factors affecting cons

▶ Model has two explanatory variables: income and income squared.

▶ Consumption is explained as a quadratic function of income.

▶ To interpret the coefficients:

∂cons/∂inc = β1 + 2β2 inc,
which depends on how much income is already there.
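▶ A short numerical sketch (Python with numpy; the coefficient values are made up purely for illustration) of how the marginal effect β1 + 2β2 inc varies with the income level:

import numpy as np

# Hypothetical coefficient values, chosen only for illustration
beta1, beta2 = 0.80, -0.002   # beta2 < 0: diminishing marginal effect of income

def marginal_effect(inc):
    """Marginal effect of income on consumption: d cons / d inc = beta1 + 2*beta2*inc."""
    return beta1 + 2 * beta2 * inc

for inc in (10, 50, 100):     # income levels (e.g., in $1,000s)
    print(f"inc = {inc:3d}: d cons/d inc = {marginal_effect(inc):.3f}")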

5 / 51
Mechanics and Interpretation of
Ordinary Least Squares

6 / 51
Obtaining the OLS Estimates

▶ Suppose we have a random sample


{(xi1 , · · · , xik , yi ) : i = 1, · · · , n}, where i denotes the observation
number and j = 1, · · · , k indexes the independent variables.
▶ Given (β̂0 , β̂1 , · · · , β̂k ), the residual for observation i is:

ûi = yi − ŷi = yi − β̂0 − β̂1 xi1 − · · · − β̂k xik .

▶ We choose (β̂0 , β̂1 , · · · , β̂k ) to minimize the sum of squared residuals:

min_{β̂0 ,β̂1 ,··· ,β̂k} Σ_{i=1}^n ûi² = min_{β̂0 ,β̂1 ,··· ,β̂k} Σ_{i=1}^n (yi − β̂0 − β̂1 xi1 − · · · − β̂k xik )²

▶ The minimizers are the OLS estimates.

7 / 51
Obtaining the OLS Estimates

▶ The OLS estimates are the solution to the first-order conditions (FOCs):

Σ_{i=1}^n (yi − β̂0 − β̂1 xi1 − · · · − β̂k xik ) = 0,
Σ_{i=1}^n xi1 (yi − β̂0 − β̂1 xi1 − · · · − β̂k xik ) = 0,
...
Σ_{i=1}^n xik (yi − β̂0 − β̂1 xi1 − · · · − β̂k xik ) = 0,

which can be solved by any standard econometric software package.
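▶ A minimal sketch (Python with numpy, simulated data; illustrative only) of solving the FOCs, i.e., the k + 1 normal equations X'X β̂ = X'y:

import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # constant plus k regressors
beta = np.array([1.0, 0.5, -0.3])                            # true parameters (made up)
y = X @ beta + rng.normal(size=n)                            # simulated data

# OLS estimates: solve the k+1 normal equations X'X b = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# The FOCs say the residuals are orthogonal to every column of X
u_hat = y - X @ beta_hat
print(beta_hat)
print(X.T @ u_hat)   # all entries are (numerically) zero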

8 / 51
Interpreting the OLS Regression Equation
▶ In the MLR model y = β0 + β1 x1 + · · · + βk xk + u:

βj = ∂y/∂xj ,

which means “by how much does the dependent variable change if
the j-th independent variable is increased by one unit, holding all
other independent variables and the error term constant”.
▶ The multiple linear regression model manages to hold the values of
other explanatory variables fixed even if, in reality, they are
correlated with the explanatory variable under consideration.
▶ dy/dxj |_{(x1 ,··· ,xj−1 ,xj+1 ,··· ,xk ) fixed} = βj + ∂u/∂xj .

▶ We still need to assume that u does not change with xj conditional on
(x1 , · · · , xj−1 , xj+1 , · · · , xk ). The zero conditional mean assumption
is E [u|x1 , · · · , xk ] = 0, which is more plausible than E [u|xj ] = 0.

9 / 51
Example: Determinants of College GPA

▶ The fitted regression is

FreshGPA = 1.29 + 0.5 hsGPA + 0.0003 SAT .

– FreshGPA: GPA in freshman year


– hsGPA: high school GPA
– SAT : SAT score

▶ Holding SAT fixed, an increase in high school GPA by 1 point is


associated with a 0.5 point higher freshman year GPA.
▶ Or: If we compare two students, A and B, with the same SAT , but
the hsGPA of A is one point higher, we predict A to have a
FreshGPA that is 0.5 higher than that of B.

10 / 51
A "Partialling Out" Interpretation of Multiple Regression
▶ One can show that the estimated coefficient of an explanatory
variable in a multiple regression can be obtained in two steps:
1. Regress the explanatory variable on all other explanatory
variables.
2. Regress y on the residuals from this regression.

▶ Mathematically, suppose we regress y on the constant 1, x1 and x2
(denoted as ŷ = β̂0 + β̂1 x1 + β̂2 x2 ), and want to get β̂1 .
1. Regress xi1 on xi2 : x̂i1 = δ̂0 + δ̂1 xi2 ; let r̂i1 be the residual from this regression.
2. Regress yi on r̂i1 : ŷi = α̂0 + α̂1 r̂i1

=⇒ α̂1 = Σ_{i=1}^n r̂i1 yi / Σ_{i=1}^n r̂i1² = β̂1

▶ From Step 1, xi1 = x̂i1 + r̂i1 with x̂i1 = δ̂0 + δ̂1 xi2 , Σ_{i=1}^n r̂i1 = 0,
Σ_{i=1}^n xi2 r̂i1 = 0, and Σ_{i=1}^n x̂i1 r̂i1 = 0.
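▶ A minimal numerical check of the partialling-out result (a Python/numpy sketch on simulated data; all names and parameter values are illustrative):

import numpy as np

rng = np.random.default_rng(1)
n = 500
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)          # x1 correlated with x2
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)

# Full regression of y on (1, x1, x2): beta1_hat is the coefficient on x1
X = np.column_stack([np.ones(n), x1, x2])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Step 1: regress x1 on (1, x2) and keep the residual r1
Z = np.column_stack([np.ones(n), x2])
r1 = x1 - Z @ np.linalg.solve(Z.T @ Z, Z.T @ x1)

# Step 2: the slope of y on r1 equals beta1_hat
alpha1 = (r1 @ y) / (r1 @ r1)
print(beta_hat[1], alpha1)   # identical up to rounding error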

11 / 51
A "Partialling Out" Interpretation of Multiple Regression
▶ The FOC w.r.t. β̂1 is Σ_{i=1}^n xi1 (yi − β̂0 − β̂1 xi1 − β̂2 xi2 ) = 0

=⇒ Σ_{i=1}^n (δ̂0 + δ̂1 xi2 + r̂i1 )(yi − β̂0 − β̂1 xi1 − β̂2 xi2 )
= δ̂0 Σ_{i=1}^n ûi + δ̂1 Σ_{i=1}^n xi2 ûi + Σ_{i=1}^n r̂i1 (yi − β̂0 − β̂1 xi1 − β̂2 xi2 )
= −β̂0 Σ_{i=1}^n r̂i1 − β̂2 Σ_{i=1}^n xi2 r̂i1 + Σ_{i=1}^n r̂i1 [yi − β̂1 (x̂i1 + r̂i1 )]
= Σ_{i=1}^n r̂i1 (yi − β̂1 r̂i1 ) = 0,

where Σ_{i=1}^n ûi = 0 and Σ_{i=1}^n xi2 ûi = 0 are obtained from the FOCs
w.r.t. β̂0 and β̂2 , respectively.

12 / 51
Why does This Procedure Work?

▶ This procedure is usually called the Frisch-Waugh theorem.

▶ The residuals from the first regression are the part of the explanatory
variable that is uncorrelated with the other explanatory variables.
▶ The slope coefficient of the second regression therefore represents
the isolated (or pure) effect of the explanatory variable on the
dependent variable.
▶ Recall that in the SLR,

β̂1 = Σ_{i=1}^n (xi − x̄ ) yi / Σ_{i=1}^n (xi − x̄ )².

In the MLR, we replace (xi − x̄ ) by r̂i1 . Actually, in the SLR,
(xi − x̄ ) is the residual from the regression of xi on all other
explanatory variables, which include only the constant 1.

13 / 51
Properties of OLS on Any Sample of Data

▶ Algebraic properties of OLS regression:

– Σ_{i=1}^n ûi = 0: deviations from the fitted regression "plane" sum
up to zero.
– ȳ = β̂0 + β̂1 x̄1 + · · · + β̂k x̄k : the sample averages of y and of the
regressors lie on the fitted regression plane.
– Σ_{i=1}^n xij ûi = 0, j = 1, ..., k: correlations between deviations
and regressors are zero.

▶ These properties are corollaries of the FOCs for the OLS estimates.

14 / 51
Goodness-of-Fit

▶ Decomposition of total variation:

SST = SSE + SSR

▶ R-squared:

R² = SSE/SST = 1 − SSR/SST.
▶ Alternative expression for R-squared [proof not required]:

R² = [Σ_{i=1}^n (yi − ȳ )(ŷi − ŷ̄ )]² / [Σ_{i=1}^n (yi − ȳ )² · Σ_{i=1}^n (ŷi − ŷ̄ )²]
   = Cov̂ (y , ŷ )² / [Var̂ (y ) Var̂ (ŷ )] = Corr̂ (y , ŷ )²,

i.e., R-squared is equal to the squared correlation coefficient between
the actual and the predicted value of the dependent variable.
▶ Because Corr̂ (y , ŷ ) ∈ [−1, 1], R² ∈ [0, 1].
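▶ A minimal sketch (Python/numpy on simulated data; illustrative only) checking that R² computed from SSE/SST equals the squared sample correlation between y and ŷ:

import numpy as np

rng = np.random.default_rng(2)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.7, -0.4]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat

sst = np.sum((y - y.mean()) ** 2)
sse = np.sum((y_hat - y.mean()) ** 2)    # explained sum of squares
ssr = np.sum((y - y_hat) ** 2)           # residual sum of squares

r2_a = sse / sst
r2_b = 1 - ssr / sst
r2_c = np.corrcoef(y, y_hat)[0, 1] ** 2  # squared sample correlation
print(r2_a, r2_b, r2_c)                  # all three agree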

15 / 51
R² Cannot Decrease When One More Regressor Is Added
▶ SSR with k and k + 1 regressors:

SSRk = min_{β̂0 ,β̂1 ,··· ,β̂k} Σ_{i=1}^n (yi − β̂0 − β̂1 xi1 − · · · − β̂k xik )²

SSRk+1 = min_{β̂0 ,β̂1 ,··· ,β̂k ,β̂k+1} Σ_{i=1}^n (yi − β̂0 − β̂1 xi1 − · · · − β̂k xik − β̂k+1 xi,k+1 )²

▶ Treat SSRk+1 as a function of β̂k+1 , i.e., for each value of β̂k+1 , we
minimize the objective function of SSRk+1 with respect to
(β̂0 , β̂1 , ..., β̂k ). Denote the resulting function as SSRk+1 (β̂k+1 ).
▶ Obviously, when β̂k+1 = 0, the two objective functions of SSRk and
SSRk+1 are the same, i.e., SSRk+1 (0) = SSRk .
▶ However, we search for the optimal β̂k+1 that minimizes
SSRk+1 (β̂k+1 ). If the minimizer β̂k+1 ̸= 0, then
SSRk+1 (β̂k+1 ) < SSRk+1 (0) = SSRk =⇒ R²_{k+1} > R²_k .
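▶ A small simulation sketch (Python/numpy; the added regressor is pure noise and all names are illustrative) showing that R² cannot fall when a regressor is added:

import numpy as np

def r_squared(y, X):
    """R-squared from regressing y on X (X should include a constant column)."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    u = y - X @ b
    return 1 - u @ u / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(3)
n = 200
x1 = rng.normal(size=n)
y = 1 + 0.5 * x1 + rng.normal(size=n)
noise = rng.normal(size=n)               # irrelevant regressor (true coefficient is zero)

X_k = np.column_stack([np.ones(n), x1])
X_k1 = np.column_stack([X_k, noise])
print(r_squared(y, X_k), r_squared(y, X_k1))   # the second is (weakly) larger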

16 / 51
Example: Explaining Arrest Records
▶ The fitted regression line is

narr86 = 0.712 − 0.150 pcnv − 0.034 ptime86 − 0.104 qemp86
n = 2,725, R² = 0.0413

– narr86: number of times arrested during 1986
– pcnv : proportion (not percentage) of prior arrests that led to
conviction
– ptime86: months spent in prison during 1986
– qemp86: the number of quarters employed in 1986

▶ pcnv : +0.5 =⇒ −0.075, i.e., 7.5 fewer arrests per 100 men.
▶ ptime86: +12 =⇒ −0.408 arrests.
▶ qemp86: +1 =⇒ −0.104, i.e., 10.4 fewer arrests per 100 men;
economic policies are effective.

17 / 51
Example: Explaining Arrest Records

▶ An additional explanatory variable, avgsen (average prior sentence length), is added:

narr86 = 0.707 − 0.151 pcnv + 0.007 avgsen − 0.037 ptime86 − 0.103 qemp86
n = 2,725, R² = 0.0422 (increases only slightly)

▶ A longer average prior sentence increases the number of arrests
(counter-intuitive, but β̂2 ≈ 0).
▶ Limited additional explanatory power, as R-squared increases only by
a little. (Why? β̂2 ≈ 0.)
▶ General remark on R-squared: even if R-squared is small (as in the
given example), the regression may still provide good estimates (i.e.,
small s.e.'s) of ceteris paribus effects.

18 / 51
The Expected Value of the OLS Estimators

19 / 51
Standard Assumptions for the MLR Model

▶ Assumption MLR.1 (Linear in Parameters):

y = β0 + β1 x1 + · · · + βk xk + u.

– In the population, the relationship between y and x is linear.


– The “linear” in linear regression means “linear in parameter”.

▶ Assumption MLR.2 (Random Sampling): The data


{(xi1 , · · · , xik , yi ) : i = 1, ..., n} is a random sample drawn from the
population, i.e., each data point follows the population equation,

yi = β0 + β1 xi1 + · · · + βk xik + ui .

20 / 51
Standard Assumptions for the MLR Model

▶ Assumption MLR.3 (No Perfect Collinearity): In the sample (and


therefore in the population), none of the independent variables is
constant and there are no exact linear relationships among the
independent variables.

– The assumption only rules out perfect correlation between


explanatory variables; imperfect correlation is allowed.

– If an explanatory variable is a perfect linear combination of other


explanatory variables it is redundant and can be removed.

– Constant variables are ruled out (collinear with the regressor 1).
▶ This is an extension of the condition Σ_{i=1}^n (xi − x̄ )² > 0 in the SLR model. Why?

21 / 51
Example for Perfect Collinearity

▶ Suppose the MLR model is

voteA = β0 + β1 shareA + β2 shareB + u

where voteA is the percentage of the vote received by candidate A, and
shareA and shareB are the percentages of total campaign expenditures spent
by A and B in two-candidate elections.
▶ Either shareA or shareB has to be dropped from the regression
because there is an exact linear relationship between them:
shareA + shareB = 100.

22 / 51
Standard Assumptions for the MLR Model
▶ Assumption MLR.4 (Zero Conditional Mean):

E [u|x1 , x2 , · · · , xk ] = 0

The values of the explanatory variables must contain no
information about the mean of the unobserved factors.
▶ In a MLR model, the zero conditional mean assumption is much
more likely to hold because fewer things end up in the error.
▶ Example: avgscore = β0 + β1 expend + β2 avginc + u,
– avgscore is average standardized test score of a school; expend
is per student spending at the school; avginc is average family
income of students at the school.
If avginc was not included in the regression, it would end up in the
error term; it would then be hard to defend that expend is
uncorrelated with the error.

23 / 51
Unbiasedness of OLS
▶ Explanatory variables that are correlated with u are called
endogenous variables; endogeneity is a violation of assumption
MLR.4.
▶ Explanatory variables that are uncorrelated with u are called
exogenous variables; MLR.4 holds if all explanatory variables are
exogenous.
▶ Exogeneity is the key assumption for a causal interpretation of the
regression, and for unbiasedness of the OLS estimators.
▶ Theorem (Unbiasedness of OLS): Under assumptions
MLR.1-MLR.4,
E [β̂j ] = βj , j = 0, 1, · · · , k,
for any values of population parameter βj .
▶ Unbiasedness is an average property in repeated samples; in a given
sample, the estimates may still be far away from the true values.

24 / 51
Including Irrelevant Variables in a Regression Model

▶ Suppose
y = β0 + β1 x1 + β2 x2 + β3 x3 + u,
where β3 = 0, i.e., x3 is irrelevant to y .
▶ No problem because E [β̂3 ] = β3 = 0.

▶ However, including irrelevant variables may increase sampling


variance of β̂1 and β̂2 .

25 / 51
Omitted Variable Bias: the Simple Case

▶ Suppose the true model is

y = β0 + β1 x1 + β2 x2 + u,

i.e., the true model contains both x1 and x2 (β1 ̸= 0, β2 ̸= 0).


However, we estimate a misspecified model:

y = α0 + α1 x1 + ε.

So, x2 is omitted.
▶ Suppose x1 and x2 are correlated, with the linear regression relationship:

x2 = δ0 + δ1 x1 + v .

26 / 51
Omitted Variable Bias: the Simple Case

▶ Then,

y = β0 + β1 x1 + β2 (δ0 + δ1 x1 + v ) + u
  = (β0 + β2 δ0 ) + (β1 + β2 δ1 ) x1 + (u + β2 v ),

where α0 = β0 + β2 δ0 , α1 = β1 + β2 δ1 , and ε = u + β2 v .

▶ If y is only regressed on x1 , the estimated intercept and slope satisfy

E [α̂0 ] = β0 + β2 δ0

E [α̂1 ] = β1 + β2 δ1
Why? The new error term ε = u + β2 v satisfies the zero conditional
mean assumption: E [u + β2 v |x1 ] = E [u|x1 ] + β2 E [v |x1 ] = 0.
▶ Obviously, if β2 δ1 = 0, α̂1 is an unbiased estimator of β1 .
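▶ A small Monte Carlo sketch (Python/numpy; the parameter values are made up for illustration) of the omitted variable bias formula E [α̂1 ] = β1 + β2 δ1 :

import numpy as np

rng = np.random.default_rng(4)
beta1, beta2, delta1 = 2.0, 1.0, 0.5          # illustrative true parameters
n, reps = 200, 2000
alpha1_hats = np.empty(reps)

for r in range(reps):
    x1 = rng.normal(size=n)
    x2 = delta1 * x1 + rng.normal(size=n)     # x2 correlated with x1
    y = 1.0 + beta1 * x1 + beta2 * x2 + rng.normal(size=n)
    # Short regression of y on (1, x1) only: x2 is omitted
    xc = x1 - x1.mean()
    alpha1_hats[r] = (xc @ y) / (xc @ xc)

print(alpha1_hats.mean(), beta1 + beta2 * delta1)   # both close to 2.5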

27 / 51
Omitted Variable Bias: the Simple Case

▶ The direction of the bias is determined by the signs of β2 and δ1 . In
practice, β2 is an unknown population parameter, so we cannot be certain
whether it is positive or negative; nevertheless, we usually have a pretty
good idea about the direction of the partial effect of x2 on y . Further,
even though the correlation between x1 and x2 cannot be computed if x2 is
not observed, in many cases we can make an educated guess about whether
x1 and x2 are positively or negatively correlated.
▶ Summary of the bias in α̂1 when x2 is omitted from the estimating
equation:

                 Corr (x1 , x2 ) > 0    Corr (x1 , x2 ) < 0
   β2 > 0        positive bias          negative bias
   β2 < 0        negative bias          positive bias

▶ When is there no omitted variable bias? If the omitted variable is


irrelevant (β2 = 0) or uncorrelated (δ1 = 0).

28 / 51
Example: Omitting Ability in a Wage Equation

▶ Suppose the true wage equation is

wage = β0 + β1 educ + β2 abil + u, where β2 > 0.

But the estimated equation is

wage = α0 + α1 educ + ε.

▶ Suppose
abil = δ0 + δ1 educ + v , where δ1 > 0.
▶ The return to education β1 will be overestimated because β2 δ1 > 0.
It will look as if people with many years of education earn very high
wages, but this is partly due to the fact that people with more
education are also more able on average.

29 / 51
Omitted Variable Bias: More General Cases

▶ How does the omission of xk bias the estimates of β0 , β1 , · · · , βk−1
when k ≥ 3?
– Long regression: y = β̂0 + β̂1 x1 + · · · + β̂k xk
– Short regression: y = α̂0 + α̂1 x1 + · · · + α̂k−1 xk−1
– xk regression: xk = δ̃0 + δ̃1 x1 + · · · + δ̃k−1 xk−1

▶ Plug the xk regression into the long regression and collect terms:

α̂j = β̂j + β̂k δ̃j , j = 0, 1, · · · , k − 1.

=⇒ E [α̂j ] = βj + βk δj .
▶ α̂j is an unbiased estimator for βj only if βk = 0 or δj = 0.

30 / 51
Exercise

Suppose that you are interested in estimating the ceteris paribus


relationship between y and x1 . For this purpose, you can collect
data on two control variables, x2 and x3 . Let β̃1 be the simple
regression estimate from y on x1 and let β̂1 be the multiple
regression estimate from y on x1 , x2 , x3 .
▶ If x1 is highly correlated with x2 and x3 in the sample, and x2
and x3 have large partial effects on y , would you expect β̃1
and β̂1 to be similar or very different? Explain.
▶ If x1 is almost uncorrelated with x2 and x3 , but x2 and x3 are
highly correlated, will β̃1 and β̂1 tend to be similar or very
different? Explain.

31 / 51
Solution

▶ Because x1 is highly correlated with x2 and x3 , and these


latter variables have large partial effects on y , the simple and
multiple regression coefficients on x1 can differ by large
amounts.
▶ Here we would expect β̃1 and β̂1 to be similar. The amount of
correlation between x2 and x3 does not directly affect the
multiple regression estimate on x1 , if x1 is essentially
uncorrelated with x2 and x3 .

32 / 51
The Variance of the OLS Estimators

33 / 51
Standard Assumptions for the MLR Model

▶ Assumption MLR.5 (Homoskedasticity):

Var [u|x1 , · · · , xk ] = σ 2

The value of the explanatory variables must contain no information


about the variance of the unobserved factors.
▶ Example: In the wage equation

wage = β0 + β1 educ + β2 exper + β3 tenure + u,

the homoskedasticity assumption

Var [u|educ, exper , tenure] = σ 2

may also be hard to justify in many cases.

34 / 51
Sampling Variances of the OLS Slope Estimators
▶ Theorem (Sampling Variances of the OLS Slope Estimators):
Under assumptions MLR.1-MLR.5,

Var (β̂j ) = σ² / [SSTj (1 − Rj²)], j = 1, · · · , k,

where σ² is the variance of the error term, SSTj = Σ_{i=1}^n (xij − x̄j )² is the
total sample variation in explanatory variable xj , and Rj² is the
R-squared from the regression:

xj = δ0 + δ1 x1 + · · · + δj−1 xj−1 + δj+1 xj+1 + · · · + δk xk . (1)

▶ Note that SSTj (1 − Rj²) = SSRj = Σ_{i=1}^n r̂ij², where r̂ij is the residual
from the regression (1).

▶ Compared with the SLR case, where Var (β̂1 ) = σ²/SSTx = σ² / Σ_{i=1}^n (xi − x̄ )²,
the MLR case replaces xi − x̄ by r̂ij .
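▶ A numerical sketch (Python/numpy with simulated regressors; σ² is treated as known here purely for illustration) checking that σ²/[SSTj (1 − Rj²)] matches the corresponding diagonal element of the familiar matrix expression σ²(X'X)⁻¹:

import numpy as np

rng = np.random.default_rng(5)
n = 500
sigma2 = 4.0                                   # error variance (chosen for illustration)
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)             # correlated regressors
X = np.column_stack([np.ones(n), x1, x2])

# Var(beta_hat_1) via the slide formula: sigma^2 / (SST_1 * (1 - R_1^2))
Z = np.column_stack([np.ones(n), x2])          # all other regressors
r1 = x1 - Z @ np.linalg.solve(Z.T @ Z, Z.T @ x1)
sst1 = np.sum((x1 - x1.mean()) ** 2)
r2_1 = 1 - (r1 @ r1) / sst1                    # R^2 from regressing x1 on the others
var_a = sigma2 / (sst1 * (1 - r2_1))

# Same quantity via sigma^2 * (X'X)^{-1}
var_b = sigma2 * np.linalg.inv(X.T @ X)[1, 1]
print(var_a, var_b)                            # identical up to rounding error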

35 / 51
The Components of OLS Variances
▶ The error variance, σ 2 :

– A high σ 2 indicates more “noise” in the equation, which


increases the sampling variance and makes estimates imprecise.

▶ The total sample variation in the explanatory variable xj , SSTj :

– More sample variation leads to more precise estimates.


– Total sample variation is non-decreasing with the sample size:

Σ_{i=1}^n (xi − x̄n )² ≤ Σ_{i=1}^n (xi − x̄n+1 )² ≤ Σ_{i=1}^{n+1} (xi − x̄n+1 )²

– SSTj = n · [(1/n) Σ_{i=1}^n (xij − x̄j )²] = n Var̂ (xj ); Var̂ (xj ) tends to be
stable.
– Increasing the sample size n is thus a way to get more precise
estimates.

36 / 51
The Components of OLS Variances

▶ The linear relationships among the independent variables, Rj2 :

– In the regression of xj on all other independent variables
(including a constant), the R² (= Rj²) will be higher the better xj
can be linearly explained by the other independent variables.
– Var (β̂j ) will therefore be higher the better explanatory variable xj
can be linearly explained by the other independent variables.
– The problem of almost linearly dependent explanatory variables
is called multicollinearity (i.e., Rj² → 1 for some j).
– If Rj² = 1, i.e., there is perfect collinearity between xj and the other
regressors, βj cannot be identified. This is why Var (β̂j ) = ∞.
– Multicollinearity is a small-sample problem. As larger and
larger data sets are available nowadays, i.e., n is much larger
than k, it is seldom a problem in current econometric practice.

37 / 51
An Example for Multicollinearity
▶ Consider the following MLR model,

avgscore = β0 + β1 teacherexp + β2 matexp + β3 othexp + · · · ,


– avgscore: average standardized test score of school
– teacherexp: expenditures for teachers
– matexp: expenditures for instructional materials
– othexp: other expenditures

▶ The different expenditure categories will be strongly correlated


because if a school has ample resources it will spend on everything.
▶ For precise estimates of the differential effects, one needs
information about situations where expenditure categories change
differentially.
▶ Therefore, sampling variance of the estimated effects will be large.

38 / 51
Discussion of the Multicollinearity Problem

▶ In the above example, it would probably be better to lump all


expenditure categories together.
▶ In other cases, dropping some independent variables may reduce
multicollinearity (but this may lead to omitted variable bias).
▶ Only the sampling variance of the variables involved in
multicollinearity will be inflated; the estimates of other effects may
be very precise.

▶ Multicollinearity may be detected by "variance inflation factors":

VIFj = 1 / (1 − Rj²).

As an (arbitrary) rule of thumb, the variance inflation factor should
not be larger than 10 (or Rj² should not be larger than 0.9).
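▶ A minimal sketch (Python/numpy, simulated data; the near-collinear design is constructed for illustration) of computing VIFj = 1/(1 − Rj²) for each regressor:

import numpy as np

def vif(X):
    """Variance inflation factor for each non-constant column of X (column 0 is the constant)."""
    out = []
    for j in range(1, X.shape[1]):
        xj = X[:, j]
        others = np.delete(X, j, axis=1)            # all other columns, incl. the constant
        fitted = others @ np.linalg.lstsq(others, xj, rcond=None)[0]
        r2_j = 1 - np.sum((xj - fitted) ** 2) / np.sum((xj - xj.mean()) ** 2)
        out.append(1 / (1 - r2_j))
    return out

rng = np.random.default_rng(6)
n = 300
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.1 * rng.normal(size=n)           # nearly collinear with x1
x3 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2, x3])
print(vif(X))                                        # large VIFs for x1 and x2, about 1 for x3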

39 / 51
Discussion of the Multicollinearity Problem

[Figure 3.1 (Wooldridge): Var (β̂1 ) as a function of R1²; the variance grows
without bound as R1² → 1.]

40 / 51
Variances in Misspecified Models

▶ The choice of whether to include a particular variable in a regression


can be made by analyzing the trade-off between bias and variance.
▶ Suppose the true model is

y = β0 + β1 x1 + β2 x2 + u,

the fitted regression line in model 1 is

ŷ = β̂0 + β̂1 x1 + β̂2 x2 ,

and model 2 is
ỹ = β̃0 + β̃1 x1 .
▶ It might be the case that the likely omitted variable bias of β̃1 in the
misspecified model 2 is more than compensated for by its smaller variance.

41 / 51
Variances in Misspecified Models

▶ Mean Squared Error (MSE): For a general estimator, say, β̂,

MSE (β̂) = E [(β̂ − β)²]
        = E [(β̂ − E [β̂] + E [β̂] − β)²]
        = E [(β̂ − E [β̂])²] + (E [β̂] − β)² − 2 (β − E [β̂]) E [β̂ − E [β̂]]
        = Var (β̂) + Bias(β̂)²,

since the cross term vanishes: E [β̂ − E [β̂]] = 0.

▶ MSE is unobserved. In practice, estimate both models and assess how
sensitive β̂1 and se(β̂1 ) are to the inclusion of x2 .

42 / 51
Estimating the Error Variance

▶ The unbiased estimator of σ² is

σ̂² = (1/(n − k − 1)) Σ_{i=1}^n ûi² = SSR/(n − k − 1),

where n − (k + 1) is called the degrees of freedom.
▶ The n estimated residuals {ûi : i = 1, · · · , n} in the sum are
not completely independent but are related through the k + 1 equations
that define the first-order conditions of the minimization problem.
▶ In the SLR, k = 1, and σ̂² = (1/(n − 2)) Σ_{i=1}^n ûi².

▶ Theorem (Unbiased Estimation of σ 2 ): Under assumptions


MLR.1-MLR.5,
E [σ̂ 2 ] = σ 2 .

43 / 51
Estimating the Error Variance

▶ The true sampling variation of the estimated βj is

sd(β̂j ) = √Var (β̂j ) = √(σ² / [SSTj (1 − Rj²)])

▶ The estimated sampling variation of the estimated βj is

se(β̂j ) = √Var̂ (β̂j ) = √(σ̂² / [SSTj (1 − Rj²)])

i.e., we plug in σ̂² for the unknown σ².


▶ Note that these formulas are only valid under assumptions
MLR.1-MLR.5 (in particular, there has to be homoskedasticity).
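▶ A sketch (Python/numpy, simulated data; values are illustrative) of computing σ̂² = SSR/(n − k − 1) and the standard errors se(β̂j ):

import numpy as np

rng = np.random.default_rng(7)
n, k = 400, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n) * 2.0

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat
sigma2_hat = (u_hat @ u_hat) / (n - k - 1)        # unbiased estimate of the error variance

# se(beta_hat_j): square roots of the diagonal of sigma2_hat * (X'X)^{-1},
# which coincides with sqrt(sigma2_hat / (SST_j * (1 - R_j^2))) for j = 1, ..., k
se = np.sqrt(sigma2_hat * np.diag(np.linalg.inv(X.T @ X)))
print(sigma2_hat, se)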

44 / 51
Exercise

Suppose that you are interested in estimating the ceteris paribus


relationship between y and x1 . For this purpose, you can collect
data on two control variables, x2 and x3 . Let β̃1 be the simple
regression estimate from y on x1 and let β̂1 be the multiple
regression estimate from y on x1 , x2 , x3 .
▶ If x1 is highly correlated with x2 and x3 , and x2 and x3 have
small partial effect on y , would you expect se(β̃1 ) or se(β̂1 ) to
be smaller?
▶ If x1 is almost uncorrelated with x2 and x3 , and x2 and x3 are
highly correlated, would you expect se(β̃1 ) or se(β̂1 ) to be
smaller?

45 / 51
Solution

▶ In this case, we are (unnecessarily) introducing


multicollinearity into the regression; x2 and x3 have small
partial effects on y and yet x2 and x3 are highly correlated
with x1 . Adding x2 and x3 increases the standard error of the
coefficient on x1 substantially, so se(β̂1 ) is likely to be much
larger than se(β̃1 ).
▶ In this case, adding x2 and x3 will decrease the residual
variance without causing much collinearity (because x1 is
almost uncorrelated with x2 and x3 ), so we should see se(β̂1 )
smaller than se(β̃1 ). The amount of correlation between x2
and x3 does not directly affect se(β̂1 ).

46 / 51
Efficiency of OLS: The Gauss-Markov Theorem

47 / 51
Efficiency of OLS

▶ Under assumptions MLR.1-MLR.4, OLS is unbiased.

▶ However, under these assumptions there may be many other


estimators that are unbiased. Which one is the unbiased estimator
with the smallest variance?
▶ In order to answer this question, one usually limits attention to linear
estimators, i.e., estimators linear in the dependent variable:

β̃j = Σ_{i=1}^n wij yi ,

where wij is an arbitrary function of the sample values of all the
explanatory variables.

48 / 51
Efficiency of OLS

▶ The OLS estimator can be shown to be of this form. In the SLR,

β̂1 = Σ_{i=1}^n (xi − x̄ ) yi / Σ_{i=1}^n (xi − x̄ )²,

i.e.,

wi1 = (xi − x̄ ) / Σ_{i=1}^n (xi − x̄ )² = (xi − x̄ ) / SSTx ,

which is a function of {xi : i = 1, · · · , n}. (How about β̂j in the MLR?)
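▶ A quick check (Python/numpy, simulated SLR data; parameter values are illustrative) that the OLS slope is a linear estimator with weights wi1 = (xi − x̄)/SSTx :

import numpy as np

rng = np.random.default_rng(8)
n = 100
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)

sst_x = np.sum((x - x.mean()) ** 2)
w = (x - x.mean()) / sst_x             # weights depend only on the x's
beta1_tilde = w @ y                    # linear in y

# Compare with the usual OLS slope estimate
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / sst_x
print(beta1_tilde, beta1_hat)          # equal up to rounding error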

49 / 51
The Gauss-Markov Theorem

▶ Theorem (The Gauss-Markov Theorem): Under assumptions


MLR.1-MLR.5, the OLS estimators are the best linear unbiased
estimators (BLUEs) of the regression coefficients, i.e.,

Var (β̂j ) ≤ Var (β̃j )

for all β̃j = Σ_{i=1}^n wij yi for which E [β̃j ] = βj , j = 0, 1, · · · , k.
▶ OLS is the best linear estimator only when MLR.1-MLR.5 hold; if
there is heteroskedasticity for example, there are better estimators.
▶ The key assumption for the Gauss-Markov theorem is Assumption
MLR.5.
▶ Due to the Gauss-Markov Theorem, assumptions MLR.1-MLR.5 are
collectively known as the Gauss-Markov assumptions.

50 / 51
The Gauss-Markov Theorem

OLS is efficient in the class of unbiased, linear estimators.

[Figure: Venn diagram of all estimators, with the subsets of linear and of
unbiased estimators; OLS lies in their intersection.]

OLS is BLUE: the best linear unbiased estimator.

51 / 51
