
ECMT5001: Principles of Econometrics
Lecture 7: Multiple regression

Instructor: Simon Kwok
School of Economics, The University of Sydney
Based on lecture notes by Nicolas de Roos

Outline

1. Multiple regression
2. Estimation by OLS
3. Assumptions of OLS
4. Properties of OLS

1. Multiple regression

Explain the variable y in terms of the variables x1, x2, ..., xk:

    y = β0 + β1x1 + β2x2 + ... + βkxk + u

where y is the dependent variable; x1, ..., xk are the independent variables; β0 is the y intercept; β1, ..., βk are the slope parameters; and u is the random error.
Motivation

- incorporates more explanatory factors into the model
- explicitly holds fixed other factors that would otherwise be in u
- allows more flexible functional forms

Example (Wage equation)

    wage = β0 + β1educ + β2exper + u

- β1 measures the effect of education on wages, holding experience fixed
- u contains all other factors

Example (Test scores)

    avgscore = β0 + β1expend + β2avginc + u

- avgscore: average standardised test score of a school
- expend: spending per student at this school
- avginc: average family income of students at this school

Per-student spending is likely correlated with average family income because of school financing:

- omitting average income would lead to a biased estimate of the effect of spending on average test scores
- in a simple regression model, the effect of per-student spending would partly include the effect of family income on test scores

Example (Family consumption)

    cons = β0 + β1inc + β2inc² + u

- cons: family consumption
- inc: family income
- consumption is explained as a quadratic function of income
- be careful when interpreting coefficients:

    ∂cons/∂inc = β1 + 2β2·inc

- the effect of income on consumption depends on the level of income

Example (CEO salaries)

    log(salary) = β0 + β1log(sales) + β2ceoten + β3ceoten² + u

- log(salary) denotes the natural logarithm of CEO salary
- the model posits a quadratic relationship between CEO salary and his or her tenure at the firm

Note: "linear regression" means that the model has to be linear in parameters, not linear in variables.
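To make the quadratic interpretation concrete, here is a minimal numerical sketch. The coefficient values are hypothetical, chosen only to show that the marginal effect moves with the level of income:

```python
# Marginal effect of income on consumption in the quadratic model
# cons = b0 + b1*inc + b2*inc^2 + u, so d(cons)/d(inc) = b1 + 2*b2*inc.
# b1 and b2 below are made-up values for illustration, not estimates.
b1, b2 = 0.80, -0.002

for inc in (10, 50, 100):          # income levels (say, in $1,000s)
    print(inc, b1 + 2 * b2 * inc)  # marginal effect falls as income rises
```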
2. Estimation by OLS

OLS estimation

Start with a random sample

    {(xi1, xi2, ..., xik, yi) : i = 1, ..., n}

Regression residuals:

    ûi = yi − β̂0 − β̂1xi1 − β̂2xi2 − ... − β̂kxik

Minimise the sum of squared residuals with respect to the candidate coefficients:

    min Σi ûi²  →  β̂0, β̂1, β̂2, ..., β̂k

- the formulae for the β̂j are similar to the simple regression case
- the calculation involves matrix inversion
- in practice, the calculation is performed by software
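As a minimal sketch of what the software does, consider simulated data (the variable names and true coefficients below are arbitrary, not from the lecture's examples). The OLS solution is β̂ = (X'X)⁻¹X'y; solving the least-squares problem directly is numerically preferable to forming the inverse:

```python
import numpy as np

# Simulated sample: y = 1 + 2*x1 - 0.5*x2 + u (illustrative coefficients)
rng = np.random.default_rng(0)
n = 500
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1 + 2 * x1 - 0.5 * x2 + rng.normal(size=n)

# Design matrix with a column of ones for the intercept
X = np.column_stack([np.ones(n), x1, x2])

# OLS: beta_hat = (X'X)^{-1} X'y; lstsq solves the least-squares
# problem more stably than inverting X'X explicitly
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # close to [1, 2, -0.5]
```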

Interpretation of multiple regression

βj measures how much the dependent variable changes if the jth explanatory variable is increased by one unit, holding all other explanatory variables and the error term fixed:

    βj = ∂y/∂xj

- the other explanatory variables are held fixed even if they are correlated with xj
- it still has to be assumed that unobserved factors do not change if the explanatory variables change

Example (College GPA)

    predicted colGPA = 1.29 + 0.453 hsGPA + 0.0094 ACT

- colGPA: a student's grade point average at college
- hsGPA: the high school grade point average
- ACT: an achievement test score

Interpretation

- holding ACT fixed, another point of high school GPA is associated with 0.453 points of college GPA
- equivalently, if we compare two students with the same ACT and student A has a high school GPA one point higher than student B, we predict student A's college GPA to be 0.453 points higher
- holding high school GPA fixed, another 10 points on the ACT are associated with less than one tenth of a point of college GPA (10 × 0.0094 = 0.094)
Partialling out interpretation

Suppose we are interested in β1 in the regression

    y = β0 + β1x1 + β2x2 + u.

Consider the following procedure:

1. regress x1 on the other explanatory variables:

    x1 = γ0 + γ1x2 + r1;

2. regress y on the residuals r̂1 from this regression:

    y = δ0 + δ1r̂1 + ε.

Then our estimate of δ1 is the same as our estimate of β1 in the original regression. That is, δ̂1 = β̂1.

Why does this procedure work?

- only the part of x1 that is uncorrelated with x2 is being related to y
- the residuals from the first regression are the part of x1 that is uncorrelated with the other explanatory variables
- so we are estimating the effect of x1 on y after x2 has been "partialled out" or "removed"
- the slope coefficient of the second regression therefore represents the isolated effect of the explanatory variable on the dependent variable
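A minimal numerical check of the partialling-out result, on simulated data (all names and coefficient values here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)          # x1 correlated with x2
y = 1 + 2 * x1 + 3 * x2 + rng.normal(size=n)

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)

# Full regression: y on (1, x1, x2)
b_full = ols(np.column_stack([ones, x1, x2]), y)

# Step 1: regress x1 on (1, x2) and keep the residuals r1_hat
g = ols(np.column_stack([ones, x2]), x1)
r1_hat = x1 - (g[0] + g[1] * x2)

# Step 2: regress y on (1, r1_hat); its slope equals the full-model slope on x1
d = ols(np.column_stack([ones, r1_hat]), y)
print(b_full[1], d[1])  # the two estimates coincide
```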

Algebraic properties of OLS

Fitted (or predicted) values and residuals:

    ŷi = β̂0 + β̂1xi1 + β̂2xi2 + ... + β̂kxik,    ûi = yi − ŷi

Algebraic properties of OLS regression:

    Σi ûi = 0,    Σi xij ûi = 0,    ȳ = β̂0 + β̂1x̄1 + ... + β̂kx̄k

- the residuals sum to zero
- there is zero correlation between the residuals and the regressors
- the sample averages of y and the regressors lie on the regression line

Goodness of fit

R-squared:

    R² = SSE/SST = 1 − SSR/SST

Notice that R² can only increase if another variable is added.

We can also write R-squared as

    R² = [Σi (yi − ȳ)(ŷi − ȳ)]² / [Σi (yi − ȳ)² · Σi (ŷi − ȳ)²]

(by the algebraic properties above, ȳ is also the sample average of the fitted values), i.e. R² is equal to the squared correlation coefficient between the actual and the predicted values of the dependent variable.
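These algebraic facts are easy to verify numerically. A short sketch on simulated data (illustrative values throughout):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 0.5 + x1 - 2 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
y_hat = X @ beta_hat
u_hat = y - y_hat

# Residuals sum to zero and are orthogonal to every regressor
print(u_hat.sum(), X.T @ u_hat)            # all numerically zero

# R-squared two ways: 1 - SSR/SST, and squared correlation of y with y_hat
sst = ((y - y.mean()) ** 2).sum()
ssr = (u_hat ** 2).sum()
print(1 - ssr / sst, np.corrcoef(y, y_hat)[0, 1] ** 2)  # equal
```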
Example (Arrest records)

    predicted narr86 = 0.712 − 0.150 pcnv − 0.034 ptime86 − 0.104 qemp86
    n = 2,725, R² = 0.0413

- narr86: number of times arrested in 1986
- pcnv: proportion of prior arrests that led to a conviction
- ptime86: months in prison in 1986
- qemp86: quarters employed in 1986

Interpretation

- another quarter of employment is associated with a decrease in the arrest rate of 0.104
- a 10 percentage point increase in the probability of prior conviction is associated with a decrease in the arrest rate of 0.015

Example (Arrest records, continued)

Suppose an additional explanatory variable is added:

    predicted narr86 = 0.707 − 0.151 pcnv + 0.0074 avgsen − 0.037 ptime86 − 0.103 qemp86
    n = 2,725, R² = 0.0422

- avgsen: average sentence in prior convictions

Interpretation

- a longer prior sentence increases the number of arrests
- the additional explanatory power is limited (small increase in R²)

Comment on R²

- even if R² is small, the regression may still provide good estimates of ceteris paribus effects

3. Assumptions of OLS

Standard assumptions for multiple regression

Assumption MLR.1 (Linear in parameters)

    y = β0 + β1x1 + β2x2 + ... + βkxk + u

The population relationship between y and the explanatory variables is linear.

Assumption MLR.2 (Random sampling)

The data {(xi1, xi2, ..., xik, yi) : i = 1, ..., n} are randomly drawn from the population, so each data point follows the population equation:

    yi = β0 + β1xi1 + ... + βkxik + ui
Assumption MLR.3 (No perfect collinearity)

In the sample (and also in the population), none of the independent variables is constant, and there are no exact linear relationships between the independent variables.

Remarks

- the assumption rules out perfect correlation between explanatory variables
- if an explanatory variable is a perfect linear combination of other explanatory variables, it may be eliminated
- constant variables are also ruled out (they are collinear with the intercept)

Examples of perfect collinearity

Small samples:

    avgscore = β0 + β1expend + β2avginc + u

- avginc might coincidentally be an exact multiple of expend (this is rare, even in small samples)
- it would then be impossible to disentangle their effects

Relationships between regressors:

    voteA = β0 + β1shareA + β2shareB + u

- there is an exact linear relationship between shareA and shareB: shareA + shareB = 1
- either shareA or shareB will have to be dropped (see the sketch below)
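A minimal sketch of how perfect collinearity breaks the OLS formula, using the vote-share example (simulated shares, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
shareA = rng.uniform(size=n)
shareB = 1 - shareA                      # exact linear relationship

# With an intercept, the constant column equals shareA + shareB,
# so the design matrix is rank-deficient and X'X is singular:
# (X'X)^{-1} does not exist and the two effects cannot be separated.
X = np.column_stack([np.ones(n), shareA, shareB])
print(np.linalg.matrix_rank(X))          # 2, not 3

# Dropping shareB restores full column rank
X_ok = np.column_stack([np.ones(n), shareA])
print(np.linalg.matrix_rank(X_ok))       # 2 = number of columns
```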

Assumption MLR.4 (Zero conditional mean)

    E(u | x1, x2, ..., xk) = 0

The explanatory variables contain no information about the mean of the unobserved factors.

In a multiple regression, the zero conditional mean assumption is more likely to hold, because fewer things remain in the error term.

Example (Test scores)

    avgscore = β0 + β1expend + β2avginc + u

If avginc were not included in the model, it would end up in the error term, and expend would likely be correlated with u.

The zero conditional mean assumption

- explanatory variables that are correlated with the error term are called endogenous; endogeneity is a violation of Assumption MLR.4
- explanatory variables that are uncorrelated with u are called exogenous; MLR.4 holds if all explanatory variables are exogenous
4. Properties of OLS

Statistical properties of OLS

Theorem (Unbiasedness of OLS)

If Assumptions MLR.1-MLR.4 hold, then

    E(β̂j) = βj,    j = 0, 1, 2, ..., k

Interpretation

- in any given random sample, the estimated coefficients may be larger or smaller than the true values
- on average in repeated samples, they will be equal to the values determined by the population relationship between y and the explanatory variables (see the simulation sketch below)
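The "on average in repeated samples" statement lends itself to a Monte Carlo sketch. Everything below is simulated with arbitrary true coefficients, purely for illustration:

```python
import numpy as np

# In any one sample beta1_hat misses the truth, but its average over
# many random samples is close to the true value beta1 = 2.
rng = np.random.default_rng(4)
n, reps, beta1 = 50, 5000, 2.0
estimates = np.empty(reps)

for r in range(reps):
    x1 = rng.normal(size=n)
    x2 = 0.5 * x1 + rng.normal(size=n)
    y = 1 + beta1 * x1 - x2 + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1, x2])
    estimates[r] = np.linalg.lstsq(X, y, rcond=None)[0][1]

print(estimates.mean())   # approximately 2.0: unbiased
print(estimates.std())    # sampling variation around the truth
```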

Misspecification

Including irrelevant variables:

    y = β0 + β1x1 + β2x2 + β3x3 + u

- suppose β3 = 0 in the population
- this causes no bias, because E(β̂3) = β3 = 0
- however, including irrelevant variables may increase the sampling variance

Omitting relevant variables: the simple case

    y = β0 + β1x1 + β2x2 + u    (true model)
    y = α0 + α1x1 + w           (estimated model)

Omitted variable bias

Suppose there is a linear relationship between x1 and x2:

    x2 = δ0 + δ1x1 + v

Then

    y = β0 + β1x1 + β2(δ0 + δ1x1 + v) + u
      = (β0 + β2δ0) + (β1 + β2δ1)x1 + (β2v + u)

If y is regressed on x1 only, the estimated intercept picks up β0 + β2δ0, the estimated slope picks up β1 + β2δ1, and the error term becomes β2v + u.

Conclusion: all estimated coefficients will be biased! (A numerical check follows below.)
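A minimal check of the omitted-variable algebra on simulated data (all coefficient values are made up; the large n is used so the estimate sits close to its expectation):

```python
import numpy as np

# When x2 = d0 + d1*x1 + v is omitted, the short regression's slope
# on x1 should be close to beta1 + beta2*d1, not beta1.
rng = np.random.default_rng(5)
n = 100_000
beta0, beta1, beta2 = 1.0, 2.0, 3.0
d0, d1 = 0.5, 0.8

x1 = rng.normal(size=n)
x2 = d0 + d1 * x1 + rng.normal(size=n)
y = beta0 + beta1 * x1 + beta2 * x2 + rng.normal(size=n)

# Short regression of y on x1 only
X_short = np.column_stack([np.ones(n), x1])
alpha = np.linalg.lstsq(X_short, y, rcond=None)[0]

print(alpha[1], beta1 + beta2 * d1)  # both approximately 4.4
```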
Example (Wage equation)

    wage = β0 + β1educ + β2abil + u
    abil = δ0 + δ1educ + v
    ⇒ wage = (β0 + β2δ0) + (β1 + β2δ1)educ + (β2v + u)

The return to education β1 will be overestimated if abil is omitted, because β2δ1 > 0:

- it will look as if people with high education earn very high wages
- this is partly because people with more education have greater ability on average

When is there no omitted variable bias? When the omitted variable is irrelevant (i.e. β2 = 0) or uncorrelated with the included regressor (i.e. δ1 = 0).

Omitted variable bias: the general case

Consider the more general model with k = 3:

    y = β0 + β1x1 + β2x2 + β3x3 + u    (true model)
    y = β0 + β1x1 + β2x2 + u           (estimated model)

No general statements are possible about the direction of the bias.

Example (Wage equation)

    wage = β0 + β1educ + β2exper + β3abil + u

- if abil is omitted, the direction of the bias is unclear
- however, if we know exper is approximately uncorrelated with educ and abil, then the direction of the bias can be analysed as in the simple two-variable case

Assumption MLR.5 (Homoskedasticity)

    Var(u | x1, x2, ..., xk) = σ²

The explanatory variables contain no information about the variance of the unobserved factors.

Example (Wage equation)

    Var(u | educ, exper, tenure) = σ²

This assumption may also be hard to justify.

Short-hand notation:

    Var(u | x) = σ²,    x = (x1, x2, ..., xk)

Statistical properties of OLS

Theorem (Sampling variances of the slope estimators)

Suppose Assumptions MLR.1-MLR.5 hold. Then

    Var(β̂j) = σ² / [SSTj (1 − Rj²)],    j = 1, ..., k

where

- σ² is the variance of the error term
- SSTj = Σi (xij − x̄j)² is the total sample variation in xj
- Rj² is the R² from a regression of the explanatory variable xj on all other explanatory variables (including a constant)
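This variance formula can be checked by simulation. The sketch below holds the regressors fixed across replications (the variance is conditional on the x's) and compares the empirical variance of β̂1 with the formula; all numbers are illustrative:

```python
import numpy as np

rng = np.random.default_rng(6)
n, reps, sigma = 100, 20_000, 1.5

# Fixed regressors, correlated so that R1^2 > 0
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])

est = np.empty(reps)
for r in range(reps):
    y = 1 + 2 * x1 - x2 + sigma * rng.normal(size=n)  # fresh errors each draw
    est[r] = np.linalg.lstsq(X, y, rcond=None)[0][1]

# Formula ingredients: SST1, and R1^2 from regressing x1 on (1, x2)
sst1 = ((x1 - x1.mean()) ** 2).sum()
g = np.linalg.lstsq(np.column_stack([np.ones(n), x2]), x1, rcond=None)[0]
r1 = x1 - (g[0] + g[1] * x2)
R1sq = 1 - (r1 ** 2).sum() / sst1

print(est.var(), sigma**2 / (sst1 * (1 - R1sq)))  # approximately equal
```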
Components of the OLS variance

    Var(β̂j) = σ² / [SSTj (1 − Rj²)],    j = 1, ..., k

The error variance, σ²

- a high error variance increases the sampling variance
- a large error variance makes estimates imprecise
- the error variance does not decrease with the sample size

Sample variation in the explanatory variable, SSTj

- more sample variation leads to more precise estimates
- total sample variation increases with the sample size

Linear relationships among the explanatory variables, Rj²

- Rj² is higher the better xj can be linearly explained by the other explanatory variables
- if xj is mostly explained by the other x's, there is not much variation left to identify β̂j
- the problem of almost linearly dependent explanatory variables is called multicollinearity (i.e. Rj² → 1 for some j)

Multicollinearity

Example (Test scores)

    avgscore = β0 + β1teachexp + β2matexp + β3othexp + ...

- avgscore: average standardised test score of a school
- teachexp: expenditure on teachers
- matexp: expenditure on instructional materials
- othexp: other expenditures

The different expenditure categories may be strongly correlated: if a school has a lot of resources, it will spend a lot on everything.

- it will be hard to estimate the differential effects of the different spending categories
- for precise estimates, we need information from situations in which the expenditure categories change differentially

Discussion

- with multicollinearity, the effects of the different explanatory variables are hard to disentangle
- dropping some explanatory variables may reduce multicollinearity (but it may lead to omitted variable bias!)
- only the sampling variances of the explanatory variables involved will be inflated; estimates of the other effects may still be precise
- note that multicollinearity is not a violation of MLR.3
- multicollinearity can be detected with variance inflation factors:

    VIFj = 1 / (1 − Rj²)

- as a rule of thumb, we should not have VIFj > 10 (see the sketch below)
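A minimal sketch of computing VIFs directly from the definition, on simulated expenditure-like regressors (illustrative data only):

```python
import numpy as np

def vif(X, j):
    """VIF of column j of X: 1 / (1 - Rj^2), where Rj^2 comes from
    regressing x_j on the other columns plus a constant."""
    n = X.shape[0]
    others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
    coef = np.linalg.lstsq(others, X[:, j], rcond=None)[0]
    resid = X[:, j] - others @ coef
    r2 = 1 - resid.var() / X[:, j].var()   # Rj^2 of the auxiliary regression
    return 1 / (1 - r2)

# Three strongly correlated regressors sharing a common component
rng = np.random.default_rng(7)
base = rng.normal(size=300)
X = np.column_stack([base + 0.3 * rng.normal(size=300) for _ in range(3)])
print([round(vif(X, j), 1) for j in range(3)])  # VIFs well above 1
```

If statsmodels is available, its variance_inflation_factor (in statsmodels.stats.outliers_influence) computes the same quantity from a design matrix that includes the constant.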
Variance in misspecified models

The choice of whether to include a variable depends on the trade-off between bias and variance:

    y = β0 + β1x1 + β2x2 + u        (true model)
    ŷ = β̂0 + β̂1x1 + β̂2x2           (estimated model 1)
    ỹ = β̃0 + β̃1x1                  (estimated model 2)

It may be that the omitted variable bias is compensated by a smaller variance:

    Var(β̂1) = σ² / [SST1 (1 − R1²)]
    Var(β̃1) = σ² / SST1

Conditional on x1 and x2, the variance in model 2 is always smaller than in model 1.

Case 1: suppose β2 = 0

    E(β̂1) = β1,   E(β̃1) = β1,   Var(β̃1) < Var(β̂1)

- conclusion: do not include irrelevant regressors

Case 2: suppose β2 ≠ 0

    E(β̂1) = β1,   E(β̃1) ≠ β1,   Var(β̃1) < Var(β̂1)

- conclusion: there is a trade-off between bias and variance
- beware: the bias will not vanish even in large samples
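Both cases can be seen in one Monte Carlo sketch, comparing the slope on x1 from the long model (x1 and x2) and the short model (x1 only); all values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(8)
n, reps = 60, 10_000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)          # correlated regressors, held fixed
X_long = np.column_stack([np.ones(n), x1, x2])
X_short = np.column_stack([np.ones(n), x1])

for beta2 in (0.0, 1.0):                    # Case 1: x2 irrelevant; Case 2: relevant
    long_b, short_b = np.empty(reps), np.empty(reps)
    for r in range(reps):
        y = 1 + 2 * x1 + beta2 * x2 + rng.normal(size=n)
        long_b[r] = np.linalg.lstsq(X_long, y, rcond=None)[0][1]
        short_b[r] = np.linalg.lstsq(X_short, y, rcond=None)[0][1]
    # Short model always has the smaller variance, but is biased when beta2 != 0
    print(beta2, long_b.mean(), short_b.mean(), long_b.var(), short_b.var())
```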

Estimating the error variance

Theorem (Unbiased estimator of the error variance)

If Assumptions MLR.1-MLR.5 hold, then

    E(σ̂²) = σ²,    where σ̂² = Σi ûi² / (n − k − 1)

i.e. σ̂² is an unbiased estimator of σ².

- n − (k + 1) are the degrees of freedom
- the n estimated squared residuals are not completely independent: they are related through the k + 1 equations that define the first-order conditions of the minimisation problem

Estimating standard errors

Under Assumptions MLR.1-MLR.5,

    sd(β̂j) = √(Var(β̂j)) = √( σ² / [SSTj (1 − Rj²)] )    (the true sampling variation of β̂j)

    se(β̂j) = √( σ̂² / [SSTj (1 − Rj²)] )    (the estimated sampling variation of β̂j, with σ̂² in place of the unknown σ²)

We can then calculate standard errors using the same steps we described for simple regression; the recipe is spelled out below.
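A minimal sketch of the recipe on simulated data, cross-checked against a library implementation (statsmodels, assuming it is installed; all data and coefficients are illustrative):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
n, k = 200, 2
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 1 + 2 * x1 - x2 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])

# Steps 1-3: estimate the model, get residuals, estimate the error variance
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
u_hat = y - X @ beta_hat
sigma2_hat = (u_hat ** 2).sum() / (n - k - 1)

# Step 4: se(beta1_hat) = sqrt(sigma2_hat / (SST1 * (1 - R1^2)))
sst1 = ((x1 - x1.mean()) ** 2).sum()
Z = np.column_stack([np.ones(n), x2])
g = np.linalg.lstsq(Z, x1, rcond=None)[0]
R1sq = 1 - ((x1 - Z @ g) ** 2).sum() / sst1
se_manual = np.sqrt(sigma2_hat / (sst1 * (1 - R1sq)))

# Cross-check against statsmodels' reported standard error
se_lib = sm.OLS(y, X).fit().bse[1]
print(se_manual, se_lib)  # equal up to floating-point error
```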
Estimating standard errors

To estimate the standard errors of your estimates:

1. estimate your regression model to obtain the parameter estimates β̂
2. obtain the residuals from your regression, ûi, i = 1, ..., n
3. estimate the error variance:

    σ̂² = Σi ûi² / (n − k − 1)

4. calculate the standard error:

    se(β̂j) = √( σ̂² / [SSTj (1 − Rj²)] )

Or use a software package!

Efficiency of OLS

Under Assumptions MLR.1-MLR.5 we know OLS is unbiased; we would like the unbiased estimator with the smallest variance.

Theorem (Gauss-Markov Theorem)

Under Assumptions MLR.1-MLR.5, the OLS estimators are the best linear unbiased estimators (BLUEs) of the regression coefficients, i.e. for all j = 0, 1, ..., k,

    Var(β̂j) ≤ Var(β̃j)    for all β̃j = Σi wij yi for which E(β̃j) = βj

Notes

- the β̃j are linear estimators (the OLS estimator is of this form)
- OLS is only the best estimator if MLR.1-MLR.5 hold
- e.g. if there is heteroskedasticity, there are more efficient estimators
