Econometrics Lecture 3: Multiple Regression Estimation
▶ The intercept represents the expected value of the dependent variable when all predictors
are zero
▶ Including the intercept ensures the residuals have a mean of zero when estimating OLS.
▶ Sometimes the constant is included in the model even though its value makes no practical sense. For example, if you are predicting weight based on height, a negative intercept (like −30 kg) might appear, even though no one can have negative weight. This can happen because of the linear functional form; it does not imply the intercept is meaningful on its own.
▶ Even if the value of the intercept seems nonsensical, it is still crucial to include it in the model. The intercept lets the regression line shift to fit the data properly. Excluding the intercept forces the fitted line through the origin, potentially worsening predictions and model performance (see the sketch below).
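A minimal R sketch, on simulated height/weight data (invented here for illustration, not course data), of what goes wrong when the intercept is dropped: the line is forced through the origin, the residuals no longer average to zero, and the slope is distorted.

```r
set.seed(9)
height <- rnorm(100, mean = 170, sd = 10)
weight <- -30 + 0.6 * height + rnorm(100, sd = 5)  # intercept is "nonsensical" (-30 kg)

with_int    <- lm(weight ~ height)       # includes the intercept
without_int <- lm(weight ~ height - 1)   # forced through the origin

mean(resid(with_int))      # ~ 0 by construction of OLS
mean(resid(without_int))   # generally not 0
coef(with_int)             # slope close to the true 0.6
coef(without_int)          # slope distorted to compensate for the missing intercept
```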
Multiple Regression Analysis: Estimation
▶ The population error ui is the difference between the true value of the dependent variable y and the true population regression line:
ui = yi − (β0 + β1 xi ).
We do not observe it.
▶ The residual ûi is the difference between the observed data and the fitted regression line
based on the sample:
ûi = yi − (β̂0 + β̂1 xi )
▶ The zero conditional mean assumption relates to the population error term. Even if the
sample residuals average out to zero, the assumption can be violated if, in the population,
the error term is correlated with the independent variables x.
▶ In practice, we cannot observe the population error ui , and thus we cannot directly test whether the errors ui are correlated with the explanatory variables xi . To address this, we rely on theoretical assumptions about the true relationship between the variables to judge how plausible such correlations are, even though we cannot test them directly with sample data.
Definition of the Multiple Linear Regression Model
The multiple linear regression model:
y = β0 + β1 x1 + β2 x2 + · · · + βk xk + u.
Or in matrix notation: y = Xβ + u, where the matrix X collects the regressors (with a first column of ones for the intercept).
Example: wage = β0 + β1 educ + β2 exper + u. Because the model contains experience explicitly, we will be able to measure the effect of education on wage, holding experience fixed. In a simple regression analysis (which puts exper in the error term) we would have to assume that experience is uncorrelated with education, a tenuous assumption.
Examples
Example: avgscore = β0 + β1 expend + β2 avginc + u (average test scores, per-student spending, average family income).
▶ Per-student spending is likely to be correlated with average family income at a given high school because of how schools are financed.
▶ Omitting average family income from the regression would lead to a biased estimate of the effect of spending on average test scores.
Multiple regression analysis is also useful for generalizing functional relationships between variables.
Example: suppose family consumption (cons) is a quadratic function of family income (inc):
cons = β0 + β1 inc + β2 inc² + u.
In this model, β1 alone no longer measures the effect of income on consumption; the marginal effect is β1 + 2β2 inc.
Example: log(salary ) = β0 + β1 log(sales) + β2 ceoten + β3 ceoten² + u.
▶ The model assumes a constant elasticity relationship between CEO salary and the sales of his or her firm: β1 is the elasticity of salary with respect to sales.
▶ The model assumes a quadratic relationship between CEO salary and his or her tenure with the firm.
R Stats: let's see how to run this example in R (a sketch follows below). Open Moodle R scripts - Lecture 3 - Multiple regression.
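A minimal sketch of how this example might be run in R, assuming the `wooldridge` data package and its `ceosal2` data set (with variables salary, sales, and ceoten); the Moodle script may differ.

```r
# CEO salary example: constant elasticity in sales, quadratic in tenure.
# Assumes the 'wooldridge' package is installed.
library(wooldridge)
data("ceosal2")

fit <- lm(log(salary) ~ log(sales) + ceoten + I(ceoten^2), data = ceosal2)
summary(fit)
```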
OLS Estimation of the Multiple Regression Model
▶ Random sample: {(xi1 , . . . , xik , yi ) : i = 1, . . . , n}.
▶ Regression residuals: ûi = yi − β̂0 − β̂1 xi1 − · · · − β̂k xik .
▶ The OLS estimates β̂0 , β̂1 , . . . , β̂k are chosen to minimize the sum of squared residuals.
▶ These estimates can be represented as β̂ = (X′X)⁻¹X′y, writing them (including the intercept) in matrix notation (see Appendix E, and the sketch below).
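A minimal R sketch, on simulated data, of the matrix formula: computing β̂ = (X′X)⁻¹X′y by hand and checking it against lm().

```r
set.seed(1)
n  <- 200
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n)

X <- cbind(1, x1, x2)                  # design matrix: first column of ones for the intercept
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y
beta_hat                               # matrix-formula estimates
coef(lm(y ~ x1 + x2))                  # identical up to numerical precision
```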
Interpretation of the Multiple Regression Model
▶ The multiple linear regression model manages to hold the values of other explanatory
variables fixed even if, in reality, they are correlated with the explanatory variable under
consideration.
▶ If these other explanatory variables were left in the error term instead, a ceteris paribus interpretation would be impossible.
▶ It still has to be assumed that the unobserved factors do not change when the explanatory variables are changed.
Example: college GPA (colGPA) regressed on high school GPA (hsGPA) and achievement test score (ACT).
Interpretation:
▶ Holding ACT fixed, one more point of high school grade point average is associated with .453 more points of college grade point average.
▶ Or: if we compare two students with the same ACT, but the hsGPA of student A is one point higher, we predict student A to have a colGPA that is .453 points higher than that of student B.
▶ Holding high school grade point average fixed, 10 more points on the ACT are associated with less than one point of college GPA.
Properties of OLS on Any Sample of Data
The following algebraic properties hold in any sample:
▶ The sample average of the residuals is zero: Σi ûi = 0.
▶ The sample covariance between each regressor and the residuals is zero: Σi xij ûi = 0 for j = 1, . . . , k.
▶ The point of sample means (x̄1 , . . . , x̄k , ȳ ) always lies on the fitted regression line.
OLS estimation mechanically ensures these results: they are direct consequences of the first-order conditions of the OLS minimization problem. Even though the sample covariance between residuals and regressors is zero, this does not imply that the true population error term is uncorrelated with the regressors, so the coefficients can still be biased (the sketch below illustrates this).
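A minimal R sketch, on simulated data, of these mechanical properties, and of the fact that they hold even when the population error is correlated with the regressor, in which case the slope is biased.

```r
set.seed(2)
n  <- 500
x1 <- rnorm(n)
u  <- 0.8 * x1 + rnorm(n)     # population error correlated with x1: MLR.4 fails
y  <- 1 + 2 * x1 + u

fit <- lm(y ~ x1)
mean(resid(fit))              # zero (up to rounding), by construction of OLS
cor(resid(fit), x1)           # zero (up to rounding), by construction of OLS
coef(fit)                     # slope far from the true 2: biased despite the above
```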
"Partialling Out" Interpretation of Multiple Regression
How can multiple regression estimate the effect of a variable while holding other variables fixed?
▶ One can show that the estimated coefficient on an explanatory variable in a multiple regression can be obtained in two steps (see the sketch below):
1. Regress the explanatory variable x1 on all other explanatory variables (x2 , . . . , xk ) and keep the residuals r̂1 .
2. Regress y on the residuals r̂1 : the resulting slope coefficient is exactly the multiple regression coefficient β̂1 .
▶ Why does this work? The residuals r̂1 are the part of x1 that is uncorrelated with the other explanatory variables: x1 after the effects of x2 , . . . , xk have been "partialled out".
▶ R-squared: as in simple regression, R² = 1 − SSR/SST measures the proportion of the sample variation in y explained by the regression.
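A minimal R sketch, on simulated data, of the two-step procedure: the slope on the residuals r̂1 reproduces the multiple regression coefficient exactly.

```r
set.seed(3)
n  <- 300
x2 <- rnorm(n)
x1 <- 0.6 * x2 + rnorm(n)     # x1 correlated with x2
y  <- 1 + 2 * x1 - x2 + rnorm(n)

r1 <- resid(lm(x1 ~ x2))      # step 1: purge x1 of x2
coef(lm(y ~ r1))["r1"]        # step 2: slope on the residuals ...
coef(lm(y ~ x1 + x2))["x1"]   # ... equals the multiple regression coefficient
```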
Goodness-of-Fit
Example: the number of arrests of young men in 1986 (narr86) regressed on the proportion of prior arrests that led to a conviction (pcnv, coefficient −.150), months spent in prison in 1986 (ptime86, coefficient −.034), and quarters employed in 1986 (qemp86, coefficient −.104).
Interpretation:
▶ If the proportion of prior arrests leading to conviction increases by 0.5, the predicted fall in arrests is .5 × .150 = .075, i.e., 7.5 arrests per 100 men.
▶ If the months in prison increase from 0 to 12, the predicted fall in arrests is 12 × .034 ≈ .408 arrests for a particular man.
▶ If the quarters employed increase by 1, the predicted fall in arrests is .104, i.e., 10.4 arrests per 100 men.
Adding the average sentence length from prior convictions to the model:
Interpretation:
▶ Average prior sentence increases the number of arrests (?). Unexpected sign: it says that a longer average sentence length increases criminal activity.
▶ Limited additional explanatory power, as R² increases only slightly.
General remark on R²:
▶ Even if R² is small (as in the given example), the regression may still provide good estimates of ceteris paribus effects.
▶ Generally, a low R² indicates that it is hard to predict individual outcomes on y with much accuracy.
▶ The fact that R² never decreases when any variable is added to a regression makes it a poor tool for deciding whether one variable or several variables should be added to a model (see the sketch below).
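A minimal R sketch, on simulated data, of the last point: adding even a pure-noise regressor never lowers R².

```r
set.seed(4)
n     <- 100
x     <- rnorm(n)
noise <- rnorm(n)                        # unrelated to y by construction
y     <- 1 + 2 * x + rnorm(n)

summary(lm(y ~ x))$r.squared
summary(lm(y ~ x + noise))$r.squared     # at least as large as the line above
```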
Standard Assumptions for the Multiple Regression Model
You should remember that statistical properties have nothing to do with a particular sample; they are properties of estimators when random sampling is done repeatedly.
▶ Assumption MLR.1 (Linear in parameters): in the population, the model can be written as y = β0 + β1 x1 + β2 x2 + · · · + βk xk + u.
▶ Assumption MLR.2 (Random sampling): the data are a random sample {(xi1 , . . . , xik , yi ) : i = 1, . . . , n} drawn from the population model.
Assumption MLR.3 is more complicated than its counterpart for simple regression because we
must now look at relationships between all independent variables.
▶ Assumption MLR.3 (No perfect collinearity)
• In the sample (and therefore in the population), none of the independent variables is
constant and there are no exact linear relationships among the independent variables.
▶ Remarks on MLR.3
• Constant variables are also ruled out (collinear with intercept).
• The assumption only rules out perfect collinearity/correlation between explanatory
variables; imperfect correlation is allowed.
• If an explanatory variable is a perfect linear combination of other explanatory variables
it is superfluous and may be eliminated.
• For example, in estimating a relationship between consumption and income, it makes no sense to include as independent variables income measured in dollars as well as income measured in thousands of dollars.
• The model cons = β0 + β1 inc + β2 inc² + u does not violate Assumption MLR.3: even though x2 = inc² is an exact function of x1 = inc, inc² is not an exact linear function of inc (see the sketch below).
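A minimal R sketch, on simulated data, of both remarks: income entered twice in different units is perfectly collinear and gets dropped, while inc² is allowed.

```r
set.seed(5)
n       <- 100
inc     <- rnorm(n, mean = 50, sd = 10)  # income in thousands of dollars
inc_usd <- 1000 * inc                    # the same income in dollars: perfectly collinear
cons    <- 5 + 0.8 * inc + rnorm(n)

coef(lm(cons ~ inc + inc_usd))           # one coefficient is NA: variable dropped
coef(lm(cons ~ inc + I(inc^2)))          # fine: inc^2 is not a *linear* function of inc
```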
▶ Assumption MLR.4 (Zero conditional mean): E(u | x1 , x2 , . . . , xk ) = 0. The values of the explanatory variables must contain no information about the mean of the unobserved factors.
▶ In a multiple regression model, the zero conditional mean assumption is more likely to hold
because fewer things end up in the error. Nevertheless, in any application, there are always
factors that, due to data limitations or ignorance, we will not be able to include.
▶ One way that Assumption MLR.4 can fail is if the functional relationship between the explained and explanatory variables is misspecified: for example, if we omit the quadratic term inc² when the true consumption function is cons = β0 + β1 inc + β2 inc² + u. We must include in the model whatever functional forms actually show up in the population model.
▶ There are other ways that u can be correlated with an explanatory variable: measurement
error in an explanatory variable; and explanatory variables that are determined jointly with
y . We must postpone our study of these problems until we have a firm grasp of multiple
regression analysis under an ideal set of assumptions.
▶ When Assumption MLR.4 holds, we often say that we have exogenous explanatory
variables.
▶ If xj is correlated with u for any reason, then xj is said to be an endogenous explanatory
variable.
▶ Do not confuse Assumptions MLR.3 and MLR.4. Assumption MLR.3 rules out certain
relationships among the independent or explanatory variables and has nothing to do with
the error, u.
▶ You will know immediately when carrying out OLS estimation whether or not Assumption MLR.3 holds: statistical software will report an error or drop a variable when it fails. On the other hand, we will never know for sure whether the average value of the unobserved factors is unrelated to the explanatory variables.
▶ Theorem 3.1 (Unbiasedness of OLS): under Assumptions MLR.1–MLR.4, the OLS estimators are unbiased: E(β̂j ) = βj for j = 0, 1, . . . , k.
Including Irrelevant Variables or Omitting Relevant Variables
▶ Including an irrelevant variable (overspecification): if a variable whose population coefficient is zero is included in the model, the OLS estimators remain unbiased, but their sampling variances may increase.
▶ Omitting a relevant variable (underspecification): suppose the true model is y = β0 + β1 x1 + β2 x2 + u, but we run the short regression of y on x1 alone, obtaining the slope β̃1 . Then
E(β̃1 ) = β1 + β2 δ̃1 ,
where δ̃1 is the slope from regressing x2 on x1 . The term β2 δ̃1 is the omitted variable bias.
When is there no omitted variable bias? When the omitted variable is irrelevant (β2 = 0) or uncorrelated with x1 (δ̃1 = 0). The simulation below illustrates the bias formula.
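A minimal R sketch, on simulated data, of omitted variable bias: the short-regression slope drifts to β1 + β2 δ̃1.

```r
set.seed(6)
n  <- 10000
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)       # delta1 = 0.5 in the population
y  <- 1 + 2 * x1 + 3 * x2 + rnorm(n)

coef(lm(y ~ x1 + x2))["x1"]     # close to the true beta1 = 2
coef(lm(x2 ~ x1))["x1"]         # delta1-tilde, close to 0.5
coef(lm(y ~ x1))["x1"]          # close to 2 + 3 * 0.5 = 3.5: biased upward
```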
Direction of the bias in β̃1 , from E(β̃1 ) = β1 + β2 δ̃1 (the sign of δ̃1 is the sign of corr(x1 , x2 )):
• corr(x1 , x2 ) > 0 and β2 > 0: positive bias
• corr(x1 , x2 ) > 0 and β2 < 0: negative bias
• corr(x1 , x2 ) < 0 and β2 > 0: negative bias
• corr(x1 , x2 ) < 0 and β2 < 0: positive bias
▶ With more than two regressors, usually no general statements are possible about the direction of the bias, because x1 , x2 , and x3 can all be pairwise correlated.
▶ The analysis is as in the simple case if the regressor of interest is uncorrelated with the other regressors.
Standard Assumptions for the Multiple Regression Model (Cont.)
▶ Assumption MLR.5 (Homoskedasticity): Var(u | x1 , . . . , xk ) = σ². The variance of the unobserved error does not depend on the values of the explanatory variables.
▶ Theorem 3.2 (Sampling variances of the OLS slope estimators): under Assumptions MLR.1–MLR.5, conditional on the sample values of the regressors,
Var(β̂j ) = σ² / [SSTj (1 − Rj²)], j = 1, . . . , k,
where SSTj = Σi (xij − x̄j )² is the total sample variation in xj and Rj² is the R-squared from regressing xj on all other independent variables.
Components of OLS Variances:
▶ The error variance σ²: a larger σ² means a larger sampling variance, because there is more "noise" in the equation; the error variance does not shrink as the sample size grows.
▶ The total sample variation SSTj : more sample variation in xj means a smaller sampling variance; SSTj automatically grows with the sample size.
▶ The linear relationships among the independent variables, Rj²: the better xj can be linearly explained by the other regressors, the larger Rj² and the larger the sampling variance of β̂j . As Rj² → 1, Var(β̂j ) → ∞ (multicollinearity).
Multicollinearity
Example: explaining students' test scores by several categories of school expenditure.
▶ The different expenditure categories will be strongly correlated, because a school with a lot of resources will spend a lot on everything.
▶ It will be hard to estimate the differential effects of the different expenditure categories, because all expenditures are either high or low together. For precise estimates of the differential effects, one would need information about situations where the expenditure categories change differentially.
▶ As a consequence, the sampling variances of the estimated effects will be large.
• In the above example, it would probably be better to lump all expenditure categories
together because effects cannot be disentangled.
• In other cases, dropping some independent variables may reduce multicollinearity (but
this may lead to omitted variable bias).
• Only the sampling variances of the coefficients on the variables involved in the multicollinearity are inflated; the estimates of the other effects may still be very precise (see the sketch below).
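A minimal R sketch, on simulated data, of variance inflation: the factor 1/(1 − Rj²) is computed by hand from the auxiliary regression of x1 on x2.

```r
set.seed(7)
n  <- 200
x2 <- rnorm(n)
x1 <- x2 + rnorm(n, sd = 0.2)       # x1 and x2 strongly correlated
y  <- 1 + x1 + x2 + rnorm(n)

r2_aux <- summary(lm(x1 ~ x2))$r.squared
1 / (1 - r2_aux)                    # variance inflation factor for x1: large

summary(lm(y ~ x1 + x2))$coef       # big standard errors on x1 and x2
summary(lm(y ~ x1))$coef            # small standard error, but the slope is biased
```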
Variances in Misspecified Models
Consider the true model y = β0 + β1 x1 + β2 x2 + u and two estimated equations: the full regression, with slope estimate β̂1 , and the misspecified regression of y on x1 alone, with slope estimate β̃1 . Conditional on the sample values of the regressors,
Var(β̃1 ) = σ²/SST1 ≤ σ²/[SST1 (1 − R1²)] = Var(β̂1 ).
It might therefore be the case that the omitted variable bias in the misspecified model is more than compensated by its smaller variance.
▶ Case 1: β2 ≠ 0. Then β̃1 is biased while β̂1 is unbiased, but Var(β̃1 ) < Var(β̂1 ): there is a trade-off between bias and variance.
▶ Case 2: β2 = 0. Then both β̃1 and β̂1 are unbiased, and Var(β̃1 ) < Var(β̂1 ), so the model without x2 is preferred.
The main reason for including x2 in the model in Case 1: any bias in β̃1 does not shrink as the sample size grows. On the other hand, the variances shrink to zero as n gets large, which means that the multicollinearity induced by adding x2 becomes less important as the sample size grows.
Estimating the Error Variance
An unbiased estimator of the error variance is obtained by dividing the sum of squared residuals by the number of observations minus the number of estimated parameters:
σ̂² = SSR / (n − k − 1).
The number of observations minus the number of estimated parameters, n − k − 1, is called the degrees of freedom.
▶ Theorem 3.3 (Unbiased estimator of the error variance): under Assumptions MLR.1–MLR.5, E(σ̂²) = σ² (see the sketch below).
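A minimal R sketch, on simulated data, of the estimator σ̂² = SSR/(n − k − 1), which is also the square of the residual standard error that summary(lm) reports.

```r
set.seed(8)
n  <- 200
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n, sd = 2)   # true sigma^2 = 4

fit <- lm(y ~ x1 + x2)
ssr <- sum(resid(fit)^2)
k   <- 2                          # number of slope parameters
ssr / (n - k - 1)                 # sigma2-hat, close to 4
summary(fit)$sigma^2              # the same number, from lm()
```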
Estimation of the Sampling Variances of the OLS Estimators
The sampling variance Var(β̂j ) = σ²/[SSTj (1 − Rj²)] depends on the unknown error variance σ². Plugging in σ̂² gives the estimated variance, and its square root is the standard error:
se(β̂j ) = σ̂ / [SSTj (1 − Rj²)]^(1/2).
Note that these formulas are only valid under Assumptions MLR.1–MLR.5 (in particular, there has to be homoskedasticity); the sketch below checks them against lm().
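A minimal R sketch, reusing fit, x1, and x2 from the previous block, of the standard-error formula computed by hand and checked against summary(lm).

```r
sst1   <- sum((x1 - mean(x1))^2)              # total sample variation in x1
r2_1   <- summary(lm(x1 ~ x2))$r.squared      # R1^2 from the auxiliary regression
sigma2 <- summary(fit)$sigma^2                # estimated error variance

sqrt(sigma2 / (sst1 * (1 - r2_1)))            # se(beta1-hat) by hand
summary(fit)$coef["x1", "Std. Error"]         # the same number, from lm()
```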
Efficiency of OLS: The Gauss-Markov Theorem
▶ Theorem 3.4 (Gauss-Markov Theorem): under Assumptions MLR.1–MLR.5, the OLS estimators are the best linear unbiased estimators (BLUEs) of the regression coefficients, i.e., Var(β̂j ) ≤ Var(β̃j ) for any other linear unbiased estimator β̃j of βj .
OLS is only the best estimator if MLR.1–MLR.5 hold; if there is heteroskedasticity, for example, there are better (more efficient) estimators.
The importance of the Gauss-Markov Theorem is that, when the standard set of assumptions holds, we need not look for alternative unbiased linear estimators: none will be better than OLS.