
LECTURE 3 - MULTIPLE REGRESSION ANALYSIS: ESTIMATION

Course: [107408] ECONOMETRICS

Department of Economics and Statistics,


University of Siena
2024/2025

Clarifications - The intercept

▶ The intercept represents the expected value of the dependent variable when all predictors
are zero

▶ Including the intercept ensures that the OLS residuals have a sample mean of zero.

▶ Sometimes the constant is included in the model but its value doesn’t make practical sense.
For example, if you’re predicting weight based on height, a negative or unrealistic intercept
(like -30 kg) might appear, even though no one can have negative weight. This can happen
due to the linear nature of the model but it doesn’t imply the intercept is meaningful.

▶ Even if the value of the intercept seems nonsensical, it is still crucial to include it in the
model. The intercept ensures the regression line can adjust to fit the data properly.
Excluding the intercept would force the model through the origin, potentially worsening
predictions and model performance.
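
A minimal R sketch of this point, using simulated height/weight data (variable names and numbers are illustrative, not from the slides):

    # Fit the same regression with and without an intercept
    set.seed(1)
    height <- rnorm(100, mean = 170, sd = 10)           # cm
    weight <- -30 + 0.6 * height + rnorm(100, sd = 5)   # kg; the true intercept is "nonsensical" but needed
    with_int    <- lm(weight ~ height)       # usual model
    without_int <- lm(weight ~ height - 1)   # forces the line through the origin
    mean(resid(with_int))      # essentially zero, by construction
    mean(resid(without_int))   # generally not zero; the fit is typically worse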


Clarifications - The error vs residuals

▶ The population error ui is the difference between the true value of the dependent variable y
and the true population regression line:

ui = yi − (β0 + β1 xi )

We do not observe it.
▶ The residual ûi is the difference between the observed data and the fitted regression line
based on the sample:
ûi = yi − (β̂0 + β̂1 xi )
▶ The zero conditional mean assumption relates to the population error term. Even if the
sample residuals average out to zero, the assumption can be violated if, in the population,
the error term is correlated with the independent variables x.
▶ In practice, we cannot observe the population error ui , and thus we cannot directly test
whether the errors ui are correlated with the explanatory variables xi (and hence whether the estimate of β1 is biased).
To address this, we rely on theoretical assumptions about the true relationship between
variables to infer the likelihood of such correlations, even though we cannot test this directly
with sample data.
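
A small simulated illustration (not from the slides): the sample residuals average to zero mechanically even when the population error is correlated with x, so zero-mean residuals tell us nothing about the zero conditional mean assumption.

    set.seed(2)
    ability <- rnorm(500)                 # unobserved factor
    x <- 2 + ability + rnorm(500)         # regressor correlated with the unobserved factor
    u <- ability + rnorm(500)             # population error contains ability
    y <- 1 + 0.5 * x + u
    fit <- lm(y ~ x)
    mean(resid(fit))     # ~0 by construction
    coef(fit)["x"]       # biased away from the true value 0.5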
DEFINITION OF THE MULTIPLE LINEAR REGRESSION MODEL

▶ “Explains variable y in terms of variables x1 , x2 , . . . , xk ”:

y = β0 + β1 x1 + β2 x2 + · · · + βk xk + u

Or in matrix notation: y = Xβ + u

• where y is an n × 1 vector of the dependent variable (with n observations).


• X is an n × (k + 1) matrix of the independent variables, where the first column is all
ones (to account for the intercept), and the remaining columns correspond to the k
independent variables x1 , x2 , . . . , xk .
• β is a (k + 1) × 1 vector of the regression coefficients, which includes the intercept β0
and the slope coefficients β1 , . . . , βk .
• u is an n × 1 vector of the unobserved errors.
MOTIVATION AND EXAMPLE

Motivation for multiple regression


▶ Incorporate more explanatory factors into the model
▶ For a given x, the model explicitly holds fixed the other factors affecting y
▶ Allow for more flexible functional forms

▶ Example: Wage equation

wage = β0 + β1 educ + β2 exper + u

Because it contains experience explicitly, we will be able to measure the effect of education on
wage, holding experience fixed. In a simple regression analysis —which puts exper in the error
term— we would have to assume that experience is uncorrelated with education, a tenuous
assumption.
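
A possible R version of this comparison, assuming the 'wooldridge' data package (which ships the textbook data sets, here wage1 with wage, educ and exper) is installed:

    library(wooldridge)
    data("wage1")
    coef(lm(wage ~ educ + exper, data = wage1))   # effect of educ holding exper fixed
    coef(lm(wage ~ educ, data = wage1))           # simple regression: exper is left in the error term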
EXAMPLES

Example: Average test scores and per student spending

▶ Per student spending is likely to be correlated with average family income at a given high
school because of school financing.
▶ Omitting average family income from the regression would lead to a biased estimate of the
effect of spending on average test scores.

EXAMPLES

Multiple regression analysis is also useful for generalizing functional relationships between
variables.
Example: Suppose family consumption (cons) is a quadratic function of family income (inc):

cons = β0 + β1 inc + β2 inc² + u

▶ Model has two explanatory variables: income and income squared


▶ Consumption is explained as a quadratic function of income
▶ One has to be very careful when interpreting the coefficients. It makes no sense to
measure the effect of inc on cons while holding inc² fixed, because if inc changes, then so
must inc²! Instead, the change in consumption with respect to the change in income—the
marginal propensity to consume—is approximated by

Δcons ≈ (β1 + 2β2 inc) Δinc
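
A quick simulated sketch (numbers are illustrative) of fitting the quadratic and evaluating the implied marginal propensity to consume at the average income:

    set.seed(10)
    inc  <- runif(200, 10, 100)
    cons <- 5 + 0.8 * inc - 0.002 * inc^2 + rnorm(200, sd = 3)
    fit  <- lm(cons ~ inc + I(inc^2))
    b    <- coef(fit)
    unname(b["inc"] + 2 * b["I(inc^2)"] * mean(inc))   # estimated MPC, beta1 + 2*beta2*inc, at mean(inc)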

EXAMPLES

Example: CEO salary, sales and CEO tenure

log(salary) = β0 + β1 log(sales) + β2 ceoten + β3 ceoten² + u

▶ Model assumes a constant elasticity relationship between CEO salary and the sales of his
or her firm.
▶ Model assumes a quadratic relationship between CEO salary and his or her tenure with the
firm.

▶ Meaning of “linear” regression


• The model has to be linear in the parameters (not in the variables)

R Stats: let's see how to run this example in R. Open the Moodle R script "Lecture 3 - Multiple
regression".
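
A possible sketch of that script, assuming the 'wooldridge' data package is installed (its ceosal2 data set contains salary, sales and ceoten):

    library(wooldridge)
    data("ceosal2")
    # log-log in sales (constant elasticity), quadratic in CEO tenure
    fit <- lm(log(salary) ~ log(sales) + ceoten + I(ceoten^2), data = ceosal2)
    summary(fit)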
OLS ESTIMATION OF THE MULTIPLE REGRESSION MODEL

▶ Random sample: {(xi1 , xi2 , . . . , xik , yi ) : i = 1, . . . , n} drawn from the population model

▶ Regression residuals: ûi = yi − β̂0 − β̂1 xi1 − · · · − β̂k xik

▶ The OLS estimates minimize the sum of squared residuals, Σi ûi² , over β̂0 , β̂1 , . . . , β̂k

▶ These estimates can be represented compactly (including the intercept) in matrix notation as
β̂ = (X ′ X )−1 X ′ y (see Appendix E).
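
A short R check (simulated data) that the matrix formula reproduces lm():

    set.seed(3)
    n  <- 200
    x1 <- rnorm(n); x2 <- rnorm(n)
    y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n)
    X  <- cbind(1, x1, x2)                        # n x (k+1) matrix with a column of ones
    beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y  # (X'X)^(-1) X'y
    beta_hat
    coef(lm(y ~ x1 + x2))                         # same numbers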
INTERPRETATION OF THE MULTIPLE REGRESSION MODEL

The estimates have partial effect, or ceteris paribus, interpretations.

▶ The multiple linear regression model manages to hold the values of other explanatory
variables fixed even if, in reality, they are correlated with the explanatory variable under
consideration.
▶ If these other explanatory variables were left in the error term, a ceteris paribus
interpretation would be impossible.
▶ It has still to be assumed that unobserved factors do not change if the explanatory variables
are changed.

INTERPRETATION OF THE MULTIPLE REGRESSION MODEL

Example: Determinants of college GPA

Interpretation
▶ Holding ACT fixed, another point on high school grade point average is associated with an
increase of .453 points in college grade point average
▶ Or: If we compare two students with the same ACT, but the hsGPA of student A is one point
higher, we predict student A to have a colGPA that is .453 higher than that of student B
▶ Holding high school grade point average fixed, another 10 points on ACT are associated
with less than one point on college GPA
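
This example can be reproduced in R, assuming the 'wooldridge' data package is installed (its gpa1 data set contains colGPA, hsGPA and ACT):

    library(wooldridge)
    data("gpa1")
    coef(lm(colGPA ~ hsGPA + ACT, data = gpa1))   # hsGPA coefficient ≈ .453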

PROPERTIES OF OLS ON ANY SAMPLE OF DATA

▶ Fitted values and residuals: ŷi = β̂0 + β̂1 xi1 + · · · + β̂k xik and ûi = yi − ŷi

▶ Algebraic properties of OLS regression: the residuals sum to zero, the sample covariance
between each regressor and the residuals is zero, and the point of sample means
(x̄1 , . . . , x̄k , ȳ ) lies on the regression line.

OLS estimation mechanically ensures these results. These are direct results of the OLS
minimization process. Even though the sample covariance between residuals and regressors is
zero, it doesn’t imply that the true population error term is uncorrelated with the regressors. So
the coefficients can still be biased.
“PARTIALLING OUT” INTERPRETATION OF MULTIPLE REGRESSION

How can multiple regression estimate the effect of a variable while holding other variables fixed?
▶ One can show that the estimated coefficient of an explanatory variable in a multiple
regression can be obtained in two steps:
1. Regress an explanatory variable x1 on all other explanatory variables (x2 , ..., xk )

xi1 = b̂0 + b̂2 xi2 + · · · + b̂k xik + r̂i1


r̂i1 is xi1 after the effects of xi2 , . . . , xik have been partialled out, or netted out.
2. Regress y on the residuals from this regression:

yi = α̂0 + β̂1 r̂i1 + residual

The slope coefficient β̂1 on r̂i1 is numerically identical to the coefficient on x1 in the full
multiple regression of y on x1 , x2 , . . . , xk .


▶ Why does this procedure work?
• The residuals from the first regression are the part of the explanatory variable that is
uncorrelated with the other explanatory variables.
• The slope coefficient of the second regression therefore represents the isolated effect
of the explanatory variable on the dependent variable.
• This general partialling out result is called the Frisch-Waugh theorem.
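
A simulated R check of the partialling-out result (illustrative data):

    set.seed(4)
    n  <- 500
    x2 <- rnorm(n)
    x1 <- 0.5 * x2 + rnorm(n)           # x1 and x2 are correlated
    y  <- 1 + 2 * x1 + 3 * x2 + rnorm(n)
    r1_hat <- resid(lm(x1 ~ x2))        # step 1: net x2 out of x1
    coef(lm(y ~ r1_hat))["r1_hat"]      # step 2: slope on the residuals
    coef(lm(y ~ x1 + x2))["x1"]         # identical to the multiple-regression slope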
GOODNESS-OF-FIT

▶ Decomposition of total variation: SST = SSE + SSR, where SST = Σi (yi − ȳ )² is the total
sum of squares, SSE = Σi (ŷi − ȳ )² the explained sum of squares, and SSR = Σi ûi² the
residual sum of squares

▶ R squared: R² = SSE/SST = 1 − SSR/SST

▶ Alternative expression for R squared: R² equals the squared sample correlation coefficient
between the actual yi and the fitted values ŷi
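
A simulated R sketch confirming the two expressions for R²:

    set.seed(5)
    x1 <- rnorm(100); x2 <- rnorm(100)
    y  <- 1 + x1 + x2 + rnorm(100)
    fit <- lm(y ~ x1 + x2)
    SSR <- sum(resid(fit)^2)
    SST <- sum((y - mean(y))^2)
    1 - SSR / SST               # R squared from the decomposition
    cor(y, fitted(fit))^2       # squared correlation between y and the fitted values
    summary(fit)$r.squared      # lm reports the same value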

GOODNESS-OF-FIT

Example: Explaining arrest records

Interpretation:
▶ If the proportion of prior arrests increases by 0.5, the predicted fall in arrests is 7.5 arrests
per 100 men.
▶ If the months in prison increase from 0 to 12, the predicted fall in arrests is 0.408 arrests for
a particular man.
▶ If the quarters employed increase by 1, the predicted fall in arrests is 10.4 arrests per 100
men.
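
A possible replication in R, assuming the 'wooldridge' data package is installed (its crime1 data set contains narr86, pcnv, ptime86, qemp86 and avgsen):

    library(wooldridge)
    data("crime1")
    coef(lm(narr86 ~ pcnv + ptime86 + qemp86, data = crime1))
    # with the additional regressor discussed on the next slide:
    coef(lm(narr86 ~ pcnv + avgsen + ptime86 + qemp86, data = crime1))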

GOODNESS-OF-FIT

Example: Explaining arrest records (cont.)


An additional explanatory variable is added:

Interpretation:
▶ Average prior sentence increases number of arrests (?). Unexpected sign: it says that a
longer average sentence length increases criminal activity. Limited additional explanatory
power, as R² increases by little.
General remark on R²:
▶ Even if R² is small (as in the given example), regression may still provide good estimates of
ceteris paribus effects.
▶ Generally, a low R² indicates that it is hard to predict individual outcomes on y with much
accuracy.
▶ The fact that R² never decreases when any variable is added to a regression makes it a
poor tool for deciding whether one variable or several variables should be added to a model.
STANDARD ASSUMPTIONS FOR THE MULTIPLE REGRESSION MODEL

You should remember that statistical properties have nothing to do with a particular sample, but
rather with the properties of estimators when random sampling is done repeatedly.
▶ Assumption MLR.1 (Linear in parameters): the population model can be written as
y = β0 + β1 x1 + β2 x2 + · · · + βk xk + u

▶ Assumption MLR.2 (Random sampling): we have a random sample of n observations
{(xi1 , xi2 , . . . , xik , yi ) : i = 1, . . . , n} from the population model

STANDARD ASSUMPTIONS FOR THE MULTIPLE REGRESSION MODEL

Assumption MLR.3 is more complicated than its counterpart for simple regression because we
must now look at relationships between all independent variables.
▶ Assumption MLR.3 (No perfect collinearity)
• In the sample (and therefore in the population), none of the independent variables is
constant and there are no exact linear relationships among the independent variables.

▶ Remarks on MLR.3
• Constant variables are also ruled out (collinear with intercept).
• The assumption only rules out perfect collinearity/correlation between explanatory
variables; imperfect correlation is allowed.
• If an explanatory variable is a perfect linear combination of other explanatory variables
it is superfluous and may be eliminated.
• For example, in estimating a relationship between consumption and income, it makes
no sense to include as independent variables income measured in dollars as well as
income measured in thousands of dollars
• The model cons = β0 + β1 inc + β2 inc² + u does not violate Assumption MLR.3: even
though x2 = inc² is an exact function of x1 = inc, inc² is not an exact linear function of
inc.
STANDARD ASSUMPTIONS FOR THE MULTIPLE REGRESSION MODEL

▶ Example for perfect collinearity: small sample

▶ Example for perfect collinearity: relationships between regressors

STANDARD ASSUMPTIONS FOR THE MULTIPLE REGRESSION MODEL

▶ Assumption MLR.4 (Zero conditional mean): E(u | x1 , x2 , . . . , xk ) = 0

▶ In a multiple regression model, the zero conditional mean assumption is more likely to hold
because fewer things end up in the error. Nevertheless, in any application, there are always
factors that, due to data limitations or ignorance, we will not be able to include.
▶ One way that Assumption MLR.4 can fail is if the functional relationship between the
explained and explanatory variables is misspecified. For example, if we forget to include the
quadratic term inc² in the consumption function cons = β0 + β1 inc + β2 inc² + u when we
estimate the model. We must include what actually shows up in the population model.
▶ There are other ways that u can be correlated with an explanatory variable: measurement
error in an explanatory variable; and explanatory variables that are determined jointly with
y . We must postpone our study of these problems until we have a firm grasp of multiple
regression analysis under an ideal set of assumptions.
STANDARD ASSUMPTIONS FOR THE MULTIPLE REGRESSION MODEL

▶ When Assumption MLR.4 holds, we often say that we have exogenous explanatory
variables.
▶ If xj is correlated with u for any reason, then xj is said to be an endogenous explanatory
variable.
▶ Do not confuse Assumptions MLR.3 and MLR.4. Assumption MLR.3 rules out certain
relationships among the independent or explanatory variables and has nothing to do with
the error, u.
▶ You will know immediately when carrying out OLS estimation whether or not Assumption
MLR.3 holds (i.e., the software reports an error when it does not). On the other hand, we will never know for
sure whether the average value of the unobserved factors is unrelated to the explanatory
variables.

▶ Example: Average test scores

INCLUDING IRRELEVANT VARIABLES OR OMITTING RELEVANT VARIABLES

▶ Including irrelevant variables in a regression model: including variables whose population
coefficients are zero (overspecifying the model) does not cause bias, but it can increase the
sampling variances of the OLS estimators.

▶ Omitting relevant variables: the simple case. Suppose the true model is
y = β0 + β1 x1 + β2 x2 + u, but we omit x2 and estimate ỹ = β̃0 + β̃1 x1 .


This problem generally causes the OLS estimators to be biased. It is time to show this
explicitly and, just as importantly, to derive the direction and size of the bias.

INCLUDING IRRELEVANT VARIABLES OR OMITTING RELEVANT VARIABLES

▶ Omitted variable bias

In the underspecified regression ỹ = β̃0 + β̃1 x1 , one can show that β̃1 = β̂1 + β̂2 δ̃1 , where
β̂1 and β̂2 are the OLS estimates from the full regression of y on x1 and x2 , and δ̃1 is the
slope from regressing x2 on x1 . Taking expectations,

Bias(β̃1 ) = E(β̃1 ) − β1 = E(β̂1 + β̂2 δ̃1 ) − β1 = β2 δ̃1

Conclusion: the estimated coefficients of the underspecified regression will generally be biased.

δ̃1 is the sample covariance between x1 and x2 divided by the sample variance of x1 :
δ̃1 = Cov(x1 , x2 ) / Var(x1 )
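
A simulated R check of the algebraic identity β̃1 = β̂1 + β̂2 δ̃1 (illustrative data):

    set.seed(6)
    n  <- 1000
    x2 <- rnorm(n)
    x1 <- 0.8 * x2 + rnorm(n)
    y  <- 1 + 2 * x1 + 3 * x2 + rnorm(n)
    b_long  <- coef(lm(y ~ x1 + x2))          # beta1_hat, beta2_hat
    b_short <- coef(lm(y ~ x1))["x1"]         # beta1_tilde, with x2 omitted
    delta1  <- coef(lm(x2 ~ x1))["x1"]        # delta1_tilde
    unname(b_short)
    unname(b_long["x1"] + b_long["x2"] * delta1)   # exactly the same number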

INCLUDING IRRELEVANT VARIABLES OR OMITTING RELEVANT VARIABLES

▶ Example: Omitting ability in a wage equation

When is there no omitted variable bias? If the omitted variable is irrelevant (β2 = 0) or
uncorrelated with the included regressor (δ̃1 = 0).

INCLUDING IRRELEVANT VARIABLES OR OMITTING RELEVANT VARIABLES

In practice, because β2 is an unknown population parameter, we cannot be certain whether β2 is
positive or negative. Nevertheless, we usually have a pretty good idea about the direction of the
partial effect of x2 on y . Further, even though the sign of the correlation between x1 and x2
cannot be known if x2 is not observed, in many cases, we can make an educated guess about
whether x1 and x2 are positively or negatively correlated.

INCLUDING IRRELEVANT VARIABLES OR OMITTING RELEVANT VARIABLES

▶ Omitted variable bias: more general cases

Usually, no general statements about the direction of the bias are possible, because x1 , x2 , and x3
can all be pairwise correlated. The analysis is as in the simple case if one regressor is
uncorrelated with the others.

▶ Example: Omitting ability in a wage equation

STANDARD ASSUMPTIONS FOR THE MULTIPLE REGRESSION MODEL (CONT.)

▶ Assumption MLR.5 (Homoskedasticity): the variance of the error does not depend on the
explanatory variables, Var(u | x1 , . . . , xk ) = σ²

▶ Example: Wage equation

▶ Short-hand notation (x in bold for the vector of all explanatory variables): Var(u | x) = σ²

STANDARD ASSUMPTIONS FOR THE MULTIPLE REGRESSION MODEL (CONT.)

▶ Theorem 3.2 (Sampling variances of the OLS slope estimators)

Under assumptions MLR.1 – MLR.5:

Var(β̂j ) = σ² / [SSTj (1 − Rj²)],   j = 1, . . . , k ,

where SSTj = Σi (xij − x̄j )² is the total sample variation in xj and Rj² is the R-squared from
regressing xj on all the other independent variables (including an intercept).
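
A simulated R check of this variance formula against the estimate reported by lm():

    set.seed(7)
    n  <- 300
    x2 <- rnorm(n)
    x1 <- 0.6 * x2 + rnorm(n)
    y  <- 1 + x1 + x2 + rnorm(n)
    fit <- lm(y ~ x1 + x2)
    sigma2 <- sum(resid(fit)^2) / (n - 3)          # sigma hat squared, df = n - k - 1
    SST1   <- sum((x1 - mean(x1))^2)
    R2_1   <- summary(lm(x1 ~ x2))$r.squared
    sigma2 / (SST1 * (1 - R2_1))                   # the formula, with sigma2 plugged in
    vcov(fit)["x1", "x1"]                          # the same estimated variance from lm()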

COMPONENTS OF OLS VARIANCES:

1. The error variance


• A high error variance increases the sampling variance because there is more “noise” in
the equation.
• The error variance does not decrease with the sample size, because it is a feature of the
population.
• For a given dependent variable y , there is only one way to reduce the error variance:
adding more explanatory variables to the equation (take some factors out of the error
term).

2. The total sample variation in the explanatory variable


• More sample variation leads to more precise estimates.
• Total sample variation automatically increases with the sample size.
• Increasing the sample size is thus a way to get more precise estimates.
• When SSTj is small, Var (β̂j ) can get very large, but a small SSTj is not a violation of
Assumption MLR.3.

COMPONENTS OF OLS VARIANCES:

3. Linear relationships among the independent variables


• Regress xj on all other independent variables (including constant)
• The R-squared of this regression, Rj², will be higher the better xj can be explained by
the other independent variables.
• The sampling variance of the slope estimator for xj will therefore be higher the better xj
can be explained by the other independent variables.
• As Rj² approaches 1 (perfect multicollinearity), the variance of the slope estimator approaches infinity.

MULTICOLLINEARITY

▶ An example for multicollinearity

▶ The different expenditure categories will be strongly correlated because if a school has a lot
of resources it will spend a lot on everything.
▶ It will be hard to estimate the differential effects of different expenditure categories because
all expenditures are either high or low. For precise estimates of the differential effects, one
would need information about situations where expenditure categories change differentially.
▶ As a consequence, sampling variance of the estimated effects will be large.

MULTICOLLINEARITY

▶ Discussion of the multicollinearity problem

• In the above example, it would probably be better to lump all expenditure categories
together because effects cannot be disentangled.

• In other cases, dropping some independent variables may reduce multicollinearity (but
this may lead to omitted variable bias).

• Only the sampling variance of the variables involved in multicollinearity will be inflated;
the estimates of other effects may be very precise.

• Note that multicollinearity is not a violation of MLR.3 in the strict sense.

• Multicollinearity may be detected through “variance inflation factors.”
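
A simulated sketch of a variance inflation factor computed from its definition, VIFj = 1/(1 − Rj²):

    set.seed(8)
    x2 <- rnorm(200)
    x1 <- 0.9 * x2 + rnorm(200, sd = 0.3)   # strongly, but not perfectly, collinear
    y  <- 1 + x1 + x2 + rnorm(200)
    R2_1 <- summary(lm(x1 ~ x2))$r.squared
    1 / (1 - R2_1)                          # VIF for x1
    # car::vif(lm(y ~ x1 + x2)) gives the same numbers if the 'car' package is installed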

VARIANCES IN MISSPECIFIED MODELS

▶ The choice of whether to include a particular variable in a regression can be made by
analyzing the tradeoff between bias and variance. Compare the correctly specified model 1,
ŷ = β̂0 + β̂1 x1 + β̂2 x2 , with the misspecified model 2, ỹ = β̃0 + β̃1 x1 , which omits x2 .

It might be the case that the likely omitted variable bias in the misspecified model 2 is
overcompensated by a smaller variance.

VARIANCES IN MISSPECIFIED MODELS

Assuming x1 and x2 are correlated:

▶ Case 1: β2 ≠ 0 (x2 belongs in the population model). Then β̃1 is biased while β̂1 is unbiased,
but Var(β̃1 ) < Var(β̂1 ): there is a tradeoff between bias and variance.

▶ Case 2: β2 = 0 (x2 is irrelevant). Then both β̃1 and β̂1 are unbiased and Var(β̃1 ) < Var(β̂1 ),
so the smaller model is preferred.

Main reason for including x2 in the model: any bias in β̃1 does not shrink as the sample size
grows. On the other hand, the variances shrink to zero as n gets large, which means that the
multicollinearity induced by adding x2 becomes less important as the sample size grows.
ESTIMATING THE ERROR VARIANCE

An unbiased estimator of the error variance σ² is obtained by dividing the sum of squared
residuals by the number of observations minus the number of estimated regression coefficients.
The number of observations minus the number of estimated parameters, n − (k + 1), is also
called the degrees of freedom.
▶ Theorem 3.3 (Unbiased estimator of the error variance): under assumptions MLR.1 – MLR.5,

σ̂² = SSR / (n − k − 1)   and   E(σ̂²) = σ²
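
A simulated R check that SSR/(n − k − 1) is what lm() reports as the squared residual standard error:

    set.seed(9)
    n  <- 150
    x1 <- rnorm(n); x2 <- rnorm(n)
    y  <- 1 + x1 - x2 + rnorm(n, sd = 2)    # true error variance is 4
    fit <- lm(y ~ x1 + x2)
    sum(resid(fit)^2) / (n - 2 - 1)         # SSR divided by the degrees of freedom
    summary(fit)$sigma^2                    # the same estimate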

ESTIMATION OF THE SAMPLING VARIANCES OF THE OLS ESTIMATORS

The sampling variances are estimated by replacing the unknown σ² with σ̂², which also yields
the standard errors se(β̂j ) = σ̂ / [SSTj (1 − Rj²)]^(1/2) .
Note that these formulas are only valid under assumptions MLR.1 – MLR.5 (in particular, there
has to be homoskedasticity).

EFFICIENCY OF OLS: THE GAUSS-MARKOV THEOREM


▶ Under assumptions MLR.1 - MLR.5, OLS is unbiased


▶ However, under these assumptions there may be many other estimators that are unbiased.
Which one is the unbiased estimator with the smallest variance?
▶ In order to answer this question one usually limits oneself to linear estimators, i.e.
estimators linear in the dependent variable.

THEOREM 3.4 (GAUSS-MARKOV THEOREM)


▶ Under assumptions MLR.1 - MLR.5, the OLS estimators are the best linear unbiased
estimators (BLUEs) of the regression coefficients, i.e. Var(β̂j ) ≤ Var(β̃j ) for j = 0, 1, . . . , k ,
for any other estimator β̃j that is linear in the dependent variable and unbiased.

OLS is only the best estimator if MLR.1 – MLR.5 hold; if there is heteroskedasticity for example,
there are better estimators.
The importance of the Gauss-Markov Theorem is that, when the standard set of assumptions
holds, we need not look for alternative unbiased linear estimators: none will be better than OLS.

REFERENCES

Wooldridge, J.M. (2018). Introductory Econometrics: A Modern Approach, Seventh Edition.
Cengage. Chapter 3 - Multiple Regression Analysis: Estimation; and Appendix E.
