Ecc321 Chapter 3

Multiple Regression Analysis: Estimation


Definition of Multiple Linear Regression
The multiple linear regression model explains a variable y in terms of explanatory variables x_1, x_2, ..., x_k:

y = β_0 + β_1 x_1 + β_2 x_2 + ... + β_k x_k + u

Motivation for Multiple Regression


Incorporate more explanatory factors into the model.
Explicitly hold fixed other factors that otherwise would be in the error term.
Allow for more flexible functional forms.
Example: Wage equation.

Example: Average Test Scores and Per-Student Spending
Per-student spending is likely to be correlated with average family income at a given high school due to school financing. Omitting average family income from the regression would lead to a biased estimate of the effect of spending on average test scores. In a simple regression model, the estimated effect of per-student spending would partly include the effect of family income on test scores.

Example: Family Income and Family Consumption
The model has two explanatory variables: income and income squared.
Consumption is explained as a quadratic function of income. One has to be very
careful when interpreting the coefficients.

Example: CEO Salary, Sales, and CEO Tenure


Model assumes a constant elasticity relationship between CEO salary and the
sales of his or her firm.
Model assumes a quadratic relationship between CEO salary and his or her
tenure with the firm.
The model has to be linear in the parameters (not in the variables).

OLS Estimation of the Multiple Regression Model
Random sample
Regression residuals
Minimize the sum of squared residuals (a sketch follows below)
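
A minimal sketch of how the OLS estimates can be computed from a random sample by minimizing the sum of squared residuals. The variable names and simulated data below are hypothetical and only illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical random sample: y = 1 + 2*x1 - 0.5*x2 + u
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1 + 2 * x1 - 0.5 * x2 + rng.normal(size=n)

# Design matrix with an intercept column
X = np.column_stack([np.ones(n), x1, x2])

# OLS: the minimizer of the sum of squared residuals solves the
# normal equations X'X beta = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

residuals = y - X @ beta_hat      # regression residuals
ssr = residuals @ residuals       # minimized sum of squared residuals
print(beta_hat, ssr)
```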

Interpretation of the Multiple Regression Model
The multiple linear regression model manages to hold the values of other
explanatory variables fixed even if, in reality, they are correlated with the explanatory
variable under consideration.

Ceteris paribus interpretation: it still has to be assumed that unobserved factors do not change if the explanatory variables are changed.

Example: Determinants of College GPA


Interpretation: Holding ACT fixed, another point on high school grade point average is associated with .453 additional points of college grade point average.
If we compare two students with the same ACT, but the high school GPA of
student A is one point higher, we predict student A to have a college GPA that
is .453 higher than that of student B.
Holding high school grade point average fixed, another 10 points on ACT are
associated with less than one point on college GPA.

Properties of OLS on Any Sample of Data


Fitted values and residuals
Algebraic properties of OLS regression


Partialling Out Interpretation of Multiple Regression
One can show that the estimated coefficient of an explanatory variable in a multiple
regression can be obtained in two steps:

1. Regress the explanatory variable on all other explanatory variables.


2. Regress y on the residuals from this regression.

The residuals from the first regression are the part of the explanatory variable that is
uncorrelated with the other explanatory variables. The slope coefficient of the second
regression therefore represents the isolated effect of the explanatory variable on the
dependent variable.
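
A small sketch of this two-step (partialling-out) result on hypothetical simulated data; the regressor names are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)          # x1 correlated with x2
y = 1 + 2 * x1 - 0.5 * x2 + rng.normal(size=n)

X_full = np.column_stack([np.ones(n), x1, x2])
beta_full = np.linalg.lstsq(X_full, y, rcond=None)[0]

# Step 1: regress x1 on all other explanatory variables (constant and x2)
X_other = np.column_stack([np.ones(n), x2])
gamma = np.linalg.lstsq(X_other, x1, rcond=None)[0]
r1 = x1 - X_other @ gamma                   # part of x1 uncorrelated with x2

# Step 2: regress y on these residuals; the slope equals the
# multiple regression coefficient on x1
slope = (r1 @ y) / (r1 @ r1)
print(beta_full[1], slope)                  # the two numbers coincide
```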

Goodness-of-Fit
Decomposition of total variation: SST = SSE + SSR
R² = SSE / SST = 1 − SSR / SST
Alternative expression for R²: the squared correlation between the actual and the fitted values of y (a short computational sketch follows below)
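
A sketch of the decomposition and of both ways of computing R², again on hypothetical simulated data:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

beta = np.linalg.lstsq(X, y, rcond=None)[0]
y_hat = X @ beta
u_hat = y - y_hat

sst = np.sum((y - y.mean()) ** 2)           # total variation
sse = np.sum((y_hat - y_hat.mean()) ** 2)   # explained variation
ssr = np.sum(u_hat ** 2)                    # residual variation

r2 = 1 - ssr / sst                          # R-squared
r2_alt = np.corrcoef(y, y_hat)[0, 1] ** 2   # squared corr(actual, fitted)
print(sst, sse + ssr, r2, r2_alt)           # SST = SSE + SSR; r2 == r2_alt
```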

Example: Explaining Arrest Records


If the proportion prior arrests increases by 0.5, the predicted fall in arrests is 7.5
arrests per 100 men.
If the months in prison increase from 0 to 12, the predicted fall in arrests is
0.408 arrests for a particular man.
If the quarters employed increase by 1, the predicted fall in arrests is 10.4
arrests per 100 men.
When an additional explanatory variable is added (average prior sentence), the additional explanatory power is limited, as R² increases only slightly.
Even if R² is small, the regression may still provide good estimates of ceteris paribus effects.

Standard Assumptions for the Multiple Regression Model


Assumption — Description

MLR.1 (Linear in parameters)
MLR.2 (Random sampling)
MLR.3 (No perfect collinearity): In the sample (and therefore in the population), none of the independent variables is constant and there are no exact linear relationships among the independent variables.
MLR.4 (Zero conditional mean): In a multiple regression model, the zero conditional mean assumption is much more likely to hold because fewer things end up in the error.
MLR.5 (Homoskedasticity)

Remarks on MLR.3
The assumption only rules out perfect collinearity/correlation between
explanatory variables; imperfect correlation is allowed.
If an explanatory variable is a perfect linear combination of other explanatory
variables, it is superfluous and may be eliminated.
Constant variables are also ruled out (collinear with intercept).

Example for Perfect Collinearity


Small sample and relationships between regressors.

Example: Average Test Scores


In a multiple regression model, the zero conditional mean assumption is much more
likely to hold because fewer things end up in the error.

Omitted Variable Bias

Conclusion: if a relevant variable is omitted and is correlated with the included regressors, all estimated coefficients will generally be biased.


Example: Omitting Ability in a Wage Equation
When is there no omitted variable bias? If the omitted variable is irrelevant or uncorrelated with the included explanatory variables.

More General Cases


No general statements possible about direction of bias. Analysis as in simple case if
one regressor uncorrelated with others. Example: Omitting ability in a wage equation.
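
A small simulation sketch of omitted variable bias, assuming a hypothetical data-generating process in which the included regressor is correlated with the omitted one:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

abil = rng.normal(size=n)                  # omitted variable (e.g. ability)
educ = 0.5 * abil + rng.normal(size=n)     # included regressor, correlated with abil
wage = 1 + 0.8 * educ + 0.4 * abil + rng.normal(size=n)

# Short regression: wage on educ only (ability omitted)
X_short = np.column_stack([np.ones(n), educ])
b_short = np.linalg.lstsq(X_short, wage, rcond=None)[0]

# Long regression: ability included
X_long = np.column_stack([np.ones(n), educ, abil])
b_long = np.linalg.lstsq(X_long, wage, rcond=None)[0]

print(b_short[1])   # biased upward, since abil has a positive effect
                    # and is positively correlated with educ
print(b_long[1])    # close to the true value 0.8
```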

Homoskedasticity
Example: Wage equation

Shorthand notation: Var(u | x_1, x_2, ..., x_k) = σ²

Sampling Variances of the OLS Slope Estimators
Under assumptions MLR.1 – MLR.5:

Var(β̂_j) = σ² / [SST_j (1 − R_j²)],   j = 1, ..., k

Components of OLS Variances


1. The error variance (σ²):
A high error variance increases the sampling variance because there is more noise in the equation.
A large error variance does not necessarily make estimates imprecise.
The error variance does not decrease with the sample size.

2. The total sample variation in the explanatory variable (SST_j):
More sample variation leads to more precise estimates.
Total sample variation automatically increases with the sample size.
Increasing the sample size is thus a way to get more precise estimates.

3. Linear relationships among the independent variables (R_j²):
Regress x_j on all other independent variables (including a constant).
The R² of this regression will be higher when x_j can be better explained by the other independent variables.
The sampling variance of the slope estimator for x_j will therefore be higher when x_j can be better explained by the other independent variables.
Under perfect multicollinearity, the variance of the slope estimator will approach infinity.

A numerical sketch of these components follows below.
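
A sketch evaluating the variance formula and its components for one coefficient on hypothetical simulated data; the true σ² is known here only because the data are simulated:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2000
sigma2 = 4.0                                  # true error variance

x2 = rng.normal(size=n)
x1 = 0.8 * x2 + rng.normal(size=n)            # correlated regressors
y = 1 + 2 * x1 - 0.5 * x2 + rng.normal(scale=np.sqrt(sigma2), size=n)

# Components for the slope on x1
sst_1 = np.sum((x1 - x1.mean()) ** 2)         # total sample variation in x1

# R_1^2: regress x1 on the other regressors (constant and x2)
X_other = np.column_stack([np.ones(n), x2])
x1_hat = X_other @ np.linalg.lstsq(X_other, x1, rcond=None)[0]
r2_1 = 1 - np.sum((x1 - x1_hat) ** 2) / sst_1

var_beta1 = sigma2 / (sst_1 * (1 - r2_1))     # Var(beta_hat_1) under MLR.1-MLR.5
print(sst_1, r2_1, var_beta1)
```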

Multicollinearity

Example
The different expenditure categories will be strongly correlated because if a school
has a lot of resources it will spend a lot on everything. It will be hard to estimate the
differential effects of different expenditure categories because all expenditures are
either high or low. For precise estimates of the differential effects, one would need
information about situations where expenditure categories change differentially. As a
consequence, sampling variance of the estimated effects will be large.

Discussion


In the above example, it would probably be better to lump all expenditure categories together because effects cannot be disentangled.
In other cases, dropping some independent variables may reduce
multicollinearity (but this may lead to omitted variable bias).
Only the sampling variance of the variables involved in multicollinearity will be
inflated; the estimates of other effects may be very precise.
Multicollinearity is not a violation of MLR.3 in the strict sense.
Multicollinearity may be detected through variance inflation factors, VIF_j = 1 / (1 − R_j²) (a sketch follows below).
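
A sketch of computing variance inflation factors for each regressor, using hypothetical simulated expenditure-style data in which two categories share a common driver:

```python
import numpy as np

def vif(X):
    """Variance inflation factor 1 / (1 - R_j^2) for each column of X
    (X should not include the constant column)."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        fitted = others @ np.linalg.lstsq(others, X[:, j], rcond=None)[0]
        ssr = np.sum((X[:, j] - fitted) ** 2)
        sst = np.sum((X[:, j] - X[:, j].mean()) ** 2)
        out[j] = sst / ssr                    # = 1 / (1 - R_j^2)
    return out

rng = np.random.default_rng(5)
n = 500
resources = rng.normal(size=n)                # common driver of both categories
teachers = resources + 0.3 * rng.normal(size=n)
materials = resources + 0.3 * rng.normal(size=n)
other = rng.normal(size=n)                    # unrelated regressor
X = np.column_stack([teachers, materials, other])
print(vif(X))                                 # first two VIFs are inflated
```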

Variances in Misspecified Models


The choice of whether to include a particular variable in a regression can be made by analyzing the tradeoff between bias and variance. It may be that the omitted variable bias in the misspecified model is more than offset by its smaller variance.

Estimating the Error Variance


An unbiased estimate of the error variance is obtained by dividing the sum of squared residuals by the degrees of freedom, i.e. the number of observations minus the number of estimated parameters (n − k − 1).

The n estimated squared residuals in the sum are not completely independent but
related through the k + 1 equations that define the first-order conditions of the
minimization problem.

Theorem 3.3 (Unbiased estimator of the error variance)

σ̂² = SSR / (n − k − 1)

Estimation of the Sampling Variances of the OLS Estimators
The estimated sampling variance is obtained by replacing σ² with σ̂² in the variance formula: Var̂(β̂_j) = σ̂² / [SST_j (1 − R_j²)]. Note that these formulas are only valid under assumptions MLR.1–MLR.5 (in particular, there has to be homoskedasticity).
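
A sketch combining Theorem 3.3 with the estimated standard errors, continuing the hypothetical simulated-data examples above (valid only under MLR.1–MLR.5):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 800
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1 + 2 * x1 - 0.5 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
k = X.shape[1] - 1                              # number of slope parameters

beta = np.linalg.lstsq(X, y, rcond=None)[0]
ssr = np.sum((y - X @ beta) ** 2)

sigma2_hat = ssr / (n - k - 1)                  # Theorem 3.3: unbiased estimator

# Standard error of the slope on x1: sqrt(sigma2_hat / (SST_1 * (1 - R_1^2)))
sst_1 = np.sum((x1 - x1.mean()) ** 2)
others = np.column_stack([np.ones(n), x2])
fit = others @ np.linalg.lstsq(others, x1, rcond=None)[0]
r2_1 = 1 - np.sum((x1 - fit) ** 2) / sst_1
se_beta1 = np.sqrt(sigma2_hat / (sst_1 * (1 - r2_1)))
print(sigma2_hat, se_beta1)
```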

Efficiency of OLS: The Gauss-Markov Theorem


Under assumptions MLR.1 – MLR.5, OLS is unbiased. However, under these assumptions there may be many other estimators that are unbiased. Which one is the unbiased estimator with the smallest variance? In order to answer this question, one usually limits oneself to linear estimators, i.e., estimators linear in the dependent variable.

Theorem 3.4 (Gauss-Markov Theorem)

Under assumptions MLR.1 – MLR.5, the OLS estimators are the best linear unbiased estimators (BLUEs) of the regression coefficients. In other words, OLS is only guaranteed to be the best estimator if MLR.1 – MLR.5 hold; if there is heteroskedasticity, for example, there are better estimators.

Several Scenarios for Applying Multiple Regression
Prediction: The best prediction of y will be its conditional expectation.

Efficient markets: Efficient markets theory states that a single variable acts as a sufficient statistic for predicting y. Once we know this sufficient statistic, additional information is not useful in predicting y.

Measuring the tradeoff between two variables: Consider regressing salary on pension compensation and other controls.

Testing for ceteris paribus group differences: Differences in outcomes between groups can be evaluated with dummy variables.

Potential outcomes, treatment effects, and policy analysis: With multiple regression, we can get closer to random assignment by conditioning on observables. Inclusion of the x variables allows us to control for any reasons why there may not be random assignment.

For example, if y is earnings and w is participation in a job training program, then the variables in x would include all of those variables that are likely to be related to both earnings and participation in job training.
