
Lecture 3: Multiple Regression Model
Applied Econometrics
Dr. Le Anh Tuan

1
Introducing Multiple Regression
► The model explains variable y in terms of variables x1, x2, …, xk:

    y = β0 + β1 x1 + β2 x2 + ⋯ + βk xk + u

► Terminology:
  ► y: dependent variable, explained variable, response variable, …
  ► x1, …, xk: independent variables, explanatory variables, regressors, …
  ► β0: intercept, constant
  ► β1, …, βk: slope parameters, coefficients
  ► u: error term, disturbance, unobservables, residuals, …
2
Why Do We Need Multiple Regression?
► Control for many other factors that simultaneously affect the dependent variable
► Reduce omitted variable bias
► Explicitly hold fixed other factors that would otherwise end up in the error term u
► Once we control for a factor, the ceteris paribus condition with respect to this factor is automatically fulfilled
► Note that in a multiple regression model, our primary interest is usually still the coefficient on the main independent variable; the other regressors serve as control variables.
3
Why Do We Need Multiple Regression?
► Useful for modeling flexible functional relationships between variables
► Quadratic function: the model has two explanatory variables, income and income squared, so consumption is explained as a quadratic function of income:

    cons = β0 + β1 inc + β2 inc² + u

► Interaction terms:

    y = β0 + β1 x1 + β2 x2 + β3 (x1 · x2) + u
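A minimal sketch of how such specifications can be fitted in Python with statsmodels; the data, variable names, and coefficient values below are hypothetical and only illustrate the quadratic and interaction terms.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500
inc = rng.uniform(10, 100, size=n)                       # hypothetical income data
kids = rng.integers(0, 4, size=n)                        # hypothetical second regressor
cons = 5 + 0.8 * inc - 0.003 * inc**2 + rng.normal(0, 5, size=n)
df = pd.DataFrame({"cons": cons, "inc": inc, "kids": kids})

# Quadratic specification: cons = b0 + b1*inc + b2*inc^2 + u
quad = smf.ols("cons ~ inc + I(inc**2)", data=df).fit()

# Interaction specification: cons = b0 + b1*inc + b2*kids + b3*inc*kids + u
inter = smf.ols("cons ~ inc * kids", data=df).fit()      # '*' expands to inc + kids + inc:kids

print(quad.params)
print(inter.params)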

4
Simple Regression vs. Multiple Regression
► Most of the properties of the simple regression model
directly extend to the multiple regression case
→ same principles
► Population regression model
    yi = β0 + β1 xi1 + β2 xi2 + ⋯ + βk xik + ui

► Sample regression model
    yi = β̂0 + β̂1 xi1 + β̂2 xi2 + ⋯ + β̂k xik + ûi

► Fitted values of y:
    ŷi = β̂0 + β̂1 xi1 + β̂2 xi2 + ⋯ + β̂k xik

► Residuals:
    ûi = yi − ŷi = yi − β̂0 − β̂1 xi1 − ⋯ − β̂k xik
5
OLS model

► Choose the values β̂0, β̂1, β̂2, …, β̂k such that the sum of squared residuals (SSR) is minimized:

    SSR = Σi ûi² = Σi ( yi − β̂0 − β̂1 xi1 − ⋯ − β̂k xik )²
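A minimal numpy sketch of this computation; the design matrix and data are hypothetical. The least-squares solution returned below is exactly the coefficient vector that minimizes the SSR.

import numpy as np

def ols(X, y):
    """OLS coefficients that minimize SSR = sum((y - X @ b)**2).
    X is an n x (k+1) design matrix whose first column is all ones (the intercept)."""
    b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares solution; satisfies (X'X) b = X'y
    return b_hat

# Hypothetical example data
rng = np.random.default_rng(0)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
print(ols(X, y))                                    # approximately [1.0, 2.0, -0.5]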

6
Properties of OLS

► Algebraic properties of OLS regression

► Deviations from the regression line (the residuals) sum up to zero
► Covariance between the residuals and the regressors is zero
► The sample averages of y and the x's lie on the regression line
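These algebraic properties follow mechanically from the OLS first-order conditions and can be checked numerically; a small sketch with hypothetical data:

import numpy as np

rng = np.random.default_rng(1)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])

b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b_hat
u_hat = y - y_hat

print(u_hat.sum())                       # ~0: residuals sum to zero
print(X[:, 1] @ u_hat, X[:, 2] @ u_hat)  # ~0: residuals are orthogonal to each regressor
print(y.mean(), b_hat @ X.mean(axis=0))  # equal: the point of sample averages lies on the regression line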

7
Interpreting the coefficients
► Interpretation of the multiple linear regression model: the coefficient βj tells us by how much the dependent variable changes if the j-th independent variable is increased by one unit, holding all other independent variables and the error term constant.

► The multiple linear regression model manages to hold the values of the other explanatory variables fixed even if, in reality, they are correlated with the explanatory variable under consideration.
► "Ceteris paribus" interpretation
► It still has to be assumed that unobserved factors do not change if the explanatory variables are changed.
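In symbols, the ceteris paribus interpretation can be stated compactly as follows (a standard restatement, not taken verbatim from the slides):

% Partial (ceteris paribus) effect of x_j in the multiple regression model
\Delta \hat{y} \;=\; \hat{\beta}_j \, \Delta x_j
\qquad \text{holding } x_1,\dots,x_{j-1},x_{j+1},\dots,x_k \text{ fixed,}
\qquad \text{i.e.} \qquad
\hat{\beta}_j \;=\; \frac{\partial \hat{y}}{\partial x_j}.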

8
Example
► Determinants of college GPA (fitted regression):

    predicted colGPA = β̂0 + 0.453 hsGPA + 0.0094 ACT

  where colGPA is the grade point average at college, hsGPA is the high school grade point average, and ACT is the achievement test score.

► The coefficient on hsGPA: holding ACT fixed, another point of high school GPA is associated with an increase in college GPA of 0.453 points.
► Or: if we compare two students with the same ACT score, but student A's hsGPA is one point higher, we predict student A's colGPA to be 0.453 points higher than student B's.
► Holding high school GPA fixed, one more point on the ACT is associated with an increase in college GPA of 0.0094 points.
9
Goodness of Fit

10
Goodness of Fit
► Decomposition of total variation: SST = SSE + SSR
  ► SST: total sum of squares, represents the total variation in the dependent variable
  ► SSE: explained sum of squares, represents the variation explained by the regression (the independent variables)
  ► SSR: residual sum of squares, represents the variation not explained by the regression
11
Goodness of Fit

► R-squared measures the fraction of the total variation that is explained by the regression:

    R² = SSE / SST = 1 − SSR / SST

► R² is the fraction of the sample variation in y that is explained by the independent variables (x1, x2, x3, …, xk).

► R² never decreases (and usually increases) when we include additional variables.

► Additional regressors can cause trouble, especially when we have few observations.

12
Goodness of Fit
► Econometricians came up with the adjusted R-squared, which enables a comparison of models:

    adjusted R² = 1 − (1 − R²) · (n − 1) / (n − k − 1)

► Adjusted R² < R², which implies that as the number of variables increases, the adjusted R² increases by less than R².

► It is good practice to use the adjusted R² rather than R², because R² tends to give an overly optimistic picture of the fit of the regression, particularly when the number of explanatory variables is large compared with the number of observations.
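A small numpy sketch of these two measures; the inputs (y, fitted values, and the number of slope coefficients k) are assumed to come from an OLS fit such as the earlier sketches.

import numpy as np

def goodness_of_fit(y, y_hat, k):
    """R-squared and adjusted R-squared for a regression with k slope coefficients."""
    n = len(y)
    ssr = np.sum((y - y_hat) ** 2)            # residual sum of squares
    sst = np.sum((y - y.mean()) ** 2)         # total sum of squares
    r2 = 1 - ssr / sst
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return r2, adj_r2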
13
Assumptions of
Multiple Linear Regression (MLR)

14
Assumptions of MLR
► Assumption MLR.1 (Linear population model). In the population, the relationship between the dependent and independent variables is linear:

    y = β0 + β1 x1 + β2 x2 + ⋯ + βk xk + u

  where β0 is the population intercept and β1, β2, …, βk are the population slope parameters.

► Assumption MLR.2 (Random sampling). The data are a random sample drawn from the population, so each data point follows the population equation. We have a random sample of size n: {(xi1, xi2, …, xik, yi) : i = 1, 2, …, n}.

15
Assumptions of MLR
► Assumption MLR.3 (no perfect collinearity). In the
sample (and therefore in the population), none of the
independent variables is constant, and there are no exact
linear relationships among the independent variables.

► Note that SLR.3 told us that there must be sample variation in x.
► Now we not only need variation in each explanatory variable, we also need that no explanatory variable is an exact linear function of the others.
► The assumption only rules out perfect collinearity/correlation between explanatory variables; imperfect correlation is allowed.

16
MLR3. no perfect collinearity
► Example of perfect collinearity: an exact relationship between regressors.
► Either shareA or shareB has to be dropped from the regression, because there is an exact linear relationship between them: shareA + shareB = 1.
  ► shareA: percentage of campaign expenditures of candidate A
  ► shareB: percentage of campaign expenditures of candidate B

► Dropping some independent variables may reduce multicollinearity (but this may lead to omitted variable bias).
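A quick numerical illustration of why perfect collinearity is a problem (hypothetical data): with shareA + shareB = 1 the design matrix loses full column rank, so the OLS coefficients are not separately identified.

import numpy as np

rng = np.random.default_rng(0)
n = 100
shareA = rng.uniform(0, 1, size=n)
shareB = 1 - shareA                          # exact linear relationship: shareA + shareB = 1

X = np.column_stack([np.ones(n), shareA, shareB])
print(np.linalg.matrix_rank(X))              # 2 instead of 3: X has linearly dependent columns
# np.linalg.inv(X.T @ X) would fail (singular matrix); dropping shareA or shareB restores full rank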

17
Assumptions of MLR
► Assumption MLR.4 (zero conditional mean of u). The error u has an expected value of zero given any values of the explanatory variables:

    E[u | x1, …, xk] = 0

► The values of the explanatory variables must contain no information about the mean of the unobserved factors.

► Note that for our random sample, MLR.4 implies E[ui | xi1, …, xik] = 0.

18
Endogenous vs Exogenous

► If an independent variable and the error term
  ► are correlated => that variable is endogenous. Endogeneity is a violation of assumption MLR.4.
  ► are uncorrelated => that variable is exogenous. MLR.4 holds if all independent variables are exogenous.

► Exogeneity is the key assumption for a causal interpretation of the regression, and for unbiasedness of the OLS estimators.

19
Unbiasedness of OLS
► Theorem 3.1 (Unbiasedness of OLS)

    Under MLR.1 – MLR.4:   E(β̂j) = βj,   j = 0, 1, …, k

► Under assumptions MLR.1 through MLR.4, the OLS estimators are unbiased.

► If we collected many random samples, OLS would not systematically overestimate or underestimate the true values: on average across samples, the estimated coefficients equal the true population values.
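A Monte Carlo sketch of what unbiasedness means in practice, with hypothetical true coefficients: averaging the OLS estimates over many simulated random samples recovers the true values.

import numpy as np

rng = np.random.default_rng(1)
beta = np.array([1.0, 0.5, -2.0])            # hypothetical true (intercept, beta1, beta2)
n, reps = 200, 5000
estimates = np.empty((reps, 3))

for r in range(reps):
    x1 = rng.normal(size=n)
    x2 = 0.6 * x1 + rng.normal(size=n)       # regressors may be correlated (not perfectly)
    u = rng.normal(size=n)                   # E[u | x1, x2] = 0, so MLR.4 holds
    y = beta[0] + beta[1] * x1 + beta[2] * x2 + u
    X = np.column_stack([np.ones(n), x1, x2])
    b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    estimates[r] = b_hat

print(estimates.mean(axis=0))                # approximately [1.0, 0.5, -2.0]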

20
Including Irrelevant Variables in a
Regression Model

► No problem for unbiasedness, because E(β̂j) = βj = 0 when xj is irrelevant in the population.

► All OLS estimates remain unbiased for any values of βj, including 0.

► Not including an important variable, in contrast, may cause a bias (omitted variable bias).

► However, including irrelevant variables reduces the accuracy of the estimated coefficients: it increases their sampling variance.
21
Omitted variable bias

► True model:        y = β0 + β1 x1 + β2 x2 + u

► Estimated (short) model:   y = β̃0 + β̃1 x1 + v   (x2 omitted)

► Assume x1 and x2 are correlated, with x2 = δ0 + δ1 x1 + e

► Substituting:      y = β0 + β1 x1 + β2 (δ0 + δ1 x1 + e) + u

► Collecting terms:  y = (β0 + β2 δ0) + (β1 + β2 δ1) x1 + (β2 e + u)

→ The estimated coefficients will be biased: the short regression estimates β1 + β2 δ1 rather than β1.

► If the omitted variable is irrelevant (β2 = 0) or uncorrelated with x1 (δ1 = 0), there is no omitted variable bias.
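A Monte Carlo sketch of the bias formula above (all parameter values hypothetical): the short regression of y on x1 alone centers on β1 + β2·δ1 rather than on β1.

import numpy as np

rng = np.random.default_rng(42)
beta1, beta2, delta1 = 1.0, 2.0, 0.5         # hypothetical true parameters
n, reps = 500, 2000
short_slope = np.empty(reps)

for r in range(reps):
    x1 = rng.normal(size=n)
    x2 = delta1 * x1 + rng.normal(size=n)    # x2 is correlated with x1
    y = beta1 * x1 + beta2 * x2 + rng.normal(size=n)
    # "Short" regression of y on x1 only (x2 omitted)
    short_slope[r] = np.cov(x1, y, bias=True)[0, 1] / np.var(x1)

print(short_slope.mean())                    # ~ beta1 + beta2 * delta1 = 2.0, not 1.0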

22
Variances of the OLS estimators

► Assumption MLR.5 (Homoskedasticity): the variance of the error u does not vary with the explanatory variables. More precisely,

    Var(u | x1, …, xk) = σ²

► The values of the explanatory variables must contain no information about the variability of the unobserved factors.
► Homoskedasticity = constant error variance
► Heteroskedasticity = the error variance is not constant (it changes with the explanatory variables)
23
Variances of the OLS estimators
► Theorem 3.2 (Variances of the OLS estimators)
► Under assumptions MLR.1 – MLR.5, the variance of β̂j (conditional on the explanatory variables) is

    Var(β̂j) = σ² / [ SSTj (1 − Rj²) ],   j = 1, …, k

  where SSTj is the total sample variation in explanatory variable xj, and Rj² is the R-squared from a regression of xj on all other independent variables (including a constant).
24
Variances of the OLS estimators

► The more total sample variation SSTj there is in xj, the more accurate the OLS estimate of βj will be. What can help: adding more observations increases SSTj.
► Remember that σ² is the variance of the error term u. If the variance of u falls, the accuracy of the OLS estimators increases. What can help: adding explanatory variables (taking some factors out of u).
► If xj is uncorrelated with the other independent variables, Rj² is zero. With increasing correlation between the x's, the accuracy of the OLS estimators falls. A (near-)linear relationship between the x's is called multicollinearity.
25
How to detect multicollinearity?
► Use the correlation coefficients between the explanatory variables
  ► As a rule of thumb, |Corr(x1, x2)| should not be larger than 0.7

► Multicollinearity may also be detected through "variance inflation factors" (VIF):

    VIFj = 1 / (1 − Rj²)

  ► As a rule of thumb, the variance inflation factor should not be larger than 10
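A short sketch of the VIF computation with statsmodels; the data and variable names are hypothetical, while variance_inflation_factor and add_constant are the statsmodels helpers assumed here.

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
n = 300
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=n)     # nearly collinear with x1
x3 = rng.normal(size=n)
X = sm.add_constant(pd.DataFrame({"x1": x1, "x2": x2, "x3": x3}))

# VIF_j = 1 / (1 - R_j^2); values above roughly 10 flag strong multicollinearity
for j, name in enumerate(X.columns):
    if name != "const":
        print(name, variance_inflation_factor(X.values, j))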

26
Example: Multicollinearity
► Regress the average standardized test score of a school on expenditures for teachers, expenditures for instructional materials, and other expenditures of the school.

► The different expenditure categories will be strongly correlated, because if a school has a lot of resources it will spend a lot on everything.
► It will be hard to estimate the differential effects of different expenditure categories, because all expenditures are either high or low. For precise estimates of the differential effects, one would need information about situations where expenditure categories change differentially.
► As a consequence, the sampling variance of the estimated effects will be large.
27
Variances in misspecified models
► The choice of whether to include a particular variable in a
regression can be made by analyzing the tradeoff between
bias and variance
► True population model:                        y = β0 + β1 x1 + β2 x2 + u

► Estimated model 1 (both regressors included): ŷ = β̂0 + β̂1 x1 + β̂2 x2

► Estimated model 2 (x2 omitted):               ỹ = β̃0 + β̃1 x1
28
Variances in misspecified models
► Case 1: β2 = 0, i.e. x2 is irrelevant. Omitting x2 causes no bias, and β̃1 has a smaller variance than β̂1 (when x1 and x2 are correlated).
  Conclusion: do not include irrelevant regressors.

► Case 2: β2 ≠ 0, i.e. x2 is relevant. Omitting x2 biases β̃1, but β̃1 has a smaller variance than β̂1.
  Trade off bias and variance. Caution: the bias will not disappear even in large samples.
29
Variances of the OLS estimators
► Estimating the error variance:

    σ̂² = SSR / (n − k − 1) = ( Σi ûi² ) / (n − k − 1)

► An unbiased estimate of the error variance σ² is obtained by dividing the sum of squared residuals by n − k − 1, i.e. the number of observations minus the number of estimated parameters. This quantity is also called the degrees of freedom. The n estimated squared residuals in the sum are not completely independent: they are related through the k + 1 equations that define the first-order conditions of the minimization problem.

► Theorem 3.3 (Unbiased estimator of the error variance): under assumptions MLR.1 – MLR.5, E(σ̂²) = σ².
30
Theorem 2.3
► Calculation of standard errors for regression coefficients:

    sd(β̂j) = σ / sqrt( SSTj (1 − Rj²) )      — the true sampling variation of the estimated β̂j
    se(β̂j) = σ̂ / sqrt( SSTj (1 − Rj²) )      — the estimated sampling variation of β̂j, plugging in σ̂ for the unknown σ

► Note that these formulas are only valid under assumptions MLR.1 – MLR.5 (in particular, there has to be homoskedasticity).
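A numpy sketch of the standard-error computation; it uses the matrix form Var(β̂) = σ̂²(X'X)⁻¹, which for the slope coefficients is equivalent to the formula above.

import numpy as np

def ols_standard_errors(X, y):
    """OLS coefficients and their (homoskedasticity-only) standard errors.
    X: n x (k+1) design matrix with a leading column of ones."""
    n, kp1 = X.shape
    b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    u_hat = y - X @ b_hat
    sigma2_hat = u_hat @ u_hat / (n - kp1)            # SSR / (n - k - 1)
    cov_b = sigma2_hat * np.linalg.inv(X.T @ X)       # Var(b_hat) = sigma2_hat * (X'X)^-1
    return b_hat, np.sqrt(np.diag(cov_b))             # se(b_hat_j) for each coefficient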

31
Gauss-Markov Theorem
► The Gauss–Markov theorem shows that, within the class of linear unbiased estimators, OLS is the one with the smallest variance.

► Theorem (Gauss–Markov). Under assumptions MLR.1 through MLR.5, the OLS estimator is the best linear unbiased estimator (BLUE) of the regression coefficients.
32
Gauss-Markov Theorem
On the meaning of BLUE:
► Best: "best" means the one with the lowest variance
► Linear: the estimator can be written as a linear function of the data on the dependent variable; OLS can be shown to be a linear estimator
► Unbiased: the expected value of the estimator β̂j equals the true value βj
► Estimator: a rule that can be applied to any sample of data to produce estimates

33
Exercise

► Reading Chapter 4 – Hypothesis Testing

► Propose 3-5 initial ideas.

► Prepare Research Proposal.

34
