
Multiple Regression Applications: Econ 140

This lecture covers multicollinearity and the use of dummy variables in multiple regression analysis. It discusses multicollinearity when independent variables are linearly related, causing problems for inference and interpretation of regression results. The lecture also demonstrates how to include qualitative variables like gender using dummy variables in a regression equation. This allows estimating differences in expected outcomes between categories defined by the dummy variables.

Uploaded by

NIkolai Castillo

Econ 140

Multiple Regression Applications


Lecture 15

Lecture 15 1
Today’s Plan

• Two topics and how they relate to multiple regression
– Multicollinearity
– Dummy variables

Multicollinearity

• Suppose we have the following regression equation:
Y = a + b1X1 + b2X2 + e
• Multicollinearity occurs when some or all of the independent X variables are linearly related
• Different forms of multicollinearity:
– Perfect: OLS estimation will not work
– Non-perfect: arises in applied work and presents problems for inference and interpretation of the results
• There is no formal test for detecting it; we can only compare alternative specifications of the model

Multicollinearity Example

• Again we’ll use returns to education where:
– the dependent variable Y is (log) wages
– the independent variables (X’s) are age, experience, and years of schooling
• Experience is defined as years in the labor force, or the difference between age and years of schooling
– this can be written: Experience = Age - Years of school
– What’s the problem with this?

Multicollinearity Example (2)

• Note that we’ve expressed experience as the difference of two of our other independent variables
– by constructing experience in this manner we create a collinear dependence between age and experience
– the relationship between age and experience is linear: as age increases, for given years of schooling, experience also increases
• We can write our regression equation for this example:
Ln(Wages) = a + b1Experience + b2Age + e

Multicollinearity Example (3)

• Recall that our estimate for b1 is

$$\hat{b}_1 = \frac{\sum x_1 y \sum x_2^2 - \sum x_1 x_2 \sum x_2 y}{\sum x_1^2 \sum x_2^2 - \left(\sum x_1 x_2\right)^2}$$

where x1 = experience and x2 = age (lower-case letters denote deviations from sample means)

• The problem is that x1 and x2 are linearly related
– as we get closer to perfect collinearity, the denominator will go to zero
– under perfect collinearity the denominator is exactly zero and OLS won’t work!
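
The collapse of the denominator can be checked numerically. The sketch below is only an illustration in plain Python, not the lecture's spreadsheet: the ages and schooling values are made up, and `centered` and `slope_denominator` are hypothetical helper names. It evaluates the denominator of the slope formula in deviation form twice, once with schooling that varies across people and once with schooling held fixed, so that experience becomes an exact linear function of age.

```python
def centered(values):
    """Return deviations from the sample mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def slope_denominator(x1_raw, x2_raw):
    """sum(x1^2) * sum(x2^2) - (sum(x1*x2))^2, with x1 and x2 in deviation form."""
    x1, x2 = centered(x1_raw), centered(x2_raw)
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    return s11 * s22 - s12 ** 2


# Made-up data (assumption): six people, ages in years.
age = [25, 30, 35, 40, 45, 50]
schooling_varied = [12, 16, 12, 18, 14, 16]
schooling_fixed = [12] * 6

# Experience = Age - Years of school, as defined on the slide.
exp_varied = [a - s for a, s in zip(age, schooling_varied)]
exp_fixed = [a - s for a, s in zip(age, schooling_fixed)]

denom_varied = slope_denominator(exp_varied, age)
denom_fixed = slope_denominator(exp_fixed, age)

print("denominator, schooling varies:", denom_varied)
print("denominator, schooling fixed: ", denom_fixed)
```

With schooling fixed the denominator is zero, so the slope formula would divide by zero; with varying schooling it stays positive.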

Multicollinearity Example (4)

• Recall that the estimated variance for b̂1 is:

$$\hat{\sigma}^2_{\hat{b}_1} = \hat{\sigma}^2_{YX} \, \frac{\sum x_2^2}{\sum x_1^2 \sum x_2^2 - \left(\sum x_1 x_2\right)^2}$$

– So as x1 and x2 approach perfect collinearity, the denominator will go to zero and the estimated variance of b̂1 will increase
• Implications:
– with multicollinearity, you will get large standard errors on partial coefficients
– your t-ratios, given the null hypothesis that the value of the coefficient is zero, will be small
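
A standard algebraic rewrite makes the effect on the variance easier to see. Dividing the numerator and denominator of the variance formula by Σx2² gives Var(b̂1) = σ̂²YX / [Σx1² (1 - r12²)], where r12 is the sample correlation between x1 and x2. The sketch below tabulates the factor 1 / (1 - r12²) for a few illustrative correlations; the function name is hypothetical and the correlations are not taken from the lecture's data.

```python
def variance_inflation(r12):
    """Factor 1 / (1 - r12^2) that scales the variance of the slope estimate."""
    if abs(r12) >= 1:
        raise ValueError("perfect collinearity: the variance is undefined")
    return 1.0 / (1.0 - r12 ** 2)


# Illustrative correlations between x1 and x2 (assumed values).
for r12 in (0.0, 0.5, 0.9, 0.99):
    print(f"r12 = {r12:4.2f}   variance factor = {variance_inflation(r12):7.2f}")
```

The factor grows without bound as r12 approaches one, which is why standard errors become large and t-ratios small.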

More Multicollinearity Examples

• In L15_1.xls we have individual data on age, years of education, weekly earnings, school age, and experience
– we can perform a regression to calculate returns given age and experience
– we can also estimate bivariate models including only age, only experience, and only years of schooling
– we expect that the problem is that experience is related to age (to test this, we can regress age on experience)
• if the slope coefficient on experience is 1, there is perfect multicollinearity

More Multicollinearity Examples (2)

• On L15_2.xls there is a made-up example of perfect multicollinearity
– OLS is unable to calculate the slope coefficients
– calculating the products and cross-products, we find that the denominator for the slope coefficients is zero as predicted
– If we have an applied problem with non-perfect multicollinearity, it has these properties:
1) OLS is still unbiased
2) Large variance, standard errors, and difficult hypothesis testing
3) Few significant coefficients but a high R²

More Multicollinearity Examples (3)

• What to do with L15_1.xls?
– There’s simply not enough variation
– We can collect more data or rethink the model
– We can test for partial correlations between the X variables
– Always try specification checks
– Alternatively, try to re-scale variables so that the correlation is not the same

Dummy variables

• Dummy variables allow you to include qualitative variables (or variables that otherwise cannot be quantified) in your regression
– examples include: gender, race, marital status, and religion
– they also become important when looking at “regime shifts”, which may be new policy initiatives, economic change, or seasonality
• We will look at some examples:
– using female as a qualitative variable
– using marital status as a qualitative variable
– using the Phillips curve to demonstrate a regime shift

Qualitative example: female

• We’ll construct a dummy variable:
Di = 0 if not female     i = 1, …, n
Di = 1 if female
– We can do this with any qualitative variable
– Note: assigning the values for the dummy variable is an arbitrary choice
• On L15_3.xls there is a sample from the current CPS
– to create the dummy variable “female” we assign the values one and zero to the CPS values of two and one for sex, respectively
– we can include the dummy variable in the regression equation like we would any other variable

Qualitative example: female (2)

• We estimate the following equation:
Ŷi = 5.975 - 0.485Di
• Now we can ask: what are the expected earnings given that a person is male?
E(Yi | Di = 0) = a + b(0) = a
E(Yi | Di = 0) = 5.975
• Similarly, what are the expected earnings given that a
person is female?
E(Yi | Di = 1) = a + b(1) = a + b
= 5.975 - 0.485 = 5.490
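
The link between the dummy coefficient and the group means can be checked on a toy sample. The sketch below is a minimal illustration in plain Python: the earnings values are made up (they are not the CPS sample in L15_3.xls) and `ols_bivariate` is a hypothetical helper. It fits Y = a + bD by OLS and compares the intercept and the intercept plus slope with the two group means.

```python
def ols_bivariate(x, y):
    """Return (a, b) from the OLS regression y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


# Made-up log weekly earnings (assumption); female = 1 for women, 0 for men.
female = [0, 0, 0, 1, 1, 1]
log_earn = [6.1, 5.9, 6.0, 5.4, 5.6, 5.5]

a_hat, b_hat = ols_bivariate(female, log_earn)

males = [y for d, y in zip(female, log_earn) if d == 0]
females = [y for d, y in zip(female, log_earn) if d == 1]
mean_male = sum(males) / len(males)
mean_female = sum(females) / len(females)

print("a_hat         =", a_hat, "  male mean   =", mean_male)
print("a_hat + b_hat =", a_hat + b_hat, "  female mean =", mean_female)
```

In this toy sample the intercept matches the male mean and the intercept plus the dummy coefficient matches the female mean, which is the same reading the slide gives to 5.975 and 5.975 - 0.485.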

Qualitative example: female (4)

• We can use other variables to extend our analysis
• for example we can include age to get the equation:
Y = a + b1Di + b2Xi + e
– where Xi can be any or all relevant variables
– the coefficient b1 on Di will indicate how much less, on average, females earn than males, holding Xi constant
– for males the intercept will be â
– for females the intercept will be â + b̂1

Qualitative example: female (5)

• The estimated regression found on the spreadsheet is
Ŷi = 5.085 - 0.656Di + 0.023Xi
• The expected weekly earnings for men are:
E(Yi | Di = 0) = a + b2Xi
• The expected weekly earnings for women are:
E(Yi | Di = 1) = (a + b1) + b2Xi

Qualitative example: female (6)

• An important note:
• We cannot include dummy variables for both male and female in the same regression equation
– suppose we have Y = a + b1D1i + b2D2i + e
– where: D1i = 0 if male      D1i = 1 if female
         D2i = 0 if female    D2i = 1 if male
– OLS won’t be able to estimate the regression coefficients because D1i and D2i show perfect multicollinearity with the intercept a (D1i + D2i = 1 for every observation)
• So if a qualitative variable has m categories, you should include (m-1) dummy variables in the regression equation
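
The reason OLS fails here can be seen from the cross-product matrix X'X, which must be invertible for the coefficients to be computed. The sketch below is a small illustration in plain Python with made-up observations; `gram` and `det` are hypothetical helpers. It compares the determinant of X'X when the regression contains an intercept and both dummies with the determinant when it contains an intercept and only one dummy.

```python
def gram(columns):
    """Return X'X for a design matrix given as a list of columns."""
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in columns]
            for ci in columns]


def det(m):
    """Determinant by cofactor expansion along the first row (tiny matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, value in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * value * det(minor)
    return total


# Made-up sample of five people (assumption).
female = [1, 0, 1, 1, 0]
male = [1 - d for d in female]      # male + female = 1 for every observation
const = [1] * len(female)           # the intercept column

det_both = det(gram([const, female, male]))   # intercept + both dummies
det_one = det(gram([const, female]))          # intercept + one dummy

print("det(X'X), both dummies:", det_both)
print("det(X'X), one dummy:   ", det_one)
```

With both dummies the determinant is zero, so X'X cannot be inverted; dropping one dummy gives a non-zero determinant and the regression can be estimated.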

Example: marital status

• The spreadsheet (L15_3.xls) also estimates the following regression equation using two distinct dummy variables:
Y = a + b1D1i + b2D2i + b3Xi + e
– where: D1i = 0 if male     D1i = 1 if female
         D2i = 0 if other    D2i = 1 if married
• Using the regression equation we can create four categories: married males, unmarried males, married females, and unmarried females

Example: marital status (2)

• Expected earnings for unmarried males:
E(Yi | D1i = 0, D2i = 0) = a + b3Xi
• Expected earnings for unmarried females:
E(Yi | D1i = 1, D2i = 0) = (a + b1) + b3Xi
• Expected earnings for married males:
E(Yi | D1i = 0, D2i = 1) = (a + b2) + b3Xi
• Expected earnings for married females:
E(Yi | D1i = 1, D2i = 1) = (a + b1 + b2) + b3Xi
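
The four conditional expectations can be generated from one function. The sketch below is a minimal illustration: the coefficient values are hypothetical (they are not the estimates in L15_3.xls) and `expected_earnings` is an assumed helper name. It evaluates E(Y | D1, D2, X) for each of the four sex and marital-status categories at a given age.

```python
def expected_earnings(a, b1, b2, b3, female, married, x):
    """E(Y | D1 = female, D2 = married, X = x) for Y = a + b1*D1 + b2*D2 + b3*X + e."""
    return a + b1 * female + b2 * married + b3 * x


# Hypothetical coefficients, chosen only for illustration.
coef = dict(a=5.0, b1=-0.5, b2=0.25, b3=0.02)
age = 40

for female in (0, 1):
    for married in (0, 1):
        label = ("married" if married else "unmarried") + " " + \
                ("female" if female else "male")
        value = expected_earnings(**coef, female=female, married=married, x=age)
        print(f"{label:18s} {value:.3f}")
```

Each category differs from the unmarried-male baseline only through the intercept: b1 for being female, b2 for being married, and b1 + b2 for both.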

Interactive terms

• So far we’ve only used dummy variables to change the intercept
• We can also use dummy variables to alter the partial slope coefficients
• Let’s think about this model:
ln(Wi) = a + b1Agei + b2Marriedi + e
– we could argue that b̂1, b̂2 and â would be different for males and females
– we want to think about two sub-sample groups: males and females
– we can test the hypothesis that the partial slope coefficients will be different for these 2 groups

Interactive terms (2)

• To test our hypothesis we’ll estimate the regression equation for the whole sample and then for the two sub-sample groups
• We test to see if our estimated coefficients are the same between males and females
• Our null hypothesis is:
H0: aM = aF, b1M = b1F, b2M = b2F

Interactive terms (3)

• We have an unrestricted form and a restricted form
– unrestricted: used when we estimate for the sub-sample groups separately
– restricted: used when we estimate for the whole sample
• What type of statistic will we use to carry out this test?
– F-statistic:

$$F = \frac{(SSR_R - SSR_U)/q}{SSR_U / \left[(n_1 - k) + (n_2 - k)\right]}$$

q = k, the number of parameters in the model
n = n1 + n2 where n is the complete sample size

Interactive terms (4)

• The sum of squared residuals for the unrestricted form will be:
SSRU = SSRM + SSRF
• L15.4.xls
– the data are sorted according to the dummy variable “female”
– there is a second dummy variable for marital status
– there are 3 estimated regression equations, one each for the total sample, male sub-sample, and female sub-sample

Interactive terms (5)

• The output allows us to gather the necessary sum of squared residuals and sample sizes to construct the estimate:

$$F^* = \frac{(SSR_R - SSR_U)/q}{SSR_U / \left[(n_1 - k) + (n_2 - k)\right]} = \frac{\left[16.261 - (7.495 + 5.093)\right]/3}{(7.495 + 5.093)/(33 - 6)} = \frac{1.224}{0.466} = 2.626$$

– Since F0.05,3,27 = 2.96 > F* we cannot reject the null hypothesis that the partial slope coefficients are the same for males and females
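
The arithmetic on this slide can be reproduced in a few lines. The sketch below plugs the slide's sums of squared residuals, n = 33 and k = 3 into the F formula; the function name `f_statistic` is hypothetical, and the critical value 2.96 is copied from the slide rather than computed. This kind of sub-sample comparison is commonly called a Chow test.

```python
def f_statistic(ssr_r, ssr_m, ssr_f, n, k):
    """F = [(SSR_R - SSR_U) / q] / [SSR_U / (n - 2k)], with q = k."""
    ssr_u = ssr_m + ssr_f        # unrestricted SSR: male plus female sub-samples
    q = k                        # number of restrictions
    df_u = n - 2 * k             # (n1 - k) + (n2 - k)
    return ((ssr_r - ssr_u) / q) / (ssr_u / df_u)


# Values taken from the slide.
f_star = f_statistic(ssr_r=16.261, ssr_m=7.495, ssr_f=5.093, n=33, k=3)
f_critical = 2.96                # F(0.05; 3, 27) as quoted on the slide

print(f"F* = {f_star:.3f}")
print("reject H0" if f_star > f_critical else "cannot reject H0")
```

The computed statistic rounds to the slide's 2.626, which is below 2.96, so the null hypothesis is not rejected.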

Interactive terms (6)

• What if F* > F0.05,3,27? How to read the results?
– There’s a difference between the two sub-samples and therefore we should estimate the wage equations separately
– Or we could interact the dummy variables with the other variables
• To interact the dummy variable with the age and marital status variables, we multiply the dummy variable by the age and marital status variables to get:
Wi = a + b1Agei + b2Marriedi + b3Di + b4(Di*Agei) + b5(Di*Marriedi) + ei

Interactive terms (7)

• Using L15.4.xls you can construct the interactive terms by multiplying the FEMALE column by the AGE and MARRIED columns
– one way to see if the two sub-samples are different is to look at the t-ratios on the interactive terms
– in this example, neither of the t-ratios is statistically significant so we cannot reject the null hypothesis
• We now know how to use dummy variables to indicate the importance of sub-sample groups within the data
– dummy variables are also useful for testing for structural breaks or regime shifts
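
Outside a spreadsheet, the interactive terms are built the same way, by element-wise multiplication of columns. The sketch below uses a few made-up rows standing in for the FEMALE, AGE and MARRIED columns; the values and variable names are assumptions, not the contents of L15.4.xls.

```python
# Made-up rows standing in for the FEMALE, AGE and MARRIED columns (assumption).
female = [0, 0, 1, 1]
age = [30, 45, 28, 52]
married = [1, 0, 0, 1]

# Interactive terms: the dummy multiplied by each of the other regressors.
female_x_age = [d * a for d, a in zip(female, age)]
female_x_married = [d * m for d, m in zip(female, married)]

print("FEMALE*AGE:    ", female_x_age)
print("FEMALE*MARRIED:", female_x_married)
```

Both new columns are zero for every male observation, so their coefficients only affect the fitted equation for females.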

Interactive terms (8)

• If we want to estimate the equation for the first sub-sample (males) we take the expectation of the wage equation where the dummy variable for female takes the value of zero:
E(Wi|Di = 0) = a + b1Agei + b2Marriedi
• We can do the same for the second sub-sample (females):
E(Wi|Di = 1) = (a + b3) + (b1 + b4)Agei + (b2 + b5)Marriedi
• We can see that by using only one regression equation, we have allowed the intercept and partial slope coefficients to vary by sub-sample
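
The mapping from the single interacted equation to the two sub-sample equations can be written out directly. The sketch below is a minimal illustration with hypothetical coefficient values and an assumed helper name, `subgroup_equation`. It returns the intercept, the age slope and the married slope implied for a group once the female dummy is set to 0 or 1.

```python
def subgroup_equation(a, b1, b2, b3, b4, b5, female):
    """Intercept and slopes implied by
    W = a + b1*Age + b2*Married + b3*D + b4*(D*Age) + b5*(D*Married) + e
    when the female dummy D equals `female` (0 or 1)."""
    intercept = a + b3 * female
    age_slope = b1 + b4 * female
    married_slope = b2 + b5 * female
    return intercept, age_slope, married_slope


# Hypothetical pooled estimates, chosen only for illustration.
pooled = dict(a=5.0, b1=0.02, b2=0.1, b3=-0.4, b4=-0.005, b5=0.05)

male_eq = subgroup_equation(**pooled, female=0)
female_eq = subgroup_equation(**pooled, female=1)

print("males:   intercept, age slope, married slope =", male_eq)
print("females: intercept, age slope, married slope =", female_eq)
```

For males the result is simply (a, b1, b2); for females each term is shifted by its interaction coefficient, giving (a + b3, b1 + b4, b2 + b5).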