Lecture 3-2_Interpreting Multiple Regression Estimates
Xinwei Ma
Department of Economics
UC San Diego
Spring 2021
• Information on wage, education and experience of 526 individuals in the workforce in 1976.
• Wage: hourly wage measured in dollars. Education and work experience: measured in years.
Regression estimate:
QUESTION. Does each year of work experience have the same effect? Do we expect work
experience to be more (or less) valuable for younger workers?
By including the quadratic term, exper², we allow for a nonlinear (percentage) effect of
work experience on wage.
• β3 > 0 → increasing marginal effect: one more year of experience is more valuable if one
already has lots of work experience.
• β3 < 0 → decreasing marginal effect: one more year of experience is more valuable if one does
not have lots of work experience.
[Figure: ln(Wage) plotted against work experience]
Regression estimate:
• How do we interpret β̂2 = 0.041? For new entrants to the labor market, each additional year of
work experience is associated with a 4.1% increase in wage.
• Compare it to someone who has 10 years of experience: the marginal effect of one more year of
experience is 100(0.041 − 2 × 0.0007 × 10)% = 2.7%.
• β̂3 < 0 and is statistically significant → work experience has a decreasing marginal effect.
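As a quick check on the arithmetic in these bullets, the marginal effect implied by the quadratic specification can be evaluated at several experience levels. This is a minimal Python sketch; the function name `exper_marginal_effect_pct` is hypothetical, and the default coefficients are the rounded estimates reported in this example.

```python
def exper_marginal_effect_pct(exper, b_exper=0.041, b_exper2=-0.0007):
    """Approximate % change in wage from one more year of experience.

    With ln(wage) = ... + b_exper * exper + b_exper2 * exper**2, the derivative
    with respect to exper is b_exper + 2 * b_exper2 * exper; multiplying by 100
    turns the change in log wage into an approximate percentage change.
    """
    return 100 * (b_exper + 2 * b_exper2 * exper)


for years in (0, 10, 20):
    # 0 years -> 4.1, 10 years -> 2.7, 20 years -> 1.3 (percent)
    print(years, round(exper_marginal_effect_pct(years), 1))
```

The shrinking values are the decreasing marginal effect implied by β̂3 < 0.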
Regression estimate:
$$\widehat{\ln(wage)} = \underset{(0.107)}{0.127} + \underset{(0.007)}{0.090}\, educ + \underset{(0.005)}{0.041}\, exper - \underset{(0.0001)}{0.0007}\, exper^2$$
$$\Delta \ln(wage) \approx \left[0.041 - 2 \times 0.0007\, exper\right] \Delta exper$$
• When is the marginal effect negative? 0.041 − 2 × 0.0007 exper < 0 if exper > 28.6.
• For people with more than 28.6 years of experience, an extra year of experience is associated
with a decrease in wage.
[Figure: log(Wage) plotted against work experience]
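The sign change can be located directly: the marginal effect b_linear + 2 × b_quadratic × x is zero at x = −b_linear / (2 × b_quadratic). A minimal Python sketch (the helper `turning_point` is hypothetical). Plugging in the rounded coefficients 0.041 and −0.0007 gives about 29.3 years rather than the 28.6 quoted above; the difference presumably comes from rounding of the reported estimates, so treat the sketch's number as approximate.

```python
def turning_point(b_linear, b_quadratic):
    """Return x at which b_linear + 2 * b_quadratic * x equals zero."""
    return -b_linear / (2 * b_quadratic)


tp = turning_point(0.041, -0.0007)
print(round(tp, 1))  # 29.3 with the rounded coefficients

# Beyond the turning point, the marginal effect of experience is negative.
effect_at_35 = 0.041 + 2 * (-0.0007) * 35
print(effect_at_35 < 0)  # True
```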
EXERCISE. By including quadratic terms for both education and work experience, we
obtain
$$\widehat{\ln(wage)} = \underset{(0.146)}{0.953} - \underset{(0.023)}{0.058}\, educ + \underset{(0.001)}{0.006}\, educ^2 + \underset{(0.005)}{0.042}\, exper - \underset{(0.0001)}{0.0007}\, exper^2$$
$$\Delta \ln(wage) \approx \left[-0.058 + 2 \times 0.006\, educ\right] \Delta educ$$
$$\Delta \ln(wage) \approx \left[0.042 - 2 \times 0.0007\, exper\right] \Delta exper$$
• If educ > 4.6, the effect of education will be positive.
• There are only 7 people in our data with less than 5 years of education.
[Figure: log(Wage) plotted against education]
Overall, does education have an increasing or a decreasing marginal effect?
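The education part of the exercise can be explored the same way. This is a minimal Python sketch using the rounded estimates from the exercise regression (the helper name `educ_marginal_effect` is hypothetical). With these rounded coefficients the effect turns positive at about 4.8 years, close to the 4.6 quoted above; because the coefficient on educ² is positive, the marginal effect keeps growing with education.

```python
B_EDUC, B_EDUC2 = -0.058, 0.006  # rounded estimates from the exercise regression


def educ_marginal_effect(educ):
    """d ln(wage) / d educ = B_EDUC + 2 * B_EDUC2 * educ."""
    return B_EDUC + 2 * B_EDUC2 * educ


threshold = -B_EDUC / (2 * B_EDUC2)  # education level where the effect is zero
print(round(threshold, 2))  # 4.83

for years in (4, 8, 12, 16):
    # Approximate % change in wage from one more year of education
    print(years, round(100 * educ_marginal_effect(years), 1))
```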
• β3 > 0 → positive interaction effect: one more year of experience is more valuable if one has
more education.
• β3 < 0 → negative interaction effect: one more year of experience is more valuable for people
who are less educated.
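For reference, assume the interaction model takes the form ln(wage) = β0 + β1 educ + β2 exper + β3 educ × exper + u. This specification is an assumption, because the regression itself is not reproduced here. Under it, the marginal effect of experience depends on education:

$$\frac{\partial \ln(wage)}{\partial\, exper} = \beta_2 + \beta_3\, educ$$

So β3 > 0 means the return to one more year of experience rises with education, and β3 < 0 means it falls.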
© Xinwei Ma 2021
Using Polynomials and Interactions
EXAMPLE. Education, gender, and wage.
• Information on wage, education and gender of 526 individuals in the workforce in 1976.
• Wage: hourly wage measured in dollars. Education: measured in years. Gender: measured by a
binary variable female (= 1 if female, and = 0 if male).
• Questions:
− Does the effect of education on wage differ for men and women?
If we believe the zero conditional mean assumption, E[u | female, educ] = 0, then β1 captures
the “effect of gender on wage.” Alternatively, it can be interpreted as the wage gap between
men and women.
Regression estimates:
• There is a large gender wage gap. On average, women earn 36% less than men. (Note that the
data were collected in 1976.)
• The gender wage gap is large enough to offset five years of education.
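A note on the “36% less” reading: a coefficient c in a log-wage regression is often read as a 100 × c percent difference, which is accurate only for small c. The exact percentage difference is 100 × (exp(c) − 1). This is a minimal Python sketch, assuming the female coefficient behind the statement above is about −0.36, since the full output of this regression is not reproduced here.

```python
import math


def exact_pct_difference(coef):
    """Exact % difference in wage implied by a coefficient in a log-wage regression."""
    return 100 * (math.exp(coef) - 1)


coef_female = -0.36  # assumed value, consistent with "women earn 36% less"
approx = 100 * coef_female                 # log-point approximation, about -36 percent
exact = exact_pct_difference(coef_female)  # about -30.2 percent
print(approx, round(exact, 1))
```

For a coefficient this large, the approximation overstates the gap by roughly six percentage points.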
[Figure: education (years) on the horizontal axis, with separate lines for women and men]
Interaction with a Binary Regressor
With the additional interaction term, we allow the effect of education on wage to differ
by gender groups.
Regression estimates:
$$\widehat{\ln(wage)} = \underset{(0.117)}{0.826} - \underset{(0.203)}{0.360}\, female + \underset{(0.008)}{0.077}\, educ - \underset{(0.01626)}{0.00006}\, female \times educ$$
• The estimate of the interaction term is very small and is not statistically significant.
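With the interaction term, the return to education is 0.077 for men and 0.077 − 0.00006 for women, and the gender gap in ln(wage) depends on education. This is a minimal Python sketch of that bookkeeping using the rounded estimates above (function and variable names are hypothetical). The t-statistic of roughly −0.004 shows why the interaction term is not statistically significant.

```python
# Rounded estimates (and one standard error) from the regression with the interaction term
B0, B_FEMALE, B_EDUC, B_INTER = 0.826, -0.360, 0.077, -0.00006
SE_INTER = 0.01626


def fitted_log_wage(female, educ):
    """Fitted ln(wage) for a given gender (female = 1 or 0) and years of education."""
    return B0 + B_FEMALE * female + B_EDUC * educ + B_INTER * female * educ


return_men = B_EDUC              # slope on educ when female = 0
return_women = B_EDUC + B_INTER  # slope on educ when female = 1
t_inter = B_INTER / SE_INTER     # t-statistic for H0: interaction coefficient = 0

# Gender gap in fitted ln(wage) at 12 years of education
gap_at_12 = fitted_log_wage(1, 12) - fitted_log_wage(0, 12)
print(return_men, return_women, round(t_inter, 4), round(gap_at_12, 5))
```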
• We can code the three values numerically, say, 1=white, 2=black, and 3=other, but this may
not be useful in practice.
• What if another researcher runs the same regression but with race2, which codes the three
categories differently: 1=white, 23=black, and 5=other?
(THIS REGRESSION DOES NOT MAKE ANY SENSE!)
------------------------------------------------------------------------------
| Robust
wage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
race2 | -.0564449 .0116564 -4.84 0.000 -.0793033 -.0335864
_cons | 8.148341 .1532251 53.18 0.000 7.847864 8.448819
------------------------------------------------------------------------------
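The slope on a numerically coded category depends on an arbitrary labeling. The sketch below makes this concrete by regressing wage on two codings of race. It is a minimal Python example that uses the 13 illustrative wage observations tabulated in this section, not the data behind the Stata output above, so its numbers will not match that output. The simple-regression helper `ols_slope` is a hypothetical name.

```python
white_wages = [11.73, 6.40, 5.01]
black_wages = [9.03, 8.08, 4.62, 10.45, 17.20]
other_wages = [13.08, 7.74, 16.79, 15.48, 5.23]


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with an intercept)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


wages = white_wages + black_wages + other_wages
groups = (["white"] * len(white_wages) + ["black"] * len(black_wages)
          + ["other"] * len(other_wages))

race = {"white": 1, "black": 2, "other": 3}    # coding 1
race2 = {"white": 1, "black": 23, "other": 5}  # coding 2, equally arbitrary

slope_race = ols_slope([race[g] for g in groups], wages)
slope_race2 = ols_slope([race2[g] for g in groups], wages)
print(round(slope_race, 4), round(slope_race2, 4))  # same data, very different slopes
```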
Regression with Categorical Variables
Using categorical variables as dependent variables or regressors naively will not yield
meaningful estimates, because the numerical values used to code a categorical variable
may not bear any economic meaning.
• EXAMPLE. Define three binary variables, white, black, and other. Consider the regression of
wage on white and black.
------------------------------------------------------------------------------
| Robust
wage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
white | -.4677818 1.013238 -0.46 0.644 -2.454764 1.519201
black | -1.706223 1.024282 -1.67 0.096 -3.714864 .3024168
_cons | 8.550781 1.002483 8.53 0.000 6.584889 10.51667
------------------------------------------------------------------------------
• Should we regress wage on all three categories, white, black, and other? (NO! This will lead to
perfect multicollinearity.)
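wage (Y)   white (X1)   black (X2)   other (X3)
11.73      1            0            0
6.40       1            0            0
5.01       1            0            0
9.03       0            1            0
8.08       0            1            0
4.62       0            1            0
10.45      0            1            0
17.20      0            1            0
13.08      0            0            1
7.74       0            0            1
16.79      0            0            1
15.48      0            0            1
5.23       0            0            1
(Rows with X1 = 1 are white, rows with X2 = 1 are black, and rows with X3 = 1 are other. With
k + 1 = 3 categories, X3 plays the role of Xk+1.)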
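The multicollinearity problem can be checked directly on this table. In every row exactly one dummy equals 1, so white + black + other reproduces the intercept's column of ones. Including all three dummies alongside an intercept therefore makes the regressors exactly linearly dependent. A minimal Python sketch:

```python
white_wages = [11.73, 6.40, 5.01]
black_wages = [9.03, 8.08, 4.62, 10.45, 17.20]
other_wages = [13.08, 7.74, 16.79, 15.48, 5.23]

# Each row: (wage, white, black, other)
rows = (
    [(w, 1, 0, 0) for w in white_wages]
    + [(w, 0, 1, 0) for w in black_wages]
    + [(w, 0, 0, 1) for w in other_wages]
)

intercept_column = [1] * len(rows)
dummy_sum = [white + black + other for _, white, black, other in rows]

# True: the three dummies add up to the intercept column, so one of the four
# regressors is an exact linear combination of the others.
print(dummy_sum == intercept_column)
```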
• $X_j X_j = X_j$. If $j \neq \ell$, then $X_j X_\ell = 0$.
• Alternatively, we can include $X_2, X_3, \ldots, X_{k+1}$ as regressors, in which case $X_1$ becomes the base category.
• EXAMPLE. What is the base category if we regress wage on white and black? How about a
regression of wage on black and other?
We cannot include all k + 1 categories, because this will lead to perfect multicollinearity.
We observe
average wage for other = 8.5507813 = intercept
average wage for white = 8.0829994 = 8.550781 − 0.4677818
= intercept + slope of white
average wage for black = 6.844558 = 8.550781 − 1.706223
= intercept + slope of black.
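The arithmetic behind these three lines can be reproduced by treating each dummy as a switch. A minimal Python sketch using the coefficients from the Stata output above:

```python
intercept = 8.550781      # _cons (other is the base category)
slope_white = -0.4677818
slope_black = -1.706223

average_wage = {
    "other": intercept,                # all dummies switched off
    "white": intercept + slope_white,  # white switched on
    "black": intercept + slope_black,  # black switched on
}
for group, value in average_wage.items():
    print(group, round(value, 6))
```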
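Then we have
$$\begin{aligned}
\text{average } Y \text{ for the } X_{k+1} \text{ category} &= \hat\beta_0 = \text{intercept} \\
&\;\;\vdots \\
\text{average } Y \text{ for the } X_k \text{ category} &= \hat\beta_0 + \hat\beta_k = \text{intercept} + \text{slope of } X_k
\end{aligned}$$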
To memorize, treat the regressors as “switches” that can be turned “on” and “off”:
• Take X1 = X2 = · · · = Xk = 0, then we get the average for the base category.
• Take X1 = 1 and X2 = · · · = Xk = 0, then we get the average for the X1 category.
• ···
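• Step 2. $X_{i,k+1} Y_i = \hat\beta_0 X_{i,k+1} + \hat\beta_1 X_{i1} X_{i,k+1} + \hat\beta_2 X_{i2} X_{i,k+1} + \cdots + \hat\beta_k X_{ik} X_{i,k+1} + \hat u_i X_{i,k+1}$.
• Step 3. $X_{ij} X_{i,k+1} = 0$ for all $j = 1, 2, \ldots, k$. Therefore $X_{i,k+1} Y_i = \hat\beta_0 X_{i,k+1} + \hat u_i X_{i,k+1}$.
• Step 4. Sum over all observations:
$$\sum_{i=1}^n X_{i,k+1} Y_i = \hat\beta_0 \sum_{i=1}^n X_{i,k+1} + \sum_{i=1}^n \hat u_i X_{i,k+1}.$$
• Step 5. $\sum_{i=1}^n \hat u_i X_{i,k+1} = 0$, so that $\sum_{i=1}^n X_{i,k+1} Y_i = \hat\beta_0 \sum_{i=1}^n X_{i,k+1}$. (This holds even though $X_{k+1}$ is not a regressor: $X_{i,k+1} = 1 - X_{i1} - \cdots - X_{ik}$, and the OLS residuals sum to zero and are orthogonal to each of $X_1, \ldots, X_k$.)
• Step 6.
$$\hat\beta_0 = \frac{\sum_{i=1}^n X_{i,k+1} Y_i}{\sum_{i=1}^n X_{i,k+1}}.$$
Conclude that this is the average Y for the $X_{k+1}$ category: the numerator sums Y over that category, and the denominator counts its observations.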
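• Step 2. $X_{i1} Y_i = \hat\beta_0 X_{i1} + \hat\beta_1 X_{i1} X_{i1} + \hat\beta_2 X_{i2} X_{i1} + \cdots + \hat\beta_k X_{ik} X_{i1} + \hat u_i X_{i1}$.
• Step 3. $X_{i1} X_{i1} = X_{i1}$, and $X_{ij} X_{i1} = 0$ for all $j = 2, 3, \ldots, k$. Therefore
$X_{i1} Y_i = (\hat\beta_0 + \hat\beta_1) X_{i1} + \hat u_i X_{i1}$.
• Step 4. Sum over all observations:
$$\sum_{i=1}^n X_{i1} Y_i = (\hat\beta_0 + \hat\beta_1) \sum_{i=1}^n X_{i1} + \sum_{i=1}^n \hat u_i X_{i1}.$$
• Step 5. $\sum_{i=1}^n \hat u_i X_{i1} = 0$ (the OLS first-order condition for the regressor $X_1$), so that $\sum_{i=1}^n X_{i1} Y_i = (\hat\beta_0 + \hat\beta_1) \sum_{i=1}^n X_{i1}$.
• Step 6.
$$\hat\beta_0 + \hat\beta_1 = \frac{\sum_{i=1}^n X_{i1} Y_i}{\sum_{i=1}^n X_{i1}}.$$
Conclude that this is the average Y for the $X_1$ category.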
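The six-step argument can be verified numerically. The sketch below is minimal Python using only the standard library. It fits OLS of wage on an intercept, white, and black for the 13 illustrative observations in this section by solving the normal equations X'X b = X'y with a small Gauss-Jordan routine (`solve` is a hypothetical helper, not a library function). It then checks three things: β̂0 is the average wage of the base category (other), β̂0 + β̂1 is the average wage for white, and the residuals are orthogonal to the omitted dummy, as in Step 5.

```python
white_wages = [11.73, 6.40, 5.01]
black_wages = [9.03, 8.08, 4.62, 10.45, 17.20]
other_wages = [13.08, 7.74, 16.79, 15.48, 5.23]


def solve(a, b):
    """Solve the square linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def mean(values):
    return sum(values) / len(values)


# Regressors: intercept, white (X1), black (X2); other (X3) is the base category.
X = ([[1, 1, 0] for _ in white_wages]
     + [[1, 0, 1] for _ in black_wages]
     + [[1, 0, 0] for _ in other_wages])
y = white_wages + black_wages + other_wages

XtX = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(3)]
b0, b_white, b_black = solve(XtX, Xty)

print(b0, mean(other_wages))            # intercept = average for the base category
print(b0 + b_white, mean(white_wages))  # intercept + slope = average for white
print(b0 + b_black, mean(black_wages))  # intercept + slope = average for black

# Step 5: the residuals are orthogonal to the omitted dummy for "other".
residuals = [yi - (b0 + b_white * row[1] + b_black * row[2]) for row, yi in zip(X, y)]
other_dummy = [1 - row[1] - row[2] for row in X]
print(sum(u * d for u, d in zip(residuals, other_dummy)))  # zero up to rounding error
```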
• Consider the following saturated regression of wage on white and black (so that other is the
base category).
$$\widehat{wage} = 8.55 - 0.47\, white - 1.71\, black$$
• What are the regression estimates if we choose white as the base category?
$$\widehat{wage} = (A) + (B)\, black + (C)\, other$$
• Solution.
(A) = average wage for white = 8.55 − 0.47
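Each coefficient is a difference of category averages, so the base category can be switched without re-running the regression. A minimal Python sketch of the computation using the rounded coefficients above:

```python
b0, b_white, b_black = 8.55, -0.47, -1.71  # other is the base category

# Category averages implied by the "switches"
avg_other = b0
avg_white = b0 + b_white
avg_black = b0 + b_black

# New regression with white as the base category: wage-hat = A + B*black + C*other
A = avg_white              # intercept = average for the new base category
B = avg_black - avg_white  # black relative to white
C = avg_other - avg_white  # other relative to white
print(round(A, 2), round(B, 2), round(C, 2))  # 8.08 -1.24 0.47
```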
EXAMPLE. Assume we observe wage and two binary variables urban and college.
                 urban = 1                          urban = 0
college = 1      urban, college graduate            rural, college graduate
college = 0      urban, without college degree      rural, without college degree
• In Stata:
gen urbanCol = (urban == 1 & college == 1)
gen urbanNonCol = (urban == 1 & college == 0)
gen ruralCol = (urban == 0 & college == 1)
gen ruralNonCol = (urban == 0 & college == 0)
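For readers not using Stata, the same four mutually exclusive dummies can be built in plain Python. This is a minimal sketch and is not part of the original Stata workflow. The (urban, college) pairs below and the helper name `category_dummies` are hypothetical.

```python
def category_dummies(urban, college):
    """Return the four saturated-category dummies for one observation."""
    return {
        "urbanCol": int(urban == 1 and college == 1),
        "urbanNonCol": int(urban == 1 and college == 0),
        "ruralCol": int(urban == 0 and college == 1),
        "ruralNonCol": int(urban == 0 and college == 0),
    }


observations = [(1, 1), (1, 0), (0, 1), (0, 0)]  # hypothetical (urban, college) values
for urban, college in observations:
    dummies = category_dummies(urban, college)
    # Exactly one dummy is switched on for every observation.
    assert sum(dummies.values()) == 1
    print(urban, college, dummies)
```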
                 urban = 1                                  urban = 0
college = 1      5.75 + 5.36 = 11.11                        5.75 + 2.61 = 8.36
college = 0      5.75 + 1.70 = 7.45                         5.75

                 urban = 1                                  urban = 0
college = 1      5.75 + 1.70 + 2.61 + 1.05 = 11.11          5.75 + 2.61 = 8.36
college = 0      5.75 + 1.70 = 7.45                         5.75
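Both tables encode the same four cell averages. The first can be read as coefficients on the category dummies (base: rural without a college degree). The second can be read as coefficients on urban, college, and urban × college. This minimal Python sketch recovers both sets from the cell averages; variable names are illustrative. The interaction coefficient is a difference-in-differences of averages.

```python
# Average wage by (urban, college) cell, read off the tables
cell_mean = {(1, 1): 11.11, (0, 1): 8.36, (1, 0): 7.45, (0, 0): 5.75}
base = cell_mean[(0, 0)]  # rural, without college degree

# Category dummies relative to the base
b_urbanCol = cell_mean[(1, 1)] - base     # 5.36
b_ruralCol = cell_mean[(0, 1)] - base     # 2.61
b_urbanNonCol = cell_mean[(1, 0)] - base  # 1.70

# urban, college, and urban x college
b_urban = cell_mean[(1, 0)] - base    # 1.70
b_college = cell_mean[(0, 1)] - base  # 2.61
b_interaction = (cell_mean[(1, 1)] - cell_mean[(1, 0)]) - (cell_mean[(0, 1)] - base)  # 1.05
print(round(b_urbanCol, 2), round(b_urban, 2), round(b_college, 2), round(b_interaction, 2))
```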
Interaction Between Binary Regressors
EXAMPLE. Assume we observe wage and three binary variables urban, college, and
married.
− non-college graduates that are single and live in rural areas (base category)
• There are 2^p categories (unless some of the binary variables are mutually exclusive).
• The base category is X1 = X2 = · · · = Xp = 0, and average Y for this category is the intercept.
• Average Y for other categories can be found by setting the corresponding binary variables to 1
and 0 and adding up the relevant coefficients.
REMARK
• Regressing Y on X1 , X2 , · · · , Xp does not yield a saturated regression, unless all of the variables
are mutually exclusive.
• With many binary variables, the saturated regression will involve lots of categories.
• Some categories may have no observations (by chance), in which case the saturated regression
fails due to perfect multicollinearity.
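This is a minimal Python sketch of the 2^p count and of the empty-cell problem for the urban/college/married example. The sample below and the helper `empty_categories` are hypothetical. A category with no observations produces a dummy column of all zeros, which makes the regressors linearly dependent, so its coefficient cannot be estimated.

```python
from itertools import product

variables = ("urban", "college", "married")
categories = list(product((0, 1), repeat=len(variables)))  # all 2**p cells


def empty_categories(observations):
    """Return the cells (tuples of 0/1 values) that contain no observations."""
    counts = {cell: 0 for cell in categories}
    for obs in observations:
        counts[obs] += 1
    return [cell for cell, n in counts.items() if n == 0]


# Hypothetical sample of (urban, college, married) values
sample = [(1, 1, 1), (1, 0, 0), (0, 0, 0), (0, 1, 1), (1, 1, 0), (0, 0, 1), (1, 0, 1)]
print(len(categories))           # 8
print(empty_categories(sample))  # [(0, 1, 0)]
```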
If you do so, you may be subject to student conduct proceedings under the UC San
Diego Student Code of Conduct.