
ECON 120B Econometrics

Lecture 3 Part 2: Interpreting Multiple Regression Estimates

Xinwei Ma

Department of Economics
UC San Diego

Spring 2021



Outline

Using Polynomials and Interactions

Interaction with a Binary Regressor

Regression with Categorical Variables

Interaction Between Binary Regressors



Using Polynomials and Interactions

 EXAMPLE. Education, experience, and wage.

• Information on wage, education and experience of 526 individuals in the workforce in 1976.

• Wage: hourly wage measured in dollars. Education and work experience: measured in years.

 Regression estimate:

ln(wage)^ = 0.216 + 0.097 educ + 0.010 exper
           (0.114)  (0.008)      (0.002)

• Each year of education is associated with a 9.7% increase in hourly wage.

• Each year of work experience is associated with a 1% increase in hourly wage.

• Both slope estimates are statistically significant at the 1% level.

 QUESTION. Does each year of work experience have the same effect? Do we expect work
experience to be more (or less) valuable for younger workers?



Using Polynomials and Interactions

 Consider the following regression model:

ln(wage) = β0 + β1 educ + β2 exper + β3 exper² + u.

 By including the quadratic term, exper², we allow for a nonlinear (percentage) effect of
work experience on wage.

 The marginal effect of work experience is now:


∆ln(wage) ≈ [β2 + 2β3 exper] ∆exper,

if we hold both ∆educ = 0 and ∆u = 0.

• The approximation will be accurate if ∆exper is small (see the short derivation below).

• β3 > 0 → increasing marginal effect: one more year of experience is more valuable if one
already has lots of work experience.

• β3 < 0 → decreasing marginal effect: one more year of experience is more valuable if one does
not have lots of work experience.
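Where the approximation comes from (a short derivation, holding educ and u fixed): raising
experience from exper to exper + ∆exper changes β2 exper + β3 exper² by

β2 (exper + ∆exper) + β3 (exper + ∆exper)² − β2 exper − β3 exper²
    = [β2 + 2β3 exper] ∆exper + β3 (∆exper)²,

and the β3 (∆exper)² term is negligible when ∆exper is small.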



[Figure: ln(Wage) against work experience (1–10 years) under a linear effect, an increasing
marginal effect, and a decreasing marginal effect.]



Using Polynomials and Interactions

 Regression estimate:

ln(wage)^ = 0.127 + 0.090 educ + 0.041 exper − 0.0007 exper²
           (0.107)  (0.007)      (0.005)       (0.0001)

∆ln(wage) ≈ [0.041 − 2 × 0.0007 exper] ∆exper

• How do we interpret β̂2 = 0.041? For new entrants to the labor market, each additional year of
work experience is associated with a 4.1% increase in wage.

• Compare it to someone who has 10 years of experience: the marginal effect of one more year of
experience is 100(0.041 − 2 × 0.0007 × 10)% = 2.7%.

• β̂3 < 0 and is statistically significant → work experience has a decreasing marginal effect.
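This model can be fit directly in Stata; the lines below are a minimal sketch (assuming the
variables lwage, educ, and exper used later in the lecture are in memory). Factor-variable
notation lets margins evaluate the marginal effect of experience at several experience levels.

. reg lwage educ c.exper##c.exper, robust      // c.exper##c.exper adds exper and its square
. margins, dydx(exper) at(exper=(0 10 20))     // marginal effect of experience at 0, 10, 20 years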



Using Polynomials and Interactions

 Regression estimate:
ln(wage)^ = 0.127 + 0.090 educ + 0.041 exper − 0.0007 exper²
           (0.107)  (0.007)      (0.005)       (0.0001)

∆ln(wage) ≈ [0.041 − 2 × 0.0007 exper] ∆exper

 One feature of the quadratic specification is that the marginal effect,
0.041 − 2 × 0.0007 exper, declines with experience, so the fitted ln(wage) will start going down
for large values of exper.

• Would the marginal effect of experience ever be negative? 0.041 − 2 × 0.0007 exper < 0 if
exper > 28.6.

• For people with more than 28.6 years of experience, an extra year of experience is associated
with a decrease in wage.

• Does this reflect a negative causal effect of work experience on wage? Probably not. People may
switch to lower-paying and lower-effort jobs in later stages of their careers.

[Figure: fitted log(Wage) against work experience (0–50 years), turning down after about
28.6 years.]



Using Polynomials and Interactions

 EXERCISE. By including quadratic terms for both education and work experience, we
obtain
ln(wage)^ = 0.953 − 0.058 educ + 0.006 educ² + 0.042 exper − 0.0007 exper²
           (0.146)  (0.023)      (0.001)       (0.005)       (0.0001)

∆ln(wage) ≈ [−0.058 + 2 × 0.006 educ] ∆educ
∆ln(wage) ≈ [0.042 − 2 × 0.0007 exper] ∆exper

 How should we interpret the negative coefficient −0.058? Does it imply education has a
negative effect on wage?

• Probably not. The effect of education, −0.058 + 2 × 0.006 educ, is positive as long as
educ > 4.6.

• There are only 7 people in our data with less than 5 years of education.

 Overall, does education have an increasing or a decreasing marginal effect?

• Education has an increasing marginal effect: the coefficient of the quadratic term is positive
and statistically significant.

[Figure: fitted log(Wage) against education (0–20 years).]



Using Polynomials and Interactions
 QUESTION. Are education and work experience “substitutes” or “complements”?
• Theory 1: People can acquire human capital either by attending school (more education) or
through on-the-job training (more experience). So education and work experience are
“substitutes.”
• Theory 2: People with more education tend to learn job-related skills faster, and hence education
and work experience are “complements.”

 Consider the regression model


ln(wage) = β0 + β1 educ + β2 exper + β3 educ × exper + u.
 By including the interaction term, educ × exper, we allow for an interaction effect
between education and experience.

 For example, the marginal effect of experience is

∆ln(wage) ≈ [β2 + β3 educ] ∆exper,

if we hold both ∆educ = 0 and ∆u = 0.

• β3 > 0 → positive interaction effect: one more year of experience is more valuable if one has
more education.

• β3 < 0 → negative interaction effect: one more year of experience is more valuable for people
who are less educated.
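A minimal Stata sketch of this interacted specification (again assuming lwage, educ, and exper
are in memory); c.educ##c.exper expands to educ, exper, and their product, and margins then
reports the marginal effect of experience at different education levels.

. reg lwage c.educ##c.exper, robust
. margins, dydx(exper) at(educ=(8 12 16))      // marginal effect of experience by education level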
Using Polynomials and Interactions
 EXAMPLE. Education, experience, and wage.
• Information on wage, education and experience of 526 individuals in the workforce in 1976.
• Wage: hourly wage measured in dollars. Education and work experience: measured in years.

Dependent variable: ln(Wage)

                                        Model
                           (1)        (2)        (3)        (4)
Education                 0.097     −0.058      0.103     −0.061
                         (0.008)    (0.023)    (0.011)    (0.048)
Education²                           0.006                 0.006
                                    (0.001)               (0.002)
Experience                0.010      0.042      0.013      0.041
                         (0.002)    (0.005)    (0.005)    (0.010)
Experience²                         −0.0007               −0.0007
                                    (0.0001)              (0.0001)
Education×Experience                           −0.0002     0.0005
                                               (0.0004)   (0.0006)
Intercept                 0.216      0.953      0.153      0.978
                         (0.114)    (0.146)    (0.156)    (0.360)
R²                        0.249      0.328      0.250      0.329
n                          526        526        526        526

Note. Standard errors are given in parentheses.
Outline

Using Polynomials and Interactions

Interaction with a Binary Regressor

Regression with Categorical Variables

Interaction Between Binary Regressors



Interaction with a Binary Regressor

 EXAMPLE. Education, gender, and wage.

• Information on wage, education and gender of 526 individuals in the workforce in 1976.

• Wage: hourly wage measured in dollars. Education: measured in years. Gender: measured by a
binary variable female (= 1 if female, and = 0 if male).

• Questions:

− Is there a wage gap between men and women?

− Can the difference be explained by educational attainment?

− Does the effect of education on wage differ for men and women?



Interaction with a Binary Regressor

 Consider the regression

ln(wage) = β0 + β1 female + β2 educ + u.

 If we believe the zero conditional mean assumption, E[u|female, educ] = 0, then β1 captures
the “effect of gender on wage.” Alternatively, it is the wage gap between men and
women.

 Regression estimates:

ln(wage)^ = 0.826 − 0.360 female + 0.077 educ
           (0.100)  (0.039)        (0.008)

• There is a large gender wage gap. On average, women earn 36% less than men. (Note that the
data was collected in 1976.)

• The gender wage gap is large enough to offset five years of education.



Interaction with a Binary Regressor
 Including a dummy variable allows a mean/intercept shift between groups.

• Intuitively, we can rewrite the regression estimates

  ln(wage)^ = 0.826 − 0.360 female + 0.077 educ
             (0.100)  (0.039)        (0.008)

as

  men:    ln(wage)^ = 0.826 + 0.077 educ
                     (0.100)  (0.008)

  women:  ln(wage)^ = 0.826 − 0.360 + 0.077 educ.
                     (0.100) (0.039)  (0.008)

Importantly, the effect of education on wage is ∆ln(wage) = β2 ∆educ, which is the same for
both men and women.
[Figure: fitted log(Wage) against education (0–20 years), with parallel fitted lines for women
and men.]
Interaction with a Binary Regressor

 Consider the regression


ln(wage) = β0 + β1 female + β2 educ + β3 female × educ + u.

 With the additional interaction term, we allow the effect of education on wage to differ
by gender groups.

• For men, the effect of education on wage is β2, while it is β2 + β3 for women:


men (female = 0):    ∆ln(wage) = [β2 + β3 female] ∆educ = β2 ∆educ
women (female = 1):  ∆ln(wage) = [β2 + β3 female] ∆educ = (β2 + β3) ∆educ

 Alternatively, we can rewrite the regression model as


men:    ln(wage) = β0 + β2 educ + u
women:  ln(wage) = (β0 + β1) + (β2 + β3) educ + u.

 Regression estimates:
ln(wage)^ = 0.826 − 0.360 female + 0.077 educ − 0.00006 female × educ.
           (0.117)  (0.203)        (0.008)      (0.01626)

• The estimate of the interaction term is very small and is not statistically significant.
Interaction with a Binary Regressor

 REMARK. Subgroup analysis and fully interacted model.

• Assume we want to study the relationship between Y and X1 , X2 , · · · , Xk . In addition, we have


a binary variable Z , which indicates two groups.

• EXAMPLE. Y = ln(wage), X1 = educ, X2 = exper, and Z = female.

• The following are equivalent:

• Subgroup analysis: Regressing Y on X1 , X2 , · · · , Xk separately for each group


Z =1: Ŷ = β̂0 + β̂1 X1 + β̂2 X2 + · · · + β̂k Xk
Z =0: Ŷ = β̃0 + β̃1 X1 + β̃2 X2 + · · · + β̃k Xk .
• Fully interacted model: Regress Y on X1 , X2 , · · · , Xk , Z , and all the interaction terms
ZX1 , ZX2 , · · · , ZXk

Ŷ = β̃0 + β̃1 X1 + β̃2 X2 + · · · + β̃k Xk
      + (β̂0 − β̃0)Z + (β̂1 − β̃1)ZX1 + (β̂2 − β̃2)ZX2 + · · · + (β̂k − β̃k)ZXk.



Interaction with a Binary Regressor
. reg lwage educ if female == 1, robust // SUBGROUP WOMEN, female == 1
------------------------------------------------------------------------------
| Robust
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
educ | .0771639 .0135456 5.70 0.000 .0504859 .1038419
_cons | .4658902 .1663282 2.80 0.005 .1383069 .7934734
------------------------------------------------------------------------------

. reg lwage educ if female == 0, robust // SUBGROUP MEN, female == 0


------------------------------------------------------------------------------
| Robust
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
educ | .0772279 .0089893 8.59 0.000 .0595305 .0949254
_cons | .8259547 .1168366 7.07 0.000 .5959357 1.055974
------------------------------------------------------------------------------

. reg lwage educ female female_educ, robust // FULLY INTERACTED


------------------------------------------------------------------------------
| Robust
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
educ | .0772279 .0089907 8.59 0.000 .0595656 .0948903
female | -.3600645 .2032506 -1.77 0.077 -.7593542 .0392252
female_educ | -.0000641 .0162559 -0.00 0.997 -.0319991 .0318709
_cons | .8259547 .1168546 7.07 0.000 .5963916 1.055518
------------------------------------------------------------------------------
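Note how the fully interacted model reproduces the subgroup fits exactly: the implied intercept
for women is .8259547 − .3600645 = .4658902, and the implied education slope for women is
.0772279 − .0000641 = .0771638, matching (up to rounding) the women-only estimates above.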



Outline

Using Polynomials and Interactions

Interaction with a Binary Regressor

Regression with Categorical Variables

Interaction Between Binary Regressors



Regression with Categorical Variables

 A variable X is called categorical if it takes values in a finite set.

 EXAMPLE. Race and wage.


The variable race can take three values: white, black, and other.

• We can code the three values numerically, say, 1=white, 2=black, and 3=other, but this may
not be useful in practice.

• What will happen if a researcher regresses wage on race?


(THIS REGRESSION DOES NOT MAKE ANY SENSE!)
------------------------------------------------------------------------------
| Robust
wage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
race | -.9793586 .2401173 -4.08 0.000 -1.450234 -.5084834
_cons | 9.023197 .3449131 26.16 0.000 8.346814 9.699579
------------------------------------------------------------------------------

• What if another researcher runs the same regression but with race2, which codes the three
categories differently: 1=white, 23=black, and 5=other?
(THIS REGRESSION DOES NOT MAKE ANY SENSE!)
------------------------------------------------------------------------------
| Robust
wage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
race2 | -.0564449 .0116564 -4.84 0.000 -.0793033 -.0335864
_cons | 8.148341 .1532251 53.18 0.000 7.847864 8.448819
------------------------------------------------------------------------------
Regression with Categorical Variables

 Using categorical variables as dependent variables or regressors naively will not yield
meaningful estimates, because the numerical values used to code a categorical variable
may not bear any economic meaning.

 In practice, it is common to disentangle the different categories with a collection of binary
variables.

• EXAMPLE. Define three binary variables, white, black, and other. Consider the regression of
wage on white and black.
------------------------------------------------------------------------------
| Robust
wage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
white | -.4677818 1.013238 -0.46 0.644 -2.454764 1.519201
black | -1.706223 1.024282 -1.67 0.096 -3.714864 .3024168
_cons | 8.550781 1.002483 8.53 0.000 6.584889 10.51667
------------------------------------------------------------------------------

• How should we interpret the above results?

• Should we regress wage on all three categories, white, black, and other? (NO! This will lead to
perfect multicollinearity.)
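A minimal sketch of this approach in Stata (assuming race is coded 1 = white, 2 = black,
3 = other, as above):

. gen white = (race == 1)
. gen black = (race == 2)
. gen other = (race == 3)
. reg wage white black, robust       // other is the omitted (base) category
. reg wage i.race, robust            // factor-variable shortcut; Stata omits one level automatically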



Regression with Categorical Variables

 Assume a sample can be split into k + 1 categories, represented by k + 1 binary variables,


X1 , X2 , · · · , Xk+1 .

EXAMPLE. Race and wage.

   Y        X1       X2       X3
  wage     white    black    other
 11.73       1        0        0
  6.40       1        0        0
  5.01       1        0        0
  9.03       0        1        0
  8.08       0        1        0
  4.62       0        1        0
 10.45       0        1        0
 17.20       0        1        0
 13.08       0        0        1
  7.74       0        0        1
 16.79       0        0        1
 15.48       0        0        1
  5.23       0        0        1


Regression with Categorical Variables

 Assume a sample can be split into k + 1 categories, represented by k + 1 binary variables,


X1 , X2 , · · · , Xk+1 .
• Xj = 0 or 1, and Σ_{j=1}^{k+1} Xj = 1.

• Xj = 1 if an observation falls into the j-th category.

• Xj Xj = Xj. If j ≠ ℓ, Xj Xℓ = 0.

− EXAMPLE. X1 = white, X2 = black, and X3 = other.

 A multiple regression is called saturated if it includes any k of the k + 1 categories as regressors.

 If we include X1 , X2 , · · · , Xk as our regressors, the omitted category, Xk+1 , is called the
base category or the base group.

• Alternatively, we can include X2 , X3 , · · · , Xk+1 as regressors; then X1 will be the base category.

• EXAMPLE. What is the base category if we regress wage on white and black? How about a
regression of wage on black and other?

 We cannot include all k + 1 categories, because this will lead to perfect multicollinearity.



Regression with Categorical Variables
 Consider the saturated regression
------------------------------------------------------------------------------
| Robust
wage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
white | -.4677818 1.013238 -0.46 0.644 -2.454764 1.519201
black | -1.706223 1.024282 -1.67 0.096 -3.714864 .3024168
_cons | 8.550781 1.002483 8.53 0.000 6.584889 10.51667
------------------------------------------------------------------------------
and the average wage within each category (Stata command: tab race, summ(wage)):
| Summary of hourly wage
race | Mean Std. Dev. Freq.
------------+------------------------------------
white | 8.0829994 5.9550691 1,637
black | 6.8445578 5.0761866 583
other | 8.5507813 5.2094301 26
------------+------------------------------------
Total | 7.766949 5.7555229 2,246

 We observe

average wage for other = 8.5507813 = intercept
average wage for white = 8.0829994 = 8.550781 − 0.4677818 = intercept + slope of white
average wage for black = 6.8445578 = 8.550781 − 1.706223  = intercept + slope of black.
Regression with Categorical Variables

 Assume we have a saturated regression


Ŷ = β̂0 + β̂1 X1 + β̂2 X2 + · · · + β̂k Xk ,
and we take Xk+1 as the base category.

 Then we have
average Y for the Xk+1 category = β̂0 = intercept

average Y for the X1 category = β̂0 + β̂1 = intercept + slope of X1

average Y for the X2 category = β̂0 + β̂2 = intercept + slope of X2

...
average Y for the Xk category = β̂0 + β̂k = intercept + slope of Xk

 To memorize, treat the regressors as “switches” that can be turned “on” and “off”:
• Take X1 = X2 = · · · = Xk = 0, then we get the average for the base category.
• Take X1 = 1 and X2 = · · · = Xk = 0, then we get the average for the X1 category.
• ···

 What if we regress Y on X2 , X3 , · · · , Xk+1 (so that X1 is the base category)?


Regression with Categorical Variables

 Assume we have a saturated regression


Ŷ = β̂0 + β̂1 X1 + β̂2 X2 + · · · + β̂k Xk ,
and we take Xk+1 as the base category.

 Prove average Y for the Xk+1 (base) category = β̂0 = intercept.

• Step 1. Yi = β̂0 + β̂1 Xi1 + β̂2 Xi2 + · · · + β̂k Xik + ûi .

• Step 2. Xik+1 Yi = β̂0 Xik+1 + β̂1 Xi1 Xik+1 + β̂2 Xi2 Xik+1 + · · · + β̂k Xik Xik+1 + ûi Xik+1 .

• Step 3. Xij Xik+1 = 0 for all j = 1, 2, · · · , k. Therefore Xik+1 Yi = β̂0 Xik+1 + ûi Xik+1 .

• Step 4. Sum over all observations: Σ_{i=1}^n Xik+1 Yi = β̂0 Σ_{i=1}^n Xik+1 + Σ_{i=1}^n ûi Xik+1.

• Step 5. Σ_{i=1}^n ûi Xik+1 = 0, so that Σ_{i=1}^n Xik+1 Yi = β̂0 Σ_{i=1}^n Xik+1.

• Step 6. β̂0 = (Σ_{i=1}^n Xik+1 Yi) / (Σ_{i=1}^n Xik+1). Conclude that this is the average Y for the Xk+1 category.
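A quick numerical check of this result (a minimal sketch, assuming the wage data and the white,
black, and other dummies from the earlier slides are in memory):

. reg wage white black, robust
. sum wage if white == 0 & black == 0       // mean wage for the base (other) category equals the intercept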
Regression with Categorical Variables

 Assume we have a saturated regression


Ŷ = β̂0 + β̂1 X1 + β̂2 X2 + · · · + β̂k Xk ,
and we take Xk+1 as the base category.

 Prove average Y for the X1 category = β̂0 + β̂1 = intercept + slope of X1 .

• Step 1. Yi = β̂0 + β̂1 Xi1 + β̂2 Xi2 + · · · + β̂k Xik + ûi .

• Step 2. Xi1 Yi = β̂0 Xi1 + β̂1 Xi1 Xi1 + β̂2 Xi2 Xi1 + · · · + β̂k Xik Xi1 + ûi Xi1 .

• Step 3. Xi1 Xi1 = Xi1 , and Xij Xi1 = 0 for all j = 2, 3, · · · , k. Therefore
Xi1 Yi = (β̂0 + β̂1 )Xi1 + ûi Xi1 .

• Step 4. Sum over all observations: Σ_{i=1}^n Xi1 Yi = (β̂0 + β̂1) Σ_{i=1}^n Xi1 + Σ_{i=1}^n ûi Xi1.

• Step 5. Σ_{i=1}^n ûi Xi1 = 0, so that Σ_{i=1}^n Xi1 Yi = (β̂0 + β̂1) Σ_{i=1}^n Xi1.

• Step 6. β̂0 + β̂1 = (Σ_{i=1}^n Xi1 Yi) / (Σ_{i=1}^n Xi1). Conclude that this is the average Y for the X1 category.
Regression with Categorical Variables

 EXERCISE. Changing the base category.

• Consider the following saturated regression of wage on white and black (so that other is the
base category).
wage^ = 8.55 − 0.47 white − 1.71 black.

• What are the regression estimates if we choose white as the base category?
wage^ = (A) + (B) black + (C) other.

• Solution.
(A) = average wage for white = 8.55 − 0.47

(A) + (B) = average wage for black = 8.55 − 1.71

(A) + (C) = average wage for other = 8.55.
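Carrying out the arithmetic: (A) = 8.55 − 0.47 = 8.08, (B) = (8.55 − 1.71) − 8.08 = −1.24, and
(C) = 8.55 − 8.08 = 0.47, so the refitted regression is wage^ = 8.08 − 1.24 black + 0.47 other.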



Outline

Using Polynomials and Interactions

Interaction with a Binary Regressor

Regression with Categorical Variables

Interaction Between Binary Regressors



Interaction Between Binary Regressors

 We discussed how to conduct regression analysis using a categorical variable.

 In many applications, however, categories are generated using dummy variables.

 EXAMPLE. Assume we observe wage and two binary variables urban and college.

• We can partition the sample into four categories:

                  urban = 1                         urban = 0
college = 1       urban, college graduate           rural, college graduate
college = 0       urban, without college degree     rural, without college degree

• We can generate four categories: urbanCol, urbanNonCol, ruralCol, and ruralNonCol.

• In Stata:
gen urbanCol = (urban == 1 & college == 1)
gen urbanNonCol = (urban == 1 & college == 0)
gen ruralCol = (urban == 0 & college == 1)
gen ruralNonCol = (urban == 0 & college == 0)
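With these in hand, the saturated regression on the next slide can be run as (a minimal sketch):
reg wage urbanCol urbanNonCol ruralCol, robust    // ruralNonCol is the base category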



Interaction Between Binary Regressors

 Regression of wage on urbanCol, urbanNonCol, and ruralCol (ruralNonCol is the base category):

wage^ = 5.75 + 5.36 urbanCol + 1.70 urbanNonCol + 2.61 ruralCol.
Average wage for the four categories can be found by

                  urban = 1                   urban = 0
college = 1       5.75 + 5.36 = 11.11         5.75 + 2.61 = 8.36
college = 0       5.75 + 1.70 = 7.45          5.75

 In practice, it is more common to use interactions of binary variables.

 Consider the regression of wage on urban, college, and urban × college:

wage^ = 5.75 + 1.70 urban + 2.61 college + 1.05 urban × college.
Average wage for the four categories can be found by

                  urban = 1                            urban = 0
college = 1       5.75 + 1.70 + 2.61 + 1.05 = 11.11    5.75 + 2.61 = 8.36
college = 0       5.75 + 1.70 = 7.45                   5.75
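Note that the coefficient on the interaction is the "difference in differences" of the cell means:
1.05 = (11.11 − 7.45) − (8.36 − 5.75), the extra college premium in urban areas relative to rural areas.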
Interaction Between Binary Regressors

 EXAMPLE. Assume we observe wage and three binary variables urban, college, and
married.

• There are 2^3 = 8 categories.

• A fully saturated regression will include 7 of the 8 categories as regressors.

• We can use interactions:


wage^ = β̂0 + β̂1 urban + β̂2 college + β̂3 married
        + β̂4 urban × college + β̂5 urban × married + β̂6 college × married
        + β̂7 urban × college × married.


• To interpret the estimates, treat the binary variables as “switches” that can be turned “on” and
“off.”

• Find average wage for

− college graduates that are married and live in urban areas

− college graduates that are single and live in urban areas

− non-college graduates that are married and live in rural areas

− non-college graduates that are single and live in rural areas (base category)
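A minimal Stata sketch of this fully interacted regression (assuming wage and the 0/1 variables
urban, college, and married are in memory); the ## operator expands to all main effects and
interactions, and margins reports the average wage in each of the 8 cells.

. reg wage urban##college##married, robust
. margins urban#college#married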



Interaction Between Binary Regressors

 In general, if we have p binary variables, X1 , X2 , · · · , Xp , then

• There are 2^p categories (unless some of the binary variables are mutually exclusive).

• A saturated regression requires 2^p − 1 regressors.

• The base category is X1 = X2 = · · · = Xp = 0, and average Y for this category is the intercept.

• Average Y for the category X1 = X2 = · · · = Xp = 1 can be found by adding up all estimates.

• Average Y for other categories can be found by setting the corresponding binary variables to 1
and 0 and adding up the relevant coefficients.

 REMARK

• Regressing Y on X1 , X2 , · · · , Xp does not yield a saturated regression, unless all of the variables
are mutually exclusive.

• With many binary variables, the saturated regression will involve lots of categories.

• Some category may not have any observation (by chance), and the saturated regression will fail
due to perfect multicollinearity.



The lectures and course materials, including slides, tests, outlines, and similar
materials, are protected by U.S. copyright law and by University policy. You may take
notes and make copies of course materials for your own use. You may also share those
materials with another student who is enrolled in or auditing this course.

You may not reproduce, distribute or display (post/upload) lecture notes or


recordings or course materials in any other way – whether or not a fee is charged –
without my written consent. You also may not allow others to do so.

If you do so, you may be subject to student conduct proceedings under the UC San
Diego Student Code of Conduct.

[email protected]

