1-6 Dummy Variable

This document discusses using dummy variables in regression analysis when explanatory variables are categorical. It explains that dummy variables, also called indicator or 0-1 variables, are used to represent categorical variables. Examples are provided to demonstrate how to create dummy variables in Excel and interpret the slope coefficients in regressions with single and multiple categorical variables. The key points are that dummy variables allow categorical variables to be included in regressions and the coefficients indicate average differences for each category compared to a reference group.

STA 371G

DUMMY VARIABLE
Bank Salaries.xlsx
• The Fifth National Bank of Springfield is facing a
gender-discrimination suit. The charge is that its
female employees receive substantially smaller
salaries than its male employees.
• The bank’s employee database is listed in this file.

2
Dummy Variables
• Some potential explanatory variables are categorical
and cannot be measured on a quantitative scale.
• However, we often need to use these variables because
they are related to the response variable.
• The trick is to create dummy variables, also called
indicator or 0-1 variables.
• These are variables that indicate the category a given
observation is in.

5
Create Dummy Variables in Excel
• Use 0 or 1 to encode the category
• =IF(Gender="Female",1,0), where Gender refers to the cell holding
that employee's gender
• After creating the dummy variables, we will only
use the dummy variables in our regression model.
We shouldn’t use any of the original categorical
variables that the dummies are based on.
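Outside Excel, the same 0-1 encoding takes one line. Below is a minimal Python sketch. The function name `female_dummy` and the short gender list are hypothetical stand-ins for the Gender column in Bank Salaries.xlsx.

```python
def female_dummy(gender: str) -> int:
    """Return 1 for a female employee and 0 otherwise (same logic as =IF(Gender="Female",1,0))."""
    return 1 if gender == "Female" else 0


# Hypothetical gender column; the real values come from Bank Salaries.xlsx.
genders = ["Female", "Male", "Male", "Female"]
female = [female_dummy(g) for g in genders]
print(female)  # [1, 0, 0, 1]
```

Only the `female` list would enter the regression. The original text column `genders` is not used as a predictor.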

6
Dummy variable when more than two categories
• We should use one fewer dummy variable than the
number of categories for any categorical variable.
The omitted dummy then corresponds to the
reference category. (Including all of them together with
the intercept fails: in every row the full set of
dummies sums to 1, duplicating the intercept.)
• As we will see, the dummy variable coefficients are
all interpreted relative to this reference category.
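Here is a minimal Python sketch of the k - 1 rule. It assumes the five education levels are coded 1 to 5 and uses level 1 as the reference. The helper `make_dummies` and the sample data are hypothetical. The last lines check what happens if all k dummies are kept instead.

```python
def make_dummies(values, categories, reference):
    """Encode a categorical column as len(categories) - 1 dummy columns.

    The reference category gets no column: a row in that category is
    all zeros, so every dummy coefficient is measured relative to it.
    """
    if reference not in categories:
        raise ValueError(f"reference {reference!r} is not one of the categories")
    kept = [c for c in categories if c != reference]
    return {c: [1 if v == c else 0 for v in values] for c in kept}


levels = [1, 2, 3, 4, 5]  # hypothetical coding of five education levels
educ = [1, 3, 5, 2, 1]    # hypothetical employees
dummies = make_dummies(educ, levels, reference=1)
print(dummies)  # {2: [0, 0, 0, 1, 0], 3: [0, 1, 0, 0, 0], 4: [0, 0, 0, 0, 0], 5: [0, 0, 1, 0, 0]}

# Keeping all five dummies: each row sums to 1, the same as the intercept column.
all_five = {c: [1 if v == c else 0 for v in educ] for c in levels}
row_sums = [sum(all_five[c][i] for c in levels) for i in range(len(educ))]
print(row_sums)  # [1, 1, 1, 1, 1]
```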
7
Exercise
• We are trying to understand the dependence of a
variable 'y' on three categorical variables. The
first variable can take 4 possible values (for
example, sales, advertising, purchasing, or
engineering), while the second and third variables
can take 3 and 6 different values, respectively.
How many dummy variables in total will we need
to include in our multiple regression model?
• Answer: 10. Each categorical variable needs one fewer
dummy than its number of categories:
(4 - 1) + (3 - 1) + (6 - 1) = 3 + 2 + 5 = 10.
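The counting rule works for any number of categorical variables: add up (k - 1) over all of them. Here is a one-function Python sketch. The name `total_dummies` is hypothetical.

```python
def total_dummies(category_counts):
    """Dummy variables needed: (k - 1) for each categorical variable with k categories."""
    return sum(k - 1 for k in category_counts)


print(total_dummies([4, 3, 6]))  # 10
```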
9
Example (Interpretation of slope)
• We first estimate a regression equation with only one explanatory
variable.
Predicted Salary = 45505 - 8296Female
• To interpret this equation recall that Female has only two possible
values, 0 and 1. If we substitute 1 then the predicted salary equals
37209 and if we substitute 0 the predicted salary is 45505. These
are the average salaries of females and males.
• Therefore the -8296 coefficient of the Female dummy variable is
simply the difference between the two averages. On average, females
earn $8296 less than males (37209 - 45505 = -8296).

• Equivalently, if Male is used as the dummy (so that females become the
reference category), the equation is
Predicted Salary = 37209 + 8296Male
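With a single 0-1 regressor, any least-squares fit has this property: the intercept is the average of the reference group, and the slope is the difference between the two group averages. The Python sketch below checks this on a small made-up data set. The salaries are hypothetical and do not come from Bank Salaries.xlsx. The code uses the textbook simple-regression formulas rather than a particular library.

```python
def simple_ols(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Hypothetical toy data, NOT the bank's data: three women, four men.
female = [1, 1, 1, 0, 0, 0, 0]
salary = [36000, 38000, 37000, 44000, 46000, 45000, 47000]

b0, b1 = simple_ols(female, salary)
print(round(b0), round(b1))  # 45500 -8500: male mean, and female mean minus male mean

# Recoding with Male as the dummy swaps the reference group.
male = [1 - f for f in female]
c0, c1 = simple_ols(male, salary)
print(round(c0), round(c1))  # 37000 8500: female mean, and male mean minus female mean
```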

11
Example continued
• The above equation tells only part of the story; it
ignores all information except gender.
• We expand this equation by adding another
variable, YrsExper (years of experience with Fifth National).

12
Example continued (Parallel Regression Lines)
• The corresponding equation is
Predicted Salary = 35823 + 981YrsExper - 8011Female
• It is useful to write two separate equations, one for females and one for
males:
Predicted Salary (F) = 27812 + 981YrsExper
Predicted Salary (M) = 35823 + 981YrsExper
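The Python sketch below substitutes the two values of Female into the fitted equation. The coefficients are the ones reported above, and the function name is hypothetical. For any value of YrsExper, the female and male predictions differ by exactly the Female coefficient, which is why the two lines are parallel.

```python
# Coefficients from the fitted equation above:
# Predicted Salary = 35823 + 981YrsExper - 8011Female
INTERCEPT, B_EXPER, B_FEMALE = 35823, 981, -8011


def predicted_salary(yrs_exper, female):
    """Predicted salary for a given experience level and Female dummy (0 or 1)."""
    return INTERCEPT + B_EXPER * yrs_exper + B_FEMALE * female


# Setting Female = 1 or 0 gives two lines with the same slope (981).
print(predicted_salary(0, 1), predicted_salary(0, 0))  # 27812 35823
# The gap between the lines is the same at every experience level.
print(predicted_salary(10, 1) - predicted_salary(10, 0))  # -8011
```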

• We interpret the coefficient -8011 of the Female dummy variable as the
average salary disadvantage for females relative to males after
controlling for job experience. It is the vertical gap between the two
parallel lines (35823 - 27812 = 8011). But there is still more to the story.

13
Example continued (Multiple categorical variables)
• We next add YrsPrior (years of prior experience with another bank)
and education level to the equation. Education enters through four of
the five education-level dummies. Although any four could be used, we
use Ed_2 to Ed_5, so that the lowest level becomes the reference
category.
• Because salary is expected to rise with education, this choice should
lead to positive coefficients for these dummies, which are easier to
interpret.

14
Example continued (Multiple categorical variables)
• The estimated regression equation is now
Predicted Salary = 26613 + 1033YrsExper + 362YrsPrior + 160Ed_2 +
4765Ed_3 + 7320Ed_4 + 11770Ed_5 - 4501Female
• There are now two categorical variables involved,
gender and educational level.
• However, we can still write a separate equation for any
combination of categories by setting the dummies to the
appropriate values.
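Any four of the five education dummies could have been used, so it is worth seeing what happens if the reference level changes. The Python sketch below re-expresses the reported education coefficients with the highest level (Ed_5) as the reference. The helper name is hypothetical, and key 1 stands for the lowest level, which has coefficient 0.

In a least-squares fit, switching the reference level does not change the fitted values. The intercept absorbs the old coefficient of the new reference level, and every education coefficient shifts by that same amount. As a result, all of them become zero or negative. The YrsExper, YrsPrior and Female coefficients are unaffected.

```python
# Reported intercept and education coefficients, lowest level as reference (coefficient 0).
intercept = 26613
educ = {1: 0, 2: 160, 3: 4765, 4: 7320, 5: 11770}


def rereference(intercept, coefs, new_ref):
    """Re-express dummy coefficients relative to a different reference level."""
    shift = coefs[new_ref]
    return intercept + shift, {level: b - shift for level, b in coefs.items()}


new_intercept, new_educ = rereference(intercept, educ, new_ref=5)
print(new_intercept)  # 38383
print(new_educ)       # {1: -11770, 2: -11610, 3: -7005, 4: -4450, 5: 0}
```

As a consistency check, level 3 gives the same baseline either way: 26613 + 4765 = 38383 - 7005 = 31378.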
15
Example continued (Interpretation)
• For example, the equation for females at the fifth
education level is found by setting Female = 1 and
Ed_5 = 1 and setting the other education dummies equal to 0.
The resulting equation is
Predicted Salary = 33882 + 1033YrsExper + 362YrsPrior
where 33882 = 26613 + 11770 - 4501.
• For any fixed gender and education level, the
expected increase in salary for one extra year of
experience with Fifth National is $1033, and the expected
increase in salary for one extra year of prior experience
with another bank is $362.
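Setting the dummies for a given gender and education level is mechanical, so it is easy to automate. The Python sketch below uses the coefficients reported above. The function names are hypothetical, as is the convention that `educ_level` runs from 1 (the reference level) to 5.

```python
# Coefficients of the fitted model reported above.
COEFS = {
    "const": 26613, "YrsExper": 1033, "YrsPrior": 362,
    "Ed_2": 160, "Ed_3": 4765, "Ed_4": 7320, "Ed_5": 11770, "Female": -4501,
}


def group_intercept(female, educ_level):
    """Intercept of the salary equation for one gender/education combination."""
    # Level 1 is the reference: all Ed dummies are 0.
    dummies = {f"Ed_{k}": int(educ_level == k) for k in range(2, 6)}
    dummies["Female"] = int(female)
    return COEFS["const"] + sum(COEFS[name] * value for name, value in dummies.items())


def predict_full(female, educ_level, yrs_exper, yrs_prior):
    """Predicted salary; the experience slopes are shared by every group."""
    return (group_intercept(female, educ_level)
            + COEFS["YrsExper"] * yrs_exper + COEFS["YrsPrior"] * yrs_prior)


print(group_intercept(female=True, educ_level=5))  # 33882
print(group_intercept(female=False, educ_level=1))  # 26613
```

YrsExper and YrsPrior have the same coefficients in every group's equation, so only the intercept changes from group to group.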
16
Example continued (Interpretation)
Predicted Salary = 26613 + 1033YrsExper + 362YrsPrior - 4501Female + 160Ed_2 +
4765Ed_3 + 7320Ed_4 + 11770Ed_5
• The coefficients of the education dummies indicate the average increase
in salary an employee can expect relative to the reference (lowest)
education level.
• The key coefficient, the negative $4501 for females, indicates the average
salary disadvantage for females relative to males, given that they have
the same experience levels and the same education levels.
• One further explanation for gender differences in salary might be job
grade. Perhaps females tend to be in lower job grades, which would help
explain why they get lower salaries on average.

17
Summary
• Dummy variable
– Categorical variable
– Create dummy variables
– Interpretation of slope coefficients

• Readings: Textbook Section 10.6

18
