Lecture 5 Dummy Variable

ECON3049: ECONOMETRICS I

DUMMY VARIABLES
Marlon Tracey
Summer 2012
Outline
• Nature and use of dummy variables
• Creation of dummy variables
• Dummy as explanatory variable
  ◦ Dummy variable trap
  ◦ Intercept dummy
  ◦ Slope dummy
• Dummy as dependent variable
  ◦ Linear probability model

Reading: Gujarati, Chapters 9 and 15; Wooldridge, Chapter 7


Nature and Use of Dummy Variables
• If a variable is qualitative or categorical, standard regression can still be used to analyze its effect. This is done by using dummy variables.
• Dummy variables are variables that take on only two values, 0 and 1, where 0 means the absence of a quality and 1 means its presence.
• A dummy variable can be used as an explanatory variable:
  ◦ to capture the effects of qualitative characteristics of individuals (e.g. race, religion, gender, level of education), firms (e.g. private/public, large/medium/small/micro), countries (e.g. OECD or otherwise, developed, developing, undeveloped, underdeveloped), etc.;
  ◦ to examine before-and-after situations (pre-liberalization/post-liberalization; pre-recovery/post-recovery); or
  ◦ to test whether a regression function differs between one group and another.
• A dummy variable can also be used as a dependent variable to predict “choices”, “options” or qualitative outcomes.
Creation of dummy variables
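• Convert a variable with two categories (such as “distance from home” or gender) to a dummy variable (D) as per the following examples:

  $$D_n = \begin{cases} 1 & \text{if live near} \\ 0 & \text{if live far} \end{cases} \qquad D_m = \begin{cases} 1 & \text{if male} \\ 0 & \text{if female} \end{cases}$$

  $$D_f = \begin{cases} 0 & \text{if live near} \\ 1 & \text{if live far} \end{cases} \qquad D_f = \begin{cases} 0 & \text{if male} \\ 1 & \text{if female} \end{cases}$$

• Convert a variable with three or more categories (such as quality of degree) to dummy variables by creating a separate dummy variable for each category as follows:

  $$D_1 = \begin{cases} 1 & \text{First Class} \\ 0 & \text{Otherwise} \end{cases} \qquad D_2 = \begin{cases} 1 & \text{Upper Second Class} \\ 0 & \text{Otherwise} \end{cases}$$

  $$D_3 = \begin{cases} 1 & \text{Lower Second Class} \\ 0 & \text{Otherwise} \end{cases} \qquad D_4 = \begin{cases} 1 & \text{Pass} \\ 0 & \text{Otherwise} \end{cases}$$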
Dummy as explanatory variable
• When dummy variables are used as explanatory variables in a regression, one must be careful to avoid the dummy variable trap.
• A dummy explanatory variable can be an intercept dummy, a slope dummy, or both:
  ◦ An intercept dummy is used when the researcher theorizes that the dummy variable has a level effect on the dependent variable, that is, the level of the dependent variable is higher for one category than for another.
  ◦ A slope dummy is used when the researcher theorizes that the dummy variable changes the marginal effect of an explanatory variable on the dependent variable. It changes the magnitude and/or direction of the relationship between an explanatory variable and the dependent variable.
Dummy variable trap
• In regression analysis, the full set of dummies for a qualitative explanatory variable should not be included in an equation that also has an intercept.
• Doing so creates the dummy variable trap, which is a case of perfect multicollinearity: the dummies sum to 1 for every observation, which duplicates the intercept term.
• To avoid the dummy variable trap:
  ◦ If the variable has M categories, enter only M − 1 dummies in the regression equation; or
  ◦ drop the intercept from the model and include all M dummies.
• The category not represented in the regression is referred to as the reference category.
• The coefficients of the other dummy variables are interpreted relative to the reference category.
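The trap can be made concrete with a small check. This sketch uses a hypothetical sample of degree classes and shows that the full set of dummies adds up to the intercept column, while dropping one category removes that exact dependence. The choice of "Pass" as the reference category is arbitrary.

```python
# Hypothetical sample of degree classes (M = 4 categories).
degrees = ["First Class", "Pass", "Upper Second Class",
           "Lower Second Class", "Pass", "First Class"]
categories = ["First Class", "Upper Second Class",
              "Lower Second Class", "Pass"]
n = len(degrees)

# One dummy column per category, plus the intercept column of ones.
dummies = {c: [1 if d == c else 0 for d in degrees] for c in categories}
intercept = [1] * n

# With all M dummies, each row contains exactly one 1, so the dummy
# columns add up to the intercept column: an exact linear dependence.
full_sums = [sum(dummies[c][i] for c in categories) for i in range(n)]
sums_to_intercept_all = full_sums == intercept

# Remedy: keep M - 1 dummies. The dropped category ("Pass", an arbitrary
# choice here) becomes the reference category.
kept = categories[:-1]
reduced_sums = [sum(dummies[c][i] for c in kept) for i in range(n)]
sums_to_intercept_reduced = reduced_sums == intercept

print(sums_to_intercept_all, sums_to_intercept_reduced)  # True False
```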
Intercept dummy
• Consider a regression model with one non-categorical variable (edu) and one dummy ($D_f = 1$ if female) as follows:

  $$\text{wage} = \beta_0 + \beta_1\,\text{edu} + \delta_0 D_f + u$$

• This can be interpreted as an intercept shift:
  ◦ If $D_f = 0$, then $\text{wage} = \beta_0 + \beta_1\,\text{edu} + u$
  ◦ If $D_f = 1$, then $\text{wage} = (\beta_0 + \delta_0) + \beta_1\,\text{edu} + u$
• In this case, the group that is left out is “Males”, represented by $D_f = 0$. This group is the reference group.
• Note that the dummy $D_f$, when equal to 1, shifts the intercept from $\beta_0$ to $(\beta_0 + \delta_0)$. It is therefore called an intercept dummy.
• The effect of the intercept dummy is interpreted as follows: the difference between wage if $D_f = 1$ and wage if $D_f = 0$ is $\delta_0$, ceteris paribus. More intuitively, the gender wage differential is $\delta_0$.
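The intercept shift can be illustrated numerically. The sketch below generates wages from hypothetical coefficient values with the error term set to zero, then recovers the coefficients by OLS. The data, the coefficient values and the helper `ols` (a small normal-equations solver written for this example) are assumptions, not part of the lecture.

```python
def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical "true" values: beta0, beta1 and the intercept shift delta0.
beta0, beta1, delta0 = 2.0, 1.5, -3.0
edu = [8, 10, 12, 14, 16, 12]
D_f = [0, 1, 0, 1, 0, 1]          # 1 if female, 0 if male (reference group)

# Wages generated without an error term (u = 0), so OLS recovers the
# coefficients up to floating-point rounding.
wage = [beta0 + beta1 * e + delta0 * d for e, d in zip(edu, D_f)]

# Design matrix columns: intercept, edu, D_f.
X = [[1, e, d] for e, d in zip(edu, D_f)]
b0_hat, b1_hat, d0_hat = ols(X, wage)

print(round(b0_hat, 6), round(b1_hat, 6), round(d0_hat, 6))
```

The estimate on the dummy is the vertical gap between the female and male regression lines, which share the same slope on edu.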
Slope dummy
• Consider a regression model with one non-categorical variable (edu) and one dummy ($D_f = 1$ if female) as follows:

  $$\text{wage} = \beta_0 + \beta_1\,\text{edu} + \delta_1 D_f \cdot \text{edu} + u$$

• This can be interpreted as a slope shift:
  ◦ If $D_f = 0$, then $\text{wage} = \beta_0 + \beta_1\,\text{edu} + u$
  ◦ If $D_f = 1$, then $\text{wage} = \beta_0 + (\beta_1 + \delta_1)\,\text{edu} + u$
• In this case, the group that is left out is “Males”, represented by $D_f = 0$. This group is the reference group.
• Note that the dummy $D_f$, when equal to 1, shifts the slope from $\beta_1$ to $(\beta_1 + \delta_1)$. It is therefore called a slope dummy.
• The effect of the slope dummy is interpreted as follows: the difference between the effect of education on wage if $D_f = 1$ and the effect if $D_f = 0$ is $\delta_1$, ceteris paribus. More intuitively, the difference between the marginal return to education for females and males is $\delta_1$.
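The interpretation above follows from differentiating the model with respect to edu. The numerical values used here are hypothetical and chosen only to make the arithmetic concrete.

$$\frac{\partial\,\text{wage}}{\partial\,\text{edu}} = \beta_1 + \delta_1 D_f = \begin{cases} \beta_1 & \text{if } D_f = 0 \text{ (male)} \\ \beta_1 + \delta_1 & \text{if } D_f = 1 \text{ (female)} \end{cases}$$

For example, if $\beta_1 = 1.5$ and $\delta_1 = -0.4$, one more year of education raises the wage by $1.5$ for males and by $1.5 - 0.4 = 1.1$ for females, so the difference in marginal returns is $\delta_1 = -0.4$.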
Dummy as dependent variable
• In many cases, a researcher may be interested in the set of variables that predict certain binary “choices” or “options”. For example, the researcher might be interested in the factors that determine the following:
  ◦ Decision to work or go to school
  ◦ Voting (vote or not vote)
  ◦ Marital status (married or not)
  ◦ Passing or failing a course
• Can OLS regression analysis still be conducted when the dependent variable is binary?
Linear Probability Model I
• Consider the simple linear regression model as follows:

  $$Y = \beta_0 + \beta_1 X + u$$

• where the population regression function (PRF) is

  $$E(Y \mid X) = \beta_0 + \beta_1 X, \quad \text{since } E(u \mid X) = 0$$

• Since Y is a dummy (or binary) variable, at each value of X we have $E(Y \mid X) = 1 \cdot p + 0 \cdot (1 - p) = p$, where p is the probability that Y = 1. Therefore, in this case, the PRF is referred to as a linear probability model.
• When the PRF is estimated by OLS on a particular sample, the estimated linear probability equation is given by:

  $$\hat{p} = \hat{\beta}_0 + \hat{\beta}_1 X$$
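A minimal sketch of estimating the linear probability equation by OLS, using the textbook formulas for simple regression. The data (hours studied and a pass/fail outcome) are hypothetical.

```python
# Hypothetical data: X = hours studied, Y = 1 if the student passed, else 0.
X = [1, 2, 3, 4, 5, 6, 7, 8]
Y = [0, 0, 0, 1, 0, 1, 1, 1]

n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n

# Simple-regression OLS: slope = S_xy / S_xx, intercept = y_bar - slope * x_bar.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
s_xx = sum((x - x_bar) ** 2 for x in X)
b1_hat = s_xy / s_xx
b0_hat = y_bar - b1_hat * x_bar

# Fitted values are the estimated probabilities that Y = 1 at each X.
p_hat = [b0_hat + b1_hat * x for x in X]

print(round(b0_hat, 4), round(b1_hat, 4))  # -0.25 0.1667
```

Under these assumed data, each additional hour of study is estimated to raise the probability of passing by about 0.167.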
Linear Probability Model II
• However, the linear probability model violates several assumptions of our standard regression model:
  ◦ The linearity assumption is no longer realistic. Since X has a constant linear effect $\beta_1$ on the probability that Y = 1, it is possible that $\hat{p} < 0$ or $\hat{p} > 1$, but a probability must lie in the interval [0, 1].
  ◦ Y, and therefore u, is non-normal. However, in large samples this is not much of a problem.
  ◦ u is heteroskedastic, since $V(u) = p(1 - p)$, where p depends on X. However, this problem can be corrected using Weighted Least Squares.
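The first and third problems can be seen in a small example. This sketch fits the linear probability model by OLS on hypothetical data, lists the fitted values outside [0, 1], and then re-estimates by weighted least squares with weights $1/[\hat{p}(1-\hat{p})]$. Dropping observations whose fitted value falls outside (0, 1) before weighting is one common ad hoc choice and is an assumption here, as are the data and the helper `weighted_line`.

```python
def weighted_line(X, Y, w):
    """Weighted least squares for Y = b0 + b1*X.
    With all weights equal to 1 this reduces to OLS."""
    sw = sum(w)
    x_bar = sum(wi * x for wi, x in zip(w, X)) / sw
    y_bar = sum(wi * y for wi, y in zip(w, Y)) / sw
    s_xy = sum(wi * (x - x_bar) * (y - y_bar) for wi, x, y in zip(w, X, Y))
    s_xx = sum(wi * (x - x_bar) ** 2 for wi, x in zip(w, X))
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1


# Hypothetical data: X = hours studied, Y = 1 if passed, else 0.
X = [1, 2, 3, 4, 5, 6, 7, 8]
Y = [0, 0, 0, 1, 0, 1, 1, 1]

# Step 1: OLS fit and fitted probabilities.
b0, b1 = weighted_line(X, Y, [1.0] * len(X))
p_hat = [b0 + b1 * x for x in X]

# Problem: some fitted "probabilities" fall outside [0, 1].
out_of_range = [round(p, 3) for p in p_hat if p < 0 or p > 1]

# Problem: the estimated error variance p(1 - p) changes with X.
# Keep only observations with 0 < p_hat < 1 so the variance is positive.
keep = [i for i, p in enumerate(p_hat) if 0 < p < 1]
var_hat = [p_hat[i] * (1 - p_hat[i]) for i in keep]

# Step 2: WLS with weights equal to the inverse of the estimated variance.
weights = [1 / v for v in var_hat]
b0_wls, b1_wls = weighted_line([X[i] for i in keep],
                               [Y[i] for i in keep], weights)

print(out_of_range)  # [-0.083, 1.083]
```

The estimated variances are largest where the fitted probability is near 0.5 and smallest near 0 or 1, so the weights give more influence to observations with fitted probabilities close to the boundaries.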
