2.9 ECS3706 Study Unit 7 - Dummy Variables SM

Uploaded by

Karl Azong
Copyright © All Rights Reserved
STUDY UNIT 7: Choosing a functional form

7.4 USING DUMMY VARIABLES


Dummy variables are commonly used as X variables in regression equations to
represent qualitative differences, for example gender (male or female) or
whether a particular condition is met (such as having at least a master’s
degree).

(a) Intercept dummies

The simplest case is the intercept dummy variable as in

Y = β0 + β1X + β2G + ε   (equation 7.4)

where G is the dummy variable which assumes values of 0 or 1 depending on gender.


If, for example, the observation applies to a man then G = 1, and if the observation
applies to a woman, then G = 0. In equation 7.4, β2 then represents the difference
in intercept explained by G (gender). In fact, equation 7.4 represents two different
equations, one for men and one for women. This may be proved as follows:

If G = 0 then Y = β0 + β1X + ε

If G = 1 then Y = β0 + β1X + β2 + ε or Y = (β0 + β2) + β1X + ε

The intercept term changes from β0 to β0 + β2 when G changes from 0 to 1.

The slope coefficient (β1 = ΔY/ΔX) remains the same irrespective of the value of G.
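
The two-equation reading of equation 7.4 can be checked numerically. The sketch
below is only an illustration: the coefficient values and the function name
expected_y are hypothetical and do not come from any estimated model.

```python
# Hypothetical coefficient values, chosen only for illustration.
BETA0, BETA1, BETA2 = 100.0, 0.5, 40.0


def expected_y(x, g):
    """Systematic part of equation 7.4 (the error term is left out)."""
    return BETA0 + BETA1 * x + BETA2 * g


for x in (0, 10, 20):
    women = expected_y(x, g=0)  # G = 0
    men = expected_y(x, g=1)    # G = 1
    # The gap between the two lines is BETA2 at every value of X,
    # so the lines are parallel and only the intercept differs.
    print(x, women, men, men - women)
```

Whatever value of X is chosen, the difference between the two groups stays
equal to β2, which is why G is called an intercept dummy.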

TASK 7.4.1
Consider a model which explains the expenditure on recreation and sport (Y) by
young unmarried professionals. The data are provided below.

Person   Monthly income   Spending   Age   Gender   Education level
         (INC)            (Y)        (A)   (G)      (E)

1        10 000           400        27    male     BA
2        16 000           200        24    female   MCom
3        8 500            500        25    male     BSc
4        12 600           0          29    male     Matric
5        24 500           600        30    female   MBA
6        6 700            200        22    male     Matric
7        7 000            200        23    female   BCom
8        14 600           300        34    male     Matric
9        9 000            250        28    female   BA

(1) Compile an equation which expresses Y as a function of income (INC), age
(A), gender (G) and level of education (E: graduates or non-graduates).

• Specify the equation (called equation 7.4.1) and state the expected value
of all its coefficients.
• Define the meaning of all dummy variable(s).
• Assume a non-linear specification with respect to age. Assume that Y
initially increases with age, and then decreases.
• Assume that men spend more than women, ceteris paribus.
• Assume that graduates spend more than non-graduates, ceteris paribus.

(2) Assume an alternative specification which differs from model 7.4.1 in the
sense that the meaning of the dummy variable for gender has been reversed
(if previously G = 1 denoted men, now G = 1 denotes women). Explain what
impact this has on the interpretation of the coefficient of G.

24 ANSWERS
(1) A simple specification is:

Y = β0 + βINC INC + βA A + βA2 A² + βG G + βE E + ε   (equation 7.4.1)

The dummy variable G denotes gender with, say, G = 1 for men and G = 0
for women. The dummy variable E denotes level of education where E = 1
for graduates and E = 0 for nongraduates.

Variable    Expected value of coefficient and reason

Income      βINC > 0 since Y can be expected to increase with an increase
            in INC. One could also use log(INC), meaning that when
            INC increases, Y increases at a decreasing rate.

Age         A quadratic form is used. Since Y must first increase with
            age and then decrease, the expected signs are βA > 0 and
            βA2 < 0.

Gender      βG > 0 since men spend more than women.

Education   βE > 0 since graduates spend more than nongraduates.
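
The sign expectation for the age terms can be motivated with a short
derivation. This is a sketch that treats age as continuous; the symbol A* for
the turning point is notation introduced here, not part of the study guide.

```latex
\frac{\partial Y}{\partial A} = \beta_A + 2\beta_{A2} A
\qquad\Longrightarrow\qquad
\frac{\partial Y}{\partial A} = 0 \;\text{ at }\; A^{*} = -\frac{\beta_A}{2\beta_{A2}}

\frac{\partial^2 Y}{\partial A^2} = 2\beta_{A2} < 0
\;\text{ (a maximum) requires }\; \beta_{A2} < 0,
\qquad
A^{*} > 0 \;\text{ then requires }\; \beta_A > 0 .
```

Under these assumptions Y rises with age up to A* and falls afterwards, which
is the pattern the task asks for.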

(2) Assume gender is specified as G = 0 for men and G = 1 for women. Because
men spend more than women, the expected value of coefficient βG will now be
negative.
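
The sign change can also be shown algebraically. In the sketch below, G* is a
hypothetical name for the reversed dummy (G* = 1 for women), and the
non-gender terms of equation 7.4.1 are abbreviated by dots.

```latex
G^{*} = 1 - G \;\Longrightarrow\; G = 1 - G^{*}

Y = \beta_0 + \dots + \beta_G G + \varepsilon
  = \beta_0 + \dots + \beta_G (1 - G^{*}) + \varepsilon
  = (\beta_0 + \beta_G) + \dots - \beta_G G^{*} + \varepsilon
```

The coefficient of the reversed dummy is therefore −βG, and the constant term
becomes β0 + βG, which is now the intercept for men. All other coefficients
are unchanged.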

(b) Using dummy variables to denote more than two levels

The previous section used a single dummy variable, taking only the values 0
and 1, to denote the presence or absence of one qualitative factor. For
example, based on

Y = β0 + βINC INC + βA A + βA2 A² + βG G + βE E + ε   (equation 7.4.1)

the value of G determines the intercept, which has two possible levels.

If G = 0 then Y = β0 + (βINC INC + βA A + βA2 A² + βE E + ε) while

if G = 1 then Y = (β0 + βG) + (βINC INC + βA A + βA2 A² + βE E + ε).

Dummy variables can also be used to represent more than two levels of qualitative
factors.

TASK 7.4.2
Based on equation 7.4.1 explain how you would incorporate not only two levels
of education (nongraduate and graduate) but four levels (nongraduate, B degree,
M degree and D degree). To keep things simple, only include the Y variable and
the variable dealing with the level of education.

• Explain why model A is insufficient

Y = β0 + βINC INC + βE E + ε

where E = 0 for nongraduates, E = 1 for a B degree, E = 2 for an M degree and
E = 3 for a D degree.

• Explain why model B

Y = β0 + βINC INC + βEN EN + βEB EB + βEM EM + βED ED + ε

is also insufficient, where EN = 1 for nongraduates, EB = 1 for B degrees,
EM = 1 for M degrees and ED = 1 for D degrees, and each of these dummy
variables = 0 otherwise.

• Explain why model C (using all the variables of model B except EN)

Y = β0 + βINC INC + βEB EB + βEM EM + βED ED + ε

is correct, as well as the meaning of its coefficients.

25 ANSWER
It may be tempting to use model A

Y = β0 + βINC INC + βE E + ε

where E = 0 (nongraduates), E = 1 (B degrees), E = 2 (M degrees) and E = 3
(D degrees). This specification, however, assumes a constant difference in Y
between each successive level of education, which might be unrealistic. For
example, the difference in Y between a nongraduate and a B graduate may be
much larger than the difference between a B graduate and an M graduate.
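
The equal-step restriction of model A can be made concrete with a small
sketch. The value of BETA_E and the helper education_effect are hypothetical
and serve only to illustrate the point.

```python
BETA_E = 50.0  # hypothetical coefficient of E in model A

# Numeric coding of education used by model A.
CODES = {"nongraduate": 0, "B degree": 1, "M degree": 2, "D degree": 3}


def education_effect(level):
    """Contribution of the term beta_E * E to Y under model A."""
    return BETA_E * CODES[level]


order = ["nongraduate", "B degree", "M degree", "D degree"]
# Difference in Y between each pair of adjacent education levels.
steps = [education_effect(b) - education_effect(a)
         for a, b in zip(order, order[1:])]
print(steps)  # every step between adjacent levels equals BETA_E
```

Because E enters as a single number, the model cannot give the first degree a
larger effect than the later ones, whatever the data say.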

Model B

Y = β0 + βINC INC + βEN EN + βEB EB + βEM EM + βED ED + ε

is insufficient because of perfect multicollinearity, a matter that is
discussed fully in chapter 8. A table demonstrates the problem. Each person’s
level of education could be coded as follows:

Condition      EN   EB   EM   ED

Nongraduate    1    0    0    0
B degree       0    1    0    0
M degree       0    0    1    0
D degree       0    0    0    1

For every observation within the data set, EN + EB + EM + ED = 1. This implies
that if any three variables are known, then the remaining variable is also
known. If, for example, it is known that EB = EM = ED = 0, then the observation
must necessarily deal with a nongraduate. Because the four dummies always sum
to 1, which is exactly the value of the regressor attached to the intercept
β0, the regressors are perfectly collinear and OLS cannot estimate their
coefficients separately. Thus EN is redundant and the solution is to omit
variable EN. In fact, any one of (EN, EB, EM, ED) may be omitted, although the
interpretation of the remaining coefficients will then change.
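
The redundancy can be verified mechanically. The following sketch codes a
small, made-up list of education levels (the list sample and the helper
one_hot are hypothetical) and checks that the four dummies always add up to
the column of ones that belongs to the intercept.

```python
LEVELS = ("nongraduate", "B degree", "M degree", "D degree")


def one_hot(level):
    """Return (EN, EB, EM, ED) for one person's education level."""
    return tuple(1 if level == name else 0 for name in LEVELS)


# Hypothetical sample of education levels, for illustration only.
sample = ["B degree", "nongraduate", "D degree", "M degree", "B degree"]
rows = [one_hot(level) for level in sample]

constant = [1] * len(rows)             # regressor attached to the intercept
dummy_sum = [sum(row) for row in rows]
print(dummy_sum == constant)           # the dummies reproduce the constant

# Dropping EN (model C) removes the exact linear dependence:
# a nongraduate is now identified by EB = EM = ED = 0.
reduced = [row[1:] for row in rows]
print(reduced)
```

Any one of the four columns could be dropped instead of EN; the omitted level
then becomes the reference category against which the others are measured.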


Model C is the proper model to use.

Y = β0 + βINC INC + βEB EB + βEM EM + βED ED + ε

The interpretation of the coefficients is as follows:

(a) β0 is the intercept for nongraduates, the omitted reference category.

(b) βEB measures the difference in intercept between nongraduates and B
graduates.

(c) βEM measures the difference in intercept between nongraduates and M
graduates.

(d) βED measures the difference in intercept between nongraduates and D
graduates.
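
A short sketch may help to fix this interpretation. All coefficient values
below are hypothetical, and the income term is left out so that only the
intercepts are compared.

```python
# Hypothetical coefficients of model C, chosen only for illustration.
BETA0, BETA_EB, BETA_EM, BETA_ED = 200.0, 120.0, 150.0, 160.0


def intercept(level):
    """Intercept implied by model C for a given education level."""
    eb = 1 if level == "B degree" else 0
    em = 1 if level == "M degree" else 0
    ed = 1 if level == "D degree" else 0
    return BETA0 + BETA_EB * eb + BETA_EM * em + BETA_ED * ed


base = intercept("nongraduate")  # equals BETA0: all three dummies are 0
for level in ("B degree", "M degree", "D degree"):
    # Each dummy coefficient is a gap relative to nongraduates,
    # and the gaps are free to differ from one level to the next.
    print(level, intercept(level), intercept(level) - base)
```

Unlike model A, the steps between successive levels need not be equal: with
these illustrative values the first degree adds far more than the later ones.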

7.5 SLOPE DUMMY VARIABLES


Dummy variables can also be used to represent differences in slope. This is done by
adding an X variable called an interaction variable, as in

Yi = β0 + βINC INCi + βG Gi + βX (Gi.INCi) + εi   (equation 7.5)

where G.INC is the “interaction variable”, which is the product of two X variables.

TASK 7.5.1
Demonstrate that equation 7.5 copes with both the effects of an intercept and
slope change when G changes from 0 to 1.

26 ANSWER
If Gi = 0, then equation 7.5 changes to

Yi = β0 + βINC INCi + εi.

If Gi = 1, then equation 7.5 changes to

Yi = β0 + βINC INCi + βG + βX INCi + εi

which may also be written as

Yi = (β0 + βG) + (βINC + βX) INCi + εi.

The effect is that when Gi changes from 0 to 1 then the

• intercept changes from β0 to β0 + βG and the
• slope changes from βINC to βINC + βX.

Thus equation 7.5 accommodates both the effect of an intercept dummy and of
a slope dummy.
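
The same result can be reproduced numerically. In this sketch the coefficient
values and the helpers expected_y, intercept and slope are hypothetical; the
slope is measured as the change in expected Y per unit change in INC.

```python
# Hypothetical coefficients of equation 7.5, chosen only for illustration.
BETA0, BETA_INC, BETA_G, BETA_X = 50.0, 0.5, 30.0, 0.25


def expected_y(inc, g):
    """Systematic part of equation 7.5 (the error term is left out)."""
    return BETA0 + BETA_INC * inc + BETA_G * g + BETA_X * (g * inc)


def intercept(g):
    """Expected Y at INC = 0 for the group with dummy value g."""
    return expected_y(0, g)


def slope(g):
    """Change in expected Y per unit of INC for the group with dummy value g."""
    return (expected_y(1000, g) - expected_y(0, g)) / 1000


for g in (0, 1):
    print(g, intercept(g), slope(g))
```

For G = 0 the intercept and slope are β0 and βINC; for G = 1 they are
β0 + βG and βINC + βX, so the interaction term is what allows the two lines to
have different slopes.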

TASK 7.5.2
(a) Develop model 7.5.2, based on model 7.4.1, but where the slope ΔY/ΔINC
varies with the level of education (E). Compare the slope and the intercept
of model 7.5.2 when E changes from 0 to 1.

Y = β0 + βINC INC + βA A + βA2 A² + βG G + βE E + ε   (equation 7.4.1)

where E = 0 for nongraduates and E = 1 for graduates.

(b) Create the data set needed to estimate equation 7.5.2.

(c) Use your spreadsheet to estimate the coefficients of equation 7.5.2 by OLS.

27 ANSWERS
(a) The required equation in which ΔY/ΔINC varies with the level of education
(E) is:

Y = β0 + βINC INC + βA A + βA2 A² + βG G + βE E + βX (INC.E) + ε
(equation 7.5.2).

As previously, INC.E is the interaction variable.

When E = 0 then

Y = β0 + βINC INC + βA A + βA2 A² + βG G + ε.

When E = 1 then

Y = β0 + βINC INC + βA A + βA2 A² + βG G + βE + βX INC + ε.

Comparison of slope and intercept:

                   E = 0    E = 1

Intercept          β0       β0 + βE

Slope ΔY/ΔINC      βINC     βINC + βX
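
The slope row of this comparison follows from differentiating equation 7.5.2
with respect to INC. The line below is a sketch that treats INC as continuous
and holds A, G and E fixed.

```latex
\frac{\partial Y}{\partial INC} = \beta_{INC} + \beta_X E
=
\begin{cases}
\beta_{INC} & \text{if } E = 0 \\
\beta_{INC} + \beta_X & \text{if } E = 1
\end{cases}
```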

(b) The full data set needed for the regression is provided below.

Y      INC      A    A2      G   E   INC.E

400    10 000   27   729     1   1   10 000
200    16 000   24   576     0   1   16 000
500    8 500    25   625     1   1   8 500
0      12 600   29   841     1   0   0
600    24 500   30   900     0   1   24 500
200    6 700    22   484     1   0   0
200    7 000    23   529     0   1   7 000
300    14 600   34   1 156   1   0   0
250    9 000    28   784     0   1   9 000
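
The derived columns can also be generated from the raw data of task 7.4.1
instead of being typed in by hand. The sketch below assumes, as the table
does, that G = 1 denotes men and that Matric is the only nongraduate
qualification in the sample; the names RAW, GRADUATE and build_row are
hypothetical.

```python
# Raw data from task 7.4.1: (INC, Y, A, gender, education level).
RAW = [
    (10000, 400, 27, "male", "BA"),
    (16000, 200, 24, "female", "MCom"),
    (8500, 500, 25, "male", "BSc"),
    (12600, 0, 29, "male", "Matric"),
    (24500, 600, 30, "female", "MBA"),
    (6700, 200, 22, "male", "Matric"),
    (7000, 200, 23, "female", "BCom"),
    (14600, 300, 34, "male", "Matric"),
    (9000, 250, 28, "female", "BA"),
]

# Assumption: every qualification other than Matric counts as a degree.
GRADUATE = {"BA", "BSc", "BCom", "MCom", "MBA"}


def build_row(inc, y, age, gender, education):
    """Return (Y, INC, A, A2, G, E, INC.E) for one person."""
    g = 1 if gender == "male" else 0        # G = 1 for men, 0 for women
    e = 1 if education in GRADUATE else 0   # E = 1 for graduates
    return (y, inc, age, age ** 2, g, e, inc * e)


rows = [build_row(*person) for person in RAW]
for row in rows:
    print(row)
```

The interaction column INC.E equals INC for graduates and 0 for nongraduates,
which is what lets the income slope differ between the two groups.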

(c) The regression coefficients are provided below in table format.

Variable    Coefficient   Standard error   t Stat

Intercept   3 924.07      2 229.64         1.760
INC         -0.05         0.03             -1.537
A           -298.75       168.54           -1.773
A2          6.11          3.08             1.988
G           246.06        98.41            2.500
E           -204.87       327.05           -0.626
INC.E       0.06          0.03             2.117

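Note that these results do not appear very promising. Although the overall fit
is quite good (R² = 0.92), some of the coefficients have unexpected signs (for
example, the coefficients of INC and E are negative) and some of them are
statistically insignificant. But then the sample is very small: nine
observations for seven estimated coefficients leave only two degrees of
freedom.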
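
For readers who prefer code to a spreadsheet, the estimation can be sketched
in plain Python by solving the normal equations (X'X)b = X'y. This is not the
study guide's method; it is an illustration under stated assumptions. The
function names solve and ols are hypothetical, and the rescaling of each
column by its largest absolute value is a choice made here purely for
numerical stability. The printed coefficients should be compared with the
table in part (c) rather than taken on trust.

```python
# Columns: Y, INC, A, A2, G, E, INC.E (the nine rows of the data set in part (b)).
DATA = [
    (400, 10000, 27, 729, 1, 1, 10000),
    (200, 16000, 24, 576, 0, 1, 16000),
    (500, 8500, 25, 625, 1, 1, 8500),
    (0, 12600, 29, 841, 1, 0, 0),
    (600, 24500, 30, 900, 0, 1, 24500),
    (200, 6700, 22, 484, 1, 0, 0),
    (200, 7000, 23, 529, 0, 1, 7000),
    (300, 14600, 34, 1156, 1, 0, 0),
    (250, 9000, 28, 784, 0, 1, 9000),
]
NAMES = ["Intercept", "INC", "A", "A2", "G", "E", "INC.E"]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols(data):
    """Return (coefficients, residuals, R squared) for Y on a constant and X."""
    y = [float(row[0]) for row in data]
    x = [[1.0] + [float(v) for v in row[1:]] for row in data]
    k = len(x[0])
    # Rescale every column so that X'X is better conditioned.
    scale = [max(abs(r[j]) for r in x) for j in range(k)]
    xs = [[r[j] / scale[j] for j in range(k)] for r in x]
    xtx = [[sum(r[i] * r[j] for r in xs) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(xs, y)) for i in range(k)]
    # Undo the rescaling to get coefficients in the original units.
    beta = [b / s for b, s in zip(solve(xtx, xty), scale)]
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    mean_y = sum(y) / len(y)
    r2 = 1 - sum(e * e for e in resid) / sum((yi - mean_y) ** 2 for yi in y)
    return beta, resid, r2


beta, resid, r2 = ols(DATA)
for name, b in zip(NAMES, beta):
    print(name, round(b, 2))
print("R squared", round(r2, 2))
print("degrees of freedom", len(DATA) - len(NAMES))  # 9 - 7 = 2
```

The sketch reports coefficients and R² only; standard errors and t statistics
would need the inverse of X'X and are left to the spreadsheet output above.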
