2.9 ECS3706 Study Unit 7 - Dummy Variables SM
If G = 0 then Y = β0 + β1X + ε
ECS3706/1 97
The intercept term changes from β0 to β0 + β2 when G changes from 0 to 1.
The slope coefficient (β1 = ΔY/ΔX) remains the same irrespective of the value of G.
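Written out, and assuming the underlying model is Y = β0 + β1X + β2G + ε (the form implied by the intercept shift described above; the model equation itself is not reproduced here), the two cases are:

```latex
\begin{align*}
G = 0:&\quad Y = \beta_0 + \beta_1 X + \varepsilon \\
G = 1:&\quad Y = (\beta_0 + \beta_2) + \beta_1 X + \varepsilon
\end{align*}
```

The two lines are parallel: they share the slope β1 and differ only by the vertical shift β2.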
TASK 7.4.1
Consider a model which explains the expenditure on recreation and sport (Y) by
young unmarried professionals. The data are provided below.
• Specify the equation (called equation 7.4.1) and state the expected sign
of each of its coefficients.
• Define the meaning of all dummy variable(s).
• Assume a non-linear specification with respect to age. Assume that Y
initially increases with age, and then decreases.
• Assume that men spend more than women, ceteris paribus.
• Assume that graduates spend more than non-graduates, ceteris paribus.
(2) Assume an alternative specification which differs from model 7.4.1 in the
sense that the meaning of the dummy variable for gender has been reversed
(if previously G = 1 denoted men, now G = 1 denotes women). Explain what
impact this has on the interpretation of the coefficient of G.
STUDY UNIT 7: Choosing a functional form
24 ANSWERS
(1) A simple specification is:
The dummy variable G denotes gender with, say, G = 1 for men and G = 0
for women. The dummy variable E denotes level of education where E = 1
for graduates and E = 0 for nongraduates.
(2) Assume gender is specified as G = 0 for men and G = 1 for women. Because
men spend more than women, the expected sign of coefficient βG will now be
negative. Its magnitude is unchanged; only the sign and the reference group
(now men) differ.
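The equation for answer (1) is not reproduced above. A form consistent with the task and with the data columns used later in this unit (Y, INC, A, A2, G, E) might look as follows. The coefficient names β_A and β_A2, the inclusion of income (INC) and its positive expected sign are assumptions. The last line sketches why reversing the gender dummy only flips the sign of its coefficient and shifts the intercept, using G* = 1 − G as a hypothetical name for the reversed dummy.

```latex
\begin{align*}
Y &= \beta_0 + \beta_{INC}\,INC + \beta_A A + \beta_{A2} A^2
     + \beta_G G + \beta_E E + \varepsilon \\
&\beta_{INC} > 0,\quad \beta_A > 0,\quad \beta_{A2} < 0,\quad
  \beta_G > 0,\quad \beta_E > 0 \\
\beta_0 + \beta_G G &= \beta_0 + \beta_G (1 - G^{*})
  = (\beta_0 + \beta_G) - \beta_G G^{*}
\end{align*}
```

With β_A > 0 and β_A2 < 0, expenditure rises with age up to the turning point A = −β_A / (2β_A2) and falls afterwards, which is the pattern the task asks for.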
The previous section used a single dummy variable, taking only the values 0 and 1,
to denote the presence or absence of one qualitative factor. For example, the
value of G determines the intercept, which therefore has two possible levels.
Dummy variables can also be used to represent more than two levels of qualitative
factors.
TASK 7.4.2
Based on equation 7.4.1 explain how you would incorporate not only two levels
of education (nongraduate and graduate) but four levels (nongraduate, B degree,
M degree and D degree). To keep things simple, only include the Y variable and
the variable dealing with the level of education.
• Explain why model C (using all the variables of model B except EN)
25 ANSWER
It may be tempting to use model A:
Y = β0 + βINC·INC + βE·E + ε
where E = 0 for nongraduates, E = 1 for a B degree, E = 2 for an M degree and
E = 3 for a D degree.
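Model A treats the education code E as an ordinary number, so every step up in education is forced to shift expected expenditure by the same amount βE. Holding INC fixed, the intercepts it implies are:

```latex
\begin{align*}
E = 0:&\quad \beta_0 \\
E = 1:&\quad \beta_0 + \beta_E \\
E = 2:&\quad \beta_0 + 2\beta_E \\
E = 3:&\quad \beta_0 + 3\beta_E
\end{align*}
```

Nothing in the task suggests that a D degree adds exactly three times what a B degree adds. Separate dummy variables for each level avoid that restriction.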
Model B
Y = β0 + βINC·INC + βEN·EN + βEB·EB + βEM·EM + βED·ED + ε
Condition     EN   EB   EM   ED
Nongraduate   1    0    0    0
B degree      0    1    0    0
M degree      0    0    1    0
D degree      0    0    0    1
Model C
Y = β0 + βINC·INC + βEB·EB + βEM·EM + βED·ED + ε
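A standard reason for leaving out one dummy, as the equation without EN does, is the dummy variable trap: EN + EB + EM + ED equals 1 for every observation, which duplicates the intercept column and makes the regressors perfectly collinear. The sketch below checks this on a small made-up sample. The education levels and INC values are hypothetical, not the study guide's data, and `matrix_rank` is an illustrative helper written for this example.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank of a matrix via exact Gaussian elimination on fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below `rank` with a non-zero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # this column depends on earlier columns
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

# Hypothetical sample: 0 = nongraduate, 1 = B degree, 2 = M degree, 3 = D degree.
level = [0, 1, 2, 3, 0, 1, 2, 3]
inc = [5, 7, 9, 12, 6, 8, 11, 15]

# One 0/1 column per level, in the order EN, EB, EM, ED.
dummies = [[1 if lv == k else 0 for k in range(4)] for lv in level]

# With all four dummies: constant, INC, EN, EB, EM, ED (6 columns).
X_b = [[1, i] + d for i, d in zip(inc, dummies)]
# With EN dropped: constant, INC, EB, EM, ED (5 columns).
X_c = [[1, i] + d[1:] for i, d in zip(inc, dummies)]

rank_b = matrix_rank(X_b)
rank_c = matrix_rank(X_c)
print("All four dummies:", len(X_b[0]), "columns, rank", rank_b)
print("EN dropped:      ", len(X_c[0]), "columns, rank", rank_c)
```

When the rank is lower than the number of columns, OLS has no unique solution. With EN dropped, nongraduates become the reference group, and βEB, βEM and βED measure the difference in expenditure relative to nongraduates.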
Yi = β0 + βINC·INCi + βG·Gi + βX·(Gi.INCi) + εi        (equation 7.5)
where G.INC is the “interaction variable”, which is the product of the two X
variables G and INC.
TASK 7.5.1
Demonstrate that equation 7.5 copes with both the effects of an intercept and
slope change when G changes from 0 to 1.
26 ANSWER
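If Gi = 0, then equation 7.5 changes to
Yi = β0 + βINC·INCi + εi.
If Gi = 1, then equation 7.5 changes to
Yi = β0 + βINC·INCi + βG + βX·INCi + εi
   = (β0 + βG) + (βINC + βX)·INCi + εi.
The intercept changes from β0 to β0 + βG and the slope changes from βINC to
βINC + βX. Thus equation 7.5 accommodates both the effect of an intercept dummy
and of a slope dummy.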
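The same result can be seen numerically. The sketch below generates noise-free data from assumed coefficients (all values are hypothetical, chosen only for illustration), estimates the four coefficients by OLS, and reads off the intercept and slope for each group. The helper `ols` is written for this example and solves the normal equations exactly; it is not a substitute for a statistics package.

```python
from fractions import Fraction

def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y], built with exact fractions.
    a = [
        [sum(Fraction(row[i]) * Fraction(row[j]) for row in X) for j in range(k)]
        + [sum(Fraction(row[i]) * Fraction(yv) for row, yv in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Raises StopIteration if X'X is singular (perfect collinearity).
        pivot = next(r for r in range(col, k) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        a[col] = [v / a[col][col] for v in a[col]]
        for r in range(k):
            if r != col:
                factor = a[r][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] for i in range(k)]

# Assumed (hypothetical) coefficients: intercept, INC, G, G.INC.
true_b0, true_inc, true_g, true_x = 100, 2, 50, 1
inc = [10, 20, 30, 40, 10, 20, 30, 40]
g = [0, 0, 0, 0, 1, 1, 1, 1]
y = [true_b0 + true_inc * i + true_g * d + true_x * d * i for i, d in zip(inc, g)]

# Regressor columns: constant, INC, G, and the interaction G.INC.
X = [[1, i, d, d * i] for i, d in zip(inc, g)]
b0, b_inc, b_g, b_x = ols(X, y)

intercepts = {0: b0, 1: b0 + b_g}
slopes = {0: b_inc, 1: b_inc + b_x}
for group in (0, 1):
    print(f"G = {group}: intercept {intercepts[group]}, slope {slopes[group]}")
```

Because the data contain no error term, the estimates reproduce the assumed coefficients. With real data the estimates would differ from the true values by sampling error.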
TASK 7.5.2
(a) Develop model 7.5.2, based on model 7.4.1, but where the slope ΔY/ΔINC
varies with the level of education (E). Compare the slope and the intercept
of model 7.5.2 when E changes from 0 to 1.
(c) Use your spreadsheet to estimate the coefficients of equation 7.5.2 by OLS.
27 ANSWERS
(a) The required equation in which ΔY/ΔINC varies with the level of education
(E) is:
When E = 0 then
When E = 1 then
            E = 0    E = 1
Intercept   β0       β0 + βE
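The equation and its two special cases are not reproduced above. A form consistent with the regression columns listed for this task (Y, INC, A, A2, G, E, INC.E) and with the intercept row of the table would be as follows. Here β_X is an assumed name for the coefficient of the interaction term INC.E, and β_A and β_A2 are assumed names for the age coefficients.

```latex
\begin{align*}
Y &= \beta_0 + \beta_{INC}\,INC + \beta_A A + \beta_{A2} A^2 + \beta_G G
     + \beta_E E + \beta_X (INC \cdot E) + \varepsilon \\
E = 0:\quad Y &= \beta_0 + \beta_{INC}\,INC + \beta_A A + \beta_{A2} A^2
     + \beta_G G + \varepsilon \\
E = 1:\quad Y &= (\beta_0 + \beta_E) + (\beta_{INC} + \beta_X)\,INC
     + \beta_A A + \beta_{A2} A^2 + \beta_G G + \varepsilon
\end{align*}
```

Under this form the slope ΔY/ΔINC is β_INC when E = 0 and β_INC + β_X when E = 1, alongside the intercept change from β0 to β0 + βE.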
(b) The full data set needed for the regression is provided below.
Y     INC      A    A2    G   E   INC.E
0     12 600   29   841   1   0   0
200   6 700    22   484   1   0   0
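The columns A2 and INC.E are not raw data: they are computed from A, INC and E before the regression is run. A minimal sketch of that step, checked against the two rows shown in the table (the function name `add_derived_columns` and the dictionary layout are illustrative choices, not part of the study guide):

```python
def add_derived_columns(row):
    """Return a copy of `row` with A2 = A squared and INC.E = INC times E."""
    out = dict(row)
    out["A2"] = row["A"] ** 2
    out["INC.E"] = row["INC"] * row["E"]
    return out

# Raw values as they appear in the two rows of the table.
raw_rows = [
    {"Y": 0, "INC": 12600, "A": 29, "G": 1, "E": 0},
    {"Y": 200, "INC": 6700, "A": 22, "G": 1, "E": 0},
]

full_rows = [add_derived_columns(r) for r in raw_rows]
for r in full_rows:
    print(r["A"], r["A2"], r["E"], r["INC.E"])
```

In a spreadsheet the same step is two formula columns: one squaring A and one multiplying INC by E. For any nongraduate (E = 0) the interaction column is 0, so INC.E only carries information for graduates.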
Note that these results do not appear very promising. Although the overall fit is
quite good (R² = 0.92), some of the coefficients have unexpected values and some
of them are insignificant. But then the sample size is very small!