Lecture8 Dummy

DUMMY VARIABLES Barnabe Walheer

WHAT IS THIS ABOUT ?


DUMMY VARIABLES
In many contexts, we observe qualitative features or attributes. Some can be added in the regression as independent variables using a dummy variable (1 = presence of the qualitative feature, 0 = absence).

Examples:
- Smoker or non-smoker
- Black or white
- Female or male
- Junior or senior
- Belgian or non-Belgian
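
As a minimal sketch, a qualitative attribute can be turned into a 0/1 variable as follows. The sample values and the helper name `to_dummy` are illustrative assumptions, not part of the lecture.

```python
def to_dummy(values, feature):
    """Return 1 where the qualitative feature is present, 0 where it is absent."""
    return [1 if v == feature else 0 for v in values]

# Hypothetical sample of a smoker / non-smoker attribute
status = ["smoker", "non-smoker", "non-smoker", "smoker"]
smoker = to_dummy(status, "smoker")
print(smoker)
```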

What if we have more than two possibilities ?


DUMMY VARIABLES
COST: annual recurrent expenditure.
N: number of students enrolled.
Two types of secondary schools: regular and occupational.

[Figure: COST against N for occupational schools (intercept $\beta_1'$) and regular schools (intercept $\beta_1$).]

Run 2 separate regressions ?
DUMMY VARIABLES
We are hypothesizing that the annual overhead cost is different for the two types of school, but the marginal cost is the same.

The marginal cost assumption is not very plausible, and we will relax it in due course.

[Figure: COST against N, two parallel lines. Regular schools have intercept $\beta_1$; occupational schools have intercept $\beta_1 + \delta$.]

Regular school: $COST = \beta_1 + \beta_2 N + u$
Occupational school: $COST = \beta_1' + \beta_2 N + u$
Define $\delta = \beta_1' - \beta_1$.

Does $\delta$ capture a marginal effect ?
DUMMY VARIABLES
Dummy variables always have two values, 0 or 1.
If OCC is equal to 0, the cost function becomes that for regular schools.
If OCC is equal to 1, the cost function becomes that for occupational schools.

[Figure: COST against N, two parallel lines. Regular schools have intercept $\beta_1$; occupational schools have intercept $\beta_1 + \delta$.]

OCC = 0, regular school: $COST = \beta_1 + \beta_2 N + u$
OCC = 1, occupational school: $COST = \beta_1 + \delta + \beta_2 N + u$
Combined equation: $COST = \beta_1 + \delta\,OCC + \beta_2 N + u$
DUMMY VARIABLES
. reg COST N OCC

  Source |       SS       df       MS              Number of obs =      74
---------+------------------------------           F(  2,    71) =   56.86
   Model |  9.0582e+11     2  4.5291e+11           Prob > F      =  0.0000
Residual |  5.6553e+11    71  7.9652e+09           R-squared     =  0.6156
---------+------------------------------           Adj R-squared =  0.6048
   Total |  1.4713e+12    73  2.0155e+10           Root MSE      =   89248

------------------------------------------------------------------------------
    COST |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
       N |   331.4493   39.75844      8.337   0.000       252.1732    410.7254
     OCC |   133259.1   20827.59      6.398   0.000       91730.06    174788.1
   _cons |  -33612.55   23573.47     -1.426   0.158      -80616.71    13391.61
------------------------------------------------------------------------------

Regular school (OCC = 0): the annual overhead cost is -34,000. What ?
Occupational school (OCC = 1): the annual overhead cost is 133,000 higher than for regular schools.
The regression results include standard errors and the usual diagnostic statistics.
Regression quality ?
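
As a rough cross-check of the diagnostic statistics, the t statistics and R-squared can be recomputed from the numbers printed in the output for `reg COST N OCC`. This is only a sketch in Python (the lecture itself uses Stata), and the variable names are illustrative. Small differences in the last digit are expected because the printed sums of squares are rounded.

```python
# Values copied from the regression output for `reg COST N OCC`
coef = {"N": 331.4493, "OCC": 133259.1, "_cons": -33612.55}
se = {"N": 39.75844, "OCC": 20827.59, "_cons": 23573.47}

# t statistic = coefficient / standard error
t_stats = {name: coef[name] / se[name] for name in coef}

# R-squared = explained (model) sum of squares / total sum of squares
ess, tss = 9.0582e11, 1.4713e12
r_squared = ess / tss

for name, t in t_stats.items():
    print(f"{name}: t = {t:.3f}")
print(f"R-squared = {r_squared:.3f}")
```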
DUMMY VARIABLES

$\widehat{COST} = -34,000 + 133,000\,OCC + 331\,N$

Regular school (OCC = 0): $\widehat{COST} = -34,000 + 331\,N$

Occupational school (OCC = 1): $\widehat{COST} = -34,000 + 133,000 + 331\,N = 99,000 + 331\,N$
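
The two fitted cost functions can be reproduced from the single combined equation. The sketch below uses the unrounded coefficients from the `reg COST N OCC` output; the function name `fitted_cost` and its parameter names are illustrative assumptions.

```python
def fitted_cost(n, occ, b1=-33612.55, delta=133259.1, b2=331.4493):
    """Fitted COST from the combined equation; occ is 0 (regular) or 1 (occupational)."""
    return b1 + delta * occ + b2 * n

# Intercepts (overhead cost at N = 0) implied for each type of school
regular_intercept = fitted_cost(0, occ=0)
occupational_intercept = fitted_cost(0, occ=1)
print(round(regular_intercept), round(occupational_intercept))

# The vertical gap between the two lines is the OCC coefficient at every N
gap_at_500 = fitted_cost(500, occ=1) - fitted_cost(500, occ=0)
print(round(gap_at_500, 1))
```

With the unrounded coefficients the occupational intercept is about 99,600, which the slide rounds to 99,000 by adding the already rounded figures.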
DUMMY VARIABLES
[Figure: fitted cost functions, COST (-100,000 to 600,000) against N (0 to 1,200), for occupational schools and regular schools.]

What changes between the red and grey lines ?
HOW TO DEAL WITH
MULTIPLE CATEGORIES?
DUMMY VARIABLES
Dummy variables have been used to differentiate between regular and occupational
schools when fitting a cost function.

There are two types of regular secondary school: there are general schools (GEN), which
provide the usual academic education, and vocational schools (VOC).
Likewise, there are two types of occupational school: there are technical schools (TECH)
training technicians and skilled workers’ schools (WORKER) training craftsmen.

The standard procedure is to choose one category as the reference category and to define
dummy variables for each of the others.
DUMMY VARIABLES
TECH will be the dummy for the technical schools: TECH is equal to 1 if the observation
relates to a technical school, 0 otherwise.
WORKER and VOC are defined in the same way for the skilled workers’ schools and the vocational schools.

$COST = \beta_1 + \delta_T TECH + \delta_W WORKER + \delta_V VOC + \beta_2 N + u$

Note that you do not include a dummy variable for the reference category, which is why the reference category is usually described as the omitted category. Why ?
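
One way to see why the reference category is omitted: if a dummy were included for every category, the dummies would add up to the constant term for each observation, so the regressors would be perfectly collinear. The sketch below illustrates this with a small hypothetical sample; the data and helper names are assumptions made for illustration.

```python
# Hypothetical sample: one school type per observation
school_types = ["GEN", "TECH", "WORKER", "VOC", "TECH", "GEN"]

def dummy(category):
    """0/1 indicator for one category of school."""
    return [1 if s == category else 0 for s in school_types]

constant = [1] * len(school_types)
gen, tech, worker, voc = (dummy(c) for c in ("GEN", "TECH", "WORKER", "VOC"))

# With a dummy for every category, the dummies add up to the constant column,
# so the regressors are perfectly collinear.
all_four_sum = [g + t + w + v for g, t, w, v in zip(gen, tech, worker, voc)]
print(all_four_sum == constant)

# Without the GEN dummy, the remaining three no longer add up to the constant.
three_sum = [t + w + v for t, w, v in zip(tech, worker, voc)]
print(three_sum == constant)
```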
DUMMY VARIABLES
$COST = \beta_1 + \delta_T TECH + \delta_W WORKER + \delta_V VOC + \beta_2 N + u$

General school (TECH = WORKER = VOC = 0): $COST = \beta_1 + \beta_2 N + u$
Technical school (TECH = 1; WORKER = VOC = 0): $COST = (\beta_1 + \delta_T) + \beta_2 N + u$
Skilled workers' school (WORKER = 1; TECH = VOC = 0): $COST = (\beta_1 + \delta_W) + \beta_2 N + u$
Vocational school (VOC = 1; TECH = WORKER = 0): $COST = (\beta_1 + \delta_V) + \beta_2 N + u$

Interpretation of the delta coefficients ?
DUMMY VARIABLES
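[Figure: COST against N, four parallel lines. From top to bottom: technical schools (intercept $\beta_1 + \delta_T$), workers' schools (intercept $\beta_1 + \delta_W$), vocational schools (intercept $\beta_1 + \delta_V$), and general schools (intercept $\beta_1$). The vertical gaps above the general line are $\delta_T$, $\delta_W$, and $\delta_V$.]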
DUMMY VARIABLES
[Figure: COST (-100,000 to 600,000) against N (0 to 1,200), by type of school: technical, workers', vocational, and general schools.]


DUMMY VARIABLES
. reg COST N TECH WORKER VOC

  Source |       SS       df       MS              Number of obs =      74
---------+------------------------------           F(  4,    69) =   29.63
   Model |  9.2996e+11     4  2.3249e+11           Prob > F      =  0.0000
Residual |  5.4138e+11    69  7.8461e+09           R-squared     =  0.6320
---------+------------------------------           Adj R-squared =  0.6107
   Total |  1.4713e+12    73  2.0155e+10           Root MSE      =   88578

------------------------------------------------------------------------------
    COST |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
       N |   342.6335    40.2195      8.519   0.000       262.3978    422.8692
    TECH |   154110.9   26760.41      5.759   0.000       100725.3    207496.4
  WORKER |   143362.4    27852.8      5.147   0.000       87797.57    198927.2
     VOC |   53228.64   31061.65      1.714   0.091      -8737.646    115194.9
   _cons |  -54893.09   26673.08     -2.058   0.043      -108104.4   -1681.748
------------------------------------------------------------------------------

The coefficients of TECH, WORKER, and VOC are 154,000, 143,000, and 53,000, respectively, and should be interpreted as the additional annual overhead costs, relative to those of general schools.

The constant term is -55,000, indicating that the annual overhead cost of a general academic school is -55,000 per year.

Obviously, this is nonsense and indicates that something is wrong with the model.

Regression quality ?
DUMMY VARIABLES
$\widehat{COST} = -55,000 + 154,000\,TECH + 143,000\,WORKER + 53,000\,VOC + 343\,N$

General school (TECH = WORKER = VOC = 0): $\widehat{COST} = -55,000 + 343\,N$

Technical school (TECH = 1; WORKER = VOC = 0): $\widehat{COST} = -55,000 + 154,000 + 343\,N = 99,000 + 343\,N$

Skilled workers' school (WORKER = 1; TECH = VOC = 0): $\widehat{COST} = -55,000 + 143,000 + 343\,N = 88,000 + 343\,N$

Vocational school (VOC = 1; TECH = WORKER = 0): $\widehat{COST} = -55,000 + 53,000 + 343\,N = -2,000 + 343\,N$
DUMMY VARIABLES
[Figure: COST (-100,000 to 600,000) against N (0 to 1,200), by type of school: technical, workers', vocational, and general schools.]


DUMMY VARIABLES
F test of the joint explanatory power of the dummy variables as a group.

The null hypothesis is $H_0: \delta_T = \delta_W = \delta_V = 0$.
The alternative hypothesis is that at least one $\delta$ is different from 0.

. reg COST N

  Source |       SS       df       MS              Number of obs =      74
---------+------------------------------           F(  1,    72) =   46.82
   Model |  5.7974e+11     1  5.7974e+11           Prob > F      =  0.0000
Residual |  8.9160e+11    72  1.2383e+10           R-squared     =  0.3940
---------+------------------------------           Adj R-squared =  0.3856
   Total |  1.4713e+12    73  2.0155e+10           Root MSE      = 1.1e+05

. reg COST N TECH WORKER VOC

  Source |       SS       df       MS              Number of obs =      74
---------+------------------------------           F(  4,    69) =   29.63
   Model |  9.2996e+11     4  2.3249e+11           Prob > F      =  0.0000
Residual |  5.4138e+11    69  7.8461e+09           R-squared     =  0.6320
---------+------------------------------           Adj R-squared =  0.6107
   Total |  1.4713e+12    73  2.0155e+10           Root MSE      =   88578

$F(3, 69) = \frac{(8.92 \times 10^{11} - 5.41 \times 10^{11}) / 3}{5.41 \times 10^{11} / 69} = 14.92$

$F(3, 60)_{\text{crit}, 0.1\%} = 6.17$

Why /3 ?
Conclusion of the test ?
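
The F statistic can be recomputed from the two residual sums of squares. The sketch below uses the rounded RSS values shown in the formula; the function name `f_stat` and its argument names are illustrative assumptions.

```python
def f_stat(rss_restricted, rss_unrestricted, extra_params, df_remaining):
    """F statistic for the joint explanatory power of a group of added variables."""
    improvement = (rss_restricted - rss_unrestricted) / extra_params
    remaining = rss_unrestricted / df_remaining
    return improvement / remaining

# RSS without the dummies (reg COST N) and with them (reg COST N TECH WORKER VOC).
# Three dummies are added, hence the division by 3;
# 69 = 74 observations - 5 estimated parameters.
f = f_stat(8.92e11, 5.41e11, extra_params=3, df_remaining=69)
print(round(f, 2))

# Critical value quoted on the slide for F(3, 60) at the 0.1% level
f_crit = 6.17
print(f > f_crit)
```

Since the statistic exceeds the quoted critical value, the null hypothesis that the three dummy coefficients are all zero would be rejected at the 0.1% level.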
DUMMY VARIABLES
What happens if we choose another group as the reference group ?

$COST = \beta_1 + \delta_T TECH + \delta_V VOC + \delta_G GEN + \beta_2 N + u$

Skilled workers' school (TECH = VOC = GEN = 0): $COST = \beta_1 + \beta_2 N + u$
Technical school (TECH = 1; VOC = GEN = 0): $COST = (\beta_1 + \delta_T) + \beta_2 N + u$
Vocational school (VOC = 1; TECH = GEN = 0): $COST = (\beta_1 + \delta_V) + \beta_2 N + u$
General school (GEN = 1; TECH = VOC = 0): $COST = (\beta_1 + \delta_G) + \beta_2 N + u$
DUMMY VARIABLES
. reg COST N TECH VOC GEN

  Source |       SS       df       MS              Number of obs =      74
---------+------------------------------           F(  4,    69) =   29.63
   Model |  9.2996e+11     4  2.3249e+11           Prob > F      =  0.0000
Residual |  5.4138e+11    69  7.8461e+09           R-squared     =  0.6320
---------+------------------------------           Adj R-squared =  0.6107
   Total |  1.4713e+12    73  2.0155e+10           Root MSE      =   88578

------------------------------------------------------------------------------
    COST |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
       N |   342.6335    40.2195      8.519   0.000       262.3978    422.8692
    TECH |   10748.51   30524.87      0.352   0.726      -50146.93    71643.95
     VOC |  -90133.74   33984.22     -2.652   0.010      -157930.4   -22337.07
     GEN |  -143362.4    27852.8     -5.147   0.000      -198927.2   -87797.57
   _cons |   88469.29   28849.56      3.067   0.003       30916.01    146022.6
------------------------------------------------------------------------------

Interpretation of the coefficients ?
DUMMY VARIABLES
$\widehat{COST} = 88,000 + 11,000\,TECH - 90,000\,VOC - 143,000\,GEN + 343\,N$

Skilled workers' school (TECH = VOC = GEN = 0): $\widehat{COST} = 88,000 + 343\,N$

Technical school (TECH = 1; VOC = GEN = 0): $\widehat{COST} = 88,000 + 11,000 + 343\,N = 99,000 + 343\,N$

Vocational school (VOC = 1; TECH = GEN = 0): $\widehat{COST} = 88,000 - 90,000 + 343\,N = -2,000 + 343\,N$

General school (GEN = 1; TECH = VOC = 0): $\widehat{COST} = 88,000 - 143,000 + 343\,N = -55,000 + 343\,N$

We obtain the same cost functions.
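
The link between the two specifications can be checked numerically: switching the reference category shifts the constant by the coefficient of the new reference group and subtracts that same amount from every dummy coefficient. The sketch below starts from the `reg COST N TECH WORKER VOC` estimates; the helper `rebase` is an illustrative name, not a Stata command.

```python
# Coefficients with general schools as the reference category
# (from `reg COST N TECH WORKER VOC`); the reference group has coefficient 0.
cons_gen_ref = -54893.09
delta_gen_ref = {"TECH": 154110.9, "WORKER": 143362.4, "VOC": 53228.64, "GEN": 0.0}

def rebase(cons, deltas, new_ref):
    """Re-express the constant and dummy coefficients relative to new_ref."""
    shift = deltas[new_ref]
    new_cons = cons + shift
    new_deltas = {k: v - shift for k, v in deltas.items()}
    return new_cons, new_deltas

cons_worker_ref, delta_worker_ref = rebase(cons_gen_ref, delta_gen_ref, "WORKER")
print(round(cons_worker_ref, 2))
for name, value in delta_worker_ref.items():
    print(name, round(value, 2))
```

The results agree with the `reg COST N TECH VOC GEN` output up to rounding in the printed coefficients.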
DUMMY VARIABLES
[Figure: COST (-100,000 to 600,000) against N (0 to 1,200), by type of school: technical, workers', vocational, and general schools.]

We obtain the same ranking between schools.


DUMMY VARIABLES
. reg COST N TECH VOC GEN

  Source |       SS       df       MS              Number of obs =      74
---------+------------------------------           F(  4,    69) =   29.63
   Model |  9.2996e+11     4  2.3249e+11           Prob > F      =  0.0000
Residual |  5.4138e+11    69  7.8461e+09           R-squared     =  0.6320
---------+------------------------------           Adj R-squared =  0.6107
   Total |  1.4713e+12    73  2.0155e+10           Root MSE      =   88578

------------------------------------------------------------------------------
    COST |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
       N |   342.6335    40.2195      8.519   0.000       262.3978    422.8692
    TECH |   10748.51   30524.87      0.352   0.726      -50146.93    71643.95
     VOC |  -90133.74   33984.22     -2.652   0.010      -157930.4   -22337.07
     GEN |  -143362.4    27852.8     -5.147   0.000      -198927.2   -87797.57
   _cons |   88469.29   28849.56      3.067   0.003       30916.01    146022.6
------------------------------------------------------------------------------

The goodness of fit, whether measured by $R^2$, RSS, or Root MSE, is likewise not affected by the change.

But the t tests are affected. Is this an issue ?
DUMMY VARIABLES
The one test involving the dummy variables that can be performed with either specification is the test of whether the overhead costs of general schools and skilled workers’ schools are different. Why ?

. reg COST N TECH WORKER VOC
------------------------------------------------------------------------------
    COST |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
       N |   342.6335    40.2195      8.519   0.000       262.3978    422.8692
    TECH |   154110.9   26760.41      5.759   0.000       100725.3    207496.4
  WORKER |   143362.4    27852.8      5.147   0.000       87797.57    198927.2
     VOC |   53228.64   31061.65      1.714   0.091      -8737.646    115194.9
   _cons |  -54893.09   26673.08     -2.058   0.043      -108104.4   -1681.748
------------------------------------------------------------------------------

. reg COST N TECH VOC GEN
------------------------------------------------------------------------------
    COST |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
       N |   342.6335    40.2195      8.519   0.000       262.3978    422.8692
    TECH |   10748.51   30524.87      0.352   0.726      -50146.93    71643.95
     VOC |  -90133.74   33984.22     -2.652   0.010      -157930.4   -22337.07
     GEN |  -143362.4    27852.8     -5.147   0.000      -198927.2   -87797.57
   _cons |   88469.29   28849.56      3.067   0.003       30916.01    146022.6
------------------------------------------------------------------------------
HOW TO DEAL WITH MULTIPLE
DUMMY VARIABLES?
DUMMY VARIABLES
$COST = \beta_1 + \delta\,OCC + \varepsilon\,RES + \beta_2 N + u$

Regular, nonresidential (OCC = RES = 0): $COST = \beta_1 + \beta_2 N + u$
Regular, residential (OCC = 0; RES = 1): $COST = (\beta_1 + \varepsilon) + \beta_2 N + u$
Occupational, nonresidential (OCC = 1; RES = 0): $COST = (\beta_1 + \delta) + \beta_2 N + u$
Occupational, residential (OCC = 1; RES = 1): $COST = (\beta_1 + \delta + \varepsilon) + \beta_2 N + u$

Interpretation of the coefficients ?
DUMMY VARIABLES
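[Figure: COST against N, four parallel lines. From top to bottom: occupational, residential (intercept $\beta_1 + \delta + \varepsilon$); occupational, nonresidential (intercept $\beta_1 + \delta$); regular, residential (intercept $\beta_1 + \varepsilon$); regular, nonresidential (intercept $\beta_1$).]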
DUMMY VARIABLES
[Figure: COST (-100,000 to 600,000) against N (0 to 1,200), by type of school: nonresidential regular, residential regular, nonresidential occupational, and residential occupational.]
DUMMY VARIABLES
. reg COST N OCC RES

  Source |       SS       df       MS              Number of obs =      74
---------+------------------------------           F(  3,    70) =   40.43
   Model |  9.3297e+11     3  3.1099e+11           Prob > F      =  0.0000
Residual |  5.3838e+11    70  7.6911e+09           R-squared     =  0.6341
---------+------------------------------           Adj R-squared =  0.6184
   Total |  1.4713e+12    73  2.0155e+10           Root MSE      =   87699

------------------------------------------------------------------------------
    COST |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
       N |    321.833   39.40225      8.168   0.000       243.2477    400.4183
     OCC |   109564.6   24039.58      4.558   0.000       61619.15      157510
     RES |   57909.01   30821.31      1.879   0.064      -3562.137    119380.2
   _cons |  -29045.27   23291.54     -1.247   0.217      -75498.78    17408.25
------------------------------------------------------------------------------

The coefficient of OCC indicates that the annual overhead costs of occupational schools are 110,000 more than those of regular schools.

The coefficient of RES indicates that the annual overhead costs of residential schools are 58,000 greater than those of non-residential schools.
DUMMY VARIABLES
$\widehat{COST} = -29,000 + 110,000\,OCC + 58,000\,RES + 322\,N$

Regular, nonresidential (OCC = RES = 0): $\widehat{COST} = -29,000 + 322\,N$

Regular, residential (OCC = 0; RES = 1): $\widehat{COST} = -29,000 + 58,000 + 322\,N = 29,000 + 322\,N$

Occupational, nonresidential (OCC = 1; RES = 0): $\widehat{COST} = -29,000 + 110,000 + 322\,N = 81,000 + 322\,N$

Occupational, residential (OCC = 1; RES = 1): $\widehat{COST} = -29,000 + 110,000 + 58,000 + 322\,N = 139,000 + 322\,N$

Does this make sense ?
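
The four intercepts can be tabulated directly from the `reg COST N OCC RES` coefficients. The sketch below uses the unrounded estimates, so the results differ slightly from the rounded figures above; the variable names are illustrative assumptions.

```python
from itertools import product

# Unrounded estimates from `reg COST N OCC RES`
cons, coef_occ, coef_res = -29045.27, 109564.6, 57909.01

# Intercept (overhead cost) for each combination of the two dummies
intercepts = {
    (occ, res): cons + coef_occ * occ + coef_res * res
    for occ, res in product((0, 1), repeat=2)
}

for (occ, res), value in intercepts.items():
    print(f"OCC={occ}, RES={res}: {value:,.0f}")

# Because the dummies enter additively, the residential premium is forced
# to be the same for regular and occupational schools.
premium_regular = intercepts[(0, 1)] - intercepts[(0, 0)]
premium_occupational = intercepts[(1, 1)] - intercepts[(1, 0)]
```

This additive structure is a restriction of the specification, which is worth keeping in mind when judging whether the four cost functions make sense.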
DUMMY VARIABLES
[Figure: COST (-100,000 to 600,000) against N (0 to 1,200), with four lines labelled, from top to bottom, O, R (occupational, residential); O, N (occupational, nonresidential); R, R (regular, residential); R, N (regular, nonresidential).]
ARE DUMMY VARIABLES USED
FOR ABSOLUTE EFFECTS ONLY ?
DUMMY VARIABLES

Slope dummy variables.

$COST = \beta_1 + \delta\,OCC + \beta_2 N + \lambda\,NOCC + u$

NOCC is defined as the product of N and OCC.

Regular school (OCC = NOCC = 0): $COST = \beta_1 + \beta_2 N + u$
Occupational school (OCC = 1; NOCC = N): $COST = (\beta_1 + \delta) + (\beta_2 + \lambda) N + u$

Interpretation of the coefficients ?
DUMMY VARIABLES
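[Figure: COST against N, two lines with different intercepts and different slopes: regular schools (intercept $\beta_1$) and occupational schools (intercept $\beta_1 + \delta$).]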
DUMMY VARIABLES
. reg COST N OCC NOCC

  Source |       SS       df       MS              Number of obs =      74
---------+------------------------------           F(  3,    70) =   49.64
   Model |  1.0009e+12     3  3.3363e+11           Prob > F      =  0.0000
Residual |  4.7045e+11    70  6.7207e+09           R-squared     =  0.6803
---------+------------------------------           Adj R-squared =  0.6666
   Total |  1.4713e+12    73  2.0155e+10           Root MSE      =   81980

------------------------------------------------------------------------------
    COST |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
       N |   152.2982   60.01932      2.537   0.013       32.59349     272.003
     OCC |  -3501.177   41085.46     -0.085   0.932      -85443.55    78441.19
    NOCC |   284.4786   75.63211      3.761   0.000       133.6351    435.3221
   _cons |   51475.25   31314.84      1.644   0.105      -10980.24    113930.7
------------------------------------------------------------------------------

Coefficients’ interpretations ?
Conclusion of the tests ?
DUMMY VARIABLES
$\widehat{COST} = 51,000 - 4,000\,OCC + 152\,N + 284\,NOCC$

Regular school (OCC = NOCC = 0): $\widehat{COST} = 51,000 + 152\,N$

Occupational school (OCC = 1; NOCC = N): $\widehat{COST} = 51,000 - 4,000 + 152\,N + 284\,N = 47,000 + 436\,N$
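
With a slope dummy, both the intercept and the marginal cost differ between the two types of school. The sketch below rebuilds the two fitted cost functions from the unrounded `reg COST N OCC NOCC` estimates; the function and variable names are illustrative assumptions.

```python
# Unrounded estimates from `reg COST N OCC NOCC`
cons, coef_occ, coef_n, coef_nocc = 51475.25, -3501.177, 152.2982, 284.4786

def fitted_cost(n, occ):
    """Fitted COST with an intercept dummy and a slope dummy (NOCC = N * OCC)."""
    nocc = n * occ
    return cons + coef_occ * occ + coef_n * n + coef_nocc * nocc

# Implied intercept and marginal cost for occupational schools
occupational_intercept = cons + coef_occ
occupational_slope = coef_n + coef_nocc
print(round(occupational_intercept), round(occupational_slope, 1))

# Fitted costs for a school with 1,000 students, by type
print(round(fitted_cost(1000, occ=0)), round(fitted_cost(1000, occ=1)))
```

The marginal cost per student is now estimated separately for each type, which relaxes the common-slope assumption made earlier.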
DUMMY VARIABLES
[Figure: fitted cost functions, COST (-100,000 to 600,000) against N (0 to 1,200), for occupational schools and regular schools.]
DUMMY VARIABLES
We can also perform an F test of the joint explanatory power of the dummy variables, comparing RSS when the dummy variables are included with RSS when they are not.

. reg COST N OCC NOCC

  Source |       SS       df       MS              Number of obs =      74
---------+------------------------------           F(  3,    70) =   49.64
   Model |  1.0009e+12     3  3.3363e+11           Prob > F      =  0.0000
Residual |  4.7045e+11    70  6.7207e+09           R-squared     =  0.6803
---------+------------------------------           Adj R-squared =  0.6666
   Total |  1.4713e+12    73  2.0155e+10           Root MSE      =   81980

. reg COST N

  Source |       SS       df       MS              Number of obs =      74
---------+------------------------------           F(  1,    72) =   46.82
   Model |  5.7974e+11     1  5.7974e+11           Prob > F      =  0.0000
Residual |  8.9160e+11    72  1.2383e+10           R-squared     =  0.3940
---------+------------------------------           Adj R-squared =  0.3856
   Total |  1.4713e+12    73  2.0155e+10           Root MSE      = 1.1e+05

$F(2, 70) = \frac{(8.92 \times 10^{11} - 4.70 \times 10^{11}) / 2}{4.70 \times 10^{11} / 70} = 31.4$

$F(2, 70)_{\text{crit}, 0.1\%} = 7.6$

H0 ?
H1 ?
Why /2 ?
Conclusion of the test ?
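
As a cross-check, the same F statistic can be obtained from the R-squared values of the two regressions, using the equivalent R-squared form of the F test. Note that this is an alternative to the RSS form used on the slide, swapped in here for illustration; the function name is an assumption.

```python
def f_from_r2(r2_unrestricted, r2_restricted, extra_params, df_remaining):
    """F statistic computed from R-squared with and without the added variables."""
    gain = (r2_unrestricted - r2_restricted) / extra_params
    unexplained = (1.0 - r2_unrestricted) / df_remaining
    return gain / unexplained

# R-squared from `reg COST N OCC NOCC` (0.6803) and `reg COST N` (0.3940).
# Two variables (OCC and NOCC) are added, hence the division by 2;
# 70 = 74 observations - 4 estimated parameters.
f = f_from_r2(0.6803, 0.3940, extra_params=2, df_remaining=70)
print(round(f, 1))

# Critical value quoted on the slide for F(2, 70) at the 0.1% level
f_crit = 7.6
print(f > f_crit)
```

The small gap from 31.4 comes from rounding in the printed R-squared and RSS values; either way the statistic is far above the critical value.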
SHOULD I CONSIDER
GROUPS IN MY SAMPLE ?
DUMMY VARIABLES
Chow test: compare the pooled regression with the sub-group regressions.

[Figure: COST (0 to 600,000) against N (0 to 1,200) for occupational schools and regular schools.]
DUMMY VARIABLES
Chow test procedure

Step 1: run the regression for each sub-group and save the RSS.
Step 2: run a pooled regression and save the RSS.

Residual sum of squares ($\times 10^{11}$)

Regression   Occupational   Regular        Total
Separate     RSS1 = 3.49    RSS2 = 1.22    4.71
Pooled                                     RSSP = 8.91
DUMMY VARIABLES
[Figure: COST (0 to 600,000) against N (0 to 1,200) for occupational schools and regular schools, pooled regression. RSS = $8.91 \times 10^{11}$.]
DUMMY VARIABLES
$F(k, n - 2k) = \frac{\text{overall reduction in RSS when separate regressions are run} \,/\, \text{cost in degrees of freedom}}{\text{total RSS remaining when separate regressions are run} \,/\, \text{degrees of freedom remaining}} = \frac{(RSS_P - [RSS_1 + RSS_2]) / k}{(RSS_1 + RSS_2) / (n - 2k)}$

$F(2, 70) = \frac{(8.91 \times 10^{11} - [3.49 \times 10^{11} + 1.22 \times 10^{11}]) / 2}{(3.49 \times 10^{11} + 1.22 \times 10^{11}) / 70} = 31.2$

$F(2, 70)_{\text{crit}, 0.1\%} = 7.6$

$RSS_P = 8.91 \times 10^{11}$
$RSS_1 + RSS_2 = 4.71 \times 10^{11}$

H0 ?
H1 ?
Conclusion of the test ?
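
The Chow test arithmetic can be wrapped in a small helper. This is a sketch under the setup described here (two sub-groups, k parameters in each regression, n observations in total); the function name `chow_f` is an illustrative assumption.

```python
def chow_f(rss_pooled, rss_1, rss_2, k, n):
    """Chow test F statistic with k parameters per regression and n observations."""
    rss_separate = rss_1 + rss_2
    # Reduction in RSS from running separate regressions, per extra parameter
    numerator = (rss_pooled - rss_separate) / k
    # RSS remaining after separate regressions, per remaining degree of freedom
    denominator = rss_separate / (n - 2 * k)
    return numerator / denominator

# RSS values from the table: pooled 8.91e11, occupational 3.49e11, regular 1.22e11.
# Each regression has k = 2 parameters (intercept and slope); n = 74 schools.
f = chow_f(8.91e11, 3.49e11, 1.22e11, k=2, n=74)
print(round(f, 1))

# Critical value quoted on the slide for F(2, 70) at the 0.1% level
f_crit = 7.6
print(f > f_crit)
```

Because the statistic is well above the quoted critical value, the hypothesis that one pooled cost function fits both types of school would be rejected at the 0.1% level.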
