Lecture8 Dummy
Run two separate regressions?
[Figure: scatter diagram against N, distinguishing occupational schools and regular schools]
DUMMY VARIABLES
We are hypothesizing that the annual overhead cost is different for the two types of school, but the marginal cost is the same.
The marginal cost assumption is not very plausible, and we will relax it in due course.
[Figure: COST against N, two parallel lines with intercepts $\beta_1$ (regular school) and $\beta_1 + \delta$ (occupational school)]
Regular school: $COST = \beta_1 + \beta_2 N + u$
Occupational school: $COST = \beta_1' + \beta_2 N + u$
Define $\delta = \beta_1' - \beta_1$
Does $\delta$ capture a marginal effect?
DUMMY VARIABLES
Dummy variables always have two values, 0 or 1.
If OCC is equal to 0, the cost function becomes that for regular schools.
If OCC is equal to 1, the cost function becomes that for occupational schools.
[Figure: COST against N, two parallel lines with intercepts $\beta_1$ (regular school) and $\beta_1 + \delta$ (occupational school)]
OCC = 0, regular school: $COST = \beta_1 + \beta_2 N + u$
OCC = 1, occupational school: $COST = \beta_1 + \delta + \beta_2 N + u$
Combined equation: $COST = \beta_1 + \delta\,OCC + \beta_2 N + u$
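As a minimal sketch of how the combined equation is estimated, the Python below builds the OCC dummy from a school-type label and fits COST on a constant, OCC and N by ordinary least squares. The data, the parameter values used to generate them, and the helper `solve` are all hypothetical and chosen only for illustration; the data are noise-free so that the fit returns the generating values.

```python
# Hypothetical, noise-free data: the generating parameters are made up.
BETA1, DELTA, BETA2 = 10_000.0, 50_000.0, 300.0
schools = [("regular", 200), ("regular", 500), ("regular", 900),
           ("occupational", 150), ("occupational", 400), ("occupational", 800)]

rows = []
for kind, n in schools:
    occ = 1 if kind == "occupational" else 0   # the dummy variable
    cost = BETA1 + DELTA * occ + BETA2 * n     # combined equation, u = 0
    rows.append((occ, n, cost))


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    size = len(m)
    for col in range(size):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(size):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][size] / m[i][i] for i in range(size)]


# Design matrix columns: constant, OCC, N.
X = [[1.0, float(occ), float(n)] for occ, n, _ in rows]
y = [cost for _, _, cost in rows]

# Normal equations (X'X) b = X'y.
xtx = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(3)]
b1_hat, delta_hat, b2_hat = solve(xtx, xty)

print(round(b1_hat), round(delta_hat), round(b2_hat))
```

The coefficient on OCC is the estimated shift in the intercept for occupational schools; the slope on N is shared by both types, which is exactly the restriction the combined equation imposes.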
DUMMY VARIABLES
. reg COST N OCC

  Source |       SS       df       MS                  Number of obs =      74
---------+------------------------------               F(  2,    71) =   56.86
   Model |  9.0582e+11     2  4.5291e+11               Prob > F      =  0.0000
Residual |  5.6553e+11    71  7.9652e+09               R-squared     =  0.6156
---------+------------------------------               Adj R-squared =  0.6048
   Total |  1.4713e+12    73  2.0155e+10               Root MSE      =   89248

------------------------------------------------------------------------------
    COST |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
       N |   331.4493   39.75844     8.337   0.000       252.1732    410.7254
     OCC |   133259.1   20827.59     6.398   0.000       91730.06    174788.1
   _cons |  -33612.55   23573.47    -1.426   0.158      -80616.71    13391.61
------------------------------------------------------------------------------

Regular school (OCC = 0): the annual overhead cost is –34,000. What does this mean?
Occupational school (OCC = 1): the annual overhead cost is 133,000 higher than for regular schools.
The regression results include standard errors and the usual diagnostic statistics.
Regression quality?
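One way to approach the "regression quality" question is to check how the diagnostic statistics follow from the sums of squares in the output. The sketch below recomputes R-squared, the F statistic, the root MSE and the t statistic for OCC from the figures reported above; the variable names are my own, and small discrepancies in the last digit are to be expected because the inputs are rounded.

```python
import math

# Figures copied from the regression output above.
n_obs = 74          # number of observations
k_params = 3        # _cons, N, OCC
ess = 9.0582e11     # model (explained) sum of squares
rss = 5.6553e11     # residual sum of squares
tss = 1.4713e12     # total sum of squares

r_squared = ess / tss
f_stat = (ess / (k_params - 1)) / (rss / (n_obs - k_params))
root_mse = math.sqrt(rss / (n_obs - k_params))

# t statistic for OCC: coefficient divided by its standard error.
t_occ = 133259.1 / 20827.59

print(f"R-squared = {r_squared:.4f}")
print(f"F(2, 71)  = {f_stat:.2f}")
print(f"Root MSE  = {root_mse:.0f}")
print(f"t (OCC)   = {t_occ:.3f}")
```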
DUMMY VARIABLES
$\widehat{COST} = -34{,}000 + 133{,}000\,OCC + 331N$
Regular school (OCC = 0): $\widehat{COST} = -34{,}000 + 331N$
Occupational school (OCC = 1): $\widehat{COST} = -34{,}000 + 133{,}000 + 331N = 99{,}000 + 331N$
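To make the two fitted cost functions concrete, here is a short sketch that plugs the rounded coefficients into the combined equation. The enrolment figure of 500 students is an arbitrary illustrative value, not taken from the data.

```python
# Rounded coefficients from the fitted equation.
CONS, D_OCC, SLOPE = -34_000, 133_000, 331


def predicted_cost(n, occ):
    """Fitted annual cost for a school with n students; occ is 0 or 1."""
    return CONS + D_OCC * occ + SLOPE * n


n_students = 500  # hypothetical enrolment
regular = predicted_cost(n_students, occ=0)
occupational = predicted_cost(n_students, occ=1)

print(regular, occupational, occupational - regular)
```

Whatever value of N is chosen, the gap between the two predictions is the coefficient of OCC, because the dummy shifts only the intercept.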
DUMMY VARIABLES
[Figure: COST against N, with fitted lines for occupational schools and regular schools]
What changes between the red and grey lines?
HOW TO DEAL WITH
MULTIPLE CATEGORIES?
DUMMY VARIABLES
Dummy variables have been used to differentiate between regular and occupational
schools when fitting a cost function.
There are two types of regular secondary school: there are general schools (GEN), which
provide the usual academic education, and vocational schools (VOC).
Likewise, there are two types of occupational school: there are technical schools (TECH)
training technicians and skilled workers’ schools (WORKER) training craftsmen.
The standard procedure is to choose one category as the reference category and to define
dummy variables for each of the others.
DUMMY VARIABLES
TECH will be the dummy for the technical schools: TECH is equal to 1 if the observation
relates to a technical school, 0 otherwise.
WORKER and VOC are defined in the same way for the skilled workers’ schools and the vocational schools.
Note that you do not include a dummy variable for the reference category, which is why the reference category is usually described as the omitted category. Why?
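One way to see why the reference category is omitted: if a dummy were defined for every category, the dummies would sum to 1 in every observation, which is exactly the column of ones behind the intercept, so the regressors would be perfectly collinear. The sketch below illustrates this on a hypothetical list of school types; the sample and the helper `make_dummies` are assumptions for illustration only.

```python
# Hypothetical sample of school types.
school_types = ["GEN", "VOC", "TECH", "WORKER", "GEN", "TECH"]
categories = ["GEN", "VOC", "TECH", "WORKER"]
reference = "GEN"  # the omitted (reference) category


def make_dummies(types, cats):
    """Return one 0/1 column per category in cats."""
    return {c: [1 if t == c else 0 for t in types] for c in cats}


# A dummy for every category: each row sums to 1.
all_dummies = make_dummies(school_types, categories)
row_sums = [sum(all_dummies[c][i] for c in categories)
            for i in range(len(school_types))]
constant = [1] * len(school_types)
print(row_sums == constant)  # the dummies reproduce the intercept column

# Standard procedure: drop the reference category.
kept = make_dummies(school_types, [c for c in categories if c != reference])
print(sorted(kept))
```

With GEN omitted, a general school is identified by all three remaining dummies being 0, and each dummy coefficient measures a difference from the general-school intercept.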
DUMMY VARIABLES
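$COST = \beta_1 + \delta_T\,TECH + \delta_W\,WORKER + \delta_V\,VOC + \beta_2 N + u$
[Figure: COST against N, four parallel lines with intercepts $\beta_1$ (General), $\beta_1 + \delta_V$ (Vocational), $\beta_1 + \delta_W$ (Workers’) and $\beta_1 + \delta_T$ (Technical)]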
DUMMY VARIABLES
[Figure: COST plotted against N]
General school (TECH = WORKER = VOC = 0): $\widehat{COST} = -55{,}000 + 343N$
Technical school (TECH = 1; WORKER = VOC = 0): $\widehat{COST} = -55{,}000 + 154{,}000 + 343N = 99{,}000 + 343N$
Skilled workers' school (WORKER = 1; TECH = VOC = 0): $\widehat{COST} = -55{,}000 + 143{,}000 + 343N = 88{,}000 + 343N$
Vocational school (VOC = 1; TECH = WORKER = 0): $\widehat{COST} = -55{,}000 + 53{,}000 + 343N = -2{,}000 + 343N$
DUMMY VARIABLES
[Figure: COST plotted against N]

------------------------------------------------------------------------------
    COST |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
       N |   342.6335    40.2195     8.519   0.000       262.3978    422.8692
    TECH |   10748.51   30524.87     0.352   0.726      -50146.93    71643.95
     VOC |  -90133.74   33984.22    -2.652   0.010      -157930.4   -22337.07
     GEN |  -143362.4    27852.8    -5.147   0.000      -198927.2   -87797.57
   _cons |   88469.29   28849.56     3.067   0.003       30916.01    146022.6
------------------------------------------------------------------------------

Interpretation of the coefficients?
DUMMY VARIABLES
$\widehat{COST} = 88{,}000 + 11{,}000\,TECH - 90{,}000\,VOC - 143{,}000\,GEN + 343N$
We obtain the same cost functions:
Skilled workers' school (TECH = VOC = GEN = 0): $\widehat{COST} = 88{,}000 + 343N$
Technical school (TECH = 1; VOC = GEN = 0): $\widehat{COST} = 88{,}000 + 11{,}000 + 343N = 99{,}000 + 343N$
Vocational school (VOC = 1; TECH = GEN = 0): $\widehat{COST} = 88{,}000 - 90{,}000 + 343N = -2{,}000 + 343N$
General school (GEN = 1; TECH = VOC = 0): $\widehat{COST} = 88{,}000 - 143{,}000 + 343N = -55{,}000 + 343N$
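The equivalence of the two specifications can be checked with a few lines of arithmetic. The sketch below uses the rounded coefficients from the two fitted equations (general schools as the reference category, then skilled workers' schools) and confirms that both imply the same intercept for every school type; the dictionary layout is my own.

```python
# Specification A: general schools are the reference category.
cons_a = -55_000
dummies_a = {"GEN": 0, "TECH": 154_000, "WORKER": 143_000, "VOC": 53_000}

# Specification B: skilled workers' schools are the reference category.
cons_b = 88_000
dummies_b = {"WORKER": 0, "TECH": 11_000, "VOC": -90_000, "GEN": -143_000}

intercepts_a = {k: cons_a + d for k, d in dummies_a.items()}
intercepts_b = {k: cons_b + d for k, d in dummies_b.items()}
print(intercepts_a == intercepts_b)

# Each coefficient in B is the difference between that school type's
# intercept and the intercept of the new reference category.
implied_b = {k: intercepts_a[k] - intercepts_a["WORKER"] for k in intercepts_a}
print(implied_b == dummies_b)

# The ranking of school types by overhead cost is unchanged.
print(sorted(intercepts_a, key=intercepts_a.get))
```

Changing the reference category changes what each dummy coefficient is compared with, and therefore its t statistic, but not the fitted cost functions.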
DUMMY VARIABLES
[Figure: COST plotted against N]
We obtain the same ranking between schools.
DUMMY VARIABLES
[Figure: COST plotted against N]
Regular, nonresidential (OCC = RES = 0): $\widehat{COST} = -29{,}000 + 322N$
Regular, residential (OCC = 0; RES = 1): $\widehat{COST} = -29{,}000 + 58{,}000 + 322N = 29{,}000 + 322N$
Occupational, nonresidential (OCC = 1; RES = 0): $\widehat{COST} = -29{,}000 + 110{,}000 + 322N = 81{,}000 + 322N$
Occupational, residential (OCC = 1; RES = 1): $\widehat{COST} = -29{,}000 + 110{,}000 + 58{,}000 + 322N = 139{,}000 + 322N$
Does this make sense?
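With two sets of dummies the intercept shifts add up. The sketch below enumerates the four combinations of OCC and RES using the rounded coefficients above; it may help with the "does this make sense" question by making explicit that the specification forces the residential premium to be the same for both school types. The function and variable names are illustrative.

```python
from itertools import product

# Rounded coefficients from the fitted equations above.
CONS, D_OCC, D_RES, SLOPE = -29_000, 110_000, 58_000, 322


def intercept(occ, res):
    """Fitted overhead cost for a given combination of the two dummies."""
    return CONS + D_OCC * occ + D_RES * res


intercepts = {(occ, res): intercept(occ, res)
              for occ, res in product((0, 1), repeat=2)}
for (occ, res), value in intercepts.items():
    print(f"OCC={occ} RES={res}: intercept {value:,}, slope {SLOPE}")

# The residential premium is identical for regular and occupational schools.
premium_regular = intercepts[(0, 1)] - intercepts[(0, 0)]
premium_occupational = intercepts[(1, 1)] - intercepts[(1, 0)]
print(premium_regular, premium_occupational)
```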
DUMMY VARIABLES
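[Figure: COST against N, four parallel fitted lines: occupational residential (O, R), occupational nonresidential (O, N), regular residential (R, R) and regular nonresidential (R, N)]
[Figure: COST against N, cost functions for occupational schools (intercept $\beta_1 + \delta$) and regular schools (intercept $\beta_1$)]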
DUMMY VARIABLES
. reg COST N OCC NOCC

------------------------------------------------------------------------------
    COST |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
       N |   152.2982   60.01932     2.537   0.013       32.59349     272.003
     OCC |  -3501.177   41085.46    -0.085   0.932      -85443.55    78441.19
    NOCC |   284.4786   75.63211     3.761   0.000       133.6351    435.3221
   _cons |   51475.25   31314.84     1.644   0.105      -10980.24    113930.7
------------------------------------------------------------------------------

Interpretations?
Conclusion of the tests?
DUMMY VARIABLES
$\widehat{COST} = 51{,}000 - 4{,}000\,OCC + 152N + 284\,NOCC$
Regular school (OCC = NOCC = 0): $\widehat{COST} = 51{,}000 + 152N$
Occupational school (OCC = 1; NOCC = N): $\widehat{COST} = 51{,}000 - 4{,}000 + 152N + 284N = 47{,}000 + 436N$
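The condition "OCC = 1; NOCC = N" suggests that NOCC is the product of N and OCC, a slope dummy; that reading is an assumption, since the definition is not stated here. Under it, the sketch below shows how the dummy now shifts both the intercept and the marginal cost, using the rounded coefficients above. The function names are illustrative.

```python
# Rounded coefficients from the fitted equation.
CONS, D_OCC, B_N, B_NOCC = 51_000, -4_000, 152, 284


def predicted_cost(n, occ):
    """Fitted cost, with the slope dummy built as NOCC = N * OCC (assumed)."""
    nocc = n * occ
    return CONS + D_OCC * occ + B_N * n + B_NOCC * nocc


def cost_function(occ):
    """Return (intercept, marginal cost) for the given school type."""
    return CONS + D_OCC * occ, B_N + B_NOCC * occ


print(cost_function(0))  # regular school
print(cost_function(1))  # occupational school
print(predicted_cost(1000, 0), predicted_cost(1000, 1))
```

The coefficient of OCC is the difference in overhead cost and the coefficient of NOCC is the difference in marginal cost, so the two school types no longer have parallel cost functions.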
DUMMY VARIABLES
[Figure: COST against N, with fitted lines for occupational schools and regular schools]
DUMMY VARIABLES
Chow test procedure
Step 1: run the regression for the sub-groups and save the RSS.

Residual sum of squares (x 10^11)

Regression    Occupational    Regular        Total
Separate      RSS1 = 3.49     RSS2 = 1.22    4.71

[Figure: COST plotted against N]
DUMMY VARIABLES
$$F(k,\ n - 2k) = \frac{\left(RSS_P - [RSS_1 + RSS_2]\right)/k}{\left(RSS_1 + RSS_2\right)/(n - 2k)}$$

Numerator: the overall reduction in RSS when separate regressions are run, divided by the cost in degrees of freedom.
Denominator: the total RSS remaining when separate regressions are run, divided by the degrees of freedom remaining.
H0?
H1?
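The F statistic above is straightforward to compute once the three residual sums of squares are known. In the sketch below, `chow_f` is a hypothetical helper name, RSS_P is taken to be the RSS from the pooled regression on the whole sample, and k is taken to be the number of parameters in each regression; both readings are assumptions. The pooled RSS is not given here, so the numerical call uses made-up values purely to show the arithmetic.

```python
def chow_f(rss_pooled, rss_1, rss_2, k, n):
    """Chow test F statistic with (k, n - 2k) degrees of freedom."""
    rss_separate = rss_1 + rss_2
    # Reduction in RSS from running separate regressions, per degree
    # of freedom used up.
    numerator = (rss_pooled - rss_separate) / k
    # RSS remaining after separate regressions, per remaining degree
    # of freedom.
    denominator = rss_separate / (n - 2 * k)
    return numerator / denominator


# Separate-regression RSS from the table (units of 10^11).
rss_1, rss_2 = 3.49, 1.22
total_separate = rss_1 + rss_2
print(f"RSS1 + RSS2 = {total_separate:.2f}")

# Made-up numbers, NOT from the school cost data, to show the calculation.
example_f = chow_f(rss_pooled=10.0, rss_1=3.0, rss_2=5.0, k=2, n=24)
print(f"example F(2, 20) = {example_f:.2f}")
```

The statistic is then compared with the critical value of F with k and n - 2k degrees of freedom at the chosen significance level.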