Dougherty
Introduction to Econometrics, 5th edition
Chapter 5: Dummy Variables
© Christopher Dougherty, 2016. All rights reserved.
CHOW TEST
[Figure: scatter diagram of COST (vertical axis, 0 to 600,000) against N (horizontal axis, 0 to 1,200)]
Sometimes in regression analysis there are two types of observation in the sample data.
If this is the case, it is sensible to investigate whether one regression model applies to both
categories or whether you need separate ones for them. To do this, you can perform a
Chow test.
We will illustrate it using the data for the 74 secondary schools in Shanghai. The scatter
diagram plots annual recurrent expenditure (COST) against the number of students (N).
. reg COST N
Source | SS df MS Number of obs = 74
---------+------------------------------ F( 1, 72) = 46.82
Model | 5.7974e+11 1 5.7974e+11 Prob > F = 0.0000
Residual | 8.9160e+11 72 1.2383e+10 R-squared = 0.3940
---------+------------------------------ Adj R-squared = 0.3856
Total | 1.4713e+12 73 2.0155e+10 Root MSE = 1.1e+05
------------------------------------------------------------------------------
COST | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
N | 339.0432 49.55144 6.842 0.000 240.2642 437.8222
_cons | 23953.3 27167.96 0.882 0.381 -30205.04 78111.65
------------------------------------------------------------------------------
Here is the regression output when COST is regressed on N, making no distinction between
the different types of school.
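As a check on how the figures in the output fit together, the sketch below recomputes R-squared and the F statistic from the sums of squares in the pooled table. The values are copied from the output, so tiny rounding differences are possible. The residual sum of squares, 8.9160e+11, is the quantity RSSP that the Chow test uses later.

```python
# Sums of squares copied from the pooled Stata output (reg COST N).
model_ss = 5.7974e11      # explained sum of squares, 1 degree of freedom
residual_ss = 8.9160e11   # residual sum of squares (RSSP), 72 degrees of freedom
total_ss = 1.4713e12      # total sum of squares, 73 degrees of freedom
df_model, df_resid = 1, 72

r_squared = model_ss / total_ss
f_stat = (model_ss / df_model) / (residual_ss / df_resid)

print(f"R-squared = {r_squared:.4f}")  # R-squared = 0.3940
print(f"F(1, 72)  = {f_stat:.2f}")     # F(1, 72)  = 46.82
```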
[Figure: scatter diagram of COST against N with the pooled regression line]
This is the scatter diagram with the regression line.
[Figure: scatter diagram of COST against N, with occupational schools and regular schools plotted separately]
Now we make a distinction between occupational schools and regular schools and run
separate regressions for the two subsamples.
. reg COST N if OCC==1
Source | SS df MS Number of obs = 34
---------+------------------------------ F( 1, 32) = 55.52
Model | 6.0538e+11 1 6.0538e+11 Prob > F = 0.0000
Residual | 3.4895e+11 32 1.0905e+10 R-squared = 0.6344
---------+------------------------------ Adj R-squared = 0.6229
Total | 9.5433e+11 33 2.8919e+10 Root MSE = 1.0e+05
------------------------------------------------------------------------------
COST | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
N | 436.7769 58.62085 7.451 0.000 317.3701 556.1836
_cons | 47974.07 33879.03 1.416 0.166 -21035.26 116983.4
------------------------------------------------------------------------------
This is the regression output when COST is regressed on N using the subsample of 34
occupational schools.
. reg COST N if OCC==0
Source | SS df MS Number of obs = 40
---------+------------------------------ F( 1, 38) = 13.53
Model | 4.3273e+10 1 4.3273e+10 Prob > F = 0.0007
Residual | 1.2150e+11 38 3.1973e+09 R-squared = 0.2626
---------+------------------------------ Adj R-squared = 0.2432
Total | 1.6477e+11 39 4.2249e+09 Root MSE = 56545
------------------------------------------------------------------------------
COST | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
N | 152.2982 41.39782 3.679 0.001 68.49275 236.1037
_cons | 51475.25 21599.14 2.383 0.022 7750.064 95200.43
------------------------------------------------------------------------------
And this is the regression output when COST is regressed on N for the subsample of 40
regular schools.
[Figure: COST against N, with separate regression lines for occupational schools and regular schools]
Here are the regression lines for the two subsamples.
[Figure: the two subsample regression lines, with the pooled regression line added for comparison]
The regression line for the pooled sample (entire sample, making no distinction) is shown
for comparison.
[Figure: residuals for the occupational schools in the pooled regression; RSS = 5.55 x 10^11]
The diagram shows the residuals for the occupational schools in the regression using the
pooled sample.
[Figure: residuals for the occupational schools in the occupational-school subsample regression; RSS = 3.49 x 10^11]
Now here are the corresponding residuals for the regression using only the subsample of
occupational schools.
RSS is smaller for the residuals from the subsample regression. This must be the case.
Why? (Try to answer before continuing.)
The regression line for the subsample regression is located so as to minimize the sum of
the squares of the residuals for the occupational school observations. This is the principle
underlying OLS.
The regression line for the pooled sample is located to give the best overall fit for the
sample as a whole, including the regular schools.
Its location is therefore a compromise between the best fit for the occupational school
observations and the best fit for the regular school observations. Because it is a
compromise, its fit to the occupational schools cannot be better than that of the subsample
regression, and in general it will be worse.
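The argument can be checked numerically. The sketch below uses made-up data (not the Shanghai sample) for two hypothetical groups with different cost functions. It fits OLS to each group and to the pooled sample, then confirms that within each group the subsample line has an RSS no larger than that of the pooled line. The helper names `ols_fit` and `rss` are illustrative, not from the text.

```python
import random

def ols_fit(xs, ys):
    """Return (intercept, slope) of the simple OLS regression of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def rss(xs, ys, intercept, slope):
    """Residual sum of squares of the line (intercept, slope) on the given data."""
    return sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))

random.seed(1)
# Hypothetical data: two groups with different intercepts, slopes and noise levels.
x1 = [random.uniform(100, 1200) for _ in range(30)]
y1 = [40000 + 450 * x + random.gauss(0, 80000) for x in x1]
x2 = [random.uniform(100, 1200) for _ in range(40)]
y2 = [50000 + 150 * x + random.gauss(0, 50000) for x in x2]

pooled = ols_fit(x1 + x2, y1 + y2)
for name, xs, ys in [("group 1", x1, y1), ("group 2", x2, y2)]:
    own = ols_fit(xs, ys)
    rss_own = rss(xs, ys, *own)
    rss_pooled = rss(xs, ys, *pooled)
    # OLS minimizes RSS over all lines for this group, so the pooled line cannot do better.
    assert rss_own <= rss_pooled
    print(f"{name}: subsample RSS = {rss_own:.3e}, pooled-line RSS = {rss_pooled:.3e}")
```

The same inequality holds for any line other than the subsample OLS line, which is why it does not depend on the particular data generated here.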
[Figure: residuals for the regular schools in the pooled regression; RSS = 3.36 x 10^11]
Next we turn to the regular schools. Here are the residuals for the pooled regression.
[Figure: residuals for the regular schools in the regular-school subsample regression; RSS = 1.22 x 10^11]
And now the residuals for the same observations in the regression using only that
subsample.
Again, RSS must be lower for the subsample regression than for the pooled sample
regression.
Residual sum of squares (x 10^11)
Regression    Occupational    Regular        Total
Separate      RSS1 = 3.49     RSS2 = 1.22    4.71
Pooled        5.55            3.36           RSSP = 8.91
The table summarizes the RSS data for the two types of school in the separate and pooled
regressions.
The residual sums of squares for the separate regressions for the occupational and regular
schools will be denoted RSS1 and RSS2, respectively.
Adding them together, we get the total residual sum of squares when separate regressions
are run for the two subsamples.
We compare this total with RSSP, the residual sum of squares from the pooled sample
regression.
[Figure: residuals for all 74 schools in the pooled regression; RSSP = 8.91 x 10^11]
This is obtained directly from the original pooled regression. There is no need to calculate
the occupational and regular components. We are interested only in the total.
We are interested in seeing whether there is a significant reduction in the total when we run
separate regressions for the two subsamples.
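Using the figures in the table (in units of 10^11), the comparison reduces to the arithmetic below. The unrounded residual sums of squares from the three Stata outputs are included alongside and give essentially the same reduction.

```python
# Residual sums of squares in units of 10**11, rounded as in the table.
rss1, rss2, rss_p = 3.49, 1.22, 8.91
# Unrounded Residual SS from the three Stata outputs, same units.
rss1_exact, rss2_exact, rss_p_exact = 3.4895, 1.2150, 8.9160

separate_total = rss1 + rss2         # total RSS when separate regressions are run
reduction = rss_p - separate_total   # improvement in fit from splitting the sample

print(f"RSS1 + RSS2 = {separate_total:.2f}")       # RSS1 + RSS2 = 4.71
print(f"RSSP - (RSS1 + RSS2) = {reduction:.2f}")   # RSSP - (RSS1 + RSS2) = 4.20
print(f"using Stata values: {rss_p_exact - (rss1_exact + rss2_exact):.2f}")  # using Stata values: 4.21
```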
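$$F(k,\, n-2k) = \frac{\left(\mathrm{RSS}_P - [\mathrm{RSS}_1 + \mathrm{RSS}_2]\right)/k}{(\mathrm{RSS}_1 + \mathrm{RSS}_2)/(n-2k)}$$
Numerator: the overall reduction in RSS when separate regressions are run, divided by k, the cost in degrees of freedom.
Denominator: the total RSS remaining when separate regressions are run, divided by n – 2k, the degrees of freedom remaining.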
The test statistic is the F statistic defined as shown.
The first argument of the F statistic is k, the cost, in terms of degrees of freedom, of
running separate regressions.
The cost is k because two sets of k parameters are estimated when separate regressions
are run, instead of only one set with the pooled regression.
The second argument of the F statistic is n – 2k, the total number of degrees of freedom
remaining when separate regressions are run.
There are n observations in total, and each of the two separate regressions uses up k
degrees of freedom, leaving n – 2k.
The numerator of the F statistic consists of the overall improvement in the fit on splitting
the sample, divided by the cost in terms of degrees of freedom when separate regressions
are run.
The denominator of the F statistic is the total RSS remaining after splitting the sample,
divided by the number of degrees of freedom remaining.
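The definition translates directly into a small function. This is a sketch: the name `chow_f` and its argument order are our own choices, and it assumes the RSS values come from the pooled regression and from the two subsample regressions, each estimating the same k parameters.

```python
def chow_f(rss_p: float, rss1: float, rss2: float, n: int, k: int) -> float:
    """Chow test F statistic with (k, n - 2k) degrees of freedom.

    rss_p:      RSS from the pooled regression
    rss1, rss2: RSS from the two separate subsample regressions
    n:          total number of observations
    k:          number of parameters in each regression
    """
    df_remaining = n - 2 * k
    if df_remaining <= 0:
        raise ValueError("need n > 2k observations")
    rss_separate = rss1 + rss2
    numerator = (rss_p - rss_separate) / k        # reduction in RSS per degree of freedom used
    denominator = rss_separate / df_remaining     # remaining RSS per degree of freedom remaining
    return numerator / denominator
```

Because RSS values enter as a ratio, any common scale factor (such as 10^11) cancels and can be dropped.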
$$F(2,70) = \frac{\left(8.91\times10^{11} - [3.49\times10^{11} + 1.22\times10^{11}]\right)/2}{(3.49\times10^{11} + 1.22\times10^{11})/70} = 31.2$$
where $\mathrm{RSS}_P = 8.91\times10^{11}$ and $\mathrm{RSS}_1 + \mathrm{RSS}_2 = 4.71\times10^{11}$.
In the case of the school cost functions, the reduction in the residual sum of squares has
already been tabulated.
There are only two parameters in the model, the constant and the coefficient of N, so the
first argument of the F statistic is 2.
The residual sum of squares remaining after splitting the sample is the sum of RSS1 and
RSS2.
There are 74 observations and so there are 70 degrees of freedom remaining after
estimating two sets of parameters.
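Putting the school numbers in: with the rounded table values the statistic is 31.2, and with the unrounded residual sums of squares from the Stata outputs it is about 31.3, which makes no difference to the conclusion. The helper `f_stat` is a hypothetical name used only for this sketch, and the common factor 10^11 is dropped because it cancels.

```python
n, k = 74, 2              # 74 schools; two parameters (constant and coefficient of N)
df1, df2 = k, n - 2 * k   # degrees of freedom of the F statistic

def f_stat(rss_p, rss1, rss2):
    # (reduction in RSS / k) / (remaining RSS / (n - 2k)); the factor 10**11 cancels
    return ((rss_p - (rss1 + rss2)) / df1) / ((rss1 + rss2) / df2)

print(df1, df2)                                  # 2 70
print(f"{f_stat(8.91, 3.49, 1.22):.1f}")         # 31.2  (table values)
print(f"{f_stat(8.9160, 3.4895, 1.2150):.1f}")   # 31.3  (Stata values)
```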
The F statistic is thus 31.2. The critical value of F(2,70) is 7.6 at the 0.1% significance level.
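The critical value can be reproduced without tables. When the first degrees-of-freedom parameter is 2, the F distribution has a closed-form upper tail, P(F > x) = (1 + 2x/ν)^(-ν/2), where ν is the second parameter, so it can be inverted directly. This shortcut holds only for F(2, ν). For other values of k, a library routine such as `scipy.stats.f.ppf` or `scipy.stats.f.sf` would be needed. The function names below are our own.

```python
def f2_sf(x: float, nu: int) -> float:
    """Upper-tail probability P(F > x) for F(2, nu); closed form valid only for 2 numerator df."""
    return (1.0 + 2.0 * x / nu) ** (-nu / 2.0)

def f2_critical(alpha: float, nu: int) -> float:
    """Critical value c such that P(F(2, nu) > c) = alpha, obtained by inverting f2_sf."""
    return (nu / 2.0) * (alpha ** (-2.0 / nu) - 1.0)

crit = f2_critical(0.001, 70)
print(f"critical value of F(2,70) at 0.1%: {crit:.2f}")  # critical value of F(2,70) at 0.1%: 7.64
print(f"p-value for F = 31.2: {f2_sf(31.2, 70):.1e}")
```

The p-value is far below 0.001, consistent with the statistic being well above the 0.1% critical value.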
The reduction in the residual sum of squares is therefore significant at the 0.1% level. We
conclude that the pooled cost function is an inadequate specification and that we should
run separate regressions for the two types of school.
Copyright Christopher Dougherty 2016.
These slideshows may be downloaded by anyone, anywhere for personal use.
Subject to respect for copyright and, where appropriate, attribution, they may be
used as a resource for teaching an econometrics course. There is no need to
refer to the author.
The content of this slideshow comes from Section 5.4 of C. Dougherty,
Introduction to Econometrics, fifth edition 2016, Oxford University Press.
Additional (free) resources for both students and instructors may be
downloaded from the OUP Online Resource Centre
www.oxfordtextbooks.co.uk/orc/dougherty5e/.
Individuals studying econometrics on their own who feel that they might benefit
from participation in a formal course should consider the London School of
Economics summer school course
EC212 Introduction to Econometrics
http://www2.lse.ac.uk/study/summerSchools/summerSchool/Home.aspx
or the University of London International Programmes distance learning course
EC2020 Elements of Econometrics
www.londoninternational.ac.uk/lse.
2016.05.03