Block 3 MECE 001 Unit 10
Structure
10.0 Objectives
10.1 Introduction
10.2 The Nature of Dummy Variables
10.3 A Simple Dummy Variable Model
10.4 Use of More than One Qualitative Variable
10.5 Testing for Structural Stability of Regression Models
10.6 Use of Dummy Variables in Seasonal Analysis
10.7 Pooling Cross Sectional and Time Series Data
10.8 Let Us Sum Up
10.9 Key Words
10.10 Some Useful Books
10.11 Answers/Hints to Check Your Progress Exercises
10.0 OBJECTIVES
After going through this unit you will be able to:
explain the nature of dummy variables;
use dummy variables in regression models;
test for structural stability in dummy variable models; and
pool cross sectional and time series data by using dummy variables.
10.1 INTRODUCTION
In the linear regression models considered in the previous units, we have assumed that the explanatory variables (i.e., the Xs) are numerical or quantitative in nature. But this may not always be the case. There can be instances when the explanatory variable(s) are qualitative in nature. Such qualitative variables are often represented in a regression by dummy variables. The purpose of this unit is to consider the role of qualitative explanatory variables in regression analysis. It also shows how the use of dummy variables makes the linear regression model an extremely flexible tool for handling many interesting problems encountered in empirical studies.
10.2 THE NATURE OF DUMMY VARIABLES
In regression analysis the dependent variable is frequently influenced by two kinds of variables. Some can be readily quantified on a well-defined scale (e.g., income, output, prices, costs, weights, etc.). Others are essentially qualitative in nature (e.g., marital status, gender, religion, caste, etc.). For example, holding all other factors constant, IIT graduates are found to earn more than their counterparts from the regional engineering colleges in India. Similarly, studies in the US have reported that female college teachers earn less than their male counterparts. Whatever the reason for this disparity, qualitative variables like gender and institution of education do influence the dependent variable and should be included among the independent variables.
Such qualitative variables usually indicate the presence or absence of some attribute or quality, such as rural or urban, male or female, married or unmarried, etc. One can therefore quantify these attributes by constructing artificial variables that take the value 1 or 0. The value 1 indicates the presence (or possession) of a particular attribute and 0 its absence, or vice versa. For example, 1 may indicate that the person is male and 0 that the person is female. Or 1 may indicate that a person is educated and 0 that he/she is not educated, and so on. Such variables, which assume the values 0 and 1, are called dummy variables.
Like quantitative variables, dummy variables can be used in regression analysis very easily. In fact, a regression model may contain only dummy explanatory variables. Regression models containing only dummy explanatory variables are called analysis of variance (ANOVA) models. The following model is an example of an ANOVA model:
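$$Y_i = \alpha_1 + \alpha_2 D_i + u_i \qquad (10.1)$$
where $Y_i$ is the annual salary of a school teacher, $D_i = 1$ if the teacher is male and $D_i = 0$ if female. Assuming $E(u_i) = 0$,
$$E(Y_i \mid D_i = 0) = \alpha_1 + \alpha_2(0) = \alpha_1 \qquad (10.2)$$
$$E(Y_i \mid D_i = 1) = \alpha_1 + \alpha_2 \qquad (10.3)$$
In the above model the intercept term $\alpha_1$ gives the mean annual salary of a female school teacher. The slope coefficient $\alpha_2$ tells how much the mean salary of a male school teacher differs from that of his female counterpart, and $(\alpha_1 + \alpha_2)$ gives the mean annual salary of a male school teacher.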
We can also test the hypothesis that there is no discrimination by gender in determining the salary of school teachers. To do this, we run OLS on regression equation (10.1) and check, on the basis of the t-test, whether the estimated $\alpha_2$ is statistically significant.
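The following Python sketch shows how model (10.1) behaves under OLS. It assumes the salaries and the 0/1 gender dummy are plain lists. The function name `anova_dummy_ols` and all salary figures are hypothetical and are not the data of Table 10.1. The output shows three things:
- The intercept estimate equals the mean salary of the base group (female teachers).
- The dummy coefficient equals the male-female difference in mean salaries.
- The t-statistic is the one used to test H0: $\alpha_2 = 0$.

```python
import math

def anova_dummy_ols(y, d):
    """OLS of y on an intercept and a single 0/1 dummy d.

    Returns (a1, a2, t2): intercept, dummy coefficient and the
    t-statistic of the dummy coefficient for H0: a2 = 0.
    """
    n = len(y)
    d_bar = sum(d) / n
    y_bar = sum(y) / n
    sdd = sum((di - d_bar) ** 2 for di in d)
    sdy = sum((di - d_bar) * (yi - y_bar) for di, yi in zip(d, y))
    a2 = sdy / sdd                      # coefficient on the dummy
    a1 = y_bar - a2 * d_bar             # intercept
    resid = [yi - a1 - a2 * di for yi, di in zip(y, d)]
    sigma2 = sum(e * e for e in resid) / (n - 2)   # residual variance, n - 2 d.f.
    se_a2 = math.sqrt(sigma2 / sdd)
    return a1, a2, a2 / se_a2

female = [20.0, 22.0, 21.0]   # hypothetical salaries, illustrative only
male = [25.0, 27.0, 26.0]
y = female + male
d = [0] * len(female) + [1] * len(male)   # D = 1 for male, 0 for female
a1, a2, t2 = anova_dummy_ols(y, d)
print(a1, a2, round(t2, 2))   # 21.0 5.0 6.12
```

Here 21.0 is the female mean and 5.0 is the male minus female difference in means. The t-value of about 6.1 would then be compared with the critical t-value for $n - 2 = 4$ degrees of freedom.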
We shall demonstrate the above with the help of a hypothetical example. Consider Table 10.1, which gives hypothetical data on the annual salaries of school teachers.
Table 10.1: Annual Salaries of School Teachers
The OLS results corresponding to regression model (10.1) are as follows:
From (10.4) we see that the estimated mean salary of female school teachers ($\hat{\alpha}_1$) is
Fig. 10.2: Salary Function of Teachers (horizontal axis: teaching experience; separate lines for male and female teachers)
Just as in regression (10.4), here too one can use the t-test to test the hypothesis that male and female school teachers have the same mean annual salary.
Before proceeding further, it is essential to discuss some important features of dummy variables, which are as follows:
1) In the above models we have introduced only one dummy variable, D, to distinguish between two categories, male and female, with $D_i = 1$ denoting male and $D_i = 0$ denoting female. Now suppose that, instead of one dummy variable, two dummy variables $D_{1i}$ and $D_{2i}$ are introduced in the model, one each for male and female. What happens then? Model (10.5) can now be written as
where
$D_{1i}$ = 1 if a male teacher
       = 0 otherwise
$D_{2i}$ = 1 if a female teacher
       = 0 otherwise
Model (10.8) cannot be estimated because of perfect collinearity (i.e., an exact linear relationship) between $D_1$ and $D_2$ (see Unit 6 for the problem of multicollinearity). The following data table explains this more clearly.
Table 10.2: Example of Perfect Linear Relationship
Gender   Y    Intercept   D1   D2   X
Male     Y1   1           1    0    X1
Male     Y2   1           1    0    X2
Female   Y3   1           0    1    X3
Male     Y4   1           1    0    X4
Female   Y5   1           0    1    X5
From the above table it is easy to verify that $D_1$ and $D_2$ are perfectly collinear, as $D_1 = 1 - D_2$ or $D_2 = 1 - D_1$. Equivalently, $D_1 + D_2$ equals the intercept column of 1s. We know from the previous units that, in the case of perfect collinearity, it is not possible to estimate the various parameters.
There are, however, a number of ways of resolving this problem. The simplest is to assign the dummies as we did in model (10.5) and use only one dummy variable when a qualitative variable has two categories.
Rule of Thumb: If a qualitative variable has m categories and the model contains an intercept term, introduce only (m - 1) dummy variables.
Thus if a qualitative variable has 4 categories, introduce only 3 dummy variables. If this rule is not followed, we fall into what is known as the dummy variable trap, i.e., a situation of perfect multicollinearity.
2) The assignment of the values 0 and 1 to two categories, like rural and urban or educated and uneducated, is arbitrary. For example, in model (10.5) we assigned 1 to male teachers and 0 to female teachers. We could instead have assigned 1 to female teachers and 0 to male teachers, and the coefficients would change accordingly. What matters in such a case is the interpretation of the results. Thus, in interpreting the results of models that use dummy variables, it is critical to know how the values 1 and 0 have been assigned.
The category that is assigned the value 0 is often referred to as the base or benchmark category, and all comparisons are made with reference to this category. In model (10.5) the female school teacher, who is assigned the value 0, is the base category.
3) The coefficient attached to the dummy variable (for example, β in model 10.5) is referred to as the differential intercept coefficient. It tells by how much the intercept of the category that receives the value 1 differs from that of the base category.
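The dummy variable trap and the (m - 1) rule can be checked numerically. In this minimal Python sketch the helper names `make_dummies` and `rank` are hypothetical. The gender pattern follows Table 10.2, while the X values are made up. With an intercept and one dummy per category, the design matrix is one short of full column rank, so OLS cannot be computed. Dropping the base category restores full column rank.

```python
from fractions import Fraction

def make_dummies(labels, base):
    """Encode a qualitative variable with m categories as m - 1 dummy columns.

    The base (benchmark) category is coded as all zeros.
    """
    cats = sorted(set(labels) - {base})
    rows = [[1 if lab == c else 0 for c in cats] for lab in labels]
    return cats, rows

def rank(matrix):
    """Rank of a small matrix by exact Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in matrix]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue                    # no pivot: column depends on earlier ones
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                f = m[i][col] / m[r][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

# Gender pattern of Table 10.2; the X values are hypothetical.
gender = ["male", "male", "female", "male", "female"]
X = [5, 8, 3, 10, 6]

# Dummy trap: intercept plus one dummy per category (D1 + D2 = intercept column).
trap = [[1, int(g == "male"), int(g == "female"), x] for g, x in zip(gender, X)]
print(rank(trap), "of", len(trap[0]), "columns")            # 3 of 4 columns

# (m - 1) rule with "female" as the base category.
cats, D = make_dummies(gender, base="female")
safe = [[1] + d + [x] for d, x in zip(D, X)]
print(cats, rank(safe), "of", len(safe[0]), "columns")      # ['male'] 3 of 3 columns
```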
A variety of hypotheses can be tested by Ordinary Least Squares estimation of model (10.9). For example, we can test the significance of the differential intercept β, of γ, or of both β and γ. This tells us which of the possibilities holds without running a separate regression for each combination of gender and type of education.
In a similar fashion we can extend the model to include more than one quantitative variable and more than two qualitative variables. However, we must always keep in mind that the number of dummies for each qualitative variable should be one less than the number of categories of that variable.
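The following sketch applies the one-less-than-categories rule to two qualitative variables at once, under stated assumptions. The model is taken to be $Y = \alpha + \beta D_{male} + \gamma D_{educated} + \delta X + u$, where only β and γ follow the text's notation. The symbols α and δ, the data and the helper `ols` (an exact normal-equation solver) are hypothetical. Because the illustrative data are noise-free, OLS recovers the assumed coefficients exactly.

```python
from fractions import Fraction

def ols(X, y):
    """OLS coefficients b solving the normal equations (X'X) b = X'y exactly."""
    k = len(X[0])
    xtx = [[sum(Fraction(r[i]) * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(Fraction(r[i]) * v for r, v in zip(X, y)) for i in range(k)]
    a = [row + [b] for row, b in zip(xtx, xty)]        # augmented matrix [X'X | X'y]
    for c in range(k):                                 # Gauss-Jordan elimination
        p = next(i for i in range(c, k) if a[i][c] != 0)   # StopIteration if singular
        a[c], a[p] = a[p], a[c]
        a[c] = [v / a[c][c] for v in a[c]]
        for i in range(k):
            if i != c:
                f = a[i][c]
                a[i] = [vi - f * vc for vi, vc in zip(a[i], a[c])]
    return [a[i][k] for i in range(k)]

# Hypothetical observations: (male dummy, educated dummy, quantitative X)
obs = [(1, 1, 2), (1, 0, 4), (0, 1, 3), (0, 0, 5), (1, 1, 6), (0, 0, 1)]
alpha, beta, gamma, delta = 10, 3, 5, 2          # assumed "true" coefficients
X = [[1, m, e, x] for m, e, x in obs]            # one dummy per 2-category variable
y = [alpha + beta * m + gamma * e + delta * x for m, e, x in obs]   # noise-free
coef = ols(X, y)
print([str(b) for b in coef])                    # ['10', '3', '5', '2']
```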
10.5 TESTING FOR STRUCTURAL STABILITY OF REGRESSION MODELS
In the regression models discussed so far in this unit, we have assumed that the qualitative variables affect only the intercept term and not the slope coefficient. But what happens if the slope coefficients are also affected by the qualitative variables? In such situations, testing for differences in the intercepts alone will be of little use. We therefore need a methodology that identifies whether the differences between two or more regressions arise from the intercepts, the slopes, or both. To understand this problem, let us consider the following example.
Suppose we are interested in estimating a simple savings function for India over the period 1980-81 to 2002-03. It relates domestic household savings (S) to gross domestic product (Y), and the relevant data are given in Table 10.3.

One way of proceeding is simply to run an OLS regression of S on Y for the entire period 1980-81 to 2002-03. This assumes that the relation between savings and GDP does not change over the entire period. But this may not be so. In 1991 India introduced a series of economic reforms that brought about substantial changes in its economic system. These reforms would also have considerably influenced the savings-income relationship.

Our objective is therefore to check whether the savings-income relationship underwent a structural change between the two time periods. By structural change we mean that the parameters of the savings function have changed.
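One way of testing whether the savings function has undergone a structural change is to use the Chow test, which has been discussed in detail in Unit 3. Following the procedure of the Chow test, we divide the period 1980-81 to 2002-03 into two sub-periods: the pre-reform period (1980-81 to 1991-92) and the post-reform period (1992-93 to 2002-03). The savings functions for the two periods can now be written as
$$\text{Pre-reform period:} \quad S_t = A_1 + A_2 Y_t + u_{1t} \qquad (10.14)$$
$$\text{Post-reform period:} \quad S_t = B_1 + B_2 Y_t + u_{2t} \qquad (10.15)$$
Comparing the two regressions, there are four possibilities: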
a) $A_1 = B_1$ and $A_2 = B_2$, i.e., the two regressions are identical. This is the case of coincident regressions (refer to Fig. 10.3a).
b) $A_1 \neq B_1$ but $A_2 = B_2$, i.e., the two regressions differ only in their location, or intercepts. This is the case of parallel regressions (refer to Fig. 10.3b).
c) $A_1 = B_1$ but $A_2 \neq B_2$, i.e., the two regressions have the same intercept term but different slopes. This is the case of concurrent regressions (refer to Fig. 10.3c).
d) $A_1 \neq B_1$ and $A_2 \neq B_2$, i.e., the two regressions are completely different. This is the case of dissimilar regressions (refer to Fig. 10.3d).
Using the data on household savings and gross domestic product for India in Table 10.3, we can run the two regressions (10.14) and (10.15). We can then apply the Chow test to see whether the savings function underwent a structural change between the two time periods.
(Students are advised to do this as an assignment.)
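The Chow test compares two residual sums of squares. The restricted RSS comes from one pooled regression. The unrestricted RSS is the sum of the RSS from the two sub-period regressions. The test statistic is $F = \frac{(RSS_R - RSS_U)/k}{RSS_U/(n_1 + n_2 - 2k)}$.

The Python sketch below implements this for a simple regression with k = 2. The function names `rss_simple` and `chow_f` are hypothetical, and the data are invented for illustration rather than taken from Table 10.3.

```python
def rss_simple(x, y):
    """Residual sum of squares from OLS of y on an intercept and x."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    sxx = sum((xi - xb) ** 2 for xi in x)
    b = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y)) / sxx
    a = yb - b * xb
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))

def chow_f(x1, y1, x2, y2, k=2):
    """Chow F statistic for H0: the two sub-period regressions are identical.

    k = parameters per regression (intercept and slope). Under H0 and the
    classical assumptions, F follows the F(k, n1 + n2 - 2k) distribution.
    """
    rss_r = rss_simple(x1 + x2, y1 + y2)              # restricted: pooled regression
    rss_u = rss_simple(x1, y1) + rss_simple(x2, y2)   # unrestricted: separate regressions
    df = len(x1) + len(x2) - 2 * k
    return ((rss_r - rss_u) / k) / (rss_u / df)

# Hypothetical data: a small alternating disturbance keeps every RSS above zero.
x = list(range(10))
e = [0.1 if i % 2 == 0 else -0.1 for i in range(10)]
y_pre = [1 + 2 * xi + ei for xi, ei in zip(x, e)]
y_post_same = [1 + 2 * xi + ei for xi, ei in zip(x, e)]    # no structural change
y_post_break = [5 + 3 * xi + ei for xi, ei in zip(x, e)]   # intercept and slope shift

f_same = chow_f(x, y_pre, x, y_post_same)     # essentially zero
f_break = chow_f(x, y_pre, x, y_post_break)   # very large
```

If SciPy is available, a p-value for the break case could be obtained with `scipy.stats.f.sf(f_break, 2, 16)`, since $n_1 + n_2 - 2k = 16$ here.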
The same question can be examined with dummy variables by pooling all the observations and estimating
$$S_t = a + b D_t + c Y_t + d (D_t Y_t) + u_t \qquad (10.16)$$
where $D_t = 1$ for observations in the post-reform period and $D_t = 0$ for observations in the pre-reform period.
In model (10.16) the parameter b is the differential intercept. The parameter d is the differential slope coefficient: it indicates how much the slope of the savings function in the post-reform period differs from that in the pre-reform period. The additive dummy ($D_t$) enables us to distinguish between the intercepts of the two periods. In the same way, the multiplicative dummy ($D_t Y_t$) enables us to differentiate between the slope coefficients of the two periods.
To see the implications of model (10.16), assume $E(u_t) = 0$. We then get
$$E(S_t \mid D_t = 0, Y_t) = a + c Y_t$$
$$E(S_t \mid D_t = 1, Y_t) = (a + b) + (c + d) Y_t$$
These are, respectively, the mean savings functions for the pre-reform and post-reform periods, as represented by models (10.14) and (10.15), with $a = A_1$, $c = A_2$, $(a + b) = B_1$ and $(c + d) = B_2$. Thus, with the help of dummy variables, a single regression can be used to obtain both sub-period regressions.
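The sketch below checks numerically that a single regression with an additive and a multiplicative dummy reproduces both sub-period regressions. It fits model (10.16) to noise-free data generated from assumed values of $A_1$, $A_2$, $B_1$ and $B_2$. These values are hypothetical and are not the Table 10.3 estimates. The helper `ols` is a hypothetical exact normal-equation solver. The fitted a and c return the pre-reform line, and a + b and c + d return the post-reform line.

```python
from fractions import Fraction

def ols(X, y):
    """OLS coefficients b solving the normal equations (X'X) b = X'y exactly."""
    k = len(X[0])
    xtx = [[sum(Fraction(r[i]) * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(Fraction(r[i]) * v for r, v in zip(X, y)) for i in range(k)]
    a = [row + [b] for row, b in zip(xtx, xty)]        # augmented matrix [X'X | X'y]
    for c in range(k):                                 # Gauss-Jordan elimination
        p = next(i for i in range(c, k) if a[i][c] != 0)   # StopIteration if singular
        a[c], a[p] = a[p], a[c]
        a[c] = [v / a[c][c] for v in a[c]]
        for i in range(k):
            if i != c:
                f = a[i][c]
                a[i] = [vi - f * vc for vi, vc in zip(a[i], a[c])]
    return [a[i][k] for i in range(k)]

# Assumed (hypothetical) sub-period savings functions:
A1, A2 = Fraction(2), Fraction(1, 10)      # pre-reform:  S = A1 + A2*Y
B1, B2 = Fraction(5), Fraction(1, 4)       # post-reform: S = B1 + B2*Y
pre_Y, post_Y = [10, 20, 30, 40], [50, 60, 70, 80]

rows, S = [], []
for D, Ys, (i0, s0) in ((0, pre_Y, (A1, A2)), (1, post_Y, (B1, B2))):
    for Y in Ys:
        rows.append([1, D, Y, D * Y])      # intercept, additive dummy, Y, multiplicative dummy
        S.append(i0 + s0 * Y)              # noise-free savings

a, b, c, d = ols(rows, S)
print(a, b, c, d)                  # 2 3 1/10 3/20
print(a + b == B1, c + d == B2)    # True True
```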
Estimating regression (10.16) with the savings and income data given in Table 10.3, we get
The profit function of the departmental store depends on its sales. Let us represent the profit function as
D = 1 if the observation lies in the fourth quarter
  = 0 otherwise
In the above model we have incorporated the seasonal variation through the four quarters. The qualitative variable 'season' is assumed to have four categories, which requires three dummies in the model. As the model is designed, the first quarter is the base or benchmark quarter. The seasonal variation enters through differential intercepts. Each differential intercept tells us by how much the mean value of Y (i.e., the mean profit) in that quarter differs from that in the base (first) quarter. Using the data in Table 10.4, the regression results of (10.22) are as follows:
All the differential intercepts in this regression are statistically significant. This implies that the average profit in each of the second, third and fourth quarters differs from that in the first (base) quarter. The average profit is highest in the third quarter, when sales are highest on account of the festive season in the country.
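The following minimal sketch illustrates the seasonal model with quarter 1 as the benchmark, where three quarterly dummies enter alongside sales. The exact form of (10.22) is not reproduced here. The specification profit = β1 + β2D2 + β3D3 + β4D4 + β5·sales is therefore an assumption, as are the helper names and the data (not Table 10.4). With noise-free data, OLS recovers the assumed differential intercepts exactly.

```python
from fractions import Fraction

def quarter_dummies(q):
    """Dummies (D2, D3, D4) for quarter q in 1..4; quarter 1 is the base."""
    return [1 if q == k else 0 for k in (2, 3, 4)]

def ols(X, y):
    """OLS coefficients b solving the normal equations (X'X) b = X'y exactly."""
    k = len(X[0])
    xtx = [[sum(Fraction(r[i]) * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(Fraction(r[i]) * v for r, v in zip(X, y)) for i in range(k)]
    a = [row + [b] for row, b in zip(xtx, xty)]        # augmented matrix [X'X | X'y]
    for c in range(k):                                 # Gauss-Jordan elimination
        p = next(i for i in range(c, k) if a[i][c] != 0)   # StopIteration if singular
        a[c], a[p] = a[p], a[c]
        a[c] = [v / a[c][c] for v in a[c]]
        for i in range(k):
            if i != c:
                f = a[i][c]
                a[i] = [vi - f * vc for vi, vc in zip(a[i], a[c])]
    return [a[i][k] for i in range(k)]

# Hypothetical (quarter, sales) data for two years; not Table 10.4.
data = [(1, 100), (2, 120), (3, 150), (4, 130),
        (1, 110), (2, 125), (3, 160), (4, 128)]
# Assumed coefficients: intercept, Q2, Q3, Q4 differential intercepts, sales slope
coef_true = [10, 2, 8, 4, Fraction(1, 5)]
X = [[1] + quarter_dummies(q) + [s] for q, s in data]
profit = [sum(c * v for c, v in zip(coef_true, row)) for row in X]   # noise-free
coef = ols(X, profit)
print([str(b) for b in coef])   # ['10', '2', '8', '4', '1/5']
```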
Note: In the above model we have assumed that the seasonal variations affect only the intercept term and not the slope term. But this may not be so in reality. To find out whether the seasonal variations affect the intercept, the slope or both, we use the technique of differential intercept and differential slope coefficients discussed in the previous section (model 10.16). Applying this method, we can rewrite model (10.22) as regression model (10.24). We then test for the significance of the differential intercept and differential slope terms.
Here the differential slope coefficients $A_n$, $A_p$ and $A_g$ tell us by how much the slope coefficients of the second, third and fourth quarters, respectively, differ from that of the base (first) quarter.
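When differential slopes are added, each observation's regressors include the three quarterly dummies and their products with sales. This Python sketch assumes the regressor ordering [1, D2, D3, D4, X, D2·X, D3·X, D4·X]. That ordering and the helper names are hypothetical, not the notation of (10.24). The second function converts a fitted coefficient vector into the intercept and slope implied for each quarter.

```python
def design_row(q, x):
    """Regressors for the seasonal model with differential intercepts and slopes.

    Order: [1, D2, D3, D4, X, D2*X, D3*X, D4*X]; quarter 1 is the base.
    """
    d = [1 if q == k else 0 for k in (2, 3, 4)]
    return [1] + d + [x] + [dk * x for dk in d]

def quarter_line(coef, q):
    """Intercept and slope implied for quarter q by a coefficient vector."""
    base_int, base_slope = coef[0], coef[4]
    if q == 1:
        return base_int, base_slope
    j = q - 2                      # position of quarter q among D2, D3, D4
    return base_int + coef[1 + j], base_slope + coef[5 + j]

print(design_row(3, 10))           # [1, 0, 1, 0, 10, 0, 10, 0]
hypothetical_coef = [10, 2, 8, 4, 1, 2, 3, -1]
print(quarter_line(hypothetical_coef, 3))   # (18, 4)
```

A t-test on the three product-term coefficients then indicates whether the slope differs by quarter.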
Consider Table 10.5, which gives data on energy demand (in million tonnes of oil equivalent) and the value of output (in Rs million) for three sectors of the Indian economy. This is an example of pooled cross-section and time series data. We are interested in the relation between energy demand and the value of output for three sectors of the economy. For each sector we have data for 18 years, from 1980-81 to 1997-98.
There are a number of ways to study the relationship between energy demand and the value of output. First, we can run the following time series regression for each sector separately:
Using the dummy variable technique discussed earlier, or the Chow test, one can find out whether the parameters of these demand functions are the same.
The second way is to estimate a cross-sectional regression for each year. In that case there would be one regression for each of the 18 years, giving a total of 18 regressions to be estimated.
The third way is to pool all 54 observations (18 time series observations for each of the three sectors) and estimate the following regression
where i stands for the i-th sector and t for the t-th time period.
(In (10.27) we have assumed that only the intercept terms differ across sectors, not the slope terms. Readers can allow both the slope coefficients and the intercept terms to differ across sectors and test for their significance themselves.)
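For the pooled regression, the 54 sector-year observations are stacked and two sector dummies are added (one less than the three sectors). The sketch below is hypothetical throughout. The sector labels, the helper `pooled_rows` and the placeholder output values stand in for Table 10.5, which is not reproduced here. Only intercept dummies are included, matching the assumption stated for (10.27).

```python
SECTORS = ["sector_1", "sector_2", "sector_3"]      # hypothetical labels
# 18 financial years, 1980-81 ... 1997-98
YEARS = [f"{y}-{(y + 1) % 100:02d}" for y in range(1980, 1998)]

def pooled_rows(output):
    """Stack sector-year observations with sector intercept dummies.

    output maps (sector, year) to the value of output. The first sector is
    the base category, so 3 sectors need only 2 dummies.
    Returns (sector, year, regressors) with regressors = [1, D2, D3, output].
    """
    rows = []
    for i, s in enumerate(SECTORS):
        dummies = [1 if i == j else 0 for j in range(1, len(SECTORS))]
        for t in YEARS:
            rows.append((s, t, [1] + dummies + [output[(s, t)]]))
    return rows

# Placeholder values only; real data would come from Table 10.5.
output = {(s, t): 1.0 for s in SECTORS for t in YEARS}
rows = pooled_rows(output)
print(len(rows))   # 54
```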
Estimating (10.27) using the data in Table 10.5, we get
2) Use the data given in Table 10.3 and run the regression given in equation (10.24). Are the differential intercept and differential slope coefficients statistically significant?
10.10 SOME USEFUL BOOKS
Gujarati, D.N., 1999, Essentials of Econometrics, Second edition, Irwin McGraw-Hill, New Delhi.
Gujarati, D.N., 2005, Basic Econometrics, Fourth edition, Tata McGraw-Hill, New Delhi.
Johnston, J., and J. DiNardo, 1997, Econometric Methods, McGraw-Hill Co., New York.
Koutsoyiannis, A., 1977, Theory of Econometrics: An Introductory Exposition of Econometric Methods, Macmillan Press Ltd, London.
Maddala, G.S., 1977, Econometrics, McGraw-Hill Kogakusha Ltd, Tokyo.
Maddala, G.S., 1992, Introduction to Econometrics, Second edition, Macmillan Publishers, New York.
10.11 ANSWERS/HINTS TO CHECK YOUR PROGRESS EXERCISES
2) Go through Section 10.3 and formulate the model with a differential slope coefficient. Test the model on the basis of the t-test of the dummy variable coefficients.