Chapter Three QM

Chapter Three discusses the use of dummy variables in regression analysis, emphasizing their role in incorporating qualitative information such as sex and race into models. It explains how dummy variables can represent categorical effects and how to determine the number of dummy variables needed based on the number of categories. The chapter also illustrates regression models with one or more qualitative variables, demonstrating how to interpret the results and test for discrimination.

Chapter Three

By: Habtamu Legese (Asst.Prof)


Regression Analysis with Qualitative Information

3.1 The nature of dummy variables
• In regression analysis the dependent variable is frequently influenced not only by variables that can be readily quantified on some well-defined scale, but also by variables that are essentially qualitative in nature (e.g., sex, race, colour, religion, nationality, wars, earthquakes, strikes, political upheavals, and changes in government economic policy).

• For example, holding all other factors constant, female daily wage workers are
found to earn less than their male counterparts, and nonwhites are found
to earn less than whites.

• This pattern may result from sex or racial discrimination, but whatever the
reason, qualitative variables such as sex and race do influence the dependent
variable and clearly should be included among the explanatory variables.

• Qualitative variables usually indicate the presence or absence of a
“quality” or an attribute, such as male or female, black or white, or
Christian or Muslim.

• One method of “quantifying” such attributes is by constructing artificial variables that take on values of 1 or 0, 0 indicating the absence of an attribute and 1 indicating the presence (or possession) of that attribute.

• For example, 1 may indicate that a person is a male, and 0 may
designate a female; or 1 may indicate that a person is a college
graduate, and 0 that he is not, and so on.

• Variables that assume such 0 and 1 values are called dummy variables.

• Alternative names are indicator variables, binary variables, categorical variables, and dichotomous variables.

• Dummy variables can be used in regression models just as easily as quantitative
variables. As a matter of fact, a regression model may contain explanatory
variables that are exclusively dummy, or qualitative, in nature.

Example: $Y_i = \alpha + \beta D_i + u_i$ ------------------------------------------(1.01)

where $Y_i$ = annual salary of a college professor
$D_i$ = 1 if male college professor
      = 0 otherwise (i.e., female professor)

• Model (1.01) may enable us to find out whether sex makes any difference in a
college professor’s salary, assuming, of course, that all other variables such as age,
degree attained, and years of experience are held constant.

• Assuming that the disturbance satisfies the usual assumptions of the classical linear regression model, we obtain from (1.01):

Mean salary of female college professor: $E(Y_i \mid D_i = 0) = \alpha$ -------(1.02)

Mean salary of male college professor: $E(Y_i \mid D_i = 1) = \alpha + \beta$

• That is, the intercept term $\alpha$ gives the mean salary of female college professors, and the slope coefficient $\beta$ tells by how much the mean salary of a male college professor differs from the mean salary of his female counterpart, $\alpha + \beta$ reflecting the mean salary of the male college professor.

• A test of the null hypothesis that there is no sex discrimination ($H_0: \beta = 0$) can easily be made by running regression (1.01) in the usual manner and finding out whether, on the basis of the t test, the estimated $\beta$ is statistically significant.
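
The interpretation of the intercept and slope in a dummy-only regression such as (1.01) can be checked numerically. The sketch below is only an illustration: the salary figures are made up, and `ols_simple` is a hypothetical helper written for this example. It shows that the OLS intercept equals the mean of the base group (D = 0) and that intercept plus slope equals the mean of the group with D = 1.

```python
# Hypothetical salaries (in thousands); the numbers are made up for illustration.
salary = [22.0, 19.0, 18.0, 21.7, 18.5, 21.0, 20.5, 17.0, 17.5, 21.2]
male = [1, 0, 0, 1, 0, 1, 1, 0, 0, 1]   # D_i = 1 if male, 0 if female


def ols_simple(x, y):
    """OLS estimates (a, b) for y = a + b*x + u with a single regressor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


alpha_hat, beta_hat = ols_simple(male, salary)

# Group means computed directly from the data
female_mean = sum(s for s, d in zip(salary, male) if d == 0) / male.count(0)
male_mean = sum(s for s, d in zip(salary, male) if d == 1) / male.count(1)

print(alpha_hat, female_mean)            # intercept next to the female mean
print(alpha_hat + beta_hat, male_mean)   # intercept + slope next to the male mean
```

Testing $H_0: \beta = 0$ in this setting is therefore a test of whether the two group means differ.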


A. Dummy Independent Variable Models

3.2 Regression on one quantitative variable and one qualitative variable with
two classes, or categories

Consider the model: $Y_i = \alpha_1 + \alpha_2 D_i + \beta X_i + u_i$ ---------------(1.03)

where $Y_i$ = annual salary of a college professor
$X_i$ = years of teaching experience
$D_i$ = 1 if male
      = 0 otherwise

• Model (1.03) contains one quantitative variable (years of teaching
experience) and one qualitative variable (sex) that has two classes (or
levels, classifications, or categories), namely, male and female.

What is the meaning of this equation? Assuming, as usual, that $E(u_i) = 0$, we see that

Mean salary of female college professor: $E(Y_i \mid X_i, D_i = 0) = \alpha_1 + \beta X_i$ ---------(1.04)

Mean salary of male college professor: $E(Y_i \mid X_i, D_i = 1) = (\alpha_1 + \alpha_2) + \beta X_i$ ------(1.05)

• Geometrically, we have the situation shown in fig. 1.1 (for illustration, it is assumed that $\alpha_2 > 0$). In words, model (1.03) postulates that the male and female college professors' salary functions in relation to the years of teaching experience have the same slope but different intercepts.

• In other words, it is assumed that the level of the male professor's mean salary is different from that of the female professor's mean salary (by $\alpha_2$), but the rate of change in the mean annual salary by years of experience is the same for both sexes.

• If the assumption of common slopes is valid, a test of the hypothesis that the two regressions (1.04) and (1.05) have the same intercept (i.e., there is no sex discrimination) can be made easily by running the regression (1.03) and noting the statistical significance of the estimated $\alpha_2$ on the basis of the traditional t test.

• If the t test shows that $\hat{\alpha}_2$ is statistically significant, we reject the null hypothesis that the male and female college professors' levels of mean annual salary are the same.
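
The t test on the differential intercept can be sketched on simulated data. Everything in the block is an assumption made for illustration: the sample size, the "true" coefficients (20, 3 and 0.8), the noise level and the variable names. It uses NumPy's least-squares routine and the textbook OLS standard-error formula rather than any particular econometrics package.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 40
male = np.arange(n) % 2                  # D_i: alternating 0/1, purely illustrative
exper = rng.uniform(1, 30, size=n)       # X_i: years of teaching experience
# Simulated salaries with assumed alpha_1 = 20, alpha_2 = 3, beta = 0.8
salary = 20 + 3 * male + 0.8 * exper + rng.normal(0, 1, size=n)

# Design matrix: intercept, sex dummy, experience
X = np.column_stack([np.ones(n), male, exper])
coef, *_ = np.linalg.lstsq(X, salary, rcond=None)

# Classical OLS standard errors: sigma^2 * (X'X)^{-1}
resid = salary - X @ coef
k = X.shape[1]
sigma2 = resid @ resid / (n - k)
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
t_stats = coef / se

print("alpha_2 estimate:", coef[1])
print("t statistic for H0: alpha_2 = 0:", t_stats[1])
```

A t statistic that is large in absolute value relative to the critical value leads to rejecting the hypothesis of equal intercepts, as described above.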


• Before proceeding further, note the following features of the dummy variable regression model considered previously:

1. To distinguish the two categories, male and female, we have introduced only one dummy variable, $D_i$. For if $D_i = 1$ always denotes a male, when $D_i = 0$ we know that it is a female, since there are only two possible outcomes. Hence, one dummy variable suffices to distinguish two categories. The general rule is this: if a qualitative variable has ‘m’ categories, introduce only ‘m-1’ dummy variables. In our example, sex has two categories, and hence we introduced only a single dummy variable. If this rule is not followed, we will fall into what might be called the dummy variable trap, that is, the situation of perfect multicollinearity.

2. The assignment of 1 and 0 values to two categories, such as male and female, is arbitrary in the sense that in our example we could have assigned D = 1 for female and D = 0 for male.

3. The group, category, or classification that is assigned the value of 0 is often referred to as the base, benchmark, control, comparison, reference, or omitted category. It is the base in the sense that comparisons are made with that category.

4. The coefficient $\alpha_2$ attached to the dummy variable D can be called the differential intercept coefficient because it tells by how much the value of the intercept term of the category that receives the value of 1 differs from the intercept coefficient of the base category.



What is a dummy variable?
• In statistics and econometrics, particularly in regression analysis,
a dummy variable is one that takes only the value 0 or 1 to
indicate the absence or presence of some categorical effect that
may be expected to shift the outcome.

What is the purpose of dummy variables?
• Dummy variables are useful because they enable us to use a single regression equation to represent multiple groups.

• This means that we don't need to write out separate equation models for each subgroup.

• The dummy variables act like 'switches' that turn various parameters on and off in an equation.

How do you determine the number of dummy variables?

• The first step in this process is to decide the number of dummy variables.
• This is easy; it's simply k-1, where k is the number of levels of the original variable.
• You could also create dummy variables for all levels in the original variable, and simply drop one from each analysis.

Is 0 male or female?

• In the case of gender, there is typically no natural reason to code the variable female = 0, male = 1, versus male = 0, female = 1.
• However, convention may suggest one coding is more familiar to a reader; or choosing a coding that makes the regression coefficient positive may ease interpretation.

Can dummy variables be 1 and 2?
•Technically, dummy variables are dichotomous,
quantitative variables.
•Their range of values is small; they can take on only two
quantitative values.
•As a practical matter, regression results are easiest to
interpret when dummy variables are limited to two
specific values, 1 or 0.

Why do we drop one dummy variable?
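
Including a dummy for every category together with an intercept makes the dummy columns add up exactly to the intercept column, which is the perfect multicollinearity (dummy variable trap) described earlier. The sketch below uses made-up data and NumPy's `matrix_rank` to show that the design matrix loses full column rank when both a male and a female dummy are included; the variable names are illustrative only.

```python
import numpy as np

male = np.array([1, 0, 0, 1, 0, 1, 1, 0])   # made-up sample
female = 1 - male
intercept = np.ones_like(male)

X_trap = np.column_stack([intercept, male, female])   # m dummies plus intercept
X_ok = np.column_stack([intercept, male])             # m - 1 dummies plus intercept

rank_trap = np.linalg.matrix_rank(X_trap)
rank_ok = np.linalg.matrix_rank(X_ok)

print(rank_trap, X_trap.shape[1])   # rank is smaller than the number of columns
print(rank_ok, X_ok.shape[1])       # rank equals the number of columns
```

With a rank-deficient design matrix, X'X cannot be inverted, so the OLS coefficients are not uniquely determined; dropping one dummy restores full rank.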

3.3 Regression on one quantitative variable and one qualitative variable with
more than two classes

• Suppose that, on the basis of cross-sectional data, we want to regress the annual expenditure on health care by an individual on the income and education of the individual.
• Since the variable education is qualitative in nature, suppose we consider three mutually exclusive levels of education: less than high school, high school, and college.

•Now, unlike the previous case, we have more than two
categories of the qualitative variable education.
•Therefore, following the rule that the number of dummies
be one less than the number of categories of the variable,
we should introduce two dummies to take care of the three
levels of education.
•Assuming that the three educational groups have a
common slope but different intercepts in the regression of
annual expenditure on health care on annual income, we can
use the following model:

$Y_i = \alpha_1 + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \beta X_i + u_i$ --------------------------(1.06)

where $Y_i$ = annual expenditure on health care
$X_i$ = annual income
$D_2$ = 1 if high school education
      = 0 otherwise
$D_3$ = 1 if college education
      = 0 otherwise

• Note that in the preceding assignment of the dummy variables we are arbitrarily treating the “less than high school education” category as the base category. Therefore, the intercept $\alpha_1$ will reflect the intercept for this category.

• The differential intercepts $\alpha_2$ and $\alpha_3$ tell by how much the intercepts of the other two categories differ from the intercept of the base category, which can be readily checked as follows.

• Assuming $E(u_i) = 0$, we obtain from (1.06):

$E(Y_i \mid D_2 = 0, D_3 = 0, X_i) = \alpha_1 + \beta X_i$
$E(Y_i \mid D_2 = 1, D_3 = 0, X_i) = (\alpha_1 + \alpha_2) + \beta X_i$
$E(Y_i \mid D_2 = 0, D_3 = 1, X_i) = (\alpha_1 + \alpha_3) + \beta X_i$

• which are, respectively, the mean health care expenditure functions for the three levels of education, namely, less than high school, high school, and college.

• Geometrically, the situation is shown in fig 1.2 (for illustrative purposes it is assumed that $\alpha_3 > \alpha_2$).
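
The m-1 coding for a three-category variable can be sketched as follows. The data, the category labels (`less_hs`, `hs`, `college`) and the "true" coefficients are all assumptions made for illustration; the outcome is generated without noise so that least squares recovers the assumed intercept shifts exactly.

```python
import numpy as np

# Hypothetical data: education level and annual income (labels and numbers are illustrative)
education = ["less_hs", "hs", "college", "hs", "less_hs", "college", "hs", "college"]
income = np.array([20.0, 35.0, 60.0, 30.0, 25.0, 80.0, 40.0, 70.0])

# Two dummies for three categories; "less_hs" is the omitted (base) category
d2 = np.array([1.0 if e == "hs" else 0.0 for e in education])       # D_2
d3 = np.array([1.0 if e == "college" else 0.0 for e in education])  # D_3

# Noise-free outcome built from assumed alpha_1 = 1.0, alpha_2 = 0.5, alpha_3 = 1.2, beta = 0.1
health = 1.0 + 0.5 * d2 + 1.2 * d3 + 0.1 * income

X = np.column_stack([np.ones(len(income)), d2, d3, income])
coef, *_ = np.linalg.lstsq(X, health, rcond=None)
a1, a2, a3, b = coef

# Intercept of each education group, all sharing the common slope b
intercepts = {"less_hs": a1, "hs": a1 + a2, "college": a1 + a3}
print(intercepts, b)
```

Each group's regression line has the same slope on income; only the intercept shifts by the corresponding differential intercept.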

3.4 Regression on one quantitative variable and two
qualitative variables
The technique of dummy variable can be easily extended to
handle more than one qualitative variable.
Let us revert to the college professors’ salary regression
(1.03), but now assume that in addition to years of teaching
experience and sex the skin color of the teacher is also an
important determinant of salary.
For simplicity, assume that colour has two categories: black
and white

• We can now write (1.03) as:

$Y_i = \alpha_1 + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \beta X_i + u_i$ ----------(1.07)

where $Y_i$ = annual salary
$X_i$ = years of teaching experience
$D_2$ = 1 if male
      = 0 otherwise
$D_3$ = 1 if white
      = 0 otherwise

•Notice that each of the two qualitative variables, sex and
color, has two categories and hence needs one dummy
variable for each. Note also that the omitted, or base,
category now is “black female professor”.

Assuming $E(u_i) = 0$, we can obtain the following regressions from (1.07):
Mean salary for black female professor:
$E(Y_i \mid D_2 = 0, D_3 = 0, X_i) = \alpha_1 + \beta X_i$
Mean salary for black male professor:
$E(Y_i \mid D_2 = 1, D_3 = 0, X_i) = (\alpha_1 + \alpha_2) + \beta X_i$
Mean salary for white female professor:
$E(Y_i \mid D_2 = 0, D_3 = 1, X_i) = (\alpha_1 + \alpha_3) + \beta X_i$
Mean salary for white male professor:
$E(Y_i \mid D_2 = 1, D_3 = 1, X_i) = (\alpha_1 + \alpha_2 + \alpha_3) + \beta X_i$

• Once again, it is assumed that the preceding regressions differ only in the intercept coefficient but not in the slope coefficient.
• An OLS estimation of (1.07) will enable us to test a variety of hypotheses. Thus, if $\alpha_3$ is statistically significant, it will mean that colour does affect a professor’s salary.
• Similarly, if $\alpha_2$ is statistically significant, it will mean that sex also affects a professor’s salary. If both these differential intercepts are statistically significant, it would mean that sex as well as colour is an important determinant of professors’ salaries.

•From the preceding discussion it follows that we can extend
our model to include more than one quantitative variable
and more than two qualitative variables.
•The only precaution to be taken is that the number of
dummies for each qualitative variable should be one less
than the number of categories of that variable.

3.5 Interaction effects
• Consider the following model:

$Y_i = \alpha_1 + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \beta X_i + u_i$ ----------------------------(1.08)

where $Y_i$ = annual expenditure on clothing
$X_i$ = income
$D_2$ = 1 if female
      = 0 if male
$D_3$ = 1 if college graduate
      = 0 otherwise

•The implicit assumption in this model is that the differential
effect of the sex dummy is constant across the two levels
of education and the differential effect of the education
dummy is also constant across the two sexes.

• That is, if, say, the mean expenditure on clothing is higher for females than males, this is so whether they are college graduates or not. Likewise, if, say, college graduates on average spend more on clothing than non-college graduates, this is so whether they are female or male.

•In many applications, such an assumption may be untenable.
A female college graduate may spend more on clothing
than a male graduate.
• In other words, there may be interaction between the two
qualitative variables and therefore their effect on mean
Y may not be simply additive as in (1.08) but multiplicative
as well, as in the following model:

$Y_i = \alpha_1 + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \alpha_4 (D_{2i} D_{3i}) + \beta X_i + u_i$ -----------------(4.09)

• From (4.09) we obtain

$E(Y_i \mid D_2 = 1, D_3 = 1, X_i) = (\alpha_1 + \alpha_2 + \alpha_3 + \alpha_4) + \beta X_i$ ------------(4.10)

• which is the mean clothing expenditure of graduate females. Notice that
• $\alpha_2$ = differential effect of being a female
• $\alpha_3$ = differential effect of being a college graduate
• $\alpha_4$ = differential effect of being a female graduate

• If $\alpha_2$, $\alpha_3$, and $\alpha_4$ are all positive, the average clothing expenditure of females is higher than that of the base category (which here is male non-graduate), but it is much more so if the females also happen to be graduates.

• This shows how the interaction dummy modifies the effect of the two attributes considered individually.

• Whether the coefficient of the interaction dummy is statistically significant can be tested by the usual t test. Omitting a significant interaction term will lead to a specification bias.

The importance of interactions among dummy variables:
• They help us to identify influential variables.
• They help us to avoid misspecification bias.
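
A minimal sketch of a regression with an interaction dummy, under stated assumptions: the sample is made up, the "true" coefficients (2.0, 0.5, 0.8, 0.4 and 0.05) are chosen arbitrarily, and the outcome is generated without noise so the estimates reproduce them. The interaction regressor is simply the product of the two dummies.

```python
import numpy as np

# Hypothetical sample: every combination of sex and education appears, with varying income
d2 = np.array([0, 0, 1, 1, 0, 0, 1, 1], dtype=float)   # 1 if female
d3 = np.array([0, 1, 0, 1, 0, 1, 0, 1], dtype=float)   # 1 if college graduate
income = np.array([30.0, 50.0, 35.0, 55.0, 40.0, 65.0, 45.0, 70.0])

# Noise-free outcome from assumed alpha_1..alpha_4 = 2.0, 0.5, 0.8, 0.4 and beta = 0.05
y = 2.0 + 0.5 * d2 + 0.8 * d3 + 0.4 * d2 * d3 + 0.05 * income

# Design matrix: intercept, two dummies, their product, and income
X = np.column_stack([np.ones(len(y)), d2, d3, d2 * d3, income])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
a1, a2, a3, a4, b = coef

# Intercept for female graduates: all four alpha terms are switched on
female_graduate_intercept = a1 + a2 + a3 + a4
print(female_graduate_intercept)
```

If the product term were omitted although its coefficient is non-zero, its effect would be absorbed by the remaining coefficients, which is the specification bias mentioned above.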


3.6 Slope indicator variables
A slope indicator variable captures the interaction between a dummy variable and a quantitative variable. It affects only the slope, i.e., it does not affect the intercept.
It helps us to capture the interaction effect of dummy and quantitative variables on the dependent variable.
• Look at the following example.
The price of a condominium house can be explained as a function of its characteristics, such as its size, location, number of bedrooms, age, floor and so on.

For our discussion, let us assume that the number of bedrooms, nbdr, is the only relevant variable in determining the house price:

$prhou_i = \beta_0 + \beta_1 nbdr_i + u_i$

• $\beta_1$ is the value of an additional bedroom.
• $\beta_0$ is the value of the land alone.

$prhou_i = \alpha_1 + \alpha_2 D_i + \beta_1 nbdr_i + u_i$

where $D_i$ = 1 if desirable neighbourhood
      = 0 if not desirable neighbourhood

We take the non-desirable neighbourhood as the reference group.
Instead of assuming that the effect of location on house price causes a change in the intercept, let us assume that the change is in the slope of the relationship.

We can allow for a change in the slope by including in the model an additional explanatory variable that is equal to the product of an indicator variable and a continuous variable.
In our model, the slope of the relationship is the value of an additional bedroom.
If we assign the value 1 to homes in a desirable neighbourhood and 0 otherwise, we can specify our model as follows:

$prhou_i = \alpha_1 + \beta_1 nbdr_i + \beta_2 (nbdr_i \times D_i) + u_i$

The new variable (nbdr × D), the product of the number of bedrooms and the indicator variable, is called an interaction variable, as it captures the interaction of location and number of bedrooms on condominium house prices.
It is also called a slope indicator variable or a slope dummy variable, because it allows for a change in the slope of the relationship.
The slope indicator variable takes a value equal to nbdr for houses in the desirable neighbourhood, when D = 1, and it is 0 for homes in other neighbourhoods.

• A slope indicator variable is treated just like any other explanatory variable in a regression model.

$E(prhou) = \alpha_1 + \beta_1 nbdr + \beta_2 nbdr$ when D = 1
$E(prhou) = \alpha_1 + \beta_1 nbdr$ when D = 0

• In the desirable neighbourhood, the price per additional bedroom is $\beta_1 + \beta_2$.
• In the non-desirable neighbourhood, the price per additional bedroom is $\beta_1$.
• If $\beta_2 > 0$, the price per additional bedroom is higher in the more desirable neighbourhood.

The effect of including a slope indicator variable can also be seen by using calculus.
The partial derivative of the expected house price with respect to the number of bedrooms is

$\partial E(prhou) / \partial nbdr = \beta_1 + \beta_2$ when D = 1
$\partial E(prhou) / \partial nbdr = \beta_1$ when D = 0

[Figure: prhou plotted against nbdr for $\beta_2 > 0$. The line $E(prhou) = \alpha_1 + (\beta_1 + \beta_2) nbdr$ has slope $\beta_1 + \beta_2$; the line $E(prhou) = \alpha_1 + \beta_1 nbdr$ has slope $\beta_1$.]

• If we assume that house location affects both the intercept and the slope, then both can be incorporated into a single model.
• The model specification will be:

$prhou_i = \alpha_1 + \alpha_2 D_i + \beta_1 nbdr_i + \beta_2 (nbdr_i \times D_i) + u_i$

$E(prhou) = (\alpha_1 + \alpha_2) + (\beta_1 + \beta_2) nbdr$ when D = 1
$E(prhou) = \alpha_1 + \beta_1 nbdr$ when D = 0
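
The model with both an intercept dummy and a slope dummy can be sketched with made-up data. The prices below are generated without noise from assumed coefficients (100, 20, 50 and 15), so the fitted slopes for the two neighbourhood types can be read off directly; none of these numbers come from the text.

```python
import numpy as np

nbdr = np.array([1, 2, 3, 4, 1, 2, 3, 4], dtype=float)   # number of bedrooms
D = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)      # 1 if desirable neighbourhood

# Noise-free prices from assumed alpha_1 = 100, alpha_2 = 20, beta_1 = 50, beta_2 = 15
prhou = 100 + 20 * D + 50 * nbdr + 15 * nbdr * D

# Design matrix: intercept, intercept dummy, bedrooms, slope dummy (nbdr * D)
X = np.column_stack([np.ones(len(nbdr)), D, nbdr, nbdr * D])
coef, *_ = np.linalg.lstsq(X, prhou, rcond=None)
a1, a2, b1, b2 = coef

slope_other = b1            # price per additional bedroom when D = 0
slope_desirable = b1 + b2   # price per additional bedroom when D = 1
print(slope_other, slope_desirable)
```

The slope dummy column equals nbdr when D = 1 and zero otherwise, so its coefficient measures only the extra price per bedroom in the desirable neighbourhood.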
3.7 Tests for Structural Change and Stability
•A fundamental assumption in regression modeling is that the
pattern of data on dependent and independent variables
remains the same throughout the period over which the data
is collected.
• Under such an assumption, a single linear regression
model is fitted over the entire data set.
• The regression model is estimated and used for prediction assuming that the parameters remain the same over the entire time period of estimation and prediction.

• When it is suspected that there exists a change in the pattern of data, fitting a single linear regression model may not be appropriate, and more than one regression model may need to be fitted.
• Before deciding whether to fit a single regression model or more than one, the question arises of how to test and decide whether there is a change in the structure or pattern of the data.
• Such changes can be characterized by a change in the parameters of the model and are termed structural change.

• Now we consider some examples to understand the problem of
structural change in the data.
• Suppose the data on the consumption pattern is available for several
years and suppose there was a war in between the years over which
the consumption data is available.
• Obviously, the consumption pattern before and after the war
does not remain the same as the economy of the country gets
disturbed.
• So if a model is fitted then the regression coefficients before and
after the war period will change. Such a change is referred to as a
structural break or structural change in the data.
• A better option, in this case, would be to fit two different linear
regression models- one for the data before the war and another for
the data after the war.

Testing for structural stability will help us to find out whether
two or more regressions are different, where the difference
may be in the intercepts or the slopes or both.
Suppose we are interested in estimating a simple saving
function that relates domestic household savings (S) with
gross domestic product (Y) for Ethiopia.
Suppose further that, at a certain point of time (1991), a
series of economic reforms have been introduced.
So far we have assumed that the intercept and all the slope coefficients (βj's) are the same (stable) for the whole set of observations: Y = Xβ + e.
But structural shifts and/or group differences are common in the real world. It may be that:
• the intercept differs (changes), or
• the slope differs (changes), or
• both differ (change) across categories or time periods.

• The hypothesis here is that such reforms might have considerably influenced the savings-income relationship, that is, the relationship between savings and income might be different in the post-reform period as compared to that in the pre-reform period.
• If this hypothesis is true, then we say a structural change has happened.
• H0: Economic reforms have not influenced the relationship between savings and national income.
• H1: Economic reforms have influenced the relationship between savings and national income.
• How do we check if this is so?

We can test the structural stability of the parameters using two methods:
1. Dummy variables
2. Chow test
1. Using dummy variables
• Write the savings function as:
$S_t = \beta_0 + \beta_1 D_t + \beta_2 Y_t + \beta_3 (Y_t D_t) + u_t$
where $S_t$ is household saving at time t, $Y_t$ is GDP at time t, and
$D_t$ = 0 if pre-reform (before 1991)
      = 1 if post-reform (after 1991)

• $\beta_3$ is the differential slope coefficient, indicating by how much the slope coefficient of the post-reform savings function differs from the slope coefficient of the pre-reform savings function.
• Decision rule:
• If $\beta_1$ and $\beta_3$ are both statistically significant as judged by the t test, the pre-reform and post-reform regressions differ in both the intercept and the slope.
• If only $\beta_1$ is statistically significant, then the pre-reform and post-reform regressions differ only in the intercept (meaning the marginal propensity to save (MPS) is the same for pre-reform and post-reform periods).
• If only $\beta_3$ is statistically significant, then the two regressions differ only in the slope (MPS).
 Check structural stability for the f/wing
regression result:
Ŝt  20.76005  5.9991 D̂t  2.616285 Yˆ  0.5298177 (Yˆ D̂ )
t t t

Example 2: Using the dummy variable regression (DVR) to test for a structural break
• Recall the example of the consumption function:
Period 1: $cons_i = \alpha_1 + \beta_1 inc_i + u_i$
vs. Period 2: $cons_i = \alpha_2 + \beta_2 inc_i + u_i$
Let’s define a dummy variable D1, where:
• D1 = 1 for the period 1974-1991, and
• D1 = 0 for the period 1992-2022
• Then, $cons_i = \alpha_0 + \alpha_1 D1 + \beta_0 inc_i + \beta_1 (D1 \cdot inc_i) + u_i$
For period 1: $cons_i = (\alpha_0 + \alpha_1) + (\beta_0 + \beta_1) inc_i + u_i$
For period 2 (base category): $cons_i = \alpha_0 + \beta_0 inc_i + u_i$
• Regressing cons on inc, D1 and (D1*inc) gives:
cons = 1.95 + 152 D1 + 0.806 inc - 0.056 (D1*inc)
p-value: (0.968) (0.010) (0.000) (0.002)
• With D1 = 1 for observations in period 1 and D1 = 0 for observations in period 2:
• Period 1 (1974-1991): cons = 153.95 + 0.75 inc
• Period 2 (1992-2022): cons = 1.95 + 0.806 inc
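
The two period-specific equations follow from substituting D1 = 1 and D1 = 0 into the fitted regression. As a worked check of the arithmetic, using only the estimates reported in the example:

```latex
\begin{aligned}
\text{Period 1 } (D1 = 1):\quad
  \widehat{cons} &= (1.95 + 152) + (0.806 - 0.056)\,inc = 153.95 + 0.750\,inc \\
\text{Period 2 } (D1 = 0):\quad
  \widehat{cons} &= 1.95 + 0.806\,inc
\end{aligned}
```

The reported p-values on D1 (0.010) and on D1*inc (0.002) are both below 0.05, so by the decision rule for dummy variable regressions the two periods differ in both the intercept and the slope at the 5% level.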
2. Chow Test
• A Chow test is a statistical test developed by
economist Gregory Chow that is used to test whether the
coefficients in two different regression models on different
datasets are equal.

• The Chow test is typically used in time series data to determine if there is a structural break in the data at some point.

When to use the Chow Test
The following examples illustrate situations where you may
wish to perform a Chow test:
1. To determine if stock prices change at different rates before
and after an election.
2. To determine if housing prices change before and after an
interest rate change.
3. To determine if the average profit of public companies is
different before and after a new tax law is passed.
In each situation, we could use a Chow test to determine if
there is a structural break point in the data at a certain point in
time.
2. Chow’s test
• One approach for testing the presence of structural change (structural instability) is by means of Chow’s test. The steps involved in this procedure are:
• Step 1: Estimate the regression equation for the whole period (pre-reform plus post-reform periods) and find the restricted error sum of squares ($RSS_R$, also written RRSS).
• Step 2: Estimate the equation (model) using the available data in the pre-reform period (say, of size $n_1$), and find the error sum of squares ($RSS_1$).
• Step 3: Estimate the equation (model) using the available data in the post-reform period (say, of size $n_2$), and find the error sum of squares ($RSS_2$).
• Step 4: Calculate $RSS_U = RSS_1 + RSS_2$.
• Step 5: Calculate the Chow test statistic

$F_c = \frac{(RSS_R - RSS_U)/k}{RSS_U/(n_1 + n_2 - 2k)}$

• where k is the number of estimated regression coefficients, including the intercept.
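
The five steps can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not a general implementation: it handles only a single regressor plus an intercept (so k = 2), the helper names `rss_simple` and `chow_f` are hypothetical, and the income and savings figures are made up.

```python
def rss_simple(x, y):
    """Residual sum of squares from OLS of y on a constant and x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))


def chow_f(x1, y1, x2, y2, k=2):
    """Chow statistic for two sub-samples; k counts the intercept plus the slope."""
    rss_r = rss_simple(x1 + x2, y1 + y2)              # Step 1: pooled (restricted) regression
    rss_u = rss_simple(x1, y1) + rss_simple(x2, y2)   # Steps 2-4: unrestricted RSS
    n1, n2 = len(x1), len(x2)
    return ((rss_r - rss_u) / k) / (rss_u / (n1 + n2 - 2 * k))   # Step 5


# Made-up income (x) and savings (y) data with a visibly steeper slope in the second period
x_pre, y_pre = [10, 12, 14, 16, 18, 20], [1.1, 1.3, 1.6, 1.7, 2.1, 2.2]
x_post, y_post = [22, 24, 26, 28, 30, 32], [3.0, 3.9, 4.6, 5.5, 6.1, 7.0]

f_c = chow_f(x_pre, y_pre, x_post, y_post)
print(f_c)
```

The resulting statistic is then compared with the tabulated F value with k and n1 + n2 - 2k degrees of freedom.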
Chow Test
$F = \frac{[RSS_c - (RSS_1 + RSS_2)]/k}{(RSS_1 + RSS_2)/(n - 2k)}$
where $RSS_c$ = combined RSS, $RSS_1$ = pre-break RSS, $RSS_2$ = post-break RSS

• $F_{(k,\, n_1 + n_2 - 2k)}$ is the critical value from the F-distribution with k (in our case k = 2) and $n_1 + n_2 - 2k$ degrees of freedom for a given significance level.
• Decision rule: Reject the null hypothesis of identical intercepts and slopes for the pre-reform and post-reform periods if $F_c > F_{tab}$.
• That is, rejecting H0 means there is a structural change.

Example: $RSS_1$ = 64,499,436.865 (error sum of squares in the pre-reform period); $n_1$ = 12; $RSS_2$ = 2,726,652,790.434 (error sum of squares in the post-reform period); $n_2$ = 11; $RSS_R$ = 13,937,337,067.461 (error sum of squares for the whole period).
• $RSS_U = RSS_1 + RSS_2$ = 2,791,152,227.299
• The test statistic is:

$F_c = \frac{(RSS_R - RSS_U)/k}{RSS_U/(n_1 + n_2 - 2k)} = \frac{(13{,}937{,}337{,}067.461 - 2{,}791{,}152{,}227.299)/2}{2{,}791{,}152{,}227.299/(12 + 11 - 2(2))} \approx 37.94$

The tabulated value from the F-distribution with 2 and 19 degrees of freedom at the 5% level of significance is 3.52.

• Decision: Since the calculated value of F exceeds the tabulated value, we reject the null hypothesis of identical intercepts and slopes for the pre-reform and post-reform periods at the 5% level of significance.
• Hence, we can conclude that there is a structural break.
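
A quick check of the arithmetic, using only the figures quoted in the example (the variable names are illustrative). Evaluating the Chow formula with these inputs gives a statistic of about 37.9, well above the tabulated 3.52, so the decision to reject the null hypothesis is unchanged.

```python
rss_1 = 64_499_436.865        # pre-reform RSS, n1 = 12
rss_2 = 2_726_652_790.434     # post-reform RSS, n2 = 11
rss_r = 13_937_337_067.461    # whole-period (restricted) RSS
n1, n2, k = 12, 11, 2

rss_u = rss_1 + rss_2                       # unrestricted RSS
df_denominator = n1 + n2 - 2 * k            # denominator degrees of freedom
f_c = ((rss_r - rss_u) / k) / (rss_u / df_denominator)

f_critical_5pct = 3.52        # tabulated F(2, 19) value quoted in the text
print(round(rss_u, 3), df_denominator, round(f_c, 2))
print("reject H0:", f_c > f_critical_5pct)
```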

Drawbacks:
• Chow’s test does not tell us whether the difference (change) is in the slope only, in the intercept only, or in both the intercept and the slope.
The Chow test:
• It uses an F-test to determine whether a single regression is more efficient than two or more separate regressions on sub-samples.

Using dummy variables vs Chow’s test
Comparing the two methods, it is preferable to use the method of dummy variable regression (DVR).
• This is because with the method of DVR:
1. We run only one regression.
2. We can test whether the change is in the intercept only, in the slope only, or in both.

Dummy dependent variable
(Qualitative Response Model)
A qualitative response model describes situations in which the dependent variable in a regression equation simply represents a discrete choice, assuming only a limited number of values.
• Such a model is also called a limited dependent variable model, a discrete dependent variable model, or a qualitative response model.

Categories of Qualitative Response Models
There are two broad categories of qualitative response models (QRM):
1. Binomial models: the choice is between two alternatives.
e.g., the decision to participate in the labor force or not
2. Multinomial models: the choice is between more than two alternatives.
e.g., Y = 1 if occupation is farming
        = 2 if occupation is carpentry
        = 0 if government employee

• Ordinal variables: variables that have categories that can be ranked.
Example: rank according to educational attainment (Y)
Y = 0 if primary education
  = 1 if secondary education
  = 2 if university education
• Nominal variables: variables that occur when there are multiple outcomes that cannot be ordered.

Example: Occupation can be grouped as farming, fishing, carpentry, etc.
Y = 0 if farming
  = 1 if fishing
  = 2 if carpentry
  = 3 if government employee
N.B.: The numbers are assigned arbitrarily.
• Count variables: indicate the number of times some event has occurred.
Example: How many years of education have you attended?
• In all of the above situations, the variables are discrete valued.
Qualitative Choice Analysis
In such cases, instead of standard regression models, we apply different methods of modeling and analyzing discrete data.
Qualitative choice models may be used when a decision maker faces a choice among alternatives that satisfy the following conditions:
• The number of choices is finite.
• The choices are mutually exclusive (the person chooses only one of the alternatives).
• The choices are exhaustive (all possible alternatives are included).

Throughout our discussion we shall restrict ourselves to cases of qualitative choice where the set of alternatives is binary.
For the sake of convenience, the dependent variable is given a value of 0 or 1.
Example: Suppose the choice is whether to work or not. The discrete dependent variable we are working with will assume only two values, 0 and 1:
$Y_i$ = 1 if the i-th individual is working
      = 0 if the i-th individual is not working
where i = 1, 2, …, n.
Reading Assignment
The four most commonly used approaches to estimating binary response models (types of binomial models) are:
• Linear probability models
• The logit model
• The probit model
• The tobit (censored regression) model

Thank You
