Chapter 5 Acct

This document introduces dummy variables and qualitative response regression models. It discusses how dummy variables can represent qualitative variables by assigning binary values of 0 and 1. A single dummy variable is sufficient to represent variables with two categories. The intercept represents the mean of the base or omitted category, while the coefficient on the dummy variable represents the difference from the base category. Four approaches to modeling a binary dependent variable are introduced: the linear probability model, logit model, probit model, and tobit model. The linear probability model is described in detail, where the regression equation is interpreted as estimating the conditional probability of a binary dependent variable.


CHAPTER ONE

Regression Analysis with Qualitative Information: Binary (Dummy) Variables


1.1. Describing Qualitative Information
In the regression analyses discussed in previous chapters, both the dependent variable and the
explanatory variables were quantitative in nature. However, the dependent variable may be
influenced not only by variables that can be readily quantified on some well-defined scale (e.g.,
income, output, prices, costs, height, and temperature), but also by variables that are essentially
qualitative in nature (e.g., sex, race, color, religion, nationality, and changes in government
economic policy). For example, holding all other factors constant, female college professors are
found to earn less than their male counterparts, and nonwhites are found to earn less than whites.
This pattern may result from sex or racial discrimination, but whatever the reason, qualitative
variables such as sex and race do influence the dependent variable and clearly should be included
among the explanatory variables. Since such qualitative variables usually indicate the presence or
absence of a "quality" or attribute, such as male or female, or black or white, one method of
"quantifying" such attributes is to construct artificial variables that take on the values 1 or 0, with
0 indicating the absence of an attribute and 1 indicating its presence (or possession). For example,
1 may indicate that a person is male and 0 that the person is female; or 1 may indicate that a person
is a college graduate and 0 that he or she is not, and so on. Variables that assume such 0 and 1
values are called dummy variables. Alternative names are indicator variables, binary variables,
categorical variables, and dichotomous variables.

1.2. Dummy Variable as an Independent Variable


Dummy variables can be used in regression models just as easily as quantitative variables. As a
matter of fact, a regression model may contain explanatory variables that are exclusively dummy,
or qualitative, in nature.
Example: Yi = α + βDi + Ui …………………………………………………… (1.1)
where Yi = annual salary of a college professor
Di = 1 if male college professor
   = 0 otherwise (i.e., female professor)

Note that (1.1) is like the two-variable regression model encountered in Chapter Two, except that
instead of a quantitative X variable we have a dummy variable D (hereafter, we shall designate all
dummy variables by the letter D).
Model (1.1) may enable us to find out whether sex makes any difference in a college professor's
salary, assuming, of course, that all other variables such as age, degree attained, and years of
experience are held constant. Assuming that the disturbances satisfy the usual assumptions of the
classical linear regression model, we obtain from (1.1):
Mean salary of a female college professor: E(Yi | Di = 0) = α ……………………… (1.2)
Mean salary of a male college professor: E(Yi | Di = 1) = α + β
That is, the intercept term α gives the mean salary of female college professors, and the slope
coefficient β tells by how much the mean salary of a male college professor differs from that of
his female counterpart, with α + β giving the mean salary of a male college professor. A test of
the null hypothesis that there is no sex discrimination (H0: β = 0) can easily be made by running
regression (1.1) in the usual manner and finding out whether, on the basis of the t-test, the
estimated β is statistically significant.
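As an illustration, here is a minimal sketch of this test using made-up salary data (all numbers are hypothetical). With a single dummy regressor, the OLS estimates reduce to group means, and the t-test for β coincides with a pooled two-sample t-test:

```python
import math

# Hypothetical salaries (in thousands); D = 0 for female, D = 1 for male.
female = [40, 42, 44, 46]
male   = [50, 53, 55, 58]

n0, n1 = len(female), len(male)
mean0 = sum(female) / n0          # alpha-hat: mean salary of base category
mean1 = sum(male) / n1
beta_hat = mean1 - mean0          # differential intercept coefficient

# OLS (pooled) residual variance with n0 + n1 - 2 degrees of freedom:
ss0 = sum((y - mean0) ** 2 for y in female)
ss1 = sum((y - mean1) ** 2 for y in male)
s2 = (ss0 + ss1) / (n0 + n1 - 2)

se_beta = math.sqrt(s2 * (1 / n0 + 1 / n1))
t_stat = beta_hat / se_beta       # compare with a t(n0 + n1 - 2) critical value
print(mean0, beta_hat, round(t_stat, 3))   # 43.0 11.0 5.185
```

Here the t-statistic is well above conventional critical values, so for this fabricated sample we would reject H0: β = 0.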

Note the following features of the dummy variable regression model considered above:
1. To distinguish the two categories, male and female, we have introduced only one dummy
variable Di. If Di = 1 always denotes a male, then Di = 0 denotes a female, since there are
only two possible outcomes. Hence, one dummy variable suffices to distinguish two
categories. The general rule is this: if a qualitative variable has m categories, introduce
only m − 1 dummy variables. In our example, sex has two categories, and hence we
introduced only a single dummy variable. If this rule is not followed, we fall into what is
called the dummy variable trap, that is, a situation of perfect multicollinearity.
2. The assignment of the values 1 and 0 to the two categories, such as male and female, is
arbitrary; in our example we could equally have assigned D = 1 for female and D = 0 for
male.
3. The group, category, or classification that is assigned the value 0 is often referred to as
the base, benchmark, control, comparison, reference, or omitted category. It is the base in
the sense that comparisons are made with that category.
4. The coefficient β attached to the dummy variable D can be called the differential
intercept coefficient because it tells by how much the intercept of the category that
receives the value 1 differs from the intercept of the base category.
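To see the dummy variable trap concretely, a small sketch (with made-up data) shows that including dummies for both categories alongside an intercept yields perfectly collinear regressors:

```python
# Four hypothetical observations: two males, two females.
intercept = [1, 1, 1, 1]
male      = [1, 0, 1, 0]              # D = 1 for male
female    = [1 - d for d in male]     # a second dummy: D = 1 for female

# The two dummies sum to the intercept column exactly, so the regressor
# set {intercept, male, female} is perfectly multicollinear:
summed = [m + f for m, f in zip(male, female)]
print(summed == intercept)   # True
```

Dropping either one of the two dummies (the m − 1 rule) removes the collinearity.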

1.3. Dummy Variable as a Dependent Variable


1.3.1. Introduction
In the regression models considered so far, we have implicitly assumed that the dependent variable
Y is quantitative, whereas the explanatory variables are either quantitative, qualitative (dummy),
or a mixture thereof. Indeed, in the previous section on dummy variables we saw how dummy
explanatory variables are introduced into a regression model and what role they play in specific
situations. In this section we consider several models in which the dependent variable itself is
qualitative in nature. Although increasingly used in various areas of the social sciences and in
medical research, qualitative response regression models pose interesting estimation and
interpretation challenges.

Suppose we want to study house ownership as a function of income, house price, and so on. A
person either owns a house or does not. Hence, the dependent variable, house ownership, can take
only two values: 1 if the person owns a house and 0 if he or she does not. In this case, the dependent
variable is of the type that elicits a yes or no response; that is, it is dichotomous in nature.

1.3.2. QUALITATIVE RESPONSE MODELS


Now we start our discussion of qualitative response models (QRM). These are models in which
the dependent variable is a discrete outcome. These models are analyzed in a general framework
of probability models. There are two broad categories of QRM.
A. Binomial Models: the choice is between two alternatives.
Example 1.3.1: Y = α0 + α1X1 + α2X2
Y = 1 if individual i attends college
  = 0 otherwise
Therefore, the dependent variable Y takes on only two values (0 and 1). Conventional
regression methods cannot be used to analyze such a qualitative dependent variable model.

B. Multinomial Models: the choice is between more than two alternatives.
Example 1.3.2: Y = α0 + α1X1 + α2X2
In this case, the dependent variable Y takes on more than two values. For instance,
Y = 1 if individual i attends college
Y = 2 if individual i completes only high school
Y = 3 otherwise
Since the discussion of multinomial QRMs is beyond the scope of this course, we begin our study
of qualitative response models with the binary response regression model. There are four
approaches to estimating a probability model for a binary response variable:
1. The Linear Probability Model (LPM)
2. The Logit Model
3. The Probit Model
4. The Tobit (Censored Regression) Model¹

1.3.2.1. THE LINEAR PROBABILITY MODEL (LPM)


The linear probability model is the ordinary regression model applied to a binary dependent
variable. To fix ideas, consider the following simple model:

Yi = β0 + β1Xi + Ui ………………………………… (1)

where Xi = income, Ui is the disturbance term, and
Yi = 1 if the person owns a house
   = 0 if the person does not own a house

The independent variable Xi can be a discrete or a continuous variable, and the model can be
extended to include additional explanatory variables. The model above expresses the dichotomous
Yi as a linear function of the explanatory variable Xi. Such models are called linear probability
models (LPM) since E(Yi | Xi), the conditional expectation of Yi given Xi, can be interpreted as
the conditional probability that the event will occur given Xi, that is, Pr(Yi = 1 | Xi). Thus, in the
present case, E(Yi | Xi) gives the probability that a person whose income is Xi owns a house. The
justification for the name LPM can be seen as follows.

Assuming E(Ui) = 0, as usual (to obtain unbiased estimators), we obtain

E(Yi | Xi) = β0 + β1Xi ………………………………… (2)

¹ The Tobit model will not be discussed in this chapter.

Now, letting Pi = probability that Yi = 1 (that is, that the event occurs) and 1 − Pi = probability
that Yi = 0 (that is, that the event does not occur), the variable Yi has the following distribution:

Yi       Probability
0        1 − Pi
1        Pi
Total    1

Therefore, by the definition of mathematical expectation, we obtain

E(Yi) = 0(1 − Pi) + 1(Pi) = Pi ………………………………… (3)

Now, comparing (2) with (3), we can equate

E(Yi | Xi) = β0 + β1Xi = Pi ………………………………… (4)

That is, the conditional expectation of model (1) can in fact be interpreted as the conditional
probability of Yi = 1. Since the probability Pi must lie between 0 and 1, we have the restriction
0 ≤ E(Yi | Xi) ≤ 1; that is, the conditional expectation, or conditional probability, must lie
between 0 and 1.

Example: An LPM of home ownership (Y) on income (X, in thousands of Birr), estimated by
OLS, is given as follows:

Ŷi = −0.9457 + 0.1021Xi
se = (0.1228)  (0.0082)
t  = (−7.6984) (12.515)     R² = 0.8048

The above regression is interpreted as follows:
❖ The intercept of −0.9457 gives the "probability" that a family with zero income will own a
house. Since this value is negative, and since a probability cannot be negative, we treat it as
zero. The slope value of 0.1021 means that for a unit change in income, the probability of
owning a house increases on average by 0.1021, or about 10 percent. This holds whatever the
current level of income, which seems patently unrealistic; in reality one would expect Pi to be
nonlinearly related to Xi (see Section 1.3.2.2).
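As a quick check on this interpretation, a short sketch evaluates the fitted "probability" from the estimated equation above at a few income levels (the chosen income values are hypothetical):

```python
# Fitted values from the estimated LPM in the text: Y-hat = -0.9457 + 0.1021*X.
def lpm_prob(x):
    """Fitted 'probability' of home ownership at income x (thousands of Birr)."""
    return -0.9457 + 0.1021 * x

for x in [0, 5, 10, 20]:
    print(x, round(lpm_prob(x), 4))
# x = 0  -> -0.9457 (negative, so not a valid probability)
# x = 20 ->  1.0963 (greater than 1, also not a valid probability)
```

This already hints at problem 3 below: nothing in OLS constrains the fitted values to the [0, 1] interval.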

Problems with the LPM

From the preceding discussion it would seem that OLS can be easily extended to binary dependent
variable regression models. So, perhaps there is nothing new here. Unfortunately, this is not the
case, for the LPM poses several problems. That is, while the interpretation of the parameters is
unaffected by having a binary outcome, several assumptions of the LPM are necessarily violated.

1. Heteroscedasticity
The variance of the disturbance term depends on the X's and is thus not constant. To see this,
note that Ui has the following probability distribution:

Yi     Ui                 Probability
0      −β0 − β1Xi         1 − Pi
1      1 − β0 − β1Xi      Pi

Now, by definition, Var(Ui) = E[Ui − E(Ui)]² = E(Ui²), since E(Ui) = 0 and Cov(Ui, Uj) = 0 for
all i ≠ j (no serial correlation) by assumption. Therefore, using the preceding probability
distribution of Ui, we obtain

Var(Ui) = E(Ui²) = (−β0 − β1Xi)²(1 − Pi) + (1 − β0 − β1Xi)²(Pi)
                 = (−β0 − β1Xi)²(1 − β0 − β1Xi) + (1 − β0 − β1Xi)²(β0 + β1Xi)
                 = (β0 + β1Xi)(1 − β0 − β1Xi)

or Var(Ui) = E(Yi | Xi)[1 − E(Yi | Xi)] = Pi(1 − Pi)

This shows that the variance of Ui is heteroscedastic: it depends on the conditional expectation
of Y, which in turn depends on the value taken by X. Thus the OLS estimator of β is inefficient
and the standard errors are biased, resulting in incorrect tests.
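A tiny numeric check of the Pi(1 − Pi) result, using assumed parameter values (β0 = 0.1, β1 = 0.05, and x = 8 are made up purely for illustration):

```python
# Two-point distribution of U: it equals -p with probability 1 - p,
# and 1 - p with probability p, where p = beta0 + beta1*x = E(Y | X = x).
beta0, beta1, x = 0.1, 0.05, 8   # assumed, illustrative values
p = beta0 + beta1 * x            # here p = 0.5

# Variance computed directly from the two-point distribution:
var_u = (-p) ** 2 * (1 - p) + (1 - p) ** 2 * p
print(round(var_u, 6), round(p * (1 - p), 6))   # the two expressions agree
```

The variance is largest when p is near 0.5 and shrinks toward zero as p approaches 0 or 1, so it necessarily varies with X.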

2. Non-normality of Ui
Although OLS does not require the disturbances (the U's) to be normally distributed, we assumed
them to be so distributed for the purpose of statistical inference, that is, hypothesis testing, etc.
But the assumption of normality for Ui is no longer tenable for the LPM because, like Yi, Ui
takes on only two values:

Ui = Yi − β0 − β1Xi

When Yi = 1, Ui = 1 − β0 − β1Xi, and when Yi = 0, Ui = −β0 − β1Xi.
Obviously, Ui cannot be assumed to be normally distributed. Recall, however, that normality is
not required for the OLS estimates to be unbiased.

3. Non-fulfillment of 0 ≤ E(Yi | Xi) ≤ 1
The LPM can produce predicted values outside the admissible range of probabilities, [0, 1]: it
can predict values of Y that are negative or greater than 1. This is the real problem with OLS
estimation of the LPM.

4. Functional Form
Since the model is linear, a unit increase in X results in a constant change of β1 in the probability
of the event, holding all other variables constant. The increase is the same regardless of the current
value of X. In many applications this is unrealistic: when the outcome is a probability, it is often
substantively reasonable to expect the effects of the independent variables to diminish as the
predicted probability approaches 0 or 1.
Remark: Because of the problems mentioned above, the LPM is not recommended for empirical
work.

1.3.2.2. ALTERNATIVE MODELS TO THE LPM


As we have seen, the LPM is characterized by several problems, such as non-normality of Ui,
heteroscedasticity of Ui, and the possibility of Ŷi lying outside the 0–1 range. Even so, the
fundamental problem with the LPM is that it is not logically attractive as a model, because it
assumes that Pi = E(Y = 1 | X) increases linearly with X; that is, the marginal or incremental
effect of X remains constant throughout. Thus, in our home ownership example we found that as
X increases by one unit (Birr 1,000), the probability of owning a house increases by the same
constant amount of 0.10, whether the income level is Birr 8,000, Birr 10,000, Birr 18,000, or
Birr 22,000. This seems patently unrealistic. In reality one would expect Pi to be nonlinearly
related to Xi: at a very low income a family will not own a house, but at a sufficiently high level
of income, say X*, it most likely will. Any increase in income beyond X* will have little effect
on the probability of owning a house. Thus, at both ends of the income distribution, the
probability of owning a house is virtually unaffected by a small increase in X.

Therefore, what we need is a (probability) model with these two features: (1) as Xi increases,
Pi = E(Y = 1 | X) increases but never steps outside the 0–1 interval; and (2) the relationship
between Pi and Xi is nonlinear, that is, "one which approaches zero at slower and slower rates as
Xi gets small and approaches one at slower and slower rates as Xi gets very large." Geometrically,
the model we want looks like the S-shaped curve in Figure 1.1.

Fig 1.1: A Cumulative Distribution Function (CDF)
The S-shaped curve in Figure 1.1 closely resembles the cumulative distribution function (CDF)
of a random variable. (Recall that the CDF of a random variable X is simply the probability that
X takes a value less than or equal to X0, where X0 is some specified numerical value of X. In
short, F(X), the CDF of X, is F(X = X0) = P(X ≤ X0).) Therefore, one can use a CDF to model
regressions in which the response variable is dichotomous, taking 0–1 values.
The CDFs commonly chosen to represent 0–1 response models are:
1. the logistic, which gives rise to the logit model; and
2. the normal, which gives rise to the probit (or normit) model.
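As a minimal sketch (using the standard closed forms of these two CDFs, independent of any estimation procedure), both functions can be evaluated directly to confirm the S-shape and the 0–1 bounds described above:

```python
import math

def logistic_cdf(z):
    """Logistic CDF, the basis of the logit model."""
    return 1.0 / (1.0 + math.exp(-z))

def normal_cdf(z):
    """Standard normal CDF, the basis of the probit model."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

for z in (-5, 0, 5):
    print(z, round(logistic_cdf(z), 4), round(normal_cdf(z), 4))
# Both functions rise from near 0 (large negative z) through 0.5 at z = 0
# to near 1 (large positive z), always staying strictly inside (0, 1).
```

The two curves have very similar shapes; the main practical difference is that the logistic CDF has slightly heavier tails than the normal CDF.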
