Chapter 5
Note that (1.1) is like the two-variable regression model encountered in Chapter Two, except that instead of a quantitative X variable we have a dummy variable D (hereafter, we shall designate all dummy variables by the letter D).
Model (1.1) may enable us to find out whether sex makes any difference in a college professor's salary, assuming, of course, that all other variables such as age, degree attained, and years of experience are held constant. Assuming that the disturbances satisfy the usual assumptions of the classical linear regression model, we obtain from (1.1):
Mean salary of female college professors: E(Yi | Di = 0) = α …………………………… (1.2)
Mean salary of male college professors: E(Yi | Di = 1) = α + β
That is, the intercept term α gives the mean salary of female college professors, and the slope coefficient β tells by how much the mean salary of a male college professor differs from the mean salary of his female counterpart, α + β reflecting the mean salary of the male college professor.
A test of the null hypothesis that there is no sex discrimination (H0: β = 0) can easily be made by running regression (1.1) in the usual manner and finding out whether, on the basis of the t-test, the estimated β is statistically significant.
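The estimation just described can be sketched with a small hypothetical sample (the salary figures below are illustrative, not from the text). With a single 0/1 dummy as the only regressor, the OLS intercept equals the mean of the D = 0 (female) group and the slope equals the difference between the two group means:

```python
# Hypothetical data: (D, salary) pairs, D = 1 for male, 0 for female
data = [(0, 18.0), (0, 20.0), (0, 19.0), (1, 23.0), (1, 25.0), (1, 24.0)]

d = [row[0] for row in data]
y = [row[1] for row in data]
n = len(d)

d_bar = sum(d) / n
y_bar = sum(y) / n

# Simple-regression OLS: beta_hat = S_dy / S_dd, alpha_hat = y_bar - beta_hat * d_bar
s_dy = sum((d[i] - d_bar) * (y[i] - y_bar) for i in range(n))
s_dd = sum((d[i] - d_bar) ** 2 for i in range(n))
beta_hat = s_dy / s_dd
alpha_hat = y_bar - beta_hat * d_bar

female_mean = sum(y[i] for i in range(n) if d[i] == 0) / d.count(0)
male_mean = sum(y[i] for i in range(n) if d[i] == 1) / d.count(1)

print(alpha_hat)             # mean female salary, per (1.2)
print(alpha_hat + beta_hat)  # mean male salary, per alpha + beta
```

In practice one would also compute the standard error of β̂ and the t-ratio; the point here is only that α̂ and β̂ recover the two group means.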
Note the following features of the dummy variable regression model considered previously.
1. To distinguish the two categories, male and female, we have introduced only one dummy
variable Di . For example, if Di = 1 always denotes a male, when Di = 0 we know that it
is a female since there are only two possible outcomes. Hence, one dummy variable
suffices to distinguish two categories. The general rule is this: If a qualitative variable
has ‘m’ categories, introduce only ‘m-1’ dummy variables. In our example, sex has two
categories, and hence we introduced only a single dummy variable. If this rule is not
followed, we shall fall into what might be called the dummy variable trap, that is, the
situation of perfect multicollinearity.
2. The assignment of 1 and 0 values to two categories, such as male and female, is arbitrary
in the sense that in our example we could have assigned D=1 for female and D=0 for male.
3. The group, category, or classification that is assigned the value of 0 is often referred to as
the base, benchmark, control, comparison, reference, or omitted category. It is the base in
the sense that comparisons are made with that category.
4. The coefficient β attached to the dummy variable D can be called the differential
intercept coefficient because it tells by how much the value of the intercept term of the
category that receives the value of 1 differs from the intercept coefficient of the base
category.
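Feature 1, the dummy variable trap, can be verified numerically with made-up data: if we include an intercept plus both a male dummy and a female dummy (m dummies for m = 2 categories), then D_male + D_female = 1 = the intercept column for every observation, so X'X is singular and the OLS normal equations have no unique solution.

```python
males = [1, 0, 1, 0, 0, 1]          # made-up sample of six observations
X = [[1, m, 1 - m] for m in males]  # columns: [intercept, D_male, D_female]

# Form the 3x3 matrix X'X
xtx = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]

# Determinant of a 3x3 matrix by cofactor expansion
def det3(a):
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))

print(det3(xtx))  # 0: X'X is singular — perfect multicollinearity
```

Dropping either dummy column (the m − 1 rule) restores a nonsingular X'X.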
Suppose we want to study house ownership as a function of income, house price, etc. A person either owns a house or does not. Hence, the dependent variable, house ownership, can take only two values: 1 if the person owns a house and 0 if he or she does not. In this case, the dependent variable is of the type that elicits a yes or no response; that is, it is dichotomous in nature.
B. Multinomial Models: The choice is between more than two alternatives
Example 1.3.2: Y = α0 + α1X1 + α2X2
In this case, the dependent variable Y takes more than two values. For instance,
Y = 1, if individual i attends college
¹ The Tobit model will not be discussed in this chapter.
E(Yi | Xi) = β0 + β1Xi …………………………………….(2)
Now, letting Pi = probability that Yi = 1 (that is, that the event occurs) and 1 – Pi = probability that
Yi = 0 (that is, that the event does not occur), the variable Yi has the following distributions:
Yi      Probability
0       1 − Pi
1       Pi
Total   1
i.e., the conditional expectation of model (2) can, in fact, be interpreted as the conditional probability of Yi. Since the probability Pi must lie between 0 and 1, we have the restriction 0 ≤ E(Yi | Xi) ≤ 1, i.e., the conditional expectation, or conditional probability, must lie between 0 and 1.
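The identity E(Yi) = Pi for a 0/1 variable can be seen in a short simulation (the probability value is made up for illustration): the sample mean of Bernoulli(p) draws converges to p = P(Y = 1).

```python
import random

random.seed(42)  # fixed seed so the run is reproducible
p = 0.3          # illustrative probability that the event occurs
n = 100_000

# Draw n Bernoulli(p) outcomes: Y = 1 with probability p, else 0
draws = [1 if random.random() < p else 0 for _ in range(n)]

sample_mean = sum(draws) / n
print(sample_mean)  # close to p = 0.3, i.e. E(Y) equals P(Y = 1)
```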
Example: The LPM estimated by OLS (on home ownership) is given as follows:
Ŷi = -0.9457 + 0.1021Xi
se = (0.1228) (0.0082)
t = (-7.6984) (12.515) R2 = 0.8048
The above regression is interpreted as follows:
❖ The intercept of –0.9457 gives the “probability” that a family with zero income will own a house. Since this value is negative, and since a probability cannot be negative, we treat this value as zero. The slope value of 0.1021 means that for a unit change in income, on average the probability of owning a house increases by 0.1021, or about 10 percent. This is so regardless of the level of income, which seems patently unrealistic. In reality one would expect that Pi is non-linearly related to Xi (see next section).
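The fitted LPM above, Ŷi = −0.9457 + 0.1021Xi, makes both problems concrete (the income values fed in below are illustrative): the implied change per unit of income is always 0.1021, and the fitted "probabilities" escape the [0, 1] interval at low and high incomes.

```python
def lpm_prob(x):
    """Fitted value from the estimated LPM on home ownership."""
    return -0.9457 + 0.1021 * x

for x in [0, 5, 15, 25]:
    print(x, round(lpm_prob(x), 4))

# Low incomes give negative "probabilities"; high incomes give values above 1,
# and the slope lpm_prob(x + 1) - lpm_prob(x) is 0.1021 at every income level.
```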
From the preceding discussion it would seem that OLS can be easily extended to binary dependent
variable regression models. So, perhaps there is nothing new here. Unfortunately, this is not the
case, for the LPM poses several problems. That is, while the interpretation of the parameters is
unaffected by having a binary outcome, several assumptions of the LPM are necessarily violated.
1. Heteroscedasticity
The variance of the disturbance terms depends on the X’s and is thus not constant. Let us see this
as follows. We have the following probability distributions for U.
Yi     Ui                 Probability
0      −β0 − β1Xi         1 − Pi
1      1 − β0 − β1Xi      Pi
Now by definition Var(𝑈𝑖 ) = E[𝑈𝑖 − E(𝑈𝑖 )]2 = 𝐸(𝑈𝑖 2 ) since E(𝑈𝑖 ) = 0 and Cov(𝑈𝑖 , 𝑈𝑗 ) = 0 for
all 𝑖 ≠ 𝑗 (no serial correlation) by assumption.
Therefore, using the preceding probability distribution of Ui, we obtain
Var(Ui) = E(Ui²) = (−β0 − β1Xi)²(1 − Pi) + (1 − β0 − β1Xi)² Pi = Pi(1 − Pi)
since Pi = β0 + β1Xi. The variance of Ui thus depends on Xi and is not constant: the disturbances are heteroscedastic.
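A quick numeric check of this derivation, with made-up parameter values: writing Pi = β0 + β1Xi, the two-term expression collapses to Pi(1 − Pi), which changes as Xi changes.

```python
beta0, beta1 = 0.1, 0.05  # illustrative LPM parameters

for x in [2.0, 4.0, 10.0]:
    p = beta0 + beta1 * x  # Pi = E(Yi | Xi) under the LPM
    # The two-branch variance from the probability distribution of Ui
    var_u = (-beta0 - beta1 * x) ** 2 * (1 - p) + (1 - beta0 - beta1 * x) ** 2 * p
    # It equals Pi * (1 - Pi) exactly
    assert abs(var_u - p * (1 - p)) < 1e-12
    print(x, round(var_u, 4))  # the variance differs across X values
```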
2. Non-normality of Ui
Although OLS does not require the disturbance (U’s) to be normally distributed, we assumed them
to be so distributed for the purpose of statistical inference, that is, hypothesis testing, etc. But the
assumption of normality for Ui is no longer tenable for the LPMs because like Yi, Ui takes on only
two values.
Ui = Yi − β0 − β1Xi
Now when Yi = 1, Ui = 1 − β0 − β1Xi
and when Yi = 0, Ui = −β0 − β1Xi
Obviously 𝑈𝑖 cannot be assumed to be normally distributed. Recall that normality is not required
for the OLS estimates to be unbiased.
3. Non-fulfillment of 0 ≤ E(Yi | Xi) ≤ 1
The LPM can produce predicted values outside the admissible range of probabilities [0, 1]: it may predict values of Y that are negative or greater than 1. This is the real problem with the OLS estimation of the LPM.
4. Functional Form:
Since the model is linear, a unit increase in X results in a constant change of β1 in the probability
of an event, holding all other variables constant. The increase is the same regardless of the current
value of X. In many applications, this is unrealistic. When the outcome is a probability, it is often
substantively reasonable that the effects of independent variables will have diminishing returns as
the predicted probability approaches 0 or 1.
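The diminishing-returns point can be sketched with the logistic model introduced in the next section (the β value below is made up): there the marginal effect of X on the probability is β·P(1 − P), which is largest near P = 0.5 and shrinks toward zero as P approaches 0 or 1, unlike the constant effect of the LPM.

```python
import math

def logistic(z):
    """Logistic CDF: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

beta = 0.5  # illustrative coefficient
for z in [-4.0, 0.0, 4.0]:
    p = logistic(z)
    marginal_effect = beta * p * (1 - p)  # dP/dX in a logit model
    print(round(p, 3), round(marginal_effect, 4))
# The effect peaks where p = 0.5 and fades as p nears 0 or 1.
```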
Remark: Because of the above-mentioned problems, the LPM is not recommended for empirical work.
Therefore, what we need is a (probability) model that has these two features: (1) As Xi increases,
Pi = E(Y = 1 | X) increases but never steps outside the 0–1 interval, and (2) the relationship between
Pi and Xi is nonlinear, that is, “one which approaches zero at slower and slower rates as Xi gets
small and approaches one at slower and slower rates as Xi gets very large.’’ Geometrically, the
model we want would look something like Figure 1.1.
[S-shaped curve rising from 0 to 1 as X increases]
Fig 1.1: A Cumulative Distribution Function (CDF)
The above S-shaped curve closely resembles the cumulative distribution function (CDF) of a random variable. (Note that the CDF of a random variable X is simply the probability that it takes a value less than or equal to X0, where X0 is some specified numerical value of X. In short, F(X), the CDF of X, is F(X0) = P(X ≤ X0).) Therefore, one can easily use the CDF to model regressions where the response variable is dichotomous, taking 0-1 values.
The CDFs commonly chosen to represent the 0-1 response models are:
1. the logistic – which gives rise to the logit model;
2. the normal – which gives rise to the probit (or normit) model.
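Both CDFs can be written with the standard library alone; the normal CDF is expressed through the error function `math.erf`. Both are S-shaped, equal 0.5 at zero, and stay strictly inside (0, 1), which is exactly the property the LPM lacked.

```python
import math

def logistic_cdf(z):
    """CDF of the standard logistic distribution (logit model)."""
    return 1.0 / (1.0 + math.exp(-z))

def normal_cdf(z):
    """CDF of the standard normal distribution (probit model), via erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

for z in [-3.0, -1.0, 0.0, 1.0, 3.0]:
    print(z, round(logistic_cdf(z), 4), round(normal_cdf(z), 4))
# Both curves pass through 0.5 at z = 0 and approach 0 and 1 in the tails,
# the logistic doing so slightly more slowly than the normal.
```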