
Qualitative Response Regression Models
(With a vivid example, explain the nature of qualitative response models)
The Nature of Qualitative Response Models
Example: suppose we want to study the labor force participation (LFP) decision of adult
males. Since an adult is either in the labor force or not, LFP is a yes-or-no decision. Hence
the response variable, or regressand (dependent variable), can take only two values, say,
1 if the person is in the labor force and 0 if he or she is not.
In other words, the regressand is a binary, or dichotomous, variable.
For present purposes, the important thing to note is that the regressand is a qualitative variable.
(Explain the key difference between quantitative and qualitative models)
Difference between qualitative and quantitative models
In a model where Y is quantitative, our objective is to estimate its expected, or mean,
value given the values of the regressors (independent variable).
In models where Y is qualitative, by contrast, our objective is to find the probability of
something happening, such as voting for a Democratic candidate, owning a house,
belonging to a union, or participating in a sport. Hence, qualitative response
regression models are often known as probability models.
We start our study of qualitative response models by first considering the binary
response regression model.
There are four approaches to developing a probability model for a binary response
variable:
1. The linear probability model (LPM)
2. The logit model
3. The probit model
4. The tobit model
(What is the linear probability model, and how does the LPM give rise to a Bernoulli
probability distribution?)
The Linear Probability Model (LPM)
Consider the following regression model:
Yi = β1 + β2Xi + ui
where X = family income and Y = 1 if the family owns a house and 0 if it does not
own a house.
The model above looks like a typical linear regression model, but because the
regressand is binary, or dichotomous, it is called a linear probability model (LPM).
This is because the conditional expectation of Yi given Xi , E(Yi | Xi), can be
interpreted as the conditional probability that the event will occur given Xi , that is,
Pr (Yi = 1 | Xi).
Thus, in our example, E(Yi | Xi) gives the probability that a family owns a house,
given that its income is the amount Xi.
The justification of the name LPM for models like the equation above can be seen as
follows:
Assuming E(ui) = 0, as usual (to obtain unbiased estimators), we obtain
E(Yi | Xi) = β1 + β2Xi
Now, if Pi = probability that Yi = 1 (that is, the event occurs), and (1 − Pi) =
probability that Yi = 0 (that is, the event does not occur), the variable Yi has the
following (probability) distribution

Yi        Probability
0         1 − Pi
1         Pi
Total     1

That is, Yi follows the Bernoulli probability distribution.


Now, by the definition of mathematical expectation, we obtain:
E(Yi) = 0(1 − Pi) + 1(Pi) = Pi
Comparing above with E(Yi | Xi) = β1 + β2Xi we can equate
E(Yi | Xi) = β1 + β2Xi = Pi
In general, the expectation of a Bernoulli random variable is the probability that
the random variable equals 1.
In passing, note that if there are n independent trials, each with probability p of
success and probability (1 − p) of failure, and X represents the number of successes
in these n trials, then X is said to follow the binomial distribution.
The mean of the binomial distribution is np and its variance is np(1 − p).
Since the probability Pi must lie between 0 and 1, we have the restriction
0 ≤ E(Yi | Xi) ≤ 1
that is, the conditional expectation (or conditional probability) must lie between 0
and 1.
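To make this concrete, here is a minimal sketch of estimating an LPM by OLS in Python. The data are simulated; the income range, sample size, and true coefficients are illustrative assumptions, not values from the text.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
income = rng.uniform(5, 40, 500)               # X: family income, in thousands
p_true = np.clip(0.05 + 0.02 * income, 0, 1)   # assumed true P(owns house | income)
owns = rng.binomial(1, p_true)                 # Y: 1 = owns a house, 0 = does not

X = sm.add_constant(income)
lpm = sm.OLS(owns, X).fit()                    # OLS on a binary regressand = LPM
print(lpm.params)                              # estimated beta1, beta2
print(lpm.fittedvalues.min(), lpm.fittedvalues.max())  # fits can leave [0, 1]

The fitted slope estimates the change in the probability of owning a house per unit increase in income; note that nothing constrains the fitted values to stay inside the 0-1 band.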
(Explain the setbacks of using the LPM and the ways of correcting those setbacks)
The linear probability model poses several problems, which are as follows:
i. Non-Normality of the Disturbances ui
The assumption of normality for ui is not tenable for LPMs because, like Yi, the
disturbances ui take only two values; that is, they also follow the Bernoulli
distribution:
ui = Yi − β1 − β2Xi
The probability distribution of ui is

              ui                 Probability
When Yi = 1:  1 − β1 − β2Xi      Pi
When Yi = 0:  −β1 − β2Xi         1 − Pi

Obviously, ui cannot be assumed to be normally distributed; like Yi, they take only
two values and follow the Bernoulli distribution.
This problem can be mitigated by increasing the sample size: as the sample size
increases indefinitely, the OLS estimators tend toward normality, so inference that
relies on normality is approximately valid in large samples.
ii. Heteroscedastic Variances of the Disturbances
As statistical theory shows, for a Bernoulli distribution the theoretical mean and
variance are, respectively, p and p(1 − p), where p is the probability of success (i.e.,
something happening), showing that the variance is a function of the mean. Hence
the error variance is heteroscedastic.
We already know that, in the presence of heteroscedasticity, the OLS estimators,
although unbiased, are not efficient; that is, they do not have minimum variance.
Since the variance of ui depends on E(Yi | Xi), one way to resolve the
heteroscedasticity problem is to transform the model by dividing it through by √wi,
where wi = Pi(1 − Pi):

Yi/√wi = β1/√wi + β2(Xi/√wi) + ui/√wi

As you can readily verify, the transformed error term ui/√wi is homoscedastic. Since
the true wi is unknown, in practice it is estimated as ŵi = Ŷi(1 − Ŷi) from a
first-stage OLS fit, as sketched below.
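A minimal sketch of this two-step WLS correction, under the same simulated-data assumptions as above (the fitted probabilities are clipped away from 0 and 1 so the estimated weights stay positive):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
income = rng.uniform(5, 40, 500)
owns = rng.binomial(1, np.clip(0.05 + 0.02 * income, 0, 1))
X = sm.add_constant(income)

yhat = np.clip(sm.OLS(owns, X).fit().fittedvalues, 0.01, 0.99)  # first-stage fit
w = yhat * (1 - yhat)                         # estimated Var(u_i) = P_i(1 - P_i)
wls = sm.WLS(owns, X, weights=1.0 / w).fit()  # same as dividing through by sqrt(w_i)
print(wls.params)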
iii. Nonfulfillment of 0 ≤ E(Yi | Xi) ≤ 1.
Since E(Yi | Xi) in the linear probability models measures the conditional probability
of the event Y occurring given X, it must necessarily lie between 0 and 1.
Although this is true a priori, there is no guarantee that Ŷi, the estimators of
E(Yi | Xi), will necessarily fulfill this restriction, and this is the real
problem with OLS estimation of the LPM.
This happens because OLS does not take into account the restriction that
0 ≤ E(Yi) ≤ 1 (an inequality restriction).
There are two ways of dealing with this problem:
1. Estimate the LPM by the usual OLS method and check whether the estimated Ŷi
lie between 0 and 1. If some are less than 0 (that is, negative), set Ŷi to zero
for those cases; if some are greater than 1, set them to 1 (see the sketch after
this list).
2. Devise an estimating technique, such as the logit or probit model, that
guarantees the estimated conditional probabilities Ŷi lie between 0 and 1.
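A minimal sketch of the first approach, again on simulated data: estimate by OLS, then clamp the fitted values into the 0-1 band after the fact:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
income = rng.uniform(5, 40, 500)
owns = rng.binomial(1, np.clip(0.05 + 0.02 * income, 0, 1))
yhat = sm.OLS(owns, sm.add_constant(income)).fit().fittedvalues

p_hat = np.clip(yhat, 0.0, 1.0)            # negative fits set to 0, fits above 1 set to 1
print((yhat < 0).sum(), (yhat > 1).sum())  # count of fits outside the 0-1 band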
iv. Questionable Value of R2 as a Measure of Goodness of Fit.
The conventionally computed R2 is of limited value in the dichotomous response
models.
To see why, note that corresponding to a given X, Y is either 0 or 1.
Therefore, all the Y values will lie either along the X axis or along the
horizontal line corresponding to 1.
Generally, then, no LPM is expected to fit such a scatter well, whether it is the
unconstrained LPM or the truncated or constrained LPM, an LPM estimated in such a
way that the fitted values do not fall outside the logical band 0–1.
As a result, the conventionally computed R2 is likely to be much lower than 1 for
such models. In most practical applications the R2 ranges between 0.2 and 0.6.
R2 in such models will be high, say in excess of 0.8, only when the actual scatter
is very closely clustered around two points A and B (one near Y = 0 at low X, one
near Y = 1 at high X), for in that case it is easy to fix the straight line by
joining the two points. The predicted Yi will then be very close to either 0 or 1.
Example
Currently in Dar es Salaam, some people are affected by red eye. Formulate a linear
probability model for this situation.
Solution
We have two independent variables:
1. Exposure to pollution (X1)
This variable represents the level of pollution individuals are exposed to, as pollution can
often lead to eye irritation and redness.
2. Bad personal hygiene practices (X2)
For example, low handwashing frequency or poor cleanliness of living conditions, which
affect the risk of eye infection.

Once we have selected these variables, we can construct our LPM equation
Y = β1 + β2X1 + β3X2 + ui
Here the response variable, or regressand (dependent variable), can take only two values: 1
if the individual is affected by red eye and 0 if not.
Where
β1 is the intercept term
β2 , β3 are the coefficients of the independent variables
X1,X2 are regressors (Explanatory variables)
ui is the error term.

Interpretation
For every one-unit increase in exposure to pollution (X1), the probability of being affected
by red eye increases by β2, other variables held constant.
For every one-unit increase in bad personal hygiene practices (X2), the probability of being
affected by red eye increases by β3, other variables held constant.
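A minimal sketch of this hypothetical red-eye LPM on simulated data; the variable scales and true coefficients are assumptions made purely for illustration:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x1 = rng.uniform(0, 10, 400)                  # exposure to pollution
x2 = rng.uniform(0, 5, 400)                   # bad personal hygiene score
p_true = np.clip(0.05 + 0.04 * x1 + 0.08 * x2, 0, 1)
red_eye = rng.binomial(1, p_true)             # 1 = affected by red eye, 0 = not

X = sm.add_constant(np.column_stack([x1, x2]))
fit = sm.OLS(red_eye, X).fit()
print(fit.params)  # beta1, beta2, beta3: each slope = change in probability per unit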
Alternatives to LPM
1. The Logit Model
2. The Probit Model
3. The Tobit model
1. THE LOGIT MODEL
Recall the LPM
Pi = β1 + β2Xi
where X is income and Pi = E(Y = 1 | Xi) is the probability that the family owns a house.
Now consider the following representation of home ownership:
Pi = 1 / (1 + e^−(β1 + β2Xi))

For ease of exposition, we write this as

Pi = 1 / (1 + e^−Zi)

where Zi = β1 + β2Xi. Multiplying both the numerator and the denominator of the
right-hand side by e^Zi (and using e^Zi · e^−Zi = 1) gives the equivalent form

Pi = e^Zi / (1 + e^Zi)
The equation above represents what is known as the (cumulative) logistic distribution
function.
If Pi, the probability that the event occurs, is given by the equation above, then
(1 − Pi), the probability that the event does not occur, is

1 − Pi = 1 / (1 + e^Zi)
Therefore, we can write

Pi / (1 − Pi) = (1 + e^Zi) / (1 + e^−Zi) = e^Zi

Now Pi/(1 − Pi) is simply the odds ratio in favor of the event occurring: the ratio of
the probability that the event will occur to the probability that it will not occur.
Thus, if Pi = 0.8, the odds are 4 to 1 in favor of the event occurring.

Taking the natural log of the odds ratio, we obtain

Li = ln[Pi / (1 − Pi)] = Zi = β1 + β2Xi

that is, Li, the log of the odds ratio, is not only linear in X but also (from the
estimation viewpoint) linear in the parameters. Li is called the logit.
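A small numeric check of this algebra: starting from a value of Zi, compute Pi, the odds ratio, and the logit, and confirm the round trip (Pi = 0.8 corresponds to odds of 4 to 1):

import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))  # P_i = 1 / (1 + e^(-Z_i))

z = 1.386                  # an arbitrary value of Z_i = beta1 + beta2*X_i
p = logistic(z)            # ~0.8
odds = p / (1 - p)         # ~4.0, i.e. e^Z_i: odds of 4 to 1 when P_i = 0.8
logit = np.log(odds)       # ~1.386: recovers Z_i, confirming L_i = Z_i
print(p, odds, logit)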
Estimation of the Logit Model
Li = ln[Pi / (1 − Pi)] = β1 + β2Xi + ui

To estimate Equation above, we need, apart from Xi, the values of the regressand,
or logit, Li. This depends on the type of data we have for analysis. We distinguish
two types of data:
(1) Data at the individual, or micro, level.
(2) Grouped or replicated data.
1. Data at the individual, or micro, level.
If we have data on individual families, OLS estimation of the logit model is infeasible.
Here Pi = 1 if the event occurs and Pi = 0 if it does not. But if we put these values
directly into the logit Li, we obtain

Li = ln(1/0)   if the event occurs
Li = ln(0/1)   if the event does not occur

Obviously, these expressions are meaningless. Therefore, if we have data at the
micro, or individual, level, we cannot estimate the logit model by the standard OLS
routine. In this situation we have to resort to the maximum-likelihood (ML)
method to estimate the parameters.
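A minimal sketch of ML estimation on individual-level data, using the Logit routine in the statsmodels library; the simulated sample and true parameters are illustrative assumptions:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
income = rng.uniform(5, 40, 500)
z = -4.0 + 0.2 * income                            # assumed true Z_i = beta1 + beta2*X_i
owns = rng.binomial(1, 1.0 / (1.0 + np.exp(-z)))   # individual 0/1 outcomes

ml = sm.Logit(owns, sm.add_constant(income)).fit() # maximum likelihood
print(ml.params)                                   # ML estimates of beta1, beta2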
2. Grouped or replicated data.
Now consider grouped data of the following kind: several families are grouped, or
replicated (repeat observations), by income level, and we record the number of
families owning a house at each income level. Corresponding to each income level Xi
there are Ni families, ni among whom are home owners (ni ≤ Ni). Therefore, if we
compute the relative frequency

P̂i = ni / Ni

we can use it as an estimate of the true Pi corresponding to each Xi. Using the
estimated P̂i, we can obtain the estimated logit as

L̂i = ln[P̂i / (1 − P̂i)]

which will be a fairly good estimate of the true logit Li if the number of
observations Ni at each Xi is reasonably large.
As in the case of the LPM, the disturbance term in the logit model is
heteroscedastic. Thus, instead of OLS we have to use weighted least squares (WLS).
We now describe the various steps in estimating the logit regression (see the
sketch after this list):
1. For each income level Xi, compute the estimated probability of owning a house
as P̂i = ni / Ni.
2. For each Xi, obtain the logit as L̂i = ln[P̂i / (1 − P̂i)].
3. To resolve the problem of heteroscedasticity, transform the logit regression by
multiplying it through by √wi, where wi = Ni P̂i(1 − P̂i):

√wi Li = β1√wi + β2(√wi Xi) + √wi ui

which we write as

Li* = β1√wi + β2Xi* + vi

where Li* = transformed or weighted Li; Xi* = transformed or weighted Xi; and
vi = transformed error term. It is easy to verify that the transformed error term vi is
homoscedastic.
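A minimal sketch of these steps followed by the final regression; the grouped figures below are illustrative, chosen so that the weight at X = 20 matches the √wi ≈ 4.18 used in the worked example that follows:

import numpy as np
import statsmodels.api as sm

X = np.array([6, 8, 10, 13, 15, 20, 25, 30, 35, 40], dtype=float)     # income levels
N = np.array([40, 50, 60, 80, 100, 70, 65, 50, 40, 25], dtype=float)  # families N_i
n = np.array([8, 12, 18, 28, 45, 36, 39, 33, 30, 20], dtype=float)    # owners n_i

P = n / N                # step 1: Phat_i = n_i / N_i
L = np.log(P / (1 - P))  # step 2: estimated logit Lhat_i
w = N * P * (1 - P)      # step 3: weight w_i = N_i * Phat_i * (1 - Phat_i)
sw = np.sqrt(w)

Lstar = sw * L                     # weighted logit L_i*
Z = np.column_stack([sw, sw * X])  # regressors sqrt(w_i) and X_i*; no separate constant
glogit = sm.OLS(Lstar, Z).fit()    # OLS on the transformed data = WLS
print(glogit.params)               # roughly (-1.59, 0.079) for these figures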
Example 1
We will use grouped home-ownership data of the kind just described. Since the data
are grouped, the logit model based on them is called a grouped logit model, glogit
for short. Carrying out the steps above by WLS yields an estimated regression of
the form

L̂i* = −1.59474 √wi + 0.07862 Xi*
i. How do you interpret the regression above?
ii. Compute the probability of owning a house at X = 20 from the regression model.
Solution
i. Odds interpretation
Remember that Li = ln[Pi/(1 − Pi)]. Therefore, taking the antilog of the estimated
logit gives Pi/(1 − Pi), that is, the odds ratio. Taking the antilog of the estimated
slope coefficient, we obtain e^0.07862 = 1.0817. This means that for a unit increase
in weighted income, the (weighted) odds in favor of owning a house increase by a
factor of 1.0817, or by about 8.17 percent.

Note
if you take the antilog of the slope coefficient, subtract 1 from it, and multiply the
result by 100, you will get the percent change in the odds for a unit increase in the
regressor.
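This rule in code, applied to the slope estimate above:

import math

slope = 0.07862                       # estimated logit slope from the example
odds_factor = math.exp(slope)         # antilog: ~1.0817
pct_change = (odds_factor - 1) * 100  # subtract 1, multiply by 100: ~8.17
print(odds_factor, pct_change)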
ii. Computing probabilities
Suppose we want to compute the probability of owning a house at X = 20 ($20,000).
Plugging this value into the estimated equation, we obtain L̂i* = −0.09311; dividing
this by √wi = 4.1816 (the weight at this income level), we obtain L̂i = −0.02226.
Therefore, at the income level of $20,000, we have

P̂i = 1 / (1 + e^0.02226) = 0.4945

That is, given an income of $20,000, the probability of a family owning a house is
about 49 percent.
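The same computation in code, using the numbers from the example:

import math

L_star = -0.09311          # weighted logit at X = 20, from the estimated equation
sqrt_w = 4.1816            # sqrt(w_i) at X = 20
L = L_star / sqrt_w        # unweighted logit, ~ -0.02226

p = 1.0 / (1.0 + math.exp(-L))  # invert the logit
print(L, p)                     # ~ -0.02226, ~0.4945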
Example 2
From data for 54 standard metropolitan statistical areas (SMSAs), Demaris
estimated the following logit model to explain high murder rate versus low murder
rate:

ln Oi = 1.1387 + 0.0014Pi + 0.0561Ci − 0.4050Ri
se =             (0.0009)   (0.0227)   (0.1568)
where O = the odds of a high murder rate, P = 1980 population size in thousands,
C = population growth rate from 1970 to 1980, R = reading quotient, and the se are
the asymptotic standard errors.
a. How would you interpret the population size coefficients?
b. Which of the coefficients are individually statistically significant?
c. What is the effect of a unit increase in the reading quotient on the odds of
having a higher murder rate?
d. What is the effect of a percentage point increase in the population growth rate
on the odds of having a higher murder rate?
Solutions
a. The coefficient 0.0014 attached to Pi is to be interpreted as follows:
e^0.0014 = 1.0014; we subtract 1 from the answer and multiply the difference by 100:
(1.0014 − 1) × 100 = 0.14%. This means that if the population increases by one unit
(one thousand people), the odds of a high murder rate go up by about 0.14%.

b. Individually, the coefficients of C and R are statistically significant at the 5%
level. The asymptotic t ratios are 0.0014/0.0009 ≈ 1.56 for P, 0.0561/0.0227 ≈ 2.47
for C, and −0.4050/0.1568 ≈ −2.58 for R; only the latter two exceed the critical
value of about 1.96 in absolute value.

c. The effect of a unit increase in the reading quotient:
The coefficient of R is −0.4050, so e^−0.4050 = 0.6670; we subtract 1 from the answer
and multiply the difference by 100:
(0.6670 − 1) × 100 = −33.30%. This means that if the reading quotient increases by one
unit, the odds of a high murder rate go down by about 33.3%.

d. The effect of a percentage point increase in the population growth rate:
e^0.0561 = 1.0577; we subtract 1 from the answer and multiply the difference by 100:
(1.0577 − 1) × 100 = 5.77%. This means that if the population growth rate increases
by one percentage point, the odds of a high murder rate go up by about 5.77%.
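A quick numeric check of all four answers, computed from the reported coefficients and asymptotic standard errors alone:

import math

coefs = {"P": 0.0014, "C": 0.0561, "R": -0.4050}
ses = {"P": 0.0009, "C": 0.0227, "R": 0.1568}

for name, b in coefs.items():
    t = b / ses[name]              # asymptotic t ratio; |t| > 1.96 => significant at 5%
    pct = (math.exp(b) - 1) * 100  # percent change in odds per unit increase
    print(name, round(t, 2), round(pct, 2))
# P: t ~ 1.56 (not significant), +0.14% odds
# C: t ~ 2.47 (significant), +5.77% odds
# R: t ~ -2.58 (significant), -33.30% odds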
