In all the regression models that we have considered so far, we have implicitly assumed that the regressand, the dependent variable, is quantitative. In this section we consider models in which the regressand itself is qualitative, taking only the values 1 and 0.
One example of such a model is the determinants of poverty, where the outcome variable is binary.
where X = family income and Y = 1 if the family owns a house and 0 if it does
not own a house. Model (15.2.1) looks like a typical linear regression model but
because the regressand is binary, or dichotomous, it is called a linear
probability model (LPM).
Now, if Pi = probability that Yi = 1 (that is, the event occurs), and (1 − Pi) = probability that Yi = 0 (that is, that the event does not occur), the variable Yi has the following (probability) distribution:

Yi        Probability
0         1 − Pi
1         Pi
Total     1

By the definition of expectation, E(Yi) = 0(1 − Pi) + 1(Pi) = Pi;
that is, the conditional expectation of the model (15.2.1) can, in
fact, be interpreted as the conditional probability of Yi . In
general, the expectation of a Bernoulli random variable is the
probability that the random variable equals 1.
In passing note that if there are n independent trials, each with a
probability p of success and probability (1 − p) of failure, and X of
these trials represent the number of successes, then X is said to
follow the binomial distribution. The mean of the binomial
distribution is np and its variance is np(1 − p). The term success is
defined in the context of the problem.
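The binomial mean np and variance np(1 − p) can be checked with a quick simulation. This is a sketch, not part of the original text; the trial count, success probability, and number of replications are arbitrary choices, and the sample statistics only approximate the theoretical values:

```python
import random

random.seed(42)

n, p = 20, 0.3    # 20 independent trials, success probability 0.3
reps = 20000      # number of simulated binomial draws

# Each draw counts the successes in n Bernoulli trials
draws = [sum(random.random() < p for _ in range(n)) for _ in range(reps)]

mean = sum(draws) / reps
var = sum((x - mean) ** 2 for x in draws) / reps

print(mean, n * p)            # sample mean vs. theoretical np = 6.0
print(var, n * p * (1 - p))   # sample variance vs. np(1 - p) = 4.2
```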
that is, the conditional expectation (or conditional probability) must lie between 0 and 1. From the preceding discussion it would seem that OLS can be easily extended to binary dependent variable regression models. So, perhaps there is nothing new here. Unfortunately, this is not the case, for the LPM poses several problems: the disturbances ui are not normally distributed (they follow the Bernoulli distribution), they are heteroscedastic, the conventional R2 is of questionable value, and there is no guarantee that the fitted values satisfy 0 ≤ Ŷi ≤ 1 throughout.
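The last of these problems is easy to see directly: nothing constrains OLS fitted values of a 0–1 regressand to the unit interval. A minimal sketch with made-up income/ownership data (all numbers here are hypothetical, not from the text):

```python
# Fitting an LPM by OLS on synthetic 0/1 data can produce fitted
# "probabilities" below 0 and above 1.
xs = list(range(1, 21))   # hypothetical income levels
ys = [0] * 8 + [1] * 12   # owns a house once income is high enough

n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n

# OLS slope and intercept for Y = b1 + b2*X
b2 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
     sum((x - xbar) ** 2 for x in xs)
b1 = ybar - b2 * xbar

fitted = [b1 + b2 * x for x in xs]
print(min(fitted), max(fitted))   # falls below 0 and rises above 1
```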
Thus, in our home ownership example we found that as X
increases by a unit ($1000), the probability of owning a house
increases by the same constant amount of 0.10. This is so whether
the income level is $8000, $10,000, $18,000, or $22,000. This seems
patently unrealistic. In reality one would expect that Pi is
nonlinearly related to Xi :
At very low income a family will not own a house but at a
sufficiently high level of income, say, X*, it most likely will own a
house. Any increase in income beyond X* will have little effect on
the probability of owning a house. Thus, at both ends of the
income distribution, the probability of owning a house will be
virtually unaffected by a small increase in X.
Therefore, what we need is a (probability) model that has these two
features: (1) As Xi increases, Pi = E(Y = 1 | X) increases but never
steps outside the 0–1 interval, and (2) the relationship between Pi and
Xi is nonlinear, that is, “one which approaches zero at slower and
slower rates as Xi gets small and approaches one at slower and
slower rates as Xi gets very large.” Geometrically, the model we want
would look something like Figure 15.2. Notice in this model that the
probability lies between 0 and 1 and that it varies nonlinearly with X.
The reader will realize that the sigmoid, or S-shaped, curve in the
figure very much resembles the cumulative distribution function
(CDF) of a random variable.
Therefore, one can easily use the CDF to model regressions where
the response variable is dichotomous, taking 0–1 values. The
practical question now is, which CDF? For although all CDFs are
S shaped, for each random variable there is a unique CDF.
For historical as well as practical reasons, the CDFs commonly
chosen to represent the 0–1 response models are
The logistic CDF: giving rise to the logit model
The normal CDF: giving rise to the probit (or normit) model.
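Both of these CDFs can be written in closed form with standard-library functions. The sketch below (illustrative, not from the text) evaluates each at a few points to show that both are S-shaped and bounded by 0 and 1; the logistic CDF has somewhat fatter tails than the normal:

```python
import math

def logistic_cdf(z):
    """Cumulative logistic distribution: the basis of the logit model."""
    return 1.0 / (1.0 + math.exp(-z))

def normal_cdf(z):
    """Standard normal CDF via the error function: the basis of the probit model."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Both functions rise from near 0 to near 1 and equal 0.5 at z = 0
for z in (-4, -1, 0, 1, 4):
    print(z, round(logistic_cdf(z), 4), round(normal_cdf(z), 4))
```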
Alternative Models to the LPM: Logit and Probit Models
They are probability models with two characteristics:
1. As Xi increases, Pi = E(Yi = 1 | Xi) increases but never steps outside the 0–1 interval.
2. The relationship between Pi and Xi is nonlinear.
The cumulative distribution function (CDF) can be used to model such qualitative response models. The CDF gives the probability that a random variable takes a value less than or equal to some specified numerical value: F(y) = P(Y ≤ y).
THE LOGIT MODEL
We will continue with our home ownership example to explain the basic ideas underlying the logit model. Recall that in explaining home ownership in relation to income, the LPM was

Pi = E(Y = 1 | Xi) = β1 + β2Xi    (15.5.1)

The logit model instead expresses the probability as

Pi = 1 / (1 + e^−(β1 + β2Xi))    (15.5.2)

For ease of exposition, write Zi = β1 + β2Xi, so that

Pi = 1 / (1 + e^−Zi) = e^Zi / (1 + e^Zi)    (15.5.3)
Equation (15.5.3) represents what is known as the (cumulative)
logistic distribution function. It is easy to verify that as Zi ranges
from −∞ to +∞, Pi ranges between 0 and 1 and that Pi is
nonlinearly related to Zi (i.e., Xi), thus satisfying the two
requirements considered earlier.
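A useful by-product of (15.5.3) is that the log of the odds, ln[Pi/(1 − Pi)], exactly recovers Zi, which is why logit coefficients are read as changes in log odds. A quick numerical check (illustrative code, not from the text):

```python
import math

def logistic(z):
    """Pi = 1 / (1 + e^-Zi), the cumulative logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

for z in (-2.0, 0.5, 3.0):
    p = logistic(z)
    log_odds = math.log(p / (1.0 - p))   # the "logit" of p
    print(z, round(p, 4), round(log_odds, 4))   # log_odds equals z
```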
Example 2
Consider the logit2 data, where grade = 1 if the student receives an A and 0 otherwise; gpa = the student's grade point average; score = the score on an examination given at the beginning of the term (a diagnostic exam); and methodology = 1 if the new teaching method is used and 0 if the old method is used.
Fit the following model:

ln[Pi / (1 − Pi)] = β1 + β2 gpa + β3 score + β4 methodology + ui

. logit grade gpa score methodology, nolog
In the table we see the coefficients, their standard errors, the z-statistic, associated p-
values, and the 95% confidence interval of the coefficients. Both gpa and methodology
are statistically significant, while score is statistically insignificant.
The logistic regression coefficients give the change in the log odds of the outcome for a
one unit increase in the predictor variable.
For every one unit increase in gpa, the log odds in favor of scoring an A increase by 2.826.
For a one unit increase in score, the log odds in favor of scoring an A increase by 0.095; however, this variable is statistically insignificant.
The indicator variables for methodology have a slightly different interpretation. If the new teaching method is used (versus the old teaching method), the log odds in favor of scoring an A increase by 2.37.
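Because these coefficients are changes in log odds, exponentiating them gives odds ratios, which are often easier to communicate. The sketch below applies this to the coefficients reported above (the conversion itself is standard; only the rounding is ours):

```python
import math

# Log-odds coefficients reported for the grade example above
coef = {"gpa": 2.826, "score": 0.095, "methodology": 2.37}

# Exponentiating a log-odds coefficient gives an odds ratio:
# e.g., a one-unit gpa increase multiplies the odds of an A by exp(2.826).
odds_ratios = {name: math.exp(b) for name, b in coef.items()}
for name, ratio in odds_ratios.items():
    print(name, round(ratio, 3))
```

So a one-unit gpa increase multiplies the odds of an A roughly 17-fold, while the (insignificant) score coefficient implies an odds ratio close to 1.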
. mfx
This data set has a binary response (outcome, dependent) variable called admit.
There are three predictor variables: gre, gpa and rank. We will treat the
variables gre and gpa as continuous. The variable rank takes on the values 1
through 4. Institutions with a rank of 1 have the highest prestige, while those with
a rank of 4 have the lowest.
Stata result

. logit admit gpa gre i.rank, nolog

       admit |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        rank |
           2 |  -.6754429   .3164897    -2.13   0.033    -1.295751   -.0551346
           3 |  -1.340204   .3453064    -3.88   0.000    -2.016992   -.6634158
           4 |  -1.551464   .4178316    -3.71   0.000    -2.370399   -.7325287

. mfx
In the table we see the coefficients, their standard errors, the z-statistic, associated p-
values, and the 95% confidence interval of the coefficients. Both gre and gpa are
statistically significant, as are the three indicator variables for rank.
The logistic regression coefficients give the change in the log odds of the outcome for a
one unit increase in the predictor variable.
For every one unit change in gre, the log odds of admission (versus non-admission)
increases by 0.002.
For a one unit increase in gpa, the log odds of being admitted to graduate school
increases by 0.804.
The indicator variables for rank have a slightly different interpretation. For
example, having attended an undergraduate institution with rank of 2, versus an
institution with a rank of 1, decreases the log odds of admission by 0.675.
Probit model
As we have noted, to explain the behavior of a dichotomous dependent variable we
will have to use a suitably chosen CDF. The logit model uses the cumulative logistic
function, as shown in (15.5.2).
But this is not the only CDF that one can use. In some applications, the normal CDF
has been found useful. The estimating model that emerges from the normal CDF is
popularly known as the probit model, although sometimes it is also known as the
normit model. In principle one could substitute the normal CDF in place of the
logistic CDF.
In statistics, a probit model is a type of regression where the dependent variable can
take only two values, for example married or not married. The purpose of the model
is to estimate the probability that an observation with particular characteristics will
fall into a specific one of the categories; moreover, classifying observations based on
their predicted probabilities is a type of binary classification model.
Example.
. probit admit gpa gre i.rank, nolog

Probit regression                             Number of obs   =        400
                                              LR chi2(5)      =      41.56
                                              Prob > chi2     =     0.0000
Log likelihood = -229.20658                   Pseudo R2       =     0.0831

------------------------------------------------------------------------------
       admit |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gpa |   .4777302   .1954625     2.44   0.015     .0946308    .8608297
         gre |   .0013756   .0006489     2.12   0.034     .0001038    .0026473
        rank |
           2 |  -.4153992   .1953769    -2.13   0.033    -.7983308   -.0324675
           3 |   -.812138   .2085956    -3.89   0.000    -1.220978   -.4032981
           4 |   -.935899   .2456339    -3.81   0.000    -1.417333   -.4544654
       _cons |  -2.386838   .6740879    -3.54   0.000    -3.708026   -1.065649
------------------------------------------------------------------------------
The likelihood ratio chi-square of 41.56 with a p-value of 0.0001 tells us that our
model as a whole is statistically significant, that is, it fits significantly better than a
model with no predictors.
In the table we see the coefficients, their standard errors, the z-statistic, associated p-
values, and the 95% confidence interval of the coefficients. Both gre, gpa, and the
three indicator variables for rank are statistically significant. The probit regression
coefficients give the change in the z-score or probit index for a one unit change in the
predictor.
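A fitted probit index can be converted to a predicted probability by passing it through the standard normal CDF, Φ. The sketch below does this using the coefficients from the probit output above; the applicant profile (gpa 3.0, gre 600, rank-2 institution) is our hypothetical example, not from the text:

```python
import math

def normal_cdf(z):
    """Standard normal CDF, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Probit coefficients from the Stata output above
b_cons, b_gpa, b_gre = -2.386838, 0.4777302, 0.0013756
b_rank = {1: 0.0, 2: -0.4153992, 3: -0.812138, 4: -0.935899}

# Hypothetical applicant: gpa 3.0, gre 600, rank-2 institution
z = b_cons + b_gpa * 3.0 + b_gre * 600 + b_rank[2]
p_admit = normal_cdf(z)   # z of about -0.54 gives roughly a 0.29 probability
print(round(z, 3), round(p_admit, 3))
```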
Yi = β1 + β2X2i + β3X3i + β4X4i + ui

where the dependent variable is poverty status: Y = 1 if non-poor, 0 otherwise.