Lecture 7 Probit

This document discusses probit regression models for binary dependent variables. It introduces the probit model, which uses the cumulative standard normal distribution to model the probability of a binary outcome as a nonlinear function of explanatory variables. This allows the predicted probabilities to always be between 0 and 1. The document provides an example using STATA to estimate a probit model using HMDA data and discusses interpreting the results, including computing predicted probabilities and marginal effects.

Uploaded by

Richa Jha

Lecture 7 Limited Dependent Variable -2

ECMT-7302
Econometrics II
MA Eco. 2022, Fall 2023
Instructor: Sunaina Dhingra
Lectures: Wednesday (11.20 am-12.50 pm) & Thursday (9.40-11.10 am)
Lecture Meeting Mode: In person (Classroom: T4-F99)
Office Hours: Wednesday 1-2.30 pm & by appointment (FOB, Office No. 1B in south on 7th Floor)
Email-id: [email protected]
Lecture Material: Slides and textbooks
Credits: 4.5
• Assume that in the two-variable model Yi = β1 + β2 Xi + ui the Yi are normally
and independently distributed with mean = β1 + β2 Xi and variance = σ².
• The joint probability density function of Y1, Y2, ... , Yn , given the preceding
mean and variance, can be written as

• But in view of the independence of the Y’s, this joint probability density
function can be written as a product of n individual density functions as

• Where

• which is the density function of a normally distributed variable with the given mean and
variance.
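Written out under the stated assumptions (independent, normally distributed Yi), the factorization and the individual density take the standard form sketched below. The labels (1) and (2) are assumed to match the equation numbers cited in the surrounding bullets.

```latex
\begin{align}
f(Y_1, Y_2, \dots, Y_n \mid \beta_1 + \beta_2 X_i, \sigma^2)
  &= f(Y_1 \mid \beta_1 + \beta_2 X_1, \sigma^2)\,
     f(Y_2 \mid \beta_1 + \beta_2 X_2, \sigma^2) \cdots
     f(Y_n \mid \beta_1 + \beta_2 X_n, \sigma^2) \tag{1} \\
f(Y_i)
  &= \frac{1}{\sigma\sqrt{2\pi}}
     \exp\left\{ -\frac{1}{2}\,
       \frac{(Y_i - \beta_1 - \beta_2 X_i)^2}{\sigma^2} \right\} \tag{2}
\end{align}
```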
• Substituting Equation (2) for each Yi into Equation (1) gives

• If Y1, Y2, . . . , Yn are known or given, but β1, β2, and σ² are not known, the function in
Equation (3) is called a likelihood function, denoted by LF(β1, β2, σ²), and written as

• The ML method consists of estimating the unknown parameters (β1, β2, and σ²) in such a manner
that the probability of observing the given Y’s is as high (or maximum) as possible.
• Therefore, we find the maximum of the function in Equation (4) using differential calculus.
• For differentiation it is easier to express Equation (4) in the log term as follows.
(Note: ln = natural log.)
• Differentiating Equation (5) partially with respect to β1, β2, and σ², we obtain
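As a sketch of the steps just described, the joint density, the likelihood function, its log, and the partial derivatives can be written as follows. The numbering (3) to (11) is assumed from the equation numbers cited in the bullets, and the tildes denote ML estimators, a notation that is likewise an assumption here.

```latex
\begin{align}
f(Y_1, \dots, Y_n \mid \beta_1 + \beta_2 X_i, \sigma^2)
  &= \frac{1}{\sigma^n (\sqrt{2\pi})^n}
     \exp\left\{ -\frac{1}{2} \sum_{i=1}^{n}
       \frac{(Y_i - \beta_1 - \beta_2 X_i)^2}{\sigma^2} \right\} \tag{3} \\
LF(\beta_1, \beta_2, \sigma^2)
  &= \frac{1}{\sigma^n (\sqrt{2\pi})^n}
     \exp\left\{ -\frac{1}{2} \sum_{i=1}^{n}
       \frac{(Y_i - \beta_1 - \beta_2 X_i)^2}{\sigma^2} \right\} \tag{4} \\
\ln LF
  &= -\frac{n}{2} \ln \sigma^2 - \frac{n}{2} \ln(2\pi)
     - \frac{1}{2} \sum_{i=1}^{n}
       \frac{(Y_i - \beta_1 - \beta_2 X_i)^2}{\sigma^2} \tag{5} \\
\frac{\partial \ln LF}{\partial \beta_1}
  &= \frac{1}{\sigma^2} \sum_{i=1}^{n} (Y_i - \beta_1 - \beta_2 X_i) \tag{6} \\
\frac{\partial \ln LF}{\partial \beta_2}
  &= \frac{1}{\sigma^2} \sum_{i=1}^{n} (Y_i - \beta_1 - \beta_2 X_i)\, X_i \tag{7} \\
\frac{\partial \ln LF}{\partial \sigma^2}
  &= -\frac{n}{2\sigma^2}
     + \frac{1}{2\sigma^4} \sum_{i=1}^{n} (Y_i - \beta_1 - \beta_2 X_i)^2 \tag{8}
\end{align}

% Setting (6)-(8) to zero at the ML estimators gives the first-order conditions:
\begin{align}
\frac{1}{\tilde\sigma^2} \sum_{i=1}^{n}
  (Y_i - \tilde\beta_1 - \tilde\beta_2 X_i) &= 0 \tag{9} \\
\frac{1}{\tilde\sigma^2} \sum_{i=1}^{n}
  (Y_i - \tilde\beta_1 - \tilde\beta_2 X_i)\, X_i &= 0 \tag{10} \\
-\frac{n}{2\tilde\sigma^2}
  + \frac{1}{2\tilde\sigma^4} \sum_{i=1}^{n}
    (Y_i - \tilde\beta_1 - \tilde\beta_2 X_i)^2 &= 0 \tag{11}
\end{align}
```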

• After simplifying, Eqs. (9) and (10) yield

• which are precisely the normal equations of the least-squares theory obtained by OLS

• The ML estimator of σ² is biased. The magnitude of this bias can be easily determined
as follows.
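A sketch of the standard results referred to here, with tildes denoting ML estimators and û_i the residuals (both notational assumptions): the first-order conditions for β1 and β2 simplify to the least-squares normal equations, and the first-order condition for σ² gives an estimator that divides by n rather than n - 2.

```latex
\begin{align}
\sum_{i=1}^{n} Y_i
  &= n\,\tilde\beta_1 + \tilde\beta_2 \sum_{i=1}^{n} X_i \\
\sum_{i=1}^{n} Y_i X_i
  &= \tilde\beta_1 \sum_{i=1}^{n} X_i + \tilde\beta_2 \sum_{i=1}^{n} X_i^2 \\
\tilde\sigma^2
  &= \frac{1}{n} \sum_{i=1}^{n} (Y_i - \tilde\beta_1 - \tilde\beta_2 X_i)^2
   = \frac{1}{n} \sum_{i=1}^{n} \hat u_i^2 \\
E(\tilde\sigma^2)
  &= \frac{1}{n}\, E\!\left( \sum_{i=1}^{n} \hat u_i^2 \right)
   = \frac{n-2}{n}\,\sigma^2
   = \sigma^2 - \frac{2}{n}\,\sigma^2
\end{align}
```

The estimator is therefore biased downward in finite samples by (2/n)σ², and the bias vanishes as n grows.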

Limited Dependent Variable Models
• Logit and Probit models for binary response

• Disadvantages of the linear probability model (LPM) for binary dependent variables


• Predictions sometimes lie outside the unit interval
• Partial effects of explanatory variables are constant

• Nonlinear models for binary response


• Response probability is a nonlinear function of explanatory variables
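In symbols, a general binary response model of this kind is usually written as below. The function name G and the index notation are assumptions taken from the standard textbook presentation.

```latex
\[
P(y = 1 \mid x) = G(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)
               = G(\beta_0 + x\beta),
\qquad 0 < G(z) < 1 \ \text{for all real } z .
\]
```

Because G maps any index value into the unit interval, the predicted probabilities cannot leave [0, 1], which addresses the first disadvantage of the LPM listed above.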

Limited Dependent Variable Models
• Choices for the link function

• Latent variable formulation of the Logit and Probit models
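For reference, the two usual choices of link function and the latent variable argument can be sketched as follows. The symbols G, Λ, y* and e follow common textbook notation and are assumptions here.

```latex
% Probit: G is the standard normal CDF
\[
G(z) = \Phi(z) = \int_{-\infty}^{z} \phi(v)\, dv,
\qquad \phi(v) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{v^2}{2}\right)
\]

% Logit: G is the logistic CDF
\[
G(z) = \Lambda(z) = \frac{\exp(z)}{1 + \exp(z)}
\]

% Latent variable formulation: y is observed, y^* is not
\[
y^* = \beta_0 + x\beta + e, \qquad y = 1[\,y^* > 0\,]
\]

% If e is independent of x and has a CDF G that is symmetric about zero:
\[
P(y = 1 \mid x) = P(e > -(\beta_0 + x\beta))
               = 1 - G(-(\beta_0 + x\beta))
               = G(\beta_0 + x\beta)
\]
```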

Limited Dependent Variable Models
• Interpretation of coefficients in Logit and Probit models

• Partial effects are nonlinear and depend on the level of x.
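A sketch of why the partial effects depend on x, using g for the density associated with the link function G (standard notation, assumed here):

```latex
% Continuous regressor x_j
\[
\frac{\partial P(y = 1 \mid x)}{\partial x_j}
  = g(\beta_0 + x\beta)\,\beta_j,
\qquad g(z) = \frac{dG(z)}{dz} > 0
\]

% Discrete change in x_k from c_k to c_k + 1, other regressors held fixed
\[
G\big(\beta_0 + \beta_1 x_1 + \dots + \beta_k (c_k + 1)\big)
  - G\big(\beta_0 + \beta_1 x_1 + \dots + \beta_k c_k\big)
\]
```

Since g is positive, the sign of the partial effect matches the sign of β_j, but its size changes with the point at which the index is evaluated.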

Limited Dependent Variable Models
• Maximum likelihood estimation of Logit and Probit models

• Properties of maximum likelihood estimators


• Maximum likelihood estimators are consistent, asymptotically normal, and asymptotically efficient if the
distributional assumptions hold.
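To make the idea of maximum likelihood for a probit model concrete, here is a minimal Python sketch. It evaluates the probit log-likelihood, the sum over observations of y·ln Φ(z) + (1 - y)·ln(1 - Φ(z)) with z = β0 + β1x, on a small made-up data set and finds the best point on a coarse grid. The data, the grid, and the function name `probit_loglik` are all illustrative assumptions; statistical software uses iterative numerical optimization rather than a grid.

```python
import math
from statistics import NormalDist


def probit_loglik(beta0, beta1, xs, ys):
    """Probit log-likelihood for one regressor and binary outcomes."""
    cdf = NormalDist().cdf  # standard normal CDF, Phi
    total = 0.0
    for x, y in zip(xs, ys):
        p = cdf(beta0 + beta1 * x)
        p = min(max(p, 1e-12), 1 - 1e-12)  # guard against log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total


# Made-up data for illustration only (not from the lecture).
xs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
ys = [0, 0, 0, 1, 0, 1]

# Coarse grid search: beta0 in [-5, 0], beta1 in [0, 10], step 0.1.
best_ll, best_b0, best_b1 = max(
    (probit_loglik(i / 10, j / 10, xs, ys), i / 10, j / 10)
    for i in range(-50, 1)
    for j in range(0, 101)
)

print(f"grid maximum: beta0={best_b0}, beta1={best_b1}, logL={best_ll:.4f}")
```

The log-likelihood is never positive, because each term is the log of a probability; the ML estimates are the parameter values that bring it closest to zero.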

Probit and Logit Regression

• The problem with the linear probability model is that it
models the probability of Y=1 as being linear:

Pr(Y = 1|X) = β0 + β1X

• Instead, we want:

• Pr(Y = 1|X) to be increasing in X for β1>0, and

• 0 ≤ Pr(Y = 1|X) ≤ 1 for all X

• This requires using a nonlinear functional form for the
probability. How about an “S-curve” (like a CDF from earlier classes)?
• The probit model satisfies these conditions:
I. Pr(Y = 1|X) is increasing in X for β1>0, and
II. 0 ≤ Pr(Y = 1|X) ≤ 1 for all X
Probit Regression
• Probit regression models the probability that Y=1 using
the cumulative standard normal distribution function,
Φ(z), evaluated at z = β0 + β1X. The probit regression
model is,
• Pr(Y = 1|X) = Φ(β0 + β1X)
• where Φ is the cumulative normal distribution function
and z = β0 + β1X is the “z-value” or “z-index” of the
probit model.
• Example: Suppose β0 = -2, β1= 3, X = .4, so
• Pr(Y = 1|X=.4) = Φ(-2 + 3×.4) = Φ(-0.8)
• Pr(Y = 1|X=.4) = area under the standard normal density
to the left of z = -.8, which is
Pr(z ≤ -0.8) = .2119
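The table lookup in this example can also be checked numerically. The sketch below uses Python's standard library `statistics.NormalDist` for Φ; the helper name `probit_prob` is an illustrative assumption.

```python
from statistics import NormalDist


def probit_prob(beta0, beta1, x):
    """Pr(Y = 1 | X = x) in a one-regressor probit model."""
    z = beta0 + beta1 * x          # the z-value (z-index)
    return NormalDist().cdf(z)     # Phi(z), standard normal CDF


# Values from the example: beta0 = -2, beta1 = 3, X = 0.4, so z = -0.8
p = probit_prob(-2.0, 3.0, 0.4)
print(round(p, 4))  # 0.2119
```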
Probit regression, ctd.
• Why use the cumulative normal probability distribution?
• The “S-shape” gives us what we want:
• Pr(Y = 1|X) is increasing in X for β1>0
• 0 ≤ Pr(Y = 1|X) ≤ 1 for all X
• Easy to use: the probabilities are tabulated in the cumulative
normal tables (and are easily computed using regression software)
• Relatively straightforward interpretation:
• β0 + β1X = z-value
• β̂0 + β̂1X is the predicted z-value, given X
• β1 is the change in the z-value for a unit change in X
• The probit model satisfies these conditions:
I. Pr(Y = 1|X) is increasing in X for β1>0, and
II. 0 ≤ Pr(Y = 1|X) ≤ 1 for all X
The probit model uses the cumulative normal distribution function to
model the probability of denial given the payment-to-income ratio or,
more generally, to model Pr(Y = 1|X). Unlike the linear probability model,
the probit conditional probabilities are always between 0 and 1.

STATA Example: HMDA data
. probit deny p_irat, r;
Iteration 0:   log likelihood =  -872.0853      (We’ll discuss this later)
Iteration 1:   log likelihood =  -835.6633
Iteration 2:   log likelihood = -831.80534
Iteration 3:   log likelihood = -831.79234
Probit estimates                                  Number of obs   =       2380
                                                  Wald chi2(1)    =      40.68
                                                  Prob > chi2     =     0.0000
Log likelihood = -831.79234                       Pseudo R2       =     0.0462
------------------------------------------------------------------------------
             |               Robust
        deny |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      p_irat |   2.967908   .4653114     6.38   0.000     2.055914    3.879901
       _cons |  -2.194159   .1649721   -13.30   0.000    -2.517499    -1.87082
------------------------------------------------------------------------------
Pr(deny = 1 | P/I ratio) = Φ(-2.19 + 2.97 × P/I ratio)
                             (.16)   (.47)
STATA Example: HMDA data, ctd.
Pr(deny = 1 | P/I ratio) = Φ(-2.19 + 2.97 × P/I ratio)
                             (.16)   (.47)
• Positive coefficient: Does this make sense?
• Standard errors have the usual interpretation
• Predicted probabilities:
Pr (deny = 1|P / Iratio = .3) = Φ (-2.19+2.97×.3)
= Φ (-1.30) = .097
• Effect of change in P/I ratio from .3 to .4:
Pr (deny = 1|P / Iratio = .4) = Φ (-2.19+2.97×.4)
= Φ (-1.00) = .159
• Predicted probability of denial rises from .097 to .159
• increase in the probability of denial of 6.2 percentage points,
from 9.7% to 15.9%
• Because the probit regression function is nonlinear, the effect of
a change in X depends on the starting value of X.

• For example, if P/I ratio = 0.5, the estimated denial probability
based on the estimated equation is Φ(-2.19 + 2.97 × 0.5) = Φ(-0.71) = 0.239.
• Thus the change in the predicted probability when P/I ratio
increases from 0.4 to 0.5 is 0.239 - 0.159, or 8.0 percentage
points,
• which is larger than the increase of 6.2 percentage points when
P/I ratio increases from 0.3 to 0.4.
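The arithmetic above can be reproduced with a short Python sketch. It uses the rounded coefficients -2.19 and 2.97 quoted on the slide; the function name `denial_prob` is an illustrative assumption. Because the code does not round the z-value to two decimals before evaluating Φ, its third decimal can differ slightly from the table-lookup figures (for example at P/I ratio = 0.5), but the pattern of increasing steps is the same.

```python
from statistics import NormalDist

# Rounded estimates quoted on the slide
B0, B1 = -2.19, 2.97


def denial_prob(pi_ratio):
    """Predicted Pr(deny = 1 | P/I ratio) from the estimated probit model."""
    return NormalDist().cdf(B0 + B1 * pi_ratio)


probs = {r: denial_prob(r) for r in (0.3, 0.4, 0.5)}
step_1 = probs[0.4] - probs[0.3]   # effect of moving from 0.3 to 0.4
step_2 = probs[0.5] - probs[0.4]   # effect of moving from 0.4 to 0.5

for r, p in probs.items():
    print(f"P/I ratio = {r:.1f}: predicted denial probability = {p:.3f}")
print(f"0.3 -> 0.4: {100 * step_1:.1f} percentage points")
print(f"0.4 -> 0.5: {100 * step_2:.1f} percentage points")
```

The same 0.1 increase in the P/I ratio produces a larger change in the second step, which is the nonlinearity described in the bullets.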

Probit regression with multiple regressors
Pr(Y = 1|X1, X2) = Φ (β0 + β1X1 + β2X2)
• The model is best interpreted by computing predicted probabilities and the
effect of a change in a regressor.
• Φ is the cumulative normal distribution function.
• The predicted probability that Y = 1, given values of X1, X2 is calculated by
computing the z-value, z = β0 + β1X1 + β2X2 and then looking up this z-value
in the normal distribution table (Appendix Table 1).
• z = β0 + β1X1 + β2X2 is the “z-value” or “z-index” of the probit model.
• β1 is the effect on the z-value of a unit change in X1, holding X2 constant
• The effect on the predicted probability of a change in a regressor is
computed by
• (1) computing the predicted probability for the initial value of the regressors,
• (2) computing the predicted probability for the new or changed value of the
regressors, and
• (3) taking their difference.
STATA Example: HMDA data
. probit deny p_irat black, r;
Iteration 0:   log likelihood =  -872.0853
Iteration 1:   log likelihood = -800.88504
Iteration 2:   log likelihood =  -797.1478
Iteration 3:   log likelihood = -797.13604
Probit estimates                                  Number of obs   =       2380
                                                  Wald chi2(2)    =     118.18
                                                  Prob > chi2     =     0.0000
Log likelihood = -797.13604                       Pseudo R2       =     0.0859
------------------------------------------------------------------------------
             |               Robust
        deny |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      p_irat |   2.741637   .4441633     6.17   0.000     1.871092    3.612181
       black |   .7081579   .0831877     8.51   0.000      .545113    .8712028
       _cons |  -2.258738   .1588168   -14.22   0.000    -2.570013   -1.947463
------------------------------------------------------------------------------
We’ll go through the estimation details later…
STATA Example, ctd.: Predicted probit probabilities
. probit deny p_irat black, r;
Probit estimates                                  Number of obs   =       2380
                                                  Wald chi2(2)    =     118.18
                                                  Prob > chi2     =     0.0000
Log likelihood = -797.13604                       Pseudo R2       =     0.0859
------------------------------------------------------------------------------
             |               Robust
        deny |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      p_irat |   2.741637   .4441633     6.17   0.000     1.871092    3.612181
       black |   .7081579   .0831877     8.51   0.000      .545113    .8712028
       _cons |  -2.258738   .1588168   -14.22   0.000    -2.570013   -1.947463
------------------------------------------------------------------------------
. sca z1 = _b[_cons]+_b[p_irat]*.3+_b[black]*0;
. display "Pred prob, p_irat=.3, white: " normprob(z1);
Pred prob, p_irat=.3, white: .07546603
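NOTE
_b[_cons] is the estimated intercept (-2.258738)
_b[p_irat] is the coefficient on p_irat (2.741637)
_b[black] is the coefficient on black (.7081579)
sca creates a new scalar which is the result of a calculation
normprob(z1) returns the cumulative standard normal distribution function evaluated at z1 (newer Stata versions name this function normal())
display prints the indicated information to the screen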
STATA Example, ctd.
Pr(deny = 1 | P/I ratio, black)
   = Φ(-2.26 + 2.74 × P/I ratio + .71 × black)
       (.16)   (.44)             (.08)
• Is the coefficient on black statistically significant?
• Estimated effect of race for P/I ratio = .3:
Pr (deny = 1|.3,1)= Φ(-2.26+2.74×.3+.71×1) = .233

Pr (deny = 1|.3,0)= Φ(-2.26+2.74×.3+.71×0) = .075

• Difference in rejection probabilities = .158 (15.8 pp)
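The estimated effect of race can be reproduced by computing the predicted probability at the initial regressor values, computing it again at the changed values, and taking the difference. The Python sketch below does this with the full-precision coefficients from the Stata output; the dictionary `COEF` and the function name `denial_prob` are illustrative assumptions.

```python
from statistics import NormalDist

# Coefficients copied from the Stata output for: probit deny p_irat black
COEF = {"_cons": -2.258738, "p_irat": 2.741637, "black": 0.7081579}


def denial_prob(p_irat, black):
    """Predicted Pr(deny = 1 | P/I ratio, black) from the probit estimates."""
    z = COEF["_cons"] + COEF["p_irat"] * p_irat + COEF["black"] * black
    return NormalDist().cdf(z)


p_white = denial_prob(0.3, 0)   # step 1: initial values (black = 0)
p_black = denial_prob(0.3, 1)   # step 2: changed values (black = 1)
effect = p_black - p_white      # step 3: difference

print(f"Pred prob, p_irat=.3, white: {p_white:.4f}")
print(f"Pred prob, p_irat=.3, black: {p_black:.4f}")
print(f"Difference: {100 * effect:.1f} percentage points")
```

The value for white applicants should agree with the Stata display result of .07546603 up to rounding.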


• Still plenty of room for omitted variable bias!
