Regression With A Binary Dependent Variable
Regression with a Binary Dependent Variable (SW Chapter 11)
So far the dependent variable (Y) has been continuous:
district-wide average test score
traffic fatality rate
What if Y is binary?
Y = get into college, or not; X = years of education
Y = person smokes, or not; X = income
Y = mortgage application is accepted, or not; X =
income, house characteristics, marital status, race
Example: Mortgage denial and race
The Boston Fed HMDA data set
Individual applications for single-family mortgages
made in 1990 in the greater Boston area
2380 observations, collected under the Home Mortgage Disclosure Act (HMDA)
Variables
Dependent variable:
Is the mortgage denied or accepted?
Independent variables:
income, wealth, employment status
other loan, property characteristics
race of applicant
The Linear Probability Model
(SW Section 11.1)
Yᵢ = β₀ + β₁Xᵢ + uᵢ
But:
What does β₁ mean when Y is binary? Is β₁ = ΔY/ΔX?
What does the line β₀ + β₁X mean when Y is binary?
What does the predicted value Ŷ mean when Y is binary?
For example, what does Ŷ = 0.26 mean?
The linear probability model, ctd.
Yᵢ = β₀ + β₁Xᵢ + uᵢ
When Y is binary,
E(Y) = 1·Pr(Y=1) + 0·Pr(Y=0) = Pr(Y=1)
so
E(Y|X) = Pr(Y=1|X)
The linear probability model, ctd.
When Y is binary, the linear regression model
Yᵢ = β₀ + β₁Xᵢ + uᵢ
is called the linear probability model: the predicted value Ŷ is a probability (an estimate of Pr(Y = 1|X)), and β₁ is the change in that probability associated with a unit change in X.
Linear probability model: HMDA
data, ctd.
denyˆ = −.080 + .604 × (P/I ratio)    (n = 2380)
            (.032)   (.098)
A 0.1 increase in the P/I ratio increases the predicted probability of denial by .0604, i.e., about 6 percentage points.
Linear probability model: HMDA
data, ctd
Next include black as a regressor:
denyˆ = −.091 + .559 × (P/I ratio) + .177 × black
            (.032)   (.098)            (.025)
Holding the P/I ratio constant, being black increases the predicted probability of denial by .177 (17.7 percentage points).
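A minimal Stata sketch of how these two linear probability models could be estimated with heteroskedasticity-robust standard errors, assuming the HMDA variables carry the same names (deny, p_irat, black) used in the probit output later in these notes:
. regress deny p_irat, robust;
. regress deny p_irat black, robust;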
Probit and Logit Regression
Probit and Logit Regression
(SW Section 11.2)
The LPM’s predicted probabilities can fall below 0 or above 1. Instead, we want:
0 ≤ Pr(Y = 1|X) ≤ 1 for all X
Pr(Y = 1|X) to be increasing in X (for β₁ > 0)
This requires a nonlinear functional form for the probability.
How about an “S-curve”…
The probit model satisfies these conditions:
0 ≤ Pr(Y = 1|X) ≤ 1 for all X
Pr(Y = 1|X) to be increasing in X (for β₁ > 0)
Probit regression models the probability that Y = 1 using the cumulative standard normal distribution function, evaluated at z = β₀ + β₁X:
Pr(Y = 1|X) = Φ(β₀ + β₁X)
Φ is the cumulative normal distribution function.
z = β₀ + β₁X is the “z-value” or “z-index” of the probit model.
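A quick numerical illustration with hypothetical coefficients (not estimates from the HMDA data): if β₀ = −2, β₁ = 3, and X = .4, then z = −2 + 3 × .4 = −0.8 and Pr(Y = 1|X = .4) = Φ(−0.8) ≈ .21. In Stata, the cumulative normal is the normal() function:
. display normal(-2 + 3*.4);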
Probit regression, ctd.
Why use the cumulative normal probability distribution?
The “S-shape” gives us what we want:
0 ≤ Pr(Y = 1|X) ≤ 1 for all X
Pr(Y = 1|X) to be increasing in X (for β₁ > 0)
Easy to use – the probabilities are tabulated in the
cumulative normal tables
Relatively straightforward interpretation:
z-value = β₀ + β₁X
β̂₀ + β̂₁X is the predicted z-value, given X
β̂₁ is the change in the z-value for a unit change in X
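STATA Example: HMDA data
The output below is from a probit of deny on the P/I ratio alone; following the convention of the later examples, the command that produces it would presumably be:
. probit deny p_irat, r;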
------------------------------------------------------------------------------
| Robust
deny | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
p_irat | 2.967908 .4653114 6.38 0.000 2.055914 3.879901
_cons | -2.194159 .1649721 -13.30 0.000 -2.517499 -1.87082
------------------------------------------------------------------------------
STATA Example: HMDA data
. probit deny p_irat black, r;
------------------------------------------------------------------------------
| Robust
deny | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
p_irat | 2.741637 .4441633 6.17 0.000 1.871092 3.612181
black | .7081579 .0831877 8.51 0.000 .545113 .8712028
_cons | -2.258738 .1588168 -14.22 0.000 -2.570013 -1.947463
------------------------------------------------------------------------------
STATA Example, ctd.: predicted
probit probabilities
. probit deny p_irat black, r;
------------------------------------------------------------------------------
| Robust
deny | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
p_irat | 2.741637 .4441633 6.17 0.000 1.871092 3.612181
black | .7081579 .0831877 8.51 0.000 .545113 .8712028
_cons | -2.258738 .1588168 -14.22 0.000 -2.570013 -1.947463
------------------------------------------------------------------------------
. sca z1 = _b[_cons]+_b[p_irat]*.3+_b[black]*0;
NOTE
_b[_cons] is the estimated intercept (-2.258738)
_b[p_irat] is the coefficient on p_irat (2.741637)
sca creates a new scalar which is the result of a calculation
display prints the indicated information to the screen
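The calculation can then be completed by converting the z-index into a probability with Stata’s cumulative normal function, normal() (a minimal continuation; the scalar name prdeny is illustrative):
. sca prdeny = normal(z1);
. display "Pr(deny=1 | P/I ratio = .3, black = 0) = " prdeny;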
STATA Example, ctd.
Pr(deny = 1 | P/I ratio, black) = Φ(−2.26 + 2.74 × (P/I ratio) + .71 × black)
                                       (.16)    (.44)             (.08)
Is the coefficient on black statistically significant?
Estimated effect of race for P/I ratio = .3:
Pr(deny = 1 | .3, 1) = Φ(−2.26 + 2.74 × .3 + .71 × 1) = .233
Pr(deny = 1 | .3, 0) = Φ(−2.26 + 2.74 × .3 + .71 × 0) = .075
Difference in rejection probabilities = .158 (15.8
percentage points)
Still plenty of room for omitted variable bias…
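These two predicted probabilities are easy to reproduce in Stata from the rounded coefficients above (normal() is the standard normal CDF):
. display normal(-2.26 + 2.74*.3 + .71*1);   /* black applicant, ≈ .233 */
. display normal(-2.26 + 2.74*.3 + .71*0);   /* white applicant, ≈ .075 */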
Logit Regression
Logit regression models the probability that Y = 1 using the cumulative standard logistic distribution function, evaluated at z = β₀ + β₁X:
F(β₀ + β₁X) = 1 / [1 + e^−(β₀ + β₁X)]
Logit regression, ctd.
Pr(Y = 1|X) = F(β₀ + β₁X)
where F(β₀ + β₁X) = 1 / [1 + e^−(β₀ + β₁X)].
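A quick numerical illustration with hypothetical coefficients (not estimates from the HMDA data): if β₀ = −3, β₁ = 2, and X = .4, then z = −2.2 and Pr(Y = 1|X = .4) = 1/(1 + e^2.2) ≈ .10. In Stata, the logistic CDF is the invlogit() function:
. display invlogit(-3 + 2*.4);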
STATA Example: HMDA data
. logit deny p_irat black, r;
------------------------------------------------------------------------------
| Robust
deny | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
p_irat | 5.370362 .9633435 5.57 0.000 3.482244 7.258481
black | 1.272782 .1460986 8.71 0.000 .9864339 1.55913
_cons | -4.125558 .345825 -11.93 0.000 -4.803362 -3.447753
------------------------------------------------------------------------------
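Predicted denial probabilities at P/I ratio = .3 can be read off this logit output the same way as for the probit (a sketch using the rounded coefficients; invlogit() is the logistic CDF):
. display invlogit(-4.13 + 5.37*.3 + 1.27*1);   /* black applicant, ≈ .22 */
. display invlogit(-4.13 + 5.37*.3 + 1.27*0);   /* white applicant, ≈ .07 */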
Example for class discussion:
Characterizing the Background of Hezbollah Militants
Hezbollah militants example, ctd.
Compute the effect of schooling by comparing predicted
probabilities using the logit regression in column (3):
Estimation and Inference in Probit (and Logit) Models
By Rinkal Virwani
Estimation and Inference in Probit
(and Logit) Models (SW Section 11.3)
Probit model:
Pr(Y = 1|X) = Φ(β₀ + β₁X)
Probit estimation by maximum
likelihood
The likelihood function is the conditional density of Y1,…,Yn given X1,…,Xn, treated as a function of the unknown parameters β₀ and β₁.
The maximum likelihood estimator (MLE) is the value of (β₀, β₁) that maximizes the likelihood function.
The MLE is the value of (β₀, β₁) that best describes the full distribution of the data.
In large samples, the MLE is:
consistent
normally distributed
efficient
Special case: the probit MLE with
no X
Y = 1 with probability p, Y = 0 with probability 1 − p   (Bernoulli distribution)
Joint density of (Y1, Y2):
Because Y1 and Y2 are independent,
Pr(Y1 = y1, Y2 = y2) = Pr(Y1 = y1) × Pr(Y2 = y2) = [p^y1 (1 − p)^(1−y1)] × [p^y2 (1 − p)^(1−y2)]
More generally, for n independent observations the joint density (the likelihood) is
f(p; y1,…,yn) = p^(Σᵢ yᵢ) × (1 − p)^(n − Σᵢ yᵢ)
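The maximization step behind the estimator on the next slide is the standard one, sketched here:
$$\log f(p;\,y_1,\dots,y_n) = \Big(\textstyle\sum_i y_i\Big)\log p + \Big(n - \textstyle\sum_i y_i\Big)\log(1-p)$$
$$\frac{\partial \log f}{\partial p} = \frac{\sum_i y_i}{p} - \frac{n - \sum_i y_i}{1-p} = 0 \;\Rightarrow\; \hat p = \frac{1}{n}\sum_i y_i = \bar Y$$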
The MLE in the “no-X” case
(Bernoulli distribution), ctd.:
p̂ MLE = Ȳ = fraction of 1’s
The MLE in the “no-X” case
(Bernoulli distribution), ctd:
The theory of maximum likelihood estimation says that
p̂ MLE is the most efficient estimator of p – of all possible
estimators – at least for large n. This is why people use the
MLE.
The probit likelihood with one X
The derivation starts with the density of Y1, given X1:
Pr(Y1 = 1|X1) = Φ(β₀ + β₁X1)
Pr(Y1 = 0|X1) = 1 − Φ(β₀ + β₁X1)
so
Pr(Y1 = y1|X1) = Φ(β₀ + β₁X1)^y1 × [1 − Φ(β₀ + β₁X1)]^(1−y1)
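Because the observations are independent, the full probit likelihood over n observations then takes the same product form as in the Bernoulli case (a standard expression, written out here for completeness):
$$f(\beta_0,\beta_1;\,Y_1,\dots,Y_n \mid X_1,\dots,X_n)=\prod_{i=1}^{n}\Phi(\beta_0+\beta_1 X_i)^{Y_i}\,\bigl[1-\Phi(\beta_0+\beta_1 X_i)\bigr]^{1-Y_i}$$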
Measures of fit for logit and probit
The R² and adjusted R² don’t make sense here (why?). So, two other specialized measures are used:
The fraction correctly predicted = the fraction of observations for which the predicted outcome (Ŷᵢ = 1 if the predicted probability is > 50%, Ŷᵢ = 0 otherwise) matches the actual Yᵢ.
The pseudo-R², which measures the improvement in the log-likelihood of the estimated model relative to a model with no regressors.
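A minimal Stata sketch of the fraction correctly predicted after the probit above (the variable names phat and correct are illustrative):
. probit deny p_irat black, r;
. predict phat;                          /* predicted Pr(deny = 1 | X) */
. gen correct = (phat >= .5) == deny;    /* 1 if the predicted outcome matches the actual outcome */
. summarize correct;                     /* the mean of correct is the fraction correctly predicted */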
Application to the Boston HMDA Data
By Saeed Ahmed
Application to the Boston HMDA
Data (SW Section 11.4)
Mortgages (home loans) are an essential part of buying a
home.
Is there differential access to home loans by race?
If two otherwise identical individuals, one white and one
black, applied for a home loan, is there a difference in
the probability of denial?
The HMDA Data Set
Data on individual characteristics, property
characteristics, and loan denial/acceptance
The mortgage application process circa 1990-1991:
Go to a bank or mortgage company
Fill out an application (personal+financial info)
Meet with the loan officer
Then the loan officer decides – by law, in a race-blind
way. Presumably, the bank wants to make profitable
loans, and the loan officer doesn’t want to originate
defaults.
The loan officer’s decision
Loan officer uses key financial variables:
P/I ratio
housing expense-to-income ratio
loan-to-value ratio
personal credit history
The decision rule is nonlinear:
loan-to-value ratio > 80%
loan-to-value ratio > 95% (what happens in default?)
credit score
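Because the decision rule is nonlinear in the loan-to-value ratio, the regressions that follow enter it through threshold indicators rather than linearly. A hedged sketch of how such indicators could be built in Stata (the variable name ltv is illustrative, not the exact dataset coding; the 80% and 95% cutoffs mirror the thresholds listed above):
. gen ltv_medium = (ltv > .80 & ltv <= .95);   /* medium loan-to-value ratio */
. gen ltv_high   = (ltv > .95);                /* high loan-to-value ratio */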
Regression specifications
Pr(deny=1|black, other X’s) = …
linear probability model
probit
logit
Table 11.2: regression results for the Boston HMDA data (table not reproduced in these notes)
Summary of Empirical Results
Coefficients on the financial variables make sense.
The coefficient on black is statistically significant in all specifications.
Race-financial variable interactions aren’t significant.
Including the covariates sharply reduces the effect of race
on denial probability.
LPM, probit, logit: similar estimates of effect of race on
the probability of denial.
Estimated effects are large in a “real world” sense.
Remaining threats to internal,
external validity
Internal validity
1. omitted variable bias
what else is learned in the in-person interviews?
2. functional form misspecification (no…)
3. measurement error (originally, yes; now, no…)
4. selection
random sample of loan applications
define population to be loan applicants
5. simultaneous causality (no)
External validity
This is for Boston in 1990-91. What about today?
Summary
(SW Section 11.5)
If Y is binary, then E(Y|X) = Pr(Y = 1|X).
Three models: linear probability model (OLS), probit, logit.
All three produce predicted probabilities, and the effect of a change in X is a change in the conditional probability that Y = 1.
Probit and logit coefficients are estimated by maximum likelihood and are best interpreted by comparing predicted probabilities.