[Figure: Distribution of Epsilon, logit and probit densities; values of ε below −xβ will prevent admission]
Another example
Suppose we have a kid with bad scores.
For this kid, x_iβ is small (even negative).
What will allow admission? Only a large positive ε_i.
What is the probability of observing a large positive ε_i? Very small.
Most likely the kid is not admitted, so we estimate a small probability.
[Figure: Distribution of Epsilon; values of ε above the cutoff −xβ would allow admission, values below it would prevent admission]
Summary
$\Pr(y_i = 1) = F(x_i\beta)$
$\Pr(y_i = 0) = 1 - F(x_i\beta)$
where F is the CDF of ε_i: Φ for probit, the logistic CDF for logit
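Here is a minimal Python sketch of the summary formulas. It assumes the latent-index setup of the admission example, where a kid is admitted when x_iβ + ε_i > 0. The normal and logistic densities are both symmetric around zero, so Pr(ε_i > -x_iβ) = 1 - F(-x_iβ) = F(x_iβ). The function names are illustrative and do not come from any library.

```python
import math

def probit_cdf(a: float) -> float:
    """Standard normal CDF, Phi(a), computed from the error function."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))

def logit_cdf(a: float) -> float:
    """Logistic CDF, exp(a) / [1 + exp(a)], rewritten as 1 / [1 + exp(-a)]."""
    return 1.0 / (1.0 + math.exp(-a))

def pr_admit(index: float, cdf) -> float:
    """Pr(y = 1) = Pr(eps > -index) = 1 - F(-index), where index = x_i * beta."""
    return 1.0 - cdf(-index)

# A negative index (bad scores) needs a large positive eps, which is unlikely.
for xb in (-2.0, -1.0, 0.0, 1.0):
    print(f"x*beta = {xb:+.1f}: probit {pr_admit(xb, probit_cdf):.4f}, "
          f"logit {pr_admit(xb, logit_cdf):.4f}")
```

By symmetry, each printed value also equals F(x_iβ) evaluated directly.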
Logit
PDF: $f(x) = \exp(x)/[1+\exp(x)]^2$
CDF: $F(a) = \exp(a)/[1+\exp(a)]$
Symmetric, unimodal distribution
Looks a lot like the normal
Incredibly easy to evaluate the CDF and PDF
Mean of zero, variance π²/3 ≈ 3.29 (more variance than the standard normal, whose variance is 1)
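The logit properties listed above can be checked numerically. This sketch uses function names of our own. It evaluates the logistic PDF and CDF and confirms that f(x) = F(x)[1 - F(x)]. It then approximates each variance with a midpoint-rule integral, which should land near π²/3 for the logit and near 1 for the standard normal.

```python
import math

def logit_pdf(x: float) -> float:
    """Logistic density exp(x) / [1 + exp(x)]^2, evaluated at -|x| (it is symmetric) to avoid overflow."""
    e = math.exp(-abs(x))
    return e / (1.0 + e) ** 2

def logit_cdf(a: float) -> float:
    """Logistic CDF exp(a) / [1 + exp(a)]."""
    return 1.0 / (1.0 + math.exp(-a))

def normal_pdf(x: float) -> float:
    """Standard normal density, for comparison."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def variance(pdf, lo: float = -40.0, hi: float = 40.0, n: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of x^2 * pdf(x); both densities have mean zero."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        total += x * x * pdf(x)
    return total * h

print(logit_pdf(0.0), logit_cdf(0.0))            # peak density and median of the logit
print(variance(logit_pdf), math.pi ** 2 / 3)     # logit variance vs. pi^2 / 3
print(variance(normal_pdf))                      # standard normal variance
```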
STATA Resources
Discrete Outcomes
Regression Models for Categorical Dependent Variables Using STATA
J. Scott Long and Jeremy Freese
Data: workplace1.dta
Sample program: workplace1.doc
Results: workplace1.log
. desc;

                  storage  display     value
variable name       type   format      label      variable label
-------------------------------------------------------------------------------
smoker              byte   %9.0g                  is current smoking
worka               byte   %9.0g                  has workplace smoking bans
age                 byte   %9.0g                  age in years
male                byte   %9.0g                  male
black               byte   %9.0g                  black
hispanic            byte   %9.0g                  hispanic
incomel             float  %9.0g                  log income
hsgrad              byte   %9.0g                  is hs graduate
somecol             byte   %9.0g                  has some college
college             float  %9.0g
-------------------------------------------------------------------------------
Summary statistics
sum;

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      smoker |     16258      .25163     .433963          0          1
       worka |     16258    .6851396    .4644745          0          1
         age |     16258    38.54742    11.96189         18         87
        male |     16258    .3947595     .488814          0          1
       black |     16258    .1119449    .3153083          0          1
-------------+--------------------------------------------------------
    hispanic |     16258    .0607086    .2388023          0          1
     incomel |     16258    10.42097    .7624525   6.214608   11.22524
      hsgrad |     16258    .3355271    .4721889          0          1
     somecol |     16258    .2685447    .4432161          0          1
     college |     16258    .3293763    .4700012          0          1
Heteroskedasticity-consistent standard errors

                                                  Number of obs =   16258
                                                  F(  9, 16248) =   99.26
                                                  Prob > F      =  0.0000
                                                  R-squared     =  0.0488
                                                  Root MSE      =  .42336

------------------------------------------------------------------------------
             |               Robust
      smoker |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0004776   .0002806    -1.70   0.089    -.0010276    .0000725
     incomel |  -.0287361   .0047823    -6.01   0.000      -.03811   -.0193621
        male |   .0168615   .0069542     2.42   0.015     .0032305    .0304926
       black |  -.0356723   .0110203    -3.24   0.001    -.0572732   -.0140714
    hispanic |   -.070582   .0136691    -5.16   0.000     -.097375    -.043789
      hsgrad |  -.0661429   .0162279    -4.08   0.000    -.0979514   -.0343345
     somecol |  -.1312175   .0164726    -7.97   0.000    -.1635056   -.0989293
     college |  -.2406109   .0162568   -14.80   0.000     -.272476   -.2087459
       worka |   -.066076   .0074879    -8.82   0.000     -.080753    -.051399
       _cons |   .7530714   .0494255    15.24   0.000     .6561919    .8499509
------------------------------------------------------------------------------
Since this is OLS, Stata reports t-statistics
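This is the arithmetic for worka, as a check on how the t column and the confidence interval follow from the coefficient and its robust standard error. The 1.96 critical value is an approximation. With 16,248 residual degrees of freedom the t quantile is essentially the normal one, but the last digit can still differ from Stata's.

```python
# Coefficient and robust standard error for worka, copied from the OLS table.
coef, se = -0.066076, 0.0074879

t_stat = coef / se  # t = coefficient / standard error
# Approximation: the normal 97.5% quantile stands in for the t(16248) quantile.
crit = 1.96
lower, upper = coef - crit * se, coef + crit * se
print(f"t = {t_stat:.2f}, 95% CI = [{lower:.6f}, {upper:.6f}]")
```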
Iteration 0:   log likelihood =  -9171.443
Iteration 1:   log likelihood =  -8764.068
Iteration 2:   log likelihood = -8761.7211
Iteration 3:   log likelihood = -8761.7208

Probit estimates                                  Number of obs =   16258
                                                  LR chi2(9)    =  819.44
                                                  Prob > chi2   =  0.0000
                                                  Pseudo R2     =  0.0447

------------------------------------------------------------------------------
      smoker |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0012684   .0009316    -1.36   0.173    -.0030943    .0005574
     incomel |   -.092812   .0151496    -6.13   0.000    -.1225047   -.0631193
        male |   .0533213   .0229297     2.33   0.020     .0083799    .0982627
       black |  -.1060518    .034918    -3.04   0.002      -.17449   -.0376137
    hispanic |  -.2281468   .0475128    -4.80   0.000    -.3212701   -.1350235
      hsgrad |  -.1748765   .0436392    -4.01   0.000    -.2604078   -.0893453
     somecol |   -.363869   .0451757    -8.05   0.000    -.4524118   -.2753262
     college |  -.7689528   .0466418   -16.49   0.000     -.860369   -.6775366
       worka |  -.2093287   .0231425    -9.05   0.000    -.2546873   -.1639702
       _cons |    .870543    .154056     5.65   0.000     .5685989    1.172487
------------------------------------------------------------------------------
Report z-statistics instead of t-stats
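The LR chi2(9) and Pseudo R2 in the probit header can be recovered from the iteration log. This assumes that iteration 0 is the constant-only model, as in Stata's probit. The formulas are LR = 2(lnL_1 - lnL_0) and McFadden's pseudo R² = 1 - lnL_1/lnL_0.

```python
ll_0 = -9171.443    # iteration 0: constant-only model (assumed, as in Stata's probit)
ll_1 = -8761.7208   # final iteration: full probit

lr_chi2 = 2 * (ll_1 - ll_0)   # likelihood-ratio test that all 9 slopes are zero
pseudo_r2 = 1 - ll_1 / ll_0   # McFadden's pseudo R-squared
print(f"LR chi2(9) = {lr_chi2:.2f}, Pseudo R2 = {pseudo_r2:.4f}")
```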
                                                  Number of obs =   16258
                                                  LR chi2(9)    =  819.44
                                                  Prob > chi2   =  0.0000
                                                  Pseudo R2     =  0.0447

------------------------------------------------------------------------------
  smoker |      dF/dx   Std. Err.      z    P>|z|     x-bar  [    95% C.I.   ]
---------+--------------------------------------------------------------------
     age |  -.0003951   .0002902   -1.36   0.173   38.5474  -.000964   .000174
 incomel |  -.0289139   .0047173   -6.13   0.000    10.421   -.03816  -.019668
    male*|   .0166757   .0071979    2.33   0.020    .39476   .002568   .030783
   black*|  -.0320621   .0102295   -3.04   0.002   .111945  -.052111  -.012013
hispanic*|  -.0658551   .0125926   -4.80   0.000   .060709  -.090536  -.041174
  hsgrad*|   -.053335    .013018   -4.01   0.000   .335527   -.07885   -.02782
 somecol*|  -.1062358   .0122819   -8.05   0.000   .268545  -.130308  -.082164
 college*|  -.2149199   .0114584  -16.49   0.000   .329376  -.237378  -.192462
   worka*|  -.0668959   .0075634   -9.05   0.000    .68514   -.08172  -.052072
---------+--------------------------------------------------------------------
  obs. P |     .25163
 pred. P |   .2409344  (at x-bar)
------------------------------------------------------------------------------
(*) dF/dx is for discrete change of dummy variable from 0 to 1
    z and P>|z| correspond to the test of the underlying coefficient being 0
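For a continuous regressor, the probit marginal effect at the means is φ(x̄β)·β_k. This is a minimal sketch that uses only the printed numbers. Instead of recomputing x̄β from every mean, it recovers x̄β from pred. P with the inverse normal CDF. Small rounding differences from Stata's output are therefore expected.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Printed values: predicted probability at the means (dprobit) and two probit coefficients.
p_at_means = 0.2409344
beta = {"age": -0.0012684, "incomel": -0.092812}

xb_bar = std_normal.inv_cdf(p_at_means)  # x-bar * beta, since pred. P = Phi(x-bar * beta)
scale = std_normal.pdf(xb_bar)           # phi(x-bar * beta), shared by every regressor

marg_eff = {name: scale * b for name, b in beta.items()}
print(round(xb_bar, 4), round(scale, 4))
for name, me in marg_eff.items():
    print(f"{name:8s} dF/dx = {me:.7f}")
```

The common scale factor φ(x̄β) explains why every dF/dx for a continuous variable is the probit coefficient multiplied by the same number.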
. mfx compute;

            min->max      0->1     -+1/2    -+sd/2  MargEfct
     age     -0.0327   -0.0005   -0.0005   -0.0057   -0.0005
 incomel     -0.1807   -0.0314   -0.0348   -0.0266   -0.0349
    male      0.0198    0.0198    0.0200    0.0098    0.0200
   black     -0.0390   -0.0390   -0.0398   -0.0126   -0.0398
hispanic     -0.0817   -0.0817   -0.0855   -0.0205   -0.0857
  hsgrad     -0.0634   -0.0634   -0.0656   -0.0310   -0.0657
 somecol     -0.1257   -0.1257   -0.1360   -0.0605   -0.1367
 college     -0.2685   -0.2685   -0.2827   -0.1351   -0.2888
   worka     -0.0753   -0.0753   -0.0785   -0.0365   -0.0786
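In a probit, the MargEfct column should equal φ(xβ)·β_k at whatever x the table is evaluated at. Dividing that column by the probit coefficients should therefore give one common number. The slide does not show which x values were used, so treat this as a consistency check under the assumption that the table comes from this probit. Age is left out because -0.0005 is rounded too coarsely to give a useful ratio.

```python
# MargEfct column from the table and probit coefficients for the same variables.
marg_eff = {"incomel": -0.0349, "male": 0.0200, "black": -0.0398, "hispanic": -0.0857,
            "hsgrad": -0.0657, "somecol": -0.1367, "college": -0.2888, "worka": -0.0786}
beta = {"incomel": -0.092812, "male": 0.0533213, "black": -0.1060518, "hispanic": -0.2281468,
        "hsgrad": -0.1748765, "somecol": -0.363869, "college": -0.7689528, "worka": -0.2093287}

# If MargEfct = phi(x*beta) * beta_k, every ratio is the same phi(x*beta).
ratios = {k: marg_eff[k] / beta[k] for k in beta}
for k, r in ratios.items():
    print(f"{k:9s} {r:.4f}")
```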
hsgrad = 0
somecol = 0
college = 0

    chi2(  3) =  504.78
  Prob > chi2 =  0.0000
Iteration 0:   log likelihood =  -9171.443
Iteration 1:   log likelihood =  -8764.068
Iteration 2:   log likelihood = -8761.7211
Iteration 3:   log likelihood = -8761.7208

LR chi2(3)  =  520.7
Prob > chi2 =  0.000
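Both the Wald statistic (504.78) and the LR statistic (520.7) for dropping hsgrad, somecol, and college are referred to a chi-squared distribution with 3 degrees of freedom. That distribution has a closed-form upper tail, which this sketch uses. The restricted model's log likelihood is not shown on the slide, so the sketch backs it out from LR = 2(lnL_u - lnL_r). It is an implied value, not a printed one.

```python
import math

def chi2_sf_df3(x: float) -> float:
    """Upper-tail probability Pr(chi2(3) > x), using the closed form for 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)

ll_unrestricted = -8761.7208   # full probit, from the iteration log
lr_stat = 520.7                # reported LR chi2(3)
wald_stat = 504.78             # reported Wald chi2(3)

ll_restricted = ll_unrestricted - lr_stat / 2   # implied by LR = 2 * (lnL_u - lnL_r)
print(f"implied restricted lnL = {ll_restricted:.4f}")
print(f"Pr(chi2(3) > 7.815) = {chi2_sf_df3(7.815):.4f}")   # the usual 5% critical value
print(f"p-values: LR {chi2_sf_df3(lr_stat):.3g}, Wald {chi2_sf_df3(wald_stat):.3g}")
```

Both statistics are far beyond the 5% critical value of about 7.81, so the education dummies are jointly significant under either test.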
           LP (OLS)     Probit      Logit
age        -0.00048   -0.00040   -0.00048
incomel     -0.0287    -0.0289    -0.0276
male         0.0168     0.0167     0.0172
black       -0.0357    -0.0321    -0.0342
hispanic    -0.0706    -0.0658    -0.0602
hsgrad      -0.0661    -0.0533    -0.0514
college     -0.2406    -0.2149    -0.2121
worka       -0.0661    -0.0669    -0.0658
probit smoker worka age incomel male black hispanic hsgrad somecol college;
matrix betat=e(b);
* get beta from probit (1 x k);
matrix beta=betat';
matrix covp=e(V);
* get v/c matrix from probit (k x k);

results[10,3]
             marg_eff      std_err      z_score
   worka   -.06521255    .00720374   -9.0525984
     age   -.00039515    .00029023   -1.3615156
 incomel   -.02891389    .00471728    -6.129356
    male    .01661127    .00714305    2.3255154
   black   -.03303852     .0108782   -3.0371321
hispanic   -.07107496    .01479806   -4.8029926
  hsgrad   -.05447959    .01359844   -4.0063111
 somecol   -.11335675    .01408096   -8.0503576
 college   -.23955322     .0144803   -16.543383
   _cons     .2712018    .04808183    5.6404217
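The slide does not show the code that fills results. One common way to get std_err is the delta method: Var(ME) ≈ G V Gᵀ, where G is the Jacobian of the marginal effects with respect to β and V = e(V). Note that results applies φ(x̄β)β_k to every regressor, dummies included. For worka that gives 0.3115 × (-.2093287) ≈ -.0652, whereas dprobit reports the discrete change -.0669. The sketch below implements the delta method in plain Python on hypothetical two-parameter inputs, not the workplace data. The function names are our own.

```python
import math

def phi(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def me_at_means(beta, xbar):
    """Probit marginal effects at the means, phi(xbar'beta) * beta_j, for every element of beta."""
    z = sum(b * x for b, x in zip(beta, xbar))
    return [phi(z) * b for b in beta]

def me_jacobian(beta, xbar):
    """G[j][m] = d ME_j / d beta_m = phi(z) * (1[j == m] - z * beta_j * xbar_m), using phi'(z) = -z * phi(z)."""
    z = sum(b * x for b, x in zip(beta, xbar))
    k = len(beta)
    return [[phi(z) * ((1.0 if j == m else 0.0) - z * beta[j] * xbar[m]) for m in range(k)]
            for j in range(k)]

def delta_method_se(beta, xbar, cov):
    """Square roots of the diagonal of G V G', where V is the coefficient covariance matrix."""
    g = me_jacobian(beta, xbar)
    k = len(beta)
    return [math.sqrt(sum(g[j][a] * cov[a][c] * g[j][c] for a in range(k) for c in range(k)))
            for j in range(k)]

# Hypothetical inputs: one slope plus a constant (the xbar entry 1.0 plays the role of _cons).
beta = [-0.2, 0.5]
xbar = [0.7, 1.0]
cov = [[0.0005, -0.0003], [-0.0003, 0.0004]]

me = me_at_means(beta, xbar)
se = delta_method_se(beta, xbar, cov)
z_scores = [m / s for m, s in zip(me, se)]   # analogous to the z_score column
print(me, se, z_scores)
```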
symmetric me_1[1,1]
c1
r1 -.06689591
. * standard error of workplace a;
. matrix list se_me_1;
symmetric se_me_1[1,1]
c1
r1 .00756336
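For a dummy such as worka, dprobit reports the discrete change Φ(x̄β with worka = 1) - Φ(x̄β with worka = 0), holding the other regressors at their means. This sketch reproduces me_1 from the printed numbers. As before, it recovers x̄β from pred. P. The variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()

p_at_means = 0.2409344    # pred. P at x-bar, from the dprobit output
b_worka = -0.2093287      # probit coefficient on worka
xbar_worka = 0.6851396    # sample mean of worka

xb_bar = std_normal.inv_cdf(p_at_means)         # index with worka at its mean
xb_ban = xb_bar + b_worka * (1.0 - xbar_worka)  # same index with worka set to 1
xb_no_ban = xb_bar - b_worka * xbar_worka       # same index with worka set to 0

discrete_change = std_normal.cdf(xb_ban) - std_normal.cdf(xb_no_ban)
print(f"discrete change for worka: {discrete_change:.7f}")
```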
[Figure: Standard Normal and Logit densities plotted against X (vertical axis Y)]
Pseudo R²
Predicting Y
Let b be the estimated value of β
For any candidate vector x_i, we can predict probabilities, P_i:
$P_i = \Phi(x_i b)$
Once you have P_i, pick a threshold value, T, so that you predict
$Y^p = 1$ if $P_i > T$
$Y^p = 0$ if $P_i \le T$
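The threshold rule above can be sketched in a few lines, together with the 2×2 table of actual versus predicted outcomes that it produces. The probabilities and outcomes below are made up for illustration, and the function names are our own. The choice of T matters in this application. About 1% of the predicted probabilities in this sample exceed .467 (the 99th percentile), so T = 0.5 would predict that almost nobody smokes, even though about 25% do.

```python
def classify(probs, threshold):
    """Predict Yp = 1 if P_i > T and Yp = 0 if P_i <= T."""
    return [1 if p > threshold else 0 for p in probs]

def classification_table(y, y_pred):
    """Counts of (actual, predicted) pairs, i.e. a 2x2 classification table."""
    counts = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for a, b in zip(y, y_pred):
        counts[(a, b)] += 1
    return counts

# Hypothetical predicted probabilities and actual outcomes.
probs = [0.10, 0.22, 0.31, 0.45, 0.56, 0.18]
y = [0, 0, 1, 1, 0, 0]
for T in (0.5, 0.25):
    print(T, classification_table(y, classify(probs, T)))
```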
The mean of the predicted probabilities (.2516653) is very close to the actual mean of smoker
(0.25163 in this case)

. predict pred_prob_smoke;
(option p assumed; Pr(smoker))

. * get detailed descriptive data about predicted prob;
. sum pred_prob, detail;

                         Pr(smoker)
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .0959301       .0615221
 5%     .1155022       .0622963
10%     .1237434       .0633929       Obs               16258
25%     .1620851       .0733495       Sum of Wgt.       16258

50%     .2569962                      Mean           .2516653
                        Largest       Std. Dev.      .0960007
75%     .3187975       .5619798
90%     .3795704       .5655878       Variance       .0092161
95%     .4039573       .5684112       Skewness       .1520254
99%     .4672697       .6203823       Kurtosis       2.149247
Risk ratio
RR = Prob(y=1|x=1)/Prob(y=1|x=0)
Ratio of the probability of an event when x is and is not present
Example: how much does maternal smoking raise the chance that a child is born at low birth weight?
Y11 = RR*Y10, where Y11 = Pr(Y=1|X=1) and Y10 = Pr(Y=1|X=0)
Odds Ratio
OR=A/B = [Y11/Y01]/[Y10/Y00]
A = [Pr(Y=1|X=1)/Pr(Y=0|X=1)]
= odds of Y occurring if you are a smoker
B = [Pr(Y=1|X=0)/Pr(Y=0|X=0)]
= odds of Y happening if you are not a smoker
What are the relative odds of Y happening if you do versus do not experience X?
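Details
$Y_{11} = \exp(\beta_0 + \beta_1 + \beta_2 Z)/[1 + \exp(\beta_0 + \beta_1 + \beta_2 Z)]$
$Y_{10} = \exp(\beta_0 + \beta_2 Z)/[1 + \exp(\beta_0 + \beta_2 Z)]$
$Y_{01} = 1/[1 + \exp(\beta_0 + \beta_1 + \beta_2 Z)]$
$Y_{00} = 1/[1 + \exp(\beta_0 + \beta_2 Z)]$
$Y_{11}/Y_{01} = \exp(\beta_0 + \beta_1 + \beta_2 Z)$
$Y_{10}/Y_{00} = \exp(\beta_0 + \beta_2 Z)$
$OR = A/B = [Y_{11}/Y_{01}]/[Y_{10}/Y_{00}]$
$\quad = \exp(\beta_0 + \beta_1 + \beta_2 Z)/\exp(\beta_0 + \beta_2 Z)$
$\quad = \exp(\beta_1)$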
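Here is a quick numerical check of the derivation, using hypothetical logit coefficients. The odds ratio equals exp(β1) at every value of Z, while the risk ratio Y11/Y10 changes with Z. The names and numbers are illustrative only.

```python
import math

def logistic(a: float) -> float:
    """Logistic CDF exp(a) / [1 + exp(a)]."""
    return 1.0 / (1.0 + math.exp(-a))

def odds_ratio_and_rr(b0: float, b1: float, b2: float, z: float):
    """Logit with binary X and one control Z: return (OR, RR) at a given Z."""
    y11 = logistic(b0 + b1 + b2 * z)   # Pr(Y=1 | X=1, Z)
    y10 = logistic(b0 + b2 * z)        # Pr(Y=1 | X=0, Z)
    y01, y00 = 1 - y11, 1 - y10        # Pr(Y=0 | X=1, Z), Pr(Y=0 | X=0, Z)
    return (y11 / y01) / (y10 / y00), y11 / y10

# Hypothetical coefficients, chosen only to illustrate the algebra.
b0, b1, b2 = -2.0, 0.7, 0.3
for z in (0.0, 1.0, 3.0):
    odds_ratio, risk_ratio = odds_ratio_and_rr(b0, b1, b2, z)
    print(f"Z={z}: OR={odds_ratio:.4f} (exp(b1)={math.exp(b1):.4f}), RR={risk_ratio:.4f}")
```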
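Therefore
y_c = current outcome rate
y_a = Y10 = outcome rate with zero smoking
$PAR = (y_c - y_a)/y_c$
Substitute the definitions of y_a and y_c: with a share x_s of smokers, $y_c = x_s Y_{11} + (1 - x_s) Y_{10} = Y_{10}[(1 - x_s) + RR\,x_s]$
Reduces to $PAR = (RR - 1)x_s/[(1 - x_s) + RR\,x_s]$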
PAR
$PAR = (RR - 1)x_s/[(1 - x_s) + RR\,x_s]$
xs= 0.137
RR = 1.96
PAR = 0.116
11.6% of low weight births attributed to
maternal smoking
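The closed form and the definition (y_c - y_a)/y_c can be coded side by side to confirm the algebra and the slide's 0.116. The baseline rate Y10 = 0.05 in the second function's test call is a made-up value, and it cancels out. The function names are our own.

```python
def par(rr: float, xs: float) -> float:
    """Population attributable risk: (RR - 1) * xs / [(1 - xs) + RR * xs]."""
    return (rr - 1.0) * xs / ((1.0 - xs) + rr * xs)

def par_from_rates(y10: float, rr: float, xs: float) -> float:
    """Same quantity from its definition: y_c = xs*Y11 + (1 - xs)*Y10, y_a = Y10, PAR = (y_c - y_a)/y_c."""
    y11 = rr * y10
    y_c = xs * y11 + (1.0 - xs) * y10
    y_a = y10
    return (y_c - y_a) / y_c

print(round(par(1.96, 0.137), 3))   # 0.116, using the slide's inputs
```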
[Figure: Pr(Y = 1) plotted against D*]
Endowment effect
Ask a group to fill out a survey
As a thank-you, give them a coffee mug
They have the mug while they fill out the survey
Policy implications
Example:
A) How much are you willing to pay for clean air?
B) How much do we have to pay you to allow someone to pollute?
The answer to B) is orders of magnitude larger than the answer to A)
Prior studies estimated WTP via A) and assumed it equals WTA
Problem
Artificial situations
Inexperienced subjects may not know the value of the item
Solution: see how experienced actors behave when they are endowed with something they can easily value
Two experiments: baseball card shows and collectible pins
Baseball cards
Two pieces of memorabilia
Game stub from the game in which Cal Ripken Jr. set the record for consecutive games played (vs. KC, June 14, 1996)
Certificate commemorating Nolan Ryan's 300th win