
Logit/Probit Models

Making sense of the decision rule


Suppose we have a kid with great scores, great grades, etc.
For this kid, xiβ is large.
What will prevent admission? Only a large negative εi.
What is the probability of observing a large negative εi? Very small.
Most likely admitted, so we estimate a large probability.

Distribution of Epsilon

[Figure: density of εi with the cutoff -xiβ marked. Values of εi above -xiβ would allow admission; values of εi below -xiβ would prevent admission. For this student the cutoff sits far in the left tail, so almost all of the mass allows admission.]

Another example

Suppose we have a kid with bad scores.
For this kid, xiβ is small (maybe even negative).
What will allow admission? Only a large positive εi.
What is the probability of observing a large positive εi? Very small.
Most likely not admitted, so we estimate a small probability.

Distribution of Epsilon

[Figure: density of εi with the cutoff -xiβ now far in the right tail. Values of εi above -xiβ would allow admission (a small right-tail area); values of εi below -xiβ would prevent admission (most of the mass).]

Normal (probit) Model

εi is distributed as a standard normal:
  Mean zero
  Variance 1

Evaluate probability (y=1)
  Pr(yi=1) = Pr(εi > -xiβ) = 1 - Φ(-xiβ)
  Given symmetry: 1 - Φ(-xiβ) = Φ(xiβ)

Evaluate probability (y=0)
  Pr(yi=0) = Pr(εi ≤ -xiβ) = Φ(-xiβ)
  Given symmetry: Φ(-xiβ) = 1 - Φ(xiβ)
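As a quick numerical check of the symmetry step, Stata's normal() function is the standard normal CDF; typed at the Stata prompt (the value 1.2 is just an arbitrary example):

  * 1 - Phi(-a) equals Phi(a) for the standard normal
  display 1 - normal(-1.2)
  display normal(1.2)

Both lines print the same value (about 0.8849), which is why Pr(εi > -xiβ) can be written as Φ(xiβ).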

Summary

Pr(yi=1) = Φ(xiβ)
Pr(yi=0) = 1 - Φ(xiβ)

Notice that Φ(a) is increasing in a.
Therefore, if a variable x increases the probability of observing y=1, we would expect the coefficient on that variable to be positive (+).

The standard normal assumption (variance = 1) is not critical.
In practice the variance may not equal 1, but given the structure of the problem we cannot separately identify the variance from the coefficients, so it is normalized to 1.

Logit

PDF: f(a) = exp(a)/[1 + exp(a)]²
CDF: F(a) = exp(a)/[1 + exp(a)]
Symmetric, unimodal distribution
Looks a lot like the normal
Incredibly easy to evaluate the CDF and PDF
Mean of zero, variance > 1 (more variance than the standard normal)
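A sketch of how cheap these are to evaluate in Stata: invlogit() is exactly the logistic CDF exp(a)/(1 + exp(a)), and the PDF follows from the formula above (the value 0.7 is just an arbitrary example):

  * logistic CDF two ways
  display exp(0.7)/(1 + exp(0.7))
  display invlogit(0.7)
  * logistic PDF from the formula above
  display exp(0.7)/(1 + exp(0.7))^2

The first two lines print the same number, and the symmetry used below can be checked the same way: 1 - invlogit(-0.7) equals invlogit(0.7).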

Evaluate probability (y=1)
  Pr(yi=1) = Pr(εi > -xiβ) = 1 - F(-xiβ)
  Given symmetry: 1 - F(-xiβ) = F(xiβ)
  F(xiβ) = exp(xiβ)/(1 + exp(xiβ))

Evaluate probability (y=0)
  Pr(yi=0) = Pr(εi ≤ -xiβ) = F(-xiβ)
  Given symmetry: F(-xiβ) = 1 - F(xiβ)
  1 - F(xiβ) = 1/(1 + exp(xiβ))

In summary, when εi has a logistic distribution:
  Pr(yi=1) = exp(xiβ)/(1 + exp(xiβ))
  Pr(yi=0) = 1/(1 + exp(xiβ))

STATA Resources: Discrete Outcomes

Regression Models for Categorical Dependent Variables Using STATA, by J. Scott Long and Jeremy Freese
Available for sale from the STATA website for $52 (www.stata.com)
The book's post-estimation subroutines translate results; you do not need to buy the book to use the subroutines.

At the STATA command line, type
  net search spost
This gives you a list of available programs to download. One is spostado, from http://www.indiana.edu/~jslsoc/stata
Click on the link and install the files.

Example: Workplace smoking bans

Smoking supplements to the 1991 and 1993 National Health Interview Survey
Asked all respondents whether they currently smoke
Asked workers about workplace tobacco policies
Sample: indoor workers
Key variables: current smoking and whether they faced a workplace ban

Data: workplace1.dta
Sample program: workplace1.doc
Results: workplace1.log

Description of variables in the data

. desc;

variable name   storage type   display format   variable label
----------------------------------------------------------------
smoker          byte           %9.0g            is current smoking
worka           byte           %9.0g            has workplace smoking bans
age             byte           %9.0g            age in years
male            byte           %9.0g            male
black           byte           %9.0g            black
hispanic        byte           %9.0g            hispanic
incomel         float          %9.0g            log income
hsgrad          byte           %9.0g            is hs graduate
somecol         byte           %9.0g            has some college
college         float          %9.0g
----------------------------------------------------------------

Summary statistics

. sum;

    Variable |    Obs        Mean     Std. Dev.        Min        Max
-------------+--------------------------------------------------------
      smoker |  16258      .25163      .433963          0          1
       worka |  16258    .6851396     .4644745          0          1
         age |  16258    38.54742     11.96189         18         87
        male |  16258    .3947595      .488814          0          1
       black |  16258    .1119449     .3153083          0          1
    hispanic |  16258    .0607086     .2388023          0          1
     incomel |  16258    10.42097     .7624525   6.214608   11.22524
      hsgrad |  16258    .3355271     .4721889          0          1
     somecol |  16258    .2685447     .4432161          0          1
     college |  16258    .3293763     .4700012          0          1

Heteroskedasticity-consistent standard errors

. * run a linear probability model for comparison purposes;
. * estimate white standard errors to control for heteroskedasticity;
. reg smoker age incomel male black hispanic hsgrad somecol college worka, robust;

Regression with robust standard errors          Number of obs =  16258
                                                F( 9, 16248)  =  99.26
                                                Prob > F      = 0.0000
                                                R-squared     = 0.0488
                                                Root MSE      = .42336

Note the very low R-squared, which is typical in linear probability models. Since this is OLS, t-statistics are reported.

------------------------------------------------------------------------------
             |             Robust
      smoker |     Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age | -.0004776   .0002806    -1.70   0.089    -.0010276    .0000725
     incomel | -.0287361   .0047823    -6.01   0.000      -.03811   -.0193621
        male |  .0168615   .0069542     2.42   0.015     .0032305    .0304926
       black | -.0356723   .0110203    -3.24   0.001    -.0572732   -.0140714
    hispanic |  -.070582   .0136691    -5.16   0.000     -.097375     -.043789
      hsgrad | -.0661429   .0162279    -4.08   0.000    -.0979514   -.0343345
     somecol | -.1312175   .0164726    -7.97   0.000    -.1635056   -.0989293
     college | -.2406109   .0162568   -14.80   0.000     -.272476   -.2087459
       worka |  -.066076   .0074879    -8.82   0.000     -.080753     -.051399
       _cons |  .7530714   .0494255    15.24   0.000     .6561919    .8499509
------------------------------------------------------------------------------

Same syntax as reg, but with probit

. * run probit model;
. probit smoker age incomel male black hispanic hsgrad somecol college worka;

Iteration 0:   log likelihood =  -9171.443
Iteration 1:   log likelihood =  -8764.068
Iteration 2:   log likelihood = -8761.7211
Iteration 3:   log likelihood = -8761.7208

Probit estimates                                Number of obs =  16258
                                                LR chi2(9)    = 819.44
                                                Prob > chi2   = 0.0000
Log likelihood = -8761.7208                     Pseudo R2     = 0.0447

The model converges rapidly for most problems. The LR chi2(9) statistic tests the null that all non-constant terms are 0, and z-statistics are reported instead of t-statistics.

------------------------------------------------------------------------------
      smoker |     Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age | -.0012684   .0009316    -1.36   0.173    -.0030943    .0005574
     incomel |  -.092812   .0151496    -6.13   0.000    -.1225047   -.0631193
        male |  .0533213   .0229297     2.33   0.020     .0083799    .0982627
       black | -.1060518    .034918    -3.04   0.002      -.17449   -.0376137
    hispanic | -.2281468   .0475128    -4.80   0.000    -.3212701   -.1350235
      hsgrad | -.1748765   .0436392    -4.01   0.000    -.2604078   -.0893453
     somecol |  -.363869   .0451757    -8.05   0.000    -.4524118   -.2753262
     college | -.7689528   .0466418   -16.49   0.000     -.860369   -.6775366
       worka | -.2093287   .0231425    -9.05   0.000    -.2546873   -.1639702
       _cons |   .870543    .154056     5.65   0.000     .5685989    1.172487
------------------------------------------------------------------------------

. dprobit smoker age incomel male black hispanic hsgrad somecol college worka;

Probit regression, reporting marginal effects   Number of obs =  16258
                                                LR chi2(9)    = 819.44
                                                Prob > chi2   = 0.0000
Log likelihood = -8761.7208                     Pseudo R2     = 0.0447

------------------------------------------------------------------------------
   smoker |      dF/dx   Std. Err.      z    P>|z|     x-bar  [   95% C.I.   ]
----------+--------------------------------------------------------------------
      age |  -.0003951   .0002902   -1.36   0.173    38.5474  -.000964  .000174
  incomel |  -.0289139   .0047173   -6.13   0.000     10.421   -.03816 -.019668
    male* |   .0166757   .0071979    2.33   0.020     .39476   .002568  .030783
   black* |  -.0320621   .0102295   -3.04   0.002    .111945  -.052111 -.012013
hispanic* |  -.0658551   .0125926   -4.80   0.000    .060709  -.090536 -.041174
  hsgrad* |   -.053335    .013018   -4.01   0.000    .335527   -.07885  -.02782
 somecol* |  -.1062358   .0122819   -8.05   0.000    .268545  -.130308 -.082164
 college* |  -.2149199   .0114584  -16.49   0.000    .329376  -.237378 -.192462
   worka* |  -.0668959   .0075634   -9.05   0.000     .68514   -.08172 -.052072
----------+--------------------------------------------------------------------
   obs. P |     .25163
  pred. P |   .2409344  (at x-bar)
------------------------------------------------------------------------------
(*) dF/dx is for discrete change of dummy variable from 0 to 1
    z and P>|z| correspond to the test of the underlying coefficient being 0
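dprobit (and mfx below) are older commands that evaluate effects at the sample means; in current Stata the same exercise would usually be done with margins. A minimal sketch of the modern equivalent (atmeans reproduces the at-the-means calculation; dropping it gives average marginal effects instead; because the dummies are not declared with factor-variable i. notation here, margins treats them as continuous, so their effects will be close to but not identical to the discrete dF/dx changes above):

  * modern replacement for dprobit/mfx
  probit smoker age incomel male black hispanic hsgrad somecol college worka
  margins, dydx(*) atmeans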

. mfx compute;

Marginal effects after probit
  y = Pr(smoker) (predict) = .24093439

------------------------------------------------------------------------------
 variable |      dy/dx   Std. Err.      z    P>|z|  [   95% C.I.   ]         X
----------+--------------------------------------------------------------------
      age |  -.0003951     .00029   -1.36   0.173  -.000964  .000174   38.5474
  incomel |  -.0289139     .00472   -6.13   0.000   -.03816 -.019668    10.421
    male* |   .0166757      .0072    2.32   0.021   .002568  .030783    .39476
   black* |  -.0320621     .01023   -3.13   0.002  -.052111 -.012013   .111945
hispanic* |  -.0658551     .01259   -5.23   0.000  -.090536 -.041174   .060709
  hsgrad* |   -.053335     .01302   -4.10   0.000   -.07885  -.02782   .335527
 somecol* |  -.1062358     .01228   -8.65   0.000  -.130308 -.082164   .268545
 college* |  -.2149199     .01146  -18.76   0.000  -.237378 -.192462   .329376
   worka* |  -.0668959     .00756   -8.84   0.000   -.08172 -.052072    .68514
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1

Reading the marginal effects:
Males are 1.7 percentage points more likely to smoke.
Those with a college degree are 21.5 percentage points less likely to smoke.
Ten additional years of age reduce the smoking rate by about 4 tenths of a percentage point.
A 10 percent increase in income reduces smoking by about .29 percentage points.
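The income interpretation follows because incomel is log income, so a 10 percent increase in income raises incomel by roughly 0.10. A quick back-of-the-envelope check at the Stata prompt:

  * dy/dx for log income times the approximate change in logs
  display -.0289139*0.10
  * using the exact change in logs for a 10% increase
  display -.0289139*ln(1.10)

The first line gives about -0.0029 (0.29 percentage points); the second, about -0.0028.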

. * get marginal effect/treatment effects for a specific person;
. * male, age 40, college educ, white, without workplace smoking ban;
. * if a variable is not specified, its value is assumed to be;
. * the sample mean. in this case, the only variable i am not;
. * listing is mean log income;
. prchange, x(male=1 age=40 black=0 hispanic=0 hsgrad=0 somecol=0 worka=0);

probit: Changes in Predicted Probabilities for smoker

            min->max      0->1     -+1/2    -+sd/2   MargEfct
      age    -0.0327   -0.0005   -0.0005   -0.0057    -0.0005
  incomel    -0.1807   -0.0314   -0.0348   -0.0266    -0.0349
     male     0.0198    0.0198    0.0200    0.0098     0.0200
    black    -0.0390   -0.0390   -0.0398   -0.0126    -0.0398
 hispanic    -0.0817   -0.0817   -0.0855   -0.0205    -0.0857
   hsgrad    -0.0634   -0.0634   -0.0656   -0.0310    -0.0657
  somecol    -0.1257   -0.1257   -0.1360   -0.0605    -0.1367
  college    -0.2685   -0.2685   -0.2827   -0.1351    -0.2888
    worka    -0.0753   -0.0753   -0.0785   -0.0365    -0.0786

Min->Max: change in predicted probability as x changes from its minimum to its maximum
0->1: change in predicted probability as x changes from 0 to 1
-+1/2: change in predicted probability as x changes from 1/2 unit below the base value to 1/2 unit above
-+sd/2: change in predicted probability as x changes from 1/2 standard deviation below the base value to 1/2 standard deviation above
MargEfct: the partial derivative of the predicted probability/rate with respect to a given independent variable
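prchange is part of the spost add-ons; with current Stata's built-in margins, a roughly equivalent calculation for the same hypothetical person would look something like the sketch below (variables not fixed in at() are held at their means by atmeans; because the dummies are not coded as factor variables, the reported effects are continuous derivatives like the MargEfct column, not discrete 0/1 changes):

  * marginal effects for a specific person, other covariates at their means
  probit smoker age incomel male black hispanic hsgrad somecol college worka
  margins, dydx(*) at(male=1 age=40 black=0 hispanic=0 hsgrad=0 somecol=0 worka=0) atmeans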

. * using a wald test, test the null hypothesis that;
. * all the education coefficients are zero;
. test hsgrad somecol college;

 ( 1)  hsgrad = 0
 ( 2)  somecol = 0
 ( 3)  college = 0

           chi2(  3) =  504.78
         Prob > chi2 =  0.0000

. * how to run the same tests with a -2 log likelihood test;
. * estimate the unrestricted model and save the estimates;
. * in urmodel;
. probit smoker age incomel male black hispanic hsgrad somecol college worka;

Iteration 0:   log likelihood =  -9171.443
Iteration 1:   log likelihood =  -8764.068
Iteration 2:   log likelihood = -8761.7211
Iteration 3:   log likelihood = -8761.7208

(some output deleted)

. estimates store urmodel;

. * estimate the restricted model. save results in rmodel;
. probit smoker age incomel male black hispanic worka;

Iteration 0:   log likelihood =  -9171.443
Iteration 1:   log likelihood = -9022.2473
Iteration 2:   log likelihood = -9022.1031

(some output deleted)

. lrtest urmodel rmodel;

likelihood-ratio test                           LR chi2(3)  =   520.7
(Assumption: rmodel nested in urmodel)          Prob > chi2 =   0.000
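As a check on where the statistic comes from, the LR test is minus twice the difference in log likelihoods between the restricted and unrestricted models; using the values from the iteration logs above:

  * LR statistic = -2*(LL_restricted - LL_unrestricted)
  display -2*(-9022.1031 + 8761.7208)

This prints 520.7646, matching the reported LR chi2(3) of 520.7; there are 3 degrees of freedom because 3 coefficients (hsgrad, somecol, college) are restricted to zero.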

Comparing Marginal Effects

Variable         LP       Probit      Logit
age         -0.00048    -0.00040   -0.00048
incomel      -0.0287     -0.0289    -0.0276
male          0.0168      0.0167     0.0172
black        -0.0357     -0.0321    -0.0342
hispanic     -0.0706     -0.0658    -0.0602
hsgrad       -0.0661     -0.0533    -0.0514
college      -0.2406     -0.2149    -0.2121
worka        -0.0661     -0.0669    -0.0658

When will results differ?

The normal and logistic PDF/CDF look:
  Similar near the middle of the distribution
  Different in the tails

You obtain more observations in the tails of the distribution when
  Sample sizes are large
  The mean of y approaches 1 or 0

These situations are more likely to produce differences in the estimates.
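One way to see the tail difference numerically: rescale the logistic CDF to have variance 1 (the logistic variance is pi²/3, so divide the index by sqrt(3)/pi) and compare tail probabilities with the standard normal. A sketch at the Stata prompt:

  * tail probability at -3 under the standard normal vs. a variance-1 logistic
  display normal(-3)
  display invlogit(-3*_pi/sqrt(3))

The normal gives about 0.0013 while the rescaled logistic puts roughly three times as much mass in that tail, even though the two CDFs are nearly identical near the middle.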

* compute probit marginal effects at the means "by hand";
probit smoker worka age incomel male black hispanic hsgrad somecol college;
matrix betat=e(b);                                   * get beta from probit (1 x k);
matrix beta=betat';                                  * transpose beta (k x 1);
matrix covp=e(V);                                    * get v/c matrix from probit (k x k);

* get means of x -- call it xbar (k x 1);
* must be in the same order as in the probit statement;
matrix accum zz = worka age incomel male black hispanic hsgrad somecol college, means(xbart);
matrix xbar=xbart';                                  * transpose the row of means (k x 1);
matrix xbeta=beta'*xbar;                             * get xbar'beta (scalar);
matrix pdf=normalden(xbeta[1,1]);                    * evaluate std normal pdf at xbar'beta;
matrix k=rowsof(beta);                               * get number of covariates;
matrix Ik=I(k[1,1]);                                 * construct I(k);
matrix G=Ik-xbeta*beta*xbar';                        * construct G = I - (xbar'beta)*beta*xbar';
matrix v_c=(pdf*pdf)*G*covp*G';                      * delta-method v-c matrix of marginal effects;
matrix me=beta*pdf;                                  * marginal effects: pdf(xbar'beta)*beta;
matrix se_me1=cholesky(diag(vecdiag(v_c)));          * square roots of the main diagonal;
matrix se_me=vecdiag(se_me1)';                       * take diagonal values;
matrix z_score=vecdiag(diag(me)*inv(diag(se_me)))';  * get z scores;
matrix results=me,se_me,z_score;                     * construct results matrix;
matrix colnames results=marg_eff std_err z_score;    * define column names;
matrix list results;                                 * list results;

results[10,3]
               marg_eff      std_err      z_score
    worka   -.06521255    .00720374   -9.0525984
      age   -.00039515    .00029023   -1.3615156
  incomel   -.02891389    .00471728    -6.129356
     male    .01661127    .00714305    2.3255154
    black   -.03303852     .0108782   -3.0371321
 hispanic   -.07107496    .01479806   -4.8029926
   hsgrad   -.05447959    .01359844   -4.0063111
  somecol   -.11335675    .01408096   -8.0503576
  college   -.23955322     .0144803   -16.543383
    _cons     .2712018    .04808183    5.6404217

Compare with the dprobit output: the continuous variables match exactly; the dummies differ slightly because dprobit reports discrete 0-to-1 changes rather than derivatives.

------------------------------------------------------------------------------
   smoker |      dF/dx   Std. Err.      z    P>|z|     x-bar  [   95% C.I.   ]
----------+--------------------------------------------------------------------
      age |  -.0003951   .0002902   -1.36   0.173    38.5474  -.000964  .000174
  incomel |  -.0289139   .0047173   -6.13   0.000     10.421   -.03816 -.019668
    male* |   .0166757   .0071979    2.33   0.020     .39476   .002568  .030783
   black* |  -.0320621   .0102295   -3.04   0.002    .111945  -.052111 -.012013
hispanic* |  -.0658551   .0125926   -4.80   0.000    .060709  -.090536 -.041174
  hsgrad* |   -.053335    .013018   -4.01   0.000    .335527   -.07885  -.02782
 somecol* |  -.1062358   .0122819   -8.05   0.000    .268545  -.130308 -.082164
 college* |  -.2149199   .0114584  -16.49   0.000    .329376  -.237378 -.192462
   worka* |  -.0668959   .0075634   -9.05   0.000     .68514   -.08172 -.052072
------------------------------------------------------------------------------

* this is an example of a marginal effect for a dichotomous covariate;
* in this case, set the 1st variable (worka) to 1 or 0;
matrix x1=xbar;
matrix x1[1,1]=1;
matrix x0=xbar;
matrix x0[1,1]=0;
matrix xbeta1=beta'*x1;
matrix xbeta0=beta'*x0;
matrix prob1=normal(xbeta1[1,1]);
matrix prob0=normal(xbeta0[1,1]);
matrix me_1=prob1-prob0;                 * discrete change in the predicted probability;
matrix pdf1=normalden(xbeta1[1,1]);
matrix pdf0=normalden(xbeta0[1,1]);
matrix G1=pdf1*x1 - pdf0*x0;             * gradient of the discrete change w.r.t. beta;
matrix v_c1=G1'*covp*G1;
matrix se_me_1=sqrt(v_c1[1,1]);
* marginal effect of workplace bans;
matrix list me_1;
* standard error of the workplace ban effect;
matrix list se_me_1;

symmetric me_1[1,1]
            c1
r1  -.06689591

. * standard error of the workplace ban effect;
. matrix list se_me_1;

symmetric se_me_1[1,1]
           c1
r1  .00756336

These match the worka* row of the dprobit output:

------------------------------------------------------------------------------
   smoker |      dF/dx   Std. Err.      z    P>|z|     x-bar  [   95% C.I.   ]
----------+--------------------------------------------------------------------
      age |  -.0003951   .0002902   -1.36   0.173    38.5474  -.000964  .000174
  incomel |  -.0289139   .0047173   -6.13   0.000     10.421   -.03816 -.019668
    male* |   .0166757   .0071979    2.33   0.020     .39476   .002568  .030783
   black* |  -.0320621   .0102295   -3.04   0.002    .111945  -.052111 -.012013
hispanic* |  -.0658551   .0125926   -4.80   0.000    .060709  -.090536 -.041174
  hsgrad* |   -.053335    .013018   -4.01   0.000    .335527   -.07885  -.02782
 somecol* |  -.1062358   .0122819   -8.05   0.000    .268545  -.130308 -.082164
 college* |  -.2149199   .0114584  -16.49   0.000    .329376  -.237378 -.192462
   worka* |  -.0668959   .0075634   -9.05   0.000     .68514   -.08172 -.052072
------------------------------------------------------------------------------
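With current Stata, the same discrete-change effect and its delta-method standard error can be obtained by declaring worka as a factor variable; a minimal sketch (not part of the original program), which should reproduce the numbers above up to rounding:

  * modern equivalent for the workplace-ban effect
  probit smoker i.worka age incomel male black hispanic hsgrad somecol college
  margins, dydx(worka) atmeans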

Logit and Standard Normal CDF

[Figure: the logit and standard normal distributions plotted over X, with one curve labeled "Standard Normal" and the other labeled "Logit"; the two are close near the center and differ most in the tails.]

Pseudo R²

LLk = log likelihood with all variables
LL1 = log likelihood with only a constant
0 > LLk > LL1, so |LLk| < |LL1|
Pseudo R² = 1 - |LLk/LL1|
Bounded between 0 and 1
Not anything like an R² from a linear regression
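As a check against the probit output above, the iteration log gives LLk = -8761.7208 (all variables) and LL1 = -9171.443 (iteration 0, constant only):

  * McFadden pseudo R-squared
  display 1 - 8761.7208/9171.443

This prints about 0.0447, matching the Pseudo R2 reported by probit.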

Predicting Y

Let b be the estimated value of β.
For any candidate vector xi, we can predict the probability Pi = Φ(xib).
Once you have Pi, pick a threshold value T and predict
  Yp = 1 if Pi > T
  Yp = 0 if Pi ≤ T
Then compare the predictions to the actual outcomes: the fraction correctly predicted.

Question: what value to pick for T?

Can pick 0.5, as some textbooks suggest.
  Intuitive: the unit is predicted to be more likely to engage in the activity than not.
  When the mean of y is small (large), this criterion does a poor job of predicting Yi = 1 (Yi = 0).

* predict probability of smoking;
predict pred_prob_smoke;
* get detailed descriptive data about predicted prob;
sum pred_prob, detail;
* predict binary outcome with 50% cutoff;
gen pred_smoke1=pred_prob_smoke>=.5;
label variable pred_smoke1 "predicted smoking, 50% cutoff";
* compare actual and predicted values;
tab smoker pred_smoke1, row col cell;

Predicted values are close to the sample mean of y: the mean of the predicted probabilities is always close to the actual mean (0.25163 in this case), and no one is predicted to have a high probability of smoking because the mean of y is closer to 0 than to 1.

. predict pred_prob_smoke;
(option p assumed; Pr(smoker))

. * get detailed descriptive data about predicted prob;
. sum pred_prob, detail;

                          Pr(smoker)
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .0959301       .0615221
 5%     .1155022       .0622963
10%     .1237434       .0633929       Obs            16258
25%     .1620851       .0733495       Sum of Wgt.    16258

50%     .2569962                      Mean         .2516653
                        Largest       Std. Dev.    .0960007
75%     .3187975       .5619798
90%     .3795704       .5655878       Variance     .0092161
95%     .4039573       .5684112       Skewness     .1520254
99%     .4672697       .6203823       Kurtosis     2.149247
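The program above stops at the cross-tabulation; a minimal sketch of the "fraction correctly predicted" comparison itself, following the semicolon delimiter used in the sample program (pred_correct is a variable name introduced here for illustration):

  * share of observations whose 0/1 prediction matches the actual outcome;
  gen pred_correct = (smoker == pred_smoke1);
  sum pred_correct;

The mean of pred_correct is the fraction of observations classified correctly at the 50% cutoff.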

Some nice properties of the Logit

Outcome: y = 1 or 0
Treatment: x = 1 or 0
Other covariates: z

Context:
  y = whether a baby is born with a low birth weight
  x = whether the mom smoked during pregnancy

Risk ratio

RR = Pr(y=1|x=1)/Pr(y=1|x=0)
The relative probability of the event when x is and is not present.
How much does smoking elevate the chance your child will be a low weight birth?

Let Yyx be the probability that y = 1 or 0, given x = 1 or 0.
Think of the risk ratio the following way:
  Y11 is the probability Y=1 when X=1
  Y10 is the probability Y=1 when X=0
  Y11 = RR*Y10

Odds Ratio

OR = A/B = [Y11/Y01]/[Y10/Y00]
A = [Pr(Y=1|X=1)/Pr(Y=0|X=1)] = odds of Y occurring if you are a smoker
B = [Pr(Y=1|X=0)/Pr(Y=0|X=0)] = odds of Y occurring if you are not a smoker
What are the relative odds of Y happening if you do or do not experience X?

Suppose Pr(Yi=1) = F(β0 + β1Xi + β2Zi) and F is the logistic function.
Can show that OR = exp(β1) = e^β1.
This number is typically reported by most statistical packages.

Details

Y11 = exp(β0 + β1 + β2Z)/(1 + exp(β0 + β1 + β2Z))
Y10 = exp(β0 + β2Z)/(1 + exp(β0 + β2Z))
Y01 = 1/(1 + exp(β0 + β1 + β2Z))
Y00 = 1/(1 + exp(β0 + β2Z))

[Y11/Y01] = exp(β0 + β1 + β2Z)
[Y10/Y00] = exp(β0 + β2Z)

OR = A/B = [Y11/Y01]/[Y10/Y00]
         = exp(β0 + β1 + β2Z)/exp(β0 + β2Z)
         = exp(β1)

Suppose Y is rare, i.e., its mean is close to 0.
Then Pr(Y=0|X=1) and Pr(Y=0|X=0) are both close to 1, so they roughly cancel in the odds ratio.
Therefore, when the mean of Y is close to 0, the odds ratio approximately equals the risk ratio.
Why is this nice? Because the risk ratio is exactly what the population attributable risk calculation below requires.

Population Attributable Risk (PAR)

Fraction of outcome Y attributed to X
Let xs be the fraction of the population exposed to x
PAR = (RR - 1)xs / [(1 - xs) + RR*xs]
Derived on the next two slides

Population attributable risk

Average outcome in the population:
  yc = (1 - xs)Y10 + xs*Y11 = (1 - xs)Y10 + xs(RR)Y10
Average outcomes are a weighted average of outcomes for X=0 and X=1.
What would the average outcome be in the absence of X (e.g., reduce smoking rates to 0)?
  Ya = Y10

Therefore:
  yc = current average outcome = Y10[(1 - xs) + RR*xs]
  Ya = Y10 = outcome with zero smoking
  PAR = (yc - Ya)/yc
Substituting the definitions of yc and Ya, the Y10 terms cancel:
  PAR = [(1 - xs) + RR*xs - 1] / [(1 - xs) + RR*xs]
      = (RR - 1)xs / [(1 - xs) + RR*xs]

Example: Maternal Smoking and Low Weight Births

6% of births are low weight (< 2,500 grams, about 5.5 lbs); the average birth weighs about 3,300 grams.

Maternal smoking during pregnancy has been identified as a key cofactor:
  13% of mothers smoke.
  This number was falling by about 1 percentage point per year during the 1980s/90s.
  Smoking doubles the chance of a low weight birth.

Natality detail data

Census of all births (about 4 million per year), with annual files starting in the 1960s.
Information about:
  The baby (birth weight, length, date, sex, plurality, birth injuries)
  Demographics (age, race, marital status, education of the mom)
  The birth (who delivered, method of delivery)
  Health of the mom (smoked/drank during pregnancy, weight gain)

Smoking status is not available from CA or NY, leaving roughly 3 million usable observations per year.
I pulled a 0.5% random sample from 1995: about 12,500 observations.
Variables: birth weight (grams), smoked, married, 4-level race, 5-level education, mother's age at birth.

Notice a few things

13.7% of women smoke; 6% have a low weight birth.

Raw numbers:
  Pr(LBW | Smoke)  = 10.28%
  Pr(LBW | ~Smoke) =  5.36%
  RR = Pr(LBW | Smoke)/Pr(LBW | ~Smoke) = 0.1028/0.0536 = 1.92

Asking for odds ratios

The logistic command reports odds ratios: logistic y x1 x2;
In this case:
  xi: logistic lowbw smoked age married i.educ5 i.race4;
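A sketch of the equivalence between odds ratios and exponentiated logit coefficients, using this same model (modern factor-variable syntax shown instead of the older xi: prefix; logistic reports exp(b) directly):

  * coefficients, then the implied odds ratio for smoked
  logit lowbw smoked age married i.educ5 i.race4
  display exp(_b[smoked])
  * same model reported directly as odds ratios
  logistic lowbw smoked age married i.educ5 i.race4

The display line should match the odds ratio on smoked in the logistic output.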

PAR

PAR = (RR - 1)xs / [(1 - xs) + RR*xs]
xs = 0.137
RR = 1.96
PAR = 0.116

11.6% of low weight births are attributed to maternal smoking.
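Plugging the numbers into the PAR formula as a check, at the Stata prompt:

  * (RR - 1)*xs / [(1 - xs) + RR*xs]
  display (1.96 - 1)*0.137/((1 - 0.137) + 1.96*0.137)

This prints about 0.116, the 11.6% figure above.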


Endowment effect

Ask a group to fill out a survey. As a thank you, give them a coffee mug; they have the mug while they fill out the survey.
After the survey, offer them a trade of a candy bar for the mug.
Reverse the experiment: offer the candy bar first, then offer to trade it for a mug.
Comparison sample: give them a choice of mug or candy bar after the survey is complete.

Contrary to the simple consumer choice model

The standard utility theory model assumes the MRS between two goods is symmetric.
The lack of trading suggests an endowment effect: people value the good more once they own it.
This generates large discrepancies between willingness to pay (WTP) and willingness to accept (WTA).

Policy implications

Example:
  A) How much are you willing to pay for clean air?
  B) How much do we have to pay you to allow someone to pollute?
Answers to B) are orders of magnitude larger than answers to A).
Prior work estimated WTP via A) and assumed it equals WTA.

Often thought of as loss aversion.

Problem

These are artificial situations, and inexperienced subjects may not know the value of the item.
Solution: see how experienced actors behave when they are endowed with something they can easily value.
Two experiments: baseball card shows and collectible pins.

Baseball cards

Two pieces of memorabilia:
  A game stub from the game in which Cal Ripken Jr. set the record for consecutive games played (vs. KC, June 14, 1996)
  A certificate commemorating Nolan Ryan's 300th win

Ask people to fill out a 5-minute survey. In return, they receive one of the two pieces; then ask them to trade.
