
Causality, Potential Outcomes, and the Estimation of Treatment Effects in Randomized Studies

2020 AEA Continuing Education Program
Mastering Mostly Harmless Econometrics

Alberto Abadie
MIT

Contents

1 Causality, counterfactuals, and potential outcomes
2 Randomized experiments, Fisher’s exact test
3 Threats to internal and external validity in randomized experiments
4 Appendix: Experimental design


Purpose, scope, and examples

The goal of policy/program evaluation is to assess the causal effect of policy interventions. Examples:
Job training programs on earnings and employment
Class size on test scores
Minimum wage on employment
Tax-deferred saving programs on savings accumulation

More generally, we may be interested in the effect of interventions that are not public policies. Examples:
Interest rate on credit card usage
Incentive schemes on employee productivity

Causality with potential outcomes

Treatment
Di : Indicator of treatment intake for unit i

\[
D_i = \begin{cases} 1 & \text{if unit } i \text{ received the treatment} \\ 0 & \text{otherwise.} \end{cases}
\]

Outcome
Yi : Observed outcome variable of interest for unit i

Potential Outcomes
Y0i and Y1i : Potential outcomes for unit i

Y1i : Potential outcome for unit i with treatment
Y0i : Potential outcome for unit i without treatment

Causality with potential outcomes

Treatment Effect
The treatment effect or causal effect of the treatment on the
outcome for unit i is the difference between its two potential
outcomes:
Y1i − Y0i

Observed Outcomes
Observed outcomes are realized as

\[
Y_i = Y_{1i} D_i + Y_{0i} (1 - D_i)
\qquad \text{or} \qquad
Y_i = \begin{cases} Y_{1i} & \text{if } D_i = 1 \\ Y_{0i} & \text{if } D_i = 0 \end{cases}
\]

Fundamental Problem of Causal Inference


We cannot observe both potential outcomes (Y1i, Y0i) for the same unit.
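A minimal sketch (illustrative data, not from the handout) makes the fundamental problem concrete: code can generate both potential outcomes, but an analyst only ever observes one of them per unit.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

y0 = rng.normal(size=n)          # potential outcomes without treatment
y1 = y0 + 2.0                    # potential outcomes with treatment
d = rng.integers(0, 2, size=n)   # treatment indicator D_i

# Observed outcome: Yi = Y1i*Di + Y0i*(1 - Di);
# the other potential outcome is never seen
y = y1 * d + y0 * (1 - d)
print(np.column_stack([d, y]))
```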

Stable unit treatment value assumption (SUTVA)

Assumption
Observed outcomes are realized as

Yi = Y1i Di + Y0i (1 − Di )

Implies that potential outcomes for unit i are unaffected by the treatment of unit j

Rules out interference across units

Example: the effect of a flu vaccine on hospitalization (vaccinating unit j can change unit i's hospitalization risk)

This assumption may be problematic, so we should choose the units of analysis to minimize interference across units.
Quantities of interest (estimands)

ATE
Average treatment effect is:

αATE = E [Y1 − Y0 ]

ATET
Average treatment effect on the treated is:

αATET = E [Y1 − Y0 |D = 1]

Average treatment effect (ATE)

Imagine a population with 4 units:

 i   Y1i   Y0i   Yi   Di   Y1i − Y0i
 1    3     0     3    1       3
 2    1     1     1    1       0
 3    1     0     0    0       1
 4    1     1     1    0       0

E[Y1] = 1.5,  E[Y0] = 0.5,  E[Y1 − Y0] = 1

αATE = E [Y1 − Y0 ] = 3 · (1/4) + 0 · (1/4) + 1 · (1/4) + 0 · (1/4) = 1


Average treatment effect on the treated (ATET)

Imagine a population with 4 units:

 i   Y1i   Y0i   Yi   Di   Y1i − Y0i
 1    3     0     3    1       3
 2    1     1     1    1       0
 3    1     0     0    0       1
 4    1     1     1    0       0

E[Y1|D = 1] = 2,  E[Y0|D = 1] = 0.5,  E[Y1 − Y0|D = 1] = 1.5

αATET = E [Y1 − Y0 |D = 1] = 3 · (1/2) + 0 · (1/2) = 1.5
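The two examples above are easy to verify in code; here is a sketch reproducing the 4-unit population:

```python
import numpy as np

y1 = np.array([3, 1, 1, 1])   # potential outcomes with treatment
y0 = np.array([0, 1, 0, 1])   # potential outcomes without treatment
d  = np.array([1, 1, 0, 0])   # treatment indicators

ate  = (y1 - y0).mean()            # E[Y1 - Y0]          -> 1.0
atet = (y1 - y0)[d == 1].mean()    # E[Y1 - Y0 | D = 1]  -> 1.5
print(ate, atet)
```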

Selection bias

Problem
Comparisons of earnings for the treated and the untreated do not
usually give the right answer:

\[
E[Y \mid D=1] - E[Y \mid D=0]
= E[Y_1 \mid D=1] - E[Y_0 \mid D=0]
= \underbrace{E[Y_1 - Y_0 \mid D=1]}_{\text{ATET}}
+ \underbrace{E[Y_0 \mid D=1] - E[Y_0 \mid D=0]}_{\text{BIAS}}
\]

Selection into treatment often depends on potential outcomes

Bias term may be positive or negative depending on the setting
Selection bias

Example: Job training program for the disadvantaged

Participants are self-selected from a subpopulation of individuals in difficult labor market situations

Post-training earnings would be lower for participants than for nonparticipants even in the absence of the program, so E[Y0|D = 1] − E[Y0|D = 0] < 0

[Figure: Training program for the disadvantaged in the U.S.]
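A simulation sketch of this selection story (all parameter values hypothetical): the true effect is 1 for everyone, but participants are drawn from the low-Y0 subpopulation, so the naive comparison understates the effect.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

y0 = rng.normal(size=n)       # earnings without the program
y1 = y0 + 1.0                 # true program effect = 1 for everyone

# Self-selection: units with low Y0 are more likely to enroll
d = (y0 + rng.normal(size=n) < 0).astype(int)
y = y1 * d + y0 * (1 - d)

naive = y[d == 1].mean() - y[d == 0].mean()
bias  = y0[d == 1].mean() - y0[d == 0].mean()   # E[Y0|D=1] - E[Y0|D=0] < 0
print(f"naive difference in means: {naive:.2f} = 1 + bias ({bias:.2f})")
```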


Assignment mechanism

The assignment mechanism is the procedure that determines which units are selected for treatment intake. Examples include:
random assignment
selection on observables
selection on unobservables
Typically, treatment effects models attain identification by
restricting the assignment mechanism in some way.

Key ideas

Causality is defined by potential outcomes, not by realized (observed) outcomes

Observed association is neither necessary nor sufficient for causation

Estimation of causal effects of a treatment (usually) starts with studying the assignment mechanism
Selection bias

Recall the selection problem when comparing the mean outcomes for the treated and the untreated:

\[
\underbrace{E[Y \mid D=1] - E[Y \mid D=0]}_{\text{Difference in Means}}
= \underbrace{E[Y_1 - Y_0 \mid D=1]}_{\text{ATET}}
+ \underbrace{E[Y_0 \mid D=1] - E[Y_0 \mid D=0]}_{\text{BIAS}}
\]

Random assignment of units to the treatment forces the selection bias to be zero

The treatment and control groups will tend to be similar along all characteristics (including Y0)

Identification in randomized experiments


Randomization implies:

(Y1, Y0) independent of D, written (Y1, Y0) ⊥⊥ D.

We have that E[Y0|D = 1] = E[Y0|D = 0] and therefore

αATET = E[Y1 − Y0|D = 1] = E[Y|D = 1] − E[Y|D = 0]

Also, we have that

αATE = E[Y1 − Y0] = E[Y1 − Y0|D = 1] = E[Y|D = 1] − E[Y|D = 0]

As a result,

\[
\underbrace{E[Y \mid D=1] - E[Y \mid D=0]}_{\text{Difference in Means}} = \alpha_{ATE} = \alpha_{ATET}
\]
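A simulation sketch of the identification result (hypothetical data): when D is assigned independently of the potential outcomes, the difference in means recovers the ATE.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

y0 = rng.normal(size=n)
y1 = y0 + 1.0                     # ATE = ATET = 1
d = rng.integers(0, 2, size=n)    # random assignment: (Y1, Y0) independent of D
y = y1 * d + y0 * (1 - d)

print(y[d == 1].mean() - y[d == 0].mean())   # ≈ 1
```

Replacing the random d here with the self-selected d from the earlier sketch reproduces the biased comparison.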
Identification in randomized experiments
The identification result extends beyond average treatment effects.
Let Qθ (Y ) be the θ-th quantile of the distribution of Y :

Pr(Y ≤ Qθ (Y )) = θ.

Given random assignment, Y0 ⊥⊥ D. Therefore,

Y0 ∼ Y0 |D = 0 ∼ Y |D = 0

where “∼” means “has the same distribution as”. Similarly,

Y1 ∼ Y |D = 1.

So the effect of the treatment at any quantile, Qθ(Y1) − Qθ(Y0), is identified.

Randomization identifies the entire marginal distributions of Y0 and Y1

It does not identify the quantiles of the effect, Qθ(Y1 − Y0): the difference of quantiles is not the quantile of the difference
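A sketch of the distinction (heterogeneous, hypothetical effects): the quantiles of Y1 and Y0 are identified from the two arms, but the quantiles of Y1 − Y0 would require observing both potential outcomes for the same units.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

y0 = rng.normal(size=n)
y1 = y0 + rng.exponential(1.0, size=n)   # heterogeneous treatment effects
d = rng.integers(0, 2, size=n)
y = y1 * d + y0 * (1 - d)

# Identified from the experiment: Q_0.5(Y1) - Q_0.5(Y0)
qte = np.quantile(y[d == 1], 0.5) - np.quantile(y[d == 0], 0.5)
# Not identified in practice (uses both potential outcomes): Q_0.5(Y1 - Y0)
median_effect = np.quantile(y1 - y0, 0.5)
print(f"QTE at median: {qte:.2f}; median of effects: {median_effect:.2f}")
```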

Estimation in randomized experiments


Consider a randomized trial with N individuals. Suppose that the estimand of interest is ATE:

αATE = E [Y1 − Y0 ] = E [Y |D = 1] − E [Y |D = 0].

Using the analogy principle, we construct an estimator:

\[
\hat{\alpha} = \bar{Y}_1 - \bar{Y}_0,
\]

where

\[
\bar{Y}_1 = \frac{\sum_i Y_i D_i}{\sum_i D_i} = \frac{1}{N_1} \sum_{D_i = 1} Y_i; \qquad
\bar{Y}_0 = \frac{\sum_i Y_i (1 - D_i)}{\sum_i (1 - D_i)} = \frac{1}{N_0} \sum_{D_i = 0} Y_i,
\]

with N1 = ∑i Di and N0 = N − N1.

α̂ is an unbiased and consistent estimator of αATE.
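The analogy-principle estimator in code (a minimal sketch; the function name is illustrative):

```python
import numpy as np

def difference_in_means(y, d):
    """ATE estimator for a randomized trial: Ȳ1 − Ȳ0."""
    y, d = np.asarray(y, dtype=float), np.asarray(d)
    return y[d == 1].mean() - y[d == 0].mean()
```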
Testing in large samples: Two-sample t-test
Notice that:

\[
\frac{\hat{\alpha} - \alpha_{ATE}}{\sqrt{\dfrac{\hat{\sigma}_1^2}{N_1} + \dfrac{\hat{\sigma}_0^2}{N_0}}} \;\overset{d}{\to}\; N(0, 1),
\]

where

\[
\hat{\sigma}_1^2 = \frac{1}{N_1 - 1} \sum_{D_i = 1} (Y_i - \bar{Y}_1)^2,
\]

and σ̂0² is analogously defined. In particular, let

\[
t = \frac{\hat{\alpha}}{\sqrt{\dfrac{\hat{\sigma}_1^2}{N_1} + \dfrac{\hat{\sigma}_0^2}{N_0}}}.
\]

We reject the null hypothesis H0 : αATE = 0 against the alternative H1 : αATE ≠ 0 at the 5% significance level if |t| > 1.96.
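A sketch of the corresponding t-statistic (unequal-variance version, matching the formula above):

```python
import numpy as np

def two_sample_t(y, d):
    """t-statistic for H0: ATE = 0, using group-specific variance estimates."""
    y1, y0 = y[d == 1], y[d == 0]
    alpha_hat = y1.mean() - y0.mean()
    se = np.sqrt(y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0))
    return alpha_hat / se

# Reject H0: ATE = 0 at the 5% level if abs(two_sample_t(y, d)) > 1.96
```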

Testing in small samples: Fisher’s exact test

Test of differences in means with large N:

H0 : E[Y1] = E[Y0], H1 : E[Y1] ≠ E[Y0]

Fisher’s Exact Test with small N:

H0 : Y1 = Y0, H1 : Y1 ≠ Y0 (sharp null)

Let Ω be the set of all possible randomization realizations.

We only observe the outcomes, Yi, for one realization of the experiment. We calculate α̂ = Ȳ1 − Ȳ0.

Under the sharp null hypothesis we can calculate the value that the difference of means would have taken under any other realization, α̂(ω), for ω ∈ Ω.
Testing in small samples: Fisher’s exact test

Suppose that we assign 4 individuals out of 8 to the treatment:

 Yi       12  4  6  10   6  0  1  1
 Di        1  1  1   1   0  0  0  0    α̂ = 6

                                       α̂(ω)
 ω = 1     1  1  1   1   0  0  0  0      6
 ω = 2     1  1  1   0   1  0  0  0      4
 ω = 3     1  1  1   0   0  1  0  0      1
 ω = 4     1  1  1   0   0  0  1  0      1.5
 ···
 ω = 70    0  0  0   0   1  1  1  1     −6

The randomization distribution of α̂ (under the sharp null hypothesis) is

\[
\Pr(\hat{\alpha} \le z) = \frac{1}{70} \sum_{\omega \in \Omega} 1\{\hat{\alpha}(\omega) \le z\}
\]

Now, find z̄ = inf{z : Pr(|α̂| > z) ≤ 0.05}

Reject the null hypothesis, H0 : Y1i − Y0i = 0 for all i, against the alternative hypothesis, H1 : Y1i − Y0i ≠ 0 for some i, at the 5% significance level if |α̂| > z̄
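The example can be verified by enumerating all 70 assignments; a sketch with the outcome vector above:

```python
import numpy as np
from itertools import combinations

y = np.array([12, 4, 6, 10, 6, 0, 1, 1])
n, n1 = len(y), 4

alpha_hat = y[:n1].mean() - y[n1:].mean()   # observed assignment: first 4 treated

# Under the sharp null the outcomes are fixed, so we can recompute
# the difference in means for every one of the C(8,4) = 70 assignments
diffs = []
for treated in combinations(range(n), n1):
    d = np.zeros(n, dtype=bool)
    d[list(treated)] = True
    diffs.append(y[d].mean() - y[~d].mean())
diffs = np.array(diffs)

p_value = (np.abs(diffs) >= abs(alpha_hat)).mean()
print(alpha_hat, round(p_value, 4))   # 6.0, 0.0857
```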

Testing in small samples: Fisher’s exact test

[Figure: Randomization distribution of the difference in means, α̂(ω), over all 70 assignments]

Pr(|α̂(ω)| ≥ 6) = 0.0857
Covariate balance

Randomization balances observed as well as unobserved characteristics between the treatment and control groups

We can check random assignment using so-called “balance tests” (e.g., t-tests) to see whether the distributions of the observed covariates, X, are the same in the treatment and control groups

X are pre-treatment variables, measured prior to treatment assignment (i.e., at “baseline”)
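A sketch of a covariate balance test (assuming scipy is available; names are illustrative):

```python
import numpy as np
from scipy import stats

def balance_test(x, d):
    """Two-sample t-test for a baseline covariate x across treatment groups.

    A small p-value flags an imbalance that may indicate failed randomization."""
    return stats.ttest_ind(x[d == 1], x[d == 0], equal_var=False)
```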

Threats to the validity of randomized experiments

Internal validity: can we estimate the treatment effect for our particular sample?
Fails when there are differences between treated and controls (other than the treatment itself) that affect the outcome and that we cannot control for

External validity: can we extrapolate our estimates to other populations?
Fails when the treatment effect is different outside the evaluation environment
Most common threats to internal validity

Failure of randomization

Non-compliance with experimental protocol

Attrition

Most common threats to external validity

Non-representative sample

Non-representative program: the treatment differs in actual implementations

Scale effects: actual implementations are not randomized (nor full scale)

Hawthorne effects
Appendix: Experimental Design

Experimental design: Relative sample sizes for fixed N

Suppose that you have N experimental subjects and you have to decide how many will be in the treatment group and how many in the control group. We know that:

\[
\bar{Y}_1 - \bar{Y}_0 \sim \left( \mu_1 - \mu_0, \; \frac{\sigma_1^2}{N_1} + \frac{\sigma_0^2}{N_0} \right).
\]

We want to choose N1 and N0, subject to N1 + N0 = N, to minimize the variance of the estimator of the average treatment effect. The variance of Ȳ1 − Ȳ0 is:

\[
\operatorname{var}(\bar{Y}_1 - \bar{Y}_0) = \frac{\sigma_1^2}{pN} + \frac{\sigma_0^2}{(1-p)N},
\]

where p = N1/N is the proportion of treated in the sample.


Experimental design: Relative sample sizes for fixed N

Find the value p* that minimizes var(Ȳ1 − Ȳ0):

\[
-\frac{\sigma_1^2}{p^{*2} N} + \frac{\sigma_0^2}{(1 - p^*)^2 N} = 0.
\]

Therefore:

\[
\frac{1 - p^*}{p^*} = \frac{\sigma_0}{\sigma_1},
\]

and

\[
p^* = \frac{\sigma_1}{\sigma_1 + \sigma_0} = \frac{1}{1 + \sigma_0/\sigma_1}.
\]

A “rule of thumb” for the case σ1 ≈ σ0 is p* = 0.5

For practical reasons it is sometimes better to choose unequal sample sizes (even if σ1 ≈ σ0)
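The optimal share in code (a one-line sketch):

```python
def optimal_treatment_share(sigma1, sigma0):
    """Variance-minimizing proportion treated: p* = sigma1 / (sigma1 + sigma0)."""
    return sigma1 / (sigma1 + sigma0)

print(optimal_treatment_share(1.0, 1.0))   # 0.5: the rule of thumb
print(optimal_treatment_share(2.0, 1.0))   # 2/3: oversample the noisier arm
```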

Experimental design: Power calculations to choose N

Recall that for a statistical test:
Type I error: rejecting the null if the null is true.
Type II error: not rejecting the null if the null is false.

The size of a test is the probability of a type I error, usually 0.05.

The power of a test is one minus the probability of a type II error, i.e., the probability of rejecting the null if the null is false.

Statistical power increases with the sample size. But when is a sample “large enough”?

We want to find N such that we will be able to detect an average treatment effect of size α or larger with high probability.
Experimental design: Power calculations to choose N

Assume a particular value, α, for µ1 − µ0.

Let α̂ = Ȳ1 − Ȳ0 and

\[
\text{s.e.}(\hat{\alpha}) = \sqrt{\frac{\sigma_1^2}{N_1} + \frac{\sigma_0^2}{N_0}}.
\]

For a large enough sample, we can approximate:

\[
\frac{\hat{\alpha} - \alpha}{\text{s.e.}(\hat{\alpha})} \sim N(0, 1).
\]

Therefore, the t-statistic for a test of significance is:

\[
t = \frac{\hat{\alpha}}{\text{s.e.}(\hat{\alpha})} \sim N\!\left( \frac{\alpha}{\text{s.e.}(\hat{\alpha})}, 1 \right).
\]

[Figure: Distribution of t under µ1 − µ0 = 0 (centered at 0) and under µ1 − µ0 = α (centered at α/s.e.(α̂)), with rejection regions beyond ±1.96]

Experimental design: Power calculations to choose N


The probability of rejecting the null µ1 − µ0 = 0 is:

\[
\begin{aligned}
\Pr(|t| > 1.96) &= \Pr(t < -1.96) + \Pr(t > 1.96) \\
&= \Pr\!\left( t - \frac{\alpha}{\text{s.e.}(\hat{\alpha})} < -1.96 - \frac{\alpha}{\text{s.e.}(\hat{\alpha})} \right)
 + \Pr\!\left( t - \frac{\alpha}{\text{s.e.}(\hat{\alpha})} > 1.96 - \frac{\alpha}{\text{s.e.}(\hat{\alpha})} \right) \\
&= \Phi\!\left( -1.96 - \frac{\alpha}{\text{s.e.}(\hat{\alpha})} \right)
 + \left( 1 - \Phi\!\left( 1.96 - \frac{\alpha}{\text{s.e.}(\hat{\alpha})} \right) \right)
\end{aligned}
\]

Suppose that p = 1/2 and σ1² = σ0² = σ². Then,

\[
\text{s.e.}(\hat{\alpha}) = \sqrt{\frac{\sigma^2}{N/2} + \frac{\sigma^2}{N/2}} = \frac{2\sigma}{\sqrt{N}}.
\]
[Figure: Power functions with p = 1/2 and σ1² = σ0², plotted against α/σ, for N = 25 and N = 50]

General formula for the power function (p ≠ 1/2, σ0² ≠ σ1²)

\[
\Pr(\text{reject } \mu_1 - \mu_0 = 0 \mid \mu_1 - \mu_0 = \alpha)
= \Phi\!\left( -1.96 - \alpha \Big/ \sqrt{\frac{\sigma_1^2}{pN} + \frac{\sigma_0^2}{(1-p)N}} \right)
+ \left( 1 - \Phi\!\left( 1.96 - \alpha \Big/ \sqrt{\frac{\sigma_1^2}{pN} + \frac{\sigma_0^2}{(1-p)N}} \right) \right).
\]

To choose N we need to specify:

1 α: minimum detectable magnitude of the treatment effect
2 Power value (usually 0.80 or higher)
3 σ1² and σ0² (usually σ1² = σ0²), e.g., using previous measurements
4 p: proportion of observations in the treatment group (if σ1 = σ0, then power is maximized by p = 0.5)
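A numerical sketch of the power formula (assuming scipy; the simple search over N is an illustration, not the handout's procedure):

```python
import math
from scipy.stats import norm

def power(alpha, N, p=0.5, sigma1=1.0, sigma0=1.0):
    """Probability of rejecting mu1 - mu0 = 0 when the true effect is alpha."""
    se = math.sqrt(sigma1**2 / (p * N) + sigma0**2 / ((1 - p) * N))
    return norm.cdf(-1.96 - alpha / se) + (1.0 - norm.cdf(1.96 - alpha / se))

def required_N(alpha, target=0.80, p=0.5, sigma1=1.0, sigma0=1.0):
    """Smallest N that detects an effect of size alpha with the target power."""
    N = 2
    while power(alpha, N, p, sigma1, sigma0) < target:
        N += 1
    return N

print(required_N(alpha=0.5))   # ≈ 126: effect of 0.5σ, 80% power, p = 1/2
```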
