
Chapter 2: Randomized Experiments: Identification and Inference

MIT 17.802: Quantitative Research Methods II

2023-03-19

Lecture 2: Randomized Experiments - Identification and Basic Inference

BLUF

• Random assignment solves the identification problem for causal inference based on minimal assumptions
that we can control as researchers
• Regression is a useful tool for analyzing experiments: simple regression with robust SE yields unbiased
estimates with conservative confidence intervals.

• Covariate adjustment via regression can improve efficiency, and estimates are usually (but not always)
robust to alternative model specifications.
• Possible tradeoff between internal and external validity

Randomization solves the selection bias problem

Recall that:
τ̃ = E[Y1i |Di = 1] − E[Y0i |Di = 1] + E[Y0i |Di = 1] − E[Y0i |Di = 0]
= E[Y1i − Y0i |Di = 1] + E[Y0i |Di = 1] − E[Y0i |Di = 0]

Random assignment of Di will make the treated and untreated units identical on average such that:

E[Y0i |Di = 1] = E[Y0i |Di = 0]

In other words,

Bias = E[Y0i |Di = 1] − E[Y0i |Di = 0] = 0

Therefore

τ̃ = E[Y1i − Y0i |Di = 1]

Identification vs estimation

Two inferential hurdles of causal inference:

1. Identification: If you can observe data from an entire population, what can you learn about your QoI
(quantity of interest)?
2. Estimation: Given a finite sample, how well can you learn about your QoI?

Golden rule of inference: identification precedes estimation


Recall that in normal linear regression...

Yi = XiT β + ϵi

β is estimable when there is no collinearity among Xi .


β is unbiasedly estimable when the zero conditional mean assumption is met:

E[ϵi |Xi ] = 0

We learned last semester that an unbiased estimator for β is:

β̂ = ( Σ_{i=1}^{n} Xi XiT )^{−1} ( Σ_{i=1}^{n} Xi Yi )
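[My own R sketch of this formula, not from the notes; the simulated data and the design matrix X (with an intercept column) are just for illustration:]

set.seed(1)
n <- 100
X <- cbind(1, rnorm(n), rnorm(n))    # intercept plus two regressors
beta <- c(0.5, 2, -1)
Y <- drop(X %*% beta) + rnorm(n)

# beta-hat = (sum_i Xi Xi')^{-1} (sum_i Xi Yi), written in matrix form
beta_hat <- solve(t(X) %*% X, t(X) %*% Y)

# Matches the coefficients lm() produces for the same no-intercept formula
cbind(beta_hat, coef(lm(Y ~ X - 1)))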

Classical randomized experiment

• Units: i = 1, ..., N
• Treatment Di ∈ {0, 1} is randomly assigned

• Potential outcomes: Y0i , Y1i


• Observed outcomes: Yi = YDi ,i = Di · Y1i + (1 − Di ) · Y0i

Two types of randomization:

1. Complete randomization: Exactly N1 treated units.


2. Simple (Bernoulli) randomization: Each unit independently assigned to treatment with probability p

Randomization implies that [Y1i , Y0i ] ⊥ Di
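[My own R sketch of the two assignment schemes, with illustrative names:]

set.seed(2)
N  <- 20
N1 <- 10    # number treated under complete randomization
p  <- 0.5   # treatment probability under Bernoulli randomization

# Complete randomization: exactly N1 treated units
D_complete <- sample(rep(c(1, 0), c(N1, N - N1)))

# Simple (Bernoulli) randomization: each unit treated independently w.p. p
D_bernoulli <- rbinom(N, size = 1, prob = p)

sum(D_complete)    # always exactly N1
sum(D_bernoulli)   # random; equals N * p only in expectation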

Identification of Average Treatment Effect under randomized experiment

Proof that the ATE is identified:


This is what we want:

τATE = E[Y1i − Y0i ] = (1/N) Σ_{i=1}^{N} (Y1i − Y0i )

Substituting our earlier definition for Yi , we get:

E[Yi |Di = 1] = E[Di · Y1i + (1 − Di )Y0i |Di = 1]

The second term, (1 − Di )Y0i , drops because we know that Di = 1:

= E[Di · Y1i |Di = 1] = E[Y1i |Di = 1]

Then, because of randomization, we can drop the “given Di = 1” conditioning, since Y1i is independent of Di :

= E[Y1i ]

Similarly,

E[Yi |Di = 0] = E[Di · Y1i + (1 − Di ) · Y0i |Di = 0] = E[(1 − Di )Y0i |Di = 0] = E[Y0i |Di = 0] = E[Y0i ]

Plugging that into our ATE formula from above:

τAT E = E[Y1i − Y0i ] = E[Y1i ] − E[Y0i ]

τAT E = E[Yi |Di = 1] − E[Yi |Di = 0]

Random assignment implies [Y1i , Y0i ] ⊥ Di , which in turn implies:

E[Y1i |Di = 0] = E[Y1i |Di = 1]

E[Y0i |Di = 0] = E[Y0i |Di = 1]

Because of this, the observed difference in means identifies the ATT, the ATC, and the ATE (lecture 2, slide 15/49):

τ̃ = E[Yi |Di = 1] − E[Yi |Di = 0] = τAT T = τAT C = τAT E
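[My own simulation of this result, not from the slides: with the full schedule of potential outcomes in hand, the difference in means averaged over repeated random assignments recovers the ATE.]

set.seed(3)
N  <- 1000
Y0 <- rnorm(N)
Y1 <- Y0 + rnorm(N, mean = 2)    # heterogeneous effects, ATE approximately 2
ate <- mean(Y1 - Y0)

diff_means <- replicate(2000, {
  D <- sample(rep(c(1, 0), each = N / 2))   # complete randomization
  mean(Y1[D == 1]) - mean(Y0[D == 0])       # observed difference in means
})

ate                # the target
mean(diff_means)   # close to the target: unbiased over randomizations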

Variation due to random assignment

Observed difference in means:

τ̃ = (1/N1) Σ_{i=1}^{N} Di Yi − (1/N0) Σ_{i=1}^{N} (1 − Di ) Yi

(the sample analog of E[Y1i |Di = 1] − E[Y0i |Di = 0])

Randomization variance (standard error squared) of τ̃ :

• Uncertainty comes from random assignment to treatment; even if we have data on entire population,
τ̃ still has uncertainty due to this
• It’s not clear that the usual t-test is valid for this randomization-based uncertainty

When Di is assigned by complete randomization, we can show that

V(τ̃ ) = (1/N) [ (N0/N1) S1² + (N1/N0) S0² + 2 S01 ]

where S1² is the sample variance of the potential outcome under treatment:


As we can see, it’s basically an average (with the usual N − 1 convention) of the squared deviations of each unit’s potential outcome under treatment from the mean of those potential outcomes.

S1² = (1/(N − 1)) Σ_{i=1}^{N} (Y1i − Ȳ1 )²

S0² is the sample variance of the potential outcome under control:

S0² = (1/(N − 1)) Σ_{i=1}^{N} (Y0i − Ȳ0 )²

S01 is the sample covariance of Y1i and Y0i

S01 = (1/(N − 1)) Σ_{i=1}^{N} (Y1i − Ȳ1 )(Y0i − Ȳ0 )

N0/N1 and N1/N0 are weights, but I’m not sure about the logic behind them.
Unfortunately, V(τ̃ ) is not identified, because S01 is unidentified (since we never observe Y1i and Y0i together).
Therefore, our estimate of the randomization variance of τ̃ is always biased!
What we usually end up using for our “t-test” is a biased (but identified) estimator:

V̂(τ̃ ) = S1²/N1 + S0²/N0
We observe all of the components of this estimator in our data.
Also, we know the direction of the bias: this formula is conservative, meaning it overestimates the variance. The true variance is (weakly) smaller.

V̂(τ̃ ) ≥ V(τ̃ )

Somehow, V̂(τ̃ ) is unbiased for V(τ̃ ) if and only if τi = τATE for all i (constant effect).
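[My own R sketch of the conservative variance estimator V̂(τ̃ ) = S1²/N1 + S0²/N0, computed from one realized complete randomization with simulated data:]

set.seed(5)
N  <- 100
Y0 <- rnorm(N)
Y1 <- Y0 + rnorm(N, mean = 2)
D  <- sample(rep(c(1, 0), each = N / 2))    # complete randomization
Y  <- ifelse(D == 1, Y1, Y0)                # observed outcomes

tau_hat <- mean(Y[D == 1]) - mean(Y[D == 0])
# within-group sample variances plug in for S1^2 and S0^2
v_hat <- var(Y[D == 1]) / sum(D == 1) + var(Y[D == 0]) / sum(D == 0)
c(estimate = tau_hat, conservative_se = sqrt(v_hat))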

Can we use regression to analyze experimental data?
For a binary treatment Di ∈ {0, 1}, we can show that:

• Simple regression coefficient is numerically equal to difference in means:

β̂OLS = [ Σ_{i=1}^{n} (Yi − Ȳ )(Di − D̄) ] / [ Σ_{i=1}^{n} (Di − D̄)² ] = τ̃

• Heteroskedasticity-robust variance (the HC2 variant) is numerically equal to the “t-test” variance
formula

σ̂²HC2 = S1²/N1 + S0²/N0 = V̂(τ̃ )
- Recall that homoskedasticity refers to the assumption that the variance of the errors or residuals in a
statistical model is constant across all levels of the predictor variables. In other words, it means that the
variability of the errors is the same for all observations in the dataset.
Therefore, we can use regression to analyze a randomized experiment:

1. Regress Yi on Di and get the coefficient on Di .


2. Calculate the robust standard error (lm_robust in the estimatr package in R; see the sketch below)
3. Do the t-test, calculate confidence intervals, etc. as usual
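[A minimal sketch of these steps with estimatr's lm_robust; the data frame dat and its variable names are made up for illustration:]

library(estimatr)

# Toy experimental data (illustrative only)
set.seed(6)
n   <- 200
dat <- data.frame(D = rbinom(n, 1, 0.5))
dat$Y <- 1 + 2 * dat$D + rnorm(n)

# Steps 1-2: regress Y on D; lm_robust reports HC2 standard errors by default
fit <- lm_robust(Y ~ D, data = dat)

# Step 3: the coefficient on D is the difference in means; the summary gives
# the robust SE, t-statistic, p-value, and confidence interval
summary(fit)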

Covariate adjustment in randomized experiments


• Randomization makes our sample ATE estimate unbiased, because it balances both unobserved and
observed pre-treatment covariates between the treated and untreated on average.
• However, in small samples, we can get unlucky and suffer from imbalance, which can mean that our
estimate is far from the true ATE.

This only matters if the covariate on which the sample is unbalanced is related to the outcome.
In natural experiments, need to check whether randomization occurred as expected.

• Common practice: Conduct balance checks with respect to observed pre-treatment covariates: 1) compare
means, standard deviations, etc. between treated and untreated, and regress the treatment indicator
on the covariates; 2) visually inspect histograms and density plots.
• Can correct imbalance via: regression, matching, weighting, lasso, etc.
• Post-randomization adjustment can also improve efficiency
• Can create bias due to:

- Model misspecification (but this is usually small)


- Post-hoc analysis (p-hacking)
- Incorrectly adjusting for post-treatment covariates (never do this!!)

• Danny advises to always adjust for predictive covariates, because not doing so leaves precision on the
table.
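[A sketch of the adjustment the last bullet recommends, again with lm_robust; the covariate x and the data are made up for illustration:]

library(estimatr)

# Toy data with a predictive pre-treatment covariate x
set.seed(7)
n   <- 200
dat <- data.frame(x = rnorm(n), D = rbinom(n, 1, 0.5))
dat$Y <- 1 + 2 * dat$D + 3 * dat$x + rnorm(n)

# Unadjusted vs. covariate-adjusted estimates of the same treatment effect
unadjusted <- lm_robust(Y ~ D,     data = dat)
adjusted   <- lm_robust(Y ~ D + x, data = dat)

# Because x is predictive of Y, the coefficient on D is similar in both fits,
# but its standard error is noticeably smaller in the adjusted regression
summary(unadjusted)
summary(adjusted)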

Bias-variance tradeoff in covariate adjustment with regression

Yi = α + βDi + XiT γ + ϵi
How does the bias-variance tradeoff work out in this case?

• When Di is randomized, bias due to model misspecification is likely to be small because of the “partialling
out” formula, which takes us from a regression with many covariates down to a bivariate regression.

β̂*OLS = Cov(Yi , D̃i ) / Var(D̃i )    vs.    β̂OLS = τ̃ = Cov(Yi , Di ) / Var(Di )

In the first equation, D̃i is the residual from a regression of Di on Xi .


[My attempt at this, based on 2/14 lecture 1:15:28, lecture 2 slides 29/39]

Regress Di on Xi : Di = XiT δ + wi . The residual is D̃i = Di − XiT δ̂.

Here, δ̂ should be approximately zero, because this is a randomized experiment, so the residual D̃i is basically
just the (demeaned) treatment Di itself. That means the two formulas above should give nearly the same
answer.

So basically β̂*OLS ≈ β̂OLS here. In other words, what bias exists is very small.
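[My own numerical check of this partialling-out (Frisch-Waugh) logic, with simulated data:]

set.seed(4)
n <- 500
X <- rnorm(n)
D <- rbinom(n, 1, 0.5)                  # randomized, so D is unrelated to X
Y <- 1 + 2 * D + 0.5 * X + rnorm(n)

# Coefficient on D from the covariate-adjusted regression
b_adjusted <- coef(lm(Y ~ D + X))["D"]

# The same number via partialling out: residualize D on X, regress Y on that
D_tilde   <- resid(lm(D ~ X))
b_partial <- coef(lm(Y ~ D_tilde))["D_tilde"]

# Because D is randomized, both are also close to the raw difference in means
c(b_adjusted, b_partial, mean(Y[D == 1]) - mean(Y[D == 0]))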

• Recent formal and simulation work suggests that the efficiency gains of this adjustment swamp the bias,
since the bias disappears quickly with sample size.
• Also, covariate adjustment using regression still allows for consistent estimation of the ATE even if the
model is mis-specified.
• How does this impact the variance of the regression-adjusted estimator, V(β̂)?
Note that here we will consider V(β̂) under the condition of homoskedasticity.

– With no covariates and under homoskedasticity:

V(β̂) = σ²Y|D / Σ_{i=1}^{n} (Di − D̄)²

Numerator: the variance of Y conditional on the treatment variable only (what remains after you’ve “partialled
out” the treatment), not yet conditioning on any covariates.
Denominator: the total variation in Di (the sum of squared deviations).

– When we add covariates, σ²Y|D , the variance of Y conditional on treatment alone, gets replaced with the
variance of Y conditional on both treatment and covariates: σ²Y|D,X .

V(Ydi |Xi = x) = σ²Y|D,X

for d = 0, 1 and all values of x

– If X is predictive of Y, then σ²Y|D,X < σ²Y|D , and thus the adjusted estimator will be more precise.

• Example from slides 32-38/49: the example is imbalanced, because treated units tend to have much lower
values of [pre-treatment covariate] Xi than control units. Danny showed that we fit a line through the
treated units and a line through the control units, and then calculate what the ATE would be if Xi = 0.

– This is what regression adjustment does in general.
– The linear model in this example is wrong, but if you have a big enough sample size, the linear
model isn’t doing much work, so it won’t matter much.
– We see (slide 38) that the covariate adjustment makes the standard error much smaller, even though
the linearity assumption is wrong here.

• How to check balance: regress the treatment variable on the covariates, and then do an F-test of the
hypothesis that all the covariate coefficients are zero (see the sketch below).
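[A sketch of that balance check with made-up covariate names; the overall F-statistic reported by summary() is exactly the test that all covariate coefficients are zero:]

# Toy data: three pre-treatment covariates and a randomized treatment
set.seed(8)
n   <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n),
                  D  = rbinom(n, 1, 0.5))

# Regress the treatment indicator on the covariates and look at the F-test
balance_fit <- lm(D ~ x1 + x2 + x3, data = dat)
summary(balance_fit)   # a small F-test p-value would suggest imbalance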

Threats to validity

• Internal validity - can we estimate the treatment effect for our particular sample? Fails when there are
differences between treatment and control groups, other than the treatment itself, that affect the outcome
and that we cannot control for.
Examples

– Failure of randomization (e.g., by implementing partners)


∗ When randomization is not implemented by the researcher, balance checks are CRUCIAL.
– Noncompliance with experimental protocol
– Differential attrition - the most common threat to validity (e.g., control group subjects are more likely to
drop out of the study than treatment group subjects)

• External validity - can we extrapolate our estimates to other populations? Fails when outside the
experimental environment, the treatment has a different effect.
Two key concepts:

1. Transportability: Will the treatment also be effective in a different location or time?


2. Non-stability: Will the treatment also be effective when not randomly assigned but taken by
self-selection?

Most common threats to external validity

– Non-representative sample
– Non-representative treatment

Tradeoffs in internal and external validity

• Randomization ensures internal validity, but often requires us to have a convenience sample

• Observational studies are not necessarily better in external validity than randomized experiments
• Internally valid causal estimates are of limited value without external validity
