Chapter 2: Randomized Experiments: Identification and Inference
2023-03-19
BLUF
• Random assignment solves the identification problem for causal inference based on minimal assumptions that we can control as researchers.
• Regression is a useful tool for analyzing experiments: simple regression with robust SEs yields unbiased estimates with conservative confidence intervals.
• Covariate adjustment via regression can improve efficiency, and estimates are usually (but not always) robust to alternative model specifications.
• Possible trade-off between internal and external validity.
Recall that:
$$
\begin{aligned}
\tilde{\tau} &= E[Y_{1i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 1] + E[Y_{0i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 0] \\
&= E[Y_{1i} - Y_{0i} \mid D_i = 1] + E[Y_{0i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 0]
\end{aligned}
$$
Random assignment of Di will make the treated and untreated units identical on average, such that:

$$
(Y_{1i}, Y_{0i}) \perp\!\!\!\perp D_i
$$

In other words,

$$
E[Y_{0i} \mid D_i = 1] = E[Y_{0i} \mid D_i = 0] \quad \text{and} \quad E[Y_{1i} \mid D_i = 1] = E[Y_{1i} \mid D_i = 0],
$$

so the selection-bias term drops out. Therefore,

$$
\tilde{\tau} = E[Y_{1i} - Y_{0i}] = \tau_{ATE}.
$$
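To make this concrete, here is a minimal simulation sketch (the data-generating process and all numbers are made up for illustration): for a fixed finite population of potential outcomes, repeatedly re-randomizing Di and taking the difference in means recovers the ATE on average.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed finite population of potential outcomes (hypothetical DGP).
N = 1000
Y0 = rng.normal(0, 1, N)          # outcome under control
Y1 = Y0 + rng.normal(2, 1, N)     # outcome under treatment (heterogeneous effects)
tau_ate = np.mean(Y1 - Y0)        # true ATE in this population

# Repeatedly re-randomize treatment and compute the difference in means.
estimates = []
for _ in range(2000):
    D = rng.permutation(np.repeat([1, 0], N // 2))  # complete randomization
    Y = D * Y1 + (1 - D) * Y0                       # observed outcomes
    estimates.append(Y[D == 1].mean() - Y[D == 0].mean())

print(f"true ATE: {tau_ate:.3f}, mean of estimates: {np.mean(estimates):.3f}")
```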
Identification vs estimation
1. Identification: if you can observe data from the entire population, can you learn about your QoI (quantity of interest)?
2. Estimation: given a finite sample, how well can you learn about your QoI?
$$
Y_i = X_i^\top \beta + \epsilon_i, \qquad E[\epsilon_i \mid X_i] = 0
$$

$$
\hat{\beta} = \left( \sum_{i=1}^{n} X_i X_i^\top \right)^{-1} \left( \sum_{i=1}^{n} X_i Y_i \right)
$$
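As a worked example of the OLS formula above, here is a minimal numpy sketch (simulated data; the coefficient values are hypothetical) that computes $\hat{\beta}$ directly from the sums and checks it against numpy's least-squares solver.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated design matrix (with intercept) and outcome.
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta_true = np.array([1.0, 2.0, -0.5])
Y = X @ beta_true + rng.normal(size=n)

# beta_hat = (sum_i X_i X_i')^{-1} (sum_i X_i Y_i), i.e. (X'X)^{-1} X'Y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Cross-check with numpy's least-squares routine.
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(beta_hat, beta_lstsq)
```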
• Units: i = 1, ..., N
• Treatment Di ∈ {0, 1} is randomly assigned
Identification of the Average Treatment Effect under a randomized experiment
$$
\tau_{ATE} = E[Y_{1i} - Y_{0i}] = \frac{1}{N} \sum_{i=1}^{N} (Y_{1i} - Y_{0i})
$$
Writing the observed outcome as Yi = Di · Y1i + (1 − Di) · Y0i, we have

$$
E[Y_i \mid D_i = 1] = E[D_i \cdot Y_{1i} + (1 - D_i) \cdot Y_{0i} \mid D_i = 1] = E[Y_{1i} \mid D_i = 1].
$$

The second half of the term, (1 − Di)Y0i, drops because we know that Di = 1. Then, because of randomization, we can get rid of the "given Di" bit, since that doesn't change anything:

$$
= E[Y_{1i}]
$$
Similarly,
$$
E[Y_i \mid D_i = 0] = E[D_i \cdot Y_{1i} + (1 - D_i) \cdot Y_{0i} \mid D_i = 0] = E[(1 - D_i) Y_{0i} \mid D_i = 0] = E[Y_{0i} \mid D_i = 0] = E[Y_{0i}]
$$
Variation due to random assignment
$$
\tilde{\tau} = E[Y_{1i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 0] = \frac{1}{N_1} \sum_{i=1}^{N} D_i Y_i - \frac{1}{N_0} \sum_{i=1}^{N} (1 - D_i) Y_i
$$
• Uncertainty comes from random assignment to treatment; even if we have data on the entire population, τ̃ still has uncertainty from this source.
• It is not obvious that the usual t-test is valid for uncertainty arising from randomization.
$$
V(\tilde{\tau}) = \frac{1}{N} \left( \frac{N_0}{N_1} S_1^2 + \frac{N_1}{N_0} S_0^2 + 2 S_{01} \right)
$$
$$
S_1^2 = \frac{1}{N - 1} \sum_{i=1}^{N} (Y_{1i} - \bar{Y}_1)^2
$$
$$
S_0^2 = \frac{1}{N - 1} \sum_{i=1}^{N} (Y_{0i} - \bar{Y}_0)^2
$$
$$
S_{01} = \frac{1}{N - 1} \sum_{i=1}^{N} (Y_{1i} - \bar{Y}_1)(Y_{0i} - \bar{Y}_0)
$$
N0/N1 and N1/N0 are weights, but I'm not sure about the logic behind them.
Unfortunately, V(τ̃) is not identified, because S01 is unidentified (we never observe Y1i and Y0i together). Therefore, our estimate of the randomization variance of τ̃ is always biased!
What we usually end up using for our “t-test” is a biased (but identified) estimator.
$$
\hat{V}(\tilde{\tau}) = \frac{S_1^2}{N_1} + \frac{S_0^2}{N_0}
$$
We observe all of the components of this estimator in our data.
Also, we know the direction of the bias. This formula is conservative, meaning it overestimates the variance; the true variance is (weakly) smaller:

$$
\hat{V}(\tilde{\tau}) \geq V(\tilde{\tau})
$$

Equality, $\hat{V}(\tilde{\tau}) = V(\tilde{\tau})$, holds if and only if τi = τATE for all i (constant treatment effect).
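A minimal sketch (with simulated data, so the numbers are purely illustrative) of computing τ̃ and the conservative variance estimate from one realized experiment:

```python
import numpy as np

rng = np.random.default_rng(2)

# One realized experiment (hypothetical data).
N = 500
D = rng.permutation(np.repeat([1, 0], N // 2))
Y = 2.0 * D + rng.normal(0, 1, N)

Y1_obs, Y0_obs = Y[D == 1], Y[D == 0]
N1, N0 = len(Y1_obs), len(Y0_obs)

# Difference-in-means estimate of the ATE.
tau_hat = Y1_obs.mean() - Y0_obs.mean()

# Conservative (Neyman) variance estimate from within-group sample variances.
var_hat = Y1_obs.var(ddof=1) / N1 + Y0_obs.var(ddof=1) / N0
se_hat = np.sqrt(var_hat)

print(f"tau_hat = {tau_hat:.3f}, conservative SE = {se_hat:.3f}")
```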
Can we use regression to analyze experimental data?
For a binary treatment Di ∈ {0, 1}, we can show that:
$$
\hat{\beta}_{OLS} = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})(D_i - \bar{D})}{\sum_{i=1}^{n} (D_i - \bar{D})^2} = \tilde{\tau}
$$
• The heteroskedasticity-robust variance (the HC2 variant) is numerically equal to the "t-test" variance formula (see the sketch below):

$$
\hat{\sigma}^2_{HC2} = \frac{S_1^2}{N_1} + \frac{S_0^2}{N_0} = \hat{V}(\tilde{\tau})
$$
– Recall that homoskedasticity refers to the assumption that the variance of the errors or residuals in a statistical model is constant across all levels of the predictor variables. In other words, the variability of the errors is the same for all observations in the dataset.
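A minimal sketch (simulated data; the DGP is made up) checking this equivalence with statsmodels: the HC2 standard error from the bivariate regression matches the Neyman formula computed from within-group sample variances.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)

# Simulated experiment with deliberately heteroskedastic outcomes.
N = 400
D = rng.permutation(np.repeat([1, 0], N // 2))
Y = 1.5 * D + rng.normal(0, 1 + D, N)

# Bivariate OLS of Y on D with HC2 robust standard errors.
fit = sm.OLS(Y, sm.add_constant(D)).fit(cov_type="HC2")

# Neyman / "t-test" variance estimate from within-group sample variances.
s1 = Y[D == 1].var(ddof=1)
s0 = Y[D == 0].var(ddof=1)
se_neyman = np.sqrt(s1 / (D == 1).sum() + s0 / (D == 0).sum())

print(f"HC2 SE on D: {fit.bse[1]:.5f}, Neyman SE: {se_neyman:.5f}")
```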
Therefore, we can use regression to analyze a randomized experiment: the coefficient on Di reproduces the difference-in-means estimate, and the HC2 robust standard error reproduces the conservative variance estimate.

Even with randomization, a given sample can be imbalanced on pre-treatment covariates by chance. This only matters if the covariate on which the sample is unbalanced is related to the outcome.
In natural experiments, we need to check whether randomization occurred as expected.
• Common practice: conduct balance checks with respect to observed pre-treatment covariates: 1) compare means, standard deviations, etc. between treated and untreated units, and regress the treatment indicator on the covariates; 2) visually inspect histograms and density plots.
• Can correct imbalance via regression, matching, weighting, lasso, etc.
• Post-randomization adjustment can also improve efficiency.
• Adjustment can create (finite-sample) bias due to model misspecification, though this bias shrinks quickly with sample size (see below).
• Danny advises to always adjust for predictive covariates, because not doing so leaves precision on the
table.
Bias-variance tradeoff in covariate adjustment with regression
$$
Y_i = \alpha + \beta D_i + X_i^\top \gamma + \epsilon_i
$$
How does the bias-variance tradeoff work out in this case?
• When Di is randomized, bias due to model misspecification is likely to be small because of the "partialling out" formula, which takes us from a regression with many covariates down to a bivariate regression.
The partialling-out step regresses the treatment on the covariates,

$$
D_i = X_i^\top \delta + w_i,
$$

and then regresses Yi on the residual D̃i. Here, δ should equal zero on average, because this is a randomized experiment, so the residual D̃i is basically just Di minus its mean. That means regressing Yi on D̃i is essentially the same bivariate regression as regressing Yi on Di.
So basically $\hat{\beta}^{*}_{OLS} \approx \hat{\beta}_{OLS}$ here: the covariate-adjusted coefficient on Di is nearly identical to the unadjusted one. In other words, what bias exists is very small.
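A minimal sketch of this partialling-out (Frisch-Waugh-Lovell) logic, using simulated data (the DGP is made up): residualize Di on Xi, regress Yi on the residual, and compare with the coefficient on Di from the full covariate-adjusted regression.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)

# Simulated experiment with a pre-treatment covariate (hypothetical DGP).
N = 500
X = rng.normal(size=N)
D = rng.permutation(np.repeat([1, 0], N // 2))   # randomized, so independent of X
Y = 1.0 + 2.0 * D + 1.5 * X + rng.normal(size=N)

# Step 1: regress D on X; since D is randomized, delta_hat is near zero
# and the residual D_tilde is essentially D minus its mean.
aux = sm.OLS(D, sm.add_constant(X)).fit()
D_tilde = aux.resid

# Step 2: regress Y on the residualized treatment (partialling out).
fwl = sm.OLS(Y, sm.add_constant(D_tilde)).fit()

# Full covariate-adjusted regression for comparison.
full = sm.OLS(Y, sm.add_constant(np.column_stack([D, X]))).fit()

print(f"delta_hat: {aux.params[1]:.3f}")
print(f"FWL beta: {fwl.params[1]:.3f}, full-regression beta: {full.params[1]:.3f}")
```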
• Recent formal and simulation work suggests that the efficiency gains of this adjustment swamp the bias, since the bias disappears quickly with sample size.
• Also, covariate adjustment using regression still allows for consistent estimation of the ATE even if the model is mis-specified.
• How does this impact the variance of the regression-adjusted estimator, $V(\hat{\beta})$?

Note that here we will consider $V(\hat{\beta})$ under the condition of homoskedasticity.
$$
V(\hat{\beta}) = \frac{\sigma^2_{Y \mid D}}{\sum_{i=1}^{n} (D_i - \bar{D})^2}
$$
Numerator: the variance of Y conditional on the treatment only, i.e., what remains after you've "partialled out" the treatment variable.

Denominator: the variation in Di.
– When we add covariates, $\sigma^2_{Y \mid D}$, the variance of Y given the treatment alone, gets replaced with the variance of Y conditional on both the treatment and the covariates, $\sigma^2_{Y \mid D, X}$.
– If X is predictive of Y, then $\sigma^2_{Y \mid D, X} < \sigma^2_{Y \mid D}$, and thus the adjusted estimator will be more precise (see the sketch below).
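A minimal simulation sketch of this precision gain (hypothetical DGP with a strongly predictive pre-treatment covariate): the adjusted and unadjusted point estimates are similar, but the adjusted standard error is smaller.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)

# Hypothetical experiment with a strongly predictive pre-treatment covariate.
N = 500
X = rng.normal(size=N)
D = rng.permutation(np.repeat([1, 0], N // 2))
Y = 2.0 * D + 3.0 * X + rng.normal(size=N)

# Unadjusted: regress Y on D only.
unadj = sm.OLS(Y, sm.add_constant(D)).fit(cov_type="HC2")

# Adjusted: regress Y on D and X.
adj = sm.OLS(Y, sm.add_constant(np.column_stack([D, X]))).fit(cov_type="HC2")

print(f"unadjusted: {unadj.params[1]:.3f} (SE {unadj.bse[1]:.3f})")
print(f"adjusted:   {adj.params[1]:.3f} (SE {adj.bse[1]:.3f})")
```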
• Example from slides 32-38/49: the example is imbalanced, because treated units tend to have much lower values of the pre-treatment covariate Xi than control units. Danny showed that we fit a line through the treated units and a line through the control units, and then calculate what the ATE would be if Xi = 0.
– This is what regression adjustment does in general.
– The linear model in this example is wrong, but if you have a big enough sample size, the linear
model isn’t doing much work, so it won’t matter much.
– We see (slide 38) that the covariate adjustment makes the standard error much smaller.
– Here, the linearity assumption is wrong though.
• How to check balance: regress the treatment variable on the covariates, and then do an F-test of the hypothesis that the covariate coefficients are all zero (see the sketch below).
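A minimal sketch of this balance check (simulated data, statsmodels): regress the treatment indicator on the covariates and read off the overall F-test that all covariate coefficients are zero.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)

# Simulated pre-treatment covariates and a randomized treatment.
N = 500
X = rng.normal(size=(N, 3))
D = rng.permutation(np.repeat([1, 0], N // 2))

# Regress D on the covariates; the overall F-test has the null that
# all covariate coefficients are zero (covariates do not predict D).
balance = sm.OLS(D, sm.add_constant(X)).fit()
print(f"F = {balance.fvalue:.2f}, p-value = {balance.f_pvalue:.3f}")
```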
Threats to validity
• Internal validity - can we estimate the treatment effect for our particular sample? Fails when there are differences between treatment and control groups, other than the treatment itself, that affect the outcome and that we cannot control for.
Examples
• External validity - can we extrapolate our estimates to other populations? Fails when outside the
experimental environment, the treatment has a different effect.
Two key concepts:
– Non-representative sample
– Non-representative treatment
• Randomization ensures internal validity, but often requires us to have a convenience sample
• Observational studies are not necessarily better in external validity than randomized experiments
• Internally valid causal estimates are of limited value without external validity