Slides 1
Outline
Lecture 1
1. Introduction
Long literature, in both statistics and economics. Influential economics/econometrics papers include Ashenfelter and Card (1985), Barnow, Cain and Goldberger (1980), Card and Sullivan (1988), Dehejia and Wahba (1999), Hahn (1998), Heckman and Hotz (1989), Heckman and Robb (1985), Lalonde (1986). In the statistics literature, work by Rubin (1974, 1978) and Rosenbaum and Rubin (1983).

Once overlap issues are addressed, the choice of estimator is less important. Estimators combining matching and regression, or weighting and regression, are recommended for robustness reasons.

Key role for analysis of the joint distribution of the treatment indicator and covariates prior to using outcome data.
2. Potential Outcomes (Rubin, 1974)
The first is a statistical, data-descriptive motivation. A natural starting point in the evaluation of any program is a comparison of average outcomes for treated and control units.

A logical next step is to adjust any difference in average outcomes for differences in exogenous background characteristics (exogenous in the sense of not being affected by the treatment).

Such an analysis may not lead to the final word on the efficacy of the treatment, but the absence of such an analysis would seem difficult to rationalize in a serious attempt to understand the evidence regarding the effect of the treatment.

A second argument is that almost any evaluation of a treatment involves comparisons of units who received the treatment with units who did not. The question is typically not whether such a comparison should be made, but rather which units should be compared, that is, which units best represent the treated units had they not been treated.

It is clear that settings where some of the necessary covariates are not observed will require strong assumptions to allow for identification, e.g., instrumental variables settings. Absent those assumptions, typically only bounds can be identified (e.g., Manski, 1990, 1995).
Suppose that profits are output minus costs, so that

    Wi = arg max_w E[πi(w) | ci] = arg max_w E[g(w, εi) − ci · w | ci],

implying

    Wi = 1{ E[g(1, εi) − g(0, εi) | ci] ≥ ci } = h(ci).

To make this feasible, one needs to be able to estimate the expectations E[Yi | Xi = x, Wi = w] for all values of w and x in the support of these variables. This is where overlap is important.

Given identification of τ(x),

    τP = E[τ(Xi)].

Second assumption on the joint distribution of treatments and covariates:

    0 < Pr(Wi = 1 | Xi) < 1.

The reason is that the weaker assumption is intrinsically tied to functional form assumptions, and as a result one cannot identify average effects on transformations of the original outcome (e.g., logarithms) without the strong assumption.

If we are interested in τP,T it is sufficient to assume

    Yi(0) ⊥⊥ Wi | Xi.
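A minimal simulation sketch of this identification argument (numpy only; the linear data-generating process and the effect τ(x) = 1 + 0.5x are hypothetical): estimate E[Yi | Xi = x, Wi = w] separately by arm, form τ̂(x), and average over the empirical distribution of Xi.

    import numpy as np

    rng = np.random.default_rng(0)
    N = 10_000
    X = rng.normal(size=N)                   # observed covariate
    e = 1 / (1 + np.exp(-X))                 # true propensity score, inside (0, 1)
    W = rng.binomial(1, e)                   # treatment, unconfounded given X
    Y = X + W * (1 + 0.5 * X) + rng.normal(size=N)   # tau(x) = 1 + 0.5x, tau_P = 1

    # Estimate mu_w(x) = E[Y | X = x, W = w] with a linear fit in each arm.
    b1 = np.polyfit(X[W == 1], Y[W == 1], 1)
    b0 = np.polyfit(X[W == 0], Y[W == 0], 1)
    tau_x = np.polyval(b1, X) - np.polyval(b0, X)    # tau_hat(x) at each Xi
    print(tau_x.mean())                              # approx tau_P = E[tau(Xi)] = 1

Overlap matters here: with e(x) bounded away from 0 and 1, both arms contain units at every x, so both regressions are estimated on the full support rather than by extrapolation.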
Propensity Score

Result 1 Suppose that Assumption 1 holds. Then:

    (Yi(0), Yi(1)) ⊥⊥ Wi | e(Xi).

Only need to condition on a scalar function of the covariates, which would be much easier in practice if Xi is high-dimensional.

(Problem is that the propensity score e(x) is almost never known.)

Efficiency Bound

Hahn (1998): for any regular estimator for τP, denoted by τ̂, with

    √N · (τ̂ − τP) →d N(0, V),

the variance must satisfy:

    V ≥ E[ σ1²(Xi)/e(Xi) + σ0²(Xi)/(1 − e(Xi)) + (τ(Xi) − τP)² ].   (1)

Estimators exist that achieve this bound.
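A plug-in sketch of the bound in (1) on simulated data (hypothetical design; logistic regression stands in for a nonparametric propensity estimator): estimate e(x), the arm-specific variances, and τ(x), then average the three terms.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)
    N = 5_000
    x = rng.normal(size=N)
    W = rng.binomial(1, 1 / (1 + np.exp(-x)))        # e(x) logistic in x
    Y = x + W + rng.normal(size=N)                   # constant effect, sigma_w^2 = 1

    e_hat = LogisticRegression().fit(x[:, None], W).predict_proba(x[:, None])[:, 1]

    # Linear fits per arm give mu_w(x); residual variances estimate sigma_w^2
    # (taken constant in x here, purely for illustration).
    b1 = np.polyfit(x[W == 1], Y[W == 1], 1)
    b0 = np.polyfit(x[W == 0], Y[W == 0], 1)
    s1 = np.var(Y[W == 1] - np.polyval(b1, x[W == 1]))
    s0 = np.var(Y[W == 0] - np.polyval(b0, x[W == 0]))
    tau_x = np.polyval(b1, x) - np.polyval(b0, x)

    V = np.mean(s1 / e_hat + s0 / (1 - e_hat) + (tau_x - tau_x.mean()) ** 2)
    print(V)    # estimated lower bound on the variance of sqrt(N)*(tau_hat - tau_P)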
A. Regression Estimators
Regression estimators can be sensitive to differences in the covariate distributions for treated and control units. The reason is that in that case the regression estimators rely heavily on extrapolation.

Note that μ0(x) is used to predict the missing outcomes for the treated. Hence on average one wishes to predict the control outcome at X̄T = Σ_i Wi · Xi / NT, the average covariate value for the treated. With a linear regression function, the average prediction can be written as

    ȲC + β̂(X̄T − X̄C).

If X̄T and X̄C are close, the precise specification of the regression function will not matter much for the average prediction. With the two averages very different, the prediction based on a linear regression function can be sensitive to changes in the specification, as the sketch below illustrates.
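A small sketch of the extrapolation problem (simulated data; the nonlinear control regression μ0(x) = exp(0.8x) is hypothetical): with X̄T far from X̄C, a linear and a quadratic fit on the controls agree near X̄C but give very different average predictions at X̄T.

    import numpy as np

    rng = np.random.default_rng(2)
    Xc = rng.normal(0.0, 1.0, size=2_000)        # controls centered at 0
    Xt = rng.normal(2.5, 1.0, size=2_000)        # treated centered far away
    Yc = np.exp(0.8 * Xc) + rng.normal(size=Xc.size)   # nonlinear mu_0(x)

    for deg in (1, 2):                           # linear vs quadratic fit on controls
        b = np.polyfit(Xc, Yc, deg)
        print(deg,
              round(np.polyval(b, Xc.mean()), 2),    # prediction at X_bar_C: similar
              round(np.polyval(b, Xt.mean()), 2))    # prediction at X_bar_T: far apart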
B. Matching
Let jm(i) be the mth closest match to unit i among the units with the opposite treatment, that is, the index l with Wl ≠ Wi that satisfies

    Σ_{j : Wj ≠ Wi} 1{ ‖Xj − Xi‖ ≤ ‖Xl − Xi‖ } = m,

and let JM(i) = {j1(i), ..., jM(i)} collect the first M matches. Then

    Ŷi(0) = Yi if Wi = 0,   Ŷi(0) = (1/M) Σ_{j ∈ JM(i)} Yj if Wi = 1,
    Ŷi(1) = (1/M) Σ_{j ∈ JM(i)} Yj if Wi = 0,   Ŷi(1) = Yi if Wi = 1.

The simple matching estimator is

    τ̂M^sm = (1/N) Σ_{i=1}^N ( Ŷi(1) − Ŷi(0) ).   (2)

Issues with Matching

Bias is of order O(N^{−1/K}), where K is the dimension of the covariates. It matters in large samples if K ≥ 2 (and dominates the variance asymptotically if K ≥ 3).

Not efficient (but the efficiency loss is small).

Easy to implement, robust.
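A compact numpy sketch of the simple matching estimator in (2) (simulated data; brute-force nearest neighbors, so O(N²), fine for illustration):

    import numpy as np

    def matching_estimator(X, W, Y, M=1):
        """Simple M-nearest-neighbor matching estimator, eq. (2)."""
        N = len(Y)
        Y1, Y0 = Y.astype(float).copy(), Y.astype(float).copy()
        for i in range(N):
            opp = np.flatnonzero(W != W[i])              # opposite treatment arm
            d = np.linalg.norm(X[opp] - X[i], axis=1)    # distances to unit i
            J = opp[np.argsort(d)[:M]]                   # the M closest matches
            if W[i] == 1:
                Y0[i] = Y[J].mean()                      # impute missing Y_i(0)
            else:
                Y1[i] = Y[J].mean()                      # impute missing Y_i(1)
        return (Y1 - Y0).mean()

    rng = np.random.default_rng(3)
    N = 1_000
    X = rng.normal(size=(N, 2))                          # K = 2 covariates
    W = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))
    Y = X.sum(axis=1) + 2.0 * W + rng.normal(size=N)
    print(matching_estimator(X, W, Y, M=1))              # approx 2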
implying

    τP = E[ W · Y / e(X) − (1 − W) · Y / (1 − e(X)) ].

With the propensity score known one can directly implement this estimator as

    τ̃ = (1/N) Σ_{i=1}^N [ Wi · Yi / e(Xi) − (1 − Wi) · Yi / (1 − e(Xi)) ].   (3)

With an estimated propensity score ê(x), one can normalize the weights to sum to one within each arm; for the control average, for example,

    [ Σ_{i=1}^N (1 − Wi) · Yi / (1 − ê(Xi)) ] / [ Σ_{i=1}^N (1 − Wi) / (1 − ê(Xi)) ].

Is efficient given a nonparametric estimator for e(x).

Potentially sensitive to the estimator for the propensity score.
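A sketch of the weighting estimator in (3) and its normalized counterpart (simulated data; logistic regression as a stand-in for the nonparametric propensity estimator the efficiency result requires):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(4)
    N = 5_000
    x = rng.normal(size=N)
    W = rng.binomial(1, 1 / (1 + np.exp(-x)))
    Y = x + 2.0 * W + rng.normal(size=N)               # true effect 2

    e = LogisticRegression().fit(x[:, None], W).predict_proba(x[:, None])[:, 1]

    # Horvitz-Thompson form, eq. (3)
    tau_ht = np.mean(W * Y / e - (1 - W) * Y / (1 - e))

    # Normalized form: weights forced to sum to one within each arm
    mu1 = np.sum(W * Y / e) / np.sum(W / e)
    mu0 = np.sum((1 - W) * Y / (1 - e)) / np.sum((1 - W) / (1 - e))
    print(tau_ht, mu1 - mu0)                           # both approx 2

Values of ê(Xi) near 0 or 1 blow up individual weights, which is the sense in which the estimator is potentially sensitive to the propensity-score estimate.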
… using the regression estimate for β. Can eliminate the bias of the matching estimator given a flexible specification of the regression function.

D.1 Mixed Estimators: Weighting and Regression

(Does not work for the matching estimator.)
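A sketch of one standard way to combine weighting and regression (the "doubly robust" form; this specific estimator is an illustration, not necessarily the slide's exact proposal): regression predictions plus propensity-weighted residual corrections, consistent if either the propensity score or the regression function is correctly specified.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(5)
    N = 5_000
    x = rng.normal(size=N)
    W = rng.binomial(1, 1 / (1 + np.exp(-x)))
    Y = x + 2.0 * W + rng.normal(size=N)

    e = LogisticRegression().fit(x[:, None], W).predict_proba(x[:, None])[:, 1]
    b1 = np.polyfit(x[W == 1], Y[W == 1], 1)       # mu_1(x) fit on treated
    b0 = np.polyfit(x[W == 0], Y[W == 0], 1)       # mu_0(x) fit on controls
    m1, m0 = np.polyval(b1, x), np.polyval(b0, x)

    tau_dr = np.mean(m1 - m0
                     + W * (Y - m1) / e            # reweighted treated residuals
                     - (1 - W) * (Y - m0) / (1 - e))
    print(tau_dr)                                  # approx 2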
Estimation of the Variance
For estimators that are linear in the outcomes,

    τ̂ = Σ_{i=1}^N λi(X, W) · Yi,

the conditional variance is

    V(τ̂ | X, W) = Σ_{i=1}^N λi(X, W)² · σ²_{Wi}(Xi).

Estimate σ²_{Wi}(Xi) by matching unit i to the closest unit v(i) with the same treatment:

    σ̂²_{Wi}(Xi) = ( Yi − Y_{v(i)} )² / 2.
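A sketch of this variance estimator applied to the weighting estimator (3) (continuing the simulated designs above; v(i) taken as the nearest same-treatment neighbor in X):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(6)
    N = 2_000
    x = rng.normal(size=N)
    W = rng.binomial(1, 1 / (1 + np.exp(-x)))
    Y = x + 2.0 * W + rng.normal(size=N)
    e = LogisticRegression().fit(x[:, None], W).predict_proba(x[:, None])[:, 1]

    lam = (W / e - (1 - W) / (1 - e)) / N          # tau_hat = sum_i lam_i * Y_i
    tau_hat = np.sum(lam * Y)

    # sigma^2_{W_i}(X_i) from the closest unit v(i) with the same treatment
    sig2 = np.empty(N)
    idx = np.arange(N)
    for i in range(N):
        same = np.flatnonzero((W == W[i]) & (idx != i))
        v = same[np.argmin(np.abs(x[same] - x[i]))]
        sig2[i] = (Y[i] - Y[v]) ** 2 / 2

    se = np.sqrt(np.sum(lam ** 2 * sig2))          # conditional std. error of tau_hat
    print(tau_hat, se)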