
"A Course in Applied Econometrics"

Lecture 1
Estimation of Average Treatment Effects Under Unconfoundedness, Part I

Guido Imbens
IRP Lectures, UW Madison, August 2008

Outline
1. Introduction
2. Potential Outcomes
3. Estimands and Identification
4. Estimation and Inference

1. Introduction

We are interested in estimating the average effect of a program or treatment, allowing for heterogeneous effects, assuming that selection can be taken care of by adjusting for differences in observed covariates.

This setting is of great applied interest.

Long literature, in both statistics and economics. Influential economics/econometrics papers include Ashenfelter and Card (1985), Barnow, Cain and Goldberger (1980), Card and Sullivan (1988), Dehejia and Wahba (1999), Hahn (1998), Heckman and Hotz (1989), Heckman and Robb (1985), Lalonde (1986). In the statistics literature, work by Rubin (1974, 1978) and Rosenbaum and Rubin (1983).

Unusual case with many proposed (semi-parametric) estimators (matching, regression, propensity score, or combinations), many of which are actually used in practice.

We discuss implementation, and assessment of the critical assumptions (even if they are not testable).

In practice, concern with overlap in covariate distributions tends to be important.

Once overlap issues are addressed, the choice of estimator is less important. Estimators combining matching and regression, or weighting and regression, are recommended for robustness reasons.

Key role for analysis of the joint distribution of treatment indicator and covariates prior to using outcome data.
2. Potential Outcomes (Rubin, 1974)

We observe N units, indexed by i = 1, ..., N, viewed as drawn randomly from a large population.

We postulate the existence for each unit of a pair of potential outcomes:
Yi(0) for the outcome under the control treatment, and
Yi(1) for the outcome under the active treatment.
Yi(1) − Yi(0) is the unit-level causal effect.
Covariates Xi (not affected by the treatment).

Each unit is exposed to a single treatment; Wi = 0 if unit i receives the control treatment and Wi = 1 if unit i receives the active treatment. We observe for each unit the triple (Wi, Yi, Xi), where Yi is the realized outcome:

Yi ≡ Yi(Wi) = Yi(0) if Wi = 0, and Yi(1) if Wi = 1.

Several additional pieces of notation.

First, the propensity score (Rosenbaum and Rubin, 1983) is defined as the conditional probability of receiving the treatment,

e(x) = Pr(Wi = 1 | Xi = x) = E[Wi | Xi = x].

Also the two conditional regression and variance functions:

μw(x) = E[Yi(w) | Xi = x],   σw²(x) = V(Yi(w) | Xi = x).
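For concreteness, a minimal numpy sketch of this setup; the data are simulated, and all names, functional forms, and parameter values are illustrative rather than taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1_000

# Single covariate X_i and potential outcomes (illustrative functional forms).
X = rng.normal(size=N)
Y0 = 1.0 + 0.5 * X + rng.normal(size=N)   # Y_i(0), conditional variance 1
Y1 = Y0 + 2.0 + 0.3 * X                   # Y_i(1); heterogeneous effect 2 + 0.3 X

# Treatment depends on X only, so unconfoundedness holds by construction.
e = 1.0 / (1.0 + np.exp(-X))              # true propensity score e(X)
W = rng.binomial(1, e)

# Realized outcome: Y_i = Y_i(W_i); only one potential outcome is observed.
Y = np.where(W == 1, Y1, Y0)
```

Later blocks reuse these arrays (X, W, Y, e) to illustrate the estimators.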

3. Estimands and Identification

Population average treatment effects:

τP = E[Yi(1) − Yi(0)],   τP,T = E[Yi(1) − Yi(0) | Wi = 1].

Most of the discussion in these notes will focus on τP, with extensions to τP,T available in the references.

We will also look at the sample average treatment effect (SATE):

τS = (1/N) Σi (Yi(1) − Yi(0)).

τP versus τS does not matter for estimation, but matters for the variance.

4. Estimation and Inference

Assumption 1 (Unconfoundedness, Rosenbaum and Rubin, 1983a)

(Yi(0), Yi(1)) ⊥⊥ Wi | Xi.

Also called the "conditional independence assumption" or "selection on observables"; in the missing data literature, "missing at random."

To see the link with standard exogeneity assumptions, assume a constant effect and a linear regression:

Yi(0) = α + Xi′β + εi   ⟹   Yi = α + τ·Wi + Xi′β + εi,

with εi ⊥⊥ Xi. Given the constant treatment effect assumption, unconfoundedness is equivalent to independence of Wi and εi conditional on Xi, which would also capture the idea that Wi is exogenous.
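Continuing the simulated example (all numbers hypothetical), the two estimands and the bias of a raw comparison of means:

```python
# Sample and population average treatment effects in the simulation.
tau_S = np.mean(Y1 - Y0)   # SATE: feasible only because both outcomes were simulated
tau_P = 2.0                # ATE: E[2 + 0.3 X] = 2 since E[X] = 0

# A raw difference in means is biased here: treated units have higher X on average.
naive = Y[W == 1].mean() - Y[W == 0].mean()
```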
Motivation for Unconfoundedness Assumption (I)

The first is a statistical, data-descriptive motivation.

A natural starting point in the evaluation of any program is a comparison of average outcomes for treated and control units.

A logical next step is to adjust any difference in average outcomes for differences in exogenous background characteristics (exogenous in the sense of not being affected by the treatment).

Such an analysis may not lead to the final word on the efficacy of the treatment, but the absence of such an analysis would seem difficult to rationalize in a serious attempt to understand the evidence regarding the effect of the treatment.

Motivation for Unconfoundedness Assumption (II)

A second argument is that almost any evaluation of a treatment involves comparisons of units who received the treatment with units who did not.

The question is typically not whether such a comparison should be made, but rather which units should be compared, that is, which units best represent the treated units had they not been treated.

It is clear that settings where some of the necessary covariates are not observed will require strong assumptions to allow for identification, e.g., instrumental variables settings. Absent those assumptions, typically only bounds can be identified (e.g., Manski, 1990, 1995).

Motivation for Unconfoundedness Assumption (III)

Example of a model that is consistent with unconfoundedness: suppose we are interested in estimating the average effect of a binary input on a firm's output, Yi = g(Wi, εi).

Suppose that profits are output minus costs,

Wi = argmax_w E[πi(w) | ci] = argmax_w E[g(w, εi) − ci·w | ci],

implying

Wi = 1{E[g(1, εi) − g(0, εi) | ci] ≥ ci} = h(ci).

If unobserved marginal costs ci differ between firms, and these marginal costs are independent of the errors εi in the firms' forecast of output given inputs, then unconfoundedness will hold, as

(g(0, εi), g(1, εi)) ⊥⊥ ci.

Overlap

Second assumption on the joint distribution of treatments and covariates:

Assumption 2 (Overlap)

0 < Pr(Wi = 1 | Xi) < 1.

Rosenbaum and Rubin (1983a) refer to the combination of the two assumptions as "strongly ignorable treatment assignment."
Identification

Given Assumptions 1 and 2,

τ(x) ≡ E[Yi(1) − Yi(0) | Xi = x] = E[Yi(1) | Xi = x] − E[Yi(0) | Xi = x]
  = E[Yi(1) | Xi = x, Wi = 1] − E[Yi(0) | Xi = x, Wi = 0]
  = E[Yi | Xi = x, Wi = 1] − E[Yi | Xi = x, Wi = 0].

To make this feasible, one needs to be able to estimate the expectations E[Yi | Xi = x, Wi = w] for all values of w and x in the support of these variables. This is where overlap is important.

Given identification of τ(x),

τP = E[τ(Xi)].

Alternative Assumptions

E[Yi(w) | Wi, Xi] = E[Yi(w) | Xi],

for w = 0, 1. Although this assumption is unquestionably weaker, in practice it is rare that a convincing case can be made for the weaker assumption without the case being equally strong for the stronger assumption.

The reason is that the weaker assumption is intrinsically tied to functional form assumptions, and as a result one cannot identify average effects on transformations of the original outcome (e.g., logarithms) without the stronger assumption.

If we are interested in τP,T it is sufficient to assume

Yi(0) ⊥⊥ Wi | Xi.
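A minimal sketch of this identification argument on the simulated data: estimate τ(x) by a raw difference in means within coarse strata of X, then average over the strata. The five-quintile stratification is an arbitrary illustrative choice and assumes every stratum contains both treated and control units:

```python
# Stratify on quintiles of X, estimate tau(x) within each stratum, then average.
edges = np.quantile(X, [0.2, 0.4, 0.6, 0.8])
stratum = np.digitize(X, edges)               # stratum index 0..4 for each unit

tau_hat = 0.0
for b in range(5):
    m = stratum == b
    tau_b = Y[m & (W == 1)].mean() - Y[m & (W == 0)].mean()  # tau(x) in stratum b
    tau_hat += tau_b * m.mean()               # weight by empirical Pr(stratum b)
```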

Propensity Score

Result 1. Suppose that Assumption 1 holds. Then:

(Yi(0), Yi(1)) ⊥⊥ Wi | e(Xi).

Only need to condition on a scalar function of the covariates, which would be much easier in practice if Xi is high-dimensional.

(Problem is that the propensity score e(x) is almost never known.)

Efficiency Bound

Hahn (1998): for any regular estimator for τP, denoted by τ̂, with

√N · (τ̂ − τP) →d N(0, V),

the variance must satisfy:

V ≥ E[ σ1²(Xi)/e(Xi) + σ0²(Xi)/(1 − e(Xi)) + (τ(Xi) − τP)² ].   (1)

Estimators exist that achieve this bound.
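In the simulated example the bound (1) can be evaluated by its sample analog; by construction σ0²(x) = σ1²(x) = 1 and τ(x) = 2 + 0.3x there, so:

```python
# Sample analog of the Hahn (1998) variance bound in the simulation.
tau_x = 2.0 + 0.3 * X
V_bound = np.mean(1.0 / e + 1.0 / (1.0 - e) + (tau_x - tau_P) ** 2)
# An estimator with asymptotic variance V_bound / N would be efficient here.
```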
Estimators

A. Regression Estimators
B. Matching
C. Propensity Score Estimators
D. Mixed Estimators (recommended)

A. Regression Estimators

Estimate μw(x) consistently and estimate τP or τS as

τ̂reg = (1/N) Σi (μ̂1(Xi) − μ̂0(Xi)).

Simple implementations include

μw(x) = β′x + τ·w,

in which case the average treatment effect is equal to τ. In this case one can estimate τ simply by least squares estimation using the regression function

Yi = α + β′Xi + τ·Wi + εi.

More generally, one can specify separate regression functions for the two regimes, μw(x) = βw′x.
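A minimal sketch of τ̂reg with separate linear regressions per regime, fit by least squares on the simulated data (the linear specification is an assumption of the example, not of the method):

```python
# Fit mu_1 and mu_0 by OLS on the treated and control subsamples, then average.
Z = np.column_stack([np.ones(N), X])                        # regressors (1, X)
b1, *_ = np.linalg.lstsq(Z[W == 1], Y[W == 1], rcond=None)  # mu_1(x) = b1'(1, x)
b0, *_ = np.linalg.lstsq(Z[W == 0], Y[W == 0], rcond=None)  # mu_0(x) = b0'(1, x)
tau_reg = np.mean(Z @ b1 - Z @ b0)
```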

These simple regression estimators can be sensitive to differences in the covariate distributions for treated and control units. The reason is that in that case the regression estimators rely heavily on extrapolation.

Note that μ0(x) is used to predict the missing outcomes for the treated. Hence on average one wishes to predict the control outcome at X̄T = Σi Wi·Xi/NT, the average covariate value for the treated. With a linear regression function, the average prediction can be written as ȲC + β̂′(X̄T − X̄C).

If X̄T and X̄C are close, the precise specification of the regression function will not matter much for the average prediction. With the two averages very different, the prediction based on a linear regression function can be sensitive to changes in the specification.

[Figure: illustration of the extrapolation problem when the treated and control covariate distributions differ.]
B. Matching

Let jm(i) be the index of the mth closest match, that is, the index l that satisfies Wl ≠ Wi and

Σ{j: Wj ≠ Wi} 1{‖Xj − Xi‖ ≤ ‖Xl − Xi‖} = m.

Let JM(i) denote the set of the first M matches. Then

Ŷi(0) = Yi if Wi = 0, and (1/M) Σ{j ∈ JM(i)} Yj if Wi = 1;
Ŷi(1) = (1/M) Σ{j ∈ JM(i)} Yj if Wi = 0, and Yi if Wi = 1.

The simple matching estimator is

τ̂M^sm = (1/N) Σi (Ŷi(1) − Ŷi(0)).   (2)

Issues with Matching

Bias is of order O(N^(−1/K)), where K is the dimension of the covariates. It is important in large samples if K ≥ 2 (and dominates the variance asymptotically if K ≥ 3).

Not efficient (but the efficiency loss is small).

Easy to implement, robust.
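A minimal sketch of the simple matching estimator (2) with M = 1, a scalar covariate, and matching with replacement, continuing the simulated data:

```python
def simple_matching(X, W, Y):
    """Single-nearest-neighbor matching estimator of the ATE (M = 1)."""
    Y1_hat = Y.astype(float).copy()
    Y0_hat = Y.astype(float).copy()
    for i in range(len(Y)):
        opp = np.flatnonzero(W != W[i])            # candidates in the other arm
        j = opp[np.argmin(np.abs(X[opp] - X[i]))]  # closest match j1(i)
        if W[i] == 1:
            Y0_hat[i] = Y[j]                       # impute missing Y_i(0)
        else:
            Y1_hat[i] = Y[j]                       # impute missing Y_i(1)
    return np.mean(Y1_hat - Y0_hat)

tau_sm = simple_matching(X, W, Y)
```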

C.1 Propensity Score Estimators: Weighting

E[ WY/e(X) ] = E[ E[ W·Yi(1)/e(X) | X ] ] = E[ e(X)·E[Yi(1) | X]/e(X) ] = E[Yi(1)],

and similarly

E[ (1 − W)Y/(1 − e(X)) ] = E[Yi(0)],

implying

τP = E[ W·Y/e(X) − (1 − W)·Y/(1 − e(X)) ].

With the propensity score known, one can directly implement this estimator as

τ̃ = (1/N) Σi ( Wi·Yi/e(Xi) − (1 − Wi)·Yi/(1 − e(Xi)) ).   (3)

Implementation of Horvitz-Thompson Estimator

Estimate e(x) flexibly (Hirano, Imbens and Ridder, 2003):

τ̂weight = [ Σi Wi·Yi/ê(Xi) ] / [ Σi Wi/ê(Xi) ] − [ Σi (1 − Wi)·Yi/(1 − ê(Xi)) ] / [ Σi (1 − Wi)/(1 − ê(Xi)) ].

Is efficient given a nonparametric estimator for e(x).

Potentially sensitive to the estimator for the propensity score.
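A minimal sketch of the normalized weighting estimator τ̂weight on the simulated data. The logistic fit stands in for the flexible estimation of e(x) the slide calls for; scikit-learn is an added dependency of this example, not part of the slides:

```python
from sklearn.linear_model import LogisticRegression

# Estimate the propensity score, then form the normalized (ratio) weights.
e_hat = (LogisticRegression()
         .fit(X.reshape(-1, 1), W)
         .predict_proba(X.reshape(-1, 1))[:, 1])

w1 = W / e_hat
w0 = (1 - W) / (1 - e_hat)
tau_weight = np.sum(w1 * Y) / np.sum(w1) - np.sum(w0 * Y) / np.sum(w0)
```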
Matching or Regression on the Propensity Score

Not clear what the advantages are.

Large sample properties not known.

Simulation results not encouraging.

D.1 Mixed Estimators: Weighting and Regression

Interpret the Horvitz-Thompson estimator as a weighted regression estimator:

Yi = α + τ·Wi + εi,   with weights λi = Wi/e(Xi) + (1 − Wi)/(1 − e(Xi)).

This weighted-least-squares representation suggests that one may add covariates to the regression function to improve precision, for example as

Yi = α + β′Xi + τ·Wi + εi,

with the same weights λi. Such an estimator is consistent as long as either the regression model or the propensity score (and thus the weights) is specified correctly. That is, in the Robins-Ritov terminology, the estimator is doubly robust.
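A minimal sketch of this weighted regression, implemented as weighted least squares via the usual √λ rescaling and reusing ê(Xi) from the weighting example above:

```python
# Weighted least squares of Y on (1, X, W) with weights lambda_i.
lam = W / e_hat + (1 - W) / (1 - e_hat)
A = np.column_stack([np.ones(N), X, W]) * np.sqrt(lam)[:, None]
b = Y * np.sqrt(lam)
coef, *_ = np.linalg.lstsq(A, b, rcond=None)
tau_dr = coef[2]    # coefficient on W: the doubly robust estimate
```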

Matching and Regression

First match observations. With m(i) the index of the match for unit i, define

X̂i(0) = Xi if Wi = 0, and Xm(i) if Wi = 1;
X̂i(1) = Xm(i) if Wi = 0, and Xi if Wi = 1.

Then adjust the within-pair difference for the within-pair difference in covariates, X̂i(1) − X̂i(0):

τ̂M^adj = (1/N) Σi ( Ŷi(1) − Ŷi(0) − β̂′(X̂i(1) − X̂i(0)) ),

using a regression estimate for β.

Can eliminate the bias of the matching estimator given a flexible specification of the regression function.

Estimation of the Variance

For the efficient estimator of τP:

VP = E[ σ1²(Xi)/e(Xi) + σ0²(Xi)/(1 − e(Xi)) + (μ1(Xi) − μ0(Xi) − τ)² ].

Estimate all components nonparametrically, and plug in.

Alternatively, use the bootstrap. (Does not work for the matching estimator.)
Estimation of the Variance

For all estimators of τS, for some known λi(X, W):

τ̂ = Σi λi(X, W)·Yi,

V(τ̂ | X, W) = Σi λi(X, W)²·σWi²(Xi).

To estimate σWi²(Xi) one uses the closest match within the set of units with the same treatment indicator. Let v(i) be the closest unit to i with the same treatment indicator.

The sample variance of the outcome variable for these two units can then be used to estimate σWi²(Xi):

σ̂Wi²(Xi) = (Yi − Yv(i))² / 2.
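A minimal sketch of this variance estimator for the scalar-covariate simulation, finding v(i) by brute-force nearest neighbor within the same treatment arm:

```python
def sigma2_by_matching(X, W, Y):
    """Estimate sigma^2_{W_i}(X_i) from the closest same-treatment match v(i)."""
    n = len(Y)
    s2 = np.empty(n)
    for i in range(n):
        same = np.flatnonzero(W == W[i])
        same = same[same != i]                       # exclude unit i itself
        v = same[np.argmin(np.abs(X[same] - X[i]))]  # closest same-arm unit v(i)
        s2[i] = (Y[i] - Y[v]) ** 2 / 2.0
    return s2

# Plug into V(tau_hat | X, W) = sum_i lambda_i^2 * sigma^2_{W_i}(X_i),
# with the weights lambda_i implied by whichever estimator was used.
s2_hat = sigma2_by_matching(X, W, Y)
```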
