Lecture 1b
Lamarche
The purpose of this lecture is to provide a framework of ideas and concepts that are
useful to keep in mind. It is an introduction to identification and estimation under strong
assumptions (e.g., exogeneity).
Suppose now we have a binary treatment indicator, D ∈ {0, 1}, that has been randomly
assigned to different individuals in a population. Suppose also that Y is an outcome of
interest and it is modeled using:
Y = β0 + δD + U. (1)
The parameter of interest is
δ = E(Y |D = 1) − E(Y |D = 0),
which is identified based on the existence of two conditional mean functions. For δ to be
well-defined, both E(Y |D = 1) and E(Y |D = 0) need to exist and be learnable from data.
The unknown δ is interpreted as the difference between the mean of Y among people who
are on treatment (D = 1) and the mean of Y among people who are in the control group
(D = 0).
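As a rough illustration of this interpretation, the following Python sketch simulates model (1) with a randomly assigned D and recovers δ as a difference of group means. The parameter values, the sample size, and the normal distribution for U are assumptions made only for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 10_000
beta0, delta = 1.0, 2.0          # hypothetical values, not from the lecture
D = rng.integers(0, 2, size=n)   # randomly assigned binary treatment
U = rng.normal(size=n)           # error term drawn independently of D
Y = beta0 + delta * D + U        # model (1)

# Sample analogues of E(Y | D = 1) and E(Y | D = 0)
mean_treated = Y[D == 1].mean()
mean_control = Y[D == 0].mean()
delta_from_means = mean_treated - mean_control

print(f"difference in means: {delta_from_means:.3f} (true delta = {delta})")
```

With a large sample, the printed difference should be close to the assumed value of δ, up to sampling noise.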
I am interested in briefly discussing basic regression analysis when most of the
assumptions (A1)-(A5) are satisfied. When we use observational data, we need to carefully
evaluate violations of these assumptions. However, in some situations, which are increasingly
common in economics, we might have data from an experiment.
By a randomized experiment, you should think of a situation described by two groups of
individuals (one group is “treated” and the other is not) from a fixed population of n individuals:
3. n1 units are randomly selected. Therefore, the remaining n0 = n − n1 units are randomly selected as well
Econometría Avanzada, Maestría en Economía - FCE - UNLP Prof. Lamarche
4. The n1 selected units will receive active treatment, while the remaining n0 units will not
receive active treatment.
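The assignment mechanism in steps 3 and 4 can be sketched in Python as follows. The function name `assign_treatment` and the values n = 10 and n1 = 4 are hypothetical and chosen only for illustration.

```python
import numpy as np

def assign_treatment(n, n1, seed=0):
    """Randomly pick n1 of n units for treatment; the other n - n1 are controls."""
    rng = np.random.default_rng(seed)
    D = np.zeros(n, dtype=int)
    treated = rng.choice(n, size=n1, replace=False)  # draw without replacement
    D[treated] = 1
    return D

D = assign_treatment(n=10, n1=4)
print(D, "share treated:", D.mean())
```

Drawing without replacement fixes the group sizes at exactly n1 and n0 = n − n1, which is the design the derivations in this lecture rely on.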
yi = β0 + δDi + ui , (3)
where Di is a dummy variable taking values 0 or 1, and ui is an error term. The parameter
δ is called the Average Treatment Effect (ATE) and has the following interpretation:
\[
\hat{\beta}_0 = \bar{y} - \hat{\delta} \bar{D}
\]
\[
\hat{\delta} = \frac{\sum_{i=1}^{n} (D_i - \bar{D})(y_i - \bar{y})}{\sum_{i=1}^{n} (D_i - \bar{D})^2},
\]
where
\[
\bar{D} = \frac{1}{n} \sum_{i=1}^{n} D_i = \frac{n_1}{n}.
\]
We now find an expression of the OLS estimator in the case of a two-sample treatment-
control problem. First, note that
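\[
\begin{aligned}
\sum_{i=1}^{n} (D_i - \bar{D})^2 &= \sum_{i=1}^{n} D_i (D_i - \bar{D}) = \sum_{i=1}^{n} D_i^2 - \sum_{i=1}^{n} D_i \bar{D} \\
&= \sum_{i=1}^{n} D_i - \frac{n_1}{n} \sum_{i=1}^{n} D_i = \frac{n_0}{n} \sum_{i=1}^{n} D_i = \frac{n_0 n_1}{n}.
\end{aligned}
\]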
Moreover,
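\[
\begin{aligned}
\sum_{i=1}^{n} D_i (y_i - \bar{y}) &= n_1 (\bar{y}_1 - \bar{y}) = n_1 \left( \bar{y}_1 - \left( \frac{n_0 \bar{y}_0 + n_1 \bar{y}_1}{n} \right) \right) \\
&= n_1 \left( \frac{n \bar{y}_1 - n_0 \bar{y}_0 - n_1 \bar{y}_1}{n} \right) = n_1 \left( \frac{n_0 \bar{y}_1 - n_0 \bar{y}_0}{n} \right) \\
&= \frac{n_0 n_1}{n} (\bar{y}_1 - \bar{y}_0).
\end{aligned}
\]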
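Because the sum of (yi − ȳ) over i is zero, the numerator of δ̂ equals the sum of Di (yi − ȳ).
Dividing the second expression by the first therefore gives
\[
\hat{\delta} = \bar{y}_1 - \bar{y}_0,
\]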
which is the difference between the averages of the observed outcomes in the treatment group
and in the control group.
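The algebra above can be checked numerically. The sketch below uses simulated data (the sample sizes and coefficients are assumptions for the example) and compares the OLS formulas with the two-sample expressions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n1 = 200, 80
n0 = n - n1

# Randomly assign exactly n1 units to treatment
D = np.zeros(n)
D[rng.choice(n, size=n1, replace=False)] = 1.0
y = 0.5 + 1.5 * D + rng.normal(size=n)   # hypothetical data-generating values

D_bar, y_bar = D.mean(), y.mean()
y1_bar, y0_bar = y[D == 1].mean(), y[D == 0].mean()

# OLS slope and intercept from the textbook formulas
denominator = np.sum((D - D_bar) ** 2)
numerator = np.sum((D - D_bar) * (y - y_bar))
delta_hat = numerator / denominator
beta0_hat = y_bar - delta_hat * D_bar

print(denominator, n0 * n1 / n)      # both equal n0 * n1 / n
print(delta_hat, y1_bar - y0_bar)    # OLS slope equals the difference in means
print(beta0_hat, y0_bar)             # OLS intercept equals the control-group mean
```

The equalities hold exactly for any sample, up to floating-point error, because they are algebraic identities and do not depend on how the data were generated.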
This is why the OLS estimator is often interpreted as an estimator of the causal
effect in randomized experiments. As an example, there are two AER papers: Angrist et al.
(2002) and Angrist, Bettinger and Kremer (2006). They use simple OLS and random effects
to estimate the “treatment” effect. Of course, one needs to be careful with this interpretation
if Di is not randomly assigned or the researcher does not have a random sample.
In sum, random sampling and random assignment in a regression setting allow us to
interpret the OLS estimator as a consistent estimator of the causal effect. To see this,
recall that independence between the covariates and the error term leads to a consistent and
unbiased estimator for the simple model considered before. Are ui and Di independent here?
yi = Di y1i + (1 − Di )y0i
yi = β0 + δDi + ui ,
and therefore,
ui := y0i − β0 + Di (y1i − y0i − δ).
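To make the construction of ui concrete, here is a minimal Python sketch with simulated potential outcomes. It assumes β0 = E(y0i) and δ = E(y1i − y0i), uses hypothetical numerical values, and draws Di independently of the potential outcomes to mimic random assignment.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000
beta0, delta = 1.0, 2.0                      # assumed E(y0i) and E(y1i - y0i)

y0 = beta0 + rng.normal(size=n)              # potential outcome without treatment
y1 = y0 + delta + 0.5 * rng.normal(size=n)   # potential outcome with treatment
D = rng.integers(0, 2, size=n)               # assigned independently of (y0, y1)

y = D * y1 + (1 - D) * y0                    # observed outcome
u = y0 - beta0 + D * (y1 - y0 - delta)       # implied error term

mean_u_treated = u[D == 1].mean()
mean_u_control = u[D == 0].mean()
print(mean_u_treated, mean_u_control)
```

By construction the observed outcome satisfies yi = β0 + δDi + ui exactly, and under this simulated random assignment the average of ui should be close to zero in both groups.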
Note that by definition,
E(y|Di = 0) = E(y0i ) = β0
E(y|Di = 1) = E(y1i ) = β0 + δ.
because
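We can also obtain the variance of the treatment effect estimator under randomized data.
Under homoscedasticity, the variance is
\[
\mathrm{Var}(\hat{\delta}) = \frac{\hat{\sigma}^2}{\sum_{i=1}^{n} (D_i - \bar{D})^2} = \frac{n \hat{\sigma}^2}{n_1 n_0} = \hat{\sigma}^2 \left( \frac{1}{n_1} + \frac{1}{n_0} \right).
\]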
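The equivalence between the regression form and the two-group form of the homoscedastic variance can be verified with a short Python sketch. The simulated data and the n − 2 degrees-of-freedom divisor for the error variance estimate are assumptions of the example, since the notes do not define that estimate at this point.

```python
import numpy as np

rng = np.random.default_rng(3)
n, n1 = 120, 50
n0 = n - n1
D = np.zeros(n)
D[rng.choice(n, size=n1, replace=False)] = 1.0
y = 1.0 + 0.8 * D + rng.normal(size=n)     # hypothetical homoscedastic data

delta_hat = y[D == 1].mean() - y[D == 0].mean()
beta0_hat = y[D == 0].mean()
resid = y - beta0_hat - delta_hat * D
sigma2_hat = resid @ resid / (n - 2)       # assumption: n - 2 degrees of freedom

# Regression form and two-group form of the same variance
var_from_sum = sigma2_hat / np.sum((D - D.mean()) ** 2)
var_from_groups = sigma2_hat * (1 / n1 + 1 / n0)
print(var_from_sum, var_from_groups)
```

The two numbers coincide because the sum of squared deviations of Di equals n0 n1 / n.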
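The variance of the OLS estimator if the errors are heteroscedastic is
\[
\mathrm{Var}(\hat{\delta}) = \frac{\sum_{i=1}^{n} \hat{u}_i^2 (D_i - \bar{D})^2}{\left( \sum_{i=1}^{n} (D_i - \bar{D})^2 \right)^2} = \frac{\hat{\sigma}_1^2}{n_1} + \frac{\hat{\sigma}_0^2}{n_0},
\]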
where σ̂1² and σ̂0² are the sample variances of the treatment and control groups, respectively:
\[
\hat{\sigma}_1^2 = \frac{1}{n_1 - 1} \sum_{i : D_i = 1} (y_i - \bar{y}_1)^2
\]
\[
\hat{\sigma}_0^2 = \frac{1}{n_0 - 1} \sum_{i : D_i = 0} (y_i - \bar{y}_0)^2.
\]
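A small Python sketch can compare the two heteroscedastic expressions on simulated data. The data-generating values are hypothetical. The sandwich expression is computed here without any small-sample correction, which is an assumption of the example; with that choice the two forms differ only by the factors (n1 − 1)/n1 and (n0 − 1)/n0, which are negligible for groups of moderate size.

```python
import numpy as np

rng = np.random.default_rng(4)
n, n1 = 400, 150
n0 = n - n1
D = np.zeros(n)
D[rng.choice(n, size=n1, replace=False)] = 1.0
# Hypothetical heteroscedastic data: the treated group is noisier
y = 1.0 + 0.8 * D + rng.normal(size=n) * np.where(D == 1, 2.0, 1.0)

y1_bar, y0_bar = y[D == 1].mean(), y[D == 0].mean()
resid = y - y0_bar - (y1_bar - y0_bar) * D        # OLS residuals

# Sandwich form, no small-sample correction (assumption of this sketch)
d = D - D.mean()
var_sandwich = np.sum(resid**2 * d**2) / np.sum(d**2) ** 2

# Two-group form with n_g - 1 divisors in the group variances
s1 = np.var(y[D == 1], ddof=1)
s0 = np.var(y[D == 0], ddof=1)
var_groups = s1 / n1 + s0 / n0

print(var_sandwich, var_groups)
```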
as n → ∞, which is equal to
It follows that,
By definition, we have that E(β̃0 (xi − µx )′ β) = 0, and by random sampling and random
assignment,
E(δDi (xi − µx )′ β) = 0.
The parameter δ only enters in the first term, so the argument that minimizes the OLS
objective function in the limit, S(β0 , δ, β), is obtained considering only the first term on the
right-hand side of the equation, E(yi − β̃0 − δDi)². Therefore, as before,
We cannot say, however, that adding covariates to the model does not affect
the performance of the OLS estimator. Adding covariates to the model affects the variance
of the OLS estimator, because we use residuals for estimating σ²:
\[
\hat{\sigma}^2 = \sum_{i=1}^{n} \hat{u}_i^2 = \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\delta} D_i - x_i' \hat{\beta})^2.
\]
Naturally, the sum of squared residuals in a model with additional independent variables is
smaller than in a model that does not include additional controls, provided that the included
covariates have some predictive power. The caveat here is that if you include a large number
of covariates, the variance might increase, since you are likely to estimate effects that are
equal to zero.
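The effect of a predictive covariate on precision can be illustrated with a simulated example. In this Python sketch the helper `ols`, the coefficients, and the degrees-of-freedom correction in the error variance estimate are all assumptions of the example and are not taken from the lecture.

```python
import numpy as np

def ols(X, y):
    """Return coefficients, sum of squared residuals, and classical standard errors."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    ssr = float(resid @ resid)
    sigma2 = ssr / (len(y) - X.shape[1])   # assumption: degrees-of-freedom correction
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return coef, ssr, se

rng = np.random.default_rng(5)
n = 500
D = rng.integers(0, 2, size=n).astype(float)   # randomized treatment
x = rng.normal(size=n)                         # covariate, independent of D
y = 1.0 + 0.5 * D + 2.0 * x + rng.normal(size=n)

ones = np.ones(n)
coef_s, ssr_short, se_short = ols(np.column_stack([ones, D]), y)
coef_l, ssr_long, se_long = ols(np.column_stack([ones, D, x]), y)

print(f"without x: delta_hat={coef_s[1]:.3f}, se={se_short[1]:.3f}")
print(f"with x:    delta_hat={coef_l[1]:.3f}, se={se_long[1]:.3f}")
```

Both regressions target the same δ because D is randomized, but the regression that includes x leaves a smaller residual sum of squares and, in this design, a smaller standard error for the estimate of δ.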
Y = β0 + β1 D + β2 T + θ(D × T ) + U. (8)
It follows that
employment that New Jersey (NJ) would have experienced if the minimum wage in New
Jersey had not been increased. The data are available from Professor Card’s website at Berkeley.
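By estimating the sample mean of employment before and after the changes in the
minimum wage, we obtain:

        Before   After
PA      23.33    21.17
NJ      20.43    21.02

and therefore the difference-in-differences estimate of the employment effect is
(21.02 − 20.43) − (21.17 − 23.33) = 2.75.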
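The mapping between the four cell means in the table above and the coefficients of model (8) can be sketched in Python. The coding is an assumption of the example: D = 1 for NJ and T = 1 for the period after the increase. With only four cell means the model is saturated, so each coefficient is a simple combination of those means.

```python
# Mean employment by state and period, taken from the table above
means = {
    ("PA", "before"): 23.33,
    ("PA", "after"): 21.17,
    ("NJ", "before"): 20.43,
    ("NJ", "after"): 21.02,
}

# Assumed coding: D = 1 for NJ, T = 1 for the "after" period
beta0 = means[("PA", "before")]                           # D = 0, T = 0
beta1 = means[("NJ", "before")] - beta0                   # state gap before the change
beta2 = means[("PA", "after")] - beta0                    # time change in the control state
theta = means[("NJ", "after")] - beta0 - beta1 - beta2    # interaction coefficient

# The same number written as a difference in differences
change_nj = means[("NJ", "after")] - means[("NJ", "before")]
change_pa = means[("PA", "after")] - means[("PA", "before")]
did = change_nj - change_pa

print(round(theta, 2), round(did, 2))
```

Writing θ both ways shows that the interaction coefficient in (8) is the change in NJ net of the change in PA.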
The result was controversial and led to a significant discussion in academia and among the
general public. How should we interpret the results, and how general are they? The results
were only weakly significant, as the authors use a small sample of restaurants located in
these states. As in many causal inference papers, one needs to be careful about how far the
results are extrapolated to other settings. But the message of the paper is that small
increases in the minimum wage might not lead to decreases in employment.