
Econometría Avanzada, Maestría en Economía - FCE - UNLP Prof. Lamarche

Identification and Estimation in Models with Binary Regressors
Lecture 1b

The purpose of this lecture is to provide a framework of ideas and concepts that are
useful to keep in mind. It is an introduction to identification and estimation under strong
assumptions (e.g., exogeneity).
Suppose now we have a binary treatment indicator, D ∈ {0, 1}, that has been randomly
assigned to different individuals in a population. Suppose also that Y is an outcome of
interest, modeled as:

Y = β0 + δD + U. (1)

It follows that the average treatment effect, or ATE, is

δ = E(Y |D = 1) − E(Y |D = 0), (2)

which is identified provided that the two conditional mean functions exist. For δ to be
well-defined, both E(Y |D = 1) and E(Y |D = 0) need to exist and be learnable from data.
The unknown δ is interpreted as the difference between the mean of Y among people who
are on treatment (D = 1) and the mean of Y among people who are in the control group
(D = 0).
I am interested in briefly discussing basic regression analysis when most of the assumptions
(A1)-(A5) are satisfied. When we use observational data, we need to carefully evaluate
violations of these assumptions. However, in some situations, which are increasingly
common in economics, we might have experimental data (that is, data from an experiment).
By a randomized experiment, you should think of a situation in which a fixed population of
n individuals is split into two groups (one group is “treated” and the other is not):

1. The number of treated units, n1 , is fixed (a priori)

2. The number of control-group units, n0 = n − n1 , is fixed as well

3. n1 units are randomly selected. Therefore, n0 units are randomly selected as well

4. The units in n1 will receive active treatment, while the units in n0 will not receive
active treatment

5. The probability P (D|Y1 , Y0 ) = P (D) does not depend on potential outcomes (unconfoundedness)

6. Each unit has a probability of assignment equal to n1 /n, where $n_1 = \sum_{i=1}^{n} d_i$
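The assignment mechanism in items 1-6 can be sketched in a few lines of Python. This is only an illustration under assumed values: the function name `complete_randomization`, the sizes n = 10 and n1 = 4, and the seed are hypothetical and do not come from the lecture.

```python
import numpy as np

def complete_randomization(n, n1, rng):
    """Assign exactly n1 of n units to treatment, uniformly at random."""
    d = np.zeros(n, dtype=int)
    treated = rng.choice(n, size=n1, replace=False)  # n1 distinct unit indices
    d[treated] = 1
    return d

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility only
n, n1 = 10, 4                    # illustrative sizes (assumption)
d = complete_randomization(n, n1, rng)
print(d, d.sum(), n - d.sum())   # n1 treated units and n0 = n - n1 controls

# Over many re-randomizations, each unit is treated with frequency near n1 / n.
draws = np.array([complete_randomization(n, n1, rng) for _ in range(20000)])
print(draws.mean(axis=0))        # every entry should be close to n1 / n
```

Because the draw is made without replacement, the number of treated units is fixed at n1 in every realization, while the marginal assignment probability of each unit is n1 /n.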

Unfortunately, the typical scenario is to use observational or non-experimental data,
so a strategy based on randomized data may be inappropriate. Sometimes, the
researcher has data obtained from a randomized assignment, but there are issues with the
design and/or implementation that make the treatment and control samples invalid for
consistent estimation of causal parameters. In these situations, researchers try to satisfy the
conditional independence assumption (CIA), under which one can control for X if one has a
sufficiently rich set of observables. Treatment effects can also be estimated using regression-based
methods, including regression discontinuity designs, matching estimators, and instrumental variable methods.

1 Estimation using experimental data


We begin with a simple regression model with no covariates:

yi = β0 + δDi + ui , (3)

where Di is a dummy variable taking values 0 or 1, and ui is an error term. The parameter
δ is called the Average Treatment Effect (ATE) and has the following interpretation:

δ = E(yi |Di = 1) − E(yi |Di = 0). (4)

We know that OLS solves:

\[
(\hat\beta_0, \hat\delta) = \arg\min_{\beta_0, \delta} \sum_{i=1}^{n} (y_i - \beta_0 - \delta D_i)^2 .
\]

The solutions are:

\[
\hat\beta_0 = \bar y - \hat\delta \bar D, \qquad
\hat\delta = \frac{\sum_{i=1}^{n} (D_i - \bar D)(y_i - \bar y)}{\sum_{i=1}^{n} (D_i - \bar D)^2},
\]

where z̄ denotes the sample average of a generic variable z. Note that

\[
\bar D = \frac{1}{n} \sum_{i=1}^{n} D_i = \frac{n_1}{n} .
\]

We now find an expression of the OLS estimator in the case of a two-sample treatment-
control problem. First, note that

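\[
\begin{aligned}
\sum_{i=1}^{n} (D_i - \bar D)^2 &= \sum_{i=1}^{n} D_i (D_i - \bar D) = \sum_{i=1}^{n} D_i^2 - \sum_{i=1}^{n} D_i \bar D \\
&= \sum_{i=1}^{n} D_i - \frac{n_1}{n} \sum_{i=1}^{n} D_i = \frac{n_0}{n} \sum_{i=1}^{n} D_i = \frac{n_0 n_1}{n} .
\end{aligned}
\]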

Moreover,

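\[
\begin{aligned}
\sum_{i=1}^{n} D_i (y_i - \bar y) &= n_1 (\bar y_1 - \bar y) = n_1 \left( \bar y_1 - \left( \frac{n_0 \bar y_0 + n_1 \bar y_1}{n} \right) \right) \\
&= n_1 \left( \frac{n \bar y_1 - n_0 \bar y_0 - n_1 \bar y_1}{n} \right) = n_1 \left( \frac{n_0 \bar y_1 - n_0 \bar y_0}{n} \right) \\
&= \frac{n_0 n_1}{n} (\bar y_1 - \bar y_0) .
\end{aligned}
\]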

Therefore, the OLS estimator is equal to:

δ̂ = ȳ1 − ȳ0 ,

which is the difference between the averages of the observed outcomes in the treatment group
and in the control group.
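The algebra above can be checked numerically. The sketch below uses simulated data; the sample sizes, the seed, and the values β0 = 1 and δ = 2 are assumptions made only for illustration. It relies on the fact that the numerator of δ̂ equals the sum of Di (yi − ȳ), since the deviations yi − ȳ sum to zero.

```python
import numpy as np

rng = np.random.default_rng(1)            # arbitrary seed
n, n1 = 200, 80                           # illustrative sizes (assumption)
n0 = n - n1
d = np.zeros(n)
d[rng.choice(n, size=n1, replace=False)] = 1.0
y = 1.0 + 2.0 * d + rng.normal(size=n)    # hypothetical beta0 = 1, delta = 2

# Closed-form OLS solutions for the simple regression of y on a constant and D
d_bar, y_bar = d.mean(), y.mean()
sxx = np.sum((d - d_bar) ** 2)
sxy = np.sum((d - d_bar) * (y - y_bar))
delta_ols = sxy / sxx
beta0_ols = y_bar - delta_ols * d_bar

# Difference between the treatment-group and control-group averages
diff_means = y[d == 1].mean() - y[d == 0].mean()

print(sxx, n0 * n1 / n)                   # denominator identity
print(sxy, n0 * n1 / n * diff_means)      # numerator identity
print(delta_ols, diff_means)              # OLS slope equals difference in means
print(beta0_ols, y[d == 0].mean())        # OLS intercept equals control mean
```

Each printed pair should agree up to floating-point error, whatever data-generating process is used, because the identities are purely algebraic.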
This means that the OLS estimator is often interpreted as an estimator of the causal
effect in randomized experiments. Two examples are the AER papers by Angrist et al.
(2002) and Angrist, Bettinger and Kremer (2006), which use simple OLS and random effects
to estimate the “treatment” effect. Of course, one needs to be careful with this interpretation
if Di is not randomly assigned or the researcher does not have a random sample.
In sum, random sampling and random assignment in a regression setting allow us to
interpret the OLS estimator as a consistent estimator of the causal effect. To see this,
recall that independence between the covariates and the error term leads to a consistent and
unbiased estimator in the simple model considered before. Are ui and Di independent here?

First, recall that

yi = Di y1i + (1 − Di )y0i
yi = β0 + δDi + ui ,

and therefore,
ui := y0i − β0 + Di (y1i − y0i − δ).
Note that by definition,

E(yi |Di = 0) = E(y0i ) = β0
E(yi |Di = 1) = E(y1i ) = β0 + δ.

We arrive at the conclusion that

Cov(ui , Di ) = E(Di ui ) = E(Di E(ui |Di )) = 0,

because

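E(ui |Di = 0) = E(y0i − β0 |Di = 0) = 0
E(ui |Di = 1) = E(y1i − β0 − δ|Di = 1) = 0,

where both equalities use random assignment, which implies E(y0i |Di ) = E(y0i ) = β0 and
E(y1i |Di ) = E(y1i ) = β0 + δ.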
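A small simulation can make the argument concrete. The sketch below builds potential outcomes, assigns treatment at random, and constructs ui from its definition. The distributions, the sample sizes, and the seed are assumptions chosen for illustration, and β0 and δ are set to their finite-population analogues.

```python
import numpy as np

rng = np.random.default_rng(5)                 # arbitrary seed
n, n1 = 100_000, 40_000                        # illustrative sizes (assumption)
y0 = rng.normal(loc=1.0, size=n)               # hypothetical outcome without treatment
y1 = y0 + rng.normal(loc=2.0, size=n)          # heterogeneous unit-level effects
d = np.zeros(n)
d[rng.choice(n, size=n1, replace=False)] = 1.0 # random assignment, independent of (y0, y1)

beta0 = y0.mean()                              # finite-population analogue of E(y0i)
delta = y1.mean() - y0.mean()                  # finite-population ATE
u = y0 - beta0 + d * (y1 - y0 - delta)         # error term as defined in the text

print(u[d == 0].mean(), u[d == 1].mean())      # both should be close to zero
print(np.cov(u, d)[0, 1])                      # sample covariance, close to zero
```

If assignment depended on the potential outcomes (for example, treating only units with large y1 − y0), the two conditional means would no longer be close to zero.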

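We can also obtain the variance of the treatment effect estimator under randomized data.
Under homoscedasticity, the variance is

\[
\mathrm{Var}(\hat\delta) = \frac{\hat\sigma^2}{\sum_{i=1}^{n} (D_i - \bar D)^2} = \frac{n \hat\sigma^2}{n_1 n_0} = \hat\sigma^2 \left( \frac{1}{n_1} + \frac{1}{n_0} \right) .
\]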

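The variance of the OLS estimator if the errors are heteroscedastic is

\[
\mathrm{Var}(\hat\delta) = \frac{\sum_{i=1}^{n} \hat u_i^2 (D_i - \bar D)^2}{\left( \sum_{i=1}^{n} (D_i - \bar D)^2 \right)^2} \approx \frac{\hat\sigma_1^2}{n_1} + \frac{\hat\sigma_0^2}{n_0},
\]

where σ̂1² and σ̂0² are the sample variances of the outcome in the treatment and control groups, respectively:

\[
\hat\sigma_1^2 = \frac{1}{n_1 - 1} \sum_{i: D_i = 1} (y_i - \bar y_1)^2, \qquad
\hat\sigma_0^2 = \frac{1}{n_0 - 1} \sum_{i: D_i = 0} (y_i - \bar y_0)^2 .
\]

The approximation is exact up to the degrees-of-freedom factors (n1 − 1)/n1 and (n0 − 1)/n0.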
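The two variance expressions can be computed directly from a sample. The following sketch uses a hypothetical data-generating process in which the treated group has a larger error variance; all numerical values and the seed are assumptions. It also computes the residual-based heteroscedasticity-robust expression to show how close it is to the group-variance formula.

```python
import numpy as np

rng = np.random.default_rng(2)                       # arbitrary seed
n, n1 = 500, 150                                     # illustrative sizes (assumption)
n0 = n - n1
d = np.zeros(n)
d[rng.choice(n, size=n1, replace=False)] = 1.0
# Hypothetical heteroscedastic errors: sd 3 if treated, sd 1 if control
y = 1.0 + 2.0 * d + rng.normal(scale=np.where(d == 1, 3.0, 1.0))

y1, y0 = y[d == 1], y[d == 0]
delta_hat = y1.mean() - y0.mean()
u_hat = y - np.where(d == 1, y1.mean(), y0.mean())   # OLS residuals (fitted values are group means)
sxx = np.sum((d - d.mean()) ** 2)                    # equals n0 * n1 / n

# Homoscedastic formula
sigma2_hat = np.sum(u_hat ** 2) / (n - 2)
var_homo = sigma2_hat * (1.0 / n1 + 1.0 / n0)

# Heteroscedastic formula based on the two group variances
s1, s0 = y1.var(ddof=1), y0.var(ddof=1)
var_groups = s1 / n1 + s0 / n0

# Residual-based robust expression
var_robust = np.sum(u_hat ** 2 * (d - d.mean()) ** 2) / sxx ** 2

print(delta_hat, var_homo, var_groups, var_robust)
```

In this design the smaller group has the larger variance, so the homoscedastic formula would be expected to understate the variance relative to the group-variance formula.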

1.1 Adding covariates


We now ask what the effect is of adding covariates to a regression model when practitioners
have randomized data (often called a randomized controlled trial, or simply RCT). Consider
now,
yi = β0 + δDi + x′i β + ui ,
where xi = (xi1 , . . . , xip )′ is a vector of observable variables that are assumed to enter linearly
in the outcome equation. In this model, we are not interested in the “nuisance” parameters
β0 and β. The only parameter of interest is the treatment effect, δ.
The OLS estimator is still consistent. We can show consistency using the limit of the
objective function,

\[
S_n(\beta_0, \delta, \beta) = \sum_{i=1}^{n} (y_i - \beta_0 - \delta D_i - x_i' \beta)^2 ,
\]

which, once scaled by 1/n, converges as n → ∞ to

S(β0 , δ, β) = E(yi − β0 − δDi − x′i β)2 . (5)

Let β̃0 = β0 + µ′x β and µx = E(xi ). We can write (5) as,

S(β0 , δ, β) = E(yi − β̃0 − δDi − (xi − µx )′ β)2 . (6)

It follows that,

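\[
\begin{aligned}
S(\beta_0, \delta, \beta) = {} & E(y_i - \tilde\beta_0 - \delta D_i)^2 + E((x_i - \mu_x)' \beta)^2 \\
& - 2 E\left( (y_i - \tilde\beta_0 - \delta D_i)(x_i - \mu_x)' \beta \right) .
\end{aligned}
\]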

By definition, we have that E(β̃0 (xi − µx )′ β) = 0, and by random sampling and random
assignment,
E(δDi (xi − µx )′ β) = 0.

Therefore, we have that

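\[
\begin{aligned}
S(\beta_0, \delta, \beta) = {} & E(y_i - \tilde\beta_0 - \delta D_i)^2 + E((x_i - \mu_x)' \beta)^2 \\
& - 2 E\left( y_i (x_i - \mu_x)' \beta \right) .
\end{aligned}
\]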

The parameter δ only enters in the first term, so the argument that minimizes the OLS
objective function in the limit, S(β0 , δ, β), is obtained considering only the first term on the
right hand side of the equation, E(yi − β̃0 − δDi )2 . Therefore, as before,

δ = E(yi |Di = 1) − E(yi |Di = 0). (7)

We cannot say, however, that adding covariates to the model does not affect
the performance of the OLS estimator. Adding covariates to the model affects the variance
of the OLS estimator, because we use residuals for estimating σ² (up to a degrees-of-freedom normalization):

\[
\hat\sigma^2 = \sum_{i=1}^{n} \hat u_i^2 = \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\delta D_i - x_i' \hat\beta)^2 .
\]

Naturally, the sum of squared residuals in a model with covariates is
smaller than in a model that does not include additional controls, provided that the included
covariates have some predictive power. The caveat here is that if you include a large number
of covariates, the variance might increase, since you are likely to estimate effects that are
equal to zero.
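The effect of a predictive covariate on precision can be illustrated with simulated randomized data. In the sketch below the helper `ols`, the coefficient values, the sample sizes, and the seed are all assumptions introduced for illustration. The covariate is generated independently of the treatment, as random assignment would imply.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients and homoscedastic standard errors."""
    n, k = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (n - k)          # residual variance with df correction
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return coef, np.sqrt(np.diag(cov))

rng = np.random.default_rng(3)                # arbitrary seed
n, n1 = 400, 200                              # illustrative sizes (assumption)
d = np.zeros(n)
d[rng.choice(n, size=n1, replace=False)] = 1.0
x = rng.normal(size=n)                        # covariate, independent of d by design
y = 1.0 + 2.0 * d + 1.5 * x + rng.normal(size=n)   # hypothetical delta = 2

ones = np.ones(n)
coef_short, se_short = ols(np.column_stack([ones, d]), y)      # no covariates
coef_long, se_long = ols(np.column_stack([ones, d, x]), y)     # with covariate

print("no covariates :", coef_short[1], se_short[1])
print("with covariate:", coef_long[1], se_long[1])
```

Both regressions target the same δ, but the standard error should be smaller in the regression that includes the predictive covariate, because its residual variance is smaller.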

2 Estimation under quasi-experimental variation


Suppose Yb is the outcome of interest observed before a program is implemented and Ya is the
outcome of interest observed after the program is implemented. Suppose that D now indicates
individuals under treatment (note that D = 0 before the program is implemented).
Let D ∈ {0, 1}, T ∈ {b, a} and

Y = β0 + β1 D + β2 T + θ(D × T ) + U. (8)

It follows that

θ = (E(Ya |D = 1) − E(Ya |D = 0)) − (E(Yb |D = 1) − E(Yb |D = 0)), (9)

which is again identified based on the existence of conditional mean functions.

The unknown θ is interpreted as a difference in differences: the gap in the mean of Y between
people who are on treatment (D = 1) and people who are in the control group (D = 0) after the
program, minus the same gap before the program.
The parameter of interest can be estimated by Method of Moments or a linear regression
with dummy variables.
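The equivalence between the two estimation routes can be sketched as follows. The data are simulated, with T coded as 0 for the "before" period b and 1 for the "after" period a. That coding, the coefficient values, and the seed are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(4)            # arbitrary seed
n = 1000                                  # illustrative size (assumption)
d = rng.integers(0, 2, size=n)            # group indicator D
t = rng.integers(0, 2, size=n)            # period indicator: 0 = before, 1 = after
# Hypothetical coefficients: beta0 = 1, beta1 = 0.5, beta2 = -0.3, theta = 2
y = 1.0 + 0.5 * d - 0.3 * t + 2.0 * d * t + rng.normal(size=n)

def cell_mean(dv, tv):
    """Sample mean of y in the cell D = dv, T = tv."""
    return y[(d == dv) & (t == tv)].mean()

# Method of moments: replace the four conditional means in (9) by sample means
theta_mm = (cell_mean(1, 1) - cell_mean(0, 1)) - (cell_mean(1, 0) - cell_mean(0, 0))

# Linear regression with dummy variables, as in (8)
X = np.column_stack([np.ones(n), d, t, d * t])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
theta_ols = coef[3]

print(theta_mm, theta_ols)                # the two estimates coincide
```

The regression is saturated in D and T, so its fitted values are the four cell means and the coefficient on the interaction reproduces the difference in differences of sample means.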
This approach has been popular since the work by Card and Krueger (1994) on minimum
wages. They investigate the change in employment due to the increase in the minimum wage
in New Jersey. They use Pennsylvania (PA), a neighboring state, to identify the change in
employment that New Jersey (NJ) would have experienced if the minimum wage in New
Jersey had not been increased. The data are available from Professor Card’s website at Berkeley.
By estimating the sample mean of employment before and after the change in the
minimum wage, we obtain:

       Before    After
PA     23.33     21.17
NJ     20.43     21.02

and therefore,

θ̂ = (21.02 − 20.43) − (21.17 − 23.33) = 2.75.
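The arithmetic can be reproduced from the four sample means in the table. The snippet below is a minimal sketch; the dictionary layout and variable names are illustrative choices, while the four numbers are those reported above.

```python
# Mean employment by state and period, taken from the table above
means = {
    ("PA", "before"): 23.33, ("PA", "after"): 21.17,
    ("NJ", "before"): 20.43, ("NJ", "after"): 21.02,
}

change_nj = means[("NJ", "after")] - means[("NJ", "before")]   # change in the treated state
change_pa = means[("PA", "after")] - means[("PA", "before")]   # change in the control state
did = change_nj - change_pa                                    # difference in differences

# Same estimate written as in (9): after-period gap minus before-period gap
gap_after = means[("NJ", "after")] - means[("PA", "after")]
gap_before = means[("NJ", "before")] - means[("PA", "before")]
did_alt = gap_after - gap_before

print(round(change_nj, 2), round(change_pa, 2), round(did, 2))  # prints: 0.59 -2.16 2.75
print(round(did_alt, 2))                                        # prints: 2.75
```

Employment rose slightly in New Jersey and fell in Pennsylvania, so the estimate is positive mostly because of the decline in the control state.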

The result was controversial and led to a significant discussion in academia and among the
general public. How should we interpret the results, and how general are they? The results were
weakly significant, as they use only a small sample of restaurants located in these two states. As
in many causal inference papers, one needs to be careful about how much to extrapolate to other
settings. But the message of the paper is that small increases in the minimum wage might not
lead to decreases in employment.
