Diff PDF
Section Notes
GSI: David Albouy
The purpose of program evaluation is to find a "good" estimate $\hat{\delta}$ of $\delta$, given the data that we have
available.
Example 1 Card and Krueger (1994, AER) in "Minimum Wages and Employment: A Case Study of the
Fast-Food Industry in New Jersey and Pennsylvania" try to evaluate the effect of the minimum wage (the
treatment) on employment (the outcome). On April 1, 1992, New Jersey’s minimum wage rose from $4.25
to $5.05 per hour. To evaluate the impact of the law, the authors surveyed 410 fast-food restaurants in New
Jersey (the treatment group) and eastern Pennsylvania (the control group) before and after the rise. $Y_i$ is
the employment of a fast-food restaurant, $T_i$ is an indicator equal to 1 if the restaurant is in New Jersey,
and $t_i$ is an indicator equal to 1 if the observation is from after the minimum wage hike.
1. The model in equation (Outcome) is correctly specified. For example, the additive structure imposed
is correct.
2. The error term is on average zero: $E[\varepsilon_i] = 0$. This is not a demanding assumption once the
constant term $\alpha$ is included.
3. The error term is uncorrelated with the other variables in the equation:
$\mathrm{cov}(\varepsilon_i, T_i) = 0$
$\mathrm{cov}(\varepsilon_i, t_i) = 0$
$\mathrm{cov}(\varepsilon_i, T_i \cdot t_i) = 0$
The last of these assumptions, also known as the parallel-trend assumption, is the most critical.
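For reference, one specification consistent with the three covariance conditions above is sketched below. The outcome equation itself is not reproduced in this part of the notes, so this is a reconstruction under that assumption, not a quotation:

```latex
Y_i = \alpha + \beta T_i + \gamma t_i + \delta \, (T_i \cdot t_i) + \varepsilon_i
```

In this reading, $\beta$ is the permanent difference between the treatment and control groups, $\gamma$ is the time trend common to both groups, and $\delta$ is the treatment effect.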
Under these assumptions we can use equation (Outcome) to determine that the expected values of the average
outcomes are given by
$E[\bar{Y}_0^T] = \alpha + \beta$
$E[\bar{Y}_1^T] = \alpha + \beta + \gamma + \delta$
$E[\bar{Y}_0^C] = \alpha$
$E[\bar{Y}_1^C] = \alpha + \gamma$
which means that this estimator will be biased so long as $\gamma \neq 0$, i.e. if a time trend exists in the outcome $Y_i$
then we will mistake the time trend for part of the treatment effect.
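The bias can be read off directly from the group expectations. The short derivation below assumes the pre-versus-post estimator is the treatment group's post-period mean minus its pre-period mean, written here as $\hat{\delta}_1$; this matches the footnote describing a regression on the treatment group only.

```latex
E[\hat{\delta}_1]
  = E[\bar{Y}_1^T] - E[\bar{Y}_0^T]
  = (\alpha + \beta + \gamma + \delta) - (\alpha + \beta)
  = \gamma + \delta
```

The bias is therefore exactly the time trend $\gamma$.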
1 This would be the estimate one would get from an OLS estimate on a regression equation of the form
$Y_i = \alpha_1 + \delta_1 t_i + \varepsilon_i$
on the sample from the treatment group only.
2.2 Simple Treatment versus Control Estimator
Next consider the estimator based on comparing the average difference in outcome Yi post-treatment, between
the treatment and control groups, ignoring pre-treatment outcomes.2
and so this estimator is biased so long as $\beta \neq 0$, i.e. there exist permanent average differences in outcome $Y_i$
between the treatment and control groups. The true treatment effect will be confounded by permanent differences
between the treatment and control groups that existed prior to any treatment. Note that in a randomized experiment,
where subjects are randomly assigned to treatment and control groups, $\beta$ should be zero, as both groups
should be nearly identical. In that case this estimator may perform well, but such a controlled experimental setting
is typically unavailable in the program evaluation problems seen in economics.
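The same check works for the treatment versus control comparison. The derivation below assumes the estimator is the post-period treatment mean minus the post-period control mean, written here as $\hat{\delta}_2$:

```latex
E[\hat{\delta}_2]
  = E[\bar{Y}_1^T] - E[\bar{Y}_1^C]
  = (\alpha + \beta + \gamma + \delta) - (\alpha + \gamma)
  = \beta + \delta
```

The time trend $\gamma$ cancels, but the permanent group difference $\beta$ remains as bias.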
This estimator can be seen as taking the difference between two pre-versus-post estimators of the kind seen above in
(D1), subtracting the control group's estimator, which captures the time trend $\gamma$, from the treatment group's
estimator to get $\delta$. We can also rearrange terms in equation (DD) to get
$\hat{\delta}_{DD} = (\bar{Y}_1^T - \bar{Y}_1^C) - (\bar{Y}_0^T - \bar{Y}_0^C)$,
which can be interpreted as taking the difference of two estimators of the simple treatment versus control
type seen in equation (D2). The difference estimator for the pre-period is used to estimate the permanent
difference $\beta$, which is then subtracted from the post-period estimator to get $\delta$.
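The three estimators can be compared numerically. The sketch below is illustrative only: the parameter values and the helper names `cell_means` and `estimators` are hypothetical, and the outcomes are generated without noise from an additive model with a group effect, a time effect, and an interaction, so each estimator equals its expectation exactly.

```python
from statistics import mean

def cell_means(rows):
    """Average outcome in each (group, period) cell.

    rows: iterable of (y, T, t), with T = 1 for the treatment group
    and t = 1 for the post-treatment period.
    """
    cells = {}
    for y, T, t in rows:
        cells.setdefault((T, t), []).append(y)
    return {key: mean(vals) for key, vals in cells.items()}

def estimators(rows):
    """Return (pre-versus-post, treatment-versus-control, difference in difference)."""
    m = cell_means(rows)
    pre_post = m[(1, 1)] - m[(1, 0)]        # treatment group only
    treat_control = m[(1, 1)] - m[(0, 1)]   # post-treatment period only
    dd = (m[(1, 1)] - m[(1, 0)]) - (m[(0, 1)] - m[(0, 0)])
    return pre_post, treat_control, dd

# Hypothetical parameter values, chosen only for illustration.
alpha, beta, gamma, delta = 20.0, -3.0, -2.0, 1.5

# Noise-free outcomes: five identical observations per cell.
rows = [(alpha + beta * T + gamma * t + delta * T * t, T, t)
        for T in (0, 1) for t in (0, 1) for _ in range(5)]

pre_post, treat_control, dd = estimators(rows)
print("pre versus post:        ", pre_post)       # equals gamma + delta
print("treatment versus control:", treat_control)  # equals beta + delta
print("difference in difference:", dd)             # equals delta
```

Only the difference in difference recovers `delta`; the other two pick up `gamma` and `beta` respectively.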
Another interpretation of the difference in difference estimator is that it is a simple difference estimator
between the actual $\bar{Y}_1^T$ and the $\bar{Y}_1^T$ that would occur in the post-treatment period for the treatment group
had there been no treatment, $\bar{Y}_{cf}^T = \bar{Y}_0^T + (\bar{Y}_1^C - \bar{Y}_0^C)$, where the subscript "cf" refers to the term
"counterfactual," so that $\hat{\delta}_{DD} = \bar{Y}_1^T - \bar{Y}_{cf}^T$. This observation $\bar{Y}_{cf}^T$, which has expectation
$E[\bar{Y}_{cf}^T] = \alpha + \beta + \gamma$, does not exist: it is literally "contrary to fact," since there actually was a
treatment. However, if our assumptions are correct we can construct a legitimate estimate of $\bar{Y}_{cf}^T$ by taking
the pre-treatment average $\bar{Y}_0^T$ and adding our estimate of $\gamma$, obtained from the pre versus post difference
for the control group.
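The counterfactual construction is a two-step calculation on the four group means. A minimal sketch follows; the function names and the numeric means are hypothetical and chosen only for illustration.

```python
def counterfactual_post_mean(y0_t, y0_c, y1_c):
    """Post-period treatment-group mean had there been no treatment:
    the pre-period treatment mean plus the control group's change over time."""
    return y0_t + (y1_c - y0_c)

def dd_via_counterfactual(y0_t, y1_t, y0_c, y1_c):
    """Difference in difference as actual minus counterfactual post-period mean."""
    return y1_t - counterfactual_post_mean(y0_t, y0_c, y1_c)

# Hypothetical group means (not taken from any study).
y0_t, y1_t = 17.0, 16.5   # treatment group: pre, post
y0_c, y1_c = 20.0, 18.0   # control group: pre, post

y_cf = counterfactual_post_mean(y0_t, y0_c, y1_c)
dd_cf = dd_via_counterfactual(y0_t, y1_t, y0_c, y1_c)
print("counterfactual post mean:", y_cf)
print("difference in difference:", dd_cf)
```

The result agrees with the usual form `(y1_t - y0_t) - (y1_c - y0_c)`, since the two expressions are rearrangements of each other.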
2 This would be the estimate one would get from an OLS estimate on a regression equation of the form
$Y_i = \alpha_2 + \delta_2 T_i + \varepsilon_i$
on the post-treatment samples only.
3 This would be the estimate one would get from an OLS estimate of a regression equation of the form given by (Outcome).
Notice that the first row ends with the estimate $\hat{\delta}_1$, the second column ends with the estimate $\hat{\delta}_2$, and the lower
right-hand corner entry gives the estimate $\hat{\delta}_{DD}$.
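The difference in difference of cell means can also be obtained by regression. The sketch below assumes a regression of the outcome on a constant, the group indicator, the period indicator, and their interaction, and checks that the OLS coefficient on the interaction equals the difference in difference of the four cell means. The data and the helpers `solve` and `ols` are hypothetical; the solver is a plain normal-equations implementation meant for illustration, not for numerical work.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * y_i for row, y_i in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

# Hypothetical observations (y, T, t): two per group-period cell.
data = [(19.0, 0, 0), (21.0, 0, 0),   # control, pre
        (17.0, 0, 1), (19.0, 0, 1),   # control, post
        (16.0, 1, 0), (18.0, 1, 0),   # treatment, pre
        (16.0, 1, 1), (17.0, 1, 1)]   # treatment, post

X = [[1.0, T, t, T * t] for _, T, t in data]
y = [obs[0] for obs in data]
a_hat, b_hat, g_hat, d_hat = ols(X, y)

def cell_mean(T, t):
    vals = [obs[0] for obs in data if obs[1] == T and obs[2] == t]
    return sum(vals) / len(vals)

dd_means = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))
print("interaction coefficient:", d_hat)
print("DD of cell means:       ", dd_means)
```

Because the four regressors saturate the two-by-two design, the fitted values are the cell means, which is why the two numbers coincide.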
Example 2 According to the model by Card and Krueger (1994), comparisons of employment growth at
stores in New Jersey and Pennsylvania (where the minimum wage was constant) provide simple estimates
of the effect of the higher minimum wage. Some of the results from Table 3 are shown below with the average
employment in the fast-food restaurants, with standard errors in parentheses.
The difference in difference estimator shows a small increase in employment in New Jersey, where the
minimum wage increased. This came as quite a shock to most economists, who thought employment would fall.
Notice that prior to the increase in the minimum wage Pennsylvania had higher employment
than New Jersey, and that Pennsylvania's employment was bound to fall to a lower level. This may be a failure of the
parallel trend assumption. However, the small, albeit insignificant, increase in employment in New Jersey makes it hard to
accept the hypothesis that employment actually decreased in New Jersey over this time. Although still
somewhat controversial, this study helped change the common presupposition that a small change in the minimum
wage from a low level was bound to cause a significant decrease in employment.
The failure of the parallel trend assumption may in fact be a relatively common problem in many program
evaluation studies, causing many difference in difference estimators to be biased.
One way to help avoid these problems is to get more data on other time periods before and after treatment
to see if there are any pre-existing differences in trends. It may also be possible to find other control
groups, which can provide additional information on underlying trends. There is a huge literature on this subject;
a good place to start is Meyer (1995).
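One simple way to use an extra pre-treatment period is a placebo comparison: compute the difference in difference between two periods in which no treatment occurred. Under parallel trends it should be close to zero. The sketch below uses hypothetical group means indexed by period (periods -1 and 0 are pre-treatment, period 1 is post-treatment); it is an illustration of the idea, not a procedure taken from Meyer (1995).

```python
def diff_in_diff(pre_t, post_t, pre_c, post_c):
    """Treatment group's change minus control group's change."""
    return (post_t - pre_t) - (post_c - pre_c)

# Hypothetical group means by period (not taken from any study).
treat = {-1: 18.0, 0: 17.0, 1: 16.5}
control = {-1: 21.0, 0: 20.0, 1: 18.0}

# Placebo: both periods precede treatment, so a nonzero value
# signals that the groups were already on different trends.
placebo = diff_in_diff(treat[-1], treat[0], control[-1], control[0])

# Actual estimate: last pre-treatment period versus post-treatment period.
actual = diff_in_diff(treat[0], treat[1], control[0], control[1])

print("placebo DD (pre-periods only):", placebo)
print("actual DD:", actual)
```

A placebo estimate far from zero would suggest the actual estimate partly reflects diverging trends and not only the treatment.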