Distribution Regression Difference-In-Differences

Iván Fernández-Val∗ Jonas Meier† Aico van Vuuren‡ Francis Vella§


arXiv:2409.02311v1 [econ.EM] 3 Sep 2024

September 5, 2024

Abstract

We provide a simple distribution regression estimator for treatment effects in the difference-in-differences (DiD) design. Our procedure is particularly useful when the treatment effect differs across the distribution of the outcome variable. Our proposed estimator easily incorporates covariates and, importantly, can be extended to settings where the treatment potentially affects the joint distribution of multiple outcomes. Our key identifying restriction is that the counterfactual distribution of the treated in the untreated state has no interaction effect between treatment and time. This assumption results in a parallel trend assumption on a transformation of the distribution. We highlight the relationship between our procedure and assumptions with the changes-in-changes approach of Athey and Imbens (2006). We also reexamine two existing empirical examples which highlight the utility of our approach.


∗ Boston University.
† Swiss National Bank.
‡ University of Groningen.
§ Georgetown University.
1 Introduction
The remarkable popularity of the difference-in-differences (DiD) estimator, inspired by an approach to evaluating the impact of policy interventions on economic outcomes introduced by David Card (see, for example, Card, 1990, and Card and Krueger, 1994), is one of the most striking features of empirical work on treatment and policy effects. While the
methodological innovations in this literature (see Arkhangelsky and Imbens, 2023 for a
recent review) include the use of constructed control groups, the staggered timing of treat-
ments, and fuzzy rather than sharp designs, the vast majority of the associated empirical
work has estimated the mean effect of the treatment on a single economic outcome. This
seems somewhat limited and a fuller evaluation of a policy treatment would be based on
an examination of the marginal and joint distributions of all outcomes it potentially in-
fluences. This paper provides a simple procedure for estimating distributional treatment
effects in the presence of a single treatment when the outcomes of interest are potentially
multivariate.
An initial methodological innovation focusing on distributional effects in DiD estima-
tion is the changes-in-changes procedure of Athey and Imbens (2006), which estimates
the counterfactual distribution of the treated group in the absence of treatment to com-
pare with its observed distribution in the presence of treatment. Torous et al. (2024)
extend the Athey and Imbens approach (2006) to the multiple outcome setting. Other
work has adapted DiD estimation to examine the treatment effects at different quantiles
of the outcome via the use of quantile regression. This includes, for example, Callaway
and Li (2018, 2019). In contrast, Dube (2019), Goodman-Bacon (2021), and Goodman-
Bacon and Schmidt (2020) employ conventional DiD estimation to explore the impact of
the treatment at different points of the outcome distribution. Other distributional ap-
proaches include Kim and Wooldridge (2023) and Biewen, Fitzenberger, and Rümmele
(2022). The former proposes an inverse probability weighting based procedure, while the
latter employs a distribution regression (DR) approach. In this paper we also adopt a DR approach to constructing counterfactuals. In contrast to Biewen, Fitzenberger, and Rümmele (2022), who construct the counterfactual distributions via linear probability models, we employ non-linear link functions such as the probit or logit. This has a number of advantages, which we discuss below. In addition, we provide the associated identifying conditions required for this implementation of DR-DiD.
While DiD has typically been employed to evaluate the treatment effect on a specified
economic outcome, there are many instances in which the treatment may affect multiple
outcomes. For example, a change in tax rates on earnings of married couples may affect the
hours of work of both husbands and wives. An analysis of such a tax change should include
the impact on each of the outcomes. However, a richer analysis would not only examine
the impact on the respective marginal hours distribution of husbands and wives but also
the joint distribution of hours. Alternatively, while evaluations of minimum wage laws
typically evaluate their impact on employment, they may also affect the wage distribution.
We illustrate how this joint effect can be evaluated via the bivariate distribution regression
(BDR) approach of Fernandez-Val et al. (2024a). This requires that we first estimate
the joint distribution by BDR and then construct the appropriate counterfactual. The
treatment effects are obtained via the appropriate comparisons. An alternative to this
approach is extending the changes-in-changes procedure to multiple outcomes as is done
in Torous et al. (2024).
The following section introduces the model and provides an analysis of the univariate
case without covariates. We also extend our analysis to include covariates and contrast
our approach with the Athey and Imbens (2006) changes-in-changes procedure. Section
3 extends our analysis to the multiple outcome case and Section 4 discusses estimation.
Section 5 provides two empirical illustrations of our methodology. We first revisit the Malesky et al. (2014) investigation of the impact of recentralization in Vietnam. We also employ the data studied in Callaway and Li (2019), supplemented with data from the Bureau of Labor Statistics and the Census, to explore the impact of changes in the minimum wage on the joint distributions of average wages, poverty rates and unemployment rates. Section 6 concludes.
2 Econometric analysis of the univariate case
Consider the standard DiD design with 2 periods, T ∈ {0, 1}, and 2 groups, G ∈ {0, 1} in
which a binary treatment, D ∈ {0, 1}, is administered only to the treatment group with
G = 1 in the second period T = 1. Let Y0 and Y1 denote the potential outcomes under
the non-treated and treated statuses. The observed outcome is Y = Y0 (1 − D) + Y1 D,
which corresponds to Y0 for both groups at T = 0, Y0 for G = 0 at T = 1, and Y1 for
G = 1 at T = 1. Note that this implicitly imposes a non-anticipation assumption as we
do not distinguish between the outcomes of the treated and non-treated state for G = 1
in period T = 0.
We are interested in the distributions of the potential outcomes of the treated at
T = 1, that is FY1 | G,T (y | 1, 1) and FY0 | G,T (y | 1, 1). FY1 | G,T (y | 1, 1) is identified from the
observed outcome for G = 1 at T = 1,

FY1 | G,T (y | 1, 1) = FY | G,T (y | 1, 1);

whereas FY0 | G,T (y | 1, 1) is not identified without further assumptions.


The distribution of Y0 conditional on G and T can be written as:

FY0 | G,T (y | g, t) = Λ(α(y) + β(y)t + γ(y)g + δ(y)gt), y ∈ R, (1)

where Λ is an invertible CDF such as the logistic, normal or uniform, and y ↦ (α(y), β(y), γ(y), δ(y)) is a vector of function-valued parameters.
The representation in (1) does not make any parametric assumption about the under-
lying distribution of Y0 | G, T since the dummy variable representation within the paren-
theses on the right-hand side is fully saturated. The parameters of the representation are
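local as they vary with y. To understand why (1) does not impose any restriction, note that α(y), β(y), γ(y) and δ(y) can be defined as (see footnote 1):

$$
\begin{aligned}
\alpha(y) &= \Lambda^{-1}\big(F_{Y_0 \mid G,T}(y \mid 0,0)\big), \\
\beta(y) &= \Lambda^{-1}\big(F_{Y_0 \mid G,T}(y \mid 0,1)\big) - \Lambda^{-1}\big(F_{Y_0 \mid G,T}(y \mid 0,0)\big), \\
\gamma(y) &= \Lambda^{-1}\big(F_{Y_0 \mid G,T}(y \mid 1,0)\big) - \Lambda^{-1}\big(F_{Y_0 \mid G,T}(y \mid 0,0)\big), \\
\delta(y) &= \Big[\Lambda^{-1}\big(F_{Y_0 \mid G,T}(y \mid 1,1)\big) - \Lambda^{-1}\big(F_{Y_0 \mid G,T}(y \mid 1,0)\big)\Big] \\
&\quad - \Big[\Lambda^{-1}\big(F_{Y_0 \mid G,T}(y \mid 0,1)\big) - \Lambda^{-1}\big(F_{Y_0 \mid G,T}(y \mid 0,0)\big)\Big].
\end{aligned}
$$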
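The saturation argument can be checked numerically. The sketch below is only an illustration under assumptions that are not in the paper: the four CDF values at a fixed y are made-up numbers, the link is taken to be logistic, and the function name `saturated_coefficients` is hypothetical. It computes (α, β, γ, δ) from the definitions above and confirms that plugging them back into (1) reproduces all four cell probabilities.

```python
import numpy as np

def logit(p):
    """Inverse of the logistic CDF (the link inverse, Lambda^{-1})."""
    return np.log(p / (1.0 - p))

def expit(x):
    """Logistic CDF (the link, Lambda)."""
    return 1.0 / (1.0 + np.exp(-x))

def saturated_coefficients(F):
    """F[g, t] is the CDF of Y0 at one fixed y for group g and period t."""
    alpha = logit(F[0, 0])
    beta = logit(F[0, 1]) - logit(F[0, 0])
    gamma = logit(F[1, 0]) - logit(F[0, 0])
    delta = (logit(F[1, 1]) - logit(F[1, 0])) - (logit(F[0, 1]) - logit(F[0, 0]))
    return alpha, beta, gamma, delta

# Hypothetical CDF values at a single y (illustrative numbers only).
F = np.array([[0.30, 0.45],
              [0.55, 0.80]])
alpha, beta, gamma, delta = saturated_coefficients(F)

# Rebuild every cell from representation (1).
rebuilt = np.array([[expit(alpha + beta * t + gamma * g + delta * g * t)
                     for t in (0, 1)] for g in (0, 1)])
```

Because the index has four free coefficients for four cells, `rebuilt` matches `F` for any choice of probabilities strictly between 0 and 1, which is why (1) is a representation rather than a model.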

We make the following identifying assumptions:

Assumption 1 [No-interaction].

δ(y) = 0 for all y ∈ R in (1).

Let Yd (G = g, T = t) denote the support of Yd | G = g, T = t, for d, g, t ∈ {0, 1}. We also assume:

Assumption 2 [Support].

Y0 (G = 1; T = 1) ⊆ Y0 (G = 0; T = 1) ∪ Y0 (G = 1; T = 0) ∪ Y0 (G = 0; T = 0).

Assumption 1 implies that the distribution of the potential outcome Y0 should not
change differently in the second period for the treatment group compared to the control
group. That is, we allow a difference between the distributions of the potential outcome Y0
between the treatment and control group, but this difference should be identical in both
periods. This is a parallel trend type assumption on a transformation of the distribution
and can be written as:
 
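$$
\Lambda^{-1}\big(F_{Y_0 \mid G,T}(y \mid 1,1)\big) - \Lambda^{-1}\big(F_{Y_0 \mid G,T}(y \mid 1,0)\big) = \Lambda^{-1}\big(F_{Y_0 \mid G,T}(y \mid 0,1)\big) - \Lambda^{-1}\big(F_{Y_0 \mid G,T}(y \mid 0,0)\big).
$$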

This assumption is sensitive to the link function and imposes restrictions on the distribution $F_{Y_0 \mid G,T}$ for some link functions. For example, if Λ is the identity link used in the linear probability model as in, for example, Almond et al. (2011), Dube (2019), Cengiz et al. (2019), Goodman-Bacon and Schmidt (2020), Goodman-Bacon (2021) and Biewen et al. (2022), one needs strong requirements in order to satisfy the parallel trends assumption (Blundell et al., 2004, and Wooldridge, 2023). That is, we need restrictions on the tails of $F_{Y_0 \mid G,T}(y \mid 1,0)$, $F_{Y_0 \mid G,T}(y \mid 0,1)$ and $F_{Y_0 \mid G,T}(y \mid 0,0)$ to guarantee that $F_{Y_0 \mid G,T}(y \mid 1,1)$ is between 0 and 1. Thus, it requires that $F_{Y_0 \mid G,T}(y \mid 1,0) \le 1 + F_{Y_0 \mid G,T}(y \mid 0,0) - F_{Y_0 \mid G,T}(y \mid 0,1)$, which might be restrictive at the top of the distribution, and $F_{Y_0 \mid G,T}(y \mid 1,0) \ge F_{Y_0 \mid G,T}(y \mid 0,0) - F_{Y_0 \mid G,T}(y \mid 0,1)$, which might be restrictive at the bottom of the distribution.2,3 Link functions such as the normal or logistic CDFs do not require such restrictions since the transformation $\Lambda^{-1}$ expands the range of the distribution to the entire real line.

1 See also Wooldridge (2023), equations (2.6) and (2.7).

Assumption 1 cannot be empirically verified but when we have multiple observations in
the pre-treatment period, it is possible to examine whether the “parallel trends” assump-
tion holds pre-treatment. Assumption 2 is a restriction of the support of the counterfactual
outcome of Y0 for the treated group in the treated period.
These two assumptions identify FY0 | G,T (y | 1, 1) since:

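$$
\begin{aligned}
F_{Y_0 \mid G,T}(y \mid 1,1) &= \Lambda\big(\alpha(y) + \beta(y) + \gamma(y)\big) \\
&= \Lambda\Big(\Lambda^{-1}\big(F_{Y_0 \mid G,T}(y \mid 1,0)\big) + \Lambda^{-1}\big(F_{Y_0 \mid G,T}(y \mid 0,1)\big) - \Lambda^{-1}\big(F_{Y_0 \mid G,T}(y \mid 0,0)\big)\Big) \\
&= \Lambda\Big(\Lambda^{-1}\big(F_{Y \mid G,T}(y \mid 1,0)\big) + \Lambda^{-1}\big(F_{Y \mid G,T}(y \mid 0,1)\big) - \Lambda^{-1}\big(F_{Y \mid G,T}(y \mid 0,0)\big)\Big),
\end{aligned}
\tag{2}
$$

under Assumption 1. The support restrictions in Assumption 2 ensure that the term inside the outer parentheses in (2) is well defined. Note that as $\lim_{x \to \infty} \Lambda(x) = 1$ and $\lim_{x \to -\infty} \Lambda(x) = 0$, our assumptions are sufficient but not necessary.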
We present this identification result in the following lemma:

2 These requirements could be used to develop a specification test for the identity link. Roth and Sant’Anna (2023) proposed a test for the sharp hypothesis that $y \mapsto F_{Y_0 \mid G,T}(y \mid 1,0) + F_{Y_0 \mid G,T}(y \mid 0,1) - F_{Y_0 \mid G,T}(y \mid 0,0)$ be weakly increasing, which can be adapted to our setting. We do not pursue this route as we do not encourage the use of the linear probability model.
3 For example, an increase of 0.2 in probability over time might be realistic for the control group when the initial probability was 0.5. However, if the treatment group has a probability of, for example, 0.9 in the first period, then it is not possible for the common trends assumption to hold.

Lemma 1 [Identification with Single Outcome]. $y \mapsto F_{Y_0 \mid G,T}(y \mid 1,1)$ is identified on y ∈ R under Assumptions 1 and 2.

Proof of Lemma 1. The results follow from equation (2).
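A minimal sketch of how equation (2) could be turned into a plug-in calculation without covariates, assuming a logistic link and replacing each observed distribution by its empirical CDF. The function names, the clipping constant `eps`, and the simulated location-shift design are illustrative assumptions rather than part of the paper.

```python
import numpy as np

def ecdf(sample, grid):
    """Empirical CDF of `sample` evaluated at each point of `grid`."""
    sample = np.sort(np.asarray(sample, dtype=float))
    return np.searchsorted(sample, grid, side="right") / sample.size

def counterfactual_cdf(y10, y01, y00, grid, eps=1e-6):
    """Equation (2) with a logistic link and empirical CDFs.

    y10: outcomes of the treatment group in period 0
    y01: outcomes of the control group in period 1
    y00: outcomes of the control group in period 0
    """
    def logit(p):
        p = np.clip(p, eps, 1.0 - eps)  # keep the link inverse finite (assumption)
        return np.log(p / (1.0 - p))

    index = logit(ecdf(y10, grid)) + logit(ecdf(y01, grid)) - logit(ecdf(y00, grid))
    return 1.0 / (1.0 + np.exp(-index))

# Simulated design in which the no-interaction assumption holds by construction:
# Y0 | G=g, T=t is logistic with location 0.5*t + 1.0*g, so delta(y) = 0.
rng = np.random.default_rng(0)
n = 200_000
y00 = rng.logistic(loc=0.0, scale=1.0, size=n)
y01 = rng.logistic(loc=0.5, scale=1.0, size=n)
y10 = rng.logistic(loc=1.0, scale=1.0, size=n)

grid = np.linspace(-2.0, 5.0, 15)
F0_hat = counterfactual_cdf(y10, y01, y00, grid)
F0_true = 1.0 / (1.0 + np.exp(-(grid - 1.5)))  # logistic CDF with location 1.5
```

In this design the counterfactual distribution of the treated in period 1 is logistic with location 1.5, so `F0_hat` should be close to `F0_true` up to sampling error.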

2.1 Inclusion of Covariates


Including covariates is appealing as the assumption that δ(y) = 0 may be harder to defend
when there are differences in the trend between covariates and/or the composition of the
treatment group changes over time in terms of observed characteristics; see also Melly
and Santangelo (2015). Covariates are easily incorporated into the identification result by
conditioning on them and adding an overlapping support assumption. Specifically, let X
be a vector of covariates such that the non-interaction assumption holds conditional on
X; see Assumption 3. The distribution of Y0 conditional on G, T and X can be written:

FY0 | G,T,X (y | g, t, x) = Λ(α(y, x) + β(y, x)t + γ(y, x)g + δ(y, x)gt), y ∈ R, (3)

where (y, x) ↦ (α(y, x), β(y, x), γ(y, x), δ(y, x)) is a vector of unspecified functions.
The identifying assumptions with covariates become:

Assumption 3 [No-interaction with Covariates].

δ(y, X) = 0 almost surely for all y ∈ R in (3).

Assumption 4 [Support conditions with Covariates].

Y0 (G = 1; T = 1; X) ⊆ Y0 (G = 0; T = 1; X)∪Y0 (G = 1; T = 0; X)∪Y0 (G = 0; T = 0; X),

almost surely.

These two assumptions identify FY0 | G,T,X (y | 1, 1, x) since

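$$
\begin{aligned}
F_{Y_0 \mid G,T,X}(y \mid 1,1,x) &= \Lambda\big(\alpha(y,x) + \beta(y,x) + \gamma(y,x)\big) \\
&= \Lambda\Big(\Lambda^{-1}\big(F_{Y_0 \mid G,T,X}(y \mid 1,0,x)\big) + \Lambda^{-1}\big(F_{Y_0 \mid G,T,X}(y \mid 0,1,x)\big) - \Lambda^{-1}\big(F_{Y_0 \mid G,T,X}(y \mid 0,0,x)\big)\Big) \\
&= \Lambda\Big(\Lambda^{-1}\big(F_{Y \mid G,T,X}(y \mid 1,0,x)\big) + \Lambda^{-1}\big(F_{Y \mid G,T,X}(y \mid 0,1,x)\big) - \Lambda^{-1}\big(F_{Y \mid G,T,X}(y \mid 0,0,x)\big)\Big),
\end{aligned}
\tag{4}
$$

under Assumption 3. The support restrictions in Assumption 4 ensure that the term inside the outer parentheses in (4) is well defined. Note that as $\lim_{x \to \infty} \Lambda(x) = 1$ and $\lim_{x \to -\infty} \Lambda(x) = 0$, our assumptions are sufficient but not necessary.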
Let X11 denote the support of X conditional on G = 1 and T = 1. The following lemma states that $F_{Y_0 \mid G,T,X}$ is identified under the previous assumptions.

Lemma 2 [Identification with Covariates]. Under Assumptions 3 and 4, $(y, x) \mapsto F_{Y_0 \mid G,T,X}(y \mid 1,1,x)$ is identified on (y, x) ∈ R × X11.

Proof of Lemma 2. The result follows from equation (4).

We can then identify FY0 | G,T (y | 1, 1) as:


$$
F_{Y_0 \mid G,T}(y \mid 1,1) = \int_{\mathcal{X}_{11}} F_{Y_0 \mid G,T,X}(y \mid 1,1,x)\, dF_{X \mid G,T}(x \mid 1,1). \tag{5}
$$

2.2 Comparison with Changes-In-Changes (Athey and Imbens, 2006)
As our proposals provide an alternative approach to the changes-in-changes (CiC) proce-
dure of Athey and Imbens (2006), it is useful to contrast their set up and assumptions
with ours. CiC assumes that the outcome of an individual without treatment satisfies the
relationship Y0 = h(U, T ) for the treatment and control groups, where U is an unobserved
and uniformly distributed random variable. It also assumes that h is strictly increasing in its first argument and that the distribution of U is independent of time given the group, i.e. U ⊥⊥ T | G. Finally, the support of U for the treated population should be a subset of that of the untreated population. The final assumption implies in terms of the support of the outcomes that:

Y0 (G = 1, T = 0) ⊆ Y0 (G = 0, T = 0),

Y0 (G = 1, T = 1) ⊆ Y0 (G = 0, T = 1).

Their second support restriction is less restrictive than ours but we do not need their first support restriction. The previous assumptions identify the quantile function of $F_{Y_0 \mid G,T}(y \mid 1,1)$ as:

$$
F^{-1}_{Y_0 \mid G,T}(u \mid 1,1) = \varphi\big(F^{-1}_{Y_0 \mid G,T}(u \mid 1,0)\big), \qquad \varphi(y) := F^{-1}_{Y_0 \mid G,T}\big(F_{Y_0 \mid G,T}(y \mid 0,0) \mid 0,1\big), \qquad u \in [0,1],
$$

where it is assumed that Y0 is continuous with strictly increasing distribution function.


The transformation φ gives the second period outcome for an individual with an unob-
served component u such that h(u, 0) = y, with y the location at which the distribution
function is evaluated (Athey and Imbens, 2006, page 441). Hence, their identification result follows since φ evaluated at the first period observations of the treatment group has the same distribution as the untreated outcome of the treatment group in the second period. Their assumptions imply the transformation φ that maps quantiles
of Y0 from period 0 to period 1 is the same for the treatment and control groups. This
condition imposes the following restrictions on the coefficients of the representation of the
conditional distribution in (1):

α(y) = α(φ(y)) + β(φ(y)), γ(y) = γ(φ(y)) + δ(φ(y)).

To see this, note that:

FY0 | G,T (y | g, 0) = FY0 | G,T (h(h−1 (y, 0), 1) | g, 1). (6)

Evaluating (6) at g = 0 and applying $F^{-1}_{Y_0 \mid G,T}(\cdot \mid 0,1)$ to both sides:

$$
h(h^{-1}(y, 0), 1) = F^{-1}_{Y_0 \mid G,T}\big(F_{Y_0 \mid G,T}(y \mid 0,0) \mid 0,1\big) =: \varphi(y).
$$

Replacing φ(y) back in (6) and using the representation (1):

Λ(α(y) + γ(y)g) = Λ(α(φ(y)) + β(φ(y)) + γ(φ(y))g + δ(φ(y))g).

The restrictions then follow from equalizing the coefficients on both sides.4 They would complicate estimation in our framework as they involve two different levels of Y and the transformation φ needs to be estimated.

4 Only the second restriction is binding because α(y) = α(φ(y)) + β(φ(y)) holds by definition of φ(y).

2.3 Comparison with Roth and Sant’Anna (2023)
Roth and Sant’Anna (2023) derive the condition:

FY0 | G,T (y | 1, 1) − FY0 | G,T (y | 1, 0) = FY0 | G,T (y | 0, 1) − FY0 | G,T (y | 0, 0), y ∈ R,

for the parallel trends assumption in expectations:

E(Y0 | G = 1, T = 1) − E(Y0 | G = 1, T = 0) = E(Y0 | G = 0, T = 1) − E(Y0 | G = 0, T = 0),

to be invariant to strictly monotone transformations of Y0. This condition is different from our no-interaction assumption. Indeed, our DR model with no-interaction does not generally satisfy the parallel trends assumption in expectation as:

$$
E(Y_0 \mid G = g, T = 1) - E(Y_0 \mid G = g, T = 0) = \int_{-\infty}^{\infty} \big[\Lambda(\alpha(y) + \gamma(y) g) - \Lambda(\alpha(y) + \beta(y) + \gamma(y) g)\big]\, dy
$$

depends on g unless Λ is the identity map, or β(y) = 0 (no trend) or γ(y) = 0 (random
assignment) for y ∈ R. Roth and Sant’Anna (2023) show that their condition holds if and
only if there are no trends, random assignment or a mixture of both. Our model, how-
ever, generally satisfies a different invariance property with respect to strictly monotonic
transformations that we specify in subsection 2.4.
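The dependence of the mean trend on g can be illustrated numerically. The following sketch evaluates the integral above for one hypothetical choice of coefficient functions with a logistic link: α(y) = y, β(y) = 0.5y and γ(y) = −1. These functions are assumptions chosen so that every cell has a valid logistic CDF; they do not come from the paper. In this design Y0 is logistic with location g in period 0 and location g/1.5 in period 1, so the mean trend is −g/3.

```python
import numpy as np

def expit(x):
    """Logistic CDF."""
    return 1.0 / (1.0 + np.exp(-x))

def mean_trend(g, alpha, beta, gamma, y):
    """E(Y0 | G=g, T=1) - E(Y0 | G=g, T=0) under the no-interaction DR model,
    computed as the integral of F(y | g, 0) - F(y | g, 1) by the trapezoid rule."""
    f_t0 = expit(alpha(y) + gamma(y) * g)
    f_t1 = expit(alpha(y) + beta(y) + gamma(y) * g)
    integrand = f_t0 - f_t1
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(y)))

# Hypothetical coefficient functions (illustration only).
alpha = lambda y: y
beta = lambda y: 0.5 * y
gamma = lambda y: -np.ones_like(y)

y = np.linspace(-80.0, 80.0, 400_001)
trend_control = mean_trend(0, alpha, beta, gamma, y)
trend_treated = mean_trend(1, alpha, beta, gamma, y)
```

Here `trend_control` is approximately 0 while `trend_treated` is approximately −1/3, so parallel trends in means fails even though the no-interaction condition holds on the logit scale.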

2.4 Invariance to Strictly Monotonic Transformations


The DR model in (1) with no-interaction is invariant to strictly monotonic transformations
in the sense that we specify here. If Y0 follows the DR model and satisfies the no-
interaction assumption, then Ỹ0 = h(Y0 ) also follows the DR model and satisfies the
no-interaction assumption for any strictly monotonic transformation h. To see this result
note that if h is strictly increasing:

FỸ0 | G,T,X (ỹ | g, t, x) = Λ(α(h−1 (ỹ)) + β(h−1 (ỹ))t + γ(h−1 (ỹ))g) = Λ(α̃(ỹ) + β̃(ỹ)t + γ̃(ỹ)g),

where $\tilde{y} \mapsto h^{-1}(\tilde{y})$ is the inverse function of $y \mapsto h(y)$, $\tilde{\alpha} = \alpha \circ h^{-1}$, $\tilde{\beta} = \beta \circ h^{-1}$ and $\tilde{\gamma} = \gamma \circ h^{-1}$. A similar argument applies when h is strictly decreasing. Unlike the parallel trends in expectation, the no-interaction or parallel trends in distribution is invariant to strictly monotonic transformations.5

3 Multiple Outcomes
Some settings may feature multiple outcomes which are potentially affected by the treat-
ment. In these situations we might be interested in not only how each of the outcomes
are affected by the treatment, but also how the relationship between the outcomes is
affected by the treatment. For this it is necessary to identify the joint distribution of
the potential outcomes with and without the treatment. We now consider a setting with
two outcomes Y and Z and we focus on comparing features of the joint distribution of
the potential outcomes with the treatment, Y1 and Z1 , and the joint distribution of the
potential outcomes without the treatment, Y0 and Z0 , for the treated group G = 1 in the
post-treatment period T = 1. For the sake of illustration we consider two measures of dependence, namely Spearman’s and Kendall’s rank correlations.
Spearman’s rank correlation between Yd and Zd , d ∈ {0, 1}, can be expressed:

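$$
\begin{aligned}
\rho[Y_d, Z_d \mid G = 1, T = 1] &= \mathrm{Corr}\big[F_{Y_d \mid G,T}(Y_d \mid 1,1),\, F_{Z_d \mid G,T}(Z_d \mid 1,1) \mid G = 1, T = 1\big] \\
&= 12 \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \big[F_{Y_d \mid G,T}(y \mid 1,1) - 1/2\big]\big[F_{Z_d \mid G,T}(z \mid 1,1) - 1/2\big]\, F_{Y_d,Z_d \mid G,T}(dy, dz \mid 1,1);
\end{aligned}
$$

and Kendall’s rank correlation between Yd and Zd , d ∈ {0, 1}, can be expressed:

$$
\tau[Y_d, Z_d \mid G = 1, T = 1] = 4 \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \big[F_{Y_d,Z_d \mid G,T}(y, z \mid 1,1) - 1/4\big]\, F_{Y_d,Z_d \mid G,T}(dy, dz \mid 1,1),
$$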

where we assume that Yd and Zd are continuous random variables to obtain the expressions
on the right hand side.
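As a rough illustration of the two expressions above, the sketch below replaces the marginal and joint distributions by empirical CDFs of a sample and evaluates the integrals as sample averages. The function name `rank_correlations` is hypothetical, and because the plug-in uses empirical CDFs it differs slightly in finite samples from the conventional tie-corrected statistics.

```python
import numpy as np

def rank_correlations(y, z):
    """Plug-in versions of the integral expressions for Spearman's rho and
    Kendall's tau, with empirical CDFs in place of the population CDFs."""
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float)
    le_y = y[None, :] <= y[:, None]        # le_y[i, j] = 1(y_j <= y_i)
    le_z = z[None, :] <= z[:, None]        # le_z[i, j] = 1(z_j <= z_i)
    f_y = le_y.mean(axis=1)                # marginal ECDF of Y at y_i
    f_z = le_z.mean(axis=1)                # marginal ECDF of Z at z_i
    f_yz = (le_y & le_z).mean(axis=1)      # joint ECDF at (y_i, z_i)
    spearman = 12.0 * np.mean((f_y - 0.5) * (f_z - 0.5))
    kendall = 4.0 * np.mean(f_yz - 0.25)
    return float(spearman), float(kendall)

# Perfectly increasing and perfectly decreasing dependence as sanity checks.
base = np.arange(500, dtype=float)
rho_up, tau_up = rank_correlations(base, 2.0 * base + 1.0)
rho_down, tau_down = rank_correlations(base, -base)
```

The pairwise comparison uses memory of order n², which is acceptable for a sketch but would need a rank-based implementation for large samples.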
As in the univariate case, $F_{Y_1,Z_1 \mid G,T}(y, z \mid 1,1)$ is identified by the joint distribution of the observed outcomes, $F_{Y,Z \mid G,T}(y, z \mid 1,1)$, whereas $F_{Y_0,Z_0 \mid G,T}(y, z \mid 1,1)$ is not identified from the data. To analyze identification, we use a variation of the local Gaussian representation (LGR) of a bivariate distribution from Chernozhukov, Fernández-Val and Luo (2018). Let Φ denote the Gaussian distribution function and $\Phi_2(\cdot, \cdot; \rho)$ denote the distribution of the bivariate standard normal with correlation parameter ρ. Moreover, Λ is, again, a strictly increasing cumulative distribution function. As we show in Section 4, there is a benefit of using the logistic link function in our univariate analysis. Accordingly, we employ this in our empirical analysis for estimating both the univariate and bivariate effects.

5 The distributional approach of Kim and Wooldridge (2023) also satisfies this property.

Lemma 3 [LGR with non-Normal Marginals]. The joint distribution of two random variables Y and Z conditional on X can be represented by:

$$
F_{Y,Z \mid X}(y, z \mid x) \equiv \Phi_2\big(\Phi^{-1}(\Lambda(\mu_{Y \mid X}(y \mid x))),\, \Phi^{-1}(\Lambda(\mu_{Z \mid X}(z \mid x)));\, \rho_{Y,Z \mid X}(y, z \mid x)\big),
$$

for all y, z, x, where $\mu_{Y \mid X}(y \mid x) = \Lambda^{-1}(F_{Y \mid X}(y \mid x))$, $\mu_{Z \mid X}(z \mid x) = \Lambda^{-1}(F_{Z \mid X}(z \mid x))$, and $\rho_{Y,Z \mid X}(y, z \mid x)$ is the unique solution in ρ to the equation:

$$
F_{Y,Z \mid X}(y, z \mid x) = \Phi_2\big(\Phi^{-1}(F_{Y \mid X}(y \mid x)),\, \Phi^{-1}(F_{Z \mid X}(z \mid x));\, \rho\big).
$$

Proof. The proof is identical to the proof of Lemma 2.1 of Chernozhukov, Fernández-Val and Luo (2018) using:

$$
\Phi^{-1}(\Lambda(\mu_{Y \mid X}(y \mid x))) = \Phi^{-1}(F_{Y \mid X}(y \mid x))
$$

and

$$
\Phi^{-1}(\Lambda(\mu_{Z \mid X}(z \mid x))) = \Phi^{-1}(F_{Z \mid X}(z \mid x)).
$$

The difference between Lemma 3 and the LGR of Chernozhukov, Fernández-Val and Luo (2018) is that the marginals are represented by a general link rather than Gaussian links, that is:

$$
F_{Y \mid X}(y \mid x) \equiv \Lambda(\mu_{Y \mid X}(y \mid x)), \qquad F_{Z \mid X}(z \mid x) \equiv \Lambda(\mu_{Z \mid X}(z \mid x)).
$$
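To make the defining equation for the local correlation concrete, the sketch below solves it for ρ at a single point (y, z), given the two marginal probabilities and the joint probability. It is a sketch under stated assumptions, not the authors' implementation: the bivariate normal CDF is computed by one-dimensional numerical integration, the root is found by bisection, and all function names and tuning constants are hypothetical. Since $\Phi^{-1}(\Lambda(\mu)) = \Phi^{-1}(F)$, the link Λ drops out and only the probabilities are needed.

```python
from math import erf, sqrt, pi
from statistics import NormalDist
import numpy as np

_erf = np.vectorize(erf)

def norm_cdf(x):
    """Standard normal CDF applied elementwise."""
    return 0.5 * (1.0 + _erf(np.asarray(x, dtype=float) / sqrt(2.0)))

def bvn_cdf(a, b, rho, lower=-9.0, n_nodes=4001):
    """Phi_2(a, b; rho) = int_{-inf}^{a} phi(x) Phi((b - rho x) / sqrt(1 - rho^2)) dx,
    approximated by the trapezoid rule on [lower, a] (assumes a > lower)."""
    x = np.linspace(lower, a, n_nodes)
    phi = np.exp(-0.5 * x ** 2) / sqrt(2.0 * pi)
    integrand = phi * norm_cdf((b - rho * x) / sqrt(1.0 - rho ** 2))
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(x)))

def local_correlation(f_y, f_z, f_yz, n_iter=50):
    """Solve f_yz = Phi_2(Phi^{-1}(f_y), Phi^{-1}(f_z); rho) for rho by bisection,
    using the fact that Phi_2 is strictly increasing in rho."""
    a = NormalDist().inv_cdf(f_y)
    b = NormalDist().inv_cdf(f_z)
    lo, hi = -0.999, 0.999
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if bvn_cdf(a, b, mid) < f_yz:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Independence implies a zero local correlation.
rho_indep = local_correlation(0.3, 0.6, 0.3 * 0.6)
# At the medians, Phi_2(0, 0; rho) = 1/4 + arcsin(rho) / (2 pi), so 1/3 maps to rho = 0.5.
rho_half = local_correlation(0.5, 0.5, 1.0 / 3.0)
```

The search interval excludes ±1, so joint probabilities at the Fréchet bounds are mapped to the endpoints of the interval rather than to exact perfect dependence.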

By the LGR, FY0 ,Z0 | G,T can be expressed as:

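$$
F_{Y_0,Z_0 \mid G,T}(y, z \mid g, t) \equiv \Phi_2\big(\Phi^{-1}(\Lambda(\mu_{Y_0 \mid G,T}(y \mid g, t))),\, \Phi^{-1}(\Lambda(\mu_{Z_0 \mid G,T}(z \mid g, t)));\, \rho_{Y_0,Z_0 \mid G,T}(y, z \mid g, t)\big), \tag{7}
$$

where $\mu_{Y_0 \mid G,T}(y \mid g, t) = \alpha_Y(y) + \beta_Y(y) t + \gamma_Y(y) g + \delta_Y(y) g t$, $\mu_{Z_0 \mid G,T}(z \mid g, t) = \alpha_Z(z) + \beta_Z(z) t + \gamma_Z(z) g + \delta_Z(z) g t$, and $\rho_{Y_0,Z_0 \mid G,T}(y, z \mid g, t) = \alpha_{Y,Z}(y, z) + \beta_{Y,Z}(y, z) t + \gamma_{Y,Z}(y, z) g + \delta_{Y,Z}(y, z) g t$. In the LGR, the marginals are represented by: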

FY0 | G,T (y | g, t) = Λ(αY (y) + βY (y)t + γY (y)g + δY (y)gt),

and
FZ0 | G,T (z | g, t) = Λ(αZ (z) + βZ (z)t + γZ (z)g + δZ (z)gt).

We make the following identifying assumptions with respect to the distribution func-
tion in (7).

Assumption 5 [Bivariate No-interaction].

δY (y) = δZ (z) = δY,Z (y, z) = 0 for all (y, z) ∈ R2 in (7).

Let YZ d (G = g, T = t) denote the support of (Yd , Zd ) | G = g, T = t, for d, g, t ∈ {0, 1}. We also assume:

Assumption 6 [Bivariate Support].

YZ 0 (G = 1; T = 1) ⊆ YZ 0 (G = 0; T = 1) ∪ YZ 0 (G = 1; T = 0) ∪ YZ 0 (G = 0; T = 0).

Lemma 4 [Identification with Two Outcomes]. $(y, z) \mapsto F_{Y_0,Z_0 \mid G,T}(y, z \mid 1,1)$ is identified on R2 under Assumptions 5 and 6.

Proof of Lemma 4. Under the assumptions of the Lemma, $\mu_{Y_0 \mid G,T}(y \mid g, t) = \alpha_Y(y) + \beta_Y(y) t + \gamma_Y(y) g$, $\mu_{Z_0 \mid G,T}(z \mid g, t) = \alpha_Z(z) + \beta_Z(z) t + \gamma_Z(z) g$, and $\rho_{Y_0,Z_0 \mid G,T}(y, z \mid g, t) = \alpha_{Y,Z}(y, z) + \beta_{Y,Z}(y, z) t + \gamma_{Y,Z}(y, z) g$.
The parameters αY (y), βY (y), γY (y), αZ (z), βZ (z), and γZ (z) are identified from the
marginals of Y and Z, by Lemma 1. The parameter αY,Z (y, z) is identified as the solution
in α to:

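$$
F_{Y,Z \mid G,T}(y, z \mid 0,0) = \Phi_2\big(\Phi^{-1}(\Lambda(\alpha_Y(y))),\, \Phi^{-1}(\Lambda(\alpha_Z(z)));\, \alpha\big).
$$

This solution exists and is unique because the RHS is strictly increasing in α. The parameters βY,Z (y, z) and γY,Z (y, z) are identified similarly as the solutions in β and γ of:

$$
F_{Y,Z \mid G,T}(y, z \mid 0,1) = \Phi_2\big(\Phi^{-1}(\Lambda(\alpha_Y(y) + \beta_Y(y))),\, \Phi^{-1}(\Lambda(\alpha_Z(z) + \beta_Z(z)));\, \alpha_{Y,Z}(y, z) + \beta\big),
$$

and

$$
F_{Y,Z \mid G,T}(y, z \mid 1,0) = \Phi_2\big(\Phi^{-1}(\Lambda(\alpha_Y(y) + \gamma_Y(y))),\, \Phi^{-1}(\Lambda(\alpha_Z(z) + \gamma_Z(z)));\, \alpha_{Y,Z}(y, z) + \gamma\big).
$$

Finally,

$$
F_{Y_0,Z_0 \mid G,T}(y, z \mid 1,1) = \Phi_2\big(\Phi^{-1}(\Lambda(\alpha_Y(y) + \beta_Y(y) + \gamma_Y(y))),\, \Phi^{-1}(\Lambda(\alpha_Z(z) + \beta_Z(z) + \gamma_Z(z)));\, \alpha_{Y,Z}(y, z) + \beta_{Y,Z}(y, z) + \gamma_{Y,Z}(y, z)\big).
$$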

Covariates can be incorporated in a similar fashion as the univariate case. In particular, the LGR of $F_{Y_0,Z_0 \mid G,T,X}$ becomes:

$$
F_{Y_0,Z_0 \mid G,T,X}(y, z \mid g, t, x) \equiv \Phi_2\big(\Phi^{-1}(\Lambda(\mu_{Y_0 \mid G,T,X}(y \mid g, t, x))),\, \Phi^{-1}(\Lambda(\mu_{Z_0 \mid G,T,X}(z \mid g, t, x)));\, \rho_{Y_0,Z_0 \mid G,T,X}(y, z \mid g, t, x)\big), \tag{8}
$$

where $\mu_{Y_0 \mid G,T,X}(y \mid g, t, x) = \alpha_Y(y, x) + \beta_Y(y, x) t + \gamma_Y(y, x) g + \delta_Y(y, x) g t$, $\mu_{Z_0 \mid G,T,X}(z \mid g, t, x) = \alpha_Z(z, x) + \beta_Z(z, x) t + \gamma_Z(z, x) g + \delta_Z(z, x) g t$, and $\rho_{Y_0,Z_0 \mid G,T,X}(y, z \mid g, t, x) = \alpha_{Y,Z}(y, z, x) + \beta_{Y,Z}(y, z, x) t + \gamma_{Y,Z}(y, z, x) g + \delta_{Y,Z}(y, z, x) g t$.
The identifying assumptions with covariates become:

Assumption 7 [Bivariate No-interaction with Covariates].

δY (y, X) = δZ (z, X) = δY,Z (y, z, X) = 0 almost surely for all (y, z) ∈ R2 in (8).

Assumption 8 [Bivariate Support with Covariates].

YZ 0 (G = 1; T = 1; X) ⊆

YZ 0 (G = 0; T = 1; X) ∪ YZ 0 (G = 1; T = 0; X) ∪ YZ 0 (G = 0; T = 0; X),

almost surely.

Lemma 5 [Identification with Two Outcomes and Covariates]. Under Assumptions 7 and 8, $(y, z, x) \mapsto F_{Y_0,Z_0 \mid G,T,X}(y, z \mid 1,1,x)$ is identified on R2 × X11 .

The marginalized distribution $F_{Y_0,Z_0 \mid G,T}(y, z \mid 1,1)$ is then identified by

$$
F_{Y_0,Z_0 \mid G,T}(y, z \mid 1,1) = \int_{\mathcal{X}_{11}} F_{Y_0,Z_0 \mid G,T,X}(y, z \mid 1,1,x)\, dF_{X \mid G,T}(x \mid 1,1).
$$

4 Estimation

4.1 Univariate Case


Assume we have a sample $\{(Y_i, X_i, G_i, T_i) : 1 \le i \le N\}$ of (Y, X, G, T ). For estimation, we replace the functions (y, x) ↦ (α(y, x), β(y, x), γ(y, x)) in (3) by semiparametric linear indexes leading to the DR model for the conditional distribution:

$$
F_{Y_0 \mid G,T,X}(y \mid g, t, x) = \Lambda\big(p_\alpha(x)'\alpha(y) + p_\beta(x)'\beta(y) t + p_\gamma(x)'\gamma(y) g\big), \quad y \in \mathbb{R}, \tag{9}
$$

where $p_\alpha(x)$, $p_\beta(x)$ and $p_\gamma(x)$ are vectors including the covariates and their transformations, and y ↦ (α(y), β(y), γ(y)) is a vector of function-valued parameters.
We implement the DR DiD estimator via the sequence of logit models at each point
of the distribution of the outcome variable (Foresi and Peracchi, 1995, Chernozhukov,
Fernandez-Val and Melly, 2013). We choose logit because it is the canonical link for binary
outcomes allowing for pooled estimation of the distributions of the potential outcomes
with and without the treatment (Wooldridge, 2023). Accordingly, we estimate the DR
model for the observed outcomes on all observations including those with Di = 1:

$$
F_{Y \mid G,T,X}(y \mid g, t, x) = \Lambda\big(p_\alpha(x)'\alpha(y) + p_\beta(x)'\beta(y) t + p_\gamma(x)'\gamma(y) g + p_\theta(x)'\theta(y) g t\big), \quad y \in \mathcal{Y}, \tag{10}
$$

where $p_\alpha(x)$, $p_\beta(x)$, $p_\gamma(x)$ and $p_\theta(x)$ are vectors including a constant as the first component and transformations of the covariates, and $\mathcal{Y}$ is a finite grid on R. Let $I_i^y := 1(Y_i \le y)$ and $\bar{I}_i^y = 1 - I_i^y$.

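Algorithm 1 [Univariate Estimator]. 1. Estimate the parameters of model (10) by distribution regression, that is, for $y \in \mathcal{Y}$,

$$
(\hat{\alpha}(y), \hat{\beta}(y), \hat{\gamma}(y), \hat{\theta}(y)) \in \arg\max_{a,b,c,d} \sum_{i=1}^{N} \ell_i(a, b, c, d),
$$

$$
\begin{aligned}
\ell_i(a, b, c, d) &= I_i^y \log \Lambda\big(p_\alpha(X_i)'a + p_\beta(X_i)'b\, T_i + p_\gamma(X_i)'c\, G_i + p_\theta(X_i)'d\, G_i T_i\big) \\
&\quad + \bar{I}_i^y \log \Lambda\big(-p_\alpha(X_i)'a - p_\beta(X_i)'b\, T_i - p_\gamma(X_i)'c\, G_i - p_\theta(X_i)'d\, G_i T_i\big).
\end{aligned}
$$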

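2. Construct plug-in estimators of the distributions of the potential outcomes

$$
\hat{F}_{Y_0 \mid G,T}(y \mid 1,1) = \frac{1}{N_{11}} \sum_{i=1}^{N} G_i T_i\, \Lambda\big(p_\alpha(X_i)'\hat{\alpha}(y) + p_\beta(X_i)'\hat{\beta}(y) + p_\gamma(X_i)'\hat{\gamma}(y)\big),
$$

and

$$
\hat{F}_{Y_1 \mid G,T}(y \mid 1,1) = \frac{1}{N_{11}} \sum_{i=1}^{N} G_i T_i\, \Lambda\big(p_\alpha(X_i)'\hat{\alpha}(y) + p_\beta(X_i)'\hat{\beta}(y) + p_\gamma(X_i)'\hat{\gamma}(y) + p_\theta(X_i)'\hat{\theta}(y)\big),
$$

where $N_{11} = \sum_{i=1}^{N} G_i T_i$.

3. If needed, rearrange the estimates $y \mapsto \hat{F}_{Y_d \mid G,T}(y \mid 1,1)$ on $y \in \mathcal{Y}$, d ∈ {0, 1}, to make them increasing.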

By the properties of the logistic link, the estimator of FY1 | G,T (y | 1, 1) is identical to
the empirical distribution of Y conditional on G = 1 and T = 1,
$$
\hat{F}_{Y_1 \mid G,T}(y \mid 1,1) \equiv \frac{1}{N_{11}} \sum_{i=1}^{N} G_i T_i\, 1(Y_i \le y).
$$

Note that this estimator is therefore invariant to the specification of pθ (x). We set pθ (x) =
1 to speed up computation. Estimators of the functionals of the distributions of potential
outcomes such as quantile functions and effects can be constructed using the plug-in
principle.
Our algorithm has an advantage over the alternative of estimating the coefficients on $p_\alpha$, $p_\beta$ and $p_\gamma$ using only those observations for which Di = 0 via direct estimation of (10). The distributional treatment effect, i.e.

$$
E_X\big[F_{Y_1 \mid G,T,X}(y \mid 1,1,X) - F_{Y_0 \mid G,T,X}(y \mid 1,1,X)\big],
$$

equals the estimated average derivative with respect to Di in the logit model used for distribution regression. This estimate of the average derivative and its standard error are reported by many software packages (Wooldridge, 2024).
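The three steps of Algorithm 1 can be sketched in a few lines. This is an illustrative sketch, not the authors' code: it assumes $p_\alpha(x) = p_\beta(x) = p_\gamma(x) = (1, x')'$ and $p_\theta(x) = 1$, fits each logit by a hand-written Newton-Raphson routine (`fit_logit`, a hypothetical helper with a tiny ridge term added to the Hessian only for numerical stability), and performs the rearrangement step by sorting the estimates over the grid. The simulated data at the end are likewise made up for the example.

```python
import numpy as np

def expit(x):
    """Logistic CDF."""
    return 1.0 / (1.0 + np.exp(-x))

def fit_logit(W, b, n_iter=50, ridge=1e-8, tol=1e-10):
    """Logit maximum likelihood by Newton-Raphson (hypothetical helper)."""
    coef = np.zeros(W.shape[1])
    for _ in range(n_iter):
        p = expit(W @ coef)
        grad = W.T @ (b - p)
        hess = (W * (p * (1.0 - p))[:, None]).T @ W + ridge * np.eye(W.shape[1])
        step = np.linalg.solve(hess, grad)
        coef = coef + step
        if np.max(np.abs(step)) < tol:
            break
    return coef

def dr_did(y, g, t, X, grid):
    """Sketch of Algorithm 1 with p_alpha = p_beta = p_gamma = (1, x), p_theta = 1."""
    n = y.size
    P = np.column_stack([np.ones(n), X])
    W = np.column_stack([P, P * t[:, None], P * g[:, None], g * t])
    treated = (g * t) == 1
    W1 = W[treated]
    W0 = W1.copy()
    W0[:, -1] = 0.0                      # switch off the G*T term for Y0
    F0 = np.empty(grid.size)
    F1 = np.empty(grid.size)
    for j, point in enumerate(grid):
        coef = fit_logit(W, (y <= point).astype(float))   # step 1
        F0[j] = expit(W0 @ coef).mean()                   # step 2, untreated state
        F1[j] = expit(W1 @ coef).mean()                   # step 2, treated state
    return np.sort(F0), np.sort(F1)                       # step 3 on the grid

# Simulated example (illustrative design; the treatment shifts the outcome by 1).
rng = np.random.default_rng(1)
n = 4000
g = rng.integers(0, 2, size=n)
t = rng.integers(0, 2, size=n)
x = rng.normal(size=n)
y = 0.5 * t + 1.0 * g + 0.8 * x + 1.0 * g * t + rng.logistic(size=n)
grid = np.quantile(y, np.linspace(0.2, 0.8, 7))
F0_hat, F1_hat = dr_did(y, g, t, x[:, None], grid)
ecdf_treated = np.array([(y[(g * t) == 1] <= point).mean() for point in grid])
```

Because the design contains the indicator $G_i T_i$, the logit score equations force the average fitted probability in the treated cell to equal the cell's empirical CDF, so `F1_hat` coincides with `ecdf_treated`, as noted in the text.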

4.2 Bivariate Case


Assume we have a sample {(Yi , Zi , Xi , Gi , Ti ) : 1 ≤ i ≤ N} of (Y, Z, X, G, T ). For
estimation, as in the univariate case, we replace the functions in µY0 | G,T,X , µZ0 | G,T,X and
ρY,Z | G,T,X by semiparametric generalized linear indexes leading to a bivariate distribution
regression (BDR) model:

$$
\mu_{Y_0 \mid G,T,X}(y \mid g, t, x) = p_\alpha(x)'\alpha_Y(y) + p_\beta(x)'\beta_Y(y) t + p_\gamma(x)'\gamma_Y(y) g, \tag{11}
$$

$$
\mu_{Z_0 \mid G,T,X}(z \mid g, t, x) = q_\alpha(x)'\alpha_Z(z) + q_\beta(x)'\beta_Z(z) t + q_\gamma(x)'\gamma_Z(z) g, \tag{12}
$$

and

$$
\rho_{Y,Z \mid G,T,X}(y, z \mid g, t, x) = h\big(r_\alpha(x)'\alpha_{Y,Z}(y, z) + r_\beta(x)'\beta_{Y,Z}(y, z) t + r_\gamma(x)'\gamma_{Y,Z}(y, z) g\big), \tag{13}
$$

where $p_\alpha(x)$, $p_\beta(x)$, $p_\gamma(x)$, $q_\alpha(x)$, $q_\beta(x)$, $q_\gamma(x)$, $r_\alpha(x)$, $r_\beta(x)$ and $r_\gamma(x)$ are vectors including the covariates and their transformations, and $h(u) = \tanh(u)$ is the inverse of the Fisher transformation, which enforces $\rho_{Y,Z \mid G,T,X}$ to lie in [−1, 1].
We estimate all the parameters of FY0 ,Z0 | G,T (y, z | 1, 1) using the bivariate distribution
regression estimator of Fernandez-Val et al. (2024a). We employ an imputation method
that combines the parameter estimates from the sample of the first period for both groups
and the sample of the second period for the untreated group, with the sample of the
covariates in the second period for the treated group. The distribution FY1 ,Z1 | G,T (y, z | 1, 1)
is estimated using the empirical distribution of Y and Z in the second period for the
treated group. Algorithm 2 describes the estimation procedure. Let $\mathcal{Y}$ and $\mathcal{Z}$ be finite grids on R, $I_i^y := 1(Y_i \le y)$, $\bar{I}_i^y = 1 - I_i^y$, $J_i^z := 1(Z_i \le z)$, $\bar{J}_i^z = 1 - J_i^z$.

Algorithm 2 [Bivariate Estimator]. 1. Estimate the parameters of (11) and (12) us-
ing Algorithm 1 on y ∈ Y and z ∈ Z. Obtain

m̂Yi (y) = Φ−1 (Λ(pα (Xi )′ α̂Y (y) + pβ (Xi )′ β̂Y (y) Ti + pγ (Xi )′ γ̂Y (y) Gi )),

and

m̂Zi (z) = Φ−1 (Λ(qα (Xi )′ α̂Z (z) + qβ (Xi )′ β̂Z (z) Ti + qγ (Xi )′ γ̂Z (z) Gi )),

where α̂Y (y), β̂Y (y), γ̂Y (y), α̂Z (z), β̂Z (z) and γ̂Z (z) are the estimates of αY (y),
βY (y), γY (y), αZ (z), βZ (z) and γZ (z) obtained from Algorithm 1.

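2. Estimate the parameters of (13) by BDR, that is, for $y \in \mathcal{Y}$ and $z \in \mathcal{Z}$,

$$
(\hat{\alpha}_{Y,Z}(y, z), \hat{\beta}_{Y,Z}(y, z), \hat{\gamma}_{Y,Z}(y, z)) \in \arg\max_{a,b,c} \sum_{i=1}^{N} (1 - G_i T_i)\, \ell_i(a, b, c),
$$

$$
\begin{aligned}
\ell_i(a, b, c) &= I_i^y J_i^z \log \Phi_2\big(\hat{m}_i^Y(y),\, \hat{m}_i^Z(z);\, h(r_\alpha(X_i)'a + r_\beta(X_i)'b\, T_i + r_\gamma(X_i)'c\, G_i)\big) \\
&\quad + I_i^y \bar{J}_i^z \log \Phi_2\big(\hat{m}_i^Y(y),\, -\hat{m}_i^Z(z);\, -h(r_\alpha(X_i)'a + r_\beta(X_i)'b\, T_i + r_\gamma(X_i)'c\, G_i)\big) \\
&\quad + \bar{I}_i^y J_i^z \log \Phi_2\big(-\hat{m}_i^Y(y),\, \hat{m}_i^Z(z);\, -h(r_\alpha(X_i)'a + r_\beta(X_i)'b\, T_i + r_\gamma(X_i)'c\, G_i)\big) \\
&\quad + \bar{I}_i^y \bar{J}_i^z \log \Phi_2\big(-\hat{m}_i^Y(y),\, -\hat{m}_i^Z(z);\, h(r_\alpha(X_i)'a + r_\beta(X_i)'b\, T_i + r_\gamma(X_i)'c\, G_i)\big).
\end{aligned}
$$
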
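3. Construct plug-in estimators of the distributions of the potential outcomes

$$
\hat{F}_{Y_0,Z_0 \mid G,T}(y, z \mid 1,1) = \frac{1}{N_{11}} \sum_{i=1}^{N} G_i T_i\, \Phi_2\big(\hat{n}_i^Y(y),\, \hat{n}_i^Z(z);\, \hat{n}_i^{Y,Z}(y, z)\big),
$$

and

$$
\hat{F}_{Y_1,Z_1 \mid G,T}(y, z \mid 1,1) = \frac{1}{N_{11}} \sum_{i=1}^{N} G_i T_i\, 1(Y_i \le y, Z_i \le z),
$$

where

$$
\begin{aligned}
\hat{n}_i^Y(y) &= \Phi^{-1}\big(\Lambda(p_\alpha(X_i)'\hat{\alpha}_Y(y) + p_\beta(X_i)'\hat{\beta}_Y(y) + p_\gamma(X_i)'\hat{\gamma}_Y(y))\big), \\
\hat{n}_i^Z(z) &= \Phi^{-1}\big(\Lambda(q_\alpha(X_i)'\hat{\alpha}_Z(z) + q_\beta(X_i)'\hat{\beta}_Z(z) + q_\gamma(X_i)'\hat{\gamma}_Z(z))\big), \\
\hat{n}_i^{Y,Z}(y, z) &= h\big(r_\alpha(X_i)'\hat{\alpha}_{Y,Z}(y, z) + r_\beta(X_i)'\hat{\beta}_{Y,Z}(y, z) + r_\gamma(X_i)'\hat{\gamma}_{Y,Z}(y, z)\big),
\end{aligned}
$$

and $N_{11} = \sum_{i=1}^{N} G_i T_i$.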

Estimators of the functionals of the joint distributions of potential outcomes, such
as Spearman's and Kendall's rank correlation coefficients, can be constructed using the
plug-in principle.

4.3 Bootstrap Inference


The estimators described in Algorithms 1 and 2 can be applied to panel and repeated
cross-section data. Here we describe a weighted bootstrap algorithm to perform inference
on functions of the distributions of potential outcomes designed for panel data. We focus
on this case because it is relevant for the empirical applications in Section 5.
To describe the procedure, we need to introduce an indicator $ID_i$, $i = 1, \ldots, N$,
for the units in the panel. For example, if the sample is sorted by unit and time period,
$ID = (1, 1, 2, 2, \ldots, n, n)$, where $n = N/2$. The following algorithm describes the weighted
bootstrap procedure to construct joint confidence bands for the distributions of the po-
tential outcomes with and without the treatment in the univariate case. Inference for
functionals of the distributions and for the bivariate case can be performed using similar
algorithms.

Algorithm 3 [Weighted Bootstrap Inference].

1. Choose the number of bootstrap repetitions $B$, e.g., $B = 500$ or $B = 1{,}000$.

2. Draw weights for each unit independently and identically from the standard exponential
distribution, independently of the data. Construct a vector of weights
$\omega = (\omega_1, \ldots, \omega_N)$, where $\omega_i = \omega_j$ if $ID_i = ID_j$, and normalize the components
of $\omega$ to add up to one.

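3. Estimate the parameters of model (10) by weighted distribution regression, that is,
for $y \in \mathcal{Y}$,

$$(\tilde\alpha(y), \tilde\beta(y), \tilde\gamma(y), \tilde\theta(y)) \in \arg\max_{a,b,c,d} \sum_{i=1}^{N} \omega_i\, \ell_i(a,b,c,d),$$

$$\begin{aligned}
\ell_i(a,b,c,d) ={}& I_i^y \log \Lambda\big(p_\alpha(X_i)'a + p_\beta(X_i)'b\,T_i + p_\gamma(X_i)'c\,G_i + p_\theta(X_i)'d\,G_i T_i\big) \\
&+ \bar I_i^y \log \Lambda\big(-p_\alpha(X_i)'a - p_\beta(X_i)'b\,T_i - p_\gamma(X_i)'c\,G_i - p_\theta(X_i)'d\,G_i T_i\big).
\end{aligned}$$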

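4. Construct plug-in weighted estimators of the distributions of the potential outcomes

$$\hat F^{b}_{Y_0 \mid G,T}(y \mid 1, 1) = \frac{1}{N_{11}} \sum_{i=1}^{N} \omega_i G_i T_i\, \Lambda\big(p_\alpha(X_i)'\tilde\alpha(y) + p_\beta(X_i)'\tilde\beta(y) + p_\gamma(X_i)'\tilde\gamma(y)\big),$$

and

$$\hat F^{b}_{Y_1 \mid G,T}(y \mid 1, 1) = \frac{1}{N_{11}} \sum_{i=1}^{N} \omega_i G_i T_i\, \Lambda\big(p_\alpha(X_i)'\tilde\alpha(y) + p_\beta(X_i)'\tilde\beta(y) + p_\gamma(X_i)'\tilde\gamma(y) + p_\theta(X_i)'\tilde\theta(y)\big),$$

where $N_{11} = \sum_{i=1}^{N} \omega_i G_i T_i$.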

5. Repeat steps 2-4 $B$ times to obtain

$$\left\{ \hat F^{b}_{Y_0 \mid G,T}(y \mid 1, 1),\ \hat F^{b}_{Y_1 \mid G,T}(y \mid 1, 1) : y \in \mathcal{Y},\ 1 \le b \le B \right\}.$$

6. Construct an estimator of the $(1-\alpha)$-critical value of the maximal t-statistic, $\bar t_Y(1-\alpha)$,
as the $(1-\alpha)$-quantile of $\{\bar t_Y^{\,b} : 1 \le b \le B\}$, where

$$\bar t_Y^{\,b} = \max_{y \in \mathcal{Y}} \max\left[ \frac{|\hat F^{b}_{Y_0 \mid G,T}(y \mid 1, 1) - \hat F_{Y_0 \mid G,T}(y \mid 1, 1)|}{S_0(y)},\ \frac{|\hat F^{b}_{Y_1 \mid G,T}(y \mid 1, 1) - \hat F_{Y_1 \mid G,T}(y \mid 1, 1)|}{S_1(y)} \right],$$

and $S_d(y)$ is the interquartile range of $\{\hat F^{b}_{Y_d \mid G,T}(y \mid 1, 1) : 1 \le b \le B\}$ divided by
1.34898, the interquartile range of the standard normal distribution, for $d \in \{0, 1\}$.

7. Construct the $(1-\alpha)$-confidence bands as

$$CB_{1-\alpha}[F_{Y_d \mid G,T}(\cdot \mid 1, 1)] = \hat F_{Y_d \mid G,T}(y \mid 1, 1) \pm \bar t_Y(1-\alpha)\, S_d(y), \quad d \in \{0, 1\}.$$

Remark 1 (Empirical Bootstrap). The empirical bootstrap can be implemented by drawing
the weights in step 2 from a multinomial distribution with values $1, \ldots, n$ and equal
probabilities $1/n$.
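
The mechanics of Algorithm 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`unit_weights`, `iqr_scale`, `uniform_band`) are hypothetical, the bootstrap draws are simulated noise standing in for the weighted distribution regression of steps 3 and 4, and every grid point is assumed to have a strictly positive bootstrap scale.

```python
import math
import random
import statistics


def unit_weights(ids, rng):
    """Draw one standard-exponential weight per unit, copy it to every row
    of that unit, and normalise the weights to sum to one (step 2)."""
    draw = {u: rng.expovariate(1.0) for u in sorted(set(ids))}
    raw = [draw[u] for u in ids]
    total = sum(raw)
    return [w / total for w in raw]


def iqr_scale(draws):
    """Bootstrap interquartile range divided by the standard normal IQR."""
    q1, _, q3 = statistics.quantiles(draws, n=4, method="inclusive")
    return (q3 - q1) / 1.34898


def uniform_band(point, boot, alpha=0.10):
    """point[k] is the estimate at grid point k; boot[b][k] is bootstrap
    draw b at grid point k. Returns (lower, upper) pairs of a band that
    holds jointly over the grid (steps 6 and 7). Assumes positive scales."""
    n_boot, n_grid = len(boot), len(point)
    scale = [iqr_scale([boot[b][k] for b in range(n_boot)]) for k in range(n_grid)]
    # Maximal t-statistic of each bootstrap draw, sorted to read off a quantile.
    t_max = sorted(
        max(abs(boot[b][k] - point[k]) / scale[k] for k in range(n_grid))
        for b in range(n_boot)
    )
    crit = t_max[min(n_boot - 1, math.ceil((1 - alpha) * n_boot) - 1)]
    return [(point[k] - crit * scale[k], point[k] + crit * scale[k]) for k in range(n_grid)]


rng = random.Random(0)
ids = [1, 1, 2, 2, 3, 3]            # two rows (periods) per unit
weights = unit_weights(ids, rng)    # rows of the same unit share a weight

point = [0.2, 0.5, 0.8]             # hypothetical CDF estimates on a grid
# Stand-in for steps 3-4: simulated draws instead of re-estimated CDFs.
boot = [[p + rng.gauss(0.0, 0.02) for p in point] for _ in range(200)]
band = uniform_band(point, boot)
```

To cover both potential-outcome distributions jointly, as in step 6, one could concatenate the grids for $d = 0$ and $d = 1$ into a single `point` vector before calling `uniform_band`.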

5 Empirical applications
We illustrate our proposed estimators via an examination of data from two existing
empirical investigations.⁶ The first is the Malesky et al. (2014) investigation of the impact
of recentralization in Vietnam. This paper is useful for our purposes as it considers a
single treatment and multiple outcomes. Moreover, Malesky et al. (2014) only report
mean effects. We examine whether these mean effects are informative about the heterogeneity
across the outcomes' distributions. We also investigate whether the treatment affects
the bivariate distribution of different combinations of outcomes. In a second example,
we explore the impact of increases in the mandatory minimum wage on average weekly
wages, unemployment and poverty rates by examining an extension of the data in
Callaway and Li (2019). We estimate the impact of the treatment on each of these outcome
distributions and also on the bivariate distributions of some combinations of the outcomes.

5.1 The impact of recentralization (Malesky et al., 2014)


5.1.1 Description of the original empirical exercise

Malesky et al. (2014) investigate the impact of recentralization via a case study in
Vietnam. Due to dissatisfaction with the decentralization measures taken in the early 1990s,
Vietnam changed its political system in 2007 by eliminating one political layer from the
decision-making process. More explicitly, Vietnam has four layers in the political process:
the central government, the provinces (63 in total), the districts (696 in total), and the
communes (more than 11,000 in total).⁷ The change involved abolishing the political
process at the districts (which are governed by the so-called District People's Council or
DPC). Prior to introducing this change, the Vietnamese government experimented with ten
provinces (with 99 districts). Malesky et al. (2014) use this experiment for their empirical
analysis. Note that the experiment was not random, but was decided by the central
government to be stratified based on regions and subregions as well as on rural versus urban
areas and by socioeconomic and public administration performance of the provinces. The
decision to start this experiment was made in 2008 and the abolition of the DPC in the
treatment districts started in 2009.

⁶ We thank Brant Callaway, Tong Li and Pedro Sant'Anna for providing us with the data.
⁷ The total population of Vietnam was 84.76 million in 2007.

Malesky et al. (2014) employ the following specification:

$$Y_{it} = \alpha + \beta T_t + \gamma G_i + \theta G_i T_t + X_{it}'\pi + U_{it},$$

where $Y_{it}$ is the outcome variable for period $t$ of commune $i$, $T_t$ is a dummy variable
that equals one in the treated period, and $G_i$ is a dummy variable that equals one if
commune $i$ belongs to a treated district. Finally, $X_{it}$ is a set of control variables
for commune $i$ in period $t$. Malesky et al. (2014) consider the log surface area of
the commune, the log of the commune population density, whether the commune belongs
to a national level city, and region dummies (8 regions in total). For reasons of data
availability, Malesky et al. (2014) only use rural communes and two years of observations:
2008 and 2010 (they also use 2006 for robustness checks). They analyze 30 different
outcome variables to investigate the impact of the abolition of the political layer. As
most of their outcome variables are indicator variables, we only employ the following
eight of their original outcome variables: (1) proportion of households supported crop, (2)
proportion of households supported agricultural extension, (3) proportion of households
supported agriculture tax exemption, (4) the number of visits of agricultural extension
staff, (5) proportion of households supported healthcare fee, (6) proportion of households
supported tuition fee, (7) proportion of households supported credit, and (8) proportion
of households supported business tax exemption.
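
As a point of reference for the mean effects that the specification above targets, the following minimal sketch computes the interaction coefficient in the special case without covariates, where OLS reduces to a difference of mean changes. The helper `did_interaction` and the data are hypothetical and only illustrate the arithmetic; Malesky et al. (2014) include controls.

```python
from statistics import mean


def did_interaction(rows):
    """rows: (y, g, t) triples with g, t in {0, 1}. Returns the difference
    of mean changes, which equals the OLS interaction coefficient when the
    regression contains no covariates."""
    cell = {
        (g, t): mean([y for y, gi, ti in rows if gi == g and ti == t])
        for g in (0, 1)
        for t in (0, 1)
    }
    return (cell[1, 1] - cell[1, 0]) - (cell[0, 1] - cell[0, 0])


# Hypothetical observations: (outcome, treated group, treated period).
rows = [
    (1.0, 0, 0), (2.0, 0, 0), (2.0, 0, 1), (3.0, 0, 1),
    (2.0, 1, 0), (3.0, 1, 0), (4.0, 1, 1), (5.0, 1, 1),
]
theta_hat = did_interaction(rows)  # (4.5 - 2.5) - (2.5 - 1.5)
```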

5.1.2 Our analysis for the univariate case

We use the following specification for $F_{Y_{0,i} \mid G_i, T_i, X_{it}}(\cdot \mid g, t, x)$:

$$F_{Y_{0,i} \mid G_i, T_i, X_{it}}(y \mid g_i, t_t, x_{it}) = \Lambda\big(\alpha(y) + \beta(y)\, t_t + \gamma(y)\, g_i + x_{it}'\pi(y)\big),$$

and use the same control variables as in Malesky et al. (2014). We estimate the
counterfactual distribution using:
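$$\hat F_{Y_{0,i} \mid G_i, T_i, X_{it}}(y \mid 1, 1, x_{i1}) = \Lambda\big(\hat\alpha(y) + \hat\beta(y) + \hat\gamma(y) + x_{i1}'\hat\pi(y)\big),$$

where $\hat\alpha(y)$, $\hat\beta(y)$, $\hat\gamma(y)$, and $\hat\pi(y)$ are estimated via distribution regression at $y$. We
estimate the unconditional distribution using

$$F_{Y_{0,i} \mid G_i, T_i}(y \mid 1, 1) = \int_{\mathcal{X}(1,1)} F_{Y_{0,i} \mid G_i, T_i, X_{it}}(y \mid 1, 1, x)\, dF_{X_{it} \mid G,T}(x \mid 1, 1).$$

Our estimator then becomes:

$$\hat F_{Y_{0,i} \mid G_i, T_i}(y \mid 1, 1) = \frac{1}{N_{11}} \sum_{i:\, G_i = 1,\, T_i = 1} \hat F_{Y_{0,i,t} \mid G_i, T_i, X_{i1}}(y \mid 1, 1, X_{i,1}),$$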

where $N_{11}$ is the total number of observations for which $G_i = 1$ and $T_i = 1$. We estimate the
quantile treatment effects by inverting the estimated distributions $F_{Y_{0,i,t} \mid G_i, T_i}(y \mid 1, 1)$ and
$F_{Y_{1,i,t} \mid G_i, T_i}(y \mid 1, 1)$, where we estimate $F_{Y_{1,i,t} \mid G_i, T_i}(y \mid 1, 1)$ by using the empirical distribution.
In particular, we use the left-inverse

$$\hat F^{-1}_{Y_{j,i,t} \mid G_i, T_i}(q \mid 1, 1) = \inf\{y : \hat F_{Y_{j,i,t} \mid G_i, T_i}(y \mid 1, 1) \ge q\}, \quad j = 0, 1.$$
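
The inversion step can be illustrated with a short sketch. It assumes the two estimated distribution functions are available on a common sorted grid and are nondecreasing in $y$ (an estimated distribution regression curve may need to be monotonized first); the function names and the numbers are hypothetical.

```python
def left_inverse(grid, cdf, q):
    """Smallest grid point y with cdf(y) >= q. `grid` is sorted ascending
    and `cdf` holds the (nondecreasing) CDF values at those points."""
    for y, f in zip(grid, cdf):
        if f >= q:
            return y
    return grid[-1]  # q exceeds every CDF value on the grid


def quantile_treatment_effect(grid, cdf0, cdf1, q):
    """Difference between the q-quantiles of the treated and untreated
    potential-outcome distributions."""
    return left_inverse(grid, cdf1, q) - left_inverse(grid, cdf0, q)


# Hypothetical distributions with half of the mass bunched at zero.
grid = [0.0, 0.1, 0.2, 0.3, 0.4]
cdf0 = [0.50, 0.60, 0.80, 0.95, 1.00]   # counterfactual, untreated state
cdf1 = [0.50, 0.55, 0.70, 0.85, 1.00]   # observed, treated state

qte_median = quantile_treatment_effect(grid, cdf0, cdf1, 0.50)
qte_q3 = quantile_treatment_effect(grid, cdf0, cdf1, 0.75)
```

With this much bunching at zero, both medians equal zero and the effect at the median is zero by construction, while the third quartile moves from 0.2 to 0.3.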

Results of our empirical exercise are in Figure 1. The quantile treatment effects are listed
in Table 1. As in Malesky et al. (2014), we correct the confidence intervals for
clustering at the province level. We use the Bayesian bootstrap and draw the same
exponential weight for all observations belonging to the same province (see also Chernozhukov
et al., 2020). We construct the confidence bounds by following steps 1-4 of Algorithm 1 of
Chernozhukov et al. (2020), but we directly use the quantile treatment effects rather than
the estimated distributions. This is allowed provided we assume the outcome variable
is continuously distributed. For some outcome variables, there is substantial bunching
at zero. For example, for the variable “Proportion of households supported crop”, 49.93
percent of the observations equal zero. The corresponding shares are 54.04 percent for the
treatment group in the treatment period and 50.79 percent for the control group. This implies
that the quantile treatment effect is by definition equal to zero up to the median and the
impact can only occur at the higher quantiles. Note that we employ dots to distinguish
this from cases in which there is an estimated zero impact.

Figure 1: Results of the empirical application. [Eight panels, one per outcome variable, each plotting $F_{Y_0 \mid G,T}(\cdot \mid 1, 1)$ and $F_{Y_1 \mid G,T}(\cdot \mid 1, 1)$ on a vertical axis from 0 to 1 against outcome values from 0 to 1. Panels: Proportion of households supported crop; Proportion of households supported agricultural extension; Proportion of households supported agricultural exemption; The number of visits of agricultural extension staff; Proportion of households supported healthcare fee; Proportion of households supported tuition fee; Proportion of households supported credit; Proportion of households supported business tax exemption.]

Nevertheless, even after ignoring these zeros, both Figure 1 and Table 1 reveal a great
deal of treatment heterogeneity and show that the mean impacts reported in Malesky et al.
(2014) are primarily generated by impacts at the top of the distribution. This is clearest
for the outcome variable “The number of visits of agricultural extension staff”,
which has a substantial impact only at Q3 and D9. A similar conclusion can be drawn
for the outcome variable “Proportion of households supported credit”.
Malesky et al. (2014) use data for the year 2006 to check the common trends assump-
tion. They examine the results of a DiD design for the periods 2006 and 2008 in which
2008 is the placebo treatment period. That is, one would not expect any treatment effect
in the period before the introduction of the treatment. The results of this exercise are
in Figure 2 and Table 2. Similar to Malesky et al. (2014), we do find some substantial
differences in Figure 2 between the distributions of $Y_0$ and $Y_1$. However, Table 2 indicates
these differences are not statistically significant and that they are generally in the opposite
direction to those found in the original results. For example, for the proportion of households
supported crop (the first panel in Figures 1 and 2), the distribution $F_{Y_1 \mid G,T}(\cdot \mid 1, 1)$ is in
Figure 2 generally to the left of the distribution $F_{Y_0 \mid G,T}(\cdot \mid 1, 1)$, while the relationship is
the opposite in Figure 1.

Table 1: Quantile treatment effects with 90 percent confidence intervals based on Bayesian
weights. Confidence intervals corrected for clustering at the province level.

| Outcome | Mean | 0.1 | 0.25 | 0.5 | 0.75 | 0.9 |
|---|---|---|---|---|---|---|
| Proportion of households supported crop | 0.0514 (0.011, 0.091) | · (·, ·) | · (·, ·) | · (·, ·) | -0.003 (-0.096, 0.09) | 0.3074 (-0.005, 0.62) |
| Proportion of households supported agricultural extension | 0.0214 (0.005, 0.037) | · (·, ·) | · (·, ·) | · (·, ·) | 0.0118 (-0.006, 0.03) | 0.0876 (0.001, 0.175) |
| Proportion of households supported agricultural exemption | -0.0427 (-0.099, 0.013) | · (·, ·) | · (·, ·) | · (·, ·) | -0.0687 (-0.79, 0.653) | · (·, ·) |
| The number of visits of agricultural extension staff | 0.0203 (-0.001, 0.042) | -0.0001 (-0.0, 0.0) | 0.0 (-0.009, 0.009) | 0.01 (0.0, 0.02) | 0.0299 (-0.009, 0.069) | 0.08 (0.0, 0.16) |
| Proportion of households supported healthcare fee | 0.0389 (0.008, 0.07) | · (·, ·) | -0.001 (-0.003, 0.001) | -0.0077 (-0.019, 0.003) | -0.0073 (-0.042, 0.027) | 0.026 (-0.348, 0.4) |
| Proportion of households supported tuition fee | 0.0016 (-0.002, 0.005) | · (·, ·) | 0.0005 (-0.001, 0.002) | -0.0001 (-0.002, 0.002) | -0.0001 (-0.006, 0.006) | 0.0063 (-0.008, 0.021) |
| Proportion of households supported credit | -0.0535 (-0.073, -0.034) | · (·, ·) | -0.008 (-0.015, -0.001) | -0.0276 (-0.045, -0.01) | -0.0709 (-0.204, 0.062) | -0.1804 (-0.269, -0.092) |
| Proportion of households supported business tax exemption | 0.0012 (-0.009, 0.011) | · (·, ·) | · (·, ·) | · (·, ·) | · (·, ·) | -0.0143 (-0.029, -0.0) |

In a standard linear DiD design, checking the common trend assumption as presented
above is identical to checking the value of the interaction term in the period(s) before the
treatment. This is not true for the non-linear design. However, we perform an additional
check by examining the estimated coefficient value of the interaction term. That is, we
estimate the general representation presented in (1) for all observations in the periods
2006 and 2008. Results of this exercise are presented in Figure 3. Generally, we find that
$\delta(y)$ is not significantly different from zero, but there are some regions in the distribution
of some of the outcome variables where there is a significant difference. For example, for
the outcome variable “Proportion of households supported agricultural exemption”,
we find a significant difference for outcome values between 0.3 and 0.9.
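
To give a sense of what the interaction coefficient measures, the sketch below treats the simplest case: a logit link and no covariates. This is an assumption made for illustration, since the specification in the text includes controls. In that saturated case the maximum likelihood fit reproduces the four cell frequencies, so the interaction at a threshold $y$ is a double difference of logits of empirical distribution functions. All names and data are hypothetical.

```python
import math


def logit(p):
    """Inverse of the logistic link; requires 0 < p < 1."""
    return math.log(p / (1.0 - p))


def ecdf_at(sample, y):
    """Share of observations in `sample` that are <= y."""
    return sum(v <= y for v in sample) / len(sample)


def interaction_at(y, y00, y01, y10, y11):
    """Interaction coefficient of a saturated logit distribution regression
    at threshold y. yGT holds the outcomes of group G in period T. Requires
    all four empirical CDF values to lie strictly between 0 and 1."""
    return (
        logit(ecdf_at(y11, y))
        - logit(ecdf_at(y10, y))
        - logit(ecdf_at(y01, y))
        + logit(ecdf_at(y00, y))
    )


# Hypothetical placebo data: only the (G=1, T=1) cell differs at y = 2.
y00 = [1, 1, 3, 3]
y01 = [1, 1, 3, 3]
y10 = [1, 1, 3, 3]
y11 = [1, 3, 3, 3]
delta_hat = interaction_at(2, y00, y01, y10, y11)  # logit(0.25) = -log(3)
```

A value of zero at every threshold would correspond to no interaction effect in the pre-treatment periods; the sign convention here follows the logit of the distribution function, so a negative value means less mass below $y$.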
As a further robustness check, we interacted the covariates with the time and treatment
dummy variables. We interact regions with time but we cannot interact regions with
treatment as this will result in perfect multicollinearity due to the setup of the program.
These results are shown in Figure 4 and the quantile treatment effects are reported in
Table 5.

5.1.3 Comparison with the changes-in-changes estimation

For the changes-in-changes estimation, we note as in Athey and Imbens (2006) that the
distribution of $Y_0 \mid G = 1, T = 1$ equals the distribution of $\phi(Y_0) \mid G = 1, T = 0$, where $\phi(y)$
is defined as in Section 2.2, i.e. $\phi(y) := F^{-1}_{Y_0 \mid G,T}\big(F_{Y_0 \mid G,T}(y \mid 0, 0) \mid 0, 1\big)$. We then obtain
an estimator of the distribution of $Y_0 \mid G = 1, T = 1$ by using the empirical distribution
function of the random variable

$$Q_{\hat F_{Y_0 \mid G,T}(Y_0 \mid 0, 0)}(Y_0 \mid G = 0, T = 1),$$

that is, the quantile of the subsample with $G = 0, T = 1$ at the level $\hat F_{Y_0 \mid G,T}(Y_0 \mid 0, 0)$, evaluated at the observations $Y_0$ of the subsample with $G = 1, T = 0$.

We estimate the distribution function $F_{Y_0 \mid G=1,T=1}$ at a point $y$ for our changes-in-changes
estimator using the following steps:

1. For every observation of $Y_0$ in the subsample with $G = 1, T = 0$, estimate the empirical
distribution function of the subsample for which $G = 0, T = 0$.

2. For every computed empirical distribution function of step 1, estimate the
corresponding quantile of the subsample for which $G = 0, T = 1$.

3. For all the obtained quantiles from step 2, compute the empirical distribution
function in $y$.

Figure 2: Results of robustness check using the years 2006 and 2008. [Panels plot $F_{Y_0 \mid G,T}(\cdot \mid 1, 1)$ and $F_{Y_1 \mid G,T}(\cdot \mid 1, 1)$ on a vertical axis from 0 to 1 against outcome values from 0 to 1. Panels: Proportion of households supported crop; Proportion of households supported agricultural extension; Proportion of households supported agricultural exemption; The number of visits of agricultural extension staff; Proportion of households supported healthcare fee; Proportion of households supported tuition fee.]

Table 2: Quantile treatment effects – robustness check for 2006 and 2008 – parallel trends.

| Outcome | Mean | 0.1 | 0.25 | 0.5 | 0.75 | 0.9 |
|---|---|---|---|---|---|---|
| Proportion of households supported crop | -0.0164 (-0.077, 0.044) | · (·, ·) | · (·, ·) | · (·, ·) | -0.0569 (-0.119, 0.006) | -0.0246 (-0.684, 0.635) |
| Proportion of households supported agricultural extension | -0.0007 (-0.012, 0.01) | · (·, ·) | · (·, ·) | · (·, ·) | · (·, ·) | 0.0117 (-0.014, 0.038) |
| Proportion of households supported agricultural exemption | 0.0296 (-0.041, 0.1) | · (·, ·) | · (·, ·) | · (·, ·) | -0.0052 (-0.577, 0.566) | · (·, ·) |
| The number of visits of agricultural extension staff | -0.0002 (-0.026, 0.026) | · (·, ·) | · (·, ·) | · (·, ·) | -0.0001 (-0.02, 0.02) | -0.0001 (-0.08, 0.079) |
| Proportion of households supported healthcare fee | -0.0038 (-0.037, 0.029) | · (·, ·) | · (·, ·) | 0.0079 (-0.004, 0.02) | -0.0221 (-0.043, -0.001) | -0.0241 (-0.135, 0.087) |
| Proportion of households supported tuition fee | 0.0012 (-0.003, 0.006) | · (·, ·) | 0.0002 (-0.001, 0.001) | 0.0018 (-0.0, 0.004) | 0.0033 (-0.001, 0.008) | 0.006 (-0.017, 0.028) |
| Proportion of households supported credit | 0.0053 (-0.014, 0.025) | · (·, ·) | -0.0011 (-0.014, 0.012) | -0.0004 (-0.023, 0.022) | -0.0013 (-0.059, 0.056) | -0.0239 (-0.149, 0.101) |
| Proportion of households supported business tax exemption | -0.0058 (-0.021, 0.009) | · (·, ·) | · (·, ·) | · (·, ·) | 0.002 (-0.001, 0.005) | 0.0142 (-0.004, 0.032) |

Figure 3: Results of the robustness check to investigate whether $\delta(y)$ of equation (1)
equals zero in the period before the treatment. [Eight panels, one per outcome variable, each plotting $\delta(y)$ against the outcome value. Panels: Proportion of households supported crop; Proportion of households supported agricultural extension; Proportion of households supported agricultural exemption; The number of visits of agricultural extension staff; Proportion of households supported healthcare fee; Proportion of households supported tuition fee; Proportion of households supported credit; Proportion of households supported business tax exemption.]

Table 3: Quantile treatment effects without using additional control variables in the
analysis.

| Outcome | Mean | 0.1 | 0.25 | 0.5 | 0.75 | 0.9 |
|---|---|---|---|---|---|---|
| Proportion of households supported crop | 0.0465 (-0.002, 0.095) | · (·, ·) | · (·, ·) | · (·, ·) | -0.009 (-0.119, 0.101) | 0.3068 (0.005, 0.608) |
| Proportion of households supported agricultural extension | 0.0204 (0.002, 0.039) | · (·, ·) | · (·, ·) | · (·, ·) | 0.0047 (-0.014, 0.024) | 0.0832 (-0.004, 0.171) |
| Proportion of households supported agricultural exemption | -0.0461 (-0.102, 0.01) | · (·, ·) | · (·, ·) | · (·, ·) | -0.0664 (-0.748, 0.615) | -0.0528 (-0.789, 0.683) |
| The number of visits of agricultural extension staff | 0.0197 (-0.002, 0.041) | -0.0001 (-0.0, 0.0) | -0.0002 (-0.01, 0.009) | 0.0098 (0.0, 0.02) | 0.0194 (-0.011, 0.05) | 0.0507 (-0.028, 0.13) |
| Proportion of households supported healthcare fee | 0.0331 (0.0, 0.066) | · (·, ·) | -0.0009 (-0.002, 0.0) | -0.0049 (-0.016, 0.006) | -0.0071 (-0.039, 0.025) | 0.0237 (-0.419, 0.466) |
| Proportion of households supported tuition fee | 0.001 (-0.003, 0.005) | · (·, ·) | 0.0004 (-0.001, 0.002) | -0.0005 (-0.003, 0.002) | -0.0014 (-0.009, 0.006) | 0.0036 (-0.009, 0.017) |
| Proportion of households supported credit | -0.0574 (-0.079, -0.036) | · (·, ·) | -0.0089 (-0.019, 0.001) | -0.028 (-0.05, -0.006) | -0.0716 (-0.2, 0.057) | -0.1808 (-0.307, -0.054) |
| Proportion of households supported business tax exemption | 0.0008 (-0.008, 0.01) | · (·, ·) | · (·, ·) | · (·, ·) | · (·, ·) | -0.0153 (-0.028, -0.002) |

Table 4: Quantile treatment effects without using additional control variables in the
analysis – robustness check for 2006 and 2008 – parallel trends.

| Outcome | Mean | 0.1 | 0.25 | 0.5 | 0.75 | 0.9 |
|---|---|---|---|---|---|---|
| Proportion of households supported crop | -0.0162 (-0.084, 0.052) | · (·, ·) | · (·, ·) | · (·, ·) | -0.0597 (-0.152, 0.033) | -0.0206 (-0.762, 0.72) |
| Proportion of households supported agricultural extension | -0.0041 (-0.015, 0.007) | · (·, ·) | · (·, ·) | · (·, ·) | · (·, ·) | 0.0007 (-0.032, 0.033) |
| Proportion of households supported agricultural exemption | 0.0248 (-0.058, 0.107) | · (·, ·) | · (·, ·) | · (·, ·) | -0.0021 (-0.61, 0.606) | · (·, ·) |
| The number of visits of agricultural extension staff | -0.0163 (-0.032, -0.001) | -0.0001 (-0.0, 0.0) | -0.0003 (-0.01, 0.009) | -0.0193 (-0.029, -0.009) | -0.0003 (-0.02, 0.019) | -0.05 (-0.119, 0.019) |
| Proportion of households supported healthcare fee | -0.0072 (-0.045, 0.031) | · (·, ·) | · (·, ·) | 0.0094 (-0.011, 0.029) | -0.0212 (-0.051, 0.009) | -0.0236 (-0.133, 0.085) |
| Proportion of households supported tuition fee | 0.0008 (-0.005, 0.007) | · (·, ·) | -0.0001 (-0.002, 0.001) | 0.0014 (-0.001, 0.003) | 0.0026 (-0.003, 0.008) | 0.0051 (-0.022, 0.032) |
| Proportion of households supported credit | 0.0008 (-0.025, 0.026) | · (·, ·) | -0.0055 (-0.021, 0.01) | -0.0005 (-0.026, 0.025) | -0.0024 (-0.072, 0.067) | -0.0307 (-0.168, 0.107) |
| Proportion of households supported business tax exemption | -0.0061 (-0.024, 0.012) | · (·, ·) | · (·, ·) | · (·, ·) | 0.0019 (-0.002, 0.006) | 0.014 (-0.009, 0.037) |

Table 5: Quantile treatment effects using interaction terms between the covariates and
the time and treatment dummy variables.

| Outcome | Mean | 0.1 | 0.25 | 0.5 | 0.75 | 0.9 |
|---|---|---|---|---|---|---|
| Proportion of households supported crop | 0.0451 (0.003, 0.087) | · (·, ·) | · (·, ·) | · (·, ·) | 0.0042 (-0.092, 0.101) | 0.3457 (0.022, 0.669) |
| Proportion of households supported agricultural extension | 0.0233 (0.007, 0.04) | · (·, ·) | · (·, ·) | · (·, ·) | 0.0122 (-0.004, 0.028) | 0.0912 (-0.011, 0.194) |
| Proportion of households supported agricultural exemption | -0.0554 (-0.116, 0.005) | · (·, ·) | · (·, ·) | · (·, ·) | -0.0911 (-0.847, 0.665) | · (·, ·) |
| The number of visits of agricultural extension staff | 0.02 (-0.008, 0.048) | -0.0001 (-0.0, 0.0) | -0.0001 (-0.009, 0.009) | 0.0099 (0.0, 0.02) | 0.0193 (-0.02, 0.058) | 0.0798 (0.0, 0.159) |
| Proportion of households supported healthcare fee | 0.0276 (-0.001, 0.056) | · (·, ·) | -0.0002 (-0.002, 0.001) | -0.0017 (-0.013, 0.01) | 0.0005 (-0.034, 0.035) | 0.0469 (-0.453, 0.546) |
| Proportion of households supported tuition fee | 0.0003 (-0.004, 0.004) | · (·, ·) | 0.0002 (-0.001, 0.001) | -0.0004 (-0.004, 0.003) | -0.0016 (-0.01, 0.006) | 0.0038 (-0.01, 0.018) |
| Proportion of households supported credit | -0.0484 (-0.079, -0.018) | · (·, ·) | -0.0077 (-0.022, 0.007) | -0.0271 (-0.053, -0.001) | -0.0702 (-0.215, 0.075) | -0.1806 (-0.328, -0.033) |
| Proportion of households supported business tax exemption | -0.0016 (-0.012, 0.009) | · (·, ·) | · (·, ·) | · (·, ·) | · (·, ·) | -0.0152 (-0.032, 0.001) |

The distribution $F_{Y_1 \mid G=1,T=1}$ can be estimated using the empirical distribution function.
One obtains the quantile treatment effects by inverting the two distributions at the desired
quantile levels.
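
The three changes-in-changes steps can be sketched as follows, for the case without covariates. The helper names are hypothetical, the quantile is taken as the left-inverse of the empirical distribution function, and no attention is paid to support problems (an observation below the control group's pre-period minimum is simply mapped to the post-period minimum).

```python
import math
from bisect import bisect_right


def ecdf(sample):
    """Empirical distribution function of `sample` as a callable."""
    s = sorted(sample)
    return lambda y: bisect_right(s, y) / len(s)


def quantile(sample):
    """Left-inverse of the empirical distribution function: the smallest
    order statistic whose ECDF value is at least q."""
    s = sorted(sample)
    return lambda q: s[max(0, math.ceil(q * len(s)) - 1)]


def cic_counterfactual_cdf(y10, y00, y01):
    """y10: treated group, first period; y00 and y01: control group in the
    first and second period. Returns the estimated CDF of Y0 | G=1, T=1."""
    f00 = ecdf(y00)
    q01 = quantile(y01)
    transformed = [q01(f00(v)) for v in y10]   # steps 1 and 2
    return ecdf(transformed)                   # step 3


# Hypothetical data in which control outcomes double between periods.
y00 = [1, 2, 3, 4]
y01 = [2, 4, 6, 8]
y10 = [1, 3]
f0_cic = cic_counterfactual_cdf(y10, y00, y01)
```

In this toy example the treated first-period outcomes 1 and 3 are mapped to 2 and 6, so the counterfactual distribution places mass one half on each of those values.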
As we consider our estimator for the univariate outcome a simpler alternative to
the CiC approach, we contrast our results with those from that approach. We only do
so for the no-covariates case, as the CiC estimator is more difficult to implement in the
presence of covariates. Our results are reported in Table 3 and Figure 5. They are
reassuring, as they are very similar to those in Table 6 and Figure 6 for CiC
in terms of both magnitude and statistical significance.
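
For comparison, the distribution regression counterfactual has a closed form in the no-covariates case, assuming a logit link: the three untreated cells identify the intercept, time and group coefficients exactly, and the counterfactual adds them up on the logit scale. The sketch below illustrates this under those assumptions; names and data are hypothetical, and each cell's empirical distribution function must lie strictly between 0 and 1 at the chosen threshold.

```python
import math


def logit(p):
    """Inverse of the logistic link; requires 0 < p < 1."""
    return math.log(p / (1.0 - p))


def expit(x):
    """Logistic link."""
    return 1.0 / (1.0 + math.exp(-x))


def ecdf_at(sample, y):
    """Share of observations in `sample` that are <= y."""
    return sum(v <= y for v in sample) / len(sample)


def dr_counterfactual_cdf(y, y00, y01, y10):
    """Counterfactual CDF of Y0 | G=1, T=1 at threshold y from a saturated
    logit distribution regression without covariates. yGT holds the
    outcomes of group G in period T."""
    alpha = logit(ecdf_at(y00, y))               # control group, first period
    beta = logit(ecdf_at(y01, y)) - alpha        # time effect
    gamma = logit(ecdf_at(y10, y)) - alpha       # group effect
    return expit(alpha + beta + gamma)           # no interaction term


# Hypothetical data: the control group's CDF at y = 2 rises from 0.5 to 0.75.
y00 = [1, 1, 3, 3]
y01 = [1, 1, 1, 3]
y10 = [1, 1, 3, 3]
f0_dr = dr_counterfactual_cdf(2, y00, y01, y10)
```

Here the treated group starts at the same value as the control group, so the counterfactual inherits the control group's second-period value of 0.75.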

Figure 4: Results of the empirical application using interaction terms between the
covariates and the time and treatment dummy variables. [Eight panels, one per outcome variable, each plotting $F_{Y_0 \mid G,T}(\cdot \mid 1, 1)$ and $F_{Y_1 \mid G,T}(\cdot \mid 1, 1)$ on a vertical axis from 0 to 1 against outcome values from 0 to 1. Panels: Proportion of households supported crop; Proportion of households supported agricultural extension; Proportion of households supported agricultural exemption; The number of visits of agricultural extension staff; Proportion of households supported healthcare fee; Proportion of households supported tuition fee; Proportion of households supported credit; Proportion of households supported business tax exemption.]

Table 6: Quantile treatment effects using changes-in-changes.

| Outcome | Mean | 0.1 | 0.25 | 0.5 | 0.75 | 0.9 |
|---|---|---|---|---|---|---|
| Proportion of households supported crop | 0.0484 (0.003, 0.093) | · (·, ·) | · (·, ·) | · (·, ·) | -0.0087 (-0.114, 0.097) | 0.3863 (0.001, 0.772) |
| Proportion of households supported agricultural extension | 0.0115 (-0.008, 0.031) | · (·, ·) | · (·, ·) | · (·, ·) | 0.0025 (-0.019, 0.024) | 0.0763 (-0.011, 0.164) |
| Proportion of households supported agricultural exemption | -0.0489 (-0.107, 0.009) | · (·, ·) | · (·, ·) | · (·, ·) | -0.0648 (-0.697, 0.567) | · (·, ·) |
| The number of visits of agricultural extension staff | 0.0145 (-0.01, 0.039) | -0.01 (-0.02, -0.0) | -0.0097 (-0.019, -0.0) | 0.0098 (-0.002, 0.021) | 0.0064 (-0.021, 0.034) | 0.0497 (-0.02, 0.12) |
| Proportion of households supported healthcare fee | 0.0286 (-0.001, 0.059) | · (·, ·) | -0.0009 (-0.003, 0.001) | -0.0066 (-0.017, 0.004) | 0.0007 (-0.025, 0.026) | 0.0353 (-0.417, 0.488) |
| Proportion of households supported tuition fee | 0.0013 (-0.002, 0.005) | · (·, ·) | 0.0005 (-0.001, 0.002) | -0.0003 (-0.003, 0.002) | 0.0006 (-0.007, 0.008) | 0.0045 (-0.008, 0.017) |
| Proportion of households supported credit | -0.0559 (-0.08, -0.032) | · (·, ·) | -0.0076 (-0.018, 0.003) | -0.0275 (-0.05, -0.005) | -0.0892 (-0.206, 0.028) | -0.1857 (-0.298, -0.074) |
| Proportion of households supported business tax exemption | -0.0004 (-0.008, 0.007) | · (·, ·) | · (·, ·) | · (·, ·) | -0.0011 (-0.007, 0.005) | -0.0158 (-0.029, -0.002) |

5.1.4 Results for bivariate outcomes

For the bivariate analysis we consider the outcome variables “Proportion of households
supported credit” and the “Proportion of households supported healthcare fee” as these
variables have relatively little bunching at integer values. The counterfactual and actual
distributions are shown in Figure 10.
The figure reveals that the joint distribution has changed due to the treatment and that
the distribution of the treated population has shifted to the upper-left corner. However,
it is difficult to see whether this is merely a result of the changes in the marginal
distributions. We therefore also present results using Kendall's tau, given as:

$$\hat\tau = \frac{2}{n(n-1)} \sum_{i=1}^{n} \sum_{j=i+1}^{n} \mathrm{sgn}(Y_{di} - Y_{dj})\, \mathrm{sgn}(Z_{di} - Z_{dj}).$$
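
A direct, quadratic-time computation of this statistic is sketched below. It normalizes the double sum by the number of pairs, $n(n-1)/2$, which is the usual convention that keeps the statistic between −1 and 1, and it makes no correction for ties (the variant often called tau-a). The function names and the data are illustrative only.

```python
def sgn(x):
    """Sign function: -1, 0 or 1."""
    return (x > 0) - (x < 0)


def kendall_tau(y, z):
    """Kendall's tau without tie correction: the sum of sign products over
    all pairs i < j, divided by the number of pairs n(n-1)/2."""
    n = len(y)
    total = sum(
        sgn(y[i] - y[j]) * sgn(z[i] - z[j])
        for i in range(n)
        for j in range(i + 1, n)
    )
    return 2.0 * total / (n * (n - 1))


# Hypothetical paired outcomes: five concordant pairs and one discordant pair.
y = [1, 2, 3, 4]
z = [1, 3, 2, 4]
tau_hat = kendall_tau(y, z)  # (5 - 1) / 6
```

With heavy bunching, as for several outcomes in this application, many pairs are tied and contribute zero to the sum, which pulls this uncorrected version toward zero.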

The Kendall’s tau for the joint distribution of the treated sample in the second period
when treated can directly be calculated from the observed data. For the counterfactual
distribution of the treated sample in the second period when not treated, we first sample
from the estimated distribution. That is, we sample a value of Y0 using our estimate of its
marginal distribution from above. We then sample $Z_0$ from the conditional distribution
of $Z_0 \mid Y_0$, which can be obtained using our estimates for the bivariate case. The estimated
Kendall's tau from this procedure is 0.1253 with a 95-percent confidence interval from
0.0989 to 0.1518. This implies a positive correlation between the two outcomes in the
districts. For the observed distribution of the treated group we obtain 0.2463 with a
95-percent confidence interval from 0.2224 to 0.2703. This implies that the treatment has
statistically significantly increased the correlation between the two outcomes.

Figure 5: Results without using additional control variables in the analysis. [Eight panels, one per outcome variable, each plotting $F_{Y_0 \mid G,T}(\cdot \mid 1, 1)$ and $F_{Y_1 \mid G,T}(\cdot \mid 1, 1)$ on a vertical axis from 0 to 1 against outcome values from 0 to 1. Panels: Proportion of households supported crop; Proportion of households supported agricultural extension; Proportion of households supported agricultural exemption; The number of visits of agricultural extension staff; Proportion of households supported healthcare fee; Proportion of households supported tuition fee; Proportion of households supported credit; Proportion of households supported business tax exemption.]

5.2 The impact of increases in the mandatory minimum wage


The impact of increases in the mandatory minimum wage has been the focus of many
empirical investigations. Some consider their impact on (un)employment (Card and Krueger,
1994; Dube et al., 2010; Cengiz et al., 2019; Callaway and Li, 2019; Torous et al., 2024),
while others focus on how they affect income levels (Dube, 2019) or the poverty rate
(MaCurdy, 2015). The vast majority of these studies consider the mean of these outcome
variables and only investigate the respective outcomes separately. Torous et al. (2024) is
an exception as it examines distributional effects and the relationship between part-time
and full-time employment. We now consider the joint impact of changes in the mandatory
minimum wage on wages and both the unemployment and poverty rates.

Figure 6: Results of changes-in-changes. [Eight panels, one per outcome variable, each plotting $F_{Y_0 \mid G,T}(\cdot \mid 1, 1)$ and $F_{Y_1 \mid G,T}(\cdot \mid 1, 1)$ on a vertical axis from 0 to 1 against outcome values from 0 to 1. Panels: Proportion of households supported crop; Proportion of households supported agricultural extension; Proportion of households supported agricultural exemption; The number of visits of agricultural extension staff; Proportion of households supported healthcare fee; Proportion of households supported tuition fee; Proportion of households supported credit; Proportion of households supported business tax exemption.]

We follow Callaway and Li (2019) and investigate a change in the mandatory state
minimum wage in the 11 U.S. states which had their mandatory minimum wage below the
federal minimum wage at the beginning of 2007. We employ these states as the treated
group. Before this change there were an additional 22 states in which the state minimum wage was lower than the federal minimum wage, but we follow Callaway and Li (2019) and drop New Hampshire and Pennsylvania. This leaves a control group of 20 states.
Similar to Callaway and Li (2019), we first estimate the impact of the change in the
state-level minimum wage on county-level unemployment rates. Callaway and Li
(2019) employ the Local Area Unemployment Statistics Database from the Bureau of
Labor Statistics (BLS) and control variables on population and income from the 2000
County Data Book. We use county data on the percentage of African Americans, the
percentage of high school graduates, the percentage of college graduates, the log of the
total population, the poverty rate and the log median income (for 1997).
The results using our method are shown in the first row of Table 7 and are quantitatively similar to those of Callaway and Li (2019). The impact on the unemployment rate is negative for those counties that have lower levels of the unemployment rate but it is positive for those counties that have higher unemployment rates. The degree of heterogeneity in our results is smaller than in Callaway and Li (2019). For example, they find an impact of -0.44 at the first decile compared to our estimate of -0.32. As a result, we fail to reject the null hypothesis of no impact in any part of the distribution, whereas their impact at the bottom decile is marginally statistically significant.

Table 7: Quantile treatment effects of the increase in the mandatory minimum wage. The unemployment rate and poverty rate are measured in percentages.

                            Mean             0.1              0.25             0.5              0.75             0.9
Unemployment rate           0.1242           -0.3182          -0.0886          0.1824           0.3398           0.3191
                            (-0.104,0.353)   (-0.779,0.143)   (-0.524,0.346)   (-0.186,0.551)   (-0.188,0.868)   (-0.145,0.783)
(log) Average weekly wage   0.0015           -0.0153          -0.0124          -0.0157          -0.009           -0.001
                            (-0.015,0.018)   (-0.044,0.013)   (-0.034,0.009)   (-0.042,0.011)   (-0.048,0.03)    (-0.04,0.038)
Poverty rate                0.1748           0.3              0.3              0.0374           0.0              0.0
                            (-0.051,0.4)     (-0.301,0.901)   (-0.204,0.804)   (-0.489,0.564)   (-0.504,0.504)   (-0.612,0.612)
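To make the quantile treatment effects in Table 7 concrete, the sketch below shows one way to read such an effect off two distribution functions: invert each distribution at a quantile index and take the difference. It is an illustration only, not the estimation procedure of this paper. The grid, the two sets of distribution values and the function names are hypothetical, and in an application the two distributions would be the estimated observed and counterfactual distributions of the treated group.

```python
from bisect import bisect_left


def quantile_from_cdf(grid, cdf, tau):
    """Left-inverse of a CDF on a grid: smallest grid point y with F(y) >= tau."""
    idx = bisect_left(cdf, tau)  # requires cdf to be nondecreasing along the grid
    if idx == len(grid):
        raise ValueError("tau exceeds the largest CDF value on the grid")
    return grid[idx]


def quantile_treatment_effect(grid, cdf_treated, cdf_counterfactual, tau):
    """Difference between the treated and counterfactual tau-quantiles."""
    return (quantile_from_cdf(grid, cdf_treated, tau)
            - quantile_from_cdf(grid, cdf_counterfactual, tau))


# Hypothetical values (not taken from the paper): outcome grid and two CDFs.
grid = [3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
cdf_counterfactual = [0.05, 0.30, 0.55, 0.75, 0.90, 1.00]
cdf_treated = [0.12, 0.35, 0.50, 0.65, 0.85, 1.00]

qte = {tau: quantile_treatment_effect(grid, cdf_treated, cdf_counterfactual, tau)
       for tau in (0.1, 0.5, 0.9)}
print(qte)
```

In this toy example the effect is negative at the first decile, zero at the median and positive at the ninth decile, which mimics the sign pattern described for the unemployment rate. A finer grid would give a smoother approximation of the quantile functions.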
We also merge the Callaway and Li (2019) data with average weekly wage data taken
from the Quarterly Census of Employment and Wages (QCEW) sample from the BLS.
Note that we employ data for the first quarter of the years 2006 and 2007. We acknowledge
that the average weekly wage is not the ideal feature of the wage distribution to examine
for this question but other wage measures were not available. However, the monopsony
wages literature has frequently argued that the entire wage distribution may shift due to
a change in the mandatory minimum wage (e.g. Van den Berg, 2003). This would have
implications for the average wage. Using the same specification as for the unemployment
rate equation we examine the impact on average wages and the results are reported in the
second row of Table 7. The impact on the average weekly wage is small and statistically
insignificant.
We also examine poverty rates from the state and county estimates published by the
Census for 2006 and 2007. A shortcoming of these data is that poverty rates are reported
for the whole of 2007. As reported by Callaway and Li (2019), the federal minimum wage
increased at the end of July 2007 and this change may have an impact on the poverty
rates in our control states. Under this proviso we explore the impact of the change in
the minimum wage. We continue to use the same specification and the results are in the
third row of Table 7. The point estimates suggest that the impact on the poverty rate is somewhat larger than that on average weekly wages. In addition, the impact is larger for those counties that have low poverty rates. However, there is no evidence of a statistically significant impact.
Our evidence suggests that the univariate distributions are not significantly affected by the change in the mandatory minimum wage. To investigate whether the joint distributions are affected, Figures 7 to 9 report the 2-dimensional effects, noting that the left-hand figures show the counterfactual distributions and the right-hand figures show the observed distributions. As it is difficult to reach clear conclusions from these figures, Table 8 reports the changes in Kendall’s τ and Spearman’s correlation index for the different pairs of outcome variables, noting that the latter is computed as:
$$\rho_S = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)},$$
where $d_i$ is the difference in ranks between the two variables $y_i$ and $z_i$. Note that we do not report Spearman’s correlation value for the previous empirical example due to the degree of bunching at certain values in the data.
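As an illustration of the two dependence measures, the following sketch computes Spearman’s correlation index with the rank-difference formula above and Kendall’s τ as the normalized difference between concordant and discordant pairs. It assumes a raw sample without ties, whereas the values in Table 8 are obtained from estimated joint distributions, so this is not the computation behind that table. The function names and the toy data are hypothetical.

```python
from itertools import combinations


def ranks(values):
    """Ranks from 1 to n; assumes no ties, as the rank-difference formula requires."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman_rho(y, z):
    """rho_S = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), d_i = rank difference."""
    n = len(y)
    d_squared = sum((ry - rz) ** 2 for ry, rz in zip(ranks(y), ranks(z)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


def kendall_tau(y, z):
    """(concordant - discordant) / number of pairs, without tie correction."""
    n = len(y)
    score = 0
    for i, j in combinations(range(n), 2):
        product = (y[i] - y[j]) * (z[i] - z[j])
        score += (product > 0) - (product < 0)
    return score / (n * (n - 1) / 2)


# Hypothetical toy data (not taken from the paper).
y = [3.1, 4.5, 5.0, 6.2, 7.8]
z = [6.9, 6.4, 6.6, 6.1, 6.0]
rho = spearman_rho(y, z)
tau = kendall_tau(y, z)
print(round(rho, 4), round(tau, 4))
```

With heavy bunching, as in the previous empirical example, the no-ties assumption fails and the rank-difference formula is no longer appropriate, which is in line with the decision not to report Spearman’s index there.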
In the absence of treatment the estimates of Kendall’s τ and Spearman’s correlation index for the unemployment rate and the average weekly wage are -0.1095 and -0.1316 respectively. Following the increase in the mandatory minimum wage, this negative relationship becomes weaker, with the corresponding estimated values of -0.0350 and -0.052. However, these changes are not statistically significantly different from zero at the 95 percent confidence level. In contrast, the unemployment rate and the poverty rate have a positive correlation prior to the treatment with the respective estimates of the correlation being 0.256 and 0.329. This correlation becomes somewhat stronger after the increase of the mandatory minimum wage with estimates of 0.312 and 0.453. Once again the changes are not statistically significant.
Finally, we examine the correlation between the average weekly wage and the poverty rate. This relationship is also negative prior to treatment with the two estimates of the correlation being -0.3070 and -0.413. This correlation also becomes weaker after treatment, increasing to -0.1762 and -0.261, and in contrast to the earlier results these changes are statistically significant, or marginally insignificant, at the 95 percent confidence level. While we leave a fuller explanation of this result to future work, this is an encouraging result for those who support the increase in the minimum wage. One might have expected that the increase in the minimum wage would result in higher average wages, which in turn result in higher unemployment and a greater number of individuals in poverty. While the first relationship between wages and unemployment is consistent with the first row of Table 8, the second relationship is not supported by the data. This result also highlights the importance of our approach. An examination of the univariate distributions suggests there is no response to treatment. However, the changes in these correlation values suggest that the bivariate relationships are sensitive to the treatment. This provides greater insight into the treatment effects and the mechanisms underlying them.

Figure 7: Results of 2-dimensional effects of the unemployment rate and the (log) average weekly wage. Panels: $F_{Y_0,Z_0|G,T}(y,z|1,1)$ (left) and $F_{Y_1,Z_1|G,T}(y,z|1,1)$ (right).
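The significance statements about the correlation changes can be checked against the numbers in Table 8. The sketch below recomputes each difference as the treated value minus the untreated value and flags a change as significant when the reported 95 percent interval for the difference excludes zero. Reading significance off the intervals in this way is an assumption made for illustration, and the dictionary keys are hypothetical labels.

```python
# (treated, untreated, reported CI of the difference), transcribed from Table 8.
TABLE8 = {
    ("kendall", "unemployment/wage"): (-0.0350, -0.1095, (-0.0021, 0.1511)),
    ("kendall", "unemployment/poverty"): (0.3119, 0.2564, (-0.1810, 0.2919)),
    ("kendall", "wage/poverty"): (-0.1762, -0.3070, (0.0060, 0.2554)),
    ("spearman", "unemployment/wage"): (-0.0522, -0.1316, (-0.0123, 0.1709)),
    ("spearman", "unemployment/poverty"): (0.4525, 0.3287, (-0.2032, 0.4508)),
    ("spearman", "wage/poverty"): (-0.2611, -0.4131, (-0.0144, 0.3184)),
}


def excludes_zero(interval):
    """True when both interval endpoints lie on the same side of zero."""
    lower, upper = interval
    return lower > 0 or upper < 0


# Difference between the treated and untreated correlation, rounded as in the table.
differences = {key: round(treated - untreated, 4)
               for key, (treated, untreated, _) in TABLE8.items()}
significant = [key for key, (_, _, ci) in TABLE8.items() if excludes_zero(ci)]

for key in TABLE8:
    label = "significant" if key in significant else "not significant"
    print(key, differences[key], label)
```

The recomputed differences agree with the Difference column up to rounding in the last digit (for example 0.1308 against the reported 0.1307). Only the interval for Kendall’s τ of the wage and poverty pair excludes zero, while the Spearman interval for the same pair narrowly includes it, which matches the description "statistically significant, or marginally insignificant".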

Figure 8: Results of 2-dimensional effects of the unemployment rate and the poverty rate. Panels: $F_{Y_0,Z_0|G,T}(y,z|1,1)$ (left) and $F_{Y_1,Z_1|G,T}(y,z|1,1)$ (right).

Figure 9: Results of 2-dimensional effects of the (log) average weekly wage and the poverty rate. Panels: $F_{Y_0,Z_0|G,T}(y,z|1,1)$ (left) and $F_{Y_1,Z_1|G,T}(y,z|1,1)$ (right).

Table 8: Results of Kendall’s τ for the treatment group when treated compared to not treated (with 95 percent confidence intervals between parentheses).

                                                  Treatment             Without treatment     Difference
Kendall’s τ
Unemployment rate and (log) average weekly wage   -0.0350               -0.1095               0.0745
                                                  (-0.0571, -0.0129)    (-0.1791, -0.0400)    (-0.0021, 0.1511)
Unemployment rate and poverty rate                0.3119                0.2564                0.0555
                                                  (0.2957, 0.3281)      (0.0184, 0.4944)      (-0.1810, 0.2919)
(log) Average weekly wage and poverty rate        -0.1762               -0.3070               0.1307
                                                  (-0.1950, -0.1574)    (-0.4308, -0.1832)    (0.0060, 0.2554)
Spearman’s correlation index
Unemployment rate and (log) average weekly wage   -0.0522               -0.1316               0.0793
                                                  (-0.0855, -0.0190)    (-0.2134, -0.0497)    (-0.0123, 0.1709)
Unemployment rate and poverty rate                0.4525                0.3287                0.1238
                                                  (0.4301, 0.4749)      (-0.0004, 0.6577)     (-0.2032, 0.4508)
(log) Average weekly wage and poverty rate        -0.2611               -0.4131               0.1520
                                                  (-0.2883, -0.2340)    (-0.5822, -0.2441)    (-0.0144, 0.3184)

Figure 10: Results of 2-dimensional effects. Panels: $F_{Y_0,Z_0|G,T}(y,z|1,1)$ and $F_{Y_1,Z_1|G,T}(y,z|1,1)$. Axes: proportion of households supported healthcare fee; proportion of households supported credit.

6 Conclusion
We provide a relatively simple distribution regression based estimator to implement the evaluation of treatment effects in a difference-in-differences setting. As our approach provides counterfactual distributions we are able to explore the impact of the treatment at different quantiles of the distribution of the outcome variable. For both the univariate and multivariate cases we provide the identifying assumption and the associated estimation algorithms. We provide two empirical examples that revisit existing studies and highlight the utility of various aspects of our approach.
Our analysis can easily be extended to the case of multiple time periods and more
than two outcomes. We can also extend our distributional regression framework to use
time and unit weights as in the synthetic difference-in-differences estimation method of
Arkhangelsky et al. (2021). We leave these extensions to future research (e.g. Fernández-
Val et al., 2024b).

References
Almond, D., H.W. Hoynes, and D.W. Schanzenbach (2011), “Inside the war on
poverty: the impact of food stamps on birth outcomes”, Review of Economics and
Statistics 93, 387–403.

Arkhangelsky, D. and G. Imbens (2023), “Causal Models for longitudinal and panel Data: a survey”, working paper, Stanford University.

Arkhangelsky, D., S. Athey, D.A. Hirshberg, G.W. Imbens, and S. Wager (2021), “Synthetic difference-in-differences”, American Economic Review 111, 4088–4118.

Athey, S. and G.W. Imbens (2006), “Identification and inference in nonlinear difference-in-differences models”, Econometrica 74, 431–97.

Biewen, M., M. Rümmele, and B. Fitzenberger (2022), “Using distribution regression Difference-in-Differences to evaluate the effects of a minimum wage introduction on the distribution of hourly wages and hours worked”, working paper, Nürnberg.

Blundell, R., C. Meghir, M. Costa Dias and J. Van Reenen (2004), “Evaluating the employment impact of a mandatory job search program”, Journal of the European Economic Association 2, 569–606.

Callaway, B. and T. Li (2019), “Quantile treatment effects in difference in differences models with panel data”, Quantitative Economics 10, 1579–1618.

Card, D. (1990), “The Impact of the Mariel Boatlift on the Miami Labor Market”, Industrial and Labor Relations Review 43, 245–57.

Card, D. and A.B. Krueger (1994), “Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania”, American Economic Review 84, 772–93.

Cengiz, D., A. Dube, A. Lindner, and B. Zipperer (2019), “The effect of minimum wages on low-wage jobs”, Quarterly Journal of Economics 134, 1405–54.

Chernozhukov, V., I. Fernández-Val, and B. Melly (2013), “Inference on counterfactual distributions”, Econometrica 81, 2205–68.

Chernozhukov, V., I. Fernández-Val, and S. Luo (2019), “Distribution regression with sample selection, with an application to wage decompositions in the UK”, working paper, MIT, Cambridge (MA).

Chernozhukov, V., I. Fernández-Val, B. Melly, and K. Wüthrich (2020), “Generic inference on quantile and quantile effect functions for discrete outcomes”, Journal of the American Statistical Association 115, 123–37.

Dube, A. (2019), “Minimum wages and the distribution of family incomes”, American Economic Journal: Applied Economics 11, 268–304.

Fernández-Val, I., J. Meier, A. van Vuuren and F. Vella (2024a), “Bivariate distribution regression with an application to intergenerational mobility”, working paper, Boston University.

Fernández-Val, I., J. Meier, A. van Vuuren and F. Vella (2024b), “Distributional synthetic difference-in-differences”, working paper, Boston University.

Foresi, S. and F. Peracchi (1995), “The conditional distribution of excess returns: an empirical analysis”, Journal of the American Statistical Association 90, 451–66.

Goodman-Bacon, A. (2021), “The long-run effects of childhood insurance coverage: medicaid implementation, adult health, and labor market outcomes”, American Economic Review 111, 2550–93.

Goodman-Bacon, A. and L. Schmidt (2020), “Federalizing benefits: The introduction of supplemental security income and the size of the safety net”, Journal of Public Economics 185, 104174.

Kim, D. and J.M. Wooldridge (2023), “Difference-in-differences estimator of quantile treatment effect on the treated”, working paper, Michigan State University.

MaCurdy, T. (2015), “How effective is the minimum wage at supporting the poor?”, Journal of Political Economy 123, 497–545.

Malesky, E.J., C.V. Nguyen, and A. Tran (2014), “The Impact of recentralization on public services: A difference-in-differences analysis of the abolition of elected councils in Vietnam”, American Political Science Review 108, 144–68.

Melly, B. and Santangelo (2015), “The changes-in-changes model with covariates”, working paper, Bern University.

Roth, J. and P.H.C. Sant’Anna (2023), “When Is parallel trends sensitive to functional form?”, Econometrica 91, 737–47.

Torous, W., F. Gunsilius, and P. Rigollet (2024), “An optimal transport approach to estimating causal effects via nonlinear difference-in-differences”, working paper, University of California, Berkeley.

Wooldridge, J.M. (2023), “Simple approaches to nonlinear difference-in-differences with panel data”, Econometric Journal 26, C31–66.

Williams, O. D. and J.E. Grizzle (1972), “Analysis of contingency tables having ordered response categories”, Journal of the American Statistical Association 67, 55–63.
