Distribution Regression Difference-in-Differences
September 5, 2024
Abstract
∗ Boston University.
† Swiss National Bank.
‡ University of Groningen.
§ Georgetown University.
1 Introduction
The remarkable popularity of the difference-in-differences (DiD) estimator, inspired by an
approach to evaluating the impact of policy interventions on economic outcomes intro-
duced by David Card (see, for example, Card 1990, Card and Krueger, 1994), is one of
the most striking features of empirical work on treatment and policy effects. While the
methodological innovations in this literature (see Arkhangelsky and Imbens, 2023 for a
recent review) include the use of constructed control groups, the staggered timing of treat-
ments, and fuzzy rather than sharp designs, the vast majority of the associated empirical
work has estimated the mean effect of the treatment on a single economic outcome. This
focus is somewhat limiting: a fuller evaluation of a policy treatment would examine
the marginal and joint distributions of all outcomes it potentially influences.
This paper provides a simple procedure for estimating distributional treatment
effects in the presence of a single treatment when the outcomes of interest are potentially
multivariate.
An initial methodological innovation focusing on distributional effects in DiD estima-
tion is the changes-in-changes procedure of Athey and Imbens (2006), which estimates
the counterfactual distribution of the treated group in the absence of treatment to compare
with its observed distribution in the presence of treatment. Torous et al. (2024)
extend the Athey and Imbens (2006) approach to the multiple outcome setting. Other
work has adapted DiD estimation to examine the treatment effects at different quantiles
of the outcome via the use of quantile regression. This includes, for example, Callaway
and Li (2018, 2019). In contrast, Dube (2019), Goodman-Bacon (2021), and Goodman-
Bacon and Schmidt (2020) employ conventional DiD estimation to explore the impact of
the treatment at different points of the outcome distribution. Other distributional ap-
proaches include Kim and Wooldridge (2023) and Biewen, Fitzenberger, and Rümmele
(2022). The former proposes an inverse probability weighting based procedure, while the
latter employs a distribution regression (DR) approach. In this paper we also adopt a
DR approach to constructing counterfactuals. In contrast to Biewen, Fitzenberger, and
Rümmele (2022), who construct the counterfactual distributions via linear probability
models, we employ non-linear link functions such as probit or logit models. This has a
number of advantages, which we discuss below. In addition, we provide the associated
identifying conditions required for this form of the implementation of DR-DiD.
While DiD has typically been employed to evaluate the treatment effect on a specified
economic outcome, there are many instances in which the treatment may affect multiple
outcomes. For example, a change in tax rates on earnings of married couples may affect the
hours of work of both husbands and wives. An analysis of such a tax change should include
the impact on each of the outcomes. However, a richer analysis would not only examine
the impact on the respective marginal hours distribution of husbands and wives but also
the joint distribution of hours. Alternatively, while evaluations of minimum wage laws
typically evaluate their impact on employment, they may also affect the wage distribution.
We illustrate how this joint effect can be evaluated via the bivariate distribution regression
(BDR) approach of Fernandez-Val et al. (2024a). This requires that we first estimate
the joint distribution by BDR and then construct the appropriate counterfactual. The
treatment effects are obtained via the appropriate comparisons. An alternative to this
approach is extending the changes-in-changes procedure to multiple outcomes as is done
in Torous et al. (2024).
The following section introduces the model and provides an analysis of the univariate
case without covariates. We also extend our analysis to include covariates and contrast
our approach with the Athey and Imbens (2006) changes-in-changes procedure. Section
3 extends our analysis to the multiple outcome case and Section 4 discusses estimation.
Section 5 provides two empirical illustrations of our methodology. We first revisit the
Malesky et al. (2014) investigation of the impact of recentralization in Vietnam. We
also employ the data studied in Callaway and Li (2019), supplemented with data from the
Bureau of Labor Statistics and the Census, in order to explore the impact of changes
in the minimum wage on the joint distributions of average wages, poverty rates and
unemployment rates. Section 6 concludes.
2 Econometric analysis of the univariate case
Consider the standard DiD design with 2 periods, T ∈ {0, 1}, and 2 groups, G ∈ {0, 1} in
which a binary treatment, D ∈ {0, 1}, is administered only to the treatment group with
G = 1 in the second period T = 1. Let Y0 and Y1 denote the potential outcomes under
the non-treated and treated statuses. The observed outcome is Y = Y0 (1 − D) + Y1 D,
which corresponds to Y0 for both groups at T = 0, Y0 for G = 0 at T = 1, and Y1 for
G = 1 at T = 1. Note that this implicitly imposes a non-anticipation assumption as we
do not distinguish between the outcomes of the treated and non-treated state for G = 1
in period T = 0.
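We are interested in the distributions of the potential outcomes of the treated at
T = 1, that is FY1 | G,T (y | 1, 1) and FY0 | G,T (y | 1, 1). FY1 | G,T (y | 1, 1) is identified from the
observed outcome for G = 1 at T = 1, whereas FY0 | G,T (y | 1, 1) is not identified without
further assumptions. To analyze its identification, we represent the distribution of Y0
conditional on G and T as:
$$F_{Y_0 \mid G,T}(y \mid g, t) = \Lambda(\alpha(y) + \beta(y) t + \gamma(y) g + \delta(y) g t), \quad y \in \mathbb{R}, \tag{1}$$
where Λ is an invertible CDF such as the logistic, normal or uniform, and y ↦ (α(y),
β(y), γ(y), δ(y)) is a vector of function-valued parameters.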
The representation in (1) does not make any parametric assumption about the under-
lying distribution of Y0 | G, T since the dummy variable representation within the paren-
theses on the right-hand side is fully saturated. The parameters of the representation are
local as they vary with y. To understand why (1) does not impose any restriction, note
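that α(y), β(y), γ(y) and δ(y) can be defined as:1
$$\begin{aligned}
\alpha(y) &= \Lambda^{-1}\left(F_{Y_0 \mid G,T}(y \mid 0, 0)\right), \\
\beta(y) &= \Lambda^{-1}\left(F_{Y_0 \mid G,T}(y \mid 0, 1)\right) - \Lambda^{-1}\left(F_{Y_0 \mid G,T}(y \mid 0, 0)\right), \\
\gamma(y) &= \Lambda^{-1}\left(F_{Y_0 \mid G,T}(y \mid 1, 0)\right) - \Lambda^{-1}\left(F_{Y_0 \mid G,T}(y \mid 0, 0)\right), \\
\delta(y) &= \left[\Lambda^{-1}\left(F_{Y_0 \mid G,T}(y \mid 1, 1)\right) - \Lambda^{-1}\left(F_{Y_0 \mid G,T}(y \mid 1, 0)\right)\right] \\
&\quad - \left[\Lambda^{-1}\left(F_{Y_0 \mid G,T}(y \mid 0, 1)\right) - \Lambda^{-1}\left(F_{Y_0 \mid G,T}(y \mid 0, 0)\right)\right].
\end{aligned}$$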
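As an illustration of why the saturated representation imposes no restriction, the following minimal sketch computes the four parameters at a single threshold y with a logistic link and checks that they reproduce every cell of the conditional distribution. The four probabilities are made-up numbers chosen for the example, and the helper names are hypothetical.

```python
import numpy as np

def logit(p):
    """Inverse of the logistic CDF (the link Lambda^{-1})."""
    return np.log(p / (1.0 - p))

def logistic(x):
    """Logistic CDF, playing the role of the link Lambda."""
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical values of F_{Y0|G,T}(y | g, t) at one fixed y, keyed by (g, t).
F = {(0, 0): 0.30, (0, 1): 0.45, (1, 0): 0.55, (1, 1): 0.60}

# Parameters defined exactly as in the display above.
alpha = logit(F[0, 0])
beta = logit(F[0, 1]) - logit(F[0, 0])
gamma = logit(F[1, 0]) - logit(F[0, 0])
delta = (logit(F[1, 1]) - logit(F[1, 0])) - (logit(F[0, 1]) - logit(F[0, 0]))

# The saturated index alpha + beta*t + gamma*g + delta*g*t recovers each cell.
rebuilt = {
    (g, t): logistic(alpha + beta * t + gamma * g + delta * g * t)
    for g in (0, 1)
    for t in (0, 1)
}
```

Any other invertible link could replace the logistic pair here; only the numerical values of the parameters would change, not the fit.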
Assumption 1 [No-interaction]. δ(y) = 0 for all y ∈ R in (1).
Assumption 2 [Support].
$$\mathcal{Y}_0(G = 1; T = 1) \subseteq \mathcal{Y}_0(G = 0; T = 1) \cup \mathcal{Y}_0(G = 1; T = 0) \cup \mathcal{Y}_0(G = 0; T = 0).$$
Assumption 1 implies that the distribution of the potential outcome Y0 should not
change differently in the second period for the treatment group compared to the control
group. That is, we allow a difference between the distributions of the potential outcome Y0
between the treatment and control group, but this difference should be identical in both
periods. This is a parallel trend type assumption on a transformation of the distribution
and can be written as:
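$$\Lambda^{-1}\left(F_{Y_0 \mid G,T}(y \mid 1, 1)\right) - \Lambda^{-1}\left(F_{Y_0 \mid G,T}(y \mid 1, 0)\right) = \Lambda^{-1}\left(F_{Y_0 \mid G,T}(y \mid 0, 1)\right) - \Lambda^{-1}\left(F_{Y_0 \mid G,T}(y \mid 0, 0)\right).$$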
This assumption is sensitive to the link function and imposes restrictions on the
distribution FY0 | G,T for some link functions. For example, if Λ is the identity link used in the
linear probability model as in, for example, Almond et al. (2011), Dube (2019), Cengiz
et al. (2019), Goodman-Bacon and Schmidt (2020), Goodman-Bacon (2021) and Biewen et
al. (2022), one needs strong requirements in order to satisfy the parallel trends
assumption (Blundell et al., 2004 and Wooldridge, 2023). That is, we need restrictions on the
tails of FY0 | G,T (y | 1, 0), FY0 | G,T (y | 0, 1) and FY0 | G,T (y | 0, 0) to guarantee
that FY0 | G,T (y | 1, 1) is between 0 and 1. Thus, it requires that FY0 | G,T (y | 1, 0) ≤
1 + FY0 | G,T (y | 0, 0) − FY0 | G,T (y | 0, 1), which might be restrictive at the top of the
distribution, and FY0 | G,T (y | 1, 0) ≥ FY0 | G,T (y | 0, 0) − FY0 | G,T (y | 0, 1), which might be restrictive at
the bottom of the distribution.2,3 Link functions such as the normal or logistic CDFs do
not require such restrictions since the transformation expands the range of the distribution
to the entire real line.
1 See also Wooldridge (2023) equations (2.6) and (2.7).
Assumption 1 cannot be empirically verified but when we have multiple observations in
the pre-treatment period, it is possible to examine whether the “parallel trends” assump-
tion holds pre-treatment. Assumption 2 is a restriction of the support of the counterfactual
outcome of Y0 for the treated group in the treated period.
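These two assumptions identify FY0 | G,T (y | 1, 1) since:
$$F_{Y_0 \mid G,T}(y \mid 1, 1) = \Lambda(\alpha(y) + \beta(y) + \gamma(y)) = \Lambda\left[\Lambda^{-1}\left(F_{Y \mid G,T}(y \mid 0, 1)\right) + \Lambda^{-1}\left(F_{Y \mid G,T}(y \mid 1, 0)\right) - \Lambda^{-1}\left(F_{Y \mid G,T}(y \mid 0, 0)\right)\right], \tag{2}$$
under Assumption 1. The support restrictions in Assumption 2 ensure that the term
inside the square brackets in (2) is well defined. Note that as limx→∞ Λ(x) = 1 and
limx→−∞ Λ(x) = 0, our assumptions are sufficient but not necessary.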
We present this identification result in the following lemma:
2 These requirements could be used to develop a specification test for the identity link. Roth and
Sant’Anna (2023) proposed a test for the sharp hypothesis that y ↦ FY0 | G,T (y | 1, 0) + FY0 | G,T (y | 0, 1) −
FY0 | G,T (y | 0, 0) be weakly increasing, which can be adapted to our setting. We do not pursue this route
as we do not encourage the use of the linear probability model.
3 For example, an increase of 0.2 in probability over time might be realistic for the control group when
the initial probability was 0.5. However, if the treatment group has a probability of, for example, 0.9 in the
first period, then it is not possible for the common trends assumption to hold.
Lemma 1 [Identification with Single Outcome]. y ↦ FY0 | G,T (y | 1, 1) is identified on
y ∈ R under Assumptions 1 and 2.
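The identification formula can be sketched numerically. The example below is illustrative only: it evaluates the counterfactual probability at one threshold under the logistic link and under the identity link, using the numbers from the footnote example (control group moving from 0.5 to 0.7, treated group starting at 0.9). The function name and argument naming are assumptions made for this sketch.

```python
import numpy as np

def logit(p):
    return np.log(p / (1.0 - p))

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def counterfactual_cdf(F00, F01, F10, link="logit"):
    """Counterfactual F_{Y0|G,T}(y | 1, 1) under the no-interaction assumption.

    Arguments are observed CDF values F_{Y|G,T}(y | g, t), named F<g><t>.
    """
    if link == "logit":
        return logistic(logit(F01) + logit(F10) - logit(F00))
    if link == "identity":
        # Linear probability model: parallel trends in probabilities.
        return F01 + F10 - F00
    raise ValueError(f"unknown link: {link}")

cf_logit = counterfactual_cdf(F00=0.5, F01=0.7, F10=0.9, link="logit")
cf_identity = counterfactual_cdf(F00=0.5, F01=0.7, F10=0.9, link="identity")
```

With these inputs the identity link implies a counterfactual probability above one, while the logistic link stays inside the unit interval, which is the point made in the discussion of the link function.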
$$F_{Y_0 \mid G,T,X}(y \mid g, t, x) = \Lambda(\alpha(y, x) + \beta(y, x) t + \gamma(y, x) g + \delta(y, x) g t), \quad y \in \mathbb{R}, \tag{3}$$
where (y, x) ↦ (α(y, x), β(y, x), γ(y, x), δ(y, x)) is a vector of unspecified functions.
The identifying assumptions with covariates become:
almost surely.
under Assumption 3. The support restrictions in Assumption 4 ensure that the
term between parentheses in (4) is well defined. Note that as limx→∞ Λ(x) = 1 and
limx→−∞ Λ(x) = 0, our assumptions are sufficient but not necessary.
Let X11 denote the support of X conditional on G = 1 and T = 1. The following
lemma states that FY0 | G,T,X is identified under the previous assumptions.
Lemma 2 [Identification with Covariates]. Under Assumptions 3 and 4, (y, x) ↦ FY0 | G,T,X (y | 1, 1, x)
is identified on (y, x) ∈ R × X11 .
Y0 (G = 1, T = 0) ⊆ Y0 (G = 0, T = 0),
Y0 (G = 1, T = 1) ⊆ Y0 (G = 0, T = 1).
Their second support restriction is less restrictive than ours but we do not need their
first support restriction. The previous assumptions identify the quantile function of
FY0 | G,T (y | 1, 1) as:
$$F^{-1}_{Y_0 \mid G,T}(u \mid 1, 1) = \phi\left(F^{-1}_{Y_0 \mid G,T}(u \mid 1, 0)\right), \quad \phi(y) := F^{-1}_{Y_0 \mid G,T}\left(F_{Y_0 \mid G,T}(y \mid 0, 0) \mid 0, 1\right), \quad u \in [0, 1],$$
The restrictions then follow from equalizing the coefficients on both sides.4 They would
complicate estimation in our framework as they involve two different levels of Y and the
transformation φ needs to be estimated.
4 There is only a binding restriction because α(y) = α(φ(y)) + β(φ(y)) holds by definition of φ(y).
2.3 Comparison with Roth and Sant’Anna (2023)
Roth and Sant’Anna (2023) derive the condition:
$$E(Y_0 \mid G = g, T = 1) - E(Y_0 \mid G = g, T = 0) = \int_{-\infty}^{\infty} \left[\Lambda(\alpha(y) + \gamma(y) g) - \Lambda(\alpha(y) + \beta(y) + \gamma(y) g)\right] dy$$
depends on g unless Λ is the identity map, or β(y) = 0 (no trend) or γ(y) = 0 (random
assignment) for y ∈ R. Roth and Sant’Anna (2023) show that their condition holds if and
only if there are no trends, random assignment or a mixture of both. Our model, how-
ever, generally satisfies a different invariance property with respect to strictly monotonic
transformations that we specify in subsection 2.4.
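To make the dependence on g concrete, the sketch below integrates the expression above numerically for a logistic link. The parameter functions are hypothetical choices for illustration (α(y) = y, a constant β(y) = −1 and γ(y) = 0.5y, so that the groups differ in scale); they are not taken from the paper.

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def mean_trend(g, alpha, beta, gamma, grid):
    """E(Y0 | G=g, T=1) - E(Y0 | G=g, T=0) via a Riemann sum of
    Lambda(alpha + gamma*g) - Lambda(alpha + beta + gamma*g) over the grid."""
    a, b, c = alpha(grid), beta(grid), gamma(grid)
    integrand = logistic(a + c * g) - logistic(a + b + c * g)
    dy = grid[1] - grid[0]
    return float(np.sum(integrand) * dy)

# Wide, fine grid; the integrand is negligible at both ends.
grid = np.linspace(-80.0, 80.0, 160001)

# Hypothetical function-valued parameters (assumptions for this sketch).
alpha = lambda y: y
beta = lambda y: -1.0 + 0.0 * y
gamma = lambda y: 0.5 * y

trend_control = mean_trend(0, alpha, beta, gamma, grid)  # group G = 0
trend_treated = mean_trend(1, alpha, beta, gamma, grid)  # group G = 1
```

In this design the no-interaction condition holds by construction, yet the mean trend is 1 for G = 0 and 2/3 for G = 1, so parallel trends in means fails.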
$$F_{\tilde{Y}_0 \mid G,T,X}(\tilde{y} \mid g, t, x) = \Lambda(\alpha(h^{-1}(\tilde{y})) + \beta(h^{-1}(\tilde{y})) t + \gamma(h^{-1}(\tilde{y})) g) = \Lambda(\tilde{\alpha}(\tilde{y}) + \tilde{\beta}(\tilde{y}) t + \tilde{\gamma}(\tilde{y}) g),$$
where ỹ ↦ h−1 (ỹ) is the inverse function of y ↦ h(y), α̃ = α ◦ h−1 , β̃ = β ◦ h−1 and
γ̃ = γ ◦ h−1 . A similar argument applies when h is strictly decreasing. Unlike parallel
trends in expectation, the no-interaction or parallel trends in distribution assumption is
invariant to strictly monotonic transformations.5
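The invariance property can be checked on simulated data. The following sketch, with made-up normal samples standing in for the three untreated cells, computes the counterfactual distribution once for Y and once for exp(Y), evaluating the latter at the transformed thresholds. Function names are hypothetical.

```python
import numpy as np

def logit(p):
    return np.log(p / (1.0 - p))

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def ecdf(sample, grid):
    """Empirical CDF of `sample` evaluated at each point of `grid`."""
    return np.mean(sample[:, None] <= grid[None, :], axis=0)

def dr_did_counterfactual(y00, y01, y10, grid):
    """Counterfactual CDF for the treated in the post period (no covariates)."""
    F00, F01, F10 = (ecdf(s, grid) for s in (y00, y01, y10))
    return logistic(logit(F01) + logit(F10) - logit(F00))

rng = np.random.default_rng(0)
y00 = rng.normal(0.0, 1.0, 400)   # G = 0, T = 0
y01 = rng.normal(0.3, 1.0, 400)   # G = 0, T = 1
y10 = rng.normal(0.5, 1.2, 400)   # G = 1, T = 0
grid = np.linspace(-0.5, 1.0, 7)  # interior thresholds, so no ECDF is 0 or 1

cf_levels = dr_did_counterfactual(y00, y01, y10, grid)
cf_exp = dr_did_counterfactual(np.exp(y00), np.exp(y01), np.exp(y10), np.exp(grid))
```

The two vectors coincide because a strictly increasing transformation leaves every indicator 1(Y ≤ y) unchanged once the threshold is transformed as well.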
3 Multiple Outcomes
Some settings may feature multiple outcomes which are potentially affected by the
treatment. In these situations we might be interested in not only how each of the outcomes
is affected by the treatment, but also how the relationship between the outcomes is
affected by the treatment. For this it is necessary to identify the joint distribution of
the potential outcomes with and without the treatment. We now consider a setting with
two outcomes Y and Z and we focus on comparing features of the joint distribution of
the potential outcomes with the treatment, Y1 and Z1 , and the joint distribution of the
potential outcomes without the treatment, Y0 and Z0 , for the treated group G = 1 in the
post-treatment period T = 1. For the sake of illustration we consider two measures of
dependence, namely Spearman’s and Kendall’s rank correlations.
Spearman’s rank correlation between Yd and Zd , d ∈ {0, 1}, can be expressed:
and Kendall’s rank correlation between Yd and Zd , d ∈ {0, 1}, can be expressed:
$$\tau[Y_d, Z_d \mid G = 1, T = 1] = 4 \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \left[F_{Y_d,Z_d \mid G,T}(y, z \mid 1, 1) - 1/4\right] F_{Y_d,Z_d \mid G,T}(dy, dz \mid 1, 1),$$
where we assume that Yd and Zd are continuous random variables to obtain the expressions
on the right hand side.
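The expression for Kendall’s rank correlation is equivalent to 4E[F(Y, Z)] − 1, which suggests a plug-in computation from an estimated joint distribution. The sketch below is an illustration on simulated data (not the paper’s data): it applies the plug-in formula with the empirical joint CDF and compares it with the textbook pairwise definition. The two differ only by a term of order 1/n.

```python
import numpy as np

def kendall_tau_from_cdf(y, z):
    """Plug-in tau = 4 * mean_i F_hat(Y_i, Z_i) - 1 with the empirical joint CDF."""
    below = (y[None, :] <= y[:, None]) & (z[None, :] <= z[:, None])
    F_at_obs = below.mean(axis=1)  # F_hat evaluated at each observation
    return 4.0 * F_at_obs.mean() - 1.0

def kendall_tau_pairs(y, z):
    """Sample tau: (concordant - discordant pairs) / number of pairs."""
    s = np.sign(y[:, None] - y[None, :]) * np.sign(z[:, None] - z[None, :])
    n = len(y)
    return s[np.triu_indices(n, k=1)].sum() / (n * (n - 1) / 2)

rng = np.random.default_rng(1)
y = rng.normal(size=500)
z = 0.6 * y + 0.8 * rng.normal(size=500)  # positively dependent outcomes

tau_cdf = kendall_tau_from_cdf(y, z)
tau_pairs = kendall_tau_pairs(y, z)
```

The same plug-in function could be applied to a counterfactual joint distribution once it has been estimated on a grid, which is how the dependence measures for Y0 and Z0 would be obtained.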
As in the univariate case, FY1 ,Z1 | G,T (y, z | 1, 1) is identified by the joint distribution
of the observed outcomes, FY,Z | G,T (y, z | 1, 1), whereas FY0 ,Z0 | G,T (y, z | 1, 1) is not
identified from the data. To analyze identification, we use a variation of the local Gaussian
representation (LGR) of a bivariate distribution from Chernozhukov, Fernández-Val and
Luo (2018). Let Φ denote the Gaussian distribution function and Φ2 (·, ·; ρ) denote the
distribution of the bivariate standard normal with parameter ρ. Moreover, Λ is, again,
a strictly increasing cumulative distribution function. As we show in Section 4, there is
a benefit of using the logistic link function in our univariate analysis. Accordingly, we
employ this in our empirical analysis for estimating both the univariate and bivariate
effects.
5 The distributional approach of Kim and Wooldridge (2023) also satisfies this property.
Lemma 3 [LGR with non-Normal Marginals]. The joint distribution of two random
variables Y and Z conditional on X can be represented by:
$$F_{Y,Z \mid X}(y, z \mid x) \equiv \Phi_2\left(\Phi^{-1}(\Lambda(\mu_{Y \mid X}(y \mid x))), \Phi^{-1}(\Lambda(\mu_{Z \mid X}(z \mid x))); \rho_{Y,Z \mid X}(y, z \mid x)\right),$$
for all y, z, x, where µY | X (y | x) = Λ−1 (FY | X (y | x)), µZ | X (z | x) = Λ−1 (FZ | X (z | x)), and
ρY,Z | X (y, z | x) is the unique solution in ρ to the equation:
$$F_{Y,Z \mid X}(y, z \mid x) = \Phi_2\left(\Phi^{-1}(F_{Y \mid X}(y \mid x)), \Phi^{-1}(F_{Z \mid X}(z \mid x)); \rho\right).$$
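Proof. The proof is identical to the proof of Lemma 2.1 of Chernozhukov, Fernández-Val
and Luo (2018) using:
$$\Phi^{-1}(\Lambda(\mu_{Y \mid X}(y \mid x))) = \Phi^{-1}(F_{Y \mid X}(y \mid x))$$
and
$$\Phi^{-1}(\Lambda(\mu_{Z \mid X}(z \mid x))) = \Phi^{-1}(F_{Z \mid X}(z \mid x)).$$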
The difference between Lemma 3 and the LGR of Chernozhukov, Fernández-Val and
Luo (2018) is that the marginals are represented by a general link rather than Gaussian
links, that is:
$$F_{Y_0,Z_0 \mid G,T}(y, z \mid g, t) = \Phi_2\left(\Phi^{-1}(\Lambda(\mu_{Y_0 \mid G,T}(y \mid g, t))), \Phi^{-1}(\Lambda(\mu_{Z_0 \mid G,T}(z \mid g, t))); \rho_{Y_0,Z_0 \mid G,T}(y, z \mid g, t)\right), \tag{7}$$
where µY0 | G,T (y | g, t) = αY (y) + βY (y)t + γY (y)g + δY (y)gt, µZ0 | G,T (z | g, t) = αZ (z) +
βZ (z)t + γZ (z)g + δZ (z)gt, and ρY0,Z0 | G,T (y, z | g, t) = αY,Z (y, z) + βY,Z (y, z)t + γY,Z (y, z)g +
δY,Z (y, z)gt. In the LGR, the marginals are represented by:
$$F_{Y_0 \mid G,T}(y \mid g, t) = \Lambda(\alpha_Y(y) + \beta_Y(y) t + \gamma_Y(y) g + \delta_Y(y) g t)$$
and
$$F_{Z_0 \mid G,T}(z \mid g, t) = \Lambda(\alpha_Z(z) + \beta_Z(z) t + \gamma_Z(z) g + \delta_Z(z) g t).$$
We make the following identifying assumptions with respect to the distribution func-
tion in (7).
$$\mathcal{YZ}_0(G = 1; T = 1) \subseteq \mathcal{YZ}_0(G = 0; T = 1) \cup \mathcal{YZ}_0(G = 1; T = 0) \cup \mathcal{YZ}_0(G = 0; T = 0).$$
Lemma 4 [Identification with Two Outcomes]. (y, z) ↦ FY0 ,Z0 | G,T (y, z | 1, 1) is identified
on R2 under Assumptions 5 and 6.
Proof of Lemma 4. Under the assumptions of the Lemma, µY0 | G,T (y | g, t) = αY (y) +
βY (y)t + γY (y)g, µZ0 | G,T (z | g, t) = αZ (z) + βZ (z)t + γZ (z)g, and ρY,Z | G,T (y, z | g, t) =
αY,Z (y, z) + βY,Z (y, z)t + γY,Z (y, z)g.
The parameters αY (y), βY (y), γY (y), αZ (z), βZ (z), and γZ (z) are identified from the
marginals of Y and Z, by Lemma 1. The parameter αY,Z (y, z) is identified as the solution
in α to:
FY,Z | G,T (y, z | 0, 0) = Φ2 (αY (y) + βY (y)t + γY (y)g, αZ (z) + βZ (z)t + γZ (z)g; α).
This solution exists and is unique because the RHS is strictly increasing in α. The
parameters βY,Z (y, z) and γY,Z (y, z) are identified similarly as the solutions in β and γ of:
FY,Z | G,T (y, z | 0, 1) = Φ2 (αY (y) + βY (y)t + γY (y)g, αZ (z) + βZ (z)t + γZ (z)g; αY,Z (y, z) + β).
and
FY,Z | G,T (y, z | 1, 0) = Φ2 (αY (y) + βY (y)t + γY (y)g, αZ (z) + βZ (z)t + γZ (z)g; αY,Z (y, z) + γ).
Finally,
FY0 ,Z0 | G,T (y, z | 1, 1) = Φ2 (αY (y) + βY (y) + γY (y), αZ (z) + βZ (z) + γZ (z); αY,Z (y, z)+
βY,Z (y, z) + γY,Z (y, z)).
δY (y, X) = δZ (z, X) = δY,Z (y, z, X) = 0 almost surely for all (y, z) ∈ R2 in (8).
YZ 0 (G = 1; T = 1; X) ⊆
YZ 0 (G = 0; T = 1; X) ∪ YZ 0 (G = 1; T = 0; X) ∪ YZ 0 (G = 0; T = 0; X),
almost surely.
Lemma 5 [Identification with Two Outcomes and Covariates]. Under Assumptions 7 and
8, (y, z, x) ↦ FY0 ,Z0 | G,T,X (y, z | 1, 1, x) is identified on R2 × X11 .
4 Estimation
$$F_{Y_0 \mid G,T,X}(y \mid g, t, x) = \Lambda(p_\alpha(x)'\alpha(y) + p_\beta(x)'\beta(y) t + p_\gamma(x)'\gamma(y) g), \quad y \in \mathbb{R}, \tag{9}$$
where pα (x), pβ (x) and pγ (x) are vectors including the covariates and their
transformations, and y ↦ (α(y), β(y), γ(y)) is a vector of function-valued parameters.
We implement the DR-DiD estimator via a sequence of logit models at each point
of the distribution of the outcome variable (Foresi and Peracchi, 1995, Chernozhukov,
Fernandez-Val and Melly, 2013). We choose logit because it is the canonical link for binary
outcomes allowing for pooled estimation of the distributions of the potential outcomes
with and without the treatment (Wooldridge, 2023). Accordingly, we estimate the DR
model for the observed outcomes on all observations including those with Di = 1:
$$F_{Y \mid G,T,X}(y \mid g, t, x) = \Lambda(p_\alpha(x)'\alpha(y) + p_\beta(x)'\beta(y) t + p_\gamma(x)'\gamma(y) g + p_\theta(x)'\theta(y) g t), \quad y \in \mathcal{Y}, \tag{10}$$
where pα (x), pβ (x), pγ (x) and pθ (x) are vectors including a constant as the first compo-
nent, and transformations of the covariates, and Y is a finite grid on R. Let Iiy := 1(Yi ≤ y)
and I¯iy = 1 − Iiy .
and
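$$\hat{F}_{Y_1 \mid G,T}(y \mid 1, 1) = \frac{1}{N_{11}} \sum_{i=1}^{N} G_i T_i \Lambda(p_\alpha(X_i)'\hat{\alpha}(y) + p_\beta(X_i)'\hat{\beta}(y) + p_\gamma(X_i)'\hat{\gamma}(y) + p_\theta(X_i)'\hat{\theta}(y)),$$
where $N_{11} = \sum_{i=1}^{N} G_i T_i$.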
By the properties of the logistic link, the estimator of FY1 | G,T (y | 1, 1) is identical to
the empirical distribution of Y conditional on G = 1 and T = 1,
$$\hat{F}_{Y_1 \mid G,T}(y \mid 1, 1) \equiv \frac{1}{N_{11}} \sum_{i=1}^{N} G_i T_i 1(Y_i \leq y).$$
Note that this estimator is therefore invariant to the specification of pθ (x). We set pθ (x) =
1 to speed up computation. Estimators of the functionals of the distributions of potential
outcomes such as quantile functions and effects can be constructed using the plug-in
principle.
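The pooled estimation strategy can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper’s code: it uses a single covariate with pα(x) = pβ(x) = pγ(x) = (1, x)′ and pθ(x) = 1, a hand-written Newton-Raphson logit fit, and simulated data. All function names and the data-generating process are hypothetical.

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def fit_logit(X, b, n_iter=50, tol=1e-10):
    """Logit maximum likelihood by Newton-Raphson; X is (N, k), b is 0/1."""
    coef = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = logistic(X @ coef)
        score = X.T @ (b - p)
        hessian = (X * (p * (1.0 - p))[:, None]).T @ X
        step = np.linalg.solve(hessian, score)
        coef += step
        if np.max(np.abs(step)) < tol:
            break
    return coef

def dr_did(y, g, t, x, grid):
    """Estimates of F_{Y0|G,T}(.|1,1) and F_{Y1|G,T}(.|1,1) on `grid`."""
    ones = np.ones_like(y)
    # Columns: p_alpha = (1, x), p_beta * t, p_gamma * g, and the g*t dummy.
    design = np.column_stack([ones, x, t, t * x, g, g * x, g * t])
    treated = (g * t) == 1
    design_cf = design[treated].copy()
    design_cf[:, -1] = 0.0  # switch off the treatment term for the counterfactual
    F0, F1 = [], []
    for thr in grid:
        coef = fit_logit(design, (y <= thr).astype(float))
        F0.append(logistic(design_cf @ coef).mean())
        F1.append(logistic(design[treated] @ coef).mean())
    return np.array(F0), np.array(F1)

# Simulated data (an assumption for illustration only).
rng = np.random.default_rng(2)
N = 2000
g = rng.integers(0, 2, N).astype(float)
t = rng.integers(0, 2, N).astype(float)
x = rng.normal(size=N)
y = 0.5 * x + 0.3 * t + 0.4 * g + 0.5 * g * t + rng.logistic(size=N)
grid = np.quantile(y, [0.2, 0.4, 0.6, 0.8])

F0_hat, F1_hat = dr_did(y, g, t, x, grid)
```

Because the logit score equation for the g·t dummy sets the average fitted probability among treated post-period units equal to their average indicator, `F1_hat` coincides with the empirical distribution in that cell, as stated above.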
Our algorithm has an advantage over the alternative of estimating pα , pβ , pγ using
only those observations for which Di = 0 via direct estimation of (10). The distributional
treatment effect, i.e.
$$E_X\left[F_{Y_1 \mid G,T,X}(y \mid 1, 1, X) - F_{Y_0 \mid G,T,X}(y \mid 1, 1, X)\right],$$
equals the average derivative estimator of Di for the logit model used for distribution
regression. This estimate of the average derivative and its standard error are reported by
many software packages (Wooldridge, 2024).
µZ0 | G,T,X (z | g, t, x) = qα (x)′ αZ (z) + qβ (x)′ βZ (z)t + qγ (x)′ γZ (z)g, (12)
and
ρY,Z | G,T,X (y, z | g, t, x) = h(rα (x)′ αY,Z (y, z) + rβ (x)′ βY,Z (y, z)t + rγ (x)′ γY,Z (y, z)g), (13)
where pα (x), pβ (x), pγ (x), qα (x), qβ (x), qγ (x), rα (x), rβ (x) and rγ (x) are vectors including
the covariates and their transformations, and h(u) = tanh(u) is the inverse Fisher
transformation, which enforces ρY,Z | G,T,X to lie in (−1, 1).
We estimate all the parameters of FY0 ,Z0 | G,T (y, z | 1, 1) using the bivariate distribution
regression estimator of Fernandez-Val et al. (2024a). We employ an imputation method
that combines the parameter estimates from the sample of the first period for both groups
and the sample of the second period for the untreated group, with the sample of the
covariates in the second period for the treated group. The distribution FY1 ,Z1 | G,T (y, z | 1, 1)
is estimated using the empirical distribution of Y and Z in the second period for the
treated group. Algorithm 2 describes the estimation procedure. Let Y and Z be finite
grids on R, Iiy := 1(Yi ≤ y), I¯iy = 1 − Iiy , Jiz := 1(Zi ≤ z), J¯iz = 1 − Jiz .
Algorithm 2 [Bivariate Estimator].
1. Estimate the parameters of (11) and (12) using Algorithm 1 on y ∈ Y and z ∈ Z. Obtain
$$\hat{m}^Y_i(y) = \Phi^{-1}(\Lambda(p_\alpha(X_i)'\hat{\alpha}_Y(y) + p_\beta(X_i)'\hat{\beta}_Y(y) T_i + p_\gamma(X_i)'\hat{\gamma}_Y(y) G_i)),$$
and
$$\hat{m}^Z_i(z) = \Phi^{-1}(\Lambda(q_\alpha(X_i)'\hat{\alpha}_Z(z) + q_\beta(X_i)'\hat{\beta}_Z(z) T_i + q_\gamma(X_i)'\hat{\gamma}_Z(z) G_i)),$$
where α̂Y (y), β̂Y (y), γ̂Y (y), α̂Z (z), β̂Z (z) and γ̂Z (z) are the estimates of αY (y),
βY (y), γY (y), αZ (z), βZ (z) and γZ (z) obtained from Algorithm 1.
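3. Construct plug-in estimators of the distributions of the potential outcomes
$$\hat{F}_{Y_0,Z_0 \mid G,T}(y, z \mid 1, 1) = \frac{1}{N_{11}} \sum_{i=1}^{N} G_i T_i \Phi_2(\hat{n}^Y_i(y), \hat{n}^Z_i(z); \hat{n}^{Y,Z}_i(y, z)),$$
and
$$\hat{F}_{Y_1,Z_1 \mid G,T}(y, z \mid 1, 1) = \frac{1}{N_{11}} \sum_{i=1}^{N} G_i T_i 1(Y_i \leq y, Z_i \leq z),$$
where
$$\hat{n}^Y_i(y) = \Phi^{-1}(\Lambda(p_\alpha(X_i)'\hat{\alpha}_Y(y) + p_\beta(X_i)'\hat{\beta}_Y(y) + p_\gamma(X_i)'\hat{\gamma}_Y(y))),$$
$$\hat{n}^Z_i(z) = \Phi^{-1}(\Lambda(q_\alpha(X_i)'\hat{\alpha}_Z(z) + q_\beta(X_i)'\hat{\beta}_Z(z) + q_\gamma(X_i)'\hat{\gamma}_Z(z))),$$
$$\hat{n}^{Y,Z}_i(y, z) = h(r_\alpha(X_i)'\hat{\alpha}_{Y,Z}(y, z) + r_\beta(X_i)'\hat{\beta}_{Y,Z}(y, z) + r_\gamma(X_i)'\hat{\gamma}_{Y,Z}(y, z)),$$
and $N_{11} = \sum_{i=1}^{N} G_i T_i$.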
2. Draw weights for each unit independently and identically from the standard
exponential distribution, independently of the data. Construct a vector of weights
ω = (ω1 , . . . , ωN ), where ωi = ωj if IDi = IDj , and normalize the components
of ω to add up to one.
3. Estimate the parameters of model (10) by weighted distribution regression, that is,
for y ∈ Y,
$$(\tilde{\alpha}(y), \tilde{\beta}(y), \tilde{\gamma}(y), \tilde{\theta}(y)) \in \arg\max_{a,b,c,d} \sum_{i=1}^{N} \omega_i \ell_i(a, b, c, d),$$
and
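$$\hat{F}^b_{Y_1 \mid G,T}(y \mid 1, 1) = \frac{1}{N_{11}} \sum_{i=1}^{N} \omega_i G_i T_i \Lambda(p_\alpha(X_i)'\tilde{\alpha}(y) + p_\beta(X_i)'\tilde{\beta}(y) + p_\gamma(X_i)'\tilde{\gamma}(y) + p_\theta(X_i)'\tilde{\theta}(y)),$$
where $N_{11} = \sum_{i=1}^{N} \omega_i G_i T_i$.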
6. Construct an estimator of the (1−α)-critical value of the maximal t-statistic, t̄Y (1−α),
as the (1 − α)-quantile of {t̄bY : 1 ≤ b ≤ B}, where
$$\bar{t}^b_Y = \max_{y \in \mathcal{Y}} \left[\frac{|\hat{F}^b_{Y_0 \mid G,T}(y \mid 1, 1) - \hat{F}_{Y_0 \mid G,T}(y \mid 1, 1)|}{S_0(y)}, \frac{|\hat{F}^b_{Y_1 \mid G,T}(y \mid 1, 1) - \hat{F}_{Y_1 \mid G,T}(y \mid 1, 1)|}{S_1(y)}\right],$$
and Sd (y) is the interquartile range of {F̂bYd | G,T (y | 1, 1) : 1 ≤ b ≤ B} divided by
1.34896, the interquartile range of the standard normal distribution, for d ∈ {0, 1}.
$$CB_{1-\alpha}[F_{Y_d \mid G,T}(\cdot \mid 1, 1)] = \hat{F}_{Y_d \mid G,T}(y \mid 1, 1) \pm \bar{t}_Y(1 - \alpha) S_d(y), \quad d \in \{0, 1\}.$$
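The bootstrap steps can be sketched in simplified form. The code below is an illustration only: it builds a uniform band for a single weighted empirical distribution rather than for both potential-outcome distributions jointly, so the maximum is taken over the grid only. It does follow the same ingredients: exponential weights shared within a cluster, an interquartile-range scale, and a maximal t-statistic. The function names, cluster structure and simulated data are assumptions.

```python
import numpy as np

def weighted_ecdf(y, w, grid):
    """Weighted empirical CDF on `grid`, with weights normalized to sum to one."""
    w = w / w.sum()
    return (w[:, None] * (y[:, None] <= grid[None, :])).sum(axis=0)

def uniform_band(y, ids, grid, level=0.90, B=500, seed=0):
    """Uniform confidence band for a CDF using clustered exponential weights."""
    rng = np.random.default_rng(seed)
    clusters, inverse = np.unique(ids, return_inverse=True)
    est = weighted_ecdf(y, np.ones_like(y), grid)
    draws = np.empty((B, len(grid)))
    for b in range(B):
        # One exponential draw per cluster, copied to all of its members.
        w = rng.exponential(size=len(clusters))[inverse]
        draws[b] = weighted_ecdf(y, w, grid)
    q75, q25 = np.quantile(draws, [0.75, 0.25], axis=0)
    scale = (q75 - q25) / 1.34896          # robust standard error S(y)
    t_max = np.max(np.abs(draws - est) / scale, axis=1)
    crit = np.quantile(t_max, level)       # critical value of the maximal t-statistic
    return est, est - crit * scale, est + crit * scale, crit

# Simulated clustered data: 40 clusters of 25 units with a common cluster shock.
rng = np.random.default_rng(3)
ids = np.repeat(np.arange(40), 25)
y = rng.normal(size=40)[ids] + rng.normal(size=1000)
grid = np.quantile(y, np.linspace(0.1, 0.9, 9))

est, lower, upper, crit = uniform_band(y, ids, grid)
```

Extending the sketch to the procedure described above would require re-estimating the weighted distribution regression in each draw and taking the maximum over both d = 0 and d = 1.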
Remark 1 (Empirical Bootstrap). Empirical bootstrap can be implemented by drawing
the weights in step 2 from a multinomial distribution with values 1, . . . , n and equal
probabilities 1/n.
5 Empirical applications
We illustrate our proposed estimators via an examination of data from two existing
empirical investigations.6 The first is the Malesky et al. (2014) investigation of the impact
of recentralization in Vietnam. This paper is useful for our purposes as it considers a
single treatment and multiple outcomes. Moreover, Malesky et al. (2014) only report
mean effects. We examine whether these mean effects are informative of the heterogene-
ity across the outcomes’ distributions. We also investigate whether the treatment affects
the bivariate distribution of different combinations of outcomes. In a second example
we explore the impact of increases in the mandatory minimum wage on average weekly
wages, unemployment and poverty rates by examining an extension of the data in
Callaway and Li (2019). We estimate the impact of the treatment on each of these outcome
distributions and also the bivariate distributions of some combinations of the outcomes.
Malesky et al. (2014) investigate the impact of recentralization via a case study in
Vietnam. Due to dissatisfaction with decentralization measures taken in the early 1990s,
Vietnam changed its political system in 2007 by eliminating one political layer from the
decision-making process. More explicitly, Vietnam has four layers of the political process:
the central government, the provinces (63 in total), the districts (696 in total), and the
communes (more than 11,000 in total).7 The change involved abolishing the political
process at the district level (districts are governed by the so-called District People’s Council
or DPC). Prior to introducing this change, the Vietnamese government experimented with ten
provinces (with 99 districts). Malesky et al. (2014) use this experiment for their empirical
analysis. Note that the experiment was not random, but was decided by the central
government to be stratified based on regions and subregions as well as on rural versus urban
areas and by socioeconomic and public administration performance of the provinces. The
decision to start this experiment was made in 2008 and the abolition of the DPC in the
treatment districts started in 2009.
6 We thank Brant Callaway, Tong Li and Pedro Sant’Anna for providing us with the data.
7 The total population of Vietnam was 84.76 million in 2007.
Malesky et al. (2014) employ the following specification:
where Yit is the outcome variable for period t of commune i. Tt is a dummy variable
that equals one in the treated period while Gi is a dummy variable that equals one if
the commune i belongs to a treated district. Finally, Xit is a set of control variables
for commune i and in period t. Malesky et al. (2014) consider the log surface area of
the commune, the log of the commune population density, whether the commune belongs
to a national level city, and region dummies (8 regions in total). For reasons of data
availability, Malesky et al. (2014) only use rural communes and two years of observations:
2008 and 2010 (they also use 2006 for robustness checks). They analyze 30 different
outcome variables to investigate the impact of the abolition of the political layer. As
most of their outcome variables are indicator variables, we only employ the following
eight of their original outcome variables: (1) proportion of households supported crop, (2)
proportion of households supported agricultural extension, (3) proportion of households
supported agriculture tax exemption, (4) the number of visits of agricultural extension
staff, (5) proportion of households supported healthcare fee, (6) proportion of households
supported tuition fee, (7) proportion of households supported credit, and (8) proportion
of households supported business tax exemption.
We use the following specification for FY0,i |Gi ,Ti ,Xit (·|g, t, x):
$$F_{Y_{0,i} \mid G_i,T_i,X_{it}}(y \mid g_i, t_t, x_{it}) = \Lambda(\alpha(y) + \beta(y) t_t + \gamma(y) g_i + x_{it}'\pi(y)),$$
and use the same control variables as in Malesky et al. (2014). We estimate the counter-
factual distribution using:
where α̂(y), β̂(y), γ̂(y), and π̂(y) are estimated via distribution regression at y. We
estimate the unconditional distribution using
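$$F_{Y_{0,i} \mid G_i,T_i}(y \mid 1, 1) = \int_{\mathcal{X}(1,1)} F_{Y_{0,i} \mid G_i,T_i,X_{it}}(y \mid 1, 1, x) \, dF_{X_{it} \mid G,T}(x \mid 1, 1).$$
$$\hat{F}_{Y_{0,i} \mid G_i,T_i}(y \mid 1, 1) = \frac{1}{N_{11}} \sum_{i: G_i = 1, T_i = 1} \hat{F}_{Y_{0,i,t} \mid G_i,T_i,X_{i1}}(y \mid 1, 1, X_{i,1}),$$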
where N11 is the total number of observations for which Gi = 1, Ti = 1. We estimate the
quantile treatment effects by inverting the estimated distributions of FY0,i,t |Gi ,Ti (y|1, 1) and
FY1,i,t |Gi ,Ti (y|1, 1), where we estimate FY1,i,t |Gi ,Ti (y|1, 1) by using the empirical distribution.
In particular we use
$$\hat{F}^{-1}_{Y_{j,i,t} \mid G_i,T_i}(q \mid 1, 1) = \inf\{y : \hat{F}_{Y_{j,i,t} \mid G_i,T_i}(y \mid 1, 1) \geq q\}, \quad j = 0, 1.$$
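The inversion step can be sketched on a grid as follows. The example is illustrative: the grid and the two distribution functions are made-up numbers, chosen so that both distributions have a mass point at zero, and the function name is hypothetical. It assumes the estimated distribution values are nondecreasing along the grid.

```python
import numpy as np

def quantile_from_cdf(grid, cdf_values, q):
    """Left-inverse inf{y in grid : F(y) >= q} for a nondecreasing CDF on a sorted grid."""
    idx = np.searchsorted(cdf_values, q, side="left")  # first index with F >= q
    if idx == len(grid):
        raise ValueError("q exceeds the largest CDF value on the grid")
    return grid[idx]

# Hypothetical estimated distributions with bunching at zero.
grid = np.array([0.0, 0.1, 0.2, 0.3, 0.4])
F0_hat = np.array([0.50, 0.60, 0.75, 0.90, 1.00])  # counterfactual, untreated
F1_hat = np.array([0.55, 0.70, 0.85, 0.95, 1.00])  # observed, treated

quantiles = [0.25, 0.5, 0.75, 0.9]
qte = {
    q: quantile_from_cdf(grid, F1_hat, q) - quantile_from_cdf(grid, F0_hat, q)
    for q in quantiles
}
```

In this example both distributions place at least half of their mass at zero, so the quantile treatment effect is zero by construction at every quantile up to the median.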
Results of our empirical exercise are in Figure 1. The quantile treatment effects are listed
in Table 1. We estimate the quantile treatment effects by inverting the estimated
distribution functions. As in Malesky et al. (2014), we correct the confidence intervals for
clustering at the province level. We use the Bayesian bootstrap and draw the same
exponential weight for all observations belonging to the same province (see also Chernozhukov
et al., 2020). We construct the confidence bounds by following steps 1-4 of Algorithm 1 of
Chernozhukov et al. (2020) but we use directly the quantile treatment effects rather than
the estimated distributions. This is allowed provided we assume the outcome variable
is continuously distributed. For some outcome variables, there is substantial bunching
at zero. For example, for the variable “Proportion of households supported crop”, 49.93
percent of the observations equal zero. The share of zeros is 54.04 percent for the
treatment group in the treatment period and 50.79 percent for the control group. This implies
that the quantile treatment effect is by definition equal to zero up to the median and the
impact can only occur at the higher quantiles. Note that we employ dots to distinguish
this from cases in which there is an estimated zero impact.
[Figure 1: Results of the empirical application. Eight panels, each plotting FY0 |G,T (·|1, 1) and
FY1 |G,T (·|1, 1) against the outcome value: Proportion of households supported crop; Proportion
of households supported agricultural extension; Proportion of households supported agricultural
exemption; The number of visits of agricultural extension staff; Proportion of households
supported healthcare fee; Proportion of households supported tuition fee; Proportion of
households supported credit; Proportion of households supported business tax exemption.]
Nevertheless, even after ignoring these zeros, both Figure 1 and Table 1 reveal a great
deal of treatment heterogeneity and that the mean impacts reported in Malesky et al.
(2014) are primarily generated from impacts at the top of the distribution. This is most
clear from the outcome variable “The number of visits of agricultural extension staff”
which has a substantial impact only at Q3 and D9. A similar conclusion can be drawn
for the outcome variable “Proportion of households supported credit”.
Malesky et al. (2014) use data for the year 2006 to check the common trends assump-
tion. They examine the results of a DiD design for the periods 2006 and 2008 in which
2008 is the placebo treatment period. That is, one would not expect any treatment effect
in the period before the introduction of the treatment. The results of this exercise are
in Figure 2 and Table 2. Similar to Malesky et al. (2014), we do find some substantial
differences in Figure 2 between the distribution of Y0 and Y1 . However, Table 2 indicates
these differences are not statistically significant and that they are generally in the opposite
direction as found in the original results. For example, for the proportion of households
supported crop (the first figure in Figures 1 and 2), the distribution of FY1 |G,T (·|1, 1) is in
Figure 2 generally to the left of the distribution of FY0 |G,T (·|1, 1) while the relationship is
the opposite in Figure 1.
Table 1: Quantile treatment effects with 90 percent confidence intervals based on Bayesian
weights. Confidence intervals corrected for clustering at the province level.
Outcome                                                    Mean             0.1              0.25             0.5              0.75             0.9
The number of visits of agricultural extension staff       0.0203           -0.0001          0.0              0.01             0.0299           0.08
                                                           (-0.001,0.042)   (-0.0,0.0)       (-0.009,0.009)   (0.0,0.02)       (-0.009,0.069)   (0.0,0.16)
Proportion of households supported healthcare fee          0.0389           ·                -0.001           -0.0077          -0.0073          0.026
                                                           (0.008,0.07)     (·, ·)           (-0.003,0.001)   (-0.019,0.003)   (-0.042,0.027)   (-0.348,0.4)
Proportion of households supported tuition fee             0.0016           ·                0.0005           -0.0001          -0.0001          0.0063
                                                           (-0.002,0.005)   (·, ·)           (-0.001,0.002)   (-0.002,0.002)   (-0.006,0.006)   (-0.008,0.021)
Proportion of households supported credit                  -0.0535          ·                -0.008           -0.0276          -0.0709          -0.1804
                                                           (-0.073,-0.034)  (·, ·)           (-0.015,-0.001)  (-0.045,-0.01)   (-0.204,0.062)   (-0.269,-0.092)
Proportion of households supported business tax exemption  0.0012           ·                ·                ·                ·                -0.0143
                                                           (-0.009,0.011)   (·, ·)           (·, ·)           (·, ·)           (·, ·)           (-0.029,-0.0)
In a standard linear DiD design, checking the common trend assumption as presented
above is identical to checking the value of the interaction term in the period(s) before the
treatment. This is not true for the non-linear design. However, we perform an additional
check by examining the estimated coefficient of the interaction term. That is, we
estimate the general representation presented in (1) for all observations in the periods
2006 and 2008. The results of this exercise are presented in Figure 3. Generally, we find
that δ(y) is not significantly different from zero, but there are some regions in the
distribution of some of the outcome variables where there is a significant difference. For
example, for the outcome variable “Proportion of households supported agricultural
exemption”, we find a significant difference for outcome values between 0.3 and 0.9.
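
To make this check concrete, the following Python sketch computes δ(y) at a single threshold y in the special case without covariates. Equation (1) is not reproduced in this part of the paper, so the logistic link, the function names `ecdf`, `logit` and `delta`, and the `cells` data layout are all illustrative assumptions rather than the estimator used for Figure 3. In this saturated case the fitted cell probabilities of a logit regression of 1{Y ≤ y} on G, T and G×T equal the empirical distribution functions of the four cells, so δ(y) reduces to a difference-in-differences of logits.

```python
import math


def ecdf(sample, y):
    """Share of observations in `sample` that are <= y."""
    return sum(1 for v in sample if v <= y) / len(sample)


def logit(p):
    """Inverse of the logistic link (assumed link, see lead-in)."""
    return math.log(p / (1.0 - p))


def delta(y, cells):
    """Interaction coefficient at threshold y in a saturated logit
    distribution regression without covariates.

    `cells` maps (g, t) -> list of outcomes for group g in period t.
    Returns None when any cell CDF is 0 or 1, where the logit is undefined.
    """
    F = {gt: ecdf(values, y) for gt, values in cells.items()}
    if any(p <= 0.0 or p >= 1.0 for p in F.values()):
        return None
    change_treated = logit(F[(1, 1)]) - logit(F[(1, 0)])
    change_control = logit(F[(0, 1)]) - logit(F[(0, 0)])
    return change_treated - change_control
```

In a placebo period one would expect `delta(y, cells)` to be close to zero over the range of y; thresholds at which a cell distribution function equals 0 or 1 are skipped because the coefficient is not identified there.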
As a further robustness check, we interacted the covariates with the time and treatment
dummy variables. We interact regions with time but we cannot interact regions with
treatment as this will result in perfect multicollinearity due to the setup of the program.
These results are shown in Figure 4 and the quantile treatment effects are reported in
Table 5.
For the changes-in-changes estimation, we note as in Athey and Imbens (2006) that the
distribution of $Y_0 \mid G=1, T=1$ equals the distribution of $\phi(Y_0) \mid G=1, T=0$, where
$\phi(y)$ is defined as in Section 2.2, i.e.
$\phi(y) := F^{-1}_{Y_0 \mid G,T}\big(F_{Y_0 \mid G,T}(y \mid 0,0) \mid 0,1\big)$. We then obtain
an estimator of the distribution of $Y_0 \mid G=1, T=1$ by using the empirical distribution
function of the random variable:
$$Q_{\widehat{F}_{Y_0 \mid G,T}(Y_0 \mid 0,1)}(Y_0 \mid G=0, T=0).$$
We estimate the distribution function $F_{Y_0 \mid G=1,T=1}$ at point y for our changes-in-changes
estimator using the following steps:
Figure 2: Results of robustness check using the years 2006 and 2008.
[Panels, each plotting $F_{Y_0|G,T}(\cdot|1,1)$ and $F_{Y_1|G,T}(\cdot|1,1)$ against the outcome value:
Proportion of households supported crop; Proportion of households supported agricultural
extension; Proportion of households supported agricultural exemption; The number of visits
of agricultural extension staff; Proportion of households supported healthcare fee;
Proportion of households supported tuition fee.]
Table 2: Quantile treatment effects – robustness check for 2006 and 2008 – parallel trends.
                                      Mean             0.1              0.25             0.5              0.75             0.9
The number of visits of agricultural  -0.0002          ·                ·                ·                -0.0001          -0.0001
extension staff                       (-0.026,0.026)   (·, ·)           (·, ·)           (·, ·)           (-0.02,0.02)     (-0.08,0.079)
Proportion of households supported    -0.0038          ·                ·                0.0079           -0.0221          -0.0241
healthcare fee                        (-0.037,0.029)   (·, ·)           (·, ·)           (-0.004,0.02)    (-0.043,-0.001)  (-0.135,0.087)
Proportion of households supported    0.0012           ·                0.0002           0.0018           0.0033           0.006
tuition fee                           (-0.003,0.006)   (·, ·)           (-0.001,0.001)   (-0.0,0.004)     (-0.001,0.008)   (-0.017,0.028)
Proportion of households supported    0.0053           ·                -0.0011          -0.0004          -0.0013          -0.0239
credit                                (-0.014,0.025)   (·, ·)           (-0.014,0.012)   (-0.023,0.022)   (-0.059,0.056)   (-0.149,0.101)
Proportion of households supported    -0.0058          ·                ·                ·                0.002            0.0142
business tax exemption                (-0.021,0.009)   (·, ·)           (·, ·)           (·, ·)           (-0.001,0.005)   (-0.004,0.032)
Figure 3: Results of the robustness check to investigate whether δ(y) of equation (1)
equals zero in the period before the treatment.
[Panels, each plotting δ(y) against the outcome value: Proportion of households supported
agricultural exemption; The number of visits of agricultural extension staff; Proportion
of households supported healthcare fee; Proportion of households supported tuition fee.]
Table 3: Quantile treatment effects without using additional control variables in the
analysis.
                                      Mean             0.1              0.25             0.5              0.75             0.9
The number of visits of agricultural  0.0197           -0.0001          -0.0002          0.0098           0.0194           0.0507
extension staff                       (-0.002,0.041)   (-0.0,0.0)       (-0.01,0.009)    (0.0,0.02)       (-0.011,0.05)    (-0.028,0.13)
Proportion of households supported    0.0331           ·                -0.0009          -0.0049          -0.0071          0.0237
healthcare fee                        (0.0,0.066)      (·, ·)           (-0.002,0.0)     (-0.016,0.006)   (-0.039,0.025)   (-0.419,0.466)
Proportion of households supported    0.001            ·                0.0004           -0.0005          -0.0014          0.0036
tuition fee                           (-0.003,0.005)   (·, ·)           (-0.001,0.002)   (-0.003,0.002)   (-0.009,0.006)   (-0.009,0.017)
Proportion of households supported    -0.0574          ·                -0.0089          -0.028           -0.0716          -0.1808
credit                                (-0.079,-0.036)  (·, ·)           (-0.019,0.001)   (-0.05,-0.006)   (-0.2,0.057)     (-0.307,-0.054)
Proportion of households supported    0.0008           ·                ·                ·                ·                -0.0153
business tax exemption                (-0.008,0.01)    (·, ·)           (·, ·)           (·, ·)           (·, ·)           (-0.028,-0.002)
Table 4: Quantile treatment effects without using additional control variables in the
                                      Mean             0.1              0.25             0.5              0.75             0.9
The number of visits of agricultural  -0.0163          -0.0001          -0.0003          -0.0193          -0.0003          -0.05
extension staff                       (-0.032,-0.001)  (-0.0,0.0)       (-0.01,0.009)    (-0.029,-0.009)  (-0.02,0.019)    (-0.119,0.019)
Proportion of households supported    -0.0072          ·                ·                0.0094           -0.0212          -0.0236
healthcare fee                        (-0.045,0.031)   (·, ·)           (·, ·)           (-0.011,0.029)   (-0.051,0.009)   (-0.133,0.085)
Proportion of households supported    0.0008           ·                -0.0001          0.0014           0.0026           0.0051
tuition fee                           (-0.005,0.007)   (·, ·)           (-0.002,0.001)   (-0.001,0.003)   (-0.003,0.008)   (-0.022,0.032)
Proportion of households supported    0.0008           ·                -0.0055          -0.0005          -0.0024          -0.0307
credit                                (-0.025,0.026)   (·, ·)           (-0.021,0.01)    (-0.026,0.025)   (-0.072,0.067)   (-0.168,0.107)
Proportion of households supported    -0.0061          ·                ·                ·                0.0019           0.014
business tax exemption                (-0.024,0.012)   (·, ·)           (·, ·)           (·, ·)           (-0.002,0.006)   (-0.009,0.037)
Table 5: Quantile treatment effects using interaction terms between the covariates and
                                      Mean             0.1              0.25             0.5              0.75             0.9
The number of visits of agricultural  0.02             -0.0001          -0.0001          0.0099           0.0193           0.0798
extension staff                       (-0.008,0.048)   (-0.0,0.0)       (-0.009,0.009)   (0.0,0.02)       (-0.02,0.058)    (0.0,0.159)
2. For every computed empirical distribution function of step 1 estimate the corre-
sponding quantile of the subsample for which G = 0, T = 1.
3. For all the obtained quantiles from step 2, compute the empirical distribution func-
tion in y.
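
The steps above can be sketched in Python as follows. Step 1 does not appear in this part of the text, so the sketch assumes, in line with the definition of φ(y), that it evaluates the empirical distribution function of the subsample with G = 0, T = 0 at each observation with G = 1, T = 0. The function names, the argument names `y00`, `y01`, `y10`, and the tie and boundary conventions are illustrative assumptions, not the authors' implementation.

```python
from bisect import bisect_right


def cic_transform(v, y00_sorted, y01_sorted):
    """Map a treated pre-period outcome v to an estimate of phi(v)."""
    n00, n01 = len(y00_sorted), len(y01_sorted)
    # Step 1 (assumed): count of control pre-period outcomes <= v,
    # i.e. n00 times the empirical distribution function at v.
    count = bisect_right(y00_sorted, v)
    # Step 2: smallest control post-period order statistic whose empirical
    # CDF is at least count / n00. The integer ceiling avoids floating-point
    # rounding; a rank of zero is mapped to the sample minimum.
    k = max(-(-count * n01 // n00), 1)
    return y01_sorted[k - 1]


def cic_counterfactual_cdf(y, y00, y01, y10):
    """Estimate F_{Y0 | G=1, T=1}(y) from the three untreated cells.

    y00: outcomes with G=0, T=0; y01: G=0, T=1; y10: G=1, T=0.
    """
    y00_sorted, y01_sorted = sorted(y00), sorted(y01)
    transformed = [cic_transform(v, y00_sorted, y01_sorted) for v in y10]
    # Step 3: empirical distribution function of the transformed values at y.
    return sum(1 for v in transformed if v <= y) / len(transformed)
```

For example, if the control outcomes shift up by one unit between periods, every treated pre-period outcome is shifted up by one unit as well, so `cic_counterfactual_cdf(3, [1, 2, 3, 4], [2, 3, 4, 5], [1, 2, 3, 4])` evaluates the distribution of the shifted values 2, 3, 4, 5 at 3.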
The distribution of FY1 |G=1,T =1 can be estimated using the empirical distribution function.
One obtains the quantile treatment effect by inverting the distribution at the desired levels
of the distribution.
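
As an illustration of this inversion, the sketch below computes a quantile treatment effect from two distribution functions evaluated on a common grid of outcome values. The grid representation, the use of the left-inverse (the smallest grid point at which the distribution function reaches the level τ), and the names `invert_cdf` and `qte` are assumptions made for the example.

```python
def invert_cdf(grid, cdf_values, tau):
    """Left-inverse of a CDF on an increasing grid: the smallest grid point
    whose CDF value is at least tau (the last grid point if none is)."""
    for y, f in zip(grid, cdf_values):
        if f >= tau:
            return y
    return grid[-1]


def qte(grid, cdf_treated, cdf_counterfactual, tau):
    """Quantile treatment effect at level tau: observed minus
    counterfactual tau-quantile of the treated group."""
    return invert_cdf(grid, cdf_treated, tau) - invert_cdf(grid, cdf_counterfactual, tau)
```

When both distributions have a mass point that covers the level τ, the two quantiles coincide with that mass point, which is one way in which bunching in the outcome shows up in the quantile treatment effects.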
As we consider our estimator for the univariate outcome to be a simpler alternative to
the CiC approach, we contrast our results with those from that approach. We do so only
for the no-covariates case, as the CiC estimator is more difficult to implement in the
presence of covariates. Our results are reported in Table 3 and Figure 5. They are
reassuring, as they are very similar to those in Table 6 and Figure 6 for CiC in terms of
both magnitude and statistical significance.
Figure 4: Results of the empirical application using interaction terms between the
covariates and the time and treatment dummy variable.
[Panels, each plotting $F_{Y_0|G,T}(\cdot|1,1)$ and $F_{Y_1|G,T}(\cdot|1,1)$ against the outcome value:
Proportion of households supported crop; Proportion of households supported agricultural
extension; Proportion of households supported agricultural exemption; The number of visits
of agricultural extension staff; Proportion of households supported healthcare fee;
Proportion of households supported tuition fee.]
Table 6: Quantile treatment effects using changes-in-changes.
                                      Mean             0.1              0.25             0.5              0.75             0.9
The number of visits of agricultural  0.0145           -0.01            -0.0097          0.0098           0.0064           0.0497
extension staff                       (-0.01,0.039)    (-0.02,-0.0)     (-0.019,-0.0)    (-0.002,0.021)   (-0.021,0.034)   (-0.02,0.12)
Proportion of households supported    0.0286           ·                -0.0009          -0.0066          0.0007           0.0353
healthcare fee                        (-0.001,0.059)   (·, ·)           (-0.003,0.001)   (-0.017,0.004)   (-0.025,0.026)   (-0.417,0.488)
Proportion of households supported    0.0013           ·                0.0005           -0.0003          0.0006           0.0045
tuition fee                           (-0.002,0.005)   (·, ·)           (-0.001,0.002)   (-0.003,0.002)   (-0.007,0.008)   (-0.008,0.017)
Proportion of households supported    -0.0559          ·                -0.0076          -0.0275          -0.0892          -0.1857
credit                                (-0.08,-0.032)   (·, ·)           (-0.018,0.003)   (-0.05,-0.005)   (-0.206,0.028)   (-0.298,-0.074)
Proportion of households supported    -0.0004          ·                ·                ·                -0.0011          -0.0158
business tax exemption                (-0.008,0.007)   (·, ·)           (·, ·)           (·, ·)           (-0.007,0.005)   (-0.029,-0.002)
Figure 4: Results of the empirical application using interaction terms between the
covariates and the time and treatment dummy variable (continued).
[Panels, each plotting $F_{Y_0|G,T}(\cdot|1,1)$ and $F_{Y_1|G,T}(\cdot|1,1)$ against the outcome value:
Proportion of households supported credit; Proportion of households supported business
tax exemption.]
For the bivariate analysis we consider the outcome variables “Proportion of households
supported credit” and the “Proportion of households supported healthcare fee” as these
variables have relatively little bunching at integer values. The counterfactual and actual
distributions are shown in Figure 10.
The figure reveals that the joint distribution has changed due to the treatment and that
the distribution of the treated population has shifted to the upper-left corner. However,
it is difficult to tell whether this is merely a result of the changes in the marginal
distributions. We therefore also present results using Kendall’s tau, given as:
$$\hat{\tau} = \sum_{i=1}^{n} \sum_{j=i+1}^{n} \operatorname{sgn}(Y_{di} - Y_{dj})\, \operatorname{sgn}(Z_{di} - Z_{dj}).$$
The Kendall’s tau for the joint distribution of the treated sample in the second period
when treated can be calculated directly from the observed data. For the counterfactual
distribution of the treated sample in the second period when not treated, we first sample
from the estimated distribution. That is, we sample a value of Y0 using our estimate of its
marginal distribution from above. We then sample Z0 from the conditional distribution
of Z0 | Y0, which can be obtained using our estimates for the bivariate case. The estimated
Kendall’s tau from this procedure is 0.1253 with a 95-percent confidence interval from
0.0989 to 0.1518. This implies a positive correlation between the two outcomes in the
districts. For the observed distribution of the treated group we obtain an estimate of
0.2463 with a 95-percent confidence interval from 0.2224 to 0.2703. This implies that the
treatment has statistically significantly increased the correlation between the two outcomes.

Figure 5: Results without using additional control variables in the analysis.
[Panels, each plotting $F_{Y_0|G,T}(\cdot|1,1)$ and $F_{Y_1|G,T}(\cdot|1,1)$ against the outcome value:
Proportion of households supported crop; Proportion of households supported agricultural
extension; Proportion of households supported agricultural exemption; The number of visits
of agricultural extension staff; Proportion of households supported healthcare fee;
Proportion of households supported tuition fee; Proportion of households supported credit;
Proportion of households supported business tax exemption.]
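
The pairwise-sign statistic behind these Kendall’s tau estimates can be sketched in Python as follows. The displayed sum carries no normalization, while the reported estimates lie between -1 and 1, so the sketch assumes the conventional scaling by the number of pairs, n(n − 1)/2, and makes no correction for ties; the function names are illustrative.

```python
def sgn(x):
    """Sign function: -1, 0 or 1."""
    return (x > 0) - (x < 0)


def kendall_tau(y, z):
    """Kendall's tau for paired samples y and z.

    Sums sgn(y_i - y_j) * sgn(z_i - z_j) over all pairs i < j and divides
    by the number of pairs (an assumed normalization, see lead-in).
    """
    n = len(y)
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            total += sgn(y[i] - y[j]) * sgn(z[i] - z[j])
    return 2.0 * total / (n * (n - 1))
```

For the counterfactual distribution, the same function would be applied to the pairs (Y0, Z0) drawn as described in the text; for the observed distribution it is applied to the treated sample directly.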
Figure 6: Results of changes-in-changes.
[Panels, each plotting $F_{Y_0|G,T}(\cdot|1,1)$ and $F_{Y_1|G,T}(\cdot|1,1)$ against the outcome value:
Proportion of households supported crop; Proportion of households supported agricultural
extension; Proportion of households supported agricultural exemption; The number of visits
of agricultural extension staff; Proportion of households supported healthcare fee;
Proportion of households supported tuition fee; Proportion of households supported credit;
Proportion of households supported business tax exemption.]
We follow Callaway and Li (2019) and investigate a change in the mandatory state
minimum wage in the 11 U.S. states which had their mandatory minimum wage below the
federal minimum wage at the beginning of 2007. We employ these states as the treated
group. Before this change there were an additional 22 states in which the state minimum
wage was lower than the federal but we follow Callaway and Li (2019) and drop New
Hampshire and Pennsylvania. This leaves a control group of 20 states.
Similar to Callaway and Li (2019), we first estimate the impact of the change in the
state-level minimum wage on county-level unemployment rates. Callaway and Li
(2019) employ the Local Area Unemployment Statistics Database from the Bureau of
Labor Statistics (BLS) and control variables on population and income from the 2000
County Data Book. We use county data on the percentage of African Americans, the
percentage of high school graduates, the percentage of college graduates, the log of the
total population, the poverty rate and the log median income (for 1997).
The results using our method are shown in the first row of Table 7 and are quantitatively
similar to those of Callaway and Li (2019). The impact on the unemployment rate
is negative for those counties that have lower levels of the unemployment rate but it is
positive for those counties that have higher unemployment rates. The degree of
heterogeneity in our results is smaller than in Callaway and Li (2019). For example, they
find an impact of -0.44 at the first decile compared to our estimate of -0.32. This results
in our failure to reject the null hypothesis of no impact in any part of the distribution,
whereas their impact at the bottom decile is marginally statistically significant.

Table 7: Quantile treatment effects of the increase in the mandatory minimum wage. The
unemployment rate and poverty rate are measured in percentages.
We also merge the Callaway and Li (2019) data with average weekly wage data taken
from the Quarterly Census of Employment and Wages (QCEW) sample from the BLS.
Note that we employ data for the first quarters of 2006 and 2007. We acknowledge
that the average weekly wage is not the ideal feature of the wage distribution to examine
for this question but other wage measures were not available. However, the monopsony
wages literature has frequently argued that the entire wage distribution may shift due to
a change in the mandatory minimum wage (e.g. Van den Berg, 2003). This would have
implications for the average wage. Using the same specification as for the unemployment
rate equation we examine the impact on average wages and the results are reported in the
second row of Table 7. The impact on the average weekly wage is small and statistically
insignificant.
We also examine poverty rates from the state and county estimates published by the
Census for 2006 and 2007. A shortcoming of these data is that poverty rates are reported
for the whole of 2007. As reported by Callaway and Li (2019), the federal minimum wage
increased at the end of July 2007 and this change may have an impact on the poverty
rates in our control states. Under this proviso we explore the impact of the change in
the minimum wage. We continue to use the same specification and the results are in the
third row of Table 7. The point estimates suggest that the impact on the poverty rate
is somewhat larger than that on average weekly wages. In addition, the impact is
larger for those counties that have low poverty rates. However, there is no evidence of a
statistically significant impact.
Our evidence suggests that the univariate distributions are not significantly affected
by the change in the mandatory minimum wage. To investigate whether the joint dis-
tributions are affected, Figures 7 to 9 report the 2-dimensional effects noting that the
left-hand figures show the counterfactual distributions and the right-hand side show the
observed distributions. As it is difficult to reach clear conclusions from these figures,
Table 8 reports the changes in the Kendall’s τ and Spearman’s correlation index for the
different pairs of outcome variables noting that the latter is computed as:
$$\rho_S = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)},$$
where di is the difference in ranks between the two variables yi and zi . Note that we do
not report the Spearman’s correlation value for the previous empirical example due to the
degree of bunching at certain values in the data.
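
A minimal Python sketch of this rank-difference formula is given below. The formula presumes that there are no ties, which is also why it is less suitable for heavily bunched data; the sketch therefore breaks ties arbitrarily by position rather than averaging ranks. The function names are illustrative.

```python
def ranks(values):
    """Ranks 1..n of the values (ties broken by position, since the
    rank-difference formula presumes there are none)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(y, z):
    """Spearman's correlation index 1 - 6 * sum(d_i^2) / (n (n^2 - 1)),
    where d_i is the difference between the ranks of y_i and z_i."""
    n = len(y)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(y), ranks(z)))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))
```

With identical orderings the index equals 1, and with exactly reversed orderings it equals -1.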
In the absence of treatment the estimates of Kendall’s τ and Spearman’s correlation
index for the unemployment rate and the average weekly wage are -0.1095 and -0.1316
respectively. Following the increase in the mandatory minimum wage, this negative
relationship becomes weaker, with the corresponding estimated values of -0.0350 and -0.052.
However, these changes are not statistically significantly different from zero at the 5
percent significance level. In contrast, the unemployment rate and the poverty rate have a
positive correlation prior to the treatment with the respective estimates of the correlation
being 0.256 and 0.329. This correlation becomes somewhat stronger after the increase of
the mandatory minimum wage with estimates of 0.312 and 0.453. Once again the changes
are not statistically significant.
Finally, we examine the correlation between the average weekly wage and the poverty
rate. This relationship is also negative prior to treatment, with the two estimates of the
correlation being -0.3070 and -0.413. This correlation also becomes weaker after treatment,
increasing to -0.1762 and -0.261, and in contrast to the earlier results these changes are
statistically significant, or marginally insignificant, at the 5 percent significance level.
While we leave a fuller explanation of this result to future work, this is an encouraging
result for those who support the increase in the minimum wage. One might have expected
that the increase in the minimum wage would result in higher average wages, which in turn
would result in higher unemployment and a greater number of individuals in poverty. While
the first relationship between wages and unemployment is consistent with the first row of
Table 8, the second relationship is not supported by the data. This result also highlights
the importance of our approach. An examination of the univariate distributions suggests
there is no response to treatment. However, the changes in these correlation values suggest
that the bivariate relationships are sensitive to the treatment. This provides greater
insight into the treatment effects and the mechanisms underlying them.

Figure 7: Results of 2-dimensional effects of the unemployment rate and the (log) average
weekly wage.
[Two contour panels: the counterfactual distribution $F_{Y_0,Z_0|G,T}(y,z|1,1)$ on the left and
the observed distribution $F_{Y_1,Z_1|G,T}(y,z|1,1)$ on the right.]
Figure 8: Results of 2-dimensional effects of the unemployment rate and the poverty rate.
[Two contour panels: the counterfactual distribution $F_{Y_0,Z_0|G,T}(y,z|1,1)$ on the left and
the observed distribution $F_{Y_1,Z_1|G,T}(y,z|1,1)$ on the right.]

Figure 9: Results of 2-dimensional effects of the (log) average weekly wage and the poverty
rate.
[Two contour panels: the counterfactual distribution $F_{Y_0,Z_0|G,T}(y,z|1,1)$ on the left and
the observed distribution $F_{Y_1,Z_1|G,T}(y,z|1,1)$ on the right.]

Table 8: Results of Kendall’s τ for the treatment group when treated compared to not
treated (with 95 percent confidence intervals between parentheses).
                                                  Treated               Not treated           Difference
Kendall’s τ
Unemployment rate and poverty rate                0.3119                0.2564                0.0555
                                                  (0.2957, 0.3281)      (0.0184, 0.4944)      (-0.1810, 0.2919)
(log) Average weekly wage and poverty rate        -0.1762               -0.3070               0.1307
                                                  (-0.1950, -0.1574)    (-0.4308, -0.1832)    (0.0060, 0.2554)
Spearman’s correlation index
Unemployment rate and (log) average weekly wage   -0.0522               -0.1316               0.0793
                                                  (-0.0855, -0.0190)    (-0.2134, -0.0497)    (-0.0123, 0.1709)
Unemployment rate and poverty rate                0.4525                0.3287                0.1238
                                                  (0.4301, 0.4749)      (-0.0004, 0.6577)     (-0.2032, 0.4508)
(log) Average weekly wage and poverty rate        -0.2611               -0.4131               0.1520
                                                  (-0.2883, -0.2340)    (-0.5822, -0.2441)    (-0.0144, 0.3184)

Figure 10: Results of 2-dimensional effects.
[Two contour panels; horizontal axis: Proportion of households supported credit; vertical
axis: Proportion of households supported healthcare fee.]

6 Conclusion
We provide a relatively simple distribution regression based estimator to implement the
evaluation of treatment effects in a difference-in-difference setting. As our approach
provides counterfactual distributions we are able to explore the impact of the treatment at
different quantiles of the distribution of the outcome variable. For both the univariate
and multivariate cases we provide the identifying assumption and the associated
estimation algorithms. We provide two empirical examples which revisit existing studies and
which highlight the utility of various aspects of our approach.
Our analysis can easily be extended to the case of multiple time periods and more
than two outcomes. We can also extend our distributional regression framework to use
time and unit weights as in the synthetic difference-in-difference estimation method of
Arkhangelsky et al. (2021). We leave these extensions to future research (e.g. Fernández-
Val et al., 2024b).
References
Almond, D., H.W. Hoynes, and D.W. Schanzenbach (2011), “Inside the war on
poverty: the impact of food stamps on birth outcomes”, Review of Economics and
Statistics 93, 387–403.
Arkhangelsky, D. and G. Imbens (2023), “Causal models for longitudinal and
panel data: a survey”, working paper, Stanford University.
Athey, S. and G.D. Imbens (2006), “Identification and inference in nonlinear
difference-in-differences models”, Econometrica 74, 431–97.
Blundell, R., C. Meghir, M. Costa Dias and J. Van Reenen (2004), “Evaluating
the employment impact of a mandatory job search program”, Journal of the
European Economic Association 2, 569–606.
Card, D. (1990), “The Impact of the Mariel Boatlift on the Miami Labor Market”,
Industrial and Labor Relations Review 43, 245–57.
Card, D. and A.B. Krueger (1994), “Minimum Wages and Employment: A Case
Study of the Fast-Food Industry in New Jersey and Pennsylvania”, American
Economic Review 84, 772–93.
Cengiz, D., A. Dube, A. Lindner, and B. Zipperer (2019), “The effect of minimum
wages on low-wage jobs”, Quarterly Journal of Economics 134, 1405–54.
Chernozhukov, V., I. Fernández-Val, and S. Luo (2019), “Distribution regression
with sample selection, with an application to wage decompositions in the UK”,
working paper, MIT, Cambridge (MA).
Dube, A. (2019), “Minimum wages and the distribution of family incomes”, American
Economic Journal: Applied Economics 11, 268–304.
MaCurdy, T. (2015), “How effective is the minimum wage at supporting the poor?”,
Journal of Political Economy 123, 497–545.
Malesky, E.J., C.V. Nguyen, and A. Tran (2014), “The impact of recentralization
on public services: A difference-in-differences analysis of the abolition of elected
councils in Vietnam”, American Political Science Review 108, 144–68.
Roth, J. and P.H.C. Sant’Anna (2023), “When is parallel trends sensitive to
functional form?”, Econometrica 91, 737–47.
Torous, W., F. Gunsilius, and P. Rigollet (2024), “An optimal transport approach
to estimating causal effects via nonlinear difference-in-differences”, working
paper, University of California, Berkeley.