
Package ‘rdrobust’

August 26, 2020


Type Package
Title Robust Data-Driven Statistical Inference in
Regression-Discontinuity Designs
Version 0.99.9
Date 2020-08-24
Author Sebastian Calonico <[email protected]>, Matias D. Cattaneo <[email protected]>,
Max H. Farrell <[email protected]>, Rocio Titiunik <[email protected]>
Maintainer Sebastian Calonico <[email protected]>
Description Regression-discontinuity (RD) designs are quasi-experimental research designs popular
in social, behavioral and natural sciences. The RD design is usually employed to study the
(local) causal effect of a treatment, intervention or policy. This package provides tools for
data-driven graphical and analytical statistical inference in RD designs: rdrobust() to
construct local-polynomial point estimators and robust confidence intervals for average
treatment effects at the cutoff in Sharp, Fuzzy and Kink RD settings, rdbwselect() to perform
bandwidth selection for the different procedures implemented, and rdplot() to conduct
exploratory data analysis (RD plots).
Depends R (>= 3.1.1)
License GPL-2
Imports ggplot2
NeedsCompilation no
Repository CRAN
Date/Publication 2020-08-26 16:10:29 UTC

R topics documented:
rdrobust-package
rdbwselect
rdbwselect_2014
rdplot
rdrobust
rdrobust_RDsenate
Index

rdrobust-package Robust Data-Driven Statistical Inference in RD Designs

Description
Regression-discontinuity (RD) designs are quasi-experimental research designs popular in social,
behavioral and natural sciences. The RD design is usually employed to study the (local) causal
effect of a treatment, intervention or policy. This package provides tools for data-driven graphical
and analytical statistical inference in RD designs: rdrobust to construct local-polynomial point
estimators and robust confidence intervals for average treatment effects at the cutoff in Sharp, Fuzzy
and Kink RD settings, rdbwselect to perform bandwidth selection for the different procedures
implemented, and rdplot to conduct exploratory data analysis (RD plots).

Details

Package: rdrobust
Type: Package
Version: 0.99.9
Date: 2020-08-24
License: GPL-2

Function for statistical inference: rdrobust
Function for bandwidth selection: rdbwselect
Function for exploratory data analysis (RD plots): rdplot

Author(s)
Sebastian Calonico, Columbia University, New York, NY. <[email protected]>.
Matias D. Cattaneo, Princeton University, Princeton, NJ. <[email protected]>.
Max H. Farrell, University of Chicago, Chicago, IL. <[email protected]>.
Rocio Titiunik, Princeton University, Princeton, NJ. <[email protected]>.

rdbwselect Bandwidth Selection Procedures for Local Polynomial Regression Discontinuity Estimators

Description
rdbwselect implements bandwidth selectors for local polynomial Regression Discontinuity (RD)
point estimators and inference procedures developed in Calonico, Cattaneo and Titiunik (2014a),
Calonico, Cattaneo and Farrell (2018), Calonico, Cattaneo, Farrell and Titiunik (2019) and Calonico,
Cattaneo and Farrell (2020).
Companion commands are: rdrobust for point estimation and inference procedures, and rdplot
for data-driven RD plots (see Calonico, Cattaneo and Titiunik (2015a) for details).
A detailed introduction to this command is given in Calonico, Cattaneo and Titiunik (2015b) and
Calonico, Cattaneo, Farrell and Titiunik (2019). A companion Stata package is described in
Calonico, Cattaneo and Titiunik (2014b).
For more details, and related Stata and R packages useful for analysis of RD designs, visit
https://rdpackages.github.io/

Usage
rdbwselect(y, x, c = NULL, fuzzy = NULL,
deriv = NULL, p = NULL, q = NULL,
covs = NULL, covs_drop = TRUE,
kernel = "tri", weights = NULL, bwselect = "mserd",
vce = "nn", cluster = NULL, nnmatch = 3,
scaleregul = 1, sharpbw = FALSE,
all = NULL, subset = NULL,
masspoints = "adjust", bwcheck = NULL,
bwrestrict = TRUE, stdvars = FALSE)

Arguments
y is the dependent variable.
x is the running variable (a.k.a. score or forcing variable).
c specifies the RD cutoff in x; default is c = 0.
fuzzy specifies the treatment status variable used to implement fuzzy RD estimation
(or Fuzzy Kink RD if deriv=1 is also specified). Default is Sharp RD design
and hence this option is not used.
deriv specifies the order of the derivative of the regression functions to be estimated.
Default is deriv=0 (for Sharp RD, or for Fuzzy RD if fuzzy is also specified).
Setting deriv=1 results in estimation of a Kink RD design (up to scale), or
Fuzzy Kink RD if fuzzy is also specified.
p specifies the order of the local-polynomial used to construct the point-estimator;
default is p = 1 (local linear regression).
q specifies the order of the local-polynomial used to construct the bias-correction;
default is q = 2 (local quadratic regression).
covs specifies additional covariates to be used for estimation and inference.
covs_drop if TRUE, it checks for collinear additional covariates and drops them. Default is
TRUE.
kernel is the kernel function used to construct the local-polynomial estimator(s).
Options are triangular (default option), epanechnikov and uniform.
weights is the variable used for optional weighting of the estimation procedure. The
unit-specific weights multiply the kernel function.
bwselect specifies the bandwidth selection procedure to be used. Options are:
mserd one common MSE-optimal bandwidth selector for the RD treatment effect
estimator.
msetwo two different MSE-optimal bandwidth selectors (below and above the
cutoff) for the RD treatment effect estimator.
msesum one common MSE-optimal bandwidth selector for the sum of regression
estimates (as opposed to difference thereof).
msecomb1 for min(mserd,msesum).
msecomb2 for median(msetwo,mserd,msesum), for each side of the cutoff
separately.
cerrd one common CER-optimal bandwidth selector for the RD treatment effect
estimator.
certwo two different CER-optimal bandwidth selectors (below and above the
cutoff) for the RD treatment effect estimator.
cersum one common CER-optimal bandwidth selector for the sum of regression
estimates (as opposed to difference thereof).
cercomb1 for min(cerrd,cersum).
cercomb2 for median(certwo,cerrd,cersum), for each side of the cutoff
separately.
Note: MSE = Mean Square Error; CER = Coverage Error Rate. Default is
bwselect=mserd. For details on implementation see Calonico, Cattaneo and
Titiunik (2014a), Calonico, Cattaneo and Farrell (2018), and Calonico,
Cattaneo, Farrell and Titiunik (2017), and the companion software articles.
vce specifies the procedure used to compute the variance-covariance matrix
estimator. Options are:
nn for heteroskedasticity-robust nearest neighbor variance estimator with nnmatch
the (minimum) number of neighbors to be used.
hc0 for heteroskedasticity-robust plug-in residuals variance estimator without
weights.
hc1 for heteroskedasticity-robust plug-in residuals variance estimator with hc1
weights.
hc2 for heteroskedasticity-robust plug-in residuals variance estimator with hc2
weights.
hc3 for heteroskedasticity-robust plug-in residuals variance estimator with hc3
weights.
Default is vce=nn.
cluster indicates the cluster ID variable used for cluster-robust variance estimation with
degrees-of-freedom weights. By default it is combined with vce=nn for
cluster-robust nearest neighbor variance estimation. Another option is plug-in
residuals combined with vce=hc0.
nnmatch to be combined with vce=nn for the heteroskedasticity-robust nearest neighbor
variance estimator, with nnmatch indicating the minimum number of neighbors
to be used. Default is nnmatch=3.
scaleregul specifies scaling factor for the regularization term added to the denominator of
the bandwidth selectors. Setting scaleregul = 0 removes the regularization
term from the bandwidth selectors; default is scaleregul = 1.
sharpbw option to perform fuzzy RD estimation using a bandwidth selection procedure
for the sharp RD model. This option is automatically selected if there is perfect
compliance at either side of the threshold.
all if specified, rdbwselect reports all available bandwidth selection procedures.
subset an optional vector specifying a subset of observations to be used.
masspoints checks and controls for repeated observations in the running variable. Options
are:
(i) off: ignores the presence of mass points;
(ii) check: looks for and reports the number of unique observations at each side
of the cutoff.
(iii) adjust: ensures that the preliminary bandwidths used in the calculations
contain a minimal number of unique observations. By default it uses 10
observations, but this can be manually adjusted with the option bwcheck.
Default option is masspoints=adjust.
bwcheck if a positive integer is provided, the preliminary bandwidth used in the
calculations is enlarged so that at least bwcheck unique observations are used.
bwrestrict if TRUE, computed bandwidths are restricted to lie within the range of x; default
is bwrestrict = TRUE.
stdvars if TRUE, x and y are standardized before computing the bandwidths; default is
stdvars = FALSE.
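
The kernel and weights arguments described above can be pictured with a small base R sketch. It is an illustration only, not the package's internal code: the helper name kernel_weights and its arguments cutoff and h are hypothetical, and the kernel formulas are the standard textbook ones (the package may normalize them differently, which does not affect weighted least squares fits).

```r
# Kernel weights for observations within bandwidth h of the cutoff.
# u is the scaled distance (x - cutoff) / h; weights are zero when |u| > 1.
# kernel_weights() is a hypothetical helper, not part of rdrobust.
kernel_weights <- function(x, cutoff = 0, h = 1,
                           kernel = c("triangular", "epanechnikov", "uniform"),
                           weights = NULL) {
  kernel <- match.arg(kernel)
  u <- (x - cutoff) / h
  inside <- abs(u) <= 1
  k <- switch(kernel,
    triangular   = (1 - abs(u)) * inside,
    epanechnikov = 0.75 * (1 - u^2) * inside,
    uniform      = 0.5 * inside
  )
  # Optional unit-specific weights multiply the kernel weights.
  if (!is.null(weights)) k <- k * weights
  k
}

x <- c(-0.5, 0, 0.25, 1.5)
kernel_weights(x, h = 1, kernel = "triangular")
kernel_weights(x, h = 1, kernel = "uniform", weights = c(2, 2, 1, 1))
```

The triangular kernel puts the most weight on observations closest to the cutoff, while the uniform kernel treats every observation inside the bandwidth equally.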

Value
N vector with sample sizes to the left and to the right of the cutoff.
c cutoff value.
p order of the local-polynomial used to construct the point-estimator.
q order of the local-polynomial used to construct the bias-correction estimator.
bws matrix containing the estimated bandwidths for each selected procedure.
bwselect bandwidth selection procedure employed.
kernel kernel function used to construct the local-polynomial estimator(s).
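
The combined selectors listed under bwselect (msecomb1, msecomb2 and their CER counterparts) are simple functions of the basic selectors. The sketch below shows only that arithmetic; the bandwidth values are made up for illustration and are not output of rdbwselect.

```r
# Hypothetical bandwidth values, for illustration only (not package output).
h_mserd  <- 0.20                          # one common bandwidth
h_msesum <- 0.26                          # one common bandwidth
h_msetwo <- c(left = 0.18, right = 0.30)  # one bandwidth per side of the cutoff

# msecomb1: the smaller of the two common bandwidths.
h_msecomb1 <- min(h_mserd, h_msesum)

# msecomb2: median of the three candidates, taken separately on each side.
h_msecomb2 <- sapply(h_msetwo, function(h_side) {
  median(c(h_side, h_mserd, h_msesum))
})

h_msecomb1
h_msecomb2
```

Because msecomb2 is computed side by side, it can return different bandwidths below and above the cutoff even though two of its three inputs are common bandwidths.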

Author(s)
Sebastian Calonico, Columbia University, New York, NY. <[email protected]>.
Matias D. Cattaneo, Princeton University, Princeton, NJ. <[email protected]>.
Max H. Farrell, University of Chicago, Chicago, IL. <[email protected]>.
Rocio Titiunik, Princeton University, Princeton, NJ. <[email protected]>.

References
Calonico, S., M. D. Cattaneo, and M. H. Farrell. 2018. On the Effect of Bias Estimation on
Coverage Accuracy in Nonparametric Inference. Journal of the American Statistical Association,
113(522): 767-779.
Calonico, S., M. D. Cattaneo, and M. H. Farrell. 2020. Optimal Bandwidth Choice for Robust Bias
Corrected Inference in Regression Discontinuity Designs. Econometrics Journal, 23(2): 192-210.
Calonico, S., M. D. Cattaneo, M. H. Farrell, and R. Titiunik. 2017. rdrobust: Software for
Regression Discontinuity Designs. Stata Journal 17(2): 372-404.
Calonico, S., M. D. Cattaneo, M. H. Farrell, and R. Titiunik. 2019. Regression Discontinuity
Designs using Covariates. Review of Economics and Statistics, 101(3): 442-451.
Calonico, S., M. D. Cattaneo, and R. Titiunik. 2014a. Robust Nonparametric Confidence Intervals
for Regression-Discontinuity Designs. Econometrica 82(6): 2295-2326.
Calonico, S., M. D. Cattaneo, and R. Titiunik. 2014b. Robust Data-Driven Inference in the
Regression-Discontinuity Design. Stata Journal 14(4): 909-946.
Calonico, S., M. D. Cattaneo, and R. Titiunik. 2015a. Optimal Data-Driven Regression
Discontinuity Plots. Journal of the American Statistical Association 110(512): 1753-1769.
Calonico, S., M. D. Cattaneo, and R. Titiunik. 2015b. rdrobust: An R Package for Robust
Nonparametric Inference in Regression-Discontinuity Designs. R Journal 7(1): 38-51.
Cattaneo, M. D., B. Frandsen, and R. Titiunik. 2015. Randomization Inference in the Regression
Discontinuity Design: An Application to the Study of Party Advantages in the U.S. Senate. Journal
of Causal Inference 3(1): 1-24.

See Also
rdrobust, rdplot

Examples
x<-runif(1000,-1,1)
y<-5+3*x+2*(x>=0)+rnorm(1000)
rdbwselect(y,x)
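
Before selecting bandwidths, it can help to see how many distinct values the running variable takes on each side of the cutoff, which is the kind of information masspoints = "check" is described as reporting. The base R sketch below is an illustration under assumptions: count_unique_by_side is a hypothetical helper, not a package function, and observations exactly at the cutoff are counted on the right, following the x >= 0 convention of the simulated example.

```r
# Count total and unique values of the running variable on each side of the cutoff.
# count_unique_by_side() is a hypothetical helper, not part of rdrobust.
count_unique_by_side <- function(x, cutoff = 0) {
  left  <- x[x <  cutoff]
  right <- x[x >= cutoff]
  c(n_left  = length(left),  unique_left  = length(unique(left)),
    n_right = length(right), unique_right = length(unique(right)))
}

set.seed(1)
x <- runif(1000, -1, 1)
count_unique_by_side(x)            # continuous score
count_unique_by_side(round(x, 1))  # rounded score: many repeated values
```

A rounded or discrete score yields far fewer unique values than observations, which is the situation the masspoints and bwcheck options are meant to handle.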

rdbwselect_2014 Deprecated Bandwidth Selection Procedures for Local-Polynomial Regression-Discontinuity Estimators.

Description
rdbwselect_2014 is a deprecated command implementing three bandwidth selectors for local
polynomial Regression Discontinuity (RD) point estimators and inference procedures, as described in
Calonico, Cattaneo and Titiunik (2014).
This command is no longer supported or updated, and it is made available only for backward
compatibility purposes. Please use rdbwselect instead.
The latest version of the rdrobust package includes the following commands: rdrobust for point
estimation and inference procedures, rdbwselect for data-driven bandwidth selection, and rdplot for
data-driven RD plots.
For more details, and related Stata and R packages useful for analysis of RD designs, visit
https://rdpackages.github.io/

Usage
rdbwselect_2014(y, x, subset = NULL, c = 0, p = 1, q = 2, deriv = 0,
rho = NULL, kernel = "tri", bwselect = "CCT", scaleregul = 1,
delta = 0.5, cvgrid_min = NULL, cvgrid_max = NULL,
cvgrid_length = NULL, cvplot = FALSE, vce = "nn", matches = 3,
all = FALSE, precalc = TRUE )

Arguments
y is the dependent variable.
x is the running variable (a.k.a. score or forcing variable).
subset an optional vector specifying a subset of observations to be used.
c specifies the RD cutoff in x; default is c = 0.
p specifies the order of the local-polynomial used to construct the point-estimator;
default is p = 1 (local linear regression).
q specifies the order of the local-polynomial used to construct the bias-correction;
default is q = 2 (local quadratic regression).
deriv specifies the order of the derivative of the regression function to be estimated;
default is deriv = 0 (Sharp RD, or Fuzzy RD if fuzzy is also specified). Setting
it equal to 1 results in estimation of a Kink RD design (or Fuzzy Kink RD if
fuzzy is also specified).
rho if specified, sets the pilot bandwidth b equal to h/rho, where h is computed
using the method and options chosen below.
kernel is the kernel function used to construct the local-polynomial estimator(s).
Options are triangular (default option), epanechnikov and uniform.
bwselect selects the bandwidth selection procedure to be used. By default it computes
both h and b, unless rho is specified, in which case it only computes h and sets
b = h/rho. Options are:
CCT for bandwidth selector proposed by Calonico, Cattaneo and Titiunik (2014)
(default option).
IK for bandwidth selector proposed by Imbens and Kalyanaraman (2012) (only
available for Sharp RD design).
CV for cross-validation method proposed by Ludwig and Miller (2007) (only
available for Sharp RD design).
scaleregul specifies scaling factor for the regularization terms of CCT and IK bandwidth
selectors. Setting scaleregul = 0 removes the regularization term from the
bandwidth selectors; default is scaleregul = 1.
delta sets the quantile that defines the sample used in the cross-validation procedure.
This option is used only if bwselect = "CV" is specified; default is delta = 0.5,
that is, the median of the control and treated samples.
cvgrid_min sets the minimum value of the bandwidth grid used in the cross-validation
procedure. This option is used only if bwselect = "CV" is specified.
cvgrid_max sets the maximum value of the bandwidth grid used in the cross-validation
procedure. This option is used only if bwselect = "CV" is specified.
cvgrid_length sets the bin length of the (evenly-spaced) bandwidth grid used in the cross-
validation procedure. This option is used only if bwselect = "CV" is specified.
cvplot generates a graph of the CV objective function. This option is used only if
bwselect = "CV" is specified.
vce specifies the procedure used to compute the variance-covariance matrix
estimator. This option is used only if CCT or IK bandwidth procedures are employed.
Options are:
nn for nearest-neighbor matches residuals using matches number of matches.
This is the default option (with matches = 3, see below).
resid for estimated plug-in residuals using h bandwidth.
matches specifies the number of matches in the nearest-neighbor based variance-covariance
matrix estimator. This option is used only when nearest-neighbor matches
residuals are employed; default is matches = 3.
all if specified, rdbwselect_2014 reports three different procedures:
CCT for bandwidth selector proposed by Calonico, Cattaneo and Titiunik (2014).
IK for bandwidth selector proposed by Imbens and Kalyanaraman (2012).
CV for cross-validation method proposed by Ludwig and Miller (2007).
precalc internal option.
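
Two of the arguments above reduce to simple arithmetic: rho fixes the pilot bandwidth as b = h/rho, and the cross-validation options describe an evenly spaced grid of candidate bandwidths. The base R sketch below illustrates both. The helper pilot_bandwidth is hypothetical, the numbers are made up, and reading cvgrid_length as the step between grid points is an assumption about the option, not something confirmed here.

```r
# Pilot bandwidth implied by rho: b = h / rho.
# pilot_bandwidth() is a hypothetical helper, not part of rdrobust.
pilot_bandwidth <- function(h, rho) {
  stopifnot(all(h > 0), rho > 0)
  h / rho
}

pilot_bandwidth(h = 0.25, rho = 0.5)  # b is twice as wide as h
pilot_bandwidth(h = 0.25, rho = 1)    # b equals h

# An evenly spaced grid of candidate bandwidths for cross-validation.
# Assumption: cvgrid_length is interpreted as the spacing of the grid.
cvgrid_min    <- 0.25
cvgrid_max    <- 1
cvgrid_length <- 0.25
cv_grid <- seq(from = cvgrid_min, to = cvgrid_max, by = cvgrid_length)
cv_grid
```

Values of rho below 1 give a pilot bandwidth wider than the main bandwidth; rho = 1 makes the two coincide.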

Value
bws matrix containing the estimated bandwidths for each selected procedure.
bwselect bandwidth selection procedure employed.
kernel kernel function used to construct the local-polynomial estimator(s).
p order of the local-polynomial used to construct the point-estimator.
q order of the local-polynomial used to construct the bias-correction estimator.

Author(s)
Sebastian Calonico, Columbia University, New York, NY. <[email protected]>.
Matias D. Cattaneo, Princeton University, Princeton, NJ. <[email protected]>.
Max H. Farrell, University of Chicago, Chicago, IL. <[email protected]>.
Rocio Titiunik, Princeton University, Princeton, NJ. <[email protected]>.

References
Calonico, S., M. D. Cattaneo, and R. Titiunik. 2014. Robust Nonparametric Confidence Intervals
for Regression-Discontinuity Designs. Econometrica 82(6): 2295-2326.

See Also
rdrobust, rdplot

Examples
x<-runif(1000,-1,1)
y<-5+3*x+2*(x>=0)+rnorm(1000)
rdbwselect_2014(y,x)

rdplot Data-Driven Regression Discontinuity Plots

Description
rdplot implements several data-driven Regression Discontinuity (RD) plots, using either
evenly-spaced or quantile-spaced partitioning. Two types of RD plots are constructed: (i) RD plots with
binned sample means tracing out the underlying regression function, and (ii) RD plots with binned
sample means mimicking the underlying variability of the data. For technical and methodological
details see Calonico, Cattaneo and Titiunik (2015a).
Companion commands are: rdrobust for point estimation and inference procedures, and rdbwselect
for data-driven bandwidth selection.
A detailed introduction to this command is given in Calonico, Cattaneo and Titiunik (2015b) and
Calonico, Cattaneo, Farrell and Titiunik (2017). A companion Stata package is described in
Calonico, Cattaneo and Titiunik (2014).
For more details, and related Stata and R packages useful for analysis of RD designs, visit
https://rdpackages.github.io/

Usage
rdplot(y, x, c = 0, p = 4, nbins = NULL, binselect = "esmv",
scale = NULL, kernel = "uni", weights = NULL, h = NULL,
covs = NULL, covs_eval = 0, covs_drop = TRUE,
support = NULL, subset = NULL,
hide = FALSE, ci = NULL, shade = FALSE, title = NULL,
x.label = NULL, y.label = NULL, x.lim = NULL, y.lim = NULL,
col.dots = NULL, col.lines = NULL)

Arguments
y is the dependent variable.
x is the running variable (a.k.a. score or forcing variable).
c specifies the RD cutoff in x; default is c = 0.
p specifies the order of the global-polynomial used to approximate the population
conditional mean functions for control and treated units; default is p = 4.
nbins specifies the number of bins used to the left of the cutoff, denoted J−, and to
the right of the cutoff, denoted J+, respectively. If not specified, J+ and J− are
estimated using the method and options chosen below.
binselect specifies the procedure to select the number of bins. This option is available
only if J− and J+ are not set manually. Options are:
es: IMSE-optimal evenly-spaced method using spacings estimators.
espr: IMSE-optimal evenly-spaced method using polynomial regression.
esmv: mimicking variance evenly-spaced method using spacings estimators.
This is the default option.
esmvpr: mimicking variance evenly-spaced method using polynomial regression.
qs: IMSE-optimal quantile-spaced method using spacings estimators.
qspr: IMSE-optimal quantile-spaced method using polynomial regression.
qsmv: mimicking variance quantile-spaced method using spacings estimators.
qsmvpr: mimicking variance quantile-spaced method using polynomial regression.
scale specifies a multiplicative factor to be used with the optimal numbers of bins
selected. Specifically, the number of bins used for the treatment and control
groups will be scale × Ĵ+ and scale × Ĵ−, where Ĵ· denotes the estimated
optimal numbers of bins originally computed for each group; default is scale = 1.
kernel specifies the kernel function used to construct the local-polynomial estimator(s).
Options are: triangular, epanechnikov, and uniform. Default is kernel=uniform
(i.e., equal/no weighting to all observations on the support of the kernel).
weights is the variable used for optional weighting of the estimation procedure. The
unit-specific weights multiply the kernel function.
h specifies the bandwidth used to construct the (global) polynomial fits given the
kernel choice kernel. If not specified, the bandwidths are chosen to span the
full support of the data. If two bandwidths are specified, the first bandwidth is
used for the data below the cutoff and the second bandwidth is used for the data
above the cutoff.
covs specifies additional covariates to be used in the polynomial regression.
covs_eval sets the evaluation points for the additional covariates, when included in the
estimation. Options are: covs_eval = 0 (default) and covs_eval = "mean".
covs_drop if TRUE, it checks for collinear additional covariates and drops them. Default is
TRUE.
support specifies an optional extended support of the running variable to be used in the
construction of the bins; default is the sample range.
subset an optional vector specifying a subset of observations to be used.
hide logical. If TRUE, it omits the RD plot; default is hide = FALSE.
ci optional graphical option to display confidence intervals of selected level for
each bin.
shade optional graphical option to replace confidence intervals with shaded areas.
title optional title for the RD plot.
x.label optional label for the x-axis of the RD plot.
y.label optional label for the y-axis of the RD plot.
x.lim optional setting for the range of the x-axis in the RD plot.
y.lim optional setting for the range of the y-axis in the RD plot.
col.dots optional setting for the color of the dots in the RD plot.
col.lines optional setting for the color of the lines in the RD plot.

Value

binselect method used to compute the optimal number of bins.
N sample sizes used to the left and right of the cutoff.
Nh effective sample sizes used to the left and right of the cutoff.
c cutoff value.
p order of the global polynomial used.
h bandwidth used to the left and right of the cutoff.
kernel kernel used.
J selected number of bins to the left and right of the cutoff.
J_IMSE IMSE optimal number of bins to the left and right of the cutoff.
J_MV Mimicking variance number of bins to the left and right of the cutoff.
coef matrix containing the coefficients of the pth order global polynomial estimated
on both sides of the cutoff.
scale selected scale value.
rscale implicit scale value.
bin_avg average bin length.
bin_med median bin length.
vars_bins data frame containing the variables used to construct the bins: bin id, cutoff
values, mean of x and y within each bin, cutoff points and confidence interval
bounds.
vars_poly data frame containing the variables used to construct the global polynomial plot.
rdplot a standard ggplot object that can be used for further customization.
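
The binned sample means that make up an RD plot can be approximated by hand. The base R sketch below builds evenly spaced bins on each side of the cutoff and averages x and y within each bin, loosely mirroring the bin-level means that vars_bins is described as holding. It is an illustration only: binned_means is a hypothetical helper, and the number of bins is fixed at 10 by hand instead of being chosen by a data-driven binselect method.

```r
# Evenly spaced bins on one side of the cutoff, with the mean of x and y per bin.
# binned_means() is a hypothetical helper, not part of rdrobust.
binned_means <- function(x, y, lower, upper, n_bins) {
  breaks <- seq(lower, upper, length.out = n_bins + 1)
  bin <- cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)
  data.frame(
    bin    = sort(unique(bin)),               # only non-empty bins are reported
    mean_x = as.numeric(tapply(x, bin, mean)),
    mean_y = as.numeric(tapply(y, bin, mean))
  )
}

set.seed(1)
x <- runif(1000, -1, 1)
y <- 5 + 3 * x + 2 * (x >= 0) + rnorm(1000)
cutoff <- 0
left  <- x <  cutoff
right <- x >= cutoff

bins_left  <- binned_means(x[left],  y[left],  min(x), cutoff, n_bins = 10)
bins_right <- binned_means(x[right], y[right], cutoff, max(x), n_bins = 10)
head(bins_left)
```

Plotting mean_y against mean_x for both data frames gives the dots of a basic RD plot; a quantile-spaced variant would replace the seq() call with quantile-based breaks.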

Author(s)

Sebastian Calonico, Columbia University, New York, NY. <[email protected]>.
Matias D. Cattaneo, Princeton University, Princeton, NJ. <[email protected]>.
Max H. Farrell, University of Chicago, Chicago, IL. <[email protected]>.
Rocio Titiunik, Princeton University, Princeton, NJ. <[email protected]>.

References

Calonico, S., M. D. Cattaneo, M. H. Farrell, and R. Titiunik. 2017. rdrobust: Software for
Regression Discontinuity Designs. Stata Journal 17(2): 372-404.
Calonico, S., M. D. Cattaneo, and R. Titiunik. 2014. Robust Data-Driven Inference in the
Regression-Discontinuity Design. Stata Journal 14(4): 909-946.
Calonico, S., M. D. Cattaneo, and R. Titiunik. 2015a. Optimal Data-Driven Regression
Discontinuity Plots. Journal of the American Statistical Association 110(512): 1753-1769.
Calonico, S., M. D. Cattaneo, and R. Titiunik. 2015b. rdrobust: An R Package for Robust
Nonparametric Inference in Regression-Discontinuity Designs. R Journal 7(1): 38-51.
Cattaneo, M. D., B. Frandsen, and R. Titiunik. 2015. Randomization Inference in the Regression
Discontinuity Design: An Application to the Study of Party Advantages in the U.S. Senate. Journal
of Causal Inference 3(1): 1-24.

See Also

rdbwselect, rdrobust

Examples
x<-runif(1000,-1,1)
y<-5+3*x+2*(x>=0)+rnorm(1000)
rdplot(y,x)
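
The smooth curves in an RD plot come from a global polynomial of order p fitted separately on each side of the cutoff. With the default uniform kernel and a bandwidth spanning the full support, this amounts to an unweighted polynomial regression per side, which the base R sketch below reproduces approximately. The helper global_poly_fit is hypothetical and is not how rdplot is implemented internally.

```r
# Separate order-p polynomial fits in (x - cutoff) on each side of the cutoff.
# global_poly_fit() is a hypothetical helper, not part of rdrobust.
global_poly_fit <- function(y, x, cutoff = 0, p = 4) {
  fit_side <- function(keep) {
    d <- data.frame(y = y[keep], xc = x[keep] - cutoff)
    lm(y ~ poly(xc, degree = p, raw = TRUE), data = d)
  }
  list(left = fit_side(x < cutoff), right = fit_side(x >= cutoff))
}

set.seed(1)
x <- runif(1000, -1, 1)
y <- 5 + 3 * x + 2 * (x >= 0) + rnorm(1000)

fits <- global_poly_fit(y, x, cutoff = 0, p = 4)

# Because x is centered at the cutoff, each intercept is the fitted value at the cutoff.
jump_at_cutoff <- unname(coef(fits$right)[1] - coef(fits$left)[1])
jump_at_cutoff
```

The gap between the two intercepts is only a visual summary of the discontinuity; global high-order fits can behave poorly at the boundary, so formal estimation and inference belong to rdrobust.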

rdrobust Local-Polynomial RD Estimation with Robust Confidence Intervals

Description

rdrobust implements local polynomial Regression Discontinuity (RD) point estimators with robust
bias-corrected confidence intervals and inference procedures developed in Calonico, Cattaneo and
Titiunik (2014a), Calonico, Cattaneo and Farrell (2018), Calonico, Cattaneo, Farrell and Titiunik
(2019), and Calonico, Cattaneo and Farrell (2020). It also computes alternative estimation and
inference procedures available in the literature.
Companion commands are: rdbwselect for data-driven bandwidth selection, and rdplot for data-
driven RD plots (see Calonico, Cattaneo and Titiunik (2015a) for details).
A detailed introduction to this command is given in Calonico, Cattaneo and Titiunik (2015b),
and Calonico, Cattaneo, Farrell and Titiunik (2017). A companion Stata package is described
in Calonico, Cattaneo and Titiunik (2014b).
For more details, and related Stata and R packages useful for analysis of RD designs, visit
https://rdpackages.github.io/
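
To make the idea of a local polynomial RD point estimator concrete, the base R sketch below computes a conventional sharp RD estimate from two weighted local linear regressions (p = 1) with a triangular kernel. It is a simplified illustration under stated assumptions: local_linear_rd is a hypothetical helper, the bandwidth h is picked by hand instead of by rdbwselect, and none of the bias correction or robust standard errors that rdrobust provides are included.

```r
# Conventional sharp RD point estimate from two weighted local linear fits.
# local_linear_rd() is a hypothetical helper, not part of rdrobust.
local_linear_rd <- function(y, x, cutoff = 0, h) {
  xc <- x - cutoff
  w  <- pmax(0, 1 - abs(xc) / h)  # triangular kernel weights, zero outside h
  fit_side <- function(keep) {
    d <- data.frame(y = y[keep], xc = xc[keep], w = w[keep])
    lm(y ~ xc, data = d, weights = w)
  }
  left  <- fit_side(xc <  0 & w > 0)
  right <- fit_side(xc >= 0 & w > 0)
  # Difference of the two intercepts: estimated jump in E[y | x] at the cutoff.
  unname(coef(right)[1] - coef(left)[1])
}

set.seed(1)
x <- runif(1000, -1, 1)
y <- 5 + 3 * x + 2 * (x >= 0) + rnorm(1000)
local_linear_rd(y, x, cutoff = 0, h = 0.3)  # the true jump in this simulation is 2
```

Changing h trades bias against variance, which is why the package pairs this kind of estimator with data-driven bandwidth selection and bias-corrected inference.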

Usage
rdrobust(y, x, c = NULL, fuzzy = NULL,
deriv = NULL, p = NULL, q = NULL,
h = NULL, b = NULL, rho = NULL, covs = NULL, covs_drop = TRUE,
kernel = "tri", weights = NULL, bwselect = "mserd",
vce = "nn", cluster = NULL,
nnmatch = 3, level = 95, scalepar = 1, scaleregul = 1,
sharpbw = FALSE, all = NULL, subset = NULL,
masspoints = "adjust", bwcheck = NULL,
bwrestrict = TRUE, stdvars = FALSE)

Arguments
y is the dependent variable.
x is the running variable (a.k.a. score or forcing variable).
c specifies the RD cutoff in x; default is c = 0.
fuzzy specifies the treatment status variable used to implement fuzzy RD estimation
(or Fuzzy Kink RD if deriv=1 is also specified). Default is Sharp RD design
and hence this option is not used.
deriv specifies the order of the derivative of the regression functions to be estimated.
Default is deriv=0 (for Sharp RD, or for Fuzzy RD if fuzzy is also specified).
Setting deriv=1 results in estimation of a Kink RD design (up to scale), or
Fuzzy Kink RD if fuzzy is also specified.
p specifies the order of the local-polynomial used to construct the point-estimator;
default is p = 1 (local linear regression).
q specifies the order of the local-polynomial used to construct the bias-correction;
default is q = 2 (local quadratic regression).
h specifies the main bandwidth used to construct the RD point estimator. If not
specified, bandwidth h is computed by the companion command rdbwselect.
If two bandwidths are specified, the first bandwidth is used for the data below
the cutoff and the second bandwidth is used for the data above the cutoff.
b specifies the bias bandwidth used to construct the bias-correction estimator. If
not specified, bandwidth b is computed by the companion command rdbwselect.
If two bandwidths are specified, the first bandwidth is used for the data below
the cutoff and the second bandwidth is used for the data above the cutoff.
rho specifies the value of rho, so that the bias bandwidth b equals h/rho. Default is
rho = 1 if h is specified but b is not.
covs specifies additional covariates to be used for estimation and inference.
covs_drop if TRUE, it checks for collinear additional covariates and drops them. Default is
TRUE.
kernel is the kernel function used to construct the local-polynomial estimator(s).
Options are triangular (default option), epanechnikov and uniform.
weights is the variable used for optional weighting of the estimation procedure. The
unit-specific weights multiply the kernel function.
bwselect specifies the bandwidth selection procedure to be used. By default it computes
both h and b, unless rho is specified, in which case it only computes h and sets
b=h/rho. Options are:
mserd one common MSE-optimal bandwidth selector for the RD treatment effect
estimator.
msetwo two different MSE-optimal bandwidth selectors (below and above the
cutoff) for the RD treatment effect estimator.
msesum one common MSE-optimal bandwidth selector for the sum of regression
estimates (as opposed to difference thereof).
msecomb1 for min(mserd,msesum).
msecomb2 for median(msetwo,mserd,msesum), for each side of the cutoff
separately.
cerrd one common CER-optimal bandwidth selector for the RD treatment effect
estimator.
certwo two different CER-optimal bandwidth selectors (below and above the
cutoff) for the RD treatment effect estimator.
cersum one common CER-optimal bandwidth selector for the sum of regression
estimates (as opposed to difference thereof).
cercomb1 for min(cerrd,cersum).
cercomb2 for median(certwo,cerrd,cersum), for each side of the cutoff
separately.
Note: MSE = Mean Square Error; CER = Coverage Error Rate. Default is
bwselect=mserd. For details on implementation see Calonico, Cattaneo and
Titiunik (2014a), Calonico, Cattaneo and Farrell (2018), and Calonico,
Cattaneo, Farrell and Titiunik (2019), and the companion software articles.
vce specifies the procedure used to compute the variance-covariance matrix
estimator. Options are:
nn for heteroskedasticity-robust nearest neighbor variance estimator with nnmatch
the (minimum) number of neighbors to be used.
hc0 for heteroskedasticity-robust plug-in residuals variance estimator without
weights.
hc1 for heteroskedasticity-robust plug-in residuals variance estimator with hc1
weights.
hc2 for heteroskedasticity-robust plug-in residuals variance estimator with hc2
weights.
hc3 for heteroskedasticity-robust plug-in residuals variance estimator with hc3
weights.
Default is vce=nn.
cluster indicates the cluster ID variable used for cluster-robust variance estimation with
degrees-of-freedom weights. By default it is combined with vce=nn for cluster-
robust nearest neighbor variance estimation. Another option is plug-in residuals
combined with vce=hc0.
nnmatch to be combined with vce=nn for the heteroskedasticity-robust nearest neighbor
variance estimator, with nnmatch indicating the minimum number of neighbors
to be used. Default is nnmatch=3.
level sets the confidence level for confidence intervals; default is level = 95.
scalepar specifies the scaling factor for the RD parameter of interest. This option is useful when
the population parameter of interest involves a known multiplicative factor (e.g.,
sharp kink RD). Default is scalepar = 1 (no scaling).
scaleregul specifies the scaling factor for the regularization term added to the denominator of
the bandwidth selectors. Setting scaleregul = 0 removes the regularization
term from the bandwidth selectors; default is scaleregul = 1.
sharpbw option to perform fuzzy RD estimation using a bandwidth selection procedure
for the sharp RD model. This option is automatically selected if there is perfect
compliance at either side of the cutoff.
all if specified, rdrobust reports three different procedures:
(i) conventional RD estimates with conventional standard errors.
(ii) bias-corrected estimates with conventional standard errors.
(iii) bias-corrected estimates with robust standard errors.
subset an optional vector specifying a subset of observations to be used.
masspoints checks and controls for repeated observations in the running variable. Options
are:
(i) off: ignores the presence of mass points.
(ii) check: looks for and reports the number of unique observations at each side
of the cutoff.
(iii) adjust: ensures that the preliminary bandwidths used in the calculations
contain a minimal number of unique observations. By default it uses 10
observations, but this can be manually adjusted with the option bwcheck.
Default option is masspoints=adjust.
bwcheck if a positive integer is provided, the preliminary bandwidth used in the calcula-
tions is enlarged so that at least bwcheck unique observations are used.
bwrestrict if TRUE, computed bandwidths are restricted to lie within the range of x; default
is bwrestrict = TRUE.
stdvars if TRUE, x and y are standardized before computing the bandwidths; default is
stdvars = FALSE.
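
To illustrate how the kernel and weights arguments combine, the following base-R sketch evaluates the textbook forms of the triangular, Epanechnikov and uniform kernels and multiplies them by unit-specific weights. The function kernel_weights and the vectors x and unit_w are hypothetical names introduced here for illustration only; this is not the package's internal code, and the scaling by 1/h is an assumption of the sketch.

```r
# Textbook kernel weights for running variable x, cutoff c and bandwidth h.
# Hypothetical helper for illustration; not part of the rdrobust API.
kernel_weights <- function(x, c = 0, h = 1,
                           kernel = c("triangular", "epanechnikov", "uniform")) {
  kernel <- match.arg(kernel)
  u <- (x - c) / h          # distance to the cutoff in bandwidth units
  inside <- abs(u) <= 1     # only observations within the bandwidth get weight
  k <- switch(kernel,
    triangular   = (1 - abs(u)) * inside,
    epanechnikov = 0.75 * (1 - u^2) * inside,
    uniform      = 0.5 * inside
  )
  k / h
}

x <- c(-1.5, -0.5, 0, 0.25, 1)
unit_w <- c(1, 2, 1, 1, 1)  # optional unit-specific weights

# The unit-specific weights multiply the kernel function.
w <- unit_w * kernel_weights(x, c = 0, h = 1, kernel = "triangular")
print(w)
```

Observations outside the bandwidth receive zero weight under all three kernels, so the kernel choice only changes how observations inside the bandwidth are weighted relative to one another.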

Value
N vector with the sample sizes used to the left and to the right of the cutoff.
N_h vector with the effective sample sizes used to the left and to the right of the
cutoff.
c cutoff value.
p order of the polynomial used for estimation of the regression function.
q order of the polynomial used for estimation of the bias of the regression function.
bws matrix containing the bandwidths used.
tau_cl conventional local-polynomial estimate to the left and to the right of the cutoff.
tau_bc bias-corrected local-polynomial estimate to the left and to the right of the cutoff.
coef vector containing conventional and bias-corrected local-polynomial RD
estimates.
se vector containing conventional and robust standard errors of the local-polynomial
RD estimates.
bias estimated bias for the local-polynomial RD estimator below and above the cut-
off.
beta_p_l conventional p-order local-polynomial estimates to the left of the cutoff.
beta_p_r conventional p-order local-polynomial estimates to the right of the cutoff.
V_cl_l conventional variance-covariance matrix estimated below the cutoff.
V_cl_r conventional variance-covariance matrix estimated above the cutoff.
V_rb_l robust variance-covariance matrix estimated below the cutoff.
V_rb_r robust variance-covariance matrix estimated above the cutoff.
pv vector containing the p-values associated with conventional, bias-corrected and
robust local-polynomial RD estimates.
ci matrix containing the confidence intervals associated with conventional, bias-
corrected and robust local-polynomial RD estimates.
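
The returned components listed above can be read directly from the fitted object. The sketch below assumes the rdrobust package is installed; the simulated data, the seed and the object name fit are illustrative choices, not part of the package.

```r
library(rdrobust)

# Simulated sharp RD with a jump of 2 at the default cutoff (illustrative data)
set.seed(123)
x <- runif(1000, -1, 1)
y <- 5 + 3 * x + 2 * (x >= 0) + rnorm(1000)

fit <- rdrobust(y, x)

fit$coef  # point estimates of the RD treatment effect
fit$se    # standard errors
fit$ci    # confidence intervals
fit$pv    # p-values
fit$bws   # bandwidths used
fit$N_h   # effective sample sizes to the left and right of the cutoff
```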

Author(s)
Sebastian Calonico, Columbia University, New York, NY.
Matias D. Cattaneo, Princeton University, Princeton, NJ.
Max H. Farrell, University of Chicago, Chicago, IL.
Rocio Titiunik, Princeton University, Princeton, NJ.

References
Calonico, S., M. D. Cattaneo, and M. H. Farrell. 2018. On the Effect of Bias Estimation on
Coverage Accuracy in Nonparametric Inference. Journal of the American Statistical Association,
113(522): 767-779.
Calonico, S., M. D. Cattaneo, and M. H. Farrell. 2020. Optimal Bandwidth Choice for Robust Bias
Corrected Inference in Regression Discontinuity Designs. Econometrics Journal, 23(2): 192-210.
Calonico, S., M. D. Cattaneo, M. H. Farrell, and R. Titiunik. 2017. rdrobust: Software for Regres-
sion Discontinuity Designs. Stata Journal, 17(2): 372-404.
Calonico, S., M. D. Cattaneo, M. H. Farrell, and R. Titiunik. 2019. Regression Discontinuity
Designs using Covariates. Review of Economics and Statistics, 101(3): 442-451.
Calonico, S., M. D. Cattaneo, and R. Titiunik. 2014a. Robust Nonparametric Confidence Intervals
for Regression-Discontinuity Designs. Econometrica, 82(6): 2295-2326.
Calonico, S., M. D. Cattaneo, and R. Titiunik. 2014b. Robust Data-Driven Inference in the
Regression-Discontinuity Design. Stata Journal, 14(4): 909-946.
Calonico, S., M. D. Cattaneo, and R. Titiunik. 2015a. Optimal Data-Driven Regression Disconti-
nuity Plots. Journal of the American Statistical Association, 110(512): 1753-1769.
Calonico, S., M. D. Cattaneo, and R. Titiunik. 2015b. rdrobust: An R Package for Robust Nonpara-
metric Inference in Regression-Discontinuity Designs. R Journal, 7(1): 38-51.
Cattaneo, M. D., B. Frandsen, and R. Titiunik. 2015. Randomization Inference in the Regression
Discontinuity Design: An Application to the Study of Party Advantages in the U.S. Senate. Journal
of Causal Inference, 3(1): 1-24.

See Also
rdbwselect, rdplot

Examples

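# Simulate a sharp RD: the outcome jumps by 2 at the cutoff x = 0
x <- runif(1000, -1, 1)
y <- 5 + 3 * x + 2 * (x >= 0) + rnorm(1000)
# Estimate the RD treatment effect with the default options
rdrobust(y, x)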
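
As a further sketch, the same simulated design can be re-estimated with some of the non-default options documented in the Arguments section. It assumes the rdrobust package is installed; the seed and the object names fit_two and fit_cer are illustrative.

```r
library(rdrobust)

set.seed(2020)
x <- runif(1000, -1, 1)
y <- 5 + 3 * x + 2 * (x >= 0) + rnorm(1000)

# Separate MSE-optimal bandwidths on each side of the cutoff, a uniform kernel,
# HC1 plug-in residuals variance estimation, and all three reporting procedures
fit_two <- rdrobust(y, x, kernel = "uniform", bwselect = "msetwo",
                    vce = "hc1", all = TRUE)
summary(fit_two)

# One common CER-optimal bandwidth with 90% confidence intervals
fit_cer <- rdrobust(y, x, bwselect = "cerrd", level = 90)
summary(fit_cer)
```

Comparing the bws component of the two fits shows how the chosen bandwidth selector changes the estimation window.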

rdrobust_RDsenate RD Senate Data

Description
Extract of the dataset constructed by Cattaneo, Frandsen, and Titiunik (2015), which includes
measures of incumbency advantage in the U.S. Senate for the period 1914-2010.

Usage
data(rdrobust_RDsenate)

Format
A data frame with 1390 observations on the following 2 variables.
margin a numeric vector.
vote a numeric vector.
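
A minimal usage sketch with this dataset follows. It assumes the rdrobust package is installed and treats margin as the running variable and vote as the outcome, which is an assumption of this example rather than something stated in the Format section; the object name fit_senate is illustrative.

```r
library(rdrobust)

# Load the dataset and inspect its structure
data(rdrobust_RDsenate)
str(rdrobust_RDsenate)

# Assumption: margin is the running variable and vote is the outcome
fit_senate <- rdrobust(y = rdrobust_RDsenate$vote,
                       x = rdrobust_RDsenate$margin)
summary(fit_senate)
```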

Source
Cattaneo, M. D., B. Frandsen, and R. Titiunik. 2015. Randomization Inference in the Regression
Discontinuity Design: An Application to the Study of Party Advantages in the U.S. Senate. Journal
of Causal Inference, 3(1): 1-24.

References
Cattaneo, M. D., B. Frandsen, and R. Titiunik. 2015. Randomization Inference in the Regression
Discontinuity Design: An Application to the Study of Party Advantages in the U.S. Senate. Journal
of Causal Inference, 3(1): 1-24.
Index

∗ RD plots
rdplot, 9
∗ RDD
rdrobust, 12
∗ Robust Estimation
rdrobust, 12
∗ binning
rdplot, 9
∗ datasets
rdrobust_RDsenate, 17
∗ partitioning
rdplot, 9
∗ regression discontinuity
rdplot, 9
∗ tuning parameter selection
rdplot, 9

print.rdbwselect (rdbwselect), 2
print.rdbwselect_2014
(rdbwselect_2014), 6
print.rdplot (rdplot), 9
print.rdrobust (rdrobust), 12

rdbwselect, 2, 2, 5, 6, 9, 12, 13, 17
rdbwselect_2014, 6, 6, 8
rdplot, 2, 3, 6, 8, 9, 9, 12, 17
rdrobust, 2, 3, 6, 8, 9, 12, 12, 15
rdrobust-package, 2
rdrobust_RDsenate, 17

summary.rdbwselect (rdbwselect), 2
summary.rdbwselect_2014
(rdbwselect_2014), 6
summary.rdplot (rdplot), 9
summary.rdrobust (rdrobust), 12
