Module 2.2 Randomized Assignment
1. Introduction
2. Analysis for Evaluating Impacts
2.1 Regression Analysis to Quantify Impacts
2.2 T-test Based Analysis
2.3 A Decision to Make: Regression Models or T-test?
3. Randomization in Practice
3.1 Simple Randomization
3.2 Stratified or Block Randomization
3.3 Clustered Randomization
3.4 Testing the Success of Randomization
4. Bibliography/Further Readings
Learning Guide: Randomized Assignment
List of Figures
Figure 1. STATA output for OLS regression model to evaluate impacts
Figure 2. t-test output to evaluate impact of PROGRESA on household income
Figure 3. OLS regression output with clustered standard error and controlling for poverty
Figure 4. Testing for baseline balance
1. INTRODUCTION
In the previous module we covered causal inference and counterfactual analysis, two key concepts for conducting a rigorous impact evaluation. We also described selection bias and omitted variable bias, and how randomization mitigates or eliminates these problems. In the following modules we will cover the methods typically used to conduct an impact evaluation, including experimental and quasi-experimental methodologies.
This module focuses on randomized assignment. We will make sure you understand how to actually analyze the impacts of a program: where to look and what to look for. We will explain the methods that can be used to estimate the impact and compare when to use one over the other. Finally, we will walk through more advanced program designs in which we stratify on an existing variable (e.g., gender, age, occupation) or cluster at a level higher than the individual (e.g., school, market, village). Stratification can yield a more precise estimate of the impact, and clustering often matches how programs are actually delivered, but both designs are trickier to set up and implement.
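2. ANALYSIS FOR EVALUATING IMPACTS
2.1 Regression Analysis to Quantify Impacts
The observed outcome of individual i can be written in terms of two potential outcomes as
$$Y_i = T_i \cdot Y_i^{trt} + (1 - T_i) \cdot Y_i^{ctr}$$
where T_i = 1 if individual i is assigned to the treatment group and T_i = 0 if the individual is assigned to the control group, and Y_i is that individual's observed outcome. The superscripts trt and ctr denote the outcomes the individual would have under treatment and under control; in "real life" we observe only one of the two, because each individual is in either the treatment or the control group. Rearranging the terms gives
$$Y_i = Y_i^{ctr} + T_i \cdot (Y_i^{trt} - Y_i^{ctr})$$
which, assuming linearity (a treatment effect that is the same for every individual), can be written in the usual regression notation as
$$Y_i = \beta_0 + \beta_1 T_i + \varepsilon_i$$
where β0 is the mean outcome under control, β1 = Y_i^{trt} - Y_i^{ctr} is the treatment effect, and ε_i = Y_i^{ctr} - β0.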
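Based on the above regression model, and because randomization makes T_i independent of ε_i (so that E[ε_i | T_i] = 0), we can estimate the conditional mean outcome with and without the treatment and then the causal effect as follows:
$$E[Y_i \mid T_i = 1] = \beta_0 + \beta_1$$
$$E[Y_i \mid T_i = 0] = \beta_0$$
$$E[Y_i \mid T_i = 1] - E[Y_i \mid T_i = 0] = \beta_1$$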
Therefore, the coefficient β1 quantifies the impact as the difference in mean outcomes between the treatment and control groups. Remember, estimates of β1 obtained in this way are unbiased only if selection bias is zero.
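As a quick numerical check of this result, the Python sketch below fits OLS of an outcome on an intercept and a 0/1 treatment indicator and compares the coefficients with the group means. The incomes are made-up numbers for illustration, not PROGRESA data, and the function name ols_binary is hypothetical; in this module the regression itself is run in STATA with reg.

```python
from statistics import mean


def ols_binary(y, t):
    """OLS of y on an intercept and a 0/1 treatment t; returns (b0, b1)."""
    t_bar, y_bar = mean(t), mean(y)
    sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    b1 = sxy / sxx            # slope on the treatment indicator
    b0 = y_bar - b1 * t_bar   # intercept
    return b0, b1


# Hypothetical household incomes: first four treated, last four control
y = [120.0, 95.0, 130.0, 110.0, 100.0, 90.0, 105.0, 85.0]
t = [1, 1, 1, 1, 0, 0, 0, 0]

b0, b1 = ols_binary(y, t)
treated_mean = mean(yi for yi, ti in zip(y, t) if ti == 1)
control_mean = mean(yi for yi, ti in zip(y, t) if ti == 0)

# b0 reproduces the control-group mean; b1 the treated-minus-control difference
print(b0, control_mean)
print(b1, treated_mean - control_mean)
```

With a single binary regressor, the intercept equals the control mean and the slope equals the difference in means, which is exactly the E[Y | T = 1] - E[Y | T = 0] = β1 relationship.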
Exercise: Let’s assume that we expect PROGRESA (D_HH) to change household income (IncomeLab_HH) in 1999. Specify a regression model in STATA as discussed above and restrict the data to the year 1999 (variable year). What is the impact of the intervention on income levels? Is it statistically significant? How do you interpret the coefficient?
Answer Key: Specify the regression model as reg IncomeLab_HH D_HH if year == 1999. The STATA output is given in Figure 1. We find that PROGRESA participation by the household (D_HH) did not change household income in 1999 in a statistically significant way at α = 0.1. Remember, these causal inferences rest on the assumption that the treatment was effectively randomized by the study organizers.
2.2 T-test Based Analysis
We have already practiced using the t-test to detect differences between two groups; for example, whether household asset values differ between households with highly educated and poorly educated heads (Module 1.3).
Here, we extend the t-test to compare two groups that differ by their treatment assignment (T_i). As before, the null hypothesis is that the outcome of treated individuals is, on average, the same as it would have been had those individuals not been treated. The alternative hypothesis can be two-sided (the means differ) or one-sided (one mean is larger or smaller than the other).
Remember that, because of the “missing data” problem (we never observe the same individual both with and without the treatment), we must assume that individuals receiving the treatment are, on average, like those not receiving it. Randomization makes treatment assignment independent of the potential outcomes, which implies that there is no selection bias.
Exercise: Conduct a two-sided t-test to compare whether household income differs by treatment assignment. Are the results the same as those from the regression analysis?
Figure 2. t-test output to evaluate impact of PROGRESA on household income
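The equivalence between the two approaches can be checked directly. The Python sketch below, on made-up data, computes the equal-variance (pooled) two-sample t statistic, which is what STATA's ttest reports by default, and the t statistic on the treatment coefficient from OLS with conventional (non-robust) standard errors, as produced by a plain reg command. The function names are hypothetical.

```python
import math
from statistics import mean, variance


def pooled_t(y_treat, y_ctrl):
    """Two-sample t statistic assuming equal variances (pooled SD)."""
    n1, n0 = len(y_treat), len(y_ctrl)
    sp2 = ((n1 - 1) * variance(y_treat) + (n0 - 1) * variance(y_ctrl)) / (n1 + n0 - 2)
    return (mean(y_treat) - mean(y_ctrl)) / math.sqrt(sp2 * (1 / n1 + 1 / n0))


def ols_t_on_treatment(y, t):
    """t statistic for b1 in y = b0 + b1*t + e with the conventional OLS SE."""
    n = len(y)
    t_bar, y_bar = mean(t), mean(y)
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    b1 = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y)) / sxx
    b0 = y_bar - b1 * t_bar
    ssr = sum((yi - b0 - b1 * ti) ** 2 for yi, ti in zip(y, t))
    s2 = ssr / (n - 2)  # residual variance with 2 estimated coefficients
    return b1 / math.sqrt(s2 / sxx)


# Hypothetical incomes, not PROGRESA data
y = [120.0, 95.0, 130.0, 110.0, 100.0, 90.0, 105.0, 85.0]
t = [1, 1, 1, 1, 0, 0, 0, 0]
t_test = pooled_t([yi for yi, ti in zip(y, t) if ti == 1],
                  [yi for yi, ti in zip(y, t) if ti == 0])
t_reg = ols_t_on_treatment(y, t)
print(t_test, t_reg)  # the two statistics agree up to floating-point rounding
```

The match holds because, with a binary regressor, the OLS residual variance equals the pooled within-group variance. STATA's ttest with the unequal option uses a different (Welch) standard error, so it will generally not match the conventional OLS result.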
2.3 A Decision to Make: Regression Models or T-test?
The above exercise demonstrates that the estimated causal effect is the same whether you use regression analysis or a t-test. However, both analyses assumed that selection bias was zero. Was this the case? Whether a randomized or non-randomized design is used, the comparison groups may be imbalanced at baseline. What if the randomization was stratified by geography, for instance (randomization strategies are discussed in Section 3)? In that case, we would have to account for the stratification in our analysis. What if treatments were assigned at the village level, and we believe that individuals within a village influence each other and share common facilities, so that their behaviors are correlated? Then we have to estimate standard errors clustered at the village level (refer to Module 1.3 to learn more). We will also learn methods such as difference-in-differences, which in some cases are more robust than simple regression analysis.
In general, impact evaluations often face problems that have to be “controlled for” or adjusted for in the analysis. Regression methods give us the tools for such adjustment, which a simple t-test does not. T-tests (or chi-squared tests if the outcome is categorical) alone are valid only if the groups are properly randomized, the sample size is large enough to achieve baseline balance between the two groups, data collection is unbiased, and the standard assumptions of the t-test reasonably hold.
Figure 3. OLS regression output with clustered standard error and controlling for poverty
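To make the idea of clustered standard errors concrete, the Python sketch below computes a cluster-robust (sandwich) standard error for the treatment coefficient in a regression of the outcome on an intercept and a 0/1 treatment. It applies the finite-sample factor G/(G-1) x (N-1)/(N-K), which, to our understanding, is the scaling STATA uses for regress with vce(cluster); treat that exact scaling as an assumption. The village labels, incomes, and the function name cluster_se_treatment are hypothetical.

```python
import math
from collections import defaultdict


def cluster_se_treatment(y, t, cluster):
    """Cluster-robust SE of b1 in y = b0 + b1*t + e (t is 0/1).

    Sandwich estimator V = A^-1 M A^-1, where A = X'X, M = sum over clusters
    of u_g u_g', and u_g = sum of x_i * e_i within cluster g.
    """
    n, k = len(y), 2
    n1 = sum(t)
    n0 = n - n1
    # OLS with a binary regressor: b0 = control mean, b1 = difference in means
    b0 = sum(yi for yi, ti in zip(y, t) if ti == 0) / n0
    b1 = sum(yi for yi, ti in zip(y, t) if ti == 1) / n1 - b0
    resid = [yi - b0 - b1 * ti for yi, ti in zip(y, t)]

    # (X'X)^-1 for X = [1, t]: X'X = [[n, n1], [n1, n1]]
    det = n * n1 - n1 * n1
    a_inv = [[n1 / det, -n1 / det], [-n1 / det, n / det]]

    # Cluster-level scores u_g = (sum of e_i, sum of t_i * e_i)
    scores = defaultdict(lambda: [0.0, 0.0])
    for ti, ei, g in zip(t, resid, cluster):
        scores[g][0] += ei
        scores[g][1] += ti * ei
    meat = [[sum(u[r] * u[c] for u in scores.values()) for c in range(2)]
            for r in range(2)]

    # Element [1][1] of A^-1 M A^-1 is the variance of b1
    left = [sum(a_inv[1][j] * meat[j][c] for j in range(2)) for c in range(2)]
    var_b1 = sum(left[c] * a_inv[c][1] for c in range(2))

    g_count = len(scores)
    var_b1 *= g_count / (g_count - 1) * (n - 1) / (n - k)  # small-sample factor
    return math.sqrt(var_b1)


# Hypothetical data: villages A and B treated, C and D control
y = [120.0, 118.0, 100.0, 102.0, 100.0, 98.0, 80.0, 82.0]
t = [1, 1, 1, 1, 0, 0, 0, 0]
village = ["A", "A", "B", "B", "C", "C", "D", "D"]

se_village = cluster_se_treatment(y, t, village)
se_each_hh = cluster_se_treatment(y, t, list(range(len(y))))  # one obs per cluster
print(se_village, se_each_hh)
```

Because the residuals of households in the same village move together in this made-up data, the village-clustered standard error is noticeably larger than the one that treats each household as its own cluster (which reduces to the heteroskedasticity-robust formula). With only four clusters, as here, cluster-robust standard errors are themselves unreliable; real applications need many clusters.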
3. RANDOMIZATION IN PRACTICE
Typically, there are important experimental steps to take before you randomize units into treatment and control groups. These steps include: (1) estimating, based on several assumptions, the sample size you would need to detect the impact you are interested in (covered in later modules); and (2) selecting a population to include in your experiment. This can be a purposive selection made in discussion with the implementing agencies, a representative sample of the target population, or anything in between. Here, we discuss how to randomly assign units to the treatment and control groups. We will cover the quasi-experimental selection of control groups later.
3.1 Simple Randomization
Under simple randomization, each individual, household, or other “unit of analysis” has an equal chance of being assigned to the treatment or the control group; that is, a probability of 0.5 of being in each group. You can imagine flipping a coin and assigning the unit to the treatment group if it lands heads and to the control group if it lands tails. You can conduct such a randomized allocation in STATA with the runiform() function; for example (random_T is an illustrative variable name), gen random_T = 0+int((1-0+1)*runiform()) if year == 1997 assigns each 1997 observation a 0 or a 1 with equal probability.
You must perform the randomization before the program starts, or at baseline (whether or not you conduct a baseline survey). In this example, however, data for 1998 and 1999 are already available, so we restricted the randomization to the 1997 (baseline) observations only.
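The coin-flip logic can be sketched in Python as follows. coin_flip_assignment mirrors the STATA expression 0+int((1-0+1)*runiform()) used in this module, which reduces to int(2*runiform()): each unit is independently 0 or 1 with probability 0.5. exact_half_assignment is a common alternative, not taken from this module, that shuffles the units and treats exactly half of them. Fixing the seed, as you would with STATA's set seed, makes the assignment reproducible. The household IDs and function names are hypothetical.

```python
import random


def coin_flip_assignment(units, seed):
    """Each unit independently gets 1 (treatment) or 0 (control), p = 0.5."""
    rng = random.Random(seed)
    return {u: int(2 * rng.random()) for u in units}


def exact_half_assignment(units, seed):
    """Shuffle the units and assign the first floor(n/2) to treatment."""
    rng = random.Random(seed)
    order = list(units)
    rng.shuffle(order)
    cutoff = len(order) // 2
    return {u: int(i < cutoff) for i, u in enumerate(order)}


households = [f"hh{i:03d}" for i in range(101)]  # hypothetical IDs
coin = coin_flip_assignment(households, seed=1997)
exact = exact_half_assignment(households, seed=1997)
print(sum(coin.values()), "treated by coin flip (varies with the seed)")
print(sum(exact.values()), "treated by exact split")  # always 50 of 101
```

The coin flip gives each unit the same probability of treatment but lets the group sizes vary by chance; the shuffle-and-split version fixes the group sizes, which can matter in small samples.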
3.3 Clustered Randomization
Clustered randomization is a practical and popular strategy often employed in the development sector. Most development interventions or programs are targeted not at individuals but at some cluster: a village, an office department, or some other group of people. All individuals within the targeted cluster can be the intended beneficiaries of the intervention, there can be eligibility criteria within the cluster, or participants can be randomly selected within each selected cluster (the latter is termed “block randomization”). In STATA, you can do cluster randomization as follows.
Create an identifier that flags one record per cluster. For example, to randomize villages, we need to flag one unique observation for each village. We can do this with: egen uniqvill = tag(villid) if year == 1997
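Next, assign each flagged village to the treatment group with probability 0.5, so that roughly half of the villages are treated: gen random_cluster_T = 0+int((1-0+1)*runiform()) if year == 1997 & uniqvill == 1
This assigns a value only to the flagged record of each village. One way to copy the assignment to every household in the village is: bysort villid (random_cluster_T): replace random_cluster_T = random_cluster_T[1]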
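You can also combine cluster and stratified randomization, for example: bysort geopolid: gen random_strata_cluster_T = 0+int((1-0+1)*runiform()) if year == 1997 & uniqvill == 1
Note, however, that the by prefix does not change the random draws themselves: each village is still an independent coin flip, so this command does not guarantee an equal number of treated villages within each geopolid stratum.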
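The Python sketch below shows one way, under our own assumptions, to randomize at the village level within strata so that each stratum gets an equal split of treated villages (up to one village when a stratum has an odd count), and then to copy each village's assignment to its households. The two dictionaries stand in for villid and geopolid in the PROGRESA data; the village, household, and region names and the function names are hypothetical.

```python
import random
from collections import defaultdict


def stratified_cluster_assignment(village_stratum, seed):
    """Randomize villages, treating floor(n/2) of them within each stratum."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for village, stratum in sorted(village_stratum.items()):
        by_stratum[stratum].append(village)
    assignment = {}
    for stratum in sorted(by_stratum):
        villages = by_stratum[stratum]
        rng.shuffle(villages)
        cutoff = len(villages) // 2
        for i, village in enumerate(villages):
            assignment[village] = int(i < cutoff)
    return assignment


def household_assignment(household_village, village_assignment):
    """Every household inherits the assignment of its village."""
    return {hh: village_assignment[v] for hh, v in household_village.items()}


village_stratum = {"v1": "north", "v2": "north", "v3": "north", "v4": "north",
                   "v5": "south", "v6": "south"}
household_village = {"h1": "v1", "h2": "v1", "h3": "v2", "h4": "v3",
                     "h5": "v4", "h6": "v5", "h7": "v5", "h8": "v6"}
villages_T = stratified_cluster_assignment(village_stratum, seed=1997)
households_T = household_assignment(household_village, villages_T)
print(villages_T)
print(households_T)
```

Randomizing the list of villages, rather than the household records, is what makes this a cluster design: all households in a village necessarily share the same treatment status.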
3.4 Testing the Success of Randomization
We noted that randomization relies on a sufficiently large sample so that we can assume all measured and unmeasured confounders are distributed equally between the comparison groups. In practice, however, we have to assess whether randomization was “successful” in achieving balance. We test for the “success of randomization” by comparing the measured variables between the treatment and control groups at baseline; this balance table should be the first table in any evaluation report. Consider the following:
We have to use baseline data because some of the measured variables can be affected by the intervention and can change differentially in the treatment and control groups, so comparing them after the treatment begins might give a biased picture of balance.
Finding a statistically significant difference at baseline does not necessarily mean that the groups are imbalanced. For example, continuous measurements such as age (years) and income (US dollars) may differ statistically significantly between the two groups when the sample size of each is very large, because a large sample allows us to detect even very small differences in continuous variables. Therefore, we should assess whether the difference is large in economic, biological, or practical terms, rather than relying only on statistical significance.
The converse is also true: a statistically insignificant difference does not mean that the groups are well balanced if the difference is large in magnitude.
It is best to compare the treatment and control groups on all available baseline measurements so that you can be reasonably confident that the comparison groups are balanced, at least on observables. Best practice is to select these comparison variables before you randomize and then faithfully check the group mean differences at baseline after randomization. We can also add any imbalanced covariate to the regression afterward to check for potential omitted variable bias.
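One common way to judge whether a baseline difference is large in magnitude, rather than merely statistically significant, is the standardized mean difference: the difference in means divided by a pooled standard deviation. This metric, and the rule of thumb that absolute values above about 0.1 deserve attention, come from the wider balance-checking literature rather than from this module; the Python sketch below uses made-up ages, and the function name is hypothetical.

```python
import math
from statistics import mean, variance


def standardized_difference(x_treat, x_ctrl):
    """(mean_T - mean_C) / sqrt((var_T + var_C) / 2), with sample variances."""
    pooled_sd = math.sqrt((variance(x_treat) + variance(x_ctrl)) / 2)
    return (mean(x_treat) - mean(x_ctrl)) / pooled_sd


age_treat = [40, 42, 44]  # hypothetical household-head ages
age_ctrl = [41, 43, 45]
smd = standardized_difference(age_treat, age_ctrl)
print(smd)  # -0.5: a one-year gap relative to a standard deviation of 2 years
```

Unlike a t statistic, which grows with the sample size for a fixed difference, the standardized difference expresses the gap in standard-deviation units, which is closer to the question of whether a difference is large in economic or biological terms.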
For the dataset we have been using so far, let’s demonstrate how to check for balance.
Download and install the STATA command ttable2 if you haven’t already done so.
Run the following command to check the balance of a few selected variables at baseline. Note that D indicates whether the village was randomized to the treatment group: ttable2 IncomeLab_HH famsize eduhead sexhead agehead pov_HH if year == 1997, by(D)
Figure 4 shows how well the groups are balanced. We find that all the factors are well balanced in magnitude, although some of the differences are statistically significant.
Note that you can also use the regress or ttest command for each of these variables separately and get the same results. STATA offers several options for most kinds of analysis, and it is up to you which one to use.
In the case of stratified or block randomization, you should evaluate balance within each block or stratum. You can do so by using the by or bysort prefix with most STATA commands.
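The within-stratum comparison can be sketched in Python as below: the treated-minus-control mean of a baseline variable is computed separately in each stratum, the kind of comparison the by or bysort prefix produces in STATA. The field names stratum, T, and agehead, the data, and the function name are hypothetical, and each stratum is assumed to contain both treated and control units.

```python
from collections import defaultdict
from statistics import mean


def balance_by_stratum(rows, var):
    """Treated-minus-control mean of `var` within each stratum."""
    groups = defaultdict(lambda: {0: [], 1: []})
    for row in rows:
        groups[row["stratum"]][row["T"]].append(row[var])
    # statistics.mean raises StatisticsError if a stratum lacks one of the groups
    return {s: mean(g[1]) - mean(g[0]) for s, g in sorted(groups.items())}


baseline = [
    {"stratum": "north", "T": 1, "agehead": 40},
    {"stratum": "north", "T": 1, "agehead": 46},
    {"stratum": "north", "T": 0, "agehead": 44},
    {"stratum": "south", "T": 1, "agehead": 50},
    {"stratum": "south", "T": 0, "agehead": 47},
    {"stratum": "south", "T": 0, "agehead": 49},
]
diffs = balance_by_stratum(baseline, "agehead")
print(diffs)
```

Reporting the difference for each stratum separately can reveal an imbalance in one block that an overall comparison would average away.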
4. BIBLIOGRAPHY/FURTHER READINGS
1. Duflo, Esther, Rachel Glennerster, and Michael Kremer (2008). “Using Randomization in Development Economics Research: A Toolkit.” Handbook of Development Economics, Vol. 4. Elsevier Science.
2. Gerber, Alan S., and Donald P. Green (2012). Field Experiments: Design, Analysis, and Interpretation. W. W. Norton.
3. Gertler, Paul J., Sebastian Martinez, Patrick Premand, Laura B. Rawlings, and Christel M. J. Vermeersch (2011). Impact Evaluation in Practice. World Bank Publications.