Center for Effective Global Action

University of California, Berkeley

Module 2.2: Randomized Assignment

Contents

1. Introduction
2. Analysis for Evaluating Impacts
2.1 Regression Analysis to Quantify Impacts
2.2 T-test Based Analysis
2.3 A Decision to Make: Regression Models or T-test?
3. Randomization in Practice
3.1 Simple Randomization
3.2 Stratified or Block Randomization
3.3 Clustered Randomization
3.4 Testing the Success of Randomization
4. Bibliography/Further Readings
Learning Guide: Randomized Assignment

List of Figures
Figure 1. STATA output for OLS regression model to evaluate impacts
Figure 2. t-test output to evaluate impact of PROGRESA on household income
Figure 3. OLS regression output with clustered standard error and controlling for poverty
Figure 4. Testing for baseline balance


1. INTRODUCTION

In the previous module we covered causal inference and counterfactual analysis, two key concepts
for conducting a rigorous impact evaluation. We also described selection bias and omitted variable
bias, and how randomization mitigates or eliminates these problems. In the next modules we will
cover the various methods typically used when conducting an impact evaluation, including
experimental and quasi-experimental methodologies.

This module focuses on randomized assignment. We will spend time making sure you understand
how to analyze the impacts of a program: where to look and what to look for. We will explain the
methods one can use to estimate the impact, and when to use one rather than another. Finally, we
will walk through more advanced program designs in which we stratify on an existing variable
(e.g. gender, age, occupation) or cluster at a level higher than the individual (e.g. school, market,
village). These designs can give a more precise estimate of the impact, but they are also trickier
to set up and implement.

At the end of this module, you should be able to:

• Determine how successful the randomization was (or wasn’t)
• Conduct basic data analysis of an RCT
• Understand how and when to implement a stratified or clustered randomization

2. ANALYSIS FOR EVALUATING IMPACTS

2.1 Regression Analysis to Quantify Impacts

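We can write the observed outcome Y for an individual i as

$$Y_i = T_i \cdot Y_i^{trt} + (1 - T_i) \cdot Y_i^{ctr}$$

where $T_i = 1$ if individual i is assigned to the treatment group and $T_i = 0$ if the individual is
assigned to the control group, and $Y_i$ is that individual’s observed outcome. The superscripts trt
and ctr denote the individual’s outcome under treatment and under control; in “real life” analysis
an individual is in one group or the other, so only one of the two is observed. Rearranging the terms,

$$Y_i = Y_i^{ctr} + (Y_i^{trt} - Y_i^{ctr}) \cdot T_i$$

and

$$Y_i = E[Y_i^{ctr}] + (Y_i^{trt} - Y_i^{ctr}) \cdot T_i + (Y_i^{ctr} - E[Y_i^{ctr}])$$

which in the usual linear regression notation, assuming linearity, can be represented as

$$Y_i = \beta_0 + \beta_1 \cdot T_i + \varepsilon_i$$

with $\beta_0 = E[Y_i^{ctr}]$, $\beta_1 = Y_i^{trt} - Y_i^{ctr}$ (taken to be the same for every individual), and
$\varepsilon_i = Y_i^{ctr} - E[Y_i^{ctr}]$.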


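Based on the above regression model, we can write the expected outcome with and without the
treatment T and then derive the causal effect as follows:

$$T_i = 1: \quad E[Y_i \mid T_i = 1] = \beta_0 + \beta_1 + E[\varepsilon_i \mid T_i = 1]$$

$$T_i = 0: \quad E[Y_i \mid T_i = 0] = \beta_0 + E[\varepsilon_i \mid T_i = 0]$$

$$E[Y_i \mid T_i = 1] - E[Y_i \mid T_i = 0] = \beta_1 + \left(E[\varepsilon_i \mid T_i = 1] - E[\varepsilon_i \mid T_i = 0]\right) = \beta_1$$

The term in parentheses is the selection bias. It is zero when treatment is randomly assigned,
which gives the last equality. Therefore, the coefficient $\beta_1$ quantifies the impact as the
difference in mean outcomes between the treatment and control groups. Remember, estimates of
$\beta_1$ obtained in this way are unbiased only if selection bias is 0.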
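As a rough illustration of the result above, the following sketch simulates a randomized experiment and checks that the OLS slope on a 0/1 treatment indicator is the same number as the difference in group means. It is written in Python rather than STATA purely for illustration; the sample size, outcome distribution, and true effect of 5 are made-up assumptions and have nothing to do with the PROGRESA data.

```python
import random

random.seed(2024)  # fixed seed so the simulated draw is reproducible

n = 2000
true_effect = 5.0  # hypothetical constant treatment effect

# Potential outcomes for every individual: without and with the program.
y_ctr = [random.gauss(50, 10) for _ in range(n)]
y_trt = [yc + true_effect for yc in y_ctr]

# Coin-flip assignment, independent of the potential outcomes.
t = [random.randint(0, 1) for _ in range(n)]

# Observed outcome: Y_i = T_i * Y_i^trt + (1 - T_i) * Y_i^ctr
y = [ti * yt + (1 - ti) * yc for ti, yt, yc in zip(t, y_trt, y_ctr)]


def mean(values):
    return sum(values) / len(values)


def ols_simple(x, outcome):
    """Intercept and slope of a one-regressor OLS fit."""
    x_bar, o_bar = mean(x), mean(outcome)
    sxy = sum((xi - x_bar) * (oi - o_bar) for xi, oi in zip(x, outcome))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return o_bar - slope * x_bar, slope


b0, b1 = ols_simple(t, y)
mean_trt = mean([yi for yi, ti in zip(y, t) if ti == 1])
mean_ctr = mean([yi for yi, ti in zip(y, t) if ti == 0])
diff_in_means = mean_trt - mean_ctr

print(f"OLS intercept b0    = {b0:.4f}  (control mean = {mean_ctr:.4f})")
print(f"OLS slope b1        = {b1:.4f}")
print(f"Difference in means = {diff_in_means:.4f}")
```

The slope and the difference in means agree up to floating-point error, and both land near the assumed effect because assignment is independent of the potential outcomes.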

Exercise: Open PanelPROGRESA_97_99year.dta. This dataset is a repeated cross-section
of different waves of the ENCEL survey for March and October 1998 and November 1999. It also
includes the baseline data collected in 1997. We have used all or part of this dataset in previous
modules as well. Also refer to the Module2.2 Learning Guide.do file.

Let’s assume that we expect PROGRESA (D_HH) to change the income of the household
(IncomeLab_HH) in 1999. Specify a regression model in STATA as discussed above and restrict the
data to the year 1999 (variable year). What is the impact of the intervention on income
levels? Is it statistically significant? How do you interpret the coefficient?

Answer Key: Specify the regression model as reg IncomeLab_HH D_HH if year == 1999.
The STATA output is given in Figure 1. The estimated effect of PROGRESA participation by the
household (D_HH) on household income in 1999 is 94.89, which is not statistically significant at
α = 0.1 (p = 0.691). Remember, these causal inferences are based on the assumption that the
treatment was effectively randomized by the study organizers.

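. reg IncomeLab_HH D_HH if year == 1999

      Source |       SS           df       MS      Number of obs   =    18,370
-------------+----------------------------------   F(1, 18368)     =      0.16
       Model |  38331591.5         1  38331591.5   Prob > F        =    0.6908
    Residual |  4.4509e+12    18,368   242320693   R-squared       =    0.0000
-------------+----------------------------------   Adj R-squared   =   -0.0000
       Total |  4.4510e+12    18,369   242309588   Root MSE        =     15567

------------------------------------------------------------------------------
IncomeLab_HH |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        D_HH |   94.88663   238.5733     0.40   0.691    -372.7393    562.5125
       _cons |   1898.628   144.1233    13.17   0.000     1616.133    2181.123
------------------------------------------------------------------------------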

Figure 1. STATA output for OLS regression model to evaluate impacts


2.2 T-test Based Analysis

We have already practiced using the t-test to detect a difference between two groups; for example,
whether the value of household assets differs between households with highly-educated and
poorly-educated heads (Module 1.3).

Here, we extend the t-test to compare two groups that differ by their treatment assignment
($T_i$). As before, the null hypothesis is that the outcome of treated individuals is the same as it
would have been had those individuals not been treated. The alternative hypothesis can be specified
as a two-sided or a one-sided (one is larger/smaller than the other) comparison.

Remember, because of the “missing data” problem (we never observe the same individual both with
and without the treatment), we are assuming that the individuals receiving the treatment are, on
average, just like those not receiving it. In other words, randomization and independence imply
that there is no selection bias.

Exercise: Conduct a two-sided t-test to compare whether household income differs by treatment
assignment. Are the results the same as those from the regression analysis?

Answer Key: We can conduct a t-test in STATA as ttest IncomeLab_HH if year == 1999, by(D_HH).
We find that the magnitude (group mean difference) and significance of the causal effect are
precisely the same as in Figure 1 above. The sign is reversed only because ttest reports
mean(0) - mean(1), whereas the regression coefficient is mean(1) - mean(0). Indeed, OLS just
performed a t-test for us.
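. ttest IncomeLab_HH if year == 1999, by(D_HH)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       0 |  11,666    1898.628    83.78446    9049.497    1734.396    2062.859
       1 |   6,704    1993.514    278.9093    22836.52    1446.764    2540.265
---------+--------------------------------------------------------------------
combined |  18,370    1933.256    114.8499     15566.3    1708.139    2158.373
---------+--------------------------------------------------------------------
    diff |           -94.88663    238.5733               -562.5125    372.7393
------------------------------------------------------------------------------
    diff = mean(0) - mean(1)                                      t =  -0.3977
Ho: diff = 0                                     degrees of freedom =    18368

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.3454         Pr(|T| > |t|) = 0.6908          Pr(T > t) = 0.6546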

Figure 2. t-test output to evaluate impact of PROGRESA on household income
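To see where the numbers in Figure 2 come from, the sketch below recomputes the pooled standard error and t statistic from the group sizes, means, and standard deviations printed in that output. It is a Python illustration of the textbook equal-variance formula, not STATA's own code; the function name is made up for this example, and small discrepancies from rounding in the printed inputs are to be expected.

```python
import math


def pooled_t_from_summary(n0, mean0, sd0, n1, mean1, sd1):
    """Equal-variance two-sample t-test computed from group summaries.

    Uses the same sign convention as the output above: diff = mean(0) - mean(1).
    """
    df = n0 + n1 - 2
    pooled_var = ((n0 - 1) * sd0 ** 2 + (n1 - 1) * sd1 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n0 + 1 / n1))
    diff = mean0 - mean1
    return diff, se, diff / se, df


# Inputs copied from the Group rows of Figure 2.
diff, se, t_stat, df = pooled_t_from_summary(
    n0=11666, mean0=1898.628, sd0=9049.497,
    n1=6704, mean1=1993.514, sd1=22836.52,
)

print(f"diff = {diff:.3f}, se = {se:.3f}, t = {t_stat:.4f}, df = {df}")
```

The standard error matches the one reported for D_HH in Figure 1, which is why the regression and the t-test give the same p-value. Note that the two group standard deviations in Figure 2 are quite different, so the equal-variance assumption may deserve a closer look.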

2.3 A Decision to Make: Regression Models or T-test?

The above exercise demonstrates that the estimated causal effect is the same whether you use
regression analysis or a t-test. However, both of these analyses assumed that selection bias was
zero. Was this the case? Whether a randomized or non-randomized design is used, it is possible that
the comparison groups were imbalanced at baseline. What if the randomization was stratified by
geography, for instance (we discuss randomization strategies in Section 3)? In that case, we would
have to account for the stratification in our analysis. What if treatments were conducted at the
village level, but we believe that individuals within a village are influenced by each other and
share common facilities, so that there is some “correlation” among their behaviors? Then we have
to estimate standard errors clustered at the village level (refer to Module 1.3 to learn more).
Also, we will later learn other methods, such as difference-in-differences, which can be more
robust than a simple regression analysis in some cases.

In general, impact evaluations often face problems that have to be “controlled for” or adjusted for in
the analysis.

Regression methods give us the tools to make such adjustments, which a simple t-test cannot. T-tests
(or chi-squared tests if the outcome is categorical) alone are valid only if the groups are properly
randomized, the sample size is large enough to achieve baseline balance between the two groups,
data collection is unbiased, and the standard assumptions of the t-test reasonably hold.

Exercise: As a demonstration, let’s assume that the poverty status of a household (pov_HH) is a
confounder and that we should control for it in the regression analysis. We also want to cluster the
standard errors at the village level. We can most easily accomplish this using regression models in
STATA as: reg IncomeLab_HH D_HH pov_HH if year == 1999, cluster(villid).
Notice that the effect size (magnitude of the coefficient) and the standard error both change
markedly. The coefficient on D_HH remains statistically insignificant (p = 0.481), so this analysis
again finds no evidence that the PROGRESA program increased local income levels. However, this
analysis is still very basic; we will see in a later module that PROGRESA actually did have
significant impacts, though we have not yet been able to isolate them empirically.

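. regress IncomeLab_HH D_HH pov_HH if year == 1999, cluster(villid)

Linear regression                               Number of obs     =     17,942
                                                F(2, 498)         =       5.34
                                                Prob > F          =     0.0051
                                                R-squared         =     0.0001
                                                Root MSE          =      15749

                               (Std. Err. adjusted for 499 clusters in villid)
------------------------------------------------------------------------------
             |               Robust
IncomeLab_HH |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        D_HH |    229.035   324.9309     0.70   0.481    -409.3695    867.4394
      pov_HH |  -439.9662   138.2007    -3.18   0.002    -711.4945   -168.4379
       _cons |   2204.446   75.01622    29.39   0.000     2057.058    2351.833
------------------------------------------------------------------------------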

Figure 3. OLS regression output with clustered standard error and controlling for poverty
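To give a feel for why clustering changes the standard error, here is a small Python simulation, not the guide's STATA workflow. It assumes hypothetical data (40 villages of 25 households, a shared village-level shock, treatment assigned by village, and a true effect of zero) and compares the conventional OLS standard error with an uncorrected cluster-robust (sandwich) standard error for the treatment coefficient. Statistical packages usually add a small-sample adjustment that is omitted here, and the sketch has no control variable such as pov_HH.

```python
import math
import random
from collections import defaultdict

random.seed(7)  # fixed seed so the simulated data are reproducible

n_villages, hh_per_village = 40, 25
treated_villages = set(random.sample(range(n_villages), n_villages // 2))

y, t, village = [], [], []
for v in range(n_villages):
    shock = random.gauss(0, 1)  # shock shared by everyone in village v
    for _ in range(hh_per_village):
        y.append(10 + shock + random.gauss(0, 1))  # true treatment effect is zero
        t.append(1 if v in treated_villages else 0)
        village.append(v)

# OLS of y on a constant and the treatment dummy.
n = len(y)
t_bar, y_bar = sum(t) / n, sum(y) / n
t_dev = [ti - t_bar for ti in t]
sxx = sum(d * d for d in t_dev)
b1 = sum(d * (yi - y_bar) for d, yi in zip(t_dev, y)) / sxx
b0 = y_bar - b1 * t_bar
resid = [yi - b0 - b1 * ti for yi, ti in zip(y, t)]

# Conventional standard error: treats all households as independent.
se_naive = math.sqrt(sum(e * e for e in resid) / (n - 2) / sxx)

# Cluster-robust standard error: add up (t - t_bar) * residual within each
# village first, then square, so within-village correlation is not ignored.
score = defaultdict(float)
for d, e, v in zip(t_dev, resid, village):
    score[v] += d * e
se_cluster = math.sqrt(sum(s * s for s in score.values())) / sxx

print(f"b1 = {b1:.3f}, naive SE = {se_naive:.3f}, clustered SE = {se_cluster:.3f}")
```

In this setup the clustered standard error comes out larger than the conventional one, because households in the same village share a shock and so carry less independent information than the raw sample size suggests.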

3. RANDOMIZATION IN PRACTICE

Typically, there are important experimental steps to take before you randomize the treatment.
These steps include: (1) based on several assumptions, you estimate the sample size you
would need to detect the impact you are interested in (covered in later modules); and (2) you select
a population which you can include in your experiment. This can be a purposive selection made in
discussion with the agencies implementing the intervention, a representative sample of the target
population, or anything in between. Here, we discuss how to randomly assign people to the
treatment and control groups. We will cover the quasi-experimental selection of control groups
later.

3.1 Simple Randomization

Under a simple randomization framework each individual, household, or other “unit of analysis”
has an equal chance of being assigned to the treatment or the control group; that is, a 0.5
probability of being part of each group. You can imagine flipping a coin and assigning the
individual to the treatment group if the coin lands heads and to the control group if it lands tails.
You can conduct such a randomized allocation in STATA as demonstrated below.

• Open PanelPROGRESA_97_99year.dta if you have not already done so.

• Generate a random variable which takes a value of 0 or 1 as:

gen random_T = 0+int((1-0+1)*runiform()) if year == 1997

The general form a+int((b-a+1)*runiform()) draws a random integer between a and b; here a = 0
and b = 1, so each value is drawn with probability 0.5.

• You must perform randomization before the program starts, or at baseline (whether or not you
conduct a baseline survey). In this example we already have data for 1998 and 1999, so we
restrict the randomization to the 1997 observations only.
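The coin-flip logic of the gen command can be mirrored in a few lines of Python, shown here only as an illustration. The function name, the household identifiers, and the seed value are all made up for the example; int(2 * rng.random()) plays the role of int(2*runiform()).

```python
import random


def simple_randomize(unit_ids, seed=12345):
    """Give every unit an independent 0/1 draw with probability 0.5 each."""
    rng = random.Random(seed)  # private generator, seeded for reproducibility
    # rng.random() lies in [0, 1), so int(2 * draw) is either 0 or 1.
    return {uid: int(2 * rng.random()) for uid in unit_ids}


households = [f"hh{k:03d}" for k in range(1, 11)]  # hypothetical baseline units
assignment = simple_randomize(households)

for uid in households:
    print(uid, "treatment" if assignment[uid] == 1 else "control")
```

Because every unit gets its own coin flip, the two groups are equal in size only on average; in any single draw they will usually differ somewhat.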

3.2 Stratified or Block Randomization

In some experiments it is practical or theoretically proper to randomize within some unit of grouping
(called blocks or strata). For example, you may want to control for regional administrative,
ecological, or political factors and maximize “exchangeability” by randomizing at an administrative
district or block level. It is possible that you have selected your study participants in a 2-stage
sample and now you want to randomize the treatment in the same manner. Or you may be concerned
about attrition (individuals leaving your sample before you conduct follow-up analysis), so that at
the end of the study you would be faced with “un-exchangeable” groups. In this case, you can break
the study sample into random groups of (for example) 2, 4, or 6 villages each. Within each
group, you can randomize the treatment: half to treatment and half to control. This is done so
that even if you lose controls in a “group” you can discard that group of 2, 4, or 6 villages and still
be assured of exchangeability in the remaining groups. In short, there are strong reasons for
stratifying the randomization. It can be conducted in STATA as follows:

• Suppose we want to stratify by federal administrative regions (geopolid).

• We repeat what we did in 3.1, but stratified, as:

bysort geopolid : gen random_strata_T = 0+int((1-0+1)*runiform()) if year == 1997

• Note: any time you use a random number generator (e.g. the runiform() function above), it is
best to set the seed to a fixed value first (with the set seed command) so that you can always
reproduce the results. runiform() draws pseudo-random numbers from a uniform distribution
between 0 and 1.
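The command above gives each observation its own independent 0/1 draw, so the split inside a stratum is half-and-half only on average. If an exact split within every stratum is wanted, one common approach is to shuffle the units of each stratum and treat the first half. The sketch below shows that idea in Python, as an illustration only; the function name, region labels, and household identifiers are hypothetical.

```python
import random
from collections import defaultdict


def stratified_randomize(unit_stratum, seed=12345):
    """Assign exactly half of each stratum (rounded down) to treatment.

    unit_stratum maps a unit id to the label of the stratum it belongs to.
    """
    rng = random.Random(seed)
    members_by_stratum = defaultdict(list)
    for uid, stratum in unit_stratum.items():
        members_by_stratum[stratum].append(uid)

    assignment = {}
    for stratum in sorted(members_by_stratum):       # fixed order for reproducibility
        members = sorted(members_by_stratum[stratum])
        rng.shuffle(members)                         # random order within the stratum
        n_treated = len(members) // 2
        for position, uid in enumerate(members):
            assignment[uid] = 1 if position < n_treated else 0
    return assignment


# Hypothetical sample: three regions with 6, 8, and 10 households.
unit_stratum = {}
for region, size in [("north", 6), ("centre", 8), ("south", 10)]:
    for k in range(size):
        unit_stratum[f"{region}_hh{k}"] = region

assignment = stratified_randomize(unit_stratum)
treated_per_region = defaultdict(int)
for uid, arm in assignment.items():
    treated_per_region[unit_stratum[uid]] += arm
print(dict(treated_per_region))
```

With an odd number of units in a stratum, this version sends the extra unit to the control group; that is a design choice of the sketch, not something the guide prescribes.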


3.3 Clustered Randomization

This is a practical and popular strategy often employed in the development sector. Most
development interventions or programs are not targeted at individuals but at some cluster: a village,
an office department, or some other group of people. All individuals within the targeted cluster can
be the intended customers/beneficiaries/targets of the intervention, or there can be selection or
eligibility criteria within the cluster, or there may be random selection of participants
within the selected cluster for the program (the latter is termed “block randomization”). In STATA
you can do cluster randomization as follows.

• Create an identifier that flags one record per cluster. For example, if we want to randomize
villages, we should identify one unique observation for each village. We can do this as:

egen uniqvill = tag(villid) if year == 1997

• Now, randomly assign villages to the treatment group, each with probability 0.5, as follows:

gen random_cluster_T = 0+int((1-0+1)*runiform()) if year == 1997 & uniqvill == 1

• Note that you can always combine cluster and stratified randomization as:

bysort geopolid : gen random_strata_cluster_T = 0+int((1-0+1)*runiform()) if year == 1997 & uniqvill == 1
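The commands above create an assignment only on the one tagged record per village, so the value still has to be carried over to every household in that village. The Python sketch below illustrates the whole idea in one step: randomize at the village level, then give each household its village's assignment. It is an illustration under assumptions, with hypothetical village and household identifiers, and it treats exactly half of the villages rather than flipping a coin for each one.

```python
import random


def cluster_randomize(household_village, seed=12345):
    """Randomize villages, then give every household its village's assignment.

    household_village maps a household id to the id of its village.
    """
    rng = random.Random(seed)
    villages = sorted(set(household_village.values()))
    rng.shuffle(villages)
    treated_villages = set(villages[: len(villages) // 2])  # half of the clusters
    return {
        hh: 1 if vill in treated_villages else 0
        for hh, vill in household_village.items()
    }


# Hypothetical sample: 6 villages with 4 households each.
household_village = {
    f"v{v}_hh{h}": f"v{v}" for v in range(1, 7) for h in range(1, 5)
}
assignment = cluster_randomize(household_village)

# Collect the assignment values seen in each village.
arms_by_village = {}
for hh, vill in household_village.items():
    arms_by_village.setdefault(vill, set()).add(assignment[hh])
print({vill: sorted(arms) for vill, arms in sorted(arms_by_village.items())})
```

Each village shows a single assignment value, which is the defining feature of cluster randomization: nobody inside a cluster is in a different arm from their neighbours.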

3.4 Testing the Success of Randomization

We discussed how randomization relies on a large enough sample that we can assume all measured
and unmeasured confounders are distributed equally across the comparison groups. In reality,
however, we have to assess whether randomization was “successful” in achieving balance. The way
to test for “success of randomization” is to evaluate the balance in measured variables between
the treatment and control groups at baseline, which should be the first table in any kind of
report. Consider the following:

• We have to use baseline data because some of the measured variables can be affected by the
intervention, and they can change differentially in the treatment and control groups, so that
comparing them after the treatment has begun might be biased.

• Just because we find a statistically significant difference at baseline does not mean that the
groups are imbalanced. For example, continuous measurements such as age (years) and income
(US dollars) may well be statistically different between the two groups when the sample size
of each is very large, because a large sample allows us to detect even very small differences
in continuous variables. Therefore, we should assess whether the difference is economically,
biologically, and logically large, rather than relying only on statistical significance.

• The converse of the above is also true. Just because a difference is statistically insignificant
does not mean that the groups are well balanced, if the difference is large in magnitude.

• It is best to compare the treatment and control groups on all available measurements for
balance at baseline, so that you can be reasonably confident that the comparison groups
are balanced, at least on observables. Best practice is to select these comparison variables
“before” you randomize and to faithfully check the group mean differences at baseline after
the randomization. In addition, we can always add these covariates to our regression
afterwards to check for any potential omitted variable bias.

For the dataset example we have been following so far, let’s demonstrate how to check for
balance.

• Download and install the STATA command ttable2 if you haven’t already done so.
• Run the following command to check the balance for a few selected variables at baseline.
Note that D indicates whether the village was randomized to the treatment group or not.

ttable2 IncomeLab_HH famsize eduhead sexhead agehead pov_HH if year == 1997, by(D)

• Figure 4 is the output that shows how well the groups are balanced. We find that all the
factors are balanced very well, although some of the differences are statistically significant.
• Note that you can also use the regress or ttest commands for each of these variables
separately and get the same results. STATA offers several options for most kinds of
analysis, and it is up to you which one to use.
• In the case of stratified or block randomization, you should evaluate the balance within each
block or stratum. You can do so by using the by or bysort prefix with most STATA
commands.

Figure 4. Testing for baseline balance
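One way to judge the size of a baseline difference without leaning on statistical significance is a standardized mean difference, which expresses the gap between group means in standard-deviation units and does not grow with the sample size. This measure is not part of the guide or of the ttable2 output; it is offered here as an assumption-laden Python sketch, with made-up ages for the two groups.

```python
import math
import statistics


def standardized_difference(treated, control):
    """Difference in means divided by the average of the two group SDs
    (computed as the square root of the mean of the two sample variances)."""
    mean_gap = statistics.mean(treated) - statistics.mean(control)
    avg_var = (statistics.variance(treated) + statistics.variance(control)) / 2
    return mean_gap / math.sqrt(avg_var)


# Hypothetical baseline ages of household heads in each group.
age_treated = [41, 38, 52, 47, 35, 44, 50, 39]
age_control = [43, 36, 49, 48, 37, 45, 51, 40]

smd = standardized_difference(age_treated, age_control)
print(f"Standardized difference in age: {smd:.3f}")
```

What counts as a large value is a judgment call that depends on the variable and the context, in line with the advice above to weigh economic and logical importance alongside any test result.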

4. BIBLIOGRAPHY/FURTHER READINGS

1. Duflo, Esther, Rachel Glennerster, and Michael Kremer (2008). “Using Randomization in
Development Economics Research: A Toolkit,” Handbook of Development Economics, Vol. 4,
Elsevier Science.
2. Gerber, Alan S., and Donald P. Green (2012). “Field Experiments: Design, Analysis, and
Interpretation.” WW Norton.
3. Gertler, Paul J., Sebastian Martinez, Patrick Premand, Laura B. Rawlings, and Christel M. J.
Vermeersch (2011). “Impact Evaluation in Practice.” World Bank Publications.
