Wilson - Dropout - Protocol - CSR
BACKGROUND
With the expansion of regional and national economies into a global marketplace, education has
even greater importance as a primary factor in allowing young adults to enter the workforce and
advance economically, as well as to share in the social, health, and other benefits associated with
education and productive careers. Dropping out of school before completing the normal course
of secondary education greatly undermines these opportunities and is associated with adverse
personal and social consequences. Dropout rates in the United States vary by calculation
method, state, ethnic background, and socioeconomic status (Cataldi, Laird, & KewelRamani,
Across all states, the percentage of freshmen who did not graduate from high school in
four years ranges from 13.1% to 44.2% and averages 26.8%. The status dropout rate, which
estimates the percentage of individuals in a certain age range who are not in high school and
have not earned a diploma or credential, is substantially lower. In October 2007, the proportion
of noninstitutionalized 18- to 24-year-olds who were not enrolled in school and had not earned a
diploma or certificate was 8.7%.
Males are more likely to be dropouts than females (9.8% vs. 7.7%). Status dropout rates are
much higher for racial/ethnic minorities (21.4% for Hispanics and 8.4% for Blacks vs. 5.3% for
Whites). Event dropout rates illustrate single year dropout rates for high school students and
show that students from low-income households drop out of high school more frequently than
those from more advantaged backgrounds (8.8% for low-income vs. 3.5% for middle income and
0.9% for high income students). The National Dropout Prevention Center/Network (2009)
reports that school dropouts in the United States earn an average of $9,245 a year less than
those who complete high school, have unemployment rates almost 13 percentage points higher
than high school graduates, are disproportionately represented in prison populations, are more
likely to become teen parents, and more frequently live in poverty. The consequences of school
dropout are even worse for minority youth, further exacerbating the economic and structural
disadvantage they often experience.
School dropout has implications not only for the lives and opportunities of those who experience
it, but also for society at large, which bears enormous economic and social costs.
A relatively large number of intervention and prevention programs in the research literature
give some attention to reducing dropout rates as a possible outcome. The National Dropout
Prevention Center/Network, for instance, lists 192 “model programs.” Relatively few of those
programs, however, bill themselves as dropout programs; many focus on academic
performance, risk factors for dropout such as absences or truancy, or indirect outcomes like
student engagement, but may also include dropout reduction as a program objective. The
corresponding research domain includes evaluations of virtually any program provided to
students for which dropout rates are measured as an outcome variable, regardless of whether
they are billed as dropout programs. To represent the full scope of relevant research on this
topic, all such programs should be considered in a review of dropout programs. However,
because we are interested in summarizing the research on dropout programs that could be
implemented by schools, we narrow our focus to programs that can be implemented in school
settings or under school auspices.
There have been a handful of systematic reviews on the effects of prevention and intervention
programs on school dropout and completion outcomes. However, the restrictive inclusion
criteria and methodological weaknesses of these reviews preclude any confident conclusions
about the effectiveness of the broad range of programs with dropout outcomes, or the potential
variation of effectiveness for different program types or subject populations. For instance, the
U.S. Department of Education’s What Works Clearinghouse report on dropout prevention found
only 15 qualifying studies that reported outcomes on direct measures of staying in school or
completing school (http://ies.ed.gov/ncee/wwc/reports/dropout/topic/#top). This report,
however, restricted discussion to interventions in the United States and did not include a meta-
analysis of program effectiveness or examine potential moderators of program effectiveness.
Another review on best practices in dropout prevention summarized the results of 58 studies of
dropout programs (ICF International, 2008). That report presented effect sizes primarily for
individual program types and did not examine potential moderators or examine the influence of
study method on effect size. The report also presented a narrative review of important variables
associated with implementation quality, but implementation quality was not analyzed in a meta-
analysis framework.
Two other systematic reviews have focused on the effectiveness of prevention and intervention
programs to reduce school dropout or increase school completion (Klima, Miller, & Nunlist,
2009; Lehr et al., 2003). In their review, Lehr et al. (2003) identified 17 experimental or quasi-
experimental studies with enrollment status outcomes. This review was completed seven years
ago, and thus does not include the most recent studies. The authors did not perform a meta-
analysis because they felt that the dependent variables differed too greatly across studies to
create meaningful aggregates. This circumstance prevented the authors from examining the
differential effectiveness of programs with different treatment or participant characteristics,
something we plan to do in the proposed systematic review. In a more recent review, Klima et al.
(2009) identified 22 experimental or quasi-experimental studies with dropout, achievement,
and truancy outcomes. However, this review excluded programs for general “at-risk”
populations of students (e.g., minority or low socioeconomic status samples), as well as
programs with general character-building, social-emotional learning, or delinquency/behavioral
prevention goals.
The findings of the Klima et al. (2009) and Lehr et al. (2003) reviews have some similarities.
Both teams highlight the dearth of high-quality research on dropout programs, and mention
especially the lack of key outcomes such as enrollment (or presence) at school and dropout. Both
reviews demonstrate that some of the included programs had positive effects on the students
involved. Lehr and her colleagues do not identify specific programs that were particularly
effective or ineffective, but focus rather on implementation integrity as a key variable and
emphasize the importance of strong methodologies for future research on dropout programs.
Klima and colleagues conclude that the programs they reviewed had overall positive effects on
dropout, achievement, and attendance/enrollment. They highlight alternative educational
programs, such as schools-within-schools, as particularly effective. By contrast, the Klima
review suggests that alternative school programs, that is, programs housed in separate school
facilities, were ineffective. Overall, these two reviews identify several important potential moderators that will
be included in the coding scheme for the proposed review. These include implementation
quality, treatment modality, and whether programs are housed in typical school facilities or in
alternative school locations.
OBJECTIVES
The objective of the proposed systematic review is to summarize the available evidence on the
effects of prevention and intervention programs aimed at primary and secondary students for
increasing school completion or reducing school dropout. Program effects on the closely related
outcomes of school attendance (absences, truancy) will also be examined. Moreover, when they
accompany dropout or attendance outcomes, effects on student engagement, academic
performance, and school conduct will also be considered.
The primary focus of the analysis will be the comparative effectiveness of different programs and
program approaches in an effort to identify those that have the largest and most reliable effects
on the respective school participation outcomes, especially with regard to differences associated
with treatment modality, implementation quality, and program location or setting. In addition,
evidence of differential effects for students with different characteristics will be explored, e.g., in
relation to age or grade, gender, race/ethnicity, and risk factors. Because of large ethnic and
socioeconomic differences in graduation rates, it will be particularly important to identify
programs that may be more or less effective for disadvantaged students.
The ultimate objective of this systematic review is to provide school administrators and
policymakers with an integrative summary of research evidence that is useful for guiding
programmatic efforts to reduce school dropout and increase school completion.
ELIGIBILITY CRITERIA
Studies must meet the following eligibility criteria to be included in the systematic review.
Types of interventions
There must be a school-based or affiliated psychological, educational, or behavioral prevention
or intervention program, broadly defined, that involves actions performed with the expectation
that they will have beneficial effects on student recipients. School-based programs are those that
are administered under the auspices of school authorities and delivered during school hours.
School affiliated programs are those that are delivered with the collaboration of school
authorities, possibly by other agents, e.g., community service providers, and which may take
place before or after school hours and/or off the school grounds. Community-based programs
that are explicitly presented as dropout prevention or intervention programs will be included
whether or not a school affiliation is evident. Other community-based programs that may
include dropout among their goals or intended outcomes, but for which dropout or related
variables are not a main focus, and which have no evident school affiliation, will be excluded.
We expect that programs that might be excluded for being community-based with no school
affiliation or dropout focus, but that happen to assess school dropout outcomes, would mainly
be delinquency or drug prevention or treatment programs. The rationale for this exclusion is
that we believe these kinds of programs are likely to be outside the realm of strategies that
school administrators might consider when selecting programs for dropout prevention or
treatment.
Types of participants
The research must investigate outcomes for an intervention directed toward school-aged youth,
defined as those expected to attend pre-k to 12th grade primary and secondary schools, or the
equivalent in countries with a different grade structure, corresponding to approximately ages 4-
18. The age or school participation of the sample must be presented in sufficient detail to allow
reasonable inference that it meets this requirement. Recent dropouts who are between the ages
of 18-21 will also be included if the program under study is explicitly oriented toward secondary
school completion or the equivalent.
General population samples of school-age children will be included. Samples from populations
broadly at risk because of economic disadvantage, individual risk variables, and closely related
factors will also be included (e.g., inner city schools, students from low SES families, teen
parents, students with poor attendance records, students who have low test scores or who are
over-age for their grade).
Types of outcomes
To be included, a study must assess intervention effects on at least one eligible outcome
variable. Qualifying outcome variables are those that fall in or are substantially similar to the
following categories: (a) School completion/dropout; (b) GED completion/high school
graduation; (c) Absences or truancy. If a measure of absences, truancy, or attendance is the only
outcome provided, the majority of the students in the sample must be age 12 or older. The
rationale for this exclusion is practical; there is a large literature on programs designed to
influence attendance for elementary school age children that is beyond the scope of this review.
Moreover, there is already an active Campbell Collaboration protocol on this topic (Maynard,
Tyson-McCrea, Pigott, & Kelly, 2009).
1 Note that there is no threshold for initial equivalence. To be eligible under the third criterion, studies must simply
present statistically controlled data or information from which group equivalence effect sizes can be calculated for key
risk factors or student characteristics such as age, gender, race/ethnicity, school attendance, school performance, etc.
Should sufficient data be available, pretest effect sizes or pre-treatment effect sizes on risk factors or demographics
may be used as covariates to adjust for initial treatment-control differences for all research designs.
Resources searched
A comprehensive and diverse strategy will be used to search the international research literature
for qualifying studies reported during the last 25 years (1985-2010). The wide range of resources
searched is intended to reduce omission of any potentially relevant studies and to ensure
adequate representation of both published and unpublished studies.
Research registers to be searched include the Cochrane Collaboration Library, the National
Dropout Prevention Center/Network, the National Research Register (NRR), the National
Technical Information Service (NTIS), and the System for Information on Grey Literature
(OpenSIGLE). International research databases such as the Australian Education Index, British
Education Index, CBCA Education (Canada), and Canadian Research Index will also be searched.
Reference lists in previous meta-analyses and reviews, and citations in research reports
screened for eligibility will also be reviewed for potential relevance to the review.
Correspondence with researchers in the field will also be maintained throughout the review
process.
Search terms
The following search terms will be used to identify potentially relevant studies:
School dropouts, school attendance, truancy, school graduation, high school graduates, school
completion, GED, general education development, high school diploma, dropout, alternative
education, alternative high school, career academy, schools-within-schools, schools and
absence, chronic and absence, school enrollment, high school equivalency, school failure, high
school reform, educational attainment, grade promotion, grade retention, school
nonattendance, school engagement, and graduation rate;
AND
intervention, program evaluation, random, prevent, pilot project, youth program, counseling,
guidance program, summative evaluation, RCT, clinical trial, quasi-experiment, treatment
outcome, program effectiveness, treatment effectiveness, evaluation, experiment, social
program, effective.
The following search terms will be used to exclude irrelevant studies: higher education, post-
secondary, undergraduate, doctoral, prison, and inmate.
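For illustration only, the following sketch shows one way the three term sets above might be combined into a single boolean query for databases that accept such syntax; the term lists are abbreviated here and the exact syntax will vary by database.

```python
# Illustrative sketch (not part of the formal protocol): combine the term sets
# as (outcome terms) AND (method terms) NOT (exclusion terms).
outcome_terms = ["school dropouts", "school attendance", "truancy",
                 "school graduation", "graduation rate"]  # abbreviated
method_terms = ["intervention", "program evaluation", "RCT",
                "quasi-experiment", "program effectiveness"]  # abbreviated
exclude_terms = ["higher education", "post-secondary", "undergraduate",
                 "doctoral", "prison", "inmate"]

def build_query(outcomes, methods, exclusions):
    """Join each term set with OR, then combine the sets with AND/NOT."""
    ors = lambda terms: " OR ".join(f'"{t}"' for t in terms)
    return f"({ors(outcomes)}) AND ({ors(methods)}) NOT ({ors(exclusions)})"

print(build_query(outcome_terms, method_terms, exclude_terms))
```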
One study that exemplifies the methods likely to meet the eligibility criteria for the proposed
review is a program evaluation of Ohio’s Learning, Earning, and Parenting (LEAP) Program
(Long et al., 1996). In 1989, almost 10,000 teenage parents throughout the state of Ohio were
randomly assigned to the LEAP program or a no-treatment control group. The LEAP program
used an incentive structure for teens to encourage regular attendance in a program designed to
lead to a high school diploma or GED. Because it used random assignment, the LEAP study did
not provide pretest group equivalence information for the intervention and control groups, as
those differences were presumed negligible given the randomized study design (other studies
using quasi-experimental designs, however, must provide such pretest information in order to
be included in the proposed review). The key outcomes of interest from the LEAP study are the
posttest measurements—in this case measured three years after random assignment. Outcomes
measured at posttest included measures of the percent of students in the intervention and
comparison conditions who completed 9th, 10th, and 11th grade, completed high school,
completed a GED, or were currently enrolled in school or a GED program.
Multiple reports from single studies, and multiple studies in single reports, will be identified
through information on program details, sample sizes, authors, grant numbers and the like. If it
is unclear whether reports and studies provide independent findings, the authors of the reports
will be contacted.
All codable effect sizes will be extracted from study reports during the coding phase of the
review (i.e., we plan to code multiple outcomes and multiple follow-ups measured within the
same study). These will be separated according to the general constructs they represent
(dropout, attendance, engagement, etc.) and each outcome construct category will be analyzed
separately. We expect that some portion of the studies will provide more than one effect size for
a particular outcome construct (e.g., report two measures of dropout). This circumstance creates
statistical dependencies that violate the assumptions of standard meta-analysis methods. If
there are relatively few instances of this for a given construct category, we will retain only one of
these effect sizes in the analysis by selecting the construct that is most similar to those used by
other studies in that category.2 For any construct categories where this is relatively common,
however, we will retain all the effect sizes in the analysis and use the technique recently
developed by Hedges, Tipton, and Johnson (2010) to estimate robust standard errors that
account for the statistical dependencies.
2 For instance, constructs with similar measurement characteristics (source of information, length of measurement
period, standardized assessment source).
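To make the robust variance approach concrete, here is a minimal Python sketch of a cluster-robust (sandwich) estimate of a mean effect size with dependent effect sizes nested within studies. The weighting shown is one simplified choice; this illustrates the general approach of Hedges, Tipton, and Johnson (2010), not their full estimator.

```python
import numpy as np

def rve_mean(effects, variances, study_ids, tau2=0.0):
    """Sketch of a cluster-robust mean effect size with dependent effects."""
    effects = np.asarray(effects, dtype=float)
    variances = np.asarray(variances, dtype=float)
    study_ids = np.asarray(study_ids)
    weights = np.empty_like(effects)
    for s in np.unique(study_ids):
        mask = study_ids == s
        k = mask.sum()  # number of effect sizes contributed by study s
        # simplified weight: 1 / (k * (mean within-study variance + tau^2))
        weights[mask] = 1.0 / (k * (variances[mask].mean() + tau2))
    b = np.sum(weights * effects) / np.sum(weights)
    # robust variance: squared study-level sums of weighted residuals
    v_robust = sum(
        np.sum(weights[study_ids == s] * (effects[study_ids == s] - b)) ** 2
        for s in np.unique(study_ids)
    ) / np.sum(weights) ** 2
    return b, np.sqrt(v_robust)

print(rve_mean([0.2, 0.3, 0.1], [0.02, 0.02, 0.04], [1, 1, 2]))
```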
Eligible studies will be coded on variables related to study methods, the nature of the
intervention and its implementation, the characteristics of the subject samples, the outcome
variables and statistical findings, and contextual features such as setting, year of publication,
and the like. A detailed coding manual is included in Appendix I. All coding will be done by
trained coders who will enter data directly into a FileMaker Pro database using computer
screens tailored to the coding items and with help links to the relevant sections of the coding
manual. Effect size calculation is built into the data entry screens for the most common
statistical representations and specialized computational programs and expert consultation will
be used for the less common representations. We will select a 10% random sample of studies for
independent double coding. The results will be compared for discrepancies that will then be
resolved by further review of the respective study reports. The coding team will be retrained on
any coding items that show discrepancies during this process.
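For illustration, a sketch of the kind of discrepancy check that could be run on the double-coded sample is shown below; the item names (H6, G20) come from the coding manual, and the data values are hypothetical.

```python
import pandas as pd

# Hypothetical double-coded records for the 10% reliability sample.
coder_a = pd.DataFrame({"StudyID": [619, 702, 845], "H6": [1, 2, 1], "G20": [12, 36, 8]})
coder_b = pd.DataFrame({"StudyID": [619, 702, 845], "H6": [1, 3, 1], "G20": [12, 36, 8]})

merged = coder_a.merge(coder_b, on="StudyID", suffixes=("_a", "_b"))
for item in ["H6", "G20"]:
    mismatch = merged[merged[f"{item}_a"] != merged[f"{item}_b"]]
    print(f"{item}: agreement {1 - len(mismatch) / len(merged):.0%}; "
          f"discrepant studies: {mismatch['StudyID'].tolist()}")
```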
Analysis will be conducted using SPSS and the specialized meta-analysis macros available for
that program (Lipsey & Wilson, 2001) as well as Stata and the meta-analysis routines available
for it (Sterne, 2009).
All computations with odds ratios will be carried out with the natural logarithm of the odds
ratios, defined as follows:
ln(OR) = ln[(A × D) / (B × C)]
where A and B are the respective counts of “successes” and “failures” in the treatment group,
and C and D are the corresponding counts of “successes” and “failures” in the comparison group.
The sampling variance of the logarithm of an odds ratio can be represented as:
Var_ln(OR) = 1/A + 1/B + 1/C + 1/D
Analytic results from the logged odds ratio effect sizes will be converted back to the original
odds ratio metric for final substantive interpretation.
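A minimal Python sketch of these two formulas, assuming non-zero cell counts:

```python
import math

def log_odds_ratio(a, b, c, d):
    """ln(OR) and its sampling variance from a 2x2 table, per the formulas above.
    a, b: treatment "successes" and "failures"; c, d: comparison counts.
    (Zero cells would need a continuity correction, not handled here.)"""
    ln_or = math.log((a * d) / (b * c))
    var_ln_or = 1 / a + 1 / b + 1 / c + 1 / d
    return ln_or, var_ln_or

# e.g., 80 of 100 treatment and 65 of 100 comparison students stayed in school
print(log_odds_ratio(80, 20, 65, 35))
```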
For continuous outcomes, the standardized mean difference effect size will be computed as:
d = (X̄_G2 − X̄_G1) / s_p
where the numerator is the difference in group means for the intervention and comparison
groups, and the denominator is the pooled standard deviation for the intervention and
comparison groups. All standardized mean difference effect sizes will be adjusted with the
small-sample correction factor to provide unbiased estimates of the effect size (Hedges, 1981).
This small-sample corrected effect size (g) and its sampling variance can be represented as:
g = [1 − 3/(4N − 9)] × d
Var_g = (n_G1 + n_G2)/(n_G1 × n_G2) + g²/(2(n_G1 + n_G2))
where N is the total sample size for the intervention and comparison groups, d is the original
standardized mean difference effect size, nG1 is the sample size for the intervention group, and
nG2 is the sample size for the comparison group.
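A minimal Python sketch of the standardized mean difference computations above:

```python
def hedges_g(mean_1, mean_2, sd_pooled, n_1, n_2):
    """Standardized mean difference with the Hedges (1981) small-sample
    correction and its sampling variance, per the formulas above. Orient the
    means so that a positive value favors the treatment group."""
    d = (mean_1 - mean_2) / sd_pooled
    n = n_1 + n_2
    g = (1 - 3 / (4 * n - 9)) * d
    var_g = (n_1 + n_2) / (n_1 * n_2) + g ** 2 / (2 * (n_1 + n_2))
    return g, var_g

print(hedges_g(22.4, 19.8, 6.5, 55, 60))
```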
During the analytic phase of the project we will determine the number of coded effect sizes in
the odds ratio and standardized mean difference metrics in each outcome construct category. If
both occur in a given category, we will transform the effect size metric with the smaller
proportion into the metric with the larger proportion using the Cox transform shown by
Sánchez-Meca et al. (2003) to produce good results for this purpose. This will allow all the effect
sizes for that outcome category to be analyzed together. If this involves a large proportion of the
effect sizes in any category, sensitivity analyses will be conducted to ensure that the transformed
effect sizes and those in the original metric produce comparable results.
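As we understand the Cox transformation discussed by Sánchez-Meca et al. (2003), it divides the logged odds ratio by 1.65 (and its variance by 1.65²) to approximate the standardized mean difference metric; a one-line sketch:

```python
def cox_to_d(ln_or, var_ln_or):
    """Cox transformation from the log odds ratio metric to the standardized
    mean difference metric: d = ln(OR) / 1.65, Var_d = Var_lnOR / 1.65^2."""
    return ln_or / 1.65, var_ln_or / 1.65 ** 2
```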
Missing data
All reasonable attempts will be made to collect complete data on items listed in the coding
manual (see Appendix I). Authors of the reports will be contacted if key variables of interest
cannot be extracted from study reports. In the event that a small number of studies continue to
have missing data on covariates or moderators of interest to be used in the final analysis, we
plan to explore an option for imputing missing values using an expectation-maximization (EM)
algorithm, which produces asymptotically unbiased estimates (Graham, Cumsille, & Elek-Fisk,
2003). A series of sensitivity analyses will be conducted to examine whether the inclusion of
imputed data values substantively alters the results of the moderator analyses. If the EM
algorithm fails to converge, or if other difficulties arise that make this technique not feasible, all
resulting analyses will implement listwise deletion of missing data.
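For illustration only: the protocol specifies an EM algorithm, but as a rough stand-in the sketch below uses scikit-learn's chained-equations imputer (a MICE-style method, not EM proper) on a hypothetical moderator matrix with missing entries.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Hypothetical moderator matrix with missing entries (NaN); columns might be,
# e.g., an implementation quality rating and treatment duration in weeks.
X = np.array([[3.0, 12.0],
              [2.0, np.nan],
              [np.nan, 36.0],
              [1.0, 24.0]])
print(IterativeImputer(random_state=0).fit_transform(X))
```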
Outliers
The effect size distributions for each outcome construct category will be examined for outliers
using Tukey’s (1977) inner fence as the criterion and any outliers found will be recoded to the
inner fence value to ensure that they do not exercise disproportionate influence on the analysis
results. The distribution of sample sizes will also be examined and any outliers similarly recoded.
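A minimal sketch of this recoding rule:

```python
import numpy as np

def recode_to_inner_fences(values):
    """Recode values beyond Tukey's (1977) inner fences
    (Q1 - 1.5*IQR and Q3 + 1.5*IQR) back to the fence values."""
    v = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(v, [25, 75])
    iqr = q3 - q1
    return np.clip(v, q1 - 1.5 * iqr, q3 + 1.5 * iqr)

print(recode_to_inner_fences([0.05, 0.12, 0.20, 0.18, 1.90]))
```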
Analytic techniques
All analysis with effect sizes will be inverse variance weighted using random effects statistical
models. Specifically, the weighting function will be:
w_i = 1 / (Var_i + τ²)
where w_i is the weight for effect size i, Var_i is the sampling variance for effect size i as defined
above for the respective effect size metric, and τ² is the random-effects variance component
estimated for each analysis with a method of moments or maximum likelihood estimator.
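A minimal Python sketch of this weighting, using the method of moments (DerSimonian-Laird) estimator of τ² mentioned above:

```python
import numpy as np

def random_effects_mean(effects, variances):
    """Inverse-variance weighted random-effects mean using the weighting above,
    with a DerSimonian-Laird (method of moments) estimate of tau^2."""
    t = np.asarray(effects, dtype=float)
    v = np.asarray(variances, dtype=float)
    w_fe = 1.0 / v                                   # fixed-effect weights
    mean_fe = np.sum(w_fe * t) / np.sum(w_fe)
    q = np.sum(w_fe * (t - mean_fe) ** 2)            # homogeneity statistic Q
    c = np.sum(w_fe) - np.sum(w_fe ** 2) / np.sum(w_fe)
    tau2 = max(0.0, (q - (len(t) - 1)) / c)          # method of moments
    w = 1.0 / (v + tau2)                             # w_i = 1 / (Var_i + tau^2)
    return np.sum(w * t) / np.sum(w), np.sqrt(1.0 / np.sum(w)), tau2

print(random_effects_mean([0.10, 0.35, 0.22, -0.05], [0.02, 0.03, 0.015, 0.04]))
```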
The unit of assignment to treatment and comparison groups will be coded for all studies, and
appropriate adjustments will be made to effect sizes to correct for variation associated with
cluster-level assignment (Hedges, 2007).
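For illustration, one commonly cited point-estimate adjustment derived from Hedges (2007) for roughly equal cluster sizes is sketched below; the accompanying variance adjustment is more involved and is omitted here.

```python
import math

def cluster_adjusted_d(d, n_total, avg_cluster_size, icc):
    """Approximate small-sample adjustment of a standardized mean difference
    for cluster-level assignment (point estimate only)."""
    return d * math.sqrt(1 - (2 * (avg_cluster_size - 1) * icc) / (n_total - 2))

print(cluster_adjusted_d(0.25, n_total=400, avg_cluster_size=20, icc=0.10))
```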
We plan to code pretest effect sizes when available. If available in sufficient numbers for certain
outcomes, it may be possible to use the pretest effect sizes as covariates in our meta-regression
models, to control for pre-treatment differences between treatment and comparison groups on
the outcome variables. For the dropout outcomes, we do not expect to find many pretest effect
sizes, because most programs are likely to involve students who are currently attending school
(thus the pretest effect sizes would be zero). For attendance outcomes, we may have sufficient
pretest effect sizes to use them in our analyses.
The main objective of the analyses, however, will be to describe the direction and magnitude of
the effects of different interventions on the different outcome constructs in a manner that allows
their comparative effectiveness to be assessed. Additionally, moderator analysis using meta-
regression models will attempt to identify the characteristics of the interventions and student
participants that are associated with larger and smaller effects for the various outcome
constructs. Based on prior theory and research, the following moderators will be examined for
their influence on effect sizes (a meta-regression sketch follows the list):
• Treatment modality
• Implementation quality
• Treatment duration
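A hypothetical sketch of such a meta-regression follows; moderator values and effect sizes are made up, and note that off-the-shelf WLS standard errors require rescaling for proper meta-analytic inference, which specialized meta-analysis routines handle.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical effect sizes, sampling variances, and moderators.
g = np.array([0.10, 0.25, 0.05, 0.40, 0.18])
var_g = np.array([0.020, 0.030, 0.010, 0.050, 0.025])
tau2 = 0.01  # random-effects variance component, estimated elsewhere
moderators = np.array([[1, 12], [0, 36], [1, 8], [0, 52], [1, 20]])  # modality, duration
X = sm.add_constant(moderators)
fit = sm.WLS(g, X, weights=1.0 / (var_g + tau2)).fit()  # w_i = 1/(Var_i + tau^2)
print(fit.params)  # intercept and moderator coefficients
```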
Funnel plots, Duval and Tweedie’s (2000) trim and fill method, and Egger’s (1997) regression
test will be used to assess the possibility of publication bias and its
impact on the findings of the review. Sensitivity analyses will be conducted to examine whether
any decisions made during analyses substantively influenced the review findings, e.g.,
transformation between effect size metrics, the way outlier effect sizes and sample sizes were
handled, the inclusion of studies with poorer methodological quality within the range allowed by
the inclusion criteria, and missing data imputations.
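For illustration, Egger's regression test can be sketched in a few lines (the effect sizes here are hypothetical):

```python
import numpy as np
import statsmodels.api as sm

def egger_test(effects, variances):
    """Egger et al. (1997) regression asymmetry test: regress the standardized
    effect (ES/SE) on precision (1/SE); an intercept far from zero suggests
    funnel plot asymmetry consistent with possible publication bias."""
    se = np.sqrt(np.asarray(variances, dtype=float))
    y = np.asarray(effects, dtype=float) / se
    fit = sm.OLS(y, sm.add_constant(1.0 / se)).fit()
    return fit.params[0], fit.pvalues[0]  # intercept estimate and its p-value

print(egger_test([0.10, 0.35, 0.22, 0.50], [0.02, 0.03, 0.015, 0.05]))
```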
SOURCES OF SUPPORT
External funding:
Work on this review to date has been supported by a contract from the Campbell Collaboration.
DECLARATIONS OF INTEREST
Lead reviewer:
Sandra Jo Wilson, Ph.D.
Peabody Research Institute, Vanderbilt University
230 Appleton Place, PMB 181
Nashville, TN 37203-5721 USA
Phone: (615) 343-7215
Fax: (615) 322-0293
email: [email protected]
Co-authors:
1. Mark W. Lipsey, Ph.D. Director, Peabody Research Institute, Vanderbilt University.
2. Emily E. Tanner-Smith, Ph.D. Research Associate, Peabody Research Institute,
Vanderbilt University.
3 Note that community-based programs without a school affiliation are not eligible for the review. As a result, we
expect to find few (if any) programs that are not housed in typical or alternative school settings, or housed in school
buildings but operate outside of school hours. Should we locate non-school based programs that are implemented
under school auspices, we would include those in the moderator analysis.
PRELIMINARY TIMEFRAME
The research team will begin working on the systematic review upon approval by the Campbell
Collaboration editorial staff. We plan to complete the review by February 2011.
REFERENCES
Cataldi, E. F., Laird, J., & KewalRamani, A. (2009). High school dropout and completion rates
in the United States: 2007 (NCES 2009-064). National Center for Education Statistics,
Institute of Education Sciences, U.S. Department of Education. Washington, DC. Retrieved
Jan 26, 2010 from http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2009064.
Duval, S., & Tweedie, R. (2000). A nonparametric ‘trim and fill’ method of accounting for
publication bias in meta-analysis. Journal of the American Statistical Association, 95, 89-
98.
Egger, M., Davey Smith, G., Schneider, M., & Minder, C. (1997). Bias in meta-analysis detected
by a simple, graphical test. British Medical Journal, 315, 629-634.
Graham, J. W., Cumsille, P. E., & Elek-Fisk, E. (2003). Methods for handling missing data. In J.
A. Schinka & W. F. Velicer (Eds.), Handbook of psychology: Research methods in
psychology, Vol. 2. (pp. 87-114). Hoboken, NJ: John Wiley & Sons, Inc.
Hedges, L. V. (1981). Distribution theory for Glass’s estimator of effect size and related
estimators. Journal of Educational Statistics, 6, 107-128.
Hedges, L. V. (2007). Effect sizes in cluster-randomized designs. Journal of Educational and
Behavioral Statistics, 32, 341-370.
Hedges, L. V., Tipton, E., & Johnson, M. C. (2010). Robust variance estimation in meta-
regression with dependent effect size estimates. Research Synthesis Methods, 1, 39-65.
ICF International & National Dropout Prevention Center/Network. (2008). Best practices in
dropout prevention. Fairfax, VA: ICF International.
APPENDIX I: CODING MANUAL
STUDY IDENTIFIERS
The “unit” you will code here consists of a study, i.e., one research investigation of a defined
subject sample or subsamples compared to each other, and the treatments, measures, and
statistical analyses applied to them. Sometimes there are several different reports (e.g., journal
articles) about a single study. In such cases, the coding should be done from the full set of
relevant reports, using whichever report is best for each item to be coded; BE SURE YOU HAVE
THE FULL SET OF RELEVANT REPORTS BEFORE BEGINNING TO CODE. Sometimes a
single report describes more than one study, e.g., one journal article could describe a series of
similar studies done at different sites. In these cases, each study should be coded separately as if
each had been described in a separate report.
Each study has its own study identification number, or StudyID (e.g., 619). Each report also has
an identification number (e.g., 619.01), which you will find printed on the folder holding the
report. The ReportID has two parts; the part before the decimal is the StudyID, and the part
after the decimal is used to distinguish the reports within a study. (These two types of ID
numbers, along with bibliographic information, are assigned and tracked using the
bibliography.) When coding, use the study ID (e.g., 619) to refer to the study as a whole, and use
the appropriate report ID (e.g., 619.01) when referring to an individual report.
While reading reports for coding, be alert to any references to other dropout studies that may be
appropriate to include in this meta-analysis. If you find appropriate-looking references that are
not currently entered into the bibliography, the references may need to be entered.
[StudyID] Study identification number of the study you are coding, e.g., 1923.
[Coder] Coder's initials (select from menu)
[CodeDate] Date you began coding this study (will be inserted automatically)
STUDY CONTEXT
[H1] Year of publication (four digits): If more than one report, choose earliest date.
(1) Aggregate treatment and/or comparison groups. The largest participant groupings on which
contrasts between experimental conditions or contrasts between time points can be made. Note
that the designations “comparison group” and “control group” refer to any group with which the
treatment of interest is compared that is presumed to represent conditions in the absence of that
treatment, whether a true random control or not. Often there is only one aggregate treatment
group and one aggregate control group, but it is possible to have a design with numerous
treatment variations (e.g., different levels) and control variations (e.g., placebos) all compared
(e.g., in ANOVA format) to each other.
(2) Breakouts. Sometimes researchers will present data for some subset(s) of the participants
from an aggregate group; e.g., for an aggregate group composed of males and females, the
researchers may present some results for the males and females separately. You will code
information about breakouts later.
[H6] Unit of group assignment. The unit on which assignment to groups was based.
1. individual (i.e., some children assigned to treatment group, some to comparison
group)
2. group (e.g., whole classrooms, schools, sites, facilities assigned to treatment and
comparison groups)
3. program area, regions, school districts, counties, etc. (i.e., region assigned as an
intact unit)
9. cannot tell
Nonrandom, no matching prior to treatment but descriptive data, etc. regarding the nature of
the group differences:
8. Non-random, not matched, but pretreatment equivalence information is available.
[H9] Number of variables on which treatment and comparison group differences were
statistically compared prior to the intervention. A statistical comparison is one in which a
statistical test was performed by the authors, whether they provide data or not (e.g., “no
statistically significant differences were found”). Do not include here any comparisons on
pretest variables, that is, measures of a dependent variable taken prior to treatment, e.g., prior
number of absences when subsequent number of absences is used as an outcome measure.
[H11] Number of variables on which treatment and comparison group differences were or can
be descriptively compared prior to the intervention. A descriptive comparison is any comparison
across treatment and control groups that does not involve a statistical test (e.g., the actual
number of males and females in each group or a statement by the author(s) about group
similarity).
[H13] Rating of similarity of treatment and control groups. Using all the available information,
rate the overall similarity of the treatment group and the comparison group, prior to treatment,
on factors likely to have to do with dropout or responsiveness to treatment (ignore differences
on any irrelevant factors). Note: Greatest equivalence from “clean randomization” with prior
blocking on relevant characteristics and no subsequent attrition/degradation; least equivalence
with some differential selection of one “type” of individual vs. another on some variable likely to
be relevant to dropout.
Guidelines: Use ratings in the 1-3 range for good randomizations and matchings, e.g., 1=clean
random, 2=nice matched. Use ratings in the 5-7 range for selection with no matching or
randomization or instances where it has been seriously degraded, e.g., by attrition before
posttest. Within this bracket, the question is whether the selection bias is pertinent to the
outcomes being examined. Were participants selected explicitly or implicitly on a variable that
might make a big difference in dropout? The middle three points are for sloppy matching
designs, degradations, bad wait list designs, and the like. If the data indicate equivalence but the
assignment procedure was not random give it a 4 or thereabouts since not all possible variables
were measured for equivalence between groups.
[H15] Click here to record any problems you encountered while coding this section.
At this point, you should go to the Effect Size Database to code group equivalence effect sizes
and descriptive information about initial group differences. See the Effect Size Coding section of
this manual for more information on effect size calculation.
For each measure you can identify on which the treatment and control group were compared
prior to treatment (other than dependent variables) or on which you can tell equivalence (e.g. if
all males then code it here), determine which group is favored and if possible, calculate an effect
size (ES, standardized difference between means or odds ratio). Do not include here any
comparisons on pretest variables, that is, measures of a dependent variable taken prior to treatment.
The only eligible variables for group equivalence effect sizes are: (a) gender, (b) age, (c)
race/ethnicity, and (d) variables relating to risk for school dropout. A pretest that is used later in
the study as a posttest would not be coded here – you would code it as a pretest effect size. If the
study reports group equivalence outcome data for multiple risk variables, group equivalence
effect size information should be coded for up to four variables. If more than four variables are
available for any of the risk factors, code the four most relevant ones. When deciding which are
most relevant, use the following criteria:
1. First preference should be given to behavioral measures (e.g., prior absences, school
performance).
2. Second preference should be given to measures of psychological conditions,
predispositions, or attitudes (e.g., school engagement, school bonding, etc.).
3. Lowest preference should be given to broad measures of social disadvantage or family
history (e.g., socioeconomic status of parents, residence in inner-city).
[StudyID] Indicate the Study ID for the study you are coding.
[ReportID] Enter the Report ID for the report in which you found the information on group
equivalence. Use the complete Report ID, e.g. 1973.01.
[pagenum] Enter the page number on which you found the information on group equivalence.
[ES19] Wave number. Pretests and group equivalence effect sizes always get a 1; each wave
thereafter gets numbered consecutively, beginning with 1. Some studies involve more than one
posttest measurement and we need to be able to distinguish one from another. Give the first
posttest after treatment a 1, the second a 2, and so on.
[ES17] Which group is favored? Whichever group has more of the characteristic that
presumably makes them better off or more amenable to treatment (e.g., less truant, higher SES,
smarter, etc.) is considered favored. NOTE: You should code this item even for cases in which
you are unable to calculate a numeric effect size but have information about which group is
favored.
Data Fields: Fill in the data fields using the relevant statistical information provided in the
report(s). You do not need to fill in all the fields; fill in only the information necessary to
calculate an effect size. Thus, if the report provides sample sizes, means, standard deviations,
and t-test scores, you need only enter the sample sizes, means, and standard deviations.
Create one record in this database for each of the aggregate treatment and/or control groups
that you selected earlier for coding. Studies with a treatment group and a control group will have
two records, etc.
[StudyID] Type in the StudyID for the study you are coding if it does not appear automatically.
[GroupID] Number each group consecutively within a study, starting with 1.
Intervention Condition
1. Focal program or treatment. There may be several focal programs in a study, as when
two different types of treatments, both of which could be expected to be effective, are
compared.
Control Condition
2. “Straw man” alternate program or treatment, diluted version, less extensive program,
etc., not expected to be effective but used as contrast for treatment group of primary
interest. If the alternate treatment is not minimal and could realistically be expected
to be effective, it is not a control condition and should be classified as a focal
treatment instead.
3. Placebo (or attention) treatment. Group gets some attention or sham treatment (e.g.,
watching Wild Kingdom videos while treatment group gets therapy)
4. Treatment as usual. Group gets “usual” handling instead of some special treatment.
5. No treatment. Group gets no treatment at all. Note: The difference between “no
treatment” and “treatment as usual” hinges on whether or not the treatment and
control groups in this study have an institutional framework or experience in
common.
[G3] Program name. Write in program or treatment label for this group (e.g., Dropout
Prevention Curriculum, waiting list control, etc.). REMEMBER: YOU MUST CREATE A
PROGRAM LABEL FOR CONTROL GROUPS AS WELL AS TREATMENT GROUPS.
[G4] Program description. Write in a brief description of the treatment this group receives.
Please try to keep the description short by focusing on the key elements of treatment, but make
sure you include ALL treatment elements in your description. As much as possible, quote or give
a close paraphrase of the relevant descriptive text in the study report. REMEMBER: YOU MUST
CREATE A DESCRIPTION FOR CONTROL GROUPS AS WELL AS TREATMENT GROUPS.
Second, choose the one program type that can be considered the focal program characteristic.
Most programs will arguably deliver multiple service types, but do your best to narrow the focal
type down to one category. It may be helpful to examine the amount of each service type
delivered. For instance, if a program delivered 1 hour/week of skills training to parents and 5
hours/week of vocational training to students, you would code vocational training as the focal
program component. If a program contains too many service types to distinguish a focal type,
choose “multi-service” package as the focal component.
ACADEMIC:
1. Curriculum
2. Academic program
3. Remedial education (e.g., reading remediation)
4. GED preparation
5. Computer-assisted learning
6. Test-taking and study skills assistance
7. Tutoring
8. Homework assistance
9. Extracurricular activities (e.g., after school club). NOTE: just because a program is
delivered after school does not mean it should be coded here; this program component
should include academic, social, or sport activities that are separate from regular school
activities.
10. Professional development for school staff
SCHOOL STRUCTURE
11. Class or grade reorganization (schools within schools)
12. Small class sizes/small “learning communities”
13. Alternative school
FAMILY ENGAGEMENT:
14. Family outreach
15. Feedback to parents and students on performance
16. Parent or teacher consultation enhancement
17. Parenting skills program
LINKING TO SERVICES:
30. Case management
31. Health services
32. Transportation assistance
33. Child care/day care
34. Residential living services
SOCIAL RELATIONSHIPS:
35. Mentoring
36. Peer support
37. Social events
38. Community service/volunteer service/tutoring (“helper-therapy”)
39. Recreational, wilderness, etc. program
PERSONAL/AFFECTIVE:
40. Counseling
41. Skills training (life skills, social skills/social competence)
42. Cognitive behavioral therapy (e.g., problem solving skills)
BEHAVIORAL:
43. Attendance monitoring
44. Contingency management, financial incentives, token economy, extrinsic reward system
to promote attendance/academic achievement
OTHER:
45. Multi-service package (NOTE: Only choose this program code if the group receives an
amorphous, broadly defined program with components that cannot be clearly
identified otherwise. Use this program code as focal if a group has multiple “focal”
treatment components and you cannot make a distinction otherwise.)
School Sites
1. Regular Class Time (this includes interventions delivered during regularly scheduled
classes AND in the regular classroom for youths in the group)
2. Special Class (e.g., youth in treatment are in a classroom-type setting that is different
from a typical classroom, although it may be the subjects’ usual classroom – includes
such settings as special education classrooms, schools-within-schools, alternative
schools, etc.)
3. Resource Room, School Counselor's Office, or other similar setting that is NOT the
student’s regular classroom; the idea here is that students are removed from class for
treatment
4. Treatment delivered at school facility, but not during regular school hours (e.g.,
afterschool programs)
Home
9. Treatment delivered in the subject’s home
Community-based, Non-residential
10. Private office, clinic, center (e.g., YMCA, university, therapist’s office)
11. Public office, clinic, center (e.g., human services department, public health agency)
12. Work site (e.g., community service, trash collection on roadside, etc.)
13. Park, playground, wilderness area, etc.
Institutional, Residential
14. Private institution, residential
15. Public institution, residential (e.g., camp, reformatory)
[G10] Role of the evaluator(s)/author(s)/research team or staff in the program. This item
focuses on the role of the research team working on the evaluation, regardless of whether they
are all listed as authors.
1. evaluator delivered therapy/treatment
2. evaluator involved in planning or controlling treatment or is designer of program
3. evaluator influential in service setting but no direct role in delivering, controlling, or
supervising treatment
4. evaluator independent of service setting and treatment; research role only
9. cannot tell
[G11] Role of program developer in the research project. This item focuses on the individual
(or group of individuals) who created or developed the program and their role in the delivery of
the program under study. Is the program developer the researcher conducting the study, or is
the program developer not participating in the research project?
1. Program developer is author/evaluator/delivery agent
[G12] Routine practice or program vs. research project. Indicate the appropriate level for the
treatment you are coding: at one end of the continuum are research projects (option 1), in which
a researcher decides to implement and evaluate a particular program for research purposes; in
many cases, the program may require the cooperation of a service agency (school, clinic, etc.),
but the intervention is delivered primarily so the researcher can conduct research. At the other
end of the continuum are evaluations of “real-world” or routine programs (option 3): a service
agency implements a program on its own, and also decides to conduct an evaluation of the
program; the evaluation may or may not be conducted by outside researchers. In the middle of
the continuum are demonstration projects (option 2), which are conducted primarily for
research purposes, but generally have more elements of “real world” practice than typical
research projects as defined under option 1. Demonstration projects generally involve a program
that has been studied in prior research but is being tested for effectiveness in different settings
than the original research, or on a larger scale than the original research.
If a researcher is a school principal or some other school staff person and is conducting the
evaluation as part of his/her dissertation, the decision depends on the extent of the program. If
the program is small-scale and implemented in, say, a classroom or two, and supervised by the
researcher/principal, code it as a research project. If the program is a broader school-wide
program that the researcher/principal happens to be evaluating, code it as either a
demonstration or routine program, depending on whether the program is a special program
being tested (demonstration) or something that the school does on a routine basis (routine
practice).
1. research project: The intervention would not have been implemented without the
interest or initiative of the researcher(s). The intervention is delivered by the research
staff or by service providers (regular agency personnel, teachers, etc.) trained by the
researchers.
2. demonstration project: A research project that involves a new or special program being
tested, rather than a routine program. Although generally implemented by researchers
for research purposes, a demonstration project has more elements of actual practice than
a research project. Demonstration projects usually involve programs that have been
studied previously, either in small-scale pilot projects or tightly controlled efficacy trials;
demonstration projects would serve as a larger scale or quasi-real-world test of a
promising program.
[G14] Did treatment personnel receive special training in this specific program, intervention, or
therapy? If the treatment is delivered by the researcher, use “yes” below, unless the report
indicates otherwise.
1. yes
2. no
9. cannot tell
[G15] If yes, write in amount of training of personnel for providing this treatment:
_______________
Second, choose the one format type that can be considered the focal format. This
selection should match the format of the focal program type you selected above under
G6. If you selected multi-service package above, select the format for the most frequent
or most focal piece of the package; if this is impossible, select multiple format program.
[G20] Duration of treatment. Approximate (or exact) number of weeks that subjects
received treatment, from first treatment event to last, excluding follow-ups designated as such.
Divide days by 7; multiply months by 4.3. Code 888 for a control group that receives nothing;
code 999 if cannot tell. Estimate for this item if necessary, if you can come up with a
reasonable order-of-magnitude number.
[G22] Approximate (or exact) frequency of contact between subjects and provider or treatment
activity. This refers only to the element of treatment that is different from what the control
group receives.
1. less than weekly
2. Once a week
3. 2 times a week
4. 3-4 times a week
5. daily contact (not 24 hours of contact per day but some treatment during each day,
perhaps excluding weekends)
6. continuous (e.g. residential living)
9. cannot tell
88. N/A: control group
[G24] ____________ Approximate (or exact) mean hours actual contact time between
subject and provider or treatment activity per week if reported or calculable. Assume that high
school classes, counseling, or therapy sessions are an hour unless otherwise specified. Round to
one decimal place. Code 8888 for institutional, residential, or around the clock program; code
9999 if not available.
[G26] _____________ Approximate (or exact) mean number of hours total contact between
subject and provider or treatment activity over full duration of treatment per subject if reported
or calculable. Round to whole number. Code 8888 for institutional, residential, or around the
clock program; code 9999 if not available.
[G30] Based on evidence or author acknowledgment, was there any uncontrolled variation or
degradation in implementation or delivery of treatment, e.g., high dropouts, erratic attendance,
treatment not delivered as intended, wide differences between settings or individual providers,
etc.? Assume that there is no problem if one is not specified.
This question has to do with variation in treatment delivery, not research contact. That is, there
is no “dropout” if all subjects complete treatment, even if some fail to complete the outcome
measures.
1. yes (describe below)
2. possible (describe below)
3. no, apparently implemented as intended
Subject Characteristics
ETHNICITY CODING:
[G43a] Percent white
[G43b] Percent black
[G43c] Percent Hispanic
[G43d] Percent other minority
[G43e] Percent non-white (ONLY use this category if specific minority groups are not
mentioned; if you use this category, there should only be numbers in the white and non-
white categories)
[G46] Enter the average age of the sample using number of years.
[G46a] and [G46b] High and low age using years.
[G47] Enter the average grade level of the sample. (dropdown menu)
[G47a] and [G47b] High and low grades (dropdown menu)
Select the general construct group for the dependent variable you are coding, then
select the specific construct category that best matches the dependent variable.
100. Dropout
101. Attendance, truancy
102. Academic performance
103. School conduct
104. School engagement
[DV3] Source of information. Who provided the information for this dependent variable?
1. Participants, self-report
2. Parents
3. Peers
4. Teachers
5. Principal
6. Service Provider (treatment agent)
7. School Records
BREAKOUT/SUBGROUP CODING
Note that a simple report of the number of males and females in the treatment and control
groups does not constitute a breakout (though it is relevant to group equivalence issues). To be a
breakout, outcome data must be reported for the treatment-control or pretest-posttest
comparison for at least one subgroup of the breakout variable. Breakouts are usually presented
because the authors think that subgroups (e.g., males and females) are sufficiently different to
warrant separate presentation of results (because, for example, males may be more likely to
drop out than females).
NOTE: Only certain breakout variables are eligible for coding. These include gender, age,
ethnicity, and prior school completion/dropout, GED completion, or absences/truancy. If you
encounter another breakout variable that may be relevant to dropout, please check with Sandra.
Create a new record for each subgroup that you will be coding for this study.
[BreakID] Subgroup number. Assign a number to the subgroup such that the first subgroup you
code is numbered 1, the second is numbered 2, and so on. These numbers are used within a
study, so when you code subgroups from another study, you would start over with 1 again.
[Labels:B2] Write in descriptor for the subgroup you are coding, e.g., males, 8 year olds, whites,
etc.
[ESID] Effect size ID. FileMaker will automatically generate unique effect size ID numbers
ACROSS studies.
[pagenum] Page number for this effect size. Indicate the page number of the report identified
above on which you found the effect size data. If you used data from two different pages, you can
type in both, but use a comma or dash between the page numbers.
There are 3 types of effect sizes that can be coded: pretest, posttest, and group equivalence (or
baseline similarity) effect sizes. They are defined as follows:
• Pretest effect size. This effect size measures the difference between a treatment and
comparison group before treatment (or at the beginning of treatment) on the same variable
used as an outcome measure, e.g., school attendance measured before the treatment begins
is used as a pretest for school attendance measured the same way after the treatment ends.
• Group equivalence effect size. Group equivalence effect sizes are used to code the
equivalence of two groups prior to treatment delivery on variables that might be related to
outcome. See the Group Equivalence Coding section for more information.
• Posttest effect size. This effect size measures the difference between two groups after
treatment on some outcome variable.
This is very important!!!! These three types of effect sizes are different from the multiple
breakouts and multiple dependent variables that you might have in a study. For example, you
might have a study that measures the treatment and comparison groups at pretest and posttest
at 6 months after treatment on 3 different dependent variables. The results might be presented
for the entire sample and broken down by gender. In this case you would have 6 group
comparison effect sizes for the entire sample – three for the pretest and three for the 6-month
posttest (one for each of the three dependent variables). In addition to these 6 aggregate effect
sizes, you will have 6 more for the girls (the same as for the aggregate groups but just for the
subgroup of girls) and 6 for the boys (also the same as for the aggregate groups but just for the
subgroup of boys).
[ES19] Wave number. Pretests and group equivalence effect sizes always get a 1; each wave
thereafter gets numbered consecutively, beginning with 1. Some studies involve more than one
posttest measurement and we need to be able to distinguish one from another. Give the first
posttest after treatment a 1, the second a 2, and so on.
[ES47] Timing of measurement. Approximate (or exact) number of weeks after treatment when
measure was taken. Divide days by 7; multiply months by 4.3. Enter 999 if cannot tell, but try to
make an estimate if possible. Enter 0 if pretest.
It is now time to identify the data you will use to calculate the effect size and to calculate the
effect size yourself if necessary (see below). Effect sizes can be calculated ONLY from data based
on the number of subjects, e.g., average number of days absent per subject and the
corresponding standard deviation) or proportion of subjects who were chronic truants during a
given time period. Effect sizes can NOT be calculated from data based solely on the incidence of
events, e.g., total number of days absent per group. THIS IS VERY IMPORTANT—BE SURE
YOU KNOW WHICH KIND OF DATA YOU HAVE.
You need to determine what effect size format you will use for each effect size calculation. There
are two general formats you can use, each with its own section in FileMaker:
1. Compute ES from means, sds, variances, test statistics, etc.
2. Compute ES from frequencies, proportions, contingency tables, odds, odds ratios, etc.
Also note that within each of the above effect size formats, effect sizes can be calculated from a
variety of statistical estimates; to determine which data you should use for effect size calculation,
please refer to the following guidelines in order of preference:
1. Compute ES from descriptive statistics if possible (means, sds, frequencies, proportions).
2. If adequate descriptive statistics are unavailable, compute ES from significance test
statistics if possible (values of t, F, chi-square, etc.).
3. If significance test statistics are unavailable or unusable but the p value and degrees of
freedom (df) are available, determine the corresponding value of the test statistic (e.g., t,
chi-square) and compute ES as if that value had been reported (see the sketch below).
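A minimal sketch of guideline 3 for a two-group t test, with hypothetical values:

```python
import math
from scipy import stats

def d_from_p(p_two_tailed, df, n_t, n_c):
    """Recover t from a two-tailed p value and its df, then convert to a
    standardized mean difference. The sign must be set from the report's text
    (i.e., which group did better)."""
    t = stats.t.ppf(1 - p_two_tailed / 2, df)
    return t * math.sqrt(1 / n_t + 1 / n_c)

print(d_from_p(0.04, df=98, n_t=50, n_c=50))
```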
Note that if the authors present both covariate adjusted and unadjusted means, you should use
the covariate adjusted ones. If adjusted standard deviations are presented, however, they should
not be used.
For treatment-control comparisons, the treatment group is favored when it does “better” than
the control group. The control group is favored when it does “better” than the treatment group.
Remember that you cannot rely on simple numerical values to determine which group is better
off. For example, a researcher might assess attendance and report this variable in terms of days
absent, in which case a lower number indicates a better outcome.
Sometimes it may be difficult to tell which group is better off because a study uses multi-item
measures in which it is unclear whether a high score or a low score is more favorable. In these
situations, a thorough reading of the text from the results and discussion sections usually can
bring to light the direction of effect – e.g., the authors will often state verbally which group did
better on the measure you are coding, even when it is not clear in the data table. Note that if you
cannot determine which group has done better, you will not be able to calculate a numeric effect
size. (You will still be able to create an effect size record—just not a numeric effect size.)
[ES50] For this effect size, did you use adjusted data (e.g., covariate adjusted means) or
unadjusted data? If both unadjusted and adjusted data are presented, you should use the
adjusted data for the group means or mean difference, but use unadjusted standard deviations
or variances. Adjusted data are most frequently presented as part of an analysis of covariance
(ANCOVA). The covariate is often either the pretest or some personal characteristic such as
socioeconomic status. If you encounter data that is adjusted using something other than a
covariate, please see Sandra or Mark.
1. Unadjusted data
2. Pretest adjusted data (or other baseline measure of an outcome variable construct)
3. Data adjusted on some variable other than the pretest (e.g., socioeconomic status)
4. Data adjusted on pretest plus some other variables
[ES55] Intent-to-treat analysis: Are results for this effect size based on an intent-to-treat
analysis?
Experimental and quasi-experimental designs may employ “intent-to-treat” (ITT) or
“completer” analyses. An intent-to-treat analysis is one that (attempts to) include outcome data
from all the participants initially assigned to the treatment and comparison conditions
regardless of their compliance with the entry criteria, the treatment they actually received, or
any subsequent withdrawal from treatment (non-completers) or deviation from the protocol. A
true ITT is possible only when the authors (attempt to) use outcome data for all randomized (or
otherwise assigned) subjects; if all assigned subjects are used to present outcome results, then
code as ITT, regardless of whether authors call the analysis an ITT. If the authors attempt to
collect outcome data on non-completers and even if they are not 100% successful in this
attempt, still code as ITT (as the missing data for non-completers is then coded as attrition).
Sometimes researchers will use a modified ITT, in which they estimate missing data on non-
completers, or include all subjects with pretests but not all who were randomized. These
modified ITTs would be coded as “2” below. Completer analyses (AKA ‘treatment on the treated
(TOT)’ analyses) involve only the participants who completed treatment or met some other
criteria indicating an acceptable level of participation.
1. Intent-to-treat analysis (all subjects who were assigned are used in posttest)
2. Modified intent-to-treat (not all assigned subjects are used in posttest, but authors have
done some modifications to approximate a true ITT)
3. Completer analysis (only those subjects who completed treatment or who stayed in the
study are used in posttest)
[ES36] Assigned N for the treatment group (or pretest, if this is a pretest-posttest effect size).
[ES37] Assigned N for the comparison or second treatment group (or posttest, if this is a
pretest-posttest effect size; if this is a pretest-posttest effect size, this value should be the
same as the assigned N for the pretest).
[ES38] Total Assigned N.
[ES1] Observed N for the treatment group (or pretest, if this is a pretest-posttest effect size).
[ES2] Observed N for the comparison or second treatment group (or posttest, if this is a
pretest-posttest effect size).
[ES3] Total Observed N.
[ES51] Number of units assigned for treatment group (for cluster-assigned studies): ____
[ES52] Number of units assigned for control group (for cluster-assigned studies): ____
[ES53] Intra-class correlation (ICC) for outcome measure (for cluster-assigned studies): ____
Remember that you cannot rely on simple numerical values to determine which group has done
better. For treatment-control comparisons, a positive effect size should indicate that the
treatment group did “better” on the outcome measure than the comparison group, while a
negative effect size should indicate that the comparison group did “better.”
Effect sizes can range anywhere from around –3 to +3. However, you will most commonly see
effect sizes in the –1 to +1 range. Odds ratios smaller than 1 indicate that the control group is
better off; those greater than 1 indicate that the treatment group has the better outcome.
Note: If the authors report an effect size, include that in your coding and use it for the final effect
size value if no other information is reported. However, if the authors also include enough
information to calculate the effect size, always calculate your own and report it in addition to
that reported in the study.