ABSTRACTS
Monday, 5 July
Cross-disciplinary challenges
Gavin Stewart
Research synthesis methods are fundamental to the design, conduct, analysis and interpretation of
scientific evidence across all disciplines. Arguably, synthesis of data has become a science in its own
right with an increasingly complex set of methodologies surrounding systematic review and meta-
analysis in particular. Here we attempt to provide a cross-disciplinary overview of the comparative
history and characteristics of research synthesis. As a starting point we consider synthesis in the fields
of medicine and social sciences with the longest history of use of meta-analysis and also the
environmental field, which has similar pressing needs to inform decision makers with the best-
available evidence.
Policy decisions often require synthesis of evidence from multiple sources, and the source studies
typically vary in rigour and in relevance to the target question. Rigour (or internal bias) reflects how
well a study estimates its intended parameters, and varies according to use of randomisation, degree of
blinding and attrition levels. Relevance (or external bias) reflects how similar the source study design
is to the target setting, with respect to study population, outcomes and interventions. We present
methods for allowing for internal and external biases in evidence synthesis.
The methods were developed in the context of a NICE technology appraisal in antenatal care, which
identified ten relevant studies. Many were historically controlled, only one was a randomised trial,
and doses, populations and outcomes varied between studies and differed from the target UK setting.
Using elicited opinion, we constructed prior distributions to represent the biases in each study, and
performed a bias-adjusted meta-analysis. Our generic bias modelling approach allows decisions to be
based on all available evidence, with less rigorous or less relevant evidence discounted using
computationally simple methods.
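As a minimal sketch of this kind of adjustment (not the elicitation-based Bayesian model used in the appraisal), the R code below shifts each study's estimate by an assumed mean bias and inflates its variance by the uncertainty about that bias before pooling with metafor; all numbers are hypothetical.

library(metafor)

# Hypothetical data: observed log odds ratios (yi), sampling variances (vi),
# and elicited bias summaries for each study (mean shift and added variance).
dat <- data.frame(
  yi       = c(-0.45, -0.30, -0.60, -0.25),
  vi       = c(0.040, 0.055, 0.030, 0.070),
  bias.m   = c(0.10, 0.00, 0.15, 0.05),    # assumed mean internal + external bias
  bias.var = c(0.020, 0.010, 0.030, 0.015) # assumed uncertainty about that bias
)

# Bias-adjusted effects: shift each estimate by the expected bias and
# inflate its variance, which discounts less rigorous or less relevant studies.
dat$yi.adj <- dat$yi - dat$bias.m
dat$vi.adj <- dat$vi + dat$bias.var

# Random-effects meta-analysis of the adjusted estimates.
res <- rma(yi.adj, vi.adj, data = dat, method = "REML")
summary(res)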
In further work, the bias adjustment methods have also been adapted to meta-analyses of longitudinal
observational studies. Application of the modified methods is illustrated within a systematic review
Models for potentially biased evidence in meta-analysis using empirically based priors
Nicky J Welton
We present methods for the combined analysis of evidence from randomized controlled trials
categorized as being at either low or high risk of bias due to a flaw in their conduct. We formulate a
bias model that incorporates between-study and between-meta-analysis heterogeneity in bias, and
uncertainty in overall mean bias. The parameters of the bias model can be estimated from collections
of previously published meta-analyses (meta-epidemiological studies). We illustrate the methods
using an example meta-analysis of clozapine in the treatment of schizophrenia. A
sensitivity analysis shows that the gain in precision from including studies at high risk of bias is likely
to be low, however numerous or large those studies are, and that little is gained by incorporating such studies,
unless the information from studies at low risk of bias is limited. The use of meta-epidemiological
data to inform bias parameters requires strong exchangeability assumptions, and we consider the
potential of estimating bias parameters within a mixed treatment comparison evidence structure to
avoid making such strong assumptions. We discuss approaches that might increase the value of
including studies at high risk of bias, and the acceptability of the methods in the evaluation of health
care interventions.
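A computationally simple approximation to this idea, shown only as an illustration (the numbers standing in for the meta-epidemiological summaries are assumptions, not estimates from the talk), is to shift and down-weight the studies flagged as being at high risk of bias:

library(metafor)

# Hypothetical data: log odds ratios, sampling variances, and a risk-of-bias flag.
dat <- data.frame(
  yi   = c(-0.52, -0.40, -0.70, -0.35, -0.20),
  vi   = c(0.030, 0.045, 0.060, 0.050, 0.080),
  high = c(0, 0, 1, 1, 1)   # 1 = high risk of bias
)

# Assumed meta-epidemiological summaries (illustrative values only):
# mean bias of high-risk studies, and a variance reflecting uncertainty in,
# and heterogeneity of, that bias.
b.mean <- -0.15
b.var  <-  0.04

# Approximate adjustment: shift high-risk estimates by the mean bias and
# inflate their variances, which down-weights them in the pooled estimate.
dat$yi.adj <- dat$yi - b.mean * dat$high
dat$vi.adj <- dat$vi + b.var  * dat$high

res <- rma(yi.adj, vi.adj, data = dat, method = "REML")
summary(res)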
Tony Ades
In multi-parameter synthesis applications there may often be data on more functions of parameters
than there are parameters. This creates the possibility of conflict between data sources. If conflict
exists under a particular model, this might either indicate that the model is mis-specified, or it might
suggest that one or more data sources may be "biased", in the sense that they are not estimating their
target parameter.
On the other hand, because there are more data than there are parameters, the size of the bias can in
principle be estimated. However, we may not know which data sources are biased, and alternative
assumptions about the locus of the bias will yield different estimates. We look at solutions that
take account of the uncertainty about which data sources are biased. The presentation is illustrated with a non-
linear, 9-parameter synthesis model of the prevalence and distribution of HIV (Ades and Cliffe,
2002), based on routine surveillance and survey data.
Hayley Jones1, Sylwia Bujkiewicz, Rebecca Turner, Monica Lai, Nicola Cooper, Neil Hawkins, Hazel
Pilgrim, Keith Abrams, David Spiegelhalter, Alex Sutton
1 Department of Community Based Medicine, University of Bristol, UK
We continue with the example of routine antenatal anti-D prophylaxis for RhD-negative
women, introduced by Rebecca Turner earlier in this session. In Turner et al (JRSS A 2009;
172:21-47) a meta-analysis of efficacy data was performed which was adjusted for various
expected biases based on expert elicitations. We now incorporate this bias-adjusted meta-
analysis into a fully probabilistic cost-effectiveness model (Pilgrim et al, Health Technol
Assess 2009; 13:1-126).
We will further introduce the “Transparent Interactive Decision Interrogator” (TIDI), an
Excel-based user interface which runs R and WinBUGS “behind the scenes” and returns
summary statistics and graphical displays back to Excel. Using this user-friendly interface,
the user can decide interactively which of the studies to include in the meta-analysis, which
types of bias to adjust for, and also the beliefs of which experts to incorporate. This allows
the user to explore sensitivity to various choices and assumptions without expertise regarding
the underlying software or model.
Finally, we briefly consider application of a meta-epidemiological based bias adjustment, as
described by Welton et al (JRSS A 2009;172:119–136), to this case study. Preliminary
information on the average bias associated with observational versus randomised studies
from the BRANDO (Bias in Randomised AND Observational studies) database is used for
this purpose.
Center for Applied Epistemology, CNRS/Ecole Polytechnique, Paris and Department of Hygiene and
Epidemiology, University of Ioannina, Greece
Many different types of bias have been described. Some biases may tend to coexist or be associated
with specific research settings, fields, and types of studies. We aimed to map systematically the
terminology of bias across biomedical research using advanced text-mining and clustering techniques.
Evaluation of 17 million items from PubMed (1958-2008) made it possible to identify 235 bias terms
and 103 other terms that appear commonly in articles dealing with bias. Forty bias terms were used in
the title or abstract of more than 100 articles each. Pseudo-inclusion clustering identified 252 clusters
of terms for the last decade. The clusters were organized into macroscopic maps that cover a
continuum of research fields. The resulting maps highlight which types of biases tend to co-occur and
may need to be considered together and what biases are commonly encountered and discussed in
specific fields. Most of the common bias terms have had continuous use over time since their
introduction, and some (in particular confounding, selection bias, response bias, and publication bias)
show increased usage through time. This systematic mapping offers a dynamic classification of biases
in biomedical investigation and related fields and can offer insights for the multifaceted aspects of
bias.
Julian Higgins
I have recently had the opportunity to work with the Assessment Methodology Unit at the European
Food Safety Authority (EFSA) in the preparation of guidance for adopting systematic review methods
in the area of food and feed safety. Whereas many of the specific methods for systematic reviews
translate reasonably well, many of our discussions revolved around preliminary considerations in
question formulation and deciding whether a systematic review was appropriate. We found relatively
little published guidance in these areas.
I will summarize our discussions and decisions about (i) breaking down 'complex' questions into
'reviewable' questions; (ii) differentiating specific types of reviewable questions; (iii) the potential use
of 'evidence mapping' as a precursor to a systematic review; and (iv) considerations for deciding
whether or not it is worthwhile embarking on a systematic review.
Comparing the performance of alternative statistical tests for moderators in mixed-effects meta-
regression models
When the effect sizes in a meta-analysis are found to be heterogeneous, researchers usually examine
whether at least part of the variability between the effect size estimates can be accounted for based on
the influence of moderator variables. The models used for this purpose are usually linear regression
models allowing for residual heterogeneity between the effect sizes, so that the resulting analysis is
typically called a mixed-effects meta-regression.
In this talk, several methods for conducting mixed-effects meta-regression analyses are compared.
Specifically, seven residual heterogeneity estimators were combined with four different methods for
testing the statistical significance of the moderators included in the model: the standard, Wald-type
method, the untruncated Knapp and Hartung method, the truncated Knapp and Hartung method (as
the authors proposed in their seminal paper in 2003), and the permutation test. The 28 resulting
combinations were compared by means of a Monte Carlo simulation.
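For readers who want to reproduce this kind of comparison, the metafor package exposes several of the options discussed; the sketch below, on hypothetical data, contrasts the standard Wald-type test, the Knapp and Hartung adjustment as implemented in the package (which may correspond to only one of the variants compared in the talk), and a permutation test.

library(metafor)

# Hypothetical data: effect sizes, sampling variances, and one moderator.
dat <- data.frame(
  yi = c(0.10, 0.35, 0.42, 0.05, 0.60, 0.28),
  vi = c(0.04, 0.03, 0.05, 0.06, 0.04, 0.05),
  x  = c(1, 2, 3, 1, 4, 2)
)

# Standard Wald-type test of the moderator.
# method selects the residual heterogeneity estimator, e.g., "DL", "HE", "SJ", "EB".
res.wald <- rma(yi, vi, mods = ~ x, data = dat, method = "REML", test = "z")

# Knapp and Hartung adjustment (the package implements one particular variant).
res.kh <- rma(yi, vi, mods = ~ x, data = dat, method = "REML", test = "knha")

# Permutation test of the moderator coefficient.
permutest(res.wald, exact = FALSE, iter = 1000)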
The results did not differ with respect to the residual heterogeneity estimator used. However, some
noteworthy differences were found depending on the method employed for testing the model
coefficients. Regarding the Type I error, the standard method showed inflated rejection probabilities
when the amount of residual heterogeneity was large, especially when the number of studies was
small. On the other hand, for small amounts of residual heterogeneity, the standard method showed
overly conservative rejection probabilities (i.e., below .05). The truncated Knapp and Hartung method
was also overly conservative, but essentially across all conditions. This, in turn, led to a noticeable
The reliability generalization (RG) approach is a new kind of meta-analysis aimed at statistically
integrating reliability coefficients obtained in different applications of the same test, in order to
determine whether score reliability can be generalized to different participant populations, contexts
and adaptations of the test. RG studies usually calculate an average reliability coefficient, assess the
heterogeneity assumption and search for moderator variables that can explain the variability of the
coefficients. Precursors of the RG approach have not established a single preferred analytic method,
giving freedom of choice to meta-analysts. The methods for analyzing reliability coefficients applied
in RG studies differ depending on whether: (a) the coefficients are or are not transformed, with
different transformation formulae available to normalize their distributions and homogenize their
variances; and (b) the coefficients are unweighted or some weighting method is applied (under the
assumption of a fixed- or a random-effects model). By means of a
real example, we illustrate how using different statistical methods in an RG study can influence
results. Specifically, results from an RG study of the Maudsley Obsessive-Compulsive Inventory
(MOCI) are presented. The implications of our results for the RG practice are discussed.
This research has been funded by the Fundación Séneca, Murcia County, Spain (Project nº. 08650/PHCS/08).
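To make the analytic choices concrete, the base-R sketch below contrasts an unweighted average of raw alpha coefficients with an inverse-variance weighted average of transformed coefficients, using a Bonett-type transformation as one of the several options mentioned above; the coefficients, sample sizes, and number of items are hypothetical, and the variance formula is an approximation.

# Hypothetical Cronbach's alpha coefficients, sample sizes, and number of items.
alpha <- c(0.82, 0.75, 0.88, 0.70, 0.79)
n     <- c(120, 85, 200, 60, 150)
k     <- 20   # items in the (hypothetical) test, assumed constant across studies

# Untransformed, unweighted average.
mean(alpha)

# One option among several: Bonett-type transformation L = ln(1 - alpha),
# with approximate sampling variance 2k / ((k - 1) * (n - 2)),
# combined with inverse-variance (fixed-effect) weighting.
L  <- log(1 - alpha)
vL <- 2 * k / ((k - 1) * (n - 2))
w  <- 1 / vL
L.bar <- sum(w * L) / sum(w)

# Back-transform the weighted mean to the alpha metric.
1 - exp(L.bar)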
Juan Botella
The designs of the studies providing estimates of the reliability of scores from a given test vary
considerably, especially in the sampling frames. Furthermore, the variance of the scores in any study
strongly depends on the way the participants are selected for inclusion. Although this source of
variability in the estimates has often been acknowledged in studies of Reliability Generalization (RG),
it has rarely been incorporated in the statistical analyses. First, I will show the results of several
simulations that illustrate the strong effect of this artifact on the heterogeneity of the coefficients of
internal consistency (Cronbach’s alpha). Second, I will propose a way to deal with it (Botella, Suero,
& Gambara, Psychological Methods, in press). It is based on comparing the incremental fit of nested
models, and tries to reach parsimonious conclusions. Finally, I will show several examples of how the
conclusions of a Reliability Generalization can be affected by this source of heterogeneity.
Wolfgang Viechtbauer
School for Public Health and Primary Care, Maastricht University, The Netherlands
R is a computer program for performing statistical analyses and producing graphics and is becoming
the tool of choice for those conducting statistical analyses in various fields. One of the great
advantages of R is that it is freely available via the internet. It is distributed as open source under
the GNU General Public License (GPL) and runs on a wide variety of platforms, including
Linux/Unix, Windows, and Mac OS X. In addition, the availability of over 2000 user-contributed add-
on packages has tremendously helped to increase R's popularity.
The metafor package (Viechtbauer, 2009) consists of a collection of functions for conducting meta-
analyses in R. The package grew out of a function written by the author several years ago
(Viechtbauer, 2006), which has since been successfully applied in several published meta-analyses.
The package allows users to easily fit fixed- and random/mixed-effects models with and without
moderators. For 2x2 table data, the Mantel-Haenszel and Peto methods are also implemented.
Moreover, the package provides various plot functions (e.g., for forest, funnel, and radial plots) and
functions for assessing the model fit, for obtaining case diagnostics, and for conducting funnel
asymmetry tests.
In this talk, I will demonstrate the current capabilities of the package with several examples, describe
some implementation details, and discuss plans for extending the package to handle multivariate and
dependent observations.
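A typical metafor session of the kind demonstrated in the talk might look as follows, using the BCG vaccine dataset distributed with the package (exact arguments may differ across package versions):

library(metafor)

# Example using the BCG vaccine dataset that ships with the package.
data(dat.bcg)

# Compute log relative risks and sampling variances from the 2x2 tables.
dat <- escalc(measure = "RR", ai = tpos, bi = tneg, ci = cpos, di = cneg, data = dat.bcg)

# Random-effects model, and a mixed-effects model with absolute latitude as moderator.
res  <- rma(yi, vi, data = dat, method = "REML")
res2 <- rma(yi, vi, mods = ~ ablat, data = dat, method = "REML")

# Mantel-Haenszel method applied directly to the 2x2 tables.
res.mh <- rma.mh(measure = "RR", ai = tpos, bi = tneg, ci = cpos, di = cneg, data = dat.bcg)

# Plots and a funnel plot asymmetry (regression) test.
forest(res)
funnel(res)
regtest(res)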
Viechtbauer, W. (2006). MiMa: An S-Plus/R Function to Fit Meta-Analytic Mixed-, Random-, and Fixed-
Effects Models [Computer software and manual]. Retrieved from http://www.wvbauer.com/
Viechtbauer, W. (2009). The metafor Package, Version 1.0-1 [Computer software and manual]. Retrieved
from http://cran.r-project.org/package=metafor
One problem with including information from multiple treatment groups from a single study in meta-
analysis is that the effects may be correlated (i.e., stochastically dependent), especially when the
treatment groups are contrasted with a single control group (Gleser and Olkin, 2009; Kalaian and
Kasim, 2008). A small, but growing, body of methods has been proposed to address the issue of
stochastic dependence in meta-analysis due to multiple treatment groups (Becker, 2000). This study
examined how SEM can control stochastic dependence in meta-analysis (Cheung, 2008).
Similar to previous studies (e.g., Raudenbush and Bryk, 2002; Van Den Noortgate and Onghena,
2003), this simulation study found dramatic biasing effects of ignoring stochastic dependence in a
univariate SEM based meta-analysis, including underestimation of the standard error (S.E.) and Type-
I error inflation. Importantly, the simulation results also indicate that the multivariate approach to
SEM based meta-analysis accurately estimated S.E. and controlled Type-I error to chance levels.
Implications of these results are discussed.
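The dependence problem itself is easy to state in code. The sketch below uses hypothetical effect sizes in which one study contributes two estimates sharing a control group; it handles the dependence with metafor's rma.mv and an assumed covariance term, rather than with the SEM-based formulation examined in this study, but the multivariate logic is the same.

library(metafor)

# Hypothetical data: study 2 contributes two effect sizes that share a control group.
dat <- data.frame(
  study = c(1, 2, 2, 3),
  es    = c(1, 1, 2, 1),
  yi    = c(0.30, 0.45, 0.25, 0.10),
  vi    = c(0.05, 0.04, 0.04, 0.06)
)

# Assumed covariance between the two effects in study 2 (e.g., induced by the
# shared control group); all other effects are treated as independent.
V <- diag(dat$vi)
V[2, 3] <- V[3, 2] <- 0.02

# Multivariate random-effects model that accounts for the dependence.
res <- rma.mv(yi, V, random = ~ 1 | study/es, data = dat)
summary(res)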
Becker, B.J. (2000). Multivariate meta-analysis. In H.E.A. Tinsley & S.D. Brown (Eds.), Handbook of
applied multivariate statistics and mathematical modeling (pp. 499-525). Academic Press
Cheung, M. W. (2008). A model for integrating fixed-, random-, and mixed-effects meta-analyses into
structural equation modeling. Psychological Methods, 13: 182 – 202.
Gleser, L. J., & Olkin, I. (2009). Stochastically Dependent Effect Sizes. In H. M. Cooper, L. V. Hedges &
J. C. Valentine (Eds.), The Handbook of Research Synthesis and Meta-Analysis (2nd ed., pp. 357-376).
New York, NY: Russell Sage Foundation.
Kalaian, S. A., & Kasim, R. M. (2008). Multilevel Methods for Meta-Analysis. In A. A. O'Connell & D. B.
McCoach (Eds.), Multilevel Modeling for Educational Data (pp. 315-343): Information Age Publishing, Inc
Raudenbush, S. W., & Bryk, A. S. (2002). Applications in Meta-Analysis and Other Cases where Level-1
Variances are Known. In S. W. Raudenbush & A. S. Bryk (Eds.), Hierarchical Linear Models:
Applications and Data Analysis Methods (2nd ed., pp. 205-227). Thousand Oaks, CA: Sage Publications,
Inc.
Van Den Noortgate, W., & Onghena, P. (2003). Multilevel meta-analysis: A comparison with traditional
meta-analytical procedures. Educational and Psychological Measurement, 63(5), 765-790.
Ariel M. Aloe
The rsp index is the semi-partial correlation of a predictor with the outcome of interest. This effect
size can be computed when multiple predictor variables are included in each model in a meta-
analysis, and represents a partial effect size in the correlation family. Specifically, this index has been
proposed for use in the context of meta-analysis when primary studies report regression analyses but
do not include correlation matrices. In the current research, methods for synthesizing series of rsp
values are studied under different conditions in the primary studies. I examine variations in sample
size, the degree of correlation among predictors and between the predictors and dependent variable,
and the number of predictors in the model.
Further Results on Robust Variance Estimates for Meta-analysis Involving Correlated Effect
Size Estimates
In the past I have written about ways to use synthesized correlation matrices to estimate linear
regression models in meta-analysis. Recently I have begun to examine the situation where correlations
that represent treatment effects (i.e., r values obtained by transforming standardized-mean-difference
effect sizes) are combined in a similar fashion. In this presentation I will examine synthesized
regression models that represent effects of multiple treatments derived from a single sample.
Comparisons will be made between effects based on two ways of computing the effect size d (using
the mean-square within from a two-way design versus using the standard pooled variance that would
be obtained from a t test), and the impact of confounding of (or interactions among) the treatments on
that process will also be examined.
G Lu & AE Ades
In psychological research, evidence on the treatments of interest and their comparators is often
presented on multiple outcome scales, even within the same trial. These scales often measure similar
constructs, for example the Beck and Hamilton scales for depression. It would be sensible to combine
information from the different measures to make the most efficient use of the data.
There are three main types of data: trial evidence (aggregated data on one or more outcome
measures), mapping evidence (on converting one scale-score into another) and external evidence (on
test-retest and intra- and inter-rater reliabilities of outcome measures, and on observed correlations between
test instruments). This paper provides a statistical analysis for combining these types of evidence on a
single baseline measurement. We develop a framework for synthesis of multiple outcomes that takes
account of not only correlations between outcome measures, but also measurement errors. In this
framework we ‘map’ all the outcome information into the baseline scale. The effects of measurement
error on the mappings and on the variance-covariance structures are analysed in detail and then
incorporated into the synthesis process. We show that in the absence of measurement error there
We discuss a work in progress with emphasis on the method rather than on substantive results. A
group of former students presented us with a problem in which they wished to meta-analyze
standardized mean differences from studies with varying numbers of means that arose in the context
of repeated-measures designs. (We are deliberately vague about the details of the problem, as this is
an ongoing project on a topic of current interest in psychology, and the data are not our own.)
Although they envisioned analyzing multiple differences between means, with the number of
comparisons depending on the number of repetitions in the study, it became clear in discussion that
what they really needed was a growth curve function.
We describe an algorithm for accomplishing the analysis. First, we standardize the outcome metrics.
Next, we fit a polynomial regression for each study. Then we adjust the covariance matrix of the
sampling distribution of regression parameters to revert to the metric of raw data rather than means.
We perform a multivariate meta-analysis of the regression parameters with a random-effects error
component added to the intercept.
Note that the adjustment at the second step does not correctly reflect the true error structure of the
original repeated-measures design. As no relevant information about within-subjects error variance is
available, we conduct a sensitivity analysis by attenuating the diagonals of the covariance matrices in
varying degrees while maintaining the necessary positive definiteness of the matrix.
We illustrate the process with a partial data set.
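A schematic sketch of such a pipeline, not the authors' algorithm, is given below: it starts from already standardized means, fits a first-degree (linear) rather than higher-order polynomial per study with known sampling variances, and combines the coefficient vectors in a multivariate meta-analysis; the covariance adjustment and the restriction of the random effect to the intercept are omitted, and all data are invented.

library(metafor)

# Hypothetical input: per study, standardized means at several time points and
# their approximate sampling variances (roughly 1/n for standardized means).
studies <- list(
  s1 = data.frame(time = 0:3, y = c(0.00, 0.25, 0.45, 0.55), v = rep(1/40, 4)),
  s2 = data.frame(time = 0:2, y = c(0.05, 0.30, 0.50), v = rep(1/25, 3)),
  s3 = data.frame(time = 0:4, y = c(-0.05, 0.20, 0.35, 0.50, 0.60), v = rep(1/60, 5))
)

# Fit a polynomial (here linear) per study by weighted least squares,
# keeping the coefficient estimates and their covariance matrix.
fits <- lapply(studies, function(d) {
  X  <- model.matrix(~ time, data = d)
  W  <- diag(1 / d$v)
  Vb <- solve(t(X) %*% W %*% X)          # covariance of coefficients, variances known
  b  <- drop(Vb %*% t(X) %*% W %*% d$y)
  list(b = b, V = Vb)
})

# Stack the coefficients and build a block-diagonal covariance matrix.
yi <- unlist(lapply(fits, function(f) f$b))
V  <- bldiag(lapply(fits, function(f) f$V))

dat <- data.frame(
  study = rep(names(studies), each = 2),
  coefn = rep(c("intercept", "slope"), times = length(studies)),
  yi    = yi
)

# Multivariate meta-analysis of the growth-curve coefficients with
# study-level random effects for each coefficient.
res <- rma.mv(yi, V, mods = ~ coefn - 1, random = ~ coefn | study,
              struct = "DIAG", data = dat)
res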
Petra Macaskill
Cochrane reviews of studies of diagnostic accuracy are now being conducted. Statistical methods
currently recommended for such reviews require that a 2×2 table be extracted for each study to
provide the number of true positives, true negatives, false positives and false negatives from which an
estimate of sensitivity and also specificity of the test may be computed. Sensitivity and specificity are
expected to be negatively correlated across studies and should generally be analysed jointly.
At present, the two recommended approaches for modelling such data are (i) the bivariate model
which focuses on making inferences about a summary operating point (1-specificity, sensitivity), and
Single-case designs (SCDs) are short interrupted time series where an intervention is repeatedly given
to and removed from a single case (typically a person, but sometimes an aggregate like a classroom).
These designs are widely used in parts of education, psychology, and medicine when better designs
such as a randomized trial are not feasible, ethical or optimal for the patient. Despite the fact that
these designs are viewed by many researchers as providing credible evidence of cause-effect
relationships, they have not generally been included in systematic reviews. One reason for this is a lack
of consensus about how data from these designs should be analyzed and aggregated. We have recently
proposed an effect size estimator that is comparable to the usual between groups standardized mean
difference statistic, and also derived a conditional variance for that estimator. The latter depends on
many features of the SCD including the autocorrelation of the data points over time, the number of
SCDs within a publication, the number of time points in the SCD and within each phase, and the
number of phases. Of particular interest in the continued development of this estimator and its
variance is their performance in computer simulations that vary the level of each of these features.
The present research will help to determine those levels by examining the existing SCD literature to
see what levels are representative of what is done when these designs are actually used. We report the
results of a survey of all publications that included SCDs during the year 2008 in a set of 21 journals
in the fields of psychology, education, and autism. We have completed initial surveys, and are in the
process now of extracting data about the design features of interest. Preliminary examination suggests
that these 21 journals published 118 articles reporting results from SCDs. Those articles contained a
total of 876 separate reports of SCDs, each report being a combination of a case and a dependent
variable. We have some additional preliminary results. For example, by far the most common metric
(80+%) for the outcome data is some form of a count, with less than 10% of the outcomes plausibly
described as normally distributed continuous data. This has significant implications for the models
used to analyze such data. We are currently coding data on additional variables, and are extracting the
raw data from the SCDs to use in computing autocorrelations. We will present as much of these data
as is available at the time of the conference.
Introduction Primary-level studies generally omit structural-level variables due to the difficulty of
incorporating them. Yet clearly structural-level variables may interact with finer-grained
factors to influence a phenomenon (Johnson et al., in press) and meta-analytic techniques can discover
such trends (e.g., Bond & Smith, 1996). This paper illustrates these patterns by examining how geo-
temporal information (e.g., Human Development Index) may relate to the efficacy of international
HIV prevention trials, sometimes over and above the contribution of information about the trials
themselves.
Method A wide variety of systematic search strategies in different languages were used with
electronic databases to find eligible publications within two literatures. (a) One had 33 interactive,
relatively intensive controlled interventions that had condom use outcomes and took place in Latin
America or the Caribbean (N=34,597). (b) The other had 95 interventions from around the
world that included a media component, had condom use (or other risk marker) outcomes, and compared
against a control or a baseline (N=130,412). Structural-level, study, sample, and intervention
characteristics were coded. We followed a structural equation modeling strategy for meta-analysis
(Cheung, 2008) in order to accommodate variables at different levels.
Results and Conclusions Both meta-analyses revealed that overall, interventions increased condom
use. Although a number of individual-level variables significantly related to the magnitude of effect
sizes, several became non-significant or exhibited between-level interactions after structural factors
were included in the models. Specifically, interventions succeeded better in countries with lower
human development index values (an index that integrates standardized measures of life
expectancy, literacy, educational attainment, and GDP per capita), lower Gini coefficients (a measure
of income inequality), or lower HIV prevalence. Both patterns reveal that intensive HIV prevention
activities succeed best where and when the need and the inequality in the population are the greatest.
Those comparisons suggest that structural factors can be quite powerful predictors of behavior, and
may have a differential impact depending upon the cultural context. Advantages of this multi-level
approach are instant interdisciplinary implications and the possibility of geographically mapping
complex results in order to highlight best where knowledge is lacking. Disadvantages such as
restriction of range and the possibility of confounds among variables (e.g., Human Development
Index with HIV knowledge) will also be discussed.
Bond, R. & Smith, P.B. (1996). Culture and Conformity: A Meta-analysis of studies using Asch’s (1952b,
1956) line judgment task. Psychological Bulletin, 119: 111 – 137.
Cheung, M. W. (2008). A model for integrating fixed-, random-, and mixed-effects meta-analyses into
structural equation modeling. Psychological Methods, 13: 182 – 202.
Johnson, B. T., Redding, C. A., DiClemente et al. (in press). A Network-Individual-Resource model for
HIV prevention. AIDS & Behavior.
Multilevel models are increasingly used to model the between-study and the sampling variation, and
to look for moderator variables that explain the between-study variation. A major advantage of using
multilevel models for meta-analysis is their flexibility, which allows fitting models that may
better match the kind of data and the research questions. One possibility that is seldom mentioned in
the methodological meta-analytic literature, and typically not implemented in software for meta-
analysis, is the distinction of a third level of variation to model dependencies between studies (e.g.,
occurring when several studies stem from the same research group) or within studies (e.g., occurring
when within a study multiple samples were drawn).
In the presentation we try to clarify, using real data examples and a simulation study, in which
situations three-level models are appropriate to solve the problem of dependent effect sizes.
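A minimal sketch of such a three-level model, using metafor's rma.mv on hypothetical data (several effect sizes nested within studies), is:

library(metafor)

# Hypothetical data: multiple effect sizes nested within studies.
dat <- data.frame(
  study = c(1, 1, 2, 2, 2, 3, 4, 4),
  esid  = c(1, 2, 1, 2, 3, 1, 1, 2),
  yi    = c(0.20, 0.35, 0.10, 0.05, 0.15, 0.50, 0.30, 0.25),
  vi    = c(0.04, 0.05, 0.03, 0.03, 0.04, 0.06, 0.05, 0.05)
)

# Three-level model: sampling variation (level 1), variation between effect
# sizes within studies (level 2), and variation between studies (level 3).
res <- rma.mv(yi, vi, random = ~ 1 | study/esid, data = dat)
summary(res)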
This paper presents methods for second order meta-analysis. A second order meta-analysis is a meta-
analysis of a number of statistically independent meta-analyses that were conducted to estimate the
same relation in different populations. Meta-analysis greatly reduces the sampling error variance in an
estimate of an effect size or relation but does not completely eliminate sampling error. The residual
sampling error is called second order sampling error. The purpose of a second order meta-analysis is
to estimate how much of the variance in mean effect sizes across meta-analyses is attributable to
second order sampling error. We present equations and methods for second order meta-analysis for
three situations: (a) where the first order meta-analyses corrected for only sampling error; (b) where
the first order meta-analyses corrected each effect size for measurement error (and other artifacts, if
applicable); and (c) where the first order meta-analyses used the artifact distribution method to correct
for measurement error (and other artifacts if applicable). All methods and equations are random
effects (RE) models. We also present an empirical application of second order meta-analysis. For each
of five personality traits, meta-analyses have been conducted separately in five East Asian countries
relating the personality traits to job performance. For each personality trait, it appeared that the mean
correlation varied over the five countries. However, a second order meta-analysis showed that for four
of the traits all variance of these values across countries was attributable to second order sampling
error, resulting in a more parsimonious explanation. Other areas in which second order meta-analysis
might be applied are also discussed.
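The core calculation for situation (a) can be conveyed with a deliberately simplified, unweighted sketch (the methods in the paper use weighted estimators and, where applicable, artifact corrections); all values below are hypothetical.

# Hypothetical input: mean correlations from m independent first-order
# meta-analyses and the standard errors of those means.
rbar <- c(0.22, 0.18, 0.25, 0.20, 0.15)
se   <- c(0.03, 0.04, 0.05, 0.03, 0.06)

# Observed variance of the meta-analytic means across the m meta-analyses.
var.obs <- var(rbar)

# Expected variance due to second-order sampling error alone
# (here approximated by the average squared standard error of the means).
var.2nd <- mean(se^2)

# Proportion of the observed variance attributable to second-order sampling
# error; values near 1 suggest the means do not truly differ across populations.
min(var.2nd / var.obs, 1)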
Pre-post effect sizes for the separate arms of experimental comparisons isolate the different treatments
represented in each arm when the available studies consist mainly of treatment-treatment comparisons,
but they, of course, lack experimental control. Group comparison effect sizes for outcomes maintain
experimental control but do not permit the comparative effectiveness of the different treatments to be
easily determined when they are compared with each other in unsystematic and incomplete patterns.
In a meta-analysis of the effectiveness of treatment for adolescent substance abuse, we explored an
approach to integrating meta-analyses of these different effect sizes to assess comparative treatment
effectiveness. Using meta-regression to control for differences between study samples, measures, and
methods, estimates of the effects of each treatment type were derived from (a) analysis of the pre-post
effect sizes, (b) analysis of the group comparison effect sizes, and (c) analysis of synthetic group
comparisons in which one arm was statistically controlled to provide a presumptively constant basis
of comparison across the treatments in the other arm. The results of these three analyses were then
compared and integrated to draw conclusions about comparative treatment effects. This presentation
will describe that approach and invite our more sophisticated colleagues to poke holes in it.
Quantifying Selective Reporting and the Proteus Phenomenon for Multiple Datasets with
Similar Bias
Program for Evolutionary Dynamics, Harvard University, USA and Department of Hygiene and
Epidemiology, University of Ioannina, Greece
Meta-analyses play an important role in synthesizing evidence from diverse studies and datasets that
address similar questions. One of the major obstacles for meta-analyses arises from biases in
reporting. Results published in scientific publications often present a non-random sample from all the
results that have been obtained. In particular, it is speculated that findings that do not achieve formal
statistical significance may be less likely to be reported than statistically significant findings. Statistical
methods have been proposed for the detection and correction of selective reporting bias. When
applied to a single meta-analysis that covers a small number of studies, however, these methods often
have limited statistical power. Here we present an extension of previous methods for analyzing
selective reporting bias. We model selective reporting based on a combined analysis of different
datasets that are assumed to be subject to the same bias. We illustrate our methods on a dataset on the
genetic basis of Alzheimer’s disease (AD). The dataset covers 1167 results from case-control studies
on 102 genetic markers. While different genetic markers may differ in their association with AD, we
assume that biases in scientific publishing are the same for all markers in the field. Analyzing such a
combined dataset increases the statistical power to quantify selective reporting and also allows
detecting more complex bias patterns. For the AD dataset we observe that initial studies on a genetic
marker tend to be substantially more biased than subsequent replications. Moreover, early replications
tend to be biased against initial findings, an observation that previously has been termed Proteus
phenomenon. Our findings imply that dynamic patterns in bias, which arise from the combination of
publication bias, initial-study bias, and the Proteus phenomenon, are difficult to correct for with
conventional methods where typically simple publication bias is assumed to operate. Moreover, our
methods provide a basis for information and decision theoretical modeling of selective reporting and
thereby allow us to address the question of how to deal optimally with the resulting biases.
Julia H. Littell
Background. There is ample evidence of bias in the reporting and publication of RCTs in health care.
Studies of publication bias first appeared over fifty years ago in psychology, and there is reason to
believe that this bias persists in psychology and related fields, yet there has been relatively little
research on publication bias (and related problems) outside of medicine. Many recently published
meta-analyses on psychotherapy and other social and behavioral interventions are based entirely on
published studies, and many do not make use of available methods for detecting publication bias.
Objectives
1. To determine the prevalence of incomplete reporting of clinical trials in psychology and social
welfare.
2. To assess the association between incomplete reporting and the direction and statistical significance
of results.
3. To assess the extent of publication bias in clinical trials in psychology and social welfare.
4. To assess the completeness and accuracy of published reviews of clinical trials.
Methods. Inception cohorts will include studies approved by research ethics committees and those
identified in prospective registers. Abstract cohorts will be drawn from presentations at conventions
of the American Psychological Association, Association for Public Policy Analysis and Management,
and the Society for Social Work and Research. We will identify the number and characteristics of
primary and secondary outcomes reported in trial protocols, conference presentations, and published
reports. Trialists will be surveyed to confirm outcomes and publication status. For a subsample of
published trials, we will track citations of these studies to identify relevant published reviews; we will
determine which results were selected for inclusion and how these results were presented in narrative
reviews, systematic reviews, and meta-analyses.
Limitations. Prospective registration of trials appears to be less common in the social and behavioral
sciences than in medicine; thus, we will probably need the cooperation of research ethics boards to
obtain a sizable inception cohort. As in previous cohort studies, we may obtain a low response rate from
trialists, and their responses may be unreliable.
Benefits. Empirical evidence on reporting, publication, and dissemination patterns can inform
reviewers about steps needed to minimize bias in systematic reviews and meta-analyses in psychology
and social welfare.
The need for principles for meta-analysis utilizing non–peer-reviewed data in the public domain
Jesse A. Berlin
The increasing availability of clinical trial results on websites such as www.clinicaltrials.gov promises
to increase the transparency of the research process and to make the conduct of systematic
reviews and meta-analyses faster and easier. This availability also raises some important questions.
For example, study reports may not always be peer-reviewed, and may not be reported in any kind of
standardized format. In this context, it’s relevant to ask whether we need additional guidelines for
meta-analyses, or simply an expansion and clarification of those that exist. Two published papers
show mixed compliance with QUOROM guidelines, and particularly show that flow diagrams are
infrequently included. One of these papers (Hind and Booth) examined published monographs from
the UK NHS Health Technology Assessment (HTA) programme. (As a methodologic aside, in the
Biondi et al. paper, QUOROM scores and Guyatt-Oxman scores were not associated (R = -0.06, p =
0.86).)
The presentation will provide a series of principles and questions specific to the use of publicly
“posted” clinical trial results. For example, can data and reporting standards for trials be implemented,
to allow automation of the data extraction process? Are additional reporting standards needed for
Biondi-Zoccai et al. Compliance with QUOROM and quality of reporting of overlapping meta-analyses on
the role of acetylcysteine in the prevention of contrast associated nephropathy: case study. BMJ,
doi:10.1136/bmj.38693.516782.7C (published 16 January 2006).
Hind D, Booth A. Do health technology assessments comply with QUOROM diagram guidance? An
empirical study. BMC Medical Research Methodology 2007,7:49.
Christopher Rhoads
The typical approach to missing data in experimental studies has been to make untestable assumptions
about the missing data mechanism in order to obtain a point estimate of the treatment effect. As a
result, meta-analysts combining the results of these studies have almost always implicitly maintained
the same untestable assumptions about the missing data within each study. Recent research, mainly in
econometrics, has asked what can be learned about treatment effects without making untestable
assumptions about the missing data. In this approach, the results of the experiment are summarized
not by a single effect size estimate, but rather by a “partial identification region” which contains all
feasible estimates consistent with the observed data. The current paper explores methods for the
meta-analysis of partial identification regions under a fixed effects meta-analytic model. The
conditions under which the addition of more studies narrows the identification region are clarified.
The difference between the results of a typical meta-analysis that would assume missing data is
missing at random within each study and a meta-analysis of partial identification regions is illustrated
via a practical example. It is noted that under certain conditions the usual approach can result in logically
impossible estimates. Possible extensions to a random effects meta-analytic model are explored.
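To convey the flavor of the approach (this is not the estimator developed in the paper), the sketch below computes worst-case bounds on a risk difference for each of two hypothetical studies with missing binary outcomes and intersects them under a common-effect assumption.

# Hypothetical per-study counts for a binary outcome with missing data:
# observed successes (r) and observed totals (n) plus missing counts (m),
# separately for treatment (t) and control (c).
dat <- data.frame(
  rt = c(30, 45), nt = c(50, 80), mt = c(10, 15),
  rc = c(20, 35), nc = c(50, 80), mc = c( 8, 20)
)

# Worst-case bounds on each group mean: missing outcomes are assumed to be
# all failures (lower bound) or all successes (upper bound).
lo.t <- dat$rt / (dat$nt + dat$mt)
hi.t <- (dat$rt + dat$mt) / (dat$nt + dat$mt)
lo.c <- dat$rc / (dat$nc + dat$mc)
hi.c <- (dat$rc + dat$mc) / (dat$nc + dat$mc)

# Per-study identification regions for the risk difference.
lo.rd <- lo.t - hi.c
hi.rd <- hi.t - lo.c

# Under a fixed (common) effect, the combined region is the intersection of
# the study-level regions (empty if max(lower) exceeds min(upper)).
c(lower = max(lo.rd), upper = min(hi.rd))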
Using Bibliometric Temporal Trends to Predict Numbers of Studies Available for Research
Synthesis
Introduction Many if not most domains of investigation now have multiple systematic reviews, and
many of these are meta-analyses. Most of these reviews, in turn, attempt extensive overlapping
strategies to retrieve studies so as to maximize their evidential basis. Because most domains can be
characterized as producing new studies steadily, if not at an increasing rate, the bibliometric trends in past
research syntheses can provide the basis for predicting the numbers for new research syntheses in
these domains, which can assist in anticipating the amount of work that is necessary to produce a
Allen, I. E., & Olkin I. (1999). Estimating time to conduct a meta-analysis from number of citations
retrieved. JAMA, 282, 634-635.
Johnson, B. T., Carey, M. P., Marsh, K. L., Levin, K. D., & Scott-Sheldon, L. A. J. (2003). Interventions to
reduce sexual risk for the Human Immunodeficiency Virus in adolescents, 1985-2000: A research synthesis.
Archives of Pediatrics & Adolescent Medicine, 157, 381-388.
Johnson, B. T., Scott-Sheldon, L. A. J., & Carey, M. P. (2010). Meta-synthesis of health behavior change
meta-analyses. American Journal of Public Health. DOI: 10.2105/AJPH.2008.155200.