
Doing Meta-Analysis with R
A Hands-On Guide

Mathias Harrer
Pim Cuijpers
Toshi A. Furukawa
David D. Ebert
First edition published 2022
by CRC Press
6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742

and by CRC Press
2 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN

© 2022 Mathias Harrer, Pim Cuijpers, Toshi A. Furukawa, David D. Ebert

CRC Press is an imprint of Taylor & Francis Group, LLC

Reasonable efforts have been made to publish reliable data and information, but the author and
publisher cannot assume responsibility for the validity of all materials or the consequences of their use.
The authors and publishers have attempted to trace the copyright holders of all material reproduced
in this publication and apologize to copyright holders if permission to publish in this form has not
been obtained. If any copyright material has not been acknowledged please write and let us know so
we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information
storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, access www.copyright.com
or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923,
978-750-8400. For works that are not available on CCC please contact
mpkbookspermissions@tandf.co.uk.

Trademark notice: Product or corporate names may be trademarks or registered trademarks and are
used only for identification and explanation without intent to infringe.

Library of Congress Cataloging-in-Publication Data


Names: Harrer, Mathias, author.
Title: Doing meta-analysis with R : a hands-on guide / Mathias Harrer [and
three others].
Description: First edition. | Boca Raton : CRC Press, 2022. | Includes
bibliographical references and index.
Identifiers: LCCN 2021017096 (print) | LCCN 2021017097 (ebook) | ISBN
9780367610074 (hardback) | ISBN 9780367619770 (paperback) | ISBN
9781003107347 (ebook)
Subjects: LCSH: Meta-analysis. | R (Computer program language)
Classification: LCC R853.M48 H37 2022 (print) | LCC R853.M48 (ebook) |
DDC 610.727--dc23
LC record available at https://lccn.loc.gov/2021017096
LC ebook record available at https://lccn.loc.gov/2021017097

ISBN: 9780367610074 (hbk)
ISBN: 9780367619770 (pbk)
ISBN: 9781003107347 (ebk)

DOI: 10.1201/9781003107347

Typeset in Alegreya
by KnowledgeWorks Global Ltd.

The problems are solved, not by giving new information,
but by arranging what we have known since long.
– Ludwig Wittgenstein, Philosophical Investigations

Contents

Preface xiii

About the Authors xxiii

List of Symbols xxv

I Getting Started 1
1 Introduction 3
1.1 What Are Meta-Analyses? . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 “Exercises in Mega-Silliness”: A Historical Anecdote . . . . . . . . . 6
1.3 Apples and Oranges: A Quick Tour of Meta-Analysis Pitfalls . . . . 8
1.4 Problem Specification, Study Search & Coding . . . . . . . . . . . 11
1.4.1 Defining the Research Question . . . . . . . . . . . . . . . 12
1.4.2 Analysis Plan & Preregistration . . . . . . . . . . . . . . . . 16
1.4.3 Study Search . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.4.4 Study Selection . . . . . . . . . . . . . . . . . . . . . . . . 21
1.4.5 Data Extraction & Coding . . . . . . . . . . . . . . . . . . . 24
1.5 Questions & Answers . . . . . . . . . . . . . . . . . . . . . . . . . 26
1.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

2 Discovering R 29
2.1 Installing R & R Studio . . . . . . . . . . . . . . . . . . . . . . . . 29
2.2 Packages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.3 The {dmetar} Package . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.4 Data Preparation & Import . . . . . . . . . . . . . . . . . . . . . . 36
2.5 Data Manipulation . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.5.1 Class Conversion . . . . . . . . . . . . . . . . . . . . . . . 39
2.5.2 Data Slicing . . . . . . . . . . . . . . . . . . . . . . . . . . 42
2.5.3 Data Transformation . . . . . . . . . . . . . . . . . . . . . 44
2.5.4 Saving Data . . . . . . . . . . . . . . . . . . . . . . . . . . 46
2.6 Questions & Answers . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49


II Meta-Analysis in R 51
3 Effect Sizes 53
3.1 What Is an Effect Size? . . . . . . . . . . . . . . . . . . . . . . . . 54
3.2 Measures & Effect Sizes in Single Group Designs . . . . . . . . . . 59
3.2.1 Means . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.2.2 Proportions . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.2.3 Correlations . . . . . . . . . . . . . . . . . . . . . . . . . . 61
3.3 Effect Sizes in Control Group Designs . . . . . . . . . . . . . . . . 64
3.3.1 (Standardized) Mean Differences . . . . . . . . . . . . . . . 64
3.3.2 Risk & Odds Ratios . . . . . . . . . . . . . . . . . . . . . . 70
3.3.3 Incidence Rate Ratios . . . . . . . . . . . . . . . . . . . . . 76
3.4 Effect Size Correction . . . . . . . . . . . . . . . . . . . . . . . . . 80
3.4.1 Small Sample Bias . . . . . . . . . . . . . . . . . . . . . . . 80
3.4.2 Unreliability . . . . . . . . . . . . . . . . . . . . . . . . . . 81
3.4.3 Range Restriction . . . . . . . . . . . . . . . . . . . . . . . 84
3.5 Common Problems . . . . . . . . . . . . . . . . . . . . . . . . . . 87
3.5.1 Different Effect Size Data Formats . . . . . . . . . . . . . . 87
3.5.2 The Unit-of-Analysis Problem . . . . . . . . . . . . . . . . . 88
3.6 Questions & Answers . . . . . . . . . . . . . . . . . . . . . . . . . 90
3.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90

4 Pooling Effect Sizes 93


4.1 The Fixed-Effect & Random-Effects Model . . . . . . . . . . . . . . 94
4.1.1 The Fixed-Effect Model . . . . . . . . . . . . . . . . . . . . 95
4.1.2 The Random-Effects Model . . . . . . . . . . . . . . . . . . 99
4.2 Effect Size Pooling in R . . . . . . . . . . . . . . . . . . . . . . . . 105
4.2.1 Pre-Calculated Effect Size Data . . . . . . . . . . . . . . . . 108
4.2.2 (Standardized) Mean Differences . . . . . . . . . . . . . . . 112
4.2.3 Binary Outcomes . . . . . . . . . . . . . . . . . . . . . . . 115
4.2.4 Correlations . . . . . . . . . . . . . . . . . . . . . . . . . . 127
4.2.5 Means . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
4.2.6 Proportions . . . . . . . . . . . . . . . . . . . . . . . . . . 132
4.3 Questions & Answers . . . . . . . . . . . . . . . . . . . . . . . . . 136
4.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136

5 Between-Study Heterogeneity 139


5.1 Measures of Heterogeneity . . . . . . . . . . . . . . . . . . . . . . 140
5.1.1 Cochran’s 𝑄 . . . . . . . . . . . . . . . . . . . . . . . . . . 140
5.1.2 Higgins & Thompson’s 𝐼2 Statistic . . . . . . . . . . . . . . 145
5.1.3 The 𝐻2 Statistic . . . . . . . . . . . . . . . . . . . . . . . . 147
5.1.4 Heterogeneity Variance 𝜏2 & Standard Deviation 𝜏 . . . . . 147
5.2 Which Measure Should I Use? . . . . . . . . . . . . . . . . . . . . . 149
5.3 Assessing Heterogeneity in R . . . . . . . . . . . . . . . . . . . . . 150
5.4 Outliers & Influential Cases . . . . . . . . . . . . . . . . . . . . . . 153

5.4.1 Basic Outlier Removal . . . . . . . . . . . . . . . . . . . . . 153


5.4.2 Influence Analysis . . . . . . . . . . . . . . . . . . . . . . . 156
5.4.3 GOSH Plot Analysis . . . . . . . . . . . . . . . . . . . . . . 163
5.5 Questions & Answers . . . . . . . . . . . . . . . . . . . . . . . . . 170
5.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170

6 Forest Plots 173


6.1 What Is a Forest Plot? . . . . . . . . . . . . . . . . . . . . . . . . . 173
6.2 Forest Plots in R . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
6.2.1 Layout Types . . . . . . . . . . . . . . . . . . . . . . . . . . 177
6.2.2 Saving the Forest Plots . . . . . . . . . . . . . . . . . . . . 178
6.3 Drapery Plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
6.4 Questions & Answers . . . . . . . . . . . . . . . . . . . . . . . . . 182
6.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182

7 Subgroup Analyses 183


7.1 The Fixed-Effects (Plural) Model . . . . . . . . . . . . . . . . . . . 184
7.1.1 Pooling the Effect in Subgroups . . . . . . . . . . . . . . . 184
7.1.2 Comparing the Subgroup Effects . . . . . . . . . . . . . . . 185
7.2 Limitations & Pitfalls of Subgroup Analyses . . . . . . . . . . . . . 187
7.3 Subgroup Analysis in R . . . . . . . . . . . . . . . . . . . . . . . . 190
7.4 Questions & Answers . . . . . . . . . . . . . . . . . . . . . . . . . 194
7.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194

8 Meta-Regression 197
8.1 The Meta-Regression Model . . . . . . . . . . . . . . . . . . . . . . 198
8.1.1 Meta-Regression with a Categorical Predictor . . . . . . . . 198
8.1.2 Meta-Regression with a Continuous Predictor . . . . . . . . 200
8.1.3 Assessing the Model Fit . . . . . . . . . . . . . . . . . . . . 201
8.2 Meta-Regression in R . . . . . . . . . . . . . . . . . . . . . . . . . 203
8.3 Multiple Meta-Regression . . . . . . . . . . . . . . . . . . . . . . . 206
8.3.1 Interactions . . . . . . . . . . . . . . . . . . . . . . . . . . 207
8.3.2 Common Pitfalls in Multiple Meta-Regression . . . . . . . . 209
8.3.3 Multiple Meta-Regression in R . . . . . . . . . . . . . . . . 212
8.4 Questions & Answers . . . . . . . . . . . . . . . . . . . . . . . . . 224
8.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225

9 Publication Bias 227


9.1 What Is Publication Bias? . . . . . . . . . . . . . . . . . . . . . . . 228
9.2 Addressing Publication Bias in Meta-Analyses . . . . . . . . . . . . 230
9.2.1 Small-Study Effect Methods . . . . . . . . . . . . . . . . . . 231
9.2.2 P-Curve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254
9.2.3 Selection Models . . . . . . . . . . . . . . . . . . . . . . . . 272
9.3 Which Method Should I Use? . . . . . . . . . . . . . . . . . . . . . 281
9.4 Questions & Answers . . . . . . . . . . . . . . . . . . . . . . . . . 283
9.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283

III Advanced Methods 285


10 “Multilevel” Meta-Analysis 287
10.1 The Multilevel Nature of Meta-Analysis . . . . . . . . . . . . . . . . 287
10.2 Fitting Three-Level Meta-Analysis Models in R . . . . . . . . . . . . 291
10.2.1 Model Fitting . . . . . . . . . . . . . . . . . . . . . . . . . 293
10.2.2 Distribution of Variance across Levels . . . . . . . . . . . . 295
10.2.3 Comparing Models . . . . . . . . . . . . . . . . . . . . . . 296
10.3 Subgroup Analyses in Three-Level Models . . . . . . . . . . . . . . 298
10.4 Questions & Answers . . . . . . . . . . . . . . . . . . . . . . . . . 301
10.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301

11 Structural Equation Modeling Meta-Analysis 303


11.1 What Is Meta-Analytic Structural Equation Modeling? . . . . . . . 304
11.1.1 Model Specification . . . . . . . . . . . . . . . . . . . . . . 304
11.1.2 Meta-Analysis from a SEM Perspective . . . . . . . . . . . . 307
11.1.3 The Two-Stage Meta-Analytic SEM Approach . . . . . . . . 308
11.2 Multivariate Meta-Analysis . . . . . . . . . . . . . . . . . . . . . . 309
11.2.1 Specifying the Model . . . . . . . . . . . . . . . . . . . . . 311
11.2.2 Evaluating the Results . . . . . . . . . . . . . . . . . . . . . 313
11.2.3 Visualizing the Results . . . . . . . . . . . . . . . . . . . . 315
11.3 Confirmatory Factor Analysis . . . . . . . . . . . . . . . . . . . . . 316
11.3.1 Data Preparation . . . . . . . . . . . . . . . . . . . . . . . 317
11.3.2 Model Specification . . . . . . . . . . . . . . . . . . . . . . 319
11.3.3 Model Fitting . . . . . . . . . . . . . . . . . . . . . . . . . 324
11.3.4 Path Diagrams . . . . . . . . . . . . . . . . . . . . . . . . . 326
11.4 Questions & Answers . . . . . . . . . . . . . . . . . . . . . . . . . 328
11.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 328

12 Network Meta-Analysis 329


12.1 What Are Network Meta-Analyses? . . . . . . . . . . . . . . . . . . 330
12.1.1 Direct & Indirect Evidence . . . . . . . . . . . . . . . . . . 330
12.1.2 Transitivity & Consistency . . . . . . . . . . . . . . . . . . 332
12.1.3 Network Meta-Analysis Models . . . . . . . . . . . . . . . . 334
12.2 Frequentist Network Meta-Analysis . . . . . . . . . . . . . . . . . 335
12.2.1 The Graph Theoretical Model . . . . . . . . . . . . . . . . . 336
12.2.2 Frequentist Network Meta-Analysis in R . . . . . . . . . . . 338
12.3 Bayesian Network Meta-Analysis . . . . . . . . . . . . . . . . . . . 356
12.3.1 Bayesian Inference . . . . . . . . . . . . . . . . . . . . . . 356
12.3.2 The Bayesian Network Meta-Analysis Model . . . . . . . . . 358
12.3.3 Bayesian Network Meta-Analysis in R . . . . . . . . . . . . 360
12.3.4 Network Meta-Regression . . . . . . . . . . . . . . . . . . 373
12.4 Questions & Answers . . . . . . . . . . . . . . . . . . . . . . . . . 378
12.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 378

13 Bayesian Meta-Analysis 381


13.1 The Bayesian Hierarchical Model . . . . . . . . . . . . . . . . . . . 381
13.2 Setting Prior Distributions . . . . . . . . . . . . . . . . . . . . . . 383
13.3 Bayesian Meta-Analysis in R . . . . . . . . . . . . . . . . . . . . . 385
13.3.1 Fitting the Model . . . . . . . . . . . . . . . . . . . . . . . 386
13.3.2 Assessing Convergence . . . . . . . . . . . . . . . . . . . . 387
13.3.3 Interpreting the Results . . . . . . . . . . . . . . . . . . . . 389
13.3.4 Generating a Forest Plot . . . . . . . . . . . . . . . . . . . . 391
13.4 Questions & Answers . . . . . . . . . . . . . . . . . . . . . . . . . 395
13.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395

IV Helpful Tools 397


14 Power Analysis 399
14.1 Fixed-Effect Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 401
14.2 Random-Effects Model . . . . . . . . . . . . . . . . . . . . . . . . 404
14.3 Subgroup Analyses . . . . . . . . . . . . . . . . . . . . . . . . . . . 406

15 Risk of Bias Plots 407


15.1 Data Preparation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 408
15.2 Summary Plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 410
15.3 Traffic Light Plots . . . . . . . . . . . . . . . . . . . . . . . . . . . 411

16 Reporting & Reproducibility 413


16.1 Using R Projects . . . . . . . . . . . . . . . . . . . . . . . . . . . . 413
16.2 Writing Reproducible Reports with R Markdown . . . . . . . . . . 415
16.3 OSF Repositories . . . . . . . . . . . . . . . . . . . . . . . . . . . . 417
16.3.1 Access Token . . . . . . . . . . . . . . . . . . . . . . . . . . 417
16.3.2 The {osfr} Package & Authentication . . . . . . . . . . . . . 418
16.3.3 Repository Setup . . . . . . . . . . . . . . . . . . . . . . . . 418
16.3.4 Upload & Download . . . . . . . . . . . . . . . . . . . . . . 419
16.3.5 Collaboration, Open Access & Pre-Registration . . . . . . . 420

17 Effect Size Calculation & Conversion 423


17.1 Mean & Standard Error . . . . . . . . . . . . . . . . . . . . . . . . 423
17.2 Regression Coefficients . . . . . . . . . . . . . . . . . . . . . . . . 424
17.3 Correlations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 426
17.4 One-Way ANOVAs . . . . . . . . . . . . . . . . . . . . . . . . . . . 427
17.5 Two-Sample 𝑡-Tests . . . . . . . . . . . . . . . . . . . . . . . . . . 428
17.6 𝑝-Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 428
17.7 𝜒2 Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 430
17.8 Number Needed to Treat . . . . . . . . . . . . . . . . . . . . . . . 430
17.9 Multi-Arm Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . 434

Appendix 437

A Questions & Answers 437

B Effect Size Formulas 445

C R & Package Information 449

Bibliography 451

Index 471
Preface

It is a trivial observation that our world is complex. Scientific research is no exception;
in most research fields, we are often faced with a seemingly insurmountable body
of previous research. Evidence from different studies can be conflicting, and it can
be difficult to make sense out of various sources of information. Evidence synthesis
methods therefore play a crucial role in many disciplines, for example the social
sciences, medicine, biology, or econometrics. Meta-analysis, the statistical procedure
used to combine results of various studies or analyses, has become an indispensable
tool in many research areas. Meta-analyses can be of enormous importance, especially
if they guide practical decision-making, or future research efforts. Many applied
researchers therefore already have some meta-analysis skills in their “statistical
toolbox”, while others want to learn how to perform meta-analyses in their own
research field. Meta-analyses have become so ubiquitous that many graduate and
undergraduate students already learn how to perform one as part of their curriculum
– sometimes with varying levels of enthusiasm.
The way meta-analyses can be performed, like statistical computing as a whole, has
seen major shifts in the last decades. This has a lot to do with the rise of open source,
collaborative statistical software, primarily in the form of the R Statistical Program-
ming Language and Environment. The R ecosystem allows researchers and statisti-
cians everywhere to build their own packages, and to make them available to everyone,
at no cost. This has led to a spectacular rise in readily available statistical software
for the R language. While we are writing this, the CRAN Task View1 lists more than
130 packages dedicated to meta-analysis alone. In R, you can do anything – literally.
It is a full programming language, so if you do not find a function for something
you want to do, you can easily write it yourself. For meta-analyses, however, there
is hardly any need to do this anymore. Just a small collection of R packages already
provide all the functionality you can find in current “state-of-the-art” meta-analysis
programs – for free. Even more so, there are many novel meta-analysis methods that
can currently only be applied in R. In short: the R environment gives researchers
many more tools for their meta-analyses. In the best case, this allows us to draw
more robust conclusions from our data, and thus make better-informed decisions.

1 https://cran.r-project.org/web/views/MetaAnalysis.html

This raises the question: why isn’t everyone using R for meta-analyses? We think there
are two main reasons: convenience and anxiety (and sometimes a mixture of both). Both
reasons are very understandable. Most meta-analysts are applied researchers, not
statisticians or programmers. The thought of learning an obscure and complicated-seeming
programming language can act as a deterrent. The same is true for meta-analytic methods,
with their special theoretical background, their myriad analytic choices, and different
statistics that need to be interpreted correctly.
With this guide, we want to show that many of these concerns are unfounded, and
that learning how to do a meta-analysis in R is worth the effort. We hope that the
guide will help you to learn the skills needed to master your own meta-analysis project
in R. We also hope that this guide will make it easier for you to not only learn what
meta-analytic methods to apply when, but also why we apply them. Last but not
least, we see this guide as an attempt to show you that meta-analysis methods and R
programming are not mere inconveniences, but a fascinating topic to explore.

This Book Is for Mortals


This guide was not written for meta-analysis experts or statisticians. We do not
assume that you have any special background knowledge on meta-analytic meth-
ods. Only basic knowledge of fundamental mathematical and statistical concepts
is needed. For example, we assume that you have heard before of things like a
“mean”, “standard deviation”, “correlation”, “regression”, “𝑝-value” or a “normal
distribution”. If these terms ring a bell, you should be good to go. If you are really
starting from scratch, you may want to first have a look at Robert Stinerock’s statistics
beginner’s guide (Stinerock, 2018) for a thorough introduction including hands-on
examples in R, or some other introductory statistics textbook of your choice.
Although we tried to keep it as minimal as possible, we will use mathematical formulas
and statistical notation at times. But do not panic. Formulas and Greek letters can
seem confusing at first glance, but they are often a very good way to precisely describe
the idea behind some meta-analysis methods. Having seen these formulas, and
knowing what they represent, will also make it easier for you to understand more
advanced texts you may want to read further down the line. And of course, we tried
our best to always explain in detail what certain symbols or letters stand for, and
what a specific formula wants to tell us. In the beginning of this book, you can find a
list of the symbols we use, and what they represent. In later chapters, especially the
Advanced Methods section, we need to become a little more technical to explain the
ideas behind some of the applied techniques. Nevertheless, we made sure to always
include some background information on the mathematical and statistical concepts
used in these sections.
No prior knowledge of R (or programming in general) is required. In the guide, we
try to provide a gentle introduction into basic R skills you need to code your own
meta-analysis. We also provide references to adequate resources to keep on learning.
Furthermore, we will show you how you can set up a free computer program which
allows you to use R conveniently on your PC or Mac.
As it says in the title, our book focuses on the “doing” part of meta-analysis. Our
guide aims to be an accessible resource which meets the needs of applied researchers,
students and data scientists who want to get going with their analyses using R. Meta-
analysis, however, is a vast and multi-faceted topic, so it is natural that not everything
can be covered in this guide. For this book, limitations particularly pertain to three
areas:
• Although we provide a short primer on these topics, we do not cover in detail how
to define research questions, systematically search and include studies for your
meta-analysis, as well as how to assess their quality. Each of these topics merits
books of their own, and luckily many helpful resources already exist. We therefore
only give an overview of important considerations and pitfalls when collecting
the data for your meta-analysis, and will refer you to adequate resources dealing
with the nitty-gritty details.
• The second limitation of this guide pertains to its level of technicality. This book
is decidedly written for “mortals”. We aim to show you when, how and why to
apply certain meta-analytic techniques, along with their pitfalls. We also try to
provide an easily accessible, conceptual understanding of the techniques we
cover, resorting to more technical details only if it benefits this mission. Quite
naturally, this means that some parts of the guide will not contain a deep dive
into technicalities that expert-level meta-analysts and statisticians may desire.
Nevertheless, we include references to more advanced resources and publications
in each chapter for the interested reader.
• Contents of a book will always to some extent reflect the background and ex-
perience of its authors. We are confident that the methods we cover here are
applicable and relevant to a vast range of research areas and disciplines. Never-
theless, we wanted to disclose that the four authors of this book are primarily
versed in current research in psychology, psychiatry, medicine and intervention
research. “Real-world” use cases and examples we cover in the book therefore
concentrate on topics where we know our way around. The good news is that
meta-analytic methods (provided some assumptions, which we will cover) are
largely agnostic to the research field from which the data stem, and can be
used for various types of outcome measures. Nonetheless, and despite our best
intentions to make this guide as broadly applicable to as many applied research
disciplines as possible, it may still be possible that some of the methods covered
in this book are more relevant for some research areas than others.

Topics Covered in the Book


Among other things, this guide will cover the following topics:
• What a meta-analysis is, and why it was invented.
• Advantages and common problems with meta-analysis.
• How research questions for meta-analyses are specified, and how the search for
studies can be conducted.
• How you can set up R, and a computer program which allows you to use R in a
convenient way.
• How you can import your meta-analysis data into R, and how to manipulate it
through code.
• What effect sizes are, and how they are calculated.
• How to pool effect sizes in fixed-effect and random-effects meta-analyses.
• How to analyze the heterogeneity of your meta-analysis, and how to explore it
using subgroup analyses and meta-regression.
• Problems with selective outcome reporting, and how to tackle them.
• How to perform advanced types of meta-analytic techniques, such as “multi-
level” meta-analysis, meta-analytic structural equation modeling, network meta-
analysis, or Bayesian meta-analysis.
• How to report your meta-analysis results, and make them reproducible.

How to Use This Book

Work Flow

This book is intended to be read in a “linear” fashion. We recommend that you start
with the first chapters on meta-analysis and R basics, and then keep working your
way through the book one chapter after another. Jumping straight to the hands-on
chapters may be tempting, but it is not generally recommended. From our experi-
ence, a basic familiarity with meta-analysis, as well as the R Studio environment, is a
necessary evil to avoid frustrations later on. This is particularly true if you have no
previous experience with meta-analysis and R programming. Experienced R users
may skip Chapter 2, which introduces R and R Studio. However, it will certainly do
no harm to work through the chapter anyway as a quick refresher.

While all chapters are virtually self-contained, we do sometimes make references to
topics covered in previous chapters. Chapters in the Advanced Methods section in
particular assume that you are familiar with theoretical concepts we have covered
before.
The last section of this book contains helpful tools for your meta-analysis. This does
not mean that these topics are the final things you have to consider when performing
a meta-analysis. We simply put these chapters at the end because they primarily
serve as reference works for your own meta-analysis projects. We link to these tools
throughout the book in sections where they are thematically relevant.

Online Version

This book also has an online version2. On the website, click on “Read the Guide” to
open it. The contents of the online version are nearly identical with the ones you will
find here. However, the website does contain some extra content, including a few
sections on special interest topics that we did not consider essential for this book.
It also contains interactive material which can only be used via the Internet. We
reference supplementary online content in the book where it is thematically relevant.
The online version of the guide also contains an additional chapter called Corrections
& Remarks. We regularly update the online version of the book. Potential errors and
problems in the printed version of the book that we, or others, may have encountered
in the meantime will be displayed there.

Companion R Package

This book comes with a companion R package called {dmetar}. This package mainly
serves two functions. First, it aims to make your life easier. Although there are fan-
tastic R packages for meta-analysis out there with a vast range of functionalities,
there are still a few things which are currently not easy to implement in R, at least
for beginners. The {dmetar} package aims to bridge this gap by providing a few extra
functions facilitating exactly those things. Secondly, the package also contains all the
data sets we are using for the hands-on examples included in this book. In Chapter
2.3, the {dmetar} package is introduced in detail, and we show you how to install
the package step by step. Although we will make sure that there are no substantial
changes, {dmetar} is still under active development, so it may be helpful to have a
look at the package website3 now and then to check if there are new or improved
functionalities which you can use for your meta-analysis. While advisable, it is not
essential that you install the package. Wherever we make use of {dmetar} in the book,
we will also provide you with the raw code for the function, or a download link to the
data set we are using.
2 www.protectlab.org/meta-analysis-in-r/
3 dmetar.protectlab.org
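If you already want to try this out: the snippet below sketches what the installation
typically looks like. Treat it as a preview rather than a reference; Chapter 2.3 walks
you through the process step by step, and the repository name shown here simply
reflects the package's home on GitHub at the time of writing.

# {dmetar} is not on CRAN, so we install it from GitHub.
# The {devtools} package (available on CRAN) provides install_github().
install.packages("devtools")
devtools::install_github("MathiasHarrer/dmetar")

# Load the package to access its functions and example data sets
library(dmetar)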

Text Boxes

Throughout the book, a set of text boxes is used.

General Note
General notes contain relevant background information, insights, anec-
dotes, considerations or take-home messages pertaining to the covered
topic.

Important Information
These boxes contain information on caveats, problems, drawbacks or
pitfalls you have to keep in mind.

Questions
After each chapter, this box will contain a few questions through which
you can test your knowledge. Answers to these questions can be found
at the end of the book in Appendix A.

{dmetar} Note
The {dmetar} note boxes appear whenever functions or data sets con-
tained in the companion R package are used. These boxes also contain
URLs to the function code, or data set download links, for readers who
did not install the package.

How Can I Report This?


These boxes contain recommendations on how you can report R output
in your thesis or research article.

Conventions
A few conventions are followed throughout the book.

{packages}
All R packages are written in italic and are put into curly brackets. This is a common
way to write package names in the R community.

R Code

All R code or objects we define in R are written in this monospace font.

## R Output

The same monospace font is used for the output we receive after running R code.
However, we use two number signs (hashes) to differentiate it from R input.
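A minimal example of how these two conventions appear together (any short snippet
would do):

mean(c(2, 4, 6))
## [1] 4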

𝐹𝑜𝑟𝑚𝑢𝑙𝑎
This serif font is reserved for formulas, statistics and other forms of mathematical
notation.

What to Do When You Are Stuck


Undeniably, the road to doing meta-analyses in R can be a rocky path at times. Al-
though we think this is sometimes exaggerated, R’s learning curve is steep. Statistics
is hard. We did our best to make your experience of learning how to perform meta-
analyses using R as painless as possible. Nevertheless, this will not shield you from
being frustrated sometimes. This is only natural. We all had to start from scratch
somewhere down the line. From our own experience, we can assure you that we have
never met anyone who was not able to learn R, or how to do a meta-analysis. It only
takes practice, and the understanding that there will be no point in time when you
are “done” learning. We believe in you.
If you are looking for something a little more practical than this motivational message:
here are a few things you can do once you stumble upon things that this guide cannot
answer.

Do Not Panic

Making their first steps in R, many people are terrified when the first red error
messages start popping up. That is not necessary. Everyone gets error messages all the
time. Instead of becoming panicky or throwing your computer out the window, take
a deep breath and take a closer look at the error message. Very often, it only takes a
few tweaks to make the error messages disappear. Have you misspelled something in
your code? Have you forgotten to close a bracket, or to put something into quotation
marks? Also, make sure that your output actually is an error message. R distinguishes
between Errors, Warnings and plain messages. Only the first means that your code
could not be executed. Warnings mean that your code did run, but that something
may have gone awry. Messages mean that your code did run completely, and are
usually shown when a function simply wants to bring your attention to something
it has done for you under the hood. For this reason, they are also called diagnostic
messages.
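A small illustration of the three output types (the exact wording may vary between R
versions):

# Error: the code could not be executed, e.g. because an object does not
# exist. Note that "my_data" is just a hypothetical, undefined object here.
mean(my_data)
## Error in mean(my_data): object 'my_data' not found

# Warning: the code did run, but something may have gone awry.
as.numeric("five")
## Warning: NAs introduced by coercion
## [1] NA

# Message: purely informational output.
message("Nothing went wrong; this is just a diagnostic message.")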

Google

A software developer friend once told the first author this joke about his profession: “A
programmer is someone who can Google better than Average Joe”. This observation
certainly also applies to R programming. If you find yourself in a situation in which
you cannot make sense out of an error or warning message you receive, do not hesitate
to simply copy and paste it, and do a Google search. Adding “R” to your search is often
helpful to improve the results. Most content on the Internet is in English; so if your
error message in R is in another language, run Sys.setenv(LANGUAGE = "en") and
then rerun your code. There is a large R community out there, and it is very
likely that someone had the same problem as you before. Google is also helpful when
there is something specific you want to do with your data, but do not know what R
commands you should use. Even for experts, it is absolutely normal to use Google
dozens of times when writing R code. Do not hesitate to do the same whenever you
get stuck.
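As mentioned above, switching the language of R's messages only takes one line. A
minimal sketch:

# Display errors, warnings and messages in English (current session only)
Sys.setenv(LANGUAGE = "en")

# Now rerun the code that produced the message; it should appear in English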

StackOverflow & CrossValidated

When searching for R-related questions on Google, you will soon find out that many
of the first hits will link you to a website called StackOverflow4 . StackOverflow is a
large community-based forum for questions related to programming in general. On
StackOverflow, everyone (including you) can ask and answer questions. In contrast
to many other forums on the Internet, answers you get on StackOverflow are usually
goal-oriented and helpful. If searching Google did not help you to solve your problem,
addressing it there might be a good solution. However, there are a few things to
keep in mind. First, when asking a question, always tag your question with [R] so
that people know which programming language you are talking about. Also, run
sessionInfo() in R and attach the output you get to your question. This lets people
know which R and package versions you are using, and might be helpful to locate the
problem. Lastly, do not expect overwhelming kindness. Many StackOverflow users
are experienced programmers who may be willing to point to certain solutions; but
do not expect anyone to solve your problem for you. It is also possible that someone
will simply inform you that this topic has already been covered elsewhere, send you
the link, and then move on. Nevertheless, using StackOverflow is usually the best way
to get high-quality support for specific problems you are dealing with. StackOver-
flow, by the way, is primarily for questions on programming. If your question also
has a statistics background, you can use CrossValidated5 instead. CrossValidated
works like StackOverflow, but is primarily used by statisticians and machine learning
experts.

4 https://stackoverflow.com/
5 https://stats.stackexchange.com/
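Gathering the environment details mentioned above only takes one line. The output
below is shortened and purely illustrative; the details will differ on your machine:

sessionInfo()
## R version 4.0.3 (2020-10-10)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## ...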

Contact Us

If you have the feeling that your problem has something to do with this guide itself,
you can also contact us. This particularly pertains to issues with the companion R
package for this guide, {dmetar}. If you have trouble installing the package, or using
some of its functions, you can go to our website6, where you can find ways to report
your issue. When certain problems come up frequently, we usually try to have a look
at them and search for fixes. Known issues will also be displayed in the Corrections &
Remarks section in the online version of the guide (see Work Flow section). Please do
not be disappointed if we do not answer your question personally, or if it takes some
time to get back to you. We receive many questions related to meta-analysis and our
package every day, so it is sometimes not possible to directly answer each and every
one.

6 www.protectlab.org/meta-analysis-in-r

Acknowledgments
We would like to thank David Grubbs and Chapman & Hall/CRC Press for approach-
ing us with the wonderful idea of turning our online guide into the printed book you
are reading right now, and for their invaluable editorial support.
Many researchers and students have shared their feedback and experiences working
with this guide with us since we began writing a preliminary online version of it in
late 2018. This feedback has been incredibly valuable, and has helped us considerably
to tailor this book further to the needs of the ones reading it. Thanks to all of you.
We owe a great debt of gratitude to all researchers involved in the development of
the R meta-analysis infrastructure presented in this guide; but first and foremost to
Guido Schwarzer and Wolfgang Viechtbauer, maintainers of the {meta} and {metafor}
package, respectively. This guide, like the whole R meta-analysis community, would
not exist without your effort and dedication.
Furthermore, particular thanks go to Luke McGuinness, author of the gorgeous
{robvis} package, for writing an additional chapter on risk of bias visualization, which
you can find on this book’s companion website. Luke, we are incredibly grateful for
your continued support of this project.
Last but not least, we want to thank Lea Schuurmans for supporting us in the devel-
opment and compilation of this book.

February 2021
Erlangen, Amsterdam, Kyoto and Munich
Mathias, Pim, Toshi and David
About the Authors

Mathias Harrer is a research associate at the Friedrich-Alexander-University
Erlangen-Nuremberg. Mathias’ research focuses on biostatistical and technological
approaches in psychotherapy research, methods for clinical research synthesis and
on the development of statistical software.
Pim Cuijpers is professor of clinical psychology at the VU University Amsterdam.
He is specialized in conducting randomized controlled trials and meta-analyses,
with a focus on the prevention and treatment of common mental disorders. Pim has
published more than 800 articles in international peer-reviewed scientific journals,
many of which are meta-analyses of clinical trials.
Toshi A. Furukawa is professor of health promotion and human behavior at the
Kyoto University School of Public Health. His seminal research focuses both on theo-
retical aspects of research synthesis and meta-analysis, as well as their application in
evidence-based medicine.
David D. Ebert is professor of psychology and behavioral health technology at the
Technical University of Munich. David’s research focuses on Internet-based inter-
vention, clinical epidemiology, as well as applied research synthesis in this field.

List of Symbols

a, b, c, d: Events in the treatment group, non-events in the treatment group, events
in the control group, non-events in the control group.

β0, β1, β: Regression intercept, regression coefficient, Type II error rate.

zα: Critical value assumed for the Type I error rate α (typically 1.96).

ℋC(x0, s): Half-Cauchy distribution with location parameter x0 and scaling parameter s.

χ²: Chi-squared statistic.

Cov(x, y): Covariance of x and y.

d: Cohen’s d (standardized mean difference).

Dg: Regression dummy.

δ: Non-centrality parameter (non-central t distribution).

d.f.: Degrees of freedom.

ε: Sampling error.

F: Snedecor’s F statistic (used by the F-tests in ANOVAs).

g: Small sample bias-corrected standardized mean difference (Hedges’ g).

I²: Higgins’ and Thompson’s I² measure of heterogeneity (percentage of variation not
attributable to sampling error).

∫f(x)dx: Integral of f(x).

k, K: Some study in a meta-analysis, total number of studies in a meta-analysis.

κ: True effect of an effect size cluster.

MD, SMD: (Standardized) mean difference (Cohen’s d).

x̄: Arithmetic mean (based on an observed sample), identical to m.

μ, m: (True) population mean, sample mean.

n, N: (Total) sample size of a study.

N(μ, σ²): Normal distribution with population mean μ and variance σ².

Φ(z): Cumulative distribution function (CDF), where z follows a standard normal
distribution.

π, p: True population proportion, proportion based on an observed sample.

P(X|Y): Conditional probability of X given Y.

ψ̂: (Estimate of) Peto’s odds ratio, or some other binary effect size.

Q: Cochran’s Q measure of heterogeneity.

RR, OR, IRR: Risk ratio, odds ratio, incidence rate ratio.

R̂: R-hat value in Bayesian modeling.

R²*: R² (explained variance) analog for meta-regression models.

ρ, r: True population correlation, observed correlation.

SE: Standard error.

σ²: (True) population variance.

t: Student’s t statistic.

τ², τ: True heterogeneity variance and standard deviation.

θ: A true effect size, or the true value of an outcome measure.

s, SD, s², Var(x): Sample variance (of x), where s is the standard deviation.

w, w*, w(x): (Inverse-variance) weight, random-effects weight of an effect size,
function that assigns weights to x.

z: Fisher’s z or z-score.

ζ, u: “Error” due to between-study heterogeneity, random effect in (meta-)regression
models.

Note. Vectors and matrices are written in bold. For example, we can denote all observed
effect sizes in a meta-analysis with a vector θ̂ = (θ̂1, θ̂2, …, θ̂K)⊤, where K is the
total number of studies. The ⊤ symbol indicates that the vector is transposed. This
means that elements in the vector are arranged vertically instead of horizontally. This
is sometimes necessary to do further operations with the vector, for example,
multiplying it with another matrix.
Part I

Getting Started
1 Introduction

Science is generally assumed to be a cumulative process. In their scientific endeavors,
researchers build on the evidence compiled by generations of scientists who came
before them. A famous quote by Isaac Newton stresses that if we want to see further,
we can do so by standing on the “shoulders of giants”. Many of us are fascinated
by science because it is progressive, furthering our understanding of the world, and
helping us to make better decisions.
At least by the numbers alone, this sentiment may be justified. Never in history did
we have access to more evidence in the form of published research articles than we do
today. Petabytes of research findings are produced every day all around the world. In
biomedicine alone, more than one million peer-reviewed articles are published each
year (Björk et al., 2008). The amount of published research findings is also increasing
almost exponentially. The number of articles indexed for each year in one of the
largest bibliographical databases, PubMed1 , symbolizes this in an exemplary fashion.
Until the middle of the 20th century, only a few hundred research articles are listed
for each year. These numbers rise substantially for the following decades, and since
the beginning of the 21st century, they skyrocket (see Figure 1.1).

1 pubmed.ncbi.nlm.nih.gov/

FIGURE 1.1: Articles indexed in PubMed by year, 1781-2019.

In principle, this development should make us enthusiastic about the prospects of
science. If science is cumulative, more published research equals more evidence.
This should allow us to build more powerful theories and to dismantle fallacies of the
past. Yet, of course, it is not that easy. In a highly influential paper, John Ioannidis of
Stanford criticized the notion that science is automatically cumulative and constantly
improving. His article has the fitting title “Why Science Is Not Necessarily Self-
Correcting” (Ioannidis, 2012). He argues that research fields can often exist in a
state where an immense research output is produced on a particular topic or theory,
but where fundamental fallacies remain unchallenged and are only perpetuated.
Back in the 1970s, the brilliant psychologist Paul Meehl already observed that in
some research disciplines, there is a close resemblance between theories and fashion
trends. Many theories, Meehl argued, are not continuously improved or refuted, they
simply “fade away” when people start to lose interest in them (Meehl, 1978).
It is an inconvenient truth that the scientific process, when left to its own devices,
will not automatically move us to the best of all possible worlds. With unprecedented
amounts of research findings produced each day, it is even more important to view
and critically appraise bodies of evidence in their entirety. Meta-analysis can be enor-
mously helpful in achieving this, as long as we acknowledge its own limitations and
biases.

1.1 What Are Meta-Analyses?


One of its founding fathers, Gene V. Glass, described meta-analysis as an “analysis
of analyses” (Glass, 1976). This simple definition already tells us a lot. In conventional
studies, the units of analysis are a number of people, specimens, countries, or objects.
In meta-analysis, primary studies themselves become the elements of our analysis. The
aim of meta-analysis is to combine, summarize, and interpret all available evidence
pertaining to a clearly defined research field or research question (Lipsey and Wilson,
2001, chapter 1). However, it is only one method to do this. There are at least three
distinct ways through which evidence from multiple studies can be synthesized
(Cuijpers, 2016).
• Traditional/Narrative Reviews. Until way into the 1980s, narrative reviews were
the most common way to summarize a research field. Narrative reviews are often
written by experts and authorities of a research field. There are no strict rules
on how studies in a narrative review have to be selected and how to define the
scope of the review. There are also no fixed rules on how to draw conclusions
from the reviewed evidence. Overall, this can lead to biases favoring the opinion
of the author. Nevertheless, narrative reviews, when written in a balanced way,
can be helpful for readers to get an overall impression of the relevant research
questions and evidence base of a field.
• Systematic Reviews. Systematic reviews try to summarize evidence using clearly
defined and transparent rules. In systematic reviews, research questions are de-
termined beforehand, and there is an explicit, reproducible methodology through
which studies are selected and reviewed. Systematic reviews aim to cover all
available evidence. They also assess the validity of evidence using predefined
standards and present a synthesis of outcomes in a systematic way.
• Meta-Analyses. Most meta-analyses can be seen as an advanced type of a system-
atic review. The scope of meta-analyses is clearly defined beforehand, primary
studies are also selected in a systematic and reproducible way, and there are also
clear standards through which the validity of the evidence is assessed. This is why
it is common to find studies being named a “systematic review and meta-analysis”.
However, there is one aspect which makes meta-analyses special. Meta-analyses
aim to combine results from previous studies in a quantitative way. The goal of
meta-analyses is to integrate quantitative outcomes reported in the selected
studies into one numerical estimate. This estimate then summarizes all the indi-
vidual results. Meta-analyses quantify, for example, the effect of a medication,
the prevalence of a disease, or the correlation between two properties, across all
studies2 . Therefore, they can only be used for studies which report quantitative
results. Compared to systematic reviews, meta-analyses often have to be more
exclusive concerning the kind of evidence that is summarized. To perform a
meta-analysis, it is usually necessary that studies used the same design and type
of measurement, and/or delivered the same intervention (see Chapter 1.3).

2 This statement is of course only true if meta-analytic techniques were applied soundly, and if the
results of the meta-analysis allow for such generalizations.

Individual Participant Data Meta-Analysis

Depending on the definition, there is also a fourth type of evidence syn-
thesis method, so-called Individual Participant Data (IPD) Meta-Analysis
(Riley et al., 2010). Traditionally, meta-analyses are based on aggregated
results of studies that are found in the published literature (e.g. means
and standard deviations, or proportions). In IPD meta-analysis, the
original data of all studies is collected instead and combined into one
big data set. IPD meta-analysis has several advantages. For example,
it is possible to impute missing data and apply statistical methods in
exactly the same way across all studies. Furthermore, they can make it
easier to explore variables which influence the outcome of interest. In
traditional meta-analyses, only so-called study-level variables (e.g. the
year of publication, or the population used in the study) can be used to
do this. However, it is often participant-level information (e.g. an indi-
vidual person’s age or gender) which may play a role as an important
moderator of the results. Such variables can only be explored using IPD
meta-analysis.


IPD meta-analysis is a relatively new method, and the overwhelming
majority of meta-analyses conducted today remain “traditional” meta-
analyses. This is also one reason why we will not cover IPD meta-analysis
methods in this guide. This has nothing to do with traditional meta-
analysis being superior–the opposite is correct. It is simply due to the
fact that making all research data openly available has unfortunately
been very uncommon in most disciplines until recently. While it is rel-
atively easy to extract summarized results from published research
reports, obtaining original data from all relevant studies is much more
challenging. In biomedical research, for example, individual participant
data can only be obtained from approximately 64% of the eligible studies
(Riley et al., 2007).

1.2 “Exercises in Mega-Silliness”: A Historical Anecdote


Meta-analysis was not invented by one person alone, but by many founding mothers
and fathers (O’Rourke, 2007). The first attempts to statistically summarize the effects
of separate, but similar studies date back around 100 years, and can be linked to two
of the most important statisticians of all time, Karl Pearson and Ronald A. Fisher.
Pearson, in the beginning of the 20th century, combined findings on the effects of
typhoid inoculation across the British Empire to calculate a pooled estimate (Shan-
non, 2016). Fisher, in his seminal 1935 book on the design of experiments, covered
approaches to analyze data from multiple studies in agricultural research, and al-
ready acknowledged the problem that study results may vary due to location and
time (Fisher, 1935; O’Rourke, 2007).
The name “meta-analysis” and the beginning of its rise to prominence, however, can
be traced back to a scholarly dispute raging in the mid-20th century. In 1952, the
famous British psychologist Hans Jürgen Eysenck (Figure 1.2) published an article
in which he claimed that psychotherapy (in that time, this largely meant Freudian
psychoanalysis) was ineffective. If patients get better during therapy, it is because
their situation would have improved anyway due to factors that have nothing to do
with the therapy. Even worse, Eysenck claimed, psychotherapy would often hinder
patients from getting better. The reputation of psychotherapy took a big hit, and
it did not recover until the late 1970s. During that time, Gene V. Glass developed
a technique he termed “meta-analysis”, which allowed to pool Standardized Mean
Differences3 across studies. The first extensive application of his technique was in an
article published in the American Psychologist, written by Mary L. Smith and Glass
himself (Smith and Glass, 1977). In this large study, results from 375 studies with
more than 4000 participants were combined in a meta-analysis. The study found
that psychotherapies had a pooled effect of 0.68, which can be considered quite
large. Glass’ work had an immense impact because it provided quantitative evidence
that Eysenck’s verdict was wrong. Eysenck himself, however, was not convinced,
calling the meta-analysis “an abandonment of scholarship” and “an exercise in mega-
silliness” (Eysenck, 1978).

3 i.e., the difference in means between two groups, for example, an intervention and control group,
expressed in the units of the pooled standard deviation of both groups (see Chapter 3.3.1).
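To make footnote 3 concrete: one common way to write the standardized mean
difference described there (Chapter 3.3.1 gives the exact variants used in this book) is

$$
\mathrm{SMD} = \frac{\bar{x}_1 - \bar{x}_2}{s_{\mathrm{pooled}}},
\qquad
s_{\mathrm{pooled}} = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}}
$$

where x̄1 and x̄2 are the two group means, s1 and s2 their standard deviations, and
n1 and n2 the group sizes.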

FIGURE 1.2: Hans Jürgen Eysenck (Sirswindon/CC BY-SA 3.0).

Today we know that Smith and Glass’ study may have overestimated the effects of
psychotherapy because it did not control for biases in the included studies (Cui-
jpers et al., 2019a). However, the primary finding that some psychotherapies are
effective has been corroborated by countless other meta-analyses in the following
decades. Eysenck’s grim response could not change that meta-analysis soon became
a commonly used method in various fields of study.
The methodology behind meta-analysis has been continuously refined since that time.
About the same time Glass developed his meta-analysis method, Hunter and Schmidt
started crafting their own type of meta-analysis techniques putting emphasis on the
correction of measurement artifacts (Schmidt and Hunter, 1977; Hunter and Schmidt,
2004). Meta-analysis first found its way into medicine through the groundbreaking
work of Peter Elwood and Archie Cochrane, among others, who used meta-analysis to
show that aspirin has a small, but statistically and clinically relevant preventive effect
on the recurrence of heart attacks (Peto and Parish, 1980; Elwood, 2006; O’Rourke,
2007). In the mid-80s, Rebecca DerSimonian and Nan Laird introduced an approach
to calculate random-effects meta-analyses (see Chapter 4.1.2) that has been in use to
this day (DerSimonian and Laird, 1986). Countless other innovations have helped to
increase the applicability, robustness, and versatility of meta-analytic methods in
the last four decades.

The Cochrane and Campbell Collaboration

The Cochrane Collaborationa (or simply Cochrane), founded in 1993 and
named after Archie Cochrane, has played a crucial role in the develop-
ment of applied meta-analysis. Cochrane is an international network
of researchers, professionals, patients, and other relevant stakeholders
who “work together to produce credible, accessible health information
that is free from commercial sponsorship and other conflicts of inter-
est”.
Cochrane uses rigorous standards to synthesize evidence in the biomed-
ical field. The institution has its headquarters in London, but also has
local branches in several countries around the world. The Cochrane
Collaboration issues the regularly updated Handbook for Systematic Re-
views of Interventionsb (Higgins et al., 2019) and the Cochrane Risk of Bias
Toolc (Sterne et al., 2019). Both are widely viewed as standard reference
works for all technical details concerning systematic reviews and meta-
analyses (see Chapter 1.4). An organization similar to Cochrane is the
Oslo-based Campbell Collaborationd , which primarily focuses on research
in the social sciences.
a https://www.cochrane.org/
b https://training.cochrane.org/handbook
c https://methods.cochrane.org/bias/resources/rob-2-revised-cochrane-risk-bias-tool-randomized-trials
d https://campbellcollaboration.org/

1.3 Apples and Oranges: A Quick Tour of Meta-Analysis Pitfalls


In the last decades, meta-analysis has become a universally accepted research tool.
This does not come without costs. Conducting a high-quality primary study is often
very expensive, and it can take many years until its results can finally be analyzed.
In comparison, meta-analyses can be produced with relatively few resources and within
a relatively short time. Nevertheless, meta-analyses often have a high impact and are
cited frequently (Patsopoulos et al., 2005). This means that scientific journals are
often very inclined to publish meta-analyses, sometimes even if their quality or
scientific merit is limited. Unfortunately, this creates a natural incentive for
researchers to produce many meta-analyses, and scientific considerations sometimes
become secondary.
Ioannidis (2016) criticized that an immense number of redundant and misleading
meta-analyses is produced each year. On some “hot” topics, there are more than 20
recent meta-analyses. Some meta-analyses may also be heavily biased by corporate
recent meta-analyses. Some meta-analyses may also be heavily biased by corporate
interests, for example, in pharmacotherapy research (Ebrahim et al., 2016; Kirsch
et al., 2002). As we have mentioned before, reproducibility is a hallmark of good
science. In reality, however, the reproducibility of many meta-analyses is all too
often limited because important information is not reported (Lakens et al., 2017). A
common problem is also that different meta-analyses on the same or overlapping
topics come to different conclusions. In psychotherapy research, for example, there
has been an ongoing debate pertaining to the question of whether all types of psychotherapy
produce equivalent outcomes. Countless reviews have been published supporting
either one conclusion or the other (Wampold, 2013; Cuijpers et al., 2019c).
While some of these issues may be associated with systemic problems of the scientific
process, others can be traced back to flaws of meta-analyses themselves. Therefore, we
want to lead you through a quick tour of common meta-analysis pitfalls (Borenstein
et al., 2011; Greco et al., 2013; Sharpe, 1997).
• The “Apples and Oranges” problem. One may argue that meta-analysis means
combining apples with oranges. Even with the strictest inclusion criteria, studies
in a meta-analysis will never be absolutely identical. There will always be smaller
or larger differences between the included samples, the way an intervention was
delivered, the study design, or the type of measurement used in the studies. This
can be problematic. Meta-analysis means calculating a numerical estimate that
represents the results of all studies. Such an estimate can always be derived from
a statistical point of view, but it becomes meaningless when the studies do not share
the properties that matter for answering a specific research question. Imagine the,
admittedly absurd, scenario in which a meta-analyst decides to pool studies
on the effect of job satisfaction on job performance together with all available evi-
dence on the effect of medication on the HbA1c value of diabetic patients in one
meta-analysis. The results would be pointless to organizational psychologists
and diabetologists alike. Now, suppose that the same poor meta-analyst, trying
to learn from previous mistakes, overcompensates and conducts a meta-analysis
containing only studies published between 1990 and 1999 in which Canadian
males in their sixties with moderate depressive symptoms were treated using
40mg of Fluoxetine, for exactly six weeks. The meta-analyst may proudly report
the positive results of the study to a psychiatrist. The psychiatrist, however, may
simply ask: “and what do I do if my patient is 45 years old and French?”
This brings us to an important point. The goal of meta-analyses is not to heed-
lessly throw together everything that can be combined. Meta-analysis can be
used to answer relevant research questions that go beyond the particularities of
individual studies (Borenstein et al., 2011, chapter 40). The scope and specificity
of a meta-analysis should therefore be based on the research question it wants
to answer, and this question should be of practical relevance (see Chapter 1.4). If
we want to know, for example, if a type of training program is effective across
various age groups, cultural regions, and settings, it makes perfect sense to put
no restriction on the population and country of origin of a study. However, it
may then be advisable to be more restrictive with respect to the training pro-
gram evaluated in the studies, and to only include the ones in which the training
had a certain length or covered similar topics. Results of such a meta-analysis
would not only allow us to estimate the pooled effect of the training, but also to
quantify if and how much this effect varies across different settings.
Meta-analysis is capable of accommodating and “making sense” of such forms of
heterogeneity. In Chapter 5, we will have a closer look at this important concept.
To sum up, whether the “Apples and Oranges” problem is in fact an issue highly
depends on the question a meta-analysis wants to answer. Variation between
studies can often be unproblematic, and even insightful if it is correctly incorpo-
rated into the aims and problem specification of a meta-analysis.
• The “Garbage In, Garbage Out” problem. The quality of evidence produced by a
meta-analysis heavily depends on the quality of the studies it summarizes. If the
results reported in our included studies are biased, or downright incorrect, the
results of the meta-analysis will be equally flawed. This is what the “Garbage In,
Garbage Out” problem refers to. It can be mitigated to some extent by assessing
the quality or risk of bias of the included studies (see Chapters 1.4 and 15). However,
if many or most of the results are of suboptimal quality and likely biased, even
the most rigorous meta-analysis will not be able to balance this out. The only
conclusion that can usually be drawn in such cases is that no trustworthy evidence
exists for the reviewed topic, and that more high-quality studies have to be
conducted in the future. However, even such a rather disappointing outcome
can be informative and help guide future research.
• The “File Drawer” problem. The file drawer problem refers to the issue that not
all relevant research findings are published, and are therefore missing from our meta-
analysis. Not being able to integrate all evidence in a meta-analysis would be
undesirable, but at least tolerable if we could safely assume that research findings
are missing at random from the published literature. Unfortunately, they are not.
Positive, “innovative” findings often generate more buzz than failed replications
or studies with negative and inconclusive results. In line with this, research shows
that over the last decades, fewer and fewer negative findings have been published in
many disciplines, particularly in the social sciences and the biomedical field
(Fanelli, 2012). There is good reason to believe that studies with negative or
“disappointing” results are systematically underrepresented in the published
literature, and that there is a so-called publication bias. The exact nature and extent
of this bias is at best a “known unknown” in meta-analyses. However, there
are certain ways through which publication bias can be minimized. One pertains
to the way that studies are searched for and selected (see Chapter 1.4). The other
approaches are statistical methods which try to estimate whether publication bias exists
in a meta-analysis, and how big its impact may be. We will cover a few of these
methods in Chapter 9.
• The “Researcher Agenda” problem. When defining the scope of a meta-analysis,
searching and selecting studies, and ultimately pooling outcome measures,
researchers have to make a myriad of choices. Meta-analysis comes with many
“researcher degrees of freedom” (Wicherts et al., 2016), leaving much space for
decisions that may sometimes be arbitrary, and sometimes the result of undis-
closed personal preferences. The freedom of meta-analysts in their modus operandi
becomes particularly problematic when researchers are consciously or subcon-
sciously driven by their own agenda. Meta-analyses are usually performed by
applied researchers, and extensive subject-specific expertise on the re-
viewed topic is a double-edged sword. On the one hand, it can help to derive
and answer meaningful research questions in a particular field. On the other
hand, such experts are also deeply invested in the research area they are exam-
ining. This means that many meta-analysts may hold strong opinions about
certain topics, and may intentionally or unintentionally influence the results
in the direction that fits their beliefs. There is evidence that, given one and the
same data set, even experienced analysts with the best intentions can come to
drastically varying conclusions (Silberzahn et al., 2018). The problem may be
even graver in intervention research, where some meta-analysts have a
substantial researcher allegiance because they helped to develop the type of
intervention under study. Such researchers may, of course, be much more inclined
to interpret the outcomes of a meta-analysis positively than is indicated by the
evidence. One way to reduce the researcher agenda problem is preregistration:
publishing a detailed analysis plan before beginning with the data collection
for a meta-analysis (see Chapters 1.4 and 16.3.5).

1.4 Problem Specification, Study Search & Coding


In the last chapter, we took some time to discuss common problems and limitations
of meta-analyses. Many of these issues, such as the “Apples and Oranges” problem,
the “File Drawer” problem, or the “Researcher Agenda” problem, can and should be
addressed by every meta-analyst. This begins long before you start calculating your
first results. No meta-analysis can be conducted without data, and this data has to
come from somewhere. We first have to specify the research question and eligibility
criteria of our planned meta-analysis, search for studies and select the relevant ones,
extract the data we need for our calculations, and then code important information
we want to report later on. There are several rules, standards, and recommendations
we can or should follow during each of these steps; they can help us to create a high-
quality meta-analysis. Such high-quality meta-analyses contain a comprehensive
selection of all suitable evidence, are unbiased and impartial with respect to their
subject, and they draw valid, justified, and practically relevant conclusions from their
results.
However, even when “following all the rules”, it may not always be clear which specific
decision is the best to achieve this in practice. It is possible that people will disagree
with the way you went about some things. This is normal and usually just fine, as
long as your methodological decisions are both transparent and reproducible (Pigott
and Polanin, 2020).
In this chapter, we will go chronologically through a few important building blocks
needed before we can begin with our first calculations. The length of this chapter is
not representative of the time this process of data acquisition takes in reality. In
our experience, statistical analyses make up at most 15% of the time
spent on a meta-analysis, far less than everything that comes before. But
specifying the research question, systematically searching for studies, and reliably
coding extracted data are essential. They form the basis of every good meta-analysis.

1.4.1 Defining the Research Question

When designing a study, the first thing we do is define the research question. Meta-
analysis is no exception. To produce a good research question, it helps to first see
it as a form of problem specification. To be pertinent and impactful, a meta-analysis
should solve a problem. To identify such problems, some subject-specific knowledge
is necessary. If you want to find a good research question for a meta-analysis, it may,
therefore, be helpful to pick a research area in which you have some background
knowledge and ask yourself a few basic questions first. What are the questions which
are currently relevant in this particular field? Is there a gap in current knowledge
on certain topics? Are there any open discussions that remain unsettled? It might
also help to think about the intended target audience. What are problems that are
relevant to other researchers? What issues might other people, for example, health
care professionals, state agencies, schools, or human resource departments face?
Meta-analysis depends on previous research. Once you know the general direction
of your research problem, it therefore helps to have a look at the current literature.
Do previous primary studies exist on this topic, and how did they address the prob-
lem? What methods and outcome measures did they use? What limitations did they
mention in the background and discussion section of the article? Have previous
reviews and meta-analyses addressed the topic, and what issues have they left open?
Cummings and colleagues (2013) have proposed a few criteria we can use to specify
the problem to be covered by our meta-analysis, the FINER framework. It states that
a research question should be Feasible, Interesting, Novel, Ethical, and Relevant.
Step by step, asking yourself these questions should make it easier to define what you
want to achieve with your meta-analysis. It may also become clear that meta-analysis
is not suitable for your problem. For example, there may simply be no relevant studies
that have addressed the topic; or there may already be recent high-quality meta-
analyses in the literature which address the issue sufficiently. However, if you get the
feeling that your problem is relevant to one or several groups of people, that previous
studies have provided data pertaining to this problem, and that previous reviews
and meta-analyses have not sufficiently or adequately addressed it, you can proceed
to turn it into a research question.
Let us give you an example of how this can be done. There is evidence suggesting
that gender biases exist in medical research (Hamberg, 2008; Nielsen et al., 2017). Es-
pecially in earlier decades, many clinical trials only or largely used male participants,
and results were simply assumed to generalize to women as well. This has probably
led to worse health outcomes in women for some diseases, such as heart conditions
(Kim and Menon, 2009; Mosca et al., 2013)4. Let us imagine that you are a medical
researcher. You have heard rumors that a commonly used drug, Chauvicepine, may
have serious side effects in women that have remained largely unrecognized. You
determined that this, if true, would be a highly relevant problem because it would
mean that many women are prescribed a drug that is not safe for them. A look
into the literature reveals that most studies investigating Chauvicepine were random-
ized placebo-controlled trials. The first of these trials were conducted in populations
which only or predominantly consisted of men. But you also found a few more recent
trials in which the gender makeup was more balanced. Many of these trials even
reported the number of negative side effects that occurred in the trial separately for
men and women. You also find a recent commentary in a medical journal in which
a doctor reports that in her clinic, many women have experienced negative side
effects when being treated with the medication. Based on this, you decide that it may
be interesting to address this problem in a meta-analysis. Therefore, you translate
the issue you just discovered into a research question: does evidence from randomized
placebo-controlled trials show that Chauvicepine leads to a significant increase of negative side
effects in women, compared to placebo?
4 It is worth noting that gender bias can negatively affect not only women but also men; examples are diseases such as osteoporosis (Adler, 2014).
Having derived a first formulation of the research question is only the first step. We
now have to translate it into concrete eligibility criteria. These eligibility criteria will
guide the decision of which studies will and will not be included in our meta-analysis.
They are, therefore, extremely important and should be absolutely transparent and
reproducible. A good way to start specifying the eligibility criteria is to use the PICO
framework (Mattos and Ruellas, 2015). This framework is primarily aimed at in-
tervention studies, but it is also helpful for other types of research questions. The
letters in PICO stand for Population, Intervention, Control group or comparison,
and Outcome.
• Population: What kind of people or study subjects do studies have to include to
be eligible? Again, remember that it is important to address these questions as
precisely as possible, and to think of the implications of each definition. If you
only want to include studies in young adults, what does “young adults” mean?
That only people between 18 and 30 were included? Can that even be determined
from the published articles? Or is it just important that people were recruited
from places which are usually frequented by young adults, such as universities
and Cardi B concerts? If you only want to include studies on patients with a
specific medical condition, how has that condition been diagnosed? By a trained
health care professional, or is a self-report questionnaire sufficient? Many of
these questions can be answered by resorting to the F and R parts of the FINER
framework. Is it feasible to impose such a limitation on published research? And
is it a relevant differentiation?
• Intervention: What kind of intervention (or alternatively, exposure) do studies
have to examine? If you want to study the effects of an intervention, it is impor-
tant to be very clear on the type of treatment that is eligible. How long or short
do interventions have to be? Who is allowed to deliver them? What contents
must the intervention include? If you do not focus on interventions, how must
the independent variable be operationalized? Must it be measured by a specific
instrument? If you study job satisfaction, for example, how must this construct
be operationalized in the studies?
• Control group or comparison: To what were the results of the study compared? A
control group receiving an attention placebo, or a pill placebo? Waitlists? Another
treatment? Or nothing at all? It is also possible that there is no comparison or
control group; for example, if you want to study the prevalence estimates of a
disease across different studies, or how many specimens of a species there are in
different habitats.
• Outcome: What kind of outcome or dependent variable do studies have to mea-
sure? And how must the variable be measured? Is it the mean and standard devia-
tion of questionnaire scores? Or the number of patients who died or got sick?
When must the outcome be measured? Simply after the treatment, no matter
how long the treatment was? Or after one to two years?

Guidelines for Systematic Reviews and Meta-Analyses

In light of the often suboptimal quality of meta-analyses, some guidelines and
standards have been established on how meta-analyses should be conducted.

If you meta-analyze evidence in biomedical research or on the effect of an
intervention, we strongly advise you to follow the Preferred Reporting Items for
Systematic Reviews and Meta-Analyses, or PRISMA (Moher et al., 2009). The PRISMA
statement contains several recommendations on how nearly all aspects of the
meta-analysis process should be reported. The statement can also be found online
(http://www.prisma-statement.org/). For meta-analyses of psychological and
behavioral research, the American Psychological Association’s Meta-Analysis
Reporting Standards (Appelbaum et al., 2018), or MARS, should be followed.

Although these standards largely pertain to how meta-analyses should be reported,
they also have implications for best practices when performing a meta-analysis.
PRISMA and MARS share many core elements, and many things that we cover in this
chapter are also mentioned in both of these guidelines.

An even more detailed resource is the Cochrane Handbook for Systematic Reviews of
Interventions (see Chapter 1.2), which contains precise recommendations on virtually
every aspect of systematic reviews and meta-analyses. An overview of methodological
standards for meta-analyses (with a focus on social science) can be found in Pigott
and Polanin (2020).

While the PICO framework is an excellent way to specify the eligibility criteria of a
meta-analysis, it does not cover all information that may be relevant. There are a few
other aspects to consider (Lipsey and Wilson, 2001).
One relevant detail concerns the eligible research designs. In evidence-based medicine, it
is common to only include evidence from randomized controlled trials (meaning
studies in which participants were allocated to the treatment or control group by
chance); but this is not always required (Borenstein et al., 2011, chapter 40).
It may also be helpful to specify the cultural and linguistic range of eligible studies. Most
research is based on WEIRD populations, meaning western, educated, industrialized,
rich, and democratic societies (Henrich et al., 2010). Especially in social science, it
is very likely that certain effects or phenomena do not generalize well to countries
with other societal norms. Many researchers, however, only consider publications
in English for their meta-analyses, to avoid having to translate articles in other
languages. This means that some evidence from different language areas will not be
taken into account. Although English is the most common language for scientific
publishing in most disciplines, it should at least be made transparent in the eligibility
criteria that this limitation exists. If one of the goals of a meta-analysis is to examine
cross-cultural differences, however, it is generally advisable to extend the eligibility
criteria to other languages, provided all the other criteria are fulfilled.
Another important aspect is the publication type that is allowed for a meta-analysis.
Sometimes, meta-analysts only include research articles which were published in
peer-reviewed scientific journals. The argument is that studies taken from this source
fulfill higher standards since they have passed the critical eyes of experts in the
field. This justification is not without flaws. In Chapter 1.3, we already covered that
the “File Drawer” problem can seriously limit the validity of meta-analysis results
because positive findings are more likely to get published. A way to mitigate the risk
of publication bias is therefore to also include grey literature. Grey literature can be
defined as all types of research materials that have not been made available through
conventional publication formats. This includes research reports, preprints, working
papers, or conference contributions. Dissertations also often count as grey literature,
although many of them are indexed in electronic bibliographic databases today
(Schöpfel and Rasuli, 2018). It may be advisable to at least also include dissertations
in a meta-analysis. Compared to other types of unpublished material, it is rather
unlikely that the information provided in dissertations is heavily biased or downright
fraudulent. Furthermore, you can still define other eligibility criteria to ensure that
only studies fulfilling certain methodological requirements are included, no matter
if they were published in scientific journals or not.
The last step of defining your eligibility criteria is to write them down as a list of
inclusion and exclusion criteria that you will apply. Here is an example from a meta-
analysis of insomnia interventions in college students showing how this can be done
(Saruhanjan et al., 2020):

“We included: (a) RCTs [randomized controlled trials; authors’ note] in which
(b) individuals enrolled at a tertiary education facility (university, college or
comparable postsecondary higher education facility) at the time of random-
ization, (c) received a sleep-focused psychological intervention, (d) that was
compared with a passive control condition, defined as a control condition in
which no active manipulation was induced as part of the study (wait-list,
treatment as usual).
For the purposes of this analysis, “sleep-focused” means that (e) effects on
symptoms of sleep disturbances (global measures of sleep disturbances, sleep-
onset latency […], fatigue and daytime functionality, pre-sleep behaviour and
experiences) were assessed as a (f) target outcome (by declaring a sleep outcome
as the primary outcome or by stating the intervention was primarily aimed
at this outcome) using (g) standardized symptom measures (objective sleep
measures, standardized sleep or fatigue questionnaires, sleep diaries, items
recording sleep quantity, quality or hygiene).
Only studies (h) published in English or German were considered for inclu-
sion.”

1.4.2 Analysis Plan & Preregistration

After your research question and eligibility criteria are set, it is sensible to also write
an analysis plan (Pigott and Polanin, 2020; Tipton et al., 2019). In statistics, there
is an important distinction between a priori and post hoc analyses. A priori analyses
are specified before seeing the data. Post hoc, or exploratory, analyses are conducted
after seeing the data, or based on the results implicated by the data. Results of a priori
analyses can be regarded as much more valid and trustworthy than those of post hoc analyses.
Post hoc analyses make it easier to tweak certain details about the analysis or the
data itself until results support the goals of the researcher. They are therefore much
more prone to the “Researcher Agenda” problem we discussed in Chapter 1.3.
In the analysis plan, we specify all important calculations we want to perform in
our meta-analysis a priori. This serves two purposes. First, it allows others to verify
that the analyses we made were indeed planned, and are not the mere result of us
playing around with the data until something desirable came out. Second, a detailed
analysis plan also makes our meta-analysis reproducible, meaning that others can
understand what we did at each step of our meta-analysis, and try to replicate them.
When using R, we can take the reproducibility of our analyses to a whole new level
by writing documents which allow others to re-run every step of our analysis (see
Chapter 16 in the “Helpful Tools” section). But this is relevant after we complete our
analyses. In the analysis plan, we specify what we plan to do before any data has been
collected.
There are a few things we should always specify in our analysis plan. We should
make clear which information we will extract, and which effect size metric will be
calculated for each included study (see Chapter 3). It is also recommended to decide
beforehand if we will use a fixed- or random-effects model to pool results from each
study, based on the amount of variation between studies we expect (see Chapter
4). An a priori power analysis may also be helpful to determine how many studies are
required for our meta-analysis to find a statistically significant effect (see Chapter
14 in the “Helpful Tools” section). Furthermore, it is crucial to determine whether we want
to assess if some variables explain differences in the outcomes of included studies
using a subgroup analysis (Chapter 7) or meta-regression (Chapter 8). For example,
if our hypothesis states that the publication year might be associated with a study’s
outcome, and if we want to have a look at this association later in our meta-analysis,
we must mention this in our analysis plan. If we plan to sort studies into subgroups
and then have a look at these subgroups separately, we should also report the exact
criteria through which we will determine that a study belongs to a specific subgroup
(see Chapter 1.4.4). In Part II of this book, we will cover various statistical techniques
to apply as part of a meta-analysis. Every technique we learn there and plan to apply
in our meta-analysis should be mentioned in the analysis plan.
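Returning to the a priori power analysis mentioned above: below is a minimal sketch using the power.analysis function from the {dmetar} package that accompanies this guide (see Chapter 14). It assumes {dmetar} is installed, and all input values are made-up planning assumptions, not recommendations.

# A priori power analysis sketch; {dmetar} must be installed, and all
# numbers below are hypothetical planning assumptions.
library(dmetar)
power.analysis(d = 0.30,                   # assumed true effect (SMD)
               k = 10,                     # expected number of studies
               n1 = 25, n2 = 25,           # assumed average group sizes
               p = 0.05,                   # significance level
               heterogeneity = "moderate") # expected between-study variation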
Once you are finished writing your analysis plan, do not simply bury it somewhere–
make it public. There are a few excellent options for researchers to make their re-
search documents openly available. For example, we can create a new project on the
website of the Open Science Framework (OSF; see Chapter 16.3 in the “Helpful Tools”
section) and upload our analysis plan there. We can also upload our analysis plan
to a preprint server such as medrxiv.org, biorxiv.org, or psyarxiv.com, depending on
the nature of our research question. Once our eligibility criteria, analysis plan and
search strategy (see next chapter) are set, we should also register our meta-analysis. If
the meta-analysis has a broadly health-related outcome, this may preferably be done
using PROSPERO5, one of the largest registries for prospective systematic reviews
and meta-analyses. The preregistration service of the OSF6 is also a good option.
5 https://www.crd.york.ac.uk/prospero/
6 https://osf.io/prereg/
In case we want to go even one step further, we can also write an entire protocol for our
meta-analysis (Quintana, 2015). A meta-analysis protocol contains the analysis plan,
plus a description of the scientific background of our study, more methodological
detail, and a discussion of the potential impact of the study. There are also guidelines
on how to write such protocols, such as the PRISMA-P Statement (Moher et al., 2015).
Meta-analysis protocols are accepted by many peer-review journals. A good example
can be found in Büscher, Torok and Sander (2019), or Valstad and colleagues (2016).
A priori analysis plans and preregistration are essential features of a well-made, trust-
worthy meta-analysis. And they should not make you anxious. Making the perfect
choice for each and every methodological decision straight away is difficult, if not
impossible. It is perfectly natural to make changes to one’s initial plans somewhere
down the road. We can assure you that, if you are honest and articulate about changes
to your planned approach, most researchers will not perceive this as a sign of failure,
but of professionalism and credibility.

1.4.3 Study Search

The next step after determining your eligibility criteria and analysis plan is to search
for studies. In Chapter 1.1, we discussed that most meta-analyses are an advanced
type of systematic review. We aim to find all available evidence on a research question
in order to get an unbiased, comprehensive view of the facts. This means that the
search for studies should also be as comprehensive as possible. Not only one, but
several sources should be used to search for studies. Here is an overview of important
and commonly used sources.
• Review articles. It can be very helpful to screen previous reviews on the same or
similar topics for relevant references. Narrative and systematic reviews usually
provide a citation for all the studies that they included in their review. Many of
these studies may also be relevant for your purposes.
• References in studies. If you find a study that is relevant for your meta-analysis, it
is sensible to also screen the articles that this study references. It is very likely
that the study cites previous literature on the same topic in the introduction
or discussion section, and some of these studies may also be relevant for your
meta-analysis.
• Forward search. A forward search can be seen as the opposite of screening the
references of previous primary studies and reviews. It means taking a study that
is relevant for the meta-analysis as a basis, and then searching for other articles that
have cited this study since it was published. This can be done quite easily
on the Internet. You simply have to find the online entry of the study; usually,
it is on the website of the journal in which it has been published. Most journal
websites today have a functionality to display articles that have cited a study.
Alternatively, you can also search for the study on Google Scholar (see Table 1.1).
Google Scholar can display citing research for every entry.
• Relevant journals. Often, there are a number of scientific journals which are spe-
cialized in the type of research question you are focused on. Therefore, it can be
helpful to search for studies specifically in those journals. Virtually all journals
have a website with a search functionality today, which you can use to screen for
potentially eligible studies. Alternatively, you can also use electronic bibliograph-
ical databases, and use a filter so that only results from one or several journals
are displayed.
• Electronic bibliographical databases. The methods we described above can be seen
as rather fine-grained strategies. They are ways to search in places where it is
very likely that a relevant article will be listed. The disadvantage is that these
approaches are unlikely to uncover all the evidence that is really out there. Thus, it
is advisable to also use electronic bibliographic databases for one’s search. An
overview of important databases can be found in Table 1.1.
One should always conduct a search in several databases, not just one. Many
bibliographical databases contain an immense number of entries. Nevertheless,
it is common to find that the overlap in the results of database searches is smaller
than anticipated. You can select the databases you want to search based on their
subject-specific focus. If your meta-analysis focuses on health-related outcomes,
for example, you should at least search PubMed and CENTRAL.
When searching bibliographic databases, it is important to develop a search string.
A search string contains different words or terms, which are connected using
operators such as AND or OR. Developing search strings takes some time and ex-
perimentation. A good way to start is to use the PICO or eligibility criteria (Chapter
1.4.1) as a basis and to connect them using AND (a simplified example would be “col-
lege student” AND “psychotherapy” AND “randomized controlled trial” AND “depression”).
Most bibliographical databases also allow for truncation and wildcards. Truncation
means replacing a word ending with a symbol, allowing it to vary as part of your
search. This is usually done using asterisks. Using “sociolog*” as a search term,
for example, means that the database will search for “sociology”, “sociological”,
and “sociologist” at the same time. A wildcard signifies that a letter in a word
can vary. This can come in handy when there are differences in the spelling of
words (for example, differences between American English and British English).
Take the search term “randomized”. This will only find studies using American
English spelling. If you use a wildcard (often symbolized by a question mark),
you can write “randomi?ed” instead, and this will also return results in which the
British English spelling was used (“randomised”). A small sketch of how such a
search string can be assembled is shown after this list.
When developing your search string, you should also have a look at the number
of hits. A search string should not be so specific that relevant articles are missed.
Getting around 3000 hits for your search string, for example, is manageable in later
steps, and makes it more likely that all important references will be listed in your
results. To see if your search string is generally valid, it sometimes helps to inspect
the first few hundred hits and to check if at least some of the references have
something to do with your research question.
Once you have developed the final versions of the search strings you want to use
in your selected databases, save them somewhere. It is best practice to already
include your search string(s) in your preregistration. Reporting of the search
string (for example, in the supplement) is required if you want to publish a meta-
analysis protocol (see Chapter 1.4.1), or the final results of your meta-analysis.
In conclusion, we want to stress that searching bibliographic databases is an
art in and of itself, and that this paragraph only barely scratches the surface. A
much more detailed discussion of this topic can be found in Cuijpers (2016) and
Bramer and colleagues (2018).
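As announced above, here is a small sketch of how such a search string could be assembled in R. All terms are hypothetical, and the exact truncation and wildcard syntax varies between databases, so a string like this should always be adapted to each database’s rules.

# Assembling a simplified search string (hypothetical terms)
population    <- '"college student*"'                      # truncation
intervention  <- '(psychotherap* OR "psychological intervention")'
design        <- '("randomi?ed controlled trial" OR RCT)'  # wildcard
outcome       <- '(depress* OR "depressive symptom*")'
search_string <- paste(population, intervention, design, outcome,
                       sep = " AND ")
cat(search_string)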

TABLE 1.1: A selection of relevant bibliographical databases.

Core Database

• PubMed: Openly accessible database of the US National Library of Medicine. Primarily contains biomedical research. Website: ncbi.nlm.nih.gov/pubmed
• PsycInfo: Database of the American Psychological Association. Primarily covers research in the social and behavioral sciences. Allows for a 30-day free trial. Website: apa.org/pubs/databases/psycinfo
• Cochrane Central Register of Controlled Trials (CENTRAL): Openly accessible database of the Cochrane Collaboration. Primarily covers health-related topics. Website: cochranelibrary.com/central
• Embase: Database of biomedical research maintained by the large scientific publisher Elsevier. Requires a license. Website: elsevier.com/solutions/embase-biomedical-research
• ProQuest International Bibliography of the Social Sciences: Database of social science research. Requires a license. Website: about.proquest.com/products-services/ibss-set-c.html
• Education Resources Information Center (ERIC): Openly accessible database on education research. Website: eric.ed.gov

Citation Database

• Web of Science: Interdisciplinary citation database maintained by Clarivate Analytics. Requires a license. Website: webofknowledge.com
• Scopus: Interdisciplinary citation database maintained by Elsevier. Requires a license. Website: scopus.com
• Google Scholar: Openly accessible citation database maintained by Google. Has only limited search and reference retrieval functionality. Website: scholar.google.com

Dissertations

• ProQuest Dissertations: Database of dissertations. Requires a license. Website: about.proquest.com/products-services/dissertations/

Study Registries

• WHO International Clinical Trials Registry Platform (ICTRP): Openly accessible database of clinical trial registrations worldwide. Can be used to identify studies that have not (yet) been published. Website: www.who.int/ictrp
• OSF Registries: Openly accessible interdisciplinary database of study registrations. Can be used to identify studies that have not (yet) been published. Website: osf.io/registries

1.4.4 Study Selection

After completing your study search, you should have been able to collect thousands of
references from different sources. The next step is now to select the ones that fulfill
your eligibility criteria. It is advisable to follow a three-step procedure to do this.
In the first step, you should remove duplicate references. Especially when you search
in multiple electronic bibliographical databases, it is likely that a reference will ap-
pear more than once. An easy way to do this is to first collect all your references in
one place by importing them into reference management software. There are several
good reference management tools. Some of them, like Zotero7 or Mendeley8 can be
downloaded for free. Other programs like EndNote9 provide more functionality but
usually require a license. Nearly all of those reference managers have a functionality
which allows you to automatically remove duplicate articles. It is important that
you write down the number of references you initially found in your study search,
and how many references remained after duplicate removal. Such details should be
reported later on once you make your meta-analysis public.
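As a small illustration of this step, the sketch below removes exact duplicates from a hypothetical data frame of exported records in R. Dedicated reference managers use more sophisticated matching (e.g., of slightly differing titles), so this is only meant to convey the idea.

# Removing exact duplicates from exported records ("refs", "title", and
# "doi" are hypothetical example data and column names)
refs <- data.frame(
  title = c("Trial A", "Trial A", "Trial B"),
  doi   = c("10.1000/xyz1", "10.1000/xyz1", "10.1000/xyz2")
)
n_initial <- nrow(refs)                         # document this number
refs <- refs[!duplicated(tolower(refs$doi)), ]  # drop records with repeated DOIs
n_deduplicated <- nrow(refs)                    # and this one, for reporting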
After duplicate removal, it is time to eliminate references that do not fit your purpose,
based on their title and abstract. It is very likely that your study search will yield
hundreds of results that are not even remotely linked to your research question10.
Such references can be safely removed by looking at their title and abstract only. A
reference manager will be helpful for this step too. You can go through each reference
one after another and simply remove it when you are sure that the article is not
relevant for you11 . If you think that a study might contain interesting information
based on the title and abstract, do not remove it–even if it seems unlikely that the
study is important. It would be unfortunate if you put considerable time and effort
into a comprehensive study search just to erroneously delete relevant references in
the next step. The title and abstract-based screening of references does not require
you to give a specific reason why you excluded the study. In the end, you must only
document how many studies remained for the next step.
Based on title and abstract screening, it is likely that more than 90% of your initial
references could be removed. In the next step, you should now retrieve the full article
for each reference. Based on everything reported in the article, you then make a final
decision if the study fulfills your eligibility criteria or not. You should be particularly
thorough here because this is the final step determining if a study will be included
in your meta-analysis or not. Furthermore, it is not simply sufficient to say that you
removed a study because it did not fit your purpose. You have to give a reason here.
For each study you decide to remove, you should document why exactly it was not
eligible as per your defined criteria. Besides your eligibility criteria, there is one
other reason why you might not be able to include a study. When going through
the full article, you might discover that not enough information is provided to decide
whether the study is eligible or not. It is possible that a study simply does not provide
enough information on the research design. Another frequent scenario is that the
results of a study are not reported in a format that allows you to calculate the effect
size metric used in your meta-analysis. If this happens, you should try to contact
the corresponding author of the study at least two times, and ask for the needed
information. Only if the author does not respond, and if the information lacking in
the published article is essential, should you exclude the study.
7 https://www.zotero.org/
8 https://www.mendeley.com/
9 https://endnote.com/
10 Lipsey and Wilson (2001) tell the amusing anecdote that, when searching articles for a meta-analysis on the relationship between alcohol consumption and aggression, they had to exclude a surprisingly large number of studies in which alcohol was given to fish to examine territorial fighting behavior.
11 When exporting references from an electronic database, the abstract is usually added to the reference file, and can be displayed in the reference management tool. If no abstract is found for the reference, it usually only takes a quick Google search of the study title to find it.
Once we have arrived at the final selection of studies to include, we write down all
the details of the inclusion process in a flow diagram. A commonly used template for
such a flow chart is the one provided by the PRISMA guidelines12. This flow chart
documents all the necessary information we covered above: (1) how many references
we could identify by searching electronic databases; (2) how many additional refer-
ences we found through other sources; (3) the number of references that remained
after duplicate removal; (4) the number of references we removed based on title
and abstract; (5) the number of articles we removed based on the full manuscript,
including how many articles were excluded due to which specific reason; and (6)
the number of studies we included in our qualitative synthesis (systematic review)
and quantitative synthesis (meta-analysis). Please note that the number of articles
that were not excluded at (5) and the number of studies included in (6) are usually
identical, but they do not have to be. For example, it is possible that one article reports
results of two or more independent studies, all of which are suitable for meta-analysis.
The number of studies would then be higher than the number of included articles.
12 http://prisma-statement.org/PRISMAStatement/FlowDiagram
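A simple way to keep track of these numbers during screening is to record them in one place. The sketch below (with made-up counts) also checks that the full-text numbers add up.

# Recording PRISMA flow numbers (all counts are hypothetical)
flow <- c(identified_databases    = 3012,
          identified_other        = 15,
          after_deduplication     = 2250,
          excluded_title_abstract = 2130,
          fulltext_assessed       = 120,
          excluded_fulltext       = 85,
          included                = 35)
# sanity check: assessed full texts minus exclusions must equal inclusions
stopifnot(flow["fulltext_assessed"] - flow["excluded_fulltext"] ==
            flow["included"])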

Double-Screening

Nearly all relevant guidelines and consensus statements emphasize that double
screening should be used during the study selection process (Tacconelli, 2009;
Higgins et al., 2019; Campbell Collaboration, 2016). This means that at least two
people should perform each of the study selection steps independently to avoid
errors. Reference removal based on the title and abstract should be conducted
independently by two or more researchers, and the combination of all records that
have not been removed by the assessors should be forwarded to the next step. Using
two or more assessors is even more important in the final step, in which full articles
are screened. In this step, each person should independently assess if a study is
eligible, and if it is not, give reasons why.
The assessors should then meet and compare their results. It is common that
assessors disagree on the eligibility of some studies, and such disagreements can
usually be resolved through discussion. If assessors fail to find an agreement, it can
be helpful to determine a senior researcher beforehand who can make a final decision
in such cases.
Using two or more assessors is not only advisable in the study selection process.
This approach is also beneficial when extracting and coding data (see Chapter 1.4.5).

1.4.5 Data Extraction & Coding

When the selection of studies to be included in the meta-analysis is finalized, data
can be extracted. There are three major types of information we should extract from
the selected articles (Cuijpers, 2016):

1. Characteristics of the studies.
2. Data needed to calculate effect sizes.
3. Study quality or risk of bias characteristics.

It is conventional for high-quality meta-analyses to provide a table in which charac-
teristics of the included studies are reported. The exact details reported in this table
can vary depending on the research field and research question. However, you should
always extract and report the first author of a study, and when it was published. The
sample size of each study should also be reported. Apart from that, you may include
some information on characteristics specified in the PICO of your meta-analysis,
such as the country of origin, the mean or median age, the proportion of female and
male participants, the type of intervention or exposure, the type of control group or
comparison (if applicable), as well as the assessed outcomes of each study. If one or
several studies did not assess one of these characteristics, you should indicate in the
table that this detail was not specified.
It is also necessary to extract and collect the data needed to calculate the effect sizes
or outcome measures we plan to pool. In Chapter 2, we will discuss in greater detail
how you can structure your effect size data in a spreadsheet so that it can easily
be used for calculations in R. If your analysis plan (see Chapter 1.4.2) also includes
planned subgroup analyses and meta-regressions, you should also extract the data
you need for these analyses from the articles.
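To give a first impression of what this might look like, here is a minimal sketch of an extraction sheet as it could be represented in R. The column names and values are hypothetical; Chapter 2 describes the format used throughout this guide.

# A hypothetical extraction sheet with effect size data
extraction <- data.frame(
  author = c("Smith et al.", "Jones et al.", "Lee et al."),
  year   = c(2004, 2010, 2017),
  n      = c(120, 85, 210),      # total sample size
  TE     = c(0.35, 0.12, 0.48),  # effect size of each study (e.g., an SMD)
  seTE   = c(0.18, 0.22, 0.14)   # standard error of the effect size
)
# write.csv(extraction, "extraction_sheet.csv")  # save for later use in R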
It is common in meta-analysis to also rate and report the quality of the primary
studies. The information you need to extract from each study depends on the type
of rating system you are using. Countless tools to assess the quality of primary
studies have been developed in the last decades (Sanderson et al., 2007). When only
randomized controlled trials are eligible for your study, one of the best ways to code
the study quality is to use the Risk of Bias Tool developed by Cochrane (Higgins et al.,
2011; Sterne et al., 2019)13. As it says in the title, this tool does not assess the quality
of studies per se, but their risk of bias. Study quality and risk of bias are related, but
not identical concepts. “Bias” refers to systematic errors in the results of a study or
their interpretation. Risks of bias are aspects of the way a study was conducted, or its
results, that may increase the likelihood of such systematic errors. Even when a study
only applies methods that are considered the “state of the art”, it is still possible that
biases exist. A study can fulfill all quality standards that are perceived as important
in a particular research field, but sometimes even these best practices may not be
enough to shield the study from distortions. The “risk of bias” concept thus has a
slightly different focus compared to study quality assessments. It is primarily
concerned with whether the output of an intervention study is believable, and it
focuses on criteria which are conducive to this goal (see also Higgins et al., 2019,
chapter 7).
13 https://methods.cochrane.org/bias/resources/rob-2-revised-cochrane-risk-bias-tool-randomized-trials
On several domains, the risk of bias tool lets you classify the risk of bias of a study
as “high” or “low”, or it can be determined that there are “some concerns”. There are
also conventions on how the risk of bias can be summarized visually (see Chapter
15, where we describe how this can be done in R). A similar resource to assess the
risk of bias in non-randomized studies is the Risk of Bias in Non-randomized Studies of
Interventions, or ROBINS-I, tool (Sterne et al., 2016)14.
14 https://www.riskofbias.info/welcome/home
The Cochrane Risk of Bias tools have become the standard approach to assess the
risk of bias in (non-)randomized clinical trials (Jørgensen et al., 2016). In other areas,
current practices unfortunately still rather resemble the Wild West. In psychological
research, for example, study quality assessments are often inconsistent, nontrans-
parent, or not conducted at all (Hohn et al., 2019). If you plan to meta-analyze studies
other than clinical trials, there are two things you can do. First, you can check if
the Risk of Bias or ROBINS-I tool may still be applicable, for example, if your stud-
ies focus on another type of intervention that simply has no health-related focus.
Another–admittedly suboptimal–way may be to search for previous high-quality
meta-analyses on similar topics, and check how these studies have determined the
quality of primary studies.
This ends our dive into the history of meta-analysis, its problems, and how we can
avoid some of them when collecting and encoding our data. The next chapter is the
beginning of the “hands-on” part of this guide. In it, we will do our own first steps in
R.


1.5 Questions & Answers

Test your knowledge!

1. How can meta-analyses be defined? What differentiates a meta-analysis from other types of literature reviews?
2. Can you name one of the founding mothers and fathers of meta-analysis? What achievement can be attributed to her or him?
3. Name three common problems of meta-analyses and describe them in one or two sentences.
4. Name qualities that define a good research question for a meta-analysis.
5. Have a look at the eligibility criteria of the meta-analysis on sleep interventions in college students (end of Chapter 1.4.1). Can you extract the PICO from the inclusion and exclusion criteria of this study?
6. Name a few important sources that can be used to search studies.
7. Describe the difference between “study quality” and “risk of bias” in one or two sentences.

Answers to these questions are listed in Appendix A at the end of this book.

1.6 Summary
• More and more scientific research is published each year, making it harder to keep
track of available evidence. However, more research output does not automatically
result in scientific progress.
• Meta-analysis aims to combine the results of previous studies in a quantitative way.
It synthesizes all available evidence pertaining to a research question and can be
used for decision-making.
• Meta-analytic methods trace back to the beginning of the 20th century. Modern
meta-analytic approaches, however, have been developed in the second half of the
20th century, and meta-analysis has become a common research tool since then.
• There are several problems that are relevant for each meta-analysis: the “Apples
and Oranges” problem, the “Garbage In, Garbage Out” problem, the “File Drawer”
problem, and the “Researcher Agenda” problem.
• Many of these problems can be mitigated by defining a clear research question and
eligibility criteria, writing an analysis plan, pre-registering the meta-analysis, and
conducting the study search and data extraction in a systematic and reproducible
way.