
Fitting three-level meta-analytic models in R: A step-by-step tutorial
Article in Tutorials in Quantitative Methods for Psychology · December 2016
DOI: 10.20982/tqmp.12.3.p154
Authors: Mark Assink and Carlijn Wibbelink, University of Amsterdam


¦ 2016 Vol. 12 no. 3

Fitting three-level meta-analytic models in R: A step-by-step tutorial

Mark Assink (a) and Carlijn J. M. Wibbelink (b)

(a) Research Institute of Child Development and Education, University of Amsterdam
(b) Psychology Research Institute, University of Amsterdam

Abstract  Applying a multilevel approach to meta-analysis is a strong method for dealing with dependency of effect sizes. However, this method is relatively unknown among researchers and, to date, has not been widely used in meta-analytic research. Therefore, the purpose of this tutorial was to show how a three-level random effects model can be applied to meta-analytic models in R using the rma.mv function of the metafor package. This application is illustrated by taking the reader through a step-by-step guide to the multilevel analyses, comprising the steps of (1) organizing a data file; (2) setting up the R environment; (3) calculating an overall effect; (4) examining heterogeneity of within-study variance and between-study variance; (5) performing categorical and continuous moderator analyses; and (6) examining a multiple moderator model. By example, the authors demonstrate how the multilevel approach can be applied to meta-analytically examining the association between mental health disorders of juveniles and juvenile offender recidivism. In our opinion, the rma.mv function of the metafor package provides an easy and flexible way of applying a multilevel structure to meta-analytic models in R. Further, the multilevel meta-analytic models can easily be extended so that the potential moderating influence of variables can be examined.

Acting Editor: Denis Cousineau (Université d'Ottawa). Reviewers: One anonymous reviewer.

Keywords: meta-analysis, multilevel analysis. Tools: R, rma.mv, metafor.

[email protected]


10.20982/tqmp.12.3.p154

Introduction

The term meta-analysis refers to a stepwise procedure and a set of statistical techniques for combining results of independent primary studies, so that overall conclusions regarding a specific topic can be drawn. In general, the meta-analytic process can be divided into the following steps: (1) formulating a research problem; (2) searching for relevant primary studies; (3) retrieving information from the primary studies; (4) integrating the retrieved information in statistical analyses; and (5) interpreting the results from the analyses and drawing overall conclusions. In this tutorial, we specifically focus on the statistical analyses in meta-analytic research (the fourth step mentioned above). Throughout the years, a large number of books have been written on meta-analysis, and for a comprehensive overview of all aspects involved in meta-analytic research, we refer the reader to the work of Borenstein, Hedges, Higgins, and Rothstein (2009), Cooper (2010), Hunter and Schmidt (2004), Lipsey and Wilson (2001), and Mullen (1989).

After a research problem has been formulated and the search procedure for relevant primary studies has been finished, it is time for the research synthesist to retrieve information from all primary studies in a coding procedure. In essence, there are two aspects to coding studies: coding information about empirical findings reported in primary studies that can be expressed in effect sizes (i.e., the dependent variable), and the coding of factors, such as study design, ethnicity of the sample, and type of instruments used, that may influence the nature and magnitude of the empirical findings (i.e., the independent variables) (Lipsey & Wilson, 2001). For integrating empirical findings reported in primary studies, it is necessary that each empirical finding



on a topic of interest is expressed in an effect size, which Cohen (1988) has defined as "a quantitative indication of the degree to which [a] phenomenon is present in the population" (pp. 9-10). The larger the value, the greater the degree to which a phenomenon is present or, in other words, the larger the effect. Common metrics for effect size are the standardized difference between the means of two different groups (Cohen's d), the correlation coefficient (r, or Fisher's Z when transformed), and the odds ratio.

An important requirement in traditional univariate meta-analytic approaches is that there is no dependency between effect sizes in the data set that is to be analyzed (e.g., Rosenthal, 1984). If there is dependency between effect sizes (i.e., effect sizes are correlated), there is overlap in the information to which the correlated effect sizes refer. In this way, the available information is 'inflated', which leads to overconfidence in the results of a meta-analysis (Van den Noortgate, López-López, Marín-Martínez, & Sánchez-Meca, 2013). Lipsey and Wilson (2001) emphasize that, to meet the requirement of independency, only one effect size per primary study should be included. After all, it is likely that effect sizes extracted from the same study are more alike (and thus interdependent) than effect sizes extracted from different studies, because the former may be based on the same participants, instruments, and/or circumstances in which the research was conducted (Houben, Van den Noortgate, & Kuppens, 2015).

Different solutions for dealing with dependency of effect sizes have been described in the literature (see, for instance, Borenstein et al., 2009; Cooper, 2010; Del Re, 2015; Hedges & Olkin, 1985; Lipsey & Wilson, 2001; Rosenthal, 1984; Schmidt & Hunter, 2015). Common methods for handling dependency of effect sizes are: simply ignoring the dependency and analyzing the effect sizes as if they were independent; averaging the dependent effect sizes within studies into a single effect size by calculating an unweighted or (less biased) weighted average; selecting only one effect size per study (also referred to as eliminating effect sizes); and shifting the unit of analysis, meaning that one unit of analysis is selected, after which effect sizes are averaged within each unit. Some of these methods are quite conservative, whereas others produce more accurate effect sizes. Cheung (2015) presents a more detailed overview of these strategies and their limitations in his book on applying a structural equation modeling approach to meta-analysis.

When averaging or eliminating effect sizes in primary studies, there is not only the problem of lower statistical power in the analyses due to information loss, but also the problem of a limit on the research questions that can be addressed in a meta-analytic research project (Cheung, 2015). After all, informative differences between effect sizes are lost and can no longer be identified in the analyses. In addition, Cheung notes that extracting a single effect size from each primary study implies that homogeneity of effect sizes within studies is assumed, which is, in most instances, a questionable assumption. By stepping away from the traditional univariate approach to meta-analysis, it becomes possible to deal with dependency of effect sizes in such a way that a research synthesist can extract all relevant effect sizes from each primary study without needing to reduce the number of effect sizes in any way. By performing the analyses using all relevant effect sizes, all information can be preserved and maximum statistical power can be achieved. In addition, there is no assumption of homogeneity of effect sizes within studies.

Applying a three-level structure to a meta-analytic model (Cheung, 2014; Hox, 2010; Van den Noortgate et al., 2013, 2014) is a better approach for dealing with dependency of effect sizes than the methods just mentioned. This three-level meta-analytic model considers three different variance components, distributed over the three levels of the model: sampling variance of the extracted effect sizes at level 1; variance between effect sizes extracted from the same study at level 2; and variance between studies at level 3. In short, this model allows effect sizes to vary between participants (level 1), outcomes (level 2), and studies (level 3). Contrary to several other statistical techniques, the multilevel approach does not require the correlations between outcomes reported within primary studies to be known for estimating the covariance matrix of the effect sizes, since the second level of the three-level meta-analytic model accounts for sampling covariation (Van den Noortgate et al., 2013). Because (estimates of) correlations between outcomes are rarely reported in primary studies and are therefore difficult to obtain, the use of multilevel models in meta-analytic research is a very practical way to account for interdependency of effect sizes. Further, the three-level approach allows examining differences in outcomes within studies (i.e., within-study heterogeneity) as well as differences between studies (i.e., between-study heterogeneity). If there is evidence for heterogeneity in effect sizes, moderator analyses can be conducted to test variables that may explain within-study or between-study heterogeneity. For these analyses, the three-level random effects model can easily be extended with study and effect size characteristics, making the model a three-level mixed effects model.

Despite the fact that using multilevel modeling in meta-analysis is a strong method for dealing with interdependency of effect sizes, it is a rather unknown method among scholars and has not yet been widely applied in meta-analytic research. Therefore, the main purpose of this
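The "averaging" workaround described above can be made concrete in a few lines of base R. This is an illustrative sketch with invented toy values (not rows from the tutorial's data set), collapsing each study's dependent effect sizes into an unweighted and an inverse-variance-weighted average:

```r
# Toy data (hypothetical values): three studies contributing
# one, two, and three dependent effect sizes, respectively.
d <- data.frame(studyID = c(1, 2, 2, 3, 3, 3),
                y = c(0.40, 0.55, 0.35, 0.20, 0.30, 0.10),
                v = c(0.04, 0.02, 0.05, 0.03, 0.03, 0.06))

# Unweighted average per study: every effect size counts equally.
unw <- tapply(d$y, d$studyID, mean)

# Inverse-variance-weighted average per study: more precise
# effect sizes (smaller v) receive more weight.
w   <- 1 / d$v
ivw <- tapply(d$y * w, d$studyID, sum) / tapply(w, d$studyID, sum)

round(unw, 3)  # 0.400 0.450 0.200
round(ivw, 3)  # 0.400 0.493 0.220
```

For study 2, the weighted average (0.493) is pulled toward the more precise effect size, which is why the weighted variant is described above as less biased than the unweighted one; either way, the within-study differences between effect sizes are lost.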


tutorial is to show how the above-described three-level structure can be applied to meta-analytic models. For this purpose, we use the rma.mv function of the metafor package (Viechtbauer, 2015), which can be invoked in the statistical software environment R (R Development Core Team, 2016). The metafor package was written by Wolfgang Viechtbauer and comprises a large set of functions for conducting meta-analyses. One of the many features of this flexible R package is that it allows users to fit a variety of meta-analytic models in which different approaches to analysis can be used. The rma.mv function is part of this package and makes it possible to fit multilevel meta-analytic models that can be extended by including moderators. To illustrate how a three-level random effects meta-analytic model can be set up using the rma.mv function in the R environment, we will present an example of meta-analytic research on the association between mental health disorders and juvenile offender recidivism, which was adapted from the work of Wibbelink, Hoeve, Stams, and Oort (2016). The reader will be guided through this example in a stepwise manner. First, we will illustrate how a data set should be organized and how the R environment should be set up. Second, we will demonstrate how an overall effect can be estimated using a three-level meta-analytic model. Third, we will discuss within-study heterogeneity as well as between-study heterogeneity, and fourth, we will illustrate the steps that are involved in performing a moderator analysis. Lastly, we will show how moderators can be analyzed jointly in one multiple moderator model, in order to examine the unique contribution of moderators.

Example: The association between mental health disorders of juveniles and juvenile offender recidivism

In their meta-analytic study, Wibbelink et al. (2016) focused on associations between mental health disorders of delinquent juveniles and subsequent delinquent behavior of those juveniles (i.e., recidivism). More specifically, the first aim of the study was to meta-analytically estimate an overall association between mental health disorders of juveniles and recidivism, since there are considerable differences in the associations found in primary studies. By statistically summarizing primary studies, better insight is provided into the true association between mental health disorders of juveniles and recidivism. Because primary studies differ from each other in several ways (e.g., differences in the way recidivism is defined, differences in assessing recidivism, and differences in methodological characteristics), a second aim of the study was to examine whether (and how) the association between mental health disorders of juveniles and recidivism is moderated by a number of variables. For the present tutorial, we used a subset of the data set that Wibbelink and colleagues used in their meta-analytic study.

Organizing the data file

Prior to analyzing the effect sizes in a data set, it is first important to properly organize the data file, so that the three-level meta-analytic models can be built in the R environment. An excerpt of the data file that is used in the example described in the present tutorial is shown in Table 1. From this table, it can be derived that each row represents one effect size extracted from one primary study. The first four columns from the left represent the variables that are mandatory to create in order to properly build the three-level meta-analytic models. In the first column, each independent study is designated with a unique identifier in the variable studyID, and in the second column, each extracted effect size is designated with a unique identifier in the variable effectsizeID. As can be seen in the table, six effect sizes were extracted from study 1, three effect sizes from study 2, six effect sizes from study 3, one effect size from study 11, one effect size from study 12, and two effect sizes from study 16. The variable labeled y contains all actual effect sizes, and in this example, all effects are expressed in Cohen's d (but other metrics for the effect size, such as Fisher's z, can also be analyzed with the rma.mv function of the metafor package). Each effect size represents the difference in recidivism rates between juveniles with a mental health disorder and a comparison group of juveniles without a mental health disorder. A positive value of Cohen's d indicates that the prevalence of recidivism is higher in the group of juveniles with a mental health disorder relative to the comparison group, whereas a negative value of Cohen's d is indicative of the opposite. According to the criteria formulated by Cohen (1988), d values of .2, .5, and .8 can be interpreted as small, moderate, and large effects, respectively. The variable labeled v contains the sampling variance that corresponds with the observed effect size in the variable y and can be obtained by squaring the standard error.

The other variables that are part of the data set are tested in moderator analyses as potential moderators of the overall association between juveniles with a mental health disorder and recidivism. In our example, the potential moderators that will be examined are (1) publication status of the primary study; (2) type of delinquent behavior in which juveniles have recidivated; and (3) the year in which a primary study was published. Prior to testing categorical variables as potential moderators of the overall effect, we created a dummy variable for each category of a categorical variable (see Table 1). At first glance, it may seem redundant to create a dummy variable for each of the categories rather than for only the categories that are tested against a reference category (i.e., total number


Table 1 Excerpt of the Data Set Used in the Present Example.

studyID effectsizeID y v pstatpub pstatnotpub typegen typeovert typecovert pyear


1 1 .9066 .0740 1 0 1 0 0 5
1 2 .4295 .0398 1 0 1 0 0 5
1 3 .2679 .0481 1 0 1 0 0 5
1 4 .2078 .0239 1 0 1 0 0 5
1 5 .0526 .0331 1 0 1 0 0 5
1 6 -.0507 .0886 1 0 1 0 0 5
2 7 .5117 .0115 1 0 1 0 0 2
2 8 .4738 .0076 1 0 1 0 0 2
2 9 .3544 .0065 1 0 1 0 0 2
3 10 2.2844 .3325 1 0 1 0 0 -9
3 11 2.1771 .3073 1 0 1 0 0 -9
3 12 1.7777 .2697 1 0 1 0 0 -9
3 13 1.5480 .4533 1 0 1 0 0 -9
3 14 1.4855 .1167 1 0 1 0 0 -7
3 15 1.4836 .1706 1 0 1 0 0 -7
11 59 .8615 .1591 0 1 0 1 0 -7
12 63 .2994 .0041 0 1 0 1 0 5
16 93 -.5675 .0340 1 0 0 0 1 3
16 94 -.7586 .0437 1 0 0 0 1 3
Note. The data set used in the present example was based on the data set created by Wibbelink, Hoeve, Stams, and
Oort (2016). studyID = Unique identifier for each primary study; effectsizeID = Unique identifier for each effect size;
y = Variable containing all effect sizes; v = Variable containing all sampling variances; pstatpub = Published primary
studies (0 = not published; 1 = published); pstatnotpub = Unpublished primary studies (0 = published; 1 = unpublished);
typegen = General delinquency; typeovert = Overt delinquency; typecovert = Covert delinquency. All variables are ex-
plained in the text.
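The y and v columns in Table 1 can be produced from primary-study statistics. The snippet below uses the standard large-sample formulas for Cohen's d and its sampling variance (textbook approximations, e.g., Borenstein et al., 2009; the input values are invented, not rows from the tutorial's data set). Note that v is the squared standard error, as stated in the text:

```r
# Cohen's d from group means, standard deviations, and sample sizes.
cohens_d <- function(m1, m2, sd1, sd2, n1, n2) {
  s_pooled <- sqrt(((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / (n1 + n2 - 2))
  (m1 - m2) / s_pooled
}

# Large-sample approximation of the sampling variance of d;
# its square root is the standard error, so v = se^2.
var_d <- function(d, n1, n2) {
  (n1 + n2) / (n1 * n2) + d^2 / (2 * (n1 + n2))
}

d <- cohens_d(m1 = 0.60, m2 = 0.35, sd1 = 1, sd2 = 1, n1 = 50, n2 = 50)
v <- var_d(d, n1 = 50, n2 = 50)
c(d = d, v = v)  # d = 0.25, v = 0.0403125
```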

of categories – 1). However, we were not only interested in the mean effect of a reference category, but also in the mean effect (including significance and confidence interval) of the other categories that are tested against a reference category. In order to obtain these results, we created a dummy variable for each category of a discrete variable that is tested as a potential moderator. We will further elaborate on this in the section on moderator analyses. So, in our example, we created two dummy variables for publication status and three dummy variables for type of delinquency. In the dichotomous variable pstatpub, it was coded whether a primary study was published or not (1 = published; 0 = not published). The dichotomous variable pstatnotpub was created by inverting the values of the variable pstatpub, so that 0 is indicative of a published study and 1 is indicative of an unpublished study. Both variables are mutually exclusive, as can be seen in Table 1. In the variables typegen (i.e., general delinquent behavior), typeovert (i.e., overt delinquent behavior), and typecovert (i.e., covert delinquent behavior), the value 1 is indicative of the specific type of delinquency being applicable, whereas the value 0 indicates that the specific type of delinquency is not applicable. Once again, these dummy variables are mutually exclusive. The publication year of a study was regarded as a continuous variable; after the publication year of all primary studies was coded, the variable was centered around its mean, and the results were stored in the variable pyear (see Table 1). Prior to the analyses (but not visible in Table 1), it was checked whether outlying effect sizes were present in the data set by screening for standardized z values larger than 3.29 or smaller than -3.29 (Tabachnick & Fidell, 2013). In case of missing values in the variables that were to be tested as potential moderators, the cells were left empty (i.e., system missing values, which are not visible in Table 1). Note that the data set used in the present example can be downloaded as a comma-separated values file (named dataset.csv) from the journal's website.

Setting up the R environment

The statistical software environment R (we recommend at least version 3.2.2) can be downloaded from the following websites:
https://cran.r-project.org/bin/windows/base/ (for Windows);
https://cran.r-project.org/bin/macosx/ (for OS X).
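The coding steps just described (one dummy per category, mean-centering the publication year, and screening for standardized values beyond ±3.29) can be sketched in base R. The variable names follow the tutorial, but the data frame and its values below are invented for illustration:

```r
# Hypothetical raw coding of three primary studies.
dat <- data.frame(
  pstat   = c("published", "published", "notpub"),
  pubyear = c(1998, 2003, 2007),
  y       = c(0.45, 0.10, 2.90),
  v       = c(0.05, 0.04, 0.30))

# One dummy per category (not categories - 1), as in the tutorial.
dat$pstatpub    <- ifelse(dat$pstat == "published", 1, 0)
dat$pstatnotpub <- 1 - dat$pstatpub  # mutually exclusive by construction

# Center publication year around its mean (continuous moderator).
dat$pyear <- dat$pubyear - mean(dat$pubyear)

# Screen for outlying effect sizes: standardized values beyond
# +/- 3.29 (Tabachnick & Fidell, 2013).
z <- (dat$y - mean(dat$y)) / sd(dat$y)
dat$outlier <- abs(z) > 3.29
```

With these toy values no effect size is flagged; a flagged row would be inspected (and possibly winsorized or excluded) before fitting the models.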


R provides a basic graphical user interface, but it is rather easy to install a more productive development environment for R (such as RStudio), if desired by the user. After installing R, the user needs to define a working directory in which syntax, data, and other files can be found by the R environment. This can be done by running the syntax in Listing 1. Note that all syntax should be entered at the command prompt (>) of the R environment, and that all text after a number sign (#) is considered a comment and will not be executed by R. Readers who are interested in replicating our analyses can therefore leave out the comments in the syntaxes presented in this tutorial.

Listing 1 Setting the Working Directory.

# Setting the working directory;
# Mind the forward slashes in the syntax.
setwd("C:/research/meta-analysis/data")

Next, the user needs to install and load the metafor package that comprises the rma.mv function, which will be invoked later on for building the multilevel meta-analytic model. Installing and loading the metafor package can be performed by running the syntax in Listing 2.

Listing 2 Installing and Loading the metafor Package.

# Installing and loading the metafor package.
install.packages("metafor")
library(metafor)

Next, the data set needs to be imported into the R environment. Since our data was saved in the file dataset.csv, which is in the comma delimited format, we need to import this file by running the syntax in Listing 3.

In order to check whether the data was correctly imported into the R environment, the user can screen the imported data by invoking several functions in sequential order (see the syntax in Listing 4).

Calculating an overall effect

First, the overall association between juveniles with a mental health disorder and recidivism (i.e., the overall effect) will be estimated by fitting a three-level meta-analytic model to the data that consists only of an intercept representing the overall effect. For this purpose, we use the rma.mv function of the metafor package, by running the syntax in Listing 5.

Below, we will first take a closer look at the elements of the syntax in Listing 5 that are taken as arguments by the rma.mv function.

• overall = the name of the object in which the results of the rma.mv function will be stored. In our example, we have named this object overall, since we are first estimating an overall effect;

• y = the name of the variable containing all effect sizes (which are Cohen's d values in the present example);

• v = the name of the variable containing all sampling variances;

• random = the argument that is taken by the rma.mv function when the user wants to perform a random-effects meta-analysis. Because the primary studies in the present meta-analytic example were considered to be a random sample of the population of studies, we wanted to perform a random-effects meta-analysis by invoking the rma.mv function with the random argument (for more information on the random-effects approach, see, for instance, Raudenbush, 2009, and Van den Noortgate & Onghena, 2003);

• list(~ 1 | effectsizeID, ~ 1 | studyID) = the element needed for defining the three-level structure of the meta-analytic model. effectsizeID (i.e., the variable containing the unique identifiers of all effect sizes in the data set) defines the second level of the three-level model, at which the variance between effect sizes within primary studies is distributed. studyID (i.e., the variable containing the unique identifiers of all primary studies in the data set) defines the third level of the three-level model, at which the variance between studies is distributed. For both grouping variables (i.e., effectsizeID and studyID), it holds that the same random effect is assigned to effect sizes with the same value of the grouping variable (i.e., such effect sizes are not assumed to be independent), whereas different random effects are assigned to effect sizes having different values of the grouping variable (i.e., those effect sizes are assumed to be independent). In this syntax element, the random effects variance is denoted by ~ 1 and is assigned to a grouping variable by the vertical bar (i.e., |). Note that the first level of the model, at which the sampling variance of all extracted effect sizes is distributed, does not need to be defined in the syntax. The sampling variance is not estimated in the meta-analytic model and is considered to be known. In this example, we will use the formula given by Cheung (2014) to estimate the sampling variance parameter at the first level of the model, and we will return to this issue later on;

• tdist=TRUE = the argument specifying that test statistics and confidence intervals must be based on the t distribution. See below for more information on this argument;

• data=dataset = the argument describing which object contains the data set.

We will now take a closer look at the tdist=TRUE argument of the syntax. The default settings of the rma.mv function prescribe that test statistics of individual coefficients and confidence intervals are based on the normal distribution (i.e., the Z distribution). Further, the omnibus test used for testing multiple coefficients in a meta-analytic model that is extended with potential moderating variables is, by default, based on the chi-square distribution with m degrees of freedom (m = number of coefficients tested in the model, excluding the intercept, if present in the model). Several scholars have shown that using the Z distribution in assessing the significance of model coefficients, and in building confidence intervals around these coefficients, may lead to an increase in the number of unjustified significant results (see, for instance, Li, Shi, & Roth, 1994; Ziegler, Koch, & Victor, 2001). To reduce this problem, the user can apply the Knapp and Hartung (2003) adjustment to the analyses by passing the argument tdist=TRUE to the rma.mv function.

By applying the Knapp and Hartung (2003) adjustment, the calculation of standard errors, p values, and confidence intervals is slightly modified. To be precise, test statistics of individual coefficients will be based on the t distribution with k (number of effect sizes) – p (total number of coefficients in the model, including the intercept) degrees of freedom. If an omnibus test is performed (only relevant when testing potential moderating variables by extending the intercept-only model with predictors), it will be based on the F distribution, in which the degrees of freedom of the numerator (df1) equals the number of coefficients in the model, and the degrees of freedom of the denominator (df2) equals k (number of effect sizes) – p (total number of coefficients in the model, including the intercept). In case the intercept-only model is extended with only one predictor, the F value of the omnibus test equals the square of the t value associated with the regression coefficient of the predictor. The studies of Assink et al. (2015), Houben et al. (2015), and Weisz et al. (2013) are examples of published meta-analytic research in which the Knapp and Hartung adjustment is applied. As for calculating the degrees of freedom, a Satterthwaite correction (Satterthwaite, 1946) is sometimes applied when there are differences in the variances of the groups that are to be compared. This often results in fractional degrees of freedom (see, for instance, Table 2 in the work of Weisz et al., 2013, and Table 2 in the work of Houben et al., 2015). This Satterthwaite correction is not (yet) available in the rma.mv function, and therefore it cannot be applied when there are differences in variances between groups. However, until now, this does not seem problematic, since there is no empirical evidence available showing that applying the Satterthwaite correction produces more robust results in meta-analytic models (Viechtbauer, 2015, personal communication).

The results of fitting a three-level intercept-only model to the data can be printed on screen by running the syntax in Listing 6. Running this syntax will produce the output that is shown in Output 1.

We will now proceed with a detailed explanation of Output 1.

• k = 100; method: REML implies that the data set comprises 100 effect sizes (i.e., 100 rows in the data set) and that the REstricted Maximum Likelihood estimation method (REML) is used for estimating the parameters in the model. It is often possible to choose between different estimation methods in statistical software, and each estimation method has its own advantages and disadvantages. The REML method is in some ways superior to other methods (see, for instance, Hox, 2010; Viechtbauer, 2005), but it also has restrictions (e.g., Cheung, 2014; Van den Noortgate et al., 2013). In this tutorial, we will not further discuss this issue. However, it is important to note that when using the REML method, it is not possible to perform a log-likelihood-ratio test to compare the fit of an intercept-only model (i.e., a model without predictors) to a model with predictors (Hox, 2010; for more information, see Van den Noortgate et al., 2014).

• logLik, Deviance, AIC, BIC, and AICc are goodness-of-fit indices for the meta-analytic model and provide information on how well the model fits the data set. In this tutorial, we will not further discuss the technical details of these indices.

• As for the variance components, it can be derived from the output that 0.112 is the estimated value for the variance between effect sizes within studies (distributed at the second level of the model) and that 0.188 is the estimated value for the variance between studies (distributed at the third level of the model). The results in the columns nlvls and factor tell us that the data set comprises 100 effect sizes


Listing 3 Importing the Data in R.

# Importing data saved in a comma separated values (CSV) file;


# The data file to be imported was named "dataset.csv";
# All data saved in the file "dataset.csv" is read by invoking
# the read.csv function and assigned to a newly created object
# "dataset" by the assignment operator "<-".
dataset <- read.csv("dataset.csv")
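Listing 3 relies on the read.csv defaults: a header row and comma separators. This round trip can be checked in a self-contained way by writing a small invented data set to a temporary file and reading it back with the same call:

```r
# Round-trip check of the CSV import step, using invented values
# and a temporary file in place of dataset.csv.
tmp <- tempfile(fileext = ".csv")
toy <- data.frame(studyID      = c(1, 1, 2),
                  effectsizeID = 1:3,
                  y            = c(0.91, 0.43, 0.51),
                  v            = c(0.074, 0.040, 0.012))
write.csv(toy, tmp, row.names = FALSE)

dataset <- read.csv(tmp)  # same call as in Listing 3
str(dataset)              # one row per effect size, numeric y and v
```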

Listing 4 Screening the Imported Data.

# Request several descriptive statistics (e.g., mean, median,


# minimum, maximum) of all variables that are part of the data.
summary(dataset)
# Request an overview of the data structure.
str(dataset)
# Request a print of the first six rows of the data set on screen.
head(dataset)

Listing 5 Estimating the Overall Effect.

# Estimate the overall effect by fitting an intercept-only model.
overall <- rma.mv(y, v, random = list(~ 1 | effectsizeID, ~ 1 | studyID),
                  tdist = TRUE, data = dataset)

Listing 6 Printing the Results on Screen.

# Request a print of the results stored in the object


# ‘‘overall’’ in three digits.
summary(overall, digits=3)

Output 1 Output of Listings 5 - 6.

Multivariate Meta-Analysis Model (k = 100; method: REML)


logLik Deviance AIC BIC AICc
-73.632 147.264 153.264 161.050 153.517

Variance Components:
estim sqrt nlvls fixed factor
sigma^2.1 0.112 0.335 100 no effectsizeID
sigma^2.2 0.188 0.433 17 no studyID

Test for Heterogeneity:
Q(df = 99) = 808.848, p-val < .001

Model Results:
estimate     se    tval   pval  ci.lb  ci.ub
   0.427  0.118   3.604  <.001  0.192  0.662  ***
---
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
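As a quick check we added (not part of the original tutorial), the reported confidence interval can be reproduced in base R from the printed estimate and standard error; because tdist=TRUE was specified, the interval is based on a t distribution with k − 1 = 99 degrees of freedom. Tiny discrepancies with the printed bounds stem from rounding of the printed values:

```r
# Reproducing the 95% CI in Output 1 from the rounded estimate and SE.
est  <- 0.427   # overall effect (printed value)
se   <- 0.118   # standard error (printed value)
crit <- qt(0.975, df = 99)          # two-sided 95% critical t value
ci <- c(lower = est - crit * se, upper = est + crit * se)
round(ci, 3)    # close to the printed interval (0.192, 0.662)
```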


effectsizeID) that were extracted from 17 studies (factor studyID).
• The results of the test for heterogeneity reveal significant variation between all effect sizes in the data set, since the p value is smaller than .001. However, these results are not very informative, as we are interested in within-study variance (level 2) as well as between-study variance (level 3) and not in variance between all effect sizes in the data set.
• The overall effect can be derived from the Model Results. More specifically: estimate = the overall effect size; se = standard error; tval = t value; pval = p value; ci.lb = lower bound of the confidence interval; and ci.ub = upper bound of the confidence interval.
In our example, we can conclude that the overall association between mental health disorders of juveniles and recidivism in juvenile delinquency is 0.427 (expressed in Cohen’s d) with a standard error of 0.118. This overall effect is significant (t(99) = 3.604, p < .001) and the confidence interval is 0.192 to 0.662. According to the criteria formulated by Cohen (1988), stating that d = .2, d = .5, and d = .8 are small, moderate, and large effects respectively, the overall effect of 0.427 can be regarded as small to moderate.

Determining the significance of the heterogeneity in effect sizes

To determine whether the within-study variance (level 2) and between-study variance (level 3) are significant, two separate log-likelihood-ratio tests can be performed. Preferably, these tests are performed one-sided, since variance components can only deviate from zero in a positive direction. In both tests, the null hypothesis states that one of the variance components equals zero, whereas the alternative hypothesis states that the variance component is greater than zero. Performing these tests two-sided would be too conservative (Viechtbauer, 2015, personal communication). In the output of R, p values are by default reported for two-sided tests, and since we are performing one-sided log-likelihood-ratio tests, we need to divide the accompanying p values by two.

Heterogeneity of within-study variance (level 2)

Recall from the last output that the variance distributed at the second level of the three-level model was captured in the estimated value of 0.112. For testing the significance of this variance component, we will perform a one-sided log-likelihood-ratio test. In this test, the fit of the original model, in which the variance at levels 2 and 3 is freely estimated, will be compared to the fit of a model in which only the variance at level 3 is freely estimated and in which the variance at level 2 will be manually fixed to zero. In other words, the fit of the original three-level model will be compared to the fit of a two-level model in which within-study variance is no longer modeled. By doing so, it is possible to determine whether it is at all necessary to account for within-study variance in the meta-analytic model. The null hypothesis in this test states that the within-study variance equals zero (H0: σ²(level 2) = 0), whereas the alternative hypothesis states that the within-study variance is greater than zero (Ha: σ²(level 2) > 0). If the test results provide support for rejecting the null hypothesis, we can conclude that the fit of the original three-level model is statistically better than the fit of the two-level model, and consequently, that there is significant variability between effect sizes within studies. The significance test can be performed by running the syntax in Listing 7.
This syntax closely resembles the syntax for creating the overall object (see Listing 5), but it has been modified in two respects:
• modelnovar2 = the name of the object in which the results of the rma.mv function will be stored. In our example, we have named this object modelnovar2, since it will contain a model that has no within-study variance at level 2;
• sigma2=c(0,NA) = the argument that is taken by the rma.mv function when the user wants to fix a specific variance component to a user-defined value. The first parameter (0) states that the within-study variance is fixed to zero (i.e., no within-study variance will be modeled), and the second parameter (NA) states that the between-study variance is estimated.
To perform the actual log-likelihood-ratio test, the syntax in Listing 8 needs to be executed.
By calling the anova function, the fit of the two-level model named modelnovar2 will be tested against the fit of the three-level model named overall, which was previously created (see Listing 5). We will now take a look at the output generated by the anova function, which is shown in Output 2.
Output 2 should be interpreted as follows:
• Full represents the three-level model stored in the object overall, whereas Reduced represents the two-level model stored in the object modelnovar2;
• df = degrees of freedom. The reduced model has one degree of freedom less than the full model, since within-study variance is not present in the reduced model;
• LRT = likelihood-ratio test. In this column, the value of the test statistic is presented;
• pval = the two-sided p value of the test statistic;
• QE resembles the test for heterogeneity in all effect sizes in the data set, and the value of the test statistic is given in this column. Recall that this test is not very


Listing 7 Building a Two Level-Model without Within-Study Variance.

# Build a two-level model without within-study variance.


modelnovar2 <- rma.mv(y, v, random = list(~ 1 | effectsizeID, ~ 1 | studyID),
sigma2=c(0,NA), tdist=TRUE, data=dataset)

Listing 8 Performing a Likelihood-Ratio-Test.

# Perform a likelihood-ratio-test to determine the


# significance of the within-study variance.
anova(overall,modelnovar2)

informative, as we are interested in both within-study variance (level 2) and between-study variance (level 3) in this three-level meta-analytic example. We are not interested in variance between all effect sizes in the data set.
Given the results, we can conclude that the within-study variance is significant, since the fit of the full model is significantly better than the fit of the reduced model. Simply put, we found significant variability between effect sizes within studies. Note that the two-sided p value is very small (< .0001) and already smaller than the significance level of .05, so dividing the p value by two does not change this conclusion.

Heterogeneity of between-study variance (level 3)

Determining the significance of the between-study variance proceeds in a similar way. Recall from Output 1 that the variance distributed at the third level of the three-level model was captured in the estimated value of 0.188. We will again perform a one-sided log-likelihood-ratio test, but now, the fit of the original three-level model will be compared to the fit of a model in which only the variance at level 2 is freely estimated and in which the variance at level 3 will be manually fixed to zero. In this last model, between-study variance is not modeled. The null hypothesis in the test states that the between-study variance equals zero (H0: σ²(level 3) = 0), whereas the alternative hypothesis states that the between-study variance is greater than zero (Ha: σ²(level 3) > 0). If the null hypothesis should be rejected based on the test results, we can conclude that the fit of the original three-level model is statistically better than the fit of the two-level model, and consequently, that there is significant variability between studies. The significance test can be performed by running the syntax in Listing 9.
This syntax has just slightly changed in comparison to the syntax in Listing 7.
• The object is now named modelnovar3, since we are determining the significance of the between-study variance at level 3;
• Since we want to fix the between-study variance to zero and freely estimate the within-study variance, we have now typed sigma2=c(NA,0);
• In calling the anova function, we have specified that the fit of the two-level model named modelnovar3 should be tested against the fit of the three-level model named overall.
After running this syntax, output as shown in Output 3 is generated.
Given the results, we can conclude that the between-study variance is significant, since the fit of the full model is significantly better than the fit of the reduced model. Simply put, we found significant variability between studies. Note that the two-sided p value is significant (p < .0001), so dividing this p value by two does not change this conclusion.
In our example, there is significant within-study variance (at level 2) as well as significant between-study variance (at level 3). This implies that there is more variability in effect sizes (within and between studies) than may be expected based on sampling variance alone. Therefore, moderator analyses can be performed in order to examine variables that may explain within- and/or between-study variance. However, before turning to moderator analyses, we will first examine how the total variance is distributed over the three levels of the meta-analytic model.

The distribution of the variance over the three levels of the meta-analytic model

Besides testing the significance of the within-study and between-study variance, it is possible to examine how the total variance is distributed over the three levels of the meta-analytic model. Recall that three different sources of variance are modeled in our meta-analytic model: sampling variance at the first level; within-study variance at the second level; and between-study variance at the third level. To determine how much variance can be attributed to differences between effect sizes within studies (level


Output 2 Output of Listings 7 - 8.

        df      AIC      BIC     AICc    logLik      LRT    pval        QE
Full     3  153.264  161.049  153.517   -73.632                    808.8482
Reduced  2  233.156  238.347  233.281  -114.578  81.8923  <.0001   808.8482
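Because the variance component is tested at the boundary of its parameter space, the two-sided p value reported by anova is halved. As an added base-R sketch (assuming, as anova does here, that the LRT statistic is referred to a chi-square distribution with 1 degree of freedom; the LRT value is taken from Output 2):

```r
# One-sided p value for the level 2 variance component (LRT from Output 2).
lrt <- 81.8923
p_twosided <- pchisq(lrt, df = 1, lower.tail = FALSE)
p_onesided <- p_twosided / 2   # halved, because the test is one-sided
```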

Listing 9 Determining the Significance of Between-Study Variance.

# Build a two-level model without between-study variance;


# Perform a likelihood-ratio-test to determine the
# significance of the between-study variance.
modelnovar3 <- rma.mv(y, v, random = list(~ 1 | effectsizeID, ~ 1 | studyID),
sigma2=c(NA,0), tdist=TRUE, data=dataset)
anova(overall,modelnovar3)

2) and to differences between studies (level 3), formulas given by Cheung (2014) can be used. The sampling variance (level 1) cannot be regarded as one fixed value, as this source of variance varies over primary studies. Sampling variance is based on the sample size, and since sample sizes often differ (considerably) from study to study and from effect size to effect size, variation in sampling variance is the consequence. However, it is possible to make an estimate of the sampling variance by using the formula of Cheung (2014, formula 14 on page 2015), and this estimate is also referred to as the typical within-study sampling variance. In Listing 10, the formulas of Cheung are translated into R syntax, with which the distribution of the total variance over the three levels of the meta-analytic model can be determined.
First, we will proceed with an explanation of the syntax in Listing 10.
• In the first eight lines of the syntax, the formula of Cheung (2014, formula 14 on page 2015) is broken down into a number of steps. In each step, a new object is created in which interim results are stored. Eventually, the sampling variance is stored in the object estimated.sampling.variance;
• dataset$v = variable v in object dataset;
• ^2 = squaring an object or variable;
• In creating the objects I2_1, I2_2, and I2_3, each of the three variance components (i.e., sampling variance, within-study variance, and between-study variance, respectively) is divided by the total amount of variance, so that a proportional estimate of each variance component is stored in an object. overall$sigma2[1] refers to the amount of within-study variance in the object overall (which was created in Listing 5) and overall$sigma2[2] refers to the amount of between-study variance in the object overall.
• In creating the objects amountvariancelevel1, amountvariancelevel2, and amountvariancelevel3, the proportional estimates of the three variance components are multiplied by 100 (%), so that a percentage estimate of each variance component is stored in an object;
• By typing and running the objects amountvariancelevel1, amountvariancelevel2, and amountvariancelevel3 separately, the percentage estimates are printed on screen.
Running this syntax generates the output as presented in Output 4. For ease of interpretation, the last three lines of the syntax in Listing 10 are repeated in Output 4.
From Output 4, we can derive that 6.94 percent of the total variance can be attributed to variance at level 1 (i.e., the typical within-study sampling variance); 34.75 percent of the total variance can be attributed to differences between effect sizes within studies at level 2 (i.e., within-study variance); and 58.30 percent of the total variance can be attributed to differences between studies at level 3 (i.e., between-study variance).

A different approach to heterogeneity

Although performing a significance test is the preferred method for determining whether variance components are significant, it may be wise to examine heterogeneity from a different perspective. A problem that arises in performing log-likelihood-ratio tests is that the test results may not be significant in case the data set is comprised of a rather small number of primary studies and/or effect sizes, even though there is in reality substantial within-study or between-study variance present. In other words, a statistical power problem may be involved. When a research synthesist is presented with non-significant results of log-likelihood-ratio tests and consequently decides not to proceed with performing moderator analyses, this may not be the optimal research strategy.


Output 3 Output of Listing 9.

        df      AIC      BIC     AICc    logLik      LRT    pval        QE
Full     3  153.264  161.049  153.517   -73.632                    808.8482
Reduced  2  214.066  219.257  214.191   -105.03  62.8024  <.0001   808.8482
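As an added arithmetic check (not in the original tutorial), the LRT column is simply twice the difference between the restricted log-likelihoods of the full and reduced models printed in Output 3:

```r
# LRT = 2 * (logLik of full model - logLik of reduced model); Output 3 values.
loglik_full    <- -73.632
loglik_reduced <- -105.030
lrt <- 2 * (loglik_full - loglik_reduced)
lrt   # about 62.8, matching the printed LRT up to rounding
```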

Listing 10 The Distribution of the Total Variance over the Three Levels.

# Determining how the total variance is distributed over the


# three levels of the meta-analytic model;
# Print the results in percentages on screen.
n <- length(dataset$v)
list.inverse.variances <- 1 / (dataset$v)
sum.inverse.variances <- sum(list.inverse.variances)
squared.sum.inverse.variances <- (sum.inverse.variances) ^ 2
list.inverse.variances.square <- 1 / (dataset$v^2)
sum.inverse.variances.square <-
sum(list.inverse.variances.square)
numerator <- (n - 1) * sum.inverse.variances
denominator <- squared.sum.inverse.variances -
sum.inverse.variances.square

estimated.sampling.variance <- numerator / denominator

I2_1 <- (estimated.sampling.variance) / (overall$sigma2[1]


+ overall$sigma2[2] + estimated.sampling.variance)

I2_2 <- (overall$sigma2[1]) / (overall$sigma2[1]


+ overall$sigma2[2] + estimated.sampling.variance)

I2_3 <- (overall$sigma2[2]) / (overall$sigma2[1]


+ overall$sigma2[2] + estimated.sampling.variance)

amountvariancelevel1 <- I2_1 * 100


amountvariancelevel2 <- I2_2 * 100
amountvariancelevel3 <- I2_3 * 100

amountvariancelevel1
amountvariancelevel2
amountvariancelevel3

Output 4 Output of Listing 10.

> amountvariancelevel1
[1] 6.942732

> amountvariancelevel2
[1] 34.75388

> amountvariancelevel3
[1] 58.30339
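The computation in Listing 10 lends itself to a reusable helper. The sketch below is our own wrapper (the function name variance_distribution is not part of the tutorial); it implements the same formulas on a vector of sampling variances and the two estimated variance components, and returns percentages that sum to 100:

```r
# Our own helper around the formulas in Listing 10 (Cheung, 2014, formula 14).
# v: sampling variances of the effect sizes (level 1);
# sigma2_within / sigma2_between: level 2 and level 3 variance estimates.
variance_distribution <- function(v, sigma2_within, sigma2_between) {
  n <- length(v)
  # Typical within-study sampling variance (Cheung, 2014, formula 14).
  typical_v <- ((n - 1) * sum(1 / v)) /
    ((sum(1 / v))^2 - sum(1 / v^2))
  total <- typical_v + sigma2_within + sigma2_between
  100 * c(level1 = typical_v,
          level2 = sigma2_within,
          level3 = sigma2_between) / total
}

# Toy example (three fictitious sampling variances, not the tutorial data):
variance_distribution(v = c(0.05, 0.08, 0.10),
                      sigma2_within = 0.112, sigma2_between = 0.188)
```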


Because of this problem, it may be wise to examine heterogeneity also in a different way by applying the 75% rule as described by Hunter and Schmidt (1990). These scholars state that heterogeneity can be regarded as substantial if less than 75% of the total amount of variance can be attributed to sampling variance (at level 1). If this is the case, it may be fruitful to examine the potential moderating effect of study and/or effect size characteristics on the overall effect. In our example, approximately 7% of the total amount of variance could be attributed to sampling variance (see Output 4), and based on the rule of Hunter and Schmidt, we can once again conclude that there is substantial variation between effect sizes within studies and/or between studies, making it relevant to perform moderator analyses.

Moderator analyses

Categorical moderators with two categories (i.e., binary or dichotomous predictors)

Because we concluded that there is significant within-study and between-study variance, we are now going to examine whether it is possible to designate variables as moderators of the overall effect. As we use the REstricted Maximum Likelihood estimation method (REML) for estimating the parameters of the meta-analytic model, it is not possible to compare the fit of a model with potential moderating variables to the fit of the model without the potential moderating variables (i.e., performing a log-likelihood-ratio test) (see Hox, 2010, p. 215). Instead, an omnibus test will be performed to determine whether a (potential) moderating effect of one or more variables included in the model is significant. The null hypothesis in this omnibus test states that all regression coefficients (i.e., betas) are equal to zero (H0: β1 = β2 = β3 = · · · = 0), and the alternative hypothesis states that at least one of these regression coefficients is not equal to zero. In case an intercept is part of the model (which is the case in our example), it will not be tested in the omnibus test.
In our example, we will first examine the potential moderating effect of publication status of the included primary studies. Recall that two dummy variables regarding publication status are part of the data set: pstatpub (coded as 1 = published and 0 = not published) and pstatnotpub (coded as 0 = published and 1 = not published). We are going to use both variables in the moderator analysis, but to test whether publication status is a significant moderating variable, we will first extend the meta-analytic model with the variable pstatpub. We can test the potential moderating effect of the categorical variable publication status by running the syntax in Listing 11.
Once again, the syntax in Listing 11 resembles the syntax in Listing 5 that was used for calculating an overall effect, but there are some differences:
• The object in which the results of the moderator analysis are stored has been designated as notpublished, because we have chosen the category not published (which was coded as 0 in the variable pstatpub and coded as 1 in the variable pstatnotpub) to be the reference category. Similar to testing categorical predictors in simple regression analysis, one category functions as the reference category and the other categories are compared against the reference category. From a mere statistical viewpoint, it makes no difference which category is chosen as the reference category;
• mods = the argument that is taken by the rma.mv function when the user wants to test the potential moderating influence of a variable. In our example, we are testing whether effect sizes extracted from published studies are significantly different from effect sizes extracted from unpublished studies, and therefore we have added pstatpub to the mods element by writing mods = ~ pstatpub. Unpublished studies function as the reference category.
By calling the summary function (see Listing 11), the results as given in Output 5 are presented on screen. The following should be derived from Output 5:
• The results of the Test for Residual Heterogeneity show that there is significant unexplained variance left between all effect sizes in the data set (QE(98) = 702.194, p < .001), after publication status has been added to the meta-analytic model to test its potential moderating effect;
• The results of the omnibus test are presented under Test of Moderators (coefficient(s) 2). The p value is larger than the significance level of .05, and this implies that the regression coefficient of the variable pstatpub (the only coefficient that is tested) does not significantly deviate from zero. Therefore, we can conclude that the overall effect is not moderated by the publication status of the included primary studies. The results of the omnibus test can be written as: F(1, 98) = 1.844, p = .178. Recall that we use the Knapp and Hartung adjustment (Knapp & Hartung, 2003) in our analyses, implying that the omnibus test is based on the F distribution (and not on the normal distribution);
• From the Model Results, we can derive the mean effect of the reference category, which is 0.812, and represents the mean effect of the primary studies that have not been published. This mean effect significantly deviates from zero, since t(98) = 2.656, p = .009. The mean effect of published primary studies is equal to 0.812 + (-


Listing 11 Testing Publication Status as Potential Moderator (Published vs. Unpublished).

# Determine the potential moderating effect of publication status;


# Published studies are tested against unpublished studies, so
# unpublished studies serve as the reference category;
# Print the results stored in the object "notpublished" on screen.
notpublished <- rma.mv(y, v, mods = ~ pstatpub, random = list(~ 1 | effectsizeID, ~
1 | studyID), tdist=TRUE, data=dataset)
summary(notpublished, digits=3)
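If a data set contains publication status as a single text variable rather than as the two opposite dummies, the dummies can be constructed in base R. A hypothetical sketch we added (the character vector pstat is our own illustration; the dummy names mirror the tutorial's):

```r
# Hypothetical publication-status variable for three effect sizes.
pstat <- c("published", "not published", "published")

# Opposite, mutually exclusive dummy coding, as in the tutorial's data set:
pstatpub    <- as.numeric(pstat == "published")      # 1 = published
pstatnotpub <- as.numeric(pstat == "not published")  # 1 = not published
```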

Output 5 Output of Listing 11.

Multivariate Meta-Analysis Model (k = 100; method: REML)


logLik Deviance AIC BIC AICc
-71.435 142.870 150.870 161.210 151.300

Variance Components:
estim sqrt nlvls fixed factor
sigma^2.1 0.113 0.336 100 no effectsizeID
sigma^2.2 0.171 0.414 17 no studyID

Test for Residual Heterogeneity:


QE(df = 98) = 702.194, p-val < .001

Test of Moderators (coefficient(s) 2):


QM(df = 1) = 1.844, p-val = 0.178

Model Results:
estimate se tval pval ci.lb ci.ub
intrcpt 0.812 0.306 2.656 0.009 0.205 1.418 **
pstatpub -0.447 0.329 -1.358 0.178 -1.101 0.206
---
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
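Two arithmetic checks we added on the printed Model Results (plain base R; the values are copied from Output 5):

```r
# Mean effect of the non-reference category = intercept + dummy coefficient.
intercept  <- 0.812    # mean effect of unpublished studies (reference)
b_pstatpub <- -0.447   # regression coefficient of pstatpub
intercept + b_pstatpub # 0.365: mean effect of published studies

# With a single tested coefficient, the omnibus F equals the squared t value.
(-1.358)^2             # about 1.844, the printed QM value
```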

0.447) = 0.365 and, as we already learnt from the results of the omnibus test, is not significantly different from the mean effect of unpublished primary studies. The t-test statistic used in testing the significance of the regression coefficient of the variable pstatpub (-0.447) is not significant (t(98) = -1.358, p = .178), in line with the result of the omnibus test. Because we are testing only one potential moderating variable in this specific model (i.e., the variable pstatpub), the value of the omnibus test (F = 1.844) equals the square of the t-test statistic (-1.358).
Given the results, we can now conclude that the overall association between mental health disorders of juveniles and recidivism in delinquency (d = 0.427) is not moderated by publication status of the included primary studies. If desired, it is possible to examine the significance of the residual within-study and between-study variance, after one or more (potential) moderating variables have been included in the meta-analytic model, by repeating the procedure as described in the sections on heterogeneity of within- and between-study variance, respectively. For now, we are not looking further into the significance of the variance components, since we did not detect a moderating effect of publication status.
It can be of relevance to not only report on the mean effect (including significance and confidence interval) of the reference category, but also on the mean effect (including significance and confidence interval) of the other categories that are tested against the reference category. Above, we manually calculated the mean effect of the other category (i.e., published primary studies in the present example), but for determining the significance and the confidence interval of this mean effect, we need to perform a second analysis. In addition, calculating mean effects


using R is less prone to error than manually calculating mean effects and is therefore preferable. For performing this additional analysis, we need to modify the syntax slightly by including the dummy variable pstatnotpub and leaving out the dummy variable pstatpub. Recall that these two variables are coded in opposite directions, so including pstatnotpub in the syntax will give us the mean effect of published studies, which is now the reference category (see Listing 12). Running this syntax generates the output as presented in Output 6.
We can derive from Output 6 that the mean effect of published studies is 0.364 (95% CI: 0.120; 0.609), which is only slightly different from the value we calculated manually (0.365); this difference is due to rounding. Note that, in comparison to the results in Output 5, there are no differences in the fit statistics, the estimates of the variance components, and the results of the omnibus test.

Categorical moderators with three categories

Next, we will examine whether the overall association between mental health disorders of juveniles and recidivism in delinquency is moderated by the type of delinquent behavior. We distinguish between three types of delinquency: overt, covert, and general delinquent behavior. Since general delinquent behavior is a non-specific form of delinquency, we wanted this category to be the reference category. This implies that the other two categories (overt and covert delinquent behavior) must be part of the syntax for properly performing the moderator analysis. Recall from section 3 that three mutually exclusive dummy variables representing the three types of delinquency are part of the data set: typeovert, typecovert, and typegen. See Listing 13 for the syntax.
In this syntax, the two variables typeovert and typecovert have been added. By using the + sign, multiple variables can be added to the mods element. Recall that the variable representing the reference category (general delinquency in our example) must not be added to the syntax; otherwise the problem of redundancy will arise. Running the syntax produces the output as presented in Output 7.
From this output, we can derive that:
• There is a moderating effect of type of delinquency, as the results of the omnibus test point towards a significant moderating effect: F(2, 97) = 7.490, p < .001. This implies that at least one of the regression coefficients of the variables added to the model significantly deviates from zero;
• The mean effect of general delinquency equals 0.470 and this effect significantly deviates from zero: t(97) = 3.986, p < .001;
• The mean effect of overt delinquency equals 0.470 + (−0.222) = 0.248. This effect is not significantly lower than the mean effect of general delinquency, as the regression coefficient is not significant: t(97) = -1.594, p = .114;
• The mean effect of covert delinquency equals 0.470 + (-0.730) = -0.260. This effect is significantly lower than the mean effect of general delinquency, as the regression coefficient is significant: t(97) = -3.795, p < .001.
Given the results, we can conclude that there is a moderating effect of type of delinquency on the association between mental health disorders and juvenile offender recidivism. For covert delinquency, the association is significantly lower (Cohen’s d = -0.260) than for general delinquency (Cohen’s d = 0.470). If the research synthesist is interested in testing whether the mean effect of covert delinquency significantly deviates from zero, additional syntax should be written in such a way that the dummy variables typegen and typeovert are added as potential moderating variables, whereas the dummy variable typecovert is left out. In this way, covert delinquency will become the reference category (represented by the intercept), making it possible to determine not only the significance of the mean effect of covert delinquency, but also the confidence interval around this effect. Adding the dummy variables typegen and typecovert to the syntax (and leaving out typeovert) would be necessary if we were to determine the significance of the mean effect of overt delinquency. We could now examine the significance of the residual within-study and between-study variance by repeating the procedure as described in the sections on heterogeneity of within- and between-study variance, respectively. Note that the syntax for creating the objects modelnovar2 (see Listing 7) and modelnovar3 (see Listing 9) should be extended with the argument mods = ~ typeovert + typecovert, so that the moderator type of delinquency is added to the model.
As a final remark, note that if we were only interested in determining the moderating effect of a discrete variable and not in estimates of the mean effect (including significance and confidence interval) of all the categories of that variable, it would not be necessary to create and test dummy variables. In this case, including that single discrete variable as a moderator in the syntax (i.e., after the mods = ~ element) would suffice. However, it has become rather common to report on the mean effect (as well as significance and confidence interval) of all categories of a discrete potential moderating variable (see, for instance, Assink et al., 2015; Houben et al., 2015; Rapp, Van den Noortgate, Broekaert, & Vanderplasschen, 2014; Van der Hallen, Evers, Brewaeys, Van den Noortgate, & Wagemans, 2015; Van der Stouwe, Asscher, Stams, Dekovic, & Van der Laan, 2014; Weisz et al., 2013).


Output 6 Output of Listing 12.

Multivariate Meta-Analysis Model (k = 100; method: REML)


logLik Deviance AIC BIC AICc
-71.435 142.870 150.870 161.210 151.300

Variance Components:
estim sqrt nlvls fixed factor
sigma^2.1 0.113 0.336 100 no effectsizeID
sigma^2.2 0.171 0.414 17 no studyID

Test for Residual Heterogeneity:


QE(df = 98) = 702.194, p-val < .001

Test of Moderators (coefficient(s) 2):


QM(df = 1) = 1.844, p-val = 0.178

Model Results:
estimate se tval pval ci.lb ci.ub
intrcpt 0.364 0.123 2.962 0.004 0.120 0.609 **
pstatnotpub 0.447 0.329 1.358 0.178 -0.206 1.101
---
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

Output 7 Output of Listing 13.

Multivariate Meta-Analysis Model (k = 100; method: REML)

logLik Deviance AIC BIC AICc


-66.290 132.581 142.581 155.454 143.240

Variance Components:
estim sqrt nlvls fixed factor
sigma^2.1 0.085 0.291 100 no effectsizeID
sigma^2.2 0.190 0.436 17 no studyID

Test for Residual Heterogeneity:


QE(df = 97) = 761.162, p-val < .001

Test of Moderators (coefficient(s) 2,3):


QM(df = 2) = 7.490, p-val < .001

Model Results:
estimate se tval pval ci.lb ci.ub
intrcpt 0.470 0.118 3.986 < .001 0.236 0.704 ***
typeovert -0.222 0.139 -1.594 0.114 -0.498 0.054
typecovert -0.730 0.192 -3.795 < .001 -1.111 -0.348 ***
---

Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1


Listing 12 Testing Publication Status as Potential Moderator (Unpublished vs. Published).

# Determine the potential moderating effect of publication status;
# Unpublished studies are now tested against published
# studies, so published studies serve as the reference category;
# Print the results stored in the object "published" on screen.
published <- rma.mv(y, v, mods = ~ pstatnotpub,
    random = list(~ 1 | effectsizeID, ~ 1 | studyID),
    tdist = TRUE, data = dataset)
summary(published, digits = 3)

Listing 13 Testing Type of Delinquency as Potential Moderator.

# Determine the potential moderating effect of type of delinquency;
# General delinquency is chosen as the reference category;
# Print the results stored in the object "generaldelinquency" on screen.
generaldelinquency <- rma.mv(y, v, mods = ~ typeovert + typecovert,
    random = list(~ 1 | effectsizeID, ~ 1 | studyID),
    tdist = TRUE, data = dataset)
summary(generaldelinquency, digits = 3)

Continuous moderators

In this last example of univariate moderator analyses, we will show how to test a potential continuous moderator. We were interested in examining whether the year in which a primary study was published moderates the overall effect, since changes over time may influence the strength of the association between mental health disorders of juveniles and juvenile offender recidivism. These changes over time may be seen, for instance, in juvenile criminal law or in the way mental health disorders and/or recidivism are operationalized and assessed. We treated publication year as a continuous variable, and its potential moderating effect can be tested by running the syntax in Listing 14. Running this syntax produces the output presented in Output 8.
From Output 8, we can derive that:
• Publication year is a significant moderator, as the omnibus test is significant (F(1, 98) = 5.464, p = .021) and, logically, also the regression coefficient is significant (-0.042; t(98) = -2.238, p = .021). The regression coefficient is negative, implying that the more recently a primary study has been published, the lower the reported effects in the primary studies;
• The intercept significantly deviates from zero (t(98) = 4.095, p < .001), but this is not the most important result when testing a continuous moderator. The value of the intercept represents the mean effect of effect sizes extracted from primary studies that have been published in the mean publication year (i.e., when the variable pyear, which was centred around its mean, is given the value 0). So, in contrast to the procedure for testing categorical moderators, the intercept cannot be interpreted as the mean effect of a reference category. When testing continuous variables as potential moderators, the regression coefficient (beta) and its significance are in most cases more informative.
In sum, we can conclude that publication year is a significant moderator of the overall association between mental health disorders of juveniles and recidivism in delinquency. As studies have been published more recently (i.e., publication year increases), the strength of the overall association decreases. This significant decrease in effect over time is not indicative of a very robust association between mental health disorders of juveniles and juvenile offender recidivism, and would call for further testing of more specific potential moderating variables. Note that the significance of the within-study and between-study variance can be tested again (see the sections on heterogeneity of within- and between-study variances, respectively), to examine whether there is significant variance left that may be explained by other moderating variables.

Multiple moderator model

In meta-analytic research it is common practice to test the potential moderating effect of multiple variables, such as study, sample, and research design characteristics. As denoted by Hox (2010), many of these variables are often interrelated, leading to substantial multicollinearity in the analyses. As a consequence, it is not always straightforward to determine which effects are really relevant and deserve the most attention. In light of this, Hox states that testing multiple moderators in a single model after (potential) moderating effects have been evaluated separately in


Listing 14 Testing Publication Year as Potential Moderator.

# Determine the potential moderating effect of publication year;
# Print the results stored in the object "publicationyear" on screen.
publicationyear <- rma.mv(y, v, mods = ~ pyear,
    random = list(~ 1 | effectsizeID, ~ 1 | studyID),
    tdist = TRUE, data = dataset)
summary(publicationyear, digits = 3)
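As noted in the discussion of Output 8, the variable pyear was centred around its mean before the analysis, so that pyear = 0 corresponds to the mean publication year. A minimal sketch of such centering (the raw column name year is hypothetical; the tutorial's data file may store publication year under a different name):

```r
# Centre publication year around its mean, so that the intercept of the
# moderator model refers to the mean publication year in the data set.
dataset$pyear <- dataset$year - mean(dataset$year, na.rm = TRUE)
```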

Output 8 Output of Listing 14.

Multivariate Meta-Analysis Model (k = 100; method: REML)

logLik Deviance AIC BIC AICc


-70.282 140.564 148.564 158.904 148.994

Variance Components:
estim sqrt nlvls fixed factor
sigma^2.1 0.113 0.336 100 no effectsizeID
sigma^2.2 0.135 0.367 17 no studyID

Test for Residual Heterogeneity:


QE(df = 98) = 672.545, p-val < .001

Test of Moderators (coefficient(s) 2):


QM(df = 1) = 5.464, p-val = 0.021

Model Results:
estimate se tval pval ci.lb ci.ub
intrcpt 0.426 0.104 4.095 < .001 0.219 0.632 ***
pyear -0.042 0.018 -2.238 0.021 -0.078 -0.006 *
---
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

univariate models, is a reasonable strategy. In our final step of the moderator analyses, we follow the approach of Hox and we will examine the unique effect of the variables that were previously identified as significant moderators in the univariate analyses. To do so, we need to extend the meta-analytic model by adding all significant moderating variables simultaneously. In our example, recall that the categorical variable type of delinquency as well as the continuous variable publication year were identified as significant moderators in the univariate analyses (see Outputs 7 and 8, respectively). Therefore, we will extend the meta-analytic model with the variables pyear, typeovert, and typecovert. Since general delinquency was the reference category in testing the variable type of delinquency as a potential moderator, we will not include the dummy variable typegen in the syntax. The multiple moderator model can be built by executing the syntax in Listing 15. Running this syntax produces the output as presented in Output 9.
From Output 9, we can derive that:
• At least one of the regression coefficients of the moderators significantly deviates from zero, as the omnibus test shows a significant result (F(3, 96) = 6.414, p < .001);
• The regression coefficient of publication year (-0.038) significantly deviates from zero, as the t test shows a significant result (t(96) = -2.077, p = .040);
• The regression coefficient of covert delinquency (-0.709) significantly deviates from zero, as the t test shows a significant result (t(96) = -3.707, p < .001).
Based on these results, we can conclude that both publication year and the category covert delinquency (versus general delinquency) of the variable type of delinquency have a unique moderating effect on the association between mental health disorders of juveniles and recidivism in delinquency. In other words, we can say


Listing 15 Testing Multiple Moderators in a Single Model.

# Testing a multiple moderator model in which publication year
# and delinquency type (overt delinquency and covert
# delinquency) have been added as moderators.
multiplemoderator <- rma.mv(y, v, mods = ~ pyear + typeovert + typecovert,
    random = list(~ 1 | effectsizeID, ~ 1 | studyID),
    tdist = TRUE, data = dataset)
summary(multiplemoderator, digits = 3)
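Because jointly entered moderators are often interrelated (the multicollinearity issue raised by Hox, 2010), it can be informative to inspect their intercorrelations before interpreting the multiple moderator model. This quick check is not part of the original tutorial syntax; it is a sketch assuming the moderator columns exist in the data set under the names used in Listing 15:

```r
# Bivariate correlations among the moderators of the multiple moderator
# model; high correlations signal potential multicollinearity.
round(cor(dataset[, c("pyear", "typeovert", "typecovert")]), 2)
```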

Output 9 Output of Listing 15.

Multivariate Meta-Analysis Model (k = 100; method: REML)

logLik Deviance AIC BIC AICc


-63.375 126.750 138.750 154.136 139.694

Variance Components:
estim sqrt nlvls fixed factor
sigma^2.1 0.085 0.292 100 no effectsizeID
sigma^2.2 0.149 0.386 17 no studyID

Test for Residual Heterogeneity:


QE(df = 96) = 609.357, p-val < .001

Test of Moderators (coefficient(s) 2,3,4):


QM(df = 3) = 6.414, p-val < .001

Model Results:
estimate se tval pval ci.lb ci.ub
intrcpt 0.466 0.107 4.346 < .001 0.253 0.678 ***
pyear -0.038 0.018 -2.077 0.040 -0.074 -0.002 *
typeovert -0.204 0.139 -1.472 0.144 -0.479 0.071
typecovert -0.709 0.191 -3.707 < .001 -1.089 -0.330 ***
---
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
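After fitting the multiple moderator model, one can again test whether significant residual within-study and between-study variance remains, by constraining one variance component to zero and comparing the constrained model with the full model. A hedged sketch of such a comparison (the object names novar_within and novar_between are illustrative; the variance components are assumed to be ordered as in Output 9, with effectsizeID first and studyID second):

```r
# Constrain the within-study (level 2) variance to zero and compare the
# fit with the full multiple moderator model from Listing 15.
novar_within <- rma.mv(y, v, mods = ~ pyear + typeovert + typecovert,
    random = list(~ 1 | effectsizeID, ~ 1 | studyID),
    sigma2 = c(0, NA), tdist = TRUE, data = dataset)
anova(multiplemoderator, novar_within)

# Likewise, constrain the between-study (level 3) variance to zero.
novar_between <- rma.mv(y, v, mods = ~ pyear + typeovert + typecovert,
    random = list(~ 1 | effectsizeID, ~ 1 | studyID),
    sigma2 = c(NA, 0), tdist = TRUE, data = dataset)
anova(multiplemoderator, novar_between)
```

A significant likelihood-ratio test indicates that the constrained variance component significantly deviates from zero, mirroring the heterogeneity tests described earlier in the tutorial.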

are not confounded by the other variable in the model (i.e., covert delinquency (versus general delinquency) is not confounded by publication year and vice versa). This multiple moderator model provides more evidence of true moderating effects of the variables covert delinquency (versus general delinquency) and publication year than the results of the univariate moderator analyses alone. Now that the multiple moderator model is built, it is possible to test the significance of the residual within-study and between-study variance, respectively. Note that, for this purpose, the syntax in Listings 7 and 9 should then be extended with the mods = ~ argument and all variables that are part of the present multiple moderator model.

Missing data and size of the data set

Although the primary aim of this tutorial is to demonstrate how a multilevel approach can be applied to meta-analytic models in R, we briefly address the problem of missing data in multilevel meta-analytic research. Throughout the years, a number of techniques have been developed for assessing whether data are missing in a meta-analytic research project and, if so, how this affects the results. Examples of well-known techniques are Rosenthal's fail-safe test (1979), Egger's linear regression test (Egger, Davey-Smith, Schneider, & Minder, 1997), Begg and Mazumdar's rank correlation test (Begg & Mazumdar, 1994), and the trim-and-fill method (Duval & Tweedie, 2000a, 2000b). It is good practice for a research synthesist to discuss the extent to which the results were affected by missing data, and to


apply at least one of the available methods for detecting and handling missing data. This also applies to multilevel meta-analytic research. In the scientific literature, there is a considerable and ongoing debate on the appropriateness of the available methods, and each method seems to have its own limitations (see, for instance, Egger, Davey-Smith, & Altman, 2001; Nakagawa & Santos, 2012; Nik Idris, 2012; Peters, Sutton, Jones, Abrams, & Rushton, 2007; Terrin, Schmid, Lau, & Olkin, 2003). Therefore, selecting the most appropriate method for dealing with missing data may not be straightforward. Furthermore, to our knowledge, the available methods have not been evaluated in multilevel meta-analytic research, and this makes it even more difficult to select an appropriate method for detecting and handling missing data in a multilevel meta-analytic research project. Evaluating the performance of the available methods in multilevel meta-analysis would be a good direction for future research.
As for the size of the data set used in multilevel meta-analytic research, it is rather difficult to state what the minimum number of studies and effect sizes should be. The statistical power in the analyses increases as the number of studies and effect sizes in the data set increases, but methods for determining the exact power in multilevel meta-analytic models seem not yet available. Further, Viechtbauer (2005) and Van den Noortgate and Onghena (2003) showed that when (restricted) maximum likelihood procedures are used for estimating the parameters in the multilevel meta-analytic model, a smaller number of studies might result in underestimated standard errors and, consequently, an increase in the number of Type 1 errors in testing the overall effect size and the moderator effects. In addition, a low number of studies may also lead to the problem of a biased estimate of the between-study variance and the corresponding standard error (see also Van den Noortgate et al., 2013). In short, larger numbers of studies and effect sizes are to be preferred above smaller numbers, which is not surprising. Future research on the performance and robustness of multilevel meta-analytic models using data sets of different sizes (and types) is needed. All in all, given the difficulties and restrictions of the traditional univariate approach to meta-analysis, the three-level approach in meta-analytic research seems reliable and promising. For further reading on three-level meta-analysis, we refer the reader to Van den Noortgate and Onghena (2003) and Van den Noortgate et al. (2013, 2014).

Conclusion

Applying a multilevel approach to meta-analysis is a strong method for dealing with interdependency of effect sizes, but to date it is a rather unknown method among scholars and it has not been widely used in meta-analytic research. The main purpose of the present tutorial was to provide an introduction to multilevel modeling in meta-analysis using the rma.mv function of the metafor R package (Viechtbauer, 2015). Specifically, we show how the rma.mv function can be called in R syntax, so that a three-level structure is applied to a meta-analytic model. In these three-level models, three different variance components are considered: sampling variance at the first level, within-study variance at the second level, and between-study variance at the third level (Cheung, 2014; Hox, 2010; Van den Noortgate, López-López, Marín-Martínez, & Sánchez-Meca, 2013, 2014). In short, this tutorial offers a step-by-step guide for (1) organizing a data file; (2) setting up the R environment; (3) calculating an overall effect; (4) examining heterogeneity of within-study variance and between-study variance; (5) performing categorical and continuous moderator analyses; and (6) examining a multiple moderator model. The statistical approach described in this tutorial has been used in several published meta-analytic reviews (see, for instance, Assink et al., 2015; Gubbels, Van der Stouwe, Spruit, & Stams, 2016; Spruit, Assink, Van Vugt, Van der Put, & Stams, 2016; Spruit, Schalkwijk, Van Vugt, & Stams, 2016; Spruit, Van Vugt, Van der Put, Van der Stouwe, & Stams, 2016). The data file that was used in the present tutorial can be downloaded by the reader from the journal's website.

Authors' note

We thank Prof. Dr. Wim van den Noortgate (University of Leuven) and Dr. Wolfgang Viechtbauer (Maastricht University) for sharing their statistical expertise.

References

Assink, M., Van der Put, C., Hoeve, M., De Vries, S. L. A., Stams, G. J. J. M., & Oort, F. J. (2015). Risk factors for persistent delinquent behavior among juveniles: A meta-analytic review. Clinical Psychology Review, 42, 47–61. doi:10.1016/j.cpr.2015.08.002
Begg, C. B. & Mazumdar, M. (1994). Operating characteristics of a rank correlation test for publication bias. Biometrics, 50(4), 1088–1101. doi:10.2307/2533446
Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. (2009). Introduction to meta-analysis. West Sussex, UK: John Wiley & Sons.
Cheung, M. W. L. (2014). Modeling dependent effect sizes with three-level meta-analyses: A structural equation modeling approach. Psychological Methods, 19, 211–229. doi:10.1037/a0032968
Cheung, M. W. L. (2015). Meta-analysis: A structural equation modeling approach. New York, NY: John Wiley & Sons.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. New York, NY: Routledge Academic.
Cooper, H. (2010). Research synthesis and meta-analysis: A step-by-step approach (4th ed.). Thousand Oaks, CA: Sage.
Del Re, A. C. (2015). A practical tutorial on conducting meta-analysis in R. The Quantitative Methods for Psychology, 11(1), 37–50.
Duval, S. & Tweedie, R. (2000a). A nonparametric ‘trim and fill’ method of accounting for publication bias in meta-analysis. Journal of the American Statistical Association, 95(449), 89–99. doi:10.1080/01621459.2000.10473905
Duval, S. & Tweedie, R. (2000b). Trim and fill: A simple funnel-plot-based method of testing and adjusting for publication bias in meta-analysis. Biometrics, 56(2), 455–463. doi:10.1111/j.0006-341X.2000.00455.x
Egger, M., Davey-Smith, G., & Altman, D. (2001). Systematic reviews in healthcare. London: British Medical Journal Books.
Egger, M., Davey-Smith, G., Schneider, M., & Minder, C. (1997). Bias in meta-analysis detected by a simple, graphical test. British Medical Journal, 315, 629–634. doi:10.1136/bmj.315.7109.629
Hedges, L. V. & Olkin, I. (1985). Statistical methods for meta-analysis. Orlando, FL: Academic Press.
Houben, M., Van den Noortgate, W., & Kuppens, P. (2015). The relation between short-term emotion dynamics and psychological well-being: A meta-analysis. Psychological Bulletin, 141(4), 901–930. doi:10.1037/a0038822
Hox, J. J. (2010). Multilevel analysis: Techniques and applications. New York, NY: Routledge.
Hunter, J. E. & Schmidt, F. L. (1990). Methods of meta-analysis: Correcting error and bias in research findings. Newbury Park, CA: Sage.
Hunter, J. E. & Schmidt, F. L. (2004). Methods of meta-analysis: Correcting error and bias in research findings (2nd ed.). Thousand Oaks, CA: Sage.
Knapp, G. & Hartung, J. (2003). Improved tests for a random effects meta-regression with a single covariate. Statistics in Medicine, 22, 2693–2710. doi:10.1002/sim.1482
Li, Y., Shi, L., & Roth, D. (1994). The bias of the commonly-used estimate of variance in meta-analysis. Communications in Statistics - Theory and Methods, 23(4), 1063–1085. doi:10.1080/03610929408831305
Lipsey, M. W. & Wilson, D. B. (2001). Practical meta-analysis. Thousand Oaks, CA: Sage.
Mullen, B. (1989). Advanced basic meta-analysis. Hillsdale, NJ: Lawrence Erlbaum Associates.
Nakagawa, S. & Santos, E. S. A. (2012). Methodological issues and advances in biological meta-analysis. Evolutionary Ecology, 26(5), 1253–1274. doi:10.1007/s10682-012-9555-5
Nik Idris, N. R. (2012). A comparison of methods to detect publication bias for meta-analysis of continuous data. Journal of Applied Sciences, 12(13), 1413–1417. doi:10.3923/jas.2012.1413.1417
Peters, J. L., Sutton, A. J., Jones, D. R., Abrams, K. R., & Rushton, L. (2007). Performance of the trim and fill method in the presence of publication bias and between-study heterogeneity. Statistics in Medicine, 26(25), 4544–4562. doi:10.1002/sim.2889
R Development Core Team. (2016). R: A language and environment for statistical computing (Version 3.3). Vienna, Austria: R Foundation for Statistical Computing. Retrieved from http://www.R-project.org/
Rapp, R. C., Van den Noortgate, W., Broekaert, E., & Vanderplasschen, W. (2014). The efficacy of case management with persons who have substance abuse problems: A three-level meta-analysis of outcomes. Journal of Consulting and Clinical Psychology, 82(4), 605–618. doi:10.1037/a0036750
Raudenbush, S. W. (2009). Analyzing effect sizes: Random-effects models. In H. Cooper, L. V. Hedges, & J. C. Valentine (Eds.), The handbook of research synthesis and meta-analysis (pp. 295–315). New York, NY: Russell Sage Foundation.
Rosenthal, R. (1984). Meta-analytic procedures for social research. Beverly Hills, CA: Sage.
Satterthwaite, F. E. (1946). An approximate distribution of estimates of variance components. Biometrics Bulletin, 2(6), 110–114. doi:10.2307/3002019
Schmidt, F. L. & Hunter, J. E. (2015). Methods of meta-analysis: Correcting error and bias in research findings (3rd ed.). Thousand Oaks, CA: Sage.
Tabachnik, B. G. & Fidell, L. S. (2013). Using multivariate statistics (6th ed.). Boston: Allyn and Bacon.
Terrin, N., Schmid, C. H., Lau, J., & Olkin, I. (2003). Adjusting for publication bias in the presence of heterogeneity. Statistics in Medicine, 22, 2113–2126. doi:10.1002/sim.1461
Van den Noortgate, W., López-López, J. A., Marín-Martínez, F., & Sánchez-Meca, J. (2013). Three-level meta-analysis of dependent effect sizes. Behavior Research Methods, 45, 576–594. doi:10.3758/s13428-012-0261-6
Van den Noortgate, W., López-López, J. A., Marín-Martínez, F., & Sánchez-Meca, J. (2014). Meta-analysis of multiple outcomes: A multilevel approach. Behavior Research Methods, 46, 1–21. doi:10.3758/s13428-014-0527-2
Van den Noortgate, W. & Onghena, P. (2003). Multilevel meta-analysis: A comparison with traditional meta-analytical procedures. Educational and Psychological Measurement, 63, 765–790. doi:10.1177/0013164403251027
Van der Hallen, R., Evers, K., Brewaeys, K., Van den Noortgate, W., & Wagemans, J. (2015). Global processing takes time: A meta-analysis on local-global visual processing in ASD. Psychological Bulletin, 141(3), 549–573. doi:10.1037/bul0000004
Van der Stouwe, T., Asscher, J. J., Stams, G. J. J. M., Dekovic, M., & Van der Laan, P. H. (2014). The effectiveness of Multisystemic Therapy (MST): A meta-analysis. Clinical Psychology Review, 34(6), 468–481. doi:10.1016/j.cpr.2014.06.006
Viechtbauer, W. (2005). Bias and efficiency of meta-analytic variance estimators in the random-effects model. Journal of Educational and Behavioral Statistics, 30, 261–293. doi:10.3102/10769986030003261
Viechtbauer, W. (2015). Meta-analysis package for R. Retrieved from https://cran.r-project.org/web/packages/metafor/metafor.pdf
Weisz, J. R., Kuppens, S., Eckshtain, D., Ugueto, A. M., Hawley, K. M., & Jensen-Doss, A. (2013). Performance of evidence-based youth psychotherapies compared with usual clinical care: A multilevel meta-analysis. JAMA Psychiatry, 70, 750–761. doi:10.1001/jamapsychiatry.2013.1176
Wibbelink, C. J. M., Hoeve, M., Stams, G. J. J. M., & Oort, F. J. (2016). A meta-analysis of the association between mental health disorders and juvenile recidivism. Manuscript submitted for publication.
Ziegler, S., Koch, A., & Victor, N. (2001). Deficits and remedy of the standard random effects methods in meta-analysis. Methods of Information in Medicine, 40(2), 148–155.

Open practices

The Open Material badge was earned because supplementary material(s) are available on the journal’s web site.

Citation

Assink, M. & Wibbelink, C. J. M. (2016). Fitting three-level meta-analytic models in R: A step-by-step tutorial. The Quanti-
tative Methods for Psychology, 12(3), 154–174. doi:10.20982/tqmp.12.3.p154

Copyright © 2016, Assink, Wibbelink. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC
BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original
publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not
comply with these terms.

Received: 14/07/2016 ∼ Accepted: 26/08/2016
