
An Introduction to Linear Mixed-Effects Modeling in R

Violet A. Brown
Department of Psychological & Brain Sciences, Washington University in St. Louis

Tutorial
Advances in Methods and Practices in Psychological Science, January–March 2021, Vol. 4, No. 1, pp. 1–19
© The Author(s) 2021
Article reuse guidelines: sagepub.com/journals-permissions
DOI: https://doi.org/10.1177/2515245920960351
www.psychologicalscience.org/AMPPS

Abstract
This Tutorial serves as both an approachable theoretical introduction to mixed-effects modeling and a practical
introduction to how to implement mixed-effects models in R. The intended audience is researchers who have some
basic statistical knowledge, but little or no experience implementing mixed-effects models in R using their own data. In
an attempt to increase the accessibility of this Tutorial, I deliberately avoid using mathematical terminology beyond what
a student would learn in a standard graduate-level statistics course, but I reference articles and textbooks that provide
more detail for interested readers. This Tutorial includes snippets of R code throughout; the data and R script used to
build the models described in the text are available via OSF at https://osf.io/v6qag/, so readers can follow along if they
wish. The goal of this practical introduction is to provide researchers with the tools they need to begin implementing
mixed-effects models in their own research.

Keywords
mixed-effects modeling, R, language, speech perception, open data

Received 4/11/20; Revision accepted 8/20/20

In many areas of experimental psychology, researchers collect data from participants responding to multiple trials. This type of data has traditionally been analyzed using repeated measures analyses of variance (ANOVAs)—statistical analyses that assess whether conditions differ significantly in their means, accounting for the fact that observations within individuals are correlated. Repeated measures ANOVAs have been favored for analyzing this type of data because using other statistical techniques, such as multiple regression, would violate a crucial assumption of many statistical tests: the independence assumption. This assumption states that the observations in a data set must be independent; that is, they cannot be correlated with one another. But take, for example, a reaction time study in which participants respond to the same 100 trials, each of which corresponds to a different item (e.g., a particular word in a psycholinguistics study). Reaction times within a given participant and within an item will certainly be correlated; some participants are faster than others, and some items are responded to more quickly than others. Given that observations are not independent, data in which participants respond to multiple trials must be analyzed with a statistical test that takes the dependencies in the data into account. For this reason, when analyzing data in which observations are nested within participants, repeated measures ANOVAs are preferable to standard ANOVAs and multiple regression, which both ignore the hierarchical structure of the data. However, repeated measures ANOVAs are far from perfect. Although they can model either participant- or item-level variability (often referred to as F1 and F2 analyses in the ANOVA literature), they cannot simultaneously take both sources of variability into account, so observations within a condition must be collapsed across either items or participants. When the data are aggregated in this way, however, important information about variability within participants or items is lost, which reduces statistical power (see Barr, 2008), that is, the likelihood of detecting an effect if one exists.

Corresponding Author: Violet A. Brown, Department of Psychological & Brain Sciences, Washington University in St. Louis. E-mail: [email protected]

Creative Commons NonCommercial CC BY-NC: This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (https://creativecommons.org/licenses/by-nc/4.0/), which permits noncommercial use, reproduction, and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage).

Another limitation of ANOVAs is that they deal with missing observations via listwise deletion; this means that if a single observation is missing, the entire case is deleted, and none of the observations from that individual (or item) will be used in the analysis. Depending on the number of complete cases in the data set, this can substantially reduce sample size, which leads to inflated standard error estimates and reduced statistical power (though the estimates will be unbiased if the data are missing completely at random; see Enders, 2010). ANOVAs also assume that the dependent variable is continuous and the independent variables are categorical; experiments in which the outcome is categorical (e.g., accuracy at identifying particular items in a recognition memory task) must be aggregated or analyzed using a different technique, and continuous predictors (e.g., time in a longitudinal study) must be treated categorically (i.e., binned), which reduces statistical power and makes it difficult to model nonlinear relationships between predictors and outcomes (e.g., Liben-Nowell et al., 2019; Royston et al., 2005). A final drawback of ANOVAs is that although they indicate whether an effect is significant, they do not provide information about the magnitude or direction of the effect; that is, they do not provide individual coefficient estimates for each predictor that indicate growth or trajectory.

Mixed-Effects Models Take the Stage

These shortcomings of ANOVAs and multiple regression can be avoided by using linear mixed-effects modeling (also referred to as multilevel modeling or mixed modeling). Mixed-effects modeling allows a researcher to examine the condition of interest while also taking into account variability within and across participants and items simultaneously. It also handles missing data and unbalanced designs quite well; although observations are removed when a value is missing, each observation represents just one of many responses within an individual, so removal of a single observation has a much smaller effect in the mixed-modeling framework than in the ANOVA framework, in which all responses within a participant are considered to be part of the same observation. Participants or items with more missing cases also have weaker influences on parameter estimates (i.e., the parameter estimates are precision weighted), and extreme values are "shrunk" toward the mean (for more details on shrinkage, see Raudenbush & Bryk, 2002; Snijders & Bosker, 2012). Further, continuous predictors do not pose a problem for mixed-effects models (see Baayen, 2010), and the fitted model provides coefficient estimates that indicate the magnitude and direction of the effects of interest. Finally, the mixed-effects regression framework can easily be extended to handle a variety of response variables (e.g., categorical outcomes) via generalized linear mixed-effects models, and operating in this framework makes the transition to Bayesian modeling easier, as reliance on ANOVAs tends to create a fixed mind-set in which statistical testing and categorical "significant versus nonsignificant" thinking are paramount. Mixed-effects modeling is therefore appropriate in many cases in which standard ANOVAs, repeated measures ANOVAs, and multiple regression are not. Thus, it is a more flexible analytic tool.

Disclosures

The data and R script used to generate the models described in this article are available via OSF, at https://osf.io/v6qag/.

Introducing the Data

In this Tutorial, I use examples from my own research area, human speech perception, but the concepts apply to a wide variety of areas within and beyond psychology. For example, participants in a social-psychology experiment might view videos and be asked to evaluate the affect associated with each of them, or participants in a clinical experiment might read a series of narratives and be asked to describe the extent to which each of them generates anxiety.1 The goal of this Tutorial is to provide a practical introduction to linear mixed-effects modeling and introduce the tools that will enable you to conduct such analyses on your own. This overview is not intended to address every issue you may encounter in your own analyses, but is meant to provide enough information that you have a sense of what to ask if you get stuck. To help you along the way, I provide snippets of R code using dummy data that serve as a running example.

The example data I provide (see https://osf.io/v6qag/), which we will work with later in this Tutorial, come from a within-subjects speech-perception study in which each of 53 participants was presented with 553 isolated words, some in the auditory modality alone (audio-only condition) and some with an accompanying video of the talker (audiovisual condition). Participants listened to and repeated these isolated words aloud while simultaneously performing an unrelated response time task in the tactile modality (classifying the length of pulses that coincided with the presentation of each word as short, medium, or long). The response time data are based on data from a previous experiment of mine (Brown & Strand, 2019; complete data set available at https://osf.io/86zdp/), but the response times themselves have been modified for pedagogical purposes (i.e., to illustrate particular issues that you may encounter when analyzing data with mixed-effects models). The accuracy data have not been modified, but variables have been removed for simplicity.

Previous research has shown that being able to see as well as hear a talker in a noisy environment substantially improves listeners' ability to identify speech relative to hearing the talker alone (e.g., Erber, 1972).
The goals of this dual-task experiment were to determine whether seeing the talker would also affect response times in the secondary task (slower response times were taken as an indication of increased cognitive costs associated with the listening task—"listening effort") and to replicate the well-known intelligibility benefit from seeing the talker. In what follows, we will use mixed-effects modeling to assess the effect of modality (audio-only vs. audiovisual) on response times and word intelligibility while simultaneously modeling variability both within and across participants and items. We will assume that modality was manipulated within subjects and within items, which means that each participant completed the task in both modalities, and each word was presented in both modalities (but each word occurred in only one modality for each participant).

For all the analyses described below, we will use a dummy-coding (also referred to as treatment-coding) scheme such that the audio-only condition serves as the reference level and is therefore coded as 0, and the audiovisual condition is coded as 1. Thus, in the mixed-effects models, the regression coefficient associated with the intercept represents the estimated mean response time in the audio-only condition (when modality = 0), and the coefficient associated with the effect of modality indicates how the mean response time changes in the audiovisual condition (when modality = 1). We could instead use the audiovisual condition as the reference level, in which case the intercept would represent the estimated mean response time in the audiovisual condition (when modality = 0), and the modality effect would indicate how this estimate changes in the audio-only condition (when modality = 1). Altering the coding scheme, either by changing the reference level or by switching to a different coding scheme altogether (e.g., sum or deviation coding, which involves coding the groups as −0.5 and 0.5 or −1 and 1 so the intercept corresponds to the grand mean) will not change the fit of the model; it will simply change the interpretation of the regression coefficients (but often leads to misinterpretation of interactions, as I discuss below; see Wendorf, 2004, for a description of various coding schemes).
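To make this concrete, here is one way the coding scheme could be set up in R. This snippet is an illustrative sketch of mine rather than part of the article's OSF script, and it assumes that the modality column contains the condition labels shown in Table 1 below:

> # Sketch: with "Audio-only" as the first factor level, R's default
> # treatment (dummy) coding assigns it 0 and "Audiovisual" 1
> rt_data$modality <- factor(rt_data$modality,
    levels = c("Audio-only", "Audiovisual"))
> contrasts(rt_data$modality)
            Audiovisual
Audio-only            0
Audiovisual           1

Switching the reference level would then be a one-line change (e.g., relevel(rt_data$modality, ref = "Audiovisual")).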
Table 1. First Six Rows of the Example Data Set in Unaggregated and Aggregated Formats

Unaggregated data set             Aggregated data set
PID  modality    stim    RT       PID  modality      RT
301  Audio-only  gown  1024       301  Audio-only  1027
301  Audio-only  might  838       301  Audiovisual 1002
301  Audio-only  fern  1060       302  Audio-only  1047
301  Audio-only  vane   882       302  Audiovisual 1043
301  Audio-only  pup    971       303  Audio-only   883
301  Audio-only  rise  1064       303  Audiovisual  938

Note: PID = participant identification number; stim = stimulus; RT = response time.

The left side of Table 1 shows the first six lines of the data in the desired format: unaggregated long format. For a helpful tutorial on how to wrangle the data into this format, I recommend Wickham and Grolemund's (2017) open-access textbook R For Data Science (Chapter 12.3.1) and the tidyverse collection of R packages (Wickham et al., 2019). If you are following along with your own data, ensure that your data are in long format such that each row represents an individual observation (i.e., do not aggregate across either participants or items). Notice that in this half of the table, each of the first six rows corresponds to a different word (stim) presented to the same participant (PID). In contrast, for an ANOVA, the data frame would contain two rows per participant, one for each modality, and the value in the response time (RT) column for a given row would reflect the mean response time for all words presented to that individual in the indicated condition (right side of Table 1).
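If you are curious how the aggregated format on the right side of Table 1 would be produced, the following sketch (mine, not part of the article's script) collapses the long-format data using the tidyverse tools mentioned above; mixed-effects models use the unaggregated rows directly and skip this step entirely:

> # Sketch: one mean RT per participant per condition—the format a
> # repeated measures ANOVA would require
> library(dplyr)
> rt_data %>%
    group_by(PID, modality) %>%
    summarise(RT = mean(RT), .groups = "drop")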
Fixed and Random Effects

Mixed-effects models are called "mixed" because they simultaneously model fixed and random effects. Fixed effects represent population-level (i.e., average) effects that should persist across experiments. Condition effects are typically fixed effects because they are expected to operate in predictable ways across various samples of participants and items. Indeed, in our example, modality will be modeled as a fixed effect because we expect that there is an average relationship between modality and response times that will turn up again if we conduct the same experiment with a different sample of participants and items.

Whereas fixed effects model average trends, random effects model the extent to which these trends vary across levels of some grouping factor (e.g., participants or items). Random effects are clusters of dependent data points in which the component observations come from the same higher-level group (e.g., an individual participant or item) and are included in mixed-effects models to account for the fact that the behavior of particular participants or items may differ from the average trend. Given that random effects are discrete units sampled from some population, they are inherently categorical (Winter, 2019). Thus, if you are wondering if an effect should be modeled as fixed or random and it is continuous in nature, be aware that it cannot be modeled as a random effect and therefore must be considered a fixed effect. In our hypothetical experiment, participants and words are modeled as random effects because they are randomly sampled from their respective populations, and we want to account for variability within those populations.

Including random effects for participants and items resolves the nonindependence problem that often plagues multiple regression by accounting for the fact that some participants respond more quickly than others, and some items are responded to more quickly than others. These random deviations from the mean response time are called random intercepts. For example, the model may estimate that the mean response time for some condition is 1,000 ms, but specifying by-participant random intercepts allows the model to estimate each participant's deviation from this fixed estimate of the mean response time. So if one participant tended to respond particularly quickly, that person's individual intercept might be shifted down 150 ms (i.e., the estimated intercept would be 850 ms). Similarly, including by-item random intercepts enables the model to estimate each item's deviation from the fixed intercept, reflecting the fact that some words tend to be responded to more quickly than others. In multiple regression, in contrast, the same regression line (both intercept and slope) is applied to all participants and items, so predictions tend to be less accurate than in mixed-effects regression, and residual error tends to be larger. Thus, in mixed modeling, the fixed-intercept estimate represents the average intercept, and random intercepts allow each participant and item to deviate from this average.2 These deviations are assumed to follow a normal distribution with a mean of zero and a variance that is estimated by the model.

An additional source of variability that mixed-effects models can account for comes from the fact that a variable that is modeled as a fixed effect may actually have different influences on different participants (or items). In our example, some participants may show very small differences in response times between the audio-only and audiovisual conditions, and others may show large differences. Similarly, some words may be more affected by modality than others. To model this type of variability, we will include random slopes in the model specification. In our hypothetical study, the model may estimate that the effect of modality is 83 ms—meaning that participants were, on average, 83 ms slower in the audiovisual condition than the audio-only condition—but one participant may have been very strongly affected by modality (e.g., a response time difference between modalities of 200 ms), and another may have been only weakly affected by modality (e.g., a response time difference between modalities of 10 ms). These individual deviations from the average modality effect are modeled via random slopes (note that a simple mean difference like the one described here is represented in a regression equation as a slope).

It may be confusing that modality comes up in the context of fixed and random effects, but recall that an effect is considered fixed if it is assumed that the effect would persist in a different sample of participants. In our case, modality is modeled as a fixed effect because we are modeling the common influence of modality on response times across participants and items. However, given that participants represent a random sample from the population of interest, the effect of modality within participants represents a subset of possible ways modality and participants can interact. In other words, modality itself is not a random effect, but the way it interacts with participants is random, and including random slopes for modality allows the model to estimate each participant's deviation from the overall (fixed) trend. (For more on the distinction between fixed and random effects and a description of when a researcher may actually want to model participants as a fixed effect, see Mirman, 2014).

One question people often have at this point is, if mixed-effects models derive an intercept and slope estimate for each participant, why are these seemingly systematic effects called random effects? The answer is that although an effect might be consistent within a particular individual (e.g., one participant may systematically respond more quickly and be less affected by modality than average), the source of this variability is unknown and is therefore considered random. If you find yourself stumbling over the use of the word random, it may be helpful to instead consider the synonymous terminology by-participant (or -item) varying intercepts (or slopes). However, given that these effects are most commonly referred to as random intercepts and slopes, I use that terminology here.
Visualizing Random Intercepts and Slopes

In this section, I provide plots to help you visualize what happens when one builds on ordinary regression by introducing random intercepts and random slopes. These plots are derived from fake data from four hypothetical participants who each responded to four items (note that random effects really should have at least five or six levels, and having more levels is preferable; e.g., Bolker, 2020). The effect of interest is the influence of word difficulty on response times (where 0 represents "very easy" by some collection of criteria, such as the frequency with which the word occurs in the language and the number of similar-sounding words, and 10 represents "very difficult"). First, consider a model with no random effects (i.e., fixed-effects-only regression; Fig. 1). More difficult words tend to elicit slower response times, but because there are no random effects, the model estimates are the same for every participant; that is, although you can tell which points in the plot correspond to each participant because I have represented data from each participant with a different shape, the model does not have access to this information. Further, given that this model predicts just one regression line that applies to all observations, the residual error (represented by vertical lines connecting every point to the regression line) is relatively large.

[Figure omitted; axes: Word Difficulty (0–10) vs. Response Time (ms).] Fig. 1. Fixed-effects-only regression line depicting the relationship between word difficulty and response time. The plotted points represent individual response times for each word for each participant, and the vertical lines represent the deviation of each point from the line of best fit (i.e., residual error). Note that although you can discern the nested nature of the data from this plot because each participant's data are represented by a different shape, the model does not take such dependencies in the data into account. For visualization purposes, data from the four participants for each word have been jittered horizontally to avoid overlap.

Next, consider a model that includes random intercepts for participants. In Figure 2, each dashed gray line depicts model predictions for a single participant, and the solid black line depicts the estimates for the average (fixed) effects. This model takes into account the fact that some participants tend to have slower response times than others. Here, the overall effect of word difficulty on response times is still apparent, but this model does a better job predicting response times for a given participant because it allows for each participant to have a different intercept (representing the predicted response time for a word with a 0 on the difficulty scale). In this example, the relationship between word difficulty and response time is equally strong for all participants (i.e., the slope is fixed); random intercepts simply shift each participant's regression line up or down depending on that individual's deviation from the mean (see Winter, 2019, p. 237, for another way of visualizing random intercepts, via a histogram of each individual's deviation from the population intercept). Notice that the residual error, indicated by the vertical lines, is substantially smaller in the random-intercepts model relative to the fixed-effects-only model. Indeed, the residual standard deviation in the original model is 410 ms, but it is reduced to 275 ms in the random-intercepts model (these values are obtained via the summary() command in R). This is because we have considered the fact that each participant's intercept can vary from the average intercept, so residual error represents deviation from a specific participant's regression line rather than the overall regression line.
[Figure omitted; axes: Word Difficulty (0–10) vs. Response Time (ms).] Fig. 2. Mixed-effects regression lines depicting the relationship between word difficulty and response time, generated from a model including by-participant random intercepts but no random slopes. Each dashed gray line represents model predictions for a single participant, and the solid black line represents the fixed-effects estimates for the intercept and slope. The plotted points represent individual response times for each word for each participant, and the vertical lines represent the deviation of each point from the participant's individual regression line. Notice that including random intercepts reduces residual error relative to the error in the fixed-effects-only model (Fig. 1). For visualization purposes, data from the four participants for each word have been jittered horizontally to avoid overlap.

[Figure omitted; axes: Word Difficulty (0–10) vs. Response Time (ms).] Fig. 3. Mixed-effects regression lines depicting the relationship between word difficulty and response time, generated from a model including by-participant random intercepts as well as by-participant random slopes for word difficulty. Each dashed gray line represents model predictions for a single participant, and the solid black line represents the fixed-effects estimates for the intercept and slope. The plotted points represent individual response times for each word for each participant, and the vertical lines represent the deviation of each point from the participant's individual regression line. Notice that including random slopes reduces residual error relative to the error in both the random-intercepts model and the fixed-effects-only model (Figs. 1 and 2). For visualization purposes, data from the four participants for each word have been jittered horizontally to avoid overlap.

Figure 3 shows how the model changes when by-participant random slopes are included; this model allows for the relationship between word difficulty and response time to vary across participants. Here, participants differ not only in how quickly they respond when word difficulty is 0 (random intercepts), but also in the extent to which they are affected by changes in word difficulty (random slopes). Although the general trend that difficult words are responded to more slowly is still apparent, the strength of this relationship varies across participants. The result is that the residual error is even smaller because each regression line is tailored to the individual; indeed, the residual standard deviation has decreased from 275 ms in the random-intercepts model to 75 ms in the random-slopes model. Note that for simplicity, these plots do not take item-level variability into account. (See Barr et al., 2013, for a helpful visualization depicting the simultaneous influences of participant and item random effects.)
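If you would like to reproduce this pattern yourself, the following sketch simulates data like those in Figures 1 through 3 and confirms that the residual standard deviation shrinks as random intercepts and then random slopes are added. It is my own toy example (all names and parameter values are invented, not taken from the article's script), and it uses the lmer() function from the lme4 package, which is introduced in the next section; with only four participants it sits below the five-or-six-level minimum noted above, so lmer() may warn about a singular fit:

> # Sketch: four participants, each with their own intercept and slope
> library(lme4)
> set.seed(1)
> sim <- expand.grid(participant = factor(1:4), difficulty = 0:10)
> intercepts <- rnorm(4, mean = 750, sd = 150)  # by-participant intercepts
> slopes <- rnorm(4, mean = 100, sd = 40)       # by-participant slopes
> sim$RT <- intercepts[sim$participant] +
    slopes[sim$participant] * sim$difficulty + rnorm(nrow(sim), sd = 50)
> sigma(lm(RT ~ 1 + difficulty, data = sim))                          # fixed effects only
> sigma(lmer(RT ~ 1 + difficulty + (1 | participant), data = sim))    # + random intercepts
> sigma(lmer(RT ~ 1 + difficulty + (1 + difficulty | participant), data = sim))  # + random slopes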
Correlations Among Random Effects

The discussion of mixed-effects models thus far has focused on fixed effects, random intercepts, and random slopes, but the models estimate additional parameters that are often overlooked: correlations among random effects. For example, when you specify that a model should include by-participant random intercepts and slopes for modality, the model will also estimate the correlations among those random intercepts and slopes. Although experimental psychology typically focuses on fixed effects, correlations among random effects can provide useful information about individual differences in condition effects.

Suppose, for example, that in our hypothetical dual-task experiment, the correlation between the by-participant random intercepts and slopes was negative (e.g., r = −.17). This would suggest that individuals who have higher intercepts (i.e., slower response times in the audio-only condition) tend to have lower slopes. Interpreting what "lower" means in the context of our experiment also requires knowledge of the direction of the modality effect. If the modality effect is positive, then "lower slopes" means slopes that are less positive (i.e., closer to zero), and the correlation therefore suggests that individuals with slower response times are less affected by the modality manipulation. If, however, the modality effect is negative, then "lower slopes" means slopes that are more negative, which suggests that individuals with slower response times tend to be more affected by the condition manipulation.

If the modality effect is 83 ms, a negative correlation between by-participant random intercepts and slopes would indicate that individuals who had slower response times in the audio-only condition tended to show a less pronounced slowing in the audiovisual condition. One interpretation of this correlation is that people who respond more slowly are completing the task more carefully, and this slow, deliberate responding washes out the condition effect in those individuals. Although this correlation is not of particular interest in this experiment, there are situations in which correlations among random effects are key to the research question. For example, a researcher conducting a longitudinal study might be interested in whether students' baseline mathematical abilities are related to the trajectory of their improvement over the course of a training program, so the correlation between by-participant random intercepts and slopes for the training effect would be of particular interest.

Another reason for examining correlations among random effects is that they can be informative about possible ceiling or floor effects. Consider the intelligibility effect I described when I introduced the data: Seeing the talker improves speech intelligibility. A negative correlation between by-participant random intercepts and slopes for modality in this case would indicate that individuals with higher intercepts (i.e., better speech identification in the audio-only condition) had shallower slopes (i.e., benefited less from seeing the talker)—a correlation that would also emerge if the speech-identification task was too easy. That is, if participants could attain a high level of performance without seeing the talker, then seeing the talker would have little effect on performance; this would result in a negative correlation between by-participant random intercepts and slopes, but only because ceiling effects prevented the modality effect from emerging in some individuals (see Winter, 2019, p. 239, for another example of random-effects correlations that may indicate the presence of ceiling effects). Thus, even if your research question primarily concerns fixed effects, examining random effects and their correlations will help you understand your data more deeply.
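Once you have fit a model with random intercepts and slopes, such as the rt_full.mod model built in the implementation section below, these correlation estimates can be inspected directly. This snippet is an illustrative addition of mine, not part of the article's script:

> # Sketch: print the random-effect variances, standard deviations, and
> # the intercept-slope correlations discussed above
> VarCorr(rt_full.mod)
> as.data.frame(VarCorr(rt_full.mod))  # the same estimates as a data frame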
Which Random Effects Can You Include?

Before we move on to implementation in R, it is important to note one other issue regarding random-effects structures in mixed-effects modeling: deciding which random slopes are justified by your design. Consider again the example in which we modeled response times to words as a function of their difficulty. Word difficulty was manipulated within subjects, but because the words differed on an intrinsic property—namely, their difficulty—word difficulty was a between-items variable. Given that by-item random slopes account for variability across items in the extent to which they are affected by the predictor of interest, we cannot model the effect of word difficulty on a particular item because each word has only one level of difficulty; that is, we cannot include by-item random slopes.3 In contrast, if the predictor of interest was a within-items variable (as in our running example in which all words appeared in both the audio-only and the audiovisual conditions), we could include by-item random slopes for that predictor in our model, which would account for the fact that different words may be differently affected by the predictor. Put simply, by-participant and by-item slopes are justified only for within-subjects and within-items designs, respectively. Thus, our random-effects structure in the word-difficulty example can include random intercepts for both participants and items, as well as by-participant random slopes for word difficulty, but cannot include by-item random slopes for word difficulty.

Because including by-item random slopes for word difficulty would not be justified in this example, the random-effects structure including random intercepts for both participants and items, as well as by-participant random slopes for word difficulty, would represent the maximal random-effects structure justified by the design (see Barr et al., 2013; Matuschek et al., 2017). In cases in which by-participant and by-item random slopes are justified, mixed-effects models can incorporate the simultaneous influences of both participant and item random slopes (but note that just because you can include a random effect does not necessarily mean that it would be advisable to do so. I discuss this further in the Model Building and Convergence Issues section).
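Expressed in the lme4 syntax introduced in the next section, the maximal structure for the word-difficulty example would therefore look like the sketch below; the variable and data-frame names are placeholders of my own, not objects from the article's script:

> # Sketch: by-participant slopes for difficulty are justified (difficulty
> # varies within participants), but items get intercepts only, because
> # each word has exactly one difficulty value
> lmer(RT ~ 1 + difficulty + (1 + difficulty | participant) + (1 | word),
    data = difficulty_data)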
Examples and Implementation in R

Now that you have a conceptual understanding of what mixed-effects models are and why they are useful, let us consider how to implement them in R. First, you will need to install R (R Core Team, 2020) and then RStudio (RStudio Team, 2020), a programming environment that allows you to write R code, run it, and view graphs and data frames all in one place. I suggest working in RStudio rather than R (though this is not a rule, and some people code in the R console without RStudio). Base R (the set of tools that is built into R) has a host of functions, but to create mixed-effects models you will need to install a specific package called lme4 (Bates et al., 2020). Packages, also referred to as libraries, are sets of functions that work together and are not already built into Base R. To install lme4, run the following line of code (you should run this line of code only if you have not already installed the package):

> install.packages("lme4")

Once the package is installed, it is always on your computer, and you will not need to run that line of code again. Whenever you want to create mixed-effects models, you will need to load the installed package, which will give you access to all the functions you need (you need to rerun this line of code every time you start a new R session). The following line of code will load the lme4 package:

> library(lme4)

In this section, I assume rudimentary knowledge of R. If you are new to R, I recommend installing and loading the swirl package (Kross et al., 2020), which serves as an introduction to R that can be completed in R itself.

Analyzing data with a continuous outcome (response time)

Now we can start building some models. For these examples, which I conducted in R Version 4.0.3 with lme4 Version 1.1-26, I used the hypothetical data set introduced earlier to assess whether seeing the talker affects response times to a secondary task and word intelligibility. I used a dummy-coding scheme with the audio-only condition as the reference level. To follow along, go to https://osf.io/v6qag/ and navigate to the R Markdown4 file called "intro_to_lmer.Rmd."

Model building and convergence issues. The basic syntax for mixed-effects modeling for an experiment with one independent variable and random intercepts but no random slopes for (crossed)5 participants and items is

> lmer(outcome ~ 1 + predictor + (1|participant) + (1|item), data = data)

The portions in the interior sets of parentheses are the random effects, and the portions not in these parentheses are the fixed effects. The vertical lines within the random-effects portions of the code are called pipes, and they indicate that within each set of parentheses, the effects to the left of the pipe vary by the grouping factor to the right of the pipe. Thus, in this example, the intercept (indicated by the 1) varies by the two grouping factors in this experiment: participants and items. Note that the 1 is optional in the fixed-effects portion of the model specification because the fixed intercept is included by default, but it is not optional in the random-effects portions because there must be some indication about which effects are allowed to vary by each grouping factor (i.e., the region to the left of the pipe cannot be left blank). I recommend always labeling intercepts with a 1 in both the fixed- and the random-effects portions of the model specification to avoid any confusion about when the 1 must be included. Finally, the data argument indicates the name of the R object containing the data, and the lmer part is the function that builds a mixed-effects model (which you can access because you installed the lme4 package).
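Written out for the example data set, that intercepts-only specification would look like this (the model name rt_int.mod is a placeholder of my own; the article's models are built below):

> # Illustration: random intercepts for participants and items, no slopes
> rt_int.mod <- lmer(RT ~ 1 + modality + (1 | PID) + (1 | stim),
    data = rt_data)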
The model thus far includes random intercepts but no random slopes. However, my experience in speech-perception research leads me to expect that both participants and words might differ in the extent to which they are affected by the modality manipulation. We will therefore fit a model that includes both by-participant and by-item random slopes for modality. Failing to include random slopes would amount to assuming that all participants and words respond to the modality effect in exactly the same way, which is an unreasonable assumption to make. Although the model including by-participant and by-item random intercepts and slopes reflects the maximal random-effects structure justified by the design, the decision to include by-participant and by-item random slopes is also theoretically justified. Theoretical motivation should always be considered, as blind maximization can lead to nonconverging models and a loss of statistical power (Matuschek et al., 2017). Notice how the basic syntax for the model changes when we include by-participant and by-item varying slopes in the random-effects structure:

> lmer(outcome ~ 1 + predictor + (1 + predictor|participant) +
    (1 + predictor|item), data = data)

Here, the portions in parentheses indicate that both the intercept (indicated by the 1, which in this case is optional because it is implied by the presence of random slopes but is included for clarity) and the predictor (indicated by + predictor) vary by participants and items. In plain language, this syntax means "predict the outcome from the predictor and the random intercepts and slopes for participants and items, using the data I provide."6

The model above includes only one predictor, but if a model includes multiple predictors the researcher may decide which of the predictors can vary by participant or item; in other words, any fixed effect to the left of the interior parentheses can be included to the left of the pipe (inside the interior parentheses), provided that including it is justified given the design of the experiment. For example, if we wanted to include a second predictor that varied within both participants and items, but there was no theoretical motivation for including by-item random slopes for the second predictor—or, alternatively, if the second predictor varied between items, so including the by-item random slope would not be justified given the experimental design—the syntax would look like this:

> lmer(outcome ~ 1 + predictor1 + predictor2 + (1 + predictor1 +
    predictor2|participant) + (1 + predictor1|item), data = data)

In the example we will be working with, the full model (i.e., the model including the fixed effects of interest and all theoretically motivated random effects) is specified as follows:

> rt_full.mod <- lmer(RT ~ 1 + modality + (1 + modality|PID) +
    (1 + modality|stim), data = rt_data)

Here, we are predicting response times (RT) on the basis of the fixed effects for the intercept (1) and modality (audio-only vs. audiovisual condition), we are including random intercepts and slopes for both participants (PID = participant identification number) and words (stim = stimulus), and we are telling R to use the data frame called rt_data.7 Also note that this line of code includes the <- operator. This is used to assign a name to an object (a data structure with specific attributes that is stored in R's memory) and save it for later. Thus, with this line of code we have created a model and given it an intuitive name so that we know what that object represents later on.

If you run this line of code in the R script, you may notice that you get a warning message saying that the model failed to converge. Linear mixed-effects models can be computationally complex, especially when they have rich random-effects structures, and failure to converge basically means that a good fit for the data could not be found within a reasonable number of iterations of attempting to estimate model parameters. It is important never to report the results of a nonconverging model, as the convergence warnings are an indication that the model has not been reliably estimated and therefore cannot be trusted.

When a model fails to converge, you as the researcher have several options, and this is a situation potentially introducing researcher degrees of freedom—the numerous seemingly innocuous choices made during the research process that enable researchers to find "'statistically significant' evidence consistent with any hypothesis" (Simmons et al., 2011, p. 1359). As a general rule, you should consider which random effects are theoretically important to include in your model beforehand, using knowledge of your particular domain and previous research (e.g., ask yourself the question, "Does it make sense for modality to vary by participants or by items?"), and remove random effects only if all other ways of addressing convergence issues have been unsuccessful. If you must remove a random effect, this decision should be documented and reported in your published manuscripts and/or accompanying code.

The first step you should take to address convergence issues is to consider your data set and how your model relates to it, and to ensure that your model has not been misspecified (e.g., have you included by-item random slopes for a predictor that does not actually vary within items?). It is also possible that the convergence warnings stem from imbalanced data: If you have some participants or items with only a few observations, the model may encounter difficulty estimating random slopes, and those participants or items may need to be removed to enable model convergence. Although attempting to resolve convergence issues can feel like a hassle, keep in mind that these warnings serve as a friendly reminder to think deeply about your data and not model with your eyes closed. Assuming you have done this, the next step is to add control parameters to your model, so that you can tinker with the nuts and bolts of estimation.
There are many control parameters, and depending on the source of the convergence issues, some may be more appropriate or useful than others. The one I recommend starting with is adjusting the optimizer (i.e., the method by which the model finds an optimal solution). The model specification below is identical to the one above, with the exception that it includes a control parameter that explicitly specifies the optimizer:

> rt_full.mod <- lmer(RT ~ 1 + modality + (1 + modality|PID) +
    (1 + modality|stim), data = rt_data,
    control = lmerControl(optimizer = "bobyqa"))

This model converges, but how did I know which optimizer to choose? And what if the model had not successfully converged with that optimizer? When it comes to selecting an optimizer, I highly recommend the all_fit() function from the afex package (Singmann et al., 2020). This function takes a model as input, refits the model with a variety of optimizers, and lets you know which ones produce warning messages. This package integrates nicely with lme4, so the model syntax need not be changed before running the function. Here is the relevant code and abbreviated output:

> all_fit(rt_full.mod)
bobyqa. : [OK]
Nelder_Mead. : [OK]
optimx.nlminb : [OK]
optimx.L-BFGS-B : [OK]
nloptwrap.NLOPT_LN_NELDERMEAD : [OK]
nloptwrap.NLOPT_LN_BOBYQA : [OK]
nmkbw. : [OK]

This output indicates that none of the optimizers tested led to convergence warnings or singular fits, both of which are indicative of problems with estimation. Thus, any of these optimizers should produce reliable parameter estimates.

In this example, our model converged when we changed the optimizer, but this will not always be the case, and you may sometimes need to address convergence issues in another way.8 One option is to force the correlations among random effects to be zero. Recall that in addition to estimating fixed and random effects, mixed-effects models estimate correlations among random effects. If you are willing to accept that a correlation may be zero,9 this will reduce the computational complexity of the model and may allow the model to converge on parameter estimates. Note, however, that it is advisable to conduct likelihood-ratio tests (described in detail in the next subsection) on nested models differing in the presence of the correlation parameter—or examine the confidence interval around the correlation—to determine whether elimination is warranted. To remove a correlation between two random effects in R, simply put a 0 where the 1 was in the random-effects specification. When you do this, however, the lmer() function no longer estimates the random intercept, so you need to be sure to put it back into the model specification. Here is what the code would look like if you wanted to remove the correlation between the random intercept for participants and the by-participant random slope for modality:

> rt_full.mod <- lmer(RT ~ 1 + modality + (0 + modality|PID) + (1|PID) +
    (1 + modality|stim), data = rt_data)
Other ways to resolve convergence warnings include increasing the number of iterations before the model "gives up" on finding a solution (e.g., control = lmerControl(optCtrl = list(maxfun = 1e9))), centering or scaling continuous predictors (or sum-coding categorical predictors), or removing some of the derivative calculations that occur after the model has reached a solution using the following control parameter: control = lmerControl(calc.derivs = FALSE). I also suggest typing ?convergence into the R console, which will open a help file offering other recommendations for resolving convergence warnings. Finally, it may be that a model fails to converge simply because the random-effects structure is too complex (Bates, Kliegl, et al., 2015). In this case, one can selectively remove random effects based on model selection techniques (Matuschek et al., 2017). It is important to reiterate, however, that simplification of the random-effects structure should only be done as a last resort, and these decisions should be documented—the random-effects structure should be theoretically motivated, so it is best to try to maintain that structure unless all other methods of addressing convergence issues are unsuccessful.

Likelihood-ratio tests. Now that we have a model to work with, how do we determine if modality actually affected response times? This is typically done by comparing a model including the effect of interest (e.g., modality) with a model lacking that effect (i.e., a nested model) using a likelihood-ratio test.10 This test is used to compare two nested models by calculating likelihoods for the two models using a technique called maximum likelihood estimation and then statistically comparing those likelihoods. If you obtain a small p value from the likelihood-ratio test, this indicates that the full model provides a better fit for the data.
When we run a likelihood-ratio test for our example, we are basically asking, does a model that includes information about the modality in which words are presented fit the data better than a model that does not include that information? Here is how you do this in R, first by building the reduced model that lacks the fixed effect for modality but is otherwise identical to the full model (including any control parameters used), and then by conducting the test via the anova()11 function (which does not actually compute an analysis of variance, but is a convenient function for conducting a likelihood-ratio test):

> rt_reduced.mod <- lmer(RT ~ 1 + (1 + modality|PID) +
    (1 + modality|stim), data = rt_data,
    control = lmerControl(optimizer = "bobyqa"))
> anova(rt_reduced.mod, rt_full.mod)
Data: rt_data
Models:
rt_reduced.mod: RT ~ 1 + (1 + modality|stim) + (1 + modality|PID)
rt_full.mod: RT ~ 1 + modality + (1 + modality|stim) + (1 + modality|PID)
               Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
rt_reduced.mod  8 302449 302513 -151217   302433
rt_full.mod     9 302419 302491 -151200   302401 32.385      1  1.264e-08

The small p value in the Pr(>Chisq) column indicates that the model including the modality effect provides a better fit for the data than the model without it; thus, the modality effect is significant. I have added boldface for the p value, the χ2 value in the Chisq column (32.385), and the degrees of freedom for the test (1, found in the Chisq Df column) because these three values should be reported in your results section (I return to this point below).
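To demystify where that χ2 value comes from (an illustration of mine, not part of the article's script): anova() refits both models with maximum likelihood and takes twice the difference in their log-likelihoods, which equals the difference between the two deviances shown above (302433 − 302401, with the printed values rounded to whole numbers):

> # Illustration: the likelihood-ratio statistic computed by hand;
> # refitML() refits an REML fit with maximum likelihood, as anova() does
> full.ml <- refitML(rt_full.mod)
> reduced.ml <- refitML(rt_reduced.mod)
> as.numeric(2 * (logLik(full.ml) - logLik(reduced.ml)))  # 32.385, on 1 df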
Given that the full model includes only one condition effect (modality), conducting the test is relatively straightforward. However, performing likelihood-ratio tests can quickly become a tedious task for complex models with many fixed effects. This is because these tests must be conducted on nested models, so in order to test a particular effect, a reduced model (sometimes referred to as a null model) lacking that effect needs to be built for comparison. Another issue with this approach is that although the reduced models are built solely for the purpose of comparison with the full model, it can be quite tempting to examine those intermediate models and consider them plausible candidates for the "best model" (i.e., perform stepwise regression without knowing it). For example, suppose you build a full model with two fixed effects—modality and background-noise level—and a reduced model to test whether the effect of modality is significant. In doing so, you may notice that noise level is significant in the reduced model but not in the full model and convince yourself that a model without modality is actually more appropriate, even though you had not considered this possibility before examining the models. This is a questionable research practice (John et al., 2012) known as hypothesizing after the results are known (HARKing) and should be avoided because, as Kerr (1998) put it, HARKing transforms Type I error (false positives) into theory.

Luckily, the afex package has another handy function that allows you to avoid this practice altogether. The mixed() function takes a model specification as input and conducts likelihood-ratio tests on all fixed (but not random) effects in the model when the argument method = 'LRT' is included. Crucially, you do not see the reduced models that were built to obtain the relevant p values, so the temptation to inadvertently p-hack is reduced. This function is more useful when your model has multiple fixed effects, but here is how to implement the function in our example and what the output looks like (notice that the χ2 value is the same as when we used the anova() function, because both functions conduct likelihood-ratio tests):

> mixed(RT ~ 1 + modality + (1 + modality|PID) + (1 + modality|stim),
    data = rt_data, control = lmerControl(optimizer = "bobyqa"),
    method = 'LRT')
Model: RT ~ 1 + modality + (1 + modality|stim) + (1 + modality|PID)
Data: rt_data
Df full model: 9
    Effect df     Chisq p.value
1 modality  1 32.39 ***   <.001

Interpreting fixed and random effects. The likelihood-ratio test comparing our full and reduced models indicated that the modality effect was significant, but it did not tell us about the direction or magnitude of the effect. So how do we assess whether the audiovisual condition resulted in slower or faster response times? And how do we gain insight into the variability across participants and items that we asked the model to estimate? To answer these questions, we need to examine the model output via the summary() command. The output contains two main sections: The top part contains information about random effects, and the bottom part contains information about fixed effects. The following code chunk implements the summary() command and shows the abbreviated output relevant to interpreting fixed effects:

> summary(rt_full.mod)
Fixed effects:
            Estimate Std. Error    df t value
(Intercept)  1044.14      23.36 52.14  44.704
modality       83.18      12.58 52.10   6.615
Recall that we used a dummy-coding scheme with the audio-only condition as the reference level; the intercept therefore represents the estimated mean response time in the audio-only condition, and the modality effect represents the adjustment to the intercept in the audiovisual condition. Thus, response times in the audio-only condition averaged an estimated 1,044 ms, and response times were an estimated 83 ms slower in the audiovisual condition.
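The two condition means implied by these estimates can be recovered directly from the fitted model (my illustration, not the article's script):

> # Illustration: condition means implied by the fixed effects
> fixef(rt_full.mod)["(Intercept)"]   # audio-only: ~1,044 ms
> fixef(rt_full.mod)["(Intercept)"] +
    fixef(rt_full.mod)["modality"]    # audiovisual: ~1,127 ms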
Now let us focus on the random-effects portion of the output:

Random effects:
 Groups   Name        Variance Std.Dev. Corr
 stim     (Intercept)    303.9   17.43
          modality       216.6   14.72   0.16
 PID      (Intercept)  28552.7  168.98
          modality      7709.8   87.81  -0.17
 Residual              65258.8  255.46

The Groups column lists the grouping factors that appeared to the right of the pipes in the model specification (along with the residuals), and the Name column lists the effects that were grouped by each factor (i.e., the intercepts and modality slopes that appeared to the left of the pipes in the model specification). Each of these random intercepts and slopes has an associated variance (and standard deviation) estimate, which tells you the extent to which response times for particular stimuli and participants varied around the fixed intercept and slope. For example, the standard deviation for by-item random intercepts (the stim intercept row in the output above) indicates that response times for particular items varied around the average intercept of 1,044 ms by about 17 ms. Similarly, the standard deviation for by-participant random slopes (the PID modality row in the output above) indicates that participants' estimated slopes varied around the average slope of 83 ms by about 88 ms. Thus, an individual whose slope was 1 SD below the mean would have an estimated slope near 0 (indicating that this person's response times were not affected by the modality in which the words were presented), whereas an individual whose slope was 1 SD above the mean would have a very steep slope (indicating a difference between modalities of about 171 ms). The coef() function in lme4 provides individual intercept and slope estimates for every participant and item, which not only helps make the concept of random-intercept and -slope estimates more concrete, but can also help you identify outliers. Here is the code and abbreviated output indicating estimates for the first four items and participants:

> coef(rt_full.mod)
$stim
     (Intercept) modality
babe    1038.921 82.11521
back    1050.914 86.52633
bad     1041.122 81.12267
bag     1042.896 86.40601

$PID
    (Intercept)   modality
301   1024.0668 -16.936415
302   1044.1377   1.842626
303    882.8306  57.789321
304   1232.7544 -27.919775

This output indicates that the estimated intercept for the word "bag" is 1,043 ms, and the estimated slope is 86 ms; these values are very similar to the estimates for the fixed intercept (1,044 ms) and slope (83 ms). The participant part of the output indicates that Participant 303 had an estimated intercept of 883 ms and an estimated slope of 58 ms, indicating that this person responded much more quickly than average and was less affected by modality than average. Notice that even though we are looking at estimates for only four items and participants, it is clear that there is more intercept and slope variability across participants than across items. The standard deviations in the random-effects output are consistent with this observation. Specifically, the standard deviations for the by-participant random intercepts (169 ms) and slopes (88 ms) are much larger than those for the by-item random intercepts (17 ms) and slopes (15 ms). This is not surprising—in my experience, participants tend to vary more than items—but it is useful to know that participants vary considerably in their response times because this could have important consequences for power calculations and could uncover avenues for individual differences research (e.g., why do people vary so much in the way the modality manipulation affects their response times?) and follow-up studies (e.g., do the results hold when one controls for individual differences in simple reaction time?).
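Plotting the random effects is a convenient way to scan for outlying participants or items. A minimal sketch using the dotplot() method that lme4 supplies for ranef() objects (the lattice package ships with R):

> library(lattice)
> # Conditional modes of the random effects; condVar = TRUE adds the
> # variances needed to draw intervals around each estimate:
> re <- ranef(rt_full.mod, condVar = TRUE)
> dotplot(re)$PID   # caterpillar plot of intercepts and slopes by participant
> dotplot(re)$stim  # and by item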
Although the focus of our hypothetical study is on fixed effects, random-effects estimates can be interesting and informative in their own right, and in some cases provide insight into the key research question. For example, Idemaru and colleagues (2020) recently concluded that loudness is a more informative cue than pitch in predicting whether an utterance is perceived as respectful or not respectful. This claim was supported both by greater variation in pitch than loudness slopes across participants (i.e., participants responded more consistently to loudness cues) and by the fact that the direction of the loudness effect was negative for every single participant (this is an example of the coef() function in action), whereas the direction of the pitch effect varied considerably across participants. Thus, random effects rather than fixed effects were at the crux of the authors' argument that listeners use loudness as an indicator of respect more consistently than they use pitch.

The last piece of information in the random-effects output concerns correlations among random effects. The Corr column indicates that the correlation between random intercepts for items and by-item random slopes is .16, and the correlation between random intercepts for participants and by-participant random slopes is −.17. This means that items that were responded to more slowly in the audio-only condition tended to have larger (more positive, steeper) slopes, whereas participants who responded more slowly in the audio-only condition had shallower slopes. One possible explanation for the positive correlation for items is that the items that were responded to more slowly tended to be the more difficult ones and may have been particularly affected by any distraction coming from the visual signal. The negative correlation between by-participant random intercepts and slopes is consistent with the one described earlier in this article and may suggest that slow, deliberate responding washes out the modality effect.

Finally, it is important to note that it is possible for a model to encounter estimation issues (i.e., produce unreliable parameter estimates) without any warning messages appearing in aggressive red text in your R console, and the random-effects portion of the output contains some clues that may help you identify when this happens. One clue comes from the random-effects correlations, which are set to −1.00 or 1.00 by default when they cannot be estimated, and another comes from the variance estimates, which are set to 0 when they cannot be estimated (i.e., the variance and correlation parameters are set to their boundary values when they cannot be estimated; Bates, Mächler, et al., 2015). Although random-effects correlations of −1.00 or 1.00 are often accompanied by "singular fit" warning messages, this is not always the case, so it is crucial to examine the random-effects portion of the model output to ensure that estimation went smoothly.
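If you would rather check this programmatically than by eye, lme4 provides helpers for exactly this purpose. A brief sketch; given the healthy estimates above, isSingular() should return FALSE for our model:

> # TRUE if any variance is estimated at 0 or any correlation at -1/1:
> isSingular(rt_full.mod)
[1] FALSE
> # Print the variance components alone for closer inspection:
> VarCorr(rt_full.mod)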
Reporting results in a manuscript. There are no explicit rules for reporting findings from model comparisons and the associated parameter estimates from the preferred model (Meteyard & Davies, 2020). How results are reported depends on the number and nature of model comparisons, the journal submission guidelines, and author and reviewer preferences. That said, I typically report the χ2 value from the likelihood-ratio test, the degrees of freedom of the test, and the associated p value, as well as the coefficient estimates, t values, and standard errors associated with the parameters of interest from the selected model. To report the findings described in the example above, you could write,

    A likelihood-ratio test indicated that the model including modality provided a better fit for the data than a model without it, χ2(1) = 32.39, p < .001. Examination of the summary output for the full model indicated that response times were on average an estimated 83 ms slower in the audiovisual relative to the audio-only condition (β = 83.18, SE = 12.58, t = 6.62).

As long as you report your results transparently and include details of the model specification and any simplifications you made to the random-effects structure in your manuscript or accompanying code, the particular convention you follow is up to you (and, of course, making your data and code publicly available reduces the impact of the reporting convention you adopt). Finally, you should be sure to cite R as well as the specific packages you used to conduct your analyses, including the versions you used, both to facilitate reproducibility of your results (indeed, it is not uncommon for a model that once converged to no longer converge after an lme4 update) and to give credit to the package developers who have put a lot of work into making your analyses possible.
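R can generate the citation and version information for you. A short sketch using base-R utilities:

> citation()              # how to cite R itself
> citation("lme4")        # how to cite the lme4 package
> packageVersion("lme4")  # the exact package version you ran
> sessionInfo()           # R version plus all loaded packages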
Interpreting interactions. The data set we have been working with throughout this Tutorial contains just one condition effect. Although this simplicity is convenient for learning about mixed-effects models, many experiments test multiple conditions and the interactions among them. Interpreting interactions is tricky, and doing so accurately depends critically on knowledge of the coding scheme used for categorical predictors. R's default (and usually my own) is to use dummy coding, which leads to misinterpretation of interactions and lower-order effects if sum coding is assumed to be the default. Therefore, for this section, I continue to use dummy-coded predictors.
The example I provide uses the same data set we have been working with, but contains one additional categorical predictor representing the difficulty of the background-noise level. Participants identified speech in audio-only (coded 0) and audiovisual (coded 1) conditions in both an easy (coded 0) and a hard (coded 1) level of background noise. The goal of this analysis is to assess whether the effect of modality on response time depends on (i.e., interacts with) the level of the background noise (i.e., the signal-to-noise ratio, or SNR). On the basis of previous research, we expect that response times will be slower in the audiovisual condition (as in the analyses above), but that this slowing will be more pronounced in easy listening conditions because the cognitive costs associated with simultaneously processing auditory and visual information are amplified in conditions in which seeing the talker is unnecessary to attain a high level of performance (see Brown & Strand, 2019).

There are a few ways to specify an interaction in R that produce identical results. One way is to use an asterisk (modality*SNR), which automatically includes all lower-order terms even if you do not type them in (the following syntax is abbreviated for readability, but random effects and control parameters are also included; see the accompanying code at https://osf.io/v6qag/):

> rt_int.mod <- lmer(RT ~ 1 + modality*SNR + . . .,
    data = rt_data_interaction)

Another way to specify an interaction is to use a colon rather than an asterisk (modality:SNR), but in this case, you need to explicitly specify the lower-order terms in the model specification (I use this method for clarity):

> rt_int.mod <- lmer(RT ~ 1 + modality + SNR +
    modality:SNR + . . ., data = rt_data_interaction)

Here is the abbreviated model output:

Fixed effects:
              Estimate Std. Error        df t value
(Intercept)    998.824     22.214    52.729  44.964
modality        98.510     13.199    59.065   7.464
SNR             92.339     14.790    58.004   6.243
modality:SNR   -29.532      6.755 21298.850  -4.372

Recall that the intercept represents the estimated response time when all other predictors are set to 0. The intercept of 999 ms therefore represents the estimated mean response time in the audio-only modality (modality = 0) in the easy listening condition (SNR = 0). If you are having difficulty understanding why this is the case, it may be helpful to plug in 0 for both modality and SNR in the following regression equation. Notice that the intercept is the only term that does not drop out:

RT = 999 + 99 * modality + 92 * SNR − 30 * modality * SNR
   = 999 + 99 * 0 + 92 * 0 − 30 * 0 * 0
   = 999

To interpret the remaining three coefficients, it is important to note that when an interaction is included in a model, it no longer makes sense to interpret the predictors that make up the interaction in isolation. This means that the coefficient for the modality term should not be interpreted as the average modality effect if SNR is held constant (this would be the interpretation if we had not included an interaction in the model), because the presence of the interaction tells us that the modality effect changes depending on the SNR. Instead, the coefficient for the modality term should be interpreted as the estimated change in response time from the audio-only to the audiovisual condition when all other predictors are set to 0. Thus, the modality effect indicates that response times are on average 99 ms slower in the audiovisual relative to the audio-only condition in the easy listening condition (SNR = 0). Think of it this way: When the SNR dummy code is set to 0 (easy), the SNR and interaction terms drop out of the model, and we are left with a 99-ms adjustment to the intercept when we move from the audio-only to the audiovisual condition. However, when the SNR dummy code is set to 1 (hard), those terms do not drop out of the model, and it is no longer accurate to say that the modality effect is 99 ms (again, plugging 0s and 1s into the regression equation above may help you here).

Similarly, the SNR effect indicates that response times are on average 92 ms slower in the hard relative to the easy listening condition, but this applies only when the modality dummy code is set to 0 (representing the audio-only condition). The modality and SNR effects I have just described are called simple effects, but are often misinterpreted as main effects. Simple effects represent the effect of a predictor on an outcome at a particular level of another predictor, whereas main effects represent the average effect of a predictor on an outcome across levels of another predictor. Thus, when an interaction is present and you have used a coding scheme centered on 0 (e.g., sum coding), lower-order effects are considered main effects, but if you have used a dummy-coding scheme, they are simple effects. Keep this common misinterpretation in mind any time you use dummy coding.

Just as the modality and SNR effects can be thought of as adjustments to the intercept in particular conditions (e.g., estimates are shifted up 99 ms in the audiovisual relative to the audio-only condition, but only in the easy listening condition), the interaction term can be thought of as an adjustment to the modality or SNR slope when both predictors are set to 1 (note that interactions adjust coefficient estimates only for a single cell of the design because the interaction term drops out when one or both of the predictors are set to 0). In this example, the coefficient for the modality term indicates that the modality effect is 99 ms when the SNR is easy, but the presence of an interaction tells us that the effect of modality differs depending on the level of the background noise; that is, the modality slope needs to be adjusted when the SNR is hard. Specifically, the negative interaction term indicates that the modality slope is 30 ms lower (less steep) when the SNR is hard, which is consistent with the hypothesis I described above: Seeing the talker slows response times, but it does so to a greater extent when the listening conditions are easy, presumably because the visual signal is distracting and unnecessary when the auditory signal is highly intelligible. Note that interactions are symmetric in that if the modality slope varies by SNR, then the SNR slope varies by modality. You can therefore also interpret the interaction term as an adjustment to the SNR slope: The 92-ms SNR effect is 30 ms weaker in the audiovisual condition. If you are struggling to interpret interactions with dummy-coded predictors, I recommend making a table containing the coefficient estimate for each of the cells in the design by plugging all combinations of 0s and 1s into the regression equation (Table 2); this can help you visualize the role of each individual coefficient estimate in generating cell-wise predictions (see also Winter, 2019).

Table 2. Estimates for All Cells in the 2 × 2 Design When the Model Includes an Interaction Term

                        Modality
Signal-to-noise ratio   Audio-only condition (0)   Audiovisual condition (1)
Easy (0)                999                        999 + 99
Hard (1)                999 + 92                   999 + 99 + 92 − 30
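You can also have R do the plugging in for you. A small sketch using predict() with re.form = NA, which generates predictions from the fixed effects alone; this assumes modality and SNR entered the model as numeric 0/1 codes, as described above:

> # All four cells of the 2 x 2 design:
> cells <- expand.grid(modality = c(0, 1), SNR = c(0, 1))
> # Population-level predictions (re.form = NA ignores the random effects):
> cells$RT_hat <- predict(rt_int.mod, newdata = cells, re.form = NA)
> cells
  modality SNR   RT_hat
1        0   0  998.824
2        1   0 1097.334
3        0   1 1091.163
4        1   1 1160.141

These values match what you get by plugging the dummy codes into the regression equation by hand, which makes this a handy check on your interpretation of Table 2.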
Analyzing data with a binary outcome (identification accuracy)

Now you should have a general understanding of how to build and interpret models in which the outcome is continuous (e.g., response time), but what if you wanted to test for an effect of modality on accuracy at identifying words, when accuracy for each trial is scored as 0 or 1? These values are discrete and bounded by 0 and 1, so you need to use generalized linear mixed-effects models; if we instead modeled this discrete outcome as though it were continuous, the model would generate impossible predictions (e.g., a predicted probability of −0.2 or 1.3). The R code for building these kinds of models is almost exactly the same as that described above, except rather than using the lmer() function you use the glmer() (generalized linear mixed-effects regression) function, and you need to include at least one additional argument within the glmer() function indicating the assumed distribution of the dependent variable. The glmer() function also contains an argument for specifying a link function, which transforms the outcome onto a continuous and unbounded scale, but each family of distributions has a default link function that typically does not need to be changed. In this case, the discrete outcomes of 0 and 1 follow a binomial distribution, which should be modeled with logistic regression, typically using a logit link function (the default). The logit link function transforms probabilities, which are bounded by 0 and 1, into a continuous, unbounded scale (log odds). Using the logit link function allows us to model the linear relationship between the predictors and the log odds of the outcome (which can be transformed back into odds and probabilities for ease of interpretation) without generating nonsensical predictions.

Put simply, the logit link function first transforms probabilities, which are bounded by 0 and 1, into odds, which are bounded by 0 and infinity (a probability of 0 corresponds to odds of 0, and a probability of 1 corresponds to odds of infinity). However, this scale still has a lower bound of 0, so the link function takes the natural logarithm of the odds (the logarithm of 0 is negative infinity, so the lower bound of the scale is extended from 0 to negative infinity), which results in the continuous and unbounded log-odds scale. Using this function means that any predictions generated from the model will also be on a log-odds scale, which is not particularly informative, but luckily, these predictions can be exponentiated to put them back on an odds scale, and the odds can then be converted into probabilities (see Jaeger, 2008, for a tutorial on using logit mixed models).
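Base R provides this pair of transformations, so you rarely need to code them yourself: qlogis() converts probabilities to log odds, and plogis() converts log odds back to probabilities. A quick sketch:

> qlogis(0.9)       # probability .9 -> log odds
[1] 2.197225
> exp(qlogis(0.9))  # exponentiate to get odds (9 to 1)
[1] 9
> plogis(2.197225)  # and back to a probability
[1] 0.9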
Here is the code to build the full model:

> acc_full.mod <- glmer(acc ~ 1 +
    modality + (1 + modality|PID) +
    (1 + modality|stim), data = acc_data,
    family = binomial)

This code is very similar to that for the response time analysis, but it contains a few key differences. First, the dependent variable is acc (0 for incorrect and 1 for correct word identification) rather than RT. Second, because this outcome is binomially distributed, we indicate that we are using generalized linear mixed-effects modeling by using the glmer() function, and we indicate that our dependent variable follows a binomial distribution with the additional argument family = binomial.
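Note that family = binomial relies on the binomial family's default link, which is the logit link discussed above, so writing the link out is optional. The explicit form is equivalent; a sketch in which only the family argument differs from the call above:

> acc_full.mod <- glmer(acc ~ 1 +
    modality + (1 + modality|PID) +
    (1 + modality|stim), data = acc_data,
    family = binomial(link = "logit"))  # identical to family = binomial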
This model converged, but remember that you should always examine the random-effects portion of the output to ensure that estimation went smoothly:

> summary(acc_full.mod)
Random effects:
 Groups Name        Variance Std.Dev. Corr
 stim   (Intercept) 0.72085  0.8490
        modality    0.46663  0.6831   -0.06
 PID    (Intercept) 0.04346  0.2085
        modality    0.04903  0.2214   -0.15

Not only did we not encounter any convergence or singularity warnings, but the variance estimates and estimated correlations among random effects seem reasonable (i.e., the variance estimates are not exactly zero, and the correlations are not −1.00 or 1.00). It is slightly unusual that in this data set there is more variability across items than across participants in both intercepts and slopes, but this may simply reflect the fact that the speech-identification task was relatively easy for most participants, which resulted in little variability (see Note 12).

Next, we will build a reduced model lacking modality as a fixed effect so we can conduct a likelihood-ratio test:

> acc_reduced.mod <- glmer(acc ~ 1 +
    (1 + modality|PID) +
    (1 + modality|stim), data = acc_data,
    family = binomial)

It is important to note that although both the full and reduced models converged with this random-effects structure and no control parameters, it is certainly possible (and indeed not uncommon) for the full model to converge but the reduced model to encounter convergence issues. In this case, you should find a random-effects structure and combination of control parameters that enable both models to converge (e.g., via the all_fit() function in the afex package), because the models being compared via a likelihood-ratio test should be nested and built with the same control parameters. That is, the models should be identical except for the presence of the fixed effect of interest.
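A sketch of that troubleshooting step, assuming the afex package is installed: all_fit() refits a model with each available optimizer so you can see which ones succeed, and a successful optimizer can then be requested explicitly via glmerControl().

> library(afex)
> all_fit(acc_full.mod)  # try every available optimizer on the full model
> # If, say, "bobyqa" converges for both models, rebuild them with it:
> acc_full.mod <- update(acc_full.mod,
    control = glmerControl(optimizer = "bobyqa"))
> acc_reduced.mod <- update(acc_reduced.mod,
    control = glmerControl(optimizer = "bobyqa"))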
Here is the code and output for the likelihood-ratio test:

> anova(acc_reduced.mod, acc_full.mod)
Data: acc_data
Models:
acc_reduced.mod: acc ~ 1 + (1 + modality | PID) + (1 + modality | stim)
acc_full.mod: acc ~ 1 + modality + (1 + modality | PID) + (1 + modality | stim)
                npar   AIC   BIC logLik deviance  Chisq Df Pr(>Chisq)
acc_reduced.mod    7 28147 28205 -14067    28133
acc_full.mod       8 27989 28055 -13986    27973 160.78  1  < 2.2e-16

The small p value indicates that the full model provides a better fit for the data than the reduced model, and thus that modality has a significant effect on spoken-word identification accuracy.
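As with the response time model, the direction and size of the effect come from the fixed-effects estimates, but here they are on the log-odds scale, so the back-transformations described earlier apply directly. A brief sketch (no output is shown because the fixed-effects estimates for this model were not reproduced above):

> fixef(acc_full.mod)       # estimates on the log-odds scale
> exp(fixef(acc_full.mod))  # odds scale; the modality entry is an odds ratio
> # Predicted probability of a correct response in each condition:
> plogis(fixef(acc_full.mod)[["(Intercept)"]])  # audio-only
> plogis(sum(fixef(acc_full.mod)))              # audiovisual (intercept + slope)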
Conclusions

Mixed-effects modeling is becoming an increasingly popular method of analyzing data from experiments in which each participant responds to multiple items—and for good reason. The beauty of mixed-effects models is that they can simultaneously model participant and item variability while being far more flexible and powerful than other commonly used statistical techniques: They handle missing observations well, they can seamlessly include continuous predictors, they provide estimates for average (as well as by-participant and by-item) effects of predictors on the outcome, and they can be easily extended to model categorical outcomes.

However, as Uncle Ben once said to Spider-Man, with great power comes great responsibility (Lee & Ditko, 1962). These models can be easily implemented in R without cost, but it is important that researchers ensure that this powerful tool is used correctly. Indeed, although more and more researchers are implementing mixed-effects models, there is a concerning lack of standards guiding implementation and reporting of these models (Meteyard & Davies, 2020). Many analytic decisions must be made when using this statistical technique. Consider, for example, the number of options available to the researcher if a model fails to converge. This results in a massive number of "forking paths" (Gelman & Loken, 2014) that the researcher may embark upon to obtain statistically significant results. Given the considerable number of choices a researcher may make during data analysis (i.e., researcher degrees of freedom; Simmons et al., 2011), it is important that these models be used carefully and reported transparently (see Meteyard & Davies, 2020, for an example of how models and results should be reported).

The goal of this article is to serve as an accessible, broad overview of mixed-effects modeling for researchers with minimal experience with this type of modeling. I have focused on what mixed-effects models are, what they offer over other analytic techniques, and how to implement them in R. Table 3 lists helpful links, as well as additional resources for readers interested in more in-depth descriptions of particular topics.

Table 3. Helpful Links and Additional Resources

Helpful links
R (R Core Team, 2020) for Macs: https://cran.r-project.org/bin/macosx/
R (R Core Team, 2020) for Windows: https://cran.r-project.org/bin/windows/base/
RStudio (RStudio Team, 2020): https://www.rstudio.com/products/rstudio/download/
swirl package for learning R in R (Kross et al., 2020): https://swirlstats.com/
Wickham and Grolemund's (2017) R for Data Science book: https://r4ds.had.co.nz/index.html
Franke and Roettger's (2019) brms tutorial: https://psyarxiv.com/cdxv3

Additional resources
An excellent introduction to linear models and mixed-effects modeling for individuals with limited statistical experience: Winter (2013)
An introduction to analyzing eye-tracking data with mixed-effects modeling: Barr (2008)
An argument in favor of utilizing the maximal random-effects structure justified by the design (within reason): Barr et al. (2013)
An argument in favor of using parsimonious mixed models: Bates, Kliegl, et al. (2015)
A model-selection approach to selecting random-effects structures: Matuschek et al. (2017)
Descriptions of mixed models with crossed random effects for participants and items: Baayen et al. (2008); Quené and van den Bergh (2008)
Overviews of design types and statistical power for analyzing data with mixed-effects models: Judd et al. (2017); Westfall et al. (2014)
Description of logit mixed models: Jaeger (2008)
Descriptions of how to extend mixed-effects modeling to growth-curve analysis: Mirman et al. (2008); Mirman (2014)
Introduction to modeling other nonlinear effects (e.g., linguistic change) and implementing general additive modeling: Winter and Wieling (2016)
Transparency

Action Editor: Mijke Rhemtulla
Editor: Daniel J. Simons
Author Contributions
V. A. Brown is the sole author of this article and is responsible for its content. She devised the idea for the article, wrote the article in its entirety, wrote the accompanying R script, and generated the dummy data on which the models are based.
Declaration of Conflicting Interests
The author(s) declared that there were no conflicts of interest with respect to the authorship or the publication of this article.
Funding
This work was supported by the National Science Foundation through a Graduate Research Fellowship awarded to V. A. Brown (DGE-1745038).
Open Practices
Open Data: https://osf.io/v6qag/
Open Materials: not applicable
Preregistration: not applicable
All data have been made publicly available via OSF and can be accessed at https://osf.io/v6qag/. This article has received the badge for Open Data. More information about the Open Practices badges can be found at http://www.psychologicalscience.org/publications/badges.

ORCID iD

Violet A. Brown https://orcid.org/0000-0001-5310-6499

Acknowledgments

I am grateful to Michael Strube for providing detailed feedback on an earlier draft of the manuscript and to Julia Strand and Kristin Van Engen for providing helpful comments and suggestions throughout the writing process.

Notes

1. It is important to note that the examples in this Tutorial concern crossed rather than nested random effects (mixed-effects models with nested random-effects structures are typically referred to as hierarchical linear models). Random effects (defined later) of participants and items are considered crossed when every participant responds to every item and nested when every participant responds to a different set of items. The classic example of nested random effects comes from education research, in which students are nested within classes, which in turn are nested within schools (see Raudenbush, 1988). The motivation for using mixed modeling applies to both design types, but the examples and R code I provide assume a crossed design (see Baayen et al., 2008; Judd et al., 2017; Quené & van den Bergh, 2008; Westfall et al., 2014, for more on the distinction between crossed and nested designs).
2. Note that this is not literally how parameters in mixed-effects models are estimated. Those details are beyond the scope of this Tutorial, and this simplified description is provided to help you conceptualize what mixed models are doing behind the scenes. See Snijders and Bosker (2012) for more detail.
3. Note that including by-item random slopes might be unjustified even when the conditions are not defined by stimulus-intrinsic properties. For example, if you are interested in the effect of background noise on response times to words, but different words are assigned to different conditions (each word appears in only one level of noise), it would not be justified to include by-item random slopes. It is therefore crucial to consider your experimental design before building mixed-effects models.
4. R Markdown is a file format that is easily accessed via RStudio and incorporates plain text, code, and R output.
5. The corresponding code for a nested random-effects structure in which classes are nested within schools is
> lmer(outcome ~ 1 + predictor + (1|school/class), data = data)
6. When you are creating a mixed-effects model like this one, R uses maximum likelihood estimation to compute the values of the parameters that maximize the likelihood of the data given the structure that you specify for the model (see Etz, 2018, for an approachable introduction to the concept of likelihood).
7. Response time data should really be analyzed with generalized linear mixed-effects models (discussed in the section on analyzing binomial data) assuming, for example, an inverse Gaussian distribution and an identity link function, because response times tend to be positively skewed (Lo & Andrews, 2015). For simplicity, however, we will use general linear mixed models via the lmer() function; the parameter estimates change a bit with generalized mixed modeling, but the conclusions do not change. Mixed modeling is quite robust to violations of the normality assumption, so it is acceptable to use general mixed models here.
8. Note that convergence issues are far less common in Bayesian mixed models than in frequentist mixed models (lme4 falls into the frequentist category), so if you find yourself struggling with convergence issues, you might consider switching to a Bayesian framework. For individuals who are comfortable using lme4, this switch is made easy by the brms package (Bürkner, 2017), because this package uses lme4 formula syntax but Bayesian statistics behind the scenes. This Tutorial uses lme4 only, but interested readers may want to refer to Franke and Roettger's (2019) helpful tutorial on how to use brms (https://psyarxiv.com/cdxv3).
9. A situation in which you may not be willing to assume the correlation is zero is when that correlation is a crucial part of your research question. For example, if your research question addresses whether people with slower overall response times tend to be less affected by modality, then it would be critical to allow the model to estimate the correlation between the random effects. However, a typical study in experimental psychology is more interested in the fixed-effects parameter estimates, so assuming the correlation is zero is often acceptable. The code to examine the confidence intervals around standard deviation and correlation estimates is confint(rt_full.mod, parm = "theta_", oldNames = F). The parm argument indicates which parameters in the model will be given confidence intervals, and setting the oldNames argument to FALSE (F) simply gives the output more interpretable names.
10. If you examine the summary output for a mixed-effects model, you may notice that the lmer() function does not include p values. This is because the null distribution is unknown (the error structure in multilevel models is complex, and the degrees of freedom cannot be calculated). Bates, one of the creators of the lme4 package and the person who wrote the lmer() function, has posted a helpful description of why he did not include p values in that function (see Bates, 2006). You can obtain p values by loading the lmerTest package (Kuznetsova et al., 2017), but I recommend using likelihood-ratio tests instead.
11. Note that if you are testing only random effects, you should include the argument refit = FALSE in the anova() command. This is because lme4 automatically refits the models using maximum likelihood (ML) estimation when you conduct a likelihood-ratio test via the anova() command, but this refitting is necessary only when testing fixed effects. This default is in place to make it difficult to mistakenly test fixed effects on models built using the default estimation procedure in lme4 (restricted maximum likelihood estimation, or REML), as this method is not appropriate for comparing models differing in fixed effects. However, you should override the default by including refit = FALSE if you are testing random effects.
12. The experiment on which these data are based also included an SNR manipulation whereby each word in the data set occurred in both audio-only and audiovisual conditions at both a very easy and a moderate SNR. I have ignored the SNR variable for simplicity, but the relatively easy SNRs in which the words were presented may explain why accuracy was high for all participants.
References

Baayen, R. H. (2010). A real experiment is a factorial experiment. The Mental Lexicon, 5(1), 149–157.
Baayen, R. H., Davidson, D. J., & Bates, D. M. (2008). Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language, 59(4), 390–412.
Barr, D. J. (2008). Analyzing "visual world" eyetracking data using multilevel logistic regression. Journal of Memory and Language, 59(4), 457–474.
Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3), 255–278. https://doi.org/10.1016/j.jml.2012.11.001
Bates, D. (2006). [R] lmer, p-values, and all that. R-help. https://stat.ethz.ch/pipermail/r-help/2006-May/094765.html
Bates, D., Kliegl, R., Vasishth, S., & Baayen, H. (2015). Parsimonious mixed models. arXiv. http://arxiv.org/abs/1506.04967
Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1). https://doi.org/10.18637/jss.v067.i01
Bates, D., Maechler, M., Bolker, B., Walker, S., Christensen, R., Singmann, H., Dai, B., Scheipl, F., Grothendieck, G., & Green, P. (2020). Package 'lme4' (Version 1.1-26) [Computer software]. Comprehensive R Archive Network. https://cran.r-project.org/web/packages/lme4/lme4.pdf
Bolker, B. (2020). GLMM FAQ. GitHub. http://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#should-i-treat-factor-xxx-as-fixed-or-random
Brown, V. A., & Strand, J. F. (2019). About face: Seeing the talker improves spoken word recognition but increases listening effort. Journal of Cognition, 2(1), Article 44. https://doi.org/10.5334/joc.89
Bürkner, P.-C. (2017). brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software, 80(1). https://doi.org/10.18637/jss.v080.i01
Enders, C. K. (2010). Applied missing data analysis. Guilford Press.
Erber, N. P. (1972). Auditory, visual, and auditory-visual recognition of consonants by children with normal and impaired hearing. Journal of Speech and Hearing Research, 15(2), 413–422.
Etz, A. (2018). Introduction to the concept of likelihood and its applications. Advances in Methods and Practices in Psychological Science, 1(1), 60–69.
Franke, M., & Roettger, T. (2019). Bayesian regression modeling (for factorial designs): A tutorial. PsyArXiv. https://doi.org/10.31234/osf.io/cdxv3
Gelman, A., & Loken, E. (2014). The statistical crisis in science: Data-dependent analysis—a "garden of forking paths"—explains why many statistically significant comparisons don't hold up. American Scientist, 102(6), 460–466.
Idemaru, K., Winter, B., Brown, L., & Oh, G. E. (2020). Loudness trumps pitch in politeness judgments: Evidence from Korean deferential speech. Language and Speech, 63(1), 123–148.
Jaeger, T. F. (2008). Categorical data analysis: Away from ANOVAs (transformation or not) and towards logit mixed models. Journal of Memory and Language, 59(4), 434–446.
John, L. K., Loewenstein, G., & Prelec, D. (2012). Measuring the prevalence of questionable research practices with incentives for truth telling. Psychological Science, 23(5), 524–532.
Judd, C. M., Westfall, J., & Kenny, D. A. (2017). Experiments with more than one random factor: Designs, analytic models, and statistical power. Annual Review of Psychology, 68, 601–625.
Kerr, N. L. (1998). HARKing: Hypothesizing after the results are known. Personality and Social Psychology Review, 2(3), 196–217.
Kross, S., Carchedi, N., Bauer, B., Grdina, G., Schouwenaars, F., & Wu, W. (2020). Package 'swirl' (Version 2.4.5) [Computer software]. Comprehensive R Archive Network. https://cran.r-project.org/web/packages/swirl/swirl.pdf
Kuznetsova, A., Brockhoff, P. B., & Christensen, R. H. B. (2017). lmerTest package: Tests in linear mixed effects models. Journal of Statistical Software, 82(13). https://doi.org/10.18637/jss.v082.i13
Lee, S., & Ditko, S. (1962). Amazing Fantasy, 1(15) [Comic book]. Marvel.
Liben-Nowell, D., Strand, J., Sharp, A., Wexler, T., & Woods, K. (2019). The danger of testing by selecting controlled subsets, with applications to spoken-word recognition. Journal of Cognition, 2(1), Article 2. https://doi.org/10.5334/joc.51
Lo, S., & Andrews, S. (2015). To transform or not to transform: Using generalized linear mixed models to analyse reaction time data. Frontiers in Psychology, 6, Article 1171. https://doi.org/10.3389/fpsyg.2015.01171
Matuschek, H., Kliegl, R., Vasishth, S., Baayen, H., & Bates, D. (2017). Balancing Type I error and power in linear mixed models. Journal of Memory and Language, 94, 305–315.
Meteyard, L., & Davies, R. A. I. (2020). Best practice guidance for linear mixed-effects models in psychological science. Journal of Memory and Language, 112, Article 104092. https://doi.org/10.1016/j.jml.2020.104092
Mirman, D. (2014). Growth curve analysis and visualization using R. Chapman and Hall.
Mirman, D., Dixon, J. A., & Magnuson, J. S. (2008). Statistical and computational models of the visual world paradigm: Growth curves and individual differences. Journal of Memory and Language, 59(4), 475–494.
Quené, H., & van den Bergh, H. (2008). Examples of mixed-effects modeling with crossed random effects and with binomial data. Journal of Memory and Language, 59(4), 413–425.
Raudenbush, S. W. (1988). Educational applications of hierarchical linear models: A review. Journal of Educational and Behavioral Statistics, 13(2), 85–116.
Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (2nd ed.). SAGE Publications.
R Core Team. (2020). R: A language and environment for statistical computing [Computer software]. R Foundation for Statistical Computing. http://www.R-project.org/
Royston, P., Altman, D. G., & Sauerbrei, W. (2005). Dichotomizing continuous predictors in multiple regression: A bad idea. Statistics in Medicine, 25(1), 127–141.
RStudio Team. (2020). RStudio: Integrated development environment for R [Computer software]. RStudio, PBC. http://www.rstudio.com/
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22(11), 1359–1366.
Singmann, H., Bolker, B., Westfall, J., & Aust, F. (2020). afex: Analysis of factorial experiments (Version 0.27.2) [Computer software]. GitHub. https://github.com/singmann/afex
Snijders, T. A. B., & Bosker, R. J. (2012). Multilevel analysis: An introduction to basic and advanced multilevel modeling (2nd ed.). SAGE Publications.
Wendorf, C. A. (2004). Primer on multiple regression coding: Common forms and the additional case of repeated contrasts. Understanding Statistics, 3(1), 47–57.
Westfall, J., Kenny, D. A., & Judd, C. M. (2014). Statistical power and optimal design in experiments in which samples of participants respond to samples of stimuli. Journal of Experimental Psychology: General, 143(5), 2020–2045.
Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., . . . Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), Article 1686. https://doi.org/10.21105/joss.01686
Wickham, H., & Grolemund, G. (2017). R for data science: Import, tidy, transform, visualize, and model data (1st ed.). O'Reilly Media.
Winter, B. (2013). Linear models and linear mixed effects models in R with linguistic applications. arXiv. http://arxiv.org/abs/1308.5499
Winter, B. (2019). Statistics for linguists: An introduction using R. Taylor & Francis.
Winter, B., & Wieling, M. (2016). How to analyze linguistic change using mixed models, growth curve analysis and generalized additive modeling. Journal of Language Evolution, 1(1), 7–18.
