An Introduction To Repeatability Estimation With RPTR
An Introduction To Repeatability Estimation With RPTR
with rptR
Martin A. Stoffel, Shinichi Nakagawa & Holger Schielzeth
2018-01-04
A very powerful tool for estimating repeatabilities is the mixed effects model framework.
Random-effect predictors are used to estimate variances at different hierarchical levels. In the
simplest case of a design where repeated measures (e.g. quantifications of behavior) were
taken from the same features (e.g. individuals), the repeatability R
is calculated as the variance among group means (group-level variance VG) over the sum of
group-level and data-level (residual) variance VR:
R=VG/(VG+VR)
and VR from standard software packages, it is considerably more difficult to quantify the
uncertainty in estimated R values. The rptR package offers easy-to-use functions for
estimating R and its uncertainty. The package includes estimates of the repeatability (and also
raw variances and marginal R2
; sensu Nakagawa and Schielzeth 2013) for Gaussian traits, binomial and Poisson-distributed
traits. The theory for for the quantification of the repeatability for non-Gaussian traits is
reviewed in Nakagawa & Schielzeth (2010).
The functions build heavily on functions for fitting mixed models, in particular on lmer from
the lme4 package (Bates et al. 2015). Mixed-effect models are fitted within the function call
and fitted model objects are returned as part of the class rpt return object. Uncertainty is
quantified via parametric bootstrapping. In a nutshell, parametric bootstrapping works by (i)
fitting the model, (ii) simulating new data based on the estimators of the original model while
adhering to the original sampling design, (iii) refitting the model to the simulated data and
storage of the repeatability of the model fit. Repeating steps (ii) to (iii) a large number of
times (e.g. 1000 times) yields the sampling variance give the data structure under the
assumption that the model is valid.
Basic usage
As usual, the CRAN-version of the package can be installed via a call to:
install.packages("rptR")
install.packages("devtools")
devtools::install_github("mastoffel/rptR", build_vignettes = TRUE)
library(rptR)
citation("rptR")
The central wrapper function for estimating the repeatability is rpt. The argument datatype
decides whether the call is forwarded to the specialized functions rptGaussian for Gaussian
traits, rptBinary for binary (0/1) outcomes, rptProportion for proportion data represented
as joint vectors (or matrix) of successes and failures and rptPoisson for Poisson-distributed
count data. The functions return S3 class rpt objects that can be viewed via the str function.
The generic functions summary, print and plot are available for a more convenient display
of the results. In the following we walk through the features of rptR starting with Gaussian
models for introducing the basic plots and outputs, before addressing the specifics of count
and binary trait analyses.
A toy dataset
As an example dataset we will use a simulated toy dataset that was introduced for a different
purpose (Nakagawa & Schielzeth 2013). It offers a balanced dataset with rather simple
structure, sizable effects and decent sample size, just right to demonstrate some features of
rptR. Sufficient sample size is required in particular for the non-Gaussian traits, because
those tend to be more computationally demanding and less rich in information per data point
than simple Gaussian traits.
In brief, the imaginary sampling design of the simulated dataset is as follows. Beetle larvae
were sampled from 12 populations (Population) with samples taken from two discrete
microhabitats at each location (Habitat). Samples were split in equal proportions and raised
in two dietary treatments (Treatment). Beetles were sexed at the pupal stage (Sex) and pupae
were kept in single-sex containers (Container). Phenotypes (BodyL, Egg and Color) were
measured at the imaginal stage and are introduced below in the separate sections.
Repeatability for Gaussian data
As an example for Gaussian data, we will analyse body length (BodyL). To start with, we will
analyse the repeatability at the level of the population that might arise through genetic
differentiation and/or early environmental influences prior to sampling.
data(BeetlesBody)
str(BeetlesBody)
## 'data.frame': 960 obs. of 6 variables:
## $ Population: int 1 1 1 1 1 1 1 1 1 1 ...
## $ Container : int 1 1 1 1 1 1 1 1 2 2 ...
## $ Sex : Factor w/ 2 levels "Female","Male": 1 1 1 1 1 1 1 1 2 2
...
## $ Habitat : Factor w/ 2 levels "A","B": 1 1 1 1 2 2 2 2 1 1 ...
## $ Treatment : Factor w/ 2 levels "Cont","Exp": 1 2 1 2 1 2 1 2 1 2 ...
## $ BodyL : num 10.8 11.2 11.6 15.6 12.4 ...
hist(BeetlesBody$BodyL)
table(BeetlesBody$Population)
##
## 1 2 3 4 5 6 7 8 9 10 11 12
## 80 80 80 80 80 80 80 80 80 80 80 80
We analyse the data using the rpt function with the argument datatype = "Gaussian". It is
possible to call rptGaussian directly, while omitting the datatype argument. As parametric
bootstrapping and permutation can be slow, we recommend starting with shutting
bootstrapping and permutation off by setting nboot = 0 and npermut = 0. Note that the
formula argument uses the same syntax as lmer from the lme4 packages, because the
formula-call is directly forwarded to lmer. Also note that grname has to be specified with
quotation marks and that the term has to match exactly a random effect as it is used in the
formula argument. The data argument is mandatory and is specified without quotes.
Since this works fine and we see sizable repeatability, we increase the number of parametric
bootstraps to 1000, in order to evaluate the uncertainty in the repeatability estimate. The
increase in the number of parametric bootstraps will not affect the point estimates, but interval
estimates will be affected. At the same time, the permutation test can be kept switched off.
The results are stored in an object named rep1.
rep1 <- rpt(BodyL ~ (1 | Population), grname = "Population", data =
BeetlesBody,
datatype = "Gaussian", nboot = 1000, npermut = 0)
It is also possible to increase the number of bootstrap iterations (and also of permutations, see
below) by calling the function again with the update = TRUE and rptObj accepting the rpt
object from the previous function call. Note that all other arguments (except for nboot and
npermut) have to be identical to the original function call. We here add an additional 500
bootstraps to the original 1000 bootstraps.
The rptR package offers three generic function for convenient display. The print function
displays the essence of the repeatability estimation: the point estimate R, standard error SE,
confidence interval CI and P
values. The summary function is more verbose, summarizing parts of the data structure,
distributional summaries of the bootstrap and permutation samples and the full details on
likelihood ratio tests. Finally, the plot function shows the distribution of the parametric
bootstrap samples along with the point estimate and the limits of the confidence interval. The
plot can be customized by specifying additional graphical parameters as with the cex.main
argument below (see ?par for additional arguments).
print(rep1)
##
##
## Repeatability estimation using the lmm method
##
## Repeatability for Population
## R = 0.299
## SE = 0.089
## CI = [0.116, 0.466]
## P = 9.59e-62 [LRT]
## NA [Permutation]
summary(rep1)
##
## Repeatability estimation using the lmm method
##
## Call = rpt(formula = BodyL ~ (1 | Population), grname = "Population",
data = BeetlesBody, datatype = "Gaussian", nboot = 500, npermut = 0, rptObj
= rep1, update = TRUE)
##
## Data: 960 observations
## ----------------------------------------
##
## Population (12 groups)
##
## Repeatability estimation overview:
## R SE 2.5% 97.5% P_permut LRT_P
## 0.299 0.0895 0.116 0.466 NA 0
##
## Bootstrapping and Permutation test:
## N Mean Median 2.5% 97.5%
## boot 1500 0.287 0.285 0.116 0.466
## permut 2 NA NA 0.000 0.000
##
## Likelihood ratio test:
## logLik full model = -1946.634
## logLik red. model = -2083.405
## D = 274, df = 1, P = 9.59e-62
##
## ----------------------------------------
plot(rep1, cex.main = 1)
Parametric bootstrapping and permutations are inherently repetitive procedures that can be
computed on multiple cores in parallel. The rptR package has build-in options for using
multiple cores by setting parallel = TRUE. The default ncores = NULL determines the
number of cores on your machine and uses all but one core. It is also possible to specify the
number of cores to be used via the ncores argument.
The statistical significance of the repeatability is tested by likelihood ratio tests (LRT),
comparing the model fit of a model including the grouping factor of interest and one
excluding it (i.e., constraining it group-level variance to zero). Because the LRT are
conducted at the boundary of possible parameter space (against zero group-level variance),
the difference in log-likelihoods is assumed to follow a mixture distribution of a χ2
values based on LRT are part of the standard print output and the full details can be see by a
call to summary (see above). In this case the variance among populations is highly significant
and so is the repeatability at this level.
Permutation can be used as a robust alternative or addition to LRT. In the simplest case this
will work by randomizing the vector of group identities against the response vector followed
by refitting the model to the randomized data and storage of the repeatability for each of many
replications. However, in more complex models involving multiple random effects and/or
fixed effects, this will also break the data structure between the grouping factor of interest and
other aspects of the experimental design. We therefore construct randomized datasets by
fitting a model without the grouping factor of interest and then add the permutated residuals to
the fitted values of this model. This maintains the general data structure and any effects of
other design aspects on the response while still breaking the link between grouping factor and
the response.
It is not unusual to see warnings about convergence problems, because the data are simulated
with zero group-level variance and this might occasionally produce odd data. The results of
the permutation tests are shown as part of print and summary, and can be visualized using the
type argument in the plot function. The results are in general agreement with the LRT in this
case, but note that the resolution of the permutation tests is limited by the number of
iterations. Since the empirical estimate is always included as one permutation sample
(following the reasoning that even under the null hypothesis, the observed data are a possible,
albeit possibly unlikely, outcome, Manly 2007), the resolution is limited to 1/N
were N
is the number of permutations. It is typical to see the spike at zero and an extended right tail if
variances are low.
The full return object can be viewed by the str functions, which shows, for example, that
R_boot holds all parametric bootstrapping samples, R_permut all permutation samples (the
first one of which is the point estimate), all_warnings hold all warning messages and mod
holds the model fit as an lmerMod as returned by lmer.
str(rep2)
# Output omitted
summary(rep2$mod)
## Linear mixed model fit by REML ['lmerMod']
## Formula: BodyL ~ (1 | Population)
## Data: data
##
## REML criterion at convergence: 3893.3
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.2950 -0.7770 0.0186 0.7867 2.6107
##
## Random effects:
## Groups Name Variance Std.Dev.
## Population (Intercept) 1.377 1.173
## Residual 3.235 1.798
## Number of obs: 960, groups: Population, 12
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 14.0827 0.3436 40.98
The print and summary functions will show results for both grouping factors, while the
action of the plot function needs to be controlled by the grname argument. The results show
substantial repeatable variation among containers with still sizable (albeit slightly reduced)
variation among populations.
In our example, containers are nested within populations in the sense that samples from the
same population are spread over multiple containers, but each container houses individuals
from only a single population. In principle it might have been possible to apply a crossed
sampling design with samples from the same population spread across multiple containers
while each container houses samples from multiple populations. Crossed designs are usually
preferred, but their appliation might be constraint like in the case of beetles where it is not
possible to follow individuals (and consequently population identities) across the pupal stage,
because all external markings are lost.
Nested sampling has consequences for how variances are interpreted. In our example, the
population level repeatability captures all variation among populations (arising for genetic and
environmental reasons) over all other variance components. But the container-level variance
with population included in the model captures only among-container variance after
controlling for variation among populations and is hence foremost environmental in origin.
The population x container interaction variance, however, will appear as part of the container
repeatability. The interaction variance might, in this case, arise from genotype-environment
interactions or early environmental x late environment interactions (see Schielzeth &
Nakagawa 2013 for a more detailed discussion of nesting).
One might argue that the full container variance should contain the population variance. It is
here possible to estimate the repeatability at the container-level without controlling for
population and the results turn out to be almost equivalent to adding the population-level
variance to the container-level variance.
Estimating adjusted repeatability is facilitated in rprR by the formula argument. Fixed effect
covariates can be included as in any lmer formula. We will here include Treatment and Sex
as covariates. Note that the treatment was artificially applied, hence it potentially introduces
excess variation to the sample that we might want to control for statistically. Sex is a more
controversial example, because it represents part of the natural variation, but often we want to
know the (adjusted) repeatabilities given that the sexual identities of the individuals are
known (and assuming that repeatabilities are identical across the sexes).
The results can be viewed and plotted as before and the significance of the fixed effect can be
assessed from the fitted model.
print(rep5)
##
##
## Repeatability estimation using the lmm method
##
## Repeatability for Container
## R = 0.083
## SE = 0.027
## CI = [0.043, 0.145]
## P = 1.34e-13 [LRT]
## NA [Permutation]
##
## Repeatability for Population
## R = 0.491
## SE = 0.113
## CI = [0.239, 0.676]
## P = 3.19e-31 [LRT]
## NA [Permutation]
plot(rep5, type = "boot", grname = "Container", cex.main = 0.8)
plot(rep5, type = "boot", grname = "Population", cex.main = 0.8)
It turns out that most of the variance among containers was caused by sexual dimorphism
(females are larger, as can be seen from the slope of SexMale in the fitted model rep5$mod),
albeit some significant container repeatability remains even after controlling for sexual
dimorphism.
of mixed effects models (Nakagawa & Schielzeth 2013). rptR provides a quantification of
uncertainty for the variance explained by fixed effects by parametric bootstrapping as for
other variance components. However, we have not implemented significance testing for the
fixed effect variance. This should be quantified by other means.
The variances explained by container and population are evidently unchanged, but it is now
visible that the fixed effects alone explain almost 40% of the phenotypic variance in body
length in our example. A big part of this is simply sexual dimorphism. This serves as an
example, because in this particular case, we would probably be more interested in the adjusted
repeatability controlling for Treatment and Sex.
While in a Gaussian model, there are different parameters for the location and the spread of
the distribution (mean and variance), this is not the case in non-Gaussian models. The Poisson
distribution, for example, has an inherent variance that is equal to the mean. This inherent
variance needs to be respected when quantifying repeatabilities for non-Gaussian data. We
have reviewed the literature on non-Gaussian link-scale variances (Nakagawa & Schielzeth
2010) and rptR is based on the formula presented therein. It turns out that the distribution-
specific variance for a Poisson model with log link is approximated by ln(1/λ+1)
where λ is the average count. This approximation however becomes increasingly inaccurate at
low λ
In order to account for additive overdisperison in Poisson and other non-Gaussian models
(except for binary data), the current implementation of rptR internally adds an observation
level random effect to the model specified in formula. An observation level random effect is
an additional random effect term (named Overdispersion) with as many levels as there are
observations to the specified model. The additive overdispersion explained by the new
Overdispersion term is added to the distribution-specific variance in the denominator of the
repeatability. See als section ‘Multiplicative and additive overdispersion models’ towards the
end of the vignette.
As an example, we will analyse the number of eggs laid by female beetles (Eggs) in our toy
dataset. We will estimate the repeatability at the level of the population as well as at the level
of the container simultaneously while controlling for excess variation introduced by the
experimental nutritional treatment. There is no need to control for sex-differences in this sex-
limited trait.
data(BeetlesFemale)
hist(BeetlesFemale$Egg, nclass = max(BeetlesFemale$Egg))
We will estimate the repeatability through the rpt function using the datatype =
"Poisson" argument. Alternatively, it would be possible to fit the model directly using the
rptPoisson function. The formula syntax follows the rules used in glmer from the lme4
package, which is also the central engine for fitting the mixed effects model.
As usual, the output can be viewed and displayed by print, summary and plot. The results
show substantial repeatability at the level of the population (due to genetic differentiation or
lasting environmental influence prior to sampling), but very little repeatability at the level of
the container (indicating limited shared environmental effects).
print(rep8)
##
##
## Repeatability estimation using the glmm method and log link
##
## Repeatability for Container
## --------------------------------
## Link-scale approximation:
## R = 0.038
## SE = 0.025
## CI = [0, 0.1]
## P = 0.00874 [LRT]
## NA [Permutation]
##
## Original-scale approximation:
## R = 0.032
## SE = 0.023
## CI = [0, 0.089]
## P = 0.00874 [LRT]
## NA [Permutation]
##
## Repeatability for Population
## --------------------------------
## Link-scale approximation:
## R = 0.52
## SE = 0.122
## CI = [0.234, 0.689]
## P = 1.19e-16 [LRT]
## NA [Permutation]
##
## Original-scale approximation:
## R = 0.504
## SE = 0.121
## CI = [0.223, 0.68]
## P = 1.19e-16 [LRT]
## NA [Permutation]
There are two repeatabilities for Poisson GLMM, derived by a latent scale approximation and
a original scale approximation. We have previously called the two estimates the latent scale
repeatability and the original scale repeatability. However, they are both estimates of the same
quantities, not entirely different repeatabilites, hence we now prefer to call them the latent
and orginal scale approximations. In the case of Poisson GLMM, original scale
approximations are actually the exact solutions, while link scale approximations are indeed
approximate. However, in our experience the two typically give very similar results.
). In this case the treatment effect explained less than 10% of the variance and including this
fraction to the denomionator of the repeatability calculation gives marginally reduced
repeatability estimates for population and container:
that represents the expected value, i.e. the global mean on the data scale. We have previously
given different advice for estimating λ
, including the inverse-link of the intercept on the latent scale. While this gives a reasonable
approximation in many cases, the estimate will be biased for small expected values (smaller
than about 2). A preferred alternative is to use the mean of the observations in the sample as
we have also suggested previously. The is now the default option in rptR that is determined
by the expect argument with its default expect = "meanobs".
There is also an analytical alternative that takes the intercept and variances as estimated on the
latent scale. The expected value is E[y]=exp(β0+vartot/2)
represents the sum of the variances, including variance caused by fixed effects, on the latent
scale. While this is generally a preferable solution, the estimates are susceptible to the
distribution of covariates and, in most cases, it gives appropriate results only if all covariates
are centered. Since centering is not enforced within rptR, we have made this an opt-in option
that can be used by setting expect = "latent". The repeatabilities estimated by the
formulae presented in Nakagawa & Schielzeth (2010) can be requested by expect =
"liability".
Males in our toy example occur in two distinct color morphs: dark and reddish-brown. We
will analyze the repeatability of color morph identities (Colour) as an example for the
repeatability of binary data. This is a test for spatial heterogeneity in morph ratios.
data(BeetlesMale)
table(c("dark", "reddish")[BeetlesMale$Colour + 1])
##
## dark reddish
## 275 205
We estimate the repeatability through the wrapper function rpt at the level of population and
at the level of containers. As with Poisson models it frequently happens that some of the
bootstrap or permutation iterations do not converge on a model fit and that glmer throws
warnings that are collected in the all_warnings element of the rpt object.
As with Poisson GLMM, there are two approximations for the repeatability that we call the
latent scale approximation and original scale approximation and in the case of Binomial
GLMM, the two are both approximations of the same quantity and in our experience the two
typically give very similar results.
It turns out that there is significant repeatability at the level of populations, but not on the
level of containers (at least not beyond variance among populations). It frequently happens
that small variances are estimated as zero, because small variances are difficult to estimate
with any level of confidence. For a check of robustness, we still estimate the enhanced
agreement repeatabilities controlling for the treatment manipulation, but it turns out that this
change to the model does hardly affect the repeatability estimates due to the low explanatory
power of the treatment.
We have previously adviced to estimate the distribution specific variance for the link scale
approximation as π2/3
(Nakagawa & Schielzeth 2010). While this gives a reasonable approximation, it is preferable
to estimate the link scale variance as 1/(p(1−p)) where p is the expected probability of
success (Nakagawa & Schielzeth 2016). The current version of rptR implements the more
accurate approximation 1/(p(1−p))
As with Poisson GLMM, we have implemented two options for estimating the expectation for
p
. The default version with expect = "meanobs" takes the average succes rate across all
observations. There is also a latent scale approximation (details are presented in Nakagawa &
Schielzeth 2016) that can be used upon choice by setting expect = "latent". As with
Poisson GLMM, we consider the analytical solution via expect = "latent" more accurate
and putitatively more robust to unbalanced sampling, but expect = "meanobs" seems safer,
because it does not depend on how covariates are coded.
Repeatabilities for Proportion data
Counts of successes and failures are conveniently modelled as proportion data. They are
handed over to rpt (and ultimately to glmer) as combined vectors of integers. We will
illustrate the repeatability analysis of proportion data by aggregating color morph identities
within containers. The analysis is equivalent to the analysis of the data as binary data with
Container treated as a random effect as it was done above (model rep10).
BeetlesMale$Dark = BeetlesMale$Colour
BeetlesMale$Reddish = (BeetlesMale$Colour - 1) * -1
BeetlesMaleAggr <- aggregate(cbind(Dark, Reddish) ~ Population + Container,
data = BeetlesMale, FUN = sum)
rep12 <- rpt(cbind(Dark, Reddish) ~ (1 | Population), grname =
c("Population"),
data = BeetlesMaleAggr, datatype = "Proportion", nboot = 1000, npermut
= 0)
print(rep12)
print(rep12)
##
##
## Repeatability estimation using the glmm method and logit link
##
## Repeatability for Population
## --------------------------------
## Link-scale approximation:
## R = 0.188
## SE = 0.072
## CI = [0.045, 0.318]
## P = 5.81e-09 [LRT]
## NA [Permutation]
##
## Original-scale approximation:
## R = 0.185
## SE = 0.07
## CI = [0.045, 0.308]
## P = 5.81e-09 [LRT]
## NA [Permutation]
is thus equivalent to no overdispersion beyond what is expected from the distribution. See
Nakagawa & Schielzeth (2010) for details on additive and multiplicative overdispersion.
The choice of additive versus multiplicative overdispersion is a matter of convenience. The
data for the toy example was simulated as additive overdispersion and it happens that the
powerful glmer function implements additive overdispersion. We have previously used
glmmPQL for fitting multiplicative overdispersion models, but do not maintain this option any
longer.
Link functions
In our examples we have used the default link functions of Poisson models (log link) and
binary/proportion data (logit link). Alternative link functions are possible, as long as the
distribution-specific variances are known for the distribution-link family (see Nakagawa &
Schielzeth 2010 for an overview). Beyond what has been demonstrated above, rptR supports
the sqrt link for Poisson and the probit link for binary/proportion data. In the above
examples, the differences between alternative link functions are rather minor (but note that the
original scale approximations are not available for Possion data with sqrt link and for
binary/proportion models with probit link).
Poisson models
print(rep8) # log link
##
##
## Repeatability estimation using the glmm method and log link
##
## Repeatability for Container
## --------------------------------
## Link-scale approximation:
## R = 0.038
## SE = 0.025
## CI = [0, 0.1]
## P = 0.00874 [LRT]
## NA [Permutation]
##
## Original-scale approximation:
## R = 0.032
## SE = 0.023
## CI = [0, 0.089]
## P = 0.00874 [LRT]
## NA [Permutation]
##
## Repeatability for Population
## --------------------------------
## Link-scale approximation:
## R = 0.52
## SE = 0.122
## CI = [0.234, 0.689]
## P = 1.19e-16 [LRT]
## NA [Permutation]
##
## Original-scale approximation:
## R = 0.504
## SE = 0.121
## CI = [0.223, 0.68]
## P = 1.19e-16 [LRT]
## NA [Permutation]
rep8a <- rpt(Egg ~ Treatment + (1 | Container) + (1 | Population), grname =
c("Container",
"Population"), data = BeetlesFemale, datatype = "Poisson", link =
"sqrt",
nboot = 1000, npermut = 0)
print(rep8a) # sqrt link
##
##
## Repeatability estimation using the glmm method and sqrt link
##
## Repeatability for Container
## --------------------------------
## Link-scale approximation:
## R = 0.039
## SE = 0.024
## CI = [0.001, 0.097]
## P = 0.00771 [LRT]
## NA [Permutation]
##
## Original-scale approximation:
## R = NA
## SE = NA
## CI = [NA, NA]
## P = 0.00771 [LRT]
## NA [Permutation]
##
## Repeatability for Population
## --------------------------------
## Link-scale approximation:
## R = 0.477
## SE = 0.107
## CI = [0.219, 0.627]
## P = 2.07e-16 [LRT]
## NA [Permutation]
##
## Original-scale approximation:
## R = NA
## SE = NA
## CI = [NA, NA]
## P = 2.07e-16 [LRT]
## NA [Permutation]
Binary models
print(rep10) # logit link
##
##
## Repeatability estimation using the glmm method and logit link
##
## Repeatability for Container
## --------------------------------
## Link-scale approximation:
## R = 0
## SE = 0.014
## CI = [0, 0.051]
## P = 1 [LRT]
## NA [Permutation]
##
## Original-scale approximation:
## R = 0
## SE = 0.014
## CI = [0, 0.051]
## P = 1 [LRT]
## NA [Permutation]
##
## Repeatability for Population
## --------------------------------
## Link-scale approximation:
## R = 0.188
## SE = 0.073
## CI = [0.042, 0.33]
## P = 5.81e-09 [LRT]
## NA [Permutation]
##
## Original-scale approximation:
## R = 0.185
## SE = 0.071
## CI = [0.042, 0.322]
## P = 5.81e-09 [LRT]
## NA [Permutation]
rep10a <- rpt(Colour ~ (1 | Container) + (1 | Population), grname =
c("Container",
"Population"), data = BeetlesMale, datatype = "Binary", link =
"probit",
nboot = 1000, npermut = 0)
print(rep10a) # probit link
##
##
## Repeatability estimation using the glmm method and probit link
##
## Repeatability for Container
## --------------------------------
## Link-scale approximation:
## R = 0
## SE = 0.013
## CI = [0, 0.047]
## P = 1 [LRT]
## NA [Permutation]
##
## Original-scale approximation:
## R = NA
## SE = NA
## CI = [NA, NA]
## P = 1 [LRT]
## NA [Permutation]
##
## Repeatability for Population
## --------------------------------
## Link-scale approximation:
## R = 0.178
## SE = 0.07
## CI = [0.043, 0.316]
## P = 5.08e-09 [LRT]
## NA [Permutation]
##
## Original-scale approximation:
## R = NA
## SE = NA
## CI = [NA, NA]
## P = 5.08e-09 [LRT]
## NA [Permutation]
Proportion models
print(rep12) # logit link
##
##
## Repeatability estimation using the glmm method and logit link
##
## Repeatability for Population
## --------------------------------
## Link-scale approximation:
## R = 0.188
## SE = 0.072
## CI = [0.045, 0.318]
## P = 5.81e-09 [LRT]
## NA [Permutation]
##
## Original-scale approximation:
## R = 0.185
## SE = 0.07
## CI = [0.045, 0.308]
## P = 5.81e-09 [LRT]
## NA [Permutation]
rep12a <- rpt(cbind(Dark, Reddish) ~ (1 | Population), grname =
c("Population"),
data = BeetlesMaleAggr, datatype = "Proportion", link = "probit", nboot
= 1000,
npermut = 0)
print(rep12a) # probit link
##
##
## Repeatability estimation using the glmm method and probit link
##
## Repeatability for Population
## --------------------------------
## Link-scale approximation:
## R = 0.178
## SE = 0.07
## CI = [0.052, 0.312]
## P = 5.08e-09 [LRT]
## NA [Permutation]
##
## Original-scale approximation:
## R = NA
## SE = NA
## CI = [NA, NA]
## P = 5.08e-09 [LRT]
## NA [Permutation]
For example, if we are interested in the link scale variances of colour morph variation, we can
estimate the variances as:
rep13 <- rpt(Egg ~ Treatment + (1 | Container) + (1 | Population), grname =
c("Container",
"Population"), data = BeetlesFemale, datatype = "Poisson", nboot =
1000,
npermut = 0, ratio = FALSE)
print(rep13)
##
##
## Variance estimation using the glmm method and log link
##
## Variance of Container
## --------------------------------
## Link-scale approximation:
## Var = 0.022
## SE = 0.012
## CI = [0.001, 0.05]
## P = 0.00874 [LRT]
## NA [Permutation]
##
## Variance of Population
## --------------------------------
## Link-scale approximation:
## Var = 0.304
## SE = 0.126
## CI = [0.09, 0.605]
## P = 1.19e-16 [LRT]
## NA [Permutation]
plot(rep13, grname = "Container", scale = "link", cex.main = 0.8)
plot(rep13, grname = "Population", scale = "link", cex.main = 0.8)
Evidently, the significance of the random effect terms won’t change no matter whether they
are presented as variances or as variances devided by the phenotypic variance
(repeatabilities). If we are interested in inspecting raw variances, we are often also interested
in quantifying the amount of residual variance and its uncertainty, too. We have therefore
introduced two addition reserved terms for the grname argument (besides grname =
"Fixed") to allow this. Residual will allow estimating the residual variance, which in the
case of non-Gaussian models is the overdispersion on the latent scale plus the distribution-
specific variance. Overdispersion will allow estimating the overdispersion variance and the
latent scale, which in the case of Gaussian models (with implicit identity link) is equal to the
residual estimated by Residual.
It is necessary to specify at least one additional grouping level in the grname argument along
with Residual. The results for Container and Population are unaffected and are the same
as above (rep13). In our example there is significant overdispersion, meaning that besides
treatment effects and variation among populations and among containers within populations
there is more unexplained variation than expected from the Poisson distribution alone.
It is pointless to test for the statistical significance of residual variance against a null
hypothesis of no residual variance and we therefore omit this test. But for non-Gaussian
models it is worthwhile testing if there is statistically significant overdispersion and we have
therefore implemented the LRT test (but not the permutation test) for Overdispersion. In
gerneral, however, be aware that each grouping level specified in grname will be tested in
separate iterations if a permutation test is required. Hence it is recommened to be careful with
the use of npermut in combination with multiple grouping levels.
It is possible to use both Overdisperion and Residual in combination with ratio = FALSE
or with ratio = TRUE. The ratio of the residual variance to the total variance should then be
interpreted as the ‘non-repeatability’. We assume that the combination of these two terms
with ratio = FALSE will be more commonly used.
Random-slope models
It is possible to fit random-slope models following the approach introduced by Johnson
(2014). The function call will return the average repeatability across the distribution of the
covariate in the sample (see Johnson 2014 for details). Permutation tests and likelihood ratio
test are conducted with random-intercept and random-slope terms being jointly removed from
the model. Note that random-slope models can be specified in multiple ways and the function
will likely handle only those models well that have been specified in the classical way (foo |
grname), implicitly fitting random-intercepts, random slopes for the covariate foo and their
correlation. Multiple random-slopes are possible, but models with covariances being
constraint do currently not work.
The following code gives and example for a repeatability estimation for body size where
random slopes are fitted for Sex and Treatment at the level of populations. The resulting
model allows for variance components to vary by sex and by treatment with the average
across the sex/treatment levels being returned as the model output.
Concluding remarks
The rptR package offers convenient extraction of (ratios of) variance components for
multiple grouping levels for estimating repeatabilities, including the option to control for
fixed effects in order to estimate adjusted repeatabilities. While this is easy-to-use we want to
remind users to make sure their data is appropriately modeled. We therefore provide the fitted
mixed model as part of the rpt object and recommend that this is inspected for any
indications of violated model assumptions or missing predictors. Repeatability estimates from
Poisson, binary and proportion data might often appear disappointingly low. This is likely to
be a simple reflection of the fact that for non-Gaussian data, the numerator contains the
among-group variance in predispositions (e.g. probabilities of successes, long-range averages
of counts) while the observed data is discretely scattered around these expectations. A lot of
the phenotypic variance is hence inherently random in origin and each datapoint contains only
very limited information about the underlying predisposition. The estimation of repeatabilities
for count, binary and proportion data will thus require substantial sample sizes and (ideally)
optimized sampling designs.
References
Bates, D., M. Maechler, B. Bolker & S. Walker (2015). Fitting linear mixed-effects models
using lme4. Journal of Statistical Software 67: 1-48.
Nakagawa, S. & H. Schielzeth (2013). A general and simple method for obtaining R2
from generalized linear mixed-effects models. Methods in Ecology and Evolution 4: 133-142.
and intra-class correlation ICC from generalized linear mixed-effects models revisited and
expanded. bioRxive: doi 10.1101/095851.
Manly, B. F. J. (2007). Randomization, bootstrap and Monte Carlo methods in Biology. 3rd
edn. Chapman & Hall, Boca Rato.
Schielzeth, H. & S. Nakagawa (2013). Nested by design: Model fitting and interpretation in a
mixed model era. Methods in Ecology and Evolution 4: 14-24.