Permuco Tutorial
Abstract
Recent methodological research has produced permutation methods to test parameters in the
presence of nuisance variables in linear models or repeated measures ANOVA. Permutation
tests are also particularly useful to overcome the multiple comparisons problem as they
are used to test the effect of factors or variables on signals while controlling the family-wise
error rate (FWER). This article introduces the permuco package which implements several
permutation methods. They can all be used jointly with multiple comparisons procedures
like the cluster-mass tests or threshold-free cluster enhancement (TFCE). The permuco
package is designed, first, for univariate permutation tests with nuisance variables, like
regression and ANOVA; and secondly, for comparing signals as required, for example, for
the analysis of event-related potential (ERP) of experiments using electroencephalography
(EEG). This article describes the permutation methods and the multiple comparisons
procedures implemented. A tutorial for each of these cases is provided.
1. Introduction
Permutation tests are exact for simple models like the one-way ANOVA and the t test (Lehmann
and Romano 2008, pp. 176–177). Moreover, it has been shown that they have some robustness
properties under non-normality (Lehmann and Romano 2008). However, they require the
assumption of exchangeability under the null hypothesis to be fulfilled, which is not the case in
a multifactorial setting. For these more complex designs, Janssen and Pauls (2003), Janssen
(2005), Pauly, Brunner, and Konietschke (2015) and Konietschke, Bathke, Harrar, and Pauly
(2015) show that permutation tests based on non-exchangeable data can be asymptotically
exact if used with studentized statistics. Another approach to handle multifactorial designs
is to transform the data before permuting. Several authors (Draper and Stoneman 1966;
Freedman and Lane 1983; Kennedy 1995; Huh and Jhun 2001; Dekker, Krackhardt, and
Snijders 2007; Kherad-Pajouh and Renaud 2010; ter Braak 1992) have proposed different types
of transformations and Winkler, Ridgway, Webster, Smith, and Nichols (2014) gave a simple
and unique notation to compare those different methods.
Repeated measures ANOVA including one or more within subject effects are the most widely
used models in the field of psychology. In the simplest case of one single random factor, an
exact permutation procedure consists in restricting the permutations within the subjects. In
more general cases, free permutations in repeated measures ANOVA designs would violate the
exchangeability assumption. This is because the random effects associated with subjects and
2 permuco: Permutation Tests for Regression, ANOVA, and Comparison of Signals
their interactions with fixed effects imply a complex structure for the (full) covariance matrix
of observations. It follows that the second moments are not preserved after permutation.
Friedrich, Brunner, and Pauly (2017a) have derived exact asymptotic properties in those
designs for a Wald-type statistic and Kherad-Pajouh and Renaud (2015) proposed several
methods to transform the data following procedures developed by Kennedy (1995) or Kherad-
Pajouh and Renaud (2010).
For linear models, permutation tests are useful when the assumption of normality is violated
or when the sample size is too small to apply asymptotic theory. In addition, they can be used
to control the family-wise error rate (FWER) in some multiple comparisons settings (Troendle
1995; Maris and Oostenveld 2007; Smith and Nichols 2009). These methods have been
successfully applied for the comparison of experimental conditions in both functional magnetic
resonance imaging (fMRI) and electroencephalography (EEG) as they take advantage of the
spatial and/or temporal correlation of the data.
The aim of the present article is to provide an overview of the use of permutation
methods and multiple comparisons procedures using permutation tests and to explain how they
can be used in R (R Core Team 2021) with the package permuco (Frossard and Renaud
2019). The package is available from the Comprehensive R Archive Network (CRAN) at
https://CRAN.R-project.org/package=permuco. Note that the presentation and discussion
of the available packages that handle permutation tests in related settings is deferred to
Section 5.1, where all the notions are introduced. Appendix A shows a comparison of the
relevant code and outputs. But first, Section 2 focuses on fixed effect models. It explains the
model used for ANOVA and regression and the various permutation methods proposed in the
literature. Section 3 introduces the methods for repeated measures ANOVA. Section 4 ex-
plains the multiple comparisons procedures used for comparing signals between experimental
conditions and how permutation tests are applied in this setting. Section 5 describes addi-
tional programming details and some of the choices for the default settings in the permuco
package. Section 6 treats two real data analyses, one from a control trial in psychology and
the second from an experiment in neurosciences using EEG.
$$H_0: \beta = 0 \quad \text{vs.} \quad H_1: \beta \neq 0. \tag{2}$$
The permutation test is exact under the null hypothesis for finite samples if the data are
exchangeable under the null hypothesis. This assumption is not fulfilled in the model in Equation 1
as we cannot control the influence of the nuisance term $D\eta$ when permuting. In fact, under
the null hypothesis in Equation 2, the responses follow a distribution $(D\eta, \sigma^2 I_n)$ and are
not exchangeable due to the presence of unequal first moments. Pauly et al. (2015) show
however that permuting the responses and using a Wald-type statistic is an asymptotically
exact procedure in factorial designs. Another approach, which is the focus of this paper, is
to transform the data prior to the permutation. Those transformation procedures are what
will be called permutation methods. They are described in Section 2.2 and are implemented
in permuco.
The permutation of a vector $v$ is defined as $Pv$ and the permutation of the rows of a matrix $M$
as $PM$, where $P$ is a permutation matrix (Gentle 2007, pp. 66–67). For any design matrix $M$,
its corresponding “hat” matrix is $H_M = M(M^\top M)^{-1}M^\top$ and its corresponding “residuals”
matrix is $R_M = I - M(M^\top M)^{-1}M^\top$ (Greene 2011, pp. 24–25). The full QR decomposition
is
$$\underset{n \times n}{\begin{bmatrix} M & 0 \end{bmatrix}} = \begin{bmatrix} Q_M & V_M \end{bmatrix} \begin{bmatrix} U_M & 0 \\ 0 & 0 \end{bmatrix}, \tag{3}$$
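The link between the hat and residuals matrices and the full QR decomposition can be checked numerically. The following base R sketch uses a simulated design matrix (the data and all object names are illustrative assumptions, not part of permuco) and relies on the standard linear algebra fact that the first columns of the complete Q matrix span the columns of $M$ while the remaining ones span its orthogonal complement.

```r
# Toy design matrix (hypothetical data): intercept plus two covariates.
set.seed(1)
n <- 8
M <- cbind(1, rnorm(n), rnorm(n))
p <- ncol(M)

# Hat and residuals matrices computed from their definitions.
H_M <- M %*% solve(crossprod(M)) %*% t(M)
R_M <- diag(n) - H_M

# Full QR decomposition: the first p columns of the complete Q span the
# columns of M (Q_M); the remaining n - p columns span the orthogonal
# complement (V_M).
Q_full <- qr.Q(qr(M), complete = TRUE)
Q_M <- Q_full[, seq_len(p), drop = FALSE]
V_M <- Q_full[, -seq_len(p), drop = FALSE]

# Both projections can be rebuilt from the orthonormal bases.
hat_ok <- isTRUE(all.equal(H_M, tcrossprod(Q_M)))
res_ok <- isTRUE(all.equal(R_M, tcrossprod(V_M)))
```

Both `hat_ok` and `res_ok` should be `TRUE` for any full-rank design matrix.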
method/Authors                              y*                        D*    X*
manly (Manly 1991)                          P y                       D     X
draper_stoneman (Draper and Stoneman 1966)  y                         D     P X
dekker (Dekker et al. 2007)                 y                         D     P R_D X
kennedy (Kennedy 1995)                      (P R_D) y                       R_D X
huh_jhun (Huh and Jhun 2001)                (P V_D^⊤ R_D) y                 V_D^⊤ R_D X
freedman_lane (Freedman and Lane 1983)      (H_D + P R_D) y           D     X
terBraak (ter Braak 1992)                   (H_{X,D} + P R_{X,D}) y   D     X

Table 1: Permutation methods in the presence of nuisance variables. See text for explanations
of the symbols.
All the remaining permutation methods are also summarized by the transformation of $y$, $D$
and $X$ into $y^*$, $D^*$ and $X^*$ and are explained next. The manly method simply permutes the
response (this method is sometimes called raw permutations). Even if this method does not
take into account the nuisance variables, it still has good asymptotic properties when using
studentized statistics. draper_stoneman permutes the design of interest (note that without
nuisance variables permuting the design is equivalent to permuting the response variable).
However, this method ignores the correlation between D and X that is typically present in
regressions or unbalanced designs. For the dekker method, we first orthogonalize X with
respect to D, then we permute the design of interest. This transformation reduces the influ-
ence of the correlation between D and X and is more appropriate for unbalanced design. The
kennedy method orthogonalizes all of the elements (y, D and X) with respect to the nuisance
variables, removing the nuisance variables in the equation, and then permutes the obtained
response. Doing so, all the design matrices lie in the span of $X$, a sub-space of the observed
designs $X$ and $D$. However, this projection modifies the distribution of the residuals, which lose
exchangeability ($R_D y \sim (0, R_D \sigma^2)$ for original IID data). The huh_jhun method is similar to
kennedy but it applies a second transformation ($V_D^\top$) to the data to ensure exchangeability (up
to the second moment, $V_D^\top R_D y \sim (0, I_{n-(p-q)} \sigma^2)$). The $V_D$ matrix comes from Equation 3
and has dimension $n \times (n - (p - q))$. It implies that the permutation matrices $P$ for the huh_jhun
method have smaller dimensions. The terBraak method is similar to freedman_lane but
uses the residuals of the full model. This permutation method creates a new response
variable $y^*$ which assumes that the observed value of the estimate $\hat\beta|y$ is the true value of $\beta$.
Computing the statistic using $y^*$, $X$, $D$ would not produce a permutation distribution under
the null hypothesis. To circumvent this issue, the method changes the null hypothesis when
computing the statistics at each permutation to $H_0: \beta = \hat\beta|y = (X^\top R_D X)^{-1} X^\top R_D y \,|\, y$. The
right part of this new hypothesis corresponds to the observed estimate of the parameters of
interest under the full model, and implicitly uses a pivotal assumption. Note that terBraak
is the only method where the statistic computed with the identity permutation is different
from the observed statistic. The notation $R_{D,X}$ means that the residuals matrix is based on
the concatenation of the matrices $D$ and $X$. See Section 5.2 for advice on the choice of the
method.
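The difference between the kennedy and huh_jhun transformations can be illustrated with a few lines of base R. This is only a sketch on simulated data (all object names are assumptions made for illustration, and it does not reproduce the internal code of permuco): it checks that the covariance structure of the residualised response is proportional to $R_D$ under kennedy and to an identity matrix of reduced dimension under huh_jhun.

```r
# Toy data (hypothetical): nuisance design D (intercept + covariate) and
# one variable of interest X.
set.seed(2)
n <- 10
D <- cbind(1, rnorm(n))
X <- matrix(rnorm(n), ncol = 1)
y <- rnorm(n)

# Residuals matrix of the nuisance design.
R_D <- diag(n) - D %*% solve(crossprod(D)) %*% t(D)

# V_D: orthonormal basis of the complement of span(D), from the full QR.
V_D <- qr.Q(qr(D), complete = TRUE)[, -seq_len(ncol(D)), drop = FALSE]

# kennedy: the residualised response has covariance proportional to R_D,
# which is not the identity, so its elements are not exchangeable.
cov_kennedy <- R_D %*% t(R_D)

# huh_jhun: the extra rotation by t(V_D) gives an identity covariance of
# dimension n - ncol(D).
cov_huh_jhun <- t(V_D) %*% R_D %*% t(R_D) %*% V_D

# One huh_jhun permutation: permute the reduced response, keep the reduced X.
y_red <- drop(t(V_D) %*% R_D %*% y)
X_red <- t(V_D) %*% R_D %*% X
y_star <- y_red[sample(length(y_red))]
```

The permuted vector `y_star` has length `n - ncol(D)`, which is why the permutation matrices are smaller for this method.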
For each of the methods presented in Table 1, permutation tests can be computed using
different statistics. For univariate or multivariate $\beta$ parameters, the permuco package
implements an F statistic that constitutes a marginal test (or “type III” sum of squares) (Searle
Jaromil Frossard, Olivier Renaud 5
2006, pp. 53–54). For a univariate $\beta$ ($1 \times 1$), one- and two-sided tests (based on a t statistic) are
also implemented. We write the F statistic as
$$F = \frac{y^\top H_{R_D X}\, y}{y^\top R_{D,X}\, y} \, \frac{n - p}{p - q}. \tag{4}$$
When q = 1, the t statistic is
$$t_{St} = \frac{(X^\top R_D X)^{-1} X^\top R_D y}{\sqrt{y^\top R_{D,X}\, y \, (X^\top R_D X)^{-1}}} \sqrt{n - p}, \tag{5}$$
where the numerator is the estimate of $\beta$ under the full model. Note that the statistic
can be simplified by a factor of $(X^\top R_D X)^{-1/2}$. The two statistics are functions of the data.
They lead to the general notation $t = t(y, D, X)$ when applied to the observed data and to
$t^* = t(y^*, D^*, X^*)$ when applied to the permuted data. The permuted statistics constitute
the set $T$, which contains the $t^*$ for all $P \in \mathcal{P}$. We define the permuted p value as
$p = \frac{1}{n_P} \sum_{t^* \in T} I(|t^*| \geq |t|)$ for a two-tailed t test,
$p = \frac{1}{n_P} \sum_{t^* \in T} I(t^* \geq t)$ for an upper-tailed t test or an F test, and finally
$p = \frac{1}{n_P} \sum_{t^* \in T} I(t^* \leq t)$ for a lower-tailed t test, where $I(\cdot)$ is the indicator function.
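To make the definition of the p value concrete, here is a minimal base R sketch of a freedman_lane permutation F test. It is an illustration under stated assumptions, not the permuco implementation: the function name `freedman_lane_test`, the simulated data, and the choice of counting the identity permutation first are all made up for this example. The F statistic is computed from the residual sums of squares of the reduced and full models; any constant scaling of the statistic leaves the permutation p value unchanged, since the same factor applies to every permutation.

```r
# Minimal Freedman-Lane permutation F test (illustrative sketch, base R only).
# y: response; D: nuisance design (including the intercept); X: design of interest.
freedman_lane_test <- function(y, D, X, np = 999) {
  n <- length(y)
  DX <- cbind(D, X)
  fit_D <- qr(D)
  fit_DX <- qr(DX)

  # F statistic of X given D for an arbitrary response vector.
  f_stat <- function(resp) {
    rss_reduced <- sum(qr.resid(fit_D, resp)^2)
    rss_full <- sum(qr.resid(fit_DX, resp)^2)
    ((rss_reduced - rss_full) / ncol(X)) / (rss_full / (n - ncol(DX)))
  }

  fitted_D <- qr.fitted(fit_D, y)   # H_D y
  resid_D <- qr.resid(fit_D, y)     # R_D y

  f_obs <- f_stat(y)
  # The identity permutation is counted first, then np - 1 random ones.
  f_perm <- c(f_obs, replicate(np - 1, f_stat(fitted_D + sample(resid_D))))

  list(statistic = f_obs, p.value = mean(f_perm >= f_obs))
}

# Hypothetical toy data with a real effect of x.
set.seed(3)
n <- 40
d <- rnorm(n)
x <- 0.5 * d + rnorm(n)
y <- 1 + d + 0.8 * x + rnorm(n)
res <- freedman_lane_test(y, D = cbind(1, d), X = cbind(x), np = 999)
```

Because the observed statistic is part of the permutation set, the p value returned by this sketch can never be smaller than `1 / np`.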
$$y = D\eta + X\beta + E^0\kappa + Z^0\gamma + \epsilon, \tag{6}$$
where $y$ ($n \times 1$) is the response, the fixed part of the design is split into the nuisance variable(s)
$D$ ($n \times (p_1 - q_1)$) and the variable(s) of interest $X$ ($n \times q_1$). The specificity of the repeated measures
ANOVA model allows us to split the random part into $E^0$ ($n \times (p_2^0 - q_2^0)$) and $Z^0$ ($n \times q_2^0$), which are the random
effects associated with $D$ and $X$ respectively (Kherad-Pajouh and Renaud 2015). The fixed
parameters are $[\eta^\top \; \beta^\top]^\top$, with $\eta^\top$ of dimension $1 \times (p_1 - q_1)$ and $\beta^\top$ of dimension $1 \times q_1$.
The random part is $[\kappa^\top \; \gamma^\top]^\top \sim (0, \Omega)$, with $\kappa^\top$ of dimension $1 \times (p_2^0 - q_2^0)$ and $\gamma^\top$ of
dimension $1 \times q_2^0$, and $\epsilon \sim (0, \sigma^2 I)$. The matrices associated with the random effects $E^0$ and $Z^0$
can be computed using
$$E^0 = (D^{0\prime}_{within} * Z^{0\prime}_\Delta)^\top \quad \text{and} \quad Z^0 = (X^{0\prime}_{within} * Z^{0\prime}_\Delta)^\top, \tag{7}$$
where $D^0_{within}$ and $X^0_{within}$ are overparametrized matrices and are associated with the within
effects in the design matrices $D$ and $X$. $Z^0_\Delta$ is the overparametrized design matrix associated
with the subjects and $*$ is the column-wise Khatri-Rao product (Khatri and Rao 1968). Since the
matrices $E^0$ and $Z^0$ are overparametrized and collinear to the intercept or between-participant
effects, they cannot directly be used to compute their corresponding sums of squares. We need
versions that are constrained into their respective appropriate sub-spaces:
method                               y*             D*    X*           E*    Z*
Rd_kheradPajouh_renaud (R_D)         P R_D y              R_D X              R_D Z
Rde_kheradPajouh_renaud (R_{D,E})    P R_{D,E} y          R_{D,E} X          R_{D,E} Z

Table 2: Permutation methods in the presence of nuisance variables for repeated measures
ANOVA.
The matrices $E$ and $Z$ are respectively of rank $p_2 - q_2$ and $q_2$ and are the ones used to compute
F statistics. Formally, the hypothesis of interest associated with Equation 6 is written:
$$H_0: \beta = 0 \quad \text{vs.} \quad H_1: \beta \neq 0. \tag{9}$$
where $y_s$ is the response variable for all observations at time $s$ and each of the $k$ models is
the same as Equation 1. $D$ and $X$, the design matrices, are then identical over the $k$ time
points. The aim is to test simultaneously all $k$ hypotheses $H_{0s}: \beta_s = 0$ vs. $H_{1s}: \beta_s \neq 0$ for
$s \in \{1, \dots, k\}$ while controlling the FWER over the $k$ tests. Likewise, the random
effects model is written:
where each of the $k$ models is defined as in Equation 6 and, similarly, we are interested in
testing the $k$ hypotheses $H_{0s}: \beta_s = 0$ vs. $H_{1s}: \beta_s \neq 0$ for $s \in \{1, \dots, k\}$.
For both models, we choose one of the permutation methods presented in Tables 1 or 2 and
compute the $k$ observed statistics $t_s$ and the $k$ sets of permuted statistics $T_s$, which lead to $k$
raw or uncorrected p values.
To correct them, the $k$ sets of permuted statistics $T_s$ can be analyzed as one set of multivariate
statistics. This is done simply by combining the $k$ univariate permutation-based distributions
into a single $k$-variate distribution which maintains the correlation between tests. For each
permutation, we simply combine all $k$ univariate permuted statistics $t^*_1, \dots, t^*_k$ into one
multivariate permuted statistic $t^* = [t^*_1 \dots t^*_k]^\top$. The three multiple comparisons procedures
described below are all based on this multivariate distribution and take advantage of the
correlation structure between the tests.
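The construction of the multivariate permutation distribution can be sketched in base R as follows. The example is a simplified assumption: simulated signals, a two-group design with no nuisance variable other than the intercept (so that permuting the group labels is sufficient), and object names such as `T_mat` chosen for illustration. The key point is that the same permutation is applied to all $k$ time points, so each row of the matrix is one multivariate permuted statistic.

```r
# Toy signals (hypothetical): n subjects in two groups, k time points.
set.seed(4)
n <- 20
k <- 50
np <- 200
group <- rep(c(0, 1), each = n / 2)
signals <- matrix(rnorm(n * k), nrow = n, ncol = k)

# F statistic (squared two-sample t) at every time point for a labelling g.
f_all <- function(g) {
  a <- signals[g == 1, , drop = FALSE]
  b <- signals[g == 0, , drop = FALSE]
  ss_within <- colSums(sweep(a, 2, colMeans(a))^2) +
    colSums(sweep(b, 2, colMeans(b))^2)
  s2 <- ss_within / (n - 2)
  (colMeans(a) - colMeans(b))^2 / (s2 * (1 / nrow(a) + 1 / nrow(b)))
}

# One column per permutation; the first one is the identity.
perms <- cbind(seq_len(n), replicate(np - 1, sample(n)))

# np x k matrix: row i holds the k statistics of permutation i.
T_mat <- t(apply(perms, 2, function(idx) f_all(group[idx])))

# Raw (uncorrected) p values, one per time point.
p_raw <- colMeans(sweep(T_mat, 2, T_mat[1, ], FUN = ">="))
```

Any correction that works on whole rows of `T_mat` keeps the dependence between neighbouring time points intact.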
In addition to the theoretical properties of this procedure (Maris and Oostenveld 2007), this
method makes sense for EEG data analysis because if a difference of cerebral activity is
believed to happen at a time s for a given factor, it is very likely that the time s + 1 (or s − 1)
will show this difference too.
[Figure 1: curve of the statistics over Time (ms) with a horizontal threshold at τ = 4.]
Figure 1: Display of the 600 statistics corresponding to the tests on 600 time points. Here
4 clusters are found using a threshold τ = 4. Using the sum to aggregate the statistics, for
each cluster i, the shaded area underneath the curve represents its cluster-mass m_i.
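The cluster-mass idea can be sketched in base R. The functions below are hypothetical helpers written for illustration (they are not the permuco implementation): clusters are contiguous runs of statistics above the threshold, their mass is an aggregate of the statistics they contain, and, following the usual construction of Maris and Oostenveld (2007), the null distribution is assumed to be formed by the largest cluster mass of each permutation.

```r
# Cluster masses of one curve of statistics: contiguous runs above the
# threshold are clusters, aggregated with aggr_fun (the sum by default).
cluster_masses <- function(stat, threshold, aggr_fun = sum) {
  runs <- rle(stat > threshold)
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  keep <- which(runs$values)
  data.frame(
    start = starts[keep],
    end = ends[keep],
    mass = vapply(keep, function(i) aggr_fun(stat[starts[i]:ends[i]]), numeric(1))
  )
}

# Cluster-mass p values from an np x k matrix of permuted statistics whose
# first row holds the observed statistics.
clustermass_test <- function(T_mat, threshold, aggr_fun = sum) {
  # Null distribution: largest cluster mass of each permutation (0 if none).
  null_max <- apply(T_mat, 1, function(stat) {
    max(c(0, cluster_masses(stat, threshold, aggr_fun)$mass))
  })
  observed <- cluster_masses(T_mat[1, ], threshold, aggr_fun)
  observed$p.value <- vapply(observed$mass, function(m) mean(null_max >= m), numeric(1))
  observed
}

# Hypothetical example: a curve with two separated bumps above tau = 4,
# followed by two made-up "permuted" curves.
stat <- c(1, 5, 6, 5, 1, 1, 4.5, 1)
cl <- cluster_masses(stat, threshold = 4)
toy <- rbind(stat, rep(1, 8), c(1, 5, 5, 5, 5, 1, 1, 1))
res <- clustermass_test(toy, threshold = 4)
```

The p value is attached to a whole cluster, not to the individual time points inside it.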
where $e(h)$ is the extent at the height $h$ and it is interpreted as the length of a cluster for a
threshold of $h$. $E$ and $H$ are free parameters named the extent power and the height power,
respectively. $t_0$ is set close to zero. Figure 2 illustrates how the TFCE statistic is computed
for a given time point $s$.
We construct the TFCE null distribution $U$ by applying the formula in Equation 13 at each
time-point of the permuted statistics $t^*_s$ for $s \in \{1, \dots, k\}$ to produce, for each permutation,
$k$ values $u^*_s$. Then the contribution of a permutation to $U$ is the maximum of all $k$ values $u^*_s$
(see Algorithm 3). In practice, the integral in Equation 13 is approximated numerically using
small $dh \leq 0.1$ (Smith and Nichols 2009; Pernet et al. 2015).
[Figure 2: curve of the statistic $t_s$ over Time (ms), marking a height $h$ and the corresponding extent $e(h)$.]
Figure 2: The TFCE transforms the statistic $t_s$ using the formula in Equation 13. The extent
$e(h)$, in red, is shown for a given height $h$. The TFCE statistic $u_s$ at $s$ can be viewed as a
function of characteristics in the grey area.
At time s, the statistic ts will be modified using the formula in Equation 13. The formula can
be viewed as a function of characteristics in the grey area (its area in the special case where
both E and H are set to 1).
To test the significance of a time point $s$ we compare its enhanced statistic $u_s$ with the
threshold-free cluster-enhancement null distribution $U$. For an F test we define the p value
as $p_s = \frac{1}{n_P} \sum_{u^* \in U} I(u^* \geq u_s)$.
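A numerical version of the TFCE can be sketched in base R. The code below is an illustration under assumptions, not the permuco implementation: it takes the usual TFCE form of Smith and Nichols (2009), in which the enhanced statistic integrates $e(h)^E h^H$ over heights $h$ up to the statistic, approximates the integral by a Riemann sum starting at the first step `dh`, and uses hypothetical function names `tfce` and `tfce_test`.

```r
# TFCE of one curve of non-negative statistics (e.g., F statistics).
# E and H are the extent and height powers; ndh is the number of heights
# used to approximate the integral over h by a Riemann sum.
tfce <- function(stat, E = 0.5, H = 1, ndh = 500) {
  dh <- max(stat) / ndh
  u <- numeric(length(stat))
  for (h in seq_len(ndh) * dh) {
    runs <- rle(stat >= h)
    # Extent e(h): length of the cluster containing each time point (0 outside).
    extent <- rep(ifelse(runs$values, runs$lengths, 0), runs$lengths)
    u <- u + extent^E * h^H * dh
  }
  u
}

# TFCE p values from an np x k matrix of permuted statistics whose first
# row holds the observed statistics.
tfce_test <- function(T_mat, ...) {
  U_mat <- t(apply(T_mat, 1, tfce, ...))
  null_max <- apply(U_mat, 1, max)   # one maximum per permutation
  u_obs <- U_mat[1, ]
  list(
    statistic = u_obs,
    p.value = vapply(u_obs, function(u) mean(null_max >= u), numeric(1))
  )
}

# Hypothetical check with E = H = 1: a plateau of height 2 and length 2 has
# an enhanced statistic close to the integral of 2 * h over [0, 2], i.e., 4.
u <- tfce(c(0, 2, 2, 0), E = 1, H = 1)
```

Increasing `ndh` refines the approximation at the cost of a longer loop.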
cluster-mass test is a two-step procedure: first, it aggregates time-points into clusters, and
then summarizes them using the cluster-mass. The inference is only performed at the second
step, which loses any information on the shape and size of the clusters. It implies that the
interpretation of an individual time-point is proscribed. Finally, the transformation of the TFCE
statistic is an integration over all thresholds of cluster statistics (Smith and Nichols 2009).
Therefore, the TFCE does not allow an interpretation of each time-point individually either,
as it also summarizes statistics using the concept of clusters. Thus, the interpretation of
an individual time-point must also involve it. Therefore, a significant time-point must be
interpreted as a time-point being part of at least one significant cluster (among all clusters formed
using all thresholds), where a significant cluster contains at least one significant time-point.
5. Comparison of implementations
The codes and outputs for packages that perform ANOVA/ANCOVA are given in Ap-
pendix A.1 and in Appendix A.2 for repeated measures. For fixed effects, this illustrates
that permuco, flip and lmPerm handle covariates and are based on the same statistic (F )
whereas GFD uses the Wald-type statistic. It also shows that flip is testing one factor at a
time (main effect of sex in this case) whereas the other packages produce directly tests for
all the effects. Also, the nuisance variables in flip must be carefully implemented using the
appropriate coding variables in case of factors. Note that lmPerm centers the covariates using
the default setting and that it provides both marginal (Type III) or sequential (Type I) tests.
Concerning permutation methods, only the manly method is used for both lmPerm and
GFD, the flip package uses the huh_jhun method, whereas multiple methods can be set
by users using the permuco package. Note also that different default choices for the V
matrix as implemented in flip (based on eigendecomposition) and permuco (based on QR
decomposition) packages lead to slightly different results (see Table 1 for more information
on the permutation methods).
Finally, concerning repeated measures designs, flip cannot handle cases where measures are
not repeated in each condition for each subject, and therefore cannot be compared in Appendix
A.2. As already said, lmPerm produces sequential tests in repeated measures designs and
permuco produces marginal tests. This explains why, with unbalanced data, only the last
interaction term in each stratum produces the same statistic.
method allows us to test the intercept, which is not available for the other methods.
The multcomp argument can be set to "bonferroni" for the Bonferroni correction (Dunn
1958), to "holm" for the Holm correction (Holm 1979), to "benjamini_hochberg" for the
Benjamini-Hochberg method (Benjamini and Hochberg 1995), to "troendle", see Section 4.2,
to "clustermass", see Section 4.3 and to "tfce", see Section 4.4. Note that in the permuco
package, these 6 methods are available in conjunction with permutation, although the first 3
methods are general procedures that could also be used in a parametric setting.
For the "clustermass" method, the threshold parameter of the cluster-mass statistic is
usually chosen by default at the 0.95 quantile of the corresponding univariate parametric
distribution; but the FWER is preserved for any a priori value of the threshold that the
user may set. The mass function is specified by the aggr_FUN argument. It is set by default to
the sum of squares for a t statistic and the sum for an F . It should be a function that returns a
positive scalar which will be large for an uncommon event under the null hypothesis (e.g., use
the sum of absolute value of t statistics instead of the sum). It can be tuned depending on the
expected signal. For the t statistic, typically, the sum of squares will detect more efficiently
high peaks and the sum of absolute values will detect more efficiently wider clusters.
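The effect of these choices can be illustrated with a few lines of base R. The degrees of freedom (1 and 14) and the two toy clusters are assumptions chosen for illustration, and `sum_sq` and `sum_abs` are hypothetical aggregation functions of the kind that could be supplied as the mass function.

```r
# Threshold at the 0.95 quantile of the parametric F distribution,
# here with 1 and 14 degrees of freedom (illustrative values).
threshold_F <- qf(0.95, df1 = 1, df2 = 14)

# Two hypothetical clusters of t statistics above a threshold:
peak <- c(2.1, 6.0, 2.1)              # short cluster with one high peak
wide <- c(2.5, 2.5, 2.5, 2.5, 2.5)    # longer cluster of moderate values

# Candidate mass functions: each returns a positive scalar.
sum_sq <- function(x) sum(x^2)
sum_abs <- function(x) sum(abs(x))

mass <- rbind(
  sum_of_squares = c(peak = sum_sq(peak), wide = sum_sq(wide)),
  sum_of_abs = c(peak = sum_abs(peak), wide = sum_abs(wide))
)
```

With these values the sum of squares ranks the peaked cluster first, while the sum of absolute values ranks the wide cluster first.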
For the "tfce" method, the default value for the extent parameter is E = 0.5 and for the
height H = 2 for t tests and, for F tests, it is E = 0.5 and H = 1 following the recommendations
of Smith and Nichols (2009) and Pernet et al. (2015). The ndh parameter controls the number
of steps used in the approximation of the integral in Equation 13 and is set to 500 by default.
The argument return_distribution is set by default to FALSE but can be set to TRUE to
return the large matrices (nP × k) with the value of the permuted statistics.
The algorithms and formulas presented in the previous sections may not be efficient for very
large data sets. Where possible, they are implemented in a more efficient way in permuco.
For example, to reduce the computing time, the permuted statistics are computed through a
QR decomposition using the qr, qr.fitted, qr.resid or qr.coef functions.
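As a rough illustration of this idea, the base R sketch below computes fitted values, residuals and coefficients from QR decompositions that are built once and reused, without ever forming an $n \times n$ projection matrix. The data and object names are assumptions made for this example; the actual internals of permuco may differ.

```r
set.seed(5)
n <- 30
D <- cbind(1, rnorm(n))    # nuisance design (hypothetical)
X <- cbind(rnorm(n))       # design of interest (hypothetical)
y <- rnorm(n)

# Decompose each design once; the decompositions can be reused for
# every permuted response.
qr_D <- qr(D)
qr_DX <- qr(cbind(D, X))

# Projections without forming any n x n matrix.
HD_y <- qr.fitted(qr_D, y)     # H_D y
RD_y <- qr.resid(qr_D, y)      # R_D y
RDX_y <- qr.resid(qr_DX, y)    # R_{D,X} y
beta <- qr.coef(qr_DX, y)      # least-squares coefficients of the full model

# Same fitted values through an explicit projection matrix, for comparison.
H_D <- D %*% solve(crossprod(D)) %*% t(D)
same_fit <- isTRUE(all.equal(HD_y, drop(H_D %*% y)))
```

The explicit matrix `H_D` is only built here to check the result; in a permutation loop it would be skipped entirely.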
6. Tutorial
To load the permuco package:
R> install.packages("permuco")
R> library("permuco")
The permutation tests are obtained with the aovperm function. The np argument sets the
number of permutations. We choose to set a high number of permutations (np = 100000) to
reduce the variability of the permutation p values so that they can safely be compared to the
parametric ones. The aovperm function automatically converts the coding of factors with
contr.sum, which allows us to test the main effects of factors and their interactions.
Anova Table
Resampling test using freedman_lane to handle nuisance variables and
1e+05 permutations.
SS df F parametric P(>F)
LOSc 2.162e+09 1 483.4422 0.0000
sex 1.463e+07 1 3.2714 0.0723
insurance 6.184e+05 1 0.1383 0.7105
LOSc:sex 8.241e+06 1 1.8427 0.1765
LOSc:insurance 2.911e+07 1 6.5084 0.0116
sex:insurance 1.239e+05 1 0.0277 0.8680
LOSc:sex:insurance 1.346e+07 1 3.0091 0.0846
Residuals 7.514e+08 168
resampled P(>F)
LOSc 0.0000
sex 0.0763
insurance 0.6794
LOSc:sex 0.1576
LOSc:insurance 0.0233
sex:insurance 0.8537
LOSc:sex:insurance 0.0847
Residuals
The interaction LOSc:insurance is significant both using the parametric p value 0.0116 and
the permutation one 0.0233 using a 5% level. However, the difference between these 2 p values
is 0.0117 which is high enough to lead to different conclusions e.g., in case of correction for
multiple tests or a smaller α level.
If we are interested in the difference between the groups for a high value of the covariate, we
center the covariate to the third quartile (14 days) and re-run the analysis.
Anova Table
Resampling test using freedman_lane to handle nuisance variables and
1e+05 permutations.
SS df F parametric P(>F)
LOS14 2.162e+09 1 483.4422 0.0000
sex 2.760e+07 1 6.1703 0.0140
insurance 9.864e+05 1 0.2206 0.6392
LOS14:sex 8.241e+06 1 1.8427 0.1765
LOS14:insurance 2.911e+07 1 6.5084 0.0116
sex:insurance 7.722e+05 1 0.1727 0.6783
LOS14:sex:insurance 1.346e+07 1 3.0091 0.0846
Residuals 7.514e+08 168
resampled P(>F)
LOS14 0.0000
sex 0.0214
insurance 0.6101
LOS14:sex 0.1550
LOS14:insurance 0.0229
sex:insurance 0.6530
LOS14:sex:insurance 0.0827
Residuals
For a long length of stay, the effect of sex is significant using the parametric p value p = 0.014
and the permutation one p = 0.0214.
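Why re-centering the covariate changes the test of a factor but not the tests of the covariate or of its interaction can be checked on simulated data with an ordinary linear model. The sketch below uses base R only; the data, the variable names and the effect sizes are assumptions made for illustration and are unrelated to the emergency cost data.

```r
# Hypothetical data: a covariate x, a two-level factor g and their interaction.
set.seed(6)
n <- 60
x <- rexp(n, rate = 1 / 8)
g <- factor(rep(c("a", "b"), length.out = n))
y <- 10 + 2 * x + ifelse(g == "b", 1, -1) * (0.3 * x - 1) + rnorm(n)
contrasts(g) <- contr.sum(2)

x_mean <- x - mean(x)                       # centred at the mean
x_q3 <- x - unname(quantile(x, 0.75))       # centred at the third quartile

fit_mean <- summary(lm(y ~ x_mean * g))$coefficients
fit_q3 <- summary(lm(y ~ x_q3 * g))$coefficients

# The slope and the interaction are unaffected by the centring ...
same_slope <- isTRUE(all.equal(fit_mean["x_mean", "t value"],
                               fit_q3["x_q3", "t value"]))
same_inter <- isTRUE(all.equal(fit_mean["x_mean:g1", "t value"],
                               fit_q3["x_q3:g1", "t value"]))
# ... whereas the effect of g is evaluated at a different value of x.
shift_g <- fit_q3["g1", "Estimate"] - fit_mean["g1", "Estimate"]
```

This mirrors the two ANOVA tables above, where only the rows that do not involve the highest-order covariate terms change after re-centering.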
If the researcher has an a priori oriented alternative hypothesis HA : βsex=M > βsex=F ,
the lmperm function produces one-sided t tests. To run the same models as previously, we
first need to set the coding of the factors with the contr.sum function before running the
permutation tests.
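Setting the coding by hand uses only base R. The following sketch works on a small hypothetical data frame that borrows the factor levels of the example (it is not the original data set) and shows how sum-to-zero contrasts translate into design-matrix columns.

```r
# Hypothetical data frame with the same factor levels as in the example.
dat <- data.frame(
  insurance = factor(c("public", "semi_private", "public", "semi_private")),
  sex = factor(c("F", "F", "M", "M"))
)

# Sum-to-zero coding: the contrast column sums to zero over the levels.
contrasts(dat$insurance) <- contr.sum
contrasts(dat$sex) <- contr.sum

# Inspect the coding, then the resulting design-matrix columns.
contrasts(dat$insurance)
contrasts(dat$sex)
design <- model.matrix(~ sex * insurance, data = dat)
```

With this coding, the columns of `design` are named `sex1`, `insurance1` and `sex1:insurance1`, which is why coefficient names carry a trailing `1`.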
[,1]
public 1
semi_private -1
[,1]
F 1
M -1
LOS14:insurance1 0.0232
sex1:insurance1 0.6526
LOS14:sex1:insurance1 0.0835
The effect sex1 is significant for both the parametric one-sided p value, p = 0.007, and the
permutation one-sided p value, p = 0.0211. It indicates that when the length of the stay is
high, men have a lower cost than women.
To test the effect of sex within the publicly insured persons (called a simple effect), we
change the coding of the factors inside the data.frame using the contr.treatment function
and disable the automatic recoding using the argument coding_sum = FALSE.
semi_private
public 0
semi_private 1
[,1]
F 1
M -1
Anova Table
Resampling test using freedman_lane to handle nuisance variables and
1e+05 permutations.
SS df F parametric P(>F)
LOSc 9.512e+09 1 2126.7539 0.0000
sex 6.092e+07 1 13.6210 0.0003
insurance 6.184e+05 1 0.1383 0.7105
LOSc:sex 1.510e+08 1 33.7708 0.0000
LOSc:insurance 2.911e+07 1 6.5084 0.0116
sex:insurance 1.239e+05 1 0.0277 0.8680
LOSc:sex:insurance 1.346e+07 1 3.0091 0.0846
Residuals 7.514e+08 168
resampled P(>F)
LOSc 0.0000
sex 0.0003
insurance 0.6829
LOSc:sex 0.0000
LOSc:insurance 0.0231
sex:insurance 0.8519
LOSc:sex:insurance 0.0836
Residuals
The sex row can be interpreted as the effect of sex for the publicly insured persons for an
average length of stay. Both the parametric p value, p = 0.0003, and the permutation p value,
p = 0.0003, show a significant effect of sex within the publicly insured persons.
Given the skewness of the data, for each case where the permutation test differs from the
parametric result, we tend to put more faith in the permutation result since it does not rely
on the assumption of normality.
We perform the permutation tests by running the aovperm function. The within subject
factors should be written using + Error(...) similarly to the aov function from the stats
package:
Warning message:
In checkBalancedData(fixed_formula = formula_f, data = cbind(y, :
The data are not balanced, the results may not be exact.
A warning message is issued if the design is not fully balanced, as some exactness properties
of the tests are no longer warranted. However, the method from Kherad-Pajouh and Renaud
(2015) can still be applied as the within-subject factor (time) is balanced. The results are
shown in an ANOVA table by printing the object:
R> mod_jpah2016
[Figure 3: three panels showing the density (y axis: Density) of the permuted F statistics, one panel per effect.]
Figure 3: The permutation distributions of the F statistics for the effects bmic, condition
and bmic:condition. The vertical lines indicate the observed statistics.
This analysis reveals a significant p value for the effect of the interaction bmic:condition
with a statistic F = 5.4269, which leads to a permutation p value p = 0.0224 not far from the
parametric one. For this example, the permutation test backs the parametric analysis. The
permutation distributions can be viewed using the plot function, as in Figure 3.
dataset contains the ERP of the electrode O1. The design of experiment is given in the
attentionshifting_design dataset along with the laterality, sex, age, and 2 measures of
anxiety of each subject, see Table 3.
As in almost any ERP experiment, the data is designed for a repeated measures ANOVA. Using
the permuco package, we test each time point of the ERP for the main effects and the
interactions of the variables visibility, emotion and direction while controlling the
FWER. We perform F tests using a threshold at the 95% quantile, the sum as the
cluster-mass statistic and 5000 permutations. We handle nuisance variables with the method
Rd_kheradPajouh_renaud:
The plot method produces a graphical representation of the tests that allows us to see quickly
the significant time frames corrected by clustermass. The results are shown in Figure 4.
R> plot(electrod_O1)
Only one significant result appears for the main effect of visibility. This cluster is corrected
using the clustermass method. The summary of the clusterlm object gives more informa-
tion about all clusters for the main effect of visibility, whether they are driving the significant
effect or not:
R> summary(electrod_O1)$visibility
Effect: visibility.
Alternative Hypothesis: two.sided.
Statistic: fisher(1, 14).
Resampling Method: Rd_kheradPajouh_renaud.
Type of Resampling: permutation.
Number of Dependant Variables: 819.
[Figure 4: seven panels of observed statistics, one per effect: visibility, emotion, direction, visibility:emotion, visibility:direction, emotion:direction, visibility:emotion:direction.]
Figure 4: The plot method on a clusterlm object displays the observed statistics of the
three main effects and their interactions. The dotted horizontal line represents the threshold
which is set by default to the 95% percentile of the statistic. For this dataset, one cluster is
significant for the main effect of visibility using the clustermass method, as shown by the
red part. The summary method gives more details.
There is a significant difference between the two levels of visibility. This difference is driven by
one cluster that appears between the measures 332 and 462, which correspond to 123.7 ms
and 250.9 ms after the event. Its cluster-mass statistic is 3559.1 with an associated p value
of 0.0012. The threshold is set to 4.60011, which is the 95th percentile of the F statistic. If
we want to use other multiple comparisons procedures, we use the multcomp argument:
Note that we retrieve the very same permutations as in the previous model by using the P argument.
The computation time for those tests is reasonably low: it takes less than 12 minutes on a
desktop computer (i7 3770 CPU 3.4 GHz, 8 GB RAM) to compute the 7 permutation tests with
all the multiple comparisons procedures available. To see quickly the results of the
threshold-free cluster-enhancement procedure, we set the multcomp argument of plot to "tfce" as
shown in Figure 5.
The TFCE procedure detects approximately the same effect. However, the time-points around
400 (190 ms) are not part of the significant effect. If the curves in the TFCE plot happen
to show some small steps (which is not the case in Figure 5), it may be because of a small
number of terms in the approximation of the integral of the TFCE statistics of Equation 13.
In that case it would be reasonable to increase the value of the parameter ndh.
Finally, to be able to interpret individually each time-point, we can use the troendle multiple
comparisons procedure whose results are visualized by plotting the full_electrod_O1 object.
A similar period is detected for the main effect of visibility.
Effect: visibility.
Alternative Hypothesis: two.sided.
Statistic: fisher(1, 14).
[Figure 5: seven panels, one per effect: visibility, emotion, direction, visibility:emotion, visibility:direction, emotion:direction, visibility:emotion:direction.]
Figure 5: Setting the multcomp argument to "tfce" in the plot function will display the
TFCE p values. The argument enhanced_stat = TRUE shows the TFCE statistics us of
Equation 13.
[Figure 6: seven panels, one per effect: visibility, emotion, direction, visibility:emotion, visibility:direction, emotion:direction, visibility:emotion:direction.]
Figure 6: Setting the multcomp to "troendle" will display the troendle correction which
allows an interpretation of each time-point individually.
7. Conclusion
This article presents recent methodological advances in permutation tests and their
implementation in the permuco package. Hypotheses in the linear model framework or in repeated
measures ANOVA are tested using several methods to handle nuisance variables. Moreover,
permutation tests can solve the multiple comparisons problem and control the FWER through
cluster-mass tests or TFCE, and the clusterlm function implements those procedures for the
analysis of signals, like EEG data. Section 6 illustrates some real data examples of tests that
can be performed for regression, repeated measures ANCOVA and ERP signals comparison.
We hope that further developments of permuco will expand cluster-mass tests to
multidimensional adjacency (space and time) to handle full scalp ERP tests that control the FWER
over all electrodes. An early version of the functions is already available in the following
repository: https://github.com/jaromilfrossard/permuco4brain. Another evolution
will concern permutation procedures for mixed effects models to allow researchers to perform
tests in models containing participant- and stimulus-specific random effects. Indeed, we plan
to include in permuco the re-sampling test presented by Bürki, Frossard, and Renaud (2018)
as they show that, first, using the F statistic (by averaging over the stimuli) in combination with
the cluster-mass procedure increases the FWER and, secondly, that a re-sampling method based
on the quasi-F statistic (Clark 1973; Raaijmakers, Schrijnemakers, and Gremmen 1999) keeps
it much closer to the nominal level of 5%.
Acknowledgments
We are particularly grateful for the assistance given by Eda Tipura, Guillaume Rousselet and
Elvezio Ronchetti, which greatly improved this manuscript. Eda Tipura provided the original
EEG data, and all three gave many comments based on their extensive reading of the paper;
any remaining errors are our own.
References
Basso D, Finos L (2012). “Exact Multivariate Permutation Tests for Fixed Effects in Mixed-
Models.” Communications in Statistics - Theory and Methods, 41(16-17), 2991–3001. doi:
10.1080/03610926.2011.627103.
Benjamini Y, Hochberg Y (1995). “Controlling the False Discovery Rate: A Practical and
Powerful Approach to Multiple Testing.” Journal of the Royal Statistical Society B, 57(1),
289–300. doi:10.1111/j.2517-6161.1995.tb02031.x.
Bürki A, Frossard J, Renaud O (2018). “Accounting for Stimulus and Participant Effects
in Event-Related Potential Analyses to Increase the Replicability of Studies.” Journal of
Neuroscience Methods, 309, 218–227. doi:10.1016/j.jneumeth.2018.09.016.
Cheval B, Sarrazin P, Pelletier L, Friese M (2016). “Effect of Retraining Approach-Avoidance
Tendencies on an Exercise Task: A Randomized Controlled Trial.” Journal of Physical
Activity and Health, 13(12), 1396–1403. doi:10.1123/jpah.2015-0597.
Clark HH (1973). “The Language-as-Fixed-Effect Fallacy: A Critique of Language Statistics
in Psychological Research.” Journal of Verbal Learning and Verbal Behavior, 12(4), 335–
359. doi:10.1016/s0022-5371(73)80014-3.
Dekker D, Krackhardt D, Snijders TAB (2007). “Sensitivity of MRQAP Tests to Collinear-
ity and Autocorrelation Conditions.” Psychometrika, 72(4), 563–581. doi:10.1007/
s11336-007-9016-1.
Draper NR, Stoneman DM (1966). “Testing for the Inclusion of Variables in Linear Regression
by a Randomisation Technique.” Technometrics, 8(4), 695. doi:10.2307/1266641.
Dunn OJ (1958). “Estimation of the Means of Dependent Variables.” The Annals of Mathe-
matical Statistics, 29(4), 1095–1111. doi:10.1214/aoms/1177706443.
Fay MP, Shaw PA (2010). “Exact and Asymptotic Weighted Logrank Tests for Interval
Censored Data: The interval R Package.” Journal of Statistical Software, 36(2), 1–34.
doi:10.18637/jss.v036.i02.
Finos L (2018). flip: Multivariate Permutation Tests. R package version 2.5.0, URL https:
//CRAN.R-project.org/package=flip.
Finos L, Basso D (2014). “Permutation Tests for Between-Unit Fixed Effects in Multivariate
Generalized Linear Mixed Models.” Statistics and Computing, 24(6), 941–952. doi:10.
1007/s11222-013-9412-6.
Frossard J, Renaud O (2019). permuco: Permutation Tests for Regression, (Repeated Mea-
sures) ANOVA/ANCOVA and Comparison of Signals. R package version 1.1.0, URL
https://CRAN.R-project.org/package=permuco.
Hothorn T, Hornik K, Van De Wiel MA, Zeileis A, et al. (2008). “Implementing a Class
of Permutation Tests: The coin Package.” Journal of Statistical Software, 28(8), 1–23.
doi:10.18637/jss.v028.i08.
Huh MH, Jhun M (2001). “Random Permutation Testing in Multiple Linear Regression.”
Communications in Statistics - Theory and Methods, 30(10), 2023–2032. doi:10.1081/
sta-100106060.
Janssen A (2005). “Resampling Student’s t-Type Statistics.” Annals of the Institute of Sta-
tistical Mathematics, 57(3), 507–529. doi:10.1007/bf02509237.
Janssen A, Pauls T (2003). “How Do Bootstrap and Permutation Tests Work?” The Annals
of Statistics, 31(3), 768–806. doi:10.1214/aos/1056562462.
Kennedy PE (1995). “Randomization Tests in Econometrics.” Journal of Business & Eco-
nomic Statistics, 13(1), 85. doi:10.2307/1392523.
Khatri CG, Rao CR (1968). “Solutions to Some Functional Equations and Their Applica-
tions to Characterization of Probability Distributions.” Sankhyā: The Indian Journal of
Statistics, Series A, pp. 167–180. doi:10.1002/9781118165676.ch5.
Kherad-Pajouh S, Renaud O (2010). “An Exact Permutation Method for Testing Any Effect
in Balanced and Unbalanced Fixed Effect ANOVA.” Computational Statistics & Data
Analysis, 54, 1881–1893. doi:10.1016/j.csda.2010.02.015.
Kherad-Pajouh S, Renaud O (2015). “A General Permutation Approach for Analyzing Re-
peated Measures ANOVA and Mixed-Model Designs.” Statistical Papers, 56(4), 947–967.
doi:10.1007/s00362-014-0617-3.
Konietschke F, Bathke AC, Harrar SW, Pauly M (2015). “Parametric and Nonparametric
Bootstrap Methods for General MANOVA.” Journal of Multivariate Analysis, 140, 291–
301. doi:10.1016/j.jmva.2015.05.001.
Langsrud Ø (2005). “Rotation Tests.” Statistics and Computing, 15(1), 53–60. doi:10.1007/
s11222-005-4789-5.
Lehmann EL, Romano JP (2008). Testing Statistical Hypotheses. Springer-Verlag.
Manly BFJ (1991). Randomization, Bootstrap and Monte Carlo Methods in Biology. Chapman
and Hall/CRC.
Maris E, Oostenveld R (2007). “Nonparametric Statistical Testing of EEG- And MEG-Data.”
Journal of Neuroscience Methods, 164(1), 177–190. doi:10.1016/j.jneumeth.2007.03.
024.
Pauly M, Brunner E, Konietschke F (2015). “Asymptotic Permutation Tests in General
Factorial Designs.” Journal of the Royal Statistical Society B, 77(2), 461–473. doi:10.
1111/rssb.12073.
Pernet CR, Latinus M, Nichols TE, Rousselet GA (2015). “Cluster-Based Computational
Methods for Mass Univariate Analyses of Event-Related Brain Potentials/Fields: A Simu-
lation Study.” Journal of Neuroscience Methods, 250, 85–93. doi:10.1016/j.jneumeth.
2014.08.003.
Raaijmakers JGW, Schrijnemakers JMC, Gremmen F (1999). “How to Deal with “The
Language-as-Fixed-Effect Fallacy”: Common Misconceptions and Alternative Solutions.”
Journal of Memory and Language, 41(3), 416–426. doi:10.1006/jmla.1999.2650.
R Core Team (2021). R: A Language and Environment for Statistical Computing. R Foundation
for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
Searle SR (2006). Linear Models for Unbalanced Data. John Wiley & Sons.
Seber GAF, Lee AJ (2012). Linear Regression Analysis. John Wiley & Sons. doi:10.1002/
0471725315.
ter Braak CJF (1992). “Permutation Versus Bootstrap Significance Tests in Multiple Regres-
sion and Anova.” In KH Jöckel, G Rothe, W Sendler (eds.), Bootstrapping and Related
Techniques, pp. 79–85. Springer-Verlag. doi:10.1007/978-3-642-48850-4_10.
Tipura E, Renaud O, Pegna AJ (2019). “Attention Shifting and Subliminal Cueing under
High Attentional Load: An EEG Study Using Emotional Faces.” Neuroreport. doi:10.
1097/wnr.0000000000001349.
Weiss NA (2015). wPerm: Permutation Tests. R package version 1.0.1, URL
https://CRAN.R-project.org/package=wPerm.
Wheeler B, Torchiano M (2016). lmPerm: Permutation Tests for Linear Models. R package
version 2.1.0, URL https://CRAN.R-project.org/package=lmPerm.
Winkler AM, Ridgway GR, Douaud G, Nichols TE, Smith SM (2016). “Faster Permutation
Inference in Brain Imaging.” NeuroImage, 141, 502–516. doi:10.1016/j.neuroimage.
2016.05.068.
Winkler AM, Ridgway GR, Webster MA, Smith SM, Nichols TE (2014). “Permutation
Inference for the General Linear Model.” NeuroImage, 92, 381–397. doi:10.1016/j.
neuroimage.2014.01.060.
R> install.packages("lmPerm")
R> install.packages("flip")
R> install.packages("GFD")
R> library("permuco")
R> library("lmPerm")
R> library("flip")
R> library("GFD")
R> set.seed(42)
R> emergencycost$LOSc <- scale(emergencycost$LOS, scale = FALSE)
R> contrasts(emergencycost$sex) <- contr.sum
R> contrasts(emergencycost$insurance) <- contr.sum
R> X <- model.matrix( ~ sex+insurance, data = emergencycost)[, -1]
R> colnames(X) <- c("sex_num", "insurance_num")
R> emergencycost <- data.frame(emergencycost, X)
R> anova_permuco <- aovperm(cost ~ sex*insurance, data = emergencycost)
R> anova_GFD <- GFD(cost ~ sex*insurance, data = emergencycost,
+ CI.method = "perm", nperm = 5000)
R> ancova_permuco <- aovperm(cost ~ LOSc*sex*insurance, data = emergencycost,
+ method = "huh_jhun")
R> ancova_flip <- flip(cost ~1, X = ~sex_num, Z = ~LOSc*insurance_num*sex_num
+ - sex_num, data = emergencycost, statTest = "ANOVA", perms = 5000)
R> ancova_lmPerm <- aovp(cost ~ LOS*sex*insurance, data = emergencycost,
+ seqs = FALSE, nCycle = 1)
R> anova_permuco
Anova Table
Resampling test using freedman_lane to handle nuisance variables and
5000 permutations.
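The freedman_lane method named in this output can be sketched in base R for a single variable of interest and a single nuisance variable. This is a minimal illustration on simulated data, not permuco's code: the variable names, the helper f_stat, and the number of permutations are assumptions. The residuals of the reduced model (nuisance only) are permuted, added back to its fitted values, and the F statistic of the full model is recomputed on each permuted response.

```r
set.seed(42)
n <- 60
z <- rnorm(n)                 # nuisance covariate
x <- rnorm(n)                 # variable of interest
y <- 1 + 0.8 * z + rnorm(n)   # simulated under the null hypothesis for x

# F statistic comparing the reduced model (z) to the full model (z + x)
f_stat <- function(y, x, z) anova(lm(y ~ z), lm(y ~ z + x))$F[2]

reduced <- lm(y ~ z)
fit0 <- fitted(reduced)       # part of y explained by the nuisance variable
res0 <- residuals(reduced)    # part of y that is permuted

n_perm <- 999
f_obs <- f_stat(y, x, z)
f_perm <- replicate(n_perm, f_stat(fit0 + sample(res0), x, z))

# The observed statistic is counted as one of the permutations
p_value <- (1 + sum(f_perm >= f_obs)) / (n_perm + 1)
p_value
```

Permuting the residuals of the reduced model, rather than the raw response, is what keeps the relation between the response and the nuisance variable intact across permutations.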
R> anova_GFD
Call:
cost ~ sex * insurance
R> ancova_permuco
Anova Table
Resampling test using huh_jhun to handle nuisance variables and
5000, 5000, 5000, 5000, 5000, 5000, 5000 permutations.
SS df F parametric P(>F)
LOSc 2162110751 1 483.4422 0.0000
sex 14630732 1 3.2714 0.0723
insurance 618366 1 0.1383 0.7105
LOSc:sex 8241073 1 1.8427 0.1765
LOSc:insurance 29107536 1 6.5084 0.0116
sex:insurance 123892 1 0.0277 0.8680
LOSc:sex:insurance 13457877 1 3.0091 0.0846
Residuals 751350616 168
resampled P(>F)
LOSc 0.0002
sex 0.0736
insurance 0.7224
LOSc:sex 0.1756
LOSc:insurance 0.0102
sex:insurance 0.8704
LOSc:sex:insurance 0.0820
Residuals
R> summary(ancova_lmPerm)
Component 1 :
Df R Sum Sq R Mean Sq Iter Pr(Prob)
LOS 1 2162110751 2162110751 5000 <0.0000000000000002
sex 1 14630732 14630732 4159 0.0236
LOS:sex 1 8241073 8241073 1525 0.0616
insurance 1 618366 618366 94 0.5213
LOS:insurance 1 29107536 29107536 5000 0.0010
sex:insurance 1 123892 123892 80 0.5625
LOS:sex:insurance 1 13457877 13457877 2238 0.0429
Residuals 168 751350616 4472325
LOS ***
sex *
LOS:sex .
insurance
LOS:insurance ***
sex:insurance
LOS:sex:insurance *
Residuals
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R> ancova_flip
Warning message:
In checkBalancedData(fixed_formula = formula_f, data = cbind(y, :
The data are not balanced, the results may not be exact.
R> summary(rancova_lmPerm)
Error: id
Component 1 :
Df R Sum Sq R Mean Sq Iter Pr(Prob)
bmic 1 3270 3270 51 0.8824
condition 2 20000 10000 840 0.3009
bmic:condition 2 89238 44619 5000 0.0255 *
Residuals 13 106884 8222
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Error: id:time
Component 1 :
Df R Sum Sq R Mean Sq Iter Pr(Prob)
time 1 1047 1047.4 51 0.9412
bmic:time 1 31 31.5 51 0.8039
condition:time 2 29793 14896.4 240 0.3875
bmic:condition:time 2 29146 14572.9 345 0.3914
Residuals 13 167305 12869.6
Affiliation:
Jaromil Frossard, Olivier Renaud
University of Geneva
Boulevard du Pont d’Arve 40, 1204 Geneva, Switzerland
E-mail: [email protected], [email protected]
URL: http://www.unige.ch/fapse/mad/