Covariate Adaptive False Discovery Rate Control With Applications To Omics-Wide Multiple Testing
Abstract. Conventional multiple testing procedures often assume that hypotheses for different features are exchangeable. However, in many scientific applications, additional covariate information regarding the patterns of signals and nulls is available. In this paper, we introduce an FDR control procedure for large-scale inference problems that can incorporate covariate information. We develop
a fast algorithm to implement the proposed procedure and prove its asymptotic validity even when
the underlying likelihood ratio model is misspecified and the p-values are weakly dependent (e.g.,
strong mixing). Extensive simulations are conducted to study the finite sample performance of the
proposed method and we demonstrate that the new approach improves over the state-of-the-art
approaches by being flexible, robust, powerful and computationally efficient. We finally apply the
method to several omics datasets arising from genomics studies with the aim of identifying omics features associated with clinical and biological phenotypes. We show that the method is overall
the most powerful among competing methods, especially when the signal is sparse. The proposed
Covariate Adaptive Multiple Testing procedure is implemented in the R package CAMT.
Keywords: Covariates, EM-algorithm, False Discovery Rate, Multiple Testing.
1 Introduction
Multiple testing refers to simultaneous testing of more than one hypothesis. Given a set of
hypotheses, multiple testing deals with deciding which hypotheses to reject while guaranteeing
some notion of control on the number of false rejections. A traditional measure is the family-wise
error rate (FWER), which is the probability of committing at least one type I error. As the number
of tests increases, FWER control still guards against even a single false discovery, which is overly stringent in many applications. This absolute control is in contrast to the proportionate control provided by the false discovery rate (FDR).
Xianyang Zhang ([email protected]) is Associate Professor of Statistics at Texas A&M University. Jun
Chen ([email protected]) is Associate Professor of Biostatistics at Mayo Clinic. Zhang acknowledges partial
support from NSF DMS-1830392 and NSF DMS-1811747. Chen acknowledges support from Mayo Clinic Center for
Individualized Medicine.
Consider the problem of testing m distinct hypotheses. Suppose a multiple testing procedure
rejects R hypotheses among which V hypotheses are null, i.e., it commits V type I errors. In the
seminal paper by Benjamini and Hochberg, the authors introduced the concept of FDR, defined as
$$\mathrm{FDR} = E\left[\frac{V}{R \vee 1}\right],$$
where $a \vee b = \max\{a, b\}$ for $a, b \in \mathbb{R}$, and the expectation is with respect to the random quantities V and R. FDR has many advantageous features compared to other existing error measures. Control
of FDR is less stringent than the control of FWER especially when a large number of hypothesis
tests are performed. FDR is also adaptive to the underlying signal structure in the data. The
widespread use of FDR is believed to stem from the advent of modern technologies that produce big datasets, with huge numbers of measurements on a comparatively small number of experimental units. Another reason for the popularity of FDR is the existence of a simple and easy-to-use procedure proposed in Benjamini and Hochberg (1995) (the BH procedure, hereafter).
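For reference, the BH procedure is available in base R; in the minimal sketch below, pval is a generic vector of p-values and 0.05 the target FDR level:

# p.adjust() returns BH-adjusted p-values; rejecting those at or below
# the target level controls the FDR under independence.
reject <- p.adjust(pval, method = "BH") <= 0.05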
Although the BH procedure is more powerful than procedures aiming to control the FWER, it assumes that hypotheses for different features are exchangeable, which, as demonstrated in the recent literature, could result in suboptimal power when individual tests differ in their true effect size, signal-to-noise ratio or prior probability of being false. In many scientific applications, particularly those
from genomics studies, there are rich covariates that are informative of either the statistical power
or the prior null probability. These covariates can be roughly divided into two classes: statistical covariates and external covariates (Ignatiadis et al., 2016). Statistical covariates are derived from
the data itself and could reflect the power or null probability. Generic statistical covariates include
the sample variance, total sample size and sample size ratio (for two-group comparison), and the
direction of the effects. There are also specific statistical covariates for particular applications. For
example, in transcriptomics studies using RNA-Seq, the sum of read counts per gene across all
samples is a statistical covariate informative of power since the low-count genes are subject to more
sampling variability. Similarly, the minor allele frequency and the prevalence of the bacterial species
can be taken as statistical covariates for genome-wide association studies (GWAS) and microbiome-
wide association studies (MWAS), respectively. Moreover, the average methylation level of a CpG
site in epigenome-wide association studies (EWAS) can be a statistical covariate informative of the
prior null probability due to the fact that differential methylation frequently occurs in highly or lowly methylated regions depending on the biological context. Besides these statistical covariates,
there are a plethora of covariates that are derived from external sources and are usually informative
of the prior null probability. These external covariates include the deleteriousness of the genetic
variants for GWAS, the location (island and shore) of CpG methylation variants for EWAS, and
pathogenicity of the bacterial species for MWAS. Useful external covariates also include p-values
from previous or related studies which suggest that some hypotheses are more likely to be non-null than others. Exploiting such external covariates in multiple testing could lead to improved power.
Accommodating covariates in multiple testing has recently been a very active research area.
We briefly review some contributions that are most relevant to the current work. The basic idea
of many existing works is to relax the p-value thresholds for hypotheses that are more likely to
be non-null and tighten the thresholds for the other hypotheses so that the overall FDR level can
be controlled. For example, Genovese et al. (2006) proposed to assign different weights to the p-values and then apply the BH procedure to the weighted p-values. Hu et al. (2010) developed
a group BH procedure by estimating the proportions of null hypotheses for each group separately,
which extends the method in Storey (2002). Li and Barber (2017) generalized this idea by using
the censored p-values (i.e., p-values that are greater than a pre-specified threshold) to adaptively
estimate the weights that can be designed to reflect any structure believed to be present. Ignatiadis et al. (2016) proposed the independent hypothesis weighting (IHW) for multiple testing with
covariate information. Their idea is to bin the covariates into several groups and then apply the
weighted BH procedure with piecewise constant weights. Boca and Leek (2018) extended the idea
by using a regression approach to estimate weights. Another related method (named AdaPT)
was proposed in Lei and Fithian (2018), which iteratively estimates the p-value thresholds using
partially censored p-values. The above procedures can be viewed to some extent as different variants
of the weighted BH procedure. Along a separate line, Local FDR (LFDR) based approaches have
been developed to accommodate various forms of auxiliary information. For example, Cai and Sun
(2009) considered multiple testing of grouped hypotheses using the pooled LFDR statistic. Sun
et al. (2015) developed a LFDR-based procedure to incorporate spatial information. Scott et al.
(2015) and Tansey et al. (2017) proposed EM-type algorithms to estimate the LFDR by taking the covariate information into account.
Although the approaches mentioned above excel in certain aspects, a method that is flexible,
robust, powerful and computationally efficient is still lacking. For example, IHW developed in
Ignatiadis et al. (2016) cannot handle multiple covariates. AdaPT in Lei and Fithian (2018) is
computationally intensive and may suffer from significant power loss when the signal is sparse and the covariate is not very informative. Li and Barber (2017)'s procedure is not Bayes optimal, as shown
in Lei and Fithian (2018) and thus could lead to suboptimal power as observed in our numerical
studies. The FDR regression method proposed in Scott et al. (2015) lacks a rigorous FDR control guarantee. Motivated by these limitations and by extensive simulations covering different signal structures, we propose a new procedure to incorporate covariate information with generic applicability. The covariates can be any continuous or categorical variables that are thought to be informative of the statistical properties of the hypothesis tests. The main contributions of this paper are summarized as follows.
1. Given a sequence of p-values $\{p_1, \ldots, p_m\}$, we introduce a general decision rule of the form
$$(1 - k_i)\, p_i^{-k_i} \ \ge\ \frac{(1 - t)\pi_i}{t(1 - \pi_i)}, \qquad 0 < k_i < 1, \quad 1 \le i \le m, \tag{1}$$
which serves as a surrogate for the optimal decision rule derived under the two-component
mixture model with varying mixing probabilities and alternative densities. Here πi and ki are
parameters that can be estimated from the covariates and p-values, and t is a cut-off value to
be determined by our FDR control method. We develop a new procedure to estimate (ki , πi )
and find the optimal threshold value for t in (1). We show that (i) when πi and ki are chosen
independently of the p-values, the proposed procedure provides finite sample FDR control;
(ii) our procedure provides asymptotic FDR control when πi and ki are chosen to maximize a
potentially misspecified likelihood based on the covariates and p-values; (iii) Similar to some
recent works (e.g., Ignatiadis et al., 2016; Lei and Fithian, 2018; Li and Barber, 2017), our
method allows the underlying likelihood ratio model to be misspecified. A distinctive feature
is that our asymptotic analysis does not require the p-values to be marginally independent or
conditionally independent given the covariates. More specifically, we allow the pairs of p-value and covariate to be weakly dependent across hypotheses; see Assumption 3.3.
2. We develop a fast algorithm to implement the proposed procedure, which is scalable to problems with millions of tests. Through extensive numerical studies, we show that our procedure is highly competitive with several existing approaches in the recent literature in terms of finite sample performance. The proposed procedure is implemented in the R package CAMT.
Our method is related to Lei and Fithian (2018), and it is worth highlighting the differences from
their work. (i) Lei and Fithian (2018) uses partially censored p-values to determine the threshold,
which can discard useful information concerning the alternative distribution of p-values (i.e., f1,i
in (3) below) since small p-values that are likely to be generated from the alternative are censored.
In contrast, we use all the p-values to determine the threshold. Our method is seen to exhibit more power as compared to Lei and Fithian (2018) when the signal is (moderately) sparse. Although our
method no longer offers theoretical finite sample FDR control, we show empirically that the power
gain is not at the cost of FDR control. (ii) Different from Lei and Fithian (2018) which requires
multiple stages for practitioners to make their final decision, our method is a single-stage procedure that only needs to be run once; thus the implementation of our method is faster and scalable to modern big datasets. (iii) Our theoretical analysis is entirely different from that in Lei and Fithian (2018). In particular, we show that our method achieves asymptotic FDR control even when the underlying likelihood ratio model is misspecified and the p-values are weakly dependent.
2 Methodology
Let pi denote the p-value associated with the ith hypothesis, and, with some abuse of notation, let Hi indicate the underlying
truth of the ith hypothesis. In other words, Hi = 0 if the ith hypothesis is true and Hi = 1
otherwise. For each hypothesis, we observe a covariate xi lying in some space X ⊆ Rq with q ≥ 1.
From a Bayesian viewpoint, we can model Hi given xi as a Bernoulli random variable with success
probability 1 − π0i , where π0i denotes the prior probability that the ith hypothesis is under the null
when conditioning on xi. One approach to model the p-value distribution is via a two-component mixture model,
$$H_i \mid x_i \ \sim\ \mathrm{Bernoulli}(1 - \pi_{0i}), \tag{2}$$
$$p_i \mid H_i, x_i \ \sim\ (1 - H_i)\, f_0 + H_i\, f_{1,i}, \tag{3}$$
where f0 and f1,i are the density functions corresponding to the null and alternative hypotheses
respectively. In the following discussions, we shall assume that f0 satisfies the following condition:
$$\int_0^a f_0(x)\,dx \ \le\ \int_{1-a}^{1} f_0(x)\,dx, \qquad 0 \le a \le 0.5. \tag{4}$$
This condition relaxes the assumption of uniform distribution on the unit interval. It is fulfilled
when f0 is non-decreasing or f0 is symmetric about 0.5 (in which case the equality holds in (4)).
We demonstrate that this relaxation is capable of describing plausible data generating processes
that would create a non-uniform null distribution. Let T be a test statistic such that under the null
its z-score Z = (T − µ0 )/σ0 is standard normal. In practice, one uses µ̂ and σ̂ to estimate µ0 and
σ0 respectively. Let Φ be the standard normal CDF. The corresponding one-sided p-value is given
by Φ((T − µ̂)/σ̂) whose distribution function is P (Φ((T − µ̂)/σ̂) ≤ x) = Φ((Φ−1 (x)σ̂ + µ̂ − µ0 )/σ0 ).
When µ0 ≥ µ̂ (i.e., we underestimate the mean), one can verify that f0 is non-decreasing.
Compared to the classical two-component mixture model, the varying null probability reflects
the relative importance of each hypothesis given the external covariate information xi and the
varying alternative density f1,i emphasizes the heterogeneity among signals. In the context
without covariate information, it is well known that the optimal rejection is based on the LFDR,
see e.g., Efron (2004) and Sun and Cai (2007). The result has been generalized to the setups with
group or covariate information, see e.g., Cai and Sun (2009) and Lei and Fithian (2018). Based
on these insights, one can indeed show that the optimal rejection rule that controls the expected number of false positives while maximizing the expected number of true positives takes the form
$$\frac{f_{1,i}(p_i)}{f_0(p_i)} \ \ge\ \frac{(1 - t)\pi_{0i}}{t(1 - \pi_{0i})}, \tag{5}$$
where t ∈ (0, 1) is a cut-off value. This decision rule is generally unobtainable because f1,i is unidentifiable without extra assumptions on its form. Moreover, consistent estimation of the decision rule
(5) is difficult even with the use of additional approximations, such as splines or piecewise
constant functions. In this work, we do not aim to estimate the optimal rejection rule directly.
Instead, we try to find a rejection rule that can mimic some useful operational characteristics of
the optimal rule. Our idea is to first replace f1,i /f0 by a surrogate function hi . We emphasize that
hi need not agree with the likelihood ratio f1,i /f0 for our method to be valid. In fact, the validity
of our method does not rely on the correct specification of model (2)-(3). We require hi to satisfy
(i) hi(p) ≥ 0 for p ∈ [0, 1]; (ii) $\int_0^1 h_i(p)\,dp = 1$; (iii) hi is decreasing. Requirement (iii) is imposed to mimic the common likelihood ratio assumption in the literature, see e.g. Sun and Cai (2007). In this work, we take
$$h_i(p) = (1 - k_i)\, p^{-k_i}, \qquad 0 < k_i < 1, \tag{6}$$
where ki is a parameter that depends on xi. Suppose that under the null hypothesis, pi is
uniformly distributed, whereas under the alternative it follows a beta distribution with parameters (1 − ki, 1); then the true likelihood ratio takes exactly the form given in (6). To demonstrate
the approximation of the proposed surrogate likelihood ratio to the actual likelihood ratio for
realistic problems, we simulated two binary variables and generated four alternative distributions
f1,i depending on the four levels of the two variables (details in the legend of Figure 1). We used
the proposed procedure to find the best ki and compared the CDF of the empirical distribution
(reflecting the actual likelihood ratio) to that of the fitted beta distribution (reflecting the surrogate
likelihood ratio). We can see from Figure 1 that the approximation was reasonably good.
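As a quick illustration of the surrogate family (this is not the CAMT fitting procedure itself), note that (6) is the Beta(1 − k, 1) density, whose CDF is p^(1−k); the following R sketch overlays this CDF on the empirical CDF of p-values drawn under a hypothetical normal alternative:

# Compare the empirical CDF of alternative p-values with the CDF of an
# assumed Beta(1 - k, 1) surrogate; k = 0.7 is an illustrative value.
set.seed(1)
k <- 0.7
z <- rnorm(5000, mean = 2)                 # z-scores under a normal alternative
p <- 1 - pnorm(z)                          # one-sided p-values
plot(ecdf(p), main = "Empirical vs surrogate CDF")
curve(x^(1 - k), add = TRUE, col = "red")  # CDF of Beta(1 - k, 1)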
Based on the surrogate likelihood ratio, the corresponding rejection rule is given by
$$h_i(p_i) \ \ge\ w_i(t) := \frac{(1 - t)\pi_i}{t(1 - \pi_i)}, \tag{7}$$
for some weights πi to be determined later. See Section 2.3 for more details about the estimation
of ki and πi .
We first note that the false discovery proportion (FDP) associated with the rejection rule (7)
is equal to
$$\mathrm{FDP}(t) := \frac{\sum_{i=1}^m (1 - H_i)\,\mathbf{1}\{h_i(p_i) \ge w_i(t)\}}{1 \vee \sum_{i=1}^m \mathbf{1}\{h_i(p_i) \ge w_i(t)\}}.$$
Since each hi is strictly decreasing, hi(pi) ≥ wi(t) if and only if $p_i \le h_i^{-1}(w_i(t))$, and thus
$$\begin{aligned}
\mathrm{FDP}(t) &= \frac{\sum_{i=1}^m (1 - H_i)\,\mathbf{1}\{p_i \le h_i^{-1}(w_i(t))\}}{1 \vee \sum_{i=1}^m \mathbf{1}\{h_i(p_i) \ge w_i(t)\}}
\ \approx\ \frac{\sum_{i=1}^m (1 - H_i)\,P\big(p_i \le h_i^{-1}(w_i(t))\big)}{1 \vee \sum_{i=1}^m \mathbf{1}\{h_i(p_i) \ge w_i(t)\}} \\
&\le \frac{\sum_{i=1}^m (1 - H_i)\,P\big(1 - p_i \le h_i^{-1}(w_i(t))\big)}{1 \vee \sum_{i=1}^m \mathbf{1}\{h_i(p_i) \ge w_i(t)\}}
\ \approx\ \frac{1 + \sum_{i=1}^m (1 - H_i)\,\mathbf{1}\{h_i(1 - p_i) \ge w_i(t)\}}{1 \vee \sum_{i=1}^m \mathbf{1}\{h_i(p_i) \ge w_i(t)\}} \\
&\le \frac{1 + \sum_{i=1}^m \mathbf{1}\{h_i(1 - p_i) \ge w_i(t)\}}{1 \vee \sum_{i=1}^m \mathbf{1}\{h_i(p_i) \ge w_i(t)\}} \ =: \ \mathrm{FDP}_{\mathrm{up}}(t),
\end{aligned}$$
where the approximations are due to the law of large numbers and the inequality follows from Condition (4).¹ This strategy is partly motivated by the recent distribution-free method proposed in Barber and Candès (2015). We refer to any FDR estimator constructed using this strategy as a BC-type estimator. Both the adaptive procedure in Lei and Fithian (2018) and the proposed method fall into this category. A natural idea is to select the largest threshold such that FDPup(t)
¹Rigorous theoretical justifications are provided in Theorem 2.1 and Theorem 3.8.
is less than or equal to a prespecified FDR level α. Specifically, we define
$$t^* = \max\left\{ t \in [0, t_{\mathrm{up}}] : \mathrm{FDP}_{\mathrm{up}}(t) = \frac{1 + \sum_{i=1}^m \mathbf{1}\{h_i(1 - p_i) \ge w_i(t)\}}{1 \vee \sum_{i=1}^m \mathbf{1}\{h_i(p_i) \ge w_i(t)\}} \le \alpha \right\},$$
where $t_{\mathrm{up}}$ satisfies $w_i(t_{\mathrm{up}}) \ge h_i(0.5)$ for all i, and we reject all hypotheses such that hi(pi) ≥
wi(t*). The following theorem establishes the finite sample FDR control of the above procedure when πi and hi are prespecified and thus independent of the p-values, for example, when πi and hi are estimated from external data or a previous study.
Theorem 2.1. Suppose hi is strictly decreasing for each i and f0 satisfies Condition (4). If the p-values are independent and the choice of hi and πi is independent of the p-values, then the adaptive procedure above controls the FDR at level α in finite samples.
2.3 An algorithm
The optimal choices of πi and ki are rarely known in practice, and a generally applicable data-driven approach is needed. Suppose the covariates follow the mixture model
$$x_i \mid H_i \ \sim\ (1 - H_i)\, g_0 + H_i\, g_1,$$
with P(Hi = 0) = π0, where g0 and g1 are density functions. By Bayes' rule, πi = π(xi), where π(x) = g0(x)π0/{g0(x)π0 + g1(x)(1 − π0)} = P(Hi = 0 | xi = x). Therefore, πi is the conditional probability that the ith hypothesis is under the null given the covariate xi.
To motivate our estimation procedure for πi and ki, let us define $\pi_\theta(x) = 1/(1 + e^{-\theta_0 - \theta_1' x})$ and $k_\beta(x) = 1/(1 + e^{-\beta_0 - \beta_1' x})$ for $x \in \mathbb{R}^q$, where θ = (θ0, θ1) and β = (β0, β1). Suppose that, conditional on xi, pi has the density
$$f(p_i \mid x_i) = \pi_\theta(x_i) f_0(p_i) + (1 - \pi_\theta(x_i)) f_{1,i}(p_i) = f_0(p_i)\left\{\pi_\theta(x_i) + (1 - \pi_\theta(x_i)) \frac{f_{1,i}(p_i)}{f_0(p_i)}\right\}.$$
Replacing f1,i /f0 by the surrogate likelihood ratio whose parameters ki depend on xi , we obtain
$$\tilde{f}(p_i \mid x_i) = f_0(p_i)\big\{\pi_\theta(x_i) + (1 - \pi_\theta(x_i))(1 - k_\beta(x_i))\, p_i^{-k_\beta(x_i)}\big\}.$$
Moving to the log scale and summing the individual log-likelihoods, we see that the null density contributes only an additive constant:
$$\sum_{i=1}^m \log \tilde{f}(p_i \mid x_i) = \sum_{i=1}^m \log\big\{\pi_\theta(x_i) + (1 - \pi_\theta(x_i))(1 - k_\beta(x_i))\, p_i^{-k_\beta(x_i)}\big\} + C_0,$$
where $C_0 = \sum_{i=1}^m \log f_0(p_i)$. The above discussions thus motivate the following optimization problem:
$$\max_{\theta = (\theta_0, \theta_1)' \in \Theta,\ \beta = (\beta_0, \beta_1)' \in B}\ \sum_{i=1}^m \log\big\{\pi_i + (1 - \pi_i)(1 - k_i)\, p_i^{-k_i}\big\}, \tag{8}$$
where
$$\log \frac{\pi_i}{1 - \pi_i} = \theta_0 + \theta_1' x_i, \qquad \log \frac{k_i}{1 - k_i} = \beta_0 + \beta_1' x_i, \tag{9}$$
and $\Theta, B \subseteq \mathbb{R}^{q+1}$ are some compact parameter spaces. This problem can be solved using the EM algorithm together with Newton's method in its M-step.
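A minimal sketch of this estimation step is given below; for transparency it maximizes (8) directly with optim() rather than the EM/Newton scheme used in the CAMT package, and it applies the winsorization described next:

# Direct numerical maximization of (8); p is the vector of p-values and
# x an m x q covariate matrix. plogis(z) = 1/(1 + exp(-z)).
fit_camt_params <- function(p, x) {
  x.tilde <- cbind(1, as.matrix(x))                # append the intercept
  q1 <- ncol(x.tilde)
  negloglik <- function(par) {
    pi.i <- plogis(drop(x.tilde %*% par[1:q1]))    # pi_theta(x_i)
    k.i  <- plogis(drop(x.tilde %*% par[-(1:q1)])) # k_beta(x_i)
    -sum(log(pi.i + (1 - pi.i) * (1 - k.i) * p^(-k.i)))
  }
  fit <- optim(rep(0, 2 * q1), negloglik, method = "BFGS")
  pi.raw <- plogis(drop(x.tilde %*% fit$par[1:q1]))
  list(pi.hat = pmin(pmax(pi.raw, 0.1), 1 - 1e-5), # winsorize: eps1 = 0.1, eps2 = 1e-5
       k.hat  = plogis(drop(x.tilde %*% fit$par[-(1:q1)])))
}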
Let θ̂ and β̂ be the maximizer of (8). Define
$$\hat{\pi}_i = W\big(1/(1 + e^{-\tilde{x}_i'\hat{\theta}}),\, \epsilon_1,\, \epsilon_2\big) := \begin{cases} \epsilon_1, & \text{if } 1/(1 + e^{-\tilde{x}_i'\hat{\theta}}) \le \epsilon_1, \\ 1/(1 + e^{-\tilde{x}_i'\hat{\theta}}), & \text{if } \epsilon_1 < 1/(1 + e^{-\tilde{x}_i'\hat{\theta}}) < 1 - \epsilon_2, \\ 1 - \epsilon_2, & \text{otherwise}, \end{cases}$$
and $\hat{k}_i = 1/(1 + e^{-\tilde{x}_i'\hat{\beta}})$ with $\tilde{x}_i = (1, x_i')'$, and
$$\hat{w}_i(t) = \frac{(1 - t)\hat{\pi}_i}{t(1 - \hat{\pi}_i)}.$$
We use winsorization to prevent π̂i from being too close to zero. In numerical studies, we found the choices of ε1 = 0.1 and ε2 = 10⁻⁵ perform reasonably well. Further denote
$$\hat{t} = \max\left\{ t \in [0, 1] : \frac{1 + \sum_{i=1}^m \mathbf{1}\{(1 - \hat{k}_i)(1 - p_i)^{-\hat{k}_i} > \hat{w}_i(t)\}}{1 \vee \sum_{i=1}^m \mathbf{1}\{(1 - \hat{k}_i)\, p_i^{-\hat{k}_i} \ge \hat{w}_i(t)\}} \le \alpha \right\},$$
and we reject the ith hypothesis whenever
$$(1 - \hat{k}_i)\, p_i^{-\hat{k}_i} \ \ge\ \hat{w}_i(\hat{t}).$$
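The cut-off search admits a simple grid-search implementation; the sketch below takes the fitted π̂i and k̂i (e.g., from fit_camt_params() above) and is not the optimized routine in the CAMT package:

# Find the largest t with estimated FDP at most alpha and return the
# rejection indicators.
camt_threshold <- function(p, pi.hat, k.hat, alpha = 0.05) {
  lr  <- (1 - k.hat) * p^(-k.hat)        # surrogate likelihood ratio at p_i
  lrm <- (1 - k.hat) * (1 - p)^(-k.hat)  # mirror statistic at 1 - p_i
  t.grid <- seq(1e-4, 1 - 1e-4, length.out = 1000)
  fdp.up <- vapply(t.grid, function(t) {
    w <- (1 - t) * pi.hat / (t * (1 - pi.hat))  # weights w_i(t)
    (1 + sum(lrm > w)) / max(1, sum(lr >= w))   # FDP upper estimate
  }, numeric(1))
  ok <- which(fdp.up <= alpha)
  if (length(ok) == 0) return(logical(length(p)))     # reject nothing
  t.hat <- t.grid[max(ok)]
  lr >= (1 - t.hat) * pi.hat / (t.hat * (1 - pi.hat)) # rejection indicators
}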
Remark 2.1. We can replace $x_i \in \mathbb{R}^q$ by $(g_1(x_i), \ldots, g_{q_0}(x_i)) \in \mathbb{R}^{q_0}$ for some transformations $(g_1, \ldots, g_{q_0})$ to allow nonlinearity in the logistic regressions. In numerical studies, we shall consider spline basis functions for this purpose.
3 Asymptotic results
In this section, we provide asymptotic justification for the proposed procedure. Note that
$$\mathbf{1}\big\{(1 - \hat{k}_i)\, p_i^{-\hat{k}_i} \ge \hat{w}_i(t)\big\} = \mathbf{1}\big\{p_i \le c(t, \hat{\pi}_i, \hat{k}_i)\big\} \quad \text{for} \quad c(t, \hat{\pi}_i, \hat{k}_i) = 1 \wedge \left\{\frac{t(1 - \hat{k}_i)(1 - \hat{\pi}_i)}{(1 - t)\hat{\pi}_i}\right\}^{1/\hat{k}_i}.$$
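In code, this closed-form cut-off is a one-liner (a sketch consistent with the display above):

# c(t, pi, k) = min(1, {t (1 - k)(1 - pi) / ((1 - t) pi)}^(1/k))
c_cut <- function(t, pi, k) {
  pmin(1, (t * (1 - k) * (1 - pi) / ((1 - t) * pi))^(1 / k))
}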
Define
$$\mathrm{FDR}(t, \Pi, K) = E\left[\frac{\sum_{i=1}^m (1 - H_i)\,\mathbf{1}\{p_i \le c(t, \pi_i, k_i)\}}{1 \vee \sum_{i=1}^m \mathbf{1}\{p_i \le c(t, \pi_i, k_i)\}}\right]$$
with Π = (π1 , . . . , πm ) and K = (k1 , . . . , km ). We make the following assumptions to facilitate our
theoretical derivations.
Assumption 3.1. Suppose the parameter spaces Θ and B are both compact.
Assumption 3.2. Suppose that
$$\frac{1}{m} \sum_{i=1}^m E\Big[\log\big\{\pi_\theta(x_i) + (1 - \pi_\theta(x_i))(1 - k_\beta(x_i))\, p_i^{-k_\beta(x_i)}\big\}\Big]$$
converges uniformly over θ ∈ Θ and β ∈ B to R(θ, β), which has a unique maximum at (θ*, β*) in Θ × B.
Let $\mathcal{F}_a^b = \sigma((x_i, p_i), a \le i \le b)$ be the σ-algebra generated by the random variables (xi, pi) with a ≤ i ≤ b.
Assumption 3.3. Suppose (xi, pi) is α-mixing with $\alpha(v) = O(v^{-\xi})$ for ξ > r/(r − 1) and r > 1 (or φ-mixing with $\phi(v) = O(v^{-\xi})$ for ξ > r/(2r − 1) and r ≥ 1). Further assume $\sup_i E|\log(p_i)|^{r+\delta} < \infty$ and $\max_i \|x_i\|_\infty < C$, where $\|\cdot\|_\infty$ denotes the $l_\infty$ norm of a vector and C, δ > 0.
Assumption 3.1 is standard. Assumption 3.2 is a typical condition in the literature of maximum
likelihood estimation for misspecified models, see e.g. White (1982). Assumption 3.3 relaxes the independence assumption on the pairs (xi, pi) and allows us to establish the uniform strong law of large numbers for the process Rm(θ, β) defined in the proof of Lemma 3.4
below which establishes the uniform strong consistency for π̂i and k̂i . The boundedness assumption
on xi could be relaxed with a more delicate analysis to control its tail behavior and study the
convergence rate of θ̂ and β̂. Denote by ‖·‖ the l2 norm of a vector. An essential condition required in our proof of Lemma 3.4 is $\|\hat{\theta} - \theta^*\| \max_{1 \le i \le n} \|x_i\| = o_{a.s.}(1)$. If $\|\hat{\theta} - \theta^*\| = O_{a.s.}(n^{-a})$ for some a > 0, then by the Borel-Cantelli lemma, we require $\max_{1 \le i \le n} E\|x_i\|^k < \infty$ for some k with ak > 2, i.e., xi should have a sufficiently light polynomial tail. We remark that Assumption
3.3 can be replaced by more primitive conditions which allow other weak dependence conditions,
see, e.g., Pötscher and Prucha (1989). Let $\pi_i^* = W\big(1/(1 + e^{-\tilde{x}_i'\theta^*}), \epsilon_1, \epsilon_2\big)$ and $k_i^* = 1/(1 + e^{-\tilde{x}_i'\beta^*})$.
Assumption 3.5. For two sequences $a_i, b_i \in [\epsilon, 1]$ with ε > 0 small enough, and for all large enough m,
$$\frac{1}{m} \sum_{i=1}^m \big\{P(p_i \le a_i \mid x_i) - P(p_i \le b_i \mid x_i)\big\} \ \le\ c_0 \max_{1 \le i \le m} |a_i - b_i|.$$
Assumption 3.6. Suppose that
$$\frac{1}{m} \sum_{i=1}^m P\big(p_i \le c(t, \pi_i^*, k_i^*)\big) \to G_0(t), \tag{10}$$
$$\frac{1}{m} \sum_{i=1}^m P\big(1 - p_i < c(t, \pi_i^*, k_i^*)\big) \to G_1(t), \tag{11}$$
$$\frac{1}{m} \sum_{H_i = 0} P\big(p_i \le c(t, \pi_i^*, k_i^*)\big) \to \tilde{G}_1(t), \tag{12}$$
for any t ≥ t0 with t0 > 0, where G0(t), G1(t) and G̃1(t) are all continuous functions of t. Note that the probability here is taken with respect to the joint distribution of (pi, xi).
Let U (t) = G1 (t)/G0 (t), where G1 and G0 are defined in Assumption 3.6.
Assumption 3.7. There exists a $t_0' > t_0 > 0$ such that $U(t_0') < \alpha$.
Assumption 3.5 is fulfilled if the conditional density of pi given xi is bounded uniformly across i on [ε, 1]. This assumption is not very strong as we still allow the density to be unbounded around
zero. Assumptions 3.6-3.7 are similar to those in Theorem 4 of Storey et al. (2004). In particular,
Assumption 3.7 ensures the existence of a cut-off to control the FDR at level α.
We are now in a position to state the main result of this section, which shows that the proposed
procedure provides asymptotic FDR control. The proof is deferred to the supplementary material.
Theorem 3.8. Suppose Assumptions 3.1-3.7 hold and f0 satisfies Condition (4). Then the proposed procedure controls the FDR asymptotically at level α, i.e., $\limsup_{m \to \infty} \mathrm{FDR} \le \alpha$.
It is worth mentioning that the validity of our method does not rely on the mixture model assumption (2)-(3). In this sense, our method is misspecification robust, just as the classical BH procedure is. We provide a comparison between our method and some recently proposed approaches in the
following table.
Table 1: Comparison of several covariate adaptive FDR control procedures in recent literature.
The number of "+" represents the speed. *The framework of Li and Barber (2017) allows accommodating multiple covariates, but the provided software does not implement this feature.
We study the asymptotic power of the oracle procedure. Suppose the mixture model (2)-(3)
holds with π0i = π0 (xi ) and f1,i (·) = f1 (·; xi ), where f1 (·; x) is a density function for any fixed
x ∈ X . Denote by F1 (·; x) and F̄1 (·; x) the distribution and survival functions associated with
f1 (·; x) respectively. Suppose the empirical distribution of xi ’s converges to the probability law
P. Consider the oracle procedure with πi = π0(xi) and ki = k0(xi). Here k0(·) minimizes the averaged Kullback-Leibler divergence:
$$k_0 = \operatorname*{argmin}_{k \in \mathcal{K}} \int D_{\mathrm{KL}}\big(f(\cdot; x) \,\|\, g(\cdot; k(x))\big)\, \mathcal{P}(dx),$$
where
$$D_{\mathrm{KL}}\big(f(\cdot; x) \,\|\, g(\cdot; k(x))\big) = \int_0^1 f(p; x) \log \frac{f(p; x)}{g(p; k(x))}\, dp,$$
with $f(p; x) = \pi_0(x) f_0(p) + (1 - \pi_0(x)) f_1(p; x)$, $g(p; k(x)) = \pi_0(x) + (1 - \pi_0(x))(1 - k(x))\, p^{-k(x)}$, and $\mathcal{K} = \big\{ k(x) : \log \frac{k(x)}{1 - k(x)} = \beta_0 + \beta_1' x,\ (\beta_0, \beta_1) \in B \big\}$. Write c(t, x) = c(t, π0(x), k0(x)). By the
law of large numbers, the realized power of the oracle procedure has the approximation
$$\mathrm{Power} = \frac{\sum_{i=1}^m \mathbf{1}\{H_i = 1,\ p_i \le c(t, x_i)\}}{\sum_{i=1}^m \mathbf{1}\{H_i = 1\}} \ \approx\ \frac{\int (1 - \pi_0(x)) F_1(c(t_{\mathrm{opt}}, x); x)\, \mathcal{P}(dx)}{\int (1 - \pi_0(x))\, \mathcal{P}(dx)},$$
where F0 denotes the distribution function corresponding to f0 and topt is the largest t satisfying
$$\frac{\int \{\pi_0(x) F_0(c(t, x)) + (1 - \pi_0(x)) \bar{F}_1(1 - c(t, x); x)\}\, \mathcal{P}(dx)}{\int \{\pi_0(x) F_0(c(t, x)) + (1 - \pi_0(x)) F_1(c(t, x); x)\}\, \mathcal{P}(dx)} \ \le\ \alpha. \tag{13}$$
When
$$\frac{\int (1 - \pi_0(x)) \bar{F}_1(1 - c(t_{\mathrm{opt}}, x); x)\, \mathcal{P}(dx)}{\int \{\pi_0(x) F_0(c(t_{\mathrm{opt}}, x)) + (1 - \pi_0(x)) F_1(c(t_{\mathrm{opt}}, x); x)\}\, \mathcal{P}(dx)} \ \approx\ 0, \tag{14}$$
the asymptotic power of the proposed procedure is close to that of the oracle procedure based on the LFDR given by
$$\mathrm{LFDR}_i(p_i) = \frac{\pi_{0i} f_0(p_i)}{\pi_{0i} f_0(p_i) + (1 - \pi_{0i}) f_{1,i}(p_i)}. \tag{15}$$
4 Simulation studies
In this section, we conduct comprehensive simulations to evaluate the finite sample performance of the proposed method and compare it to competing methods. For genome-scale multiple testing, the number of hypotheses could range from thousands to millions. For demonstration purposes, we start with m = 10,000 hypotheses. To study the impact of signal density and strength, we simulate
three levels of signal density (sparse, medium and dense signals) and six levels of signal strength
(from very weak to very strong). To demonstrate the power improvement from using external covariates, we simulate three levels of covariate informativeness (non-informative, moderately informative and strongly informative). For simplicity, we simulate one covariate xi ∼ N(0, 1) for i = 1, . . . , m.
Given xi , we let
$$\pi_{0i} = \frac{\exp(\eta_i)}{1 + \exp(\eta_i)}, \qquad \eta_i = \eta_0 + k_d x_i,$$
where η0 and kd determine the baseline signal density and the informativeness of the covariate,
respectively. For each simulated dataset, we fix the value of η0 and kd . We set η0 ∈ {3.5, 2.5, 1.5},
which achieves a signal density around 3%, 8%, and 18% respectively at the baseline (i.e., no co-
variate effect), representing sparse, medium and dense signals. We set kd ∈ {0, 1, 1.5}, representing non-informative, moderately informative and strongly informative covariates, respectively. Thus we have a total of 3 × 3 = 9 parameter settings. Based on π0i, the underlying truth Hi is simulated from
$$H_i \sim \mathrm{Bernoulli}(1 - \pi_{0i}).$$
Given Hi, the z-score is drawn from
$$z_i \sim N(k_s H_i, 1),$$
where ks controls the signal strength (effect size) and we use values equally spaced on [2, 2.8]. Z-scores are converted into p-values using the one-sided formula 1 − Φ(zi). The p-values, together with xi, are then supplied to the competing methods.
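The following R sketch reproduces one replicate of S0; the particular values of η0, kd and ks below are illustrative choices within the grids described above:

# One replicate of the basic setup S0.
set.seed(2024)
m    <- 10000
eta0 <- 2.5; kd <- 1; ks <- 2.4    # medium density, moderate informativeness
x    <- rnorm(m)                   # covariate
pi0  <- plogis(eta0 + kd * x)      # pi_0i = exp(eta_i)/(1 + exp(eta_i))
H    <- rbinom(m, 1, 1 - pi0)      # underlying truth
z    <- rnorm(m, mean = ks * H)    # z-scores
pval <- 1 - pnorm(z)               # one-sided p-values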
In addition to the basic setting (denoted as Setup S0), we investigate other settings to study
Setup S1. Additional f1 distribution. Instead of simulating normal z-scores under f1, we simulate z-scores from a non-central gamma distribution with the shape parameter k = 2. The scale/non-centrality parameters of the non-central gamma distribution are chosen to match the variance of the z-scores in the basic setup.
Setup S2. Covariate-dependent π0i and f1,i. On top of the basic setup S0, we simulate another covariate x′i ∼ N(0, 1) and let x′i affect f1,i. Specifically, we scale ks by $\frac{2\exp(k_f x_i')}{1 + \exp(k_f x_i')}$, where we set kf ∈ {0, 0.25, 0.5} for non-informative, moderately informative and strongly informative covariates, respectively.
Setup S3. Dependent hypotheses. We further investigate the effect of dependency among hypotheses. Four correlation structures, including two block correlation structures and two AR(1) correlation structures, are investigated. For the block correlation structure, we divide the 10,000 hypotheses into 500 equal-sized blocks. Within each block, we simulate equal positive correlations (ρ = 0.5) (S3.1). We also further divide the block into 2 by 2 sub-blocks, and simulate negative correlations (ρ = −0.5) between the two sub-blocks (S3.2). For the AR(1) structure, we investigate both ρ = 0.75^|i−j| (S3.3) and ρ = (−0.75)^|i−j| (S3.4); a sketch of the AR(1) construction is given after this list.
Setup S4. Heavy-tail covariate. In this variant, we generate xi from the t distribution with 5 degrees of
freedom.
Setup S5. Non-theoretical null distribution. We simulate both increasing and decreasing f0. For an increasing f0 (S5.1), we generate the null z-score zi | H0 ∼ N(−0.15, 1). For a decreasing f0 (S5.2), we generate zi | H0 ∼ N(0.15, 1).
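Continuing the S0 sketch above, the AR(1) dependence in S3.3, for instance, can be generated as follows (the block structures are analogous):

# AR(1) null dependence with correlation 0.75^|i-j|, rescaled to unit
# marginal variance before adding the signal.
z.ar1 <- as.numeric(arima.sim(list(ar = 0.75), n = m))
z.ar1 <- z.ar1 * sqrt(1 - 0.75^2) + ks * H
pval.ar1 <- 1 - pnorm(z.ar1)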
We present the simulation results for Setups S0-S2 in the main text and the results for Setups S3-S5 in the supplementary material. To allow users to conveniently implement our method and
reproduce the numerical results reported here, we make our code and data publicly available at
https://github.com/jchen1981/CAMT.
We label our method as CAMT (Covariate Adaptive Multiple Testing) and compare it to the following methods:
• Oracle: Oracle procedure based on the LFDR (see e.g., (15)) with the simulated π0i and f1,i, which provides the theoretically optimal power;
• ST: Storey’s BH procedure (Storey 2002, qvalue package, v2.10.0);
• BL: Boca and Leek procedure (Boca and Leek, 2018, swfdr package, v1.4.0);
• IHW: Independent hypothesis weighting (Ignatiadis et al., 2016, IHW package, v1.6.0);
• FDRreg: False discovery rate regression (Scott et al., 2015, FDRreg package, v0.2);
• SABHA: Structure adaptive BH procedure (Li and Barber, 2017, τ = 0.5, ε = 0.1 and stepwise constant weights);
• AdaPT: Adaptive p-value thresholding procedure (Lei and Fithian, 2018, adaptMT package,
v1.0.0).
We evaluate the performance based on FDR control (false discovery proportion) and power
(true positive rate) with a target FDR level of 5%. Results are averaged over 100 simulation runs.
We first study the performance of the proposed method under the basic setup (S0, Figure 2).
All compared methods generally controlled the FDR around or under the nominal level of 0.05, and no serious FDR inflation was observed at any of the parameter settings (Figure 2A). However, FDRreg
exhibited a slight FDR inflation under some parameter settings and the inflation seemed to increase
with the informativeness of the covariate and signal density. Conservativeness was also observed
for some methods in some cases. As expected, the BH procedure, which did not take into account
π0, was conservative when the signal was dense. The IHW procedure was generally more conservative
than BH and the conservativeness increased with the informativeness of the covariate. CAMT,
the proposed method, was conservative when the signal was sparse and the covariate was less
informative. The conservativeness was more evident when the effect size was small but decreased as
the effect size became larger. AdaPT was more conservative than CAMT under sparse signal/weak
covariate. In terms of power (Figure 2B), there were several interesting observations. First, as the
covariate became more informative, all the covariate adaptive methods became more powerful than
ST and BH. The power differences between these methods also increased. Second, FDRreg was the
most powerful across settings. Under a highly informative covariate, it was even slightly above the
oracle procedure, which theoretically has the optimal power. The superior power of FDRreg could
be partly explained by a less well controlled FDR. The IHW was more powerful than BL/SABHA
when the signal was sparse; but the trend reversed when the signal was dense. Third, AdaPT was
very powerful when the signal was dense and the covariate was highly informative. However, the
power decreased as the signal became more sparse and the covariate became less informative. In
fact, when the signal was sparse and the covariate was not informative or moderately informative,
AdaPT had the lowest power. In contrast, the proposed method CAMT was close to the oracle
procedure. It was comparable to AdaPT when AdaPT was the most powerful, but was significantly
more powerful than AdaPT in its unfavorable scenarios. CAMT had a clear edge when the covariate
was informative and signal was sparse. Similar to AdaPT, CAMT had some power loss under sparse
signal and non-informative covariate, probably due to the discretization effect from the BC-type
estimator.
We conducted more evaluations on type I error control under S0. We investigated the FDR
control across different target levels. Figure 3 showed excellent FDR control across target levels
for all methods except FDRreg. The actual FDR level of BH and IHW was usually below the
target level. CAMT was slightly conservative at a small target level under the scenario of sparse
signal and less informative covariate, but it became less conservative at larger target levels. We
also simulated a complete null, where no signal was included (Figure 4). In such case, FDR was
reduced to FWER. Interestingly, FDRreg was as conservative as CAMT and AdaPT under the
complete null.
It is interesting to study the performance of the competing methods under a much larger feature size, lower signal density, and weaker signal strength, representing the most challenging scenario in real problems. To this end, we simulated m = 100,000 features with a signal density of
0.5% at the baseline (no covariate effect). Under a moderately informative covariate, we observed a
substantial power improvement of CAMT over all other methods including FDRreg while controlling
the FDR adequately at different target levels (Figure 5). We further reduced the feature size to
1,000 (Figure A1 in the supplement) to study the robustness of the methods to a much smaller
feature size. Although CAMT and AdaPT were still more powerful than the competing methods
when the signal was dense and the covariate was informative, a significant power loss was observed
in other parameter settings, particularly under sparse signal and a less informative covariate. As
we further decreased the feature size to 200, CAMT and AdaPT became universally less powerful
than ST across parameter settings (data not shown). Therefore, application of CAMT or AdaPT
to datasets with small numbers of features is not recommended unless the signal is dense and the covariate is informative.
We also simulated datasets where the z-scores under the alternative were drawn from a non-
central gamma distribution (Setup S1). Under such setting, the trend remained almost the same
as the basic setup (Figure 6), but FDRreg had a more marked FDR inflation. When both π0i and
f1,i depended on the covariate (Setup S2), CAMT became slightly more powerful without affecting
the FDR control, especially when the covariate was highly informative (Figure 7). Meanwhile,
the performance of FDRreg was also remarkable with a very small FDR inflation. However, if we
increased the effect on f1,i by reducing the standard deviation of the z-score under the alternative,
FDRreg was no longer robust and the observed FDP was substantially above the target level when
the signal strength was weak, indicating the benefit of modeling covariate-dependent f1 (Figure
A2 in the supplement). CAMT was also robust to different correlation structures (Setup S3.1,
S3.2, S3.3, S3.4) and we observed similar performance under these correlation structures (Figures
A3-A6 in the supplement). The performance of CAMT was also robust to a heavy-tail covariate
(Setup S4, Figure A7 in the supplement). In an unreported numerical study, we added different levels of perturbation to the covariate by multiplying it by random values drawn from Unif(0.95, 1.05), Unif(0.9, 1.1), and Unif(0.8, 1.2), respectively. We observed that the π0 estimates under perturbation are highly correlated with the π0 estimates without perturbation, which showed the robustness of CAMT to perturbations of the covariate.
We also examined the robustness of CAMT to the deviation from the theoretical null (Setup
S5). Specifically, we simulated both decreasing and increasing f0 . The new results were presented in
Figures A8 and A9 in the supplement. We observed that, for an increasing f0, all the methods other than FDRreg were conservative and had substantially less power than the oracle procedure. FDRreg
using a theoretical null was conservative when the covariate was less informative but was anti-
conservative under a highly informative covariate. On the other hand, FDRreg using an empirical
null had an improved power and controlled the FDR closer to the target level for most settings.
However, it did not control the FDR well when the signal was dense and the prior information was
strong. When f0 was decreasing, all the methods without using the empirical null failed to control
the FDR. FDRreg with an empirical null improved the FDR control substantially for most settings
but still could not control the FDR well under the dense-signal and strong-prior setting. Therefore,
there is still room for improvement to address the empirical null problem.
Finally, we compared the computational efficiency of these competing methods (Figure 8).
SABHA (step function) and IHW were computationally the most efficient and they completed
the analysis for one million p-values in less than two minutes. CAMT and the new version of
FDRreg (v0.2) were also computationally efficient, followed by BL, and they all could complete
the computation in minutes for one million p-values under S0. AdaPT was computationally the
most intensive and completed the analysis in hours for one million p-values. We note that all the
methods including AdaPT are computationally feasible for a typical omics dataset.
In summary, CAMT improves over existing covariate adaptive multiple testing procedures, and
is a powerful, robust and computationally efficient tool for large-scale multiple testing.
5 Real data applications
To demonstrate the use of the proposed method for real-world applications, we applied CAMT
to several omics datasets from transcriptomics, proteomics, epigenomics and metagenomics studies
with the aim of identifying omics features associated with the phenotype of interest. Since AdaPT is the most state-of-the-art method, we focused our comparison on it. To make a fair comparison, we first ran the analyses on the four omics datasets that were also evaluated by AdaPT (Lei and Fithian, 2018), including Bottomly (Bottomly et al., 2011), Pasilla (Brooks et al., 2011), Airway
(Himes et al., 2014) and Yeast Protein dataset (Dephoure et al., 2012). The Bottomly, Pasilla
and Airway were three transcriptomics datasets from RNA-seq experiments with a feature size
of 13,932, 11,836 and 33,469, respectively. The yeast protein dataset was a proteomics dataset with a feature size of 2,666. We used the same methods to calculate the p-values for these
datasets as described in Lei and Fithian (2018). The distributions of the p-values for these four
datasets all exhibited a spike in the low p-value region, indicating that the signal was dense. The
logarithm of normalized count (averaged across all samples) was used as the univariate covariate
Figure 1: Illustration of the approximation to the true likelihood ratio by the surrogate likelihood ratio based on a beta distribution. Two binary covariates x1 and x2 were simulated (m = 50,000 tests with a four-component f1). The z-score under the alternative was drawn from N(0, 1.5 + 0.5x1 + x2). Three levels of null proportions (A - 99%, B - 95%, and C - 80%) were simulated, where the null z-score was drawn from N(0, 1). Two-sided p-values were calculated based on the z-score and the parameter ki of the beta distribution was estimated by CAMT. The CDF of the empirical distribution of the p-values under the alternative (black) was compared to the CDF of the fitted beta distribution (red).
Figure 2: Performance comparison under the basic setting (S0). False discovery proportions (A)
and true positive rates (B) were averaged over 100 simulation runs. Error bars (A) represent the
95% CIs and the dashed horizontal line indicates the target FDR level of 0.05.
Figure 3: FDR control at various target levels (0.01 - 0.20) under the basic setting (S0) and a
medium signal strength. False discovery proportions were averaged over 100 simulation runs and
the deviation from the target level (y-axis) was plotted.
for the three RNA-seq data (Bottomly, Pasilla and Airway). The logarithm of the total number of
peptides across all samples was used as the univariate covariate for the yeast protein data. Following
AdaPT, we used a spline basis with six equiquantile knots for π0i , f1,i (CAMT and AdaPT) and
for π0i (FDRreg, BL) to account for potentially complex nonlinear effects. Since IHW and SABHA can only take a univariate covariate, we used the univariate covariate directly. We summarized
the results in Figure 9. We were able to reproduce the results in Lei and Fithian (2018). Indeed,
AdaPT was more powerful than SABHA, IHW, ST and BH on the four datasets. FDRreg and BL,
which were not compared in Lei and Fithian (2018), also performed well and made more rejections
than other methods on the Yeast dataset and the Bottomly dataset, respectively. The performance
of the proposed method, CAMT, was almost identical to AdaPT, which was consistent with the
simulation results in the scenario of dense signal and informative covariate (Figure 2).
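A sketch of how the spline covariate basis mentioned above can be constructed is given below; u is a placeholder for the univariate covariate, and the resulting basis matrix is then supplied as a multivariate covariate:

# Natural cubic spline basis with six equiquantile interior knots.
library(splines)
knots <- quantile(u, probs = (1:6) / 7)
X <- ns(u, knots = knots)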
We next applied CAMT to two additional omics datasets from an epigenome-wide association study (EWAS) of congenital heart disease (CHD) (Wijnands et al., 2017) and a microbiome-wide association study (MWAS) of sex.
Figure 4: FDR control at various target levels (0.01 - 0.20) under the complete null (no signal was
simulated). False discovery proportions were averaged over 1,000 simulation runs and the deviation
from the target level (y-axis) was plotted.
[Figure 5: false discovery proportion (A) and number of rejections (B) at various target FDR levels under the large feature size (m = 100,000) and sparse signal setting.]
Figure 6: Performance comparison under S1 (f1,i : non-central gamma distribution). False discovery
proportions (A) and true positive rates (B) were averaged over 100 simulation runs. Error bars (A)
represent the 95% CIs and the dashed horizontal line indicates the target FDR level of 0.05.
Figure 7: Performance comparison under S2 (covariate-dependent π0,i and f1,i ). False discovery
proportions (A) and true positive rates (B) were averaged over 100 simulation runs. Error bars (A)
represent the 95% CIs and the dashed horizontal line indicates the target FDR level of 0.05.
Figure 8: Comparison of runtime under the basic setting (S0). Medium signal density and strength, and a moderately informative covariate were simulated. The number of features varied from 10³ to 4⁵ × 10³. The average runtime over three replications was plotted against the feature size on a log scale. The computation was performed on an AMD Opteron CPU with 256GB RAM and 16 MB available cache.
• EWAS data. The aim of the EWAS of CHD was to identify the CpG loci in the human genome
that were differentially methylated between healthy (n = 196) and CHD (n = 84) children.
The methylation levels of 455,741 CpGs were measured by the Illumina 450K methylation beadchip and were normalized properly before analysis. The p-values were produced by running a linear regression of the methylation outcome for each CpG, adjusting for potential confounders such as age, sex and blood cell mixtures, as described in Wijnands et al. (2017). Since the methylation level of a CpG is informative of its potential involvement in human diseases (Robertson, 2005), we use the mean methylation across samples as the univariate covariate.
• MWAS data. The aim of the MWAS of sex was to identify differentially abundant bacteria
in the gut microbiome between males and females, where the abundances of the gut bacteria
were determined by sequencing a fingerprint gene in the bacteria, the 16S rRNA gene. We used the publicly available data from the AmericanGut project (McDonald et al., 2018), where the gut microbiomes of more than 10,000 subjects were sequenced. We focused
Figure 9: The number of rejections at different target FDR levels on four real datasets used to demonstrate the performance of AdaPT. The Bottomly (A), Pasilla (B) and Airway (C) datasets were three transcriptomics datasets from RNA-seq experiments with feature sizes of 13,932, 11,836 and 33,469, respectively. The yeast protein dataset (D) was a proteomics dataset with a feature size of 2,666.
Figure 10: The number of rejections at different target FDR levels on two real datasets: EWAS of congenital heart disease (A) and MWAS of sex effect (B). The EWAS dataset was produced by the Illumina 450K methylation beadchip (m = 455,741) and the MWAS dataset was produced by 16S rRNA gene amplicon sequencing (m = 2,492).
our analysis on a relatively homogeneous subset consisting of 481 males and 335 females (age between 13-70, normal BMI, from the United States). We removed OTUs (clustered sequencing units representing bacterial species) observed in less than 5 subjects, and a total of 2,492 OTUs were tested using the Wilcoxon rank sum test on the normalized abundances. We use the percentage of zeros across samples as the univariate covariate since we expect a much lower proportion of non-null hypotheses among the rare OTUs.
The results for these two datasets were summarized in Figure 10. For the EWAS data, the signal
density was very sparse (π̂0 =0.99, qvalue package). CAMT identified far more loci than the other
methods at various FDR levels. The performance was consistent with the simulation results in
the scenario of extremely sparse signal and informative covariate, where CAMT was substantially
more powerful than the competing methods (Figure 5). At an FDR of 20%, we identified 55
differentially methylated CpGs, compared to 19 for AdaPT. These 55 CpG loci were mainly located
in CpG islands and the gene promoter regions, which are known for their important role in gene expression regulation (Robertson, 2005). Interestingly, all but one of the CpG loci had low levels of methylation, indicating the methylation level was indeed informative in helping identify differential CpGs. We also performed gene set enrichment analysis for the genes where the identified CpGs were located. Among the GO terms (BP DIRECT), three were found to be significant (unadjusted p-value < 0.05), including
one term “embryonic heart tube development”, which was very relevant to the congenital heart
disease under study (Wijnands et al., 2017). As a sanity check, we randomized the covariate and
re-analyzed the data using CAMT. As expected, CAMT became similar to BH/ST and identified a comparable number of CpGs.
For the MWAS data, although the difference was not as striking as the EWAS data, CAMT
was still overall more powerful than the other competing methods except FDRreg. However, given the fact that FDRreg was not robust under certain scenarios, the interpretation of its increased power should be cautious. The relationship between the fitted π0i and the covariate (number of nonzeros) was very interesting: π̂0i first decreased, reached a minimum at around 70 nonzeros and then
increased (Figure 11). When the OTU was rare (e.g., a small number of nonzeros, only a few
subjects had it), it was either very individualized or we had limited power to reject it, leading to a
Figure 11: Performance on the MWAS dataset. (A) The fitted π0i (logit scale) vs. the covariate
(number of nonzeros). (B) p-value (log scale) vs. the covariate (number of nonzeros). Rejected
hypotheses at FDR 10% were in red.
large π0i. In the other extreme, where the OTU was very prevalent (e.g., a large number of nonzeros, most of the subjects had it), it was probably not sex-specific either. Therefore, taking into account
the sparsity level could increase the power of MWAS. It is also informative to compare CAMT to the common practice of applying an abundance filter before performing multiple testing correction, based on the idea that rare OTUs are less likely to be significant and including them will increase the multiple testing burden. A subjective filtering
criterion has to be determined beforehand. For this MWAS dataset, if we removed OTUs present
in less than 10% of the subjects, ST and BH recovered 116 and 85 significant OTUs at an FDR
of 10%, compared to 69 and 65 on the original dataset, indicating that filtering did improve the
statistical power of traditional FDR control procedures. However, if we removed OTUs present in
less than 20% of the subjects, the numbers of significant OTUs by ST and BH reduced to 71 and
50, respectively. Therefore, filtering could potentially leave out biologically important OTUs. In contrast, CAMT did not require an explicit filtering criterion, and was much more powerful (141 significant OTUs at an FDR of 10%).
6 Discussions
There are generally two strategies for estimating the number of false rejections $\sum_{i=1}^m (1 - H_i)\mathbf{1}\{h_i(p_i) \ge w_i(t)\}$ given the form of the rejection rule hi(pi) ≥ wi(t). The first approach (called the BH-type estimator) is to replace the number of false rejections by its expectation assuming that pi follows the uniform distribution on [0, 1] under the null, which leads to the quantity $\sum_{i=1}^m \pi_{0i}\, c(t, \pi_{0i}, k_i)$ for c(·) defined in Section 3. The second approach (called the BC-type estimator) counts the mirror statistics, $\sum_{i=1}^m \mathbf{1}\{h_i(1 - p_i) \ge w_i(t)\}$, plus a constant ξ, under the assumption that the null distribution of p-values is symmetric about 0.5. Both
procedures enjoy optimality in some asymptotic sense, see, e.g., Arias-Castro and Chen (2017). The advantage of the BC-type procedure lies in the fact that its estimation of the number of false rejections is asymptotically conservative when the rejection rule converges to a non-random limit (which holds even under a misspecified model, see e.g., White, 1982) and f0 is mirror conservative (see equation (3) of Lei and Fithian, 2018). This fact allows us to estimate the rejection rule by maximizing a potentially misspecified likelihood, as the resulting rejection rule has a non-random limit under suitable conditions. This is not necessarily the case for the BH-type estimator without imposing additional constraints when estimating π0i and ki. A specific restriction on the estimators of π0i is required for the BH-type estimator to achieve FDR control, see, e.g., equation (3) of Li and Barber (2018).
On the other hand, as the BC-type estimator uses a counting approach to estimate the number of false rejections, it suffers from a discretization issue (the BC-type estimator is a step function of $t$ while the BH-type estimator is continuous in $t$), which may result in a large variance for the FDR estimate. This is especially the case when the target FDR level is small: the number of rejections is then usually small, so both the denominator and the numerator of the FDR estimate become small and more variable. Another issue with the BC-type estimator is the selection of $\xi$. We follow the idea of knockoff+ in Barber and Candès (2015) by setting $\xi = 1$. This choice could make the procedure rather conservative when the signal is very sparse and the target FDR level is small. A smaller choice of $\xi$ (e.g., $\xi = 0$) often leads to inflated FDR in our unreported simulation studies. To alleviate this issue, one may consider a mixed strategy by using
\[
\max\left\{\sum_{i=1}^m \pi_{0i}\, c(t, \pi_{0i}, k_i),\ \sum_{i=1}^m \mathbf{1}\{h_i(1 - p_i) \ge w_i(t)\}\right\}
\]
as a conservative estimate of the number of false rejections when $t$ is relatively small. Our numerical results in Figure A10 in the supplementary material show that the resulting method can successfully reduce the power loss in the case of sparse signals (or small FDR levels) and less informative covariates, while maintaining the good power performance in other cases; a one-line sketch of this estimate is given below. A thorough investigation of this mixed procedure and the BH-type estimator is left for future research.
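Reusing c.thresh from the previous sketch (same hypothetical inputs), the mixed estimate is simply the larger of the two quantities:

V.hat.mixed <- function(t, pvals, pi0, k) {
  cc <- c.thresh(t, pi0, k)                    # c.thresh as defined in the earlier sketch
  max(sum(pi0 * cc),                           # BH-type estimate
      sum((1 - pvals) < cc))                   # BC-type mirror count (without the constant xi)
}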
Since our method is not robust to a decreasing $f_0$, some diagnostics are needed before running CAMT. To detect a decreasing $f_0$, the genomic inflation factor (GIF) can be employed (Devlin and Roeder, 1999). The GIF is defined as the ratio of the median of the observed test statistics to the expected median under the theoretical null distribution. It has been widely used in genome-wide association studies to assess the deviation of the empirical distribution of the null p-values from the theoretical uniform distribution. To accommodate potentially dense signals in some genomics studies, we recommend confining the GIF calculation to p-values between 0.5 and 1. If the GIF is substantially larger than 1, using CAMT may result in excess false positives. In such cases, the user should not trust the results and may consider recalculating the p-values by adjusting for potential confounding factors, either known or estimated based on some latent variable approach such as surrogate variable analysis (Leek and Storey, 2007), or using the simple genomic control approach (Devlin and Roeder, 1999).
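One possible implementation of the confined GIF is sketched below in R, assuming two-sided z-tests so that p-values can be mapped back to chi-squared statistics with one degree of freedom; the function name and the exact construction are illustrative and not part of the CAMT package.

gif.upper <- function(pvals) {
  p <- pvals[pvals >= 0.5]              # confine to p-values in [0.5, 1]
  obs <- qchisq(1 - median(p), df = 1)  # observed median chi-squared(1) statistic
  expc <- qchisq(1 - 0.75, df = 1)      # expected median: uniform p on [0.5, 1] has median 0.75
  obs / expc
}
# gif.upper(pvals) well above 1 points to an inflated (decreasing) null density f0.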
References
[1] Arias-Castro, E., and Chen, S. (2017). Distribution-free multiple testing. Electronic Journal of Statistics, 11, 1983-2001.
[2] Barber, R. F., and Candès, E. J. (2015). Controlling the false discovery rate via knockoffs. Annals of Statistics, 43, 2055-2085.
[3] Benjamini, Y., and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B, 57, 289-300.
[4] Bottomly, D., Walter, N.A., Hunter, J.E., Darakjian, P., Kawane, S., Buck, K.J., Searles, R.P., Mooney, M., McWeeney, S.K., and Hitzemann, R. (2011). Evaluating gene expression in C57BL/6J and DBA/2J mouse striatum using RNA-Seq and microarrays. PLoS ONE, 6, e17820.
[5] Brooks, A. N., Yang, L., Duff, M. O., Hansen, K. D., Park, J. W., Dudoit, S., Brenner, S. E., and Graveley, B. R. (2011). Conservation of an RNA regulatory map between Drosophila and mammals. Genome Research, 21, 193-202.
[6] Cai, T. T., and Sun, W. (2009). Simultaneous testing of grouped hypotheses: finding needles in multiple haystacks. Journal of the American Statistical Association, 104, 1467-1481.
[7] Dephoure, N., and Gygi, S. P. (2012). Hyperplexing: a method for higher-order multiplexed quantitative proteomics provides a map of the dynamic response to rapamycin in yeast. Science Signaling, 5, rs2.
[8] Devlin, B., and Roeder, K. (1999). Genomic control for association studies. Biometrics, 55, 997-1004.
[9] Efron, B. (2004). Local false discovery rate. Technical report, Stanford University, Dept. of Statistics.
[10] Genovese, C. R., Roeder, K., and Wasserman, L. (2006). False discovery control with p-value weighting. Biometrika, 93, 509-524.
[11] Himes, B.E., Jiang, X., Wagner, P., Hu, R., Wang, Q., Klanderman, B., Whitaker, R. M., Duan, Q., Lasky-Su, J., Nikolos, C., and Jester, W. (2014). RNA-Seq transcriptome profiling identifies CRISPLD2 as a glucocorticoid responsive gene that modulates cytokine function in airway smooth muscle cells. PLoS ONE, 9, e99625.
[12] Hu, J. X., Zhao, H., and Zhou, H. H. (2010). False discovery rate control with groups. Journal of the American Statistical Association, 105, 1215-1227.
[13] Ignatiadis, N., Klaus, B., Zaugg, J. B., and Huber, W. (2016). Data-driven hypothesis weighting increases detection power in genome-scale multiple testing. Nature Methods, 13, 577-580.
[14] Leek, J. T., and Storey, J. D. (2007). Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genetics, 3, e161.
[15] Lei, L., and Fithian, W. (2018). AdaPT: an interactive procedure for multiple testing with side information. Journal of the Royal Statistical Society, Series B, 80, 649-679.
[16] Li, A., and Barber, R. F. (2017). Multiple testing with the structure adaptive Benjamini-Hochberg algorithm. arXiv:1606.07926.
[17] McDonald, D., Hyde, E., Debelius, J.W., Morton, J.T., Gonzalez, A., Ackermann, G., et al. (2018). American Gut: an open platform for citizen science microbiome research. mSystems, 3, e00031-18.
[18] Pötscher, B. M., and Prucha, I. R. (1989). A uniform law of large numbers for dependent and heterogeneous data processes. Econometrica, 57, 675-683.
[19] Robertson, K. D. (2005). DNA methylation and human disease. Nature Reviews Genetics, 6, 597-610.
[20] Scott, J. G., Kelly, R. C., Smith, M. A., Zhou, P., and Kass, R. E. (2015). False discovery rate regression: an application to neural synchrony detection in primary visual cortex. Journal of the American Statistical Association, 110, 459-471.
[21] Storey, J. D. (2002). A direct approach to false discovery rates. Journal of the Royal Statistical Society, Series B, 64, 479-498.
[22] Storey, J. D., Taylor, J. E., and Siegmund, D. (2004). Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach. Journal of the Royal Statistical Society, Series B, 66, 187-205.
[23] Sun, W., Reich, B. J., Cai, T. T., Guindani, M., and Schwartzman, A. (2015). False discovery control in large-scale spatial multiple testing. Journal of the Royal Statistical Society, Series B, 77, 59-83.
[24] Tansey, W., Koyejo, O., Poldrack, R. A., and Scott, J. G. (2017). False discovery rate smoothing. arXiv:1411.6144.
[25] White, H. (1982). Maximum likelihood estimation of misspecified models. Econometrica, 50, 1-25.
[26] Wijnands, K.P., Chen, J., Liang, L., Verbiest, M.M., Lin, X., Helbing, W.A., et al. (2016). Genome-wide methylation analysis identifies novel CpG loci for perimembranous ventricular septal defects in human. Epigenomics, 9, 241-251.
Supplement to “Covariate Adaptive False Discovery Rate Control
with Applications to Omics-Wide Multiple Testing”
A1 Technical details

Observe that
\[
\begin{aligned}
\text{FDR} &= E\left[\frac{\sum_{i=1}^m (1 - H_i)\mathbf{1}\{h_i(p_i) \ge w_i(t^*)\}}{1 \vee \sum_{i=1}^m \mathbf{1}\{h_i(p_i) \ge w_i(t^*)\}}\right] \\
&= E\left[\frac{1 + \sum_{i=1}^m (1 - H_i)\mathbf{1}\{h_i(1 - p_i) \ge w_i(t^*)\}}{1 \vee \sum_{i=1}^m \mathbf{1}\{h_i(p_i) \ge w_i(t^*)\}} \cdot \frac{\sum_{i=1}^m (1 - H_i)\mathbf{1}\{h_i(p_i) \ge w_i(t^*)\}}{1 + \sum_{i=1}^m (1 - H_i)\mathbf{1}\{h_i(1 - p_i) \ge w_i(t^*)\}}\right] \\
&\le \alpha\, E\left[\frac{\sum_{i=1}^m (1 - H_i)\mathbf{1}\{h_i(p_i) \ge w_i(t^*)\}}{1 + \sum_{i=1}^m (1 - H_i)\mathbf{1}\{h_i(1 - p_i) \ge w_i(t^*)\}}\right] \\
&= \alpha\, E\left[\frac{\sum_{i=1}^m (1 - H_i)\mathbf{1}\{\psi_i(p_i) \le t^*\}}{1 + \sum_{i=1}^m (1 - H_i)\mathbf{1}\{\psi_i(1 - p_i) \le t^*\}}\right], \tag{A1}
\end{aligned}
\]
where
\[
\psi_i(p_i) = \frac{\pi_i}{\pi_i + (1 - \pi_i)h_i(p_i)}.
\]
Define the order statistics $b_{(1)} \le b_{(2)} \le \cdots \le b_{(m_0)}$ for $\{b_i : H_i = 0\}$, where $m_0$ is the number of hypotheses under the null. As $t^* \le t_{up}$, we can find an integer $J \le m_0$ such that $b_{(J)} \le t^* < b_{(J+1)}$. Then for any such $J$,
\[
\frac{\sum_{i=1}^m (1 - H_i)\mathbf{1}\{\psi_i(p_i) \le t^*\}}{1 + \sum_{i=1}^m (1 - H_i)\mathbf{1}\{\psi_i(1 - p_i) \le t^*\}} = \frac{(1 - B_1) + \cdots + (1 - B_J)}{1 + B_1 + B_2 + \cdots + B_J} = \frac{J + 1}{1 + B_1 + B_2 + \cdots + B_J} - 1,
\]
where $B_i = \mathbf{1}\{p_{(i)} \ge 0.5\}$. Under Condition (4) of the main paper, we have $\min_{i: H_i = 0} P(p_i \ge 0.5) \ge 1/2$, which implies
\[
E\left[\frac{J + 1}{1 + B_1 + B_2 + \cdots + B_J} - 1\right] \le 1.
\]
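As a quick numerical illustration of this bound (not part of the proof), one can simulate the boundary case where the $B_i$ are independent Bernoulli(1/2) indicators; the R snippet below is a sketch under that assumption.

set.seed(1)
J <- 50
ratios <- replicate(1e5, {
  B <- rbinom(J, size = 1, prob = 0.5)   # mirror indicators B_i = 1{p_(i) >= 0.5}
  (J + 1) / (1 + sum(B)) - 1
})
mean(ratios)                             # approximately 1 - 2^(-J), i.e., just below the bound of 1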
Proof of Lemma 3.4. Let $r_i(\theta, \beta) = \log\{\pi_\theta(x_i) + (1 - \pi_\theta(x_i))(1 - k_\beta(x_i))p_i^{-k_\beta(x_i)}\}$ and $R_m(\theta, \beta) = \frac{1}{m}\sum_{i=1}^m r_i(\theta, \beta)$ for $\theta \in \Theta$ and $\beta \in B$. Under Assumption 3.1 and the boundedness of $x_i$, there exist $c_1$ and $c_2$ such that $0 < c_1 \le \pi_\theta(x_i) \le 1 - c_1 < 1$ and $0 < c_2 \le k_\beta(x_i) \le 1 - c_2 < 1$ for all $i$, $\theta \in \Theta$ and $\beta \in B$, so that
\[
|r_i(\theta, \beta)| \le \left|\log\{\pi_\theta(x_i) + (1 - \pi_\theta(x_i))(1 - c_2)p_i^{-(1 - c_2)}\}\right| \le c_3\{1 + \log(1/p_i)\}
\]
for some constant $c_3 > 0$. Under Assumption 3.3, by Corollary 1 of Pötscher and Prucha (1989), we have
\[
\sup_{\theta \in \Theta,\, \beta \in B}\left|R_m(\theta, \beta) - E[R_m(\theta, \beta)]\right| \to_{a.s.} 0.
\]
Lemma 2.2 of White (1982) states that if $(\hat\theta, \hat\beta)$ minimizes $-R_m$ and $(\theta^*, \beta^*)$ uniquely minimizes $-R$, then $(\hat\theta, \hat\beta) \to_{a.s.} (\theta^*, \beta^*)$.
Finally, notice that
\[
\max_{1 \le i \le m}\left|\frac{1}{1 + e^{-\tilde{x}_i'\theta^*}} - \frac{1}{1 + e^{-\tilde{x}_i'\hat\theta}}\right| = \max_{1 \le i \le m}\frac{|e^{-\tilde{x}_i'\theta^*} - e^{-\tilde{x}_i'\hat\theta}|}{(1 + e^{-\tilde{x}_i'\hat\theta})(1 + e^{-\tilde{x}_i'\theta^*})} \le C\|\hat\theta - \theta^*\|
\]
for $\tilde{x}_i = (1, x_i)'$ and some constant $C > 0$, where the inequality follows from the mean value theorem applied to $e^{-x}$ and the boundedness of $x_i$. It implies that $\max_{1 \le i \le m}|\hat\pi_i - \pi_i^*| \to_{a.s.} 0$. The other result, $\max_{1 \le i \le m}|\hat{k}_i - k_{\beta^*}(x_i)| \to_{a.s.} 0$, follows in a similar way.

Lemma A1.1. Suppose Assumption 3.3 holds. Then for any fixed $t$,
\[
\frac{1}{m}\sum_{i=1}^m \mathbf{1}\{p_i \le c(t, \pi_i^*, k_i^*)\} \to_{a.s.} G_0(t), \tag{A2}
\]
\[
\frac{1}{m}\sum_{i=1}^m \mathbf{1}\{1 - p_i < c(t, \pi_i^*, k_i^*)\} \to_{a.s.} G_1(t), \tag{A3}
\]
\[
\frac{1}{m}\sum_{H_i = 0} \mathbf{1}\{p_i \le c(t, \pi_i^*, k_i^*)\} \to_{a.s.} \tilde{G}_1(t), \tag{A4}
\]
\[
\frac{1}{m}\sum_{i=1}^m \left[\mathbf{1}\{p_i \le t_i\} - P(p_i \le t_i)\right] \to_{a.s.} 0, \tag{A5}
\]
where $\psi_i(p_i, x_i) = \pi_{\theta^*}(x_i)/\{\pi_{\theta^*}(x_i) + (1 - \pi_{\theta^*}(x_i))(1 - k_{\beta^*}(x_i))p_i^{-k_{\beta^*}(x_i)}\}$.

Proof of Lemma A1.1. Under Assumption 3.3, $\mathbf{1}\{\psi_i(p_i, x_i) \le t\}$ and $\mathbf{1}\{\psi_i(1 - p_i, x_i) < t\}$ are both $\alpha$-mixing (or $\phi$-mixing) processes. Thus by the strong law of large numbers for mixing processes, we get
\[
\frac{1}{m}\sum_{i=1}^m \left[\mathbf{1}\{p_i \le c(t, \pi_i^*, k_i^*)\} - P(p_i \le c(t, \pi_i^*, k_i^*))\right] \to_{a.s.} 0,
\]
\[
\frac{1}{m}\sum_{i=1}^m \left[\mathbf{1}\{1 - p_i < c(t, \pi_i^*, k_i^*)\} - P(1 - p_i < c(t, \pi_i^*, k_i^*))\right] \to_{a.s.} 0,
\]
\[
\frac{1}{m}\sum_{H_i = 0} \left[\mathbf{1}\{p_i \le c(t, \pi_i^*, k_i^*)\} - P(p_i \le c(t, \pi_i^*, k_i^*))\right] \to_{a.s.} 0,
\]
which, combined with the definitions of $G_0$, $G_1$ and $\tilde{G}_1$, yields the desired results.
Lemma A1.2. Suppose Assumptions 3.3, 3.5 and 3.6 hold. Then for small enough $\epsilon > 0$, we have
\[
\sup_{\|K - K^*\|_\infty < \epsilon}\ \sup_{\|\Pi - \Pi^*\|_\infty < \epsilon}\ \sup_{t \ge t_0}\left|\frac{1}{m}\sum_{i=1}^m \mathbf{1}\{p_i \le c(t, \pi_i, k_i)\} - G_0(t)\right| \le C_1\epsilon + o_{a.s.}(1), \tag{A6}
\]
\[
\sup_{\|K - K^*\|_\infty < \epsilon}\ \sup_{\|\Pi - \Pi^*\|_\infty < \epsilon}\ \sup_{t \ge t_0}\left|\frac{1}{m}\sum_{i=1}^m \mathbf{1}\{1 - p_i < c(t, \pi_i, k_i)\} - G_1(t)\right| \le C_2\epsilon + o_{a.s.}(1), \tag{A7}
\]
\[
\sup_{\|K - K^*\|_\infty < \epsilon}\ \sup_{\|\Pi - \Pi^*\|_\infty < \epsilon}\ \sup_{t \ge t_0}\left|\frac{1}{m}\sum_{i=1}^m (1 - H_i)\mathbf{1}\{p_i \le c(t, \pi_i, k_i)\} - \tilde{G}_1(t)\right| \le C_3\epsilon + o_{a.s.}(1), \tag{A8}
\]
where $C_1, C_2, C_3 > 0$ are independent of $\epsilon$, $\|K - K^*\|_\infty = \max_{1 \le i \le m}|k_i - k_i^*|$ and $\|\Pi - \Pi^*\|_\infty = \max_{1 \le i \le m}|\pi_i - \pi_i^*|$.
Proof of Lemma A1.2. We only prove (A6) as the proofs for the other results are similar. For any integer $v$ with $\lfloor nt_0 \rfloor \le v \le n$, let $q_{v,n} = v/n$. Define
\[
\hat{G}(t, \Pi, K) = \frac{1}{m}\sum_{i=1}^m \mathbf{1}\{p_i \le c(t, \pi_i, k_i)\}.
\]
Note that $\hat{G}(t, \Pi, K)$ and $G_0(t)$ are both non-decreasing functions of $t$. Denote by $\hat{G}(t-, \Pi, K)$ and $G_0(t-)$ the left limits of $\hat{G}$ and $G_0$ at the point $t$ respectively. Following the proof of the Glivenko-Cantelli theorem, it suffices to control the deviations at the grid points $q_{v,n}$.
We note that for any $\|K - K^*\|_\infty < \epsilon$ and $\|\Pi - \Pi^*\|_\infty < \epsilon$,
\[
c(t, \pi_i, k_i) \le c_+(t, \pi_i^*, k_i^*, \epsilon) := 1 \wedge \left\{\frac{t(1 - k_i^* + \epsilon)(1 - \pi_i^* + \epsilon)}{(1 - t)(\pi_i^* - \epsilon)}\right\}^{1/(k_i^* + \epsilon)},
\]
\[
c(t, \pi_i, k_i) \ge c_-(t, \pi_i^*, k_i^*, \epsilon) := 1 \wedge \left\{\frac{t(1 - k_i^* - \epsilon)(1 - \pi_i^* - \epsilon)}{(1 - t)(\pi_i^* + \epsilon)}\right\}^{1/(k_i^* - \epsilon)}.
\]
Also, note that $c_-(t, \pi_i^*, k_i^*, \epsilon)$ is bounded away from zero for $t \ge \lfloor nt_0 \rfloor/n$ and large enough $n$. Define
\[
\hat{G}_+(t, \Pi^*, K^*, \epsilon) = \frac{1}{m}\sum_{i=1}^m \mathbf{1}\{p_i \le c_+(t, \pi_i^*, k_i^*, \epsilon)\},
\]
\[
\hat{G}_-(t, \Pi^*, K^*, \epsilon) = \frac{1}{m}\sum_{i=1}^m \mathbf{1}\{p_i \le c_-(t, \pi_i^*, k_i^*, \epsilon)\}.
\]
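The sandwich $c_- \le c \le c_+$ can also be checked numerically; the R sketch below (an illustration only, with arbitrarily chosen values) evaluates the three thresholds at a perturbed $(\pi_i, k_i)$ inside the $\epsilon$-box.

c.t <- function(t, pi0, k) pmin(1, (t * (1 - k) * (1 - pi0) / ((1 - t) * pi0))^(1 / k))
c.plus <- function(t, pi0, k, eps)
  pmin(1, (t * (1 - k + eps) * (1 - pi0 + eps) / ((1 - t) * (pi0 - eps)))^(1 / (k + eps)))
c.minus <- function(t, pi0, k, eps)
  pmin(1, (t * (1 - k - eps) * (1 - pi0 - eps) / ((1 - t) * (pi0 + eps)))^(1 / (k - eps)))
t <- 0.2; pi0 <- 0.8; k <- 0.5; eps <- 0.05
cmid <- c.t(t, pi0 + eps / 2, k - eps / 2)     # a (pi_i, k_i) within the eps-box
c(c.minus(t, pi0, k, eps) <= cmid, cmid <= c.plus(t, pi0, k, eps))  # both TRUE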
We deduce that
\[
\hat{G}_-(t, \Pi^*, K^*, \epsilon) \le \hat{G}(t, \Pi, K) \le \hat{G}_+(t, \Pi^*, K^*, \epsilon). \tag{A9}
\]
Next we analyze the term $|\hat{G}_+(q_{v,n}, \Pi^*, K^*, \epsilon) - G_0(q_{v,n})|$. As $\pi_i^*$ and $k_i^*$ are both bounded away from zero and one,
\[
|P(p_i \le c_+(q_{v,n}, \pi_i^*, k_i^*, \epsilon)) - P(p_i \le c(q_{v,n}, \pi_i^*, k_i^*))| \le E\left|P(p_i \le c_+(q_{v,n}, \pi_i^*, k_i^*, \epsilon) \mid x_i) - P(p_i \le c(q_{v,n}, \pi_i^*, k_i^*) \mid x_i)\right| \le c_3\epsilon. \tag{A10}
\]
Then we have
\[
|\hat{G}_+(q_{v,n}, \Pi^*, K^*, \epsilon) - G_0(q_{v,n})| \le c_3\epsilon + o_{a.s.}(1),
\]
which follows from Assumption 3.5, (A2), (A5) and (A10). Similar arguments can be used to deal with the other terms. Therefore,
\[
\sup_{\|K - K^*\|_\infty < \epsilon}\ \sup_{\|\Pi - \Pi^*\|_\infty < \epsilon}\ \sup_{t \ge t_0}\left|\hat{G}(t, \Pi, K) - G_0(t)\right| \le c_3\epsilon + 1/n + o_{a.s.}(1) \le (c_3 + 1)\epsilon + o_{a.s.}(1).
\]
Lemma A1.3. Suppose the assumptions of Lemma A1.2 hold. Then
\[
\sup_{t \ge t_0}\left|\frac{1}{m}\sum_{i=1}^m \mathbf{1}\{p_i \le c(t, \hat\pi_i, \hat{k}_i)\} - G_0(t)\right| = o_{a.s.}(1), \tag{A11}
\]
\[
\sup_{t \ge t_0}\left|\frac{1}{m}\sum_{i=1}^m \mathbf{1}\{1 - p_i < c(t, \hat\pi_i, \hat{k}_i)\} - G_1(t)\right| = o_{a.s.}(1), \tag{A12}
\]
\[
\sup_{t \ge t_0}\left|\frac{1}{m}\sum_{i=1}^m (1 - H_i)\mathbf{1}\{p_i \le c(t, \hat\pi_i, \hat{k}_i)\} - \tilde{G}_1(t)\right| = o_{a.s.}(1). \tag{A13}
\]
Proof of Lemma A1.3. Define the event $A_{m,\epsilon} = \{\|\hat{K} - K^*\|_\infty < \epsilon,\ \|\hat\Pi - \Pi^*\|_\infty < \epsilon\}$. Conditional on $A_{m,\epsilon}$, by Lemma A1.2 and Assumption 3.5, we have
\[
\begin{aligned}
&\sup_{t \ge t_0}\frac{1}{m}\sum_{i=1}^m \left|\mathbf{1}\{p_i \le c(t, \hat\pi_i, \hat{k}_i)\} - \mathbf{1}\{p_i \le c(t, \pi_i^*, k_i^*)\}\right| \\
&\le 2\sup_{\|K - K^*\|_\infty < \epsilon}\ \sup_{\|\Pi - \Pi^*\|_\infty < \epsilon}\ \sup_{t \ge t_0}\left|\frac{1}{m}\sum_{i=1}^m \left[\mathbf{1}\{p_i \le c(t, \pi_i, k_i)\} - P(p_i \le c(t, \pi_i, k_i))\right]\right| \\
&\quad + \sup_{t \ge t_0}\frac{1}{m}\sum_{i=1}^m \left|P(p_i \le c(t, \hat\pi_i, \hat{k}_i)) - P(p_i \le c(t, \pi_i^*, k_i^*))\right| \\
&\le c_0\max_{1 \le i \le m}\sup_{t \ge t_0}\left|c(t, \hat\pi_i, \hat{k}_i) - c(t, \pi_i^*, k_i^*)\right| + c_4\epsilon + o_{a.s.}(1).
\end{aligned}
\]
As $P(A_{m,\epsilon}) \to 1$, the conclusion follows. The proofs for (A12) and (A13) are similar and we omit the details.
Lemma A1.4. Suppose the assumptions of Lemma A1.2 hold. Then
\[
\sup_{t \ge t_0}\left|\frac{\sum_{i=1}^m \mathbf{1}\{1 - p_i < c(t, \hat\pi_i, \hat{k}_i)\}}{\sum_{i=1}^m \mathbf{1}\{p_i \le c(t, \hat\pi_i, \hat{k}_i)\}} - \frac{G_1(t)}{G_0(t)}\right| = o_{a.s.}(1), \tag{A14}
\]
\[
\sup_{t \ge t_0}\left|\frac{\sum_{i=1}^m (1 - H_i)\mathbf{1}\{p_i \le c(t, \hat\pi_i, \hat{k}_i)\}}{\sum_{i=1}^m \mathbf{1}\{p_i \le c(t, \hat\pi_i, \hat{k}_i)\}} - \frac{\tilde{G}_1(t)}{G_0(t)}\right| = o_{a.s.}(1). \tag{A15}
\]

Proof of Lemma A1.4. Denote $m^{-1}\sum_{i=1}^m \mathbf{1}\{1 - p_i < c(t, \hat\pi_i, \hat{k}_i)\}$ and $m^{-1}\sum_{i=1}^m \mathbf{1}\{p_i \le c(t, \hat\pi_i, \hat{k}_i)\}$ by $G_{m,1}$ and $G_{m,0}$ respectively. The monotonicity of $G_0$ implies that $G_0(t) \ge G_0(t_0) > 0$, which together with (A11) and (A12) yields (A14) uniformly for any $t \ge t_0$. Similar arguments can be used to prove the other result.
Lemma A1.5. Suppose $f_0$ satisfies Condition (4) of the main paper. Under Assumption 3.6, we have $\tilde{G}_1(t) \le G_1(t)$ for any $t \ge t_0$.

Proof of Lemma A1.5. Note that
\[
\begin{aligned}
\tilde{G}_1(t) &= \lim_{m \to +\infty}\frac{1}{m}\sum_{H_i = 0} P(p_i \le c(t, \pi_i^*, k_i^*)) \\
&\le \lim_{m \to +\infty}\frac{1}{m}\sum_{H_i = 0} P(1 - p_i < c(t, \pi_i^*, k_i^*)) \\
&\le \lim_{m \to +\infty}\frac{1}{m}\sum_{i=1}^m P(1 - p_i < c(t, \pi_i^*, k_i^*)) = G_1(t)
\end{aligned}
\]
for $t \ge t_0$, where the first inequality is due to the assumption that $P(p_i \le a) \le P(1 - p_i \le a)$ for any $a \in [0, 1]$ and $p_i \sim f_0$.
By Lemma A1.4, there exist $t' \ge t_0$ and a constant $e > 0$ such that, almost surely for large enough $m$,
\[
\frac{\sum_{i=1}^m \mathbf{1}\{1 - p_i < c(t', \hat\pi_i, \hat{k}_i)\}}{\sum_{i=1}^m \mathbf{1}\{p_i \le c(t', \hat\pi_i, \hat{k}_i)\}} \le \alpha - e/2 < \alpha,
\]
which implies that $\hat{t} \ge t'$. It then follows that
\[
\begin{aligned}
&\alpha - \frac{\sum_{i=1}^m (1 - H_i)\mathbf{1}\{p_i \le c(\hat{t}, \hat\pi_i, \hat{k}_i)\}}{\sum_{i=1}^m \mathbf{1}\{p_i \le c(\hat{t}, \hat\pi_i, \hat{k}_i)\}} \\
&\ge \frac{\sum_{i=1}^m \mathbf{1}\{1 - p_i < c(\hat{t}, \hat\pi_i, \hat{k}_i)\}}{\sum_{i=1}^m \mathbf{1}\{p_i \le c(\hat{t}, \hat\pi_i, \hat{k}_i)\}} - \frac{\sum_{i=1}^m (1 - H_i)\mathbf{1}\{p_i \le c(\hat{t}, \hat\pi_i, \hat{k}_i)\}}{\sum_{i=1}^m \mathbf{1}\{p_i \le c(\hat{t}, \hat\pi_i, \hat{k}_i)\}} \\
&\ge \inf_{t \ge t'}\Bigg\{\frac{\sum_{i=1}^m \mathbf{1}\{1 - p_i < c(t, \hat\pi_i, \hat{k}_i)\}}{\sum_{i=1}^m \mathbf{1}\{p_i \le c(t, \hat\pi_i, \hat{k}_i)\}} - \frac{G_1(t)}{G_0(t)} + \frac{G_1(t) - \tilde{G}_1(t)}{G_0(t)} \\
&\qquad\quad + \frac{\tilde{G}_1(t)}{G_0(t)} - \frac{\sum_{i=1}^m (1 - H_i)\mathbf{1}\{p_i \le c(t, \hat\pi_i, \hat{k}_i)\}}{\sum_{i=1}^m \mathbf{1}\{p_i \le c(t, \hat\pi_i, \hat{k}_i)\}}\Bigg\} \ge o_{a.s.}(1),
\end{aligned}
\]
where we have used the fact that $G_1(t) \ge \tilde{G}_1(t)$ as shown in Lemma A1.5. It implies that
\[
\frac{\sum_{i=1}^m (1 - H_i)\mathbf{1}\{p_i \le c(\hat{t}, \hat\pi_i, \hat{k}_i)\}}{\sum_{i=1}^m \mathbf{1}\{p_i \le c(\hat{t}, \hat\pi_i, \hat{k}_i)\}} \le \alpha + o_{a.s.}(1).
\]
Finally, by Fatou's lemma,
\[
\limsup_{m} \text{FDR}(\hat{t}, \hat\Pi, \hat{K}) \le \limsup_{m} E\left[\frac{\sum_{i=1}^m (1 - H_i)\mathbf{1}\{p_i \le c(\hat{t}, \hat\pi_i, \hat{k}_i)\}}{\sum_{i=1}^m \mathbf{1}\{p_i \le c(\hat{t}, \hat\pi_i, \hat{k}_i)\}}\right] \le \alpha,
\]
which completes the proof.
Figure A10 provides some numerical results for the mixed strategy discussed in Section 6.
References
[1] Barber, R. F., and Candès, E. J. (2016). A knockoff filter for high-dimensional selective inference. arXiv:1602.03574.
[2] Pötscher, B. M., and Prucha, I. R. (1989). A uniform law of large numbers for dependent and heterogeneous data processes. Econometrica, 57, 675-683.
[3] Resnick, S. I. (2005). A Probability Path. Springer Science & Business Media.
[4] White, H. (1982). Maximum likelihood estimation of misspecified models. Econometrica, 50, 1-25.
[Figure A1 here. Panels: (A) false discovery proportion and (B) true positive rate, plotted against effect size, under non-informative, moderate and strong priors (columns) and sparse, medium and dense signals (rows); methods: Oracle, CAMT, AdaPT, IHW, FDRreg(T), SABHA, BL, ST and BH.]
Figure A1: Performance comparison with m = 1,000 under the basic setting (S0). False discovery proportions (A) and true positive rates (B) were averaged over 100 simulation runs. Error bars (A) represent the 95% CIs and the dashed horizontal line indicates the target FDR level of 0.05.
[Figure A2 here. Same layout as Figure A1: panels (A) false discovery proportion and (B) true positive rate versus effect size, across prior strengths (columns) and signal densities (rows).]
Figure A2: Performance comparison under S2 (covariate-dependent π0,i and f1,i; the standard deviation of the z-score under H1 is 0.5). False discovery proportions (A) and true positive rates (B) were averaged over 100 simulation runs. Error bars (A) represent the 95% CIs and the dashed horizontal line indicates the target FDR level of 0.05.
[Figure A3 here. Same layout as Figure A1.]
Figure A3: Performance comparison under S3.1 (block correlation structure, positive correlations (ρ = 0.5) within blocks). False discovery proportions (A) and true positive rates (B) were averaged over 100 simulation runs. Error bars (A) represent the 95% CIs and the dashed horizontal line indicates the target FDR level of 0.05.
[Figure A4 here. Same layout as Figure A1.]
Figure A4: Performance comparison under S3.2 (block correlation structure, positive/negative correlations (ρ = ±0.5) within blocks). False discovery proportions (A) and true positive rates (B) were averaged over 100 simulation runs. Error bars (A) represent the 95% CIs and the dashed horizontal line indicates the target FDR level of 0.05.
[Figure A5 here. Same layout as Figure A1.]
Figure A5: Performance comparison under S3.3 (AR(1) structure, positive correlations (ρ = 0.75)). False discovery proportions (A) and true positive rates (B) were averaged over 100 simulation runs. Error bars (A) represent the 95% CIs and the dashed horizontal line indicates the target FDR level of 0.05.
[Figure A6 here. Same layout as Figure A1.]
Figure A6: Performance comparison under S3.4 (AR(1) structure, positive/negative correlations (ρ = ±0.75)). False discovery proportions (A) and true positive rates (B) were averaged over 100 simulation runs. Error bars (A) represent the 95% CIs and the dashed horizontal line indicates the target FDR level of 0.05.
[Figure A7 here. Same layout as Figure A1.]
Figure A7: Performance comparison under S4 (heavy-tailed covariate). False discovery proportions (A) and true positive rates (B) were averaged over 100 simulation runs. Error bars (A) represent the 95% CIs and the dashed horizontal line indicates the target FDR level of 0.05.
[Figure A8 here. Same layout as Figure A1, with the additional method FDRreg(E).]
Figure A8: Performance comparison under S5.1 (increasing f0). False discovery proportions (A) and true positive rates (B) were averaged over 100 simulation runs. Error bars (A) represent the 95% CIs and the dashed horizontal line indicates the target FDR level of 0.05. FDRreg(T) and FDRreg(E) represent the FDRreg method using the theoretical and empirical null respectively.
[Figure A9 here. Same layout as Figure A1, with the additional method FDRreg(E).]
Figure A9: Performance comparison under S5.2 (decreasing f0). False discovery proportions (A) and true positive rates (B) were averaged over 100 simulation runs. Error bars (A) represent the 95% CIs and the dashed horizontal line indicates the target FDR level of 0.05. FDRreg(T) and FDRreg(E) represent the FDRreg method using the theoretical and empirical null respectively.
[Figure A10 here. Same layout as Figure A1.]
Figure A10: Performance comparison under the basic setting (S0). CAMT used the mixed strategy discussed in Section 6. False discovery proportions (A) and true positive rates (B) were averaged over 100 simulation runs. Error bars (A) represent the 95% CIs and the dashed horizontal line indicates the target FDR level of 0.05.