
Covariate Adaptive False Discovery Rate Control with Applications to Omics-Wide Multiple Testing

Xianyang Zhang and Jun Chen

arXiv:1909.04811v2 [stat.ME] 10 Jun 2020

Abstract: Conventional multiple testing procedures often assume that hypotheses for different features are exchangeable. However, in many scientific applications, additional covariate information regarding the patterns of signals and nulls is available. In this paper, we introduce an FDR control procedure for large-scale inference problems that can incorporate covariate information. We develop a fast algorithm to implement the proposed procedure and prove its asymptotic validity even when the underlying likelihood ratio model is misspecified and the p-values are weakly dependent (e.g., strong mixing). Extensive simulations are conducted to study the finite sample performance of the proposed method, and we demonstrate that the new approach improves over the state-of-the-art approaches by being flexible, robust, powerful and computationally efficient. We finally apply the method to several omics datasets arising from genomics studies with the aim of identifying omics features associated with some clinical and biological phenotypes. We show that the method is overall the most powerful among competing methods, especially when the signal is sparse. The proposed Covariate Adaptive Multiple Testing procedure is implemented in the R package CAMT.

Keywords: Covariates, EM-algorithm, False Discovery Rate, Multiple Testing.

1 Introduction

Multiple testing refers to the simultaneous testing of more than one hypothesis. Given a set of hypotheses, multiple testing deals with deciding which hypotheses to reject while guaranteeing some notion of control on the number of false rejections. A traditional measure is the family-wise error rate (FWER), which is the probability of committing at least one type I error. As the number of tests increases, FWER still measures the probability of at least one false discovery, which is overly stringent in many applications. This absolute control is in contrast to the proportionate control afforded by the false discovery rate (FDR).

¹ Xianyang Zhang ([email protected]) is Associate Professor of Statistics at Texas A&M University. Jun Chen ([email protected]) is Associate Professor of Biostatistics at Mayo Clinic. Zhang acknowledges partial support from NSF DMS-1830392 and NSF DMS-1811747. Chen acknowledges support from the Mayo Clinic Center for Individualized Medicine.
Consider the problem of testing m distinct hypotheses. Suppose a multiple testing procedure rejects R hypotheses, among which V hypotheses are null, i.e., it commits V type I errors. In the seminal paper by Benjamini and Hochberg, the authors introduced the concept of FDR, defined as

   FDR = E[V / (R ∨ 1)],

where a ∨ b = max{a, b} for a, b ∈ R, and the expectation is with respect to the random quantities V and R. FDR has many advantageous features compared to other existing error measures. Control of FDR is less stringent than control of FWER, especially when a large number of hypothesis tests are performed. FDR is also adaptive to the underlying signal structure in the data. The widespread use of FDR is believed to stem from modern technologies which produce big datasets, with huge numbers of measurements on a comparatively small number of experimental units. Another reason for the popularity of FDR is the existence of a simple and easy-to-use procedure proposed in Benjamini and Hochberg (1995) (the BH procedure, hereafter) to control the FDR at a prespecified level.
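For reference, the BH procedure is available in base R through p.adjust; a minimal sketch (with simulated p-values, purely for illustration) is:

    ## Minimal BH illustration with simulated p-values (illustration only).
    set.seed(1)
    p <- c(rbeta(100, 0.2, 1),  # 100 alternative p-values, concentrated near 0
           runif(900))          # 900 null p-values
    p.bh <- p.adjust(p, method = "BH")
    sum(p.bh <= 0.05)           # number of rejections at FDR level 0.05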

Although the BH procedure is more powerful than procedures aiming to control the FWER, it assumes that hypotheses for different features are exchangeable, which could result in suboptimal power, as demonstrated in recent literature, when individual tests differ in their true effect size, signal-to-noise ratio or prior probability of being false. In many scientific applications, particularly those from genomics studies, there are rich covariates that are informative of either the statistical power or the prior null probability. These covariates can be roughly divided into two classes: statistical covariates and external covariates (Ignatiadis et al., 2016). Statistical covariates are derived from the data itself and could reflect the power or null probability. Generic statistical covariates include the sample variance, total sample size and sample size ratio (for two-group comparison), and the direction of the effects. There are also specific statistical covariates for particular applications. For example, in transcriptomics studies using RNA-Seq, the sum of read counts per gene across all samples is a statistical covariate informative of power, since the low-count genes are subject to more sampling variability. Similarly, the minor allele frequency and the prevalence of the bacterial species can be taken as statistical covariates for genome-wide association studies (GWAS) and microbiome-wide association studies (MWAS), respectively. Moreover, the average methylation level of a CpG site in epigenome-wide association studies (EWAS) can be a statistical covariate informative of the prior null probability, due to the fact that differential methylation frequently occurs in highly or lowly methylated regions depending on the biological context. Besides these statistical covariates, there is a plethora of covariates that are derived from external sources and are usually informative of the prior null probability. These external covariates include the deleteriousness of the genetic variants for GWAS, the location (island and shore) of CpG methylation variants for EWAS, and the pathogenicity of the bacterial species for MWAS. Useful external covariates also include p-values from previous or related studies, which suggest that some hypotheses are more likely to be non-null than others. Exploiting such external covariates in multiple testing could lead to improved statistical power as well as enhanced interpretability of research results.

Accommodating covariates in multiple testing has recently been a very active research area. We briefly review some contributions that are most relevant to the current work. The basic idea of many existing works is to relax the p-value thresholds for hypotheses that are more likely to be non-null and tighten the thresholds for the other hypotheses so that the overall FDR level can be controlled. For example, Genovese et al. (2006) proposed to weight the p-values with different weights, and then apply the BH procedure to the weighted p-values. Hu et al. (2010) developed a group BH procedure by estimating the proportions of null hypotheses for each group separately, which extends the method in Storey (2002). Li and Barber (2017) generalized this idea by using the censored p-values (i.e., p-values that are greater than a pre-specified threshold) to adaptively estimate the weights, which can be designed to reflect any structure believed to be present. Ignatiadis et al. (2016) proposed independent hypothesis weighting (IHW) for multiple testing with covariate information. Their idea is to bin the covariates into several groups and then apply the weighted BH procedure with piecewise constant weights. Boca and Leek (2018) extended the idea by using a regression approach to estimate the weights. Another related method (named AdaPT) was proposed in Lei and Fithian (2018), which iteratively estimates the p-value thresholds using partially censored p-values. The above procedures can be viewed to some extent as different variants of the weighted BH procedure. Along a separate line, local FDR (LFDR) based approaches have been developed to accommodate various forms of auxiliary information. For example, Cai and Sun (2009) considered multiple testing of grouped hypotheses using the pooled LFDR statistic. Sun et al. (2015) developed an LFDR-based procedure to incorporate spatial information. Scott et al. (2015) and Tansey et al. (2017) proposed EM-type algorithms to estimate the LFDR by taking into account covariate and spatial information, respectively.

Although the approaches mentioned above excel in certain aspects, a method that is flexible, robust, powerful and computationally efficient is still lacking. For example, IHW developed in Ignatiadis et al. (2016) cannot handle multiple covariates. AdaPT in Lei and Fithian (2018) is computationally intensive and may suffer from significant power loss when the signal is sparse and the covariate is not very informative. Li and Barber (2017)'s procedure is not Bayes optimal, as shown in Lei and Fithian (2018), and thus could lead to suboptimal power, as observed in our numerical studies. The FDR regression method proposed in Scott et al. (2015) lacks a rigorous FDR control theory. Table 1 provides a detailed comparison of these methods.

In this paper, in addition to a thorough evaluation of these methods using comprehensive simulations covering different signal structures, we propose a new procedure to incorporate covariate information with generic applicability. The covariates can be any continuous or categorical variables that are thought to be informative of the statistical properties of the hypothesis tests. The main contributions of our paper are two-fold:

1. Given a sequence of p-values {p1, . . . , pm}, we introduce a general decision rule of the form

   (1 − ki)pi^(−ki) ≥ (1 − t)πi / {t(1 − πi)},  0 < ki < 1,  1 ≤ i ≤ m,  (1)

   which serves as a surrogate for the optimal decision rule derived under the two-component mixture model with varying mixing probabilities and alternative densities. Here πi and ki are parameters that can be estimated from the covariates and p-values, and t is a cut-off value to be determined by our FDR control method. We develop a new procedure to estimate (ki, πi) and find the optimal threshold value for t in (1). We show that (i) when πi and ki are chosen independently of the p-values, the proposed procedure provides finite sample FDR control; (ii) our procedure provides asymptotic FDR control when πi and ki are chosen to maximize a potentially misspecified likelihood based on the covariates and p-values; (iii) similar to some recent works (e.g., Ignatiadis et al., 2016; Lei and Fithian, 2018; Li and Barber, 2017), our method allows the underlying likelihood ratio model to be misspecified. A distinctive feature is that our asymptotic analysis does not require the p-values to be marginally independent or conditionally independent given the covariates. More specifically, we allow the pairs of p-value and covariate across different hypotheses to be strongly mixing, as specified in Assumption 3.3.

2. We develop an efficient algorithm to estimate πi and ki. The developed algorithm is scalable to problems with millions of tests. Through extensive numerical studies, we show that our procedure is highly competitive with several existing approaches in the recent literature in terms of finite sample performance. The proposed procedure is implemented in the R package CAMT.

Our method is related to Lei and Fithian (2018), and it is worth highlighting the differences from their work. (i) Lei and Fithian (2018) uses partially censored p-values to determine the threshold, which can discard useful information concerning the alternative distribution of the p-values (i.e., f1,i in (3) below), since small p-values that are likely to be generated from the alternative are censored. In contrast, we use all the p-values to determine the threshold. Our method is seen to exhibit more power as compared to Lei and Fithian (2018) when the signal is (moderately) sparse. Although our method no longer offers theoretical finite sample FDR control, we show empirically that the power gain is not at the cost of FDR control. (ii) Different from Lei and Fithian (2018), which requires multiple stages for practitioners to make their final decision, our method is a single-stage procedure that only needs to be run once; thus the implementation of our method is faster and scalable to modern big datasets. (iii) Our theoretical analysis is entirely different from those in Lei and Fithian (2018). In particular, we show that our method achieves asymptotic FDR control even when the p-values are dependent.

2 Methodology

2.1 Rejection rule

We consider simultaneous testing of m hypotheses Hi for i = 1, 2, . . . , m. Let pi be the p-value associated with the ith hypothesis, and with some abuse of notation, let Hi indicate the underlying truth of the ith hypothesis. In other words, Hi = 0 if the ith hypothesis is true and Hi = 1 otherwise. For each hypothesis, we observe a covariate xi lying in some space X ⊆ R^q with q ≥ 1. From a Bayesian viewpoint, we can model Hi given xi as a Bernoulli random variable with success probability 1 − π0i, where π0i denotes the prior probability that the ith hypothesis is under the null when conditioning on xi. One approach to model the p-value distribution is via a two-component mixture model,

   Hi | xi ∼ Bernoulli(1 − π0i),  (2)
   pi | xi, Hi ∼ (1 − Hi)f0 + Hi f1,i,  (3)

where f0 and f1,i are the density functions corresponding to the null and alternative hypotheses respectively. In the following discussions, we shall assume that f0 satisfies the following condition: for any a ∈ [0, 1],

   ∫₀^a f0(x) dx ≤ ∫_{1−a}^1 f0(x) dx.  (4)

This condition relaxes the assumption of a uniform distribution on the unit interval. It is fulfilled when f0 is non-decreasing or f0 is symmetric about 0.5 (in which case equality holds in (4)). We demonstrate that this relaxation is capable of describing plausible data generating processes that would create a non-uniform null distribution. Let T be a test statistic such that under the null its z-score Z = (T − µ0)/σ0 is standard normal. In practice, one uses µ̂ and σ̂ to estimate µ0 and σ0, respectively. Let Φ be the standard normal CDF. The corresponding one-sided p-value is given by Φ((T − µ̂)/σ̂), whose distribution function is P(Φ((T − µ̂)/σ̂) ≤ x) = Φ((Φ⁻¹(x)σ̂ + µ̂ − µ0)/σ0). When µ0 ≥ µ̂ (i.e., we underestimate the mean), one can verify that f0 is non-decreasing. In the case of µ0 = µ̂ and σ0 ≠ σ̂, f0 is non-uniform but symmetric about 0.5.
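As a concrete illustration of this phenomenon, a short simulation (our own sketch, not part of the paper's code) shows how a misspecified null mean yields a non-uniform, non-decreasing null p-value density:

    ## Sketch: a misspecified null mean gives a non-uniform null p-value
    ## density (illustration only).
    set.seed(42)
    T.stat <- rnorm(1e5)                    # true null: mu0 = 0, sigma0 = 1
    mu.hat <- -0.1                          # underestimated mean, mu.hat <= mu0
    p.null <- pnorm((T.stat - mu.hat) / 1)  # one-sided p-values Phi((T - mu.hat)/sigma.hat)
    hist(p.null, breaks = 50)               # histogram tilts upward: f0 is non-decreasing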

Compared to the classical two-component mixture model, the varying null probability reflects the relative importance of each hypothesis given the external covariate information xi, and the varying alternative density f1,i emphasizes the heterogeneity among signals. In the context without covariate information, it is well known that the optimal rejection rule is based on the LFDR; see, e.g., Efron (2004) and Sun and Cai (2007). The result has been generalized to setups with group or covariate information; see, e.g., Cai and Sun (2009) and Lei and Fithian (2018). Based on these insights, one can indeed show that the optimal rejection rule that controls the expected number of false positives while maximizing the expected number of true positives takes the form of

   f1,i(pi) / f0(pi) ≥ (1 − t)π0i / {t(1 − π0i)},  (5)

where t ∈ (0, 1) is a cut-off value. This decision rule is generally unobtainable because f1,i is unidentifiable without extra assumptions on its form. Moreover, consistent estimation of the decision rule (5) is difficult, even with the use of additional approximations such as splines or piecewise constant functions. In this work, we do not aim to estimate the optimal rejection rule directly. Instead, we try to find a rejection rule that can mimic some useful operational characteristics of the optimal rule. Our idea is to first replace f1,i/f0 by a surrogate function hi. We emphasize that hi need not agree with the likelihood ratio f1,i/f0 for our method to be valid. In fact, the validity of our method does not rely on the correct specification of model (2)-(3). We require hi to satisfy (i) hi(p) ≥ 0 for p ∈ [0, 1]; (ii) ∫₀¹ hi(p) dp = 1; (iii) hi is decreasing. Requirement (iii) is imposed to mimic the common likelihood ratio assumption in the literature; see, e.g., Sun and Cai (2007). In this paper, we suggest using the beta density,

   hi(p) = (1 − ki)p^(−ki),  0 < ki < 1,  (6)

where ki is a parameter that depends on xi. Suppose that under the null hypothesis pi is uniformly distributed, whereas under the alternative it follows a beta distribution with parameters (1 − ki, 1); then the true likelihood ratio would take exactly the form given in (6).
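For intuition, the surrogate (6) is just the Beta(1 − ki, 1) density; a brief check (illustration only) confirms it integrates to one and is decreasing in p:

    ## The surrogate h(p) = (1 - k) p^(-k) is the Beta(1 - k, 1) density
    ## (illustration only); larger k puts a sharper spike near p = 0.
    h <- function(p, k) (1 - k) * p^(-k)
    k <- 0.7
    integrate(h, 0, 1, k = k)$value    # integrates to 1
    h(c(1e-4, 1e-2, 0.5, 1), k)        # decreasing in p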

To demonstrate the approximation of the proposed surrogate likelihood ratio to the actual likelihood ratio for realistic problems, we simulated two binary variables and generated four alternative distributions f1,i depending on the four levels of the two variables (details in the legend of Figure 1). We used the proposed procedure to find the best ki and compared the CDF of the empirical distribution (reflecting the actual likelihood ratio) to that of the fitted beta distribution (reflecting the surrogate likelihood ratio). We can see from Figure 1 that the approximation was reasonably good and the accuracy increased with the signal density.

Based on the surrogate likelihood ratio, the corresponding rejection rule is given by

   hi(pi) ≥ wi(t) := (1 − t)πi / {t(1 − πi)},  (7)

for some weights πi to be determined later. See Section 2.3 for more details about the estimation of ki and πi.

2.2 Adaptive procedure

We first note that the false discovery proportion (FDP) associated with the rejection rule (7) is equal to

   FDP(t) := Σ_{i=1}^m (1 − Hi)1{hi(pi) ≥ wi(t)} / [1 ∨ Σ_{i=1}^m 1{hi(pi) ≥ wi(t)}].

Then for a cut-off value t, we have

   FDP(t) = Σ_{i=1}^m (1 − Hi)1{pi ≤ hi⁻¹(wi(t))} / [1 ∨ Σ_{i=1}^m 1{hi(pi) ≥ wi(t)}]
          ≈ Σ_{i=1}^m (1 − Hi)P(pi ≤ hi⁻¹(wi(t))) / [1 ∨ Σ_{i=1}^m 1{hi(pi) ≥ wi(t)}]
          ≤ Σ_{i=1}^m (1 − Hi)P(1 − pi ≤ hi⁻¹(wi(t))) / [1 ∨ Σ_{i=1}^m 1{hi(pi) ≥ wi(t)}]
          ≈ [1 + Σ_{i=1}^m (1 − Hi)1{hi(1 − pi) ≥ wi(t)}] / [1 ∨ Σ_{i=1}^m 1{hi(pi) ≥ wi(t)}]
          ≤ [1 + Σ_{i=1}^m 1{hi(1 − pi) ≥ wi(t)}] / [1 ∨ Σ_{i=1}^m 1{hi(pi) ≥ wi(t)}] := FDPup(t),

where the approximations are due to the law of large numbers and the inequality follows from Condition (4).¹ This strategy is partly motivated by the recent distribution-free method proposed in Barber and Candès (2015). We refer to any FDR estimator constructed using this strategy as a BC-type estimator. Both the adaptive procedure in Lei and Fithian (2018) and the proposed method fall into this category. A natural idea is to select the largest threshold such that FDPup(t) is less than or equal to a prespecified FDR level α. Specifically, we define

   t* = max{ t ∈ [0, tup] : FDPup(t) = [1 + Σ_{i=1}^m 1{hi(1 − pi) ≥ wi(t)}] / [1 ∨ Σ_{i=1}^m 1{hi(pi) ≥ wi(t)}] ≤ α },

where tup satisfies wi(tup) ≥ hi(0.5) for all i, and we reject all hypotheses such that hi(pi) ≥ wi(t*). The following theorem establishes the finite sample control of the above procedure when πi and hi are prespecified and thus independent of the p-values. For example, πi and hi may be estimated based on data from an independent but related study.

¹ Rigorous theoretical justifications are provided in Theorem 2.1 and Theorem 3.8.
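In code, the search for t* is a one-dimensional scan; the following sketch (our own simplified illustration with prespecified πi and ki, scanning a grid of cut-offs in place of the exact search over [0, tup]; the CAMT package implements the full procedure) computes FDPup(t) and the rejection set:

    ## Sketch of the adaptive threshold search with prespecified pi_i and k_i
    ## (simplified illustration only; a grid search stands in for the exact
    ## maximization over [0, t_up]).
    camt_threshold <- function(p, pi0, k, alpha = 0.05,
                               t.grid = seq(1e-4, 0.5, length.out = 500)) {
      h <- function(p, k) (1 - k) * p^(-k)               # surrogate ratio
      w <- function(t) (1 - t) * pi0 / (t * (1 - pi0))   # weights w_i(t)
      fdp.up <- sapply(t.grid, function(t) {
        R <- sum(h(p, k) >= w(t))                        # rejections at cutoff t
        (1 + sum(h(1 - p, k) >= w(t))) / max(R, 1)       # FDP_up(t)
      })
      ok <- which(fdp.up <= alpha)
      if (!length(ok)) return(rep(FALSE, length(p)))     # nothing rejected
      h(p, k) >= w(t.grid[max(ok)])                      # reject at t*
    }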

Theorem 2.1. Suppose hi is strictly decreasing for each i and f0 satisfies Condition (4). If the p-values are independent and the choice of hi and πi is independent of the p-values, then the adaptive procedure provides finite sample FDR control at level α.

2.3 An algorithm

The optimal choices of πi and ki are rarely known in practice, and a generally applicable data-driven method is desirable. In this subsection, we propose an EM-type algorithm to estimate πi and ki. In particular, we model both πi and ki as functions of the covariate xi. As an illustration, we provide the following example.

Example 2.1. Suppose

   pi | xi, Hi ∼ (1 − Hi)f0 + Hi f1,i,
   xi | Hi ∼ (1 − Hi)g0 + Hi g1,

where Hi ∼ i.i.d. Bernoulli(1 − π0). Using the Bayes rule, we have

   f(pi | xi) = [f(pi, xi | Hi = 0)π0 + f(pi, xi | Hi = 1)(1 − π0)] / [f(xi | Hi = 0)π0 + f(xi | Hi = 1)(1 − π0)]
              = [f(pi | xi, Hi = 0)f(xi | Hi = 0)π0 + f(pi | xi, Hi = 1)f(xi | Hi = 1)(1 − π0)] / [f(xi | Hi = 0)π0 + f(xi | Hi = 1)(1 − π0)]
              = π(xi)f0(pi) + (1 − π(xi))f1,i(pi),

where π(x) = g0(x)π0/{g0(x)π0 + g1(x)(1 − π0)} = P(Hi = 0 | xi = x). Therefore, πi is the conditional probability that the ith hypothesis is under the null given the covariate xi.

To motivate our estimation procedure for πi and ki, let us define πθ(x) = 1/(1 + e^{−θ0−θ1′x}) and kβ(x) = 1/(1 + e^{−β0−β1′x}) for x ∈ R^q, where θ = (θ0, θ1) and β = (β0, β1). Suppose that conditional on xi and marginalizing over Hi,

   f(pi | xi) = πθ(xi)f0(pi) + (1 − πθ(xi))f1,i(pi) = f0(pi){πθ(xi) + (1 − πθ(xi)) f1,i(pi)/f0(pi)}.

Replacing f1,i/f0 by the surrogate likelihood ratio whose parameter ki depends on xi, we obtain

   f̃(pi | xi) = f0(pi){πθ(xi) + (1 − πθ(xi))(1 − kβ(xi)) pi^{−kβ(xi)}}.

Moving to the log scale and summing the individual log-likelihoods, we see that the null density is a nuisance term that does not depend on θ and β:

   Σ_{i=1}^m log f̃(pi | xi) = Σ_{i=1}^m log{πθ(xi) + (1 − πθ(xi))(1 − kβ(xi)) pi^{−kβ(xi)}} + C0,

where C0 = Σ_{i=1}^m log f0(pi). The above discussion thus motivates the following optimization problem for estimating the unknown parameters:

   max_{θ=(θ0,θ1)′∈Θ, β=(β0,β1)′∈B} Σ_{i=1}^m log{πi + (1 − πi)(1 − ki)pi^{−ki}},  (8)

where

   log{πi/(1 − πi)} = θ0 + θ1′xi,  log{ki/(1 − ki)} = β0 + β1′xi,  (9)

and Θ, B ⊆ R^{q+1} are some compact parameter spaces. This problem can be solved using the EM-algorithm together with Newton's method in its M-step. Let θ̂ and β̂ be the maximizer of (8). Define

   π̂i = W(1/(1 + e^{−x̃i′θ̂}), ε1, ε2) :=
        ε1,                  if 1/(1 + e^{−x̃i′θ̂}) ≤ ε1,
        1/(1 + e^{−x̃i′θ̂}),  if ε1 < 1/(1 + e^{−x̃i′θ̂}) < 1 − ε2,
        1 − ε2,              otherwise,

and k̂i = 1/(1 + e^{−x̃i′β̂}) with x̃i = (1, xi′)′, and

   ŵi(t) = (1 − t)π̂i / {t(1 − π̂i)}.

We use winsorization to prevent π̂i from being too close to zero or one. In numerical studies, we found the choices ε1 = 0.1 and ε2 = 10⁻⁵ perform reasonably well. Further denote

   t̂ = max{ t ∈ [0, 1] : [1 + Σ_{i=1}^m 1{(1 − k̂i)(1 − pi)^{−k̂i} > ŵi(t)}] / [1 ∨ Σ_{i=1}^m 1{(1 − k̂i)pi^{−k̂i} ≥ ŵi(t)}] ≤ α }.

Then we reject the ith hypothesis if

   (1 − k̂i)pi^{−k̂i} ≥ ŵi(t̂).
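Operationally, the estimation step amounts to maximizing (8) under the logistic models (9); a simplified sketch (illustration only, using optim in place of the EM algorithm with Newton M-steps, with a univariate covariate x) that can be combined with the camt_threshold sketch above is:

    ## Sketch of the data-driven step (illustration only; optim() stands in
    ## for the EM algorithm and x is a univariate covariate).
    fit_camt_params <- function(p, x) {
      negloglik <- function(par) {
        pi.x <- plogis(par[1] + par[2] * x)  # logistic model (9) for pi_i
        k.x  <- plogis(par[3] + par[4] * x)  # logistic model (9) for k_i
        -sum(log(pi.x + (1 - pi.x) * (1 - k.x) * p^(-k.x)))  # minus (8)
      }
      fit <- optim(c(qlogis(0.9), 0, qlogis(0.5), 0), negloglik,
                   method = "BFGS")
      list(pi0 = pmin(pmax(plogis(fit$par[1] + fit$par[2] * x), 0.1), 1 - 1e-5),
           k   = plogis(fit$par[3] + fit$par[4] * x))  # winsorized pi-hat; k-hat
    }
    ## est <- fit_camt_params(p, x)
    ## reject <- camt_threshold(p, est$pi0, est$k, alpha = 0.05)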

Remark 2.1. We can replace xi ∈ R^q by (g1(xi), . . . , g_{q0}(xi)) ∈ R^{q0} for some transformations (g1, . . . , g_{q0}) to allow nonlinearity in the logistic regressions. In numerical studies, we shall consider the spline transformation.
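For instance, a natural cubic spline basis from the splines package (a hypothetical illustration; the choice of six degrees of freedom is an assumption) can replace a univariate xi:

    ## Hypothetical illustration: expand a univariate covariate into a
    ## natural cubic spline basis before fitting the logistic models.
    library(splines)
    x.basis <- ns(x, df = 6)   # an m-by-6 basis matrix replacing x_i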

3 Asymptotic results

3.1 FDR control

In this subsection, we provide asymptotic justification for the proposed procedure. Note that

   1{(1 − k̂i)pi^{−k̂i} ≥ ŵi(t)} = 1{pi ≤ c(t, π̂i, k̂i)} for c(t, π̂i, k̂i) = 1 ∧ [t(1 − k̂i)(1 − π̂i) / {(1 − t)π̂i}]^{1/k̂i}.
Define

   FDR(t, Π, K) = E[ Σ_{i=1}^m (1 − Hi)1{pi ≤ c(t, πi, ki)} / Σ_{i=1}^m 1{pi ≤ c(t, πi, ki)} ]

with Π = (π1, . . . , πm) and K = (k1, . . . , km). We make the following assumptions to facilitate our theoretical derivations.

Assumption 3.1. Suppose the parameter spaces Θ and B are both compact.

Assumption 3.2. Suppose

   m⁻¹ Σ_{i=1}^m E log{πθ(xi) + (1 − πθ(xi))(1 − kβ(xi)) pi^{−kβ(xi)}}

converges uniformly over θ ∈ Θ and β ∈ B to R(θ, β), which has a unique maximum at (θ*, β*) in Θ × B.

Let F_a^b = σ((xi, pi), a ≤ i ≤ b) be the Borel σ-algebra generated by the random variables (xi, pi) for a ≤ i ≤ b. Define the α-mixing and φ-mixing coefficients respectively as

   α(v) = sup_b sup_{A ∈ F_{−∞}^b, B ∈ F_{b+v}^{+∞}} |P(AB) − P(A)P(B)|,
   φ(v) = sup_b sup_{A ∈ F_{−∞}^b, B ∈ F_{b+v}^{+∞}, P(B)>0} |P(A|B) − P(A)|.

Assumption 3.3. Suppose (xi, pi) is α-mixing with α(v) = O(v^{−ξ}) for ξ > r/(r − 1) and r > 1 (or φ-mixing with φ(v) = O(v^{−ξ}) for ξ > r/(2r − 1) and r ≥ 1). Further assume sup_i E|log(pi)|^{r+δ} < ∞ and max_i ‖xi‖_∞ < C, where ‖·‖_∞ denotes the ℓ∞ norm of a vector and C, δ > 0.

Assumption 3.1 is standard. Assumption 3.2 is a typical condition in the literature on maximum likelihood estimation for misspecified models; see, e.g., White (1982). Assumption 3.3 relaxes the usual independence assumption by allowing (xi, pi) to be weakly dependent. It is needed to establish the uniform strong law of large numbers for the process Rm(θ, β) defined in the proof of Lemma 3.4 below, which establishes the uniform strong consistency of π̂i and k̂i. The boundedness assumption on xi could be relaxed with a more delicate analysis to control its tail behavior and study the convergence rate of θ̂ and β̂. Denote by ‖·‖ the ℓ2 norm of a vector. An essential condition required in our proof of Lemma 3.4 is ‖θ̂ − θ*‖ max_{1≤i≤n} ‖xi‖ = o_{a.s.}(1). If ‖θ̂ − θ*‖ = O_{a.s.}(n^{−a}) for some a > 0, then by the Borel-Cantelli lemma, we require max_{1≤i≤n} E‖xi‖^k < ∞ for some k with ak > 2, i.e., xi should have a sufficiently light polynomial tail. We remark that Assumption 3.3 can be replaced by more primitive conditions which allow other weak dependence conditions; see, e.g., Pötscher and Prucha (1989). Let πi* = W(1/(1 + e^{−x̃i′θ*}), ε1, ε2) and ki* = 1/(1 + e^{−x̃i′β*}).

Lemma 3.4. Under Assumptions 3.1-3.3, we have

   max_{1≤i≤m} |π̂i − πi*| →a.s. 0,  max_{1≤i≤m} |k̂i − ki*| →a.s. 0.

We impose some additional assumptions to study the asymptotic FDR control.

Assumption 3.5. For two sequences ai, bi ∈ [ε, 1] with small enough ε and large enough m,

   m⁻¹ Σ_{i=1}^m {P(pi ≤ ai | xi) − P(pi ≤ bi | xi)} ≤ c0 max_{1≤i≤m} |ai − bi|,

where c0 depends on ε but is independent of m, xi, ai and bi.

Assumption 3.6. Assume that

   m⁻¹ Σ_{i=1}^m P(pi ≤ c(t, πi*, ki*)) → G0(t),  (10)
   m⁻¹ Σ_{i=1}^m P(1 − pi < c(t, πi*, ki*)) → G1(t),  (11)
   m⁻¹ Σ_{Hi=0} P(pi ≤ c(t, πi*, ki*)) → G̃1(t),  (12)

for any t ≥ t0 with t0 > 0, where G0(t), G1(t) and G̃1(t) are all continuous functions of t. Note that the probability here is taken with respect to the joint distribution of (pi, xi).

Let U(t) = G1(t)/G0(t), where G1 and G0 are defined in Assumption 3.6.

Assumption 3.7. There exists a t′ > t0 > 0 such that U(t′) < α.

Assumption 3.5 is fulfilled if the conditional density of pi given xi is bounded uniformly across i on [ε, 1]. This assumption is not very strong, as we still allow the density to be unbounded around zero. Assumptions 3.6-3.7 are similar to those in Theorem 4 of Storey et al. (2004). In particular, Assumption 3.7 ensures the existence of a cut-off that controls the FDR at level α.

We are now in a position to state the main result of this section, which shows that the proposed procedure provides asymptotic FDR control. The proof is deferred to the supplementary material.

Theorem 3.8. Suppose Assumptions 3.1-3.7 hold and f0 satisfies Condition (4). Then we have

   lim sup_m FDR(t̂, Π̂, K̂) ≤ α,

where Π̂ = (π̂1, . . . , π̂m) and K̂ = (k̂1, . . . , k̂m).

It is worth mentioning that the validity of our method does not rely on the mixture model assumption (2)-(3). In this sense, our method is misspecification robust, just as the classical BH procedure is. We provide a comparison between our method and some recently proposed approaches in Table 1.

Procedure                | π0      | f1             | FDR control               | Dependent p-values | Misspec. robust | Multiple covariates | Computation
------------------------ | ------- | -------------- | ------------------------- | ------------------ | --------------- | ------------------- | -----------
Ignatiadis et al. (2016)  | Varying | Partially used | Asymptotic control        | Unknown            | Yes             | No                  | ++++
Li and Barber (2017)      | Varying | Not used       | Finite sample upper bound | Gaussian copula    | Yes             | No*                 | ++++
Lei and Fithian (2018)    | Varying | Varying        | Finite sample control     | Unknown            | Yes             | Yes                 | +
Scott et al. (2015)       | Varying | Fixed          | No guarantee              | Unknown            | Unknown         | Yes                 | +++
Boca and Leek (2018)      | Varying | Not used       | Unknown                   | Unknown            | Yes             | Yes                 | +++
The proposed method       | Varying | Varying        | Asymptotic control        | Asymptotic         | Yes             | Yes                 | +++

Table 1: Comparison of several covariate adaptive FDR control procedures in the recent literature. The number of "+" represents the speed. *The framework of Li and Barber (2017) allows accommodating multiple covariates, but the provided software did not implement this.

3.2 Power analysis

We study the asymptotic power of the oracle procedure. Suppose the mixture model (2)-(3) holds with π0i = π0(xi) and f1,i(·) = f1(·; xi), where f1(·; x) is a density function for any fixed x ∈ X. Denote by F1(·; x) and F̄1(·; x) the distribution and survival functions associated with f1(·; x), respectively. Suppose the empirical distribution of the xi's converges to the probability law P. Consider the oracle procedure with πi = π0(xi) and ki = k0(xi). Here k0(·) minimizes the integrated Kullback-Leibler divergence, i.e.,

   k0 = argmin_{k∈K} ∫ DKL(f(·; x) || g(·; k(x))) P(dx),
   DKL(f(·; x) || g(·; k(x))) = ∫₀¹ f(p; x) log[f(p; x)/g(p; k(x))] dp,

with f(p; x) = π0(x)f0(p) + (1 − π0(x))f1(p; x) and g(p; k(x)) = π0(x) + (1 − π0(x))(1 − k(x))p^{−k(x)}, and K = {k(x) : log[k(x)/(1 − k(x))] = β0 + β1′x, (β0, β1) ∈ B}. Write c(t, x) = c(t, π0(x), k0(x)).

By the law of large numbers, the realized power of the oracle procedure has the approximation

   Power = Σ_{i=1}^m 1{Hi = 1, pi ≤ c(t, xi)} / Σ_{i=1}^m 1{Hi = 1} ≈ ∫(1 − π0(x))F1(c(topt, x); x)P(dx) / ∫(1 − π0(x))P(dx),

where topt is the largest positive number such that

   ∫{π0(x)F0(c(t, x)) + (1 − π0(x))F̄1(1 − c(t, x); x)}P(dx) / ∫{π0(x)F0(c(t, x)) + (1 − π0(x))F1(c(t, x); x)}P(dx) ≤ α.  (13)

We remark that when

   ∫(1 − π0(x))F̄1(1 − c(topt, x); x)P(dx) / ∫{π0(x)F0(c(topt, x)) + (1 − π0(x))F1(c(topt, x); x)}P(dx) ≈ 0,  (14)

the asymptotic power of the proposed procedure is close to that of the oracle procedure based on the LFDR, given by

   LFDRi(pi) = π0i f0(pi) / {π0i f0(pi) + (1 − π0i)f1,i(pi)}.  (15)
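For reference, the oracle LFDR (15) is immediate to compute once π0i, f0 and f1,i are specified; a toy sketch (assuming a uniform null and the beta alternative, purely for illustration):

    ## Toy oracle LFDR (15) under a uniform null and Beta(1 - k, 1)
    ## alternative (assumed model; illustration only).
    lfdr <- function(p, pi0, k) {
      f0 <- 1                      # uniform null density
      f1 <- (1 - k) * p^(-k)       # beta alternative density
      pi0 * f0 / (pi0 * f0 + (1 - pi0) * f1)
    }
    lfdr(c(1e-4, 0.01, 0.5), pi0 = 0.9, k = 0.7)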

4 Simulation studies

We conduct comprehensive simulations to evaluate the finite-sample performance of the proposed method and compare it to competing methods. For genome-scale multiple testing, the number of hypotheses could range from thousands to millions. For demonstration purposes, we start with m = 10,000 hypotheses. To study the impact of signal density and strength, we simulate three levels of signal density (sparse, medium and dense signals) and six levels of signal strength (from very weak to very strong). To demonstrate the power improvement from using external covariates, we simulate covariates of varying informativeness (non-informative, moderately informative and strongly informative). For simplicity, we simulate one covariate xi ∼ N(0, 1) for i = 1, · · · , m. Given xi, we let

   π0i = exp(ηi) / {1 + exp(ηi)},  ηi = η0 + kd xi,

where η0 and kd determine the baseline signal density and the informativeness of the covariate, respectively. For each simulated dataset, we fix the values of η0 and kd. We set η0 ∈ {3.5, 2.5, 1.5}, which achieves a signal density around 3%, 8%, and 18% respectively at the baseline (i.e., no covariate effect), representing sparse, medium and dense signals. We set kd ∈ {0, 1, 1.5}, representing a non-informative, moderately informative and strongly informative covariate, respectively. Thus, we have a total of 3 × 3 = 9 parameter settings. Based on π0i, the underlying truth Hi is simulated from

   Hi ∼ Bernoulli(1 − π0i).

Finally, we simulate independent z-scores using

   zi ∼ N(ks Hi, 1),

where ks controls the signal strength (effect size) and we use values equally spaced on [2, 2.8]. Z-scores are converted into p-values using the one-sided formula 1 − Φ(zi). P-values together with xi are used as the input for the proposed method.
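This generative scheme is compact enough to reproduce directly; a minimal sketch (for illustration; the full simulation scripts are in the repository referenced below):

    ## Sketch of the basic simulation setup S0 (illustration only).
    set.seed(2020)
    m    <- 10000
    eta0 <- 2.5; kd <- 1; ks <- 2.4     # medium signal, moderate covariate
    x    <- rnorm(m)                    # covariate x_i ~ N(0, 1)
    pi0  <- plogis(eta0 + kd * x)       # pi_0i = exp(eta_i)/(1 + exp(eta_i))
    H    <- rbinom(m, 1, 1 - pi0)       # underlying truth H_i
    z    <- rnorm(m, mean = ks * H)     # z_i ~ N(ks * H_i, 1)
    p    <- 1 - pnorm(z)                # one-sided p-values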

In addition to the basic setting (denoted as Setup S0), we investigate other settings to study the robustness of the proposed method. Specifically, we study:

Setup S1. Additional f1 distribution. Instead of simulating normal z-scores under f1, we simulate z-scores from a non-central gamma distribution with shape parameter k = 2. The scale/non-centrality parameters of the non-central gamma distribution are chosen to match the variance and mean of the normal distribution under S0.

Setup S2. Covariate-dependent π0i and f1,i. On top of the basic setup S0, we simulate another covariate x′i ∼ N(0, 1) and let x′i affect f1,i. Specifically, we scale ks by 2 exp(kf x′i)/{1 + exp(kf x′i)}, where we set kf ∈ {0, 0.25, 0.5} for the non-informative, moderately informative and strongly informative covariate scenarios, respectively.

Setup S3. Dependent hypotheses. We further investigate the effect of dependency among hypotheses by simulating correlated multivariate normal z-scores. Four correlation structures, including two block correlation structures and two AR(1) correlation structures, are investigated. For the block correlation structure, we divide the 10,000 hypotheses into 500 equal-sized blocks. Within each block, we simulate equal positive correlations (ρ = 0.5) (S3.1). We also further divide each block into 2 by 2 sub-blocks, and simulate negative correlations (ρ = −0.5) between the two sub-blocks (S3.2). For the AR(1) structure, we investigate both ρ = 0.75^|i−j| (S3.3) and ρ = (−0.75)^|i−j| (S3.4); a brief sketch of S3.3 appears after this list.

Setup S4. Heavy-tail covariate. In this variant, we generate xi from the t distribution with 5 degrees of freedom.

Setup S5. Non-theoretical null distribution. We simulate both increasing and decreasing f0. For an increasing f0 (S5.1), we generate null z-scores zi | H0 ∼ N(−0.15, 1). For a decreasing f0 (S5.2), we generate null z-scores zi | H0 ∼ N(0.15, 1).
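A brief sketch of the AR(1) scheme in S3.3 (illustration only; m, ks and H are as in the S0 sketch above):

    ## Sketch of Setup S3.3: AR(1)-correlated z-scores with cor = 0.75^|i-j|
    ## (illustration only; the marginal variance is rescaled to 1).
    ar <- 0.75
    e  <- as.numeric(arima.sim(list(ar = ar), n = m)) * sqrt(1 - ar^2)
    z  <- ks * H + e
    p  <- 1 - pnorm(z)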

We present the simulation results for Setups S0-S2 in the main text and the results for Setups S3-S5 in the supplementary material. To allow users to conveniently implement our method and reproduce the numerical results reported here, we make our code and data publicly available at https://github.com/jchen1981/CAMT.

4.1 Competing methods

We label our method as CAMT (Covariate Adaptive Multiple Testing) and compare it to the following competing methods:

• Oracle: Oracle procedure based on the LFDR (see, e.g., (15)) with the simulated π0i and f1,i, which theoretically has the optimal performance;

• BH: Benjamini-Hochberg procedure (Benjamini and Hochberg, 1995; p.adjust in R 3.4.2);

• ST: Storey's BH procedure (Storey, 2002; qvalue package, v2.10.0);

• BL: Boca and Leek procedure (Boca and Leek, 2018; swfdr package, v1.4.0);

• IHW: Independent hypothesis weighting (Ignatiadis et al., 2016; IHW package, v1.6.0);

• FDRreg: False discovery rate regression (Scott et al., 2015; FDRreg package, v0.2, https://github.com/jgscott/FDRreg); FDRreg(T) and FDRreg(E) represent FDRreg with the theoretical null and empirical null, respectively;

• SABHA: Structure adaptive BH procedure (Li and Barber, 2017; τ = 0.5, ε = 0.1 and stepwise constant weights, https://www.stat.uchicago.edu/~rina/sabha/All_q_est_functions.R);

• AdaPT: Adaptive p-value thresholding procedure (Lei and Fithian, 2018; adaptMT package, v1.0.0).

We evaluate the performance based on FDR control (false discovery proportion) and power (true positive rate) with a target FDR level of 5%. Results are averaged over 100 simulation runs.

4.2 Simulation results

We first study the performance of the proposed method under the basic setup (S0, Figure 2). All compared methods generally controlled the FDR around or under the nominal level of 0.05, and no serious FDR inflation was observed at any of the parameter settings (Figure 2A). However, FDRreg exhibited a slight FDR inflation under some parameter settings, and the inflation seemed to increase with the informativeness of the covariate and the signal density. Conservativeness was also observed for some methods in some cases. As expected, the BH procedure, which did not take into account π0, was conservative when the signal was dense. The IHW procedure was generally more conservative than BH, and the conservativeness increased with the informativeness of the covariate. CAMT, the proposed method, was conservative when the signal was sparse and the covariate was less informative. The conservativeness was more evident when the effect size was small but decreased as the effect size became larger. AdaPT was more conservative than CAMT under sparse signal/weak covariate. In terms of power (Figure 2B), there were several interesting observations. First, as the covariate became more informative, all the covariate adaptive methods became more powerful than ST and BH. The power differences between these methods also increased. Second, FDRreg was the most powerful across settings. Under a highly informative covariate, it was even slightly above the oracle procedure, which theoretically has optimal power. The superior power of FDRreg could be partly explained by a less well controlled FDR. IHW was more powerful than BL/SABHA when the signal was sparse, but the trend reversed when the signal was dense. Third, AdaPT was very powerful when the signal was dense and the covariate was highly informative. However, its power decreased as the signal became more sparse and the covariate became less informative. In fact, when the signal was sparse and the covariate was not informative or moderately informative, AdaPT had the lowest power. In contrast, the proposed method CAMT was close to the oracle procedure. It was comparable to AdaPT when AdaPT was the most powerful, but was significantly more powerful than AdaPT in its unfavorable scenarios. CAMT had a clear edge when the covariate was informative and the signal was sparse. Similar to AdaPT, CAMT had some power loss under sparse signal and non-informative covariate, probably due to the discretization effect of the BC-type estimator.

We conducted more evaluations of type I error control under S0. We investigated the FDR control across different target levels. Figure 3 shows excellent FDR control across target levels for all methods except FDRreg. The actual FDR level of BH and IHW was usually below the target level. CAMT was slightly conservative at a small target level under the scenario of sparse signal and less informative covariate, but it became less conservative at larger target levels. We also simulated a complete null, where no signal was included (Figure 4). In this case, FDR reduces to FWER. Interestingly, FDRreg was as conservative as CAMT and AdaPT under the complete null.

It is interesting to study the performance of the competing methods under a much larger feature size, lower signal density, and weaker signal strength, representing the most challenging scenario in real problems. To this end, we simulated m = 100,000 features with a signal density of 0.5% at the baseline (no covariate effect). Under a moderately informative covariate, we observed a substantial power improvement of CAMT over all other methods, including FDRreg, while controlling the FDR adequately at different target levels (Figure 5). We further reduced the feature size to 1,000 (Figure A1 in the supplement) to study the robustness of the methods to a much smaller feature size. Although CAMT and AdaPT were still more powerful than the competing methods when the signal was dense and the covariate was informative, a significant power loss was observed in other parameter settings, particularly under sparse signal and a less informative covariate. As we further decreased the feature size to 200, CAMT and AdaPT became universally less powerful than ST across parameter settings (data not shown). Therefore, application of CAMT or AdaPT to datasets with small numbers of features is not recommended unless the signal is dense and the covariate is highly informative.

We also simulated datasets where the z-scores under the alternative were drawn from a non-central gamma distribution (Setup S1). Under this setting, the trend remained almost the same as in the basic setup (Figure 6), but FDRreg had a more marked FDR inflation. When both π0i and f1,i depended on the covariate (Setup S2), CAMT became slightly more powerful without affecting the FDR control, especially when the covariate was highly informative (Figure 7). Meanwhile, the performance of FDRreg was also remarkable, with a very small FDR inflation. However, if we increased the effect on f1,i by reducing the standard deviation of the z-score under the alternative, FDRreg was no longer robust and the observed FDP was substantially above the target level when the signal strength was weak, indicating the benefit of modeling covariate-dependent f1 (Figure A2 in the supplement). CAMT was also robust to different correlation structures (Setups S3.1-S3.4), and we observed similar performance under these correlation structures (Figures A3-A6 in the supplement). The performance of CAMT was also robust to a heavy-tail covariate (Setup S4, Figure A7 in the supplement). In an unreported numerical study, we added different levels of perturbation to the covariate by multiplying by random small values drawn from Unif(0.95, 1.05), Unif(0.9, 1.1), and Unif(0.8, 1.2), respectively. We observed that the π0 estimates under perturbation were highly correlated with the π0 estimates without perturbation, which shows the stability of our method against data perturbations.

We also examined the robustness of CAMT to deviations from the theoretical null (Setup S5). Specifically, we simulated both decreasing and increasing f0. The results are presented in Figures A8 and A9 in the supplement. We observed that, for an increasing f0, all the methods other than FDRreg were conservative and had substantially less power than the oracle procedure. FDRreg using a theoretical null was conservative when the covariate was less informative but was anti-conservative under a highly informative covariate. On the other hand, FDRreg using an empirical null had improved power and controlled the FDR closer to the target level in most settings. However, it did not control the FDR well when the signal was dense and the prior information was strong. When f0 was decreasing, all the methods not using the empirical null failed to control the FDR. FDRreg with an empirical null improved the FDR control substantially in most settings but still could not control the FDR well under the dense-signal and strong-prior setting. Therefore, there is still room for improvement in addressing the empirical null problem.

Finally, we compared the computational efficiency of the competing methods (Figure 8). SABHA (step function) and IHW were computationally the most efficient, completing the analysis of one million p-values in less than two minutes. CAMT and the new version of FDRreg (v0.2) were also computationally efficient, followed by BL, and they all could complete the computation in minutes for one million p-values under S0. AdaPT was computationally the most intensive and completed the analysis in hours for one million p-values. We note that all the methods, including AdaPT, are computationally feasible for a typical omics dataset.

In summary, CAMT improves over existing covariate adaptive multiple testing procedures, and is a powerful, robust and computationally efficient tool for large-scale multiple testing.

5 Application to omics-wide multiple testing

To demonstrate the use of the proposed method in real-world applications, we applied CAMT to several omics datasets from transcriptomics, proteomics, epigenomics and metagenomics studies with the aim of identifying omics features associated with the phenotype of interest. Since AdaPT is the most state-of-the-art method, we focused our comparison on it. To make a fair comparison, we first ran the analyses on the four omics datasets that were also evaluated by AdaPT (Lei and Fithian, 2018), including Bottomly (Bottomly et al., 2011), Pasilla (Brooks et al., 2011), Airway (Himes et al., 2014) and the yeast protein dataset (Dephoure et al., 2012). Bottomly, Pasilla and Airway are three transcriptomics datasets from RNA-seq experiments with feature sizes of 13,932, 11,836 and 33,469, respectively. The yeast protein dataset is a proteomics dataset with a feature size of 2,666. We used the same methods to calculate the p-values for these datasets as described in Lei and Fithian (2018). The distributions of the p-values for these four datasets all exhibited a spike in the low p-value region, indicating that the signal was dense.
[Figure 1 about here. Panels A-C show, for null proportions of 99%, 95% and 80% (m = 50,000, four-component f1), the empirical CDF Fn(x) against the p-value for each combination of (x1, x2) ∈ {0, 1}².]

Figure 1: Illustration of the approximation to the true likelihood ratio by the surrogate likelihood ratio based on a beta distribution. Two binary covariates x1 and x2 were simulated. The z-score under the alternative was drawn from N(0, 1.5 + 0.5x1 + x2). Three levels of null proportions (A - 99%, B - 95%, and C - 80%) were simulated, where the null z-score was drawn from N(0, 1). Two-sided p-values were calculated based on the z-scores, and the parameter ki of the beta distribution was estimated by CAMT. The CDF of the empirical distribution of the p-values under the alternative (black) was compared to the CDF of the fitted beta distribution (red).

[Figure 2 about here. Panels A (false discovery proportion) and B (true positive rate) are plotted against effect size for sparse/medium/dense signal and non-informative/moderate/strong prior, comparing Oracle, CAMT, AdaPT, IHW, FDRreg(T), SABHA, BL, ST and BH.]

Figure 2: Performance comparison under the basic setting (S0). False discovery proportions (A) and true positive rates (B) were averaged over 100 simulation runs. Error bars (A) represent the 95% CIs and the dashed horizontal line indicates the target FDR level of 0.05.

[Figure 3 about here. Deviation from the target FDR level is plotted against the target level (0.01-0.20) for the same nine setting combinations and methods.]

Figure 3: FDR control at various target levels (0.01 - 0.20) under the basic setting (S0) and a medium signal strength. False discovery proportions were averaged over 100 simulation runs and the deviation from the target level (y-axis) was plotted.

The logarithm of the normalized count (averaged across all samples) was used as the univariate covariate for the three RNA-seq datasets (Bottomly, Pasilla and Airway). The logarithm of the total number of peptides across all samples was used as the univariate covariate for the yeast protein data. Following AdaPT, we used a spline basis with six equiquantile knots for π0i and f1,i (CAMT and AdaPT) and for π0i (FDRreg, BL) to account for potential complex nonlinear effects. Since IHW and SABHA can only take a univariate covariate, we used the univariate covariate directly. We summarized the results in Figure 9. We were able to reproduce the results in Lei and Fithian (2018). Indeed, AdaPT was more powerful than SABHA, IHW, ST and BH on the four datasets. FDRreg and BL, which were not compared in Lei and Fithian (2018), also performed well and made more rejections than the other methods on the yeast dataset and the Bottomly dataset, respectively. The performance of the proposed method, CAMT, was almost identical to that of AdaPT, which is consistent with the simulation results in the scenario of dense signal and informative covariate (Figure 2).

[Figure 4 about here. Deviation from the target FDR level under the complete null is plotted against the target level (0.01-0.20) for all methods.]

Figure 4: FDR control at various target levels (0.01 - 0.20) under the complete null (no signal was simulated). False discovery proportions were averaged over 1,000 simulation runs and the deviation from the target level (y-axis) was plotted.

[Figure 5 about here. Panels A (false discovery proportion) and B (number of rejections) are plotted against the target FDR level for all methods.]

Figure 5: Performance comparison with m = 100,000 under the basic setting (S0). Extremely low signal density (null proportion > 99%), moderate covariate strength and low signal strength were simulated. False discovery proportions (A) and numbers of rejections (B) were averaged over 100 simulation runs and were plotted against various FDR target levels (0.01 - 0.20).

[Figure 6 about here. Same layout as Figure 2, under Setup S1.]

Figure 6: Performance comparison under S1 (f1,i: non-central gamma distribution). False discovery proportions (A) and true positive rates (B) were averaged over 100 simulation runs. Error bars (A) represent the 95% CIs and the dashed horizontal line indicates the target FDR level of 0.05.

[Figure 7 about here. Same layout as Figure 2, under Setup S2.]

Figure 7: Performance comparison under S2 (covariate-dependent π0i and f1,i). False discovery proportions (A) and true positive rates (B) were averaged over 100 simulation runs. Error bars (A) represent the 95% CIs and the dashed horizontal line indicates the target FDR level of 0.05.

[Figure 8 about here. Runtime in seconds is plotted against the number of features (10³ to 10⁶, log scale) for CAMT, IHW, FDRreg(T), AdaPT, SABHA and BL.]

Figure 8: Comparison of runtime under the basic setting (S0). Medium signal density and strength and a moderately informative covariate were simulated. The number of features varied from 10³ to 10⁶. The average runtime over three replications was plotted against the feature size on a log scale. The computation was performed on an AMD Opteron CPU with 256GB RAM and 16 MB available cache.

• EWAS data. The aim of the EWAS of CHD was to identify the CpG loci in the human genome
that were differentially methylated between healthy (n = 196) and CHD (n = 84) children.
The methylation levels of 455,741 CpGs were measured by the Illumina 450K methylation
BeadChip and were properly normalized before analysis. The p-values were produced by
fitting a linear regression to the methylation outcome of each CpG, adjusting for potential
confounders such as age, sex and blood cell mixtures as described in Wijnands et al. (2017).
Since widespread hyper-methylation (increased methylation in low-methylation regions) or
hypo-methylation (decreased methylation in high-methylation regions) is common in many
diseases (Robertson, 2005), we used the mean methylation across samples as the univariate
covariate (illustrated, together with the MWAS covariate, in the R sketch below).

• MWAS data. The aim of the MWAS of sex was to identify differentially abundant bacteria
in the gut microbiome between males and females, where the abundances of the gut bacteria
were determined by sequencing a fingerprint gene in bacteria (the 16S rRNA gene). We used
the publicly available data from the AmericanGut project (McDonald et al., 2018), in which
the gut microbiomes of more than 10,000 subjects were sequenced. We focused


Figure 9: The number of rejections at different target FDR levels on four real datasets used to
demonstrate the performance of AdaPT. The Bottomly (A), Pasilla (B) and Airway (C) datasets
were three transcriptomics datasets from RNA-seq experiments with feature sizes of 13,932, 11,836
and 33,469, respectively. The yeast protein dataset (D) was a proteomics dataset with a feature
size of 2,666.


Figure 10: The number of rejections at different target FDR levels on two real datasets: EWAS of
congenital heart disease (A) and MWAS of sex effect (B). The EWAS dataset was produced by the
Illumina 450K methylation BeadChip (m = 455,741) and the MWAS dataset was produced by
16S rRNA gene amplicon sequencing (m = 2,492).

our analysis on a relatively homogeneous subset consisting of 481 males and 335 females (aged
13-70, with normal BMI, from the United States). We removed OTUs (clustered sequencing
units representing bacterial species) observed in fewer than 5 subjects, and a total of 2,492
OTUs were tested using the Wilcoxon rank-sum test on the normalized abundances. We used
the percentage of zeros across samples as the univariate covariate since we expect a much lower
power for OTUs with excessive zeros (see the R sketch below).
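
To make the two covariate constructions concrete, the following R sketch shows one way they
could be computed. It is an illustration only, not the original analysis code: meth, pheno, otu
and sex are hypothetical objects holding the normalized methylation matrix, the phenotype table,
the OTU count matrix and the sex labels, the confounder set is simplified, and the total-sum
normalization is one of several reasonable choices.

# EWAS: p-value per CpG from a linear regression; covariate = mean methylation
ewas.p <- apply(meth, 1, function(y) {
  fit <- summary(lm(y ~ chd + age + child.sex, data = pheno))  # pheno$chd coded 0/1
  fit$coefficients["chd", "Pr(>|t|)"]                          # p-value for the CHD effect
})
ewas.cov <- rowMeans(meth)                # mean methylation across samples

# MWAS: Wilcoxon rank-sum p-value per OTU; covariate = percentage of zeros
otu   <- otu[rowSums(otu > 0) >= 5, ]     # keep OTUs observed in at least 5 subjects
relab <- t(t(otu) / colSums(otu))         # simple total-sum normalization (illustrative)
mwas.p <- apply(relab, 1, function(y)
  wilcox.test(y[sex == "male"], y[sex == "female"])$p.value)
mwas.cov <- rowMeans(otu == 0)            # fraction of zero counts across samples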

The results for these two datasets are summarized in Figure 10. For the EWAS data, the signal
density was very sparse (π̂0 = 0.99, estimated by the qvalue package). CAMT identified far more loci than the other

methods at various FDR levels. The performance was consistent with the simulation results in

the scenario of extremely sparse signal and informative covariate, where CAMT was substantially

more powerful than the competing methods (Figure 5). At an FDR of 20%, we identified 55

differentially methylated CpGs, compared to 19 for AdaPT. These 55 CpG loci were mainly located
in CpG islands and gene promoter regions, which are known for their important role in the
regulation of gene expression (Robertson, 2005). Interestingly, all but one of the CpG loci had low
levels of methylation, indicating that the methylation level was indeed informative for identifying
differential CpGs. We also performed gene set enrichment analysis for the genes harboring the
identified CpGs (https://david.ncifcrf.gov/). Based on the GO terms annotated to biological processes

(BP DIRECT), three GO terms were found to be significant (unadjusted p-value < 0.05), including
the term "embryonic heart tube development", which is highly relevant to the congenital heart
disease under study (Wijnands et al., 2017). As a sanity check, we randomized the covariate and
re-analyzed the data using CAMT. As expected, CAMT became similar to BH/ST and identified
the same eight CpGs at the 20% FDR level.

For the MWAS data, although the difference was not as striking as for the EWAS data, CAMT
was still overall more powerful than the other competing methods except FDRreg. However, given
that FDRreg was not robust under certain scenarios, the increased power should be interpreted
with caution. The relationship between the fitted π0i and the covariate (number

of nonzeros) was very interesting: π̂0i first decreased, reached a minimum at around 70 and then

increased (Figure 11). When the OTU was rare (e.g., a small number of nonzeros, only a few

subjects had it), it was either very individualized or we had limited power to reject it, leading to a


Figure 11: Performance on the MWAS dataset. (A) The fitted π0i (logit scale) vs. the covariate
(number of nonzeros). (B) −log10(p-value) vs. the covariate (number of nonzeros). Rejected
hypotheses at the 10% FDR level are shown in red.

large π0i. In the other extreme, where the OTU was very prevalent (i.e., a large number of nonzeros;
most of the subjects had it), it was probably not sex-specific either. Therefore, taking into account

the sparsity level could increase the power of MWAS. It is also informative to compare CAMT to the

traditional filtering-based procedure for MWAS. In practice, we usually apply a prevalence-based

filter before performing multiple testing correction, based on the idea that rare OTUs are less likely

to be significant and including them will increase the multiple testing burden. A subjective filtering

criterion has to be determined beforehand. For this MWAS dataset, if we removed OTUs present

in less than 10% of the subjects, ST and BH recovered 116 and 85 significant OTUs at an FDR

of 10%, compared to 69 and 65 on the original dataset, indicating that filtering did improve the

statistical power of traditional FDR control procedures. However, if we removed OTUs present in

less than 20% of the subjects, the numbers of significant OTUs by ST and BH reduced to 71 and

50 respectively. Therefore, filtering could potentially leave out biologically important OTUs. In

contrast, CAMT did not require an explicit filtering criterion, and was much more powerful (141

significant OTUs at 10% FDR) than the filtering-based method.
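
For reference, the prevalence-filtering baseline described above amounts to only a few lines of R;
this sketch reuses the hypothetical otu matrix and mwas.p p-values from the earlier sketch.

# Prevalence-based filtering followed by BH (illustrative sketch)
prev <- rowMeans(otu > 0)                      # fraction of subjects carrying each OTU
keep <- prev >= 0.10                           # subjective 10% prevalence cutoff
padj <- p.adjust(mwas.p[keep], method = "BH")  # BH applied to the filtered set only
sum(padj <= 0.10)                              # number of rejections at the 10% FDR level

In contrast, CAMT lets the covariate down-weight low-prevalence OTUs smoothly instead of
discarding them outright.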

6 Discussions
There are generally two strategies for estimating the number of false rejections
$\sum_{i=1}^{m}(1-H_i)\mathbf{1}\{h_i(p_i)\ge w_i(t)\}$ given the form of the rejection rule $h_i(p_i)\ge w_i(t)$. The first approach
(called the BH-type estimator) is to replace the number of false rejections by its expectation
assuming that $p_i$ follows the uniform distribution on $[0,1]$ under the null, which leads to the
quantity $\sum_{i=1}^{m}\pi_{0i}\,c(t,\pi_{0i},k_i)$ for $c(\cdot)$ defined in Section 3. The second approach (called the
BC-type estimator) estimates the number of false rejections conservatively by
$\xi+\sum_{i=1}^{m}\mathbf{1}\{h_i(1-p_i)\ge w_i(t)\}$ for a nonnegative

constant ξ under the assumption that the null distribution of p-values is symmetric about 0.5. Both

procedures enjoy optimality in some asymptotic sense; see, e.g., Arias-Castro and Chen (2017). The
advantage of the BC-type procedure lies in the fact that its estimate of the number of false rejections
is asymptotically conservative when the rejection rule converges to a non-random limit (which holds
even under a misspecified model; see, e.g., White, 1982) and f0 is mirror conservative (see equation

(3) of Lei and Fithian, 2018). This fact allows us to estimate the rejection rule by maximizing

a potentially misspecified likelihood as the resulting rejection rule has a non-random limit under

suitable conditions. This is not necessarily the case for the BH-type estimator without imposing

additional constraints when estimating π0i and ki. Specific restrictions on the estimators of π0i are
required for the BH-type estimator to achieve FDR control; see, e.g., equation (3) of Li and Barber
(2018).

On the other hand, as the BC-type estimator uses a counting approach to estimate the number

of false rejections, it suffers from the discretization issue (i.e., the BC-type estimator is a step

function of t while the BH-type estimator is continuous), which may result in a large variance for

the FDR estimate. This is especially the case at small FDR levels, where the number of rejections
is usually small, and thus both the denominator and the numerator of the FDR estimate become
small and more variable. Another issue with the BC-type estimator is the

selection of ξ. We follow the idea of knockoff+ in Barber and Candès (2015) by setting ξ = 1. This

choice could make the procedure rather conservative when the signal is very sparse and the target
FDR level is small. A smaller choice of ξ (e.g., ξ = 0) often leads to inflated FDR in our unreported

simulation studies. To alleviate this issue, one may consider a mixed strategy by using
$$\max\left\{\sum_{i=1}^{m}\pi_{0i}\,c(t,\pi_{0i},k_i),\;\sum_{i=1}^{m}\mathbf{1}\{h_i(1-p_i)\ge w_i(t)\}\right\}$$

as a conservative estimate for the number of false rejections when t is relatively small. Our numerical

results in Figure A10 in the supplementary material show that the resulting method can successfully

reduce the power loss in the case of sparse signals (or small FDR levels) and less informative

covariates while maintaining the good power performance in other cases. A serious investigation of

this mixed procedure and the BH-type estimator is left for future research.
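
To make the quantities above concrete, the following R sketch computes the BH-type, BC-type
and mixed estimates of the number of false rejections at a threshold t, using the closed form of
$c(t,\pi_0,k)$ implied by the working model $f(p)=\pi_0+(1-\pi_0)(1-k)p^{-k}$. This is a sketch under our
model assumptions, not the CAMT implementation; pi0, k and p denote the fitted null probabilities,
the fitted k parameters and the p-values.

# Rejection threshold solving psi(p) <= t under the working model
c.thresh <- function(t, pi0, k)
  pmin(1, (t * (1 - k) * (1 - pi0) / ((1 - t) * pi0))^(1 / k))

ct       <- c.thresh(t, pi0, k)
fr.bh    <- sum(pi0 * ct)       # BH-type: expected false rejections under a uniform null
fr.bc    <- sum(1 - p < ct)     # BC-type count; CAMT adds xi = 1, following knockoff+
fr.mixed <- max(fr.bh, fr.bc)   # conservative mixed estimate, useful for small t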

Since our method is not robust to a decreasing f0, some diagnostics are needed before running
CAMT. To detect a decreasing f0, the genomic inflation factor (GIF) can be employed (Devlin
and Roeder, 1999). The GIF is defined as the ratio of the median of the observed test statistics to
the expected median under the theoretical null distribution. It has been widely used in
genome-wide association studies to assess the deviation of the empirical distribution of the null
p-values from the theoretical uniform distribution. To accommodate potentially dense signals in
some genomics studies, we recommend confining the GIF calculation to p-values between 0.5 and
1. If the GIF is substantially larger than one, using CAMT may result in excess false positives. In
such a case, the user should not trust the results and may consider recalculating the p-values by
adjusting for potential confounding factors, either known or estimated by a latent variable
approach such as surrogate variable analysis (Leek and Storey, 2007), or by using the simple
genomic control approach based on p-values (Devlin and Roeder, 1999).
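
A minimal sketch of this diagnostic in R is given below, assuming two-sided tests whose p-values
can be back-transformed to chi-squared statistics with one degree of freedom. Under a uniform
null, p-values restricted to [0.5, 1] have median statistic qchisq(0.25, df = 1), so values of
gif.tail(p) far above one suggest a decreasing f0.

# GIF confined to p-values in [0.5, 1] (illustrative sketch)
gif.tail <- function(p) {
  p.sub <- p[p >= 0.5 & p <= 1]          # confine to the right half of [0, 1]
  stat  <- qchisq(1 - p.sub, df = 1)     # back-transform to chi-squared(1) statistics
  median(stat) / qchisq(0.25, df = 1)    # ratio of observed to expected null median
}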

References

[1] Arias-Castro, E., and Chen, S. (2017). Distribution-free multiple testing. Electronic Journal

of Statistics, 11, 1983-2001.

[2] Barber, R. F., and Candès, E. J. (2015). Controlling the false discovery rate via knockoffs.

Annals of Statistics, 43, 2055-2085.

[3] Benjamini, Y., and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and

powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B, 57,

289-300.

[4] Bottomly, D., Walter, N.A., Hunter, J.E., Darakjian, P., Kawane, S., Buck, K.J., Searles,

R.P., Mooney, M., McWeeney, S.K., and Hitzemann, R. (2011). Evaluating gene expression

in C57BL/6J and DBA/2J mouse striatum using RNA-Seq and microarrays. PLoS ONE, 6,
e17820.

[5] Brooks, A. N., Yang, L., Duff, M. O., Hansen, K. D., Park, J. W., Dudoit, S., Brenner, S. E.,

and Graveley, B. R. (2011). Conservation of an RNA regulatory map between Drosophila and

mammals. Genome Research, 21, 193-202.

[6] Cai, T. T., and Sun, W. (2009). Simultaneous testing of grouped hypotheses: finding needles

in multiple haystacks. Journal of the American Statistical Association, 104, 1467–1481.

[7] Dephoure, N., and Gygi, S. P. (2012). Hyperplexing: a method for higher-order multiplexed

quantitative proteomics provides a map of the dynamic response to rapamycin in yeast. Science

Signaling, 5, rs2-rs2.

[8] Efron, B. (2004). Local false discovery rate. Technical report, Stanford University, Dept. of

Statistics.

[9] Genovese, C. R., Roeder, K., and Wasserman, L. (2006). False discovery control with p-value

weighting. Biometrika, 93, 509-524.

[10] Devlin, B., and Roeder, K. (1999). Genomic control for association studies. Biometrics, 55,

997-1004.

[11] Himes, B.E., Jiang, X., Wagner, P., Hu, R., Wang, Q., Klanderman, B., Whitaker, R. M.,

Duan, Q., Lasky-Su, J., Nikolos, C., and Jester, W. (2014). RNA-Seq transcriptome profiling

identifies CRISPLD2 as a glucocorticoid responsive gene that modulates cytokine function in

airway smooth muscle cells. PLoS ONE, 9, e99625.

[12] Hu, J. X., Zhao, H., and Zhou, H. H. (2010). False discovery rate control with groups. Journal

of the American Statistical Association, 105, 1215-1227.

[13] Ignatiadis, N., Klaus, B., Zaugg, J. B., and Huber, W. (2016). Data-driven hypothesis weight-

ing increases detection power in genome-scale multiple testing. Nature Methods, 13, 577–580.

[14] Leek, J. T., and Storey, J. D. (2007). Capturing heterogeneity in gene expression studies by

surrogate variable analysis. PLoS genetics, 3, e161.

[15] Lei, L., and Fithian, W. (2018). AdaPT: An interactive procedure for multiple testing with

side information. Journal of the Royal Statistical Society, Series B, to appear.

[16] Li, A., and Barber, R. F. (2017). Multiple testing with the structure adaptive Benjamini-

Hochberg algorithm. arXiv:1606.07926.

[17] McDonald, D., Hyde, E., Debelius, J.W., Morton, J.T., Gonzalez, A., Ackermann, G., et al.

(2018). American Gut: an open platform for citizen science microbiome research. mSystems,

3, e00031–18.

[18] Pötscher, B. M., and Prucha, I. R. (1989). A uniform law of large numbers for dependent and

heterogeneous data processes. Econometrica, 675-683.

[19] Robertson, K. D. (2005). DNA methylation and human disease. Nature Reviews Genetics, 6,

597.

[20] Scott, J. G., Kelly, R. C., Smith, M. A., Zhou, P., and Kass, R. E. (2015). False discovery rate

regression: an application to neural synchrony detection in primary visual cortex. Journal of

the American Statistical Association, 110, 459–471.

[21] Storey, J. D. (2002). A direct approach to false discovery rates. Journal of the Royal Statistical

Society, Series B, 64, 479–498.

[22] Storey, J. D., Taylor, J. E., and Siegmund, D. (2004). Strong control, conservative point esti-

mation and simultaneous conservative consistency of false discovery rates: a unified approach.

Journal of the Royal Statistical Society, Series B, 66, 187–205.

[23] Sun, W., Reich, B. J., Cai, T. T., Guindani, M., and Schwartzman, A. (2015). False discovery

control in large-scale multiple testing. Journal of the Royal Statistical Society, Series B, 77,

59–83.

[24] Tansey, W., Koyejo, O., Poldrack, R. A., and Scott, J. G. (2017). False discovery rate
smoothing. arXiv:1411.6144.

[25] White, H. (1982). Maximum likelihood estimation of misspecified models. Econometrica, 50,

1-25.

[26] Wijnands, K. P., Chen, J., Liang, L., Verbiest, M. M., Lin, X., Helbing, W. A., et al. (2017).
Genome-wide methylation analysis identifies novel CpG loci for perimembranous ventricular
septal defects in human. Epigenomics, 9, 241-251.

Supplement to “Covariate Adaptive False Discovery Rate Control
with Applications to Omics-Wide Multiple Testing”

Xianyang Zhang and Jun Chen

A1 Technical details

Proof of Theorem 2.1. We first note that
$$\begin{aligned}
\mathrm{FDR} &= E\left[\frac{\sum_{i=1}^{m}(1-H_i)\mathbf{1}\{h_i(p_i)\ge w_i(t^*)\}}{1\vee\sum_{i=1}^{m}\mathbf{1}\{h_i(p_i)\ge w_i(t^*)\}}\right] \\
&= E\left[\frac{1+\sum_{i=1}^{m}(1-H_i)\mathbf{1}\{h_i(1-p_i)\ge w_i(t^*)\}}{1\vee\sum_{i=1}^{m}\mathbf{1}\{h_i(p_i)\ge w_i(t^*)\}}\cdot\frac{\sum_{i=1}^{m}(1-H_i)\mathbf{1}\{h_i(p_i)\ge w_i(t^*)\}}{1+\sum_{i=1}^{m}(1-H_i)\mathbf{1}\{h_i(1-p_i)\ge w_i(t^*)\}}\right] \\
&\le \alpha\,E\left[\frac{\sum_{i=1}^{m}(1-H_i)\mathbf{1}\{h_i(p_i)\ge w_i(t^*)\}}{1+\sum_{i=1}^{m}(1-H_i)\mathbf{1}\{h_i(1-p_i)\ge w_i(t^*)\}}\right] \\
&= \alpha\,E\left[\frac{\sum_{i=1}^{m}(1-H_i)\mathbf{1}\{\psi_i(p_i)\le t^*\}}{1+\sum_{i=1}^{m}(1-H_i)\mathbf{1}\{\psi_i(1-p_i)\le t^*\}}\right], \qquad(\mathrm{A1})
\end{aligned}$$
where
$$\psi_i(p_i)=\frac{\pi_i}{\pi_i+(1-\pi_i)h_i(p_i)}.$$
As $h_i$ is strictly decreasing, $\psi_i(p)$ is a strictly increasing function of $p$. Define
$$b_i=\begin{cases}\psi_i(p_i) & p_i<0.5,\\ \psi_i(1-p_i) & p_i\ge 0.5.\end{cases}$$
Define the order statistics $b_{(1)}\le b_{(2)}\le\cdots\le b_{(m_0)}$ for $\{b_i: H_i=0\}$, where $m_0$ is the number of
hypotheses under the null. As $t^*\le t_{\mathrm{up}}$, we can find an integer $J\le m_0$ such that
$$b_{(1)}\le b_{(2)}\le\cdots\le b_{(J)}\le t^*<b_{(J+1)}\le\cdots\le b_{(m_0)}.$$
Then for any $J$,
$$\frac{\sum_{i=1}^{m}(1-H_i)\mathbf{1}\{\psi_i(p_i)\le t^*\}}{1+\sum_{i=1}^{m}(1-H_i)\mathbf{1}\{\psi_i(1-p_i)\le t^*\}}=\frac{(1-B_1)+\cdots+(1-B_J)}{1+B_1+B_2+\cdots+B_J}=\frac{J+1}{1+B_1+B_2+\cdots+B_J}-1,$$
where $B_i=\mathbf{1}\{p_{(i)}\ge 0.5\}$. Under Condition (4) of the main paper, we have $\min_{i:H_i=0}P(p_i\ge 0.5)\ge 0.5$. Using Lemma 1 of Barber and Candès (2016), we have
$$E\left[\frac{J+1}{1+B_1+B_2+\cdots+B_J}-1\right]\le 1.$$
By (A1), we have $\mathrm{FDR}\le\alpha$.

Proof of Lemma 3.4. Let $r_i(\theta,\beta)=\log\{\pi_\theta(x_i)+(1-\pi_\theta(x_i))(1-k_\beta(x_i))p_i^{-k_\beta(x_i)}\}$ and
$R_m(\theta,\beta)=\frac{1}{m}\sum_{i=1}^{m}r_i(\theta,\beta)$ for $\theta\in\Theta$ and $\beta\in B$. Under Assumption 3.1 and the boundedness
of $x_i$, there exist $c_1$ and $c_2$ such that $0<c_1\le\pi_\theta(x_i)\le 1-c_1<1$ and $0<c_2\le k_\beta(x_i)\le 1-c_2<1$
for all $i$ and $\theta\in\Theta$, $\beta\in B$. Thus we have
$$\begin{aligned}
|r_i(\theta,\beta)|&\le|\log\{\pi_\theta(x_i)+(1-\pi_\theta(x_i))(1-c_2)p_i^{-(1-c_2)}\}| \\
&\le|\log\{2\pi_\theta(x_i)\}|+|\log\{2(1-\pi_\theta(x_i))(1-c_2)\}|+(1-c_2)|\log(p_i)| \\
&\le c_3+(1-c_2)|\log(p_i)|,
\end{aligned}$$
for some constant $c_3>0$. Under Assumption 3.3, by Corollary 1 of Pötscher and Prucha (1989),
we have
$$R_m(\theta,\beta)-E[R_m(\theta,\beta)]\to_{a.s.}0$$
uniformly over $\theta\in\Theta$ and $\beta\in B$. Together with Assumption 3.2, we obtain
$$\sup_{\theta\in\Theta,\,\beta\in B}|R_m(\theta,\beta)-R(\theta,\beta)|\to_{a.s.}0.$$
Lemma 2.2 of White (1982) states that if $(\hat\theta,\hat\beta)$ minimizes $-R_m$ and $(\theta^*,\beta^*)$ uniquely minimizes
$-R$, then
$$(\hat\theta,\hat\beta)\to_{a.s.}(\theta^*,\beta^*).$$
Finally, notice that
$$\max_{1\le i\le m}\left|\frac{1}{1+e^{-\tilde x_i'\theta^*}}-\frac{1}{1+e^{-\tilde x_i'\hat\theta}}\right|=\max_{1\le i\le m}\frac{|e^{-\tilde x_i'\theta^*}-e^{-\tilde x_i'\hat\theta}|}{(1+e^{-\tilde x_i'\hat\theta})(1+e^{-\tilde x_i'\theta^*})}\le c_4|\hat\theta-\theta^*|=o_{a.s.}(1)$$
for $\tilde x_i=(1,x_i)'$, where the inequality follows from the mean value theorem applied to $e^{-x}$ and the
boundedness of $x_i$. It implies that $\max_{1\le i\le m}|\hat\pi_i-\pi_i^*|\to_{a.s.}0$. The other result
$\max_{1\le i\le m}|\hat k_i-k_{\beta^*}(x_i)|\to_{a.s.}0$ follows from a similar argument.

Lemma A1.1. Under Assumptions 3.3 and 3.6, we have
$$\frac{1}{m}\sum_{i=1}^{m}\mathbf{1}\{p_i\le c(t,\pi_i^*,k_i^*)\}\to_{a.s.}G_0(t),\qquad(\mathrm{A2})$$
$$\frac{1}{m}\sum_{i=1}^{m}\mathbf{1}\{1-p_i<c(t,\pi_i^*,k_i^*)\}\to_{a.s.}G_1(t),\qquad(\mathrm{A3})$$
$$\frac{1}{m}\sum_{H_i=0}\mathbf{1}\{p_i\le c(t,\pi_i^*,k_i^*)\}\to_{a.s.}\tilde G_1(t),\qquad(\mathrm{A4})$$
$$\frac{1}{m}\sum_{i=1}^{m}\{\mathbf{1}\{p_i\le t_i\}-P(p_i\le t_i)\}\to_{a.s.}0,\qquad(\mathrm{A5})$$
for any $t\ge t_0$ with $t_0>0$ and $t_i\in[0,1]$.

Proof of Lemma A1.1. Notice that
$$\mathbf{1}\{p_i\le c(t,\pi_i^*,k_i^*)\}=\mathbf{1}\{\psi_i(p_i,x_i)\le t\},\qquad \mathbf{1}\{1-p_i<c(t,\pi_i^*,k_i^*)\}=\mathbf{1}\{\psi_i(1-p_i,x_i)<t\},$$
where $\psi_i(p_i,x_i)=\pi_{\theta^*}(x_i)/\{\pi_{\theta^*}(x_i)+(1-\pi_{\theta^*}(x_i))(1-k_{\beta^*}(x_i))p_i^{-k_{\beta^*}(x_i)}\}$. Under Assumption 3.3,
$\mathbf{1}\{\psi_i(p_i,x_i)\le t\}$ and $\mathbf{1}\{\psi_i(1-p_i,x_i)<t\}$ are both $\alpha$-mixing (or $\phi$-mixing) processes. Thus by the
strong law of large numbers for mixing processes, we get
$$\frac{1}{m}\sum_{i=1}^{m}[\mathbf{1}\{p_i\le c(t,\pi_i^*,k_i^*)\}-P(p_i\le c(t,\pi_i^*,k_i^*))]\to_{a.s.}0,$$
$$\frac{1}{m}\sum_{i=1}^{m}[\mathbf{1}\{1-p_i<c(t,\pi_i^*,k_i^*)\}-P(1-p_i<c(t,\pi_i^*,k_i^*))]\to_{a.s.}0,$$
$$\frac{1}{m}\sum_{H_i=0}[\mathbf{1}\{p_i\le c(t,\pi_i^*,k_i^*)\}-P(p_i\le c(t,\pi_i^*,k_i^*))]\to_{a.s.}0,$$
and (A5). The conclusion then follows from Assumption 3.6.
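
For readability, we record how the threshold $c(t,\pi_i^*,k_i^*)$ used above arises from $\psi_i$: under the
working alternative density $h_i(p)=(1-k_i)p^{-k_i}$, a direct rearrangement (supplied here as a worked
step; it is consistent with the bounds $c_\pm$ in Lemma A1.2 below) gives
$$\frac{\pi_i}{\pi_i+(1-\pi_i)(1-k_i)p^{-k_i}}\le t \iff p^{k_i}\le\frac{t(1-k_i)(1-\pi_i)}{(1-t)\pi_i} \iff p\le c(t,\pi_i,k_i):=1\wedge\left\{\frac{t(1-k_i)(1-\pi_i)}{(1-t)\pi_i}\right\}^{1/k_i}.$$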

Lemma A1.2. Suppose Assumptions 3.3, 3.5 and 3.6 hold. Then for small enough $\epsilon>0$, we have
$$\sup_{\|K-K^*\|_\infty<\epsilon}\ \sup_{\|\Pi-\Pi^*\|_\infty<\epsilon}\ \sup_{t\ge t_0}\left|\frac{1}{m}\sum_{i=1}^{m}\mathbf{1}\{p_i\le c(t,\pi_i,k_i)\}-G_0(t)\right|\le C_1\epsilon+o_{a.s.}(1),\qquad(\mathrm{A6})$$
$$\sup_{\|K-K^*\|_\infty<\epsilon}\ \sup_{\|\Pi-\Pi^*\|_\infty<\epsilon}\ \sup_{t\ge t_0}\left|\frac{1}{m}\sum_{i=1}^{m}\mathbf{1}\{1-p_i<c(t,\pi_i,k_i)\}-G_1(t)\right|\le C_2\epsilon+o_{a.s.}(1),\qquad(\mathrm{A7})$$
$$\sup_{\|K-K^*\|_\infty<\epsilon}\ \sup_{\|\Pi-\Pi^*\|_\infty<\epsilon}\ \sup_{t\ge t_0}\left|\frac{1}{m}\sum_{i=1}^{m}(1-H_i)\mathbf{1}\{p_i\le c(t,\pi_i,k_i)\}-\tilde G_1(t)\right|\le C_3\epsilon+o_{a.s.}(1),\qquad(\mathrm{A8})$$
where $C_1,C_2,C_3>0$ are independent of $\epsilon$, $\|K-K^*\|_\infty=\max_{1\le i\le m}|k_i-k_i^*|$,
$\|\Pi-\Pi^*\|_\infty=\max_{1\le i\le m}|\pi_i-\pi_i^*|$ and $0<t_0<1$.

Proof of Lemma A1.2. We only prove (A6) as the proofs for the other results are similar. For any
$n$ with $1/n\le\epsilon$, let $q_{v,n}=G_0^{-1}(v/n)$ for $v=\lfloor nt_0\rfloor,\ldots,n$, where $G_0^{-1}(x)=\inf\{u: G_0(u)\ge x\}$.
Define
$$\hat G(t,\Pi,K)=\frac{1}{m}\sum_{i=1}^{m}\mathbf{1}\{p_i\le c(t,\pi_i,k_i)\}.$$
Note that $\hat G(t,\Pi,K)$ and $G_0(t)$ are both non-decreasing functions of $t$. Denote by $\hat G(t-,\Pi,K)$
and $G_0(t-)$ the left limits of $\hat G$ and $G_0$ at the point $t$ respectively. Following the proof of the
Glivenko-Cantelli lemma (see Theorem 7.5.2 of Resnick, 2005), we have
$$\sup_{t\ge t_0}\left|\hat G(t,\Pi,K)-G_0(t)\right|\le\bigvee_{v=\lfloor nt_0\rfloor}^{n}|\hat G(q_{v,n},\Pi,K)-G_0(q_{v,n})|\vee|\hat G(q_{v,n}-,\Pi,K)-G_0(q_{v,n}-)|+1/n.\qquad(\mathrm{A9})$$
We note that for any $\|K-K^*\|_\infty<\epsilon$ and $\|\Pi-\Pi^*\|_\infty<\epsilon$,
$$c(t,\pi_i,k_i)\le c_+(t,\pi_i^*,k_i^*,\epsilon):=1\wedge\left\{\frac{t(1-k_i^*+\epsilon)(1-\pi_i^*+\epsilon)}{(1-t)(\pi_i^*-\epsilon)}\right\}^{1/(k_i^*+\epsilon)},$$
$$c(t,\pi_i,k_i)\ge c_-(t,\pi_i^*,k_i^*,\epsilon):=1\wedge\left\{\frac{t(1-k_i^*-\epsilon)(1-\pi_i^*-\epsilon)}{(1-t)(\pi_i^*+\epsilon)}\right\}^{1/(k_i^*-\epsilon)}.$$
Also, note that $c_-(t,\pi_i^*,k_i^*,\epsilon)$ is bounded away from zero for $t\ge\lfloor nt_0\rfloor/n$ and large enough $n$.
Define
$$\hat G_+(t,\Pi^*,K^*,\epsilon)=\frac{1}{m}\sum_{i=1}^{m}\mathbf{1}\{p_i\le c_+(t,\pi_i^*,k_i^*,\epsilon)\},\qquad \hat G_-(t,\Pi^*,K^*,\epsilon)=\frac{1}{m}\sum_{i=1}^{m}\mathbf{1}\{p_i\le c_-(t,\pi_i^*,k_i^*,\epsilon)\}.$$
We deduce that
$$\begin{aligned}
&\sup_{\|K-K^*\|_\infty<\epsilon}\ \sup_{\|\Pi-\Pi^*\|_\infty<\epsilon}\ \sup_{t\ge t_0}\left|\hat G(t,\Pi,K)-G_0(t)\right| \\
&\le\bigvee_{v=\lfloor nt_0\rfloor}^{n}|\hat G_+(q_{v,n},\Pi^*,K^*,\epsilon)-G_0(q_{v,n})|\vee|\hat G_-(q_{v,n},\Pi^*,K^*,\epsilon)-G_0(q_{v,n})| \\
&\qquad\vee|\hat G_+(q_{v,n}-,\Pi^*,K^*,\epsilon)-G_0(q_{v,n}-)|\vee|\hat G_-(q_{v,n}-,\Pi^*,K^*,\epsilon)-G_0(q_{v,n}-)|+1/n.
\end{aligned}$$
Next we analyze the term $|\hat G_+(q_{v,n},\Pi^*,K^*,\epsilon)-G_0(q_{v,n})|$. As $\pi_i^*$ and $k_i^*$ are both bounded away
from zero and one, some calculus shows that
$$\max_{1\le i\le m}|c_+(q_{v,n},\pi_i^*,k_i^*,\epsilon)-c(q_{v,n},\pi_i^*,k_i^*)|\le c_1\epsilon.$$
It thus implies that
$$\begin{aligned}
&|P(p_i\le c_+(q_{v,n},\pi_i^*,k_i^*,\epsilon))-P(p_i\le c(q_{v,n},\pi_i^*,k_i^*))| \\
&\le E|P(p_i\le c_+(q_{v,n},\pi_i^*,k_i^*,\epsilon)\mid x_i)-P(p_i\le c(q_{v,n},\pi_i^*,k_i^*)\mid x_i)| \qquad(\mathrm{A10})\\
&\le E\max_{1\le i\le m}|c_+(q_{v,n},\pi_i^*,k_i^*,\epsilon)-c(q_{v,n},\pi_i^*,k_i^*)|\le c_1\epsilon.
\end{aligned}$$
Then we have
$$\begin{aligned}
&|\hat G_+(q_{v,n},\Pi^*,K^*,\epsilon)-G_0(q_{v,n})| \\
&\le\left|\hat G_+(q_{v,n},\Pi^*,K^*,\epsilon)-\frac{1}{m}\sum_{i=1}^{m}P(p_i\le c_+(q_{v,n},\pi_i^*,k_i^*,\epsilon))\right| \\
&\qquad+\left|\frac{1}{m}\sum_{i=1}^{m}\{P(p_i\le c_+(q_{v,n},\pi_i^*,k_i^*,\epsilon))-P(p_i\le c(q_{v,n},\pi_i^*,k_i^*))\}\right| \\
&\qquad+\left|\frac{1}{m}\sum_{i=1}^{m}P(p_i\le c(q_{v,n},\pi_i^*,k_i^*))-G_0(q_{v,n})\right| \\
&\le\left|\hat G_+(q_{v,n},\Pi^*,K^*,\epsilon)-\frac{1}{m}\sum_{i=1}^{m}P(p_i\le c_+(q_{v,n},\pi_i^*,k_i^*,\epsilon))\right|+c_0\max_{1\le i\le m}|c_+(q_{v,n},\pi_i^*,k_i^*,\epsilon)-c(q_{v,n},\pi_i^*,k_i^*)|+o(1) \\
&\le c_2\epsilon+o_{a.s.}(1),
\end{aligned}$$
where the second inequality follows from Assumption 3.5, (A2) and (A5), and the third inequality
is due to (A5) and (A10). Similar arguments can be used to deal with the other terms. Therefore,
$$\sup_{\|K-K^*\|_\infty<\epsilon}\ \sup_{\|\Pi-\Pi^*\|_\infty<\epsilon}\ \sup_{t\ge t_0}\left|\hat G(t,\Pi,K)-G_0(t)\right|\le c_3\epsilon+1/n+o_{a.s.}(1)\le(c_3+1)\epsilon+o_{a.s.}(1).$$

Lemma A1.3. Under Assumptions 3.1-3.6, we have
$$\sup_{t\ge t_0}\left|\frac{1}{m}\sum_{i=1}^{m}\mathbf{1}\{p_i\le c(t,\hat\pi_i,\hat k_i)\}-G_0(t)\right|=o_{a.s.}(1),\qquad(\mathrm{A11})$$
$$\sup_{t\ge t_0}\left|\frac{1}{m}\sum_{i=1}^{m}\mathbf{1}\{1-p_i<c(t,\hat\pi_i,\hat k_i)\}-G_1(t)\right|=o_{a.s.}(1),\qquad(\mathrm{A12})$$
$$\sup_{t\ge t_0}\left|\frac{1}{m}\sum_{i=1}^{m}(1-H_i)\mathbf{1}\{p_i\le c(t,\hat\pi_i,\hat k_i)\}-\tilde G_1(t)\right|=o_{a.s.}(1).\qquad(\mathrm{A13})$$

Proof of Lemma A1.3. To show (A11), we define the event
$$A_{m,\epsilon}=\{\|\hat\Pi-\Pi^*\|_\infty<\epsilon\}\cap\{\|\hat K-K^*\|_\infty<\epsilon\}.$$
Conditional on $A_{m,\epsilon}$, by Lemma A1.2 and Assumption 3.5, we have
$$\begin{aligned}
&\sup_{t\ge t_0}\left|\frac{1}{m}\sum_{i=1}^{m}\left(\mathbf{1}\{p_i\le c(t,\hat\pi_i,\hat k_i)\}-\mathbf{1}\{p_i\le c(t,\pi_i^*,k_i^*)\}\right)\right| \\
&\le 2\sup_{\|K-K^*\|_\infty<\epsilon}\ \sup_{\|\Pi-\Pi^*\|_\infty<\epsilon}\ \sup_{t\ge t_0}\left|\frac{1}{m}\sum_{i=1}^{m}\left(\mathbf{1}\{p_i\le c(t,\pi_i,k_i)\}-P(p_i\le c(t,\pi_i,k_i))\right)\right| \\
&\qquad+\sup_{t\ge t_0}\left|\frac{1}{m}\sum_{i=1}^{m}\left(P(p_i\le c(t,\hat\pi_i,\hat k_i))-P(p_i\le c(t,\pi_i^*,k_i^*))\right)\right| \\
&\le c_0\max_{1\le i\le m}\sup_{t\ge t_0}\left|c(t,\hat\pi_i,\hat k_i)-c(t,\pi_i^*,k_i^*)\right|+c_4\epsilon+o_{a.s.}(1)=c_5\epsilon+o_{a.s.}(1).
\end{aligned}$$
As $P(A_{m,\epsilon})\to 1$, the conclusion follows. The proofs for (A12) and (A13) are similar and we omit
the details.

Lemma A1.4. Under Assumptions 3.1-3.6, we have
$$\sup_{t\ge t_0}\left|\frac{\sum_{i=1}^{m}\mathbf{1}\{1-p_i<c(t,\hat\pi_i,\hat k_i)\}}{\sum_{i=1}^{m}\mathbf{1}\{p_i\le c(t,\hat\pi_i,\hat k_i)\}}-\frac{G_1(t)}{G_0(t)}\right|=o_{a.s.}(1),\qquad(\mathrm{A14})$$
$$\sup_{t\ge t_0}\left|\frac{\sum_{i=1}^{m}(1-H_i)\mathbf{1}\{p_i\le c(t,\hat\pi_i,\hat k_i)\}}{\sum_{i=1}^{m}\mathbf{1}\{p_i\le c(t,\hat\pi_i,\hat k_i)\}}-\frac{\tilde G_1(t)}{G_0(t)}\right|=o_{a.s.}(1).\qquad(\mathrm{A15})$$

Proof of Lemma A1.4. For ease of presentation, denote $m^{-1}\sum_{i=1}^{m}\mathbf{1}\{1-p_i<c(t,\hat\pi_i,\hat k_i)\}$ and
$m^{-1}\sum_{i=1}^{m}\mathbf{1}\{p_i\le c(t,\hat\pi_i,\hat k_i)\}$ by $G_{m,1}(t)$ and $G_{m,0}(t)$ respectively. The monotonicity of $G_0$
implies that $\min_{t\ge t_0}G_0(t)=G_0(t_0)>0$. By Lemma A1.3, we deduce that
$$\begin{aligned}
\left|\frac{G_{m,1}(t)}{G_{m,0}(t)}-\frac{G_1(t)}{G_0(t)}\right|&=\left|\frac{(G_{m,1}(t)-G_1(t))G_0(t)-G_1(t)(G_{m,0}(t)-G_0(t))}{G_0(t)G_{m,0}(t)}\right| \\
&\le\frac{G_0(1)|G_{m,1}(t)-G_1(t)|+G_1(1)|G_{m,0}(t)-G_0(t)|}{G_0(t_0)\{G_0(t)-\sup_{x\ge t_0}|G_{m,0}(x)-G_0(x)|\}} \\
&\le\frac{G_0(1)\sup_{x\ge t_0}|G_{m,1}(x)-G_1(x)|+G_1(1)\sup_{x\ge t_0}|G_{m,0}(x)-G_0(x)|}{G_0(t_0)\{G_0(t_0)-\sup_{x\ge t_0}|G_{m,0}(x)-G_0(x)|\}}\to_{a.s.}0,
\end{aligned}$$
uniformly for any $t\ge t_0$. Similar arguments can be used to prove the other result.

Lemma A1.5. Suppose $f_0$ satisfies Condition (4) of the main paper. Under Assumption 3.6, we
have $\tilde G_1(t)\le G_1(t)$ for $t\ge t_0$.

Proof of Lemma A1.5. Under Assumption 3.6, we have
$$\begin{aligned}
\tilde G_1(t)&=\lim_{m\to+\infty}\frac{1}{m}\sum_{H_i=0}P(p_i\le c(t,\pi_i^*,k_i^*)) \\
&\le\lim_{m\to+\infty}\frac{1}{m}\sum_{H_i=0}P(1-p_i<c(t,\pi_i^*,k_i^*)) \\
&\le\lim_{m\to+\infty}\frac{1}{m}\sum_{i=1}^{m}P(1-p_i<c(t,\pi_i^*,k_i^*))=G_1(t)
\end{aligned}$$
for $t\ge t_0$, where the first inequality is due to the assumption that $P(p_i\le a)\le P(1-p_i\le a)$ for
any $a\in[0,1]$ and $p_i\sim f_0$.

Proof of Theorem 3.8. Set $e=\alpha-U(t_0)$. By Lemma A1.4, we have
$$\frac{\sum_{i=1}^{m}\mathbf{1}\{1-p_i<c(t_0,\hat\pi_i,\hat k_i)\}}{\sum_{i=1}^{m}\mathbf{1}\{p_i\le c(t_0,\hat\pi_i,\hat k_i)\}}\le\alpha-e/2<\alpha,$$
with probability tending to one. Therefore, $P(\hat t\ge t_0)\to 1$ as $m\to+\infty$. Then we have
$$\begin{aligned}
&\alpha-\frac{\sum_{i=1}^{m}(1-H_i)\mathbf{1}\{p_i\le c(\hat t,\hat\pi_i,\hat k_i)\}}{\sum_{i=1}^{m}\mathbf{1}\{p_i\le c(\hat t,\hat\pi_i,\hat k_i)\}} \\
&\ge\frac{\sum_{i=1}^{m}\mathbf{1}\{1-p_i<c(\hat t,\hat\pi_i,\hat k_i)\}}{\sum_{i=1}^{m}\mathbf{1}\{p_i\le c(\hat t,\hat\pi_i,\hat k_i)\}}-\frac{\sum_{i=1}^{m}(1-H_i)\mathbf{1}\{p_i\le c(\hat t,\hat\pi_i,\hat k_i)\}}{\sum_{i=1}^{m}\mathbf{1}\{p_i\le c(\hat t,\hat\pi_i,\hat k_i)\}} \\
&\ge\inf_{t\ge t_0}\left\{\frac{\sum_{i=1}^{m}\mathbf{1}\{1-p_i<c(t,\hat\pi_i,\hat k_i)\}}{\sum_{i=1}^{m}\mathbf{1}\{p_i\le c(t,\hat\pi_i,\hat k_i)\}}-\frac{G_1(t)}{G_0(t)}+\frac{G_1(t)-\tilde G_1(t)}{G_0(t)}+\frac{\tilde G_1(t)}{G_0(t)}-\frac{\sum_{i=1}^{m}(1-H_i)\mathbf{1}\{p_i\le c(t,\hat\pi_i,\hat k_i)\}}{\sum_{i=1}^{m}\mathbf{1}\{p_i\le c(t,\hat\pi_i,\hat k_i)\}}\right\}\ge o_{a.s.}(1),
\end{aligned}$$
where we have used the fact that $G_1(t)\ge\tilde G_1(t)$ as shown in Lemma A1.5. It implies that
$$\frac{\sum_{i=1}^{m}(1-H_i)\mathbf{1}\{p_i\le c(\hat t,\hat\pi_i,\hat k_i)\}}{\sum_{i=1}^{m}\mathbf{1}\{p_i\le c(\hat t,\hat\pi_i,\hat k_i)\}}\le\alpha+o_{a.s.}(1).$$
Finally, by Fatou's lemma,
$$\limsup_m \mathrm{FDR}(\hat t,\hat\Pi,\hat K)\le\limsup_m E\left[\frac{\sum_{i=1}^{m}(1-H_i)\mathbf{1}\{p_i\le c(\hat t,\hat\pi_i,\hat k_i)\}}{\sum_{i=1}^{m}\mathbf{1}\{p_i\le c(\hat t,\hat\pi_i,\hat k_i)\}}\right]\le\alpha,$$
which completes the proof.

A2 Additional simulation results

We summarize additional simulation results in Figures A1-A9.

A3 Numerical results for the mixed strategy

Figure A10 provides some numerical results for the mixed strategy discussed in Section 6.

References

[1] Barber, R. F., and Candès, E. J. (2016). A knockoff filter for high-dimensional selective infer-

ence. arXiv:1602.03574.

[2] Pötscher, B. M., and Prucha, I. R. (1989). A uniform law of large numbers for dependent and

heterogeneous data processes. Econometrica, 675-683.

[3] Resnick, S. I. (2005). A probability path. Springer Science & Business Media.

[4] White, H. (1982). Maximum likelihood estimation of misspecified models. Econometrica, 50,

1-25.


Figure A1: Performance comparison with m = 1, 000 under the basic setting (S0). False discovery
proportions (A) and true positive rates (B) were averaged over 100 simulation runs. Error bars (A)
represent the 95% CIs and the dashed horizontal line indicates the target FDR level of 0.05.


Figure A2: Performance comparison under S2 (covariate-dependent π0,i and f1,i , the standard
deviation of the z-score under H1 is 0.5). False discovery proportions (A) and true positive rates
(B) were averaged over 100 simulation runs. Error bars (A) represent the 95% CIs and the dashed
horizontal line indicates the target FDR level of 0.05.


Figure A3: Performance comparison under S3.1 (block correlation structure, positive correlations
(ρ=0.5) within blocks). False discovery proportions (A) and true positive rates (B) were averaged
over 100 simulation runs. Error bars (A) represent the 95% CIs and the dashed horizontal line
indicates the target FDR level of 0.05.


Figure A4: Performance comparison under S3.2 (block correlation structure, positive/negative
correlations (ρ= ± 0.5) within blocks). False discovery proportions (A) and true positive rates (B)
were averaged over 100 simulation runs. Error bars (A) represent the 95% CIs and the dashed
horizontal line indicates the target FDR level of 0.05.


Figure A5: Performance comparison under S3.3 (AR(1) structure, positive correlations (ρ=0.75)).
False discovery proportions (A) and true positive rates (B) were averaged over 100 simulation runs.
Error bars (A) represent the 95% CIs and the dashed horizontal line indicates the target FDR level
of 0.05.


Figure A6: Performance comparison under S3.4 (AR(1) structure, positive/negative correlations
(ρ= ± 0.75)). False discovery proportions (A) and true positive rates (B) were averaged over 100
simulation runs. Error bars (A) represent the 95% CIs and the dashed horizontal line indicates the
target FDR level of 0.05.


Figure A7: Performance comparison under S4 (heavy-tail covariate). False discovery proportions
(A) and true positive rates (B) were averaged over 100 simulation runs. Error bars (A) represent
the 95% CIs and the dashed horizontal line indicates the target FDR level of 0.05.


Figure A8: Performance comparison under S5.1 (increasing f0 ). False discovery proportions (A)
and true positive rates (B) were averaged over 100 simulation runs. Error bars (A) represent the
95% CIs and the dashed horizontal line indicates the target FDR level of 0.05. FDRreg(T) and
FDRreg(E) represent the FDRreg method using the theoretical and the empirical null, respectively.


Figure A9: Performance comparison under S5.2 (decreasing f0 ). False discovery proportions (A)
and true positive rates (B) were averaged over 100 simulation runs. Error bars (A) represent the
95% CIs and the dashed horizontal line indicates the target FDR level of 0.05. FDRreg(T) and
FDRreg(E) represent the FDRreg method using the theoretical and the empirical null, respectively.


Figure A10: Performance comparison under the basic setting (S0). CAMT used the mixed strategy
discussed in Section 6. False discovery proportions (A) and true positive rates (B) were averaged
over 100 simulation runs. Error bars (A) represent the 95% CIs and the dashed horizontal line
indicates the target FDR level of 0.05.
