Efron Mixture
Current scientific techniques in genomics and image processing routinely produce hypothesis testing problems with hundreds or thousands
of cases to consider simultaneously. This poses new difficulties for the statistician, but also opens new opportunities. In particular, it allows
empirical estimation of an appropriate null hypothesis. The empirical null may be considerably more dispersed than the usual theoretical
null distribution that would be used for any one case considered separately. An empirical Bayes analysis plan for this situation is developed,
using a local version of the false discovery rate to examine the inference issues. Two genomics problems are used as examples to show the
importance of correctly choosing the null hypothesis.
KEY WORDS: Empirical Bayes; Empirical null hypothesis; Local false discovery rate; Microarray analysis; Unobserved covariates.
Bradley Efron is Professor, Department of Statistics, Stanford University, Stanford, CA 94305 (E-mail: [email protected]). The author thanks Robert Shafer, David Katzenstein, and Rami Kantor for bringing the drug mutation data to his attention, and Robert Tibshirani for several helpful discussions.
© 2004 American Statistical Association. Journal of the American Statistical Association, March 2004, Vol. 99, No. 465, Theory and Methods. DOI 10.1198/016214504000000089
Efron: Choosing a Null Hypothesis in Large-Scale Tests
with the $u_{ij}$ and $\beta_i$ mutually independent and
$$I_j = \begin{cases} -1, & j = 1, 2, \ldots, n \\ +1, & j = n + 1, \ldots, 2n. \end{cases} \tag{18}$$
Then it is easy to show that the statistics $Y_i$ follow a dilated $t$ distribution with $2n - 2$ df,
$$Y_i \sim \left(1 + \frac{n}{2}\sigma_\beta^2\right)^{1/2} \cdot t_{2n-2}, \tag{19}$$
whereas the permutation distribution, permuting Treatments and Controls within each experiment, has nearly a standard $t_{2n-2}$ null distribution. So, for example, if $\sigma_\beta^2 = 2/n$, then the empirical density of the $Y_i$’s will be $\sqrt{2}$ times as wide as the permutation null.
The quantity $\beta_i$ in (17) and (18) produces the only consistent differences between Treatments and Controls in experiment $i$. If $\beta_i$ is a dependable feature of the $i$th experiment, and would appear again with the same value in a replication of the study, then the permutation null $t_{2n-2}$ is a reasonable basis for inference. With $n$ large and $\sigma_\beta^2 = 2/n$, this results in $\mathrm{fdr}(y_i) < .10$ for the most extreme 2% of the observed $t$ statistics, favoring those with the largest values of $|\beta_i|$.
Suppose, however, that $\beta_i$ is not inherent to experiment $i$, but rather is a purely random effect that would have a different value and perhaps a different sign if the study were repeated; that is, $\beta_i$ is part of the noise and not part of the signal. In this case, the appropriate choice is the empirical null (19). The equivalent of Figure 1 would be all central peak, with no interesting outliers, and no cases with small values of $\mathrm{fdr}(y_i)$. This is appropriate, because now there is no real Treatment effect.
In this latter context $\beta_i$ acts as an unobserved covariate, a quantity that the statistician would use to correct the Treatment–Control comparison if it were observable. Unobserved covariates are ubiquitous in observational studies. There are several obvious ones in the drug mutation study, including personal patient characteristics, such as age and gender, previous use of AZT and other non-PI drugs, years since infection, geographic location, and so on.
The effect of important unobserved covariates is to dilate the null hypothesis density $f_0(z)$, as happens in (19). Unobserved covariates will also dilate the Interesting density $f_1(z)$ in (9), and the mixture density $f(z)$, (10). However, an empirical fitting method for estimating $f(z)$, such as the spline fit in Figure 1, automatically includes any dilation effects. In estimating $\mathrm{fdr}(z) = f_0(z)/f(z)$, it is important to also allow for dilation of the numerator $f_0$. This is a strong argument for preferring the empirical null hypothesis in observational studies.
5. A STRUCTURAL MODEL FOR THE z-VALUES
The Bayesian specifications (9) underlying the fdr results have the advantage of not requiring a structural model for the $z$-values; in particular, it is not necessary to motivate, or even describe, the nonnull density $f_1(z)$. There is, however, a simple structural model that can help elucidate the Interesting–Uninteresting distinction in (9).
The structural model assumes that $z_i$, the $i$th $z$-value, is normally distributed around a “true value” $\mu_i$, its expectation,
$$z_i \sim N(\mu_i, 1) \quad \text{for } i = 1, 2, \ldots, N, \tag{20}$$
with $\mu_i$ having some prior distribution $g(\mu)$,
$$\mu_i \sim g(\mu) \quad \text{for } i = 1, 2, \ldots, N. \tag{21}$$
Structure (20) is often a good approximation (see Efron 1988, sec. 4), and in fact proved reasonably accurate in the bootstrap experiment yielding (15). Together, (20) and (21) say that the mixture density $f(z)$, (10), is a convolution of $g(\mu)$ with the standard normal density $\varphi(z)$,
$$f(z) = \int_{-\infty}^{\infty} \varphi(z - \mu)\, g(\mu)\, d\mu \tag{22}$$
[with the understanding that $g(\mu)$ may include discrete probability atoms].
As a first application of the structural model, suppose that we insist that $g(\mu)$ put probability $p_0$ on $\mu = 0$,
$$\Pr\nolimits_g\{\mu = 0\} = p_0, \tag{23}$$
for some fixed value of $p_0$ between 0 and 1. This amounts to the original Bayes model (9) with $p_0 = \Pr\{\text{Uninteresting}\}$, $f_0(z)$ the theoretical null hypothesis $N(0, 1)$, and
$$f_1(z) = \int_{\mu \neq 0} \varphi(z - \mu)\, g(\mu)\, d\mu \Big/ (1 - p_0). \tag{24}$$
In the context of this article, $p_0$ should be .90 or greater.
For any $f(z)$ of the convolution form (22), let $(\delta_g, \sigma_g)$ be the center and width parameters $(\delta_0, \sigma_0)$ defined by (14). Figure 4 answers the following question: For a given choice of $p_0$ in constraint (23), what are the maximum possible values of $|\delta_g|$ and of $\sigma_g$,
$$\delta_{\max} = \max\{|\delta_g| \mid p_0\} \quad \text{and} \quad \sigma_{\max} = \max\{\sigma_g \mid p_0\}? \tag{25}$$
Three curves appear for $\sigma_{\max}$, for the general case just described, for the case where the nonzero component of $g(\mu)$ is required to be symmetric around 0, and for the case where it is also required to be normal. Here only the general case will be discussed. Remark F (Sec. 7) discusses the solution of (25), which turns out to have a simple “single-point” form.
The notable feature of Figure 4 is that for $p_0 \ge .90$, my preferred realm for large-scale hypothesis testing, $(\delta_{\max}, \sigma_{\max})$ must be quite near the theoretical null values $(0, 1)$,
$$\delta_{\max} \le .07 \quad \text{and} \quad \sigma_{\max} \le 1.04. \tag{26}$$
Figure 4. Maximum Possible Values of the Center and Width Parameters $(\delta_0, \sigma_0)$, (14), When the Structural Model (20)–(22) is Constrained to Put Probability $p_0$ on $\mu = 0$. For $1 - p_0 \le .10$, the maxima are not much greater than the theoretical null values (0, 1), as shown in Table 1.
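The center and width of a convolution mixture (22) can be checked numerically for a discrete prior. The sketch below is illustrative only: it assumes, following the description in Remark D, that (14) takes the center to be the maximizer of $f$ and the width to be $[-d^2 \log f / dz^2]^{-1/2}$ at that point; the function names, the grid search, and the search window $[-1, 1]$ are assumptions made here, not part of the article. The example prior is the two-point $g$ mentioned in Remark F (probability .90 at 0 and .10 at 1.47).

```python
import numpy as np

def mixture_density(z, atoms, probs):
    """f(z) = sum_k probs[k] * phi(z - atoms[k]): convolution (22) for a discrete g."""
    z = np.asarray(z, dtype=float)[:, None]
    mu = np.asarray(atoms, dtype=float)[None, :]
    w = np.asarray(probs, dtype=float)[None, :]
    phi = np.exp(-0.5 * (z - mu) ** 2) / np.sqrt(2.0 * np.pi)
    return (w * phi).sum(axis=1)

def center_and_width(atoms, probs, lo=-1.0, hi=1.0, step=1e-3):
    """Grid approximation to (delta_g, sigma_g).

    Assumed reading of (14): delta_g is the mode of f, and sigma_g comes from
    the curvature of log f at the mode (second central difference).
    The window [lo, hi] is a hypothetical choice that suits p0 near 1.
    """
    z = np.arange(lo, hi + step, step)
    log_f = np.log(mixture_density(z, atoms, probs))
    i = int(np.argmax(log_f[1:-1])) + 1  # interior index, so both neighbours exist
    second_diff = (log_f[i + 1] - 2.0 * log_f[i] + log_f[i - 1]) / step ** 2
    return z[i], (-second_diff) ** -0.5

# Two-point prior quoted in Remark F: mass .90 at mu = 0 and .10 at mu_1 = 1.47.
delta_g, sigma_g = center_and_width(atoms=[0.0, 1.47], probs=[0.90, 0.10])
print(f"delta_g = {delta_g:.3f}, sigma_g = {sigma_g:.3f}")
```

The printed values can be compared with the rounded bounds in (26); a pure point mass at 0 recovers the theoretical null values (0, 1).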
Table 2. Weights $\pi_j$ and Locations $\mu_j$ for the Eight-Point Best-Fit Estimate $g(\mu)$ of Figure 8
Figure 6. Histogram of $N = 3{,}226$ z-Values From the Breast Cancer Study. The theoretical N(0, 1) null is much narrower than the central peak, which has $(\delta_0, \sigma_0) = (-.02, 1.58)$. In this case the central peak seems to include the entire histogram.
microarray experiment concerning differences between two types of genetic mutations causing increased breast cancer risk, BRCA1 and BRCA2 (see Hedenfalk, Duggen, and Chen 2001; Efron and Tibshirani 2002; Efron 2003).
The experiment included 15 breast cancer patients, 7 with BRCA1 and 8 with BRCA2. Each woman’s tumor was analyzed on a separate microarray, each microarray reporting on the same set of $N = 3{,}226$ genes. For each gene, the two-sample $t$ statistic $y_i$ comparing the seven BRCA1 responses with the eight BRCA2’s was computed. The $y_i$’s were then converted to $z$-values,
$$z_i = \Phi^{-1}(F_{13}(y_i)), \tag{29}$$
where $F_{13}$ is the cdf of a standard $t$ distribution with 13 df. Figure 6 displays the histogram of the 3,226 $z$-values.
The central peak is wider here than in Figure 1, with center-width estimates $(\delta_0, \sigma_0) = (-.02, 1.58)$. More importantly, the histogram seems to be all central peak, with no interesting outliers such as those seen at the left of Figure 1. This was reflected in the local fdr calculations; using the theoretical $N(0, 1)$ null yielded 35 genes with $\mathrm{fdr}(z_i) < .1$, those with $|z_i| > 3.35$; using the empirical $N(-.02, 1.58^2)$ null, no genes at all had fdr < .1 or, for that matter, fdr < .9, the histogram in fact being a little short-tailed compared with $N(-.02, 1.58^2)$.
There is ample reason to distrust the theoretical null in this case. The microarray experiment, for all its impressive technology, is still an observational study, with a wide range of unobserved covariates possibly distorting the BRCA1–BRCA2 comparison.
Another reason for doubt can be found in the data itself. The fdr methodology does not require independence of the $y_i$’s or $z_i$’s across genes. However, it does require that the 15 measurements for each gene be independent across the microarrays. Otherwise, the two-sample $t$ statistic $y_i$ will not have a $t_{13}$ null distribution, not even approximately.
Unfortunately the experimental methodology used in the breast cancer study seems to have induced substantial correlations among the various microarrays. In particular, as discussed in Remark G, the first four microarrays in the BRCA2 groups
7. REMARKS
Remark A (Drug mutation study). The data base for the drug mutation study (Wu et al. 2003) included 2,497 patients having HIV subtype B, of whom 1,391 had received at least one of six popular protease inhibitor (PI) drugs: amprenavir, indinavir, lopinavir, nelfinavir, ritonavir, or saquinavir. Among the 1,391, the mean number of PI drugs taken was 2.05 per patient. Amino acid sequences were obtained at all 99 positions on the HIV protease gene, and mutations from wild-type recorded; 25 positions showed 3 or fewer mutations among the 1,391 patients, deemed too few for analysis, leaving 74 positions for the investigation here. Each of the 74 individual logistic regressions included an intercept term as well as the six PI main effects, but no other covariates.
Remark B (The local false discovery rate). The local fdr, (11) or (12), is closely related to Benjamini and Hochberg’s (1995) “tail-area” FDR, as discussed by Efron et al. (2001), Storey (2002), and Efron and Tibshirani (2002). Substituting cdf’s $F_0$ and $F$ for the densities $f_0$ and $f$, Bayes’s theorem gives a tail-area version of (11),
$$\Pr\{\text{Uninteresting} \mid z \le z_0\} = p_0 F_0(z_0)/F(z_0) \equiv \mathrm{FDR}(z_0). \tag{30}$$
Here $\mathrm{FDR}(z_0)$ turns out to be the conditional expectation of $\mathrm{fdr}(z) \equiv p_0 f_0(z)/f(z)$ given $z \le z_0$,
$$\mathrm{FDR}(z_0) = \int_{-\infty}^{z_0} \mathrm{fdr}(z) f(z)\, dz \Big/ \int_{-\infty}^{z_0} f(z)\, dz. \tag{31}$$
Benjamini and Hochberg worked in a frequentist framework, but their FDR control rule can be stated in empirical Bayes terms. Given $F_0$, which they usually took to be what has been called here the theoretical null, they estimate $\mathrm{FDR}(z_0)$ by
$$\widehat{\mathrm{FDR}}(z_0) = p_0 F_0(z_0)/\hat{F}(z_0), \tag{32}$$
where $\hat{F}$ is the empirical cdf of the $z_i$’s. For a desired control level $\alpha$, say $\alpha = .05$, define
$$z_0 = \arg\max_z \{\widehat{\mathrm{FDR}}(z) \le \alpha\}; \tag{33}$$
then rejecting all cases with $z_i \le z_0$ gives an expected (frequentist) rate of false discoveries no greater than $\alpha$.
With $z_0$ as in (33), relation (31) (applied to the estimated versions of FDR, fdr, and $f$) says that the weighted average of $\mathrm{fdr}(z_i)$ for the cases rejected by the FDR level-$\alpha$ rule is itself $\alpha$. As an example, take $\alpha = .05$ and $f_0$ equal to the theoretical $N(0, 1)$ null. Applying the FDR control rule to the negative side of Figure 1’s drug mutation data rejects the null hypothesis for the 56 cases having $z_i \le -2.61$; the corresponding 56 values of $\mathrm{fdr}(z_i)$ have weighted average $\alpha = .05$. They vary from nearly 0 at the far left to .19 at the boundary value $z = -2.61$,
justifying the term “local”; $z_i$’s near the boundary are more likely to be false discoveries than the overall .05 rate suggests.
Our concern with a correct choice of null hypothesis applies to FDR just as well as to fdr. In the microarray study, FDR with $F_0 = N(0, 1)$ gives 24 significant genes at $\alpha = .05$, whereas $F_0 = N(-.02, 1.58^2)$ gives none. In fact, any simultaneous testing procedure, the popular Westfall–Young method (Westfall and Young, 1993), for example, will depend on a correct assessment of $p$ values for the individual cases, that is, on the choice of $F_0$.
Remark C [Estimating $f(z)$]. The Poisson regression method used in Figure 1 to estimate the mixture density $f(z)$, (10), originates in an idea of Lindsey described by Efron and Tibshirani (1996, sec. 2). The range of the sample $z_1, z_2, \ldots, z_N$ is partitioned into $K$ equal intervals, with interval $k$ having midpoint $x_k$ and containing count $s_k$ of the $N$ $z$-values; the expectation $\lambda_k$ of $s_k$ is nearly proportional to $f_k \equiv f(x_k)$, and if the $z_i$’s are independent, then the counts approximate independent Poisson variates,
$$s_k \overset{\text{ind}}{\sim} \mathrm{Poi}(\lambda_k) \quad \text{and} \quad \lambda_k = c f_k, \quad k = 1, 2, \ldots, K, \tag{34}$$
where $c$ is a constant depending on $N$ and the interval length. Lindsey’s method is to estimate the $\lambda_k$’s with a Poisson regression, which because of (34) amounts to estimating a scaled version of the $f_k$’s; in other words, estimating $f(z)$. $K$ equals 60 in Figure 1, with the regression model being a natural spline with 7 df, roughly equivalent to a sixth degree polynomial fit in $z$.
Poisson regression based on (34) is almost fully efficient for estimating $f(z)$ if the $z_i$’s are independent. Here one does not expect independence, but we still have the expectation of $s_k$ proportional to $f_k$. The Poisson regression method will still tend to unbiasedly estimate $f(z)$, assuming the regression model is sufficiently flexible, though we may lose estimating efficiency.
I also used the bootstrap analysis that gave the standard errors in (15) to check (34). This turned out to be surprisingly accurate for the drug mutation data. If it had not, then I might have used the bootstrap estimate of covariance for the $s_k$’s to motivate a more efficient estimation procedure, though this is unlikely to be important for large values of $N$. In any case bootstrap analyses as in (15) will provide legitimate standard errors for the Poisson regression whether or not (34) is valid.
Remark D (Estimating the empirical null distribution). The main tactic of this article is to estimate the null distribution $f_0(x)$ in (9) from the central peak in the $z$-values’ histogram. Assuming normality for $f_0$ gives
$$\log f(z) \doteq -\frac{1}{2}\left(\frac{z - \delta_0}{\sigma_0}\right)^2 + \text{constant} \tag{35}$$
for $z$ near 0, so that $\delta_0$ and $\sigma_0$ can be estimated by differentiating $\log f(z)$ as in (14). The constant depends on $N$ and $p_0$, but the constant has no effect on the derivatives in (14).
Directly differentiating the spline estimate of $\log f(z)$ can give an overly variable estimate of $\sigma_0$. One more smoothing step was used here, fitting a quadratic curve $a_0 + a_1 x_k + a_2 x_k^2$ by ordinary least squares to the estimated values $\log f_k$, for $x_k$ within 1.5 units of the maximum $\delta_0$, yielding $\sigma_0 = [-2a_2]^{-1/2}$ as in (14). This procedure gave the small bootstrap standard error estimate in (15).
None of this methodology is crucial, although it is important that the estimates $\delta_0$ and $\sigma_0$ relate directly to $f_0(z)$ and are not much affected by the nonnull distribution $f_1(z)$ in (9). As an example of what can go wrong, suppose that one tries to estimate $\sigma_0$ by a “robust” scale measure, such as (84th quantile minus 16th quantile)/2. This gives $\sigma_0 = 1.47$ for the drug mutation data, reflecting long tails due to the Interesting cases in Figure 1. Similar difficulties arise using the central slope of a qq plot. Basically, a density estimate of the central peak is required, and then some assessment of its center and width.
More ambitiously, one might try extending the estimation of $f_0(z)$ to third moments, permitting a skew null distribution. Expression (35) could be generalized to
$$-\log f(z) \doteq c_0 + c_1 z + c_2 z^2/2 + c_3 z^3/6, \tag{36}$$
now requiring three derivatives to estimate the coefficients rather than the two of (14). This is an unexplored path, and in particular Table 1 has not been extended to include skewness bounds.
Familiarity was the only reason for using $z$-values instead of $t$-values in Figures 1 and 6.
Remark E (Estimating $p_0$). One can obtain reasonable upper bounds for $p_0$ in (9) from estimates of
$$\pi(c) \equiv \Pr\nolimits_f\{z_i \in \delta_0 \pm c\sigma_0\}. \tag{37}$$
Supposing that $f_0(z) = N(\delta_0, \sigma_0^2)$, define
$$G_0(c) = 2\Phi(c) - 1 \quad \text{and} \quad G_1(c) = \int_{\delta_0 - c\sigma_0}^{\delta_0 + c\sigma_0} f_1(z)\, dz, \tag{38}$$
the probabilities that $z_i \in \delta_0 \pm c\sigma_0$ under $f_0$ and $f_1$. Then
$$p_0 = \frac{\pi(c) - G_1(c)}{G_0(c) - G_1(c)} \le \frac{\pi(c)}{G_0(c)}, \tag{39}$$
the inequality following from the assumption that $G_1(c) \le G_0(c)$; that is, the $f_1$ density is more dispersed than $f_0$. This leads to the estimated upper bound for $p_0$,
$$\hat{p}_0 = \frac{\hat{\pi}(c)}{G_0(c)}, \quad \text{with } \hat{\pi}(c) = \#\{z_i \in \delta_0 \pm c\sigma_0\}/N. \tag{40}$$
In particular, if it is assumed that $G_1(c) = 0$, in other words, that the Interesting $z_i$’s always fall outside $\delta_0 \pm c\sigma_0$, then $\hat{p}_0 = \hat{\pi}(c)/G_0(c)$ is unbiased. (This is the same estimate suggested in remark F of Efron et al. 2001 and Storey 2002.) Choosing $(\delta_0, \sigma_0) = (-.35, 1.20)$ and $c = 1.5$ gave $\hat{p}_0 = .88$ for the drug mutation data, with bootstrap standard error .024.
Remark F [Single-point solutions for $(\delta_{\max}, \sigma_{\max})$]. The distributions $g(\mu)$ providing $(\delta_{\max}, \sigma_{\max})$ in (25), as graphed in Figure 4, have their nonzero components supported at a single point $\mu_1$. For example, $g(\mu)$ for the entry giving $\sigma_{\max} = 1.04$ in Table 1 puts probability .90 at $\mu = 0$ and .10 at $\mu_1 = 1.47$. Single-point optimality was proved for three of the four cases in Figure 4, and verified by numerical maximization for the “General” case. Here is the proof for the $\sigma_{\max}$ “Symmetric” case; the other two proofs are similar.
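As a numerical companion to the Symmetric case, the one-dimensional maximization that the proof arrives at in (46) can be evaluated on a grid. This is a minimal sketch, assuming $p_0 \ge 1/2$ as the proof does; the function name and the grid range and resolution are choices made here for illustration, not part of the article.

```python
import numpy as np

def sigma_max_symmetric(p0, grid=None):
    """Grid maximization of R(mu1) = mu1^2 / (1 + c0 * exp(mu1^2 / 2)), as in (46).

    Returns (mu1, sigma_max), with c0 = p0 / (1 - p0) and
    sigma_max = (1 - R_max) ** (-1/2). The grid is a hypothetical choice.
    """
    if grid is None:
        grid = np.linspace(0.0, 5.0, 50001)  # step 1e-4
    c0 = p0 / (1.0 - p0)
    r = grid ** 2 / (1.0 + c0 * np.exp(0.5 * grid ** 2))
    k = int(np.argmax(r))
    return grid[k], (1.0 - r[k]) ** -0.5

for p0 in (0.95, 0.90, 0.70):
    mu1, s_max = sigma_max_symmetric(p0)
    print(f"p0 = {p0:.2f}: mu1 = {mu1:.2f}, sigma_max = {s_max:.3f}")
```

The maximizing arguments can be compared with the range 1.43 to 1.51 quoted in the proof for $p_0$ between .95 and .70.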
Consider symmetric distributions putting probability $p_0$ on $\mu = 0$ and probabilities $p_j$ on symmetric pairs $(-\mu_j, \mu_j)$, $j = 1, 2, \ldots, J$, so (22) becomes
$$f(z) = p_0 \varphi(z) + \sum_{j=1}^{J} p_j [\varphi(z - \mu_j) + \varphi(z + \mu_j)]/2. \tag{41}$$
Defining $c_0 = p_0/(1 - p_0)$, $r_j = p_j/p_0$, and $r_+ = \sum_1^J r_j = 1/c_0$, $\sigma_0$ in (14) can be expressed as
$$\sigma_0 = (1 - Q)^{-1/2}, \quad \text{where } Q = \frac{\sum_1^J r_j \mu_j^2 e^{-\mu_j^2/2}}{c_0 r_+ + \sum_1^J r_j e^{-\mu_j^2/2}}. \tag{42}$$
Here $\delta_0 = 0$, which is true by symmetry assuming that $p_0 \ge 1/2$. Then $\sigma_{\max}$ in (25) can be found by maximizing $Q$.
It will be shown that with $p_0$ (and $c_0$) and $\mu_1, \mu_2, \ldots, \mu_J$ held fixed in (41), $Q$ is maximized by a choice of $p_1, p_2, \ldots, p_J$ having $J - 1$ zero values; this is a stronger version of the single-point result. Because $Q$ is homogeneous in $r = (r_1, r_2, \ldots, r_J)$ in (42), the unconstrained maximization of $Q(r)$, subject only to $r_j \ge 0$ for $j = 1, 2, \ldots, J$, can be considered.
Differentiation gives
$$\partial Q/\partial r_j = \frac{1}{\text{den}} \left[ \mu_j^2 e^{-\mu_j^2/2} - Q \cdot \left( c_0 + e^{-\mu_j^2/2} \right) \right], \tag{43}$$
with “den” the denominator of $Q$. At a maximizing point $r$, we must have
$$\frac{\partial Q(r)}{\partial r_j} \le 0 \quad \text{with equality if } r_j > 0. \tag{44}$$
Defining $R_j = \mu_j^2 / (1 + c_0 e^{\mu_j^2/2})$, (43) and (44) yield
$$Q(r) \ge R_j \quad \text{with equality if } r_j > 0. \tag{45}$$
Because $Q(r)$ is the maximum, this says that $r_j$, and hence $p_j$, can be nonzero only if $j$ maximizes $R_j$. In case of ties, one of the maximizing $j$’s can be arbitrarily chosen.
All of this shows that only $J = 1$ need be considered in (41). The global maximized value of $\sigma_0$ in (41) is $\sigma_{\max} = (1 - R_{\max})^{-1/2}$, where
$$R_{\max} = \max_{\mu_1} \left\{ \mu_1^2 \Big/ \left( 1 + c_0 e^{\mu_1^2/2} \right) \right\}. \tag{46}$$
The maximizing argument $\mu_1$ ranges from 1.43 for $p_0 = .95$ to 1.51 for $p_0 = .70$. The corresponding result for $\delta_{\max}$ is simpler, $\mu_1 = \delta_{\max} + 1$.
Remark G (Microarray correlation in the breast cancer study). It is easy to spot an unwanted correlation structure among the eight BRCA2 microarrays. Let $X$ be the $3{,}226 \times 8$ matrix of BRCA2 data, with the columns of $X$ standardized to have mean 0 and variance 1. A “de-gened” matrix $\tilde{X}$ was formed by subtracting row-wise averages from each element of $X$,
$$\tilde{x}_{ij} = x_{ij} - \sum_{k=1}^{8} x_{ik}/8. \tag{47}$$
Table 3 shows the $8 \times 8$ correlation matrix of $\tilde{X}$. With genuine gene effects subtracted out, the correlations should vary around $-1/7 = -.14$ if the columns of $X$ are independent. Instead, the columns are correlated in blocks of four, with the off-diagonal blocks too negative and the on-diagonal blocks too positive.
Table 3. Correlation Matrix for the BRCA2 Data With Row-Wise Means Subtracted off (47), Indicating Positive Correlations Within the Two Blocks of Four
        1      2      3      4      5      6      7      8
1    1.00    .02    .02    .23   −.36   −.35   −.39   −.34
2     .02   1.00    .10   −.08   −.30   −.30   −.23   −.33
3     .02    .10   1.00   −.17   −.21   −.26   −.31   −.27
4     .23   −.08   −.17   1.00   −.30   −.23   −.27   −.32
5    −.36   −.30   −.21   −.30   1.00   −.02    .11    .22
6    −.35   −.30   −.26   −.23   −.02   1.00    .15    .13
7    −.39   −.23   −.31   −.27    .11    .15   1.00    .07
8    −.34   −.33   −.27   −.32    .22    .13    .07   1.00
Remark H (Scaling properties). The associate editor pointed out that the combination of empirical null hypotheses with false discovery rate methodology “scales up” nicely, in terms of both the number of tests and the amount of information per test. Consider the structural model (20), (21) with $g(\mu)$ a mixture of 99% $\mu \sim N(0, .01)$ and 1% of $\mu = 5$. For $N$ the number of tests large enough, methods like Bonferroni bounds that control the family-wise error rate will eventually accept all $N$ null hypotheses; fdr methods, using either the empirical or theoretical null, will correctly identify most of the Interesting 1%.
Suppose now that the amount of information per test increases by a factor of $n$, so that each $\mu_i \to \sqrt{n}\, \mu_i$ in (21). Using the theoretical $N(0, 1)$ null makes fdr reject all $N$ cases for $n$ sufficiently large, whereas the empirical null continues to identify only the Interesting 1%. In this context, the fdr/empirical combination avoids the standard criticism of hypotheses testing, that rejection becomes certain for large sample sizes.
8. SUMMARY
Large-scale simultaneous hypothesis testing, where the number of cases exceeds, say, 100, permits the empirical estimation of a null hypothesis distribution. The empirical null may be wider (more dispersed) than the theoretical null distribution that would ordinarily be used for a single hypothesis test. The choice between empirical and theoretical nulls can greatly influence which cases are identified as “Significant” or “Interesting,” as opposed to “Null” or “Uninteresting,” this being true no matter which simultaneous hypothesis testing method is used.
This article presents an analysis plan for large-scale testing situations:
• A density fitting technique is used to estimate the null hypothesis distribution $f_0$ (Fig. 1 and Sec. 3).
• The local false discovery rate (fdr), an empirical Bayes version of standard FDR theory, provides inferences for the $N$ cases (Fig. 3 and Sec. 2).
There are many possible reasons for overdispersion of the empirical null distribution that would lead to the empirical null being preferred for simultaneous testing, including:
• Unobserved covariates in an observational study (Sec. 4)
• Hidden correlations (Sec. 6)
• A large proportion of genuine but uninterestingly small effects (Fig. 5).
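The analysis plan can be sketched end to end on simulated data. The sketch below is a rough illustration under stated assumptions, not the article's implementation: a degree-6 polynomial basis stands in for the 7-df natural spline of Remark C, the Poisson regression is fitted by a plain iteratively reweighted least squares loop, the simulated mixture (95% $N(0, 1.3^2)$, 5% $N(-4, 1)$) and all function names are hypothetical, and fdr is computed as $f_0/f$ with $p_0 = 1$ and capped at 1. The last helper follows the upper bound (40) of Remark E.

```python
import math
import numpy as np
from numpy.polynomial.legendre import legvander

def lindsey_density(z, n_bins=60, degree=6, n_iter=50):
    """Poisson regression of bin counts on a polynomial basis (Lindsey's method)."""
    counts, edges = np.histogram(z, bins=n_bins)
    x = 0.5 * (edges[:-1] + edges[1:])              # bin midpoints x_k
    u = 2.0 * (x - x[0]) / (x[-1] - x[0]) - 1.0     # rescale to [-1, 1] for conditioning
    basis = legvander(u, degree)
    eta = np.log(counts + 0.5)                      # starting values for log(lambda_k)
    for _ in range(n_iter):                         # IRLS for the Poisson log link
        lam = np.exp(eta)
        working = eta + (counts - lam) / lam
        root_w = np.sqrt(lam)
        beta, *_ = np.linalg.lstsq(basis * root_w[:, None], working * root_w, rcond=None)
        eta = np.clip(basis @ beta, -30.0, 30.0)    # guard against overflow
    bin_width = edges[1] - edges[0]
    return x, np.exp(eta) / (len(z) * bin_width)    # lambda_k is about N * width * f_k

def empirical_null(x, f, half_window=1.5):
    """Center = maximizing midpoint; width from a quadratic OLS fit to log f near it."""
    log_f = np.log(f)
    delta0 = x[int(np.argmax(log_f))]
    near = np.abs(x - delta0) <= half_window
    a2, _a1, _a0 = np.polyfit(x[near], log_f[near], 2)
    return delta0, (-2.0 * a2) ** -0.5              # sigma0 = [-2 a2]^(-1/2)

def local_fdr(x, f, delta0, sigma0, p0=1.0):
    """fdr = p0 * f0 / f with f0 = N(delta0, sigma0^2), capped at 1."""
    f0 = np.exp(-0.5 * ((x - delta0) / sigma0) ** 2) / (sigma0 * math.sqrt(2.0 * math.pi))
    return np.minimum(1.0, p0 * f0 / f)

def p0_upper_bound(z, delta0, sigma0, c=1.5):
    """Estimate (40): share of z_i inside delta0 +/- c*sigma0, divided by G0(c)."""
    inside = np.mean(np.abs(z - delta0) <= c * sigma0)
    return inside / math.erf(c / math.sqrt(2.0))    # G0(c) = 2*Phi(c) - 1

# Hypothetical data: an overdispersed null plus a small Interesting group.
rng = np.random.default_rng(0)
z = np.concatenate([rng.normal(0.0, 1.3, 4750), rng.normal(-4.0, 1.0, 250)])

x, f = lindsey_density(z)
delta0, sigma0 = empirical_null(x, f)
fdr_empirical = local_fdr(x, f, delta0, sigma0)
fdr_theoretical = local_fdr(x, f, 0.0, 1.0)
p0_hat = p0_upper_bound(z, delta0, sigma0)

print(f"empirical null: delta0 = {delta0:.2f}, sigma0 = {sigma0:.2f}, p0_hat = {p0_hat:.2f}")
print("bins with fdr < .2 (empirical null):  ", int((fdr_empirical < 0.2).sum()))
print("bins with fdr < .2 (theoretical null):", int((fdr_theoretical < 0.2).sum()))
```

In this simulation the theoretical $N(0, 1)$ null is narrower than the central peak, so it should flag at least as many bins as the empirical null; the contrast between the two counts is the point of the exercise, not the specific numbers.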
Large-scale testing differs in scientific intent from an individual hypothesis test. The latter is most often designed to reject the null hypothesis with high probability. Large-scale testing is usually more of a screening operation, intended to identify a small percentage of Interesting cases, assumed to be on the order of 10% or less in this article. The empirical null hypothesis methodology is designed to be accurate under this constraint (Fig. 4). More traditional estimation methods, involving permutations or quantiles, give incorrect $f_0$ estimates (Sec. 4 and Remark D).
[Received June 2003. Revised August 2003.]
REFERENCES
Benjamini, Y., and Hochberg, Y. (1995), “Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing,” Journal of the Royal Statistical Society, Ser. B, 57, 289–300.
Dudoit, S., Shaffer, J., and Boldrick, J. (2003), “Multiple Hypothesis Testing in Microarray Experiments,” Statistical Science, 18, 71–103.
Efron, B. (1988), “Three Examples of Computer-Intensive Statistical Inference,” Sankhyā, 50, 338–362.
Efron, B. (2003), “Robbins, Empirical Bayes, and Microarrays,” The Annals of Statistics, 31, 366–378.
Efron, B., and Tibshirani, R. (1996), “Using Specially Designed Exponential Families for Density Estimation,” The Annals of Statistics, 24, 2431–2461.
Efron, B., and Tibshirani, R. (2002), “Empirical Bayes Methods and False Discovery Rates for Microarrays,” Genetic Epidemiology, 23, 70–86.
Efron, B., Tibshirani, R., Storey, J., and Tusher, V. (2001), “Empirical Bayes Analysis of a Microarray Experiment,” Journal of the American Statistical Association, 96, 1151–1160.
Hedenfalk, I., Duggen, D., Chen, Y. et al. (2001), “Gene Expression Profiles in Hereditary Breast Cancer,” New England Journal of Medicine, 344, 539–548.
Miller, R. (1981), Simultaneous Statistical Inference (2nd ed.), New York: Springer-Verlag.
Storey, J. (2002), “A Direct Approach to False Discovery Rates,” Journal of the Royal Statistical Society, Ser. B, 64, 479–498.
Storey, J. (2003), “The Positive False Discovery Rate: A Bayesian Interpretation and the q-Value,” The Annals of Statistics, 31, to appear.
Westfall, P., and Young, S. (1993), Resampling-Based Multiple Testing: Examples and Methods for p-Value Adjustments, New York: Wiley.
Wu, T., Schiffer, C., Shafer, R. et al. (2003), “Mutation Patterns and Structural Correlates in Human Immunodeficiency Virus Type 1 Protease Following Different Protease Inhibitor Treatments,” Journal of Virology, 77, 4836–4847.