nrn3475 p033
¹School of Experimental Psychology, University of Bristol, Bristol, BS8 1TU, UK.
²School of Social and Community Medicine, University of Bristol, Bristol, BS8 2BN, UK.
³Stanford University School of Medicine, Stanford, California 94305, USA.
⁴Department of Psychology, University of Virginia, Charlottesville, Virginia 22904, USA.
⁵Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, OX3 7BN, UK.
⁶School of Physiology and Pharmacology, University of Bristol, Bristol, BS8 1TD, UK.
Correspondence to M.R.M. e-mail: marcus.munafo@bristol.ac.uk
doi:10.1038/nrn3475
Published online 10 April 2013
Corrected online 15 April 2013

It has been claimed and demonstrated that many (and possibly most) of the conclusions drawn from biomedical research are probably false¹. A central cause for this important problem is that researchers must publish in order to succeed, and publishing is a highly competitive enterprise, with certain kinds of findings more likely to be published than others. Research that produces novel results, statistically significant results (that is, typically p < 0.05) and seemingly ‘clean’ results is more likely to be published²,³. As a consequence, researchers have strong incentives to engage in research practices that make their findings publishable quickly, even if those practices reduce the likelihood that the findings reflect a true (that is, non-null) effect⁴. Such practices include using flexible study designs and flexible statistical analyses and running small studies with low statistical power¹,⁵. A simulation of genetic association studies showed that a typical dataset would generate at least one false positive result almost 97% of the time⁶, and two efforts to replicate promising findings in biomedicine reveal replication rates of 25% or less⁷,⁸. Given that these publishing biases are pervasive across scientific practice, it is possible that false positives heavily contaminate the neuroscience literature as well, and this problem may affect at least as much, if not even more so, the most prominent journals⁹,¹⁰.

Here, we focus on one major aspect of the problem: low statistical power. The relationship between study power and the veracity of the resulting finding is under-appreciated. Low statistical power (because of low sample size of studies, small effects or both) negatively affects the likelihood that a nominally statistically significant finding actually reflects a true effect. We discuss the problems that arise when low-powered research designs are pervasive. In general, these problems can be divided into two categories. The first concerns problems that are mathematically expected to arise even if the research conducted is otherwise perfect: in other words, when there are no biases that tend to create statistically significant (that is, ‘positive’) results that are spurious. The second category concerns problems that reflect biases that tend to co-occur with studies of low power or that become worse in small, underpowered studies. We next empirically show that statistical power is typically low in the field of neuroscience by using evidence from a range of subfields within the neuroscience literature. We illustrate that low statistical power is an endemic problem in neuroscience and discuss the implications of this for interpreting the results of individual studies.

Low power in the absence of other biases
Three main problems contribute to producing unreliable findings in studies with low power, even when all other research practices are ideal. They are: the low probability of finding true effects; the low positive predictive value (PPV; see BOX 1 for definitions of key statistical terms) when an effect is claimed; and an exaggerated estimate of the magnitude of the effect when a true effect is discovered. Here, we discuss these problems in more detail.
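The second of these problems, low PPV when an effect is claimed, can be made concrete with a short calculation. The sketch below assumes the commonly used relation PPV = ([1 − β] × R) / ([1 − β] × R + α), where 1 − β is statistical power, α is the type I error rate and R is the pre-study odds that a tested effect is real. This passage does not state the formula, so treat it as an assumption. The function name `ppv` and the value R = 0.25 are purely illustrative, and the calculation ignores every source of bias other than chance.

```python
def ppv(power: float, pre_study_odds: float, alpha: float = 0.05) -> float:
    """Positive predictive value of a nominally significant finding.

    Assumed relation (not given in the passage itself):
        PPV = power * R / (power * R + alpha)
    where R is the pre-study odds that a tested effect is non-null.
    No bias other than chance is modelled.
    """
    if not 0.0 < power <= 1.0:
        raise ValueError("power must be in (0, 1]")
    if pre_study_odds <= 0.0:
        raise ValueError("pre_study_odds must be positive")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be in (0, 1)")
    # Expected true positives per tested null effect (power * R),
    # compared with expected false positives per tested null effect (alpha).
    true_positives = power * pre_study_odds
    return true_positives / (true_positives + alpha)


if __name__ == "__main__":
    R = 0.25  # hypothetical pre-study odds: one real effect per four null effects
    for power in (0.08, 0.31, 0.80):
        print(f"power = {power:.2f}  PPV = {ppv(power, R):.2f}")
```

Under these illustrative assumptions, a study with 80% power yields a PPV of 0.80, whereas the same claim from a study with 8% power yields a PPV of about 0.29. Most nominally significant findings at that power level would then be false positives.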
for water maze and radial maze, respectively. Our results indicate that the median statistical power for the water maze studies and the radial maze studies to detect these medium to large effects was 18% and 31%, respectively (TABLE 2). The average sample size in these studies was 22 animals for the water maze and 24 for the radial maze experiments. Studies of this size can only detect very large effects (d = 1.26 for n = 22, and d = 1.20 for n = 24) with 80% power, far larger than those indicated by the meta-analyses. These animal model studies were therefore severely underpowered to detect the summary effects indicated by the meta-analyses. Furthermore, the summary effects are likely to be inflated estimates of the true effects, given the problems associated with small studies described above.

The results described in this section are based on only two meta-analyses, and we should be appropriately cautious in extrapolating from this limited evidence. Nevertheless, it is notable that the results are so consistent with those observed in other fields, such as the neuroimaging and neuroscience studies that we have described above.

Implications
Implications for the likelihood that a research finding reflects a true effect. Our results indicate that the average statistical power of studies in the field of neuroscience is probably no more than between ~8% and ~31%, on the basis of evidence from diverse subfields within neuroscience. If the low average power we observed

The estimates shown in FIGS 4,5 are likely to be optimistic, however, because they assume that statistical power and R are the only considerations in determining the probability that a research finding reflects a true effect. As we have already discussed, several other biases are also likely to reduce the probability that a research finding reflects a true effect. Moreover, the summary effect size estimates that we used to determine the statistical power of individual studies are themselves likely to be inflated owing to bias; our excess of significance test provided clear evidence for this. Therefore, the average statistical power of studies in our analysis may in fact be even lower than the 8–31% range we observed.

[Figure residue: y-axes labelled N (0 to 16) and % (0 to 30); x-axis bins 0–10, 11–20, ..., 91–100.]

Ethical implications. Low average power in neuroscience studies also has ethical implications. In our analysis of animal model studies, the average sample size of 22 animals for the water maze experiments was only sufficient to detect an effect size of d = 1.26 with