This document is the result of a research project funded by the National Science Foundation [DMS 2311005].
Affiliation: Texas A&M University, TAMU, College Station, 77843, USA
Author (ORCID 0009-0009-3331-6523): Conceptualization, Formal Analysis, Methodology, Software, Writing - original draft and revision
Author (ORCID 0000-0003-1672-3118): Methodology, Software
Corresponding author (ORCID 0000-0002-8659-4772): Conceptualization, Formal Analysis, Methodology, Software, Writing - original draft and revision, Supervision
Bayes Factors Based on Test Statistics and Non-Local Moment Prior Densities
Abstract
We describe Bayes factors based on $z$, $t$, $\chi^2$, and $F$ statistics when non-local moment prior distributions are used to define alternative hypotheses. The non-local alternative prior distributions are centered on standardized effects. The prior densities include a dispersion parameter that can be used to model prior precision and the variation of effect sizes across replicated experiments. We examine the convergence rates of Bayes factors under true null and true alternative hypotheses and show how these Bayes factors can be used to construct Bayes factor functions. An example illustrates the application of the resulting Bayes factors to psychological experiments.
Keywords: Bayes factor function; Non-local prior density; Normal-moment density; Replicated Design
1 Introduction
Bayes factors based on test statistics and first-order non-local prior densities were used in Johnson et al. (2023) (hereafter J23) to define Bayes factor functions (BFFs). The first-order non-local prior densities used to define those Bayes factors contained a single scale parameter that determined the mode of the non-local prior densities used to define the alternative hypotheses (Johnson and Rossell, 2010). BFFs were defined as the mapping of these prior modes to Bayes factors. The use of first-order moment prior densities allowed J23 to obtain closed-form expressions for Bayes factors based on common test statistics, including $z$, $t$, $\chi^2$, and $F$ statistics.
In this article we extend the results of J23 by providing closed-form expressions for Bayes factors based on test statistics (BFBOTs) (Johnson, 2005) and alternative hypotheses defined using non-local moment prior densities of arbitrary order. This extension enables the incorporation of subjective prior knowledge regarding the precision of prior estimates of non-null effect sizes. Moreover, it provides a potential mechanism for modeling variation in effect sizes across replicated experiments.
2 BFBOTs and moment prior alternatives
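J23 defined Bayes factors using two categories of prior densities: normal-moment prior densities and gamma prior densities. A normal-moment density on a parameter $\lambda$, given hyperparameters $\tau^2$ and $r$, can be expressed as
$$j(\lambda \mid \tau^2, r) = \frac{(\lambda^2)^{r}}{(2\tau^2)^{r+1/2}\,\Gamma(r+1/2)} \exp\left(-\frac{\lambda^2}{2\tau^2}\right). \tag{1}$$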
J23 derived Bayes factors based on $z$ and $t$ statistics by imposing normal-moment prior densities with $r = 1$ on the non-centrality parameters of the test statistics under the alternative hypothesis. We extend these results for $r > 1$.
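As a minimal sketch of how the order $r$ shapes a normal-moment density, the Python code below evaluates a density proportional to $\lambda^{2r}\exp\{-\lambda^2/(2\tau^2)\}$ with normalizing constant $(2\tau^2)^{r+1/2}\,\Gamma(r+1/2)$. That functional form is an assumption of this sketch, as are the function names and the values of $\tau^2$ and $r$. The code checks numerically that the density integrates to 1, that it vanishes at the null value $\lambda = 0$, and that it peaks at $\lambda = \sqrt{2r}\,\tau$.

```python
import math

def normal_moment_pdf(lam, tau2, r):
    """Assumed form: lam^(2r) exp(-lam^2/(2 tau2)) / ((2 tau2)^(r+1/2) Gamma(r+1/2))."""
    lam2 = lam * lam
    if lam2 == 0.0:
        return 0.0  # non-local: the density is zero at the null value when r > 0
    log_norm = (r + 0.5) * math.log(2.0 * tau2) + math.lgamma(r + 0.5)
    log_kernel = r * math.log(lam2) - lam2 / (2.0 * tau2)
    return math.exp(log_kernel - log_norm)

def trapezoid(f, a, b, steps=20000):
    """Plain trapezoid rule on [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

tau2, r = 2.0, 3  # illustrative values only
total_mass = trapezoid(lambda x: normal_moment_pdf(x, tau2, r), -40.0, 40.0)
mode = math.sqrt(2.0 * r * tau2)  # stationary point of the log density

print("total mass:", total_mass)
print("mode:", mode, "density at mode:", normal_moment_pdf(mode, tau2, r))
```

The zero at $\lambda = 0$ is what makes the prior non-local, and larger $r$ flattens the density more strongly around the null value.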
BFBOTs for $\chi^2$ and $F$ test statistics were defined by assuming the non-centrality parameter for each distribution under the alternative hypothesis followed gamma distributions with parameters $k/2+1$ and $1/(2\tau^2)$, i.e., $\lambda \mid \tau^2 \sim G(k/2+1, 1/(2\tau^2))$. We generalize these results by extending the class of gamma priors to include $G(k/2+r, 1/(2\tau^2))$ distributions for $r > 1$.
For one-sided tests, the prior densities $j^{+}(\lambda \mid \tau^2, r)$ and $j^{-}(\lambda \mid \tau^2, r)$ are defined similarly to the expression in (1), but they are constrained to the positive or negative real line, respectively. The choice between them depends on whether the alternative hypothesis requires that $\lambda > 0$ or $\lambda < 0$.
For simplicity we adopt the notation used in J23 and define
• $N(a, b)$ as a normal distribution with mean $a$ and variance $b$.
• $T_{\nu}(\lambda)$ as a $t$ distribution with $\nu$ degrees of freedom and non-centrality parameter $\lambda$.
• $\chi^2_{\nu}(\lambda)$ as a $\chi^2$ distribution with $\nu$ degrees of freedom and non-centrality parameter $\lambda$.
• $F_{k,m}(\lambda)$ as an $F$ distribution with $(k, m)$ degrees of freedom and non-centrality parameter $\lambda$.
• $G(\alpha, \beta)$ as a gamma distribution with shape parameter $\alpha$ and rate parameter $\beta$.
• $J(\tau^2, r)$ as a normal-moment distribution of order $r$ and rate parameter $\tau^2$ and density given in (1).
In addition, we let ${}_1F_1(a; b; x)$ denote the confluent hypergeometric function and ${}_2F_1(a, b; c; x)$ the Gaussian hypergeometric function.
[Table 1: columns Test | Prior | Bayes factor; rows include two-sided and one-sided $z$ tests and further two-sided and one-sided tests. The cell contents are not recoverable here.]
Using this notation, the main results of this article are summarized in Table 1, which gives explicit expressions for Bayes factors based on $z$, $t$, $\chi^2$, and $F$ tests. These expressions are justified by theorems provided in the Supplemental Material.
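To make the role of the confluent hypergeometric function concrete, the sketch below treats the two-sided $z$ test only. It assumes $z \mid \lambda \sim N(\lambda, 1)$ and a normal-moment prior on $\lambda$ with density proportional to $\lambda^{2r}\exp\{-\lambda^2/(2\tau^2)\}$. Integrating the likelihood ratio against that prior term by term gives $(1+\tau^2)^{-(r+1/2)}\,{}_1F_1\!\left(r+\tfrac12; \tfrac12; \tfrac{\tau^2 z^2}{2(1+\tau^2)}\right)$. This expression is our own derivation under those assumptions and is not copied from Table 1; the code checks it against brute-force numerical integration. All function names are hypothetical.

```python
import math

def hyp1f1(a, b, x, terms=500):
    """Kummer's function 1F1(a; b; x) from its power series (adequate for moderate x)."""
    term, total = 1.0, 1.0
    for m in range(terms):
        term *= (a + m) / (b + m) * x / (m + 1)
        total += term
        if abs(term) < 1e-16 * abs(total):
            break
    return total

def bf10_z_closed_form(z, tau2, r):
    """Candidate closed form for the Bayes factor in favor of the alternative."""
    x = tau2 * z * z / (2.0 * (1.0 + tau2))
    return (1.0 + tau2) ** (-(r + 0.5)) * hyp1f1(r + 0.5, 0.5, x)

def bf10_z_numeric(z, tau2, r, half_width=60.0, steps=40000):
    """Trapezoid-rule integral of N(z | lam, 1) / N(z | 0, 1) against the prior."""
    log_norm = (r + 0.5) * math.log(2.0 * tau2) + math.lgamma(r + 0.5)

    def integrand(lam):
        lam2 = lam * lam
        if lam2 == 0.0:
            return 0.0
        log_lr = z * lam - 0.5 * lam2
        log_prior = r * math.log(lam2) - lam2 / (2.0 * tau2) - log_norm
        return math.exp(log_lr + log_prior)

    h = 2.0 * half_width / steps
    total = 0.5 * (integrand(-half_width) + integrand(half_width))
    for i in range(1, steps):
        total += integrand(-half_width + i * h)
    return total * h

z, tau2, r = 2.5, 4.0, 2  # illustrative values only
closed = bf10_z_closed_form(z, tau2, r)
numeric = bf10_z_numeric(z, tau2, r)
print("closed form:", closed, "numerical:", numeric)
```

For $r = 1$ the identity ${}_1F_1(3/2; 1/2; x) = (1 + 2x)e^{x}$ reduces the expression to elementary functions, which gives a quick sanity check on any implementation.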
In common applications of these tests, the convergence rates for the resulting Bayes factors may be summarized as follows.
1. Under true alternative hypotheses (i.e., $\lambda \neq 0$), Bayes factors in favor of null hypotheses decrease exponentially fast to 0 as the sample size $n \to \infty$.
2. Under true null hypotheses and certain regularity conditions, Bayes factors in favor of alternative hypotheses often decrease at rate $O_p(n^{-r-1/2})$ for $z$ and $t$ tests and $O_p(n^{-k/2-r})$ for $\chi^2$ and $F$ tests.
Sufficient conditions for achieving these rates are provided in the Supplemental Material.
The improved convergence rates obtained for $r > 1$ provide a partial motivation for generalizing the class of prior densities considered in J23. The non-local alternative prior densities proposed there correspond to setting $r = 1$ in Table 1. The improvement in convergence rates for true null hypotheses can be attributed to the more rapid descent of the non-local prior densities to 0 as the non-centrality parameters converge to $0$.
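The dependence of the null-hypothesis rate on $r$ can be illustrated numerically for a $z$ statistic. The sketch below rests on two assumptions that are ours rather than quotations from Table 1: the Bayes factor has the form $(1+\tau^2)^{-(r+1/2)}\,{}_1F_1(r+\tfrac12; \tfrac12; \tfrac{\tau^2 z^2}{2(1+\tau^2)})$, and $\tau^2 = n\omega^2/(2r)$ grows linearly in $n$. Holding $z$ fixed, as is typical under a true null hypothesis, the slope of $\log \mathrm{BF}_{10}$ against $\log n$ should then approach $-(r + 1/2)$.

```python
import math

def hyp1f1(a, b, x, terms=500):
    """Kummer's function 1F1(a; b; x) from its power series."""
    term, total = 1.0, 1.0
    for m in range(terms):
        term *= (a + m) / (b + m) * x / (m + 1)
        total += term
        if abs(term) < 1e-16 * abs(total):
            break
    return total

def log_bf10_z(z, n, omega, r):
    """Log Bayes factor under the assumed closed form, with tau^2 = n omega^2 / (2r)."""
    tau2 = n * omega ** 2 / (2.0 * r)
    x = tau2 * z * z / (2.0 * (1.0 + tau2))
    return -(r + 0.5) * math.log1p(tau2) + math.log(hyp1f1(r + 0.5, 0.5, x))

z, omega = 1.0, 0.5          # illustrative values; z held fixed as n grows
n_small, n_large = 1.0e4, 1.0e6
slopes = {}
for r in (1, 2, 3):
    rise = log_bf10_z(z, n_large, omega, r) - log_bf10_z(z, n_small, omega, r)
    slopes[r] = rise / (math.log(n_large) - math.log(n_small))
    print("r =", r, "empirical slope:", round(slopes[r], 3), "target:", -(r + 0.5))
```

Each unit increase in $r$ steepens the decay by one power of $n$, which is the practical benefit of higher-order moment priors when the null hypothesis is true.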
3 Selection of prior hyperparameters
We now describe strategies for specifying $\tau^2$ and $r$, either to define a single Bayes factor or to construct a BFF. We assume that the non-centrality parameter of the test statistic can be expressed as a function of $\omega$, a standardized effect. That is, $\lambda = \lambda(\omega)$. For example, in a $z$ test that a normal mean is 0, the non-centrality parameter satisfies $\lambda = \sqrt{n}\,\omega$, where $\omega = \mu/\sigma$ and $n$ is the sample size.
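Our recommendation for setting $(\tau^2, r)$ is to fix $r$ and then, conditionally on $r$, set $\tau^2$ so that the prior mode on the non-centrality parameter equals $\lambda(\omega)$, where $\omega$ represents a hypothesized value of the standardized effect. That is, we define $\tau^2_{\omega,r}$ such that
$$\operatorname*{arg\,max}_{\lambda}\; \pi(\lambda \mid \tau^2_{\omega,r}, r) = \lambda(\omega). \tag{2}$$
Here, $\pi(\lambda \mid \tau^2, r)$ denotes the prior density on $\lambda$ under the alternative hypothesis. This constraint places the prior mode for $\lambda$ at the specified value of $\lambda(\omega)$. In constructing a BFF, $\omega$ is varied over a plausible range of values for the standardized effect.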
As with many default Bayesian testing methods, criteria for setting the scale of the prior density used to define alternative hypotheses remain a topic of active research (e.g., Zellner, 1986; Doucet et al., 2002; Liang et al., 2008; Rouder et al., 2009; Consonni et al., 2018; Pramanik and Johnson, 2023). For the purposes of this article, we simply examine the sensitivity of Bayes factors and BFFs to the choice of $r$. Methods for estimating $r$ from published findings of similar studies or in replicated designs are currently under investigation.
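To illustrate this strategy for setting $\tau^2$ for given $r$, consider a $t$ test of a null hypothesis $H_0: \mu = 0$ based on a random sample $x_1, \dots, x_n$, where $x_i \sim N(\mu, \sigma^2)$ and $\sigma^2$ is unknown. For this test, $t = \sqrt{n}\,\bar{x}/s$, where $s^2$ is the usual unbiased estimate of $\sigma^2$. The distribution of $t$ is
$$t \sim T_{\nu}(\lambda), \tag{3}$$
where $\nu = n - 1$ denotes the degrees of freedom. Under the null hypothesis, $\lambda = 0$. The non-centrality parameter of the $t$ distribution under the alternative hypothesis is $\lambda = \sqrt{n}\,\omega$, where we define $\omega = \mu/\sigma$ as the standardized effect.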
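The prior distribution recommended for $\lambda$ in Table 1 is a normal-moment prior, $J(\tau^2, r)$. The modes of this density are $\pm\sqrt{2r}\,\tau$. To define a Bayes factor given a value $\omega$, we select $\tau^2$ so that the modes of the prior occur at $\pm\sqrt{n}\,\omega$. That is, we equate $\sqrt{2r}\,\tau = \sqrt{n}\,\omega$ and define $\tau^2_{\omega,r} = n\omega^2/(2r)$.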
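The mode-matching rule for a one-sample test can be sketched in a few lines of Python. The function name and the sample size and effect size used below are illustrative assumptions. The sketch uses $\tau^2 = n\omega^2/(2r)$, which follows from equating the normal-moment mode $\sqrt{2r}\,\tau$ to $\sqrt{n}\,\omega$.

```python
import math

def tau2_one_sample(n, omega, r):
    """tau^2 that places the normal-moment prior modes at lambda = +/- sqrt(n) * omega."""
    return n * omega ** 2 / (2.0 * r)

n, omega = 84, 0.5  # hypothetical sample size and hypothesized standardized effect
target_mode = math.sqrt(n) * omega
for r in (1, 2, 5):
    tau2 = tau2_one_sample(n, omega, r)
    implied_mode = math.sqrt(2.0 * r * tau2)
    print("r =", r, "tau2 =", tau2, "implied mode =", implied_mode, "target =", target_mode)
```

Because $\tau^2$ shrinks in proportion to $1/r$, the mode stays fixed while the order of the prior changes, so $r$ can be varied without moving the hypothesized effect size.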
A similar procedure can be used to set $\tau^2_{\omega,r}$ for other test statistics. Table 2 lists values of $\tau^2_{\omega,r}$ for several common statistical tests. This table generalizes the values provided in Table 1 of J23 for the case $r = 1$.
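For $\chi^2$ and $F$ statistics the same mode-matching argument can be worked through for a gamma prior. The derivation below is a sketch that assumes the prior on the non-centrality parameter is $G(k/2 + r,\, 1/(2\tau^2))$ in the shape and rate parameterization, with $k$ the (numerator) degrees of freedom. The resulting expression follows from that assumption and is not quoted from Table 2.

```latex
% Assumed prior: \lambda \mid \tau^2, r \sim G(k/2 + r, 1/(2\tau^2))  (shape, rate)
\begin{align*}
\pi(\lambda \mid \tau^2, r)
  &\propto \lambda^{k/2 + r - 1} \exp\!\left(-\frac{\lambda}{2\tau^2}\right),
  \qquad \lambda > 0, \\
\frac{d}{d\lambda} \log \pi(\lambda \mid \tau^2, r)
  &= \frac{k/2 + r - 1}{\lambda} - \frac{1}{2\tau^2} = 0
  \;\Longrightarrow\;
  \lambda_{\mathrm{mode}} = (k + 2r - 2)\,\tau^2, \\
\tau^2_{\omega,r}
  &= \frac{\lambda(\omega)}{k + 2r - 2}.
\end{align*}
```

Under this assumption, setting $r = 1$ gives $\tau^2_{\omega,1} = \lambda(\omega)/k$, and larger $r$ again shrinks $\tau^2$ so that the prior mode stays at $\lambda(\omega)$.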
Figure 1 illustrates the effect of varying $r$ and $\tau^2$ jointly so that the mode of the prior density on an effect size remains constant. It shows that the prior dispersion around the mode decreases as $r$ increases.
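The shrinking dispersion can also be quantified without a plot. Assuming a normal-moment density proportional to $\lambda^{2r}\exp\{-\lambda^2/(2\tau^2)\}$, the moments of $|\lambda|$ are $E|\lambda| = \sqrt{2\tau^2}\,\Gamma(r+1)/\Gamma(r+1/2)$ and $E\lambda^2 = (2r+1)\tau^2$. The sketch below holds the mode at 1 by setting $\tau^2 = 1/(2r)$ and reports the standard deviation of $|\lambda|$ for several orders. The function name and the chosen orders are illustrative.

```python
import math

def abs_moments(tau2, r):
    """Mean and standard deviation of |lambda| under the assumed normal-moment prior."""
    log_gamma_ratio = math.lgamma(r + 1.0) - math.lgamma(r + 0.5)
    mean_abs = math.sqrt(2.0 * tau2) * math.exp(log_gamma_ratio)
    second_moment = (2.0 * r + 1.0) * tau2
    return mean_abs, math.sqrt(second_moment - mean_abs ** 2)

mode = 1.0                 # prior mode held fixed for every order
orders = (1, 2, 5, 10)     # illustrative orders
sds = []
for r in orders:
    tau2 = mode ** 2 / (2.0 * r)   # mode-matching choice of tau^2
    mean_abs, sd_abs = abs_moments(tau2, r)
    sds.append(sd_abs)
    print("r =", r, "mean |lambda| =", round(mean_abs, 3), "sd |lambda| =", round(sd_abs, 3))
```

For large $r$ the standard deviation behaves roughly like $\text{mode}/(2\sqrt{r})$, so quadrupling $r$ approximately halves the spread around the hypothesized effect.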
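Table 2: Test | Statistic | Standardized Effect ($\omega$) | $\tau^2_{\omega,r}$
1-sample z | $\sqrt{n}\,\bar{x}/\sigma$ | $\mu/\sigma$ | $n\omega^2/(2r)$
1-sample t | $\sqrt{n}\,\bar{x}/s$ | $\mu/\sigma$ | $n\omega^2/(2r)$
2-sample z | | |
2-sample t | | |
Multinomial/Poisson | | |
Linear model | | |
Likelihood Ratio | | |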
4 Examples
To demonstrate the application of the Bayes factors described in Table 1 and the procedure for setting $\tau^2$ described in Section 3, we re-analyzed an experiment from the Many Labs 3 project (Ebersole et al., 2016).
The experiment measured the effect on response time for subjects performing the Stroop task (Ebersole et al., 2016). This effect is among the strongest and most widely replicated effects in experimental psychology. The Stroop task requires subjects to identify the color of the type of printed words. This task takes longer when there is discordance between the type’s color and the color’s name. For example, responding red takes longer than blue. After preprocessing the data to account for unusually long response times and incorrect answers (Greenwald et al., 2003), the authors “calculated the average response time for all correct responses separately for congruent and incongruent trials” and then “replaced response latencies for trials with errors using the mean of correct responses in that condition plus 600 ms.” They then “recomputed the means for congruent and incongruent trials overall” and used “the difference between these two means divided by the standard deviation of all correct trials regardless of condition” to construct paired $t$ statistics.
This experiment was replicated at twenty sites in the Many Labs 3 project. For illustration, we begin by analyzing results from the first experimental site, where the $t$ statistic was 9.38 on 83 degrees of freedom (Table 3).
The null hypothesis in this study is that the mean difference in response times for the congruous and incongruous conditions is 0. Under the alternative hypothesis, we assume that the population mean difference in response times is $\mu$ and that the observational variance is $\sigma^2$, and let $\omega = \mu/\sigma$ denote the standardized effect size. Given $\omega$, we assume that the non-centrality parameter of the distribution of the $t$ statistic, $\lambda$, is drawn from a normal-moment distribution,
$$\lambda \mid \tau^2_{\omega,r}, r \sim J(\tau^2_{\omega,r}, r), \tag{4}$$
where, from Table 2, we set
$$\tau^2_{\omega,r} = \frac{n\omega^2}{2r}. \tag{5}$$
Figure 2 displays the BFFs obtained for several values of $r$ using the theoretical results from Tables 1 and 2.
Several aspects of this figure merit comment. As expected, the BFF curves near their maxima reflect more evidence in favor of the corresponding alternative hypotheses as $r$ increases. This happens because alternative hypotheses concentrate more of their prior mass on the hypothesized values of $\omega$ as $r$ increases. Indeed, as $r \to \infty$, all priors converge to a point mass on the hypothesized value of $\omega$, and the BFF curves reduce to a plot of the log-likelihood ratio (based on the test statistic). Nonetheless, all BFF curves reflect very strong evidence for alternative hypotheses defined by priors centered on standardized effect sizes greater than about 0.05, and the curves are relatively insensitive to the choice of $r$ within this range.
We now consider how estimates of $r$ might be used to construct BFFs when data from replicated sites are available. In this case, data from 20 Many Labs consortium sites are available; $t$ statistics from the 20 sites are listed in Table 3. For simplicity, we took a naive empirical Bayes approach to estimate $r$ (see Supplemental Materials). The resulting estimate suggests that standardized effects were relatively consistent across sites.
Figure 3 provides the BFF based on the empirical Bayes estimate of $r$ and the $t$ statistics from all 20 sites. For comparison, the BFF for $r = 1$ is also displayed. In this case, using the empirical Bayes estimate of $r$ had a moderate effect in increasing the BFF near its maximum. With 20 experimental sites each exhibiting very strong evidence for a Stroop effect, the evidence against the null hypothesis is overwhelming except for alternative hypotheses concentrating prior mass on $\omega$ near 0.
Table 3: $t$ statistics (degrees of freedom) from the 20 Many Labs 3 sites.
9.38 (83) | 9.85 (118) | 7.36 (43) | 11.62 (90)
7.85 (95) | 12.56 (317) | 11.01 (123) | 10.15 (130)
13.52 (157) | 10.14 (100) | 8.90 (116) | 10.37 (141)
11.68 (177) | 9.11 (118) | 16.97 (241) | 8.82 (136)
8.46 (88) | 5.93 (80) | 12.17 (193) | 9.37 (94)
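As a rough descriptive check on how consistent the sites are, the tabulated statistics can be converted to per-site standardized effect estimates. The sketch below assumes each entry is a paired $t$ statistic with $n = \nu + 1$ subjects, so that $\hat{\omega} = t/\sqrt{n}$. This is a simple summary of the table and is not the empirical Bayes procedure of the Supplemental Materials; the variable names are illustrative.

```python
import math

# (t statistic, degrees of freedom) for the 20 sites, copied from Table 3
sites = [
    (9.38, 83), (9.85, 118), (7.36, 43), (11.62, 90),
    (7.85, 95), (12.56, 317), (11.01, 123), (10.15, 130),
    (13.52, 157), (10.14, 100), (8.90, 116), (10.37, 141),
    (11.68, 177), (9.11, 118), (16.97, 241), (8.82, 136),
    (8.46, 88), (5.93, 80), (12.17, 193), (9.37, 94),
]

# Assumption: paired design, so n = df + 1 and omega_hat = t / sqrt(n)
omega_hat = [t / math.sqrt(df + 1) for t, df in sites]

mean_effect = sum(omega_hat) / len(omega_hat)
sd_effect = math.sqrt(
    sum((w - mean_effect) ** 2 for w in omega_hat) / (len(omega_hat) - 1)
)

print("smallest estimate:", round(min(omega_hat), 3))
print("largest estimate:", round(max(omega_hat), 3))
print("mean:", round(mean_effect, 3), "sd across sites:", round(sd_effect, 3))
```

A small between-site standard deviation relative to the mean is the kind of pattern that a large estimated $r$ would be expected to reflect.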
5 Discussion
Bayes factors based on test statistics present compelling advantages over standard methods for calculating Bayes factors. They eliminate computational challenges that arise when Bayes factors are computed from complex statistical models, and they reduce subjectivity when defining prior distributions on model parameters. By indexing Bayes factors based on test statistics according to standardized effects, BFFs eliminate much of the subjectivity associated with specifying a single alternative model. They also provide users with a simple representation of the statistical evidence in favor of alternative hypotheses centered on a range of effect sizes.
The BFFs proposed in J23 lack flexibility in allowing scientists to incorporate the precision of their estimates of effect sizes into Bayesian hypothesis tests. This article addresses this shortcoming by expanding the class of prior distributions used in defining BFFs. In particular, it provides analytic expressions for Bayes factors based on test statistics in conjunction with more general classes of prior distributions and illustrates how BFFs can be constructed from these broader classes. Importantly, we demonstrate how the scale parameters of the prior distributions can be linked so that the modes of prior distributions are located at hypothesized effect sizes.
J23 demonstrated how test statistics from replicated experiments could be combined to generate an aggregated BFF. This article’s results will permit that methodology to account for the dispersion of effect sizes across replications of similar experiments. Efficient and coherent procedures for incorporating such information are currently under investigation.
The BFF package (https://CRAN.R-project.org/package=BFF), available from the CRAN repository, provides R functions to compute the BFFs reported in this article.
Acknowledgment
The authors acknowledge support from the National Science Foundation, NSF DMS 2311005.
References
- Consonni et al. [2018] Guido Consonni, Dimitris Fouskakis, Brunero Liseo, and Ioannis Ntzoufras. Prior Distributions for Objective Bayesian Analysis. Bayesian Analysis, 13(2):627–679, 2018. doi:10.1214/18-BA1103. URL https://doi.org/10.1214/18-BA1103.
- Doucet et al. [2002] A. Doucet, S.J. Godsill, and C.P. Robert. Marginal maximum a posteriori estimation using Markov chain Monte Carlo. Statistics and Computing, 12:77–84, 2002.
- Ebersole et al. [2016] Charles R. Ebersole, Olivia E. Atherton, Aimee L. Belanger, Hayley M. Skulborstad, Jill M. Allen, Jonathan B. Banks, Erica Baranski, Michael J. Bernstein, Diane B.V. Bonfiglio, Leanne Boucher, Elizabeth R. Brown, Nancy I. Budiman, Athena H. Cairo, Colin A. Capaldi, Christopher R. Chartier, Joanne M. Chung, David C. Cicero, Jennifer A. Coleman, John G. Conway, William E. Davis, Thierry Devos, Melody M. Fletcher, Komi German, Jon E. Grahe, Anthony D. Hermann, Joshua A. Hicks, Nathan Honeycutt, Brandon Humphrey, Matthew Janus, David J. Johnson, Jennifer A. Joy-Gaba, Hannah Juzeler, Ashley Keres, Diana Kinney, Jacqeline Kirshenbaum, Richard A. Klein, Richard E. Lucas, Christopher J.N. Lustgraaf, Daniel Martin, Madhavi Menon, Mitchell Metzger, Jaclyn M. Moloney, Patrick J. Morse, Radmila Prislin, Timothy Razza, Daniel E. Re, Nicholas O. Rule, Donald F. Sacco, Kyle Sauerberger, Emily Shrider, Megan Shultz, Courtney Siemsen, Karin Sobocko, R. Weylin Sternglanz, Amy Summerville, Konstantin O. Tskhay, Zack van Allen, Leigh Ann Vaughn, Ryan J. Walker, Ashley Weinberg, John Paul Wilson, James H. Wirth, Jessica Wortman, and Brian A. Nosek. Many Labs 3: Evaluating participant pool quality across the academic semester via replication. Journal of Experimental Social Psychology, 67:68–82, 2016. ISSN 0022-1031. doi:10.1016/j.jesp.2015.10.012. URL https://www.sciencedirect.com/science/article/pii/S0022103115300123. Special Issue: Confirmatory.
- Greenwald et al. [2003] A.G. Greenwald, B.A. Nosek, and M.R. Banaji. Understanding and using the implicit association test: I. an improved scoring algorithm. J Pers Soc Psychol, 2003. 10.1037/0022-3514.85.2.197.
- Johnson [2005] Valen E. Johnson. Bayes factors based on test statistics. Journal of the Royal Statistical Society. Series B (Statistical Methodology), 67(5):689–701, 2005. ISSN 13697412, 14679868. URL http://www.jstor.org/stable/3647614.
- Johnson and Rossell [2010] Valen E. Johnson and David Rossell. On the use of non-local prior densities in Bayesian hypothesis tests. Journal of the Royal Statistical Society. Series B (Statistical Methodology), 72(2):143–170, 2010. ISSN 13697412, 14679868. URL http://www.jstor.org/stable/40541581.
- Johnson et al. [2023] Valen E. Johnson, Sandipan Pramanik, and Rachael Shudde. Bayes factor functions for reporting outcomes of hypothesis tests. Proceedings of the National Academy of Sciences, 2023. URL https://doi.org/10.48550/arXiv.2210.00049.
- Liang et al. [2008] Feng Liang, Rui Paulo, German Molina, Merlise A Clyde, and Jim O Berger. Mixtures of g priors for Bayesian variable selection. Journal of the American Statistical Association, 103(481):410–423, 2008. doi:10.1198/016214507000001337. URL https://doi.org/10.1198/016214507000001337.
- Pramanik and Johnson [2023] Sandipan Pramanik and Valen Johnson. Efficient alternatives for Bayesian hypothesis tests in psychology. Psychological Methods, 2023. URL https://doi.org/10.1037/met0000482.
- Rouder et al. [2009] J.N. Rouder, P.L. Speckman, D. Sun, and R.D. Morey. Bayesian t tests for accepting and rejecting the null hypothesis. Psychonomic Bulletin & Review, 16:225–237, 2009. 10.3758/PBR.16.2.225.
- Zellner [1986] A Zellner. On Assessing Prior Distributions and Bayesian Regression Analysis with g Prior Distributions. Bayesian Inference and Decision Techniques: Essays in Honor of Bruno de Finetti. Studies in Bayesian Econometrics and Statistics. Vol. 6. New York: Elsevier, 1986. ISBN 78-0-444-87712-3.