Abstract. Software tools for Bayesian inference have undergone rapid evolution in the past three decades, following popularisation of the first-generation MCMC-sampler implementations. More recently, exponential growth in the number of users has been stimulated both by the active development of new packages by the machine learning community and the popularity of specialist software for particular applications. This review aims to summarize the most popular software and provide a useful map for a reader to navigate the world of Bayesian computation. We anticipate vigorous continued development of algorithms and corresponding software in multiple research fields, such as probabilistic programming, likelihood-free inference and Bayesian neural networks, which will further broaden the possibilities for employing the Bayesian paradigm in exciting applications.

Key words and phrases: Statistics, data analysis, MCMC, computation, probabilistic programming.
1. INTRODUCTION
p(θ, y)) with the data y to compute the posterior distribution of the parameters p(θ|y). We do it with Bayes' rule:

p(θ|y) = p(y|θ)p(θ) / p(y) = p(y|θ)p(θ) / ∫ p(y|θ)p(θ) dθ ∝ p(y|θ)p(θ).

The most common quantities of interest in Bayesian inference are posterior properties of parameters or functions thereof, which can be expressed in terms of expectations over the posterior distribution p(θ|y):

E[g(θ)|y] = ∫ g(θ)p(θ|y) dθ.

Prior and posterior predictions, model selection and other quantities of interest follow a similar pattern. Thus, the main computational problem of Bayesian inference is computing integrals.

Our choice of likelihood and prior rarely leads to a closed-form solution for p(θ|y), which in most cases can only be evaluated up to a multiplicative constant, and even less often to a closed-form solution for the integral. Therefore, computing the quantities of interest is a numerical problem and a challenge in itself.

1.2 Software for Bayesian Inference

For our discussion of software for Bayesian inference, we divide the software components into three groups: the modeling language, the computation methods and the utilities.

1.2.1 Modeling language. We use the term modeling language in the broadest sense of a component that allows the user to specify the likelihood, prior and data (from now on, we use model to refer to all of these combined). Alternatively, Bayesian inference can be done by specifying a generative model for p(θ, y) instead (see Section 4.3) and some languages support specifying both. See the Appendix for an illustrative example in different modeling languages.

Every modeling language represents some kind of trade-off between generality and accessibility. On one end of the spectrum are expressive languages, such as probabilistic programming languages (PPLs) and general-purpose programming languages like Python. On the other end, we have software that allows for a single model or a limited number of options. And in between we have Bayesian inference-specific declarative (e.g., WinBUGS [74]), imperative (e.g., Stan [21]) or formula-based languages (e.g., R-INLA [71] and rstanarm [50]) that use syntax similar to the formula object used by generalized linear models (GLMs) in the core R stats package [100], etc.

The choice of modeling language more so than any other component determines the target user. Or, when the software is designed with a target user in mind, no component is more influenced by the requirements of the target user than the modeling language. And, as demonstrated by the variety of different modeling languages, Bayesian inference users are a heterogeneous group and there is no one-size-fits-all approach.

1.2.2 Computation methods. Once the model is specified, the next step is to perform the computation of the posterior and other quantities of interest. Therefore, complete software for Bayesian inference must implement one or more Bayesian computation methods.

There is no method that is able to perform practically feasible Bayesian computation for every model. Therefore, many different computation methods have been developed, and each method represents a trade-off between generality and efficiency. The computation method determines the class of models that can be computed and usually limits the software more than the modeling language. That is, it is not uncommon that the modeling language allows for the specification of models that the computation method is not able to compute, not even in theory. And, as a rule, there always exist models that a computation method will not be able to deal with in practice, even though it is able to do so in theory.

In this paper, our treatment of Bayesian computation is from a Bayesian software perspective: we limit ourselves to discussing methods that were key for the development of software for Bayesian inference and listing the methods implemented in the software. For details about the history and the state-of-the-art of Bayesian computation, we refer the reader to [80].

1.2.3 Utilities. With utilities, we refer to all software components that do not fall into the previous two groups, but are still common in Bayesian software and convenient if not essential to the Bayesian inference workflow (for a detailed treatment of the Bayesian workflow, we refer the reader to [41, 45]):

• Diagnosing Bayesian computation: Bayesian computation methods can and often do fail to find the optimum solution or, in the case of Markov chain Monte Carlo (MCMC), properly explore the posterior distribution. Diagnostic tools are essential for identifying potential issues before proceeding with the interpretation of the results. Furthermore, most key methods are MCMC and, therefore, sampling-based and approximate. Approximation error must also be quantified and included in the interpretation of the results. Common diagnostics are traceplots, Monte Carlo standard errors, effective sample size (ESS), R̂ and simulation-based calibration [119].
• Model validation and comparison: Prior, posterior and model visualization, prior and posterior predictive checks, (approximate) leave-one-out cross-validation
and model evaluation criteria such as WAIC [134] and computing Bayes factors. The modeling language determines how easy or difficult it is to compute these [20]. For example, for prior and posterior predictive checks, we have to draw samples from the prior predictive distribution p(y) and the posterior predictive distribution p(y_new|y). For Bayes factors, we have to evaluate the marginal p(y) and for cross-validation we have to evaluate the posterior predictive p(y_new|y).
• Computation libraries: Matrix algebra libraries, support for probability distributions and other statistical computation, support for high-performance computing and automatic differentiation (AD) libraries.
• Interfaces: Often, Bayesian software provides the user with only a low-level command-line interface to the computation, where the data and model are passed as files. For convenience, interfaces are then developed that allow the user to access the computation from a popular higher-level language such as Python and R.
• Documentation: This includes software documentation, language definition, examples, case studies and other material that make it easier to use the software.

1.3 Scope and Organization

In part, this paper is a survey of the most popular and historically most relevant software for general-purpose Bayesian inference. We also include popular software that serves a more specific purpose, for example, software that provides only Bayesian computation of a utility or software that focuses on a narrower class of models. When it comes to less commonly used and more specific software, this paper is biased toward Python and R, the two most popular languages for data analysis. See Tables 1 and 2 for an estimate of the relative popularity of Bayesian software packages for Python and R, respectively.

General-purpose Bayesian computation has had two distinct periods, each dominated by a certain type of Bayesian computation and software. From the early 1990s to the 2010s, it was Gibbs sampling and the quintessential representative of software is BUGS. From the 2010s up to now, it is Hamiltonian Monte Carlo (HMC) and the quintessential representative is Stan. The first part of the remainder of the paper roughly corresponds to these two periods. In Section 2, we describe Gibbs sampling, the typical structure of Gibbs sampling-based software and the BUGS language. We also include software that might have been developed later but is related to, was inspired by or is a continuation of BUGS. Similarly, Section 3 focuses on HMC and Stan.

We dedicate Section 4 to software that we were not able to meaningfully assign to either of the two periods. It features software that focuses on computation, software that targets a more specific class of models and the latest developments in Bayesian software and universal PPLs.
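Whichever package produces them, posterior draws θ(1), …, θ(m) turn the expectations above into simple averages. A minimal Python sketch (the draws below are placeholders for actual MCMC output):

import numpy as np

rng = np.random.default_rng(0)

# Placeholder posterior draws; in practice these come from the packages
# surveyed in this paper.
theta_draws = rng.normal(loc=1.0, scale=0.5, size=10_000)

def g(theta):
    return theta ** 2  # any function of the parameters

values = g(theta_draws)
estimate = values.mean()                           # Monte Carlo estimate of E[g(θ)|y]
mc_se = values.std(ddof=1) / np.sqrt(len(values))  # valid for independent draws
print(estimate, mc_se)

For autocorrelated MCMC draws, the standard error must be based on the effective sample size (Section 1.2.3) rather than the raw number of draws.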
TABLE 1
Total Python Package Index (PyPI) downloads for Bayesian inference-related Python packages referenced in this paper for the period between January 1, 2022 and December 31, 2022. We obtained the information from the PyPI data set (bigquery-public-data.pypi). We include matplotlib [63], the most popular Python package for statistical graphics, as a baseline for comparison. While these counts should in most cases be a reasonable proxy for relative popularity, we have to keep in mind that users can also download these packages from other sources. Inclusion in other packages and automated downloads can also bias the results.
TABLE 2
Total RStudio [104] CRAN mirror downloads for Bayesian inference-related R packages referenced in this paper for the period between January 1, 2022 and December 31, 2022. We used the cranlogs package [26]. We include ggplot2 [135], the most popular R package for statistical graphics, as a baseline for comparison. While these counts should in most cases be a good proxy for relative popularity, we have to keep in mind that users can also download these packages from other CRAN mirrors or directly from code repositories. Also, some popular R packages are not available on CRAN, for example, R-INLA; cmdstanr, the R interface to Stan; or R2MultiBUGS, the R interface to MultiBUGS.

Package  Downloads  Description
ggplot2 31,457,872 Create Elegant Data Visualisations Using the Grammar of Graphics
mgcv 1,523,237 Mixed GAM Computation Vehicle with Automatic Smoothness Estimation
coda 1,190,640 Output Analysis and Diagnostics for MCMC
rstan 993,086 R Interface to Stan
loo 738,325 Efficient Leave-One-Out Cross-Validation and WAIC for Bayesian Models
bayestestR 599,283 Understand and Describe Bayesian Models and Posterior Distributions
prophet 338,276 Automatic Forecasting Procedure
posterior 314,669 Tools for Working with Posterior Distributions
bayesplot 308,747 Plotting for Bayesian Models
bnlearn 286,003 Bayesian Network Structure Learning, Parameter Learning and Inference
shinystan 272,855 Interactive Visual and Numerical Diagnostics and Posterior Analysis for Bayesian Models
BayesFactor 239,538 Computation of Bayes Factors for Common Designs
rjags 228,433 Bayesian Graphical Models using MCMC
brms 215,302 Bayesian Regression Models using Stan
MCMCpack 186,124 Markov Chain Monte Carlo (MCMC) Package
rstanarm 164,469 Bayesian Applied Regression Modeling via Stan
bridgesampling 155,278 Bridge Sampling for Marginal Likelihoods and Bayes Factors
R2WinBUGS 61,926 Running WinBUGS and OpenBUGS from R SPLUS
nimble 36,471 MCMC, Particle Filtering and Programmable Hierarchical Modeling
abc 36,251 Tools for Approximate Bayesian Computation (ABC)
R2OpenBUGS 27,284 Running OpenBUGS from R
greta 8453 Simple and Scalable Statistical Modeling in R
abctools 6404 Tools for ABC Analyses
EasyABC 5344 Efficient Approximate Bayesian Computation Sampling Schemes
We discuss the future of software for Bayesian inference in Section 5.

2. FIRST GENERATION—GIBBS SAMPLING-BASED

In the period between the early 1990s and early 2010s, the most popular software for general-purpose Bayesian inference was based on graphical models and Gibbs sampling as the method of Bayesian computation.

The main assumption of this approach is that the conditional independence between variables in our joint distribution p(V) = p(θ, y) can be represented by a directed acyclic graph (DAG), where each variable is represented by a node and every node is conditionally independent of all other nodes, given its Markov blanket.

A model that admits such a representation is called a Bayesian network and is a class of probabilistic graphical models (see the Appendix for an example). The joint distribution can then be factored as

p(V) = ∏_{v∈V} p(v|P(v))

and the full conditional of a node is

(1)  p(v|V\v) ∝ p(V) ∝ p(v|P(v)) ∏_{u∈V : v∈P(u)} p(u|P(u)),

where P(v) are the parent nodes of v.

A Markov chain that updates one node at a time using its full conditional will converge to the posterior distribution under weak conditions. From a practical perspective, this means that we only have to be able to iteratively sample from the full conditionals. Algorithm 1 is a summary of the Gibbs sampling algorithm. A major appeal of the algorithm is that there are no algorithm parameters that need to be tuned, which is a useful property for automated inference. For the purpose of sampling from a full conditional, a hierarchy of samplers is typically used. Because the model is stated in a symbolic way, it is straightforward to check the properties of the full conditional. In most cases, the more specific the distribution, the more efficient the sampling algorithm that we can use.

Algorithm 1 (Gibbs sampling). k—number of nodes, p(x_j | x_−j)—the k full conditionals, x^(0)—starting value, m—number of samples.
1: procedure GIBBSSAMPLING()
2:   for i ← 1 : m do
3:     for j ← 1 : k do
4:       x_j^(i) ∼ p(x_j | x_1^(i), …, x_{j−1}^(i), x_{j+1}^(i−1), …, x_k^(i−1))
5:     end for
6:     ith sample ← x^(i)
7:   end for
8: end procedure
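As a concrete illustration of Algorithm 1, a minimal Python sketch of a two-node Gibbs sampler for a bivariate standard normal with correlation ρ, whose full conditionals are known univariate normals (the target and parameter values are illustrative):

import numpy as np

rng = np.random.default_rng(1)
rho, m = 0.9, 5_000                  # correlation and number of samples
samples = np.zeros((m, 2))
x1, x2 = 0.0, 0.0                    # starting values

for i in range(m):
    # Full conditionals of a standard bivariate normal with correlation rho:
    # x1 | x2 ~ N(rho * x2, 1 - rho^2), and symmetrically for x2 | x1.
    x1 = rng.normal(rho * x2, np.sqrt(1 - rho**2))
    x2 = rng.normal(rho * x1, np.sqrt(1 - rho**2))
    samples[i] = (x1, x2)

With ρ = 0.9 the chain illustrates the autocorrelation problem discussed below: single-site updates of highly correlated nodes move slowly through the posterior.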
2.1 BUGS

The quintessential representative of this approach is BUGS (Bayesian inference Using Gibbs Sampling) [112]. The BUGS project started at the Medical Research Council Biostatistics Unit in Cambridge in 1989. The BUGS software evolved into WinBUGS [74, 76, 113], which updated the BUGS language and the sampling algorithms, and OpenBUGS, a GNU General Public License release of WinBUGS that also runs on Linux (with some limitations) [114]. BUGS, WinBUGS and OpenBUGS are no longer developed, but the BUGS project has inspired other software, which we discuss at the end of this section. A detailed history of BUGS is provided by Lunn et al. [75].

The factorization from equation (1) is the basis for both the BUGS language and the Gibbs sampling-based computation. The BUGS language is a declarative language in which the user states all the parent–child relationships P(v) between the variables in the model. See the Appendix for an example of a Bayesian network model in JAGS, a language which is very similar to WinBUGS.

For sampling from the full conditionals, WinBUGS implements a different approach for each of the following contingencies ([74], Chapter 12.1):

• Discrete distribution: The inverse CDF method.
• Standard distribution: A standard algorithm for that distribution.
• Log-concave: Derivative-free adaptive rejection sampling [46]. Many standard distributions are log-concave, including the exponential family. The product of log-concave functions is also log-concave, so it is common for the full conditional to be log-concave.
• Restricted range: Slice sampling [90].
• Unrestricted range: Current point Metropolis.

OpenBUGS includes block sampling methods that jointly sample from groups of nodes that are likely to be correlated based on the structure of the model. Block updating solves one of the disadvantages of Gibbs sampling: it is strongly dependent on the parameterization of the model. If two variables have a high posterior correlation but are updated independently using Gibbs sampling, then the Markov chain will exhibit high autocorrelation for both variables. Block updating of correlated nodes solves this problem, which otherwise falls to the user to solve by reparameterizing the model.

A strong point of the BUGS PPL is that the distinction between data and parameters is made at run time, based on provided observations. Vectors can also be partially observed, by leaving the unobserved elements unknown (NA). This simplifies the simulation of draws for posterior checks. Although WinBUGS focuses on Bayesian networks, there is some limited support for undirected graphs (factor models) as long as the entire subset of variables is represented as a single multivariate node so that their values are sampled jointly. WinBUGS also supports graphical model specification in plate notation with the DoodleBUGS editor.

MultiBUGS [54] is a continuation of the BUGS project. The major contribution of MultiBUGS is that it provides a more efficient implementation and several parallelization techniques. In a multicore environment, MultiBUGS can be several orders of magnitude more efficient than OpenBUGS.

R interfaces are available for WinBUGS, OpenBUGS and MultiBUGS: R2WinBUGS [117], R2OpenBUGS [126] and R2MultiBUGS (https://github.com/MultiBUGS/R2MultiBUGS).

2.2 JAGS

JAGS (Just Another Gibbs Sampler) [96] is a clone of BUGS that has a completely independent code base but aims for similar functionality, although it notably lacks a graphical user interface (see [74], Chapter 12.6, for a summary of differences). Unlike WinBUGS and OpenBUGS, which are written in Component Pascal, JAGS is written in C++ and runs on Windows, MacOS and Linux. This portability has contributed to its popularity and the fact that it is still being actively developed. It is published under the GNU General Public License, version 2. See the Appendix for an example of a model written in JAGS.

JAGS incorporates a copy of the R math library, which provides high-quality algorithms for random number generation and calculation of quantities associated with probability distributions. The workhorse sampling method for JAGS is slice sampling [90], which can be applied to both continuous- and discrete-valued nodes. The "glm" module of JAGS incorporates efficient samplers for generalized linear mixed models (GLMMs). These samplers are based on the principle of data augmentation, a commonly used technique to simplify sampling from a graphical model by adding new nodes [58]. In this case, data augmentation reduces GLMMs with binary outcomes [2, 62, 98] or binary and Poisson outcomes [38] to a linear model with normal outcomes. This reduction to a normal linear model allows block updating of all the parameters in the linear predictor, which is much more efficient than Gibbs sampling. The underlying engine for the linear model uses sparse matrix algebra [29], which handles fixed and random effects simultaneously.
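As a minimal sketch of the data augmentation idea (not the JAGS implementation itself), consider Bayesian probit regression with a flat prior on the coefficients: augmenting with latent normal variables, in the style of the augmentation schemes cited above, reduces each update of the coefficient vector to a draw from a normal linear model, allowing a block update. The data here are simulated placeholders:

import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(4)

# Simulated probit regression data (illustrative).
n, d = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([-0.5, 1.0])
y = (X @ beta_true + rng.standard_normal(n) > 0).astype(float)

XtX_inv = np.linalg.inv(X.T @ X)  # posterior covariance under a flat prior
beta = np.zeros(d)
draws = []
for _ in range(2_000):
    # 1) Augment: z_i | beta, y_i is normal, truncated to (0, inf) if y_i = 1
    #    and to (-inf, 0) if y_i = 0.
    mu = X @ beta
    lo = np.where(y == 1, -mu, -np.inf)  # standardized lower bound
    hi = np.where(y == 1, np.inf, -mu)   # standardized upper bound
    z = mu + truncnorm.rvs(lo, hi, size=n, random_state=rng)
    # 2) Block update: beta | z is multivariate normal (a linear model in z).
    beta = rng.multivariate_normal(XtX_inv @ X.T @ z, XtX_inv)
    draws.append(beta)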
2.3 Nimble

Nimble [30], similar to BUGS, focuses on graphical models. It is an extension of the BUGS language but also implements a modeling language embedded in R, both of which are compiled to C++. Several Bayesian computation methods are implemented, including Metropolis–Hastings (MH), Gibbs sampling and sequential Monte Carlo (SMC), and the user has the flexibility of assigning different sampling methods to different nodes. Recently, support for AD and HMC has also been added. See the Appendix for an example of a model written in Nimble's R-embedded language.

3. SECOND GENERATION—HMC-BASED

The two main drawbacks of the BUGS-like approach are the limited expressiveness of the language (no local variables, conditional statements or other imperative-language features) and the inefficiency of computation. The single-node exploration of Gibbs sampling is inefficient when the nodes are highly correlated in the posterior; in particular, when the dimensionality in terms of parameters is high, it reverts to random walk behavior [59, 89].

The only MCMC algorithm that theoretically scales to high dimensions on a broad class of models is HMC. Introductions to HMC have been provided, for example, by Neal [91] and Betancourt [10], and a more detailed mathematical treatment by Betancourt [11].

HMC is a physics-inspired approach to proposing the next state that uses the gradient of the target density for a better understanding of its geometry. Hamiltonian dynamics consist of a d-dimensional position vector q and a d-dimensional momentum vector p. The evolution of the system as a function of "algorithmic time" t is determined by the function H(q, p) (the Hamiltonian) and the ordinary differential equations

dq_i/dt = ∂H/∂p_i,   dp_i/dt = −∂H/∂q_i.

To simulate Hamiltonian dynamics, we need to discretize time with some step size ε. The most commonly used method is the leapfrog symplectic integrator. Hamiltonian dynamics have several properties which are important for HMC to work: they preserve the Hamiltonian, they are reversible and they are symplectic, and thus volume preserving.

For HMC, the Hamiltonian H is typically chosen so that it is separable: H(q, p) = U(q) + K(p), where U(q) is the potential energy and K(p) the kinetic energy of the system. The main idea of HMC is to use the Hamiltonian to define a joint density of position and momentum:

p(q, p) ∝ e^{−H(q,p)} = e^{−U(q)} e^{−K(p)}.

Substituting U(q) = −log f(q), where f is proportional to the density we want to sample from, and using the standard kinetic energy, we get

p(q, p) ∝ f(q) e^{−(1/2) pᵀ M⁻¹ p}.

The joint density p(q, p) can be seen as the target density over the position vector q augmented by an independent multivariate Gaussian for the momentum vector p, with mean 0 and covariance M.

Hamiltonian dynamics conserve the Hamiltonian, so all states on a trajectory will have the same density p(·, ·). That makes Hamiltonian dynamics suitable for proposing the next state in an MCMC algorithm, because a trajectory can propose a state far away in position q from the current state, but still with acceptance probability 1. To reach every possible state, we have to sample a new momentum. Because the kinetic and potential energy parts of the joint density are independent and we are sampling from the actual distribution of the momentum p, this sampling leaves the target distribution invariant. That is, p(q, p) remains the stationary distribution of the Markov chain. In practice, however, the leapfrog method, while being a stable simulation of Hamiltonian dynamics, will not conserve the Hamiltonian exactly, and there will be relatively small fluctuations. That is why we still have to apply a Metropolis correction. Putting it all together, Algorithm 2 summarizes the basic HMC algorithm.

Algorithm 2 (HMC). f—a function proportional to our target density, q_0—starting value, ε—step size, L—number of steps, M—mass matrix, m—number of samples.
1: procedure HMC()
2:   for i ← 1 : m do
3:     p ∼ N(0, M)                                      ▷ resample momentum
4:     obtain (q*, p*) with L leapfrog steps of size ε from (q_{i−1}, p)
5:     α ← min{1, exp(−H(q*, p*) + H(q_{i−1}, p))}
6:     sample u ∼ U(0, 1)
7:     if u ≤ α then                                    ▷ Metropolis correction
8:       q_i ← q*                                       ▷ accept transition
9:     else
10:      q_i ← q_{i−1}
11:    end if
12:    ith sample ← q_i
13:  end for
14: end procedure
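A minimal Python sketch of Algorithm 2 for a standard normal target with identity mass matrix; the hand-coded gradient and fixed ε and L are illustrative simplifications of what, as discussed next, AD and adaptive tuning provide in production software:

import numpy as np

rng = np.random.default_rng(2)

def U(q):          # potential energy: -log f(q), standard normal target
    return 0.5 * np.dot(q, q)

def grad_U(q):     # hand-coded gradient; real software uses AD
    return q

def hmc(q0, eps=0.1, L=20, m=1_000):
    d, q = len(q0), np.array(q0, dtype=float)
    samples = np.zeros((m, d))
    for i in range(m):
        p = rng.standard_normal(d)          # resample momentum, M = I
        q_new, p_new = q.copy(), p.copy()
        p_new -= 0.5 * eps * grad_U(q_new)  # leapfrog: half step in momentum
        for _ in range(L - 1):
            q_new += eps * p_new
            p_new -= eps * grad_U(q_new)
        q_new += eps * p_new
        p_new -= 0.5 * eps * grad_U(q_new)  # final half step
        # Metropolis correction for the leapfrog discretization error.
        H_old = U(q) + 0.5 * np.dot(p, p)
        H_new = U(q_new) + 0.5 * np.dot(p_new, p_new)
        if rng.uniform() <= min(1.0, np.exp(H_old - H_new)):
            q = q_new
        samples[i] = q
    return samples

draws = hmc(np.zeros(5))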
The main ideas behind HMC had been known for more than 20 years before HMC featured in popular Bayesian software [33]. The key enabler of more automatic use of HMC was the development of automatic differentiation (AD; see [6] for an introduction and survey). Simulating the Hamiltonian dynamics of HMC requires the gradient of the density, and in order for the software to be general purpose, we must be able to compute the gradient for any program the user can code. Of the four general approaches to computing derivatives, three will not work: manually deriving them is not practical; numerical differentiation via finite differences is too unstable due to rounding and truncation errors and is also slow in high dimensions; and symbolic differentiation suffers from expression swell and leads to inefficient code. AD instead exploits the fact that every program is a composition of elementary operations and, as long as each elementary operation also implements a derivative, we can apply the chain rule to derive the gradient of the composition. This leads to machine-level precision of gradients. Most modern inference software implements or imports an AD library. An important limitation of HMC is that it can only be used on smooth spaces.

A challenge in making HMC useful in general-purpose Bayesian inference is automatically tuning its parameters (mass matrix, step size, number of steps). HMC-based software typically implements one or more warmup phases for parameter tuning. Software then proceeds with sampling and the warmup samples are discarded. The key development was the no-U-turn sampler (NUTS) [59], which is, with some modifications, still the core Bayesian computation method in Stan. The basic idea of NUTS is to have a dynamic number of steps by simulating the Hamiltonian trajectory until we detect a turn back toward the starting state (or reach the maximum number of steps). While there are promising alternatives for tuning the number of steps, including more GPU computation-friendly variants [61, 60], NUTS is still the most common implementation of HMC and is also available in most modern software for general-purpose Bayesian inference.

HMC/NUTS admits several specific MCMC diagnostics [10]: when the step size is too large to capture a feature of the target density (which can lead to nonnegligible bias), this is likely to manifest as a diverging simulation, which can be detected so that we can use a smaller step size; reaching the maximum number of steps before terminating the trajectory is an indication of inefficient exploration; and the Bayesian fraction of missing information (BFMI) [9] quantifies how well momentum resampling matches the marginal energy distribution and can be used to detect poor adaptation during warmup or inefficient exploration.

3.1 Stan

Stan [21] is by far the most popular software for general-purpose Bayesian inference. Stan is implemented in C++ and has a standalone command line interface, but also has mature interfaces for Python and R (RStan [115], PyStan [102]) and lightweight wrappers for Python and R (CmdStanPy [116], CmdStanR [39], BridgeStan, github.com/roualdes/bridgestan). There are also interfaces for most languages that are traditionally used for data analysis: Matlab (MatlabStan), Julia (Stan.jl), Stata (StataStan), Mathematica (MathematicaStan), Scala (ScalaStan) and an HTTP request-based interface (httpstan).

While Stan implements black-box variational inference [69], Laplace approximation and standard optimization methods, the core Bayesian computation method is NUTS, a variant of HMC. Stan has a rich mathematics library with AD [22], and OpenCL-based GPU support with kernel fusion [23, 24].

The Stan PPL is an imperative language with which the user specifies the computation of the (log-)posterior. A program is divided into blocks, the most important of which are data, parameters and model. See the Appendix for an example of a model written in Stan. The distinction between data and parameters is made at compile time, so changing a variable from data to a parameter (or vice versa) requires moving it from the data to the parameters block (or vice versa) and recompiling. Notable work on the Stan language includes SlicStan [53, 51, 52], which contains several improvements, and translating Stan to Pyro [5].

Because of HMC-based computation, the class of models that can be fit by Stan is models with a smooth density. An important omission are models with discrete parameters, which currently have to be manually marginalized out. This means that Stan does not subsume what can be fit with BUGS and that HMC does not make Gibbs sampling-based software obsolete. However, empirical evidence suggests that, when applicable, Stan is currently the go-to software for general-purpose Bayesian inference [7].

The majority of Stan users are not writing models directly in the Stan language. There are several popular packages that provide a simplified formula- or options-based modeling language for a more specific class of models and use a Stan backend for modeling and computation: the R package brms [16] for modeling with hierarchical models; Prophet [121], implemented in Python [123] and R [122], for nonlinear time series forecasting with trend, seasonality and holiday effects; and the R package rstanarm [50] for a Bayesian analogue to R lm, glm, aov, etc. Overall, there are more than 140 R packages built on top of Stan, providing easy-to-use interfaces for various types of models common in different applications. The success of these packages is not only due to Stan, but also due to the increasing number of useful utilities in R, Python and Julia.
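To illustrate the typical workflow through one of the lightweight wrappers mentioned above, a minimal CmdStanPy sketch (the Stan file name and data are hypothetical placeholders; consult the CmdStanPy documentation for the authoritative API):

from cmdstanpy import CmdStanModel

# Compile a Stan program with data, parameters and model blocks
# (here a hypothetical simple linear regression in regression.stan).
model = CmdStanModel(stan_file="regression.stan")

data = {"N": 10,
        "x": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
        "y": [1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 6.8, 8.1, 9.0, 9.9]}

# Run NUTS with a warmup phase for tuning; warmup draws are discarded.
fit = model.sample(data=data, chains=4)
print(fit.summary())  # posterior means, MCSE, ESS and R-hat diagnostics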
4.2.2 Birch. Birch [88] is a universal PPL that transpiles to C++, with GPU support. Users implement the joint distribution of their model in a generative manner, with a preference for generic and object-oriented programming paradigms. Inference methods are based on SMC with gradient-based kernels. A defining feature of Birch is support for automatic marginalization and automatic conditioning. Much like AD, these recognize known forms, such as conjugacies and discrete enumerations, to marginalize out random variables where possible, and condition them on later simulations where necessary. The implementation of these is based on a heuristic known as delayed sampling [87], which reveals these opportunities during program execution by deferring the simulation of random variables for as long as possible. The result is the automatic enhancement of inference methods with features such as Rao–Blackwellization [87] and variable elimination [136]. Birch has been demonstrated on problems where the number of random variables is unknown, such as multiobject tracking [88] (where the number of objects is unknown) and statistical phylogenetics [103] (where the number of extinct side branches of a phylogeny is unknown). See the Appendix for an example of a model written in Birch.

4.2.3 Pyro. Pyro [12] is a Python PPL built on the PyTorch [93] backend. The main computation method in Pyro is stochastic variational inference, so the software is aimed at scalable probabilistic machine learning. NumPyro [94] is a NumPy-based backend for the Pyro PPL that uses JAX for AD and compilation to CPU/GPU.

4.2.4 Blang. Blang [14] is an open source package for approximating posterior distributions over arbitrary spaces, that is, Bayesian models containing not only integer and real variables but also user-defined datatypes such as phylogenetic trees, random graphs and sequence alignments. The Blang project includes a standard library of common datatypes and distributions, written in the Blang language, and extension points to create new datatypes and associated distributions. Users can publish versioned Blang packages containing new datatypes and distributions and import contributed packages and their transitive dependencies. The Blang language's scoping rules are used to automatically detect sparsity patterns and construct a type of graphical model known as a factor graph. Based on this factor graph, the posterior distribution is approximated via an adaptive nonreversible parallel tempering algorithm [118], which by default is parallelized over the user's CPU cores, but can also be distributed over MPI (Message Passing Interface) thanks to Blang's integration with the Pigeons distributed parallel tempering package (https://github.com/Julia-Tempering/Pigeons.jl). See the Appendix for an example of a model written in Blang.

4.3 Likelihood-Free Bayesian Inference

Likelihood-free inference (LFI) methods, such as approximate Bayesian computation (ABC) [110], Bayesian synthetic likelihood (BSL) [99], machine learning-based posterior approximations and surrogate likelihood methods [56, 25], refer to (mostly) Bayesian computation methods that can be used when it is impossible or infeasible to evaluate the likelihood function, but a generative simulator model exists. Such methods are popular, for example, in astrostatistics, genetics, ecology, systems biology and human cognition modeling. Engine for Likelihood-Free Inference (ELFI) [73] is a Python package for LFI that covers all the main approaches (ABC, BSL and ML-based methods). ELFI has a modular design that consists of a DAG-based modeling API and a separate API for inference, allowing a user to choose flexibly from a selection of algorithms that generate a sample from the approximate posterior distribution. Sampling can be done using adaptive importance sampling or MCMC/HMC, and with or without the use of a surrogate model for the likelihood function approximation. The surrogate model emulates a target function using Gaussian processes (GPs) and active learning (Bayesian optimization). The active learning approach has been demonstrated to accelerate likelihood-free inference by up to several orders of magnitude. Other general-purpose ABC packages are the Python packages pyABC [108] and ABCpy [34], and the R packages abc [27], EasyABC [64], abctools [92] and ABCreg [127]. Neural network-based surrogate models are accessible via the Python package sbi [125], and an R package [3] provides a toolbox for BSL. More detailed surveys of ABC software are provided by Nunes and Prangle [92] and Kousathanas et al. [67].

4.4 Software That Focuses on Computation

Blackjax (https://github.com/blackjax-devs/blackjax) is a Python library of MCMC methods for JAX. It works on CPU and GPU, is robust, efficient and easily integrates with PPLs that provide densities compatible with JAX (TFP, Oryx, NumPyro, Aesara, PyTensor/PyMC).

Emcee [35, 36] is a Python implementation of the affine invariant MCMC ensemble Bayesian computation method [49]. This derivative-free approach is suitable for low-dimensional problems with black-box likelihoods, which are common in astrophysics. Another Python package that is popular in astrophysics is dynesty [111], which implements dynamic nested sampling [57].

Mamba.jl (https://github.com/brian-j-smith/Mamba.jl) is a Julia package aimed at users who want to use and develop MCMC methods. It implements several MCMC methods (HMC, NUTS, Metropolis-within-Gibbs, etc.) and MCMC diagnostics. Another popular Julia package that implements state-of-the-art Bayesian computation methods is DynamicHMC.jl (https://github.com/tpapp/DynamicHMC.jl).
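As a brief illustration of the derivative-free ensemble approach from this subsection, emcee needs only a function returning the log-density (the three-dimensional Gaussian target here is a placeholder for a black-box likelihood):

import numpy as np
import emcee

def log_prob(theta):
    # Black-box log-density: standard normal in 3 dimensions.
    return -0.5 * np.dot(theta, theta)

ndim, nwalkers = 3, 32
p0 = np.random.default_rng(3).standard_normal((nwalkers, ndim))

sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob)
sampler.run_mcmc(p0, 2_000)
draws = sampler.get_chain(discard=500, flat=True)  # drop burn-in, flatten walkers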
4.5 Other General-Purpose Software

Other general-purpose software includes Infer.NET [83], a machine learning library written in C# for the .NET framework. It facilitates automatic approximate inference for Bayesian networks and Markov random fields. Bayesian computation is mostly limited to message passing. TensorFlow Probability (TFP) [31] is a Python library built on TensorFlow [1]. An example of a PPL with a very compact syntax is greta [47], a PPL embedded in R but based on TensorFlow and TFP. While limited in the Bayesian computation methods provided, it is extensible. See the Appendix for an example of a model written in greta. Oryx (https://github.com/jax-ml/oryx/) is a PPL built on top of JAX. The Journal of Statistical Software also recently published a special issue on Bayesian software [18], which includes some software covered by this paper and other specialized software.

4.6 Popular Utilities and Specialized Software

CODA [97] is a still popular R package for post-hoc diagnostics and analysis of MCMC output. ArviZ [70] is a Python package which provides MCMC diagnostics, model evaluation and model validation tools. ArviZ is backend-agnostic and currently the most popular such tool in Python. The R package bridgesampling [55] estimates marginal likelihoods, Bayes factors, posterior model probabilities and normalizing constants. The R package posterior [17] subsets, binds, mutates and converts between formats of MCMC samples and includes lightweight implementations of state-of-the-art posterior inference diagnostics. BayesFactor [85] is an R package for computing Bayes factors for contingency tables, one- and two-sample designs, one-way designs, general ANOVA designs and linear regression. The R package shinystan [42] provides a graphical user interface for interactive Markov chain Monte Carlo (MCMC) diagnostics and other tools for analyzing a posterior sample. The procedures in shinystan are agnostic to what generated the MCMC samples, but with some added functionality for models fit with RStan. The R package loo [133] performs efficient approximate leave-one-out cross-validation for Bayesian models fit using MCMC methods. The R package bayestestR [78] has tools for dealing with uncertainty and effects in a Bayesian statistics framework. It is agnostic of the software that generated the posterior samples and includes MAP estimates, measures of dispersion, ROPE and Bayes factors. The R package bayesplot [40] has graphing functions for Bayesian models, including posterior draws, visual MCMC diagnostics and graphical predictive checking. The R package projpred [95] performs projection predictive variable selection for Bayesian generalized linear and additive multilevel models fit using MCMC methods. The R package priorsense [66] performs efficient prior and likelihood sensitivity analysis for Bayesian models fit using MCMC methods. MCMCpack [81] is an R package that implements MCMC-based computation for several statistical methods. Tools for structural learning and parameter estimation of Bayesian networks include bnlearn [109], Bayes Net for Matlab [86], HUGIN [77], VIBES [13], MSBNx [65], along with the commercial tools GEnIe/SMILE [32] and Netica (https://www.norsys.com/netica.html).

5. CHALLENGES AND FUTURE PERSPECTIVES

The field of software for Bayesian inference has never been more active or varied. There are developments in all directions, providing better tools that allow for more accessible, robust or efficient treatment of typical modeling, as well as pushing the boundaries of what can be done.

Similar to programming languages, where one might prefer Python for general-purpose programming, R for data wrangling and visualization or the emerging Julia for high-performance data analytics, there is no one-size-fits-all approach to software for Bayesian inference. Stan is the typical choice for Bayesian model building and inference, Pyro or TFP for Bayesian machine learning, and numerous other tools for more specialized tasks. Such diversity is understandable, because limiting the tool simplifies it and allows for more efficient computation. While there has been some encouraging progress in universal PPLs and underlying Bayesian computation, it is not yet clear if a novel trade-off between expressivity and efficiency can be struck, leading to a third generation of tools.

As a result, users have to either accept the limitations of their tool of choice or learn how to work with multiple tools and languages. A natural solution would be to automatically translate between languages or from statistical notation into code, as illustrated in the Appendix. This is a difficult problem, because languages differ in expressivity, and even when translations exist, automatic translations could result in inefficient code. Regardless, there appears to be a relative lack of incentive in this area.

A new PPL is most often learned from model examples with code and data, or from translations from a language we are already familiar with. So, it is not a surprise that popular PPLs such as BUGS and Stan have extensive documentation, including user's manuals, case studies and, in the case of Stan, examples of translations from BUGS to Stan. Popular PPLs are also accessible on all popular platforms and through major programming languages (typically standalone with interfaces), have open governance
α ∼ N(0, 5²),
β ∼ N(0, 5²),
σ ∼ U(0, 10),
y_i ∼ N(α + βx_i, σ),

where y_i is the dependent variable and x_i is the predictor. This model is in the class of Bayesian networks. Its rep-

Bean Machine [124]:

import beanmachine.ppl as bm
from torch.distributions import Normal, Uniform

@bm.random_variable
def alpha():
    return Normal(0, 5)

@bm.random_variable
def beta():
    return Normal(0, 5)

@bm.random_variable
def sigma():
    return Uniform(0, 10)

@bm.random_variable
def y(i):
    # x: observed predictor values (data)
    return Normal(alpha() + beta() * x[i], sigma())
[20] CARPENTER, B. (2021). What do we need from a probabilistic programming language to support Bayesian workflow? In International Conference on Probabilistic Programming (PROBPROG) 46.
[21] CARPENTER, B., GELMAN, A., HOFFMAN, M. D., LEE, D., GOODRICH, B., BETANCOURT, M., BRUBAKER, M., GUO, J., LI, P. et al. (2017). Stan: A probabilistic programming language. J. Stat. Softw. 76.
[22] CARPENTER, B., HOFFMAN, M. D., BRUBAKER, M., LEE, D., LI, P. and BETANCOURT, M. (2015). The Stan math library: Reverse-mode automatic differentiation in C++. Available at arXiv:1509.07164.
[23] ČEŠNOVAR, R. (2022). Parallel computation in the Stan probabilistic programming language. Ph.D. thesis, Univerza v Ljubljani, Fakulteta za računalništvo in informatiko.
[24] CIGLARIČ, T., ČEŠNOVAR, R. and ŠTRUMBELJ, E. (2020). Automated OpenCL GPU kernel fusion for Stan math. In Proceedings of the International Workshop on OpenCL 1–6.
[25] CRANMER, K., BREHMER, J. and LOUPPE, G. (2020). The frontier of simulation-based inference. Proc. Natl. Acad. Sci. USA 117 30055–30062. MR4263287 https://doi.org/10.1073/pnas.1912789117
[26] CSÁRDI, G. (2019). cranlogs: Download logs from the 'RStudio' 'CRAN' mirror. R package version 2.1.1.
[27] CSILLÉRY, K., FRANÇOIS, O. and BLUM, M. G. (2012). abc: An R package for approximate Bayesian computation (ABC). Methods Ecol. Evol. 3 475–479.
[28] CUSUMANO-TOWNER, M. F., SAAD, F. A., LEW, A. K. and MANSINGHKA, V. K. (2019). Gen: A general-purpose probabilistic programming system with programmable inference. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation 221–236.
[29] DAVIS, T. A. (2006). Direct Methods for Sparse Linear Systems. Fundamentals of Algorithms 2. SIAM, Philadelphia, PA. MR2270673 https://doi.org/10.1137/1.9780898718881
[30] DE VALPINE, P., TUREK, D., PACIOREK, C. J., ANDERSON-BERGMAN, C., TEMPLE LANG, D. and BODIK, R. (2017). Programming with models: Writing statistical algorithms for general model structures with NIMBLE. J. Comput. Graph. Statist. 26 403–413. MR3640196 https://doi.org/10.1080/10618600.2016.1172487
[31] DILLON, J. V., LANGMORE, I., TRAN, D., BREVDO, E., VASUDEVAN, S., MOORE, D., PATTON, B., ALEMI, A., HOFFMAN, M. et al. (2017). TensorFlow distributions. Available at arXiv:1711.10604.
[32] DRUZDZEL, M. J. (1999). SMILE: Structural modeling, inference, and learning engine and GeNIe: A development environment for graphical decision-theoretic models. In American Association for Artificial Intelligence Proceedings 902–903.
[33] DUANE, S., KENNEDY, A. D., PENDLETON, B. J. and ROWETH, D. (1987). Hybrid Monte Carlo. Phys. Lett. B 195 216–222. MR3960671 https://doi.org/10.1016/0370-2693(87)91197-x
[34] DUTTA, R., SCHOENGENS, M., PACCHIARDI, L., UMMADISINGU, A., WIDMER, N., KÜNZLI, P., ONNELA, J.-P. and MIRA, A. (2021). ABCpy: A high-performance computing perspective to approximate Bayesian computation. J. Stat. Softw. 100 1–38.
[35] FOREMAN-MACKEY, D., FARR, W. M., SINHA, M., ARCHIBALD, A. M., HOGG, D. W., SANDERS, J. S., ZUNTZ, J., WILLIAMS, P. K., NELSON, A. R. et al. (2019). emcee v3: A Python ensemble sampling toolkit for affine-invariant MCMC. Available at arXiv:1911.07688.
[36] FOREMAN-MACKEY, D., HOGG, D. W., LANG, D. and GOODMAN, J. (2013). emcee: The MCMC hammer. Publ. Astron. Soc. Pac. 125 306.
[37] FOURNIER, D. A., SKAUG, H. J., ANCHETA, J., IANELLI, J., MAGNUSSON, A., MAUNDER, M. N., NIELSEN, A. and SIBERT, J. (2012). AD Model Builder: Using automatic differentiation for statistical inference of highly parameterized complex nonlinear models. Optim. Methods Softw. 27 233–249. MR2901959 https://doi.org/10.1080/10556788.2011.597854
[38] FRÜHWIRTH-SCHNATTER, S., FRÜHWIRTH, R., HELD, L. and RUE, H. (2009). Improved auxiliary mixture sampling for hierarchical models of non-Gaussian data. Stat. Comput. 19 479–492. MR2565319 https://doi.org/10.1007/s11222-008-9109-4
[39] GABRY, J. and ČEŠNOVAR, R. (2022). A lightweight R interface to CmdStan.
[40] GABRY, J. and MAHR, T. (2022). bayesplot: Plotting for Bayesian models. R package version 1.10.0.
[41] GABRY, J., SIMPSON, D., VEHTARI, A., BETANCOURT, M. and GELMAN, A. (2019). Visualization in Bayesian workflow. J. Roy. Statist. Soc. Ser. A 182 389–402. MR3902665 https://doi.org/10.1111/rssa.12378
[42] GABRY, J. and VEEN, D. (2022). shinystan: Interactive visual and numerical diagnostics and posterior analysis for Bayesian models. R package version 2.6.0. Available at https://CRAN.R-project.org/package=shinystan.
[43] GAEDKE-MERZHÄUSER, L., VAN NIEKERK, J., SCHENK, O. and RUE, H. (2023). Parallelized integrated nested Laplace approximations for fast Bayesian inference. Stat. Comput. 33 25. MR4526361 https://doi.org/10.1007/s11222-022-10192-1
[44] GE, H., XU, K. and GHAHRAMANI, Z. (2018). Turing: A language for flexible probabilistic inference. In International Conference on Artificial Intelligence and Statistics 1682–1690. PMLR.
[45] GELMAN, A., VEHTARI, A., SIMPSON, D., MARGOSSIAN, C. C., CARPENTER, B., YAO, Y., KENNEDY, L., GABRY, J., BÜRKNER, P.-C. et al. (2020). Bayesian workflow. Available at arXiv:2011.01808.
[46] GILKS, W. R. and WILD, P. (1992). Adaptive rejection sampling for Gibbs sampling. J. R. Stat. Soc., Ser. C 41 337–348.
[47] GOLDING, N. (2019). greta: Simple and scalable statistical modelling in R. J. Open Sour. Softw. 4 1601.
[48] GOODMAN, N., MANSINGHKA, V., ROY, D. M., BONAWITZ, K. and TENENBAUM, J. B. (2012). Church: A language for generative models. Available at arXiv:1206.3255.
[49] GOODMAN, J. and WEARE, J. (2010). Ensemble samplers with affine invariance. Commun. Appl. Math. Comput. Sci. 5 65–80. MR2600822 https://doi.org/10.2140/camcos.2010.5.65
[50] GOODRICH, B., ALI, I., GABRY, J. and SAM, B. (2021). rstanarm: Bayesian applied regression modeling via Stan. R package version 2.21.3. Available at https://CRAN.R-project.org/package=rstanarm.
[51] GORINOVA, M. I. (2022). Program analysis of probabilistic programs. Available at arXiv:2204.06868.
[52] GORINOVA, M. I., GORDON, A. D. and SUTTON, C. (2019). Probabilistic programming with densities in SlicStan: Efficient, flexible, and deterministic. Proc. ACM Program. Lang. 3 1–30.
[53] GORINOVA, M., MOORE, D. and HOFFMAN, M. (2020). Automatic reparameterisation of probabilistic programs. In International Conference on Machine Learning 3648–3657. PMLR.
[54] GOUDIE, R. J., TURNER, R. M., DE ANGELIS, D. and THOMAS, A. (2020). MultiBUGS: A parallel implementation of the BUGS modelling framework for faster Bayesian inference. J. Stat. Softw. 95.
[55] GRONAU, Q. F., SINGMANN, H. and WAGENMAKERS, E.-J. (2020). bridgesampling: An R package for estimating normalizing constants. J. Stat. Softw. 92 1–29.
[56] GUTMANN, M. U. and CORANDER, J. (2016). Bayesian optimization for likelihood-free inference of simulator-based statistical models. J. Mach. Learn. Res. 17 125. MR3555016
[57] HIGSON, E., HANDLEY, W., HOBSON, M. and LASENBY, A. (2019). Dynamic nested sampling: An improved algorithm for parameter estimation and evidence calculation. Stat. Comput. 29 891–913. MR3994608 https://doi.org/10.1007/s11222-018-9844-0
[58] HOBERT, J. P. (2011). The data augmentation algorithm: Theory and methodology. In Handbook of Markov Chain Monte Carlo. Chapman & Hall/CRC Handb. Mod. Stat. Methods 253–293. CRC Press, Boca Raton, FL. MR2858452
[59] HOFFMAN, M. D. and GELMAN, A. (2014). The no-U-turn sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo. J. Mach. Learn. Res. 15 1593–1623. MR3214779
[60] HOFFMAN, M. D., RADUL, A. and SOUNTSOV, P. (2021). An adaptive MCMC scheme for setting trajectory lengths in Hamiltonian Monte Carlo. Int. Conf. Artif. Intell. Stat.
[61] HOFFMAN, M. and SOUNTSOV, P. (2022). Tuning-free generalized Hamiltonian Monte Carlo. Proc. Mach. Learn. Res. 151 7799–7813.
[62] HOLMES, C. C. and HELD, L. (2006). Bayesian auxiliary variable models for binary and multinomial regression. Bayesian Anal. 1 145–168. MR2227368 https://doi.org/10.1214/06-BA105
[63] HUNTER, J. D. (2007). Matplotlib: A 2D graphics environment. Comput. Sci. Eng. 9 90–95.
[64] JABOT, F., FAURE, T. and DUMOULIN, N. (2013). EasyABC: Performing efficient approximate Bayesian computation sampling schemes using R. Methods Ecol. Evol. 4 684–687.
[65] KADIE, C. M., HOVEL, D. and HORVITZ, E. (2001). MSBNx: A Component-Centric Toolkit for Modeling and Inference with Bayesian Networks. Technical Report MSR-TR-2001-67, Microsoft Research, Redmond, WA.
[66] KALLIOINEN, N., PAANANEN, T., BÜRKNER, P.-C. and VEHTARI, A. (2021). Detecting and diagnosing prior and likelihood sensitivity with power-scaling. Available at arXiv:2107.14054.
[67] KOUSATHANAS, A., DUCHEN, P. and WEGMANN, D. (2019). A guide to general-purpose ABC software. In Handbook of Approximate Bayesian Computation. Chapman & Hall/CRC Handb. Mod. Stat. Methods 369–413. CRC Press, Boca Raton, FL. MR3889290
[68] KRAINSKI, E., GÓMEZ-RUBIO, V., BAKKA, H., LENZI, A., CASTRO-CAMILO, D., SIMPSON, D., LINDGREN, F. and RUE, H. (2018). Advanced Spatial Modeling with Stochastic Partial Differential Equations Using R and INLA. CRC Press, Boca Raton, FL.
[69] KUCUKELBIR, A., TRAN, D., RANGANATH, R., GELMAN, A. and BLEI, D. M. (2017). Automatic differentiation variational inference. J. Mach. Learn. Res. 18 14. MR3634881
[70] KUMAR, R., CARROLL, C., HARTIKAINEN, A. and MARTÍN, O. A. (2019). ArviZ: A unified library for exploratory analysis of Bayesian models in Python. J. Open Sour. Softw.
[71] LINDGREN, F. and RUE, H. (2015). Bayesian spatial modelling with R-INLA. J. Stat. Softw. 63 1–25.
[72] LINDGREN, F., RUE, H. and LINDSTRÖM, J. (2011). An explicit link between Gaussian fields and Gaussian Markov random fields: The stochastic partial differential equation approach. J. R. Stat. Soc. Ser. B. Stat. Methodol. 73 423–498. MR2853727 https://doi.org/10.1111/j.1467-9868.2011.00777.x
[73] LINTUSAARI, J., VUOLLEKOSKI, H., KANGASRÄÄSIÖ, A., SKYTÉN, K., JÄRVENPÄÄ, M., MARTTINEN, P., GUTMANN, M. U., VEHTARI, A., CORANDER, J. et al. (2018). ELFI: Engine for likelihood-free inference. J. Mach. Learn. Res. 19 16. MR3862423
[74] LUNN, D., JACKSON, C., BEST, N., THOMAS, A. and SPIEGELHALTER, D. (2012). The BUGS Book: A Practical Introduction to Bayesian Analysis. CRC Press, Boca Raton, FL.
[75] LUNN, D., SPIEGELHALTER, D., THOMAS, A. and BEST, N. (2009). The BUGS project: Evolution, critique and future directions. Stat. Med. 28 3049–3067. MR2750401 https://doi.org/10.1002/sim.3680
[76] LUNN, D. J., THOMAS, A., BEST, N. and SPIEGELHALTER, D. (2000). WinBUGS—a Bayesian modelling framework: Concepts, structure, and extensibility. Stat. Comput. 10 325–337.
[77] MADSEN, A. L., LANG, M., KJÆRULFF, U. B. and JENSEN, F. (2003). The Hugin tool for learning Bayesian networks. In Symbolic and Quantitative Approaches to Reasoning with Uncertainty. Lecture Notes in Computer Science 2711 594–605. Springer, New York. MR2050972 https://doi.org/10.1007/978-3-540-45062-7_49
[78] MAKOWSKI, D., BEN-SHACHAR, M. S. and LÜDECKE, D. (2019). bayestestR: Describing effects and their uncertainty, existence and significance within the Bayesian framework. J. Open Sour. Softw. 4 1541.
[79] MANSINGHKA, V., SELSAM, D. and PEROV, Y. (2014). Venture: A higher-order probabilistic programming platform with programmable inference. Available at arXiv:1404.0099.
[80] MARTIN, G. M., FRAZIER, D. T. and ROBERT, C. P. (2022). Computing Bayes: From then 'til now'. Available at arXiv:2208.00646.
[81] MARTIN, A. D., QUINN, K. M. and PARK, J. H. (2011). MCMCpack: Markov chain Monte Carlo in R. J. Stat. Softw. 42 22.
[82] MARTINS, T. G., SIMPSON, D., LINDGREN, F. and RUE, H. (2013). Bayesian computing with INLA: New features. Comput. Statist. Data Anal. 67 68–83. MR3079584 https://doi.org/10.1016/j.csda.2013.04.014
[83] MINKA, T., WINN, J. M., GUIVER, J. P., ZAYKOV, Y., FABIAN, D. and BRONSKILL, J. (2018). Infer.NET 0.3. Microsoft Research Cambridge. Available at http://dotnet.github.io/infer.
[84] MONNAHAN, C. C. and KRISTENSEN, K. (2018). No-U-turn sampling for fast Bayesian inference in ADMB and TMB: Introducing the adnuts and tmbstan R packages. PLoS ONE 13 e0197954. https://doi.org/10.1371/journal.pone.0197954
[85] MOREY, R. D. and ROUDER, J. N. (2022). BayesFactor: Computation of Bayes factors for common designs.
[86] MURPHY, K. (2001). The Bayes Net toolbox for Matlab. Comput. Sci. Stat. 33 1024–1034.
[87] MURRAY, L. M., LUNDÉN, D., KUDLICKA, J., BROMAN, D. and SCHÖN, T. B. (2018). Delayed sampling and automatic Rao–Blackwellization of probabilistic programs. In Proceedings of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS).
[88] MURRAY, L. M. and SCHÖN, T. B. (2018). Automated learning with a probabilistic programming language: Birch. Annu. Rev. Control 46 29–43. MR3907522 https://doi.org/10.1016/j.arcontrol.2018.10.013
[89] NEAL, R. M. (1993). Probabilistic Inference Using Markov Chain Monte Carlo Methods. Department of Computer Science, Univ. Toronto.
[90] NEAL, R. M. (2003). Slice sampling. Ann. Statist. 31 705–767. MR1994729 https://doi.org/10.1214/aos/1056562461
[91] NEAL, R. M. (2011). MCMC using Hamiltonian dynamics. Handb. Markov Chain Monte Carlo.
[92] NUNES, M. A. and PRANGLE, D. (2015). abctools: An R package for tuning approximate Bayesian computation analyses. R J. 7 189–205.
[93] PASZKE, A., GROSS, S., MASSA, F., LERER, A., BRADBURY, J., CHANAN, G., KILLEEN, T., LIN, Z., GIMELSHEIN, N. et al. (2019). PyTorch: An imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 32.
[94] PHAN, D., PRADHAN, N. and JANKOWIAK, M. (2019). Composable effects for flexible and accelerated probabilistic programming in NumPyro. Available at arXiv:1912.11554.
[95] PIIRONEN, J., PAASINIEMI, M., CATALINA, A., WEBER, F. and VEHTARI, A. (2023). projpred: Projection predictive feature selection.
[96] PLUMMER, M. et al. (2003). JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling. In Proceedings of the 3rd International Workshop on Distributed Statistical Computing 124 1–10.
[97] PLUMMER, M., BEST, N., COWLES, K. and VINES, K. (2006). CODA: Convergence diagnosis and output analysis for MCMC. R News 6 7–11.
[98] POLSON, N. G., SCOTT, J. G. and WINDLE, J. (2013). Bayesian inference for logistic models using Pólya-Gamma latent variables. J. Amer. Statist. Assoc. 108 1339–1349. MR3174712 https://doi.org/10.1080/01621459.2013.829001
[99] PRICE, L. F., DROVANDI, C. C., LEE, A. and NOTT, D. J. (2018). Bayesian synthetic likelihood. J. Comput. Graph. Statist. 27 1–11. MR3788296 https://doi.org/10.1080/10618600.2017.1302882
[100] R CORE TEAM (2022). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
[101] RAINFORTH, T. W. G. (2017). Automating inference, learning, and design using probabilistic programming. Ph.D. thesis, Univ. Oxford.
[102] RIDDELL, A. (2022). Python interface to Stan.
[103] RONQUIST, F., KUDLICKA, J., SENDEROV, V., BORGSTRÖM, J., LARTILLOT, N., LUNDÉN, D., MURRAY, L., SCHÖN, T. B. and BROMAN, D. (2021). Universal probabilistic programming offers a powerful approach to statistical phylogenetics. Commun. Biol. 4.
[104] RSTUDIO TEAM (2021). RStudio: Integrated Development Environment for R. RStudio, PBC, Boston, MA.
[105] RUE, H., MARTINO, S. and CHOPIN, N. (2009). Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. J. R. Stat. Soc. Ser. B. Stat. Methodol. 71 319–392. MR2649602 https://doi.org/10.1111/j.1467-9868.2008.00700.x
[106] RUE, H., RIEBLER, A., SØRBYE, S. H., ILLIAN, J. B., SIMPSON, D. P. and LINDGREN, F. K. (2017). Bayesian computing with INLA: A review. Annu. Rev. Stat. Appl. 4 395–421.
[107] SALVATIER, J., WIECKI, T. V. and FONNESBECK, C. (2016). Probabilistic programming in Python using PyMC3. PeerJ Comput. Sci. 2 e55.
[108] SCHÄLTE, Y., KLINGER, E., ALAMOUDI, E. and HASENAUER, J. (2022). pyABC: Efficient and robust easy-to-use approximate Bayesian computation. Available at arXiv:2203.13043.
[109] SCUTARI, M. (2010). Learning Bayesian networks with the bnlearn R package. J. Stat. Softw. 35.
[110] SISSON, S. A., FAN, Y. and BEAUMONT, M. (2018). Handbook of Approximate Bayesian Computation. CRC Press, Boca Raton, FL.
[111] SPEAGLE, J. S. (2020). dynesty: A dynamic nested sampling package for estimating Bayesian posteriors and evidences. Mon. Not. R. Astron. Soc. 493 3132–3158.
[112] SPIEGELHALTER, D., THOMAS, A., BEST, N. and GILKS, W. (1996). BUGS 0.5: Bayesian inference using Gibbs sampling manual (version II). MRC Biostatistics Unit, Institute of Public Health, Cambridge, UK 1–59.
[113] SPIEGELHALTER, D. J., THOMAS, A., BEST, N. and LUNN, D. (2003). WinBUGS Version 1.4 User Manual. MRC Biostatistics Unit, Cambridge. Available at http://www.mrc-bsu.cam.ac.uk/bugs.
[114] SPIEGELHALTER, D., THOMAS, A., BEST, N. and LUNN, D. (2014). OpenBUGS user manual. Version 3.2.3.
[115] STAN DEVELOPMENT TEAM (2022). RStan: The R interface to Stan.
[116] STAN DEVELOPMENT TEAM (2022). A lightweight Python interface to CmdStan.
[117] STURTZ, S., LIGGES, U. and GELMAN, A. (2005). R2WinBUGS: A package for running WinBUGS from R. J. Stat. Softw. 12 1–16.
[118] SYED, S., BOUCHARD-CÔTÉ, A., DELIGIANNIDIS, G. and DOUCET, A. (2022). Non-reversible parallel tempering: A scalable highly parallel MCMC scheme. J. R. Stat. Soc. Ser. B. Stat. Methodol. 84 321–350. MR4412989
[119] TALTS, S., BETANCOURT, M., SIMPSON, D., VEHTARI, A. and GELMAN, A. (2018). Validating Bayesian inference algorithms with simulation-based calibration. Available at arXiv:1804.06788.
[120] TAREK, M., XU, K., TRAPP, M., GE, H. and GHAHRAMANI, Z. (2020). DynamicPPL: Stan-like speed for dynamic probabilistic models. Available at arXiv:2002.02702.
[121] TAYLOR, S. J. and LETHAM, B. (2018). Forecasting at scale. Amer. Statist. 72 37–45. MR3790566 https://doi.org/10.1080/00031305.2017.1380080
[122] TAYLOR, S. J. and LETHAM, B. (2021). prophet: Automatic Forecasting Procedure.
[123] TAYLOR, S. J. and LETHAM, B. (2022). Prophet: Automatic Forecasting Procedure.
[124] TEHRANI, N., ARORA, N. S., LI, Y. L., SHAH, K. D., NOURSI, D., TINGLEY, M., TORABI, N., LIPPERT, E., MEIJER, E. et al. (2020). Bean Machine: A declarative probabilistic programming language for efficient programmable inference. In International Conference on Probabilistic Graphical Models 485–496. PMLR.
[125] TEJERO-CANTERO, A., BOELTS, J., DEISTLER, M., LUECKMANN, J.-M., DURKAN, C., GONÇALVES, P. J., GREENBERG, D. S. and MACKE, J. H. (2020). sbi: A toolkit for simulation-based inference. J. Open Sour. Softw. 5 2505.
[126] THOMAS, N. (2020). R2OpenBUGS: Running OpenBUGS from R.
[127] THORNTON, K. R. (2009). Automating approximate Bayesian computation by local linear regression. BMC Genet. 10 1–5.
[128] TOLPIN, D., VAN DE MEENT, J.-W., YANG, H. and WOOD, F. (2016). Design and implementation of probabilistic programming language Anglican. In Proceedings of the 28th Symposium on the Implementation and Application of Functional Programming Languages 1–12.
[129] TRAN, D., HOFFMAN, M. W., MOORE, D., SUTER, C., VASUDEVAN, S. and RADUL, A. (2018). Simple, distributed, and accelerated probabilistic programming. Adv. Neural Inf. Process. Syst. 31.
[130] VAN NIEKERK, J., BAKKA, H., RUE, H. and SCHENK, O. (2021). New frontiers in Bayesian modeling using the INLA package in R. J. Stat. Softw. 100 1–28.
[131] VAN NIEKERK, J., KRAINSKI, E., RUSTAND, D. and RUE, H. (2023). A new avenue for Bayesian inference with INLA. Comput. Statist. Data Anal. 181 107692. MR4540934 https://doi.org/10.1016/j.csda.2023.107692
[132] VAN NIEKERK, J. and RUE, H. (2021). Correcting the Laplace method with variational Bayes. Available at arXiv:2111.12945.
[133] VEHTARI, A., GABRY, J., MAGNUSSON, M., YAO, Y., BÜRKNER, P.-C., PAANANEN, T. and GELMAN, A. (2022). loo: Efficient leave-one-out cross-validation and WAIC for Bayesian models. R package version 2.5.1.
[134] VEHTARI, A., GELMAN, A. and GABRY, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Stat. Comput. 27 1413–1432. MR3647105 https://doi.org/10.1007/s11222-016-9696-4
[135] WICKHAM, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer, Berlin.
[136] WIGREN, A., RISULEO, R. S., MURRAY, L. M. and LINDSTEN, F. (2019). Parameter elimination in particle Gibbs sampling. Advances in Neural Information Processing Systems 32 (NeurIPS 2019).
[137] WOOD, S. N. (2017). Generalized Additive Models: An Introduction with R, Second ed. CRC Press, Boca Raton, FL.