Automatic Reparameterisation of Probabilistic Programs
Figure 1. Neal's funnel. (a) Centred (left) and non-centred (right) parameterisation. (b) A model that generates the variables z and x. (c) The model in the context of log_prob_at_0.
realisation of a model discussed by Betancourt & Girolami (2015, Eq. (2)), where for a vector of N datapoints y and given constants σ and σµ, we have:

θ ∼ N (0, 1)    µ ∼ N (θ, σµ)
yn ∼ N (µ, σ)   for all n ∈ 1 . . . N

In the non-centred model, y is defined in terms of µ̃ and θ, where µ̃ is a standard Gaussian variable:

θ ∼ N (0, 1)    µ̃ ∼ N (0, 1)
yn ∼ N (θ + σµ µ̃, σ)   for all n ∈ 1 . . . N

Figure 2a and Figure 2b show the graphical models for the two parameterisations. In the non-centred case, the direct dependency between θ and µ is replaced by a conditional dependency given the data y, which creates an "explaining away" effect. Intuitively, this means that the stronger the evidence y is (large N and small variance), the stronger the dependency between θ and µ̃ becomes, creating a poorly conditioned posterior that may slow inference.

As the Gaussian distribution is self-conjugate, the posterior in each case (centred or non-centred) is also Gaussian, and we can analytically inspect its covariance matrix V. To quantify the quality of each parameterisation, we investigate the condition number κ of the posterior covariance matrix under the optimal diagonal preconditioner. This models the common practice (implemented in tools such as PyMC3 and Stan, and followed in our experiments) of sampling using a fitted diagonal preconditioner.

Figure 2c shows the condition numbers κCP and κNCP for each parameterisation as a function of q = N/σ²; the full derivation is in Appendix A. This figure confirms the intuition that the non-centred parameterisation is better suited to situations where the evidence is weak, while strong evidence calls for the centred parameterisation. In this example we can determine the optimal parameterisation exactly, since the model has only one variable that can be reparameterised and the posterior has a closed form. In more realistic settings, even experts cannot predict the optimal parameterisation for hierarchical models with many variables and groups of data, and the wrong choice can lead to poor conditioning, heavy tails or other pathological geometry.

4. Reparameterising Probabilistic Programs

An advantage of probabilistic programming is that the program itself provides a structured model representation, and we can explore model reparameterisation through the lens of program transformations. In this paper, we focus on transforming generative probabilistic programs, where the program represents a sampling process describing how the data was generated from some unknown latent variables. Most probabilistic programming languages (PPLs) provide some mechanism for transforming a generative process into an inference program; our automatic reparameterisation approach is applicable to PPLs that transform generative programs using effect handling. This includes modern deep PPLs such as Pyro (Uber AI Labs, 2017) and Edward2 (Tran et al., 2018).

4.1. Effect Handling-based Probabilistic Programming

Consider a generative program, where running the program forward generates samples from the prior over latent variables and data. Effect handling-based PPLs treat generating a random variable within such a model as an effectful operation (an operation that is understood as having side effects) and provide ways of resolving this operation, in the form of effect handlers, to allow for inference. For example, we often need to transform a statement that generates a random variable into a statement that evaluates some (log) density or mass function. We can implement this using an effect handler:

log_prob_at_0 = handler { v ∼ D(a1, . . . , aN) ↦
    v = 0; lp_v = log pD(v | a1, . . . , aN) }²

The handler log_prob_at_0 handles statements of the form v ∼ D(a1, . . . , aN). Such statements normally mean "sample a random variable from the distribution D(a1, . . . , aN) and record its value in v". However, when executed in the context of log_prob_at_0 (we write with log_prob_at_0 handle model), statements that contain random-variable constructions are handled by setting the value of the variable v to 0, then evaluating the log density (or mass) function of D(a1, . . . , aN) at v = 0 and recording its value in a new (program) variable lp_v.

² Algebraic effects and handlers typically involve passing a continuation within the handler. We make the continuation implicit to stay close to Edward2's implementation.

For example, consider the function implementing Neal's funnel in Figure 1b. When executed without any context, this function generates two random variables, z and x. When executed in the context of the log_prob_at_0 handler, it does not generate random variables; instead, it evaluates log pN(z | 0, 3) and log pN(x | 0, exp(z/2)) (Figure 1c).

This approach can be extended to produce a function that corresponds to the log joint density (or mass) function of the latent variables of the model. In §§ B.1 we give the pseudo-code implementation of a function make_log_joint, which takes a model M(z | x) — one that generates latent variables z and generates and observes data x — and returns the function f(z) = log p(z, x). This is a core operation, as it transforms a generative model into a function proportional to the posterior distribution, which can be repeatedly evaluated and automatically differentiated to perform inference.
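To make the handler notation above concrete, the following is a minimal, self-contained sketch of this style of effect handling in plain Python. It is an illustration only: the sample/handle interface and the neals_funnel function are illustrative stand-ins, not Edward2's or Pyro's API. With no handler installed the model samples from its prior; under a log_prob_at_0-style handler, every latent variable is set to 0 and its log density is recorded, mirroring Figure 1c.

import contextlib
import numpy as np
from scipy import stats

# Minimal sketch of effect handling (illustrative only; not Edward2's or
# Pyro's API). Sample statements defer to the innermost installed handler;
# sample statements issued *by* a handler go to the next handler out.
_handlers = []

def sample(name, dist, **params):
    if _handlers:
        handler = _handlers.pop()   # innermost handler resolves the statement
        try:
            return handler(name, dist, **params)
        finally:
            _handlers.append(handler)
    return dist(**params).rvs()     # default behaviour: draw from the prior

@contextlib.contextmanager
def handle(handler):
    _handlers.append(handler)
    try:
        yield
    finally:
        _handlers.remove(handler)

def neals_funnel():
    z = sample("z", stats.norm, loc=0.0, scale=3.0)
    x = sample("x", stats.norm, loc=0.0, scale=np.exp(z / 2.0))
    return z, x

# log_prob_at_0: set each latent variable to 0 and record its log density.
log_probs = {}
def log_prob_at_0(name, dist, **params):
    log_probs[name] = dist(**params).logpdf(0.0)
    return 0.0

neals_funnel()                  # no context: draws z and x from the prior
with handle(log_prob_at_0):     # handled: records log N(0 | 0, 3) and log N(0 | 0, exp(0/2))
    neals_funnel()

Edward2's interception and Pyro's effect handlers play the role of handle here, with richer bookkeeping around variable naming, tracing and conditioning.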
Figure 2. (a) Centred and (b) non-centred graphical models for the example, with variables θ, µ (or µ̃) and yn, n = 1, . . . , N. (c) The condition number as a function of the data's strength.
More generally, effectful operations are operations that can have side effects, e.g. writing to a file. The programming languages literature formalises cases where impure behaviour arises from a set of effectful operations in terms of algebraic effects and their handlers (Plotkin & Power, 2001; Plotkin & Pretnar, 2009; Pretnar, 2015). A concrete implementation for an effectful operation is given in the form of effect handlers, which (similarly to exception handlers) are responsible for resolving the operation. Effect handlers can be used as a powerful abstraction in probabilistic programming, and have been incorporated into recent frameworks such as Pyro and Edward2 (Moore & Gorinova, 2018).

4.2. Model Reparameterisation Using Effect Handlers

Once equipped with an effect handling-based PPL, we can easily construct handlers to perform many model transformations, including model reparameterisation.

Non-centring Handler.

ncp = handler { v ∼ N (µ, σ), v ∉ data ↦
    ṽ ∼ N (0, 1); v = µ + σṽ }

A non-centring handler can be used to non-centre all standardisable³ latent variables in a model. The handler simply applies to statements of the form v ∼ N (µ, σ), where v is not a data variable, and transforms them to ṽ ∼ N (0, 1), v = µ + σṽ. When nested within a log_prob handler (like the one from §§ 4.1), log_prob handles the transformed standard normal statement ṽ ∼ N (0, 1). Thus, make_log_joint applied to a model in the ncp context returns the log joint function of the transformed variables z̃ rather than the original variables z.

For example, make_log_joint(NealsFunnel(z, x)) gives:

log p(z, x) = log N (z | 0, 3) + log N (x | 0, exp(z/2))

while make_log_joint(with ncp handle NealsFunnel(z, x)) corresponds to the function:

log p(z̃, x̃) = log N (z̃ | 0, 1) + log N (x̃ | 0, 1),   where z = 3z̃ and x = exp(z/2)x̃.

This approach can easily be extended to other parameterisations, including partially centred parameterisations (as shown later in §§ 5.2), non-centring and whitening multivariate Gaussians, and transforming constrained variables to have unbounded support.

Edward2 Implementation. We implement reparameterisation handlers in Edward2, a deep PPL embedded in Python and TensorFlow (Tran et al., 2018). A model in Edward2 is a Python function that generates random variables. In the core of Edward2 is a special case of effect handling called interception. To obtain the joint density of a model, the language provides the function make_log_joint_fn(model)⁴, which uses a log_prob interceptor (handler) as previously described.

We extend the usage of interception to treat sample statements in one parameterisation as sample statements in another parameterisation (similarly to the ncp handler above):

def noncentre(rv_fn, **d):
    # Assumes a location-scale family.
    rv_fn = ed.interceptable(rv_fn)⁵
    rv_std = rv_fn(loc=0, scale=1)
    return d["loc"] + d["scale"] * rv_std

³ We focus on Gaussian variables, but non-centring is broadly applicable, e.g. to the location-scale family and random variables that can be expressed as a bijective transformation z = fθ(z̃) of a "standardised" variable z̃.
⁴ Corresponds to make_log_joint(model) in our example.
⁵ Wrapping the constructor with ed.interceptable ensures that we can nest this interceptor in the context of other interceptors.
We use the interceptor by executing a model of interest within the interceptor's context (using Python's context managers). This overrides each random variable's constructor to construct a variable with location 0 and scale 1, and then scale and shift that variable appropriately:

with ed.interception(noncentre):
    neals_funnel()

We present and explain in more detail all interceptors used for this work in Appendix B.

5. Automatic Model Reparameterisation

We introduce two inference strategies that exploit automatic reparameterisation: interleaved Hamiltonian Monte Carlo (iHMC), and the Variationally Inferred Parameterisation (VIP).

5.1. Interleaved Hamiltonian Monte Carlo

Automatic reparameterisation opens up the possibility of algorithms that exploit multiple parameterisations of a single model. We consider interleaved Hamiltonian Monte Carlo (iHMC), which uses two HMC steps to produce each sample from the target distribution: the first step is made in CP, using the original model latent variables, while the second step is made in NCP, using the auxiliary standardised variables. Interleaving MCMC kernels across parameterisations has been explored in previous work on Gibbs sampling (Yu & Meng, 2011; Kastner & Frühwirth-Schnatter, 2014), which demonstrated that CP and NCP steps can be combined to achieve more robust and performant samplers. Our contribution is to make the interleaving automatic and model-agnostic: instead of requiring the user to write multiple versions of their model and a custom inference algorithm, we implement iHMC as a black-box inference algorithm for centred Edward2 models.

Algorithm 1 outlines iHMC. It takes a single centred model Mcp(z | x) that defines latent variables z and generates data x. It uses the function make_ncp to automatically obtain a non-centred version of the model, Mncp(z̃ | x), which defines auxiliary variables z̃ and a function f such that z = f(z̃).

5.2. Variationally Inferred Parameterisation

The best parameterisation for a given model may mix centred and non-centred representations for different variables. To efficiently search the space of reparameterisations, we propose the variationally inferred parameterisation (VIP) algorithm, which selects a parameterisation by gradient-based optimisation of a differentiable variational objective. VIP can be used as a pre-processing step for another inference algorithm; as it only changes the parameterisation of the model, MCMC methods applied to the learned parameterisation maintain their asymptotic guarantees.

Consider a model with latent variables z. We introduce parameterisation parameters λ = (λi) ∈ [0, 1] for each variable zi, and transform zi ∼ N (zi | µi, σi) by defining z̃i ∼ N (λi µi, σi^λi) and zi = µi + σi^(1−λi) (z̃i − λi µi). This defines a continuous relaxation that includes NCP as the special case λ = 0 and CP as λ = 1. More generally, it supports a combinatorially large class of per-variable and partial centrings.

Example. Recall the example model from Section 3, which defines the joint density p(θ, µ, y) = N (θ | 0, 1) × N (µ | θ, σµ) × N (y | µ, σ). Using the parameterisation above to reparameterise µ, we get:

p(θ, µ̂, y) = N (θ | 0, 1) × N (µ̂ | λθ, σµ^λ) × N (y | θ + σµ^(1−λ) (µ̂ − λθ), σ)

Similarly to before, we analytically derive an expression for the posterior under different values of λ. Figure 4 shows the condition number κ(λ) of the diagonally preconditioned posterior, for different values of q = N/σ² with a fixed prior scale σµ = 1. As expected, when the data is weak (q = 0.01), setting the parameterisation parameter λ close to 0 (NCP) results in a better conditioned posterior than setting it close to 1 (CP), and conversely for strong data (q = 100). More interestingly, in intermediate cases (q = 1) the optimal value of λ lies strictly between 0 and 1, yielding a modest but real improvement over the extreme points.

Optimisation. For a general model with latent variables z and data x, we aim to choose the parameterisation λ under which the posterior p(z̃ | x; λ) is "most like" an independent normal distribution. A natural objective to minimise is KL(q(z̃; θ) || p(z̃ | x; λ)), where q(z̃; θ) = N (z̃ | µ, diag(σ)) is an independent normal model with variational parameters θ = (µ, σ). Minimising this divergence corresponds to maximising a variational lower bound, the ELBO (Bishop, 2006):

L(θ, λ) = E_q(z̃; θ) [ log p(x, z̃; λ) − log q(z̃; θ) ]

Note that the auxiliary parameters λ are not statistically identifiable: the marginal likelihood log p(x; λ) = log p(x) is constant with respect to λ. However, the computational properties of the reparameterised models differ, and the variational bound will prefer models for which the posterior is close in KL to a diagonal normal. Our key hypothesis (which the results in Figure 6 seem to support) is that diagonal-normal approximability is a good proxy for MCMC sampling efficiency.

To search for a good model reparameterisation, we optimise L(θ, λ) using stochastic gradients to simultaneously fit the variational parameters θ and the parameterisation parameters λ.
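As a worked example of the transform defined above (the helper names here are illustrative and not taken from the paper's code), the following sketch draws a single Gaussian variable under an arbitrary centring parameter λ and confirms that its marginal distribution is unchanged:

import numpy as np

def vip_forward(z_tilde, mu, sigma, lam):
    # z = mu + sigma**(1 - lam) * (z_tilde - lam * mu), mapping back to the original variable
    return mu + sigma ** (1.0 - lam) * (z_tilde - lam * mu)

def vip_sample(mu, sigma, lam, rng):
    # z_tilde ~ N(lam * mu, sigma ** lam), the partially centred auxiliary variable
    z_tilde = rng.normal(loc=lam * mu, scale=sigma ** lam)
    return vip_forward(z_tilde, mu, sigma, lam)

rng = np.random.default_rng(0)
samples = [vip_sample(mu=2.0, sigma=3.0, lam=0.4, rng=rng) for _ in range(100_000)]
print(np.mean(samples), np.std(samples))  # close to mu = 2.0 and sigma = 3.0, for any lam in [0, 1]
# lam = 1 recovers CP (z_tilde = z); lam = 0 recovers NCP (z_tilde is standard normal).

In the full algorithm, λ is optimised jointly with the variational parameters by stochastic gradients of L(θ, λ) above.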
Figure 3. Neal's funnel: z ∼ N (0, 3); x ∼ N (0, exp(z/2)). (a) Different parameterisations λ of the funnel, with the mean-field normal variational fit q(z̃) overlayed in white. (b) Alternative view as implicit variational distributions q*λ(z) (overlayed in white) on the original space.
An alternative view of VIP is as an enriched variational family, capable of representing prior dependence. Letting z = fλ(z̃) represent the partial centring transformation, an independent normal family q(z̃) on the transformed model corresponds to an implicit posterior

q*λ(z) = q(z̃ = fλ⁻¹(z)) |det J_fλ⁻¹(z)|

on the original model variables. Under this interpretation, λ are variational parameters that serve to add freedom to the variational family, allowing it to interpolate from independent normal (at λi = 1, Figure 3b left) to a representation that captures the exact prior dependence structure of the model (at λi = 0, Figure 3b right).

6. Experiments

We evaluate the usefulness of our approach as a robust and fully automatic alternative to manual reparameterisation. We compare our methods to HMC run on fully centred or fully non-centred models, one of which often gives catastrophically bad results. Our results show not only that VIP improves robustness by avoiding catastrophic reparameterisations, but also that it sometimes finds a parameterisation that is better than both the fully centred and fully non-centred alternatives.

6.1. Models and Datasets

We evaluate our proposed approaches by using Hamiltonian Monte Carlo to sample from the posterior of hierarchical Bayesian models on several datasets:

Eight schools (Rubin, 1981) (see the code sketch preceding Figure 5): estimating the treatment effects θi of a course taught at each of i = 1 . . . 8 schools, given test scores yi and standard errors σi:

µ ∼ N (0, 5)    log τ ∼ N (0, 5)
θi ∼ N (µ, τ)   yi ∼ N (θi, σi)

Radon (Gelman & Hill, 2006): hierarchical linear regression, in which the radon level ri in a home i in county c is modelled as a function of the (unobserved) county-level effect mc, the county uranium reading uc, and xi, the number of floors in the home:

µ, a, b ∼ N (0, 1)    mc ∼ N (µ + a uc, 1)
log ri ∼ N (mc[i] + b xi, σ)

German credit (Dua & Graff, 2017): logistic regression with a hierarchical prior on the coefficient scales:

log τ0 ∼ N (0, 10)    log τi ∼ N (log τ0, 1)
βi ∼ N (0, τi)    y ∼ Bernoulli(σ(βX^T))

Election '88 (Gelman & Hill, 2006): logistic model of 1988 US presidential election outcomes by county, given demographic covariates xi and state-level effects αs:

βd ∼ N (0, 100)    µ ∼ N (0, 100)    log τ ∼ N (0, 10)
αs ∼ N (µ, τ)    yi ∼ Bernoulli(σ(αs[i] + β^T xi))

Electric Company (Gelman & Hill, 2006): paired causal analysis of the effect of viewing an educational TV show on each of 192 classrooms over G = 4 grades. The classrooms were divided into P = 96 pairs, and one class in each pair was treated (xi = 1) at random:

µg ∼ N (0, 1)    ap ∼ N (µg[p], 1)    bg ∼ N (0, 100)
log σg ∼ N (0, 1)    yi ∼ N (ap[i] + bg[i] xi, σg[i])

6.2. Algorithms and Experimental Details

For each model and dataset, we compare our methods, interleaved HMC (iHMC) and VIP-HMC, with baselines of running HMC on either fully centred (CP-HMC) or fully non-centred (NCP-HMC) models. We initialise each HMC chain with samples from an independent Gaussian variational posterior, and use the posterior scales as a diagonal preconditioner; for VIP-HMC this variational optimisation also includes the parameterisation parameters λ. All variational optimisations were run for the same number of steps, so they were a fixed cost across all methods except iHMC (which depends on preconditioners for both the centred and non-centred transition kernels). The HMC step size and number of leapfrog steps were tuned following the procedures described in Appendix C, which also contains additional details of the experimental setup.

We report the average effective sample size per 1000 gradient evaluations (ESS/∇), with standard errors computed from 200 chains. We use gradient evaluations, rather than wallclock time, as they are the dominant operation in both HMC and VI and are easier to measure reliably; in practice, the wallclock times we observed per gradient evaluation did not differ significantly between methods. This is not surprising, since the (minimal) overhead of interception is incurred only once at graph-building time. This metric is a direct evaluation of the sampler; we do not count the gradient steps taken during the initial variational optimisation.

In addition to effective sample size, we also directly examined the convergence of posterior moments for each method. This yielded similar qualitative conclusions to the results we report here; more analysis can be found in Appendix D.
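Before turning to the results, here is a sketch of the first model from §6.1, eight schools, written against the toy sample interface of the earlier sketches. The data arrays are placeholders rather than the real scores and standard errors, and the helper names are illustrative:

# Sketch: the centred eight schools model of §6.1, using the toy sample
# interface defined in the §4.1 sketch (placeholder data, not Rubin's dataset).
import numpy as np
from scipy import stats

y_obs = np.zeros(8)        # placeholder test scores
sigma_obs = np.ones(8)     # placeholder standard errors

def eight_schools():
    mu = sample("mu", stats.norm, loc=0.0, scale=5.0)
    log_tau = sample("log_tau", stats.norm, loc=0.0, scale=5.0)
    tau = np.exp(log_tau)
    theta = [sample(f"theta_{i}", stats.norm, loc=mu, scale=tau) for i in range(8)]
    y = [sample(f"y_{i}", stats.norm, loc=theta[i], scale=sigma_obs[i]) for i in range(8)]
    return y

# Under a non-centring handler like the ncp sketch above, each theta_i statement
# becomes theta_i = mu + tau * theta_i_std with theta_i_std ~ N(0, 1) — the
# non-centred parameterisation compared against in Figures 5 and 6.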
Figure 5. Effective sample size and 95% confidence intervals for the radon model across US states.
Figure 6. Effective sample size (w/ 95% intervals) and the optimised ELBO across several models.
6.3. Results

Figures 5 and 6 show the results of the experiments. In most cases, either the centred or non-centred parameterisation works well, while the other does not. An exception is the German credit dataset, where both CP-HMC and NCP-HMC give a small ESS: 1.2 ± 0.2 and 1.3 ± 0.2 ESS/∇ respectively.

iHMC. Across the datasets in both figures, we see that iHMC is a robust alternative to CP-HMC and NCP-HMC. Its performance is always within a factor of two of the best of CP-HMC and NCP-HMC, and sometimes better. In addition to being robust, iHMC can sometimes navigate the posterior more efficiently than either CP-HMC or NCP-HMC can: in the case of German credit, it performs better than both (3.0 ± 0.2 ESS/∇).

VIP. Performance of VIP-HMC is typically as good as the better of CP-HMC and NCP-HMC, and sometimes better. On the German credit dataset, it achieves 5.6 ± 0.6 ESS/∇, more than three times the rate of CP-HMC and NCP-HMC, and significantly better than iHMC. Figure 6 shows the correspondence between the optimised mean-field ELBO and the effective sampling rate. We see that parameterisations with higher ELBOs tend to yield better samplers, which supports the ELBO as a reasonable predictor of the conditioning of a model.

We show some of the parameterisations that VIP finds in Figure 7. VIP's behaviour appears reasonable: for most datasets we looked at, VIP finds the "correct" global parameterisation, with most parameterisation parameters set to either 0 or 1 (Figure 7, left). In the cases where a global parameterisation is not optimal (e.g. radon MO, radon PA and, most notably, German credit), VIP finds a mixed parameterisation, combining centred, non-centred, and partially centred variables (Figure 7, centre and right). These examples demonstrate the significance of the effect that automatic reparameterisation can have on the quality of inference: manually finding an adequate parameterisation in the German credit case would, at best, require an unreasonable amount of hand tuning, while VIP finds such a parameterisation automatically.

It is interesting to examine the shape of the posterior landscape under different parameterisations. Figure 8 shows typical marginals of the German credit model. In the centred case, the geometry is funnel-like both in the prior (in grey) and the posterior (in red). In the non-centred case, the prior is an independent Gaussian, but the posteriors still possess significant curvature. The partially centred parameterisations chosen by VIP appear to yield more favourable posterior geometry, where the change in curvature is smaller than in the CP and NCP cases.

A practical lesson from our experiments is that while the ELBO appears to correlate with sampler quality, they are not necessarily equally sensitive. A variational model that gives zero mass to half of the posterior is only log 2 away from perfect in the ELBO, but the corresponding sampler may be quite bad. We found it helpful to estimate the ELBO with a relatively large number of Monte Carlo samples (tens to hundreds; we used 256). As with most variational methods, the VIP optimisation is nonconvex in general, and local optima are also a concern. We occasionally encountered local optima during development, though we found VIP to be generally well-behaved on models for which simpler optimisations are well-behaved. In a practical implementation, one might detect optimisation failure by comparing the VIP ELBO to those obtained from fixed parameterisations; for modest-sized models, a deep PPL can often run multiple such optimisations in parallel at minimal cost.

7. Discussion

Our results demonstrate that automated reparameterisation of probabilistic models is practical, and enables inference algorithms that can in some cases find parameterisations
References

Andrieu, C. and Thoms, J. A tutorial on adaptive MCMC. Statistics and Computing, 18(4):343–373, 2008.

Betancourt, M. and Girolami, M. Hamiltonian Monte Carlo for hierarchical models. Current Trends in Bayesian Methodology with Applications, 79:30, 2015.

Gorinova, M. I., Gordon, A. D., and Sutton, C. Probabilistic programming with densities in SlicStan: Efficient, flexible, and deterministic. Proceedings of the ACM on Programming Languages, 3(POPL):35, 2019.

Hoffman, M., Sountsov, P., Dillon, J. V., Langmore, I., Tran, D., and Vasudevan, S. NeuTra-lizing bad geometry in Hamiltonian Monte Carlo using neural transport. arXiv preprint arXiv:1903.03704, 2019.

Hoffman, M. D., Johnson, M., and Tran, D. Autoconj: Recognizing and exploiting conjugacy without a domain-specific language. In Neural Information Processing Systems, 2018.

Kastner, G. and Frühwirth-Schnatter, S. Ancillarity-sufficiency interweaving strategy (ASIS) for boosting MCMC estimation of stochastic volatility models. Computational Statistics & Data Analysis, 76:408–423, 2014.

Kingma, D. P. and Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

Kingma, D. P. and Welling, M. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114, 2013.

Moore, D. and Gorinova, M. I. Effect handling for composable program transformations in Edward2. International Conference on Probabilistic Programming, 2018. URL https://arxiv.org/abs/1811.06150.

Narayanan, P., Carette, J., Romano, W., Shan, C., and Zinkov, R. Probabilistic inference by program transformation in Hakaru (system description). In Functional and Logic Programming – 13th International Symposium, FLOPS 2016, Kochi, Japan, March 4-6, 2016, Proceedings, pp. 62–79. Springer, 2016. doi: 10.1007/978-3-319-29604-3_5. URL http://dx.doi.org/10.1007/978-3-319-29604-3_5.

Neal, R. M. Slice sampling. The Annals of Statistics, 31(3):705–741, 2003. ISSN 0090-5364. URL http://www.jstor.org/stable/3448413.

Papaspiliopoulos, O., Roberts, G. O., and Sköld, M. A general framework for the parametrization of hierarchical models. Statistical Science, pp. 59–73, 2007.

Parno, M. D. and Marzouk, Y. M. Transport map accelerated Markov chain Monte Carlo. SIAM/ASA Journal on Uncertainty Quantification, 6(2):645–682, 2018. doi: 10.1137/17M1134640. URL https://doi.org/10.1137/17M1134640.

Plotkin, G. and Power, J. Adequacy for algebraic effects. In Honsell, F. and Miculan, M. (eds.), Foundations of Software Science and Computation Structures, pp. 1–24, Berlin, Heidelberg, 2001. Springer Berlin Heidelberg. ISBN 978-3-540-45315-4.

Plotkin, G. and Pretnar, M. Handlers of algebraic effects. In Castagna, G. (ed.), Programming Languages and Systems, pp. 80–94, Berlin, Heidelberg, 2009. Springer Berlin Heidelberg. ISBN 978-3-642-00590-9.

Pretnar, M. An introduction to algebraic effects and handlers. Invited tutorial paper. Electronic Notes in Theoretical Computer Science, 319:19–35, 2015. ISSN 1571-0661. The 31st Conference on the Mathematical Foundations of Programming Semantics (MFPS XXXI).

Rubin, D. B. Estimation in parallel randomized experiments. Journal of Educational Statistics, 6(4):377–401, 1981. ISSN 0362-9791. URL http://www.jstor.org/stable/1164617.

Stan Development Team et al. Stan modelling language users guide and reference manual. Technical report, 2016. https://mc-stan.org/docs/2_19/stan-users-guide/.

Tran, D., Hoffman, M. D., Vasudevan, S., Suter, C., Moore, D., Radul, A., Johnson, M., and Saurous, R. A. Simple, distributed, and accelerated probabilistic programming. In Advances in Neural Information Processing Systems, 2018. URL https://arxiv.org/abs/1811.02091.

Uber AI Labs. Pyro: A deep probabilistic programming language, 2017. http://pyro.ai/.

Yao, Y., Vehtari, A., Simpson, D., and Gelman, A. Yes, but did it work?: Evaluating variational inference. arXiv preprint arXiv:1802.02538, 2018.

Yu, Y. and Meng, X.-L. To center or not to center: That is not the question—an ancillarity–sufficiency interweaving strategy (ASIS) for boosting MCMC efficiency. Journal of Computational and Graphical Statistics, 20(3):531–570, 2011.

Zinkov, R. and Shan, C. Composing inference algorithms as program transformations. In Elidan, G., Kersting, K., and Ihler, A. T. (eds.), Proceedings of the Thirty-Third Conference on Uncertainty in Artificial Intelligence, UAI 2017, Sydney, Australia, August 11-15, 2017. AUAI Press, 2017.