
1st Symposium on Advances in Approximate Bayesian Inference, 2018 1–8

Automatic Reparameterisation in Probabilistic Programming

Maria I. Gorinova∗ [email protected]


University of Edinburgh
Dave Moore [email protected]
Matthew D. Hoffman [email protected]
Google

Abstract
The performance of approximate posterior inference algorithms can depend strongly on how
the model is parameterised. In particular, non-centring a model can break dependencies
between different levels in hierarchical models and drastically reduce the difficulty of the
inference task. However, it is not obvious how to reparameterise a model in the best possible
way, as the shape of the posterior depends on the properties of the observed data.
We propose two inference strategies that utilise the power of probabilistic programming
to free modellers from the need to choose a parameterisation. The first strategy alternates
between sampling using the centred or the non-centred parameterisation, while the second
strategy learns a partially non-centred parameterisation by optimising a variational objective.

1. Introduction
Reparameterising a probabilistic model means expressing it in terms of a different set of
random variables, representing a bijective transformation of the original variables of interest.
The reparameterised model can have drastically different posterior geometry from the original,
with significant implications for both variational and sampling-based inference algorithms.
In this paper, we focus on non-centring, a particularly common form of reparameterisation
in hierarchical Bayesian modelling. Consider a parameter z ∼ N (µ, σ); we say this is in
centred parameterisation (CP). If we instead work with an auxiliary, standard normal
parameter ε ∼ N(0, 1), and obtain z by applying the transformation z = µ + σε, we
say the parameter is in its non-centred parameterisation (NCP).1 Although the centred
parameterisation is often more intuitive and interpretable, non-centring can sometimes
drastically improve the performance of inference (Betancourt and Girolami, 2015). Figure 1
illustrates a simple example of such a case.
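
To make the distinction concrete, the same latent variable can be written in either parameterisation using the Edward2 API shown later in Appendix A (a minimal sketch; the location and scale values here are arbitrary placeholders):

from tensorflow_probability import edward2 as ed

loc, scale = 1., 2.  # arbitrary illustrative values for µ and σ

def z_centred():
    # CP: z is drawn directly from N(µ, σ).
    return ed.Normal(loc=loc, scale=scale, name="z")

def z_noncentred():
    # NCP: draw a standard normal auxiliary variable and shift/scale it,
    # so that z = µ + σ * ε has the same marginal distribution.
    epsilon = ed.Normal(loc=0., scale=1., name="z_std")
    return loc + scale * epsilon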
Bayesian practitioners are often advised to manually non-centre their models; however,
this breaks the separation between modelling and inference and requires expressing the
model in a potentially less intuitive form. Moreover, non-centring is not universally better
than centring: the best parameterisation depends on many factors including the statistical
properties of the observed data. The user must possess the sophistication to understand
what reparameterisation is needed, and where in the model it should be applied.
We explore strategies to tackle this problem automatically via transformations of proba-
bilistic programs. Using the Edward2 probabilistic programming language, we implement

∗ Work done while interning at Google.
1. More generally, non-centring is applicable to location-scale families and any random variable that can
be expressed as a bijective transformation z = fθ(ε) of a “standardized” variable ε, analogous to the
“reparameterisation trick” in stochastic optimisation (Kingma and Welling, 2013).


© M.I. Gorinova, D. Moore & M.D. Hoffman.

two approaches: Interleaved Hamiltonian Monte Carlo (i-hmc), which alternates HMC
steps between centred and non-centred parameterisations, and a novel algorithm we call
Variationally Inferred Parameterisation (vip), which uses a continuous relaxation to search
over possible ways of reparameterising the model.2 Experiments across a range of models
demonstrate that these strategies enable robust inference, performing at least as well as the
best fixed parameterisation, and sometimes better.

2. Automatic Model Reparameterisation


An advantage of probabilistic programming is that the program itself provides a structured
model representation, and we can explore model reparameterisation through the lens of
program transformations. Modern “deep” PPLs such as Pyro (Uber AI Labs, 2017) and
Edward2 (Tran et al., 2018) provide built-in mechanisms for transforming a sampling process
into an inference program, e.g., by overriding the behaviour of sample statements to instead
compute the log-density of a given value under the sampling distribution. We will focus on
Edward2, where this mechanism is called “interception” and can be understood as a special
case of effect handling (Plotkin and Pretnar, 2009; Moore and Gorinova, 2018). In this work,
we extend the usage of interception to treat sample statements in one parameterisation
as sample statements in another parameterisation. For example, an NCP interceptor
treats centred sample statements as non-centred. We give a more detailed explanation of
interception in Appendix A, and present the interceptors used for this work in Appendix B.
We propose two strategies that tackle this reparameterisation problem automatically: Interleaved
Hamiltonian Monte Carlo (i-hmc) and the Variationally Inferred Parameterisation (vip).

2.1. Interleaved Hamiltonian Monte Carlo


The Interleaved Hamiltonian Monte Carlo (i-hmc) algorithm uses two hmc steps to produce
each sample from the target distribution. The first step is made in CP, using the original
model parameters, while the second step is made in NCP, using the auxiliary standard
parameters. The idea of interleaving MCMC kernels across parameterisations has been
explored in previous work on interleaved Gibbs sampling (Yu and Meng, 2011; Kastner and
Frühwirth-Schnatter, 2014), which demonstrated that we do not have to choose between
CP and NCP, but can combine them to achieve more robust and performant samplers. Our
contribution is that we make this interleaving automatic and model-agnostic: instead of
requiring the user to write multiple versions of their model and a custom inference algorithm,
we implement i-hmc in TensorFlow Probability and use interceptors so that its usage requires
only the centred model definition. This makes i-hmc a black-box algorithm that can be
offered as part of a probabilistic programming system.
Algorithm 1 outlines i-hmc in pseudo-code. The algorithm takes a single centred model
Mcp (z | x) that defines parameters z and generates data x. It then uses the function
make_ncp to automatically obtain a non-centred version of the model, Mncp (z̃ | x), which
defines the auxiliary standard variables z̃, and the function f , such that z = f (z̃).
2. Code for these algorithms and experiments is available at https://github.com/google-research/
google-research/tree/master/edward2_autoreparam.
3. Here we use HMC as the inference kernel. However, there is nothing specific to HMC in this strategy, so
in practice we can use any other inference algorithm instead.


Algorithm 1: Interleaved Hamiltonian Monte Carlo
Arguments: data x; a centred model Mcp(z | x)
Returns: S samples z(1), ..., z(S) from p(z | x)
 1: Mncp(z̃ | x), f = make_ncp(Mcp(z | x))
 2: log pcp = make_log_joint(Mcp(z | x))
 3: log pncp = make_log_joint(Mncp(z̃ | x))
 4:
 5: z(0) = init()
 6: for s ∈ [1, ..., S] do
 7:     z′ = hmc_step(log pcp, z(s−1))
 8:     z′′ = hmc_step(log pncp, f^{−1}(z′))
 9:     z(s) = f(z′′)
10:
11: return z(1), ..., z(S)

Algorithm 2: Variationally Inferred Parameterisation
Arguments: data x; a centred model Mcp(z | x)
Returns: S samples z(1), ..., z(S) from p(z | x)
 1: Mvip(z̃ | x; φ), f = make_vip(Mcp(z | x))
 2: log p(x, z̃) = make_log_joint(Mvip(z̃ | x; φ))
 3:
 4: q(z̃; θ) = make_variational(Mvip(z̃ | x; φ))
 5: log q(z̃; θ) = make_log_joint(q(z̃; θ))
 6:
 7: L(θ, φ) = E_q[log p(x, z̃; φ)] − E_q[log q(z̃; θ)]
 8: θ*, φ* = argmax_{θ,φ} L(θ, φ)
 9: log p(x, z̃) = make_log_joint(Mvip(z̃ | x; φ*))
10: z(1), ..., z(S) = hmc(log p)³
11: return f(z(1)), ..., f(z(S))
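
A rough Python rendering of Algorithm 1 is sketched below; make_ncp, make_log_joint, hmc_step, and init are treated as opaque helpers with the signatures used in the pseudocode (an inverse map f_inv is assumed to be returned alongside f), so this is an illustration rather than the actual implementation:

def interleaved_hmc(model_cp, num_samples):
    # Derive the non-centred model and the map f from auxiliary to original variables.
    model_ncp, f, f_inv = make_ncp(model_cp)
    log_p_cp = make_log_joint(model_cp)
    log_p_ncp = make_log_joint(model_ncp)

    samples = []
    z = init()
    for _ in range(num_samples):
        z_prime = hmc_step(log_p_cp, z)                # one HMC step in CP
        z_tilde = hmc_step(log_p_ncp, f_inv(z_prime))  # one HMC step in NCP
        z = f(z_tilde)                                 # map back to the original space
        samples.append(z)
    return samples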

2.2. Variationally inferred parameterisation


The best parameterisation for a given model may mix centred and non-centred representations
for different variables. To efficiently search the space of reparameterisations, we construct
a continuous relaxation of non-centring that includes both CP and NCP, and propose
an algorithm vip, which selects a parameterisation by gradient-based optimisation of a
variational objective. vip can be used as a pre-processing step to another inference algorithm.
As it only changes the parameterisation of the model, using vip in combination with an
MCMC method does not affect the asymptotic guarantees of that method.
Consider a model with parameters z. We introduce a set of parameterisation parameters
φ = (a, b), and transform each zi ∼ N(zi | µi, σi) by defining z̃i ∼ N(ai µi, σi^{bi}) and
zi = µi + σi^{1−bi} (z̃i − ai µi), where ai and bi are between 0 and 1. This parameterisation
includes NCP as the special case a = b = 0, and CP as the case a = b = 1.
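
To make the endpoints of this relaxation explicit, the deterministic part of the transformation can be sketched as follows (the function name and argument names are ours, for illustration only):

def vip_transform(mu, sigma, z_tilde, a, b):
    # z_tilde is assumed drawn from N(a * mu, sigma ** b); this map recovers z
    # with the original N(mu, sigma) marginal.
    return mu + sigma ** (1. - b) * (z_tilde - a * mu)

# a = b = 0: z = mu + sigma * z_tilde  (non-centred parameterisation)
# a = b = 1: z = z_tilde               (centred parameterisation)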
We learn a parameterisation by optimising an objective that favours parameterisations
under which the posterior’s shape is close to an independent normal distribution. A natural
objective to choose is KL(q(z̃; θ) || p(z̃ | x; φ)), where q(z̃; θ) = N (z̃ | µ, diag(σ)) is a mean-
field variational model with variational parameters θ = (µ, σ). Minimising this divergence
corresponds to maximising a variational lower bound (the ELBO):

L(θ, φ) = E_{q(z̃;θ)}[log p(x, z̃; φ)] − E_{q(z̃;θ)}[log q(z̃; θ)].
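
In practice the expectations are estimated by Monte Carlo; a simple sketch of such an estimator is given below, where log_p, log_q, and sample_q are hypothetical stand-ins for the quantities constructed in Algorithm 2:

def elbo_estimate(log_p, log_q, sample_q, num_samples=64):
    # Monte Carlo estimate of L(θ, φ) = E_q[log p(x, z̃; φ)] − E_q[log q(z̃; θ)],
    # using num_samples draws z̃ ~ q(·; θ).
    draws = [sample_q() for _ in range(num_samples)]
    return sum(log_p(z) - log_q(z) for z in draws) / num_samples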

Neal’s funnel (Figure 1) provides an illustrative example: the variational distribution
is a poor fit to the pathological geometry of CP, but non-centring leads to a perfect fit.
We simultaneously fit the variational distribution q(z̃; θ) to the posterior p(z̃ | x; φ) by
changing the variational parameters θ, and change the shape of that posterior through
the parameterisation parameters φ. Algorithm 2 summarises the method. Both the
model reparameterisation and the construction of the variational distribution q are done
automatically, utilising Edward2’s interceptors (see Appendix B).

3. Experiments
We report experimental results for hierarchical Bayesian regression on the Eight Schools
(Rubin, 1981), Radon (Gelman and Hill, 2006) and German credit datasets. For each, we
specify a model and evaluate i-hmc and vip-hmc by comparing to HMC run on the fully


centred model (hmc-cp) and the fully non-centred model (hmc-ncp). On each run of the
experiment, we obtain 50000 samples after burn-in. We tune the step sizes and number of
leapfrog steps for HMC automatically, as described in more detail in Appendix C.

Figure 1: Neal’s funnel: z ∼ N(0, 3); x ∼ N(0, e^{−z/2}) (Neal, 2003), with mean-field
normal variational fit overlaid. (a) Centred; (b) Non-centred. Although half of the mass is
inside the “funnel” (z < 0), centred samplers have difficulty reaching it.

Table 1: Effective sample size to number of leapfrog steps (larger is better), with standard
errors from five trials. Eight schools and German credit models.

             8 Schools       German Credit
hmc-cp       92 ± 4          64 ± 21
hmc-ncp      3475 ± 849      35 ± 15
i-hmc        3879 ± 281      101 ± 32
vip-hmc      4986 ± 660      115 ± 12
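
The model code for these experiments is not reproduced in the text; as a rough illustration only, a centred Eight Schools model might be written in Edward2 as follows (the treatment standard errors are the standard values from Rubin (1981), while the priors are placeholder choices rather than the exact ones used in these experiments):

import tensorflow as tf
from tensorflow_probability import edward2 as ed

treatment_stddevs = tf.constant([15., 10., 16., 11., 9., 11., 10., 18.])

def eight_schools_cp():
    mu = ed.Normal(loc=0., scale=10., name="mu")           # placeholder prior
    log_tau = ed.Normal(loc=5., scale=1., name="log_tau")  # placeholder prior
    # Centred parameterisation: per-school effects depend directly on mu and tau.
    theta = ed.Normal(loc=mu * tf.ones(8), scale=tf.exp(log_tau) * tf.ones(8),
                      name="theta")
    y = ed.Normal(loc=theta, scale=treatment_stddevs, name="y")
    return y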

MN IN PA MO ND MA AZ
hmc-cp 798 ± 276 1034 ± 54 425 ± 208 427 ± 45 2840 ± 347 5796 ± 393 3644 ± 129
hmc-ncp 340 ± 35 75 ± 14 43 ± 9 16 ± 7 187 ± 36 179 ± 66 100 ± 26
i-hmc 1495 ± 129 590 ± 287 410 ± 183 233 ± 29 2421 ± 89 6696 ± 97 2472 ± 267
vip-hmc 1144 ± 279 865 ± 98 816 ± 184 416 ± 51 3273 ± 145 5551 ± 336 3875 ± 73

Table 2: Effective sample size to number of leapfrog steps (larger is better), with standard errors
from five trials. Radon data for different US states.

Across the datasets in Table 1 and Table 2, we see that i-hmc is a robust alternative to
using a fully centred or non-centred model. By taking alternating steps in CP and NCP,
i-hmc ensures that it makes reasonable progress, regardless of which of hmc-cp or hmc-ncp
is better. Moreover, we see that vip-hmc finds a reasonable reparameterisation in each
case, typically as good as the better of hmc-cp and hmc-ncp. On initial inspection, the
learned parameterisations are often very close to fully centred or non-centred (implying that
vip-hmc successfully learns the “correct” global parameterisation for each problem), but a
small number of groups are sometimes flipped to the alternative parameterisation. These
preliminary results suggest that these learned, mixed parameterisations may sometimes be
superior to the best global parameterisation; we are excited to explore this further.

4. Discussion
We presented two inference strategies that use program transformations on probabilistic
programs to automatically make use of different model reparameterisations, and we showed
that both strategies are robust. We hope that the idea of automatically reparameterising probabilistic
models with the aid of program transformations can lead to new ways of easing the inference
task, potentially allowing us to work with models that were previously infeasible.


References
Christophe Andrieu and Johannes Thoms. A tutorial on adaptive MCMC. Statistics and
computing, 18(4):343–373, 2008.

Michael Betancourt and Mark Girolami. Hamiltonian Monte Carlo for hierarchical models.
Current trends in Bayesian methodology with applications, 79:30, 2015.

Andrew Gelman and Jennifer Hill. Data analysis using regression and multilevel/hierarchical
models. Cambridge University Press, 2006.

Gregor Kastner and Sylvia Frühwirth-Schnatter. Ancillarity-sufficiency interweaving strategy


(ASIS) for boosting MCMC estimation of stochastic volatility models. Computational
Statistics & Data Analysis, 76:408–423, 2014.

Diederik P. Kingma and Max Welling. Auto-encoding variational Bayes. arXiv preprint
arXiv:1312.6114, 2013.

Dave Moore and Maria I. Gorinova. Effect handling for composable program transformations
in Edward2. International Conference on Probabilistic Programming, 2018. URL
https://arxiv.org/abs/1811.06150.

Radford M. Neal. Slice sampling. The Annals of Statistics, 31(3):705–741, 2003. ISSN
00905364. URL http://www.jstor.org/stable/3448413.

Gordon Plotkin and Matija Pretnar. Handlers of algebraic effects. In Giuseppe Castagna,
editor, Programming Languages and Systems, pages 80–94, Berlin, Heidelberg, 2009.
Springer Berlin Heidelberg. ISBN 978-3-642-00590-9.

Matija Pretnar. An introduction to algebraic effects and handlers. Invited tutorial paper.
Electronic Notes in Theoretical Computer Science, 319:19 – 35, 2015. ISSN 1571-0661.
doi: https://doi.org/10.1016/j.entcs.2015.12.003. URL http://www.sciencedirect.com/
science/article/pii/S1571066115000705. The 31st Conference on the Mathematical
Foundations of Programming Semantics (MFPS XXXI).

Donald B. Rubin. Estimation in parallel randomized experiments. Journal of Educational


Statistics, 6(4):377–401, 1981. ISSN 03629791. URL http://www.jstor.org/stable/
1164617.

Dustin Tran, Matthew D. Hoffman, Srinivas Vasudevan, Christopher Suter, Dave Moore,
Alexey Radul, Matthew Johnson, and Rif A. Saurous. Simple, Distributed, and Accelerated
Probabilistic Programming. 2018. URL https://arxiv.org/abs/1811.02091. To appear
in Advances in Neural Information Processing Systems 2019.

Uber AI Labs. Pyro: A deep probabilistic programming language, 2017. http://pyro.ai/.

Yaming Yu and Xiao-Li Meng. To center or not to center: That is not the question—an
ancillarity–sufficiency interweaving strategy (ASIS) for boosting MCMC efficiency. Journal
of Computational and Graphical Statistics, 20(3):531–570, 2011.


Appendix A. Probabilistic Programming in Edward2


Edward2 (Tran et al., 2018) is a deep probabilistic programming language built on top of Ten-
sorFlow, which, similarly to Pyro (Uber AI Labs, 2017), uses algebraic effect handlers (Plotkin
and Pretnar, 2009; Pretnar, 2015) to transform a generative model into a target function on
the parameters of the model that can be used for inference. This section outlines the basic
concepts of probabilistic programming in Edward2.
A model in Edward2 is a Python function that generates random variables. For example,
the following function corresponds to Neal’s funnel from Figure 1:
import tensorflow as tf
from tensorflow_probability import edward2 as ed

def neals_funnel():
    z = ed.Normal(loc=0., scale=3., name="z")
    x = ed.Normal(loc=0., scale=tf.exp(-z / 2.), name="x")
    return x
When run forward, this generates samples for x. However, in most cases when the task
is to sample from some posterior distribution given data, we are interested in a function
that evaluates the density, rather than the generative model. Edward2 uses a mechanism
called interception to do that.
Interception can be seen as a special case of effect handling, and the idea behind it is that
it treats the construction of random variables as an effectful operation. Effectful operations
are operations that can potentially have some side effect (e.g. writing to a system file).
Effectful operations can be intercepted (or, in PL jargon, handled), so that their
effect can be controlled.
Continuing with Neal’s funnel, we can define a function that evaluates the log density
log p(x, z) at some given x and z:
def log_joint_fn(**kwargs):
    log_prob = 0.

    def log_prob_interceptor(rv_constructor, **rv_kwargs):
        # Overrides a random variable's `value` and accumulates its log-prob.
        nonlocal log_prob  # accumulate into the enclosing function's total
        rv_name = rv_kwargs.get("name")
        rv_kwargs["value"] = kwargs.get(rv_name)

        rv = rv_constructor(**rv_kwargs)
        log_prob = log_prob + rv.distribution.log_prob(rv.value)
        return rv

    with ed.interception(log_prob_interceptor):
        neals_funnel()
    return log_prob
By executing the neals_funnel function in the context of log_prob_interceptor, we
override each sample statement (a call to a random variable constructor rv_constructor),
to generate a variable that takes on the value provided in the arguments of log_joint_fn.
As a side effect, we also accumulate the result of evaluating each variable’s prior density at
the provided value, which, by the chain rule, gives us the log joint density. The function
log_joint_fn is then equivalent to the function log p, where log p(z, x) = log N(z | 0, 3) +
log N(x | 0, e^{−z/2}).
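
For example, one could then evaluate the joint density at a particular point (a hypothetical call; under TensorFlow graph mode this returns a tensor rather than a concrete float):

lp = log_joint_fn(z=0., x=1.)  # log N(0 | 0, 3) + log N(1 | 0, 1), since e^{-0/2} = 1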


Appendix B. Interceptors
Interceptors can be used as powerful abstractions in probabilistic programming systems,
as discussed previously by Moore and Gorinova (2018), and shown by both Pyro and
Edward2. In particular, we can use interceptors to automatically reparameterise a model, as
well as to specify variational families. In this section, we show Edward2 pseudo-code for the
interceptors used to implement i-hmc and vip-hmc.

B.1. Non-centred Parameterisation Interceptor


By intercepting every construction of a normal variable,4 we can create a standard normal
variable instead, and scale and shift appropriately.
def ncp_interceptor(rv_constructor, **rv_kwargs):
    # Assumes rv_constructor is in the location-scale family.
    name = rv_kwargs["name"] + "_std"
    rv_std = ed.interceptable(rv_constructor)(loc=0., scale=1., name=name)  # see footnote 5
    return rv_kwargs["loc"] + rv_kwargs["scale"] * rv_std
Running a model that declares the random variables θ in the context of ncp_interceptor
will declare a new set of standard normal random variables θ^(std). Nesting this in the
context of the log_prob_interceptor from Appendix A will then evaluate the log joint
density log p(θ^(std)).
For example, going back to Neal’s funnel, running
with ed.interception(log_prob_interceptor):
    neals_funnel()
corresponds to evaluating log p(z, x) = log N(z | 0, 3) + log N(x | 0, e^{−z/2}), while running
with ed.interception(log_prob_interceptor):
    with ed.interception(ncp_interceptor):
        neals_funnel()
corresponds to evaluating log p(z^(std), x^(std)) = log N(z^(std) | 0, 1) + log N(x^(std) | 0, 1).

B.2. VIP Interceptor


The VIP interceptor is similar to the NCP interceptor. The notable difference is that
it creates new learnable TensorFlow variables, which correspond to the parameterisation
parameters a and b:
def vip_interceptor(rv_constructor, **rv_kwargs):
    name = rv_kwargs["name"] + "_vip"
    rv_loc = rv_kwargs["loc"]
    rv_scale = rv_kwargs["scale"]

    # a, b in (0, 1) interpolate between NCP (0) and CP (1) for this variable.
    a = tf.nn.sigmoid(tf.get_variable(
        name + "_a_unconstrained", initializer=tf.zeros_like(rv_loc)))
    b = tf.nn.sigmoid(tf.get_variable(
        name + "_b_unconstrained", initializer=tf.zeros_like(rv_scale)))

    rv_vip = ed.interceptable(rv_constructor)(
        loc=a * rv_loc, scale=rv_scale ** b, name=name)
    return rv_loc + rv_scale ** (1 - b) * (rv_vip - a * rv_loc)

4. Or, more generally, of location-scale family variables.
5. Wrapping the constructor with ed.interceptable ensures that we can nest this interceptor in the
   context of other interceptors.

B.3. Mean-field Variational Model Interceptor


Finally, we show a mean-field variational family interceptor, which we use both to tune
the step sizes for hmc (see Appendix C), and to make use of vip automatically. The
mfvi_interceptor simply substitutes each sample statement with sampling from a normal
distribution with parameters specified by some fresh variational parameters µ and σ:
def mfvi_interceptor(rv_constructor, **rv_kwargs):
    name = rv_kwargs["name"] + "_q"
    # Fresh variational parameters; softplus keeps the scale positive.
    mu = tf.get_variable(name + "_mu", initializer=tf.zeros_like(rv_kwargs["loc"]))
    sigma = tf.nn.softplus(tf.get_variable(
        name + "_sigma", initializer=tf.zeros_like(rv_kwargs["scale"])))

    rv_q = ed.interceptable(ed.Normal)(loc=mu, scale=sigma, name=name)
    return rv_q

Appendix C. Experimental Setup


Algorithms. We evaluate i-hmc and vip-hmc by comparing to hmc run on a fully centred
model (hmc-cp) and fully non-centred model (hmc-ncp). On each run of the experiment,
we obtain 50000 samples, in addition to the first 8000 samples that we discard as burn-in.
We also thin each HMC chain by discarding every other sample, in order to match the work
done per sample to that of i-hmc (where the intermediate ncp sample is discarded).
Tuning. We run each algorithm using HMC with 1, 2, 4, 8, 16, and 32 leapfrog steps, and
report the results for each model and each dataset that maximize (effective sample size /
leapfrog steps), where the joint effective sample size is taken to be the minimum across all
model variables. Within each HMC chain, the per-variable step sizes are initialized using
the posterior standard deviation computed by mean-field variational inference (MF-VI). The
step-size is then adapted for the first 6000 steps of the burn-in using a procedure following
Section 4.2 of Andrieu and Thoms (2008), targeting an acceptance rate of 0.75. In the case
of i-hmc, we use fixed step sizes for each sub-step: the resulting step sizes after adaptation
of hmc-cp and hmc-ncp for the cp and ncp sub-steps respectively. As this approximation
is noisy, yet different values can drastically change the quality of the inferred samples, we
approximate and adapt the step sizes at every experimental run, for every algorithm.
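
For reference, the reported metric can be computed along the following lines using TensorFlow Probability's effective sample size estimator (a sketch; the helper and its bookkeeping are ours, not the exact evaluation code):

import tensorflow as tf
import tensorflow_probability as tfp

def min_ess_per_leapfrog_step(chains, num_leapfrog_steps):
    # chains: one [num_samples, ...] tensor per model variable.
    # Joint ESS = minimum effective sample size over all variables and dimensions,
    # divided by the number of leapfrog steps per HMC proposal.
    ess_per_var = [tf.reduce_min(tfp.mcmc.effective_sample_size(c)) for c in chains]
    return tf.reduce_min(tf.stack(ess_per_var)) / num_leapfrog_steps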
Variational Inference Setup. The MF-VI procedure uses 64 samples from the variational
distribution q for a Monte Carlo estimate of the expectations under q. We optimise the
ELBO using the Adam optimiser. To reduce variance, we run the MF-VI procedure 3 times
with a learning rate of 0.01 and 3 times with a learning rate of 0.1, and choose the result that
maximises the ELBO.
