
Bayesian Uncertainty Quantification

High Performance Computing for Computational Science and Engineering II

Prof. Dr. Petros Koumoutsakos

Spring 2018

Contents

1 Introduction

2 Bayesian Framework
  2.1 Bayes' theorem
  2.2 Example: The coin flipping problem
  2.3 Example: Linear model

3 The Laplace Approximation
  3.1 Example: Back to the coin flipping problem
  3.2 Example: Gaussian mean estimator

4 Monte Carlo Methods
  4.1 Monte Carlo Integration
  4.2 Random Number Generators
  4.3 Importance Sampling

5 Sampling methods
  5.1 Function Inversion
  5.2 Rejection Sampling
  5.3 Markov Chain Monte Carlo

1 Introduction
In science, we attempt to describe, understand and predict systems via models which depend on parameters. These models are an approximation of reality and contain several sources of uncertainty, including modeling and numerical errors. Furthermore, we often do not know the parameters of the model, or how sensitive the output of the model is with respect to the parameters. We wish to describe the uncertainty of these parameters given observations of the real system. We will here present the steps to complete this process.
In Section 2, we will present Bayes' theorem and its applications in the field of uncertainty quantification. In Section 3, we will describe how to derive analytical estimates that quantify the uncertainty in parameters. In Section 4, we will introduce the concept of Monte Carlo methods, which are the basis of most numerical methods used in uncertainty quantification. Finally, in Section 5, we will present numerical methods that can sample from arbitrary distributions.

2 Bayesian Framework
2.1 Bayes’ theorem
Let X and Y be two random variables (r.v.) with densities $p_X$ and $p_Y$. Bayes' theorem states that
\[
p_{X|Y}(x \mid y) = \frac{p_{Y|X}(y \mid x)\, p_X(x)}{p_Y(y)}. \tag{1}
\]

The density $p_{X|Y}$ is called the posterior probability. The term $p_{Y|X}$ is viewed as a function of x, since on the left hand side of Eq. (1) we condition on the fixed value for the random variable Y = y. As a function of x, this term is called the likelihood function and is a measure of how likely it is to observe the value y for the r.v. Y conditioned on the value x for the r.v. X. Notice that $p_{Y|X}$ is not a probability density as a function of x. The term $p_X$ is called the prior distribution and represents our belief about the values of X prior to observing any values of the random variable Y. Finally, the denominator is defined as
\[
p_Y(y) = \int p_{Y|X}(y \mid x)\, p_X(x)\, dx, \tag{2}
\]
and is the normalizing constant that makes the right hand side of Eq. (1) a probability density function.
In order to simplify the notation, we drop the dependence of the density on the random variable; which density is meant will be evident from the arguments. For example, when we write p(x | y), then $p = p_{X|Y}$; when we write p(x), then $p = p_X$.

Bayes' theorem in action The way we will use Bayes' theorem in the next sections is the following. First, we make some assumptions:

• We assume that we have a computational model that depends on some parameters. These parameters are considered to be random variables and will be denoted by X. A prior distribution can be imposed on them; e.g., if we know that X takes only positive values, $p_X$ can be the gamma distribution.

• We have observed a set of data, y. We assume that the data are also r.v. that follow a probability distribution.

• The likelihood function of the data, $p_{Y|X}$, is either known explicitly or can be modeled based on other assumptions.

Based on these assumptions and using Eq. (1) we are able to find the distribution
of the parameters conditioned on the data. Stated differently, we can answer the
question “what values for the parameters will make the computational model
fit the data better?”.
In order to fix the notation, we will denote the random variable that repre-
sents the parameters and the data with X and D respectively and a realization
from these variables with x and d.

Robust prediction The uncertainty in the parameters can be propagated to the output of the model in order to quantify the uncertainty in the predictions. If the prior uncertainty is used, the prediction is called the prior robust prediction,
\[
p(y) = \int p(y \mid x)\, p(x)\, dx. \tag{3}
\]
If the posterior distribution is used, the prediction is called the posterior robust prediction,
\[
p(y \mid d) = \int p(y \mid x)\, p(x \mid d)\, dx. \tag{4}
\]

Model selection TBW

2.2 Example: The coin flipping problem


A coin comes up heads 4 times in 16 flips.
Is this a fair coin?

Define H as the bias-weighting of the coin. For example:

• if H = 0: a tail comes up at every flip,

• if H = 1: a head comes up at every flip,

• if H = 1/2: the coin is fair.

Here, H plays the role of the model parameter x. Suppose we observe “R heads in N tosses”. We want to estimate the posterior distribution of H given the observed data d = (R, N),
\[
p(H \mid d). \tag{5}
\]
Using Bayes' theorem, we write
\[
p(H \mid d) \propto p(d \mid H)\, p(H). \tag{6}
\]

Here, we omit the normalization factor for simplicity. We choose a uniform prior,
\[
p(H) = \begin{cases} 1, & \text{if } 0 \le H \le 1, \\ 0, & \text{otherwise}. \end{cases} \tag{7}
\]
Such a prior is used when we do not have any prior knowledge about the fairness of the coin: it is equally probable to have a fair coin or a coin completely biased towards heads. We also need to define the likelihood function in Eq. (6). In words, the likelihood function measures the chance of observing certain data if the value of the bias-weighting is given. Assuming independent events, it is easy to see that the likelihood of obtaining “R heads in N tosses” follows a binomial distribution,
\[
p(d \mid H) \propto H^R (1 - H)^{N-R}. \tag{8}
\]

This is intuitively derived by considering that

• $H^R$ is the probability of having R “heads”,

• $(1 - H)^{N-R}$ is the probability of having N − R “tails”.
Note that we again omitted the constant factor in Eq. (8), as it does not depend on H. Posterior distributions of the bias-weighting of the coin H are shown in Fig. 1, starting from three different priors. Comparing Fig. 1 with our original problem, we can see that a fair coin still lies in the confidence region. However, it is more likely that the coin is not fair, given the data. We need more data to increase our confidence about H.
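The posterior above is simple enough to evaluate numerically on a grid. The following is a minimal sketch (not part of the original notes) for the data of our problem, R = 4 heads in N = 16 flips, with the uniform prior of Eq. (7):

import numpy as np

R, N = 4, 16                             # observed data d = (R, N)
H = np.linspace(0.0, 1.0, 1001)          # grid over the bias-weighting H

posterior = H**R * (1.0 - H)**(N - R)    # unnormalized posterior, uniform prior
posterior /= posterior.sum() * (H[1] - H[0])   # normalize numerically

H_best = H[np.argmax(posterior)]
print(f"best estimate H = {H_best:.3f}")       # the mode lies at R/N = 0.25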

2.3 Example: Linear model


Consider the following linear model,
\[
y = x + \varepsilon, \tag{9}
\]
where x and ε are independent. We assume the following prior knowledge,
\[
x \sim \mathcal{N}(\mu, \sigma^2), \tag{10}
\]
\[
\varepsilon \sim \mathcal{N}(0, 1), \tag{11}
\]

Figure 1: Evolution of the posterior density of the bias-weighting of a coin as the number of data increases. The different lines show posterior densities for different priors. The first figure, for 0 data points, represents the three priors. It can be seen that after many observations, the three posterior densities converge to the same distribution. However, the effect of the prior is evident for smaller observation data sets, as the biased Gaussian prior converges later to the actual posterior density. Taken from [1].

where µ and σ are given. After a single observation y = d ∈ ℝ, we can write from Bayes' theorem
\[
p(x \mid d) = \frac{p(d \mid x)\, p(x)}{p(d)}. \tag{12}
\]
From Eq. (9) and Eq. (11), the likelihood p(d | x) is a Gaussian centered at x with variance 1,
\[
p(d \mid x) = \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{1}{2}(d - x)^2 \right).
\]
Substituting the prior and the likelihood into Eq. (12) gives the posterior distribution,
\[
p(x \mid d) \propto \exp\left( -\frac{1}{2}\left[ \frac{(d - x)^2}{1} + \frac{(x - \mu)^2}{\sigma^2} \right] \right),
\]
which can be written as a normal distribution,
\[
p(x \mid d) = \mathcal{N}\!\left( \frac{\mu + d\sigma^2}{1 + \sigma^2},\; \frac{\sigma^2}{1 + \sigma^2} \right).
\]

Robust prediction The robust prediction is the probability density of the output of the model. This density takes into account the uncertainty of the parameters of the model. From the model Eq. (9), we observe that the output is a sum of two random variables. In the case of the prior robust prediction, the parameter x is normally distributed (see Eq. (10)). The error ε is also normally distributed (see Eq. (11)). Note that the sum of two r.v. $X_1 \sim \mathcal{N}(\mu_1, \sigma_1^2)$ and $X_2 \sim \mathcal{N}(\mu_2, \sigma_2^2)$ is $Z = X_1 + X_2 \sim \mathcal{N}(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2)$.
We can then easily write the prior robust prediction as
\[
p(y) = \mathcal{N}(\mu, \sigma^2 + 1).
\]
Similarly, the posterior robust prediction can be written as
\[
p(y \mid d) = \mathcal{N}\!\left( \frac{\mu + d\sigma^2}{1 + \sigma^2},\; \frac{\sigma^2}{1 + \sigma^2} + 1 \right).
\]
Note that in this case the posterior robust prediction gives a smaller confidence interval, so adding data increases the robustness of the prediction. These quantities are represented in Fig. 2.
The same result can be obtained from Eq. (3) or Eq. (4). In this particular case, p(y | x) = N(x, 1).
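The closed-form expressions above are easy to verify numerically. A small sketch (not from the original notes; the values of µ, σ², and d are chosen purely for illustration):

import numpy as np

mu, sigma2 = 0.0, 2.0   # prior x ~ N(mu, sigma2); illustrative values
d = 1.5                 # a single observation y = d

# Posterior p(x|d) = N((mu + d*sigma2)/(1 + sigma2), sigma2/(1 + sigma2))
post_mean = (mu + d * sigma2) / (1.0 + sigma2)
post_var = sigma2 / (1.0 + sigma2)

# Prior robust prediction p(y) = N(mu, sigma2 + 1);
# posterior robust prediction p(y|d) = N(post_mean, post_var + 1)
print(f"posterior:            N({post_mean:.3f}, {post_var:.3f})")
print(f"prior prediction:     N({mu:.3f}, {sigma2 + 1.0:.3f})")
print(f"posterior prediction: N({post_mean:.3f}, {post_var + 1.0:.3f})")

Note how the posterior predictive variance, post_var + 1, is smaller than the prior predictive variance sigma2 + 1, as stated above.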

3 The Laplace Approximation


For nonlinear models, the posterior distribution cannot be derived analytically in general. Therefore, we summarize such distributions with two quantities:
Figure 2: Left: prior and posterior distributions, p(x) and p(x | d), of the parameter x. Adding data increases our confidence in x. Right: prior and posterior robust predictions of y. Again, adding data in this particular case increases the confidence in the prediction.

• the best estimate, which is the model parameter for which the posterior density function is maximized,

• a measure of the reliability of the best estimate.

The posterior distribution around the best estimate can be locally approximated with a Gaussian distribution by employing the Laplace approximation method. The Laplace approximation uses the Taylor expansion of a function around its global maximum in order to construct a Gaussian approximation of the density. Since the posterior distribution is approximated by a Gaussian, the logarithm of the posterior plays the role of the function to which the Taylor expansion is applied. The main idea of the Laplace approximation method is discussed below.

Let x ∈ ℝ be a parameter with probability distribution function p(x). In the case of continuous variables, the following two conditions hold true for the global maximum of the distribution, x̂:
\[
\left.\frac{\partial p}{\partial x}\right|_{\hat{x}} = 0, \tag{13}
\]
\[
\left.\frac{\partial^2 p}{\partial x^2}\right|_{\hat{x}} < 0. \tag{14}
\]

The logarithm of the probability density (which, in the Bayesian framework, corresponds to the log-likelihood function) is
\[
L(x) = \log p(x).
\]

By performing a Taylor expansion of the logarithm of p(x) around the maximum x̂, which corresponds to the maximum of p(x), we have
\[
L(x) = L(\hat{x}) + \frac{1}{2} \left.\frac{\partial^2 L}{\partial x^2}\right|_{\hat{x}} (x - \hat{x})^2 + O\big((x - \hat{x})^3\big), \tag{15}
\]

where we used Eq. (13). Keeping only terms up to second order, we can write the probability distribution as
\[
p(x) \approx A \exp\left( \frac{1}{2} \left.\frac{\partial^2 L}{\partial x^2}\right|_{\hat{x}} (x - \hat{x})^2 \right),
\]
where A is the constant $A = \exp(L(\hat{x}))$. We obtained a Gaussian approximation of the probability density function with variance
\[
\sigma^2 = -\left( \left.\frac{\partial^2 L}{\partial x^2}\right|_{\hat{x}} \right)^{-1}.
\]
This is positive, as the second derivative is negative according to the condition Eq. (14). We can finally write the Gaussian approximation, omitting the normalization constant, as
\[
p(x) \approx \sqrt{2\pi\sigma^2}\, p(\hat{x})\, \mathcal{N}(x \mid \hat{x}, \sigma^2)
\propto \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{1}{2\sigma^2}(x - \hat{x})^2 \right).
\]

The concept of Laplace approximation is graphically explained in Fig. 3.
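The 1D recipe can be carried out numerically when derivatives are not available in closed form. A minimal sketch (not from the original notes): locate x̂ on a grid and estimate σ² = −1/L''(x̂) with a central finite difference, for the illustrative unnormalized log-density log p(x) = 3 log x − x (whose exact mode and standard deviation are x̂ = 3 and σ = √3):

import numpy as np

def log_p(x):
    # Example unnormalized log-density: log p(x) = 3 log(x) - x
    return 3.0 * np.log(x) - x

x = np.linspace(0.1, 15.0, 100001)
x_hat = x[np.argmax(log_p(x))]          # best estimate (mode)

h = 1e-4                                 # finite-difference step
d2L = (log_p(x_hat + h) - 2 * log_p(x_hat) + log_p(x_hat - h)) / h**2
sigma = np.sqrt(-1.0 / d2L)              # Laplace variance: -1 / L''(x_hat)

print(f"x_hat = {x_hat:.3f}, sigma = {sigma:.3f}")  # exact: 3.000 and 1.732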

2-dimensional approximation In 2D, if the parameters are denoted as x = (x₁, x₂), the first partial derivatives are zero, due to the existence of a maximum at the best estimate x̂ = (x̂₁, x̂₂),
\[
\nabla L(\hat{x}) = 0.
\]
The log-likelihood around the best estimate (x̂₁, x̂₂) is then approximated by a Taylor series expansion,
\[
L(x) \approx L(\hat{x}) + \frac{1}{2}\left[ \left.\frac{\partial^2 L}{\partial x_1^2}\right|_{\hat{x}} (x_1 - \hat{x}_1)^2 + \left.\frac{\partial^2 L}{\partial x_2^2}\right|_{\hat{x}} (x_2 - \hat{x}_2)^2 + 2 \left.\frac{\partial^2 L}{\partial x_1 \partial x_2}\right|_{\hat{x}} (x_1 - \hat{x}_1)(x_2 - \hat{x}_2) \right].
\]
Defining
\[
A = \left.\frac{\partial^2 L}{\partial x_1^2}\right|_{\hat{x}}, \quad
B = \left.\frac{\partial^2 L}{\partial x_2^2}\right|_{\hat{x}}, \quad
C = \left.\frac{\partial^2 L}{\partial x_1 \partial x_2}\right|_{\hat{x}},
\]

Figure 3: Laplace approximation. In the limit of many observations, the probability distribution function p(x) is locally approximated as a Gaussian around the best estimate, i.e. the value that maximizes the density function.

and introducing the Hessian matrix H of the function L,
\[
H = \begin{pmatrix} A & C \\ C & B \end{pmatrix},
\]
the Taylor series expansion takes the form
\[
L(x) \approx L(\hat{x}) + \frac{1}{2} Q(x),
\]
where Q(x) is
\[
Q(x) = (x - \hat{x})^T H(\hat{x}) (x - \hat{x}).
\]
The covariance matrix of the Gaussian approximation is the negative inverse of the Hessian (which is negative definite at the maximum, consistent with the 1D result above),
\[
\Sigma = -H^{-1}(\hat{x}).
\]
We compute the marginal probability of the parameter x₁,
\[
p(x_1) = \int_{-\infty}^{\infty} p(x_1, x_2)\, dx_2
\approx c \exp\left( \frac{1}{2}\, \frac{AB - C^2}{B}\, (x_1 - \hat{x}_1)^2 \right),
\]
where c is the normalization factor. The exponent is negative, since AB − C² > 0 and B < 0 at the maximum.

D-dimensional approximation In higher dimensions, x ∈ ℝᴰ, the Taylor expansion of L(x) about the best estimate x̂ extends as follows,
\[
L(x) \approx L(\hat{x}) + \frac{1}{2} (x - \hat{x})^T \nabla\nabla^T L(\hat{x}) (x - \hat{x}).
\]
The Hessian at the best estimate is defined as
\[
H(\hat{x}) = \nabla\nabla^T L(\hat{x}),
\]
and the covariance matrix is again
\[
\Sigma = -H^{-1}(\hat{x}).
\]
The posterior distribution is approximated as
\[
p(x) \approx \sqrt{(2\pi)^D |\Sigma|}\; p(\hat{x})\, \mathcal{N}(x \mid \hat{x}, \Sigma)
= c\, \frac{1}{\sqrt{(2\pi)^D |\Sigma|}} \exp\left( -\frac{1}{2} (x - \hat{x})^T \Sigma^{-1} (x - \hat{x}) \right),
\]
where c is the normalization constant.

3.1 Example: Back to the coin flipping problem


In Section 2.2 we described the coin flipping problem in the Bayesian framework. We will here apply the Laplace approximation to the same problem: we approximate the posterior distribution of the parameter with a Gaussian distribution around the best estimate, i.e. the value that maximizes the posterior. We already showed that the posterior probability density function of the parameter x = H has the form
\[
p(H \mid d) \propto H^R (1 - H)^{N-R}.
\]


Taking the logarithm, we get
\[
L(H) = \text{const} + R \log(H) + (N - R) \log(1 - H),
\]
where the constant does not play any role, as we only want to find the best estimate. The first two derivatives read
\[
\frac{\partial L}{\partial H} = \frac{R}{H} - \frac{N - R}{1 - H},
\]
\[
\frac{\partial^2 L}{\partial H^2} = -\frac{R}{H^2} - \frac{N - R}{(1 - H)^2}.
\]

The condition Eq. (13) gives the best estimate Ĥ = R/N. The standard deviation is therefore given by
\[
\sigma = \left( -\left.\frac{\partial^2 L}{\partial H^2}\right|_{\hat{H}} \right)^{-1/2}
= \sqrt{\frac{\hat{H}(1 - \hat{H})}{N}}.
\]
Note that the certainty increases as we add more data. We can also notice that it is easier to detect a biased coin than a fair coin: the uncertainty is maximized for Ĥ = 1/2.
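A quick numerical check of Ĥ = R/N and σ = √(Ĥ(1 − Ĥ)/N) against the exact grid posterior, for our data R = 4, N = 16 (a sketch, not part of the original notes):

import numpy as np

R, N = 4, 16
H_hat = R / N
sigma = np.sqrt(H_hat * (1 - H_hat) / N)
print(f"Laplace: H_hat = {H_hat:.3f}, sigma = {sigma:.4f}")

# Exact posterior on a grid, for comparison of mode and spread
H = np.linspace(1e-6, 1 - 1e-6, 100001)
post = H**R * (1 - H)**(N - R)
post /= post.sum() * (H[1] - H[0])
mean = np.sum(H * post) * (H[1] - H[0])
std = np.sqrt(np.sum((H - mean)**2 * post) * (H[1] - H[0]))
print(f"exact:   mode  = {H[np.argmax(post)]:.3f}, std   = {std:.4f}")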

3.2 Example: Gaussian mean estimator


Consider N independent and identically distributed (i.i.d.) observations d = (d₁, d₂, …, d_N). We assume that the data are randomly generated from a Gaussian distribution with known variance σ² and unknown mean x = µ,
\[
p(d_k \mid \mu) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{1}{2\sigma^2}(d_k - \mu)^2 \right).
\]
What is the best estimate for µ and what is our confidence in this estimate? From Bayes' theorem, the posterior probability of the mean µ is given by
\[
p(\mu \mid d) \propto p(d \mid \mu)\, p(\mu).
\]
Since the data are i.i.d., the likelihood function takes the form
\[
p(d \mid \mu) = \prod_{k=1}^{N} p(d_k \mid \mu).
\]

Here, we assume an uninformative uniform prior for the mean of the Gaussian,
\[
p(\mu) = \begin{cases} c = \frac{1}{\mu_{\max} - \mu_{\min}}, & \mu_{\min} \le \mu \le \mu_{\max}, \\ 0, & \text{otherwise}. \end{cases}
\]

The posterior distribution is then given by
\[
p(\mu \mid d) \propto c \prod_{k=1}^{N} p(d_k \mid \mu)
= c \prod_{k=1}^{N} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(d_k - \mu)^2}{2\sigma^2} \right)
= \frac{c}{(2\pi\sigma^2)^{N/2}} \exp\left( -\frac{1}{2\sigma^2} \sum_{k=1}^{N} (d_k - \mu)^2 \right).
\]

We compute the log-likelihood,
\[
L(\mu) = \log p(\mu \mid d) = \text{const} - \sum_{k=1}^{N} \frac{(d_k - \mu)^2}{2\sigma^2}.
\]
The best estimate µ̂ must satisfy
\[
\left.\frac{dL(\mu)}{d\mu}\right|_{\hat{\mu}} = \sum_{k=1}^{N} \frac{d_k - \hat{\mu}}{\sigma^2} = 0
\;\Rightarrow\; \sum_{k=1}^{N} d_k = N \hat{\mu}
\;\Rightarrow\; \hat{\mu} = \frac{1}{N} \sum_{k=1}^{N} d_k.
\]
We compute the second derivative of the log-likelihood,
\[
\frac{d^2 L}{d\mu^2} = -\sum_{k=1}^{N} \frac{1}{\sigma^2} = -\frac{N}{\sigma^2},
\]
which is negative, meaning that L(µ̂) is indeed a maximum. Finally, the standard deviation of the posterior is equal to $\sigma/\sqrt{N}$.
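A short sketch with synthetic data (the values of σ, the true mean, and N are assumed for illustration, not from the notes) showing the sample-mean estimator and its posterior standard deviation σ/√N:

import numpy as np

rng = np.random.default_rng(0)
sigma, mu_true, N = 2.0, 1.0, 400
d = rng.normal(mu_true, sigma, size=N)    # i.i.d. observations

mu_hat = d.mean()                          # best estimate: sample mean
posterior_std = sigma / np.sqrt(N)         # std of the posterior of mu
print(f"mu_hat = {mu_hat:.3f} +- {posterior_std:.3f}")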

4 Monte Carlo Methods


In the previous section we saw that the Laplace approximation method can be used to approximate the posterior distribution of the parameters. This approach is characterized as deterministic. Alternatively, we can make use of stochastic methods in order to numerically represent the posterior probabilities with randomly generated samples from the underlying distribution.

4.1 Monte Carlo Integration


The main concept of Monte Carlo methods is presented here for the numerical computation of an integral of the form
\[
\mathbb{E}[f(x)] = \int f(x)\, p(x)\, dx, \tag{16}
\]
where x is a random vector with density p and f is a given function we want to integrate. Common examples are:

1. the model evidence,
\[
p(d) = \int p(d \mid x)\, p(x)\, dx,
\]

2. the robust posterior prediction,
\[
p(y \mid d) = \int p(y \mid x)\, p(x \mid d)\, dx.
\]

Assume that $\{x^{(k)}\}_{k=1}^N$ are i.i.d. samples drawn from the density p. Using the Law of Large Numbers, the expected value of f(x) is given by the estimate
\[
\hat{\mu}_{f,N} = \frac{1}{N} \sum_{k=1}^{N} f(x^{(k)}).
\]
In the limit N → ∞, the sample average converges to the expected value. Defining
\[
\mu_f = \mathbb{E}[f(x)], \qquad
\sigma_f^2 = \mathrm{Var}[f(x)] = \mathbb{E}[f(x)^2] - \mu_f^2,
\]
the Law of Large Numbers and the Central Limit Theorem give
\[
\lim_{N \to \infty} \hat{\mu}_{f,N} = \mu_f, \qquad
\hat{\mu}_{f,N} \sim \mathcal{N}\big(\mu_f,\, \sigma_f^2 / N\big).
\]
Conclusively:

• The error of the sample estimate decreases as $1/\sqrt{N}$.

• The sample estimate is an unbiased estimate of the true value.

• Convergence of the estimate is independent of the dimensionality of the problem.
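As illustrated by the sketch below (not from the original notes; the choice f(x) = x² with x ∼ N(0, 1), whose exact expectation is 1, is an assumed example), the error indeed shrinks roughly as 1/√N:

import numpy as np

rng = np.random.default_rng(42)
f = lambda x: x**2

for N in (10**2, 10**4, 10**6):
    x = rng.standard_normal(N)       # i.i.d. samples from p = N(0, 1)
    mu_hat = f(x).mean()             # sample-average estimator
    print(f"N = {N:>7}: estimate = {mu_hat:.4f}, error = {abs(mu_hat - 1.0):.1e}")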

4.2 Random Number Generators


The whole concept of Monte-Carlo methods relies on the generation of random
samples. It is essential to generate such numbers with the desired properties.

Pseudo-random number generators: Algorithms that generate a sequence of integers $Z_i$ that approximately follow a uniform distribution on an interval of the real axis. The general algorithm for the generation of pseudo-random numbers is
\[
Z_i = g(Z_{i-1}, \ldots, Z_{i-m}) \bmod M,
\]
generating integers in the interval [0, M − 1]. In this case, $Z_i$ is the remainder of the division of $g(Z_{i-1}, \ldots, Z_{i-m})$ by M. In a simpler form, $Z_i$ can be generated by
\[
Z_i = \alpha Z_{i-1} \bmod M,
\]
for $Z_0 = 1$ and i ≥ 1. Here, M is a large prime number and α an integer [2].
The resulting sequence of random numbers turns out to be ergodic with period M − 1. A sequence is accepted when it satisfies certain criteria; for example, for a choice of M = 10⁹, not all α result in a good-quality sequence of random numbers. Recommended options are α = 7⁵ and M = 2³¹ − 1 [3].
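A sketch of this simple multiplicative generator with the recommended constants α = 7⁵ and M = 2³¹ − 1; dividing by M maps the integers to (0, 1):

def lehmer(z0=1, alpha=7**5, M=2**31 - 1):
    """Yield an endless stream of pseudo-random numbers in (0, 1)."""
    z = z0
    while True:
        z = (alpha * z) % M          # remainder of the division by M
        yield z / M

gen = lehmer()
samples = [next(gen) for _ in range(5)]
print(samples)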

Note for non-uniform distributions Integrals of the form
\[
I = \int_\Omega f(x)\, dx
\]
can be written as
\[
I = |\Omega| \int f(x)\, p(x)\, dx = |\Omega|\, \mathbb{E}_p[f].
\]
Here p is the uniform distribution over Ω from which samples are drawn,
\[
p(x) = \begin{cases} \frac{1}{|\Omega|}, & x \in \Omega, \\ 0, & \text{otherwise}. \end{cases}
\]
For these cases, pseudo-random number generators are sufficient. In the general case, we want to approximate the integral Eq. (16) for a non-uniform density p. Usually, the distribution p is known up to a constant factor,
\[
p(x) = \frac{\phi(x)}{Z},
\]
where $Z = \int_{-\infty}^{\infty} \phi(x)\, dx$. In the next sections, we will discuss how to generate random numbers from such distributions.

4.3 Importance Sampling


We want to evaluate the integral
\[
I = \int f(x)\, dx, \tag{17}
\]
using Monte Carlo integration. Eq. (17) can be written equivalently as
\[
I = \int \frac{f(x)}{p(x)}\, p(x)\, dx = \mathbb{E}_p\!\left[ \frac{f(x)}{p(x)} \right],
\]
where p(x) > 0 is a probability density function with $\int p(x)\, dx = 1$. Therefore, we can approximate I using the Monte Carlo integration technique,
\[
\hat{I} = \frac{1}{N} \sum_{k=1}^{N} \frac{f(x^{(k)})}{p(x^{(k)})},
\]

where the samples $\{x^{(k)}\}_{k=1}^N$ are i.i.d. and follow the density p. There are infinitely many choices for p, from which we would like to select one that:

1. is easy to sample from,

2. minimizes the error of the estimate for a finite number of samples.

A measure of the error of the estimate is given by
\[
\mathbb{E}_p\!\left[ \left( \frac{f(x)}{p(x)} - I \right)^2 \right]
= \int \frac{f(x)^2}{p(x)^2}\, p(x)\, dx - I^2. \tag{18}
\]
It is easy to show that this is minimized for
\[
p(x) = \frac{f(x)}{I},
\]
but this expression implies that we already know I. In practice, we choose p “similar” to f.
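An importance-sampling sketch (not from the original notes): estimate I = ∫ exp(−x²/2) dx, whose exact value is √(2π) ≈ 2.5066, using a standard Cauchy proposal, an assumed choice whose heavier tails keep the weights f/p bounded:

import numpy as np

rng = np.random.default_rng(1)
f = lambda x: np.exp(-0.5 * x**2)
p = lambda x: 1.0 / (np.pi * (1.0 + x**2))   # standard Cauchy density

N = 10**5
x = rng.standard_cauchy(N)                    # i.i.d. samples from p
I_hat = np.mean(f(x) / p(x))                  # importance-sampling estimator
print(f"I_hat = {I_hat:.4f}, exact = {np.sqrt(2 * np.pi):.4f}")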

5 Sampling methods
5.1 Function Inversion
Let X be a real random variable with probability density function $p_X$ and corresponding cumulative distribution function
\[
F_X(x) = \int_{-\infty}^{x} p_X(r)\, dr.
\]
The idea behind the “inverse transform sampling” method is that samples from the density $p_X(x)$ can be generated by a transformation
\[
x = g(u),
\]
where u ∼ U(0, 1). We will identify the function g such that X follows the desired density $p_X$. The densities of X and U should satisfy
\[
p_X(x)\, dx = p_U(u)\, du, \tag{19}
\]
which leads to
\[
p_X(x) = p_U(u)\, \frac{du}{dx} = p_U(u) \left( \frac{dg(u)}{du} \right)^{-1}.
\]
For the r.v. U drawn from a uniform probability distribution, U ∼ p_U(u) = U(u | 0, 1), the probability of generating a random number between u and u + du is
\[
p_U(u)\, du = \begin{cases} du, & 0 \le u < 1, \\ 0, & \text{otherwise}. \end{cases}
\]

Therefore, $p_U(u) = 1$ for u ∈ [0, 1], and integrating Eq. (19) yields
\[
\int_{-\infty}^{x} p_X(r)\, dr = \int_{0}^{u} p_U(r)\, dr = u.
\]
This means, from the definition of $F_X$, that
\[
F_X(x) = u.
\]
Assuming that $F_X$ has an inverse $F_X^{-1}$, we obtain
\[
x = g(u) = F_X^{-1}(u).
\]
For simple density functions $p_X$ for which $F_X^{-1}$ is known, it is therefore easy to generate samples of X. However, the inverse is not available in general, which led to the development of sampling algorithms, as discussed later in this section.

Example: Exponential Distribution Given that the random variable x is distributed according to the probability density function
\[
p_X(x) = \lambda e^{-\lambda x},
\]
with λ > 0 and x ≥ 0, the CDF of x is given by
\[
F_X(x) = \int_{0}^{x} \lambda e^{-\lambda\tau}\, d\tau = 1 - e^{-\lambda x}.
\]
Setting the random variable u := F_X(x), x can be sampled from the inverse transformation
\[
x = g(u) = F_X^{-1}(u),
\]
or equivalently
\[
F_X(x) = u \;\Rightarrow\; 1 - e^{-\lambda x} = u,
\]
which results in
\[
x = -\frac{1}{\lambda} \ln(1 - u).
\]
If samples $\{u^{(k)}\}_{k=1}^N$ are drawn from U(0, 1), then $x^{(k)} = -\frac{1}{\lambda} \ln(1 - u^{(k)})$ follow $p_X$.
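A direct transcription of this transformation (a sketch; the value of λ is chosen arbitrarily for illustration):

import numpy as np

rng = np.random.default_rng(2)
lam = 1.5

u = rng.uniform(0.0, 1.0, size=10**5)   # u ~ U(0, 1)
x = -np.log(1.0 - u) / lam               # x = F_X^{-1}(u)

print(f"sample mean = {x.mean():.3f}, exact mean = {1 / lam:.3f}")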

Example: Gaussian Distribution We want to draw samples from the standard normal distribution,
\[
x \sim \mathcal{N}(0, 1).
\]
The inverse transform method can be time consuming in this case, as $F_X^{-1}$ is not known in closed form. An alternative algorithm, called the Box-Muller transformation, uses the inverse transform method to convert two independent uniform random variables into two independent Gaussian random variables. Suppose that {r, φ} is a set of two independent random variables distributed as follows:

• φ is drawn from a uniform distribution,
\[
p_\Phi(\phi) = \begin{cases} \frac{1}{2\pi}, & 0 \le \phi \le 2\pi, \\ 0, & \text{otherwise}. \end{cases}
\]

• r is sampled according to the exponential distribution
\[
p_R(r) = \frac{1}{2} e^{-r/2},
\]
for r > 0. The sampling from $p_R$ is performed via an inverse transformation from a uniformly sampled variable, as seen in the previous example.

Since r and φ are independent, the joint probability is
\[
p_{R,\Phi}(r, \phi) = p_R(r)\, p_\Phi(\phi).
\]
We now define the transformation $x = \sqrt{r} \cos\phi$ and $y = \sqrt{r} \sin\phi$, so that $r = x^2 + y^2$ and $\phi = \arctan(y/x)$. Accounting for the Jacobian of this transformation, the joint distribution of x and y is
\[
p_{X,Y}(x, y) = \frac{1}{2\pi}\, e^{-\frac{x^2 + y^2}{2}}
= \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}} \cdot \frac{1}{\sqrt{2\pi}}\, e^{-\frac{y^2}{2}}.
\]
The joint distribution thus describes two independent, normally distributed variables.
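A sketch of the Box-Muller transformation exactly as described above: r drawn from the exponential p_R via the inverse transform, φ uniform on [0, 2π], then x = √r cos φ and y = √r sin φ:

import numpy as np

rng = np.random.default_rng(3)
N = 10**5

r = -2.0 * np.log(1.0 - rng.uniform(size=N))   # r ~ p_R(r) = (1/2) e^{-r/2}
phi = 2.0 * np.pi * rng.uniform(size=N)        # phi ~ U(0, 2*pi)

x = np.sqrt(r) * np.cos(phi)
y = np.sqrt(r) * np.sin(phi)
print(f"x: mean = {x.mean():.3f}, var = {x.var():.3f}")   # approx 0 and 1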

5.2 Rejection Sampling


Until now, we have seen how to generate samples from a uniform distribution, through pseudo-random number generators, and from other distributions, using the inverse transform method. However, there are many distributions for which it may be impossible to directly define an inverse transform. In such cases, we turn to methods that only require knowledge of the functional form of the probability density function p up to a constant. The key concept here is the following: in order to generate independent samples from a desired density p, one draws from another density q that is easier to sample from; then, instead of applying a transformation to q, some sampled points are rejected according to certain criteria.
Given a density p, we can write
\[
p(x) = \int_{0}^{p(x)} 1\, du = \int_{-\infty}^{\infty} \chi_{[0,\, p(x)]}(u)\, du. \tag{20}
\]
The function
\[
\chi_{[0,\, p(x)]}(u) = \begin{cases} 1, & 0 \le u \le p(x), \\ 0, & \text{otherwise}, \end{cases}
\]

can be seen as the joint distribution $p_{X,U}$ of the random variables X and U following the distributions p(x) and p(u | x) = U(u | 0, p(x)). Marginalizing the joint distribution over U, we recover the distribution p,
\[
p(x) = \int p_{X,U}(x, u)\, du. \tag{21}
\]
This property of p is presented in Fig. 4. The dots in the figure correspond to samples drawn from the joint $p_{X,U}$; they are uniformly distributed under the graph of p.

Figure 4: Assuming we can sample from the distribution p, we draw a sample x ∼ p and then a uniform number u ∼ U(0, p(x)). If we marginalize the samples (x, u), we recover the distribution p, as shown in Eq. (20).

Acceptance–Rejection technique What if we cannot directly sample from the density p? The answer is simple:

1. find a density q from which samples are easily drawn,

2. scale q by a constant M such that the graph of Mq is always above the graph of p,

3. sample from the joint density $p_{X,U}$, where x ∼ q and u ∼ U(0, Mq(x)),

4. keep only the points that are below the graph of p.

Figure 5: Demonstration of the accept-reject algorithm. 1. A sample x is drawn from the distribution q. 2. A random number u is drawn uniformly in [0, Mq(x)]. 3. The sample x is accepted if u < p(x), i.e., if the point (x, u) is below the graph of p, and rejected otherwise.

This intuitive procedure is called the acceptance–rejection algorithm and is presented graphically in Fig. 5. The detailed algorithm is presented in Algorithm 1. A basic requirement of the algorithm is that the graph of p should always be below the graph of Mq. Equivalently, the constant M must satisfy
\[
M > \max_x \frac{p(x)}{q(x)}.
\]

Theorem 1. The samples generated by Algorithm 1 are distributed according to p.

Proof. According to the algorithm, we first sample x ∼ q, then u ∼ U(0, Mq(x)), and we accept if u ≤ p(x). Thus, the posterior density, using Bayes' theorem, is given by
\[
p(x \mid u \le p(x)) = \frac{p(u \le p(x) \mid x)\, q(x)}{p(u \le p(x))}. \tag{22}
\]
The likelihood function corresponds to the probability of a uniformly distributed value in [0, Mq(x)] being less than or equal to p(x). It is easy to check that it is equal to
\[
p(u \le p(x) \mid x) = \frac{p(x)}{M q(x)}. \tag{23}
\]

Algorithm 1 Rejection sampling algorithm.
Input: densities p, q and a constant M > 0 such that M > max_x p(x)/q(x)
Output: a sample distributed according to p
function Rejection sampling(p, q, M)
    Generate x ∼ q                          ▷ Propose a new sample
    Generate u ∼ U(0, Mq(x))
    if u < p(x) then
        return x                            ▷ Accept the proposed sample
    else
        return Rejection sampling(p, q, M)  ▷ Reject and try again
    end if
end function

In order to evaluate the denominator of Eq. (22), we integrate the numerator of Eq. (22) and use Eq. (23),
\[
p(u \le p(x)) = \int p(u \le p(x) \mid x)\, q(x)\, dx
= \int \frac{p(x)}{M q(x)}\, q(x)\, dx
= \frac{1}{M} \int p(x)\, dx
= \frac{1}{M}. \tag{24}
\]
Inserting Eq. (23) and Eq. (24) into Eq. (22), we obtain
\[
p(x \mid u \le p(x)) = \frac{\frac{p(x)}{M q(x)}\, q(x)}{\frac{1}{M}} = p(x).
\]

Note: The efficiency of the algorithm depends on how often u ≤ p(x) occurs. For independent trials, the probability of success is 1/M (see Eq. (24)). Thus, the expected number of trials before accepting a sample is M.

Example: von Neumann The original rejection algorithm (Algorithm 2) was used by von Neumann to draw samples from a density p(x) on [a, b], using a uniform proposal
\[
q(x) = \frac{1}{b - a}, \quad \text{for } a \le x \le b.
\]
The constant M is given by
\[
M \ge \max_{x \in [a,b]} (b - a)\, p(x).
\]
Since M should be as small as possible, we select the lower bound of the above inequality,
\[
M = (b - a) \max_{x \in [a,b]} p(x).
\]
The procedure to generate one sample is summarized in Algorithm 2.

Algorithm 2 Von Neumann rejection sampling algorithm.
Input: density p, interval (a, b) and constant M = (b − a) max_{x∈[a,b]} p(x)
Output: a sample following the density p
function Rejection sampling(p, a, b, M)
    Generate x ∼ U(a, b)
    Generate u ∼ U(0, M/(b − a))
    if u < p(x) then
        return x                               ▷ Accept the proposed sample
    else
        return Rejection sampling(p, a, b, M)  ▷ Reject and try again
    end if
end function
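A sketch of Algorithm 2 (not part of the original notes; the target p(x) ∝ sin²(x) on [0, π] is an illustrative, assumed choice), using a loop rather than recursion:

import numpy as np

rng = np.random.default_rng(4)

def rejection_sample(p, a, b, p_max, rng):
    """Draw one sample from density p on [a, b]; p_max >= max p(x)."""
    while True:                                # retry until acceptance
        x = rng.uniform(a, b)                  # propose x ~ U(a, b)
        u = rng.uniform(0.0, p_max)            # u ~ U(0, M/(b - a)), M = (b-a)*p_max
        if u < p(x):
            return x                           # accept; otherwise try again

p = lambda x: np.sin(x)**2                     # known up to a constant factor
samples = np.array([rejection_sample(p, 0.0, np.pi, 1.0, rng) for _ in range(5000)])
print(f"sample mean = {samples.mean():.3f}")   # symmetric about pi/2 ≈ 1.571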

5.3 Markov Chain Monte Carlo

A Markov chain is a sequence of random numbers x₁, x₂, … ∈ ℝᵈ with conditional distributions that obey the rule
\[
P(x_n \mid x_{n-1}, x_{n-2}, \ldots, x_1) = P(x_n \mid x_{n-1}). \tag{25}
\]
The Metropolis-Hastings algorithm [4], introduced by Nicholas Metropolis together with Arianna W. Rosenbluth, Marshall Rosenbluth, Augusta H. Teller, and Edward Teller (M(RT)²), makes use of the Markov chain properties to generate samples from a probability density function. For a stochastic process W(x | y) following a Markov chain, the probability density of the states converges to an equilibrium probability density function $p_{eq}$ if the detailed balance equation is satisfied,
\[
W(x \mid y)\, p_{eq}(y) = W(y \mid x)\, p_{eq}(x). \tag{26}
\]
In statistical physics, we usually know the stochastic process and need to find the equilibrium distribution. Here we want the opposite: we know the density $p_{eq}$ and want to design a suitable process W which generates states distributed according to $p_{eq}$. The idea of M(RT)² is to write this process as a combination of proposal and acceptance terms T and A,
\[
W(x \mid y) = A(x \mid y)\, T(x \mid y). \tag{27}
\]
The proposal distribution T(x | y) proposes the transition from y to x. It must normalize to 1:
\[
\int T(x \mid y)\, dx = 1.
\]

The proposed state x is then accepted with acceptance probability A(x | y). The acceptance must be chosen to satisfy the detailed balance condition Eq. (26),
\[
A(x \mid y)\, T(x \mid y)\, p_{eq}(y) = A(y \mid x)\, T(y \mid x)\, p_{eq}(x).
\]
We now define
\[
q(x \mid y) = \frac{T(y \mid x)\, p_{eq}(x)}{T(x \mid y)\, p_{eq}(y)}. \tag{28}
\]
Note that q(x | y) ≥ 0. It is easy to check that the detailed balance condition is satisfied for
\[
A(x \mid y) = \min\big(1,\, q(x \mid y)\big).
\]
The algorithm used to generate one sample from a given state is summarized in Algorithm 3. Note that it is sufficient to know $p_{eq}$ only up to a constant factor; indeed, in the M(RT)² algorithm, $p_{eq}$ appears only in a ratio (see Eq. (28)).

Algorithm 3 One step of the Metropolis-Hastings sampling algorithm.
Input: current state y, proposal density T, target density p_eq
Output: next state
function Metropolis Hastings step(y, T, p_eq)
    generate x ∼ T(· | y)                        ▷ Propose a new state
    set q ← T(y | x) p_eq(x) / (T(x | y) p_eq(y))
    if q > 1 then
        return x
    else
        generate U ∼ U(0, 1)
        if U < q then                            ▷ Accept with probability A(x | y) = q
            return x                             ▷ Accept the new state
        else
            return y                             ▷ Reject the new state
        end if
    end if
end function
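A sketch of this step (not from the original notes) with a symmetric Gaussian random-walk proposal T(x | y) = N(x | y, s²), for which the ratio in Eq. (28) reduces to q = p_eq(x)/p_eq(y). The target p_eq(x) ∝ exp(−x⁴) is an assumed example, known only up to a constant, as the algorithm requires:

import numpy as np

rng = np.random.default_rng(5)
p_eq = lambda x: np.exp(-x**4)       # unnormalized target density

def mh_step(y, s, rng):
    x = y + s * rng.standard_normal()        # propose x ~ T(.|y)
    q = p_eq(x) / p_eq(y)                    # acceptance ratio (symmetric proposal)
    return x if rng.uniform() < q else y     # accept with probability min(1, q)

chain = np.empty(50000)
chain[0] = 0.0
for n in range(1, len(chain)):
    chain[n] = mh_step(chain[n - 1], s=1.0, rng=rng)

print(f"mean = {chain.mean():.3f}")          # symmetric target, so mean ≈ 0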

We will now demonstrate that the M(RT)² algorithm indeed converges to the desired distribution $p_{eq}$. We define the probability densities $\phi_i$ of each random variable $x_i$ in the sequence. Given $\phi_n$, we can write $\phi_{n+1}(x)$ as a sum of two contributions:

• the probability of accepting a new state,
\[
P(\text{“previous state was not } x\text{”}) = \int A(x \mid y)\, T(x \mid y)\, \phi_n(y)\, dy,
\]

• the probability of not moving away from x, i.e., rejecting the proposed state,
\[
P(\text{“previous state was } x\text{”}) = \phi_n(x) \int \big(1 - A(y \mid x)\big)\, T(y \mid x)\, dy.
\]

Combining the above contributions gives
\[
\phi_{n+1}(x) = \int A(x \mid y)\, T(x \mid y)\, \phi_n(y)\, dy + \phi_n(x) \int \big(1 - A(y \mid x)\big)\, T(y \mid x)\, dy. \tag{29}
\]
It can be shown that this recursive relation gives an ergodic system (the system will return to states already visited with probability one, and every state is aperiodic). Therefore, according to Theorem 2 (see also [5]), there exists a unique equilibrium distribution to which the recursion Eq. (29) converges.

Theorem 2 (Feller). If a random variable defines an ergodic system, then there exists a unique probability distribution $p_{eq}$ that is a fixed point of the above recursion.

Proof. We will now show that the fixed point is indeed $p_{eq}$. We substitute $\phi_n = p_{eq}$ in Eq. (29) and obtain
\[
\phi_{n+1}(x) = \int A(x \mid y)\, T(x \mid y)\, p_{eq}(y)\, dy + p_{eq}(x) \int \big(1 - A(y \mid x)\big)\, T(y \mid x)\, dy
= \int \big[ A(x \mid y)\, T(x \mid y)\, p_{eq}(y) - A(y \mid x)\, T(y \mid x)\, p_{eq}(x) \big]\, dy + p_{eq}(x) \int T(y \mid x)\, dy
= p_{eq}(x) \int T(y \mid x)\, dy
= p_{eq}(x),
\]
where we used the detailed balance condition Eq. (26). Therefore, $p_{eq}$ is the asymptotic density distribution of the random walk.

References

[1] Devinderjit Sivia and John Skilling. Data Analysis: A Bayesian Tutorial. Oxford University Press, 2006.

[2] Derrick H. Lehmer. Euclid's algorithm for large numbers. American Mathematical Monthly, pages 227–233, 1938.

[3] Donald E. Knuth. The Art of Computer Programming, Volume 2: Seminumerical Algorithms. 1981.

[4] Nicholas Metropolis, Arianna W. Rosenbluth, Marshall N. Rosenbluth, Augusta H. Teller, and Edward Teller. Equation of state calculations by fast computing machines. The Journal of Chemical Physics, 21(6):1087–1092, 1953.

[5] Francesco Petruccione and Peter Biechele. Stochastic Methods for Physics Using Java: An Introduction. 2000.

