Lectures 6

Monte Carlo Methods

∗ Monte Carlo methods are a way of approximating the value of an integral using large samples of random variables.

∗ These samples of random variables are typically computer generated.

∗ Since we regularly need to calculate integrals in Bayesian inference, Monte Carlo methods are very popular in that setting.

6-1
Monte Carlo Integration

∗ Suppose that we need to evaluate

      I = ∫_A h(x) dx

∗ Let f be a probability density function with support A.

∗ Then we can write

      I = ∫_A [h(x)/f(x)] f(x) dx = ∫_A g(x) f(x) dx

  where g(x) = h(x)/f(x).

∗ Now if Y is a random variable with pdf f then we have

      I = E[g(Y)].

6-2
Monte Carlo Integration

∗ Hence if we have Y_1, . . . , Y_N iid ∼ f, the Weak Law of Large Numbers tells us that

      Î = (1/N) Σ_{i=1}^{N} g(Y_i) → I   in probability.

∗ We can therefore use Î to approximate I very well for large enough N.

∗ Furthermore, we can estimate the variability in Î using the sample variance of the random sample g(Y_1), . . . , g(Y_N) divided by N.

∗ Since N is totally in our control, we can choose N to be large enough to make the variability as low as we desire.

6-3
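The estimator Î and its estimated standard error can be sketched in a few lines of Python. The particular integral ∫₀¹ e⁻ˣ dx (true value 1 − e⁻¹ ≈ 0.6321) and the choice f = Uniform(0,1), so that g(x) = h(x)/f(x) = e⁻ˣ, are assumptions made purely for illustration:

```python
import math
import random

def mc_integral(g, n, rng):
    """Estimate I = E[g(Y)] with Y ~ Uniform(0,1), plus a standard error."""
    samples = [g(rng.random()) for _ in range(n)]
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / (n - 1)  # sample variance of g(Y_i)
    return mean, math.sqrt(var / n)                        # Î and its estimated s.e.

rng = random.Random(0)
i_hat, se = mc_integral(lambda x: math.exp(-x), 100_000, rng)
# i_hat should be close to 1 - exp(-1), and se shrinks like 1/sqrt(N)
```

Doubling N roughly halves the variance, which is how N can be chosen to meet a desired precision.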
Generating Uniform Random Variates

∗ Computers are unable to generate truly random numbers.

∗ They can, however, be used to generate pseudo-random numbers.

∗ These are sequences of numbers which are generated from a deterministic algorithm but which behave like a sequence of iid random variates.

∗ Typically the numbers generated by a computer can be thought of as coming from a Uniform(0,1) distribution.

6-4
Generating Non-Uniform Random Numbers

∗ Uniform random variates are rarely what we need for simulation or Monte Carlo inference.

∗ They are, however, the building blocks for generating random variates from any other distribution.

∗ Much of this is based on the following theorem.

Theorem 6.1 (Probability Integral Transform)
Suppose that U ∼ Uniform(0, 1) and that F is a continuous cdf with unique inverse F⁻¹. Then the random variable

      Y = F⁻¹(U)

has a distribution with cdf F.

6-5
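Theorem 6.1 can be sketched directly in code. The Exponential distribution is an assumed example where F⁻¹ is available in closed form: F(y) = 1 − e^(−rate·y), so F⁻¹(u) = −log(1 − u)/rate.

```python
import math
import random

def exponential_variate(rate, rng):
    u = rng.random()                  # U ~ Uniform(0, 1)
    return -math.log(1.0 - u) / rate  # Y = F^{-1}(U) has cdf 1 - exp(-rate*y)

rng = random.Random(0)
sample = [exponential_variate(2.0, rng) for _ in range(50_000)]
mean = sum(sample) / len(sample)      # should be close to 1/rate = 0.5
```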
Generating Discrete Random Variates

∗ The method as described above requires that we have a continuous cdf.

∗ A similar technique can also be used to generate discrete random variables.

∗ Suppose that p(y) is the probability mass function and the support of the random variable is Y = {y : p(y) > 0}. Then we can define the inverse cdf as

      F⁻¹(u) = min{y ∈ Y : F(y) ⩾ u}

∗ Then if U ∼ Uniform(0, 1), the random variable Y = F⁻¹(U) will be distributed with probability mass function p(y).

6-6
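The discrete inverse cdf amounts to walking along the cumulative probabilities until they first reach u. The pmf p(1) = 0.2, p(2) = 0.5, p(3) = 0.3 below is an assumed example:

```python
import random

def discrete_variate(values, probs, rng):
    """Return min{y : F(y) >= u} for u ~ Uniform(0,1)."""
    u = rng.random()
    cum = 0.0
    for y, p in zip(values, probs):  # accumulate the cdf over the support
        cum += p
        if cum >= u:
            return y
    return values[-1]                # guard against floating-point rounding

rng = random.Random(1)
draws = [discrete_variate([1, 2, 3], [0.2, 0.5, 0.3], rng) for _ in range(100_000)]
freq2 = draws.count(2) / len(draws)  # should be close to p(2) = 0.5
```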
Special Methods

∗ In many cases, the inverse cdf is not available in closed form and so this method cannot be used. We can often, however, use algorithms based on transformations for such situations.

∗ Suppose that U1 and U2 are two independent Uniform(0, 1) random variables; then it is easy to show that

      Y1 = √(−2 log U1) sin(2πU2)   and   Y2 = √(−2 log U1) cos(2πU2)

are independent standard normal random variables.

∗ This is known as the Box-Muller Algorithm.

6-7
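The Box-Muller transformation translates line for line into code: two uniforms in, two independent standard normals out.

```python
import math
import random

def box_muller(rng):
    u1, u2 = rng.random(), rng.random()
    r = math.sqrt(-2.0 * math.log(u1))       # common radius term
    return (r * math.sin(2 * math.pi * u2),  # Y1
            r * math.cos(2 * math.pi * u2))  # Y2

rng = random.Random(0)
zs = [z for _ in range(50_000) for z in box_muller(rng)]
mean = sum(zs) / len(zs)                # should be near 0
var = sum(z * z for z in zs) / len(zs)  # should be near 1
```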
Accept/Reject Algorithm

∗ A more general technique which is useful when the inverse cdf method cannot be applied is called the Accept/Reject Algorithm.

∗ This method relies on generating a different random variable V which has the same support as the required variable Y.

∗ We also require that the ratio of densities is bounded by a known constant

      M = sup_y f_Y(y)/f_V(y) < ∞

6-8
Accept/Reject Algorithm

1. Calculate M = sup_y f_Y(y)/f_V(y).

2. Generate V ∼ f_V and independently U ∼ Uniform(0, 1).

3. If

      U < f_Y(V) / (M f_V(V))

   then set Y = V. Otherwise discard U and V and return to step 2.

6-9
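The three steps above can be sketched for an assumed target, the Beta(2,2) density f_Y(y) = 6y(1 − y) on (0,1), with V ∼ Uniform(0,1) as the proposal (f_V = 1), so that M = sup_y f_Y(y)/f_V(y) = 1.5, attained at y = 1/2:

```python
import random

def beta22_variate(rng, m=1.5):
    """Accept/reject draw from Beta(2,2) using a Uniform(0,1) proposal."""
    while True:
        v = rng.random()             # step 2: V ~ f_V ...
        u = rng.random()             # ... and independently U ~ Uniform(0,1)
        if u < 6 * v * (1 - v) / m:  # step 3: accept iff U < f_Y(V)/(M f_V(V))
            return v                 # accepted draw has density f_Y

rng = random.Random(0)
draws = [beta22_variate(rng) for _ in range(50_000)]
mean = sum(draws) / len(draws)       # Beta(2,2) has mean 1/2
```

On average each accepted draw costs M = 1.5 proposals, which is why a small M (a proposal close to the target) is desirable.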
Markov Chain Monte Carlo Methods

∗ Many of the methods described so far are not very useful for generating multivariate random variates.

∗ Markov Chain Monte Carlo methods are now widely used in these settings.

∗ The methods work on the idea of constructing a Markov chain which has a stationary distribution equal to the distribution of interest.

∗ Under certain conditions, the distribution of the elements in such a chain will converge to this stationary distribution.

6-10
Markov Chain Monte Carlo Methods

∗ These algorithms start with some initial value for the random variable of interest.

∗ They then run a carefully constructed Markov chain starting from that initial value for a sufficiently long time.

∗ It is not always easy to know how long the chains should be run, but various diagnostics have been proposed.

∗ Any observations in the chain after this burn-in period may be considered as (at least approximately) distributed with the stationary distribution.

6-11
Metropolis–Hastings Algorithm

∗ First introduced in statistical physics in 1953 by Metropolis et al. Statistical properties shown by Hastings in 1970.

∗ It is basically a Markov chain version of the accept/reject algorithm.

∗ Random variates are generated from some candidate distribution conditional on the current state of the chain; the new state is then either accepted, or rejected, in which case the chain stays where it is.

6-12
Metropolis–Hastings Algorithm

Suppose we wish to sample Y ∼ f_Y.

First initialise the chain with some value Y^(0).

Then for t = 1, 2, . . . we generate Y^(t) by

1. Generate V^(t) ∼ f_{V|Y}(v | Y^(t−1)).

2. Calculate the acceptance probability

      ρ_t = min{ [f_Y(V^(t)) / f_Y(Y^(t−1))] × [f_{V|Y}(Y^(t−1) | V^(t)) / f_{V|Y}(V^(t) | Y^(t−1))] , 1 }

3. Generate U_t ∼ Uniform(0, 1) and set

      Y^(t) = V^(t)      if U_t ⩽ ρ_t
      Y^(t) = Y^(t−1)    if U_t > ρ_t

6-13
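The three steps can be sketched in code. The target and candidate below are assumptions for illustration: a standard normal target f_Y(y) ∝ exp(−y²/2) and a deliberately non-symmetric candidate V | Y = y ∼ Normal(y/2, 1), so the full ratio in step 2 is needed. Working on the log scale avoids underflow, and note that ρ_t only requires f_Y up to a normalising constant.

```python
import math
import random

def log_f_y(y):
    return -0.5 * y * y              # log target density, up to a constant

def log_f_v_given_y(v, y):
    return -0.5 * (v - y / 2) ** 2   # log candidate density, up to a constant

def mh_chain(n, y0, rng):
    y, chain = y0, []
    for _ in range(n):
        v = rng.gauss(y / 2, 1.0)                        # step 1: V^(t) ~ f_{V|Y}(. | y)
        log_rho = (log_f_y(v) - log_f_y(y)
                   + log_f_v_given_y(y, v)               # f_{V|Y}(Y^(t-1) | V^(t))
                   - log_f_v_given_y(v, y))              # f_{V|Y}(V^(t) | Y^(t-1))
        if math.log(rng.random()) <= min(log_rho, 0.0):  # step 3: accept with prob rho_t
            y = v
        chain.append(y)              # a rejected move repeats the current state
    return chain

rng = random.Random(0)
chain = mh_chain(100_000, 0.0, rng)[5_000:]   # discard a burn-in period
mean = sum(chain) / len(chain)                # should be near 0
var = sum(y * y for y in chain) / len(chain)  # should be near 1
```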
Independence Metropolis–Hastings Algorithm

∗ It is often convenient to generate V^(t) from the same distribution at every iteration.

∗ In this case we have f_{V|Y}(v | Y^(t−1)) = f_V(v) and so the acceptance probability becomes

      ρ_t = min{ [f_Y(V^(t)) / f_Y(Y^(t−1))] × [f_V(Y^(t−1)) / f_V(V^(t))] , 1 }

          = min{ [f_Y(V^(t)) / f_V(V^(t))] × [f_V(Y^(t−1)) / f_Y(Y^(t−1))] , 1 }

6-14
Random Walk Metropolis–Hastings Algorithm

∗ Another special case is where f_{V|Y}(v | y) = f_Z(v − y) where f_Z is a distribution symmetric about 0.

∗ We generate Z^(t) ∼ f_Z and set V^(t) = Y^(t−1) + Z^(t).

∗ In stochastic processes this is called a random walk.

∗ The acceptance probability for the Metropolis–Hastings algorithm then becomes

      ρ_t = min{ f_Y(V^(t)) / f_Y(Y^(t−1)) , 1 }

6-15
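The random walk special case is particularly short in code because the candidate densities cancel. The increment Z ∼ Normal(0, 1) is symmetric about 0, and the Exponential(1) target f_Y(y) = e⁻ʸ for y > 0 is an assumed example:

```python
import math
import random

def log_target(y):
    return -y if y > 0 else -math.inf    # log f_Y up to a constant; 0 density off support

def rw_mh(n, y0, rng, step=1.0):
    y, chain = y0, []
    for _ in range(n):
        v = y + rng.gauss(0.0, step)     # V^(t) = Y^(t-1) + Z^(t)
        # accept with probability min{f_Y(V)/f_Y(Y), 1}, on the log scale
        if math.log(rng.random()) <= log_target(v) - log_target(y):
            y = v
        chain.append(y)
    return chain

rng = random.Random(0)
chain = rw_mh(100_000, 1.0, rng)[5_000:]  # discard a burn-in period
mean = sum(chain) / len(chain)            # Exponential(1) has mean 1
```

Proposals outside the support (v ⩽ 0) get ρ_t = 0 and are always rejected, so the chain never leaves (0, ∞).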
Gibbs Sampler

∗ The Gibbs Sampler (Geman & Geman, 1984) is designed to generate observations from a complex multivariate distribution.

∗ The Markov chain is constructed by considering the univariate conditional distributions.

∗ Suppose that the random vector of interest is Y = (Y_1, . . . , Y_d) and that we can generate observations from the full conditional distributions

      f_j(y | Y_{−j} = y_{−j}) = f_{Y_j | Y_{−j}}(y | Y_{−j} = y_{−j}),   j = 1, . . . , d

  where Y_{−j} = (Y_1, . . . , Y_{j−1}, Y_{j+1}, . . . , Y_d).

6-16
Gibbs Sampler
 
Initialise the chain to some value Y^(0) = (Y_1^(0), . . . , Y_d^(0)).

For t = 1, 2, . . .

1. Generate Y_1^(t) from f_1(y_1 | Y_2^(t−1), . . . , Y_d^(t−1)).

2. Generate Y_2^(t) from f_2(y_2 | Y_1^(t), Y_3^(t−1), . . . , Y_d^(t−1)).
...
j. Generate Y_j^(t) from f_j(y_j | Y_1^(t), . . . , Y_{j−1}^(t), Y_{j+1}^(t−1), . . . , Y_d^(t−1)).
...
d. Generate Y_d^(t) from f_d(y_d | Y_1^(t), . . . , Y_{d−1}^(t)).

Then we set Y^(t) = (Y_1^(t), . . . , Y_d^(t)).

6-17
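The sweep above can be sketched for an assumed target with d = 2: a bivariate normal with zero means, unit variances and correlation r = 0.8, for which both full conditionals are known in closed form, Y_j | Y_{−j} = y ∼ Normal(r·y, 1 − r²).

```python
import math
import random

def gibbs(n, rng, r=0.8):
    y1, y2, chain = 0.0, 0.0, []        # initialise Y^(0) = (0, 0)
    sd = math.sqrt(1 - r * r)           # conditional standard deviation
    for _ in range(n):
        y1 = rng.gauss(r * y2, sd)      # step 1: Y1^(t) ~ f_1(y1 | Y2^(t-1))
        y2 = rng.gauss(r * y1, sd)      # step 2: Y2^(t) ~ f_2(y2 | Y1^(t))
        chain.append((y1, y2))          # Y^(t) = (Y1^(t), Y2^(t))
    return chain

rng = random.Random(0)
chain = gibbs(100_000, rng)[5_000:]                 # discard a burn-in period
corr = sum(a * b for a, b in chain) / len(chain)    # E[Y1 Y2] = r = 0.8 here
```

Each component is updated using the most recent values of the others, which is exactly what makes the sweep a valid Markov chain with the joint distribution as its stationary distribution.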
