Sampling Methods: Søren Højsgaard
Søren Højsgaard
Department of Mathematical Sciences
June 3, 2014
Contents
1 Introduction – Bayesian modelling
1 Introduction – Bayesian modelling
• In a Bayesian setting, parameters are treated as random quantities on equal footing
with the random variables.
• The joint distribution of a parameter (vector) θ and data (vector) y is specified
through a prior distribution π(θ) for θ and a conditional distribution p(y | θ) of data
for a fixed value of θ.
• This leads to the joint distribution for data AND parameters
p(y, θ) = p(y | θ) π(θ)
• The prior distribution π(θ) represents our knowledge (or uncertainty) about θ before
data have been observed.
• After data have been observed, interest centres on the posterior distribution
π*(θ) = p(θ | y) ∝ p(y | θ) π(θ), which is typically known only up to a normalizing
constant, so posterior expectations can rarely be computed analytically.
• In such cases one will often resort to sampling based methods: If we can draw samples
θ^(1), . . . , θ^(N) from π*(θ) we can do just as well:
E(g(θ) | π*) ≈ (1/N) Σ_i g(θ^(i))
(a small numerical illustration is given at the end of this list).
• The question is then how to draw samples from π*(θ) where π*(θ) is only known up
to the normalizing constant.
• There are many methods for achieving this; these methods are known as Markov
Chain Monte Carlo (MCMC) methods and will be described elsewhere.
• Sections marked with “*” in the following can be skipped at first reading.
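As a small illustration of the Monte Carlo approximation above (not taken from the notes; the Beta distribution is just an assumed example where the exact expectation is known):
> ## Monte Carlo estimate of E(theta) for theta ~ Beta(3, 7)
> N  <- 10000
> th <- rbeta(N, 3, 7)   ## samples theta^(1), ..., theta^(N)
> mean(th)               ## Monte Carlo estimate of E(theta)
> 3/(3 + 7)              ## exact value, for comparison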
2 Computations using Monte Carlo methods
Consider a random vector X with density / probability mass function p(x) which is the
target distribution (from which we want to sample).
In many real world applications
p(x) = k(x)/c
where k(x) can be evaluated pointwise but the normalizing constant c is unknown or hard to compute.
We reserve h(x) for a proposal distribution which is a distribution from which we can
draw samples.
2.1 Rejection sampling
Choose a constant M such that k(x) ≤ M h(x) for all x. Then repeat the following steps:
1. Draw a proposal x from h.
2. Draw u from a uniform distribution on (0, 1) and set α = k(x)/(M h(x)).
3. If u < α, accept x; otherwise discard x and return to step 1.
The accepted values are draws from the target p.
2.2 Example: Rejection sampling
> k <- function(x, a=.4, b=.08){exp(a*(x-a)^2 - b*x^4)}
> x <- seq(-4, 4, 0.1)
> plot(x,k(x),type="l")
[Figure: the target k(x) over [-4, 4] (left), and a barplot of the rejection-sampling draws (right).]
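A minimal sketch of the sampling step itself, assuming a uniform proposal h on [-4, 4] and the bound M = 24 (an assumed value: k stays below about 3 on this interval and h = 1/8 there, so k(x) ≤ M h(x)):
> N  <- 10000
> h  <- function(x){ dunif(x, -4, 4) }    ## proposal density
> M  <- 24                                ## bound: k(x) <= M*h(x) on [-4, 4]
> xprop <- runif(N, -4, 4)                ## step 1: draw proposals from h
> u     <- runif(N)                       ## step 2: uniforms for the accept test
> acc   <- u < k(xprop)/(M*h(xprop))      ## step 3: accept if u < alpha
> x.acc <- xprop[acc]
> mean(acc)                               ## acceptance rate
> hist(x.acc, prob=TRUE, breaks=40)
The quiz below refers to this M and to the number of accepted samples, length(x.acc).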
2.3 QUIZ
Reuse the code from above to answer these questions, but please think about what the
results would be before executing the code.
• Suppose we could not easily determine M and hence had to make a conservative
choice; say M = 100 or M = 500 in this context.
What effect would that have on the number of accepted samples, and how would
you have to compensate?
• Suppose we take the proposal distribution h() to be uniform on [-10, 10]. What
effect would that have on the acceptance rate? What if the proposal distribution is
a N(0, 1)? What is the quality of the samples in this case? Hint: Use dnorm() to
evaluate the normal density.
• This scheme also works if p is only known up to proportionality (because the
normalizing constant cancels out in step 3 above).
• Samples from h which "fit best to p" are those most likely to appear in the resample.
However, if h is a poor approximation to p then the "best samples from h" are not
necessarily good samples in the sense of resembling p.
• This leads to schemes (described below) for drawing samples x_1, . . . , x_N; under
certain conditions these samples form an ergodic Markov chain with p(x) as its
stationary distribution.
• Hence, the expected value of any function g of x can be calculated approximately as
∫ g(x) p(x) dx ≈ (1/N) Σ_i g(x_i).
Example: x = x0 + N(0, σ²) (a random walk proposal).
The independence sampler (a special case of the Metropolis–Hastings algorithm): The
proposal h(x | x0) = h(x) does not depend on x0. The acceptance probability becomes
α = min{1, (p(x)/p(x_{t-1})) · (h(x_{t-1})/h(x))}.
For this sampler to work well, h should be a good approximation to p.
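As an illustration (not code from the notes), a minimal independence sampler for the target k() from the rejection sampling example, with a N(0, 2^2) proposal as an assumed choice:
> N <- 10000
> x.ind <- rep.int(NA, N)
> xc <- 0
> for (ii in 1:N){
    xp <- rnorm(1, mean=0, sd=2)              ## proposal; does not depend on xc
    alpha <- min(1, (k(xp)/k(xc)) *
                 (dnorm(xc, 0, 2)/dnorm(xp, 0, 2)))
    if (runif(1) < alpha) xc <- xp
    x.ind[ii] <- xc
  }
> hist(x.ind, prob=TRUE, breaks=40)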
3.3 Example: Metropolis–Hastings algorithm
Random walk Metropolis is straightforward to implement:
> N <- 10000
> x.acc5 <- rep.int(NA, N)
> u <- runif(N)
> acc.count <- 0
> std <- 0.05 ## Spread of proposal distribution
> xc <- 0; ## Starting value
> for (ii in 1:N){
    xp <- rnorm(1, mean=xc, sd=std)   ## proposal
    alpha <- min(1, (k(xp)/k(xc)) *
             (dnorm(xc, mean=xp, sd=std)/dnorm(xp, mean=xc, sd=std)))
    x.acc5[ii] <- xc <- ifelse(u[ii] < alpha, xp, xc)
    ## find number of accepted proposals:
    acc.count <- acc.count + (u[ii] < alpha)
  }
> ## Fraction of accepted *new* proposals
> acc.count/N
[1] 0.9846
> par(mfrow=c(1,2), mar=c(2,2,1,1))
> plot(x,k(x),type="l")
> barplot(table(round(x.acc5,1))/length(x.acc5))
3.4 Example: capture–recapture
A number n of individuals are caught and marked. In a second sample, m of the marked
individuals are recaptured and u unmarked individuals are caught; the total number U of
unmarked individuals in the population is unknown. Writing θ for the capture probability
we assume
m ~ bin(n, θ), u ~ bin(U, θ)
So we get
p(m | θ) ~ bin(n, θ),    p(u | θ, U) ~ bin(U, θ)
To complete the model specification we must specify prior distributions for θ and U. These
must reflect our prior knowledge of the problem.
The joint density of data (m, u) and the parameters (θ, U) is then
p(m, u, θ, U) ∝ (U choose u) θ^(m+u) (1 − θ)^(n+U−(m+u)) π_θ(θ) π_U(U)
To fit in with the current notation let x1 = θ, x2 = U and x = (x1, x2). Also notice that
data (m, u) is fixed so we need not write that in the posterior.
p*(x1, x2) ∝ (x2 choose u) x1^(m+u) (1 − x1)^(n+x2−(m+u)) π_x1(x1) π_x2(x2) = k(x1, x2)
> logk <- function(x1, x2, n_, m_, u_){
    R_ <- m_ + u_
    R_*log(x1) + (n_+x2-R_)*log(1-x1) + lchoose(x2, u_) +
      log(dunif(x1, .0, .2)) +        ## log prior for x1 = theta: uniform on [0, 0.2]
      log(disc.pmf(x2, 500, 2000))    ## log prior for x2 = U: discrete uniform on 500, ..., 2000
  }
> disc.pmf <- function(x, a, b){      ## discrete uniform pmf on a, a+1, ..., b
    ifelse(x>=a & x<=b, 1/(b-a+1), 0)
  }
> n_ <- 100
> m_ <- 20
> u_ <- 180
> NN <- 10000 ## Number of samples
> u <- runif(NN)
> th.prop <- runif(NN, .0, 0.5)
> U.prop <- sample(300:3000, NN, replace=T)
> out <- matrix(NA, NN,2)
> xc <- c(0.2, 1500)
> acc.count <- 0
> for (i in 1:NN){
    xp <- c(th.prop[i], U.prop[i])    ## independent uniform proposal for (theta, U)
    alpha <- min(1, exp(logk(xp[1], xp[2], n_, m_, u_) -
                        logk(xc[1], xc[2], n_, m_, u_)))
    xc <- if (u[i] < alpha) xp else xc
    out[i, ] <- xc
    acc.count <- acc.count + (u[i] < alpha)
  }
> acc.count / NN ## Not impressive acceptance ratio
[1] 0.0128
> summary(out[,1])
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.103 0.158 0.168 0.166 0.180 0.200
> summary(out[,2])
Min. 1st Qu. Median Mean 3rd Qu. Max.
813 991 1090 1110 1200 1750
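Since the total population size in this example is N = n + U (cf. the capture–recapture table later in the notes), its posterior can be summarised directly from the U samples; this line is an added illustration:
> summary(n_ + out[,2])   ## posterior summary of the population size N = n + U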
> par(mfrow=c(1,2), mar=c(2,2,1,1))
> hist(out[,1], prob=T); lines(density(out[,1]), col="red")
> hist(out[,2], prob=T); lines(density(out[,2]), col="red")
[Figure: histograms with density overlays of the posterior samples of θ (out[,1], left) and U (out[,2], right).]
3.5 Quiz
Using the code from the slides, experiment with the following:
• Set m = 2 and u = 18. How does that affect the posterior distribution? What if you
set m = 40 and u = 360?
• Experiment with narrowing and widening the range of the proposal distributions.
What effect does that have on the output?
• Try changing the prior for U to a Poisson distribution. Hint: dpois is your friend.
• Experiment with changing the number of samples. How many do you need to produce
“nice” histograms?
Notice:
• Item 3. can be restated as: with probability α set x2^t equal to the proposed value;
with probability 1 − α set x2^t = x2^(t−1).
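To make the single-component scheme concrete, here is a sketch (not code from the notes) for the capture–recapture example, where theta and U are updated one at a time with symmetric random walk proposals; the proposal widths (0.02 and 20) are arbitrary illustrative choices:
> NN   <- 10000
> out1 <- matrix(NA, NN, 2)
> xc   <- c(0.15, 1000)                  ## current state (theta, U)
> for (i in 1:NN){
    ## update component 1 (theta) given the current U
    thp <- xc[1] + runif(1, -0.02, 0.02)
    if (thp > 0 && thp < 1){
      a1 <- exp(logk(thp, xc[2], n_, m_, u_) - logk(xc[1], xc[2], n_, m_, u_))
      if (runif(1) < a1) xc[1] <- thp
    }
    ## update component 2 (U) given the current theta
    Up <- xc[2] + sample(-20:20, 1)
    a2 <- exp(logk(xc[1], Up, n_, m_, u_) - logk(xc[1], xc[2], n_, m_, u_))
    if (runif(1) < a2) xc[2] <- Up
    out1[i, ] <- xc
  }
> summary(out1[,1]); summary(out1[,2])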
The Gibbs sampler is a special case of single component Metropolis–Hastings, namely the
case where the proposal distribution h2(x2 | x1^t, x2^(t−1), x3^(t−1)) for updating x2 is chosen to be
p(x2 | x1^t, x3^(t−1)).
Hence for the Gibbs sampler the proposed values are always accepted.
One version of the algorithm is as follows. Suppose a sample x^t = (x1^t, x2^t, x3^t) is available.
1. Sample x1^(t+1) ~ p(x1 | x2^t, x3^t)
2. Sample x2^(t+1) ~ p(x2 | x1^(t+1), x3^t)
3. Sample x3^(t+1) ~ p(x3 | x1^(t+1), x2^(t+1))
• The proposed values are always accepted (because α = 1), so the sampler is very
efficient.
• The sampler requires that we can sample from the conditionals p(x_i | x_−i). In some
cases this is easy; in other cases it is difficult. In general, slice sampling can be
used (and this is what JAGS does).
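A minimal self-contained illustration (not from the notes): a Gibbs sampler for a standard bivariate normal with an assumed correlation rho = 0.8, where both full conditional distributions are known normals:
> rho <- 0.8
> N   <- 5000
> xy  <- matrix(NA, N, 2)
> x1 <- x2 <- 0
> for (i in 1:N){
    ## full conditionals of a standard bivariate normal with correlation rho
    x1 <- rnorm(1, mean=rho*x2, sd=sqrt(1 - rho^2))
    x2 <- rnorm(1, mean=rho*x1, sd=sqrt(1 - rho^2))
    xy[i, ] <- c(x1, x2)
  }
> cor(xy)   ## off-diagonal entries should be close to rho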
Notice: k() is practically zero outside [-4, 4] and in this interval k() takes values in, say,
[0, 3].
Slice sampling is based on the following idea: Sample uniformly in a “large enough” window:
> N <- 5000
> xs <- runif(N, -4, 4)
> ys <- runif(N, 0, 3)
> plot(x,k(x), type='l', lwd=2, col=2)
> points(xs,ys, pch=".")
Keeping the x-values of the points that fall below the curve gives (approximate) samples from the normalized target:
> xg <- xs[ys < k(xs)]   ## keep points under the curve
> hist(xg)
The algorithm goes as follows: given a sample x^t, pick y uniformly in [0, k(x^t)].
> xt<-1; y <- runif(1, 0, k(xt))
> plot(x,k(x), type='l', lwd=2, col=2)
> abline(v=xt, col='green'); abline(h=y, col='blue')
Let S = {x : k(x) ≥ y} be the set of x-values for which the point (x, y) lies below the curve.
Sample x^(t+1) uniformly from S.
1. Sample y uniformly from ]0, k(x^t)]. This defines a horizontal "slice" S = {x : k(x) ≥ y}.
2. Find an interval I = [L, R] around x^t that contains (most of) the slice.
3. Sample x^(t+1) uniformly from the part of the slice within I.
The last two steps can be implemented in many ways. We need an interval width w (chosen
by us).
“Stepping out”: Position an interval of length w randomly around x^t and denote it by
I = [L, R]. Expand both ends in steps of size w until both ends are outside the slice, i.e.
until k(L) < y and k(R) < y. Sample x^(t+1) from the part of the slice within I. (That is,
sample uniformly from I; if a sample falls outside S, just sample again.)
“Doubling”: Position an interval of length w around x^t and denote it by I = [L, R]. Double
the interval until both ends are outside the slice, i.e. until k(L) < y and k(R) < y. Sample
x^(t+1) from the part of the slice within I.
> sliceSample_real <- function(k, xc, w){
    kc <- k(xc)
    y  <- runif(1, 0, kc)
    a  <- runif(1)                ## place w randomly around xc
    l  <- xc - a*w
    u  <- xc + (1-a)*w
    kl <- k(l)
    while (kl > y){               ## expand interval to the left if necessary
      l <- l - w; kl <- k(l)
    }
    ku <- k(u)
    while (ku > y){               ## expand interval to the right if necessary
      u <- u + w; ku <- k(u)
    }
    xp <- runif(1, l, u)          ## propose xp
    kp <- k(xp)
    while (kp < y){               ## shrink interval when xp falls outside the slice
      if (xp < xc) l <- xp else u <- xp
      xp <- runif(1, l, u)
      kp <- k(xp)
    }
    xp
  }
> N <- 3000
> out <- rep.int(NA, N)
> x <- 1
> for (i in 1:N){
    x <- sliceSample_real(k, x, w=1)
    out[i] <- x
  }
> hist( out )
• Sampling on a discrete set a, a + 1, a + 2, . . . , a + b (an integer-valued variable) can be
handled with the same slice sampling idea; see the sketches below.
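The Gibbs/slice code for the capture–recapture example further below uses sliceSample_unit() (for a variable on [0, 1]) and sliceSample_int() (for an integer-valued variable). Minimal sketches of how such samplers could look, assuming they follow the same stepping-out/shrinkage scheme as sliceSample_real():
> sliceSample_unit <- function(k, xc, w){
    ## slice sampling on [0, 1]: as sliceSample_real, but the interval is clipped to [0, 1]
    y <- runif(1, 0, k(xc))
    a <- runif(1)
    l <- max(0, xc - a*w); u <- min(1, xc + (1-a)*w)
    while (l > 0 && k(l) > y) l <- max(0, l - w)
    while (u < 1 && k(u) > y) u <- min(1, u + w)
    repeat{
      xp <- runif(1, l, u)
      if (k(xp) >= y) return(xp)
      if (xp < xc) l <- xp else u <- xp          ## shrink towards xc
    }
  }
> sliceSample_int <- function(k, xc, w){
    ## slice sampling on the integers: interval ends and proposals are integer-valued
    y <- runif(1, 0, k(xc))
    l <- xc - (sample.int(w, 1) - 1)             ## place w integers randomly around xc
    u <- l + w
    while (k(l) > y) l <- l - w                  ## step out to the left
    while (k(u) > y) u <- u + w                  ## step out to the right
    repeat{
      xp <- l + sample.int(u - l + 1, 1) - 1     ## uniform on the integers l, ..., u
      if (k(xp) >= y) return(xp)
      if (xp < xc) l <- xp + 1 else u <- xp - 1  ## shrink towards xc
    }
  }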
The data for the capture–recapture example can be summarised in the following table:

             recaptured   not recaptured   total
  marked     m = 20       n - m = 80       n = 100
  unmarked   u = 180      ?                U (unknown)
  total      R = 200      ?                N (unknown)
So we get
p(m | θ) ~ bin(n, θ),    p(u | θ, U) ~ bin(U, θ)
To complete the model specification we must specify prior distributions for θ and U. These
must reflect our prior knowledge of the problem.
The joint density of data (m, u) and the parameters (θ, U) is then
p(m, u, θ, U) ∝ (U choose u) θ^(m+u) (1 − θ)^(n+U−(m+u)) π_θ(θ) π_U(U)
To fit in with the current notation let x1 = θ, x2 = U and x = (x1, x2). Also notice that
data (m, u) is fixed so we need not write that in the posterior.
p*(x1, x2) ∝ (x2 choose u) x1^(m+u) (1 − x1)^(n+x2−(m+u)) π_x1(x1) π_x2(x2) = k(x1, x2)
> k <- function(x1, x2, n_, m_, u_){
    R_ <- m_ + u_
    z <- R_*log(x1) + (n_+x2-R_)*log(1-x1) + lchoose(x2, u_) +
      + log(dunif(x1, .0, .2)) + log(disc.pmf(x2, 500, 2000))
    exp(z)
  }
> disc.pmf <- function(x, a, b){
    ifelse(x>=a & x<=b, 1/(b-a+1), 0)
  }
> n_ <- 100
> m_ <- 20
> u_ <- 180
> library(doBy)
> kk <- specialize(k, list(n_=n_, m_=m_, u_=u_))
> # Now kk is function only of x1, x2
> args(kk)
function (x1, x2)
NULL
> kk
function (x1, x2)
{
R_ <- 20 + 180
z <- R_ * log(x1) + (100 + x2 - R_) * log(1 - x1) + lchoose(x2,
180) + +log(dunif(x1, 0, 0.2)) + log(disc.pmf(x2, 500,
2000))
exp(z)
}
<environment: 0x07e1c188>
> N <- 10000
> x1t <- .1 # initial values
> x2t <- 1000 # initial values
> out <- matrix(NA, N, 2)
> kk1 <- specialize(kk, list(x2=x2t)); kk1
function (x1)
{
R_ <- 20 + 180
z <- R_ * log(x1) + (100 + 1000 - R_) * log(1 - x1) + lchoose(1000,
180) + +log(dunif(x1, 0, 0.2)) + log(disc.pmf(1000, 500,
2000))
exp(z)
}
<environment: 0x07db16e0>
> kk2 <- specialize(kk, list(x1=x1t)); kk2
function (x2)
{
R_ <- 20 + 180
z <- R_ * log(0.1) + (100 + x2 - R_) * log(1 - 0.1) + lchoose(x2,
180) + +log(dunif(0.1, 0, 0.2)) + log(disc.pmf(x2, 500,
2000))
exp(z)
}
<environment: 0x07d95200>
> for (i in 1:N){
    x1t <- sliceSample_unit(kk1, x1t, w=1)     ## slice update of x1 = theta given x2
    kk2 <- specialize(kk, list(x1=x1t))
    x2t <- sliceSample_int(kk2, x2t, w=10)     ## slice update of x2 = U given x1
    kk1 <- specialize(kk, list(x2=x2t))
    out[i,] <- c(x1t, x2t)
  }
> par(mfrow=c(1,2))
> z<-apply(out, 2, hist)
[Figure: histograms of the sampled values of x1 = θ (left) and x2 = U (right).]
We need to sample x^(t+1) uniformly from the slice S = {x : k(x) ≥ y}. But this is the same
as sampling x^(t+1) uniformly from S = {x : log k(x) ≥ log y = z}, which is numerically much
more convenient when k takes very small values.
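A log-scale variant of sliceSample_real() could therefore look as follows (a sketch; the function name is hypothetical). It takes a function lk returning log k, and assumes lk returns -Inf (rather than NaN) outside the support:
> sliceSample_log <- function(lk, xc, w){
    ## same stepping-out/shrinkage scheme as sliceSample_real(), but all
    ## comparisons are made on the log scale; z plays the role of log(y)
    z <- lk(xc) - rexp(1)        ## equivalent to log(runif(1, 0, k(xc)))
    a <- runif(1)
    l <- xc - a*w
    u <- xc + (1-a)*w
    while (lk(l) > z) l <- l - w
    while (lk(u) > z) u <- u + w
    repeat{
      xp <- runif(1, l, u)
      if (lk(xp) >= z) return(xp)
      if (xp < xc) l <- xp else u <- xp
    }
  }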