
Mathematical Statistics 2 (MS2)

Lecture 2: Sampling from the posterior

Amine Hadji <[email protected]>


Leiden University, February 16, 2022
2 Sampling from the posterior
2.1 Introduction

In this chapter, we will discuss:


• Monte Carlo integration

• Sampling Techniques

• Initiation to MCMC

Introductory problem
Let Y = (Y1, ..., Yn) | θ ∼ N(θ, 1) be a conditionally iid sample, and let θ be a
random variable with a Gamma prior P(θ) = Γ(θ; α, β):

• What is the posterior distribution of θ?

• What is the posterior mean? What is the posterior variance?

• Can we construct a 95%-credible interval for θ?

2.2 Monte Carlo integration

Monte Carlo

Definition 2.1 (Monte Carlo methods)


Monte Carlo methods are a class of computational algorithms that rely
on the Strong Law of Large Numbers to obtain numerical results in:
optimization, numerical integration, and generating draws from a
probability distribution.

Example: the integral \(\int_0^1 x\,dx\) can be approximated by \(\frac{1}{n}\sum_{i=1}^{n} X_i\), where \((X_i)_{i=1}^n \overset{iid}{\sim} U[0,1]\)

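A minimal sketch of this example in Python (NumPy is assumed; the sample size is arbitrary):

```python
import numpy as np

# Monte Carlo approximation of the integral of x over [0, 1] (true value: 0.5).
rng = np.random.default_rng(seed=0)
n = 100_000
x = rng.uniform(0.0, 1.0, size=n)   # (X_i) iid ~ U[0, 1]
estimate = x.mean()                  # (1/n) * sum of X_i
print(estimate)                      # close to 0.5, error of order 1/sqrt(n)
```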

Monte Carlo method

Using a Monte Carlo method to approximate \(\int_a^b f(x)\,dx\) (a can be −∞ and b can be +∞):

1. Verify that the integral is well-defined

2. Find a probability distribution P such that P(x) > 0 for all x ∈ (a, b)

3. Draw \((X_i)_{i=1}^n\) iid from P and approximate the integral by \(\frac{1}{n}\sum_{i=1}^{n} \frac{f(X_i)\,\mathbf{1}_{(a,b)}(X_i)}{P(X_i)}\) for large values of n

By the Central Limit Theorem, the method has a rate of convergence of \(1/\sqrt{n}\)

Warning: If the integral is not well-defined, the method will behave badly!

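A minimal sketch of this recipe, assuming the illustrative choices f(x) = exp(−x²) on (0, ∞) and P = Exp(1); the true value of the integral is √π/2 ≈ 0.886:

```python
import numpy as np

# Monte Carlo integration of f(x) = exp(-x^2) over (0, inf) using P = Exp(1),
# which is positive on the whole integration range; true value sqrt(pi)/2 ≈ 0.8862.
rng = np.random.default_rng(seed=1)
n = 200_000

def f(x):
    return np.exp(-x ** 2)

def p(x):
    return np.exp(-x)                         # Exp(1) pdf on (0, inf)

x = rng.exponential(scale=1.0, size=n)        # X_i iid ~ P; all draws are in (0, inf),
estimate = np.mean(f(x) / p(x))               # so the indicator 1_(0,inf)(X_i) is always 1
print(estimate)                               # close to 0.8862, error of order 1/sqrt(n)
```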

Approximating the posterior mean

We know that the posterior mean is:
\[
\hat{\theta} = E[\theta \mid y] = \int \theta\, P(\theta \mid y)\, d\theta
= \int \theta\, \frac{L(\theta \mid y)\, P(\theta)}{\int L(\vartheta \mid y)\, P(\vartheta)\, d\vartheta}\, d\theta
= \frac{\int \theta\, L(\theta \mid y)\, P(\theta)\, d\theta}{\int L(\theta \mid y)\, P(\theta)\, d\theta}
\approx \frac{\frac{1}{n}\sum_{i=1}^{n} \theta_i\, L(\theta_i \mid y)}{\frac{1}{n}\sum_{i=1}^{n} L(\theta_i \mid y)}
\]

Therefore, we only need to draw \((\theta_i)_{i=1}^n\) from the prior
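A minimal numerical sketch for the introductory Normal-likelihood/Gamma-prior problem; the values of α, β and the simulated data y are illustrative, and the likelihood is handled on the log scale to avoid underflow:

```python
import numpy as np

# Sketch for the introductory problem: Y_1,...,Y_n | theta ~ N(theta, 1) iid and
# theta ~ Gamma(alpha, beta).  alpha, beta and the "observed" data y are illustrative.
rng = np.random.default_rng(seed=2)
alpha, beta = 2.0, 1.0
y = rng.normal(loc=1.5, scale=1.0, size=20)

N = 100_000
theta = rng.gamma(shape=alpha, scale=1.0 / beta, size=N)     # draws from the prior

# log L(theta_i | y), up to a constant that cancels in the ratio below
log_lik = -0.5 * ((y[:, None] - theta[None, :]) ** 2).sum(axis=0)
w = np.exp(log_lik - log_lik.max())                          # likelihood weights, rescaled for stability

post_mean = np.sum(theta * w) / np.sum(w)                    # Monte Carlo estimate of E[theta | y]
print(post_mean)
```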


Approximating a posterior expectation


The previous methods can be used to approximate any expectation E[f(θ) | y] with f : Θ → R measurable.
In particular, we can approximate each of the following (a short sketch follows the list):
• the posterior mean

• the posterior variance

• a posterior probability
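Continuing the illustrative sketch from the previous slide (reusing theta and w defined there), the same weighted prior draws give all three summaries; the threshold in the posterior probability is arbitrary:

```python
# Continuing the sketch above: the same weighted draws give other posterior summaries.
w_norm = w / np.sum(w)                                    # normalised weights

post_mean = np.sum(theta * w_norm)                        # E[theta | y]
post_var = np.sum((theta - post_mean) ** 2 * w_norm)      # Var(theta | y)
post_prob = np.sum(w_norm[theta > 1.0])                   # P(theta > 1 | y)
print(post_mean, post_var, post_prob)
```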

2.3 Sampling methods

Inverse CDF

Lemma 2.2 (Probability integral transform)


Let X be a continuous random variable with cdf F_X; then the random
variable Y := F_X(X) follows a uniform distribution on (0, 1)

Proposition 2.3
Let F be a cdf, and let \(F^{-1}\) be its inverse function
\[
F^{-1}(u) = \inf\{x \mid F(x) \ge u\} \qquad (0 < u < 1).
\]
If U is a uniform random variable on (0, 1), then \(F^{-1}(U)\) has F as its cdf.
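A minimal sketch of Proposition 2.3, using the exponential distribution as an illustrative example (its inverse cdf is available in closed form):

```python
import numpy as np

# Sampling Exp(lam) via the inverse cdf: F(x) = 1 - exp(-lam * x),
# so F^{-1}(u) = -log(1 - u) / lam.
rng = np.random.default_rng(seed=3)
lam = 2.0
u = rng.uniform(0.0, 1.0, size=100_000)
x = -np.log(1.0 - u) / lam        # F^{-1}(U) has cdf F
print(x.mean())                    # close to 1/lam = 0.5
```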


Inverse CDF Method

1. Draw the iid sample \((\theta_i)_{i=1}^N\) from the prior

2. Compute the empirical cdf \(\hat{F}_\theta\)
\[
\hat{F}_\theta(x) = \frac{\frac{1}{N}\sum_{i=1}^{N} \mathbf{1}_{(-\infty,\,x)}(\theta_i)\, L(\theta_i \mid y)}{\frac{1}{N}\sum_{i=1}^{N} L(\theta_i \mid y)}
\]

3. Generate U, a uniform random variable on (0, 1)

4. Compute \(\tilde{\theta} := \hat{F}_\theta^{-1}(U)\)
Using the Law of Large Numbers, we see that \(\hat{F}_\theta\) converges to the posterior cdf as N → ∞
(i.e. when N is large, \(\tilde{\theta}\) approximately follows the posterior P(θ | y))
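A minimal sketch of the method, continuing the Normal/Gamma example from the posterior-mean slide (it reuses rng, theta and the weights w defined there):

```python
# Continuing the Normal/Gamma sketch (reusing rng, theta and the likelihood weights w).
order = np.argsort(theta)                                  # step 2: weighted empirical cdf
theta_sorted = theta[order]
cdf = np.cumsum(w[order]) / np.sum(w)

u = rng.uniform(size=10_000)                               # step 3: uniform draws
idx = np.minimum(np.searchsorted(cdf, u), len(cdf) - 1)    # step 4: generalized inverse of F_hat
posterior_draws = theta_sorted[idx]                        # approximately posterior draws
print(posterior_draws.mean(), posterior_draws.var())
```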


Sequential Importance Resampling (SIR)

Proposition 2.4
Let \((Y_i)_{i=1}^N\) be an iid sample from the distribution P and let Q be a distribution
dominated by P (Q ≪ P). If \((I_k)_{k=1}^n \sim \mathcal{M}\big(1, \frac{w_1}{\sum_{i=1}^N w_i}, ..., \frac{w_N}{\sum_{i=1}^N w_i}\big)\) with
\(w_i = Q(Y_i)/P(Y_i)\), then
\[
(Y_{I_k})_{k=1}^n \xrightarrow{\;N \to \infty\;} Q,
\]
and the random variables \((Y_{I_k})_{k=1}^n\) are asymptotically iid


SIR Algorithm

1. Draw the iid sample \((\theta_i)_{i=1}^N\) from the prior

2. Draw the iid sample \((I_k)_{k=1}^n\) from \(\mathcal{M}(1, w_1, ..., w_N)\) with N ≫ n and
\[
w_i = \frac{L(\theta_i \mid y)}{\sum_{j=1}^{N} L(\theta_j \mid y)}
\]

3. Compute \((\theta_{I_k})_{k=1}^n\)


The SIR algorithm and the ICDF method are completely equivalent
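A minimal sketch of the SIR algorithm, again reusing rng, theta and the weights w from the Normal/Gamma sketch:

```python
# Continuing the same sketch: SIR with the prior draws theta and weights w from before.
n = 5_000                                              # n much smaller than N
probs = w / np.sum(w)                                  # normalised weights w_i
indices = rng.choice(len(theta), size=n, p=probs)      # step 2: I_k ~ M(1, w_1, ..., w_N)
sir_draws = theta[indices]                             # step 3: approximately posterior draws
print(sir_draws.mean(), sir_draws.var())
```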


Accept-Reject Method
Let f, g be two continuous pdfs on X such that there exists M > 0 satisfying
\[
\frac{f(x)}{g(x)} \le M \quad \text{for all } x \in X.
\]
Imagine we want to obtain a sample from the distribution with pdf f using samples from the distribution with pdf g:

1. Generate Y from the distribution with pdf g

2. Generate U, a uniform random variable on (0, 1)

• If U < f(Y)/(M g(Y)), then accept Y as a sample from the distribution with pdf f

• If U ≥ f(Y)/(M g(Y)), then reject Y and start from the beginning
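A minimal self-contained sketch of the accept-reject method, using the illustrative target f = Beta(2, 2) (so f(x) = 6x(1−x) on (0, 1)) and the proposal g = U[0, 1], for which f(x)/g(x) ≤ M with M = 1.5:

```python
import numpy as np

# Accept-reject: sample from f = Beta(2, 2) using proposals from g = U[0, 1].
rng = np.random.default_rng(seed=4)
M = 1.5                                       # bound on f(x)/g(x), attained at x = 0.5

def f(x):
    return 6.0 * x * (1.0 - x)

samples = []
while len(samples) < 10_000:
    y_prop = rng.uniform()                    # 1. Y ~ g
    u = rng.uniform()                         # 2. U ~ U(0, 1)
    if u < f(y_prop) / M:                     # accept with probability f(Y)/(M g(Y)); g(Y) = 1
        samples.append(y_prop)

samples = np.array(samples)
print(samples.mean(), samples.var())           # close to 0.5 and 0.05 for Beta(2, 2)
```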


Accept-Reject Method - Theory

Lemma 2.5
Let f, g be two continuous pdfs on X such that there exists M > 0 satisfying f(x)/g(x) ≤ M for all x ∈ X.
Let Y be a random variable with pdf g and U a uniform random variable on (0, 1); then
\[
P\Big(U \le \frac{f(Y)}{M\, g(Y)}\Big) = \frac{1}{M}
\]


Accept-Reject Method - Theory

Proposition 2.6
Let f, g be two continuous pdfs on X such that there exists M > 0 satisfying f(x)/g(x) ≤ M for all x ∈ X.
Let Y be a random variable with pdf g and U a uniform random variable on (0, 1); then
\[
P\Big(Y \le y \;\Big|\; U \le \frac{f(Y)}{M\, g(Y)}\Big) = F(y) := \int_{-\infty}^{y} f(t)\, dt.
\]


Accept-Reject Sample

1. Compute \(M := \max_{\theta \in \Theta} L(\theta \mid y)\)

2. Generate θ̃ from the prior

3. Generate U, a uniform random variable on (0, 1)

• If U < L(θ̃ | y)/M, then accept θ̃ as a sample from the posterior

• If U ≥ L(θ̃ | y)/M, then reject θ̃ and start from the beginning
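A minimal sketch of this sampler, continuing the Normal/Gamma example (reusing rng, alpha, beta and the data y from the earlier sketch); the N(θ, 1) likelihood is maximised at θ = mean(y), which is assumed here to lie in the prior support:

```python
# Continuing the Normal/Gamma sketch (reusing rng, alpha, beta and y).
def log_lik_at(t):
    # log L(t | y) up to a constant; the constant cancels in the ratio L(t | y) / M
    return -0.5 * np.sum((y - t) ** 2)

log_M = log_lik_at(y.mean())            # 1. M = max_theta L(theta | y), attained at theta = mean(y)

accepted = []
while len(accepted) < 5_000:
    t = rng.gamma(shape=alpha, scale=1.0 / beta)       # 2. candidate from the prior
    u = rng.uniform()                                  # 3. uniform draw
    if np.log(u) < log_lik_at(t) - log_M:              # accept if U < L(t | y) / M
        accepted.append(t)

accepted = np.array(accepted)
print(accepted.mean(), accepted.var())
```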

2.4 MCMC

Reminder - Markov chain

Definition 2.7 (Markov chain)


A discrete-time Markov chain is a sequence of random variables
X0 , X1 , X2 , ... (i.e. a stochastic process) with the Markov property:

P(Xn+1 ∈ B | X1 = x1 , X2 = x2 , ..., Xn = xn ) = P(Xn+1 ∈ B | Xn = xn ),

if both conditional probabilities are well defined (i.e. if P(X1 = x1, ..., Xn = xn) > 0).

Definition 2.8 (Time-homogeneity)


A Markov chain is said to be time-homogeneous if

P(Xn+1 ∈ B | Xn = x) = P(X1 ∈ B | X0 = x)

for all n ∈ N.


Reminder - Markov chain

Definition 2.9 (Transition kernel)


A time-homogeneous Markov chain is entirely defined by its transition
kernel Q
Q(x, B) = P(Xn+1 ∈ B | Xn = x)
for all n ∈ N.

Definition 2.10 (Stationary distribution)


A probability distribution Π is called stationary for the transition kernel Q
if Xn ∼ Π implies that Xn+1 ∼ Π for all n ∈ N, i.e.
\[
\int Q(x, B)\, d\Pi(x) = \Pi(B) \quad \text{for every measurable set } B.
\]
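A tiny discrete illustration of Definition 2.10 (the transition matrix and candidate distribution below are illustrative):

```python
import numpy as np

# For a 2-state chain with row-stochastic transition matrix Q (rows = current state,
# columns = next state), pi is stationary if pi @ Q = pi.
Q = np.array([[0.9, 0.1],
              [0.3, 0.7]])
pi = np.array([0.75, 0.25])       # candidate stationary distribution
print(pi @ Q)                      # equals pi, so pi is stationary for Q
```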


Markov chain Monte Carlo (MCMC)

Definition 2.11 (MCMC)


Markov chain Monte Carlo methods comprise a class of algorithms for
sampling from a probability distribution by constructing a Markov chain
that has the desired distribution as its equilibrium distribution.
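MCMC algorithms themselves are not developed in this lecture; purely as an illustration of the definition, here is a minimal random-walk Metropolis sketch (an illustrative addition, not part of the slides) targeting the Normal/Gamma posterior from the introduction, reusing rng, alpha, beta and y from the earlier sketches:

```python
# Random-walk Metropolis targeting the Normal/Gamma posterior.  Each step depends only on
# the current state, so the draws form a Markov chain whose stationary distribution is
# the posterior (reuses rng, alpha, beta and y from the earlier sketches).
def log_post(t):
    if t <= 0:
        return -np.inf                                  # outside the support of the Gamma prior
    # log posterior up to a constant: log-likelihood + log Gamma(alpha, beta) prior density
    return -0.5 * np.sum((y - t) ** 2) + (alpha - 1.0) * np.log(t) - beta * t

chain = np.empty(20_000)
current = 1.0
for i in range(chain.size):
    proposal = current + rng.normal(scale=0.3)          # symmetric random-walk proposal
    if np.log(rng.uniform()) < log_post(proposal) - log_post(current):
        current = proposal                              # accept; otherwise keep the current state
    chain[i] = current

print(chain[5_000:].mean())                             # posterior mean after discarding burn-in
```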

