MCMC
CMU-10701
Markov Chain Monte Carlo Methods
Monte Carlo Methods
The importance of MCMC
A recent survey places the Metropolis algorithm among the ten algorithms that have had the greatest influence on the development and practice of science and engineering in the 20th century.
MCMC Applications
MCMC plays a significant role in statistics, econometrics, physics, and computer science.
Marginalization
Normalization
Expectation
Global optimization
The Monte Carlo principle
Our goal is to estimate the following integral:
I(f) = ∫ f(x) p(x) dx
Estimator: draw i.i.d. samples x⁽¹⁾, …, x⁽ᴺ⁾ from p(x) and set
I_N(f) = (1/N) Σᵢ f(x⁽ⁱ⁾)
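As a small sketch of the principle (the target distribution, the function f, and the sample size here are our own choices, not from the slides), plain Monte Carlo in Python:

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_estimate(f, sampler, n):
    """Plain Monte Carlo: average f over n i.i.d. samples from p."""
    x = sampler(n)
    return f(x).mean()

# Example: I(f) = E[X^2] for X ~ N(0, 1); the exact value is 1.
est = mc_estimate(lambda x: x**2, lambda n: rng.standard_normal(n), 200_000)
```

The error of this estimator shrinks at the 1/√N rate regardless of the dimension of x, which is exactly the point of the next slide.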
The Monte Carlo principle
Unbiased estimation: E[I_N(f)] = I(f)
Convergence rate independent of dimension d!
Asymptotically normal: √N (I_N(f) − I(f)) converges in distribution to N(0, σ²), with σ² = Var_p(f(x))
Sampling
Rejection sampling
Importance sampling
Main Goal
For example,
p(x) ∝ 0.3 exp(−0.2x²) + 0.7 exp(−0.2(x − 10)²)
Rejection Sampling
Rejection Sampling Conditions
Suppose that
p(x) is known up to a proportionality constant, e.g.
p(x) ∝ 0.3 exp(−0.2x²) + 0.7 exp(−0.2(x − 10)²)
a proposal distribution q(x) is easy to sample from, and a constant M with p(x) ≤ M q(x) is known.
Rejection Sampling Algorithm
Repeat until enough samples are accepted:
sample x ~ q(x) and u ~ Uniform(0, 1)
accept x if u < p(x) / (M q(x)); otherwise reject it
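A minimal sketch of the algorithm for the slides' bimodal example; the proposal q = Uniform(−10, 20) and the bound M = 30 are our own (hypothetical) choices, picked so that p(x) ≤ M q(x) on the support:

```python
import numpy as np

rng = np.random.default_rng(0)

def p_unnorm(x):
    # The target, known only up to a normalizing constant.
    return 0.3 * np.exp(-0.2 * x**2) + 0.7 * np.exp(-0.2 * (x - 10) ** 2)

# Proposal q = Uniform(-10, 20), so q(x) = 1/30 on the support; since
# p_unnorm <= 1 everywhere, M = 30 guarantees p_unnorm(x) <= M * q(x).
lo, hi, M = -10.0, 20.0, 30.0
q_density = 1.0 / (hi - lo)

def rejection_sample(n):
    out = np.empty(0)
    while out.size < n:
        x = rng.uniform(lo, hi, size=4 * n)      # propose a batch
        u = rng.uniform(size=4 * n)
        out = np.concatenate([out, x[u < p_unnorm(x) / (M * q_density)]])
    return out[:n]

xs = rejection_sample(20_000)
```

With this loose bound only about 13% of proposals are accepted, which previews the "severe limitations" discussed next.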
Rejection Sampling
Theorem
The accepted samples are exact, independent draws from p(x).
Severe limitations:
a proposal q(x) and a tight bound M with p(x) ≤ M q(x) are often hard to find
the acceptance rate is proportional to 1/M and typically becomes vanishingly small in high dimensions
Importance Sampling
Importance Sampling
Goal: sample from a distribution p(x) that is known only up to a proportionality constant.
Importance sampling is an alternative “classical” solution that goes back to the 1940s.
Importance Sampling
Sample from a tractable proposal distribution q(x) instead, and reweight:
I(f) = ∫ f(x) p(x) dx = ∫ f(x) [p(x)/q(x)] q(x) dx
Consequently, with x⁽¹⁾, …, x⁽ᴺ⁾ drawn from q(x) and importance weights w(x) = p(x)/q(x),
I_N(f) = (1/N) Σᵢ f(x⁽ⁱ⁾) w(x⁽ⁱ⁾)
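A small illustration (the target, proposal, and f are our own choices, not the slides'): estimating E[X²] = 1 under a standard normal target by sampling from a wider normal proposal and reweighting:

```python
import numpy as np

rng = np.random.default_rng(0)

def norm_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Target p = N(0, 1); proposal q = N(0, 1.5^2); estimate I(f) = E_p[X^2] = 1.
n = 200_000
x = rng.normal(0.0, 1.5, size=n)
w = norm_pdf(x, 0.0, 1.0) / norm_pdf(x, 0.0, 1.5)   # importance weights p/q
est = np.mean(x**2 * w)
```

No sample is ever rejected; each one is simply weighted by how much more (or less) likely it is under p than under q.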
Importance Sampling
Theorem
This estimator is unbiased
Under weak assumptions, the strong law of large numbers applies:
I_N(f) → I(f) almost surely as N → ∞
Importance Sampling
How should we choose the proposal q? Find one that minimizes the variance of the estimator!
Theorem
The variance is minimal when we adopt the following optimal importance distribution:
q*(x) = |f(x)| p(x) / ∫ |f(x′)| p(x′) dx′
Importance Sampling
The optimal proposal is not very useful in practice: sampling from |f(x)| p(x) is usually no easier than sampling from p(x) itself.
MCMC sampling - Main ideas
Create a Markov chain whose limiting distribution is the desired target distribution!
Andrey Markov
Markov Chains
Markov Chains
Markov chain: a sequence of random variables x₁, x₂, … in which the future state depends only on the present one:
P(x_{n+1} | x₁, …, x_n) = P(x_{n+1} | x_n)
Markov Chains
Assume that the state space is finite: x_n ∈ {1, …, s}, with transition matrix T, where T_{ij} = P(x_{n+1} = j | x_n = i).
Lemma: the state distribution, written as a row vector, evolves as µ(x_{n+1}) = µ(x_n) T, and hence µ(x_{n+1}) = µ(x₁) Tⁿ.
Markov Chains Example
Markov chain with three states (s = 3)
Iterating the transition matrix, it follows that Tⁿ converges; the chain approaches its limit distribution no matter what the initial distribution µ(x₁) was.
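A quick numerical check of this behavior (the three-state transition matrix below is hypothetical, not the slide's):

```python
import numpy as np

# A hypothetical three-state transition matrix (rows sum to 1).
T = np.array([
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.3, 0.3, 0.4],
])

Tn = np.linalg.matrix_power(T, 50)
# All rows of T^n agree, so mu(x_1) T^n is the same for every initial mu.
pi = Tn[0]
```

The common row π is the stationary distribution: it satisfies π T = π.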
Markov Chains
Our goal is to find conditions under which the Markov chain converges to a unique limit distribution (independently of its initial state distribution).
Observation:
If this limiting distribution exists, it has to be the stationary distribution, i.e. a distribution π satisfying π T = π.
Limit Theorem of Markov Chains
Theorem: if the finite-state chain is irreducible and aperiodic, then Tⁿ converges and all of its rows tend to the same probability vector π.
That is, the chain will converge to the unique stationary distribution.
Markov Chains
Definition
Irreducibility:
For each pair of states (i, j), there is a positive probability that, starting in state i, the process will eventually enter state j.
= the matrix T cannot be reduced into separate smaller matrices
= the transition graph is connected
Markov Chains
Definition
A state i has period k if any return to state i must occur in multiples of k time steps. Formally, the period of a state i is defined as
k = gcd{n > 0 : P(x_n = i | x_0 = i) > 0}
In other words, a state i is aperiodic if there exists n such that for all n′ ≥ n,
P(x_{n′} = i | x_0 = i) > 0
Definition
A Markov chain is aperiodic if every state is aperiodic.
Markov Chains
Example for periodic Markov chain:
Let
T = [[0, 1], [1, 0]]
In this case T²ⁿ = I and T²ⁿ⁺¹ = T, so both states have period 2 and no limit distribution exists.
If we start the chain from (1, 0) or (0, 1), then the chain gets trapped in a cycle; it does not forget its past.
Reversible Markov chains
(Detailed Balance Property)
How can we find the limiting distribution of an irreducible and aperiodic
Markov chain?
Theorem: if a distribution π satisfies the detailed balance condition
π_i T_{ij} = π_j T_{ji} for all states i, j,
then π is a stationary distribution of the chain.
Spectral properties
Theorem: If T is irreducible and aperiodic, then its largest eigenvalue is 1, with left eigenvector π, and all other eigenvalues have modulus strictly smaller than 1; the second-largest eigenvalue modulus governs the speed of convergence.
The Hastings-Metropolis Algorithm
The Hastings-Metropolis Algorithm
Our goal: sample from p(x) = b(x)/B, where b(x) ≥ 0 is easy to evaluate but the normalizing constant B = Σ_x b(x) is intractable.
We don’t know B!
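A sketch of a random-walk Hastings-Metropolis sampler for the earlier bimodal example; the Gaussian proposal and its step size are our own choices. The acceptance ratio uses only the unnormalized b, so B is never needed:

```python
import numpy as np

rng = np.random.default_rng(0)

def b(x):
    # Unnormalized target b(x); the normalizing constant B is never needed.
    return 0.3 * np.exp(-0.2 * x**2) + 0.7 * np.exp(-0.2 * (x - 10) ** 2)

def metropolis(n_steps, step_sd=5.0, x0=0.0):
    x = x0
    chain = np.empty(n_steps)
    for t in range(n_steps):
        y = x + step_sd * rng.standard_normal()   # symmetric proposal
        # Acceptance ratio b(y)/b(x): B cancels, so only b is required.
        if rng.uniform() < min(1.0, b(y) / b(x)):
            x = y
        chain[t] = x
    return chain

chain = metropolis(50_000)
```

After discarding a burn-in, about 70% of the chain's mass sits near the x = 10 mode, matching the mixture weights of the target.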
The Hastings-Metropolis Algorithm
Theorem
The target distribution p satisfies detailed balance for the Hastings-Metropolis chain; hence p is a stationary distribution of the chain.
Proof
For x ≠ y, p(x) q(y|x) min{1, [p(y) q(x|y)] / [p(x) q(y|x)]} = min{p(x) q(y|x), p(y) q(x|y)}, which is symmetric in x and y.
The Hastings-Metropolis Algorithm
Observation
Proof:
Corollary
Theorem
The Hastings-Metropolis Algorithm
Theorem
Proof:
Note:
Experiment with HM
An application for continuous distributions
HM on Combinatorial Sets
Gibbs Sampling: The Problem
Sample from a multivariate distribution p(x₁, …, x_d) whose full conditionals p(x_i | x₁, …, x_{i−1}, x_{i+1}, …, x_d) are easy to sample from.
Gibbs Sampling: Pseudo Code
Initialize x = (x₁, …, x_d). Then repeat:
for i = 1, …, d: resample x_i from the full conditional p(x_i | x₁, …, x_{i−1}, x_{i+1}, …, x_d), using the most recent values of the other coordinates.
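A minimal sketch for a case where the full conditionals are available in closed form, a bivariate Gaussian with correlation ρ (our own example, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Target: bivariate Gaussian with unit variances and correlation rho.
# Its full conditionals are one-dimensional Gaussians:
#   x | y ~ N(rho * y, 1 - rho^2),   y | x ~ N(rho * x, 1 - rho^2)
rho = 0.8
sd = np.sqrt(1.0 - rho**2)

n = 50_000
x = y = 0.0
xs, ys = np.empty(n), np.empty(n)
for t in range(n):
    x = rng.normal(rho * y, sd)   # resample x from p(x | y)
    y = rng.normal(rho * x, sd)   # resample y from p(y | x)
    xs[t], ys[t] = x, y
```

Note that no accept/reject step appears: every coordinate update is accepted, which is exactly the "special HM" property proved below.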
Gibbs Sampling: Theory
Consider the following HM sampler:
Let
and let
Proof:
By definition:
Gibbs Sampling is a Special HM
Proof: choosing the proposal q(y | x) = p(y_i | x_{−i}), i.e. resampling coordinate i from its full conditional, makes the Hastings-Metropolis acceptance probability identically 1, so every proposal is accepted.
Gibbs Sampling in Practice
Simulated Annealing
Simulated Annealing
Goal: Find the global maximizer x* = argmax_x p(x).
Simulated Annealing
Theorem:
As λ → ∞, the distribution P_λ(x) ∝ p(x)^λ converges to the uniform distribution over the global maximum points of p(x).
Proof:
Raising p to the power λ shrinks the relative mass of every non-maximal x exponentially fast.
Simulated Annealing
Main idea
Let λ be big.
Generate a Markov chain with limit distribution Pλ(x).
In the long run, the Markov chain will jump among the maximum points of Pλ(x).
Simulated Annealing
Uniform distribution over the global maximum points in the limit λ → ∞.
Simulated Annealing: Pseudo Code
At step t, propose y from q(y | x); accept with probability min{1, (p(y)/p(x))^{λ_t}}; slowly increase λ_t (i.e. lower the temperature) according to an annealing schedule.
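A sketch of this scheme applied to the earlier bimodal p(x); the random-walk proposal, initial λ, cooling schedule, and number of restarts are all our own (hypothetical) choices:

```python
import numpy as np

def p(x):
    # Objective to maximize: the bimodal example, global maximum near x = 10.
    return 0.3 * np.exp(-0.2 * x**2) + 0.7 * np.exp(-0.2 * (x - 10) ** 2)

def anneal(seed, n_steps=20_000):
    rng = np.random.default_rng(seed)
    x = 0.0
    lam = 0.05                    # inverse temperature, lambda = 1/T
    best_x, best_val = x, p(x)
    for t in range(n_steps):
        y = x + 2.0 * rng.standard_normal()       # random-walk proposal
        r = p(y) / p(x)
        # Accept with probability min(1, (p(y)/p(x)) ** lambda).
        if r >= 1.0 or rng.uniform() < r ** lam:
            x = y
        if p(x) > best_val:
            best_x, best_val = x, p(x)
        if t >= 5_000:            # hold the temperature, then cool slowly
            lam *= 1.0005
    return best_x, best_val

# A few independent restarts; keep the best point found.
best_x, best_val = max((anneal(s) for s in range(3)), key=lambda res: res[1])
```

The hot phase (small λ) lets the chain cross the valley between the two modes; cooling then concentrates it on the global maximum near x = 10 rather than the local one at x = 0.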
Simulated Annealing: Special case
Simulated Annealing: Problems
Simulated Annealing
Temperature = 1/λ: annealing means gradually lowering the temperature, i.e. increasing λ.
Monte Carlo EM
E Step: compute Q(θ | θ_t) = E[log p(x, z | θ) | x, θ_t]. When this expectation is intractable, approximate it by Monte Carlo: draw z⁽¹⁾, …, z⁽ᵐ⁾ from p(z | x, θ_t) and set
Q̂(θ | θ_t) = (1/m) Σⱼ log p(x, z⁽ʲ⁾ | θ)
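A small worked example of Monte Carlo EM (our own setup, not from the slides): estimating the mean of a unit-variance Gaussian from right-censored data, with the E step's imputations drawn by rejection sampling from the truncated normal:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: x_i ~ N(mu_true, 1), right-censored at c.
mu_true, c, n = 0.0, 1.0, 2000
x_full = rng.normal(mu_true, 1.0, size=n)
obs = x_full[x_full < c]       # observed values
n_cens = n - obs.size          # we only know these points exceeded c

def sample_truncated(mu, m):
    """Draw m samples from N(mu, 1) conditioned on being >= c (rejection)."""
    out = np.empty(0)
    while out.size < m:
        z = rng.normal(mu, 1.0, size=4 * m)
        out = np.concatenate([out, z[z >= c]])
    return out[:m]

mu = 0.5                       # initial guess
for _ in range(30):
    # Monte Carlo E step: impute the censored values from p(z | z >= c, mu).
    z = sample_truncated(mu, 200 * n_cens)
    # M step: the complete-data MLE of mu is the mean of observed + imputed.
    mu = (obs.sum() + z.mean() * n_cens) / n
```

Each iteration replaces the exact conditional expectation with a sample average; the estimate settles near the censored-data MLE, close to the true mean.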
Thanks for your attention! ☺