Markov Chain Monte Carlo
Markov Chain Monte Carlo (MCMC) methods are algorithms for sampling from a
probability distribution, particularly when direct sampling is challenging. These methods rely
on constructing a Markov chain that has the desired distribution as its equilibrium
distribution.
Basics of Sampling
1. Importance Sampling
o Goal: Estimate properties of a target distribution by sampling from a simpler
proposal distribution.
o Key Idea: Reweight samples from the proposal distribution using an importance
weight $w(x) = \frac{p(x)}{q(x)}$, where $p(x)$ is the target density and $q(x)$
is the proposal density (a code sketch follows this list).
2. Rejection Sampling
o Goal: Generate samples from the target distribution using a proposal
distribution (see the sketch after this list).
o Process:
1. Sample $x$ from a proposal distribution $q(x)$.
2. Accept $x$ with probability $\frac{p(x)}{M q(x)}$, where $M$ is a constant
such that $p(x) \leq M q(x)$ for all $x$.
3. Reject otherwise.
3. Stratified Sampling
o Goal: Reduce variance by dividing the sample space into strata (subregions)
and sampling proportionally within each stratum.
o Application: Often used in scenarios where the distribution exhibits
significant variability across different regions (see the sketch after this
list).
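As a concrete illustration of importance sampling (item 1 above), here is a
minimal Python sketch; the standard-normal target, the wider normal proposal,
and the test function $x^2$ are illustrative assumptions, not part of the notes.

import numpy as np

rng = np.random.default_rng(0)

# Target density p(x): standard normal (assumed example).
def p(x):
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

# Proposal density q(x): normal with standard deviation 2, easy to sample from.
def q(x):
    return np.exp(-0.5 * (x / 2.0)**2) / (2.0 * np.sqrt(2.0 * np.pi))

n = 10_000
samples = rng.normal(0.0, 2.0, size=n)   # draw from the proposal q
weights = p(samples) / q(samples)        # importance weights w(x) = p(x)/q(x)

# Self-normalized estimate of E_p[X^2]; the true value is 1.
estimate = np.sum(weights * samples**2) / np.sum(weights)
print(estimate)

The self-normalized form divides by the sum of the weights, so $p$ and $q$ only
need to be known up to constant factors.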
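For rejection sampling (item 2), a minimal sketch under assumed choices: the
Beta(2, 2) target $p(x) = 6x(1-x)$ on $[0, 1]$, a uniform proposal, and the
bound $M = 1.5$, which is the maximum of $p$.

import numpy as np

rng = np.random.default_rng(0)

# Target density: p(x) = 6x(1 - x) on [0, 1], a Beta(2, 2) (assumed example).
def p(x):
    return 6.0 * x * (1.0 - x)

# Proposal: Uniform(0, 1), so q(x) = 1; p(x) <= M q(x) holds with M = 1.5,
# the maximum of p. The expected acceptance rate is 1/M = 2/3.
M = 1.5

def rejection_sample(n):
    out = []
    while len(out) < n:
        x = rng.uniform()              # 1. sample x from the proposal q
        if rng.uniform() < p(x) / M:   # 2. accept with probability p(x)/(M q(x))
            out.append(x)              # 3. otherwise reject and draw again
    return np.array(out)

draws = rejection_sample(5_000)
print(draws.mean())   # should be close to 0.5, the mean of Beta(2, 2)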
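For stratified sampling (item 3), the following sketch estimates an integral
over $[0, 1]$ by splitting the interval into equal-width strata and drawing the
same number of uniform points in each; the integrand is an assumed test
function.

import numpy as np

rng = np.random.default_rng(0)

def f(x):
    return x ** 2   # assumed integrand; its exact integral over [0, 1] is 1/3

n_strata, n_per = 10, 100
edges = np.linspace(0.0, 1.0, n_strata + 1)

# Sample uniformly inside each stratum, average f there, and weight each
# stratum by its width; variation between strata no longer adds variance.
estimate = 0.0
for lo, hi in zip(edges[:-1], edges[1:]):
    x = rng.uniform(lo, hi, size=n_per)
    estimate += (hi - lo) * f(x).mean()

print(estimate)   # stratified estimate of the integral, near 1/3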
Proposal Distribution
The proposal distribution $q$ is the easy-to-sample distribution from which
candidate points are drawn in the methods above and below; a good proposal is
simple to evaluate and places mass wherever the target density $p$ does.
MCMC Algorithms
1. Metropolis-Hastings Algorithm
o Purpose: Generate samples from the target distribution using a proposal
distribution.
o Steps:
1. Start with an initial state $x_0$.
2. Propose a new state $x'$ from the proposal distribution $q(x' \mid x_t)$.
3. Compute the acceptance ratio:
$A(x', x_t) = \min\left(1, \frac{p(x')\, q(x_t \mid x')}{p(x_t)\, q(x' \mid x_t)}\right)$.
4. Accept $x'$ with probability $A(x', x_t)$; otherwise, retain $x_t$.
o Features: Works with a wide range of proposal distributions and requires the
target density $p$ only up to a normalizing constant (a sketch follows this
list).
2. Gibbs Sampling
o Purpose: Special case of MCMC for multidimensional distributions where
sampling each dimension conditionally is easier.
o Steps:
1. Initialize all variables $(x_1, x_2, \ldots, x_n)$.
2. For each dimension $i$, sample $x_i$ from $p(x_i \mid x_{-i})$, where
$x_{-i}$ denotes all variables except $x_i$.
o Features: Particularly useful for high-dimensional problems (see the sketch
after this list).
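A minimal Python sketch of Metropolis-Hastings under assumed choices: an
unnormalized two-component normal-mixture target and a symmetric Gaussian
random-walk proposal, for which the $q$ terms in the acceptance ratio cancel.

import numpy as np

rng = np.random.default_rng(0)

# Unnormalized target density: a two-component normal mixture (assumed example).
def p_unnorm(x):
    return np.exp(-0.5 * (x - 2) ** 2) + np.exp(-0.5 * (x + 2) ** 2)

def metropolis_hastings(n_steps, step_size=1.0):
    x = 0.0                                       # step 1: initial state x_0
    chain = np.empty(n_steps)
    for t in range(n_steps):
        x_prop = x + rng.normal(0.0, step_size)   # step 2: propose x' ~ q(x'|x_t)
        # Step 3: acceptance ratio. The Gaussian random walk is symmetric,
        # q(x_t|x') = q(x'|x_t), so only the density ratio remains.
        a = min(1.0, p_unnorm(x_prop) / p_unnorm(x))
        if rng.uniform() < a:                     # step 4: accept with prob. A
            x = x_prop
        chain[t] = x                              # otherwise retain x_t
    return chain

chain = metropolis_hastings(20_000)
print(chain.mean())   # near 0 for this symmetric bimodal target

Note that only the unnormalized density appears in the ratio, which is what
makes the algorithm usable when the normalizing constant is unknown.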
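A minimal Gibbs-sampling sketch, assuming a bivariate standard normal target
with correlation $\rho = 0.8$; for this target each full conditional is itself
a normal distribution, which is what makes Gibbs sampling convenient here.

import numpy as np

rng = np.random.default_rng(0)

# Bivariate standard normal with correlation rho (assumed example). The full
# conditionals are x1 | x2 ~ N(rho * x2, 1 - rho^2), and symmetrically for x2.
rho = 0.8

def gibbs(n_steps):
    x1, x2 = 0.0, 0.0                   # initialize all variables
    chain = np.empty((n_steps, 2))
    sd = np.sqrt(1.0 - rho ** 2)
    for t in range(n_steps):
        x1 = rng.normal(rho * x2, sd)   # sample x1 from p(x1 | x2)
        x2 = rng.normal(rho * x1, sd)   # sample x2 from p(x2 | x1)
        chain[t] = (x1, x2)
    return chain

chain = gibbs(10_000)
print(np.corrcoef(chain.T)[0, 1])   # should be close to rho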
Convergence of MCMC
1. Ergodicity
o A Markov chain is ergodic if it is irreducible (all states communicate),
aperiodic (no cyclic behavior), and positive recurrent.
o Ergodicity ensures that the chain converges to a unique stationary
distribution.
2. Convergence Diagnostics
o Mixing Time: The time it takes for the Markov chain to get close to the
stationary distribution.
o Burn-in Period: Initial samples are discarded to allow the chain to reach the
stationary distribution.
o Autocorrelation: Low autocorrelation between successive samples indicates
better mixing and more nearly independent draws (a diagnostic sketch follows
this list).
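A short diagnostic sketch combining burn-in removal with an autocorrelation
estimate. The AR(1) stand-in chain is an assumption used so the example runs
on its own; in practice you would pass in the output of an actual sampler,
such as the Metropolis-Hastings chain from the earlier sketch.

import numpy as np

rng = np.random.default_rng(0)

def autocorrelation(chain, max_lag=50):
    # Sample autocorrelation of a 1-D chain at lags 0..max_lag.
    x = chain - chain.mean()
    var = np.dot(x, x) / len(x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / (len(x) * var)
                     for k in range(max_lag + 1)])

# Stand-in chain: an AR(1) process, which mimics the correlated output of an
# MCMC sampler (assumed example; substitute a real chain in practice).
n = 20_000
chain = np.empty(n)
chain[0] = 0.0
for t in range(1, n):
    chain[t] = 0.9 * chain[t - 1] + rng.normal()

kept = chain[2_000:]          # burn-in: discard the first 2,000 samples
acf = autocorrelation(kept)
print(acf[:5])                # slow decay toward 0 indicates poor mixing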