Markov chain Monte Carlo: Difference between revisions

Accgcc (talk | contribs)
Cewbot (talk | contribs)
m Fixing broken anchor: Reminder of an inactive anchor: equilibrium distribution
(12 intermediate revisions by 6 users not shown)
Line 1:
{{short description|Class of dependent sampling algorithms}}
{{Bayesian statistics}}
In [[statistics]], '''Markov chain Monte Carlo''' ('''MCMC''') is a class of [[algorithm]]s used to draw samples from a [[probability distribution]]. Given a probability distribution, one can construct a [[Markov chain]] whose elements' distribution approximates it; that is, the Markov chain's [[Discrete-time Markov chain#Stationary distributions|equilibrium distribution]] matches the target distribution. The more steps that are included, the more closely the distribution of the sample matches the actual desired distribution.
 
Markov chain Monte Carlo methods are used to study probability distributions that are too complex or too highly [[N-dimensional space|dimensional]] to study with analytic techniques alone. Various algorithms exist for constructing such Markov chains, including the [[Metropolis–Hastings algorithm]].
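
As an illustration of how such a chain can be constructed, the following is a minimal sketch of a random-walk Metropolis sampler, the simplest form of the Metropolis–Hastings algorithm. The standard normal target, the step size, the seed, and all function names are assumptions chosen for the example; they are not taken from any particular library.

```python
import math
import random


def log_target(x):
    """Unnormalised log-density of the target; a standard normal is assumed here."""
    return -0.5 * x * x


def random_walk_metropolis(log_density, n_steps, x0=0.0, step=2.0, seed=0):
    """Return a list of n_steps chain states from a random-walk Metropolis sampler."""
    rng = random.Random(seed)
    x = x0
    log_p = log_density(x)
    chain = []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step)  # symmetric Gaussian proposal
        log_p_new = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        if rng.random() < math.exp(min(0.0, log_p_new - log_p)):
            x, log_p = proposal, log_p_new
        chain.append(x)  # a rejected proposal repeats the current state
    return chain


chain = random_walk_metropolis(log_target, n_steps=50_000)
mean = sum(chain) / len(chain)
var = sum((v - mean) ** 2 for v in chain) / len(chain)
print(f"sample mean {mean:.3f}, sample variance {var:.3f}")
```

Only ratios of the target density enter the acceptance step, so the normalising constant of the target is never needed. For this target the sample mean and variance should settle near 0 and 1 as the chain grows.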
 
== Applications ==
MCMC methods are primarily used for calculating [[Numerical analysis|numerical approximations]] of [[Multiple integral|multi-dimensional integrals]], for example in [[Bayesian statistics]], [[computational physics]],<ref>{{Cite journal|last1=Kasim|first1=M.F.|last2=Bott|first2=A.F.A.|last3=Tzeferacos|first3=P.|last4=Lamb|first4=D.Q.|last5=Gregori|first5=G.|last6=Vinko|first6=S.M. | date = September 2019 |title=Retrieving fields from proton radiography without source profiles |journal=Physical Review E|volume=100|issue=3|page=033208|doi=10.1103/PhysRevE.100.033208|pmid=31639953|arxiv=1905.12934|bibcode=2019PhRvE.100c3208K|s2cid=170078861}}</ref> [[computational biology]]<ref>{{Cite journal|last1=Gupta|first1=Ankur|last2=Rawlings|first2=James B. | date = April 2014 |title=Comparison of Parameter Estimation Methods in Stochastic Chemical Kinetic Models: Examples in Systems Biology |journal=AIChE Journal|volume=60|issue=4|pages=1253–1268|doi=10.1002/aic.14409 |pmc=4946376|pmid=27429455}}</ref> and [[computational linguistics]].<ref>See Gill 2008.</ref><ref>See Robert & Casella 2004.</ref>
 
In Bayesian statistics, Markov chain Monte Carlo methods are typically used to calculate [[Moment (mathematics)|moments]] and [[credible interval]]s of [[posterior probability]] distributions. The use of MCMC methods makes it possible to compute large [[Bayesian network#Hierarchical models|hierarchical models]] that require integrations over hundreds to thousands of unknown parameters.<ref>{{cite book|last1=Banerjee|first1=Sudipto|last2=Carlin|first2=Bradley P.|last3=Gelfand|first3=Alan P.|title=Hierarchical Modeling and Analysis for Spatial Data|publisher=CRC Press|isbn=978-1-4398-1917-3|page=xix|edition=Second|date=2014-09-12}}</ref>
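
The sketch below illustrates how a posterior mean and an equal-tailed 95% credible interval might be read off MCMC output. The data (7 heads and 3 tails from a coin), the uniform prior, the burn-in length, and the function names are all hypothetical choices made for the example.

```python
import math
import random


def log_posterior(theta, heads=7, tails=3):
    """Log-posterior of a coin's bias under a uniform prior (hypothetical data)."""
    if not 0.0 < theta < 1.0:
        return -math.inf  # zero posterior mass outside (0, 1)
    return heads * math.log(theta) + tails * math.log(1.0 - theta)


def sample_posterior(n_steps, step=0.2, seed=3):
    """Random-walk Metropolis draws from the posterior of the bias."""
    rng = random.Random(seed)
    theta = 0.5
    log_p = log_posterior(theta)
    draws = []
    for _ in range(n_steps):
        proposal = theta + rng.gauss(0.0, step)
        log_p_new = log_posterior(proposal)
        if rng.random() < math.exp(min(0.0, log_p_new - log_p)):
            theta, log_p = proposal, log_p_new
        draws.append(theta)
    return draws


draws = sorted(sample_posterior(40_000)[2_000:])  # drop an assumed burn-in, then sort
post_mean = sum(draws) / len(draws)
lower = draws[int(0.025 * len(draws))]  # 2.5% empirical quantile
upper = draws[int(0.975 * len(draws))]  # 97.5% empirical quantile
print(f"posterior mean {post_mean:.3f}, 95% credible interval ({lower:.3f}, {upper:.3f})")
```

With these data the exact posterior is a Beta(8, 4) distribution with mean 2/3, which gives a simple check on the sampled values.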
 
In [[rare event sampling]], they are also used for generating samples that gradually populate the rare failure region.{{Citation needed|date=June 2021}}
Line 19 ⟶ 21:
Random walk Monte Carlo methods are a kind of random [[Computer simulation|simulation]] or [[Monte Carlo method]]. However, whereas the random samples of the integrand used in a conventional [[Monte Carlo integration]] are [[statistically independent]], those used in MCMC are [[autocorrelation|autocorrelated]]. Correlation between samples introduces the need to use the [[Markov chain central limit theorem]] when estimating the error of mean values.
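
The effect of autocorrelation on error estimates can be sketched with the batch means method, one common way of applying the Markov chain central limit theorem in practice. The autoregressive chain used below is only a stand-in for MCMC output with a known correlation structure; the correlation 0.9, the number of batches, and the function names are assumptions of the example.

```python
import math
import random


def ar1_chain(n, rho=0.9, seed=1):
    """Autocorrelated Markov chain with a standard normal stationary law."""
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - rho * rho)
    x = rng.gauss(0.0, 1.0)
    out = []
    for _ in range(n):
        x = rho * x + scale * rng.gauss(0.0, 1.0)
        out.append(x)
    return out


def naive_se(xs):
    """Standard error of the mean that treats the values as independent."""
    n = len(xs)
    m = sum(xs) / n
    var = sum((v - m) ** 2 for v in xs) / (n - 1)
    return math.sqrt(var / n)


def batch_means_se(xs, n_batches=50):
    """Standard error of the mean estimated from the means of consecutive batches."""
    size = len(xs) // n_batches
    means = [sum(xs[i * size:(i + 1) * size]) / size for i in range(n_batches)]
    return naive_se(means)  # batch means are nearly independent if batches are long


xs = ar1_chain(100_000)
se_naive = naive_se(xs)
se_batch = batch_means_se(xs)
print(f"naive SE {se_naive:.4f}, batch-means SE {se_batch:.4f}")
```

For this chain the integrated autocorrelation time is (1 + 0.9)/(1 − 0.9) = 19, so the naive formula understates the standard error by a factor of roughly √19.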
 
These algorithms create [[Markov chains]] such that they have an [[Markov chain#Steady-state analysis and limiting distributions|equilibrium distribution]]{{Broken anchor|date=2024-06-13|bot=User:Cewbot/log/20201008/configuration|target_link=Markov chain#Steady-state analysis and limiting distributions|reason= The anchor (Steady-state analysis and limiting distributions) [[Special:Diff/970694186|has been deleted]].}} which is proportional to the function given.
 
==Reducing correlation==
Line 30 ⟶ 32:
**[[Gibbs sampling]]: When the target distribution is multi-dimensional, the Gibbs sampling algorithm<ref>{{Cite journal |last1=Geman |first1=Stuart |last2=Geman |first2=Donald |date=November 1984 |title=Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images |url=https://ieeexplore.ieee.org/document/4767596 |journal=IEEE Transactions on Pattern Analysis and Machine Intelligence |volume=PAMI-6 |issue=6 |pages=721–741 |doi=10.1109/TPAMI.1984.4767596 |pmid=22499653 |s2cid=5837272 |issn=0162-8828}}</ref> updates each coordinate from its full [[conditional distribution]] given the other coordinates. Gibbs sampling can be viewed as a special case of the Metropolis–Hastings algorithm with an acceptance rate uniformly equal to 1. When drawing from the full conditional distributions is not straightforward, other samplers-within-Gibbs are used (e.g., see <ref>{{Cite journal|title = Adaptive Rejection Sampling for Gibbs Sampling|journal = Journal of the Royal Statistical Society. Series C (Applied Statistics)|date = 1992-01-01|pages = 337–348|volume = 41|issue = 2|doi = 10.2307/2347565|first1 = W. R.|last1 = Gilks|first2 = P.|last2 = Wild|jstor=2347565}}</ref><ref>{{Cite journal|title = Adaptive Rejection Metropolis Sampling within Gibbs Sampling|journal = Journal of the Royal Statistical Society. Series C (Applied Statistics)|date = 1995-01-01|pages = 455–472|volume = 44|issue = 4|doi = 10.2307/2986138|first1 = W. R.|last1 = Gilks|first2 = N. G.|last2 = Best|author2-link= Nicky Best |first3 = K. K. C.|last3 = Tan|jstor=2986138}}</ref>). Gibbs sampling is popular partly because it does not require any 'tuning'. The algorithm structure of Gibbs sampling closely resembles that of coordinate ascent variational inference in that both algorithms use the full conditional distributions in the updating procedure.<ref>{{Cite journal |last=Lee|first=Se Yoon| title = Gibbs sampler and coordinate ascent variational inference: A set-theoretical review|journal=Communications in Statistics - Theory and Methods|year=2021|volume=51 |issue=6 |pages=1–21|doi=10.1080/03610926.2021.1921214|arxiv=2008.01006|s2cid=220935477}}</ref>
** [[Metropolis-adjusted Langevin algorithm]] and other methods that rely on the gradient (and possibly second derivative) of the log target density to propose steps that are more likely to be in the direction of higher probability density.<ref>See Stramer 1999.</ref>
** [[Hamiltonian Monte Carlo|Hamiltonian (or hybrid) Monte Carlo]] (HMC): Tries to avoid random walk behaviour by introducing an auxiliary [[momentum]] vector and implementing [[Hamiltonian dynamics]], so the potential energy function is the negative logarithm of the target density. The momentum samples are discarded after sampling. The result of hybrid Monte Carlo is that proposals move across the sample space in larger steps; they are therefore less correlated and converge to the target distribution more rapidly.
**[[Pseudo-Marginal Metropolis–Hastings algorithm|Pseudo-marginal Metropolis–Hastings]]: This method replaces the evaluation of the density of the target distribution with an unbiased estimate and is useful when the target density is not available analytically, e.g. [[latent variable model]]s.
* [[Slice sampling]]: This method depends on the principle that one can sample from a distribution by sampling uniformly from the region under the plot of its density function. It alternates uniform sampling in the vertical direction with uniform sampling from the horizontal 'slice' defined by the current vertical position.
* [[Multiple-try Metropolis]]: This method is a variation of the Metropolis–Hastings algorithm that allows multiple trials at each point. By making it possible to take larger steps at each iteration, it helps address the curse of dimensionality.
* [[Reversible-jump]]: This method is a variant of the Metropolis–Hastings algorithm that allows proposals that change the dimensionality of the space.<ref>See Green 1995.</ref> Markov chain Monte Carlo methods that change dimensionality have long been used in [[statistical physics]] applications, where for some problems a distribution that is a [[grand canonical ensemble]] is used (e.g., when the number of molecules in a box is variable). But the reversible-jump variant is useful when doing Markov chain Monte Carlo or Gibbs sampling over [[nonparametric]] Bayesian models such as those involving the [[Dirichlet process]] or [[Chinese restaurant process]], where the number of mixing components/clusters/etc. is automatically inferred from the data.
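
As an illustration of the coordinate-wise updates used in Gibbs sampling, the following minimal sketch samples a bivariate normal distribution with zero means, unit variances, and correlation 0.8. The target, the seed, and the function name are assumptions of the example; the full conditional of each coordinate is then a normal distribution with mean equal to the correlation times the other coordinate and variance one minus the squared correlation.

```python
import math
import random


def gibbs_bivariate_normal(n_steps, rho=0.8, seed=2):
    """Gibbs sampler for a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    cond_sd = math.sqrt(1.0 - rho * rho)  # standard deviation of each full conditional
    x = y = 0.0
    samples = []
    for _ in range(n_steps):
        x = rng.gauss(rho * y, cond_sd)  # draw x from p(x | y)
        y = rng.gauss(rho * x, cond_sd)  # draw y from p(y | x), using the new x
        samples.append((x, y))
    return samples


samples = gibbs_bivariate_normal(20_000)
n = len(samples)
mx = sum(s[0] for s in samples) / n
my = sum(s[1] for s in samples) / n
cov = sum((s[0] - mx) * (s[1] - my) for s in samples) / n
vx = sum((s[0] - mx) ** 2 for s in samples) / n
vy = sum((s[1] - my) ** 2 for s in samples) / n
corr = cov / math.sqrt(vx * vy)
print(f"sample correlation {corr:.3f}")
```

Every draw is accepted, which reflects the view of Gibbs sampling as a Metropolis–Hastings scheme with acceptance rate 1, and no step size has to be tuned.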
 
=== Interacting particle methods ===
Interacting MCMC methodologies are a class of [[mean-field particle methods]] for obtaining [[Pseudo-random number sampling|random samples]] from a sequence of probability distributions with an increasing level of sampling complexity.<ref name="dp13">{{cite book|last = Del Moral|first = Pierre|title = Mean field simulation for Monte Carlo integration|year = 2013|publisher = Chapman & Hall/CRC Press |url = http://www.crcpress.com/product/isbn/9781466504059|pages = 626}}</ref> These probabilistic models include path space state models with increasing time horizon, posterior distributions with respect to a sequence of partial observations, increasing constraint level sets for conditional distributions, decreasing temperature schedules associated with some Boltzmann–Gibbs distributions, and many others. In principle, any Markov chain Monte Carlo sampler can be turned into an interacting Markov chain Monte Carlo sampler. These interacting Markov chain Monte Carlo samplers can be interpreted as a way to run in parallel a sequence of Markov chain Monte Carlo samplers. For instance, interacting [[simulated annealing]] algorithms are based on independent Metropolis–Hastings moves interacting sequentially with a selection-resampling type mechanism. In contrast to traditional Markov chain Monte Carlo methods, the precision parameter of this class of interacting Markov chain Monte Carlo samplers is ''only'' related to the number of interacting Markov chain Monte Carlo samplers. These advanced particle methodologies belong to the class of Feynman–Kac particle models,<ref name="dp04">{{cite book|last = Del Moral|first = Pierre|title = Feynman–Kac formulae. Genealogical and interacting particle approximations|year = 2004|publisher = Springer |url = https://www.springer.com/mathematics/probability/book/978-0-387-20268-6|pages = 575}}</ref><ref name="dmm002">{{cite book|last1 = Del Moral|first1 = Pierre|last2 = Miclo|first2 = Laurent|contribution = Branching and Interacting Particle Systems Approximations of Feynman-Kac Formulae with Applications to Non-Linear Filtering|title=Séminaire de Probabilités XXXIV |editor=Jacques Azéma |editor2=Michel Ledoux |editor3=Michel Émery |editor4=Marc Yor|series = Lecture Notes in Mathematics|date = 2000|volume = 1729|pages = 1–145|url = http://archive.numdam.org/ARCHIVE/SPS/SPS_2000__34_/SPS_2000__34__1_0/SPS_2000__34__1_0.pdf|doi = 10.1007/bfb0103798|isbn = 978-3-540-67314-9}}</ref> also called Sequential Monte Carlo or [[particle filter]] methods in [[Bayesian inference]] and [[signal processing]] communities.<ref name=":3">{{Cite journal|title = Sequential Monte Carlo samplers | doi=10.1111/j.1467-9868.2006.00553.x|volume=68|issue = 3|year=2006|journal=Journal of the Royal Statistical Society. Series B (Statistical Methodology)|pages=411–436 | last1 = Del Moral | first1 = Pierre|arxiv=cond-mat/0212648| s2cid=12074789}}</ref> Interacting Markov chain Monte Carlo methods can also be interpreted as a mutation-selection [[Genetic algorithm|genetic particle algorithm]] with Markov chain Monte Carlo mutations.
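
The selection-mutation structure described above can be sketched as follows. This is an illustrative toy, not the algorithm of any of the cited works: the starting law N(0, 3²), the target N(2, 0.5²), the geometric tempering schedule, the multinomial resampling, and all names and parameter values are assumptions of the example.

```python
import math
import random


def log_start(x):
    """Log-density, up to a constant, of the assumed starting law N(0, 3^2)."""
    return -x * x / 18.0


def log_target(x):
    """Log-density, up to a constant, of the assumed target N(2, 0.5^2)."""
    return -((x - 2.0) ** 2) / 0.5


def tempered(x, beta):
    """Geometric bridge: the starting law at beta = 0, the target at beta = 1."""
    return (1.0 - beta) * log_start(x) + beta * log_target(x)


def interacting_sampler(n_particles=2000, n_stages=10, n_moves=3, step=0.5, seed=4):
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 3.0) for _ in range(n_particles)]
    beta = 0.0
    for stage in range(1, n_stages + 1):
        new_beta = stage / n_stages
        # Selection: weight by the ratio of successive tempered densities, then resample.
        log_w = [(new_beta - beta) * (log_target(x) - log_start(x)) for x in particles]
        top = max(log_w)
        weights = [math.exp(lw - top) for lw in log_w]
        particles = rng.choices(particles, weights=weights, k=n_particles)
        beta = new_beta
        # Mutation: independent Metropolis-Hastings moves targeting the current level.
        for i, x in enumerate(particles):
            for _ in range(n_moves):
                proposal = x + rng.gauss(0.0, step)
                log_ratio = tempered(proposal, beta) - tempered(x, beta)
                if rng.random() < math.exp(min(0.0, log_ratio)):
                    x = proposal
            particles[i] = x
    return particles


particles = interacting_sampler()
p_mean = sum(particles) / len(particles)
p_var = sum((x - p_mean) ** 2 for x in particles) / len(particles)
print(f"particle mean {p_mean:.3f}, particle variance {p_var:.3f}")
```

In this sketch the accuracy of the final approximation is governed mainly by the number of particles, while the Metropolis–Hastings moves only need to restore diversity after each resampling step.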
 
=== Quasi-Monte Carlo ===
 
The [[quasi-Monte Carlo method]] is an analog to the normal Monte Carlo method that uses [[low-discrepancy sequence]]s instead of random numbers.<ref>{{cite journal |last1=Papageorgiou |first1=Anargyros |first2=J. F. |last2=Traub |title=Beating Monte Carlo |journal=Risk |volume=9 |issue=6 |year=1996 |pages=63–65 }}</ref><ref>{{cite journal | last1 = Sobol | first1 = Ilya M | year = 1998 | title = On quasi-monte carlo integrations | journal = Mathematics and Computers in Simulation | volume = 47 | issue = 2| pages = 103–112 | doi=10.1016/s0378-4754(98)00096-2 }}</ref> It yields an integration error that decays faster than that of true random sampling, as quantified by the [[Low-discrepancy sequence#The Koksma–Hlawka inequality|Koksma–Hlawka inequality]]. Empirically it allows the reduction of both estimation error and convergence time by an order of magnitude.{{Citation needed|date=April 2015}} Markov chain quasi-Monte Carlo methods<ref>{{cite journal |last1=Chen |first1=S. |first2=Josef |last2=Dick |first3=Art B. |last3=Owen |title=Consistency of Markov chain quasi-Monte Carlo on continuous state spaces |journal=[[Annals of Statistics]] |volume=39 |issue=2 |year=2011 |pages=673–701 |doi=10.1214/10-AOS831 |arxiv=1105.1896 |doi-access=free }}</ref><ref>{{cite thesis |last=Tribble |first=Seth D. |title=Markov chain Monte Carlo algorithms using completely uniformly distributed driving sequences |type=Diss. |publisher=Stanford University |year=2007 |id={{ProQuest|304808879}} }}</ref> such as the Array–RQMC method combine randomized quasi–Monte Carlo and Markov chain simulation by simulating <math>n</math> chains simultaneously in a way that better approximates the true distribution of the chain than ordinary MCMC.<ref>{{cite journal |last1=L'Ecuyer |first1=P. |first2=C. |last2=Lécot |first3=B. |last3=Tuffin |title=A Randomized Quasi-Monte Carlo Simulation Method for Markov Chains |journal=[[Operations Research (journal)|Operations Research]] |volume=56 |issue=4 |year=2008 |pages=958–975 |doi=10.1287/opre.1080.0556 |url=https://hal.inria.fr/inria-00070462/file/RR-5545.pdf }}</ref> In empirical experiments, the variance of the average of a function of the state sometimes converges at rate <math>O(n^{-2})</math> or even faster, instead of the <math>O(n^{-1})</math> Monte Carlo rate.<ref>{{cite journal |last1=L'Ecuyer |first1=P. |first2=D. |last2=Munger |first3=C. |last3=Lécot |first4=B. |last4=Tuffin |title=Sorting Methods and Convergence Rates for Array-RQMC: Some Empirical Comparisons |journal=Mathematics and Computers in Simulation |volume=143 |year=2018 |pages=191–201 |doi=10.1016/j.matcom.2016.07.010 }}</ref>
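
The difference between random and low-discrepancy points can be sketched on a one-dimensional integral. The example covers plain quasi-Monte Carlo integration only, not Markov chain quasi-Monte Carlo or Array–RQMC; the integrand x² on [0, 1], the base-2 van der Corput sequence, the number of points, and the seed are assumptions chosen for illustration.

```python
import random


def van_der_corput(i, base=2):
    """i-th point of the van der Corput low-discrepancy sequence in the given base."""
    q, denom = 0.0, 1.0
    while i > 0:
        i, digit = divmod(i, base)
        denom *= base
        q += digit / denom  # mirror the base-b digits of i about the radix point
    return q


def f(x):
    """Integrand; its exact integral over [0, 1] is 1/3."""
    return x * x


n = 1024
exact = 1.0 / 3.0

# Quasi-Monte Carlo: average the integrand over low-discrepancy points.
qmc_estimate = sum(f(van_der_corput(i)) for i in range(n)) / n

# Ordinary Monte Carlo: average the integrand over pseudo-random points.
rng = random.Random(5)
mc_estimate = sum(f(rng.random()) for _ in range(n)) / n

print(f"QMC error {abs(qmc_estimate - exact):.2e}")
print(f"MC  error {abs(mc_estimate - exact):.2e}")
```

The low-discrepancy points fill the interval evenly, so the quasi-Monte Carlo error here is of order 1/n up to logarithmic factors, whereas the ordinary Monte Carlo error is typically of order 1/√n.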
 
== Convergence ==
Line 55 ⟶ 57:
Several software programs provide MCMC sampling capabilities, for example:
* [https://github.com/cdslaborg/paramonte ParaMonte] parallel Monte Carlo software available in multiple programming languages including [[C (programming language)|C]], [[C++]], [[Fortran]], [[MATLAB]], and [[Python (programming language)|Python]].
* [https://github.com/dkundih/vandal Vandal] software for creation of Monte Carlo simulation available in [[Python (programming language)|Python]].
* Packages that use dialects of the [[Bayesian inference using Gibbs sampling|BUGS]] model language:
** [[WinBUGS]] / [[OpenBUGS]]/ [https://www.multibugs.org/ MultiBUGS]
Line 64 ⟶ 65:
** [https://juliahub.com/ui/Packages/General/DynamicHMC/ DynamicHMC.jl]
** [https://github.com/madsjulia/AffineInvariantMCMC.jl AffineInvariantMCMC.jl]
** [https://github.com/probcomp/Gen.jl Gen.jl]
** and the ones in StanJulia repository.
* [[Python (programming language)]] with the packages:
Line 121 ⟶ 123:
| isbn = 978-0-470-04609-8
}}
*Carlin, Brad; Chib, Siddhartha (1995). [https://wwwf.imperial.ac.uk/~das01/MyWeb/SCBI/Papers/CarlinChib.pdf "Bayesian Model Choice via Markov Chain Monte Carlo Methods"]. ''[[Journal of the Royal Statistical Society|Journal of the Royal Statistical Society, Series B]]'', 57(3), 473&ndash;484.
*{{cite journal
| first1 = George
Line 137 ⟶ 139:
| citeseerx = 10.1.1.554.3993
}}
*{{cite journal
| first1 = Siddhartha
| last1 = Chib
| author1-link = Siddhartha Chib
| first2 = Edward
| last2 = Greenberg
| title = Understanding the Metropolis&ndash;Hastings Algorithm
| journal = The American Statistician
| volume = 49
| issue = 4
| pages = 327–335
| year = 1995
| doi = 10.1080/00031305.1995.10476177
| jstor = 2684568
}}
*{{cite journal
| first1 = A.E.