Introduction To Monte Carlo Methods
Helmut G. Katzgraber
Contents
1 Introduction
2 Monte Carlo integration
2.1 Traditional integration schemes
2.2 Simple and Markov-chain sampling
2.3 Importance sampling
3 Interlude: Statistical mechanics
3.1 Simple toy model: The Ising model
3.2 Statistical physics in a nutshell
1 Introduction
The Monte Carlo method in computational physics is possibly one of the most important numerical approaches to study problems spanning all conceivable scientific disciplines. The idea is seemingly simple: Randomly sample a volume in d-dimensional space to obtain an estimate of an integral at the price of a statistical error. For problems where the phase-space dimension is very large—this is especially the case when the dimension of phase space depends on the number of degrees of freedom—the Monte Carlo method outperforms any other integration scheme. The difficulty lies in smartly choosing the random samples to minimize the numerical effort.
The term Monte Carlo method was coined in the 1940s by physicists S. Ulam,
E. Fermi, J. von Neumann, and N. Metropolis (amongst others) working on the nuclear
weapons project at Los Alamos National Laboratory. Because random numbers
(similar to processes occurring in a casino, such as the Monte Carlo Casino in Monaco)
are needed, it is believed that this is the source of the name. Monte Carlo methods
were central to the simulations done at the Manhattan Project, yet mostly hampered
by the slow computers of that era. This also spurred the development of fast random
number generators, discussed in another lecture of this series.
In this lecture, focus is placed on the standard Metropolis algorithm to study problems in statistical physics, as well as a variation known as exchange or parallel tempering Monte Carlo that is very efficient when studying problems in statistical physics with complex energy landscapes (e.g., spin glasses, proteins, neural networks) [1]. In general, continuous phase transitions are discussed. First-order phase transitions are, however, beyond the scope of these notes.

2 Monte Carlo integration
Traditionally, one partitions the interval [a, b] into M slices of width δ = (b − a)/M and then performs a kth-order interpolation of the function f(x) for each interval to approximate the integral as a discrete sum (see Fig. 1). For example, to lowest order, one performs the midpoint rule where the area of the lth slice is approximated by a rectangle of width δ and height f[(x_l + x_{l+1})/2]. It follows that
I \approx \sum_{l=0}^{M-1} \delta\, f[(x_l + x_{l+1})/2] . \qquad (2)
For M → ∞ the discrete sum converges to the integral of f(x). Convergence can be improved by replacing the rectangle with a linear interpolation between x_l and x_{l+1} (trapezoidal rule) or a weighted quadratic interpolation (Simpson’s rule) [74]. One can show that the error made due to the approximation of the function is proportional to ∼ M^{-1} for the midpoint rule if the function is evaluated at one of the interval’s edges (in the center as shown above ∼ M^{-2}), ∼ M^{-2} for the trapezoidal rule, and ∼ M^{-4} for Simpson’s rule. The convergence of the midpoint rule can thus be slow and the method should be avoided.
A problem arises when a multi-dimensional integral needs to be computed. In this
case one can show that, for example, the error of Simpson’s rule scales as ∼ M^{-4/d}
Figure 1: Illustration of the midpoint rule. The integration interval [a, b] is divided into M slices, the area of each slice approximated by the width of the slice, δ = (b − a)/M, times the function evaluated at the midpoint of each slice.
because each space component has to be partitioned independently. Clearly, for space
dimensions larger than 4 convergence becomes very slow. Similar arguments apply for
any other traditional integration scheme where the error scales as ∼ M^{-κ}: if applied
to a d-dimensional integral the error scales as ∼ M^{-κ/d}.
1 algorithm simple_pi
2 initialize n_hits 0
3 initialize m_trials 10000
4 initialize counter 0
5
6 while(counter < m_trials) do
7 x = rand(0,1)
8 y = rand(0,1)
9 if(x**2 + y**2 < 1)
10 n_hits++
11 fi
12 counter++
13 done
14
15 return pi = 4*n_hits/m_trials
For each of the m_trials trials we generate two uniform random numbers [74] in the interval [0, 1] [with rand(0,1)] and test in line 9 of the algorithm if these lie in the unit circle or not. The counter n_hits is then updated if the resulting point lies in the circle. In line 15 a statistical estimate of π is then returned.
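For concreteness, the same simple-sampling estimate can be written as a short Python routine (a sketch; function and variable names are illustrative):

import random

def simple_pi(m_trials=10000):
    """Simple-sampling estimate of pi: count random points inside the unit quarter circle."""
    n_hits = 0
    for _ in range(m_trials):
        x = random.uniform(0.0, 1.0)
        y = random.uniform(0.0, 1.0)
        if x**2 + y**2 < 1.0:   # point falls inside the quarter circle
            n_hits += 1
    return 4.0 * n_hits / m_trials

print(simple_pi())   # fluctuates around 3.14 with an error ~ 1/sqrt(m_trials)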
Before applying these ideas to the integration of a function, we introduce the
concept of a Markov chain [64]. In the simple-sampling approach to estimate the area
of a pond as presented above, the random pebbles used are independent in the sense
that a newly-selected pebble to be thrown into the rectangular area in Fig. 2 does not
depend in any way on the position of the previous pebbles. If, however, the pond is
very large, it is impossible to throw pebbles randomly from one position. Thus the
approach is modified: After enough beer you start at a random location (make sure to
drain the pond first) and throw a pebble in a random direction. You then walk to
that pebble, pull a new pebble out of a pebble bucket you have with you and repeat
the operation. This is illustrated in Fig. 4. If the pebble lands outside the rectangular
area, the thrower should go get the outlier and place it on the current position of the
thrower, i.e., if the move lies outside the sampled area, it is rejected and the last move
counted twice. Why? This will be explained later and is called detailed balance (see
p. 14). Basically, it ensures that the Markov chain is reversible. After many beers
and throws, pebbles are scattered around the rectangular area, with small piles of
multiple pebbles closer to the boundaries (due to rejected moves).
Again, these ideas can be used to estimate π by Markov-chain sampling the unit
circle. Later, the Metropolis algorithm, which is based on these simple ideas, is
introduced in detail using models from statistical physics. The following algorithm
describes Markov-chain Monte Carlo for estimating π:
1 algorithm markov_pi
2 initialize n_hits 0
3 initialize m_trials 10000
4 initialize x 0
5 initialize y 0
6 initialize counter 0
7
21 return pi = 4*n_hits/m_trials
The algorithm starts from a given position in the space to be sampled [here (0, 0)]
and generates the position of the new dot from the position of the previous one. If
the new position is outside the square, it is rejected (line 11). A careful selection of
the step size p used to generate random numbers in the range [−p, p] is of importance:
When p is too small, convergence is slow, whereas if p is too large many moves are
rejected because the simulation will often leave the unit square. Therefore, a value of
p has to be selected such that consecutive moves are accepted approximately 50% of
the time.
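A corresponding Python sketch of the Markov-chain estimate (the step size p = 0.3 is only a starting guess and should be tuned so that roughly half of the moves are accepted):

import random

def markov_pi(m_trials=10000, p=0.3):
    """Markov-chain estimate of pi: random walk inside the unit square."""
    x, y = 0.0, 0.0
    n_hits = 0
    for _ in range(m_trials):
        dx = random.uniform(-p, p)
        dy = random.uniform(-p, p)
        # Moves that leave the unit square are rejected; the old position is counted again.
        if 0.0 <= x + dx <= 1.0 and 0.0 <= y + dy <= 1.0:
            x, y = x + dx, y + dy
        if x**2 + y**2 < 1.0:
            n_hits += 1
    return 4.0 * n_hits / m_trials

print(markov_pi())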
The simple-sampling approach has the advantage over the Markov-chain approach
in that the different samples are independent and thus not correlated. In the Markov-
chain approach the new state depends on the previous state. This can be a problem
since there might be a “memory” associated with this behavior. If this memory is
large, then the autocorrelation times (i.e., the time it takes the system to forget where
it was) are large and many moves have to be discarded. Then why even think about
the Markov-chain approach? Because in the study of physical systems it is generally
easier to slightly (and randomly) change an existing state than to generate a new state
from scratch for each step of the calculation. For example, when studying a system
of N spins it is easier to flip one spin according to a given probability distribution
than to generate a new configuration from scratch with a pre-determined probability
distribution.
Let us now apply these ideas to perform a simple-sampling estimate of the integral of an actual function. As an example, we select a simple function, namely
f(x) = x^n \quad\rightarrow\quad I = \int_0^1 f(x)\, dx \qquad (3)
with n > −1. Using simple-sampling Monte Carlo, the integral can be estimated via
1 algorithm simple_integrate
2 initialize integral 0
3 initialize m_trials 10000
4 initialize counter 0
5
6 while(counter < m_trials) do
7 x = rand(0,1)
8 integral += x**n
9 counter++
10 done
11
12 return integral/m_trials
In line 8 we evaluate the function at the random location and add the result to the
estimate of the integral, i.e.,
I \approx \frac{1}{M} \sum_{i} f(x_i) , \qquad (4)
where we have set m_trials = M. To calculate the error of the estimate, we need to compute the variance of the function. For this we need to also perform a simple sampling of the square of the function, i.e., add a line to the code with integral_square += x**(2*n). It then follows [56] for the statistical error of the integral δI
\delta I = \sqrt{\frac{\mathrm{Var}\, f}{M-1}} , \qquad \mathrm{Var}\, f = \langle f^2 \rangle - \langle f \rangle^2 , \qquad (5)
with
\langle f^k \rangle = \int_0^1 [f(x)]^k\, dx \approx \frac{1}{M} \sum_{i} [f(x_i)]^k . \qquad (6)
Here x_i are uniformly distributed random numbers. The important detail is that the error in Eq. (5) does not depend on the space dimension; it merely decreases as M^{-1/2}. This means that, for example, for space dimensions d > 8 Monte Carlo sampling outperforms Simpson’s rule.
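A Python sketch of this simple-sampling integration, including the error estimate of Eqs. (4)-(6) (the exponent n and the number of samples are arbitrary choices):

import math
import random

def simple_integrate(n=2.0, m_trials=10000):
    """Simple-sampling estimate of the integral of x**n over [0, 1] and its statistical error."""
    f_sum, f2_sum = 0.0, 0.0
    for _ in range(m_trials):
        x = random.uniform(0.0, 1.0)
        fx = x**n
        f_sum += fx          # accumulates <f>
        f2_sum += fx**2      # accumulates <f^2>
    mean = f_sum / m_trials
    var = f2_sum / m_trials - mean**2
    error = math.sqrt(var / (m_trials - 1))
    return mean, error

estimate, error = simple_integrate()
print(estimate, "+/-", error, " exact:", 1.0 / 3.0)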
The presented simple-sampling approach has one crucial problem: When in the
example shown the exponent n is close to −1 or much larger than 1 the variance of
the function in the interval is large. At the same time, the interval [0, 1] is sampled
uniformly. Therefore, similar to the estimate of π, areas which carry little weight for the integral are sampled with the same probability as areas which carry most of the function’s support (see Fig. 5). As a result, the integral and error converge slowly. To
alleviate the situation and shift resources where they are needed most, importance
sampling is used.
Figure 5: Illustration of the simple-sampling approach when integrating f(x) = x^n with n ≫ 1. The function has most support for x → 1. Because random numbers are generated with a uniform probability, the whole range [0, 1] is sampled with equal probability, although for x → 0 the contribution to the integral is small. Thus, the integral converges slowly.
3 Interlude: Statistical mechanics
The first term in Eq. (9) is responsible for the pairwise interaction between two
neighboring spins Si and Sj . When Jij = −J < 0, the energy is minimized by
aligning all spins, i.e., ferromagnetic order, whereas when Jij = J > 0 the energy is
minimized by ensuring that the product over all neighboring spins is negative. In this
case, staggered antiferromagnetic order is obtained for T → 0. The “⟨i, j⟩” represents
a sum over nearest-neighbor pairs of spins on the lattice (see Fig. 6). The second term
in Eq. (9) represents a coupling to an external field of strength H. Amazingly, this
simple model captures all interesting phenomena found in the physics of statistical
mechanics and phase transitions. It is exactly solvable in one space dimension, and in
two dimensions for H = 0, and thus an excellent test bed for algorithms. Furthermore,
in space dimensions larger than one it undergoes a finite-temperature transition into
an ordered state.
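As a concrete illustration, the following Python sketch evaluates the energy and magnetization of a configuration on an L × L periodic lattice, assuming the convention H = Σ_⟨i,j⟩ J_ij S_i S_j − H Σ_i S_i described above (the sign of the field term is the usual choice and is an assumption here; uniform bonds are used for simplicity):

import numpy as np

def ising_energy(S, Jij=-1.0, H_field=0.0):
    """Energy of an L x L Ising configuration with uniform coupling Jij and field H_field,
    using H = sum_<i,j> Jij * S_i * S_j - H_field * sum_i S_i.
    Jij = -1 (i.e. Jij = -J < 0) gives the ferromagnet, Jij = +1 the antiferromagnet."""
    # Sum each nearest-neighbor pair once: right and down neighbors, periodic boundaries.
    pair_sum = np.sum(S * np.roll(S, 1, axis=0)) + np.sum(S * np.roll(S, 1, axis=1))
    return Jij * pair_sum - H_field * np.sum(S)

def magnetization(S):
    """Magnetization per spin, Eq. (10)."""
    return np.mean(S)

L = 8
S = np.random.choice([-1, 1], size=(L, L))   # random high-temperature configuration
print(ising_energy(S), magnetization(S))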
A natural way to quantify the temperature-dependent transition in the ferromag-
netic case is to measure the magnetization
m = \frac{1}{N} \sum_i S_i \qquad (10)
of the system. When all spins are aligned, i.e., at low temperatures (below the
transition), the magnetization is close to unity. For temperatures much larger than the
transition temperature Tc , spins fluctuate wildly and so, on average, the magnetization
is zero. Therefore, the magnetization plays the role of an order parameter that is large
in the ordered phase and zero otherwise. Before the model is described further, some
basic concepts from statistical physics are introduced.
The sum is over all states s in the system, and k represents the Boltzmann constant. Z = \sum_s \exp[-H(s)/kT] is the partition function which normalizes the equilibrium Boltzmann distribution

P_{\rm eq}(s) = \frac{1}{Z}\, e^{-H(s)/kT} . \qquad (12)
The ⟨· · ·⟩ in Eq. (11) represent a thermal average. One can show that the internal energy of the system is given by

E = \langle H(s) \rangle , \qquad (13)

and that the free energy is

F = -kT \ln Z . \qquad (14)
Note that all thermodynamic quantities can be computed directly from the partition
function and expressed as derivatives of the free energy (see Ref. [91] for details).
Because the partition function is closely related to the Boltzmann distribution, it
follows that if we can sample observables (e.g., measure the magnetization) with
states generated according to the corresponding Boltzmann distribution, a simple
Markov-chain “integration” scheme can be used to produce an estimate.
Phase transitions Continuous phase transitions [43] have no latent heat at the
transition and are thus easier to describe. At a continuous phase transition the free
energy has a singularity that usually manifests itself via a power-law behavior of the
derived observables at criticality. The correlation length ξ [43]—which gives us a
measure of correlations and order in a system—diverges at the transition
\xi \sim |T - T_c|^{-\nu} , \qquad (15)

with ν a critical exponent quantifying this divergence and T_c the transition temperature. Close enough to the transition (i.e., |T − T_c|/T_c ≪ 1) the behavior of observables can be well described by power laws. For example, the specific heat c_V has a singularity at T_c with c_V ∼ |T − T_c|^{-α}, although the exponent α (unlike ν) can be both negative and positive. The magnetization does not diverge, but has a singular kink at T_c, i.e., m ∼ |T − T_c|^β with β > 0.
Using arguments from the renormalization group [31] it can be shown that the crit-
ical exponents are related via scaling relations. Often (as in the Ising case), only two
exponents are independent and fully characterize the critical behavior of the model.
It can be further shown that models in statistical physics generally obey universal be-
havior (there are some exceptions. . . ), i.e., if the lattice geometry is kept the same, the
critical exponents only depend on the order parameter symmetry. Therefore, when
simulating a statistical model, it is enough to determine the location of the transition
temperature Tc , as well as two independent critical exponents to fully characterize the
universality class of the system.
Finite-size scaling and the Binder ratio (or “Binder cumulant”) How can
we determine the bulk critical exponents of a system by simulating finite lattices?
When the systems are not infinitely large, the critical behavior is smeared out. Again,
using arguments from the renormalization group, one can show that the nonanalytic
part of a given observable can be described by a finite-size scaling form [75]. For
example, the finite-size magnetization from a simulation of an Ising system with Ld
spins is asymptotically (close to the transition, and for large L) given by
where close to the transition χ ∼ |T − T_c|^{-γ} (for the infinite system, L → ∞) and

\chi = \frac{L^d}{kT} \left[ \langle m^2 \rangle - \langle m \rangle^2 \right] . \qquad (18)
Both M̃ and C̃ are unknown scaling functions. Equations (16) and (17) show that when T = T_c, data for ⟨m_L⟩/L^{β/ν} and χ_L/L^{γ/ν} simulated for different system sizes L should cross in the large-L limit at one point, namely T = T_c, provided we use
the right expressions for β/ν and γ/ν, respectively. In reality, there are nonanalytic
corrections to scaling and so the crossing points between two successive system size
pairs (e.g., L and 2L) converge to a common crossing point for L → ∞ that agrees
with the bulk transition temperature Tc . Performing the finite-size scaling analysis
with the magnetization or the susceptibility is not very practical, because neither β
nor γ are known a priori. There are other approaches to determine these, but a far
simpler method is to determine combined quantities that are dimensionless. One such
quantity is known as the Binder ratio (or “Binder cumulant”) [12] given by
g = \frac{1}{2} \left[ 3 - \frac{\langle m^4 \rangle}{\langle m^2 \rangle^2} \right] \sim \tilde{G}[L^{1/\nu} (T - T_c)] . \qquad (19)
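Given a time series of measured magnetizations, the Binder ratio of Eq. (19) is straightforward to estimate; a small sketch (error bars, e.g., from jackknife resampling, are omitted):

import numpy as np

def binder_ratio(m_samples):
    """Binder ratio g = (1/2)[3 - <m^4>/<m^2>^2] from a series of magnetization measurements."""
    m = np.asarray(m_samples)
    m2 = np.mean(m**2)
    m4 = np.mean(m**4)
    return 0.5 * (3.0 - m4 / m2**2)

# A Gaussian-distributed magnetization (high T) gives g close to 0,
# while a sharply peaked |m| (low T) gives g close to 1.
print(binder_ratio(np.random.normal(0.0, 0.1, size=10000)))     # ~ 0
print(binder_ratio(np.random.choice([-1.0, 1.0], size=10000)))  # ~ 1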
4 Monte Carlo simulations in statistical physics
Figure 7: Left panel: Binder ratio as a function of temperature for the two-
dimensional Ising model with nearest-neighbor interactions. The data approxi-
mately cross at one point (the dashed line corresponds to the exactly-known Tc for
the two-dimensional Ising model) signaling a transition. Right panel: Finite-size
scaling of the data in the left panel using the known Tc = 2.269 . . . and ν = 1. Plot-
ted are data for the Binder ratio as a function of the scaling variable L1/ν [T − Tc ].
Data for different system sizes fall onto a universal curve suggesting that the pa-
rameters used are the correct ones.
Equation (20) can be trivially extended with a distribution for the states, i.e.,
\langle O \rangle = \frac{\sum_s [O(s)\, e^{-H(s)/kT} / \mathcal{P}(s)]\, \mathcal{P}(s)}{\sum_s [e^{-H(s)/kT} / \mathcal{P}(s)]\, \mathcal{P}(s)} . \qquad (21)
The approach is completely analogous to the importance sampling Monte Carlo inte-
gration. If P(s) is the Boltzmann distribution [Eq. (12)] then the factors cancel out
and we obtain
\langle O \rangle = \frac{1}{M} \sum_i O(s_i) , \qquad (22)
where the states si are now selected according to the Boltzmann distribution. The
problem now is to find an algorithm that allows for a sampling of the Boltzmann
distribution. The method is known as the Metropolis algorithm.
“The purpose of this paper is to describe a general method, suitable for fast
electronic computing machines, of calculating the properties of any substance
which may be considered as composed of interacting individual molecules.”
And they were right. The idea is the following: In order to evaluate Eq. (20) we
generate a Markov chain of successive states s_1 → s_2 → . . .. The new state is generated from the old state with a carefully-designed transition probability P(s → s′) such that it occurs with a probability given by the equilibrium Boltzmann distribution, i.e., P_eq(s) = Z^{-1} exp[−H(s)/kT]. In the Markov process, the state s occurs with probability P_k(s) at the kth time step, described by the master equation
P_{k+1}(s) = P_k(s) + \sum_{s'} \left[ T(s' \to s) P_k(s') - T(s \to s') P_k(s) \right] . \qquad (23)
The sum is over all states s′ and the first term in the sum describes all processes reaching state s, while the second term describes all processes leaving state s. The goal is that for k → ∞ the probabilities P_k(s) reach a stationary distribution described by the Boltzmann distribution. The transition probabilities T can be designed in such a way that for P_k(s) = P_eq(s), all terms in the sum vanish, i.e., for all s and s′ the detailed balance condition

T(s \to s')\, P_{\rm eq}(s) = T(s' \to s)\, P_{\rm eq}(s') \qquad (24)

must hold. The condition in Eq. (24) means that the process has to be reversible.
Furthermore, when the system has assumed the equilibrium probabilities, the ratio of the transition probabilities only depends on the change in energy ∆H(s, s′) = H(s′) − H(s), i.e.,

\frac{T(s \to s')}{T(s' \to s)} = \exp[-(H(s') - H(s))/kT] = \exp[-\Delta H(s, s')/kT] . \qquad (25)
There are different choices for the transition probabilities T that satisfy Eq. (25). One can show that T has to satisfy the general equation T(x)/T(1/x) = x ∀x with x = exp(−∆H/kT). There are two convenient choices for T that satisfy this condition: the Metropolis probability, T(s → s′) = min{1, exp[−∆H(s, s′)/kT]}, and the heat bath probability, T(s → s′) = 1/{1 + exp[∆H(s, s′)/kT]}. In these notes we focus on the Metropolis algorithm. The heat bath algorithm is more efficient when temperatures far below the transition temperature are sampled.
The move between states s and s′ can, in principle, be arbitrary. If, however, the
energies of states s and s′ are too far apart, the move will likely not be accepted. For
the case of the Ising model, in general, a single spin Si is selected and flipped with
the following probability:
T(S_i \to -S_i) =
\begin{cases}
\Gamma , & \text{for } S_i = -\operatorname{sign}(h_i) ; \\
\Gamma\, e^{-2 S_i h_i / kT} , & \text{for } S_i = \operatorname{sign}(h_i) ,
\end{cases}
\qquad (27)

where h_i = −Σ_{j ≠ i} J_ij S_j + H is the effective local field felt by the spin S_i.
1 algorithm ising_metropolis(T,steps)
2 initialize starting configuration S
3 initialize O = 0
4
13 O += O(S’)
14 done
15
16 return O/steps
After initialization, in line 6 a proposed state is generated by, e.g., flipping a spin. The energy of the new state is computed and from it the transition probability between states, p = T(S → S′). A uniform random number x ∈ [0, 1] is generated. If the probability is larger than the random number, the move is accepted. If the energy is lowered, i.e., ∆H < 0, the spin is always flipped. Otherwise the spin is flipped with a probability p. Once the new state is accepted, we measure a given observable and
record its value to perform the thermal average at a given temperature. For steps
→ ∞ the average of the observable converges to the exact value, again with an error
inversely proportional to the square root of the number of steps. This is the core
bare-bones routine for the Metropolis algorithm. In practice, several aspects have to
be considered to ensure that the data produced are correct. The most important,
autocorrelation and equilibration times, are described below.
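A compact Python sketch of one Metropolis sweep for the two-dimensional ferromagnetic Ising model (zero field, coupling strength J = 1 in the convention J_ij = −J, and temperature in units where k = 1; lattice size and temperature below are arbitrary choices):

import numpy as np

rng = np.random.default_rng()

def metropolis_sweep(S, T, J=1.0):
    """One Monte Carlo sweep (N = L*L attempted single-spin flips) of the Metropolis algorithm
    for the ferromagnet H = -J * sum_<i,j> S_i S_j (i.e. J_ij = -J < 0 in Eq. (9))."""
    L = S.shape[0]
    for _ in range(L * L):
        i, j = rng.integers(0, L, size=2)
        # Sum of the four nearest neighbors with periodic boundaries.
        nn = S[(i + 1) % L, j] + S[(i - 1) % L, j] + S[i, (j + 1) % L] + S[i, (j - 1) % L]
        dE = 2.0 * J * S[i, j] * nn      # energy change if spin (i, j) is flipped
        if dE <= 0.0 or rng.random() < np.exp(-dE / T):
            S[i, j] *= -1                # accept the flip
    return S

L, T = 16, 2.0
S = rng.choice([-1, 1], size=(L, L))
for sweep in range(100):                 # would be discarded as equilibration in a real run
    metropolis_sweep(S, T)
print("m =", np.mean(S))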
4.2 Equilibration
In order to obtain a correct estimate of an observable O, it is imperative to ensure
that one is actually sampling an equilibrium state. Because, in general, the initial configuration of the simulation can be chosen arbitrarily—popular choices being random or fully polarized configurations—the system will have to evolve for several Monte Carlo
steps before an equilibrium state at a given temperature is obtained. The time τeq
until the system is in thermal equilibrium is called equilibration time and depends
directly on the system size (e.g., the number of spins N = Ld ) and increases with
decreasing temperature. In general, it is measured in units of Monte Carlo sweeps
(MCS), i.e., 1 MCS = N spin updates.
In practice, all measured observables should be monitored as a function of MCS to
ensure that the system is in thermal equilibrium. Some observables, such as the
energy, equilibrate faster than others (e.g., magnetization) and thus the equilibration
times of all observables measured need to be considered.
\tau_{\rm auto} \sim \xi^{z} \qquad (31)
with z > 1 and typically around 2. Because the correlation length ξ diverges at a
continuous phase transition, so does the autocorrelation time. This effect, known as
critical slowing down, slows simulations to intractable times close to continuous phase
transitions when the dynamical critical exponent z is large.
The problem can be alleviated by using Monte Carlo methods which, while only
performing small changes to the energy of the system (to ensure that moves are
accepted frequently), heavily randomize the spin configurations and not only change
the value of one spin. This ensures that phase space is sampled evenly. Typical
examples are cluster algorithms [82, 90] where a carefully-built cluster of spins is
flipped at each step of the simulation [36, 58, 59, 68].
Wolff cluster algorithm (Ising spins) In the Wolff cluster algorithm [90] we choose a random spin and build a cluster around it (the algorithm is constructed in such a way that larger clusters are preferred). Once the cluster is constructed, it is flipped in a rejection-free move. This “randomizes” the system efficiently, thus overcoming critical slowing down. In outline: a seed spin is chosen at random; aligned neighbors are added to the cluster with probability p_add = 1 − exp(−2J/kT); newly added spins are treated in the same way until the cluster stops growing; finally, the whole cluster is flipped.
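A minimal Python sketch of this construction for the two-dimensional ferromagnetic Ising model (again k = 1; the addition probability p_add = 1 − exp(−2J/T) is the standard choice for this convention):

import numpy as np

rng = np.random.default_rng()

def wolff_update(S, T, J=1.0):
    """Grow and flip one Wolff cluster (ferromagnetic Ising model, periodic L x L lattice)."""
    L = S.shape[0]
    p_add = 1.0 - np.exp(-2.0 * J / T)
    i, j = rng.integers(0, L, size=2)           # random seed spin
    seed = S[i, j]
    cluster = {(i, j)}
    stack = [(i, j)]
    while stack:
        x, y = stack.pop()
        neighbors = [((x + 1) % L, y), ((x - 1) % L, y), (x, (y + 1) % L), (x, (y - 1) % L)]
        for nx, ny in neighbors:
            # Aligned neighbors join the cluster with probability p_add.
            if (nx, ny) not in cluster and S[nx, ny] == seed and rng.random() < p_add:
                cluster.add((nx, ny))
                stack.append((nx, ny))
    for x, y in cluster:                        # rejection-free flip of the whole cluster
        S[x, y] *= -1
    return len(cluster)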
The algorithm obeys detailed balance. Furthermore, one can show that the linear size
of the cluster is proportional to the correlation length. Therefore the algorithm adapts
to the behavior of the system at criticality resulting in z ≈ 0, i.e., the critical slowing
down encountered around the transition is removed and the algorithm performs orders
of magnitude faster than simple Monte Carlo. For low temperatures, the cluster algorithm merely “flip-flops” almost all spins of the system and does not provide much improvement, unless a domain wall is stuck in the system. For temperatures much
higher than the critical temperature the size of the clusters is of the order of one spin
and there the Metropolis algorithm outperforms the cluster algorithm (keep in mind
that building the cluster takes many operations). Thus the method works best at
criticality.
In general, to be able to cover a temperature range that extends beyond the critical region, combinations of cluster updates and local updates (standard Monte Carlo) are recommended. One can also define improved estimators to measure observables with
a reduced statistical error. Finally, the Wolff cluster algorithm can also be generalized
to Potts spins, XY and Heisenberg spins, as well as hard disks. The reader is referred
to the literature [36, 56, 58, 59, 68] for details.
Note that the Swendsen-Wang cluster algorithm [82] is similar to the Wolff cluster
algorithm. However, instead of building one cluster, multiple clusters are built. This
is less efficient when the space dimension is larger than two because in that case only
few large clusters will exist.
5 Complex toy model: The Ising spin glass

Spin glasses and related disordered systems are characterized by a complex energy landscape with deep valleys and mountains that grow exponentially with the system size. Therefore, for low temperatures, equilibration times of simple Monte Carlo methods diverge. Although the
method technically still works, the time it takes to equilibrate even the smallest sys-
tems becomes impractical. Improved sampling techniques for rough energy landscapes
need to be implemented.
is a hallmark of spin glasses [13, 21, 23, 28, 65, 83, 92]. Note that, in general, the bonds
are either chosen from a bimodal (Pb ) or Gaussian (Pg ) disorder distribution:
P_b(J_{ij}) = p\, \delta(J_{ij} - 1) + (1 - p)\, \delta(J_{ij} + 1) , \qquad P_g(J_{ij}) = \frac{1}{\sqrt{2\pi}} \exp[-J_{ij}^2/2] , \qquad (32)
where, in general p = 1/2. The Hamiltonian in Eq. (9) with disorder in the bonds
is known as the Edwards-Anderson Ising spin glass. There is a finite-temperature
transition for space dimensions d ≥ 3 between a spin-glass and the (thermally) disor-
dered state, cf. Sec. 5.2. For example, for Gaussian disorder in three space dimensions
Tc ≈ 0.95 [49].
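Generating the random bonds of Eq. (32) for an L × L lattice is straightforward; a sketch (storing the couplings as one array of horizontal and one of vertical bonds is an arbitrary but common layout):

import numpy as np

rng = np.random.default_rng()

def random_bonds(L, distribution="gaussian", p=0.5):
    """Draw the couplings J_ij of Eq. (32) for the horizontal and vertical bonds of an
    L x L lattice: Gaussian with zero mean and unit variance, or bimodal +/-1."""
    if distribution == "gaussian":
        Jx = rng.normal(0.0, 1.0, size=(L, L))
        Jy = rng.normal(0.0, 1.0, size=(L, L))
    else:  # bimodal: +1 with probability p, -1 with probability 1 - p
        Jx = np.where(rng.random((L, L)) < p, 1.0, -1.0)
        Jy = np.where(rng.random((L, L)) < p, 1.0, -1.0)
    return Jx, Jy

Jx, Jy = random_bonds(8)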
Figure 10: Two-dimensional Ising spin-glass. The circles represent Ising spins.
A thin line between two spins i and j corresponds to Jij < 0, whereas a thick
line corresponds to Jij > 0. In comparison to a ferromagnet, the behavior of the
model system changes drastically, as illustrated in the highlighted plaquette. For
T → 0, the spin in the lower left corner is unable to fulfill the interactions with the
neighbors and is frustrated (see text).
standard theoretical predictions of replica symmetry breaking and the droplet theory.
How can order be quantified in a system that intrinsically does not have visible
spatial order? For this we need to first determine what differentiates a spin glass at temperatures above and below the critical point T_c. Above the transition, like
for the regular Ising model, spins fluctuate and any given snapshot yields a random
configuration. Therefore, comparing a snapshot at time t and time t + δt yields
completely different results. Below the transition, (replica) symmetry is broken and
configurations freeze into place. Therefore, comparing a snapshot of the system at
time t and time t + δt shows significant similarities. A natural choice thus is to define
an overlap function q which compares two copies of the system with the same disorder.
In simulations, it is less practical to compare two snapshots of the system at
different times. Therefore, for practical reasons two copies (called “replicas”) α and β
with the same disorder but different initial conditions and Markov chains are simulated
in parallel. The order parameter is then given by
q = \frac{1}{N} \sum_i S_i^{\alpha} S_i^{\beta} , \qquad (33)
and is illustrated graphically in Fig. 11. For temperatures below Tc , q tends to unity
whereas for T > Tc on average q → 0, similar to the magnetization for the Ising
ferromagnet. Analogous to the ferromagnetic case, we can define a Binder ratio g by
replacing the magnetization m with the spin overlap q to probe for the existence of a
spin-glass state.
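Computing the spin overlap of Eq. (33) from two replica configurations is then a one-liner; a sketch:

import numpy as np

def spin_overlap(S_alpha, S_beta):
    """Spin overlap q = (1/N) sum_i S_i^alpha S_i^beta between two replicas, Eq. (33)."""
    return np.mean(S_alpha * S_beta)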
6 Parallel tempering Monte Carlo
The simulation of these systems with standard Monte Carlo [55, 59, 64] or molecular dynamics [29]
methods is slowed down by long relaxation times due to the suppression of tunnel-
ing through these barriers. Already simple chemical reactions with latent heat, i.e.,
first-order phase transitions, present huge numerical challenges that are not present
for systems which undergo second-order phase transitions where improved updating
techniques, such as cluster algorithms [82,90], can be used. For complex systems with
competing interactions, one instead attempts to improve the local updating technique
by introducing artificial statistical ensembles such that tunneling times through bar-
riers are reduced and autocorrelation effects minimized.
One such method is parallel tempering Monte Carlo [5,30,44,61,81] that has proven
to be a versatile “workhorse” in many fields [24]. Similar to replica Monte Carlo
[81], simulated tempering [61], or extended ensemble methods [60], the algorithm
aims to overcome free-energy barriers in the free energy landscape by simulating
several copies of the target system at different temperatures. The system can thus
escape metastable states when wandering to higher temperatures and relax to lower
temperatures again in time scales several orders of magnitude smaller than for a simple
Monte Carlo simulation at one fixed temperature. The method has also been combined
with several other algorithms such as genetic algorithms and related optimization
methods, molecular dynamics, cluster algorithms and quantum Monte Carlo.
T[(E_i, T_i) \to (E_{i+1}, T_{i+1})] = \min\{1, \exp[(E_{i+1} - E_i)(1/T_{i+1} - 1/T_i)]\} . \qquad (34)
A given configuration will thus perform a random walk in temperature space over-
coming free energy barriers by wandering to high temperatures where equilibration
is rapid and configurations change more rapidly, and returning to low temperatures
where relaxation times can be long. Unlike for simple Monte Carlo, the system can
efficiently explore the complex energy landscape. Note that the update probability in
Eq. (34) obeys detailed balance.
At first sight it might seem wasteful to simulate a system at multiple tempera-
tures. In most cases, the number of temperatures does not exceed 100 values, yet the
speedup attained can be 5 – 6 orders of magnitude. Furthermore, one often needs the
temperature dependence of a given observable and so the method delivers data for
different temperatures in one run. A simple pseudocode implementation of the parallel tempering move, which is called after a certain number of lattice sweeps, is shown below.
1 algorithm parallel_tempering(*energy,*temp,*spins)
2
The swap( ) function swaps neighboring energies and spin configurations (*spins)
if the move is accepted. As simple as the algorithm is, some fine tuning has to be
performed for it to operate optimally.
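A Python sketch of the parallel tempering move of Eq. (34), assuming the energies, temperatures, and spin configurations of the copies are stored in parallel lists (names are illustrative):

import math
import random

def parallel_tempering_step(energies, temps, configs):
    """Attempt swaps of all neighboring temperature pairs with the probability of Eq. (34)."""
    for i in range(len(temps) - 1):
        delta = (energies[i + 1] - energies[i]) * (1.0 / temps[i + 1] - 1.0 / temps[i])
        if delta >= 0.0 or random.random() < math.exp(delta):
            # Swap configurations and energies; the temperatures themselves stay in place.
            configs[i], configs[i + 1] = configs[i + 1], configs[i]
            energies[i], energies[i + 1] = energies[i + 1], energies[i]

In practice such a step is attempted only every few lattice sweeps, as described above.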
Because relaxation is slower for lower temperatures, the geometric progression places more temperatures close to T_1. If, however, the specific heat of the system has a strong divergence, this approach is not optimal.
One can show that the acceptance probabilities are inversely correlated to the functional behavior of the specific heat per spin c_V via ∆T_{i,i+1} ∼ T_i/\sqrt{c_V N} [72].
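For reference, a geometric temperature set between a lowest temperature T_min and a highest temperature T_max can be generated as follows (the endpoints and the number of temperatures are free choices):

import numpy as np

def geometric_temperatures(t_min, t_max, n_temps):
    """Geometric progression of temperatures: the ratio T_{i+1}/T_i is constant, which
    places more temperatures close to the lowest temperature t_min on a linear scale."""
    ratio = (t_max / t_min) ** (1.0 / (n_temps - 1))
    return np.array([t_min * ratio**i for i in range(n_temps)])

print(geometric_temperatures(0.5, 2.0, 8))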
Improved approaches Recently, a new iterative feedback method has been in-
troduced to optimize the position of the temperatures in parallel tempering simula-
tions [51]. The idea is to treat the set of temperatures as an ensemble and thus use
ensemble optimization methods [86] to improve the round-trip times of a given system
copy in temperature space. Unlike the conventional approaches, resources are allo-
cated to the bottlenecks of the simulation, i.e., phase transitions and ground states
where relaxation is slow. As a consequence, acceptance probabilities are temperature-
dependent because more temperatures are allocated to the bottlenecks. The approach
requires one to gather enough round-trip data for the temperature sets to converge
and thus is not always practical. For details on the implementation, see Refs. [51]
and [85], as well as Ref. [33] for an improved version.
A similar approach to optimize the efficiency of parallel tempering has recently
been introduced by Bittner et al. [15]. Unlike the previously-mentioned feedback
method, this approach leaves the position of the temperatures untouched but with an
average acceptance probability of 50%. To deal with free-energy barriers in the simu-
lation, the autocorrelation times of the simulation without parallel tempering have to
be measured ahead of time. The number of MCS between parallel tempering updates
is then dependent on the autocorrelation times, i.e., close to a phase transition, more
MCS between parallel tempering moves are performed. Again, the method is thus
optimized because resources are reallocated to where they are needed most. Unfortu-
nately, this approach also requires a simulation to be done ahead of time to estimate
the autocorrelation times, but a rough estimate is sufficient.
spin correlator known as the link overlap q_ℓ [50]. The link overlap is given by

q_\ell = \frac{1}{dN} \sum_{\langle i,j \rangle} S_i^{\alpha} S_j^{\alpha} S_i^{\beta} S_j^{\beta} . \qquad (36)
The sum in Eq. (36) is over neighboring spin pairs and the normalization is over all
bonds. If a domain of spins in a spin glass is flipped, the link overlap measures the
average length of the boundary of the domain.
where ⟨· · ·⟩ represents the Monte Carlo average for a given set of bonds, and [· · ·]_av denotes an average over the (Gaussian) disorder. One can perform an integration by parts over J_ij to relate u to the average link overlap defined in Eq. (36), i.e.,

[\langle q_\ell \rangle]_{\rm av} = 1 + \frac{T u}{d} . \qquad (38)
The simulation starts with a random spin configuration. This means that the two
sides of Eq. (38) approach equilibrium from opposite directions. Data for q` will be
too small because we started from a random configuration, whereas the initial energy
will not be as negative as in thermal equilibrium. Once both sides of Eq. (38) agree,
the system is in thermal equilibrium. This is illustrated in Fig. 12 for the Edwards-Anderson Ising spin glass with 4^3 spins and T = 0.5, which is approximately 50% of T_c. The data are averaged over 5000 realizations of the disorder. While the data for q_ℓ generated with parallel tempering Monte Carlo agree after approximately 300 MCS, the data produced with simple Monte Carlo have not reached equilibrium even after 10^5 MCS, thus illustrating the power of parallel tempering for systems with a rough energy landscape.
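A sketch of the link-overlap measurement of Eq. (36) for two replicas on a periodic L × L lattice (d = 2 here), which can be compared with 1 + Tu/d as in Eq. (38) to test equilibration:

import numpy as np

def link_overlap(S_alpha, S_beta):
    """Link overlap q_l of Eq. (36) for two replicas on a periodic L x L lattice (d = 2)."""
    d = 2
    N = S_alpha.size
    q_link = 0.0
    for axis in range(d):  # sum the bonds in each lattice direction
        q_link += np.sum(S_alpha * np.roll(S_alpha, 1, axis=axis) *
                         S_beta * np.roll(S_beta, 1, axis=axis))
    return q_link / (d * N)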
7 Rare events: Probing tails of energy distributions
Together with the disorder distribution P(J), this defines the ground-state energy distribution

P(E) = \int d\mathcal{J}\, \mathcal{P}(\mathcal{J})\, \delta[E - E(\mathcal{J})] . \qquad (40)
via

P(E) = \frac{1}{N_{\rm samp}} \sum_{i=1}^{N_{\rm samp}} \delta[E - E(\mathcal{J}_i)] , \qquad (41)
so that the averages of functions with respect to the disorder are replaced by averages
with respect to the Nsamp random samples. The functional form of the ground-state
energy distribution and its parameters can be estimated by a maximum likelihood fit
of an empirical distribution Fθ (E) with parameters {θ} to the data [19]. Note that
due to the limited range of energies sampled by the simple-sampling algorithm it is
often difficult or even impossible to quantify how well the tails of the distribution are
described by a maximum-likelihood fit.
where it decays to 1/e [58]. Here Ei is the ground state energy after the i-th Monte
Carlo step and h. . .i represents an average over Monte Carlo time. To be sure that the
visited ground-state configurations are not correlated, we empirically only use every
4τ -th measurement. Once the autocorrelation effects have been quantified, the data
can be analyzed with the same methods as the simple-sampling results [see Eq. (41)].
Figure 13: Autocorrelation function as defined in Eq. (43) for the Sherrington-
Kirkpatrick spin glass and system sizes N = 16 (circles) and N = 128 (triangles).
The value 1/e is marked by the horizontal dotted line. Time steps i are measured
in Monte Carlo steps. (Figure adapted from Reference [54]).
We first compute 10^5 ground-state energies and bin the data into 50 bins and perform a maximum-likelihood fit to a function that describes the shape of the ground-state energy distribution best. In this case, this is a modified Gumbel distribution [32]:

F_{\mu,\nu,m}(E) \propto \exp\!\left[ m\, \frac{E - \mu}{\nu} - m\, \exp\!\left( \frac{E - \mu}{\nu} \right) \right] . \qquad (45)
The modified Gumbel distribution is parametrized by the “location” parameter µ,
the “width” parameter ν, and the “slope” parameter m. The parameters µ, ν and
m estimated from a maximum-likelihood fit represent the input parameters for the
guiding function used in the importance-sampling simulation in the disorder. To
perform a step in the Monte Carlo algorithm, we choose a site at random, replace
all bonds connected to this site (the expected change in the ground-state energy is
then of the order ∼ 1/N ), calculate the ground-state energy of the new configuration,
and accept the new configuration with the probability given in Eq. (42). A study of
the energy autocorrelation shows that for system sizes between 16 and 128 spins the
autocorrelation times are of the order of 400 to 700 Monte Carlo steps, see Fig. 13.
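A sketch of such a maximum-likelihood fit of the modified Gumbel form in Eq. (45), assuming scipy is available; the normalization of Eq. (45) is computed numerically over the sampled energy range and the starting values are rough guesses:

import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize

def neg_log_likelihood(params, energies):
    """Negative log-likelihood of the modified Gumbel distribution, Eq. (45)."""
    mu, nu, m = params
    if nu <= 0.0 or m <= 0.0:
        return np.inf
    log_f = lambda E: m * (E - mu) / nu - m * np.exp((E - mu) / nu)
    # Numerical normalization of exp(log_f) over the sampled energy range (extended a bit).
    lo, hi = energies.min() - 10 * nu, energies.max() + 10 * nu
    norm, _ = quad(lambda E: np.exp(log_f(E)), lo, hi)
    return -(np.sum(log_f(energies)) - len(energies) * np.log(norm))

# energies = np.loadtxt("ground_state_energies.dat")   # hypothetical input file
# fit = minimize(neg_log_likelihood, x0=[energies.mean(), energies.std(), 1.0],
#                args=(energies,), method="Nelder-Mead")
# mu, nu, m = fit.x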
[Figure: ground-state energy distributions P_N(E) as a function of E, on a logarithmic vertical scale, for system sizes N = 16, 24, 32, 48, 64, 96, and 128.]
In comparison to similar methods [35, 40] the presented approach has several ad-
vantages due to its simplicity: Instead of iterating towards a good guiding function,
which may be quite expensive computationally, we use a maximum likelihood fit as a
guiding function. Therefore, the proposed algorithm is straightforward to implement
and considerably more efficient than traditional approaches, provided a good guiding
function, i.e., a good maximum-likelihood fit to the simple-sampling results, can be
found. Note also that the method can be generalized to any distribution function,
such as an order-parameter distribution.
Acknowledgments
I would like to thank Juan Carlos Andresen and Ruben Andrist for critically reading
the manuscript. Furthermore, I thank M. Hasenbusch for spotting an error in Sec. 2.1.
References
[1] The bulk of this chapter is based on material collected from the different books
cited.
[2] The alcohol is to improve the randomness of the sampling. If the experimentalist
is not of legal drinking age, it is recommended to close the eyes and rotate 42
times on the spot at high speed before a pebble is thrown.
[3] The pseudo code used does not follow any rules and is by no means consistent.
But it should bring the general ideas across.
[4] Although the algorithm is known as the Metropolis algorithm, N. Metropolis’
contribution to the project was minimal. He merely was the team leader at the
lab. The bulk of the work was carried out by two couples, the Rosenbluths and
the Tellers.
[5] The method is also known under the name of “Exchange Monte Carlo” (EMC)
and “Multiple Markov Chain Monte Carlo” (MCMC).
[6] This section is based on work published in Ref. [54].
[7] H. G. Ballesteros, A. Cruz, L. A. Fernandez, V. Martin-Mayor, J. Pech, J. J. Ruiz-
Lorenzo, A. Tarancon, P. Tellez, C. L. Ullod, and C. Ungil. Critical behavior of
the three-dimensional Ising spin glass. Phys. Rev. B, 62:14237, 2000.
[8] B. Berg and T. Neuhaus. Multicanonical ensemble: a new approach to simulate
first-order phase transitions. Phys. Rev. Lett., 68:9, 1992.
[9] B. A. Berg, A. Billoire, and W. Janke. Functional form of the Parisi overlap
distribution for the three-dimensional Edwards-Anderson Ising spin glass. Phys.
Rev. E, 65:045102, 2002.
[10] B. A. Berg, A. Billoire, and W. Janke. Overlap distribution of the three-
dimensional Ising model. Phys. Rev. E, 66:046122, 2002.
[11] B. A. Berg and T. Neuhaus. Multicanonical algorithms for first order phase
transitions. Phys. Lett. B, 267:249, 1991.
[12] K. Binder. Critical properties from Monte Carlo coarse graining and renormal-
ization. Phys. Rev. Lett., 47:693, 1981.
[13] K. Binder and A. P. Young. Spin glasses: Experimental facts, theoretical concepts
and open questions. Rev. Mod. Phys., 58:801, 1986.
[33] F. Hamze, N. Dickson, and K. Karimi. Robust parameter selection for parallel
tempering. (arXiv:cond-mat/1004.2840), 2010.
[38] A. K. Hartmann and A. P. Young. Lower critical dimension of Ising spin glasses.
Phys. Rev. B, 64:180404(R), 2001.
[41] J.J. Houdayer. A cluster Monte Carlo algorithm for 2-dimensional spin glasses.
Eur. Phys. J. B., 22:479, 2001.
[44] K. Hukushima and K. Nemoto. Exchange Monte Carlo method and application
to spin glass simulations. J. Phys. Soc. Jpn., 65:1604, 1996.
[45] E. Ising. Beitrag zur Theorie des Ferromagnetismus. Z. Phys., 31:253, 1925.
[52] D. A. Kofke. Comment on ”The incomplete beta function law for parallel temper-
ing sampling of classical canonical systems” [J. Chem. Phys. 120, 4119 (2004)].
J. Chem. Phys., 121:1167, 2004.
[56] W. Krauth. Algorithms and Computations. Oxford University Press, New York,
2006.
[57] F. Krzakala and O. C. Martin. Spin and link overlaps in 3-dimensional spin
glasses. Phys. Rev. Lett., 85:3013, 2000.
[59] R. H. Landau and M. J. Páez. Computational Physics. Wiley, New York, 1997.
[61] E. Marinari and G. Parisi. Simulated tempering: A new Monte Carlo scheme.
Europhys. Lett., 19:451, 1992.
[62] E. Marinari and G. Parisi. On the effects of changing the boundary conditions
on the ground state of Ising spin glasses. Phys. Rev. B, 62:11677, 2000.
[64] N. Metropolis and S. Ulam. The Monte Carlo Method. J. Am. Stat. Assoc.,
44:335, 1949.
[65] M. Mézard, G. Parisi, and M. A. Virasoro. Spin Glass Theory and Beyond.
World Scientific, Singapore, 1987.
[67] C. M. Newman and D. L. Stein. Short-range spin glasses: Results and specula-
tions. In Lecture Notes in Mathematics 1900, page 159. Springer-Verlag, Berlin,
2007. (cond-mat/0503345).
[71] M. Palassini and A. P. Young. Nature of the spin glass state. Phys. Rev. Lett.,
85:3017, 2000.
[72] C. Predescu, M. Predescu, and C.V. Ciobanu. The incomplete beta function law
for parallel tempering sampling of classical canonical systems. J. Chem. Phys.,
120:4119, 2004.
[75] V. Privman, editor. Finite Size Scaling and Numerical Simulation of Statistical
Systems. World Scientific, Singapore, 1990.
[77] L. Reichl. A Modern Course in Statistical Physics. Wiley, New York, 1998.
[78] D. Sherrington and S. Kirkpatrick. Solvable model of a spin glass. Phys. Rev.
Lett., 35:1792, 1975.