Monte Carlo Simulations in Physics
Kari Rummukainen
Department of Physical Sciences, University of Oulu
1 Introduction
This course covers (mostly) basic and somewhat more advanced Monte Carlo simulation methods used in physics. In particular, we shall mostly concentrate on statistical lattice MC simulations.
Approximate contents:
Simulations of physical systems. Here the computer is used to model the system, mimicking nature. This is a huge field of research: a major fraction of the world's supercomputing resources is used in physics simulations.
Roughly speaking, there are two methods to obtain predictions from a given physical theory: a) analytical estimates and b) computer simulations.
As a rule physics textbooks discuss only analytical estimates; these are invaluable for obtaining physical intuition of the behaviour of the system. However, in practice these always rely on either simplifying assumptions or some kinds of series expansions (for example, weak coupling): thus, both the validity and practical accuracy are limited.
In several fields of physics research computer simulations form the best method
available for obtaining quantitative results. With the increase of computing power
and development of new algorithms new domains of research are constantly
opening to simulations, and simulations are often necessary in order to obtain
realistic results.
Some simulation techniques:
- Monte Carlo simulations of statistical systems
- Monte Carlo simulations in quantum field theory (Lattice QCD and Electroweak theories)
- Methods related to Monte Carlo: simulated annealing, diffusive dynamics, ...
- Molecular dynamics: classical dynamics of particles, e.g. atoms and molecules, galaxy dynamics, ...
- Quantum mechanical structure of atoms and molecules: Hartree-Fock approximation, density functional method, ...
- Finite element method (FEM) based methods for partial differential equations
- ...
In these lectures, we shall mostly concentrate on statistical Monte Carlo simulations, especially solving classical partition functions on a (usually) regular lattice:

    Z = ∫ ∏_x dφ(x) exp[−β H(φ)]

Here H(φ) is the Hamiltonian (energy) of the system and β = 1/(k_B T), with k_B the Boltzmann constant and T the temperature.
The coordinates are

    x_i = a n_i,   n_i = 0 ... N_i

where a is the lattice spacing (physical length), and the lattice has

    N = ∏_{i=1}^{d} N_i

sites in d dimensions.
For example, the Ising model is defined by the partition function

    Z = Σ_{{s_i = ±1}} exp[ β Σ_{<ij>} s_i s_j ]

The spins s_i have values ±1 and live on a regular lattice. Here the sum in the exponent (Σ_{<ij>}) goes over the nearest-neighbour sites of the lattice. Thus, spin s_i interacts with spin s_j only if the spins are next to each other. (Note: dimensionless, lattice units!)

Due to its simplicity, we shall use the Ising model a lot in this course. Due to universality near the Ising model phase transition (Curie point), it actually describes the critical behaviour of many systems (e.g. the liquid-vapour critical point).
There are many non-lattice systems which can also be studied using Monte Carlo simulations: interacting atoms forming molecules, general (point-particle) potential energy problems, stochastic processes etc.
Lots of applications within statistical physics and condensed matter physics (crystalline structure, spin models, superconductivity, surface physics, glassy systems ...)
Quantum field theory is defined by the Feynman path integral (in Euclidean spacetime, i.e. after we analytically continue time t → it):

    Z = ∫ ∏_x dφ(x) exp[−S(φ)]
The lattice formulation:
- Provides the best (only robust) definition of the quantum field theory itself.
- Allows the use of (almost all of) the analytical tools used for continuum field theories; especially, lattice perturbation theory is equivalent to the standard one (but more cumbersome to use).
- Provides new analytical methods and tools, not possible for continuum field theories (strong coupling/high temperature expansions, duality relations etc.).
- Emphasizes the connection between field theory and statistical mechanics: Feynman path integral ↔ partition function. All of the tools developed for statistical mechanics are at our disposal!
- Permits the evaluation of the path integral numerically, using Monte Carlo calculations. The integral is evaluated as-is, without expanding it in any small parameter (→ non-perturbative).
Lattice QCD:
Almost all interesting QCD physics is non-perturbative!
Since around 1980 Quantum Chromodynamics (QCD) has been studied extensively on the lattice. The results have given invaluable qualitative insights (confinement); and, during the last 5 years, the results have also become quantitatively comparable to experiments.
- Confinement mechanism, hadronic mass spectrum, matrix elements ...
- QCD phase transition at T ≈ 150 MeV
Light hadron masses (CP-PACS collaboration, 2002):
[Figure: the light hadron mass spectrum m (GeV) from the lattice compared with experiment, with either the K or the φ mass used as input.]
Books about lattice field theory:
...
2 Monte Carlo integration
The most common use for Monte Carlo methods is the evaluation of integrals. This is
also the basis of Monte Carlo simulations (which are actually integrations). The basic
principles hold true in both cases.
2.1 Basics
The basic idea of Monte Carlo integration becomes clear from the "dartboard method" of integrating the area of an irregular domain: throw darts, i.e. choose points randomly within a rectangular box enclosing the domain. Then

    (# hits inside area)/(# hits inside box) → p(hit inside area) = A/A_box

as the number of hits → ∞.
An early example of Monte Carlo integration is Buffon's (1707-1788) needle experiment to determine π: throw a needle of length ℓ on a grid of parallel lines a distance d apart, with ℓ < d. The probability that the needle intersects one of the lines is (homework)

    P = 2ℓ/(π d).

Thus, repeating this N times and observing H intersections we obtain

    π ≈ 2ℓN/(dH).
As an aside, Lazzarini (1901) did Buffon's experiment with 3408 throws, and got 1808 intersections (using ℓ = (5/6) d):

    π ≈ (10/6) × (3408/1808) = 355/113 = 3.14159292...

This is accurate to 3 × 10⁻⁷, way too good a result! Obviously, Lazzarini was aiming for the known value π ≈ 355/113 and interrupted the experiment when he got close to it, which explains the peculiar number of throws 3408. This shows the dangers of fine-tuning the number of samples in Monte Carlo!
(For more information, see the Wikipedia entry on Buffon's needle.)
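As an illustration, here is a minimal C sketch of the needle experiment as a Monte Carlo estimate of π. The generator drand48() and the chosen values ℓ = 0.8, d = 1 are assumptions of this sketch, not part of the original experiment.

/***************** buffon.c ****************
 * Estimate pi by Buffon's needle (sketch).  The needle centre's distance
 * to the nearest line and the needle's angle are drawn independently.
 */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main(void)
{
    const double l = 0.8, d = 1.0;         /* needle length l < line spacing d */
    long i, N = 1000000, H = 0;
    double x, theta;

    srand48(12345L);
    for (i = 0; i < N; i++) {
        x     = 0.5 * d * drand48();          /* centre distance to nearest line      */
        theta = 0.5 * M_PI * drand48();       /* acute angle between needle and lines */
        if (x <= 0.5 * l * sin(theta)) H++;   /* needle crosses a line                */
    }
    printf("pi estimate = %g\n", 2.0 * l * N / (d * H));
    return 0;
}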
2.2 Standard Monte Carlo integration

The plain Monte Carlo estimate of a one-dimensional integral I = ∫_a^b dx f(x), using N random numbers x_i drawn from the uniform distribution on (a, b), is

    I ≈ [(b − a)/N] Σ_{i=1}^{N} f(x_i).

If the number of random numbers N is large enough, the error in this approximation is ∝ 1/√N, due to the central limit theorem, discussed below.
On the other hand, if we divide the (a, b) interval into N steps and use some regular integration routine, the error will be proportional to 1/N^p, where p = 1 even for the most naive midpoint rule. Thus, Monte Carlo is not competitive against regular quadratures in 1-dimensional integrals (except in some pathological cases).
For simplicity, let V be a d-dimensional hypercube, with 0 ≤ x_j ≤ 1 in each direction. Now the Monte Carlo integration proceeds as follows:
- Generate N random vectors x_i from the flat distribution (0 ≤ (x_i)_j ≤ 1).
- As N → ∞,
    (V/N) Σ_{i=1}^{N} f(x_i) → I.
- Error: ∝ 1/√N, independent of d! (Central limit theorem.)
A minimal C sketch of this procedure is given below.
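The integrand test_f and the use of drand48() as the uniform generator are just assumptions of this sketch; any reasonable uniform generator will do. The error estimate is the one discussed in the next subsection.

/***************** mc_basic.c ****************
 * Plain Monte Carlo integration of f over the unit hypercube (V = 1).
 */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define D 10                 /* number of dimensions */

double test_f(double *x)     /* example integrand: product of coordinates, exact integral 2^-D */
{
    double f = 1.0;
    int j;
    for (j = 0; j < D; j++) f *= x[j];
    return f;
}

int main(void)
{
    double x[D], f, sum = 0.0, sum2 = 0.0, mean, sigma;
    long i, N = 1000000;
    int j;

    srand48(3431L);
    for (i = 0; i < N; i++) {
        for (j = 0; j < D; j++) x[j] = drand48();   /* random vector, 0 <= x_j < 1 */
        f = test_f(x);
        sum  += f;
        sum2 += f * f;
    }
    mean  = sum / N;                                    /* <f>                            */
    sigma = sqrt((sum2 / N - mean * mean) / (N - 1));   /* error estimate sigma_N (V = 1) */
    printf("I = %g +- %g   (exact %g)\n", mean, sigma, pow(0.5, D));
    return 0;
}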
Normal numerical integration methods (Numerical Recipes):
- Divide each axis into n evenly spaced intervals; total # of points N = n^d.
- Error: ∝ 1/n (midpoint rule), ∝ 1/n² (trapezoidal rule), ∝ 1/n⁴ (Simpson).
If d is small, Monte Carlo integration has much larger errors than the standard methods. In practice, however, N becomes just too large very quickly for the standard methods: already at d = 10, with n = 10 (pretty small), N = 10^10. This implies simply too many evaluations of the function!
2.3 Why is the error ∝ 1/√N?

This is due to the central limit theorem (see, e.g., L. E. Reichl, Statistical Physics). Let y_i = V f(x_i) (absorbing the volume into f) be the value of the integral using one random coordinate (vector) x_i. The result of the integral, after N samples, is

    z_N = (y_1 + y_2 + ... + y_N)/N.
Now

    1 = ∫ dy p(y)
    ⟨y⟩ = ∫ dy y p(y)
    ⟨y²⟩ = ∫ dy y² p(y)
    σ² = ⟨y²⟩ − ⟨y⟩² = ⟨(y − ⟨y⟩)²⟩

Here, and in the following, ⟨·⟩ means the expectation value (the mean over the distribution p(·)), and σ is the width of the distribution.
The error of the Monte Carlo integration (of length N) is defined as the width of the distribution P_N(z_N), or more precisely,

    σ_N² = ⟨z_N²⟩ − ⟨z_N⟩².
Let us calculate σ_N, using the Fourier transform Φ_N(k) of the distribution P_N(z_N). Because of the oscillating exponent, the single-sample transform φ(k/N) decreases as |k| increases, and φ(k/N)^N decreases even more quickly. Thus,

    Φ_N(k) = [1 − k²σ²/(2N²) + O(k³/N³)]^N ≈ e^{−k²σ²/(2N)}
Taking the inverse Fourier transform we obtain

    P_N(z_N) = (1/2π) ∫ dk e^{ik(z_N − ⟨y⟩)} Φ_N(k)
             = (1/2π) ∫ dk e^{ik(z_N − ⟨y⟩)} e^{−k²σ²/(2N)}
             = √[N/(2πσ²)] exp[ −N(z_N − ⟨y⟩)²/(2σ²) ].

This is a Gaussian centered at ⟨y⟩ with width σ_N = σ/√N.
Of course, we don't know the expectation values beforehand! In practice, these are substituted by the corresponding Monte Carlo estimates:

    ⟨f⟩ ≈ (1/N) Σ_i f(x_i),   ⟨f²⟩ ≈ (1/N) Σ_i f²(x_i).   (1)
Actually, in this case we must also use

    σ_N = V √[ (⟨f²⟩ − ⟨f⟩²)/(N − 1) ]

because using (1) here would underestimate σ_N.
Often in the literature the real expectation values are denoted by ⟨⟨x⟩⟩, as opposed to the Monte Carlo estimates ⟨x⟩.

The error obtained in this way is called the 1-σ error; i.e. the error is the width of the Gaussian distribution of

    f_N = (1/N) Σ_i f(x_i).

It means that the true answer is within V⟨f⟩ ± σ_N with ≈68% probability. This is the most common error value cited in the literature.

This assumes that the distribution of f_N really is Gaussian. The central limit theorem guarantees that this is so, provided that N is large enough (except in some pathological cases).

However, in practice N is often not large enough for the Gaussianity to hold true! Then the distribution of f_N can be seriously off-Gaussian, usually skewed. More refined methods can be used in these cases (for example, modified jackknife or bootstrap error estimation). However, often this is ignored, because there just are not enough samples to deduce the true distribution (or people are lazy).
As an example, let us integrate f(x) = x² over 0 ≤ x ≤ 1 (exact value 1/3). Denote y = f(x) = x² and x = f⁻¹(y) = √y. Then

    p_x(x)dx = p_x(x) |dx/dy| dy ≡ p_f(y) dy
    p_f(y) = |dx/dy| = |1/f′(x)| = (1/2) x⁻¹ = (1/2) y^{−1/2}

where 0 < y ≤ 1.
In more than 1 dimension: p_f(y⃗) = |∂x_i/∂y_j|, the Jacobian determinant.
[Figure: the measured distributions P_N(z_N) for N = 1, 2, 3, 10 (left panel, 0 ≤ z_N ≤ 1) and for N = 10, 100, 1000 (right panel, 0.3 ≤ z_N ≤ 0.4). The distributions become narrower and more Gaussian as N increases.]

The expectation value ⟨z_N⟩ = 1/3 for all P_N.
The width of the Gaussian is

    σ_N = √[ (⟨⟨f²⟩⟩ − ⟨⟨f⟩⟩²)/N ] = √[ 4/(45N) ]

where

    ⟨⟨f²⟩⟩ = ∫_0^1 dx f²(x) = ∫_0^1 dy y² p_f(y) = 1/5.
Plotting the corresponding Gaussian in the figure above we obtain a curve which is almost indistinguishable from the measured P_1000 (blue dashed) curve.

In practice, the expectation value and the error are of course measured from a single Monte Carlo integration, for example using again N = 1000 and the Monte Carlo estimates

    ⟨f⟩ = (1/N) Σ_i f_i,   ⟨f²⟩ = (1/N) Σ_i f_i²,   σ_N = √[ (⟨f²⟩ − ⟨f⟩²)/(N − 1) ].
A more problematic example is the integral ∫_0^1 dx (1/√x) = 2. Now the distribution of a single sample y = f(x) = 1/√x is P_1(y) = 2 y⁻³, where 1 ≤ y < ∞. Thus, y² P_1(y) is not integrable, and since P_2(y) ∼ y⁻³ when y → ∞, neither is y² P_2(y). This remains true for any N. Thus, formally the error σ_N is infinite!

Nevertheless, we can perform standard MC integrations with MC estimates for ⟨f⟩, ⟨f²⟩ and plug these into the formula for σ_N. For example, some particular results:

    N = 1000:      1.897 ± 0.051
    N = 10000:     1.972 ± 0.025
    N = 100000:    2.005 ± 0.011
    N = 1000000:   1.998 ± 0.003

The results may fluctuate quite a bit, since occasionally there may occur some very large values of f(x) = 1/√x.
Why does this work? Even though

    ⟨f²⟩_N = (1/N) Σ_i f_i²

does not have a well-defined limit as N → ∞, the error estimate

    σ_N = √[ (⟨f²⟩_N − ⟨f⟩_N²)/(N − 1) ]

does, because of the extra factor of (N − 1) in the denominator! Thus, usually one can use the formula above as long as f(x) is integrable. However, the convergence of the error is then slower than 1/√N, depending on the function.
What about integrals with non-integrable subdivergences? Example:

    ∫_{−1}^{1} dx (1/x) = 0

(in the principal value sense). Usually Monte Carlo methods are not able to handle these (test the above!). The integral requires special treatment of the subdivergences (not discussed here).
The idea of importance sampling is to concentrate the sampling where the integrand is large (for a Boltzmann weight e^{−S}, where S is small). Let us integrate

    I = ∫_V dV f(x).

Choose a distribution p(x) which is close to the function f(x), but which is simple enough so that it is possible to generate random x-values from this distribution.
Now

    I = ∫ dV p(x) [f(x)/p(x)].

Thus, if we choose random numbers x_i from the distribution p(x), we obtain

    I = lim_{N→∞} (1/N) Σ_i f(x_i)/p(x_i).

Since f/p is flatter than f, the variance of f/p will be smaller than the variance of f → the error will be smaller (for a given N).

Ideal choice: p(x) ∝ |f(x)|. Now the variance vanishes! In practice this is not usually possible, since the proportionality constant is the unknown integral itself. However, exactly this choice is made in Monte Carlo simulations.
In Monte Carlo simulations, we want to evaluate integrals of the type

    I = ∫ [dφ] f(φ) e^{−S(φ)} / ∫ [dφ] e^{−S(φ)}.

Note that

    Z = ∫ [dφ] e^{−S(φ)}

cannot be directly calculated with importance sampling, due to the unknown normalization constant.

In a sense, this importance sampling turns the plain Monte Carlo integration into a simulation: the configurations φ(x) are sampled with the Boltzmann probability ∝ e^{−S}, which is the physical probability of a configuration. Thus, the integration process itself mimics nature!

How to generate the φ's with probability ∝ e^{−S(φ)}? That is the job of the Monte Carlo update algorithms.
2.8 Example: importance sampling

Integrate

    I = ∫_0^1 dx (x^{−1/3} + x/10) = 31/20 = 1.55

Choose the sampling distribution p(x) = (2/3) x^{−1/3}; its cumulative distribution is P(x) = x^{2/3}. Thus, if y is from a flat distribution y ∈ (0, 1], x = y^{3/2} is from the distribution p(x) = (2/3) x^{−1/3}. In the new variable y the integral becomes

    I = ∫_0^1 dx p(x) [f(x)/p(x)] = ∫_0^1 dy [f(x(y))/p(x(y))]

Thus, if we denote g = f/p, we have ⟨g⟩ = 31/20 and ⟨g²⟩ = 2.4045. Thus, the width of the result in the MC integration of g is

    σ_N = √[ (⟨g²⟩ − ⟨g⟩²)/(N − 1) ] = 0.045/√(N − 1)

This is ≈ 20 times narrower than with the naive method!

The recipe for the MC integration with importance sampling is thus:
- generate N random numbers y_i from the flat distribution
- x_i = y_i^{3/2} are then from the distribution p(x)
- calculate the average of g_i = f(x_i)/p(x_i)

Indeed:

    N         naive               importance
    100       1.4878 ± 0.0751     1.5492 ± 0.0043
    10000     1.5484 ± 0.0080     1.5503 ± 0.0004
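A minimal C sketch of the importance-sampling recipe above (the use of drand48() as the uniform generator is just an assumption of this sketch):

/***************** mc_importance.c ****************
 * I = int_0^1 dx (x^{-1/3} + x/10), sampling x from p(x) = (2/3) x^{-1/3}
 * via x = y^{3/2}, y uniform in (0,1].
 */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main(void)
{
    long i, N = 10000;
    double y, x, g, sum = 0.0, sum2 = 0.0, mean, sigma;

    srand48(42L);
    for (i = 0; i < N; i++) {
        do { y = drand48(); } while (y == 0.0);   /* y uniform in (0,1]                      */
        x = pow(y, 1.5);                          /* x distributed as p(x) = (2/3) x^{-1/3}  */
        g = (pow(x, -1.0/3.0) + 0.1*x) / ((2.0/3.0) * pow(x, -1.0/3.0));   /* g = f/p        */
        sum  += g;
        sum2 += g * g;
    }
    mean  = sum / N;
    sigma = sqrt((sum2 / N - mean * mean) / (N - 1));
    printf("I = %g +- %g   (exact 1.55)\n", mean, sigma);
    return 0;
}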
2.9 Note: some standard MC integration routines

Powerful Monte Carlo integration routines are included in several numerical packages. However, typically these tend to fail when the number of dimensions is larger than ∼15. Thus, these are useless for Monte Carlo simulations.

The following routines are included in the GSL (GNU Scientific Library, available for free). See also Numerical Recipes ch. 7.8 for further details.

Described in: G. P. Lepage, A New Algorithm for Adaptive Multidimensional Integration, Journal of Computational Physics 27, 192-203 (1978).

Does not work with importance sampling → not useful for Monte Carlo simulation.
3 Random numbers
Good random numbers play a central part in Monte Carlo simulations. Usually these
are generated using a deterministic algorithm, which produces a sequence of numbers
which have sufficiently random-like properties (despite being fully deterministic). The
numbers generated this way are called pseudorandom numbers.
There exist several methods for obtaining pseudorandom number sequences in Monte
Carlo simulations. However, there have been occasions where the random numbers
generated with a trusted old workhorse algorithm have failed (i.e. the simulation has
produced incorrect results). What is understood to be a good random number generator varies with time!
The quality of the distribution is often not perfect. For example, a sequence of random bits might have slightly more 0s than 1s. This is not so crucial for cryptography (as long as the numbers are really random), but it is an absolute no-no for Monte Carlo.

The midsquare generator is only of historical note: it is the first pseudorandom generator (N. Metropolis). Don't use it for any serious purpose. It works as follows: take an n-digit integer; square it, giving a 2n-digit integer; take the middle n digits as the new integer value.
3.2.2 Linear congruential generator

One of the simplest, most widely used and oldest (Lehmer 1948) generators is the linear congruential generator (LCG). Usually the language or library standard generators are of this type.

The generator is defined by integer constants a, c and m, and produces a sequence of random integers X_i via

    X_{i+1} = (a X_i + c) mod m.

This generates integers from 0 to (m − 1) (or from 1 to (m − 1), if c = 0). Real numbers in [0, 1) are obtained by the division f_i = X_i/m.

Since the state of the generator is specified by the integer X_i, which is smaller than m, the period of the generator is at most m. The constants a, c and m must be carefully chosen to ensure this. Arbitrary parameters are sure to fail!
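A minimal C sketch of such a generator, using for illustration the Park-Miller "minimal standard" parameters quoted a little further below; the use of 64-bit arithmetic here to avoid overflow is an assumption of the sketch (the overflow problem is discussed later in this chapter).

/***************** lcg.c ****************
 * A bare-bones LCG with a = 16807, c = 0, m = 2^31 - 1 (illustration only).
 */
#include <stdio.h>

static unsigned long long lcg_state = 1;   /* seed X_0, must be in 1..m-1 when c = 0 */

double lcg_uniform(void)
{
    const unsigned long long a = 16807ULL, c = 0ULL, m = 2147483647ULL;
    lcg_state = (a * lcg_state + c) % m;    /* X_{i+1} = (a X_i + c) mod m */
    return (double)lcg_state / (double)m;   /* f_i = X_i / m, in [0,1)     */
}

int main(void)
{
    int i;
    for (i = 0; i < 5; i++)
        printf("%f\n", lcg_uniform());
    return 0;
}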
These generators have by now well-known weaknesses. Especially, if we construct d-dimensional vectors (d-tuples) from consecutive random numbers (f_i, f_{i+1}, ..., f_{i+d−1}), the points will lie on a relatively small number of hyperplanes (at most ∼m^{1/d}, but can be much fewer; see Numerical Recipes).

In many generators in use m = 2^n. In this case, it is easy to see that the low-order bits have very short periods. This is due to the fact that the information in X is only moved up, towards more significant bits, never down. Thus, the k least significant bits have only 2^k different states, which is then the cycle length of these bits. The lowest-order bit has a period of 2, i.e. it flips in the sequence 101010... Some amount of cycling in fact always occurs when m is not a prime.

Thus, if you need random integers or random bits, never use the low-order bits from LCGs!
The ANSI C standard random number routine rand() has the parameter values a = 1103515245, c = 12345, m = 2^31 (strictly speaking, these are not required by the standard, only mentioned in it; the standard does not specify a particular generator). This is essentially a 32-bit algorithm. The cycle time of this generator is only m = 2^31 ≈ 2.1 × 10^9, which is exhausted very quickly on a modern computer. Moreover, m is a power of 2, so the low-order bits are periodic. In Linux, the function rand() has been substituted by a more powerful function.

This routine exists in many, maybe most, system libraries (and may be the random number generator in Fortran implementations). Nevertheless this generator is not good enough for serious computations.
GGL, the IBM system generator (Park and Miller "minimal standard"):

    a = 16807, c = 0, m = 2^31 − 1

As before, a short cycle time. A better generator than the previous one, but I would not recommend it for Monte Carlo simulations. This generator is the RAND generator in MATLAB.

UNIX drand48():
This uses effectively 64-bit arithmetic, and the internal state is modded to a 48-bit integer. The cycle time is thus 2^48 ≈ 2.8 × 10^14. This is sufficient for most purposes, and the generator is much used. However, it has the common low-order cycling problem and must be considered pretty much obsolete.

    a = 13^13, c = 0, m = 2^59
Lagged Fibonacci and related generators improve the properties of the random numbers by using much more than one integer as the internal state. Generally, lagged Fibonacci generators form a new integer X_i using

    X_i = X_{i−p} ⊙ X_{i−q}

where p and q are lags (integer constants, p < q), and ⊙ is some arithmetic operation, such as +, −, × or ⊕, where the last is XOR, the exclusive bitwise or.

The generator must keep at least the q previous X_i's in memory. The quality of these generators is not very good with small lags, but can become excellent with large lags. If the operator is addition or subtraction, the maximal period is 2^{p+q−1}.

If the operator is XOR, the generator is often called a (generalized) feedback shift register (GFSR) generator. The standard UNIX random() generator is of this type, using up to 256 bytes of internal storage. I believe this is the rand() in Linux distributions. I don't recommend either for Monte Carlo simulations.

Another fairly common generator of this type is R250:

    X_i = X_{i−103} ⊕ X_{i−250}
This requires 250 integer words of storage (for more information, see Vattulainen's web page). However, GFSR generators are known to have rather strong 3-point correlations between the tuples (X_i, X_{i−q}, X_{i−p}) (no surprise there). R250 has been observed to fail spectacularly in Ising model simulations using the Wolff cluster algorithm (Ferrenberg et al., Phys. Rev. Lett. 69 (1992)). One should probably not use these kinds of generators in serious Monte Carlo work.

That said, these generators typically do not distinguish between low- and high-order bits; thus, the low-order bits tend to be as good as the high-order ones.

Because of the large internal memory, seeding of these generators is somewhat cumbersome. Usually one uses some simpler generator, e.g. an LCG, to seed the initial state variables.
Many of the bad properties of single random number generators can be avoided by combining two (or more) generators. For example, combining two linear congruential generators forms the core of the combined generator by l'Ecuyer and Bays-Durham, presented in Numerical Recipes as ran2; the routine also implements some modifications to the simple procedure, shuffling of the registers etc. The period of ran2 is the product of the mod-factors, ≈ 10^18.
RANMAR, by Marsaglia, Zaman and Tsang, is a famous combined generator of a lagged Fibonacci and an LCG generator with a prime modulus m = 2^24 − 3. The period is good, ≈ 2^144, but it uses only 24 bits as the working precision (single-precision reals). This generator has passed all the tests thrown at it.
RANLUX, by Lüscher, is a lagged Fibonacci generator with adjustable "skipping", i.e. it rejects a variable number of random number candidates as it goes. It produces "luxurious" random numbers with mathematically proven properties. At higher luxury levels it becomes somewhat slow; however, even at the smallest luxury level the period is about 10^171. Luxury level 3 is the default, and luxury level 4 makes all bits fully chaotic. The code is pretty long, though. (Computer Physics Communications 79 (1994) 100.)
CMRG, a combined multiple recursive generator by l'Ecuyer (Operations Research 44:5 (1996)), one of the several generators by l'Ecuyer. It uses 6 words of storage for the internal state, and has a period of ≈ 2^205.
Mersenne twister: probably good overall, and fast. My current favourite. Code available.
RANLUX: very good, somewhat slower at high (= good enough) luxury levels.
drand48: part of the standard UNIX library, thus immediately available if you don't have anything else. Good enough for most uses, but don't generate random bit patterns with it!
Information about random number generators:
Numerical Recipes
Ilpo Vattulainen's random number tests, see http://www.physics.helsinki.fi/vattulai/rngs.html
pLab: http://random.mat.sbg.ac.at/
D. Knuth: The Art of Computer Programming, vol 2: Seminumerical Algorithms
P. L'Ecuyer, Random numbers for simulation, Comm. ACM 33:10, 85 (1990)
L'Ecuyer's home page: http://www.iro.umontreal.ca/lecuyer
int main()
{
long seed;
int i;
double d;
gluon(/tmp)% ./prog
Give seed: 21313
0 0.580433 1.786811284
1 0.686466 1.986682984
2 0.586646 1.797948229
3 0.515342 1.674211124
4 0.783321 2.188729827
Using (my inline version of) the Mersenne twister: you need the files mersenne_inline.c and mersenne.h, given on the course www-page. The official Mersenne twister code, also available in Fortran, can be found on the twister home page.
...
#include "mersenne.h"
...
int main()
{
long seed;
...
seed_mersenne( seed );
d = mersenne();
...
On a typical generator, m is of order 2^31 or larger; thus, X_i can also be of the same magnitude. This means that the multiplication a·X_i will overflow a typical 32-bit integer (for example, int and long on standard Intel PCs).

If m is a power of 2, this is not a problem (at least in C): if a and X are of type unsigned int or unsigned long, the multiplication gives the low-order bits correctly (and just drops the high-order bits). Modding this with a power of 2 then gives the correct answer. However, if m is not a power of 2, this does not work.

The easiest solution is to use 64-bit double precision floating point numbers: the mantissa of an (IEEE) double has 52 bits, so integers smaller than 2^51 can be represented exactly.

Or, sticking with integers, one can use Schrage's algorithm (Numerical Recipes): we can always write m as m = aq + p, where q = [m/a] is the integer part of m/a and p = m mod a. Now it is easy to show that

    a X mod m = a (X mod q) − p [X/q]   (+ m, if the right-hand side is negative),

and, as long as p < q, neither term overflows.
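A minimal C sketch of Schrage's trick, using the Park-Miller values a = 16807, m = 2^31 − 1 from above as an example:

/***************** schrage.c ****************
 * Compute (a*X) mod m in 32-bit integer arithmetic without overflow,
 * writing m = a*q + p with q = m/a and p = m mod a (requires p < q).
 */
#include <stdio.h>

long mul_mod_schrage(long a, long X, long m)
{
    long q = m / a;                        /* integer part of m/a */
    long p = m % a;                        /* remainder           */
    long t = a * (X % q) - p * (X / q);    /* both terms are < m  */
    return (t >= 0) ? t : t + m;
}

int main(void)
{
    long a = 16807, m = 2147483647, X = 1;
    int i;
    for (i = 0; i < 5; i++) {
        X = mul_mod_schrage(a, X, m);      /* X_{i+1} = a X_i mod m */
        printf("%ld\n", X);
    }
    return 0;
}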
3.7 Random numbers from non-uniform distributions
The pseudorandom generators in the previous section all return a random number from the uniform distribution [0, 1) (or (0, 1), or some other combination of the limits). However, usually we need random numbers from non-uniform distributions. We shall now discuss how the raw material from the generators can be transformed into the desired distributions.

A probability distribution p(x) is normalized to unity, ∫ dx p(x) = 1. Here the integral goes over the whole domain where p(x) is defined.
The fundamental transformation law of probabilities is as follows: if we have a random variable x from a (known) distribution p_1(x), and a function y = y(x), the probability distribution of y, p_2(y), is determined through

    p_1(x) |dx| = p_2(y) |dy|   ⇒   p_2(y) = p_1(x) |dx/dy|.

In more than 1 dimension |dx/dy| → ||∂x_i/∂y_j||, the Jacobian determinant of the transformation.
Now we know the distribution p_1(x) and we also know the desired distribution p_2(y), but we do not know the transformation law y = y(x)! It can be solved by integrating the above differential equation:²

    ∫_{a_1}^x dx' p_1(x') = ∫_{a_2}^y dy' p_2(y')   ⇒   P_1(x) = P_2(y)   ⇒   y = P_2^{−1}[P_1(x)],

where P_1(x) is the cumulative distribution function of p_1(x), and a_1 and a_2 are the smallest values where p_1(x) and p_2(y) are defined (often −∞).
Now p_1(x) = 1 and x ∈ [0, 1]. Thus, P_1(x) = x, and y is to be inverted from the equation

    x = ∫_{a_2}^y dy' p(y').

This is the fundamental equation for transforming random numbers from the uniform distribution to a new distribution p(y). (From now on I drop the subscript 2 as unnecessary.) Unfortunately, the integral above is often not very feasible to calculate, not to say anything about the final inversion (analytically or numerically).
² Dropping the absolute values here means that y(x) will be a monotonously increasing function. We get a monotonously decreasing y(x) by using ∫_y^{b_2} dy' on the RHS.
3.7.2 Exponential distribution

The normalized exponential distribution for y ∈ [0, ∞) is p(y) = e^{−y}. Thus, the transformation is

    x = ∫_0^y dy' e^{−y'} = 1 − e^{−y}   ⇒   y = −ln(1 − x), or equivalently y = −ln x.

We can use x or (1 − x) here because both are uniformly distributed; one should choose the one which never tries to evaluate ln 0.
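A one-line C sketch of this transformation, assuming (as in the course code) that mersenne() returns a uniform number in [0, 1), so that 1 − x never becomes 0:

/* exponentially distributed random number, p(y) = exp(-y), y >= 0 */
#include <math.h>
#include "mersenne.h"

double exp_ran(void)
{
    double x = mersenne();     /* uniform in [0,1)                */
    return -log(1.0 - x);      /* 1-x is in (0,1], log is finite  */
}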
However, already the log-distribution p(y) = −ln y, 0 < y ≤ 1, is not invertible analytically:

    x = ∫_0^y dy' (−ln y') = y(1 − ln y).

This can, however, be inverted to machine precision very efficiently using Newton's method (Numerical Recipes, for example).
Inverting this, we get

    φ = 2π X_1,   r = √(−2 ln X_2),
    x = r cos φ,   y = r sin φ.

Both x and y are good Gaussian random numbers, so we can use both of them, one after the other: on the first call to the generator we generate both x and y and return x, and on the second call we just return y.

This process implements two changes of random variables: (X_1, X_2) → (r, φ) → (x, y). At the second stage we did not have to do the integral inversion, because we knew the transformation from (x, y) to (r, φ).
It is customary to accelerate the algorithm above by eliminating the trigonometric functions. Let us first observe that we can interpret the pair (√X_2, φ) as the polar coordinates of a random point from a uniform distribution inside the unit circle. Why √X_2? This is because of the Jacobian: the uniform differential probability inside a circle is ∝ dφ dr r, which, when plugged into the conversion formula and integrated wrt. r, yields X_2 = r².

Thus, instead of polar coordinates, we can directly generate cartesian coordinates from a uniform distribution inside a circle using the rejection method:
1. generate 2 uniform random numbers v_i ∈ (−1, 1)
2. accept if R² = v_1² + v_2² < 1, otherwise go back to 1.
Now R² corresponds to X_2 above, and, which is the whole point of this transformation, v_1/R ↔ cos φ and v_2/R ↔ sin φ. We do not have to evaluate the trigonometric functions at all.
/***************** gaussian_ran.c ****************
* double gaussian_ran()
* Gaussian distributed random number
* Probability distribution exp( -x*x/2 ), so < x^2 > = 1
* Uses mersenne random number generator
*/
#include <stdlib.h>
#include <math.h>
#include "mersenne.h"
double gaussian_ran()
{
static int iset=0;
static double gset;
register double fac,r,v1,v2;
if (iset) {
iset = 0;
return(gset);
}
do {
v1 = 2.0*mersenne() - 1.0;
v2 = 2.0*mersenne() - 1.0;
r = v1*v1 + v2*v2;
} while (r >= 1.0 || r == 0.0);
fac = sqrt( -2.0*log(r)/r );
gset = v1*fac;
iset = 1;
return(v2*fac);
}
The inversion method above is often very tricky. More often than not the function is not analytically integrable, and doing the integration + inversion numerically is expensive. The rejection method is a very simple and powerful method for generating numbers from any distribution. Let now p(x) be the desired distribution, and let f(x) be a distribution according to which we know how to generate random numbers, with the following property:

    p(x) ≤ C f(x)

with some known constant C. It is now essential to note that if we have a uniform density of points on the (x, y)-plane, the number of points in the interval 0 < y < C f(x) is proportional to f(x). The same is true of p(x). Now the method works as follows:
1. Generate X from the distribution f(x).
2. Generate Y from the uniform distribution 0 < Y < C f(X). Now the point (X, Y) will be from a uniform distribution in the area below the curve C f(x).
3. If Y ≤ p(X), return X. This is because the point (X, Y) is then also a point of the uniform distribution below the curve p(x), and we can interpret X as being from the distribution p(x).
4. If Y > p(X), reject and return to 1.
[Figure: the curves C f(x) and p(x), with C f(x) ≥ p(x) everywhere. A point (X_1, Y_1) above p(x) is rejected, while (X_2, Y_2) below p(x) is accepted; X_2 is thus a good random number from the distribution p(x).]
The acceptance rate = (area under p(x)) / (area under C f(x)). Thus, C f(x) should be as close to p(x) as possible to keep the rejection rate small. However, it should also be easy to generate random variables with the distribution f(x).

Often it is feasible to use f(x) = const. (fast), but check that the rejection rate stays tolerable.

The method works in any dimension.

There is actually no need to normalize p(x).
Example: consider the (unnormalized) distribution

    p(φ) = exp[cos φ],   −π < φ ≤ π.

This is not easily integrable, so let's use the rejection method.

A) Generate random numbers φ from the uniform (flat) distribution, with the constant envelope f(φ) = e^1 ≥ p(φ). Acceptance rate ≈ 0.46.

B) Generate random numbers φ from the Gaussian distribution f(φ) = exp[1 − 2φ²/π²] ≥ p(φ). Acceptance rate ≈ 0.78.

[Figure: the three curves exp(1), exp(1 − 2φ²/π²) and exp(cos φ) on −π ≤ φ ≤ π.]

Thus, using B) we generate only 0.46/0.78 ≈ 60% of the random numbers used in A). Which one is faster depends on the speed of the random number generators.
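A minimal C sketch of alternative A) above, with the constant envelope C f(φ) = e (again assuming that mersenne() returns a uniform number in [0, 1)):

/* random number from the (unnormalized) distribution p(phi) = exp(cos(phi)),
   -pi < phi <= pi, by rejection from a flat distribution with envelope e^1 */
#include <math.h>
#include "mersenne.h"

double cos_exp_ran(void)
{
    double phi, Y;
    do {
        phi = M_PI * (2.0*mersenne() - 1.0);   /* candidate X, uniform in [-pi, pi) */
        Y   = exp(1.0) * mersenne();           /* uniform in [0, C f(X))            */
    } while (Y > exp(cos(phi)));               /* accept if Y <= p(X)               */
    return phi;
}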
The trick used in the Box-Muller method to generate random numbers inside the unit circle was a version of the rejection method. It can actually be used to generate random numbers inside, or on the surface of, a sphere in an arbitrary number of dimensions:
1. Generate d uniform random numbers X_i ∈ (−1, 1).
2. If R² = Σ_i X_i² > 1 the point is outside the sphere: go back to 1.
The rejection rate is fairly small, unless the dimensionality is really large (homework). This method is almost always faster and easier than trying to use polar coordinates (careful with the Jacobian!).
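A minimal C sketch of the d-dimensional version (mersenne() again assumed uniform in [0, 1)):

/* fill x[0..d-1] with a point from the uniform distribution inside
   the d-dimensional unit sphere, by rejection from the enclosing cube */
#include "mersenne.h"

void sphere_ran(double *x, int d)
{
    double R2;
    int i;
    do {
        R2 = 0.0;
        for (i = 0; i < d; i++) {
            x[i] = 2.0*mersenne() - 1.0;   /* uniform in (-1, 1)           */
            R2  += x[i]*x[i];
        }
    } while (R2 > 1.0);                    /* outside the sphere: try again */
}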
3.8 Discrete distributions

Very often in Monte Carlo simulations we need random numbers from discrete distributions (Ising model, Potts model, any other discrete-variable model). These distributions are easy to generate.

Let i be our discrete random variable with N states (N can be ∞), and let p_i be the desired probability distribution, normalized to

    Σ_i p_i = 1.

Imagine now the interval from 0 to 1 split into consecutive segments of lengths p_1, p_2, p_3, p_4, ...

Now, if we generate a uniform random number X, 0 < X < 1, we get the discrete random number i by checking on which segment X lands. If N is small, this is easy to do by just summing p_1 + p_2 + ... + p_i until the sum is larger than X (see the sketch below). For large N more advanced search strategies should be used (binary search, for example).
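A minimal C sketch of the linear search (mersenne() assumed uniform in [0, 1); the probabilities p[0..n-1] must sum to 1):

/* return i with probability p[i], 0 <= i < n, by a linear search along
   the segmented unit interval described above */
#include "mersenne.h"

int discrete_ran(const double *p, int n)
{
    double X = mersenne(), sum = 0.0;
    int i;
    for (i = 0; i < n - 1; i++) {
        sum += p[i];
        if (X < sum) return i;    /* X landed on segment i                     */
    }
    return n - 1;                 /* last segment (also guards against roundoff) */
}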
Sometimes the probability distribution p(x) we want is relatively easily integrable, but the resulting equation is not so easily invertible. Previously, we had the example p(x) = −ln x, where 0 < x ≤ 1. The inversion formula is

    X = ∫_0^x dy p(y) = P(x) = x(1 − ln x).

This can be solved for x with Newton's method:
1. Choose an initial guess x_0 (for example x_0 = X).
2. Compute the improved estimate

    x = x_0 + [X − P(x_0)]/P'(x_0) = x_0 − [X − x_0(1 − ln x_0)]/ln x_0.

3. If |x − x_0| > ε and/or |X − P(x)| > δ, where ε and δ are the required accuracies, set x_0 = x and go back to 2.
4. Otherwise, accept x.

Convergence is usually very fast. However, beware the edges! x can easily wind up outside the allowed interval ((0, 1) in this case).
In this example y ↔ X, the number from 0 to 1, and epsilon is the desired accuracy:

  x = y;                          /* initial value x_0 = X */
  df = y - x*(1.0 - log(x));      /* y - P(x) */
  while ( fabs(df) > epsilon ) {
    x1 = x - df/log(x);           /* Newton step: x + (y-P(x))/P'(x), with P'(x) = -log(x) */
    x = x1;
    df = y - x*(1.0 - log(x));
  }
Alternatively, since P(x) is monotonically increasing, the equation X = P(x) can be solved with a bisection search:
1. Set x_1 = 0 and x_2 = 1 (the ends of the allowed interval).
2. Set x = (x_1 + x_2)/2.
3. If P(x) > X set x_2 = x, otherwise set x_1 = x.
4. If x_2 − x_1 > ε go to 2.
5. Otherwise, accept x.
This converges exponentially, a factor of 2 for each iteration. The condition in 3. works because P(x) increases monotonically.
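A minimal C sketch of the bisection version, for the same example P(x) = x(1 − ln x) (epsilon is the desired accuracy, as above):

/* solve y = P(x) = x*(1 - log(x)) for x in (0,1) by bisection */
#include <math.h>

double invert_bisect(double y, double epsilon)
{
    double x1 = 0.0, x2 = 1.0, x;
    do {
        x = 0.5*(x1 + x2);                     /* midpoint; always 0 < x < 1   */
        if (x*(1.0 - log(x)) > y) x2 = x;      /* P(x) > y: solution below x   */
        else                      x1 = x;      /* P(x) <= y: solution above x  */
    } while (x2 - x1 > epsilon);
    return 0.5*(x1 + x2);
}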