Lecture 1
Morten Hjorth-Jensen
1 Department of Physics and Center of Mathematics for Applications
University of Oslo, N-0316 Oslo, Norway
2 Department of Physics and Astronomy, Michigan State University
East Lansing, Michigan, USA
January 28 - February 2
Outline
http://www.iop.org/EJ/journal/CSD
Selected Texts
where ωi are the weights determined by the specific integration method (like Simpson’s
or Taylor’s methods) with xi the given mesh points. To give you a feeling of how we are
to evaluate the above integral using Monte-Carlo, we employ here the crudest possible
approach. Later on we will present more refined approaches. This crude approach
consists in setting all weights equal 1, ωi = 1. Recall also that dx = h = (b − a)/N
where b = 1, a = 0 in our case and h is the step size. We can then rewrite the above
integral as
\[
I = \int_0^1 f(x)\,dx \approx \frac{1}{N}\sum_{i=1}^{N} f(x_i).
\]
Introduce the concept of the average of the function f for a given Probability
Distribution Function p(x) as
\[
E[f] = \langle f \rangle = \frac{1}{N}\sum_{i=1}^{N} f(x_i)\,p(x_i),
\]
and identify p(x) with the uniform distribution, viz p(x) = 1 when x ∈ [0, 1] and zero for
all other values of x.
In addition to the average value E[f ] the other important quantity in a Monte-Carlo
calculation is the variance σ 2 and the standard deviation σ. We define first the variance
of the integral with f for a uniform distribution in the interval x ∈ [0, 1] to be
\[
\sigma_f^2 = \frac{1}{N}\sum_{i=1}^{N}\left(f(x_i) - \langle f \rangle\right)^2 p(x_i),
\]
or
\[
\sigma_f^2 = E[f^2] - \left(E[f]\right)^2,
\]
which is nothing but a measure of the extent to which f deviates from its average over
the region of integration.
The trapezoidal rule carries a truncation error O(h^2), with h the step length.
In general, quadrature rules such as Newton-Cotes have a truncation error which
goes like ∼ O(h^k), with k ≥ 1. Recalling that the step size is defined as
h = (b − a)/N, we have an error which goes like ∼ N^{−k}.
Monte Carlo integration is more efficient in higher dimensions. Assume that our
integration volume is a hypercube with side L and dimension d. This cube hence
contains N = (L/h)^d points and therefore the error in the result scales as
N^{−k/d} for the traditional methods.
The error in the Monte Carlo integration is however independent of d and scales
as σ ∼ 1/√N, always!
Comparing this with traditional methods shows that Monte Carlo integration is
more efficient than an order-k algorithm when d > 2k.
\[
\langle H \rangle = \dots
\]
where
\[
\mathbf{r}_1, \dots, \mathbf{r}_A,
\]
are the coordinates and
\[
\alpha_1, \dots, \alpha_A,
\]
are sets of relevant quantum numbers such as spin and isospin for a system of A
nucleons (A = N + Z, N being the number of neutrons and Z the number of protons).
More on Dimensionality
There are
\[
2^A \times \binom{A}{Z}
\]
coupled second-order differential equations in 3A dimensions.
For a nucleus like ^{10}Be this number is 215040. This is a truly challenging many-body
problem.
Assume that at the time t = 0 we have N(0) nuclei of type X which can decay
radioactively. At a time t > 0 we are left with N(t) nuclei. With a transition probability ω,
which expresses the probability that the system will make a transition to another state
during a time step of one second, we have the following first-order differential equation
\[
dN(t) = -\omega N(t)\,dt,
\]
whose solution is
\[
N(t) = N(0)e^{-\omega t},
\]
where we have defined the mean lifetime τ of X as
\[
\tau = \frac{1}{\omega}.
\]
Radioactive Decay
\[
\frac{\Delta N(t)}{N(t)\Delta t} = -\lambda
\]
Radioactive Decay
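The relation above suggests a direct simulation strategy: in every time step each remaining nucleus is given a chance λ∆t of decaying, decided by a uniform random number. A minimal sketch of such a simulation (the loop structure and variable names are illustrative and not taken from the lecture programs; λ here plays the role of ω∆t with ∆t = 1):

#include <cstdlib>
#include <ctime>
#include <iostream>
using namespace std;

int main()
{
  int    nuclei = 10000;   // N(0), number of nuclei at t = 0
  double lambda = 0.01;    // decay probability per nucleus per time step
  int    tmax   = 500;     // number of time steps
  srand(time(NULL));
  for (int t = 1; t <= tmax; t++) {
    int decays = 0;
    // every remaining nucleus decays with probability lambda in this step
    for (int i = 0; i < nuclei; i++) {
      if (rand()*1.0/RAND_MAX < lambda) decays++;
    }
    nuclei -= decays;
    cout << t << "  " << nuclei << endl;
  }
  return 0;
}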
As an example, consider the tossing of two dice, which yields the following possible
values
[2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12].
These values are called the domain. To this domain we have the corresponding
probabilities
[1/36, 2/36, 3/36, 4/36, 5/36, 6/36, 5/36, 4/36, 3/36, 2/36, 1/36].
Expectation Values
Discrete PDF
\[
E[x^k] = \langle x^k \rangle = \frac{1}{N}\sum_{i=1}^{N} x_i^k\, p(x_i),
\]
provided that the sums (or integrals) $\sum_{i=1}^{N} p(x_i)$ converge
absolutely (viz., $\sum_{i=1}^{N} |p(x_i)|$ converges).
Continuous PDF
\[
E[x^k] = \langle x^k \rangle = \int_a^b x^k p(x)\,dx,
\]
Function f (x)
\[
E[f^k] = \langle f^k \rangle = \int_a^b f^k p(x)\,dx,
\]
Variance
\[
\sigma_f^2 = E[f^2] - \left(E[f]\right)^2 = \langle f^2 \rangle - \langle f \rangle^2
\]
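As a small worked example, the mean and variance of the two-dice distribution listed earlier can be computed directly from the tabulated probabilities (an illustrative snippet, not part of the lecture programs; since the weights p(x) already sum to one, the sums are simply Σ x p(x) and Σ x² p(x) − µ²):

#include <iostream>
using namespace std;

int main()
{
  // domain and probabilities for the sum of two dice
  int    x[11] = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
  double p[11] = {1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1};
  double mean = 0.0, mean2 = 0.0;
  for (int i = 0; i < 11; i++) {
    p[i] /= 36.0;              // normalize the probabilities
    mean  += x[i]*p[i];        // E[x]
    mean2 += x[i]*x[i]*p[i];   // E[x^2]
  }
  cout << "mean = " << mean                      // 7
       << "  variance = " << mean2 - mean*mean   // 35/6 = 5.8333...
       << endl;
  return 0;
}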
Uniform Distribution
Exponential Distribution
and variance
\[
\sigma^2 = \int_0^{\infty} x^2 p(x)\,dx - \mu^2 = \frac{1}{\alpha^2}.
\]
Normal Distribution
\[
\mu = \frac{1}{b\sqrt{2\pi}}\int_{-\infty}^{\infty} b\sqrt{2}\,\left(a + b\sqrt{2}\,y\right)\exp\left(-y^2\right)dy = a.
\]
\[
\sigma^2 = \frac{1}{b\sqrt{2\pi}}\int_{-\infty}^{\infty}(x-\mu)^2\exp\left(-\frac{(x-a)^2}{2b^2}\right)dx,
\]
and inserting the mean value and performing a variable change we obtain
\[
\sigma^2 = \frac{1}{b\sqrt{2\pi}}\int_{-\infty}^{\infty} b\sqrt{2}\,\left(b\sqrt{2}\,y\right)^2\exp\left(-y^2\right)dy = \frac{2b^2}{\sqrt{\pi}}\int_{-\infty}^{\infty} y^2\exp\left(-y^2\right)dy,
\]
which gives σ² = b². With a = 0 and b = 1 we obtain the standard normal distribution
\[
p(x) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{x^2}{2}\right).
\]
The exponential and uniform distributions have simple cumulative functions, whereas
the normal distribution does not, being proportional to the so-called error function
erf (x), given by
\[
P(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x}\exp\left(-\frac{t^2}{2}\right)dt,
\]
which is difficult to evaluate in a quick way. Later we will present an algorithm by Box
and Muller which allows us to generate random variables following the normal
distribution from random variables sampled from the uniform distribution.
Binomial Distribution
The binomial distribution reads
\[
p(x) = \binom{n}{x} y^x (1-y)^{n-x}, \qquad x = 0, 1, \dots, n,
\]
where y is the probability for a specific event, such as the tossing of a coin or moving
left or right in case of a random walker. Note that x is a discrete stochastic variable.
The sequence of binomial trials is characterized by the following definitions
Every experiment is thought to consist of N independent trials.
In every independent trial one registers if a specific situation happens or not,
such as the jump to the left or right of a random walker.
The probability for every outcome in a single trial has the same value, for
example the outcome of tossing a coin is always 1/2.
In Lecture 3 we will show that the probability distribution for a random walker
approaches the binomial distribution.
In order to compute the mean and variance we need to recall Newton’s binomial
formula
\[
(a+b)^m = \sum_{n=0}^{m}\binom{m}{n} a^n b^{m-n},
\]
resulting in
\[
\mu = \sum_{x=0}^{n} x\binom{n}{x} y^x(1-y)^{n-x}
    = ny\sum_{x=1}^{n}\frac{(n-1)!}{(x-1)!\,(n-1-(x-1))!}\, y^{x-1}(1-y)^{n-1-(x-1)},
\]
which we rewrite as
\[
\mu = ny\sum_{\nu=0}^{n-1}\binom{n-1}{\nu} y^{\nu}(1-y)^{n-1-\nu} = ny\left(y + 1 - y\right)^{n-1} = ny.
\]
The variance is slightly trickier to get. Exercise: show that it reads σ 2 = ny (1 − y).
Poisson Distribution
Another important distribution with discrete stochastic variables x is the Poisson model,
which resembles the exponential distribution and reads
\[
p(x) = \frac{\lambda^x}{x!}\, e^{-\lambda}, \qquad x = 0, 1, \dots; \quad \lambda > 0.
\]
In this case both the mean value and the variance are easier to calculate,
\[
\mu = \sum_{x=0}^{\infty} x\,\frac{\lambda^x}{x!}\, e^{-\lambda}
    = \lambda e^{-\lambda}\sum_{x=1}^{\infty}\frac{\lambda^{x-1}}{(x-1)!} = \lambda,
\]
Let us recapitulate some of the above concepts using a discrete PDF (which is what we
end up doing anyway on a computer). The mean value of a random variable X with
range x_1, x_2, . . . , x_N is
\[
\langle x \rangle = \mu = \frac{1}{N}\sum_{i=1}^{N} x_i\, p(x_i),
\]
\[
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - \langle x \rangle\right)^2 p(x_i)
         = \frac{1}{N}\sum_{i=1}^{N}\left\langle (x_i - \mu_i)^2 \right\rangle.
\]
Assume now that we have two independent sets of measurements X_1 and X_2 with
corresponding mean and variance µ_1 and µ_2 and $\sigma^2_{X_1}$ and $\sigma^2_{X_2}$. For the sum
\[
Y = X_1 + X_2,
\]
we have
\[
\mu_Y = \mu_1 + \mu_2,
\]
and
\[
\sigma_Y^2 = \sum_{j=1}^{2}\left\langle (X_j - \mu_j)^2 \right\rangle + 2\,\mathrm{cov}(X_1, X_2).
\]
If X_1 and X_2 are two independent variables we can show that the covariance is zero,
but one cannot deduce from a zero covariance whether the variables are independent
or not. If the random variables we generate are truly random numbers, then the
covariance should be zero.
A way to measure the correlation between two sets of stochastic variables is the
so-called correlation function ρ(X1 , X2 ) defined as
\[
\rho(X_1, X_2) = \frac{\mathrm{cov}(X_1, X_2)}{\sqrt{\langle \sigma^2 \rangle_{X_1}\,\langle \sigma^2 \rangle_{X_2}}}.
\]
Obviously, if the covariance is zero due to the fact that the variables are independent,
then the correlation is zero. This quantity is often called the correlation coefficient
between X1 and X2 . We can extend this analysis to a set of stochastic variables
Y = (X1 + X2 + · · · + XN ). We now assume that we have N different measurements of
the mean and variance of a given variable. Each measurement consists again of N
measurements, although we could have chosen the latter to be different from N. The
total mean value is defined as
\[
\langle \mu_Y \rangle = \sum_{i=1}^{N}\langle \mu_i \rangle.
\]
The total variance is however now defined as
\[
\sigma_Y^2 = \left\langle (Y - \mu_Y)^2 \right\rangle
           = \left\langle \Big(\sum_{j=1}^{N}(X_j - \mu_j)\Big)^2 \right\rangle
           = \sum_{j=1}^{N}\sigma^2_{X_j} + 2\sum_{j<k}^{N}\left\langle (X_j - \mu_j)(X_k - \mu_k) \right\rangle,
\]
or
\[
\sigma_Y^2 = \sum_{j=1}^{N}\sigma^2_{X_j} + 2\sum_{j<k}^{N}\mathrm{cov}(X_j, X_k).
\]
Covariance
If the variables are independent, the covariance is zero and the variance is reduced to
\[
\sigma_Y^2 = \sum_{j=1}^{N}\sigma^2_{X_j},
\]
and if we assume that all sets of measurements produce the same variance σ 2 , we
end up with
\[
\sigma_Y^2 = N\sigma^2.
\]
In Lecture 5 we will discuss a very important class of correlation functions (another
application of the covariance), the so-called time-correlation functions. These are
important quantities in our studies of equilibrium properties,
\[
\phi(t) = \int dt'\,\left[M(t') - \langle M \rangle\right]\left[M(t'+t) - \langle M \rangle\right].
\]
From the Onsager regression hypothesis, we have that in the long time limit, the variables
M(t 0 + t) and M(t) eventually become uncorrelated from each other so that the time
correlation function becomes zero. The system has then reached its most likely state.
Suppose we have a PDF p(x) from which we generate a series of N averages ⟨x_i⟩.
Each mean value ⟨x_i⟩ is viewed as the average of a specific measurement, e.g.,
throwing dice 100 times and then taking the average value, or producing a certain
amount of random numbers. For notational ease, we set ⟨x_i⟩ = x_i in the discussion
which follows.
If we compute the mean z of N such mean values x_i,
\[
z = \frac{x_1 + x_2 + \cdots + x_N}{N},
\]
the PDF of z is given by
\[
\tilde{p}(z) = \int dx_1\, p(x_1)\int dx_2\, p(x_2)\cdots\int dx_N\, p(x_N)\,\delta\!\left(z - \frac{x_1 + x_2 + \cdots + x_N}{N}\right),
\]
where the δ-function embodies the constraint that the mean is z. All measurements that
lead to each individual x_i are expected to be independent, which in turn means that we
can express p̃ as the product of individual p(x_i).
Using the integral representation of the δ-function,
\[
\delta\!\left(z - \frac{x_1 + x_2 + \cdots + x_N}{N}\right)
  = \frac{1}{2\pi}\int_{-\infty}^{\infty} dq\, \exp\!\left(iq\Big(z - \frac{x_1 + x_2 + \cdots + x_N}{N}\Big)\right),
\]
we can rewrite the PDF as
\[
\tilde{p}(z) = \frac{1}{2\pi}\int_{-\infty}^{\infty} dq\, \exp\left(iq(z-\mu)\right)
  \left[\int_{-\infty}^{\infty} dx\, p(x)\exp\left(iq(\mu - x)/N\right)\right]^N,
\]
Expanding the exponential as exp(iq(µ − x)/N) ≈ 1 + iq(µ − x)/N − q²(µ − x)²/(2N²) + ...,
the second term on the rhs disappears since ∫ dx p(x)(µ − x) = 0, and employing the
definition of σ² we have
\[
\int_{-\infty}^{\infty} dx\, p(x)\exp\left(iq(\mu - x)/N\right) = 1 - \frac{q^2\sigma^2}{2N^2} + \dots,
\]
resulting in
\[
\left[\int_{-\infty}^{\infty} dx\, p(x)\exp\left(iq(\mu - x)/N\right)\right]^N
  \approx \left[1 - \frac{q^2\sigma^2}{2N^2} + \dots\right]^N,
\]
Thus, the central limit theorem states that the PDF p̃(z) of the average of N random
values corresponding to a PDF p(x) is a normal distribution whose mean is the mean
value of the PDF p(x) and whose variance is the variance of the PDF p(x) divided by
N, the number of values used to compute z.
The theorem is satisfied by a large class of PDFs. Note however that for a finite N, it is
not always possible to find a closed expression for p̃(z). The central limit theorem
leads then to the well-known expression for the standard deviation, given by
\[
\sigma_N = \frac{\sigma}{\sqrt{N}}.
\]
The latter is true only if the average value is known exactly. This is obtained in the limit
N → ∞ only.
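The behaviour σ_N = σ/√N is easy to check numerically: average N uniform random numbers many times and compare the spread of the averages with σ/√N = 1/√(12N). A small illustrative sketch (generator and variable names are not taken from the lecture programs):

#include <cmath>
#include <cstdlib>
#include <ctime>
#include <iostream>
using namespace std;

int main()
{
  int M = 10000;    // number of averages z
  int N = 100;      // uniform numbers per average
  srand(time(NULL));
  double sum = 0.0, sum2 = 0.0;
  for (int i = 0; i < M; i++) {
    double z = 0.0;
    for (int j = 0; j < N; j++) z += rand()*1.0/RAND_MAX;
    z /= N;
    sum  += z;
    sum2 += z*z;
  }
  double mean    = sum/M;
  double sigma_N = sqrt(sum2/M - mean*mean);
  // for the uniform distribution sigma^2 = 1/12, so we expect sigma_N ~ 1/sqrt(12 N)
  cout << "mean = " << mean << "  sigma_N = " << sigma_N
       << "  sigma/sqrt(N) = " << 1.0/sqrt(12.0*N) << endl;
  return 0;
}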
With the uniform distribution p(x) = 1 for x ∈ [0, 1] and zero elsewhere,
\[
I = \int_0^1 f(x)\,dx \approx \frac{1}{N}\sum_{i=1}^{N} f(x_i),
\]
\[
I = \int_0^1 f(x)\,dx \approx E[f] = \langle f \rangle.
\]
\[
\sigma_f^2 = \frac{1}{N}\sum_{i=1}^{N} f(x_i)^2 - \left(\frac{1}{N}\sum_{i=1}^{N} f(x_i)\right)^2,
\]
or
\[
\sigma_f^2 = E[f^2] - \left(E[f]\right)^2 = \langle f^2 \rangle - \langle f \rangle^2.
\]
Code at http://folk.uio.no/mhjensen/fys3150/2005/programs/
chapter8/example1.cpp.
Note the call to a function which generates random numbers according to the uniform
distribution
long idum;
idum=-1;            // seed for the ran0 generator
.....
x=ran0(&idum);      // uniform deviate in [0,1]
....
or
...
invers_period = 1./RAND_MAX;
srand(time(NULL));                   // seed the standard C generator
...
x = double(rand())*invers_period;    // uniform deviate in [0,1]
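The full program is available at the link above and is not reproduced here; a minimal sketch of the same brute-force strategy, assuming the integrand is f(x) = 4/(1 + x²) (consistent with the exact values I = π and variance ≈ 4.136E-01 quoted with the table below), could read:

#include <cmath>
#include <cstdlib>
#include <ctime>
#include <iostream>
using namespace std;

// assumed integrand; its exact integral over [0,1] is pi
double func(double x) { return 4.0/(1.0 + x*x); }

int main()
{
  int n = 1000000;                         // number of Monte Carlo samples
  double invers_period = 1./RAND_MAX;
  srand(time(NULL));
  double sum = 0.0, sum2 = 0.0;
  for (int i = 0; i < n; i++) {
    double x  = double(rand())*invers_period;   // uniform deviate in [0,1]
    double fx = func(x);
    sum  += fx;
    sum2 += fx*fx;
  }
  double integral = sum/n;
  double variance = sum2/n - integral*integral;
  cout << "I = " << integral << "  variance = " << variance << endl;
  return 0;
}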
  N           I              σ_N
  10          3.10263E+00    3.98802E-01
  100         3.02933E+00    4.04822E-01
  1000        3.13395E+00    4.22881E-01
  10000       3.14195E+00    4.11195E-01
  100000      3.14003E+00    4.14114E-01
  1000000     3.14213E+00    4.13838E-01
  10000000    3.14177E+00    4.13523E-01
  10^9        3.14162E+00    4.13581E-01
We note that as N increases, the integral itself never reaches more than an agreement
to the fourth or fifth digit. The variance also oscillates around its exact value
4.13581E-01. Note well that the variance need not be zero, but it can, with
appropriate redefinitions of the integral, be made smaller. A smaller variance yields also
a smaller standard deviation.
Acceptance-Rejection Method
This is a rather simple and appealing method after von Neumann. Assume that we are
looking at an interval x ∈ [a, b], this being the domain of the PDF p(x). Suppose also
that the largest value our distribution function takes in this interval is M, that is
\[
p(x) \le M, \qquad x \in [a, b].
\]
Then we generate a random number x from the uniform distribution for x ∈ [a, b] and a
corresponding number s from the uniform distribution between [0, M]. If
p(x) ≥ s,
we accept the new value of x, else we generate again two new random numbers x and
s and perform the test in the latter equation again.
Acceptance-Rejection Method
Obviously, to derive it analytically is much easier; however, the integrand could pose
some more difficult challenges. The aim here is simply to show how to implement the
acceptance-rejection algorithm. The integral is the area below the curve
f(x) = exp(x). If we uniformly fill the rectangle spanned by x ∈ [0, 3] and
y ∈ [0, exp(3)], the fraction below the curve obtained from a uniform distribution, and
multiplied by the area of the rectangle, should approximate the chosen integral. It is
rather easy to implement this numerically, as shown in the following code.
Acceptance-Rejection Method
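The code referred to above is not reproduced in these slides; a minimal sketch of the hit-and-miss estimate of the integral of exp(x) over [0, 3] (exact value exp(3) − 1) along the lines just described could look like this:

#include <cmath>
#include <cstdlib>
#include <ctime>
#include <iostream>
using namespace std;

int main()
{
  int n = 1000000;                 // number of trial points
  int accepted = 0;
  double xmax = 3.0, ymax = exp(3.0);
  srand(time(NULL));
  for (int i = 0; i < n; i++) {
    double x = xmax*rand()/(1.0*RAND_MAX);   // uniform in [0,3]
    double s = ymax*rand()/(1.0*RAND_MAX);   // uniform in [0,exp(3)]
    if (s <= exp(x)) accepted++;             // the point lies below the curve
  }
  // fraction below the curve times the area of the rectangle
  double integral = (double) accepted/n*xmax*ymax;
  cout << "I = " << integral << "  exact = " << exp(3.0) - 1.0 << endl;
  return 0;
}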
Transformation of Variables
All random number generators provided in the program library generate numbers in
the domain x ∈ [0, 1].
When we attempt a transformation to a new variable x → y we have to conserve the
probability
p(y)dy = p(x)dx,
which for the uniform distribution implies
p(y)dy = dx.
Transformation of Variables
Let us assume that p(y) is a PDF different from the uniform PDF p(x) = 1 with
x ∈ [0, 1]. If we integrate the last expression we arrive at
\[
x(y) = \int_0^{y} p(y')\,dy',
\]
This is an important result which has consequences for eventual improvements over
the brute force Monte Carlo.
If we wish to relate this distribution to the one in the interval x ∈ [0, 1] we have
\[
p(y)dy = \frac{dy}{b-a} = dx,
\]
and
\[
x(y) = \int_a^{y}\frac{dy'}{b-a},
\]
yielding
y = a + (b − a)x,
a well-known result!
Assume that
p(y) = e−y ,
which is the exponential distribution, important for the analysis of e.g., radioactive
decay. Again, p(x) is given by the uniform distribution with x ∈ [0, 1], and with the
assumption that the probability is conserved we have
\[
x(y) = \int_0^{y}\exp(-y')\,dy' = 1 - \exp(-y),
\]
or
\[
y(x) = -\ln(1 - x).
\]
This gives us the new random variable y in the domain y ∈ [0, ∞) determined through
the random variable x ∈ [0, 1] generated by our favorite random generator.
This means that if we can factor out exp (−y) from an integrand we may have
\[
I = \int_0^{\infty} F(y)\,dy = \int_0^{\infty}\exp(-y)G(y)\,dy,
\]
which we rewrite as
\[
\int_0^{\infty}\exp(-y)G(y)\,dy = \int_0^{\infty}\frac{dx}{dy}\, G(y)\,dy
  \approx \frac{1}{N}\sum_{i=1}^{N} G(y(x_i)),
\]
The algorithm is rather simple. In the function which sets up the integral, we simply
need the random number generator for the uniform distribution in order to obtain
numbers in the interval [0,1]. We obtain y by taking minus the logarithm of (1 − x). Our
calling function which sets up the new random variable y may then include statements
like
.....
idum=-1;
x=ran0(&idum);    // uniform deviate x in [0,1]
y=-log(1.-x);     // y follows the exponential distribution exp(-y)
.....
Example 3
Another function which provides an example for a PDF is
\[
p(y)dy = \frac{dy}{(a + by)^n},
\]
with n > 1. It is normalizable, positive definite, analytically integrable and the integral is
invertible, allowing thereby the expression of a new variable in terms of the old one.
The integral
\[
\int_0^{\infty}\frac{dy}{(a + by)^n} = \frac{1}{(n-1)ba^{n-1}},
\]
gives
\[
p(y)dy = \frac{(n-1)ba^{n-1}}{(a + by)^n}\, dy,
\]
which in turn gives the cumulative function
\[
x(y) = P(y) = \int_0^{y}\frac{(n-1)ba^{n-1}}{(a + by')^n}\, dy' = 1 - \left(\frac{a}{a + by}\right)^{n-1},
\]
resulting in
\[
y = \frac{a}{b}\left((1-x)^{-1/(n-1)} - 1\right).
\]
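In code this transformation is a one-liner; a brief illustrative sketch (the values of a, b and n are arbitrary and not taken from the lecture material):

#include <cmath>
#include <cstdlib>
#include <ctime>
#include <iostream>
using namespace std;

int main()
{
  double a = 1.0, b = 1.0;        // illustrative parameter values
  int    n = 3;                   // must satisfy n > 1
  srand(time(NULL));
  for (int i = 0; i < 10; i++) {
    double x = rand()/(1.0*RAND_MAX);                    // uniform in [0,1]
    double y = a/b*(pow(1.0 - x, -1.0/(n - 1)) - 1.0);   // distributed as 1/(a+by)^n
    cout << y << endl;
  }
  return 0;
}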
For the normal distribution it is rather difficult to find an inverse, since the cumulative
distribution is given by the error function erf(x).
If we however switch to polar coordinates, we have for x and y
\[
r = \left(x^2 + y^2\right)^{1/2}, \qquad \theta = \tan^{-1}\frac{x}{y},
\]
resulting in
g(r , θ) = r exp (−r 2 /2)drdθ,
where the angle θ could be given by a uniform distribution in the region [0, 2π].
Following example 1 above, this implies simply multiplying random numbers x ∈ [0, 1]
by 2π.
A function which yields such random numbers for the normal distribution would include
statements like
.....
idum=-1;
radius=sqrt(-2*log(1.-ran0(&idum)));   // r^2/2 is exponentially distributed
theta=2*pi*ran0(&idum);                // uniform angle in [0, 2*pi]
x=radius*cos(theta);
y=radius*sin(theta);
.....
Importance Sampling
With the aid of the above variable transformations we address now one of the most
widely used approaches to Monte Carlo integration, namely importance sampling.
Let us assume that p(y) is a PDF whose behavior resembles that of a function F
defined in a certain interval [a, b]. The normalization condition is
\[
\int_a^b p(y)\,dy = 1.
\]
Importance Sampling
Since random numbers are generated for the uniform distribution p(x) with x ∈ [0, 1],
we need to perform a change of variables x → y through
\[
x(y) = \int_a^{y} p(y')\,dy',
\]
where we used
p(x)dx = dx = p(y)dy .
Importance Sampling
With this change of variables we can express the integral of Eq. (61) as
\[
I = \int_a^b p(y)\frac{F(y)}{p(y)}\,dy = \int_a^b\frac{F(y(x))}{p(y(x))}\,dx,
\]
\[
\int_a^b\frac{F(y(x))}{p(y(x))}\,dx \approx \frac{1}{N}\sum_{i=1}^{N}\frac{F(y(x_i))}{p(y(x_i))}.
\]
The advantage of such a change of variables in case p(y) follows closely F is that the
integrand becomes smooth and we can sample over relevant values for the integrand.
It is however not trivial to find such a function p. The conditions on p which allow us to
perform these transformations are
1. p is normalizable and positive definite,
2. it is analytically integrable and
3. the integral is invertible, allowing us thereby to express a new variable in terms of
the old one.
Importance Sampling
The algorithm for this procedure is
Use the uniform distribution to find the random variable y in the interval [0,1].
p(x) is a user provided PDF.
Evaluate thereafter
\[
I = \int_a^b F(x)\,dx = \int_a^b p(x)\frac{F(x)}{p(x)}\,dx,
\]
by rewriting
\[
\int_a^b p(x)\frac{F(x)}{p(x)}\,dx = \int_a^b\frac{F(x(y))}{p(x(y))}\,dy,
\]
since
\[
\frac{dy}{dx} = p(x).
\]
Perform then a Monte Carlo sampling for
\[
\int_a^b\frac{F(x(y))}{p(x(y))}\,dy \approx \frac{1}{N}\sum_{i=1}^{N}\frac{F(x(y_i))}{p(x(y_i))}.
\]
\[
p(x) = \frac{1}{3}(4 - 2x), \qquad \int_0^1 p(x)\,dx = 1,
\]
resulting in
\[
\frac{F(0)}{p(0)} = \frac{F(1)}{p(1)} = \frac{3}{4}.
\]
Check that it fulfils the requirements of a PDF. We perform then the change of
variables (via the cumulative function)
\[
y(x) = \int_0^{x} p(x')\,dx' = \frac{1}{3}x(4 - x),
\]
or
\[
x = 2 - (4 - 3y)^{1/2}.
\]
Simple Code
Code at http://folk.uio.no/mhjensen/fys3150/2005/programs/
chapter8/example2.cpp.
The suffix cr stands for the brute force approach while is stands for the use of
importance sampling. All calculations use ran0 as function to generate the uniform
distribution.
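The full program example2.cpp is not reproduced here; a minimal sketch of the importance-sampled part, assuming the integrand is F(x) = 1/(1 + x²) (consistent with F(0)/p(0) = F(1)/p(1) = 3/4 above; the exact value of the integral is then π/4), could read:

#include <cmath>
#include <cstdlib>
#include <ctime>
#include <iostream>
using namespace std;

int main()
{
  int n = 1000000;
  srand(time(NULL));
  double sum = 0.0;
  for (int i = 0; i < n; i++) {
    double y = rand()/(1.0*RAND_MAX);      // uniform deviate in [0,1]
    double x = 2.0 - sqrt(4.0 - 3.0*y);    // distributed according to p(x) = (4-2x)/3
    double F = 1.0/(1.0 + x*x);            // assumed integrand
    double p = (4.0 - 2.0*x)/3.0;
    sum += F/p;
  }
  cout << "I = " << sum/n << "  exact = " << atan(1.0) << endl;   // pi/4
  return 0;
}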
Multidimensional Integrals
where
\[
g(\mathbf{x}, \mathbf{y}) = \exp\left(-\mathbf{x}^2 - \mathbf{y}^2 - (\mathbf{x} - \mathbf{y})^2/2\right),
\]
with d = 6.
We can solve this integral by employing our brute force scheme, or using importance
sampling and random variables distributed according to a Gaussian PDF. For the latter,
if we set the mean value µ = 0 and the standard deviation σ = 1/√2, we have
\[
\frac{1}{\sqrt{\pi}}\exp(-x^2),
\]
and through
\[
\pi^3\int\prod_{i=1}^{6}\left(\frac{1}{\sqrt{\pi}}\exp(-x_i^2)\right)\exp\left(-(\mathbf{x} - \mathbf{y})^2/2\right)dx_1 \dots dx_6,
\]
which is of the general form
\[
\int f(x_1, \dots, x_d)F(x_1, \dots, x_d)\prod_{i=1}^{6}dx_i,
\]
Brute Force I
.....
// evaluate the integral without importance sampling
// Loop over Monte Carlo Cycles
for ( int i = 1; i <= n; i++){
// x[] contains the random numbers for all dimensions
for (int j = 0; j< 6; j++) {
x[j]=-length+2*length*ran0(&idum);
}
fx=brute_force_MC(x);
int_mc += fx;
sum_sigma += fx*fx;
}
int_mc = int_mc/((double) n );
sum_sigma = sum_sigma/((double) n );
variance=sum_sigma-int_mc*int_mc;
......
Brute Force II
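The function brute_force_MC called in the loop above is not shown on these slides; a sketch consistent with g(x, y) = exp(−x² − y² − (x − y)²/2), where x[0..2] play the role of the vector x and x[3..5] the role of y, could read as follows. Note that the brute-force estimate must still be multiplied by the volume of the hypercube, (2·length)^6, which is presumably done elsewhere in the full program.

// sketch of the integrand for the brute force approach;
// x[0..2] play the role of the vector x and x[3..5] the role of y
double brute_force_MC(double *x)
{
  double xx = x[0]*x[0] + x[1]*x[1] + x[2]*x[2];
  double yy = x[3]*x[3] + x[4]*x[4] + x[5]*x[5];
  double xy = (x[0]-x[3])*(x[0]-x[3])
            + (x[1]-x[4])*(x[1]-x[4])
            + (x[2]-x[5])*(x[2]-x[5]);
  return exp(-xx - yy - 0.5*xy);
}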
Importance Sampling I
..........
// evaluate the integral with importance sampling
for ( int i = 1; i <= n; i++){
// x[] contains the random numbers for all dimensions
for (int j = 0; j < 6; j++) {
x[j] = gaussian_deviate(&idum)*sqrt2;
}
fx=gaussian_MC(x);
int_mc += fx;
sum_sigma += fx*fx;
}
int_mc = int_mc/((double) n );
sum_sigma = sum_sigma/((double) n );
variance=sum_sigma-int_mc*int_mc;
.............
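The helper functions used above are not shown in these slides. gaussian_deviate is typically a Box-Muller (polar) generator of normal deviates built on top of ran0, and gaussian_MC only needs the part of the integrand not absorbed into the Gaussian PDF, exp(−(x − y)²/2); the remaining factor π³ multiplies the final estimate (assuming the constant sqrt2 in the loop equals 1/√2, so the sampled deviates have the standard deviation σ = 1/√2 discussed above). A sketch under these assumptions:

// Box-Muller (polar) method: normal deviate with mean 0 and variance 1,
// built on the uniform generator ran0 from the course library
double gaussian_deviate(long *idum)
{
  static int iset = 0;
  static double gset;
  double fac, rsq, v1, v2;
  if (iset == 0) {
    do {
      v1  = 2.*ran0(idum) - 1.0;
      v2  = 2.*ran0(idum) - 1.0;
      rsq = v1*v1 + v2*v2;
    } while (rsq >= 1.0 || rsq == 0.);
    fac  = sqrt(-2.*log(rsq)/rsq);
    gset = v1*fac;
    iset = 1;
    return v2*fac;
  } else {
    iset = 0;
    return gset;
  }
}

// part of the integrand not absorbed into the sampled Gaussian PDF
double gaussian_MC(double *x)
{
  double xy = (x[0]-x[3])*(x[0]-x[3])
            + (x[1]-x[4])*(x[1]-x[4])
            + (x[2]-x[5])*(x[2]-x[5]);
  return exp(-0.5*xy);
}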
Importance Sampling II
Results as a function of the number of Monte Carlo samples N. The exact answer is
I ≈ 10.9626 for the integral. The suffix cr stands for the brute force approach while gd
stands for the use of a Gaussian distribution function. All calculations use ran0 as
function to generate the uniform distribution.
  N           Icr            Igd
  10000       1.15247E+01    1.09128E+01
  100000      1.29650E+01    1.09522E+01
  1000000     1.18226E+01    1.09673E+01
  10000000    1.04925E+01    1.09612E+01
MPI commands have the form MPI_Command_name in C/C++ and MPI_COMMAND_NAME in Fortran.
#include "mpi.h"
#include <stdio.h>
int main (int nargs, char* args[])
{
Declarations ....
MPI_Init (&nargs, &args);
MPI_Comm_size (MPI_COMM_WORLD, &size);
MPI_Comm_rank (MPI_COMM_WORLD, &iam);
....
no_intervalls = mcs/size;               // Monte Carlo cycles per process
myloop_begin = iam*no_intervalls + 1;   // first cycle for this rank
myloop_end = (iam+1)*no_intervalls;     // last cycle for this rank
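After each process has accumulated its partial sums over its own share of the cycles, the results must be collected on the master node before output; a sketch of how this could be done (the sampling function sample_one_cycle is hypothetical, the rest follows the fragment above):

// local Monte Carlo loop over this process' share of the cycles
double local_sum = 0.0;
for (int cycle = myloop_begin; cycle <= myloop_end; cycle++) {
  local_sum += sample_one_cycle();    // hypothetical sampling function
}
// collect the partial sums from all processes on the master node
double total_sum = 0.0;
MPI_Reduce(&local_sum, &total_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
if (iam == 0) {
  printf("average = %g\n", total_sum/mcs);
}
MPI_Finalize();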
Exercise
using brute force Monte Carlo with p(x) = 1 and importance sampling with
p(x) = ae^{-x} where a is a constant.
(b) Calculate the integral
\[
I = \int_0^{\pi}\frac{1}{x^2 + \cos^2(x)}\,dx,
\]
with p(x) = ae^{-x} where a is a constant. Determine the value of a which
minimizes the variance.
(c) Try to parallelize the code as well.