Lecture Notes On Monte Carlo Methods
Andrew Larkoski
November 7, 2016
1 Lecture 1
This week we deviate from the text and discuss the important topic of Monte Carlo methods.
Monte Carlos are named after the famous casino in Monaco, where chance and probability
rule. This week we will discuss how to numerically simulate outcomes of an experiment
(“data”). This is a huge and important subject into which we will just dip our feet.
Every experiment in physics corresponds to sampling a probability distribution for the
quantity that you measure.
Example 1: What is the distribution of the times between cars passing a point on the road?
We must measure the time it takes for another car to pass the point. As more
and more cars are measured, we have more and more data points on which to
determine the distribution. We will refer to each instance or data point of an
experiment as an event. Any experiment performed by humans contains only a
finite number of events. How can we model this?
Figure 1: Illustration of four events in the measurement of the distribution in time of cars
passing a point.
Example 2: What is the distribution of the positions of a particle in an infinite square well?
We must prepare n identical infinite square well systems and measure the position
of the particle in each system. Each measurement of the position is an event.
Note that to determine the distribution requires repeatability: the measurements
Figure 2: Illustration of an example of three events in which the position of the particle in
the infinite square well was measured.
To do this integral, we have used that the integration measure dx is just the bin width Δx_i, and that the sum
of the events in each bin is just the total number of events.
To interpret and compare our data to a prediction, we want to generate a histogram of
events numerically. That is, we want to generate pseudodata, which are a finite number
of events drawn from a given probability distribution of our choice. We can then compare
directly between the histogram of our real data to our pseudodata, and attempt to draw
conclusions. There are numerous methods to quantitatively judge the goodness-of-fit of a
histogram by a probability distribution (things like the Kolmogorov-Smirnov test, Student’s
t-test,1 etc.), but we won’t discuss them here.
While this procedure sounds relatively simple, there are many challenges that must be
overcome to produce pseudodata:
• To sample a probability distribution requires the generation of random numbers. We
will typically work with uniform, real random numbers on the domain of [0, 1]. We
will show that this is sufficient to sample an arbitrary probability distribution. How is
this accomplished? How does Mathematica generate random real numbers?
– Well, Mathematica actually doesn’t produce truly random numbers. The numbers
generated by Mathematica (or any other programming language) are pseudoran-
dom: they appear random, but are actually defined by a deterministic algorithm.
Good pseudorandom number generators only repeat on scales of 2^10000 calls; this is
called the period of the generator. This will be sufficient, though later in this
class we will discuss other choices. One of the most common pseudorandom num-
ber generators is the "Mersenne Twister", which has a period equal to a Mersenne
prime (see the poster outside Joel's office).
– There are some examples of truly random numbers and random number genera-
tors. The most famous is probably the collection in the book “A Million Random
Digits with 100,000 Normal Deviates”. This book is copyrighted, and if you are
in need of amusement, check out the Amazon reviews for this book. Now, there
are companies producing true random number generators which exploit the ran-
domness of quantum noise, though they aren’t found in every household yet.
• More than likely, however, you actually don’t know the probability distribution that
you should be sampling. For the infinite square well example, we know the probability
distribution and can sample it efficiently. What do you do if you don’t know it? For
example, what if you want to determine the distribution of spins of a material in 2D
at finite temperature? How do you make pseudodata of an event (a measurement of
the spins in the system)?
aligned. At finite temperature, energy is shared between the system of spins and
the outside world, so the spins do not have to be all aligned. However, it is most
likely that the spins are aligned. We will discuss this example in more detail later
in the course.
Figure 3: Illustration of two example events (event 1 and event 2) in the measurement of the spins in the system.
For the rest of this week, we will just assume that we have a random number generator
which can produce uniformly distributed real numbers in [0, 1]. Briefly later in the course,
we will question if this is actually the best strategy. Also, we will assume that we
know the probability distribution that we want to sample and generate pseudodata. So, the
problem we will address this week is how to generate random events distributed according
to a probability distribution p(x), utilizing a random number generator uniform on [0, 1].
In the next lecture, we’ll get into how to do this. For the rest of this lecture, I want to
spend some time discussing one of the most extensive applications of Monte Carlo methods:
simulation of proton-proton collision events at the Large Hadron Collider experiment in
Geneva, Switzerland.
The Large Hadron Collider (LHC) is the largest physics experiment ever created. It
consists of a 27 km (17 mile) ring in which protons are accelerated to kinetic energies of 6.5
tera-electron volts (TeV), which is about 7000 times the rest mass energy of the proton! This
is roughly the kinetic energy of a flying mosquito, but contained in a single proton. The two
oppositely circulating beams are made to collide head-on at so-called interaction points on
the ring. At these interaction points, enormous experiments have been constructed to detect
and measure all particles that are produced in the collision. The two largest experiments are
ATLAS (A Toroidal LHC ApparatuS) and CMS (Compact Muon Solenoid). Each of these
experiments is roughly the size and weight of a 5-story building! The ATLAS and CMS
experimental collaborations each have over 3000 members, by far the largest experimental
groups ever. And all of this to attempt to understand the structure of the proton, and by
extension nature, at the highest energies and shortest distances ever probed.
So what’s going on there? I will discuss a greatly simplified picture of the collisions of
protons, but I hope it will illustrate the complexity and necessity of Monte Carlo tools. Each
collision of a pair of protons is an event: one instance of some probability distribution that
governs how protons interact. Protons, to good approximation, are bags of more fundamental
quarks and gluons which directly interact in collision. We look inside the protons using
Figure 4: A view of Geneva, Switzerland, from the Jura Mountain range to the west. The
city of Geneva is located at the end of Lake Geneva, the ring of the Large Hadron Collider
is illustrated in yellow, the runway of the Geneva airport is visible as a size reference, and
Mont Blanc is present off in the distance. Photo Credit: CERN.
the “Neanderthal method”: we smash things together and determine what was inside by
what comes out (cf. opening presents on your birthday). A huge number of particles
are produced in these collisions; things called pions, kaons, muons, and familiar protons,
photons, and electrons. Schematically, we might illustrate one such event as in figure 6.
Arrows pointing toward the center represent the colliding protons while arrows pointing
away represent the particles that are produced. I have just illustrated one of a nearly
infinite number of possible collections of produced particles (called “final states”). Also,
these particles can have any relativistic momentum, only subject to momentum conservation.
Additionally, it is the ATLAS and CMS experiments that measure properties of the final
state particles (like their charge, energy, etc.). They do not have little labels on them. Any
experiment is imperfect: energies are not measured arbitrarily accurately; angles are smeared
by finite resolution; there are cables that are in the way of equipment; etc. With all of these
challenges, how are we able to make predictions for the final state of LHC collisions?
This problem is nearly ideal for a Monte Carlo solution for the following reasons:
Figure 5: An image of the ATLAS experiment during its construction. Notice the human for
scale; the large piping located around the experiment houses currents that source toroidal
electromagnets. Photo Credit: CERN.
2. Monte Carlo also enables realistic outcomes to be simulated. We not only need
the types of particles produced in the final state, but also their momenta, each of which is
a set of four real numbers. Simulating this directly is essentially impossible with a large number of
particles in the final state. However, to good approximation, we are able to assume
that the production of particles is a Markov process: that is, the production probability
of additional particles only depends on what particles currently exist. This enables a
simple recursive, iterative algorithm to be able to generate arbitrary final states.
3. Detectors and experiments are imperfect, so we also need to model the response of the
equipment to particular particles with certain momentum to make realistic pseudo-
data. Because the Monte Carlo produces individual pseudoevents, it is straightforward
to “measure” the final state with a detector simulation. The results of the pseudoex-
periments can then be compared to real data to test hypotheses or to discover new
particles and forces of nature!
The most widely used Markov Chain Monte Carlo simulators go by names like “Pythia”
and “Herwig” and are used ubiquitously to understand the data from ATLAS and CMS.
Figure 6: Schematic illustration of a single proton-proton collision event: arrows pointing
toward the center represent the colliding protons (p+), and outgoing arrows are labeled by
the produced particles (pions, neutrons, electrons, photons, and protons).
2 Lecture 2
Last lecture, we introduced the idea of Monte Carlo methods and the need for simulation
of pseudodata. This becomes important for modeling effects of finite data samples, finite
experimental resolution, or for understanding systems for which the analytic probability
distribution is unknown or even unknowable. In this lecture, we discuss in detail the math-
ematical foundation for Monte Carlo methods.
With the basis from the previous lecture, we assume that we have a (known) probability
distribution p(x) and can efficiently sample (pseudo)random real numbers on [0, 1]. How do
we generate a set of numbers {x} distributed according to p(x)?
For simplicity, let's assume that p(x) is defined on x ∈ [0, 1] so that

∫_0^1 dx p(x) = 1 .   (5)
The results that follow are true for arbitrary probability distributions; confining ourselves
to x ∈ [0, 1] is just for compactness. The probability for a value to lie within dx of x is p(x) dx.
Assuming the inverse exists, we can make the change of variables to y as:

p(x) dx = p(x(y)) (dx/dy) dy ,   (6)

where x(y) is the change of variables. This then defines a new probability distribution:

p̄(y) = p(x(y)) (dx/dy) .   (7)
Now, if we assume that this change of variables renders p̄(y) uniform on y ∈ [0, 1], then

p(x(y)) = dy/dx   ⟹   y = ∫_0^x dx′ p(x′) + c .   (8)
Here c is an integration constant, which must be 0 so that y runs over [0, 1] (using that p(x) is normalized).
The integral

∫_0^x dx′ p(x′)   (9)

is just the cumulative distribution, or the probability that the random variable is less than
x. We will denote it by

Σ(x) = ∫_0^x dx′ p(x′) .   (10)
Note that Σ(0) = 0, Σ(1) = 1, and Σ(x) is monotonic on x ∈ [0, 1]. Because it is monotonic,
Σ(x) can always be inverted. Now, we can solve for x:

x = Σ^(−1)(y) .   (11)
That is, the random variable x distributed as p(x) can be generated by inverting the cumu-
lative distribution and using the uniformly distributed y ∈ [0, 1] as the independent variable.
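As a minimal illustration (my own example and function name, not from the notes), consider p(x) = 3x² on x ∈ [0, 1]: its cumulative distribution is Σ(x) = x³, so each uniform y maps to x = y^(1/3). In Mathematica:

samplecube[Nev_] := Table[RandomReal[{0, 1}]^(1/3), {Nev}];  (* inverse-transform sampling of p(x) = 3 x^2 *)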
The fundamental step for this method of sampling a distribution is generating random
numbers distributed on [0, 1]. The following Mathematica function rand outputs an array
of random numbers distributed on [0, 1] using its internal random number generator:
rand[Nev_]:=Module[{randtab,r},
  randtab={};
  For[i = 1, i <= Nev, i++,
    r = RandomReal[{0,1}];
    AppendTo[randtab,r];
  ];
  Return[randtab];
];
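As a quick check (my own usage example, not from the notes), the output can be histogrammed to verify that it is approximately flat on [0, 1]:

Histogram[rand[10000], 20, "PDF"]  (* the bins should all sit near a height of 1 *)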
This is the central result for Monte Carlo simulation. “Any” (invertible, closed form)
probability distribution can be sampled from a uniform distribution. Let’s see how this
works in a couple of examples.
Example.
Let’s consider a quantum particle in the infinite square well. The probability
distribution for the position of a particle in the ground state of a square well
on x ∈ [0, 1] is

p(x) = 2 sin²(πx) .   (12)

Note that this is normalized. The cumulative distribution is

Σ(x) = ∫_0^x dx′ p(x′) = x − sin(2πx)/(2π) .   (13)
Then, to sample this distribution, we set the cumulative distribution equal to y,
which is uniformly distributed on [0, 1]:
y = x − sin(2πx)/(2π) .   (14)

This isn't solvable for x in closed form, but for a given y ∈ [0, 1], there exists a
unique value of x that can be determined numerically (using Newton’s method,
for example, to find roots).
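As a minimal sketch (the function name sampleWell and the choice of FindRoot with a starting guess of 0.5 are my own), this numerical inversion could be done as:

sampleWell[Nev_] := Module[{xx, r},
  Table[
    r = RandomReal[{0, 1}];
    (* solve y = x - Sin[2 Pi x]/(2 Pi) for x, keeping the search inside [0, 1] *)
    xx /. FindRoot[xx - Sin[2 Pi xx]/(2 Pi) == r, {xx, 0.5, 0, 1}],
    {Nev}]
];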
Example.
Let's consider the probability distribution of the time it takes a radioactive el-
ement to decay. Radioactive decay of an atom is governed by its half-life, τ.
At time t = τ, there is a 50% chance that the atom has decayed. Equivalently,
given a sample of material composed of an unstable element, 50% of it will have
decayed by t = τ. The probability distribution of the time t it takes one of these
unstable atoms to decay is

p(t) = (log 2/τ) e^(−t log 2/τ) ,   (15)

where the time t ∈ [0, ∞). That is, the atom can decay any time after we start
watching (at time t = 0). Note that the ratio of the probability at time t = τ to
t = 0 is 1/2:

p(t = τ)/p(t = 0) = e^(−log 2) = 1/2 .   (16)

That is, the likelihood that the atom is still there at time t = τ is 50%.
Now, we would like to be able to sample this distribution via a Monte Carlo. So,
we first determine the cumulative distribution:

Σ(t) = ∫_0^t dt′ p(t′) = 1 − e^(−t log 2/τ) .   (17)

Setting Σ(t) equal to a uniform random number y and inverting gives

t = −(τ/log 2) log(1 − y) .   (18)

Note that for y ∈ [0, 1], this time ranges over t ∈ [0, ∞).
The following Mathematica function expdist samples an exponential distribu-
tion with half-life τ and outputs a histogram of the resulting distribution. Its
arguments are the half-life tau, the number of events Nev, the end-point of the histogram
maxpoint, and the number of bins in the histogram Nbins:
expdist[tau_,Nev_,maxpoint_,Nbins_]:=Module[{randtab,r,t,binno},
  randtab=Table[{maxpoint*i/Nbins,0.0},{i,0,Nbins-1}];
  For[i = 1, i <= Nev, i++,
    r = RandomReal[{0,1}];
    t = -tau*Log[1-r]/Log[2];
    If[t < maxpoint,
      binno=Ceiling[Nbins*t/maxpoint];
      randtab[[binno,2]]+=Nbins/(maxpoint*Nev);
    ];
  ];
  Return[randtab];
];
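As a usage sketch (the parameter values are my own choices), one can reproduce a comparison in the spirit of figure 7 by overlaying the histogram on the analytic curve for tau = 1:

hist = expdist[1, 100000, 6, 60];
Show[
  ListLinePlot[hist, InterpolationOrder -> 0],  (* stair-step histogram *)
  Plot[Log[2] Exp[-t Log[2]], {t, 0, 6}]        (* analytic p(t) for half-life 1 *)
]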
Figure 7: A plot comparing the output of the program expdist to the analytic expression
for the exponential probability distribution with half-life τ = 1 s, p(t) = log 2 · exp[−t log 2].
To make the plot of the histogram, we set InterpolationOrder → 0, which gives the
histogram the stair step-like shape. The histogram and the plot of p(t) lie on top of one
another.
So, we know how to sample a given probability distribution. What if, however, we instead
want to simulate some process, where each step in the process is governed by its own probability
distribution? In the case of the unstable element, let's imagine that we have a large sample of it,
and we want to determine the number of atoms that decay in a given time interval t ∈ [0, T ].
How do we do this? This problem, like many others in physics, can be expressed as a
Markov Chain, which we will explore now.
To a good approximation, radioactive decays in a sub-critical sample are independent of
one another. So, the probability of any atom decaying is just the exponential distribution.
The exponential distribution is also “memoryless”: it is independent of when you start the
clock (and appropriately normalize).2 So, to determine how many atoms decay in a time T ,
we have the simple procedure:
1. Determine the time that the first atom decayed, t1, from

   t1 = −(τ/log 2) log(1 − y) ,   (19)

   for y uniform on [0, 1]. If t1 > T , then stop; no atoms decayed in time T . If t1 < T ,
   then continue.

2. Determine the time (from t = 0) that the second atom decayed, t2, from

   t2 − t1 = −(τ/log 2) log(1 − y) ,   (20)

   again with y uniform on [0, 1]. If t2 > T , then stop; 1 atom decayed in time T . If
   t2 < T , then continue.
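A minimal Mathematica sketch of this iterative procedure (the function name decaycount is my own; tau is the half-life and T the length of the observation window) keeps drawing the exponentially distributed decay times until the running decay time passes T:

decaycount[tau_, T_] := Module[{t, n = 0},
  (* time of the first decay *)
  t = -tau*Log[1 - RandomReal[{0, 1}]]/Log[2];
  (* keep adding decays until the next decay time exceeds T *)
  While[t < T,
    n++;
    t = t - tau*Log[1 - RandomReal[{0, 1}]]/Log[2];
  ];
  Return[n];  (* number of atoms that decayed in [0, T] *)
];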
but it is not clear at all how to invert this. How do you uniquely invert a function of two
variables into two other variables? To proceed, we need another tactic.
Instead, if we are able to express p(x, y) as the product of two 1D probability distributions,
then we can apply our methods from 1D to 2D distributions. We might think that we could
write
p(x, y) = p(x)p(y) , (22)
but this is not true in general. Instead, we utilize the definition of a conditional probability:
p(x|y) = p(x, y) / p(y) ,   (23)

where we define

p(y) = ∫_0^1 dx p(x, y) .   (24)
We read p(x|y) as “probability of x given y”, and the conditional probability is a 1D proba-
bility distribution itself. For p(x|y), y is a fixed parameter, and not a random variable. To
verify that it is normalized we have:
∫_0^1 dx p(x|y) = ∫_0^1 dx p(x, y)/p(y) = (1/p(y)) ∫_0^1 dx p(x, y) = 1 .   (25)
Using the conditional probability, we can then express a 2D distribution as a product of two
1D distributions:
p(x, y) = p(y)p(x|y) = p(x)p(y|x) . (26)
Note that this representation is symmetric in x and y.
Then, to sample the 2D distribution p(x, y), we first choose a random y according to p(y)
using a Monte Carlo. Then, given that value of y, we choose x according to p(x|y). This
produces a pair (x, y) chosen according to p(x, y), and we can repeat to generate a large
sample of pseudodata. Let’s see how this works in an example.
Example.
Let's consider the 2D distribution

p(x, y) = x + y ,   (27)

and note that the probability cannot be split up into a product: p(x, y) ≠ p(x)p(y).
To sample this distribution via a Monte Carlo, we therefore must first find con-
ditional probabilities. Let’s find p(y). We have
p(y) = ∫_0^1 dx (x + y) = y + 1/2 .   (29)
Then, the probability of x given y is

p(x|y) = p(x, y) / p(y) = (x + y) / (y + 1/2) .   (30)
To then Monte Carlo this problem, we need to find the corresponding 1D cumu-
lative distributions:
Σ(y) = ∫_0^y dy′ (y′ + 1/2) = y(1 + y)/2 ,   (32)

Σ(x|y) = ∫_0^x dx′ (x′ + y)/(y + 1/2) = x(2y + x)/(2y + 1) .   (33)
Let’s then generate two random numbers r1 , r2 uniform on [0, 1], and set these
equal to the cumulative distributions and invert. That is,
Σ(y) = r1  ⟹  y = −1/2 + √(1/4 + 2 r1) ,   (34)

Σ(x|y) = r2  ⟹  x = −y + √(y² + (2y + 1) r2) .   (35)
xygen[Nev_]:=Module[{outtab,r1,r2,x,y},
  outtab={};
  For[i = 1, i <= Nev, i++,
    r1 = RandomReal[{0,1}];
    r2 = RandomReal[{0,1}];
    y = -1/2 + Sqrt[1/4 + 2*r1];
    x = -y + Sqrt[y*y + (2*y+1)*r2];
    AppendTo[outtab,{x,y}];
  ];
  Return[outtab];
];
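As a quick check (my own usage example, not from the notes), one can histogram the y values returned by xygen and compare against the marginal distribution p(y) = y + 1/2:

data = xygen[100000];
Show[
  Histogram[data[[All, 2]], 25, "PDF"],  (* histogram of the sampled y values *)
  Plot[y + 1/2, {y, 0, 1}]               (* the marginal p(y) from Eq. 29 *)
]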
3 Lecture 3
Last lecture, we developed the mathematical foundation for Monte Carlos, Markov Chains
and their utility, and how to sample multi-dimensional distributions. Everything developed
last lecture, however, required analytic calculations to invert cumulative probability distri-
butions. At the very least, efficient numerical methods were required to invert cumulative
probability distributions. In this lecture, we will discuss how to relax this assumption and
efficiently sample generic distributions.
Let's just assume that the probability distribution p(x) exists on the domain x ∈ [0, 1] and
we want to sample it. It might not even have an analytic expression, but just be expressed
as a collection of points. For this general case, there’s no way to invert p(x) as we did before,
so we need another technique. What we will do is bound the probability distribution p(x)
that we want by an integrable, normalizable function q(x) that we can sample efficiently and
simply. If we find a q(x) such that
q(x) ≥ p(x) ,   (36)

for all x ∈ [0, 1], then we can use the probability distribution derived from q(x) to sample
p(x). Let’s see how this works.
To be concrete, let's take q(x) constant on x ∈ [0, 1]. Denoting the maximum value of p(x)
by pmax, we can set q(x) = pmax so that q(x) ≥ p(x) for all x ∈ [0, 1]. We can turn q(x) into a
probability distribution by normalizing; we will call this probability distribution q̃(x):

q̃(x) = 1 .   (38)
generate additional uniform random numbers to veto those points that are inconsistent with
p(x). This introduces a loss of efficiency, but we will discuss later how to improve this.
The procedure for sampling p(x) given the bounding function q(x) = pmax is the following:

1. Choose a value of x by drawing a uniform random number on [0, 1]; this samples the
distribution q̃(x) = 1.

2. Evaluate p(x) at that value of x.

3. Keep the event with probability

pkeep = p(x)/pmax ∈ [0, 1] .   (40)

This can be accomplished by choosing another uniform random number y ∈ [0, 1] and
vetoing (=removing) the event if

y > p(x)/pmax .   (41)

4. The events that remain will be distributed according to p(x). To normalize the distri-
bution to integrate to 1, we multiply by pmax.
Example.
Consider again the probability distribution of the position of the particle in the
ground state of the infinite square well:

p(x) = 2 sin²(πx) ,   (42)

for x ∈ [0, 1]. Note that the maximum value of p(x) is pmax = 2. So, we find x
by choosing a random number on [0, 1] and then keep that x with probability

pkeep = p(x)/pmax = sin²(πx) ∈ [0, 1] .   (43)
This never required inverting cumulative probability distributions.
A Mathematica function vetowell that implements this veto method for sam-
pling probability distributions is as follows. vetowell has two arguments: the
number of events to generate Nev, and the number of bins to put in the histogram
Nbins.
vetowell[Nev_,Nbins_]:=Module[{outtab,r1,r2,binno},
  outtab=Table[{i/Nbins,0.0},{i,0,Nbins-1}];
  For[i = 1, i <= Nev, i++,
    r1=RandomReal[{0,1}];
    r2=RandomReal[{0,1}];
    binno=Ceiling[r1*Nbins];
    If[r2 < Sin[Pi*r1]^2, outtab[[binno,2]] += 2.0*Nbins/Nev];
  ];
  Return[outtab];
];
Figure 8: A plot comparing the output of the program vetowell to the analytic expression
for the probability distribution of the position of the particle in the ground state of the
infinite square well. To make the plot of the histogram, we set InterpolationOrder → 0,
which gives the histogram the stair step-like shape. The histogram and the plot of p(x) lie
on top of one another.
Note that sin²(πx) → 0 as x → 0 or x → 1, and so step 3 of this algorithm becomes very
inefficient (most events are vetoed) in these regions. Efficiency can be regained by bounding
the probability distribution p(x) more tightly with a non-uniform function q(x). Let's see
this in an example.
Example.
Again, let's consider the ground state of the infinite square well, where

p(x) = 2 sin²(πx) .   (44)

Now, consider the function

q(x) = x(1 − x) ,   (45)

on x ∈ [0, 1]. Note that with pmax = 2 and qmax = 1/4, we have

8 q(x) ≥ p(x) ,   (46)

for all x ∈ [0, 1]. The probability distribution corresponding to the function q(x)
is

q̃(x) = 6x(1 − x) .   (47)

Then, given Eq. 46, we can generate events according to the probability distri-
bution q̃(x) and then veto appropriately. The cumulative distribution of q̃(x)
is

Σq(x) = 3x² − 2x³ .   (48)

By inverting this cumulative distribution, we can sample q̃(x) via Monte Carlo.
Then, given some x from the q̃(x) distribution, we keep that event with proba-
bility equal to

pkeep = p(x)/(8 q(x)) = sin²(πx)/(4x(1 − x)) ∈ [0, 1] .   (49)
The events that remain after this veto step are distributed according to p(x). To
properly normalize the distribution, at the end we must multiply by pmax and
divide the histogram entries by the maximum value of q̃(x).
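A minimal sketch of this improved veto (the function name vetowell2 and the use of FindRoot to invert the cubic are my own choices; the per-event weight pmax/max q̃(x) = 4/3 follows the normalization prescription above):

vetowell2[Nev_, Nbins_] := Module[{outtab, r1, r2, xx, xval, binno},
  outtab = Table[{i/Nbins, 0.0}, {i, 0, Nbins - 1}];
  For[i = 1, i <= Nev, i++,
    r1 = RandomReal[{0, 1}];
    r2 = RandomReal[{0, 1}];
    (* sample x from qtilde(x) = 6 x (1 - x) by inverting 3 x^2 - 2 x^3 = r1 *)
    xval = xx /. FindRoot[3 xx^2 - 2 xx^3 == r1, {xx, 0.5, 0, 1}];
    (* veto step: keep the event with probability p(x)/(8 q(x)) *)
    If[r2 < Sin[Pi*xval]^2/(4*xval*(1 - xval)),
      binno = Ceiling[xval*Nbins];
      outtab[[binno, 2]] += (4/3)*Nbins/Nev;
    ];
  ];
  Return[outtab];
];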
A very reasonable question to ask, both for these pseudodata applications as well as for
evaluating integrals, is what the accuracy of Monte Carlo is. To be concrete, we will just
answer the following question: what is the uncertainty on the probability to be near x, as
the number of Monte Carlo events N → ∞? That is, we want to study the Monte Carlo
approximation to
P = p(x) dx . (50)
In Monte Carlo, this is approximated by the number of events in the bin near
the point x, Nx, divided by the total number of events in the distribution, N . By the law of
large numbers, this converges to P as N → ∞:

lim_{N→∞} Nx = P N .   (51)
What is the variance of this? By our Monte Carlo, all events in the bin are independent
of one another and each contributes 1 to the bin. The rate at which any event populates this
bin is fixed and equal to P . These criteria uniquely define the Poisson distribution for the
number of events in the bin. This is actually straightforward to derive, which we will do now.
Derivation of Poisson Distribution
We want to determine the probability distribution for the number of events in a
bin of a histogram that was filled via a Monte Carlo. The probability distribution
that we are sampling is p(x) and the probability P for an event to be in the bin
near x is
P = p(x) dx . (52)
The probability for an event to not be in the bin is then 1 − P . All events in
our pseudodata sample are independent. The probability that there are k events
in the bin near x is therefore:

pk = C(N, k) P^k (1 − P)^(N−k) .   (53)
This probability is called the binomial distribution for the following reason. The
factor on the right,

C(N, k) = N! / (k!(N − k)!) ,   (54)
is read as “N choose k” and is the number of ways to pick k objects out of a set
of N total objects. The probability in Eq. 53 is called the binomial distribution
because the sum over all k corresponds to the binomial expansion:
Σ_{k=0}^{N} pk = Σ_{k=0}^{N} C(N, k) P^k (1 − P)^(N−k) = (P + (1 − P))^N = 1 .   (55)
Now, we could stop here, but we can simplify the probability by making a few
more assumptions. We will assume that the width of the bin is very small so
that P ≪ 1 and k ≪ N . In this limit, note that

C(N, k) = N! / (k!(N − k)!) ≃ N^k / k! ,   (56)

and (1 − P)^(−k) ≃ 1 for finite k. With these simplifications, the probability for k
events in a bin becomes

pk → [(N P)^k / k!] (1 − N P / N)^N = [(N P)^k / k!] e^(−N P) ,   (57)

where the equality holds in the N → ∞ limit. This probability distribution is
called the Poisson distribution. Note that it is normalized:
Σ_{k=0}^{∞} [(N P)^k / k!] e^(−N P) = 1 ,   (58)
and has mean ⟨k⟩ = N P and variance σ² = ⟨k²⟩ − ⟨k⟩² = N P .
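For completeness, the mean follows from ⟨k⟩ = Σ_{k=0}^{∞} k (N P)^k e^(−N P)/k! = N P Σ_{k=1}^{∞} (N P)^(k−1) e^(−N P)/(k−1)! = N P , and the same manipulation applied to ⟨k(k − 1)⟩ gives ⟨k²⟩ = (N P)² + N P , hence σ² = N P .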
Therefore, with N events in a sample and probability P for any of those events to be
in a particular bin, the average number of events in the bin is N P and the variance is also
N P . Therefore, the standard deviation (the square-root of the variance) on the number of
events in the bin scales like √N . The relative error is defined by the ratio of the standard
deviation to the mean. This scales like 1/√N , and so for N events, we expect an error that
scales like 1/√N . Note that this is independent of bin size and the number of dimensions
in which the probability is defined. Monte Carlo and related methods are among the most
efficient ways to sample high dimensional distributions.
Later in the class, we will discuss ways to improve even this 1/√N relative error by
changing the way that random numbers are generated.
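As a rough numerical check of this scaling (my own example, not from the notes), one can repeatedly estimate the probability P = 0.1 for a uniform random number to land in [0, 0.1] and look at the spread of the counts:

relerr[Nev_, Ntrials_] := Module[{counts},
  counts = Table[Count[RandomReal[{0, 1}, Nev], u_ /; u < 0.1], {Ntrials}];
  N[StandardDeviation[counts]/Mean[counts]]  (* relative error on the bin count *)
];
(* relerr[100, 500] comes out around 0.3 and relerr[10000, 500] around 0.03,
   falling roughly like 1/Sqrt[Nev] as expected *)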
Finally, I want to discuss Monte Carlo generation of non-integrable distributions. Such
distributions are not probability distributions as they cannot be normalized. For example,
the function

f(x) = 1/x ,   (59)

on x ∈ [0, 1] has undefined (infinite) integral. Such infinite distributions are unphysical; no
measurement would ever yield ∞. However, depending on assumptions and your predictive
ability, one might find infinite distributions in intermediate steps of a calculation.
For example, one might attempt to calculate in degenerate perturbation theory of quan-
tum mechanics and find at order n in perturbation theory:

fn(x) = (−1)^n log^(2n)(x) / (n! x) .   (60)

The full distribution is a sum over n:

Σ_{n=0}^{∞} fn(x) = e^(−log² x) / x ,   (61)