
Lecture Notes on Monte Carlo Methods

Andrew Larkoski

November 7, 2016

1 Lecture 1
This week we deviate from the text and discuss the important topic of Monte Carlo methods.
Monte Carlos are named after the famous casino in Monaco, where chance and probability
rule. This week we will discuss how to numerically simulate outcomes of an experiment
(“data”). This is a huge and important subject into which we will just dip our feet.
Every experiment in physics corresponds to sampling a probability distribution for the
quantity that you measure.

Example 1: What is the distribution of the times between cars passing a point on the road?

We must measure the time it takes for another car to pass the point. As more
and more cars are measured, we have more and more data points on which to
determine the distribution. We will refer to each instance or data point of an
experiment as an event. Any experiment performed by humans contains only a
finite number of events. How can we model this?

car 1: t = 0   car 2: t = 5 s   car 3: t = 12 s   car 4: t = 25 s

“event 1”: t = 15 s

Figure 1: Illustration of four events in the measurement of the distribution in time of cars
passing a point.

Example 2: What is the distribution of the positions of a particle in an infinite square well?

We must prepare n identical infinite square well systems and measure the position
of the particle in each system. Each measurement of the position is an event.
Note that to determine the distribution requires repeatability: the measurements

event 1: x = 0.3   event 2: x = 0.6   event 3: x = 0.04   ··· (well endpoints at x = 0 and x = 1)

Figure 2: Illustration of an example of three events in which the position of the particle in
the infinite square well was measured.

must be done on identical systems. Otherwise, we do not know how to interpret


our results. (The possibility of repeatability is intimately related to energy and
momentum conservation, but that’s a topic for another class.) How do we sample
a probability distribution numerous times?
For any finite number of measurements, we cannot reproduce the probability distribution
exactly, but we can approximate it arbitrarily precisely. To do this, take our set of events:
I = {events of experiment} , (1)
and divide them into sets according to their values; these are called bins. For example,
for the infinite square well on the domain x ∈ [0, 1], we will divide the possible measured
positions into Nbins bins. If an event has a measured position

x ∈ [ (i − 1)/Nbins , i/Nbins ) ,   (2)

for an integer i ∈ [1, Nbins ], then we add 1 to this bin.
Doing this for all of the Nev events in an experiment generates a histogram, which is a
finite approximation to a smooth distribution. This will produce a histogram whose entries
sum to the total number of events. If instead we want a histogram that integrates to 1, so
that it can be directly interpreted as a probability distribution, instead of adding 1 to the
ith bin we add
1/(Δxi Nev) .   (3)
To see that this results in a histogram that integrates to 1, we will denote the number of
events in the ith bin by Ni . Then the integral of the histogram is just the sum:
∑_{i=1}^{Nbins} Δxi · Ni/(Δxi Nev) = (1/Nev) ∑_{i=1}^{Nbins} Ni = 1 .   (4)

To do this integral, we have used that the integration measure dx = Δxi and that the sum
of the events over all bins is just the total number of events.
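As a concrete illustration of Eqs. (3) and (4), the binning and normalization can be sketched in Python (the notes use Mathematica; this helper and its name are ours):

```python
def histogram(events, nbins, lo=0.0, hi=1.0):
    """Bin events on [lo, hi] so that the histogram integrates to 1."""
    dx = (hi - lo) / nbins              # uniform bin width, the Delta x_i
    counts = [0.0] * nbins
    for x in events:
        i = min(int((x - lo) / dx), nbins - 1)  # bin index; clamp x == hi
        counts[i] += 1.0 / (dx * len(events))   # add 1/(Delta x_i * Nev), Eq. (3)
    return counts

# The integral (sum of bin value times bin width) is 1, as in Eq. (4):
h = histogram([0.1, 0.2, 0.2, 0.95], nbins=10)
integral = sum(v * 0.1 for v in h)
```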
To interpret and compare our data to a prediction, we want to generate a histogram of
events numerically. That is, we want to generate pseudodata, which are a finite number
of events drawn from a given probability distribution of our choice. We can then compare
directly between the histogram of our real data to our pseudodata, and attempt to draw
conclusions. There are numerous methods to quantitatively judge the goodness-of-fit of a
histogram by a probability distribution (things like the Kolmogorov-Smirnov test, Student’s
t-test,¹ etc.), but we won’t discuss them here.
While this procedure sounds relatively simple, there are many challenges that must be
overcome to produce pseudodata:
• To sample a probability distribution requires the generation of random numbers. We
will typically work with uniform, real random numbers on the domain of [0, 1]. We
will show that this is sufficient to sample an arbitrary probability distribution. How is
this accomplished? How does Mathematica generate random real numbers?
– Well, Mathematica actually doesn’t produce truly random numbers. The numbers
generated by Mathematica (or any other programming language) are pseudoran-
dom: they appear random, but are actually defined by a deterministic algorithm.
Good pseudorandom number generators repeat only on scales of 2^10000 calls; this is
called the period of the generator. This will be sufficient, though later in this
class we will discuss other choices. One of the most common pseudorandom num-
ber generators is the “Mersenne Twister”, which has a period equal to a Mersenne
prime (see the poster outside Joel’s office).
– There are some examples of truly random numbers and random number genera-
tors. The most famous is probably the collection in the book “A Million Random
Digits with 100,000 Normal Deviates”. This book is copyrighted, and if you are
in need of amusement, check out the Amazon reviews for this book. Now, there
are companies producing true random number generators which exploit the ran-
domness of quantum noise, though they aren’t found in every household yet.
• More than likely, however, you actually don’t know the probability distribution that
you should be sampling. For the infinite square well example, we know the probability
distribution and can sample it efficiently. What do you do if you don’t know it? For
example, what if you want to determine the distribution of spins of a material in 2D
at finite temperature? How do you make pseudodata of an event (a measurement of
the spins in the system)?

– We do know what the configuration of spins should satisfy physically. Energy


is minimized if the spins align, and so we can check spin by spin if they are
¹ Student was the pseudonym of William Sealy Gosset, who worked for the Guinness brewery. Gosset
applied statistics to quality control of the beer produced at Guinness but was forbidden to publish, so he
created the name “Student” to evade the restriction.

aligned. At finite temperature, energy is shared between the system of spins and
the outside world, so the spins do not have to be all aligned. However, it is most
likely that the spins are aligned. We will discuss this example in more detail later
in the course.

···
event 1 event 2

Figure 3: Events in the measurement of a system of 16 spins at finite temperature.

For the rest of this week, we will just assume that we have a random number generator
which can produce uniformly distributed real numbers in [0, 1]. Briefly later in the course,
we will question if this is actually the best strategy. Also, we will assume that we
know the probability distribution that we want to sample and generate pseudodata. So, the
problem we will address this week is how to generate random events distributed according
to a probability distribution p(x), utilizing a random number generator uniform on [0, 1].
In the next lecture, we’ll get into how to do this. For the rest of this lecture, I want to
spend some time discussing one of the most extensive applications of Monte Carlo methods:
simulation of proton-proton collision events at the Large Hadron Collider experiment in
Geneva, Switzerland.
The Large Hadron Collider (LHC) is the largest physics experiment ever created. It
consists of a 27 km (17 mile) ring in which protons are accelerated to kinetic energies of 6.5
tera-electron volts (TeV), which is about 7000 times the rest mass energy of the proton! This
is roughly the kinetic energy of a flying mosquito, but contained in a single proton. The two
oppositely circulating beams are made to collide head-on at so-called interaction points on
the ring. At these interaction points, enormous experiments have been constructed to detect
and measure all particles that are produced in the collision. The two largest experiments are
ATLAS (A Toroidal LHC ApparatuS) and CMS (Compact Muon Solenoid). Each of these
experiments is roughly the size and weight of a 5 story building! The ATLAS and CMS
experimental collaborations each have over 3000 members, by far the largest experimental
groups ever. And all of this to attempt to understand the structure of the proton, and by
extension nature, at the highest energies and shortest distances ever probed.
So what’s going on there? I will discuss a greatly simplified picture of the collisions of
protons, but I hope it will illustrate the complexity and necessity of Monte Carlo tools. Each
collision of a pair of protons is an event: one instance of some probability distribution that
governs how protons interact. Protons, to good approximation, are bags of more fundamental
quarks and gluons which directly interact in collision. We look inside the protons using

Figure 4: A view of Geneva, Switzerland, from the Jura Mountain range to the west. The
city of Geneva is located at the end of Lake Geneva, the ring of the Large Hadron Collider
is illustrated in yellow, the runway of the Geneva airport is visible as a size reference, and
Mont Blanc is present off in the distance. Photo Credit: CERN.

the “Neanderthal method”: we smash things together and determine what was inside by
what comes out (cf. opening presents on your birthday). A huge number of particles
are produced in these collisions; things called pions, kaons, muons, and familiar protons,
photons, and electrons. Schematically, we might illustrate one such event as in figure 6.
Arrows pointing toward the center represent the colliding protons while arrows pointing
away represent the particles that are produced. I have just illustrated one of a nearly
infinite number of possible collections of produced particles (called “final states”). Also,
these particles can have any relativistic momentum, only subject to momentum conservation.
Additionally, it is the ATLAS and CMS experiments that measure properties of the final
state particles (like their charge, energy, etc.). They do not have little labels on them. Any
experiment is imperfect: energies are not measured arbitrarily accurately; angles are smeared
by finite resolution; there are cables that are in the way of equipment; etc. With all of these
challenges, how are we able to make predictions for the final state of LHC collisions?
This problem is nearly ideal for a Monte Carlo solution for the following reasons:

1. In proton-proton collisions, there are essentially an infinite number of possible out-


comes. While there is a fundamental theory underlying proton collision physics (the
Standard Model), it is amazingly complex and there is no known way (and perhaps

Figure 5: An image of the ATLAS experiment during its construction. Notice the human for
scale; the large piping located around the experiment houses currents that source toroidal
electromagnets. Photo Credit: CERN.

no way whatsoever) to achieve exact, analytic solutions for probability distributions.


By incorporating the physics of the Standard Model, we are able to construct Monte
Carlo simulations that produce final states faithfully according to their probability to
occur.

2. Monte Carlo also enables realistic outcomes to be simulated. We not only need
the types of particles produced in the final state, but also their momenta, each of which is
a set of four real numbers. Sampling this directly is essentially impossible with a large number of
particles in the final state. However, to good approximation, we are able to assume
that the production of particles is a Markov process: that is, the production probability
of additional particles only depends on what particles currently exist. This enables a
simple recursive, iterative algorithm to be able to generate arbitrary final states.

3. Detectors and experiments are imperfect, so we also need to model the response of the
equipment to particular particles with certain momentum to make realistic pseudo-
data. Because the Monte Carlo produces individual pseudoevents, it is straightforward
to “measure” the final state with a detector simulation. The results of the pseudoex-
periments can then be compared to real data to test hypotheses or to discover new
particles and forces of nature!

The most widely used Markov Chain Monte Carlo simulators go by names like “Pythia”
and “Herwig” and are used ubiquitously to understand the data from ATLAS and CMS.

Figure 6: Schematic representation of a proton-proton collision event. Note charge conser-
vation.

2 Lecture 2
Last lecture, we introduced the idea of Monte Carlo methods and the need for simulation
of pseudodata. This becomes important for modeling effects of finite data samples, finite
experimental resolution, or for understanding systems for which the analytic probability
distribution is unknown or even unknowable. In this lecture, we discuss in detail the math-
ematical foundation for Monte Carlo methods.
With the basis from previous lecture, we assume that we have a (known) probability
distribution p(x) and can efficiently sample (pseudo)random real numbers on [0, 1]. How do
we generate a set of numbers {x} distributed according to p(x)?
For simplicity, let’s assume that p(x) is defined on x ∈ [0, 1] so that

∫₀¹ dx p(x) = 1 .   (5)

The results that follow are true for arbitrary probability distributions; just confining ourselves
to x ∈ [0, 1] is for compactness. The probability for a value to lie within dx of x is p(x) dx.
Assuming the inverse exists, we can make the change of variables to y as:

p(x) dx = p(x(y)) (dx/dy) dy ,   (6)

where x(y) is the change of variables. This then defines a new probability distribution:

p̄(y) = p(x(y)) dx/dy .   (7)
Now, if we assume that this change of variables renders p̄(y) uniform on y ∈ [0, 1], then

p(x(y)) dx/dy = 1   ⟹   y = ∫₀ˣ dx′ p(x′) + c .   (8)

c is an integration constant which is 0 because p(x) is normalized.
The integral

∫₀ˣ dx′ p(x′)   (9)

is just the cumulative distribution, or the probability that the random variable is less than
x. We will denote it by

Σ(x) = ∫₀ˣ dx′ p(x′) .   (10)

Note that Σ(0) = 0, Σ(1) = 1, and Σ(x) is monotonic on x ∈ [0, 1]. Because it is monotonic,
Σ(x) can always be inverted. Now, we can solve for x:

x = Σ⁻¹(y) .   (11)

That is, the random variable x distributed as p(x) can be generated by inverting the cumu-
lative distribution and using the uniformly distributed y ∈ [0, 1] as the independent variable.
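In code, this recipe is just Eq. (11): draw y uniform on [0, 1] and apply Σ⁻¹. A minimal Python sketch (Python rather than the notes’ Mathematica; the test density p(x) = 3x², with Σ(x) = x³, is our illustrative choice):

```python
import random

def inverse_cdf_sample(inv_sigma, n_events, rng=random):
    """Eq. (11): x = Sigma^{-1}(y), with y uniform on [0, 1]."""
    return [inv_sigma(rng.random()) for _ in range(n_events)]

# Illustration: p(x) = 3 x^2 on [0, 1] has Sigma(x) = x^3, so Sigma^{-1}(y) = y**(1/3).
random.seed(0)
xs = inverse_cdf_sample(lambda y: y ** (1.0 / 3.0), 100000)
mean = sum(xs) / len(xs)   # the exact mean of p(x) = 3 x^2 is 3/4
```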
The fundamental step for this method of sampling a distribution is generating random
numbers distributed on [0, 1]. The following Mathematica function rand outputs an array
of random numbers distributed on [0, 1] using its internal random number generator:

rand[Nev_]:=Module[{randtab,r,i},
  randtab={};
  For[i = 1, i <= Nev, i++,
    r = RandomReal[{0,1}];
    AppendTo[randtab,r];
  ];
  Return[randtab];
];

This is the central result for Monte Carlo simulation. “Any” (invertible, closed form)
probability distribution can be sampled from a uniform distribution. Let’s see how this
works in a couple of examples.

Example.

Let’s consider a quantum particle in the infinite square well. The probability
distribution for the position of a particle in the ground state of a square well
on x ∈ [0, 1] is

p(x) = 2 sin²(πx) .   (12)

Note that this is normalized. The cumulative distribution is

Σ(x) = ∫₀ˣ dx′ p(x′) = x − sin(2πx)/(2π) .   (13)

Then, to sample this distribution, we set the cumulative distribution equal to y,
which is uniformly distributed on [0, 1]:

y = x − sin(2πx)/(2π) .   (14)

This isn’t solvable for x in closed form, but for a given y ∈ [0, 1], there exists a
unique value of x that can be determined numerically (using Newton’s method,
for example, to find roots).
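A numerical inversion of Eq. (13) can be sketched in Python as follows (our sketch, not code from the notes; we use bisection rather than Newton’s method because Σ′(x) = 2 sin²(πx) vanishes at the endpoints, where Newton iterations can stall):

```python
import math

def sigma(x):
    """Cumulative distribution of the ground-state square well, Eq. (13)."""
    return x - math.sin(2.0 * math.pi * x) / (2.0 * math.pi)

def invert_sigma(y, tol=1e-10):
    """Solve sigma(x) = y for x in [0, 1]; sigma is monotonic, so bisection works."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sigma(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

x_med = invert_sigma(0.5)   # by symmetry, the median of p(x) is x = 1/2
```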

Example.

Let’s consider the probability distribution of the time it takes a radioactive el-
ement to decay. Radioactive decay of an atom is governed by its half-life, λ.
At time t = λ, there is a 50% chance that the atom has decayed. Equivalently,
given a sample of material composed of an unstable element, 50% of it will have
decayed by t = λ. The probability distribution of the time t it takes one of these
unstable atoms to decay is

p(t) = (log 2/λ) e^{−t log 2/λ} ,   (15)

where the time t ∈ [0, ∞). That is, the atom can decay any time after we start
watching (at time t = 0). Note that the ratio of the probability at time t = λ to
t = 0 is 1/2:

p(t = λ)/p(t = 0) = e^{−log 2} = 1/2 .   (16)

That is, the likelihood that the atom is still there at time t = λ is 50%.
Now, we would like to be able to sample this distribution via a Monte Carlo. So,
we first determine the cumulative distribution:

Σ(t) = ∫₀ᵗ dt′ p(t′) = 1 − e^{−t log 2/λ} ,   (17)

which is 0 at t = 0 and 1 at t = ∞, and is monotonic. Next, we set this equal to


a uniformly distributed y ∈ [0, 1] and invert, solving for t. We find

t = −(λ/log 2) log(1 − y) .   (18)

Note that for y ∈ [0, 1), this time ranges over t ∈ [0, ∞).
The following Mathematica function expdist samples an exponential distribu-
tion with half-life λ and outputs a histogram of the resulting distribution. Its
arguments are λ, the number of events Nev, the end-point of the histogram
maxpoint, and the number of bins in the histogram Nbins:

expdist[λ_,Nev_,maxpoint_,Nbins_]:=Module[{randtab,r,t,binno,i},
  randtab=Table[{maxpoint*i/Nbins,0.0},{i,0,Nbins-1}];
  For[i = 1, i <= Nev, i++,
    r = RandomReal[{0,1}];
    t = -λ*Log[1-r]/Log[2];
    If[t < maxpoint,
      binno=Ceiling[Nbins*t/maxpoint];
      randtab[[binno,2]]+=Nbins/(maxpoint*Nev);
    ];
  ];
  Return[randtab];
];

A plot of the output histogram is compared to the analytic expression for the
distribution in figure 7. Here, we use λ = 1 s, Nev = 10000, maxpoint = 7 s,
and Nbins = 40.


Figure 7: A plot comparing the output of the program expdist to the analytic expression
for the exponential probability distribution with half-life λ = 1 s, p(t) = log 2 · exp[−t log 2].
To make the plot of the histogram, we set InterpolationOrder → 0, which gives the
histogram the stair step-like shape. The histogram and the plot of p(t) lie on top of one
another.

So, we know how to sample a given probability distribution. What if, however, we instead
want to simulate some process, where each step in the process is some probability distribution
itself? In the case of the unstable element, let’s imagine that we have a large sample of it,

and we want to determine the number of atoms that decay in a given time interval t ∈ [0, T ].
How do we do this? For this problem and many others in physics, it can be expressed as a
Markov Chain, which we will explore now.
To a good approximation, radioactive decays in a sub-critical sample are independent of
one another. So, the probability of any atom decaying is just the exponential distribution.
The exponential distribution is also “memoryless”: it is independent of when you start the
clock (and appropriately normalize).² So, to determine how many atoms decay in a time T ,
we have the simple procedure:
1. Determine the time that the first atom decayed, t1 , from

t1 = −(λ/log 2) log(1 − y) ,   (19)

for y uniform on [0, 1]. If t1 > T , then stop; no atoms decayed in time T . If t1 < T ,
then continue.

2. Determine the time (from t = 0) that the second atom decayed, t2 , from

t2 − t1 = −(λ/log 2) log(1 − y) ,   (20)

again with y uniform on [0, 1]. If t2 > T , then stop; 1 atom decayed in time T . If
t2 < T , then continue.

3. Determine t3 , t4 , . . . , and continue until tn > T . Then, n − 1 atoms decayed.
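The three steps above can be sketched in Python (our sketch; λ is written half_life, and the mean count over many runs should approach T log 2/λ):

```python
import math
import random

def decays_in_time(T, half_life, rng):
    """Markov chain of decay times: add exponential waiting times until t > T."""
    t, n = 0.0, 0
    while True:
        t += -half_life * math.log(1.0 - rng.random()) / math.log(2.0)
        if t > T:
            return n        # number of atoms that decayed before time T
        n += 1

random.seed(1)
counts = [decays_in_time(3.0, 1.0, random) for _ in range(20000)]
mean = sum(counts) / len(counts)   # expected mean is 3 log 2, about 2.08
```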


This procedure required a couple of things to work as simply as it did. First, at every step
the probability distribution that we sampled was identical (just the exponential distribution).
Also, the time of the next decay only depended on the time of the immediate previous decay
(and not on all previous decays).
Such an iterative procedure that samples a distribution recursively and each step only
depends on the immediate previous step is a Markov Chain. What we constructed above
would properly be called a Markov Chain Monte Carlo (MCMC). Oh, by the way, this was
really overkill for this problem. The distribution for the number of decays in time T is just
the Poisson distribution.
Okay, we have a nice theory for how to sample a 1D probability distribution. How do
we do this for higher dimensional distributions? Let’s consider 2D for concreteness; the
method we develop there will easily generalize. Let’s consider a 2D probability distribution
p(x, y) for x, y ∈ [0, 1]. To do a similar thing as in 1D, we need to calculate the cumulative
distribution and then invert. The cumulative distribution is
Σ(x, y) = ∫₀ˣ dx′ ∫₀ʸ dy′ p(x′, y′) ,   (21)
² We’re simplifying things here a bit to describe the method. We’re making the somewhat silly assumption
that the atoms decay 1 at a time.

but it is not clear at all how to invert this. How do you uniquely invert a function of two
variables into two other variables? To proceed, we need another tactic.
Instead, if we are able to express p(x, y) as the product of two 1D probability distributions,
then we can apply our methods from 1D to 2D distributions. We might think that we could
write
p(x, y) = p(x)p(y) , (22)
but this is not true in general. Instead, we utilize the definition of a conditional probability:

p(x|y) = p(x, y)/p(y) ,   (23)

where we define

p(y) = ∫₀¹ dx p(x, y) .   (24)

We read p(x|y) as “probability of x given y”, and the conditional probability is a 1D proba-
bility distribution itself. For p(x|y), y is a fixed parameter, and not a random variable. To
verify that it is normalized we have:
∫₀¹ dx p(x|y) = ∫₀¹ dx p(x, y)/p(y) = (1/p(y)) ∫₀¹ dx p(x, y) = 1 .   (25)

Using the conditional probability, we can then express a 2D distribution as a product of two
1D distributions:
p(x, y) = p(y)p(x|y) = p(x)p(y|x) . (26)
Note that this representation is symmetric in x and y.
Then, to sample the 2D distribution p(x, y), we first choose a random y according to p(y)
using a Monte Carlo. Then, given that value of y, we choose x according to p(x|y). This
produces a pair (x, y) chosen according to p(x, y), and we can repeat to generate a large
sample of pseudodata. Let’s see how this works in an example.

Example.
Let’s consider the 2D distribution

p(x, y) = x + y , (27)

where x, y ∈ [0, 1]. Note that this is normalized:

∫₀¹ dx ∫₀¹ dy (x + y) = 1 ,   (28)

and that the probability cannot be split up into a product: p(x, y) ≠ p(x)p(y).
To sample this distribution via a Monte Carlo, we therefore must first find con-
ditional probabilities. Let’s find p(y). We have
p(y) = ∫₀¹ dx (x + y) = y + 1/2 .   (29)
Then, the probability of x given y is

p(x|y) = p(x, y)/p(y) = (x + y)/(y + 1/2) .   (30)

So, we break up the 2D distribution as

p(x, y) = x + y = (y + 1/2) · (x + y)/(y + 1/2) .   (31)

To then Monte Carlo this problem, we need to find the corresponding 1D cumu-
lative distributions:

Σ(y) = ∫₀ʸ dy′ (y′ + 1/2) = y(1 + y)/2 .   (32)

Σ(x|y) = ∫₀ˣ dx′ (x′ + y)/(y + 1/2) = x(2y + x)/(2y + 1) .   (33)

Let’s then generate two random numbers r1 , r2 uniform on [0, 1], and set these
equal to the cumulative distributions and invert. That is,

Σ(y) = r1   ⟹   y = −1/2 + √(1/4 + 2 r1) ,   (34)

Σ(x|y) = r2   ⟹   x = −y + √(y² + (2y + 1) r2) .   (35)

A Mathematica program xygen that generates pairs of random numbers (x, y)
distributed according to p(x, y) = x + y is provided below. The argument of
xygen is the number of events that you want to generate.

xygen[Nev_]:=Module[{outtab,r1,r2,x,y,i},
  outtab={};
  For[i = 1, i <= Nev, i++,
    r1 = RandomReal[{0,1}];
    r2 = RandomReal[{0,1}];
    y = -1/2 + Sqrt[1/4 + 2*r1];
    x = -y + Sqrt[y*y + (2*y+1)*r2];
    AppendTo[outtab,{x,y}];
  ];
  Return[outtab];
];
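The same generator can be sketched in Python, which also gives a quick cross-check: under p(x, y) = x + y the exact mean of x is ∫₀¹∫₀¹ dx dy x (x + y) = 1/3 + 1/4 = 7/12 ≈ 0.583 (the function name is ours):

```python
import math
import random

def xy_event(r1, r2):
    """Map two uniform numbers to a pair distributed as p(x, y) = x + y,
    via the inverted cumulative distributions, Eqs. (34) and (35)."""
    y = -0.5 + math.sqrt(0.25 + 2.0 * r1)
    x = -y + math.sqrt(y * y + (2.0 * y + 1.0) * r2)
    return x, y

random.seed(2)
pairs = [xy_event(random.random(), random.random()) for _ in range(100000)]
mean_x = sum(x for x, y in pairs) / len(pairs)
```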

3 Lecture 3
Last lecture, we developed the mathematical foundation for Monte Carlos, Markov Chains
and their utility, and how to sample multi-dimensional distributions. Everything developed
last lecture, however, required analytic calculations to invert cumulative probability distri-
butions. At the very least, efficient numerical methods were required to invert cumulative
probability distributions. In this lecture, we will discuss how to relax this assumption and
efficiently sample generic distributions.
Let’s just assume that the probability distribution p(x) exists on the domain x ∈ [0, 1] and
we want to sample it. It might not even have an analytic expression, but just be expressed
as a collection of points. For this general case, there’s no way to invert p(x) as we did before,
so we need another technique. What we will do is bound the probability distribution p(x)
that we want by an integrable, normalizable function q(x) that we can sample efficiently and
simply. If we find a q(x) such that

q(x) ≥ p(x) ,   (36)

for all x ∈ [0, 1], then we can use the probability distribution derived from q(x) to sample
p(x). Let’s see how this works.
To be concrete, let’s take q(x) constant on x ∈ [0, 1]. With the maximum value of p(x)

pmax = maxₓ p(x) < ∞ ,   (37)

we can set q(x) = pmax so that q(x) ≥ p(x) for all x ∈ [0, 1]. We can turn q(x) into a
probability distribution by normalizing; we will call this probability distribution q̃(x):

q̃(x) = 1 .   (38)

Note then that

pmax · q̃(x) ≥ p(x) .   (39)
Because we are able to easily generate uniform random numbers on x ∈ [0, 1], we are able
to generate a function that bounds p(x) everywhere. So, how do we sample p(x)? We can

generate additional uniform random numbers to veto those points that are inconsistent with
p(x). This introduces a loss of efficiency, but we will discuss later how to improve this.
The procedure for sampling p(x) given the bounding function q(x) = pmax is the following:

1. Choose a random number uniformly on x ∈ [0, 1].

2. Determine the maximum value of p(x) on x ∈ [0, 1]; call it pmax .

3. At the chosen random x, keep the event with probability

pkeep = p(x)/pmax ∈ [0, 1] .   (40)

This can be accomplished by choosing another uniform random number y ∈ [0, 1] and
vetoing (= removing) the event if

y > p(x)/pmax .   (41)
4. The events that remain will be distributed according to p(x). To normalize the distri-
bution to integrate to 1, we multiply by pmax .
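The four steps can be sketched generically in Python (our sketch; the linear density p(x) = 2x with pmax = 2 is an illustrative choice, not an example from the notes):

```python
import random

def veto_sample(p, pmax, n_events, rng=random):
    """Veto (rejection) sampling: keep a uniform x with probability p(x)/pmax."""
    kept = []
    while len(kept) < n_events:
        x = rng.random()                  # step 1: uniform x on [0, 1]
        if rng.random() <= p(x) / pmax:   # step 3: veto test, Eqs. (40)-(41)
            kept.append(x)
    return kept

random.seed(3)
xs = veto_sample(lambda x: 2.0 * x, 2.0, 50000)
mean = sum(xs) / len(xs)   # the exact mean of p(x) = 2x is 2/3
```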

Let’s see how this works in an example.

Example.

Consider again the probability distribution of the position of the particle in the
ground state of the infinite square well:

p(x) = 2 sin²(πx)   (42)

for x ∈ [0, 1]. Note that the maximum value of p(x) is pmax = 2. So, we find x
by choosing a random number on [0, 1] and then keep that x with probability

pkeep = p(x)/pmax = sin²(πx) ∈ [0, 1] .   (43)

This never required inverting cumulative probability distributions.
A Mathematica function vetowell that implements this veto method for sam-
pling probability distributions is as follows. vetowell has two arguments: the
number of events to generate Nev, and the number of bins to put in the histogram
Nbins.

vetowell[Nev_,Nbins_]:=Module[{outtab,r1,r2,binno,i},
  outtab=Table[{i/Nbins,0.0},{i,0,Nbins-1}];
  For[i = 1, i <= Nev, i++,
    r1=RandomReal[{0,1}];
    r2=RandomReal[{0,1}];
    binno=Ceiling[r1*Nbins];
    If[r2 < Sin[Pi*r1]^2, outtab[[binno,2]] += 2.0*Nbins/Nev];
  ];
  Return[outtab];
];

A plot of the output histogram from vetowell is shown in figure 8. To make
the plot, we have generated 1000000 events and used 100 bins on x ∈ [0, 1].


Figure 8: A plot comparing the output of the program vetowell to the analytic expression
for the probability distribution of the position of the particle in the ground state of the
infinite square well. To make the plot of the histogram, we set InterpolationOrder → 0,
which gives the histogram the stair step-like shape. The histogram and the plot of p(x) lie
on top of one another.

Note that sin²(πx) → 0 as x → 0 or x → 1, and so step 3 of this algorithm becomes very
inefficient (most events are vetoed) in these regions. Efficiency can be regained by bounding
the probability distribution p(x) more strongly by a non-uniform function q(x). Let’s see
this in an example.

Example.
Again, let’s consider the ground state of the infinite square well, where

p(x) = 2 sin²(πx) .   (44)

Now, consider the function

q(x) = x(1 − x) ,   (45)

on x ∈ [0, 1]. Note that with pmax = 2 and qmax = 1/4, we have

8 q(x) ≥ p(x) ,   (46)

for all x ∈ [0, 1]. The probability distribution corresponding to the function q(x)
is

q̃(x) = 6x(1 − x) .   (47)

Then, given Eq. 46, we can generate events according to the probability distri-
bution q̃(x) and then veto appropriately. The cumulative distribution of q̃(x)
is

Σq(x) = 3x² − 2x³ .   (48)

By inverting this cumulative distribution, we can sample q̃(x) via Monte Carlo.
Then, given some x from the q̃(x) distribution, we keep that event with proba-
bility equal to

pkeep = p(x)/(8 q(x)) = sin²(πx)/(4x(1 − x)) ∈ [0, 1] .   (49)

The events that remain after this veto step are distributed according to p(x). To
properly normalize the distribution, at the end we must multiply by pmax and
divide the histogram entries by the maximum value of q̃(x).
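A Python sketch of this improved veto follows (our code; since the cubic Σq(x) = 3x² − 2x³ of Eq. (48) has no convenient closed-form inverse, we invert it numerically by bisection):

```python
import math
import random

def sample_q(r, tol=1e-10):
    """Invert Sigma_q(x) = 3x^2 - 2x^3 = r by bisection (monotonic on [0, 1])."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if 3.0 * mid * mid - 2.0 * mid ** 3 < r:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def well_event(rng):
    """Sample p(x) = 2 sin^2(pi x) using the envelope 8 q(x) = 8 x (1 - x)."""
    while True:
        x = sample_q(rng.random())
        p_keep = math.sin(math.pi * x) ** 2 / (4.0 * x * (1.0 - x))  # Eq. (49)
        if rng.random() <= p_keep:
            return x

random.seed(4)
xs = [well_event(random) for _ in range(20000)]
mean = sum(xs) / len(xs)   # the distribution is symmetric about x = 1/2
```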
A very reasonable question to ask, both for these pseudodata applications as well as for
evaluating integrals, is what the accuracy of Monte Carlo is. To be concrete, we will just
answer the following question: what is the uncertainty on the probability to be near x, as
the number of Monte Carlo events N → ∞? That is, we want to study the Monte Carlo
approximation to
P = p(x) dx . (50)
In Monte Carlo, this is approximated by the ratio of the number of events in the bin near
the point x, Nx , divided by the total number of events in the distribution, N . By the law of
large numbers, this ratio converges to P as N → ∞:

lim_{N→∞} Nx/N = P .   (51)

What is the variance of this? By our Monte Carlo, all events in the bin are independent
of one another and each contributes 1 to the bin. The rate at which any event populates this
bin is fixed and equal to P . These criteria uniquely define the Poisson distribution for the
number of events in the bin. This is actually straightforward to derive, which we will do now.

Derivation of Poisson Distribution
We want to determine the probability distribution for the number of events in a
bin of a histogram that was filled via a Monte Carlo. The probability distribution
that we are sampling is p(x) and the probability P for an event to be in the bin
near x is
P = p(x) dx . (52)
The probability for an event to not be in the bin is then 1 − P . All events in
our pseudodata sample are independent. We can therefore determine the probability
that there are k events in the bin near x:

pk = C(N, k) P^k (1 − P)^{N−k} .   (53)
This probability is called the binomial distribution for the following reason. The
factor on the right,

C(N, k) = N!/(k!(N − k)!) ,   (54)

is read as “N choose k” and is the number of ways to pick k objects out of a set
of N total objects. The probability in Eq. 53 is called the binomial distribution
because the sum over all k corresponds to the binomial expansion:

∑_{k=0}^{N} pk = ∑_{k=0}^{N} C(N, k) P^k (1 − P)^{N−k} = (P + (1 − P))^N = 1 .   (55)

Now, we could stop here, but we can simplify the probability by making a few
more assumptions. We will assume that the width of the bin is very small so
that P ≪ 1 and k ≪ N . In this limit, note that

C(N, k) = N!/(k!(N − k)!) ≃ N^k/k! ,   (56)

and (1 − P)^{−k} ≃ 1, for finite k. With these simplifications, the probability for k
events in a bin becomes

pk → (N P)^k/k! · (1 − N P/N)^N = (N P)^k/k! · e^{−N P} ,   (57)
where the equality holds in the N → ∞ limit. This probability distribution is
called the Poisson distribution. Note that it is normalized:

∑_{k=0}^{∞} (N P)^k/k! · e^{−N P} = 1 ,   (58)

and has mean ⟨k⟩ = N P and variance σ² = ⟨k²⟩ − ⟨k⟩² = N P .

Therefore, with N events in a sample and probability P for any of those events to be
in a particular bin, the average number of events in the bin is N P and the variance is also
N P . Therefore, the standard deviation (the square root of the variance) of the number of
events in the bin scales like √N . The relative error is defined by the ratio of the standard
deviation to the mean. This scales like 1/√N , and so for N events, we expect an error that
scales like 1/√N . Note that this is independent of bin size and the number of dimensions
in which the probability is defined. Monte Carlo and related methods are among the most
efficient ways to sample high dimensional distributions.
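This Poisson behavior is easy to check numerically. The Python sketch below (our code, with illustrative parameters) repeatedly fills a bin of probability P with N events; the sample mean and variance of the bin count should both come out near N P (the underlying binomial variance is N P (1 − P), which is close to N P when P ≪ 1):

```python
import random

def bin_count(n_events, p_bin, rng):
    """Count how many of n_events independent draws land in a bin of probability p_bin."""
    return sum(1 for _ in range(n_events) if rng.random() < p_bin)

random.seed(5)
N, P, trials = 2000, 0.02, 500      # N P = 40 expected counts per pseudoexperiment
counts = [bin_count(N, P, random) for _ in range(trials)]
mean = sum(counts) / trials
var = sum((c - mean) ** 2 for c in counts) / trials
```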
Later in the class, we will discuss ways to improve even this 1/√N relative error by
changing the way that random numbers are generated.
Finally, I want to discuss Monte Carlo generation of non-integrable distributions. Such
distributions are not probability distributions as they cannot be normalized. For example,
the function

f(x) = 1/x ,   (59)

on x ∈ [0, 1] has undefined (infinite) integral. Such infinite distributions are unphysical; no
measurement would ever yield ∞. However, depending on assumptions and your predictive
ability, one might find infinite distributions in intermediate steps of a calculation.
For example, one might attempt to calculate in degenerate perturbation theory of quan-
tum mechanics and find at order n in perturbation theory:

fn(x) = (−1)^n log^{2n}(x) / (x n!) .   (60)

The full distribution is a sum over n:

∑_{n=0}^{∞} fn(x) = e^{−log²(x)}/x ,   (61)

which is finite and well-defined.


There are two simple ways to sample non-integrable distributions. The first is to just
artificially cut off the domain. For example, for 1/x, just stop at ε > 0:

f(x) = 1/x ,   (62)

for x ∈ [ε, 1], which is normalizable. One can then do any of the techniques that we developed.
Another way is to throw out the data/event interpretation of the Monte Carlo. For 1/x,
we can generate a uniform random variable x ∈ [0, 1]. Then, we can fill the bin near x
by an amount equal (or proportional) to 1/x. This will then reproduce the f(x) = 1/x
distribution, but because each event does not contribute just 1 to the distribution, we can’t
easily give it a probabilistic data interpretation.
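This weighted filling can be sketched in Python (our code; each bin then converges to the integral of f over that bin, so the histogram traces out 1/x, while the first bin, which contains the divergence at x = 0, fluctuates without bound):

```python
import random

def weighted_hist(f, n_events, nbins, rng):
    """Fill the bin near each uniform x with weight f(x)/n_events instead of 1/n_events."""
    h = [0.0] * nbins
    for _ in range(n_events):
        x = 1.0 - rng.random()              # uniform on (0, 1], avoids x = 0 exactly
        i = min(int(x * nbins), nbins - 1)  # bin index
        h[i] += f(x) / n_events             # weight f(x), not 1
    return h

random.seed(6)
h = weighted_hist(lambda x: 1.0 / x, 200000, 20, random)
# Bin i approaches the integral of 1/x over [i/20, (i+1)/20], i.e. log((i+1)/i).
```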
This is but a tiny introduction to Monte Carlo methods.
