Computational Methods in Astrophysics: Monte Carlo Simulations and Radiative Transfer
University Observatory
Joachim Puls
0 Introduction 1
1 Theory 1-1
1.1 Some basic definitions and facts . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-1
1.1.1 The concept of probability . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-1
1.1.2 Random variables and related functions . . . . . . . . . . . . . . . . . . . 1-1
1.1.3 Continuous random variables . . . . . . . . . . . . . . . . . . . . . . . . . 1-2
1.2 Some random remarks about random numbers . . . . . . . . . . . . . . . . . . . 1-6
1.2.1 (Multiplicative) linear congruential RNGs . . . . . . . . . . . . . . . . . . 1-6
1.2.2 The minimum-standard RNG . . . . . . . . . . . . . . . . . . . . . . . . . 1-8
1.2.3 Tests of RNGs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-10
1.3 Monte Carlo integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-18
1.3.1 Preparation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-18
1.3.2 The method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-19
1.3.3 When shall we use the Monte Carlo integration? . . . . . . . . . . . . . . 1-22
1.3.4 Complex boundaries: “hit or miss” . . . . . . . . . . . . . . . . . . . . . . 1-23
1.3.5 Advanced reading: Variance reduction – Importance sampling . . . . . . . 1-24
1.4 Monte Carlo simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-26
1.4.1 Basic philosophy and a first example . . . . . . . . . . . . . . . . . . . . . 1-26
1.4.2 Variates with arbitrary distribution . . . . . . . . . . . . . . . . . . . . . . 1-29
Chapter 0
Introduction
This part of our practical course in computational techniques deals with Monte Carlo methods, which allow one to solve mathematical/scientific/economic problems using random numbers.
The essential approach is to approximate theoretical expectation values by averages over suit-
able samples. The fluctuations of these averages induce statistical variances which have to be
accounted for in the analysis of the results. Most interestingly, Monte Carlo methods can be
applied for both stochastic and deterministic problems.
Some typical areas of application comprise
• optimization
• differential equations
• physical processes.
An inevitable prerequisite for this approach is the availability of a (reliable) random number generator (RNG). Moreover, all results have to be analyzed and interpreted by statistical methods.
Outline. The outline of this manual is as follows. In the next chapter we will give an overview
of the underlying theoretical concepts. At first we will summarize a few definitions and facts regarding probabilistic approaches (Sect. 1.1). In Sect. 1.2 we will consider the “construction” of simple RNGs and provide some possibilities to test them. An important application of Monte Carlo techniques is the Monte Carlo integration, which is presented in Sect. 1.3. The theoretical section closes with an introduction to Monte Carlo simulations themselves (Sect. 1.4).
In Chap. 2 we will outline the specific exercises to be solved during this practical work,
culminating in the solution of one (rather simple) astrophysical radiative transfer problem by
means of a Monte Carlo simulation.
Plan. The plan to carry out this part of our computational course is as follows: Prepare your
work by carefully reading the theoretical part of this manual. If you are a beginner in Monte
Carlo techniques, you might leave out the sections denoted by “advanced reading” (which cover certain topics that might be helpful for future applications).
On the first day of the lab-work, exercises 1 and 2 should be finished, whereas the second day covers exercise 3. Please have a look at these exercises beforehand, in order to allow for an appropriate preparation.
Don’t forget to include all programs, plots etc. in your final report.
Chapter 1
Theory
(d) If there are N mutually exclusive events, E_i, i = 1, . . . , N, and these events are exhaustive with respect to all possible outcomes of the experiment, then

    \sum_{i=1}^{N} p_i = 1.    (1.1)
For a two-stage experiment with events F_i and G_j, E_ij is called the combined event (G_j has occurred after F_i). If F_i and G_j are independent of each other, the probability for the combined event is given by

    p(E_ij) = p_ij = p(G_j) \cdot p(F_i),    (1.2)
In contrast to the expectation value, the variance is not a linear operator (prove this yourself!). If g and h are statistically independent, i.e.,

    \overline{gh} = E(gh) = E(g) \cdot E(h) = \bar{g} \cdot \bar{h},

one can show that

    Var(ag + bh) = a^2 Var(g) + b^2 Var(h).    (1.7)

Thus, the standard deviation has the following scaling property,

    \sigma(ag) = |a| \sigma(g).
In analogy to the discrete probabilities, p_i, the pdf also has to satisfy certain constraints: the combined probability that any of the possible values of x will be realized is unity!
The cumulative probability distribution (cdf), F(x), denotes the probability that all events up to a certain threshold, x, will be realized,

    P(x' \le x) \equiv F(x) = \int_{-\infty}^{x} f(x')\,dx'.    (1.11)
Consequently,
• F (−∞) = 0 and
• F (∞) = 1.
1.1 Example. A uniform distribution between [a, b] is defined via a constant pdf, f(x) = C:

    \int_a^b f(x)\,dx = C \cdot (b - a) \overset{!}{=} 1,

and we find

    f(x) = \frac{1}{b - a}  for all x \in [a, b].    (1.12)

Thus, the pdf for uniformly distributed random numbers, drawn by a random number generator (RNG) with a = 0, b = 1 (cf. Sect. 1.2), is given by f(x) = 1.
The corresponding cumulative probability distribution results in

    F(x) = \int_a^x f(x')\,dx' = \int_a^x \frac{dx'}{b - a} = \frac{x - a}{b - a}  \forall x \in [a, b],    (1.13)

with F(x) = 0 for x < a and F(x) = 1 for x > b.
Regarding RNGs, this means that F(x) = x for x \in [0, 1]. Figs. 1.1 and 1.2 display the situation graphically.
Figure 1.1: Probability density function (pdf), f (x), of a uniform distribution within the interval [a,b].
Figure 1.2: Corresponding cumulative probability distribution, F (x).
In analogy to the discrete case, expectation value and variance of a continuous random variable are defined by

    E(x) \equiv \bar{x} \equiv \mu = \int_{-\infty}^{\infty} x' f(x')\,dx'    (1.14a)

    E(g) \equiv \bar{g} = \int_{-\infty}^{\infty} g(x') f(x')\,dx'    (1.14b)

    Var(x) \equiv \sigma^2 = \int_{-\infty}^{\infty} (x' - \mu)^2 f(x')\,dx'    (1.15a)

    Var(g) \equiv \sigma^2(g) = \int_{-\infty}^{\infty} (g(x') - \bar{g})^2 f(x')\,dx'    (1.15b)
Figure 1.3: The Gauss distribution (h = f(\mu)\,e^{-1/2} \approx 0.61 \cdot f(\mu)).
1.2 Example (The Gauss or normal distribution). The pdf of a Gauss distribution is defined as follows, see Fig. 1.3:

    f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right).

Expectation value and variance can be easily calculated,

    E(x) = \mu,    Var(x) = \sigma^2,

and the so-called “1\sigma-deviation” is given by

    P(\mu - \sigma \le x \le \mu + \sigma) \approx 0.6826.
• real random numbers, e.g., obtained from radioactive decays (laborious, and only few numbers available)
• sub-random numbers, which have an “optimal” uniform distribution, but are not completely independent (for an introduction, see Numerical Recipes). Sub-random numbers are particularly used for Monte Carlo integration.
In the remainder of this section, we will concentrate on pseudo random numbers, which must/should satisfy the following requirements:
• uniformly distributed
• statistically independent
• non-periodic, or at least with a very long period (most algorithms produce random numbers with a certain period)
• portable: it should be possible to obtain identical results using different computers and different programming languages.
The first three of these requirements are a “must”, whereas the last one is a “should”, which will be met by most of the RNGs discussed in this section, particularly by the “minimum-standard” RNG (see below).
Basic algorithm.
• For any given initial value, SEED, a new SEED-value (“1st random number”) is calculated.
From this value, a 2nd random number is created, and so on. Since SEED is always
(integer mod M ), we have
at most M different values of SEED. Thus, after at most M iterations, SEED(actual) = SEED(initial) again, and the whole sequence repeats: the period can never exceed M.
At least for linear congruential RNGs with C > 0 it can be proven that the maximum possible period, M, is actually obtained if certain (simple) conditions relating C, A and M are satisfied.
SEED(start) = 9
SEED(1) = (5 \cdot 9 + 1) mod 32 = 46 mod 32 = 14, etc. The full sequence, to be read column by column, is:

     9    1   25   17    9  (\leftarrow new cycle)
    14    6   30   22   14
     7   31   23   15    7
     4   28   20   12
    21   13    5   29
    10    2   26   18
    19   11    3   27
     0   24   16    8

Table 1.1: Random number sequence SEED = (5 \cdot SEED + 1) mod 32 with initial value 9. All 32 possible values appear before the sequence repeats, i.e., the maximum period M = 32 is reached.
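The cycle in Table 1.1 is easy to reproduce in a few lines. Below is a minimal sketch of such a linear congruential generator (the names `lcg` and `sequence` are ours), together with a check that the maximum period M = 32 is indeed reached:

```python
def lcg(seed, a=5, c=1, m=32):
    """Linear congruential generator: SEED -> (a*SEED + c) mod m."""
    while True:
        seed = (a * seed + c) % m
        yield seed

# Reproduce Table 1.1: start with SEED = 9 and draw until the cycle closes.
gen = lcg(9)
sequence = [next(gen)]
while sequence[-1] != 9:          # the cycle closes when the seed 9 reappears
    sequence.append(next(gen))

print(sequence[:4])   # first numbers of Table 1.1: [14, 7, 4, 21]
print(len(sequence))  # the period: 32, i.e., the maximum M
```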
with A, SEED and M consisting of a certain length (e.g., 4 byte), can be calculated without
overflow.
This “trick”, however, implies that our algorithm has to get along with C ≡ 0. As a first
consequence, SEED = 0 is prohibited, and x ∈ (0, 1).
Depending on M, again only certain values of A are “allowed” to maintain the required properties of the RNG. For M = 2^n, e.g., long cycles are possible (though being at least a factor of 4 shorter than above), but most of these cycles exhibit certain problems, particularly concerning the independence of the created numbers. An impressive example of such problems is given in Example 1.11.
Thus, for C = 0 we strongly advise to avoid M = 2^n and to use alternative cycles with M being a prime number. In this case, a maximum period of M - 1 (no “zero”) can be created if A is chosen in a specific, rather complex way. The corresponding sequence has a long period, a uniform distribution, and almost no correlation problems, as discussed below.
1.5 Theorem. Let q = [M/A] (integer division) and r = M mod A. If r < q (this is the additional condition) and 0 < SEED < M - 1, then

(i)    A \cdot (SEED mod q) < M - 1   and   r \cdot [SEED/q] < M - 1,    (1.19)

(ii)   |DIFF| < M - 1   for   DIFF = A \cdot (SEED mod q) - r \cdot [SEED/q].
1.6 Example (simple). Calculation of (3 \cdot 5) mod 10 without overflow, corresponding to an intermediate value > (M - 1) = 9 (A = 3 and SEED = 5):

    q = [10/3] = 3,    r = 10 mod 3 = 1,

thus r < q and SEED < M - 1 = 9. The requirements are satisfied, and with (1.20, 1.21) we find

    (A \cdot SEED) mod M = A \cdot (SEED mod q) - r \cdot [SEED/q] = 3 \cdot 2 - 1 \cdot 1 = 5,

where both terms, A \cdot (SEED mod q) = 6 and r \cdot [SEED/q] = 1, stay below M - 1.
1.7 Example (typical). M = 231 − 1 = 2147483647, SEED = 1147483647, A = 69621. Direct calculation yields
K = ISEED / q                          ! integer division, K = [ISEED/q]
ISEED = A * (ISEED - K * q) - r * K    ! = A*(ISEED mod q) - r*K, no overflow
IF (ISEED .LT. 0) ISEED = ISEED + M    ! restore a positive remainder
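In a language with 64-bit (or arbitrary-precision) integers the overflow problem disappears, which allows us to verify Schrage's decomposition directly. A sketch in Python, using the constants of Example 1.7, M = 2^31 - 1 and A = 69621 (the function and variable names are ours):

```python
M = 2**31 - 1          # Mersenne prime 2147483647
A = 69621              # multiplier from Example 1.7
q, r = M // A, M % A   # Schrage decomposition M = A*q + r; here r < q holds

def park_miller(seed):
    """One step SEED -> (A*SEED) mod M without intermediate values above M."""
    k = seed // q
    seed = A * (seed - k * q) - r * k   # = A*(SEED mod q) - r*[SEED/q]
    if seed < 0:                        # restore the positive remainder
        seed += M
    return seed

# Python's big integers allow a direct cross-check of the trick:
seed = 1147483647
assert park_miller(seed) == (A * seed) % M

x = park_miller(seed) / M   # pseudo-random number in (0, 1)
```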
We will test whether N_1 and N_2 are distributed according to the expectation, n_1 = \frac{2}{3} N, n_2 = \frac{1}{3} N, i.e., whether the die has not been manipulated.
For this quantity, the following theorem can be proven (the proof is rather complex).
1.9 Theorem. Pearson’s \chi_p^2 (1.22) is a statistical function which asymptotically (i.e., for N_i \gg 1 \forall i) corresponds to the sum of squares of f independent random variables, which are normally distributed, with expectation value 0 and variance 1. The degrees of freedom, f, are defined by:
• if the n_i are fixed, we have f = k.
• if the n_i are normalized via \sum n_i = \sum N_i = N, we have f = k - 1 (one constraint).
• if m additional constraints are present, we have f = k - m - 1.
Because of this theorem, Pearson’s \chi_p^2 asymptotically corresponds to the “usual” \chi^2-distribution, well-known from optimization/regression methods. Remember that the expectation value of \chi^2 is f and that the standard deviation of \chi^2 is \sqrt{2f}. In practice, “asymptotically” means that in each “channel” i at least n_i = 5 events are to be expected.
1.10 Example (Continuation: test of dice). With N_i, n_i, i = 1, 2 as given above, Pearson’s \chi_p^2 results in

    \chi_p^2 = \sum \frac{(N_i - n_i)^2}{n_i} = \frac{(N_1 - \frac{2}{3}N)^2}{\frac{2}{3}N} + \frac{(N_2 - \frac{1}{3}N)^2}{\frac{1}{3}N}.

Because of the requirement that one has to expect at least five events in each channel, we have to throw the die at least N = 15 times. Thus, we have

    \chi_p^2 = \frac{(N_1 - 10)^2}{10} + \frac{(N_2 - 5)^2}{5},

and the degrees of freedom are f = 2 - 1 = 1.
Let us now test four different dice. After 15 throws each, we obtained the results given in Tab. 1.2. The quantity Q gives the probability that, by chance, a larger value of \chi^2 (larger than the one actually found) could have been present. If Q has a reasonable value (typically within 0.05 . . . 0.95), everything should be OK, whereas a low value indicates that the actual \chi^2 is too large for the given degrees of freedom (die faked!). Concerning the calculation of Q, see Numerical Recipes, Chap. 6.2. Note that for an unmanipulated die, \langle \chi^2 \rangle \approx 1 \pm \sqrt{2} is to be expected.
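The die test of Example 1.10 is easily simulated. A sketch in Python, with the standard `random` module standing in for throwing a fair die (the split of the six faces into the two channels, faces 1-4 vs. 5-6, is our assumption to realize p_1 = 2/3 and p_2 = 1/3):

```python
import random

random.seed(1)
N = 150                        # number of throws (>= 15, so >= 5 events per channel)
n1, n2 = 2 * N / 3, N / 3      # expected counts: (2/3) N and (1/3) N

throws = [random.randint(1, 6) for _ in range(N)]
N1 = sum(1 for t in throws if t <= 4)   # channel 1: faces 1-4 (p1 = 2/3)
N2 = N - N1                             # channel 2: faces 5-6 (p2 = 1/3)

chi2 = (N1 - n1) ** 2 / n1 + (N2 - n2) ** 2 / n2
print(chi2)   # should be of the order of f = 1 for a fair die
```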
Comment. To obtain a larger significance regarding our findings, the experiment should be performed a couple of times (say, at least 10 times). The specific results then have to be checked against the expectation that the individual \chi^2 must follow the \chi^2-distribution for f degrees of freedom, if the dice were unmanipulated (see below).
The channels must be selected in such a way that all possible events can be accounted for, i.e., such that

    \sum_i p_i = 1.
Application to random numbers. Remember the original question whether the provided
random numbers are uniformly distributed within (0, 1). To answer this question, we “bin” the
interval (0, 1) into k equally wide sub-intervals. We calculate N random numbers and distribute
them according to their values into the corresponding channels. After the required number of
random numbers has been drawn (see below), we count the occurrence N_i in each channel. If the random numbers were uniformly distributed, we should have p_i = 1/k for all i.
Thus, we calculate

    \chi_p^2 = \sum_{i=1}^{k} \frac{(N_i - N/k)^2}{N/k},

such that in each channel the expected value of n_i = N/k is at least 5, i.e., a minimum of N = 5k random numbers have to be drawn. Then,
• the test has to be repeated l times, with typically l \approx 10. Note that the initial SEED value has to be different for each of the l series, of course.
• At last, we test whether the individual \chi_{p,j}^2 (j = 1, . . . , l) within the series are distributed according to \chi^2 with k - 1 degrees of freedom. This can be done by means of the so-called Kolmogoroff-Smirnov test.
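A sketch of the 1-D uniformity test in Python (again with the standard `random` module standing in for the RNG under test; the choices of k and N are ours):

```python
import random

random.seed(42)
k = 20            # number of channels
N = 5 * k * 10    # draw ten times the minimum of 5k numbers

counts = [0] * k
for _ in range(N):
    x = random.random()       # variate in [0, 1)
    counts[int(x * k)] += 1   # channel index of x

expected = N / k              # n_i = N/k events expected per channel
chi2 = sum((Ni - expected) ** 2 / expected for Ni in counts)
print(chi2)   # should be of the order of f = k - 1 = 19
```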
Figure 1.4: The unit square (0, 1)^2, binned into “quadratic” channels; the pairs (x_1, x_2) and (x_3, x_4) are sorted into their corresponding channels.
To test this independence, more-dimensional tests have to be applied, which check for the cor-
relation of random numbers.
a) Two-dimensional tests.
The condition P(x_{i+1} | x_i) = P(x_{i+1}) checks whether two consecutive numbers are statistically independent. For this purpose, we bin the unit square (0, 1)^2 in two dimensions, create pairs of random numbers, and sort them into the corresponding “quadratic” channels (cf. Fig. 1.4 for the random numbers x_1 = 0.1, x_2 = 0.3, x_3 = 0.5, x_4 = 0.9). For k channels in one dimension one obtains k^2 channels in two dimensions, and a fraction p_i = 1/k^2 of all pairs is to be expected in each “quadratic” channel (independent events, cf. (1.2)). The corresponding Pearson’s \chi_p^2 is calculated via

    \chi_p^2 = \sum_{i=1}^{k^2} \frac{(N_i - N/k^2)^2}{N/k^2},

if N_i is the number of drawn pairs in channel i. As a minimum, N = 5k^2 pairs have to be drawn now. Again, more than one series j has to be checked, and both the individual values of \chi_{p,j}^2 as well as their distribution (via Kolmogoroff-Smirnov) have to be inspected.
b) Three-dimensional tests.
The condition P(x_{i+2} | x_{i+1}, x_i) = P(x_{i+2}) checks for the independence of a third random number from the previously drawn first and second ones. In analogy to a), we consider the unit cube (0, 1)^3. Triples of three consecutive random numbers are sorted into this cube, and the corresponding probability per channel is p_i = 1/k^3. The further procedure is as above.
c) Typically, up to 8 dimensions should be checked for a rigorous test of a given RNG. (What problem might arise?) Check the web for published results.
1.11 Example (RANDU: a famous failure). One of the first RNGs implemented on a computer, RANDU (by IBM!), used the algorithm (1.17) with M = 2^{31}, A = 65539 and C = 0, i.e., an algorithm which has been discussed above as being potentially “dangerous” (remember that the minimum-standard generator uses a prime number for M). Though the above tests for 1- and 2-D are passed without any problem, the situation is different from the 3-D case on (it seems that IBM did not consider it necessary to perform this test).
In the following, we carry out such a 3-D test with 30^3 = 27 000 channels and 270 000 random numbers. Thus, N/30^3 = 10 events per channel are to be expected, and a value of \chi^2 of the order of f = 27 000 - 1 should result for an appropriate RNG. We have calculated 10 test series each for the Park/Miller minimum-standard generator and for RANDU. Below we compare the outcome of our test.
Park/Miller Minimum-Standard
----------------------------
The probability that the values for χ2p do follow the χ2 -distribution is given (via Kolmogoroff-Smirnov) by
Thus, as is also evident from the individual values of \chi_p^2 alone, which are actually of the order of \approx 27 000, we conclude that the minimum-standard generator passes the 3-D test. In contrast, RANDU fails this test. The corresponding results are given by
RANDU
-----
(note that each individual value of \chi_p^2 corresponds to \approx 17 \cdot f) and the KS-test yields
Figure 1.5: 2-D representation of random numbers, created via SEED = (65 ∗ SEED + 1) mod 2048. The numbers
are located on 31 hyper-surfaces.
i.e., the probability is VERY low that the individual \chi_p^2 follow a \chi^2-distribution with f \approx 27 000. Though RANDU delivers numbers which are uniformly distributed (not shown here), each third number is strongly correlated with the two previous ones. Consequently, these numbers are not statistically independent and therefore no pseudo-random numbers!
One additional/alternative test is given by the graphical representation of the drawn random numbers. From such tests, a deficiency becomes obvious which affects all random numbers created by linear congruential algorithms: If one plots k consecutive random numbers in a k-dimensional space, they do not fill this space uniformly, but are located on (k - 1)-dimensional hyper-surfaces. It can be shown that the maximum number of these hyper-surfaces is given by M^{1/k}, where only a careful choice of A and C ensures that this maximum is actually realized.
In Fig. 1.5 we display this deficiency by means of a (short-period) generator with the algorithm SEED = (65 \cdot SEED + 1) mod 2048. With respect to a 2-D representation, the maximum number of hyper-surfaces (here: straight lines) is given by 2048^{1/2} \approx 45. With our choice of A = 65 and C = 1, at least 31 of these hyper-surfaces are realized.
If M = 2^{31}, a 3-D representation should yield a maximum number of (2^{31})^{1/3} \approx 1290 surfaces. For RANDU (see Example 1.11), this number is reduced by a factor of 4^{1/3}, because only even numbers, and additionally only half of them, can be created, thus reducing the maximum number of surfaces to 812. In contrast, however, this generator delivers only 15(!) surfaces (see Fig. 1.6), i.e., the available space is by no means uniformly filled, which again shows the failure of RANDU.
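The plane structure of RANDU has a simple algebraic origin: since A = 65539 = 2^16 + 3 and M = 2^31, one finds A^2 \equiv 6A - 9 (mod 2^31), i.e., x_{n+2} \equiv 6 x_{n+1} - 9 x_n, so that each number is an exact small-integer combination of the two preceding ones, which is precisely the structure seen in Fig. 1.6. A short check (the variable names are ours):

```python
M, A = 2**31, 65539   # RANDU: SEED -> (65539 * SEED) mod 2**31

def randu(seed, n):
    seq = []
    for _ in range(n):
        seed = (A * seed) % M
        seq.append(seed)
    return seq

x = randu(1, 1000)
# every number is a fixed linear combination of the two preceding ones:
ok = all(x[i + 2] == (6 * x[i + 1] - 9 * x[i]) % M for i in range(len(x) - 2))
print(ok)   # True: x[n+2] = 6*x[n+1] - 9*x[n] (mod 2**31) for all n
```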
Figure 1.6: 3-D representation of “random” numbers created by RANDU. Though 812 hyper-surfaces are possible,
only 15 of them are realized.
Reshuffling
To destroy the correlations described above (which will always be present for linear congruential RNGs), a fairly cheap and easy method is available, called reshuffling. In this method, the drawn random number itself is used to destroy the inherent sequence of consecutive numbers and thus the correlation. Fig. 1.7 displays the algorithm and Fig. 1.8 the result. The algorithm can be formulated as follows. Let y be the random number just drawn. y selects an entry of a register V(1 . . . N); this entry, V(I), becomes the new random number, and the slot I is refilled by the next draw from the underlying RNG (cf. Fig. 1.7).
The RNG recommended by Numerical Recipes, “ran1”, uses N = 32 registers V(1 . . . N), which must be filled during a first “warm-up” phase (actually, the very first 8 numbers are not used at all), before the first number can be drawn.
Figure 1.7: Schematic of the reshuffling algorithm: (1) the register V(1) . . . V(N) is filled by the RNG; (2) the previous output selects an entry V(I); (3) V(I) is delivered as output (RAN OUTPUT) and refilled.
Figure 1.8: As Fig. 1.5, but with reshuffling. Any previous correlation has been destroyed.
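A minimal sketch of reshuffling in Python, with the standard `random` module standing in for the underlying linear congruential RNG (the register size NTAB and all names are ours; “ran1” of Numerical Recipes works along these lines, with further refinements):

```python
import random

random.seed(0)
NTAB = 32
V = [random.random() for _ in range(NTAB)]   # (1) warm-up: fill the register
y = random.random()                          # previous output, used for indexing

def reshuffled():
    """Draw one number: the previous output selects the register entry."""
    global y
    i = int(y * NTAB)         # (2) index selected by the previous output
    y = V[i]                  # (3) the stored value becomes the new output ...
    V[i] = random.random()    #     ... and the slot is refilled by a fresh draw
    return y

sample = [reshuffled() for _ in range(10)]
```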
1.3.1 Preparation
Before we can start with the “real” stuff, we need some preparatory work in order to understand
the basic ideas. Impatient readers might immediately switch to Sect. 1.3.2.
• Let us begin by drawing N variates (= random variables) x1 , . . . , xN from a given pdf f (x),
e.g., from the uniform distribution of pseudo random numbers as created by a RNG. For
arbitrary pdfs, we will present the corresponding procedure in Sect. 1.4.
• From these variates, we calculate a new random variable (remember that functions of
random variables are random variables themselves) by means of
    G = \frac{1}{N} \sum_{n=1}^{N} g(x_n),    (1.24)
where g is an arbitrary function of x. Since the expectation value is a linear operator, the
expectation value of G is given by
    E(G) = \bar{G} = \frac{1}{N} \sum_{n=1}^{N} E(g(x_n)) = \bar{g},    (1.25)

i.e., it is the expectation value of g itself. The variance of G, on the other hand, is given by
    Var(G) = Var\left( \frac{1}{N} \sum_{n=1}^{N} g(x_n) \right) = \frac{1}{N^2} \sum_{n=1}^{N} Var(g(x_n)) = \frac{1}{N} Var(g),    (1.26)
where we have assumed that the variates are statistically independent (cf. 1.7). Note that
the variance of G is a factor of N lower than the variance of g.
• Besides being a random variable, G (1.24) is also the arithmetic mean with respect to the drawn random variables, g(x_n). In this sense, (1.25) states that the expectation value of the arithmetic mean of N random variables g(x) is nothing else than the expectation value of g itself, independent of N.
• This will be the basis of the Monte Carlo integration, which can be defined as follows: approximate the expectation value of a function by the arithmetic mean over the corresponding sample!

    \bar{g} = \bar{G} \approx G
• Eq. 1.26 can be interpreted similarly: The variance of the arithmetic mean of N (independent!) random variables g(x) is lower than the variance of g(x) itself, by a factor of 1/N. Note that statistically independent variates are essential for this statement, which justifies our effort concerning the corresponding tests of RNGs (Sect. 1.2.3).
Implication: The larger the sample, the smaller the variance of the arithmetic mean, and the better the approximation \bar{g} \approx G. Note in particular that

    G \approx \bar{G} \pm \sqrt{Var(G)},   i.e.,   G \to \bar{G} = \bar{g}   for N \to \infty.
Summary. Before applying this knowledge within the Monte Carlo integration, we like to
summarize the derived relations.
    pdf:              f(x)        E(g):    true expectation value
    random variable:  g(x)        Var(g):  true variance

    E(g) = \bar{g} = \int g(x) f(x)\,dx

    Var(g) = \int (g(x) - \bar{g})^2 f(x)\,dx

Then we have

    \bar{G} = E(G) = E(g) = \bar{g}

    Var(G) = \frac{1}{N} Var(g)

    \sigma(G) = (Var(G))^{1/2} = \frac{1}{\sqrt{N}} \sigma(g)
The last identity is known as the “1/\sqrt{N}-law” of the Monte Carlo simulation. The actual “trick” now is the following reformulation of an arbitrary integral:

    I = \int_a^b g(x)\,dx = \int_a^b \frac{g(x)}{f(x)} f(x)\,dx = \int_a^b h(x) f(x)\,dx.    (1.27)
The actual trick is now to demand that f (x) shall be a probability density function (pdf). In
this way then, any integral can be re-interpreted as the expectation value of h(x) with respect
to f (x), and we just have learnt how to approximate expectation values!
Until further notification, we will use a constant pdf, i.e., consider a uniform distribution over
the interval [a, b],
    f(x) = \frac{1}{b - a}   for a \le x \le b,    f(x) = 0   else,

cf. (1.12). The “new” function h(x) is thus given by

    h(x) = \frac{g(x)}{f(x)} = (b - a)\,g(x),

and the integral becomes proportional to the expectation value of g:
    I = (b - a) \int_a^b g(x) f(x)\,dx = (b - a)\,\bar{g}   (exact!).    (1.28)
We like to stress that this relation is still exact. In a second step then, we draw N variates
xn , n = 1, . . . , N according to the pdf (which is almost nothing else than drawing N random
numbers, see below), calculate g(xn ) and evaluate the corresponding arithmetical mean,
    G = \frac{1}{N} \sum_{n=1}^{N} g(x_n).
Finally, the integral I can be easily estimated by means of the Monte Carlo approximation,

    I = (b - a)\,\bar{g} = (b - a)\,\bar{G} \approx (b - a)\,G,    (1.29)

where the first two identities are exact, and only the last step is the Monte Carlo approximation.
Thus, the established Monte Carlo integration is the (arithmetical) mean with respect to N
“observations” of the integrand, when the variates xn are uniformly drawn from the interval
[a, b], multiplied by the width of the interval.
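The complete recipe (1.29), including the error estimate discussed next, fits into a few lines of Python. A sketch for the integral of Example 1.12 below, \int_0^2 e^{-x}\,dx = 0.86467 (the sample size N is our choice):

```python
import math
import random

random.seed(1)
a, b, N = 0.0, 2.0, 10_000

def g(x):                 # integrand g(x) = exp(-x)
    return math.exp(-x)

xs = [a + (b - a) * random.random() for _ in range(N)]   # uniform variates in [a, b]
G = sum(g(x) for x in xs) / N          # arithmetic mean <g>
J = sum(g(x) ** 2 for x in xs) / N     # <g^2>, needed for the error estimate

I = (b - a) * G                             # Monte Carlo estimate of the integral
dI = (b - a) * math.sqrt((J - G * G) / N)   # 1/sqrt(N) error estimate
print(I, "+/-", dI)   # close to the exact value 1 - exp(-2) = 0.86467
```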
The error introduced by the Monte Carlo approximation can be easily estimated as well.
    Var(G) = \frac{1}{N} Var(g) = \frac{1}{N} \left( \overline{g^2} - \bar{g}^2 \right) \approx \frac{1}{N} \left( J - G^2 \right),    (1.30)
if

    J = \frac{1}{N} \sum_{n=1}^{N} g^2(x_n)

is the Monte Carlo estimator for E(g^2) = \overline{g^2}, and if we approximate \bar{g}^2 = \bar{G}^2 by G^2.
If we now identify the error of the estimated integral, \delta I, with the standard deviation resulting from Var(G), multiplied by (b - a) (which can be proven from the “central limit theorem” of statistics), it is obvious that this error follows the “1/\sqrt{N}-law”, i.e.,

    \delta I \propto 1/\sqrt{N}.
In summary, the Monte Carlo estimate of the integral reads

    I \approx V \left( \langle g \rangle \pm \frac{1}{\sqrt{N}} \left( \langle g^2 \rangle - \langle g \rangle^2 \right)^{1/2} \right),    (1.31)

with “volume” V = (b - a), if

    \langle g \rangle = G = \frac{1}{N} \sum_n g(x_n),

    \langle g^2 \rangle = J = \frac{1}{N} \sum_n g^2(x_n).
1.12 Example (Monte Carlo integration of the exponential function). As a simple example, we will estimate the integral

    \int_0^2 e^{-x}\,dx = 1 - e^{-2} = 0.86467

via Monte Carlo integration.
(1) At first, we draw N random numbers uniformly distributed \in [0, 2] and “integrate” for the two cases, a) N = 5 and b) N = 25. Since the RNG delivers numbers distributed \in (0, 1), we have to multiply them by a factor of 2, cf. Sect. 1.4, Eq. 1.41. Assume that the following variates have been provided.
Case a) (N = 5): 7.8263693E-05, 1.3153778, 1.5560532, 0.5865013, 1.3276724

Case b) (N = 25): 0.1895919, 0.4704462, 0.7886472, 0.7929640, 1.3469290, 1.8350208, 1.1941637, 0.3096535, 0.3457211, 0.5346164, 1.2970020, 0.7114938, 7.6981865E-02, 1.8341565, 0.6684224, 0.1748597, 0.8677271, 1.8897665, 1.3043649, 0.4616689, 1.2692878, 0.9196489, 0.5391896, 0.1599936, 1.0119059
(2) We calculate

    G = \langle g \rangle = \frac{1}{N} \sum e^{-x_n},

    J = \langle g^2 \rangle = \frac{1}{N} \sum \left( e^{-x_n} \right)^2,
(3) With

    I \approx V \left( \langle g \rangle \pm \sqrt{ \frac{1}{N} \left( \langle g^2 \rangle - \langle g \rangle^2 \right) } \right)

and “volume” V = 2, the integral is approximated by
Note that the “1/\sqrt{N}-law” predicts an error reduction by 1/\sqrt{5} = 0.447, which is not completely met by our simulation, since case a) was estimated from a very small sample. Moreover, the result given for case b) represents “reality” only within a 2-sigma error.
The error for N = 25 is of the order of \pm 0.1. Thus, to reduce the error to \pm 0.001 (a factor of 100), the sample size has to be increased by a factor of 100^2, i.e., we have to draw 250 000 random numbers. We see that the (estimated) error then decreases as predicted, and a comparison of the exact value (I(exact) = 0.86467) with the Monte Carlo result shows that it is also of the correct order.
As a preliminary conclusion, we might state that the MC integration can be quickly programmed, but that a large number of evaluations of the integrand is necessary to obtain a reasonable accuracy.
• Especially for multi-dimensional integrals, complex boundaries are the rule (difficult to program), and the MC integration should be preferred, as long as the integrand is not locally concentrated.
• If the required precision is low, and particularly for first estimates of unknown integrals, the MC integration should be preferred as well, because of its simplicity.
• If the specific integral must be calculated only once, the MC solution might again be advantageous, since the longer computational time is compensated by the greatly reduced programming effort.
• A disadvantage of this method is of course that all (N - n) points which are no “hit” (i.e., which are located above f(x)) are somewhat wasted (note that f(x) has to be evaluated for each point). For this reason, the circumscribing area A has to be chosen as small as possible.
• The generalization of this method is fairly obvious; for typical examples see “Numerical Recipes”. E.g., to calculate the area of a circle, one proceeds as above, however now counting those points as hits on a square A which satisfy the relation x_i^2 + y_i^2 \le r^2, for given radius r. Again, the area of the circle results from the fraction A \cdot n/N.
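The circle example can be sketched as follows, here for a quarter circle of radius r = 1 inscribed in the unit square, A = 1 (the restriction to one quadrant is our simplification):

```python
import random

random.seed(2)
N, r = 100_000, 1.0
A = r * r                  # area of the circumscribing square [0, r] x [0, r]

hits = 0
for _ in range(N):
    x, y = r * random.random(), r * random.random()
    if x * x + y * y <= r * r:   # the point is a "hit" inside the quarter circle
        hits += 1

quarter = A * hits / N           # ~ pi r^2 / 4
print(4.0 * quarter)             # estimate of pi for this sample
```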
For arbitrary pdf's p, the generalization of (1.31) is straightforward, and the Monte Carlo estimator of the integral is given by

    I = \left\langle \frac{g}{p} \right\rangle \pm \frac{1}{\sqrt{N}} \left( \left\langle \left( \frac{g}{p} \right)^2 \right\rangle - \left\langle \frac{g}{p} \right\rangle^2 \right)^{1/2},    (1.35)

where \langle g/p \rangle is the arithmetic mean of g(x_n)/p(x_n), and the variates x_n have been drawn according to the pdf p! Note in particular that any volume factor has vanished. From this equation, the (original) variance can be diminished significantly if the pdf p is chosen in such a way that it is very close to g, i.e., if

    g/p \approx constant over the integration region.
If p = g, the variance would become zero, and one might think that this solves the problem immediately. As we will see in the next section, however, the calculation of variates according to a pdf p requires the calculation of \int p\,dV, which, for p = g, is just the integral we actually like to solve, and we would have gained nothing.
Instead of using p = g, the best compromise is to use a pdf p which is fairly similar to g, but also allows for a quick and easy calculation of the corresponding variates.
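A sketch of such a compromise for the integral of Example 1.12: we choose p(x) = C e^{-x/2} on [0, 2], which is “fairly similar” to g(x) = e^{-x} but has an easily invertible cdf (the specific choice of p, and all names, are ours):

```python
import math
import random

random.seed(3)
N = 10_000

def g(x):                        # integrand g(x) = exp(-x)
    return math.exp(-x)

# pdf p(x) = C * exp(-x/2) on [0, 2], normalized via C = 0.5 / (1 - e^{-1})
C = 0.5 / (1.0 - math.exp(-1.0))
def p(x):
    return C * math.exp(-0.5 * x)

def draw(u):
    """Invert the cdf F(x) = (1 - exp(-x/2)) / (1 - exp(-1)) to map u -> x."""
    return -2.0 * math.log(1.0 - u * (1.0 - math.exp(-1.0)))

h = [g(x) / p(x) for x in (draw(random.random()) for _ in range(N))]
mean = sum(h) / N
mean2 = sum(v * v for v in h) / N

I = mean                                      # estimate; note: no volume factor
dI = math.sqrt((mean2 - mean * mean) / N)
print(I, "+/-", dI)   # close to the exact value 0.86467
```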
Finally, we will show that our MC cooking recipe (1.31) is just a special case of the above generalization, Eq. 1.35, under the restriction that p is constant. Because of the normalization (1.34), we immediately find in this case that

    p = \frac{1}{V},

and

    I = \int g\,dV \approx \left\langle \frac{g}{1/V} \right\rangle \pm \frac{1}{\sqrt{N}} \left( \left\langle \left( \frac{g}{1/V} \right)^2 \right\rangle - \left\langle \frac{g}{1/V} \right\rangle^2 \right)^{1/2}

      = V \left( \langle g \rangle \pm \frac{1}{\sqrt{N}} \left( \langle g^2 \rangle - \langle g \rangle^2 \right)^{1/2} \right),   q.e.d.
• These individual sub-processes are described by using pdf's, and combined in a proper way so as to describe the complete problem. The final result is then obtained from multiple runs through the various possible process-chains, in dependence of the provided pdf's (which are sampled by means of random numbers).
• After having followed a sufficient number of runs, we assume that the multitude of outcomes
has provided a fair mapping of “reality”, and the result (plus noise) is obtained from
collecting the output of the individual runs.
advantages: Usually, such a method can be programmed in a fast and easy way, and results for complicated problems can be obtained on relatively short time-scales.
disadvantages: On the other hand, a deeper understanding of the results becomes rather difficult, and no analytical approximation can be developed throughout the process of solution. In particular, one has to perform a completely new simulation if only one of the parameters changes. Moreover, the reliability of the results depends crucially on a “perfect” knowledge of the sub-processes and a proper implementation of all possible paths which can be realized during the simulation.
1.13 Example (Radiative transfer in stellar atmospheres). Even in a simple geometry, the solution of the equation of radiative transfer (RT) is non-trivial, and in more complex situations (more-D, no symmetry, non-homogeneous, clumpy medium) mostly impossible. For this reason, Monte Carlo simulations are employed to obtain corresponding results, particularly since the physics of the subprocesses (i.e., the interaction between photons and matter) is well understood.
As an example we consider the following (seemingly trivial) problem, which - in a different perspective - will be
reconsidered in the practical work to be performed.
We like to calculate the spatial radiation energy density distribution E(τ ), in a (plane-parallel) atmospheric
layer, if photons are scattered by free electrons only. This problem is met, to a good approximation, in the outer
regions of (very) hot stars.
One can show that this problem can be described by Milne’s integral equation,

    E(\tau) = \frac{1}{2} \int_0^{\infty} E(t)\,E_1(|t - \tau|)\,dt,    (1.36)

with E_1 the (first) exponential integral, and \tau the “optical depth”, here with respect to the radial direction (i.e., perpendicular to the atmosphere).
The optical depth is the relevant depth scale for problems involving photon transfer, and depends on the particular cross-section(s) of the involved interaction(s), \sigma, and on the absorber densities, n. Since, for a given
1-26
CHAPTER 1. THEORY
outside
PSfrag replacements τ3
θ
τ2
τ1
inside
Figure 1.10: Monte Carlo Simulation: radiative transfer (electron scattering only) in stellar atmospheres.
frequency, usually different interactions are possible, all these possibilites have to be accounted for in the calcula-
tion of the optical depth. In the considered case (pure thomson-scattering), however, the optical depth is easily
calculated and moreover almost frequency independent, if we consider only frequencies being lower than X-ray
energies.
τ(s₁, s₂) = σ ∫_{s₁}^{s₂} n(s) ds,   (1.37)

where the absorber density corresponds to the (free) electron density, the optical depth is defined between two spatial points with "co-ordinates" s₁ and s₂, and s is the geometrical length along the photon's path.
Analytic and MC solution of Milne's integral equation

Milne's equation (1.36) can be solved exactly, but only in a very complex way, and the spatial radiative energy density distribution (normalized to its value at the outer boundary, τ = 0) is given by

E(τ)/E(0) = √3 (τ + q(τ)).   (1.38)

q(τ) is the so-called "Hopf function", with 1/√3 ≤ q(τ) < 0.710446, which constitutes the really complicated part of the problem.⁶
An adequate solution by a Monte Carlo simulation, on the other hand, can be realized as follows (cf. Fig. 1.10; for more details see Sect. 2.3):

• Photons have to be "created" emerging from the deepest layers, defined by τmax.

• The probability for a certain emission angle, θ (with respect to the radial direction), is given by p(µ) dµ ∼ µ dµ, with µ = cos θ.

• The optical depth passed until the next scattering event is described by the probability p(τ) dτ ∼ e^{−τ} dτ (this is a general result for the absorption of light), and

• the scattering angle (at low energies) can be approximated as isotropic, p(µ) dµ ∼ dµ.
⁶ Note that an approximate solution of Milne's equation can be obtained in a much simpler way, where the only difference compared to the exact solution is that q(τ) becomes 2/3, independent of τ (see, e.g., Mihalas, D., 1978, Stellar Atmospheres, Freeman, San Francisco, or http://www.usm.uni-muenchen.de/people/puls/stellar_at/stellar_at.html, Chaps. 3 and 4).
Figure 1.11: Solution of Milne's integral equation (Example 1.13) by means of a Monte Carlo simulation using 10⁶ photons. The MC solution is indicated by asterisks, and the analytical one (Eq. 1.38) is given by the bold line.
• All photons have to be followed until they leave the atmosphere (τ ≤ 0) or are scattered back into the
stellar interior (τ > τmax ), and the corresponding energies have to be summed up as a function of optical
depth.
• If the simulation is performed for a large number of photons, the analytical result is well reproduced, cf.
Fig. 1.11.
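The steps above can be condensed into a short sketch. The following is an illustrative Python version (the practical work itself uses Fortran 90, cf. Chapter 2; all names and default parameters are assumptions), which follows each photon until it escapes (τ ≤ 0) or is backscattered (τ > τmax):

```python
import math
import random

def mc_photon_transfer(n_photons=20000, tau_max=10.0, seed=1):
    """Follow photons through a purely scattering, plane-parallel layer.

    Returns (escaped, backscattered): photons leaving through the surface
    (tau <= 0) vs. photons scattered back into the interior (tau > tau_max).
    """
    rng = random.Random(seed)
    escaped = backscattered = 0
    for _ in range(n_photons):
        tau = tau_max                           # created at the deepest layer
        mu = math.sqrt(rng.random())            # p(mu) ~ mu dmu  =>  mu = sqrt(x)
        while 0.0 < tau <= tau_max:
            # optical path length to the next event: p(tau') ~ exp(-tau')
            dtau = -math.log(1.0 - rng.random())
            tau -= dtau * mu                    # update the *radial* optical depth
            if 0.0 < tau <= tau_max:
                mu = 2.0 * rng.random() - 1.0   # isotropic scattering angle
        if tau <= 0.0:
            escaped += 1
        else:
            backscattered += 1
    return escaped, backscattered
```

The draws µ = √x and τ = −ln x follow from the inversion method for the pdfs µ dµ and e^{−τ} dτ, as derived in Sect. 1.4.2.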
From this example and accounting for our knowledge accumulated so far, an immediate problem
becomes obvious:
Though we are able to create uniform distributions on the interval [0, 1) (or (0, 1), respectively) via a RNG,

p(x) = 1 for x ∈ [0, 1),   p(x) = 0 else,
we still do not know how to obtain random variables (variates) y which are distributed according to an arbitrary (normalized!) pdf, f(y). How do we create, for example, optical depth lengths τ as in Example 1.13, which are distributed according to the pdf e^{−τ} dτ, i.e., exponentially instead of uniformly?
Figure 1.12: The transformation method, see text. f(y) dy is the probability that y is in dy, and p(x) dx is the probability that x is in dx.
p(x) dx = P(x ≤ x′ ≤ x + dx),
f(y) dy = P(y ≤ y′ ≤ y + dy),

with y = y(x): the random variable y shall be a function of the random variable x. A physical transformation means that the corresponding probabilities, P, must be equal:

If p(x) dx is the probability that x is within the range x, x + dx, and y is a function of x, then the probability that y is in the range y, y + dy must be the same!

1.14 Example. Let x be distributed uniformly in [0, 1], and y = 2x. The probability that x is in 0.1 . . . 0.2 must be equal to the probability that y is in 0.2 . . . 0.4 (cf. Example 1.12). Similarly, the probability that x ∈ [0, 1] (= 1) must correspond to the probability that y ∈ [0, 2] (also = 1).
Thus, |f(y) dy| = |p(x) dx|, or alternatively

f(y) = p(x) |dx/dy|.
We have to use the absolute value because y(x) can be a (monotonically) increasing or decreasing function, whereas probabilities have to be non-negative by definition. The more-D generalization of this relation involves the Jacobian of the transformation, i.e.,

f(y₁, y₂) = p(x₁, x₂) |∂(x₁, x₂)/∂(y₁, y₂)|.   (1.39b)
Let now p(x) be uniformly distributed on [0, 1], e.g., x has been calculated by a RNG ⇒ p(x) = 1. Then

dx = |f(y) dy|  →  ∫₀^x dx′ = |∫_{y_min}^{y(x)} f(y) dy|

⇒ x = F(y) = |∫_{y_min}^{y} f(y′) dy′|  with F(y_min) = 0, F(y_max = y(1)) = 1,

where F(y) is the cumulative probability distribution.
step 1: If f(y) is not normalized, we have to normalize it first, by replacing f(y) → C · f(y) with C = F(y_max)⁻¹.

step 2: We derive F(y).

step 3: By means of a RNG, a random number x ∈ [0, 1) is calculated.

step 4: We then set F(y) =: x and

step 5: solve for y = F⁻¹(x).
1.15 Example (1). y shall be drawn from a uniform distribution on [a, b] (cf. Example 1.12).

f(y) = 1/(b − a)  ⇒  F(y) = ∫_a^y dy′/(b − a) = (y − a)/(b − a)

(test: F(a) = 0, F(b) = 1, ok). Then (y − a)/(b − a) = x, and y has to be drawn according to the relation

y = a + (b − a) · x,   x ∈ [0, 1).   (1.41)
F(y) = ∫₀^y e^{−y′} dy′ = 1 − e^{−y} =: x

⇒ y = −ln(1 − x)  →  y = −ln x, because (1 − x) is distributed as x!

If one calculates

y = −ln x,   x ∈ [0, 1),   (1.42)

then y is exponentially distributed!

Test: f(y) dy = e^{−(−ln(1−x))} dy(x) = (1 − x) dy, and dy = dx/(1 − x), ⇒ f(y) dy = ((1 − x)/(1 − x)) dx = dx = p(x) dx.
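Eq. (1.42) can be checked numerically in a few lines; the following Python sketch (names and sample size illustrative) verifies that the sample mean approaches the exponential expectation value of 1:

```python
import math
import random

def draw_exponential(n, seed=42):
    """Variates with pdf f(y) = exp(-y), via the inversion y = -ln(1 - x)."""
    rng = random.Random(seed)
    # 1 - x excludes x = 0 exactly and thus avoids log(0)
    return [-math.log(1.0 - rng.random()) for _ in range(n)]

samples = draw_exponential(100000)
mean = sum(samples) / len(samples)   # exact expectation value: 1
```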
Recipe: Draw two random numbers x₁, x₂ from a RNG; then y₁ and y₂ as calculated by (1.43) are normally distributed!
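Eq. (1.43) is not reproduced here; assuming it denotes the standard Box–Muller transform, y₁ = √(−2 ln x₁) cos(2πx₂), y₂ = √(−2 ln x₁) sin(2πx₂), the recipe can be sketched in Python as:

```python
import math
import random

def box_muller(n_pairs, seed=7):
    """Return 2*n_pairs standard-normal variates from uniform pairs (x1, x2)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_pairs):
        x1 = 1.0 - rng.random()               # in (0, 1], avoids log(0)
        x2 = rng.random()
        r = math.sqrt(-2.0 * math.log(x1))
        out.append(r * math.cos(2.0 * math.pi * x2))
        out.append(r * math.sin(2.0 * math.pi * x2))
    return out

ys = box_muller(50000)
mean = sum(ys) / len(ys)                      # should approach 0
var = sum(y * y for y in ys) / len(ys) - mean**2   # should approach 1
```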
1.19 Example (5). In case the pdf is tabulated, the generalization of the inversion method is straightforward. The integral F(y) is tabulated as well, as a function of y (e.g., from a numerical integration on a certain grid with sufficient resolution), and the inversion can be performed by using interpolation on this grid.
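A minimal sketch of this tabulated inversion, for an illustrative pdf f(y) = 2y on [0, 1] (chosen so that F(y) = y² is known exactly and the expectation value 2/3 can be checked; grid size and names are assumptions):

```python
import bisect
import random

ygrid = [i / 1000.0 for i in range(1001)]
Fgrid = [y * y for y in ygrid]            # tabulated F(y) = integral of 2y' dy'

def draw_tabulated(x):
    """Invert the tabulated F by linear interpolation: y = F^{-1}(x)."""
    j = bisect.bisect_left(Fgrid, x)
    if j == 0:
        return ygrid[0]
    # linear interpolation between grid points j-1 and j
    f0, f1 = Fgrid[j - 1], Fgrid[j]
    w = (x - f0) / (f1 - f0)
    return ygrid[j - 1] + w * (ygrid[j] - ygrid[j - 1])

rng = random.Random(3)
samples = [draw_tabulated(rng.random()) for _ in range(100000)]
mean = sum(samples) / len(samples)        # exact expectation for f(y)=2y: 2/3
```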
Even if the inversion method cannot be applied (i.e., ∫ f(y) dy cannot be calculated analytically, or F(y) cannot be inverted, and a tabulation method (see above) is discarded), there is no need to give up. Indeed, there exists a fairly simple procedure to calculate corresponding variates, called von Neumann's rejection method,
Figure 1.13: The simplest implementation of von Neumann's rejection method (see text).
if x₁, x₂ are pairs of consecutive random numbers drawn from a RNG. The statistical independence of x₁ and x₂ is of highest importance here! For such a pair we then accept xᵢ as a variate distributed according to w if

accept xᵢ: yᵢ ≤ w(xᵢ),

whereas otherwise we reject it:

reject xᵢ: yᵢ > w(xᵢ).

If a value was rejected, another pair is drawn, and so on. Again, the number of "misses", i.e., of rejected variates, depends on the ratio of the area of the rectangle to the area below w. The optimization of this method, which uses an additional comparison function to minimize this ratio, is presented, e.g., in Numerical Recipes, Chap. 7.3.
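In its simplest form (cf. Fig. 1.13), the method can be sketched as follows; the pdf w(x) = 2x on [0, 1], the enclosing rectangle [0, 1] × [0, 2], and all names are illustrative assumptions:

```python
import random

def rejection_sample(w, w_max, n, seed=11):
    """Simplest rejection method: enclose w by the rectangle [0,1] x [0,w_max]."""
    rng = random.Random(seed)
    accepted = []
    while len(accepted) < n:
        x = rng.random()             # abscissa, uniform in [0, 1)
        y = w_max * rng.random()     # ordinate, uniform in [0, w_max)
        if y <= w(x):                # accept: the point lies below w
            accepted.append(x)
        # otherwise: reject, and draw a new (independent!) pair
    return accepted

samples = rejection_sample(lambda x: 2.0 * x, 2.0, 50000)
mean = sum(samples) / len(samples)   # exact expectation for f(x) = 2x: 2/3
```

For this w, half of the rectangle lies below the curve, so on average two pairs are needed per accepted variate.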
Chapter 2

Praxis: Monte Carlo simulations and radiative transfer
a) At first, you should check the results from Fig. 1.5. For this purpose, perform the appropriate modifications of the program, i.e., create an output file (to be called ran_out) with entries xᵢ, yᵢ for a sufficiently large sample, i = 1 . . . N, which can be read and plotted with the idl procedure test_rng.pro. Have a look into this procedure as well!
Convince yourself that all possible numbers are created (how do you do this?). Plot the distribution and compare with the manual (ps figures can be created with the keyword \ps).
b) After success, modify the program in such a way that arbitrary values for A can be read in from the input (modifying C will not change the principal result). Display the results for different A, A = 5, 9, . . . , 37. Plot typical cases of large and small correlation. What do you conclude?
c) Finally, for the same N, plot the corresponding results obtained from the "minimum-standard generator" ran1, which is contained in the working directory as program file ran1.f90. Of course, you have to modify the main program accordingly and re-compile both programs together. Compare with the results from above and discuss the origin of the differences.
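As an independent cross-check for part c): the "minimum standard" of Park & Miller is the multiplicative LCG x_{n+1} = (A · x_n) mod M with A = 16807, M = 2³¹ − 1, and their published self-test states that, starting from seed 1, the state after 10 000 iterations must be 1043618065. A Python sketch of the bare recursion (note that ran1.f90 additionally applies a shuffle on top of this core):

```python
# Park & Miller "minimum standard" multiplicative LCG
A, M = 16807, 2**31 - 1

def lcg_sequence(seed, n):
    """Return the first n states x_{k+1} = (A * x_k) mod M."""
    x = seed
    out = []
    for _ in range(n):
        x = (A * x) % M
        out.append(x)
    return out

states = lcg_sequence(1, 10000)
# states[-1] == 1043618065  (Park & Miller's recommended self-test)
```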
CHAPTER 2. PRAXIS: MONTE CARLO SIMULATIONS AND RADIATIVE TRANSFER
B_ν(T) = (2hν³/c²) · 1/(e^{hν/kT} − 1)   (2.1)

with frequency ν and temperature T. All other symbols have their usual meaning. The total intensity radiated from a black body can be calculated by integrating over frequency, resulting in the well-known Stefan–Boltzmann law,

∫₀^∞ B_ν(T) dν = (σ_B/π) T⁴,   (2.2)
where σ_B ≈ 5.67 · 10⁻⁵ (in cgs units, erg cm⁻² s⁻¹ K⁻⁴) is the Stefan–Boltzmann constant. Strictly speaking, this constant is not a fundamental constant, but a product of different natural constants and the value of the dimensionless integral

∫₀^∞ x³/(e^x − 1) dx.   (2.3)
a) Determine the value (three significant digits) of this integral by comparing (2.1) with (2.2) and the given value of σ_B. (Hint: use a suitable substitution for the integration variable in (2.1).) If you have made no error in your calculation (consistent units!), you should find a value close to the exact one, which is π⁴/15 and can be derived, e.g., from an analytic integration over the complex plane.
From much simpler considerations (splitting the integral into two parts connected at x = 1), it should be clear that the integral must be of order unity anyway. As a most important result, this exercise shows how simply the T⁴ dependence arising in the Stefan–Boltzmann law can be understood.
b) We will now determine the value of the integral (2.3) from a Monte Carlo integration, using random numbers as generated by ran1. For this purpose, use a copy of the program mcint_ex2.f90 and perform appropriate modifications. Since the integral extends to infinity, which cannot be simulated, use different maximum values (from the input) to examine which value is sufficient. Use also different sample sizes, N, and check the 1/√N law of the MC integration. Compare with the exact value as given above.
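The corresponding estimate can be sketched in Python (the exercise itself uses Fortran; x_max, N, and all names are illustrative assumptions). A truncation at x_max = 20 is already sufficient, since the integrand decays like x³e^{−x}:

```python
import math
import random

def mc_planck_integral(n, x_max=20.0, seed=5):
    """MC estimate of integral (2.3), truncated at x_max."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = x_max * rng.random()
        if x > 0.0:                        # integrand -> 0 for x -> 0 anyway
            total += x**3 / (math.exp(x) - 1.0)
    return x_max * total / n               # (b - a) * <f>

estimate = mc_planck_integral(200000)
exact = math.pi**4 / 15.0                  # ~ 6.4939
```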
where I₁ is the specific intensity for the radial direction, µ = 1, i.e., the intensity observed at the center of a star. For µ = 0, on the other hand, the angle between the radial direction and the photon's direction is 90° (this situation is met at the limb of the star), and the corresponding region appears fainter, by a factor of roughly 0.4.
Eq. 2.4 is the consequence of an approximate solution of the equation of radiative transfer in plane-parallel symmetry¹, with absorption and emission processes assumed to be grey, i.e., frequency independent, which was developed (long ago!) by Eddington and Barbier. Note that the approximate solution of Milne's integral equation (Example 1.13) is based on the same approach. Just one additional comment: the above relation (2.4) has nothing to do with the presence of any temperature stratification (though it can be influenced by it), but is the consequence of a large number of absorption and emission processes in an atmosphere of finite (optical) depth, as we will see below.
In order to avoid almost all subtleties of radiative transfer and corresponding approximate solutions², in this exercise we will perform a Monte Carlo simulation to confirm the above result. The principal strategy is very close to the simulation described in Example 1.13, but we will now sort the photons according to the direction in which they leave the atmosphere, and not according to their energy.
a) The principal structure of the simulation (to be used again with ran1) is given in the program limb_ex3.f90, which has to be copied to a working file and then complemented, according to the comments given in the program. The "only" part missing is the actual path of the photons. If you have studied Sect. 1.4 carefully, you should be able to program this path within a few statements.
At first, develop a sketch of the program flow including all possible branches (a so-called flow-chart), accounting for appropriate distributions of "emission angle" (Example 1.18), optical path length, and scattering angle. Always update the radial optical depth of the photon, according to the comments in the program, and follow the photons until they have left the atmosphere. In the latter case, update the corresponding array counting the number of photons which have escaped within a certain range of angles. Before implementing this program, discuss your flow-chart with your supervisor.
b) Implement the algorithm developed in a) into your program and test it using 20 channels for the angular distribution, N = 10⁴ . . . 10⁵ photons, and an optical depth of the atmosphere of τmax = 10. The angular distribution created can be displayed via the idl procedure limb_test.pro. If your program runs successfully, you should obtain a plot like that in Fig. 2.1.
¹ i.e., under the condition that the stellar photosphere is very thin compared to the stellar radius: the solar photosphere, e.g., is only a few hundred kilometers thick, contrasted with the sun's radius of 700 000 km.
² Again, the interested reader may have a look into Mihalas, D., 1978, Stellar Atmospheres, Freeman, San Francisco.
Figure 2.1: Monte Carlo simulation of limb darkening: angular distribution of 10⁵ photons sorted into 20 channels. The total optical depth of the atmosphere is 10.
c) In our simulation, we have calculated the number of photons leaving the atmosphere with respect to a surface perpendicular to the radial direction. Without going into details, this number is proportional to the specific intensity weighted with the projection angle µ, since the specific intensity, I(µ), is defined with respect to unit projected area. Thus,

N(µ) dµ ∝ I(µ) µ dµ,

and the intensity within dµ is obtained from the number of photons divided by an appropriate average of µ, i.e., centered at the middle of the corresponding channel (already implemented in the program output).
This relation is also the reason why the distribution of the photons' "emission angles" at the lower boundary follows the pdf µ dµ: for an isotropic radiation field, which is assumed to be present at the lowermost boundary, it is the specific intensity, and not the photon number, which is uniformly distributed with respect to µ! Thus, in order to convert to photon numbers, we have to draw the emission angles at the lower boundary from the pdf µ dµ instead of dµ. Inside the atmosphere, on the other hand, the emission angle refers to the photons themselves and thus is (almost) isotropic, so that we have to draw from the pdf dµ.
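Both pdfs can be sampled with the inversion method of Sect. 1.4.2: for p(µ) ∼ µ dµ, the normalized pdf is f(µ) = 2µ, so F(µ) = µ² and hence µ = √x; for the isotropic case, µ is simply uniform. A minimal Python check of the boundary distribution (names and sample size illustrative):

```python
import math
import random

rng = random.Random(9)
# lower boundary: p(mu) ~ mu dmu  =>  F(mu) = mu^2  =>  mu = sqrt(x)
mus = [math.sqrt(rng.random()) for _ in range(100000)]
mean_mu = sum(mus) / len(mus)   # exact expectation: integral of 2 mu^2 dmu = 2/3
```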
Modify the plot routine limb_test.pro in such a way as to display the specific intensity, and compare with the prediction (2.4). Use N = 10⁶ photons for τmax = 10 now, and derive the limb-darkening coefficients (in analogy to Eq. 2.4) from a linear regression to your results. Hint: use the idl procedure poly_fit, described in the idl help. Why is a certain discrepancy between simulation and theory to be expected?
Index
variance, 1-2
variance reduction, 1-24
variate, 1-1