
Introduction to Scientific Computing

Jos B.T.M. Roerdink

February 2019
Acknowledgement: the picture on the front page shows the result of a flow simulation on the Nvidia
logo (W. J. van der Laan et al.: Screen Space Fluid Rendering with Curvature Flow. ACM SIGGRAPH
Symposium on Interactive 3D Graphics and Games, pp. 91–98, 2009).
Contents

1 What is Scientific Computing? 5

2 Computer Tomography 9
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2 Obtaining the projection data . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2.1 Reconstruction methods . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.3 The algebraic reconstruction technique (ART) . . . . . . . . . . . . . . . . . . . 13
2.3.1 The case N = 4, M = 4: four pixels, four projections . . . . . . . . . . 15
2.3.2 Kaczmarz reconstruction method: M = 2, N = 2 . . . . . . . . . . . . . 17
2.3.3 Kaczmarz reconstruction method: The general case . . . . . . . . . . . . 21
2.3.4 Remarks on the Kaczmarz method . . . . . . . . . . . . . . . . . . . . . 22
2.3.5 Kaczmarz method for binary images . . . . . . . . . . . . . . . . . . . . 23
2.4 Testing the algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.4.1 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.A Linear algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.A.1 Matrix-vector operations . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.A.2 Solving linear equations . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.A.3 Projection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

3 Stochastic dynamics: Markov chains 39


3.1 Markov chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.1.1 Computing the equilibrium distribution . . . . . . . . . . . . . . . . . . 41
3.2 The PageRank algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.2.1 PageRank: defining the Markov chain . . . . . . . . . . . . . . . . . . . 42


3.2.2 Computing the PageRanks . . . . . . . . . . . . . . . . . . . . . . . . . 44

4 Modelling and simulation of pattern formation 47


4.1 Cellular Automata: Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.2 A simple CA with majority voting . . . . . . . . . . . . . . . . . . . . . . . . . 48
4.2.1 Behaviour of the simple CA . . . . . . . . . . . . . . . . . . . . . . . . 49
4.3 Conway’s Game of Life . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.4 Variations and generalizations . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

5 Dynamics in the complex plane 57


5.1 Intermezzo: Complex numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.2 Mappings in the complex plane . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
5.3 The homogeneous quadratic equation . . . . . . . . . . . . . . . . . . . . . . . 59
5.4 The inhomogeneous quadratic equation . . . . . . . . . . . . . . . . . . . . . . 60
5.4.1 The Mandelbrot set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.4.2 The filled Julia set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

6 Differential equations 75
6.1 Linear ordinary differential equations . . . . . . . . . . . . . . . . . . . . . . . 75
6.1.1 Exponential growth . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
6.1.2 Circular motion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
6.1.3 Finite difference schemes . . . . . . . . . . . . . . . . . . . . . . . . . 77
6.1.4 Explicit (forward) Euler method . . . . . . . . . . . . . . . . . . . . . . 77
6.1.5 Implicit (backward) Euler method . . . . . . . . . . . . . . . . . . . . . 78
6.1.6 Symplectic (semi-implicit) Euler method . . . . . . . . . . . . . . . . . 80
6.2 Nonlinear ordinary differential equations . . . . . . . . . . . . . . . . . . . . . . 82
6.2.1 Logistic model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
6.2.2 Lotka-Volterra model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85

7 N -body simulations 89
7.1 Navigation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
7.2 Planetary system formation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90


7.3 Dark matter and the cosmic web . . . . . . . . . . . . . . . . . . . . . . . . . . 92


7.4 Two-body problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
7.4.1 One particle moving in a circular orbit . . . . . . . . . . . . . . . . . . . 94
7.4.2 Two particles moving in circular orbits . . . . . . . . . . . . . . . . . . 96
7.5 Simulation techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98

8 Simulation of reaction-diffusion processes 103


8.1 Diffusion processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
8.2 The diffusion equation in 1D . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
8.2.1 Analytical solution of the diffusion equation in 1D . . . . . . . . . . . . 104
8.2.2 Numerical solution of the diffusion equation in 1D . . . . . . . . . . . . 106
8.3 The diffusion equation in 2D . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
8.3.1 Numerical solution of the diffusion equation in 2D . . . . . . . . . . . . 109
8.4 Reaction-diffusion systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

9 Sequence alignment 115


9.1 Biological background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
9.1.1 DNA, RNA, proteins . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
9.1.2 Sequence similarity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
9.2 Definition of sequence alignment . . . . . . . . . . . . . . . . . . . . . . . . . . 118
9.3 Dotplot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
9.4 Measures of sequence similarity . . . . . . . . . . . . . . . . . . . . . . . . . . 121
9.4.1 Scoring functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
9.5 Dotplots and sequence alignment . . . . . . . . . . . . . . . . . . . . . . . . . . 125
9.6 Pairwise alignment via dynamic programming . . . . . . . . . . . . . . . . . . . 128
9.6.1 Optimal substructure property . . . . . . . . . . . . . . . . . . . . . . . 129
9.6.2 Recursive computation of the edit distance . . . . . . . . . . . . . . . . 130
9.7 Variations and generalizations . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
9.8 Sequence logos . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
9.9 Circular visualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135

Index 137


Chapter 1

What is Scientific Computing?

Scientific Computing (also called Computational Science) deals with computation in the (natural)
sciences. It studies the construction of mathematical models of phenomena occurring in nature
and human society, and analyses such models quantitatively, both by mathematical techniques
and by computer implementation (Heath, 2002; Kutz, 2013). In practice, scientific computing
typically involves the application of computer simulation and other forms of computation to
problems in various scientific disciplines. Numerical analysis is an important foundation for
techniques used in scientific computing.
Scientists and engineers develop computer programs and application software that model the
systems under study and run these programs with various sets of input parameters. Often, the
computations involve massive amounts of calculations (usually floating-point) which require the
use of supercomputers, high performance computing, parallel and distributed computing, or GPU
(graphics processing unit) computing (Hager and Wellein, 2010). To get insight in the large
amounts of data produced by the simulations, visualization techniques are applied. In fact, the
origin of the discipline of Scientific Visualization runs in parallel to the rise of the scientific
computing discipline. Its birth is usually associated to the ACM SIGGRAPH panel report Visu-
alization in Scientific Computing from 1987.
Here are some typical fields in which scientific computing is applied, with some example prob-
lems:

• Physics: simulation of elementary particle behaviour.


• Astronomy: simulation of star and galaxy formation.
• Chemistry: simulation of chemical reactions.
• Biology: simulation of pattern formation in organisms, populations, or ecological networks.
• Earth and Environmental Science: simulation of earthquakes, simulation of climate dynamics.
• Medicine: simulation of X-rays in tissues for radiation therapy planning.
• Sociology: simulation of crowd behaviour.

This list can easily be extended with many other examples.


The use of computation in all these areas has given rise to many new subfields, the so-called
computational-X fields: computational physics, computational chemistry, computational mathe-
matics, computational biology, computational astrophysics, computational neuroscience, compu-
tational sociology, computational finance, etc. On the other hand, there are also the X-informatics
fields, such as bioinformatics, cheminformatics, astroinformatics, or medical informatics, where
the focus is more on the use of Information Technology, like database technology, special-
purpose programming and script languages, large data handling (transport, compression, and
archiving), web services and ontologies, etc. Of course, this distinction is not absolute, and in
practice we often see combinations of computational-X and X-informatics approaches.
Some of the computational methods you will encounter in this course are:

• Solving systems of linear equations in computer tomography (Chapter 2)

• Discrete-time dynamical systems as models of networks and pattern formation (Chapters 3, 4, 5)

• Numerical solutions of (partial) differential equations (Chapters 6, 7, 8)

In this course, we motivate and illustrate the scientific computing techniques by examples from
application areas such as medicine, physics, or astronomy. These application domains are used to
give context to the computational approaches and to make you aware of the fact that understand-
ing the domain context is an integral part of the task of a computer scientist working in scientific
computing. A dialogue between the computer scientist and the domain expert is necessary to
develop solutions that are both technically sound and relevant in practice. This does not detract
from the fact that the techniques are generic and therefore applicable in many other domains.

Bibliography

Hager, G., Wellein, G., 2010. Introduction to High Performance Computing for Scientists and
Engineers. Chapman and Hall.

Heath, M. T., 2002. Scientific Computing: An Introductory Survey (2nd ed.). McGraw-Hill.

Kutz, N., 2013. Data-Driven Modeling & Scientific Computation: Methods for Complex Systems & Big Data. Oxford University Press.


Chapter 2

Computer Tomography

In this chapter we look at the fundamentals of tomographic reconstruction. The algorithms we will consider form the basis of computer tomography, as realized in medical devices such as CT or MRI scanners.

2.1 Introduction
The word tomography means ‘reconstruction from projections’, i.e., the recovery of a function
from its line or (hyper)plane integrals (from the Greek τόμος, slice, and γράφειν, to write); see Figure 2.1. In the applied sense, it is a method to reconstruct cross sections of the interior structure of an object without having to cut or damage the object. The term often occurs
in the combination computer (computerized, computed) tomography (CT) or computer-assisted
tomography (CAT), since for performing the reconstructions in practice one needs the use of
a digital computer. Important issues in tomography are existence, uniqueness and stability of
reconstruction procedures, as well as the development of efficient numerical algorithms.
The internal property of the object to be reconstructed, such as a density, space-dependent atten-
uation coefficient, and so on, is generally referred to as the internal distribution. The physical
agent or probe by which to act on this internal distribution may vary from X-rays, gamma rays,
visible light, electrons or neutrons to ultrasound waves or nuclear magnetic resonance signals.
When the probe is outside the object one speaks of transmission computer tomography (TCT).
In contrast with this stands emission computer tomography (ECT), where the probe, such as a
radioactive material, is inside the object. This occurs in two variants: SPECT (single photon ECT), where radiation along a half line is detected, and PET (positron emission tomography),
where radiation emitted in opposite directions is detected in coincidence. Finally we mention
reflection tomography, where the object is ‘illuminated’ by sound waves and the reflected waves
are recorded to obtain line integrals of the object’s reflectivity function (Kak and Slaney, 1988).
Other forms of tomography exist, such as electric impedance tomography (recovering the con-
ductivity inside a body from electric potential measurements on the surface), biomagnetic imag-
ing (recovering the position of electric currents from magnetic fields induced outside the body),
or diffraction tomography, see Herman (1980). Instances of tomography in three dimensions are


Figure 2.1: Scanning the body with X-rays to obtain projections. The problem is how to recon-
struct a cross-section from the projections.

single slice =⇒ stack of slices =⇒ 3D volume

Figure 2.2: Performing 3D reconstruction by a stack of 2D reconstructions along parallel slices.


found in radar theory and magnetic resonance imaging. One of the most prominent applications
of computer tomography occurs in diagnostic medicine, where the method is used to produce
images of the interior of human organs (Shepp and Kruskal, 1978). In 1979 the Nobel prize in
physiology or medicine was awarded to G.N. Hounsfield and A.M. Cormack for their funda-
mental work in the field. Other applications arise in radio astronomy, 3D electron microscopy,
soil science, aerodynamics and geophysics, to name a few. In industry the method is used for
non-destructive testing.
To obtain 3D reconstruction of objects, the standard approach is to make a stack of 2D recon-
structions of parallel cross-sections. These can then be combined into full 3D reconstructions by
volume rendering techniques; see Figure 2.2 for the general idea. In the remainder of this chapter
we restrict ourselves to the 2D reconstruction process of a single cross-section.

Figure 2.3: Parameters θ, s defining a line Lθ,s .

2.2 Obtaining the projection data


To describe computer tomography one starts from the following simplified model (in practice, many complications arise). If a beam of X-rays with initial intensity I0 passes through an object along a straight line L, then the beam intensity is attenuated by a factor which involves the integrated density along this line. This means that the intensity I1 after having passed the object satisfies

\[ \frac{I_1}{I_0} = \exp\Bigl( - \int_L f(x, y)\, dx\, dy \Bigr), \tag{2.1} \]


where f (x, y) denotes the X-ray attenuation coefficient of the object at the point (x, y). Hence
by measuring the ratio I1 /I0 line integrals of the unknown distribution f are obtained.

Figure 2.4: Parallel beam scanning mode in 2D tomography: projections along parallel lines.
Shown are projection profiles for two directions with angles θ1 and θ2 .

The mathematical concept associated with reconstruction of a distribution from line integrals is the so-called Radon transform (Deans, 1983). Let (cos θ, sin θ) be a vector of length 1 making an angle θ with the x-axis, s a real number, and Lθ,s the line defined by

\[ x \cos\theta + y \sin\theta = s \]

The line Lθ,s is perpendicular to the vector (cos θ, sin θ), see Figure 2.3. The integral of f over the line Lθ,s,

\[ Rf(\theta, s) := \int_{L_{\theta,s}} f(x, y)\, dx\, dy, \]

is called the Radon transform of the function f . The integral Rf (θ, s) for a single line, i.e., with
θ and s fixed, is called a projection and the set of projections along parallel lines for a fixed value
of θ is called a projection profile, or simply a profile, cf. Figure 2.4.
In practice, one uses different ways to sample the line integrals of the internal distribution. In
parallel beam scanning, as described above, parallel line integrals are determined for a fixed
direction and the process is repeated for a number of different directions; in fan-beam scanning
line integrals emanating from a given source point are computed for different directions, which
is repeated for a certain number of source points. In this chapter, we only consider the case that
projections are collected along parallel lines.


2.2.1 Reconstruction methods


Once we have obtained the projection data, the question is how to recover the original 2D distri-
bution. In computer tomography, the reconstruction methods can be subdivided as follows:

• Direct methods:

– Filtered backprojection
– Fourier reconstruction

• Iterative methods:

– ART: algebraic reconstruction technique


– SIRT: simultaneous iterative reconstruction technique
– SART: simultaneous algebraic reconstruction technique

We will consider one of the iterative methods, namely ART.

2.3 The algebraic reconstruction technique (ART)


In ART, one considers a discrete version of the projection process, see Figure 2.5. Projections
are collected along bundles of parallel strips, where the direction of a bundle is specified by an
angle θk = kπ/Nangles , k = 1, 2, . . . , Nangles . Each bundle contains a number Nrays of parallel lines
which are a distance t apart. So the total number of rays is M = Nangles · Nrays . Two consecutive
parallel lines define a strip. We also assume that the distribution f which we want to reconstruct
is defined on a square grid of pixels. Each pixel has width and height d. Each row has n pixels
and there are n rows, so the total number of pixels is N = n2 .
We assume that the (unknown) density in each pixel is constant. The density in pixel j is denoted
by fj , j = 1, 2, . . . , N . So f can be described by a matrix of the form
 
\[ f = \begin{pmatrix} f_1 & f_2 & \cdots & f_n \\ f_{n+1} & f_{n+2} & \cdots & f_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ f_{(n-1)n+1} & f_{(n-1)n+2} & \cdots & f_{n^2} \end{pmatrix} \tag{2.2} \]

Instead of a matrix (or grid) representation of the density, we will also represent the 2D image of
dimensions n × n as a 1D vector f~ of dimension N = n2 , where

f~ = (f1 , f2 , . . . , fn , fn+1 , fn+2 , . . . , f2n , . . . , . . . , fn2 )

The projection in strip i is denoted by pi . Instead of the continuous formula (2.1) we obtain pi
by summing the contributions of all pixels which have a nonzero intersection with strip i. For
this reason we will call pi a ray sum. For pixel j, the contribution to ray i will be the percentage


Figure 2.5: Projections of a distribution f on a grid of size n × n along bundles of parallel strips
of width t. The density in pixel j is denoted by fj , j = 1, 2, . . . , N , where N = n2 . The ray sum
for strip i is denoted by pi .


of pixel j which is intersected by strip i; this is the shaded area within pixel j in Figure 2.5. We
will denote this contribution by wij . So:

\[ w_{ij} = \frac{\text{area of intersection of strip } i \text{ with pixel } j}{d^2}, \tag{2.3} \]

where the denominator d2 in this formula equals the area of a pixel. The quantity wij is a weight
factor: 0 ≤ wij ≤ 1. It indicates with what weight pixel j is represented in the ray sum pi . The
weight matrix W can be precomputed. However, one problem in practice is that it can become
very large, since its size is M × N , and thus requires a lot of storage space.
Sometimes we use the following simpler method to define the weights wij :

\[ w_{ij} = \begin{cases} 1 & \text{if ray } i \text{ intersects pixel } j \\ 0 & \text{otherwise} \end{cases} \tag{2.4} \]

This makes the implementation easier because we can easily compute the weight factors at run
time.
If we sum the contributions of all pixels to the ray sum pi we find the following ray equation:

pi = wi1 f1 + wi2 f2 + . . . + wiN fN , i = 1, 2, . . . , M (2.5)

Since we have M rays we also have M ray equations of the form (2.5). In each equation we have
N terms, one for each pixel. So we have M equations in N unknowns.
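
As an illustration, the following small Python/NumPy sketch (function and variable names are our own choice) builds such a system for the simple case of n horizontal and n vertical rays with the 0/1 weights of (2.4), and evaluates the ray sums for a small test image.

    import numpy as np

    def weight_matrix_rows_cols(n):
        # Binary weights (2.4): rays 0..n-1 are the n rows, rays n..2n-1 the n columns.
        W = np.zeros((2 * n, n * n))
        for r in range(n):
            W[r, r * n:(r + 1) * n] = 1.0      # ray r covers all pixels of row r
        for c in range(n):
            W[n + c, c::n] = 1.0               # ray n+c covers all pixels of column c
        return W

    n = 2
    f = np.array([[1.0, 2.0],
                  [3.0, 4.0]])                 # a small test image
    W = weight_matrix_rows_cols(n)
    p = W @ f.flatten()                        # the M = 2n ray sums of (2.5)
    print(p)                                   # [3. 7. 4. 6.]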
The question we will consider is how to solve such systems of equations to obtain the unknown
densities f1 , . . . , fN . But first we will look at a simple example to get a good understanding of
the reconstruction process.

2.3.1 The case N = 4, M = 4: four pixels, four projections

Let us assume we have an unknown grid image f of size 2 × 2. We take two horizontal and two
vertical projections. So we have N = 4, M = 4. Assume that the two horizontal ray sums are
3 and 7, respectively, while the vertical ray sums are 4 and 6, respectively. Then we have the
following picture:


3 f1 f2

7 f3 f4

4 6

Figure 2.6: A simple 2 × 2 grid image with known ray sums along rows and columns.

The ray equations which should be satisfied by the pixel values f1 –f4 are:

f1 + f2 = 3 (2.6)
f3 + f4 = 7 (2.7)
f1 + f3 = 4 (2.8)
f2 + f4 = 6 (2.9)

We can guess a solution for f1 –f4 . In fact there are many solutions. Here are a few. (We assume
that negative pixel values are not allowed.)

1 2 2 1 3 0 1.5 1.5

3 4 2 5 1 6 2.5 4.5

We see that the second solution is found by adding the following image to the first one:

\[ n = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} \]

In fact, all solutions can be found by adding a constant times n to the first solution. If we look at
the solution n we see that all ray sums are zero. Therefore we will call this a “null image” or “null
solution”. The existence of such null solutions is a well known phenomenon in linear algebra. In


the Appendix (page 31 etc.) we give some background on linear algebra which explains this in
more detail.
How can we ensure that the solution is unique? Well, we can add more projections. For example,
suppose that we add a diagonal ray through pixels 2 and 3, with a ray sum of 5:

f2 + f3 = 5 (2.10)

Now we have five equations with four unknowns. This is called an overdetermined system of equations. In general, there will be no solution at all, unless the system is consistent. In our case, this is indeed true. Consider the equations (2.6), (2.8) and (2.10). This is a system of three equations in three unknowns.

f1 + f2 = 3 (2.11)
f1 + f3 = 4 (2.12)
f2 + f3 = 5 (2.13)

This can easily be solved. From (2.13) we find f3 = 5 − f2 . Substitute this in equation (2.12)
to find f1 + 5 − f2 = 4, that is, f2 = f1 + 1. Substitute this in equation (2.11) to find f1 = 1.
Then we obtain f2 = 2, f3 = 3, and from (2.7) f4 = 4. We can easily verify that the solution
f~ = (1, 2, 3, 4) also satisfies equation (2.9). Hence we have found a solution of all equations and
in this case there is only one.
What we have seen in this simple example also holds in the general case. We have to make sure
that we have a sufficient number of rays, otherwise the solution of the reconstruction problem
may not be unique. On the other hand, if we have more rays than unknowns the solution may
not exist at all, unless the system of equations is consistent. In practice, consistency does not
always hold, for example because there is noise in the data or because there are measurement
inaccuracies. The solution method we will consider next can also deal with this situation.
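
Before turning to that method, it is instructive to check the example numerically. The sketch below (our own code, using NumPy's least-squares routine rather than the reconstruction method of the next section) solves the five consistent equations (2.6)-(2.10) and recovers the unique solution found above.

    import numpy as np

    # Rows: the two row sums, the two column sums, and the diagonal ray (2.10).
    A = np.array([[1, 1, 0, 0],
                  [0, 0, 1, 1],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 1, 1, 0]], dtype=float)
    b = np.array([3, 7, 4, 6, 5], dtype=float)

    f, _, rank, _ = np.linalg.lstsq(A, b, rcond=None)
    print(np.round(f, 6))    # [1. 2. 3. 4.]
    print(rank)              # 4: full column rank, so no null image remains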

2.3.2 Kaczmarz reconstruction method: M = 2, N = 2


Kaczmarz developed an algebraic reconstruction technique based on a simple geometric idea.
We explain it for the case of two equations in two unknowns (M = 2, N = 2). Then the system
of equations (2.5) becomes:

w11 f1 + w12 f2 = p1 (2.14)


w21 f1 + w22 f2 = p2 (2.15)

To obtain a solution of these equations we use an iterative method. Starting with an initial vector f~ (0), we compute a sequence f~ (0), f~ (1), f~ (2), . . . which converges to the desired solution f~ (∞) = (f1^(∞), f2^(∞)).
The principle of the Kaczmarz method is as follows (please consult the Appendix for some basics about vectors and matrices). Each of these two equations can be graphically represented as a straight line in the (f1, f2) plane; see Figure 2.7.


Figure 2.7: Principle of the Kaczmarz method for two lines. The initial point P is orthogo-
nally projected upon line L1 , then on line L2 , then again on line L1 , etc. The series of points
A1 , B1 , A2 , B2 , . . . converges to the intersection point S of the two lines.

The first equation corresponds to line L1, the second equation corresponds to line L2. Choose an initial point P;
this corresponds to the initial solution f~ (0) . Then project point P perpendicularly upon line L1 ;
call the projection A1 ; this corresponds to the next solution f~ (1) . Next project A1 perpendicu-
larly upon L2 ; call the projection B1 ; this corresponds to the next solution f~ (2) . Then project
B1 again on line L1; call the projection A2. Continue this process. Then we get a series of points
A1 , B1 , A2 , B2 , . . ., corresponding to a sequence of approximations f~ (0) , f~ (1) , f~ (2) , . . ., which
converges to the intersection point S of the two lines. This intersection represents the solution
f~ (∞) of the two equations.
We now want to find the formula which computes the first approximation f~ (1) from the initial point P located at position vector f~ (0). We denote the perpendicular projection of P on the line L by Q; the coordinate vector of Q is f~ (1) = (f1^(1), f2^(1)). The equation of line L is w11 f1 + w12 f2 = p1. Let us write w~ 1 = (w11, w12) and f~ = (f1, f2). Then we can also write the equation of the line as

\[ \vec{f} \cdot \vec{w}_1 = p_1 \tag{2.16} \]

Here f~ · w~ 1 denotes the inner product of the vectors f~ and w~ 1. The definition of the inner product of vectors can be found in the Appendix, Section 2.A.1.
The line L intersects the line through w~ 1 in a point A. Also, denote the perpendicular projection of P on the line through w~ 1 by B. See Figure 2.8 for the situation.


Figure 2.8: Update step of the Kaczmarz method. The initial point P , representing solution f~ (0) ,
is projected perpendicularly on the line L1 with equation w11 f1 + w12 f2 = p1 . The point of
projection Q represents the next approximation f~ (1) .

Then we have

\[ \vec{f}^{(1)} = \vec{OP} - \vec{QP} = \vec{OP} - \vec{AB} = \vec{f}^{(0)} - (\vec{OB} - \vec{OA}) \]

In Section 2.A.3 of the Appendix we prove that the perpendicular projection B of the vector f~ (0) with endpoint P on a line with direction vector w~ 1 satisfies

\[ \vec{OB} = \lambda \vec{w}_1 \quad \text{with} \quad \lambda = \frac{\vec{f}^{(0)} \cdot \vec{w}_1}{\|\vec{w}_1\|^2} \]

Here ‖w~ 1‖ is the norm (or length) of the vector w~ 1, as defined by formula (A.5) of the Appendix. If we apply the same formula to the vector f~ (1) with endpoint A, we get

\[ \vec{OA} = \mu \vec{w}_1 \quad \text{with} \quad \mu = \frac{\vec{f}^{(1)} \cdot \vec{w}_1}{\|\vec{w}_1\|^2} \]

Since point A is on the line L, we know that f~ (1) satisfies equation (2.16), so f~ (1) · w~ 1 = p1. Therefore μ = p1 / ‖w~ 1‖².


Combining the results so far we find

\[ \vec{f}^{(1)} = \vec{f}^{(0)} - \left( \frac{\vec{f}^{(0)} \cdot \vec{w}_1}{\|\vec{w}_1\|^2}\, \vec{w}_1 - \frac{p_1}{\|\vec{w}_1\|^2}\, \vec{w}_1 \right) \]

So we have derived the desired update formula:

\[ \vec{f}^{(1)} = \vec{f}^{(0)} - \beta_1 \vec{w}_1 \tag{2.17} \]
\[ \beta_1 = \frac{\vec{f}^{(0)} \cdot \vec{w}_1 - p_1}{\|\vec{w}_1\|^2} \tag{2.18} \]

A similar formula holds when computing the next estimate f~ (2) from f~ (1): we only have to replace w~ 1 by w~ 2, and p1 by p2.
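
A single update of this kind is easy to express in code. The sketch below (our own example with two arbitrary lines) applies (2.17)-(2.18) alternately to the two lines and converges to their intersection point, as in Figure 2.7.

    import numpy as np

    def kaczmarz_step(f, w, p):
        # Project the estimate f onto the line (hyperplane) w . f = p, cf. (2.17)-(2.18).
        beta = (f @ w - p) / (w @ w)
        return f - beta * w

    w1, p1 = np.array([3.0, 1.0]), 6.0    # line L1: 3 f1 + f2 = 6
    w2, p2 = np.array([1.0, 2.0]), 7.0    # line L2: f1 + 2 f2 = 7
    f = np.zeros(2)                       # initial point P in the origin
    for _ in range(25):
        f = kaczmarz_step(f, w1, p1)
        f = kaczmarz_step(f, w2, p2)
    print(f)                              # close to the intersection point (1, 3)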
Let us interpret the update formulas (2.17)-(2.18). For the example with N = 4, M = 4 the
weight matrix has the following form (check this):
 
\[ W = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{pmatrix} \tag{2.19} \]

The first two rows of the matrix correspond to the two row sums, and the last two rows of the matrix correspond to the two column sums. Let us look at the expression f~ (0) · w~ 1. Since w~ 1 = (1, 1, 0, 0) we find, using the definition of the inner product in formula (A.6) of the Appendix, that

\[ \vec{f}^{(0)} \cdot \vec{w}_1 = 1 \cdot f_1^{(0)} + 1 \cdot f_2^{(0)} + 0 \cdot f_3^{(0)} + 0 \cdot f_4^{(0)} = f_1^{(0)} + f_2^{(0)} \]

If we convert the 1D vector representation f~ (0) back to a 2D matrix representation, this means that f~ (0) · w~ 1 is just the first row sum of the matrix f (0); see Figure 2.6. Similarly, f~ (0) · w~ 2 is the second row sum of f (0). In the same way we find that f~ (0) · w~ 3 and f~ (0) · w~ 4 are the first and second column sums of the matrix f (0).
The expression ‖w~ 1‖² takes a very simple form (see formula (A.7)):

\[ \|\vec{w}_1\|^2 = \vec{w}_1 \cdot \vec{w}_1 = 1 \cdot 1 + 1 \cdot 1 + 0 \cdot 0 + 0 \cdot 0 = 2 \]

This means that the expression β1 is obtained as follows: compute the first row sum of the matrix
f (0) , subtract the row sum p1 from it, and divide the result by 2. As a result, the first row sum
of f (1) will have the correct value p1 , because f (1) corresponds to a point on the line L1 ; see
Figure 2.8. This can be verified by explicit computation:

f~ (1) · w
~ 1 = f~ (0) − β1 w~1·w ~ 1 = f~ (0) · w
~ 1 − β1 w
~1·w
~1
= f~ (0) · w
~ 1 − (f~ (0) · w
~ 1 − p1 ) = p1


Notice that β1 = 0 when the first row sum of f (0) is already equal to the row sum p1 .
Now let us see what (2.17) means for each of the components:
\[
\begin{aligned}
(f_1^{(1)}, f_2^{(1)}, f_3^{(1)}, f_4^{(1)}) &= (f_1^{(0)}, f_2^{(0)}, f_3^{(0)}, f_4^{(0)}) - \beta_1 (w_{11}, w_{12}, w_{13}, w_{14}) \\
&= (f_1^{(0)}, f_2^{(0)}, f_3^{(0)}, f_4^{(0)}) - \beta_1 (1, 1, 0, 0) \\
&= (f_1^{(0)} - \beta_1, f_2^{(0)} - \beta_1, f_3^{(0)}, f_4^{(0)})
\end{aligned}
\]
In terms of the matrix representation of the image this simply means that β1 has to be subtracted
from each element of the first row of f (0) . In the same way we find that:

• matrix f (2) is obtained by subtracting β2 from each element of the second row of f (1)
• matrix f (3) is obtained by subtracting β3 from each element of the first column of f (2)
• matrix f (4) is obtained by subtracting β4 from each element of the second column of f (3)

2.3.3 Kaczmarz reconstruction method: The general case


Now we go back to the general case where we have N pixels and M rays. That is, we have M equations of the form (2.5), one for each ray:

\[
\begin{aligned}
w_{11} f_1 + w_{12} f_2 + \ldots + w_{1N} f_N &= p_1 \\
w_{21} f_1 + w_{22} f_2 + \ldots + w_{2N} f_N &= p_2 \\
&\;\;\vdots \\
w_{M1} f_1 + w_{M2} f_2 + \ldots + w_{MN} f_N &= p_M
\end{aligned}
\]

If we write w~ i = (wi1, wi2, . . . , wiN) for i = 1, 2, . . . , M, and f~ = (f1, f2, . . . , fN), then, using the inner product notation, we can write this system of equations as:

\[ \vec{w}_1 \cdot \vec{f} = p_1, \quad \vec{w}_2 \cdot \vec{f} = p_2, \quad \ldots, \quad \vec{w}_M \cdot \vec{f} = p_M \]

Each equation will correspond to a line Li . We project the initial point on line L1 , then project
on line L2 , etc, and finally project on line LM . Starting with an initial estimate f~ (0) , we get
successive estimates f~ (1) , f~ (2) , f~ (3) , . . ., f~ (M ) . The formulas to get these estimates have the
same form as the formulas (2.17)-(2.18) for the two-dimensional case:
\[ \vec{f}^{(i)} = \vec{f}^{(i-1)} - \beta_i \vec{w}_i, \qquad i = 1, 2, \ldots, M \tag{2.20} \]
\[ \beta_i = \frac{\vec{f}^{(i-1)} \cdot \vec{w}_i - p_i}{\|\vec{w}_i\|^2} \tag{2.21} \]


After all M lines have been processed, we repeat the process, and project again on line L1 , then
on L2 , etc.
In practice we don’t want to carry out an infinite number of projections to reach the exact solution.
Therefore, we introduce so-called stopping criteria. That is, we will stop the iteration when a
maximum number of iterations MAX_ITER has been reached, or at the moment when the relative
difference between two successive solutions is smaller than a predefined small number ε, i.e.,
when
\[ \delta(\vec{f}^{\,((k+1)M)}, \vec{f}^{\,(kM)}) = \frac{\|\vec{f}^{\,((k+1)M)} - \vec{f}^{\,(kM)}\|}{N} < \varepsilon \tag{2.22} \]

Here ‖ · ‖ again denotes the length of a vector (see formula (A.5) of the Appendix).
A pseudocode of the algorithm can be found in Algorithm 2.1.

Algorithm 2.1 Kaczmarz method for reconstruction of a 2D image with N = n2 pixels from M ray sums.
1: INPUT: ray sums p1, p2, . . . , pM for M rays with weight vectors w~ 1, w~ 2, . . . , w~ M
2: INPUT: maximum iteration number MAX_ITER; accuracy threshold ε
3: OUTPUT: vector representation f~ = (f1, f2, . . . , fN) of the reconstructed image
4: Initialize vector f~ (0)
5: for k = 1 to MAX_ITER do
6:   for i = 1 to M do
7:     compute f~ (i) from f~ (i−1) according to formulas (2.20)-(2.21)
8:     impose constraints (optional)
9:   end for
10:  if δ(f~ (M), f~ (0)) < ε then {compute δ according to formula (2.22)}
11:    break
12:  end if
13:  f~ (0) ← f~ (M) {reset f~ (0)}
14: end for
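
A direct translation of Algorithm 2.1 into Python/NumPy could look as follows (a sketch with our own naming; the optional constraint of line 8 is implemented here as a simple non-negativity clip).

    import numpy as np

    def kaczmarz(W, p, max_iter=100, eps=1e-6, nonneg=False):
        # Reconstruct f from M ray sums p, given the M x N weight matrix W.
        M, N = W.shape
        f = np.zeros(N)                     # initial estimate f^(0)
        norms2 = np.sum(W * W, axis=1)      # ||w_i||^2 for every ray, precomputed
        for _ in range(max_iter):
            f_old = f.copy()
            for i in range(M):
                beta = (f @ W[i] - p[i]) / norms2[i]    # formula (2.21)
                f = f - beta * W[i]                     # formula (2.20)
                if nonneg:
                    f = np.maximum(f, 0.0)              # optional constraint (line 8)
            if np.linalg.norm(f - f_old) / N < eps:     # stopping criterion (2.22)
                break
        return f

    # The 2 x 2 example: row sums 3 and 7, column sums 4 and 6.
    W = np.array([[1, 1, 0, 0],
                  [0, 0, 1, 1],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1]], dtype=float)
    p = np.array([3.0, 7.0, 4.0, 6.0])
    print(kaczmarz(W, p).reshape(2, 2))     # approaches [[1, 2], [3, 4]], the minimum-norm solution

Starting from the zero image, the iterates stay in the row space of W, which is why this underdetermined example converges to the minimum-norm solution.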

2.3.4 Remarks on the Kaczmarz method


Speed of convergence. The speed at which the projections converge to point S depends on the
angles between the lines. If the angle is very small, convergence is slow. For the case of two
lines, if the angle is 90◦ we reach point S in at most two steps (check!). If the angle is 0◦ , i.e.,
the lines are parallel, then obviously there is no solution. In this case B1 = A1 = B2 = A2 etc.

Choice of the initial point. We can always choose P to be located at the origin (0, 0, . . . , 0)
(this means that all elements of the initial image f (0) are zero). But sometimes we have a good
initial estimate, which is already quite close to point S. If this is the case, the time to convergence
will be decreased significantly.

The case M < N . When we have fewer equations than unknowns the solution will in general not be unique, but an infinite number of solutions is possible (compare our simple example in Section 2.3.1). But Kaczmarz’ method still works: it will converge to a solution f~ (s) such that ‖f~ (0) − f~ (s)‖ is minimized.

The case M > N . In practice we often have more equations than unknowns, where typically
the set of equations is not consistent. Graphically, this means that the lines which represent the
equations do not have a unique intersection point. Using Kaczmarz’ method the “solution” does
not converge to a unique point, but will oscillate in the neighbourhood of the intersection points.
Nevertheless, using the stopping criterion with a not too small value of ε guarantees that we still
get fairly close to the region which contains the intersection points; see Figure 2.9.

Figure 2.9: Example of three equations which are inconsistent. The three lines L1 , L2 , L3 do not
intersect in a unique point. The “solution” wanders around in the region of the intersections.

A priori information. In Kaczmarz’ method it is possible to incorporate some a priori information about the object one is trying to reconstruct. For example, if we know that the image f (x, y) is nonnegative, then we can easily set each negative component of f~ (k) to zero during the iteration to enforce this non-negativity. This can be done in line 8 of Algorithm 2.1.

2.3.5 Kaczmarz method for binary images


For the case of binary images, where pixel values take only two values, say 0 and 1, we can
obtain fairly good reconstruction results with far fewer projections than for the case of grey scale
images. Here we will consider projections for only two projection angles: the horizontal and
vertical direction. This means that projections correspond to sums of rows or columns of the
image.


Let us look at the pseudocode of Algorithm 2.1 and see how it simplifies for the present case.

1. Line 1: The ray sums pi , i = 1, 2, . . . , M now consist of two sets: n column sums and n row sums; so M = 2n. Let us assume that the first n values pi , i = 1, 2, . . . , n correspond to the sums of the n columns of the image f : we denote these column sums by pcol_c , c = 1, 2, . . . , n. The second n values pi , i = n + 1, n + 2, . . . , 2n correspond to the sums of the rows r = 1, 2, . . . , n of f : we denote the row sums by prow_r , r = 1, 2, . . . , n. Since we know that the sums are over rows and columns, we don’t have to store the direction vectors explicitly.
2. Line 3: Instead of a vector representation, we will use a matrix representation of images.
3. Line 4: As initialization we can define an n × n image f (init) with all elements equal to
zero.
4. Lines 5-9: This do-loop can now be separated into two separate do-loops: one do-loop
over the n columns and one do-loop over the n rows of f .
5. Line 7: Let us look at the update formulas (2.20)-(2.21). Just as we saw in the case M = 4, N = 4 above (see end of Section 2.3.2), for i = 1, 2, . . . , n the vector w~ i has nonzero elements only for column i of the image, so the expression f~ (i−1) · w~ i equals the sum of column i of the image f (i−1). Similarly, for i = n + 1, n + 2, . . . , 2n the vector w~ i has nonzero elements only for row i − n of the image, so the expression f~ (i−1) · w~ i equals the sum of that row of the image f (i−1).
The expression ‖w~ i‖² in the denominator of formula (2.21) is now very simple: since each w~ i has n ones with all other entries zero, ‖w~ i‖² = w~ i · w~ i = n.
This means that the expression βi in (2.21) is obtained as follows: compute the ith column
(or row) sum of f (i−1) , subtract the ith column sum pcoli (or row sum prowi ) from it, and
divide by n. Then (2.20) means the following: subtract βi from each element of the ith
column (or row). (Compare the discussion for the case M = 4, N = 4 at the end of
Section 2.3.2.)
6. Line 8: we know that pixel values cannot become negative; also, for binary images the
values can never be larger than 1. Therefore we can impose these constraints for every
pixel (r, c) by the formula
\[ f_{r,c}^{(i)} \leftarrow \min(\max(f_{r,c}^{(i)}, 0), 1) \tag{2.23} \]

7. Line 10: For two n × n binary images f and g, the difference between f and g is simply the number of pixels where f and g differ. As relative difference we can take the fraction of pixels where f and g differ. This means that for the relative difference of f and g we can in our case simply take the sum of the absolute values of f − g over all pixels, divided by n²:

\[ \delta(f, g) = \frac{1}{n^2} \sum_{r=1}^{n} \sum_{c=1}^{n} |f_{r,c} - g_{r,c}| \]


8. After termination of the algorithm we will end up with pixel values between 0 and 1.
Since we want a binary image as reconstruction we perform a final rounding to the nearest
integer.

A pseudocode of the algorithm for the binary case with only horizontal and vertical projections
can be found in Algorithm 2.2. Of course, we can extend the algorithm by adding more projection
directions, for example along diagonal directions. This means that additional do-loops have to
be added.

Algorithm 2.2 Kaczmarz method for reconstruction of a 2D binary image with N = n2 pixels from M = 2n ray sums (n column sums and n row sums).
1: INPUT: column sums pcol_1, pcol_2, . . . , pcol_n and row sums prow_1, prow_2, . . . , prow_n of an unknown image f
2: INPUT: maximum iteration number MAX_ITER; accuracy threshold ε
3: OUTPUT: matrix representation (f^(rec)_{r,c}, r = 1, . . . , n; c = 1, . . . , n) of the reconstructed image
4: Initialize matrix f (init) {e.g., by setting all elements to zero}
5: for k = 1 to MAX_ITER do
6:   f (0) ← f (init)
7:   for c = 1 to n do {process all columns}
8:     βc = ((sum of column c of f (c−1)) − pcol_c) / n
9:     f^(c)_{r,c} = f^(c−1)_{r,c} − βc ∀r = 1, . . . , n {subtract βc from each element of column c of f (c−1)}
10:    f^(c)_{r,c} ← min(max(f^(c)_{r,c}, 0), 1) ∀r = 1, 2, . . . , n {impose constraints}
11:  end for
12:  f (0) ← f (n) {reset f (0) to output of column loop}
13:  for r = 1 to n do {process all rows}
14:    βr = ((sum of row r of f (r−1)) − prow_r) / n
15:    f^(r)_{r,c} = f^(r−1)_{r,c} − βr ∀c = 1, 2, . . . , n {subtract βr from each element of row r of f (r−1)}
16:    f^(r)_{r,c} ← min(max(f^(r)_{r,c}, 0), 1) ∀c = 1, 2, . . . , n {impose constraints}
17:  end for
18:  if (1/n²) Σ_{r=1}^{n} Σ_{c=1}^{n} |f^(n)_{r,c} − f^(init)_{r,c}| < ε then
19:    break
20:  end if
21:  f (init) ← f (n) {reset f (init) to output of row loop}
22: end for
23: round elements of output f (k) to nearest integer
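
For completeness, here is a compact Python sketch of Algorithm 2.2, working directly on the matrix representation (the test image is our own choice; its row and column sums happen to determine it uniquely).

    import numpy as np

    def kaczmarz_binary(p_col, p_row, max_iter=200, eps=1e-6):
        # Reconstruct an n x n binary image from its n column sums and n row sums.
        n = len(p_col)
        f = np.zeros((n, n))                                  # f^(init)
        for _ in range(max_iter):
            f_init = f.copy()
            for c in range(n):                                # column sweep (lines 7-11)
                beta = (f[:, c].sum() - p_col[c]) / n
                f[:, c] = np.clip(f[:, c] - beta, 0.0, 1.0)   # update and constraints
            for r in range(n):                                # row sweep (lines 13-17)
                beta = (f[r, :].sum() - p_row[r]) / n
                f[r, :] = np.clip(f[r, :] - beta, 0.0, 1.0)
            if np.abs(f - f_init).sum() / n**2 < eps:         # relative difference (line 18)
                break
        return np.rint(f).astype(int)                         # final rounding (line 23)

    original = np.array([[0, 1, 1],
                         [0, 1, 0],
                         [1, 1, 1]])
    print(kaczmarz_binary(original.sum(axis=0), original.sum(axis=1)))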

2.4 Testing the algorithm


In order to test the accuracy of the reconstruction algorithms, one needs a reference image. This
image should consist of simple objects in order to be able to compute the projections of it. The
so-called Shepp-Logan “head phantom” as described in Shepp and Logan (1974) is used for this
purpose. This image consists of 10 ellipses as shown in Fig. 2.10. The parameters of the ellipses


are given in Table 2.1. (Compared to Shepp and Logan (1974), we use larger values of the refractive index.)

Figure 2.10: The Shepp-Logan “head phantom”.

Table 2.1: Parameters of the Shepp-Logan head phantom.


Centre             Major axis   Minor axis   Rotation angle   Refractive index
(0, 0)             0.92         0.69         90               90
(0, −0.0184)       0.874        0.6624       90               −40
(0.22, 0)          0.31         0.11         72               −20
(−0.22, 0)         0.41         0.16         108              −20
(0, 0.35)          0.25         0.21         90               10
(0, 0.1)           0.046        0.046        0                10
(0, −0.1)          0.046        0.046        0                10
(−0.08, −0.605)    0.046        0.023        0                10
(0, −0.605)        0.023        0.023        0                10
(0.06, −0.605)     0.046        0.023        90               10

An advantage of using a phantom as described above is that one can give analytical expressions
for the projections. The projections of an image consisting of several ellipses are simply the sum
of the projections for each of the ellipses because of the linearity of the projection process. Let
f (x, y) be a filled ellipse centered at the origin (see Fig. 2.11),
\[ f(x, y) = \begin{cases} \rho & \text{for } \dfrac{x^2}{A^2} + \dfrac{y^2}{B^2} \le 1 \text{ (inside the ellipse)}, \\[4pt] 0 & \text{otherwise (outside the ellipse)}, \end{cases} \tag{2.24} \]


where A and B denote the half-lengths of the major and minor axis respectively, and ρ denotes
the refractive index.
The projections of such an ellipse at an angle θ are given by

\[ R_\theta(t) = \begin{cases} \dfrac{2\rho A B}{a^2(\theta)} \sqrt{a^2(\theta) - t^2} & \text{for } |t| \le a(\theta), \\[4pt] 0 & \text{for } |t| > a(\theta), \end{cases} \tag{2.25} \]

where a²(θ) = A² cos² θ + B² sin² θ. The projections for an ellipse centred at (x1, y1) and rotated over an angle α are easily obtained from this.
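
Formula (2.25) is straightforward to evaluate. The sketch below (our own function names and parameter values) computes the projection profile of one centred, unrotated ellipse and checks that the profile integrates to ρπAB, the total mass of the ellipse.

    import numpy as np

    def ellipse_projection(t, theta, A, B, rho):
        # Analytic parallel projection (2.25) of a filled ellipse with half-axes A, B.
        a2 = A**2 * np.cos(theta)**2 + B**2 * np.sin(theta)**2
        proj = np.zeros_like(t, dtype=float)
        inside = t**2 <= a2
        proj[inside] = 2.0 * rho * A * B * np.sqrt(a2 - t[inside]**2) / a2
        return proj

    A, B, rho = 0.5, 0.25, 1.0
    t = np.linspace(-1.0, 1.0, 2001)               # ray offsets
    prof = ellipse_projection(t, np.deg2rad(30.0), A, B, rho)
    print(prof.sum() * (t[1] - t[0]))              # approximately rho * pi * A * B
    print(np.pi * rho * A * B)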


Figure 2.11: Projection of an ellipse at an angle θ.

We assume the projections to be available for angles θk uniformly distributed over the interval
[0, π], the total number of angles being denoted by Nangles , and a total number of Nrays parallel
rays per profile with a constant step size. Reconstruction is performed on a 256 × 256 grid.

2.4.1 Results
Here we show reconstruction results for the filtered backprojection algorithm, which is much
faster than ART and gives higher quality reconstructions. The algorithm was applied to pro-
jections of the Shepp-Logan head phantom. The projection data were generated using varying
values of Nangles and Nrays . The reconstruction grid was of size 256 × 256.
The results for Nrays = 256 with varying values of Nangles are shown in Figure 2.12. The results
for Nangles = 128 with varying values of Nrays are shown in Figure 2.13. The images on the


Nrays = 256, Nangles = 8

Nrays = 256, Nangles = 32

Nrays = 256, Nangles = 128

Figure 2.12: Reconstruction of the Shepp-Logan head phantom. Projection data were generated
for varying values of Nangles and Nrays = 256 lines per profile. The reconstruction grid was of
size 256 × 256. In each row, the right image is a contrast-enhanced version of the reconstruction
on the left.


Nrays = 8, Nangles = 128

Nrays = 32, Nangles = 128

Nrays = 128, Nangles = 128

Figure 2.13: Reconstruction of the Shepp-Logan head phantom. Projection data were generated
for Nangles = 128 with varying values of Nrays . The reconstruction grid was of size 256 × 256.
In each row, the right image is a contrast-enhanced version of the reconstruction on the left.


left show the original reconstructions; the right images are contrast-enhanced versions of the
images on the left.
When Nrays is very low, the image looks blurred, due to undersampling of the rays per profile.
When Nangles is very low, the image looks “angular”, due to too few profiles. Note how the quality
of the reconstruction improves as we increase the value of Nangles (more profiles) or increase the
value of Nrays (more lines per profile). Insufficiency of the data, either by undersampling a profile
or by taking the number of profiles too small, causes aliasing artifacts such as Gibbs phenomena,
streaks (lines which are tangent to discontinuities, like the ellipse boundaries) and Moiré patterns
(when the display resolution is too small); see Kak and Slaney (1988).


2.A Linear algebra


2.A.1 Matrix-vector operations
Definition of vector

A vector ~x is a list of elements x1 , x2 , . . . , xN . The number N is called the dimension of the


vector. We can write the list in row notation or column notation (abstractly, both represent the
same vector). For example, if N = 3:
\[ \vec{x} = (x_1, x_2, x_3) \qquad \text{row vector} \tag{A.1} \]
\[ \vec{x} = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} \qquad \text{column vector} \tag{A.2} \]

Addition of vectors

We can add two vectors by adding the corresponding elements:


     
\[ \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} + \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = \begin{pmatrix} x_1 + y_1 \\ x_2 + y_2 \\ x_3 + y_3 \end{pmatrix} \tag{A.3} \]

Multiplication of a vector by a constant

We multiply a vector by a real number λ by doing it for each element:


   
\[ \lambda \vec{w} = \lambda \begin{pmatrix} w_1 \\ w_2 \\ w_3 \end{pmatrix} = \begin{pmatrix} \lambda w_1 \\ \lambda w_2 \\ \lambda w_3 \end{pmatrix} \tag{A.4} \]
For example, we can show that every point P on a line through the vector w~ is of the form λ w~ for some λ ∈ R; see Figure 2.14(a).

Norm of a vector

The norm of a vector ~x = (x1 , x2 , . . . , xN ), written as k~x k, equals its length (see Figure 2.14(b)):
\[ \|\vec{x}\| = \sqrt{x_1^2 + x_2^2 + \ldots + x_N^2} \tag{A.5} \]

Inner product of two vectors

If ~x = (x1, x2, . . . , xN) and ~y = (y1, y2, . . . , yN) are two vectors of the same length, the inner product (Dutch: “inproduct”) of ~x and ~y, denoted by ~x · ~y, is a real number defined as follows:

\[ \vec{x} \cdot \vec{y} = x_1 y_1 + x_2 y_2 + \ldots + x_N y_N \tag{A.6} \]


Figure 2.14: (a): Every point P on a line through the vector w~ is of the form λ w~ for some λ ∈ R. (b): the norm ‖~x‖ of a vector ~x = (x1, x2) is equal to its length √(x1² + x2²). (c): the inner product of two vectors ~x and ~y is equal to ‖~x‖ ‖~y‖ cos α.

For example,
(1, 0) · (1, 1) = 1 · 1 + 0 · 1 = 1
(1, 1) · (1, 1) = 1 · 1 + 1 · 1 = 2
(1, 0) · (0, 1) = 1 · 0 + 0 · 1 = 0

The second and third equations show some general properties of inner products:

• the inner product of a vector with itself is equal to the square of the norm of the vector:
k~x k2 = ~x · ~x (A.7)

• if two vectors are perpendicular, their inner product is zero

In fact, it can be shown that


~x · ~y = k~x k k~y k cos α (A.8)
where α is the angle between the two vectors ~x and ~y .

Definition of matrix

A matrix A is a rectangular array of elements of the form:


 
\[ A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1N} \\ a_{21} & a_{22} & \cdots & a_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ a_{M1} & a_{M2} & \cdots & a_{MN} \end{pmatrix} \tag{A.9} \]

The number M of rows and the number N of columns are called the dimensions of the matrix.


Addition of matrices

We can add two matrices by adding the corresponding elements. For example,
     
\[ \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} + \begin{pmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \end{pmatrix} = \begin{pmatrix} a_{11} + b_{11} & a_{12} + b_{12} & a_{13} + b_{13} \\ a_{21} + b_{21} & a_{22} + b_{22} & a_{23} + b_{23} \\ a_{31} + b_{31} & a_{32} + b_{32} & a_{33} + b_{33} \end{pmatrix} \]
Also, we can multiply a matrix by a constant λ by doing it for each element, just as in the case
of a vector.

Multiplying a matrix and a vector

A matrix A of dimension M × N can be multiplied with a vector ~x of dimension N, by taking the inner product of each row of the matrix A with the vector ~x. For example, if M = 3, N = 2,

\[ \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} a_{11} x_1 + a_{12} x_2 \\ a_{21} x_1 + a_{22} x_2 \\ a_{31} x_1 + a_{32} x_2 \end{pmatrix} \]

Here the vector is multiplied “on the right”. Since a vector can be regarded as a matrix of dimensions N × 1, it is also possible to multiply a vector “on the left”. For example,

\[ \begin{pmatrix} x_1 & x_2 \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix} = (x_1 a_{11} + x_2 a_{21},\; x_1 a_{12} + x_2 a_{22},\; x_1 a_{13} + x_2 a_{23}) \]

2.A.2 Solving linear equations


Let us consider again the case of four pixels and four projections of Section 2.3.1.
We can write the equations (2.6)–(2.9) in matrix form:
    
\[ \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{pmatrix} \begin{pmatrix} f_1 \\ f_2 \\ f_3 \\ f_4 \end{pmatrix} = \begin{pmatrix} 3 \\ 7 \\ 4 \\ 6 \end{pmatrix} \tag{A.10} \]

This is an equation of the general form


A ~x = ~b (A.11)
where A is a matrix, ~x is a vector of unknowns and ~b is a known vector.
As we already know, a possible solution is:
 
\[ \vec{x}^{\,*} = \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix} \tag{A.12} \]


The question is: how do we find all solutions?


Suppose we can find a vector ~n such that
 
\[ A \vec{n} = \vec{0} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} \tag{A.13} \]

We will say that ~n is a vector in the null space of the matrix A.


Then any vector of the form
~x = ~x ∗ + λ ~n λ∈R (A.14)
is also a solution of equation (A.10). To prove this, we need the following facts, which you can
easily verify:

1. If we multiply a matrix with a sum of two vectors we can get the same result by multiplying
the matrix with each vector separately, and then adding the results, that is,

A (~x 1 + ~x 2 ) = A ~x 1 + A ~x 2

2. If we multiply a matrix with a vector, which is itself multiplied with a scalar value λ, we
can get the same result by multiplying the matrix with the vector first, and then multiplying
the result by λ, that is,
A (λ ~x ) = λ A ~x

Now we can prove that the vector ~x as defined in equation (A.14) is also a solution of equa-
tion (A.11):

A ~x = A (~x ∗ + λ ~n ) = A ~x ∗ + A (λ ~n ) = A ~x ∗ + λ A ~n = A ~x ∗ + λ 0 = A ~x ∗ = ~b . (A.15)

That is, we can always multiply ~n with some real number λ and add it to ~x ∗ , and the result will
still be a solution of equation (A.10).
Of course, the vector ~n where all elements are zero always satisfies equation (A.13). We call this
the trivial solution. It does not lead to a solution which is different from the solution ~x ∗ . To see
if there are nontrivial vectors ~n which satisfy equation (A.13) for the case of equation (A.10),
we have to solve the following system of equations:

n1 + n2 = 0 (A.16)
n3 + n4 = 0
n1 + n3 = 0
n2 + n4 = 0


When solving such sets of equations we can always add two equations to get a new one which
replaces one of the original equations. If we add equation 1 and 2 we get:

n1 + n2 + n3 + n4 = 0

But if we add equation 3 and 4 we get the same equation. This means that the set of equations is
linearly dependent.
We can take n1 = 1 (because when n1 = 0 we find from the equations that n2 = n3 = n4 = 0,
which would lead to the trivial solution). Then we find n2 = −1, n3 = −1, n4 = 1. So a
nontrivial solution of the equations (A.16) is:
 
\[ \vec{n} = \begin{pmatrix} 1 \\ -1 \\ -1 \\ 1 \end{pmatrix} \tag{A.17} \]

If we represent this vector as a 2 × 2 image we get

\[ n = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} \]

which is exactly what we already found in section 2.3.1.
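
These statements are easy to check numerically; a short sketch (our own code):

    import numpy as np

    A = np.array([[1, 1, 0, 0],
                  [0, 0, 1, 1],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1]], dtype=float)
    x_star = np.array([1.0, 2.0, 3.0, 4.0])        # the particular solution (A.12)
    n_vec = np.array([1.0, -1.0, -1.0, 1.0])       # the null vector (A.17)

    print(A @ n_vec)                               # [0. 0. 0. 0.]: n lies in the null space of A
    for lam in (0.0, 0.5, -1.0):
        print(A @ (x_star + lam * n_vec))          # always [3. 7. 4. 6.], cf. (A.14)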

2.A.3 Projection
We need a few properties of projection. Consider a line L through the vector w~, and another vector ~v, see Figure 2.15(a). The vector ~v is projected on the line L. Let the point of projection be B. Then we know that the length of the projected vector OB⃗ is equal to

\[ \|\vec{OB}\| = \|\vec{v}\| \cos\alpha \tag{A.18} \]

where α is the angle between ~v and w~. But we also know that

\[ \vec{OB} = \lambda \vec{w} \]

for some λ ∈ R. So

\[ \|\vec{OB}\| = \|\lambda \vec{w}\| = \lambda \|\vec{w}\|. \tag{A.19} \]
If we equate the two expressions (A.18) and (A.19), we find

\[ \lambda = \frac{\|\vec{v}\| \cos\alpha}{\|\vec{w}\|} \]


From equation (A.8) we know that

\[ \cos\alpha = \frac{\vec{v} \cdot \vec{w}}{\|\vec{v}\|\, \|\vec{w}\|} \]

Therefore we find the projection formula:

\[ \vec{OB} = \lambda \vec{w} \quad \text{with} \quad \lambda = \frac{\vec{v} \cdot \vec{w}}{\|\vec{w}\|^2} \tag{A.20} \]

Figure 2.15: (a): when a vector ~v is projected on a line L through the vector w~, then the length of the projected vector OB⃗ is equal to ‖~v‖ cos α, where α is the angle between ~v and w~. (b): a line L with equation w1 f1 + w2 f2 = p is perpendicular to the vector w~ = (w1, w2) and intersects the line through w~ at a distance p/‖w~‖ from the origin.

Now suppose we have a line L with equation w1 f1 + w2 f2 = p, where w~ = (w1, w2) is a fixed vector and f~ = (f1, f2) are the coordinates of the points on L; see Figure 2.15(b). We can also write this equation in the form

\[ \vec{w} \cdot \vec{f} = p \]
We will show that the vector w~ is perpendicular to the line L. Let A be the intersection of L with the line through w~. Then, since A is a point on the line through w~, the vector OA⃗ has the form

\[ \vec{OA} = \lambda \vec{w} = (\lambda w_1, \lambda w_2) \]


But since A is also on the line L, its coordinates satisfy the equation of the line L, so w1 f1 + w2 f2 = p where f1 = λ w1 and f2 = λ w2. This implies

\[
\begin{aligned}
w_1 f_1 + w_2 f_2 &= p \\
w_1 (\lambda w_1) + w_2 (\lambda w_2) &= p \\
\lambda (w_1^2 + w_2^2) &= p \\
\lambda \|\vec{w}\|^2 &= p \\
\lambda &= \frac{p}{\|\vec{w}\|^2}
\end{aligned}
\]

On the other hand, take any point P on the line L with coordinates f~ = (f1, f2), and project it perpendicularly on the line through w~. Let the point of projection be denoted by B. Then we know from the projection formula (A.20) that

\[ \vec{OB} = \mu \vec{w}, \qquad \mu = \frac{\vec{f} \cdot \vec{w}}{\|\vec{w}\|^2} = \frac{p}{\|\vec{w}\|^2}. \]

So we see that λ = µ and in fact the points A and B are the same!
The length of the vector OA⃗ is given by:

\[ \|\vec{OA}\| = \|\lambda \vec{w}\| = \frac{p}{\|\vec{w}\|^2}\, \|\vec{w}\| = \frac{p}{\|\vec{w}\|} \]

Conclusion:

The line with equation w~ · f~ = p is perpendicular to the line through w~ and intersects this line at a distance from the origin given by p / ‖w~‖.

Bibliography

Deans, S. R., 1983. The Radon Transform and Some of Its Applications. J. Wiley.

Herman, G. T., 1980. Image Reconstruction from Projections: the Fundamentals of Computer-
ized Tomography. Academic Press.

Kak, A. C., Slaney, M., 1988. Principles of Computerized Tomographic Imaging. IEEE Press,
New York.

Shepp, L. A., Kruskal, J. B., 1978. Computerized tomography: The new medical x-ray technol-
ogy. Am. Math. Monthly 85, 420–439.

Shepp, L. A., Logan, B. F., 1974. The Fourier reconstruction of a head section. IEEE Transactions
on Nuclear Science NS–21, 21–43.

Chapter 3

Stochastic dynamics: Markov chains

In this chapter we study systems that evolve in discrete timesteps according to some stochastic
(probabilistic) rules. We will restrict ourselves to a particular type of stochastic system, called
Markov Chains.
As an application we will discuss Google’s PageRank algorithm.

3.1 Markov chains


A Markov chain is a stochastic dynamical system, that is, a system that evolves in time according
to some probabilistic rules. We will assume that the system can be in a finite number of states
at discrete time steps t = 0, 1, 2, . . .. The characteristic property of a Markov chain is that the
probability that the system is in a certain state at time t depends only on the state probabilities at
the previous step t − 1.
To make things simple we will from now on consider a very simple Markov chain, the so-called
random walk problem. Assume a random walker that can be in two states, 1 and 2. At each
discrete time t = 0, 1, 2, . . . the walker can make a transition to another state or stay in the same
state, with a certain probability. We assume that the walker goes from state 1 to state 2 with
transition probability p, which means the walker stays in state 1 with probability 1 − p (where
0 < p < 1). Similarly, the walker goes from state 2 to state 1 with transition probability q, so


Figure 3.1: Random walk with two states 1 and 2. The transition probabilities are indicated on
the links.


stays in state 2 with probability 1 − q (where 0 < q < 1).


The probability of the walker to be in state i at time t is denoted by xi (t), i = 1, 2. We combine
these two probabilities into a vector ~x (t) = (x1 (t), x2 (t)), which is called the state vector of the
Markov chain. Note that the state at time t only depends on the state at the previous time t − 1:
this is the Markov property mentioned above. When ~x (0) is given, then the sequence ~x (1), ~x (2),
. . . is uniquely defined.
We present now (without proof) an important property of the random walk. When $t \to \infty$ the elements of the state vector become constant, independent of the initial condition: $x_1(t) \to x_1^\infty$, $x_2(t) \to x_2^\infty$. The probability vector $\vec{x}^{\,\infty} = (x_1^\infty, x_2^\infty)$ is called the limiting or equilibrium distribution of the walk.
The following equations relate the probabilities at time t to the probabilities at time t − 1:

(1 − p) x1 (t − 1) + q x2 (t − 1) = x1 (t) (3.1)
p x1 (t − 1) + (1 − q) x2 (t − 1) = x2 (t) (3.2)

Equation 3.1 can be understood as follows.

1. The walker is in state 1 at time t − 1 with probability x1 (t − 1), and can stay in this state
at the next step with probability 1 − p. This gives a contribution (1 − p) x1 (t − 1) to the
probability x1 (t) to be in state 1 at time t.

2. The walker is in state 2 at time t − 1 with probability x2 (t − 1), and can make a transition
from state 2 to state 1 at the next step with probability q. This gives a contribution q x2 (t −
1) to the probability x1 (t) to be in state 1 at time t.

3. Adding the two contributions gives Eq. 3.1.

Eq. 3.2 can be understood in the same way.


The equations (3.1) and (3.2) that relate the probabilities at time t with the probabilities at time
t − 1 can be written in matrix form as

~x (t) = A ~x (t − 1) (3.3)

where
$$A = \begin{pmatrix} 1-p & q \\ p & 1-q \end{pmatrix}, \qquad \vec{x}(t-1) = \begin{pmatrix} x_1(t-1) \\ x_2(t-1) \end{pmatrix}, \qquad \vec{x}(t) = \begin{pmatrix} x_1(t) \\ x_2(t) \end{pmatrix}$$
We call A the transition matrix or Markov matrix of the Markov chain.
Note that the sum of the elements in each column equals 1: $\sum_i A_{ij} = 1$.

The time evolution of the Markov chain can be studied by looking at iterations of (3.3):


~x (t) = A ~x (t − 1) = A2 ~x (t − 2) = A3 ~x (t − 3) = . . . (3.4)
where A2 = A A, A3 = A A A, etc.
The powers of A can be interpreted as follows.
Aij = probability to go from state j to state i in one step

(A2 )ij = probability to go from state j to state i in two steps

... ...

(An )ij = probability to go from state j to state i in n steps

3.1.1 Computing the equilibrium distribution


The equilibrium probabilities $x_i^\infty$ (or simply $x_i$) obey the following equations:

(1 − p) x1 + q x2 = x1 (3.5)
p x1 + (1 − q) x2 = x2 (3.6)
or in matrix form
A ~x = ~x (3.7)
where
$$A = \begin{pmatrix} 1-p & q \\ p & 1-q \end{pmatrix}, \qquad \vec{x} = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$$
Now there is the so-called Perron-Frobenius theorem, which says that the solution of Eq. 3.7
exists and is unique up to a scaling factor. This scaling factor is determined by the condition that
the sum of the probabilities equals 1,
x1 + x2 = 1. (3.8)

Rewriting the equations (3.5) and (3.6) that define the equilibrium probabilities we get:
−p x1 + q x2 = 0 (3.9)
p x1 − q x2 = 0 (3.10)
Clearly this is a singular (but consistent) linear system of equations, with infinitely many so-
lutions. However, using the normalization equation x1 + x2 = 1 we get the unique solution
(check!):
$$x_1 = \frac{q}{p+q}, \qquad x_2 = \frac{p}{p+q}. \qquad (3.11)$$
Computing the equilibrium distribution by hand may be difficult or impossible when the number
of states of the random walk becomes very large. In that case the computation can be done by
numerically iterating the equation ~x (t) = A~x (t − 1), until a desired accuracy is obtained.
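For the two-state walk this numerical iteration takes only a few lines of Matlab. The sketch below uses the example values p = 0.3 and q = 0.1 (chosen arbitrarily), iterates $\vec{x}(t) = A\vec{x}(t-1)$ until the change drops below a tolerance, and compares the result with the analytic answer (3.11):

% Equilibrium distribution of the two-state random walk by iteration
p = 0.3; q = 0.1;                 % example transition probabilities
A = [1-p, q; p, 1-q];             % Markov matrix (columns sum to 1)
x = [0.5; 0.5];                   % arbitrary initial distribution
for t = 1:1000
    x_new = A*x;
    if norm(x_new - x) < 1e-12    % stop when the iteration has converged
        break
    end
    x = x_new;
end
disp(x')                          % numerical equilibrium
disp([q/(p+q), p/(p+q)])          % analytic equilibrium, Eq. (3.11)

For these example values both lines print (0.25, 0.75).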


3.2 The PageRank algorithm


PageRank is an algorithm used by Google Search to rank websites in their search engine results.
This algorithm was developed by Larry Page and Sergey Brin, the founders of Google (Brin and
Page, 1998).
PageRank works on the link structure of the World Wide Web. It does not involve the actual
content of any Web pages or individual queries. It is frequently recomputed. When a query is
submitted, Google finds the pages on the Web that match that query and lists those pages in the
order of their PageRank. The basic idea is that a page has high rank if many other pages link to
it.
The global way PageRank works is as follows. It counts the number and quality of links to a web
page to determine a rough estimate of how important the website is. The underlying assumption
is that more important websites are likely to receive more links from other websites. See also
https://en.wikipedia.org/wiki/PageRank.
The PageRank algorithm can be connected to the concept of random walk in the following way
(see Fig. 3.2). Imagine surfing the Web, going from page to page by randomly choosing an
outgoing link from one page to get to the next. To avoid getting stuck in dead ends at pages
with no outgoing links or cycles of interconnected pages, choose a random page from the Web
a certain fraction of the time. The limiting probability that a random surfer visits any particular
page is its PageRank. The sum of the PageRanks of all Web pages will be equal to 1.

Figure 3.2: Simple network with three web pages.

3.2.1 PageRank: defining the Markov chain


The Markov chain that describes the random walker (or “surfer”) on a network with n nodes, is
defined as follows.


Figure 3.3: Larger web network. The PageRanks (equilibrium probabilities of the random
surfer) are indicated on the nodes.

First we construct the n × n connectivity matrix G of the network W (set of web pages):
$$G_{ij} = \begin{cases} 1 & \text{if node } j \text{ is linked to node } i \\ 0 & \text{otherwise} \end{cases} \qquad (3.12)$$

See Fig. 3.3 for an example of such a network. In practice, the value of n (number of web pages)
is very large (many billions). The number of nonzeros in G is the total number of hyperlinks in
W.
Next define the probability p that the random surfer follows an outgoing link. A typical value is
p = 0.85. This probability is divided equally over all outgoing links of a node. Let cj be the sum
of the j-th column of G (the out-degree of the j-th node, that is, the number of outgoing links)
and let δ = (1 − p)/n be the probability that the surfer chooses a random page (not following a
link). The introduction of δ is needed to avoid that the surfer becomes stuck on a page without
outgoing links or gets trapped in a cycle. Most of the elements of A are equal to δ, the probability
of jumping from one page to another without following a link. The value of δ is typically very
small. For example, when n = 4 · 109 and p = 0.85, then δ = 3.75 · 10−11 .
Putting the above ingredients together we can define the Markov matrix of the network:
$$A_{ij} = \begin{cases} p\, G_{ij}/c_j + \delta & \text{if } c_j \neq 0 \\ 1/n & \text{if } c_j = 0 \end{cases} \qquad (3.13)$$


3.2.2 Computing the PageRanks


Now that we have determined the Markov matrix A (according to Eq. (3.13)) of the random
surfer we can compute the PageRanks by the following steps.

1. Compute the equilibrium distribution of the random surfer by solving

A~x = ~x

where A is the Markov matrix of the surfer, and the vector $\vec{x}$ is normalized ($\sum_{i=1}^{n} x_i = 1$). The elements $x_i$ can be initialized arbitrarily, for example uniformly: $x_i = 1/n$, $i = 1, 2, \ldots, n$.

2. The equilibrium probabilities xi , i = 1, 2, . . . , n are the PageRanks of the nodes i.

3. Reorder the nodes in decreasing order of the xi -values (probabilities), resulting in a permuta-
tion j = perm(i), j = 1, 2, . . . , n of the n indices i.

4. This gives the list of nodes (websites) j = 1, 2, . . . , n with associated PageRanks xj in de-
creasing order (importance).

In practice, the matrices G and A are so big that they are never actually computed directly. In-
stead, one can take advantage of the particular (sparse) structure of the Markov matrix. Numeri-
cal routines dedicated to solving large sparse linear systems of equations exist for this purpose.
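As an illustration of these steps, here is a minimal Matlab sketch for a small, arbitrarily chosen example network of five nodes. It builds A directly from Eq. (3.13) and finds the equilibrium by repeated multiplication; for a realistic web-sized network one would of course not form A explicitly, but exploit the sparse structure of G as noted above.

% PageRank of a small example network, following Eq. (3.13)
G = [0 0 1 0 0;                    % G(i,j) = 1 if node j links to node i
     1 0 0 0 1;
     1 1 0 0 0;
     0 1 1 0 1;
     0 0 1 0 0];
n = size(G,1);
p = 0.85;                          % probability of following a link
delta = (1-p)/n;
c = sum(G,1);                      % out-degrees (column sums); node 4 is dangling
A = zeros(n);
for j = 1:n
    if c(j) ~= 0
        A(:,j) = p*G(:,j)/c(j) + delta;
    else
        A(:,j) = 1/n;              % dangling node: jump to a random page
    end
end
x = ones(n,1)/n;                   % uniform initial distribution
for t = 1:200
    x = A*x;                       % iterate towards the equilibrium distribution
end
[pr, order] = sort(x, 'descend');  % nodes in decreasing order of PageRank
disp([order, pr])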

Bibliography

Brin, S., Page, L., 1998. The anatomy of a large-scale hypertextual web search engine. Computer
Networks and ISDN Systems 30 (1-7), 107–117.

Chapter 4

Modelling and simulation of pattern formation

In this chapter we will look at models of pattern formation. In particular we will study how
models with very simple local interactions between components can lead to very complex and
intriguing spatial patterns. Many mathematical techniques are available to formulate such mod-
els, for example based on partial differential equations, which treat time and space as continuous
variables; examples of such equations are discussed in Chapter 8. In this chapter we will study
so-called Cellular Automata, which are discrete in time and space. They were first proposed
by Von Neumann and Ulam, see von Neumann (1966). A very popular cellular automaton is
Conway’s Game of Life, invented by the British mathematician John Horton Conway (Gard-
ner, 1970). For introductions and information about the history of cellular automata, see Schiff
(2007); Wolfram (2002); Chopard and Droz (1998). Cellular Automata modeling of biological
pattern formation is discussed in Deutsch and Dormann (2005).

4.1 Cellular Automata: Introduction


Cellular Automata (CA) provide a way of making simplified models of many natural phenomena.
In a CA model we divide the universe into small cells. Each cell can be in one of a finite
number of states, e.g., “live” or “dead”. Then we define simple rules, called transition rules,
to change the state of each cell, depending on the states of its neighbours. In this way we get
a new configuration of cells. Following biological terminology, this new configuration is called
the next “generation”. This process is then repeated by iterative application of the rules, so
we get a succession of generations1 . In principle this iteration can go on forever; in computer
implementations it goes on for a finite (possibly quite large) number of generations.
There is much interest in CA models among ecologists or biologists. This arises from the of-
ten observed fact that a simple model with relatively few states and relatively simple rules can
often produce very realistic and complicated behaviour. There is also a huge interest amongst
1
In more mathematical terms, we speak of a discrete-time dynamical system.


mathematicians and computer scientists, especially those concerned with the theory of artificial
intelligence.

4.2 A simple CA with majority voting


Consider a universe consisting of a subset of the plane, divided up into an n by n array of small
cells, indexed by (i, j), for i, j = 1, . . . , n. The states of these cells may be collectively repre-
sented by the elements A(i, j) of a matrix (two-dimensional array) A, for i, j = 1, . . . , n. The
immediate neighbours of cell (i, j) are defined to be the eight cells that touch it, see Figure 4.1.
So a cell (i, j) which is not on the boundary of the universe (that is, i = 2, . . . , n − 1 and
j = 2, . . . , n − 1) has immediate neighbours (i − 1, j − 1), (i − 1, j), (i − 1, j + 1), (i, j − 1),
(i, j + 1), (i + 1, j − 1), (i + 1, j) and (i + 1, j + 1).


Figure 4.1: Immediate neighbours (cells with a ?) of an interior cell (i, j).

The next step is to define different types of states for the cells and transition rules for obtaining
new generations. We assume that each cell can occupy only two states, called “alive” and “dead”.
To be specific, we define:
$$A(i,j) = \begin{cases} 1 & \text{if cell } (i,j) \text{ is alive} \\ 0 & \text{if cell } (i,j) \text{ is dead} \end{cases}$$
The transition rules for constructing each generation from the previous one are as follows (note
the rules are the same in each generation, that is, they do not depend on time):

• A live cell dies in the next generation if more than 4 of its immediate neighbours in the
present generation are dead; otherwise it stays alive.

• A dead cell comes to life in the next generation if more than 4 of its immediate neighbours
in the present generation are alive; otherwise it stays dead.

Note that these rules correspond to a form of majority voting.


Since the computer is finite, our universe has to be finite as well, so it will have boundaries. To
apply the above rules consistently, we therefore need to define the states of the cells that lie just
outside the boundary of the universe. This can be done in various ways. One possibility, which
is often used, is to assume the following:

• All cells outside the boundary of our universe are assumed to be dead.

This means that we can effectively extend the boundary of our universe by two rows and two
columns to include cells (0, j), (n+1, j), j = 0, . . . , n+1, and (i, 0), (i, n+1), i = 0, . . . , n+1,
all of which are defined to be dead.

Remark 4.1 Note that the new state of a cell depends on the states of all neighbouring cells in
the present generation. So, even if the states of some neighbours of a cell (i, j) already have been
updated, these new values should not be used in computing the new state of cell (i, j). ♦
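A single generation of this CA takes only a few lines of Matlab. The sketch below is one possible implementation (not the course code): it pads the universe with dead cells, counts the alive immediate neighbours of every cell in the present generation, and then applies the majority rule to all cells simultaneously, in line with Remark 4.1.

% One generation of the majority-voting CA (A is an n-by-n matrix of 0s and 1s)
function Anew = majority_step(A)
    n = size(A,1);
    P = zeros(n+2);                  % pad with dead cells outside the boundary
    P(2:n+1, 2:n+1) = A;
    % number of alive immediate neighbours of every cell
    nb = P(1:n,1:n)   + P(1:n,2:n+1)   + P(1:n,3:n+2) ...
       + P(2:n+1,1:n)                  + P(2:n+1,3:n+2) ...
       + P(3:n+2,1:n) + P(3:n+2,2:n+1) + P(3:n+2,3:n+2);
    % a live cell with more than 4 dead neighbours (fewer than 4 alive) dies,
    % a dead cell with more than 4 alive neighbours comes to life
    Anew = A;
    Anew(A == 1 & nb < 4) = 0;
    Anew(A == 0 & nb > 4) = 1;
end

Successive generations are then obtained by calling A = majority_step(A) repeatedly.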

4.2.1 Behaviour of the simple CA


Now we consider various initial cell configurations, and then see what happens in successive
generations. We start with very simple initial configurations, and make things more complex as
we go on.

Example 4.2 Initialization: There is only one live cell (i, j), and all other cells are dead.
Then in generation 2 the cell (i, j) will be dead as well. So after generation 2 all cells remain
dead. We say that we have reached a steady state of the discrete-time dynamical system.2 ♦

Example 4.3 Initialization: All cells are alive.


Then in generation 2 all interior cells will be alive as well. Cells on the boundary, but not on the
corners, have 5 alive neighbours, so remain alive as well. The four corner cells ((1, 1), (1, n),
(n, 1), (n, n)) have only three alive neighbours, so they will be dead in generation 2.
After generation 2 no more changes occur (check!). So again we have reached a steady state. ♦

Example 4.4 Initialization: Cells are chosen to be alive or dead in a random way.
Now we observe that a very irregular random initial state is smoothed out by the birth and death
rules, and eventually arrives at a pattern of alive and dead regions with rather smooth interfaces
between them. This is illustrated in Fig. 4.2, where alive cells are represented by black pixels
and dead cells by white pixels. In this example a steady state is reached after 14 generations.
For different initial random distributions with the same fraction of alive cells, the number of
generations needed to reach a steady state will in general be different as well. ♦
2
Or, in mathematical terms, a stationary solution or fixed point.


Figure 4.2: Evolution of cellular automaton with majority rule and n = 64 (generations 1, 2, 5, 8, 11 and 14 shown), initialized by a random distribution of alive cells (black pixels) and dead cells (white pixels). The initial fraction of alive cells is 0.60, while the fraction in the steady state is 0.79.


4.3 Conway’s Game of Life


Now we consider a CA which has the same states as before, i.e., 1 and 0, standing for “alive”
and “dead”, respectively. But we modify the transition rules a bit. Again each cell looks at its
eight cells immediately around it. Here are the rules.

• an alive cell dies if it has more than 3 or less than 2 alive neighbours

• a dead cell becomes alive if it has 3 alive neighbours

• in all other cases the state does not change

The first rule is inspired by the biological phenomena of “isolation” and “over-crowding”, re-
spectively. Again we may assume that all cells outside the boundary of our universe are dead.
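In the same spirit as the majority-voting sketch given earlier, one possible vectorized Matlab implementation of a single Game of Life generation (not the course code) uses conv2 to count the alive neighbours, with cells outside the array treated as dead:

% One Game of Life generation (A is an n-by-n matrix of 0s and 1s)
function Anew = life_step(A)
    nb = conv2(A, [1 1 1; 1 0 1; 1 1 1], 'same');   % alive-neighbour counts
    % live cells survive with 2 or 3 alive neighbours,
    % dead cells become alive with exactly 3 alive neighbours
    Anew = double((A == 1 & (nb == 2 | nb == 3)) | (A == 0 & nb == 3));
end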
Now something spectacular happens: although the change of transition rules seems not to be very large, the number of possible dynamical behaviours of our new CA is extremely rich.
You can have spatial configurations which are static, i.e., they never change (stationary solutions
or “still lives”). Some configurations lead to periodic solutions (“blinkers”, “oscillators”), i.e.,
the configuration repeats itself after a fixed number P of generations; P is called the period of
the periodic solution. Some configurations glide across the plane (“gliders”, “spaceships”); or
eat the gliders; or throw off the gliders like waste. Some configurations eventually reach a steady
state, some keep on changing, sometimes even growing to infinite sizes.
Let us again consider a number of examples. See also http://en.wikipedia.org/wiki/Conway's_Game_of_Life for some nice animations.

Example 4.5 [Still lives] Initialization: the configurations shown in Fig. 4.3 are all examples of
static solutions (check!). ♦

Figure 4.3: Still lives (block, boat, beehive): initial configurations which are static under Game of Life-rules. Black cells are alive, the other cells are dead.


Example 4.6 [Oscillators] The configurations shown in Fig. 4.4 are examples of periodic solu-
tions. After two generations the pattern repeats itself: the solution has period 2. ♦


Figure 4.4: Blinker: configuration which repeats itself under Game of Life-rules. Black cells
are alive, the other cells are dead. The patterns in generations 1 and 2 are indicated. The period
is 2, so the patterns in generation 2n − 1, n = 1, 2, . . . are the same; the same holds for the
patterns in generation 2n, n = 1, 2, . . ..

Example 4.7 [Glider] Initialization: the initial configuration consists of 5 alive cells in a sea of
dead cells, as shown in Fig. 4.5.
The generations 2, 3 and 4 are all different. But in generation 5, the pattern is the same as in generation 1, apart from a translation in the south-east direction. This means that, when the
iteration is continued from this point on, we get the same sequence of 4 patterns as before, apart
from the motion in the south-east direction. When this is animated, it looks like a crawling
animal slowly sliding down a slope, hence the name “glider”. ♦


Figure 4.5: Glider: evolution of cellular automaton with Game of Life-rule. Black cells are alive, the other cells are dead. After 4 generations the pattern repeats itself, apart from a translation in the south-east direction.


Example 4.8 [Gosper glider gun] Conway originally conjectured that no pattern can grow for-
ever, i.e., that for any initial configuration with a finite number of alive cells, the population
cannot grow beyond some finite upper limit. However, this conjecture turned out to be false.
In 1970, a team from the Massachusetts Institute of Technology, led by Bill Gosper, came up
with a configuration now known as the “Gosper glider gun”. It is shown in Fig. 4.6. When
this configuration is evolved under the Game of Life-rules, it “produces” an infinite number
of gliders. The first glider is produced on the 15th generation, and another glider every 30th
generation from then on. ♦

Figure 4.6: Gosper glider gun: produces an infinite number of gliders under Game of Life-rules.

4.4 Variations and generalizations


Variations on Game of Life. New life-like cellular automata may be obtained by:

• Changing the transition rules.

• Changing the grid. For example, instead of a square grid, a hexagonal grid can be used.

• Changing the grid dimension. Instead of a 2D grid, a 1D or 3D grid can be used.

• Changing the number of states. Instead of two states (live and dead), three or more states
per cell can be used.

1-D Cellular Automata. In 1-D, there are $2^3 = 8$ possible configurations for a cell and its two immediate neighbors. The rule defining the cellular automaton must specify the resulting state for each of these possibilities, so there are $2^8 = 256$ possible 1-D cellular automata.
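As an illustration, the sketch below evolves one such elementary 1-D automaton. The rule number 90 used here is only an example (any value from 0 to 255 encodes a rule), and cells outside the row are taken to be dead:

% Elementary 1-D cellular automaton (example: rule 90)
rule = 90; n = 101; steps = 50;
ruleTable = bitget(rule, 1:8);          % new state for neighbourhood codes 0..7
x = zeros(1,n); x((n+1)/2) = 1;         % single live cell in the middle
history = zeros(steps+1, n); history(1,:) = x;
for t = 1:steps
    left  = [0, x(1:n-1)];
    right = [x(2:n), 0];
    code = 4*left + 2*x + right;        % neighbourhood encoded as a number 0..7
    x = ruleTable(code + 1);
    history(t+1,:) = x;
end
% imagesc(history) shows the evolution; rule 90 gives a Sierpinski-like triangle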

Chaotic behaviour. Under certain transition rules initial patterns evolve in a pseudo-random
or chaotic manner.


Self-replication. A self-constructing pattern called “Gemini” was invented by Andrew J. Wade


in 2010, which creates a copy of itself while destroying its parent. This pattern replicates in 34
million generations. See http://conwaylife.com/wiki/index.php?title=Gemini.

Reversibility. A cellular automaton is called reversible if for every current configuration of


the cellular automaton there is exactly one past configuration. For one-dimensional cellular
automata algorithms exist for deciding whether a rule is reversible or not. For cellular automata
in two or more dimensions reversibility is undecidable: there is no algorithm that is guaranteed
to determine correctly whether the automaton is reversible.

Universal Turing machine. It has been shown that Game of Life is a very powerful computa-
tional machine. In fact, Game of Life is theoretically as powerful as any computer with unlimited
memory and no time constraints: it is a universal Turing machine.

Bibliography

Chopard, B., Droz, M., 1998. Cellular Automata Modeling of Physical Systems. Cambridge
University Press.

Deutsch, A., Dormann, S., 2005. Cellular Automaton Modeling of Biological Pattern Formation.
Birkhäuser Boston, Cambridge, MA.

Gardner, M., October 1970. Mathematical games. The fantastic combinations of John Conway’s
new solitaire game “life”. Scientific American 223, 120–123.

Schiff, J. L., 2007. Cellular Automata: A Discrete View of the World. John Wiley & Sons, New
York, NY.

von Neumann, J., 1966. The Theory of Self-reproducing Automata (A. Burks, ed.). Univ. of
Illinois Press, Urbana, IL.

Wolfram, S., 2002. A New Kind of Science. Wolfram Media.


URL www.wolframscience.com/nksonline

Chapter 5

Dynamics in the complex plane


In this chapter we study dynamical systems generated by transformations (or “mappings”) of the
complex plane. Certain types of such transformations can be used to generate pictures of geomet-
rical structures that are known as fractals. The term “fractal” was coined by Benoit Mandelbrot
in his book The Fractal Geometry of Nature (Mandelbrot, 1982), see also Barnsley (1988). Since
then fractals have become a playground for Computer Graphics, and many amazing pictures have
been produced by this method.

The theory behind fractals requires some deep mathematics, involving the calculus of complex
numbers. We will restrict ourselves here to the most elementary aspects needed. Therefore, let
us first have a brief look at complex numbers.

5.1 Intermezzo: Complex numbers


A complex number z has the form
$$z = a + i b, \qquad i^2 = -1 \qquad (5.1)$$


where a and b are real numbers, called the real part and the imaginary part of z, respectively, and
i is the complex unit, defined by $i^2 = -1$. The number z can be visualized in the complex plane
as a point (a, b), where the horizontal axis represents the real part and the vertical axis represents
the imaginary part; see Fig. 5.1. To each complex number z is associated the complex conjugate
number z ∗ defined by
z ∗ = a − i b, (complex conjugate) (5.2)
Geometrically z ∗ is obtained by reflecting the point z w.r.t. the horizontal axis.


Figure 5.1: Point z and its complex conjugate z ∗ in the complex plane.

A complex number also has a polar representation (see Fig. 5.1):
$$z = r e^{i\theta}, \qquad r = |z| = \sqrt{z z^*} = \sqrt{a^2 + b^2} \ \text{(magnitude)}, \qquad \theta = \arctan\frac{b}{a} \ \text{(phase)}$$

5.2 Mappings in the complex plane


Consider a mapping f (z) that maps points to points in the complex plane:

$$z' = f(z)$$

We can iterate this mapping (see Fig. 5.2):

zk+1 = f (zk ), k = 0, 1, 2, . . . (5.3)

To save writing let us define
$$f^{(n)}(z) = \underbrace{f(f(f(\ldots)))}_{n\ \text{times}}(z)$$
So:
$$z_k = f^{(k)}(z_0)$$
where $z_0$ is the starting point.



Figure 5.2: Iterated mapping in the complex plane.

The sequence z0 , z1 , z2 , . . . , is called the trajectory or orbit of the dynamical system (5.3).
We can now ask the following question: what kind of behaviours are possible when the number
of iterations becomes (infinitely) large? In general, the answer is not easy to give. Therefore, in
the following we consider a very special mapping, the quadratic equation.

5.3 The homogeneous quadratic equation


Consider the mapping $f(z) = z^2$. So
$$z_{k+1} = z_k^2, \qquad k = 0, 1, 2, \ldots$$
To analyze what happens when this mapping is iterated, consider the polar representation $z = r e^{i\phi}$. Then the equation becomes:
$$r_{k+1} e^{i\phi_{k+1}} = (r_k e^{i\phi_k})^2 = r_k^2 e^{2i\phi_k}$$
Taking the magnitude on both sides we get:
$$r_{k+1} = r_k^2$$
This implies, for any $r_k \neq 0$, that
$$\phi_{k+1} = 2\phi_k$$

Let us consider a number of cases for z0 ; see Fig. 5.3.

1. $z_0$ is in Region I. Then $r_0 < 1$, so $r_k = r_0^{2^k} \to 0$. All points inside the unit circle eventually move to the origin.



Figure 5.3: Regions of the complex plane.

2. $z_0$ is in Region II. Then $r_0 > 1$, so $r_k = r_0^{2^k} \to \infty$. All points outside the unit circle eventually move to infinity.
3. $z_0$ is on the unit circle. Then $r_0 = 1$, so $r_k = r_0^{2^k} = 1$. All points stay on the unit circle. Moreover, $\phi_k = 2^k \phi_0$. Examples:
(a) $\phi_0 = 0$. Then $\phi_k = 0$. So the trajectory of $z_0 = 1 + 0i$ is a fixed point.
(b) $\phi_0 = \pi/2$. Then $\phi_1 = \pi$, $\phi_2 = 2\pi$. So the trajectory of $z_0 = 0 + i$ ends at a fixed point.
(c) $\phi_0 = 2\pi/3$. Then $\phi_1 = 4\pi/3$, $\phi_2 = 8\pi/3 = 2\pi/3$ (mod $2\pi$). So the trajectory of $z_0 = -\tfrac{1}{2} + i\tfrac{1}{2}\sqrt{3}$ is a periodic orbit (cycle) of period 2.

So for the homogeneous quadratic equation the possible behaviours are very simple. More inter-
esting behaviours occur when the equation is inhomogeneous.

5.4 The inhomogeneous quadratic equation


Consider the mapping $f(z) = z^2 + c$. So,
$$z_{k+1} = z_k^2 + c, \qquad k = 0, 1, 2, \ldots$$

To analyze what type of dynamical behaviour is described by this equation requires very ad-
vanced mathematics.
Two types of question can be asked:

1. Fix the initial point z0 = 0, and study the behaviour of the orbit z0 , z1 , z2 , . . . , zk , . . . for
various values of the parameter c. This leads to the concept of the Mandelbrot set.
2. Fix the parameter c, and study the behaviour of the orbit z0 , z1 , z2 , . . . , zk , . . . for various
values of the initial point z0 . This leads to the concept of the Julia set.


5.4.1 The Mandelbrot set


We start with the iteration with fixed initial condition $z_0 = 0$:
$$z_{k+1} = z_k^2 + c, \qquad z_0 = 0 \qquad (5.4)$$

The first iterates are:

$$z_1 = c, \qquad z_2 = c^2 + c, \qquad z_3 = (c^2 + c)^2 + c$$

Question: What is the set M of parameter values c in the complex plane for which the orbit
z0 = 0, z1 , z2 , . . . , zk , . . . remains bounded as k → ∞ ? This set M is called the Mandelbrot set.

Figure 5.4: Picture of the Mandelbrot set (yellow region).

The set M can be displayed as an image in the complex c-plane (yellow region in Fig. 5.4).
As we can see, the Mandelbrot set has a very complicated structure.
To get a better idea, let us consider some points in / outside the Mandelbrot set M . For this we
have to determine whether for a given value of c the iterates of (5.4) remain bounded or not.
Some points inside M :


1. c = 0:
orbit z1 = 0, z2 = 0, etc. 0 is a fixed point, so c = 0 is in M .

2. c = −1:
orbit z1 = −1, z2 = 0, z3 = −1, . . . , so c = −1 is in M . The orbit is a cycle of period 2.

3. c = −2:
orbit z1 = −2, z2 = 2, z3 = 2, etc. The orbit settles on a fixed point, so c = −2 is in M .

4. c = i:
orbit z1 = i, z2 = −1 + i, z3 = −i, z4 = −1 + i, . . . . The orbit settles on a cycle of period
2, so c = i is in M .

Some points outside M :

1. c = 1:
orbit z1 = 1, z2 = 2, z3 = 5, z4 = 26, . . . .
This grows forever, so c = 1 is not in M .

2. c = 2i:
orbit z1 = 2i, z2 = −4 + 2i, z3 = 12 − 14i, z4 = −52 − 334i, . . . .
This grows forever, so c = 2i is not in M .

Here are more orbits for real c, one orbit which is periodic and one which is chaotic:

c = −1.751. The orbit is a period 3 cycle. c = −1.85. The orbit is chaotic.

Real numbers in the Mandelbrot set

Let us now consider points on the real axis of the complex c-plane, and see which of these points
fall inside the Mandelbrot set.


So assume c is a real number, zk+1 = zk2 + c, z0 = 0. Then all iterates zk are real as well. The
following statement can be proved: The set of all real numbers c in the Mandelbrot set is the
interval [−2, 0.25].
This can be seen as follows.

1. We have seen that c = −2 is in the Mandelbrot set. If c < −2 then |zk | increases in each iteration without bound. So c = −2 is the left limit of real points in the Mandelbrot set.
2. If c is positive, each iteration zk is greater than the one before. For the orbit zk to reach a limit, say z, as $k \to \infty$, it must be the case that
$$z = z^2 + c$$
This equation has the solution
$$z = \frac{1 \pm \sqrt{1 - 4c}}{2} \qquad (5.5)$$
which becomes imaginary at $c > \tfrac{1}{4}$. So $c = \tfrac{1}{4}$ is the largest real number in the Mandelbrot set, with the limit of the orbit equal to $z = \tfrac{1}{2}$. (For $c < \tfrac{1}{4}$ the solution z is the one with the minus sign in (5.5).)

Here are a few more orbits for complex c:

c = −0.5 + 0.56i. A period 5 cycle. c = −0.67 + 0.34i. A period 9 cycle.

Periods of points inside the Mandelbrot set

The periods of periodic orbits are constant for any c inside the so-called primary bulbs of the
Mandelbrot set; see Fig. 5.5. For more information, see Robert L. Devaney, http://math.bu.edu/DYSYS/FRACGEOM/FRACGEOM.html

Computing the Mandelbrot set

Let us now consider how the Mandelbrot set can be computed numerically.


Figure 5.5: The Mandelbrot set and its primary bulbs (labeled by integers).


Figure 5.6: Grid in the plane with escape circle with radius R = 2.


First, define a grid of pixels covering the area within the square −2 ≤ Re(c) ≤ 2, −2 ≤ Im(c) ≤ 2 of the complex c-plane. For each pixel, the coordinates of one of the corner points (say, the lower-left corner) are used to define the value of c to be used in the iteration. (In Matlab, the meshgrid function can be used to define a 2D grid.) This is repeated for all values of c on the grid.
To check (for a given value of c) whether the orbit zk , k = 0, 1, 2, . . . goes to infinity we use an
escape circle of radius R = 2; see Fig. 5.6. It has been proven that if |zk | passes this circle, it
will never return to a point within this circle, but will rapidly escape to infinity.
The maximum number of iterations needed can be quite low, say nmax = 100. Actually, it
depends on the resolution of the image. If we zoom the Mandelbrot set near its boundary, it
takes many iterations before it can be decided whether the orbit passes the escape circle, so the
maximum iteration number has to be increased.

Computing the Mandelbrot set: the fringe

The most interesting behaviours occur near the boundary of the Mandelbrot set. To get a better
picture of this region, we adapt the algorithm as follows.
Instead of determining whether a point escapes to infinity or not, we compute the number of
iterations k required for zk to escape the disc of radius 2. The value of k will be between 0 and
the maximum number of iterations nmax (“depth”), which we set as a parameter of the algorithm.
Typical values are several hundreds to a few thousands.
Two cases can be distinguished:

1. If k = nmax then zk did not get larger than 2, so the current c is part of the Mandelbrot set.
We can colour the pixel black (or any colour we want).

2. If k < nmax , however, then we know that this c does not belong to the Mandelbrot set.
Large values of k (but smaller than nmax ) indicate that c is in the fringe, that is, the area
just outside the boundary. We map the value of k to a colour and draw the pixel with that
colour using a colourmap.
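A compact Matlab sketch of this escape-time computation (one possible implementation; the grid size and depth below are example values) is:

% Escape-time computation of the Mandelbrot set and its fringe
nmax = 200; N = 512;                      % depth and grid size (adjustable)
[Re, Im] = meshgrid(linspace(-2, 2, N));  % grid of c-values
c = Re + 1i*Im;
z = zeros(N);
k = zeros(N);                             % iteration at which each point escapes
for it = 1:nmax
    z = z.^2 + c;                         % iterate z <- z^2 + c, starting from z0 = 0
    newly = (abs(z) > 2) & (k == 0);      % points crossing the escape circle R = 2
    k(newly) = it;
end
k(k == 0) = nmax;                         % points that never escaped belong to M
imagesc([-2 2], [-2 2], k); axis xy; colormap(flipud(jet)); colorbar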

Figure 5.7 gives an example of the Mandelbrot set with its fringe, computed by implementing
the above algorithm in Matlab.
By varying the colourmap different artistic effects can be achieved, as is shown in Fig. 5.8.

Computing the Mandelbrot set: zooming in

A very interesting property of the Mandelbrot set is that it is self-similar. This can be shown if
we zoom in to a small part near the boundary of the Mandelbrot set.
So consider the region:

−0.133 ≤ Re(c) ≤ −0.123


−0.9925 ≤ Im(c) ≤ −0.9825;


Figure 5.7: Picture of the Mandelbrot set with fringe. The flipud(jet) colourmap has been used.

Figure 5.8: Same as Fig. 5.7, but with different colourmaps (flag, hsv, colorcube, pink).


Figure 5.9: A small region near the boundary of the Mandelbrot set: a “baby-Mandelbrot set”
appears. Colourmap: “flag”.

A picture of the Mandelbrot set for this region is shown in Fig. 5.9. We have used a grid size of 512 × 512, and have set the number of iterations to nmax = 512. We observe self-similarity under
zooming: a mini-version of the Mandelbrot set (a “baby-Mandelbrot set”) appears! This process
can be repeated, by further zooming in. At all zoom levels new copies of the Mandelbrot set
appear.
Many other regions have been found, where interesting visual patterns occur when zooming in.
Here are two examples.

Valley of the Seahorses Figure 5.10 gives a picture for the region

−0.82 ≤ Re(c) ≤ −0.72


−0.18 ≤ Im(c) ≤ −0.08;

Grid size: 1024 × 1024; number of iterations: nmax = 512.

Left tip of the Mandelbrot set Figure 5.11 gives a picture for the region

−2.000 ≤ Re(c) ≤ −1.998


−0.001 ≤ Im(c) ≤ 0.001

Grid size: 512 × 512; number of iterations: nmax = 100.



Figure 5.10: Valley of the Seahorses. Colourmap:“prism”.

Figure 5.11: Left tip of the Mandelbrot set. Colourmap:“colorcube”.


Properties of the Mandelbrot set M

Here are a few properties of the Mandelbrot set, given without proof.

1. M is connected: there is a path within M from each point of M to any other point of M .
In other words, there are no disconnected “islands”.

2. The area of M is finite (it fits inside a circle of radius 2).

3. The length of the border of M is infinite. Any part of the border of M also has infinite
length. The border has “infinite details”, no part of the border is smooth. Zooming into any
part of the border always produces something interesting (when resolution and number of
iterations are high enough).

4. M is self-similar under zooming.

5.4.2 The filled Julia set


Again, consider the iteration
$$z_{k+1} = z_k^2 + c, \qquad k = 0, 1, 2, \ldots$$
This time, we fix c and vary the initial point $z_0$. The filled Julia set¹ $J_c$ is the set of points $z_0$ for which the orbit $z_0, z_1, z_2, \ldots, z_k, \ldots$ remains bounded.
There are filled Julia sets for each value of c. The algorithm for computing filled Julia sets is
almost the same as for the Mandelbrot set, with a few small adaptations.
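Concretely, in the escape-time sketch given above for the Mandelbrot set, only the initialization changes: the grid now represents the initial points z0 and c is held fixed. A minimal sketch, using c = 0.353 + 0.288i (the value of Fig. 5.13) as example:

% Escape-time computation of a filled Julia set (fixed c, varying z0)
c = 0.353 + 0.288i;                       % example parameter value
nmax = 50; N = 512;
[Re, Im] = meshgrid(linspace(-2, 2, N));
z = Re + 1i*Im;                           % the grid now holds z0 instead of c
k = zeros(N);
for it = 1:nmax
    z = z.^2 + c;
    newly = (abs(z) > 2) & (k == 0);
    k(newly) = it;
end
k(k == 0) = nmax;
imagesc([-2 2], [-2 2], k); axis xy; colormap(flipud(jet))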

Figure 5.12: Filled Julia set: c = 0.2. Colourmap: flipud(jet).

1
The “Julia set” is the boundary of the filled Julia set. Named after Gaston Julia, 1893–1978.


Figure 5.13: Filled Julia set: c = 0.353 + 0.288i. Colourmap: flipud(jet).

Figure 5.14: Filled Julia set: c = −0.52 + 0.57i. Colourmap: flipud(jet).


Computing the filled Julia set: example 1 Figure 5.12 gives an example of the filled Julia set, for c = 0.2. Region of the z0-plane: [−2, 2] × [−2, 2]. Grid size: 512 × 512, number of iterations: nmax = 50.

Computing the filled Julia set: example 2 Figure 5.13 gives an example of the filled Julia set, for c = 0.353 + 0.288i. Region of the z0-plane: [−2, 2] × [−2, 2]. Grid size: 512 × 512, number of iterations: nmax = 50.

Computing the filled Julia set: example 3 Figure 5.14 gives an example of the filled Julia set, for c = −0.52 + 0.57i. Region of the z0-plane: [−2, 2] × [−2, 2]. Grid size: 512 × 512, number of iterations: nmax = 50.

The filled Julia set: a theoretical result

In 1919, Gaston Julia and Pierre Fatou proved the following theorem.

Theorem 5.1 For each c-value, the filled Julia set Jc for the mapping f (z) = z 2 + c is either a
connected set or a Cantor set.

A connected set is a set that consists of just one piece. A Cantor set consists of infinitely (in fact,
uncountably) many pieces (actually, points). An example is shown in Fig. 5.15.

Figure 5.15: Cantor ternary set.

A related theorem is the following.

Theorem 5.2 Let M be the Mandelbrot set and Jc be the filled Julia set for parameter c. If c lies
in M then Jc is connected. If c does not lie in M then Jc is a Cantor set.

To conclude, Figure 5.16 shows the Mandelbrot set together with Julia sets for various values of c.


Figure 5.16: The Mandelbrot set and various Julia sets. See http://www.karlsims.com/julia.html.

Bibliography

Barnsley, M. F., 1988. Fractals Everywhere. Academic Press.

Mandelbrot, B., 1982. The Fractal Geometry of Nature. Times Books.

Chapter 6

Differential equations
Many dynamical phenomena in nature and technology can be modelled by differential equations.
In this chapter we look at two different types of such equations, linear and nonlinear ordinary
differential equations. An ordinary differential equation (ODE) is a differential equation con-
taining one or more functions of one independent variable (often the time variable, denoted by t)
and its derivatives.
In a later chapter we will consider partial differential equations, which involve more than one
independent variable.

6.1 Linear ordinary differential equations


Let us start with a very simple example.

6.1.1 Exponential growth


The following linear first order differential equation with one dependent variable describes ex-
ponential growth:
$$\dot{y} = \frac{dy}{dt} = k\, y(t), \qquad y(0) = y_0$$
Here k is the growth rate, y(0) = y0 the initial condition. The solution of this equation is easy
to find:
y(t) = y0 e k t (6.1)
Figure 6.1 shows a graph of the solution y(t).

6.1.2 Circular motion


Next, we consider a case with two dependent variables, described by a system of two linear first
order differential equations:
ẋ = y ẏ = −x (6.2)


Figure 6.1: Graph of the exponential function.

Initial conditions x(0) = 1, y(0) = 0.


A general approach for solving linear ODEs is to try a solution of exponential form:
x(t) = C eλ t , y(t) = D eλ t
Here the parameter λ can be a real or complex number. Substitute the assumed solution in (6.2):
C λ eλ t = D eλ t , D λ eλ t = −C eλ t
This gives C λ = D, D λ = −C, so λ2 = −1. This equation has two roots: λ1 = i or λ2 = −i.
When λ1 = i we find that D1 = i C1 . When λ2 = −i then D2 = −i C2 .
So the solution is
x(t) = C1 ei t + C2 e−i t
y(t) = i C1 ei t − i C2 e−i t
Apply the initial conditions:
x(0) = 1, so C1 + C2 = 1
y(0) = 0, so i C1 − i C2 = 0
Then we see that $C_1 = C_2$, $2C_1 = 1$, so $C_1 = C_2 = \tfrac{1}{2}$. Then we find the final form of the solution:
$$x(t) = \tfrac{1}{2}(e^{it} + e^{-it}) = \cos t$$
$$y(t) = \tfrac{1}{2}\, i\, (e^{it} - e^{-it}) = -\sin t$$
These equations describe circular motion: the curve (orbit) followed by the point (x(t), y(t)) in
the x-y plane when t increases is a circle.
In the two cases above we were able to compute the solution analytically. However, in many
cases this is not possible and we have to turn to computing the solution numerically. A common
type of numerical algorithms are the so-called finite difference schemes, which we consider next.


6.1.3 Finite difference schemes


Figure 6.2: Discretization of the time-axis.

Finite difference schemes work as follows:

• Divide the time axis into cells of size ∆t (the step size).

• Replace derivatives by finite differences.

• Update (integration) step: compute values at step n + 1 from values at step n.

Various methods exist:

1. Explicit (forward) Euler method

2. Implicit (backward) Euler method

3. Symplectic (semi-implicit) Euler method

Let us now look at these methods in detail for the case of circular motion described by the
equations (6.2).

6.1.4 Explicit (forward) Euler method


The steps of this method are as follows:

• Divide the time axis into cells of size ∆t (the step size); see Fig. 6.2. The nth time point is

$$t_n = t_0 + n\,\Delta t, \qquad n = 1, \ldots, N$$

• Each derivative is replaced by a finite difference:

$$\dot{x} = \frac{dx(t)}{dt} \approx \frac{x^{n+1} - x^n}{\Delta t}, \qquad \dot{y} = \frac{dy(t)}{dt} \approx \frac{y^{n+1} - y^n}{\Delta t}$$

where xn+1 = x(tn+1 ), xn = x(tn ) and y n+1 = y(tn+1 ), y n = y(tn ).


• Each finite difference at step n equals the value of the right-hand-side of the corresponding
equation at the current step n:
xn+1 − xn = ∆t y n y n+1 − y n = −∆t xn
Reordering we get explicit expressions for xn+1 , y n+1 :

xn+1 = xn + ∆t y n y n+1 = y n − ∆t xn

The following Matlab code shows an implementation of the explicit Euler method for the equa-
tions (6.2) describing circular motion.

% initial values
x0=1; y0=0;
% step size
Delta_t = 2*pi/60;
% number of steps
num_steps = 60;
% Initialization
x = zeros(1,num_steps+1);
y = zeros(1,num_steps+1);
% insert initial values
x(1) = x0; y(1) = y0;
% Simulate
for i=1:num_steps
    % Compute derivatives
    dxdt = y(i);
    dydt = -x(i);
    % Update
    x(i+1) = x(i) + Delta_t*dxdt;
    y(i+1) = y(i) + Delta_t*dydt;
end

Figure 6.3 shows a plot of y(t) against x(t). Such a plot is called a phase-space plot. As we can
observe, the plot is not a circle, but an outwardly moving spiral. We say that the Euler method is
unstable. This can be attributed to insufficient (numerical) damping.

6.1.5 Implicit (backward) Euler method


This method attempts to provide a solution for the instability of the explicit Euler method. The
steps of this method are as follows:

• Divide the time axis into cells of size ∆t, and replace each derivative by a finite difference, as
before.


Figure 6.3: Phase-space plot of the explicit Euler method for the equations (6.2).

• Each finite difference at step n equals the value of the right-hand-side of the corresponding
equation at the next step n + 1, leading to implicit expressions for xn+1 , y n+1 :

xn+1 − xn = ∆t y n+1
y n+1 − y n = −∆t xn+1

Reordering we get a 2 × 2 system of equations to be solved for (xn+1 , y n+1 ):

xn+1 − ∆t y n+1 = xn
y n+1 + ∆t xn+1 = y n

The following Matlab code shows an implementation of the implicit Euler method for the equa-
tions (6.2) describing circular motion.

% initial values
x0=1; y0=0;
% step size
Delta_t = 2*pi/60;
% number of steps
num_steps = 60;
% Initialization
x = zeros(1,num_steps+1);
y = zeros(1,num_steps+1);
% insert initial values
x(1) = x0; y(1) = y0;
% Simulate


for i=1:num_steps
    % Solve the 2x2 linear system for (x(i+1), y(i+1))
    M = [1, -Delta_t; Delta_t, 1];
    soln = M\[x(i); y(i)];   % backslash solves M*soln = [x(i); y(i)]
    % Update
    x(i+1) = soln(1);
    y(i+1) = soln(2);
end

Figure 6.4 shows the phase-space plot. As we can see, the method is stable, but the trajectory
spirals inwards. There is too much damping.

Figure 6.4: Phase-space plot of the implicit Euler method for the equations (6.2).

6.1.6 Symplectic (semi-implicit) Euler method


A further improvement can be obtained by the so-called symplectic (semi-implicit) Euler method.
The steps of this method are as follows:

• Divide the time axis into cells of size ∆t, and replace each derivative by a finite difference, as
before.
• Two half steps: first use the current value y n in the finite difference for xn , then use the updated
value xn+1 in the finite difference for y n :
xn+1 − xn = ∆t y n
y n+1 − y n = −∆t xn+1
Reordering, we get:

xn+1 = xn + ∆t y n (6.5)
y n+1 = y n − ∆t xn+1 (6.6)


So, given (xn , y n ), first (6.5) is used to update xn , producing the new estimate xn+1 . Then y n
and xn+1 are used in (6.6) to update y n , producing the new estimate y n+1 .

The following Matlab code shows an implementation of the symplectic Euler method for the
equations (6.2) describing circular motion.

% initial values
x0=1; y0=0;
% step size
Delta_t = 2*pi/60;
% number of steps
num_steps = 60;
% Initialization
x = zeros(1,num_steps+1);
y = zeros(1,num_steps+1);
% insert initial values
x(1) = x0; y(1) = y0;
% Simulate
for i=1:num_steps
    % Compute derivative
    dxdt = y(i);
    % Update
    x(i+1) = x(i) + Delta_t*dxdt;
    % Compute derivative
    dydt = -x(i+1);
    % Update
    y(i+1) = y(i) + Delta_t*dydt;
end

Figure 6.5 shows the phase-space plot. As we can see, the method is stable, although the trajec-
tory is not a perfect circle. To further improve the accuracy, higher order numerical schemes can
be used. However, these are beyond the scope of this course. For further information, see Quar-
teroni et al. (2006, 2014).

Constant of motion

One way to determine the quality of numerical algorithms is to see to what extent they preserve
so-called constants of motion, that is, functions of the dependent variables that remain constant
over time.
For the equations (6.2) describing circular motion, we can define the following quantity:

E = x2 + y 2 (6.7)


Figure 6.5: Phase-space plot of the symplectic Euler method for the equations (6.2).

The function E(t) = x2 (t) + y 2 (t) is independent of time. Here (x(t), y(t)) is the solution to the
system of differential equations. To prove this, let us compute the derivative of E(t) with respect
to time:

Ė = 2 x(t) ẋ(t) + 2 y(t) ẏ(t)


= 2 x(t) y(t) + 2 y(t) (−x(t)) = 0

We see that the time derivative is zero, so E(t) is constant in time. We call E a constant of motion
or integral of motion or conserved quantity. It is a function of the variables which is constant
along a trajectory in phase space.
Figure 6.6 shows simultaneous plots of x (blue), y (red), and E (green) as function of time for
the three Euler methods. Only the symplectic method preserves the constant of motion E to a
large degree.
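Such a check is easy to add to any of the three listings above: after the time loop, compute E along the computed trajectory and plot it together with x and y. A minimal sketch, reusing the arrays x and y filled by the loops above:

% Check how well a scheme preserves the constant of motion E = x^2 + y^2
E = x.^2 + y.^2;
t = (0:num_steps)*Delta_t;
plot(t, x, 'b', t, y, 'r', t, E, 'g');
legend('x', 'y', 'E');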

6.2 Nonlinear ordinary differential equations


Now we look at two examples of nonlinear ODEs, the logistic model (one dependent variable)
and the Lotka-Volterra model (two dependent variables).

6.2.1 Logistic model


The logistic model is a modification of the exponential growth model which takes into account
that in reality resources in the environment are limited. We say that the environment has a finite
“carrying capacity”.



Figure 6.6: Plots of x (blue), y (red), and E (green) as function of time for the three Euler
methods.

The nonlinear first order differential equation that describes this is the logistic equation:
$$\dot{y} = k\, y(t)\left(1 - \frac{y(t)}{\mu}\right), \qquad y(0) = y_0$$
Here k is the growth rate, µ the carrying capacity, and y(0) = y0 the initial condition. The
solution of this model (called the logistic function) can be obtained analytically:
$$y(t) = \frac{\mu\, y_0\, e^{kt}}{y_0\, e^{kt} + \mu - y_0} \qquad (6.8)$$
Steady states (equilibrium points) of this model are solutions which are constant in time. To find
these, we set ẏ = 0. Two steady state solutions are found: y(t) = 0 or y(t) = µ.

Figure 6.7: Plot of the exponential function (solid line) and the logistic function (dotted line).

Plots of the exponential function (green) and the logistic (or sigmoid) function (blue) are shown
in Fig. 6.7. We can observe that for small values of y the system shows exponential growth. For
larger values, saturation due to food depletion or crowding occurs.


Logistic model: simulation

We can simulate the logistic model by computing the finite difference solution by explicit Euler
integration. We set k = 1, µ = 10, y0 = 1, ∆t = 0.1, and use 100 timepoints.
Here is the Matlab code:

for i=1:num_steps
    % Compute derivative
    dydt = k*y(i)*(1-(y(i)/m));
    % Update
    y(i+1) = y(i) + Delta_t*dydt;
end

Figure 6.8 shows plots of the exact solution and the numerical solution. As we can see, the two
curves show good agreement.

Figure 6.8: Plots of the solution of the logistic model. Dotted line: numerical solution. Solid
line: the exact solution.


6.2.2 Lotka-Volterra model

The Lotka-Volterra model is an example of a predator-prey model. Consider two species, a


predator (say foxes) and a prey (say hares), which are in competition. If there are no predators
the prey will grow exponentially. If there is no prey the predators die out exponentially. If there
is much prey the number of predators will grow. So the prey decreases, therefore the predators
decrease. Then the prey can grow again and the process repeats itself.
The Lotka-Volterra model is defined by a pair of nonlinear first order differential equations. Let
y1 (t) be the number of prey and y2 (t) be the number of predators at time t. Then the equations
are:

$$\dot{y}_1 = k_1\, y_1(t)\left(1 - \frac{y_2(t)}{\mu_2}\right)$$
$$\dot{y}_2 = -k_2\, y_2(t)\left(1 - \frac{y_1(t)}{\mu_1}\right)$$

Here ki are the net growth rates, µi the carrying capacities. Two initial conditions are needed:
y1 (0) = y1,0 , y2 (0) = y2,0 .
Again we can look at the steady states (equilibrium points), defined by ẏ1 = ẏ2 = 0. Two
solutions are obtained: (y1 (t), y2 (t)) = (0, 0) or (y1 (t), y2 (t)) = (µ1 , µ2 ).

Lotka-Volterra model: simulation

Let us use the parameter values: (k1 , k2 ) = (1, 1); (µ1 , µ2 ) = (300, 200); (y1,0 , y2,0 ) = (400, 100).
Duration = 3 periods (period = 6.5357), ∆t = period/60.
Finite difference solution by symplectic Euler integration is given by the following Matlab code:

% Simulate (initialization of k1, k2, m1, m2, y1, y2, Delta_t and num_steps
% follows the same pattern as in the earlier listings)
for i=1:num_steps
    % Compute derivative
    dy1dt = k1*y1(i)*(1-(y2(i)/m2));
    % Update
    y1(i+1) = y1(i) + Delta_t*dy1dt;
    % Compute derivative
    dy2dt = -k2*y2(i)*(1-(y1(i+1)/m1));
    % Update
    y2(i+1) = y2(i) + Delta_t*dy2dt;
end

Figure 6.9: Plot of the solution of the Lotka-Volterra model.

Figure 6.9 shows plots of the numerical solutions for y1 (t) and y2 (t). As we can see, the solution
is periodic. Detailed analysis (not carried out here) shows that the period depends on the values
of k1 , k2 , µ1 , µ2 and on the initial conditions y1,0 , y2,0 .
If we plot y2 against y1 we get a phase-space plot, which is a closed curve, confirming that the
solution is periodic; see Fig. 6.10.

Lotka-Volterra model: Motion Invariant

Consider the following quantity:


$$G(y_1, y_2) = k_2 \left( \frac{y_1}{\mu_1} - \ln \frac{y_1}{\mu_1} \right) + k_1 \left( \frac{y_2}{\mu_2} - \ln \frac{y_2}{\mu_2} \right)$$
It can be verified that $\frac{d}{dt} G(y_1(t), y_2(t)) = 0$. In other words, G is time-independent: it is a motion
invariant or constant of motion. Different values of G correspond to different closed contours
in the phase-space; see Fig. 6.11. These contours are called iso-lines, since the value of G is
identical for all points on the contour. For initial conditions (y1,0 , y2,0 ) close to the steady state
(µ1 , µ2 ), the iso-lines are ellipses, but for initial conditions farther away from the steady state
point the iso-lines start to take the shape of something like a hand axe.
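The iso-lines of Fig. 6.11 can be reproduced with Matlab's contour function. A small sketch, using the parameter values given above and a few example initial conditions:

% Iso-lines of the Lotka-Volterra motion invariant G(y1,y2)
k1 = 1; k2 = 1; m1 = 300; m2 = 200;
[Y1, Y2] = meshgrid(linspace(1, 900, 400), linspace(1, 600, 400));
G = k2*(Y1/m1 - log(Y1/m1)) + k1*(Y2/m2 - log(Y2/m2));
% levels corresponding to the initial conditions (400,100), (500,100), (600,100)
y10 = [400 500 600]; y20 = 100;
levels = k2*(y10/m1 - log(y10/m1)) + k1*(y20/m2 - log(y20/m2));
contour(Y1, Y2, G, sort(levels)); xlabel('y_1'); ylabel('y_2');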


Figure 6.10: Phase-space plot of the Lotka-Volterra model. Parameters: (k1 , k2 ) = (1, 1);
(µ1 , µ2 ) = (300, 200); (y1,0 , y2,0 ) = (400, 100).

Figure 6.11: Plots of G(y1 , y2 ) = C (iso-lines) for various values of the constant C =
G(y1,0 , y2,0 ) determined by the initial conditions y1,0 , y2,0 . For all plots (k1 , k2 ) = (1, 1);
(µ1 , µ2 ) = (300, 200).

Bibliography

Quarteroni, A., Sacco, R., Saleri, F., 2006. Numerical Mathematics (2nd ed.). No. 37 in Text in
Applied Mathematics. Springer.

Quarteroni, A., Saleri, F., Gervasio, P., 2014. Numerical Mathematics (4th ed.). No. 2 in Texts in
Computational Science and Engineering. Springer.

Chapter 7

N -body simulations

Understanding the motions of the planets and the stars has been important throughout human
history. Newton's law of universal gravitation (as described in his Philosophiae Naturalis Principia Mathematica of 1687) states that a (point) particle attracts every other particle in the universe with a force that is directly proportional to the product of the masses of the
particles and inversely proportional to the square of the distance between them. That is, the force
F satisfies the equation
$$F = G\, \frac{m_1 m_2}{r^2}, \qquad (7.1)$$
where m1 , m2 are the masses of the particles, r is the distance between the centers of the two
masses, and G is the gravitational constant1 . The force on each particle is central, that is, it is
directed along the line that connects the two masses. Also, the force on m1 has a direction that
is opposite to that of the force on m2 . For spherical bodies with a uniform mass distribution the
same formula holds, where the masses can be replaced by point particles at their centers; see
Figure 7.1.

Figure 7.1: Mutual attraction of two massive bodies according to Newton’s law of universal
gravitation.
1 The value of G is $6.67 \times 10^{-11}\ \mathrm{m^3/(kg\, s^2)}$.
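As an illustration of how (7.1) enters an N-body code, here is a minimal Matlab sketch (not taken from any of the simulation codes discussed in this chapter) of the direct summation of the pairwise gravitational forces on N point particles; the masses and positions are arbitrary example values:

% Direct summation of the gravitational forces between N point particles
G = 6.67e-11;                          % gravitational constant
N = 100;
m = rand(N,1)*1e24;                    % example masses [kg]
pos = rand(N,3)*1e9;                   % example positions [m]
F = zeros(N,3);                        % total force on each particle
for i = 1:N
    for j = i+1:N                      % N(N-1)/2 pairs
        d = pos(j,:) - pos(i,:);       % vector from particle i to particle j
        r = norm(d);
        Fij = G*m(i)*m(j)/r^2 * (d/r); % force on i, directed towards j
        F(i,:) = F(i,:) + Fij;
        F(j,:) = F(j,:) - Fij;         % equal and opposite force on j
    end
end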


Since Newton’s discoveries it became possible to try and analyze motions of planets and other
heavenly bodies in a principled, accurate, and recognizably modern fashion. Despite Einstein’s
general relativity providing an even more accurate model, most modern simulations of heavenly
bodies essentially still use the Newtonian law at their core. However, such simulations can of
course handle enormous numbers of bodies/particles2 . Modern simulations often also consider
more than just gravity (for example cooling and heating).

7.1 Navigation
Historically, one of the most important N -body problems to have been studied was the three-
body problem involving the Sun, the Earth, and the Moon. Navigation at sea was a major reason
to be interested in predicting the relative positions of the Sun, Earth, and Moon. In particular,
although latitude was relatively easy to determine, longitude was much more difficult, and this
was where the three-body problem turned up. The main idea is that if you can predict in advance
where the moon should be in the sky relative to other heavenly bodies (like the Sun or certain
bright stars) at particular times, then by measuring the position of the moon relative to some other
heavenly body, you can figure out the time in Greenwich (for example). If you then also know
the local time (by observing the Sun, for example), then longitude follows easily (15 degrees per
hour difference).
Unfortunately, unlike a two-body problem, a three-body problem is quite difficult to treat analy-
tically (Diacu, 1996; Šiuvakov and Dmitrašinović, 2013). As a result, one has little choice but to
use numerical methods and/or other kinds of approximations to approach the problem. Clearly,
this was no easy task before the advent of computers, and this largely explains the wide variety of
famous mathematicians who concerned themselves with the three-body problem in one way or
another (Brown, 1896; Diacu, 1996; Wilson, 2010): Newton, Euler, Laplace, Lagrange, Poisson,
Poincaré, among others.

7.2 Planetary system formation


The general idea for how the solar system came to be is that small particles coalesced into bigger
ones, and then into even bigger ones, and so on until they ended up as the planets and moons we
currently have. But is this idea accurate? One way to test this is to use an N -body simulation of
the process to see what would happen. The main trick is that collisions can and will happen, and
that these need to be modelled, but this is possible.
One question that can be investigated this way is the phenomenon known as the Late Heavy
Bombardment of the terrestrial planets. It is hypothesized, based on analyses of lunar rocks, that
there was a spike in the number of impacts about 700 million years after the planets were formed.
However, the basic picture just discussed does not really explain how this could happen. Using
simulations of the gravitational interactions between the Sun, the giant planets (Jupiter, Saturn,
2
The so-called “Millennium Simulation” was a simulation of over 10 billion particles, which ran for more than a
month (in 2005) on a supercomputer (Springel et al., 2005).


Figure 7.2: Snapshots from a simulated planet migration (Gomes et al., 2005, Fig. 2). a) Just
before the migration. b) The interaction of the planets is scattering debris all through the solar
system (causing the Late Heavy Bombardment). c) Afterwards, the planets are in their final
orbits.

Figure 7.3: A simulation of the orbits in a proposed detection of a planetary system with two
planets (Horner et al., 2013, Fig. 3). As the orbits clearly are not very stable, it is extremely
unlikely that the planets exist (as proposed).

Uranus, and Neptune), and disks of debris, a plausible scenario can be found where an instability
led to the migration of (in particular) Uranus and Neptune, in turn changing the orbit of part
of the debris to cross into the interior of the solar system (Gomes et al., 2005), as illustrated in
Fig. 7.2.
Another interesting use of N -body simulations is to get a better handle on the exoplanetary sys-
tems identified using the Kepler instrument3 . Given that even stars appear as little more than pinpricks on images captured by telescopes, exoplanet hunting is clearly a challenging business. In fact, exoplanets are far too small and dim to image directly, so we can only detect them by a very faint dimming of their host star when (and if) they pass in front of it. Still, the data
can be used to inform and validate N -body simulations of general planetary system formation,
3 https://www.nasa.gov/mission_pages/kepler


giving us a better idea of the possible characteristics of the planets we detect. In some cases, it is
even possible to use simulations for (in)validating detections, see Fig. 7.3.

Figure 7.4: Left: The Kepler instrument used for exoplanet hunting. Right: Image data from
Kepler (a single frame comprises 42 of these images). The star pointed to by the arrow (in
the lower-right quadrant) has an exoplanet called TrES-2 (a so-called “hot Jupiter”). Credit:
NASA/Kepler/Ball Aerospace

7.3 Dark matter and the cosmic web


More exotic, and more challenging (computationally) at the same time, are attempts to simulate
the evolution of the universe using N -body simulations of (predominantly) “dark matter”. The
idea is to start with a large number of particles that are distributed not quite evenly over space in
a virtual universe, and then let this evolve under the influence of (Newtonian) gravity. Clearly,
these particles are not individual atoms or other microscopic particles; each particle is simply
assumed to be a kind of “blob” of matter with a particular mass. One can then examine, for ex-
ample, the influence of subtle changes to the models on voids (Bos et al., 2012), or the evolution
of galaxies (Schaye et al., 2015; Barber et al., 2016).
The reason such simulations are computationally demanding is that, in principle, every particle
interacts (gravitationally) with every other particle. So for N particles there are N (N − 1)/2
interactions. For example, for 100 particles, (100 · 99)/2 = 4950 interactions need to be taken
into account. For the 300 billion = 3 × 10^11 particles used in the Millennium-XXL simulation (Angulo et al., 2012)4 , we need to consider almost 4.5 × 10^22 interactions. Even on the
world’s largest supercomputer5 , assuming we only need a single operation for each interaction
(you typically need more), and ignoring any communication overhead (which you will have), a
single computation of all forces would take almost 100 hours6 . For a practical simulation, up to
11 000 timesteps can be needed (Springel et al., 2005), leading to simulation times of up to 125
4 An interactive visualisation of this data is available at http://galformod.mpa-garching.mpg.de/mxxlbrowser/.
5 As of Nov. 2016, according to top500.org.
6 This is based on the peak performance of 125,435.9 TFlop/s of the machine listed at top500.org.


Figure 7.5: A visualization of part of the EAGLE simulation designed to study the evolution of galaxies. Intensity shows gas density, while colour encodes temperature (blue: < 10^4.5 K, green: 10^4.5–10^5.5 K, red: > 10^5.5 K). The insets show progressively zoomed-in views of a galaxy (the top-right inset uses a different type of visualization that roughly mimics what you would see through a telescope).

Figure 7.6: Examples of galaxies present in the EAGLE simulation


years. Clearly this is impractical, so alternative simulation techniques have been developed.

7.4 Two-body problem


Let us consider the simple case of the two-body problem (N = 2). In addition, we assume that
the two bodies or particles move in circular orbits under the influence of each other’s gravitational
attraction, each with the same angular velocity ω. The bodies have masses m and M , respec-
tively, where we assume that M ≥ m. Now we want to study the equations for the positions,
velocities, and accelerations of the two masses.


Figure 7.7: Body with mass m moving in a circular orbit under the influence of the gravitational
attraction of a mass M fixed at the origin.

7.4.1 One particle moving in a circular orbit


We will start with the case that one of the masses is fixed at the origin of the coordinate system.
This model is a good approximation for the case when the heavy mass M is much larger than the
light mass m. So assume a particle of mass m moves in a circular orbit of radius r around the
origin with angular velocity ω; see Fig. 7.7. Writing the x-coordinate of the particle as x(t) and
the y-coordinate as y(t), where t denotes time, we have that

x(t) = r cos ω t y(t) = r sin ω t

We will write the position of the particle as a column vector ~r (t):


   
$$\vec{r}(t) = \begin{pmatrix} x(t) \\ y(t) \end{pmatrix} = \begin{pmatrix} r \cos \omega t \\ r \sin \omega t \end{pmatrix} \qquad (7.2)$$


We can also use the row vector form, where the vector is written as (x(t), y(t)). We say that
the column vector is the transpose of the row vector, which is denoted by the superscript T :
~r (t) = (x(t), y(t))T .
Since the particle moves in a circle of radius r the length of the vector ~r (t) is constant and equal
to r. We can check this as follows. For any vector ~x = (x1, x2) its magnitude or length is equal to ‖~x‖ = √(x1² + x2²). This is also called the norm of the vector ~x. Applying this formula to the vector ~r(t) we find
$$\|\vec{r}(t)\| = \sqrt{(r \cos \omega t)^2 + (r \sin \omega t)^2} = r \sqrt{\cos^2 \omega t + \sin^2 \omega t} = r \qquad (7.3)$$
Next we compute the velocity vector ~v (t) = (vx (t), vy (t))T of the particle, by taking the time
derivatives of the position coordinates:
$$\vec{v}(t) = \begin{pmatrix} v_x(t) \\ v_y(t) \end{pmatrix} = \begin{pmatrix} \frac{dx(t)}{dt} \\[4pt] \frac{dy(t)}{dt} \end{pmatrix} = \begin{pmatrix} -\omega r \sin \omega t \\ \omega r \cos \omega t \end{pmatrix} \qquad (7.4)$$

The speed of the particle is defined as the magnitude of the velocity vector:
$$v = \|\vec{v}(t)\| = \sqrt{v_x^2(t) + v_y^2(t)} = \sqrt{(-\omega r \sin \omega t)^2 + (\omega r \cos \omega t)^2} = \omega r \qquad (7.5)$$

So we see that the speed is constant: v = ω r.


Also, the position vector ~r (t) and the velocity vector ~v (t) are perpendicular at any time. This
can be shown by computing the inner product (see the Appendix) of these two vectors:
$$\vec{r}(t) \cdot \vec{v}(t) = x(t)\, v_x(t) + y(t)\, v_y(t) = (r \cos \omega t)(-\omega r \sin \omega t) + (r \sin \omega t)(\omega r \cos \omega t) = 0$$
So the inner product is zero, which means the vectors are perpendicular.
Next we compute the acceleration vector ~a (t) = (ax (t), ay (t)) by taking the time derivatives of
the components of the velocity vector:
$$\vec{a}(t) = \frac{d\vec{v}(t)}{dt} = \begin{pmatrix} \frac{dv_x(t)}{dt} \\[4pt] \frac{dv_y(t)}{dt} \end{pmatrix} = \begin{pmatrix} -\omega^2 r \cos \omega t \\ -\omega^2 r \sin \omega t \end{pmatrix} = -\omega^2 \begin{pmatrix} r \cos \omega t \\ r \sin \omega t \end{pmatrix} = -\omega^2\, \vec{r}(t) \qquad (7.6)$$

Note that we can also write the acceleration as the second order time derivative of the position:
$$\vec{a}(t) = \frac{d\vec{v}(t)}{dt} = \frac{d^2\vec{r}(t)}{dt^2} = \begin{pmatrix} \frac{d^2 x(t)}{dt^2} \\[4pt] \frac{d^2 y(t)}{dt^2} \end{pmatrix} \qquad (7.7)$$

Formula (7.6) shows that the acceleration has the constant magnitude a = ω 2 r and is directed
in the opposite direction of the position vector ~r (t). The force F~ (t) which is needed to keep the
particle in the circular orbit is given by Newton’s second law of motion:
F~ (t) = m ~a (t) = −m ω 2 ~r (t)


This means that the force vector is pointing towards the center of the circular motion (the ori-
gin), which is why it is called the centripetal (“center seeking”) force. The magnitude of the
centripetal force, which we denote by Fc , is equal to
$$F_c = \|\vec{F}(t)\| = m\,\omega^2 \|\vec{r}(t)\| = m\,\omega^2 r = \frac{m v^2}{r}.$$
This force has to be provided by the gravitational attraction force of the mass M at the origin,
which according to Newton’s gravitational law (Eq. (7.1)) has magnitude
$$F_g = \frac{G m M}{r^2} \qquad (7.8)$$
and is directed from the position of the mass m to the origin, that is, along the unit vector −~r(t)/‖~r(t)‖. So in vector form the gravitational force is equal to (remember that ‖~r(t)‖ = r)
$$\vec{F}_g(t) = -\frac{G m M}{r^2}\,\frac{\vec{r}(t)}{\|\vec{r}(t)\|} = -\frac{G m M}{r^3}\,\vec{r}(t)$$
By setting Fc = Fg we find
$$\frac{m v^2}{r} = \frac{G m M}{r^2}$$
So
$$v = \sqrt{\frac{GM}{r}}\,, \qquad \omega = \frac{v}{r} = \sqrt{\frac{GM}{r^3}} \qquad (7.9)$$
In other words, circular motion at radius r can only happen if the speed v (or, equivalently, the angular speed ω) satisfies formula (7.9). If this special relation is not satisfied, other types of orbits are possible.
The study of the general case of two-body motion is called the Kepler problem.7 The analysis of
this problem shows that the following types of orbits are possible: circle, ellipse, parabola, and
hyperbola.
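As a quick numerical check of formula (7.9), the following short Matlab sketch computes the circular orbital speed and period of the Earth around the Sun. The numerical values of G, the solar mass, and the orbital radius are standard reference values; the snippet itself is only an illustration and not part of the original text.

% Circular orbit check of Eq. (7.9): v = sqrt(G*M/r), omega = v/r
G = 6.674e-11;              % gravitational constant [m^3 kg^-1 s^-2]
M = 1.989e30;               % mass of the Sun [kg]
r = 1.496e11;               % Earth-Sun distance [m]
v     = sqrt(G*M/r);        % orbital speed, about 2.98e4 m/s
omega = v/r;                % angular speed [rad/s]
T     = 2*pi/omega/86400;   % orbital period in days, about 365 days
fprintf('v = %.3g m/s, omega = %.3g rad/s, T = %.1f days\n', v, omega, T);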

7.4.2 Two particles moving in circular orbits


Let us now relax the assumption that one of the masses is fixed and consider the case that both
bodies are moving around one another in circular orbits; see Fig. 7.8. The heavy mass M moves
on a circle of radius R and the lighter mass on a circle of radius r. Both circles have the same
center, which is the center of mass (or barycenter) C, which lies on the line connecting the two
masses.
Let ~r 1 (t) be the position vector of particle 1 (the body with mass M ) and ~r 2 (t) be the position
vector of particle 2 (the body with mass m). The center of mass has the position ~R_c(t) given by
$$\vec{R}_c(t) = \frac{M \vec{r}_1(t) + m \vec{r}_2(t)}{M + m}$$
7 Johannes Kepler (1571–1630) discovered experimentally that the orbits of the planets are ellipses with the Sun in one focal point of the ellipse.



Figure 7.8: Two bodies with masses m and M moving in circular orbits under the influence of
their mutual gravitational attraction.

By applying Newton’s law of gravitation to the two masses it can be shown that the barycenter
C moves with constant velocity. So we can put C at the origin of the coordinate system, that
is, R~ c (t) = (0, 0). This means that M ~r 1 (t) = −m ~r 2 (t). Remembering that the magnitude of
~r 1 (t) equals R and the magnitude of ~r 2 (t) equals r, we thus find that the masses and radii are
related by the equation
M R = mr (7.10)

So indeed the radius of the orbit of the heavy mass is smaller than that of the lighter mass.
As we have seen in Section 7.4.1, the magnitude of the centripetal force on particle 1 is equal
to M ω 2 R and the magnitude of the centripetal force on particle 2 is equal to m ω 2 r. In view
of Eq. (7.10) these two magnitudes are equal. From Newton’s law of gravitation we know that
the magnitude of the gravitational force exerted by mass m on mass M and the magnitude of
the gravitational force exerted by mass M on mass m both have the value GmM/(R + r)², because the distance between the two masses is R + r. So we have found the formula

$$\frac{G m M}{(R + r)^2} = m\,\omega^2 r$$

This means that the angular speed ω is equal to


$$\omega = \sqrt{\frac{GM}{r (R + r)^2}}$$

By putting R = 0 (heavy mass fixed at the origin) we recover formula (7.9).


7.5 Simulation techniques


The simplest technique for performing N -body simulations is direct simulation. The idea is sim-
ple: for each particle i the mass mi is considered given and constant, and we have 6 variables
describing the position and velocity of each particle in 3D: xi , yi , zi , vx,i , vy,i , vz,i ; Newton’s sec-
ond law of motion (F~ = m ~a ) is then used to determine the acceleration of each particle based
on Eq. (7.1). Write ~ri = (xi , yi , zi )T and ~vi = (vx,i , vy,i , vz,i )T . Equation (7.1) gives the mag-
nitude of the force acting between particles i and j. Now, keeping in mind that the direction of
the force is towards the other body, dividing by the mass mi to get acceleration rather than force,
and summing over all particles other than i, the acceleration of particle i is given by
$$\vec{a}_i = G \sum_{j \neq i} \frac{m_j}{\|\vec{r}_j - \vec{r}_i\|^2}\,\frac{\vec{r}_j - \vec{r}_i}{\|\vec{r}_j - \vec{r}_i\|} = G \sum_{j \neq i} \frac{m_j\, (\vec{r}_j - \vec{r}_i)}{\|\vec{r}_j - \vec{r}_i\|^3}. \qquad (7.11)$$


Figure 7.9: Discretization of the time domain. The grid has cells of size ∆t along the time axis.
Two discrete time instants indexed by n and n + 1 are shown.

Considering the evolution of positions ~ri (t) and velocities ~vi (t) over time t, we get

$$\frac{d\vec{r}_i(t)}{dt} = \vec{v}_i(t) \qquad (7.12)$$
$$\frac{d\vec{v}_i(t)}{dt} = \vec{a}_i(t) \qquad (7.13)$$
The solution trajectory, (~ri (t), ~vi (t)), t > 0, is determined by the initial positions ~ri (0) and the
initial velocities ~vi (0) of all the particles i = 1, . . . , N .
To compute the solution numerically, these differential equations have to be discretized. This
can be done using finite difference schemes, as we discussed before in Chapter 6. We do this by
dividing the time axis into cells of size ∆t; see Fig. 7.9. The nth time point is

tn = t0 + n∆t,   n = 1, . . . , N.

Each derivative is replaced by a finite difference. For example,

$$\frac{d\vec{r}_i(t)}{dt} \approx \frac{\vec{r}_i^{\,n+1} - \vec{r}_i^{\,n}}{\Delta t}, \qquad (7.14)$$
where \vec{r}_i^{\,n+1} = \vec{r}_i(t_{n+1}) and \vec{r}_i^{\,n} = \vec{r}_i(t_n). The symbol ≈ means “approximately”.


After discretization the equations (7.12)-(7.13) become:

~ri n+1 = ~ri n + ∆t ~vi n (7.15)


~vi n+1 = ~vi n + ∆t ~ain (7.16)

When implementing these equations in Matlab it is convenient to put the vectors ~r i and ~v i as
column vectors in a matrix (see the Appendix for more information on vectors and matrices).
For example, for N = 3 we can write the equations in matrix form as follows. Equation (7.15)
becomes:
$$\begin{pmatrix} x_1^{n+1} & x_2^{n+1} & x_3^{n+1} \\ y_1^{n+1} & y_2^{n+1} & y_3^{n+1} \\ z_1^{n+1} & z_2^{n+1} & z_3^{n+1} \end{pmatrix}
= \begin{pmatrix} x_1^{n} & x_2^{n} & x_3^{n} \\ y_1^{n} & y_2^{n} & y_3^{n} \\ z_1^{n} & z_2^{n} & z_3^{n} \end{pmatrix}
+ \Delta t \begin{pmatrix} v_{x,1}^{n} & v_{x,2}^{n} & v_{x,3}^{n} \\ v_{y,1}^{n} & v_{y,2}^{n} & v_{y,3}^{n} \\ v_{z,1}^{n} & v_{z,2}^{n} & v_{z,3}^{n} \end{pmatrix} \qquad (7.17)$$

In the same way, Eq. (7.16) becomes:


$$\begin{pmatrix} v_{x,1}^{n+1} & v_{x,2}^{n+1} & v_{x,3}^{n+1} \\ v_{y,1}^{n+1} & v_{y,2}^{n+1} & v_{y,3}^{n+1} \\ v_{z,1}^{n+1} & v_{z,2}^{n+1} & v_{z,3}^{n+1} \end{pmatrix}
= \begin{pmatrix} v_{x,1}^{n} & v_{x,2}^{n} & v_{x,3}^{n} \\ v_{y,1}^{n} & v_{y,2}^{n} & v_{y,3}^{n} \\ v_{z,1}^{n} & v_{z,2}^{n} & v_{z,3}^{n} \end{pmatrix}
+ \Delta t \begin{pmatrix} a_{x,1}^{n} & a_{x,2}^{n} & a_{x,3}^{n} \\ a_{y,1}^{n} & a_{y,2}^{n} & a_{y,3}^{n} \\ a_{z,1}^{n} & a_{z,2}^{n} & a_{z,3}^{n} \end{pmatrix} \qquad (7.18)$$

where, according to (7.11), we can write the following matrix expression for the accelerations at
time n:
$$\begin{pmatrix} a_{x,1}^{n} & a_{x,2}^{n} & a_{x,3}^{n} \\ a_{y,1}^{n} & a_{y,2}^{n} & a_{y,3}^{n} \\ a_{z,1}^{n} & a_{z,2}^{n} & a_{z,3}^{n} \end{pmatrix}
= G \begin{pmatrix}
\frac{m_2 (x_2^n - x_1^n)}{\|\vec{r}_2^{\,n} - \vec{r}_1^{\,n}\|^3} + \frac{m_3 (x_3^n - x_1^n)}{\|\vec{r}_3^{\,n} - \vec{r}_1^{\,n}\|^3} &
\frac{m_1 (x_1^n - x_2^n)}{\|\vec{r}_1^{\,n} - \vec{r}_2^{\,n}\|^3} + \frac{m_3 (x_3^n - x_2^n)}{\|\vec{r}_3^{\,n} - \vec{r}_2^{\,n}\|^3} &
\frac{m_1 (x_1^n - x_3^n)}{\|\vec{r}_1^{\,n} - \vec{r}_3^{\,n}\|^3} + \frac{m_2 (x_2^n - x_3^n)}{\|\vec{r}_2^{\,n} - \vec{r}_3^{\,n}\|^3} \\[8pt]
\frac{m_2 (y_2^n - y_1^n)}{\|\vec{r}_2^{\,n} - \vec{r}_1^{\,n}\|^3} + \frac{m_3 (y_3^n - y_1^n)}{\|\vec{r}_3^{\,n} - \vec{r}_1^{\,n}\|^3} &
\frac{m_1 (y_1^n - y_2^n)}{\|\vec{r}_1^{\,n} - \vec{r}_2^{\,n}\|^3} + \frac{m_3 (y_3^n - y_2^n)}{\|\vec{r}_3^{\,n} - \vec{r}_2^{\,n}\|^3} &
\frac{m_1 (y_1^n - y_3^n)}{\|\vec{r}_1^{\,n} - \vec{r}_3^{\,n}\|^3} + \frac{m_2 (y_2^n - y_3^n)}{\|\vec{r}_2^{\,n} - \vec{r}_3^{\,n}\|^3} \\[8pt]
\frac{m_2 (z_2^n - z_1^n)}{\|\vec{r}_2^{\,n} - \vec{r}_1^{\,n}\|^3} + \frac{m_3 (z_3^n - z_1^n)}{\|\vec{r}_3^{\,n} - \vec{r}_1^{\,n}\|^3} &
\frac{m_1 (z_1^n - z_2^n)}{\|\vec{r}_1^{\,n} - \vec{r}_2^{\,n}\|^3} + \frac{m_3 (z_3^n - z_2^n)}{\|\vec{r}_3^{\,n} - \vec{r}_2^{\,n}\|^3} &
\frac{m_1 (z_1^n - z_3^n)}{\|\vec{r}_1^{\,n} - \vec{r}_3^{\,n}\|^3} + \frac{m_2 (z_2^n - z_3^n)}{\|\vec{r}_2^{\,n} - \vec{r}_3^{\,n}\|^3}
\end{pmatrix}$$

Note that the positions and velocities are updated simultaneously, and that we just use their
values at time step n to compute their values at time step n + 1. This method of discretization
is called the Euler method, which is an example of an explicit numerical scheme. In practice
a slightly different method (the colourfully named leapfrog method) is usually used to improve
the accuracy of the simulation; this method alternately updates the positions and velocities (for
example, it starts by updating the positions, then it updates the velocities based on the new
positions, then it updates the positions based on the last computed velocities, etc., etc.). This
method is also explicit.
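To make the above concrete, here is a minimal Matlab sketch of direct simulation with the explicit Euler update (7.15)–(7.16) and the accelerations of Eq. (7.11). The masses, initial conditions, units, and variable names are our own example choices and are not prescribed by the text.

% Direct N-body simulation with the explicit Euler scheme (7.15)-(7.16).
% pos and vel are 3 x N matrices, one column per particle, as in Eq. (7.17).
G      = 1;                        % gravitational constant (arbitrary units)
masses = [1 0.01 0.01];            % example masses
pos    = [0 1 0; 0 0 1; 0 0 0];    % initial positions (columns are particles)
vel    = [0 0 1; 0 1 0; 0 0 0];    % initial velocities
dt     = 1e-3;                     % time step
N      = numel(masses);
for n = 1:10000                    % number of time steps
    acc = zeros(3, N);
    for i = 1:N                    % Eq. (7.11): sum over all particles j ~= i
        for j = 1:N
            if j ~= i
                d = pos(:,j) - pos(:,i);
                acc(:,i) = acc(:,i) + G*masses(j)*d/norm(d)^3;
            end
        end
    end
    pos = pos + dt*vel;            % Eq. (7.15)
    vel = vel + dt*acc;            % Eq. (7.16)
end

The two nested loops over the particles compute the force between every pair, which is exactly the quadratic cost discussed in the next paragraph.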
Direct simulation works very well, and is used in practice for small systems, but for large systems
it becomes prohibitively expensive, due to having to compute the forces between all pairs of
particles (which scales quadratically with the number of particles). There are two commonly
used ways of speeding up the computations, and they can even be used together. The first is
to store the points in a tree (or hierarchy) that recursively clusters together particles based on
how close they are to each other; each cluster of particles is then treated as a single (heavier)


particle when computing its contribution to the force acting on some other, distant, particle (the
further away the other particle is, the better the approximation works). The second method is a
little more complicated to explain, but finds the force acting on a particle through computing a
potential, which can be done efficiently with some further tricks (this is called the particle mesh
method). Finally, to further speed up the simulation as a whole, different step sizes are usually
used for different particles, depending on how regular their trajectories are.

Bibliography

Angulo, R. E., Springel, V., White, S. D. M., Jenkins, A., Baugh, C. M., Frenk, C. S., 2012.
Scaling relations for galaxy clusters in the Millennium-XXL simulation. MNRAS 426 (3),
2046–2062.

Barber, C., Schaye, J., Bower, R. G., Crain, R. A., Schaller, M., Theuns, T., 2016. The origin of
compact galaxies with anomalously high black hole masses. MNRAS 460 (1), 1147–1161.

Bos, E. G. P., van de Weygaert, R., Dolag, K., Pettorino, V., 2012. The darkness that shaped the
void: dark energy and cosmic voids. MNRAS 426 (1), 440–461.

Brown, E. W., 1896. An Introductory Treatise On The Lunar Theory. Cambridge. The University
Press.

Diacu, F., 1996. The solution of the n-body problem. The Mathematical Intelligencer 18 (3),
66–70.

Gomes, R., Levison, H. F., Tsiganis, K., Morbidelli, A., 2005. Origin of the cataclysmic Late
Heavy Bombardment period of the terrestrial planets. Nature 435 (7041), 466–469.

Horner, J., Wittenmyer, R. A., Hinse, T. C., Marshall, J. P., Mustill, A. J., Tinney, C. G., 2013.
A detailed dynamical investigation of the proposed QS Virginis planetary system. MNRAS
435 (3), 2033–2039.

Schaye, J., Crain, R. A., Bower, R. G., Furlong, M., Schaller, M., Theuns, T., Dalla Vecchia,
C., Frenk, C. S., McCarthy, I. G., Helly, J. C., Jenkins, A., Rosas-Guevara, Y. M., White,
S. D. M., Baes, M., Booth, C. M., Camps, P., Navarro, J. F., Qu, Y., Rahmati, A., Sawala, T.,
Thomas, P. A., Trayford, J., 2015. The EAGLE project: simulating the evolution and assembly
of galaxies and their environments. MNRAS 446 (1), 521–554.

Springel, V., White, S. D. M., Jenkins, A., Frenk, C. S., Yoshida, N., Gao, L., Navarro, J.,
Thacker, R., Croton, D., Helly, J., Peacock, J. A., Cole, S., Thomas, P., Couchman, H., Evrard,
A., Colberg, J., Pearce, F., 2005. Simulations of the formation, evolution and clustering of
galaxies and quasars. Nature 435 (7042), 629–636.

Šiuvakov, M., Dmitrašinović, V., 2013. Three Classes of Newtonian Three-Body Planar Periodic
Orbits. Phys. Rev. Lett. 110, 114301+.


Wilson, C., 2010. Lunar Theory from the 1740s to the 1870s A Sketch. In: The Hill-Brown
Theory of the Moon’s Motion. Sources and Studies in the History of Mathematics and Physical
Sciences. Springer New York, pp. 9–30.

Chapter 8

Simulation of reaction-diffusion processes

In this chapter we will first look at the physical process of diffusion. We will see how to math-
ematically model the diffusion process and how to solve the corresponding mathematical equa-
tions on a grid. Then diffusion will be combined with reaction processes into so-called reaction-
diffusion processes, which can describe pattern formation in continuous space (in contrast to the
cellular automata models of Chapter 4 which are discrete in time and space.)1

8.1 Diffusion processes


Diffusion is the process whereby particles of liquids, gases, or solids mix due to spontaneous
movement caused by thermal agitation. An example is the diffusion of a drop of ink in water.
More ink particles will move from a region of higher to one of lower concentration than the
other way around. The net result is that differences in the local ink concentration are gradually
smoothed out.
The diffusion process can be mathematically modelled by the so-called diffusion equation. The
same equation also describes how heat dissipates in a material, therefore another name is the
heat equation. We will first formulate the equation when the spatial domain is one-dimensional.
Then this is generalised to the 2D case.

8.2 The diffusion equation in 1D


The concentration of diffusing material at position x at time t is denoted by f (x, t), where 0 ≤
t ≤ T . T is the maximal time the diffusion process is running. We may however also study the
process for an infinitely long time; then T = ∞. Let us also assume that the spatial domain is an
interval [a, b], so a ≤ x ≤ b. Again, the spatial interval can also be infinite (a = −∞, b = ∞) or
half infinite (a = −∞ or b = ∞, but not both).
The diffusion equation relates the change in time of the concentration f (x, t) at position x to
the second order concentration difference at position x. (If the concentration is a linear function
1 The material of Section 8.4 was developed by Jasper van de Gronde.


of x, the first order concentration difference is constant, while the second order concentration
difference is zero.) Mathematically, the change in time of f (x, t) at x is given by the derivative
∂f (x,t)
∂t
. This derivative is computed by differentiating the expression of f (x, t) with respect to t
while keeping x fixed. Similarly, the second order change in space of f (x, t) at position x is given
2 f (x,t)
by the second derivative ∂ ∂x 2 . This derivative is computed by differentiating the expression of
f (x, t) two times with respect to x while keeping t fixed.
The diffusion equation now takes the form of an equation (a so-called partial differential equation
or PDE) which says that the temporal derivative and second order spatial derivative of f (x, t) are
proportional, where the proportionality constant D is called the diffusion coefficient:

$$\frac{\partial f(x, t)}{\partial t} = D\, \frac{\partial^2 f(x, t)}{\partial x^2} \qquad (8.1)$$
We need to define an initial condition in time when the process starts, that is at t = 0. This takes
the form f (x, 0) = f0 (x), where f0 (x) describes the spatial shape of the initial concentration
profile. For the boundary condition in space, several choices are possible. For example, one may
set f (a, t) = c, f (b, t) = d, where c and d are given constants.2
Note that the initial condition is a formula which holds at t = 0 for any x, while the boundary
conditions are formulas that hold for specific spatial locations a and b at any time t.

8.2.1 Analytical solution of the diffusion equation in 1D


The equation (8.1) can be analytically solved. The methods to do so in the most general case
are beyond the scope of this chapter. However, the solution can be easily given in the following
special case. Let the spatial interval be infinite and assume that the concentration is zero at both
ends x = −∞ and x = ∞ for all times t. Also, assume that at t = 0 the concentration has a
peak of height M at x = 0 and is zero everywhere else. The constant M can be interpreted as
the total mass of diffusing material. Then it can be shown that the solution of equation (8.1) is
given by the formula:

$$f(x, t) = \frac{M}{\sqrt{4 \pi D t}}\, e^{-\frac{x^2}{4 D t}} \qquad (8.2)$$
Considered as a function of x for fixed t, this is an example of a so-called normal or Gaussian
distribution, which occurs in many fields of mathematics and statistics; see Figure 8.1. If we
compare to the general formula for the normal distribution in equation (8.3), we see that the solution of the diffusion equation is a normal distribution at any time t, with mean µ = 0 and standard deviation σ = √(2Dt). This means that as time proceeds, the normal distribution becomes wider
while the peak value becomes smaller. Plots of the solution (8.2) for a number of time points
are given in Fig. 8.2. As is seen from the figure, shortly after the start of the diffusion process
(t = 0.01), the peak has already broadened a bit. At t = 0.05 the broadening is very clear, while
at t = 1 the curve starts to flatten out.
2 This choice is known as Dirichlet boundary conditions.


The one-dimensional normal distribution or Gaussian function nµ,σ (x) is defined by

$$n_{\mu,\sigma}(x) = \frac{1}{\sigma \sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}} \qquad (8.3)$$

The parameter µ is the mean and σ the width or standard deviation. Plots of nµ,σ (x) and its first and second
derivatives are shown in Figure 8.1.


Figure 8.1: The 1D normal distribution nµ,σ(x) and its first and second derivatives.

Figure 8.2: Plots of the solution (8.2) to the diffusion equation for three time points. The param-
eter values are: D = 1, M = 1.

Now suppose the initial concentration has not one, but a number of peaks at different locations.
It can be shown that in this case the solution of the diffusion equation is a combination of normal
distributions, each centered at the location of the corresponding peak. This is an illustration of
the so-called superposition principle.


8.2.2 Numerical solution of the diffusion equation in 1D


To compute the solution numerically, the diffusion equation has to be discretized. As in Chapter 7
this can be done using finite difference schemes. Now there are two domains that have to be
discretized, the spatial and the temporal domain. We will do that by considering time and space
as two axes of a 2D space-time domain, and discretize this plane into cells just as we did in the
case of cellular automata (there the 2D space had two spatial axes); see Figure 8.3.


Figure 8.3: Discretization of the space-time domain on which diffusion takes place. The grid has
cells of size ∆x along the (horizontal) space axis, and cells of size ∆t along the (vertical) time
axis. The concentration at time tn at location xi is fin .

In other words, we divide the time axis into cells of size ∆t, so the nth time point is
tn = t0 + n∆t,   n = 1, . . . , N.
Similarly, the spatial axis is divided into cells of size ∆x, so the ith space point is
xi = x0 + i∆x, i = 1, . . . , I. (8.4)
The concentration of material at time tn at location xi is f(xi, tn). We abbreviate this by f_i^n.
Now look at the original diffusion equation (8.1). The left-hand side is replaced by a finite
difference,
$$\frac{\partial f(x_i, t_n)}{\partial t} \approx \frac{f_i^{n+1} - f_i^{n}}{\Delta t} \qquad (8.5)$$
The symbol ≈ means “approximately”. Next consider the right-hand side of equation (8.1).
This contains a second order derivative with respect to x. We replace this by a second-order
difference:


$$\frac{\partial^2 f(x_i, t_n)}{\partial x^2} \approx \frac{f_{i+1}^{n} + f_{i-1}^{n} - 2 f_i^{n}}{(\Delta x)^2} \qquad (8.6)$$

Combining the results so far we find the discrete diffusion equation:

$$\frac{f_i^{n+1} - f_i^{n}}{\Delta t} = D \left( \frac{f_{i+1}^{n} + f_{i-1}^{n} - 2 f_i^{n}}{(\Delta x)^2} \right). \qquad (8.7)$$
Of course, initial and boundary conditions have to be defined, just as in the continuous case.
We can rewrite (8.7) as follows:

$$f_i^{n+1} = f_i^{n} + \frac{D \Delta t}{(\Delta x)^2} \left( f_{i+1}^{n} + f_{i-1}^{n} - 2 f_i^{n} \right). \qquad (8.8)$$

This expresses the concentration at time step (n + 1) in terms of quantities at time step n. This
is called an explicit numerical scheme.3
In Figure 8.4 we show bar plots of the solution to the discrete equation (8.7) for a number of
time steps. We have used the same initial distribution as in the continuous case, that is, a peak
of height M at x = 0. The following parameter values were used: I = 21, M = 100, ∆x = 1,
∆t = 1, D = 0.25, t0 = 0.

Figure 8.4: Plots of the solution to the finite difference approximation (8.7) of the 1D diffusion
equation for three time steps n = 1, n = 3, and n = 10.

As we can see, the behaviour is similar to the continuous case, with the peak rapidly broadening.
Also, the profiles of the concentration show a clear similarity to the normal distributions in
Fig. 8.2.
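A minimal Matlab sketch of the explicit scheme (8.8), using the parameter values listed above, could look as follows; the choice to keep the two boundary cells fixed at zero is our own, made only so that the example is self-contained.

% Explicit scheme (8.8) for the 1D diffusion equation.
I = 21;  M = 100;  dx = 1;  dt = 1;  D = 0.25;
f = zeros(1, I);
f((I+1)/2) = M;              % initial condition: peak of height M in the middle cell
c = D*dt/dx^2;               % here 2*c = 0.5 <= 1, so the stability criterion (8.9) below holds
for n = 1:10                 % take 10 time steps
    fnew = f;
    for i = 2:I-1            % interior cells; boundary cells stay at 0 (assumed choice)
        fnew(i) = f(i) + c*( f(i+1) + f(i-1) - 2*f(i) );
    end
    f = fnew;
    bar(f); drawnow;         % bar plot of the concentration profile, as in Fig. 8.4
end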
We have not been very explicit about the exact meaning of the approximation symbol ≈. Various
questions may be asked:

1. Is the behaviour of the finite-difference equation (8.7) similar to the continuous version (8.1)?
3 Also implicit numerical schemes exist.


2. Given a certain choice for the time and space intervals ∆t and ∆x, how large is the error in
the solution we make by replacing the continuous equation by a finite-difference equation?

3. If we let ∆t and ∆x go to zero, do the solutions of the finite-difference equation become


identical to the solutions of the continuous version?

Studying these questions in detail is beyond the scope of this chapter. Such a study belongs to
a branch of mathematics called numerical analysis. One result is worth mentioning here. It has
been found that the finite-difference approximation may lead to instabilities in the solution if the
step size ∆t and ∆x are not carefully chosen. The following stability criterion guarantees that
such instabilities do not occur:
2D∆t
≤1 (8.9)
(∆x)2
The interpretation of this formula is that the maximum allowed time step is, up to a factor, equal
to the diffusion time across a cell of width ∆x.


Figure 8.5: Surface plots of the 2D isotropic Gaussian distribution N_{µ⃗,σ}(x, y) and some of its derivatives (σ = 3.0): (a) N_{µ⃗,σ}, (b) ∂N_{µ⃗,σ}/∂x, (c) ∂²N_{µ⃗,σ}/∂x², (d) ∂²N_{µ⃗,σ}/∂x² + ∂²N_{µ⃗,σ}/∂y².


8.3 The diffusion equation in 2D


We will not discuss the 2D case in the same detail as the 1D case. When the space domain is two-
dimensional, the concentration depends on three parameters: time t, and two spatial variables x
and y. So we write f (x, y, t) for the concentration. The diffusion equation now has the form:
$$\frac{\partial f(x, y, t)}{\partial t} = D \left( \frac{\partial^2 f(x, y, t)}{\partial x^2} + \frac{\partial^2 f(x, y, t)}{\partial y^2} \right) \qquad (8.10)$$

Compare this to equation (8.1). The form of the equation is the same, but we now have two
second order derivatives on the right-hand side, one with respect to x and one with respect to y.
Solutions to this equation are again a superposition of normal distributions, but now in two di-
mensions. Plots of a multivariate 2D isotropic Gaussian function N_{µ⃗,σ}(x, y) = n_{µ1,σ}(x) n_{µ2,σ}(y) (mean µ⃗ = (µ1, µ2), standard deviation σ in both dimensions) and some of its derivatives are shown in Figure 8.5.

8.3.1 Numerical solution of the diffusion equation in 2D


The discretization is done in a similar way as for the 1D case. Now we have a three-dimensional
space-time domain, with one time axis and two space axes. The time axis is again divided
into cells of size ∆t, and the two spatial axes into cells of sizes ∆x and ∆y, respectively. The
concentration of material at time tn and location (xi, yj) is f(xi, yj, tn). We abbreviate this by writing f_{i,j}^n.
The left-hand side of equation (8.10) is replaced by a finite difference,
$$\frac{\partial f(x_i, y_j, t_n)}{\partial t} \approx \frac{f_{i,j}^{n+1} - f_{i,j}^{n}}{\Delta t} \qquad (8.11)$$
Next consider the right-hand side of equation (8.10). The first term is a second order derivative
with respect to x. We replace this by a second-order difference:

$$\frac{\partial^2 f(x_i, y_j, t_n)}{\partial x^2} \approx \frac{f_{i+1,j}^{n} + f_{i-1,j}^{n} - 2 f_{i,j}^{n}}{(\Delta x)^2} \qquad (8.12)$$

Doing the same for the second order derivative with respect to y and combining all the results so
far we find the discrete diffusion equation

$$\frac{f_{i,j}^{n+1} - f_{i,j}^{n}}{\Delta t} = D \left( \frac{f_{i+1,j}^{n} + f_{i-1,j}^{n} - 2 f_{i,j}^{n}}{(\Delta x)^2} + \frac{f_{i,j+1}^{n} + f_{i,j-1}^{n} - 2 f_{i,j}^{n}}{(\Delta y)^2} \right). \qquad (8.13)$$
This equation can again be written in explicit form, as in equation (8.8).
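For illustration, one explicit time step of (8.13) can be written in vectorized Matlab as shown below; the grid size, the initial peak, and the use of circshift (which imposes periodic boundary conditions) are our own assumptions.

% One explicit time step of the 2D diffusion scheme (8.13).
D = 0.25;  dt = 1;  dx = 1;  dy = 1;
f = zeros(64, 64);  f(32, 32) = 100;       % initial peak in the middle of the grid
lap = ( circshift(f, [1 0]) + circshift(f, [-1 0]) - 2*f )/dx^2 ...
    + ( circshift(f, [0 1]) + circshift(f, [0 -1]) - 2*f )/dy^2;
f = f + D*dt*lap;                          % concentration at time step n+1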
Next we will look at how diffusion can be combined with reaction processes, which can give rise to spontaneous pattern formation.


8.4 Reaction-diffusion systems


In general it is still an open question how exactly various patterns seen in nature arise (see Fig. 8.6
for some examples). In 1952 Alan Turing (also known for his involvement in defeating Enigma,
as well as the Turing test and Turing machines already mentioned in Chapter 4) published a
fairly simple (and still relevant (Maini et al., 2012)) model that at least shows how many such
patterns could develop. The model has seen countless tweaks and variations, but the essence has
remained the same: multiple “substances” that diffuse and react can (under certain conditions)
give rise to stable patterns that always look similar but are never exactly the same. Note that
the “substances” were originally considered to be chemicals, but you can also imagine different
types of cells or even more macroscopic objects.

Figure 8.6: Leopard spots, as well as stripes on fish could plausibly be explained by the kind of
reaction-diffusion equations discussed in this section. (The leopard image was provided by user
Karamash on the English Wikipedia and available under the CC BY 3.0 license.)

We have already seen diffusion of a single “substance”. Now we just add another substance
that undergoes diffusion. Let us denote the concentration of the two diffusing substances by
f (x, y, t) and g(x, y, t), respectively. If there is no interaction between the two substances then
each of them will satisfy the diffusion equation, so we get:
$$\frac{\partial f(x, y, t)}{\partial t} = D_f \left( \frac{\partial^2 f(x, y, t)}{\partial x^2} + \frac{\partial^2 f(x, y, t)}{\partial y^2} \right)$$
$$\frac{\partial g(x, y, t)}{\partial t} = D_g \left( \frac{\partial^2 g(x, y, t)}{\partial x^2} + \frac{\partial^2 g(x, y, t)}{\partial y^2} \right),$$
Here the diffusion constants Df and Dg determine the rates of diffusion for f and g, respectively.
Now we take an essential step, which is to let the two substances react with one another. Then
we get the following pair of equations, the reaction-diffusion equation:
$$\frac{\partial f(x, y, t)}{\partial t} = D_f \left( \frac{\partial^2 f(x, y, t)}{\partial x^2} + \frac{\partial^2 f(x, y, t)}{\partial y^2} \right) + \phi_f(f(x, y, t), g(x, y, t))$$
$$\frac{\partial g(x, y, t)}{\partial t} = D_g \left( \frac{\partial^2 g(x, y, t)}{\partial x^2} + \frac{\partial^2 g(x, y, t)}{\partial y^2} \right) + \phi_g(f(x, y, t), g(x, y, t)). \qquad (8.14)$$

110
8.4. Reaction-diffusion systems

Here φf and φg are two functions that both depend on the concentrations f (x, y, t) and g(x, y, t)
and which take care of the reaction rate between the two substances.
A simple case is the linear model, where the reaction rates are just simple linear functions of the
concentrations (for simplicity we write f and g instead of f (x, y, t) and g(x, y, t)):

$$D_f = 1/10, \qquad D_g = 1/5,$$
$$\phi_f(f, g) = \frac{1}{2} f - \frac{97}{128}\, g, \qquad \phi_g(f, g) = \frac{1}{2} f - \frac{3}{4}\, g. \qquad (8.15)$$
Figure 8.7(a) shows an example of a simulation of the linear model.

(a) Linear reaction rates (b) Non-linear reaction rates

Figure 8.7: Left: a simple example using linear reaction rates (with unbounded “concentra-
tions”, white being positive and black negative), see Eq. (8.15). Right: an example with non-
linear reaction rates from Turing’s original paper (black is 0, white is 8), see Eq. (8.16). In both
cases the concentration of the second substance (g) is shown, and periodic boundary conditions
are used.

We are by no means limited to linear functions. The following nonlinear model was suggested
by Turing (1952, p. 65):

$$D_f = 1/4, \qquad D_g = 1/16,$$
$$\phi_f(f, g) = \frac{1}{16}\,(16 - f \cdot g), \qquad \phi_g(f, g, \beta) = \begin{cases} \frac{1}{16}\,(f \cdot g - g - \beta) & \text{if } g > 0 \\ 0 & \text{otherwise.} \end{cases} \qquad (8.16)$$

Here φg depends not just on the local values of f and g, but also on that of β, which actually
is a function β(x, y) depending on space but not on time. Figure 8.7(b) shows an example of a
simulation of the nonlinear model. In the given example β(x, y) is normally distributed around
12 with a standard deviation of 0.1 and constant in time. One can think of β as giving the local
concentration of a particular enzyme/catalyst, and its local fluctuations are what drives the system
to break away from a constant solution and produce a pattern. In particular, it can be checked
that f (x, y) = g(x, y) = 4 and β(x, y) = 12 for all x and y in the domain is a stable solution of
the reaction-diffusion equations with the choices specified in Eq. (8.16).
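A bare-bones Matlab sketch of such a simulation simply adds the reaction terms to the explicit diffusion update used earlier; it is shown here for the linear model (8.15). The grid size, time step, random initial state, and periodic boundary conditions are our own example choices.

% Explicit simulation of the linear reaction-diffusion model (8.15).
Df = 1/10;  Dg = 1/5;  dt = 0.5;  nsteps = 2000;
f = randn(128);  g = randn(128);                 % small random initial "concentrations"
lap = @(u) circshift(u,[1 0]) + circshift(u,[-1 0]) ...
         + circshift(u,[0 1]) + circshift(u,[0 -1]) - 4*u;   % periodic Laplacian (dx = 1)
for n = 1:nsteps
    fn = f + dt*( Df*lap(f) + (1/2)*f - (97/128)*g );        % phi_f from Eq. (8.15)
    gn = g + dt*( Dg*lap(g) + (1/2)*f - (3/4)*g );           % phi_g from Eq. (8.15)
    f = fn;  g = gn;
end
imagesc(g); colormap(gray); axis image;          % show g, compare Fig. 8.7(a)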


All examples shown here involve fairly isotropic patterns, looking like spots or a kind of “maze”,
but more elaborate systems do exist that allow generating a variety of patterns (Bard, 1981;
Meinhardt, 1982; Turk, 1991).

Bibliography

Bard, J. B. L., 1981. A model for generating aspects of zebra and other mammalian coat patterns.
Journal of Theoretical Biology 93 (2), 363–385.

Maini, P. K., Woolley, T. E., Baker, R. E., Gaffney, E. A., Lee, S. S., 2012. Turing’s model for
biological pattern formation and the robustness problem. Interface Focus 2 (4), 487–496.

Meinhardt, H., 1982. Models of biological pattern formation. Academic Press, London.

Turing, A. M., 1952. The Chemical Basis of Morphogenesis. Philosophical Transactions of the
Royal Society of London B: Biological Sciences 237 (641), 37–72.

Turk, G., 1991. Generating Textures on Arbitrary Surfaces Using Reaction-diffusion. SIG-
GRAPH Computer Graphics 25 (4), 289–298.

Chapter 9

Sequence alignment

This chapter will study algorithms for sequence alignment. In the field of bioinformatics several
new algorithms for sequence alignment have been developed. Therefore, we first explain in very
general terms what biological sequence analysis is. Then we study some computer algorithms
that biologists use to perform sequence analysis1 .

9.1 Biological background


We first present some basic biological background which is needed to understand the terms used.
This will also equip you with the required knowledge for understanding sequence alignment,
which is the topic of the remainder of this chapter.

9.1.1 DNA, RNA, proteins


The genetic information in living organisms is encoded in their DNA (or RNA, as in some
viruses). A DNA molecule is a long, linear, chain molecule (a linear polymer) consisting of four
nucleotides: deoxyAdenosine monophosphate, deoxyThymidine monophosphate, deoxyGuano-
sine monophosphate and deoxyCytidine monophosphate. Each nucleotide sub-unit consists of
a phosphate, a deoxyribose sugar and one of the 4 nitrogenous nucleobases (usually simply
called “bases”): adenine (abbreviated A), guanine (abbreviated G), cytosine (abbreviated C) and
thymine (abbreviated T); see Figure 9.1.
These four nucleotides can be considered as a four-letter alphabet. RNA is a very similar
polymer of Adenosine monophosphate, Guanosine monophosphate, Cytidine monophosphate,
and Uridine monophosphate. Uridine monophosphate is a nucleotide functionally equivalent
to Thymidine monophosphate; it contains the base uracil (abbreviated U). Since it is the bases
which distinguish the different nucleotides from each other, one often uses “base” as a synonym
of the corresponding nucleotide.
1 Part of this text was adapted from Robert Giegerich and David Wheeler, Pairwise sequence alignment, http://www.techfak.uni-bielefeld.de/bcd/Curric, VSNS-BCD ©.


Figure 9.1: The structure of DNA vs. RNA (© National Human Genome Research Institute, National Institutes of Health, USA).

Figure 9.2: Codons are triplets of bases from the RNA sequence (© National Human Genome Research Institute, National Institutes of Health, USA).


A triplet of letters (i.e., bases) from the DNA sequence (or complementary RNA sequence) is
called a codon. Each codon specifies an amino-acid; see Fig. 9.2. Successive triplets specify
successive amino acids, and sequences of amino acids form proteins.
Since there are four different nucleotides, we can form 4³ = 64 different triplet codes. The way
that these 64 codes are mapped onto 20 amino acids is called the genetic code. Since there are
only around 20 amino acids, the genetic code is redundant. Actually, not all triplets code for an
amino acid: 3 of the 64 codes, called stop codons, specify “end of amino acid sequence”. The
standard genetic code is presented in Table 9.1.

Table 9.1: The standard genetic code.

TTT Phe TCT Ser TAT Tyr TGT Cys


TTC Phe TCC Ser TAC Tyr TGC Cys
TTA Leu TCA Ser TAA STOP TGA STOP
TTG Leu TCG Ser TAG STOP TGG Trp
CTT Leu CCT Pro CAT His CGT Arg
CTC Leu CCC Pro CAC His CGC Arg
CTA Leu CCA Pro CAA Gln CGA Arg
CTG Leu CCG Pro CAG Gln CGG Arg
ATT Ile ACT Thr AAT Asn AGT Ser
ATC Ile ACC Thr AAC Asn AGC Ser
ATA Ile ACA Thr AAA Lys AGA Arg
ATG Met ACG Thr AAG Lys AGG Arg
GTT Val GCT Ala GAT Asp GGT Gly
GTC Val GCC Ala GAC Asp GGC Gly
GTA Val GCA Ala GAA Glu GGA Gly
GTG Val GCG Ala GAG Glu GGG Gly

9.1.2 Sequence similarity


Proteins and DNA can be similar with respect to their function, their structure, or their primary
sequence of amino or nucleic acids. The general rule is that sequence determines protein shape,
and shape determines function. So when we study sequence similarity, we eventually hope to
discover or validate similarity in shape and function. This approach is often successful. However,
there are many examples where two sequences have little or no similarity, but still the molecules
fold into the same shape and share the same function. In this chapter we do not speak of shape
or function. Sequences are seen as strings of characters. In fact, the ideas and techniques we
discuss have important applications in text processing, too.
Similarity has both a quantitative and a qualitative aspect: A similarity measure gives a quantita-
tive answer, saying that two sequences show a certain degree of similarity. A sequence alignment


is a mutual arrangement of two sequences which is a sort of qualitative answer; it exhibits where
the two sequences are similar, and where they differ. An optimal alignment is one that exhibits
the most correspondences, and the least differences.

9.2 Definition of sequence alignment


Given two (nucleotide or amino acid) sequences, we want to:

• measure their similarity;

• determine the correspondences between elements of the sequences.

Once this is possible one can take a given sequence and look in databanks for related sequences.
This can then be used to make biological inferences:

• observe patterns of sequence conservation between related biological species and variabil-
ity of sequences over time;

• infer evolutionary relationships.

We will consider two types of sequences, each with their own alphabet, which we denote by A.

• the DNA alphabet with four letters: A={A,C,T,G}.

• the amino acid alphabet with 20 letters.

The techniques we will describe are independent of the particular alphabet used. So we introduce
the following general definition.

Definition 9.1 Given an alphabet A, a string is a finite sequence of letters from A. Sequence
alignment is the assignment of letter-letter correspondences between two or more strings from a
given alphabet.

DNA and protein molecules evolve mostly by three processes: point mutations (exchange of a
single letter for another), insertions, and deletions. The process of sequence alignment aims at
identifying locations of a gene or DNA sequence that are derived from a common ancestral locus
(DNA location).
The following simple example is taken from Lesk (2005).
Consider two nucleotide strings GCTGAACG and CTATAATC. Some possible alignments are:


G C T G A A C G
An alignment without gaps:
C T A T A A T C

G C T G A – A – – C G
An alignment with gaps:
– – C T – A T A A T C

G C T G – A A – C G
Another alignment with gaps:
– C T A T A A T C –

The first alignment simply aligns each position of sequence 1 with the corresponding position of
sequence 2. The second alignment introduces gaps, but without any resulting match. The third
alignment introduces gaps at strategic positions in order to maximize the number of matches.
Sometimes one uses vertical bars to indicate exact matches in an alignment. For example, the
last alignment would then be written as follows:

G C T G – A A – C G
| | | | |
– C T A T A A T C –

Clearly, to decide which alignment is the best of all possibilities, we need:

1. A way to systematically examine all possible alignments;

2. A score for each possible alignment, which reflects the similarity of the two sequences.

The optimal alignment will then be the one with the highest similarity score. Note that there
may be more than one optimal alignment.
The example above illustrates pairwise sequence alignment. A mutual alignment of more than
two sequences is called multiple sequence alignment. Such multiple alignments are more infor-
mative than pairwise alignments in terms of revealing patterns of conservation. In this chapter
we will restrict ourselves to pairwise alignment.

9.3 Dotplot
The dotplot is a simple graphical way to give an overview of pairwise sequence similarity. The
dotplot is a table or matrix, where the rows correspond to the characters of one sequence and
the columns to the characters of the other sequence. In each cell (i, j), corresponding to row i
and column j, some graphical symbol (letter, color, dot, etc.) is inserted when the character at
position i in sequence 1 matches the character at position j in sequence 2. If there is no match,
the cell is left blank. Stretches of similar characters will show up in the dotplot as diagonals in
the upper-left to lower-right direction.


D O R O T H Y C R O W F O O T H O D G K I N
D D D
O O O O O O O
R R R
O O O O O O O
T T T
H H H
Y Y
H H H
O O O O O O O
D D D
G G
K K
I I
N N

Figure 9.3: Dotplot showing identities between short name (DOROTHYHODGKIN) and full
name (DOROTHYCROWFOOTHODGKIN) of a famous protein crystallographer.

A B R A C A D A B R A C A D A B R A
A A A A A A A A A
B B B B
R R R R
A A A A A A A A
C C C
A A A A A A A A A
D D D
A A A A A A A A A
B B B B
R R R R
A A A A A A A A A
C C C
A A A A A A A A A
D D D
A A A A A A A A A
B B B B
R R R R
A A A A A A A A A

Figure 9.4: Dotplot showing identities between a repetitive sequence (ABRACADABRACADABRA) and itself.


The illustrations in Figures 9.3-9.4 of a dotplot are taken from (Lesk, 2005, Example 4.1). Here
we use letters as symbols in the cells: whenever there is a match of two letters we plot that letter
in the cell; otherwise we leave it blank. In Figure 9.3 letters corresponding to isolated matches
are shown in non-bold type. The longest matching regions, shown in red, are the first and last
names DOROTHY and HODGKIN. Shorter matching regions, such as the OTH of dorOTHy and
crowfoOTHodgkin are noise; these are indicated in bold.
Figure 9.4 contains a dotplot showing identities between a repetitive sequence and itself:
ABRACADABRACADABRA. The repeats appear on several subsidiary diagonals parallel to
the main diagonal. Repetitive sequences are very common in DNA sequences.
When the sequences become very large it is impractical to use letters in the cells where matches
occur. Instead we simply put a dot. This is the ‘real’ dotplot.

Filtering. To remove very short stretches and avoid many small gaps along stretches of matches
one may use the filtering parameters window and threshold. This means that a dot will appear in
a cell of the dotplot if that cell is in the center of a stretch of characters of length window such
that the number of matches is larger than or equal to the value of threshold. Typical values would
be a window of size 15 with a threshold of 6. Another option is to give the cell a color (or grey
value), such that the higher the number of matches in the window, the more intense the color
becomes.
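As a small illustration, the following Matlab sketch builds the basic dotplot matrix of two sequences and then applies the window/threshold filtering just described in a naive (slow but simple) way; the sequences and parameter values are only examples.

% Dotplot of two sequences, with window/threshold filtering along the diagonal.
s = 'DOROTHYHODGKIN';
t = 'DOROTHYCROWFOOTHODGKIN';
dot = (s' == t);                     % basic dotplot (uses implicit expansion)
w = 5;  thr = 3;  h = floor(w/2);    % example filtering parameters
filt = zeros(length(s), length(t));
for i = 1+h : length(s)-h
    for j = 1+h : length(t)-h
        % count matches in a window of length w centred on (i,j) along the diagonal
        if sum( s(i-h:i+h) == t(j-h:j+h) ) >= thr
            filt(i,j) = 1;
        end
    end
end
spy(filt);                           % visualize the filtered dotplot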

9.4 Measures of sequence similarity


Two ways are used to quantify similarity of two sequences:

1. By a similarity measure. This is a function that associates a numeric value with a pair of
sequences, such that a higher value indicates greater similarity.

2. By a distance function. This is somewhat dual to similarity. A distance measure is a


function that also associates a numeric value with a pair of sequences, such that the larger
the distance, the smaller the similarity, and vice-versa (so a distance function is a dissim-
ilarity measure). Distance measures usually satisfy the mathematical axioms of a metric.
In particular, distance values are never negative.

In most cases, distance and similarity measures are interchangeable in the sense that a small dis-
tance means high similarity, and vice-versa. Two often-used distance measures are the following.

Hamming distance. For two strings of equal length their Hamming distance is the number of
character positions in which they differ. For example:


s:A G T C
Hamming distance = 2
t:C G T A

s:A G C A C A C A
Hamming distance = 6
t:A C A C A C T A
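In Matlab the Hamming distance of two equal-length strings is a one-liner (a small illustration, not part of the original example):

% Hamming distance between two equal-length strings
s = 'AGCACACA';  t = 'ACACACTA';
d = sum(s ~= t)      % counts the positions where the characters differ; here d = 6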

The Hamming distance measure is very useful in some cases, but in general it is not flexible
enough. First of all, sequences may have different length. Second, there is generally no fixed
correspondence between their character positions. In the mechanism of DNA replication, errors
like deleting or inserting a nucleotide are not unusual. Although the rest of the sequences may be
identical, such a shift of position leads to exaggerated values in the Hamming distance. Look at
the second example above. The Hamming distance says that s and t are apart by 6 characters (out
of 8). On the other hand, by deleting G from s and T from t, both become equal to ACACACA.
In this sense, they are only two characters apart! This observation leads to the concept of edit
distance.

Edit distance. For two strings of not necessarily equal length a distance can be based on the
number of ‘edit operations’ required to change one string to the other. Here an edit operation is
a deletion, insertion or alteration of a single character in either sequence.
Given two sequences s and t, we consider the following one-character edit operations. We intro-
duce a gap character “–” and say that the pair:
(a, a) denotes a match (no change from s to t)
(a, –) denotes deletion of character a (in s); it is indicated by inserting a “–” symbol in t
(a, b) denotes replacement of a (in s) by b (in t), where a ≠ b
(–, b) denotes insertion of character b (in s); it is indicated by inserting a “–” symbol in s.
Since the problem is symmetric in s and t, a deletion in s can be seen as an insertion in t, and
vice-versa. An alignment of two sequences s and t is an arrangement of s and t by position,
where s and t can be padded with gap symbols to achieve the same length. For the last two
sequences s and t mentioned above this yields:

Table 9.2: Two examples of alignment of two sequences.

s:A G C A C A C – A s:A G – C A C A C A
or
t:A – C A C A C T A t:A C A C A C T – A

If we read the alignment column-wise, we have a protocol of edit operations that lead from s to
t.


Left: Match (A, A)    Right: Match (A, A)


Delete (G, –) Replace (G, C)
Match (C, C) Insert (–, A)
Match (A, A) Match (C, C)
Match (C, C) Match (A, A)
Match (A, A) Match (C, C)
Match (C, C) Replace (A, T )
Insert (–, T ) Delete (C, –)
Match (A, A) Match (A, A)

The left-hand alignment shows one Delete, one Insert, and seven Matches. The right-hand align-
ment shows one Insert, one Delete, two Replaces, and five Matches.

Unit cost model. We turn the edit protocol above into a measure of distance by assigning
a “cost” or “weight” w to each operation. For example, for arbitrary characters a, b from the
alphabet A we may define:

w(a, a) = 0 (9.1)
w(a, b) = 1 for a ≠ b (9.2)
w(a, –) = w(–, b) = 1 (9.3)

This scheme is known as the Levenshtein Distance, also called unit cost model. Its pre-
dominant virtue is its simplicity. In general, more sophisticated cost models must be used (see
section 9.4.1).
Now we are ready to define the most important notion for sequence analysis.

Definition 9.2 Edit distance

• The cost of an alignment of two sequences s and t is the sum of the costs of all the edit
operations needed to transform s to t.

• An optimal alignment of s and t is an alignment which has minimal cost among all possible
alignments.

• The edit distance of s and t is the cost of an optimal alignment of s and t under a cost
function w. We denote it by dw (s; t).

Using the unit cost model for w for the example in Table 9.2, we obtain a cost of 2 for the
left alignment and a cost of 4 for the right alignment. Here it is easily seen that the left-hand
assignment is optimal under the unit cost model, and hence the edit distance dw (s; t) = 2.
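The edit distance can be computed without enumerating all possible alignments by dynamic programming. The following Matlab function, saved as editdist.m, is our own sketch of the standard recurrence under the unit cost model (9.1)–(9.3); the function name and implementation details are not taken from this reader.

function d = editdist(s, t)
% Edit distance between strings s and t under the unit cost model (9.1)-(9.3),
% computed with the standard dynamic-programming recurrence.
n = length(s);  m = length(t);
D = zeros(n+1, m+1);
D(:,1) = (0:n)';                      % cost of deleting the first i characters of s
D(1,:) = 0:m;                         % cost of inserting the first j characters of t
for i = 1:n
    for j = 1:m
        c = double(s(i) ~= t(j));     % 0 for a match, 1 for a replacement
        D(i+1,j+1) = min([D(i,j)+c, D(i,j+1)+1, D(i+1,j)+1]);
    end
end
d = D(n+1, m+1);
end

For the two sequences of Table 9.2, editdist('AGCACACA', 'ACACACTA') indeed returns 2.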


9.4.1 Scoring functions


For applications in molecular biology, it is important to realize that some changes in nucleotide
or amino acid sequences are more likely than others. For example, amino acid substitutions
tend to be conservative. This means that it is likely that an amino acid is replaced by another
amino acid with similar physicochemical properties. Also, the deletion of a contiguous sequence
of elements (bases or amino acids) is more probable than the independent deletion of the same
number of elements at non-contiguous positions. Therefore we want to assign variable weights
to different edit operations.
This leads to the concept of scoring functions or substitution matrices. A substitution matrix is a
square array of values which indicate the scores associated to possible transitions (substitutions,
insertions, deletions). One uses either:

• Similarity scores. Here substitutions that are more likely get a higher score.
• Dissimilarity scores. Here substitutions that are more likely get a lower score. The edit
distance falls in this category: more likely transitions get lower scores (costs).

Similarity scoring schemes for nucleic acid sequences. In the case of nucleic acid sequence
comparison, one uses a Percent Identity substitution matrix. For example, a 99% and a 50%
identity matrix have the following forms, respectively2 :

A T G C A T G C
A +1 -3 -3 -3 A +3 -2 -2 -2
T -3 +1 -3 -3 T -2 +3 -2 -2
G -3 -3 +1 -3 G -2 -2 +3 -2
C -3 -3 -3 +1 C -2 -2 -2 +3
99% identity matrix 50% identity matrix

When we use the 99% identity matrix, mismatches have a large penalty, so we look for strong
alignments. For the case of the 50% identity matrix, weaker alignments have higher scores. For
example, for the two sequences

C A G G T A G C A A G C
| | | | | | | | | |
C A T G T A G C A C G C

the 99% score is 10 · 1 − 2 · 3 = 4, but the 50% score is higher, i.e., 10 · 3 − 2 · 2 = 26.
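The score computation above is easy to check in Matlab; the following snippet (our own illustration) scores this ungapped alignment with both Percent Identity matrices:

% Score an ungapped nucleotide alignment with a Percent Identity matrix.
bases = 'ATGC';
S99 = -3*ones(4) + 4*eye(4);        % 99% identity matrix: +1 on the diagonal, -3 elsewhere
S50 = -2*ones(4) + 5*eye(4);        % 50% identity matrix: +3 on the diagonal, -2 elsewhere
s = 'CAGGTAGCAAGC';  t = 'CATGTAGCACGC';
[~, si] = ismember(s, bases);       % map characters to matrix indices
[~, ti] = ismember(t, bases);
score99 = sum( S99(sub2ind([4 4], si, ti)) )   % gives 4
score50 = sum( S50(sub2ind([4 4], si, ti)) )   % gives 26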
More complicated matrices may be used which reflect the fact that transitions A↔G and T↔C
(transition mutations) are more common than (A or G)↔(T or C) (transversion mutations).
In addition to the substitution matrix, one has to specify values for creating and extending a gap
in the alignment.
2 The derivation of these matrices is outside the scope of this introduction.


Similarity scoring schemes for amino acid sequences. In the case of amino acid sequence
comparison, the two most common schemes are:

• PAM (Percent Accepted Mutation) matrix, developed by Margaret Dayhoff in the 1970s.
This estimates the rate at which each possible residue in a sequence changes to each other
residue over time. 1 PAM = 1 percent accepted mutation. For example PAM30 corresponds
to 75% sequence similarity, PAM250 to 20% similarity.
As an example we show the PAM250 scoring matrix in Table 9.3. This PAM 250 matrix
has a built-in gap penalty of -8, as seen in the * column (of course, other gap penalties may
be used)3 . There are 24 rows and 24 columns. The first 20 are the amino acids, represented
by the one letter code. B represents the case where there is ambiguity between aspartate or
asparagine, and Z is the case where there is ambiguity between glutamate or glutamine. X
represents an unknown, or nonstandard, amino acid.

• BLOSUM (BLOck SUbstitution Matrix), developed by S. Henikoff and J.G. Henikoff. A


BLOSUM-X matrix identifies sequences that are X% similar to the query sequence, based
on a more realistic model of amino acid substitutions. For example, one often uses the
BLOSUM50 for 50% or BLOSUM62 matrix for 62% sequence identity. BLOSUM62 is
used for closer sequences than BLOSUM50. The BLOSUM50 matrix has the form given
in Table 9.4. The coding is the same as for the PAM250 matrix (J is the case where there
is ambiguity between Leucine or Isoleucine).

I G R H R Y H I G – G
: | | | | |
– S – – R Y – I G R G

9.5 Dotplots and sequence alignment


It is helpful to return to the dotplot , because it captures not only sequence similarity, but also
the complete set of possible alignments. Consider again the simple example of Figure 9.3. Any
path through this dotplot from upper left to lower right, moving at each cell only East, South or
Southeast, corresponds to a possible alignment. This is visualized in Figure 9.5.

3 A PAM matrix calculator is available at http://www.bioinformatics.nl/tools/pam.html.


Table 9.3: The PAM250 scoring matrix (source: http://www.bioinformatics.nl/tools/pam.html).

Table 9.4: The BLOSUM50 scoring matrix (source: http://www.ncbi.nlm.nih.gov/IEB/ToolBox/C_DOC/lxr/source/data/BLOSUM50).

# Entries for the BLOSUM50 matrix at a scale of ln(2)/3.0.
  A R N D C Q E G H I L K M F P S T W Y V B J Z X *
A 5 -2 -1 -2 -1 -1 -1 0 -2 -1 -2 -1 -1 -3 -1 1 0 -3 -2 0 -2 -2 -1 -1 -5
R -2 7 -1 -2 -4 1 0 -3 0 -4 -3 3 -2 -3 -3 -1 -1 -3 -1 -3 -1 -3 0 -1 -5
N -1 -1 7 2 -2 0 0 0 1 -3 -4 0 -2 -4 -2 1 0 -4 -2 -3 5 -4 0 -1 -5
D -2 -2 2 8 -4 0 2 -1 -1 -4 -4 -1 -4 -5 -1 0 -1 -5 -3 -4 6 -4 1 -1 -5
C -1 -4 -2 -4 13 -3 -3 -3 -3 -2 -2 -3 -2 -2 -4 -1 -1 -5 -3 -1 -3 -2 -3 -1 -5
Q -1 1 0 0 -3 7 2 -2 1 -3 -2 2 0 -4 -1 0 -1 -1 -1 -3 0 -3 4 -1 -5
E -1 0 0 2 -3 2 6 -3 0 -4 -3 1 -2 -3 -1 -1 -1 -3 -2 -3 1 -3 5 -1 -5
G 0 -3 0 -1 -3 -2 -3 8 -2 -4 -4 -2 -3 -4 -2 0 -2 -3 -3 -4 -1 -4 -2 -1 -5
H -2 0 1 -1 -3 1 0 -2 10 -4 -3 0 -1 -1 -2 -1 -2 -3 2 -4 0 -3 0 -1 -5
I -1 -4 -3 -4 -2 -3 -4 -4 -4 5 2 -3 2 0 -3 -3 -1 -3 -1 4 -4 4 -3 -1 -5
L -2 -3 -4 -4 -2 -2 -3 -4 -3 2 5 -3 3 1 -4 -3 -1 -2 -1 1 -4 4 -3 -1 -5
K -1 3 0 -1 -3 2 1 -2 0 -3 -3 6 -2 -4 -1 0 -1 -3 -2 -3 0 -3 1 -1 -5
M -1 -2 -2 -4 -2 0 -2 -3 -1 2 3 -2 7 0 -3 -2 -1 -1 0 1 -3 2 -1 -1 -5
F -3 -3 -4 -5 -2 -4 -3 -4 -1 0 1 -4 0 8 -4 -3 -2 1 4 -1 -4 1 -4 -1 -5
P -1 -3 -2 -1 -4 -1 -1 -2 -2 -3 -4 -1 -3 -4 10 -1 -1 -4 -3 -3 -2 -3 -1 -1 -5
S 1 -1 1 0 -1 0 -1 0 -1 -3 -3 0 -2 -3 -1 5 2 -4 -2 -2 0 -3 0 -1 -5
T 0 -1 0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1 2 5 -3 -2 0 0 -1 -1 -1 -5
W -3 -3 -4 -5 -5 -1 -3 -3 -3 -3 -2 -3 -1 1 -4 -4 -3 15 2 -3 -5 -2 -2 -1 -5
Y -2 -1 -2 -3 -3 -1 -2 -3 2 -1 -1 -2 0 4 -3 -2 -2 2 8 -1 -3 -1 -2 -1 -5
V 0 -3 -3 -4 -1 -3 -3 -4 -4 4 1 -3 1 -1 -3 -2 0 -3 -1 5 -3 2 -3 -1 -5
B -2 -1 5 6 -3 0 1 -1 0 -4 -4 0 -3 -4 -2 0 0 -5 -3 -3 6 -4 1 -1 -5
J -2 -3 -4 -4 -2 -3 -3 -4 -3 4 4 -3 2 1 -3 -3 -1 -2 -1 2 -4 4 -3 -1 -5
Z -1 0 0 1 -3 4 5 -2 0 -3 -3 1 -1 -4 -1 0 -1 -2 -2 -3 1 -3 5 -1 -5
X -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -5
* -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 1
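The text matrix above is easy to use programmatically. The following sketch (an illustration only; the function name and file name are ours) reads such a whitespace-separated scoring matrix into a nested dictionary, so that individual pair scores can be looked up.

def read_scoring_matrix(lines):
    # Parse a scoring matrix: a header row of symbols, then one labelled row per symbol.
    header = lines[0].split()
    matrix = {}
    for line in lines[1:]:
        fields = line.split()
        matrix[fields[0]] = dict(zip(header, map(int, fields[1:])))
    return matrix

# Example use, assuming the table above is saved in a plain-text file 'blosum50.txt':
# with open("blosum50.txt") as f:
#     blosum50 = read_scoring_matrix([l for l in f if l.strip() and not l.startswith("#")])
# print(blosum50["W"]["W"])   # 15: tryptophan is strongly conserved
# print(blosum50["A"]["R"])   # -2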


[Figure 9.5: dotplot of DOROTHYHODGKIN (rows) against DOROTHYCROWFOOTHODGKIN (columns), with the cells of one possible alignment path marked.]

Figure 9.5: Any path through this dotplot from upper left to lower right, moving at each cell only
East, South or Southeast, corresponds to a possible alignment.

The path consists of a succession of cells, each of which:

• pairs a character from the row with a character from the column;

• or indicates a gap in one of the sequences.

If the direction of a move between successive cells is diagonal, two pairs of successive characters
appear in the alignment without an insertion between them. If the move is horizontal (vertical), a
gap is introduced in the sequence indexing the rows (columns). Note that the path need not pass
through filled-in points. However, the more filled-in points are on the path, the more matching
characters the alignment will contain. This is for example the case for Figure 9.5. The path in
this figure corresponds to the alignment:

D O R O T H Y – – – – – – – – H O D G K I N
D O R O T H Y C R O W F O O T H O D G K I N
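As an aside, the dotplot itself is straightforward to compute. The sketch below (an illustration, not from the original text) marks every cell where the row character equals the column character, for the two names used in Figure 9.5.

def dotplot(s, t):
    # Boolean matrix with True wherever s[i] == t[j].
    return [[a == b for b in t] for a in s]

rows = "DOROTHYHODGKIN"
cols = "DOROTHYCROWFOOTHODGKIN"
for a, line in zip(rows, dotplot(rows, cols)):
    print(a, "".join("*" if hit else "." for hit in line))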

An example of a dotplot of the amino acid sequence of the SLIT protein of Drosophila melanogaster
(the fruit fly) is given in Figure 9.6. This protein is necessary for the development of certain brain
structures of the fruit fly.


Figure 9.6: (a): Dotplot of the amino acid sequence of Drosophila melanogaster SLIT protein.
(b): alignment in a local region of the dotplot (plots made by the web tool Dotlet available at
http://myhits.isb-sib.ch/cgi-bin/dotlet).

9.6 Pairwise alignment via dynamic programming


Now we come to an important question: how do we compute sequence alignments?
The number of possible alignments between two large DNA or amino acid sequences is gigantic,
and unless the weight function is very simple, it may seem difficult to pick out an optimal alignment.
But fortunately, there is an easy and systematic way to find it. The method described now
is very famous in mathematical optimization and computer programming. It is usually called “the
dynamic programming algorithm”, and was developed by Richard Bellman4 . This algorithm is a
method for solving complex problems by breaking them down into simpler steps.
The idea to use dynamic programming to solve the global pairwise sequence alignment problem
was first put forward by Needleman and Wunsch (1970). For this reason the method we will
describe below is known as the Needleman-Wunsch algorithm5 .
But first some words of caution.

• The algorithm is guaranteed to give a global optimum: it will find the best alignment score,
i.e., the minimal cost, given the weight parameters.
4 A special case of this is the shortest path algorithm of the Dutch computer scientist E.W. Dijkstra.
5 Actually, it is a simplified version of this algorithm.


• However, many alignments may give the same optimal score, and none of these may actually
correspond to the biologically correct one. Many alignments may exist with scores
close to the optimum, and one of these may be the correct one.

• The time to align two sequences of lengths n and m is proportional to n × m. So this
method is not suitable for matching one sequence against an entire database of sequences.

We will only outline the basic idea of the algorithm. A detailed treatment of this topic will have
to wait until the course on Algorithms & Datastructures.

9.6.1 Optimal substructure property

Figure 9.7: If M is a point on an optimal path π[S→T ] from point S to point T (solid line), then
the paths π[S→M ] and π[M →T ] along the solid line are also optimal paths. The cost of the dotted
path from S to M cannot be smaller than the cost of the solid path from S to M .

The main observation is the following, which is usually called the Optimal substructure property;
see Figure 9.7.

Observation (Optimal substructure). Consider a path π[S→T ] between two points S and T
which is optimal, i.e., has the lowest cost. Let M be a point on the path π[S→T ] . Then the path
π[S→M ] from S to M which everywhere follows the path from S to T is also an optimal path, i.e.,
has lowest cost of all paths from S to M .

To show that this observation is correct, we use a proof by contradiction6. So let us assume that
the path π[S→M] from S to M is not an optimal path. Then there would be another path π′[S→M]
from S to M with a lower cost than the path π[S→M]. But then the path π′[S→M] followed by the
path π[M→T] would be a path from S to T with a smaller cost than the original path π[S→T]. But
this is impossible, since we assumed that π[S→T] was optimal.
The main observation can be used to systematically subdivide the optimal alignment problem into
parts which are slightly smaller. The dynamic programming method is based on this idea.
6 In Dutch: Bewijs uit het ongerijmde.


9.6.2 Recursive computation of the edit distance


Remember that the edit distance dw(s; t) of two sequences s = a1 a2 . . . an and t = b1 b2 . . . bm
is the cost of an optimal alignment of s and t under a cost function w. That is, dw(s; t) is the
minimum7 cost over all sequences of edit operations that convert s and t into a common sequence.
In the context of the dotplot matrix, the alignment problem can be reformulated as follows. Find
a path through the matrix from the upper left to the lower right (with moves to the East, South
or Southeast only) that has the lowest cost. We can do this by creating a matrix D, with
elements D(i, j), i = 1, 2, . . . , n and j = 1, 2, . . . , m, such that D(i, j) is the minimal edit
distance between the sequences that consist of the first i characters of s and the first j characters
of t. Then D(n, m) will be the minimal edit distance between the full sequences s and t.
Sequences of edit operations correspond to paths in the dotplot matrix of the form
(i0, j0) = (0, 0) → (i1, j1) → · · · → (n, m)
Each step in the matrix which arrives in cell (i, j) can be of three types, i.e., East (previous cell
was (i, j−1)), South (previous cell was (i−1, j)), and SouthEast (previous cell was (i−1, j−1)).
This corresponds to three possible edit operations with associated costs, as follows:

edit operation step in matrix cost


substitution of ai → bj (i − 1, j − 1) → (i, j) w(ai , bj )
deletion of ai from sequence s (i − 1, j) → (i, j) w(ai , –)
deletion of bj from sequence t (i, j − 1) → (i, j) w(–, bj )

The algorithm computes D(i, j) by recursion. We compare the costs of the following three paths
which arrive at cell (i, j):

1. The optimal path from the start (0, 0) to cell (i − 1, j − 1), followed by the step (i − 1, j −
1) → (i, j). The cost of this path is D(i − 1, j − 1) + w(ai , bj ).
2. The optimal path from the start (0, 0) to cell (i − 1, j), followed by the step (i − 1, j) →
(i, j). The cost of this path is D(i − 1, j) + w(ai , –).
3. The optimal path from the start (0, 0) to cell (i, j − 1), followed by the step (i, j − 1) →
(i, j). The cost of this path is D(i, j − 1) + w(–, bj ).

If we take the minimum of these three costs we get the cost D(i, j) of the optimal path from the
start (0, 0) to cell (i, j) (compare the Optimal substructure property above).
So we have derived the recursion:

D(i, j) = min{D(i − 1, j − 1) + w(ai , bj ), D(i − 1, j) + w(ai , –), D(i, j − 1) + w(–, bj )}


7 Note that we have to take the minimum because edit distance is a dissimilarity score.
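The recursion can be transcribed into code almost literally. The following memoized sketch (our own illustration, distinct from the iterative pseudo-code of Algorithm 9.1 below) computes D(n, m) directly from the recursion, given a substitution cost function w and a constant gap penalty g.

from functools import lru_cache

def edit_distance(s, t, w, g):
    # D(i, j) computed directly from the recursion, with memoization playing the role of the D-matrix.
    @lru_cache(maxsize=None)
    def D(i, j):
        if i == 0:
            return j * g                 # j unmatched characters of t at the start
        if j == 0:
            return i * g                 # i unmatched characters of s at the start
        return min(D(i - 1, j - 1) + w(s[i - 1], t[j - 1]),    # diagonal: substitution
                   D(i - 1, j) + g,                            # vertical: gap in t
                   D(i, j - 1) + g)                            # horizontal: gap in s
    return D(len(s), len(t))

# Scoring scheme of the example below: match 0, mismatch 4, gap 5.
print(edit_distance("GGAATGG", "ATG", lambda a, b: 0 if a == b else 4, 5))   # prints 20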


In order to retrieve the optimal path (and not only the optimal cost) after the calculation we also
store a pointer (an arrow) to one of the three cells (i − 1, j − 1), (i − 1, j) or (i, j − 1) that
provided the minimal value. This cell is called the predecessor of (i, j). If there are more cells
that provided the minimal value (remember that optimal paths need not be unique) we store a
pointer to each of these cells.
On the top row and left column of the matrix we have no North or West neighbours, respectively.
So here we have to initialize values:
D(i, 0) = Σ_{k=1}^{i} w(ak, –),      D(0, j) = Σ_{k=1}^{j} w(–, bk)

which impose the gap penalty on unmatched characters at the beginning of either sequence.
In practice one often uses a constant gap penalty:

w(ak , –) = w(–, bk ) = g

The pseudo-code for the algorithm to compute the D-matrix is given in Algorithm 9.1.
As an example, let us align the sequences s=GGAATGG and t=ATG, with scoring scheme:

w(a, a) = 0 (match)
w(a, b) = 4 for a ≠ b (mismatch)
w(a, –) = w(–, b) = 5 (gap insertion)


After initialization and the first diagonal step the matrix looks as follows:

 s\t     –     A     T     G
  –      0     5    10    15
  G      5     4
  G     10
  A     15
  A     20
  T     25
  G     30
  G     35

The value 4 was entered at position (1, 1) since it is the smallest of the three possibilities 5 + 5
(horizontal move), 5 + 5 (vertical move), 0 + 4 (diagonal move). This also means that cell (0,0)
is the predecessor of cell (1,1).
After the complete calculation has finished, the matrix, including pointers to the predecessor(s)
of each cell, looks as follows:

 s\t      –        A        T        G
  –       0   ←    5   ←   10   ←   15
          ↑       ↖        ↖        ↖
  G       5        4   ←    9       10
          ↑       ↖↑       ↖        ↖
  G      10        9        8        9
          ↑       ↖        ↖↑       ↖
  A      15       10       13       12
          ↑       ↖↑       ↖        ↖↑
  A      20       15*      14       17
          ↑        ↑       ↖        ↖
  T      25       20       15       18
          ↑        ↑        ↑       ↖
  G      30       25       20       15
          ↑        ↑        ↑       ↖↑
  G      35       30       25       20*

(↖, ↑ and ← indicate diagonal, vertical and horizontal predecessors; * marks the cells where the trace-back branches.)


Trace-back paths of optimal alignments are found by starting at the lower right cell and following
the predecessor pointers back to the upper left. There are two cells (cost values marked with an
asterisk in the matrix above) where the trace-back path branches. This gives four optimal alignments with equal score:

G G A A T G G        G G A A T G G
– – – A T G –        – – – A T – G

G G A A T G G        G G A A T G G
– – A – T G –        – – A – T – G

Algorithm 9.1 Needleman-Wunsch algorithm for global pairwise sequence alignment.


1: INPUT: two sequences s = a1 a2 . . . an and t = b1 b2 . . . bm ; cost function w with gap penalty g
2: OUTPUT: matrix D; the entry D(n, m) is the minimal edit distance between the sequences s and t
3: for i = 0 to n do
4: D(i, 0) ← g · i
5: end for
6: for j = 0 to m do
7: D(0, j) ← g · j
8: end for
9: for i = 1 to n do
10: for j = 1 to m do
11: Match ← D(i − 1, j − 1) + w(ai , bj )
12: Delete ← D(i − 1, j) + g
13: Insert ← D(i, j − 1) + g
14: D(i, j) ← min(Match, Insert, Delete)
15: end for
16: end for
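A possible Python transcription of Algorithm 9.1, extended with a simple trace-back, is sketched below (an illustration, not part of the original reader; the names are ours). For the example above it returns the optimal cost 20 and one of the four optimal alignments.

def needleman_wunsch(s, t, w, g):
    # Global alignment by dynamic programming (cf. Algorithm 9.1).
    n, m = len(s), len(t)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = g * i
    for j in range(1, m + 1):
        D[0][j] = g * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(D[i - 1][j - 1] + w(s[i - 1], t[j - 1]),  # diagonal
                          D[i - 1][j] + g,                          # vertical
                          D[i][j - 1] + g)                          # horizontal
    # Trace back one optimal path from (n, m) to (0, 0).
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + w(s[i - 1], t[j - 1]):
            top.append(s[i - 1]); bottom.append(t[j - 1]); i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + g:
            top.append(s[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(t[j - 1]); j -= 1
    return D[n][m], "".join(reversed(top)), "".join(reversed(bottom))

w = lambda a, b: 0 if a == b else 4
print(needleman_wunsch("GGAATGG", "ATG", w, 5))   # (20, 'GGAATGG', '---AT-G')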

9.7 Variations and generalizations


What we have discussed so far is global alignment of two sequences. There are a number of
variants and generalizations of this scheme that we briefly mention.

• Local alignment. Here we look for a region in one sequence that matches a region in
another sequence. An algorithm for this purpose was developed by Smith and Waterman
(1981). Or we probe a database with one sequence, regarding the database itself as one
very long sequence. A well-known algorithm for this is PSI-BLAST (“Position Specific
Iterative Basic Local Alignment Search Tool”).

• Motif match. Here we look for matches of a short sequence fragment (the “motif”) in one
or more regions of a long sequence.


• Multiple alignment. Here we do a mutual alignment of more than two sequences.


• Significance of alignments. Suppose we find an alignment between two sequences with
a high similarity. Then we want to know whether this result is significant or could have
arisen by chance. To answer this question, one looks at random permutations of one of
the sequences, aligns each of them with the other sequence, and computes the distribution
of the scores. If the randomized sequences score as well as the original sequence, the
alignment is unlikely to be significant. Statistical measures of this significance are the
Z-score = (score − mean)/(standard deviation), or the p-value, which is the probability that the
observed match could have happened by chance.
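As a rough illustration of such a permutation test (a sketch only, reusing the needleman_wunsch function from the sketch after Algorithm 9.1; since the edit distance is a dissimilarity score, a significantly good alignment shows up as a strongly negative Z-score):

import random

def permutation_z_score(s, t, w, g, n_shuffles=1000, seed=0):
    # Compare the observed alignment cost with costs for randomly permuted versions of t.
    rng = random.Random(seed)
    observed = needleman_wunsch(s, t, w, g)[0]
    costs = []
    for _ in range(n_shuffles):
        shuffled = list(t)
        rng.shuffle(shuffled)
        costs.append(needleman_wunsch(s, "".join(shuffled), w, g)[0])
    mean = sum(costs) / len(costs)
    std = (sum((c - mean) ** 2 for c in costs) / len(costs)) ** 0.5
    return (observed - mean) / std if std > 0 else float("nan")

w = lambda a, b: 0 if a == b else 4
print(permutation_z_score("GGAATGG", "ATG", w, 5))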

9.8 Sequence logos


An interesting way to visualize multiple sequence alignments is the sequence logo (Schneider
and Stephens, 1990).
The idea is to use a graphical display of a multiple alignment, with colored stacks of letters
representing the nucleotides or amino acids at successive positions, such that the height of a letter
at a position increases with the frequency of that letter at that position in the different sequences.
This means that letters in stacks with a single amino acid (conserved positions) are taller than
those in stacks with multiple amino acids, where there is more variation. An example is given
in Figure 9.8. This logo shows a small sample of human exon-intron splice boundaries on the
DNA (Stephens and Schneider, 1992). Splicing is a modification of mRNA after transcription,
in which introns are removed and exons are joined, which is needed before the RNA can be used
to produce a correct protein through translation. Looking at Figure 9.8, we see that at position 0
all sequences have the same base G; at position 4 all four bases occur, with G most frequently, T
less frequently, etc.

Figure 9.8: Sequence logo of human exon-intron splice boundaries. © http://weblogo.berkeley.edu

A sequence logo is an alternative to a so-called consensus sequence, which would display a
single sequence as the representative of all the multiple sequences. Sequence logos have the
advantage that information on the frequency of occurrence of the different letters is maintained.
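To give a flavour of how such a logo is built up, the sketch below (an illustration only, with a made-up toy alignment) computes the per-position letter frequencies of a small multiple alignment; in an actual logo the total height of a stack reflects the information content of that position, and the letters within the stack are scaled by these frequencies.

from collections import Counter

def column_frequencies(alignment):
    # Relative frequency of each letter at each position of a multiple alignment
    # (a list of equal-length sequences).
    n_seq = len(alignment)
    return [{letter: count / n_seq for letter, count in Counter(column).items()}
            for column in zip(*alignment)]

toy_alignment = ["CAGGTAAG", "CAGGTGAG", "CTGGTAAG", "CAGGTAAG"]
for pos, freqs in enumerate(column_frequencies(toy_alignment)):
    print(pos, freqs)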

9.9 Circular visualization


Another way to visualize alignments is through circular arrangements. The Circos tool
(http://mkweb.bcgsc.ca/circos) was developed for this purpose. An example for the human genome
is given in Figure 9.9. This figure shows the chromosomes arranged in a circular orientation,
shown as wedges, marked with a length scale. Data placed outside of the chromosome ring
represent small- and large-scale variations at a given genome position found between different
populations.
Data placed on top of the chromosome ring highlight positions of genes implicated in disease,
such as cancer, diabetes, and glaucoma. Data placed inside the ring link disease-related genes
found in the same biochemical pathway (grey) and the degree of similarity for a subset of the
genome (colored).

Figure 9.9: An illustration of the human genome showing the location of genes implicated in
disease. Taken from http://mkweb.bcgsc.ca/circos/intro/genomic_data (original picture:
The Condé Nast Portfolio, http://www.portfolio.com/news-markets/national-news/portfolio/2007/10/15/23andMe-Web-Site).

Bibliography

Lesk, A. M., 2005. Introduction to Bioinformatics (2nd ed.). Oxford University Press, New York,
NY.

Needleman, S. B., Wunsch, C. D., 1970. A general method applicable to the search for similarities
in the amino acid sequence of two proteins. Journal of Molecular Biology 48 (3), 443–453.

Schneider, T. D., Stephens, R. M., 1990. Sequence logos: A new way to display consensus
sequences. Nucleic Acids Res. 18, 6097–6100.
URL http://www.ccrnp.ncifcrf.gov/~toms/paper/logopaper/

Smith, T. F., Waterman, M. S., 1981. Identification of common molecular subsequences. Journal
of Molecular Biology 147 (1), 195–197.
URL http://www.sciencedirect.com/science/article/B6WK7-4DN3Y5S-24/2/b00036bf942b543981e4b5b7943b3f9a

Stephens, R. M., Schneider, T. D., 1992. Features of spliceosome evolution and function inferred
from an analysis of the information at human splice sites. J. Mol. Biol. 228, 1124–1136.

Index

a priori information, 23 of alignment, 123


constants of motion, 81
dark matter, 92
algebraic reconstruction technique, 13 deletion, 118
aliasing, 30 differential equations, 75
alignment linear, 75
optimal, 119, 123 nonlinear, 82
alphabet diffusion, 103
amino acid, 118 coefficient, 104
DNA, 118 diffusion equation, 103
amino acid, 117 discrete, 107
angular velocity, 94 direct simulation, 98
ART, see algebraic reconstruction technique dissimilarity
measure, 121
barycenter, 96 distance function, 121
DNA, 115
Cantor set, 71
bases, 115
cellular automata, 47
dotplot, 119, 125
game of life, 51
filtering, 121
glider, 52
path through, 125
Gosper glider gun, 53
dynamic programming, 128
majority voting, 48
reversible, 54 edit
centripetal force, 96 distance, 122, 123
circular visualization, 135 recursive computation, 130
codon, 117 operation, 122
stop, 117 Euler method, 99
Computational Science, 5 explicit, 77
computational-X, 6 implicit, 78
computer tomography, see tomography symplectic, 80
constant of motion, 82, 86 exoplanetary systems, 91
convergence, 22
cosmic web, 92 filtered backprojection, 13
cost, 123 filtering
model, 123 threshold, 121
unit, 123 window, 121


finite difference scheme, 98, 106 mutation, 118


finite difference schemes, 76
fractals, 57 n-body simulations, 89
Needleman-Wunsch algorithm, 128
game of life, see cellular automata norm, 19
gap normal distribution, 105
character, 122 nucleotides, 115
in alignment, 119 null
Gaussian function, 105 image, 16
genetic code, 117 solution, 16
Gibbs phenomena, 30 space, 34
numerical analysis, 108
Hamming distance, 121
heat equation, 103 optimal substructure, 129
initialization, 24 partial differential equation, 104
inner product, 18, 31 particle mesh, 100
insertion, 118 pattern formation, 47
Kaczmarz method Perron-Frobenius theorem, 41
binary images, 23 phase-space plot, 78
general case, 21 planetary system formation, 90
simple case, 17 predecessor, 131
Kepler instrument, 91 profile, see projection profile
Kepler problem, 96 projection, 35
data, 11
Late Heavy Bombardment, 90 perpendicular, 18
leapfrog method, 99 projection profile, 12
Levenshtein distance, 123 protein, 117
linear dependence, 35
linear equations, 33 Radon transform, 12
locus, 118 random walk, 39
logistic equation, 83 ray
logistic model, 82 equation, 15
Lotka-Volterra model, 85 sum, 13
reaction-diffusion, 103
Markov chains, 39 reconstruction
matrix Fourier, 13
addition, 33 tomographic, 9
definition, 32 RNA, 115
multiplication, 33
Millenium-XXL simulation, 92 scanning
Moiré patterns, 30 fan-beam, 12
motif match, 133 parallel beam, 12
motion invariant, 86 Scientific Computing, 5


Scientific Visualization, 5 two-body problem, 94


score
dissimilarity, 124 universal Turing machine, 54
similarity, 124 vector
scoring function, 124 addition, 31
sequence alignment, 115 column, 31
definition, 118 definition, 31
global, 128, 133 dimension, 31
local, 133 inner product, 31
multiple, 119, 134 multiplication, 31
pairwise, 119 norm, 31
sequence logo, 134 row, 31
Shepp-Logan head phantom, 25 visualization
significance of alignment, 134, 135
of alignment, 134 volume rendering, 11
similarity, 119
measure, 121 weight, 123
solution
trivial, 34 X-informatics, 6
SPECT, 9
stability criterion, 108
stopping criteria, 22
streaks, 30
string, 118
substitution matrix, 124
BLOSUM, 125
PAM, 125
Percent Identity, 124
superposition principle, 105
system
consistent, 17
inconsistent, 23
overcomplete, 17

three-body problem, 90
tomography, 9
diffraction, 9
electric impedance, 9
emission, 9
reconstruction methods, 13
reflection, 9
transmission, 9
triplet, 117

