
A Brief Introduction to

Numerical Methods in Physics

Julius B. Kirkegaard

December 17, 2021

Contents

1 Introduction
2 Numerical Differentiation
  2.1 First Order Derivatives
  2.2 Higher Order Derivatives
  2.3 Deriving Schemes
  2.4 Machine Precision
  2.5 Spectral Methods
3 Ordinary Differential Equations
  3.1 Initial-Value Problems
    3.1.1 Runge–Kutta Methods
    3.1.2 Implicit Time-Stepping
  3.2 Boundary-Value Problems
    3.2.1 Shooting Method
    3.2.2 Finite Difference Method
    3.2.3 Spectral Methods
    3.2.4 Non-linear Problems
4 Partial Differential Equations
  4.1 Explicit Methods & Stability Analysis
  4.2 Finite Difference Method
    4.2.1 Time-dependent Problems
    4.2.2 Time-independent Problems
  4.3 Boundary Conditions
  4.4 Spectral Method
  4.5 Finite Element Method
  4.6 Non-linear Problems
  4.7 Operator Splitting
5 Stochastic Systems
  5.1 Random Numbers
    5.1.1 Inverse Transform Sampling
    5.1.2 Rejection Sampling
    5.1.3 Markov Chain Monte Carlo
  5.2 Event-based Simulations
  5.3 Stochastic Differential Equations
    5.3.1 Initial-Value Problems
    5.3.2 Boundary-Value Problems
Introduction

These notes are meant to serve as an accessible introduction to a broad range of computational methods often employed in physics. Understanding is valued over full mathematical details and in-depth concepts.
Once you have read this text you should
1. have a good overview of what numerical methods exist, and be able to choose an appropriate method when faced with a challenging problem.
2. have a basic understanding of most methods. At least enough to get you started writing some code, and definitely enough for you to be comfortable reading more detailed descriptions of the method elsewhere.
The text is meant to be accompanied by a course that teaches hands-on how to apply these methods in a specific programming environment.

Numerous methods will be presented to tackle a diverse set of problems. It is not expected that you will master all these methods, nor even ever use many of them. Instead, it is expected that you will understand most of them and employ only some of them. Nonetheless, even if you never use a specific method, there is a great deal of value in having an overview of what methods exist. If you do not know what exists, how would you know what to look for? Further,


in the academic world one often finds oneself in a situation where some obscure technique pops up, in which case it is very useful to be able to put this in the context of previously learned material. It will make reading other people’s code easier and it will make listening to talks more interesting.
First and foremost though, the aim of these notes is to teach you methods that you will use. Some of these methods could become the hammer that you will use to solve most problems. The aim is to give you enough understanding to implement simple versions of all methods, and provide enough of a background to be able to take a deep dive into the specifics if needed.
The notes are kept largely code-free. We will discuss the mathematics needed to implement the methods presented and will assume only basic functionality of the programming language of your choice. Many methods presented will exist in some form implemented in libraries that can be downloaded and used. If these libraries are of high quality, we naturally recommend using them. In this case, understanding the underlying methods becomes important primarily to understand how to tune the methods and when to expect things to go wrong, or, naturally, to help when the code needs to be adapted to suit a specific need.
Numerical Differentiation

Physical laws are often defined by differential equations, and the solution of these will be the main subject of this text. It is therefore crucial that we understand how to do differentiation of functions numerically.
This chapter will go over a few methods for numerical differentiation and discuss the accuracy of these methods.

2.1 First Order Derivatives


Consider a function 𝑓 of a real number 𝑥 on some interval [𝑎, 𝑏]. This could be an analytically defined function such as sin 𝑥, exp(𝑥), 𝑥², and so on, or one defined more indirectly but nonetheless computable numerically. Such a function naturally accepts any value of 𝑥, for instance 1.0, 𝜋, 1/3, and so on. In order to be able to store a representation of this function on a computer, however, we are forced to only store the value of the function at a finite number of values. The standard approach is to discretise the interval of interest by some small spacing Δ𝑥,¹ and consider the function only at these points:

𝑥 ∈ {𝑎, 𝑎 + Δ𝑥, 𝑎 + 2Δ𝑥, 𝑎 + 3Δ𝑥, · · · , 𝑏 − Δ𝑥, 𝑏}. (2.1)


¹ Note that many texts use ℎ in place of Δ𝑥.


To numerically find the derivative of 𝑓(𝑥) at one of these points, we could then for instance do

𝑓′(𝑥) ≈ [𝑓(𝑥 + Δ𝑥) − 𝑓(𝑥)] / Δ𝑥.  (2.2)

This will only be truly equal in the limit of infinitesimally small Δ𝑥, but it will also be a good approximation for reasonably small Δ𝑥. However, Eq. (2.2) is not the only possible choice we could take to approximate the derivative. We could also do

𝑓′(𝑥) ≈ [𝑓(𝑥) − 𝑓(𝑥 − Δ𝑥)] / Δ𝑥.  (2.3)

Which one of these two is the best? Intuitively, we expect both of these to give equally good approximations.
In fact, there are many more choices. We could do something crazy like

𝑓′(𝑥) ≈ [𝑓(𝑥 + 2Δ𝑥) − 𝑓(𝑥)] / (2Δ𝑥),  (2.4)

which indeed also tends to 𝑓′(𝑥) as Δ𝑥 → 0. For finite Δ𝑥, however, this is a worse approximation than Eqs. (2.2) and (2.3).
Eqs. (2.2) and (2.3) are known as the forward and backward (or right and left) approximations to the derivative. We can actually do better than this (we will define what ‘better’ means in a second) by using the so-called central derivative

𝑓′(𝑥) ≈ [𝑓(𝑥 + Δ𝑥) − 𝑓(𝑥 − Δ𝑥)] / (2Δ𝑥).  (2.5)

That this is a good derivative approximation is easily appreciated from an illustration:

Consider using the forward or backward derivative with the same Δ𝑥 on the illustrated function: this would give a much worse approximation of the derivative.

All of these formulas are, for good reason, called finite difference expressions. To express how good an approximation is, we consider what happens when applying them to the Taylor expansions of functions. Smooth functions can (locally) be approximated by their Taylor expansion

𝑓(𝑥) = ∑_{𝑛=0}^{∞} (1/𝑛!) 𝑓⁽ⁿ⁾(𝑥₀)(𝑥 − 𝑥₀)ⁿ  (2.6)
     = 𝑓(𝑥₀) + 𝑓′(𝑥₀)(𝑥 − 𝑥₀) + ½ 𝑓″(𝑥₀)(𝑥 − 𝑥₀)² + · · ·

Let us try and use our finite difference formulas on these Taylor expansions. First we use the forward scheme of Eq. (2.2). Without loss of generality we evaluate the derivative at 𝑥 = 0:

𝑓′(0) ≈ [𝑓(Δ𝑥) − 𝑓(0)] / Δ𝑥 = 𝑓′(0) + ½ 𝑓″(0)Δ𝑥 + · · ·  (2.7)

We find that the error of the approximation has a term that grows proportional to Δ𝑥. The next error term grows like Δ𝑥², but for small Δ𝑥 this will be much smaller, and the following terms even smaller than that. For this reason one writes

Forward Derivative

𝑓′(𝑥) = [𝑓(𝑥 + Δ𝑥) − 𝑓(𝑥)] / Δ𝑥 + O(Δ𝑥).  (2.8)

The notation O(Δ𝑥) signifies that the error grows like ∼ Δ𝑥. If we do the same calculation for the central derivative we find

Central Derivative

𝑓′(𝑥) = [𝑓(𝑥 + Δ𝑥) − 𝑓(𝑥 − Δ𝑥)] / (2Δ𝑥) + O(Δ𝑥²).  (2.9)

It is in this sense that the central derivative is better: the error grows like Δ𝑥², which for small Δ𝑥 will be a lot smaller than Δ𝑥.
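As a quick illustration, here is a minimal Python sketch (assuming NumPy is available) comparing the forward and central schemes on sin 𝑥, whose exact derivative is cos 𝑥:

```python
import numpy as np

def forward_diff(f, x, dx):
    """Forward difference, Eq. (2.8): error of order dx."""
    return (f(x + dx) - f(x)) / dx

def central_diff(f, x, dx):
    """Central difference, Eq. (2.9): error of order dx**2."""
    return (f(x + dx) - f(x - dx)) / (2 * dx)

x, dx = 1.0, 1e-4
exact = np.cos(x)  # derivative of sin(x)
print(abs(forward_diff(np.sin, x, dx) - exact))   # error scales like dx
print(abs(central_diff(np.sin, x, dx) - exact))   # error scales like dx**2
```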

2.2 Higher Order Derivatives


We will also need to be able to take higher order derivatives. A
simple way to do this is to take single derivatives multiple times.

For instance, by using Eq. (2.9) on 𝑓′ we have

𝑓″(𝑥) = [𝑓′(𝑥 + Δ𝑥) − 𝑓′(𝑥 − Δ𝑥)] / (2Δ𝑥) + O(Δ𝑥²).

And then using the formula for 𝑥 + Δ𝑥 and 𝑥 − Δ𝑥 we have

𝑓′(𝑥 + Δ𝑥) = [𝑓(𝑥 + 2Δ𝑥) − 𝑓(𝑥)] / (2Δ𝑥) + O(Δ𝑥²),
𝑓′(𝑥 − Δ𝑥) = [𝑓(𝑥) − 𝑓(𝑥 − 2Δ𝑥)] / (2Δ𝑥) + O(Δ𝑥²).

Combining these we find

𝑓″(𝑥) = [𝑓(𝑥 + 2Δ𝑥) + 𝑓(𝑥 − 2Δ𝑥) − 2𝑓(𝑥)] / (4Δ𝑥²) + O(Δ𝑥),  (2.10)

which is a finite difference approximation for the second derivative. Note that we used O(Δ𝑥²)/Δ𝑥 = O(Δ𝑥), and so this approximation is only guaranteed to be accurate to within Δ𝑥. It is in fact better than this, but still worse than the more natural approximation:

Central Second Derivative

𝑓″(𝑥) = [𝑓(𝑥 + Δ𝑥) + 𝑓(𝑥 − Δ𝑥) − 2𝑓(𝑥)] / Δ𝑥² + O(Δ𝑥²).  (2.11)

In a similar fashion, for a third order derivative one has



Central Third Derivative

𝑓‴(𝑥) = [𝑓(𝑥 + 2Δ𝑥) − 2𝑓(𝑥 + Δ𝑥) + 2𝑓(𝑥 − Δ𝑥) − 𝑓(𝑥 − 2Δ𝑥)] / (2Δ𝑥³) + O(Δ𝑥²).  (2.12)

Finite difference schemes can be neatly described by just giving their coefficients. For instance, the third order derivative scheme above can be described simply by the coefficients {−1/2, 1, 0, −1, 1/2}, corresponding to the factors of 𝑓(𝑥 − 2Δ𝑥), 𝑓(𝑥 − Δ𝑥), 𝑓(𝑥), 𝑓(𝑥 + Δ𝑥), and 𝑓(𝑥 + 2Δ𝑥), respectively.
In this simple way we can write down the schemes in a neat table. Here are the first few central finite difference coefficients that have second order accuracy, i.e. those for which the error grows like O(Δ𝑥²):

Central Difference Coefficients with Accuracy O(Δ𝑥²)

           −2Δ𝑥   −Δ𝑥     0     Δ𝑥    2Δ𝑥
𝑓′(𝑥)        0    −1/2    0     1/2     0
𝑓″(𝑥)        0      1    −2      1      0
𝑓‴(𝑥)      −1/2     1     0     −1     1/2
𝑓⁗(𝑥)        1     −4     6     −4      1

Note that the schemes all sum to zero and are symmetric for even derivatives and anti-symmetric for odd derivatives. Why must this be the case?
We will also give a few schemes for forward derivatives:

Forward Difference Coefficients with Accuracy O(Δ𝑥²)

             0     Δ𝑥    2Δ𝑥    3Δ𝑥    4Δ𝑥
𝑓′(𝑥)      −3/2     2    −1/2     0      0
𝑓″(𝑥)        2     −5      4     −1      0
𝑓‴(𝑥)      −5/2     9    −12      7    −3/2

Backward derivatives are found by reversing the order of the coefficients above and flipping the signs for odd derivatives.
To use these formulas for a specific value of Δ𝑥, the coefficients should be divided by Δ𝑥^𝑑, where 𝑑 is the order of the derivative being taken.
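As an illustration, here is a small Python sketch (assuming NumPy) that applies the fourth-derivative stencil from the central table to a function sampled on a uniform grid:

```python
import numpy as np

def fourth_derivative(f_vals, dx):
    """Apply the central stencil [1, -4, 6, -4, 1] for f'''' on a uniform grid,
    dividing by dx**4 as described above; returns values at interior points."""
    c = [1.0, -4.0, 6.0, -4.0, 1.0]
    n = len(f_vals)
    return sum(ci * f_vals[i:n - 4 + i] for i, ci in enumerate(c)) / dx**4

x = np.linspace(0, 2 * np.pi, 201)
dx = x[1] - x[0]
approx = fourth_derivative(np.sin(x), dx)          # exact answer is sin(x) again
print(np.max(np.abs(approx - np.sin(x[2:-2]))))    # small, scales like dx**2
```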

2.3 Deriving Schemes


In the previous section we presented a method to evaluate the accuracy of a finite difference scheme: apply the scheme to the Taylor expansion of a function and check how the error grows with Δ𝑥. We should be able to reverse engineer this approach in order to derive schemes that have a required accuracy.
In general we will find that the more points we allow evaluation at, the higher the accuracy we can attain. The set of evaluation points is called the stencil. In the central difference scheme, for example, we used the stencil {𝑥 − 2Δ𝑥, 𝑥 − Δ𝑥, 𝑥, 𝑥 + Δ𝑥, 𝑥 + 2Δ𝑥}.

This is a regularly spaced stencil. Sometimes there is a need for irregularly spaced points, and so we will need a scheme for general points such as {𝑥₁, 𝑥₂, 𝑥₃, 𝑥₄, 𝑥₅}. In other words, we want a formula for the 𝑑’th derivative of the form

𝑓⁽ᵈ⁾(𝑥) ≈ 𝑎₁𝑓(𝑥₁) + 𝑎₂𝑓(𝑥₂) + 𝑎₃𝑓(𝑥₃) + 𝑎₄𝑓(𝑥₄) + 𝑎₅𝑓(𝑥₅),

where the 𝑎’s are the finite difference coefficients for this custom stencil, which in general will depend on 𝑥.
The derivation of the formula is quite simple: we just need to ensure that the terms of the Taylor expansion equal zero except at the derivative we are trying to calculate, where instead we need to correct for the factorial. We skip a step-by-step derivation and simply state that the correct coefficients for evaluation at 𝑥 = 0 are found by solving the following linear equation²

General Formula for 5-Point Finite Difference Coefficients

( 1    1    1    1    1  ) (𝑎₁)        (𝛿₀,𝑑)
( 𝑥₁   𝑥₂   𝑥₃   𝑥₄   𝑥₅ ) (𝑎₂)        (𝛿₁,𝑑)
( 𝑥₁²  𝑥₂²  𝑥₃²  𝑥₄²  𝑥₅² ) (𝑎₃)  =  𝑑! (𝛿₂,𝑑)          (2.13)
( 𝑥₁³  𝑥₂³  𝑥₃³  𝑥₄³  𝑥₅³ ) (𝑎₄)        (𝛿₃,𝑑)
( 𝑥₁⁴  𝑥₂⁴  𝑥₃⁴  𝑥₄⁴  𝑥₅⁴ ) (𝑎₅)        (𝛿₄,𝑑)

Here 𝛿 is the Kronecker delta. The formula generalises straightforwardly to smaller or larger stencils. The resulting scheme will have an accuracy of at least O(Δ𝑥^(𝑁−𝑑)), where 𝑁 is the number of stencil points, and Δ𝑥 is representative of the distance between them. Central schemes will be one order higher. Note that the point of evaluation does not need to be one of the stencil points; the formula can interpolate/extrapolate at any value.

² If 𝑥 ≠ 0, all 𝑥ᵢ should be replaced by (𝑥ᵢ − 𝑥).

Example
To find the third derivative scheme for the regular stencil {𝑥 − 2Δ𝑥, 𝑥 − Δ𝑥, 𝑥, 𝑥 + Δ𝑥, 𝑥 + 2Δ𝑥}, one would solve

(    1         1       1     1        1     ) (𝑎₋₂)   (0)
(  −2Δ𝑥       −Δ𝑥      0     Δ𝑥      2Δ𝑥    ) (𝑎₋₁)   (0)
( (−2Δ𝑥)²   (−Δ𝑥)²     0     Δ𝑥²    (2Δ𝑥)²  ) (𝑎₀ ) = (0)
( (−2Δ𝑥)³   (−Δ𝑥)³     0     Δ𝑥³    (2Δ𝑥)³  ) (𝑎₁ )   (6)
( (−2Δ𝑥)⁴   (−Δ𝑥)⁴     0     Δ𝑥⁴    (2Δ𝑥)⁴  ) (𝑎₂ )   (0)

With this formula we can calculate schemes for any order of derivative for any stencil (as long as 𝑁 > 𝑑). To calculate the schemes presented in the tables above, you can set Δ𝑥 = 1.
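A minimal Python sketch of solving Eq. (2.13) with NumPy (the function name fd_coefficients is just an illustrative choice):

```python
import numpy as np
from math import factorial

def fd_coefficients(stencil, d, x0=0.0):
    """Finite difference coefficients for the d'th derivative at x0,
    obtained by solving the linear system of Eq. (2.13)."""
    xs = np.asarray(stencil, dtype=float) - x0
    n = len(xs)
    A = np.vander(xs, n, increasing=True).T   # rows: xs**0, xs**1, ..., xs**(n-1)
    b = np.zeros(n)
    b[d] = factorial(d)
    return np.linalg.solve(A, b)

# The regular 5-point stencil with dx = 1 reproduces the table values:
print(fd_coefficients([-2, -1, 0, 1, 2], d=3))   # [-0.5, 1, 0, -1, 0.5]
```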

2.4 Machine Precision


You might wonder why we need to develop all these different
schemes. Even a method that has an error of the order O (Δ𝑥)
should still be good enough if we simply choose Δ𝑥 small enough,
right? In theory, this is correct, but in practice it is not.

Since computers cannot store numbers with infinite precision there are limits to the above suggestion. An immediate problem is that Δ𝑥 simply cannot be chosen to be arbitrarily small. Using “single-precision floating points”,³ the smallest representable number is about 10⁻³⁸. Using double-precision we can get down to around 10⁻³²⁴. This is indeed quite small, and not the source of the real problem.
When we do calculations on a computer, the important factor in maintaining precision is how big the gaps are between the numbers we can represent. This is called the machine epsilon. Using double-precision, for instance, the first number we can represent which is bigger than 1 is

1 + 𝜖_𝑀 = 1 + 2⁻⁵² ≈ 1 + 2.2 · 10⁻¹⁶.  (2.14)

So what happens if you ask a computer to calculate 1 + 10⁻¹⁶? It will simply return 1.⁴ This is called a round-off error, for obvious reasons. In general, the gap between numbers that are of magnitude 𝑁 will be ∼ 𝜖_𝑀 𝑁. In this way, the relative error of doing a calculation on a computer will for double-precision floating points be of order 𝜖_𝑀.
Consider, for example, a situation where we need to estimate a derivative with absolute precision of some tolerance 𝛿. Using a scheme that has error of order O(Δ𝑥) we thus have to choose Δ𝑥 ∼ 𝛿. We can do a simple calculation where we take round-off errors of 𝑓(𝑥) into account:⁵

[𝑓(𝑥 + Δ𝑥) − 𝑓(𝑥) ± 𝜖_𝑀 |𝑓(𝑥)|] / Δ𝑥 = 𝑓′(𝑥) + O(Δ𝑥) ± 𝜖_𝑀 |𝑓(𝑥)| / Δ𝑥.

Here we explicitly see that on a computer, finite difference schemes will have two sources of errors: truncation errors that are due to our schemes not being precise, and rounding errors that are due to the finite number representation of computers. In our example, the truncation error would be ∼ 𝛿, and the rounding error ∼ 𝜖_𝑀/𝛿 if |𝑓(𝑥)| ≈ 1.

³ Single-precision floats are often written as float32 or simply float and are real numbers stored on the computer using 32 bits, i.e. 32 zeros and ones. Double-precision floats are often written as float64 or double and are real numbers stored using 64 bits.
⁴ Likewise, trying to calculate 1 + 10⁻¹⁶ + 10⁻¹⁶ + 10⁻¹⁶ will also return 1, but 1 + 10⁻¹⁵ will return a number larger than 1.
Had we instead used the central scheme we would have

[𝑓(𝑥 + Δ𝑥) − 𝑓(𝑥 − Δ𝑥) ± 𝜖_𝑀 |𝑓(𝑥)|] / (2Δ𝑥) = 𝑓′(𝑥) + O(Δ𝑥²) ± 𝜖_𝑀 |𝑓(𝑥)| / (2Δ𝑥).

But now, in order to reach an error of size 𝛿, we only need to choose Δ𝑥 ∼ √𝛿. With this choice, the truncation error remains the same, but the rounding error is now only 𝜖_𝑀/(2√𝛿).
This is why higher-order schemes are useful: they allow us to use a larger Δ𝑥, which minimises round-off errors. For a derivative of order 𝑑, the round-off error becomes of size ∼ 𝜖_𝑀/Δ𝑥^𝑑. In this way, rounding errors become increasingly problematic for higher order derivatives. The spectral methods that we present in the next section avoid this problem to a large degree.

⁵ We use the fact that 𝑓(𝑥) (and 𝑓(𝑥 + Δ𝑥)) will have a round-off error of approximately |𝑓(𝑥)|𝜖_𝑀.
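The trade-off between truncation and rounding errors is easy to see numerically; a small Python sketch (assuming NumPy):

```python
import numpy as np

f, f_prime, x = np.sin, np.cos, 1.0
for dx in [1e-1, 1e-4, 1e-8, 1e-12]:
    forward = (f(x + dx) - f(x)) / dx
    central = (f(x + dx) - f(x - dx)) / (2 * dx)
    print(f"dx={dx:.0e}  forward error={abs(forward - f_prime(x)):.1e}"
          f"  central error={abs(central - f_prime(x)):.1e}")
# The errors first shrink with dx (truncation dominates) and then grow
# again for very small dx, where round-off ~ eps_M/dx takes over.
```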

In our analysis above we considered the round-off error of 𝑓(𝑥) and 𝑓(𝑥 + Δ𝑥). However, there could also be a round-off error for 𝑥 + Δ𝑥. This error leads to

[𝑓(𝑥 + Δ𝑥 ± 𝜖_𝑀|𝑥|) − 𝑓(𝑥)] / Δ𝑥 = (1 ± 𝜖_𝑀|𝑥|/Δ𝑥) 𝑓′(𝑥) + O(Δ𝑥).  (2.15)

However, this type of round-off error can actually be avoided by choosing the grid of 𝑥 and spacing Δ𝑥 in such a way that rounding never happens. If high precision is your aim, it is recommended to think about this.

In the following chapters we will consider differential equations. For these, higher-order schemes are critical not just for precision, but also for the speed of computation, since the larger the Δ𝑥 or Δ𝑡, the fewer calculations we need to do.

2.5 Spectral Methods


In the above sections we have introduced finite difference schemes for taking numerical derivatives. The approximation of a finite difference scheme lies in approximating the derivative operator. A contrasting approach is to instead make an analytical approximation of the function and then take exact derivatives of this approximation. Such approaches are called spectral methods.
The simplest example of spectral methods uses Fourier series. As such let us consider a periodic function 𝑓 defined on the interval [0, 2𝜋). Instead of storing the values of the function at a finite number of points, we could instead store a finite number of coefficients in a Fourier series. We could for instance store 2𝑁 + 1 coefficients⁶ {𝑐_𝑘} and approximate

𝑓(𝑥) ≈ ∑_{𝑘=−𝑁}^{𝑁} 𝑐_𝑘 𝑒^{𝑖𝑘𝑥}.  (2.16)

⁶ If the function is real, i.e. not complex, fewer coefficients need to be stored, since then 𝑐₋ₖ = 𝑐ₖ*.

Now that the true function 𝑓 is approximated by the Fourier series, we can approximate derivatives of the true function by exact derivatives of the Fourier series. Concretely,

𝑓⁽ᵈ⁾(𝑥) ≈ ∑_{𝑘=−𝑁}^{𝑁} (𝑖𝑘)^𝑑 𝑐_𝑘 𝑒^{𝑖𝑘𝑥}.  (2.17)

So the 𝑑’th derivative of a function represented by coefficients {𝑐_𝑘} is approximated by the function represented by coefficients {(𝑖𝑘)^𝑑 𝑐_𝑘}.
If the function 𝑓 is stored at regularly spaced points such as

𝑥 ∈ {0, Δ𝑥, 2Δ𝑥, 3Δ𝑥, · · · , 2𝜋 − Δ𝑥},  (2.18)

spectral methods can still be used. In this case, we simply use a discrete Fourier transform. Most programming languages offer a very fast version called the fast Fourier transform (“fft”). All in all, the 𝑑’th derivative can then be taken as

Spectral Derivative using Fast Fourier Transform

𝒇^(𝑑) ≈ ifft( (𝑖𝒌)^𝑑 fft(𝒇) )  (2.19)

Here we assume that 𝑓(𝑥) is stored as 𝒇 = [𝑓₀, 𝑓₁, 𝑓₂, · · · , 𝑓ₙ], and ‘fft’ and ‘ifft’ are the fast Fourier transform and inverse fast Fourier transform, respectively.
Fourier transforming leads to complex numbers, which we then transform (here by multiplying by (𝑖𝑘)^𝑑) and transform back to real numbers. These intermediary complex numbers can introduce spurious errors (such as small imaginary parts) for real-valued functions. To avoid such problems most fft-libraries will have functions specifically designed for real-valued functions. These are typically called ‘rfft’ and ‘irfft’, and we recommend their use.⁷
For problems that require high-order numerical derivatives, spectral methods are often the preferred approach. The exception to this rule is when the function of interest has discontinuities. In contrast to finite differences, spectral methods are non-local. A discontinuity at some 𝑥 can therefore influence the result even far away from 𝑥. The method only works when the function is smooth everywhere.
⁷ If you do not have access to such functions, it is best practice to zero out the so-called Nyquist frequency after Fourier transforming. Also note that this will depend on whether you are using an even or odd number of grid points. We mention these details so that you are aware of such issues, but skip a full discussion of them here.
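A minimal Python sketch of Eq. (2.19) using NumPy’s rfft/irfft (the domain length 2𝜋 and the test function are just illustrative choices):

```python
import numpy as np

def spectral_derivative(f_vals, L=2 * np.pi, d=1):
    """d'th derivative of a smooth periodic function sampled on a uniform
    grid of length L, following Eq. (2.19) with the real transforms rfft/irfft."""
    n = len(f_vals)
    k = 2 * np.pi * np.fft.rfftfreq(n, d=L / n)   # angular wave numbers
    # (For odd d and even n one may additionally zero the Nyquist mode.)
    return np.fft.irfft((1j * k)**d * np.fft.rfft(f_vals), n=n)

x = np.linspace(0, 2 * np.pi, 64, endpoint=False)
err = np.max(np.abs(spectral_derivative(np.sin(3 * x)) - 3 * np.cos(3 * x)))
print(err)   # close to machine precision
```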

Spectral methods can be combined with other approaches by using fast Fourier transforms to switch between real space and frequency space appropriately. Such approaches are often called pseudo-spectral methods.

Spectral methods also exist for non-periodic domains, but the formulas differ as different basis functions need to be used. These will typically be slightly more involved to implement, as conversion between function values and basis function coefficients is not necessarily pre-implemented in your programming language of choice, in contrast to Fourier series.
Ordinary Differential Equations

Numerical problems involving Ordinary Differential Equations (ODEs) typically come in two forms distinguished by their boundary conditions: initial-value problems and boundary-value problems.
Initial-value problems are typically equations in time, where the value of the function is known at 𝑡 = 0.

Initial-Value Problem

[Illustration: a solution 𝑓(𝑡) traced out over time 𝑡]

Find 𝒇(𝑡) for 𝑡 ∈ [0, 𝑇] such that

𝒇′(𝑡) = 𝑭(𝒇(𝑡), 𝑡)  (3.1)

and 𝒇(0) = 𝒇₀.

Here 𝑭 is any (non-linear) function and 𝒇 is the function to be found, both of which could be multi-dimensional. 𝒇₀ is a constant vector containing the initial value of the function.
The definition includes higher-order derivative problems, as these can always be recast as a set of first order equations.


For example, the initial-value problem

𝑓″(𝑡) = 𝐹(𝑓(𝑡), 𝑡)
𝑓(0) = 𝛼          (3.2)
𝑓′(0) = 𝛽

is equivalent to

𝑓₁′(𝑡) = 𝑓₂(𝑡)
𝑓₂′(𝑡) = 𝐹(𝑓₁(𝑡), 𝑡)          (3.3)
𝑓₁(0) = 𝛼
𝑓₂(0) = 𝛽


Likewise, any ODE whose highest derivative is of order 𝑑 can
be recast as 𝑑 first-order equations. The problem specification must
then contain exactly 𝑑 boundary conditions.

In contrast, boundary-value problems are typically problems in space for which boundary conditions are specified at the domain borders:

Boundary-Value Problem

[Illustration: a solution 𝑓(𝑥) on an interval in 𝑥]

Find 𝒇(𝑥) for 𝑥 ∈ [𝑎, 𝑏] such that

𝒇′(𝑥) = 𝑭(𝒇(𝑥), 𝑥)  (3.4)

and 𝑛 = dim(𝒇) conditions of the form 𝑓ⱼ(𝑏ᵢ) = 𝛼ᵢ are satisfied, where each 𝑏ᵢ is equal to 𝑎 or 𝑏.

An example of a boundary-value problem is

𝑓₁′(𝑥) = 𝑓₂(𝑥)
𝑓₂′(𝑥) = 𝐹(𝑓₁(𝑥), 𝑥)          (3.5)
𝑓₁(𝑎) = 𝛼
𝑓₁(𝑏) = 𝛽

At first sight the difference between initial-value and boundary-value problems might seem insignificant. This is not the case, however. Initial-value problems can typically be solved step-by-step in the sense that knowing e.g. 𝑓₁(0) allows you to calculate 𝑓₁(Δ𝑡), which in turn allows you to calculate 𝑓₁(2Δ𝑡) and so on. This is because all 𝑓ᵢ’s are known for the same time points. In contrast, for boundary-value problems, knowing 𝑓₁(𝑎) does not allow one to calculate 𝑓₁(𝑎 + Δ𝑥) directly, since it could be the case that 𝑓₂ is only known at 𝑥 = 𝑏, but its value is needed at 𝑥 = 𝑎. These aspects, and how they are dealt with, will become clear in the following sections.

3.1 Initial-Value Problems


Choosing a step-size Δ𝑡 to discretise time by, we can use the forward
scheme of Eq. (2.8) to immediately give a method for solving
initial-value problems:

The Euler Method

𝒇 (𝑡 + Δ𝑡) = 𝒇 (𝑡) + 𝑭( 𝒇 (𝑡), 𝑡) Δ𝑡 (3.6)

We know from Eq. (2.8) that the forward scheme has an error of size O(Δ𝑡), which means that each step in the Euler method has an error of size O(Δ𝑡²).¹ In order to integrate the equation from 𝑡 = 0 all the way to 𝑡 = 𝑇 using a step of size Δ𝑡, we will need to do the above approximation 𝑛 ≈ 𝑇/Δ𝑡 times. This means that the error of the final value 𝒇(𝑇) will be of order 𝑇/Δ𝑡 × O(Δ𝑡²) = O(Δ𝑡). Although the method is exceedingly simple to implement, this large error makes it unfit for many applications unless a very small Δ𝑡 is used.

Example
Let us solve

𝑓′(𝑡) = √𝑓(𝑡),  𝑓(0) = 1  (3.7)

using the Euler method with Δ𝑡 = 0.1.
The first time step gives us

𝑓(Δ𝑡) = 𝑓(0.1) ≈ 𝑓(0) + √𝑓(0) Δ𝑡 = 1 + 0.1 = 1.1.  (3.8)

¹ Eq. (2.8) gives 𝑓′(𝑡) = [𝑓(𝑡 + Δ𝑡) − 𝑓(𝑡)]/Δ𝑡 + O(Δ𝑡), which we multiply by Δ𝑡 to obtain the Euler Method. The multiplication is the reason the error becomes O(Δ𝑡²).

And the next

𝑓(2Δ𝑡) = 𝑓(0.2) ≈ 𝑓(0.1) + √𝑓(0.1) Δ𝑡 = 1.1 + √1.1 · 0.1 ≈ 1.205.

And so we continue, easily implemented in a loop. To get an accurate solution we should use a smaller Δ𝑡.
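The loop is only a few lines; a minimal Python sketch (assuming NumPy) applied to the example above:

```python
import numpy as np

def euler(F, f0, T, dt):
    """Explicit Euler for f'(t) = F(f, t) with f(0) = f0; a minimal sketch."""
    n_steps = int(round(T / dt))
    ts = np.linspace(0.0, T, n_steps + 1)
    fs = np.empty(n_steps + 1)
    fs[0] = f0
    for n in range(n_steps):
        fs[n + 1] = fs[n] + F(fs[n], ts[n]) * dt
    return ts, fs

# The example above: f' = sqrt(f), f(0) = 1, with exact solution (1 + t/2)**2
ts, fs = euler(lambda f, t: np.sqrt(f), f0=1.0, T=1.0, dt=0.1)
print(fs[-1], (1 + ts[-1] / 2)**2)   # Euler estimate vs exact value at t = 1
```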

In physical problems one often needs to solve versions of Newton’s or Hamilton’s equations. For these equations we can define spatial variables 𝑥 and velocity/momentum variables 𝑣, such that the equations can be written

𝑥′(𝑡) = 𝑣(𝑡),  (3.9)
𝑣′(𝑡) = 𝑎(𝑥(𝑡)),  (3.10)

where 𝑎(𝑥) is the acceleration term. We have assumed a conservative system. In this specific situation there is a small variation of the Euler method which has some very nice features and is almost as easy to implement:

Verlet/Leapfrog Integration
For each time step do

𝑥_{Δ𝑡/2} = 𝑥(𝑡) + ½ 𝑣(𝑡)Δ𝑡
𝑣(𝑡 + Δ𝑡) = 𝑣(𝑡) + 𝑎(𝑥_{Δ𝑡/2})Δ𝑡          (3.11)
𝑥(𝑡 + Δ𝑡) = 𝑥_{Δ𝑡/2} + ½ 𝑣(𝑡 + Δ𝑡)Δ𝑡

The error of this method is only O(Δ𝑡³) per time step, which is a big improvement over the Euler Method. This is despite being computationally as cheap as the Euler method: 𝑎(𝑥), which is typically the most computationally expensive term to calculate, is only evaluated once per time step for both methods.
A conservative system of equations will conserve energy, but we cannot in general guarantee this from numerical approximations. The Verlet (or Leap-Frog) method is a so-called symplectic method, which precisely ensures that the (time-averaged) energy is conserved. This feature can be crucial in many physical simulations.
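A minimal Python sketch of Eq. (3.11), applied to a harmonic oscillator (the choice 𝑎(𝑥) = −𝑥 is just an illustration):

```python
def leapfrog(a, x0, v0, T, dt):
    """Leapfrog/velocity-Verlet integration of x' = v, v' = a(x), Eq. (3.11)."""
    n_steps = int(round(T / dt))
    x, v = x0, v0
    for _ in range(n_steps):
        x_half = x + 0.5 * v * dt       # half step in position
        v = v + a(x_half) * dt          # full step in velocity
        x = x_half + 0.5 * v * dt       # second half step in position
    return x, v

# Harmonic oscillator a(x) = -x: the energy (x**2 + v**2)/2 should stay near 0.5
x, v = leapfrog(lambda x: -x, x0=1.0, v0=0.0, T=100.0, dt=0.05)
print(0.5 * (x**2 + v**2))
```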

3.1.1 Runge–Kutta Methods


For finite difference schemes we found that using larger stencils allowed us a better estimation of the derivatives. Runge–Kutta methods use exactly the same idea, but for initial-value problems. The Euler method has error O(Δ𝑡²) per time step. A simple Runge–Kutta method to improve on this is

Second Order Runge–Kutta (Midpoint Method)

For each time step update

𝑘₁ = 𝐹(𝑓(𝑡), 𝑡)
𝑘₂ = 𝐹(𝑓(𝑡) + ½𝑘₁Δ𝑡, 𝑡 + ½Δ𝑡)          (3.12)
𝑓(𝑡 + Δ𝑡) = 𝑓(𝑡) + 𝑘₂Δ𝑡

This method is strikingly similar to Verlet integration in that we do a half-Δ𝑡 time step. The difference is that here we do the half time step for all variables, and we thus have to evaluate 𝐹 twice. The error for each time step of this method is also O(Δ𝑡³), and this method works for all ODEs, not just the ones that have a position-velocity separation.

Example
Let us again solve

𝑓′(𝑡) = √𝑓(𝑡),  𝑓(0) = 1  (3.13)

with Δ𝑡 = 0.1, but now using the second order Runge–Kutta method.
The first time step is done by evaluating

𝑘₁ = √𝑓(0) = 1,
𝑘₂ = √(𝑓(0) + ½𝑘₁Δ𝑡) = √(1 + ½ · 0.1) ≈ 1.0247,

and then

𝑓(Δ𝑡) = 𝑓(0.1) ≈ 𝑓(0) + 𝑘₂Δ𝑡 ≈ 1.10247.

The exact value of 𝑓(0.1) is 1.1025, showing that this method achieves much better results than what we obtained using the Euler method [Eq. (3.8)].

The most famous Runge–Kutta method is the simplest fourth order version:

Fourth Order Runge–Kutta (RK4)

For each time step evaluate

𝑘₁ = 𝐹(𝑓(𝑡), 𝑡)
𝑘₂ = 𝐹(𝑓(𝑡) + ½𝑘₁Δ𝑡, 𝑡 + ½Δ𝑡)
𝑘₃ = 𝐹(𝑓(𝑡) + ½𝑘₂Δ𝑡, 𝑡 + ½Δ𝑡)          (3.14)
𝑘₄ = 𝐹(𝑓(𝑡) + 𝑘₃Δ𝑡, 𝑡 + Δ𝑡)
𝑓(𝑡 + Δ𝑡) = 𝑓(𝑡) + (1/6)[𝑘₁ + 2𝑘₂ + 2𝑘₃ + 𝑘₄]Δ𝑡

The error of this method is O(Δ𝑡⁵) per time step, which allows for significantly larger step sizes than those required by the Euler method.
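A minimal Python sketch of one RK4 step of Eq. (3.14), reused on the example problem from before:

```python
import math

def rk4_step(F, f, t, dt):
    """One step of the classic RK4 scheme, Eq. (3.14)."""
    k1 = F(f, t)
    k2 = F(f + 0.5 * k1 * dt, t + 0.5 * dt)
    k3 = F(f + 0.5 * k2 * dt, t + 0.5 * dt)
    k4 = F(f + k3 * dt, t + dt)
    return f + (k1 + 2 * k2 + 2 * k3 + k4) * dt / 6.0

# Same example as before: f' = sqrt(f), f(0) = 1
f, t, dt = 1.0, 0.0, 0.1
for _ in range(10):
    f = rk4_step(lambda f, t: math.sqrt(f), f, t, dt)
    t += dt
print(f, (1 + t / 2)**2)   # RK4 estimate vs exact value at t = 1
```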

Runge–Kutta methods are slightly harder to derive than finite difference coefficients. Their derivation, nonetheless, follows the exact same approach: compare with the general Taylor expansion to ensure the highest order of accuracy. In contrast to finite difference schemes, however, Runge–Kutta methods allow for some freedom of choice. To derive the RK4 scheme above, for instance, you begin by specifying that you want a scheme using the specified 𝑘₁, 𝑘₂, 𝑘₃, 𝑘₄ and then calculate the coefficients of these terms that determine 𝑓(𝑡 + Δ𝑡) with the best precision (i.e. you derive 1/6, 2/6, 2/6, 1/6). Different Runge–Kutta schemes can be obtained with different choices of 𝑘’s. In fact, the above scheme is not the best fourth order Runge–Kutta scheme, but it is the easiest one to implement.

Runge–Kutta schemes have one more trick up their sleeve: they provide a method to determine the best choice of Δ𝑡. Because Runge–Kutta methods of different order have different accuracy, we can do a time step with two different Runge–Kutta methods and estimate the error by their difference.
For instance, we know that for a fourth order Runge–Kutta method we have 𝑓(𝑡 + Δ𝑡) = 𝑓(𝑡) + Δ𝑓_RK4 + O(Δ𝑡⁵), whereas a fifth order method would have 𝑓(𝑡 + Δ𝑡) = 𝑓(𝑡) + Δ𝑓_RK5 + O(Δ𝑡⁶). Thus the difference 𝐸 = |Δ𝑓_RK4 − Δ𝑓_RK5| will be of order O(Δ𝑡⁵). Typically we want to choose the largest Δ𝑡 while ensuring that the error remains smaller than some tolerance 𝜖. Choosing Δ𝑡 to be slightly smaller (say, by a factor 0.9) than required, we find the perhaps most widely used solver for initial-value problems:

Adaptive Runge–Kutta 4(5) — RK45 / ode45

Choose a desired error tolerance 𝜖.
Then for each time step
1. Compute Δ𝑓_RK4 using an RK4 scheme
2. Compute Δ𝑓_RK5 using an RK5 scheme
3. Estimate the error 𝐸 = |Δ𝑓_RK4 − Δ𝑓_RK5|
4. Choose a new Δ𝑡 = 0.9 (𝜖/𝐸)^(1/5) Δ𝑡_old

5. • If 𝐸 > 𝜖 redo the time step with the new Δ𝑡
   • Otherwise update 𝑓(𝑡 + Δ𝑡_old) = 𝑓(𝑡) + Δ𝑓_RK5.

Using this scheme, large time steps will be taken when the solution
is smooth and small time steps will be taken when the solution
varies rapidly. This is computationally much more efficient than
taking the same time step at all times.
The method is not fully specified in the above presentation, as
there is still some choice in which Runge–Kutta formulas to use. A
brilliant scheme, and the one most widely in use, is the Dormand–
Prince method, which uses a fourth and fifth order scheme that
share most of their 𝑘-expressions.
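In practice one rarely writes an adaptive stepper from scratch; for instance, SciPy (assuming it is installed) provides a Dormand–Prince RK45 solver that follows exactly this logic:

```python
import numpy as np
from scipy.integrate import solve_ivp

# SciPy's default 'RK45' method is the Dormand-Prince pair described above.
sol = solve_ivp(lambda t, f: np.sqrt(f), t_span=(0, 1), y0=[1.0],
                method='RK45', rtol=1e-8, atol=1e-10)
print(sol.y[0, -1], (1 + 1 / 2)**2)   # numerical vs exact value at t = 1
print(np.diff(sol.t))                 # the adaptively chosen step sizes
```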

3.1.2 Implicit Time-Stepping


Runge–Kutta schemes allow for a large step size Δ𝑡 while still ensuring good accuracy. Despite this, some differential equations still require such a small Δ𝑡 that it becomes impractical to solve them using the schemes presented thus far. Such problems are called stiff. This terminology does not have a precise definition, so we will just stick with this pragmatic version: stiff differential equations are those that require impractically small Δ𝑡 to be solved. For stiff problems, if Δ𝑡 is not chosen small enough, the numerical solution will tend to become unstable and diverge to ±infinity.

Luckily, implicit methods solve much of the problem created by stiff equations. The downside is that these methods tend to be harder to implement than explicit methods (those just presented). Here we present just a few examples of implicit methods, but note that implicit methods become crucial once we have to solve partial differential equations in the following chapter.
In deriving the Euler Method we used the forward finite difference scheme. Had we instead used the backward scheme we would have obtained the Implicit Euler Method:

Implicit Euler Method

𝒇 (𝑡 + Δ𝑡) = 𝒇 (𝑡) + 𝑭( 𝒇 (𝑡 + Δ𝑡), 𝑡 + Δ𝑡) Δ𝑡 (3.15)

Note that the right-hand side now contains 𝑓 (𝑡 + Δ𝑡), which is what
we are trying to calculate. This is the reason these methods are
called implicit: the methods only provide implicit equations.

Example
Consider the initial-value problem

𝑓′(𝑡) = −𝑎𝑓(𝑡),  𝑓(0) = 𝑓₀ > 0.  (3.16)

Using explicit Euler, the update rule would be

𝑓(𝑡 + Δ𝑡) = 𝑓(𝑡) − 𝑎𝑓(𝑡)Δ𝑡.  (3.17)

Note that if Δ𝑡 > 2/𝑎, this scheme becomes unstable: even though 𝑓(𝑡) should decay, it will instead perform increasingly large oscillations. In fact, even for Δ𝑡 > 1/𝑎 it will give solutions that have negative values, which should never be the case for the given equation.
In contrast, the Implicit Euler Method gives us the update rule

𝑓(𝑡 + Δ𝑡) = 𝑓(𝑡) − 𝑎𝑓(𝑡 + Δ𝑡)Δ𝑡.  (3.18)

For this simple differential equation we can solve the implicit equation analytically. Thus we can rewrite the update rule to

𝑓(𝑡 + Δ𝑡) = 𝑓(𝑡) / (1 + 𝑎Δ𝑡).  (3.19)

Clearly this implicit scheme will never have problems of oscillations or negative values, and in fact the method is stable for any positive value of Δ𝑡.

In the above example, Δ𝑡 had to be chosen small for the explicit method to work. This could seem like not so big an issue, as 1/𝑎 sets the time scale of the problem, and thus it seems okay that we have to choose Δ𝑡 ≪ 1/𝑎. We present one more example to illustrate why this can be a problem:

Example
Consider

𝑓₁′(𝑡) = −𝑎(𝑓₁(𝑡) + 𝑓₂(𝑡)) − 𝑏(𝑓₁(𝑡) − 𝑓₂(𝑡)),
𝑓₂′(𝑡) = −𝑎(𝑓₁(𝑡) + 𝑓₂(𝑡)) + 𝑏(𝑓₁(𝑡) − 𝑓₂(𝑡)).

This problem has two time scales: 1/𝑎 and 1/𝑏. For initial conditions 𝑓₁(0) = 1 and 𝑓₂(0) = 0, the system has the solution

𝑓₁(𝑡) = ½(𝑒^{−2𝑎𝑡} + 𝑒^{−2𝑏𝑡}),
𝑓₂(𝑡) = ½(𝑒^{−2𝑎𝑡} − 𝑒^{−2𝑏𝑡}).

Note that if, say, 𝑎 ≪ 𝑏 then 𝑒^{−2𝑏𝑡} will very quickly become negligible. Yet, for the explicit method to work, we will nonetheless be forced to choose Δ𝑡 ≪ 1/𝑏. In other words: even in cases where the shortest time scale does nothing important for the final solution, we are still forced to use a small time scale in our time steps for explicit methods.
The implicit scheme for the above equation is

𝑓₁(𝑡 + Δ𝑡) = [(1 + 𝑎Δ𝑡 + 𝑏Δ𝑡) 𝑓₁(𝑡) + (𝑏 − 𝑎)Δ𝑡 𝑓₂(𝑡)] / [(1 + 2𝑎Δ𝑡)(1 + 2𝑏Δ𝑡)],

and similarly for 𝑓₂(𝑡 + Δ𝑡).
For the implicit time step, 𝑎 and 𝑏 appear in both the numerator and denominator, which stabilises the scheme as, even for large Δ𝑡, large values of e.g. 𝑏 will not drive the right-hand side to be large.

The above example shows a typical situation, the classic example of which is the simulation of chemical reactions, where reaction rates often differ by many orders of magnitude and often change as a function of time. In such cases, implicit methods become crucial.
In our two examples, we have solved the implicit equations analytically. This is not practical for large systems of equations. In these cases, the equations are solved numerically, and thus each time step will require a system of equations to be solved. For non-linear equations this can be quite cumbersome, and iterative methods should be used.

While the Implicit Euler Method is much more stable, its accuracy is similar to that of the Euler scheme, i.e. the error of each time step is of order O(Δ𝑡²). Implicit Runge–Kutta schemes also exist, and we will give just one example of such:

Crank–Nicolson Method
For each time step solve the following system of equations

𝑘₁ = 𝐹(𝑓(𝑡), 𝑡)
𝑘₂ = 𝐹(𝑓(𝑡 + Δ𝑡), 𝑡 + Δ𝑡)          (3.20)
𝑓(𝑡 + Δ𝑡) = 𝑓(𝑡) + ½(𝑘₁ + 𝑘₂)Δ𝑡

This implicit method has an error of order O(Δ𝑡³), but even though it is more stable than the explicit Euler method, it is not stable for arbitrary values of Δ𝑡.
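The stability difference is easy to see numerically; a small Python sketch comparing the explicit update (3.17) with the implicit update (3.19) for a stiff choice of 𝑎:

```python
# f' = -a f with a stiff rate a: explicit Euler (3.17) is unstable for dt > 2/a,
# while the implicit update (3.19) decays monotonically for any dt > 0.
a, dt, n_steps = 50.0, 0.1, 20
f_explicit, f_implicit = 1.0, 1.0
for _ in range(n_steps):
    f_explicit = f_explicit - a * f_explicit * dt   # explicit Euler
    f_implicit = f_implicit / (1 + a * dt)          # implicit Euler
print(f_explicit, f_implicit)   # the explicit value has blown up
```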

3.2 Boundary-Value Problems


As we did for initial-value problems, we could present methods for boundary-value problems for a general ODE. However, we find that it is much easier to explain these methods if we work with a specific example. For this reason we will present methods that solve boundary-value problems that involve ODEs of the form

𝑓″(𝑥) = 𝑔(𝑥),  𝑥 ∈ [𝑎, 𝑏],  (3.21)

where 𝑔(𝑥) is some specified function. The methods presented should generalise quite easily to other linear problems, and we will end this chapter by discussing how to deal with non-linearities.
There are two main types of boundary conditions:

Dirichlet Boundary Condition


A Dirichlet boundary condition specifies the value of the
function at the border, e.g.:

𝑓 (𝑎) = 𝛼 (3.22)

Neumann Boundary Condition


A Neumann boundary condition specifies the value of the
derivative of a function at the border, e.g.:

𝑓′(𝑎) = 𝛼  (3.23)

Naturally, combinations of such conditions could also be used as boundary conditions, as e.g. in Robin boundary conditions, which specify for instance 𝑢₁𝑓(𝑎) + 𝑢₂𝑓′(𝑎) = 𝛼.
We will consider Eq. (3.21) with the boundary conditions

𝑓(𝑎) = 𝛼,  𝑓′(𝑏) = 𝛽.  (3.24)

3.2.1 Shooting Method


Note that if, instead of the second boundary condition, we required 𝑓′(𝑎) = 𝛽, then we would have an initial-value problem that we know how to solve. This connection leads to a simple method called the Shooting Method: we guess enough boundary conditions at 𝑥 = 𝑎 to be able to solve the ODE as an initial-value problem (shoot) and then readjust these guessed boundary conditions until the solution satisfies the boundary conditions required at 𝑥 = 𝑏. For our example problem:

The Shooting Method

Find a 𝛾 which solves the equation

𝑓′(𝑏) = 𝛽,  (3.25)

where 𝑓′(𝑏) is evaluated by solving the initial-value problem of the ODE with boundary conditions

𝑓(𝑎) = 𝛼,  𝑓′(𝑎) = 𝛾.  (3.26)

Once 𝛾 is found, the initial-value problem solves the boundary-value problem.

This method requires some technique for solving numerically for 𝛾, which can be as simple as trying 𝛾 and 𝛾 + 𝜖 and moving 𝛾 in the direction which decreases the error as specified by |𝑓′(𝑏) − 𝛽|, or something more fancy (such as Powell’s Method, which will not be discussed here).
The Shooting Method is easy to understand, but not trivial to implement well. It also does not generalise well to higher order equations, which have many boundary conditions, and in particular it is virtually useless in the context of partial differential equations, which will be the subject of the next chapter.

3.2.2 Finite Difference Method


In contrast to the shooting method, we will instead focus on methods that solve directly for the values of 𝑓(𝑥) at all values of 𝑥 simultaneously. In the finite difference method, we discretise 𝑥, 𝑓(𝑥) and 𝑔(𝑥) and store the values of these as vectors

𝒙 = (𝑥₁, 𝑥₂, 𝑥₃, · · · , 𝑥_{𝑁−1}, 𝑥_𝑁)ᵀ,
𝒇 = (𝑓₁, 𝑓₂, 𝑓₃, · · · , 𝑓_{𝑁−1}, 𝑓_𝑁)ᵀ,          (3.27)
𝒈 = (𝑔₁, 𝑔₂, 𝑔₃, · · · , 𝑔_{𝑁−1}, 𝑔_𝑁)ᵀ,

where 𝑥₁ = 𝑎 and 𝑥_𝑁 = 𝑏. Note that 𝒇 is not known, as this is the one we are solving for. For regular grids we will have a constant spacing 𝑥ᵢ₊₁ − 𝑥ᵢ = Δ𝑥 for all 𝑖, which we will assume for now.
Using the central finite difference scheme of Eq. (2.11), we have

𝑓ᵢ″ ≈ (𝑓ᵢ₊₁ + 𝑓ᵢ₋₁ − 2𝑓ᵢ) / Δ𝑥²  (3.28)

for 2 ≤ 𝑖 ≤ 𝑁 − 1. So except for the edge cases 𝑖 = 1 and 𝑖 = 𝑁, we

can write our discretised ODE as a matrix equation:

            ( ?   ?   ?   ?  · · ·  ?   ?   ?   ? )
            ( 1  −2   1   0  · · ·  0   0   0   0 )
      1     ( 0   1  −2   1  · · ·  0   0   0   0 )
𝒇″ = ―――――  ( ⋮                                  ⋮ ) 𝒇 .          (3.29)
     Δ𝑥²    ( 0   0   0   0  · · ·  1  −2   1   0 )
            ( 0   0   0   0  · · ·  0   1  −2   1 )
            ( ?   ?   ?   ?  · · ·  ?   ?   ?   ? )

We could use the central scheme to fill out the first and last rows if we were solving a periodic problem. For the present problem we could also use a forward or backward finite difference scheme to fill out these rows to estimate the second derivative there. However, we do not need to do this, as we instead need to use these rows to enforce the boundary conditions. At 𝑥 = 𝑎 we have to enforce 𝑓₁ = 𝛼. At 𝑥 = 𝑏 we need to enforce a Neumann boundary condition. We do this by using a backward scheme for the first derivative. We could for instance enforce 𝑓_𝑁 − 𝑓_{𝑁−1} = 𝛽Δ𝑥. However, as we use a second order scheme with error O(Δ𝑥²), we should do the same for the boundary condition. The second-order backward scheme is given by (3/2)𝑓_𝑁 − 2𝑓_{𝑁−1} + (1/2)𝑓_{𝑁−2} = 𝛽Δ𝑥.
All in all, the finite difference version of Eq. (3.21) with boundary conditions as given by Eq. (3.24) is

(   1        0        0        0     · · ·    0        0         0        0     )       (   𝛼    )
( 1/Δ𝑥²   −2/Δ𝑥²    1/Δ𝑥²      0     · · ·    0        0         0        0     )       (   𝑔₂   )
(   0      1/Δ𝑥²   −2/Δ𝑥²    1/Δ𝑥²   · · ·    0        0         0        0     )       (   𝑔₃   )
(   ⋮                                                                      ⋮     ) 𝒇  =  (   ⋮    )   (3.30)
(   0        0        0        0     · · ·  1/Δ𝑥²   −2/Δ𝑥²     1/Δ𝑥²      0     )       ( 𝑔_{𝑁−2} )
(   0        0        0        0     · · ·    0      1/Δ𝑥²    −2/Δ𝑥²    1/Δ𝑥²   )       ( 𝑔_{𝑁−1} )
(   0        0        0        0     · · ·    0     1/(2Δ𝑥)   −2/Δ𝑥    3/(2Δ𝑥)  )       (   𝛽    )
This linear equation implements all requirements on the solution of our boundary-value problem: the first row implements the left boundary condition, the last row implements the right boundary condition, and the inner rows implement the differential equation that the solution must obey. It is a simple matter to solve the above equation on a computer by using a linear algebra library. This is the finite difference method.

Finite Difference Method

Use a finite difference scheme to rewrite the ODE as an approximate matrix equation,

A𝒇 = 𝒃.  (3.31)

For each boundary condition, choose a suitable row and replace that row of A and 𝒃 with a finite difference approximation of the boundary condition.

Finally solve

A_bc 𝒇 = 𝒃_bc  (3.32)

to find 𝒇, where ‘bc’ denotes the updated arrays.

Note that it is not necessary to invert A_bc, as faster methods exist that solve for 𝒇 without doing explicit inversions. Also note that most entries of A_bc will be zeros; it can therefore be useful to use sparse matrix representations. This will typically also give speed improvements.
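A minimal Python sketch of the full procedure for Eq. (3.21) with the boundary conditions (3.24); the test problem at the bottom is just an illustrative choice:

```python
import numpy as np

def solve_bvp_fd(g, a, b, alpha, beta, N=200):
    """Finite difference solution of f''(x) = g(x) with f(a) = alpha and
    f'(b) = beta, assembling the system of Eq. (3.30) as a dense matrix."""
    x = np.linspace(a, b, N)
    dx = x[1] - x[0]
    A = np.zeros((N, N))
    rhs = np.asarray(g(x), dtype=float)
    for i in range(1, N - 1):                       # interior rows: f'' = g
        A[i, i - 1:i + 2] = np.array([1.0, -2.0, 1.0]) / dx**2
    A[0, 0] = 1.0
    rhs[0] = alpha                                  # Dirichlet condition at x = a
    A[-1, -3:] = np.array([0.5, -2.0, 1.5]) / dx    # 2nd-order backward f' at x = b
    rhs[-1] = beta
    return x, np.linalg.solve(A, rhs)

# Test problem: f'' = -sin(x) on [0, pi/2], f(0) = 0, f'(pi/2) = 0 -> f = sin(x)
x, f = solve_bvp_fd(lambda x: -np.sin(x), 0.0, np.pi / 2, 0.0, 0.0)
print(np.max(np.abs(f - np.sin(x))))   # error scales like dx**2
```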
This method is important, and will be used extensively also
for partial differential equations. Therefore we present one more
example on a slightly different ODE:

Example
Consider

𝑓″(𝑥) + ℎ(𝑥)𝑓′(𝑥) = 0,  𝑥 ∈ [𝑎, 𝑏]  (3.33)

with the same boundary conditions (3.24). We can again use the matrix of Eq. (3.29) for the second derivative. Let us call this A₂. In a similar fashion, a first derivative matrix is

            ( −3   4  −1   0  · · ·   0   0   0   0 )
            ( −1   0   1   0  · · ·   0   0   0   0 )
      1     (  0  −1   0   1  · · ·   0   0   0   0 )
A₁ = ―――――  (  ⋮                                   ⋮ ) ,
     2Δ𝑥    (  0   0   0   0  · · ·  −1   0   1   0 )
            (  0   0   0   0  · · ·   0  −1   0   1 )
            (  0   0   0   0  · · ·   0   1  −4   3 )

where we used Eq. (2.9) for the inner rows and the forward and backward schemes for the first and last rows. The total matrix representing the differential equation of Eq. (3.33) is therefore

A = A₂ + diag(𝒉) A₁,  (3.34)

where diag(𝒉) is a matrix of zeros with 𝒉 filled along its diagonal, and matrix multiplication is implied. Now the boundary conditions can be applied to A. The right-hand side is zero, so 𝒃_bc = (𝛼, 0, 0, · · · , 0, 0, 𝛽)ᵀ.

Finally, note that if the equation had been

𝑓″(𝑥) + (ℎ(𝑥)𝑓(𝑥))′ = 0,  𝑥 ∈ [𝑎, 𝑏],  (3.35)

we would useᵃ

A = A₂ + A₁ diag(𝒉),  (3.36)

since order matters in matrix multiplication, and this specifies whether we want to apply the differentiation to ℎ or not, since the matrices act from right to left on 𝒇.

ᵃ The analogy between the ODEs and the matrices becomes more clear if we write the ODE terms in operator notation: ℎ(𝑥)𝜕ₓ𝑓(𝑥) and 𝜕ₓ[ℎ(𝑥)𝑓(𝑥)]. With this notation we simply replace ℎ(𝑥) with diag(𝒉) and 𝜕ₓ with A₁ to obtain the finite difference matrix.
Note that in this example we could also have used (ℎ(𝑥)𝑓(𝑥))′ = ℎ(𝑥)𝑓′(𝑥) + ℎ′(𝑥)𝑓(𝑥).

Our examples have used a regular grid-spacing of Δ𝑥. The method, nevertheless, works for irregular grid-spacing as well. When constructing A you can then use e.g. Eq. (2.13) to derive the formulas for the derivatives. This can be very useful if the solution to the equation of interest varies faster in some regions than others.

Although fantastic library functions exist for solving linear equations, one can find oneself in a situation where a custom solver is needed. For this reason we end this section with a simple method for solving linear equations of the form A𝒇 = 𝒃.
Under the assumption that a solution exists, we can reformulate the problem as

min_𝒇 |A𝒇 − 𝒃|²,  (3.37)

i.e. find an 𝒇 that minimises the square distance between A𝒇 and 𝒃. If the minimum is zero, we have found the solution. Writing L = |A𝒇 − 𝒃|², we have that the gradient of L with respect to 𝒇 is

∇L = 2Aᵀ(A𝒇 − 𝒃).  (3.38)

Thus, if we have a guess for 𝒇 we can make a better guess by changing 𝒇 in the direction of −∇L. If ∇L is zero, we have found the minimum.

Gradient Descent for Linear Equations

To find a solution 𝒇 of

A𝒇 = 𝒃,  (3.39)

choose an initial guess 𝒇₀ and a small step size 𝜖.
Then until convergence, do

𝒇ₙ = 𝒇ₙ₋₁ − 2Aᵀ(A𝒇ₙ₋₁ − 𝒃) 𝜖.  (3.40)

The final 𝒇ₙ will be the solution if |A𝒇ₙ − 𝒃|² is zero. Otherwise it will be the “least-squares solution” of the equation. Choosing a good step size 𝜖 is important for the speed and convergence of the algorithm. Note that 𝜖 is allowed to be different in each step.

As should be the case, we did not need to explicitly invert A to solve the linear equation. Much better versions of this method exist, such as the method of Conjugate Gradients, for which you will probably be able to find a library in your language of choice. Conceptually these are similar to the above with smart choices for the step size. We will furthermore present an alternative method [Eq. (4.32)] in the chapter on partial differential equations.
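A minimal Python sketch of the iteration (3.40); the step-size choice below (based on the spectral norm of A) is just one safe option, not part of the method itself:

```python
import numpy as np

def gradient_descent_solve(A, b, eps=None, tol=1e-10, max_iter=100_000):
    """Solve A f = b by gradient descent on |A f - b|^2, following Eq. (3.40)."""
    f = np.zeros_like(b, dtype=float)
    if eps is None:
        eps = 1.0 / (2.0 * np.linalg.norm(A, 2)**2)   # a conservative step size
    for _ in range(max_iter):
        grad = 2.0 * A.T @ (A @ f - b)
        if np.linalg.norm(grad) < tol:
            break
        f = f - eps * grad
    return f

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(gradient_descent_solve(A, b), np.linalg.solve(A, b))   # should agree
```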

3.2.3 Spectral Methods


We will exemplify the spectral method for ODEs on the system defined by

𝑓″(𝑥) + 𝜂𝑓′(𝑥) = 𝑔(𝑥)  (3.41)

on a periodic domain [𝑎, 𝑏]. This ODE only has a solution if ∫ₐᵇ 𝑔(𝑥) d𝑥 = 0.
Using the ideas of spectral methods presented in the previous chapter, we can rewrite the ODE using a Fourier transform

−𝑘² 𝑓̃(𝑘) + 𝑖𝜂𝑘 𝑓̃(𝑘) = 𝑔̃(𝑘),  (3.42)

where ˜ denotes a Fourier transform. From this we immediately have in the discrete case

𝒇 = ifft( fft(𝒈) / (−𝒌² + 𝑖𝜂𝒌) ),  (3.43)

where 𝒌 is the vector of wave numbers of the Fourier transform. Here, 𝒌² means element-wise squaring. We note that the first wave number 𝒌₀ is zero, and so in principle we are dividing by zero for this term. But since ∫ₐᵇ 𝑔(𝑥) d𝑥 = 0, the corresponding Fourier component will also be zero. This “zero divided by zero” may be replaced by an actual zero to obtain the correct solution.
In general:

Spectral Method

1. Use fast Fourier transforms and the relation 𝑓̃⁽ⁿ⁾(𝑘) = (𝑖𝑘)ⁿ 𝑓̃(𝑘) to rewrite the equation in terms of 𝑘 and 𝑓̃.

2. Solve the equation for 𝑓̃, taking special care with zero denominators.

3. Use the inverse fast Fourier transform to obtain the solution.

The spectral method is quite simple to implement, but we will nonetheless give a quick numerical example, since there are small details that can easily be overlooked:

Example
Consider Eq. (3.41) on the domain [0, 2𝜋) with 𝜂 = 1. We will solve this on the grid

𝒙 = (0.00, 0.79, 1.57, 2.36, 3.14, 3.93, 4.71, 5.50),

i.e. with Δ𝑥 = 2𝜋/8 ≈ 0.79. We will solve the ODE for a right-hand side function 𝑔 that on our grid takes the values

𝒈 = 𝑔(𝒙) = (1.0, 1.7, 0.0, −1.7, −1.0, 0.3, 0.0, −0.3).

Using the fast Fourier transform we findᵃ

rfft(𝒈) ≈ (0, 4.0, −4.0𝑖, 0, 0),

corresponding to wave numbersᵇ

𝒌 = (0.0, 1.0, 2.0, 3.0, 4.0).

Thus we have

rfft(𝒈) / (−𝒌² + 𝑖𝒌) = (0.0/0.0, −2 − 2𝑖, −0.4 + 0.8𝑖, 0, 0).

We replace 0.0/0.0 by zero, and then solve the ODE by calculating

𝒇 = irfft( rfft(𝒈) / (−𝒌² + 𝑖𝒌) ) = (−0.6, −0.2, 0.6, 0.9, 0.4, −0.2, −0.4, −0.5).

In fact a constant can also be added to the above solution and it will still solve the equation. This is because the ODE does not contain the value 𝑓(𝑥) itself but only its derivatives.

ᵃ We use the real fast Fourier transform “rfft”.
ᵇ These are simply multiples of 2𝜋/(𝑁Δ𝑥). Your fft library will most likely have a utility function to calculate the frequencies, e.g. called “rfftfreq”.
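The example can be reproduced in a few lines of Python (assuming NumPy); a sketch:

```python
import numpy as np

n, eta = 8, 1.0
x = np.arange(n) * 2 * np.pi / n
g = np.cos(x) + np.sin(2 * x)                         # same g as in the example

k = 2 * np.pi * np.fft.rfftfreq(n, d=2 * np.pi / n)   # wave numbers (0, 1, 2, 3, 4)
g_hat = np.fft.rfft(g)
f_hat = np.zeros_like(g_hat)
f_hat[1:] = g_hat[1:] / (-k[1:]**2 + 1j * eta * k[1:])   # skip k = 0 (the 0/0 term)
f = np.fft.irfft(f_hat, n=n)
print(np.round(f, 1))   # (-0.6, -0.2, 0.6, 0.9, 0.4, -0.2, -0.4, -0.5)
```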

In the above example, we did not tell you what the function 𝑔 was, but only defined it in terms of its values at the grid points. As can be deduced from the Fourier coefficients, our choice was

𝑔(𝑥) = cos(𝑥) + sin(2𝑥).  (3.44)

However, with only 8 points as used in the example, there is no way to tell the difference between this function and the function

𝑔₂(𝑥) = cos(9𝑥) + sin(18𝑥).  (3.45)

This means that we would have obtained the exact same solution if we had used 𝑔₂ and the same number of grid points. This effect is called aliasing. When using spectral methods there is therefore a simple rule that must be followed: use a grid-spacing Δ𝑥 that is small enough to capture the highest frequencies in all functions.

Beyond being very easy to implement, spectral methods also have a very high accuracy. For an 𝑛’th order finite difference method, the accuracy is O(Δ𝑥ⁿ), which in terms of the number of grid points 𝑚 is O(𝑚⁻ⁿ), since Δ𝑥 ∼ 1/𝑚. The error for spectral methods is of order O(𝑒⁻ᵐ). In other words, finite difference methods are polynomial in their error, whereas spectral methods are exponential. So: if you can apply them, do apply them. We note, however, that spectral methods require the solution to be smooth. Small discontinuities, e.g. in a boundary condition, can completely ruin the method.

As mentioned in the previous chapter, spectral methods also exist for non-periodic domains, but the formulas differ, as different basis functions need to be used.

3.2.4 Non-linear Problems


Non-linear terms in ODEs do not pose much of a problem for initial-value problems. However, for boundary-value problems they prevent the problem from being formulated as a linear system of equations. The shooting method can still be used unaltered in these cases. However, the shooting method does not generalise well to PDEs, and we will therefore focus on other methods. Here we present a simple alternative that can be used with both the finite difference method and with spectral methods.
Consider a non-linear ODE such as

D𝑓(𝑥) = 𝑁(𝑓(𝑥), 𝑥),  (3.46)

where D is some linear differential operator and 𝑁 is a non-linear function of 𝑓. Using finite differences, we know how to discretise the left-hand side. So we have an equation of the form

A𝒇 = 𝑁(𝒇).  (3.47)

If you have 𝑁 grid points, this is simply a system of 𝑁 non-linear equations. There are many algorithms to solve such systems of equations directly, and using these could be a good approach.
A simple alternative is to use the relaxation method:

The Relaxation Method

To solve for 𝒇 in a system of non-linear equations, rewrite the equations in the form

𝑓ᵢ = 𝐹({𝑓ⱼ}),  (3.48)

where the right-hand side is also allowed to depend on 𝑓ᵢ.
Then choose a starting guess 𝒇 and until convergence, for all 𝑖 not on the boundary, do

𝑓ᵢ ← 𝐹({𝑓ⱼ}).  (3.49)

The right-hand side is evaluated for all 𝑖 before being reassigned to update 𝑓ᵢ. Update boundary values using the boundary conditions after each iteration.

This method can be used in many ways, and how well it works can
vary significantly depending on the problem at hand.

Example
To solve Eq. (3.47), choose a starting guess 𝒇 and repeat

𝒇 ← A⁻¹ 𝑁(𝒇).  (3.50)

When the method works, the solution will be found after a number of iterations. There is no guarantee that this method converges though, so it should be checked that the solution found solves the equation. Also note that in principle, many solutions could exist. Which solution is found depends on the initial guess.

For a concrete problem, we could also use the relaxation method in the following way:

Example
Consider

𝑓″(𝑥) = 𝑁(𝑓(𝑥), 𝑥).  (3.51)

Using central differences we can discretise this as

(𝑓ᵢ₋₁ + 𝑓ᵢ₊₁ − 2𝑓ᵢ) / Δ𝑥² = 𝑁(𝑓ᵢ, 𝑥ᵢ).  (3.52)

We can rewrite this in the form of Eq. (3.48) as

𝑓ᵢ = ½ [𝑓ᵢ₋₁ + 𝑓ᵢ₊₁ − 𝑁(𝑓ᵢ, 𝑥ᵢ)Δ𝑥²],  (3.53)

which can be iterated to find the solution. In this case we did not even have to form a matrix.
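A minimal Python sketch of this iteration; the non-linearity 𝑁(𝑓) = 𝑓³ and the boundary values are just an illustrative choice:

```python
import numpy as np

# Relaxation iteration (3.53) for f''(x) = f(x)**3 with f(0) = 0, f(1) = 1.
n = 21
x = np.linspace(0.0, 1.0, n)
dx = x[1] - x[0]
f = x.copy()                         # initial guess obeying the boundary values
for _ in range(10_000):
    f_new = f.copy()
    f_new[1:-1] = 0.5 * (f[:-2] + f[2:] - f[1:-1]**3 * dx**2)
    f = f_new                        # f[0] and f[-1] are left untouched
residual = (f[:-2] + f[2:] - 2 * f[1:-1]) / dx**2 - f[1:-1]**3
print(np.max(np.abs(residual)))      # close to zero once converged
```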

Note that this method is also a viable approach to solving linear equations. We further note that convergence of the method depends on the initial choice and is not guaranteed.

A separate approach is to use root-finding algorithms to solve the non-linear problem. The non-linear equation resulting from an ODE can always be formulated as an 𝑛-dimensional equation of the form

F(𝒙) = 0.  (3.54)

To find an 𝒙 that solves the above non-linear equation, one could use

Newton’s Method
To solve for 𝒙 in a system of non-linear equations, rewrite the equations in the form of (3.54).
From an initial guess 𝒙₀, use the Jacobian

      ( ∂𝐹₁/∂𝑥₁  ∂𝐹₁/∂𝑥₂  · · ·  ∂𝐹₁/∂𝑥ₙ )
      ( ∂𝐹₂/∂𝑥₁  ∂𝐹₂/∂𝑥₂  · · ·  ∂𝐹₂/∂𝑥ₙ )
𝐽_𝐹 = (    ⋮        ⋮       ⋱       ⋮    )          (3.55)
      ( ∂𝐹ₙ/∂𝑥₁  ∂𝐹ₙ/∂𝑥₂  · · ·  ∂𝐹ₙ/∂𝑥ₙ )

and iteratively solve the linear system of equations

𝐽_𝐹(𝒙ₙ) 𝒂 = −F(𝒙ₙ)  (3.56)

for 𝒂 and set

𝒙ₙ₊₁ = 𝒙ₙ + 𝒂.  (3.57)

Iterate until convergence.

The method requires that the Jacobian can be calculated. It is naturally most efficient when this can be calculated analytically, but it also works with numerical approximations of the Jacobian.

Example
Consider the equations

𝑦3 = 1 − 𝑥2, (3.58)
𝑦 = −𝑒 𝑥 .

Rewriting we have
 
𝑦3 + 𝑥2 − 1
F(𝑥, 𝑦) = . (3.59)
𝑦 + 𝑒𝑥

The Jacobian is
 
2𝑥 3𝑦 2
𝐽𝐹 (𝑥, 𝑦) = 𝑥 . (3.60)
𝑒 1

In this simple case, we can simply invert the Jacobian to find


 
−1 1 1 1/𝑦 2
𝐽𝐹 (𝑥, 𝑦) = 𝑥 2 . (3.61)
𝑒 /𝑦 + 2𝑥 −𝑒 𝑥 2𝑥
Choosing e.g. 𝒙 0 = (1, 1)𝑇 we iterate

𝒙 𝑛+1 = 𝒙 𝑛 − 𝐽𝐹−1 (𝒙 𝑛 ) F(𝒙 𝑛 ). (3.62)


CHAPTER 3. ORDINARY DIFFERENTIAL EQUATIONS 55

In this simple case we can actually write out analytically the


full update

𝑥 ← 𝑥 − 𝑥 2 + 𝑦 2 (3𝑒 𝑥 + 2𝑦) , (3.63)


 
𝑥 3
𝑦 ← 𝑦 + 𝑒 (𝑥 − 2) 𝑥 + 𝑦 − 2𝑥𝑦

After just seven iterations we find, to a good accuracy, the
correct solution 𝑥 = −1.02297 and 𝑦 = −0.359525.
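
A short sketch of this iteration in Python with NumPy (starting, as in the example, from 𝒙0 = (1, 1)ᵀ) might read:

    import numpy as np

    def F(v):
        x, y = v
        return np.array([y**3 + x**2 - 1.0, y + np.exp(x)])

    def J(v):                                  # the Jacobian, Eq. (3.60)
        x, y = v
        return np.array([[2.0 * x, 3.0 * y**2],
                         [np.exp(x), 1.0]])

    v = np.array([1.0, 1.0])                   # initial guess
    for iteration in range(20):
        a = np.linalg.solve(J(v), -F(v))       # Eq. (3.56)
        v = v + a                              # Eq. (3.57)
        if np.linalg.norm(a) < 1e-12:          # converged
            break

    print(v)    # approximately (-1.02297, -0.359525)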
Partial Differential Equations

Most of the principles that we have discussed for Ordinary
Differential Equations (ODEs) also apply for Partial Differential Equations
(PDEs). Nonetheless, solving PDEs is generally harder than solv-
ing ODEs. This is true for analytical approaches and numerical.
Implementation-wise, PDEs are harder to write solvers for, since we
have to think about the grids on which the PDEs should be solved
— for ODEs, all we had to think about was a one-dimensional
𝑥-axis. PDEs are also harder for the computer, since typically a
𝑑-dimensional problem will have 𝑁 𝑑 variables to be solved for if
we want 𝑁 grid points in each direction. In 3D this very quickly
becomes unmanageable.
We will consider two types of PDE problem formulations: time-
dependent problems that are a mix of initial-value and boundary-
value problems, and time-independent problems that are full boundary-
value problems. The former will in many cases be the easier one to
approach. In particular, we will exemplify the methods using the
diffusion (heat) equation

𝜕 𝑓 (𝑡, 𝒙) / 𝜕𝑡 = 𝛾 ∇² 𝑓 (𝑡, 𝒙),        (4.1)

the advection equation

𝜕 𝑓 (𝑡, 𝒙) / 𝜕𝑡 = ∇ · (𝒖 𝑓 (𝑡, 𝒙)),        (4.2)

and Poisson’s equation

∇2 𝑓 (𝒙) = 𝑔(𝒙). (4.3)

Nevertheless, the methods we develop will have general
applicability. We will write the general time-dependent PDE as

𝜕 𝑓 (𝑡, 𝒙) / 𝜕𝑡 = D 𝑓 (𝑡, 𝒙) + 𝑔(𝒙, 𝑡),        (4.4)
and the general time-independent PDE as

D 𝑓 (𝒙) = 𝑔(𝒙),        (4.5)

where D is some differential operator. Time-dependent problems


need both initial-value and boundary-value conditions to be fully
specified. We postpone a discussion of non-linear PDEs for the end
of the chapter.
For ODEs, we discovered that some methods were more stable
than others. This is a critical subject for PDEs as many PDEs
simply cannot be solved with some methods. We begin this chapter
by introducing a simple method for solving some time-dependent
PDEs and then show when this approach fails. The following
sections will then be dedicated to more stable (implicit) methods.

4.1 Explicit Methods & Stability Analysis


Probably the simplest scheme for PDEs follows by combining cen-
tral schemes for calculating derivatives with the Euler Method in
the most straightforward way. The method is, for obvious reasons,
known as the Forward-Time-Central-Space (FTCS) scheme:

FTCS Scheme
To solve PDEs of the form Eq. (4.4), use forward Euler
to discretise the time-derivative, and a central scheme to
discretise spatial derivatives.
For one spatial dimension this becomes:

𝑓 (𝑡 + Δ𝑡, 𝑥𝑖 ) = 𝑓 (𝑡, 𝑥𝑖 ) + 𝐷 ({ 𝑓 (𝑡, 𝑥 𝑗 )}) Δ𝑡 + 𝑔(𝑡, 𝑥𝑖 ) Δ𝑡        (4.6)

where 𝐷 is the discretised version of D.

After each time step, ensure that boundary conditions are


met by updating the values of 𝑓 at the borders.

Note that this method can only be used for time-dependent prob-
lems. It cannot be used to tackle equations of the form of Eq.
(4.5).

Example
Consider
𝜕 𝑓 (𝑡, 𝒙) / 𝜕𝑡 = 𝛾 𝜕² 𝑓 (𝑡, 𝒙) / 𝜕𝑥²        (4.7)
with boundary conditions 𝑓 (𝑡, 𝑎) = 𝛼 and 𝑓 (𝑡, 𝑏) = 𝛽. The
FTCS Scheme for this problem is to do

𝑓 (𝑡 + Δ𝑡, 𝑥𝑖 ) = 𝑓 (𝑡, 𝑥𝑖 ) + 𝛾 Δ𝑡 [ 𝑓 (𝑡, 𝑥𝑖−1 ) + 𝑓 (𝑡, 𝑥𝑖+1 ) − 2 𝑓 (𝑡, 𝑥𝑖 ) ] / Δ𝑥²        (4.8)
for all grid points 𝑥𝑖 except for 𝑥0 = 𝑎 and 𝑥 𝑁 = 𝑏, as these
are set by the boundary conditions.
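
A sketch of this scheme in Python with NumPy (the values of 𝛾, the domain, the initial condition and the boundary values below are assumed purely for illustration):

    import numpy as np

    gamma, alpha, beta = 1.0, 0.0, 1.0
    a, b, n = 0.0, 1.0, 51
    x = np.linspace(a, b, n)
    dx = x[1] - x[0]
    dt = 0.4 * dx**2 / gamma           # safely below the stability limit derived below

    f = np.zeros(n)
    f[0], f[-1] = alpha, beta          # boundary conditions

    for step in range(5_000):
        lap = (f[:-2] + f[2:] - 2.0 * f[1:-1]) / dx**2
        f[1:-1] = f[1:-1] + gamma * lap * dt      # FTCS update, Eq. (4.8)
        f[0], f[-1] = alpha, beta                 # re-impose boundary conditions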

We will now discuss the stability of this scheme on the example


problem above. That is, we will discuss how small Δ𝑡 has to be
chosen in order for the method to be stable. It is harder to do this
type of analysis for PDEs than for ODEs. The approach we will take
is called Von Neumann stability analysis. The analysis considers
the growth-rate of the Fourier modes of the numerical solution, and
the idea is to require that no mode may grow infinitely big.
For simplicity we consider periodic boundary conditions, al-
though this will not make a difference for the stability condition
that we derive. In this case we can write the solution as a Fourier
series. Using 𝑁 grid points we have1

𝑓 (𝑡, 𝑥) = Σ_{𝑛=−𝑁/2}^{𝑁/2} 𝑐𝑛 (𝑡) 𝑒^{𝑖 𝑘𝑛 𝑥} ,        (4.9)

where 𝑘𝑛 = 2𝜋𝑛/(𝑁Δ𝑥). Using this equation in Eq. (4.8) and Fourier
orthogonality, we obtain

1 Exact expression depends on whether 𝑁 is odd or even.

𝑐𝑛 (𝑡 + Δ𝑡) = 𝑐𝑛 (𝑡) + 𝛾 Δ𝑡 [ 𝑒^{−𝑖𝑘𝑛 Δ𝑥} 𝑐𝑛 (𝑡) + 𝑒^{𝑖𝑘𝑛 Δ𝑥} 𝑐𝑛 (𝑡) − 2𝑐𝑛 (𝑡) ] / Δ𝑥² ,        (4.10)

which can be rewritten as

𝑐𝑛 (𝑡 + Δ𝑡) / 𝑐𝑛 (𝑡) = 1 + ( 𝑒^{−𝑖𝑘𝑛 Δ𝑥} + 𝑒^{𝑖𝑘𝑛 Δ𝑥} − 2 ) 𝛾Δ𝑡 / Δ𝑥² .        (4.11)
Note that the right-hand side does not depend on 𝑡. This equation
tells you what to multiply 𝑐𝑛 (𝑡) by to get 𝑐𝑛 (𝑡 + Δ𝑡). If for any 𝑛

| 1 + ( 𝑒^{−𝑖𝑘𝑛 Δ𝑥} + 𝑒^{𝑖𝑘𝑛 Δ𝑥} − 2 ) 𝛾Δ𝑡 / Δ𝑥² | > 1,        (4.12)
then 𝑐 𝑛 → ∞ as we run our numerical simulation. This means
our scheme is unstable, and some mode will grow indefinitely big.2
Requiring Eq. (4.12) to be false for all 𝑛 leads to

Δ𝑡 ≤ Δ𝑥² / (2𝛾) .        (4.13)
This is the Von Neumann stability criterion for the FTCS Scheme
on the diffusion equation. There are two main takeaways: (1) The
better spatial resolution you require, the smaller Δ𝑡 you have to
choose, and (2) the larger the diffusion constant 𝛾, the more stable
the scheme is. Finally, we note that stability is not the same as
2 Note that 𝑐 0 does not change with time. This is because the steady state of
the diffusion equation is a constant.
accuracy. You can have a stable scheme that is very inaccurate.


Stability just means that the numerical solution does not blow up.

Now consider a one-dimensional advection equation

𝜕 𝑓 (𝑡, 𝑥) / 𝜕𝑡 = 𝑢 𝜕 𝑓 (𝑡, 𝑥) / 𝜕𝑥 ,        (4.14)

which has the FTCS Scheme

𝑓 (𝑡 + Δ𝑡, 𝑥𝑖 ) = 𝑓 (𝑡, 𝑥𝑖 ) + 𝑢 Δ𝑡 [ 𝑓 (𝑡, 𝑥𝑖+1 ) − 𝑓 (𝑡, 𝑥𝑖−1 ) ] / (2Δ𝑥) .        (4.15)
Using the Von Neumann approach, we find in this case

𝑐𝑛 (𝑡 + Δ𝑡) / 𝑐𝑛 (𝑡) = 1 + ( 𝑒^{𝑖𝑘𝑛 Δ𝑥} − 𝑒^{−𝑖𝑘𝑛 Δ𝑥} ) 𝑢Δ𝑡 / (2Δ𝑥) .        (4.16)

The condition for this equation is therefore3

| 1 + 𝑖 sin(𝑘𝑛 Δ𝑥) 𝑢Δ𝑡 / Δ𝑥 | ≤ 1.        (4.17)

But we can never satisfy this condition for all 𝑛. Therefore the FTCS
scheme is unstable for any value of Δ𝑡 for the advection equation.
For this reason we do not recommend the FTCS scheme
unless you are certain that you are applying it to a system for which
you know it is stable. Explicit schemes that work for the advection
equation do exist: for instance the Lax–Wendroff method, which
3 Using 𝑒^{𝑖𝑥} − 𝑒^{−𝑖𝑥} = 2𝑖 sin(𝑥).
explicitly expands the time-derivative to second order, or upwind


schemes, which replace the central finite difference scheme with
forward or backward schemes depending on the sign of 𝑢. We
will not discuss these methods here, but instead focus on implicit
schemes which have more broad applicability.

4.2 Finite Difference Method


If you have not yet read and understood Sec. 3.1.2 and Sec. 3.2.2,
it is recommended that you do so before continuing here. The
methods presented here are very similar, but slightly more involved
than those discussed for ODEs.

4.2.1 Time-dependent Problems


Using implicit time stepping and our knowledge of solving boundary-
value problems for ODEs immediately gives us a very stable
scheme for time-dependent PDEs with one spatial dimension. For
Eq. (4.4), implicit time stepping yields4

𝑓 (𝑡 + Δ𝑡, 𝑥𝑖 ) = 𝑓 (𝑡, 𝑥𝑖 ) + 𝐷 ({ 𝑓 (𝑡 + Δ𝑡, 𝑥 𝑗 )}) Δ𝑡 (4.18)


+ 𝑔(𝑡, 𝑥𝑖 ) Δ𝑡,

where 𝐷 is some discretised version of D. To use finite differences


we write 𝑓 , just as we did for ODEs, as a vector of its values: 𝒇 .
4 It is a matter of choice whether it should be 𝑔(𝑡, 𝑥𝑖 ) or 𝑔(𝑡 + Δ𝑡, 𝑥𝑖 ).
This allows us to recast the equation as a vector equation:


𝒇 (𝑡 + Δ𝑡) = 𝒇 (𝑡) + D 𝒇 (𝑡 + Δ𝑡)Δ𝑡 + 𝒈(𝑡)Δ𝑡. (4.19)
This can also be written as
(I − DΔ𝑡) 𝒇 (𝑡 + Δ𝑡) = 𝒇 (𝑡) + 𝒈(𝑡)Δ𝑡, (4.20)
where I is the identity matrix. Thus, in the notation of Eq. (3.31)
that we used for boundary-value problems, we have
A = I − DΔ𝑡, (4.21)
and
𝒃 = 𝒇 (𝑡) + 𝒈(𝑡)Δ𝑡. (4.22)
Now we can apply boundary-conditions to A and 𝒃 and then solve
the boundary-value problem for each time step.
This approach also generalises to higher dimensions.

Implicit Finite Difference for Time-Dependent PDEs

1. Use an implicit time stepping scheme such as backward


Euler.

2. Discretise the spatial derivatives using a finite differ-


ence scheme.

3. Each time step in the simulation is then taken by solving


the resulting boundary-value problem.
Example
Consider the diffusion equation with a source term
𝜕 𝑓 (𝑡, 𝒙) / 𝜕𝑡 = 𝛾 𝜕² 𝑓 (𝑡, 𝒙) / 𝜕𝑥² + 𝑔(𝑥)        (4.23)
on [𝑎, 𝑏] with boundary conditions 𝑓 (𝑡, 𝑎) = 𝛼 and 𝑓 (𝑡, 𝑏) =
𝛽. Using Eq. (3.29) and Eq. (4.21), we have
                    ⎛ ?   ?   ?   ?  ⋯   ?   ?   ?   ? ⎞
                    ⎜ 1  −2   1   0  ⋯   0   0   0   0 ⎟
                    ⎜ 0   1  −2   1  ⋯   0   0   0   0 ⎟
A = I − 𝛾Δ𝑡/Δ𝑥² ·   ⎜ ⋮   ⋮   ⋮   ⋮  ⋱   ⋮   ⋮   ⋮   ⋮ ⎟ .
                    ⎜ 0   0   0   0  ⋯   1  −2   1   0 ⎟
                    ⎜ 0   0   0   0  ⋯   0   1  −2   1 ⎟
                    ⎝ ?   ?   ?   ?  ⋯   ?   ?   ?   ? ⎠

and

𝒃 = 𝒇 (𝑡) + 𝒈Δ𝑡        (4.24)

Applying our boundary conditions we find

        ⎛      1             0             0        ⋯       0             0       ⎞
        ⎜ −𝛾Δ𝑡/Δ𝑥²   1 + 2𝛾Δ𝑡/Δ𝑥²    −𝛾Δ𝑡/Δ𝑥²       ⋯       0             0       ⎟
        ⎜      0       −𝛾Δ𝑡/Δ𝑥²    1 + 2𝛾Δ𝑡/Δ𝑥²     ⋯       0             0       ⎟
Abc =   ⎜      ⋮             ⋮             ⋮        ⋱       ⋮             ⋮       ⎟ ,
        ⎜      0             0             0        ⋯  1 + 2𝛾Δ𝑡/Δ𝑥²  −𝛾Δ𝑡/Δ𝑥²     ⎟
        ⎝      0             0             0        ⋯       0             1       ⎠

and

        ⎛            𝛼             ⎞
        ⎜ 𝑓 (𝑡, 𝑥2 ) + 𝑔(𝑥2 )Δ𝑡    ⎟
        ⎜ 𝑓 (𝑡, 𝑥3 ) + 𝑔(𝑥3 )Δ𝑡    ⎟
𝒃 bc =  ⎜            ⋮             ⎟ .
        ⎜ 𝑓 (𝑡, 𝑥𝑁−1 ) + 𝑔(𝑥𝑁−1 )Δ𝑡 ⎟
        ⎝            𝛽             ⎠
Each time step is then taken by solving for 𝒇 (𝑡 + Δ𝑡) in

Abc 𝒇 (𝑡 + Δ𝑡) = 𝒃 bc (4.25)

Note that it makes sense to use an iterative solver (such as Eq. (3.40)
or the method of conjugate gradients) to solve the linear equation,
since we have an excellent initial guess for 𝒇 (𝑡 + Δ𝑡) in the form of
𝒇 (𝑡).
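
As a sketch in Python using scipy.sparse (the values of 𝛾, Δ𝑡, the source 𝑔 and the boundary values are assumed for illustration), Abc can be built once and reused every time step:

    import numpy as np
    import scipy.sparse as sp
    import scipy.sparse.linalg as spla

    gamma, alpha, beta, dt = 1.0, 0.0, 1.0, 1e-3
    n = 101
    x = np.linspace(0.0, 1.0, n)
    dx = x[1] - x[0]
    g = np.sin(np.pi * x)                          # assumed source term

    # discretised operator D = gamma * d^2/dx^2 with the central stencil
    D = gamma / dx**2 * sp.diags([np.ones(n - 1), -2.0 * np.ones(n), np.ones(n - 1)],
                                 [-1, 0, 1])
    A = (sp.identity(n) - dt * D).tolil()
    A[0, :] = 0.0;  A[0, 0] = 1.0                  # Dirichlet rows
    A[-1, :] = 0.0; A[-1, -1] = 1.0
    A = A.tocsr()

    f = np.zeros(n)
    for step in range(1_000):
        bvec = f + g * dt
        bvec[0], bvec[-1] = alpha, beta            # boundary values
        f = spla.spsolve(A, bvec)                  # Eq. (4.25)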
If the problem is higher-dimensional, the approach is the same,
except that the boundary-value problem to be solved for each time
step is higher-dimensional. The next section is devoted to the
solution of such problems.

4.2.2 Time-independent Problems


For problems such as Eq. (4.5) of dimensions larger than one, we
need to think about how we represent the discretised version of
𝑓 (𝒙). We still intend to write the equation in the form

A 𝒇 = 𝒃, (4.26)

where 𝒇 is a vector. In the case of a two-dimensional problem,


where we discretise both 𝑥 and 𝑦 with 𝑁 grid points, 𝒇 must contain
𝑁 2 values, one value for each location {𝑥𝑖 , 𝑦 𝑗 }. Likewise A will be
an 𝑁 2 × 𝑁 2 matrix.
It is a matter of choice how to order the values5, but to make
things simple it makes sense to order them in the vector as

𝒇 = ( 𝑓 (𝑥1 , 𝑦 1 ), 𝑓 (𝑥2 , 𝑦 1 ), … , 𝑓 (𝑥 𝑁 , 𝑦 1 ),
      𝑓 (𝑥1 , 𝑦 2 ), 𝑓 (𝑥2 , 𝑦 2 ), … , 𝑓 (𝑥 𝑁 , 𝑦 2 ),
      … ,
      𝑓 (𝑥1 , 𝑦 𝑁 ), 𝑓 (𝑥2 , 𝑦 𝑁 ), … , 𝑓 (𝑥 𝑁 , 𝑦 𝑁 ) )ᵀ ,

i.e. stacking 𝑓 (𝒙, 𝑦 1 ), 𝑓 (𝒙, 𝑦 2 ), · · · , 𝑓 (𝒙, 𝑦 𝑁 ). The same idea holds


for higher dimensions.
5 We could also formulate it as a tensor equation, which would avoid the issue

of choosing an ordering. Most libraries will also provide a ‘tensorsolve’ function.


We take the above approach since it is most standard, it is in line with what is
taught in standard linear algebra courses, and allows the use of all linear algebra
functionalities provided by libraries. However, for simple cases the tensorial
approach can give code that is easier to read/maintain.
For instance, consider the problem of representing 𝜕𝑥 𝑓 (𝑥, 𝑦) as


a finite difference equation. Let us do this explicitly for a periodic
system with just 𝑁 = 4 grid points.6 We are then trying to solve for
𝒇 which contains elements that represent the function as

6 For finite differences, periodicity implies that e.g. 𝑥1 and 𝑥 𝑁 are neighbour-
ing points.

𝑓 (𝑥1 , 𝑦 1 ) 𝑓 (𝑥2 , 𝑦 1 ) 𝑓 (𝑥3 , 𝑦 1 ) 𝑓 (𝑥4 , 𝑦 1 )

𝑓 (𝑥1 , 𝑦 2 ) 𝑓 (𝑥2 , 𝑦 2 ) 𝑓 (𝑥3 , 𝑦 2 ) 𝑓 (𝑥4 , 𝑦 2 )

𝑓 (𝑥1 , 𝑦 3 ) 𝑓 (𝑥2 , 𝑦 3 ) 𝑓 (𝑥3 , 𝑦 3 ) 𝑓 (𝑥4 , 𝑦 3 )

𝑓 (𝑥1 , 𝑦 4 ) 𝑓 (𝑥2 , 𝑦 4 ) 𝑓 (𝑥3 , 𝑦 4 ) 𝑓 (𝑥4 , 𝑦 4 )


We use the central derivative scheme [Eq. (2.9)] and find 𝜕𝑥 𝒇 to be


approximated by

          ⎛  0  1  0 −1  0  0  0  0  0  0  0  0  0  0  0  0 ⎞ ⎛ 𝑓 (𝑥1 ,𝑦 1 ) ⎞
          ⎜ −1  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0 ⎟ ⎜ 𝑓 (𝑥2 ,𝑦 1 ) ⎟
          ⎜  0 −1  0  1  0  0  0  0  0  0  0  0  0  0  0  0 ⎟ ⎜ 𝑓 (𝑥3 ,𝑦 1 ) ⎟
          ⎜  1  0 −1  0  0  0  0  0  0  0  0  0  0  0  0  0 ⎟ ⎜ 𝑓 (𝑥4 ,𝑦 1 ) ⎟
          ⎜  0  0  0  0  0  1  0 −1  0  0  0  0  0  0  0  0 ⎟ ⎜ 𝑓 (𝑥1 ,𝑦 2 ) ⎟
          ⎜  0  0  0  0 −1  0  1  0  0  0  0  0  0  0  0  0 ⎟ ⎜ 𝑓 (𝑥2 ,𝑦 2 ) ⎟
          ⎜  0  0  0  0  0 −1  0  1  0  0  0  0  0  0  0  0 ⎟ ⎜ 𝑓 (𝑥3 ,𝑦 2 ) ⎟
1/(2Δ𝑥) · ⎜  0  0  0  0  1  0 −1  0  0  0  0  0  0  0  0  0 ⎟ ⎜ 𝑓 (𝑥4 ,𝑦 2 ) ⎟ .
          ⎜  0  0  0  0  0  0  0  0  0  1  0 −1  0  0  0  0 ⎟ ⎜ 𝑓 (𝑥1 ,𝑦 3 ) ⎟
          ⎜  0  0  0  0  0  0  0  0 −1  0  1  0  0  0  0  0 ⎟ ⎜ 𝑓 (𝑥2 ,𝑦 3 ) ⎟
          ⎜  0  0  0  0  0  0  0  0  0 −1  0  1  0  0  0  0 ⎟ ⎜ 𝑓 (𝑥3 ,𝑦 3 ) ⎟
          ⎜  0  0  0  0  0  0  0  0  1  0 −1  0  0  0  0  0 ⎟ ⎜ 𝑓 (𝑥4 ,𝑦 3 ) ⎟
          ⎜  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0 −1 ⎟ ⎜ 𝑓 (𝑥1 ,𝑦 4 ) ⎟
          ⎜  0  0  0  0  0  0  0  0  0  0  0  0 −1  0  1  0 ⎟ ⎜ 𝑓 (𝑥2 ,𝑦 4 ) ⎟
          ⎜  0  0  0  0  0  0  0  0  0  0  0  0  0 −1  0  1 ⎟ ⎜ 𝑓 (𝑥3 ,𝑦 4 ) ⎟
          ⎝  0  0  0  0  0  0  0  0  0  0  0  0  1  0 −1  0 ⎠ ⎝ 𝑓 (𝑥4 ,𝑦 4 ) ⎠

This is somewhat hard to read, so do not feel bad if you skip it: this
is easier to program than to read! But do focus on a single, random
row and make sure to understand that one. For instance, the second
row states 𝜕𝑥 𝑓 (𝑥2 , 𝑦 1 ) = ( 𝑓 (𝑥3 , 𝑦 1 ) − 𝑓 (𝑥1 , 𝑦 1 )) / (2Δ𝑥). Note that
the matrix would be very different if we needed 𝜕𝑦 𝑓 . Also note
that as we increase the number of grid points 𝑁, the fraction of
zeros increases. It is thus a very good idea to use a sparse matrix
representation.
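
In practice such matrices are conveniently built from Kronecker products of one-dimensional operators. A small sketch in Python with scipy.sparse, for the periodic 𝑁 = 4 example above (the grid spacing is an assumed value):

    import scipy.sparse as sp

    n, dx = 4, 1.0

    # 1D periodic central-difference matrix [Eq. (2.9)]
    C = sp.lil_matrix((n, n))
    for i in range(n):
        C[i, (i + 1) % n] = 1.0 / (2.0 * dx)
        C[i, (i - 1) % n] = -1.0 / (2.0 * dx)

    I = sp.identity(n)
    Dx = sp.kron(I, C)    # d/dx for the ordering used above (x runs fastest)
    Dy = sp.kron(C, I)    # the corresponding d/dy operator

    # Dx.toarray() reproduces the 16 x 16 matrix written out above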

For higher-dimensional PDEs we will often encounter the
Laplacian: ∇². It is useful to have a finite difference scheme handy
specifically for this operator. The standard choice is to use central
finite differences along all dimensions:
Discrete Laplace Operator


With regular grid-spacing the discrete Laplace operator is
defined as

∇² 𝑓 (𝑥𝑖 , 𝑦 𝑗 ) = (𝜕𝑥² + 𝜕𝑦²) 𝑓 (𝑥𝑖 , 𝑦 𝑗 )        (4.27)
             ≈ [ 𝑓 (𝑥𝑖−1 , 𝑦 𝑗 ) + 𝑓 (𝑥𝑖+1 , 𝑦 𝑗 )
               + 𝑓 (𝑥𝑖 , 𝑦 𝑗−1 ) + 𝑓 (𝑥𝑖 , 𝑦 𝑗+1 ) − 4 𝑓 (𝑥𝑖 , 𝑦 𝑗 ) ] / Δ𝑥²

in two dimensions.
It generalises straightforwardly to higher dimensions.

For example, ∇2 𝑓 (𝑥, 𝑦) for our periodic 𝑁 = 4 system is represented


by the finite difference expression L 𝒇 , where
L = (1/Δ𝑥²) ·        (4.28)

⎛ −4  1  0  1  1  0  0  0  0  0  0  0  1  0  0  0 ⎞
⎜  1 −4  1  0  0  1  0  0  0  0  0  0  0  1  0  0 ⎟
⎜  0  1 −4  1  0  0  1  0  0  0  0  0  0  0  1  0 ⎟
⎜  1  0  1 −4  0  0  0  1  0  0  0  0  0  0  0  1 ⎟
⎜  1  0  0  0 −4  1  0  1  1  0  0  0  0  0  0  0 ⎟
⎜  0  1  0  0  1 −4  1  0  0  1  0  0  0  0  0  0 ⎟
⎜  0  0  1  0  0  1 −4  1  0  0  1  0  0  0  0  0 ⎟
⎜  0  0  0  1  1  0  1 −4  0  0  0  1  0  0  0  0 ⎟
⎜  0  0  0  0  1  0  0  0 −4  1  0  1  1  0  0  0 ⎟
⎜  0  0  0  0  0  1  0  0  1 −4  1  0  0  1  0  0 ⎟
⎜  0  0  0  0  0  0  1  0  0  1 −4  1  0  0  1  0 ⎟
⎜  0  0  0  0  0  0  0  1  1  0  1 −4  0  0  0  1 ⎟
⎜  1  0  0  0  0  0  0  0  1  0  0  0 −4  1  0  1 ⎟
⎜  0  1  0  0  0  0  0  0  0  1  0  0  1 −4  1  0 ⎟
⎜  0  0  1  0  0  0  0  0  0  0  1  0  0  1 −4  1 ⎟
⎝  0  0  0  1  0  0  0  0  0  0  0  1  1  0  1 −4 ⎠

Again, make sure you understand just one or two rows. That should
be enough to understand the idea and enable you to write a program


that generates such a matrix for any 𝑁.

Example
Consider
∇2 𝑓 (𝑥, 𝑦) = 𝑔(𝑥, 𝑦). (4.29)
The finite difference equation is then simply

L 𝒇 = 𝒈, (4.30)

which can be solved using any linear solver after applying


boundary conditions.

Constructing the matrix A allows us to use any library for solving


the linear equation. This is typically very efficient. However, it does
take some code to build A, which sometimes you want to avoid. In
this case we can turn to relaxation methods [Eq. (3.48)]. For the
systems of equations that result from linear PDEs there is a specific
version of the relaxation method that is easy to implement:7

Jacobi Method
A linear equation
A𝒇 = 𝒃 (4.31)

7 However, ease of implementation comes at the cost of speed of convergence: the
Jacobi method typically requires many iterations.
can be solved by splitting the matrix into its diagonal and off-
diagonal parts A = D + O. Here D only contains elements on
the diagonal and O only contains elements off the diagonal.
Then until convergence do

𝒇 ← D−1 (𝒃 − O 𝒇 ) (4.32)

Note that D−1 is trivial to calculate.

The method works if it holds for all rows 𝑖 that |A𝑖𝑖 | ≥
Σ_{𝑗≠𝑖} |A𝑖 𝑗 |, i.e. that for each row in the matrix, the diagonal
element is at least as large as the sum of the rest of the
elements in that row. Further, the inequality has to be strict,
|A𝑖𝑖 | > Σ_{𝑗≠𝑖} |A𝑖 𝑗 |, in at least one row (meaning the matrix is
so-called irreducibly diagonally dominant).

The Jacobi method is a relaxation method, since it is just a simple


rewrite of A 𝒇 = (D + O) 𝒇 = 𝒃. A nice feature is that we have a
condition for convergence (although in fact the method sometimes
works even when the condition is not satisfied8). To get a feeling
for the usefulness of this method, we turn to an example:

8 The condition we have stated is sufficient for convergence, but not necessary.

The sufficient and necessary condition is that all eigenvalues of D−1 O are less
than or equal to 1, and at least one eigenvalue is strictly less than 1.
Example
Consider again

∇2 𝑓 (𝑥, 𝑦) = 𝑔(𝑥, 𝑦), (4.33)

which we will again approximate using Eq. (4.27). For the


Laplacian matrix L the condition to use the Jacobi method
holds with equality: | − 4| ≥ |1| + |1| + |1| + |1|. A Dirichlet
boundary condition [Eq. (3.22)] will make the full condition
to use the Jacobi method true (in fact this method will typi-
cally also converge for periodic systems if the initial guess is
not too terrible).

The method then requires us to iterate


𝑓 (𝑥𝑖 , 𝑦 𝑗 ) ← ¼ [ 𝑓 (𝑥𝑖+1 , 𝑦 𝑗 ) + 𝑓 (𝑥𝑖−1 , 𝑦 𝑗 )        (4.34)
              + 𝑓 (𝑥𝑖 , 𝑦 𝑗+1 ) + 𝑓 (𝑥𝑖 , 𝑦 𝑗−1 ) − 𝑔(𝑥𝑖 , 𝑦 𝑗 ) Δ𝑥² ]

where the right-hand side is evaluated for all 𝑖, 𝑗 before the


update. The updates at the boundary points depend on the
boundary conditions.

Note that for this method 𝑓 (𝑥, 𝑦) can be stored as a matrix


instead of as a vector, since we do not need to put it in
a shape suitable for library functions. This can simplify
implementation quite a lot.
Many methods are intuitive to read. This one requires a bit more
thought, so make sure you understand how to obtain Eq. (4.34)
from Eq. (4.32). This method is simpler to implement, but not as
efficient as those found in many modern linear algebra libraries.
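
A minimal sketch of Eq. (4.34) in Python with NumPy, assuming zero Dirichlet values on the boundary of the unit square and an illustrative source term 𝑔:

    import numpy as np

    n = 101
    x = np.linspace(0.0, 1.0, n)
    dx = x[1] - x[0]
    X, Y = np.meshgrid(x, x, indexing='ij')
    g = np.exp(-((X - 0.5)**2 + (Y - 0.5)**2) / 0.01)   # assumed source term

    f = np.zeros((n, n))        # the zero boundary values are simply never touched

    for sweep in range(20_000):
        f_new = f.copy()
        # Jacobi update of Eq. (4.34) for all interior points
        f_new[1:-1, 1:-1] = 0.25 * (f[2:, 1:-1] + f[:-2, 1:-1]
                                    + f[1:-1, 2:] + f[1:-1, :-2]
                                    - g[1:-1, 1:-1] * dx**2)
        if np.max(np.abs(f_new - f)) < 1e-8:
            break
        f = f_new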

4.3 Boundary Conditions


The boundary conditions for ODEs are very easy to specify, as
these will just be at either end of a domain such as [𝑎, 𝑏]. In higher
dimensions, the boundary will not simply be points, but lines for
2D PDEs, faces for 3D PDEs, and so on.
You will often find the notation used in which the region on
which we solve is denoted as Ω, and the boundary Γ.9 For an ODE,
for example, we could solve on the domain given by the interval
Ω = [𝑎, 𝑏], while the boundary is the edge points Γ = {𝑎, 𝑏}.
In two dimensions, we could illustrate it like this:

(a sketch of a two-dimensional region, the interior labelled Ω and the boundary curve labelled Γ)

To specify different boundary conditions on different parts of


the boundary, it is often divided into sections:
9 It is also standard to use 𝜕Ω instead of Γ.
(the same sketch, with the boundary divided into two sections Γ1 and Γ2 )

For instance, a boundary-value problem could be specified as

∇² 𝑓 (𝒙) = 0    for 𝒙 ∈ Ω,
 𝑓 (𝒙) = 𝛼      for 𝒙 ∈ Γ1 ,        (4.35)
 𝑓 (𝒙) = 𝛽      for 𝒙 ∈ Γ2 .


Numerically, this simply means that we have to keep track of which
of our discretised points belong to which part of the boundary.
As is clear from the example above, Dirichlet boundary conditions
are similar for PDEs as they are for ODEs. Neumann boundary
conditions are generalised as

Neumann Boundary Conditions (PDEs)


A Neumann boundary condition specifies the value of the
directional derivative along the boundary normal 𝑛ˆ of the
function at the border, e.g.:
𝜕 𝑓 /𝜕 𝑛̂ |_{𝒙=𝒂} = (∇ 𝑓 · 𝑛̂)|_{𝒙=𝒂} = 𝛼        (4.36)
Example
Consider
∇2 𝑓 (𝑥, 𝑦) = 0 (4.37)
on a square 𝑥 ∈ [𝑎 𝑥 , 𝑏 𝑥 ] and 𝑦 ∈ [𝑎 𝑦 , 𝑏 𝑦 ]. Suitable boundary
conditions could be

𝑓 (𝑥, 𝑎 𝑦 ) = 𝑓 (𝑎 𝑥 , 𝑦) = 𝛽 ,
𝜕𝑥 𝑓 (𝑏 𝑥 , 𝑦) = 𝛼1 ,        (4.38)
𝜕𝑦 𝑓 (𝑥, 𝑏 𝑦 ) = 𝛼2 .


Here we used two Neumann conditions:
(a sketch of the square domain, with 𝜕𝑦 𝑓 = 𝛼2 imposed on the top edge and 𝜕𝑥 𝑓 = 𝛼1 on the right edge)

To implement boundary conditions numerically, we recommend


simply changing rows in the matrix A to make Abc , as we have done
so far in all examples. The rows that need to be changed are those
that correspond to points on the border. If a border point has two
boundary conditions, then two points need to be chosen.
Changing rows is simple, but not the only approach. It is also
possible to first solve the problem of representing all boundary
points in terms of inner points using the boundary conditions, and


then subsequently solve the inner problem. This is often a waste of
(coding) time in trade for a small computational gain. Nonetheless,
many people implement code in this way and so we present the
approach (which is a simple linear algebra exercise). We thus
consider an already discretised equation

A 𝒇 = 𝒃, (4.39)

where we have not replaced rows using our boundary conditions.


A is an 𝑁 × 𝑁 matrix, where 𝑁 is the number of points in our grid.
Boundary conditions, be they Dirichlet or Neumann can be
written as their own system of equations

B 𝒇 = 𝜷, (4.40)

where B is an 𝑀 × 𝑁 matrix, 𝑀 being the number of boundary


conditions, the values of which are stored in 𝜷. For each boundary
condition, a corresponding point is chosen (the row we would have
replaced). The vector 𝒇 is then reordered such that these are the last
𝑀 entries, and A, B and 𝒃 are permuted similarly. We now need to
solve for the remaining 𝑅 = 𝑁 − 𝑀 points.
After the reordering, we can split Eq. (4.39) and write it as
    
⎛ A𝑅𝑅   A𝑅𝑀 ⎞ ⎛ 𝒇𝑅 ⎞   ⎛ 𝒃𝑅 ⎞
⎝ A𝑀𝑅   A𝑀𝑀 ⎠ ⎝ 𝒇𝑀 ⎠ = ⎝ 𝒃𝑀 ⎠ ,        (4.41)

from which we need the first 𝑅 equations:

A 𝑅𝑅 𝒇 𝑅 + A 𝑅𝑀 𝒇 𝑀 = 𝒃 𝑅 (4.42)
Likewise for Eq. (4.40),

( B𝑅   B𝑀 ) ( 𝒇𝑅 , 𝒇𝑀 )ᵀ = 𝜷,        (4.43)
which can be rewritten as
𝒇𝑀 = B𝑀⁻¹ ( 𝜷 − B𝑅 𝒇𝑅 ).        (4.44)
Here we have used the boundary conditions to express the chosen
𝒇 𝑀 in terms of the remaining 𝒇 𝑅 .
Using these together, we finally find
 
( A𝑅𝑅 − A𝑅𝑀 B𝑀⁻¹ B𝑅 ) 𝒇𝑅 = 𝒃𝑅 − A𝑅𝑀 B𝑀⁻¹ 𝜷        (4.45)

which is precisely 𝑅 equations for 𝑅 unknowns. Written differently,


we have derived
Abc = A𝑅𝑅 − A𝑅𝑀 B𝑀⁻¹ B𝑅 ,        (4.46)
𝒃 bc = 𝒃𝑅 − A𝑅𝑀 B𝑀⁻¹ 𝜷.        (4.47)
Notice that B 𝑅 = 0 if we only have Dirichlet boundary conditions
in which case only 𝒃 needs to be updated. In this case B−1 𝑀 is also
trivial to calculate as it will typically be equal to the identity matrix.
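
In code, Eqs. (4.46)-(4.47) amount to a few lines of linear algebra. A sketch in Python with NumPy, assuming the reordered blocks are already available as arrays:

    import numpy as np

    def apply_boundary(A_RR, A_RM, B_R, B_M, b_R, beta):
        # return A_bc and b_bc of Eqs. (4.46)-(4.47)
        BM_inv = np.linalg.inv(B_M)   # typically just the identity for Dirichlet conditions
        A_bc = A_RR - A_RM @ BM_inv @ B_R
        b_bc = b_R - A_RM @ BM_inv @ beta
        return A_bc, b_bc

    # after solving A_bc f_R = b_bc, the boundary values follow from Eq. (4.44):
    # f_M = BM_inv @ (beta - B_R @ f_R)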

4.4 Spectral Method


Spectral methods are really powerful also for PDEs. They can be
implemented for all types of boundary conditions and geometries
for which suitable analytical basis functions exist.
We will only present an example for a square domain with


periodic boundary conditions, as for these we can use Fourier series
and the fast Fourier transform. In these cases we can write our
function to be solved for as
𝑓 (𝒙) ≈ Σ_𝒌 𝑐𝒌 𝑒^{𝑖 𝒌·𝒙} .        (4.48)

For instance, in two dimensions


𝑓 (𝑥, 𝑦) ≈ Σ_{𝑘𝑥 =−𝑁}^{𝑁} Σ_{𝑘𝑦 =−𝑁}^{𝑁} 𝑐_{𝑘𝑥 ,𝑘𝑦} 𝑒^{𝑖(𝑘𝑥 𝑥 + 𝑘𝑦 𝑦)} .        (4.49)

Let us consider a concrete PDE


∇2 𝑓 (𝑥, 𝑦) + 𝜂 𝜕𝑦 𝑓 (𝑥, 𝑦) = 𝑔(𝑥, 𝑦) (4.50)
with periodic boundary conditions. After a Fourier transform we
have
−(𝑘𝑥² + 𝑘𝑦²) 𝑓̃(𝑘𝑥 , 𝑘𝑦 ) + 𝑖 𝜂 𝑘𝑦 𝑓̃(𝑘𝑥 , 𝑘𝑦 ) = 𝑔̃(𝑘𝑥 , 𝑘𝑦 )        (4.51)
˜ 𝑥, 𝑘 𝑦) (4.51)
Thus we can solve the discretised version of the PDE using the fast
Fourier transform as

𝒇 = ifft2 ( fft2(𝒈) / [ 𝑖𝜂𝒌𝑦 − ( 𝒌𝑥² + 𝒌𝑦² ) ] ) ,        (4.52)
where fft2 and ifft2 are the two-dimensional fast Fourier trans-
forms.10 Using the spectral method it is thus no harder to solve
10 We again recommend to use the real version of these if you have access to
these. They will typically be called rfft2 and irfft2.
a PDE than an ODE as presented in Sec. 3.2.3. We refer to that


section for implementation details.
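
As a sketch in Python with NumPy (domain size, grid and right-hand side are assumed; note that the operator is singular for the 𝒌 = 0 mode, which therefore has to be treated separately):

    import numpy as np

    n, eta = 128, 1.0
    L = 2.0 * np.pi                          # assumed periodic domain [0, L)^2
    x = np.arange(n) * (L / n)
    X, Y = np.meshgrid(x, x, indexing='ij')
    g = np.sin(X) * np.cos(2.0 * Y)          # assumed (zero-mean) right-hand side

    k = 2.0 * np.pi * np.fft.fftfreq(n, d=L / n)
    KX, KY = np.meshgrid(k, k, indexing='ij')

    denom = 1j * eta * KY - (KX**2 + KY**2)
    denom[0, 0] = 1.0                        # avoid dividing by zero for the k = 0 mode

    f_hat = np.fft.fft2(g) / denom           # Eq. (4.52)
    f_hat[0, 0] = 0.0                        # the mean of f is not fixed by the PDE
    f = np.real(np.fft.ifft2(f_hat))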

4.5 Finite Element Method


The finite difference method is in some sense the most “intuitive”
method for discretising a differential equation. It is not the only
method, however. If you are getting serious about numerical solu-
tions to PDEs, you will very quickly stumble into the finite element
method (FEM). FEM is slightly harder to introduce, but in the most
basic version, it is far from as difficult as many resources make it
seem. As it is very likely that you will stumble into this method
at some point we include it here despite being an introductory text.
The derivation, however, is somewhat longer than for the other
methods presented in this text. We do not recommend to imple-
ment your own version of FEM as many great tools exist. But
for using these tools, it is extremely useful to understand what is
happening under the hood.
While FEM is only really useful for PDEs, it certainly can also
be applied to ODEs. Our derivation for FEM will be done for a
2D PDE, however, it can be instructive to compare this derivation with
a 1D example. As such, we present FEM applied to an ODE on
page 93. It can be useful to read this example before a second read-
through of the derivation we present here for PDEs. The approach is
very similar but the mathematical details slightly easier for ODEs.

We begin with some motivation. So far we have not considered


PDEs on unstructured grids. Unstructured grids, such as the triangular
meshes we will use below,
can be used to model special geometry, or enable higher accuracy to


be obtained in certain regions. While finite difference schemes cer-
tainly can be derived on unstructured grids using multi-dimensional
Taylor expansions, it is not straightforward. In particular it is very
hard to formulate high-order schemes of high accuracy using finite
differences. In the finite element method, unstructured grids are
as easy to handle as regular grids. And obtaining higher-order
schemes is a lot more straightforward.

We are only going to present one of the simplest versions of


FEM, and will only work with one example equation, namely Pois-
son’s equation
∇2 𝑓 (𝑥, 𝑦) = 𝑔(𝑥, 𝑦) (4.53)
on a triangular mesh (i.e. on a grid made up of triangles). We use


the notation in which the region on which we solve is denoted Ω and
its boundary Γ. We will discuss how to implement both Dirichlet
and Neumann type boundary conditions.
In some sense, the finite element method can be thought of as
being a spectral method inside each triangle that is stitched together
to find the total solution. In this sense, we can get some of the
advantages of spectral methods, while having the possibility of
local variations.
Consider a random triangle of our mesh:

(a triangle with corners labelled (𝑥1 , 𝑦 1 ), (𝑥2 , 𝑦 2 ) and (𝑥3 , 𝑦 3 ))

If we approximate, as we will, the function 𝑓 to be linear inside


each triangle, then the function is fully specified if we know the
values of 𝑓 (𝑥, 𝑦) at the triangle corners. For the above triangle, we
thus simply need to solve for 𝑓1 = 𝑓 (𝑥 1 , 𝑦 1 ), 𝑓2 = 𝑓 (𝑥2 , 𝑦 2 ), and
𝑓3 = 𝑓 (𝑥 3 , 𝑦 3 ) to have its value everywhere inside the triangle.
We write the total function 𝑓 as a sum of linear functions
𝜙𝑖 (𝑥, 𝑦):

𝑓 (𝑥, 𝑦) = Σ𝑖 𝑓𝑖 𝜙𝑖 (𝑥, 𝑦).        (4.54)
We have not yet specified what 𝜙𝑖 (𝑥, 𝑦) are, other than to say that
they are linear inside each triangle. Here comes the first of two
crucial ideas that make up the finite element method: We choose


the basis functions {𝜙𝑖 (𝑥, 𝑦)} such that they are equal to zero on all
points of our mesh, except at a single point on which we require
it to be equal to one (in the context of unstructured meshes, the
points of the mesh are typically referred to as nodes). In this way
we can associate each basis function to a specific point/node: The
basis function 𝜙𝑖 belongs to node (𝑥𝑖 , 𝑦𝑖 ), and this is the only node
on which it is not equal to zero. From this follows that Eq. (4.54)
is the correct expansion if simply 𝑓𝑖 is the value 𝑓 (𝑥𝑖 , 𝑦𝑖 ), i.e. the
value of 𝑓 at the node corresponding to 𝜙𝑖 .
How do we calculate these 𝜙𝑖 (𝑥, 𝑦)? We know they have to look
like this:

Inside each triangle it is linear, and it goes from one to zero as we


move away from the node to which it belongs. It is zero everywhere
else. This is not too hard! Inside a triangle that has node 𝑖 as a
corner we must be able to write

𝜙𝑖 (𝑥, 𝑦) = 𝛼1 + 𝛼2 𝑥 + 𝛼3 𝑦. (4.55)
To find the values of the 𝛼’s we solve


⎛ 1  𝑥₁ⁱ  𝑦₁ⁱ ⎞ ⎛ 𝛼1 ⎞   ⎛ 1 ⎞
⎜ 1  𝑥₂ⁱ  𝑦₂ⁱ ⎟ ⎜ 𝛼2 ⎟ = ⎜ 0 ⎟ ,        (4.56)
⎝ 1  𝑥₃ⁱ  𝑦₃ⁱ ⎠ ⎝ 𝛼3 ⎠   ⎝ 0 ⎠
where (𝑥₁ⁱ , 𝑦₁ⁱ ) is the node that 𝜙𝑖 belongs to and (𝑥₂ⁱ , 𝑦₂ⁱ ) and (𝑥₃ⁱ , 𝑦₃ⁱ )
are the other corners of the triangle. This equation exactly enforces
that 𝜙𝑖 (𝑥 1𝑖 , 𝑦𝑖1 ) = 1 and that it is equal to zero on the other corners.
Solving the equation we find

⎛ 𝛼1 ⎞         ⎛ 𝑥₂ⁱ 𝑦₃ⁱ − 𝑥₃ⁱ 𝑦₂ⁱ ⎞
⎜ 𝛼2 ⎟ = 1/𝐴 · ⎜    𝑦₂ⁱ − 𝑦₃ⁱ     ⎟        (4.57)
⎝ 𝛼3 ⎠         ⎝    𝑥₃ⁱ − 𝑥₂ⁱ     ⎠

where

      │ 1  𝑥₁ⁱ  𝑦₁ⁱ │
𝐴 =  │ 1  𝑥₂ⁱ  𝑦₂ⁱ │ = 𝑥₁ⁱ 𝑦₂ⁱ − 𝑥₁ⁱ 𝑦₃ⁱ − 𝑥₂ⁱ 𝑦₁ⁱ + 𝑥₂ⁱ 𝑦₃ⁱ + 𝑥₃ⁱ 𝑦₁ⁱ − 𝑥₃ⁱ 𝑦₂ⁱ
      │ 1  𝑥₃ⁱ  𝑦₃ⁱ │

is the determinant of the matrix (this equals twice the area of the trian-
gle). Our basis function therefore evaluates to
𝜙𝑖 (𝑥, 𝑦) = 1/𝐴 [ (𝑥₂ⁱ 𝑦₃ⁱ − 𝑥₃ⁱ 𝑦₂ⁱ ) + (𝑦₂ⁱ − 𝑦₃ⁱ ) 𝑥 + (𝑥₃ⁱ − 𝑥₂ⁱ ) 𝑦 ]        (4.58)
inside this specific triangle. Note that 𝜙𝑖 will have different formulas
inside different triangles and is non-zero only inside triangles that
have node 𝑖 as a corner.
You might be wondering how we are going to deal with the


second-order derivative of Eq. (4.53) when we are representing 𝑓
as a piecewise linear function. This is a good point! Indeed second
derivatives of a linear function are equal to zero everywhere. This
problem is solved with the second trick of the finite element method:
Multiply Eq. (4.53) by a function ℎ(𝑥, 𝑦) and integrate over the
entire domain to obtain
∫_Ω [ ∇² 𝑓 (𝑥, 𝑦) ] ℎ(𝑥, 𝑦) dΩ = ∫_Ω 𝑔(𝑥, 𝑦) ℎ(𝑥, 𝑦) dΩ.        (4.59)

It should be pretty clear that the above equation holds for any choice
of ℎ if Eq. (4.53) holds, but why this is a useful thing to consider
might be less obvious. In fact, if we require Eq. (4.59) to hold
for any choice ℎ, then Eq. (4.53) and Eq. (4.59) are exactly
equivalent!11 This integral version [Eq. (4.59)] is called the weak
formulation of the differential equation and is the one used in FEM.
The function ℎ is called a test function.
By integrating by parts, we obtain
− ∫_Ω ∇ 𝑓 · ∇ℎ dΩ + ∫_Γ ℎ ∇ 𝑓 · 𝑛̂ dΓ = ∫_Ω 𝑔 ℎ dΩ,        (4.60)

where 𝑛ˆ is the unit normal vector to the boundary of our domain12.


Neumann boundary conditions precisely specify the value of
∇ 𝑓 · 𝑛ˆ on the boundary. If we have any such boundary conditions
11 We are not being very precise with defining which function spaces 𝑓 and ℎ
belong to. But rest assured that the statement can be made precise.
12 ∫_Ω is an integral over our domain and ∫_Γ is an integral along the boundary
of our domain.
we use them in the above expression. For boundaries where we


have Dirichlet boundary conditions, the boundary integral terms
will become irrelevant. Here we will assume that we either have
only Dirichlet boundary conditions, or that our Neumann boundary
conditions are of the form ∇ 𝑓 · 𝑛ˆ = 0. This leaves us finally with
the equation to be satisfied
− ∫_Ω ∇ 𝑓 · ∇ℎ dΩ = ∫_Ω 𝑔 ℎ dΩ    for all ℎ.        (4.61)

It should now be clear how we can use linear functions to


approximate 𝑓 : we no longer have to deal with a second derivative.
On our linearly approximated 𝑓 (𝑥, 𝑦) = Σ𝑖 𝑓𝑖 𝜙𝑖 (𝑥, 𝑦), the equation
becomes

− Σ𝑖 𝑓𝑖 ∫_Ω ∇𝜙𝑖 · ∇ℎ dΩ = ∫_Ω 𝑔 ℎ dΩ    for all ℎ.        (4.62)

To deal with the requirement ‘for all ℎ‘, we will also expand ℎ
in our basis functions:13
ℎ(𝑥, 𝑦) = Σ𝑖 ℎ𝑖 𝜙𝑖 (𝑥, 𝑦).        (4.63)

With these linear approximations we can restate Eq. (4.62) as


− Σ𝑖 𝑓𝑖 ∫_Ω ∇𝜙𝑖 · ∇𝜙 𝑗 dΩ = ∫_Ω 𝑔 𝜙 𝑗 dΩ    for all 𝑗,        (4.64)
13 This is called the Galerkin method.
since if the equation holds for all basis functions, it will also hold
for all combinations of them and therefore for any ℎ.
We now simply need to perform the integrals, which only depend
on quantities that are known, and then we have a linear equation for
{ 𝑓𝑖 } that can be solved by our usual approach.

You now know all of the main ingredients of the finite element
method. What is left is the mathematics of carrying out the in-
tegrals. In principle, you could simply do numerical integration,
but we can also do it analytically, which is naturally better. Let us
explicitly evaluate the left-hand side of Eq. (4.64) for our linear
basis function.
Clearly

∫_Ω ∇𝜙𝑖 · ∇𝜙 𝑗 dΩ = 0        (4.65)
if node 𝑖 and node 𝑗 do not share a triangle. If they do share a
triangle, we have
𝜙𝑖 (𝑥, 𝑦) = 1/𝐴 [ (𝑥₂ⁱ 𝑦₃ⁱ − 𝑥₃ⁱ 𝑦₂ⁱ ) + (𝑦₂ⁱ − 𝑦₃ⁱ ) 𝑥 + (𝑥₃ⁱ − 𝑥₂ⁱ ) 𝑦 ] ,
𝜙 𝑗 (𝑥, 𝑦) = 1/𝐴 [ (𝑥₂ʲ 𝑦₃ʲ − 𝑥₃ʲ 𝑦₂ʲ ) + (𝑦₂ʲ − 𝑦₃ʲ ) 𝑥 + (𝑥₃ʲ − 𝑥₂ʲ ) 𝑦 ]
inside the shared triangle. Note that with our slightly cumbersome
notation for the points, our formula is valid for both the case when
𝑖 = 𝑗 and 𝑖 ≠ 𝑗.
Thus within the triangle
∇𝜙𝑖 · ∇𝜙 𝑗 = 1/𝐴² [ (𝑦₂ⁱ − 𝑦₃ⁱ )(𝑦₂ʲ − 𝑦₃ʲ ) + (𝑥₃ⁱ − 𝑥₂ⁱ )(𝑥₃ʲ − 𝑥₂ʲ ) ]
is constant and we find



∫_Ω ∇𝜙𝑖 · ∇𝜙 𝑗 dΩ = 1/(2𝐴) [ (𝑦₂ⁱ − 𝑦₃ⁱ )(𝑦₂ʲ − 𝑦₃ʲ ) + (𝑥₃ⁱ − 𝑥₂ⁱ )(𝑥₃ʲ − 𝑥₂ʲ ) ]        (4.66)
since the integral of a constant over a triangle is simply equal to the
constant multiplied by the triangle’s area (= 𝐴/2 in our notation).
More complicated terms can arise depending on the PDE under
consideration, but the integrals should nonetheless be fairly simple
to carry out.
How to evaluate the right-hand side depends on 𝑔(𝑥, 𝑦). Numer-
ical integration could be done directly on 𝑔, or one could expand 𝑔
as well in the basis functions.
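
To make the assembly concrete, here is a small sketch in Python with NumPy that loops over the triangles of a mesh (the arrays of node coordinates and corner indices are assumed given) and adds the contribution of Eq. (4.66) to a global matrix:

    import numpy as np

    def assemble_stiffness(points, triangles):
        # points: (n_nodes, 2) array, triangles: (n_triangles, 3) array of node indices
        n = len(points)
        A = np.zeros((n, n))
        for tri in triangles:
            (x1, y1), (x2, y2), (x3, y3) = points[tri]
            detA = x1*y2 - x1*y3 - x2*y1 + x2*y3 + x3*y1 - x3*y2   # twice the (signed) area
            # gradients of the three local basis functions, cf. Eq. (4.58)
            grads = np.array([[y2 - y3, x3 - x2],
                              [y3 - y1, x1 - x3],
                              [y1 - y2, x2 - x1]]) / detA
            area = 0.5 * abs(detA)
            for a in range(3):
                for b in range(3):
                    # integral of grad(phi_a) . grad(phi_b) over this triangle, Eq. (4.66)
                    A[tri[a], tri[b]] += area * np.dot(grads[a], grads[b])
        return A

Boundary conditions and the right-hand side would then be applied to this matrix exactly as in the finite difference method.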
To summarise:

Finite Element Method

1. Rewrite the PDE in the weak formulation using a test


function ℎ

2. Choose basis functions {𝜙𝑖 }


Í
3. Expand 𝑓 = 𝑖 𝑓𝑖 𝜙𝑖 and find an equation for each 𝑗 by
taking ℎ = 𝜙 𝑗 in the weak formulation

4. Apply Neumann boundary conditions for the boundary


integrals.

5. Evaluate all integrals to obtain a linear equation A 𝒇 =


𝒃
6. Apply Dirichlet boundary conditions changing the


equation to Abc 𝒇 = 𝒃 bc

7. Solve the linear equation to obtain 𝒇 =


( 𝑓1 , 𝑓2 , · · · , 𝑓 𝑁 ) which finally gives the solution
Í
𝑓 (𝒙) = 𝑖 𝑓𝑖 𝜙𝑖 (𝒙)

This was a lot of effort to describe the method. To really get


a feeling for what we have developed, we give an example that
illustrates all steps:

Example
Consider Laplace’s equation

∇2 𝑓 (𝑥, 𝑦) = 0 (4.67)

on the following 16-point unstructured mesh:


(a roughly circular triangular mesh: nodes 1–8 lie on the outer boundary, nodes 9–15 form an inner ring, and node 16 sits at the centre)
where grey colour indicates boundary nodes and edges.
We take the following boundary conditions:

𝑓 (𝑥, 𝑦) = 𝛼        on edge 1–2,
𝑓 (𝑥, 𝑦) = 𝛽        on edge 6–7,        (4.68)
∇ 𝑓 (𝑥, 𝑦) · 𝑛̂ = 0   on the remaining boundary.
Physically the solution of this boundary-value problem cor-
responds to the steady state temperature profile of the con-
sidered region, where the edge between node 1 and 2 is kept
at temperature 𝛼, and the edge between node 6 and 7 is kept
at temperature 𝛽, and no heat can escape through the rest of
the boundary.
Taking 𝑔 = 0 in Eq. (4.64), we write our equation as
A 𝒇 = 0, (4.69)
where for instance



A9,15 = ∫_Ω ∇𝜙15 · ∇𝜙9 dΩ
      = ∫_{𝑇9,15,16} ∇𝜙15 · ∇𝜙9 dΩ + ∫_{𝑇1,9,15} ∇𝜙15 · ∇𝜙9 dΩ
      = 1/(2𝐴9,15,16 ) [(𝑦 9 − 𝑦 16 )(𝑦 15 − 𝑦 16 ) + (𝑥 16 − 𝑥 9 )(𝑥 16 − 𝑥 15 )]
      + 1/(2𝐴1,9,15 ) [(𝑦 9 − 𝑦 1 )(𝑦 15 − 𝑦 1 ) + (𝑥 1 − 𝑥 9 )(𝑥 1 − 𝑥 15 )]

using Eq. (4.66). Here 𝑇𝑖, 𝑗,𝑘 refers to a specific triangle,


and 𝐴𝑖, 𝑗,𝑘 is the associated determinant. Note that there are
two terms because node 9 and 15 share exactly two triangles.
Likewise, for example, A16,16 will have seven terms, one for
each triangle.

Our Neumann boundary condition ∇ 𝑓 (𝑥, 𝑦) · 𝑛̂ = 0 is automati-


cally applied at all boundaries due to the partial integration [Eq.
(4.60)]. Had we instead had ∇ 𝑓 (𝑥, 𝑦) · 𝑛̂ ≠ 0, we would have
extra integrals to do. The Dirichlet boundary conditions
are applied in the same manner as for the finite difference
method.
Evaluating all integrals for our specific mesh we obtain


        ⎛  1      0      0      0      0    ⋯    0      0      0      0    ⎞
        ⎜  0      1      0      0      0    ⋯    0      0      0      0    ⎟
        ⎜  0    −0.02   1.42   0.01    0    ⋯    0      0      0      0    ⎟
        ⎜  0      0     0.01   1.39   0.01  ⋯    0      0      0      0    ⎟
Abc =   ⎜  ⋮      ⋮      ⋮      ⋮      ⋮         ⋮      ⋮      ⋮      ⋮    ⎟ ,
        ⎜  0    −1.13  −0.49    0      0    ⋯    0      0      0    −0.48  ⎟
        ⎜  0      0    −0.93  −0.68    0    ⋯    0      0      0    −0.48  ⎟
        ⎜  0      0      0    −0.73  −0.88  ⋯  −0.82    0      0    −0.48  ⎟
        ⎜  0      0      0      0    −0.53  ⋯   3.79  −0.88    0    −0.48  ⎟
        ⎜  0      0      0      0      0    ⋯  −0.88   3.95  −0.98  −0.48  ⎟
        ⎜ −0.08   0      0      0      0    ⋯    0    −0.98   4.10  −0.48  ⎟
        ⎝  0      0      0      0      0    ⋯  −0.48  −0.48  −0.48   3.37  ⎠

where on row 1, 2, 6, 7 we have applied Dirichlet condi-


tions. All rows sum to zero except those with Dirich-
let conditions applied. Furthermore, we have 𝒃 bc =
(𝛼, 𝛼, 0, 0, 0, 𝛽, 𝛽, 0, 0, 0, 0, 0, 0, 0, 0, 0)𝑇 .
Solving the linear problem

Abc 𝒇 = 𝒃 bc (4.70)

yields the solution at all nodes, which can then be interpolated


using our linear basis functions.

We have only discussed the two-dimensional finite element


method on triangles. The method can, naturally, be formulated
in any dimension, and using more complicated meshes than trian-
gular ones. The simplest extension to the method presented is to
consider quadratic functions on triangular meshes. In this case, on
each triangle six points are solved for in order to fix a quadratic
polynomial14 on the triangle. These are typically chosen as the three
corners and the three edge midpoints of the triangle.15
This is called the “P2-element” (or Lagrange Element of order 2),


whereas we have presented FEM on the “P1-element” (Lagrange
Element of order 1). In three dimensions, the simplest extension is
linear functions on tetrahedral meshes. Modern FEM libraries will
have all these versions, and many more, preimplemented.
As mentioned in the introduction, naturally, FEM can also be
applied to ODEs, although their use case here is somewhat limited,
as it is easy to obtain the same discrete formulations using the
method of finite differences. Nonetheless we finish with an example
illustrating FEM used to solve an ODE:

14 𝜙𝑖 (𝑥, 𝑦) = 𝛼1 + 𝛼2 𝑥 + 𝛼3 𝑦 + 𝛼4 𝑥² + 𝛼5 𝑦² + 𝛼6 𝑥𝑦
15 We have to put three nodes on each edge in order to make the one-
dimensional quadratic polynomials well-defined on the edge, which is shared
between two triangles. Some element types will have nodes inside the triangles
as well (e.g. P3-elements, which use cubic polynomials, will have a single node
inside the triangles and four on each edge). Elements can also be defined that do
not share any nodes between neighbouring triangles/cells. This can be useful if
the solution is expected to have discontinuities.
Example
Consider
𝜕² 𝑓 (𝑥) / 𝜕𝑥² = 𝛾        (4.71)

for 𝑥 ∈ [𝑎, 𝑏] with 𝑓 (𝑎) = 𝛼 and 𝑓 ′(𝑏) = 𝛽, where 𝛾 is a constant.
Multiplying by ℎ(𝑥) and integrating by parts, our weak for-
mulation becomes

− ∫_𝑎^𝑏 𝑓 ′(𝑥) ℎ′(𝑥) d𝑥 + [ 𝑓 ′(𝑥)ℎ(𝑥)]_𝑎^𝑏 = ∫_𝑎^𝑏 𝛾 ℎ(𝑥) d𝑥    for all ℎ.

We will solve this on a grid [𝑥1 , 𝑥2 , · · · , 𝑥 𝑁 ] using piece-
wise linear basis functions {𝜙𝑖 (𝑥)}. We again choose these
such that they are equal to 1 on their associated node and 0
elsewhere. Thus for instance, 𝜙𝑖 (𝑥) looks like a tent: zero up
to 𝑥𝑖−1 , rising linearly to 1 at 𝑥𝑖 , and falling back to zero at 𝑥𝑖+1 ,
which can be written as
          ⎧ (𝑥 − 𝑥𝑖−1 )/(𝑥𝑖 − 𝑥𝑖−1 )     𝑥𝑖−1 < 𝑥 ≤ 𝑥𝑖  and  𝑖 > 1
𝜙𝑖 (𝑥) =  ⎨ (𝑥𝑖+1 − 𝑥)/(𝑥𝑖+1 − 𝑥𝑖 )      𝑥𝑖 < 𝑥 ≤ 𝑥𝑖+1  and  𝑖 < 𝑁        (4.72)
          ⎩ 0                            elsewhere.

Writing 𝑓 (𝑥) = Σ𝑖 𝑓𝑖 𝜙𝑖 (𝑥) and ℎ(𝑥) = 𝜙 𝑗 (𝑥), the equations
to be satisfied for all 𝑗 are

                                  ⎧ ?                            𝑗 = 1
− Σ𝑖 𝑓𝑖 ∫_𝑎^𝑏 𝜙𝑖′(𝑥) 𝜙𝑗′(𝑥) d𝑥 =  ⎨ ½ (𝑥 𝑗+1 − 𝑥 𝑗−1 ) 𝛾          1 < 𝑗 < 𝑁
                                  ⎩ 𝛽                            𝑗 = 𝑁.

where we used 𝑓 ′(𝑏) = 𝛽, 𝜙𝑁 (𝑏) = 1 and

∫_𝑎^𝑏 𝛾 𝜙 𝑗 (𝑥) d𝑥 = ½ (𝑥 𝑗+1 − 𝑥 𝑗−1 ) 𝛾        (4.73)
for 1 < 𝑗 < 𝑁. We have a question mark for 𝑗 = 1, because
we do not have the value for 𝑓 ′(𝑎), but this does not matter,
as the 𝑗 = 1 row will be replaced with the boundary condition
𝑓 (𝑎) = 𝛼.
Evaluating the integral on the left-hand side we find for 𝑖 = 𝑗:

∫_𝑎^𝑏 𝜙𝑖′(𝑥) 𝜙𝑖′(𝑥) d𝑥 = (1 − 𝛿1𝑖 ) ∫_{𝑥𝑖−1}^{𝑥𝑖 } [ 1/(𝑥𝑖 − 𝑥𝑖−1 ) ]² d𝑥
                        + (1 − 𝛿𝑁𝑖 ) ∫_{𝑥𝑖 }^{𝑥𝑖+1 } [ 1/(𝑥𝑖+1 − 𝑥𝑖 ) ]² d𝑥
                      = (1 − 𝛿1,𝑖 )/(𝑥𝑖 − 𝑥𝑖−1 ) + (1 − 𝛿𝑁,𝑖 )/(𝑥𝑖+1 − 𝑥𝑖 ) ,        (4.74)
where we used the Kronecker delta to also make our expres-
sions valid at the boundaries. Likewise, for 𝑗 = 𝑖 + 1 we
find

∫_𝑎^𝑏 𝜙𝑖′(𝑥) 𝜙𝑖+1′(𝑥) d𝑥 = (𝛿𝑁,𝑖 − 1)/(𝑥𝑖+1 − 𝑥𝑖 ) .        (4.75)
All other integrals are equal to zero. We can now construct
the matrix A which has entries A𝑖 𝑗 = − ∫_𝑎^𝑏 𝜙𝑖′(𝑥) 𝜙𝑗′(𝑥) d𝑥,
apply our boundary conditions and finally solve Abc 𝒇 = 𝒃 bc
as usual.

On a regular grid with 𝑥𝑖+1 − 𝑥𝑖 = Δ𝑥, the matrix evaluates to

⎛  1       0       0       0     ⋯     0       0       0       0    ⎞         ⎛  𝛼   ⎞
⎜ 1/Δ𝑥  −2/Δ𝑥    1/Δ𝑥      0     ⋯     0       0       0       0    ⎟         ⎜ 𝛾Δ𝑥 ⎟
⎜  0     1/Δ𝑥   −2/Δ𝑥    1/Δ𝑥    ⋯     0       0       0       0    ⎟         ⎜ 𝛾Δ𝑥 ⎟
⎜  ⋮       ⋮       ⋮       ⋮     ⋱     ⋮       ⋮       ⋮       ⋮    ⎟  𝒇  =  ⎜  ⋮   ⎟ ,
⎜  0       0       0       0     ⋯   1/Δ𝑥   −2/Δ𝑥    1/Δ𝑥      0    ⎟         ⎜ 𝛾Δ𝑥 ⎟
⎜  0       0       0       0     ⋯     0     1/Δ𝑥   −2/Δ𝑥    1/Δ𝑥   ⎟         ⎜ 𝛾Δ𝑥 ⎟
⎝  0       0       0       0     ⋯     0       0    −1/Δ𝑥    1/Δ𝑥   ⎠         ⎝  𝛽   ⎠

which is exactly what we found using finite differences if we


divide the inner rows by Δ𝑥. We replaced the 𝑗 = 1 row
with the Dirichlet boundary conditions, which got rid of our
question mark.
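
A sketch of this assembly in Python with NumPy, valid also on a non-uniform grid, with the first and last rows replaced by the boundary conditions exactly as in the matrix above:

    import numpy as np

    def fem_poisson_1d(x, gamma, alpha, beta):
        # solve f''(x) = gamma with f(x[0]) = alpha and f'(x[-1]) = beta
        n = len(x)
        A = np.zeros((n, n))
        b = np.zeros(n)
        for i in range(n - 1):                    # loop over elements [x_i, x_{i+1}]
            h = x[i + 1] - x[i]
            # A_ij = -integral of phi_i'(x) phi_j'(x) dx, cf. Eqs. (4.74)-(4.75)
            A[i, i] += -1.0 / h
            A[i + 1, i + 1] += -1.0 / h
            A[i, i + 1] += 1.0 / h
            A[i + 1, i] += 1.0 / h
            # integral of gamma * phi_j(x) dx over the element
            b[i] += 0.5 * gamma * h
            b[i + 1] += 0.5 * gamma * h
        A[0, :] = 0.0;  A[0, 0] = 1.0;  b[0] = alpha        # Dirichlet condition at x = a
        h = x[-1] - x[-2]                                   # Neumann condition at x = b
        A[-1, :] = 0.0;  A[-1, -2] = -1.0 / h;  A[-1, -1] = 1.0 / h;  b[-1] = beta
        return np.linalg.solve(A, b)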
4.6 Non-linear Problems


The same exact ideas as presented for ODEs in Sec. 3.2.4 also apply
for non-linear PDE problems. For instance, relaxation methods can
readily be applied.

For some time-dependent problems, one can also apply a semi-


implicit approach. Consider for instance the PDE
𝜕 𝑓 /𝜕𝑡 = 𝑁 ( 𝑓 ) ∇² 𝑓 ,        (4.76)
where 𝑁 ( 𝑓 ) is some term that renders the equation non-linear, such
as 𝑁 ( 𝑓 ) = 𝑓 or 𝑁 ( 𝑓 ) = sin( 𝑓 ). A fully implicit scheme for this
equation is given by

𝑓 (𝑡 + Δ𝑡, 𝒙) = 𝑓 (𝑡, 𝒙) + 𝑁 ( 𝑓 (𝑡 + Δ𝑡, 𝒙)) ∇² 𝑓 (𝑡 + Δ𝑡, 𝒙) Δ𝑡.        (4.77)

This is a non-linear equation for 𝑓 (𝑡 + Δ𝑡, 𝒙), and requires e.g.


relaxation methods to be solved.
A semi-implicit approach is given by

𝑓 (𝑡 + Δ𝑡, 𝒙) = 𝑓 (𝑡, 𝒙) + 𝑁 ( 𝑓 (𝑡, 𝒙)) ∇² 𝑓 (𝑡 + Δ𝑡, 𝒙) Δ𝑡.        (4.78)

Now the non-linearity only depends on 𝑓 (𝑡, 𝒙) and so the equation


is linear in 𝑓 (𝑡 + Δ𝑡, 𝒙). The approach is called semi-implicit for
the obvious reason that it is a mix of explicit and implicit terms, i.e.
not fully implicit.
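
A sketch of one such semi-implicit time step in Python using scipy.sparse (one spatial dimension, the illustrative choice 𝑁 ( 𝑓 ) = 𝑓 , and boundary values simply held fixed):

    import numpy as np
    import scipy.sparse as sp
    import scipy.sparse.linalg as spla

    n, dt = 101, 1e-3
    x = np.linspace(0.0, 1.0, n)
    dx = x[1] - x[0]

    # 1D Laplacian with the central stencil
    lap = sp.diags([np.ones(n - 1), -2.0 * np.ones(n), np.ones(n - 1)], [-1, 0, 1]) / dx**2

    f = 1.0 + np.sin(np.pi * x)              # assumed initial condition

    for step in range(100):
        # (I - dt * diag(N(f(t))) * lap) f(t + dt) = f(t), cf. Eq. (4.78)
        A = (sp.identity(n) - dt * sp.diags(f) @ lap).tolil()
        A[0, :] = 0.0;  A[0, 0] = 1.0        # keep boundary values fixed
        A[-1, :] = 0.0; A[-1, -1] = 1.0
        f = spla.spsolve(A.tocsr(), f)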
We end the next section with an example of a semi-implicit
approach.
4.7 Operator Splitting


Consider a PDE
𝜕 𝑓 /𝜕𝑡 = D1 𝑓 + D2 𝑓 ,        (4.79)
where D1 and D2 are two spatial differential operators. An implicit
scheme to solve this equation would be

𝑓 (𝑡 + Δ𝑡, 𝒙) = 𝑓 (𝑡, 𝒙) + 𝐷 1 𝑓 (𝑡 + Δ𝑡, 𝒙)Δ𝑡 + 𝐷 2 𝑓 (𝑡 + Δ𝑡, 𝒙)Δ𝑡,

where 𝐷 1 and 𝐷 2 are the discretised versions of the differential


operators. Operator splitting is an approach in which we solve the
problem for each differential operator separately. In the present
example, we could first solve

𝑓 ∗ (𝑡 + Δ𝑡, 𝒙) = 𝑓 (𝑡, 𝒙) + 𝐷 1 𝑓 ∗ (𝑡 + Δ𝑡, 𝒙)Δ𝑡, (4.80)

for 𝑓 ∗ (𝑡 + Δ𝑡, 𝒙) and then subsequently solve

𝑓 (𝑡 + Δ𝑡, 𝒙) = 𝑓 ∗ (𝑡 + Δ𝑡, 𝒙) + 𝐷 2 𝑓 (𝑡 + Δ𝑡, 𝒙)Δ𝑡 (4.81)

to complete the time step.


If we consider the right-hand side of the equation as physical
forces, it is easy to get intuition for why this approach works: we
simply first apply the effect of the first force and then apply the effect
of the second force: the result will be approximately the same if
Δ𝑡 is small. When the forces applied are independent (in the sense
that they commute) the method is exact. The order of applying
the operators can be further intermixed by taking two half-Δ𝑡 time
steps in turn for each operator. This is called Strang Splitting and
will improve accuracy.
Operator splitting is particularly useful when solving systems
of PDEs that depend on one another as it allows us to solve the time
step of each equation independently of the others.

We finish this chapter with an example that showcases operator


splitting as well as many of the approaches we have discussed in this
chapter. In particular, we will demonstrate one approach to solving
the Navier–Stokes equations. This text is introductory, and the
Navier–Stokes equations are renowned for being difficult to solve
numerically. The method we present will work in many cases, but
obviously is not fit for all applications.

Example
Consider the incompressible Navier–Stokes equations
𝜕𝒖/𝜕𝑡 = −(𝒖 · ∇)𝒖 + 𝜈∇²𝒖 − ∇𝑝,        (4.82)
∇ · 𝒖 = 0.

Here, 𝒖 is a velocity field and 𝑝 a pressure field to be solved


for. 𝜈 is the viscosity constant. In two spatial dimensions,
where 𝒖 = (𝑢 𝑥 , 𝑢 𝑦 ), this is a system of PDEs for three quan-
tities: 𝑢 𝑥 (𝑡, 𝑥, 𝑦), 𝑢 𝑦 (𝑡, 𝑥, 𝑦), and 𝑝(𝑡, 𝑥, 𝑦).
We could solve for all these quantities simultaneously, but this
is quite tedious. Instead we will employ operator splitting
and semi-implicit time stepping which will allow us to solve


for each quantity independently.

We employ operator splitting in which we start by ignoring


the ∇𝑝 term. The tentative time step is then performed by
taking a semi-implicit approach, e.g.

𝑢𝑥∗ (𝑡 + Δ𝑡) = 𝑢𝑥 (𝑡) + [ −(𝒖(𝑡) · ∇) 𝑢𝑥∗ (𝑡 + Δ𝑡) + 𝜈∇² 𝑢𝑥∗ (𝑡 + Δ𝑡) ] Δ𝑡        (4.83)
and likewise for 𝑢 ∗𝑦 . Note that (𝒖(𝑡) · ∇) is simply a linear
differential operator as 𝒖(𝑡) is known. It is thus no different
than what we considered e.g. in Eq. (3.33).
We already know how to solve the PDE of Eq. (4.83), and
we can use any method we wish: finite differences, spectral
or finite element.

We now need to apply ∇𝑝 in the second part of our operator


splitting step. Thus we need to evaluate e.g.
𝑢 𝑥 (𝑡 + Δ𝑡) = 𝑢 ∗𝑥 (𝑡 + Δ𝑡) − 𝜕𝑥 𝑝(𝑡 + Δ𝑡)Δ𝑡. (4.84)
However, we do not yet know 𝑝(𝑡 + Δ𝑡). This is set by the
second part of Eq. (4.82).
If we take Eq. (4.84) and the corresponding equation for
𝑢𝑦 , we can insert 𝒖(𝑡 + Δ𝑡) = ( 𝑢𝑥 (𝑡 + Δ𝑡), 𝑢𝑦 (𝑡 + Δ𝑡) ) into
∇ · 𝒖(𝑡 + Δ𝑡) = 0 to obtain
𝜕𝑥 𝑢 ∗𝑥 (𝑡 + Δ𝑡) + 𝜕𝑦 𝑢 ∗𝑦 (𝑡 + Δ𝑡) = ∇2 𝑝(𝑡 + Δ𝑡)Δ𝑡, (4.85)
where we used ∇ · ∇ = ∇2 . But at this stage 𝑢 ∗𝑥 and 𝑢 ∗𝑦 are


known and so this is simply a Poisson equation for 𝑝(𝑡 + Δ𝑡)
which we know well how to solve.

After solving for 𝑝(𝑡 + Δ𝑡), we insert the solution into Eq.
(4.84) and finally evaluate 𝒖(𝑡 + Δ𝑡). In this way each time
step of the Navier–Stokes equation is taken by solving three
equations: one for a tentative version of the velocity field
𝒖 ∗ , one for the pressure field 𝑝, and finally one for the real
velocity field 𝒖.

The method of the example is a version of Chorin’s projection


method. The advection term (𝒖 · ∇)𝒖 moves the velocity field a
distance |𝒖|Δ𝑡 each time step. We naturally run into problems if this
is larger than our spatial discretisation (Δ𝑥). Thus we must choose
Δ𝑡 small enough to avoid this.16 Other approaches to solving the
Navier–Stokes equations avoid this issue.

Finally we note that the Navier–Stokes equations, and many


other equations, are often solved on a staggered grid — i.e. grids
where 𝒖 and 𝑝 do not live at the same points. For instance, we
could discretise 𝒖(𝑥, 𝑦) and 𝑝(𝑥, 𝑦) like this:
16 This is called the Courant–Friedrichs–Lewy (CFL) condition.
(a sketch of a staggered grid: the velocity 𝒖 is stored on one set of grid points, and the pressure 𝑝 on points shifted by half a grid spacing in between)
Note that a central finite difference scheme of e.g. 𝑝 in an equation
for 𝒖 is naturally formulated on such grids. This is good for accuracy
and helps avoid some numerical problems (“checkerboard problems”)
when using non-projection methods to solve fluid problems. We
will not discuss such approaches further here, but mention them
only so you are aware of their existence.
Stochastic Systems

Not all physical laws are deterministic differential equations. Often


we have to deal with stochastic systems. The source of randomness
could be true random events such as nuclear decay or the mea-
surement of a wave function, or it could be the seemingly random
behaviour of stock markets or microorganism motility. Perhaps the
most famous stochastic system is that of Brownian motion: the
random movement of small particles due to thermal noise.
To be able to simulate stochastic systems we need the computer to
be able to sample random numbers. For instance, the time 𝑡 between
events of radioactive decay is exponentially distributed

𝑝(𝑡) = 𝜆𝑒^{−𝜆𝑡} .        (5.1)

So in order to simulate such a system, we need to be able to sample


exponentially distributed numbers. Most programming languages
provide random number generators for standard distributions, but
custom methods are needed for special distributions. We begin this
chapter with discussing this, and end with a few methods that are
typically used for simulating specific random systems.

5.1 Random Numbers


To simulate random events on a computer, you need to be able to
sample random numbers. However, a computer is a deterministic

machine and cannot do anything truly random. All we can do is


some mathematical operations that make it seem random enough.
This is called pseudo-random number generation. To give a simple
example, consider the sequence generated by1

𝑥𝑛+1 = 48271 𝑥𝑛 mod (2³² − 1).        (5.2)

We start with some seed 𝑥 0 and then keep applying the above
formula. For instance, starting with 𝑥 0 = 1656264184 yields

𝑥 1 = 3007196734, 𝑥2 = 3383877799, 𝑥3 = 1264039384 · · ·


(5.3)
In this way we can generate pseudo-random integers between 0 and
2³² − 2. The initial seed 𝑥0 could be taken from some source that
constantly changes such as the time on the computer in microsec-
onds, or similar. If we need random integers between 0 and some
number 𝑁, we simply use 𝑦 𝑛 = 𝑥 𝑛 mod (𝑁 + 1).
In physics we are typically interested not in integers, but real
numbers. If, for instance, we need random numbers sampled be-
tween 0 and 1 we could then simply take
𝑟𝑛 = 𝑥𝑛 / (2³² − 2) ,        (5.4)
which is a good approximation to a random uniform number.
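
A tiny sketch of this generator in Python (purely for illustration; in practice you should use the generator supplied by your library):

    def lcg(seed, n):
        # pseudo-random numbers in [0, 1] from the generator of Eq. (5.2)
        numbers = []
        x = seed
        for _ in range(n):
            x = (48271 * x) % (2**32 - 1)
            numbers.append(x / (2**32 - 2))    # rescale as in Eq. (5.4)
        return numbers

    print(lcg(1656264184, 3))   # reproduces the sequence of Eq. (5.3), rescaled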
The scheme we just presented is not great though. There are
much better versions, and you will probably never need to imple-
ment your own.
1 Recall that 𝑎 mod 𝑏 means the integer remainder of the division 𝑎/𝑏.
It is good to understand, nonetheless, the principles behind such


number generation. In this chapter we are going to assume that you
have access to a library that reliably generates pseudo-random num-
bers. In particular, we will assume that you can generate integers,
both uniform and Poisson distributed, and real numbers, both uni-
form and normal distributed.

5.1.1 Inverse Transform Sampling


Suppose you need to sample random numbers from a distribution
𝑝(𝑥), but this distribution is not implemented in your language of
choice. This is in fact a very common situation. If the probability
distribution is simple (and 1D), the best method to use, by far, is
inverse transform sampling.

Inverse Transform Sampling


To sample from a probability distribution 𝑝(𝑥), solve for the
inverse cumulative distribution 𝑄(𝑢):
∫_0^𝑥 𝑝(𝑥′) d𝑥′ = 𝑢   ⇔   𝑥 = 𝑄(𝑢).        (5.5)

Now sample a uniform random number 𝑈 between 0 and 1.


The number
𝑋 = 𝑄(𝑈) (5.6)
will then be a random number sampled from 𝑝(𝑥).
Proving this method works is quite simple, although we will skip a


formal derivation here. Intuitively, nonetheless, you can note that
the cumulative distribution is always a function that maps an input
𝑥 to a number between [0, 1]. We choose a random location on
this 𝑦-axis and ask which 𝑥 that corresponds to by using the inverse
function.

Example
To sample a random number from the exponential distribu-
tion 𝑝(𝑥) = 𝜆𝑒^{−𝜆𝑥} (defined on [0, ∞)) we first need to solve

∫_0^𝑥 𝜆𝑒^{−𝜆𝑥′} d𝑥′ = 1 − 𝑒^{−𝜆𝑥} = 𝑢        (5.7)

for 𝑥 as a function of 𝑢. This one is easy and we find

𝑄(𝑢) = − (1/𝜆) log(1 − 𝑢).        (5.8)
Now we can use a standard sampler on a uniform interval to
sample exponentially distributed numbers. Note that if 𝑈 is
uniform on [0, 1] then so is 1 − 𝑈, so we can also use
𝑄(𝑢) = − (1/𝜆) log(𝑢).        (5.9)
Observe that indeed 𝑄(𝑢) will map to [0, ∞) for input in
[0, 1], as must be the case.
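
A minimal sketch of this example in Python with NumPy:

    import numpy as np

    def sample_exponential(lam, size):
        # inverse transform sampling of p(x) = lam * exp(-lam * x)
        u = np.random.uniform(0.0, 1.0, size)
        return -np.log(u) / lam               # Q(u) of Eq. (5.9)

    samples = sample_exponential(2.0, 100_000)
    print(samples.mean())                     # should be close to 1/lambda = 0.5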
The formula can be used even if the equation cannot be solved


analytically, as a numeric solution is adequate. In fact, not even the
integral needs to be solved analytically, but doing so helps in terms of speed
of the algorithm.

5.1.2 Rejection Sampling


Inverse transform sampling works well for one-dimensional prob-
lems, but sometimes you need to sample from multi-dimensional
distributions. For this rejection sampling can be used:

Rejection Sampling
To sample from a probability distribution 𝑝(𝑥), choose a
proposal distribution 𝑞(𝑥) from which it is simpler to sample
and which is non-zero for all 𝑥 where 𝑝(𝑥) is non-zero.
Find an 𝑀 (preferably as small as possible) such that

𝑝(𝑥) ≤ 𝑀𝑞(𝑥) for all 𝑥 (5.10)

Then

1. Sample an 𝑥 from 𝑞(𝑥)

2. Sample a uniform random number 𝑈 on [0, 1]

3. • If 𝑈 ≤ 𝑝(𝑥)/(𝑀𝑞(𝑥)), keep the sample
• Otherwise start over.
This method is also very simple to use, but how fast it is depends
on the choice of 𝑞(𝑥). Preferably, 𝑞 should be chosen to be as
close to 𝑝(𝑥) as possible in order to avoid rejection by the last step.
Intuitively, you expect to sample about 𝑀 numbers before getting an
acceptance. Therefore it is important to choose a 𝑞 that minimizes
𝑀.
We will also skip a formal derivation of this method, but again
it should be fairly intuitive: You sample from 𝑞 and then adjust for
the fact that this is the wrong distribution by making samples more
unlikely in proportion to the distance | 𝑝(𝑥) − 𝑀 𝑞(𝑥)|.

Example
Consider sampling from the two-dimensional distribution
𝑝(𝑥, 𝑦) = (1 + cos(𝑥 + 𝑦)) / (4𝜋²)        (5.11)

for 𝑥, 𝑦 ∈ [0, 2𝜋]. As proposal distribution we simply choose
the uniform distribution

𝑞(𝑥, 𝑦) = 1/(4𝜋²) ,        (5.12)

which is extremely easy to sample from, as we just sample
two uniform random numbers on [0, 2𝜋]. The maximal value
of 𝑝(𝑥, 𝑦) is 1/(2𝜋²), and so the best 𝑀 we can choose is 𝑀 = 2.
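
A small sketch of this example in Python with NumPy:

    import numpy as np

    def p(x, y):
        return (1.0 + np.cos(x + y)) / (4.0 * np.pi**2)

    q = 1.0 / (4.0 * np.pi**2)        # uniform proposal density on [0, 2*pi]^2
    M = 2.0

    samples = []
    while len(samples) < 10_000:
        x, y = np.random.uniform(0.0, 2.0 * np.pi, 2)   # sample from q
        u = np.random.uniform()
        if u <= p(x, y) / (M * q):                       # acceptance test
            samples.append((x, y))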
5.1.3 Markov Chain Monte Carlo


Sometimes the probability distribution you are trying to sample
from is too complicated for rejection sampling to work well. In
this case you can turn to Markov Chain Monte Carlo (MCMC).
The idea of MCMC is to throw away the requirement that we need
independent samples. Instead we create a long sequence of samples
where each sample is allowed to be correlated with the previous one,
but the sequence has the same statistics as independent samples when shuffled.
MCMC is a random walk in parameter space, biased in such
a way that we spend more time in areas of high probability. We
carefully choose the biasing such that after a long time, we have
perfectly sampled the probability distribution.
We present only the simplest version of Markov Chain Monte
Carlo, which depends on a choice of jump distribution 𝑔(𝑥 | 𝑥′) that is
easy to sample from. For this one, if the parameters are continuous,
we will often choose a Gaussian
𝑔(𝑥 | 𝑥′) = 1/√(2𝜋𝜎²) · 𝑒^{−(𝑥−𝑥′)²/2𝜎²} .        (5.13)

MCMC: Metropolis–Hastings
To sample 𝑁 points from 𝑝(𝑥) choose a jump distribution
𝑔(𝑥 | 𝑥′) from which it is easy to sample. Choose a starting
point 𝑥 0 . Then for 𝑛 ∈ [1, 2, · · · 𝑁]

1. Sample 𝑥 from 𝑔(𝑥 | 𝑥 𝑛−1 ).


2. Calculate 𝛼 = [ 𝑝(𝑥) 𝑔(𝑥𝑛−1 | 𝑥) ] / [ 𝑝(𝑥𝑛−1 ) 𝑔(𝑥 | 𝑥𝑛−1 ) ]

3. • If 𝛼 > 1 set 𝑥 𝑛 = 𝑥
• Otherwise sample a uniform random number 𝑈
on [0, 1].
– If 𝑈 ≤ 𝛼 set 𝑥 𝑛 = 𝑥
– Otherwise set 𝑥 𝑛 = 𝑥 𝑛−1 .

For sufficiently large 𝑁, the sequence of samples can be


shuffled to emulate independent samples of 𝑝.

Note that if 𝑔 is symmetric, then 𝑔(𝑥 𝑛−1 | 𝑥)/𝑔(𝑥 | 𝑥 𝑛−1 ) = 1, sim-


plifying the method. This is for instance the case for Eq. (5.13).
The early samples of the sequence will depend on the choice of
𝑥 0 . Therefore it is advisable to discard the first many samples (say
the first 1,000, depending on the distribution being sampled). This
is called the warmup or burn-in period.
The choice of 𝑔 will massively affect the efficiency of the
method. In the case that you use Eq. (5.13) for 𝑔(𝑥), how would
you choose 𝜎? For high-dimensional problems, the optimal choice
is one that leads to an acceptance probability of about 23 %. This can
be tuned during warmup.
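
A compact sketch in Python with NumPy, working with log-probabilities as suggested below (the symmetric Gaussian jump distribution and the example target are our own illustrative choices):

    import numpy as np

    def metropolis(log_p, x0, sigma, n_samples, n_warmup=1000):
        x = np.array(x0, dtype=float)
        samples = []
        for n in range(n_samples + n_warmup):
            proposal = x + sigma * np.random.randn(len(x))   # sample from g(x | x_{n-1})
            log_alpha = log_p(proposal) - log_p(x)
            if np.log(np.random.uniform()) <= log_alpha:     # accept with probability min(1, alpha)
                x = proposal
            if n >= n_warmup:                                # discard the warmup period
                samples.append(x.copy())
        return np.array(samples)

    # example: sample a two-dimensional unit Gaussian (unnormalised log-probability)
    samples = metropolis(lambda v: -0.5 * np.sum(v**2), [0.0, 0.0], 1.0, 50_000)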

Note that Markov Chain Monte Carlo works even if you do not
have access to a normalised distribution, as it only uses 𝑝(𝑥)/𝑝(𝑥′).
This is extremely useful both for physical simulations and for data
modelling.

Physical simulation — As an example of using MCMC to do
a physical calculation, we consider the Boltzmann distribution of sta-
tistical physics
\[ p(\boldsymbol{x}) = \frac{e^{-E(\boldsymbol{x})/T}}{Z}. \tag{5.14} \]
Here, 𝒙 is the micro-state of the system, 𝐸 is the energy of
that state, 𝑇 the temperature, and 𝑍 the partition function. The
partition function is simply a normalisation, but is typically very
hard to calculate. To calculate statistics of such a model, we can use
MCMC to sample the distribution of microstates, without having to
evaluate 𝑍. In particular, if we use a symmetric jump distribution
𝑔, we simply have 𝛼 = 𝑒 −Δ𝐸/𝑇 . When used for physical simulations
like this, the method is often referred to simply as The Monte Carlo
Method, although this strictly speaking refers to the broad range of
methods that use random number generation.
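As an illustrative sketch (a model of our own choosing, not part of the discussion above), the rule 𝛼 = exp(−Δ𝐸/𝑇) can be applied to a small one-dimensional Ising chain:

```python
import numpy as np

def ising_energy(spins, J=1.0):
    """Total energy of a 1D Ising chain with periodic boundaries."""
    return -J * np.sum(spins * np.roll(spins, 1))

def metropolis_ising(n_spins=100, T=2.0, n_steps=100_000):
    spins = np.random.choice([-1, 1], size=n_spins)
    E = ising_energy(spins)
    for _ in range(n_steps):
        i = np.random.randint(n_spins)                       # propose a single spin flip
        dE = 2 * spins[i] * (spins[i - 1] + spins[(i + 1) % n_spins])
        # Metropolis rule: accept with probability min(1, exp(-dE / T)); Z never appears
        if dE <= 0 or np.random.uniform() < np.exp(-dE / T):
            spins[i] *= -1
            E += dE
    return spins, E

spins, E = metropolis_ising()
print("energy per spin:", E / len(spins))
```

The single spin-flip proposal is symmetric, so the acceptance rule above is consistent with the symmetric-𝑔 simplification.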

Data modelling — Consider the case where you are trying
to estimate some parameters 𝑥 and 𝑦 based on some data. From
physical principles you derive² 𝑝(data | 𝑥, 𝑦). This is typically what
you can get from physics: if we knew the values of 𝑥 and
𝑦, we could calculate the probability of observing the data we did.
From background information we also typically have a prior on 𝑥
² The notation 𝑝(𝐴 | 𝐵) means the probability of 𝐴 conditional on 𝐵 having
occurred.

and 𝑦: 𝑝(𝑥, 𝑦). Then from Bayes' formula we have
\[ p(x, y \mid \mathrm{data}) = \frac{p(\mathrm{data} \mid x, y)\, p(x, y)}{p(\mathrm{data})}, \tag{5.15} \]
which is the function you want to sample from. The normalisa-
tion 𝑝(data) is hard to calculate, especially for high-dimensional
problems, and so we typically only have access to
𝑝(𝑥, 𝑦 | data) ∝ 𝑝(data | 𝑥, 𝑦) 𝑝(𝑥, 𝑦) = L (𝑥, 𝑦), (5.16)
where L (𝑥, 𝑦) is called the likelihood function. Fortunately, this is
enough for MCMC to be able to sample 𝑥 and 𝑦 from 𝑝(𝑥, 𝑦 | data).
It will furthermore typically be better to work in terms of log likeli-
hoods to minimise floating-point errors. In the case of symmetric
𝑔, the formula for 𝛼 then becomes
𝛼 = exp(log L (𝑥) − log L (𝑥 𝑛−1 )) (5.17)
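A hedged sketch of how this looks in practice: estimating the two parameters of a straight-line model from invented data, using a Gaussian measurement-error likelihood, a flat prior, and the acceptance rule of Eq. (5.17):

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented data: y = a*t + b with Gaussian noise of known width
t = np.linspace(0, 1, 20)
a_true, b_true, noise = 2.0, -0.5, 0.1
data = a_true * t + b_true + noise * rng.normal(size=t.size)

def log_L(params):
    """log L(a, b) up to a constant, with a flat prior p(a, b)."""
    a, b = params
    residuals = data - (a * t + b)
    return -np.sum(residuals**2) / (2 * noise**2)

# Metropolis-Hastings with symmetric Gaussian jumps, acceptance as in Eq. (5.17)
x = np.array([0.0, 0.0])
samples = []
for n in range(20_000):
    x_new = x + 0.05 * rng.normal(size=2)
    if np.log(rng.uniform()) < log_L(x_new) - log_L(x):
        x = x_new
    if n >= 2_000:                       # discard burn-in
        samples.append(x.copy())
samples = np.array(samples)
print("estimated a, b:", samples.mean(axis=0))
```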

Often Markov Chain Monte Carlo is used to evaluate integrals


of the form
\[ I = \int_{-\infty}^{\infty} f(x, y)\, p(x, y)\, \mathrm{d}x\, \mathrm{d}y. \tag{5.18} \]
For instance, if 𝑓 (𝑥, 𝑦) = 𝑥, we calculate the mean value 𝜇𝑥 of
𝑥, and 𝑓 (𝑥, 𝑦) = (𝑥 − 𝜇𝑥 ) 2 will give the variance. To evaluate
these integrals normally we would need a normalised probability
distribution, but with Markov Chain Monte Carlo, we can estimate
it as
\[ I \approx \frac{1}{N} \sum_{i=1}^{N} f(x_i, y_i), \tag{5.19} \]

where (𝑥𝑖 , 𝑦𝑖 ) are the sampled values of 𝑥 and 𝑦. The error on the
estimation of 𝐼 will be of order O (𝑁 −1/2 ).
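Assuming the samples are stored in a NumPy array (for instance from the Metropolis–Hastings sketch above), the estimate and its O(𝑁^{−1/2}) error can be computed as follows; the simple error formula assumes effectively independent samples:

```python
import numpy as np

# samples: an (N, 2) array of MCMC samples of (x, y), e.g. from the sketch above
samples = np.random.randn(100_000, 2)        # placeholder samples for illustration
f = samples[:, 0]                             # f(x, y) = x, i.e. we estimate mu_x

I = f.mean()                                  # Eq. (5.19)
error = f.std() / np.sqrt(len(f))             # O(N^{-1/2}), assuming independent samples
print(f"I = {I:.4f} +/- {error:.4f}")
```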

Finally, we note that Metropolis–Hastings is not the only MCMC


method. More efficient methods (for continuous parameters) exploit
knowledge of the gradient of 𝑝(𝑥). These are called Hamiltonian
Monte Carlo methods and are beyond the scope of this text.

5.2 Event-based Simulations


The outcome of whether a coin lands heads or tails is easy to
simulate. We could simply sample one of the integers {0, 1}, and
denote tails with zero and heads with one. This is perhaps the
simplest form of an event-based simulation. If the coin is biased, e.g.
tails happen with probability 𝑝, we could instead sample a uniform
number 𝑈 on [0, 1] and ask if 𝑈 < 𝑝, in which case the toss
is tails, and heads otherwise. This simple approach allows us to
simulate event-driven systems. However, many physical systems
are not formulated directly in terms of probabilities of events but
instead in terms of rates. This section deals with how to simulate
such stochastic systems.

An example of a stochastic system specified in terms of rates


is that of a chemical reaction such as
\[ A + B \xrightarrow{k} C. \tag{5.20} \]

Here, 𝐴 and 𝐵 react together to produce 𝐶 with rate 𝑘. If we have


a large number of reactions, such a system can readily be described
by differential equations³ as fluctuations will not be important.
However, if we consider the reaction of a small number of reactants,
fluctuations cannot be ignored.
For Eq. (5.20) let us denote the number of 𝐴 reactants at time 𝑡
by 𝑁 𝐴 (𝑡). Likewise, we define 𝑁 𝐵 (𝑡) and 𝑁𝐶 (𝑡). The instantaneous
total reaction rate will be equal⁴ to 𝑘 𝑁 𝐴 (𝑡) 𝑁 𝐵 (𝑡). At time 𝑡, how
long until the next reaction occurs? By the very definition of rates,
the chance that an event with rate 𝑅 occurs in a short time interval
Δ𝑡 is 𝑅Δ𝑡. So the probability distribution of the time 𝜏 until next
event is exponential:
𝑝(𝜏) = 𝑅𝑒 −𝑅𝜏 . (5.21)
In the case of Eq. (5.20), 𝑅(𝑡) = 𝑘 𝑁 𝐴 (𝑡) 𝑁 𝐵 (𝑡). Finally, note that
between events 𝑅(𝑡) is constant.
We now have all the ingredients to simulate an exact stochastic
realization of Eq. (5.20). We simply sample a 𝜏 from Eq. (5.21),
update time 𝑡 ← 𝑡 + 𝜏 as well as the molecular count 𝑁 𝐴 ← 𝑁 𝐴 − 1,
𝑁 𝐵 ← 𝑁 𝐵 − 1, and 𝑁𝐶 ← 𝑁𝐶 + 1, since after a reaction there will
be one fewer 𝐴 and one fewer 𝐵 molecule, and one more 𝐶. This approach is
called the Gillespie algorithm.

It is only slightly more complicated to simulate a system in


which many types of events can occur. In the following we use
³ In this case we could for instance have 𝑐′(𝑡) = 𝑘 𝑎(𝑡) 𝑏(𝑡).
⁴ We are sloppy with the definition of 𝑘 here, as it should be rescaled according
to the volume of the system under consideration.

𝒙(𝑡) to denote the current state. For Eq. (5.20) this would be
𝒙(𝑡) = (𝑁 𝐴 (𝑡), 𝑁 𝐵 (𝑡), 𝑁𝐶 (𝑡)).

The Gillespie Algorithm (Version 1)


Repeat until end of simulation:

1. Calculate current rates {𝑟𝑖 (𝑡)} using the current 𝒙(𝑡).

2. For each event sample 𝜏𝑖 from 𝑝(𝜏𝑖 ) = 𝑟𝑖 exp(−𝑟𝑖 𝜏𝑖 ).

3. Find the event corresponding to the minimum value sampled: 𝑗 = arg min𝑖 {𝜏𝑖 }.

4. Let 𝑡 ← 𝑡 + 𝜏 𝑗 and update 𝒙(𝑡) according to event 𝑗.

If the system being considered has a large number of events


that can occur, it is slightly more efficient to use a different, but
equivalent implementation:

The Gillespie Algorithm (Version 2)


Repeat until end of simulation:

1. Calculate current rates {𝑟𝑖 (𝑡)} using the current 𝒙(𝑡).


2. Calculate the total rate 𝑅 = Σ𝑖 𝑟𝑖 .

3. Sample 𝜏 from 𝑝(𝜏) = 𝑅 exp(−𝑅 𝜏).



4. Sample a uniform number 𝑈 between 0 and 𝑅.


5. Find the first event 𝑗 such that 𝑟1 + 𝑟2 + · · · + 𝑟 𝑗 ≥ 𝑈.

6. Let 𝑡 ← 𝑡 + 𝜏 and update 𝒙(𝑡) according to event 𝑗.

This version only needs two random number samples, indepen-


dent of the number of events. It is useful to know both of these
methods as they are both often used. The two are equivalent because
the distribution of the variable 𝑋 = min(𝑋1 , 𝑋2 , · · · 𝑋𝑁 ) is expo-
nential with parameter 𝑅 = 𝑟 1 + 𝑟 2 + · · · 𝑟 𝑁 if 𝑋𝑖 is exponentially
distributed with parameter 𝑟𝑖 .
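A minimal Python sketch of the Gillespie algorithm for the reaction in Eq. (5.20); with a single reaction channel the two versions coincide, and the initial counts and rate constant are arbitrary illustration values:

```python
import numpy as np

def gillespie_AB_to_C(NA=100, NB=80, NC=0, k=0.01, t_end=10.0):
    """Exact stochastic realization of A + B -> C with rate constant k."""
    t = 0.0
    times, counts = [t], [(NA, NB, NC)]
    while True:
        R = k * NA * NB                             # total rate of the single reaction
        if R == 0:
            break                                   # no reactions can occur any more
        tau = np.random.exponential(1 / R)          # p(tau) = R exp(-R tau)
        if t + tau > t_end:
            break
        t += tau
        NA, NB, NC = NA - 1, NB - 1, NC + 1         # one A and one B consumed, one C produced
        times.append(t)
        counts.append((NA, NB, NC))
    return np.array(times), np.array(counts)

times, counts = gillespie_AB_to_C()
print(counts[-1])                                   # final (N_A, N_B, N_C)
```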

The Gillespie algorithm simulates exact realizations of the stochas-


tic rate equations. We recommend its use whenever possible.
The only downside to the algorithm is that it becomes very slow
if the rates are large, since the effective time step it takes will be of
size Δ𝑡 ∼ 1/𝑅.
In these cases we can turn to approximate methods, the simplest
of which is called Tau-Leaping.

Tau-Leaping
Choose a time step Δ𝑡, and repeat until end of simulation:

1. Calculate current rates {𝑟𝑖 (𝑡)} using the current 𝒙(𝑡).

2. For each event 𝑖, sample 𝑁𝑖 from a Poisson distribution with parameter 𝑟𝑖 Δ𝑡:
\[ p(N_i) = \frac{(r_i \Delta t)^{N_i}\, e^{-r_i \Delta t}}{N_i!}. \tag{5.22} \]

3. Let 𝑡 ← 𝑡 + Δ𝑡. For each event 𝑖, update 𝒙(𝑡) by having the event occur 𝑁𝑖 times.

We note that the method is called 𝜏-leaping, because Δ𝑡 is usually


written using 𝜏. We prefer Δ𝑡 for consistency with the other meth-
ods, however. The above scheme is approximate, as we assume 𝒙(𝑡)
constant in the time between 𝑡 and 𝑡 + Δ𝑡, even though many events
could occur in that time.
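For comparison, a Tau-Leaping sketch for the same reaction could look as follows; the guard against negative molecule counts is a practical safeguard of our own, not part of the scheme above:

```python
import numpy as np

def tau_leaping_AB_to_C(NA=100, NB=80, NC=0, k=0.01, dt=0.01, t_end=10.0):
    """Approximate simulation of A + B -> C with rate constant k by tau-leaping."""
    for _ in range(int(t_end / dt)):
        r = k * NA * NB                        # current rate, held fixed during the step
        n = np.random.poisson(r * dt)          # number of reaction events in this step
        n = min(n, NA, NB)                     # safeguard: cannot consume more than available
        NA, NB, NC = NA - n, NB - n, NC + n
    return NA, NB, NC

print(tau_leaping_AB_to_C())
```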
For differential equations we described the accuracy of a numer-
ical scheme by how the error scaled with Δ𝑡. It is slightly harder
to define the error for a stochastic simulation, since each time you
run a simulation a different result will be found. This is the point
of stochastic simulations after all. To define an error we ask what
happens if we simulate many times and compare the average of such
simulations to a true realization (such as one found by the Gillespie
algorithm). We have two choices for how to define this average
error: we can take the error of the means or we can take the mean
of the errors.
A method is described to have a strong order of convergence
O(Δ𝑡ⁿ) if⁵
\[ \max_t \, \mathbb{E}\,\big| X_\mathrm{sim}(t) - X_\mathrm{true}(t) \big| = \mathcal{O}(\Delta t^{\,n}), \tag{5.23} \]

where E denotes expectation (averaging over all simulations).


A method is described to have a weak order of convergence
O(Δ𝑡ⁿ) if
\[ \max_t \, \big| \mathbb{E}[X_\mathrm{sim}(t)^m] - \mathbb{E}[X_\mathrm{true}(t)^m] \big| = \mathcal{O}(\Delta t^{\,n}) \tag{5.24} \]

for all integer values of 𝑚. Note that if Eq. (5.23) holds, then so
does Eq. (5.24), but not the other way around.
Tau-leaping is order O(Δ𝑡) in weak convergence, but only
O(√Δ𝑡) in strong convergence. You therefore need to use very small
Δ𝑡 when using this method. But very small can still be significantly
larger than what is required by the Gillespie method for problems
with large rates. Note that the definition of error is over the entire
simulation, not per time step. Thus, the above should be compared
e.g. to the total error of the Euler method of O (Δ𝑡) (since for ODEs
the largest error will typically be found at the last time step 𝑡 = 𝑇).

5.3 Stochastic Differential Equations


Event-based stochastic systems are discrete in time in the sense
that there are finite periods of time over which nothing happens. In
⁵ For simplicity we write maximum. To be mathematically precise this should
be a supremum.

contrast, Stochastic Differential Equations (SDEs) are the stochastic


generalisation of Ordinary Differential Equations. Here we consider
SDEs of the form

d𝑋 = 𝜇(𝑋, 𝑡) d𝑡 + 𝜎(𝑋, 𝑡) d𝑊 . (5.25)

If you have never seen this notation before it can be a bit weird.
Informally, you have to think of d𝑊 as an infinitesimally small random
number:
\[ \mathrm{d}W = \lim_{\Delta t \to 0} \Delta W, \tag{5.26} \]
where Δ𝑊 is a normally distributed random number with mean zero
and variance Δ𝑡. Note that this means that the standard deviation
of Δ𝑊 is √Δ𝑡. Physicists often use the notation
\[ \frac{\mathrm{d}X}{\mathrm{d}t} = \mu(X, t) + \sigma(X, t)\, \xi(t), \tag{5.27} \]
where 𝜉 (𝑡) is a noise term. The former notation, however, is math-
ematically more well-defined and in fact also more natural for in-
troducing numerical methods.

5.3.1 Initial-Value Problems


Just as for initial-value problems for ODEs, for SDEs we also choose
a finite step size Δ𝑡. Almost as simple as the Euler method is the
Euler–Maruyama method for SDEs:

Euler–Maruyama Method
Each time step is taken by updating

𝑋 (𝑡 + Δ𝑡) = 𝑋 (𝑡) + 𝜇(𝑋 (𝑡), 𝑡)Δ𝑡 + 𝜎(𝑋 (𝑡), 𝑡)Δ𝑊, (5.28)

where Δ𝑊 is a normally distributed random number with mean
zero and variance Δ𝑡:
\[ p(\Delta W) = \frac{1}{\sqrt{2\pi\Delta t}}\, e^{-\Delta W^2 / 2\Delta t}. \tag{5.29} \]
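A sketch of the Euler–Maruyama method in Python, applied to an Ornstein–Uhlenbeck process of our own choosing, d𝑋 = −𝜃𝑋 d𝑡 + 𝑠 d𝑊:

```python
import numpy as np

def euler_maruyama(mu, sigma, x0, dt, n_steps):
    """Integrate dX = mu(X, t) dt + sigma(X, t) dW with the Euler-Maruyama method."""
    X = np.empty(n_steps + 1)
    X[0] = x0
    t = 0.0
    for n in range(n_steps):
        dW = np.sqrt(dt) * np.random.randn()   # normal with mean zero and variance dt
        X[n + 1] = X[n] + mu(X[n], t) * dt + sigma(X[n], t) * dW
        t += dt
    return X

# Illustration: Ornstein-Uhlenbeck process dX = -theta X dt + s dW
theta, s = 1.0, 0.3
X = euler_maruyama(lambda x, t: -theta * x, lambda x, t: s, x0=1.0, dt=1e-3, n_steps=10_000)
print(X[-1])
```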

The Euler–Maruyama method has weak order of error O(Δ𝑡), but
only O(√Δ𝑡) in strong order of convergence. When 𝜎(𝑋, 𝑡) does
not depend on 𝑋 though, the strong order of convergence is O (Δ𝑡).
If 𝜎 does depend on 𝑋, a slightly better method which for all
choices of 𝜎 has O (Δ𝑡) in strong order of convergence is:

Milstein method
Each time step is taken by updating

\[ X(t + \Delta t) = X(t) + \mu(X(t))\,\Delta t + \sigma(X(t))\,\Delta W + \frac{1}{2}\,\sigma(X(t))\,\sigma'(X(t)) \left( \Delta W^2 - \Delta t \right), \tag{5.30} \]
where Δ𝑊 is a normally distributed random number with mean
zero and variance Δ𝑡. Both occurrences of Δ𝑊 in Eq. (5.30)
refer to the same random number.
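For illustration, a Milstein step for geometric Brownian motion, d𝑋 = 𝜇𝑋 d𝑡 + 𝜎𝑋 d𝑊, where 𝜎(𝑋) = 𝜎𝑋 so that 𝜎(𝑋)𝜎′(𝑋) = 𝜎²𝑋; the equation and parameter values are our own:

```python
import numpy as np

def milstein_gbm(x0=1.0, mu=0.05, sigma=0.2, dt=1e-3, n_steps=10_000):
    """Milstein scheme for dX = mu X dt + sigma X dW (geometric Brownian motion)."""
    X = x0
    for _ in range(n_steps):
        dW = np.sqrt(dt) * np.random.randn()
        # Here sigma(X) = sigma * X, so sigma(X) * sigma'(X) = sigma**2 * X
        X += mu * X * dt + sigma * X * dW + 0.5 * sigma**2 * X * (dW**2 - dt)
    return X

print(milstein_gbm())
```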

Generalisations of Runge–Kutta methods to SDEs also exist


(which do not use derivatives of 𝜎), but these are beyond the scope
of this text.

The SDE we have considered has been using the Itô integral.
If you are aware of the distinction between Itô and Stratonovich
integrals, you will know that Stratonovich SDEs are more common
in physics than Itô. Conveniently, any Stratonovich SDE can be
rewritten in the Itô interpretation by calculating the noise-induced
drift term. After such a conversion the above methods can be
applied. We note, nonetheless, that specific schemes designed for
Stratonovich SDEs also exist.

5.3.2 Boundary-Value Problems


You are far more likely to run into stochastic initial-value prob-
lems than boundary-value problems. Nothing, however, prevents
boundary-value problems from being well-defined for stochastic
problems.
The classical example of this is the Brownian bridge: a random
walk for which 𝑋 (0) and 𝑋 (𝑇) are known. How would you in
this case simulate 𝑋 (𝑡)? This specific case has a simple solution:
simulate 𝑋˜ (𝑡) considering only the initial-value boundary condition
𝑋 (0). Then
\[ X(t) = \tilde{X}(t) + \frac{t}{T} \left( X(T) - \tilde{X}(T) \right) \tag{5.31} \]
is an exact realization of the equation, which satisfies both boundary
conditions.
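A sketch of Eq. (5.31) in Python for a pure Brownian bridge (𝜇 = 0, 𝜎 = 1) pinned at chosen values of 𝑋(0) and 𝑋(𝑇):

```python
import numpy as np

def brownian_bridge(X0=0.0, XT=1.0, T=1.0, n_steps=1000):
    """Brownian bridge via Eq. (5.31): simulate a free walk, then correct the endpoint."""
    dt = T / n_steps
    t = np.linspace(0, T, n_steps + 1)
    dW = np.sqrt(dt) * np.random.randn(n_steps)
    X_free = X0 + np.concatenate(([0.0], np.cumsum(dW)))     # ignores the condition at t = T
    return t, X_free + (t / T) * (XT - X_free[-1])            # exact bridge realization

t, X = brownian_bridge()
print(X[0], X[-1])   # should print 0.0 and 1.0
```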

Most problems, however, will not have such an elegant solution.


One approach to the general problem is to write down a likelihood
function of the stochastic simulation. For the Brownian bridge
example, we could for instance write
\[ \mathcal{L} = \prod_{n=1}^{N} p\big( X(n\Delta t) - X((n-1)\Delta t) \big), \tag{5.32} \]

where 𝑁Δ𝑡 = 𝑇. Markov Chain Monte Carlo methods, as presented


in Sec. 5.1.3, can then be used to sample 𝑋 (Δ𝑡), 𝑋 (2Δ𝑡), · · · ,
𝑋 ((𝑁 − 1)Δ𝑡). Note that if we instead work in terms of log likelihood,
as recommended in Sec. 5.1.3, the product in Eq. (5.32) becomes
a sum.
