Solving Ordinary Differential Equations in Python

Joakim Sundnes

Simula SpringerBriefs on Computing, Volume 15
Editor-in-Chief
Joakim Sundnes, Simula Research Laboratory, Oslo, Norway
Series Editors
Shaukat Ali, Simula Research Laboratory, Oslo, Norway
Evrim Acar Ataman, Simula Metropolitan Centre for Digital Engineering, Oslo, Norway
Are Magnus Bruaset, Simula Research Laboratory, Oslo, Norway
Xing Cai, Simula Research Laboratory & University of Oslo, Oslo, Norway
Kimberly Claffy, San Diego Supercomputer Center, CAIDA, University of California, San
Diego, San Diego, CA, USA
Andrew Edwards, Simula Research Laboratory, Oslo, Norway
Arnaud Gotlieb, Simula Research Laboratory, Oslo, Norway
Magne Jørgensen, Software Engineering, Simula Research Laboratory, Oslo, Norway
Olav Lysne, Simula Research Laboratory, Oslo, Norway
Kent-Andre Mardal, University of Oslo & Simula Research Lab, Oslo, Norway
Kimberly McCabe, Simula Research Laboratory, Oslo, Norway
Andrew McCulloch, Bioengineering 0412, University of California, San Diego, La Jolla,
CA, USA
Leon Moonen, Simula Research Laboratory, Oslo, Norway
Michael Riegler, Simula Metropolitan Centre for Digital Engineering & UiT The Arctic
University of Norway, Oslo, Norway
Marie Rognes, Simula Research Laboratory & University of Oslo, Oslo, Norway
Fabian Theis, Institute of Computational Biology, Helmholtz Zentrum München,
Neuherberg, Germany
Aslak Tveito, Simula Research Laboratory, Oslo, Norway
Karen Willcox, Oden Institute for Computational Engineering & Sciences, The University
of Texas at Austin, Austin, TX, USA
Tao Yue, Nanjing University of Aeronautics and Astronautics & Simula Research
Laboratory, Oslo, Norway
Andreas Zeller, Saarland University, Saarbrücken, Germany
Yan Zhang, University of Oslo & Simula Research Laboratory, Oslo, Norway
In 2016, Springer and Simula launched the book series Simula SpringerBriefs on
Computing, which aims to provide introductions to selected research topics in
computing. The series provides compact introductions for students and researchers
entering a new field, gives brief disciplinary overviews of the state of the art in
selected fields, and raises essential critical questions and open challenges in the field of
computing. Published by SpringerOpen, all Simula SpringerBriefs on Computing
are open access, allowing for faster sharing and wider dissemination of knowledge.
Simula Research Laboratory is a leading Norwegian research organization which
specializes in computing. Going forward, the book series will provide introductory
volumes on the main topics within Simula’s expertise, including communications
technology, software engineering and scientific computing.
By publishing the Simula SpringerBriefs on Computing, Simula Research Labo-
ratory acts on its mandate of emphasizing research education. Books in this series are
published by invitation from one of the series editors. Authors interested in publishing
in the series are encouraged to contact any member of the editorial board.
Open Access This book is licensed under the terms of the Creative Commons Attribution 4.0 International
License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribu-
tion and reproduction in any medium or format, as long as you give appropriate credit to the original
author(s) and the source, provide a link to the Creative Commons license and indicate if changes were
made.
The images or other third party material in this book are included in the book’s Creative Commons license,
unless indicated otherwise in a credit line to the material. If material is not included in the book’s Creative
Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted
use, you will need to obtain permission directly from the copyright holder.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Dear reader,
Scientific research is increasingly interdisciplinary, and both students and
experienced researchers often face the need to learn the foundations, tools,
and methods of a new research field. This process can be quite demanding,
and typically involves extensive literature searches and reading dozens of sci-
entific papers in which the notation and style of presentation varies consider-
ably. Since the establishment of this series in 2016 by founding editor-in-chief
Aslak Tveito, the briefs in this series have aimed to ease the process by intro-
ducing and explaining important concepts and theories in a relatively narrow
field, and to outline open research challenges and pose critical questions on
the fundamentals of that field. The goal is to provide the necessary under-
standing and background knowledge and to motivate further studies of the
relevant scientific literature. A typical brief in this series should be around
100 pages and should be well suited as material for a research seminar in a
well-defined and limited area of computing.
We publish all items in this series under the SpringerOpen framework,
as this allows authors to use the series to publish an initial version of their
manuscript that could subsequently evolve into a full-scale book on a broader
theme. Since the briefs are freely available online, the authors do not receive
any direct income from the sales; however, remuneration is provided for every
completed manuscript. Briefs are written on the basis of an invitation from
a member of the editorial board. Suggestions for possible topics are most
welcome and can be sent to [email protected].
This book was based on a set of lecture notes originally written for the book A
Primer on Scientific Programming with Python by Hans Petter Langtangen
[14], mainly covering topics from Appendices A, C, and E. To provide a more
comprehensive overview of state-of-the-art solvers for ordinary differential
equations (ODEs), the notes have been extended with additional material on
implicit solvers and automatic time-stepping methods. The main purpose of
the notes is to serve as a concise and gentle introduction to solving differential
equations in Python, specifically for the course Introduction to programming
for scientific applications (IN1900, 10 ECTS credits) at the University of
Oslo. These notes will be most useful for readers with a basic knowledge
of Python and NumPy, see for instance [16], and it is also useful to have a
fundamental understanding of ODEs.
One may question the usefulness of learning how to write your own ODE
solvers in Python when there are already multiple solvers available, such as
those in the SciPy library. However, no single ODE solver is universally opti-
mal and efficient for all ODE problems, and the choice of solver should always
be based on the specific characteristics of the problem at hand. To make the
right choice, it is extremely beneficial to understand the strengths and weak-
nesses of different solvers, and the best way to gain this knowledge is by
programming your own collection of ODE solvers. Different ODE solvers are
conveniently grouped into families and hierarchies, offering an excellent ex-
ample of how object-oriented programming (OOP) can maximize code reuse
and minimize duplication.
The book’s presentation style is compact and pragmatic, incorporating
numerous code examples to illustrate how various ODE solvers can be imple-
mented and applied in practice. The complete source code for all examples,
as well as Jupyter notebooks for each chapter, are provided in the accom-
panying online resources. The programs and code examples are written in
a simple and compact Python style, avoiding the use of advanced tools and
features. Experienced Python programmers may find more elegant and mod-
ern solutions to many of the examples, utilizing abstract base classes, type
hints, data classes, and other advanced features. However, the book’s main
goal is to introduce the fundamentals of ODE solvers and OOP as part of an
introductory programming course, and we believe this purpose is best served
by focusing on the basics.
Readers familiar with scientific computing or numerical software may also
miss a discussion of computational performance. While performance is cer-
tainly relevant when solving ODEs, optimizing the performance of a Python-
based solver easily becomes quite technical, and requires features like just-in-
time compilers (e.g., Numba) or mixed-language programming. The solvers
in this book use fairly basic features of Python and NumPy, sacrificing some
performance in favor of enhancing understanding of solver properties and
implementation.¹
The book is organized as follows: Chapter 1 introduces the forward Euler
method, serving as a foundation for understanding the principles underlying
all the methods covered later. It introduces the notation and mathematical
formulation used throughout the book for scalar ODEs and systems of ODEs,
and is essential reading for those with limited prior experience with ODEs and
ODE solvers. Additionally, it briefly explains how to use the ODE solvers from
the SciPy library. Readers already familiar with the fundamentals of the for-
ward Euler method and its implementation may consider proceeding straight
to Chapter 2, which presents explicit Runge-Kutta methods. The chapter
introduces the fundamental ideas of these methods, but the main focus is
on the implementation and how a collection of ODE solvers is conveniently
implemented as a class hierarchy. Chapter 3 introduces stiff ODEs, presents
techniques for performing simple stability analysis of Runge-Kutta methods,
and introduces implicit Runge-Kutta methods. The majority of the chapter is
dedicated to the programming of these solvers, which exhibit better stability
properties than explicit methods and are therefore more suitable for solving
stiff ODEs. Chapter 4 concludes the presentation of ODE solvers by introduc-
ing methods for adaptive time step control, which is an essential component
of all modern ODE software. Chapter 5 takes a different approach from the
preceding chapters, as it focuses on a specific class of ODE models rather than
a set of solvers. While the simpler ODE problems discussed in earlier chap-
ters serve the purpose of introducing and testing the solvers, it is valuable to
explore more complex ODE models in order to appreciate both the potential
and the challenges of modeling with ODEs. As an example, the chapter exam-
ines the famous Kermack-McKendrick SIR (Susceptible-Infected-Recovered)
model from epidemiology. These classic models were developed in the early
1900s (see [12]) and remain fundamental for predicting and understanding
the spread of infectious diseases. We describe the derivation of the models
from a set of fundamental assumptions, and discuss the implications and lim-
itations resulting from these assumptions. The main focus of the chapter is
then on modifying and extending the models to capture new phenomena, and
¹ Complete source code for all the solvers and examples in the book can be found here: https://sundnes.github.io/solving_odes_in_python/
demonstrating how these changes can be implemented and explored using the
solvers developed in preceding chapters.
Finally, while the main focus of the text is on differential equations, Ap-
pendix A is dedicated to the related topic of difference equations. Differ-
ence equations have important applications on their own and may serve as
a stepping stone towards understanding and solving ODEs, since numerical
methods for ODEs essentially involve transforming differential equations into
difference equations. The standard formulation of difference equations found
in mathematical textbooks is already well-suited for computer implemen-
tation, using for-loops and arrays. Some students find difference equations
easier to grasp than differential equations, making Appendix A a useful re-
source to begin with. However, others may prefer to dive straight into ODEs
and explore Appendix A at a later stage.
Chapter 1
Programming a Simple ODE Solver
Ordinary differential equations (ODEs) are widely used in science and en-
gineering, particularly when it comes to modeling dynamic processes. Al-
though analytical methods can be employed to solve simple ODEs, nonlinear
ODEs typically require numerical methods for solutions. In this chapter we
demonstrate how to program general numerical solvers capable of handling
any ODE. Initially we will focus on scalar ODEs, which consist of a single
equation and a single unknown. Subsequently, in Section 1.3, we will extend
these concepts to systems of coupled ODEs. Acquiring a solid grasp of the concepts presented in this chapter will help you not only in programming your own ODE solvers but also in using the diverse range of readily available, general-purpose ODE solvers in Python and other programming languages.
Throughout the book we consider ODEs written on the general form

u′(t) = f(t, u(t)),   (1.1)

which means that the ODE is fully specified by the definition of the right-hand side function f(t, u). Examples of this function may be f(t, u) = u, describing exponential growth, or f(t, u) = αu(1 − u/R), the logistic growth model considered later in this chapter.
Notice that, for the sake of generality, we write all the right-hand sides of
the ODEs as functions of both t and u, even though the mathematical for-
mulations only involve u. This general formulation is not strictly necessary
in the mathematical equations, but it proves to be highly convenient when
we start programming and want to use the same solver for a diverse range of
ODE models. We will delve into this topic in greater detail later. Now, our
objective is to write functions and classes that accept the function f as input
and solve the corresponding ODE to generate the output u.
To ensure a unique solution for (1.1), it is necessary to specify the initial condition for u. This initial condition corresponds to the value of the solution at a specific time t = t₀. The resulting mathematical problem, known as an initial value problem (IVP), can be expressed as

u′(t) = f(t, u(t)),   u(t₀) = u₀.

To see why the initial condition is needed, consider the simple ODE

u′ = u.

The general solution of this equation is given by u(t) = Ceᵗ for any constant C, implying that there exist infinitely many solutions. However, by specifying an initial condition u(t₀) = u₀, we get C = u₀ and the unique solution u(t) = u₀eᵗ. When solving the equation numerically, it is necessary to define the initial condition u₀ in order to start our method and compute a solution at all.
A Simple and General Solver: the Forward Euler Method. A numerical method for (1.1) can be derived by using a finite difference approximation for the derivative in the equation u′ = f(t, u). To introduce this idea, let us assume that we have already computed u at discrete time points t₀, t₁, …, tₙ. At time tₙ we have the ODE

u′(tₙ) = f(tₙ, u(tₙ)),

and the derivative on the left-hand side can be approximated by the finite difference

u′(tₙ) ≈ (u(tₙ₊₁) − u(tₙ))/∆t,

where ∆t = tₙ₊₁ − tₙ is the time step. Inserting this approximation into the ODE yields the explicit update formula

u(tₙ₊₁) = u(tₙ) + ∆t f(tₙ, u(tₙ)).
This method, known as the Forward Euler (FE) method or the Explicit Euler
method, is the simplest numerical method for solving an ODE. The terms
forward and explicit refer to the fact that we have an explicit update formula
for u(tn+1 ) that only involves known quantities at time tn . In contrast, an
implicit ODE solver would have an update formula that includes terms like
f (tn+1 , u(tn+1 )), requiring the solution of a generally nonlinear equation to
determine the unknown u(tn+1 ). We will explore other explicit ODE solvers
in Chapter 2 and implicit solvers in Chapter 3.
To simplify the formula, we introduce the notation uₙ = u(tₙ), i.e., we let uₙ represent the numerical approximation to the exact solution u(t) at t = tₙ. With this notation, the update formula reads

uₙ₊₁ = uₙ + ∆t f(tₙ, uₙ).

For the simple example u′ = u we have f(tₙ, uₙ) = uₙ, and starting from the known initial value u₀ we can advance the solution one step at a time:

u₁ = u₀ + ∆t u₀,
u₂ = u₁ + ∆t u₁,
u₃ = u₂ + …,

and so on. For this simple example, the complete implementation in Python may look as follows:
import numpy as np
import matplotlib.pyplot as plt

N = 20        # number of time steps
T = 4         # end of the solution interval
dt = T / N    # time step
u0 = 1        # initial condition

t = np.zeros(N + 1)
u = np.zeros(N + 1)

u[0] = u0
for n in range(N):
    t[n + 1] = t[n] + dt
    u[n + 1] = (1 + dt) * u[n]

plt.plot(t, u)
plt.show()
Notice that there is no need to set t[0] = 0 when t is created in this way,
but it is important to update u[0]. Forgetting to do so is a common error
in ODE programming, so it is worth taking note of the line u[0] = u0. The
solution is shown in Figure 1.1 for two different choices of the time step ∆t.
As observed, the approximate solution improves as ∆t is reduced, although both solutions deviate from the exact solution. However, reducing the
time step further would easily yield a solution that is indistinguishable from
the exact solution.
The for-loop in the aforementioned example could also be implemented
differently, for instance
for n in range(1, N + 1):
    t[n] = t[n - 1] + dt
    u[n] = (1 + dt) * u[n - 1]
Here, the index n runs from 1 to N, and all the indices inside the loop have
been decreased by one to achieve the same outcome. In this simple case, it
is easy to verify that both loop formulations give the same result. However,
mixing up the two formulations can easily lead to errors, such as a loop that
exceeds the array bounds (resulting in an IndexError) or a loop where the
last elements of t and u are not computed. Although these errors may appear
trivial, they are common pitfalls when working with for-loops, and it is good to be aware of them.
Fig. 1.1 Solution of u′ = u, u(0) = 1 with ∆t = 0.4 (N = 10) and ∆t = 0.2 (N = 20).
Generalizing the algorithm to an arbitrary ODE only requires a small change in the formula for computing u[n+1] from u[n]. In the previous case we had f(t, u) = u, and to create a general-purpose ODE solver we simply replace u[n] with the more general f(t[n], u[n]). The following Python function implements this generic version of the FE method:²
import numpy as np

def forward_euler(f, u0, T, N):
    """Solve u' = f(t, u), u(0) = u0, with N steps until t = T."""
    t = np.zeros(N + 1)
    u = np.zeros(N + 1)
    u[0] = u0
    dt = T / N
    for n in range(N):
        t[n + 1] = t[n] + dt
        u[n + 1] = u[n] + dt * f(t[n], u[n])
    return t, u
This simple function can solve any ODE expressed in the form (1.1). The
right-hand side function f (t, u) must be implemented as a Python function,
which is then passed as an argument to forward_euler, along with the initial
condition u0, the stop time T and the number of time steps N. Inside the
function, the time step dt is calculated using T and N.
To illustrate the usage of the forward_euler function, let us apply it to
solve the same problem as before: u0 = u, with the initial condition u(0) = 1,
for t ∈ [0, 4]. The following code uses the forward_euler function to solve
this problem:
def f(t, u):
    return u

u0 = 1
T = 4
N = 30
t, u = forward_euler(f, u0, T, N)
The forward_euler function returns two arrays, t and u, which can be fur-
ther processed or plotted as desired. An important aspect to note in this code
is the definition of the right-hand side function f. As mentioned earlier, this
function should always be written with two arguments, t and u, although in
this case only u is used inside the function. The inclusion of both arguments
is necessary because we want our solver to be applicable for all ODEs of the form u′ = f(t, u). Therefore, inside the forward_euler function, the f function is called as f(t[n], u[n]). If the right-hand side function were defined as a function of u only, i.e., using def f(u):, an error would occur when the solver calls it with two arguments.
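As a quick illustration (a sketch, not part of the original text), calling the solver with such a one-argument function fails immediately:

def f(u):   # wrong signature: the t argument is missing
    return u

t, u = forward_euler(f, u0=1, T=4, N=30)
# -> TypeError: f() takes 1 positional argument but 2 were given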
² The source code for this function, as well as all subsequent solvers and examples, can be found here: https://sundnes.github.io/solving_odes_in_python/
1.2 The ODE Solver Implemented as a Class

We can increase the flexibility of the FE implementation by organizing the solver as a class. A class-based FE solver should meet the following requirements:
• The class should have a constructor (__init__) that accepts a single ar-
gument, the right-hand side function f, and stores it as an attribute.
• A method called set_initial_condition is required, which takes the
initial condition as argument and stores it.
• The class should have a solve method that takes the time interval t_span
and number of time steps N as arguments. This method implements the
for-loop for solving the ODE and returns the solution, similar to the
forward_euler function we presented earlier.
• The time step ∆t and the sequences tn , un must be initialized in one of the
methods, and it may also be convenient to store these as attributes. Since
the time interval and the number of steps are arguments to the solve
method, it is natural to perform these operations there.
In addition to the mentioned methods, it can be convenient to implement
a separate method, for instance called advance, for advancing the solution
one time step. This approach simplifies the implementation of new numerical
methods, as we often only need to modify the advance method. A first version
of the solver class can be implemented as follows:
import numpy as np

class ForwardEuler_v0:
    def __init__(self, f):
        self.f = f

    def set_initial_condition(self, u0):
        self.u0 = float(u0)

    def solve(self, t_span, N):
        """Compute solution for t_span[0] <= t <= t_span[1],
        using N time steps."""
        msg = 'Please call set_initial_condition before solve'
        assert hasattr(self, 'u0'), msg
        t0, T = t_span
        self.dt = (T - t0) / N
        self.t = np.zeros(N + 1)
        self.u = np.zeros(N + 1)
        self.t[0] = t0
        self.u[0] = self.u0
        for n in range(N):
            self.n = n
            self.t[n + 1] = self.t[n] + self.dt
            self.u[n + 1] = self.advance()
        return self.t, self.u

    def advance(self):
        """Advance the solution one time step."""
        # Create local variables to get rid of "self." in
        # the numerical formula
        u, dt, f, n, t = self.u, self.dt, self.f, self.n, self.t
        return u[n] + dt * f(t[n], u[n])
This class performs the same tasks as the forward_euler function mentioned
earlier, with the main advantage of the class implementation being the en-
hanced flexibility provided by the advance method. As we shall see later,
implementing a different numerical method typically only requires imple-
menting a new version of this method, leaving the rest of the code unchanged.
An additional improvement in the class implementation is the inclusion of an
assert statement within the solve method. This statement verifies that the
user has called set_initial_condition before calling solve. Forgetting to
do so is a common mistake, and the assert statement ensures that a useful
error message is raised rather than a less informative AttributeError.
We can also use a class to represent the right-hand side function f (t, u),
which is particularly convenient for functions with parameters. Consider, for
instance, the model for logistic growth:
u′(t) = αu(t)(1 − u(t)/R),   u(0) = u₀,   t ∈ [0, 40],

where α is the growth rate and R is the carrying capacity of the environment.
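The right-hand side with its two parameters α and R can be implemented as a class with a __call__ special method,⁴ so that instances behave like regular functions. A minimal sketch consistent with the main program below:

class Logistic:
    def __init__(self, alpha, R):
        self.alpha, self.R = alpha, R

    def __call__(self, t, u):
        # Right-hand side of the logistic growth ODE
        return self.alpha * u * (1 - u / self.R)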
The main program for solving the logistic growth problem may now look like:
problem = Logistic(alpha=0.2, R=1.0)
solver = ForwardEuler_v0(problem)
u0 = 0.1
solver.set_initial_condition(u0)
t, u = solver.solve(t_span=(0, 40), N=400)
⁴ Recall that if we equip a class with a special method named __call__, instances of the class will be callable and will behave like regular Python functions. See, for instance, Chapter 8 of [16] for a brief introduction to __call__ and other special methods.
1.3 Systems of ODEs

Up until now, our focus has been on solving ODEs with a single solution component, commonly known as scalar ODEs. However, many interesting processes can be described by systems of ODEs, which consist of multiple ODEs where the right-hand side of one equation depends on the solution of the others. Such equation systems are also referred to as vector ODEs. One simple example is
example is
u0 = v, u(0) = 1
v 0 = −u, v(0) = 0.
This idea extends naturally to a general system of m ODEs:

du⁽⁰⁾/dt = f⁽⁰⁾(t, u⁽⁰⁾, u⁽¹⁾, …, u⁽ᵐ⁻¹⁾),
du⁽¹⁾/dt = f⁽¹⁾(t, u⁽⁰⁾, u⁽¹⁾, …, u⁽ᵐ⁻¹⁾),
⋮
du⁽ᵐ⁻¹⁾/dt = f⁽ᵐ⁻¹⁾(t, u⁽⁰⁾, u⁽¹⁾, …, u⁽ᵐ⁻¹⁾).
To simplify the notation (and later the implementation), we can collect both the solutions u⁽ⁱ⁾(t) and the right-hand side functions f⁽ⁱ⁾ into vectors:

u = (u⁽⁰⁾, u⁽¹⁾, …, u⁽ᵐ⁻¹⁾)

and

f = (f⁽⁰⁾, f⁽¹⁾, …, f⁽ᵐ⁻¹⁾).
Note that f is now a vector-valued function. It takes m + 1 input arguments
(t and the m components of u) and returns a vector of m values. Using this
notation, the ODE system can be written

u′ = f(t, u),   u(0) = u₀,

where u and f are now vectors and u₀ is a vector of initial conditions. We observe that the notation used for scalar ODEs remains the same, and whether
we are solving a scalar or system of ODEs is determined by how we define
f and the initial condition u0 . This general notation is commonly employed
in ODE textbooks, and we can easily make the Python implementation just
as general. The use of NumPy arrays and vectorized computations greatly
simplifies the generalization process and enhances the efficiency of our ODE
solvers.
The FE update formula looks exactly as in the scalar case,

u[k + 1] = u[k] + dt * f(t[k], u[k]),

with the crucial difference that u[k], u[k+1], and f(t[k], u[k]) are now arrays.⁵ Since these are arrays, the solution u must be a two-dimensional array, and u[k], u[k+1], etc. are the rows of this array. The function f expects an array as its second argument, and must return a one-dimensional array containing all the right-hand sides f⁽⁰⁾, …, f⁽ᵐ⁻¹⁾. To gain a better feel for
how these arrays look and how they are used, let us compare the array holding
the solution of a scalar ODE with that of a system of two ODEs. For the scalar
equation, both t and u are one-dimensional NumPy arrays, and indexing into
u gives us numbers representing the solution at each time step. For instance,
in an interactive Python session we may have arrays t and u with the following
contents:
>>> t
array([0. , 0.4, 0.8, 1.2, ... ])
>>> u
array([1. , 1.4, 1.96, 2.744, ... ])
⁵ This compact notation requires that the solution vector u is represented by a NumPy array. We could, in principle, use lists to hold the solution components, but the resulting code would need to loop over the components and would be far less elegant and readable.
>>> u[1]
1.4

For a system of two ODEs, the solution u is a two-dimensional array, and indexing it with a single index returns a row holding all solution components at one time point:

>>> u[0]
array([1.0, 0.8])
>>> u[1]
array([1.4, 1.1])

We may also use the full slicing syntax, for instance u[1, :] instead of u[1], to make explicit which of the two array dimensions (or axes) we are indexing into.
1.4 A ForwardEuler Class for Systems of ODEs

The similarity between the generic mathematical notation for vector and
scalar ODEs, as well as the convenient algebra of NumPy arrays, suggests
that the implementation of the solver for scalar and system ODEs can be very
similar. Indeed, this is true, and the ForwardEuler_v0 class introduced earlier
can be modified with a few minor adjustments to work for ODE systems:
• Ensure that f(t,u) always returns an array.
• Inspect the initial condition u0 to determine if it is a single number (scalar) or a list/array/tuple. Based on this, create the array u as either a one-dimensional or two-dimensional array.⁶
If these two aspects are handled and initialized correctly, the remaining code
from Section 1.2 will work without any modifications.
The extended class implementation may look like:
import numpy as np
class ForwardEuler:
def __init__(self, f):
self.f = lambda t, u: np.asarray(f(t, u), float)
6
This step is not strictly needed, since we could use a two-dimensional array with
shape (N + 1, 1) for scalar ODEs. However, using a one-dimensional array for scalar
ODEs gives simpler and more intuitive indexing.
14 1 Programming a Simple ODE Solver
self.t[0] = t0
self.u[0] = self.u0
for n in range(N):
self.n = n
self.t[n + 1] = self.t[n] + self.dt
self.u[n + 1] = self.advance()
return self.t, self.u
def advance(self):
"""Advance the solution one time step."""
u, dt, f, n, t = self.u, self.dt, self.f, self.n, self.t
return u[n] + dt * f(t[n], u[n])
The constructor wraps the user's f so that it always returns a NumPy array, and the set_initial_condition method inspects the initial condition to determine the number of equations. Based on this, the solve method creates the solution u as either a one-dimensional or a two-dimensional array of the appropriate size. The actual for-loop and the advance method remain unchanged from the previous version of the class.
Example: ODE Model for a Pendulum. As an example, let us consider a
system of ODEs that models the motion of a simple pendulum, as illustrated
in Figure 1.3. This nonlinear system is a classic physics problem, and despite
its simplicity, it is not possible to find an exact analytical solution. The
system is formulated in terms of two main variables: the angle θ and the angular velocity ω, see Figure 1.3. For a simple pendulum with no friction,
the dynamics of these variables are governed by
dθ/dt = ω,   (1.3)
dω/dt = −(g/L) sin θ,   (1.4)
where L denotes the length of the pendulum and g represents the gravi-
tational constant. Eq. (1.3) follows directly from the definition of angular
velocity, while (1.4) follows from Newton’s second law, where dω/dt is the
acceleration and the right-hand side is the tangential component of the grav-
itational force acting on the pendulum, divided by its mass. To solve the
system we need to define initial conditions for θ and ω, i.e., we need to know
the initial position and velocity of the pendulum.
Fig. 1.3 Illustration of the pendulum problem. The main variables of interest are the
angle θ and its derivative ω (the angular velocity).
class Pendulum:
    def __init__(self, L, g=9.81):
        self.L = L
        self.g = g

    def __call__(self, t, u):
        theta, omega = u
        dtheta = omega
        domega = -self.g / self.L * np.sin(theta)
        return [dtheta, domega]
We observe that the function returns a list. However, this list will be auto-
matically wrapped into a function returning an array by the constructor of
the solver class, as mentioned above. The main program remains quite sim-
ilar to the examples presented earlier, with the exception that we now need
to define an initial condition with two components. Assuming that this class definition as well as the ForwardEuler class exist in the same file, the code to solve
the pendulum problem can look like this:
import matplotlib.pyplot as plt

problem = Pendulum(L=1)
solver = ForwardEuler(problem)
solver.set_initial_condition([np.pi / 4, 0])
T = 10
N = 1000
t, u = solver.solve(t_span=(0, T), N=N)

plt.plot(t, u[:, 0], label=r'$\theta$')
plt.plot(t, u[:, 1], label=r'$\omega$')
plt.legend()
plt.show()
Notice that in order to extract and plot each solution component, we need to
index into the second dimension of u, using array slicing. If we were to use
the first index, such as u[0] or u[0,:], it would return an array of length two
containing the solution components at the first time point. In this specific
example, a call like plt.plot(t, u) would also work and would plot both
solution components. However, there are cases where we are interested in
plotting specific components of the solution, and in such cases, array slicing
becomes necessary. The resulting plot is shown in Figure 1.4. Additionally,
it is worth mentioning the use of Python’s raw string format for the labels,
indicated by the r in front of the string. Raw strings treat the backslash (\)
as a regular character and are often needed when using LaTeX encoding for
mathematical symbols. Furthermore, an observant reader may notice that
the amplitude of the pendulum motion appears to increase over time, which
is clearly not physically accurate. In reality, for an undamped pendulum
problem defined by equations (1.3)-(1.4), the energy is conserved, and the
amplitude should remain constant. The increasing amplitude is a numerical
artifact introduced by the FE method, and the solution may be improved by
reducing the time step or using a different numerical method.
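One way to quantify this artifact (a sketch, not from the original text) is to monitor the pendulum's mechanical energy per unit mass along the computed solution, using the same L and g as above:

# Kinetic plus potential energy per unit mass; this quantity
# is constant for the exact solution of (1.3)-(1.4)
L, g = 1.0, 9.81
theta, omega = u[:, 0], u[:, 1]
E = 0.5 * (L * omega)**2 + g * L * (1 - np.cos(theta))

plt.plot(t, E)
plt.show()

With the FE method the computed energy grows steadily, confirming that the increasing amplitude is a numerical artifact rather than a property of the model.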
Fig. 1.4 Solution of the simple pendulum problem, computed with the forward Euler method.

1.5 Checking the Error in the Numerical Solution
Recall that the FE method was derived by approximating the derivative with the finite difference

u′(tₙ) ≈ (u(tₙ₊₁) − u(tₙ))/∆t.   (1.5)
This approximation obviously introduces an error, and since we approach
the true derivative as ∆t → 0, it is intuitive that the error depends on the
size of ∆t. We visually demonstrated this relationship in Figure 1.1, but it
would be valuable to have a way of more precisely quantifying how the error
depends on the time step. Analyzing the error in numerical methods is a
broad field within applied mathematics, which we will not cover in detail
here, and the interested reader is referred to, for instance, [8]. However, when
implementing a numerical method it is very useful to know its theoretical
accuracy, and in particular to be able to compute the error and verify that
the method performs as expected.
While the choice of error norm may be important for certain cases, it is usually
not crucial for practical applications, and all the different error measures can
generally be expected to behave as predicted by the theory. For simplicity, we
will use an even simpler error measure in our example, where we compute the
error at the final time T , given by e = |uN − û(tN )|. Using the ForwardEuler
class introduced above, the complete code for checking the convergence can
be written as follows:
from forward_euler_class_v1 import ForwardEuler
import numpy as np

def rhs(t, u):
    return u

def exact(t):
    return np.exp(t)

solver = ForwardEuler(rhs)
solver.set_initial_condition(1.0)

T = 3.0
t_span = (0, T)
N = 30

print('Time step (dt)   Error (e)    e/dt')
for _ in range(10):
    t, u = solver.solve(t_span, N)
    dt = T / N
    e = abs(u[-1] - exact(T))
    print(f'{dt:<14.7f} {e:<12.7f} {e/dt:5.4f}')
    N = N * 2
Most of the lines in the code are identical to the previous programs. How-
ever, we have enclosed the call to the solve method within a for loop, and
the last line ensures that the number of time steps N is doubled for each
iteration of the loop. Also, note the f-string format specifiers used, such as
{dt:<14.7f}, which specifies that the output should be a left-aligned decimal number, 14 characters wide with seven decimal places.

1.6 Using ODE Solvers from SciPy

The SciPy library⁷ offers a collection of robust, general-purpose ODE solvers through the solve_ivp function. To solve the pendulum problem from Section 1.4 with solve_ivp, the code may look as follows:
from scipy.integrate import solve_ivp

problem = Pendulum(L=1)
t_span = (0, 10.0)
u0 = (np.pi / 4, 0)
solution = solve_ivp(problem, t_span, u0)

plt.plot(solution.t, solution.y[0, :])
plt.plot(solution.t, solution.y[1, :])
plt.legend([r'$\theta$', r'$\omega$'])
plt.show()
Running this code will generate a plot similar to Figure 1.5, and we ob-
serve that the solution does not appear as smooth as the one obtained from
the ForwardEuler solver introduced earlier. This seeming discrepancy is due
to the nature of the solve_ivp solver, which is an adaptive solver that au-
tomatically selects the time step to meet a specified error tolerance. The
default value of this tolerance is relatively large, leading to the solver using
very few time steps and resulting in jagged-looking solution plots. Comparing
the plot with a highly accurate numerical solution, represented by the two
dotted curves in Figure 1.5, we notice that the solution at the specified time
points tn is fairly accurate. However, the visual appearance is compromised
by the linear interpolation between these time points. To obtain a more vi-
sually appealing solution, there are several approaches we can take. We may,
for instance, pass the function an additional argument t_eval, which is a
NumPy array containing the desired time points for evaluating the solution:
t_eval = np.linspace(0, 10.0, 1001)
solution = solve_ivp(problem, t_span, u0, t_eval=t_eval)
⁷ See https://scipy.org/
Fig. 1.5 Solution of the simple pendulum problem, computed with the SciPy
solve_ivp function and the default tolerance.
Alternatively, we can reduce the error tolerance of the solver, for instance,
by setting
rtol = 1e-6
solution = solve_ivp(problem, t_span, u0, rtol=rtol)
This latter call will reduce the relative tolerance rtol from its default value
of 1e-3 (0.001). We could also adjust the absolute tolerance using the param-
eter atol. We will not cover all the possible arguments and options to the
solve_ivp function here, but it is worth mentioning that we can also change
the numerical method used by the function, by passing in a parameter named
method. For instance, a call like
solution = solve_ivp(problem, t_span, u0, method='Radau')
will replace the default solver (called RK45) with an implicit Radau ODE
solver, which we will cover in Chapter 3. For a complete description of pa-
rameters accepted by the solve_ivp function we recommend referring to the
online documentation available on the SciPy website.
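Another convenient option (a sketch, not covered above) is to request dense output, which makes solve_ivp return a continuous interpolant that can be evaluated at arbitrary time points:

solution = solve_ivp(problem, t_span, u0, dense_output=True)

t_fine = np.linspace(0, 10.0, 1001)
u_fine = solution.sol(t_fine)     # array of shape (2, 1001)
plt.plot(t_fine, u_fine[0], t_fine, u_fine[1])
plt.show()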
Chapter 2
Improving the Accuracy
All the solvers considered in this chapter belong to the class of explicit Runge-Kutta (RK) methods. Written in the standard RK form, the FE method reads

k₁ = f(tₙ, uₙ),
uₙ₊₁ = uₙ + ∆t k₁.
It can be observed that this is the same formula as introduced earlier, and
there is no real advantage in writing the formula in two lines instead of one.
However, this alternative formulation aligns with the typical representation
of RK methods and facilitates understanding the relationship between the
FE method and more advanced solvers. The intermediate value k1 is often
referred to as a stage derivative in the ODE literature.
To enhance the accuracy of the FE method to second order, i.e., with error proportional to ∆t², we can employ more accurate approximations of the integral in (2.1). One option is to maintain the assumption that f(t, u(t)) is constant over tₙ ≤ t* ≤ tₙ₊₁, but to approximate it at the midpoint of the interval instead of the left end. This approach requires one additional stage:

k₁ = f(tₙ, uₙ),   (2.2)
k₂ = f(tₙ + ∆t/2, uₙ + (∆t/2) k₁),   (2.3)
uₙ₊₁ = uₙ + ∆t k₂.   (2.4)
This method is known as the explicit midpoint method or the modified Euler
method. The first step is identical to that of the FE method, but instead of
using the stage derivative k1 to advance the solution to the next step, we use
it to calculate an intermediate midpoint solution
u_{n+1/2} = uₙ + (∆t/2) k₁.
This solution is then used to compute the corresponding stage derivative k2 ,
which serves as an approximation to the derivative of u at time tn + ∆t/2.
Finally, we use this midpoint derivative to advance the solution to tn+1 .
Another second-order method is Heun’s method, also known as the explicit
trapezoidal method, which can be derived by approximating the integral in
equation (2.1) using the trapezoidal rule:
k₁ = f(tₙ, uₙ),   (2.5)
k₂ = f(tₙ + ∆t, uₙ + ∆t k₁),   (2.6)
uₙ₊₁ = uₙ + (∆t/2)(k₁ + k₂).   (2.7)
This method also computes two stage derivatives k1 and k2 . However, note
that the formula for k2 approximates the derivative at tn+1 rather than at
the midpoint tn + ∆t/2. The solution is then advanced from tn to tn+1 using
the mean value of k1 and k2 .
All RK methods follow the same recipe as the two second-order methods
considered above; we calculate one or more intermediate values (i.e., stage
derivatives) and then advance the solution using a combination of these stage
derivatives. The method’s accuracy can be improved by adding more stages.
A general RK method with s stages can be written as
kᵢ = f(tₙ + cᵢ∆t, uₙ + ∆t ∑_{j=1}^{s} aᵢⱼ kⱼ),   for i = 1, …, s,   (2.8)

uₙ₊₁ = uₙ + ∆t ∑_{i=1}^{s} bᵢ kᵢ.   (2.9)
The coefficients aᵢⱼ, bᵢ, and cᵢ completely define an RK method, and they are conveniently collected in a so-called Butcher tableau:

c₁ | a₁₁ ··· a₁ₛ
 ⋮ |  ⋮       ⋮
cₛ | aₛ₁ ··· aₛₛ
---+------------
   | b₁  ··· bₛ
The Butcher tableaus of the three methods discussed above (FE, explicit midpoint, and Heun's method) are, respectively:

0 | 0
--+--
  | 1

 0  |  0   0
1/2 | 1/2  0
----+-------
    |  0   1

0 | 0   0
1 | 1   0
--+--------
  | 1/2 1/2
The most famous RK method of all is the classic fourth-order method:

k₁ = f(tₙ, uₙ),   (2.10)
k₂ = f(tₙ + ∆t/2, uₙ + (∆t/2) k₁),   (2.11)
k₃ = f(tₙ + ∆t/2, uₙ + (∆t/2) k₂),   (2.12)
k₄ = f(tₙ + ∆t, uₙ + ∆t k₃),   (2.13)
uₙ₊₁ = uₙ + (∆t/6)(k₁ + 2k₂ + 2k₃ + k₄).   (2.14)
As mentioned earlier, all the methods discussed in this chapter are explicit methods, meaning that aᵢⱼ = 0 for j ≥ i. Examining equations (2.10)-(2.14) or the general formula (2.8) more closely, we observe that this condition implies that each stage derivative kᵢ only depends on previously computed stage derivatives. Consequently, all kᵢ can be computed sequentially using explicit formulas. In contrast, for implicit RK methods, aᵢⱼ ≠ 0 for some j ≥ i. As seen in equation (2.8), the formula for computing kᵢ will then include kᵢ on the right-hand side, as part of the argument to the function f. Therefore, equations need to be solved to compute the stage derivatives, and since f is typically nonlinear, we need to solve these equations with an iterative solver such as Newton's method. These steps make implicit RK methods considerably more expensive per time step, but in return they offer much better stability, as we will see in Chapter 3.

2.2 A Class Hierarchy of Runge-Kutta Methods

A collection of explicit RK solvers is conveniently implemented as a class hierarchy, where a general base class handles all administrative tasks that are common to the methods:
import numpy as np

class ODESolver:
    def __init__(self, f):
        # Wrap user's f in a new function that always
        # converts list/tuple to array (or let array be array)
        self.model = f
        self.f = lambda t, u: np.asarray(f(t, u), float)

    def set_initial_condition(self, u0):
        if np.isscalar(u0):              # scalar ODE
            self.neq = 1
            u0 = float(u0)
        else:                            # system of ODEs
            u0 = np.asarray(u0)
            self.neq = u0.size
        self.u0 = u0

    def solve(self, t_span, N):
        msg = 'Please call set_initial_condition before solve'
        assert hasattr(self, 'u0'), msg
        t0, T = t_span
        self.dt = (T - t0) / N
        self.t = np.zeros(N + 1)
        if self.neq == 1:
            self.u = np.zeros(N + 1)
        else:
            self.u = np.zeros((N + 1, self.neq))
        self.t[0] = t0
        self.u[0] = self.u0
        for n in range(N):
            self.n = n
            self.t[n + 1] = self.t[n] + self.dt
            self.u[n + 1] = self.advance()
        return self.t, self.u

    def advance(self):
        raise NotImplementedError(
            "Advance method is not implemented in the base class")

The FE method is then realized as a minimal subclass that only implements the advance method:

class ForwardEuler(ODESolver):
    def advance(self):
        u, f, n, t = self.u, self.f, self.n, self.t
        dt = self.dt
        return u[n] + dt * f(t[n], u[n])
Similarly, the explicit midpoint method and the fourth-order RK method can
be subclasses, each implementing a single method:
class ExplicitMidpoint(ODESolver):
    def advance(self):
        u, f, n, t = self.u, self.f, self.n, self.t
        dt = self.dt
        dt2 = dt / 2.0
        k1 = f(t[n], u[n])
        k2 = f(t[n] + dt2, u[n] + dt2 * k1)
        return u[n] + dt * k2


class RungeKutta4(ODESolver):
    def advance(self):
        u, f, n, t = self.u, self.f, self.n, self.t
        dt = self.dt
        dt2 = dt / 2.0
        k1 = f(t[n], u[n])
        k2 = f(t[n] + dt2, u[n] + dt2 * k1)
        k3 = f(t[n] + dt2, u[n] + dt2 * k2)
        k4 = f(t[n] + dt, u[n] + dt * k3)
        return u[n] + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
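The general formulation (2.8)-(2.9) also suggests an alternative design: a single generic solver class parameterized by its Butcher tableau. The following sketch illustrates the idea (it is not part of the ODESolver module above; the constructor arguments a, b, c are the tableau coefficients, with a strictly lower-triangular a for explicit methods):

class ExplicitRungeKutta(ODESolver):
    """Generic explicit RK solver defined by a Butcher tableau."""
    def __init__(self, f, a, b, c):
        super().__init__(f)
        self.a, self.b, self.c = a, b, c

    def advance(self):
        u, f, n, t = self.u, self.f, self.n, self.t
        dt, a, b, c = self.dt, self.a, self.b, self.c
        s = len(b)
        k = [None] * s
        for i in range(s):
            # Weighted sum of previously computed stages (empty for i = 0)
            du = sum(a[i][j] * k[j] for j in range(i))
            k[i] = f(t[n] + c[i] * dt, u[n] + dt * du)
        return u[n] + dt * sum(b[i] * k[i] for i in range(s))

For instance, ExplicitRungeKutta(f, a=[[0, 0], [1/2, 0]], b=[0, 1], c=[0, 1/2]) reproduces the explicit midpoint method defined above.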
The following program applies all three solvers to the exponential growth problem u′ = u, u(0) = 1, and plots the solutions:

import matplotlib.pyplot as plt
from ODESolver import ForwardEuler, ExplicitMidpoint, RungeKutta4

def f(t, u):
    return u

t_span = (0, 3)
N = 6

fe = ForwardEuler(f)
fe.set_initial_condition(u0=1)
t1, u1 = fe.solve(t_span, N)
plt.plot(t1, u1, label='Forward Euler')

em = ExplicitMidpoint(f)
em.set_initial_condition(u0=1)
t2, u2 = em.solve(t_span, N)
plt.plot(t2, u2, label='Explicit Midpoint')

rk4 = RungeKutta4(f)
rk4.set_initial_condition(u0=1)
t3, u3 = rk4.solve(t_span, N)
plt.plot(t3, u3, label='Runge-Kutta 4')

plt.legend()
plt.show()
This code will solve the same simple equation using three different methods,
and plot the solutions in the same window, as shown in Figure 2.1. To em-
phasize the disparity in accuracy between the methods, we have set N = 6,
resulting in a very large time step (∆t = 0.5).
Fig. 2.1 Numerical solutions of the exponential growth problem, computed with ForwardEuler, ExplicitMidpoint and RungeKutta4. All the solvers use ∆t = 0.5, to highlight the difference in accuracy.
import numpy as np
from ODESolver import *

def rhs(t, u):
    return u

def exact(t):
    return np.exp(t)

T = 3.0
t_span = (0, T)

# The four solver classes and their theoretical orders of accuracy
# (Heun is assumed implemented as another ODESolver subclass)
solver_classes = [(ForwardEuler, 1), (ExplicitMidpoint, 2),
                  (Heun, 2), (RungeKutta4, 4)]

for solver_class, order in solver_classes:
    solver = solver_class(rhs)
    solver.set_initial_condition(1.0)
    N = 30
    print(f'{solver_class.__name__}, order = {order}')
    print(f'Time step (dt) Error (e) e/dt**{order}')
    for _ in range(10):
        t, u = solver.solve(t_span, N)
        dt = T / N
        e = abs(u[-1] - exact(T))
        if e < 1e-13:  # break if error is close to machine precision
            break
        print(f'{dt:<14.7f} {e:<12.7f} {e/dt**order:5.4f}')
        N = N * 2
The code is nearly identical to the FE convergence test in Section 1.5, with
the only difference being that we loop over a list of tuples containing the four
method classes and their corresponding orders. The output is also similar
to the previous version, but now repeated for all four solvers. The built-in
class attribute __name__ is used to extract and print the name of each solver.
Three columns are displayed, representing the time step ∆t, the error e at
time t = 3.0, and finally e/∆tᵖ, where p is the order of the method. The output matches the expected values for the first two methods, as the numbers in the rightmost column remain approximately constant as ∆t is reduced; for the higher-order methods the error quickly approaches machine precision, where round-off errors make the ratios less clean.

2.3 Testing the Solvers

Another way to verify the solvers is to apply them to a problem that they should solve exactly. Standard RK methods reproduce a solution that is linear in t exactly, and we can construct a test problem with a known linear solution:
from ODESolver import *

a = 0.2
b = 3

def f(t, u):
    # The added term vanishes when u equals the exact solution,
    # so u(t) = a*t + b solves the ODE exactly
    return a + (u - u_exact(t))**5

def u_exact(t):
    """Exact u(t) corresponding to f above."""
    return a * t + b

u0 = u_exact(0)
T = 8
N = 10
tol = 1E-14
t_span = (0, T)
solver_classes = [ForwardEuler, ExplicitMidpoint, Heun, RungeKutta4]

for solver_class in solver_classes:
    solver = solver_class(f)
    solver.set_initial_condition(u0)
    t, u = solver.solve(t_span, N)
    u_e = u_exact(t)
    max_error = abs(u_e - u).max()
    msg = f'{solver_class.__name__} failed, error={max_error}'
    assert max_error < tol, msg
Similar to the convergence check illustrated above, this code will loop through all the solver classes, solve the simple ODE, and check that the resulting error falls within the specified tolerance.
Both of the methods shown here for verifying the implementation of our
solvers have certain limitations. The most important one is that they both
solve very simple ODEs, and it is possible to introduce errors in the code that
may only manifest themselves when dealing with more complex problems.
However, the methods presented here offer the advantages of simplicity and
generality, and they can be applied to any newly implemented ODE solver
class. Many common implementation errors, such as incorrectly specifying a
single parameter in an RK method, will often become apparent even when
solving these simple problems. Therefore, these methods can provide an initial
indication of whether the implementation is correct, which can be followed
by more extensive tests if needed.
Chapter 3
Stable Solvers for Stiff ODE Systems
Explicit RK methods like the ones introduced in Chapter 2 struggle with a class of problems known as stiff ODEs, where the time step is limited by stability rather than accuracy. A classic example is the Van der Pol model (3.1)-(3.2), which may be implemented and solved with the FE method as follows:

import matplotlib.pyplot as plt
import numpy as np
from ODESolver import ForwardEuler

class VanderPol:
    def __init__(self, mu):
        self.mu = mu

    def __call__(self, t, u):
        x, v = u
        return [v, self.mu * (1 - x**2) * v - x]

model = VanderPol(mu=1)
solver = ForwardEuler(model)
solver.set_initial_condition([1, 0])
t, u = solver.solve(t_span=(0, 20), N=1000)   # dt = 0.02 (interval assumed)

plt.plot(t, u)
plt.show()
Figure 3.1 shows the solutions of the Van der Pol equation for µ = 0, 1 and
5. When the parameter µ is set even higher, such as µ = 50, the solution
diverges (becomes unstable) with the given time step (∆t = 0.02). Although
using a more accurate ERK method instead of the FE method may provide
some improvement, it does not resolve the issue significantly. It does help
to reduce the time step considerably, but the resulting computation time
may be substantial. In this problem, the time step is determined by stability
requirements rather than the desired accuracy, and opting for a solver that is
more stable than the previously discussed ERK methods may yield significant
benefits.
Before introducing more stable solvers, it is useful to examine the observed
stability problems in more detail. Why does the solution of the Van der Pol
model deteriorate significantly for large values of µ? More generally, what
are the properties of an ODE system that make it stiff? To address these
questions, it is useful to start with a simpler problem than the Van der Pol
model. Consider, for instance, a simple IVP known as the Dahlquist test
equation:¹

u′ = λu,   u(0) = 1,   (3.3)

where λ is a constant that may, in general, be a complex number.

¹ Note that the implementation of the solvers in this book does not support solving this ODE for complex λ. However, considering complex values in the stability analysis is still important because, for systems of ODEs, the relevant values are the eigenvalues of the right-hand side, and these may be complex.
Fig. 3.1 Solutions of the Van der Pol model for different values of µ.
we primarily focus on λ values with a negative real part, i.e., either real or complex λ values that satisfy ℜ(λ) < 0. In such cases, the solution of equation (3.3) decays over time and remains stable. However, we will discover that the numerical solutions may not always preserve this stability.

Following the definition in [1], we classify problem (3.3) as stiff for an interval [0, b] if the real part of λ satisfies

b ℜ(λ) ≪ −1.
For more general nonlinear problems, such as the Van der Pol model in (3.1)-(3.2), the stiffness of the system is determined by the eigenvalues λᵢ of the local Jacobian matrix J, which is the matrix of partial derivatives of the right-hand side function f. The Jacobian is defined by

Jᵢⱼ = ∂fᵢ(t, y)/∂yⱼ,

and the problem is classified as stiff on [0, b] if b ℜ(λᵢ) ≪ −1 for one or more of the eigenvalues λᵢ of J.
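For a concrete problem, the Jacobian and its eigenvalues are easily computed numerically. The following sketch (not from the original text) does this for the Van der Pol model with µ = 50, evaluated at an assumed representative point on the solution trajectory:

import numpy as np

mu = 50

def jacobian(t, u):
    # Partial derivatives of the Van der Pol right-hand side
    # f(t, u) = [v, mu*(1 - x**2)*v - x] with respect to (x, v)
    x, v = u
    return np.array([[0.0, 1.0],
                     [-2 * mu * x * v - 1.0, mu * (1 - x**2)]])

lam = np.linalg.eigvals(jacobian(0.0, [2.0, 1.0]))
print(lam)   # one eigenvalue has a large negative real part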
These definitions highlight that the stiffness of a problem depends not only
on the ODE itself, but also on the length of the solution interval (b), which
may seem somewhat surprising. To understand why the interval of interest
is important, let us consider the equation (3.3). If λ is large and negative, we
need to choose a small ∆t to maintain stability of explicit solvers, as we will
discuss in more detail later. However, if our goal is to solve the equation over a
very small time interval, i.e., b is small, using a small ∆t is not a problem, and
according to the definition above the problem will no longer be considered
stiff. In addition to these definitions, the ODE literature also provides more
pragmatic definitions of stiffness. For example, an equation is often classified
as stiff if the time step needed to maintain stability of an explicit method is
much smaller than the time step dictated by the accuracy requirements [1,2].
For a more comprehensive discussion of stiff ODE systems, refer to [1, 9].
Equation (3.3) serves as the foundation for linear stability analysis, a valu-
able technique for analyzing and understanding the stability of ODE solvers.
The solution to this equation is given by u(t) = eλt , which grows rapidly if
λ has a positive real part. Therefore, our primary interest lies in the case
where ℜ(λ) < 0, for which the analytical solution is stable, but our choice of solver may introduce numerical instabilities. When the FE method is applied to (3.3), we obtain the update formula

uₙ₊₁ = uₙ + ∆tλuₙ = (1 + ∆tλ)uₙ,

and for the first step, with the initial condition u(0) = 1, we have

u₁ = 1 + ∆tλ.   (3.4)
The analytical solution decays exponentially for ℜ(λ) < 0, and it is natural to require that the numerical solution decreases monotonically. This leads to the requirement |1 + ∆tλ| ≤ 1. When λ is a negative real number, the time step must satisfy ∆t ≤ −2/λ to ensure stability. It is important to note that meeting this stability criterion does not necessarily guarantee a highly accurate solution; the numerical solution may oscillate and differ substantially from the exact solution. Nevertheless, by selecting ∆t to satisfy the stability criterion, we ensure that the solution, along with any spurious oscillations or other numerical artifacts, decays with time.
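To see the criterion in action, the following small experiment (a sketch, not from the original text) solves u′ = −50u with time steps on either side of the stability limit ∆t = 2/50 = 0.04:

from ODESolver import ForwardEuler

solver = ForwardEuler(lambda t, u: -50 * u)
solver.set_initial_condition(1.0)

# dt = 0.05 > 0.04: |1 + dt*lambda| = 1.5, the solution oscillates and grows
t1, u1 = solver.solve(t_span=(0, 1), N=20)

# dt = 0.01 < 0.04: |1 + dt*lambda| = 0.5, the solution decays monotonically
t2, u2 = solver.solve(t_span=(0, 1), N=100)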
We have observed that the right-hand side of (3.4) contains critical infor-
mation about the stability of the FE method. This expression is commonly
referred to as the stability function or amplification factor of the method,
and is often written as
R(z) = 1 + z.
For the FE method to be stable, all values of λ∆t must satisfy |R(λ∆t)| < 1.
This region of λ∆t values in the complex plane is referred to as the method’s
region of absolute stability or its stability region. The stability region for the
FE method is shown in the left panel of Figure 3.2, taking the form of a circle of radius 1 in the complex plane, centered at z = −1.
Fig. 3.2 Stability regions for explicit Runge-Kutta methods. From left: forward Euler,
explicit midpoint, and the fourth order method given by (2.10)-(2.14).
We can easily extend the linear stability analysis to the other explicit RK
methods introduced in Chapter 2. For instance, applying a single step of the
explicit midpoint method given by (2.2)-(2.4) to (3.3) gives
u(∆t) = 1 + λ∆t + (λ∆t)²/2,
and we identify the stability function for this method as
R(z) = 1 + z + z²/2.
The corresponding stability region is shown in the middle panel of Figure 3.2.
For the fourth-order RK method defined in (2.10)-(2.14), the same steps yield
the stability function
R(z) = 1 + z + z²/2 + z³/6 + z⁴/24,
and the stability region is shown in the right panel of Figure 3.2. We ob-
serve that the stability regions of these higher-order RK methods are slightly
larger than that of the FE method. However, the difference is not very large,
and when also considering the computational cost of each time step, the FE
method is usually superior for problems where the time step is governed by
stability.
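The stability regions in Figure 3.2 are easy to reproduce numerically. The following sketch (not from the original text) evaluates the three stability functions derived above on a grid in the complex plane and draws the boundary curves |R(z)| = 1:

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-4, 2, 401)
y = np.linspace(-4, 4, 401)
X, Y = np.meshgrid(x, y)
Z = X + 1j * Y    # grid of complex values z = lambda * dt

R_funcs = {
    'Forward Euler': 1 + Z,
    'Explicit midpoint': 1 + Z + Z**2 / 2,
    'RK4': 1 + Z + Z**2 / 2 + Z**3 / 6 + Z**4 / 24,
}
for name, R in R_funcs.items():
    # |R(z)| = 1 is the boundary of the stability region
    plt.contour(X, Y, np.abs(R), levels=[1.0])

plt.xlabel(r'Re$(\lambda \Delta t)$')
plt.ylabel(r'Im$(\lambda \Delta t)$')
plt.show()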
It can be shown that the stability function for an s-stage explicit RK
method is always a polynomial of degree ≤ s, and it can easily be verified
that the stability region defined by such a polynomial will never be very large.
To obtain a significant improvement of this situation, we need to replace the
explicit methods discussed so far with implicit RK methods.
A significant improvement in stability is obtained with the simplest implicit method, the backward Euler method, uₙ₊₁ = uₙ + ∆t f(tₙ₊₁, uₙ₊₁). Applying it to the test equation (3.3) gives

uₙ₊₁(1 − ∆tλ) = uₙ,

so the stability function is R(z) = 1/(1 − z). The corresponding stability region covers everything outside the circle of radius 1 centered at z = 1, including the entire left half-plane, as shown in the left panel of Figure 3.3.

The explicit midpoint and trapezoidal methods mentioned earlier also have their implicit counterparts. The implicit midpoint method is given by

k₁ = f(tₙ + ∆t/2, uₙ + (∆t/2) k₁),
uₙ₊₁ = uₙ + ∆t k₁,

and the implicit trapezoidal method, in this formulation often referred to as the Crank-Nicolson method, reads

k₁ = f(tₙ, uₙ),   (3.10)
k₂ = f(tₙ + ∆t, uₙ + ∆t k₂),   (3.11)
uₙ₊₁ = uₙ + (∆t/2)(k₁ + k₂).   (3.12)
Fig. 3.3 Stability regions for the backward Euler method (left) and the implicit mid-
point method and trapezoidal method (right).
Note that this formulation of the Crank-Nicolson method is not very common,
it can be simplified by eliminating the stage derivatives and defining the
method in terms of un and un+1 . However, the given formulation in (3.10)-
(3.12) highlights its implicit RK nature. The implicit nature of these methods
is apparent from the formulas, as one of the stage derivatives must be found
by solving an equation involving the nonlinear function f instead of using
an explicit update formula. The Butcher tableaus of the three methods are
given by (3.13), from left to right for backward Euler, implicit midpoint, and the implicit trapezoidal method:

1 | 1        1/2 | 1/2        0 | 0   0
--+--        ----+----        1 | 0   1        (3.13)
  | 1            |  1         --+--------
                                | 1/2 1/2
The implicit midpoint method and the implicit trapezoidal method share
the same stability function, given by R(z) = (2 + z)/(2 − z). The correspond-
ing stability domain covers the entire left half-plane of the complex plane, as
shown in the right panel of Figure 3.3. Both the implicit midpoint method
and the trapezoidal method are therefore A-stable methods. However, since
R(z) → 1 as z → −∞, these methods lack stiff decay and are therefore not L-stable.

3.3 Implementing Implicit Runge-Kutta Methods

In this section we will implement the backward Euler method, which is the simplest implicit method, but we will keep the
implementation sufficiently general to be easily extendable to more advanced
implicit methods. For a more detailed discussion on solver optimization and
choices to enhance computational performance, interested readers can refer
to references [1, 9].
When examining the ODESolver class introduced in Chapter 2, we can ob-
serve that many administrative tasks involved in RK methods are the same
for both implicit and explicit methods. Specifically, the initialization of so-
lution arrays and the for-loop that advances the solution remain unchanged.
However, advancing the solution from one step to the next differs signifi-
cantly. Therefore, it is convenient to implement implicit solvers within the
existing class hierarchy and let the ODESolver superclass handle the tasks of
initializing the solver and the main solver loop. The different explicit meth-
ods introduced in Chapter 2 were realized through different implementations
of the advance method. We can use the same approach for implicit methods,
but since each step in implicit methods involves a few more operations it is
useful to introduce a couple of additional methods. For instance, a concise
implementation of the backward Euler method could appear as follows:
from ODESolver import *
from scipy.optimize import root

class BackwardEuler(ODESolver):
    def stage_eq(self, k):
        u, f, n, t = self.u, self.f, self.n, self.t
        dt = self.dt
        return k - f(t[n] + dt, u[n] + dt * k)

    def solve_stage(self):
        u, f, n, t = self.u, self.f, self.n, self.t
        k0 = f(t[n], u[n])
        sol = root(self.stage_eq, k0)
        return sol.x

    def advance(self):
        u, f, n, t = self.u, self.f, self.n, self.t
        dt = self.dt
        k1 = self.solve_stage()
        return u[n] + dt * k1
The stage_eq method defines the nonlinear stage equation, and the solve_stage method uses SciPy's root function to solve it. The root function is a general tool for solving nonlinear
equations of the form g(x) = 0, and we apply it to solve the stage equa-
tion k1 − f (tn + ∆t, un + ∆tk1 ) = 0. The function returns an object of the
OptimizeResult class, which includes the solution as an attribute x, along
with numerous other attributes containing information about the solution
process. For further details on the OptimizeResult and the root function,
we refer to the SciPy documentation.
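As a small standalone illustration of the root function (not from the original text), consider solving the scalar equation x − cos x = 0:

import numpy as np
from scipy.optimize import root

sol = root(lambda x: x - np.cos(x), 1.0)   # initial guess x0 = 1.0
print(sol.x)        # -> array([0.73908513])
print(sol.success)  # -> True if the iterations converged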
Fig. 3.4 Solutions of the Van der Pol model for µ = 10, using the forward and backward
Euler methods with ∆t = 0.04.
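The Crank-Nicolson method (3.10)-(3.12) can reuse most of this machinery. A minimal sketch, consistent with the discussion that follows, implements it as a subclass of BackwardEuler that only redefines advance:

class CrankNicolson(BackwardEuler):
    def advance(self):
        u, f, n, t = self.u, self.f, self.n, self.t
        dt = self.dt
        k1 = f(t[n], u[n])        # explicit stage, cf. (3.10)
        k2 = self.solve_stage()   # implicit stage; reuses stage_eq and
                                  # solve_stage from BackwardEuler, cf. (3.11)
        return u[n] + dt / 2 * (k1 + k2)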
In this implementation, we leverage the fact that the stage k₁ in the Crank-Nicolson method is explicit and does not require solving an equation. The definition of k₂, on the other hand, is identical to that of k₁ in the backward Euler method. Consequently, we can directly reuse both the stage_eq and
solve_stage methods, with only the advance method needing to be reimple-
mented. While this compact implementation of the Crank-Nicolson method
allows for code reuse, it can be argued that it violates a common principle of
object-oriented programming. Subclassing and inheritance represent an "is-a"
relationship, implying that an instance of the Crank-Nicolson class is also
an instance of the BackwardEuler class. While this works fine in the pro-
gram, and is convenient for code reuse, it is not a correct representation of
the relationship between the two numerical methods. Both methods belong
to the group of implicit RK solvers, but the Crank-Nicolson method is not
a special case of BackwardEuler. In the following sections, we will introduce
an alternative class hierarchy that reflects this relationship and enables a
compact implementation of RK methods using the general formulation in
(2.8)-(2.9).
The defining feature of an implicit RK method is the structure of its coefficient matrix, and this choice affects both the accuracy and computational complexity of
the methods. In this section, we will explore two main branches of IRK meth-
ods: fully implicit methods and diagonally implicit methods. Both classes of
methods are widely used and both have their advantages and drawbacks.
As before, the solution is advanced by the weighted sum

u_{n+1} = u_n + \Delta t \sum_{i=1}^{s} b_i k_i, \qquad (3.14)

where b_i are the weights and k_i are the stage derivatives, which can be interpreted as approximations of the right-hand side function f(t, u) at distinct time points t_n + \Delta t c_i.
Numerical integration is a well-established field in numerical analysis, and
it is natural to choose the integration points ci and weights bi in (3.14)
based on standard quadrature rules with known properties. Such quadrature
rules are often derived by approximating the integrand with a polynomial
which interpolates the function f at distinct points, and then integrating the
polynomial exactly. A similar approach can be employed in deriving implicit
RK methods. We approximate the solution u on the interval tn < t ≤ tn+1
using a polynomial P (t) of degree up to s, and require that P (t) satisfies the
ODE exactly at the distinct points t_n + c_i\Delta t. This requirement, expressed as

P'(t_n + c_i\Delta t) = f(t_n + c_i\Delta t, P(t_n + c_i\Delta t)), \quad i = 1, \ldots, s,

defines a collocation method. Choosing the points c_i as the nodes of Gauss quadrature yields the Gauss methods, which attain the optimal order 2s for s stages. The Gauss methods are A-stable but not L-stable. Since FIRK methods
are primarily used for challenging stiff problems where stability is crucial,
another family of FIRK methods, known as Radau IIA methods, is more
commonly employed. These methods are based on Radau quadrature points,
which include the right end of the integration interval (i.e., cs = 1). The one-
stage Radau IIA method is the backward Euler method, while the two- and
three-stage versions are given by
\begin{array}{c|cc}
1/3 & 5/12 & -1/12 \\
1 & 3/4 & 1/4 \\ \hline
  & 3/4 & 1/4
\end{array}, \qquad
\begin{array}{c|ccc}
\frac{4-\sqrt{6}}{10} & \frac{88-7\sqrt{6}}{360} & \frac{296-169\sqrt{6}}{1800} & \frac{-2+3\sqrt{6}}{225} \\
\frac{4+\sqrt{6}}{10} & \frac{296+169\sqrt{6}}{1800} & \frac{88+7\sqrt{6}}{360} & \frac{-2-3\sqrt{6}}{225} \\
1 & \frac{16-\sqrt{6}}{36} & \frac{16+\sqrt{6}}{36} & \frac{1}{9} \\ \hline
  & \frac{16-\sqrt{6}}{36} & \frac{16+\sqrt{6}}{36} & \frac{1}{9}
\end{array}.
The Radau IIA methods exhibit order 2s − 1, and their stability functions
are (s − 1, s) Padé approximations of the exponential function, as described
in [9]. For the two- and three-stage methods mentioned earlier, the stability
functions are given by
R(z) = \frac{1 + z/3}{1 - 2z/3 + z^2/6}, \qquad
R(z) = \frac{1 + 2z/5 + z^2/20}{1 - 3z/5 + 3z^2/20 - z^3/60},
respectively. The price to pay for these excellent properties is that a fully populated coefficient matrix with aij ≠ 0 complicates the implementation of the methods and makes each time
step computationally expensive. All the s equations of (2.8) become fully
coupled and need to be solved simultaneously. In the case of an ODE system
comprising m equations, we must solve a system of ms nonlinear equations
for each time step. We will come back to the implementation of FIRK methods
in Section 3.5, but let us first introduce a slightly simpler class of implicit
RK solvers.
Fig. 3.5 The shaded area represents the stability region for two of the Radau IIA
methods, with s = 2 (left) and s = 3 (right).
In diagonally implicit RK (DIRK) methods, the coefficient matrix is lower triangular, with aij = 0 for j > i. Nonlinear equations must still be solved to determine each ki, but we can solve s systems of m equations rather than solving
one large system to compute all stages simultaneously. This simplifies the
implementation and reduces the computational cost per time step. However,
the restriction on the method coefficients also reduces the accuracy and sta-
bility compared with FIRK methods. A general DIRK method with s stages
has a maximum order of s + 1, and methods optimized for stability typically
have even lower order.
It is worth noting that the implicit midpoint method discussed earlier
technically falls under the category of DIRK methods. However, it is also
a fully implicit Gauss method, and is not commonly referred to as a DIRK
method. The distinction between FIRK and DIRK methods is meaningful
only for s > 1. The Crank-Nicolson (implicit trapezoidal) method given by
(3.10)-(3.12) is another example of a DIRK method, evident from the right-
most Butcher tableau in (3.13). These methods are, however, only A-stable,
and it is possible to derive DIRK methods with better stability properties.
An example of an L-stable, two-stage DIRK method of order two is given by
γ γ 0
1 1−γ γ , (3.16)
1−γ γ
A widely used three-stage method of this kind is the TR-BDF2 method, defined by the tableau

\begin{array}{c|ccc}
0 & 0 & 0 & 0 \\
2\gamma & \gamma & \gamma & 0 \\
1 & \beta & \beta & \gamma \\ \hline
  & \beta & \beta & \gamma
\end{array} \qquad (3.17)

with \gamma = 1 - \sqrt{2}/2 and \beta = \sqrt{2}/4. The resulting equations for each time step are

k_1 = f(t_n, u_n),
k_2 = f(t_n + 2\gamma\Delta t, u_n + \Delta t(\gamma k_1 + \gamma k_2)),
k_3 = f(t_n + \Delta t, u_n + \Delta t(\beta k_1 + \beta k_2 + \gamma k_3)),
u_{n+1} = u_n + \Delta t(\beta k_1 + \beta k_2 + \gamma k_3).

Since the first stage is explicit while the remaining stages share the same diagonal coefficient \gamma, methods of this form are known as explicit singly diagonally implicit RK (ESDIRK) methods.
3.5 Implementing Higher Order IRK Methods

In the methods we have discussed so far, the method coefficients have been hard-coded within the mathematical expressions, often inside the advance methods. With a more generic approach, it is more natural to define these coefficients as class attributes in the constructor, so that the solver classes differ only in the parameter values. Following this general approach, a base class for implicit RK methods can be defined as follows:
from ODESolver import *
from scipy.optimize import root
class ImplicitRK(ODESolver):
    def solve_stages(self):
        u, f, n, t = self.u, self.f, self.n, self.t
        s = self.stages
        k0 = f(t[n], u[n])
        k0 = np.tile(k0, s)            # stack the single-stage guess s times
        sol = root(self.stage_eq, k0)  # solve for all stage derivatives at once
        return np.split(sol.x, s)

    def stage_eq(self, k_all):
        # residual of the coupled stage equations (2.8)
        a, c = self.a, self.c
        u, f, n, t = self.u, self.f, self.n, self.t
        dt = self.dt
        s, neq = self.stages, self.neq
        res = np.zeros_like(k_all)
        k = np.split(k_all, s)
        for i in range(s):
            fi = f(t[n] + c[i] * dt, u[n] + dt *
                   sum([a[i, j] * k[j] for j in range(s)]))
            res[i * neq:(i + 1) * neq] = k[i] - fi
        return res

    def advance(self):
        b = self.b
        u, n, t = self.u, self.n, self.t
        dt = self.dt
        k = self.solve_stages()
        return u[n] + dt * sum(b_ * k_ for b_, k_ in zip(b, k))
Note that we assume that the method parameters are stored in NumPy ar-
rays self.a, self.b, self.c, which need to be defined in subclasses. It
is important to note that, just like the ODESolver class discussed earlier, the
ImplicitRK class is intended as a pure base class for holding common code.
It is not meant to be used as a standalone solver class. In accordance with
the principles described in Section 2.2, we could make the abstract nature of
this class explicit by using the abc module, but for the present text we focus
on the fundamentals of the solvers and the class structure, keeping the code
as simple and compact as possible.
The three methods of the ImplicitRK class are generalizations of the
corresponding methods in the BackwardEuler class. They perform the same
tasks but at a higher abstraction level and they rely on a bit of NumPy magic:
• The solve_stages method is a generalization of the solve_stage method
above. Most of the lines are similar and should be self-explanatory. How-
ever, it is important to note that we are now implementing a general IRK
method with s stages. Instead of solving a system of nonlinear equations
for a single stage derivative, we solve a larger system to determine all s
stage derivatives at once. The solution of this system is a one-dimensional
array of length self.stages * self.neq, which contains all the stage
derivatives. The line k0 = np.tile(k0,s) takes an initial guess k0 for a
single stage, and stacks it after itself s times to create the initial guess for
all the stages, using NumPy’s tile function.
• The stage_eq method is also a pure generalization of its BackwardEuler
counterpart and performs the same tasks. The initial lines of this method
are self-explanatory, while the res = np.zeros_like(k_all) creates an
array of the appropriate length to hold the residual of the equation. For
convenience, the line k = np.split(k_all,s) splits the array k_all into
a list k that contains individual stage derivatives. This list is then used
in the subsequent for loop on the next four lines. This loop, which forms
the core of the method, implements equation (2.8), expressed as Python
code and split over several lines for improved readability. The method
returns the residual as a single array of length self.stages * self.neq,
as expected by the SciPy root function.
• Finally, the advance method calls the solve_stages to compute all the
stage derivatives, and then advances the solution using a general imple-
mentation of (2.9).
With the general base class in place, it becomes straightforward to implement
new solvers by writing constructors that define the method coefficients. The
following code implements the implicit midpoint and the two- and three-stage
Radau methods:
class ImplicitMidpoint(ImplicitRK):
def __init__(self, f):
super().__init__(f)
self.stages = 1
self.a = np.array([[1 / 2]])
self.c = np.array([1 / 2])
self.b = np.array([1])
class Radau2(ImplicitRK):
def __init__(self, f):
super().__init__(f)
self.stages = 2
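        # coefficients of the two-stage Radau IIA method
        # (reconstructed from the tableau above)
        self.a = np.array([[5 / 12, -1 / 12],
                           [3 / 4, 1 / 4]])
        self.c = np.array([1 / 3, 1])
        self.b = np.array([3 / 4, 1 / 4])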
class Radau3(ImplicitRK):
def __init__(self, f):
super().__init__(f)
self.stages = 3
sq6 = np.sqrt(6)
self.a = np.array([[(88 - 7 * sq6) / 360,
(296 - 169 * sq6) / 1800,
(-2 + 3 * sq6) / (225)],
[(296 + 169 * sq6) / 1800,
(88 + 7 * sq6) / 360,
(-2 - 3 * sq6) / (225)],
[(16 - sq6) / 36, (16 + sq6) / 36, 1 / 9]])
self.c = np.array([(4 - sq6) / 10, (4 + sq6) / 10, 1])
self.b = np.array([(16 - sq6) / 36, (16 + sq6) / 36, 1 / 9])
Notice that we always define the method coefficients as NumPy arrays, even
for the implicit midpoint method where they only contain a single number.
This definition is necessary for the generic methods of the ImplicitRK class
to work.
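To illustrate, the new solver classes are used exactly like the explicit solvers from Chapter 2; a small usage sketch with an arbitrary test problem is:

def rhs(t, u):
    return -u                 # simple test problem u' = -u

solver = Radau3(rhs)
solver.set_initial_condition(1.0)
t, u = solver.solve((0, 5), N=50)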
Consider first the implementation of the two-stage SDIRK method in (3.16), whose stage equations read

k_1 = f(t_n + \gamma\Delta t, u_n + \Delta t\gamma k_1), \qquad (3.18)
k_2 = f(t_n + \Delta t, u_n + \Delta t((1-\gamma)k_1 + \gamma k_2)), \qquad (3.19)
u_{n+1} = u_n + \Delta t((1-\gamma)k_1 + \gamma k_2). \qquad (3.20)

Here, (3.18) is nearly identical to the equation defining the stage derivative in the backward Euler method, with the only difference being that ∆t is replaced with γ∆t. Similarly, the only difference between (3.18) and (3.19) is the additional term ∆t(1 − γ)k1 inside the function call. In general, any
stage equation for any DIRK method can be written as
k_i = f\Big(t_n + c_i\Delta t,\; u_n + \Delta t\Big(\sum_{j=1}^{i-1} a_{ij}k_j + \gamma k_i\Big)\Big), \qquad (3.21)
where the sum inside the function call only includes previously computed stages. Note that we have split the sum over the stage derivatives, highlighting that when solving for ki, the values kj for j < i are already known. If we write the stage equation in residual form g(ki) = 0, the Jacobian matrix Jg needed by Newton-type nonlinear solvers is found by differentiating g with respect to ki, resulting in Jg = I − ∆t aii Jf, where Jf is the Jacobian of f.
Given the similarity of (3.21) with the stage equation from the backward
Euler method, it is natural to implement the SDIRK stage equation as a
generalization of the stage_eq method from the BackwardEuler class. To
achieve this, we can create an SDIRK base class that contains the general
versions of both the stage_eq and solve_stages methods. This base class
can then be used as a foundation for deriving specific SDIRK solver classes.
By writing the stage equations in this general form, it becomes straightfor-
ward to generalize the algorithm for looping through the stages and comput-
ing the individual stage derivatives. The complete base class implementation
may appear as follows.
class SDIRK(ImplicitRK):
    def stage_eq(self, k, c_i, k_sum):
        # residual of the single stage equation (3.21)
        u, f, n, t = self.u, self.f, self.n, self.t
        dt = self.dt
        gamma = self.gamma
        return k - f(t[n] + c_i * dt, u[n] + dt * (k_sum + gamma * k))

    def solve_stages(self):
        u, f, n, t = self.u, self.f, self.n, self.t
        a, c = self.a, self.c
        s = self.stages
        k = f(t[n], u[n])       # initial guess for the first stage
        k_all = []
        for i in range(s):
            # weighted sum of previously computed stage derivatives
            k_sum = sum(a_ * k_ for a_, k_ in zip(a[i, :i], k_all))
            k = root(self.stage_eq, k, args=(c[i], k_sum)).x
            k_all.append(k)
        return k_all
The modified stage_eq method takes two additional parameters: the coef-
ficient c_i, corresponding to the current stage, and the array k_sum, which
holds the sum \sum_{j=1}^{i-1} a_{ij}k_j. These arguments need to be initialized correctly
for each stage and passed as additional arguments to the SciPy root func-
tion. For convenience, we also assume that the method parameter γ has been
stored as a separate class attribute. With the stage_eq method implemented
in this general way, the solve_stages method simply needs to update the
weighted sum of previous stages (k_sum), and pass this and the correct c
value as additional arguments to the SciPy root function. The implementa-
tion uses a for loop to compute the stage derivatives sequentially and returns
them as a list k_all.
As for the FIRK method classes, the only method we need to implement
specifically for each solver class is the constructor, in which we define the
number of stages and the method coefficients. A class implementation of the
method in (3.16) may look as follows.
class SDIRK2(SDIRK):
def __init__(self, f):
super().__init__(f)
self.stages = 2
gamma = (2 - np.sqrt(2)) / 2
self.gamma = gamma
self.a = np.array([[gamma, 0],
[1 - gamma, gamma]])
self.c = np.array([gamma, 1])
self.b = np.array([1 - gamma, gamma])
We now shift our attention to the ESDIRK methods, which are identical to the SDIRK methods except for the explicit first stage, so the potential for code reuse is obvious. The stage_eq method from the SDIRK base class can be directly reused in an ESDIRK solver class, since the equations to be solved
for each stage are identical for SDIRK and ESDIRK solvers. However, the
solve_stages method needs to be modified, since there is no need to solve a
nonlinear equation for k1. Nevertheless, the modifications required are min-
imal since all stages i > 1 are identical. A possible implementation of the
ESDIRK class can look as follows:
class ESDIRK(SDIRK):
def solve_stages(self):
u, f, n, t = self.u, self.f, self.n, self.t
a, c = self.a, self.c
s = self.stages
k = f(t[n], u[n]) # initial guess for first stage
k_sum = np.zeros_like(k)
k_all = [k]
for i in range(1, s):
k_sum = sum(a_ * k_ for a_, k_ in zip(a[i, :i], k_all))
k = root(self.stage_eq, k, args=(c[i], k_sum)).x
k_all.append(k)
return k_all
Fig. 3.6 Solutions of the Van der Pol model for µ = 10 and ∆t = 0.1, using implicit
RK solvers of different accuracy.
Comparing with the SDIRK base class defined earlier, there are two small but
important differences in the implementation of the solve_stages method.
First, the result of the first function evaluation k = f(t[n],u[n]) is used
directly as the first stage, by setting k_all = [k], instead of just serving
as an initial guess for the nonlinear equation solver. Second, the for-loop for
computing the remaining stages starts at i=1 rather than i=0.
With the ESDIRK base class at hand, individual ESDIRK methods can
be implemented easily by defining the constructor, for instance:
class TR_BDF2(ESDIRK):
def __init__(self, f):
super().__init__(f)
self.stages = 3
gamma = 1 - np.sqrt(2) / 2
beta = np.sqrt(2) / 4
self.gamma = gamma
self.a = np.array([[0, 0, 0],
[gamma, gamma, 0],
[beta, beta, gamma]])
self.c = np.array([0, 2 * gamma, 1])
self.b = np.array([beta, beta, gamma])
Chapter 4
Adaptive Time Step Methods
Many ODE models of dynamic systems have solutions that exhibit rapid vari-
ations in some intervals and remain nearly constant in others. A motivating
example is a class of ODE models that describe the action potential of ex-
citable cells, initially introduced by Hodgkin and Huxley [10]. These models
play a crucial role in studying the electrophysiology of cells, including neu-
rons and different types of muscle cells. The transmembrane potential, which
is the difference in electrical potential between a cell’s interior and its sur-
roundings, is often the primary variable of interest. When an excitable cell,
such as a neuron or muscle cell, undergoes electrical stimulation, it triggers
a cascade of processes in the cell membrane, including the opening and
closing of various ion channels. The resulting flux of ions causes the mem-
brane potential to transition from its resting negative state to approximately
zero or slightly positive, before returning to its resting value. This process
of depolarization followed by repolarization is called the action potential (see
Figure 4.1). For a comprehensive overview of the Hodgkin-Huxley model and
action potential models in general, refer to [11].
Fig. 4.1 Solution of the Hodgkin-Huxley model. The left panel shows a single action
potential, while the right panel shows the result of stimulating the cell multiple times
with a fixed period.
There are many possible approaches for automatically selecting the time
step in numerical simulations. One intuitive strategy is to choose the time step based on the solution's dynamics, opting for a smaller time
step during periods of rapid variations. This approach is commonly applied
in adaptive solvers for partial differential equations (PDEs), where both the
time step and space step can be chosen adaptively. It has also proven effec-
tive in specialized solvers for action potential models, as discussed in [15],
where the time step is determined by the fluctuations in the transmembrane
voltage. However, it is important to note that this method may not be univer-
sally applicable, and the criteria for choosing the time step must be carefully
selected based on the characteristics of the problem at hand.
4.2 Choosing the Time Step Based on the Local Error

Adaptive time stepping methods aim to control the error in the solution, and
it is natural to base the step selection on some form of error estimate. In
Section 1.5 we computed the error at the end of the solution interval, and
used it to confirm the theoretical convergence of the method. In principle,
this global error could also be useful for selecting the time step, since we
can simply redo the calculation with a smaller time step if the error is too
large. However, for interesting ODE problems where the analytical solution is
unavailable, this method of error estimation becomes complicated. Further-
more, the goal of adaptive time step methods is to dynamically select the
time step as the solution progresses, to ensure that the final solution meets
a specified error tolerance. This goal requires a different approach, which is
based on estimating the local error for each step rather than relying on the
global error.
Assuming that we can estimate the local error for a given step, e_n, the goal is to choose the time step \Delta t_n so that the inequality

e_n \le \text{tol} \qquad (4.1)

is satisfied for all steps, for a given tolerance tol. The process of choosing ∆tn to ensure the satisfaction
of (4.1) consists of two essential parts. First, we always check the inequality
after performing a step. If it is satisfied, we accept the step and proceed with
step n + 1 as usual. If it is not satisfied, we reject the step and try again
with a smaller ∆tn . The second part of the procedure involves choosing the
next time step, ∆tn+1, if the current step was accepted, or making a new
guess for ∆tn if it was rejected. Interestingly, we will discover that the same
formula, derived from our knowledge of the local error, can be applied in both
cases.
For simplicity of notation, let us assume that step n was accepted with a
time step ∆tn and a local error estimate en < tol. Our aim is now to choose
∆tn+1 so that (4.1) is satisfied as sharply as possible, to avoid unnecessary
computations. Hence, we aim to choose ∆tn+1 such that en+1 ≈ tol. Recall from Section 1.5 that for a method of global order p, the local error is of order p + 1, so we have

e_n \approx C(\Delta t_n)^{p+1}, \qquad e_{n+1} \approx C(\Delta t_{n+1})^{p+1}, \qquad (4.2)

where we assume that the error constant C remains constant from one step to the next. Using (4.2), we can express C as

C = \frac{e_n}{(\Delta t_n)^{p+1}},

and, setting e_{n+1} = tol, rearrange to get the standard formula for time step selection

\Delta t_{n+1} = \left(\frac{\text{tol}}{e_n}\right)^{1/(p+1)} \Delta t_n. \qquad (4.3)
We see that if en ≪ tol, the formula will select a larger step size for the next step, while if en ≈ tol we get ∆tn+1 ≈ ∆tn. In practice, the formula is usually
modified with a safety factor, i.e., we set
\Delta t_{n+1} = \eta \left(\frac{\text{tol}}{e_n}\right)^{1/(p+1)} \Delta t_n, \qquad (4.4)
for some η < 1. The same formula can be used to choose a new step size ∆tn
if the previous step was rejected, i.e., if en > tol.
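In code, (4.4) amounts to a single line; a standalone sketch, with hypothetical argument names, is:

def new_step_size(dt, loc_error, tol, p, eta=0.9):
    # step size formula (4.4); eta is the safety factor
    return eta * (tol / loc_error) ** (1 / (p + 1)) * dt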
Although (4.4) provides a simple formula for the step size and works well
for our example problems, more sophisticated methods have been derived.
The task of choosing the time step to control the error is an optimal control
problem, and successful methods based on control theory have been derived to
control the error while avoiding abrupt changes in the step size. For detailed
information and examples of such methods, refer to [9].
One classical way to obtain such an estimate is to compute a second, more accurate approximation ûn, and use the difference between the two solutions as an estimate of the local error. Computing ûn by re-solving each step with two steps of size ∆t/2, known as step doubling, is generally applicable and can provide a local error estimate for all ODE solvers. However, it is computationally expensive, and most modern ODE software relies on other techniques. The second approach for computing ûn, using a method with a higher order of accuracy, turns out to be particularly
advantageous for RK methods. We shall see in the next section that it is pos-
sible to construct embedded methods, which provide an error estimate with
very little additional computation.
Although the main idea is to reuse the same stage computations to compute
both ûn+1 and un+1 , it is not uncommon to introduce one additional stage in
the method to obtain the error estimate. An RK method with an embedded
method for error estimation is often referred to as an RK pair of order n(m),
where n is the order of the main method and m the order of the method
used for error estimation. Butcher tableaus for RK pairs are written exactly
as before, but with an extra line for the additional coefficients b̂:
\begin{array}{c|ccc}
c_1 & a_{11} & \cdots & a_{1s} \\
\vdots & \vdots & & \vdots \\
c_s & a_{s1} & \cdots & a_{ss} \\ \hline
 & b_1 & \cdots & b_s \\
 & \hat{b}_1 & \cdots & \hat{b}_s
\end{array}.
The simplest example of such a pair combines the explicit (forward) Euler method with Heun's method, with the combined tableau

\begin{array}{c|cc}
0 & 0 & 0 \\
1 & 1 & 0 \\ \hline
 & 1 & 0 \\
 & 1/2 & 1/2
\end{array} \qquad (4.8)
which translates to the following formulas for advancing the two solutions:
k1 = f (tn , un ),
k2 = f (tn + ∆t, un + ∆tk1 ),
un+1 = un + ∆tk1 ,
ûn+1 = un + ∆t/2(k1 + k2 ).
In the next section, we will see how this method pair can be implemented as
an extension of the ODESolver hierarchy discussed earlier, before we introduce
more advanced embedded RK methods in Section 4.5.
The fundamental ideas above can be collected in a base class for adaptive RK solvers:

class AdaptiveODESolver(ODESolver):
    def __init__(self, f, eta=0.9):
        super().__init__(f)
        self.eta = eta        # safety factor in the step size formula (4.4)

    def new_step_size(self, dt, loc_error):
        # implements (4.4); self.order must be defined in the subclasses
        return self.eta * (self.tol / loc_error) ** (1 / (self.order + 1)) * dt

    def solve(self, t_span, tol, max_dt=np.inf, min_dt=1e-5):
        # the signature and the default step size bounds are assumptions
        t0, T = t_span
        self.tol = tol
        self.min_dt = min_dt
        if self.neq == 1:
            self.u = [np.asarray(self.u0).reshape(1)]
        else:
            self.u = [self.u0]
        self.t = [t0]
        self.n = 0
        self.dt = 0.1 / np.linalg.norm(self.f(t0, self.u0))  # crude initial step
        loc_t = t0
        while loc_t < T:
            u_new, loc_error = self.advance()
            if loc_error < tol or self.dt < self.min_dt:
                loc_t += self.dt
                self.t.append(loc_t)
                self.u.append(u_new)
                self.dt = self.new_step_size(self.dt, loc_error)
                self.dt = min(self.dt, T - loc_t, max_dt)
                self.n += 1
            else:
                self.dt = self.new_step_size(self.dt, loc_error)
        return np.array(self.t), np.array(self.u)
If the error test passes, the step is accepted, and the new time and solution values are appended to the corresponding lists; the next time step is then chosen based on the current step and the local error. The min and max operations ensure
that the time step remains within the selected bounds and that the simulation
ends at the final time T. If the constraint loc_error < tol is not satisfied,
we simply compute a new time step and try again without updating the lists
for the time and solution.
While the solve loop in the AdaptiveODESolver class is undoubtedly
more complex than earlier versions, it is important to note that it still rep-
resents a simplified version of an adaptive ODE solver. The aim here is to
present the fundamental ideas and foster a general understanding of how
these solvers are implemented. Consequently, we have included only the most
essential components, and certain limitations and simplifications should be
acknowledged:
• The step size selection formula in (4.4), implemented in the method
new_step_size, could be replaced with more sophisticated algorithms.
For more details, refer to sources such as [3, 9].
• The formula for selecting the initial step is quite basic and primarily aims
to prevent extremely poor initial choices. More advanced algorithms have
been developed, and for additional information, consult references like [8,9]
for details.
• The initial if-test within the solver loop is not very robust, since it will accept the step and move on if the minimum step size is reached, even if
the error is excessively large. A robust solver should provide a warning to
the user in such cases where the requested tolerance cannot be achieved.
Despite these and other limitations, the adaptive solver class works as in-
tended and captures the essential behavior of adaptive ODE solvers.
With the AdaptiveODESolver base class available, specific solvers can be
implemented by creating tailored versions of the advance method and the
constructor. The order of the method is used in the time step selection and
therefore needs to be defined as an attribute. For example, an implementation
of the Euler-Heun method pair mentioned earlier could appear as follows:
class EulerHeun(AdaptiveODESolver):
def __init__(self, f, eta=0.9):
super().__init__(f, eta)
self.order = 1
def advance(self):
u, f, t = self.u, self.f, self.t
dt = self.dt
k1 = f(t[-1], u[-1])
k2 = f(t[-1] + dt, u[-1] + dt * k1)
high = dt / 2 * (k1 + k2)
low = dt * k1
unew = u[-1] + low
error = np.linalg.norm(high - low)
return unew, error
After calculating the derivatives k1 and k2 for the two stages, the method
proceeds to compute the updates for both the high and low order solutions.
The low order solution is used to advance the overall solution, while the
difference between the high and low order solutions serves as the error esti-
mate. The method then returns the updated solution and the error, which
are needed by the solve method implemented in the base class described
earlier.
Since we have two methods with different levels of accuracy, one might
wonder whether it would be better to advance the solution using the more
accurate method rather than the less accurate one. This choice would cer-
tainly yield a reduced local error, but the drawback is that we would no
longer have a proper error estimate for the method used to integrate the
solution. We can use the more accurate solution to estimate the error of the
less accurate, but not the other way around. Nevertheless, this approach,
known as local extrapolation [8], is still used by many popular RK pairs, as
we will observe in the examples below. Even though the error estimate may
not be precise for the method used to integrate the solution, it still works
well as a tool for selecting the time step. In the implementation above, it is
straightforward to experiment with this choice by replacing low with high
when assigning the value to unew. By doing so, we can observe the impact
on the error and the number of time steps.
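Concretely, the experiment amounts to changing a single line in the advance method above:

unew = u[-1] + high   # local extrapolation: advance with the high order solution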
A classical example of an embedded pair is the Runge-Kutta-Fehlberg method (RKF45), a six-stage ERK pair with the tableau

\begin{array}{c|cccccc}
0 & & & & & & \\
1/4 & 1/4 & & & & & \\
3/8 & 3/32 & 9/32 & & & & \\
12/13 & 1932/2197 & -7200/2197 & 7296/2197 & & & \\
1 & 439/216 & -8 & 3680/513 & -845/4104 & & \\
1/2 & -8/27 & 2 & -3544/2565 & 1859/4104 & -11/40 & \\ \hline
 & 25/216 & 0 & 1408/2565 & 2197/4104 & -1/5 & 0 \\
 & 16/135 & 0 & 6656/12825 & 28561/56430 & -9/50 & 2/55
\end{array} \qquad (4.9)
In this tableau, the coefficients in the first line (bi ) correspond to a fourth-
order method, while the coefficients in the last line (b̂i ) correspond to a fifth-
order method. The implementation of the RKF45 method is similar to the
Euler-Heun pair, but due to the increased number of stages and coefficients,
the advance method becomes more complex:
class RKF45(AdaptiveODESolver):
def __init__(self, f, eta=0.9):
super().__init__(f, eta)
self.order = 4
def advance(self):
u, f, t = self.u, self.f, self.t
dt = self.dt
c2 = 1/4; a21 = 1/4;
c3 = 3/8; a31 = 3/32; a32 = 9/32
c4 = 12/13; a41 = 1932/2197; a42 = -7200/2197; a43 = 7296/2197
c5 = 1; a51 = 439/216; a52 = -8; a53 = 3680/513;
a54 = -845/4104
c6 = 1/2; a61 = -8/27; a62 = 2; a63 = -3544/2565;
a64 = 1859/4104; a65 = -11/40
b1 = 25/216; b2 = 0; b3 = 1408/2565; b4 = 2197/4104;
b5 = -1/5; b6 = 0
bh1 = 16/135; bh2 = 0; bh3 = 6656/12825; bh4 = 28561/56430;
bh5 = -9/50; bh6 = 2/55
        k1 = f(t[-1], u[-1])
        k2 = f(t[-1] + c2 * dt, u[-1] + dt * (a21 * k1))
        k3 = f(t[-1] + c3 * dt, u[-1] + dt * (a31 * k1 + a32 * k2))
        k4 = f(t[-1] + c4 * dt, u[-1] + dt *
               (a41 * k1 + a42 * k2 + a43 * k3))
        k5 = f(t[-1] + c5 * dt, u[-1] + dt *
               (a51 * k1 + a52 * k2 + a53 * k3 + a54 * k4))
        k6 = f(t[-1] + c6 * dt, u[-1] +
               dt * (a61 * k1 + a62 * k2 + a63 * k3
                     + a64 * k4 + a65 * k5))
        # advance with the low order solution and estimate the error,
        # following the same pattern as in the EulerHeun class
        low = dt * (b1 * k1 + b2 * k2 + b3 * k3
                    + b4 * k4 + b5 * k5 + b6 * k6)
        high = dt * (bh1 * k1 + bh2 * k2 + bh3 * k3
                     + bh4 * k4 + bh5 * k5 + bh6 * k6)
        unew = u[-1] + low
        error = np.linalg.norm(high - low)
        return unew, error
The advance method could be written more concisely, but we have chosen
to maintain the structure of the explicit RK methods introduced earlier.
Another well-known and widely used pair of ERK methods is the Dormand-
Prince method [4], which is a seven-stage method with the following coeffi-
cients:
\begin{array}{c|ccccccc}
0 & & & & & & & \\
1/5 & 1/5 & & & & & & \\
3/10 & 3/40 & 9/40 & & & & & \\
4/5 & 44/45 & -56/15 & 32/9 & & & & \\
8/9 & 19372/6561 & -25360/2187 & 64448/6561 & -212/729 & & & \\
1 & 9017/3168 & -355/33 & 46732/5247 & 49/176 & -5103/18656 & & \\
1 & 35/384 & 0 & 500/1113 & 125/192 & -2187/6784 & 11/84 & \\ \hline
y_n & 35/384 & 0 & 500/1113 & 125/192 & -2187/6784 & 11/84 & 0 \\
\hat{y}_n & 5179/57600 & 0 & 7571/16695 & 393/640 & -92097/339200 & 187/2100 & 1/40
\end{array}
This method has been optimized for the local extrapolation approach men-
tioned above, where the higher order method is used for advancing the solu-
tion and the less accurate method is used for step size selection. The imple-
mentation is otherwise similar to the RKF45 method. The Dormand-Prince
method has been implemented in many software tools, including the popular
ode45 function in Matlab (The Math Works, Inc. MATLAB. Version 2023a).
Implicit RK methods can also incorporate embedded methods. The un-
derlying idea is the same as for explicit methods, although step size selection
tends to be more challenging for stiff problems. A crucial requirement for
stiff problems is that both the main method and the error estimator must
have good stability properties. Stiff problems pose challenges for error con-
trol algorithms, and simple algorithms such as (4.4) often experience large
fluctuations in step size and local error. For a detailed discussion of these
challenges, refer to [1, 9].
As an example of an implicit method with error control, we can extend
the TR-BDF2 method in (3.17) to include a third order method for error
estimation. The extended Butcher tableau is
\begin{array}{c|ccc}
0 & 0 & 0 & 0 \\
2\gamma & \gamma & \gamma & 0 \\
1 & \beta & \beta & \gamma \\ \hline
 & \beta & \beta & \gamma \\
 & (1-\beta)/3 & (3\beta+1)/3 & \gamma/3
\end{array} \qquad (4.10)

where \gamma = 1 - \sqrt{2}/2, \beta = \sqrt{2}/4, and the bottom line of coefficients defines the
third-order method. This third-order method is not L-stable, so for stiff prob-
lems it is preferable to advance the solution using the second-order method
and use the more accurate one for time step control. Achieving L-stability
for both methods of an embedded RK pair is ideal but often impossible,
and we need to accept somewhat weaker stability requirements for the error
estimator, as discussed in [13].
When implementing the adaptive TR-BDF2 and other implicit methods,
we need to combine features from the AdaptiveODESolver class mentioned
earlier with the tools from the ImplicitRK hierarchy introduced in Chapter 3. The natural tool for combining the two class hierarchies is multiple inheritance.
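A minimal declaration of such a combined base class, here named AdaptiveImplicitRK to match the discussion below, could look like:

class AdaptiveImplicitRK(AdaptiveODESolver, ImplicitRK):
    pass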
This simply states that the new class inherits all the methods from both
the AdaptiveODESolver class and the ImplicitRK class. The general de-
sign of the ImplicitRK class mentioned earlier was to define the method
coefficients in the constructor and use a generic advance method, making it
convenient to use the same method for adaptive implicit methods. However,
the advance method inherited from ImplicitRK needs to be overridden in our AdaptiveImplicitRK base class, since we need the method to return the error in addition
to the updated solution. All other methods can be reused directly from either
AdaptiveODESolver or ImplicitRK. Therefore, a suitable implementation of
the new class may look like:
class AdaptiveESDIRK(AdaptiveODESolver, ESDIRK):
def advance(self):
b = self.b
e = self.e
u = self.u
dt = self.dt
k = self.solve_stages()
u_step = dt * sum(b_ * k_ for b_, k_ in zip(b, k))
        error = dt * sum(e_ * k_ for e_, k_ in zip(e, k))
        return u[-1] + u_step, np.linalg.norm(error)
Here, the array self.e holds the coefficients of the error estimator, defined as the difference b − b̂ between the weights of the main and the embedded method, and must be set in the constructor of each solver subclass, while the remaining functionality, such as the solve_stages method, is inherited from the existing classes for the ESDIRK methods. Further details on adaptive versions of the Radau methods may be found in [9].
methods may be found in [9].
Although multiple inheritance provides a convenient way to reuse the
functionality of our existing classes, it comes with the risk of somewhat
complex and confusing class hierarchies. In particular, the fact that our
AdaptiveESDIRK class inherits from AdaptiveODESolver and ESDIRK, which
are both subclasses of ODESolver, may give rise to a well-known ambigu-
ity referred to as the diamond problem. The problem would arise if, for in-
stance, we were to define a method in ODESolver, override it with special
versions in both AdaptiveODESolver and ESDIRK, and then call it from an
instance of AdaptiveESDIRK. Would we then call the version implemented
in AdaptiveODESolver or the one in ESDIRK? The answer is determined
by Python’s so-called method resolution order (MRO), which decides which
method to inherit first based on its "closeness" in the class hierarchy and
then on the order of the base classes in the class definition. In our particular
example the AdaptiveESDIRK class is equally close to AdaptiveODESolver
and ESDIRK, since it is a direct subclass of both. The method called would
therefore be the version from AdaptiveODESolver, since this is listed first
in the class definition. In our relatively simple class hierarchy there are no
such ambiguities, and even if we use multiple inheritance it should not be
too challenging to determine which methods are called, but it is a potential
source of confusion that is worth being aware of.
Now that we have the AdaptiveESDIRK base class available, we can im-
plement an adaptive version of the TR-BDF2 method as follows:
class TR_BDF2_Adaptive(AdaptiveESDIRK):
def __init__(self, f, eta=0.9):
super().__init__(f, eta) # calls AdaptiveODESolver.__init__
self.stages = 3
self.order = 2
gamma = 1 - np.sqrt(2) / 2
beta = np.sqrt(2) / 4
self.gamma = gamma
self.a = np.array([[0, 0, 0],
[gamma, gamma, 0],
[beta, beta, gamma]])
self.c = np.array([0, 2 * gamma, 1])
self.b = np.array([beta, beta, gamma])
bh = np.array([(1 - beta) / 3, (3 * beta + 1) / 3, gamma / 3])
self.e = self.b - bh
To illustrate the use of this solver class, we may return to the Hodgkin-Huxley
model introduced earlier in this chapter. Assuming the model is implemented
as a class in a file hodgkinhuxley.py, the following code solves the model
and plots the transmembrane potential:
from AdaptiveImplicitRK import TR_BDF2_Adaptive
from hodgkinhuxley import HodgkinHuxley
import matplotlib.pyplot as plt
model = HodgkinHuxley()
u0 = [-45, 0.31, 0.05, 0.59]
t_span = (0, 50)
tol = 0.01
solver = TR_BDF2_Adaptive(model)
solver.set_initial_condition(u0)
t, u = solver.solve(t_span, tol)
plt.plot(t, u[:, 0])   # transmembrane potential (assumed to be the first component)
plt.show()
Fig. 4.2 Solution of the Hodgkin-Huxley model. The solid line is a reference solution
computed with SciPy solve_ivp, while the +-marks are the time steps chosen by the
adaptive TR-BDF2 solver.
A plot of the solution is shown in Figure 4.2, where the +-marks repre-
sent the time steps chosen by the adaptive TR-BDF2 solver. It is apparent
that larger time steps are used in quiescent regions while smaller steps are
employed in regions with rapid solution variations. A more quantitative view
of the solver behavior, for three different solvers, is shown in the table below.
Each method was applied with three different tolerance values over a time
interval from 0 to 50 ms, using default choices for the maximum and mini-
mum time steps. The "Error" column provides an estimate of the global error,
calculated based on a reference solution obtained using SciPy’s solve_ivp
function. The "Steps" column indicates the number of accepted time steps,
while "Rejected" indicates the total number of rejected steps. The last two
columns display the minimum and maximum time steps observed during the
computation.
The numbers in this table illustrate several well-known properties and lim-
itations of adaptive ODE solvers. First, we observe that there is no close
relationship between the selected tolerance and the resulting error. The er-
ror gets smaller when we reduce the tolerance, and for this particular case
the error is always smaller than the specified tolerance, but the error varies
substantially between the different methods. As mentioned earlier, the time
step is selected to control the local error, and although we expect the global
error to decrease as we reduce the tolerance, we cannot guarantee that the
global error will be smaller than the tolerance. Second, the RKF45 and Euler-
Heun methods exhibit relatively poor performance and inconsistent behavior
as the tolerance is reduced. For instance, the RKF45 method requires the
highest number of steps, and also rejects the largest number of steps, when
the tolerance is set to the highest value. This behavior stems from the stiff
nature of the Hodgkin-Huxley model, where the time step for explicit methods is
primarily determined by stability rather than accuracy. The minimum time
step ∆tmin of 1.0 · 10−5 is a result of divergence issues that automatically
set the time step to the specified lower bound. In most of the other combi-
nations of method and tolerance, the smallest observed time step is the first
one, selected by the simple formula within the solve method. There is room
for improvement in this area, and the overall performance of RKF45 for stiff
problems could be improved with a more sophisticated step size controller.
However, it is important to note that for stiff problems, explicit solvers will
never achieve the same level of performance as implicit solvers.
The ideas and tools introduced in this chapter are fundamental to all
RK methods with error control and automatic time step selection. These
ideas are fairly simple, and, as illustrated in Figure 4.2, give rise to methods
that effectively adapt the time step to control the error. However, there are
many practical considerations in implementing these methods, and we have
only scratched the surface. For example, the time step control formula in
(4.4) could be refined using more sophisticated models derived from control theory [7]. The initial time step selection, as indicated by the smallest step
∆tmin being the first one for most solvers in the table, could also be im-
proved. Furthermore, adjusted error estimates tailored for stiff systems have
been proposed [9]. For a comprehensive discussion and detailed exploration
of automatic time step control, we recommend referring to [1] and [8, 9].
Chapter 5
Modeling Infectious Diseases
S(t + \Delta t) = S(t) - \Delta t\,\beta\,\frac{S(t)I(t)}{N},
I(t + \Delta t) = I(t) + \Delta t\,\beta\,\frac{S(t)I(t)}{N}.
These two equations represent the key component of all the models covered
in this chapter. They are formulated as difference equations, and they can
easily be transformed to ODEs. More advanced models are typically derived
by adding more categories and more transitions between them, but the indi-
vidual transitions are very similar to those presented here.
S I R
Fig. 5.1 Graphical representation of the simplest SIR-model, where people move from
being susceptible (S) to being infected (I) and then reach the recovered (R) category
with immunity against the disease.
We also need to model the transition of people from the I to the R category.
Again considering a small time interval ∆t, it is reasonable to assume that a
fraction ∆t ν of the infected individuals recover and move to the R category.
Here ν is a constant that describes the time dynamics of the disease. The
increase in R is given by

R(t + \Delta t) = R(t) + \Delta t\,\nu I(t).

Additionally, we need to subtract the same term in the balance equation for
I, since individuals move from I to R. Therefore, we have:
S(t + \Delta t) = S(t) - \Delta t\,\beta\,\frac{S(t)I(t)}{N}, \qquad (5.1)
I(t + \Delta t) = I(t) + \Delta t\,\beta\,\frac{S(t)I(t)}{N} - \Delta t\,\nu I(t), \qquad (5.2)
R(t + \Delta t) = R(t) + \Delta t\,\nu I(t). \qquad (5.3)
¹ A simpler version of the SIR model is also commonly used, where the disease transmission term is not scaled with N. Eq. (5.7) then reads S′ = −βSI, and (5.8) is modified
similarly. Since N is constant the two models are equivalent, but the version in (5.7)-
(5.9) is more common in real-world applications and gives a closer relation between β
and common parameters such as the reproduction number.
β, are not directly related, and the notation may not be optimal. However,
we use it here because it is widely established in the field of epidemiology.
Although the system (5.7)-(5.9) appears simple, it is not easy to derive
analytical solutions. For specific applications, simplifications can often be
made to allow for simple analytical solutions. For instance, when studying
the early phase of an epidemic, the focus is usually on the I category. Since the number of infected cases is low compared with the entire population during this phase, it is reasonable to assume that S is approximately constant and equal to N. By substituting S ≈ N into (5.8), we obtain a simple equation describing exponential growth, with the solution

I(t) = I_0 e^{(\beta - \nu)t}.
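The program below relies on a right-hand side function implementing (5.7)-(5.9); a minimal sketch, where the parameter values beta = 1.0 and nu = 1/7 are assumed for illustration, is:

from ODESolver import RungeKutta4
import numpy as np
import matplotlib.pyplot as plt

def SIR_model(t, u):
    beta = 1.0         # transmission rate (assumed value)
    nu = 1.0 / 7       # recovery rate (assumed value)
    S, I, R = u
    N = S + I + R
    dS = -beta * S * I / N
    dI = beta * S * I / N - nu * I
    dR = nu * I
    return [dS, dI, dR]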
S0 = 1000
I0 = 1
R0 = 0
solver = RungeKutta4(SIR_model)
solver.set_initial_condition([S0, I0, R0])
t_span = (0, 100)   # solution interval (assumed value)
t, u = solver.solve(t_span, N=101)
S = u[:, 0]
I = u[:, 1]
R = u[:, 2]
plt.plot(t, S, t, I, t, R)
plt.show()
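The class version of the model referred to in the next paragraph stores the parameters as attributes and implements the special method __call__; a minimal sketch is:

class SIR:
    def __init__(self, beta, nu):
        self.beta = beta
        self.nu = nu

    def __call__(self, t, u):
        S, I, R = u
        N = S + I + R
        dS = -self.beta * S * I / N
        dI = self.beta * S * I / N - self.nu * I
        dR = self.nu * I
        return [dS, dI, dR]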
As with the models discussed in earlier chapters, the use of the class is very
similar to that of the SIR_model function above. We create an instance of
the class with specific values of beta and nu, and then this instance can be
passed to the ODE solver just like any regular Python function.
Fig. 5.2 Solution of the simplest version of the SIR model, showing how the number of people in each category (S, I, and R) changes with time.

5.2 Extending the SIR Model
The SIR model itself, in its simplest form, is rarely used for predictive sim-
ulations of real-world diseases. However, various extensions of the model are
widely used to better capture the dynamics of different infectious diseases.
In this section, we will explore a few such extensions that are based on the
building blocks of the simple SIR model.
An SIR Model without Life-Long Immunity. One modification of the
model is to remove the assumption of life-long immunity. The original model
(5.7)-(5.9) describes a one-directional flow towards the R category, where
the entire population eventually transitions to R if the model is solved over
a sufficiently long time interval. However, this situation is not realistic for
many diseases, since immunity tends to diminish over time. In the model this
loss can be described by a leakage of people from the R category back to S.
If we introduce the parameter γ to describe this flux (1/γ being the mean
time for immunity), the modified equation system looks like

S'(t) = -\beta\frac{SI}{N} + \gamma R,
I'(t) = \beta\frac{SI}{N} - \nu I,
R'(t) = \nu I - \gamma R.
Fig. 5.3 Illustration of a SIR model without lifelong immunity, where people move
from the R category back to S after a given time.
Note that the overall structure of the model remains the same. Since the
total population is conserved, all terms are balanced in the sense that they
occur twice in the model, with opposite signs. A decrease in one category is
always matched with an identical increase in another category. It is always
useful to be aware of such fundamental properties in a model, since they can
easily be checked in the computed solutions and may reveal errors in the
implementation.
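For instance, conservation of the total population is easy to verify from a computed solution; here u is assumed to be the solution array returned by the solver, with one column per category:

N_t = u.sum(axis=1)                # total population at each time step
print(np.allclose(N_t, N_t[0]))    # prints True if the population is conserved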
A further extension is to introduce an exposed category (E) of individuals that have been infected but are not yet infectious, which gives the SEIR model.

Fig. 5.4 Illustration of the SEIR model, where people move from S through the exposed category E before reaching I and R.
Again, this small extension of the model does not make it much more
difficult to solve. The following code shows an example of how the SEIR model
can be implemented as a class and solved with the ODESolver hierarchy:
from ODESolver import RungeKutta4
import numpy as np
import matplotlib.pyplot as plt
class SEIR:
def __init__(self, beta, mu, nu, gamma):
self.beta = beta
self.mu = mu
self.nu = nu
self.gamma = gamma
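    def __call__(self, t, u):
        # ODE right-hand side; a sketch reconstructed from the model
        # description in this section (SEIR with loss of immunity)
        beta, mu = self.beta, self.mu
        nu, gamma = self.nu, self.gamma
        S, E, I, R = u
        N = S + E + I + R
        dS = -beta * S * I / N + gamma * R
        dE = beta * S * I / N - mu * E
        dI = mu * E - nu * I
        dR = nu * I - gamma * R
        return [dS, dE, dI, dR]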
S0 = 1000
E0 = 0
I0 = 1
R0 = 0
model = SEIR(beta=1.0, mu=1.0 / 5, nu=1.0 / 7, gamma=1.0 / 50)
solver = RungeKutta4(model)
solver.set_initial_condition([S0, E0, I0, R0])
t_span = (0, 100)
t, u = solver.solve(t_span, N=101)
S = u[:, 0]
E = u[:, 1]
I = u[:, 2]
R = u[:, 3]
plt.plot(t, S, t, E, t, I, t, R)
plt.show()
2
See https://github.com/folkehelseinstituttet/spread
5.3 A Model of the Covid-19 Pandemic
Fig. 5.5 Illustration of the Covid-19 epidemic model, with two alternative disease
trajectories.
The derivation of the model equations for the SEEIIR model is similar to the simpler models discussed earlier, but there are more equations and more terms involved. The most important extension is the inclusion of three categories of infectious people: E2, I, and Ia. Each of these categories contributes to spreading the disease, with different levels of infectiousness. The equations governing the symptomatic disease trajectory read

E_2'(t) = \lambda_1(1 - p_a)E_1 - \lambda_2 E_2,
I'(t) = \lambda_2 E_2 - \mu I,

where 1/λ2 and 1/µ represent the mean durations of the E2 and I phases,
respectively. The model for the asymptomatic disease trajectory is somewhat
simpler, with Ia receiving an influx from E1 and losing people directly to R.
We have
I_a'(t) = \lambda_1 p_a E_1 - \mu I_a,
where we have assumed that the duration of the Ia period is the same as for
I, i.e., 1/µ. Finally, the dynamics of the recovered category are governed by
R'(t) = \mu I + \mu I_a.
Note that we do not consider flow from the R category back to S, so we have
effectively assumed life-long immunity. This assumption is not correct for
Covid-19, but in the early phase of the pandemic, the duration of immunity
was largely unknown, and the loss of immunity was therefore not considered
in the models.
To summarize, the complete ODE system of the SEEIIR model can be
written as
S'(t) = -\beta\frac{SI}{N} - r_{ia}\beta\frac{SI_a}{N} - r_{e2}\beta\frac{SE_2}{N},
E_1'(t) = \beta\frac{SI}{N} + r_{ia}\beta\frac{SI_a}{N} + r_{e2}\beta\frac{SE_2}{N} - \lambda_1 E_1,
E_2'(t) = \lambda_1(1 - p_a)E_1 - \lambda_2 E_2,
I'(t) = \lambda_2 E_2 - \mu I,
I_a'(t) = \lambda_1 p_a E_1 - \mu I_a,
R'(t) = \mu(I + I_a).
Parameter   Value
β           0.33
r_ia        0.1
r_e2        1.25
λ1          0.33
λ2          0.5
p_a         0.4
µ           0.2
These parameters are similar to the ones used by the health authorities to
model the early phase of the Covid-19 outbreak in Norway. During this time,
the behavior of the disease was largely unknown, and estimating the number
of cases in the population was challenging. Consequently, fitting the param-
eter values was difficult, and they carried considerable uncertainty. As men-
tioned earlier, the most challenging parameters to estimate are those related
to infectiousness and disease spread, which in this model are β, ria , and re2 .
Throughout the course of the pandemic, these parameters have been updated
multiple times to reflect new knowledge about the disease and actual changes
in disease spread due to new mutations or shifts in population behavior.
It is worth noting that we have set re2 > 1, indicating that people in the E2
category are more infectious than the infected group in I. This assumption
reflects the fact that the E2 group is asymptomatic, so people in this group
are likely to be more mobile and potentially infect more people than those
in the I group. On the other hand, the Ia group is also asymptomatic and
therefore likely to have normal social interactions, but it is assumed that
these people have a very low virus count. They are therefore less infectious
than the people that develop symptoms, which is reflected in the low value
of ria .
The parameters µ, λ1, and λ2 are given in units of days⁻¹. Thus, the
mean duration of the symptomatic disease period is five days (1/µ), the non-
infectious incubation period lasts three days on average (1/λ1 ), while the
mean duration of the infectious incubation period (E2 ) is two days (1/λ2 ).
In this model, with multiple infectious categories, the basic reproduction number is found by summing the contributions from the three infectious categories, weighted by their relative infectiousness and mean durations, since the mean duration of the E2 period is 1/λ2 and the mean duration of
both I and Ia is 1/µ. The parameter choices listed above yield R0 ≈ 2.62,
which is the value used by the Institute of Public Health (FHI) to model
the early stage of the outbreak in Norway, from mid-February to mid-March
2020.
Fig. 5.6 Solution of the SEEIIR model with the default parameter values, which are
similar to the values used by Norwegian health authorities during the early phase of the
Covid-19 pandemic.
Although the present model is somewhat more complex than the previous
ones, the implementation is not very different. A class implementation may
look as follows:
class SEEIIR:
def __init__(self, beta=0.33, r_ia=0.1,
r_e2=1.25, lmbda_1=0.33,
lmbda_2=0.5, p_a=0.4, mu=0.2):
self.beta = beta
self.r_ia = r_ia
self.r_e2 = r_e2
self.lmbda_1 = lmbda_1
self.lmbda_2 = lmbda_2
self.p_a = p_a
self.mu = mu
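    def __call__(self, t, u):
        # right-hand side implementing the SEEIIR equations above
        beta, r_ia, r_e2 = self.beta, self.r_ia, self.r_e2
        lmbda_1, lmbda_2 = self.lmbda_1, self.lmbda_2
        p_a, mu = self.p_a, self.mu
        S, E1, E2, I, Ia, R = u
        N = sum(u)
        # total rate of new infections, from the three infectious categories
        new_cases = (beta * S * I / N + r_ia * beta * S * Ia / N
                     + r_e2 * beta * S * E2 / N)
        dS = -new_cases
        dE1 = new_cases - lmbda_1 * E1
        dE2 = lmbda_1 * (1 - p_a) * E1 - lmbda_2 * E2
        dI = lmbda_2 * E2 - mu * I
        dIa = lmbda_1 * p_a * E1 - mu * Ia
        dR = mu * (I + Ia)
        return [dS, dE1, dE2, dI, dIa, dR]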
The model can be solved with any of the methods available in the ODESolver
hierarchy, similar to the simpler models discussed earlier. An example solution
with the default parameter values is shown in Figure 5.6. It is important to
note that since the parameters listed above are based on the initial stage
of the pandemic when no restrictions were in place, this solution may be
interpreted as a potential worst case scenario for the pandemic in Norway if
no government-imposed restrictions were implemented.
While the plot for the I category may not appear too dramatic at first
glance, a closer inspection reveals that the peak reaches slightly above 140,000
people. Considering the limited knowledge available at that stage, particu-
larly regarding the severity of Covid-19, it is not surprising that a scenario of
140,000 people being infected simultaneously caused concern among health
authorities. Another interesting observation from the curve is that the S cat-
egory flattens out well below the total population number. This behavior
exemplifies the concept of herd immunity, wherein when a sufficient number
of people are immune to the disease, it effectively stops spreading even if
many people remain susceptible. As we are aware, severe restrictions were
put in place in most countries during the early spring of 2020, making it
impossible to determine whether this worst case scenario would ever have
materialized. To accurately capture the actual dynamics of the pandemic in
Norway, we would need to incorporate the effect of societal changes and al-
tered infectiousness over time by making the β parameter a function of time.
For instance, we could define it as a piecewise constant function to match the
observed trends in the data.
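A minimal sketch of this idea, with hypothetical dates and values for the piecewise constant transmission rate, could look like:

def beta(t):
    # assumed values: restrictions introduced at t = 30 days
    return 0.33 if t < 30 else 0.15

The right-hand side of the model would then evaluate beta(t) instead of using the constant attribute self.beta.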
Appendix A
Programming of Difference Equations
Although the main focus of these notes is on solvers for differential equations,
we find it useful to include a chapter on the closely related class of problems
known as difference equations. The main motivation for including this topic
in a book on ODEs is to highlight the similarity between the two classes of
problems, and in particular, the similarity of the solution methods and their
implementation. Indeed, solving ODEs numerically can be seen as a two-step
procedure. First, a numerical method is applied to turn differential equations
into difference equations, and then these equations are solved using simple
for-loop. The standard formulation of difference equations is very easy to
translate into a computer program, and some readers may find it easier to
study these equations first, before moving on to ODEs. In the present chapter
we will also touch upon famous sequences and series, which have important
applications both in the numerical solution of ODEs and elsewhere.
A sequence is simply an ordered collection of numbers

x_0, x_1, x_2, \ldots, x_n, \ldots.
For certain sequences, we can derive a formula that expresses the n-th number
xn as a function of n. For instance, consider the sequence of all odd numbers:
1, 3, 5, 7, . . . .
For this sequence, we can write a simple formula for the n-th term
xn = 2n + 1,
and we can use this formula to represent the complete sequence in a compact
form;
(x_n)_{n=0}^{\infty}, \quad x_n = 2n + 1.
Another example of a sequence is the amount of money in a bank account that grows with a fixed annual interest rate p. If the initial amount is x0, the amount after n years is

x_n = x_0(1 + p/100)^n,

and the growth from one year to the next is described by the difference equation x_n = x_{n-1} + (p/100)x_{n-1}.
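This difference equation is easy to solve in a small Python program. A few set-up lines come first; the initial amount x0 = 100 and interest rate p = 5 below are assumed values:

import numpy as np
import matplotlib.pyplot as plt

x0 = 100          # initial amount (assumed value)
p = 5             # annual interest rate in percent (assumed value)
N = 4             # number of years
index_set = range(N + 1)
x = np.zeros(len(index_set))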
x[0] = x0
for n in index_set[1:]:
x[n] = x[n - 1] + (p / 100.0) * x[n - 1]
plt.plot(x, 'ro')
plt.xlabel(’years’)
plt.ylabel(’amount’)
plt.show()
The three lines starting with x[0] = x0 form the core of the program. Here,
we initialize the first element in our solution array with the known x0, and
then enter the for-loop to compute the rest. The loop variable n runs from 1
to N (= 4), and the formula inside the loop computes x[n] from the known
x[n-1].
Also note that we pass a single array as an argument to plt.plot, whereas
in most examples in this book we pass two arrays, typically representing time
on the x-axis and the solution on the y-axis. When only one array of numbers
is sent to plot, they are automatically interpreted as the y-coordinates of the
points, and the x-coordinates will be the indices of the array, in this case the
numbers from 0 to N .
Solving a Difference Equation without Using Arrays. The previous
program stored the sequence as an array, which is convenient for programming
the solver and allows us to plot the entire sequence. However, if we are only
interested in the solution at a single point, i.e., xn , there is no need to store
the entire sequence. Since each xn only depends on the previous value xn−1 ,
we only need to store the last two values in memory. A complete loop can
look like this:
x_old = x0
for n in index_set[1:]:
x_new = x_old + (p / 100.) * x_old
x_old = x_new # x_new becomes x_old at next step
print(’Final amount: ’, x_new)
For this simple case we can actually make the code even shorter, since x_old
is only used in a single line, and we can instead simply overwrite the old value
of x once it has been used:
x = x0
for n in index_set[1:]:
    x = x + (p / 100.0) * x   # overwrite x directly; no history is stored
print('Final amount: ', x)
We can observe that these codes store just one or two numbers, and for
each iteration of the loop, we simply update and overwrite the values we no
longer need. While this approach is straightforward and saves memory by not
storing the complete array, programming with an array x[n] is usually safer,
and we are often interested in plotting the entire sequence. Therefore, in the
subsequent examples, we will mostly use arrays.
Extending the Solver for the Growth of Money. Suppose we want
to change our interest rate model to one where interest is added every day
instead of every year. The daily interest rate is r = p/D, where p is the
annual interest rate and D is the number of days in a year. A common model
in business applies D = 360, while n counts the actual number of days. The difference
equation that relates the amount on one day to the previous day remains the
same:
x_n = x_{n-1} + \frac{r}{100} x_{n-1},

except that the annual interest rate p has been replaced by the daily rate r. If we
want to determine the growth of money between two given dates, we also
need to find the number of days between those dates. This calculation can
be done manually, but Python offers a convenient module named datetime
for this purpose. The following session illustrates how it can be used:
>>> import datetime
>>> date1 = datetime.date(2017, 9, 29) # Sep 29, 2017
>>> date2 = datetime.date(2018, 8, 4) # Aug 4, 2018
>>> diff = date2 - date1
>>> print(diff.days)
309
Putting these tools together, a complete program for daily interest rates may
look like
import numpy as np
import matplotlib.pyplot as plt
import datetime

date1 = datetime.date(2017, 9, 29)
date2 = datetime.date(2018, 8, 4)
diff = date2 - date1
N = diff.days
index_set = range(N + 1)
x = np.zeros(len(index_set))
x0 = 100       # initial amount (example value)
p = 5.0        # annual interest rate in percent (example value)
r = p / 360.0  # daily interest rate
x[0] = x0
for n in index_set[1:]:
    x[n] = x[n - 1] + (r / 100.0) * x[n - 1]
plt.plot(index_set, x)
plt.xlabel('days')
plt.ylabel('amount')
plt.show()
This program is slightly more sophisticated than the first one, but one may
still argue that solving this problem with a difference equation is unnecessarily
complex, when we could simply apply the well-known formula
$x_n = x_0 (1 + \frac{r}{100})^n$ to compute any $x_n$ we want. However, we know
that interest rates change quite often, and the formula is only valid for a
constant r. On the other hand,
for the program based on solving the difference equation, we only need minor
modifications to handle a varying interest rate. The simplest approach is to
let p be an array of the same length as the number of days, and fill it with the
correct interest rates for each day. The modifications to the previous program
may look like this:
p = np.zeros(len(index_set))
# fill p[n] with the correct interest rate for day n
r = p / 360.0    # convert annual rates to daily rates
x[0] = x0
for n in index_set[1:]:
    x[n] = x[n - 1] + (r[n - 1] / 100.0) * x[n - 1]
For instance, if the annual rate is 4.0 percent for the first Np days and 5.0
percent for the rest of the period, the array p can be filled as follows:

p = np.zeros(len(index_set))
Np = N // 2    # example: assume the rate changes halfway through the period
p[:Np] = 4.0
p[Np:] = 5.0
The sequence defined by the difference equation

x_n = x_{n-1} + x_{n-2}, \quad x_0 = 1, \; x_1 = 1,
is called the Fibonacci numbers. Originally derived for modeling rabbit popu-
lations, the Fibonacci numbers possess a range of interesting mathematical
properties that have attracted considerable attention from mathematicians.
The equation for the Fibonacci numbers differs from the previous exam-
ples, since $x_n$ depends on the two previous values $x_{n-1}$ and $x_{n-2}$, making it
a second order difference equation. While this classification is important for
mathematical solution techniques, the distinction between first and second
order equations is minor in programming.
A complete code to solve the difference equation and generate the Fi-
bonacci numbers can be written as
import sys
from numpy import zeros

N = int(sys.argv[1])
x = zeros(N + 1, int)
x[0] = 1
x[1] = 1
for n in range(2, N + 1):
    x[n] = x[n - 1] + x[n - 2]
    print(n, x[n])
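If the code is saved in a file fibonacci.py (the file name is our choice), the
program may be run as follows, here with N = 10:

python fibonacci.py 10
2 2
3 3
4 5
5 8
6 13
7 21
8 34
9 55
10 89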
In this code, we use the built-in list sys.argv from the sys module in order
to provide the input N as a command-line argument. See, for instance, [16]
for an explanation. It is important to note that we need to initialize both
x[0] and x[1] before starting the loop, since the update formula involves
both x[n-1] and x[n-2]. This is the main difference between this second
order equation and the programs for first order equations considered above.
The Fibonacci numbers grow quickly and running this program for large N
will lead to overflow issues (try for instance N = 100). The NumPy int type
supports values up to 9223372036854775807, which is almost $10^{19}$, so overflow is
rarely a problem in practical applications. There are ways to avoid this issue,
for instance using the standard Python int type instead of NumPy arrays,
but we won’t delve into those details here.
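As a minimal sketch of the first alternative (our own illustration, not one of
the code examples above), a plain Python list of built-in integers handles
arbitrarily large Fibonacci numbers:

N = 100
x = [1, 1]    # standard Python ints have arbitrary precision
for n in range(2, N + 1):
    x.append(x[n - 1] + x[n - 2])
print(N, x[N])    # prints a 21-digit number, with no overflow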
Logistic Growth. Returning to the initial problem of calculating the growth
of money in a bank, we can write the classical solution formula more concisely
as
x_n = x_0 (1 + p/100)^n = x_0 C^n \; \left(= x_0 e^{n \ln C}\right),
where C = (1 + p/100). Since n represents years, this exemplifies exponential
growth in time, following the general formula $x = x_0 e^{\lambda t}$. Similarly, popu-
lations of humans, animals, and other organisms exhibit the same type of
growth when resources (such as space and food) are unlimited, and the ex-
ponential growth model has many applications in biology.1 However, most
environments can only support a finite number R of individuals, whereas the
population continues to grow indefinitely in the exponential growth model.
How can we modify the equation to create a more realistic model for growing
populations?
Initially, when resources are abundant, we want the growth to be expo-
nential, i.e., to grow with a given rate r% per year according to the difference
equation:
x_n = x_{n-1} + (r/100) x_{n-1}.
To enforce the growth limit as $x_n \to R$, the rate r must decay to zero as
$x_n$ approaches R. The simplest variation of r(n) is linear:

r(n) = \varrho \left(1 - \frac{x_n}{R}\right).

We observe that $r(n) \approx \varrho$ for small n, when $x_n \ll R$, and
$r(n) \to 0$ as n grows and $x_n \to R$. This formulation of the growth rate
leads to the logistic growth model:

x_n = x_{n-1} + \frac{\varrho}{100} x_{n-1} \left(1 - \frac{x_{n-1}}{R}\right).
This is a nonlinear difference equation, while all the examples considered
earlier were linear. The distinction between linear and nonlinear equations
is crucial for the mathematical analysis of the equations, but it does not
make much difference when solving the equation in a program. To modify
the interest rate program mentioned above to describe logistic growth, we
can simply replace the line
x[n] = x[n-1] + (p / 100.0) * x[n-1]
by
x[n] = x[n-1] + (rho / 100) * x[n-1] * (1 - x[n-1] / R)
¹ As discussed in Chapter 1, the formula $x = x_0 e^{\lambda t}$ is the solution of the differential
equation $dx/dt = \lambda x$, which illustrates the close relation between difference equations
and differential equations.
A complete program for the logistic growth model, using the parameter values
from Fig. A.1, may look like:

import numpy as np
import matplotlib.pyplot as plt
x0 = 100     # initial population (cf. Fig. A.1)
rho = 5.0    # initial growth rate in percent (cf. Fig. A.1)
R = 500      # carrying capacity (cf. Fig. A.1)
N = 200      # number of time steps (example value)
index_set = range(N + 1)
x = np.zeros(len(index_set))
x[0] = x0
for n in index_set[1:]:
    x[n] = x[n - 1] + (rho / 100) * x[n - 1] * (1 - x[n - 1] / R)
plt.plot(index_set, x)
plt.xlabel('years')
plt.ylabel('amount')
plt.show()
Fig. A.1 Solution of the logistic growth model for x0 = 100, ρ = 5.0, R = 500.
Newton's method is another classical example of a difference equation. It is
an iterative algorithm for finding the roots of a nonlinear equation

f(x) = 0.

Starting from some initial guess $x_0$, Newton's method gradually improves the
approximation through the iterations

x_n = x_{n-1} - \frac{f(x_{n-1})}{f'(x_{n-1})} .
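A minimal sketch of such an implementation, consistent with the description
below (the default tolerance and iteration limit are assumed values), could
look like:

def Newton(f, dfdx, x, epsilon=1.0e-7, max_n=100):
    # repeat x = x - f(x)/f'(x) until |f(x)| is small or max_n is reached
    n = 0
    while abs(f(x)) > epsilon and n <= max_n:
        x = x - f(x) / dfdx(x)
        n += 1
    return x, n, f(x)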
The arguments f and dfdx are Python functions implementing f (x) and its
derivative. Both of these arguments are called inside the function and must
therefore be callable. The x argument is the initial guess for the solution
x, and the two optional arguments at the end are the tolerance and the
maximum number of iterations. Although the method is implemented as a
while-loop rather than a for-loop, the main structure of the algorithm remains
the same as for the other difference equations considered earlier.
As a small extension of the models above, consider the growth of an initial
amount F that earns p percent interest per year, while an amount c_n is
consumed each year and the consumption grows with inflation. With q the
percentage of the first year's interest that is consumed, and I the annual
inflation rate in percent (these interpretations of the variables are inferred
from the code), the core of the solver may look like:

c = np.zeros_like(x)
x[0] = F
c[0] = q * p * F * 1e-4    # equals (p / 100) * F * (q / 100)
for n in index_set[1:]:
    x[n] = x[n - 1] + (p / 100.0) * x[n - 1] - c[n - 1]
    c[n] = c[n - 1] + (I / 100.0) * c[n - 1]    # consumption grows by I percent
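The next example is the classical predator-prey model, also known as the
Lotka-Volterra model, which describes the interaction of a prey population
$x_n$ and a predator population $y_n$. As a system of difference equations
(written here to match the code below), the model reads

x_n = x_{n-1} + a x_{n-1} - b x_{n-1} y_{n-1},
y_n = y_{n-1} + d b x_{n-1} y_{n-1} - c y_{n-1}.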
Here, a is the natural growth rate of the prey in the absence of predators, b
is the death rate of prey per encounter of prey and predator, c is the natural
death rate of predators in the absence of food (prey), and d is the efficiency
of turning predated prey into predators. This is a system of two first-order
difference equations, similar to the previous example, and a complete solution
code may look as follows.
import numpy as np
import matplotlib.pyplot as plt

a = 0.1      # example parameter values (not from the original text)
b = 0.01
c = 0.1
d = 0.5
x0 = 10.0    # initial prey population (example value)
y0 = 5.0     # initial predator population (example value)
N = 200
index_set = range(N + 1)
x = np.zeros(len(index_set))
y = np.zeros(len(index_set))
x[0] = x0
y[0] = y0
for n in index_set[1:]:
    x[n] = x[n - 1] + a * x[n - 1] - b * x[n - 1] * y[n - 1]
    y[n] = y[n - 1] + d * b * x[n - 1] * y[n - 1] - c * y[n - 1]
plt.plot(index_set, x, label='Prey')
plt.plot(index_set, y, label='Predator')
plt.xlabel('Time')
plt.ylabel('Population')
plt.legend()
plt.show()
Sequences and series are extremely useful for approximating functions. For
instance, commonly used functions like sin x, ln x, and ex have been defined
to have some desired mathematical properties, and we have an intuitive un-
derstanding of how they look, but we need an algorithm to evaluate the
function values. One convenient approach is to approximate these functions
using polynomials, since they are easy to calculate. Polynomial approxima-
tions have been used for centuries to compute exponentials, trigonometric
functions and others. The most famous and widely used series for such ap-
proximations are the Taylor series, discovered in 1715, and given by
f(x) = \sum_{k=0}^{\infty} \frac{1}{k!} \frac{d^k f(0)}{dx^k} x^k .    (A.2)
In practice, the infinite series is truncated after N + 1 terms, which yields
a polynomial approximation of degree N. We can also shift the variables to
make these truncated Taylor series accurate around any value x = a:

f(x) \approx \sum_{k=0}^{N} \frac{1}{k!} \frac{d^k f(a)}{dx^k} (x - a)^k .
For instance, truncating the Taylor series of $e^x$ around x = 0 after two and
four terms, respectively, gives the approximations

e^x \approx 1 + x ,

e^x \approx 1 + x + \frac{1}{2} x^2 + \frac{1}{6} x^3 .
These approximations are not very accurate for large x, but close to x = 0
they are sufficiently accurate for many applications. We can construct Tay-
lor series approximations for other functions using similar arguments. For
instance, consider sin(x), where the derivatives follow the repetitive pattern
\sin'(x) = \cos(x), \; \sin''(x) = -\sin(x), \; \sin'''(x) = -\cos(x), \ldots. We also have
\sin(0) = 0 and \cos(0) = 1. In general, we have
$d^k \sin(0)/dx^k = (-1)^{(k-1)/2} \, \mathrm{mod}(k, 2)$,
where mod(k, 2) is zero for k even and one for k odd. Inserting these
derivatives into the Taylor series (A.2) yields

\sin x = \sum_{k=0}^{\infty} (-1)^k \frac{x^{2k+1}}{(2k+1)!} .
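As a quick illustration (a sketch of our own, not one of the code examples in
this appendix), the series can be evaluated by simply truncating the sum:

import math

def sin_taylor(x, N=10):
    # sum the first N + 1 terms of the Taylor series for sin(x)
    return sum((-1)**k * x**(2 * k + 1) / math.factorial(2 * k + 1)
               for k in range(N + 1))

print(sin_taylor(1.0), math.sin(1.0))    # the two values agree to ~15 digits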
The Taylor polynomials can also be computed by solving difference equations.
If we let $e_n$ denote the approximation of $e^x$ consisting of the first n
terms of its Taylor series, we have

e_n = \sum_{k=0}^{n-1} \frac{x^k}{k!} = \sum_{k=0}^{n-2} \frac{x^k}{k!} + \frac{x^{n-1}}{(n-1)!} ,

which can be formulated as the difference equation

e_n = e_{n-1} + \frac{x^{n-1}}{(n-1)!} , \quad e_0 = 0 .    (A.4)
We see that this difference equation involves (n − 1)!, which results in many
redundant multiplications when computing the complete factorial for every
iteration. However, we can use the idea of a difference equation for the fac-
torial to compute the Taylor polynomial more efficiently. We have
\frac{x^n}{n!} = \frac{x^{n-1}}{(n-1)!} \cdot \frac{x}{n} ,

so by introducing the terms $a_n = x^n/n!$ as a second unknown sequence, the
Taylor approximation of $e^x$ is governed by the system of difference equations

e_n = e_{n-1} + a_{n-1} , \quad e_0 = 0 ,
a_n = \frac{x}{n} a_{n-1} , \quad a_0 = 1 .
Although we are solving a system of two difference equations, the computa-
tion is far more efficient than solving the single equation in (A.4) directly,
since we avoid the repeated multiplications involved in the factorial compu-
tation.
A complete Python code for solving the system of difference equations and
computing the approximation to the exponential function may look like
import numpy as np

x = 0.5    # point where e^x is approximated
N = 5      # number of terms in the approximation
e = np.zeros(N + 1)
a = np.zeros(N + 1)
e[0] = 0
a[0] = 1
print(f'Exact: exp({x}) = {np.exp(x)}')
for n in range(1, N + 1):
    e[n] = e[n - 1] + a[n - 1]
    a[n] = x / n * a[n - 1]
    print(f'n = {n}: approximation = {e[n]}, error = {abs(e[n] - np.exp(x)):.4g}')
This program first prints the exact value $e^x$ for x = 0.5, and then the Taylor
approximation and associated error for n = 1 to n = 5. The Taylor series
approximation is most accurate close to x = 0. Choosing a larger value of x
would therefore lead to larger errors, and we would also need to increase n
for the approximation to be accurate.