
Numerical Analysis of

Ordinary Differential Equations

Guido Kanschat & Robert Scheichl

July 24, 2019


Preface

These notes are a short presentation of the material presented in my lecture. They follow
the notes “Numerik 1: Numerik gewöhnlicher Differentialgleichungen” by Rannacher (in
German) [Ran17b], as well as the books by Hairer, Nørsett, and Wanner [HNW09] and
Hairer and Wanner [HW10]. Furthermore, the book by Deuflhard and Bornemann [DB08]
was used. Historical remarks are in part taken from the article by Butcher [But96].

We are always thankful for hints and errata.

Thanks go to Dörte Jando, Markus Schubert, Lukas Schubotz, and David Stronczek for
their help with writing and editing these notes.

Index for shortcuts

IVP Initial value problem, see Definition 1.2.7 on page 6


BDF Backward differentiation formula
ODE Ordinary differential equation
DIRK Diagonally implicit Runge-Kutta method
ERK Explicit Runge-Kutta method
IRK Implicit Runge-Kutta method
LMM Linear multistep method
VIE Volterra integral equation, see Remark 1.2.12 on page 7

Index for symbols

C The set of complex numbers


ei The unit vector of Rd or Cd in direction i
Re Real part of a complex number
R The set of real numbers
Rd The d-dimensional vector space of real d-tuples
u The exact solution of an ODE or IVP
uk The exact solution at time step tk
yk The discrete solution at time step tk
⟨x, y⟩ The Euclidean scalar product in the space Rd or Cd
|x| The absolute value of a real number, the modulus of a complex number, or the
Euclidean norm in Rd or Cd, depending on its argument
‖u‖ A norm in a vector space (with exception of the special cases covered by |x|)

Contents

1 Initial Value Problems and their Properties 2

1.1 Modeling with ordinary differential equations . . . . . . . . . . . . . . . . . 2

1.2 Introduction to initial value problems . . . . . . . . . . . . . . . . . . . . . . 5

1.3 Linear ODEs and Grönwall’s inequality . . . . . . . . . . . . . . . . . . . . 8

1.4 Well-posedness of the IVP . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2 Explicit One-Step Methods and Convergence 17

2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

2.2 Error analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

2.3 Runge-Kutta methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

2.4 Estimates of the local error and time step control . . . . . . . . . . . . . . . 32

2.4.1 Extrapolation methods . . . . . . . . . . . . . . . . . . . . . . . . . . 33

2.4.2 Embedded Runge-Kutta methods . . . . . . . . . . . . . . . . . . . . 34

3 Implicit One-Step Methods and Long-Term Stability 36

3.1 Monotonic initial value problem . . . . . . . . . . . . . . . . . . . . . . . . . 36

3.1.1 Stiff initial value problems . . . . . . . . . . . . . . . . . . . . . . . . 38

3.2 A-, B- and L-stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

3.3 General Runge-Kutta methods . . . . . . . . . . . . . . . . . . . . . . . . . 45

3.3.1 Existence and uniqueness of discrete solutions . . . . . . . . . . . . . 47

3.3.2 Considerations on the implementation of Runge-Kutta methods . . . 51

3.4 Construction of Runge-Kutta methods via quadrature . . . . . . . . . . . . 51

3.4.1 Gauss-, Radau-, and Lobatto-quadrature . . . . . . . . . . . . . . . . 52

3.4.2 Collocation methods . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

4 Newton and quasi-Newton methods 58

4.1 Basics of nonlinear iterations . . . . . . . . . . . . . . . . . . . . . . . . . . 58

4.2 Descent methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

4.3 Globalization of Newton . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

4.4 Practical considerations – quasi-Newton methods . . . . . . . . . . . . . . . 62

5 Linear Multistep Methods 65

5.1 Examples of LMMs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

5.2 General definition and consistency of LMMs . . . . . . . . . . . . . . . . . . 67

5.3 Properties of difference equations . . . . . . . . . . . . . . . . . . . . . . . . 69

5.4 Stability and convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

5.5 Starting procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

5.6 LMM and stiff problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

5.6.1 A(α)-stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

6 Boundary Value Problems 78

6.1 General boundary value problems . . . . . . . . . . . . . . . . . . . . . . . . 78

6.2 Second-order, scalar two-point boundary value problems . . . . . . . . . . . 79

6.3 Finite difference methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

6.4 Existence, stability and convergence . . . . . . . . . . . . . . . . . . . . . . 82

7 Outlook towards partial differential equations 87

7.1 The Laplacian and harmonic functions . . . . . . . . . . . . . . . . . . . . . 87

7.2 Finite difference methods in higher dimensions . . . . . . . . . . . . . . . . 89

7.3 Evolution equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92

A Appendix 94

A.1 Comments on uniqueness of an IVP . . . . . . . . . . . . . . . . . . . . . . 94

A.2 Properties of matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95

A.2.1 The matrix exponential . . . . . . . . . . . . . . . . . . . . . . . . . 95

A.3 The Banach fixed-point theorem . . . . . . . . . . . . . . . . . . . . . . . . 97

A.4 The implicit and explicit Euler-method . . . . . . . . . . . . . . . . . . . . . 98

A.5 Derivation of a BDF-scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . 98

Chapter 1

Initial Value Problems and their


Properties

1.1 Modeling with ordinary differential equations

Example 1.1.1 (Exponential growth). Bacteria are living on a substrate with ample
nutrients. Each bacterium splits into two after a certain time ∆t. The time span for splitting
is fixed and independent of the individual. Then, given the amount u0 of bacteria at time
t0 , the amount at t1 = t0 + ∆t is u1 = 2u0 . Generalizing, we obtain

un = u(tn ) = 2n u0 , tn = t0 + n∆t.

After a short time, the number of bacteria will be huge, such that counting is not a good
idea anymore. Also, the cell division does not run on a very sharp clock, such that after
some time, divisions will not only take place at the discrete times t0 + n∆t, but at any
time between these as well. Therefore, we apply the continuum hypothesis, that is, u is
not a discrete quantity anymore, but a continuous one that can take any real value. In
order to accommodate for the continuum in time, we make a change of variables:
u(t) = 2^{(t−t0)/∆t} u0 .

Here, we have already written down the solution of the problem, which is hard to generalize.
The original description of the problem involved the change of u from one point in time
to the next. In the continuum description, this becomes the derivative, which we can now
compute from our last formula:

d/dt u(t) = (ln 2)/∆t · 2^{(t−t0)/∆t} u0 = (ln 2)/∆t · u(t).

We see that the derivative of u at a certain time depends on u itself at the same time
and a constant factor, which we call the growth rate α. Thus, we have arrived at our first
differential equation

u'(t) = αu(t). (1.1)

Figure 1.1: Plot of a solution to the predator-prey system with parameters α = 2/3, β = 4/3,
δ = γ = 1 and initial values u(0) = 3, v(0) = 1. Solved with a Runge-Kutta method of
order five and step size h = 10^{−5}.

What we have seen as well is that we had to start with some bacteria to get the process
going. Indeed, any function of the form

u(t) = ceαt

is a solution to equation (1.1). It is the initial value u0 , which anchors the solution and
makes it unique.
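The change of variables above can be checked numerically: with growth rate α = ln 2/∆t, the doubling formula and the exponential form give the same values also at non-integer times. The following sketch (the parameters ∆t = 1 and u0 = 1 are arbitrary choices, not data from the notes) illustrates this:

```python
import math

def doubling(t, u0=1.0, t0=0.0, dt=1.0):
    """Population by repeated doubling: u(t) = 2**((t - t0)/dt) * u0."""
    return 2.0 ** ((t - t0) / dt) * u0

def exponential(t, u0=1.0, t0=0.0, dt=1.0):
    """The same population written as u(t) = u0 * e**(alpha*(t - t0)),
    with growth rate alpha = ln(2) / dt."""
    alpha = math.log(2.0) / dt
    return u0 * math.exp(alpha * (t - t0))

# The two formulas agree at arbitrary, also non-integer, times:
for t in (0.0, 0.5, 1.0, 2.75):
    assert abs(doubling(t) - exponential(t)) < 1e-12
```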

Example 1.1.2 (Predator-prey systems). We add a second species to our bacteria example. Let's say we replace the bacteria by sardines living in a nutrient-rich sea, and we
add sardine-eating tuna. The amount of sardines eaten depends on the likelihood that
a sardine and a tuna are in the same place, and on the hunting efficiency β of the tuna.
Thus, equation (1.1) is augmented by a negative change in population depending on the
product of sardines u and tuna v:

u' = αu − βuv.

In addition, we need an equation for the amount of tuna. In this simple model, we will
make two assumptions: first, tuna die of natural causes at a death rate of γ. Second, tuna
procreate if there is enough food (sardines), and the procreation rate is proportional to the
amount of food. Thus, we obtain

v' = δuv − γv.

Again, we will need initial populations at some point in time to evolve them to later times
from that point.
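As a quick illustration, the system can be integrated numerically, for instance with the explicit Euler method introduced in Chapter 2. The parameter values below follow Figure 1.1; the step size and final time are arbitrary choices for this sketch:

```python
import numpy as np

def lotka_volterra(u, v, alpha=2/3, beta=4/3, gamma=1.0, delta=1.0):
    """Right-hand side: u' = alpha*u - beta*u*v, v' = delta*u*v - gamma*v."""
    return alpha * u - beta * u * v, delta * u * v - gamma * v

def euler_orbit(u0=3.0, v0=1.0, h=1e-3, T=10.0):
    """Integrate the system with the explicit Euler method (see Chapter 2)."""
    steps = int(round(T / h))
    traj = np.empty((steps + 1, 2))
    traj[0] = (u0, v0)
    u, v = u0, v0
    for k in range(steps):
        du, dv = lotka_volterra(u, v)
        u, v = u + h * du, v + h * dv
        traj[k + 1] = (u, v)
    return traj

traj = euler_orbit()
# First step by hand: u'(3,1) = 2 - 4 = -2 and v'(3,1) = 3 - 1 = 2.
assert np.allclose(traj[1], [3.0 - 2e-3, 1.0 + 2e-3])
```

Euler is only first-order accurate; Figure 1.1 was produced with a fifth-order Runge-Kutta method, which is why a very small step size suffices there.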

Remark 1.1.3. The predator-prey system (a.k.a. the Lotka-Volterra equations) has periodic
solutions. Even though none of these exist in closed form, solutions can be computed

numerically (simulated): Lotka and Volterra became interested in this system as they had
found that the amount of predatory fish caught had increased during World War I. During
the war years there was a strong decrease of fishing effort. In conclusion, they thought,
there had to be more prey fish.

A (far too rarely) applied consequence is that in order to diminish the amount of, e.g.,
foxes one should hunt rabbits, since foxes feed on rabbits.

Example 1.1.4 (Gravitational two-body systems). According to Newton's law of universal
gravitation, two bodies of masses m1 and m2 attract each other with a force

F1 = G (m1 m2 / r³) r1 ,

where F1 is the force vector acting on m1 , r1 is the vector pointing from m1 to m2 , and
r = |r1 | = |r2 |.

Newton’s second law of motion, on the other hand, relates forces and acceleration:

F = mx'',

where x is the position of a body in space.

Combining these, we obtain equations for the positions of the two bodies:

x''_i = G (m_{3−i} / r³) (x_{3−i} − x_i),    i = 1, 2.

This is a system with six unknowns. However, it can be reduced to three, noting
that the distance vector r is the only quantity that has to be computed:

r'' = −G ((m1 + m2) / r³) r.

Intuitively, it is clear that we need an initial position and an initial velocity for the two
bodies. Later on, we will see that this can actually be justified mathematically.

Example 1.1.5 (Celestial mechanics). Now we extend the two-body system to a many-
body system. Again, we subtract the center of mass, such that we obtain n sets of 3
equations for an n + 1-body system. Since forces simply add up, this system becomes
x''_i = G ∑_{j≠i} (m_j / r_ij³) r_ij .    (1.2)

Here, r_ij = x_j − x_i and r_ij = |r_ij |.

Initial data for the solar system can be obtained from

https://ssd.jpl.nasa.gov/?horizons
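A direct transcription of the right-hand side (1.2) into code might look as follows. The value G = 1 (nondimensional units) and the two test masses are arbitrary choices for this sketch, not data from the notes:

```python
import numpy as np

G = 1.0  # gravitational constant in nondimensional units (arbitrary choice)

def accelerations(x, m):
    """Evaluate x_i'' = G * sum_{j != i} m_j / r_ij**3 * r_ij with r_ij = x_j - x_i.
    x has shape (n, 3) (positions), m has shape (n,) (masses)."""
    n = len(m)
    a = np.zeros_like(x)
    for i in range(n):
        for j in range(n):
            if j != i:
                r_ij = x[j] - x[i]
                a[i] += G * m[j] / np.linalg.norm(r_ij) ** 3 * r_ij
    return a

# Two unit masses one unit apart accelerate toward each other with |a| = G:
x = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
a = accelerations(x, np.array([1.0, 1.0]))
assert np.allclose(a, [[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]])
```

For a realistic solar-system simulation the double loop would be vectorized and the softening and units chosen carefully; this sketch only mirrors the formula.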

1.2 Introduction to initial value problems

1.2.1 Definition (Ordinary differential equations): Let u(t) be a function
defined on an interval I ⊂ R with values in the real or complex numbers or in the
space Rd (Cd ). An ordinary differential equation (ODE) is an equation for u(t)
of the form

F(t, u(t), u'(t), u''(t), . . . , u^{(n)}(t)) = 0.    (1.3)

Here F(. . .) denotes an arbitrary function of its arguments.

The order n of a differential equation is the highest derivative which occurs. If
d > 1, we talk about systems of differential equations.

Remark 1.2.2. A differential equation (DE), which is not ordinary, is called partial.
These are equations or systems of equations, which involve partial derivatives with re-
spect to several independent variables. While the functions in an ordinary differential
equation may be dependent on additional parameters, derivatives are only taken with re-
spect to one variable. Often, but not exclusively, this variable is time. This manuscript
only deals with ordinary differential equations, and so the adjective will be omitted in the
following.

1.2.3 Definition: An explicit differential equation of first order is an equation
of the form

u'(t) = f(t, u(t))    (1.4)

or shorter: u' = f(t, u).

A differential equation of order n is called explicit, if it is of the form

u^{(n)}(t) = f(t, u(t), u'(t), . . . , u^{(n−1)}(t)).

1.2.4 Lemma: Every differential equation (of arbitrary order) can be written as
a system of first-order differential equations. If the equation is explicit, then the
system is explicit.

Proof. We introduce the additional variables u0 (t) = u(t), u1 (t) = u'(t), up to un−1 (t) =
u^{(n−1)}(t). Then, the differential equation in (1.3) can be reformulated as the system

u0'(t) − u1 (t) = 0,
u1'(t) − u2 (t) = 0,
    ⋮
un−2'(t) − un−1 (t) = 0,
F(t, u0 (t), u1 (t), . . . , un−1 (t), un−1'(t)) = 0.    (1.5)
In the case of an explicit equation, the system has the form

u0'(t) = u1 (t),
u1'(t) = u2 (t),
    ⋮
un−2'(t) = un−1 (t),
un−1'(t) = f(t, u0 (t), u1 (t), . . . , un−1 (t)).    (1.6)

Example 1.2.5. The differential equation

u'' + ω²u = f(t)    (1.7)

can be transformed into the system

u1' − u2 = 0,
u2' + ω²u1 = f(t).    (1.8)

The transformation is not uniquely determined. In this example, a more symmetric system
can be obtained:

u1' − ωu2 = 0,
u2' + ωu1 = f(t).    (1.9)

From a numerical perspective, system (1.9) should be chosen over (1.8) to avoid loss of
significance or overflow, i.e. if |ω| ≪ 1 or |ω| ≫ 1.
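The effect of the symmetric scaling can be made concrete. For the solution u(t) = cos(ωt) of the homogeneous equation (f ≡ 0), the state vector of (1.8) mixes components of size 1 and ω, while (1.9) keeps both components of order one. A small sketch, with the value ω = 10⁶ an arbitrary choice of ours:

```python
import math

omega = 1.0e6  # a frequency with |omega| >> 1 (arbitrary choice)

# Sample the exact solution u(t) = cos(omega*t) of u'' + omega**2 * u = 0.
ts = [k * 1e-7 for k in range(100)]  # covers roughly one and a half periods

# System (1.8) carries the state (u, u'): components differ in size by ~omega.
amp_18 = max(max(abs(math.cos(omega * t)), abs(omega * math.sin(omega * t))) for t in ts)

# System (1.9) carries the balanced state (u, u'/omega): both components are O(1).
amp_19 = max(max(abs(math.cos(omega * t)), abs(math.sin(omega * t))) for t in ts)

assert amp_19 <= 1.0           # balanced components
assert amp_18 > 1e4 * amp_19   # unbalanced by a factor of order omega
```

Adding or subtracting quantities that differ by a factor of ω in floating-point arithmetic is exactly the situation where significance is lost, which motivates preferring (1.9).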

1.2.6 Definition: A differential equation of the form (1.4) is called autonomous,
if the right hand side f is not explicitly dependent on t, i.e.

u' = f(u).    (1.10)

Each differential equation can be transformed into an autonomous differential equation. This is called autonomization:

U = (u, t)ᵀ,    F(U) = (f(t, u), 1)ᵀ,    U' = F(U).

A method which provides the same solution for the autonomous differential equation
as for the original IVP, is called invariant under autonomization.
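Autonomization is easy to carry out mechanically. The following sketch (the function names are our own, not from the notes) appends the time as an extra solution component with derivative 1:

```python
def autonomize(f):
    """Wrap u' = f(t, u) into the autonomous form U' = F(U) with U = (u, t):
    the last component carries the time and satisfies t' = 1."""
    def F(U):
        *u, t = U
        return [*f(t, u), 1.0]
    return F

# Example: the scalar growth equation u' = alpha * u from (1.1).
alpha = 0.5
f = lambda t, u: [alpha * u[0]]
F = autonomize(f)
assert F([2.0, 7.0]) == [1.0, 1.0]  # (alpha * 2.0, t' = 1)
```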

Differential equations usually provide sets of solutions from which we have to choose a
solution. An important selection criterion is setting an initial value which leads to a well-
posed problem (see below).

1.2.7 Definition: Given a point (t0 , u0 ) ∈ R × Rd and a function f(t, u) with values
in Rd , defined in a neighborhood I × U ⊂ R × Rd of (t0 , u0 ). Then, an initial value
problem (IVP) is defined as follows: find a function u(t), such that

u'(t) = f(t, u(t))    (1.11a)
u(t0 ) = u0    (1.11b)

1.2.8 Definition: We call a continuously differentiable function u(t) with u(t0 ) = u0
a local solution of the IVP (1.11), if there exists a neighborhood J ⊂ R of t0 , such
that u(t) and f(t, u(t)) are defined and the equation (1.11a) holds for all t ∈ J.

Remark 1.2.9. We introduced the IVP deliberately in a "local" form because the local
notion of solution is the most useful one for our purposes. Since the neighborhood
J in the definition above can be arbitrarily small, we will have to deal with the extension
to larger intervals below.

Remark 1.2.10. Through the substitution of t 7→ τ with τ = t − t0 it is possible to


transform every IVP at the point t0 to an IVP in 0. We will make use of this fact and
soon always assume t0 = 0.

1.2.11 Lemma: Let f be continuous in both arguments. Then, the function u(t)
is a solution of the initial value problem (1.11) if and only if it is a solution of the
Volterra integral equation (VIE)

u(t) = u0 + ∫_{t0}^{t} f(s, u(s)) ds.    (1.12)

Remark 1.2.12. The formulation as an integral equation, on the other hand, allows a more
general notion of solution, because the problem is already well-posed for functions f(t, u) which
are just integrable with respect to t. (In that case, the solution u would be just absolutely
continuous and not continuously differentiable.) Both the theoretical analysis of the IVP
and the numerical methods in these lecture notes (with the exception of the BDF methods) are
in fact based on the associated integral equation (1.12) and not the IVP (1.11).

1.2.13 Theorem (Peano's existence theorem): Let α, β > 0 and let the function f(t, u) be continuous on the closed set

D = {(t, u) ∈ R × Rd : |t − t0 | ≤ α, |u − u0 | ≤ β}.

There exists a solution u(t) ∈ C¹(I) on the interval I = [t0 − T, t0 + T] with

T = min(α, β/M),    M = max_{(t,u)∈D} |f(t, u)|.

The proof of this theorem is of little consequence for the remainder of these notes. For
its verification, we refer to textbooks on the theory of ordinary differential equations or to
[Ran17b, Satz 1.1].

Remark 1.2.14. The Peano existence theorem does not make any statements about the
uniqueness of a solution and guarantees only local existence. The second limitation is
addressed by the following theorem. Uniqueness will be discussed in section 1.4.

1.2.15 Theorem (Peano's continuation theorem): Let the assumptions of
Theorem 1.2.13 hold. Then, the solution can be extended to an interval Im = [t−, t+]
such that the points (t−, u(t−)) and (t+, u(t+)) are on the boundary of D. Neither
the values of t, nor the values of u(t) need to be bounded as long as f remains
bounded.

Example 1.2.16. The IVP

u' = 2√|u|,    u(0) = 0,

has solutions u(t) = t² and u(t) ≡ 0 that both exist for all t ∈ R (global existence, but
non-uniqueness).
Example 1.2.17. The IVP

u' = −u²,    u(0) = 1,

has the unique solution u(t) = 1/(1 + t). This solution has a singularity for t → −1 (no global
existence, but uniqueness). However, it exists for all t > −1 and thus in particular for all
t > 0 = t0 , which is all that matters for an IVP.

1.3 Linear ODEs and Grönwall’s inequality

1.3.1. The study of linear differential equations turns out to be particularly simple, and
results obtained here will provide us with important statements for general non-linear
IVPs. Therefore, we pay particular attention to the linear case.

1.3.2 Definition: An IVP according to Definition 1.2.7 is called linear if the right
hand side f is an affine function of u and the IVP can be written in the form

u'(t) = A(t)u(t) + b(t)    ∀t ∈ R    (1.13a)
u(t0 ) = u0    (1.13b)

with a continuous matrix function A : R → Cd×d.

If in addition b(t) ≡ 0, we call the IVP homogeneous.

1.3.3 Definition: Let the matrix function A : I → Cd×d be continuous. Then the
function defined by

M(t) = exp(−∫_{t0}^{t} A(s) ds)    (1.14)

is called integrating factor of the equation (1.13a).

Corollary 1.3.4. The integrating factor M(t) has the properties

M(t0 ) = I    (1.15)
M'(t) = −M(t)A(t).    (1.16)

1.3.5 Lemma: Let M(t) be the integrating factor of the equation (1.13a) defined
in (1.14). Then, the function

u(t) = M(t)⁻¹ (u0 + ∫_{t0}^{t} M(s)b(s) ds)    (1.17)

is a solution of the IVP (1.13) that exists for all t ∈ R.

Proof. We consider the auxiliary function w(t) = M(t)u(t) with the integrating factor
M(t) defined as in eqn. (1.14). It follows by using the product rule that

w'(t) = M(t)u'(t) + M'(t)u(t) = M(t)(u'(t) − A(t)u(t)).    (1.18)

Using the differential equation (1.13a), we obtain

w'(t) = M(t)b(t).

This can be integrated directly to obtain

w(t) = u0 + ∫_{t0}^{t} M(s)b(s) ds,

where we have used (1.15) such that w(t0 ) = M(t0 )u(t0 ) = u0 .

According to lemma A.2.3 about the matrix exponential, M (t) is invertible for all t. Thus
we can apply M (t)−1 to w(t) to obtain the solution u(t) of (1.13) as given in equation (1.17).

The global solvability follows since the solution is defined for arbitrary t ∈ R.

Example 1.3.6. The equation in Example 1.2.5 is linear and can be written in the form
of (1.13) with

A(t) = A = ( 0  ω ; −ω  0 )    and    b(t) = ( 0 ; f(t) ),

where ( a  b ; c  d ) denotes the matrix with rows (a, b) and (c, d).

Let now f(t) ≡ 0, t0 = 0 and u(0) = u0 . It is easy to see A has eigenvalues iω and −iω,
so that we can write

A = C⁻¹ ( iω  0 ; 0  −iω ) C

with a suitable transformation matrix C (DIY). Using the properties of the matrix exponential, the integrating factor is

M(t) = e^{−At} = C⁻¹ ( e^{−iωt}  0 ; 0  e^{iωt} ) C = ( cos ωt  −sin ωt ; sin ωt  cos ωt ).

Thus, the solution is

u(t) = M(t)⁻¹ u0 = ( cos ωt  sin ωt ; −sin ωt  cos ωt ) u0 .

The missing details in this argument and the case for an inhomogeneity f(t) = cos αt are
left as an exercise (DIY).

Remark 1.3.7. If the function b(t) in (1.13a) is only integrable, the function u(t) defined
in (1.17) is absolutely continuous and thus differentiable almost everywhere. The product
rule (1.18) is applicable in all points of differentiability and w(t) solves the Volterra integral
equation corresponding to (1.13). Thus, the representation formula (1.17) holds generally
for solutions of linear Volterra integral equations.

1.3.8 Lemma (Grönwall): Let w(t), a(t) and b(t) be nonnegative, integrable
functions, such that a(t)w(t) is integrable. Furthermore, let b(t) be monotonically
non-decreasing and let w(t) satisfy the integral inequality

w(t) ≤ b(t) + ∫_{t0}^{t} a(s)w(s) ds,    t ≥ t0 .    (1.19)

Then, for almost all t ≥ t0 there holds:

w(t) ≤ b(t) exp(∫_{t0}^{t} a(s) ds).    (1.20)

Proof. Using the integrating factor

m(t) = exp(−∫_{t0}^{t} a(s) ds),    1/m(t) = exp(∫_{t0}^{t} a(s) ds),

we introduce the auxiliary function

v(t) = m(t) ∫_{t0}^{t} a(s)w(s) ds.

This function is absolutely continuous, and since m'(t) = −a(t)m(t), we have almost
everywhere

v'(t) = m(t)a(t) (w(t) − ∫_{t0}^{t} a(s)w(s) ds).

Using assumption (1.19), the bracket on the right can be bounded by b(t). Thus,

v'(t) ≤ m(t)a(t)b(t)

and since by definition v(t0 ) = 0, it follows that

v(t) ≤ ∫_{t0}^{t} m(s)a(s)b(s) ds,

which implies, using the definition of v(t), that

∫_{t0}^{t} a(s)w(s) ds = v(t)/m(t) ≤ (1/m(t)) ∫_{t0}^{t} m(s)a(s)b(s) ds.

Finally, since b(t) is nondecreasing and m(s) = exp(−∫_{t0}^{s} a(r) dr), we obtain almost everywhere

∫_{t0}^{t} a(s)w(s) ds ≤ (b(t)/m(t)) ∫_{t0}^{t} a(s) exp(−∫_{t0}^{s} a(r) dr) ds
= (b(t)/m(t)) [−exp(−∫_{t0}^{s} a(r) dr)]_{s=t0}^{s=t}
= (b(t)/m(t)) (m(t0 ) − m(t)) = b(t)/m(t) − b(t).

Combining this bound with the integral inequality (1.19), we obtain

w(t) ≤ b(t) + ∫_{t0}^{t} a(s)w(s) ds ≤ b(t)/m(t),

which proves the lemma.
Remark 1.3.9. As we can see from the form of assumption (1.19) and estimate (1.20),
the purpose of Grönwall’s inequality is to construct a majorant for w(t) that satisfies a
linear IVP. The bound is particularly simple when a, b ≥ 0 are constant.

1.3.10 Corollary: If two solutions u(t) and v(t) of the linear differential equation (1.13a) coincide in a point t0 , then they are identical.

Proof. The difference w(t) = v(t) − u(t) solves the integral equation

w(t) = ∫_{t0}^{t} A(s)w(s) ds.

Hence, for an arbitrary vector norm ‖·‖ (and induced matrix norm also denoted by ‖·‖),
we can obtain the following integral inequality

‖w(t)‖ ≤ ∫_{t0}^{t} ‖A(s)w(s)‖ ds ≤ ∫_{t0}^{t} ‖A(s)‖‖w(s)‖ ds.

Now, applying Grönwall's inequality (1.20) with a(t) = ‖A(t)‖ and b(t) = 0, we can
conclude that ‖w(t)‖ = 0 and therefore u(t) = v(t), for all t.
Corollary 1.3.11. The representation formula (1.17) in Lemma 1.3.5 defines the unique
solution to the IVP (1.13). In particular, solutions of linear IVPs are always defined for
all t ∈ R.
Example 1.3.12. Let A ∈ Cd×d be diagonalizable with (possibly repeated) eigenvalues
λ1 , . . . , λd and corresponding eigenvectors ψ(1) , . . . , ψ(d) . The linear IVP

u' = Au,    u(0) = u0 ,

has the unique solution u(t) = e^{At} u0 . Using the properties of the matrix exponential (see
Appendix A.2.1), with Ψ ∈ Cd×d denoting the matrix with ith column ψ(i) , so that
A = Ψ diag(λ1 , . . . , λd ) Ψ⁻¹, we get

u(t) = e^{Ψ diag(λ1 ,...,λd ) Ψ⁻¹ t} u0 = Ψ diag(e^{λ1 t}, . . . , e^{λd t}) Ψ⁻¹ u0 .
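The eigendecomposition formula can be checked numerically. The sketch below (using NumPy; the matrix, ω and t are arbitrary choices) builds e^{At} from eigenvalues and eigenvectors and compares it with the closed form cos(ωt) I + sin(ωt) J, J = ( 0  1 ; −1  0 ), which holds for this particular A:

```python
import numpy as np

def propagator(A, t):
    """e^{At} for diagonalizable A via A = Psi * diag(lambda_i) * Psi^{-1},
    so that e^{At} = Psi * diag(e^{lambda_i * t}) * Psi^{-1} (cf. Example 1.3.12)."""
    lam, Psi = np.linalg.eig(A)
    return (Psi @ np.diag(np.exp(lam * t)) @ np.linalg.inv(Psi)).real

omega, t = 2.0, 0.7
A = np.array([[0.0, omega], [-omega, 0.0]])   # the matrix from Example 1.3.6
u0 = np.array([1.0, 0.5])
u_t = propagator(A, t) @ u0

# For this A, e^{At} = cos(omega*t)*I + sin(omega*t)*J with J = [[0, 1], [-1, 0]]:
c, s = np.cos(omega * t), np.sin(omega * t)
assert np.allclose(u_t, np.array([[c, s], [-s, c]]) @ u0)
```

Note that the eigendecomposition of this real matrix is complex; only the final product is real, which is why the sketch discards the (numerically tiny) imaginary part.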

1.3.13 Lemma: The solutions of the homogeneous, linear differential equation

u'(t) = A(t)u(t)    (1.21)

with u : R → Rd , define a vector space of dimension d. Let {ψ(i)}i=1,...,d be a
basis of Rd . Then the solutions ϕ(i)(t) of the equation (1.21) with initial values
ϕ(i)(0) = ψ(i) form a basis of the solution space. The vectors {ϕ(i)(t)}i=1,...,d are
linearly independent for all t ∈ R.

Proof. At first we observe that, due to linearity of the derivative and the right hand side,
for two solutions u(t) and v(t) of the equation (1.21) and for α ∈ R, αu(t) + v(t) is also
a solution of (1.21) with initial condition αu(0) + v(0) ∈ Rd . Therefore, the vector space
structure follows from the vector space structure of Rd .

Let now {ϕ(i) (t)} be solutions of the IVP with linearly independent initial values {ψ (i) }.
As a consequence the functions are linearly independent as well.

Assume that w(t) is a solution of equation (1.21) that cannot be written as a linear
combination of the functions {ϕ(i)(t)}. Then, w(0) cannot be a linear combination of the
vectors ψ(i): otherwise, if there exist {αi}i=1,...,d with w(0) = Σ αi ψ(i), then
w(t) = Σ αi ϕ(i)(t) due to the uniqueness of any solution of equation (1.21) proven in
Corollary 1.3.10, which would contradict our assumption. However, since {ψ(i)} was assumed
to form a basis of Rd , every vector, and in particular w(0), is such a linear combination.
Hence no such w(t) exists. It follows that the solution space has dimension d and that
the ϕ(i)(t) form a basis.

Since in the above argument t ∈ R was arbitrary, the ϕ(i) (t) are linearly independent for
all t ∈ R.

1.3.14 Definition: A basis {ϕ(1), . . . , ϕ(d)} of the solution space of the linear differential equation (1.21), in particular the basis with initial values ϕ(i)(0) = ei , is
called fundamental system of solutions. The matrix function

Y(t) = (ϕ(1)(t), . . . , ϕ(d)(t))    (1.22)

with column vectors ϕ(i)(t) is called fundamental matrix.

1.3.15 Corollary: The fundamental matrix Y(t) is regular for all t ∈ R and it
solves the IVP

Y'(t) = A(t)Y(t)
Y(0) = I.

Proof. The initial value is part of the definition. On the other hand, splitting the matrix-
valued IVP into its column vectors, we obtain the original family of IVPs defining the
solution space. Regularity follows from the linear independence of the solutions for any t.

1.4 Well-posedness of the IVP

1.4.1 Definition: A mathematical problem is called well-posed if the following


Hadamard conditions are satisfied:

1. A solution exists.

2. The solution is unique.

3. The solution depends continuously on the data.

The third condition in this form is purely qualitative. Typically, in order to characterize
problems with good approximation properties, we will require Lipschitz continuity, which
has a more quantitative character.

Example 1.4.2. The IVP

u' = ∛u,    u(0) = 0,

has infinitely many solutions of the form

u(t) = 0 for t ∈ [0, c],    u(t) = ((2/3)(t − c))^{3/2} for t > c,

with c ≥ 0. Thus, the solution is not unique and therefore, the IVP is not well-posed.

Let now the initial value be nonzero, but slightly positive. Then, the solution is unique,
i.e., u(t) ≈ ((2/3) t)^{3/2}. In contrast, when the initial value is slightly negative, there exists no
real-valued solution. Hence, a small perturbation of the initial condition has a dramatic
effect on the solution; this is what the third condition for a well-posed problem in Definition
1.4.1 excludes.

1.4.3 Definition: The function f (t, y) satisfies a uniform Lipschitz condition on


the domain D = I × Ω ⊂ R × Rd , if it is Lipschitz continuous with respect to y, i.e.,
there exists a constant L > 0, such that

∀t ∈ I; x, y ∈ Ω : |f (t, x) − f (t, y)| ≤ L|x − y| (1.23)

It satisfies a local Lipschitz condition if (1.23) holds for all compact subsets of D.

Example 1.4.4. Let f(t, y) ∈ C¹(R × Rd) and let all partial derivatives with respect to
the components of y be bounded such that

max_{t∈R, y∈Rd, 1≤i,j≤d} |∂fj/∂yi (t, y)| ≤ K.
Then, f satisfies the Lipschitz condition (1.23) with L = Kd. Indeed, by using the Fundamental Theorem of Calculus, we see that

fj(t, y) − fj(t, x) = ∫_0^1 d/ds fj(t, x + s(y − x)) ds
= ∫_0^1 Σ_{i=1}^{d} ∂fj/∂yi (t, x + s(y − x)) (yi − xi) ds.

Now, exploiting the fact that |Ax| ≤ ‖A‖_F |x|, where ‖A‖_F := (Σ_{i,j=1}^{d} a_{ij}²)^{1/2} is the Frobenius
norm of the matrix A, we get

|f(t, y) − f(t, x)| ≤ ∫_0^1 |Σ_{i=1}^{d} ∂f/∂yi (t, x + s(y − x)) (yi − xi)| ds
≤ ∫_0^1 (Σ_{i,j=1}^{d} |∂fj/∂yi (t, x + s(y − x))|²)^{1/2} |y − x| ds ≤ Kd |y − x|.
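The bound |f(t, y) − f(t, x)| ≤ Kd |y − x| can be spot-checked numerically. In the sketch below we pick the sample function fj(t, y) = sin(yj), our own choice of example, for which every partial derivative is bounded by K = 1 (so L = Kd = d is a generous overestimate):

```python
import math
import random

random.seed(0)
d, K = 3, 1.0

def f(t, y):
    """Sample right-hand side f_j(t, y) = sin(y_j); |d f_j / d y_i| <= K = 1."""
    return [math.sin(yj) for yj in y]

def norm(z):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(zi * zi for zi in z))

# Spot-check |f(t, y) - f(t, x)| <= K * d * |y - x| at random points.
for _ in range(1000):
    x = [random.uniform(-5.0, 5.0) for _ in range(d)]
    y = [random.uniform(-5.0, 5.0) for _ in range(d)]
    diff = norm([a - b for a, b in zip(f(0.0, y), f(0.0, x))])
    assert diff <= K * d * norm([a - b for a, b in zip(y, x)]) + 1e-12
```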

1.4.5 Theorem (Stability): Let f(t, y) and g(t, y) be two continuous functions
on a cylinder D = I × Ω where the interval I contains t0 and Ω is a convex set in
Rd . Furthermore, let f admit a Lipschitz condition with constant L on D. Let u
and v be solutions to the IVPs

u' = f(t, u)    ∀t ∈ I,    u(t0 ) = u0 ,    (1.24)
v' = g(t, v)    ∀t ∈ I,    v(t0 ) = v0 .    (1.25)

Then

|u(t) − v(t)| ≤ e^{L|t−t0|} (|u0 − v0 | + ∫_{t0}^{t} max_{x∈Ω} |f(s, x) − g(s, x)| ds).    (1.26)

Proof. Both u(t) and v(t) solve their respective Volterra integral equations. Taking the
difference, we obtain

u(t) − v(t) = u0 − v0 + ∫_{t0}^{t} (f(s, u(s)) − g(s, v(s))) ds
= u0 − v0 + ∫_{t0}^{t} (f(s, u(s)) − f(s, v(s))) ds + ∫_{t0}^{t} (f(s, v(s)) − g(s, v(s))) ds.

Thus, its norm admits the integral inequality

|u(t) − v(t)| ≤ |u0 − v0 | + ∫_{t0}^{t} |f(s, v(s)) − g(s, v(s))| ds + ∫_{t0}^{t} |f(s, u(s)) − f(s, v(s))| ds
≤ |u0 − v0 | + ∫_{t0}^{t} max_{x∈Ω} |f(s, x) − g(s, x)| ds + ∫_{t0}^{t} L|u(s) − v(s)| ds.

Denoting the first two terms on the right by b(t), this inequality is in the form of the
assumption in Grönwall's lemma with a ≡ L and w(t) := |u(t) − v(t)|, and its application
yields the stability result (1.26).
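For a concrete pair of IVPs, the stability bound (1.26) can be verified against exact solutions. Below, f(t, u) = −u and g(t, v) = −v + ε with an arbitrary perturbation ε of our choosing, so both solutions are known in closed form:

```python
import math

L, eps = 1.0, 1e-3            # Lipschitz constant of f(t, u) = -u; perturbation size
u0, v0 = 1.0, 1.01            # slightly different initial values

def u(t):
    """Solves u' = -u, u(0) = u0."""
    return u0 * math.exp(-t)

def v(t):
    """Solves v' = -v + eps, v(0) = v0."""
    return (v0 - eps) * math.exp(-t) + eps

# Stability bound (1.26) with t0 = 0 and |f - g| = eps, constant in s and x:
for t in [0.0, 0.5, 1.0, 2.0, 5.0]:
    assert abs(u(t) - v(t)) <= math.exp(L * t) * (abs(u0 - v0) + eps * t) + 1e-15
```

For this dissipative example the actual difference stays bounded while the right-hand side grows like e^{Lt}; the Grönwall-type bound is a worst-case estimate, not sharp here.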

1.4.6 Theorem (Picard-Lindelöf): Let f(t, y) be continuous on a cylinder

D = {(t, y) ∈ R × Rd | |t − t0 | ≤ a, |y − u0 | ≤ b}.

Let f be bounded such that there is a constant M := max_{(t,y)∈D} |f(t, y)| and satisfy
the Lipschitz condition (1.23) with constant L on D. Then the IVP

u' = f(t, u)
u(t0 ) = u0

is uniquely solvable on the interval I = [t0 − T, t0 + T] where T = min{a, b/M}.

Proof. W.l.o.g., we assume t0 = 0 and let

I := [−T, T]    and    Ω = {y ∈ Rd : |y − u0 | ≤ b}.

The Volterra integral equation (1.12) allows us to define the operator

F(u)(t) := u0 + ∫_0^t f(s, u(s)) ds.    (1.27)

Obviously, u is a solution of the Volterra integral equation (1.12) if and only if u is a fixed
point of F, i.e., u = F(u). We can obtain such a fixed-point by the iteration u^{(k+1)} = F(u^{(k)})
with some initial guess u^{(0)} : I → Ω.

From the boundedness of f, we obtain for all |t| ≤ T that

|u^{(k+1)}(t) − u0 | = |∫_0^t f(s, u^{(k)}(s)) ds| ≤ |∫_0^t |f(s, u^{(k)}(s))| ds| ≤ TM ≤ b.

Thus, it follows by an inductive argument that u(k) : I → Ω is well-defined for all k ∈ N.


We now show that under the given assumptions, F is a contraction and then apply the
Banach Fixed-Point Theorem. We follow the technique in [Heu86, §117] and choose on the
space C(I) of continuous functions defined on I, the weighted maximum-norm

‖u‖_e := max_{t∈I} e^{−2Lt} |u(t)|.

Then, for all u, v ∈ C(I),

|F(u)(t) − F(v)(t)| = |u0 − u0 + ∫_0^t (f(s, u(s)) − f(s, v(s))) ds|
≤ ∫_0^t |f(s, u(s)) − f(s, v(s))| ds
≤ ∫_0^t L|u(s) − v(s)| e^{−2Ls} e^{2Ls} ds
≤ L‖u − v‖_e ∫_0^t e^{2Ls} ds = L‖u − v‖_e (e^{2Lt} − 1)/(2L) ≤ (1/2) e^{2Lt} ‖u − v‖_e ,

and by multiplying both sides with e^{−2Lt} it follows that

‖F(u) − F(v)‖_e ≤ (1/2) ‖u − v‖_e .
Thus, we have shown that F is a contraction on (C(I), k·ke ). Therefore, we can apply
theorem A.3.1, the Banach Fixed-Point Theorem, and conclude that F has exactly one
fixed-point, which completes the proof.

Remark 1.4.7. The norm ‖·‖_e was chosen with Grönwall's inequality in mind, even though
the inequality was not used explicitly in the proof. It is equivalent to the norm ‖·‖_∞
because e^{−2Lt} is strictly positive and bounded on I. One could also have performed the
proof (with some extra calculations) with respect to the ordinary maximum norm ‖·‖_∞.

Remark 1.4.8. So far, the solution is only guaranteed on I = [t0 − T, t0 + T]. Since T is
chosen in theorem 1.4.6 in such a way that the graph of u does not leave the cylinder D,
the solution reaches a point (t1, u1) of D with t1 := t0 + T. One can now extend the
solution by solving the next IVP u′ = f(t, u) with initial condition u(t1) = u1 on an
interval I1, and so on. This way one obtains a solution on I ∪ I1 ∪ I2 ∪ ⋯.

If f satisfies a Lipschitz condition everywhere then this leads to the following corollary.

Corollary 1.4.9. Let the function f (t, u) admit the Lipschitz condition on R × Cd . Then,
the IVP has a unique solution on the whole real axis.

Proof. The boundedness was used in order to guarantee that u(t) ∈ Ω for any t. This is not
necessary anymore, if Ω = Cd . The fixed point argument does not depend on boundedness
of the set. (See Exercise Sheet 4 for a more detailed proof.)
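The fixed-point iteration u^(k+1) = F(u^(k)) from the proof can also be carried out numerically. The following sketch applies it to u′ = u, u(0) = 1 on [0, 1], where the fixed point is u(t) = e^t; the integral in (1.27) is replaced by a composite trapezoidal sum on a grid (this discretization is an assumption of the illustration, not part of the proof):

```python
import numpy as np

# Picard iteration u^(k+1) = F(u^(k)) from (1.27) for u' = u, u(0) = 1.
# The integral is approximated by the cumulative trapezoidal rule on a grid;
# the iterates converge to the fixed point u(t) = e^t.
t = np.linspace(0.0, 1.0, 1001)
h = t[1] - t[0]
u = np.ones_like(t)                      # initial guess u^(0) = u0 = 1
for k in range(30):
    integrand = u                        # f(s, u(s)) = u(s)
    # cumulative trapezoidal approximation of int_0^t u(s) ds
    integral = np.concatenate(([0.0], np.cumsum((integrand[1:] + integrand[:-1]) / 2) * h))
    u = 1.0 + integral                   # u^(k+1) = F(u^(k))

print(np.max(np.abs(u - np.exp(t))))     # small: dominated by the quadrature error
```

After 30 iterations the remaining deviation from e^t is dominated by the O(h²) trapezoidal error, in line with the geometric contraction shown in the proof.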

Chapter 2

Explicit One-Step Methods and Convergence

2.1 Introduction

Example 2.1.1 (Euler’s method). We begin this section with the method that serves as
the prototype for a whole class of schemes for solving IVPs numerically.
(As always for problems with infinite dimensional solution spaces, numerical solution refers
to finding an approximation by means of a discretization method and the study of the
associated discretization error.)

Consider the following problem:


Given an IVP of the form (1.11) with t0 = 0, calculate the value u(T ) at time T > 0.

Note first that at the initial point 0, not only the value u(0) = u0 of u, but also the
derivative u0 (0) = f (0, u0 ) are known. Thus, near 0 we are able to approximate the
solution u(t) (in blue in Figure 2.1) by a straight line y(t) (in red in Figure 2.1, left) using
a first-order Taylor series expansion, i.e.

    u(t) ≈ y(t) = u(0) + t u′(0) = u0 + t f(0, u0).

The figure suggests that in general the accuracy of this method may not be very good for
t far from 0. The first improvement is that we do not draw the line through the whole
interval from 0 to T . Instead, we insert intermediate points and apply the method to each
subinterval, using the result of the previous interval as the initial point for the next. As a
result we obtain a continuous chain of straight lines (in red in Figure 2.1, right) and the
so-called Euler method (details below).

Figure 2.1: Derivation of the Euler method. Left: approximation of the solution of the
IVP by a line that agrees in slope and value with the solution at t = 0. Right: Euler
method with three subintervals.

2.1.2 Definition: On a time interval I = [0, T], we define a partitioning into n
subintervals, also known as time steps. Here we choose the following notation:

    0 = t0 < t1 < t2 < ⋯ < tn−1 < tn = T,    Ik = [tk−1, tk], k = 1, …, n.

The time steps Ik = [tk−1, tk] have step size hk = tk − tk−1. A partitioning into n
time steps implies tn = T. The term "k-th time step" is used both for the interval
Ik and for the point in time tk (which one is meant will be clear from the context).
Very often, we will consider uniform time steps, and in that case the step size is
denoted by h and hk = h, for all k.

Definition 2.1.3 (Notation). In the following chapters we will regularly compare the
solution of an IVP with the results of discretization methods. Therefore, we introduce the
following convention for notation and symbols.

The solution of the IVP is called the exact or continuous solution, to emphasize that it
is the solution of the non-discretized problem. We denote it in general by u and in addition
we use the abbreviation

uk = u(tk ).

If u is vector-valued, we also use the alternative superscript u^(k) and write u_i^(k) for
the ith component of the vector u(tk).

The discrete solution is in general denoted by y and we write yk or y (k) for the value of
the discrete solution at time tk . In contrast to the continuous solution, y is only defined
at discrete time steps (except for special methods discussed later).

2.1.4 Definition (Explicit one-step methods): An explicit one-step method
is a method which, given u0 at t0 = 0, computes a sequence of approximations
y1, …, yn to the solution of an IVP at the time steps t1, …, tn using an update
formula of the form

    yk = yk−1 + hk Fhk(tk−1, yk−1). (2.1)

The function Fhk(·) is called increment function.¹ We will often omit the index
hk on Fhk(·) because it is clear that the method is always applied to time intervals.
The method is called one-step method because the value yk explicitly depends
only on the values yk−1 and f(tk−1, yk−1), not on earlier values.

¹The adjective 'explicit' is here in contrast to 'implicit' one-step methods, where the increment
function depends also on yk and equation (2.1) typically leads to a nonlinear equation for yk.

Remark 2.1.5. For one-step methods every step is by definition identical. Therefore, it
is sufficient to define and analyze methods by stating the dependence of y1 on y0, which
can then be transferred to the general step from yn−1 to yn. The general one-step method
above then reduces to

    y1 = y0 + h0 Fh0(t0, y0).

This implies that the values yk with k ≥ 2 are computed through formula (2.1) with the
respective hk and the same increment function (but evaluated at tk−1, yk−1).
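The update formula (2.1) translates directly into a short driver loop. The following sketch (function and parameter names are illustrative, not taken from the notes) applies an arbitrary explicit one-step method, given by its increment function, on a uniform partition:

```python
import numpy as np

def one_step_solve(F, u0, T, n):
    """Apply the explicit one-step update (2.1), y_k = y_{k-1} + h*F(t_{k-1}, y_{k-1}, h),
    on a uniform partition of [0, T] with n steps; returns times and values."""
    h = T / n
    t = np.linspace(0.0, T, n + 1)
    y = np.empty(n + 1)
    y[0] = u0
    for k in range(1, n + 1):
        y[k] = y[k - 1] + h * F(t[k - 1], y[k - 1], h)
    return t, y

# Euler's increment function F_h(t, y) = f(t, y) for the IVP u' = u, u(0) = 1:
t, y = one_step_solve(lambda t, y, h: y, u0=1.0, T=1.0, n=100)
print(y[-1])   # approximates e = 2.71828...
```

Any method of the form (2.1) only differs in the callable F; the driver loop stays the same.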

2.1.6 Example: The simplest choice for the increment function is Fhk(t, u) :=
f(t, u), leading to the Euler method

    y1 = y0 + h f(t0, y0). (2.2)

Consider, for example, the (scalar, homogeneous, linear) IVP

    u′ = u,    u(0) = 1,

which has the exact solution u(t) = e^t. In that case, the Euler method (with uniform
time steps) reads

    y1 = (1 + h) y0.

The results for h = 1 and h = 1/2 are:

         exact           h = 1                  h = 1/2
    t    u(t)        k   yk     |uk − yk|   k    yk       |uk − yk|
    0    1           0   1                  0    1
    1    2.71828     1   2      0.718       2    2.25     0.468
    2    7.38906     2   4      3.389       4    5.0625   2.327
    k    2.71828^k   k   2^k                2k   2.25^k

The error grows in time. The approximation of the solution is improved by reducing h
from 1 to 1/2. The goal of the following error analysis will be to establish these
dependencies.
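The table above can be reproduced in a few lines by iterating y_k = (1 + h) y_{k−1} (a minimal sketch of the example, with an illustrative helper name):

```python
import math

def euler_exp(h, t_end):
    """Euler method y_k = (1 + h) y_{k-1} for u' = u, u(0) = 1, up to t_end."""
    y, t = 1.0, 0.0
    while t < t_end - 1e-12:
        y, t = (1.0 + h) * y, t + h
    return y

# Reproduce the table: errors 0.718 and 3.389 for h = 1, 0.468 and 2.327 for h = 1/2.
for t in (1, 2):
    exact = math.exp(t)
    for h in (1.0, 0.5):
        y = euler_exp(h, t)
        print(t, h, y, abs(exact - y))
```

For h = 1 this gives y = 2^t, for h = 1/2 it gives y = 2.25^t, as in the last row of the table.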


Figure 2.2: Local and accumulated errors. Exact solution in black, the Euler method in
red. On the left, in blue the exact solution of an IVP on the second interval with initial
value y1 . On the right, in purple the second step of the Euler method, but with exact
initial value u1 .

2.2 Error analysis

Remark 2.2.1. In Figure 2.1, we observe that, at a given time tk+1 , the error consists
of two parts: (i) due to replacing the differential equation by the discrete method on the
interval Ik and (ii) due to the initial value yk already being inexact. This is illustrated more
clearly in Figure 2.2. Therefore, in our analysis we split the error into the local error and
an accumulated error. The local error compares continuous and discrete solutions on a
single interval with the same initial value. In the analysis, we will have the options of
using the exact (right figure) or the approximated initial value (left figure).

2.2.2 Definition: Let u be a solution of the differential equation u′ = f(t, u) on
the interval I = [t0, tn] = [0, T]. Then, the global error of a discrete method Fh
is

    |u(tn) − y(tn)|, (2.3)

i.e., the difference between the solution un of the differential equation at tn and the
result of the one-step method at tn.

2.2.3 Definition: Let u be a solution of the differential equation u′ = f(t, u) on
the interval Ik = [tk−1, tk]. Then, the local error of a discrete method Fhk is

    ηk = ηk(u) = uk − (uk−1 + hk Fhk(tk−1, uk−1)), (2.4)

i.e., the difference between uk = u(tk) and the result of one time step (2.1) of this
method with exact initial value uk−1 = u(tk−1).
The truncation error is the quotient of the local error and hk:

    τk = τk(u) = ηk/hk = (uk − uk−1)/hk − Fhk(tk−1, uk−1). (2.5)

The one-step method Fhk(t, y) is said to have consistency of order p, if for all
sufficiently regular functions f there exists a constant c independent of
h := max_{k=1,…,n} hk such that for h → 0:

    max_{k=1,…,n} |τk| ≤ c h^p. (2.6)

Example 2.2.4 (Euler method). To find the order of consistency of the Euler method,
where Fhk(t, y) = f(t, y), consider the Taylor expansion of u at tk−1:

    u(tk) = u(tk−1) + hk u′(tk−1) + (1/2) hk² u″(ζ), for some ζ ∈ Ik.

As a result, the truncation error reduces to:

    τk = (uk − uk−1)/hk − Fhk(tk−1, u(tk−1))
       = (hk f(tk−1, uk−1) + (1/2) hk² u″(ζ))/hk − f(tk−1, uk−1) = (1/2) u″(ζ) hk.

If f ∈ C¹(D) on a compact set D around the graph of u, we can bound the right hand
side:

    |τk| ≤ (1/2) max_{t∈Ik} |u″(t)| hk = (1/2) max_{t∈Ik} |∂t f(t, u(t)) + ∇y f(t, u(t)) u′(t)| hk
         ≤ (1/2) max_{(t,y)∈D} |∂t f(t, y) + ∇y f(t, y) f(t, y)| hk =: c hk.

Here, we used the assumption that f ∈ C¹ (slightly more than Lipschitz continuity) to
conclude that the Euler method is consistent of order 1.
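The predicted behavior τ ≈ (1/2) u″ h can be checked numerically. The sketch below measures the truncation error of a single Euler step for u′ = u, u(0) = 1 (so u(t) = e^t and u″(0) = 1) at shrinking step sizes:

```python
import math

# Truncation error of one Euler step for u' = u, u(0) = 1, exact u(t) = e^t:
# tau(h) = (u(h) - u(0))/h - f(0, u(0)) should behave like u''(0)/2 * h = h/2.
for h in (0.1, 0.05, 0.025):
    tau = (math.exp(h) - 1.0) / h - 1.0
    print(h, tau, tau / h)   # the ratio tau/h approaches 1/2
```

Halving h halves τ, confirming first-order consistency.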

Next we consider stability of explicit one-step methods. To prove this, we first need a
discrete version of Grönwall’s inequality.

2.2.5 Lemma (Discrete Grönwall inequality): Let (wk), (ak), (bk) be non-
negative sequences of real numbers, such that (bk) is monotonically nondecreasing.
Then, it follows from

    w0 ≤ b0    and    wn ≤ Σ_{k=0}^{n−1} ak wk + bn, for all n ≥ 1, (2.7)

that

    wn ≤ exp(Σ_{k=0}^{n−1} ak) bn. (2.8)

Proof. Let k ∈ N and define the functions w(t), a(t), and b(t) such that for all t ∈ [k − 1, k)

w(t) = wk−1 , a(t) = ak−1 , b(t) = bk−1 .

These functions are bounded and piecewise continuous on any finite interval. Thus, they
are integrable on [0, n]. Therefore, the continuous Grönwall inequality of Lemma 1.3.8
applies and proves the result.

2.2.6 Theorem (Discrete stability): If Fhk(t, y) is Lipschitz continuous in y for
any t = tk, k < n, with constant L, then the corresponding one-step method is
discretely stable, i.e., for arbitrary sequences (yk) and (zk), there holds:

    |yn − zn| ≤ e^{LT} (|y0 − z0| + Σ_{k=1}^{n} |ηk(y) − ηk(z)|).

Proof. Subtracting the equations

    ηk(y) = yk − yk−1 − hk Fhk(tk−1, yk−1),
    ηk(z) = zk − zk−1 − hk Fhk(tk−1, zk−1),

we obtain

    yk − zk = yk−1 − zk−1 + ηk(y) − ηk(z) + hk (Fhk(tk−1, yk−1) − Fhk(tk−1, zk−1)).

Recursive application together with the Lipschitz condition yields

    |yn − zn| ≤ |y0 − z0| + Σ_{k=1}^{n} |ηk(y) − ηk(z)| + Σ_{k=1}^{n} L hk |yk−1 − zk−1|.

The estimate now follows from the discrete Grönwall inequality in Lemma 2.2.5, since
Σ_{k=1}^{n} L hk ≤ L T.

2.2.7 Corollary (One-step methods with finite precision): Let the one-step
method Fhk be run on a computer, yielding a sequence (zk), such that each time step
is executed in finite precision arithmetic. Let (yk) be the mathematically exact
solution of the one-step method. Then, the difference equation (2.1) is fulfilled only
up to machine accuracy eps, i.e., there exists a c > 0 such that

    |y0 − z0| ≤ c |z0| eps,
    |ηk(y) − ηk(z)| = |ηk(z)| ≤ c |zk| eps.

Then, the error between the true solution of the one-step method yn and the com-
puted solution is bounded by

    |yn − zn| ≤ c e^{LT} n max_{k=0,…,n} |zk| eps.

2.2.8 Theorem (Convergence of one-step methods): Let the one-step method
Fhk(·,·) be consistent of order p and Lipschitz continuous in its second argument,
for all t = tk, k < n. Furthermore, let y0 = u0 and let h = max_{k=1,…,n} hk. Then, the
global error of the one-step method converges with order p as h → 0 and

    |un − yn| ≤ c e^{LT} h^p, (2.9)

where the constant c is independent of h.

Proof. Since the discrete solution satisfies the scheme (2.1) exactly, we have ηk(y) = 0.
Since Fhk is consistent of order p, it follows, for all k = 1, …, n, that

    |ηk(u) − ηk(y)| = |ηk(u)| = hk |τk(u)| ≤ hk c h^p, (2.10)

where c is the constant in (2.6), which is independent of h.

Now, since Fhk is Lipschitz continuous in its second argument, we can apply the Discrete
Stability theorem 2.2.6 and use the bound in (2.10) to obtain

    |un − yn| ≤ e^{LT} Σ_{k=1}^{n} |ηk(u) − ηk(y)| ≤ e^{LT} Σ_{k=1}^{n} hk c h^p = c T e^{LT} h^p.

Corollary 2.2.9. If f is in C 1 in a compact set D around the graph of u over [0, T ], then
the convergence order of the global error in the Euler method is one.

Important! General approach:

    CONSISTENCY + STABILITY = CONVERGENCE

2.3 Runge-Kutta methods

2.3.1. Since the IVP is equivalent to the Volterra integral equation (1.12), we can consider
its numerical solution as a quadrature problem. However, the difficulty is that the integrand
is not known. It will need to be approximated on each interval from values at earlier times,
leading to a class of methods for IVP, called Runge-Kutta methods.
We present the formula again only for the calculation of y1 from y0 on the interval from t0
to t1 = t0 + h. The formula for a later time step k is obtained by replacing y0 , t0 and h by
yk−1 , tk−1 and hk , respectively to obtain yk . (The coefficients aij , bj , cj remain fixed.)

2.3.2 Definition: An explicit Runge-Kutta method (ERK) is a one-step
method that uses s evaluations of f with the representation

    gi = y0 + h Σ_{j=1}^{i−1} aij kj,    i = 1, …, s, (2.11a)
    ki = f(t0 + ci h, gi),               i = 1, …, s, (2.11b)
    y1 = y0 + h Σ_{i=1}^{s} bi ki,       (2.11c)

i.e., with increment function Fh(t, y0) := Σ_{i=1}^{s} bi f(t0 + ci h, gi). The values t0 + ci h
are the quadrature points on the interval [t0, t1]. The values gi are approximations to
the solution u(t0 + ci h) at the quadrature points, obtained via recursive extrapolation
using the evaluations of the function f at previous quadrature points. Since the
method uses s intermediate approximations of u on [t0, t1], it is called an s-stage
method.

Remark 2.3.3. In typical implementations, the intermediate values gi are not stored
separately. However, they are useful for highlighting the structure of the method.
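Formulas (2.11) transcribe directly into code. The following sketch (scalar case; names are illustrative) performs one ERK step for an arbitrary coefficient set A, b, c, and applies it with the coefficients of a two-stage method (the Heun method of order 2, example 2.3.7 below):

```python
import numpy as np

def erk_step(A, b, c, f, t0, y0, h):
    """One explicit Runge-Kutta step (2.11): stage values g_i, slopes k_i,
    and the update y1 = y0 + h * sum_i b_i k_i.
    A must be strictly lower triangular (explicit method)."""
    s = len(b)
    k = np.zeros(s)
    for i in range(s):
        g_i = y0 + h * sum(A[i][j] * k[j] for j in range(i))   # (2.11a)
        k[i] = f(t0 + c[i] * h, g_i)                           # (2.11b)
    return y0 + h * sum(b[i] * k[i] for i in range(s))         # (2.11c)

# Two-stage method (Heun of order 2) applied to u' = u, u(0) = 1 on [0, 1]:
A = [[0.0, 0.0], [1.0, 0.0]]
b = [0.5, 0.5]
c = [0.0, 1.0]
y, t, h = 1.0, 0.0, 0.1
for _ in range(10):
    y = erk_step(A, b, c, lambda t, u: u, t, y, h)
    t += h
print(y)   # close to e = 2.71828...
```

Only the tableau changes from method to method; the stepping routine is generic.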

2.3.4 Definition (Butcher tableau): It is customary to write Runge-Kutta meth-
ods in the form of a Butcher tableau, containing only the coefficients of equa-
tion (2.11) in the following matrix form:

    0   |
    c2  | a21
    c3  | a31  a32
    ⋮   | ⋮    ⋮    ⋱                                (2.12)
    cs  | as1  as2  ⋯  as,s−1
    ----+------------------------
        | b1   b2   ⋯  bs−1   bs

Remark 2.3.5. The first row of the tableau should be read such that c1 = 0, g1 = y0 and
that k1 is computed directly by f (t0 , y0 ). The method is explicit since the computation
of ki only involves coefficients with index less than i. The last row below the line is the
short form of formula (2.11c) and lists the quadrature weights in the increment function
Fh (t, y0 ).

24
Considering the coefficients aij as the entries of a square s × s matrix A, we see that A is
strictly lower triangular. This is the defining feature of an explicit RK method. We will
later see RK methods that also have entries on the diagonal or even in the upper part.
Those methods will be called “implicit”, because the computation of a stage value also
involves the values at the current or future stages. We will also write b = (b1 , . . . , bs )T and
c = (0, c2 , . . . , cs )T , such that the Butcher tableau in (2.12) simplifies to

c A
bT

Example 2.3.6. The Euler method has the Butcher tableau:

    0 |
    --+---
      | 1

This leads to the already known formula:

    y1 = y0 + h f(t0, y0).

The values b1 = 1 and c1 = 0 indicate that this is a quadrature rule with a single point
at the left end of the interval. As a quadrature rule, such a rule is exact for constant
polynomials and thus of order 1.

2.3.7 Example (Two-stage methods): The modified Euler method is a vari-
ant of Euler's method of the following form:

    k1 = f(t0, y0)
    k2 = f(t0 + h/2, y0 + (h/2) k1)
    y1 = y0 + h k2

with Butcher tableau

    0   |
    1/2 | 1/2
    ----+----------
        | 0    1

The so-called Heun method of order 2 is characterized through the equations

    k1 = f(t0, y0)
    k2 = f(t0 + h, y0 + h k1)
    y1 = y0 + h (k1/2 + k2/2)

with Butcher tableau

    0 |
    1 | 1
    --+-----------
      | 1/2  1/2

Remark 2.3.8. The modified Euler method uses one quadrature node at t0 + h/2 = (t0 + t1)/2
and an approximation to f(t0 + h/2, u(t0 + h/2)) in Fh, corresponding to the midpoint
quadrature rule. The Heun method of order 2 is constructed based on the trapezoidal rule.
Both quadrature rules are of second order, and so are these one-step methods. Both methods
were discussed by Runge in his article of 1895 [Run95].
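The second order can be observed numerically. The sketch below applies the modified Euler method to the test problem u′ = u, u(0) = 1 from example 2.1.6, halves the step size repeatedly, and prints the observed order log2(e(h)/e(h/2)) (the helper name and the choice of test problem are for illustration only):

```python
import math

def modified_euler(f, u0, T, n):
    """Modified Euler method: k2 = f(t + h/2, y + h/2*k1), y <- y + h*k2."""
    h, t, y = T / n, 0.0, u0
    for _ in range(n):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h / 2 * k1)
        y, t = y + h * k2, t + h
    return y

f = lambda t, u: u
errors = [abs(math.e - modified_euler(f, 1.0, 1.0, n)) for n in (10, 20, 40, 80)]
orders = [math.log2(errors[i] / errors[i + 1]) for i in range(3)]
print(orders)   # each entry close to 2
```

Replacing the body of the loop by a single Euler update would show observed orders close to 1 instead, matching example 2.2.4.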

2.3.9 Lemma: If f is sufficiently smooth, the Heun method of order 2 and the
modified Euler method have consistency order two.¹

¹Here and in the following proofs of consistency order, we will always assume that all necessary
derivatives of f exist and are bounded, and simply write "f is sufficiently smooth".

Proof. The proof uses Taylor expansion of the continuous solution u1 and the discrete
solution y1 around t0 with y0 = u0. W.l.o.g. we choose t0 = 0. Considering first only the
case d = 1 and the abbreviations

    ft = ∂t f(t0, u0)    and    fy = ∂y f(t0, u0),

and replacing u′(t0) = f(t0, u0) = f, we obtain

    u1 = u(h) = u0 + h f + (h²/2)(ft + fy f)
              + (h³/6)(ftt + 2 fty f + fyy f² + fy ft + fy² f) + ⋯. (2.13)

For the discrete solution of the modified Euler step, on the other hand, Taylor expanding
f around (t0, u0) leads to

    y1 = u0 + h f(t0 + h/2, u0 + (h/2) f(t0, u0))
       = u0 + h f + (h²/2)(ft + fy f) + (h³/8)(ftt + 2 fty f + fyy f²) + ⋯.

Thus, |τ1| = h⁻¹|u1 − y1| = O(h²). Since the truncation error at the kth step can be
estimated identically, the method has consistency order two. The proof for the Heun
method is left as an exercise.

For d > 1, the derivatives with respect to y are no longer scalars, but tensors of increasing
rank, or equivalently multilinear operators. Thus, to be precise, in d dimensions ∂y f(t0, u0)
is a d × d matrix that is applied to the vector f(t0, u0), and we should write more carefully

    fy(f) = ∂y f(t0, u0) f(t0, u0).

Similarly, ∂yy f(t0, u0) is a d × d × d rank-3 tensor, or more simply a bilinear operator, and
to stress this we write fyy(f, f) instead of fyy f². (However, we will not dwell on this issue
in this course.)

2.3.10 Example: The three-stage Runge-Kutta method is

    k1 = f(t0, y0)
    k2 = f(t0 + h/2, y0 + (h/2) k1)
    k3 = f(t0 + h, y0 − h k1 + 2 h k2)
    y1 = y0 + h (k1/6 + 4 k2/6 + k3/6)

with Butcher tableau

    0   |
    1/2 | 1/2
    1   | −1   2
    ----+---------------
        | 1/6  4/6  1/6

This method is obviously based on the Simpson rule.

Remark 2.3.11. These Taylor series expansions become tedious very fast. For au-
tonomous ODEs u′ = f(u) the analysis simplifies significantly. The Runge-Kutta method
(2.11) reduces to

    gi = y0 + h Σ_{j=1}^{i−1} aij f(gj),    i = 1, …, s,
    y1 = y0 + h Σ_{j=1}^{s} bj f(gj).       (2.14)

Each (non-autonomous) ODE can be autonomized (see Def. 1.2.6) using the transformation

    U′ := (u, t)′ = (f(t, u), 1)ᵀ =: F(U). (2.15)

2.3.12 Lemma: An ERK is invariant under autonomization, i.e., its coefficients
remain unchanged, if and only if

    Σ_{j=1}^{i−1} aij = ci,  i = 1, …, s,    and    Σ_{j=1}^{s} bj = 1. (2.16)

Proof. Considering the last component of the vector gi in (2.14) when applied to the
autonomized ODE (2.15) with right hand side F(·), we obtain

    t0 + h Σ_{j=1}^{i−1} aij.

For the ERK to be invariant under autonomization, we require that f is evaluated at t0 + h ci
in the ith stage, leading to the first condition in (2.16). Similarly, the second condition in
(2.16) follows from the last component of y1 when applying (2.14) to (2.15).

2.3.13 Lemma: An ERK with s stages that is invariant under autonomization is
consistent of first order, if and only if

    b1 + ⋯ + bs = 1; (2.17a)

it is consistent of second order, if and only if in addition we have

    b1 c1 + ⋯ + bs cs = 1/2; (2.17b)

it is consistent of third order, if and only if in addition we have

    b1 c1² + ⋯ + bs cs² = 1/3, (2.17c)
    Σ_{i,j=1}^{s} bi aij cj = 1/6; (2.17d)

it is consistent of fourth order, if and only if in addition we have

    b1 c1³ + ⋯ + bs cs³ = 1/4, (2.17e)
    Σ_{i,j=1}^{s} bi aij cj² = 1/12, (2.17f)
    Σ_{i,j,k=1}^{s} bi aij ajk ck = 1/24, (2.17g)
    Σ_{i,j=1}^{s} bi ci aij cj = 1/8. (2.17h)

Proof. We consider the autonomous ODE u′ = f(u), u(t0) = u0, where we assume
w.l.o.g. again t0 = 0. As in the proof of lemma 2.3.9, we first expand u1 around t0, using
u′(t0) = f(u0) = f. Also, since f now only depends on one argument, we abbreviate

    (d/dt) f(u(t0)) = ∇f(u0) f(u0) =: f′(f),    as well as    (d²/dt²) f(u(t0)) =: f″(f, f) + f′(f′(f)), …

Thus, we obtain

    u1 = u0 + h f + (h²/2) f′(f) + (h³/6)(f″(f, f) + f′(f′(f)))
       + (h⁴/24)(f‴(f, f, f) + 3 f″(f′(f), f) + f′(f″(f, f)) + f′(f′(f′(f)))) + O(h⁵). (2.18)

To expand y1 around t0 = 0, we consider it as a function y1(h) of the step size h. The stage
values gi are also considered as functions gi(h) of h. First note that

    y1(0) = u0    and    gi(0) = u0, for all i = 1, …, s. (2.19)

To compute the derivatives of y1 and gi at h = 0, let q ≥ 1 and note that for an arbitrary
function φ = φ(h), applying Leibniz's rule (the product rule for higher derivatives) gives

    (d^q/dh^q)[h φ(h)] |_{h=0} = [h φ^(q)(h) + q φ^(q−1)(h)] |_{h=0} = q φ^(q−1)(0), (2.20)

since all higher derivatives of the factor h vanish.

Using (2.20) and the definition of an ERK for an autonomous ODE in (2.14), we get

    y1^(q)(0) = Σ_{i=1}^{s} bi (d^q/dh^q)[h f(gi(h))] |_{h=0} = q Σ_{i=1}^{s} bi (d^{q−1}/dh^{q−1}) f(gi(h)) |_{h=0}, (2.21)

    gi^(q)(0) = Σ_{j=1}^{s} aij (d^q/dh^q)[h f(gj(h))] |_{h=0} = q Σ_{j=1}^{s} aij (d^{q−1}/dh^{q−1}) f(gj(h)) |_{h=0} (2.22)

(where we have assumed again for simplicity that aij = 0, for j ≥ i).
Finally, we need to apply the chain rule to compute the derivatives of f(gi(h)) needed
in (2.21) and (2.22). First, for q = 1, using again the shorthand notation for the higher
derivatives of f as above, it follows from (2.16), (2.19) and (2.22) that

    (d/dh) f(gi(h)) |_{h=0} = f′(gi′(0)) = f′(Σ_{j=1}^{s} aij f(gj(0))) = f′(ci f) = ci f′(f).

Similarly, for q = 2, we get

    (d²/dh²) f(gi(h)) |_{h=0} = [f″(gi′(h), gi′(h)) + f′(gi″(h))] |_{h=0}
        = f″(Σ_{j=1}^{s} aij f(gj(0)), Σ_{j=1}^{s} aij f(gj(0))) + f′(2 Σ_{j=1}^{s} aij (d/dh) f(gj(h)) |_{h=0})
        = ci² f″(f, f) + 2 Σ_{j=1}^{s} aij cj f′(f′(f)).

28
The case q = 3 can be derived in a similar way and is left as an exercise (DIY).

Substituting these formulae into (2.21), we finally obtain

    y1′(0) = (Σ_{i=1}^{s} bi) f,
    y1″(0) = 2 (Σ_{i=1}^{s} bi ci) f′(f),
    y1‴(0) = 3 (Σ_{i=1}^{s} bi ci²) f″(f, f) + 6 (Σ_{i,j=1}^{s} bi aij cj) f′(f′(f)),

as well as a similar formula for y1^(4)(0), which is again left as an exercise (DIY).

Considering now the Taylor series expansion of y1(h) around h = 0, i.e.,

    y1(h) = y1(0) + h y1′(0) + (h²/2) y1″(0) + (h³/6) y1‴(0) + (h⁴/24) y1^(4)(0) + O(h⁵),

and comparing coefficients with the coefficients in the expansion of u1 in (2.18), we obtain
the order conditions in (2.17).

Remark 2.3.14. Butcher introduced a graph theoretical method for order conditions
based on trees. While this simplifies the process of deriving these conditions for higher
order methods considerably, it is beyond the scope of this course.

2.3.15 Example (The classical Runge-Kutta method of 4th order):

    k1 = f(t0, y0)
    k2 = f(t0 + h/2, y0 + (h/2) k1)
    k3 = f(t0 + h/2, y0 + (h/2) k2)
    k4 = f(t0 + h, y0 + h k3)
    y1 = y0 + h (k1/6 + 2 k2/6 + 2 k3/6 + k4/6)

with Butcher tableau

    0   |
    1/2 | 1/2
    1/2 | 0    1/2
    1   | 0    0    1
    ----+--------------------
        | 1/6  2/6  2/6  1/6

Like the 3-stage method in example 2.3.10, this formula is based on Simpson's
quadrature rule, but it uses two approximations for the value in the center point
and is of fourth order.
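The order conditions (2.17a)–(2.17h) can be verified mechanically for this tableau. The following sketch checks all eight sums in exact rational arithmetic:

```python
from fractions import Fraction as F

# Butcher tableau of the classical 4th-order Runge-Kutta method.
A = [[F(0), F(0), F(0), F(0)],
     [F(1, 2), F(0), F(0), F(0)],
     [F(0), F(1, 2), F(0), F(0)],
     [F(0), F(0), F(1), F(0)]]
b = [F(1, 6), F(2, 6), F(2, 6), F(1, 6)]
c = [F(0), F(1, 2), F(1, 2), F(1)]
R = range(4)

# Order conditions (2.17a)-(2.17h):
assert sum(b) == 1                                                        # (2.17a)
assert sum(b[i] * c[i] for i in R) == F(1, 2)                             # (2.17b)
assert sum(b[i] * c[i]**2 for i in R) == F(1, 3)                          # (2.17c)
assert sum(b[i] * A[i][j] * c[j] for i in R for j in R) == F(1, 6)        # (2.17d)
assert sum(b[i] * c[i]**3 for i in R) == F(1, 4)                          # (2.17e)
assert sum(b[i] * A[i][j] * c[j]**2 for i in R for j in R) == F(1, 12)    # (2.17f)
assert sum(b[i] * A[i][j] * A[j][k] * c[k]
           for i in R for j in R for k in R) == F(1, 24)                  # (2.17g)
assert sum(b[i] * c[i] * A[i][j] * c[j] for i in R for j in R) == F(1, 8) # (2.17h)
print("classical RK4 satisfies all conditions up to order 4")
```

The same checks applied to the tableau of example 2.3.10 succeed only up to (2.17d), confirming that the three-stage method is of third order.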

Remark 2.3.16 (Order conditions and quadrature). The order conditions derived by
recursive Taylor expansion have a very natural interpretation via the analysis of quadrature
formulae for the Volterra integral equation, where ci h, i = 1, …, s, are the quadrature
points on [0, h] and the other coefficients are quadrature weights. First, we observe that

    h Σ_{i=1}^{s} bi f(ci h, gi)    approximates    ∫_0^h f(s, u(s)) ds.

In this view, conditions (2.17a)–(2.17c) and (2.17e) state that the quadrature formula
Σ_i bi p(ci h) is exact for polynomials p of degree up to 3. This implies (see Numerik 0) that
the convergence of the quadrature rule is of 4th order.

Equally, we deduce from formula (2.11a) for gi that

    h Σ_{j=1}^{i−1} aij f(cj h, gj)    approximates    ∫_0^{ci h} f(s, u(s)) ds.

The condition (2.16), which guarantees that the method is autonomizable, simply translates
to the quadrature rule being exact for constant functions.

For higher order, the accuracy of the value of gi only implicitly enters the accuracy of the
Runge-Kutta method as an approximation of the integrand in another quadrature rule.
Thus, we actually look at approximations of nested integrals of the form

    ∫_0^h φ(s) ∫_0^s ψ(r) dr ds.

Condition (2.17d) for 3rd order states that this approximation must be exact for linear
polynomials ψ(r) and constant φ(s); thus, after the inner integration, again for any poly-
nomial of second order in the outer rule. Equally, conditions (2.17h) and (2.17f) for fourth
order state this for linear polynomials ψ(r) with linear φ(s) and for quadratic polynomials
ψ(r) with constant φ(s), respectively. Finally, condition (2.17g) states that the quadrature
has to be exact for any linear polynomial φ(τ) in

    ∫_0^h ∫_0^s ∫_0^r φ(τ) dτ dr ds.

Remark 2.3.17 (Butcher barriers). The maximal order of an explicit Runge-Kutta method
is limited by the number of stages, or vice versa, a minimum number of stages is re-
quired for a certain order. The Butcher barriers state that in order to achieve consistency
of order p one requires s stages, where p and s are related as follows:

    p         1   2   3   4   5     6     7     8
    # cond.   1   2   4   8   17    37    85    200
    s         p   p   p   p   p+1   p+1   p+2   p+3

The minimal stage numbers for p ≥ 9 are not known yet.

2.3.18 Lemma: Let f (t, y) admit a uniform Lipschitz condition on [0, T ] × Ω with
{u(t) : t ∈ [0, T ]} ⊂ Ω. Then every ERK that is invariant under autonomization
admits a uniform Lipschitz condition on [0, T ] × Ω.

Proof. The increment function of an ERK is

    Fh(t, y) = Σ_{j=1}^{s} bj f(t + cj h, gj(y; h)), (2.23)

with gi defined recursively by

    gi(y; h) = y + h Σ_{j=1}^{i−1} aij f(t + cj h, gj(y; h)).

Let Lf be the Lipschitz constant of f and let q := h Lf. Then, for any x, y ∈ Ω, using
(2.16), we get

    |g1(y; h) − g1(x; h)| = |y − x| =: L1 |y − x|,
    |g2(y; h) − g2(x; h)| = |y − x + h a21 (f(t + c2 h, g1(y; h)) − f(t + c2 h, g1(x; h)))|
                          ≤ (1 + h a21 Lf) |y − x| = (1 + q c2) |y − x| =: L2 |y − x|,
    |g3(y; h) − g3(x; h)| ≤ (1 + h Lf (a31 + a32 (1 + h a21 Lf))) |y − x|
                          ≤ (1 + q c3 (1 + q c2)) |y − x| =: L3 |y − x|,
    ⋮
    |gs(y; h) − gs(x; h)| ≤ (1 + q cs (1 + ⋯ (1 + q c2) ⋯)) |y − x| =: Ls |y − x|.

Since ci ≤ 1, for all i = 2, …, s, we can bound

    Li ≤ Ls ≤ 1 + q (1 + ⋯ (1 + q) ⋯) = 1 + q + q² + ⋯ + q^{s−1} = (1 − q^s)/(1 − q).

Moreover, if q = h Lf ≤ 1 we have Ls ≤ s, and if q ≤ 1/2 we have Ls ≤ 2.
Using the Lipschitz conditions for the gi together with (2.23) and (2.16), we finally get

    |Fh(t, y) − Fh(t, x)| ≤ Σ_{j=1}^{s} bj Lf Lj |x − y| ≤ Lf Ls |x − y|.

Thus, the increment function Fh admits a Lipschitz condition with constant

    L := Lf (1 − (h Lf)^s)/(1 − h Lf)

for general step size h, and with constant L = 2 Lf for h ≤ (2 Lf)⁻¹.

Corollary 2.3.19. Let f(t, y) admit a uniform Lipschitz condition on [0, T] × Ω with
{u(t) : t ∈ [0, T]} ⊂ Ω and let Fhk(·,·) be an ERK that is invariant under autonomization.
Then consistency of order p implies convergence with order p and

    |un − yn| ≤ c e^{LT} h^p, (2.24)

where L is the Lipschitz constant of Fh from lemma 2.3.18 and the constant c is independent
of h.

Proof. Follows directly from lemma 2.3.18 and theorem 2.2.8.

(f Lipschitz ⇒ Fh Lipschitz ⇒ one-step method discretely stable. Consistency & stability ⇒ convergence.)

2.4 Estimates of the local error and time step control

2.4.1. The analysis in the last section used a crude a priori bound of the local error based
on high-order derivatives of the right hand side f (t, u). In the case of a complex nonlinear
system, such an estimate is bound to be inefficient, since it involves global bounds on the
derivatives. Obviously, the local error cannot be computed exactly either, because that
would require or imply the knowledge of the exact solution.
In this section, we discuss two methods that allow an estimate of the truncation error
from computed solutions. These estimates are local in nature and therefore usually much
sharper. Thus, they can be used to control the step size, which in turn gives good control
over the balance of accuracy and effort.
Nevertheless, in these estimates there is the implicit assumption that the true solution u is
sufficiently regular and the step size is sufficiently small, such that the local error already
follows the theoretically predicted order.
The main idea is to use two numerical estimates of u(tk ) that converge with different order
to estimate the leading order term of the local error for the lower-order method. Given
this estimate for the local error, we can then devise an algorithm for step size control that
guarantees that the local error of a one-step method remains below a threshold ε in every
time step.
Algorithm 2.4.2 (Adaptive step size control). Let yk and ŷk be two approximations of
uk of consistency order p and p̂ ≥ p + 1, respectively, and let ε > 0 be given.

1. Given yk−1, compute yk and ŷk with time step size hk
   (both starting from the value yk−1 at tk−1).
2. Compute

       hopt = hk (ε / |yk − ŷk|)^{1/(p+1)}. (2.25)

3. If hopt < hk, the time step is rejected; choose hk = hopt and recompute yk and ŷk.
4. If the time step was accepted, let hk+1 = min(hopt, tn − tk).
5. Increase k by one and return to Step 1.
Remark 2.4.3. When tk is close to tn , then the choice hk+1 = hopt in Step 4, leads to
tn − tk+1 ≈ eps (machine epsilon). To avoid round-off errors in the next time step, it is
advisable to choose hk+1 = tn − tk already for tn − tk ≤ chk+1 , where c is a moderate
constant of size around 1.1. This way we avoid that hk+2 ≈ eps.
Remark 2.4.4. This algorithm controls and equilibrates the local error. It does not control
the accumulated global error. The global error estimate still retains the exponential term.
Global error estimation techniques involve considerably more effort and are beyond the
scope of this course.
The algorithm does not provide an estimate for the leading-order term in the local error
of ybk . However, since it is a higher order approximation than yk , we should use ybk as the
approximation of u at tk and as the initial value for the next time step.
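A minimal sketch of Algorithm 2.4.2 for the test problem u′ = u: here Euler (p = 1) supplies yk and the modified Euler method supplies the higher-order ŷk; this particular pairing, the safety factor 0.9 on rejection, and all names are illustrative choices, not prescribed by the notes:

```python
def adaptive_euler(f, u0, t_end, eps, h0=0.1):
    """Step size control in the spirit of Algorithm 2.4.2, with Euler (order p = 1)
    for y and the modified Euler method (order 2) for the comparison value y_hat."""
    p = 1
    t, y, h = 0.0, u0, h0
    while t < t_end - 1e-14:
        h = min(h, t_end - t)                          # do not step past t_end
        while True:
            k1 = f(t, y)
            y_lo = y + h * k1                               # Euler step y_k
            y_hi = y + h * f(t + h / 2, y + h / 2 * k1)     # modified Euler step y_hat_k
            h_opt = h * (eps / abs(y_hi - y_lo)) ** (1.0 / (p + 1))   # formula (2.25)
            if h_opt >= h:           # step accepted
                break
            h = 0.9 * h_opt          # step rejected: retry with a reduced step
        t, y, h = t + h, y_hi, h_opt # advance with the higher-order value (remark 2.4.4)
    return y

print(adaptive_euler(lambda t, u: u, 1.0, 1.0, eps=1e-6))  # close to e = 2.71828...
```

The error estimate |y − ŷ| captures the leading term of the local Euler error, so the accepted steps keep each local error near ε.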

Let us now discuss two techniques to compute higher-order approximations ybk of uk .

2.4.1 Extrapolation methods

2.4.5. Here, we estimate the local error by a method called Richardson extrapolation (cf.
Numerik 0). It is based on computing two approximations with the same method, but
different step size. In particular, we will use an approximation yk+1 with two steps of size
hk and an approximation Yk+1 with one step of size 2hk , both starting with the same initial
value at tk−1 .

2.4.6 Theorem: Let y2 be the approximation of u2 obtained after two steps of an
ERK with step size h and let Y2 be the approximation after one step of the same
method with step size 2h, both starting from u0 at t0. If f is sufficiently smooth
and the ERK is consistent of order p, then we can define

    ŷ2 = (2^p y2 − Y2)/(2^p − 1), (2.26)

and have

    |u2 − ŷ2| = O(h^{p+2}). (2.27)

Proof. The exact form of the leading-order term in the local error of an ERK of order $p$
can be obtained by explicitly calculating the leading-order term in the Taylor expansion
in Lemma 2.3.13, i.e., there exists a constant vector $\zeta_k = \zeta_k\bigl(f_{k-1}, f_{k-1}', \dots, f_{k-1}^{(p)}\bigr) \in \mathbb{R}^d$
independent of $h$ such that
$$\eta_k(u) = \zeta_k h^{p+1} + O(h^{p+2}),$$
where $f_{k-1}^{(j)}$ denotes the $j$th derivative of $f$ evaluated at $(t_{k-1}, y_{k-1})$, for $k = 1, 2$. Moreover,
since $t_1 = t_0 + h$ and $y_1 = y_0 + O(h)$, we can also deduce via Taylor expansion that
$f_1^{(j)} = f_0^{(j)} + O(h)$, so that $\zeta_2 = \zeta_1 + O(h)$ and thus
$$\eta_k(u) = \zeta_1 h^{p+1} + O(h^{p+2}), \qquad k = 1, 2. \qquad (2.28)$$

Furthermore, we can also use a Taylor series expansion of $F_h$ around $(t_1, u_1)$ to obtain
$$F_h(t_1, y_1) = F_h(t_1, u_1) - \nabla_y F_h(t_1, u_1)\,\eta_1(u) + O\bigl(|\eta_1(u)|^2\bigr)
= F_h(t_1, u_1) - h^{p+1}\nabla_y F_h(t_1, u_1)\,\zeta_1 + O(h^{p+2}). \qquad (2.29)$$
Thus, following the same proof technique as in Theorem 2.2.6, we obtain for the error after
two steps of size $h$,
$$u_2 - y_2 = \underbrace{u_0 - y_0}_{=0} + \sum_{k=1}^{2}\eta_k(u) + h\bigl[F_h(t_1, u_1) - F_h(t_1, y_1)\bigr]
= 2\zeta_1 h^{p+1} + O(h^{p+2}) + h\bigl[h^{p+1}\nabla_y F_h(t_1, u_1)\,\zeta_1 + O(h^{p+2})\bigr]
= 2\zeta_1 h^{p+1} + O(h^{p+2}). \qquad (2.30)$$

On the other hand,
$$u_2 - Y_2 = \zeta_1 (2h)^{p+1} + O(h^{p+2}) = 2^{p+1}\zeta_1 h^{p+1} + O(h^{p+2}). \qquad (2.31)$$
Taking $2^p$ times equation (2.30) and subtracting equation (2.31), we can eliminate the
leading-order term and obtain
$$O(h^{p+2}) = 2^p(u_2 - y_2) - (u_2 - Y_2) = (2^p - 1)u_2 - (2^p y_2 - Y_2) = (2^p - 1)(u_2 - \hat y_2),$$
which completes the proof.
Remark 2.4.7. Thus, $\hat y_2$ provides an approximation of $u_2$ with consistency order $p + 1 > p$
and can thus be used in Algorithm 2.4.2 above to control the step size in each step. In
particular, $\hat y_{k+1}$ can be computed cheaply from $y_{k+1}$ and $Y_{k+1}$ via formula (2.26) (with
index $k + 1$ instead of 2). As mentioned in Remark 2.4.4, in practice we expect better
global accuracy if we use $\hat y_{k-1}$ instead of $y_{k-1}$ as the initial value at $t_{k-1}$ for computing
$y_{k+1}$ and $Y_{k+1}$.

However, in general the computation of $Y_2$ requires $s - 1$ additional evaluations of $f$, since
the stage values will differ from those of $y_1$ and $y_2$, leading to a total of $3s - 1$ function
evaluations for two time steps of this method of order $p + 1$. An alternative that uses the
optimal number of stage values $s$ for a method of order $p + 1$ and reuses all stage values of
the lower-order method will be discussed in the next section.
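The extrapolation step (2.26) is easy to try out numerically. The following sketch (a minimal illustration, not part of the lecture material) applies it with the explicit Euler method ($p = 1$) to the test problem $u' = -u$, $u(0) = 1$; the right-hand side and the step size are arbitrary choices for the demonstration.

```python
import math

def euler_step(f, t, y, h):
    # One explicit Euler step (consistency order p = 1).
    return y + h * f(t, y)

def richardson_pair(f, t0, y0, h, p=1):
    # Two steps of size h ...
    y1 = euler_step(f, t0, y0, h)
    y2 = euler_step(f, t0 + h, y1, h)
    # ... and one step of size 2h from the same initial value.
    Y2 = euler_step(f, t0, y0, 2 * h)
    # Extrapolated value (2.26): the leading error term cancels.
    y2_hat = (2**p * y2 - Y2) / (2**p - 1)
    return y2, Y2, y2_hat

f = lambda t, y: -y                    # test problem u' = -u, u(0) = 1
h = 0.05
y2, Y2, y2_hat = richardson_pair(f, 0.0, 1.0, h)
exact = math.exp(-2 * h)               # u(2h) = e^{-2h}
err_low = abs(exact - y2)              # error of the basic method
err_high = abs(exact - y2_hat)         # extrapolated value: one order better
```

With these numbers, $y_2 = 0.9025$, $Y_2 = 0.9$, and $\hat y_2 = 0.905$, so the extrapolated error is roughly an order of magnitude smaller than the basic one, as Theorem 2.4.6 predicts.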

2.4.2 Embedded Runge-Kutta methods

Instead of estimating the local error by doubling the step size, embedded Runge-Kutta
methods use two methods of different order to achieve the same effect. The key to efficiency
is here, that the computed stages gi are the same for both methods, and only the quadrature
weights bi differ.
Definition 2.4.8 (Embedded Runge-Kutta methods). An embedded $s$-stage Runge-Kutta
method with orders of consistency $p$ and $\hat p$ computes two approximations $y_k$ and $\hat y_k$ of $u_k$
with the same function evaluations. For this purpose, the stage values $g_i$ and $k_i$ at $t_0 + c_i h_k$
are identical for all $i = 1, \dots, s$, i.e., both methods have the same coefficients $a_{ij}$ and $c_i$. To
compute the final approximations at time step $t_k$, we use two different quadrature rules,
i.e.,
$$y_k = y_{k-1} + h_k\sum_{i=1}^{s} b_i k_i, \qquad \hat y_k = y_{k-1} + h_k\sum_{i=1}^{s} \hat b_i k_i, \qquad (2.32)$$
such that $y_k$ and $\hat y_k$ are consistent of order $p$ and $\hat p > p$, respectively. Typically, $\hat p = p + 1$.

2.4.9 Definition: The Butcher tableau for the embedded method has the form:
$$\begin{array}{c|ccccc}
0 & & & & & \\
c_2 & a_{21} & & & & \\
c_3 & a_{31} & a_{32} & & & \\
\vdots & \vdots & \vdots & \ddots & & \\
c_s & a_{s1} & a_{s2} & \cdots & a_{s,s-1} & \\\hline
 & \hat b_1 & \hat b_2 & \cdots & \hat b_{s-1} & \hat b_s \\
 & b_1 & b_2 & \cdots & b_{s-1} & b_s
\end{array}$$

Remark 2.4.10. For higher-order methods or functions $f(t, u)$ with complicated evaluation,
most of the work lies in the computation of the stages. Thus, the additional
quadrature for the computation of $y_k$ is almost for free. Nevertheless, due to the different
orders of approximation, $\hat y_k$ is much more accurate and we obtain
$$u_k - y_k = \hat y_k - y_k + O(h^{\hat p + 1}). \qquad (2.33)$$
Thus, $\hat y_k - y_k$ is a good estimate for the local error in $y_k$. This is the error which is used
in step size control below. However, as in the Richardson extrapolation above, we use the
more accurate value $\hat y_k$ as the final approximation at $t_k$ and as the initial value for the
next time step, even if we do not have a computable estimate for its local error.

2.4.11 Definition (Dormand-Prince 45): The embedded Runge-Kutta method
of orders 5 for $\hat y_k$ and 4 for $y_k$ due to Dormand and Prince has the Butcher tableau
$$\begin{array}{c|ccccccc}
0 & & & & & & & \\
1/5 & 1/5 & & & & & & \\
3/10 & 3/40 & 9/40 & & & & & \\
4/5 & 44/45 & -56/15 & 32/9 & & & & \\
8/9 & 19372/6561 & -25360/2187 & 64448/6561 & -212/729 & & & \\
1 & 9017/3168 & -355/33 & 46732/5247 & 49/176 & -5103/18656 & & \\
1 & 35/384 & 0 & 500/1113 & 125/192 & -2187/6784 & 11/84 & \\\hline
\hat y_k & 35/384 & 0 & 500/1113 & 125/192 & -2187/6784 & 11/84 & 0 \\
y_k & 5179/57600 & 0 & 7571/16695 & 393/640 & -92097/339200 & 187/2100 & 1/40
\end{array}$$
It has become a standard tool for the integration of IVPs and it is the backbone of
ode45 in Matlab.
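A minimal sketch of one adaptive step with an embedded pair. Instead of the full Dormand-Prince tableau above, it uses the simplest embedded pair, explicit Euler (order 1) inside Heun's method (order 2), so that the mechanics fit in a few lines; the test problem $u' = -u$, the tolerance, and the safety factor 0.9 are arbitrary choices for the illustration.

```python
def heun_euler_step(f, t, y, h):
    # Simplest embedded pair: explicit Euler (p = 1) inside Heun's
    # method (p_hat = 2); both combinations reuse the stages k1, k2.
    k1 = f(t, y)
    k2 = f(t + h, y + h * k1)
    y_low = y + h * k1                 # order-1 value
    y_high = y + 0.5 * h * (k1 + k2)   # order-2 value
    err = abs(y_high - y_low)          # estimates the local error of y_low
    return y_high, err                 # advance with the higher-order value

f = lambda t, y: -y                    # test problem u' = -u, u(0) = 1
h, tol = 0.1, 1e-4
y_high, err = heun_euler_step(f, 0.0, 1.0, h)
# Proposed next step size with safety factor 0.9 (exponent 1/(p+1) = 1/2):
h_new = 0.9 * h * (tol / err) ** 0.5
```

As in Remark 2.4.10, the difference of the two quadratures serves as the error estimate, while the step is continued with the more accurate value.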

Chapter 3

Implicit One-Step Methods and Long-Term Stability

In the first chapter, we studied methods for the solution of IVPs and the analysis of their
convergence with shrinking step size h. We could gain a priori error estimates from
consistency and stability for sufficiently small h.
All of these error estimates are based on Grönwall’s inequality. Therefore, they contain
a term of the form eLt which increases fast with increasing length of the time interval
[t0 , T ]. Thus, the analysis is unsuitable for the study of long-term integration, since the
exponential term will eventually outweigh any term of the form hp .
On the other hand, for instance our solar system has been moving on stable orbits for
several billion years and we do not observe an exponential increase of velocities. Thus,
there are in fact applications for which the simulation of long time periods is worthwhile
and where exponential growth of the discrete solution would be extremely disturbing.
This chapter first studies conditions on differential equations with bounded long term
solutions, and then discusses numerical methods mimicking this behavior.

3.1 Monotonic initial value problem

Example 3.1.1. We consider for $\lambda \in \mathbb{C}$ the (scalar) linear initial value problem
$$u' = \lambda u, \qquad u(0) = 1. \qquad (3.1)$$
Splitting $\lambda = \mathrm{Re}(\lambda) + i\,\mathrm{Im}(\lambda)$ into its real and imaginary part, the (complex-valued) solution
to this problem is
$$u(t) = e^{\lambda t} = e^{\mathrm{Re}(\lambda)t}\bigl(\cos(\mathrm{Im}(\lambda)t) + i\sin(\mathrm{Im}(\lambda)t)\bigr).$$
The behavior of $u(t)$ for $t \to \infty$ is determined by the real part of $\lambda$:
$$\mathrm{Re}(\lambda) < 0:\ u(t) \to 0, \qquad \mathrm{Re}(\lambda) = 0:\ |u(t)| = 1, \qquad \mathrm{Re}(\lambda) > 0:\ u(t) \to \infty. \qquad (3.2)$$
Moreover, for $\lambda$ with non-positive real part the solution is bounded for all points in time $t$.

Remark 3.1.2. Since we deal in the following again and again with eigenvalues of real-
valued matrices and these eigenvalues can be complex, we will always consider complex
valued IVP hereafter.
Remark 3.1.3. Due to Grönwall's inequality and the stability Theorem 1.4.5, the solution
to the IVP above admits the estimate $|u(t)| \le e^{|\lambda|t}|u(0)|$. This is seen easily by applying the
comparison function $v(t) \equiv 0$. As soon as $\lambda \ne 0$ has a non-positive real part, this estimate
is still correct but very pessimistic and therefore useless for large $t$. Since problems with
bounded long-term behavior are quite important in applications, we will have to introduce
an improved notion of stability.

3.1.4 Definition: The function $f(t, y)$ satisfies on its domain $D \subset \mathbb{R}\times\mathbb{C}^d$ a one-sided
Lipschitz condition if the inequality
$$\mathrm{Re}\langle f(t, y) - f(t, x), y - x\rangle \le \nu|y - x|^2 \qquad (3.3)$$
holds with a constant $\nu$ for all $(t, x), (t, y) \in D$. Moreover, such a function is called
monotonic if $\nu = 0$, thus
$$\mathrm{Re}\langle f(t, y) - f(t, x), y - x\rangle \le 0. \qquad (3.4)$$
An ODE $u' = f(u)$ is called monotonic if its right-hand side $f$ is monotonic.

Remark 3.1.5. The term monotonic from the previous definition is consistent with the
term monotonically decreasing, which we know from scalar, real-valued functions. We can
see this by observing that, for $y > x$,
$$\bigl(f(t, y) - f(t, x)\bigr)(y - x) \le 0 \quad\iff\quad f(t, y) - f(t, x) \le 0.$$

3.1.6 Theorem: Let $u(t)$ and $v(t)$ be two solutions of the equations
$$u' = f(t, u), \qquad v' = f(t, v),$$
with initial values $u(t_0) = u_0$ and $v(t_0) = v_0$, respectively. Let the function $f$ be continuous
and let the one-sided Lipschitz condition (3.3) hold. Then we have for $t > t_0$:
$$|v(t) - u(t)| \le e^{\nu(t - t_0)}|v(t_0) - u(t_0)|. \qquad (3.5)$$

Proof. We consider the auxiliary function $m(t) = |v(t) - u(t)|^2$ and its derivative
$$m'(t) = 2\,\mathrm{Re}\bigl\langle v'(t) - u'(t),\, v(t) - u(t)\bigr\rangle
= 2\,\mathrm{Re}\bigl\langle f(t, v(t)) - f(t, u(t)),\, v(t) - u(t)\bigr\rangle
\le 2\nu|v(t) - u(t)|^2 = 2\nu m(t).$$
According to Grönwall's inequality (Lemma 1.3.8 on page 10) we obtain for $t > t_0$:
$$m(t) \le m(t_0)e^{2\nu(t - t_0)}.$$
Taking the square root yields the stability estimate (3.5).

Remark 3.1.7. As in Example 3.1.1, we obtain from the stability estimate that the
difference of two solutions $u(t)$ and $v(t)$ of the differential equation $u' = f(t, u)$ (with
different initial conditions) behaves in the limit $t \to \infty$ as:
$$\nu < 0:\ |v(t) - u(t)| \to 0, \qquad \nu = 0:\ |v(t) - u(t)| \le |v(t_0) - u(t_0)|. \qquad (3.6)$$

3.1.8 Lemma: Let $A(t) \in \mathbb{C}^{d\times d}$ be a diagonalizable matrix function with eigenvalues
$\lambda_j(t)$, $j = 1, \dots, d$. Then the linear function $f(t, y) := A(t)y$ satisfies the
one-sided Lipschitz condition (3.3) on all of $\mathbb{R}\times\mathbb{C}^d$ with the constant
$$\nu = \max_{\substack{j = 1, \dots, d\\ t \in \mathbb{R}}}\mathrm{Re}(\lambda_j(t)).$$
Furthermore, the linear differential equation $u' = Au$ with $u(t) \in \mathbb{C}^d$ is monotonic
if and only if
$$\mathrm{Re}(\lambda_j(t)) \le 0, \quad\text{for all } t \in \mathbb{R}. \qquad (3.7)$$
(This is the vector-valued form of Example 3.1.1.)

Proof. For the right-hand side of the equation, we have
$$\mathrm{Re}\langle A(t)y - A(t)x,\, y - x\rangle = \mathrm{Re}\,\frac{\langle A(t)(y - x),\, y - x\rangle}{|y - x|^2}\,|y - x|^2 \le \max_{j = 1, \dots, d}\mathrm{Re}(\lambda_j(t))\,|y - x|^2.$$
This shows that $\nu \le \max_{j = 1, \dots, d;\, t \in \mathbb{R}}\mathrm{Re}(\lambda_j(t))$. If we now insert for $y - x$ an eigenvector
of an eigenvalue $\lambda_j(t)$ for which the maximum is attained, then we obtain equality and
therefore $\nu = \max_{j = 1, \dots, d;\, t \in \mathbb{R}}\mathrm{Re}(\lambda_j(t))$.

3.1.1 Stiff initial value problems

Example 3.1.9. We consider the IVP
$$u' = Au \quad\text{with}\quad A := \begin{pmatrix} -21 & 19 & -20 \\ 19 & -21 & 20 \\ 40 & -40 & -40 \end{pmatrix} \quad\text{and}\quad u(0) = \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}. \qquad (3.8)$$
The eigenvalues of $A$ are $\lambda_1 = -2$ and $\lambda_{2,3} = -40 \pm 40i$. The exact solution is
$$u(t) = \begin{pmatrix} \tfrac12 e^{-2t} + \tfrac12 e^{-40t}[\cos 40t + \sin 40t] \\[2pt] \tfrac12 e^{-2t} - \tfrac12 e^{-40t}[\cos 40t + \sin 40t] \\[2pt] -e^{-40t}[\cos 40t - \sin 40t] \end{pmatrix} \to 0 \quad\text{as } t \to \infty.$$
For small times $0 \le t \le 0.2$ all three components change rapidly due to the trigonometric
terms, since the factor $e^{-40t}$ in front of them is still fairly big. Thus, it is necessary
to choose small time step sizes $h \ll 1$.

For t > 0.2, we have u3 ≈ 0 and u1 ≈ u2 , and both those components change fairly slowly,
so we could choose a larger time step size h ≥ 0.1.
However, if we consider the explicit Euler method applied to (3.8), we get
$$y^{(n)} = y^{(n-1)} + hAy^{(n-1)}, \quad\text{and thus}\quad y^{(n)} = (I + hA)^n u_0.$$
Now, if we choose a time step size of $h = 0.01$, the matrix $I + hA$ has eigenvalues $\mu_j = 0.98,\ 0.6 + 0.4i,\ 0.6 - 0.4i$, so that
$$|y^{(n)}| = |(I + hA)^n u_0| \le \|I + hA\|^n|u_0| = 0.98^n\sqrt{2} \to 0 \quad\text{as } n \to \infty,$$
which is, at least qualitatively, the correct behaviour.
For $h = 0.1$, $I + hA$ has eigenvalues $\mu_j = 0.8,\ -3 + 4i,\ -3 - 4i$. It is easy to see that the first
eigenvector is $v_1 = \tfrac{1}{\sqrt 2}(1, 1, 0)^T$; the other two eigenvectors are orthogonal to $v_1$. Thus, if
we apply $(I + hA)^n$ to the second or third eigenvector, $v_2$ or $v_3$, we get
$$|(I + hA)^n v_j| = |-3 \pm 4i|^n|v_j| = 5^n \to \infty \quad\text{as } n \to \infty, \quad\text{for } j = 2, 3.$$
Since $u_0$ contains components in the direction of $v_2$ and $v_3$, this means that $|y^{(n)}| \to \infty$ as
$n \to \infty$, very much in contrast to the behaviour of the exact solution $u(t) \to 0$ for $t \to \infty$.
So, even when $u_3 \approx 0$ and $u_2 - u_1 \approx 0$ and the perturbations are very small, the instability
of the explicit Euler method with time step size $h = 0.1$ will lead to an exponential increase
in these perturbations.
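The instability is easy to reproduce. This sketch (assuming the matrix, initial value, and step sizes of Example 3.1.9) runs the explicit Euler method in plain Python and compares the norm of $y^{(n)}$ at $t = 2$ for $h = 0.01$ and $h = 0.1$.

```python
# Matrix A and initial value from (3.8).
A = [[-21, 19, -20],
     [19, -21, 20],
     [40, -40, -40]]
u0 = [1.0, 0.0, -1.0]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

def explicit_euler_norm(h, n):
    # n explicit Euler steps; returns the Euclidean norm of y^(n).
    y = u0[:]
    for _ in range(n):
        Ay = matvec(A, y)
        y = [y[i] + h * Ay[i] for i in range(3)]
    return sum(c * c for c in y) ** 0.5

stable = explicit_euler_norm(0.01, 200)   # t = 2 with h = 0.01: decays
unstable = explicit_euler_norm(0.1, 20)   # t = 2 with h = 0.1: blows up
```

With $h = 0.1$ the components along $v_2$, $v_3$ are amplified by a factor 5 per step, so after 20 steps the norm has grown by roughly $5^{20}$.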
Remark 3.1.10. The important message here is that from a point of view of approxi-
mation error (or consistency), it would be possible to increase the time step significantly
at later times, but due to stability problems with the explicit Euler method we cannot
increase h beyond a certain stability threshold.
This phenomenon only arises for monotonic ODEs, or for ODEs that satisfy a one-sided
Lipschitz condition with constant $0 < \nu \ll 1$ and that are monotonic for all $t \ge t^*$, for
some $t^* \ge t_0$. The consistency error is closely linked to the Lipschitz constant $L$ of $f$,
while the stability is linked to the ratio of $L$ and the constant $\nu$ in the one-sided Lipschitz
condition. In the following definition, we will only focus on monotonic IVPs.

3.1.11 Definition: Let $f$ be Lipschitz continuous with constant $L > 0$ and one-sided
Lipschitz continuous with constant $\nu \in \mathbb{R}$. An initial value problem is called
stiff if it has the following characteristic properties:

1. The right-hand side of the ODE is monotonic.

2. The time scales on which different solution components are evolving differ a
lot, i.e.,
$$L \gg |\nu|.$$

3. The time scales which are of interest for the application are much longer than
the fastest time scales of the equation, i.e.,
$$e^{\nu T} \gg e^{-LT} \approx 0. \qquad (3.9)$$
Remark 3.1.12. Note that for the linear IVP in Lemma 3.1.8 and when $f$ is monotonic,
we have
$$L := \max_{\substack{j = 1, \dots, d\\ t \in \mathbb{R}}}|\lambda_j(t)| \ge \max_{\substack{j = 1, \dots, d\\ t \in \mathbb{R}}}|\mathrm{Re}(\lambda_j(t))| \quad\text{and}\quad |\nu| := \min_{\substack{j = 1, \dots, d\\ t \in \mathbb{R}}}|\mathrm{Re}(\lambda_j(t))|.$$

Remark 3.1.13. Even though we used the term definition, the notion of “stiffness of an
IVP” has something vague or even inaccurate about it. In fact that is due to the very nature
of the problems and cannot be fixed. Instead we are forced to sharpen our understanding
by means of a few examples.

Example 3.1.14. First of all we will have a look at equation (3.8) in example 3.1.9.
Studying the eigenvalues of the matrix A, we clearly see that ν = −2 and thus the problem
is monotonic. We can also find that the Lipschitz constant is L = kAk ≈ 72.5 so that the
second condition holds as well.

According to the discussion of example 3.1.9, the third condition depends on the purpose
of the computation. If we want to compute the solution at time T = 0.01, we would not
denote the problem as stiff. On the other hand, if one is interested in the solution at time
$T = 1$, at which the terms containing $e^{-40t}$ are already below typical machine accuracy, the
problem is stiff indeed. Here, we have seen that Euler’s method requires disproportionately
small time steps.

Remark 3.1.15. The definition of stiffness and the discussion of the examples reveal that
numerical methods are needed, which are not just convergent for time steps h → 0 but
also for fixed step size h, even in the presence of time scales clearly below h. In this case,
methods still have to produce solutions with correct limit behavior for t → ∞.

Example 3.1.16. The implicit Euler method is defined by the one-step formula
$$y_1 = y_0 + hf(t_1, y_1) \quad\iff\quad y_1 - hf(t_1, y_1) = y_0, \qquad (3.10)$$
which in general involves solving a nonlinear system of equations. Applied to our linear
example (3.8), we get
$$y^{(n)} = (I - hA)^{-1}y^{(n-1)} \quad\Rightarrow\quad y^{(n)} = (I - hA)^{-n}u_0.$$
For all $h > 0$, the eigenvalues of the matrix $(I - hA)^{-1}$ are
$$\mu_j = \frac{1}{1 + 2h},\ \frac{1}{1 + 40h \mp 40hi},$$
whose moduli are all strictly less than 1, so that we get
$$|y^{(n)}| \to 0 \quad\text{as } n \to \infty,$$
independently of $h$. Thus, although the implicit Euler method requires in general the
solution of a nonlinear system in each step, it allows for much larger time steps than the
explicit Euler method when applied to a stiff problem.

For a visualization see the programming exercise on the last problem sheet and the appendix.
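For the linear example, each implicit Euler step only requires one linear solve with the fixed matrix $I - hA$. The sketch below (assuming the data of Example 3.1.9 and a small hand-rolled Gaussian elimination) shows that even the "unstable" step size $h = 0.1$ now produces a decaying discrete solution.

```python
def solve3(M, rhs):
    # Gaussian elimination with partial pivoting for a 3x3 system.
    M = [row[:] + [b] for row, b in zip(M, rhs)]    # augmented matrix
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, 3):
            fac = M[r][c] / M[c][c]
            for k in range(c, 4):
                M[r][k] -= fac * M[c][k]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):                             # back substitution
        x[r] = (M[r][3] - sum(M[r][k] * x[k] for k in range(r + 1, 3))) / M[r][r]
    return x

A = [[-21, 19, -20], [19, -21, 20], [40, -40, -40]]
h = 0.1
# System matrix I - hA of one implicit Euler step (3.10).
M = [[(1.0 if i == j else 0.0) - h * A[i][j] for j in range(3)] for i in range(3)]

y = [1.0, 0.0, -1.0]
for _ in range(20):        # integrate to t = 2 with the "large" step h = 0.1
    y = solve3(M, y)
norm = sum(c * c for c in y) ** 0.5   # decays, unlike for explicit Euler
```

The first two components also stay nearly equal, matching the behaviour $u_1 \approx u_2$ of the exact solution for $t > 0.2$.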

3.2 A-, B- and L-stability

3.2.1. In this section, we will investigate desirable properties of one-step methods for stiff
IVPs. We will first study linear problems of the form
$$u' = Au, \qquad u(t_0) = u_0, \qquad (3.11)$$
and the related notion of A-stability in detail. From the conditions for stiffness we derive
the following problem characteristics:

1. All eigenvalues of the matrix A lie in the left half-plane of the complex plane.
With (3.2) all solutions are bounded for t → ∞.
2. There are eigenvalues close to zero and eigenvalues with a large negative real part.
3. We are interested in time spans which make it necessary that the product $h\lambda$ can
become large, for an arbitrary eigenvalue and an arbitrary time step size.

For this case we now want to derive criteria for the boundedness of the discrete solution
for t → ∞. The important part is not to derive an estimate holding for h → 0, but one
that holds for any value of hλ in the left half-plane of the complex numbers.

3.2.2 Definition: Consider the (general) one-step method
$$y_1 = y_0 + hF_h(t_0, y_0, y_1),$$
applied to the scalar, linear test problem $u'(t) = \lambda u(t)$. Then
$$y_1 = R(h\lambda)u_0, \qquad (3.12)$$
and
$$y^{(n)} = R(h\lambda)^n u_0, \qquad (3.13)$$
for some function $R: \mathbb{C} \to \mathbb{C}$, which is denoted the stability function of the
one-step method $F_h$. The stability region of the one-step method is the set
$$S = \bigl\{z \in \mathbb{C} \,\big|\, |R(z)| \le 1\bigr\}. \qquad (3.14)$$

Example 3.2.3 (Explicit Euler).
$$y_1 = y_0 + h\lambda y_0 = (1 + h\lambda)y_0 \quad\Rightarrow\quad R(z) = 1 + z. \qquad (3.15)$$
The stability region for the explicit Euler method is a circle with radius 1 and centre $(-1, 0)$ in the
complex plane (see Figure 3.1, left).

Example 3.2.4 (Implicit Euler).
$$y_1 = y_0 + h\lambda y_1 \quad\iff\quad (1 - h\lambda)y_1 = y_0 \quad\Rightarrow\quad R(z) = \frac{1}{1 - z}. \qquad (3.16)$$
The stability region for the implicit Euler method is the complement of a circle with radius 1 and
centre $(1, 0)$ in the complex plane (see Figure 3.1, right).
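The two stability functions can be compared directly at sample points. This tiny sketch (the sample point is an arbitrary choice) evaluates $|R(z)|$ at a $z$ in the left half-plane that lies outside the explicit Euler disk.

```python
def R_explicit(z):
    return 1 + z           # stability function (3.15)

def R_implicit(z):
    return 1 / (1 - z)     # stability function (3.16)

z = complex(-0.5, 2.0)     # left half-plane, but outside the disk |1 + z| <= 1
inside_explicit = abs(R_explicit(z)) <= 1   # False for this z
inside_implicit = abs(R_implicit(z)) <= 1   # True: whole left half-plane is in S
```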

Figure 3.1: Stability regions for explicit and implicit Euler (blue stable, red unstable)

3.2.5 Definition (A-stability): A method is called A-stable if its stability region
contains the left half-plane of $\mathbb{C}$. Hence,
$$\{z \in \mathbb{C} \mid \mathrm{Re}(z) \le 0\} \subset S. \qquad (3.17)$$

3.2.6 Theorem: Consider the linear, autonomous IVP
$$u' = Au, \qquad u(t_0) = u_0,$$
with a diagonalizable matrix $A$ and initial value $y^{(0)} = u_0$. The stability of a one-step
method with stability region $S$ applied to this vector-valued problem is inherited
from the scalar equation.
In particular, let $\bigl(y^{(k)}\bigr)_{k=0}^{\infty}$ be the sequence of approximations generated by an
A-stable one-step method with step size $h$ for this IVP. If all eigenvalues of $A$ have
non-positive real part, then the sequence is uniformly bounded for all $h$.

Remark 3.2.7. The term "A-stability" was deliberately chosen neutrally by Dahlquist. In
particular, note that A-stability does not stand for asymptotic stability.

Proof (only for ERKs). Since $A$ is diagonalizable, there exists an invertible matrix $V \in \mathbb{C}^{d\times d}$ and a diagonal matrix $\Lambda \in \mathbb{C}^{d\times d}$ such that $A = V^{-1}\Lambda V$. Let $w := Vu$. Then
$$w' = (Vu)' = Vu' = VAu = \Lambda Vu = \Lambda w \qquad (3.18)$$
and $w(t_0) = Vu(t_0)$, and so the system of ODEs decouples into $d$ independent ODEs
$$w_\ell' = \lambda_\ell w_\ell, \qquad w_\ell(t_0) = (w_0)_\ell, \qquad \ell = 1, \dots, d.$$
Similarly, the stage values of an ERK decouple into $d$ independent components:
$$g_i = y_0 + h\sum_{j=1}^{s} a_{ij}V^{-1}\Lambda Vg_j \quad\Rightarrow\quad
\gamma_i := Vg_i = Vy_0 + h\sum_{j=1}^{s} a_{ij}\Lambda Vg_j = w_0 + h\sum_{j=1}^{s} a_{ij}\Lambda\gamma_j,$$
or equivalently
$$(\gamma_i)_\ell = (w_0)_\ell + h\sum_{j=1}^{s} a_{ij}\lambda_\ell(\gamma_j)_\ell, \qquad \ell = 1, \dots, d.$$
Finally, if we denote by $\eta_j := Vy_j$ the transformed numerical solution at the $j$th time step,
we get for the next iterate
$$\eta_1 = Vy_1 = Vy_0 + h\sum_{i=1}^{s} b_i Vg_i = \eta_0 + h\sum_{i=1}^{s} b_i\gamma_i.$$
Thus, the ERK applied to a vector-valued problem decouples into $d$ scalar
problems solved by the same ERK. But for each of the scalar problems, the definition of
A-stability implies boundedness of the solution if $\mathrm{Re}(\lambda_\ell) \le 0$ for all $\ell = 1, \dots, d$, and thus
$$|y^{(k)}| = |V^{-1}\eta^{(k)}| \le \|V^{-1}\|\,|\eta^{(k)}| < \infty.$$

3.2.8 Theorem: No explicit Runge-Kutta method is A-stable.

Proof. We show that for such methods $R(z)$ is a (non-constant) polynomial. It is known that
the absolute value of a non-constant polynomial goes to infinity as the absolute value of its
argument goes to infinity. Thus, there exists $z \in \{z \in \mathbb{C} \mid \mathrm{Re}(z) \le 0\}$ such that $|R(z)| > 1$ and thus
$z \notin S$, which implies the result of the theorem.

Consider an arbitrary ERK applied to the scalar problem $u' = \lambda u$, $u(t_0) = u_0$. From
equation (2.11b) it follows that $k_i = \lambda g_i$ for all $i = 1, \dots, s$. If we insert this into
equation (2.11a), we obtain
$$g_i = y_0 + h\lambda\sum_{j=1}^{i-1} a_{ij}g_j.$$
With $g_1 = y_0$ and $z = h\lambda$ one has
$$g_2 = y_0 + a_{21}zy_0 = (1 + a_{21}z)y_0, \qquad
g_3 = y_0 + z(a_{31}g_1 + a_{32}g_2) = \bigl(1 + z(a_{31} + a_{32}(1 + a_{21}z))\bigr)y_0.$$
Therefore, one easily shows by induction that $g_j$ is a polynomial of degree $j - 1$ in $z$.
Substituting into formula (2.11c), it follows that $R(z)$ is a polynomial of degree at most $s$.

Remark 3.2.9. The notion of A-stability is only applicable to linear problems with di-
agonalizable matrices. Now we are considering its extension to nonlinear problems with
monotonic right hand sides.

3.2.10 Definition: A one-step method applied to a monotonic initial value problem
$u' = f(t, u)$ with arbitrary initial values $y_0$ and $\tilde y_0$ is called B-stable if
$$|y_1 - \tilde y_1| \le |y_0 - \tilde y_0| \qquad (3.19)$$
independently of the time step size $h$.

3.2.11 Theorem: Let $f$ be monotonic and such that $f(t, 0) = 0$ for all $t \in \mathbb{R}$, and
consider the IVP
$$u' = f(t, u) \quad\text{with}\quad u(t_0) = u_0.$$
Let $\bigl(y^{(k)}\bigr)_{k=0}^{\infty}$ be the sequence generated by a B-stable one-step method $F_h$ with
initial value $y^{(0)} = u_0$ that satisfies $F_h(t, 0) = 0$ for all $t \in \mathbb{R}$. Then the sequence is
uniformly bounded for $k \to \infty$, independently of the time step size $h$.

Proof. The theorem follows immediately by setting $\tilde y_0 = 0$ and iterating over the definition
of B-stability, since the assumptions of the theorem guarantee that $\tilde y_k = 0$ for all $k$. (Note
that $f(t, 0) = 0$ implies $F_h(t, 0) = 0$ for all Runge-Kutta methods.)

3.2.12 Corollary: Any B-stable method is A-stable.

Proof. Apply the method to the scalar, linear problem u0 = λu, which is monotonic for
Re(λ) ≤ 0. Now, the definition of B-stability implies |R(z)| ≤ 1, and thus, the method is
A-stable.

An undesirable feature of complex differentiable functions in the context of stability of
Runge-Kutta methods is the fact that $\lim_{z\to\infty}R(z)$ is well-defined on the Riemann sphere,
independent of the path chosen to approach this limit in the complex plane. Thus, for any
real number $x$, we have
$$\lim_{x\to\infty}R(x) = \lim_{x\to\infty}R(ix). \qquad (3.20)$$

Thus, a method, which has exactly the left half-plane of C as its stability domain, seemingly
a desirable property, has the undesirable property that components in eigenspaces corre-
sponding to very large negative eigenvalues, and thus decaying very fast in the continuous
problem, are decaying very slowly if such a method is applied.
This gave rise to the following notion of L-stability. However, note that L-stable methods
are not always better than A-stable ones. Similarly, it is also not always necessary to
require A-stability. Judgment must be applied according to the problem being solved.

3.2.13 Definition: An A-stable one-step method is called L-stable if
$$\lim_{\mathrm{Re}(z)\to-\infty}|R(z)| = 0. \qquad (3.21)$$
Some authors refer to L-stable methods as strongly A-stable.

3.3 General Runge-Kutta methods

3.3.1. According to Theorem 3.2.8, an explicit Runge-Kutta method cannot be A- or B-stable.
Thus, these methods are not suitable for long-term integration of stiff IVPs. The goal of
this chapter is the study of methods not suffering from this limitation. The cure will be
implicit methods, where stages may depend not only on known values from the past, but
also on the value to be computed.

We point out immediately that the main drawback of these methods is the fact that they
typically require the solution of nonlinear systems of equations and thus involve much
higher computational effort. Therefore, careful judgment should be applied to determine
whether a problem is really stiff or whether it is better to use an explicit method.

3.3.2 Definition: A (general) Runge-Kutta method is a one-step method of the
form
$$g_i = y_0 + h\sum_{j=1}^{s} a_{ij}k_j, \qquad i = 1, \dots, s, \qquad (3.22a)$$
$$k_i = f(t_0 + hc_i, g_i), \qquad i = 1, \dots, s, \qquad (3.22b)$$
$$y_1 = y_0 + h\sum_{i=1}^{s} b_i k_i, \qquad (3.22c)$$
where in general $a_{ij} \ne 0$ for all $i, j$. The method is called

Explicit (ERK) if $a_{ij} = 0$ for all $j \ge i$,

Diagonally Implicit (DIRK) if $a_{ij} = 0$ for all $j > i$,

Singly Diagonally Implicit (SDIRK) if it is DIRK and $a_{11} = a_{22} = \dots = a_{ss}$,

Implicit (IRK) in all other cases.

Remark 3.3.3. The corresponding Butcher tableaus are
$$\begin{array}{c|cccc}
c_1 & a_{11} & a_{12} & \cdots & a_{1s}\\
c_2 & a_{21} & a_{22} & \cdots & a_{2s}\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
c_s & a_{s1} & a_{s2} & \cdots & a_{ss}\\\hline
 & b_1 & b_2 & \cdots & b_s
\end{array}
\qquad
\begin{array}{c|cccc}
c_1 & a_{11} & & &\\
c_2 & a_{21} & a_{22} & &\\
\vdots & \vdots & \vdots & \ddots &\\
c_s & a_{s1} & a_{s2} & \cdots & a_{ss}\\\hline
 & b_1 & b_2 & \cdots & b_s
\end{array}
\qquad
\begin{array}{c|cccc}
c_1 & a_{11} & & &\\
c_2 & a_{21} & a_{11} & &\\
\vdots & \vdots & \vdots & \ddots &\\
c_s & a_{s1} & a_{s2} & \cdots & a_{11}\\\hline
 & b_1 & b_2 & \cdots & b_s
\end{array}$$
for IRK, DIRK, and SDIRK, respectively.

Example 3.3.4 (Two-stage SDIRK). The following two SDIRK methods are of order three:
$$\begin{array}{c|cc}
\frac12 - \frac{\sqrt3}{6} & \frac12 - \frac{\sqrt3}{6} & 0\\[3pt]
\frac12 + \frac{\sqrt3}{6} & \frac{\sqrt3}{3} & \frac12 - \frac{\sqrt3}{6}\\\hline
 & \frac12 & \frac12
\end{array}
\qquad
\begin{array}{c|cc}
\frac12 + \frac{\sqrt3}{6} & \frac12 + \frac{\sqrt3}{6} & 0\\[3pt]
\frac12 - \frac{\sqrt3}{6} & -\frac{\sqrt3}{3} & \frac12 + \frac{\sqrt3}{6}\\\hline
 & \frac12 & \frac12
\end{array} \qquad (3.23)$$

3.3.5 Lemma: Let $I$ be the $s \times s$ identity matrix and let $e := (1, \dots, 1)^T \in \mathbb{R}^s$.
The stability function of a (general) $s$-stage Runge-Kutta method with coefficients
$$A = \begin{pmatrix} a_{11} & \cdots & a_{1s}\\ \vdots & & \vdots\\ a_{s1} & \cdots & a_{ss} \end{pmatrix}
\quad\text{and}\quad
b = \begin{pmatrix} b_1\\ \vdots\\ b_s \end{pmatrix}$$
is given by the two expressions
$$R(z) = 1 + zb^T(I - zA)^{-1}e = \frac{\det\bigl(I - zA + zeb^T\bigr)}{\det\bigl(I - zA\bigr)}. \qquad (3.24)$$

Proof. Applying the method to the scalar test problem with $f(u) = \lambda u$, the definition of
the stages $g_i$ leads to the system of linear equations
$$g_i = y_0 + h\sum_{j=1}^{s} a_{ij}\lambda g_j, \qquad i = 1, \dots, s.$$
In matrix notation, with $z = h\lambda$, we obtain $(I - zA)g = y_0 e$, where $g$ is the vector
$(g_1, \dots, g_s)^T$. Equally, we obtain
$$R(z)y_0 = y_1 = y_0 + h\sum_{i=1}^{s} b_i\lambda g_i = y_0 + zb^T g = y_0 + zb^T(I - zA)^{-1}y_0 e = \bigl(1 + zb^T(I - zA)^{-1}e\bigr)y_0.$$
In order to prove the second representation, we write the whole Runge-Kutta method as a
single system of equations of dimension $s + 1$:
$$\begin{pmatrix} I - zA & 0\\ -zb^T & 1 \end{pmatrix}\begin{pmatrix} g\\ y_1 \end{pmatrix} = y_0\begin{pmatrix} e\\ 1 \end{pmatrix}.$$
Applying Cramer's rule yields the result. DIY

3.3.6 Example: The stability functions of the modified Euler method, of the classical
Runge-Kutta method of order 4, and of the Dormand-Prince method of order 5 are
$$R_2(z) = 1 + z + \tfrac{z^2}{2},$$
$$R_4(z) = 1 + z + \tfrac{z^2}{2} + \tfrac{z^3}{6} + \tfrac{z^4}{24},$$
$$R_5(z) = 1 + z + \tfrac{z^2}{2} + \tfrac{z^3}{6} + \tfrac{z^4}{24} + \tfrac{z^5}{120} + \tfrac{z^6}{600},$$
respectively. DIY Their stability regions are shown in Figure 3.2.

Figure 3.2: Stability regions of the modified Euler method, the classical Runge-Kutta
method of order 4 and the Dormand/Prince method of order 5 (blue stable, red unstable)

3.3.7 Definition: The $\vartheta$-scheme is the one-step method defined for $\vartheta \in [0, 1]$ by
$$y_1 = y_0 + h\bigl((1 - \vartheta)f(y_0) + \vartheta f(y_1)\bigr). \qquad (3.25)$$
It is an RKM with the Butcher tableau
$$\begin{array}{c|cc}
0 & 0 & 0\\
1 & 1 - \vartheta & \vartheta\\\hline
 & 1 - \vartheta & \vartheta
\end{array} \qquad (3.26)$$
Three special cases are distinguished:

$\vartheta = 0$: explicit Euler method,
$\vartheta = 1$: implicit Euler method,
$\vartheta = 1/2$: Crank-Nicolson method.

3.3.8 Theorem: The $\vartheta$-scheme is A-stable for $\vartheta \ge 1/2$.

Proof. DIY (The stability regions for different $\vartheta$ are shown in Figure 3.3.)
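Applying the $\vartheta$-scheme to $u' = \lambda u$ gives $(1 - \vartheta z)y_1 = (1 + (1 - \vartheta)z)y_0$ with $z = h\lambda$, i.e. $R(z) = \frac{1 + (1-\vartheta)z}{1 - \vartheta z}$. A quick sketch (sample point arbitrary) contrasts $\vartheta = 1/2$ and $\vartheta = 0$ at a point far inside the left half-plane.

```python
def R_theta(z, theta):
    # Stability function of the theta-scheme for u' = lambda*u:
    # (1 - theta*z) y1 = (1 + (1 - theta)*z) y0, with z = h*lambda.
    return (1 + (1 - theta) * z) / (1 - theta * z)

z = complex(-10.0, 3.0)                  # far inside the left half-plane
crank_nicolson = abs(R_theta(z, 0.5))    # <= 1: consistent with A-stability
explicit_euler = abs(R_theta(z, 0.0))    # > 1: not A-stable
```

Note that for $\vartheta = 1/2$, $|R(z)| \to 1$ as $\mathrm{Re}(z) \to -\infty$, so Crank-Nicolson is A-stable but not L-stable.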

3.3.1 Existence and uniqueness of discrete solutions

While it was clear that the steps of an explicit Runge-Kutta method can always be exe-
cuted, implicit methods require the solution of a possibly nonlinear system of equations.
The solvability of such a system is not always clear. We will investigate several cases here:
First, Lemma 3.3.9 based on a Lipschitz condition on the right hand side. Since this result
suffers from a severe step size constraint, we add Lemma 3.3.10 for DIRK methods based on
right hand sides with a one-sided Lipschitz condition. Finally, we present Theorem 3.3.11
for general Runge-Kutta methods with one-sided Lipschitz condition.

Recall the definition of the usual maximum row-sum norm of a matrix $A$:
$$\|A\|_\infty := \max_{i = 1, \dots, s}\sum_{j=1}^{s}|a_{ij}|.$$

Figure 3.3: Stability regions of the ϑ-scheme with ϑ = 0.5 (Crank-Nicolson), ϑ = 0.6,
ϑ = 0.7, and ϑ = 1 (implicit Euler).

3.3.9 Lemma: Let $f: \mathbb{R}\times\mathbb{R}^d \to \mathbb{R}^d$ be continuous and satisfy the Lipschitz
condition with constant $L$. If
$$hL\|A\|_\infty < 1, \qquad (3.27)$$
then, for any $y_0 \in \mathbb{R}^d$, the Runge-Kutta method (3.22) has a unique solution $y_1 \in \mathbb{R}^d$.

Proof. We prove existence and uniqueness by a fixed-point argument. To this end, given
$y_0 \in \mathbb{R}^d$, we define the matrix of stage values $K = [k_1, \dots, k_s] \in \mathbb{R}^{d\times s}$ in (3.22).
Given some initial $K^{(0)} \in \mathbb{R}^{d\times s}$, we consider the fixed-point iteration $K^{(m)} = \Psi(K^{(m-1)})$,
$m = 1, 2, \dots$, defined columnwise by
$$k_i^{(m)} = \Psi_i\bigl(K^{(m-1)}\bigr) = f\Bigl(t_0 + c_i h,\ y_0 + h\sum_{j=1}^{s} a_{ij}k_j^{(m-1)}\Bigr), \qquad i = 1, \dots, s,$$
which clearly has the matrix of stage values $K$ as a fixed point. Using on $\mathbb{R}^{d\times s}$ the norm
$\|K\| = \max_{i=1,\dots,s}|k_i|$, where $|\cdot|$ is the regular Euclidean norm on $\mathbb{R}^d$, it follows from the
Lipschitz continuity of $f$ in its second argument that
$$\|\Psi(K) - \Psi(K')\| \le hL\Bigl(\max_{i=1,\dots,s}\sum_{j=1}^{s}|a_{ij}|\Bigr)\|K - K'\|.$$
Under assumption (3.27), the factor in front of $\|K - K'\|$ is strictly less than one and thus the
mapping $\Psi$ is a contraction. Then the Banach fixed-point theorem (cf. Theorem A.3.1)
yields the unique existence of $y_1$.
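The contraction argument of the proof is also the simplest practical solver. The sketch below (a toy setup: implicit Euler viewed as a 1-stage IRK with $a_{11} = b_1 = c_1 = 1$, applied to $u' = -u$) iterates the stage equation to its fixed point; since $L = 1$ and $h = 0.5$, condition (3.27) holds.

```python
def implicit_euler_fixed_point(f, t0, y0, h, tol=1e-12, maxit=100):
    # Implicit Euler viewed as a 1-stage IRK (a11 = b1 = c1 = 1).
    # Fixed-point iteration for the stage equation k = f(t0 + h, y0 + h*k);
    # by Lemma 3.3.9 it contracts if h * L * ||A||_inf = h * L < 1.
    k = f(t0, y0)                        # start from the explicit slope
    for _ in range(maxit):
        k_new = f(t0 + h, y0 + h * k)
        if abs(k_new - k) < tol:
            break
        k = k_new
    return y0 + h * k_new

f = lambda t, y: -y                      # Lipschitz constant L = 1
h = 0.5                                  # h * L = 0.5 < 1: contraction
y1 = implicit_euler_fixed_point(f, 0.0, 1.0, h)
# For u' = -u the implicit Euler step is known exactly: y1 = y0 / (1 + h).
```

For stiff problems this simple iteration is of little use, since the step size restriction $hL < 1$ is exactly what implicit methods are supposed to avoid; in practice Newton-type iterations are used instead.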

3.3.10 Lemma: Let $f: \mathbb{R}\times\mathbb{C}^d \to \mathbb{C}^d$ be continuous, differentiable in its second
argument, and satisfy the one-sided Lipschitz condition (3.3) with constant $\nu$.
Consider an arbitrary DIRK method with $a_{ii} > 0$. If for all $i = 1, \dots, s$
$$h\nu a_{ii} < 1, \qquad (3.28)$$
then, for any $y_0 \in \mathbb{C}^d$, each of the (decoupled) nonlinear equations in (3.22a) has a
solution $g_i \in \mathbb{C}^d$.

Proof. The proof simplifies compared to the general case of an IRK, since each stage
depends explicitly on the previous stages and implicitly only on itself. Thus, we can write
$$g_i = y_0 + v_i + ha_{ii}f(g_i) \quad\text{with}\quad v_i = h\sum_{j=1}^{i-1} a_{ij}f(g_j). \qquad (3.29)$$
For linear IVPs with $f(t, y) := My$ with diagonalizable system matrix $M$, we have
$$(I - ha_{ii}M)g_i = y_0 + v_i.$$
Since $\nu = \max_{j=1,\dots,d}\mathrm{Re}(\lambda_j(M))$ (cf. Lemma 3.1.8), assumption (3.28) implies that all
eigenvalues of $(I - ha_{ii}M)$ have positive real part. Thus, the inverse exists and we obtain
a unique solution.
In the nonlinear case, we use a homotopy argument. To this end, we introduce the parameter
$\tau \in [0, 1]$ and set up the family of equations
$$g(\tau) = y_0 + \tau v_i + ha_{ii}f(g(\tau)) + (\tau - 1)ha_{ii}f(y_0).$$
For $\tau = 0$ this equation has the solution $g(0) = y_0$, and for $\tau = 1$ the solution $g(1) = g_i$.
Now, provided $g'$ is bounded on $[0, 1]$, we can conclude that a solution exists, since
$$g(1) = g(0) + \int_0^1 g'(s)\,ds. \qquad (3.30)$$
To show that $g'$ is bounded, note first that, since $f$ was assumed to be differentiable in the
second argument,
$$\mathrm{Re}\bigl\langle f_y(t, y)h + o(|h|),\, h\bigr\rangle = \mathrm{Re}\bigl\langle f(t, y + h) - f(t, y),\, h\bigr\rangle \le \nu|h|^2.$$
Dividing by $|h|^2$ and taking the limit as $|h| \to 0$, we obtain with $\hat h = h/|h|$ that
$$\mathrm{Re}\bigl\langle f_y(t, y)\hat h,\, \hat h\bigr\rangle \le \nu \quad\iff\quad \mathrm{Re}\bigl\langle f_y(t, y)h,\, h\bigr\rangle \le \nu|h|^2, \quad\text{for all } h \in \mathbb{C}^d.$$
Hence, with
$$g'(\tau) = v_i + ha_{ii}f_y\bigl(t, g(\tau)\bigr)g'(\tau) + ha_{ii}f(y_0)$$
we obtain
$$|g'(\tau)|^2 = \mathrm{Re}\bigl\langle v_i + ha_{ii}f(y_0),\, g'(\tau)\bigr\rangle + ha_{ii}\,\mathrm{Re}\bigl\langle f_y(t, g(\tau))g'(\tau),\, g'(\tau)\bigr\rangle
\le |v_i + ha_{ii}f(y_0)|\,|g'(\tau)| + ha_{ii}\nu|g'(\tau)|^2.$$
Now, subtracting the second term on the right-hand side and dividing by $1 - ha_{ii}\nu$, which
by assumption is positive, it follows that
$$|g'(\tau)| \le \frac{|v_i + ha_{ii}f(y_0)|}{1 - ha_{ii}\nu},$$
so that $g'(\tau)$ is bounded for all $\tau \in [0, 1]$.
Thus, we have proved existence of the stage values $g_i$.

If the DIRK method in lemma 3.3.10 is A- or B-stable, then the gi are unique.

3.3.11 Theorem: Let $f$ be continuously differentiable and let it satisfy the one-sided
Lipschitz condition (3.3) with constant $\nu$. If the Runge-Kutta matrix $A$ is
invertible and if there exists a diagonal matrix $D = \mathrm{diag}(d_1, \dots, d_s)$ with positive
entries, such that
$$h\nu < \frac{\langle x, A^{-1}x\rangle_D}{\langle x, x\rangle_D}, \qquad \forall x \in \mathbb{R}^s\setminus\{0\}, \qquad (3.31)$$
where $\langle x, y\rangle_D = \langle Dx, y\rangle$, then the nonlinear system (3.22a) has a solution $(g_1, \dots, g_s)$.

Proof. We omit the proof here and refer to [HW10, Theorem IV.14.2].

3.3.2 Considerations on the implementation of Runge-Kutta methods

3.3.12. As we have seen in the proof of Lemma 3.3.9, implicit Runge-Kutta methods require
the solution of a nonlinear system of size $s \cdot d$, where $s$ is the number of stages and $d$ the
dimension of the system of ODEs. DIRK methods are simpler and only require the solution
of systems of dimension $d$. Thus, we should prefer this class of methods, were it not for the
following theorem.

3.3.13 Theorem: A B-stable DIRK method has at most order 4.

Proof. See [HW10, Theorem IV.13.13].

Remark 3.3.14. In each step of an IRK, we have to solve a (non)linear system for the
quantities $g_i$. In order to reduce round-off errors, it is advantageous to solve for $z_i = g_i - y_0$.
Especially for small time steps, $z_i$ is expected to be much smaller than $g_i$. Thus, we have
to solve the system
$$z_i = h\sum_{j=1}^{s} a_{ij}f(t_0 + c_j h,\, y_0 + z_j), \qquad i = 1, \dots, s. \qquad (3.32)$$
Using the Runge-Kutta matrix $A$, we rewrite this as
$$\begin{pmatrix} z_1\\ \vdots\\ z_s \end{pmatrix} = A\begin{pmatrix} hf(t_0 + c_1h,\, y_0 + z_1)\\ \vdots\\ hf(t_0 + c_sh,\, y_0 + z_s) \end{pmatrix}. \qquad (3.33)$$
We can avoid further function evaluations by then computing
$$y_1 = y_0 + b^T A^{-1}z, \qquad (3.34)$$
which again is numerically much more stable than evaluating $f$ (with a possibly large
Lipschitz constant).

3.4 Construction of Runge-Kutta methods via quadrature

We finish our discussion of Runge-Kutta methods by describing a systematic way to
construct stable, high-order implicit Runge-Kutta methods.

3.4.1 Definition (Simplifying order conditions):

    B(p) :   Σ_{i=1}^{s} b_i c_i^{q−1} = 1/q,                        q = 1, . . . , p                        (3.35a)

    C(p) :   Σ_{j=1}^{s} a_ij c_j^{q−1} = c_i^q / q,                 q = 1, . . . , p,   i = 1, . . . , s     (3.35b)

    D(p) :   Σ_{i=1}^{s} b_i c_i^{q−1} a_ij = (b_j/q)(1 − c_j^q),    q = 1, . . . , p,   j = 1, . . . , s     (3.35c)

3.4.2 Theorem: Consider a (general) Runge-Kutta method that satisfies condition
B(p) in (3.35a), condition C(ξ) in (3.35b), and condition D(η) in (3.35c) with ξ ≥
p/2 − 1 and η ≥ p − ξ − 1. Then the method has consistency order p.

Proof. For the proof, we refer to [HNW09, Ch. II, Theorem 7.4]. Here, we only observe
that

    ∫_0^1 t^{q−1} dt = 1/q,        ∫_0^{c_i} t^{q−1} dt = c_i^q / q.

If we insert the monomial t^{q−1} at the points c_i into the quadrature formula with the
quadrature weights b_i, then we obtain (3.35a). Similarly we obtain (3.35b), if we apply
the quadrature formula with weights a_ij, j = 1, . . . , s, to the integral of t^{q−1} over
[0, c_i]. In both cases we carry this out for all monomials until the desired degree is
reached. Due to linearity of the formulas the exactness holds for all polynomials up to
that degree.

3.4.1 Gauss-, Radau-, and Lobatto-quadrature

3.4.3. In this subsection, we review some of the basic facts of quadrature formulas based
on orthogonal polynomials (cf. Numerik 0 for details).

3.4.4 Definition: Let L_n(t) be the (shifted) Legendre polynomial of degree n on
[0, 1], up to scaling. These can be compactly defined by

    L_n(t) = dⁿ/dtⁿ [ tⁿ(t − 1)ⁿ ].

A quadrature formula for ∫_0^1 f dx that uses the n roots of L_n as its quadrature points
and the integrals of the Lagrange interpolating polynomials at those points as its
weights is called a Gauss rule.

3.4.5 Lemma: The Gauss quadrature formula Q_{n−1}^{[a,b]}(f) with n points for approxi-
mating the integral ∫_a^b f dx is exact for polynomials of degree 2n − 1. If f ∈ C^{2n}[a, b]
and h := b − a, then

    Q_{n−1}^{[a,b]}(f) − ∫_a^b f dx = O(h^{2n+1}).

Proof. See Numerik 0. (Please note that in Numerik 0 we numbered the quadrature nodes
x0 , . . . , xn−1 and thus n here is n − 1 in the notes to Numerik 0.)
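The exactness statement is easy to check numerically. The following snippet, which is only an illustration and not from the notes, maps numpy's Gauss-Legendre rule from [−1, 1] to [0, 1] and verifies exactness for all monomials up to degree 2n − 1.

```python
import numpy as np

# n-point Gauss rule on [0, 1], obtained from numpy's rule on [-1, 1]
n = 3
x, w = np.polynomial.legendre.leggauss(n)
x01 = (x + 1) / 2        # affine map of the nodes to [0, 1]
w01 = w / 2              # the weights scale with the interval length

# exact for all monomials t^q with q <= 2n - 1, i.e. degree 5 for n = 3
for q in range(2 * n):
    assert abs(np.sum(w01 * x01**q) - 1 / (q + 1)) < 1e-13
```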

Remark 3.4.6. An important alternative set of quadrature formulae are the Radau and
Lobatto formulas.

The Radau quadrature formulae are similar to the Gauss rules, but they use one end
point of the interval [0, 1] and the roots of orthogonal polynomials of degree n − 1 as their

abscissas. We distinguish left and right Radau quadrature formulae, depending on which
end is included. Lobatto quadrature formulae use both end points and the roots of a
polynomial of degree n − 2. The polynomials are

    Radau left:     p_n(t) = d^{n−1}/dt^{n−1} [ tⁿ(t − 1)^{n−1} ],          (3.36)

    Radau right:    p_n(t) = d^{n−1}/dt^{n−1} [ t^{n−1}(t − 1)ⁿ ],          (3.37)

    Lobatto:        p_n(t) = d^{n−2}/dt^{n−2} [ t^{n−1}(t − 1)^{n−1} ].     (3.38)
A Radau quadrature formula with n points is exact for polynomials of degree 2n − 2. A
Lobatto quadrature formula with n points is exact for polynomials of degree 2n − 3. The
quadrature weights of these formulae are positive.
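The abscissas can be computed directly from (3.36)–(3.38) by differentiating the stated polynomials. A small sketch (the helper name is mine) using numpy's polynomial class, for the right Radau points:

```python
import numpy as np
from numpy.polynomial import Polynomial

def radau_right_points(n):
    """Roots of d^{n-1}/dt^{n-1} [ t^{n-1} (t - 1)^n ], cf. (3.37)."""
    p = Polynomial([0.0, 1.0])**(n - 1) * Polynomial([-1.0, 1.0])**n
    return np.sort(p.deriv(n - 1).roots().real)

# n = 2 gives c = (1/3, 1); n = 3 gives ((4 - sqrt(6))/10, (4 + sqrt(6))/10, 1),
# the abscissas of the right Radau collocation methods in example 3.4.18
assert np.allclose(radau_right_points(2), [1/3, 1.0])
assert np.allclose(radau_right_points(3),
                   [(4 - np.sqrt(6))/10, (4 + np.sqrt(6))/10, 1.0])
```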

3.4.2 Collocation methods

3.4.7. An alternative to approximating the solution of an IVP only at individual points in
time is to develop methods which first approximate the solution as a function, through a
polynomial.

However, as we have seen in Numerik 0, polynomials are not suited for high-order
interpolation over large intervals. Therefore, we again apply them only on subintervals,
in the form of Runge-Kutta methods: the subintervals correspond to the time steps, and
the quadrature points correspond to the stages.

3.4.8 Definition: The collocation polynomial y(t) ∈ P_s of an s-stage colloca-
tion method with pairwise different support points c_1, . . . , c_s is defined uniquely
through the s + 1 conditions:

    y(t_0) = y_0,                                                            (3.39a)
    y'(t_0 + c_i h) = f(t_0 + c_i h, y(t_0 + c_i h)),   i = 1, . . . , s.     (3.39b)

The value at the next time step is then defined as

    y_1 = y(t_0 + h).                                                        (3.39c)

3.4.9 Lemma: An s-stage collocation method with the points c_1 to c_s defines a
Runge-Kutta method, as defined in definition 3.3.2, with the coefficients c_i and

    a_ij = ∫_0^{c_i} L_j(t) dt,        b_i = ∫_0^1 L_i(t) dt,                (3.40)

where L_j(t), j = 1, . . . , s, are the Lagrange interpolation polynomials associated to
the point set {c_1, . . . , c_s}, i.e.

    L_j(t) = Π_{k=1, k≠j}^{s} (t − c_k)/(c_j − c_k).

Proof. The polynomial y'(t) is of degree s − 1 and thus uniquely defined by the s interpo-
lation conditions in equation (3.39b). Setting k_i := y'(t_0 + c_i h) = f(t_0 + c_i h, y(t_0 + c_i h)),
we obtain

    y'(t_0 + th) = Σ_{j=1}^{s} k_j · L_j(t),                                 (3.41)

where L_j(t), j = 1, . . . , s, are the Lagrange interpolation polynomials. By integration we
obtain:

    g_i = y(t_0 + c_i h) = y_0 + h ∫_0^{c_i} y'(t_0 + th) dt = y_0 + h Σ_{j=1}^{s} k_j ∫_0^{c_i} L_j(t) dt,   (3.42)

which, by comparison with (3.22a), defines the coefficients a_ij. Integrating from 0 to 1
instead, we obtain the coefficients b_j by comparison with (3.22c).
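The coefficient formulas (3.40) are easy to evaluate numerically. The following illustrative sketch (the helper name is mine) builds the Lagrange polynomials with numpy and, as a sanity check, reproduces the 2-stage Gauss tableau of example 3.4.14.

```python
import numpy as np
from numpy.polynomial import Polynomial

def collocation_tableau(c):
    """a_ij = int_0^{c_i} L_j(t) dt and b_j = int_0^1 L_j(t) dt, cf. (3.40)."""
    s = len(c)
    A, b = np.zeros((s, s)), np.zeros(s)
    for j in range(s):
        L = Polynomial([1.0])                  # Lagrange basis polynomial L_j
        for k in range(s):
            if k != j:
                L *= Polynomial([-c[k], 1.0]) / (c[j] - c[k])
        Li = L.integ()                         # antiderivative with Li(0) = 0
        b[j] = Li(1.0)
        A[:, j] = [Li(ci) for ci in c]
    return A, b

# the two Gauss points reproduce the 2-stage Gauss tableau (example 3.4.14)
c = np.array([0.5 - np.sqrt(3)/6, 0.5 + np.sqrt(3)/6])
A, b = collocation_tableau(c)
assert np.allclose(b, [0.5, 0.5])
assert np.allclose(A, [[0.25, 0.25 - np.sqrt(3)/6],
                       [0.25 + np.sqrt(3)/6, 0.25]])
```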

3.4.10 Lemma: An implicit s-stage Runge-Kutta method, with pairwise different
support points c_i, is a collocation method if and only if the simplifying conditions
B(s) in (3.35a) and C(s) in (3.35b) are satisfied. Thus, an s-stage collocation method
is of order (at least) s.

Proof. Consider an s-stage RK method. Condition B(s) leads to a system of s conditions
for the s coefficients b_1, . . . , b_s. The system matrix is the transpose of the Vandermonde
matrix V with entries V_{i,q} := c_i^{q−1}, which (for pairwise different c_i) is invertible.
Therefore these coefficients are defined uniquely. Similarly, for each i = 1, . . . , s, condition
C(s) leads to a uniquely solvable system of s conditions for the s coefficients a_ij,
j = 1, . . . , s, with the same system matrix. Thus, all the coefficients are defined uniquely.

On the other hand, (3.35b) yields for q < s:

    Σ_{j=1}^{s} a_ij c_j^q = c_i^{q+1}/(q + 1) = ∫_0^{c_i} t^q dt.

As a consequence of linearity we have

    Σ_{j=1}^{s} a_ij p(c_j) = ∫_0^{c_i} p(t) dt,        ∀p ∈ P_{s−1}.

Applying this to the Lagrange interpolation polynomials L_j(t), we obtain the coefficients
of equation (3.40), which were in turn computed from the collocation polynomial, proving
the equivalence.

It follows from theorem 3.4.2 that a Runge-Kutta method that satisfies B(s) and C(s) has
consistency order (at least) s.

3.4.11 Theorem: Consider a collocation method with s pairwise different support
points c_i and define

    π(t) = Π_{i=1}^{s} (t − c_i).                                            (3.43)

If π(t) is orthogonal on [0, 1] to all polynomials of degree r − 1, for some r ≤ s, then
the collocation method (3.39) is of consistency order p = s + r.

Proof. We have already shown in the proof of Lemma 3.4.10 that for any collocation
method with s stages, B(s) and C(s) hold.

The condition on π implies that on the interval [0, 1] the quadrature rule is in fact exact
for polynomials of degree s + r − 1 (cf. Numerik 0 for the case r = s), so that we have
B(s + r). Therefore, to prove consistency order p = s + r it remains to show D(r).

First, we observe that due to C(s) and B(s + r), for all p ≤ s and q ≤ r, we have

    Σ_{j=1}^{s} Σ_{i=1}^{s} b_i c_i^{q−1} a_ij c_j^{p−1} = Σ_{i=1}^{s} b_i c_i^{q−1} (c_i^p / p)
        = (1/p) Σ_{i=1}^{s} b_i c_i^{p+q−1} = 1/(p(p + q)).

Furthermore, since B(s + r) holds, we have for the same p and q:

    Σ_{j=1}^{s} b_j (1 − c_j^q) c_j^{p−1} = Σ_{j=1}^{s} (b_j c_j^{p−1} − b_j c_j^{p+q−1})
        = 1/p − 1/(p + q) = q/(p(p + q)).

Subtracting 1/q times the second result from the first, we get

    0 = 1/(p(p + q)) − 1/(p(p + q)) = Σ_{j=1}^{s} c_j^{p−1} [ Σ_i b_i c_i^{q−1} a_ij − (b_j/q)(1 − c_j^q) ]
                                    =: Σ_{j=1}^{s} c_j^{p−1} ξ_j .

This holds for p = 1, . . . , s and thus amounts to a homogeneous, linear system in the
variables ξ_j with system matrix Vᵀ. Thus, ξ_j = 0 for all j, which is D(r), and the
theorem holds.

Corollary 3.4.12. The consistency order p of an s-stage collocation method satisfies

s ≤ p ≤ 2s.

Proof. The polynomial π(t) in (3.43) is of degree s. If π = Ls , the Legendre polynomial of


degree s on [0, 1], then π is orthogonal to all polynomials of degree s − 1 by construction
(cf. Numerik 0 for details). Thus, it follows from theorem 3.4.11 that there exists an s-stage
collocation method of order p = 2s.

On the other hand, we know from Numerik 0 that there exists no quadrature rule such that
B(2s + 1) is satisfied, otherwise the degree s polynomial π(t) would have to be orthogonal
to itself. In particular, if we consider the scalar model equation u' = λu with exact solution
u(t) = e^{λt} = Σ_{j=0}^{∞} (λt)^j / j!, the best we can hope for is that the collocation polynomial
y(t) matches the first 2s − 1 terms in this infinite sum, such that

    |u_1 − y_1| = O(h^{2s}).

Hence, it is clear that the consistency order of an s-stage collocation method satisfies p ≤ 2s.
The lower bound has already been proved in lemma 3.4.10.

3.4.13 Definition: An s-stage Gauß-Collocation method is a collocation


method, where the collocation points are the set of s Gauß points in the interval
[0, 1], namely the roots of the Legendre polynomial of degree s.

3.4.14 Example: (2- and 3-stage Gauss collocation methods)

    (3−√3)/6 │ 1/4            1/4 − √3/6
    (3+√3)/6 │ 1/4 + √3/6     1/4
    ─────────┼──────────────────────────
             │ 1/2            1/2

    (5−√15)/10 │ 5/36            2/9 − √15/15    5/36 − √15/30
        1/2    │ 5/36 + √15/24   2/9             5/36 − √15/24
    (5+√15)/10 │ 5/36 + √15/30   2/9 + √15/15    5/36
    ───────────┼─────────────────────────────────────────────
               │ 5/18            4/9             5/18

3.4.15 Theorem: The s-stage Gauß-collocation method is consistent of order 2s


and thus of optimal order.

Proof. Follows immediately from the proof of corollary 3.4.12.
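A quick numerical illustration (not from the notes): for the scalar test equation u' = λu, one step of an implicit Runge-Kutta method reduces to multiplication by the stability function R(z) = 1 + z bᵀ(I − zA)⁻¹𝟙 with z = hλ, and the one-step error of the 2-stage Gauss method decays like O(h⁵), consistent with order 4.

```python
import numpy as np

# 2-stage Gauss tableau (cf. example 3.4.14)
A = np.array([[1/4, 1/4 - np.sqrt(3)/6],
              [1/4 + np.sqrt(3)/6, 1/4]])
b = np.array([1/2, 1/2])

def R(z):
    """Stability function R(z) = 1 + z b^T (I - zA)^{-1} 1: one step on u' = lam u."""
    return 1 + z * b @ np.linalg.solve(np.eye(2) - z * A, np.ones(2))

# local error |R(-h) - e^{-h}| should decay like h^5 (consistency order 4)
e1 = abs(R(-0.1) - np.exp(-0.1))
e2 = abs(R(-0.05) - np.exp(-0.05))
rate = np.log2(e1 / e2)
assert 4.5 < rate < 5.5
```

For this method, R(z) is the (2,2)-Padé approximant of the exponential, which explains the observed local error of order h⁵.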

3.4.16 Theorem: Gauß-collocation methods are B-stable. The stability region of


Gauß-collocation is exactly the left half-plane of C.

Proof. Let f be monotonic and let y(t) and z(t) be the collocation polynomials according
to (3.39) with respect to initial values y_0 and z_0, respectively. Analogous to the proof of
theorem 3.1.6 we introduce the auxiliary function m(t) = |z(t) − y(t)|². In the collocation
points ξ_i = t_0 + c_i h we have

    m'(ξ_i) = 2 Re⟨z'(ξ_i) − y'(ξ_i), z(ξ_i) − y(ξ_i)⟩
            = 2 Re⟨f(ξ_i, z(ξ_i)) − f(ξ_i, y(ξ_i)), z(ξ_i) − y(ξ_i)⟩ ≤ 0.     (3.44)

Since Gauß quadrature is exact for polynomials of degree 2s − 1 and m' is a polynomial of
degree 2s − 1, we have:

    |z_1 − y_1|² = m(t_0 + h) = m(t_0) + ∫_{t_0}^{t_0+h} m'(t) dt
                 = m(t_0) + h Σ_{i=1}^{s} b_i m'(ξ_i) ≤ m(t_0) = |z_0 − y_0|²,

Figure 3.4: Stability domains of right Radau-collocation methods with one (implicit Euler),
two, and three collocation points (left to right). Note the different scaling of coordinate
axes in comparison with previous figures.

which establishes B-stability.

To show that the stability region is exactly the left half-plane of C we refer to the problem
sheet.

Remark 3.4.17. Similarly, we can construct collocation rules based on Radau- and
Lobatto-quadrature. As in the proof of theorem 3.4.15, it can be shown that the s-stage
Radau- and Lobatto-collocation methods are of orders 2s − 1 and 2s − 2, respectively.

Also as in the case of Gauß-quadrature it can be shown that collocation methods based
on Radau- and Lobatto quadrature are B-stable (cf. [HW10]). In fact, Radau-collocation
methods with right end point of the interval [0, 1] included in the quadrature set are L-
stable.

The first right Radau collocation method with s = 1 is simply the implicit Euler method.
The definitions of the next two are given in example 3.4.18. The stability regions of the
first three are shown in Figure 3.4.

Observe that the stability domains shrink as the order of the method increases. Also, observe
that the computation of y_1 coincides with that of g_s, so that we can save a few operations.

3.4.18 Example (2- and 3-stage right Radau collocation methods):

    1/3 │ 5/12    −1/12
     1  │ 3/4      1/4
    ────┼───────────────
        │ 3/4      1/4

    (4−√6)/10 │ (88−7√6)/360       (296−169√6)/1800   (−2+3√6)/225
    (4+√6)/10 │ (296+169√6)/1800   (88+7√6)/360       (−2−3√6)/225
        1     │ (16−√6)/36         (16+√6)/36          1/9
    ──────────┼─────────────────────────────────────────────────────
              │ (16−√6)/36         (16+√6)/36          1/9

Chapter 4

Newton and quasi-Newton methods

4.1 Basics of nonlinear iterations

4.1.1. The efficient solution of nonlinear problems is an important ingredient of implicit
timestepping schemes. Without attempting completeness, we present some important facts
about iterative methods for this problem. We introduce the two generic schemes, Newton
and gradient methods, discuss their respective pros and cons, and combine their features
in order to obtain better methods.

Consider the problem of finding x ∈ Rd such that

f (x) = 0, for f : Rd → Rd . (4.1)

4.1.2 Definition: An iteration

    x^(k+1) = G(x^(k))

to find a fixpoint x* = G(x*) is said to be convergent of order p ≥ 1 if

    ‖x^(k+1) − x*‖ ≤ q ‖x^(k) − x*‖^p .

For p = 1, in addition we require that q < 1. In that case, q is called the conver-
gence rate.

We have already seen in the proof of lemma 3.3.9 that the fixpoint iteration, e.g. for the
implicit Euler method:

y (m) = Ψ(y (m−1) ) := y0 + hf (t1 , y (m−1) ), with y (0) = y0 ,

converges to y1 provided hL < 1, but the convergence is only of order p = 1 (linear) and
the convergence rate is q := Lh, which may be close to 1. Moreover, it may fail if hL ≥ 1.

In Numerik 0, we have already seen a faster converging algorithm, the Newton method,
and proved there that it converges with order p = 2, for a sufficiently good initial guess.

4.1.3 Definition: The Newton method for finding the root of the nonlinear
equation f(x) = 0 with f : Rᵈ → Rᵈ reads: given an initial value x^(0) ∈ Rᵈ,
compute iterates x^(k) ∈ Rᵈ, k = 1, 2, . . . as follows:

    J = ∇f(x^(k)),
    d^(k) = −J⁻¹ f(x^(k)),                                                   (4.2)
    x^(k+1) = x^(k) + d^(k).

We denote by the term quasi-Newton method any modification of this scheme
employing an approximation J̃ of the Jacobian J.
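For illustration (the function below is my own sketch, not part of the notes), the scheme (4.2) in Python:

```python
import numpy as np

def newton(f, jac, x0, tol=1e-12, maxit=50):
    """Plain Newton iteration (4.2); jac(x) returns the Jacobian of f at x."""
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    for _ in range(maxit):
        fx = np.atleast_1d(f(x))
        if np.linalg.norm(fx) < tol:
            break
        # d = -J^{-1} f(x): solve the linear system instead of inverting J
        x = x + np.linalg.solve(np.atleast_2d(jac(x)), -fx)
    return x

# root of x^2 - 2, started at x = 1: quadratic convergence to sqrt(2)
root = newton(lambda x: x**2 - 2, lambda x: 2 * x, 1.0)
assert abs(root[0] - np.sqrt(2)) < 1e-12
```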

4.1.4 Theorem: Let U ⊂ Rᵈ and let f : U → Rᵈ be differentiable with

    ‖∇f(x) − ∇f(y)‖ ≤ L‖x − y‖,        for all x, y ∈ U.                     (4.3)

If there exists an x* ∈ U such that f(x*) = 0 and

    ‖(∇f(x*))⁻¹‖ ≤ M,                                                        (4.4)

then there exists a radius 0 < R ≤ 1/(2LM) such that for all x^(0) ∈ {x ∈ U :
‖x* − x^(0)‖ ≤ R}, we have x^(k) → x* with order p = 2.

Remark 4.1.5. The proof of this theorem can be found in the lecture notes for Numerik 0.
There are also versions that do not require the existence of the root a priori, such as the
Newton-Kantorovich Theorem [Ran17a, Satz 5.5], but we will only discuss some of the
main assumptions and features.
The Lipschitz condition on ∇f can be seen as the deviation of f from being linear. Indeed,
if f were linear, then L = 0 and, provided ∇f is invertible, the method converges in a
single step for any initial value.
The larger the constant M , the smaller one of the eigenvalues of the Jacobian J. Therefore,
the function becomes flat in that direction and the root finding problem becomes unstable.
Most importantly, for an arbitrary initial guess, the method may fail to converge entirely,
but close enough to the solution the convergence is very fast, much faster than the fixpoint
iteration above.

4.2 Descent methods

Nonlinear root finding of a vector-valued function f : Rᵈ → Rᵈ – as required in implicit
timestepping schemes – is closely related to optimisation of scalar functions F : Rᵈ → R,
and the following problem is equivalent to (4.1) whenever f = ∇F :

    x = arg min_{y∈Rᵈ} F(y),        for F : Rᵈ → R.                          (4.5)

While we assume for most of this discussion that F is known, we will see at the end that
the Newton method with line search does not require it.

Obviously, by choosing f = ∇F , Newton's method also solves the optimisation problem
(4.5). An alternative family of methods for (4.5) are the following:

4.2.1 Definition: A descent method is an iterative method for finding minimizers
of the functional F : Rᵈ → R that, starting from an initial guess x^(0) ∈ Rᵈ, computes
iterates x^(k), k = 1, 2, . . ., by the following steps:

    1. If ∇F(x^(k)) ≠ 0, choose a descent direction

           s^(k) ∈ Rᵈ such that |s^(k)| = 1 and ⟨∇F(x^(k)), s^(k)⟩ < 0       (4.6)

       and a positive parameter α^(k) > 0; otherwise terminate.

    2. Update: x^(k+1) = x^(k) + α^(k) s^(k).

4.2.2 Lemma: Let F : Rᵈ → R be continuously differentiable. For a given point
x, assume ∇F(x) ≠ 0. Then, there is a constant ϑ > 0 such that for any descent
direction s satisfying (4.6) and for any stepsize 0 ≤ α ≤ ϑ there holds

    F(x + αs) ≤ F(x) − (ϑα/2) |∇F(x)|.                                       (4.7)

In particular, a positive scaling factor α for the descent method, and thus a strict
decrease in the function value, can always be found.

Proof. Skipped.

The most prominent member of this family of methods is the following.

4.2.3 Definition: The gradient method for finding minimizers of F(x) reads:
given an initial value x^(0) ∈ Rᵈ, compute iterates x^(k), k = 1, 2, . . . by the rule

    d^(k) = −∇F(x^(k)),
    α^(k) = argmin_{γ>0} F(x^(k) + γd^(k)),                                  (4.8)
    x^(k+1) = x^(k) + α^(k) d^(k).

It is also called the method of steepest descent. The minimization process used
to compute α^(k), also called line search, is one-dimensional and therefore simple. It
is sufficient only to find an approximate minimum α̃^(k).

4.2.4 Theorem: Let F : Rᵈ → R be continuously differentiable and let x^(0) ∈ Rᵈ
be chosen such that the set

    K = { x ∈ Rᵈ : F(x) ≤ F(x^(0)) }

is compact. Then, each sequence defined by the gradient method has at least one
accumulation point and each accumulation point is a stationary point of F(x).

Proof. First, we observe that in any point x(k) with ∇F (x(k) ) 6= 0, it follows from lemma 4.2.2
that there exists γ > 0 such that

F (x(k) ) > F (x(k) + γd(k) ).

We conclude that for such x^(k), the line search obtains a positive value of α^(k). Thus, the
sequence of the gradient iteration is monotonically decreasing and stays within the set K.
Since K was assumed to be compact the sequence x(k) has at least one accumulation point
x∗ . However, the preceding discussion implies that ∇F (x∗ ) = 0.

Remark 4.2.5. However, we can also choose d^(k) = −B^(k)∇F(x^(k)) in (4.8), for any
positive definite matrix B^(k), leading to generalised steepest descent methods that
minimise the descent direction in (4.6) in the weighted inner product (x, y)_{B^(k)} = xᵀB^(k)y
instead of the Euclidean inner product (x, y) = xᵀy.
In particular, if the Hessian D²F(x^(k)) is positive definite we can choose B^(k) = (D²F(x^(k)))⁻¹
and α^(k) = 1, which recovers the Newton method. This link is derived in a different way
in the next section.

4.3 Globalization of Newton

4.3.1. The convergence of the Newton method is only local, and the closer to the solution
we start, the faster it converges. Thus, finding good initial guesses is an important task.
A reasonable initial guess for finding y_1 in a one-step method seems to be y_0, but on
closer inspection, this is true only if the time step is small. The convergence requirements
of Newton's method would impose a new time step restriction, which we want to avoid.
Therefore, we present methods which guarantee global convergence while still converging
locally quadratically.
As a rule, Newton’s method should never be used without some globalization strategy!

4.3.2 Lemma: Under the assumptions of theorem 4.1.4, Newton’s Method applied
to the root finding problem f (x) = 0 for f : Rd → Rd is a descent method applied
to the functional F (x) = |f (x)|2 .

Proof. The (multivariate) product rule gives

    ∇F(x) = 2f(x)ᵀ∇f(x).

The search direction of the Newton method applied to f(x) is

    d^(k) = −(∇f(x^(k)))⁻¹ f(x^(k)).

Now assume that f(x^(k)) ≠ 0. Then, choosing s^(k) := d^(k)/|d^(k)| (and omitting the
arguments x^(k)), we have

    ⟨∇F, s^(k)⟩ = − (2fᵀ∇f (∇f)⁻¹f) / |(∇f)⁻¹f| = − 2|f|² / |(∇f)⁻¹f|
                ≤ − 2|f|² / (‖(∇f)⁻¹‖ ‖f‖) = − 2|f| / ‖(∇f)⁻¹‖ < 0,

and thus s^(k) is a descent direction that satisfies (4.6). Here, we used the fact that for x^(k)
sufficiently close to x* we have ‖(∇f(x^(k)))⁻¹‖ < ∞ (Perturbation Theorem, Numerik 0).
Finally, choosing α^(k) := |d^(k)|, we have established the equivalence.

4.3.3 Definition: The Newton method with line search for finding the root
of the nonlinear equation f(x) = 0 reads: given an initial value x^(0) ∈ Rᵈ, compute
iterates x^(k) ∈ Rᵈ, k = 1, 2, . . . by the rule

    J = ∇f(x^(k)),
    d^(k) = −J⁻¹ f(x^(k)),
    α^(k) = argmin_{γ>0} |f(x^(k) + γd^(k))|²,                               (4.9)
    x^(k+1) = x^(k) + α^(k) d^(k).

4.3.4 Definition: A practically most often used variant is the Newton method
with step size control (backtracking line search): given an initial value x^(0) ∈
Rᵈ, compute iterates x^(k) ∈ Rᵈ, k = 1, 2, . . . by the rule

    J = ∇f(x^(k)),
    d^(k) = −J⁻¹ f(x^(k)),                                                   (4.10)
    x^(k+1) = x^(k) + 2⁻ʲ d^(k).

Here, j ≥ 0 is the smallest integer such that

    |f(x^(k) + 2⁻ʲ d^(k))| < |f(x^(k))| .                                    (4.11)

Remark 4.3.5. The step size control algorithm can be implemented with very low over-
head. In fact, in each Newton step we only have to monitor the norm of the residual
|f(x^(k) + d^(k))|, which is typically needed for the stopping criterion anyway. If the residual
grows, i.e. |f(x^(k) + d^(k))| ≥ |f(x^(k))|, we halve the step size, recompute the residual
norm and check again. A modification to the plain Newton method and additional work
are only needed when the original method was likely to fail anyway.
Under certain assumptions on f it can be shown [NW06] that this backtracking line search
algorithm terminates after a finite (typically very small) number of steps. Also, the step
size control typically only triggers within the first few steps, then the quadratic convergence
of the Newton method starts.
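The scheme (4.10)-(4.11) can be sketched as follows (an illustration with my own function names, not part of the notes). The classic example f(x) = arctan(x) shows the effect: plain Newton diverges from x^(0) = 10, while the damped iteration converges globally.

```python
import numpy as np

def newton_backtracking(f, jac, x0, tol=1e-12, maxit=50, jmax=30):
    """Newton with step size control (4.10)-(4.11): halve the step 2^{-j} d
    until the residual norm decreases."""
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    for _ in range(maxit):
        fx = np.atleast_1d(f(x))
        if np.linalg.norm(fx) < tol:
            break
        d = np.linalg.solve(np.atleast_2d(jac(x)), -fx)
        alpha = 1.0
        for _ in range(jmax):
            if np.linalg.norm(f(x + alpha * d)) < np.linalg.norm(fx):
                break
            alpha /= 2                      # backtrack: try 2^{-j} d
        x = x + alpha * d
    return x

# f(x) = arctan(x): plain Newton diverges from x0 = 10, this method converges
root = newton_backtracking(np.arctan, lambda x: 1 / (1 + x**2), 10.0)
assert abs(root[0]) < 1e-12
```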

4.4 Practical considerations – quasi-Newton methods

4.4.1. Quadratic convergence is an asymptotic statement, which for any practical purpose
can be replaced by “fast” convergence. Most of the effort spent in a single Newton step
consists of setting up the Jacobian J and solving the linear system in the second line
of (4.2). Therefore, we will consider techniques here, which avoid some of this work. We
will have to consider two cases

1. Small systems with d ≲ 1000. For such systems, a direct method like LU- or QR-
decomposition is advisable in order to solve the linear system. To this end, we
compute the whole Jacobian and compute its decomposition, an effort of order d³
operations. Compared to d² operations for applying the inverse and order d for all
other tasks, this must be avoided as much as possible.

2. Large systems, where the Jacobian is typically sparse (most of its entries are zero).
For such a system, factorising the matrix at a cost of order d3 is typically not afford-
able. Therefore, the linear problem is solved by an iterative method and we avoid
the computation of the Jacobian when possible.

Remark 4.4.2. In order to save numerical effort constructing and inverting Jacobians,
the following strategies have been successful.

• Fix a threshold 0 < η < 1 which will be used as a bound for error reduction. In each
  Newton step, first compute the update using the Jacobian J_{k−1} of the previous
  step. This yields the modified method

      x̂ = x^(k) − J_{k−1}⁻¹ f(x^(k)),
      If |f(x̂)| ≤ η|f(x^(k))|:   J_k = J_{k−1},     x^(k+1) = x̂,            (4.12)
      Else:                       J_k = ∇f(x^(k)),   x^(k+1) = x^(k) − J_k⁻¹ f(x^(k)).

Thus, an old Jacobian and its inverse are used until convergence rates deteriorate.
This method is a quasi-Newton method which will not converge quadratically. How-
ever, we can obtain linear convergence at any rate η.

• If Newton’s method is used within a time stepping scheme, the Jacobian of the last
Newton step in the previous time step is often a good approximation for the Jacobian
of the first Newton step in the new time step. This holds in particular for small time
steps and constant extrapolation. Therefore, the previous method should also be
extended over the bounds of time steps.

• An improvement of the method above can be achieved by so-called low rank updates,
  e.g. for the rank-1 update: Let J_0 = ∇f(x^(0)) or J_0 = I. Then, at the kth step,
  given x^(k) and x^(k−1), compute

      p = x^(k) − x^(k−1),
      q = f(x^(k)) − f(x^(k−1)),                                             (4.13)
      J_k = J_{k−1} + (1/|p|²)(q − J_{k−1} p) pᵀ.

The fact that the rank of Jk − Jk−1 is at most one can be used to avoid computing
and storing matrices at all. The inverse of such a matrix can be computed via the
Sherman-Morrison formula. The practically most efficient and used methods use
rank-2 updates, such as the Broyden methods [NW06].
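As an illustrative sketch (the function name is mine), the rank-1 update (4.13) yields Broyden's method; in one dimension it reduces to the secant method, which is what the test below exploits.

```python
import numpy as np

def broyden(f, x0, J0, tol=1e-10, maxit=100):
    """Quasi-Newton iteration with the rank-1 update (4.13) of the Jacobian."""
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    J = np.atleast_2d(np.asarray(J0, dtype=float))
    fx = np.atleast_1d(f(x))
    for _ in range(maxit):
        if np.linalg.norm(fx) < tol:
            break
        x_new = x + np.linalg.solve(J, -fx)
        f_new = np.atleast_1d(f(x_new))
        p, q = x_new - x, f_new - fx
        J = J + np.outer(q - J @ p, p) / (p @ p)   # rank-1 update (4.13)
        x, fx = x_new, f_new
    return x

# in 1d this is the secant method: x^2 - 2 with J_0 = f'(x^(0)) = 2
root = broyden(lambda x: x**2 - 2, 1.0, 2.0)
assert abs(root[0] - np.sqrt(2)) < 1e-8
```

In practice one would not store J at all, but update its inverse via the Sherman-Morrison formula, as mentioned above.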

Remark 4.4.3. For problems leading to large, sparse Jacobians, typically space discretiza-
tions of partial differential equations, computing inverses or LU-decompositions is infeasible.
These matrices typically only feature a few nonzero elements per row, while the inverse

and the LU-decomposition are fully populated, thus increasing the amount of memory from
d to d².

Linear systems like this are often solved by iterative methods, leading for instance to so
called Newton-Krylov methods. Iterative methods approximate the solution of a linear
system

Jd = f

only using multiplications of a vector with the matrix J. On the other hand, for any vector
v ∈ Rᵈ, the term Jv is the directional derivative of f in direction v. Thus, it can be
approximated easily by

    Jv ≈ ( f(x^(k) + εv) − f(x^(k)) ) / ε.

The term f(x^(k)) must be calculated anyway, as it is the current Newton residual. Thus,
each step of the iterative linear solver requires one evaluation of the nonlinear function,
and no derivatives are computed.

The efficiency of such a method depends on the number of linear iteration steps which is
determined by two factors: the gain in accuracy and the contraction speed. It turns out
that typically gaining two digits in accuracy is sufficient to ensure fast convergence of the
Newton iteration. The contraction number is a more difficult issue and typically requires
preconditioning, which is problem-dependent and as such must be discussed when needed.
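The matrix-free directional derivative above can be sketched in a few lines (an illustration, not from the notes; in a Newton-Krylov solver f(x^(k)) would be reused rather than recomputed):

```python
import numpy as np

def jacvec(f, x, v, eps=1e-7):
    """Matrix-free approximation of the Jacobian-vector product J(x) v by a
    forward difference; only evaluations of f are needed (the basis of
    Newton-Krylov methods)."""
    return (f(x + eps * v) - f(x)) / eps

# check against the exact Jacobian of f(x, y) = (x*y, x + y^2) at (1, 2)
f = lambda u: np.array([u[0] * u[1], u[0] + u[1]**2])
x = np.array([1.0, 2.0])
v = np.array([0.3, -0.5])
J = np.array([[2.0, 1.0],    # df1/dx = y, df1/dy = x
              [1.0, 4.0]])   # df2/dx = 1, df2/dy = 2y
assert np.allclose(jacvec(f, x, v), J @ v, atol=1e-6)
```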

Chapter 5

Linear Multistep Methods

Instead of using only the value at the beginning of the current time interval to advance
to the next time step, possibly with the help of intermediate steps, we can also use the
values from several previous time steps. Intuitively, this could be more efficient, since
function values at these points have been computed already.

Such methods, which use values of several time steps in the past in order to achieve a
higher order, are called multistep methods. We will begin this chapter by introducing
some of the common formulae, before studying their stability and convergence properties.

5.1 Examples of LMMs

Basically, there are two construction principles for the multistep methods: Quadrature and
numerical differentiation.

Example 5.1.1 (Adams-Moulton formulae). Here, the integral from point t_{k−1} to point
t_k is approximated by an interpolatory quadrature rule based on the points t_{k−s} to t_k, i.e.,

    y_k = y_{k−1} + Σ_{r=0}^{s} f_{k−r} ∫_{t_{k−1}}^{t_k} L_r(t) dt,         (5.1)

where f_j denotes the function value f(t_j, y_j) and L_r(t), r = 0, . . . , s, the Lagrange inter-
polation polynomials associated with the points t_{k−r}, r = 0, . . . , s.

Since the integral involves the function evaluated at the time step that is being computed,
these methods are implicit. Here are the first four in this family:

    y_k = y_{k−1} + h f_k                                            (implicit Euler)
    y_k = y_{k−1} + (h/2) (f_k + f_{k−1})                            (trapezoidal rule)
    y_k = y_{k−1} + (h/12) (5f_k + 8f_{k−1} − f_{k−2})
    y_k = y_{k−1} + (h/24) (9f_k + 19f_{k−1} − 5f_{k−2} + f_{k−3})

Example 5.1.2 (Adams-Bashforth formulae). With the same principle we obtain explicit
methods by omitting the point in time t_k in the definition of the interpolation polynomial.
This yields quadrature formulae of the form

    y_k = y_{k−1} + Σ_{r=1}^{s} f_{k−r} ∫_{t_{k−1}}^{t_k} L_r(t) dt.         (5.2)

Again, we list the first few:

    y_k = y_{k−1} + h f_{k−1}                                        (explicit Euler)
    y_k = y_{k−1} + (h/2) (3f_{k−1} − f_{k−2})
    y_k = y_{k−1} + (h/12) (23f_{k−1} − 16f_{k−2} + 5f_{k−3})
    y_k = y_{k−1} + (h/24) (55f_{k−1} − 59f_{k−2} + 37f_{k−3} − 9f_{k−4})
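The quadrature weights in (5.1)/(5.2) can be generated programmatically. The following sketch (helper name is mine) integrates the Lagrange polynomials for equidistant nodes, taking h = 1 and t_k = 0 so that t_{k−r} = −r, and reproduces the Adams-Bashforth coefficients listed above.

```python
import numpy as np
from numpy.polynomial import Polynomial

def adams_bashforth_weights(s):
    """Weights of f_{k-1}, ..., f_{k-s} in (5.2) for h = 1: integrate the
    Lagrange polynomials for the nodes t_{k-r} = -r over [t_{k-1}, t_k] = [-1, 0]."""
    nodes = [-r for r in range(1, s + 1)]
    w = np.zeros(s)
    for j, cj in enumerate(nodes):
        L = Polynomial([1.0])
        for k, ck in enumerate(nodes):
            if k != j:
                L *= Polynomial([-ck, 1.0]) / (cj - ck)
        Li = L.integ()
        w[j] = Li(0.0) - Li(-1.0)
    return w

# reproduces the 2- and 3-step formulas listed above
assert np.allclose(adams_bashforth_weights(2), [3/2, -1/2])
assert np.allclose(adams_bashforth_weights(3), [23/12, -16/12, 5/12])
```

Including the node 0 in the list instead yields the Adams-Moulton weights of (5.1) in the same way.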

Example 5.1.3 (BDF methods). Backward differencing formulas (BDF) are also based
on Lagrange interpolation at the points tk−s to tk . However, in contrast to Adams formu-
lae they do not use quadrature for the right hand side, but rather the derivative of the
interpolation polynomial in the point tk for the left hand side.

Using the Lagrange interpolation polynomials L_{k−r}(t), we let

    y(t) = Σ_{r=0}^{s} y_{k−r} L_{k−r}(t),

where y_k is yet to be determined. Now we assume that y(t) satisfies the ODE at t_k. Thus,

    Σ_{r=0}^{s} y_{k−r} L'_{k−r}(t_k) = y'(t_k) = f(t_k, y_k),

leading to the following schemes:

    y_k − y_{k−1} = h f_k                                            (implicit Euler)
    y_k − (4/3) y_{k−1} + (1/3) y_{k−2} = (2/3) h f_k
    y_k − (18/11) y_{k−1} + (9/11) y_{k−2} − (2/11) y_{k−3} = (6/11) h f_k
    y_k − (48/25) y_{k−1} + (36/25) y_{k−2} − (16/25) y_{k−3} + (3/25) y_{k−4} = (12/25) h f_k

For an example on how to derive these schemes see the appendix.
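The derivation above can also be carried out numerically; this sketch (my own helper, assuming h = 1 and t_k = 0) differentiates the Lagrange basis at t_k and normalizes so that the coefficient of y_k is one, reproducing the BDF2 scheme from the list.

```python
import numpy as np
from numpy.polynomial import Polynomial

def bdf_coefficients(s):
    """BDF-s via the derivative of the interpolation polynomial at t_k (h = 1):
    returns (alpha_0, ..., alpha_s) for (y_k, ..., y_{k-s}) and beta, normalized
    so that sum_r alpha_r y_{k-r} = beta * h * f_k."""
    nodes = [-r for r in range(s + 1)]          # t_k = 0, t_{k-1} = -1, ...
    dL = np.zeros(s + 1)
    for j, cj in enumerate(nodes):
        L = Polynomial([1.0])
        for k, ck in enumerate(nodes):
            if k != j:
                L *= Polynomial([-ck, 1.0]) / (cj - ck)
        dL[j] = L.deriv()(0.0)                  # L_j'(t_k)
    return dL / dL[0], 1.0 / dL[0]

# BDF2 from the list above: y_k - 4/3 y_{k-1} + 1/3 y_{k-2} = 2/3 h f_k
alpha, beta = bdf_coefficients(2)
assert np.allclose(alpha, [1.0, -4/3, 1/3]) and np.isclose(beta, 2/3)
```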

Remark 5.1.4. Recall from Numerik 0 (or any other introductory course in numerical
analysis) that numerical differentiation and extrapolation of interpolation polynomials (i.e.
the evaluation outside the interval which is spanned through the interpolation points) are
both unstable numerically. Therefore, we expect stability problems for all these methods.

Secondly, recall that Lagrange interpolation with equidistant support points is unstable for
higher degree polynomials. Therefore, we also expect all of the above methods to perform
well only at moderate order.

5.2 General definition and consistency of LMMs

5.2.1 Definition: A linear multistep method (LMM) with s steps is a method
of the form

    Σ_{r=0}^{s} α_{s−r} y_{k−r} = h Σ_{r=0}^{s} β_{s−r} f_{k−r},             (5.3)

where f_k = f(t_k, y_k) and t_k = t_0 + hk, and where we assume |α_0| + |β_0| ≠ 0 and
α_s = 1. There are explicit (β_s = 0) and implicit (β_s ≠ 0) methods.
It is convenient to define the generating polynomials

    ρ(x) = Σ_{r=0}^{s} α_{s−r} x^{s−r} = Σ_{j=0}^{s} α_j x^j ,
    σ(x) = Σ_{r=0}^{s} β_{s−r} x^{s−r} = Σ_{j=0}^{s} β_j x^j                 (5.4)

for each of these methods.

Remark 5.2.2. The LMM was defined for constant step size h. In principle, it is possible
to implement the method with a variable step size, but we restrict ourselves to the constant
case. Notes on step size control can be found later in this chapter.

5.2.3 Definition: As for one-step methods, we use the abbreviation u_k := u(t_k),
where u(t) denotes the exact solution of u' = f(t, u), u(t_0) = u_0.
The local error of a linear multistep method (LMM) at the kth timestep is again
defined by

    u_k − y_k ,

where y_k is the numerical solution obtained from (5.3) using the exact initial values
y_{k−r} = u_{k−r} for r = 1, ..., s.
The truncation error of an LMM, on the other hand, is defined as

    τ_k(u) := h⁻¹ (L_h u)(t_k),                                              (5.5)

using the linear difference operator

    (L_h u)(t_k) := Σ_{r=0}^{s} [ α_{s−r} u_{k−r} − h β_{s−r} f(t_{k−r}, u_{k−r}) ].   (5.6)

Lemma 5.2.4. For h sufficiently small, the two local errors satisfy the following relation

    u_k − y_k = (I − hβ_s Df_k)⁻¹ (L_h u)(t_k),                              (5.7)

where

    Df_k := ∫_0^1 Df(t_k, u_k + ϑ(y_k − u_k)) dϑ

and Df(t, y) is the Jacobian of f with respect to the second argument.

Proof. Since we assumed y_{k−r} = u_{k−r}, for r = 1, ..., s, in the definition of the local
error and α_s = 1, (5.3) is equivalent to

    0 = y_k − hβ_s f(t_k, y_k) + Σ_{r=1}^{s} α_{s−r} u_{k−r} − h Σ_{r=1}^{s} β_{s−r} f(t_{k−r}, u_{k−r}).

Subtracting this from (5.6), we obtain

    (L_h u)(t_k) = (u_k − y_k) − hβ_s ( f(t_k, u_k) − f(t_k, y_k) ).

Finally, the result follows by applying the Integral Mean Value Theorem (see, e.g., [Nu-
merik 0, Hilfssatz 5.8]) and the fact that for h sufficiently small, I − hβ_s Df_k is invertible.

Remark 5.2.5. Note that it follows from lemma 5.2.4 that

    u_k − y_k = (h + O(h²)) τ_k(u)

and that the higher-order term is exactly zero for explicit LMMs.

5.2.6 Definition: An LMM is consistent of order p, if for all sufficiently regular


functions f and for all relevant k there holds

τk (u) = O(hp ), (5.8)

or equivalently, that the local error is O(hp+1 ).

5.2.7 Theorem: An LMM with constant step size h is consistent of order p if and
only if

    Σ_{r=0}^{s} α_{s−r} = 0   and   Σ_{r=0}^{s} ( α_{s−r} r^q + q β_{s−r} r^{q−1} ) = 0,   q = 1, . . . , p.   (5.9)

Proof. We start with the Taylor expansion of the ODE solution u around t_k:

    u(t) = Σ_{q=0}^{p} (u^(q)(t_k)/q!) (t − t_k)^q + R_u(t),
           R_u(t) := (u^(p+1)(ξ)/(p + 1)!) (t − t_k)^{p+1},

where ξ is a point between t and t_k that depends on t. It follows from f(t, u) = u' that
the corresponding right hand side can be expanded as

    f(t, u(t)) = Σ_{q=1}^{p} (u^(q)(t_k)/(q − 1)!) (t − t_k)^{q−1} + R_f(t),
                 R_f(t) := (u^(p+1)(η)/p!) (t − t_k)^p,

with η again a point between t and t_k that depends on t.

Substituting the two expansions into (5.6) we get:

$$L_h u(t_k) = \sum_{r=0}^{s} \alpha_{s-r} \left[ \sum_{q=0}^{p} \frac{u^{(q)}(t_k)}{q!} (-rh)^q + R_u(t_{k-r}) \right] - \beta_{s-r} h \left[ \sum_{q=1}^{p} \frac{u^{(q)}(t_k)}{(q-1)!} (-rh)^{q-1} + R_f(t_{k-r}) \right]$$
$$= u(t_k) \sum_{r=0}^{s} \alpha_{s-r} + \sum_{q=1}^{p} \frac{u^{(q)}(t_k)}{q!} (-1)^q \left( \sum_{r=0}^{s} \alpha_{s-r} r^q + q \beta_{s-r} r^{q-1} \right) h^q + C h^{p+1},$$

where

$$C := \sum_{r=0}^{s} \frac{(-1)^{p+1} r^p}{(p+1)!} \left( \alpha_{s-r}\, r\, u^{(p+1)}(\xi_r) + (p+1)\, \beta_{s-r}\, u^{(p+1)}(\eta_r) \right)$$

and ξr, ηr ∈ [tk−r, tk], for r = 0, . . . , s, which in general may all be different.
Since the right hand side f was arbitrary, Lh u(tk) = O(h^{p+1}) if and only if the conditions
in (5.9) hold. In that case we have

$$|L_h u(t_k)| \le \frac{\|u^{(p+1)}\|_\infty}{(p+1)!} \left( \sum_{r=0}^{s} |\alpha_{s-r}|\, r^{p+1} + (p+1) |\beta_{s-r}|\, r^p \right) h^{p+1}.$$
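The order conditions (5.9) are easy to check programmatically. The following Python sketch (an illustration, not part of the original notes; all names are my own) verifies them for the 2-step Adams–Bashforth method with coefficients α = (α0, α1, α2) = (0, −1, 1) and β = (−1/2, 3/2, 0) in the indexing of (5.3):

```python
def order_condition(alpha, beta, q):
    """Evaluate the q-th consistency condition from (5.9).

    alpha, beta are lists (alpha_0, ..., alpha_s); the sums run over
    r = 0, ..., s with coefficients alpha_{s-r} and beta_{s-r}.
    """
    s = len(alpha) - 1
    if q == 0:
        return sum(alpha)
    # note: float(0)**0 == 1.0 in Python, matching the convention r^0 = 1
    return sum(alpha[s - r] * r**q + q * beta[s - r] * float(r)**(q - 1)
               for r in range(s + 1))

def consistency_order(alpha, beta, qmax=10):
    """Largest p such that the conditions (5.9) hold for q = 0, ..., p."""
    p = -1
    for q in range(qmax + 1):
        if abs(order_condition(alpha, beta, q)) > 1e-12:
            break
        p = q
    return p

# 2-step Adams-Bashforth: y_k = y_{k-1} + h (3/2 f_{k-1} - 1/2 f_{k-2})
alpha = [0.0, -1.0, 1.0]
beta = [-0.5, 1.5, 0.0]
print(consistency_order(alpha, beta))  # expected: 2
```

The same check applied to the trapezoidal rule (α = (−1, 1), β = (1/2, 1/2)) also yields 2, and to the explicit Euler method (α = (−1, 1), β = (1, 0)) yields 1, in agreement with the one-step theory.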

Remark 5.2.8. A consistent LMM is not necessarily convergent. To understand this and
to develop criteria for convergence we diverge into the theory of difference equations.

5.3 Properties of difference equations

5.3.1. The stability of LMM can be understood by employing the fairly old theory of
difference equations. In order to keep the presentation simple in this section, we use a
different notation for numbering indices in the equations. Nevertheless, the coefficients of
the characteristic polynomial are the same as for LMM.

5.3.2 Definition: An equation of the form

$$\sum_{r=0}^{s} \alpha_r y_{n+r} = 0 \qquad (5.10)$$

is called a homogeneous difference equation. Assume that αs α0 ≠ 0 (such that
(5.10) does not reduce to a lower order difference equation). A sequence (yn)n=0,...,∞
is a solution of the difference equation, if the equation holds true for all n ≥ s. The
values yn may be from any of the spaces R, C, Rd or Cd.
The generating polynomial of this difference equation is

$$\chi(x) = \sum_{r=0}^{s} \alpha_r x^r. \qquad (5.11)$$

5.3.3 Lemma: The solutions of equation (5.10) form a vector space of dimension s.

Proof. Since the equation (5.10) is linear and homogeneous, it is obvious that if two
solution sequences (y(1)) and (y(2)) satisfy the equation, then (αy(1) + y(2)) also satisfies
(5.10), for any α ∈ R (or C).
As soon as the initial values y0 to ys−1 are chosen, all other sequence members are uniquely
defined. Moreover, it holds that

y0 = y1 = · · · = ys−1 = 0 =⇒ yn = 0, n ≥ 0.

Therefore it is sufficient to consider the first s elements. Since those can be chosen arbi-
trarily, they span an s-dimensional vector space.

5.3.4 Lemma: For each root ξ of the generating polynomial χ(x) the sequence
yn = ξ n is a solution of the difference equation (5.10).

Proof. Inserting the solution yn = ξⁿ into the difference equation results in

$$\sum_{r=0}^{s} \alpha_r \xi^{n+r} = \xi^n \sum_{r=0}^{s} \alpha_r \xi^r = \xi^n \chi(\xi) = 0.$$

5.3.5 Theorem: Let {ξj }j=1,...,J be the roots of the generating polynomial χ with
multiplicity µj . Then, the sequences of the form

yn(j,k) = nk−1 ξjn j = 1, . . . , J; k = 1, . . . , µj (5.12)

form a basis of the solution space of the difference equation (5.10).

Proof. First we observe that the sum of the multiplicities of the roots has to result in the
degree of the polynomial:
J
X
s= µj .
j=1

Moreover, we know from Lemma 5.3.3 that s is the dimension of the solution space.
However, the sequences (y(j,k)) are also linearly independent. This is clear for sequences
of different index j. It is also clear for different roots, because for n → ∞ the exponential
function nullifies the influence of the polynomials.
It remains to show that the sequences (y(j,k)) are in fact solutions of the difference equa-
tions. For k = 1 we have proven this already in lemma 5.3.4. We prove the fact here for
k = 2 and for a double root ξj; the principle for higher order roots should be clear then.
Equation (5.10) applied to the sequence (nξⁿj) results in

$$\sum_{r=0}^{s} \alpha_r (n+r) \xi_j^{n+r} = n \xi_j^n \sum_{r=0}^{s} \alpha_r \xi_j^r + \xi_j^{n+1} \sum_{r=1}^{s} \alpha_r r \xi_j^{r-1} = n \xi_j^n \chi(\xi_j) + \xi_j^{n+1} \chi'(\xi_j) = 0.$$

Here the term with α0 vanishes, because it is multiplied with r = 0, and χ(ξj) = χ′(ξj) = 0
because ξj is a multiple root.

5.3.6 Corollary (Root test): All solutions (yn) of the difference equation (5.10)
are bounded for n → ∞ if and only if:

• all roots of the generating polynomial χ(x) lie in {z ∈ C : |z| ≤ 1} (closed
  unit circle) and
• all roots on the boundary of the unit circle are simple.

Proof. According to theorem 5.3.5 we can write all solutions as linear combinations of the
sequences (y (j,k) ) in equation (5.12). Therefore, for n → ∞,

1. all solutions with |ξi | < 1 converge to zero


2. all solutions with |ξi | > 1 diverge to infinity
3. all solutions with |ξi | = 1 stay bounded if and only if ξi is simple.

This proves the statement of the corollary.
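The root test is straightforward to automate with a numerical root finder. The following Python sketch (my own illustration, using numpy; function names are assumptions) classifies a difference equation by the roots of its generating polynomial:

```python
import numpy as np

def satisfies_root_test(coeffs, tol=1e-6):
    """Root test of corollary 5.3.6 for a generating polynomial.

    coeffs are (alpha_s, ..., alpha_0) in descending powers, as expected
    by numpy.roots. Returns True if all solution sequences stay bounded.
    """
    roots = np.roots(coeffs)
    for rt in roots:
        if abs(rt) > 1 + tol:
            return False                      # root outside the closed unit disk
        if abs(abs(rt) - 1) <= tol:
            # a root on the unit circle must be simple; multiple roots show up
            # as numerically clustered values
            if np.count_nonzero(np.abs(roots - rt) <= tol) > 1:
                return False
    return True

print(satisfies_root_test([1, 0, -1]))   # chi(x) = x^2 - 1: simple roots +1, -1
print(satisfies_root_test([1, -2, 1]))   # chi(x) = (x-1)^2: double root on the circle
```

The tolerance is needed because `numpy.roots` returns a numerically split cluster for a multiple root; the design choice here is to treat any cluster of roots near the unit circle as a multiple root, which is the unstable case.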

5.4 Stability and convergence

In contrast to one-step methods, the Lipschitz condition (1.23) for the RHS f of the
differential equation is not sufficient to ensure that consistency of a multistep method
implies convergence. As for A-stability, stability conditions will again be deduced by
means of a simple model problem.

Remark 5.4.1. In the following we investigate the solution at a fixed point in time t with
a shrinking step size h. Therefore we choose n steps of step size h = t/n and let n tend
to infinity.

5.4.2 Definition: An LMM is zero-stable (or simply stable) if, applied to the
trivial ODE

u0 = 0 (5.13)

with arbitrary initial values y0 = u0 to ys−1 = us−1 , it generates solutions yk which


stay bounded at each point in time t > 0, if the step size h converges to zero.

5.4.3 Theorem: An LMM is zero-stable if and only if all roots of the first generating
polynomial ϱ(x) of equation (5.4) satisfy the root test in corollary 5.3.6.

Proof. The application of the LMM to the ODE (5.13) results in the difference equation

$$\sum_{r=0}^{s} \alpha_{s-r} y_{k-r} = \sum_{r'=0}^{s} \alpha_{r'} y_{n+r'} = 0 \qquad (5.14)$$

with n = k − s. Thus, the generating polynomial ϱ(x) is equivalent to the generating
polynomial χ(x) of the difference equation in (5.10), which is independent of h.
If αs α0 ≠ 0, the result then follows directly from corollary 5.3.6. Otherwise, (5.14) reduces
to a difference equation with generating polynomial ϱm(x) of order s − m, for some
1 ≤ m ≤ s − 1, and ϱ(x) = xᵐ ϱm(x). Thus, ϱ satisfying the root test is equivalent to ϱm
satisfying the root test and the result follows again from corollary 5.3.6.

5.4.4 Corollary: Adams-Bashforth and Adams-Moulton methods are zero-stable.

Proof. For all of these methods the first generating polynomial is ϱ(x) = xˢ − xˢ⁻¹. It has
the simple root ξ1 = 1 and the (s−1)-fold root 0.

5.4.5 Theorem: The BDF methods are zero-stable for s ≤ 6 and not zero-stable
for s ≥ 7.

Proof. See [HNW09, Theorem 3.4].
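This can be reproduced numerically. The Python sketch below (an illustration; not from the notes) builds the first generating polynomial of the s-step BDF method from the standard representation ϱ(w) = Σ_{j=1}^{s} (1/j) w^{s−j} (w−1)^j found in Hairer/Wanner — an assumption here; the normalization differs from (5.3) but does not affect the roots — and applies the root test:

```python
import numpy as np

def bdf_rho(s):
    """Coefficients (descending powers) of rho(w) = sum_{j=1}^s w^(s-j) (w-1)^j / j."""
    rho = np.zeros(s + 1)
    for j in range(1, s + 1):
        term = np.array([1.0])
        for _ in range(j):
            term = np.convolve(term, [1.0, -1.0])        # multiply by (w - 1)
        term = np.concatenate([term, np.zeros(s - j)])   # multiply by w^(s-j)
        rho += term / j
    return rho

def zero_stable(rho, tol=1e-6):
    """Root test: all roots in the closed unit disk, roots on the circle simple."""
    roots = np.roots(rho)
    for rt in roots:
        if abs(rt) > 1 + tol:
            return False
        if abs(abs(rt) - 1) <= tol and np.count_nonzero(np.abs(roots - rt) <= tol) > 1:
            return False
    return True

for s in range(1, 8):
    print(s, zero_stable(bdf_rho(s)))   # True for s = 1, ..., 6; False for s = 7
```

For s = 7 one of the extraneous roots leaves the unit disk, which is exactly the failure of zero-stability stated in the theorem.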

5.4.6 Definition: An LMM is convergent of order p, if for any IVP with sufficiently
smooth right hand side f there exists a constant h0 > 0 such that, for all h ≤ h0,

$$|u(t_n) - y(t_n)| \le c h^p, \qquad (5.15)$$

whenever the initial values satisfy

$$|u(t_i) - y(t_i)| \le c_0 h^p, \qquad i = 0, \dots, s-1. \qquad (5.16)$$

Here, u is the continuous solution of the IVP and y is the LMM approximation.

To prove convergence, we will for simplicity only consider the scalar case d = 1. The case
d > 1 can be proved similarly.

5.4.7 Lemma: Let d = 1. Every LMM can be recast as a one-step method

$$Y_k = A Y_{k-1} + h F_h(t_{k-1}, Y_{k-1}), \qquad (5.17)$$

where

$$Y_k = \begin{pmatrix} y_k \\ \vdots \\ y_{k-s+1} \end{pmatrix} \in \mathbb{R}^s, \qquad A = \begin{pmatrix} -\alpha_{s-1} & -\alpha_{s-2} & \cdots & -\alpha_0 \\ 1 & 0 & \cdots & 0 \\ & \ddots & \ddots & \vdots \\ & & 1 & 0 \end{pmatrix} \in \mathbb{R}^{s \times s}, \qquad (5.18)$$

and Fh(tk, Yk) = (ψk, 0, . . . , 0)ᵀ ∈ Rˢ with ψk implicitly defined as the solution of

$$\psi_k = \sum_{r=1}^{s} \beta_{s-r} f(t_{k-r}, y_{k-r}) + \beta_s f\Bigl(t_k,\; h\psi_k - \sum_{r=1}^{s} \alpha_{s-r} y_{k-r}\Bigr). \qquad (5.19)$$

Proof. From the general form of LMM we obtain

$$\sum_{r=0}^{s} \alpha_{s-r} y_{k-r} = h \sum_{r=1}^{s} \beta_{s-r} f(t_{k-r}, y_{k-r}) + h \beta_s f(t_k, y_k).$$

We rewrite this as

$$y_k = -\sum_{r=1}^{s} \alpha_{s-r} y_{k-r} + h \psi_k,$$

where this formula is also entered implicitly as the value for yk in the computation of
f(tk, yk). This is the first equation in (5.17). The remaining equations simply shift
the entries in the vector Yk−1, i.e. (Yk)i+1 = (Yk−1)i = yk−i, for i = 1, . . . , s − 1.

5.4.8 Lemma: Let d = 1 and let u(t) be the exact solution of the IVP. Suppose Ŷk
is the solution of a single step

$$\hat Y_k = A U_{k-1} + h F_h(t_{k-1}, U_{k-1}),$$

with correct initial values Uk−1 = (uk−1, uk−2, . . . , uk−s)ᵀ.
If the multistep method is consistent of order p and f is sufficiently smooth, then
there exist constants h0 > 0 and M such that for h ≤ h0 there holds

$$|U_k - \hat Y_k| \le M h^{p+1}. \qquad (5.20)$$

Proof. The first component of Uk − Ŷk is the local error uk − yk of step k, as defined in
definition 5.2.3, which is of order h^{p+1} by the assumption. The other components vanish
by the definition of the method.

5.4.9 Lemma: Assume that an LMM is zero-stable. Then, there exists a vector
norm ‖·‖ on Cˢ such that the induced operator norm of the matrix A satisfies

$$\|A\| \le 1. \qquad (5.21)$$

Proof. We notice that $\varrho(x) = \sum_{r=0}^{s} \alpha_r x^r$ is the characteristic polynomial of the matrix A.
By the root test we know that simple roots, which correspond to irreducible blocks of
dimension one, have maximal modulus one. Furthermore, every Jordan block of dimension
greater than one corresponds to a multiple root, which by assumption has modulus strictly
less than one. Let λi be such a multiple root with multiplicity µi. Such a block admits a
modified canonical form

$$J_i = \begin{pmatrix} \lambda_i & 1 - |\lambda_i| & & \\ & \lambda_i & \ddots & \\ & & \ddots & 1 - |\lambda_i| \\ & & & \lambda_i \end{pmatrix} \in \mathbb{C}^{\mu_i \times \mu_i}.$$

Thus, the canonical form J = T⁻¹AT has norm ‖J‖∞ ≤ 1. If we define the vector norm

$$\|x\| = \|T^{-1} x\|_\infty,$$

it follows that

$$\|Ax\| = \|T^{-1} A x\|_\infty = \|J T^{-1} x\|_\infty \le \|T^{-1} x\|_\infty = \|x\|.$$

5.4.10 Theorem: Let f be sufficiently smooth. If a linear multi-step method is


zero-stable and consistent of order p, then it is convergent of order p.

Proof. As already stated, we only prove the case d = 1 explicitly. See the original notes
by Guido Kanschat for the general proof. Since f was assumed to be sufficiently smooth,
Fh satisfies a uniform Lipschitz condition with Lipschitz constant Lh.

We reduce the proof to the convergence of the one-step method

$$Y_k = A Y_{k-1} + h F_h(t_{k-1}, Y_{k-1}) =: G(Y_{k-1}). \qquad (5.22)$$

Let Yk−1 and Zk−1 be two initial values for the interval Ik. By the previous lemma, we
have in the norm defined there, for sufficiently small h, that

$$\|G(Y_{k-1}) - G(Z_{k-1})\| \le (1 + hL_h) \|Y_{k-1} - Z_{k-1}\|. \qquad (5.23)$$

By lemma 5.4.8, the local error ηk = ‖Uk − Ŷk‖ at step k is bounded by M h^{p+1} (where
M also contains the equivalence constant γ between the Euclidean norm and the norm
defined in the previous lemma). Thus:

$$\begin{aligned}
\|U_1 - Y_1\| &\le (1 + hL_h)\|U_0 - Y_0\| + M h^{p+1} \\
\|U_2 - Y_2\| &\le (1 + hL_h)^2 \|U_0 - Y_0\| + M h^{p+1}\bigl[1 + (1 + hL_h)\bigr] \\
\|U_3 - Y_3\| &\le (1 + hL_h)^3 \|U_0 - Y_0\| + M h^{p+1}\bigl[1 + (1 + hL_h) + (1 + hL_h)^2\bigr] \\
&\;\;\vdots \\
\|U_n - Y_n\| &\le (1 + hL_h)^n \|U_0 - Y_0\| + M h^{p+1}\bigl[1 + (1 + hL_h) + \dots + (1 + hL_h)^{n-1}\bigr] \\
&\le e^{nhL_h} \|U_0 - Y_0\| + \frac{M h^p}{L_h}\bigl(e^{nhL_h} - 1\bigr) \le C h^p,
\end{aligned}$$

where we recall that Un = u(tn) and tn = t0 + T with T = nh, and where

$$C := c_0 \gamma\, e^{T L_h} + \frac{M}{L_h}\bigl(e^{T L_h} - 1\bigr)$$

with c0 as defined in (5.16).
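The predicted convergence order can also be observed numerically. The following Python sketch (my own illustration, not part of the notes) applies the 2-step Adams–Bashforth method to u′ = −u, u(0) = 1, starts from exact values, and checks that the error at T = 1 decays like h²:

```python
import math

def ab2(f, u0, u1, t0, h, n):
    """2-step Adams-Bashforth: y_k = y_{k-1} + h (3/2 f_{k-1} - 1/2 f_{k-2})."""
    y_prev, y = u0, u1            # y_0 and y_1
    f_prev = f(t0, y_prev)        # f at t_0
    for k in range(1, n):
        t = t0 + k * h
        f_cur = f(t, y)
        y_prev, y = y, y + h * (1.5 * f_cur - 0.5 * f_prev)
        f_prev = f_cur
    return y                      # approximation at t_0 + n h

f = lambda t, u: -u
errors = []
for n in (100, 200, 400):
    h = 1.0 / n
    # start values taken from the exact solution u(t) = exp(-t)
    y = ab2(f, 1.0, math.exp(-h), 0.0, h, n)
    errors.append(abs(y - math.exp(-1.0)))

rates = [errors[i] / errors[i + 1] for i in range(2)]
print(rates)   # both ratios close to 4, i.e. second-order convergence
```

Halving the step size reduces the error by roughly a factor of 4, consistent with p = 2; note that the start value y1 is supplied with the required accuracy (5.16) by using the exact solution.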

5.5 Starting procedures

5.5.1. In contrast to one-step methods, where the numerical solution is obtained solely
from the differential equation and the initial value, multistep methods require more than
one start value. An LMM with s steps requires s known start values yk−s , . . . , yk−1 . Mostly,
they are not provided by the IVP itself. Thus, general LMM decompose into two parts:

• a starting phase where the start values are computed in a suitable way and

• a run phase where the LMM is executed.

It is crucial that the starting procedure provides a suitable order corresponding to the
LMM of the run phase, recall condition (5.16) in definition 5.4.6. Possible choices for the
starting phase include multistep methods with variable order and one-step methods.

Example 5.5.2 (Self starter). A 2-step BDF method requires y0 and y1 to be known. y0
is given by the initial value while y1 is unknown so far. To guarantee that the method has
order 2, y1 needs to be at least locally of order 2, i.e.,

|u(t1 ) − y1 | ≤ c0 h2 . (5.24)

This is ensured, for example, by one step of the 1-step BDF method (implicit Euler).

However, starting an LMM with s > 2 steps by a first-order method and then successively
increasing the order until s is reached does not provide the desired global order. That is
due to the fact that the very first step, taken with a first-order method, is only locally of
order 2, limiting the overall convergence order to 2. Nevertheless, self starters are often
used in practice.

Example 5.5.3 (Runge-Kutta starter). One can use Runge-Kutta methods to start LMMs.
Since only a fixed number of starting steps are performed, the local order of the Runge-
Kutta approximation is crucial. For an implicit LMM with convergence order p and stepsize
h one could use an RK method with consistency order p − 1 with the same step size h.

Consider a 3-step BDF method. Thus, apart from y0 , we need start values y1 , y2 with
errors less than c0 h3 . They can be computed by RK methods of consistency order 2, for
example by two steps of the 1-stage Gauß collocation method with step size h since it has
consistency order 2s = 2, see theorem 3.4.15.

Remark 5.5.4. In practice it is not the order of a procedure that is crucial, but rather the
fact that the errors of all approximations (the start values and all approximations of the
run phase) are bounded by the user-given tolerance, compare Section 2.4. Generally, LMMs
are applied with variable step sizes and orders in practice (see e.g. Exercise 7.2).

Thus, the step sizes of all steps are in practice usually controlled using local error
estimates. Hence, self starting procedures usually start with very small step sizes and
increase them successively. Due to their higher order, RK starters can usually use moderate
step sizes from the beginning.

5.6 LMM and stiff problems

To study A-stability of LMMs we consider again the model equation u′ = λu. Applying a
general LMM (5.3) to this model equation leads to the linear model difference equation

$$\sum_{r=0}^{s} (\alpha_{s-r} - z\beta_{s-r})\, y_{k-r} = 0, \qquad (5.25)$$

with z = λh.

5.6.1 Definition (A-stability of LMM): The stability region of an LMM is the


set of points z ∈ C, for which all sequences (yn )∞
n=0 of solutions of the equation (5.25)
stay bounded for n → ∞. An LMM is called A-stable, if the stability region
contains the left half-plane of C.

Note that this definition is equivalent to the definition of A-stability for one-step methods
in definition 3.2.5.

5.6.2 Definition: The so-called stability polynomial of an LMM is obtained by
replacing λh in (5.25) by a general element z ∈ C and by inserting yn = xⁿ to obtain

$$R_z(x) = \sum_{r=0}^{s} (\alpha_{s-r} - z\beta_{s-r})\, x^{s-r}. \qquad (5.26)$$

Remark 5.6.3. Instead of the simple amplification function R(z) of the one-step methods,
we get here a function of two variables: the point z for which we want to show stability
and the artificial variable x from the analysis of the method.

5.6.4 Lemma: Let {ξ1(z), . . . , ξs(z)} be the set of roots of the stability polynomial
Rz(x) as functions of z. A point z ∈ C is in the stability region of an LMM, if these
roots satisfy the root test in corollary 5.3.6.

Proof. The proof is analogous to that of theorem 5.4.3.
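A point z can thus be tested for membership in the stability region by computing the roots of Rz. A small Python sketch (illustration only; names assumed), applied to the trapezoidal rule written as an LMM, yk − yk−1 = (h/2)(fk + fk−1), i.e. α = (−1, 1), β = (1/2, 1/2):

```python
import numpy as np

def in_stability_region(alpha, beta, z, tol=1e-9):
    """Root test for R_z(x) = sum_r (alpha_{s-r} - z beta_{s-r}) x^{s-r}.

    alpha, beta are given as (alpha_0, ..., alpha_s); substituting j = s - r
    shows R_z(x) = sum_j (alpha_j - z beta_j) x^j, so the coefficient list is
    simply reversed to descending powers for numpy.roots.
    """
    coeffs = (np.array(alpha) - z * np.array(beta))[::-1]
    roots = np.roots(coeffs)
    for rt in roots:
        if abs(rt) > 1 + tol:
            return False
        if abs(abs(rt) - 1) <= tol and np.count_nonzero(np.abs(roots - rt) <= 1e-6) > 1:
            return False
    return True

alpha, beta = [-1.0, 1.0], [0.5, 0.5]          # trapezoidal rule (Crank-Nicolson)
print(in_stability_region(alpha, beta, -1.0))  # True: single root (1+z/2)/(1-z/2) = 1/3
print(in_stability_region(alpha, beta, 1.0))   # False: root (1+z/2)/(1-z/2) = 3
```

For a one-step method written this way, the single root of Rz is exactly the amplification function R(z), so the test reduces to |R(z)| ≤ 1 as expected.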

5.6.5 Theorem (2nd Dahlquist barrier): There is no A-stable LMM of order


p > 2. Among the A-stable LMM of order 2, the trapezoidal rule (Crank-Nicolson)
has the smallest error constant.

Proof. See [HW10, Theorem V.1.4].

5.6.1 A(α)-stability

5.6.6. Motivated by the fact that there are no higher order A-stable LMMs people have
introduced relaxed concepts of A-stability.

Figure 5.1: Boundaries of the stability regions for BDF(1) to BDF(6); the unstable region
is right of the origin. The right figure shows a zoom near the origin.

k 1 2 3 4 5 6
α 90◦ 90◦ 86.03◦ 73.35◦ 51.84◦ 17.84◦
D 0 0 0.083 0.667 2.327 6.075

Table 5.1: Values for A(α)- and stiff stability for BDF methods of order k.

5.6.7 Definition: An LMM is called A(α)-stable, for α ∈ [0°, 90°], if its stability
region contains the sector

$$\left\{ z \in \mathbb{C} \;\middle|\; \operatorname{Re}(z) < 0 \text{ and } \left|\frac{\operatorname{Im}(z)}{\operatorname{Re}(z)}\right| \le \tan \alpha \right\}.$$

It is called A(0)-stable, if the stability region contains the negative real axis.
It is called stiffly stable, if it contains the set {z ∈ C | Re(z) < −D}.

Remark 5.6.8. The introduction of A(0)-stability is motivated by linear systems of the
form u′ = −Au with a symmetric positive definite matrix A. Only stability on the real axis
is required in that case, since all eigenvalues are real. Any positive angle α is sufficient.

Similarly, A(α)-stable LMM are suitable for linear problems in which high-frequency vi-
brations (large Imλ) decay fast (large −Reλ).

LMMs behave similarly for nonlinear problems if the Jacobian matrix Dy f satisfies corre-
sponding properties.

Example 5.6.9. The stability regions of the stable BDF methods are in Figure 5.1. The
corresponding values for A(α)-stability and stiff stability are in Table 5.1. (Recall from
theorem 5.4.5 that BDF(7) is not even zero-stable.)

Chapter 6

Boundary Value Problems

This chapter deals with problems of a fundamentally different type than the problems we
examined in chapter 1, namely boundary value problems. Here, we have prescribed values
at the beginning and at the end of an interval of interest. They will require the design of
different numerical methods. We will only consider the most classical one.

6.1 General boundary value problems

Due to lemma 1.2.4 we know that every ODE can be written as a system of first-order
ODEs. Thus, we make the following definition (restricting our attention to explicit ODEs).

6.1.1 Definition: A boundary value problem (BVP) is a differential equation
problem of the form: Find u : [a, b] ⊂ R → Rᵈ, such that

$$u'(t) = f\bigl(t, u(t)\bigr), \qquad t \in (a, b), \qquad (6.1a)$$
$$r\bigl(u(a), u(b)\bigr) = 0. \qquad (6.1b)$$

6.1.2 Definition: A BVP (6.1) is called linear, if the right hand side f as well as
the boundary conditions are linear in u, i.e.: Find u : [a, b] → Rᵈ such that

$$u'(t) = A(t)u(t) + c(t) \qquad \forall t \in (a, b), \qquad (6.2a)$$
$$B_a u(a) + B_b u(b) = g, \qquad (6.2b)$$

with A : [a, b] → Rᵈˣᵈ, c : [a, b] → Rᵈ, Ba, Bb ∈ Rᵈˣᵈ and g ∈ Rᵈ.

Remark 6.1.3. Since boundary values are imposed at two different points in time, the
concept of local solutions from Definition 1.2.8 is not applicable. Thus, tricks such as
going forward from interval to interval, as is done in the proof of Peano's theorem using
Euler's method, are not applicable here. For this reason, nothing can be concluded from
the local properties of the solution and the right hand side f. In fact, it is hard in general
even to establish that a solution exists.

Example 6.1.4. Consider the linear BVP

$$\begin{pmatrix} u_1'(t) \\ u_2'(t) \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} u_1(t) \\ u_2(t) \end{pmatrix} + \begin{pmatrix} 0 \\ 1 \end{pmatrix} \qquad \text{with}$$

(i) u1(0) = 0, u2(1) = 0,
(ii) u1(0) = 0, u1(1) = 0,
(iii) u2(0) = 0, u2(1) = 0.

By substitution, we can easily see that this first-order system of ODEs is in fact equivalent
to the second-order ODE u1″ = 1, which can be explicitly solved by integration to give

$$u_2(t) = u_1'(t) = t + c_1 \qquad\text{and}\qquad u_1(t) = \frac{t^2}{2} + c_1 t + c_2.$$

But the BVP is not solvable for all three choices of boundary conditions (BCs). Using the
BCs we get

(i) c2 = 0 and c1 = −1, i.e., $u(t) = \bigl(t(\tfrac{t}{2} - 1),\; t - 1\bigr)^T$,
(ii) c2 = 0 and c1 = −1/2, i.e., $u(t) = \bigl(\tfrac{t}{2}(t - 1),\; t - \tfrac12\bigr)^T$,
(iii) but here the two BCs lead to c1 = 0 and c1 = −1, respectively, which cannot be
satisfied simultaneously.
Remark 6.1.5. Note that due to lemma 1.3.13 we know that the solution space of the
linear ODE in (6.2a) is d-dimensional. Hence, we need d additional pieces of information
to determine the solution uniquely. However, whether the d boundary conditions in (6.2b)
are sufficient is more subtle than in the case of an IVP, as we have just seen.

We will not discuss this further and instead consider an important subclass of linear BVPs,
as well as the most classical numerical method for them. For more details on the general
solution theory, see chapter 6 in the original notes by G. Kanschat or [Ran17b, Chap. 8].

6.2 Second-order, scalar two-point boundary value problems

Let us consider the following linear, second-order BVP of finding u : [a, b] → R such that

$$-u''(x) + \beta(x) u'(x) + \gamma(x) u(x) = f(x), \qquad u(a) = u_a, \quad u(b) = u_b, \qquad (6.3)$$

for some functions β, γ, f : [a, b] → R and two boundary values ua, ub ∈ R. (As is common
practice, we use x instead of t to denote the independent variable here.)

We introduce the set

$$B = \bigl\{ u \in C^2(a,b) \cap C[a,b] \;\bigm|\; u(a) = u_a \text{ and } u(b) = u_b \bigr\}.$$

Then, we can see the LHS of (6.3) as a differential operator applied to u, mapping B to
the set of continuous functions. Namely, we define

L : B → C[a, b]
(6.4)
u 7→ −u00 + βu0 + γu.

79
To simplify our life we can (without loss of generality) get rid of the inhomogeneous bound-
ary values ua and ub. To this end, let

$$\psi(x) = \frac{b-x}{b-a} u_a + \frac{x-a}{b-a} u_b,$$

and introduce the new function u0 := u − ψ. Then, u0 solves the BVP

$$-u_0''(x) + \beta(x) u_0'(x) + \gamma(x) u_0(x) = \underbrace{f(x) - \beta(x)\frac{u_b - u_a}{b-a} - \gamma(x)\psi(x)}_{=:\, f_0(x)}, \qquad u_0(a) = u_0(b) = 0.$$

Thus, it is sufficient to consider the boundary value problem:

6.2.1 Definition: Given an interval I = [a, b], find a function


n o
u ∈ V = u ∈ C 2 (a, b) ∩ C[a, b] u(a) = u(b) = 0 , (6.5)

such that for a differential operator of second order as defined in (6.4) above and a
right hand side f ∈ C[a, b] there holds
Lu = f. (6.6)

6.3 Finite difference methods

We subdivide the interval I = [a, b] again into subintervals and, as in Definition 2.1.2,
consider the solution only at the partitioning points a = x0 < x1 < · · · < xn = b. As with
one-step and multistep timestepping methods, we denote the approximate solution values
at those partitioning points by yk, k = 0, . . . , n.
While one-step methods directly discretize the Volterra integral equation in order to com-
pute the solution at every new step, finite difference methods discretize the differen-
tial equation on the whole interval at once and then solve the resulting discrete (finite-
dimensional) system of equations. We have accomplished the first step and decided that
instead of function values u(x) at every point x of the interval I, we only approximate
u(xk) in the points of the partition by yk, k = 0, . . . , n. What is left is the definition of
the discrete operator representing the equation.

6.3.1 Definition (Finite differences): To approximate first derivatives of a


function u, we introduce the operators

u(x + h) − u(x)
Forward difference Dh+ u(x) = , (6.7)
h
u(x) − u(x − h)
Backward difference Dh− u(x) = , (6.8)
h
u(x + h) − u(x − h)
Central difference Dhc u(x) = . (6.9)
2h
For second derivatives we introduce the
u(x + h) − 2u(x) + u(x − h)
3-point stencil Dh2 u(x) = . (6.10)
h2

Remark 6.3.2. Note that the 3-point stencil is the product of the forward and backward
difference operators:
   
Dh2 u(x) = Dh+ Dh− u(x) = Dh− Dh+ u(x) .

For simplicity, we only present finite differences of uniform subdivisions. Nevertheless, the
definition of the operators can be extended easily to h changing between intervals.

6.3.3 Definition: A finite difference operator Dhα is consistent of order p with
the αth derivative, if there exists a constant c > 0 independent of h and a subset
Ĩ ⊂ [a, b], such that for any u ∈ C^{α+p}(a, b) and for any x ∈ Ĩ:

$$|D_h^\alpha u(x) - u^{(\alpha)}(x)| \le c h^p. \qquad (6.11)$$

6.3.4 Lemma: The forward and backward difference operators Dh+ and Dh− in
definition 6.3.1 are consistent of order 1 with the first derivative, i.e. for any x ∈
[a, b − h] (resp. x ∈ [a + h, b]),

|Dh+ u(x) − u0 (x)| ≤ ch and |Dh− u(x) − u0 (x)| ≤ ch. (6.12)

The central difference operator Dhc and the 3-point stencil Dh2 are consistent of
order 2 with the first and second derivative, respectively, i.e. for any x ∈ [a+h, b−h],

|Dhc u(x) − u0 (x)| ≤ ch2 and |Dh2 u(x) − u00 (x)| ≤ ch2 . (6.13)

Proof. Taylor expansion: Let x ∈ [a, b − h]. Then there exists ξ ∈ (x, x + h) such that

$$D_h^+ u(x) - u'(x) = \frac{u(x+h) - u(x)}{h} - u'(x) = \frac{u(x) + h u'(x) + \frac{h^2}{2} u''(\xi) - u(x)}{h} - u'(x) = \frac{h}{2} u''(\xi).$$

Thus, the result for Dh⁺ holds with c := ½ max_{x∈(a,b)} |u″(x)|. The same computation can
be applied to Dh⁻ u(x).

For Dhᶜ, let x ∈ [a + h, b − h]. Then, there exist ξ⁻, ξ⁺ ∈ (x − h, x + h) such that

$$D_h^c u(x) - u'(x) = \frac{\bigl[u(x) + h u'(x) + \frac{h^2}{2} u''(x) + \frac{h^3}{6} u'''(\xi^+)\bigr] - \bigl[u(x) - h u'(x) + \frac{h^2}{2} u''(x) - \frac{h^3}{6} u'''(\xi^-)\bigr]}{2h} - u'(x) = \frac{h^2}{12}\bigl( u'''(\xi^-) + u'''(\xi^+) \bigr).$$

The final result for the 3-point stencil Dh² follows in a similar way (DIY).
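The stated consistency orders are easy to observe numerically. A quick Python check (an illustration of my own, not from the notes) with u(x) = sin x: halving h roughly halves the error of the one-sided differences and quarters the errors of the central difference and the 3-point stencil:

```python
import math

u, du, d2u = math.sin, math.cos, lambda x: -math.sin(x)

def errors(x, h):
    """Absolute errors of D_h^+, D_h^c and D_h^2 at the point x."""
    fwd = (u(x + h) - u(x)) / h                      # forward difference
    ctr = (u(x + h) - u(x - h)) / (2 * h)            # central difference
    stn = (u(x + h) - 2 * u(x) + u(x - h)) / h**2    # 3-point stencil
    return abs(fwd - du(x)), abs(ctr - du(x)), abs(stn - d2u(x))

x = 1.0
e1 = errors(x, 1e-3)
e2 = errors(x, 5e-4)
rates = [a / b for a, b in zip(e1, e2)]
print(rates)   # close to [2, 4, 4]: orders 1, 2 and 2
```

The step sizes are chosen moderately small on purpose: for much smaller h the cancellation in the stencil numerator lets floating-point roundoff dominate the O(h²) truncation error.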

Remark 6.3.5. When applied to the equation u0 = f (t, u) the solutions obtained by
forward and backward differences correspond to the explicit and implicit Euler methods,
respectively.

6.3.6 Definition: The finite difference method (with uniform subdivisions) for
the discretization of the BVP Lu = f on the interval I = [a, b] with homogeneous
boundary conditions, i.e., for u ∈ V as in definition 6.2.1, is defined by

1. choosing a partition a = x0 < x1 < · · · < xn = b with n ∈ N,
   h := (b − a)/n and xk = a + kh, k = 0, . . . , n,

2. replacing all differential operators in L by finite differences, evaluated at xk ,

3. considering and computing the approximations yk of the solution u(xk ) at the


discrete points x0 , . . . , xn .

Example 6.3.7. Using the 3-point stencil for u″ and central differences for u′, the BVP

−u″(x) + β(x)u′(x) + γ(x)u(x) = f(x), u(a) = u(b) = 0,

and the abbreviations βk = β(xk), γk = γ(xk), fk = f(xk), we obtain the discrete system
of equations

$$\frac{-y_{k+1} + 2y_k - y_{k-1}}{h^2} + \beta_k \frac{y_{k+1} - y_{k-1}}{2h} + \gamma_k y_k = f_k, \qquad k = 1, \dots, n-1, \qquad (6.14)$$

with y0 = yn = 0, or in matrix notation

$$L_h y = \begin{pmatrix} \lambda_1 & \nu_1 & & \\ \mu_2 & \lambda_2 & \ddots & \\ & \ddots & \ddots & \nu_{n-2} \\ & & \mu_{n-1} & \lambda_{n-1} \end{pmatrix} \begin{pmatrix} y_1 \\ \vdots \\ y_{n-1} \end{pmatrix} = \begin{pmatrix} f_1 \\ \vdots \\ f_{n-1} \end{pmatrix} = f_h, \qquad (6.15)$$

where

$$\lambda_k = \frac{2}{h^2} + \gamma_k, \qquad \mu_k = -\frac{1}{h^2} - \frac{\beta_k}{2h}, \qquad \nu_k = -\frac{1}{h^2} + \frac{\beta_k}{2h}. \qquad (6.16)$$
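Assembling and solving the system (6.15) takes only a few lines. The Python sketch below (my own illustration; names are assumptions) treats −u″ + u = f on [0, 1] with exact solution u(x) = sin(πx), i.e. β ≡ 0, γ ≡ 1 and f(x) = (π² + 1) sin(πx), and observes the second-order convergence proved later in theorem 6.4.9:

```python
import numpy as np

def solve_bvp(n, beta, gamma, f, a=0.0, b=1.0):
    """Central finite difference solution of -u'' + beta u' + gamma u = f,
    u(a) = u(b) = 0, on a uniform grid with n subintervals."""
    h = (b - a) / n
    x = a + h * np.arange(1, n)                  # interior points x_1, ..., x_{n-1}
    lam = 2 / h**2 + gamma(x)                    # diagonal entries (6.16)
    mu = -1 / h**2 - beta(x) / (2 * h)           # subdiagonal entries
    nu = -1 / h**2 + beta(x) / (2 * h)           # superdiagonal entries
    L = np.diag(lam) + np.diag(mu[1:], -1) + np.diag(nu[:-1], 1)
    return x, np.linalg.solve(L, f(x))

beta = lambda x: np.zeros_like(x)
gamma = lambda x: np.ones_like(x)
f = lambda x: (np.pi**2 + 1) * np.sin(np.pi * x)

errs = []
for n in (20, 40):
    x, y = solve_bvp(n, beta, gamma, f)
    errs.append(np.max(np.abs(y - np.sin(np.pi * x))))
print(errs[0] / errs[1])   # close to 4: second-order convergence
```

For a production code one would of course store Lh as a tridiagonal (banded) matrix and solve in O(n) operations; the dense `np.diag`/`np.linalg.solve` combination is used here only for brevity.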

Remark 6.3.8. Just as our view of the continuous problem has changed from IVPs to
BVPs, the discrete problem is now a fully coupled linear system which has to be solved by
methods of linear algebra, rather than by time stepping. In fact, we have n − 1 unknown
variables y1, . . . , yn−1 and n − 1 equations, such that existence and uniqueness of solutions
are equivalent.

6.4 Existence, stability and convergence

6.4.1. Since the solution of the discretized boundary value problem is a problem in linear
algebra, we have to study properties of the matrix Lh. The shortest and most elegant
way to prove stability is through the properties of M-matrices, which we present here very
briefly. We will not dwell on this approach for long, since it is sufficient for stability,
but by far not necessary and particular to low order methods.
The fact that Lh is an M-matrix requires some knowledge of irreducible, weakly diagonally
dominant matrices, which we have already come across in the last chapter of Numerik 0,
in the context of stationary iterative methods.

6.4.2 Definition: A square n × n-matrix A is called an M-matrix if it satisfies
the following properties:

$$a_{ii} > 0, \qquad a_{ij} \le 0, \quad i, j = 1, \dots, n, \; j \ne i, \qquad (6.17)$$

and it is invertible with the entries of A⁻¹ = (cij) satisfying

$$c_{ij} \ge 0, \qquad i, j = 1, \dots, n. \qquad (6.18)$$

6.4.3 Lemma: The matrix Lh defined in (6.15) above is an M-matrix provided


that
2
γk ≥ 0 and |βk | < . (6.19)
h

Proof. It is easy to verify that the two conditions in (6.19) are sufficient for the first M-
matrix property.

The proof of positivity of the inverse is based on irreducible diagonal dominance, which
is too long and too specialized at this point and thus we will omit it. See, e.g., [Ran17b,
Hilfssatz 10.2].

Remark 6.4.4. The finite element method, discussed next semester in “Numerik 2 – Finite
Elements”, provides a much more powerful theory to deduce solvability and stability of the
discrete problem.

6.4.5 Lemma: Let A be an M-matrix. If there is a vector w such that for the
vector v = Aw there holds

vi ≥ 1, i = 1, . . . , n,

then

$$\|A^{-1}\|_\infty \le \|w\|_\infty. \qquad (6.20)$$

Proof. Let x ∈ Rⁿ and y = A⁻¹x. Then,

$$|y_i| = \Bigl| \sum_j c_{ij} x_j \Bigr| \le \sum_j c_{ij} |x_j| \le \|x\|_\infty \sum_j c_{ij} v_j.$$

Thus,

$$|y_i| \le \|x\|_\infty \bigl(A^{-1} v\bigr)_i = \|x\|_\infty \bigl(A^{-1} A w\bigr)_i \le \|x\|_\infty |w_i|.$$

Taking the maximum over all i and dividing by ‖x‖∞, we obtain

$$\|A^{-1}\|_\infty = \sup_{x \in \mathbb{R}^n} \frac{\|A^{-1} x\|_\infty}{\|x\|_\infty} \le \|w\|_\infty.$$

6.4.6 Theorem: Assume that (6.19) holds and that there exists a constant δ < 2
such that
δ
|βk | ≤ . (6.21)
b−a
Then, the matrix Lh defined in (6.15) is invertible and

(b − a)2
kL−1
h k∞ ≤ . (6.22)
8 − 4δ

Proof. Consider the function

p(x) = (x − a)(b − x) = −x2 + (a + b)x − ab,

with derivatives p0 (x) = a + b − 2x and p00 (x) = −2, and a maximum of (b − a)2 /4 at
x = (a+b)/2. Choose the values pk = p(xk ). Due to the consistency results in lemma 6.3.4,
we know that Dh2 p ≡ p00 and Dhc p ≡ p0 are exact, such that, for all k = 1, . . . , n − 1,

$$(L_h p)_k = 2 + \beta_k p'(x_k) + \gamma_k p(x_k) \ge 2 - |\beta_k|(b-a) \ge 2 - \delta.$$

Since Lh is an M-matrix, the vector w = p/(2 − δ) can then be used to bound the inverse
of Lh using Lemma 6.4.5: Lh w ≥ 1 componentwise and ‖w‖∞ ≤ (b − a)²/(8 − 4δ).
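For the model case β ≡ γ ≡ 0 on [0, 1], the bound (6.22) gives ‖Lh⁻¹‖∞ ≤ 1/8, and for an even number of subintervals this value is actually attained, since the discrete solution of Lh y = 1 coincides with x(1 − x)/2 on the grid. A quick numerical check (illustration only):

```python
import numpy as np

n = 100
h = 1.0 / n
# L_h for -u'' on [0,1] with homogeneous boundary values (beta = gamma = 0)
L = (np.diag(2 * np.ones(n - 1))
     - np.diag(np.ones(n - 2), -1)
     - np.diag(np.ones(n - 2), 1)) / h**2
# for a matrix with a nonnegative inverse, the infinity norm of the inverse
# is its maximal row sum
inv_norm = np.linalg.norm(np.linalg.inv(L), np.inf)
print(inv_norm)   # close to 1/8 = 0.125
```

Since the 3-point stencil is exact on quadratics, the row sums of Lh⁻¹ are exactly the grid values of x(1 − x)/2, whose maximum 1/8 is taken at the midpoint x = 1/2.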

Remark 6.4.7. The assumptions of the previous theorem involve two sets of conditions
on the parameters βk and γk. Since

$$\frac{\delta}{b-a} < \frac{2}{b-a} \le \frac{2n}{b-a} = \frac{2}{h},$$

condition (6.21) actually implies the second condition in (6.19). It is in fact not necessary
in this form, but a better estimate requires more advanced analysis.

The condition on γk in (6.19) is indeed necessary, as will be seen when we study partial
differential equations. The second condition in (6.19), on the other hand, relates the
coefficients βk to the mesh size and can be avoided, as seen in the next example.

Example 6.4.8. By changing the discretization of the first order term to an upwind
finite difference method, we obtain an M-matrix independent of the relation of βk and h.
To this end define
(
↑ β(x)Dh− u(x) if β(x) > 0
β(x)Dh u(x) = . (6.23)
β(x)Dh+ u(x) if β(x) < 0

This changes the matrix Lh to a matrix L↑h with entries

$$\lambda_k = \frac{2}{h^2} + \frac{|\beta_k|}{h} + \gamma_k, \qquad \mu_k = -\frac{1}{h^2} - \frac{\max\{0, \beta_k\}}{h}, \qquad \nu_k = -\frac{1}{h^2} + \frac{\min\{0, \beta_k\}}{h}. \qquad (6.24)$$
As a consequence, the off-diagonal elements always remain non-positive and the diagonal
elements remain positive provided only that γk ≥ 0, for all k. Thus, L↑h is an M-matrix with
a bounded inverse, independent of the values of βk . However, crucially, the consistency
order is reduced from two to one.
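The effect can be seen directly in the matrix entries. The Python fragment below (my own check, with assumed sample values) compares the off-diagonal signs of the central entries (6.16) and the upwind entries (6.24) for β = 100 on a coarse grid with h = 0.1, where βh/2 = 5 > 1:

```python
beta, h = 100.0, 0.1   # sample values with beta * h / 2 > 1

# central differences, entries (6.16)
mu_c = -1 / h**2 - beta / (2 * h)
nu_c = -1 / h**2 + beta / (2 * h)

# upwind differences, entries (6.24); here beta > 0, so the backward
# difference is used and only mu picks up the convection term
mu_u = -1 / h**2 - max(0.0, beta) / h
nu_u = -1 / h**2 + min(0.0, beta) / h

print(nu_c)   # 400.0 > 0: the central scheme is no M-matrix for this h
print(nu_u)   # -100.0 <= 0: upwind off-diagonals stay non-positive
```

Refining the grid until h < 2/β would restore the sign condition for the central scheme, but the upwind scheme satisfies it for every h, at the price of first-order consistency.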

6.4.9 Theorem: Consider the boundary value problem defined in definition 6.2.1
with β, γ, f ∈ C⁴(a, b) and γ(x) ≥ 0 for all x ∈ [a, b]. Let y ∈ Rⁿ⁻¹ be the finite
difference approximation for this problem in Example 6.3.7. If there exists a δ < 2
such that max_{x∈[a,b]} |β(x)| ≤ δ/(b − a), then there exists a constant c independent
of h such that

$$\max_{0 \le k \le n} |u_k - y_k| \le c h^2. \qquad (6.25)$$

For the solution y↑ ∈ Rⁿ⁻¹ of the upwind finite difference approximation in Exam-
ple 6.4.8 there exists a constant c independent of h such that

$$\max_{0 \le k \le n} |u_k - y_k^\uparrow| \le c h \qquad (6.26)$$

without any additional assumptions on the function β.

Proof. Let n ∈ N (and thus h > 0) be arbitrary but fixed and let U = (uk)ₖ₌₁,...,ₙ₋₁ be the
vector containing the values of the exact solution at x1, . . . , xn−1. Consider first the
discretisation in Example 6.3.7 and denote by

$$\tau_k := (L_h U)_k - (L u)(x_k), \qquad k = 1, \dots, n-1,$$

the consistency errors at the interior grid points. Then

$$\bigl(L_h(U - y)\bigr)_k = (L_h U)_k - (L u)(x_k) + (L u)(x_k) - (L_h y)_k = \tau_k + f_k - f_k = \tau_k$$

and it follows from (6.13) that

$$\|L_h(U - y)\|_\infty = \|\tau\|_\infty \le c h^2$$

with c independent of h. Since β, γ satisfy the assumptions of theorem 6.4.6 (for arbitrary
h > 0), we can conclude that there exists a c′ > 0 independent of h such that

$$\|U - y\|_\infty = \|L_h^{-1} L_h (U - y)\|_\infty \le \|L_h^{-1}\|_\infty \|\tau\|_\infty \le c' h^2.$$

The proof for the upwind discretisation in Example 6.4.8 is identical, but as discussed
does not require any boundedness of β to guarantee stability and, due to the use of the
forward/backward difference quotients, the consistency error is only O(h).

Remark 6.4.10. Finite differences can be generalized to higher order by extending the
stencils by more than one point to the left and right of the current point. Whenever we
add two points to the symmetric difference formulas, we can gain two orders of consistency.

For example, the symmetric 3-point stencils approximate u′ and u″ to O(h²); widening
them to 5 points yields O(h⁴), and 7 points yield O(h⁶).

Similarly, we can define one-sided difference formulas, which get us close to multistep
methods. The matrices generated by these formulas are not M-matrices anymore, although
you can show for the 4th order formula for the second derivative that it yields a product
of two M-matrices. While this rescues the theory in a particular instance, M-matrices do
not provide a theoretical framework for general high order finite differences anymore.

Very much like the starting procedures for high order multistep methods, high order finite
differences can lead to difficulties at the boundaries. Here, the formulas must be truncated
and for instance be replaced by one-sided formulas of equal order.

All these issues motivate the study of different discretization methods in the next course.

Chapter 7

Outlook towards partial differential


equations

Finite difference methods for two-point boundary value problems have a natural extension
to higher dimensions. There, we deal with the partial derivatives ∂/∂x₁, ∂/∂x₂, ∂/∂x₃ and ∂/∂t.
As an outlook towards topics in the numerical analysis of partial differential equations, we
close these notes with a short introduction by means of some examples.

7.1 The Laplacian and harmonic functions

7.1.1 Definition: the Laplacian in two (three) space dimensions is the sum of the
second partial derivatives

    ∆u = ∂²u/∂x₁² + ∂²u/∂x₂² + ∂²u/∂x₃²    (7.1)

The Laplace equation is the partial differential equation

−∆u = 0. (7.2)

The Poisson equation is the partial differential equation

−∆u = f. (7.3)

Solutions to the Laplace equation are called harmonic functions.

7.1.2 Theorem (Mean-value formula for harmonic functions): Let u ∈ C²(Ω)
be a solution to the Laplace equation. Then, u has the mean value property

    u(x) = 1/(r^{d−1} ω(d)) ∫_{∂B_r(x)} u(y) ds,    (7.4)

where ∂Br (x) ⊂ Ω is the sphere of radius r around x and ω(d) is the volume of the
unit sphere in Rd .

Proof. First, we rescale the problem to

    Φ(r) = 1/(r^{d−1} ω(d)) ∫_{∂B_r(x)} u(y) ds = 1/ω(d) ∫_{∂B_1(0)} u(x + rz) ds.

Then, it follows by the Gauß theorem for the vector valued function ∇u that

    Φ′(r) = 1/ω(d) ∫_{∂B_1(0)} ∇u(x + rz) · z ds_z
          = 1/(r^{d−1} ω(d)) ∫_{∂B_r(x)} ∇u(y) · (y − x)/r ds_y
          = 1/(r^{d−1} ω(d)) ∫_{∂B_r(x)} ∂u/∂n (y) ds_y
          = 1/(r^{d−1} ω(d)) ∫_{B_r(x)} ∆u(y) dy = 0.

Therefore, Φ(r) is constant. Because of continuity, we have

    lim_{r→0} Φ(r) = lim_{r→0} 1/(r^{d−1} ω(d)) ∫_{∂B_r(x)} u(y) ds = u(x),

which proves our theorem.
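The mean value property is easy to test numerically. A sketch for the harmonic function u(x₁, x₂) = x₁² − x₂² (our choice of example), averaging over a circle with the trapezoidal rule, which is exact for trigonometric polynomials of low degree:

```python
import math

# Average of u over the circle of radius r around (cx, cy),
# sampled at m equidistant angles (trapezoidal rule on a periodic integrand).
def circle_average(u, cx, cy, r, m=4000):
    return sum(u(cx + r * math.cos(2 * math.pi * k / m),
                 cy + r * math.sin(2 * math.pi * k / m))
               for k in range(m)) / m

u = lambda x, y: x * x - y * y      # harmonic: u_xx + u_yy = 2 - 2 = 0
avg = circle_average(u, 0.3, -0.2, 0.5)
print(avg, u(0.3, -0.2))  # both ≈ 0.05
```

The circle average reproduces the center value, in agreement with (7.4).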

7.1.3 Theorem (Maximum principle): Let a function u ∈ C²(Ω) be a solution
to the Laplace equation on an open, bounded, connected domain Ω. Then, if there
is an interior point x₀ of Ω, such that for a neighborhood U ⊂ Ω of x₀ there holds

u(x0 ) ≥ u(x) ∀x ∈ U,

then the function is constant in Ω.

Proof. Let x0 be such a local maximum and let r > 0 be such that Br (x0 ) ⊂ Ω. Assume
that there is a point x on ∂Br (x0 ), such that u(x) < u(x0 ). Then, this holds for points
y in a neighborhood of x. Thus, in order that the mean value property holds, there must
be a subset of ∂Br (x0 ) where u(y) > u(x0 ), contradicting that x0 is a maximum. Thus,
u(x) = u(x0 ) for all x ∈ Br (x0 ) for all r such that Br (x0 ) ⊂ Ω.

Let now x ∈ Ω be arbitrary. Then, there is a (compact) path from x0 to x in Ω. Thus, the
path can be covered by a finite set of overlapping balls inside Ω, and the argument above
can be used iteratively to conclude u(x) = u(x0 ).

Corollary 7.1.4. Let u ∈ C²(Ω) be a solution to the Laplace equation. Then, its maximum
and its minimum are attained on the boundary, that is, there are points x_*, x^* ∈ ∂Ω, such that

    u(x_*) ≤ u(x) ≤ u(x^*)    ∀x ∈ Ω.

Proof. If the maximum of u is attained in an interior point, the maximum principle yields
a constant solution and the theorem holds trivially. On the other hand, theorem 7.1.3 does
not make any prediction on points at the boundary, which therefore can be maxima. The
same holds for the minimum, since −u is also a solution to the Laplace equation.

Corollary 7.1.5. The Poisson equation with homogeneous boundary condition, u ≡ 0 on
∂Ω, has a unique solution.

Proof. Assume there are two functions u, v ∈ C 2 (Ω) with u = v = 0 on ∂Ω such that

−∆u = −∆v = f.

Then, w = u − v solves the Laplace equation with w = 0 on ∂Ω. Due to the maximum
principle, w ≡ 0 and u = v.

7.2 Finite difference methods in higher dimensions

Example 7.2.1. The notion of an interval I can be extended to higher dimensions by a
square Ω = I² or a cube Ω = I³.

[Figure: the square with corners (a, a), (b, a), (a, b), (b, b) and the cube with corners
(a, a, a), . . . , (b, b, b).]

Example 7.2.2. We consider Dirichlet boundary conditions

u(x) = uB (x), for x ∈ ∂Ω. (7.5)

As for two-point boundary value problems, we can reduce our considerations to homoge-
neous boundary conditions uB ≡ 0 by changing the right hand side in the Poisson equation.

7.2.3 Definition: A Cartesian grid on a square (cube) domain Ω consists of the


intersection points of lines (planes) parallel to the coordinate axes (planes).

[Figure: two Cartesian grids, one with non-uniform and one with uniform line spacing.]

The grid is called uniform if all lines (planes) are at equal distances.

For the remainder of this discussion let us restrict to the two-dimensional case, d = 2, and
to uniform Cartesian grids.

7.2.4 Definition: The vector y of discrete values is defined in grid points which
run in x₁- and x₂-direction. In order to obtain a single index for every entry of this
vector in linear algebra, we use lexicographic numbering: the interior points of the
bottom row receive the indices 1, . . . , n − 1, the next row the indices (n − 1) + 1, . . . ,
2(n − 1), and so on, up to the index (n − 1)² for the top-right interior point.
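As a minimal sketch (the function name is ours), the lexicographic map from the pair (i, j) of interior grid indices, each running from 1 to n − 1, to the single 1-based vector index reads:

```python
def lex_index(i, j, n):
    # grid point (x_i, y_j) with 1 <= i, j <= n-1 maps to index
    # (j-1)(n-1) + i in {1, ..., (n-1)^2}
    return (j - 1) * (n - 1) + i

n = 5  # four interior points per direction
print(lex_index(1, 1, n))          # 1: bottom-left interior point
print(lex_index(n - 1, 1, n))      # n-1 = 4: end of the bottom row
print(lex_index(n - 1, n - 1, n))  # (n-1)^2 = 16: top-right interior point
```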

7.2.5 Definition: The 5-point stencil consists of the sum of a 3-point stencil in
x₁- and a 3-point stencil in x₂-direction. Its graphical representation is

        1
    1  −4   1
        1

For a generic row of the linear system, where the associated point is not neighboring
the boundary, this leads to

    D²_h u(x_k) = (−u(x_{k−(n−1)}) − u(x_{k−1}) + 4u(x_k) − u(x_{k+1}) − u(x_{k+(n−1)})) / h²    (7.6)
If the point xk is next to the boundary, the entry corresponding to the neighboring
boundary point can be omitted, since the value is assumed to be zero there.

Example 7.2.6. The matrix L_h obtained for the Laplacian on Ω = [0, 1]² using the
5-point stencil on a uniform Cartesian mesh of mesh spacing h = 1/n with lexicographic
numbering is in R^{N×N} with N = (n − 1)² and has the block-tridiagonal structure

    L_h = n² ·
    ⎛  D  −I              ⎞
    ⎜ −I   D  −I          ⎟
    ⎜      ⋱   ⋱   ⋱      ⎟
    ⎜         −I   D  −I  ⎟
    ⎝             −I   D  ⎠

with the identity I ∈ R^{(n−1)×(n−1)} and

    D =
    ⎛  4  −1              ⎞
    ⎜ −1   4  −1          ⎟
    ⎜      ⋱   ⋱   ⋱      ⎟  ∈ R^{(n−1)×(n−1)}.
    ⎜         −1   4  −1  ⎟
    ⎝             −1   4  ⎠
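This block structure can also be produced as a Kronecker sum of the one-dimensional three-point matrix; a sketch assuming NumPy (the helper name is ours):

```python
import numpy as np

def laplacian_2d(n):
    # 1D second-difference matrix T = tridiag(-1, 2, -1) of size n-1
    m = n - 1
    T = 2 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)
    I = np.eye(m)
    # Kronecker sum reproduces the block-tridiagonal Lh; 1/h^2 = n^2
    return n**2 * (np.kron(I, T) + np.kron(T, I))

Lh = laplacian_2d(4)
print(Lh.shape)                          # (9, 9), i.e. N = (n-1)^2 rows
print(Lh[0, 0], Lh[0, 1], Lh[0, 3])      # 4n^2, -n^2, -n^2 for n = 4
```

The diagonal blocks of `np.kron(I, T) + np.kron(T, I)` are exactly D = tridiag(−1, 4, −1), and the off-diagonal blocks are −I.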

7.2.7 Theorem: The matrix L_h obtained by discretising the Laplace operator via
the 5-point stencil formula is an M-matrix and the solution of the discrete problem

    L_h y = f

is stable in the sense that there is a constant c independent of h such that

    ‖L_h⁻¹‖_∞ ≤ c.

Proof. The proof is identical to the proof for 2-point boundary value problems. To show
boundedness of ‖L_h⁻¹‖_∞ we can use in a similar way the function

    p(x, y) = (x − a)(b − x)(y − a)(b − y).

7.2.8 Theorem: The finite difference approximation in Example 7.2.6 for the
Poisson equation in (7.3) on the unit square Ω = [0, 1]² with homogeneous Dirichlet
conditions is convergent of second order, i.e.

    max_{k=1,...,(n−1)²} |u(x_k) − y_k| ≤ ch².

Proof. We apply the consistency bound in (6.13) in the x₁- and x₂-direction separately,
obtaining

    |∂²u/∂x²(x, y) − (u(x + h, y) − 2u(x, y) + u(x − h, y))/h²| ≤ ch²,
    |∂²u/∂y²(x, y) − (u(x, y + h) − 2u(x, y) + u(x, y − h))/h²| ≤ ch²,

and deduce the second-order consistency of the 5-point stencil by the triangle inequality.
The remainder of the proof is identical to the proof of theorem 6.4.9.
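The second order can be observed numerically with a manufactured solution; a sketch assuming NumPy, with u(x, y) = sin(πx) sin(πy) and hence f = 2π²u:

```python
import numpy as np

# Solve Lh y = f for f = 2 pi^2 sin(pi x) sin(pi y), whose exact solution
# is u = sin(pi x) sin(pi y), and measure the maximum nodal error.
def poisson_error(n):
    m, h = n - 1, 1.0 / n
    T = 2 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)
    Lh = (np.kron(np.eye(m), T) + np.kron(T, np.eye(m))) / h**2
    x = np.linspace(h, 1 - h, m)
    X, Y = np.meshgrid(x, x)
    u = np.sin(np.pi * X) * np.sin(np.pi * Y)
    y = np.linalg.solve(Lh, (2 * np.pi**2 * u).ravel())
    return np.max(np.abs(y - u.ravel()))

e1, e2 = poisson_error(8), poisson_error(16)
print(np.log2(e1 / e2))  # observed order, close to 2
```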

7.2.9 Theorem: Let y be the solution to the finite difference method for the Laplace
equation with the 5-point stencil. Then, the maximum principle holds for y, namely,
if there is k ∈ {1, . . . , (n − 1)²} such that y_k ≥ y_j for all j ≠ k and y_k ≥ y_B for any
boundary value y_B, then y is constant.

Proof. From equation (7.6), it is clear that a discrete mean value property holds, that is,
yk is the mean value of its four neighbors. Therefore, if yk ≥ yj , for all neighboring indices
j of k, we have yj = yk . We conclude by following a path through the grid points.

7.3 Evolution equations

After an excursion to second order differential equations depending on more than one
spatial variable, we now return to problems depending on time; this time, however, the
solution depends on time and space together. As for the nomenclature, we have encountered
ordinary differential equations as equations or systems depending on a time variable only,
and then partial differential equations (PDE) with several, typically spatial, independent
variables. While the problems considered here are covered by the definition of PDE, time
and space play fundamentally different roles. Therefore, we introduce the concept of
evolution equations.

The problems in definition 7.1.1 are PDEs of elliptic type; the following problems can be
either parabolic or hyperbolic.

7.3.1 Definition: An equation of the form

    ∂u/∂t (t, x) = Lu(t, x),    (7.7)
where u(t, .) is in a function space V on a domain Ω, for all time t ∈ R, and
L : V → C(Ω) is a differential operator with respect to the spatial variables x only,
is called a linear evolution equation of first order (in time).
An initial boundary value problem (IBVP) for this evolution equation com-
pletes the differential equation by conditions

    u(0, x) = u₀(x),    x ∈ Ω,    (7.8)
    u(t, x) = g(x),    x ∈ ∂Ω, t > 0.    (7.9)

Example 7.3.2. Consider the case of one spatial variable, i.e. Ω = [a, b] ⊂ R, and the
differential operator L as defined in (6.4), i.e. a general, linear second order differential
operator with respect to the spatial variable x, for simplicity with β = β(x) and γ = γ(x)
independent of t. Furthermore, let u(t, a) = u(t, b) = 0.

This PDE is parabolic and for β = γ = 0 it is called the heat equation.

We can now discretise the right hand side of (7.7), for every fixed t ≥ 0 on a spatial grid
x0 , . . . , xn , as in Example 6.3.7, to obtain a system of ODEs

    y′(t) = L_h y(t)

for the unknown (semi-discrete) vector y(t) ∈ Rn−1 of approximations to the solution u(t, ·)
of (7.7) at time t. By choosing as the initial condition

yk (0) = u0 (xk ), k = 1, . . . , n − 1

we obtain an autonomous linear IVP for y : [0, T ] → Rn−1 that we can now solve with our
favourite time stepping method.

For γ_k ≥ 0 and |β_k| sufficiently small, the eigenvalues of L_h have negative real part and
vary strongly in size, e.g. for β = γ = 0 we have λ₁ = −4n² sin²(π/(2n)) ≈ −π² and
λ_{n−1} = −4n² sin²(π(n − 1)/(2n)) ≈ −4n². Thus, the problem is stiff, especially for n large,
and we should use a stable time stepping method.
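Both eigenvalue approximations, and the resulting stiffness, can be verified directly; a sketch assuming NumPy:

```python
import numpy as np

# Eigenvalues of Lh for beta = gamma = 0: lambda_j = -4 n^2 sin^2(j pi / (2n)).
n = 50
m = n - 1
T = 2 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)
Lh = -n**2 * T
lam = np.sort(np.linalg.eigvalsh(Lh))   # ascending: most negative first
print(lam[-1])           # slowest mode, close to -pi^2
print(lam[0])            # fastest mode, close to -4 n^2
print(lam[0] / lam[-1])  # stiffness ratio, grows like n^2
```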

From theorem 6.4.9 we know that the spatial discretisation is of second order. Thus, a
common time stepping method to use is the Crank-Nicolson method, which is the second
order A-stable LMM with the smallest error constant. To distinguish between spatial grid
points and time steps, choose m ∈ N and let η = T /m be the time step size. We denote
the approximation of y(tj ) at the jth time step tj , j = 1, . . . , m, by Y (j) ∈ Rn−1 . Applying
the Crank-Nicolson method we finally obtain the fully discrete system
    Y^(j) = Y^(j−1) + (η/2)(L_h Y^(j) + L_h Y^(j−1))  ⇔  (I − (η/2)L_h) Y^(j) = (I + (η/2)L_h) Y^(j−1)
for the jth time step. Since the real part of the spectrum of Lh is negative, the matrix on
the left hand side is invertible, so that we can uniquely solve this system for any η > 0.
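A minimal sketch of this time stepping for the heat equation (β = γ = 0), assuming NumPy; the initial value sin(πx) decays like e^{−π²t}, so the discrete solution can be compared against it:

```python
import numpy as np

# Crank-Nicolson steps: (I - eta/2 Lh) Y^(j) = (I + eta/2 Lh) Y^(j-1)
def crank_nicolson(y0, Lh, eta, steps):
    m = Lh.shape[0]
    A = np.eye(m) - 0.5 * eta * Lh
    B = np.eye(m) + 0.5 * eta * Lh
    y = y0.copy()
    for _ in range(steps):
        y = np.linalg.solve(A, B @ y)
    return y

n = 20
h = 1.0 / n
T1d = 2 * np.eye(n - 1) - np.eye(n - 1, k=1) - np.eye(n - 1, k=-1)
Lh = -T1d / h**2                      # semi-discrete heat operator
x = np.linspace(h, 1 - h, n - 1)
y0 = np.sin(np.pi * x)                # exact solution: e^{-pi^2 t} sin(pi x)
yT = crank_nicolson(y0, Lh, eta=0.01, steps=10)   # t = 0.1
err = np.max(np.abs(yT - np.exp(-np.pi**2 * 0.1) * y0))
print(err)   # small: O(h^2) in space plus O(eta^2) in time
```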

We finish by stating the convergence result for this example.

7.3.3 Theorem: Consider the problem in definition 7.3.1 in one space dimension,
i.e. Ω = [a, b] ⊂ R, and the differential operator L as defined in (6.4) with β = β(x)
and γ = γ(x) independent of t. Furthermore, let u(t, a) = u(t, b) = 0. Then,
with central finite difference discretisation of L with mesh width h and applying the
Crank-Nicolson method to discretise in time, as described in Example 7.3.2 with
step size η ≤ h, there exists a constant c > 0 independent of h such that
    max_{j=0,...,m} max_{k=0,...,n} |Y_k^(j) − u(t_j, x_k)| ≤ ch².

Appendix A

Appendix

A.1 Comments on uniqueness of an IVP

For a first order differential equation, Lipschitz continuity of f is only a sufficient and not,
as one might think, a necessary condition for uniqueness of solutions. The following
theorem and proof show that it is indeed possible to have uniqueness without assuming
Lipschitz continuity.

A.1.1 Theorem (Non-necessity of L-continuity): Let f be a continuous function
satisfying f(x) > 0 for all x ∈ R. Then, the solution to the (autonomous) IVP

    u′(t) = f(u(t))    (A.1a)
    u(t₀) = u₀    (A.1b)

is globally unique for all (t0 , u0 ) ∈ R2 .

Proof. Assume two solutions ϕ, ψ : I → R on an open interval I with t₀ ∈ I. Then,

    1 = ϕ′(t)/f(ϕ(t)) = ψ′(t)/f(ψ(t))    for all t ∈ I.    (A.2)

Define the function F : R → R through

    F(x) = ∫_{u₀}^{x} ds/f(s).

F is continuously differentiable since

    ∂_x F(x) = ∂_x ∫_{u₀}^{x} ds/f(s) = 1/f(x).

Obviously, F is also strictly increasing, hence injective on R: take x, y ∈ R and assume
without loss of generality that x < y. Then we have F(x) < F(y) and thus F(x) ≠ F(y).

Also, for all t ∈ I, it follows from (A.2) and the substitution rule that

    F(ϕ(t)) = ∫_{t₀}^{t} ϕ′(s)/f(ϕ(s)) ds = ∫_{t₀}^{t} ψ′(s)/f(ψ(s)) ds = F(ψ(t)).

Thus, since F is injective, we have ϕ(t) = ψ(t) for all t ∈ I. In conclusion, the IVP (A.1)
has a unique solution.

A.2 Properties of matrices

A.2.1 The matrix exponential

Definition A.2.1. The matrix exponential e^A of a matrix A ∈ R^{d×d} is defined by its
power series

    e^A = Σ_{k=0}^{∞} A^k/k!.    (A.3)

Lemma A.2.2. The power series (A.3) converges for each matrix A. It is therefore valid
to write

    e^A = lim_{m→∞} Σ_{k=0}^{m} A^k/k! = Σ_{k=0}^{∞} A^k/k!.    (A.4)

Proof. Let ‖·‖ be a submultiplicative matrix norm on R^{d×d}. We show that the sequence
of partial sums (S_n)_{n∈N₀} with S_n = Σ_{k=0}^{n} A^k/k! is a Cauchy sequence. For m > n,
the triangle inequality and the submultiplicativity of ‖·‖ yield

    ‖S_m − S_n‖ = ‖Σ_{k=n+1}^{m} A^k/k!‖ ≤ Σ_{k=n+1}^{m} ‖A‖^k/k! ≤ Σ_{k=n+1}^{∞} ‖A‖^k/k!.    (A.5)

The right hand side is the tail of the convergent scalar series for e^{‖A‖} and thus tends
to zero as n → ∞. Hence (S_n) is a Cauchy sequence and, by the completeness of R^{d×d},
converges; its limit is by definition e^A.

Lemma A.2.3 (Properties of the matrix exponential). The following relations hold true:

    e⁰ = I,    (A.7)
    e^{αA} e^{βA} = e^{(α+β)A}    ∀A ∈ R^{d×d}, ∀α, β ∈ R,    (A.8)
    e^A e^{−A} = I    ∀A ∈ R^{d×d},    (A.9)
    e^{T⁻¹AT} = T⁻¹ e^A T    ∀A ∈ R^{d×d}, T ∈ R^{d×d} invertible,    (A.10)
    e^{diag(λ₁,...,λ_d)} = diag(e^{λ₁}, . . . , e^{λ_d})    ∀λ_i ∈ R, i = 1, . . . , d.    (A.11)

Moreover, e^A is invertible for arbitrary square matrices A with (e^A)⁻¹ = e^{−A}.

Proof. The equality (A.7) follows directly from the definition.

For (A.8) consider the function ϕ(α) given by

    ϕ(α) = e^{αA} e^{βA} − e^{(α+β)A}.

Then

    ϕ′(α) = A (e^{αA} e^{βA} − e^{(α+β)A}) = Aϕ(α)    and    ϕ(0) = I e^{βA} − e^{βA} = 0,

giving us an IVP for ϕ(α) with unique solution ϕ(α) = e^{αA} ϕ(0) = 0, and the identity in
(A.8) follows.

Equation (A.9) is a special case of (A.8) with parameters α = 1 and β = −1, which in
combination with (A.7) leads to the result.

For (A.10) note that R^{d×d} forms a ring and is thus associative. Then, for k ∈ N₀, we have

    (T⁻¹AT)^k = (T⁻¹AT)(T⁻¹AT) · · · (T⁻¹AT) = T⁻¹A(T T⁻¹)A(T · · · T⁻¹)A T = T⁻¹ A^k T

and thus

    e^{T⁻¹AT} = Σ_{k=0}^{∞} (1/k!) (T⁻¹AT)^k = T⁻¹ (Σ_{k=0}^{∞} A^k/k!) T = T⁻¹ e^A T.

To prove (A.11), let D = diag(λ₁, . . . , λ_d) ∈ R^{d×d} where λ_i ∈ R, i = 1, . . . , d. Then,
D^k = diag(λ₁^k, . . . , λ_d^k) for any k ∈ N₀, and we have

    e^D = lim_{m→∞} Σ_{k=0}^{m} (1/k!) diag(λ₁^k, . . . , λ_d^k)
        = lim_{m→∞} diag(Σ_{k=0}^{m} λ₁^k/k!, . . . , Σ_{k=0}^{m} λ_d^k/k!)
        = diag(lim_{m→∞} Σ_{k=0}^{m} λ₁^k/k!, . . . , lim_{m→∞} Σ_{k=0}^{m} λ_d^k/k!)
        = diag(e^{λ₁}, . . . , e^{λ_d}).

Here, we have used the absolute convergence of the series and that these matrices are
elements of the ring R^{d×d}.

The final property follows immediately from (A.9).

Example A.2.4. We will perform an exemplary calculation of a matrix exponential.
Consider

    A = ⎛ 0   1 ⎞
        ⎝ k²  0 ⎠.

As the matrix exponential of a diagonal matrix is simply a diagonal matrix with the
exponential of the entries, we diagonalize A.

To diagonalize A, note that the eigenvalues λ₁, λ₂ of A are λ₁ = k and λ₂ = −k. Let
D = diag(λ₁, λ₂) = diag(k, −k). The corresponding eigenvectors are ψ₁ = (1, k)^T and
ψ₂ = (1, −k)^T. The matrix Ψ = (ψ₁ | ψ₂) ∈ R^{2×2} satisfies

    A = Ψ D Ψ⁻¹.

The inverse of Ψ is given as

    Ψ⁻¹ = (1/2) ⎛ 1   1/k ⎞
                ⎝ 1  −1/k ⎠

and with the above lemma we can now calculate

    e^A = Ψ e^D Ψ⁻¹ = (1/2) ⎛ e^k + e^{−k}     (e^k − e^{−k})/k ⎞ = ⎛ cosh(k)    sinh(k)/k ⎞
                            ⎝ k(e^k − e^{−k})  e^k + e^{−k}     ⎠   ⎝ k sinh(k)  cosh(k)   ⎠.
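The closed form can be cross-checked against the defining power series (A.3); a sketch assuming NumPy (the truncation at 40 terms is our choice):

```python
import numpy as np

# Truncated power series sum_{k < terms} A^k / k! for e^A.
def expm_series(A, terms=40):
    E = np.eye(A.shape[0])   # k = 0 term
    P = np.eye(A.shape[0])   # running power A^k / k!
    for k in range(1, terms):
        P = P @ A / k
        E = E + P
    return E

k = 1.5
A = np.array([[0.0, 1.0], [k**2, 0.0]])
closed = np.array([[np.cosh(k), np.sinh(k) / k],
                   [k * np.sinh(k), np.cosh(k)]])
diff = np.max(np.abs(expm_series(A) - closed))
print(diff)  # agrees up to rounding
```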

A.3 The Banach fixed-point theorem

A.3.1 Theorem (Banach fixed-point theorem): Let Ω ⊂ R be a closed set and
f : Ω → Ω a contraction, i.e. there exists γ ∈ (0, 1) such that |f(x) − f(y)| ≤ γ|x − y|
for all x, y ∈ Ω. Then, there exists a unique x* ∈ Ω such that f(x*) = x*.

Proof. Let x₀ ∈ Ω and define x_{k+1} = f(x_k). First, we prove existence using the Cauchy
criterion. Let k, m ∈ N₀ and consider

    |x_k − x_{k+m}| = |f(x_{k−1}) − f(x_{k+m−1})| ≤ γ|x_{k−1} − x_{k+m−1}|.

Iteratively, we get

    |x_k − x_{k+m}| ≤ γ^k |x₀ − x_m|.

We now write x₀ − x_m = x₀ − x₁ + x₁ − x₂ + · · · + x_{m−1} − x_m. The triangle inequality
then yields the estimate

    |x_k − x_{k+m}| ≤ γ^k (|x₀ − x₁| + |x₁ − x₂| + · · · + |x_{m−1} − x_m|)
                   ≤ γ^k |x₀ − x₁| (1 + γ + γ² + · · · + γ^{m−1}) ≤ γ^k/(1 − γ) |x₀ − x₁|.

This estimate tends to zero as k → ∞, uniformly in m, so (x_k) is a Cauchy sequence. Its
limit x* lies in Ω since Ω is closed, and by continuity of f (every contraction is continuous)
we obtain f(x*) = lim f(x_k) = lim x_{k+1} = x*.

Concerning uniqueness, let x* and y* be fixed points. Then,

    |x* − y*| = |f(x*) − f(y*)| ≤ γ|x* − y*|.

Since γ ∈ (0, 1) we immediately obtain |x* − y*| = 0 and hence y* = x*. This concludes
the proof.
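The proof is constructive: iterating x_{k+1} = f(x_k) converges to the fixed point. A sketch with the contraction f(x) = cos x on [0, 1] (our choice of example; |f′| ≤ sin 1 < 1 there, and cos maps [0, 1] into itself):

```python
import math

# Fixed-point iteration x_{k+1} = f(x_k), stopped when successive
# iterates agree to the given tolerance.
def fixed_point(f, x0, tol=1e-12, max_iter=1000):
    x = x0
    for _ in range(max_iter):
        x_new = f(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

xstar = fixed_point(math.cos, 0.5)
print(xstar, abs(math.cos(xstar) - xstar))  # ≈ 0.73909, residual ≈ 0
```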

A.4 The implicit and explicit Euler-method

The explicit resp. implicit Euler method is given by the one-step formula

    y₁ = y₀ + hf(y₀)    resp.    y₁ = y₀ + hf(y₁).

Clearly, the explicit Euler is a rather easy calculation since all one needs are f , h and y0 .
The implicit Euler is more difficult to compute since for calculating y1 we need the value of
f at y1 . The goal of this section is to visualize and give an intuition for the two algorithms.

Consider the following visualizations.

[Figure: tangent constructions for the explicit (left) and implicit (right) Euler method.]

For the explicit Euler we take u₀ and u′(t₀): y₁, our approximated solution for u(t₁), is
chosen as the intersection point of the line t = t₁ with the graph of the affine function
g(t) = y₀ + t · u′(t₀). For the implicit Euler we go backwards: on the t₁-axis we look for
an affine function g that fulfills g(0) = u₀ and g′(t₁) = f(g(t₁)); then we set y₁ = g(t₁).
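For the linear test problem u′ = λu the implicit step can be solved in closed form, which makes the difference between the two steps concrete; a sketch (not from the notes):

```python
def explicit_euler_step(y0, h, lam):
    # y1 = y0 + h f(y0) with f(y) = lam * y
    return y0 + h * lam * y0

def implicit_euler_step(y0, h, lam):
    # y1 = y0 + h f(y1)  =>  (1 - h*lam) y1 = y0
    return y0 / (1.0 - h * lam)

y0, h, lam = 1.0, 0.1, -2.0
print(explicit_euler_step(y0, h, lam))  # 0.8
print(implicit_euler_step(y0, h, lam))  # 0.8333...
```

For general nonlinear f the implicit step instead requires solving a (non)linear equation for y₁, e.g. by a fixed-point or Newton iteration.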

A.5 Derivation of a BDF-scheme

The BDF formulae use the approximations of the solution at the previous time steps
t_k − sh, . . . , t_k − h and the unknown value y_k at t_k that we would like to determine.
With the Lagrange polynomials given by L_i(t) = Π_{j=0, j≠i}^{s} (t − t_j)/(t_i − t_j) we let
y(t) = Σ_{j=0}^{s} y_{k−j} L_{s−j}(t). Then, we will assume that y solves the IVP in the
point t_k and obtain a linear system from which we derive the desired value y_k.

We now aim to derive the scheme for BDF(2): let the points t_k − 2h, t_k − h and t_k be
given on the time axis.

For the corresponding Lagrange polynomials we have, respectively,

    L₀(t) = (t − t_k + h)(t − t_k)/(2h²),    L₁(t) = −(t − t_k)(t − t_k + 2h)/h²    and
    L₂(t) = (t − t_k + 2h)(t − t_k + h)/(2h²).

By assumption the interpolation polynomial fulfills the IVP in the point t_k, i.e. there
holds f_k := f(t_k, y(t_k)) = y′(t_k) = Σ_{j=0}^{s} y_{k−j} L′_{s−j}(t_k). Since
    L₀′(t) = (2t − 2t_k + h)/(2h²),    L₁′(t) = −(2t − 2t_k + 2h)/h²    and
    L₂′(t) = (2t − 2t_k + 3h)/(2h²),

evaluation at t = t_k yields

    f_k = (1/(2h)) y_{k−2} − (2/h) y_{k−1} + (3/(2h)) y_k.

The final BDF(2)-scheme is obtained by multiplication with 2h/3:

    y_k − (4/3) y_{k−1} + (1/3) y_{k−2} = (2h/3) f_k.
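Applied to the linear test problem u′ = λu, where the implicit relation can be solved for y_k in closed form, the scheme exhibits its second order; a sketch with an exact starting value y₁:

```python
import math

# BDF(2) for u' = lam*u: y_k - 4/3 y_{k-1} + 1/3 y_{k-2} = (2h/3) lam y_k,
# solved in closed form for y_k at each step.
def bdf2_linear(lam, u0, h, steps):
    y2, y1 = u0, u0 * math.exp(lam * h)   # exact value for the starting step
    for _ in range(steps - 1):
        y = (4/3 * y1 - 1/3 * y2) / (1 - 2/3 * h * lam)
        y2, y1 = y1, y
    return y1

lam = -1.0
e1 = abs(bdf2_linear(lam, 1.0, 1/50, 50) - math.exp(lam))
e2 = abs(bdf2_linear(lam, 1.0, 1/100, 100) - math.exp(lam))
print(math.log2(e1 / e2))  # observed order, close to 2
```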

Bibliography

[But96] J. C. Butcher. A history of Runge-Kutta methods. Appl. Numer. Math.,


20(3):247–260, 1996.

[DB08] P. Deuflhard and F. Bornemann. Numerische Mathematik 2. Gewöhnliche
Differentialgleichungen. de Gruyter, third edition, 2008.

[Heu86] H. Heuser. Lehrbuch der Analysis. Teil 2. Teubner, third edition, 1986.

[HNW09] E. Hairer, S. P. Nørsett, and G. Wanner. Solving ordinary differential equations
I. Nonstiff problems, volume 8 of Springer Series in Computational Mathematics.
Springer, Berlin, second edition, 2009.

[HW10] E. Hairer and G. Wanner. Solving ordinary differential equations II. Stiff and
differential-algebraic problems, volume 14 of Springer Series in Computational
Mathematics. Springer-Verlag, Berlin, second edition, 2010.

[NW06] Jorge Nocedal and Stephen J. Wright. Numerical optimization. Springer Series
in Operations Research and Financial Engineering. Springer, New York, second
edition, 2006.

[Ran17a] R. Rannacher. Numerik 0: Einführung in die Numerische Mathematik. Heidel-


berg University Publishing, 2017. DOI: 10.17885/heiup.206.281.

[Ran17b] R. Rannacher. Numerik 1: Numerik gewöhnlicher Differentialgleichungen. Hei-


delberg University Publishing, 2017. DOI: 10.17885/heiup.258.342.

[Run95] C. Runge. Über die numerische Auflösung von Differentialgleichungen. Math.


Ann., 46:167–178, 1895.

