
Part I

Initial Value Problems

Chapter 2
Introduction to Initial Value Problems

In calculus and physics we encounter initial value problems although this terminology
may not be used. For example, in calculus a standard problem is to determine the
amount of radioactive material remaining after a fixed time if the initial mass of
the material is known along with the fraction of the material which will decay at
any instant. In the typical radioactive decay model the rate of change of the mass
(i.e., its first derivative with respect to time) of the radioactive material is assumed
proportional to the amount of material present at that instant; the proportionality
constant is the given decay rate which is negative since the mass is decreasing. Thus
a first order differential equation for the mass is given along with the initial mass
of the object. This model is similar to one to describe population growth except in
this case the proportionality constant is positive.
Recall from physics that Newton’s second law of motion states that the force ap-
plied to an object equals its mass times its acceleration. If we have a function which
denotes the position of the object at any time, then its first derivative with respect
to time is the velocity of the object and the second derivative is the acceleration.
Consequently, Newton’s second law is a second order ODE for the displacement and
if we specify the initial location and velocity of the object we have a second order
initial value problem.
To be more precise, an initial value problem (IVP) for an unknown function y(t)
consists of an ordinary differential equation (ODE) for y(t) and one or more auxiliary
conditions specified at the same value of t. Here the unknown is only a function of
one independent variable (t) so differentiation of y(t) involves standard derivatives,
not partial derivatives. The ODE specifies how the unknown changes with respect
to the independent variable which we refer to as time but it could also represent an
x location, etc. Recall that the order of a differential equation refers to the highest derivative occurring in the equation; e.g., a second order ODE for y(t) must include a y''(t) term but may or may not include a y'(t) term. The number of additional


conditions corresponds to the order of the differential equation; for example, for a
first order ODE we specify the value of y(t) at an initial time t_0 and for a second order ODE we specify the value of both y(t) and y'(t) at t_0. For obvious reasons,
these extra conditions are called initial conditions. The goal is to determine the
value of y(t) for subsequent times, t0 < t ≤ T where T denotes a final time.
We begin this chapter by providing a few applications where IVPs arise. These
problems provide examples where an exact solution is known so they can be used to
test our numerical schemes. Instead of analyzing each IVP, we write a generic first
order IVP which serves as our prototype problem for describing various approaches
for approximating a solution to the IVP. We provide conditions which guarantee
that the IVP has a unique solution and that the solution varies a small amount
when the initial conditions are perturbed by a small amount.
Once we have specified our prototype first order IVP we introduce the idea
of approximating its solution using a difference equation. In general, we have to
give up the notion of finding an analytic solution which gives an expression for the
solution at any time and instead find a discrete solution which is an approximation
to the exact solution at a set of finite times. The basic idea is that we discretize
our domain, in this case a time interval, and then derive a difference equation which
approximates the differential equation in some sense. The difference equation is in
terms of a discrete function and only involves differences in the function values; that
is, it does not contain any derivatives. Our hope is that as the difference equation
is imposed at more and more points (which must be chosen in a uniform manner)
then its solution approaches the exact solution to the IVP.
One might ask why we only consider a prototype equation for a first order IVP
when many IVPs include higher order equations. At the end of this chapter we
briefly show how a higher order IVP can be converted into a system of first order
IVPs.
In Chapter 3 we derive the Euler methods which are the simplest numerical meth-
ods for approximating the solution to a first order initial value problem. Because
the methods are simple, we can easily derive them plus give graphical interpreta-
tions to gain intuition about approximations. Once we analyze the errors made in
replacing the continuous differential equation by a difference equation, we see that
the methods only converge linearly which is quite slow. This is the motivation for
looking at more accurate methods in the following chapter. We look at several numerical examples and verify the linear convergence of the methods, and we see that in certain situations one of the methods tends to oscillate and even “blow up”
while the other always provides reliable results. This motivates us to study the
numerical stability of methods.
In Chapter 4 we provide a survey of numerical schemes for solving our prototype
IVP. In particular, we present two classes of methods. The first class consists of single step methods, which use the solution at the previous time along with approximations at some intermediate time points to approximate the solution at the next time level. The second class is called multistep methods; these use the calculated approximations at several previous times to approximate
the solution at the next time level. In particular, we look at Runge-Kutta methods
(single step) and multistep methods in detail. Other topics included in this chapter

are predictor-corrector methods and extrapolation methods.


For many real-world applications a mathematical model involves several unknown
functions which are inter-related. Typically there is a differential equation for each
unknown which involves some or all of the other unknowns. If each differential
equation is first order and the initial value of each unknown is given, then we have
a system of first order IVPs. Systems are discussed in Chapter 5. Also in this
chapter we briefly consider adaptive time stepping methods. For simplicity, in all
the previous work we assume that the approximations are generated at evenly spaced
points in time. Of course, in practice this can be very inefficient because there may be times when the solution changes rapidly, so small time increments are needed, and other instances when the solution varies slowly, so that a larger time increment suffices.

2.1 Examples of IVPs


For the simplest type of IVP we have a first order ODE with one initial condition
which is the value of the unknown at the initial time. An example of such an IVP
is an exponential model for population growth/decay where we assume the rate of
growth or decay of the population p(t) is proportional to the amount present at any
time t, i.e., p'(t) = r p(t) for some constant r, and we know the initial population, i.e., p(0) = p_0. Thus, the IVP for the exponential growth model with growth rate r is given by
\[
p'(t) = r\, p(t), \quad t > 0, \qquad p(0) = p_0. \tag{2.1}
\]
If r < 0 then the differential equation models exponential decay as in the example of
radioactive decay. If r > 0 then the differential equation models exponential growth
as in the case of bacteria growth in a large petri dish. Why is this growth/decay
called exponential? To answer this question we solve the differential equation analytically to get the population at any time t as p(t) = p_0 e^{rt}, which says the population behaves exponentially. This solution can be verified by demonstrating that it satisfies the differential equation p'(t) = r p(t) and the initial condition; we have
\[
p(t) = p_0 e^{rt} \;\Rightarrow\; p'(t) = r p_0 e^{rt} = r p(t)
\]
and p(0) = p_0. In § 2.2 we see how this analytic solution is obtained.
This exponential model makes sense for applications such as bacteria growth in
an area not confined by space because the model assumes there is an endless supply
of resources and no predators. A more realistic population model is called logistic
growth where a condition is imposed which gives a carrying capacity of the system;
i.e., the population is not allowed to grow larger than some prescribed value. When
the population is considerably below this threshold the two models produce similar
results but for larger values of time logistic growth does not allow unbounded growth
as the exponential model does. The logistic model we consider restricts the growth
rate in the following way:
\[
r = r_0 \left( 1 - \frac{p}{K} \right), \tag{2.2}
\]

where K is the maximum allowable population and r0 is a given growth rate for
small values of the population. As the population p increases to near the threshold
value K then p/K becomes close to one (but less than one) and so the term
(1 − p/K) is positive but approaches zero as p approaches K. Thus the growth
rate decreases because of fewer resources; the limiting value is when p = K and the
growth rate is zero. However when p is small compared with K, the term (1−p/K)
is near one and the model behaves like exponential growth with a rate of r ≈ r0 .
Assuming the rate of change of the population is proportional to the current population with proportionality constant (2.2), the differential equation becomes
\[
p'(t) = r_0 \left( 1 - \frac{p(t)}{K} \right) p(t) = r_0\, p(t) - \frac{r_0}{K}\, p^2(t) \tag{2.3}
\]
along with p(0) = p_0. This equation is nonlinear in the unknown p(t) due to the p²(t) term and is more difficult to solve than the exponential growth equation. However, it can be shown that the solution is
\[
p(t) = \frac{K p_0}{(K - p_0)\, e^{-r_0 t} + p_0}. \tag{2.4}
\]

This can be verified by substitution into the differential equation and verification
of the initial condition p(0) = p_0. We expect that lim_{t→∞} p(t) should give the threshold value K. Clearly this is true because
\[
\lim_{t\to\infty} p(t) = K p_0 \lim_{t\to\infty} \frac{1}{(K - p_0)\, e^{-r_0 t} + p_0} = K p_0 \cdot \frac{1}{p_0} = K,
\]
where we have used the fact that lim_{t→∞} e^{−r_0 t} = 0 for r_0 > 0.

Example 2.1. exponential and logistic growth


Suppose we have bacteria growing in a petri dish with an initial count of 1000 bacteria.
The number of bacteria after an hour is counted and an estimate for the initial hourly
growth rate is determined to be 0.294; thus for the exponential model r = 0.294 and for
the logistic model we set r0 to this value. We assume that the carrying capacity, i.e., the
maximum number of bacteria the petri dish can sustain, is 10,000. Write the IVPs for
each model, give their analytic solution and compare the results graphically from the two
models.
Let p_e(t) and p_ℓ(t) represent the solution at any time t to the exponential growth model and the logistic growth model, respectively. Using (2.1) and (2.3), the IVPs for
each model are given by
\[
\text{exponential growth model:} \qquad p_e'(t) = 0.294\, p_e(t), \qquad p_e(0) = 10^3
\]
and
\[
\text{logistic growth model:} \qquad p_\ell'(t) = 0.294\, p_\ell(t) - \frac{0.294}{10^4}\, p_\ell^2(t), \qquad p_\ell(0) = 10^3.
\]

The analytic solutions are
\[
p_e(t) = 10^3 e^{0.294 t}, \qquad p_\ell(t) = \frac{10^7}{9000\, e^{-0.294 t} + 1000},
\]

where we have used (2.4) to get p_ℓ(t). To compare the results, we plot the exact solutions to the IVPs as well as giving a table of the bacteria population (truncated to a whole number) from each model for a range of hours. As we see from the plots and the table, the predicted size of the bacteria colony is close for small values of t, but as the time increases the exponential model predicts boundless growth whereas the logistic model predicts a population which never exceeds the carrying capacity of 10,000.
[Figure: the exponential and logistic solutions plotted together; the exponential growth curve passes the carrying capacity line at 10,000 while the logistic growth curve levels off beneath it.]

t        0      1      2      4      6      8      10     15     20
p_e(t)   1000   1341   1800   3241   5835   10506  18915  82269  357809
p_ℓ(t)   1000   1297   1666   2647   3933   5386   6776   9013   9754
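The table entries can be reproduced directly from the two analytic formulas; below is a minimal Python sketch (assuming NumPy; the helper names p_exp and p_log are ours, not from the text). Note the printed values are rounded rather than truncated, so an entry may differ by one in the last digit from the table.

```python
import numpy as np

def p_exp(t, p0=1000.0, r=0.294):
    # Exponential model: p(t) = p0 * e^(r t)
    return p0 * np.exp(r * t)

def p_log(t, p0=1000.0, r0=0.294, K=10000.0):
    # Logistic model, equation (2.4): p(t) = K p0 / ((K - p0) e^(-r0 t) + p0)
    return K * p0 / ((K - p0) * np.exp(-r0 * t) + p0)

for t in [0, 1, 2, 4, 6, 8, 10, 15, 20]:
    print(f"t = {t:2d}   exponential: {p_exp(t):8.0f}   logistic: {p_log(t):6.0f}")
```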

Instead of a first order ODE in the IVP we might have a higher order equation
such as the harmonic oscillator equation. This models an object attached to a wall
with a spring and the unknown function is the displacement of the object from the
wall at any time. At the initial time t = 0 the system is in equilibrium so the
displacement y(t) is zero. This is illustrated pictorially in the figure on the left
below. The figure on the right illustrates the situation where at a later time t the
object has moved to the right an amount y(t) so the spring is stretched.

[Figure: left, the spring-mass system at equilibrium with initial displacement y = 0; right, the object of mass m moved to the right by y(t) so that the spring is stretched.]

Using basic laws of physics the second order ODE which models this spring-mass
system is
\[
m\, y''(t) = -k\, y(t) - c\, y'(t),
\]

where m is the mass of the object, k is the spring constant in Hooke's law (force = −spring constant × displacement), and c is the coefficient of friction. Because the differential equation is second order, we must specify two initial conditions at the same instance of time; here we specify the initial displacement y(0) = 0 and the initial velocity y'(0) = ν. If there is no friction then the differential equation models a simple harmonic oscillator. In this case the IVP becomes
\[
y''(t) = -\omega^2 y(t), \qquad y(0) = 0, \qquad y'(0) = \nu, \tag{2.5}
\]
where ω = \sqrt{k/m}. Clearly sin ωt and cos ωt satisfy this differential equation, so the general solution is y(t) = C_1 sin ωt + C_2 cos ωt for constants C_1, C_2. Satisfying the initial condition y(0) = 0 implies C_2 = 0, and C_1 is determined by setting y'(t) = ω C_1 cos ωt equal to ν at t = 0; i.e., C_1 = ν/ω. Since the solution to the simple harmonic oscillator equation is y(t) = (ν/ω) sin ωt, the solution is periodic, as indicated graphically in the next example.
If the coefficient of friction is nonzero, then the equation is more difficult to solve analytically because we have to consider three cases depending on the sign of the term c² − 4ω². However, the inclusion of the friction term will not cause problems when we discretize.

Example 2.2. simple harmonic oscillator


Assume we have an object of mass m = 5 attached to a wall with a spring whose Hooke's constant is k = 0.45. Assume that y(t) denotes the displacement of the object at any time t, that the initial displacement is zero and the initial velocity is 2. Write the IVP for this problem assuming first that the coefficient of friction is zero and give the exact solution. Verify that the solution satisfies the DE and initial conditions. Plot the solution for 0 ≤ t ≤ 5.
Note that ω = \sqrt{k/m} = \sqrt{0.45/5} = \sqrt{0.09} = 0.3. The IVP is given by
\[
\text{no friction:} \qquad y''(t) = -0.09\, y(t), \qquad y(0) = 0, \qquad y'(0) = 2,
\]
whose exact solution is
\[
y(t) = \frac{2}{0.3} \sin(0.3t) \approx 6.667 \sin(0.3t).
\]
To verify that this is the exact solution we first see that it satisfies the initial conditions:
\[
y(0) = 0, \qquad y'(t) = \frac{2}{0.3}(0.3)\cos(0.3t) = 2\cos(0.3t) \;\Rightarrow\; y'(0) = 2\cos(0) = 2.
\]
To verify that it satisfies the DE we need y''(t) = −2(0.3) sin(0.3t), which can be written as
\[
y''(t) = -(0.3)^2 \frac{2}{0.3}\sin(0.3t) = -0.09\, y(t).
\]
The plot below illustrates the periodicity of the solution.

[Figure: plot of y(t) = 6.667 sin(0.3t) showing the periodic solution.]

In the above examples a single unknown function is sought but in some math-
ematical models we have more than one unknown. An example of this scenario is
a population model where there are two interacting species; this is the so-called
predator-prey model. Here the number of prey, ρ(t), is dependent on the number of predators, p(t), present. In this case we have a first order ODE for the prey and one for the predator:
\[
\rho'(t) = \left( 1 - \frac{p(t)}{\nu} \right) \rho(t), \qquad
p'(t) = -\left( 1 - \frac{\rho(t)}{\mu} \right) p(t). \tag{2.6}
\]
Note that the equations are nonlinear. These equations must be solved simultaneously because the growth/decay of the prey is dependent upon the number of predators and vice versa. An exact solution to this system is not available, but a numerical approximation to the equations is given in Chapter 5.

2.2 General First Order IVP


In § 2.1 we encountered two examples of first order IVPs. Of course there are many other examples, such as
\[
\text{I:} \quad y'(t) = \sin \pi t, \;\; 0 < t \le 4, \quad y(0) = 0; \qquad
\text{II:} \quad y'(t) + y^2(t) = t, \;\; 2 < t \le 10, \quad y(2) = 1. \tag{2.7}
\]

In the first IVP, the ODE is linear whereas in the second one the ODE is nonlinear
in the unknown. Clearly, these IVPs are special cases of the following general IVP.

General first order IVP

Given scalars t_0, y_0, T and a function f(t, y), find y(t) satisfying
\[
y'(t) = f(t, y) \quad \text{for } t_0 < t \le T \tag{2.8a}
\]
\[
y(t_0) = y_0. \tag{2.8b}
\]

Here f(t, y) is the given derivative of y(t), which we refer to as the slope, and y_0 is the known value at the initial time t_0. For example, for IVP I in (2.7) we have f(t, y) = sin πt, i.e., the slope is only a function of time, whereas in IVP II we have f(t, y) = t − y², so that the slope is a function of both t and y. The ODE in IVP I is linear in the unknown and in IVP II it is nonlinear due to the y² term, so both linear and nonlinear differential equations are included in the general equation (2.8a).
For certain choices of f (t, y) we are able to find an analytic solution to (2.8).
In the simple case when f = f(t), i.e., f is a function of t and not both t and y, we can solve the ODE exactly if we can obtain an antiderivative of f(t), i.e., if ∫ f(t) dt can be evaluated. For example, for the first IVP in (2.7) we have
\[
y'(t) = \sin \pi t \;\Rightarrow\; \int y'(t)\, dt = \int \sin \pi t\, dt \;\Rightarrow\; y(t) + C_1 = -\frac{1}{\pi} \cos \pi t + C_2
\]

and thus the general solution is y(t) = −(1/π) cos πt + C. The solution to the differential equation is not unique because C is an arbitrary constant; actually there is a family of solutions which satisfy the differential equation. To determine a unique solution we must specify y(t) at some point, such as its initial value. In IVP I in (2.7) y(0) = 0, so y(0) = −(1/π) cos 0 + C = 0, which says that the unique solution to the IVP is y(t) = −(1/π) cos πt + 1/π.
If f (t, y) is more complicated than simply a function of t then other techniques
are available to try to find the analytic solution. These techniques include methods
such as separation of variables, using an integrating factor, etc. Remember that
when we write a code to approximate the solution of the IVP (2.8) we always want
to test the code on a problem where the exact solution is known so it is useful to
know some standard approaches. The following example illustrates how the method
of separation of variables is used to solve some first order ODEs; other techniques
are explored in the exercises.

Example 2.3. separation of variables for finding the analytic solution


Consider the differential equation y'(t) = −t y(t) and find its general solution using the method of separation of variables; illustrate the family of solutions graphically. Verify

that the solution satisfies the differential equation and then impose the initial condition
y(0) = 2 to determine a unique solution to the IVP.
Because f(t, y) is a function of both y and t we cannot directly integrate the differential equation with respect to t to obtain the solution, because this would require determining ∫ t y(t) dt and y is unknown. For the technique of separation of variables we move all terms involving the unknown to the left-hand side of the equation and all terms involving the independent variable to the other side of the equation. Of course, this technique does not work for all equations but it is applicable for many. For this ODE we rewrite the equation as
\[
\frac{dy}{y} = -t\, dt \;\Rightarrow\; \int \frac{dy}{y} = -\int t\, dt
\]
so that we integrate to get the general solution
\[
\ln y + C_1 = -\frac{t^2}{2} + C_2 \;\Rightarrow\; e^{\ln y + C_1} = e^{-t^2/2 + C_2} \;\Rightarrow\; e^{C_1} y(t) = e^{-t^2/2} e^{C_2} \;\Rightarrow\; y(t) = C e^{-t^2/2}.
\]
Note that when we integrate an equation we have an arbitrary constant for each integral. Here we have specifically indicated this, but because the sum of two arbitrary constants C_1, C_2 is another arbitrary constant C, in the sequel we only give one constant. Since the general solution to this differential equation involves an arbitrary constant C there is an infinite family of solutions which satisfy the differential equation, i.e., one for each choice of C. A family of solutions is illustrated in the figure below; note that as t → ±∞ the solution approaches zero.
[Figure: the family of solutions y(t) = C e^{−t²/2} drawn for C = 2, 1, 1/2, −1/2, −1, −2.]

We can always verify that we haven't made an error in determining the solution by demonstrating that it satisfies the differential equation. Here we have
\[
y(t) = C e^{-t^2/2} \;\Rightarrow\; y'(t) = C \left( \frac{-2t}{2} \right) e^{-t^2/2} = -t \left( C e^{-t^2/2} \right) = -t\, y(t)
\]

so the equation is satisfied.


To determine a unique solution we impose the value of y(t) at some point; here we set y(0) = 2 to get the particular solution y(t) = 2 e^{−t²/2} because
\[
y(0) = 2, \quad y(t) = C e^{-t^2/2} \;\Rightarrow\; 2 = C e^0 \;\Rightarrow\; C = 2.
\]
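As a quick check on the hand computation, the same general and particular solutions can be recovered symbolically; below is a sketch assuming the SymPy library is available.

```python
import sympy as sp

t = sp.symbols('t')
y = sp.Function('y')

# Solve y'(t) = -t y(t), first in general, then with the initial condition y(0) = 2.
ode = sp.Eq(y(t).diff(t), -t * y(t))
print(sp.dsolve(ode, y(t)))                 # general solution: y(t) = C1*exp(-t**2/2)
print(sp.dsolve(ode, y(t), ics={y(0): 2}))  # particular solution: y(t) = 2*exp(-t**2/2)
```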

Even if we are unable to determine the analytic solution to (2.8), we can still
gain some qualitative understanding of the behavior of the solution. This is done by

the visualization technique of plotting the tangent line to the solution at numerous
points (t, y) and is called plotting the direction fields. Recall that the slope of the
tangent line to the solution curve is given and is just f (t, y). Mathematical software
with graphical capabilities often provide commands for automatically drawing a
direction field with arrows which are scaled to indicate the magnitude of the slope;
typically they also offer the option of drawing some solutions or streamlines. Using
direction fields to determine the behavior of the solution is illustrated in the following
example.

Example 2.4. direction fields


Draw the direction field for the ODE
\[
y'(t) = t^2 + y(t), \quad 0 < t < 4
\]
and indicate the specific solution which satisfies y(0) = 1.


At each point (t, y) we draw a short line segment with slope t² + y; this is illustrated in the figure below, where numerous streamlines have been sketched. To thread a solution through the direction field, start at a point and follow the tangents, remembering that solutions don't cross and that nearby tangent lines should be nearly the same.
To see which streamline corresponds to the solution with y(0) = 1 we locate the point (0, 1) and follow the tangents; this solution is indicated by a thick black line in the direction field plot below. If a different initial condition is imposed, then we get a different streamline.
[Figure: direction field for y'(t) = t² + y(t) with several streamlines; the solution through (0, 1) is drawn as a thick black line.]
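A direction field like the one just described is easy to draw with standard tools; the following sketch (assuming NumPy and Matplotlib) places an arrow with slope t² + y at each grid point, normalized so all arrows have the same length.

```python
import numpy as np
import matplotlib.pyplot as plt

# Grid of (t, y) points and the slope f(t, y) = t**2 + y at each one.
T, Y = np.meshgrid(np.linspace(-2, 2, 20), np.linspace(-4, 4, 20))
S = T**2 + Y

# Normalize the direction vector (1, slope) so every arrow has unit length.
N = np.sqrt(1 + S**2)
plt.quiver(T, Y, 1 / N, S / N, angles='xy')
plt.xlabel('t')
plt.ylabel('y')
plt.show()
```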

Before we discuss methods for approximating the solution of the IVP (2.8) we
first need to ask ourselves if our general IVP actually has an analytic solution, even
if we are unable to find it. We are only interested in approximating the solution to
IVPs which have a unique solution. However, even if we know that a unique solution
exists, we may still have unreliable numerical results if the solution of the IVP does
not depend continuously on the data. If this is the case, then small changes in
the data can cause large changes in the solution and thus roundoff errors in our
calculations can produce meaningless results. In this situation we say the IVP is ill-
posed or ill-conditioned, a situation we would like to avoid. Luckily, most differential
equations that arise from modeling real-world phenomena are well-posed.

The conditions that guarantee well-posedness of a solution to (2.8) are well known and are presented in Theorem 2.1. Basically the theorem requires that the derivative of y(t) (given by f(t, y)) be continuous and, moreover, this derivative is not allowed to change too quickly as y changes. A basic problem in calculus is to
determine how much a continuous function changes as the independent variables
change; clearly we would like a function to change a small amount as an indepen-
dent variable changes but this is not always the case. The concept of Lipschitz
continuity¹ gives a precise measure of this “degree of continuity”. To understand this concept first think of a linear function g(x) = ax + b and consider the effect changing x has on the dependent variable g(x). We have
\[
|g(x_1) - g(x_2)| = |a x_1 + b - (a x_2 + b)| = |a|\, |x_1 - x_2|.
\]
This says that as the independent variable x varies from x_1 to x_2, the change in the dependent variable g is governed by the slope of the line, i.e., a = g'(x).
For a general function g(x), Lipschitz continuity on an interval I requires that the magnitude of the slope of the line joining any two points x_1 and x_2 in I be bounded by a real number. Formally, a function g(x) defined on a domain D ⊂ R^1 is Lipschitz continuous on D if for any x_1 ≠ x_2 ∈ D there is a constant L such that
\[
|g(x_1) - g(x_2)| \le L |x_1 - x_2|,
\]
or equivalently
\[
\frac{|g(x_1) - g(x_2)|}{|x_1 - x_2|} \le L.
\]
Here L is called the Lipschitz constant. This condition says that we must find one constant L which works for all points in the domain. Clearly the Lipschitz constant is not unique; for example, if L = 5 works, then L = 5.1, 6, 10, 100, etc. also satisfy the condition. If g(x) is differentiable then an easy way to determine a Lipschitz constant is to find a constant L such that |g'(x)| ≤ L for all x ∈ D. The linear function g(x) = ax + b is Lipschitz continuous with L = |a| = |g'(x)|. Lipschitz continuity is a stronger condition than merely saying the function is continuous, so a Lipschitz continuous function is always continuous, but the converse is not true. For example, the function g(x) = √x is continuous on D = [0, 1] but is not Lipschitz continuous on D because g'(x) = 1/(2√x) is not bounded near the origin.
There are functions which are Lipschitz continuous but not differentiable. For example, consider the continuous function g(x) = |x| on D = [−1, 1]. Clearly it is not differentiable on D because it is not differentiable at x = 0. However, it is Lipschitz continuous with L = 1 because the magnitude of the slope of the secant line between any two points is always less than or equal to one. Consequently, Lipschitz continuity is a stronger requirement than continuity but a weaker one than differentiability.
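These two examples can be checked numerically by sampling the difference quotient; in the sketch below (our own helper, assuming NumPy) the quotient stays bounded by 1 for g(x) = |x| but grows without bound for g(x) = √x as the samples approach the origin.

```python
import numpy as np

def max_difference_quotient(g, a, b, n=10_000):
    """Largest |g(x1) - g(x2)| / |x1 - x2| over adjacent sample points in [a, b]."""
    x = np.linspace(a, b, n)
    return np.max(np.abs(np.diff(g(x))) / np.diff(x))

print(max_difference_quotient(np.abs, -1.0, 1.0))   # ~1.0: |x| is Lipschitz with L = 1
print(max_difference_quotient(np.sqrt, 0.0, 1.0))   # large: sqrt is not Lipschitz near 0
```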
For the existence and uniqueness result for (2.8), we need f(t, y) to be Lipschitz continuous in y, so we extend the above definition by simply holding t fixed. Formally, for fixed t we have that a function g(t, y) defined for y in a prescribed domain is Lipschitz continuous in the variable y if for any (t, y_1), (t, y_2) there is a constant L such that
\[
|g(t, y_1) - g(t, y_2)| \le L |y_1 - y_2|. \tag{2.9}
\]

¹Named after the German mathematician Rudolf Lipschitz (1832-1903).

We are now ready to state the theorem which guarantees existence and uniqueness of a solution to (2.8) as well as guaranteeing that the solution depends continuously on the data, i.e., that the problem is well-posed. Note that y(t) is defined on [t_0, T] whereas f(t, y) must be defined on a domain in R^2. Specifically, the first argument t is in [t_0, T] but y can be any real number, so that D = {(t, y) | t ∈ [t_0, T], y ∈ R^1}; a shorter notation for expressing D, which we employ, is D = [t_0, T] × R^1.

Theorem 2.1: Existence and uniqueness for IVP (2.8)

Let D = [t_0, T] × R^1 and assume that f(t, y) is continuous on D and is Lipschitz continuous in y on D, i.e., it satisfies (2.9). Then the IVP (2.8) has a unique solution in D and, moreover, the problem is well-posed.

In the sequel we only consider IVPs which are well-posed, that is, which have a
unique solution that depends continuously on the data.

2.3 Discretization
Even if we know that a solution to (2.8) exists for some choice of f (t, y), we may
not be able to find the closed form solution to the IVP; that is, a representation
of the solution in terms of a finite number of simple functions. Even for the sim-
plified case of f(t, y) = f(t) this is not always possible. For example, consider f(t) = sin t², which has no explicit formula for its antiderivative. In fact, a symbolic algebra software package like Mathematica gives the antiderivative of sin t² in terms of the Fresnel integral, which is represented by an infinite power series near the origin; consequently there is no closed form solution to the problem. Although there
are numerous techniques for finding the analytic solution of first order differential
equations, we are unable to easily obtain closed form analytic solutions for many
equations. When this is the case, we must turn to a numerical approximation to
the solution where we give up finding a formula for the solution at all times and
instead find an approximation at a set of distinct times. Discretization is the name
given to the process of converting a continuous problem into a form which can be
used to obtain numerical approximations.
Probably the most obvious approach to discretizing a differential equation is
to approximate the derivatives in the equation by difference quotients to obtain a
difference equation which involves only differences in function values. The solution
to the difference equation will not be a continuous function but rather a discrete
function which is defined over a finite set of points. When plotting the discrete
solution one often draws a line through the points to get a continuous curve but

remember that interpolation must be used to determine the solution at points other
than at the given grid points.
Because the difference equation is defined at a finite set of points we first
discretize the time domain [t0 , T ]; alternately, if our solution depended on the
spatial domain x instead of t we would discretize the given spatial interval. For
now we use N + 1 evenly spaced points t_n, n = 0, 1, 2, ..., N:
\[
t_1 = t_0 + \Delta t, \quad t_2 = t_1 + \Delta t, \quad \cdots, \quad t_N = t_{N-1} + \Delta t = T,
\]
where ∆t = (T − t_0)/N is called the step size or time step. This is illustrated below for the time domain [0, T] with N = 9.

[Figure: a timeline from t_0 = 0 to t_9 = T marked with the evenly spaced points t_0, t_1, ..., t_9, each a distance ∆t apart.]
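In code the uniform grid is one line; a sketch of the N = 9 example (assuming NumPy, and taking t_0 = 0, T = 1 arbitrarily):

```python
import numpy as np

t0, T, N = 0.0, 1.0, 9
dt = (T - t0) / N               # uniform step size
t = t0 + dt * np.arange(N + 1)  # grid points t_0, t_1, ..., t_N = T
print(dt, t)
```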

The strategy to approximate the solution to an IVP at each point t_n ∈ (t_0, T] is to use the given initial information at t_0, i.e., y_0 and the slope evaluated at t_0 (i.e., f(t_0, y_0)), to get an approximation at t_1. Then the information at t_1, and possibly t_0, is used to get an approximation at t_2. This process is continued until we have
an approximation at the final time t = T . Each method that we consider has a
different way to approximate the solution at the next time step. In the next chapter
we consider the simplest methods, the forward and backward Euler methods, and in
the following chapter we survey classes of methods for approximating the solution
of the IVP (2.8).
Because the approximation is a discrete function that is only known at each t_n, n = 0, 1, ..., N, we need different notation from the one used for the exact solution to the IVP. In the sequel we use lowercase letters for the solution to the IVP (such as y(t), p(t), etc.) and capital letters for the approximation. Because the approximate solution is only defined at each t_n, n = 0, 1, ..., N, we use a superscript to denote the particular value of t_n; for example, Y^2 ≈ y(t_2) and Y^n ≈ y(t_n). We know the given solution y_0 at t_0 so we set Y^0 = y_0.
Once we have an approximation for a fixed value of ∆t, how do we know that
our numerical results are accurate? For example, in the left plot in Figure 2.1 we
graph an exact solution (the continuous curve) to a specific IVP and a discrete
approximation for ∆t = 0.5. The approximate solution is plotted only at the points
where it is determined. Although the approximate solution has the same general
shape as the exact solution, from this plot we are unable to say if our discrete
solution is correct. If we obtain additional approximations as the uniform time step
is reduced, then the plot on the right in Figure 2.1 suggests that the approximate
solution approaches the exact solution in some sense as ∆t → 0. However, this
does not confirm that the numerical results are correct. The reason for this is that
different methods produce approximations which converge to the exact solution
more rapidly than other methods. The only way to confirm that the results are
correct is to compare the numerical rate of convergence with the theoretical rate of
convergence for a problem where the exact solution is known. Consequently when we

learn new methods we prove or merely state the theoretical rate of convergence. In
§ 1.2.3 the approach for determining the numerical rate of convergence is explained.

Figure 2.1: The exact solution to an IVP is shown as a solid curve. In the figure
on the left a discrete solution using ∆t = 0.5 is plotted. From this plot, it is not
possible to say that the discrete solution is approaching the exact solution. However,
in the figure on the right the discrete solutions for ∆t = 0.5, 0.25, 0.125, and 0.0625
are plotted. From this figure, the discrete approximations appear to be approaching
the exact solution as ∆t decreases.

When we implement a numerical method on a computer the error we make is due


to both roundoff and discretization error. Rounding error is due to using a computer
which has finite precision. First of all, we may not be able to represent a number
exactly; this is part of roundoff error and is usually called representation error. Even
if we use numbers which can be represented exactly on the computer, we encounter
rounding errors when these numbers are manipulated such as when we divide two
integers like 3 and 7. In some problems, roundoff error can accumulate in such a
way as to make our results meaningless. Discretization error is caused by replacing
the continuous problem with a discrete problem. For example, a discretization error
results when we replace y'(t_{n+1}) by the difference quotient (y(t_{n+1}) − y(t_n))/∆t.
We have control over discretization error by choosing a method which approximates
quantities like the derivative more accurately. Our main control over roundoff error
is the computer precision we use in the implementation.
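A tiny illustration of representation error in Python: the decimal 0.1 has no exact binary floating-point representation, so even simple sums drift from their exact values.

```python
print(0.1 + 0.2 == 0.3)              # False: both sides carry representation error
print(sum(0.1 for _ in range(10)))   # 0.9999999999999999 rather than 1.0
```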

2.4 Higher Order IVPs


The general IVP (2.8) contains a first order ODE and in the next two chapters
we look at methods for approximating its solution. What do we do if we have a
higher order IVP such as the harmonic oscillator equation given in Example 2.2?
The answer is we write the higher order IVP as a system of first order IVPs. Then
the methods we learn to approximate the solution of the IVP (2.8) can be applied
directly; this is explored in Chapter 5.
Suppose we have the second order IVP
\[
y''(t) = 2y'(t) - \sin(\pi y) + 4t, \quad 0 < t \le 2, \qquad y(0) = 1, \qquad y'(0) = 0,
\]

where now the right-hand side is a function of t, y and y'. The methods we learn in Chapter 3 and Chapter 4 only apply to first order IVPs. However, we can easily convert this second order IVP into two coupled first order IVPs. To do this, we let w_1(t) = y(t), w_2(t) = y'(t) and substitute into the equations and initial conditions to get a first order system for w_1, w_2:
\[
w_1'(t) = w_2(t), \quad 0 < t \le 2, \qquad
w_2'(t) = 2 w_2(t) - \sin(\pi w_1) + 4t, \quad 0 < t \le 2,
\]
\[
w_1(0) = 1, \qquad w_2(0) = 0.
\]
Note that these two differential equations are coupled, that is, the differential equa-
tion for w1 depends on w2 and the equation for w2 depends on w1 .
In general, if we have the pth order IVP for y(t)
\[
y^{(p)}(t) = f\big(t, y, y', y'', \ldots, y^{(p-1)}\big), \quad t_0 < t \le T,
\]
\[
y(t_0) = \alpha_1, \quad y'(t_0) = \alpha_2, \quad y''(t_0) = \alpha_3, \quad \ldots, \quad y^{(p-1)}(t_0) = \alpha_p,
\]
then we convert it to a system of p first order IVPs by letting w_1(t) = y(t), w_2(t) = y'(t), ..., w_p(t) = y^{(p−1)}(t), which yields the first order coupled system
\[
w_1'(t) = w_2(t), \quad w_2'(t) = w_3(t), \quad \ldots, \quad w_{p-1}'(t) = w_p(t), \quad w_p'(t) = f(t, w_1, w_2, \ldots, w_p) \tag{2.10}
\]
along with the initial conditions w_k(t_0) = α_k, k = 1, 2, ..., p. Thus any higher order
IVP that we encounter can be transformed into a coupled system of first order IVPs.

Example 2.5. converting a high order ivp into a system


Write the fourth order IVP
\[
y^{(4)}(t) + 2y''(t) + 4y(t) = 5, \qquad y(0) = 1, \quad y'(0) = -3, \quad y''(0) = 0, \quad y'''(0) = 2
\]
as a system of first order equations.
We want four first order differential equations for w_i(t), i = 1, 2, 3, 4; to this end let w_1 = y, w_2 = y', w_3 = y'', and w_4 = y'''. The first two expressions give w_1' = w_2, the second and third give w_2' = w_3, the third and fourth give w_3' = w_4, and the original differential equation provides the last first order equation, w_4' + 2w_3 + 4w_1 = 5. The system of equations is thus
\[
w_1'(t) - w_2(t) = 0, \quad w_2'(t) - w_3(t) = 0, \quad w_3'(t) - w_4(t) = 0, \quad w_4'(t) + 2w_3(t) + 4w_1(t) = 5
\]
along with the initial conditions
\[
w_1(0) = 1, \quad w_2(0) = -3, \quad w_3(0) = 0, \quad w_4(0) = 2.
\]
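In code the converted system is just a function returning the vector of right-hand sides, which is the form the numerical methods of later chapters consume; a sketch for Example 2.5 (assuming NumPy; the name f and the component ordering are ours):

```python
import numpy as np

def f(t, w):
    # w = (w1, w2, w3, w4) = (y, y', y'', y''')
    w1, w2, w3, w4 = w
    return np.array([w2, w3, w4, 5.0 - 2.0 * w3 - 4.0 * w1])

w0 = np.array([1.0, -3.0, 0.0, 2.0])  # initial conditions y(0), y'(0), y''(0), y'''(0)
```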
Chapter 3
The Euler Methods

There are many approaches to deriving discrete methods for the general first order IVP (2.8), but the simplest methods use the slope of a secant line to approximate the derivative in (2.8a). In this chapter we consider two methods for solving the prototype IVP (2.8) which are obtained by using this approximation to y'(t) at two different values of t. We see that the two methods require very different implementations and have different stability properties.
We begin this chapter with the forward Euler method which is described by a
simple formula but also has a graphical interpretation. Numerical examples which
demonstrate how the method is applied by hand are provided before a computer
implementation is discussed. The discretization error for forward Euler is derived in
detail by first obtaining a local truncation error which is caused by approximating
y 0 (t) and then obtaining the global error which is due to the local truncation error
and an accumulated error over the time steps. We see that the global error is
one order of magnitude less than the local truncation error which is the typical
relationship we see for the methods described here. A computer implementation
of the forward Euler method is given and several examples demonstrate that the
numerical results agree with the theoretical rate of convergence. However, for one example and certain choices of step size, the forward Euler method produces results which oscillate.
The second method considered here is the backward Euler method which has
strikingly different properties than the forward Euler method. The implementation
for the prototype IVP (2.8) typically requires solving a nonlinear equation at each
time step, compared with a linear equation for the forward Euler method. However, it does not produce the unreliable results that the forward Euler method does for some problems and some choices of the time step. Lastly we briefly discuss the concepts of
consistency, stability and convergence of numerical methods to begin to understand
why numerical methods may produce oscillating or unbounded results.


3.1 Forward Euler Method


The forward Euler method is probably the best known and simplest method for approximating the solution to the IVP (2.8). This method was named after Leonhard Euler¹ and is often just referred to as “the Euler method.” There are many ways to derive the method, but in this chapter we use the slope of the secant line between (t_n, y(t_n)) and (t_{n+1}, y(t_{n+1})) to approximate y'(t_n). Because we use a secant line approximation, this gives us a graphical interpretation of the method. In this section we explore an implementation of the algorithm, demonstrate that the method converges linearly and provide numerical results.

3.1.1 Derivation and Graphical Interpretation


The forward Euler method can be derived from several different viewpoints, some
of which we explore in this and the next chapter. The simplest approach is just to
use a secant line to approximate the derivative y'(t). Recall that the definition of y'(t) is
\[
y'(t) = \lim_{\Delta t \to 0} \frac{y(t + \Delta t) - y(t)}{\Delta t},
\]
so if ∆t is small, so that the point (t + ∆t, y(t + ∆t)) is close to (t, y(t)), then the slope of the secant line is a good approximation to the slope f(t, y) of the tangent line at (t, y(t)). In the figure below we compare the secant line approximation to the actual slope y'(t) for two different choices of ∆t; here ∆t_2 = ∆t_1/3, so the slope of the secant line joining (t, y(t)) and (t + ∆t_2, y(t + ∆t_2)) (represented by the cyan line) is much closer to y'(t) (tangent line represented in red) than the secant line joining (t, y(t)) and (t + ∆t_1, y(t + ∆t_1)) (represented by the blue line). Of course, how small we require ∆t to be also depends on how rapidly y'(t) is changing.

[Figure: the tangent line at (t, y(t)) with slope y'(t), together with the secant lines through (t, y(t)) and (t + ∆t_1, y(t + ∆t_1)) and through (t, y(t)) and (t + ∆t_2, y(t + ∆t_2)); the secant line with the smaller step ∆t_2 lies closer to the tangent line.]

¹Euler (1707-1783) was a Swiss mathematician and physicist.


 
We know that the secant line joining the points t, y(t) and t + ∆t, y(t + ∆t)
is an approximation to y 0 (t), i.e.,
y(t + ∆t) − y(t)
y 0 (t) ≈ (3.1)
∆t
and if ∆t is small then we expect this difference quotient to be a good approximation.
If we set t = tn in (3.1) and y(tn+1 ) = y(tn + ∆t) then

y(tn+1 ) − y(tn ) ≈ ∆ty 0 (tn ) ⇒ y(tn+1 ) ≈ y(tn ) + ∆tf tn , y(tn ) ,




where we have used the differential equation y 0 (tn ) = f tn , y(tn ) . This suggests


the following numerical method for the solution of (2.8), called the forward Euler method, where we denote the approximate solution at t_n by Y^n and set Y^0 = y(t_0).

Forward Euler:
\[
Y^{n+1} = Y^n + \Delta t\, f(t_n, Y^n), \quad n = 0, 1, 2, \ldots, N-1 \tag{3.2}
\]

The term “forward” is used in the name because we write the equation at the point t_n and difference forward in time to t_{n+1}; this implies that the given slope f is evaluated at the known point (t_n, Y^n).
To implement the method we know that Y^0 = y_0 is the solution at t_0, so we can evaluate the known slope there, i.e., f(t_0, Y^0). Then the solution at t_1 is given by
\[
Y^1 = Y^0 + \Delta t\, f(t_0, Y^0).
\]
For the next step, we know Y^1 ≈ y(t_1) and so we must evaluate the slope at (t_1, Y^1) to get
\[
Y^2 = Y^1 + \Delta t\, f(t_1, Y^1).
\]
The procedure is continued until the desired time is reached.
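This procedure translates directly into a short loop; below is a minimal Python sketch (the function name forward_euler is ours) of (3.2) with N uniform steps.

```python
def forward_euler(f, t0, y0, T, N):
    """Approximate the solution of y' = f(t, y), y(t0) = y0 on [t0, T]
    using N uniform steps of the forward Euler method (3.2)."""
    dt = (T - t0) / N
    t, Y = t0, y0
    history = [(t, Y)]
    for _ in range(N):
        Y = Y + dt * f(t, Y)  # Y^{n+1} = Y^n + dt * f(t_n, Y^n)
        t = t + dt
        history.append((t, Y))
    return history
```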
A graphical interpretation of the forward Euler method is shown in the figure below. To start the method, we write the tangent line to the solution curve at (t_0, y_0) = (t_0, Y^0), which has slope y'(t_0) = f(t_0, Y^0); the equation is
\[
w(t) - Y^0 = f(t_0, Y^0)(t - t_0).
\]
If Y^1 denotes the point on this line corresponding to t = t_1 then Y^1 − Y^0 = f(t_0, Y^0)(t_1 − t_0) = ∆t f(t_0, Y^0), which is just Euler's equation for the approximation Y^1 to y(t_1). Now for the second step we don't have a point on the exact solution curve to compute the tangent line, but if ∆t is small, then f(t_1, Y^1) ≈ f(t_1, y(t_1)) = y'(t_1). So we write the equation of the line passing through (t_1, Y^1) with slope f(t_1, Y^1) and evaluate it at t_2 to get Y^2 − Y^1 = ∆t f(t_1, Y^1), which again is just the formula for Y^2 from (3.2). It is important to realize that at the second step we do not have the exact slope f(t_1, y(t_1)) of the tangent line to the solution curve but rather the approximation f(t_1, Y^1). This is true for all subsequent steps.

[Figure: the forward Euler points (t_0, Y^0), (t_1, Y^1), (t_2, Y^2), (t_3, Y^3), each generated by following the line through (t_n, Y^n) with slope f(t_n, Y^n), plotted against the exact solution y(t) through (t_0, y_0), (t_1, y(t_1)), (t_2, y(t_2)), (t_3, y(t_3)).]

In the following example we implement two steps of the forward Euler method
by hand and then demonstrate how the approximation relates to the tangent line
of the solution curve.

Example 3.1. forward euler method


Apply the forward Euler method to the IVP
\[
y'(t) = -2y, \quad 0 < t \le T, \qquad y(0) = 2
\]
to approximate the solution at t = 0.2 using a step size of ∆t = 0.1. Then calculate the error at t = 0.2 given the exact solution y(t) = 2e^{−2t}. Does the point (t_1, Y^1) lie on the tangent line to y(t) at t = t_0? Does the point (t_2, Y^2) lie on the tangent line to y(t) at t = t_1? Justify your answer.
To find the approximation at t = 0.2 using a time step of ∆t = 0.1, we first have to apply Euler's method to determine an approximation at t = 0.1 and then use this to approximate the solution at t = 0.2. From the initial condition we set Y^0 = 2 and from the differential equation we have f(t, y) = −2y. Applying the forward Euler method gives
\[
Y^1 = Y^0 + \Delta t\, f(t_0, Y^0) = 2 + 0.1 f(0, 2) = 2 + 0.1(-4) = 1.6
\]
and thus
\[
Y^2 = Y^1 + \Delta t\, f(t_1, Y^1) = 1.6 + 0.1 f(0.1, 1.6) = 1.6 - 0.32 = 1.28.
\]
The exact solution at t = 0.2 is y(0.2) = 2e^{−0.4} ≈ 1.34064, so the error is |1.34064 − 1.28| = 0.06064.
The equation of the tangent line to y(t) at t = 0 passes through the point (0, 2) and has slope y'(0) = −4. Thus the equation of the tangent line is w − 2 = −4(t − 0), and at t = 0.1 we have w = 2 − 0.4 = 1.6, which is Y^1; so the point (0.1, Y^1) is on the tangent line to the solution curve at t = 0. This is to be expected from the graphical interpretation of the forward Euler method. However, the point (t_2, Y^2) is not on the tangent line to the solution curve y(t) at t = 0.1 but rather on a line passing through the point (t_1, Y^1) with slope f(t_1, Y^1). The actual slope of the tangent line to the solution curve at (t_1, y(t_1)) is −2y(t_1) = −2(2e^{−2(0.1)}) = −4e^{−0.2} ≈ −3.2749, whereas we approximate this by f(t_1, Y^1) = −3.2.
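The two steps computed by hand above can be reproduced with the forward_euler sketch given earlier:

```python
# f(t, y) = -2y, Y^0 = 2, two steps of size dt = 0.1
steps = forward_euler(lambda t, y: -2.0 * y, t0=0.0, y0=2.0, T=0.2, N=2)
print(steps)  # [(0.0, 2.0), (0.1, 1.6), (0.2, 1.28)]
```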

In the case when f(t, y) is only a function of t or is linear in y, we can get a general formula for Y^n in terms of Y^0 and ∆t so that the intermediate time steps do not have to be computed. The following example illustrates how this formula can be obtained.

Example 3.2. general solution to forward euler difference equation for a special case
Consider the IVP
\[
y'(t) = -\lambda y, \quad 0 < t \le T, \qquad y(0) = y_0,
\]
whose exact solution is y(t) = y_0 e^{−λt}. Find the general solution for Y^n in terms of Y^0 and ∆t for the forward Euler method.
For the forward Euler method we have
\[
Y^1 = Y^0 + \Delta t(-\lambda Y^0) = (1 - \lambda \Delta t) Y^0.
\]
Similarly,
\[
Y^2 = (1 - \lambda \Delta t) Y^1 = (1 - \lambda \Delta t)^2 Y^0, \qquad Y^3 = (1 - \lambda \Delta t) Y^2 = (1 - \lambda \Delta t)^3 Y^0.
\]
Continuing in this manner we see that
\[
Y^n = (1 - \lambda \Delta t)^n Y^0.
\]

In the next example we use this general formula to compare the results at a
fixed time for a range of decreasing values of the uniform time step. As can be seen
from the results, as ∆t → 0 the error in the solution tends to zero which implies
the approximate solution is converging to the exact solution.

Example 3.3. comparing numerical results as the time step is decreased


Use the general formula from Example 3.2 to approximate the solution to the IVP at t = 1 when λ = 5, y_0 = 2. Compare the results for ∆t = 1/20, 1/40, ..., 1/320 and discuss. Then determine the time step necessary to guarantee that the relative error is less than 1%.
For this problem the general solution is Y^n = 2(1 − 5∆t)^n and the exact solution is y(t) = 2e^{−5t}, so the exact solution at t = 1 is y(1) = 0.013475893998. The relative error in the table below is computed by calculating |Y^n − y(1)|/|y(1)|. For each value of ∆t, the number of time steps, the approximate solution Y^n and the magnitude of the relative error are reported.

  ∆t       n      Y^n              Relative Error
  1/20     20     6.342 × 10⁻³     0.52935
  1/40     40     9.580 × 10⁻³     0.28912
  1/80     80     1.145 × 10⁻²     0.15048
  1/160    160    1.244 × 10⁻²     0.076691
  1/320    320    1.295 × 10⁻²     0.038705

If the numerical solution is converging to the exact solution then the relative error at a fixed time should approach zero as ∆t gets smaller. As can be seen from the table, the relative errors tend to zero monotonically as ∆t is halved and, in fact, the errors are approximately halved as we decrease ∆t by half. This is indicative of linear convergence. At ∆t = 1/320 the relative error is approximately 3.87%, so for ∆t = 1/640 we expect the relative error to be approximately 1.9%, and cutting the time step again to 1/1280 should give a relative error of < 1%. To confirm this, we do the calculation Y^n = 2(1 − 5/1280)^{1280} and get a relative error of approximately 0.97%.
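The table can be regenerated from the closed form Y^n = 2(1 − 5∆t)^n of Example 3.2; a sketch:

```python
import math

y_exact = 2.0 * math.exp(-5.0)  # y(1) = 2 e^{-5} for lambda = 5, y0 = 2
for N in [20, 40, 80, 160, 320]:
    dt = 1.0 / N
    Yn = 2.0 * (1.0 - 5.0 * dt) ** N  # closed-form forward Euler value at t = 1
    rel_err = abs(Yn - y_exact) / abs(y_exact)
    print(f"{N:4d}  {Yn:.3e}  {rel_err:.5f}")
```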

3.1.2 Discretization errors


We know that when we implement a numerical method on a computer the error
we make is due to both roundoff and discretization error. Here we are mainly
concerned with discretization error and when we derive error estimates we assume
that no rounding error exists. We know that at t0 the approximate solution agrees
with the exact solution. Using forward Euler to compute an approximation at t_1 incurs an error in the approximation due to the fact that we have used the difference quotient (y(t_1) − y(t_0))/∆t to approximate y'(t_0). However, at t_2 and subsequent points the discretization error comes from two sources. The first source of error is the difference quotient approximation to y'(t) and the second is because we have started from the incorrect point, i.e., we did not start on the exact solution curve as we did in calculating Y^1. The global discretization error at a point t_n is the magnitude of the actual error, |y(t_n) − Y^n|, whereas the local truncation error or local discretization error is the error made because we solve the difference equation rather than the actual differential equation. In other words, the local truncation error at t_n measures how well the approximate solution matches the exact solution if the two solutions are the same at t_{n−1}. The figure below illustrates the two errors graphically for the forward Euler method. The plot on the left demonstrates the global error at t_2, which is just |y(t_2) − Y^2| where Y^2 is found by applying the forward Euler formula Y^2 = Y^1 + ∆t f(t_1, Y^1). The figure on the right demonstrates the local error at t_2; note that the starting point used is not (t_1, Y^1) but the exact solution (t_1, y(t_1)). From the figure we see that the local error is |y(t_2) − [y(t_1) + ∆t f(t_1, y(t_1))]|, which is just the remainder when the exact solution is plugged into the difference equation.
Calculating local truncation error

To measure the local truncation error, plug the exact solution into the difference equation and calculate the remainder.

[Figure: left, the global error at t_2, the gap between y(t_2) and Y^2 = Y^1 + ∆t f(t_1, Y^1); right, the local error at t_2, the gap between y(t_2) and Ŷ^2 = y(t_1) + ∆t f(t_1, y(t_1)).]

Our strategy for analytically determining the global error for the forward Euler method is to first quantify the local truncation error in terms of ∆t and then use this result to determine the global error. To determine a formula for the local truncation error for the forward Euler method we substitute the exact solution to (2.8a) into the difference equation (3.2) and calculate the remainder. If τ_{n+1} denotes the local truncation error at the (n + 1)st time step then
\[
\tau_{n+1} = y(t_{n+1}) - \big[ y(t_n) + \Delta t\, f\big(t_n, y(t_n)\big) \big]. \tag{3.3}
\]
In order to combine terms in (3.3) we need all terms to be evaluated at the same point (t_n, y(t_n)). The only term not at this point is the exact solution y(t_{n+1}) = y(t_n + ∆t), so we use a Taylor series with remainder (see Appendix) for this term; we have
\[
y(t_{n+1}) = y(t_n + \Delta t) = y(t_n) + \Delta t\, y'(t_n) + \frac{(\Delta t)^2}{2!} y''(\xi_n), \quad \xi_n \in (t_n, t_{n+1}).
\]
From the differential equation evaluated at t_n we have y'(t_n) = f(t_n, y(t_n)), so we substitute this into the Taylor series expansion for y(t_{n+1}). We then put the expansion into the expression (3.3) for the truncation error to yield
\[
\tau_{n+1} = \Big[ y(t_n) + \Delta t\, f\big(t_n, y(t_n)\big) + \frac{(\Delta t)^2}{2!} y''(\xi_n) \Big] - \Big[ y(t_n) + \Delta t\, f\big(t_n, y(t_n)\big) \Big] = \frac{(\Delta t)^2}{2!} y''(\xi_n).
\]
If y''(t) is bounded on [t_0, T], say |y''(t)| ≤ M, and T = t_0 + N∆t, then we have
\[
\tau = \max_{1 \le n \le N} |\tau_n| \le \frac{M}{2} (\Delta t)^2, \tag{3.4}
\]

where τ denotes the largest truncation error over all N time steps. We say that the local truncation error for Euler's method is order (∆t)², which we write as O(∆t²), and say that the rate is quadratic. This implies that the local error is proportional to the square of the step size; i.e., it is a constant times the square of the step size, which in turn says that if we compute the local error for ∆t then the local error using ∆t/2 is reduced by approximately (1/2)² = 1/4. Remember, however, that this is not the global error but rather the error made because we have used a finite difference quotient to approximate y'(t).
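The O(∆t²) behavior of the local error can be observed numerically by taking a single Euler step starting from the exact solution and comparing with the exact value; a sketch using y' = −2y, y(0) = 2 from Example 3.1, where each halving of ∆t should cut the one-step error by roughly a factor of 4:

```python
import math

f = lambda t, y: -2.0 * y                # slope function
y = lambda t: 2.0 * math.exp(-2.0 * t)   # exact solution

for dt in [0.1, 0.05, 0.025, 0.0125]:
    # one forward Euler step starting from the exact value y(0)
    one_step = y(0.0) + dt * f(0.0, y(0.0))
    print(dt, abs(y(dt) - one_step))
```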
We now turn to estimating the global error in the forward Euler method. We
should expect to only be able to find an upper bound for the error because if we
can find a formula for the exact error, then we can calculate this and add it to
the approximation to get the exact solution. The proof for the global error for the
forward Euler method is a bit technical but it is the only global error estimate that
we derive because the methods we consider follow the same relationship between
the local and global error as the Euler method.
Our goal is to demonstrate that the global discretization error for the forward Euler method is O(∆t), which says that the method is first order, i.e., linear in ∆t. At each step we make a local error of O(∆t²) due to approximating the derivative in the differential equation; at each fixed time we have the accumulated errors of all previous steps and we want to demonstrate that this error does not exceed a constant times ∆t.
Theorem 3.1 provides a formal statement and proof for the global error of the
forward Euler method. Note that one hypothesis is that f (t, y) must be Lipschitz
continuous in y which is also assumed to guarantee existence and uniqueness of the
solution to the IVP (2.8) so it is a natural assumption. We also assume that y(t)
possesses a bounded second derivative because we need to use the local truncation
error given in (3.4); however, this condition can be relaxed but it is adequate for
our needs.

Theorem 3.1: Global error estimate for the forward Euler method

Let D = [t_0, T] × R^1 and assume that f(t, y) is continuous on D and is Lipschitz continuous in y on D, i.e., it satisfies (2.9) with Lipschitz constant L. Also assume that there is a constant M such that
\[
|y''(t)| \le M \quad \text{for all } t \in [t_0, T].
\]
Then the global error at each point t_n satisfies
\[
|y(t_n) - Y^n| \le C \Delta t \quad \text{where } C = \frac{M}{2L}\big(e^{TL} - 1\big);
\]
thus the forward Euler method converges linearly.

Proof. Let E_n represent the global discretization error at the specific time t_n, i.e., E_n = |y(t_n) − Y^n|. The steps in the proof are summarized as follows.

Step I. Use the definition of the local truncation error τ_n to demonstrate that the global error satisfies
\[
E_n \le K E_{n-1} + |\tau_n| \quad \text{for } K = 1 + \Delta t L;
\]
that is, the error at a step is bounded by a constant times the error at the previous step plus the absolute value of the local truncation error. If τ is the maximum of all |τ_n|, we have
\[
E_n \le K E_{n-1} + \tau \quad \text{for } K = 1 + \Delta t L. \tag{3.5}
\]

Step II. Apply (3.5) recursively and use the fact that E_0 = 0 to get
\[
E_n \le \tau \sum_{i=0}^{n-1} K^i. \tag{3.6}
\]

Step III. Recognize that the sum in (3.6) is a geometric series whose sum is known to get
\[
E_n \le \frac{\tau}{\Delta t L} \big[ (1 + \Delta t L)^n - 1 \big]. \tag{3.7}
\]

Step IV. Use the Taylor series expansion of e^{∆tL} near zero to bound (1 + ∆tL)^n by e^{n∆tL}, which in turn is less than e^{TL}.

Step V. Use the bound (3.4) for τ to get the final result
\[
E_n \le \frac{M \Delta t}{2L} \big( e^{TL} - 1 \big) = C \Delta t \quad \text{where } C = \frac{M}{2L}\big( e^{TL} - 1 \big). \tag{3.8}
\]
We now give the details for each step. For the first step we use the fact that
the local truncation error is the remainder when we substitute the exact solution
into the difference equation; i.e.,

$$\tau_n = y(t_n) - y(t_{n-1}) - \Delta t\, f(t_{n-1}, y(t_{n-1})) .$$
To get the desired expression for En we solve for y(tn ) in the above expression,
substitute into the definition for En and use the triangle inequality; then we use the
forward Euler scheme for Y n and Lipschitz continuity (2.9). We have
$$\begin{aligned}
E_n &= |y(t_n) - Y^n| \\
&= \big| \tau_n + y(t_{n-1}) + \Delta t\, f(t_{n-1}, y(t_{n-1})) - Y^n \big| \\
&= \big| \tau_n + y(t_{n-1}) + \Delta t\, f(t_{n-1}, y(t_{n-1})) - \big[ Y^{n-1} + \Delta t\, f(t_{n-1}, Y^{n-1}) \big] \big| \\
&\le |\tau_n| + |y(t_{n-1}) - Y^{n-1}| + \Delta t \big| f(t_{n-1}, y(t_{n-1})) - f(t_{n-1}, Y^{n-1}) \big| \\
&\le |\tau_n| + E_{n-1} + \Delta t L |y(t_{n-1}) - Y^{n-1}| = |\tau_n| + (1 + \Delta t L) E_{n-1} .
\end{aligned}$$


In the final step we have used the Lipschitz condition (2.9) which is a hypothesis
of the theorem. Since |τn | ≤ τ , we have the desired result.
For the second step we apply (3.5) recursively

$$\begin{aligned}
E_n &\le K E_{n-1} + \tau \le K [ K E_{n-2} + \tau ] + \tau = K^2 E_{n-2} + (K + 1) \tau \\
&\le K^3 E_{n-3} + (K^2 + K + 1) \tau \\
&\le \cdots \\
&\le K^n E_0 + \tau \sum_{i=0}^{n-1} K^i .
\end{aligned}$$

Because we assume for analysis that there are no roundoff errors, $E_0 = |y_0 - Y^0| = 0$ and we are left with $\tau \sum_{i=0}^{n-1} K^i$.
For the third step we simplify the sum by noting that it is a geometric series of the form $\sum_{i=0}^{n-1} a r^i$ with $a = \tau$ and $r = K$. From calculus we know that the sum is given by $a(1 - r^n)/(1 - r)$ so that if we use the fact that $K = 1 + \Delta t L$ we arrive at the result (3.7)

$$E_n \le \tau \left[ \frac{1 - K^n}{1 - K} \right] = \tau \left[ \frac{K^n - 1}{K - 1} \right] = \frac{\tau}{\Delta t L} \Big[ (1 + \Delta t L)^n - 1 \Big] .$$
To justify the fourth step we know that for real $z$ the Taylor series expansion $e^z = 1 + z + z^2/2! + \cdots$ near zero implies that $1 + z \le e^z$ so that $(1 + z)^n \le e^{nz}$. If we set $z = \Delta t L$ we have $(1 + \Delta t L)^n \le e^{n \Delta t L}$ so that

$$E_n \le \frac{\tau}{\Delta t L} \big( e^{n \Delta t L} - 1 \big) .$$
For the final step we know from the hypothesis of the theorem that $|y''(t)| \le M$ so $\tau \le M (\Delta t)^2 / 2$. Also $n$ in $E_n$ is the number of steps taken from $t_0$, so $n \Delta t = t_n - t_0 \le T$, where $T$ is the final time, and thus $e^{n \Delta t L} \le e^{TL}$. Combining these results gives the desired result (3.8).
In general, the calculation of the local truncation error is straightforward (but sometimes tedious) whereas the proof for the global error estimate is much more involved. However, for the methods we consider, if the local truncation error is $O(\Delta t)^r$ then we expect the global error to be one power of $\Delta t$ less, i.e., $O(\Delta t)^{r-1}$. When performing numerical simulations we need to demonstrate that the numerical rate of convergence agrees with the theoretical rate. This gives us confidence that the numerical scheme is implemented properly.
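Before moving to implementation, it is worth checking the estimate numerically. The short Python sketch below (the test problem $y'(t) = 0.8y(t)$, $y(0) = 2$ on $[0, 1]$ and all names are illustrative choices) compares the actual global error of the forward Euler method at $t = 1$ with the bound $C\Delta t$ of Theorem 3.1; for this problem $L = 0.8$ and $|y''(t)| = 0.64 \cdot 2 e^{0.8t} \le M = 1.28\, e^{0.8}$ on $[0, 1]$.

import math

# Compare the forward Euler global error at t = T with the bound C*dt
# from Theorem 3.1 for y' = 0.8*y, y(0) = 2 (an assumed test problem).
L, T, y0 = 0.8, 1.0, 2.0
M = 0.64 * y0 * math.exp(L * T)                # bound on |y''(t)| over [t0, T]
C = M / (2.0 * L) * (math.exp(T * L) - 1.0)    # constant from Theorem 3.1

for N in [4, 8, 16, 32]:
    dt = T / N
    y = y0
    for _ in range(N):                         # forward Euler steps
        y = y + dt * L * y                     # here f(t, y) = 0.8*y
    err = abs(y0 * math.exp(L * T) - y)        # global error at t = T
    print(f"dt = 1/{N}: error = {err:.5f}, bound C*dt = {C * dt:.5f}")

In each case the computed error lies below the bound, and halving $\Delta t$ roughly halves both, consistent with linear convergence.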

3.1.3 Numerical computations


In this section we provide some numerical simulations for IVPs of the form (2.8)
using the forward Euler method. Before providing the simulation results we discuss
the computer implementation of the forward Euler method. For the simulations
presented we choose problems with known analytic solutions so that we can compute
numerical rates of convergence and compare with the theoretical result given in
Theorem 3.1. To do this we compute approximate solutions for a sequence of step
sizes where ∆t → 0 and then compute a numerical rate of convergence using § 1.6.
As expected, the numerical rate of convergence is linear; however, we see that the
forward Euler method does not provide reliable results for all choices of ∆t for some
problems. Reasons for this failure will be discussed in § 3.4.

Computer implementation
For the computer implementation of a method we first identify what information
is problem dependent. The information which changes for each IVP (2.8) is the
interval [t0 , T ], the initial condition y0 , the given slope f (t, y) and the exact solution
if an error calculation is performed. From the examples calculated by hand, we know
that we approximate the solution at t1 , then at t2 , etc. so implementation requires
a single loop over the number of time steps, say N . However, the solution should
not be stored for all times; if it is needed for plotting, etc., then the time and
solution should be written to a file to be used later. For a single equation it does not take much storage to keep the solution at each time step, but when we encounter systems and problems in higher dimensions the storage needed to save the entire solution can become prohibitively large, so it is best not to get into the habit of storing the solution at every time.
The following pseudocode gives an outline of one approach to implement the
forward Euler method using a uniform time step.
48 CHAPTER 3. THE EULER METHODS

Algorithm 3.1 : Forward Euler Method

Define: external function for the given slope f (t, y) and for the exact solu-
tion for error calculation

Input: the initial time, t0 , the final time, T , the initial condition, y0 , and
the uniform time step ∆t
Set:
t = t0
y = y0
Time step loop:
do while t < T
m = f (t, y)
y = y + ∆t m
t = t + ∆t
output t, y
Determine error at final time t:
error = | exact(t) − y |
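The pseudocode translates almost line for line into Python. Here is a minimal sketch (the function name, guard constant, and sample problem are illustrative choices, not part of the algorithm itself):

import math

def forward_euler(f, t0, T, y0, dt):
    # Approximate y'(t) = f(t, y), y(t0) = y0 on [t0, T] with uniform step dt.
    t, y = t0, y0
    while t < T - 1e-12:          # guard against floating point overshoot
        y = y + dt * f(t, y)      # Y^{n+1} = Y^n + dt * f(t_n, Y^n)
        t = t + dt
    return t, y

# Sample usage on an exponential growth problem:
t, y = forward_euler(lambda t, p: 0.8 * p, 0.0, 1.0, 2.0, 1.0 / 16)
print(abs(2.0 * math.exp(0.8) - y))   # global error at the final time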

3.1.4 Numerical examples for the forward Euler method


We now consider examples involving the exponential and logistic growth/decay
models discussed in § 2.1. We apply the forward Euler method for each problem
and compute the global error at a fixed time; we expect that as ∆t decreases the
global error at that fixed time decreases linearly. To confirm that the numerical
approximations are valid, for each pair of successive errors we use (1.6) to calculate
the numerical rate of convergence and verify that it approaches one. For one
example we see that the forward Euler method gives numerical approximations
which oscillate and grow unexpectedly. It turns out that this is a result of numerical
instability due to too large a time step. This is investigated in the next chapter.
In the last example, we compare the local truncation error with the global error for
the forward Euler method.

Example 3.4. Using the forward Euler method for exponential growth
Consider the exponential growth problem

$$p'(t) = 0.8\, p(t), \quad 0 < t \le 1, \qquad p(0) = 2$$


whose exact solution is $p(t) = 2e^{0.8t}$. Compute approximations at $t = 1$ using a sequence of decreasing time steps for the forward Euler method and plot the exact solution and the approximations. Demonstrate that the numerical rate of convergence is linear by using the formula (1.6) and by using a graphical representation of the error. Does the time at which we compute the results affect the errors or the rates?
We compute approximations using the forward Euler method with ∆t = 1/2, 1/4, 1/8,
1/16 and plot the exact solution along with the approximations. Remember that the
approximate solution is a discrete function but we have connected the points for illustration
purposes. These results are plotted in the figure below and we can easily see that the error
is decreasing as we decrease ∆t.
[Figure: the exact solution together with the forward Euler approximations for ∆t = 1/2, 1/4, 1/8, and 1/16 on [0, 1]; the approximations lie closer to the exact solution as ∆t decreases.]

To verify that the global error is $O(\Delta t)$ we compare the discrete solution to the exact solution at the point $t = 1$ where we know that the exact solution is $2e^{0.8} = 4.45108$; we tabulate our approximations $P^n$ to $p(t)$ at $t = 1$ and the global error in the table below for $\Delta t = 1/4, 1/8, \ldots, 1/128$. By looking at the errors we see that as $\Delta t$ is halved the error is approximately halved, so this suggests linear convergence; the calculation of the numerical rate of convergence makes this result precise because we see that the sequence $\{.891, .942, .970, .985, .992\}$ tends to one. In the table the approximations and errors are given to five digits of accuracy.

∆t            1/4       1/8       1/16      1/32      1/64      1/128
P^n           4.1472    4.28718   4.36575   4.40751   4.42906   4.4400
|p(1)−P^n|    0.30388   0.16390   0.085333  0.043568  0.022017  0.011068
num. rate               0.891     0.942     0.970     0.985     0.992
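The numerical rates in the last row can be reproduced with a few lines of Python; the sketch below assumes that (1.6) is the usual log-ratio formula $\ln(E_1/E_2)/\ln(\Delta t_1/\Delta t_2)$, the same quantity that appears as the slope in the log-log plot below.

import math

# Numerical rates of convergence from successive (dt, error) pairs.
errors = [0.30388, 0.16390, 0.085333, 0.043568, 0.022017, 0.011068]
dts = [1/4, 1/8, 1/16, 1/32, 1/64, 1/128]

for e1, e2, d1, d2 in zip(errors, errors[1:], dts, dts[1:]):
    rate = math.log(e1 / e2) / math.log(d1 / d2)
    print(f"rate = {rate:.3f}")   # 0.891, 0.942, 0.970, 0.985, 0.992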

We demonstrate graphically that the convergence rate is linear by using a log-log plot. Recall that if we plot a polynomial $y = a x^r$ on a log-log scale then the slope is $r$.² Since the error is $E = C(\Delta t)^r$, if we plot the error on a log-log plot we expect the slope to be $r$, and in our case $r = 1$. This is illustrated in the log-log plot below where we compute the slope of the line from two points.

² Using the properties of logarithms we have $\log y = \log a x^r = \log a + r \log x$ which implies $Y = mX + b$ where $Y = \log y$, $X = \log x$, $m = r$ and $b = \log a$.



[Figure: log-log plot of the global error at t = 1 versus ∆t; the slope of the line computed from two points, (change in ln(error))/(change in ln(∆t)), is 0.972.]

If we tabulate the errors at a different time then we get different errors but the numerical rate should still converge to one. In the table below we demonstrate this by computing the errors and rates at $t = 0.5$; note that the error is smaller at $t = 0.5$ than at $t = 1$ for a given step size because we have not taken as many steps and so have less accumulated error.

∆t             1/4       1/8        1/16       1/32       1/64        1/128
P^n            2.8800    2.9282     2.9549     2.9690     2.9763      2.9799
|p(0.5)−P^n|   0.10365   0.055449   0.028739   0.014638   0.0073884   0.0037118
num. rate                0.902      0.948      0.973      0.986       0.993

The next example applies the forward Euler method to an IVP modeling logistic
growth. The DE is nonlinear in this case but this does not affect the implementation
of the algorithm. The results are compared to those from the previous example
modeling exponential growth.

Example 3.5. Using the forward Euler method for logistic growth.
Consider the logistic model

$$p'(t) = 0.8 \left( 1 - \frac{p(t)}{100} \right) p(t), \quad 0 < t \le 10, \qquad p(0) = 2 .$$

Implement the forward Euler scheme and demonstrate that we get linear convergence.
Compare the results from this example with those from Example 3.4 of exponential growth.
The exact solution to this problem is given by (2.4) with K = 100, r0 = 0.8, and p0 = 2.
Before generating any simulations we should think about what we expect the behavior of
this solution to be compared with the exponential growth solution in the previous example.
Initially the population should grow at the same rate because r0 = 0.8 which is the same
growth rate as in the previous example. However, the solution should not grow unbounded
but rather always stay below the carrying capacity p = 100. The approximations at t = 1
for a sequence of decreasing values of ∆t are presented below along with the calculated
numerical rates. The exact value at t = 1 is rounded to 4.3445923. Again we see that the
numerical rate approaches one.

∆t            1/4       1/8       1/16       1/32       1/64       1/128
P^n           4.0740    4.1996    4.2694     4.3063     4.3253     4.3349
|p(1)−P^n|    0.27063   0.14497   0.075179   0.038302   0.019334   0.0097136
num. rate               0.901     0.947      0.973      0.993      0.993

Below we plot the approximate solution for ∆t = 1/16 on [0, 10] for this logistic growth
problem and the approximate solution for the previous exponential growth problem. Note
that the exponential growth solution increases without bound whereas the logistic growth
solution never exceeds the carrying capacity of K = 100. Also for small time both models
give similar results.

[Figure: the approximate solutions on [0, 10] with ∆t = 1/16; the exponential growth model increases without bound while the logistic growth model levels off below the carrying capacity K = 100.]

In the next example the IVP models exponential decay with a large decay con-
stant. The example illustrates the fact that the forward Euler method can sometimes
give erroneous results.

Example 3.6. Numerically unstable computations for the forward Euler method.
In this example we consider exponential decay where the decay rate is large. Specifically,
we seek y(t) such that

$$y'(t) = -20\, y(t), \quad 0 < t \le 2, \qquad y(0) = 1$$

which has an exact solution of $y(t) = e^{-20t}$. Plot the numerical results using $\Delta t = 1/4, 1/8$ and discuss the results.
The implementation is the same as in Example 3.4 so we graph the approximate solutions on $[0, 2]$ with $\Delta t = 1/4$ and $1/8$. Note that for this problem the approximate solution oscillates and becomes unbounded.
[Figure: forward Euler approximations with ∆t = 1/4 and ∆t = 1/8 on [0, 2]; both oscillate between increasingly large positive and negative values instead of decaying.]

Why are the results for the forward Euler method not reliable for this problem whereas
they were for previous examples? The reason for this is a stability issue which we address
in § 3.4. When we determined the theoretical rate of convergence we tacitly assumed
that the method converged; however, from this example we see that it does not for these
choices of the time step.
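The onset of the oscillations is easy to reproduce. For $y' = \lambda y$ each forward Euler step multiplies the approximation by $(1 + \lambda \Delta t)$, so the iterates grow in magnitude whenever $|1 + \lambda \Delta t| > 1$; for this problem that means $\Delta t > 0.1$, anticipating the stability discussion of § 3.4. A small Python sketch (names are illustrative):

def euler_final_value(lam, dt, T):
    # Forward Euler for y' = lam*y, y(0) = 1; each step scales by (1 + lam*dt).
    y = 1.0
    for _ in range(round(T / dt)):
        y = (1.0 + lam * dt) * y
    return y

for dt in [1/4, 1/8, 1/16, 1/32]:
    print(f"dt = {dt}: Y(2) = {euler_final_value(-20.0, dt, 2.0):.3e}")
# dt = 1/4, 1/8: |1 - 20*dt| > 1 and the iterates grow without bound;
# dt = 1/16, 1/32: |1 - 20*dt| < 1 and the magnitudes decay as expected.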

We know that the local truncation error for the forward Euler method is $O(\Delta t)^2$. In the previous examples we demonstrated that the global error is $O(\Delta t)$, so in the next example we demonstrate numerically that the local truncation error is second order.

Example 3.7. Comparing the local and global errors for the forward Euler method.

We consider the IVP

$$y'(t) = \cos(t)\, e^{\sin t}, \quad 0 < t \le \pi, \qquad y(0) = 1$$

whose exact solution is $e^{\sin t}$. Demonstrate that the local truncation error for the forward Euler method is second order, i.e., $O(\Delta t)^2$, and compare the local and global errors at $t = \pi$.
The local truncation error at $t_n$ is computed from the formula

$$|y(t_n) - \widetilde{Y}^n| \quad \text{where } \widetilde{Y}^n = y(t_{n-1}) + \Delta t\, f(t_{n-1}, y(t_{n-1})) ;$$

that is, we use the exact value $y(t_{n-1})$ instead of $Y^{n-1}$ and evaluate the slope at the point $(t_{n-1}, y(t_{n-1}))$, which is on the solution curve. In the table below we tabulate
the local and global errors at $t = \pi$ using decreasing values of $\Delta t$. From the numerical rates of convergence you can clearly see that the local truncation error is $O(\Delta t)^2$, as we demonstrated analytically. As expected, the global error converges linearly. Except at the first step (where the local and global errors are identical) the global error is always larger than the truncation error because it includes the accumulated errors as well as the error made by approximating the derivative by a difference quotient.

∆t             1/8           1/16          1/32          1/64          1/128
local error    7.713×10^-3   1.947×10^-3   4.879×10^-4   1.220×10^-4   3.052×10^-5
num. rate                    1.99          2.00          2.00          2.00
global error   2.835×10^-2   1.391×10^-2   6.854×10^-3   3.366×10^-3   1.681×10^-3
num. rate                    1.03          1.02          1.02          1.00
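A short Python sketch of this local error computation (the helper names are illustrative): one forward Euler step is started on the exact solution curve and compared against the exact value one step later.

import math

f = lambda t, y: math.cos(t) * math.exp(math.sin(t))   # given slope
y_exact = lambda t: math.exp(math.sin(t))              # exact solution

def local_error(tn, dt):
    # One Euler step from the exact value at t_{n-1} = tn - dt.
    y_tilde = y_exact(tn - dt) + dt * f(tn - dt, y_exact(tn - dt))
    return abs(y_exact(tn) - y_tilde)

for k in [8, 16, 32, 64, 128]:
    print(f"dt = 1/{k}: local error = {local_error(math.pi, 1.0 / k):.3e}")
# Successive errors drop by a factor of about 4, confirming O(dt^2).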
3.2 Backward Euler Method


In the previous section we derive the forward Euler method using the quotient $\big( y(t_{n+1}) - y(t_n) \big)/\Delta t$ to approximate the derivative at $t_n$. However, we can also use this difference quotient to approximate the derivative at $t_{n+1}$. This not only gives a different method but a completely new type of method. Using this quotient to approximate $y'(t_{n+1})$ gives

$$y'(t_{n+1}) = f(t_{n+1}, y(t_{n+1})) \approx \frac{y(t_{n+1}) - y(t_n)}{\Delta t}$$

which implies

$$y(t_{n+1}) \approx y(t_n) + \Delta t\, f(t_{n+1}, y(t_{n+1})) .$$
This equation suggests the following numerical scheme for approximating the solu-
tion to y 0 (t) = f (t, y), y(t0 ) = y0 which we call the backward Euler method where
Y 0 = y0 . It is called a “backward” scheme because we are at the point tn+1 and
difference backwards to the point tn to approximate the derivative y 0 (tn+1 ).

Backward Euler: $\; Y^{n+1} = Y^n + \Delta t\, f(t_{n+1}, Y^{n+1}) , \quad n = 0, 1, 2, \ldots, N-1 \qquad$ (3.9)

What makes this method so different from the forward Euler scheme? The answer is the point where the slope $f(t, y)$ is evaluated. To see the difficulty here consider the IVP $y'(t) = t + y^2$, $y(0) = 1$. Here $f(t, y) = t + y^2$. We set $Y^0 = 1$, $\Delta t = 0.1$ and to compute $Y^1$ we have the equation

$$Y^1 = Y^0 + \Delta t\, f(t_1, Y^1) = 1 + 0.1 \big( 0.1 + (Y^1)^2 \big)$$

as opposed to the equation for $Y^1$ using forward Euler

$$Y^1 = Y^0 + \Delta t\, f(t_0, Y^0) = 1 + 0.1 \big( 0 + 1^2 \big) .$$

For forward Euler we evaluate the slope f (t, y) at known values whereas for back-
ward Euler we don’t know the y-value where the slope is evaluated. In order to
solve for Y 1 using the backward Euler scheme we must solve a nonlinear equation
except when f (t, y) is linear in y or only a function of t.
The difference between the forward and backward Euler schemes is so impor-
tant that we use this characteristic to broadly classify methods. The forward Euler
scheme given in (3.2) is called an explicit scheme because we write the unknown
explicitly in terms of known values. The backward Euler method given in (3.9)
is called an implicit scheme because the unknown is written implicitly in terms of
known values and itself. The terms explicit/implicit are used in the same manner as
explicit/implicit differentiation. In explicit differentiation a function to be differentiated is given explicitly in terms of the independent variable such as $y(t) = t^3 + \cos t$; in implicit differentiation the function is given implicitly such as $y^2 + \sin y - t^2 = 4$ and we want to compute $y'(t)$. In the exercises you get practice in identifying schemes as explicit or implicit.
The local truncation error for the backward Euler method is $O(\Delta t)^2$, the same as we obtained for the forward Euler method (see exercises). Thus we expect the global error to be one power less, i.e., $O(\Delta t)$. The numerical examples in the next section confirm this estimate.

3.2.1 Numerical examples for the backward Euler method


In general, the backward Euler method for a single IVP requires solution of a non-
linear equation; for a system of IVPs it requires solution of a nonlinear system of
equations. For this reason, the backward Euler method is not used in a straightfor-
ward manner for solving IVPs. However, when we study IBVPs we see that it is a
very popular scheme for discretizing a first order time derivative. Also in § 4.4 we
see how it is implemented in an efficient manner for IVPs using an approach called
predictor-corrector schemes. We discuss the implementation of the backward Euler
method in the following examples. In the simple case when f (t, y) is only a function
of t or it is linear in y we do not need to solve a nonlinear equation.

Example 3.8. Using the backward Euler method for exponential growth
Consider the exponential growth problem

$$p'(t) = 0.8\, p(t), \quad 0 < t \le 1, \qquad p(0) = 2$$

whose exact solution is $p(t) = 2e^{0.8t}$. In Example 3.4 we apply the forward Euler method to obtain approximations at $t = 1$ using a sequence of decreasing time steps. Repeat the calculations for the backward Euler method. Discuss implementation.
To solve this IVP using the backward Euler method we see that for $f = 0.8p$ the difference equation is linear,

$$P^{n+1} = P^n + 0.8 \Delta t\, P^{n+1} ,$$

where $P^n \approx p(t_n)$. Thus we do not need to use Newton's method for this particular problem but rather just solve the equation

$$P^{n+1} - 0.8 \Delta t\, P^{n+1} = P^n \quad \Rightarrow \quad P^{n+1} = \frac{1}{1 - 0.8 \Delta t} P^n .$$
If we have a code that uses Newton’s method it should get the same answer in one step
because it is solving a linear problem rather than a nonlinear one. The results are tabulated
below. Note that the numerical rate of convergence is also approaching one but for this
method it is approaching one from above whereas using the forward Euler scheme for
this problem the convergence was from below, i.e., through values smaller than one. The
amount of work required for the backward Euler method is essentially the same as the
forward Euler for this problem because the derivative f (t, p) is linear in the unknown p.
∆t            1/2      1/4       1/8       1/16       1/32       1/64       1/128
P^n           5.5556   4.8828    4.6461    4.5441     4.4966     4.4736     4.4623
|p(1)−P^n|    1.1045   0.43173   0.19503   0.093065   0.045498   0.022499   0.011188
num. rate              1.355     1.146     1.067      1.032      1.015      1.008
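Since the linear equation can be solved once and for all, the implementation is a one-line change from forward Euler. A minimal Python sketch (the function name and driver are illustrative):

import math

def backward_euler_exponential(dt, T=1.0, p0=2.0, r=0.8):
    # Backward Euler for p' = r*p: each step solves the linear equation
    # (1 - r*dt) P^{n+1} = P^n exactly, so no Newton iteration is needed.
    p = p0
    for _ in range(round(T / dt)):
        p = p / (1.0 - r * dt)
    return p

for k in [2, 4, 8, 16]:
    p = backward_euler_exponential(1.0 / k)
    print(f"dt = 1/{k}: P = {p:.4f}, error = {abs(2.0*math.exp(0.8) - p):.5f}")
# dt = 1/2 reproduces P = 5.5556 from the table above.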

In the previous example, the DE was linear in the unknown so it was straight-
forward to implement the backward Euler scheme and it took the same amount of
work as implementing the forward Euler scheme. However, the DE modeling logistic
growth is nonlinear and so implementation of the backward Euler scheme involves
solving a nonlinear equation. This is addressed in the next example.

Example 3.9. Using the backward Euler method for logistic growth.
In Example 3.5 we consider the logistic model

$$p'(t) = 0.8 \left( 1 - \frac{p(t)}{100} \right) p(t), \quad 0 < t \le 10, \qquad p(0) = 2 .$$
Implement the backward Euler scheme and demonstrate that we get linear convergence.
Discuss implementation of the implicit method.
To implement the backward Euler scheme for this problem we see that at each step we have the nonlinear equation

$$P^{n+1} = P^n + \Delta t\, f(t_{n+1}, P^{n+1}) = P^n + 0.8 \Delta t \left( P^{n+1} - \frac{(P^{n+1})^2}{100} \right)$$

for $P^{n+1}$. Thus to determine each $P^{n+1}$ we have to employ a method such as Newton's method. Recall that to find the root $z$ of the nonlinear equation $g(z) = 0$ (a function of one independent variable) each iteration of Newton's method is given by

$$z^k = z^{k-1} - \frac{g(z^{k-1})}{g'(z^{k-1})}$$

for the iteration counter $k = 1, 2, \ldots$ and where an initial guess $z^0$ is prescribed. For our problem, to compute the solution at $t_{n+1}$ we have the nonlinear equation

$$g(z) = z - P^n - 0.8 \Delta t \left( z - \frac{z^2}{100} \right) = 0$$

where $z = P^{n+1}$. Our goal is to approximate the value of $z$ which makes $g(z) = 0$ and this is our approximation $P^{n+1}$. For an initial guess $z^0$ we simply take $P^n$ because if $\Delta t$ is small enough and the solution is smooth then the approximation at $t_{n+1}$ is close to the solution at $t_n$. To implement Newton's method we also need the derivative $g'$, which for us is just

$$g'(z) = 1 - 0.8 \Delta t \left( 1 - \frac{z}{50} \right) .$$
The results using backward Euler are tabulated below; note that the numerical rates of convergence approach one as $\Delta t \to 0$. We have imposed the convergence criterion
for Newton's method that the normalized difference in successive iterates is less than a prescribed tolerance, i.e.,

$$\frac{|z^k - z^{k-1}|}{|z^k|} \le 10^{-8} .$$

∆t           1/4      1/8      1/16      1/32      1/64      1/128
P^n          4.714    4.514    4.426     4.384     4.364     4.354
|p(1)−P^n|   0.3699   0.1693   0.08123   0.03981   0.01971   0.009808
num. rate             1.127    1.060     1.029     1.014     1.007

In these computations, two to four Newton iterations are necessary to satisfy this convergence criterion. It is well known that Newton's method typically converges quadratically (when it converges) so we should demonstrate this. To this end, we look at the normalized difference in successive iterates. For example, for $\Delta t = 1/4$ at $t = 1$ we have the sequence $0.381966$, $4.8198 \times 10^{-3}$, $7.60327 \times 10^{-7}$, $1.9105 \times 10^{-15}$, so that the difference at one iteration is approximately the square of the difference at the previous iteration, indicating quadratic convergence.
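The structure of this computation, with the Newton iteration nested inside the time stepping loop, is shown in the Python sketch below (function and variable names are illustrative):

def backward_euler_logistic(dt, T=1.0, p0=2.0, tol=1.0e-8):
    # Backward Euler for p' = 0.8(1 - p/100)p; each step solves
    # g(z) = z - P^n - 0.8*dt*(z - z^2/100) = 0 by Newton's method.
    p = p0
    for _ in range(round(T / dt)):
        z = p                                       # initial guess z^0 = P^n
        while True:
            g = z - p - 0.8 * dt * (z - z * z / 100.0)
            gp = 1.0 - 0.8 * dt * (1.0 - z / 50.0)  # g'(z)
            z, z_old = z - g / gp, z                # Newton update
            if abs(z - z_old) <= tol * abs(z):      # convergence criterion
                break
        p = z                                       # accept P^{n+1}
    return p

print(backward_euler_logistic(1.0 / 4))             # ~4.714, matching the table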

Lastly we repeat the calculations for the exponential decay IVP in Example 3.6
using the backward Euler method. Recall that the forward Euler method produced
oscillatory results for ∆t = 1/4 and ∆t = 1/8. This example illustrates that the
backward Euler method does not produce unstable results for this problem.

Example 3.10. Backward Euler stable where forward Euler method oscillates.
We consider the problem in Example 3.6 for exponential decay where the forward Euler
method gives unreliable results. Specifically, we seek y(t) such that

$$y'(t) = -20\, y(t), \quad 0 < t \le 2, \qquad y(0) = 1$$

which has an exact solution of $y(t) = e^{-20t}$. Plot the results using the backward Euler
method for ∆t = 1/2, 1/4, . . . , 1/16. Compare results with those from the forward Euler
method.
Because this is just an exponential decay problem, the implementation is analogous to
that in Example 3.8. We graph approximations using the backward Euler method along
with the exact solution. As can be seen from the plot, it appears that the discrete solution
is approaching the exact solution as ∆t → 0 whereas the forward Euler method gave
unreliable results for ∆t = 1/2, 1/4. Recall that the backward Euler method is an implicit
scheme whereas the forward Euler method is an explicit scheme.
[Figure: backward Euler approximations with ∆t = 1/4, 1/8, 1/16, and 1/32 together with the exact solution; the approximations decay monotonically and approach the exact solution as ∆t decreases.]

3.3 Improving on the Euler Methods


Although the forward Euler method is easy to understand and to implement, its
rate of convergence is only linear so it converges very slowly. The backward Euler is
more difficult to implement and still converges only linearly. In the next chapter we
look at many methods which improve on the accuracy of the Euler methods, but
we introduce two methods here to give a preview.
When we derive the forward Euler method we use the slope of the secant line joining $(t_n, y(t_n))$ and $(t_{n+1}, y(t_{n+1}))$ to approximate the derivative at $t_n$, i.e., $y'(t_n)$, and for the backward Euler we use this slope to approximate $y'(t_{n+1})$. In the same manner, we can use this slope to approximate the derivative at any point in $[t_n, t_{n+1}]$ so an obvious choice would be the midpoint, i.e.,

$$y'\left( \frac{t_n + t_{n+1}}{2} \right) \approx \frac{y(t_{n+1}) - y(t_n)}{\Delta t} .$$

As before, we use the differential equation $y'(t) = f(t, y)$ and now evaluate at the midpoint to get

$$\frac{y(t_{n+1}) - y(t_n)}{\Delta t} \approx f\left( \frac{t_n + t_{n+1}}{2},\; y\Big( \frac{t_n + t_{n+1}}{2} \Big) \right)$$

which implies

$$y(t_{n+1}) \approx y(t_n) + \Delta t\, f\left( \frac{t_n + t_{n+1}}{2},\; y\Big( \frac{t_n + t_{n+1}}{2} \Big) \right) .$$

Now the problem with using this expression to generate a scheme is that we do not know $y$ at the midpoint of the interval $[t_n, t_{n+1}]$. So what can we do? The only option is to approximate it there, so an obvious approach is to take a step of length $\Delta t/2$ using forward Euler, i.e.,

$$y\left( \frac{t_n + t_{n+1}}{2} \right) \approx y(t_n) + \frac{\Delta t}{2} f(t_n, y(t_n)) .$$
Using this approximation leads to the difference scheme

$$Y^{n+1} = Y^n + \Delta t\, f\left( t_n + \frac{\Delta t}{2},\; Y^n + \frac{\Delta t}{2} f(t_n, Y^n) \right) . \qquad (3.10)$$
This scheme is often called the improved Euler method or the midpoint method and
is clearly explicit because we are evaluating the slope at known points. When we
look at the scheme we see that we actually have to perform two function evaluations
whereas for the forward Euler we only had to evaluate f (tn , Y n ). Since we are doing
more work we should expect better results and, in fact, this method is second order
accurate which we demonstrate in the next chapter. The midpoint rule belongs to
a family of methods called Runge-Kutta methods which we explore in Chapter 4.
An implicit scheme which converges quadratically, as opposed to the linear con-
vergence of the backward Euler method, is the so-called modified Euler method or
the trapezoidal method. This method is found by approximating f at the midpoint
of (tn , tn+1 ) by averaging its value at (tn , Y n ) and (tn+1 , Y n+1 ); it is defined by

$$Y^{n+1} = Y^n + \frac{\Delta t}{2} \Big[ f(t_n, Y^n) + f(t_{n+1}, Y^{n+1}) \Big] . \qquad (3.11)$$
This method is clearly implicit because it requires evaluating the slope at the un-
known point (tn+1 , Y n+1 ). In the next chapter we look at approaches to system-
atically derive new methods rather than heuristic approaches.
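As a preview of how the two schemes (3.10) and (3.11) look in code, here is a minimal Python sketch of one step of each (the helper names are illustrative; for the implicit trapezoidal step we take the linear slope $f(t, y) = ry$ so the update can be solved in closed form instead of with Newton's method):

def midpoint_step(f, t, y, dt):
    # Improved Euler / midpoint method (3.10): two slope evaluations per step.
    y_half = y + 0.5 * dt * f(t, y)           # Euler half step to t + dt/2
    return y + dt * f(t + 0.5 * dt, y_half)   # slope taken at the midpoint

def trapezoidal_step_linear(y, dt, r=0.8):
    # Trapezoidal method (3.11) for the linear slope f(t, y) = r*y:
    # (1 - r*dt/2) Y^{n+1} = (1 + r*dt/2) Y^n.
    return y * (1.0 + 0.5 * r * dt) / (1.0 - 0.5 * r * dt)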

3.4 Consistency, stability and convergence


From Example 3.6 we see that the forward Euler method fails to provide reliable
results for some values of ∆t in an exponential decay problem but in Theorem 3.1
we prove that the global error satisfies |y(tn ) − Y n | ≤ C∆t at each point tn . At
first glance, the numerical results seem to be at odds with our theorem. However,
with closer inspection we realize that to prove the theorem we tacitly assumed that
Y n is the exact solution of the difference equation. This is not the case because
when we implement methods small errors are introduced due to round off. We want
to make sure that the solution of the difference equation that we compute remains
close to the one we would get if exact arithmetic is used. The forward Euler method
in Example 3.6 exhibits numerical instability because roundoff error accumulates so
that the computed solution of the difference equation is very different from its exact
solution; this results in the solution oscillating and becoming unbounded. However,
recall that the backward Euler method gives reasonable results for this same problem
so the roundoff errors here do not unduly contaminate the computed solution to
the difference equation.
Why does one numerical method produce reasonable results and the other mean-
ingless results even though their global error is theoretically the same? The answer
lies in the stability properties of the numerical scheme. In this section we formally
define convergence and its connection with the concepts of consistency and stability.
When we consider families of methods in the next chapter we learn techniques to
determine their stability properties.
3.4. CONSISTENCY, STABILITY AND CONVERGENCE 59

Any numerical scheme we use must be consistent with the differential equation
we are approximating. The discrete solution satisfies the difference equation but the
exact solution y(tn ) yields a residual when substituted into the difference equation
which we call the local truncation error. As before we define $\tau(\Delta t)$ to be the largest (in absolute value) local truncation error made at each of the $N$ time steps with increment $\Delta t$, i.e., $\tau(\Delta t) = \max_{1 \le n \le N} |\tau_n(\Delta t)|$. For the scheme to be consistent, this error should go to zero as $\Delta t \to 0$, i.e.,

$$\lim_{\Delta t \to 0} \tau(\Delta t) = 0 . \qquad (3.12)$$

Both the forward and backward Euler methods are consistent with (2.8) because we know that the maximum local truncation error is $O(\Delta t)^2$ for all $t_n$. If the local
truncation error is constant then the method is not consistent. Clearly we only want
to use difference schemes which are consistent with our IVP (2.8). However, the
consistency requirement is a local one and does not guarantee that the method is
convergent as we saw in Example 3.6.
We now want to determine how to make a consistent scheme convergent. In-
tuitively we know that for a scheme to be convergent the discrete solution at each
point must get closer to the exact solution as the step size reduces. As with con-
sistency, we can write this formally and say that a method is convergent if

$$\lim_{\Delta t \to 0} \; \max_{1 \le n \le N} |y(t_n) - Y^n| = 0 .$$

To interpret this mathematical expression consider performing calculations at ∆t =


1/4 first. Then we take the maximum global error at each time step. Now we
reduce ∆t to 1/8 and repeat the process. Hopefully the maximum global error
using ∆t = 1/8 is smaller than the one when ∆t = 1/4. We keep reducing ∆t
and this sequence of maximum global errors should approach zero for the method
to converge.
The reason the consistency requirement is not sufficient for convergence is that
it requires the exact solution to the difference equation to be close to the exact
solution of the differential equation. It does not take into account the fact that
we are not computing the exact solution of the difference equation due to roundoff
errors. It turns out that the additional condition that is needed is stability which
requires the difference in the computed solution to the difference equation and
its exact solution to be small. This requirement combined with consistency gives
convergence.

Numerical Convergence Requirements

Consistency + Stability = Convergence

To investigate the effects of the (incorrect) computed solution to the difference equation, we let $\widetilde{Y}^n$ represent the computed solution to the difference equation
which has an actual solution of $Y^n$ at time $t_n$. We want the difference between $y(t_n)$ and $\widetilde{Y}^n$ to be small. At a specific $t_n$ we have

$$|y(t_n) - \widetilde{Y}^n| = |y(t_n) - Y^n + Y^n - \widetilde{Y}^n| \le |y(t_n) - Y^n| + |Y^n - \widetilde{Y}^n| ,$$

where we have used the triangle inequality. Now the first term |y(tn ) − Y n | is
governed by making the local truncation error sufficiently small (i.e., making the
equation consistent) and the second term is controlled by the stability requirement.
So if each of these two terms can be made sufficiently small then when we take
the maximum over all points tn and take the limit as ∆t approaches zero we get
convergence.
In the next chapter we investigate the stability of methods and demonstrate
that the forward Euler method is only stable for ∆t sufficiently small whereas the
backward Euler method is numerically stable for all values of ∆t. Recall that the
forward Euler method is explicit whereas the backward Euler is implicit. This pattern
follows for other methods; that is, explicit methods have a stability requirement
whereas implicit methods do not. Of course this doesn’t mean we can take a very
large time step when using implicit methods because we still have to balance the
accuracy of our results.
EXERCISES

3.1. Classify each difference equation as explicit or implicit. Justify your answer.

a. $Y^{n+1} = Y^{n-1} + 2\Delta t\, f(t_n, Y^n)$

b. $Y^{n+1} = Y^{n-1} + \frac{\Delta t}{3} \big[ f(t_{n+1}, Y^{n+1}) + 4 f(t_n, Y^n) + f(t_{n-1}, Y^{n-1}) \big]$

c. $Y^{n+1} = Y^n + \frac{\Delta t}{2} f\big( t_n,\, Y^n + \frac{\Delta t}{2} f(t_n, Y^n) \big) - \frac{\Delta t}{2} f(t_n, Y^n) + \frac{\Delta t}{2} f\big( t_{n+1},\, Y^n + \frac{\Delta t}{2} f(t_{n+1}, Y^{n+1}) \big)$

d. $Y^{n+1} = Y^n + \frac{\Delta t}{4} \Big[ f(t_n, Y^n) + 3 f\big( t_n + \frac{2}{3}\Delta t,\; Y^n + \frac{2}{3}\Delta t\, f\big( t_n + \frac{\Delta t}{3},\; Y^n + \frac{\Delta t}{3} f(t_n, Y^n) \big) \big) \Big]$

3.2. Assume that the following set of errors is obtained from three different methods for approximating the solution of an IVP of the form (2.8) at a specific time. First look at the errors and try to decide the accuracy of the method. Then use the result (1.6) to determine a sequence of approximate numerical rates for each method using successive pairs of errors. Use these results to state whether the accuracy of the method is linear, quadratic, cubic or quartic.

∆t Errors Errors Errors


Method I Method II Method III
1/4 0.23426×10−2 0.27688 0.71889×10−5
1/8 0.64406×10−3 0.15249 0.49840×10−6
1/16 0.16883×10−3 0.80353×10−1 0.32812×10−7
1/32 0.43215×10−4 0.41292×10−1 0.21048×10−8

3.3. Suppose the solution to the DE $y'(t) = f(t, y)$ is a concave up function on $[0, T]$. Will the forward Euler method give an underestimate or an overestimate of the solution? Why?

3.4. Show that if we integrate the IVP (2.8a) from tn to tn+1 and use a right
Riemann sum to approximate the integral of f (t, y) then we obtain the backward
Euler method.

3.5. Consider the IVP $y'(t) = -\lambda y(t)$ with $y(0) = 1$. Apply the backward Euler method to this problem and show that we have a closed form formula for $Y^n$, i.e.,

$$Y^n = \frac{1}{(1 + \lambda \Delta t)^n} .$$

3.6. Derive the backward Euler method by using the Taylor series expansion for
y(tn − ∆t).
3.7. Consider approximating the solution to the IVP

$$y'(t) = 1 - y, \qquad y(0) = 0$$

using the backward Euler method. In this case the given slope $f(t, y)$ is linear in $y$ so the resulting difference equation is linear. Use backward Euler to approximate the solution at $t = 0.5$ with $\Delta t = 1/2, 1/4, \ldots, 1/32$. Compute the error in each case and the numerical rate of convergence.

3.8. What IVP has a solution which exhibits the logarithmic growth y(t) = 2+3 ln t
where the initial time is prescribed at t = 1?

3.9. Determine an approximation to the solution at t = 0.5 to the IVP

$$y'(t) = 1 - y^2, \qquad y(0) = 0$$

using the forward Euler method with $\Delta t = 1/4$. Compute the local and global errors at $t = 1/4$ and $t = 1/2$. The exact solution is $y(t) = (e^{2t} - 1)/(e^{2t} + 1)$.

Computer Exercises

3.10. Consider the exponential growth problem

$$p'(t) = 4\, p(t), \quad 0 < t \le 1, \qquad p(0) = 2$$

whose exact solution is $p(t) = 2e^{4t}$.


This problem has a growth rate which is five times that of the rate in Example 3.4 so we expect it to grow much faster. Tabulate the errors for $\Delta t = 1/2, 1/4, \ldots, 1/128$ and compare with those from the previous example. Compare the numerical rates of convergence as well as the actual magnitude of the errors. Why do you think this problem has larger errors?

3.11. Consider the IVP


$$y'(t) = \cos^2(t) \cos^2(2y), \quad \frac{\pi}{2} < t \le \pi, \qquad y\Big( \frac{\pi}{2} \Big) = \pi .$$

This nonlinear differential equation is separable and the IVP has the exact solution $y(t) = \frac{1}{2} \big[ \arctan\big( t + \frac{1}{2} \sin(2t) - \frac{\pi}{2} \big) + 2\pi \big]$. Compute the solution using forward

and backward Euler methods and demonstrate that the convergence is linear. For
the backward Euler method incorporate Newton’s method and verify that it is con-
verging quadratically. For each method compute and tabulate the numerical rates
using successive values of N = 10, 20, 40, . . . , 320. Discuss your results and com-
pare with theory.
3.12. Write a code which implements the forward Euler method to solve an IVP of
the form (2.8). Use your code to approximate the solution of the IVP
$$y'(t) = 1 - y^2, \qquad y(0) = 0$$

which has the exact solution $y(t) = (e^{2t} - 1)/(e^{2t} + 1)$. Compute the errors at $t = 1$ using $\Delta t = 1/4, 1/8, 1/16, 1/32, 1/64$.

a. Tabulate the global error at t = 1 for each value of ∆t and demonstrate


that your method converges with accuracy O(∆t); justify your answer by
calculating the numerical rate of convergence for successive pairs of errors.
b. Tabulate the local error at t = 1 for each value of ∆t and determine the
rate of convergence of the local error; justify your answer by calculating the
numerical rate of convergence for successive pairs of errors. Compare your
results with those obtained in (a).
3.13. Suppose you are interested in modeling the growth of the Bread Mold Fungus,
Rhizopus stolonifer and comparing your numerical results to experimental data that
is taken by measuring the number of square inches of mold on a slice of bread over
a period of several days. Assume that the slice of bread is a square of side 5 inches.

a. To obtain a model describing the growth of the mold you first make the
hypothesis that the growth rate of the fungus is proportional to the amount
of mold present at any time with a proportionality constant of k. Assume
that the initial amount of mold present is 0.25 square inches. Let p(t) denote
the number of square inches of mold present on day t. Write an initial value
problem for the growth of the mold.
b. Assume that the following data is collected over a period of ten days. Assuming that $k$ is a constant, use the data at day one to determine $k$. Then using the forward Euler method with $\Delta t$ a fourth and an eighth of a day, obtain numerical estimates for each day of the ten day period; tabulate your results and compare with the experimental data. When do the results become physically unreasonable?

t=0 p =0.25 t=1 p =0.55


t=2 p =1.1 t=3 p =2.25
t=5 p =7.5 t=7 p =16.25
t=8 p =19.5 t = 10 p =22.75

c. The difficulty with the exponential growth model is that the bread mold grows in an unbounded way, as you saw in (b). To improve the model for the growth of bread mold, we want to incorporate the fact that the number of square inches of mold can't exceed the number of square inches in a slice of bread. Write a logistic differential equation which models this growth using the same initial condition and growth rate as before.
d. Use the forward Euler method with ∆t a fourth and an eighth of a day to obtain
numerical estimates for the amount of mold present on each of the ten days
using your logistic model. Tabulate your results as in (b) and compare your
results to those from the exponential growth model.
Chapter 4
A Survey of Methods

In the last chapter we developed the forward and backward Euler methods for approximating the solution to the first order IVP (2.8). Although both methods are
simple to understand and program, they converge at a linear rate which is often
prohibitively slow. In this chapter we provide a survey of schemes which have higher
than linear accuracy. Also in Example 3.6 we saw that the forward Euler method
gave unreliable results for some choices of time steps so we need to investigate when
this numerical instability occurs so it can be avoided.
The standard methods for approximating the solution of (2.8) fall into two broad
categories which are single-step/one-step methods and multistep methods. Each
category consists of families of explicit and families of implicit methods of varying
degrees of accuracy. Both the forward and backward Euler methods are single-step
methods because they only use information from one previously computed solution
(i.e., at tn ) to compute the solution at tn+1 . We have not encountered multistep
methods yet but they use information at several previously calculated points; for
example, a two-step method uses information at tn and tn−1 to predict the solution
at tn+1 . Using previously calculated information is how multistep methods improve
on the linear accuracy of the Euler method. One-step methods improve the accuracy
by performing intermediate approximations in (tn , tn+1 ] which are then discarded.
There are advantages and disadvantages to each class of methods.
Instead of simply listing common single-step and multistep methods, we want to
understand how these methods are derived so that we gain a better understanding.
For one-step methods we begin by using Taylor series expansions to derive the Eu-
ler methods and see how this approach can be easily extended to get higher order
accurate methods. However, we see that these methods require repeated differen-
tiation of the given slope f (t, y) and so are not, in general, practical. To obtain
methods which don’t require repeated differentiation we investigate how numerical
quadrature rules and interpolating polynomials can be used to derive methods. In
these approaches we integrate the differential equation from tn to tn+1 and either
use a quadrature rule to evaluate the integral of f (t, y) or first replace f (t, y) by


an interpolating polynomial and then integrate. We also introduce a systematic


approach to deriving schemes called the method of undetermined coefficients. In
this approach we let the coefficients in the scheme be undetermined parameters and
determine conditions on the parameters so that the scheme has as high accuracy
as possible. The Runge-Kutta methods discussed in § 4.1.3 are the most popular
one-step methods. After introducing Runge-Kutta methods we investigate how to
determine if a single-step method is stable for all choices of time step; if it is not,
we show how to obtain conditions on the time step which guarantee stability.
Multistep methods are derived in analogous ways. However, when we integrate
the equation or use an interpolating polynomial we need to include points such as
tn−1 , tn−2 , etc. Backward difference methods are discussed in § 4.2.1 and the
widely used Adams-Bashforth and Adams-Moulton families of multistep methods
are presented in § 4.2.2. Because multistep methods rely on previously computed
values the stability analysis is more complicated than for single-step methods so we
simply summarize the conditions for stability.
For both multistep and single-step methods we provide numerical results and
demonstrate that the numerical rate of convergence agrees with the theoretical
results. Efficient implementation of the methods is discussed.
In § 4.3 we see how using Richardson extrapolation can take a sequence of ap-
proximations from a lower order method and generate more accurate approximations
without additional function evaluations. This approach can be used with single-step
or multistep methods. We discuss how the currently popular Burlisch-Stoer extrap-
olation method exploits certain properties of methods to provide a robust algorithm
with high accuracy.
In general, the backward Euler method, which is implicit, is more costly to
implement than the forward Euler method, which is explicit, due to the fact that it
typically requires the solution of a nonlinear equation at each time step. In § 4.4
we investigate an efficient way to implement an implicit method by pairing it with
an explicit method to yield the so-called Predictor-Corrector methods.

4.1 Single-Step Methods


In this section we look at the important class of numerical methods for the IVP (2.8)
called single-step or one-step methods which only use information at one previously
calculated point, tn , to approximate the solution at tn+1 . Both the forward and
backward Euler methods are one-step methods with linear accuracy. To improve on
the accuracy of these methods, higher order explicit single-step methods compute
additional approximations in the interval (tn , tn+1 ) which are used to approximate
the solution at tn+1 ; these intermediate approximations are then discarded. Implicit
single-step methods use the additional point tn+1 .
We first look at Taylor series methods which are easy to derive but impracti-
cal to use because they require repeated differentiation of the given slope f (t, y).
Then we demonstrate how other one-step methods are derived by first integrating
the differential equation and using a numerical quadrature rule to approximate the
integral of the slope. One shortcoming of this approach is that for each quadra-
ture rule we choose we obtain a different method whose accuracy must then be
obtained. An alternate approach to deriving methods is to form a general explicit or
implicit method, assuming a fixed number of additional function evaluations, and
then determine the coefficients in the scheme so that one has as high an accuracy as
possible. This approach results in families of methods which have a given accuracy
and so eliminates the tedious local truncation error calculations. Either approach
leads to the Runge-Kutta family of methods which we discuss in § 4.1.3.

4.1.1 Taylor series methods


Taylor series is an extremely useful tool in numerical analysis, especially in deriv-
ing and analyzing difference methods. In this section we demonstrate that it is
straightforward to derive the forward Euler method using Taylor series and then
to generalize the approach to derive higher order accurate schemes. Unfortunately,
these schemes are not very practical because they require repeated differentiation of
the given slope f (t, y). This means that software implementing these methods will
be very problem dependent and require several additional routines to be provided
by the user for each problem. Additionally, f (t, y) may not possess higher order
derivatives.
To derive explicit methods using Taylor series we use the differential equation
evaluated at tn so we need an approximation to y 0 (tn ). Thus we expand y(tn + ∆t)
about tn to get

$$y(t_n + \Delta t) = y(t_n) + \Delta t\, y'(t_n) + \frac{(\Delta t)^2}{2!} y''(t_n) + \cdots + \frac{(\Delta t)^k}{k!} y^{[k]}(t_n) + \cdots . \qquad (4.1)$$
This is an infinite series so if we truncate it then we have an approximation to
y(tn + ∆t) from which we can approximate y 0 (tn ). For example, if we truncate the
series after the term which is O(∆t) we have

$$y(t_n + \Delta t) \approx y(t_n) + \Delta t\, y'(t_n) = y(t_n) + \Delta t\, f(t_n, y(t_n)) \quad \Rightarrow \quad y'(t_n) \approx \frac{y(t_{n+1}) - y(t_n)}{\Delta t}$$

which leads to the forward Euler method when we substitute into the differential equation evaluated at $t_n$, i.e., $y'(t_n) = f(t_n, y(t_n))$.
So theoretically, if we keep additional terms in the series expansion for y(tn +∆t)
then we get a higher order approximation. To see how this approach works, we now
keep three terms in the expansion and thus have a remainder term of $O(\Delta t)^3$. From (4.1) we have

$$y(t_n + \Delta t) \approx y(t_n) + \Delta t\, y'(t_n) + \frac{(\Delta t)^2}{2!} y''(t_n) ,$$

so we expect a local error of $O(\Delta t)^3$ which leads to an expected global error of $O(\Delta t)^2$. Now the problem we have to address when we use this expansion is what to do with $y''(t_n)$ because we only know $y'(t) = f(t, y)$. If our function is smooth enough, we can differentiate this equation with respect to $t$ to get $y''(t)$. To do this
recall that we have to use the chain rule because $f$ is a function of $t$ and $y$ where $y$ is also a function of $t$. Specifically, we have

$$y'(t) = f(t, y) \quad \Rightarrow \quad y''(t) = \frac{\partial f}{\partial t} \frac{dt}{dt} + \frac{\partial f}{\partial y} \frac{dy}{dt} = f_t + f_y f .$$

Substituting this into the expression for $y(t_n + \Delta t)$ gives

$$y(t_n + \Delta t) \approx y(t_n) + \Delta t\, f(t_n, y(t_n)) + \frac{(\Delta t)^2}{2!} \Big[ f_t(t_n, y(t_n)) + f(t_n, y(t_n))\, f_y(t_n, y(t_n)) \Big]$$
which generates the second order explicit Taylor series method.

Second order explicit Taylor series method

$$Y^{n+1} = Y^n + \Delta t\, f(t_n, Y^n) + \frac{(\Delta t)^2}{2} \Big[ f_t(t_n, Y^n) + f(t_n, Y^n)\, f_y(t_n, Y^n) \Big] \qquad (4.2)$$

To implement this method, we must provide function routines not only for f (t, y)
but also ft (t, y) and fy (t, y). In some cases this is easy, but in others it can be
tedious or even not possible. The following example applies the second order Taylor
scheme to a specific IVP and in the exercises we explore a third order Taylor series
method.

Example 4.1. Second order Taylor method


Approximate the solution to
$$y'(t) = 3yt^2, \qquad y(0) = \frac{1}{3}$$

using (4.2) by hand with $\Delta t = 0.1$ and $T = 0.2$. Then implement the method and obtain approximations for $\Delta t = 1/4, 1/8, \ldots, 1/128$. Verify the quadratic convergence. The exact solution is $\frac{1}{3} e^{t^3}$.
Before writing a code for a particular method, it is helpful to first perform some calculations
by hand so it is clear that the method is completely understood and also to have some
results with which to compare the numerical simulations for debugging. To this end, we
first calculate Y 1 and Y 2 using ∆t = 0.1. Then we provide numerical results at t = 1
for several choices of ∆t and compare with a first order explicit Taylor series method, i.e.,
with the forward Euler method.
From the discussion in this section, we know that we need $f_t$ and $f_y$ so

$$f(t, y) = 3yt^2 \quad \Rightarrow \quad f_t = 6ty \ \text{ and } \ f_y = 3t^2 .$$

Substitution into the difference equation (4.2) gives the expression

$$Y^{n+1} = Y^n + 3 \Delta t\, Y^n (t_n)^2 + \frac{(\Delta t)^2}{2} \Big[ 6 t_n Y^n + 9 (t_n)^4 Y^n \Big] . \qquad (4.3)$$
For $Y^0 = 1/3$ we have

$$Y^1 = \frac{1}{3} + 0.1(3) \frac{1}{3} (0)^2 + \frac{(0.1)^2}{2} \Big[ 6(0) \frac{1}{3} + 9(0)^4 \frac{1}{3} \Big] = \frac{1}{3}$$

and

$$Y^2 = \frac{1}{3} + 0.1(3) \frac{1}{3} (0.1)^2 + \frac{(0.1)^2}{2} \Big[ 6(0.1) \frac{1}{3} + 9(0.1)^4 \frac{1}{3} \Big] = 0.335335 .$$

The exact solution at $t = 0.2$ is $0.336011$, which gives an error of $0.675862 \times 10^{-3}$.
To implement this method in a computer code we modify our program for the forward Euler method to include the $O(\Delta t)^2$ terms in (4.2). In addition to a function for $f(t, y)$ we also need to provide function routines for its first partial derivatives $f_y$ and $f_t$; note that in our program we code the general equation (4.2), not the equation (4.3) specific to our problem. We perform calculations with decreasing values of $\Delta t$ and compare with results at $t = 1$ using the forward Euler method. When we compute the numerical rate of convergence we see that the rate is $O(\Delta t)^2$, as expected, whereas the forward Euler rate is only linear. For this reason when we compare the global errors at a fixed, small time step we see that the error is much smaller for the second order method because it is converging to zero faster than the Euler method.

∆t      Error in Euler   Numerical rate   Error in second order Taylor   Numerical rate
1/4     3.1689×10^-1                      1.2328×10^-1
1/8     2.0007×10^-1     0.663            4.1143×10^-2                   1.58
1/16    1.1521×10^-1     0.796            1.1932×10^-2                   1.79
1/32    6.2350×10^-2     0.886            3.2091×10^-3                   1.89
1/64    3.2516×10^-2     0.939            8.3150×10^-4                   1.95
1/128   1.6615×10^-2     0.969            2.1157×10^-4                   1.97
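A Python sketch of this computation (function names are illustrative; the user supplies $f$, $f_t$ and $f_y$ as separate routines, as discussed above):

import math

def taylor2(f, ft, fy, t0, T, y0, dt):
    # Second order Taylor method (4.2) with uniform step dt.
    t, y = t0, y0
    for _ in range(round((T - t0) / dt)):
        slope = f(t, y)
        y = y + dt * slope + 0.5 * dt**2 * (ft(t, y) + slope * fy(t, y))
        t = t + dt
    return y

# Example 4.1: f = 3yt^2, f_t = 6ty, f_y = 3t^2; two steps of dt = 0.1.
y = taylor2(lambda t, y: 3 * y * t**2, lambda t, y: 6 * t * y,
            lambda t, y: 3 * t**2, 0.0, 0.2, 1.0 / 3.0, 0.1)
print(y, abs(math.exp(0.2**3) / 3.0 - y))   # Y^2 = 0.335335, error ~ 6.76e-4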

Implicit Taylor series methods are derived in an analogous manner. In this case we use the differential equation evaluated at $t_{n+1}$, i.e., $y'(t_{n+1}) = f(t_{n+1}, y(t_{n+1}))$. Consequently we need an approximation to $y'(t_{n+1})$ instead of $y'(t_n)$ so we use the expansion

$$y(t_n) = y(t_{n+1} - \Delta t) = y(t_{n+1}) - \Delta t\, y'(t_{n+1}) + \frac{(\Delta t)^2}{2!} y''(t_{n+1}) + \cdots . \qquad (4.4)$$
Keeping terms through O(∆t) gives the backward Euler method. In the exercises
you are asked to derive a second order implicit Taylor series method.
Taylor series methods are single step methods. Although using these methods
results in methods with higher order accuracy than the Euler methods, they are
not considered practical because of the requirement of repeated differentiation of
f (t, y). For example, the first full derivative has two terms and the second has five
terms. So even if f (t, y) can be differentiated, the methods become unwieldy. To
implement the methods on a computer the user must provide routines for all the
partial derivatives so the codes become very problem dependent. For these reasons
we look at other approaches to derive higher order schemes.
4.1.2 Derivation of methods which don’t require repeated differentiation of the slope
Taylor series methods are not practical because they require repeated differentiation
of the solution so it is important to obtain methods which don’t need to do this. If
we integrate the differential equation then the left-hand side can be evaluated easily
but, in general, we won’t be able to integrate the given slope. One approach is to
use a numerical quadrature rule to approximate this integral. A second approach is
to use an interpolating polynomial in [tn , tn+1 ] to approximate the slope and then
integrate. Because many quadrature rules are interpolatory in nature, these two
approaches often yield equivalent schemes. The use of a quadrature rule is more
general so we concentrate on that approach. The shortcoming of this approach is
that for each method we derive, we have to demonstrate its accuracy. An approach
which generates families of methods of a prescribed accuracy is described in this
section.

Using numerical quadrature


We first derive one-step methods by integrating the differential equation from tn to
tn+1 and then approximating the integral of the slope using a numerical quadrature
rule. When we integrate (2.8a) from tn to tn+1 we have
$$\int_{t_n}^{t_{n+1}} y'(t)\, dt = \int_{t_n}^{t_{n+1}} f(t, y)\, dt . \qquad (4.5)$$

The integral on the left-hand side can be evaluated exactly by the Fundamental
Theorem of Calculus to get y(tn+1 ) − y(tn ). However, in general, we must use
numerical quadrature to approximate the integral on the right-hand side. Recall
from calculus that one of the simplest approximations to an integral is to use ei-
ther a left or right Riemann sum, i.e., if the integrand is nonnegative then we are
approximating the area under the curve by a rectangle. If we use a left sum for the
integral we approximate the integral by a rectangle whose base is ∆t and whose
height is determined by the integrand evaluated at the left endpoint of the interval;
i.e., we use the formula

$$\int_a^b g(x)\, dx \approx g(a)(b - a) .$$
Using the left Riemann sum to approximate the integral of f (t, y) gives
$$y(t_{n+1}) - y(t_n) = \int_{t_n}^{t_{n+1}} f(t, y)\, dt \approx \Delta t\, f(t_n, y(t_n))$$

which leads us to the forward Euler method. In the exercises we explore the im-
plications of using a right Riemann sum. Clearly different approximations to the
integral of f (t, y) yield different methods.
Numerical quadrature rules for single integrals have the general form
$$\int_a^b g(x)\, dx \approx \sum_{i=1}^{Q} w_i\, g(q_i) ,$$
where the scalars wi are called the quadrature weights, the points qi are the quadra-
ture points in [a, b] and Q is the number of quadrature points used. One common
numerical integration rule is the midpoint rule where, as the name indicates, we
evaluate the integrand at the midpoint of the interval; specifically the midpoint
quadrature rule is
$$\int_a^b g(x)\, dx \approx (b - a)\, g\left( \frac{a + b}{2} \right) .$$
Using the midpoint quadrature rule to approximate the integral of f (t, y) in (4.5)
gives
$$y(t_{n+1}) - y(t_n) \approx \Delta t\, f\left( t_n + \frac{\Delta t}{2},\; y\Big( t_n + \frac{\Delta t}{2} \Big) \right) .$$
We encountered this scheme in § 3.3. Recall that we don’t know y evaluated at
the midpoint so we must use an approximation. If we use the forward Euler method
starting at tn and take a step of length ∆t/2 then this produces an approximation
to y at the midpoint i.e.,
$$y\left( t_n + \frac{\Delta t}{2} \right) \approx y(t_n) + \frac{\Delta t}{2} f(t_n, y(t_n)) .$$
Thus we can view our method as having two parts; first we approximate y at the
midpoint using Euler’s method and then use it to approximate y(tn+1 ). Combining
these into one equation allows the scheme to be written as
$$Y^{n+1} = Y^n + \Delta t\, f\left( t_n + \frac{\Delta t}{2},\; Y^n + \frac{1}{2} \Delta t\, f(t_n, Y^n) \right) .$$
However, the method is usually written in the following way for clarity and to
emphasize the fact that there are two function evaluations required.

Midpoint Rule

$$\begin{aligned} k_1 &= \Delta t\, f(t_n, Y^n) \\ k_2 &= \Delta t\, f\big( t_n + \tfrac{\Delta t}{2},\; Y^n + \tfrac{1}{2} k_1 \big) \\ Y^{n+1} &= Y^n + k_2 \end{aligned} \qquad (4.6)$$

The midpoint rule has a simple geometrical interpretation. Recall that for the
forward Euler method we used a tangent line at tn to extrapolate the solution at
tn+1 . In the midpoint rule we use a tangent line at tn + ∆t/2 to extrapolate the
solution at tn+1 . Heuristically we expect this to give a better approximation than
the tangent line at tn .
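The scheme is only a few lines of code; the following Python sketch (an illustrative driver, using the exponential growth problem from Chapter 3 as an assumed test case) checks that the global error of (4.6) decreases quadratically:

import math

def midpoint_final(dt, T=1.0, y0=2.0):
    # Midpoint rule (4.6) applied to p' = 0.8*p, p(0) = 2.
    y = y0
    for _ in range(round(T / dt)):
        k1 = dt * 0.8 * y
        k2 = dt * 0.8 * (y + 0.5 * k1)
        y = y + k2
    return y

e_prev = None
for k in [4, 8, 16, 32]:
    e = abs(2.0 * math.exp(0.8) - midpoint_final(1.0 / k))
    if e_prev is not None:
        print(f"dt = 1/{k}: rate = {math.log(e_prev / e) / math.log(2):.2f}")
    e_prev = e   # the rates approach 2, i.e., quadratic convergence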
The midpoint rule is a single-step method because it only uses one previously
calculated solution, i.e., the solution at tn . However, it requires one more function
evaluation than the forward Euler method but, unlike the Taylor series methods,
it does not require additional derivatives of the slope f (t, y). Because we are
doing more work than the Euler method, we would like to think that the scheme
converges faster. In the next example we demonstrate that the local truncation
error of the midpoint method is O(∆t)3 so that we expect the method to converge
with a global error of O(∆t)2 . The steps in estimating the local truncation error for
the midpoint method are analogous to the ones we performed for determining the
local truncation error for the forward Euler method except now we need to use a
Taylor series expansion in two independent variables for f (t, y) because of the term
f( t_n + ∆t/2, Y^n + (1/2) k_1 ).

Example 4.2. local truncation error for the midpoint rule


Show that the local truncation error for the midpoint rule is exactly O(∆t)3 .
Recall that the local truncation error is the remainder when the exact solution is substituted
into the difference equation. For the midpoint rule the local truncation error τn+1 at tn+1
is

τ_{n+1} = y(t_{n+1}) − [ y(t_n) + ∆t f( t_n + ∆t/2, y(t_n) + (∆t/2) f(t_n, y(t_n)) ) ] .        (4.7)
As before, we expand y(tn+1 ) with a Taylor series but this time we keep the terms through
(∆t)3 because we want to demonstrate that terms in the expression for the truncation
error through (∆t)2 cancel but terms involving (∆t)3 do not; this way we will demonstrate
that the local truncation error is exactly O(∆t)^3 rather than at least O(∆t)^3. We have

y(t_{n+1}) = y(t_n) + ∆t y′(t_n) + ((∆t)^2/2) y″(t_n) + ((∆t)^3/3!) y‴(t_n) + O(∆t)^4 .        (4.8)
Substituting this into (4.7) yields

τ_{n+1} = [ y(t_n) + ∆t y′(t_n) + ((∆t)^2/2) y″(t_n) + ((∆t)^3/3!) y‴(t_n) + O(∆t)^4 ]
          − [ y(t_n) + ∆t f( t_n + ∆t/2, y(t_n) + (∆t/2) f(t_n, y(t_n)) ) ] .        (4.9)
Now all terms are evaluated at t_n except the term f( t_n + ∆t/2, y(t_n) + (∆t/2) f(t_n, y(t_n)) ).
Because this term is a function of two variables instead of one we need to use the Taylor
series expansion (see Appendix)
g(x + h, y + k) = g(x, y) + h g_x(x, y) + k g_y(x, y)
                + (1/2!) [ h^2 g_{xx}(x, y) + k^2 g_{yy}(x, y) + 2hk g_{xy}(x, y) ]
                + (1/3!) [ h^3 g_{xxx}(x, y) + k^3 g_{yyy}(x, y) + 3hk^2 g_{xyy}(x, y) + 3h^2 k g_{xxy}(x, y) ]
                + ··· .        (4.10)
To use this formula to expand f( t_n + ∆t/2, y(t_n) + (∆t/2) f(t_n, y(t_n)) ), we note that the
change in the first variable t is h = ∆t/2 and the change in the second variable y is
k = (∆t/2) f(t_n, y(t_n)). Thus we have

∆t f( t_n + ∆t/2, y(t_n) + (∆t/2) f(t_n, y(t_n)) ) = ∆t [ f + (∆t/2) f_t + (∆t/2) f f_y
        + ((∆t)^2/(4·2!)) f_{tt} + ((∆t)^2/(4·2!)) f^2 f_{yy} + 2 ((∆t)^2/(4·2!)) f f_{ty} + O(∆t)^3 ] .
All terms involving f or its derivatives on the right-hand side of this equation are evaluated
at (t_n, y(t_n)) and we have omitted this explicit dependence for brevity. Substituting this
expansion into the expression (4.9) for τn+1 and collecting terms involving each power of
∆t yields
τ_{n+1} = ∆t [ y′ − f ] + (∆t)^2 [ (1/2) y″ − (1/2)(f_t + f f_y) ]
         + (∆t)^3 [ (1/3!) y‴ − (1/8)( f_{tt} + f^2 f_{yy} + 2 f f_{ty} ) ] + O(∆t)^4 .        (4.11)
Clearly y 0 = f so the term involving ∆t cancels but to see if the other terms cancel we
need to write y 00 and y 000 in terms of f and its derivatives. To write y 00 (t) in terms of
f (t, y) and its partial derivatives we have to differentiate f (t, y) with respect to t which
requires the use of the chain rule. We have
y″(t) = (∂f/∂t)(∂t/∂t) + (∂f/∂y)(∂y/∂t) = f_t + f_y y′ = f_t + f_y f .
Similarly,
y‴(t) = ( ∂(f_t + f_y f)/∂t )(∂t/∂t) + ( ∂(f_t + f_y f)/∂y )(∂y/∂t)
      = f_{tt} + f_{yt} f + f_y f_t + ( f_{ty} + f_{yy} f + f_y^2 ) f
      = f_{tt} + 2 f_{yt} f + f_y f_t + f_{yy} f^2 + f f_y^2 .
Thus the terms in (4.11) involving ∆t and (∆t)2 cancel but the terms involving (∆t)3 do
not. Consequently the local truncation error converges cubically and we expect the global
convergence rate to be quadratic.

If we use a Riemann sum or the midpoint rule to approximate an integral
∫_a^b g(x) dx where g(x) ≥ 0 on [a, b] then we are using a rectangle to approxi-
mate the area. An alternate approach is to use a trapezoid to approximate the area.
The trapezoidal integration rule is found by calculating the area of the trapezoid
with base (b − a) and height determined by the line passing through (a, g(a)) and
(b, g(b)); specifically the rule is
∫_a^b g(x) dx ≈ ((b − a)/2) [ g(a) + g(b) ] .
Integrating the differential equation (2.8a) from tn to tn+1 and using this quadrature
rule gives
y(t_{n+1}) − y(t_n) ≈ (∆t/2) [ f(t_n, y(t_n)) + f(t_{n+1}, y(t_{n+1})) ] .
We encountered this implicit scheme in § 3.3.

Trapezoidal Rule

Y^{n+1} = Y^n + (∆t/2) [ f(t_n, Y^n) + f(t_{n+1}, Y^{n+1}) ]        (4.12)
However, like the backward Euler method this is an implicit scheme and thus for
each tn we need to solve a nonlinear equation for most choices of f (t, y). This can
be done, but there are better approaches for using implicit schemes in the context
of ODEs as we see in § 4.4.
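As a sketch of how such a nonlinear solve might look in practice, one could seed the trapezoidal equation with a forward Euler guess and apply a simple fixed-point iteration; the iteration strategy and all names below are our own illustration, not a method prescribed by the text.

    def trapezoid_step(f, t, y, dt, n_iter=20):
        # One implicit trapezoidal step (4.12); the nonlinear equation for
        # Y^{n+1} is solved by fixed-point iteration seeded with a forward
        # Euler predictor (adequate only for small dt; Newton is an option).
        y_new = y + dt * f(t, y)
        for _ in range(n_iter):
            y_new = y + 0.5 * dt * (f(t, y) + f(t + dt, y_new))
        return y_new

    f = lambda t, y: -y**2                    # a nonlinear slope
    print(trapezoid_step(f, 0.0, 1.0, 0.1))   # exact solution 1/(1+t) gives y(0.1) ≈ 0.9091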
Other numerical quadrature rules on the interval from [tn , tn+1 ] lead to addi-
tional explicit and implicit one-step methods. The Euler method, the midpoint rule
and the trapezoidal rule all belong to a family of methods called Runge-Kutta
methods which we discuss in § 4.1.3.
Many quadrature rules are interpolatory in nature; that is, the integrand is
approximated by an interpolating polynomial which can then be integrated exactly.
For example, for ∫_a^b g(x) dx we could use a constant, a linear polynomial, a quadratic
polynomial, etc. to approximate g(x) in [a, b] and then integrate it exactly. We
want to use a Lagrange interpolating polynomial instead of a Hermite because the
latter requires derivatives. If we use f (t, y(t)) ≈ f (tn , y(tn )) in [tn , tn+1 ] then we
get the forward Euler method and if we use f (t, y(t)) ≈ f (tn + ∆t/2, y(tn + ∆t/2))
then we get the midpoint rule. So to derive some single-step methods we can use
interpolation but only using points in the interval [tn , tn+1 ]. However there are
many quadrature rules, such as Gauss quadrature, which are not interpolatory and
so using numerical quadrature as an approach to deriving single-step methods is
more general.

Using the method of undetermined coefficients


One problem with deriving methods using numerical integration or interpolation is
that once we have obtained a method then we must determine its local truncation
error which is straightforward but often tedious. A systematic approach to deriving
methods is to assume the most general form of the desired method and then de-
termine constraints on the coefficients in the general method so that it has a local
truncation error which is as high as possible.
Suppose we want an explicit one-step scheme and we are only willing to perform
one function evaluation which is at (tn , Y n ). The forward Euler method is such a
scheme; we want to determine if there are any others, especially one which converges
at a higher rate. The most general form of an explicit one-step scheme is
Y^{n+1} = α Y^n + b_1 ∆t f(t_n, Y^n) ;
note that the forward Euler method has α = b1 = 1. To determine the local
truncation error we substitute the exact solution into the difference equation and
calculate the remainder. We have

τ_{n+1} = y(t_{n+1}) − α y(t_n) − b_1 ∆t f(t_n, y(t_n))
        = [ y(t_n) + ∆t y′(t_n) + (∆t^2/2!) y″(t_n) + (∆t^3/3!) y‴(ξ_n) ] − α y(t_n) − b_1 ∆t y′(t_n)
        = y(t_n)(1 − α) + y′(t_n) ∆t (1 − b_1) + (1/2) y″(t_n) ∆t^2 + (∆t^3/3!) y‴(ξ_n) ,
where we have expanded y(tn+1 ) in a Taylor series with remainder and used the
fact that y 0 (t) = f (t, y) in the second step. In the last step we have grouped the
constant terms and the terms involving ∆t, ∆t^2 and ∆t^3. For the constant term to
disappear we require that α = 1; for the linear term in ∆t we require that b_1 = 1.
The term involving ∆t^2 cannot be made to disappear so the only explicit method
with one function evaluation which has a local truncation error of O(∆t)^2 is the forward
Euler method. No explicit method using only the function evaluation at t_n with a
local truncation error of higher order than O(∆t)^2 is possible. Note that α must always be
one to cancel the y(t_n) term in the expansion of y(t_n + ∆t) so in the sequel there
is no need for it to be unknown.
The following example illustrates this approach if we want an explicit single-step
method where we are willing to perform one additional function value in the interval
[tn , tn+1 ]. We already know that the midpoint rule is a second order method which
requires the additional function evaluation at tn + ∆t/2. However, this example
demonstrates that there is an infinite number of such methods.

Example 4.3. derivation of a second order explicit single-step method


We now assume that in addition to evaluating f (tn , Y n ) we want to evaluate the slope
at one intermediate point in (tn , tn+1 ]. Because we are doing an additional function
evaluation, we expect that we should be able to make the truncation error smaller if we
choose the parameters correctly; i.e., we choose an appropriate point in (tn , tn+1 ]. A
random point may not give us second order accuracy. We must leave the choice of the
location of the point as a variable; we denote the general point in (t_n, t_{n+1}] as t_n + c_2 ∆t
and the general approximation to y at this point as Y^n + a_{21} ∆t f(t_n, Y^n). The general
difference equation using two function evaluations is

Y^{n+1} = Y^n + b_1 ∆t f(t_n, Y^n) + b_2 ∆t f( t_n + c_2 ∆t, Y^n + a_{21} ∆t f(t_n, Y^n) ) .        (4.13)

Recall that we have set α = 1 as in the derivation of the forward Euler method.
To determine constraints on the parameters b1 , b2 , c2 and a21 which result in the highest
order for the truncation error, we compute the local truncation error and use Taylor series
to expand the terms. For simplicity, in the following expansion we have omitted the explicit
evaluation of f and its derivatives at the point (tn , y(tn )); however, if f is evaluated at
some other point we have explicitly noted this. We use (4.10) for a Taylor series expansion
in two variables to get

τ_{n+1} = [ y + ∆t y′ + (∆t^2/2!) y″ + (∆t^3/3!) y‴ + O(∆t^4) ]
          − [ y + b_1 ∆t f + b_2 ∆t f(t_n + c_2 ∆t, y + a_{21} ∆t f) ]
        = ∆t f + (∆t^2/2)( f_t + f f_y ) + (∆t^3/6)( f_{tt} + 2f f_{ty} + f^2 f_{yy} + f_t f_y + f f_y^2 ) + O(∆t)^4
          − [ b_1 ∆t f + b_2 ∆t ( f + c_2 ∆t f_t + a_{21} ∆t f f_y
              + (c_2^2 (∆t)^2/2) f_{tt} + (a_{21}^2 (∆t)^2 f^2/2) f_{yy} + c_2 a_{21} (∆t)^2 f f_{ty} + O(∆t)^3 ) ] .
We first see if we can determine the parameters so that the scheme has a local truncation
error of O(∆t3 ); to this end we must determine the equations that the unknown coefficients
must satisfy in order for the terms involving (∆t)^1 and (∆t)^2 to vanish. We have

∆t [ f (1 − b_1 − b_2) ] = 0
∆t^2 [ f_t ( 1/2 − b_2 c_2 ) + f f_y ( 1/2 − b_2 a_{21} ) ] = 0 ,
where once again we have dropped the explicit evaluation of y and f at (tn , y(tn )). Thus
we have the conditions
b_1 + b_2 = 1,   b_2 c_2 = 1/2   and   b_2 a_{21} = 1/2 .        (4.14)
Note that the midpoint method given in (4.6) satisfies these equations with b1 = 0, b2 = 1,
c2 = a21 = 1/2. However, any choice of the parameters which satisfy these constraints
generates a method with a third order local truncation error.
Because we have four parameters and only three constraints we might ask ourselves if it is
possible to choose the parameters so that the local truncation error is one order higher, i.e.,
O(∆t)4 . To see that this is impossible note that in the expansion of y(tn+1 ) the term y 000
involves terms such as ft fy for which there are no corresponding terms in the expansion of
f( t_n + c_2 ∆t, Y^n + a_{21} ∆t f(t_n, Y^n) ), so these O(∆t)^3 terms remain. Consequently there
is no third order explicit one-step method which only performs two function evaluations
per time step.

4.1.3 Runge-Kutta methods


Runge-Kutta1 (RK) methods are a family of one-step methods which include both
explicit and implicit methods. They are further characterized by the number of
stages which is just the number of function evaluations performed at each time
step. The forward Euler and the midpoint methods are examples of explicit RK
methods; the Euler method is a one-stage method whereas the midpoint method is
a two-stage method because it uses information at the midpoint tn + ∆t/2 as well
as at tn . Both the backward Euler and the trapezoidal methods are examples of
implicit RK methods; the backward Euler method is a one-stage method whereas
the trapezoidal method is a two-stage method. The family of RK methods were
developed primarily from 1895 to 1925 and involved work by Runge, Heun, Kutta
and Nystrom. Interested readers should refer to the paper by J.C. Butcher entitled
“A history of Runge-Kutta methods.”
The standard approach for the derivation of families of RK methods is to use the
method of undetermined coefficients discussed in § 4.1.2 because it gives families of
methods with a prescribed local truncation error. This is illustrated in Example 4.3
where we assumed the most general form of an explicit two-stage single step method
in (4.13). To illustrate the fact that it is a two-stage method we can also write
(4.13) as
k_1 = ∆t f(t_n, Y^n)
k_2 = ∆t f(t_n + c_2 ∆t, Y^n + a_{21} k_1)
Y^{n+1} = Y^n + b_1 k_1 + b_2 k_2 .
¹Carl David Tolmé Runge (1856-1927) and Martin Wilhelm Kutta (1867-1944) were German mathematicians.
We obtain the constraints b_1 + b_2 = 1, b_2 c_2 = 1/2 and b_2 a_{21} = 1/2 and so there
is a family of second order accurate two-stage explicit RK methods. The midpoint
method which we derived using numerical quadrature is a two-stage RK method.
Another commonly used two-stage RK method is the Heun method. Here the
intermediate time is t_n + (2/3)∆t, so c_2 = 2/3. The equation b_2 c_2 = 1/2 requires
b_2 = 3/4, the condition b_1 + b_2 = 1 requires b_1 = 1/4, and b_2 a_{21} = 1/2 implies
a_{21} = 2/3. Thus the intermediate point is ( t_n + (2/3)∆t, Y^n + (2/3)∆t f(t_n, Y^n) ); note
that y(t_n + (2/3)∆t) is approximated by taking an Euler step of length (2/3)∆t.

Heun Method

k_1 = ∆t f(t_n, Y^n)
k_2 = ∆t f(t_n + (2/3)∆t, Y^n + (2/3) k_1)        (4.15)
Y^{n+1} = Y^n + (1/4) k_1 + (3/4) k_2

The general form for an explicit s-stage RK method is given below. The coef-
ficient c1 is always zero because we always evaluate f at the point (tn , Y n ) from
the previous step to get the appropriate cancellation for the ∆t term in the local
truncation error calculation.

General s-stage explicit RK method

k_1 = ∆t f(t_n, Y^n)
k_2 = ∆t f(t_n + c_2 ∆t, Y^n + a_{21} k_1)
k_3 = ∆t f(t_n + c_3 ∆t, Y^n + a_{31} k_1 + a_{32} k_2)
   ⋮        (4.16)
k_s = ∆t f( t_n + c_s ∆t, Y^n + a_{s1} k_1 + a_{s2} k_2 + ··· + a_{s,s−1} k_{s−1} )
Y^{n+1} = Y^n + Σ_{j=1}^{s} b_j k_j

Once the stage s is set and the coefficients are determined, the method is completely
specified; for this reason, the RK explicit s-stage methods are often described by a
Butcher2 tableau of the form

0
c_2    a_{21}
c_3    a_{31}   a_{32}
 ⋮       ⋮        ⋮      ⋱        (4.17)
c_s    a_{s1}   a_{s2}   ···   a_{s,s−1}
       b_1     b_2     ···    b_s
²Named after John C. Butcher, a mathematician from New Zealand.
As an example, a commonly used four-stage RK method is described by the tableau

0
1/2    1/2
1/2    0      1/2        (4.18)
1      0      0      1
       1/6    1/3    1/3    1/6

which uses an approximation at the point tn , two approximations at the midpoint


tn + ∆t/2, and the fourth approximation at tn+1 . This defines the method

k_1 = ∆t f(t_n, Y^n)
k_2 = ∆t f(t_n + (1/2)∆t, Y^n + (1/2) k_1)
k_3 = ∆t f(t_n + (1/2)∆t, Y^n + (1/2) k_2)
k_4 = ∆t f(t_n + ∆t, Y^n + k_3)
Y^{n+1} = Y^n + k_1/6 + k_2/3 + k_3/3 + k_4/6 .

In the examples of RK methods provided, we note that the c_i in the term t_n + c_i ∆t
satisfy the property that c_i = Σ_{j=1}^{i−1} a_{ij}, which can be demonstrated rigorously. In
addition, the weights b_i satisfy Σ_{i=1}^{s} b_i = 1. This can be used as a check in a
computer code to confirm the coefficients have been entered correctly.

numerical implementation of runge-kutta methods


When implementing RK methods using a fixed time step one could have separate
codes for the Euler method, midpoint method, Heun’s method, etc. but a more
general approach is to write a routine where the user chooses the method based
upon the stage of the desired RK method. Basically one writes “library” routines
which input the coefficients for each s-stage method coded and another routine
which advances the solution one time step for any s-stage method; once debugged,
these routines never need to be modified so they can be private routines. A driver
routine is used to call both routines. The user sets the number of stages desired and
the problem specific information. Then the driver routine initializes the computation
by calling the appropriate routine to set the RK coefficients and then at each time
step a routine is called to advance the solution. This code structure can be easily
done in an object-oriented manner or simply with conditional statements. Error
routines need to be added if the exact solution is known. Note that storing the
coefficients as arrays allows us to use a dot product instead of loops.

Algorithm 4.1 : Advancing explicit RK method one time step

Assume: the coefficients of the RK method are stored as follows: c_i and b_i,
i = 1, . . . , s, in 1D arrays and the coefficients a_{ij}, i, j = 1, . . . , s, in a 2D array.

Input: the solution y at the time t, the uniform time step ∆t, the number of
stages s, and the coefficient arrays a, b, c.

Loop over the number of stages:
    k = 0
    for i = 1, . . . , s
        t_eval = t + c(i) ∆t
        y_eval = y + dot_product( a(i, ·), k )
        k(i) = ∆t f(t_eval, y_eval)

Update the solution:
    y = y + dot_product(k, b)
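A minimal Python realization of this algorithm might look as follows; the function name rk_step and the array layout are our own choices, and the sketch assumes a scalar unknown (for a system, k would become a 2D array).

    import numpy as np

    def rk_step(f, t, y, dt, a, b, c):
        # Advance one step of an explicit s-stage RK method described by
        # its Butcher tableau: a is s-by-s strictly lower triangular,
        # b and c are length-s arrays of weights and nodes.
        s = len(b)
        k = np.zeros(s)
        for i in range(s):
            t_eval = t + c[i] * dt
            y_eval = y + np.dot(a[i, :i], k[:i])
            k[i] = dt * f(t_eval, y_eval)
        return y + np.dot(b, k)

    # Coefficients of the classical four-stage method of tableau (4.18).
    c = np.array([0.0, 0.5, 0.5, 1.0])
    b = np.array([1/6, 1/3, 1/3, 1/6])
    a = np.zeros((4, 4))
    a[1, 0], a[2, 1], a[3, 2] = 0.5, 0.5, 1.0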

The following example compares the results of using explicit RK methods for
stages one through four. Note that the numerical rate of convergence matches the
stage number for stages one through four but, as we will see, this is not true in
general.

Example 4.4. Numerical simulations using explicit rk methods


Consider the IVP

y′(t) = t y^2(t),  0 ≤ t ≤ 2,  y(0) = −1,

whose exact solution is y(t) = −2/(t^2 + 2). In the table below we provide numerical
results at t = 2 using RK methods with stages one through four. The methods are chosen
so that they have as high a degree of accuracy as possible for the given stage. The global
error for each s-stage method at t = 2 is tabulated along with the corresponding numerical
rates calculated by comparing errors at successive values of the time step.

        stage 1             stage 2             stage 3             stage 4
∆t      error        rate   error        rate   error        rate   error        rate
1/5     2.38 10^−2          1.36 10^−3          1.29 10^−4          1.17 10^−5
1/10    1.08 10^−2   1.14   3.40 10^−4   2.01   1.48 10^−5   3.12   7.20 10^−7   4.02
1/20    5.17 10^−3   1.06   8.38 10^−5   2.02   1.78 10^−6   3.05   4.45 10^−8   4.02
1/40    2.53 10^−3   1.03   2.08 10^−5   2.01   2.19 10^−7   3.02   2.77 10^−9   4.01
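The fourth order column of this table can be reproduced with the rk_step sketch given earlier; the following lines are again our own illustration, not code from the text.

    import numpy as np

    f = lambda t, y: t * y**2
    exact = -2.0 / (2.0**2 + 2.0)       # y(2) for the exact solution -2/(t^2 + 2)

    errors = []
    for m in (5, 10, 20, 40):
        dt, t, y = 1.0 / m, 0.0, -1.0
        for _ in range(2 * m):          # integrate from t = 0 to t = 2
            y = rk_step(f, t, y, dt, a, b, c)
            t += dt
        errors.append(abs(y - exact))

    rates = [np.log2(e0 / e1) for e0, e1 in zip(errors, errors[1:])]
    print(errors)                       # errors shrink like dt**4
    print(rates)                        # rates approach 4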

Many RK methods were derived in the early part of the 1900’s; initially, the
impetus was to find higher order explicit methods. In Example 4.4 we saw that for
s ≤ 4 the stage and the order of accuracy are the same. One might be tempted
to generalize that an s-stage method always produces a method with global error
O(∆t)s , however, this is not the case. In fact, there is an order barrier which
is illustrated in the table below. As you can see from the table, a five-stage RK
method does not produce a fifth order scheme; we need a six-stage method to
produce that accuracy so there is no practical reason to use a five-stage scheme
because it has the same accuracy as a four-stage scheme but requires one additional
function evaluation.

Maximum accuracy of s-stage explicit RK methods

Stage 1 2 3 4 5 6 7 8 9 10
Order 1 2 3 4 4 5 6 6 7 7

Analogous to the general explicit s-stage RK scheme (4.16) we can write a


general form of an implicit s-stage RK method. The difference in implicit methods
is that in the calculation of k_i the approximation to y(t_n + c_i ∆t) can involve all
s of the stages k_j, j = 1, . . . , s, whereas in explicit methods the sum only goes through
the previously computed stages k_j, j = 1, . . . , i − 1.

General s-stage implicit RK method

k_1 = ∆t f(t_n, Y^n + a_{11} k_1 + a_{12} k_2 + ··· + a_{1s} k_s)
k_2 = ∆t f(t_n + c_2 ∆t, Y^n + a_{21} k_1 + a_{22} k_2 + ··· + a_{2s} k_s)
   ⋮        (4.19)
k_s = ∆t f(t_n + c_s ∆t, Y^n + a_{s1} k_1 + a_{s2} k_2 + ··· + a_{ss} k_s)
Y^{n+1} = Y^n + Σ_{j=1}^{s} b_j k_j

An implicit RK method has a tableau whose coefficient array is no longer strictly lower triangular

0      a_{11}   a_{12}   ···   a_{1s}
c_2    a_{21}   a_{22}   ···   a_{2s}
 ⋮       ⋮        ⋮      ⋱      ⋮        (4.20)
c_s    a_{s1}   a_{s2}   ···   a_{ss}
       b_1     b_2     ···    b_s

Unlike explicit RK methods, implicit RK methods do not have the order barrier.
For example, the following four-stage implicit RK method has order five so it is
more accurate than any four-stage explicit RK method.

k_1 = ∆t f(t_n, Y^n)
k_2 = ∆t f(t_n + (1/4)∆t, Y^n + (1/8) k_1 + (1/8) k_2)
k_3 = ∆t f(t_n + (7/10)∆t, Y^n − (1/100) k_1 + (14/25) k_2 + (3/20) k_3)
k_4 = ∆t f(t_n + ∆t, Y^n + (2/7) k_1 + (5/7) k_3)
Y^{n+1} = Y^n + (1/14) k_1 + (32/81) k_2 + (250/567) k_3 + (5/54) k_4 .
4.1.4 Stability of single-step methods


For stability we want to know that the computed solution to the difference equation
remains close to the actual solution of the difference equation and so does not grow
in an unbounded manner. We first look at stability of the differential equation for
the model problem

y′(t) = λy,  0 < t ≤ T,  λ ∈ C,  y(0) = y_0        (4.21)

with the solution y(t) = y0 eλt ; here C represents all complex numbers of the form
α + iβ. Note that in general λ is a complex number but to understand why we
look at this particular problem first consider the case when λ is real. If λ > 0 then
small changes in the initial condition can result in the solutions becoming far apart.
For example, if we have IVPs (4.21) with two initial conditions y1 (0) = α and
y2 (0) = β which differ by δ = |β − α| then the solutions y1 = αeλt and y2 = βeλt
differ by δeλt . Consequently, for large λ > 0 these solutions can differ dramatically
as illustrated in the table below for various choices of δ and λ. However, if λ < 0
the term δe^{λt} approaches zero as t → ∞. Therefore for stability of this model IVP
when λ is real we require λ < 0.

λ      δ = |β − α|   |y_1(0.5) − y_2(0.5)|   |y_1(1) − y_2(1)|   |y_1(10) − y_2(10)|
 1     0.01          0.0165                  0.0272              220
 1     0.1           0.165                   0.272               2203
 10    0.01          1.48                    220                 10^41
 10    0.1           14.8                    2203                10^42
−1     0.1           6.07 10^−2              3.68 10^−2          4.54 10^−6
−10    0.1           6.73 10^−4              4.54 10^−6          10^−45

In general, λ is complex so it can be written as λ = α + iβ where α, β are real
numbers and i = √−1. The exact solution is

y(t) = y_0 e^{λt} = y_0 e^{αt + iβt} = y_0 e^{αt} e^{iβt} .

Now e^{iβt} = cos(βt) + i sin(βt) so this term does not grow in time; however the
term e^{αt} grows in an unbounded manner if α > 0. Consequently we say that the
differential equation y 0 = λy is stable when the real part of λ is less than or equal
to zero, i.e., Re(λ) ≤ 0 or λ is in the left half of the complex plane.
When we approximate the model IVP (4.21) we want to know that small
changes, such as those due to roundoff, do not cause large changes in the solu-
tion. Here we are going to look at stability of a difference equation of the form

Y n+1 = ζ(λ∆t)Y n (4.22)

applied to the model problem (4.21). We apply the difference equation (4.22)
recursively to get

Y^n = ζ(λ∆t) Y^{n−1} = ζ^2(λ∆t) Y^{n−2} = ··· = ζ^n(λ∆t) Y^0


so we can view ζ as an amplification factor because the solution at time tn−1 is


amplified by a factor of ζ to get the solution at tn , the solution at time tn−2 is
amplified by a factor of ζ 2 to get the solution at tn , etc. Our single step methods
fit into this framework as the following example illustrates.

Example 4.5. amplification factors


Determine the amplification factors for the forward and backward Euler methods.
The forward Euler method applied to the differential equation y 0 = λy gives

Y n+1 = Y n + ∆tλY n = (1 + ∆tλ)Y n

so ζ(λ∆t) = 1 + ∆tλ.
For the backward Euler method we have

Y n+1 = Y n + ∆tλY n+1 ⇒ (1 − ∆tλ)Y n+1 = Y n

so ζ(λ∆t) = 1/(1 − λ∆t).


For explicit RK methods ζ(z) will be a polynomial in z and for implicit RK methods it will
be a rational function.
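A quick numerical check of this idea (our own sketch, not code from the text): for the model problem with λ = −20, the forward Euler amplification factor 1 + λ∆t has magnitude at most one only when ∆t ≤ 0.1.

    lam = -20.0
    for dt in (0.25, 0.1, 0.05):
        zeta = 1.0 + lam * dt            # forward Euler amplification factor
        print(dt, abs(zeta), abs(zeta) <= 1.0)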

We know that the magnitude of ζ must be less than or equal to one or else
Y n becomes unbounded. This condition is known as absolute stability. There are
many other definitions of different types of stability; some of these are explored in
the exercises.
Absolute Stability

The region of absolute stability for the difference equation (4.22) is {λ∆t ∈
C | |ζ(λ∆t)| ≤ 1}. A method is called A-stable if |ζ(λ∆t)| ≤ 1 for the entire
left half plane.

In the next example we determine the region of absolute stability of the forward
Euler method and compare to the results in Example 3.6.

Example 4.6. Determine if the forward Euler method and the backward Euler method
are A-stable; if not, determine the region of absolute stability. Then discuss the previous
numerical results for y 0 (t) = −20y(t) in light of these results.
For the forward Euler method ζ(λ∆t) = 1 + λ∆t so the condition for A-stability is that
|1 + λ∆t| ≤ 1 for the entire left plane. Now λ is, in general, complex which we can write
as λ = α + iβ but let’s first look at the real case, i.e., β = 0. Then we have

−1 ≤ 1 + λ∆t ≤ 1 ⇒ −2 ≤ λ∆t ≤ 0

so on the real axis we have the interval [−2, 0]. This says that for a fixed real λ < 0, ∆t
must satisfy ∆t ≤ 2/|λ| and thus the method is not A-stable but has a region [−2, 0] of
absolute stability if λ is real. If β ≠ 0 the region of absolute stability is the disk in the
complex plane of radius one centered at −1. For example, when λ = −20, ∆t must satisfy
∆t ≤ 0.1. In Example 3.6 we plotted results for ∆t = 1/4 and 1/8 which do not satisfy
the stability criteria. In the figure below we plot approximations to the same problem using
∆t = 1/20, 1/40 and 1/60. As you can see from the graph, the solution appears to be
converging.
[Figure: approximations to y′(t) = −20y(t) computed with ∆t = 1/20, 1/40 and 1/60 on 0 ≤ t ≤ 1; the curves visibly converge.]

For the backward Euler method ζ(λ∆t) = 1/(1 − λ∆t). To determine if it is A-stable we
see if it satisfies the stability criteria for the entire left plane. As before, we first find the
region when λ is real. For λ ≤ 0 we have 1 − λ∆t ≥ 1 so that |ζ(λ∆t)| ≤ 1 for all ∆t and
we have the entire left plane. The backward Euler method is A-stable so any choice of ∆t
provides stable results for λ < 0. This agrees with the results from Example ??.
To be precise, the region of absolute stability for the backward Euler method is actually the
region outside the circle in the complex plane centered at one with radius one. Clearly, this
includes the left half plane. To see this, note that when λ∆t ≥ 2 then |1/(1 − λ∆t)| ≤ 1.
However, we are mainly interested in the case when Re(λ) < 0 because the differential
equation y 0 (t) = λy is stable.

Next we show that the explicit Heun method has the same region of stability as
the forward Euler method so we expect the same behavior. Recall that the Heun
method is second order accurate whereas the Euler method is first order so accuracy
has nothing to do with stability.

Example 4.7. Investigate the region of absolute stability for the explicit 2-stage Heun
method given in (4.15). Plot results for this method applied to the IVP from Example ??.
Choose values of the time step where the stability criteria is not met and then some values
where it is satisfied.
We first write the scheme as a single equation rather than the standard way of specifying
ki because this makes it easier to determine the amplification factor.
Y^{n+1} = Y^n + (∆t/4) [ f(t_n, Y^n) + 3 f( t_n + (2/3)∆t, Y^n + (2/3)∆t f(t_n, Y^n) ) ] .
We apply the difference scheme to the model problem y 0 = λy where f (t, y) = λy(t) to
get
Y^{n+1} = Y^n + (∆t/4) [ λY^n + 3λ( Y^n + (2/3)∆t λY^n ) ] = [ 1 + λ∆t + (1/2)(λ∆t)^2 ] Y^n

so ζ(λ∆t) = 1 + λ∆t + (1/2)(λ∆t)^2. The region of absolute stability is all points z in the
complex plane where |ζ(z)| ≤ 1. If λ is real and non-positive we have
−1 ≤ 1 + z + z^2/2 ≤ 1  ⇒  −2 ≤ z(1 + z/2) ≤ 0 .
For λ ≤ 0 so that z = λ∆t ≤ 0 we must have 1 + (1/2)λ∆t ≥ 0 which says λ∆t ≥ −2. Thus
the region of stability is [−2, 0] when λ is real and when it is complex we have the disk of
radius one centered at −1. This is the same region as the one computed for the forward
Euler method.
The numerical results are shown below for the case when λ = −20. For this choice of λ
the stability criteria becomes ∆t ≤ 0.1 so for the choices of the time step ∆t = 0.5, 0.25
shown on the left, we expect the results to be unreliable but for the ones on the plot on
the right, the stability criteria is satisfied so the results are reliable.
[Figure: left, approximations with ∆t = 0.5 and 0.25, which violate the stability criterion and grow; right, approximations with ∆t = 1/32 and 1/64, which satisfy it.]

It can be shown that there is no explicit RK method that has an unbounded


region of absolute stability such as the left half plane region of stability that we got
for the backward Euler method. In general, implicit methods do not have stability
restrictions so this is one reason that we need implicit methods. Implicit methods
are especially important when we study initial boundary value problems.

4.2 Multistep Methods


An m-step multistep method uses previously calculated information at the m points
tn , tn−1 , . . . , tn−(m−1) to approximate the solution at tn+1 whereas a one-step
method uses only the information at tn plus additional approximations in the interval
[tn , tn+1 ] which are then discarded. This, of course, means that multistep methods
require fewer function evaluations than single-step methods. However, because
information from previous time steps is needed to advance the solution to tn+1 ,
these values must be stored. This is not an issue when we are solving a single
IVP but when we have a very large system of IVPs the storage of information from
previous steps is significant. Another issue with multistep methods is that they are
not self-starting. Recall that when we implement a RK method we use the initial
condition and then the scheme gives the approximation at t0 + ∆t and subsequent
points. However, if we are using say a three-step method then we need the initial
condition at t0 plus approximations at t1 and t2 to start using the method. So a
shortcoming of m-step methods is that we have to use a one-step method to get
approximations at the first m − 1 time steps after t0 .
An m-step method can be implicit or explicit. The solution at tn+1 can depend
explicitly on the solution and the slope at previous points but the most common
methods only use the solution at tn and the slopes at all points. To write the general
form of an m-step method which can be explicit or implicit we allow the scheme to
be a linear combination of the solution at the m points tn , tn−1 , . . . , tn−(m−1) and
the slopes at the m + 1 points t_{n+1}, t_n, t_{n−1}, . . . , t_{n−(m−1)}. If the method
is explicit then the coefficient in front of the term f (tn+1 , Y n+1 ) is zero.

General m-step method

Y^{n+1} = a_{m−1} Y^n + a_{m−2} Y^{n−1} + a_{m−3} Y^{n−2} + ··· + a_0 Y^{n+1−m}
          + ∆t [ b_m f(t_{n+1}, Y^{n+1}) + b_{m−1} f(t_n, Y^n) + b_{m−2} f(t_{n−1}, Y^{n−1})
          + ··· + b_0 f(t_{n+1−m}, Y^{n+1−m}) ] .        (4.23)

If bm = 0 then the method is explicit; otherwise it is implicit. We don’t include


a term am Y n+1 on the right-hand side for implicit methods because if we did we
could just combine it with the Y n+1 term on the left-hand side of the formula and
then divide by that coefficient to get the form in (4.23).
In this section we begin by looking at two different approaches to derive mul-
tistep methods which involve either approximating the solution or the slope by an
interpolating polynomial through the points used in the m-step method. For com-
monly used explicit methods we consider the family of Adams-Bashforth methods
and for implicit multistep method we investigate the so-called backward difference
methods and the Adams-Moulton families. Implementation issues for multistep
methods are discussed. Finally we state conditions which guarantee stability of
multistep methods.

4.2.1 Derivation of multistep methods


We saw that a common approach to deriving one-step methods (other than Taylor
series methods) is to integrate the differential equation from tn to tn+1 and use
a quadrature rule to approximate the integral over f (t, y). However, for an m-
step method we must integrate from tn−m+1 to tn+1 . For example, for a two-step
method we integrate the equation from tn−1 to tn+1 to get
∫_{t_{n−1}}^{t_{n+1}} y′(t) dt = y(t_{n+1}) − y(t_{n−1}) = ∫_{t_{n−1}}^{t_{n+1}} f(t, y) dt .

Now if we use the midpoint quadrature rule to approximate the integral on the right
we have the two-step scheme

Y^{n+1} = Y^{n−1} + 2∆t f(t_n, Y^n)        (4.24)

which is sometimes called the modified midpoint method. However, unlike one-
step methods, our choice of quadrature rule is restricted because for the quadrature
points we must use only the previously calculated times. For example, if we have a
three-step method using tn , tn−1 , and tn−2 we need to use a Newton-Cotes integra-
tion formula such as Simpson’s method. Remember that Newton-Cotes quadrature
rules are interpolatory and so this approach is closely related to using an interpola-
tion polynomial.
Multistep methods are typically derived by using an interpolating polynomial in
either of two ways. The first is to approximate y(t) by an interpolating polynomial
through tn , tn−1 , . . . , tn−m+1 and then differentiate it to get an approximation to
y′(t) and substitute this approximation into the DE. If we evaluate the approximation
at t_{n+1} then we obtain an implicit method. This gives rise to a family of implicit
methods called backward difference formulas. The second approach is to use an
interpolating polynomial through tn , tn−1 , . . . , tn−m+1 for the given slope f (t, y)
and then integrate the equation; the integral of the interpolating polynomial can be
computed exactly. We discuss both approaches here.
Similar to one-step methods, we can also derive multistep methods by assuming
the most general form of the m-step method and then determine the constraints on
the coefficients which give as high an order of accuracy as possible. This approach
is just the method of undetermined coefficients discussed in § 4.1.2. This approach
for deriving multistep methods is explored in the exercises.

Using an interpolating polynomial to approximate the solution

Backward difference formulas (BDFs) are a family of implicit multistep methods;


the backward Euler method is considered the first order BDF even though it is a
single step method. We begin by demonstrating how to derive the backward Euler
method by approximating y(t) by a linear interpolating polynomial and then show
how this approach is used to generate more accurate methods by simply using a
higher degree interpolating polynomial.
The Lagrange form of the unique linear polynomial that passes through the
points (t_n, y(t_n)) and (t_{n+1}, y(t_{n+1})) is

p_1(t) = y(t_n) (t − t_{n+1})/(−∆t) + y(t_{n+1}) (t − t_n)/∆t .

Differentiating with respect to t gives

p_1′(t) = −y(t_n)/∆t + y(t_{n+1})/∆t ,

which leads to the familiar approximation

y′(t) ≈ ( y(t_{n+1}) − y(t_n) ) / ∆t .
Using this expression in the differential equation y 0 (t) = f (t, y) at tn+1 gives the
implicit backward Euler method.
For the second order BDF we approximate y(t_{n+1}) by the quadratic polynomial
that passes through (t_{n−1}, y(t_{n−1})), (t_n, y(t_n)) and (t_{n+1}, y(t_{n+1})); the Lagrange
form of the polynomial is

p_2(t) = y(t_{n−1}) (t − t_n)(t − t_{n+1}) / [ (t_{n−1} − t_n)(t_{n−1} − t_{n+1}) ]
        + y(t_n) (t − t_{n−1})(t − t_{n+1}) / [ (t_n − t_{n−1})(t_n − t_{n+1}) ]
        + y(t_{n+1}) (t − t_{n−1})(t − t_n) / [ (t_{n+1} − t_{n−1})(t_{n+1} − t_n) ]

and differentiating with respect to t and assuming a constant ∆t gives

p_2′(t) = ( y(t_{n−1})/(2(∆t)^2) ) (2t − t_n − t_{n+1}) − ( y(t_n)/(∆t)^2 ) (2t − t_{n−1} − t_{n+1})
         + ( y(t_{n+1})/(2(∆t)^2) ) (2t − t_{n−1} − t_n) .

We use p_2′(t_{n+1}) as an approximation to y′(t_{n+1}) in the equation y′(t_{n+1}) =
f(t_{n+1}, y(t_{n+1})) to get

y(t_{n−1})/(2∆t) − 2y(t_n)/∆t + 3y(t_{n+1})/(2∆t) ≈ f(t_{n+1}, y(t_{n+1})) .

This suggests the BDF

(3/2) Y^{n+1} − 2 Y^n + (1/2) Y^{n−1} = ∆t f(t_{n+1}, Y^{n+1}) ;

often these formulas are normalized so that the coefficient of Y n+1 is one. It can
be shown that this method is second order.

Second order BDF:   Y^{n+1} = (4/3) Y^n − (1/3) Y^{n−1} + (2/3) ∆t f(t_{n+1}, Y^{n+1})        (4.25)
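For the model problem y′ = λy the implicit equation in (4.25) is linear in Y^{n+1} and can be solved in closed form; the short Python sketch below (our own illustration) exploits this and uses one backward Euler step for the starting value Y^1.

    import numpy as np

    lam, dt, T = -5.0, 0.01, 1.0             # model problem y' = lam*y, y(0) = 1
    n_steps = round(T / dt)

    y_prev = 1.0                             # Y^0 from the initial condition
    y_curr = y_prev / (1.0 - lam * dt)       # Y^1 from one backward Euler step

    for n in range(1, n_steps):
        # BDF2 with f = lam*y: (1 - (2/3) lam dt) Y^{n+1} = (4/3) Y^n - (1/3) Y^{n-1}
        y_next = ((4.0 / 3.0) * y_curr - y_prev / 3.0) / (1.0 - (2.0 / 3.0) * lam * dt)
        y_prev, y_curr = y_curr, y_next

    print(y_curr, np.exp(lam * T))           # approximation vs. exact e^{lam T}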

In general, BDF formulas using approximations at t_{n+1}, t_n, . . . , t_{n+1−m} have
the general normalized form

Y^{n+1} = Σ_{j=1}^{m} a_{mj} Y^{(n+1)−j} + β ∆t f(t_{n+1}, Y^{n+1}) .        (4.26)

For the two-step scheme (4.25) we have m = 2, a21 = 4/3, a22 = −1/3 and
β = 2/3. Table 4.1 gives coefficients for other uniform BDF formulas using the
terminology of (4.26). It can be proved that the accuracy of the m-step methods
included in the table is m.
It is also possible to derive BDFs for nonuniform time steps. The formulas are
derived in an analogous manner but are a bit more complicated because for the
interpolating polynomial we must keep track of each ∆ti ; in the case of a uniform
∆t there are some cancellations which simplify the resulting formulas.
m-step   a_{m1}      a_{m2}      a_{m3}     a_{m4}    a_{m5}    β
1        1                                                      1
2        4/3         −1/3                                       2/3
3        18/11       −9/11       2/11                           6/11
4        48/25       −36/25      16/25      −3/25               12/25
5        300/137     −300/137    200/137    −75/137   12/137    60/137

Table 4.1: Coefficients for implicit BDF formulas of the form (4.26) where the
coefficient of Y^{n+1} is normalized to one.

Using an interpolating polynomial to approximate the slope


The second choice for deriving schemes using an interpolating polynomial is to
approximate f (t, y) by an interpolating polynomial and then integrate. For example,
suppose we approximate f(t, y) by a polynomial of degree zero, i.e., a constant, in the
interval [t_n, t_{n+1}]. If we use the approximation f(t, y) ≈ f(t_n, y(t_n)) in [t_n, t_{n+1}]
then integrating the differential equation yields

y(t_{n+1}) − y(t_n) = ∫_{t_n}^{t_{n+1}} f(t, y) dt ≈ ∫_{t_n}^{t_{n+1}} f(t_n, y(t_n)) dt = f(t_n, y(t_n)) ∆t

which leads to the forward Euler method. If we choose to approximate f(t, y) in
[t_n, t_{n+1}] by f(t_{n+1}, y(t_{n+1})) then we get the backward Euler method. In general,
if the interpolation polynomial approximating f (t, y) includes the point tn+1 then
the resulting scheme is implicit because it involves f (tn+1 , Y n+1 ); otherwise it is
explicit.
To see how to derive a two-step explicit scheme, we use the previous information
at tn and tn−1 and write the linear interpolating polynomial for f (t, y) through the
two points and integrate the equation from tn to tn+1 . As before we use the
Fundamental Theorem of Calculus to integrate ∫_{t_n}^{t_{n+1}} y′(t) dt. Using uniform step
sizes, we have

y(t_{n+1}) − y(t_n) ≈ ∫_{t_n}^{t_{n+1}} [ f(t_{n−1}, y(t_{n−1})) (t − t_n)/(−∆t) + f(t_n, y(t_n)) (t − t_{n−1})/∆t ] dt

  = −(1/∆t) f(t_{n−1}, y(t_{n−1})) [ (t − t_n)^2/2 ]_{t_n}^{t_{n+1}}
    + (1/∆t) f(t_n, y(t_n)) [ (t − t_{n−1})^2/2 ]_{t_n}^{t_{n+1}}

  = −(1/∆t) f(t_{n−1}, y(t_{n−1})) (∆t)^2/2 + (1/∆t) f(t_n, y(t_n)) (3(∆t)^2/2) ,

which suggests the scheme

Y^{n+1} = Y^n + (3/2) ∆t f(t_n, Y^n) − (∆t/2) f(t_{n−1}, Y^{n−1}) .        (4.27)
This is an example of an Adams-Bashforth multistep method; these methods are


discussed in more detail in § 4.2.2.

4.2.2 Adams-Bashforth and Adams-Moulton families


A commonly used family of explicit multistep methods are called Adams-Bashforth
methods which use the derivative f evaluated at m prior points (including tn ) but
only use the approximation to y(t) at tn ; i.e., a0 = · · · = am−2 = 0. The one
step Adams-Bashforth method is the forward Euler method. In § 4.2.1 we use an
interpolation polynomial for f (t, y) to derive the 2-step scheme
Y^{n+1} = Y^n + (3/2) ∆t f(t_n, Y^n) − (∆t/2) f(t_{n−1}, Y^{n−1})
which belongs to the Adams-Bashforth family with b2 = 0, b1 = 3/2 and b0 =
−1/2 in the general formula (4.23). In the exercises, you are asked to rigorously
demonstrate that the local truncation error for (4.27) is third order and thus the
scheme is second order accurate. The methods up to five steps are listed here for
completeness.

Adams-Bashforth 2-step, 3-step, 4-step and 5-step methods

Y^{n+1} = Y^n + ∆t [ (3/2) f(t_n, Y^n) − (1/2) f(t_{n−1}, Y^{n−1}) ]

Y^{n+1} = Y^n + ∆t [ (23/12) f(t_n, Y^n) − (4/3) f(t_{n−1}, Y^{n−1}) + (5/12) f(t_{n−2}, Y^{n−2}) ]

Y^{n+1} = Y^n + ∆t [ (55/24) f(t_n, Y^n) − (59/24) f(t_{n−1}, Y^{n−1}) + (37/24) f(t_{n−2}, Y^{n−2})
                     − (3/8) f(t_{n−3}, Y^{n−3}) ]

Y^{n+1} = Y^n + ∆t [ (1901/720) f(t_n, Y^n) − (1387/360) f(t_{n−1}, Y^{n−1}) + (109/30) f(t_{n−2}, Y^{n−2})
                     − (637/360) f(t_{n−3}, Y^{n−3}) + (251/720) f(t_{n−4}, Y^{n−4}) ]
        (4.28)
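A sketch of how the 2-step method might be implemented in Python follows; the function name, the forward Euler start, and the test IVP (the one from Example 4.4) are our own choices.

    import numpy as np

    def ab2_solve(f, t0, y0, dt, n_steps):
        # 2-step Adams-Bashforth; the single extra starting value Y^1
        # is generated by one forward Euler step.
        y = np.empty(n_steps + 1)
        y[0] = y0
        f_prev = f(t0, y[0])
        y[1] = y[0] + dt * f_prev
        for n in range(1, n_steps):
            f_n = f(t0 + n * dt, y[n])
            y[n + 1] = y[n] + dt * (1.5 * f_n - 0.5 * f_prev)
            f_prev = f_n                     # only stored slopes are reused
        return y

    f = lambda t, y: t * y**2
    y = ab2_solve(f, 0.0, -1.0, 0.01, 200)   # integrate to t = 2
    print(abs(y[-1] - (-2.0 / 6.0)))         # exact y(2) = -2/(4 + 2)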

Schemes in the Adams-Moulton family are implicit multistep methods which use
the derivative f evaluated at tn+1 plus m prior points but only use the solution Y n .
The one-step Adams-Moulton method is the backward Euler scheme and the 2-step
method is the trapezoidal rule; several methods are listed here for completeness.

Adams-Moulton 2-step, 3-step, 4-step and 5-step methods

Y^{n+1} = Y^n + (∆t/2) [ f(t_{n+1}, Y^{n+1}) + f(t_n, Y^n) ]

Y^{n+1} = Y^n + ∆t [ (5/12) f(t_{n+1}, Y^{n+1}) + (2/3) f(t_n, Y^n) − (1/12) f(t_{n−1}, Y^{n−1}) ]

Y^{n+1} = Y^n + ∆t [ (3/8) f(t_{n+1}, Y^{n+1}) + (19/24) f(t_n, Y^n) − (5/24) f(t_{n−1}, Y^{n−1})
                     + (1/24) f(t_{n−2}, Y^{n−2}) ]

Y^{n+1} = Y^n + ∆t [ (251/720) f(t_{n+1}, Y^{n+1}) + (646/720) f(t_n, Y^n) − (264/720) f(t_{n−1}, Y^{n−1})
                     + (106/720) f(t_{n−2}, Y^{n−2}) − (19/720) f(t_{n−3}, Y^{n−3}) ]
        (4.29)

As mentioned, multistep methods require storage of the m previously calculated


values. In the Adams family of methods only Y n is used but the slopes at all m
points must be stored. So when implementing the methods we must take this into
account. For a single equation the extra storage is negligible but for systems it
requires m vectors.
Another drawback of an m-step method is that we need m starting values Y^0,
Y^1, . . . , Y^{m−1} and we only have Y^0 from the initial condition. Typically one uses

a one-step method to start the scheme. How do we decide what method to use? A
“safe” approach is to use a method which has the same accuracy as the multistep
method but we see in the following examples that you can actually use a method
which has one power of ∆t less because we are only taking a small number of
steps with the method. For example, if we use the 2-step second order Adams-
Bashforth method we need Y 1 in addition to Y 0 . If we take one step with the
forward Euler method it is actually second order accurate at the first step because
the error there is only due to the local truncation error. However, if we use a 3-step
third order Adams-Bashforth method then using the forward Euler method to get
the two starting values results in a loss of accuracy. This issue is illustrated in the
following example.

Example 4.8. starting values for multistep methods


Implement the 3-step third order accurate Adams-Bashforth method given in (4.28) to
solve the IVP
y′(t) = t^2 + y(t),  2 < t < 5,  y(2) = 1,
which has the exact solution

y(t) = 11et−2 − (t2 + 2t + 2) .

Compare the numerical rates of convergence when different methods are used to generate
the starting values. Specifically we use RK methods of order one through four to generate
the starting values, which for a 3-step method are Y^1 and Y^2 because we have Y^0 = 1.
Tabulate the errors at t = 3.
As you can see from the table, if a second, third or fourth order scheme is used to compute
the starting values then the method is third order. There is nothing gained by using a
higher order scheme (the fourth order) for the starting values. However, if a first order
scheme (forward Euler) is used then the rate is degraded to second order even though we
only used it to calculate two values, Y 1 and Y 2 . Consequently to compute starting values
we should use a scheme that has the same overall accuracy or one degree less than the
method we are using.

                       accuracy of starting method
        first               second              third               fourth
∆t      error        rate   error        rate   error        rate   error        rate
1/10    2.425 10^−1         1.618 10^−2         8.231 10^−3         8.042 10^−3
1/20    6.106 10^−2  1.99   2.241 10^−3  2.87   1.208 10^−3  2.77   1.195 10^−3  2.75
1/40    1.529 10^−2  2.00   2.946 10^−4  2.92   1.628 10^−4  2.89   1.620 10^−4  2.88
1/80    3.823 10^−3  2.00   3.777 10^−5  2.96   2.112 10^−5  2.95   2.107 10^−5  2.94

The next example provides results for 2-step through 5-step Adams-Bashforth
methods. From the previous example we see that to maintain the accuracy the
starting values need to be determined by methods of order m − 1.

Example 4.9. comparison of adams-bashforth methods


Solve the IVP from the previous example by 2-step through 5-step Adams Bashforth
methods. In each case use a scheme that is one degree less accurate to calculate the
starting values.
As can be seen from the table, all methods have the expected numerical rate of
convergence.

        2-step method       3-step method       4-step method       5-step method
∆t      error        rate   error        rate   error        rate   error        rate
1/10    2.240 10^−1         1.618 10^−2         9.146 10^−4         5.567 10^−5
1/20    5.896 10^−2  1.93   2.241 10^−3  2.87   6.986 10^−5  3.71   2.463 10^−6  4.50
1/40    1.509 10^−2  1.97   2.946 10^−4  2.92   4.802 10^−6  3.86   8.983 10^−8  4.78
1/80    3.816 10^−3  1.98   3.777 10^−5  2.96   3.144 10^−7  3.93   3.022 10^−9  4.89

4.2.3 Stability of multistep methods


The numerical stability of a one-step method depends on the initial condition y_0
but in an m-step multistep method there are m − 1 other starting values Y^1, Y^2, . . . ,
Y m−1 which are obtained by another method such as a RK method. In 1956
Dahlquist3 published a seminal work formulating criteria for the stability of linear
multistep methods. We give an overview of the results here.
³Germund Dahlquist (1925-2005) was a Swedish mathematician.
We first rewrite the m-step multistep method (4.23) by shifting the indices to
get
Y^{i+m} = a_{m−1} Y^{i+m−1} + a_{m−2} Y^{i+m−2} + a_{m−3} Y^{i+m−3} + ··· + a_0 Y^i
          + ∆t [ b_m f(t_{i+m}, Y^{i+m}) + b_{m−1} f(t_{i+m−1}, Y^{i+m−1})
          + b_{m−2} f(t_{i+m−2}, Y^{i+m−2}) + ··· + b_0 f(t_i, Y^i) ]

or equivalently

Y^{i+m} − Σ_{j=0}^{m−1} a_j Y^{i+j} = ∆t Σ_{j=0}^{m} b_j f(t_{i+j}, Y^{i+j}) .

As before, we apply it to the model IVP y′ = λy, y(0) = y_0 for Re(λ) < 0 which
guarantees the IVP itself is stable. Substituting f = λy into the difference equation
gives

Y^{i+m} − Σ_{j=0}^{m−1} a_j Y^{i+j} = ∆t λ Σ_{j=0}^{m} b_j Y^{i+j} .

Recall that a technique for solving a linear homogeneous ODE such as y″(t) +
2y′(t) − y(t) = 0 is to look for solutions of the form y = e^{rt} and get a polynomial
equation for r such as e^{rt}(r^2 + 2r − 1) = 0 and then determine the roots of the
equation. We take the analogous approach for the difference equation and seek a
solution of the form Y^i = z^i. Substitution into the difference equation yields

z^{i+m} − Σ_{j=0}^{m−1} a_j z^{i+j} = ∆t λ Σ_{j=0}^{m} b_j z^{i+j} .

Canceling the lowest order term z^i gives a polynomial equation in z which is a
function of λ and ∆t resulting in the stability equation

Q(λ∆t) = z^m − Σ_{j=0}^{m−1} a_j z^j − ∆t λ Σ_{j=0}^{m} b_j z^j = ρ(z) − ∆tλ σ(z) ,

where

ρ(z) = z^m − Σ_{j=0}^{m−1} a_j z^j   and   σ(z) = Σ_{j=0}^{m} b_j z^j .        (4.30)

For stability, we need the roots of ρ(z) to have magnitude ≤ 1 and if a root is
identically one then it must be a simple root. If this root condition is violated,
then the method is unstable so a simple check is to first see if the root condition is
satisfied; if the root condition is satisfied then we need to find the region of stability.
To do this, we find the roots of Q(λ∆t) and require that each root has magnitude
less than or equal to one. To simplify the calculations we rewrite Q(λ∆t) as
Q(λ∆t) = z^m (1 − λ∆t b_m) − z^{m−1} (a_{m−1} + b_{m−1} λ∆t)
         − z^{m−2} (a_{m−2} + b_{m−2} λ∆t) − ··· − (a_0 + b_0 λ∆t) .
In the following example we determine the region of stability for both the forward
and backward Euler methods using the analysis for multistep methods. The same
stability conditions as we obtained by analyzing the stability of the one-step methods
using the amplification factor are realized.

Example 4.10. Investigate the stability of the forward and backward Euler methods by
first demonstrating that the root condition for ρ(z) is satisfied and then finding the region
of absolute stability. Confirm that the same results as obtained for stability by considering
the methods as single-step methods are achieved.
The forward Euler method is written as Y n+1 = Y n + ∆tf (tn , Y n ) so in the form of a
multistep method with m = 1 we have a_0 = 1, b_0 = 1, b_1 = 0 and thus ρ(z) = z − 1 whose
root is z = 1 so the root condition is satisfied. To find the region of absolute stability
we have Q(λ∆t) = z − (1 + λ∆t) which has a single root 1 + λ∆t; thus the region of
absolute stability is |1 + λ∆t| ≤ 1 which is the condition we got before by analyzing the
method as a single step method.

For the backward Euler method a0 = 1, b0 = 0, b1 = 1 and so ρ(z) = z − 1 which has the
root z = 1 and so the root condition is satisfied. To find the region of absolute stability
we have Q(λ∆t) = z(1 − λ∆t) − 1 which has a single root 1/(1 − λ∆t) and we get the
same restriction that we got before by analyzing the method as a single-step method.

The next example analyzes the stability of a 2-step method using the Dalhquist
conditions.

Example 4.11. In this example we want to show that the 2-step Adams-Bashforth
method
Y^{n+1} = Y^n + (∆t/2) [ 3 f(t_n, Y^n) − f(t_{n−1}, Y^{n−1}) ]
is stable.
For this Adams-Bashforth method we have m = 2, a0 = 0, a1 = 1, b0 = −1/2, b1 = 3/2,
and b2 = 0. The characteristic polynomial is ρ(z) = z 2 − z = z(z − 1) whose two roots
are z = 0, 1 and the root condition is satisfied.
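This root-condition check is easy to automate; the following small NumPy snippet is our own illustration.

    import numpy as np

    # rho(z) = z^2 - z for the 2-step Adams-Bashforth method
    roots = np.roots([1.0, -1.0, 0.0])
    print(roots)                                  # [1. 0.]
    print(np.all(np.abs(roots) <= 1.0 + 1e-12))   # root condition satisfied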

In summary, we see that some methods can be unstable if the step size ∆t
is too large (such as the forward Euler method) while others are stable even for
a large choice of ∆t (such as the backward Euler method). In general, explicit
methods have stability restrictions whereas implicit methods are stable for all step
sizes. Of course, one must have a small enough step size for accuracy. We have
just touched on the ideas of stability of numerical methods for IVPs; the interested
reader is referred to standard texts in numerical analysis for a thorough treatment
of stability. The important concept is that we need a consistent and stable method
to guarantee convergence of our results.
4.3 Extrapolation methods


Richardson extrapolation is a technique used throughout numerical analysis. The
basic idea is that you take a sequence of approximations which are generated by a
method whose error can be expanded in terms of powers of a discretization param-
eter and then combine these approximations to generate a more accurate solution.
Recall that when we calculate the local truncation error we expand the error in terms
of the step size ∆t so the methods we have studied can be used with Richardson ex-
trapolation. This approach can also be viewed as interpolating the approximations
and then extrapolating to the point where the parameter is zero. A popular method
for solving an IVP where high accuracy is required is the Bulirsch-Stoer algorithm
which refines this extrapolation concept to provide a robust and efficient algorithm.
In this section we demonstrate the basic idea of how extrapolation is used with
approximations generated by methods we already have. Then we briefly discuss the
modifications needed to develop the Bulirsch-Stoer method.

4.3.1 Richardson extrapolation


As a simple example consider the forward Euler method which we know is a first
order approximation. To simplify the notation we set h to be the time step or grid
spacing. From the Taylor series expansion for f(t + h) we have the forward difference
approximation to f′(t) which we denote by N(h), where N(h) = ( f(t + h) − f(t) )/h.
We have

f′(t) − N(h) = −( (h/2) f″(t) + (h^2/3!) f‴(t) + (h^3/4!) f^{(4)}(t) + ··· ) .        (4.31)
Now if we generate another approximation using step size h/2 we have

f′(t) − N(h/2) = −( (h/4) f″(t) + (h^2/(4·3!)) f‴(t) + (h^3/(8·4!)) f^{(4)}(t) + ··· ) .        (4.32)
The goal is to combine these approximations to eliminate the O(h) term so that
the approximation is O(h^2). Clearly subtracting (4.31) from twice (4.32) eliminates
the terms involving h so we get

f′(t) − [ 2N(h/2) − N(h) ] = ( h^2/3! − h^2/(2·3!) ) f‴(t) + ( h^3/4! − h^3/(4·4!) ) f^{(4)}(t) + ···
                           = (h^2/12) f‴(t) + (h^3/32) f^{(4)}(t) + ··· .        (4.33)
Thus the approximation 2N(h/2) − N(h) for f′(t) is second order. This process
can be repeated to eliminate the O(h^2) term. To see this, we use the approximation
(4.33) with h halved again to get

f′(t) − [ 2N(h/4) − N(h/2) ] = (h^2/(4·12)) f‴(t) + (h^3/(8·32)) f^{(4)}(t) + ··· .        (4.34)

To eliminate the h^2 terms we need to take four times (4.34) and subtract (4.33)
so that 3f′(t) − [ 8N(h/4) − 4N(h/2) − 2N(h/2) + N(h) ] = O(h^3). This yields the
approximation (8/3) N(h/4) − 2N(h/2) + (1/3) N(h) which is a third order approximation to
f′(t). In theory, this procedure can be repeated to get as accurate an approximation
as possible for the given computer precision. In this simple example we have used
approximations at h, h/2, h/4, . . . but a general sequence h_0, h_1, h_2, . . . where h_0 >
h_1 > h_2 > ··· can be used. The following example takes first order approximations
generated by the forward Euler method and produces approximations of a specified
order.

Example 4.12. forward euler with richardson extrapolation


Consider the IVP y′(t) = −5y(t) with y(0) = 2, whose exact solution at t = 1 is 0.0134759.
Tabulate the results using the forward Euler method with ∆t = 1/10, 1/20, . . . , 1/320
and then use Richardson extrapolation to get a sequence of approximations which are
quadratically convergent. Calculate the numerical rate of convergence.
We tabulate the results for the problem below and then use Richardson extrapolation to
obtain a sequence of solutions which converge quadratically. Note that no extra compu-
tations are performed except to take the linear combination of two previous results, i.e.,
2N (∆t/2) − N (∆t).

        Euler → N(∆t)        2N(∆t/2) − N(∆t)
∆t      Y^n           Rate   Y^n           Rate
1/10    1.953 10^−3
1/20    6.342 10^−3   0.692  1.073 10^−2
1/40    9.580 10^−3   0.873  1.282 10^−2   2.09
1/80    1.145 10^−2   0.942  1.332 10^−2   2.05
1/160   1.244 10^−2   0.973  1.343 10^−2   2.03
1/320   1.295 10^−2   0.986  1.346 10^−2   2.01

Note that to get the rates in the last column more accuracy in the solution had to be used
than the recorded values in the table.
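A short Python sketch of this computation (function and variable names are ours):

    import numpy as np

    def euler_solve(lam, y0, T, n):
        # Forward Euler for y' = lam*y on [0, T] with n uniform steps.
        y, dt = y0, T / n
        for _ in range(n):
            y += dt * lam * y
        return y

    lam, y0, T = -5.0, 2.0, 1.0
    exact = y0 * np.exp(lam * T)

    prev = None
    for n in (10, 20, 40, 80, 160, 320):
        N_h = euler_solve(lam, y0, T, n)
        if prev is not None:
            extrap = 2.0 * N_h - prev        # 2 N(dt/2) - N(dt): second order
            print(n, extrap, abs(exact - extrap))
        prev = N_h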

This procedure of taking linear combinations is equivalent to polynomial interpolation
where we interpolate the points (h_i, N(h_i)), i = 0, 1, . . . , s, and then evaluate
the interpolation polynomial at h = 0. For example, suppose we have a first order
approximation, as was the case for the forward difference approximation, and use h_0
and h_1 = h_0/2. Then the linear polynomial which interpolates (h_0, N(h_0)) and
(h_1, N(h_1)) is

N(h_0) (h − h_1)/(h_0 − h_1) + N(h_1) (h − h_0)/(h_1 − h_0) .

Setting h_1 = h_0/2 and evaluating the polynomial at h = 0, i.e., extrapolating to
zero, we get −N(h_0) + 2N(h_0/2) which is exactly the approximation we derived in
(4.33) by taking the correct linear combination of (4.31) and (4.32).
4.3.2 Bulirsch-Stoer method

The Bulirsch-Stoer method is an extrapolation scheme which takes advantage of the
error expansion of certain methods to produce very accurate results in an efficient
manner. In order for the extrapolation method to work we must know that the error
in approximating w(x) is of the form

w(x) − N(h) = K_1 h + K_2 h^2 + K_3 h^3 + K_4 h^4 + ··· .        (4.35)
However, if all the odd terms in this error expansion are known to be zero then this
greatly enhances the benefits of repeated extrapolation. For example, suppose we
have a second order approximation where all the odd powers of h are zero, i.e.,

w(x) − N(h) = K_1 h^2 + K_2 h^4 + ··· + K_i h^{2i} + O( h^{2(i+1)} ) .        (4.36)

Then, when we obtain a numerical approximation N(h/2) the linear combination
which eliminates the h^2 term is

[ 4w(x) − 4N(h/2) ] − [ w(x) − N(h) ] = 3w(x) − [ 4N(h/2) − N(h) ] = −(3h^4/4) K_2 + O(h^6)
so that the approximation (4/3) N(h/2) − (1/3) N(h) is fourth order accurate after only one
extrapolation step. Although extrapolation methods can be applied to all methods
with an error expansion of the form (4.35), the most efficient methods, such as
the Bulirsch-Stoer method, use an underlying method which has an error expansion
such as (4.36).
A common choice for the low order method is the two-step midpoint method

Y^{n+2} = Y^n + 2∆t f(t_{n+1}, Y^{n+1}) .

The error expansion for this method does not contain odd terms; see the exercises.
The other modification that the Bulirsch-Stoer method incorporates to improve
the extrapolation process is to use rational interpolation rather than polynomial
interpolation. A link to an implementation of this method can be found on its
Wikipedia page.

4.4 Predictor-Corrector Methods


We have considered several implicit schemes for approximating the solution of an
IVP. However, when we implement these schemes the solution of a nonlinear equa-
tion is necessary unless f (t, y) is linear in y or only a function of t. This requires extra
work and moreover, we know that methods such as the Newton-Raphson method
for nonlinear equations are not guaranteed to converge globally. Additionally, we
ultimately want to develop variable time step methods so we need methods which
provide an easy way to estimate errors. For these reasons, we look at predictor-
corrector schemes.
In predictor-corrector methods an implicit scheme is implemented explicitly be-
cause it is used to improve (or correct) the solution that is first obtained (or pre-
dicted) by an explicit scheme. The implicit scheme is implemented as an explicit
scheme because instead of computing f (tn+1 , Y n+1 ) we use the known predicted
value at tn+1 . One can also take the approach of correcting more than once. You
can view this approach as being similar to applying the Newton-Raphson method
where we take the predictor step as the initial guess and each corrector is a New-
ton iteration; however, the predictor step gives a systematic approach to finding an
initial guess.
We first consider the Euler-trapezoidal predictor-corrector pair where the explicit
scheme is forward Euler and the implicit scheme is the trapezoidal method (4.12).
Recall that the forward Euler scheme is first order and the trapezoidal is second
order. Letting the result of the predicted solution at tn+1 be Ypn+1 , we have the
following predictor-corrector pair.

Euler-Trapezoidal Predictor-Corrector Method

$$
\begin{aligned}
Y_p^{n+1} &= Y^n + \Delta t\, f(t_n, Y^n) \\[2pt]
Y^{n+1} &= Y^n + \frac{\Delta t}{2}\Big[\, f\big(t_{n+1}, Y_p^{n+1}\big) + f\big(t_n, Y^n\big) \,\Big]
\end{aligned} \tag{4.37}
$$

As can be seen from the description of the scheme, the implicit trapezoidal method
is now implemented as an explicit method because we evaluate f at the known point
(tn+1 , Ypn+1 ) instead of at the unknown point (tn+1 , Y n+1 ). The method requires
two function evaluations so the work is equivalent to a two-stage RK method. The
scheme is often denoted by PECE because we first predict Ypn+1 , then evaluate
f (tn+1 , Ypn+1 ), then correct to get Y n+1 and finally evaluate f (tn+1 , Y n+1 ) to get
ready for the next step.
The predicted solution Ypn+1 from the forward Euler method is first order but
we add a correction to it using the trapezoidal method and improve the error. We
can view the predictor-corrector pair as implementing the difference scheme

$$Y^{n+1} = Y^n + \frac{\Delta t}{2}\Big[\, f\big(t_{n+1},\ Y^n + \Delta t\, f(t_n, Y^n)\big) + f(t_n, Y^n) \,\Big]$$

which uses the average of the slope at (t_n, Y^n) and the slope at t_{n+1} evaluated at the Euler approximation
there. To analytically demonstrate the accuracy of a predictor-corrector method it
is helpful to write the scheme in this manner. In the exercises you are asked to show
that this predictor-corrector pair is second order. Example 4.13 demonstrates that
numerically we get second order.
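As a concrete illustration, a minimal Python sketch of one PECE step of (4.37) follows; the function name and argument list are our own choices.

    def pece_step(f, t, y, dt):
        # one Euler-trapezoidal predictor-corrector (PECE) step, cf. (4.37)
        fn = f(t, y)                      # E: slope at (t_n, Y^n)
        yp = y + dt * fn                  # P: forward Euler prediction of Y^{n+1}
        fp = f(t + dt, yp)                # E: slope at the predicted point
        return y + 0.5 * dt * (fp + fn)   # C: trapezoidal correction

The final evaluation f(t_{n+1}, Y^{n+1}) of the PECE pattern is simply performed at the start of the next call.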
One might believe that if one correction step improves the accuracy, then two
or more correction steps are better. This leads to methods which are commonly
denoted as PE(CE)r schemes where the last two steps in the correction process
are repeated r times. Of course it is not known a priori how many correction
steps should be done but since the predictor step provides a good starting guess,
only a small number of corrections are typically required. The effectiveness of the
correction step can be dynamically monitored to determine r. The next example
applies the Euler-trapezoidal rule to an IVP using more than one correction step.

Example 4.13. euler-trapezoidal predictor-corrector pair


Perform a single step of the Euler-trapezoidal predictor-corrector pair by hand to demonstrate how it is implemented and then compare the numerical results when a different number of corrections are used. Specifically use the IVP

$$y'(t) = \frac{t\, y^2}{\sqrt{9 - t^2}}\,, \quad 0 < t \le 2, \qquad y(0) = 1$$

which has the exact solution y(t) = 1/(√(9 − t²) − 2) that can be found by separating variables and integrating.
Using Y⁰ = 1 and ∆t = 0.1 we first predict the value at 0.1 using the forward Euler method with f(t, y) = ty²/√(9 − t²) to get

P:  Y_p¹ = Y⁰ + 0.1 f(t₀, Y⁰) = 1 + 0.1(0) = 1.0 ;

then evaluate the slope at this point

E:  f(0.1, 1.0) = (0.1)(1²)/√(9 − 0.1²) = 0.03335187

and finally correct to obtain the approximation at t₁ = 0.1

C:  Y¹ = Y⁰ + (0.1/2)[ f(0.1, 1.0) + f(0, 1) ] = 1.00166759

with

E:  f(0.1, 1.00166759) = 0.03346320 .

To perform a second correction we have

C:  Y¹ = Y⁰ + (0.1/2)[ f(0.1, 1.00166759) + f(0, 1) ] = 1.00167316

where

E:  f(0.1, 1.00167316) = 0.03346357 .

The results for the approximate solutions at t = 2 are given in the table below using
decreasing values of ∆t; the results using a second correction step, PE(CE)², are also
given. As can be seen from the table, the predictor-corrector pair is second order.
Note that it requires one more function evaluation per step, f(t_{n+1}, Y_p^{n+1}), than the
Euler method. The Midpoint rule requires the same number of function evaluations and has
the same accuracy as this predictor-corrector pair. However, the predictor-corrector pair
provides an easy way to estimate the error at each step.

              PECE                     PE(CE)²
  ∆t      Error          rate     Error          rate
  1/10    2.62432 10⁻²            3.93083 10⁻²
  1/20    7.66663 10⁻³   1.75     1.16639 10⁻²   1.75
  1/40    3.18110 10⁻³   1.87     3.18110 10⁻³   1.87
  1/80    8.31517 10⁻⁴   1.94     8.31517 10⁻⁴   1.94
  1/160   2.12653 10⁻⁴   1.97     2.12653 10⁻⁴   1.97

In the previous example we saw that the predictor was first order, the corrector
second order and the overall method was second order. It can be proved that if the
corrector is O(∆t)ⁿ and the predictor is at least O(∆t)ⁿ⁻¹ then the overall method
is O(∆t)ⁿ. Consequently PC pairs should be chosen so that the corrector is at most
one order more accurate than the predictor.
Higher order predictor-corrector pairs often consist of an explicit multistep method
such as an Adams-Bashforth method and a corresponding implicit Adams-Moulton
multistep method. The pair should be chosen so that the only additional function
evaluation in the corrector equation is at the predicted point. To achieve this one
often chooses the predictor and corrector to have the same accuracy. For example,
one such pair is an explicit third order Adams-Bashforth predictor coupled with an
implicit third order Adams-Moulton. Notice that the corrector only requires one
additional function evaluation at (tn+1 , Ypn+1 ).

Third order Adams-Moulton predictor-corrector pair

$$
\begin{aligned}
Y_p^{n+1} &= Y^n + \frac{\Delta t}{12}\Big[\, 23 f(t_n, Y^n) - 16 f(t_{n-1}, Y^{n-1}) + 5 f(t_{n-2}, Y^{n-2}) \,\Big] \\[2pt]
Y^{n+1} &= Y^n + \frac{\Delta t}{12}\Big[\, 5 f(t_{n+1}, Y_p^{n+1}) + 8 f(t_n, Y^n) - f(t_{n-1}, Y^{n-1}) \,\Big]
\end{aligned} \tag{4.38}
$$
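A sketch of one step of the pair (4.38) in Python; the arguments fnm1 and fnm2 are hypothetical names for the stored slope values at t_{n−1} and t_{n−2}, which a real code would carry along from earlier steps.

    def abm3_step(f, t, y, dt, fnm1, fnm2):
        # one PECE step of the third order Adams-Bashforth-Moulton pair (4.38);
        # fnm1 = f(t_{n-1}, Y^{n-1}) and fnm2 = f(t_{n-2}, Y^{n-2})
        fn = f(t, y)
        yp = y + dt / 12.0 * (23.0 * fn - 16.0 * fnm1 + 5.0 * fnm2)   # predict
        fp = f(t + dt, yp)                                            # evaluate
        ynew = y + dt / 12.0 * (5.0 * fp + 8.0 * fn - fnm1)           # correct
        return ynew, fn    # fn is returned so the caller can reuse it next step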

Example 4.14. third order adams-moulton predictor-corrector pair


Compare the errors and rates of convergence for the PC pair (4.38) with the third order
Adams-Bashforth method defined in (4.28) for the problem in Example 4.13.
We tabulate the results below. Note that both numerical rates are approaching three but
the error in the PC pair is almost an order of magnitude smaller at a fixed ∆t.

          Predictor only             Predictor-Corrector
  ∆t      Error          rate     Error          rate
  1/10    2.0100 10⁻²             1.5300 10⁻³
  1/20    3.6475 10⁻³    2.47     3.3482 10⁻⁴    2.19
  1/40    5.4518 10⁻⁴    2.74     5.5105 10⁻⁵    2.60
  1/80    7.4570 10⁻⁵    2.87     7.9035 10⁻⁶    2.80
  1/160   9.7513 10⁻⁶    2.93     1.0583 10⁻⁶    2.90

4.5 Comparison of single-step and multistep methods
We have seen that single-step schemes are methods which essentially have no “mem-
ory”. That is, once y(tn ) is obtained they perform approximations to y(t) in
the interval (tn , tn+1 ] as a means to approximate y(tn+1 ); these approximations


are discarded once y(tn+1 ) is computed. On the other hand, multistep methods
“remember” the previously calculated solutions and slopes because they combine
information that was previously calculated at points such as tn , tn−1 , tn−2 . . . to
approximate the solution at tn+1 .
There are advantages and disadvantages to both single step and multistep meth-
ods. Because multistep methods use previously calculated information, we must
store these values; this is not an issue when we are solving a single IVP but if we
have a system then our solution and the slope are vectors and so this requires more
storage. However multistep methods have the advantage that f (t, y) has already
been evaluated at prior points so this information can be stored and no new func-
tion evaluations are needed for explicit multistep methods. Consequently multistep
methods require fewer function evaluations per step than single-step methods and
should be used where it is costly to evaluate f (t, y).
If we look at methods such as the Adams-Bashforth schemes given in (4.28) then
we realize another shortcoming of multistep methods. Initially we set Y⁰ = y(t₀),
and this is all that is needed to start a single-step method. However, if we are using a two-step
method we need both Y⁰ and Y¹ to implement the scheme. How can we get an
approximation to y(t1 )? The obvious approach is to use a single step method. So
if we use m previous values (including tn ) then we must take m − 1 steps of a
single-step method to start the simulations; it is m − 1 steps because we have the
value Y0 . Of course care must be taken in the choice of which single step method
to use and this was discussed in Example 4.8. We saw that if our multistep method
is O(∆t)r then we should choose a single step method of the same accuracy or
one power less; a scheme which converges at a rate of O(∆t)r−2 or less would
contaminate the accuracy of the method.
In the next chapter we investigate variable time step and variable order methods.
It is typically easier to do this with single-step rather than multistep methods. How-
ever, multivalue methods have been formulated which are equivalent to multistep
methods on paper but are implemented in a different way which allows easier use
of a variable time step.
Older texts often recommend multistep methods for problems that require high
accuracy and whose slope is expensive to evaluate and Runge-Kutta methods for
the rest of the problems. However, with the advent of faster computers and more
efficient algorithms, the advantage of one method over the other is less apparent. It
is worthwhile to understand and implement both single-step and multistep methods.

EXERCISES

1. Each of the following Runge-Kutta schemes is written in the Butcher tableau


format. Identify each scheme as explicit or implicit and then write the scheme
as
$$Y^{n+1} = Y^n + \sum_{i=1}^{s} b_i\, f\big(t_n + c_i\,\Delta t,\ Y^n + k_i\big)$$

where the appropriate values are substituted for bi , ci , and ki .

a.
   0    |
   1/2  |  1/2
   1    | −1     2
   -----+-------------------
        |  1/6   2/3   1/6

b.
   0    |  1/6   −1/3    1/6
   1/2  |  1/6    5/12  −1/12
   1    |  1/6    2/3    1/6
   -----+------------------------
        |  1/6    2/3    1/6

2. Modify the derivation of the explicit second order Taylor series method in
§ 4.1.1 to derive an implicit second order Taylor series method.

3. Use a Taylor series to derive a third order accurate explicit difference equation
for the IVP (2.8).

4. Gauss quadrature rules are popular for numerical integration because one
gets the highest accuracy possible for a fixed number of quadrature points;
however one gives up the “niceness” of the quadrature points. In addition,
these rules are defined over the interval [−1, 1]. For example, the one-point
Gauss quadrature rule is

$$\int_{-1}^{1} g(x)\, dx \approx 2\, g(0)$$

and the two-point Gauss quadrature rule is

$$\int_{-1}^{1} g(x)\, dx \approx g\Big(\frac{-1}{\sqrt{3}}\Big) + g\Big(\frac{1}{\sqrt{3}}\Big) .$$

Use the one-point Gauss rule to derive a Gauss-Runge-Kutta method. Is


the method explicit or implicit? Does it coincide with any method we have
derived?

5. Simpson’s numerical integration rule is given by


$$\int_a^b g(x)\, dx \approx \frac{b-a}{6}\Big[\, g(a) + 4\, g\Big(\frac{a+b}{2}\Big) + g(b) \,\Big]$$

If g(x) ≥ 0 on [a, b] then it approximates the area under the curve g(x) by the
area under a parabola passing through the points (a, g(a)), (b, g(b)) and
((a+b)/2, g((a+b)/2)). Use this quadrature rule to approximate $\int_{t_n}^{t_{n+1}} f(t, y)\, dt$ to
obtain an explicit 3-stage RK method. When you need to evaluate terms such
as f at tn + ∆t/2 use an appropriate Euler step to obtain an approximation
to the corresponding y value as we did in the Midpoint method. Write your
method in the format of (4.16) and in a Butcher tableau.

6. In § ?? we derived a second order BDF formula for uniform grids. In an
analogous manner, derive the corresponding method for a nonuniform grid.
7. Use an appropriate interpolating polynomial to derive the multistep method

Y n+1 = Y n−1 + 2∆tf (tn , Y n ) .

Determine the accuracy of this method.


8. Determine the local truncation error for the 2-step Adams-Bashforth method
(4.27).
Chapter 5

Systems and Adaptive Step Size Methods

When modeling phenomena where we know the initial state and how it changes
with time, we often have either a higher order IVP or a system of IVPs rather than
a single first order IVP. In this chapter we first recall how a higher order IVP can be
transformed into a system of first order IVPs. Then we extend in a straightforward
manner some of the methods from Chapter ?? to systems of equations. We discuss
implementation issues and give examples that illustrate the use of systems of IVPs.
Then we point out how to extend our stability tests for a single equation to a system.
The final topic we investigate in our study of IVPs is methods which efficiently
allow a variable time step to be taken. For these methods we need a means
to estimate the next time step. If we can get an estimate for the error made at
time tn then the magnitude of the error can be used to accept or reject the step
and, if the step is accepted, to estimate the next time step. Consequently, our goal
is to find methods which can be used to estimate the error. One strategy is to
obtain two approximations at a given time and use these to measure the error. Of
course obtaining the second approximation must be done efficiently or else the cost
is prohibitively large. In addition to variable step, many production codes are also
variable order. We do not address these here.

5.1 Higher Order IVPs


The methods we have learned only apply to first order IVPs. However, recall from
§ 2.4 that it is straightforward to rewrite an IVP of order m > 1 as a coupled system
of m first order IVPs. In general, if we have the pth order IVP for
y(t)

$$y^{[p]}(t) = f\big(t, y, y', y'', \cdots, y^{[p-1]}\big), \qquad t_0 < t \le T$$

$$y(t_0) = \alpha_1, \quad y'(t_0) = \alpha_2, \quad y''(t_0) = \alpha_3, \quad \cdots, \quad y^{[p-1]}(t_0) = \alpha_p$$


then we convert it to a system of p first order IVPs by letting w₁(t) = y(t), w₂(t) =
y′(t), · · · , wₚ(t) = y^{[p−1]}(t), which yields the first order coupled system

$$
\begin{aligned}
w_1'(t) &= w_2(t) \\
w_2'(t) &= w_3(t) \\
&\;\;\vdots \\
w_{p-1}'(t) &= w_p(t) \\
w_p'(t) &= f(t, w_1, w_2, \ldots, w_p)
\end{aligned} \tag{5.1}
$$

along with the initial conditions wₖ(t₀) = αₖ, k = 1, 2, . . . , p. Thus any higher order


IVP that we encounter can be transformed into a coupled system of first order IVPs.

Example 5.1. converting a high order ivp into a system


Write the fourth order IVP

$$y^{[4]}(t) + 2y''(t) + 4y(t) = 5, \qquad y(0) = 1,\ \ y'(0) = -3,\ \ y''(0) = 0,\ \ y'''(0) = 2$$

as a system of first order equations.


We want four first order differential equations for wᵢ(t), i = 1, 2, 3, 4; to this end let
w₁ = y, w₂ = y′, w₃ = y″, and w₄ = y‴. The first two expressions give
w₁′ = w₂, the second and third give w₂′ = w₃, the third and fourth give w₃′ = w₄, and
the original differential equation provides the last first order equation w₄′ + 2w₃ + 4w₁ = 5.
The system of equations is thus

$$
\begin{aligned}
w_1'(t) - w_2(t) &= 0 \\
w_2'(t) - w_3(t) &= 0 \\
w_3'(t) - w_4(t) &= 0 \\
w_4'(t) + 2w_3(t) + 4w_1(t) &= 5
\end{aligned}
$$

along with the initial conditions

w1 (0) = 1, w2 (0) = −3, w3 (0) = 0, and w4 (0) = 2.
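In code, the conversion amounts to writing a single right-hand-side routine that returns the vector (w₂, w₃, w₄, 5 − 2w₃ − 4w₁). A Python sketch for this example (the names are our own):

    import numpy as np

    def F(t, w):
        # first order system for y'''' + 2y'' + 4y = 5 with w = (y, y', y'', y''')
        return np.array([w[1], w[2], w[3], 5.0 - 2.0 * w[2] - 4.0 * w[0]])

    w0 = np.array([1.0, -3.0, 0.0, 2.0])   # initial conditions w_k(0) = alpha_k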

Oftentimes a model is already in the form of a coupled system of first order IVPs
such as the predator-prey model (2.6). Our goal is to apply the methods of the
previous chapter to a system of first order IVPs. The notation we use for a general
system of N first order IVPs is

$$
\begin{aligned}
w_1'(t) &= f_1(t, w_1, w_2, \ldots, w_N) \\
w_2'(t) &= f_2(t, w_1, w_2, \ldots, w_N) \\
&\;\;\vdots \\
w_N'(t) &= f_N(t, w_1, w_2, \ldots, w_N)
\end{aligned} \qquad t_0 < t \le T \tag{5.2}
$$

along with the initial conditions wk (t0 ) = αk , k = 1, 2, . . . , N . For example, using


this notation the pth order IVP written as the system (5.1) has f1 = w2 , f2 = w3 ,
etc.

Existence, uniqueness and continuous dependence of the solution to the system


(5.2) can be established. Analogous to the case of a single IVP each function fi
must satisfy a Lipschitz condition with respect to each unknown wi . Details of this
analysis are found in standard texts on ODEs. For the sequel, we assume that each
system has a unique solution which depends continuously on the data.
In the next two sections we demonstrate how one-step and multistep methods
from Chapter ?? are easily extended to the system of N equations (5.2).

5.2 Single-step methods for systems


We now want to extend single-step methods to the system (5.2). For simplicity
we first extend the forward Euler method for a system and then with the intuition
gained from applying that method we extend a general explicit Runge-Kutta method
to a system. Implicit RK methods are extended in an analogous way. We use the
notation Wkn as the approximation to wk (tn ) where the subscript of W refers to
the unknown number and the superscript to the point tn .
Suppose we have the first order system (5.2) with the initial conditions wk (t0 ) =
αk for k = 1, 2, . . . , N . The forward Euler method for each of the k = 1, 2, . . . , N
equations is
$$W_k^{n+1} = W_k^n + \Delta t\, f_k\big(t_n, W_1^n, W_2^n, \cdots, W_N^n\big) .$$


We write the Euler method as a vector equation so we can solve for all N un-
knowns simultaneously at each tn ; this is not necessary but results in an efficient
implementation of the method. To this end we set
$$\mathbf{W}^n = \begin{pmatrix} W_1^n \\ W_2^n \\ \vdots \\ W_N^n \end{pmatrix}
\quad\text{and}\quad
\mathbf{F}^n = \begin{pmatrix} f_1(t_n, \mathbf{W}^n) \\ f_2(t_n, \mathbf{W}^n) \\ \vdots \\ f_N(t_n, \mathbf{W}^n) \end{pmatrix}$$
so that W0 = (α1 , α2 , . . . , αN )T . For n = 0, 1, 2, . . . we have the following vector
equation for the forward Euler method for a system
Wn+1 = Wn + ∆tFn . (5.3)
To implement the scheme at each point tn we have N function evaluations to form
the vector Fn , then we perform the scalar multiplication to get ∆tFn and then a
vector addition to obtain the final result Wn+1 . To compute the error at a specific
time, we have to take into account the fact that the approximation is now a vector
instead of a scalar. Also the exact solution is a vector of each wi evaluated at
the specific time the error is determined. We can easily calculate an error vector
as the difference in these two vectors. To obtain a single number to use in the
calculation of a numerical rate, we must use a vector norm. A common vector norm
is the standard Euclidean norm, often called the ℓ₂ norm or the “little ℓ₂ norm”.
Another commonly used vector norm is the max or infinity norm. See the appendix
for details.
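The vector equation (5.3) translates directly into code. A minimal Python sketch, assuming F(t, W) returns the slope as an N-vector (the function and variable names are ours):

    import numpy as np

    def euler_system(F, t0, w0, dt, nsteps):
        # forward Euler (5.3) for a system: W^{n+1} = W^n + dt * F^n
        t, W = t0, np.asarray(w0, dtype=float).copy()
        for _ in range(nsteps):
            W = W + dt * F(t, W)    # the N function evaluations are packed in F
            t = t + dt
        return t, W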

Example 5.2. forward euler for a system


Consider the system of three IVPs

$$
\begin{aligned}
w_1'(t) &= 2w_2(t) - 4t, & 0 < t < 10 \\
w_2'(t) &= -w_1(t) + w_3(t) - e^t + 2, & 0 < t < 10 \\
w_3'(t) &= w_1(t) - 2w_2(t) + w_3(t) + 4t, & 0 < t < 10 \\
w_1(0) &= -1, \quad w_2(0) = 0, \quad w_3(0) = 2 &
\end{aligned}
$$

for the unknown (w1 , w2 , w3 )T whose exact solution is (− cos(2t), sin(2t) + 2t, cos(2t) +
et )T . Determine by hand an approximation at t = 0.2 using ∆t = 0.1 and the forward
Euler method. Calculate the Euclidean (ℓ₂) norm of the error, normalized by the norm of the exact solution, at t = 0.2.
For this problem we have

$$\mathbf{W}^0 = \begin{pmatrix} -1 \\ 0 \\ 2 \end{pmatrix}
\quad\text{and}\quad
\mathbf{F}^n = \begin{pmatrix} 2W_2^n - 4t_n \\ -W_1^n + W_3^n - e^{t_n} + 2 \\ W_1^n - 2W_2^n + W_3^n + 4t_n \end{pmatrix}$$

so that with tₙ = t₀ = 0

$$\mathbf{F}^0 = \begin{pmatrix} 2(0) - 4(0) \\ -(-1) + 2 - e^0 + 2 \\ -1 - 0 + 2 + 4(0) \end{pmatrix} = \begin{pmatrix} 0 \\ 4 \\ 1 \end{pmatrix} .$$

With ∆t = 0.1 we determine W¹ from

$$\mathbf{W}^1 = \mathbf{W}^0 + \Delta t\, \mathbf{F}^0 = \begin{pmatrix} -1 \\ 0 \\ 2 \end{pmatrix} + 0.1 \begin{pmatrix} 0 \\ 4 \\ 1 \end{pmatrix} = \begin{pmatrix} -1.0 \\ 0.4 \\ 2.1 \end{pmatrix} .$$

Now to determine W² = W¹ + ∆t F¹ we need F¹, which is found by using W¹ and
t₁ = ∆t = 0.1 in the definition of Fⁿ. We have

$$\mathbf{F}^1 = \begin{pmatrix} 2(0.4) - 4(0.1) \\ 1 + 2.1 - e^{0.1} + 2 \\ -1 - 2(0.4) + 2.1 + 4(0.1) \end{pmatrix} = \begin{pmatrix} 0.400 \\ 3.995 \\ 0.700 \end{pmatrix}$$

so that

$$\mathbf{W}^2 = \begin{pmatrix} -1.0 \\ 0.4 \\ 2.1 \end{pmatrix} + 0.1 \begin{pmatrix} 0.400 \\ 3.995 \\ 0.700 \end{pmatrix} = \begin{pmatrix} -0.9600 \\ 0.7995 \\ 2.1700 \end{pmatrix} .$$
The exact solution at t = 0.2 is (−0.921061, 0.789418, 2.14246)ᵀ. Unlike the case of a
single IVP we now have an error vector instead of a single number; at t = 0.2 the error
vector in our calculation is (0.038939, 0.010082, 0.02754)ᵀ. To obtain a single number from
this vector to use in the calculation of a numerical rate, we must use a vector norm. For
this calculation at t = 0.2 the ℓ₂ norm of the error, normalized by the ℓ₂ norm of the exact solution, is 1.98 10⁻².

Suppose now that we have an s-stage RK method; recall that for a single first
order equation we have s function evaluations for each tn . If we have N first order
IVPs, then we need sN function evaluations at each tn . For example, if we use
a 4-stage RK with 10,000 equations then at each time we need 40,000 function
evaluations; if we do 100 time steps then we have 4 million function evaluations. If


function evaluations are expensive, multistep methods may be more efficient.
In an s-stage RK method for a single equation we must compute each ki ,
i = 1, 2, . . . , s as defined in (4.16). For a system, we have a vector of slopes so
each scalar ki in (4.16) is now a vector. Thus for a system an s-stage RK method
is written as
$$
\begin{aligned}
\mathbf{k}_1 &= \Delta t\, \mathbf{F}\big(t_n,\ \mathbf{W}^n\big) \\
\mathbf{k}_2 &= \Delta t\, \mathbf{F}\big(t_n + c_2\Delta t,\ \mathbf{W}^n + a_{21}\mathbf{k}_1\big) \\
\mathbf{k}_3 &= \Delta t\, \mathbf{F}\big(t_n + c_3\Delta t,\ \mathbf{W}^n + a_{31}\mathbf{k}_1 + a_{32}\mathbf{k}_2\big) \\
&\;\;\vdots \\
\mathbf{k}_s &= \Delta t\, \mathbf{F}\big(t_n + c_s\Delta t,\ \mathbf{W}^n + a_{s1}\mathbf{k}_1 + a_{s2}\mathbf{k}_2 + \cdots + a_{s,s-1}\mathbf{k}_{s-1}\big) \\
\mathbf{W}^{n+1} &= \mathbf{W}^n + \sum_{j=1}^{s} b_j \mathbf{k}_j .
\end{aligned}
$$

For example, the 2-stage Heun RK method has coefficients c₁ = 0, c₂ = 2/3,
a₂₁ = 2/3, b₁ = 1/4 and b₂ = 3/4, so for a system we have

$$
\begin{aligned}
\mathbf{k}_1 &= \Delta t\, \mathbf{F}(t_n, \mathbf{W}^n) \\
\mathbf{k}_2 &= \Delta t\, \mathbf{F}\big(t_n + \tfrac23\Delta t,\ \mathbf{W}^n + \tfrac23 \mathbf{k}_1\big) \\
\mathbf{W}^{n+1} &= \mathbf{W}^n + \tfrac14 \mathbf{k}_1 + \tfrac34 \mathbf{k}_2 .
\end{aligned}
$$
The following example uses the Heun method to approximate the solution to the
IVP in the previous example by hand and then we see how the approximation can
be computed using dot and matrix products so we can efficiently implement the
method.

Example 5.3. heun method for a system


By hand approximate the solution to the system at t = 0.2 given in the previous example
using the Heun method. Calculate the Euclidean error at t = 0.2 using ∆t = 0.1. Compare
with the results for the forward Euler method in the previous example. Then redo the
calculation using dot and matrix products to illustrate how it might be implemented on a
computer.
As in the previous example, W⁰ = (−1, 0, 2)ᵀ and Fⁿ = (2W₂ⁿ − 4tₙ, −W₁ⁿ + W₃ⁿ −
e^{tₙ} + 2, W₁ⁿ − 2W₂ⁿ + W₃ⁿ + 4tₙ)ᵀ. For the first step of length ∆t = 0.1 we have
k₁ = 0.1(0, 4, 1)ᵀ and to determine k₂ we need to evaluate F at (⅔(0.1), W⁰ + ⅔k₁);
performing this calculation gives k₂ = (0.026667, 0.399773, 0.08)ᵀ so that

$$\mathbf{W}^1 = \begin{pmatrix} -1 \\ 0 \\ 2 \end{pmatrix} + \frac14 \begin{pmatrix} 0.0 \\ 0.4 \\ 0.1 \end{pmatrix} + \frac34 \begin{pmatrix} 0.026667 \\ 0.399773 \\ 0.080000 \end{pmatrix} = \begin{pmatrix} -0.98000 \\ 0.39983 \\ 2.08500 \end{pmatrix} .$$
Similarly for the approximation at 0.2 we have

$$\mathbf{W}^2 = \begin{pmatrix} -0.98000 \\ 0.39983 \\ 2.08500 \end{pmatrix} + \frac14 \begin{pmatrix} 0.039966 \\ 0.395983 \\ 0.070534 \end{pmatrix} + \frac34 \begin{pmatrix} 0.066097 \\ 0.390402 \\ 0.051767 \end{pmatrix} = \begin{pmatrix} -0.92040 \\ 0.79163 \\ 2.14150 \end{pmatrix} .$$

The exact solution at t = 0.2 is (−0.921061, 0.789418, 2.14246)T giving an error vector of
(0.000661, .002215, .000096)T ; calculating the standard Euclidean norm of the error and
normalizing by the Euclidean norm of the solution gives 1.0166×10−3 which is considerably
smaller than we obtained for the forward Euler.
Now we want to redo the problem using dot and matrix products. Let

$$\mathbf{c} = \begin{pmatrix} 0 \\ \tfrac23 \end{pmatrix}, \qquad \mathbf{b} = \begin{pmatrix} \tfrac14 \\ \tfrac34 \end{pmatrix}, \qquad a = \begin{pmatrix} 0 & 0 \\ \tfrac23 & 0 \end{pmatrix}, \qquad K = \begin{pmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{pmatrix} .$$

As before N = 3, W0 = (−1, 0, 2)T and to get W1 we compute

W0 + Kb

using a matrix times vector operation. To form the two columns of K we need to determine
the point where we evaluate F. To determine the first column of K, i.e., k₁, we have

$$
\begin{aligned}
t_{\text{eval}} &= t + c(1)\Delta t = 0 + 0(0.1) = 0 \\
W_{\text{eval}} &= \mathbf{W}^0 + K\, a(1,:)^T = \begin{pmatrix} -1 \\ 0 \\ 2 \end{pmatrix} + K \begin{pmatrix} 0 \\ 0 \end{pmatrix} = \begin{pmatrix} -1 \\ 0 \\ 2 \end{pmatrix} \\
K(:,1) &= \Delta t\, \mathbf{F}(t_{\text{eval}}, W_{\text{eval}}) = 0.1 \begin{pmatrix} 0 \\ 4 \\ 1 \end{pmatrix} = \begin{pmatrix} 0.0 \\ 0.4 \\ 0.1 \end{pmatrix}
\end{aligned}
$$

and for the second column of K

$$
\begin{aligned}
t_{\text{eval}} &= t + c(2)\Delta t = 0 + \frac{2}{3}(0.1) = \frac{0.2}{3} \\
W_{\text{eval}} &= \mathbf{W}^0 + K\, a(2,:)^T = \begin{pmatrix} -1 \\ 0 \\ 2 \end{pmatrix} + K \begin{pmatrix} 2/3 \\ 0 \end{pmatrix} = \begin{pmatrix} -1 \\ 0.8/3 \\ 2 + 0.2/3 \end{pmatrix} \\
K(:,2) &= \Delta t\, \mathbf{F}(t_{\text{eval}}, W_{\text{eval}}) = \begin{pmatrix} 0.026667 \\ 0.399773 \\ 0.08 \end{pmatrix} .
\end{aligned}
$$

Now we want to determine how to modify our implementation of an explicit RK
method for a single IVP to incorporate a system. The coefficients of the particular
RK method do not change but, of course, the solution at each time tₙ is now
a vector, as is the slope F(tₙ, Wⁿ). The routine which evaluates the slope
requires the scalar time and an N-vector as input and outputs an N-vector which is
the slope at the given point. For a single equation using an s-stage method we have
s scalars kᵢ which we store as an s-vector; for a system each kᵢ is an N-vector, so we
need a two-dimensional array K dimensioned by the number of equations N by the number of
stages s in the method. The following pseudocode illustrates how one could modify
the routine for advancing RK for a single equation.

Algorithm 5.1 : Advancing explicit RK for a system one time step

Assume: Coefficients of the RK method are stored as follows: cᵢ and bᵢ, i =
1, . . . , s, in 1D arrays and the coefficients a_{ij}, i, j = 1, . . . , s, in a
2D array
Input: the solution W at the time t stored as an N-vector, the uniform
time step ∆t, the number of stages s, and the coefficients a, b, c; K is a
two-dimensional N × s work array
Output: the solution W and the new time t
Loop over number of stages:

  K = 0
  for i = 1, . . . , s
      t_eval = t + c(i)∆t
      W_eval = W + matrix product K a(i, :)ᵀ
      K(:, i) = ∆t F(t_eval, W_eval)

  W = W + matrix product K b
  t = t + ∆t
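A direct Python transcription of Algorithm 5.1 (with array indices shifted to start at zero) might look as follows; the Heun coefficients at the bottom repeat those given earlier in this section.

    import numpy as np

    def rk_system_step(F, t, W, dt, a, b, c):
        # advance an explicit s-stage RK method one step for a system;
        # a is the s x s (strictly lower triangular) array, b and c have length s
        s, N = len(b), len(W)
        K = np.zeros((N, s))
        for i in range(s):
            t_eval = t + c[i] * dt
            W_eval = W + K @ a[i, :]       # W plus sum_j a_{ij} k_j
            K[:, i] = dt * F(t_eval, W_eval)
        return t + dt, W + K @ b

    # coefficients of the 2-stage Heun method
    a = np.array([[0.0, 0.0], [2.0 / 3.0, 0.0]])
    b = np.array([0.25, 0.75])
    c = np.array([0.0, 2.0 / 3.0])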

Example 5.4. rates of convergence for RK for a system


For the problem in Example 5.2 apply the forward Euler and the Heun method with
∆t = 1/10, 1/20, . . . , 1/80 and output the ℓ₂ and ℓ∞ norms of the normalized error at
t = 1. Compute the numerical rates of convergence and compare with the results for a
single IVP.
In the table below we tabulate the results using the forward Euler method for this system
at t = 1, where both the ℓ₂-norm and the ℓ∞-norm (i.e., the maximum norm) of the error,
normalized by the corresponding norm of the exact solution, are reported.
Clearly we have linear convergence as we did in the case of a single equation.

  ∆t      ℓ₂ Error      rate     ℓ∞ Error      rate
  1/10    6.630 10⁻²             6.019 10⁻²
  1/20    3.336 10⁻²    0.99     3.156 10⁻²    0.93
  1/40    1.670 10⁻²    1.00     1.631 10⁻²    0.95
  1/80    8.350 10⁻³    1.00     8.277 10⁻³    0.98

In the table below we tabulate the results using the Heun method for this system at
t = 1, where again the ℓ₂- and ℓ∞-norms of the error normalized by the corresponding
norm of the solution are reported. Clearly we have quadratic convergence as we did in
the case of a single equation.

  ∆t      ℓ₂ Error      rate     ℓ∞ Error      rate
  1/10    5.176 10⁻³             5.074 10⁻³
  1/20    1.285 10⁻³    2.01     1.242 10⁻³    2.03
  1/40    3.198 10⁻⁴    2.01     3.067 10⁻⁴    2.02
  1/80    7.975 10⁻⁵    2.00     7.614 10⁻⁵    2.01

5.3 Multistep methods for systems


Recall that explicit multistep methods use not only the approximation at tn but
other nearby calculated times to extrapolate the solution to the new point. The
m-step explicit method from § 4.2.2 for a single IVP is

$$
\begin{aligned}
Y^{n+1} = {} & a_{m-1} Y^{n} + a_{m-2} Y^{n-1} + a_{m-3} Y^{n-2} + \cdots + a_0 Y^{n+1-m} \\
& + \Delta t \big[\, b_{m-1} f(t_n, Y^n) + b_{m-2} f(t_{n-1}, Y^{n-1}) + \cdots + b_0 f(t_{n+1-m}, Y^{n+1-m}) \,\big] .
\end{aligned}
$$

For a system of N equations the function f is now a vector F so we must store its
value at the previous m steps. In the Adams-Bashforth or Adams-Moulton methods
a0 , a1 , . . . , am−2 = 0 so only the solution at tn is used. This saves additional storage
because we only have to store m slope vectors and a single vector approximation to
the solution. So for the system of N equations using an m-step Adams-Bashforth
method we must store (m + 1) vectors of length N . Remember that an m-step
method requires m starting values so we need to calculate m − 1 values from a
single-step method.
As a concrete example, consider the 2-step Adams-Bashforth method

$$Y^{n+1} = Y^n + \Delta t \Big[\, \frac32 f(t_n, Y^n) - \frac12 f(t_{n-1}, Y^{n-1}) \,\Big]$$
for a single IVP. Using the notation of the previous section we extend the method
to the system of N equations as

$$\mathbf{W}^{n+1} = \mathbf{W}^n + \Delta t \Big[\, \frac32 \mathbf{F}(t_n, \mathbf{W}^n) - \frac12 \mathbf{F}(t_{n-1}, \mathbf{W}^{n-1}) \,\Big] . \tag{5.4}$$
At each step we must store three vectors Wn , F(tn , Wn ), and F(tn−1 , Wn−1 ).
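A Python sketch of (5.4); only the previous slope vector is stored and updated as the computation marches forward (the names are our own, and W1 is assumed to come from a starting method):

    import numpy as np

    def ab2_system(F, t0, W0, W1, dt, nsteps):
        # 2-step Adams-Bashforth (5.4); W1 must be supplied by a starting method
        Fold = F(t0, np.asarray(W0, dtype=float))     # F(t_{n-1}, W^{n-1})
        W, t = np.asarray(W1, dtype=float), t0 + dt
        for _ in range(nsteps):
            Fn = F(t, W)                              # F(t_n, W^n)
            W, Fold = W + dt * (1.5 * Fn - 0.5 * Fold), Fn
            t = t + dt
        return t, W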
In the next example we apply this 2-step method to the system of the previous
examples.

Example 5.5. adams-bashforth method for a system


Apply the 2-step Adams-Bashforth method (5.4) to the system of Example 5.2 to approx-
imate by hand the solution at t = 0.2 using a time step of ∆t = 0.1. Give the Euclidean
norm of the error. Use appropriate starting values. Then tabulate the results at t = 1 for
a sequence of ∆t approaching zero and calculate the numerical rate of convergence.

Because this is a 2-step method we need two starting values. We have the initial condition
for W0 but we also need W1 . Because this method is second order we can use either a
first or second order scheme to generate an approximation to W1 . Here we use the results
from the Heun method in Example 5.3 for W¹. Consequently we have

$$\mathbf{W}^0 = \begin{pmatrix} -1 \\ 0 \\ 2 \end{pmatrix} \quad\text{and}\quad \mathbf{W}^1 = \begin{pmatrix} -0.98000 \\ 0.39983 \\ 2.08500 \end{pmatrix} .$$

From the previous example we have F(0, W⁰) = (0.0, 4.0, 1.0)ᵀ and F(0.1, W¹) =
(0.39966, 3.95983, 0.70534)ᵀ. Then W² is given by

$$\mathbf{W}^2 = \begin{pmatrix} -0.98000 \\ 0.39983 \\ 2.08500 \end{pmatrix} + 0.1 \left[\, \frac32 \begin{pmatrix} 0.39966 \\ 3.95983 \\ 0.70534 \end{pmatrix} - \frac12 \begin{pmatrix} 0.0 \\ 4.0 \\ 1.0 \end{pmatrix} \right] = \begin{pmatrix} -0.92005 \\ 0.79380 \\ 2.14080 \end{pmatrix} .$$

The table below tabulates the errors at t = 1. Of course we can only use
W¹ = (−0.98000, 0.39983, 2.08500)ᵀ as a starting value for the computations with
∆t = 0.1; for the other choices of ∆t we must generate starting values because t₁ is
different. From the results we see that the rate is two, as expected.

  ∆t      ℓ₂ Error      rate     ℓ∞ Error      rate
  1/10    1.346 10⁻²             1.340 10⁻²
  1/20    3.392 10⁻³    1.99     3.364 10⁻³    1.99
  1/40    8.550 10⁻⁴    1.99     8.456 10⁻⁴    1.99
  1/80    2.149 10⁻⁴    1.99     2.121 10⁻⁴    1.99

5.4 Stability of Systems


Oftentimes we have a system of first order IVPs or we have a higher order IVP which
we first write as a system of first order IVPs. We want to extend our definition
of absolute stability to a system but we first look at stability for the differential
equations themselves. Analogous to the problem y 0 (t) = λy we consider the linear
model problem
w0 (t) = Aw
for an N × N system of IVPs where A is an N × N matrix. Consider first the simple
case where A is a diagonal matrix and the equations are uncoupled so basically we
have the same situation as a single equation. Thus the stability criterion is that
the real part of each diagonal entry must be less than zero. But the
diagonal entries of a diagonal matrix are just its N eigenvalues¹ λᵢ counted according
to multiplicity. So an equivalent statement of stability when A is diagonal is that
Re(λᵢ) < 0, i = 1, 2, . . . , N; it turns out that this is the stability criterion for a
general matrix A. Recall that even if the entries of A are real the eigenvalues may
be complex. If A is symmetric we are guaranteed that the eigenvalues are real. If we
have the general system (5.2) where fᵢ(t, w) is not linear in w, then the condition
becomes one on the eigenvalues of the Jacobian matrix of f, whose (i, j) entry is
∂fᵢ/∂wⱼ.

¹The eigenvalues of an N × N matrix A are scalars λ such that Ax = λx; the vector x is
called the eigenvector corresponding to the eigenvalue λ.
Now if we apply the forward Euler method to the system w′(t) = Aw, where
the entries of A are a_{ij}, then we have

$$\mathbf{W}^{n+1} = \begin{pmatrix}
1+\Delta t\,a_{11} & \Delta t\,a_{12} & \Delta t\,a_{13} & \cdots & \Delta t\,a_{1N} \\
\Delta t\,a_{21} & 1+\Delta t\,a_{22} & \Delta t\,a_{23} & \cdots & \Delta t\,a_{2N} \\
\vdots & & \ddots & & \vdots \\
\Delta t\,a_{N1} & \cdots & \cdots & \Delta t\,a_{N,N-1} & 1+\Delta t\,a_{NN}
\end{pmatrix} \mathbf{W}^n = \big(I + \Delta t A\big)\,\mathbf{W}^n .$$

The condition on ∆t is determined by requiring that every eigenvalue 1 + ∆tλᵢ of
the matrix I + ∆tA satisfies |1 + ∆tλᵢ| ≤ 1, where the λᵢ are the eigenvalues of A;
this is the same criterion we obtained for the forward Euler method applied to a
single equation.
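For the linear model problem this criterion is easy to check numerically. A short sketch using numpy (the matrix and step size are illustrative only):

    import numpy as np

    A = np.array([[-2.0, 1.0], [1.0, -3.0]])       # a sample stable matrix
    dt = 0.1
    lam = np.linalg.eigvals(A)                     # eigenvalues of A
    print(np.all(np.abs(1.0 + dt * lam) <= 1.0))   # True: forward Euler stable here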

5.5 Adaptive time step methods


If the solution to a differential equation varies rapidly over a portion of the time
interval and slowly over the remaining time, then clearly using a fixed time step is
inefficient. In this section we want to investigate methods which allow us to estimate
an error and then use this error to decide if the time step can be increased or if
it should be decreased. We have already encountered Predictor-Corrector methods
which can easily provide an estimate for the error by comparing the results of the
predictor and corrector steps. Another approach is to use two approximations such
as a p-order and a (p + 1)-order RK method and use the difference between the two
as an estimate of the local truncation error of the lower order method. Of course, in order to do this efficiently
the methods should be nested in the sense that the function values required for the
(p + 1)-order method include all the function evaluations from the p-order method;
for this reason the methods are called embedded RK methods. The best known of
these methods is the Runge-Kutta-Fehlberg method which uses a fourth and fifth
order explicit RK method (RKF45). We begin this section with a simple example
illustrating how two approximations at a given point help us to determine whether to
accept the answer or not and, if accepted, to predict a new step size.

5.5.1 Adaptive methods using Richardson extrapolation


We begin this section by first considering a very simple example of an algorithm
which produces an automatic step size selector. In adaptive time step methods ∆t
is no longer constant, so we use ∆tₙ where for M time steps we set

$$\Delta t_0 = t_1 - t_0, \quad \Delta t_1 = t_2 - t_1, \quad \cdots, \quad \Delta t_n = t_{n+1} - t_n, \quad \cdots, \quad \Delta t_{M-1} = t_M - t_{M-1} .$$

Recall that Richardson extrapolation is introduced in § 4.3 as a technique to combine


lower order approximations to get more accurate approximations. We explore this
idea by comparing an approximation at tn+1 obtained from tn using a step size of
∆tn and a second one at tn+1 which obtained from tn by taking two steps with step
size ∆tn /2. We want to see how these two approximations are used to estimate the
local truncation error and provide an estimate for the next time step ∆tn+1 as well
as determining if we should accept the result at tn+1 . To describe this approach
we use the forward Euler method because its simplicity should make the technique
clear.
We assume we have the solution at tₙ along with an estimate ∆tₙ for the current time
step; the goal is to decide whether to accept the resulting approximation at tₙ₊₁ and
to estimate the next time step ∆tₙ₊₁. First we take an Euler step with ∆tₙ to get the approximation Y₁ⁿ⁺¹ where
the subscript denotes the specific approximation because we have two. Next we take
two Euler steps starting from tn using a step size of ∆tn /2 to get the approximation
Y2n+1 .
Recall that the local truncation error for the forward Euler method using a step
size of ∆t is C(∆t)2 + O(∆t)3 . Thus the exact solution satisfies

$$y(t_{n+1}) = y(t_n) + \Delta t\, f\big(t_n, y(t_n)\big) + C(\Delta t)^2 + O(\Delta t)^3 ,$$




where we have equality because we have included the local truncation error. Con-
sequently, for our solution Y1n+1 we have the local truncation error C(∆tn )2 +
O(∆tn )3 , i.e.,
y(tn+1 ) − Y1n+1 = C(∆tn )2 + O(∆tn )3 .
Now for Y2n+1 we take two steps each with a local truncation error of C(∆tn )2 /4 +
O(∆tn /2)3 so basically at tn+1 we have twice this error

y(tn+1 ) − Y2n+1 = C(∆tn )2 /2 + O(∆tn /2)3 .

Now using Richardson extrapolation the solution 2Y₂ⁿ⁺¹ − Y₁ⁿ⁺¹ eliminates the
(∆tₙ)² term so that it has a local truncation error of O(∆tₙ)³.
We want to see how to use these truncation errors to decide whether to accept
or reject the improved solution 2Y2n+1 − Y1n+1 at tn+1 and if we accept it to choose
a new time step ∆tn+1 . Suppose that the user inputs a tolerance for the maximum
rate of increase in the error. We have an estimate for this rate, rn , from our
solutions so we require it to be less than the given tolerance σ, i.e.,

$$r_n = \frac{|Y_1^{n+1} - Y_2^{n+1}|}{\Delta t_n} \le \text{prescribed tolerance} = \sigma . \tag{5.5}$$

If this is satisfied then we accept the improved solution 2Y2n+1 −Y1n+1 ; otherwise we
reduce ∆tn and repeat the procedure. We estimate the next step size by computing
the ratio of the acceptable rate (i.e., the prescribed tolerance) and the actual rate
rn . Then we take this fraction of the current step size to estimate the new one. In
practice, one often adds a multiplicative factor less than one for “safety” since we
have made certain assumptions. For example, we could compute a step size from
 
$$\Delta t_{n+1} = \gamma \left(\frac{\sigma}{r_n}\right) \Delta t_n \quad\text{where } \gamma < 1 . \tag{5.6}$$

From this expression we see that if the acceptable rate σ is smaller than the actual
rate rₙ then ∆tₙ is decreased. If the criterion (5.5) is not satisfied then we must
repeat the step with a reduced time step. We can estimate this from (5.6) without
the safety factor because in this case rn > σ so the fraction is less than one. The
following example illustrates this technique.

Example 5.6. Consider the IVP

$$y'(t) = -22\, t\, y(t), \quad -1 < t \le 1, \qquad y(-1) = e^{-7}$$

whose exact solution is y(t) = e^{4−11t²}; the solution is small and varies slowly in the domain
[−1, −0.5] ∪ [0.5, 1] but peaks to over fifty at the origin. A plot of the pointwise error with
the fixed step size ∆t = 0.001 shows that a variable time step needs to be used.
Use the algorithm described above with Richardson extrapolation to solve this IVP
using the time step formula (5.6). Choose the tolerance σ = 0.01, the safety factor γ = 0.75
and the initial time step ∆t₀ = 0.01. Tabulate the ratio rₙ/σ from (5.5) along with whether
the step is accepted or rejected and the new step size at several different times.

  initial ∆tₙ    rₙ/σ     accept/reject    new ∆tₙ
  0.01           0.1      accept           0.0716
  0.0716         0.87     accept           0.0614
  0.0614         2.0      reject           0.0460
  0.0460         1.5      reject           0.0345
  0.0345         1.1      reject           0.0259
  0.0259         0.86     accept           0.0225
  0.0225         1.12     reject           0.0169
  0.0169         0.84     accept           0.0150
  0.0150         0.98     accept           0.0115
  0.0115         0.95     accept
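A compact Python sketch of the step-doubling Euler algorithm used in this example; the acceptance test is (5.5) and the step update is (5.6). For simplicity the same update formula, including the safety factor, is also used after a rejection, and a production code would guard against repeated rejections at the minimum step size.

    import numpy as np

    def adaptive_euler(f, t, y, t_end, dt, sigma, gamma=0.75, dt_min=1e-8):
        # step-doubling forward Euler with accept/reject test (5.5)
        # and step size update (5.6)
        while t < t_end:
            dt = min(dt, t_end - t)
            y1 = y + dt * f(t, y)                  # one step of size dt
            ym = y + 0.5 * dt * f(t, y)            # two steps of size dt/2
            y2 = ym + 0.5 * dt * f(t + 0.5 * dt, ym)
            rn = abs(y1 - y2) / dt                 # rate estimate, cf. (5.5)
            if rn <= sigma:                        # accept the extrapolated value
                y, t = 2.0 * y2 - y1, t + dt
            dt = max(gamma * (sigma / max(rn, 1e-14)) * dt, dt_min)
        return t, y

    t, y = adaptive_euler(lambda t, y: -22.0 * t * y, -1.0, np.exp(-7.0), 1.0,
                          dt=0.01, sigma=0.01)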

5.5.2 Adaptive methods using predictor-corrector schemes


Using predictor-corrector pairs also provides a way to estimate the error and thus
determine if the current step size is appropriate. For example, for the third order
predictor and corrector pair given in (4.38) one can specifically compute the constant
in the local truncation error to get

$$|y(t_{n+1}) - Y_p^{n+1}| = \frac{9}{24}\, \big|y^{[4]}(\xi)\big|\, (\Delta t)^4, \qquad |y(t_{n+1}) - Y^{n+1}| = \frac{1}{24}\, \big|y^{[4]}(\eta)\big|\, (\Delta t)^4 .$$
For small ∆t we assume that the fourth derivative is constant over the interval and
so the coefficient in the local truncation error for Ypn+1 is approximately nine times
the local error for Y n+1 . We have
$$|y(t_{n+1}) - Y^{n+1}| \approx \frac{1}{9}\, |y(t_{n+1}) - Y_p^{n+1}| .$$

If the step size ∆t is too large, then the assumption that the fourth derivative is
constant from tn to tn+1 may not hold and the above relationship is not true. Typ-
ically the exact solution y(tn+1 ) is not known so instead we monitor the difference
in the predicted and corrected solution |Y n+1 − Ypn+1 |. If it is larger than our
prescribed tolerance, then the step is rejected and ∆t is decreased. Otherwise the
step is accepted; if the difference is below the minimum prescribed tolerance then
the step size is increased in the next step.

5.5.3 Embedded RK methods


To use RK methods for step size control we use two different methods to ap-
proximate the solution at tn+1 and compare the approximations which gives an
approximation to the local truncation error. If the results are close, then we are
confident that a correct step size was chosen; if they vary considerably then we
assume that too large a step size was used and we reduce the time step and repeat
the calculation. If they are extremely close then this suggests a larger step size can
be used on the next step. Typically, an ad hoc formula is used to estimate the
next time step based on these observations. Of course, to efficiently implement this
approach we should choose the methods so that they have function evaluations in
common to reduce the work.
A commonly used RK pair for error control is a combination of a fourth and
fifth order explicit method; it is called the Runge-Kutta-Fehlberg method and was
developed by the mathematician Erwin Fehlberg in the late 1960’s. It is typically
known by the acronym RKF45. For many years it has been considered the “work
horse” of IVP solvers. Recall that to get an accuracy of (∆t)5 at least six function
evaluations are required. The six function evaluations are

$$
\begin{aligned}
k_1 &= \Delta t\, f\big(t_n,\ Y^n\big) \\
k_2 &= \Delta t\, f\Big(t_n + \tfrac14 \Delta t,\ Y^n + \tfrac14 k_1\Big) \\
k_3 &= \Delta t\, f\Big(t_n + \tfrac38 \Delta t,\ Y^n + \tfrac{3}{32} k_1 + \tfrac{9}{32} k_2\Big) \\
k_4 &= \Delta t\, f\Big(t_n + \tfrac{12}{13} \Delta t,\ Y^n + \tfrac{1932}{2197} k_1 - \tfrac{7200}{2197} k_2 + \tfrac{7296}{2197} k_3\Big) \\
k_5 &= \Delta t\, f\Big(t_n + \Delta t,\ Y^n + \tfrac{439}{216} k_1 - 8 k_2 + \tfrac{3680}{513} k_3 - \tfrac{845}{4104} k_4\Big) \\
k_6 &= \Delta t\, f\Big(t_n + \tfrac12 \Delta t,\ Y^n - \tfrac{8}{27} k_1 + 2 k_2 - \tfrac{3544}{2565} k_3 + \tfrac{1859}{4104} k_4 - \tfrac{11}{40} k_5\Big) .
\end{aligned}
$$

The fourth order RK method which uses the four function evaluations k₁, k₃, k₄, k₅,

$$Y^{n+1} = Y^n + \frac{25}{216} k_1 + \frac{1408}{2565} k_3 + \frac{2197}{4104} k_4 - \frac15 k_5 , \tag{5.7}$$
is used to first approximate y(tₙ₊₁) and then the fifth order RK method

$$Y^{n+1} = Y^n + \frac{16}{135} k_1 + \frac{6656}{12825} k_3 + \frac{28561}{56430} k_4 - \frac{9}{50} k_5 + \frac{2}{55} k_6 \tag{5.8}$$
is used for comparison. Note that the fifth order method uses all of the function
evaluations of the fourth order method so the pair is efficient to implement; only
one additional function evaluation is required. For this reason we call RKF45 an embedded
method. Also note that the fourth order method is actually a 5-stage method, but
remember that no 5-stage method is fifth order. Typically the Butcher tableau for
the coefficients cᵢ and a_{ij} is written for the higher order method and then two lines
are appended at the bottom for the coefficients bᵢ of each method. For example,
for RKF45 the tableau is
  0      |
  1/4    |   1/4
  3/8    |   3/32        9/32
  12/13  |   1932/2197  −7200/2197    7296/2197
  1      |   439/216    −8            3680/513      −845/4104
  1/2    |  −8/27        2           −3544/2565      1859/4104    −11/40
  -------+-------------------------------------------------------------------
         |   25/216      0            1408/2565      2197/4104    −1/5      0
         |   16/135      0            6656/12825     28561/56430  −9/50     2/55

To implement the RKF45 scheme we find two approximations at tₙ₊₁: Y₄ⁿ⁺¹
using the fourth order scheme (5.7) and Y₅ⁿ⁺¹ using the fifth order scheme (5.8).
We then determine the difference |Y₅ⁿ⁺¹ − Y₄ⁿ⁺¹|, which serves as a computable
estimate of the local error in the fourth order solution. This
error is used to make the decision whether to accept the step or not; if we accept
the step then the decision must be made whether to increase the step size for
the next calculation or keep it the same. One must a priori choose a minimum and
maximum acceptable value for the normalized difference |Y₅ⁿ⁺¹ − Y₄ⁿ⁺¹|
and use these values for deciding whether a step is acceptable or not. To implement
the RKF45 method the user needs to input a maximum and minimum time step so
that we never allow the time step to get too large or too small. Typically the code
has some default values that the user can either accept or modify.
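A sketch of one RKF45 step in Python, returning both solutions and the error estimate |Y₅ⁿ⁺¹ − Y₄ⁿ⁺¹|; the coefficients are those of the tableau above. For a system, abs would be replaced by a vector norm.

    def rkf45_step(f, t, y, dt):
        # one RKF45 step: (4th order value, 5th order value, error estimate)
        k1 = dt * f(t, y)
        k2 = dt * f(t + dt / 4.0, y + k1 / 4.0)
        k3 = dt * f(t + 3.0 * dt / 8.0, y + 3.0 * k1 / 32.0 + 9.0 * k2 / 32.0)
        k4 = dt * f(t + 12.0 * dt / 13.0,
                    y + 1932.0 * k1 / 2197.0 - 7200.0 * k2 / 2197.0
                      + 7296.0 * k3 / 2197.0)
        k5 = dt * f(t + dt, y + 439.0 * k1 / 216.0 - 8.0 * k2
                              + 3680.0 * k3 / 513.0 - 845.0 * k4 / 4104.0)
        k6 = dt * f(t + dt / 2.0, y - 8.0 * k1 / 27.0 + 2.0 * k2
                                    - 3544.0 * k3 / 2565.0
                                    + 1859.0 * k4 / 4104.0 - 11.0 * k5 / 40.0)
        y4 = y + 25.0*k1/216.0 + 1408.0*k3/2565.0 + 2197.0*k4/4104.0 - k5/5.0
        y5 = (y + 16.0*k1/135.0 + 6656.0*k3/12825.0 + 28561.0*k4/56430.0
                - 9.0*k5/50.0 + 2.0*k6/55.0)
        return y4, y5, abs(y5 - y4)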

5.6 Stiff systems


Some differential equations are more difficult to solve than others. We know that
for problems where the solution curve varies a lot, we should take a small step size
and where it changes very little a larger step size should be used for efficiency. If
the change in the solution curve is relatively small everywhere then a uniform step
size is the most efficient approach. This all seems very heuristic. However, there are
problems which require a very small step size even when the solution curve is very
smooth. There is no universally accepted definition of stiff differential equations
but typically the solution curve changes rapidly and then tends towards a slowly-
varying solution. Because the stability region for implicit methods is typically much
larger than explicit methods, most stiff equations are approximated using an implicit
method.
To illustrate the concept of stiffness we look at a single IVP which is considered
stiff. The example is from a combustion model and is due to Shampine (2003)
who is one of the authors of the Matlab ODE suite. The idea is to model flame
propagation as when you light a match. We know that the flame grows rapidly
initially until it reaches a critical size which is dependent on the amount of oxygen.
We assume that the flame is a ball and y(t) represents its radius; in addition we
assume that the problem is normalized so that the maximum radius is one. We
have the IVP

$$y'(t) = y^2 (1 - y), \quad 0 < t \le \frac{2}{\delta}\, ; \qquad y(0) = \delta \tag{5.9}$$
where δ << 1 is the small given initial radius. At ignition the solution y increases
rapidly to a limiting value of one; this happens quickly on the interval [0, 1/δ] but
on the interval [1/δ, 2/δ] the solution is approximately equal to one. Knowing the
behavior of the problem suggests that we should take a small step size initially and
then on [1/δ, 2/δ] where the solution is almost constant we should be able to take a
large step size. However, if we use the RKF45 method with an automatic step size
selector, then we can capture the solution on [0, 1/δ], but on [1/δ, 2/δ] the step size
is reduced by so much that the minimum allowable step size is surpassed and the
method often fails if the minimum step size is set too large. Initially the problem
is not stiff but it becomes stiff as the solution approaches the value one, its steady
state. The term “stiff” was used to describe this phenomenon because it was
felt the steady state solution is so “rigid”.
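To see why an implicit method helps here, consider the backward Euler sketch below for (5.9): each step solves the scalar nonlinear equation Yⁿ⁺¹ = Yⁿ + ∆t (Yⁿ⁺¹)²(1 − Yⁿ⁺¹) with Newton's method. This is our own illustration; the parameters are chosen so that, for ∆t < 3, the Newton iteration is applied to a strictly monotone function and is therefore safe.

    def backward_euler_flame(delta=1e-3, dt=2.5):
        # backward Euler for y' = y^2 (1 - y), y(0) = delta, on [0, 2/delta];
        # for dt < 3 the function g below is strictly increasing in z
        t, y = 0.0, delta
        while t < 2.0 / delta:
            z = y                                     # Newton initial guess
            for _ in range(20):
                g = z - y - dt * z * z * (1.0 - z)            # g(z) = 0
                dg = 1.0 - dt * (2.0 * z - 3.0 * z * z)       # g'(z)
                dz = g / dg
                z -= dz
                if abs(dz) < 1e-14:
                    break
            y, t = z, t + dt
        return y

    print(backward_euler_flame())    # tends to the steady state value one

An explicit method with this same ∆t would be unstable near the steady state, where ∂f/∂y = 2y − 3y² ≈ −1.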
When one has a system of equations like (5.2) the stiffness of the problem
depends upon the eigenvalues of the Jacobian matrix. Recall that we said we need
all eigenvalues to have real part less than zero for stability. If the Jacobian matrix
has eigenvalues which have a very large negative real part and eigenvalues with a
very small negative real part, then the system is stiff and special care must be used
to solve it. You probably don’t know a priori if a system is stiff but if you encounter
behavior where the solution curve is not changing much but you find that your step
size needs to be smaller and smaller, then your system is probably stiff. In that
case, an implicit method is typically used.
