Ivp Notes
Chapter 2
Introduction to Initial Value Problems
In calculus and physics we encounter initial value problems, although this terminology may not be used. For example, a standard calculus problem is to determine the amount of radioactive material remaining after a fixed time if the initial mass of the material is known along with the fraction of the material that decays at any instant. In the typical radioactive decay model the rate of change of the mass (i.e., its first derivative with respect to time) of the radioactive material is assumed proportional to the amount of material present at that instant; the proportionality constant is the given decay rate, which is negative since the mass is decreasing. Thus a first order differential equation for the mass is given along with the initial mass of the object. This model is similar to one used to describe population growth, except in that case the proportionality constant is positive.
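As a small illustration, the decay model can be coded directly from its exact solution m(t) = m0 e^{kt}; the initial mass and decay rate below are hypothetical values chosen only for the example.

```python
import math

def decay_mass(m0, k, t):
    """Mass at time t for the model m'(t) = k*m(t) with m(0) = m0.

    The exact solution is m(t) = m0 * exp(k*t); k < 0 models decay."""
    return m0 * math.exp(k * t)

# hypothetical data: 100 g of material with decay rate k = -0.1 per hour
m0, k = 100.0, -0.1
print(decay_mass(m0, k, 10.0))  # mass remaining after 10 hours
```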
Recall from physics that Newton's second law of motion states that the force applied to an object equals its mass times its acceleration. If we have a function which
denotes the position of the object at any time, then its first derivative with respect
to time is the velocity of the object and the second derivative is the acceleration.
Consequently, Newton’s second law is a second order ODE for the displacement and
if we specify the initial location and velocity of the object we have a second order
initial value problem.
To be more precise, an initial value problem (IVP) for an unknown function y(t)
consists of an ordinary differential equation (ODE) for y(t) and one or more auxiliary
conditions specified at the same value of t. Here the unknown is only a function of
one independent variable (t) so differentiation of y(t) involves standard derivatives,
not partial derivatives. The ODE specifies how the unknown changes with respect
to the independent variable which we refer to as time but it could also represent an
x location, etc. Recall that the order of a differential equation refers to the highest derivative occurring in the equation; e.g., a second order ODE for y(t) must include a y''(t) term but may or may not include a y'(t) term. The number of additional
conditions corresponds to the order of the differential equation; for example, for a first order ODE we specify the value of y(t) at an initial time t0 and for a second order ODE we specify the values of both y(t) and y'(t) at t0. For obvious reasons, these extra conditions are called initial conditions. The goal is to determine the value of y(t) for subsequent times, t0 < t ≤ T, where T denotes a final time.
We begin this chapter by providing a few applications where IVPs arise. These
problems provide examples where an exact solution is known so they can be used to
test our numerical schemes. Instead of analyzing each IVP, we write a generic first
order IVP which serves as our prototype problem for describing various approaches
for approximating a solution to the IVP. We provide conditions which guarantee
that the IVP has a unique solution and that the solution varies a small amount
when the initial conditions are perturbed by a small amount.
Once we have specified our prototype first order IVP we introduce the idea
of approximating its solution using a difference equation. In general, we have to
give up the notion of finding an analytic solution which gives an expression for the
solution at any time and instead find a discrete solution which is an approximation
to the exact solution at a set of finite times. The basic idea is that we discretize
our domain, in this case a time interval, and then derive a difference equation which
approximates the differential equation in some sense. The difference equation is in
terms of a discrete function and only involves differences in the function values; that
is, it does not contain any derivatives. Our hope is that as the difference equation
is imposed at more and more points (which must be chosen in a uniform manner) then its solution approaches the exact solution to the IVP.
One might ask why we only consider a prototype equation for a first order IVP
when many IVPs include higher order equations. At the end of this chapter we
briefly show how a higher order IVP can be converted into a system of first order
IVPs.
In Chapter 3 we derive the Euler methods which are the simplest numerical methods for approximating the solution to a first order initial value problem. Because the methods are simple, we can easily derive them plus give graphical interpretations to gain intuition about approximations. Once we analyze the errors made in
replacing the continuous differential equation by a difference equation, we see that
the methods only converge linearly, which is quite slow. This is the motivation for looking at more accurate methods in the following chapter. We look at several numerical examples and verify the linear convergence of the methods, and we see that in certain situations one of the methods tends to oscillate and even "blow up" while the other always provides reliable results. This motivates us to study the
numerical stability of methods.
In Chapter 4 we provide a survey of numerical schemes for solving our prototype
IVP. In particular, we present two classes of methods. The first class consists of single step methods which use the solution at the previous time along
with approximations at some intermediate time points to approximate the solution
at the next time level. The second class of methods is called multistep methods
which use the calculated approximations at several previous times to approximate
the solution at the next time level. In particular, we look at Runge-Kutta methods
(single step) and multistep methods in detail. Other topics included in this chapter
2.1 Examples of IVPs
where K is the maximum allowable population and r0 is a given growth rate for
small values of the population. As the population p increases to near the threshold
value K then p/K becomes close to one (but less than one) and so the term
(1 − p/K) is positive but approaches zero as p approaches K. Thus the growth
rate decreases because of fewer resources; the limiting value is when p = K and the
growth rate is zero. However when p is small compared with K, the term (1−p/K)
is near one and the model behaves like exponential growth with a rate of r ≈ r0 .
Assuming the rate of change of the population at any time is proportional to the current population using the proportionality constant (2.2), the differential equation becomes

p'(t) = r0 (1 − p(t)/K) p(t) = r0 p(t) − (r0/K) p²(t)   (2.3)
along with p(0) = p0 . This equation is nonlinear in the unknown p(t) due to the
p2 (t) term and is more difficult to solve than the exponential growth equation.
However, it can be shown that the solution is

p(t) = K p0 / ((K − p0) e^{−r0 t} + p0).   (2.4)
This can be verified by substitution into the differential equation and verification of the initial condition p(0) = p0. We expect that as we take the limit of p(t) as t → ∞ we should get the threshold value K. Clearly this is true because

lim_{t→∞} p(t) = K p0 lim_{t→∞} 1/((K − p0) e^{−r0 t} + p0) = K p0 (1/p0) = K,

where we have used the fact that lim_{t→∞} e^{−r0 t} = 0 for r0 > 0.
and the logistic growth model

pℓ'(t) = 0.294 pℓ(t) − (0.294/10^4) pℓ²(t),   pℓ(0) = 10^3.
pe(t) = 10^3 e^{0.294t},   pℓ(t) = 10^7 / (9000 e^{−0.294t} + 1000),

where we have used (2.4) to get pℓ(t). To compare the results, we plot the exact solutions
to the IVPs as well as giving a table of the bacteria population (truncated to whole numbers) from each model for a range of hours. As we see from the plots and the
table, the predicted size of the bacteria colony is close for small values of t but as the
time increases the exponential model predicts boundless growth whereas the logistic model
predicts a population which never exceeds the carrying capacity of 10,000.
[Figure: the exponential growth curve rises without bound past the carrying capacity of 10,000, while the logistic growth curve levels off at the carrying capacity.]
t       0     1     2     4     6     8      10     15     20
pe(t)   1000  1341  1800  3241  5835  10506  18915  82269  357809
pℓ(t)   1000  1297  2116  2647  3933  5386   6776   9013   9754
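The two model solutions above can be evaluated in a few lines using the solution formula (2.4) with the data of this example (r0 = 0.294, p0 = 10^3, K = 10^4); the printed values are truncated to whole numbers as in the table.

```python
import math

def p_exp(t):
    # exponential model: p_e(t) = 1000 * e^(0.294 t)
    return 1000.0 * math.exp(0.294 * t)

def p_log(t, K=10_000, p0=1000, r0=0.294):
    # logistic solution (2.4): p(t) = K*p0 / ((K - p0) e^(-r0 t) + p0)
    return K * p0 / ((K - p0) * math.exp(-r0 * t) + p0)

for t in (0, 1, 2, 4, 10, 20):
    print(t, int(p_exp(t)), int(p_log(t)))  # logistic stays below K = 10,000
```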
Instead of a first order ODE in the IVP we might have a higher order equation
such as the harmonic oscillator equation. This models an object attached to a wall
with a spring and the unknown function is the displacement of the object from the
wall at any time. At the initial time t = 0 the system is in equilibrium so the
displacement y(t) is zero. This is illustrated pictorially in the figure on the left
below. The figure on the right illustrates the situation where at a later time t the
object has moved to the right an amount y(t) so the spring is stretched.
Using basic laws of physics the second order ODE which models this spring-mass
system is
m y''(t) = −k y(t) − c y'(t),
where m is the mass of the object, k is the spring constant in Hooke's law (force = −spring constant × displacement), and c is the coefficient of friction. Because the differential equation is second order, we must specify two initial conditions at the same instant of time; here we specify the initial displacement y(0) = 0 and the initial velocity y'(0) = ν. If there is no friction then the differential equation
models a simple harmonic oscillator . In this case the IVP becomes
y''(t) = −ω² y(t)
y(0) = 0                                  (2.5)
y'(0) = ν,

where ω = √(k/m). Clearly sin ωt and cos ωt satisfy this differential equation so
the general solution is y(t) = C1 sin ωt + C2 cos ωt for constants C1, C2. Satisfying the initial condition y(0) = 0 implies C2 = 0, and C1 is determined by setting y'(t) = ωC1 cos ωt equal to ν at t = 0; i.e., C1 = ν/ω. Since the solution to the simple harmonic oscillator equation is y(t) = (ν/ω) sin ωt, the solution should be
periodic as indicated graphically in the next example.
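A quick numerical spot-check of the claimed solution y(t) = (ν/ω) sin ωt: a centered difference approximation of y'' should nearly cancel −ω²y. The values ν = 1 and ω = 2 below are arbitrary choices made only for the test.

```python
import math

def y(t, nu=1.0, omega=2.0):
    # claimed solution of y'' = -omega^2 y, y(0) = 0, y'(0) = nu
    return (nu / omega) * math.sin(omega * t)

def second_deriv(f, t, h=1e-4):
    # centered finite-difference approximation of f''(t)
    return (f(t + h) - 2.0 * f(t) + f(t - h)) / h**2

t = 0.7
residual = second_deriv(y, t) + (2.0 ** 2) * y(t)
print(abs(residual))  # small: the ODE is satisfied up to finite-difference error
```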
If the coefficient of friction is nonzero, then the equation is more difficult to
solve analytically because we have to consider three cases depending on the sign
of the term c2 − 4ω 2 . However, the inclusion of the friction term will not cause
problems when we discretize.
In the above examples a single unknown function is sought but in some math-
ematical models we have more than one unknown. An example of this scenario is
a population model where there are two interacting species; this is the so-called
predator-prey model. Here the number of prey, ρ(t), is dependent on the number
of predators, p(t), present. In this case we have a first order ODE for the prey and
for the predator:

ρ'(t) = (1 − p(t)/ν) ρ(t)
                                          (2.6)
p'(t) = −(1 − ρ(t)/µ) p(t).
Note that the equations are nonlinear. These equations must be solved simulta-
neously because the growth/decay of the prey is dependent upon the number of
predators and vice versa. An exact solution to this system is not available but a
numerical approximation to the equation is given in Chapter 5.
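Although an exact solution is unavailable, the system (2.6) is easy to step numerically. As a preview, here is a minimal sketch using a forward Euler step (the method introduced in Chapter 3); the parameter values ν, µ, the step size, and the starting populations are all hypothetical.

```python
def predator_prey_step(rho, p, dt, nu=2.0, mu=1.0):
    # one forward Euler step for the system (2.6); nu, mu are hypothetical
    drho = (1.0 - p / nu) * rho     # prey growth rate
    dp = -(1.0 - rho / mu) * p      # predator growth rate
    return rho + dt * drho, p + dt * dp

rho, p = 1.5, 1.0                   # hypothetical initial populations
for _ in range(1000):               # advance to t = 10 with dt = 0.01
    rho, p = predator_prey_step(rho, p, dt=0.01)
print(rho, p)
```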
In the first IVP, the ODE is linear whereas in the second one the ODE is nonlinear
in the unknown. Clearly, these IVPs are special cases of the following general IVP.
Here f(t, y) is the given derivative of y(t), which we refer to as the slope, and y0 is the known value at the initial time t0. For example, for IVP I in (2.7) we have f(t, y) = sin πt, i.e., the slope is only a function of time, whereas in IVP II we have f(t, y) = t − y² so that the slope is a function of both t and y. The ODE in IVP I is linear in the unknown and in IVP II it is nonlinear due to the y² term, so both linear and nonlinear differential equations are included in the general equation (2.8a).
For certain choices of f (t, y) we are able to find an analytic solution to (2.8).
In the simple case when f = f(t), i.e., f is a function of t and not both t and y, we can solve the ODE exactly if we can obtain an antiderivative of f(t), i.e., if ∫ f(t) dt can be evaluated. For example, for the first IVP in (2.7) we have

y'(t) = sin πt ⇒ ∫ y'(t) dt = ∫ sin πt dt ⇒ y(t) + C1 = −(1/π) cos πt + C2
and thus the general solution is y(t) = −(1/π) cos πt + C. The solution to the differential equation is not unique because C is an arbitrary constant; actually there is a family of solutions which satisfy the differential equation. To determine a unique solution we must specify y(t) at some point such as its initial value. In IVP I in (2.7) y(0) = 0 so y(0) = −(1/π) cos 0 + C = 0, which says that the unique solution to the IVP is y(t) = −(1/π) cos πt + 1/π.
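A finite-difference check of this solution confirms that y(0) = 0 and that y'(t) matches the slope sin πt; the evaluation point t = 0.3 is an arbitrary choice.

```python
import math

def y(t):
    # candidate solution of y'(t) = sin(pi*t), y(0) = 0
    return -math.cos(math.pi * t) / math.pi + 1.0 / math.pi

def dydt(t, h=1e-6):
    # centered finite-difference approximation of y'(t)
    return (y(t + h) - y(t - h)) / (2.0 * h)

print(y(0.0))                                # initial condition: 0
print(dydt(0.3) - math.sin(math.pi * 0.3))   # residual of the ODE: near zero
```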
If f (t, y) is more complicated than simply a function of t then other techniques
are available to try to find the analytic solution. These techniques include methods
such as separation of variables, using an integrating factor, etc. Remember that
when we write a code to approximate the solution of the IVP (2.8) we always want
to test the code on a problem where the exact solution is known so it is useful to
know some standard approaches. The following example illustrates how the method
of separation of variables is used to solve some first order ODEs; other techniques
are explored in the exercises.
that the solution satisfies the differential equation and then impose the initial condition
y(0) = 2 to determine a unique solution to the IVP.
Because f(t, y) is a function of both y and t we cannot directly integrate the differential equation with respect to t to obtain the solution because this would require determining ∫ t y(t) dt and y is unknown. For the technique of separation of variables we move all
terms involving the unknown to the left-hand side of the equation and all terms involving
the independent variable to the other side of the equation. Of course, this technique does
not work for all equations but it is applicable for many. For this ODE we rewrite the equation as

dy/y = −t dt ⇒ ∫ dy/y = −∫ t dt

so that we integrate to get the general solution

ln y + C1 = −t²/2 + C2 ⇒ e^{ln y + C1} = e^{−t²/2 + C2} ⇒ e^{C1} y(t) = e^{−t²/2} e^{C2} ⇒ y(t) = C e^{−t²/2}.
Note that when we integrate an equation we have an arbitrary constant for each integral. Here we have indicated this explicitly, but because the sum of two arbitrary constants C1, C2 is another arbitrary constant C, in the sequel we only give one constant. Since the
general solution to this differential equation involves an arbitrary constant C there is an
infinite family of solutions which satisfy the differential equation; i.e., one for each choice
of C. A family of solutions is illustrated in the figure below; note that as t → ±∞ the
solution approaches zero.
[Figure: the family of solutions y(t) = C e^{−t²/2} for C = 2, 1, 1/2, −1/2, −1, −2.]
We can always verify that we haven't made an error in determining the solution by demonstrating that it satisfies the differential equation. Here we have

y(t) = C e^{−t²/2} ⇒ y'(t) = C (−2t/2) e^{−t²/2} = −t (C e^{−t²/2}) = −t y(t).
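The same kind of numerical spot-check works here: for the IVP solution with y(0) = 2 (so C = 2), a centered difference of y should reproduce −t y(t); the evaluation point is arbitrary.

```python
import math

def y(t, C=2.0):
    # general solution y(t) = C * e^{-t^2/2}; C = 2 satisfies y(0) = 2
    return C * math.exp(-t * t / 2.0)

def dydt(t, h=1e-6):
    # centered finite-difference approximation of y'(t)
    return (y(t + h) - y(t - h)) / (2.0 * h)

t = 0.8
print(dydt(t) + t * y(t))  # residual of y' = -t y: near zero
```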
Even if we are unable to determine the analytic solution to (2.8), we can still
gain some qualitative understanding of the behavior of the solution. This is done by
the visualization technique of plotting the tangent line to the solution at numerous
points (t, y) and is called plotting the direction fields. Recall that the slope of the
tangent line to the solution curve is given and is just f (t, y). Mathematical software
with graphical capabilities often provide commands for automatically drawing a
direction field with arrows which are scaled to indicate the magnitude of the slope;
typically they also offer the option of drawing some solutions or streamlines. Using
direction fields to determine the behavior of the solution is illustrated in the following
example.
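The slope data that a direction-field plot draws can be assembled with a few lines of NumPy. This is a sketch rather than a particular package's command; the grid size and ranges are arbitrary, and the slope function y' = −ty from the separation-of-variables example is used for illustration. A plotting library such as matplotlib could then render the arrows with a quiver-style command.

```python
import numpy as np

def direction_field(f, t_range, y_range, n=20):
    """Slopes f(t, y) on an n-by-n grid, normalized to unit arrows
    (the data a quiver-style plot would draw)."""
    t = np.linspace(*t_range, n)
    y = np.linspace(*y_range, n)
    T, Y = np.meshgrid(t, y)
    S = f(T, Y)                    # slope at each grid point
    L = np.sqrt(1.0 + S**2)        # arrow length before normalization
    return T, Y, 1.0 / L, S / L    # unit-vector components (dt, dy)

# slope function for y' = -t y on the square [-2, 2] x [-2, 2]
T, Y, U, V = direction_field(lambda t, y: -t * y, (-2, 2), (-2, 2))
# matplotlib users could now call plt.quiver(T, Y, U, V)
```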
Before we discuss methods for approximating the solution of the IVP (2.8) we
first need to ask ourselves if our general IVP actually has an analytic solution, even
if we are unable to find it. We are only interested in approximating the solution to
IVPs which have a unique solution. However, even if we know that a unique solution
exists, we may still have unreliable numerical results if the solution of the IVP does
not depend continuously on the data. If this is the case, then small changes in
the data can cause large changes in the solution and thus roundoff errors in our
calculations can produce meaningless results. In this situation we say the IVP is ill-
posed or ill-conditioned, a situation we would like to avoid. Luckily, most differential
equations that arise from modeling real-world phenomena are well-posed.
2.2 General First Order IVP
This says that as the independent variable x varies from x1 to x2 the change in the dependent variable g is governed by the slope of the line, i.e., a = g'(x).
For a general function g(x) Lipschitz continuity on an interval I requires that the
magnitude of the slope of the line joining any two points x1 and x2 in I must be
bounded by a real number. Formally, a function g(x) defined on a domain D ⊂ ℝ¹ is Lipschitz continuous on D if for any x1 ≠ x2 ∈ D there is a constant L such that

|g(x1) − g(x2)| ≤ L |x1 − x2|,

or equivalently

|g(x1) − g(x2)| / |x1 − x2| ≤ L.
Here L is called the Lipschitz constant. This condition says that we must find one constant L which works for all points in the domain. Clearly the Lipschitz constant is not unique; for example, if L = 5 works, then L = 5.1, 6, 10, 100, etc. also satisfy
the condition. If g(x) is differentiable then an easy way to determine a Lipschitz constant is to find a constant such that |g'(x)| ≤ L for all x ∈ D. The linear function g(x) = ax + b is Lipschitz continuous with L = |a| = |g'(x)|. Lipschitz continuity is a stronger condition than merely saying the function is continuous, so a Lipschitz continuous function is always continuous but the converse is not true. For example, the function g(x) = √x is continuous on D = [0, 1] but is not Lipschitz continuous on D because g'(x) = 1/(2√x) is not bounded near the origin.
There are functions which are Lipschitz continuous but not differentiable. For
example, consider the continuous function g(x) = |x| on D = [−1, 1]. Clearly it
is not differentiable on D because it is not differentiable at x = 0. However, it is
Lipschitz continuous with L = 1 because the magnitude of the slope of the secant
line between any two points is always less than or equal to one. Consequently,
Lipschitz continuity is a stronger requirement than continuity but a weaker one
than differentiability.
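Sampling secant slopes numerically illustrates the contrast between |x| and √x; the sample grids below are arbitrary, and refining the grid near the origin makes the √x slopes grow without bound.

```python
import math

def max_secant_slope(g, pts):
    """Largest |g(x1) - g(x2)| / |x1 - x2| over the sample points."""
    return max(abs(g(a) - g(b)) / abs(a - b)
               for a in pts for b in pts if a != b)

pts = [i / 100.0 for i in range(1, 101)]   # samples in (0, 1]
shifted = [p - 0.5 for p in pts]           # samples straddling the origin

print(max_secant_slope(abs, shifted))      # |x| is Lipschitz: slopes stay <= 1
print(max_secant_slope(math.sqrt, pts))    # sqrt: slopes blow up near 0
```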
For the existence and uniqueness result for (2.8), we need f (t, y) to be Lipschitz
continuous in y so we need to extend the above definition by just holding t fixed.
Formally, for fixed t we have that a function g(t, y) defined for y in a prescribed domain is Lipschitz continuous¹ in the variable y if for any (t, y1), (t, y2) there is a constant L such that

|g(t, y1) − g(t, y2)| ≤ L |y1 − y2|.

¹Named after the German mathematician Rudolf Lipschitz (1832–1903).
We are now ready to state the theorem which guarantees existence and uniqueness of a solution to (2.8) as well as guaranteeing that the solution depends continuously on the data; i.e., the problem is well-posed. Note that y(t) is defined on [t0, T] whereas f(t, y) must be defined on a domain in ℝ². Specifically the first argument t is in [t0, T] but y can be any real number so that D = {(t, y) | t ∈ [t0, T], y ∈ ℝ¹}; a shorter notation for expressing D is D = [t0, T] × ℝ¹, which we employ.
In the sequel we only consider IVPs which are well-posed, that is, which have a
unique solution that depends continuously on the data.
2.3 Discretization
Even if we know that a solution to (2.8) exists for some choice of f (t, y), we may
not be able to find the closed form solution to the IVP; that is, a representation
of the solution in terms of a finite number of simple functions. Even for the simplified case of f(t, y) = f(t) this is not always possible. For example, consider f(t) = sin t², which has no explicit formula for its antiderivative. In fact, a symbolic algebra software package like Mathematica gives the antiderivative of sin t² in terms of the Fresnel integral, which is represented by an infinite power series near the origin; consequently there is no closed form solution to the problem. Although there
are numerous techniques for finding the analytic solution of first order differential
equations, we are unable to easily obtain closed form analytic solutions for many
equations. When this is the case, we must turn to a numerical approximation to
the solution where we give up finding a formula for the solution at all times and
instead find an approximation at a set of distinct times. Discretization is the name
given to the process of converting a continuous problem into a form which can be
used to obtain numerical approximations.
Probably the most obvious approach to discretizing a differential equation is
to approximate the derivatives in the equation by difference quotients to obtain a
difference equation which involves only differences in function values. The solution
to the difference equation will not be a continuous function but rather a discrete
function which is defined over a finite set of points. When plotting the discrete
solution one often draws a line through the points to get a continuous curve but
2.3. DISCRETIZATION 33
remember that interpolation must be used to determine the solution at points other
than at the given grid points.
Because the difference equation is defined at a finite set of points we first discretize the time domain [t0, T]; alternately, if our solution depended on the spatial domain x instead of t we would discretize the given spatial interval. For now we use N + 1 evenly spaced points

tn = t0 + n∆t,   n = 0, 1, 2, . . . , N,

where ∆t = (T − t0)/N is called the step size or time step. This is illustrated below where the time domain is [0, T] and N = 9.

[Figure: the interval [0, T] subdivided into N = 9 equal subintervals of length ∆t with grid points t0, t1, . . . , t9.]
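Generating the uniform grid is one line in most languages; a NumPy sketch with hypothetical values of t0, T, and N:

```python
import numpy as np

t0, T, N = 0.0, 4.5, 9            # hypothetical interval and step count
dt = (T - t0) / N                 # step size
t = t0 + dt * np.arange(N + 1)    # grid points t_n = t0 + n*dt
# equivalently: t = np.linspace(t0, T, N + 1)
print(dt, t[0], t[-1])
```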
As we learn new methods we prove or merely state the theoretical rate of convergence. In § 1.2.3 the approach for determining the numerical rate of convergence is explained.
Figure 2.1: The exact solution to an IVP is shown as a solid curve. In the figure on the left a discrete solution using ∆t = 0.5 is plotted. From this plot, it is not possible to say that the discrete solution is approaching the exact solution. However, in the figure on the right the discrete solutions for ∆t = 0.5, 0.25, 0.125, and 0.0625 are plotted. From this figure, the discrete approximations appear to be approaching the exact solution as ∆t decreases.
where now the right-hand side is a function of t, y, and y'. The methods we learn in Chapter 3 and Chapter 4 only apply to first order IVPs. However, we can easily convert this second order IVP into two coupled first order IVPs. To do this, we let w1(t) = y(t), w2(t) = y'(t) and substitute into the equations and initial conditions
to get a first order system for w1, w2:

w1'(t) = w2(t),                          0 < t ≤ 2
w2'(t) = 2w2(t) − sin(πw1) + 4t,         0 < t ≤ 2
w1(0) = 1,   w2(0) = 0.
Note that these two differential equations are coupled, that is, the differential equa-
tion for w1 depends on w2 and the equation for w2 depends on w1 .
In general, if we have the pth order IVP for y(t)

y^(p)(t) = f(t, y, y', y'', . . . , y^(p−1)),   t0 < t ≤ T
y(t0) = α1, y'(t0) = α2, y''(t0) = α3, . . . , y^(p−1)(t0) = αp,

then we convert it to a system of p first order IVPs by letting w1(t) = y(t), w2(t) = y'(t), . . . , wp(t) = y^(p−1)(t), which yields the first order coupled system
w1'(t) = w2(t)
w2'(t) = w3(t)
   ...                                   (2.10)
w_{p−1}'(t) = wp(t)
wp'(t) = f(t, w1, w2, . . . , wp)

along with the initial conditions wk(t0) = αk, k = 1, 2, . . . , p. Thus any higher order
IVP that we encounter can be transformed into a coupled system of first order IVPs.
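The reduction to the system (2.10) can be automated; the sketch below builds the right-hand side F of the first order system from f and p, and applies it to the second order example above (the helper name to_first_order_system is ours, not from the text).

```python
import math

def to_first_order_system(f, p):
    """Build the right-hand side F(t, w) of the first order system (2.10)
    for y^(p) = f(t, y, y', ..., y^(p-1)), with w = (y, y', ..., y^(p-1))."""
    def F(t, w):
        # w_k' = w_{k+1} for k = 1, ..., p-1, and w_p' = f(t, w_1, ..., w_p)
        return [w[k + 1] for k in range(p - 1)] + [f(t, *w)]
    return F

# the second order example: y'' = 2y' - sin(pi*y) + 4t, y(0) = 1, y'(0) = 0
F = to_first_order_system(lambda t, y, yp: 2*yp - math.sin(math.pi*y) + 4*t, 2)
print(F(0.0, [1.0, 0.0]))  # [w2(0), f(0, 1, 0)]
```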
There are many approaches to deriving discrete methods for the general first order
IVP (2.8) but the simplest methods use the slope of a secant line to approximate
the derivative in (2.8a). In this chapter we consider two methods for solving the
prototype IVP (2.8) which are obtained by using this approximation to y 0 (t) at
two different values of t. We see that the two methods require very different
implementations and have different stability properties.
We begin this chapter with the forward Euler method which is described by a
simple formula but also has a graphical interpretation. Numerical examples which
demonstrate how the method is applied by hand are provided before a computer
implementation is discussed. The discretization error for forward Euler is derived in
detail by first obtaining a local truncation error which is caused by approximating y'(t) and then obtaining the global error which is due to the local truncation error
and an accumulated error over the time steps. We see that the global error is
one order of magnitude less than the local truncation error which is the typical
relationship we see for the methods described here. A computer implementation
of the forward Euler method is given and several examples demonstrate that the
numerical results agree with the theoretical rate of convergence. However, for one
example and certain choices of step size, the forward Euler produces results which
oscillate.
The second method considered here is the backward Euler method which has
strikingly different properties than the forward Euler method. The implementation
for the prototype IVP (2.8) typically requires solving a nonlinear equation at each
time step compared with a linear equation for the forward Euler method. However, it
does not produce unreliable results for some problems and some choices of the time
step as the forward Euler method does. Lastly we briefly discuss the concepts of
consistency, stability and convergence of numerical methods to begin to understand
why numerical methods may produce oscillating or unbounded results.
[Figure: the tangent line at (t, y(t)), which has slope y'(t), and the secant line through (t, y(t)) and (t + ∆t2, y(t + ∆t2)); as the spacing decreases from ∆t1 to ∆t2 the secant slope approaches the tangent slope y'(t).]
where we have used the differential equation y'(tn) = f(tn, y(tn)). This suggests the following numerical method for the solution of (2.8), which is called the forward Euler method, where we denote the approximate solution at tn as Y^n and set Y^0 = y(t0).
The term “forward” is used in the name because we write the equation at the point
tn and difference forward in time to tn+1 ; this implies that the given slope f is
evaluated at the known point (tn , Y n ).
To implement the method we know that Y^0 = y0 is the solution at t0 so we can evaluate the known slope there, i.e., f(t0, Y^0). Then the solution at t1 is given by

Y^1 = Y^0 + ∆t f(t0, Y^0).

For the next step, we know Y^1 ≈ y(t1) and so we must evaluate the slope at (t1, Y^1) to get

Y^2 = Y^1 + ∆t f(t1, Y^1).

The procedure is continued until the desired time is reached.
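The forward Euler update translates directly into code; a minimal sketch follows, applied here to the IVP y' = −2y, y(0) = 2 with ∆t = 0.1 for illustration.

```python
def forward_euler(f, t0, y0, T, N):
    """Approximate y'(t) = f(t, y), y(t0) = y0 on [t0, T] with N uniform steps.

    Returns the grid points t_n and the approximations Y^n."""
    dt = (T - t0) / N
    t, Y = [t0], [y0]
    for n in range(N):
        Y.append(Y[-1] + dt * f(t[-1], Y[-1]))  # Y^{n+1} = Y^n + dt*f(t_n, Y^n)
        t.append(t[-1] + dt)
    return t, Y

# two steps for y' = -2y, y(0) = 2 with dt = 0.1
t, Y = forward_euler(lambda t, y: -2.0 * y, 0.0, 2.0, 0.2, 2)
print(Y)  # approximately [2, 1.6, 1.28]
```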
A graphical interpretation of the forward Euler method is shown in the figure below. To start the method, we write the tangent line to the solution curve at (t0, y0) = (t0, Y^0), which has slope y'(t0) = f(t0, Y^0); the equation is

w(t) = Y^0 + f(t0, Y^0)(t − t0).
In the following example we implement two steps of the forward Euler method
by hand and then demonstrate how the approximation relates to the tangent line
of the solution curve.
Consider the IVP y'(t) = −2y(t) with y(0) = 2. Use the forward Euler method to approximate the solution at t = 0.2 using a step size of ∆t = 0.1. Then calculate the
error at t = 0.2 given the exact solution y(t) = 2e−2t . Does the point (t1 , Y 1 ) lie on the
tangent line to y(t) at t = t0 ? Does the point (t2 , Y 2 ) lie on the tangent line to y(t) at
t = t1 ? Justify your answer.
To find the approximation at t = 0.2 using a time step of ∆t = 0.1, we first have to apply Euler's method to determine an approximation at t = 0.1 and then use this to approximate the solution at t = 0.2. From the initial condition we set Y^0 = 2 and from the differential equation we have f(t, y) = −2y. Applying the forward Euler method gives

Y^1 = Y^0 + ∆t f(t0, Y^0) = 2 + 0.1(−2 · 2) = 1.6

and thus

Y^2 = Y^1 + ∆t f(t1, Y^1) = 1.6 + 0.1(−2 · 1.6) = 1.28.

The exact solution at t = 0.2 is y(0.2) = 2e^{−0.4} ≈ 1.34064 so the error is |1.34064 − 1.28| = 0.06064.
The equation of the tangent line to y(t) at t = 0 passes through the point (0, 2) and has slope y'(0) = −4. Thus the equation of the tangent line is w − 2 = −4(t − 0), and at t = 0.1 we have w = 2 − 0.4 = 1.6, which is Y^1, so the point (0.1, Y^1) is on the tangent line to the solution curve at t = 0; this is to be expected from the graphical interpretation of the forward Euler method. However, the point (t2, Y^2) is not
on the tangent line to the solution curve y(t) at t = 0.1 but rather on a line passing through the point (t1, Y^1) with slope f(t1, Y^1). The actual slope of the tangent line to the solution curve at (t1, y(t1)) is −2y(t1) = −2(2e^{−2(0.1)}) = −4e^{−0.2} ≈ −3.2749, whereas we approximate this by f(t1, Y^1) = −3.2.
Y^1 = Y^0 + ∆t(−λY^0) = (1 − λ∆t)Y^0.

Similarly,

Y^n = (1 − λ∆t)^n Y^0.
In the next example we use this general formula to compare the results at a
fixed time for a range of decreasing values of the uniform time step. As can be seen
from the results, as ∆t → 0 the error in the solution tends to zero which implies
the approximate solution is converging to the exact solution.
[Table: ∆t, n, Y^n, and relative error for a sequence of halved time steps.]
If the numerical solution is converging to the exact solution then the relative error at a fixed time should approach zero as ∆t gets smaller. As can be seen from the table, the relative errors tend to zero monotonically as ∆t is halved and, in fact, the errors are approximately halved as we decrease ∆t by half. This is indicative of linear convergence.
At ∆t = 1/320 the relative error is approximately 3.87% so for ∆t = 1/640 we expect the
relative error to be approximately 1.9% so cutting the time step again to 1/1280 should give
a relative error of < 1%. To confirm this, we do the calculation Y^n = 2(1 − 5/1280)^1280 and get a relative error of approximately 0.97%.
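The closed form Y^n = (1 − λ∆t)^n Y^0 makes it easy to tabulate the relative errors. The sketch below assumes λ = 5, Y^0 = 2, and final time 1, consistent with the calculation 2(1 − 5/1280)^1280 above, and reproduces the 3.87% and 0.97% figures.

```python
import math

lam, y0, t_final = 5.0, 2.0, 1.0        # assumed problem data
exact = y0 * math.exp(-lam * t_final)   # exact value y(1) = 2e^{-5}

rel_err = {}
for n in (40, 80, 160, 320, 640, 1280):
    dt = t_final / n
    Yn = y0 * (1.0 - lam * dt) ** n     # Y^n = (1 - lam*dt)^n * Y^0
    rel_err[n] = abs(Yn - exact) / exact
    print(n, rel_err[n])                # errors roughly halve as dt halves
```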
To measure the local truncation error, plug the exact solution into the differ-
ence equation and calculate the remainder.
[Figure: illustration of the global and local errors at t2. The global error compares the exact value y(t2) with Y^2 = Y^1 + ∆t f(t1, Y^1), which carries the error already made at t1, whereas the local error compares y(t2) with Ŷ^2 = y(t1) + ∆t f(t1, y(t1)), obtained by taking a single step from the exact value y(t1).]
Our strategy for analytically determining the global error for the forward Euler
method is to first quantify the local truncation error in terms of ∆t and then use this
result to determine the global error. To determine a formula for the local truncation
error for the forward Euler method we substitute the exact solution to (2.8a) into
the difference equation (3.2) and calculate the remainder. If τn+1 denotes the local
truncation error at the (n + 1)st time step then
τn+1 = y(tn+1) − [ y(tn) + ∆t f(tn, y(tn)) ].   (3.3)
In order to combine terms in (3.3) we need all terms to be evaluated at the same point (tn, y(tn)). The only term not at this point is the exact solution y(tn+1) = y(tn + ∆t) so we use a Taylor series with remainder (see Appendix) for this term; we have

y(tn+1) = y(tn + ∆t) = y(tn) + ∆t y'(tn) + ((∆t)²/2!) y''(ξn),   ξn ∈ (tn, tn+1).
From the differential equation evaluated at tn, we have y'(tn) = f(tn, y(tn)) so we substitute this into the Taylor series expansion for y(tn+1). We then put the expansion into the expression (3.3) for the truncation error to yield

τn+1 = [ y(tn) + ∆t f(tn, y(tn)) + ((∆t)²/2!) y''(ξn) ] − [ y(tn) + ∆t f(tn, y(tn)) ]
     = ((∆t)²/2!) y''(ξn).
If y''(t) is bounded on [t0, T], say |y''(t)| ≤ M, and T = t0 + N∆t, then we have

τ = max_{1≤n≤N} |τn| ≤ (M/2)(∆t)²,   (3.4)
where τ denotes the largest truncation error of all the N time steps. We say that the local truncation error for Euler's method is order (∆t)², which we write as O((∆t)²), and say that the rate is quadratic. This implies that the local error is proportional
to the square of the step size; i.e., it is a constant times the square of the step size
which in turn says that if we compute the local error for ∆t then the local error
using ∆t/2 is reduced by approximately (1/2)2 = 1/4. Remember, however, that
this is not the global error but rather the error made because we have used a finite
difference quotient to approximate y 0 (t).
We now turn to estimating the global error in the forward Euler method. We
should expect to only be able to find an upper bound for the error because if we
can find a formula for the exact error, then we can calculate this and add it to
the approximation to get the exact solution. The proof for the global error for the
forward Euler method is a bit technical but it is the only global error estimate that
we derive because the methods we consider follow the same relationship between
the local and global error as the Euler method.
Our goal is to demonstrate that the global discretization error for the forward
Euler method is O(∆t) which says that the method is first order, i.e., linear in ∆t.
At each step we make a local error of O(∆t)2 due to approximating the derivative
in the differential equation; at each fixed time we have the accumulated errors of
all previous steps and we want to demonstrate that this error does not exceed a
constant times ∆t.
Theorem 3.1 provides a formal statement and proof for the global error of the
forward Euler method. Note that one hypothesis is that f (t, y) must be Lipschitz
continuous in y which is also assumed to guarantee existence and uniqueness of the
solution to the IVP (2.8) so it is a natural assumption. We also assume that y(t)
possesses a bounded second derivative because we need to use the local truncation
error given in (3.4); however, this condition can be relaxed but it is adequate for
our needs.
3.1. FORWARD EULER METHOD 45
Theorem 3.1 : Global error estimate for the forward Euler method
Assume that f(t, y) is Lipschitz continuous in y with Lipschitz constant L and that the solution y(t) of (2.8) has a bounded second derivative, |y″(t)| ≤ M on [0, T ]. Then the global discretization error of the forward Euler method satisfies
|y(t_n) − Y^n| ≤ C∆t where C = (M/(2L)) (e^{TL} − 1) .
Proof. Let En represent the global discretization error at the specific time tn , i.e.,
En = |y(tn ) − Y n |. The steps in the proof are summarized as follows.
Step I. Use the definition of the local truncation error τ_n to demonstrate that the global error satisfies
En ≤ (1 + ∆tL) En−1 + |τ_n| ;
that is, the error at a step is bounded by a constant times the error at the previous step plus the absolute value of the local truncation error. If τ is the maximum of all |τ_n| and we set K = 1 + ∆tL, we have
En ≤ K En−1 + τ . (3.5)
Step II. Apply (3.5) recursively and use the fact that E0 = 0 to get
En ≤ τ Σ_{i=0}^{n−1} K^i where K = 1 + ∆tL . (3.6)
Step III. Recognize that the sum in (3.6) is a geometric series whose sum is known to get
En ≤ (τ/(∆tL)) [(1 + ∆tL)^n − 1] . (3.7)
Step IV. Use the Taylor series expansion of e∆tL near zero to bound (1 + ∆tL)n
by en∆tL which in turn is less than eT L .
Step V. Use the bound (3.4) for τ to get the final result
En ≤ (M∆t/(2L)) (e^{TL} − 1) = C∆t where C = (M/(2L)) (e^{TL} − 1) . (3.8)
We now give the details for each step. For the first step we use the fact that the local truncation error is the remainder when we substitute the exact solution into the difference equation; i.e.,
τ_n = y(t_n) − [y(t_{n−1}) + ∆t f(t_{n−1}, y(t_{n−1}))] .
To get the desired expression for En we solve for y(t_n) in the above expression, substitute into the definition for En and use the triangle inequality; then we use the forward Euler scheme for Y^n and Lipschitz continuity (2.9). We have
En = |y(t_n) − Y^n|
   = |τ_n + y(t_{n−1}) + ∆t f(t_{n−1}, y(t_{n−1})) − Y^n|
   = |τ_n + y(t_{n−1}) + ∆t f(t_{n−1}, y(t_{n−1})) − Y^{n−1} − ∆t f(t_{n−1}, Y^{n−1})|
   ≤ |τ_n| + |y(t_{n−1}) − Y^{n−1}| + ∆t L |y(t_{n−1}) − Y^{n−1}|
   = (1 + ∆tL) En−1 + |τ_n| .
Because we assume for analysis that there are no roundoff errors, E0 = |y0 − Y0| = 0 and we are left with τ Σ_{i=0}^{n−1} K^i.
For the third step we simplify the sum by noting that it is a geometric series of the form Σ_{i=0}^{n−1} a r^i with a = τ and r = K. From calculus we know that the sum is given by a(1 − r^n)/(1 − r), so if we use the fact that K = 1 + ∆tL we arrive at the result (3.7):
En ≤ τ (1 − K^n)/(1 − K) = τ (K^n − 1)/(K − 1) = (τ/(∆tL)) [(1 + ∆tL)^n − 1] .
To justify the fourth step we know that for real z the Taylor series expansion e^z = 1 + z + z²/2! + · · · near zero implies that 1 + z ≤ e^z so that (1 + z)^n ≤ e^{nz}. If we set z = ∆tL we have (1 + ∆tL)^n ≤ e^{n∆tL} so that
En ≤ (τ/(∆tL)) (e^{n∆tL} − 1) .
For the final step we know from the hypothesis of the theorem that |y″(t)| ≤ M so τ ≤ M(∆t)²/2. Also n in En is the number of steps taken from t_0 so n∆t = t_n ≤ T where T is the final time, and thus e^{n∆tL} ≤ e^{TL}. Combining these results gives the desired result (3.8).
Computer implementation
For the computer implementation of a method we first identify what information
is problem dependent. The information which changes for each IVP (2.8) is the
interval [t0 , T ], the initial condition y0 , the given slope f (t, y) and the exact solution
if an error calculation is performed. From the examples calculated by hand, we know
that we approximate the solution at t1 , then at t2 , etc. so implementation requires
a single loop over the number of time steps, say N . However, the solution should not be stored for all times; if it is needed for plotting, etc., then the time and solution should be written to a file to be used later. For a single equation it does not take much storage to keep the solution at each time step, but when we encounter systems and problems in higher dimensions the storage required to save the entire solution can become prohibitively large, so it is best not to get into the practice of storing the solution at every time step.
The following pseudocode gives an outline of one approach to implement the
forward Euler method using a uniform time step.
Define: external function for the given slope f (t, y) and for the exact solution for the error calculation
Input: the initial time t0 , the final time T , the initial condition y0 , and the uniform time step ∆t
Set:
    t = t0
    y = y0
Time step loop:
    while t < T
        m = f (t, y)
        y = y + ∆t m
        t = t + ∆t
        output t, y
Determine error at final time t:
    error = | exact(t) − y |
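The pseudocode above can be sketched in Python; the routines `f` and `exact` below are stand-ins for the problem-dependent functions (here filled in with the exponential growth model used in Example 3.4):

```python
import math

def forward_euler(f, t0, T, y0, dt):
    """March y' = f(t, y), y(t0) = y0 from t0 to T with uniform step dt."""
    t, y = t0, y0
    for _ in range(round((T - t0) / dt)):   # N = (T - t0)/dt time steps
        y = y + dt * f(t, y)                # Y^{n+1} = Y^n + dt f(t_n, Y^n)
        t = t + dt
    return t, y

# Exponential growth p' = 0.8 p, p(0) = 2, exact solution p(t) = 2 e^{0.8t}
f = lambda t, p: 0.8 * p
exact = lambda t: 2.0 * math.exp(0.8 * t)

t, p = forward_euler(f, 0.0, 1.0, 2.0, 1.0 / 64)
print(abs(exact(t) - p))   # global error at the final time
```

A step count is used rather than the `while t ≤ T` test so that floating-point drift in `t` cannot add or drop a step.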
Example 3.4. Using the forward Euler method for exponential growth
Consider the exponential growth problem
p′(t) = 0.8 p(t) 0 < t ≤ 1 p(0) = 2 ,
whose exact solution is p(t) = 2e^{0.8t}. Implement the forward Euler scheme and demonstrate that we get linear convergence.
[Figure: forward Euler approximations for ∆t = 1/2 and ∆t = 1/4 on [0, 1] together with the exact solution.]
To verify that the global error is O(∆t) we compare the discrete solution to the exact solution at the point t = 1 where we know that the exact solution is 2e^{0.8} = 4.45108; we tabulate our approximations P^n to p(t) at t = 1 and the global error in the table below for ∆t = 1/4, 1/8, . . . , 1/128. By looking at the errors we see that as ∆t is halved the error is approximately halved, which suggests linear convergence; the calculation of the numerical rate of convergence makes this result precise because we see that the sequence {.891, .942, .970, .985, .992} tends to one. In the table the approximations and errors are given to five digits of accuracy.
We demonstrate graphically that the convergence rate is linear by using a log-log plot. Recall that if we plot a monomial y = ax^r on a log-log scale then the slope is r.² Since the error is E = C(∆t)^r, if we plot the error on a log-log plot we expect the slope to be r and in our case r = 1. This is illustrated in the log-log plot below where we compute the slope of the line for two points.
² Using the properties of logarithms we have log y = log ax^r = log a + r log x, which implies that the graph is a line with slope r on a log-log scale.
[Figure: log-log plot of the global error at t = 1 versus ∆t; the data lie on a line of slope approximately one.]
If we tabulate the errors at a different time then we get different errors but the numerical rate should still converge to one. In the table below we demonstrate this by computing the errors and rates at t = 0.5; note that the error is smaller at t = 0.5 than at t = 1 for a given step size because we have not taken as many steps and so have less accumulated error.
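The convergence study above can be reproduced with a short script (a sketch assuming the model of Example 3.4, p′ = 0.8p with p(0) = 2; each rate is estimated as log₂ of the ratio of successive errors):

```python
import math

def euler_error_at(dt, T=1.0):
    """Global error of forward Euler for p' = 0.8 p, p(0) = 2 at time T."""
    p = 2.0
    for _ in range(round(T / dt)):
        p += dt * 0.8 * p
    return abs(2.0 * math.exp(0.8 * T) - p)

errors = [euler_error_at(2.0 ** -k) for k in range(2, 8)]        # dt = 1/4 ... 1/128
rates = [math.log2(e1 / e2) for e1, e2 in zip(errors, errors[1:])]
print([round(r, 3) for r in rates])   # tends to one, matching the tabulated rates
```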
The next example applies the forward Euler method to an IVP modeling logistic
growth. The DE is nonlinear in this case but this does not affect the implementation
of the algorithm. The results are compared to those from the previous example
modeling exponential growth.
Example 3.5. Using the forward Euler method for logistic growth.
Consider the logistic model
p′(t) = 0.8 (1 − p(t)/100) p(t) 0 < t ≤ 10 p(0) = 2 .
Implement the forward Euler scheme and demonstrate that we get linear convergence.
Compare the results from this example with those from Example 3.4 of exponential growth.
The exact solution to this problem is given by (2.4) with K = 100, r0 = 0.8, and p0 = 2.
Before generating any simulations we should think about what we expect the behavior of
this solution to be compared with the exponential growth solution in the previous example.
Initially the population should grow at the same rate because r0 = 0.8 which is the same
growth rate as in the previous example. However, the solution should not grow unbounded
but rather always stay below the carrying capacity p = 100. The approximations at t = 1
for a sequence of decreasing values of ∆t are presented below along with the calculated
numerical rates. The exact value at t = 1 is rounded to 4.3445923. Again we see that the
numerical rate approaches one.
Below we plot the approximate solution for ∆t = 1/16 on [0, 10] for this logistic growth
problem and the approximate solution for the previous exponential growth problem. Note
that the exponential growth solution increases without bound whereas the logistic growth
solution never exceeds the carrying capacity of K = 100. Also for small time both models
give similar results.
[Figure: approximate solutions with ∆t = 1/16 on [0, 10]; the exponential growth solution increases without bound while the logistic growth solution levels off below the carrying capacity K = 100.]
In the next example the IVP models exponential decay with a large decay con-
stant. The example illustrates the fact that the forward Euler method can sometimes
give erroneous results.
Example 3.6. Numerically unstable computations for the forward Euler method.
In this example we consider exponential decay where the decay rate is large. Specifically, we seek y(t) such that
y′(t) = −20y(t) 0 < t ≤ 2 y(0) = 1 ,
which has an exact solution of y(t) = e^{−20t}. Plot the numerical results using ∆t = 1/4, 1/8 and discuss the results.
The implementation is the same as in Example 3.4 so we graph the approximate solutions on [0, 2] with ∆t = 1/4 and 1/8. Note that for this problem the approximate solution is oscillating and becoming unbounded.
[Figure: forward Euler approximations for ∆t = 1/4 and ∆t = 1/8 on [0, 2]; instead of decaying to zero, the approximations oscillate between positive and negative values with growing amplitude.]
Why are the results for the forward Euler method not reliable for this problem whereas
they were for previous examples? The reason for this is a stability issue which we address
in § 3.4. When we determined the theoretical rate of convergence we tacitly assumed
that the method converged; however, from this example we see that it does not for these
choices of the time step.
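The blow-up can be seen directly from the update formula: for y′ = −20y the forward Euler step is Y^{n+1} = (1 − 20∆t)Y^n, so for ∆t = 1/4 the amplification factor is −4 and the iterates alternate in sign while growing. A minimal sketch:

```python
def forward_euler_decay(dt, T=2.0, y0=1.0):
    """Forward Euler for y' = -20 y: each step multiplies by (1 - 20 dt)."""
    y = y0
    for _ in range(round(T / dt)):
        y = (1.0 - 20.0 * dt) * y
    return y

print(forward_euler_decay(0.25))   # (-4)^8 = 65536.0: oscillates and blows up
print(forward_euler_decay(0.01))   # (0.8)^200, about 4e-20: decays as expected
```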
We know that the local truncation error for the forward Euler method is O((∆t)²). In the previous examples we demonstrated that the global error is O(∆t), so in the next example we demonstrate numerically that the local truncation error is second order.
Example 3.7. Comparing the local and global errors for the forward Euler method.
that is, we use the exact value y(tn−1 ) instead of Y n−1 and evaluate the slope at the
point (tn−1 , y(tn−1 )) which is on the solution curve. In the table below we tabulate
the local and global errors at t = π using decreasing values of ∆t. From the numerical
rates of convergence you can clearly see that the local truncation error is O(∆t)2 , as we
demonstrated analytically. As expected, the global error converges linearly. Except at the
first step (where the local and global errors are identical) the global error is always larger
than the truncation error because it includes the accumulated errors as well as the error
made by approximating the derivative by a difference quotient.
What makes this method so different from the forward Euler scheme? The
answer is the point where the slope f (t, y) is evaluated. To see the difficulty here
consider the IVP y 0 (t) = t + y 2 , y(0) = 1. Here f (t, y) = t + y 2 . We set Y 0 = 1,
∆t = 0.1 and to compute Y^1 we have the equation
Y^1 = Y^0 + ∆t f(t_1, Y^1) = 1 + 0.1 (0.1 + (Y^1)²) .
For forward Euler we evaluate the slope f (t, y) at known values whereas for back-
ward Euler we don’t know the y-value where the slope is evaluated. In order to
solve for Y 1 using the backward Euler scheme we must solve a nonlinear equation
except when f (t, y) is linear in y or only a function of t.
The difference between the forward and backward Euler schemes is so impor-
tant that we use this characteristic to broadly classify methods. The forward Euler
scheme given in (3.2) is called an explicit scheme because we write the unknown
explicitly in terms of known values. The backward Euler method given in (3.9)
is called an implicit scheme because the unknown is written implicitly in terms of
known values and itself. The terms explicit/implicit are used in the same manner as
Example 3.8. Using the backward Euler method for exponential growth
Consider the exponential growth problem p′(t) = 0.8 p(t) with p(0) = 2, whose exact solution is p(t) = 2e^{0.8t}. In Example 3.4 we apply the forward Euler method
to obtain approximations at t = 1 using a sequence of decreasing time steps. Repeat the
calculations for the backward Euler method. Discuss implementation.
To solve this IVP using the backward Euler method we see that for f = 0.8p the difference equation is linear,
P^{n+1} = P^n + 0.8∆t P^{n+1} ,
where P^n ≈ p(t_n). Thus we do not need to use Newton's method for this particular problem but rather just solve the equation
P^{n+1} − 0.8∆t P^{n+1} = P^n ⇒ P^{n+1} = P^n / (1 − 0.8∆t) .
If we have a code that uses Newton’s method it should get the same answer in one step
because it is solving a linear problem rather than a nonlinear one. The results are tabulated
below. Note that the numerical rate of convergence is also approaching one but for this
method it is approaching one from above whereas using the forward Euler scheme for
this problem the convergence was from below, i.e., through values smaller than one. The
amount of work required for the backward Euler method is essentially the same as the
forward Euler for this problem because the derivative f (t, p) is linear in the unknown p.
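Because each step is a single division, the scheme is easy to check numerically; a sketch repeating the error-at-t = 1 comparison of Example 3.4:

```python
import math

def backward_euler_growth(dt, T=1.0, p0=2.0, r=0.8):
    """Backward Euler for p' = r p: each step solves P^{n+1}(1 - r dt) = P^n."""
    p = p0
    for _ in range(round(T / dt)):
        p = p / (1.0 - r * dt)      # valid as long as r dt < 1
    return p

exact = 2.0 * math.exp(0.8)
errors = [abs(exact - backward_euler_growth(2.0 ** -k)) for k in range(2, 8)]
rates = [math.log2(a / b) for a, b in zip(errors, errors[1:])]
print([round(r, 3) for r in rates])   # rates approach one from above
```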
3.2. BACKWARD EULER METHOD 55
In the previous example, the DE was linear in the unknown so it was straight-
forward to implement the backward Euler scheme and it took the same amount of
work as implementing the forward Euler scheme. However, the DE modeling logistic
growth is nonlinear and so implementation of the backward Euler scheme involves
solving a nonlinear equation. This is addressed in the next example.
Example 3.9. Using the backward Euler method for logistic growth.
In Example 3.5 we consider the logistic model
p′(t) = 0.8 (1 − p(t)/100) p(t) 0 < t ≤ 10 p(0) = 2 .
Implement the backward Euler scheme and demonstrate that we get linear convergence.
Discuss implementation of the implicit method.
To implement the backward Euler scheme for this problem we see that at each step we have the nonlinear equation
P^{n+1} = P^n + ∆t f(t_{n+1}, P^{n+1}) = P^n + 0.8∆t (P^{n+1} − (P^{n+1})²/100)
for P^{n+1}. Thus to determine each P^{n+1} we have to employ a method such as Newton's
method. Recall that to find the root z of the nonlinear equation g(z) = 0 (a function of one independent variable) each iteration of Newton's method is given by
z^k = z^{k−1} − g(z^{k−1}) / g′(z^{k−1})
for the iteration counter k = 1, 2, . . . and where an initial guess z 0 is prescribed. For our
problem, to compute the solution at t_{n+1} we have the nonlinear equation
g(z) = z − P^n − 0.8∆t (z − z²/100) = 0 ,
where z = P n+1 . Our goal is to approximate the value of z which makes g(z) = 0 and
this is our approximation P n+1 . For an initial guess z 0 we simply take P n because if ∆t
is small enough and the solution is smooth then the approximation at tn+1 is close to the
solution at t_n. To implement Newton's method we also need the derivative g′, which for us is just
g′(z) = 1 − 0.8∆t (1 − z/50) .
The results using backward Euler are tabulated below; note that the numerical rates of convergence approach one as ∆t → 0. We have imposed the convergence criterion
for Newton's method that the normalized difference in successive iterates is less than a prescribed tolerance, i.e.,
|z^k − z^{k−1}| / |z^k| ≤ 10^{−8} .
In these computations, two to four Newton iterations are necessary to satisfy this convergence criterion. It is well known that Newton's method typically converges quadratically (when it converges) so we should demonstrate this. To this end, we look at the normalized difference in successive iterates. For example, for ∆t = 1/4 at t = 1 we have the sequence 0.381966, 4.8198 × 10^{−3}, 7.60327 × 10^{−7}, 1.9105 × 10^{−15}, so that the difference at one iteration is approximately the square of the difference at the previous iteration, indicating quadratic convergence.
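One backward Euler step for the logistic model with Newton's method can be sketched as follows (the stopping tolerance 10⁻⁸ matches the criterion above; the value printed for p(1) is the first order approximation, not the exact 4.3446):

```python
def backward_euler_logistic_step(p_n, dt, r=0.8, K=100.0, tol=1e-8):
    """Solve g(z) = z - p_n - r dt (z - z^2/K) = 0 for z = P^{n+1} by Newton."""
    z = p_n                                       # initial guess: previous value
    for _ in range(50):
        g = z - p_n - r * dt * (z - z * z / K)
        gp = 1.0 - r * dt * (1.0 - 2.0 * z / K)   # g'(z); 2z/K = z/50 when K = 100
        z_new = z - g / gp
        if abs(z_new - z) <= tol * abs(z_new):    # normalized-difference test
            return z_new
        z = z_new
    raise RuntimeError("Newton iteration did not converge")

# March from p(0) = 2 to t = 1 with dt = 1/4 (four implicit steps)
p, dt = 2.0, 0.25
for _ in range(4):
    p = backward_euler_logistic_step(p, dt)
print(p)   # first order approximation to p(1); the exact value is about 4.3446
```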
Lastly we repeat the calculations for the exponential decay IVP in Example 3.6
using the backward Euler method. Recall that the forward Euler method produced
oscillatory results for ∆t = 1/4 and ∆t = 1/8. This example illustrates that the
backward Euler method does not produce unstable results for this problem.
Example 3.10. Backward Euler stable where forward Euler method oscillates.
We consider the problem in Example 3.6 for exponential decay where the forward Euler method gives unreliable results. Specifically, we seek y(t) such that y′(t) = −20y(t) with y(0) = 1, which has an exact solution of y(t) = e^{−20t}. Plot the results using the backward Euler method for ∆t = 1/2, 1/4, . . . , 1/16. Compare results with those from the forward Euler method.
Because this is just an exponential decay problem, the implementation is analogous to
that in Example 3.8. We graph approximations using the backward Euler method along
with the exact solution. As can be seen from the plot, it appears that the discrete solution
is approaching the exact solution as ∆t → 0 whereas the forward Euler method gave
unreliable results for ∆t = 1/2, 1/4. Recall that the backward Euler method is an implicit
scheme whereas the forward Euler method is an explicit scheme.
3.3. IMPROVING ON THE EULER METHODS 57
[Figure: backward Euler approximations for ∆t = 1/4, 1/8, 1/16, 1/32 on [0, 1] together with the exact solution; the approximations decay monotonically and approach the exact solution as ∆t decreases.]
As before, we use the differential equation y′(t) = f(t, y) and now evaluate at the midpoint to get
(y(t_{n+1}) − y(t_n))/∆t ≈ f((t_n + t_{n+1})/2, y((t_n + t_{n+1})/2)) ,
which implies
y(t_{n+1}) ≈ y(t_n) + ∆t f((t_n + t_{n+1})/2, y((t_n + t_{n+1})/2)) .
Now the problem with using this expression to generate a scheme is that we do not
know y at the midpoint of the interval [tn , tn+1 ]. So what can we do? The only
option is to approximate it there so an obvious approach is to take a step of length
∆t/2 using forward Euler, i.e.,
y((t_n + t_{n+1})/2) ≈ y(t_n) + (∆t/2) f(t_n, y(t_n)) .
Y^{n+1} = Y^n + (∆t/2) [f(t_n, Y^n) + f(t_{n+1}, Y^{n+1})] . (3.11)
This method is clearly implicit because it requires evaluating the slope at the un-
known point (tn+1 , Y n+1 ). In the next chapter we look at approaches to system-
atically derive new methods rather than heuristic approaches.
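Combining the half step with the midpoint slope gives the explicit midpoint method, Y^{n+1} = Y^n + ∆t f(t_n + ∆t/2, Y^n + (∆t/2) f(t_n, Y^n)). A minimal sketch, tested on the growth model of Example 3.4 (the observed second order rate is the standard result for this method, not derived here):

```python
import math

def explicit_midpoint(f, t0, T, y0, dt):
    """Slope evaluated at the interval midpoint, with the midpoint value
    approximated by a half forward Euler step."""
    t, y = t0, y0
    for _ in range(round((T - t0) / dt)):
        y_half = y + 0.5 * dt * f(t, y)           # forward Euler half step
        y = y + dt * f(t + 0.5 * dt, y_half)      # advance with the midpoint slope
        t += dt
    return y

# Errors at t = 1 for p' = 0.8 p, p(0) = 2 drop by roughly 4 each time dt is halved
exact = 2.0 * math.exp(0.8)
errs = [abs(exact - explicit_midpoint(lambda t, p: 0.8 * p, 0.0, 1.0, 2.0, 2.0 ** -k))
        for k in (2, 3, 4, 5)]
rates = [math.log2(a / b) for a, b in zip(errs, errs[1:])]
print([round(r, 2) for r in rates])   # each rate is close to 2
```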
Any numerical scheme we use must be consistent with the differential equation
we are approximating. The discrete solution satisfies the difference equation but the
exact solution y(tn ) yields a residual when substituted into the difference equation
which we call the local truncation error. As before we define τ(∆t) to be the largest (in absolute value) local truncation error made at each of the N time steps with increment ∆t, i.e., τ(∆t) = max_{1≤n≤N} |τ_n(∆t)|. For the scheme to be consistent, this error should go to zero as ∆t → 0, i.e.,
lim_{∆t→0} τ(∆t) = 0 .
Both the forward and backward Euler methods are consistent with (2.8) because
we know that the maximum local truncation error is O(∆t)2 for all tn . If the local
truncation error is constant then the method is not consistent. Clearly we only want
to use difference schemes which are consistent with our IVP (2.8). However, the
consistency requirement is a local one and does not guarantee that the method is
convergent as we saw in Example 3.6.
We now want to determine how to make a consistent scheme convergent. In-
tuitively we know that for a scheme to be convergent the discrete solution at each
point must get closer to the exact solution as the step size is reduced. As with consistency, we can write this formally and say that a method is convergent if
lim_{∆t→0} max_{1≤n≤N} |y(t_n) − Y^n| = 0 .
To bound this error we split it, using the triangle inequality, into two terms: the first is governed by making the local truncation error sufficiently small (i.e., making the scheme consistent) and the second term is controlled by the stability requirement.
So if each of these two terms can be made sufficiently small then when we take
the maximum over all points tn and take the limit as ∆t approaches zero we get
convergence.
In the next chapter we investigate the stability of methods and demonstrate
that the forward Euler method is only stable for ∆t sufficiently small whereas the
backward Euler method is numerically stable for all values of ∆t. Recall that the
forward Euler method is explicit whereas the backward Euler is implicit. This pattern
follows for other methods; that is, explicit methods have a stability requirement
whereas implicit methods do not. Of course this doesn't mean we can take a very large time step when using implicit methods because we still have to maintain the accuracy of our results.
3.4. CONSISTENCY, STABILITY AND CONVERGENCE 61
EXERCISES
3.1. Classify each difference equation as explicit or implicit. Justify your answer.
b. Y^{n+1} = Y^{n−1} + (∆t/3) [f(t_{n+1}, Y^{n+1}) + 4f(t_n, Y^n) + f(t_{n−1}, Y^{n−1})]
c. Y^{n+1} = Y^n + (∆t/2) [f(t_n, Y^n) − f(t_n, Y^{n+1}) + f(t_{n+1}, Y^n) + f(t_{n+1}, Y^{n+1})]
d. Y^{n+1} = Y^n + (∆t/4) [f(t_n, Y^n) + 3f(t_n + (2/3)∆t, Y^n + (2/3)∆t f(t_n + ∆t/3, Y^n + (∆t/3) f(t_n, Y^n)))]
3.2. Assume that the following set of errors is obtained from three different methods for approximating the solution of an IVP of the form (2.8) at a specific time. First look at the errors and try to decide the accuracy of each method. Then use the result (1.6) to determine a sequence of approximate numerical rates for each method using successive pairs of errors. Use these results to state whether the accuracy of the method is linear, quadratic, cubic or quartic.
3.4. Show that if we integrate the IVP (2.8a) from tn to tn+1 and use a right
Riemann sum to approximate the integral of f (t, y) then we obtain the backward
Euler method.
3.5. Consider the IVP y′(t) = −λy(t) with y(0) = 1. Apply the backward Euler method to this problem and show that we have a closed form formula for Y^n, i.e.,
Y^n = 1 / (1 + λ∆t)^n .
3.6. Derive the backward Euler method by using the Taylor series expansion for
y(tn − ∆t).
3.7. Approximate the solution of the IVP
y′(t) = 1 − y y(0) = 0
using the backward Euler method. In this case the given slope f(t, y) = 1 − y is linear in y so the resulting difference equation is linear. Use backward Euler to approximate the solution at t = 0.5 with ∆t = 1/2, 1/4, . . . , 1/32. Compute the error in each case and the numerical rate of convergence.
3.8. What IVP has a solution which exhibits the logarithmic growth y(t) = 2+3 ln t
where the initial time is prescribed at t = 1?
3.9. Approximate the solution of the IVP
y′(t) = 1 − y² y(0) = 0
using the forward Euler method with ∆t = 1/4. Compute the local and global errors at t = 1/4 and t = 1/2. The exact solution is y(t) = (e^{2t} − 1)/(e^{2t} + 1).
Computer Exercises
and backward Euler methods and demonstrate that the convergence is linear. For
the backward Euler method incorporate Newton’s method and verify that it is con-
verging quadratically. For each method compute and tabulate the numerical rates
using successive values of N = 10, 20, 40, . . . , 320. Discuss your results and com-
pare with theory.
3.12. Write a code which implements the forward Euler method to solve an IVP of the form (2.8). Use your code to approximate the solution of the IVP
y′(t) = 1 − y² y(0) = 0 ,
which has the exact solution y(t) = (e^{2t} − 1)/(e^{2t} + 1). Compute the errors at t = 1 using ∆t = 1/4, 1/8, 1/16, 1/32, 1/64.
a. To obtain a model describing the growth of the mold you first make the
hypothesis that the growth rate of the fungus is proportional to the amount
of mold present at any time with a proportionality constant of k. Assume
that the initial amount of mold present is 0.25 square inches. Let p(t) denote
the number of square inches of mold present on day t. Write an initial value
problem for the growth of the mold.
b. Assume that the following data is collected over a period of ten days. Assuming that k is a constant, use the data at day one to determine k. Then using the forward Euler method with ∆t a fourth and an eighth of a day, obtain numerical estimates for each day of the ten day period; tabulate your results and compare with the experimental data. When do the results become physically unreasonable?
c. The difficulty with the exponential growth model is that the mold grows in an unbounded way, as you saw in (b). To improve the model for the growth of bread mold, we want to incorporate the fact that the number of square inches of mold can't exceed the number of square inches in a slice of bread. Write a logistic differential equation which models this growth using the same initial condition and growth rate as before.
d. Use the forward Euler method with ∆t a fourth and an eighth of a day to obtain
numerical estimates for the amount of mold present on each of the ten days
using your logistic model. Tabulate your results as in (b) and compare your
results to those from the exponential growth model.
Chapter 4
A Survey of Methods
In the last chapter we developed the forward and backward Euler methods for approximating the solution to the first order IVP (2.8). Although both methods are
simple to understand and program, they converge at a linear rate which is often
prohibitively slow. In this chapter we provide a survey of schemes which have higher
than linear accuracy. Also in Example 3.6 we saw that the forward Euler method
gave unreliable results for some choices of time steps so we need to investigate when
this numerical instability occurs so it can be avoided.
The standard methods for approximating the solution of (2.8) fall into two broad
categories which are single-step/one-step methods and multistep methods. Each
category consists of families of explicit and families of implicit methods of varying
degrees of accuracy. Both the forward and backward Euler methods are single-step
methods because they only use information from one previously computed solution
(i.e., at tn ) to compute the solution at tn+1 . We have not encountered multistep
methods yet but they use information at several previously calculated points; for
example, a two-step method uses information at tn and tn−1 to predict the solution
at tn+1 . Using previously calculated information is how multistep methods improve
on the linear accuracy of the Euler method. One-step methods improve the accuracy
by performing intermediate approximations in (tn , tn+1 ] which are then discarded.
There are advantages and disadvantages to each class of methods.
Instead of simply listing common single-step and multistep methods, we want to
understand how these methods are derived so that we gain a better understanding.
For one-step methods we begin by using Taylor series expansions to derive the Eu-
ler methods and see how this approach can be easily extended to get higher order
accurate methods. However, we see that these methods require repeated differen-
tiation of the given slope f (t, y) and so are not, in general, practical. To obtain
methods which don’t require repeated differentiation we investigate how numerical
quadrature rules and interpolating polynomials can be used to derive methods. In
these approaches we integrate the differential equation from t_n to t_{n+1} and either use a quadrature rule to evaluate the integral of f(t, y) or first replace f(t, y) by an interpolating polynomial which can then be integrated exactly. Depending on the quadrature rule we choose we obtain a different method whose accuracy must then be
obtained. An alternate approach to deriving methods is to form a general explicit or
implicit method, assuming a fixed number of additional function evaluations, and
then determine the coefficients in the scheme so that one has as high an accuracy as
possible. This approach results in families of methods which have a given accuracy
and so eliminates the tedious local truncation error calculations. Either approach
leads to the Runge-Kutta family of methods which we discuss in § 4.1.3.
y(t_n + ∆t) ≈ y(t_n) + ∆t y′(t_n) = y(t_n) + ∆t f(t_n, y(t_n)) ⇒ y′(t_n) ≈ (y(t_{n+1}) − y(t_n))/∆t ,
which leads to the forward Euler method when we substitute into the differential equation evaluated at t_n, i.e., y′(t_n) = f(t_n, y(t_n)).
So theoretically, if we keep additional terms in the series expansion for y(tn +∆t)
then we get a higher order approximation. To see how this approach works, we now
keep three terms in the expansion and thus have a remainder term of O(∆t)3 . From
(4.1) we have
y(t_n + ∆t) ≈ y(t_n) + ∆t y′(t_n) + ((∆t)²/2!) y″(t_n) ,
so we expect a local error of O(∆t)3 which leads to an expected global error of
O((∆t)²). Now the problem we have to address when we use this expansion is what to do with y″(t_n) because we only know y′(t) = f(t, y). If our function is smooth enough, we can differentiate this equation with respect to t to get y″(t). To do this
recall that we have to use the chain rule because f is a function of t and y where
y is also a function t. Specifically, we have
y′(t) = f(t, y) ⇒ y″(t) = (∂f/∂t)(dt/dt) + (∂f/∂y)(dy/dt) = f_t + f_y f .
Substituting this into the expansion gives
y(t_n + ∆t) ≈ y(t_n) + ∆t f(t_n, y(t_n)) + ((∆t)²/2!) [f_t(t_n, y(t_n)) + f(t_n, y(t_n)) f_y(t_n, y(t_n))] ,
which generates the second order explicit Taylor series method:
Y^{n+1} = Y^n + ∆t f(t_n, Y^n) + ((∆t)²/2) [f_t(t_n, Y^n) + f(t_n, Y^n) f_y(t_n, Y^n)] (4.2)
To implement this method, we must provide function routines not only for f (t, y)
but also ft (t, y) and fy (t, y). In some cases this is easy, but in others it can be
tedious or even not possible. The following example applies the second order Taylor
scheme to a specific IVP and in the exercises we explore a third order Taylor series
method.
Y^{n+1} = Y^n + 3∆t (t_n)² Y^n + ((∆t)²/2) [6 t_n Y^n + 9 (t_n)⁴ Y^n] . (4.3)
4.1. SINGLE-STEP METHODS 69
Y^1 = 1/3 + 0.1(3)(0)²(1/3) + ((0.1)²/2)(0) = 1/3
and
Y^2 = 1/3 + 0.1(3)(0.1)²(1/3) + ((0.1)²/2) [6(0.1)(1/3) + 9(0.1)⁴(1/3)] = 0.335335 .
The exact solution at t = 0.2 is 0.336011, which gives an error of 0.675862 × 10^{−3}.
To implement this method in a computer code we modify our program for the forward
Euler method to include the O(∆t)2 terms in (4.2). In addition to a function for f (t, y)
we also need to provide function routines for its first partial derivatives fy and ft ; note
that in our program we code the general equation (4.2), not the equation (4.3) specific
to our problem. We perform calculations with decreasing values of ∆t and compare with
results at t = 1 using the forward Euler method. When we compute the numerical rate
of convergence we see that the rate of convergence is O(∆t)2 , as expected whereas the
forward Euler is only linear. For this reason when we compare the global errors at a fixed,
small time step we see that the error is much smaller for the second order method because
it is converging to zero faster than the Euler method.
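A sketch of the general second order Taylor scheme (4.2); the test problem y′ = 3t²y, y(0) = 1/3, with exact solution y(t) = e^{t³}/3, is inferred from (4.3) and the worked values above, so treat it as an assumption:

```python
import math

def taylor2(f, ft, fy, t0, T, y0, dt):
    """Second order Taylor method (4.2):
    Y^{n+1} = Y^n + dt f + (dt^2/2)(f_t + f f_y), all evaluated at (t_n, Y^n)."""
    t, y = t0, y0
    for _ in range(round((T - t0) / dt)):
        s = f(t, y)
        y = y + dt * s + 0.5 * dt * dt * (ft(t, y) + s * fy(t, y))
        t += dt
    return y

# Assumed problem: y' = 3 t^2 y, y(0) = 1/3, so f_t = 6 t y and f_y = 3 t^2
f  = lambda t, y: 3.0 * t * t * y
ft = lambda t, y: 6.0 * t * y
fy = lambda t, y: 3.0 * t * t

print(round(taylor2(f, ft, fy, 0.0, 0.2, 1.0 / 3.0, 0.1), 6))   # 0.335335
```

Note that the user must supply the partial derivatives f_t and f_y, which is what makes Taylor series methods problem dependent.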
Implicit Taylor series methods are derived in an analogous manner. In this case we use the differential equation evaluated at t_{n+1}, i.e., y′(t_{n+1}) = f(t_{n+1}, y(t_{n+1})), and expand about t_{n+1}:
y(t_n) = y(t_{n+1} − ∆t) = y(t_{n+1}) − ∆t y′(t_{n+1}) + ((∆t)²/2!) y″(t_{n+1}) − · · · . (4.4)
Keeping terms through O(∆t) gives the backward Euler method. In the exercises
you are asked to derive a second order implicit Taylor series method.
Taylor series methods are single-step methods. Although they achieve higher
order accuracy than the Euler methods, they are
not considered practical because of the requirement of repeated differentiation of
f (t, y). For example, the first full derivative has two terms and the second has five
terms. So even if f (t, y) can be differentiated, the methods become unwieldy. To
implement the methods on a computer the user must provide routines for all the
partial derivatives so the codes become very problem dependent. For these reasons
we look at other approaches to derive higher order schemes.
70 CHAPTER 4. A SURVEY OF METHODS
The integral on the left-hand side can be evaluated exactly by the Fundamental
Theorem of Calculus to get y(tn+1 ) − y(tn ). However, in general, we must use
numerical quadrature to approximate the integral on the right-hand side. Recall
from calculus that one of the simplest approximations to an integral is to use ei-
ther a left or right Riemann sum, i.e., if the integrand is nonnegative then we are
approximating the area under the curve by a rectangle. If we use a left sum for the
integral we approximate the integral by a rectangle whose base is ∆t and whose
height is determined by the integrand evaluated at the left endpoint of the interval;
i.e., we use the formula

∫_a^b g(x) dx ≈ g(a)(b − a) .
Using the left Riemann sum to approximate the integral of f (t, y) gives
y(t_{n+1}) − y(t_n) = ∫_{t_n}^{t_{n+1}} f(t, y) dt ≈ ∆t f(t_n, y(t_n))
which leads us to the forward Euler method. In the exercises we explore the im-
plications of using a right Riemann sum. Clearly different approximations to the
integral of f (t, y) yield different methods.
Numerical quadrature rules for single integrals have the general form
∫_a^b g(x) dx ≈ Σ_{i=1}^{Q} w_i g(q_i) ,
where the scalars wi are called the quadrature weights, the points qi are the quadra-
ture points in [a, b] and Q is the number of quadrature points used. One common
numerical integration rule is the midpoint rule where, as the name indicates, we
evaluate the integrand at the midpoint of the interval; specifically the midpoint
quadrature rule is
∫_a^b g(x) dx ≈ (b − a) g((a + b)/2) .
Using the midpoint quadrature rule to approximate the integral of f (t, y) in (4.5)
gives
y(t_{n+1}) − y(t_n) ≈ ∆t f(t_n + ∆t/2, y(t_n + ∆t/2)) .
We encountered this scheme in § 3.3. Recall that we don’t know y evaluated at
the midpoint so we must use an approximation. If we use the forward Euler method
starting at tn and take a step of length ∆t/2 then this produces an approximation
to y at the midpoint i.e.,
y(t_n + ∆t/2) ≈ y(t_n) + (∆t/2) f(t_n, y(t_n)) .
Thus we can view our method as having two parts; first we approximate y at the
midpoint using Euler’s method and then use it to approximate y(tn+1 ). Combining
these into one equation allows the scheme to be written as
Y^{n+1} = Y^n + ∆t f(t_n + ∆t/2, Y^n + (∆t/2) f(t_n, Y^n)) .
However, the method is usually written in the following way for clarity and to
emphasize the fact that there are two function evaluations required.
Midpoint Rule

k1 = ∆t f(t_n, Y^n)
k2 = ∆t f(t_n + ∆t/2, Y^n + (1/2) k1)         (4.6)
Y^{n+1} = Y^n + k2
The midpoint rule has a simple geometrical interpretation. Recall that for the
forward Euler method we used a tangent line at tn to extrapolate the solution at
tn+1 . In the midpoint rule we use a tangent line at tn + ∆t/2 to extrapolate the
solution at tn+1 . Heuristically we expect this to give a better approximation than
the tangent line at tn .
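To make the two-stage structure concrete, here is a minimal sketch of the midpoint rule (4.6) in Python, tested on y′ = −y, y(0) = 1; the routine names are ours, not from the text.

```python
import math

# A sketch of the midpoint rule (4.6): two slope evaluations per step,
# O(∆t)^2 global error. Tested on y'(t) = -y(t), y(0) = 1 at t = 1.
def midpoint_step(f, t, y, dt):
    k1 = dt * f(t, y)
    k2 = dt * f(t + dt / 2, y + k1 / 2)
    return y + k2

def solve(f, t0, y0, dt, nsteps):
    t, y = t0, y0
    for _ in range(nsteps):
        y = midpoint_step(f, t, y, dt)
        t += dt
    return y

f = lambda t, y: -y
errors = []
for n in (20, 40, 80):                      # dt = 1/20, 1/40, 1/80
    errors.append(abs(solve(f, 0.0, 1.0, 1.0 / n, n) - math.exp(-1.0)))
print(errors[0] / errors[1], errors[1] / errors[2])  # each ratio ≈ 4
```

Halving ∆t reduces the error by roughly a factor of four, the expected second order behavior.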
The midpoint rule is a single-step method because it only uses one previously
calculated solution, i.e., the solution at tn . However, it requires one more function
evaluation than the forward Euler method but, unlike the Taylor series methods,
it does not require additional derivatives of the slope f (t, y). Because we are
doing more work than the Euler method, we would like to think that the scheme
converges faster. In the next example we demonstrate that the local truncation
error of the midpoint method is O(∆t)3 so that we expect the method to converge
with a global error of O(∆t)2 . The steps in estimating the local truncation error for
the midpoint method are analogous to the ones we performed for determining the
local truncation error for the forward Euler method except now we need to use a
Taylor series expansion in two independent variables for f (t, y) because of the term
f(t_n + ∆t/2, Y^n + (1/2)k1).
Trapezoidal Rule

Y^{n+1} = Y^n + (∆t/2) [ f(t_n, Y^n) + f(t_{n+1}, Y^{n+1}) ]   (4.12)
However, like the backward Euler method this is an implicit scheme and thus for
each tn we need to solve a nonlinear equation for most choices of f (t, y). This can
be done, but there are better approaches for using implicit schemes in the context
of ODEs as we see in § 4.4.
Other numerical quadrature rules on the interval from [tn , tn+1 ] lead to addi-
tional explicit and implicit one-step methods. The Euler method, the midpoint rule
and the trapezoidal rule all belong to a family of methods called Runge-Kutta
methods which we discuss in § 4.1.3.
Many quadrature rules are interpolatory in nature; that is, the integrand is
approximated by an interpolating polynomial which can then be integrated exactly.
For example, for ∫_a^b g(x) dx we could use a constant, a linear polynomial, a quadratic
polynomial, etc. to approximate g(x) in [a, b] and then integrate it exactly. We
want to use a Lagrange interpolating polynomial instead of a Hermite because the
latter requires derivatives. If we use f (t, y(t)) ≈ f (tn , y(tn )) in [tn , tn+1 ] then we
get the forward Euler method and if we use f (t, y(t)) ≈ f (tn + ∆t/2, y(tn + ∆t/2))
then we get the midpoint rule. So to derive some single-step methods we can use
interpolation but only using points in the interval [tn , tn+1 ]. However there are
many quadrature rules, such as Gauss quadrature, which are not interpolatory and
so using numerical quadrature as an approach to deriving single-step methods is
more general.
constant terms, the terms involving ∆t, ∆t2 and ∆t3 . For the constant term to
disappear we require that α = 1; for the linear term in ∆t we require that b1 = 1.
The term involving ∆t² cannot be made to disappear so the only explicit method
with one function evaluation which has a local truncation error of O(∆t)² is the forward
Euler method. No explicit method using only the function evaluation at t_n with a
local truncation error better than O(∆t)² is possible. Note that α must always be
one to cancel the y(tn ) term in the expansion of y(tn + ∆t) so in the sequel there
is no need for it to be unknown.
The following example illustrates this approach if we want an explicit single-step
method where we are willing to perform one additional function evaluation in the interval
[tn , tn+1 ]. We already know that the midpoint rule is a second order method which
requires the additional function evaluation at tn + ∆t/2. However, this example
demonstrates that there is an infinite number of such methods.
Recall that we have set α = 1 as in the derivation of the forward Euler method.
To determine constraints on the parameters b1 , b2 , c2 and a21 which result in the highest
order for the truncation error, we compute the local truncation error and use Taylor series
to expand the terms. For simplicity, in the following expansion we have omitted the explicit
evaluation of f and its derivatives at the point (tn , y(tn )); however, if f is evaluated at
some other point we have explicitly noted this. We use (4.10) for a Taylor series expansion
in two variables to get
must satisfy in order for the terms involving (∆t)1 and (∆t)2 to vanish. We have
∆t [ f (1 − b1 − b2) ] = 0
(∆t)² [ f_t (1/2 − b2 c2) + f f_y (1/2 − b2 a21) ] = 0 ,
where once again we have dropped the explicit evaluation of y and f at (tn , y(tn )). Thus
we have the conditions
b1 + b2 = 1 ,   b2 c2 = 1/2   and   b2 a21 = 1/2 .   (4.14)
Note that the midpoint method given in (4.6) satisfies these equations with b1 = 0, b2 = 1,
c2 = a21 = 1/2. However, any choice of the parameters which satisfy these constraints
generates a method with a third order local truncation error.
Because we have four parameters and only three constraints we might ask ourselves if it is
possible to choose the parameters so that the local truncation error is one order higher, i.e.,
O(∆t)⁴. To see that this is impossible note that in the expansion of y(t_{n+1}) the term y‴
involves terms such as f_t f_y for which there are no corresponding terms in the expansion of
f(t_n + c2 ∆t, Y^n + a21 ∆t f(t_n, Y^n)) so these O(∆t)³ terms remain. Consequently there
is no third order explicit one-step method which only performs two function evaluations
per time step.
Heun Method
k1 = ∆t f(t_n, Y^n)
k2 = ∆t f(t_n + (2/3)∆t, Y^n + (2/3)k1)         (4.15)
Y^{n+1} = Y^n + (1/4)k1 + (3/4)k2
The general form for an explicit s-stage RK method is given below. The coef-
ficient c1 is always zero because we always evaluate f at the point (tn , Y n ) from
the previous step to get the appropriate cancellation for the ∆t term in the local
truncation error calculation.
k1 = ∆t f(t_n, Y^n)
k2 = ∆t f(t_n + c2 ∆t, Y^n + a21 k1)
k3 = ∆t f(t_n + c3 ∆t, Y^n + a31 k1 + a32 k2)
  ⋮                                              (4.16)
ks = ∆t f(t_n + cs ∆t, Y^n + a_{s1} k1 + a_{s2} k2 + · · · + a_{s,s−1} k_{s−1})
Y^{n+1} = Y^n + Σ_{j=1}^{s} b_j k_j
Once the stage s is set and the coefficients are determined, the method is completely
specified; for this reason, the RK explicit s-stage methods are often described by a
Butcher2 tableau of the form
0
c2    a21
c3    a31    a32
⋮     ⋮      ⋮     ⋱                              (4.17)
cs    as1    as2    · · ·    a_{s,s−1}
      b1     b2     · · ·    bs
2 Named after John C. Butcher, a mathematician from New Zealand.
0
1/2    1/2
1/2    0      1/2                                 (4.18)
1      0      0      1
       1/6    1/3    1/3    1/6
k1 = ∆t f(t_n, Y^n)
k2 = ∆t f(t_n + ∆t/2, Y^n + (1/2)k1)
k3 = ∆t f(t_n + ∆t/2, Y^n + (1/2)k2)
k4 = ∆t f(t_n + ∆t, Y^n + k3)
Y^{n+1} = Y^n + (1/6)k1 + (1/3)k2 + (1/3)k3 + (1/6)k4 .
Input: the solution y at time t, the uniform time step ∆t, the number of
stages s, and the coefficient arrays a, b, c.

Loop over the number of stages:
k = 0
for i = 1, . . . , s
    t_eval = t + c(i) ∆t
    y_eval = y + dot_product(a(i, ·), k)
    k(i) = ∆t f(t_eval, y_eval)
y = y + dot_product(k, b)
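The loop above can be sketched in Python with the tableau coefficients passed in as data; here we use the classical fourth order tableau (4.18). The function names and the test problem y′ = −y are ours, not the text's.

```python
import math

# A sketch of the explicit s-stage RK loop: the arrays a, b, c hold a
# Butcher tableau, here the classical fourth order tableau (4.18).
def rk_step(f, t, y, dt, a, b, c):
    s = len(b)
    k = [0.0] * s
    for i in range(s):
        y_eval = y + sum(a[i][j] * k[j] for j in range(i))
        k[i] = dt * f(t + c[i] * dt, y_eval)
    return y + sum(bj * kj for bj, kj in zip(b, k))

a = [[], [1/2], [0.0, 1/2], [0.0, 0.0, 1.0]]   # strictly lower triangular part
b = [1/6, 1/3, 1/3, 1/6]
c = [0.0, 1/2, 1/2, 1.0]

f = lambda t, y: -y
y, dt = 1.0, 0.1
for n in range(10):                 # integrate y' = -y from t = 0 to t = 1
    y = rk_step(f, n * dt, y, dt, a, b, c)
print(abs(y - math.exp(-1.0)))      # error ≈ 3e-7, fourth order accuracy
```

Swapping in a different tableau changes the method without touching the stepping loop.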
The following example compares the results of using explicit RK methods for
stages one through four. Note that the numerical rate of convergence matches the
stage number for stages one through four but, as we will see, this is not true in
general.
Many RK methods were derived in the early part of the 1900’s; initially, the
impetus was to find higher order explicit methods. In Example 4.4 we saw that for
s ≤ 4 the stage and the order of accuracy are the same. One might be tempted
to generalize that an s-stage method always produces a method with global error
O(∆t)s , however, this is not the case. In fact, there is an order barrier which
is illustrated in the table below. As you can see from the table, a five-stage RK
method does not produce a fifth order scheme; we need a six-stage method to
produce that accuracy so there is no practical reason to use a five-stage scheme
because it has the same accuracy as a four-stage scheme but requires one additional
function evaluation.
Stage 1 2 3 4 5 6 7 8 9 10
Order 1 2 3 4 4 5 6 6 7 7
Unlike explicit RK methods, implicit RK methods do not have the order barrier.
For example, the following four-stage implicit RK method has order five so it is
more accurate than any four-stage explicit RK method.
k1 = ∆t f(t_n, Y^n)
k2 = ∆t f(t_n + (1/4)∆t, Y^n + (1/8)k1 + (1/8)k2)
k3 = ∆t f(t_n + (7/10)∆t, Y^n − (1/100)k1 + (14/25)k2 + (3/20)k3)
k4 = ∆t f(t_n + ∆t, Y^n + (2/7)k1 + (5/7)k3)
Y^{n+1} = Y^n + (1/14)k1 + (32/81)k2 + (250/567)k3 + (5/54)k4 .
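Because stages such as k2 appear on both sides of their defining equations, each implicit stage requires a solve. As an illustration (not the text's algorithm), the sketch below resolves a single implicit stage of the form k = ∆t f(t*, y* + αk) by fixed-point iteration, using the k2 stage coefficients of the method above on y′ = −2y:

```python
# Sketch: an implicit stage has the form k = ∆t f(t*, y* + α k); for modest
# ∆t it can be resolved by fixed-point iteration. We use the k2 stage of the
# method above (α = 1/8, t* = t_n + ∆t/4) on y' = -2y with Y^n = 1, ∆t = 0.1.
# The function names are illustrative, not from the text.
def solve_stage(f, t_star, y_star, alpha, dt, tol=1e-12, maxit=50):
    k = 0.0
    for _ in range(maxit):
        k_new = dt * f(t_star, y_star + alpha * k)
        if abs(k_new - k) < tol:
            return k_new
        k = k_new
    return k

f = lambda t, y: -2.0 * y
dt = 0.1
k1 = dt * f(0.0, 1.0)                          # explicit first stage: -0.2
k2 = solve_stage(f, dt / 4, 1.0 + k1 / 8, 1 / 8, dt)
residual = k2 - dt * f(dt / 4, 1.0 + k1 / 8 + k2 / 8)
print(k2, abs(residual))   # k2 ≈ -0.190244, residual ≈ 0
```

Fixed-point iteration converges here because the map contracts (the factor is 2∆t/8 = 0.025); stiff problems generally need a Newton-type solve instead.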
with the solution y(t) = y0 e^{λt}; here C denotes the set of complex numbers of the form
α + iβ. In general λ is a complex number, but to understand why we study this
particular problem, first consider the case when λ is real. If λ > 0 then
small changes in the initial condition can result in the solutions becoming far apart.
For example, if we have IVPs (4.21) with two initial conditions y1 (0) = α and
y2 (0) = β which differ by δ = |β − α| then the solutions y1 = αeλt and y2 = βeλt
differ by δeλt . Consequently, for large λ > 0 these solutions can differ dramatically
as illustrated in the table below for various choices of δ and λ. However, if λ < 0
the term δe^{λt} approaches zero as t → ∞. Therefore for stability of this model IVP
when λ is real we require λ < 0.
For complex λ = α + iβ we have e^{λt} = e^{αt}e^{iβt}. Now e^{iβt} = cos(βt) + i sin(βt) so this
factor does not grow in time; however the factor e^{αt} grows in an unbounded manner if
α > 0. Consequently we say that the differential equation y′ = λy is stable when the real
part of λ is less than or equal to zero, i.e., Re(λ) ≤ 0, so that λ is in the left half of the
complex plane.
When we approximate the model IVP (4.21) we want to know that small
changes, such as those due to roundoff, do not cause large changes in the solu-
tion. Here we are going to look at the stability of a difference equation of the form

Y^{n+1} = ζ(λ∆t) Y^n   (4.22)

applied to the model problem (4.21). Applying the difference equation (4.22)
recursively gives Y^{n+1} = [ζ(λ∆t)]^{n+1} Y^0. For the forward Euler method applied
to y′ = λy we have Y^{n+1} = Y^n + ∆tλY^n = (1 + ∆tλ)Y^n,
so ζ(λ∆t) = 1 + ∆tλ.
For the backward Euler method we have Y^{n+1} = Y^n + ∆tλY^{n+1}, which gives
Y^{n+1} = Y^n/(1 − λ∆t) and thus ζ(λ∆t) = 1/(1 − λ∆t).
We know that the magnitude of ζ must be less than or equal to one or else
Y n becomes unbounded. This condition is known as absolute stability. There are
many other definitions of different types of stability; some of these are explored in
the exercises.
Absolute Stability
The region of absolute stability for the difference equation (4.22) is {λ∆t ∈
C | |ζ(λ∆t)| ≤ 1}. A method is called A-stable if |ζ(λ∆t)| ≤ 1 for the entire
left half plane.
In the next example we determine the region of absolute stability of the forward
Euler method and compare to the results in Example 3.6.
Example 4.6. Determine if the forward Euler method and the backward Euler method
are A-stable; if not, determine the region of absolute stability. Then discuss the previous
numerical results for y′(t) = −20y(t) in light of these results.
For the forward Euler method ζ(λ∆t) = 1 + λ∆t so the condition for A-stability is that
|1 + λ∆t| ≤ 1 for the entire left half plane. Now λ is, in general, complex which we can write
as λ = α + iβ but let's first look at the real case, i.e., β = 0. Then we have

−1 ≤ 1 + λ∆t ≤ 1  ⇒  −2 ≤ λ∆t ≤ 0
so on the real axis we have the interval [−2, 0]. This says that for a fixed real λ < 0, ∆t
must satisfy ∆t ≤ 2/|λ| and thus the method is not A-stable but has a region [−2, 0] of
absolute stability if λ is real. If β ≠ 0 then the region of stability is the disk in the
complex plane of radius one centered at −1. For example, when λ = −20, ∆t must satisfy
∆t ≤ 0.1. In Example 3.6 we plotted results for ∆t = 1/4 and 1/8 which do not satisfy
the stability criteria. In the figure below we plot approximations to the same problem using
∆t = 1/20, 1/40 and 1/60. As you can see from the graph, the solution appears to be
converging.
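A quick numerical check of this stability restriction, written as a sketch with our own function name:

```python
# Sketch verifying the stability condition: forward Euler applied to
# y' = -20y, y(0) = 1, where |1 + λ∆t| ≤ 1 requires ∆t ≤ 2/20 = 0.1.
def euler_final(lam, dt, nsteps, y0=1.0):
    y = y0
    for _ in range(nsteps):
        y += dt * lam * y
    return y

print(euler_final(-20.0, 0.25, 4))   # ∆t = 0.25 > 0.1: (1 - 5)^4 = 256, growing
print(euler_final(-20.0, 0.04, 25))  # ∆t = 0.04 < 0.1: (0.2)^25 ≈ 3e-18, decaying
```

With ∆t above the threshold the amplification factor has magnitude 4 and the iterates blow up; below it the factor is 0.2 and the iterates decay rapidly.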
For the backward Euler method ζ(λ∆t) = 1/(1 − λ∆t). To determine if it is A-stable we
see if it satisfies the stability criterion for the entire left half plane. As before, we first find the
region when λ is real. For λ ≤ 0 we have 1 − λ∆t ≥ 1 so that |ζ(λ∆t)| ≤ 1 for all ∆t.
The backward Euler method is A-stable so any choice of ∆t
provides stable results for λ < 0. This agrees with the results from Example ??.
To be precise, the region of absolute stability for the backward Euler method is actually the
region outside the circle in the complex plane centered at one with radius one. Clearly, this
includes the left half plane. To see this, note that |1/(1 − λ∆t)| ≤ 1 exactly when |1 − λ∆t| ≥ 1.
However, we are mainly interested in the case when Re(λ) < 0 because then the differential
equation y′(t) = λy is stable.
Next we show that the explicit Heun method has essentially the same region of stability as
the forward Euler method, so we expect the same behavior. Recall that the Heun
method is second order accurate whereas the Euler method is first order so accuracy
has nothing to do with stability.
Example 4.7. Investigate the region of absolute stability for the explicit 2-stage Heun
method given in (4.15). Plot results for this method applied to the IVP from Example ??.
Choose values of the time step where the stability criteria is not met and then some values
where it is satisfied.
We first write the scheme as a single equation rather than the standard way of specifying
ki because this makes it easier to determine the amplification factor.
Y^{n+1} = Y^n + (∆t/4) [ f(t_n, Y^n) + 3 f(t_n + (2/3)∆t, Y^n + (2/3)∆t f(t_n, Y^n)) ] .
We apply the difference scheme to the model problem y′ = λy, where f(t, y) = λy(t), to
get
Y^{n+1} = Y^n + (∆t/4) [ λY^n + 3λ(Y^n + (2/3)∆tλY^n) ] = [ 1 + (1/4)(λ∆t) + (3/4)(λ∆t) + (1/2)(λ∆t)² ] Y^n

so ζ(λ∆t) = 1 + λ∆t + (1/2)(λ∆t)². The region of absolute stability is all points z in the
complex plane where |ζ(z)| ≤ 1. If λ is real and non-positive we have
−1 ≤ 1 + z + z²/2 ≤ 1  ⇒  −2 ≤ z(1 + z/2) ≤ 0 .
For λ ≤ 0 so that z = λ∆t ≤ 0 we must have 1 + (1/2)λ∆t ≥ 0 which says ∆tλ ≥ −2. Thus
the region of stability is [−2, 0] when λ is real, the same interval as the one computed for the
forward Euler method; for complex λ∆t the region |1 + z + z²/2| ≤ 1 contains the circle of
radius one centered at −1, so the practical time step restriction is the same.
The numerical results are shown below for the case when λ = −20. For this choice of λ
the stability criterion becomes ∆t ≤ 0.1 so for the choices of the time step ∆t = 0.5, 0.25
shown on the left, we expect the results to be unreliable, but for the ones on the plot on
the right the stability criterion is satisfied so the results are reliable.
[Figure: Heun approximations to y′ = −20y on [0, 1]; left: ∆t = 0.5 and 0.25 (unstable), right: ∆t = 1/32 and 1/64 (stable).]
the slopes at the m + 1 points t_{n+1}, t_n, t_{n−1}, . . . , t_{n−(m−1)}. If the method
is explicit then the coefficient in front of the term f(t_{n+1}, Y^{n+1}) is zero.
Now if we use the midpoint quadrature rule to approximate the integral on the right
we have the two-step scheme

Y^{n+1} = Y^{n−1} + 2∆t f(t_n, Y^n)

which is sometimes called the modified midpoint method. However, unlike one-
step methods, our choice of quadrature rule is restricted because for the quadrature
points we must use only the previously calculated times. For example, if we have a
three-step method using tn , tn−1 , and tn−2 we need to use a Newton-Cotes integra-
tion formula such as Simpson’s method. Remember that Newton-Cotes quadrature
rules are interpolatory and so this approach is closely related to using an interpola-
tion polynomial.
Multistep methods are typically derived by using an interpolating polynomial in
either of two ways. The first is to approximate y(t) by an interpolating polynomial
through tn , tn−1 , . . . , tn−m+1 and then differentiate it to get an approximation to
y′(t) and substitute this approximation into the DE. If we evaluate the approximation
at t_{n+1} then we obtain an implicit method. This gives rise to a family of implicit
methods called backward difference formulas. The second approach is to use an
interpolating polynomial through tn , tn−1 , . . . , tn−m+1 for the given slope f (t, y)
and then integrate the equation; the integral of the interpolating polynomial can be
computed exactly. We discuss both approaches here.
Similar to one-step methods, we can also derive multistep methods by assuming
the most general form of the m-step method and then determine the constraints on
the coefficients which give as high an order of accuracy as possible. This approach
is just the method of undetermined coefficients discussed in § 4.1.2. This approach
for deriving multistep methods is explored in the exercises.
p1(t) = y(t_n) (t − t_{n+1})/(−∆t) + y(t_{n+1}) (t − t_n)/∆t .

Differentiating gives

p1′(t) = −y(t_n)/∆t + y(t_{n+1})/∆t

so that

y′(t) ≈ (y(t_{n+1}) − y(t_n))/∆t .
Using this expression in the differential equation y 0 (t) = f (t, y) at tn+1 gives the
implicit backward Euler method.
For the second order BDF we approximate y(t) by the quadratic polynomial
that passes through (t_{n−1}, y(t_{n−1})), (t_n, y(t_n)) and (t_{n+1}, y(t_{n+1})); differentiating the
Lagrange form of this polynomial, evaluating at t_{n+1} and substituting into the ODE gives
4.2. MULTISTEP METHODS 87
(3/2) Y^{n+1} − 2Y^n + (1/2) Y^{n−1} = ∆t f(t_{n+1}, Y^{n+1}) ;
often these formulas are normalized so that the coefficient of Y n+1 is one. It can
be shown that this method is second order.
Second order BDF:   Y^{n+1} = (4/3) Y^n − (1/3) Y^{n−1} + (2/3) ∆t f(t_{n+1}, Y^{n+1})   (4.25)
For the two-step scheme (4.25) we have m = 2, a21 = 4/3, a22 = −1/3 and
β = 2/3. Table 4.1 gives coefficients for other uniform BDF formulas using the
terminology of (4.26). It can be proved that the accuracy of the m-step methods
included in the table is m.
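For the linear model problem y′ = λy the implicit solve in (4.25) reduces to a single division, so BDF2 is easy to sketch; the code below (our names, not the text's) verifies second order convergence for λ = −1, starting with one backward Euler step.

```python
import math

# Sketch: BDF2 (4.25) applied to y' = λy, where the implicit update can be
# solved directly: Y^{n+1} = ((4/3)Y^n - (1/3)Y^{n-1}) / (1 - (2/3)λ∆t).
def bdf2(lam, y0, dt, nsteps):
    y_prev = y0
    y = y0 / (1.0 - dt * lam)              # backward Euler starting value Y^1
    for _ in range(nsteps - 1):
        y, y_prev = ((4/3) * y - (1/3) * y_prev) / (1.0 - (2/3) * dt * lam), y
    return y

e1 = abs(bdf2(-1.0, 1.0, 1/40, 40) - math.exp(-1.0))
e2 = abs(bdf2(-1.0, 1.0, 1/80, 80) - math.exp(-1.0))
print(e1 / e2)   # ratio ≈ 4: second order convergence
```

For a nonlinear f each step would instead require solving a nonlinear equation, for example by Newton's method.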
It is also possible to derive BDFs for nonuniform time steps. The formulas are
derived in an analogous manner but are a bit more complicated because for the
interpolating polynomial we must keep track of each ∆ti ; in the case of a uniform
∆t there are some cancellations which simplify the resulting formulas.
m     a_{m1}     a_{m2}      a_{m3}     a_{m4}    a_{m5}     β
1     1                                                      1
2     4/3        −1/3                                        2/3
3     18/11      −9/11       2/11                            6/11
4     48/25      −36/25      16/25      −3/25                12/25
5     300/137    −300/137    200/137    −75/137   12/137     60/137

Table 4.1: Coefficients for implicit BDF formulas of the form (4.26) where the
coefficient of Y^{n+1} is normalized to one.
= −(1/∆t) [(t − t_n)²/2]_{t_n}^{t_{n+1}} f(t_{n−1}, y(t_{n−1})) + (1/∆t) [(t − t_{n−1})²/2]_{t_n}^{t_{n+1}} f(t_n, y(t_n))

= −(1/∆t) ((∆t)²/2) f(t_{n−1}, y(t_{n−1})) + (1/∆t) (3(∆t)²/2) f(t_n, y(t_n))

which suggests the scheme

Y^{n+1} = Y^n + (3∆t/2) f(t_n, Y^n) − (∆t/2) f(t_{n−1}, Y^{n−1}) .   (4.27)
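A sketch of (4.27) on y′ = −y, with the single extra starting value Y¹ generated by one forward Euler step (the names are ours, not the text's):

```python
import math

# Sketch of the two-step Adams-Bashforth scheme (4.27) on y' = -y, y(0) = 1,
# with the starting value Y^1 generated by one forward Euler step.
def ab2(f, t0, y0, dt, nsteps):
    y_prev = y0
    y = y0 + dt * f(t0, y0)              # forward Euler starting step
    f_prev = f(t0, y_prev)
    t = t0 + dt
    for _ in range(nsteps - 1):
        f_curr = f(t, y)
        y, y_prev = y + dt * (1.5 * f_curr - 0.5 * f_prev), y
        f_prev = f_curr
        t += dt
    return y

f = lambda t, y: -y
e1 = abs(ab2(f, 0.0, 1.0, 1/40, 40) - math.exp(-1.0))
e2 = abs(ab2(f, 0.0, 1.0, 1/80, 80) - math.exp(-1.0))
print(e1 / e2)   # ratio ≈ 4: second order convergence
```

Only one new slope evaluation is needed per step because f(t_{n−1}, Y^{n−1}) is saved from the previous step.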
Y^{n+1} = Y^n + ∆t [ (23/12) f(t_n, Y^n) − (4/3) f(t_{n−1}, Y^{n−1}) + (5/12) f(t_{n−2}, Y^{n−2}) ]

Y^{n+1} = Y^n + ∆t [ (55/24) f(t_n, Y^n) − (59/24) f(t_{n−1}, Y^{n−1}) + (37/24) f(t_{n−2}, Y^{n−2})
                     − (3/8) f(t_{n−3}, Y^{n−3}) ]

Y^{n+1} = Y^n + ∆t [ (1901/720) f(t_n, Y^n) − (1387/360) f(t_{n−1}, Y^{n−1}) + (109/30) f(t_{n−2}, Y^{n−2})
                     − (637/360) f(t_{n−3}, Y^{n−3}) + (251/720) f(t_{n−4}, Y^{n−4}) ]
                                                                               (4.28)
Schemes in the Adams-Moulton family are implicit multistep methods which use
the derivative f evaluated at tn+1 plus m prior points but only use the solution Y n .
The one-step Adams-Moulton method is the backward Euler scheme and the 2-step
method is the trapezoidal rule; several methods are listed here for completeness.
Y^{n+1} = Y^n + ∆t [ (5/12) f(t_{n+1}, Y^{n+1}) + (2/3) f(t_n, Y^n) − (1/12) f(t_{n−1}, Y^{n−1}) ]

Y^{n+1} = Y^n + ∆t [ (3/8) f(t_{n+1}, Y^{n+1}) + (19/24) f(t_n, Y^n) − (5/24) f(t_{n−1}, Y^{n−1})
                     + (1/24) f(t_{n−2}, Y^{n−2}) ]

Y^{n+1} = Y^n + ∆t [ (251/720) f(t_{n+1}, Y^{n+1}) + (646/720) f(t_n, Y^n) − (264/720) f(t_{n−1}, Y^{n−1})
                     + (106/720) f(t_{n−2}, Y^{n−2}) − (19/720) f(t_{n−3}, Y^{n−3}) ]
                                                                               (4.29)
a one-step method to start the scheme. How do we decide what method to use? A
“safe” approach is to use a method which has the same accuracy as the multistep
method but we see in the following examples that you can actually use a method
which has one power of ∆t less because we are only taking a small number of
steps with the method. For example, if we use the 2-step second order Adams-
Bashforth method we need Y 1 in addition to Y 0 . If we take one step with the
forward Euler method it is actually second order accurate at the first step because
the error there is only due to the local truncation error. However, if we use a 3-step
third order Adams-Bashforth method then using the forward Euler method to get
the two starting values results in a loss of accuracy. This issue is illustrated in the
following example.
Compare the numerical rates of convergence when different methods are used to generate
the starting values. Specifically we use RK methods of order one through four to generate
the starting values, which for a 3-step method are Y¹ and Y², because we have Y⁰ = 1.
Tabulate the errors at t = 3.
As you can see from the table, if a second, third or fourth order scheme is used to compute
the starting values then the method is third order. There is nothing gained by using a
higher order scheme (the fourth order) for the starting values. However, if a first order
scheme (forward Euler) is used then the rate is degraded to second order even though we
only used it to calculate two values, Y 1 and Y 2 . Consequently to compute starting values
we should use a scheme that has the same overall accuracy or one degree less than the
method we are using.
The next example provides results for 2-step through 5-step Adams-Bashforth
methods. From the previous example we see that to maintain the accuracy the
starting values need to be determined by methods of order m − 1.
We first rewrite the m-step multistep method (4.23) by shifting the indices to
get
Y^{i+m} = a_{m−1} Y^{i+m−1} + a_{m−2} Y^{i+m−2} + a_{m−3} Y^{i+m−3} + · · · + a_0 Y^i
          + ∆t [ b_m f(t_{i+m}, Y^{i+m}) + b_{m−1} f(t_{i+m−1}, Y^{i+m−1})
          + b_{m−2} f(t_{i+m−2}, Y^{i+m−2}) + · · · + b_0 f(t_i, Y^i) ]

or equivalently

Y^{i+m} − Σ_{j=0}^{m−1} a_j Y^{i+j} = ∆t Σ_{j=0}^{m} b_j f(t_{i+j}, Y^{i+j}) .
As before, we apply it to the model IVP y′ = λy, y(0) = y0 for Re(λ) < 0, which
guarantees the IVP itself is stable. Substituting f = λy into the difference equation
gives

Y^{i+m} − Σ_{j=0}^{m−1} a_j Y^{i+j} = ∆t λ Σ_{j=0}^{m} b_j Y^{i+j} .
Recall that a technique for solving a linear homogeneous ODE such as y″(t) +
2y′(t) − y(t) = 0 is to look for solutions of the form y = e^{rt}, obtain a polynomial
equation for r such as e^{rt}(r² + 2r − 1) = 0, and then determine the roots of the
equation. We take the analogous approach for the difference equation and seek a
solution of the form Y^i = z^i. Substitution into the difference equation yields

z^{i+m} − Σ_{j=0}^{m−1} a_j z^{i+j} = ∆t λ Σ_{j=0}^{m} b_j z^{i+j} .
Dividing by z^i gives ρ(z) = λ∆t σ(z), where

ρ(z) = z^m − Σ_{j=0}^{m−1} a_j z^j   and   σ(z) = Σ_{j=0}^{m} b_j z^j .   (4.30)
For stability, we need the roots of ρ(z) to have magnitude ≤ 1 and if a root is
identically one then it must be a simple root. If this root condition is violated,
then the method is unstable so a simple check is to first see if the root condition is
satisfied; if the root condition is satisfied then we need to find the region of stability.
To do this, we find the roots of Q(λ∆t) = ρ(z) − λ∆t σ(z) and require that each root have
magnitude less than or equal to one. To simplify the calculations we rewrite Q(λ∆t) as

Q(λ∆t) = z^m (1 − λ∆t b_m) − z^{m−1}(a_{m−1} + b_{m−1}λ∆t)
         − z^{m−2}(a_{m−2} + b_{m−2}λ∆t) − · · · − (a_0 + b_0 λ∆t) .
In the following example we determine the region of stability for both the forward
and backward Euler methods using the analysis for multistep methods. The same
stability conditions as we obtained by analyzing the stability of the one-step methods
using the amplification factor are realized.
Example 4.10. Investigate the stability of the forward and backward Euler methods by
first demonstrating that the root condition for ρ(z) is satisfied and then finding the region
of absolute stability. Confirm that the same results as obtained for stability by considering
the methods as single-step methods are achieved.
The forward Euler method is written as Y n+1 = Y n + ∆tf (tn , Y n ) so in the form of a
multistep method with m = 1 we have a0 = 1, b0 = 1, b1 = 0 and thus ρ(z) = z −1 whose
root is z = 1 so the root condition is satisfied. To find the region of absolute stability
we have Q(λ∆t) = z − (1 + λ∆t) which has a single root 1 + λ∆t; thus the region of
absolute stability is |1 + λ∆t| ≤ 1 which is the condition we got before by analyzing the
method as a single step method.
For the backward Euler method a0 = 1, b0 = 0, b1 = 1 and so ρ(z) = z − 1 which has the
root z = 1 and so the root condition is satisfied. To find the region of absolute stability
we have Q(λ∆t) = z(1 − λ∆t) − 1 which has a single root 1/(1 − λ∆t) and we get the
same restriction that we got before by analyzing the method as a single-step method.
The next example analyzes the stability of a 2-step method using the Dahlquist
conditions.
Example 4.11. In this example we want to show that the 2-step Adams-Bashforth
method
Y^{n+1} = Y^n + (∆t/2) [ 3 f(t_n, Y^n) − f(t_{n−1}, Y^{n−1}) ]
is stable.
For this Adams-Bashforth method we have m = 2, a0 = 0, a1 = 1, b0 = −1/2, b1 = 3/2,
and b2 = 0. The characteristic polynomial is ρ(z) = z 2 − z = z(z − 1) whose two roots
are z = 0, 1 and the root condition is satisfied.
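Continuing this example, both the root condition and the region of absolute stability can be checked numerically; the sketch below (our names, not the text's) forms Q(z) = ρ(z) − λ∆t σ(z) for this 2-step Adams-Bashforth method and tests sample values of λ∆t.

```python
import cmath

# Sketch: roots of Q(z) = ρ(z) - λ∆t σ(z) for the 2-step Adams-Bashforth
# method, where ρ(z) = z^2 - z and σ(z) = (3/2)z - 1/2. A value of λ∆t is
# in the region of absolute stability when both roots satisfy |z| ≤ 1.
def ab2_roots(h):                  # h = λ∆t
    # Q(z) = z^2 - (1 + 1.5h) z + 0.5h, solved by the quadratic formula
    b, c = -(1.0 + 1.5 * h), 0.5 * h
    disc = cmath.sqrt(b * b - 4 * c)
    return (-b + disc) / 2, (-b - disc) / 2

for h in (-0.5, -2.0):
    r1, r2 = ab2_roots(h)
    print(h, abs(r1) <= 1 and abs(r2) <= 1)
# λ∆t = -0.5 lies in the stability region; λ∆t = -2.0 does not
```

At h = 0 the roots reduce to those of ρ(z), namely z = 0 and z = 1, recovering the root condition check above.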
In summary, we see that some methods can be unstable if the step size ∆t
is too large (such as the forward Euler method) while others are stable even for
a large choice of ∆t (such as the backward Euler method). In general, explicit
methods have stability restrictions whereas implicit methods are stable for all step
sizes. Of course, one must have a small enough step size for accuracy. We have
just touched on the ideas of stability of numerical methods for IVPs; the interested
reader is referred to standard texts in numerical analysis for a thorough treatment
of stability. The important concept is that we need a consistent and stable method
to guarantee convergence of our results.
We have
f′(t) − N(h) = (h/2) f″(t) + (h²/3!) f‴(t) + (h³/4!) f⁗(t) + · · · .   (4.31)

Now if we generate another approximation using step size h/2 we have

f′(t) − N(h/2) = (h/4) f″(t) + (h²/(4 · 3!)) f‴(t) + (h³/(8 · 4!)) f⁗(t) + · · · .   (4.32)

The goal is to combine these approximations to eliminate the O(h) term so that
the approximation is O(h²). Clearly subtracting (4.31) from twice (4.32) eliminates
the terms involving h so we get

f′(t) − [ 2N(h/2) − N(h) ] = ( h²/(2 · 3!) − h²/3! ) f‴(t) + ( h³/(4 · 4!) − h³/4! ) f⁗(t) + · · ·
                           = −(h²/12) f‴(t) − (h³/32) f⁗(t) + · · · .
                                                                               (4.33)
Thus the approximation 2N(h/2) − N(h) to f′(t) is second order. This process
can be repeated to eliminate the O(h²) term. To see this, we use the approximation
(4.33) with h halved again to get
Note that to get the rates in the last column more accuracy in the solution had to be used
than the recorded values in the table.
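The extrapolation above is easy to check numerically; the sketch below uses N(h) = (f(t+h) − f(t))/h for f = sin at t = 1 (a setup of our own choosing, purely for illustration).

```python
import math

# Sketch: N(h) = (f(t+h) - f(t))/h approximates f'(t) to first order,
# while the extrapolated value 2N(h/2) - N(h) is second order.
f, t = math.sin, 1.0
exact = math.cos(t)

def N(h):
    return (f(t + h) - f(t)) / h

for h in (0.1, 0.05):
    e1 = abs(N(h) - exact)                  # O(h): roughly halves as h halves
    e2 = abs(2 * N(h / 2) - N(h) - exact)   # O(h^2): roughly quarters
    print(h, e1, e2)
```

The printed errors shrink by factors of about 2 and 4 respectively when h is halved, matching (4.31) and (4.33).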
scheme because instead of computing f (tn+1 , Y n+1 ) we use the known predicted
value at tn+1 . One can also take the approach of correcting more than once. You
can view this approach as being similar to applying the Newton-Raphson method
where we take the predictor step as the initial guess and each corrector is a New-
ton iteration; however, the predictor step gives a systematic approach to finding an
initial guess.
We first consider the Euler-trapezoidal predictor-corrector pair where the explicit
scheme is forward Euler and the implicit scheme is the trapezoidal method (4.12).
Recall that the forward Euler scheme is first order and the trapezoidal is second
order. Letting the result of the predicted solution at tn+1 be Ypn+1 , we have the
following predictor-corrector pair.
As can be seen from the description of the scheme, the implicit trapezoidal method
is now implemented as an explicit method because we evaluate f at the known point
(tn+1 , Ypn+1 ) instead of at the unknown point (tn+1 , Y n+1 ). The method requires
two function evaluations so the work is equivalent to a two-stage RK method. The
scheme is often denoted by PECE because we first predict Ypn+1 , then evaluate
f (tn+1 , Ypn+1 ), then correct to get Y n+1 and finally evaluate f (tn+1 , Y n+1 ) to get
ready for the next step.
The predicted solution Ypn+1 from the forward Euler method is first order but
we add a correction to it using the trapezoidal method and improve the error. We
can view the predictor-corrector pair as implementing the difference scheme
Y^{n+1} = Y^n + (∆t/2) [ f(t_{n+1}, Y^n + ∆t f(t_n, Y^n)) + f(t_n, Y^n) ]
which uses the average of the slope at (t_n, Y^n) and the slope at t_{n+1} evaluated at the Euler approximation
there. To analytically demonstrate the accuracy of a predictor-corrector method it
is helpful to write the scheme in this manner. In the exercises you are asked to show
that this predictor-corrector pair is second order. Example 4.13 demonstrates that
numerically we get second order.
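A sketch of the Euler-trapezoidal PECE pair on y′ = −y confirming second order convergence (the function names are ours, not the text's):

```python
import math

# Sketch of the Euler-trapezoidal PECE pair: predict with forward Euler,
# evaluate, correct with the trapezoidal rule.
def pece_step(f, t, y, dt):
    yp = y + dt * f(t, y)                    # P: forward Euler predictor
    fp = f(t + dt, yp)                       # E: evaluate at the prediction
    return y + dt / 2 * (f(t, y) + fp)       # C: trapezoidal corrector

def solve(f, t0, y0, dt, nsteps):
    t, y = t0, y0
    for _ in range(nsteps):
        y = pece_step(f, t, y, dt)
        t += dt
    return y

f = lambda t, y: -y
e1 = abs(solve(f, 0.0, 1.0, 1/40, 40) - math.exp(-1.0))
e2 = abs(solve(f, 0.0, 1.0, 1/80, 80) - math.exp(-1.0))
print(e1 / e2)   # ratio ≈ 4: second order convergence
```

The final evaluation f(t_{n+1}, Y^{n+1}) of the full PECE cycle is performed implicitly here at the start of the next step.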
One might believe that if one correction step improves the accuracy, then two
or more correction steps are better. This leads to methods which are commonly
denoted as PE(CE)^r schemes, where the last two steps in the correction process
are repeated r times. Of course it is not known a priori how many correction
steps should be done but since the predictor step provides a good starting guess,
only a small number of corrections are typically required. The effectiveness of the
correction step can be dynamically monitored to determine r. The next example
applies the Euler-trapezoidal rule to an IVP using more than one correction step.
98 CHAPTER 4. A SURVEY OF METHODS
Example 4.13. Consider the IVP

y'(t) = t y^2 / sqrt(9 − t^2),   0 < t ≤ 2,   y(0) = 1

which has an exact solution y(t) = 1 / ( sqrt(9 − t^2) − 2 ) that can be found by separating variables
and integrating.
Using Y^0 = 1 and Δt = 0.1 with f(t, y) = t y^2 / sqrt(9 − t^2), we first predict the value at
t_1 = 0.1 using the forward Euler method,

P:  Yp^1 = Y^0 + Δt f(0, 1) = 1 + 0.1(0) = 1,

then evaluate

E:  f(0.1, 1.0) = (0.1)(1^2) / sqrt(9 − 0.1^2) = 0.03335186,

and correct to obtain the approximation at t_1 = 0.1

C:  Y^1 = Y^0 + (0.1/2) [ f(0.1, 1.0) + f(0, 1) ] = 1.00166760

with

E:  f(0.1, 1.00166760) = 0.03346320.

To perform a second correction we have

C:  Y^1 = Y^0 + (0.1/2) [ f(0.1, 1.00166760) + f(0, 1) ] = 1.00167316

with

E:  f(0.1, 1.00167316) = 0.03346340.
The results for the approximate solutions at t = 2 are given in the table below using
decreasing values of Δt. As can be seen from the table, the predictor-corrector pair is second order.
Note that it requires one more function evaluation, f(t_{n+1}, Yp^{n+1}), than the forward Euler
method. The Midpoint rule requires the same number of function evaluations and has
the same accuracy as this predictor-corrector pair. However, the predictor-corrector pair
provides an easy way to estimate the error at each step.
          PECE                       PE(CE)^2
Δt        Error            rate      Error            rate
1/10      2.62432 × 10^-2            3.93083 × 10^-2
1/20      7.66663 × 10^-3   1.75     1.16639 × 10^-2   1.75
1/40      3.18110 × 10^-3   1.87     3.18110 × 10^-3   1.87
1/80      8.31517 × 10^-4   1.94     8.31517 × 10^-4   1.94
1/160     2.12653 × 10^-4   1.97     2.12653 × 10^-4   1.97
In the previous example we saw that the predictor was first order, the corrector
second order, and the overall method was second order. It can be proved that if the
corrector is O(Δt)^n and the predictor is at least O(Δt)^{n-1}, then the overall method
is O(Δt)^n. Consequently the PC pairs should be chosen so that the corrector is
one order higher in accuracy than the predictor.
Higher order predictor-corrector pairs often consist of an explicit multistep method
such as an Adams-Bashforth method and a corresponding implicit Adams-Moulton
multistep method. The pair should be chosen so that the only additional function
evaluation in the corrector equation is at the predicted point. To achieve this one
often chooses the predictor and corrector to have the same accuracy. For example,
one such pair is an explicit third order Adams-Bashforth predictor coupled with an
implicit third order Adams-Moulton. Notice that the corrector only requires one
additional function evaluation at (tn+1 , Ypn+1 ).
Yp^{n+1} = Y^n + (Δt/12) [ 23 f(t_n, Y^n) − 16 f(t_{n-1}, Y^{n-1}) + 5 f(t_{n-2}, Y^{n-2}) ]

Y^{n+1} = Y^n + (Δt/12) [ 5 f(t_{n+1}, Yp^{n+1}) + 8 f(t_n, Y^n) − f(t_{n-1}, Y^{n-1}) ]          (4.38)
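The pair (4.38) can be sketched for a single IVP as follows; the classical RK4 starting procedure and the test problem y' = -y are illustrative assumptions (any starting method of sufficient accuracy would do).

```python
import math

def abm3(f, t0, y0, T, n_steps):
    """Third order Adams-Bashforth predictor / Adams-Moulton corrector
    pair (4.38) in PECE form; two extra starting values come from RK4."""
    dt = (T - t0) / n_steps
    ts = [t0 + i * dt for i in range(n_steps + 1)]
    ys = [y0]
    for i in range(2):                       # RK4 starting values
        t, y = ts[i], ys[i]
        k1 = dt * f(t, y)
        k2 = dt * f(t + dt / 2, y + k1 / 2)
        k3 = dt * f(t + dt / 2, y + k2 / 2)
        k4 = dt * f(t + dt, y + k3)
        ys.append(y + (k1 + 2 * k2 + 2 * k3 + k4) / 6)
    fs = [f(ts[i], ys[i]) for i in range(3)]
    for n in range(2, n_steps):
        # P: third order Adams-Bashforth predictor
        yp = ys[n] + dt / 12 * (23 * fs[n] - 16 * fs[n - 1] + 5 * fs[n - 2])
        fp = f(ts[n + 1], yp)                # E: only new point is the predicted one
        # C: third order Adams-Moulton corrector
        y_new = ys[n] + dt / 12 * (5 * fp + 8 * fs[n] - fs[n - 1])
        ys.append(y_new)
        fs.append(f(ts[n + 1], y_new))       # E: slope carried to the next step
    return ys[-1]

errs = [abs(abm3(lambda t, y: -y, 0.0, 1.0, 1.0, n) - math.exp(-1.0))
        for n in (20, 40, 80)]
rate = math.log2(errs[1] / errs[2])
```

The computed rate should approach three, matching the order of the corrector.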
EXERCISES
1.
    a.   0   |
         1/2 | 1/2
         1   | −1     2
         ----+------------------
             | 1/6    2/3    1/6

    b.   0   | 1/6   −1/3    1/6
         1/2 | 1/6    5/12  −1/12
         1   | 1/6    2/3    1/6
         ----+------------------
             | 1/6    2/3    1/6
2. Modify the derivation of the explicit second order Taylor series method in
§ 4.1.1 to derive an implicit second order Taylor series method.
3. Use a Taylor series to derive a third order accurate explicit difference equation
for the IVP (2.8).
4. Gauss quadrature rules are popular for numerical integration because one
gets the highest accuracy possible for a fixed number of quadrature points;
however one gives up the “niceness” of the quadrature points. In addition,
these rules are defined over the interval [−1, 1]. For example, the one-point
Gauss quadrature rule is
∫_{−1}^{1} g(x) dx ≈ 2 g(0).
If g(x) ≥ 0 on [a, b], this rule approximates the area under the curve g(x) by the
area under a parabola passing through the points (a, g(a)), (b, g(b)) and
((a + b)/2, g((a + b)/2)). Use this quadrature rule to approximate ∫_{t_n}^{t_{n+1}} f(t, y) dt to
obtain an explicit 3-stage RK method. When you need to evaluate terms such
as f at tn + ∆t/2 use an appropriate Euler step to obtain an approximation
to the corresponding y value as we did in the Midpoint method. Write your
method in the format of (4.16) and in a Butcher tableau.
Chapter 5

Systems and Adaptive Step Size Methods

When modeling phenomena where we know the initial state and how it changes
with time, we often have either a higher order IVP or a system of IVPs rather than
a single first order IVP. In this chapter we first recall how a higher order IVP can be
transformed into a system of first order IVPs. Then we extend in a straightforward
manner some of the methods from the previous chapter to systems of equations. We discuss
implementation issues and give examples that illustrate the use of systems of IVPs.
Then we point out how to extend our stability tests for a single equation to a system.
The final concept we investigate in our study of IVPs is methods which
efficiently allow a variable time step to be taken. For these methods we need a means
to estimate the next time step. If we can get an estimate for the error made at
time t_n, then the magnitude of the error can be used to accept or reject the step
and, if the step is accepted, to estimate the next time step. Consequently, our goal
is to find methods which can be used to estimate the error. One strategy is to
obtain two approximations at a given time and use these to measure the error. Of
course obtaining the second approximation must be done efficiently or else the cost
is prohibitively large. In addition to variable step size, many production codes are also
variable order. We do not address these here.
For example, consider the fourth order IVP

y^{(4)}(t) + 2y''(t) + 4y(t) = 5,   y(0) = 1, y'(0) = −3, y''(0) = 0, y'''(0) = 2.

Setting w_1 = y, w_2 = y', w_3 = y'' and w_4 = y''' converts it into the first order system

w_1'(t) = w_2,   w_2'(t) = w_3,   w_3'(t) = w_4,   w_4'(t) = 5 − 2w_3 − 4w_1

with initial conditions w_1(0) = 1, w_2(0) = −3, w_3(0) = 0, w_4(0) = 2.
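The substitution w = (y, y', y'', y''') can be written down directly in code; the function name `flatten_fourth_order` is an illustrative choice.

```python
def flatten_fourth_order(t, w):
    """First order system equivalent to y'''' + 2y'' + 4y = 5 with
    w = (y, y', y'', y'''), so that w' = (w2, w3, w4, 5 - 2*w3 - 4*w1)."""
    w1, w2, w3, w4 = w
    return [w2, w3, w4, 5.0 - 2.0 * w3 - 4.0 * w1]

# Initial vector from y(0) = 1, y'(0) = -3, y''(0) = 0, y'''(0) = 2.
w0 = [1.0, -3.0, 0.0, 2.0]
```

Any method for first order systems can now be applied to this right-hand side.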
Oftentimes a model is already in the form of a coupled system of first order IVPs
such as the predator-prey model (2.6). Our goal is to apply the methods of the
previous chapter to a system of first order IVPs. The notation we use for a general
system of N first order IVPs is

w_i'(t) = f_i(t, w_1, w_2, . . . , w_N),   t_0 < t ≤ T,   w_i(t_0) = α_i,   i = 1, 2, . . . , N.          (5.2)
We write the Euler method as a vector equation so we can solve for all N un-
knowns simultaneously at each tn ; this is not necessary but results in an efficient
implementation of the method. To this end we set
W^n = ( W_1^n, W_2^n, . . . , W_N^n )^T   and   F^n = ( f_1(t_n, W^n), f_2(t_n, W^n), . . . , f_N(t_n, W^n) )^T
so that W0 = (α1 , α2 , . . . , αN )T . For n = 0, 1, 2, . . . we have the following vector
equation for the forward Euler method for a system
Wn+1 = Wn + ∆tFn . (5.3)
To implement the scheme at each point tn we have N function evaluations to form
the vector Fn , then we perform the scalar multiplication to get ∆tFn and then a
vector addition to obtain the final result Wn+1 . To compute the error at a specific
time, we have to take into account the fact that the approximation is now a vector
instead of a scalar. Also the exact solution is a vector of each wi evaluated at
the specific time the error is determined. We can easily calculate an error vector
as the difference in these two vectors. To obtain a single number to use in the
calculation of a numerical rate, we must use a vector norm. A common vector norm
is the standard Euclidean norm, often called the ℓ2 norm or the “little ℓ2 norm”.
Another commonly used vector norm is the max or infinity norm. See the appendix
for details.
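The vector form (5.3) and the error norms just described can be sketched as follows, using the three-component example system treated next (the function names are illustrative assumptions):

```python
import math

def F(t, w):
    """Right-hand side of the three-equation example system."""
    w1, w2, w3 = w
    return [2 * w2 - 4 * t,
            -w1 + w3 - math.exp(t) + 2,
            w1 - 2 * w2 + w3 + 4 * t]

def euler_system(F, t0, w0, dt, n_steps):
    """Forward Euler for a system, W^{n+1} = W^n + dt F^n as in (5.3)."""
    t, w = t0, list(w0)
    for _ in range(n_steps):
        f = F(t, w)                              # N function evaluations
        w = [wi + dt * fi for wi, fi in zip(w, f)]
        t += dt
    return w

w = euler_system(F, 0.0, [-1.0, 0.0, 2.0], 0.1, 2)
exact = [-math.cos(0.4), math.sin(0.4) + 0.4, math.cos(0.4) + math.exp(0.2)]
err = [wi - ei for wi, ei in zip(w, exact)]
l2 = math.sqrt(sum(e * e for e in err))          # Euclidean (l2) norm
linf = max(abs(e) for e in err)                  # max (infinity) norm
rel = l2 / math.sqrt(sum(e * e for e in exact))  # normalized l2 error
```

The normalized error `rel` reproduces the value near 1.98 × 10^-2 quoted in the worked example below.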
Consider the system of three IVPs

w_1'(t) = 2w_2 − 4t,                    w_1(0) = −1
w_2'(t) = −w_1 + w_3 − e^t + 2,         w_2(0) = 0
w_3'(t) = w_1 − 2w_2 + w_3 + 4t,        w_3(0) = 2

for the unknown (w_1, w_2, w_3)^T whose exact solution is (− cos(2t), sin(2t) + 2t, cos(2t) +
e^t)^T. Determine by hand an approximation at t = 0.2 using Δt = 0.1 and the forward
Euler method. Calculate the Euclidean or ℓ2-norm of the error at t = 0.2.
For this problem we have

W^0 = (−1, 0, 2)^T   and   F^n = ( 2W_2^n − 4t_n, −W_1^n + W_3^n − e^{t_n} + 2, W_1^n − 2W_2^n + W_3^n + 4t_n )^T

so that with t_n = t_0 = 0

F^0 = ( 2(0) − 4(0), −(−1) + 2 − e^0 + 2, −1 − 0 + 2 + 4(0) )^T = (0, 4, 1)^T

and W^1 = W^0 + Δt F^0 = (−1.0, 0.400, 2.100)^T. Evaluating F at (t_1, W^1) gives
F^1 = (0.400, 3.995, 0.700)^T so that

W^2 = (−1.0, 0.400, 2.100)^T + 0.1 (0.400, 3.995, 0.700)^T = (−0.9600, 0.7995, 2.1700)^T.
The exact solution at t = 0.2 is (−0.921061, 0.789418, 2.14246)^T. Unlike the case of a
single IVP we now have an error vector instead of a single number; at t = 0.2 the error
vector in our calculation is (0.038939, 0.010082, 0.02754)^T. To obtain a single number from
this vector to use in the calculation of a numerical rate, we must use a vector norm. For
this calculation at t = 0.2 the Euclidean norm of the error, normalized by the Euclidean
norm of the exact solution, is 1.98 × 10^-2.
Suppose now that we have an s-stage RK method; recall that for a single first
order equation we have s function evaluations for each t_n. If we have N first order
IVPs, then we need sN function evaluations at each t_n. For example, if we use
a 4-stage RK method with 10,000 equations then at each time step we need 40,000 function
evaluations.
The exact solution at t = 0.2 is (−0.921061, 0.789418, 2.14246)T giving an error vector of
(0.000661, .002215, .000096)T ; calculating the standard Euclidean norm of the error and
normalizing by the Euclidean norm of the solution gives 1.0166×10−3 which is considerably
smaller than we obtained for the forward Euler.
Now we want to redo the problem using dot and matrix products. For this two-stage
method the Butcher arrays read

c = (0, 2/3)^T,   b = (1/4, 3/4)^T,   a = [ 0    0 ]
                                          [ 2/3  0 ]

and K is the N × 2 matrix whose columns are the stage vectors k_1 and k_2, initialized
to zero. The update is then

W^1 = W^0 + K b

using a matrix times vector operation. To form the two columns of K we need to determine
the point where we evaluate F. For a general s-stage method the computation of K and
the update are

K = 0
for i = 1, . . . , s
    t_eval = t + c(i) Δt
    W_eval = W + K a(i, :)^T      (matrix times vector product)
    K(:, i) = Δt F(t_eval, W_eval)
end
W = W + K b
t = t + Δt
In the table below we tabulate the results using the Heun method for this system at
t = 1, where the ℓ2-norm and the ℓ∞-norm (i.e., the maximum norm) of the error, each
normalized by the corresponding norm of the solution, are reported. Clearly we have
quadratic convergence as we did in the case of a single equation.
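The stage-matrix loop above can be sketched in code. The two-stage tableau below (c = (0, 2/3), b = (1/4, 3/4), a_21 = 2/3) is an assumption recovered from the garbled arrays, chosen because it reproduces the text's hand-computed value W^1 = (−0.98, 0.39982, 2.085).

```python
import math

def F(t, w):
    """Right-hand side of the three-equation example system."""
    w1, w2, w3 = w
    return [2 * w2 - 4 * t,
            -w1 + w3 - math.exp(t) + 2,
            w1 - 2 * w2 + w3 + 4 * t]

def rk_step(F, t, w, dt, c, b, a):
    """One explicit s-stage RK step for a system of N equations,
    organized around the N x s stage matrix K as in the text."""
    n, s = len(w), len(b)
    K = [[0.0] * s for _ in range(n)]            # K = 0
    for i in range(s):
        t_eval = t + c[i] * dt
        # W_eval = W + K a(i,:)^T  (matrix times vector product)
        w_eval = [w[j] + sum(K[j][m] * a[i][m] for m in range(s))
                  for j in range(n)]
        f = F(t_eval, w_eval)
        for j in range(n):
            K[j][i] = dt * f[j]                  # K(:, i) = dt F(t_eval, W_eval)
    # W = W + K b
    return [w[j] + sum(K[j][m] * b[m] for m in range(s)) for j in range(n)]

c, b = [0.0, 2.0 / 3.0], [0.25, 0.75]
a = [[0.0, 0.0], [2.0 / 3.0, 0.0]]
w1 = rk_step(F, 0.0, [-1.0, 0.0, 2.0], 0.1, c, b, a)
```

One step from W^0 = (−1, 0, 2)^T with Δt = 0.1 reproduces the Heun value used as a starting vector later in the chapter.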
For a system of N equations the function f is now a vector F so we must store its
value at the previous m steps. In the Adams-Bashforth or Adams-Moulton methods
a0 , a1 , . . . , am−2 = 0 so only the solution at tn is used. This saves additional storage
because we only have to store m slope vectors and a single vector approximation to
the solution. So for the system of N equations using an m-step Adams-Bashforth
method we must store (m + 1) vectors of length N . Remember that an m-step
method requires m starting values so we need to calculate m − 1 values from a
single-step method.
As a concrete example, consider the 2-step Adams-Bashforth method
Y^{n+1} = Y^n + Δt [ (3/2) f(t_n, Y^n) − (1/2) f(t_{n-1}, Y^{n-1}) ]
for a single IVP. Using the notation of the previous section we extend the method
for the system of N equations as
W^{n+1} = W^n + Δt [ (3/2) F(t_n, W^n) − (1/2) F(t_{n-1}, W^{n-1}) ].          (5.4)
At each step we must store three vectors Wn , F(tn , Wn ), and F(tn−1 , Wn−1 ).
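The scheme (5.4) can be sketched as follows; the right-hand side is the three-equation example used earlier, and small digit differences from the hand computation below are due to rounding of the stored starting value.

```python
import math

def F(t, w):
    """Right-hand side of the three-equation example system."""
    w1, w2, w3 = w
    return [2 * w2 - 4 * t,
            -w1 + w3 - math.exp(t) + 2,
            w1 - 2 * w2 + w3 + 4 * t]

def ab2_system(F, t0, w0, w1, dt, n_steps):
    """2-step Adams-Bashforth (5.4) for a system; w1 is a given second
    starting value.  Only W^n and the two most recent slope vectors
    are kept in storage."""
    f_prev, f_curr = F(t0, w0), F(t0 + dt, w1)
    t, w = t0 + dt, list(w1)
    for _ in range(n_steps - 1):
        w = [wi + dt * (1.5 * fc - 0.5 * fp)
             for wi, fc, fp in zip(w, f_curr, f_prev)]
        t += dt
        f_prev, f_curr = f_curr, F(t, w)
    return w

# Starting value W^1 taken from the Heun computation; reach t = 0.2.
w2 = ab2_system(F, 0.0, [-1.0, 0.0, 2.0], [-0.98, 0.39982, 2.085], 0.1, 2)
```

Only three vectors of length N are live at any time, matching the storage count given above.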
In the next example we apply this 2-step method to the system of the previous
examples.
Because this is a 2-step method we need two starting values. We have the initial condition
for W0 but we also need W1 . Because this method is second order we can use either a
first or second order scheme to generate an approximation to W1 . Here we use the results
from the Heun method in Example 5.3 for W1 . Consequently we have
W^0 = (−1, 0, 2)^T   and   W^1 = (−0.98000, 0.39982, 2.08500)^T.
From the previous example we have F(0, W^0) = (0.0, 4.0, 1.0)^T and F(0.1, W^1) =
(0.39966, 3.95983, 0.70466)^T. Then W^2 is given by

W^2 = (−0.98000, 0.39982, 2.08500)^T + 0.1 [ (3/2)(0.39966, 3.95983, 0.70466)^T − (1/2)(0.0, 4.0, 1.0)^T ]
    = (−0.92005, 0.79380, 2.14070)^T.
The table below tabulates the errors at t = 1. Of course we can only use the starting
value W^1 = (−0.98000, 0.39982, 2.08500)^T for the computation with
Δt = 0.1; for the other choices of Δt we must generate new starting values because t_1 is
different. From the results we see that the rate is two, as expected.
If instead we have the general system (5.2) where f_i(t, w) is not linear in w, then the condition
becomes one on the eigenvalues of the Jacobian matrix of f, where the (i, j) entry
of the Jacobian is ∂f_i/∂w_j.
Now if we apply the forward Euler method to the system w'(t) = Aw, where
the entries of A are a_ij, then we have the system W^{n+1} = (I + Δt A) W^n, i.e.,

            [ 1 + Δt a_11    Δt a_12       Δt a_13     · · ·    Δt a_1N      ]
W^{n+1} =   [ Δt a_21        1 + Δt a_22   Δt a_23     · · ·    Δt a_2N      ]  W^n .
            [     ..             ..                                 ..       ]
            [ Δt a_N1         · · ·        · · ·   Δt a_{N,N-1}  1 + Δt a_NN ]
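For w' = Aw with decaying modes, forward Euler is stable when every eigenvalue λ of A satisfies |1 + Δt λ| ≤ 1, i.e. Δt ≤ 2/|λ| for real negative eigenvalues. A minimal sketch for a symmetric 2 × 2 matrix (the matrix entries are illustrative):

```python
import math

def euler_stable_dt(a11, a12, a22):
    """Largest stable forward Euler step for w' = A w with the symmetric
    2 x 2 matrix A = [[a11, a12], [a12, a22]], assuming its (real)
    eigenvalues are negative: we need |1 + dt*lam| <= 1, i.e.
    dt <= 2/|lam| for the most negative eigenvalue."""
    tr, det = a11 + a22, a11 * a22 - a12 * a12
    disc = math.sqrt(tr * tr - 4.0 * det)
    lams = [(tr - disc) / 2.0, (tr + disc) / 2.0]
    assert all(l < 0 for l in lams)          # decaying modes assumed
    return 2.0 / max(abs(l) for l in lams), lams

# Illustrative matrix with eigenvalues -3 and -1: stability needs dt <= 2/3.
dt_max, lams = euler_stable_dt(-2.0, 1.0, -2.0)
```

The ratio of the largest to smallest eigenvalue magnitude is exactly the quantity that makes a system stiff later in the chapter.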
The idea is to compute two approximations at each step and use them to estimate the
local truncation error and provide an estimate for the next time step Δt_{n+1}, as well
as to determine if we should accept the result at t_{n+1}. To describe this approach
we use the forward Euler method because its simplicity should make the technique
clear.
We assume we have the solution at t_n along with an estimate Δt_n for the next time
step; the goal is to determine whether to accept the resulting approximation at t_{n+1}
and, if so, to estimate the next time step Δt_{n+1}. First we take an Euler step with
Δt_n to get the approximation Y_1^{n+1}, where the subscript distinguishes between our
two approximations. Next we take two Euler steps starting from t_n using a step size of
Δt_n/2 to get the approximation Y_2^{n+1}.
Recall that the local truncation error for the forward Euler method using a step
size of Δt is C(Δt)^2 + O(Δt)^3. Thus the exact solution satisfies

y(t_{n+1}) = y(t_n) + Δt_n f(t_n, y(t_n)) + C(Δt_n)^2 + O(Δt_n)^3,

where we have equality because we have included the local truncation error. Consequently,
for our solution Y_1^{n+1} we have the local truncation error C(Δt_n)^2 + O(Δt_n)^3, i.e.,

y(t_{n+1}) − Y_1^{n+1} = C(Δt_n)^2 + O(Δt_n)^3.
Now for Y_2^{n+1} we take two steps, each with a local truncation error of C(Δt_n)^2/4 +
O(Δt_n/2)^3, so at t_{n+1} we have essentially twice this error,

y(t_{n+1}) − Y_2^{n+1} = (1/2) C(Δt_n)^2 + O(Δt_n)^3.

Now using Richardson extrapolation, the combination 2Y_2^{n+1} − Y_1^{n+1} eliminates the
(Δt_n)^2 terms, so that it has a local truncation error of O(Δt_n)^3:

y(t_{n+1}) − (2Y_2^{n+1} − Y_1^{n+1}) = 2[ y(t_{n+1}) − Y_2^{n+1} ] − [ y(t_{n+1}) − Y_1^{n+1} ] = O(Δt_n)^3.
We want to see how to use these truncation errors to decide whether to accept
or reject the improved solution 2Y2n+1 − Y1n+1 at tn+1 and if we accept it to choose
a new time step ∆tn+1 . Suppose that the user inputs a tolerance for the maximum
rate of increase in the error. We have an estimate for this rate, rn , from our
solutions so we require it to be less than the given tolerance σ, i.e.,
r_n = |Y_1^{n+1} − Y_2^{n+1}| / Δt_n ≤ prescribed tolerance = σ.          (5.5)
If this is satisfied then we accept the improved solution 2Y2n+1 −Y1n+1 ; otherwise we
reduce ∆tn and repeat the procedure. We estimate the next step size by computing
the ratio of the acceptable rate (i.e., the prescribed tolerance) and the actual rate
rn . Then we take this fraction of the current step size to estimate the new one. In
practice, one often adds a multiplicative factor less than one for “safety” since we
have made certain assumptions. For example, we could compute a step size from
Δt_{n+1} = γ (σ / r_n) Δt_n   where γ < 1.          (5.6)
From this expression we see that if the acceptable rate σ is smaller than the actual
rate r_n then the step size is decreased. If the criterion (5.5) is not satisfied then we must
repeat the step with a reduced time step. We can estimate this reduced step from (5.6)
without the safety factor because in this case r_n > σ so the fraction is less than one. The
following example illustrates this technique.
Δt_n       r_n/σ     accept/reject     new Δt_n
0.01       0.1       accept            0.0716
0.0716     0.87      accept            0.0614
0.0614     2.0       reject            0.0460
0.0460     1.5       reject            0.0345
0.0345     1.1       reject            0.0259
0.0259     0.86      accept            0.0225
0.0225     1.12      reject            0.0169
0.0169     0.84      accept            0.0150
0.0150     0.98      accept            0.0115
0.0115     0.95      accept
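The accept/reject loop of (5.5)-(5.6) can be sketched as follows; the test problem y' = -y, the tolerance, and the safety factor γ = 0.9 are illustrative choices.

```python
import math

def adaptive_euler(f, t0, y0, T, dt0, sigma, gamma=0.9):
    """Step doubling with forward Euler, as in (5.5)-(5.6): compare one
    step of size dt with two steps of size dt/2; accept when
    r_n = |Y1 - Y2|/dt <= sigma and advance with the Richardson
    extrapolated value 2*Y2 - Y1."""
    t, y, dt = t0, y0, dt0
    while t < T:
        dt = min(dt, T - t)
        s0 = f(t, y)
        y1 = y + dt * s0                                # one full step
        half = y + 0.5 * dt * s0
        y2 = half + 0.5 * dt * f(t + 0.5 * dt, half)    # two half steps
        r = abs(y1 - y2) / dt                           # error rate (5.5)
        if r <= sigma:                                  # accept and extrapolate
            t += dt
            y = 2.0 * y2 - y1
        # accepted or not, (5.6) gives the next trial step size
        dt = gamma * (sigma / max(r, 1e-14)) * dt
    return y

y_end = adaptive_euler(lambda t, y: -y, 0.0, 1.0, 1.0, 0.1, 1e-3)
```

On a rejected step the same formula shrinks Δt because σ/r_n < 1, exactly as described above.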
If the step size ∆t is too large, then the assumption that the fourth derivative is
constant from tn to tn+1 may not hold and the above relationship is not true. Typ-
ically the exact solution y(tn+1 ) is not known so instead we monitor the difference
in the predicted and corrected solution |Y n+1 − Ypn+1 |. If it is larger than our
prescribed tolerance, then the step is rejected and ∆t is decreased. Otherwise the
step is accepted; if the difference is below the minimum prescribed tolerance then
the step size is increased in the next step.
For the Runge-Kutta-Fehlberg (RKF45) method the six stages are

k_1 = Δt f(t_n, Y^n)

k_2 = Δt f(t_n + (1/4)Δt, Y^n + (1/4)k_1)

k_3 = Δt f(t_n + (3/8)Δt, Y^n + (3/32)k_1 + (9/32)k_2)

k_4 = Δt f(t_n + (12/13)Δt, Y^n + (1932/2197)k_1 − (7200/2197)k_2 + (7296/2197)k_3)

k_5 = Δt f(t_n + Δt, Y^n + (439/216)k_1 − 8k_2 + (3680/513)k_3 − (845/4104)k_4)

k_6 = Δt f(t_n + (1/2)Δt, Y^n − (8/27)k_1 + 2k_2 − (3544/2565)k_3 + (1859/4104)k_4 − (11/40)k_5).
The fourth order RK method, which uses the four function evaluations k_1, k_3, k_4, k_5,

Y^{n+1} = Y^n + (25/216)k_1 + (1408/2565)k_3 + (2197/4104)k_4 − (1/5)k_5          (5.7)

is used to first approximate y(t_{n+1}) and then the fifth order RK method

Y^{n+1} = Y^n + (16/135)k_1 + (6656/12825)k_3 + (28561/56430)k_4 − (9/50)k_5 + (2/55)k_6          (5.8)
is used for comparison. Note that the fifth order method uses all of the function
evaluations of the fourth order method, so it is efficient to implement because it only
requires one additional function evaluation. For this reason, we call RKF45 an embedded
method. Also note that the fourth order method is actually a 5-stage method but
remember that no 5-stage method is fifth order. Typically the Butcher tableau for
the coefficients ci and aij is written for the higher order method and then two lines
are appended at the bottom for the coefficients bi in each method. For example,
for RKF45 the tableau is
0     |
1/4   |  1/4
3/8   |  3/32        9/32
12/13 |  1932/2197  −7200/2197   7296/2197
1     |  439/216    −8           3680/513     −845/4104
1/2   |  −8/27       2          −3544/2565     1859/4104    −11/40
------+-------------------------------------------------------------------
      |  25/216      0           1408/2565     2197/4104    −1/5      0
      |  16/135      0           6656/12825    28561/56430  −9/50     2/55
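A single RKF45 step with the embedded error estimate can be sketched as follows; the function name and the test problem are illustrative choices.

```python
import math

def rkf45_step(f, t, y, dt):
    """One RKF45 step: returns the fourth order value (5.7), the fifth
    order value (5.8) and |y5 - y4| as the local error estimate."""
    k1 = dt * f(t, y)
    k2 = dt * f(t + dt / 4, y + k1 / 4)
    k3 = dt * f(t + 3 * dt / 8, y + 3 * k1 / 32 + 9 * k2 / 32)
    k4 = dt * f(t + 12 * dt / 13,
                y + 1932 * k1 / 2197 - 7200 * k2 / 2197 + 7296 * k3 / 2197)
    k5 = dt * f(t + dt,
                y + 439 * k1 / 216 - 8 * k2 + 3680 * k3 / 513 - 845 * k4 / 4104)
    k6 = dt * f(t + dt / 2,
                y - 8 * k1 / 27 + 2 * k2 - 3544 * k3 / 2565
                + 1859 * k4 / 4104 - 11 * k5 / 40)
    y4 = y + 25 * k1 / 216 + 1408 * k3 / 2565 + 2197 * k4 / 4104 - k5 / 5
    y5 = (y + 16 * k1 / 135 + 6656 * k3 / 12825 + 28561 * k4 / 56430
          - 9 * k5 / 50 + 2 * k6 / 55)
    return y4, y5, abs(y5 - y4)

y4, y5, est = rkf45_step(lambda t, y: -y, 0.0, 1.0, 0.1)
```

The estimate `est` costs nothing extra beyond the six stages and is what drives the step size selection in production codes.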
In a stiff problem the solution curve typically changes rapidly and then tends towards a slowly
varying solution. Because the stability region for implicit methods is typically much
larger than that for explicit methods, most stiff equations are approximated using an implicit
method.
To illustrate the concept of stiffness we look at a single IVP which is considered
stiff. The example is from a combustion model and is due to Shampine (2003)
who is one of the authors of the Matlab ODE suite. The idea is to model flame
propagation as when you light a match. We know that the flame grows rapidly
initially until it reaches a critical size which is dependent on the amount of oxygen.
We assume that the flame is a ball and y(t) represents its radius; in addition we
assume that the problem is normalized so that the maximum radius is one. We
have the IVP
y'(t) = y^2 (1 − y),   0 < t ≤ 2/δ;   y(0) = δ          (5.9)
where δ ≪ 1 is the small given initial radius. At ignition the solution y increases
rapidly to a limiting value of one; this happens quickly on the interval [0, 1/δ] but
on the interval [1/δ, 2/δ] the solution is approximately equal to one. Knowing the
behavior of the problem suggests that we should take a small step size initially and
then on [1/δ, 2/δ] where the solution is almost constant we should be able to take a
large step size. However, if we use the RKF45 method with an automatic step size
selector, then we can capture the solution on [0, 1/δ], but on [1/δ, 2/δ] the step size
is reduced by so much that the minimum allowable step size is surpassed and the
method often fails if the minimum step size is set too large. Initially the problem
is not stiff but it becomes stiff as the solution approaches its steady state value of
one. The term “stiff” was used to describe this phenomenon because it was
felt that the steady state solution is so “rigid”.
When one has a system of equations like (5.2) the stiffness of the problem
depends upon the eigenvalues of the Jacobian matrix. Recall that we said we need
all eigenvalues to have real part less than zero for stability. If the Jacobian matrix
has eigenvalues which have a very large negative real part and eigenvalues with a
very small negative real part, then the system is stiff and special care must be used
to solve it. You probably don’t know a priori if a system is stiff but if you encounter
behavior where the solution curve is not changing much but you find that your step
size needs to be smaller and smaller, then your system is probably stiff. In that
case, an implicit method is typically used.
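A sketch of backward Euler with a Newton solve for the flame model (5.9); δ, Δt and the Newton tolerance are illustrative choices. Near y = 1 we have f'(y) = 2y − 3y^2 = −1, so forward Euler needs Δt < 2 no matter how slowly the solution varies, while the implicit step carries no such restriction as δ shrinks and the interval 2/δ grows.

```python
def backward_euler_flame(delta, dt, n_steps):
    """Backward Euler for the stiff flame model y' = y^2 (1 - y),
    y(0) = delta of (5.9).  Each implicit step solves
    z - dt*z^2*(1 - z) - y_n = 0 by Newton's method started from y_n."""
    y = delta
    for _ in range(n_steps):
        z = y                                    # Newton initial guess
        for _ in range(50):
            g = z - dt * z * z * (1.0 - z) - y
            dg = 1.0 - dt * (2.0 * z - 3.0 * z * z)
            step = g / dg
            z -= step
            if abs(step) < 1e-12:
                break
        y = z
    return y

# delta = 0.01: integrate over (0, 2/delta] = (0, 200] with 200 unit steps.
y_end = backward_euler_flame(0.01, 1.0, 200)
```

The solution ignites near t = 1/δ and then settles onto the steady state y = 1, which the implicit iteration approaches monotonically from below.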