Lec7 ODEs
Contents
1 Overview
2 Basics
2.1 First-order ODE; Initial value problems
2.2 Converting a higher-order ODE to a first-order ODE
2.3 Other types of differential equations
2.4 Comparison to integration
2.5 Visualization using vector fields
2.6 Sensitivity
3 Euler's method
3.1 Forward Euler's method
3.2 Convergence
3.3 Euler's method: error analysis
3.3.1 Local truncation error
3.3.2 From local to global error
3.4 Interpreting the error bound
3.5 Order
3.6 Backward Euler's method
5 Runge-Kutta methods
5.1 Setup: one step methods
5.2 A better way: Explicit Runge-Kutta methods
5.2.1 Explicit midpoint rule (Modified Euler's method)
5.2.2 Higher-order explicit methods
5.3 Implicit methods
5.3.1 Example: deriving the trapezoidal method
5.4 Summary of Runge-Kutta methods
8 Multistep methods
8.1 Adams methods
8.2 Properties of the Adams methods
8.3 Other multistep methods
8.4 Analysis of multistep methods
8.4.1 Consistency
8.4.2 Zero stability
8.4.3 Strongly stable methods
8.5 Absolute stability
A Nonlinear systems
A.1 Taylor's theorem
A.2 Newton's method
A.3 Application to ODEs
1 Overview
In this section of the course we will derive methods to numerically solve ordinary differential
equations (ODE’s), analyze them, and gain experience with using them in practice. We’ll
apply what we have learned from interpolation, differentiation and integration. We will cover
the following topics.
• Basics: We will focus on first-order ODE’s, in standard form, and the problems we
will consider are initial value problems (IVP’s). How can we convert a higher-order
ODE into a first-order ODE? How can we visualize the solution to an ODE?
• Algorithms: We will derive and analyze a variety of algorithms, such as forward and
backward Euler, the family of Runge-Kutta methods, and multistep methods.
• Convergence: What is the relationship between local error and global error? How
can we prove convergence? We will see that there are two ingredients: consistency and
stability. How do we get quantitative estimates?
• Stability: Stability is one of the most important considerations in choosing an ODE
solver. An important analysis is to find the region of stability for a numerical method.
Stability is especially important for “stiff” ODEs.
In practice, we will have to manage trade-offs between accuracy and stability.
• Explicit vs. implicit methods: Numerical methods can be classified as explicit and
implicit. Implicit methods often have better stability properties, but require an extra
step of solving non-linear equations using e.g., Newton’s method.
• Adaptive methods: Similarly to integration, it is more efficient to vary the step size.
To do this effectively, we need to derive methods with error estimates.
• Using ODE solvers in MATLAB and Python: For example, ode45 is an adaptive
method in MATLAB that is a workhorse for solving ODEs and often "just works."
How do we use it? How can we choose which solver is appropriate for the problem?
What are the tradeoffs?
See sections 7.1-7.3 of Moler’s book or any standard text on ODEs for a review of ODEs.
The bare minimum will be presented here. Essentially no ODE theory is required to solve
ODEs numerically, but the theory does provide important intuition, so it will greatly enhance
your understanding of the numerics.
2 Basics
In this section, you will learn:
• What is the standard form of a first-order ODE? How can we convert a higher-order
ODE into a first-order ODE?
• What is an initial value problem (IVP) vs. a boundary value problem (BVP)?
For IVPs, when do solutions exist and when are they unique?
• How does solving ODEs compare to integration? Integration can be viewed as a special
case of solving an ODE.
• Visualization: Plot vector fields and trajectories.
• Sensitivity: Derive a bound on how solutions with differing initial conditions diverge.
2.1 First-order ODE; Initial value problems
We consider an ODE in the following standard form:
y'(t) = dy/dt = f(t, y),   t ∈ [a, b],   y(t) ∈ R^d,
where f is a function f : [a, b] × R^d → R^d. Here t is thought of as the independent variable,
which can be time but does not have to be. Time is a useful analogy since it suggests the
direction (forward in time) in which the ODE is to be solved. For simplicity, we will often
consider the 1-D case when y(t) ∈ R (y(t) is scalar)1 , but the theory extends to the general
case. We say that this ODE is first-order because the highest derivative is first-order. Note
the ODE does not have a unique solution until we impose some more conditions.
We will focus on solving initial value problems (IVPs) in the form
y'(t) = f(t, y),   t ∈ [a, b],   y(t) ∈ R^d,    (2.1a)
y(a) = y0.    (2.1b)
The equation (2.1a) is the ODE for y(t) and (2.1b) is the initial condition. We seek a
function y(t) that satisfies (2.1a) for all t in the given interval and whose value at a is y0 .
The following is a fundamental theorem about existence and uniqueness for ODEs.
Theorem 2.1 (Existence and uniqueness): If f is continuous and has a continuous partial derivative ∂f/∂y (or, more generally, is locally Lipschitz in y), then the IVP (2.1a)-(2.1b) has a unique solution y(t) on some interval [a, a + δ] with δ > 0.
Note that the solution may not exist for all t ∈ [a, b] because the solution may diverge.
An example is y'(t) = y², y(0) = 1/c, which has the solution y(t) = 1/(c − t) for t < c.
For our purposes, we will attempt to construct numerical solutions where the actual
solution exists, so the theory is just there to ensure that the problem to solve is well-defined.
Throughout, we will further assume that f has partial derivatives of all orders required
for any derivation (mostly for Taylor expansions).
2.2 Converting a higher-order ODE to a first-order ODE
Suppose instead we have an n-th order ODE in the form
y^(n) = F(t, y, y', . . . , y^(n−1)).
Introduce the variables y1 = y, y2 = y', . . . , yn = y^(n−1). If y(t) ∈ R^d, then we can collect y1, . . . , yn into a vector y(t) = (y1(t), . . . , yn(t)) ∈ R^(nd). Then letting
F(t, y) = (y2, y3, . . . , yn, F(t, y1, . . . , yn)),
we have
y' = F(t, y).
To specify initial conditions for this problem, we need to specify the values of y1, . . . , yn
at a, i.e., we need to specify the values of y(a), y'(a), . . . , y^(n−1)(a).
Example 2.2: For example, the ODE governing the motion of a pendulum (without air
resistance, etc.) is
θ'' = d²θ/dt² = −g sin(θ),
where θ is the angle from the negative vertical axis and g is the gravitational acceleration.
The initial conditions θ(0) and θ'(0) would give the initial position and velocity. The
corresponding first-order ODE is
y1' = y2,
y2' = −g sin(y1).
This is in the form (2.1a)-(2.1b) with y(t) = (y1(t), y2(t)) and f(t, y) = (y2, −g sin(y1)).
The fact that we can rewrite higher-order ODEs as first-order ODEs means that it
suffices to derive methods for first-order ODEs. Note that the standard ODE solvers in
MATLAB require you to input a first-order ODE in standard form, so you will need to carry
out this transformation before using them.
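For concreteness, here is a minimal sketch of how the pendulum system from Example 2.2 could be passed to MATLAB's ode45 once it is in first-order standard form (the variable names, initial condition, and time interval below are illustrative choices, not part of the example):
% Pendulum in first-order standard form: y = [theta; theta'], with g = 1.
g = 1;
f = @(t, y) [y(2); -g*sin(y(1))];  % f(t, y) returns a column vector
y0 = [pi/4; 0];                    % initial angle and angular velocity (assumed values)
[T, Y] = ode45(f, [0 10], y0);     % adaptive solve on [0, 10]
plot(T, Y(:,1))                    % plot theta(t)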
Problem 2.3: Write the ODE for the van der Pol oscillator
d²x/dt² − µ(1 − x²) dx/dt + x = 0
as a first-order ODE in standard form.
2.3 Other types of differential equations
If conditions are given at more than one point, then the problem is a boundary value
problem (BVP). For an ODE, where the independent variable t is 1-dimensional, this
means that conditions are given on both y(a) and y(b).
One common case of this is that for a second-order ODE, rather than giving the initial
conditions y(a) = y0 and y 0 (a) = y00 , we are given the boundary conditions
y(a) = y0 y(b) = y1 .
Unlike for IVPs, there is no simple existence and uniqueness theorem like Theorem 2.1.
BVPs tend to be more challenging to solve numerically than IVPs, so we will not consider
them here.
Finally, differential equations that involve more than one independent variable (and
derivatives in those variables) are partial differential equations (PDEs).
2.4 Comparison to integration
Integrating the ODE from a to t gives
y(t) = y(a) + ∫_a^t f(s, y(s)) ds,
so if f does not depend on y, solving the ODE reduces to computing an integral. There are two important differences:
1. The integrand f(s, y(s)) depends on the function value y(s). This means that any error
in the current function value y(s) propagates. In contrast, for numerical integration,
the errors on the different intervals are independent. This makes getting good error
bounds for ODEs more challenging. Indeed, we will see (Lemma 2.8) that error can
accumulate exponentially.
2. When solving an ODE, we often want the entire trajectory y(t), rather than just the
value y(b).
2.5 Visualization using vector fields
Slope fields are a good way to visualize the solution to an ODE.
Suppose we are given a scalar ODE (y ∈ R)
y 0 = f (t, y).
A solution (t, y(t)) forms a curve in the (t, y) plane. The ODE tells us the direction of the
curve at any given point, since its tangent vector is parallel to
(1, y'(t)) = (1, f(t, y(t))).
In this sense, solutions to the ODE follow the “slope field”, which is the vector field
(1, f (t, y))
in the (t, y) plane. To find a solution to the IVP starting at (t0 , y0 ), we may follow the
slope field to construct the curve; this is the basis of the simplest numerical method that is
detailed in the next section.
The MATLAB function for plotting vector fields is quiver.2
Example 2.4: An example of plotting the vector field for y' = ty on [0, 2] × [0, 4] is
given below. Here, t and y contain the t and y-coordinates of the grid points, and u, v are
the components of the vector field at those grid points. Make sure that u, v are defined
as componentwise functions applied to t, y (for example, using .* for componentwise
multiplication). The solution y = (1/2)e^(t²/2) is also plotted.
[t, y] = meshgrid(0:0.2:2, 0:0.5:4);
u = ones(size(t));
v = t.*y;
hold on
% Set window
xlim([0, 2.2])
ylim([-0.25, 4.25])
% Plot vector field
quiver(t, y, u, v, 'b')
% Also plot trajectory
T = linspace(0, 2);
Y = 0.5*exp(T.^2/2);
plot(T, Y, 'r');
Another case we can easily visualize is where y ∈ R2 and the ODE is autonomous, that
is, not depending on t:
y 0 = f (y), y ∈ R2 .
2 Documentation: https://www.mathworks.com/help/matlab/ref/quiver.html
We can instead plot the vector field given by f : R2 → R2 , i.e., with the vector f (y) ∈ R2 at
y ∈ R2 .
Example 2.5: We plot the vector field corresponding to Example 2.2 on [−π/2, π/2] ×
[−2, 2]. For simplicity, g = 1 here.
[x, y] = meshgrid(-pi/2:pi/8:pi/2, -2:0.5:2);
u = y;
v = -sin(x);
quiver(x, y, u, v, 'b')   % plot the vector field
Problem 2.6: For different values of µ, plot the vector field for the van der Pol
oscillator in Problem 2.3.
2.6 Sensitivity
The slope field gives geometric intuition for some important concepts for numerical methods,
such as the notion of sensitivity.
Sensitivity means: if y(t) gets perturbed by some amount ∆y at time t0 , how far apart
are the original and perturbed trajectories after some time? Put another way, how sensitive
is the ODE to changes in the initial condition?
Figure 2: Sensitivity
Suppose we have two solutions x(t) and y(t) to the same ODE, x'(t) = f(t, x), y'(t) = f(t, y), and let z(t) = |x(t) − y(t)|. Assume f satisfies the Lipschitz condition
|f(t, y1) − f(t, y2)| ≤ L |y1 − y2|,    (2.2)
which holds in particular whenever
max_{t, y} |∂f/∂y (t, y)| ≤ L.    (2.3)
To see that (2.3) implies (2.2), note that by the mean value theorem, for some ξ ∈ [y1, y2],
f(t, y1) − f(t, y2) = ∂f/∂y (t, ξ) (y1 − y2).
By (2.2), z satisfies z'(t) ≤ |x'(t) − y'(t)| = |f(t, x) − f(t, y)| ≤ L z(t).
To obtain a bound on z(t) given z(0), we use the following useful lemma.
Lemma 2.8 (Grönwall's Lemma). Let u ∈ C¹([0, t]). If u'(s) ≤ Lu(s) for each s ∈ [0, t], then u(t) ≤ u(0)e^(Lt).
3 This proof only works in one dimension. For y ∈ R^d, (2.3) becomes max_{t∈[a,b], y∈R^d} |∇_y f(t, y)| ≤ L. We can conclude (2.2) from f(t, y2) − f(t, y1) = ∫_0^1 ⟨∇_y f(t, y1 + s(y2 − y1)), y2 − y1⟩ ds.
Proof. We show that u(t)e^(−Lt) is non-increasing. This follows from the product rule and the
assumption u' ≤ Lu:
d/dt (u e^(−Lt)) = u' e^(−Lt) − Lu e^(−Lt) = (u' − Lu) e^(−Lt) ≤ 0.
Hence u(t)e^(−Lt) ≤ u(0), i.e., u(t) ≤ u(0)e^(Lt).
Thus L (the maximum of the variation of f with respect to y) is the exponential rate, at
worst, at which the two solutions can move apart. We have proved the following.
Theorem 2.9 (Sensitivity): Suppose x(t) and y(t) are solutions to the same ODE with different initial conditions, where f ∈ C¹([0, t] × R). If f(t, y) is L-Lipschitz in y, then
|x(t) − y(t)| ≤ e^(Lt) |x(0) − y(0)|.
The idea of sensitivity is very useful: The different initial conditions can come from some
“natural” perturbation, but they can also be error that is built up from previous steps of a
numerical algorithm.
However, the bound in Theorem 2.9 is sometimes pessimistic. Taking absolute values
discards information about the sign, so if z 0 ≈ −Lz then the bound is the same, even though
z then decays exponentially. This is shown in the figure.
3 Euler’s method
In this section we derive and analyze the simplest numerical method for ODEs, (forward)
Euler’s method; we will also briefly consider the backward Euler’s method. Through
analyzing Euler’s method, we introduce general concepts which are useful for understanding
and analyzing all kinds of numerical methods. This includes:
Figure 3: Sketch of the difference in two solutions that start at nearby points (t0 , x0 ) and
(t0 , y0 ) and numerical examples for y 0 = ty and y 0 = −ty.
• Proving convergence (for global error): After getting a bound for the local trunca-
tion error, use induction to give a quantitative bound for the global error.
3.1 Forward Euler's method
Suppose we wish to solve the IVP (2.1a)-(2.1b) (with y ∈ R) up to a time t = b. The approximation will take the form of values ỹj defined
on a grid
on a grid
a = t0 < t1 < · · · < tN = b
Figure 4: Numerical solution of an IVP forward in time from t = a to t = b. The actual
values are y1 , . . . , yN and the estimated values are ye1 , . . . , yeN .
such that
ỹj ≈ y(tj ).
For convenience, denote by yj the exact solution at tj and let the “error” at each point be
ej = yj − ỹj .
It will be assumed that we have a free choice of the tj ’s. The situation is sketched in Figure 4.
For simplicity, assume a fixed step size h = tj − tj−1. Taylor expanding the exact solution about tj and using the ODE gives
yj+1 = yj + h f(tj, yj) + τj+1,    (3.1)
where τj+1 is the local truncation error defined below. We could derive a formula, but
the important thing is that
τj+1 = O(h²).
Dropping the error in (3.1) and iterating this formula, we get the (forward) Euler’s
method:
ỹj+1 = ỹj + h f(tj, ỹj).    (3.2)
Algorithm 3.1 (Forward Euler’s method): The forward Euler’s method for solving
the IVP
y'(t) = f(t, y),   t ∈ [a, b],   y(a) = y0,
is given by setting ỹ0 = y0 and iterating
ỹj+1 = ỹj + h f(tj, ỹj),   j = 0, 1, . . . , N − 1.
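A minimal MATLAB sketch of this algorithm for a scalar ODE (the function name feuler and its interface are illustrative, not prescribed by the notes):
function [T, Y] = feuler(f, tspan, y0, h)
% Forward Euler with fixed step size h: Y(j+1) = Y(j) + h*f(T(j), Y(j)).
T = (tspan(1):h:tspan(2))';
Y = zeros(size(T));
Y(1) = y0;
for j = 1:length(T)-1
    Y(j+1) = Y(j) + h*f(T(j), Y(j));
end
end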
Definition 3.2 (Local truncation error, or LTE): The local truncation error τj+1
is the error incurred in obtaining ỹj+1 when the previous data ỹj = yj is known exactly.
In other words, it is the amount by which the exact solution fails to satisfy the
equation given by the numerical method.
The LTE is “local” in the sense that it does not include errors created at previous steps.
For example, for Euler's method,
ỹj+1 = ỹj + h f(tj, ỹj).
The LTE is the difference between this and the true value yj+1 when ỹj = yj:
yj+1 = ỹj + h f(tj, ỹj) + τj+1
     = yj + h f(tj, yj) + τj+1.
! Notice that the total error is not just the sum of the truncation errors, because when
starting the numerical method with y0, at step j the function f is evaluated at the approximation ỹj,
and ỹj ≠ yj. The truncation error will propagate through the iteration, as a careful
analysis will show.
3.2 Convergence
Suppose we use Euler’s method to generate an approximation
(tj , yej ), j = 0, . . . , N
to the solution y(t) in the interval [a, b] (with t0 = a and tN = b). The “error” in the
approximation that matters in practice is the global error
E = max_{0≤j≤N} |ej|,
where
ej = yj − ỹj
is the error at tj . This is a measure of how well the approximation yej agrees with the true
solution over the whole interval.4
Definition 3.3: The method is convergent if the global error approaches 0 as h approaches 0.
More precisely, we would like to show that, given an interval [a, b], the global error has
the form
max_{0≤j≤N} |ej| = O(h^p).
As an example, consider
y 0 = ty, y(0) = 0.1
which has exact solution y(t) = 0.1e^(t²/2). Below, we plot some approximations for various time-
steps h; on the right is the max. error in the interval. The log-log plot has a slope of 1,
indicating the error should be O(h).
[Figure: approximations with h = 0.4, 0.2, 0.1 plotted against the exact solution (left), and the max. error vs. h on a log-log scale (right).]
4 Note that this is not precisely true since the approximation is not defined for all t; we would need to
interpolate and that would have its own error bounds. But in practice we typically consider error at the
points where the approximation is computed.
The definition of convergent means that the approximation as a piecewise-defined function, e.g. piecewise
linear, converges to y(t) as h → 0. Since the points get arbitrarily close together as h → 0, the distinction
between “max error at the tj ’s” and “max error as functions” is not of much concern here.
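A short script along these lines (reusing the feuler sketch from Section 3.1; the specific step sizes are illustrative) reproduces the O(h) behavior:
f = @(t, y) t*y;                          % y' = ty, y(0) = 0.1
exact = @(t) 0.1*exp(t.^2/2);
hs = 2.^-(2:8);
errs = zeros(size(hs));
for k = 1:length(hs)
    [T, Y] = feuler(f, [0 2], 0.1, hs(k));
    errs(k) = max(abs(Y - exact(T)));     % global error on the grid
end
loglog(hs, errs, 'o-')                    % slope of ~1 indicates O(h)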
3.3 Euler’s method: error analysis
The details of the proof are instructive, as they will illustrate how error propagates in the
"worst case".
3.3.1 Local truncation error
Assume that, as before, we have a fixed step size h = tj − tj−1, points tj = a + jh, and that the exact solution
satisfies max_{[tj, tj+1]} |y''| ≤ M.
Lemma 3.4: Let the truncation error τj+1 for Euler's method be defined by
y(tj+1) = y(tj) + h f(tj, yj) + τj+1.
Then
|τj+1| ≤ M h² / 2.
Proof. By Taylor's formula with remainder,
y(tj+1) = y(tj) + h y'(tj) + (h²/2) y''(ξ)
for some ξ ∈ [tj, tj+1]. Now use the fact that y'(tj) = f(tj, yj) and |y''(ξ)| ≤ M to conclude
the bound.
3.3.2 From local to global error
Lemma 3.5: Suppose f(t, y) is L-Lipschitz in y and that the truncation errors satisfy |τk| ≤ τ for all k. Then the errors ej of Euler's method satisfy
|ej| ≤ e^(L(tj−t0)) |e0| + (τ / (Lh)) [e^(L(tj−t0)) − 1].
In particular, if max_k |τk| = O(h²) as h → 0 and e0 = 0 (no error in y0), then
max_{0≤j≤N} |ej| = O(h)
as h → 0, where the constant in the O(·) depends on the interval size and L but not on h.
Note also that the amplification of the initial error e0 is similar to that in Theorem 2.9.
Combining Lemmas 3.4 and 3.5, we obtain the following.
Theorem 3.6 (Convergence of Euler's method): Suppose that
1. the exact solution satisfies |y''(t)| ≤ M for t ∈ [a, b], and
2. f(t, y) is L-Lipschitz in y.
Let ỹj be the result of applying Euler’s method (3.2) starting at (t0 = a, ỹ0 = y0 ).
Then
|ej| ≤ (M h / (2L)) [e^(L(tj−t0)) − 1]    (3.6)
max_{0≤j≤N} |ej| ≤ (M h / (2L)) [e^(L(b−a)) − 1],    (3.7)
where ej = yj − ỹj is the error at tj.
! The global error is O(h) as h → 0, but the O-notation hides a large constant that is
exponential in L(b − a), and which is independent of h.
As with the bound on sensitivity, this bound can be quite pessimistic.
Proof of Lemma 3.5. To start, recall from the definition of the truncation error and the formula for Euler's method that
yj+1 = yj + h f(tj, yj) + τj+1   and   ỹj+1 = ỹj + h f(tj, ỹj),
so subtracting,
ej+1 = ej + h [f(tj, yj) − f(tj, ỹj)] + τj+1.
Because f is L-Lipschitz in y, we have h|f(tj, yj) − f(tj, ỹj)| ≤ hL|yj − ỹj| = Lh|ej|. By
the triangle inequality,
|ej+1| ≤ (1 + Lh)|ej| + |τj+1|.
Iterating, we get, in general,
|ej| ≤ (1 + Lh)^j |e0| + (1 + Lh)^(j−1) |τ1| + · · · + (1 + Lh)|τj−1| + |τj|
     = (1 + Lh)^j |e0| + Σ_{k=1}^{j} (1 + Lh)^(j−k) |τk|.
Figure 5: Numerical solutions to y 0 = ty and y 0 = −ty with different values of N ; note the
behavior of the error as t increases.
3.5 Order
The order p of a numerical method for an ODE is the order of the global error.
Euler's method, for instance, has order 1 since the global error is O(h).
Definition 3.7 (Order): A numerical method with timestep h is said to be convergent
with order p if, on an interval [a, b],
max_{0≤j≤N} |ỹj − y(tj)| = O(h^p) as h → 0.
The 1/h factor is true for (most) other methods, so as a rule of thumb: if the LTE is O(h^(p+1)), then the global error is O(h^p).
The interpretation here is that to get from a to b we take ∼ 1/h steps, so the error is on
the order of the number of steps times the error at each step, (1/h) · O(hp+1 ) = O(hp ). The
careful analysis shows that the order is not further worsened by the propagation of the errors.
! Some texts define the LTE with an extra factor of h so that it lines up with the global
error, in which case the rule is that the LTE and global error have the same order.
For this reason it is safest to say that the error is O(hp ) rather than to use the term
“order p”, but either is fine in this class.
3.6 Backward Euler's method
Forward Euler is an explicit method: the new value ỹj+1 is given by an explicit formula in terms of known quantities. In an implicit method, we are instead only given a (nonlinear) equation that is satisfied by ỹj+1, and we have to solve
for ỹj+1.
The simplest example of an implicit method is backward Euler, which has the iteration
ỹj+1 = ỹj + h f(tj+1, ỹj+1).
Note the function is evaluated at ỹj+1 rather than ỹj as in the forward Euler's method. To
solve for ỹj+1, we can iterate Newton's method until convergence; ỹj makes for a good initial
guess.
Using a “backward” Taylor expansion around tj+1 rather than around tj , you can prove
that the LTE is also O(h2 ).
This seems more complicated; why would you want to use the backward method? It
turns out that it has better stability properties; we will come back to this point.
Algorithm 3.8 (Backward Euler’s method): The backward Euler’s method is given
by
ỹj+1 = ỹj + h f(tj+1, ỹj+1).    (3.10)
Here, you can solve for ỹj+1 using Newton's method with ỹj as an initial guess.
If Newton's method is used, the code must know f and ∂f/∂y. The function in MATLAB
may be written, for instance, in the form
[T,Y] = beuler(f, fy, [a b], y0, h)
where f, fy are both functions of t and y. At step j, we would like to set ỹj+1 equal to the
zero of
g(z) = z − ỹj − h f(tj+1, z).
We compute
g'(z) = 1 − h fy(tj+1, z),
so the Newton iteration is
z_(k+1) = z_k − g(z_k) / g'(z_k).
This is iterated until convergence; then ỹj+1 is set to the resulting z.
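Putting the pieces together, a bare-bones version of such a routine might look as follows (the name beuler, the iteration cap, and the tolerance are illustrative assumptions):
function [T, Y] = beuler(f, fy, tspan, y0, h)
% Backward Euler for a scalar ODE; each step solves g(z) = 0 by Newton's method.
T = (tspan(1):h:tspan(2))';
Y = zeros(size(T));
Y(1) = y0;
for j = 1:length(T)-1
    z = Y(j);                              % initial guess
    for k = 1:20                           % Newton iteration
        g  = z - Y(j) - h*f(T(j+1), z);
        gp = 1 - h*fy(T(j+1), z);
        dz = -g/gp;
        z  = z + dz;
        if abs(dz) < 1e-12, break; end
    end
    Y(j+1) = z;
end
end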
Problem 3.9: Prove an analogue of Theorem 3.6 for the backward Euler’s method.
You can assume that the numeric solution yej+1 obtained from (3.10) is exact.
4 Consistency and stability
To better understand convergence, we must identify the key properties that guarantee it
and how they can be controlled.
This strategy has two steps: showing consistency and stability. Both are necessary
for convergence. We’ll look at each of these notions in turn, and give an informal “theorem”
which says that they are sufficient for convergence. Finally, we’ll consider an example of
what can go wrong: a method which seems reasonable at first glance but is disastrously
unstable.
As before, we solve an IVP in an interval [a, b] starting at t0 = a with step size h.
4.1 Consistency
A method is called consistent if the local truncation error vanishes faster than h as h → 0.
That is,
lim_{h→0} τj / h = 0   for all j.
To check consistency, we may assume the result of the previous step is exact (since
this is how the LTE is defined). This is a benefit, as there is no need to worry about the
accumulation of errors at earlier steps.
4.2 Stability
In contrast, stability refers to the sensitivity of solutions to initial conditions. We derived
a bound on stability in Theorem 2.9.
We would like to have a corresponding notion of “numerical” stability.
Definition 4.1 (Zero stability): Suppose {yn } and {zn } are approximate solutions
to
y' = f(t, y),   y(t0) = y0    (4.1)
in [a, b], computed with the same method but from (possibly) different starting values. If it holds that
max_n |y_n − z_n| ≤ C |y0 − z0|
for a constant C independent of h, then the method is called zero stable.
Note that the best we can hope for is C = eL(t−t0 ) since the numerical method will never
be more stable than the actual IVP. In what follows, we will try to determine the right
notions of stability for the numerical method.
As written, the stability condition is not easy to check. However, one can derive easy to
verify conditions that imply zero stability. We have the following informal result.
"Theorem": A typical method is convergent if it is consistent and zero stable, with global error of the same order as the LTE divided by h. Moreover, to check zero stability it suffices to check that the method is numerically stable for the trivial ODE
y' = 0.
Here “typical” includes any of the methods we consider in class (like Euler’s method)
and covers most methods for ODEs one encounters in practice. “Numerical stability” means
that a perturbation at some step will not cause the solution to blow up. Numerical stability
for y 0 = 0 is much easier to check than the original definition of zero stability.
This assertion was proven for Euler’s method directly. Observe that the theorem lets
us verify two simple conditions (easy to prove) to show that a method converges (hard to
prove).
Example (Euler's method): Consistency: The local truncation error is defined by
yj+1 = yj + h f(tj, yj) + τj+1.
Taylor expanding yj+1 about tj and plugging this in to the formula yields τj+1 = O(h²), so the method is consistent.
Stability: Applied to the trivial ODE y' = 0, the iteration reduces to
ỹj+1 = ỹj,
so perturbations are not amplified and the method is numerically stable.
The theorem then guarantees that the method is convergent, and that the order
of convergence is 1 (the global error is O(h)).
Recall that Euler's method comes from the first-order forward difference approximation
(y(t + h) − y(t)) / h ≈ y' = f(t, y).
One might hope, then, that a more accurate method can be obtained by using a second-order
forward difference
y'(t) = (−y(t + 2h) + 4y(t + h) − 3y(t)) / (2h) + O(h²).
Plugging this in, we obtain the method
ỹj+2 = 4ỹj+1 − 3ỹj − 2h f(tj, ỹj),    (4.3)
which is consistent with an O(h3 ) LTE. However, this method is not zero stable!
It suffices to show numerical instability for the trivial ODE y 0 = 0. The iteration reduces
to
ỹj+2 = 4ỹj+1 − 3ỹj.
Plugging in yj = r^j, we get a solution when
r² − 4r + 3 = 0  ⟹  r = 1, 3,
so the general solution is⁵
ỹj = a + b · 3^j.
If initial values are chosen so that
ỹ0 = ỹ1,
then yj = y0 for all j with exact arithmetic. However, if there are any errors (ỹ0 ≠ ỹ1), then
b ≠ 0 and |ỹj| will grow exponentially. Thus, the method is unstable, and is not convergent.
Obtaining a second order method therefore requires a different approach.
5 Runge-Kutta methods
In this section the most popular general-purpose formulas are introduced, which can be
constructed to be of any order.
• How might we derive a higher-order method? Considering the Taylor expansion, a first
idea is to include higher-order derivatives. This leads us to Taylor’s method.
• Finally, we’ll look at implicit Runge-Kutta methods, and derive the second-order
trapezoidal method.
5.1 Setup: one step methods
Definition 5.1 (One step method): A general explicit one step method has the form
ỹj+1 = ỹj + h ψ(tj, ỹj),    (5.1)
where ψ is some function we can evaluate at (tj, yj). The truncation error is defined
by
yj+1 = yj + h ψ(tj, yj) + τj+1.    (5.2)
5 See Appendix C for a review of solving these recurrences.
To improve on the accuracy of Euler's method with a one-step method, we may try to
include higher order terms to get a more accurate method. To start, write (5.2) with the exact solution on the left-hand side (LHS) and the method on the right-hand side (RHS):
yj+1 = yj + h ψ(tj, yj) + τj+1.
For a p-th order method, we want the LHS to equal the RHS up to O(h^(p+1)). Now expand
the LHS in a Taylor series around tj:
LHS:  yj+1 = y(tj) + h y'(tj) + (h²/2) y''(tj) + · · ·
A p-th order formula is therefore obtained by taking
ψ(tj, yj) = y'(tj) + (h/2) y''(tj) + · · · + (h^(p−1)/p!) y^(p)(tj).
The key point is that the derivatives of y(t) at can be expressed in terms of f and its partial
derivatives - which we presumably know. Simply differentiate the ODE y 0 = f (t, y(t)) in t,
being careful with the chain rule. If G(t, y) is any function of t and y evaluated on the
solution y(t) then
solution y(t) then
d/dt (G(t, y(t))) = G_t + G_y y'(t) = G_t + f G_y,
with subscripts denoting partial derivatives and Gt etc. evaluated at (t, y(t)).
It follows that
y'(t) = f(t, y(t)),
y''(t) = f_t + f_y f,
y'''(t) = (f_t + f_y f)' = f_tt + f_ty f + · · · (see HW).
In operator form,
y^(p) = (∂/∂t + f ∂/∂y)^(p−1) f.
The exact solution satisfies
yj+1 = yj + h y'(tj) + (h²/2) y''(tj) + · · · + (h^p/p!) y^(p)(tj) + O(h^(p+1)),
and dropping the O(h^(p+1)) term gives Taylor's method of order p.
Note that y 0 , y 00 , . . . are replaced by formulas involving f and its partials by repeatedly
differentiating the ODE.
This method is generally not used due to the convenience of the (more or less equivalent) Runge-Kutta methods.
5.2 A better way: Explicit Runge-Kutta methods
Taylor’s method is inconvenient because it involves derivatives of f . Ideally, we want a
method that needs to know f (t, y) and nothing more.
The key observation is that the choice of ψ in Taylor’s method is not unique. We can
replace ψ with anything else that has the same order error. The idea of a Runge-
Kutta method is to replace the expression with function evaluations at “intermediate”
points involving computable values starting with f (tj , yj ).
5.2.1 Explicit midpoint rule (Modified Euler's method)
We look for a method of the form
ỹj+1 = ỹj + h (w1 f1 + w2 f2),
where
f1 = f(tj, yj),
f2 = f(tj + h/2, yj + hβ f1),
and the coefficients w1, w2, β are to be determined.
Aside (integration): You may notice that this resembles an integration formula using two
points; this is not a coincidence since
y' = f(t, y)  ⟹  yj+1 − yj = ∫_{tj}^{tj+1} f(t, y(t)) dt,
so we are really estimating the integral of f (t, y(t)) using points at tj and tj+1/2 . The problem
is more complicated than just integrating f (t) because the argument depends on the unknown
y(t), so that also has to be approximated.
To find the coefficients, expand everything in a Taylor series, keeping terms up to order h2 :
LHS = yj + h y'_j + (h²/2) y''_j + O(h³)
    = yj + h f + (h²/2)(f_t + f f_y) + O(h³),
where f etc. are all evaluated at (tj , yj ). For the fi ’s, we only need to expand f2 (using
Taylor’s Theorem A.1 for multivariate functions):
h f2 = h f + (h²/2) f_t + h² f_y (β f1) + O(h³)
     = h f + (h²/2) f_t + β h² f f_y + O(h³).
Plugging this into the RHS gives
RHS = yj + h(w1 + w2) f + (w2/2) h² f_t + w2 β h² f f_y + O(h³),
while
LHS = yj + h f + (h²/2)(f_t + f f_y) + O(h³).
Comparing, the LHS and RHS are equal up to O(h³) if
w1 + w2 = 1,   w2 = 1,   w2 β = 1/2,
which gives
w1 = 0,   w2 = 1,   β = 1/2.
We have therefore obtained the formula
ỹj+1 = ỹj + h f(tj + h/2, ỹj + (h/2) f(tj, ỹj)).
Remark (integration connection): In this case one can interpret the formula as using
the midpoint rule to estimate
y(tj+1) = y(tj) + ∫_{tj}^{tj+1} f(t, y(t)) dt ≈ y(tj) + h f(tj + h/2, y(tj + h/2)),
and then approximating the midpoint value y(tj + h/2) with an Euler half-step.
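In code, one step of this method is short; a sketch (the helper name is illustrative):
function ynew = midpoint_step(f, t, y, h)
% One step of the explicit midpoint (modified Euler) method.
f1 = f(t, y);                    % slope at the left endpoint
f2 = f(t + h/2, y + (h/2)*f1);   % slope at the approximate midpoint
ynew = y + h*f2;
end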
5.2.2 Higher-order explicit methods
The modified Euler method belongs to the class of Runge-Kutta methods. A general explicit Runge-Kutta method with m stages has the form
f1 = f(tj, yj)
f2 = f(tj + c2 h, yj + h a21 f1)
f3 = f(tj + c3 h, yj + h a31 f1 + h a32 f2)
⋮
fm = f(tj + cm h, yj + h am1 f1 + · · · + h a_{m,m−1} f_{m−1})
yj+1 = yj + h (w1 f1 + · · · + wm fm),
where ci = Σ_{k=1}^{i−1} a_ik (in order for the time to correspond to the y-estimates). Each
fi is an evaluation of f at a y-value obtained as yj plus a linear combination of the
previous fk's. The "next" value yj+1 is a linear combination of all the fi's.
• The best possible local truncation error is O(hp+1 ) where p ≤ m. For each p, the
system is underdetermined and has a family of solutions (see HW for the p = 2 case).
• Unfortunately, it is not always possible to have p = m. To get a high order method (fifth
order and above), we need more stages per step than the order.
• Deriving RK methods past third order is quite tedious and a mess of algebra, since
the system for the coefficients is non-linear and the Taylor series expansions become
complicated.
Thankfully, just about every useful set of coefficients - at least for general purpose methods
- has been calculated already, so in practice one can just look them up. They are typically
arranged in the Butcher Tableau, defined by
c1 | a11  · · ·  a1m
 ⋮ |  ⋮          ⋮
cm | am1  · · ·  amm
---+----------------
   | w1   · · ·  wm
Note that for an explicit method, the “matrix” A in the table only has nonzeros in the
strictly lower triangular part.
Algorithm 5.4 (The classical RK-4 method): One four stage method of note is the
classical “RK-4” method
f1 = f(tn, yn)
f2 = f(tn + h/2, yn + (h/2) f1)
f3 = f(tn + h/2, yn + (h/2) f2)
f4 = f(tn + h, yn + h f3)
yn+1 = yn + (h/6)(f1 + 2f2 + 2f3 + f4).
This method has a good balance of efficiency and accuracy (only four function evalu-
ations per step, and O(h5 ) LTE). The method would be a good first choice for solving
ODEs, except that there is a more popular variant that is better for error estimation,
the Runge-Kutta-Fehlberg method. (The formula is rather hairy so I will not copy it
here, see e.g. http://maths.cnam.fr/IMG/pdf/RungeKuttaFehlbergProof.pdf.)
5.3 Implicit methods
In an implicit method, ỹj+1 is defined as the solution of an equation g(ỹj+1) = 0, where g(z) is computable for any z (for backward Euler, g(z) = z − ỹj − h f(tj+1, z)). Thus ỹj+1 can be computed by
applying Newton's method (ideally) or some other root-finder to g(z).
Practical note: The obvious initial guess is yej , which is typically close to the root. If h is
small, then yej is almost guaranteed to be close to the root, and moreover
ỹj+1 → ỹj as h → 0.
Thus, if Newton’s method fails to converge, h can be reduced to make it work. Since the
initial guess is close, quadratic convergence ensures that the Newton iteration will only take
a few steps to achieve very high accuracy - so each step is only a few times more work than
an equally accurate explicit method.
You may wonder why we would bother with an implicit method when the explicit methods
are more efficient per step; the reason is that they have other desirable properties to be
explored in the next section. For some problems, implicit methods can use much larger h
values than explicit ones.
5.3.1 Example: deriving the trapezoidal method
We look for an implicit method of the form
yj+1 = yj + h (w1 fj + w2 fj+1),   fj = f(tj, yj),  fj+1 = f(tj+1, yj+1).
First, note that for the RHS, we only need to expand yj+1 up to an O(h²) error. Using
yj+1 = yj + h f + O(h²),
we have that
f(tj+1, yj+1) = f(tj + h, yj + A),
where A = h f + O(h²). Since A² = O(h²), the result is
f(tj+1, yj+1) = f + h f_t + h f f_y + O(h²).
Matching the right-hand side against
LHS = yj+1 = yj + h f + (h²/2)(f_t + f f_y) + O(h³),
we find that w1 = w2 = 1/2. This gives
yj+1 = yj + (h/2)(fj + fj+1) + O(h³),
where fj = f(tj, yj) and fj+1 = f(tj+1, yj+1).
Algorithm 5.5 (Implicit Trapezoidal rule): This is a second-order method given by
ỹj+1 = ỹj + (h/2)(fj + fj+1),
fj = f(tj, ỹj),   fj+1 = f(tj+1, ỹj+1).
Note that when f = f (t), the formula reduces to the composite trapezoidal rule.
5.4 Summary of Runge-Kutta methods
A general (possibly implicit) Runge-Kutta method with m stages has the form
fi = f(tj + ci h, yj + h Σ_{k=1}^{m} a_ik f_k),   i = 1, . . . , m,
yj+1 = yj + h (w1 f1 + · · · + wm fm),
where ci = Σ_{k=1}^{m} a_ik (in order for the time to correspond to the y-estimates). Each
fi is an evaluation of f at a y-value obtained as yj plus a linear combination of the
fk's, and the "next" value yj+1 is a linear combination of all the fi's.
The formulas we discussed, and several others, are given in the table below.
Problem 5.7: Based on the tableau, write out the formulas for the explicit trape-
zoidal and the implicit midpoint rules. Show that they are second-order accurate.
Name (order) and tableau:

Forward Euler (order 1):
  0 | 0
    | 1

Explicit Midpoint / Modified Euler (order 2):
  0   | 0    0
  1/2 | 1/2  0
      | 0    1

Explicit Trapezoidal (order 2):
  0 | 0    0
  1 | 1    0
    | 1/2  1/2

RK-4 (order 4):
  0   | 0    0    0    0
  1/2 | 1/2  0    0    0
  1/2 | 0    1/2  0    0
  1   | 0    0    1    0
      | 1/6  1/3  1/3  1/6

Backward Euler (order 1):
  1 | 1
    | 1

Implicit Midpoint (order 2):
  1/2 | 1/2
      | 1

Implicit Trapezoidal (order 2):
  0 | 0    0
  1 | 1/2  1/2
    | 1/2  1/2
6 Stiffness and absolute stability
As a motivating example, consider applying Euler's method with step size h to the stiff IVP (6.1), whose ODE is
y' = −20(y − sin t) + cos t,
solved on the interval [0, 3]. One observes the following behavior:
• If h > 1/10, then the numerical solution oscillates and diverges (the amplitude grows
exponentially). If h = 1/10 exactly, the oscillations stay bounded.
• If h < 1/10, the iteration suddenly settles down and behaves correctly.
There is a threshold of h∗ = 1/10; the step size must be less than h∗ to have a qualitatively
correct solution. Note that this is a separate issue from convergence, which only guarantees
that the error is O(h) for sufficiently small h.
The requirement that h < 1/10 is a stability constraint: the step size must be below
a certain threshold to avoid numerical instability. (Even in the absence of truncation error,
a larger step size would cause the solution to diverge.)
The threshold depends on the problem. An ODE that has such a constraint is called
stiff. Our first "practical" definition is as follows:⁶
6 The definition of stiffness and example are borrowed from Ascher & Petzold, Computer methods for ordinary differential equations.
Figure 7: Euler’s method applied to the stiff IVP (6.1) and the slope field (blue lines).
Definition 6.1 (Stiffness, definition I): An IVP in some interval [a, b] is called stiff
if Euler’s method requires a much smaller time step h to be stable than is needed to
represent the solution accurately.
We will see shortly that “Euler’s method” can be replaced by “typical explicit method”.
Note that this is a practical definition in that the meaning of "accurate" and "much
smaller" depends on what is needed. The ODE from the example is not stiff in [0, 0.01] by
this definition, since accuracy alone already forces h < 1/10 there, but it is stiff in [0, 3]. On the interval [0, 3], to
represent the solution accurately means representing the sin t part accurately, and we know
that it is possible to obtain the curve sin t with much larger step size (e.g., if the −20(y−sin t)
part were removed from the ODE, and we just had y 0 = cos t, we can use fairly large step
size and still obtain y ≈ sin t).
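The threshold is easy to see numerically. The following sketch applies Euler's method to the ODE above with step sizes above, at, and below 1/10 (the initial condition y(0) = 1 is an assumption made for illustration):
f = @(t, y) -20*(y - sin(t)) + cos(t);            % the stiff ODE from the example
for h = [1/5 1/10 1/20]
    T = (0:h:3)'; Y = zeros(size(T)); Y(1) = 1;   % y(0) = 1 assumed
    for j = 1:length(T)-1
        Y(j+1) = Y(j) + h*f(T(j), Y(j));          % forward Euler step
    end
    fprintf('h = %6.4f:  max |y| = %g\n', h, max(abs(Y)));
end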
At the present, the “stability” constraint remains mysterious; we will develop some theory
shortly to explain it.
A geometric interpretation
Nearby solutions with some other initial condition rapidly converge to y(t) (if the solution is
perturbed, it will quickly return). The numerical method should, but doesn’t, always behave
the same way.
The figure (Figure 7) is revealing here, recalling that Euler's method follows the vector
field. Observe that if h is not small enough, Euler’s method will overshoot due to the large
slope. This process will continue, causing aggressive oscillations that will diverge. If h is at
a certain value, then the oscillations will stay bounded.
Below this value, the overshoot is not severe and the method provides a reasonable ap-
proximation.
Observation: An IVP with solution y(t) (in an interval) is stiff if y(t) is well-behaved
but nearby trajectories to y(t) vary rapidly compared to y(t) itself.
This is not the same as instability of the IVP itself; the solution is in fact quite stable
here. All solution curves, in fact, will approach sin t as t → ∞.
Consider the test equation
y' = λy,   y(0) = y0,    (6.2)
with exact solution y(t) = y0 e^(λt), which decays to zero when Re(λ) < 0. A good numerical method should have the property that the approximation also decays in this case.
That is, if the true solution decays, the approximation should also decay (exponentially). In
particular, it should not grow exponentially when the true solution does not.
What does the numerical method do? For Euler's method⁷, we have
y_(n+1) = y_n + hλ y_n = (1 + hλ) y_n.
The test equation is simple enough that the iterates have an exact formula:
y_n = (1 + hλ)^n y_0.
Now define a region R of the complex plane as the set of hλ for which the iteration has
|yn | → 0 as n → ∞ (i.e. the set of hλ such that Euler’s method applied to (6.2) gives a
sequence that converges to zero). Then
R = {z ∈ C : |1 + z| < 1}.
Moreover, if λ < 0 and h > 2/|λ|, then |y_n| will increase exponentially. This means that in order to
have y_n decay exponentially when the exact solution does, the condition
hλ ∈ R,  i.e.  |1 + hλ| < 1,    (6.3)
must hold. Otherwise, y_n will grow exponentially even though it should go to zero.
Note that the case λ > 0 is not as much of a concern here; if λ > 0 then 1 + hλ > 1 for
any positive h, so y_n will grow exponentially as desired.
Now return to a general ODE y' = f(t, y) with solution y(t), and consider a nearby solution
w = y + u,
where u is a small perturbation (so w starts at w0 = y0 + u0). Plugging into w' = f(t, w),
y' + u' = f(t, y + u) = f(t, y) + (∂f/∂y) u + O(|u|²),
and subtracting the equation for y gives u' ≈ (∂f/∂y) u.
7 For convenience, the tildes have been dropped.
Figure 8: Sketches of the stability regions (shaded area) for Euler’s method (left) and Back-
ward Euler (right) in the complex plane.
Key point: Locally, for a perturbed solution w(t) to the ODE y 0 = f (t, y), the
difference will evolve like the test equation u0 = λu with λ = ∂f /∂y. This determines
the local stability of the numerical method.
Indeed, this matches the observed behavior from the motivating example, where ∂f/∂y = −20.
Solutions that are nearby to y(t) get pushed onto y(t) exponentially fast (with rate 20),
as shown by the vector field in Figure 7. The numerical method must therefore behave
well on the test equation y' = −20y in order to be stable.
The test equation analysis then yields a new, more precise identification of stiffness:
Definition 6.2 (stability region): The region of absolute stability R is the set of
complex numbers z = hλ ∈ C such that, for the test equation
y' = λy,
the numerical method produces a sequence with |y_n| → 0 as n → ∞. The interval of absolute stability is the real portion of R (that is, the
same definition but with λ real).
Note: Typically, the part of R that matters is the region where Re(λ) < 0.
For Euler's method we found
R = {z : |1 + z| < 1}.
For backward Euler applied to the test equation, y_(n+1) = y_n + hλ y_(n+1), we get
y_(n+1) = y_n / (1 − hλ),
so |y_n| → 0 when
|1 − hλ| > 1.
The region of absolute stability is then the set
R = {z ∈ C : |1 − z| > 1}.
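These two regions are easy to visualize; a sketch:
[x, y] = meshgrid(-3:0.05:3, -3:0.05:3);
z = x + 1i*y;
contour(x, y, abs(1 + z), [1 1], 'b')   % boundary of R for forward Euler (stable inside)
hold on
contour(x, y, abs(1 - z), [1 1], 'r')   % boundary of R for backward Euler (stable outside)
axis equal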
Figure 9: Backward Euler to the stiff IVP (6.4). The approximation stays bounded for all
h, although for the initial transient, h must be taken smaller (h ∼ 1/200) to be accurate.
Definition 6.3: A method for which R contains all of {Re(z) < 0} is called A-
stable. For such a method, there is no stability condition required when integrating
a stiff equation.
For example, the backward Euler method is A-stable. Indeed, if it is used on the example
problem, the approximation will always be reasonable, even when h > 1/10. Below, Euler's
method and Backward Euler are used to solve the stiff IVP (6.4) (see Figure 9).
Backward Euler does fine in the stiff interval, but Euler's method would require h < 1/50:
the stiffness imposes a severe constraint here and there is a clear winner away from the initial
transient.
The trapezoidal method is also A-stable (see homework). However, there are a few points
worth noting here:
• Not all implicit methods are A-stable, or even have intervals of absolute stability con-
taining (−∞, 0). Only certain classes of implicit methods are good for stiff problems.
• All explicit methods have finite intervals of absolute stability (−a, 0) and the value of
a is typically not that large. The regions R for RK methods up to order 4 are shown
in Figure 10. The stability constraint for RK-4 is more or less the same as for Euler's
method.
There is much more to the story than just absolute stability. Many other notions of
stability exist, describing other constraints and finer properties of numerical solutions.
Figure 10: Stability regions (the area inside the curve) for RK methods up to order 4. Note
that all RK methods of the same order have the same stability region.
For a linear system x' = Ax with A diagonalizable, the system decouples (in the eigenvector basis) into a scalar test equation x'_i = λ_i x_i, with λ_i an eigenvalue of A,
for each component of x. Thus it suffices to know how the method behaves on the scalar
test equation to know how it behaves on the "system" version.
For a general system
y0 = F(t, y),
and a perturbation w = y + u near (t0 , y0 ), the difference u evolves according to
u' ≈ (DF(t0, y0)) u.
Diagonalizing, each eigenvalue λ* of the Jacobian DF behaves like a scalar test equation, so for absolute stability we need
hλ* ∈ R
for every eigenvalue λ*.
That is, the numerical stability constraint is determined by the component that decays
fastest (the “most stiff” component). As a trivial example,
x' = −5x,
y' = −100y,
using Euler’s method has the stability constraint −100h ∈ (−2, 0) =⇒ h < 1/50 because
the y-component is stiff. Note that in general, “component” means in the eigenvector basis of
DF, so it cannot typically be seen by just looking at the equations in the system individually.
The “stiffness ratio” is the ratio of the largest/smallest real parts (in absolute value) of
the Jacobian; if this ratio is large, the system is typically stiff since it has some components
that change slowly and some that change very fast (e.g. the ratio is 20 in the example above).
7 Adaptive step size
As with adaptive integration, it is more efficient to vary the step size, keeping the local error below a tolerance at each step.
To do so, we need an estimate for τ at each step, and a way to select a step size that will
ensure that the estimated error is acceptably small.
On global error: We will keep the goal modest and not discuss strategies for bounding
the global error, which is more difficult.
One might, for instance, want the global error to remain below a tolerance ε:
max_{t∈[a,b]} |ỹ(t) − y(t)| < ε.
Bounding the τ ’s individually does not lead to a natural bound on the global error, since
there is propagation and so on to worry about as well. But local control is good enough in
practice, so long as one is careful to be cautious (setting local tolerances to be smaller than
you think is necessary).
Suppose we have two methods available:
i) Method A: order p, producing ỹn+1 from yn, with truncation error τn+1(h) for step size h;
ii) Method B: order at least p + 1, producing ŷn+1 from yn with the same step size.
To derive the estimate, suppose that the value at time n is exact. We have that
yen+1 = yn + hψ(tn , yn ),
ŷn+1 = yn + hψ̂(tn , yn ).
By the definition of the truncation error,
y(tn+1) = ỹn+1 + τn+1(h),    (7.1)
y(tn+1) = ŷn+1 + O(h^(p+2)).    (7.2)
Now further approximate (7.1) by assuming that the error comes from a series with some
leading order term:
τn+1 (h) = Chp+1 + O(hp+2 ).
Then
y(tn+1 ) = yen+1 + Chp+1 + O(hp+2 ).
Now subtract the more accurate method (7.2) from this to eliminate y(tn+1), leaving
ŷn+1 − ỹn+1 = C h^(p+1) + O(h^(p+2)).
This gives an error estimate:
|τ(h)| ≈ |C| h^(p+1) ≈ |ỹn+1 − ŷn+1|.    (7.3)
At this point, we have enough to get the updated time step hnew. It must satisfy
|C| h_new^(p+1) < ε.
Taking the ratio of this and the estimate (7.3) (or solving for C) yields
(h_new / h)^(p+1) < ε / |ỹn+1 − ŷn+1|,    (7.4)
which is a formula for the next step hnew in terms of known quantities: the two approximations from the two methods, the last step h and the tolerance ε.
In practice, because this is just an estimate, one puts an extra coefficient in to be safe,
typically something like
h_new = 0.8 h (ε / |ỹn+1 − ŷn+1|)^(1/(p+1)).
This formula decreases the step size if the error is too large (to stay accurate) and increases
the step size if the error is too small (to stay efficient). Sometimes, other controls are added
(like not decreasing or increasing h by too much per time step).
! The estimate is based on some strong assumptions that may not be true. While
it typically works, there is no rigorous guarantee that the error estimate is good.
Moreover, as noted earlier, this strategy bounds the local error, which does not
always translate into a reasonable global error bound.
For example, the second-order modified Euler method computes
f1 = f(tn, yn),
f2 = f(tn + h/2, yn + (h/2) f1),
yn+1 = yn + h f2 + O(h³).
Figure 11: Left: error estimate using two methods of different order (red/blue). Right:
sketch of step doubling using one method with one step of size h and two half-steps of size
h/2.
The function evaluation needed for Euler's method is already there, so once modified Euler
is applied, computing
ỹn+1 = yn + h f1 + O(h²)
requires essentially no extra work; the value of f1 is used for both. The error estimate is then
|τn+1| ≈ |ỹn+1 − ŷn+1|,
where ỹn+1 and ŷn+1 are the result of Euler (order 1) and modified Euler (order 2).
Embedded methods of higher order can be constructed by the right choice of coefficients.
We saw that fourth-order RK methods have the best balance of accuracy and efficiency.
There are a handful of good fourth/fifth order pairs.
One popular embedded pair is the Runge-Kutta-Fehlberg method, which uses a
fourth-order and fifth-order formula that share most of the fi ’s. A formula of this form with
step size selected by (7.4) is the strategy employed, for instance, by MATLAB’s ode45.8
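As a concrete (if low-order) illustration of the whole strategy, here is a sketch of adaptive stepping with the Euler / modified Euler pair and the update rule above (the function name, safety factor, and tolerance handling are illustrative choices):
function [T, Y] = adaptive_euler(f, tspan, y0, h0, tol)
% Adaptive stepping with an embedded pair: Euler (order 1) / modified Euler (order 2).
t = tspan(1); y = y0; h = h0;
T = t; Y = y;
while t < tspan(2)
    h  = min(h, tspan(2) - t);
    f1 = f(t, y);
    f2 = f(t + h/2, y + (h/2)*f1);
    y1 = y + h*f1;                        % Euler step (order 1)
    y2 = y + h*f2;                        % modified Euler step (order 2)
    err = abs(y1 - y2);                   % local error estimate
    if err <= tol                         % accept the step, keep the better value
        t = t + h; y = y2;
        T(end+1, 1) = t; Y(end+1, 1) = y;
    end
    h = 0.8*h*(tol/max(err, eps))^(1/2);  % step size update (7.4), with p = 1
end
end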
The idea is to use step doubling. Suppose, for the sake of example, that our method
is a one step method of the form
ỹn+1 = ỹn + h ψ(tn, ỹn).
Assume yn is exact; then the next step is
yen+1 = yn + hψ(tn , yn ).
Now take two steps of size h/2 (see Figure 11, right) to get a new approximation:
ŷ_(n+1/2) = yn + (h/2) ψ(tn, yn),
ŷ_(n+1) = ŷ_(n+1/2) + (h/2) ψ(tn + h/2, ŷ_(n+1/2)).
Assume that each application of a step creates a LTE
τ ≈ C h^(p+1)
and that C is a single constant.⁹ Then, if yn is exact, we have
ỹn+1 ≈ y(tn+1) + C h^(p+1),
ŷn+1 ≈ y(tn+1) + 2 C (h/2)^(p+1),
i.e. the 'doubled' method accumulates two truncation errors from a step size h/2. Subtracting the two approximations gives
|ỹn+1 − ŷn+1| ≈ (1 − 2^(−p)) |C h^(p+1)|.
Thus the error estimate is
|C h^(p+1)| ≈ |ỹn+1 − ŷn+1| / (1 − 2^(−p)),
from which we can choose a new time step hnew such that C (h_new)^(p+1) < ε.
8 Multistep methods
A linear multistep method has the form
Σ_{j=0}^{m} a_j y_(n−j) = h Σ_{j=0}^{m} b_j f_(n−j),    (8.1)
where fn−j = f (tn−j , yn−j ). The methods are so named because they involve values from the
m previous steps, and are linear in f (unlike Runge-Kutta methods). For example, Euler’s
method
yn = yn−1 + hfn−1
and the Backward Euler method
yn = yn−1 + hfn
are both (trivially) linear multistep methods that only involve the previous step; we call
these one step methods. Note that the method (8.1) is implicit if b0 6= 0 and explicit
otherwise. We are interested here in methods that use more than just the previous step.
For simplicity, we will assume throughout that the discrete times are t0 , t1 , . . . with fixed
timestep h.
9 This can be made more precise by assuming that the LTE for a step of size h starting at t is τ(h; t) ≈ C(t)h^(p+1), where C(t) is a smooth function of t.
8.1 Adams methods
The starting point is the integrated form of the ODE:
y(tn) = y(tn−1) + ∫_{tn−1}^{tn} y'(t) dt.    (8.2)
To get an explicit method using the previous m values, we estimate the integral using the
previous times tn−1, tn−2, . . . , tn−m, or equivalently (rescaling to s = (t − tn)/h) the points s = −1, −2, . . . , −m:
∫_{−1}^{0} g(s) ds = Σ_{j=1}^{m} b_j g(−j) + C g^(m)(ξ).
This leads to the Adams-Bashforth method
y_n = y_(n−1) + h Σ_{j=1}^{m} b_j f_(n−j).
The local truncation error is O(h^(m+1)), so the method has order m. There is a trick for
deriving the coefficients; see example below.
We can also derive a method from (8.2) by including tn (or s = 0). With m = 1 this
would be the trapezoidal rule:
∫_{−1}^{0} g(s) ds = (1/2) g(−1) + (1/2) g(0) + C g^(2)(ξ).
Note that this method uses m + 1 points (the m previous points plus tn). The result is then
the trapezoidal method
y_n = y_(n−1) + (h/2)(f_n + f_(n−1)),
which has order 2. In general, the method will have the form
y_n = y_(n−1) + h Σ_{j=0}^{m} b_j f_(n−j)
and is called an Adams-Moulton method. The local truncation error is O(h^(m+2)), so the
method is order m + 1. Note that the method is implicit, since fn appears on the RHS.
We will omit the details of the error term here. The easiest way to derive the coefficients
is to use undetermined coefficients on the Newton basis:
1, s + 1, (s + 1)(s + 2), . . .
The reason is that the resulting linear system will be triangular and easy to solve (in fact there is a nice general formula). The method should have degree of accuracy
2, so we require that it be exact for 1, s + 1 and (s + 1)(s + 2). Plugging these in, we get
1 = b1 + b2 + b3,
1/2 = −b2 − 2b3,
5/6 = 2b3,
so b1 = 23/12, b2 = −16/12 and b3 = 5/12. This gives the method
y_n = y_(n−1) + (h/12)(23 f_(n−1) − 16 f_(n−2) + 5 f_(n−3)),
which has order of accuracy 3. The Adams-Moulton method with the same order of
accuracy has m = 2 and requires the formula
∫_{−1}^{0} g(s) ds ≈ b0 g(0) + b1 g(−1) + b2 g(−2).
Requiring that the formula is exact for 1, s, s(s + 1) (same trick as before) yields
1 = b0 + b1 + b2,
−1/2 = −b1 − 2b2,
−1/6 = 2b2.
After solving, the result is the method
y_n = y_(n−1) + (h/12)(5 f_n + 8 f_(n−1) − f_(n−2)),
with order of accuracy 3.
8.2 Properties of the Adams methods
• The order p Adams-Moulton method has a (much) smaller error than the order p
Adams-Bashforth method (the constant for the LTE is much smaller).
As we will see in Section 8.5, both families also have limited regions of absolute stability, so neither type of method (even the implicit one!) is useful for stiff problems.
The Adams-Moulton method is superior in terms of accuracy and (absolute) stability, but
it is implicit, so it requires much more work per step.
In practice, the two are combined to form an explicit method that has some of the
stability of the implicit method, but is easy to compute. The implicit term fn = f (tn , yn ) is
estimated using the result ỹn from an explicit method. This strategy is called a predictor-
corrector method. For example, we can combine the two-step explicit formula with the
one-step implicit formula:
h
ỹn = yn−1 + (3fn−1 − fn−2 ) (8.3)
2
˜
fn = f (tn , ỹn ) (8.4)
h
yn = yn−1 + (f˜n + fn−1 ). (8.5)
2
Now the formula is not implicit (and as a bonus, the error can be estimated using yn and ỹn
by standard techniques). It turns out that this trick gives a method that has a reasonably
good stability region (not as good as the implicit one!) and is essentially strictly better than
the pure explicit method (8.3) alone.
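A sketch of this predictor-corrector scheme for a scalar ODE, using one modified Euler step to generate the needed starting value (the startup choice and function name are illustrative assumptions):
function [T, Y] = ab2_pc(f, tspan, y0, h)
% Two-step Adams-Bashforth predictor with trapezoidal corrector, as in (8.3)-(8.5).
T = (tspan(1):h:tspan(2))';
Y = zeros(size(T)); F = zeros(size(T));
Y(1) = y0; F(1) = f(T(1), Y(1));
Y(2) = Y(1) + h*f(T(1) + h/2, Y(1) + (h/2)*F(1));   % startup: one modified Euler step
F(2) = f(T(2), Y(2));
for n = 3:length(T)
    ypred = Y(n-1) + (h/2)*(3*F(n-1) - F(n-2));     % predictor (8.3)
    fpred = f(T(n), ypred);                          % (8.4)
    Y(n)  = Y(n-1) + (h/2)*(fpred + F(n-1));         % corrector (8.5)
    F(n)  = f(T(n), Y(n));
end
end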
Practical note (starting values): To start, we need m previous values, which means
y1, . . . , y_(m−1) must be computed by some other method (not a multistep formula). A high-
order RK method is the simplest choice, because it only requires one previous point.
8.3 Other multistep methods
The Adams methods only use previous fn ’s and not yn ’s. There are other classes of multistep
methods of the form (8.1) that are derived in different ways. One such class are the Back-
ward differentiation formulas (BDFs). These are derived by approximating y 0 using a
backward difference:
For example, the first order BDF is Backward Euler; the second order BDF is
(3y_n − 4y_(n−1) + y_(n−2)) / (2h) = f(t_n, y_n),
which rearranges to
y_n = (4/3) y_(n−1) − (1/3) y_(n−2) + (2h/3) f_n.
This method is sometimes called Gear's method. BDFs are valuable because for orders up
to 6, they do well on stiff problems (see next section) compared to Adams-Moulton methods,
and moreover have the stiff decay property.
8.4 Analysis of multistep methods
8.4.1 Consistency
Plugging the exact solution into (8.1) and Taylor expanding, consistency requires (to leading order) that
Σ_{j=0}^{m} a_j = 0   and   −Σ_{j=0}^{m} j a_j = Σ_{j=0}^{m} b_j.
Further conditions can be derived by taking more terms. Of course, for the Adams formulas
and BDFs, we derived the method and proved consistency by other (more elegant) means,
but with the above we can construct more general methods.
8.4.2 Zero stability
For zero stability, apply the method to the trivial ODE y' = 0, which gives
Σ_{j=0}^{m} a_j y_(n−j) = 0,    (8.6)
which is a linear recurrence for yn. This turns out to be sufficient! The following theorem (due to Dahlquist) is true: a consistent linear multistep method is convergent if and only if all solutions of the recurrence (8.6) remain bounded.
The equation (8.6) can be solved directly (see Section C for a review), leading to the
following condition.
Theorem 8.2. All solutions of the linear recurrence (8.6) are bounded if and only if all the
roots of the characteristic polynomial
p(r) = Σ_{j=0}^{m} a_j r^(m−j)    (8.7)
have magnitude ≤ 1, with any roots of magnitude exactly 1 being simple.
The Dahlquist Theorem lets us verify a multistep method is convergent simply by proving
consistency and finding the roots of the characteristic polynomial (8.7).
8.5 Absolute stability
The definition of absolute stability is the same. When more than one previous y-value is
involved, the recurrence has more than two terms, so the analysis is more involved. When
the general method (8.1) is applied to y' = λy we get
Σ_{j=0}^{m} a_j y_(n−j) = hλ Σ_{j=0}^{m} b_j y_(n−j),
a linear recurrence with characteristic polynomial
p(r, z) = Σ_{j=0}^{m} (a_j − z b_j) r^(m−j),   z = hλ.
To have solutions to the recurrence go to zero as n → ∞, we need all the roots of p(r, z) to
have magnitude less than one. Thus, with R the region of absolute stability,
R = {z ∈ C : every root r of p(r, z) satisfies |r| < 1}.
This is now a condition that can, with some effort, be computed and we can plot the stability
region.
Example: For the two-step Adams-Bashforth method (8.3), applying the method to y' = λy and substituting y_n = r^n gives the characteristic equation
2r² − (2 + 3z) r + z = 0.
Solving, we get
r = (2 + 3z ± sqrt((2 + 3z)² − 8z)) / 4.
Thus z is in the region of absolute stability if and only if the two roots above are less
than 1 in magnitude. This is not a nice looking condition. However, we can determine
the largest b such that (−b, 0) is in the interval of absolute stability.
Observe that on the boundary of R, one of the roots must have |r| = 1. If
z < 0 is real then both roots are real (by the calculations above). Thus we need only
find the values of z for which r = ±1 is a root, which occurs when
z = 0 or z = −1.
Thus the interval of absolute stability is (−1, 0), which is half the size of Euler’s method!
• A linear multistep method with order greater than 2 cannot be A-stable. (This is the second
Dahlquist barrier.)
• The interval of absolute stability for BDFs up to order 6 contains (−∞, 0).
• The stability region for Adams-Bashforth methods shrinks as the order increases.
It follows that higher order Adams-Moulton methods are not to be used on stiff problems
even though they are implicit, and that Adams-Bashforth methods should not be used on
anything remotely stiff.10
A Nonlinear systems
Solving implicit systems of ODEs requires solving non-linear systems, which is a problem in
itself. Here we briefly consider numerical solution of the non-linear system F(x) = 0 where
F : R^n → R^n,   x ∈ R^n.
This describes a general system of n equations for n variables. Let DF be the Jacobian
matrix with entries
(DF)_ij = ∂F_i / ∂x_j.
We will consider finding a zero x* with the property that
DF(x*) is invertible,
analogous to the f'(x*) ≠ 0 assumption in 1D. This guarantees that Newton's method (see
below) works and that the zero is isolated.¹¹
10 More to the point, predictor-corrector methods using Adams-Bashforth and Adams-Moulton formulas work much better, so there is not much reason to use the Adams-Bashforth formula alone.
11 The result is a consequence of the inverse function theorem, which ensures that F is invertible in a neighborhood of x*.
Simple example: The system of two equations
0 = e^(x1) (x2 − 1)
0 = x1² − x2²
Much more precise statements can be made about the error, but this version is enough
for our purposes.
A.2 Newton's method
As in one dimension, Newton's method comes from linearizing F around the current iterate and setting the linearization to zero, leading to
x_(k+1) = x_k − J_k^(−1) F(x_k),   J_k := DF(x_k).
In practice, we solve the linear system
Jk v = −F(xk )
for the correction v and then update xk+1 = xk + v.
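A minimal sketch of the resulting iteration (names and tolerances are illustrative):
function x = newton_sys(F, DF, x0, tol, maxit)
% Newton's method for F(x) = 0 in R^n, with Jacobian DF(x) and initial guess x0.
x = x0;
for k = 1:maxit
    v = -DF(x) \ F(x);            % solve J_k v = -F(x_k)
    x = x + v;
    if norm(v) < tol, break; end
end
end
For the simple example above, one could pass F = @(x) [exp(x(1))*(x(2)-1); x(1)^2 - x(2)^2] and DF = @(x) [exp(x(1))*(x(2)-1), exp(x(1)); 2*x(1), -2*x(2)], with a starting guess sufficiently close to one of the zeros (±1, 1).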
The convergence results transfer to the n-dimensional case. If DF(x∗ ) is invertible then
Newton's method converges quadratically when ‖x0 − x*‖ is small enough (see the Newton-
Kantorovich theorem).
A good guess is even more valuable in Rn since there are more dimensions to work with
- which means more possibilities for failure. Without a good guess, Newton’s method will
likely not work. Moreover, searching for a root is much more difficult (no bisection).
A.3 Application to ODEs
Practical note: A general routine would require the user to input the ODE function F and
its Jacobian Dx F, both as functions of (t, x). If the dimension is large, it may be much more
efficient to write a solver specific to that problem that knows how to solve the Newton’s
method system (A.1) efficiently (e.g. Appendix B).
To apply Newton's method, we first compute the Jacobian J. It is the tridiagonal matrix with diagonal entries 2 + sin(y_i) and off-diagonal entries −1:
J(y) =
  [ 2 + sin(y1)     −1              0        · · ·      0
       −1        2 + sin(y2)       −1        · · ·      0
                      ⋱              ⋱         ⋱
        0          · · ·            −1   2 + sin(y_(N−1))      −1
        0          · · ·             0         −1        2 + sin(y_N) ]
To apply Newton’s method, pick some starting vector y0 . Then, at step k, solve
(J(yk ))vk = −F(yk )
which is a tri-diagonal system (so it can be solved efficiently) and then update
yk+1 = yk + vk .
C Difference equations: review
Consider the linear difference equation
c_0 a_n + c_1 a_(n−1) + · · · + c_m a_(n−m) = 0.
Looking for solutions of the form
a_n = r^n
leads to the characteristic polynomial
c_0 r^m + c_1 r^(m−1) + · · · + c_m = 0.
There are m (complex) roots r_1, · · · , r_m (assume for simplicity that they are distinct). Since the equation is linear, any linear combination
is a solution. The general solution is therefore
a_n = Σ_{j=1}^{m} b_j r_j^n
for constants b_1, · · · , b_m. Note that the solution is bounded for any initial conditions if and
only if
|r_j| ≤ 1 for all j,
and that a_n → 0 for any initial conditions if and only if
|r_j| < 1 for all j.