Math 361S Lecture Notes: Numerical Solution of ODEs, Part I
Topics covered
• Overview
• Euler’s method
• Stability
• Runge-Kutta methods
◦ General notes
◦ Implementation
1 Overview of goals
In this section of the course we will consider the solution of ordinary differential equations.
The tools we have developed so far (interpolation, differentiation, and integration) will
serve as a foundation for constructing the algorithms. We will also make use of some linear
algebra and nonlinear equations (e.g. Newton's method) to solve some sub-problems.
Our main goal here is twofold. First, we seek to understand the fundamentals of numerical
methods for ODEs: what numerical issues arise, how to recognize and correct them, and
the right perspective to take. If you are handed a nasty ODE, how do you approach
writing a method to solve it? How can you argue that the answer you obtain is correct?
Second, since numerical methods for ODEs are a synthesis of techniques we have already seen,
we will see how the pieces can be combined and how the fundamental principles (error in
interpolation, numerical stability of integration and so on) show up in practice. We'll also
see a few more concepts:
• How stability manifests in numerical solution to ODEs (and how it connects to the
mathematical notion of stability from theory)
• Managing trade-offs between accuracy and stability (the consequences of sacrificing
one for the other are more severe for ODEs than the other problems we have seen)
• Designing adaptive algorithms that ’just work’ like ode45 in Matlab (we saw a bit of
this with adaptive integration, but will explore it in detail here)
2 The initial value problem

We consider the initial value problem (IVP)

y' = f(t, y), y(t0) = y0.

The ODE is first-order (the highest derivative is first-order) and scalar (y(t) is a real-valued function). We assume that:
• f (t, y) is a continuous function in t and y
• f has partial derivatives of all orders required for any derivation (mostly for Taylor
expansions)
It is a fundamental theorem that these conditions ensure a unique solution exists, at least
in some interval around (t0, y0). They are not enough to ensure that a solution exists for
all t; the solution may diverge in finite time.
For our purposes, we will attempt to construct numerical solutions where the actual solution
exists, so the theory is just there to ensure that the problem being solved is well-defined.
Consider again the ODE

y' = f(t, y).

A solution (t, y(t)) forms a curve in the (t, y) plane. The ODE tells us the direction of the
curve at any given point, since the slope of the solution curve through (t, y) is exactly f(t, y).
In this sense, solutions to the ODE follow the 'slope field', which is the vector field (1, f(t, y))
in the (t, y) plane. To find a solution to the IVP starting at (t0, y0), we may follow the
slope field to construct the curve; this is the basis of the simplest numerical method,
detailed in the next section.
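To make the slope field concrete, here is a small Matlab sketch (our own illustration, using y' = ty as the example) that draws the field with quiver:

    % Plot the slope field (1, f(t,y)) of y' = f(t,y).
    f = @(t,y) t.*y;
    [T, Y] = meshgrid(0:0.1:2, 0:0.25:4);
    S = f(T, Y);
    quiver(T, Y, ones(size(S)), S);
    xlabel('t'); ylabel('y');

Solution curves are everywhere tangent to the plotted arrows.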
The slope field gives geometric intuition for some important concepts for numerical methods.
For instance, we may ask: if y(t) gets perturbed by some amount ∆y at time t0 , how far
apart are the original and perturbed curves after some time?
Put another way: how sensitive is the ODE to changes in the initial condition?
The picture is shown in Figure 1. Suppose we have two solutions x(t) and y(t) to the
same ODE,
y' = f(t, y), y(t0) = y0,
x' = f(t, x), x(t0) = x0.
Informally, the difference z = y − x satisfies

z' = f(t, y) − f(t, x) = ∂f/∂y (t, ξ) z
for some ξ between x and y, by the mean value theorem. Thus the size of ∂f/∂y determines how fast the
difference can change with time. Let

L = max_{t∈[a,b], y} |∂f/∂y (t, y)|.

Then, informally, |z(t)| ≤ e^{L(t−t0)} |z(t0)|, ignoring that the absolute value cannot be manipulated
this way with a derivative. Thus L (the max. of the variation of f with respect to y) is the
exponential rate, at worst, at which the two solutions can move apart.
However, the bound is sometimes pessimistic. Taking absolute values discards informa-
tion about the sign, so if z' ≈ −Lz then the bound is the same, even though z then decays
exponentially. This is shown in Figure 1.
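As a concrete instance (our own numbers, matching the examples in the figure): for y' = ty on t ∈ [0, 2] we have ∂f/∂y = t, so L = 2 and two solutions can separate no faster than a factor e^{2t}; indeed z' = tz gives z = z(0) e^{t²/2}. For y' = −ty the bound is identical, even though z' = −tz means the difference actually decays like e^{−t²/2}.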
Figure 1: Sketch of the difference in two solutions that start at nearby points (t0, x0) and
(t0, y0), and numerical examples for y' = ty and y' = −ty.
3 Numerical methods: the basics
Here we introduce Euler’s method, and the framework to be used for better numerical
methods later. We seek a numerical solution to the IVP

y' = f(t, y), y(t0) = y0,

and suppose we wish to solve for y(t) up to a time t = b. The approximation will take the
form of values ỹ_j defined on a grid

a = t0 < t1 < · · · < tN = b,

such that

ỹ_j ≈ y(t_j).
For convenience, denote by yj the exact solution at tj and let the ‘error’ at each point be
ej = yj − ỹj .
It will be assumed that we have a free choice of the tj ’s. The situation is sketched in Figure 2.
Figure 2: Numerical solution of an IVP forward in time from t = a to t = b.

Suppose that we have the exact value of y(t). To get y(t + h) from y(t), expand in a
Taylor series and use the ODE to simplify the derivatives:

y_{j+1} = y_j + h f(t_j, y_j) + τ_{j+1}, (3.1)

where τ_{j+1} is the local truncation error defined below. We could derive a formula, but
the important thing is that
τ_{j+1} = O(h²).
Dropping the error in (3.1) and iterating this formula, we get Euler's method:

ỹ_{j+1} = ỹ_j + h f(t_j, ỹ_j), j = 0, 1, 2, . . . (3.2)
Notice that the total error is not just the sum of the truncation errors, because f is evaluated
at the approximation ỹ_j rather than at the exact solution. The truncation error will propagate
through the iteration, as a careful analysis will show.
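For concreteness, here is a minimal Matlab sketch of Euler's method (the function name, signature, and equally spaced grid are our own choices):

    function [T, Y] = euler(f, a, b, y0, N)
        % Euler's method for y' = f(t,y), y(a) = y0, on [a,b] with N steps.
        h = (b - a)/N;
        T = a + h*(0:N)';        % grid points t_j = a + j*h
        Y = zeros(N+1, 1);
        Y(1) = y0;               % Matlab indexing starts at 1, so Y(j+1) ~ y_j
        for j = 1:N
            Y(j+1) = Y(j) + h*f(T(j), Y(j));   % one Euler step
        end
    end

For instance, [T, Y] = euler(@(t,y) t.*y, 0, 2, 0.1, 20) approximates the example solved below.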
Definition (Local truncation error, or LTE): The local truncation error is the error
incurred in obtaining ỹj when the previous data (yj−1 etc.) is known exactly. It is ’local’ in
the sense that it does not include errors created at previous steps.
Another interpretation is this: The local truncation error is what is ’left over’ when
the exact solution is plugged into the approximation. For Euler’s method,
if we plug in y(tj ) instead of ỹj , the LHS/RHS are not equal; the difference is the local
truncation error:
y_{j+1} = y_j + h f(t_j, y_j) + τ_{j+1}.
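To see why τ_{j+1} = O(h²) for Euler's method (a standard computation, spelled out here): Taylor's theorem gives y_{j+1} = y_j + h y'(t_j) + (h²/2) y''(ξ) for some ξ ∈ (t_j, t_{j+1}), and y'(t_j) = f(t_j, y_j) by the ODE, so τ_{j+1} = (h²/2) y''(ξ) = O(h²).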
3.2 Convergence
Suppose we use Euler's method to generate an approximation

(t_j, ỹ_j), j = 0, · · · , N

to the solution y(t) in the interval [a, b] (with t0 = a and tN = b). The 'error' in the
approximation that matters in practice is the global error

E = max_{0≤j≤N} |e_j|,

where

e_j = y_j − ỹ_j

is the error at t_j. This is a measure of how well the approximation ỹ_j agrees with the true
solution over the whole interval.²
Our goal is to show that, given an interval [a, b], the global error has the form

max_{0≤j≤N} |e_j| = O(h^p)

for some integer p, the order of the approximation. In this case we say that the method is
convergent; the approximation, in theory, converges to the true solution y(t) as h → 0.³
As an example, consider

y' = ty, y(0) = 0.1,

which has exact solution y(t) = 0.1 e^{t²/2}. Below, we plot some approximations for various time-
steps h; on the right is the max. error in the interval. The log-log plot has a slope of 1,
indicating the error should be O(h).
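A sketch of the experiment (our own script, reusing the euler function from above; the exact solution is used to measure the error):

    hs = 2.^-(2:10);                 % sequence of timesteps
    errs = zeros(size(hs));
    for i = 1:numel(hs)
        N = round(2/hs(i));
        [T, Y] = euler(@(t,y) t.*y, 0, 2, 0.1, N);
        errs(i) = max(abs(Y - 0.1*exp(T.^2/2)));   % max error on the grid
    end
    loglog(hs, errs, 'o-'); xlabel('h'); ylabel('max. err')   % slope ~ 1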
² Note that this is not precisely true, since the approximation is not defined for all t; we would need to
interpolate, and that would have its own error bounds. But in practice we typically consider the error at the
points where the approximation is computed.

³ See the previous footnote; really this means that the approximation, as a piecewise-defined function (e.g.
piecewise linear), converges to y(t) as h → 0. Since the points get arbitrarily close together as h → 0, the
distinction between 'max error at the t_j's' and 'max error as functions' is not of much concern here.
[Figure: Euler approximations for h = 0.4, 0.2, 0.1 vs. the exact solution on [0, 2] (left); max. error vs. h on a log-log scale (right).]
To prove convergence, we make two assumptions:

(i) there is a constant L such that |∂f/∂y (t, y)| ≤ L for all t ∈ [a, b] and all y; (3.3)

(ii) the local truncation errors satisfy max_k |τ_k| = O(h²).

Assumption (ii) holds, for instance, if the appropriate derivatives of f are all bounded in [a, b] (for
all y), but we will not be too precise about the matter.
Condition (3.3) is called a Lipschitz condition and L is the Lipschitz constant. The
relevant consequence is that

|f(t, y) − f(t, x)| ≤ L |y − x| for all t ∈ [a, b] and all x, y,

which is a direct consequence of the mean value theorem. This inequality is the key ingredient
for the proof. The theorem is the following:
Theorem (Convergence for Euler’s method): Let ỹj be the result of applying Euler’s
method (3.2) starting at (t0 , ỹ0 ) and suppose that (i) and (ii) hold. Then
max_{0≤j≤N} |e_j| ≤ e^{L(b−a)} |e_0| + (e^{L(b−a)} − 1)/L · (max_k |τ_k|)/h. (3.5)
Proof. To start, recall that from the definition of the truncation error and the formula for the method,

y_{j+1} = y_j + h f(t_j, y_j) + τ_{j+1}, ỹ_{j+1} = ỹ_j + h f(t_j, ỹ_j).

Subtracting and applying the Lipschitz bound to |f(t_j, y_j) − f(t_j, ỹ_j)|,

|e_{j+1}| ≤ (1 + Lh) |e_j| + |τ_{j+1}|.

Iterating, we get
|e_1| ≤ (1 + Lh) |e_0| + |τ_1|,
|e_2| ≤ (1 + Lh)² |e_0| + (1 + Lh) |τ_1| + |τ_2|,
and in general

|e_j| ≤ (1 + Lh)^j |e_0| + Σ_{k=1}^{j} (1 + Lh)^{j−k} |τ_k|.
Bounding each |τ_k| by the maximum and evaluating the (geometric) sum,

|e_j| ≤ (1 + Lh)^j |e_0| + (max_k |τ_k|) · ((1 + Lh)^j − 1)/(Lh).
Now we want to take the maximum over j, so the RHS must be written to be independent
of j. To fix this problem, we use the crude estimate
(1 + Lh) ≤ e^{Lh}
to obtain
|e_j| ≤ e^{Ljh} |e_0| + (e^{Ljh} − 1)/(Lh) · max_k |τ_k|.
But jh = tj − t0 ≤ b − a (equal when j = N ) so
|e_j| ≤ e^{L(b−a)} |e_0| + (e^{L(b−a)} − 1)/L · (max_k |τ_k|)/h.
Taking the maximum over j (note that the RHS is independent of j) we get (3.5).
In particular, if the initial value is exact (e_0 = 0), the bound reduces to

max_{0≤j≤N} |e_j| ≤ C · (max_k |τ_k|)/h

for a constant C that depends on the interval size and L but not on h. By assumption (ii),
we know that max_k |τ_k| = O(h²), so the maximum error is O(h).
• The LTE is O(h2 ) and O(1/h) steps are taken, so the total error is O(h); the propa-
gation does not affect the order of the error on a finite interval.
Note that the factors L and (1 + Lh) are method-dependent; for other methods, they may be
replaced by other expressions involving L.
The numerical examples in Figure 3 illustrate the propagation issue. As with actual solutions, the error
in a numerical solution (or the difference between two nearby numerical solutions) can grow
like e^{Lt} at worst. Indeed, for (a) (y' = ty), the error grows in this way; the error bound is good here.
However, for (b) (y' = −ty), the numerical solutions actually converge to the true solution as t in-
creases; in fact the error behaves more like e^{−Lt}. But the error bound cannot
distinguish between the two cases, so it is pessimistic for (b).
3.5 Order

The order p of a numerical method for an ODE is the order of its global error.
Euler's method, for instance, has order 1, since the global error is O(h).
Figure 3: Numerical solutions to y' = ty and y' = −ty with different values of N; note the
behavior of the error as t increases.
The 1/h factor appears for (most) other methods too, so as a rule of thumb:

if the LTE is O(h^{p+1}), the global error is O(h^p).
The interpretation here is that to get from a to b we take ∼ 1/h steps, so the error is on
the order of the number of steps times the error at each step, (1/h) · O(h^{p+1}) = O(h^p). The
careful analysis shows that the order is not further worsened by the propagation of the errors.
Warning: Some texts define the LTE with an extra factor of h so that it lines up with the
global error, in which case the rule is that the LTE and global error have the same order.
For this reason it is safest to say that the error is O(h^p) rather than to use the term 'order
p', but either is fine in this class.
The proof of convergence had two ingredients: the local errors must be small, and they must
not be amplified too much as they propagate. This strategy leads to two notions, consistency
and stability, which are the pieces necessary for convergence.
Definition (convergence): The numerical method is said to be convergent with order p
if, given an interval [a, b] on which the solution exists,

max_{0≤j≤N} |ỹ_j − y(t_j)| = O(h^p) as h → 0.

Definition (consistency): The method is consistent if the LTE is o(h) as the timestep goes
to zero. That is,

lim_{h→0} τ_j/h = 0 for all j.
To check consistency, we may assume the result of the previous step is exact (since
this is how the LTE is defined). This is a benefit, as there is no need to worry about the
accumulation of errors at earlier steps.
4 Stability

Recall the IVP

y' = f(t, y), y(t0) = y0, (4.1)

and suppose f satisfies the Lipschitz condition

|f(t, y) − f(t, x)| ≤ L |y − x|. (4.2)

If (4.2) holds for some interval t ∈ [a, b] containing t0 and all y, and f is continuous, then the
IVP (4.1) has a unique solution in [a, b].
Moreover, if y1 and y2 are two solutions to the ODE with different initial conditions, then

|y1(t) − y2(t)| ≤ e^{L(t−t0)} |y1(t0) − y2(t0)|.
Definition (zero-stability): Suppose {y_n} and {z_n} are approximate solutions to (4.1) in
[a, b]. If it holds that

|y_n − z_n| ≤ C |y_0 − z_0| + O(h^p) (4.3)

where C is independent of n, then the method is called zero stable.
Note that the best we can hope for is C = e^{L(t−t0)}, since the numerical method will never
be more stable than the actual IVP. In what follows, we will try to determine the right
notions of stability for the numerical method.
As written, the stability condition is not easy to check. However, one can derive easy-to-verify
conditions that imply zero stability. We have the following informal result:
Zero-stability, again: A 'typical' numerical method is zero stable in the sense (4.3) if it
is numerically stable when used to solve the trivial ODE

y' = 0.

Here 'typical' includes any of the methods we consider in class (like Euler's method) and
covers most methods for ODEs one encounters in practice.
With some effort, one can show that this notion of stability is exactly the minimum required
for the method to converge. The key result is the following:

Theorem (convergence): If a method is consistent with LTE of size O(h^{p+1}) and zero
stable, then it is convergent of order p (the global error is O(h^p)).

This assertion was proven for Euler's method directly. Observe that the theorem lets
us verify two simple conditions (easy to prove) to show that a method converges (hard to
prove).
Example: convergence by theorem. The backward Euler method (to be studied in
detail later) is

ỹ_{j+1} = ỹ_j + h f(t_{j+1}, ỹ_{j+1}).

Consistency: The local truncation error is defined by

y_{j+1} = y_j + h f(t_{j+1}, y_{j+1}) + τ_{j+1}.

Plugging the exact solution into this formula and Taylor expanding yields τ_{j+1} = O(h²), so
the method is consistent.

Zero-stability: Applied to the trivial ODE y' = 0, the method reduces to

ỹ_{j+1} = ỹ_j,

which is numerically stable. The theorem then guarantees that the method is convergent,
and that the order of convergence is 1 (the global error is O(h)).
Euler's method can be derived by replacing y' in the ODE with a forward difference:

(y(t + h) − y(t))/h ≈ y'(t) = f(t, y).
One might hope, then, that a more accurate method can be obtained by using a second-order
forward difference
y'(t) = (−y(t + 2h) + 4y(t + h) − 3y(t))/(2h) + O(h²).
Plugging this in, we obtain the method

ỹ_{j+2} = 4ỹ_{j+1} − 3ỹ_j − 2h f(t_j, ỹ_j), (4.4)
which is consistent, with an O(h³) LTE. However, this method is not zero stable!
It suffices to show numerical instability for the trivial ODE y' = 0. The iteration reduces to

ỹ_{j+2} = 4ỹ_{j+1} − 3ỹ_j.

Looking for solutions of the form ỹ_j = r^j gives the characteristic equation

r² − 4r + 3 = 0 =⇒ r = 1, 3.

The root r = 3 means the iteration admits solutions that grow like 3^j, so any small error in
the data is amplified exponentially: the method is not zero stable.
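A quick numerical experiment (our own script) makes the instability visible; even with exact starting values, truncation and rounding errors excite the growing r = 3 mode:

    % Two-step method (4.4) applied to y' = -y, y(0) = 1 (exact: exp(-t)).
    f = @(t,y) -y;
    a = 0; b = 2; N = 40; h = (b - a)/N;
    t = a + h*(0:N)';
    y = zeros(N+1,1);
    y(1) = 1;
    y(2) = exp(-h);            % start the two-step method with exact data
    for j = 1:N-1
        y(j+2) = 4*y(j+1) - 3*y(j) - 2*h*f(t(j), y(j));
    end
    disp([y(end), exp(-b)])    % the approximation blows up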
5 Runge-Kutta methods

In this section we introduce the most popular general-purpose formulas, which can be
constructed to have any order.
Definition (one step methods): A general 'explicit' one step method has the form

ỹ_{j+1} = ỹ_j + h ψ(t_j, ỹ_j), (5.1)

where ψ is some function we can evaluate at (t_j, ỹ_j). The truncation error is defined by
plugging the exact solution into the formula:

y_{j+1} = y_j + h ψ(t_j, y_j) + τ_{j+1}. (5.2)
The term ‘explicit’ refers to the fact that yj+1 can be computed explicitly at each step, which
makes computation easy.
To improve on the accuracy of Euler's method with a one step method, we may try to
include higher order terms in ψ. To start, write (5.2) as

y_{j+1} = y_j + h ψ(t_j, y_j) + τ_{j+1},

and call y_{j+1} the 'LHS' and y_j + h ψ(t_j, y_j) the 'RHS'.
For a p-th order method, we want the LHS to equal the RHS up to O(h^{p+1}). Now expand
the LHS in a Taylor series around t_j:

LHS: y_{j+1} = y(t_j) + h y'(t_j) + (h²/2) y''(t_j) + · · ·
A p-th order formula is therefore obtained by taking

ψ(t_j, y_j) = y'(t_j) + (h/2) y''(t_j) + · · · + (h^{p−1}/p!) y^{(p)}(t_j).
The key point is that the derivatives of y(t) can be expressed in terms of f and its partial
derivatives, which we presumably know. Simply differentiate the ODE y' = f(t, y(t)) in t,
being careful with the chain rule. If G(t, y) is any function of t and y evaluated on the
solution y(t), then

d/dt [G(t, y(t))] = G_t + G_y y'(t) = G_t + f G_y,

with subscripts denoting partial derivatives and G_t etc. evaluated at (t, y(t)).
It follows that

y'(t) = f(t, y(t)),
y''(t) = f_t + f_y f,
y'''(t) = (f_t + f_y f)' = f_{tt} + f_{ty} f + · · · (see HW).

In operator form,

y^{(p)} = (∂/∂t + f ∂/∂y)^{p−1} f.
Taylor's method of order p is then obtained by dropping the truncation error in

y_{j+1} = y_j + h y'(t_j) + (h²/2) y''(t_j) + · · · + (h^p/p!) y^{(p)}(t_j) + (h^{p+1}/(p+1)!) y^{(p+1)}(t_j) + · · ·,

keeping terms up through h^p, so that the LTE is O(h^{p+1}).
Note that y', y'', · · · are replaced by formulas involving f and its partials by repeatedly
differentiating the ODE.
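As a concrete sketch (our own example): for y' = ty we have f_t = y and f_y = t, so y'' = f_t + f f_y = y(1 + t²), and Taylor's method of order 2 can be written as:

    % Taylor's method of order 2 for y' = t*y, y(0) = 0.1.
    a = 0; b = 2; N = 20; h = (b - a)/N;
    t = a + h*(0:N)'; y = zeros(N+1,1); y(1) = 0.1;
    for j = 1:N
        yp  = t(j)*y(j);             % y'  = f
        ypp = y(j)*(1 + t(j)^2);     % y'' = f_t + f*f_y, computed by hand
        y(j+1) = y(j) + h*yp + (h^2/2)*ypp;
    end
    max(abs(y - 0.1*exp(t.^2/2)))    % global error: O(h^2)

The need to differentiate f by hand here is exactly the inconvenience discussed below.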
This method is generally not used due to the convenience of the (more or less equiv-
alent) Runge-Kutta methods.
5.2 A better way
Taylor’s method is inconvenient because it involves derivatives of f . Ideally, we want a one
step method that needs to know f (t, y) and nothing more.
The key observation is that the choice of ψ in Taylor's method is not unique. We can replace
ψ with anything else that has the same order of error. The idea of a Runge-Kutta
method is to build ψ from evaluations of f at 'intermediate' points, involving computable
values starting with f(t_j, y_j).
Let us illustrate this by deriving a second order one step method of the form

ỹ_{j+1} = ỹ_j + h (w_1 f_1 + w_2 f_2),

where

f_1 = f(t_j, y_j),
f_2 = f(t_j + h/2, y_j + h β f_1).
Aside (integration): You may notice that this resembles an integration formula using two
points; this is not a coincidence since
y' = f(t, y) =⇒ y_{j+1} − y_j = ∫_{t_j}^{t_{j+1}} f(t, y(t)) dt,
so we are really estimating the integral of f(t, y(t)) using points at t_j and t_{j+1/2}. The problem
is more complicated than just integrating f (t) because the argument depends on the unknown
y(t), so that also has to be approximated.
To find the coefficients, expand everything in a Taylor series, keeping terms up to order h²:

LHS = y_j + h y'_j + (h²/2) y''_j + O(h³)
    = y_j + h f + (h²/2)(f_t + f f_y) + O(h³),
where f etc. are all evaluated at (tj , yj ). For the fi ’s, we only need to expand f2 :
h f_2 = h f + (h²/2) f_t + h² f_y (β f_1) + O(h³)
      = h f + (h²/2) f_t + β h² f f_y + O(h³).
Plugging this into the RHS gives

RHS = y_j + h (w_1 + w_2) f + (w_2 h²/2) f_t + w_2 β h² f f_y + O(h³),
LHS = y_j + h f + (h²/2)(f_t + f f_y) + O(h³).

Comparing, the LHS and RHS are equal up to O(h³) if

w_1 + w_2 = 1, w_2 = 1, w_2 β = 1/2,

which gives

w_1 = 0, w_2 = 1, β = 1/2.
We have therefore obtained the formula

ỹ_{j+1} = ỹ_j + h f(t_j + h/2, ỹ_j + (h/2) f(t_j, ỹ_j)),

known as the midpoint (or modified Euler) method.
Remark (integration connection): In this case one can interpret the formula as using
the midpoint rule to estimate the integral:

y(t_{j+1}) = y(t_j) + ∫_{t_j}^{t_{j+1}} f(t, y(t)) dt ≈ y(t_j) + h f(t_j + h/2, y(t_j + h/2)),

with the unknown value y(t_j + h/2) itself approximated by an Euler step of size h/2.
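A minimal Matlab sketch of the method (our own script), again on y' = ty:

    f = @(t,y) t.*y;
    a = 0; b = 2; N = 20; h = (b - a)/N;
    t = a + h*(0:N)'; y = zeros(N+1,1); y(1) = 0.1;
    for j = 1:N
        f1 = f(t(j), y(j));
        f2 = f(t(j) + h/2, y(j) + (h/2)*f1);   % slope at the midpoint
        y(j+1) = y(j) + h*f2;
    end
    max(abs(y - 0.1*exp(t.^2/2)))              % global error: O(h^2)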
5.3 Higher-order explicit methods
The modified Euler method belongs to the class of Runge-Kutta methods. A general explicit
m-stage Runge-Kutta method has the form

f_1 = f(t_j, y_j),
f_2 = f(t_j + c_2 h, y_j + h a_{21} f_1),
f_3 = f(t_j + c_3 h, y_j + h a_{31} f_1 + h a_{32} f_2),
⋮
f_m = f(t_j + c_m h, y_j + h a_{m1} f_1 + · · · + h a_{m,m−1} f_{m−1}),

y_{j+1} = y_j + h (w_1 f_1 + · · · + w_m f_m).
• The best possible local truncation error is O(h^{p+1}), where p ≤ m. For each p, the
system is underdetermined and has a family of solutions (see HW for the p = 2 case).
• Unfortunately, it is not always true that p = m. To get a high order method (fifth
order and above), we need more substeps per iteration than the order.
• Deriving RK methods past third order is quite tedious and a mess of algebra, since
the system for the coefficients is non-linear and the Taylor series expansions become
complicated. (Exercise: verify that RK-4 below has an O(h⁵) LTE.)
Thankfully, just about every useful set of coefficients (at least for general purpose methods)
has been calculated already, so in practice one can just look them up. They are typically
arranged in a Butcher tableau (see book for details).
The classical RK-4 method: One notable four-stage method is the classical 'RK-4' method:

f_1 = h f(t_n, y_n),
f_2 = h f(t_n + h/2, y_n + f_1/2),
f_3 = h f(t_n + h/2, y_n + f_2/2),
f_4 = h f(t_n + h, y_n + f_3),

y_{n+1} = y_n + (f_1 + 2 f_2 + 2 f_3 + f_4)/6.
This method has a good balance of efficiency and accuracy (only four function evaluations
per step, and an O(h⁵) LTE). The method would be a good first choice for solving ODEs, except
that there is a more popular variant that is better for error estimation.
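A Matlab sketch of RK-4 (function name and signature are our own):

    function [T, Y] = rk4(f, a, b, y0, N)
        % Classical RK-4 for y' = f(t,y), y(a) = y0, on [a,b] with N steps.
        h = (b - a)/N;
        T = a + h*(0:N)'; Y = zeros(N+1,1); Y(1) = y0;
        for n = 1:N
            f1 = h*f(T(n), Y(n));
            f2 = h*f(T(n) + h/2, Y(n) + f1/2);
            f3 = h*f(T(n) + h/2, Y(n) + f2/2);
            f4 = h*f(T(n) + h,   Y(n) + f3);
            Y(n+1) = Y(n) + (f1 + 2*f2 + 2*f3 + f4)/6;
        end
    end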
5.4 Implicit methods
The explicit RK methods are nice because we simply 'iterate' to compute successive values,
ending up with ỹ_{j+1}. That is, all quantities in the formula can be computed explicitly.
However, we could also include ỹ_{j+1} in the formula. For a one step method, the
formula would then take the form

ỹ_{j+1} = ỹ_j + h ψ(t_j, ỹ_j, ỹ_{j+1}).
Nothing is different in the theory; the truncation error is calculated in the same way and the
same remarks on convergence apply. In practice, however, more work is required because
ỹ_{j+1} is defined implicitly by the formula. Define

g(z) = z − ỹ_j − h ψ(t_j, ỹ_j, z).

Then ỹ_{j+1} is a root of g(z) (which is computable for any z). Thus ỹ_{j+1} can be computed by
applying Newton's method (ideally) or some other root-finder to g(z).
Practical note: The obvious initial guess is ỹ_j, which is typically close to the root. If h is
small, then ỹ_j is almost guaranteed to be close to the root, and moreover

ỹ_{j+1} → ỹ_j as h → 0.

Thus, if Newton's method fails to converge, h can be reduced to make it work. Since the
initial guess is close, quadratic convergence ensures that the Newton iteration will only take
a few steps to achieve very high accuracy, so each step is only a few times more work than
an equally accurate explicit method.
You may wonder why we would bother with an implicit method when the explicit methods
are more efficient per step; the reason is that implicit methods have other desirable properties,
to be explored in the next section. For some problems, implicit methods can use much larger
values of h than explicit ones.
If Newton's method is used, the code must know f and ∂f/∂y. The function in Matlab may
be written, for instance, in the form

[T,Y] = beuler(f,fy,[a b],y0,h)

where f,fy are both functions of t and y. Internally, at step j we have

g(z) = z − ỹ_j − h f(t_{j+1}, z), g'(z) = 1 − h f_y(t_{j+1}, z),

and the Newton iteration is

z_{k+1} = z_k − g(z_k)/g'(z_k).

This is iterated until convergence; then ỹ_{j+1} is set to the resulting z.
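A sketch of such a function (the name and signature follow the text; the Newton tolerance and iteration cap are our own choices):

    function [T, Y] = beuler(f, fy, tspan, y0, h)
        % Backward Euler with a Newton solve at each step.
        a = tspan(1); b = tspan(2);
        N = round((b - a)/h);
        T = a + h*(0:N)'; Y = zeros(N+1,1); Y(1) = y0;
        for j = 1:N
            z = Y(j);                        % initial guess: previous value
            for k = 1:20                     % Newton iteration for g(z) = 0
                g  = z - Y(j) - h*f(T(j+1), z);
                gp = 1 - h*fy(T(j+1), z);
                dz = g/gp;
                z  = z - dz;
                if abs(dz) < 1e-12, break; end
            end
            Y(j+1) = z;
        end
    end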
Another implicit one step method worth noting is the trapezoidal method, ỹ_{j+1} = ỹ_j + (h/2)[f(t_j, ỹ_j) + f(t_{j+1}, ỹ_{j+1})]. Note that when f = f(t), this formula reduces to the composite trapezoidal rule.