Introduction to Numerical Methods and Analysis Python
Introduction to Numerical Methods and Analysis Python
I Frontmatter 3
1 Introduction 5
1.1 Topics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2 Some References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
II Main 7
2 Root-finding 9
2.1 Root Finding by Interval Halving (Bisection) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2 Solving Equations by Fixed Point Iteration (of Contraction Mappings) . . . . . . . . . . . . . . . . . . 18
2.3 Newton’s Method for Solving Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.4 Taylor’s Theorem and the Accuracy of Linearization . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.5 Measures of Error and Order of Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.6 The Convergence Rate of Newton’s Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
2.7 Root-finding Without Derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
i
5.3 Definite Integrals, Part 1: The Building Blocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
5.4 Definite Integrals, Part 2: The Composite Trapezoid and Midpoint Rules . . . . . . . . . . . . . . . . 189
5.5 Definite Integrals, Part 3: The (Composite) Simpson’s Rule and Richardson Extrapolation . . . . . . . 194
5.6 Definite Integrals, Part 4: Romberg Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
6 Minimization 201
6.1 Finding the Minimum of a Function of One Variable Without Using Derivatives — A Brief Introduction 201
6.2 Finding the Minimum of a Function of Several Variables — Coming Soon . . . . . . . . . . . . . . . 203
ii
15 Exercises on Approximating Derivatives, the Method of Undetermined Coefficients and Richardson Ex-
trapolation 323
15.1 Exercise 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323
15.2 Exercise 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323
15.3 Exercise 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324
15.4 Exercise 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324
15.5 Exercise 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324
15.6 Exercise 6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324
21 Python Variables, Including Lists and Tuples, and Arrays from Package Numpy 347
21.1 Foreword . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347
21.2 Numerical variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347
21.3 Text variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 348
21.4 Lists . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349
21.5 Tuples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351
21.6 Naming rules for variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 352
21.7 The immutability of tuples (and also of text strings) . . . . . . . . . . . . . . . . . . . . . . . . . . . 352
21.8 Numpy arrays: for vectors, matrices, and beyond . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353
iii
23.3 Note: With tuples, parentheses are optional . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 370
23.4 Single-member tuples: not an oxymoron . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 371
23.5 Documenting functions with triple quoted comments, and help** . . . . . . . . . . . . . . . . . . . 371
23.6 Exercise A. A robust function for solving quadratics . . . . . . . . . . . . . . . . . . . . . . . . . . . 372
23.7 Keyword arguments: specifying input values by name . . . . . . . . . . . . . . . . . . . . . . . . . . 373
23.8 Functions as input to other functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 374
23.9 Optional input arguments and default values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 375
23.10 Optional topic: anonymous functions, a.k.a. lambda functions . . . . . . . . . . . . . . . . . . . . . . 376
iv
30.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 425
34 Classes, Objects, Attributes, Methods: Very Basic Object-Oriented Programming in Python 447
34.1 Example A: Class VeryBasic3Vector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 447
34.2 Example B: Class BasicNVector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 448
34.3 Inheritence: new classes that build on old ones . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 450
V Appendices 459
36 Notebook for generating the module numericalMethods 461
36.1 Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 461
36.2 Zero Finding: solving 𝑓(𝑥) = 0 or 𝑔(𝑥) = 𝑥 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 461
36.3 Linear Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 469
36.4 Polynomial Collocation and Approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 475
36.5 Solving Initial Value Problems for Ordinary Differential Equations . . . . . . . . . . . . . . . . . . . 476
36.6 For some examples in Chapter Initial Value Problems for Ordinary Differential Equations . . . . . . . 486
37 Linear algebra algorithms using 0-based indexing and semi-open intervals 487
37.1 The naive Gaussian elimination algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 487
37.2 The LU factorization with 𝐿 unit lower triangular, by Doolittle’s direct method . . . . . . . . . . . . . 488
37.3 Forward substitution with a unit lower triangular matrix . . . . . . . . . . . . . . . . . . . . . . . . . 488
37.4 Backward substitution with an upper triangular matrix . . . . . . . . . . . . . . . . . . . . . . . . . . 489
37.5 Versions with maximal element partial pivoting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 490
37.6 Tridiagonal matrix algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 492
37.7 Banded matrix algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 493
v
38 Revision notes and plans 495
38.1 Recent changes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 495
38.2 To Do . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 495
39 Bibliography 497
Bibliography 499
vi
Introduction to Numerical Methods and Analysis with Python
Brenton LeMesurier College of Charleston, Charleston, South Carolina [email protected], with contributions by
Stephen Roberts (Australian National University)
Last Revised June 13, 2024
For notes on recent changes and plans for further revisions, see Revision notes and plans.
This is published at http://lemesurierb.people.cofc.edu/introduction-to-numerical-methods-and-analysis-python/
The primary language used for computational examples is Python and the related packages Numpy and Matplotlib, and it
also contains a tutorial on using Python with those packages; this is excerpted from the Jupyter book Python for Scientific
Computing by the same author.
I am working on an evolution of this to cover further topics, and with some more advanced material on analysis of methods,
to make it suitable for courses up to introductory graduate level.
There is also a parallel edition that presents examples using the Julia programming language in
place of Python; this can be found at the predictable location http://lemesurierb.people.cofc.edu/
introduction-to-numerical-methods-and-analysis-julia/
Both of these are based on Elementary Numerical Analysis with Python, my notes for the course Elementary Numerical
Analysis at the University of Northern Colorado in Spring 2021), in turn based in part on Jupyter notebooks and other
materials for the courses MATH 245, MATH 246, MATH 445 and MATH 545 at the College of Charleston, South
Carolina, as well as MATH 375 at the University of Northern Colorado.
CONTENTS 1
Introduction to Numerical Methods and Analysis with Python
2 CONTENTS
Part I
Frontmatter
3
CHAPTER
ONE
INTRODUCTION
This book addresses the design and analysis of methods for computing numerical values for solutions to mathematical
problems. Often, only accurate approximations are possible rather than exact solutions, so a key mathematical goal is to
assess the accuracy of such approximations.
Given that most numerical methods allow any degree of accuracy to be achieved by working hard enough, the next level
of analysis is assessing cost, or equivalently speed, or more generally the efficiency of resource usage. The most natural
question then is how much time and other resources are needed to achieve a given degree of accuracy.
1.1 Topics
5
Introduction to Numerical Methods and Analysis with Python
6 Chapter 1. Introduction
Part II
Main
7
CHAPTER
TWO
ROOT-FINDING
References:
• Section 1.1 The Bisection Method in Numerical Analysis by Sauer [Sauer, 2022]
• Section 2.1 The Bisection Method in Numerical Analysis by Burden, Faires and Burden [Burden et al., 2016]
(See the Bibliography.)
2.1.1 Introduction
One of the most basic tasks in numerical computing is finding the roots (or “zeros”) of a function — solving the equation
𝑓(𝑥) = 0 where 𝑓 ∶ ℝ → ℝ is a continuous function from and to the real numbers. As with many topics in this course,
there are multiple methods that work, and we will often start with the simplest and then seek improvement in several
directions:
• reliability or robustness — how good it is at avoiding problems in hard cases, such as division by zero.
• accuracy and guarantees about accuracy like estimates of how large the error can be — since in most cases, the
result cannot be computed exactly.
• speed or cost — often measure by minimizing the amount of arithmetic involved, or the number of times that a
function must be evaluated.
𝑓(𝑥) ∶= 𝑥 − cos 𝑥.
Also, note that | cos 𝑥| ≤ 1, so a solution to the original equation must have |𝑥| ≤ 1. So we will start graphing the function
on the interval [𝑎, 𝑏] = [−1, 1].
9
Introduction to Numerical Methods and Analysis with Python
# We will often need resources from the modules numpy and pyplot:
import numpy as np
import matplotlib.pyplot as plt
# We can also import items from a module individually, so they can be used by "first␣
↪name only".
# Here this is done for mathematical functions; in some later sections it will be␣
↪done for all imports.
def f(x):
return x - cos(x)
a = -1; b = 1
10 Chapter 2. Root-finding
Introduction to Numerical Methods and Analysis with Python
This shows that the zero lies between 0.5 and 0.75, so zoom in:
a = 0.5; b = 0.75
x = np.linspace(a, b)
plt.figure(figsize=(12,6))
plt.plot(x, f(x))
plt.plot([a, b], [0, 0], 'g')
plt.grid(True);
a = -1
b = 1
c = (a+b)/2
acb = [a, c, b]
plt.figure(figsize=(12,6))
plt.plot(acb, f(acb), 'b*')
# And just as a visual aid:
(continues on next page)
𝑓(𝑎) and 𝑓(𝑐) have the same sign, while 𝑓(𝑐) and 𝑓(𝑏) have opposite signs, so the root is in [𝑐, 𝑏]; update the a, b, c values
and plot again:
acb = [a, c, b]
x = np.linspace(a, b)
plt.figure(figsize=(12,6))
plt.plot(acb, f(acb), 'b*', x, f(x), 'b-.')
plt.plot([a, b], [0, 0], 'g')
plt.grid(True);
12 Chapter 2. Root-finding
Introduction to Numerical Methods and Analysis with Python
Again 𝑓(𝑐) and 𝑓(𝑏) have opposite signs, so the root is in [𝑐, 𝑏], and …
acb = [a, c, b]
x = np.linspace(a, b)
plt.figure(figsize=(12,6))
plt.plot(acb, f(acb), 'b*', x, f(x), 'b-.')
plt.plot([a, b], [0, 0], 'g')
plt.grid(True);
This time 𝑓(𝑎) and 𝑓(𝑐) have opposite sign, so the root is at left, in [𝑎, 𝑐]:
acb = [a, c, b]
x = np.linspace(a, b)
plt.figure(figsize=(12,6))
plt.plot(acb, f(acb), 'b*', x, f(x), 'b-.')
plt.plot([a, b], [0, 0], 'g')
plt.grid(True);
14 Chapter 2. Root-finding
Introduction to Numerical Methods and Analysis with Python
Now it is time to dispense with the graphs, and describe the procedure in mathematical terms:
• if 𝑓(𝑎) and 𝑓(𝑐) have opposite signs, the root is in interval [𝑎, 𝑐], which becomes the new version of interval [𝑎, 𝑏].
• otherwise, 𝑓(𝑐) and 𝑓(𝑏) have opposite signs, so the root is in interval [𝑐, 𝑏]
As a useful bridge from the mathematical desciption of an algorithm with words and formulas to actual executable code,
these notes will often describe algorithms in pseudo-code — a mix of words and mathematical formulas with notation that
somewhat resembles code in a language like Python.
This is also preferable to going straight to code in a particular language (such as Python) because it makes it easier if,
later, you wish to implement algorithms in a different programming language.
Note well one feature of the pseudo-code used here: assignment is denoted with a left arrow:
𝑥←𝑎
is the instruction to cause the value of variable x to become the current value of a.
This is to distinguish from
𝑥=𝑎
which is a comparison: the true-or-false assertion that the two quantities already have the same value.
Unfortunately however, Python (like most programming languages) does not use this notation: instead assignment is done
with x = a so that asserting equality needs a different notation: this is done with x == a; note well that double equal
sign!
Also, the pseudo-code marks the end of blocks like if, for and while with a line end. Many programming languages
do something like this, but Python does not: instead it uses only the end of indentation as the indication that a block is
finished.
With those notational issues out of the way, the key step in the bisection strategy is the update of the interval:
This needs to be repeated a finite number of times, and the simplest way is to specify the number of iterations. (We will
consider more refined methods soon.)
for i in range(N):
c = (a+b)/2
if f(a) * f(c) < 0:
b = c
else:
a = c
(If you wish to review for loops in Python, see the Python Review section on Iteration with for)
16 Chapter 2. Root-finding
Introduction to Numerical Methods and Analysis with Python
Exercise 1
Create a Python function bisection1 which implements the first algorithm for bisection above, which performs a fixed
number 𝑁 of iterations; the usage should be: root = bisection1(f, a, b, N)
Test it with the above example: 𝑓(𝑥) = 𝑥 − cos 𝑥 = 0, [𝑎, 𝑏] = [−1, 1]
(If you wish to review the defining and use of functions in Python, see the Python Review section on Defining and Using
Python Functions)
The above method of iteration for a fixed number of times is simple, but usually not what is wanted in practice. Instead, a
better goal is to get an approximation with a guaranteed maximum possible error: a result consisting of an approximation
𝑟 ̃ to the exact root 𝑟 and also a bound 𝐸𝑚𝑎𝑥 on the maximum possible error; a guarantee that |𝑟 − 𝑟|̃ ≤ 𝐸𝑚𝑎𝑥 . To put it
another way, a guarantee that the root 𝑟 lies in the interval [𝑟 ̃ − 𝐸𝑚𝑎𝑥 , 𝑟 ̃ + 𝐸𝑚𝑎𝑥 ].
In the above example, each iteration gives a new interval [𝑎, 𝑏] guaranteed to contain the root, and its midpoint 𝑐 =
(𝑎 + 𝑏)/2 is with a distance (𝑏 − 𝑎)/2 of any point in that interval, so at each iteration, we can have:
• 𝑟 ̃ is the current value of 𝑐 = (𝑎 + 𝑏)/2
• 𝐸𝑚𝑎𝑥 = (𝑏 − 𝑎)/2
The above algorithm can passively state an error bound, but it is better to be able to solve to a desired degree of accuracy;
for example, if we want a result “accurate to three decimal places”, we can specify 𝐸𝑚𝑎𝑥 ≤ 0.5 × 10−3 .
So our next goal is to actively set an accuracy target or error tolerance 𝐸𝑡𝑜𝑙 and keep iterating until it is met. This can be
achieved with a while loop; here is a suitable algorithm:
(If you wish to review while loops, see the Python Review section on Iteration with while)
Exercise 2
Create a Python function implementing this better algorithm, with usage root = bisection2(f, a, b, E_tol)
Test it with the above example: 𝑓(𝑥) = 𝑥 − cos 𝑥, [𝑎, 𝑏] = [−1, 1], this time accurate to within 10−4 .
Use the fact that there is a solution in the interval (−1, 1).
References:
• Sections 6.1.1 Euler’s Method in [Sauer, 2022]
• Section 5.2 Euler’s Method in [Burden et al., 2016]
• Sections 7.1 and 7.2 of [Chenney and Kincaid, 2012]
2.2.1 Introduction
In the next section we will meet Newton’s Method for Solving Equations for root-finding, which you might have seen in a
calculus course. This is one very important example of a more general strategy of fixed-point iteration, so we start with
that.
# We will often need resources from the modules numpy and pyplot:
import numpy as np
# We can also import items from a module individually, so they can be used by "first␣
↪name only".
# In this book this is done mostly for mathematical functions with familiar names:
from numpy import cos
# and some very frequently use functions for graphing:
from matplotlib.pyplot import figure, plot, title, legend, grid
𝑔(𝑝) = 𝑝
Such problems are interchangeable with root-finding. One way to convert from 𝑓(𝑥) = 0 to 𝑔(𝑥) = 𝑥 is defining
𝑔(𝑥) ∶= 𝑥 − 𝑤(𝑥)𝑓(𝑥)
a = -1
b = 1
x = np.linspace(a, b)
18 Chapter 2. Root-finding
Introduction to Numerical Methods and Analysis with Python
figure(figsize=(12,5))
title("$y = g_1(x) = \cos (x)$ and $y=x$")
plot(x, g_1(x))
plot(x, x)
grid(True)
The fixed point form can be convenient partly because we almost always have to solve by successive approximations, or
iteration, and fixed point form suggests one choice of iterative procedure: start with any first approximation 𝑥0 , and iterate
with
Proposition 1.2.1
If 𝑔 is continuous, and if the above sequence {𝑥0 , 𝑥1 , … } converges to a limit 𝑝, then that limit is a fixed point of function
𝑔: 𝑔(𝑝) = 𝑝.
That second “if” is a big one. Fortunately, it can often be resolved using the idea of a contraction mapping.
A mapping is sometimes thought of as moving a region 𝑆 within its domain 𝐷 to another such region, by moving each
point 𝑥 ∈ 𝑆 ⊂ 𝐷 to its image 𝑔(𝑥) ∈ 𝑔(𝑆) ⊂ 𝐷.
A very important case is mappings that shrink the region, by reducing the distance between points:
Proposition 1.2.2
Any continuous mapping on a closed interval [𝑎, 𝑏] has at least one fixed point.
Example 1.2.1
Let us illustrate this with the mapping 𝑔4 4(𝑥) ∶= 4 cos 𝑥, for which the fact that |𝑔4 (𝑥)| ≤ 4 ensures that this is a map
of the domain 𝐷 = [−4, 4] into itself:
20 Chapter 2. Root-finding
Introduction to Numerical Methods and Analysis with Python
figure(figsize=(8,8))
title("Fixed points of the map $g_4(x) = 4 \cos(x)$")
plot(x, g_4(x), label="$y=g_4(x)$")
plot(x, x, label="$y=x$")
legend()
grid(True);
This example has multiple fixed points (three of them). To ensure both the existence of a unique solution, and covergence
of the iteration to that solution, we need an extra condition.
Remark 1.2.1
|𝑔(𝑥) − 𝑔(𝑦)|
It is not enough to have |𝑔(𝑥) − 𝑔(𝑦)| < |𝑥 − 𝑦| or 𝐶 = 1! We need the ratio to be uniformly less than
|𝑥 − 𝑦|
one for all possible values of 𝑥 and 𝑦.
Proof. The main idea of the proof can be shown with the help of a few pictures.
First, uniqeness: between any two of the multiple fixed points above — call them 𝑝0 and 𝑝1 — the graph of 𝑔(𝑥) has
to rise with secant slope 1: (𝑔(𝑝1 ) − 𝑔(𝑝0 )/(𝑝1 − 𝑝0 ) = (𝑝1 − 𝑝0 )/(𝑝1 − 𝑝0 ) = 1, and this violates the contraction
property.
So instead, for a contraction, the graph of a contraction map looks like the one below for our favorite example, 𝑔(𝑥) =
cos 𝑥 (which we will soon verify to be a contraction on interval [−1, 1]):
The second claim, about convergence to the fixed point from any initial approximation 𝑥0 , will be verified below, once
we have seen some ideas about measuring errors.
With differentiable functions, the contraction condition can often be easily verified using derivatives:
Proof. Using the Mean Value Theorem, 𝑔(𝑥) − 𝑔(𝑦) = 𝑔′ (𝑐)(𝑥 − 𝑦) for some 𝑐 between 𝑥 and 𝑦. Then taking absolute
values,
22 Chapter 2. Root-finding
Introduction to Numerical Methods and Analysis with Python
The contraction constant 𝐶 as a measure of how fast the approximations improve (the smaller the
better)
It can be shown that if 𝐶 is small (at least when one looks only at a reduced domain |𝑥 − 𝑝| < 𝑅) then the convergence
is “fast” once |𝑥𝑘 − 𝑝| < 𝑅.
To see this, we define some jargon for talking about errors. (For more details on error concepts, see section Measures of
Error and Order of Convergence.)
In the case of 𝑥𝑘 as an approximation of 𝑝, we name the error 𝐸𝑘 ∶= 𝑥𝑘 − 𝑝. Then 𝐶 measures a worst case for how fast
the error decreases as 𝑘 increases, and this is “exponentially fast”:
Proposition 1.2.3
|𝐸𝑘+1 | ≤ 𝐶|𝐸𝑘 |, or |𝐸𝑘+1 |/|𝐸𝑘 | ≤ 𝐶, and so
|𝐸𝑘 | ≤ 𝐶 𝑘 |𝑥0 − 𝑝|
That is, the error decreases at worst in a geometric sequence, which is exponential decrease with respect to the variable
𝑘.
Proof. 𝐸𝑘+1 = 𝑥𝑘+1 − 𝑝 = 𝑔(𝑥𝑘 ) − 𝑔(𝑝), using 𝑔(𝑝) = 𝑝. Thus the contraction property gives
Remark 1.2.2
We will often use this “recursive” strategy of relating the error in one iterate to that in the previous iterate.
a = 0
b = 1
x = np.linspace(a, b)
iterations = 10
# Start at left
print(f"Solving x = cos(x) starting to the left, at x_0 = {a}")
x_k = a
figure(figsize=(8,8))
title(f"Solving $x = \cos x$ starting to the left, at $x_0$ = {a}")
plot(x, x, "g")
plot(x, g(x), "r")
grid(True)
for k in range(iterations):
g_x_k = g_1(x_k)
# Graph evalation of g(x_k) from x_k:
plot([x_k, x_k], [x_k, g(x_k)], "b")
x_k_plus_1 = g_1(x_k)
#Connect to the new x_k on the line y = x:
plot([x_k, g_1(x_k)], [x_k_plus_1, x_k_plus_1], "b")
# Update names: the old x_k+1 is the new x_k
x_k = x_k_plus_1
print(f"x_{k+1} = {x_k_plus_1}")
# Start at right
print(f"Solving x = cos(x) starting to the right, at x_0 = {b}")
x_k = b
figure(figsize=(8,8))
title(f"Solving $x = \cos(x)$ starting to the right, at $x_0$ = {b}")
plot(x, x, "g")
plot(x, g(x), "r")
grid(True)
for k in range(iterations):
g_x_k = g_1(x_k)
(continues on next page)
24 Chapter 2. Root-finding
Introduction to Numerical Methods and Analysis with Python
26 Chapter 2. Root-finding
Introduction to Numerical Methods and Analysis with Python
In each case, one gets a “box spiral” in to the fixed point. It always looks like this when 𝑔 is decreasing near the fixed
point.
If instead 𝑔 is increasing near the fixed point, the iterates approach monotonically, either from above or below:
a = 0
b = 3
x = np.linspace(a, b)
figure(figsize=(12,5))
title("$y = f_2(x) = x^2-5x+4$ and $y = 0$")
plot(x, f_2(x))
plot([a, b], [0, 0])
grid(True)
figure(figsize=(12,5))
title("$y = g_2(x) = (x^2 + 4)/5$ and $y=x$")
plot(x, g_2(x))
plot(x, x)
grid(True)
28 Chapter 2. Root-finding
Introduction to Numerical Methods and Analysis with Python
iterations = 10
# Start at left
a = 0.0
b = 2.0
x = np.linspace(a, b)
x_k = a
figure(figsize=(8,8))
title(f"Starting to the left, at x_0 = {a}")
grid(True)
plot(x, x, "g")
plot(x, g_2(x), "r")
for k in range(iterations):
g_x_k = g_2(x_k)
# Graph evalation of g(x_k) from x_k:
plot([x_k, x_k], [x_k, g_2(x_k)], "b")
x_k_plus_1 = g_2(x_k)
#Connect to the new x_k on the line y = x:
plot([x_k, g_2(x_k)], [x_k_plus_1, x_k_plus_1], "b")
# Update names: the old x_k+1 is the new x_k
x_k = x_k_plus_1
print(f"x_{k+1} = {x_k_plus_1}")
x_1 = 0.8
x_2 = 0.9280000000000002
x_3 = 0.9722368000000001
x_4 = 0.9890488790548482
x_5 = 0.9956435370319303
x_6 = 0.9982612105666906
x_7 = 0.9993050889044148
x_8 = 0.999722132142052
x_9 = 0.9998888682989302
x_10 = 0.999955549789623
# Start at right
a = 0.0
b = 2.0
x = np.linspace(a, b)
x_k = b
figure(figsize=(8,8))
title(f"Starting to the right, at x_0 = {b}")
grid(True)
plot(x, x, "g")
plot(x, g_2(x), "r")
for k in range(iterations):
g_x_k = g_2(x_k)
# Graph evalation of g(x_k) from x_k:
plot([x_k, x_k], [x_k, g_2(x_k)], "b")
x_k_plus_1 = g_2(x_k)
#Connect to the new x_k on the line y = x:
(continues on next page)
30 Chapter 2. Root-finding
Introduction to Numerical Methods and Analysis with Python
x_1 = 1.6
x_2 = 1.312
x_3 = 1.1442688
x_4 = 1.061870217330688
x_5 = 1.0255136716907844
x_6 = 1.010335658164943
x_7 = 1.0041556284319177
x_8 = 1.0016657052223
x_9 = 1.0006668370036975
x_10 = 1.000266823735797
2.2.3 Exercises
Exercise 1
The equation 𝑥3 − 2𝑥 + 1 = 0 can be written as a fixed point equation in many ways, including
𝑥3 + 1
1. 𝑥 =
2
and
√
3
2. 𝑥 = 2𝑥 − 1
For each of these options:
(a) Verify that its fixed points do in fact solve the above cubic equation.
32 Chapter 2. Root-finding
Introduction to Numerical Methods and Analysis with Python
(b) Determine whether fixed point iteration with it will converge to the solution 𝑟 = 1. (assuming a “good enough” initial
approximation).
Note: computational experiments can be a useful start, but prove your answers mathematically!
References:
• Sections 1.2 Fixed-Point Iteration and 1.4 Newton’s Method of [Sauer, 2022]
• Sections 2.2 Fixed-Point Iteration and 2.3 Newton’s Method and Its Extensions of [Burden et al., 2016]
2.3.1 Introduction
Newton’s method for solving equations has a number of advantages over the bisection method:
• It is usually faster (but not always, and it can even fail completely!)
• It can also compute complex roots, such as the non-real roots of polynomial equations.
• It can even be adapted to solving systems of non-linear equations; that topic wil be visited later.
# We will often need resources from the modules numpy and pyplot:
import numpy as np
from numpy import sin, cos
from matplotlib.pyplot import figure, plot, title, legend, grid
You might have previously seen Newton’s method derived using tangent line approximations. That derivation is presented
below, but first we approach it another way: as a particularly nice contraction mapping.
To compute a root 𝑟 of a differentiable function 𝑓, we design a contraction mapping for which the contraction constant
𝐶 becomes arbitrarily small when we restrict to iterations in a sufficiently small interval around the root: |𝑥 − 𝑟| ≤ 𝑅.
That is, the error ratio |𝐸𝑘+1 |/|𝐸𝑘 | becomes ever smaller as the iterations get closer to the exact solution; the error is thus
reducing ever faster than the above geometric rate 𝐶 𝑘 .
This effect is in turn achieved by getting |𝑔′ (𝑥)| arbitrarily small for |𝑥 − 𝑟| ≤ 𝑅 with 𝑅 small enough, and then using
the above connection between 𝑔′ (𝑥) and 𝐶. This can be achieved by ensuring that 𝑔′ (𝑟) = 0 at a root 𝑟 of 𝑓 — so long
as thr root 𝑟 is simple: 𝑓 ′ (𝑟) ≠ 0 (which is generically true, but not always).
To do so, seek 𝑔 in the above form 𝑔(𝑥) = 𝑥 − 𝑤(𝑥)𝑓(𝑥), and choose 𝑤(𝑥) appropriately. At the root 𝑟,
so we ensure 𝑔′ (𝑟) = 0 by requiring 𝑤(𝑟) = 1/𝑓 ′ (𝑟) (hence the problem if 𝑓 ′ (𝑟) = 0).
We do not know 𝑟, but that does not matter! We can just choose 𝑤(𝑥) = 1/𝑓 ′ (𝑥) for all 𝑥 values. That gives
# but for now I just want a simple presentation of the basic mathematical␣
↪idea.
dx = fx/Dfx
x -= dx # Aside: this is shorthand for "x = x - dx"
errorEstimate = abs(dx)
if demoMode:
print(f"At iteration {k+1} x = {x} with estimated error {errorEstimate:0.
↪3}, backward error {abs(f(x)):0.3}")
Example
Let’s start with our favorite equation, 𝑥 = cos 𝑥.
34 Chapter 2. Root-finding
Introduction to Numerical Methods and Analysis with Python
print()
print(f"The root is approximately {root}")
print(f"The estimated absolute error is {errorEstimate:0.3}")
print(f"The backward error is {abs(f_1(root)):0.3}")
print(f"This required {iterations} iterations")
Here we have introduced another way of talking about errors and accuracy, which is further discussed in the section on
Measures of Error and Order of Convergence.
This has the advantage that we can actually compute it without knowing the exact solution!
The backward error also has a useful geometrical meaning: if the function 𝑓 were changed by this much to a nearbly
function 𝑓 ̃ then 𝑥̃ could be an exact root of 𝑓.̃ Hence, if we only know the values of 𝑓 to within this backward error
(for example due to rounding error in evaluating the function) then 𝑥̃ could well be an exact root, so there is no point in
striving for greater accuracy in the approximate root.
We will see this in the next example.
Since this is a fixed point iteration with 𝑔(𝑥) = 𝑥 − (𝑥 − cos(𝑥)/(1 + sin(𝑥)), let us compare its graph to the ones seen
in the section on fixed point iteration. Now 𝑔 is neither increasing nor decreasing at the fixed point, so the graph has an
unusual form.
def g(x):
return x - (x - cos(x))/(1 + sin(x))
a = 0
b = 1
# Start at left
description = 'Starting near the left end of the domain'
print(description)
x_k = 0.1
print(f"x_0 = {x_k}")
figure(figsize=(8,8))
title(description)
grid(True)
plot(x, x, 'g')
plot(x, g(x), 'r')
for k in range(iterations):
g_x_k = g(x_k)
# Graph evalation of g(x_k) from x_k:
plot([x_k, x_k], [x_k, g(x_k)], 'b')
x_k_plus_1 = g(x_k)
#Connect to the new x_k on the line y = x:
plot([x_k, g(x_k)], [x_k_plus_1, x_k_plus_1], 'b')
# Update names: the old x_k+1 is the new x_k
x_k = x_k_plus_1
print(f"x_{k+1} = {x_k}")
36 Chapter 2. Root-finding
Introduction to Numerical Methods and Analysis with Python
# Start at right
description = 'Starting near the right end of the domain'
print(description)
x_k = 0.9
print(f"x_0 = {x_k}")
figure(figsize=(8,8))
title(description)
grid(True)
plot(x, x, 'g')
plot(x, g(x), 'r')
for k in range(iterations):
g_x_k = g(x_k)
# Graph evalation of g(x_k) from x_k:
plot([x_k, x_k], [x_k, g(x_k)], 'b')
x_k_plus_1 = g(x_k)
(continues on next page)
In fact, wherever you start, all iterations take you to the right of the root, and then approach the fixed point monotonically
38 Chapter 2. Root-finding
Introduction to Numerical Methods and Analysis with Python
— and very fast. We will see an explanation for this in The Convergence Rate of Newton’s Method.
print()
print(f"The root is approximately {root}")
print(f"The estimated absolute error is {errorEstimate}")
print(f"The backward error is {abs(f_1(root)):0.4}")
print(f"This required {iterations} iterations")
Observations:
• It only took one more iteration to meet the demand for twice as many decimal places of accuracy.
• The result is “exact” as fas as the computer arithmeric can tell, as shown by the zero backward error: we have
indeed reached the accuracy limits of computer arithmetic.
We will work almost entirely with real values and vectors in ℝ𝑛 , but actually, everything above also works for complex
numbers. In particular, Newton’s method works for finding roots of functions 𝑓 ∶ ℂ → ℂ; for example when seeking all
roots of a polynomial.
z = 3+4j
print(z)
print(abs(z))
(3+4j)
5.0
print(1j)
1j
print(-1j)
(-0-1j)
but:
print(j)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Input In [12], in <cell line: 1>()
----> 1 print(j)
j = 100
print(1j)
1j
print(j)
100
First, 𝑥0 = 1
print()
print(f"The first root is approximately {root1}")
print(f"The estimated absolute error is {errorEstimate1}")
print(f"The backward error is {abs(f_2(root1)):0.4}")
print(f"This required {iterations1} iterations")
40 Chapter 2. Root-finding
Introduction to Numerical Methods and Analysis with Python
print()
print(f"The second root is approximately {root2}")
print(f"The estimated absolute error is {errorEstimate2:0.3}")
print(f"The backward error is {abs(f_2(root2)):0.3}")
print(f"This required {iterations2} iterations")
√
This root is in fact −1 + 𝑖 3.
Finally, 𝑥0 = 1 − 𝑖
print()
print(f"The third root is approximately {root3}")
print(f"The estimated absolute error is {errorEstimate3}")
print(f"The backward error is {abs(f_2(root3)):0.4}")
print(f"This required {iterations3} iterations")
√
This root is in fact −1 − 𝑖 3.
The more traditional derivation of Newton’s method is based on the very widely useful idea of linearization; using the fact
that a differentiable function can be approximated over a small part of its domain by a straight line — its tangent line —
and it is easy to compute the root of this linear function.
So start with a first approximation 𝑥0 to a solution 𝑟 of 𝑓(𝑥) = 0.
Step 1: Linearize at 𝑥0 .
The tangent line to the graph of this function wih center 𝑥0 , also know as the linearization of 𝑓 at 𝑥0 , is
Hopefully, the two functions 𝑓 and 𝐿0 are close, so that the root of 𝐿0 is close to a root of 𝑓; close enough to be a better
approximation of the root 𝑟 than 𝑥0 is.
Give the name 𝑥1 to this root of 𝐿0 : it solves 𝐿0 (𝑥1 ) = 𝑓(𝑥0 ) + 𝑓 ′ (𝑥0 )(𝑥1 − 𝑥0 ) = 0, so
42 Chapter 2. Root-finding
Introduction to Numerical Methods and Analysis with Python
Step 3: Iterate
We can then use this new value 𝑥1 as the center for a new linearization 𝐿1 (𝑥) = 𝑓(𝑥1 ) + 𝑓 ′ (𝑥1 )(𝑥 − 𝑥1 ), and repeat to
get a hopefully even better approximate root,
And so on: at each step, we get from approximation 𝑥𝑘 to a new one 𝑥𝑘+1 with
And indeed this is the same formula seen above for Newton’s method.
Illustration: a few steps of Newton’s method for 𝑥 − cos(𝑥) = 0.
This approach to Newton’s method via linearization and tangent lines suggests another graphical presentation; again we
use the example of 𝑓(𝑥) = 𝑥 − cos(𝑥). This has 𝐷𝑓(𝑥) = 1 + sin(𝑥), so the linearization at center 𝑎 is
𝐿0 (𝑥) = −1 + 𝑥
figure(figsize=(12,6))
title('First iteration, from $x_0 = 0$')
left = -0.1
right = 1.1
x = np.linspace(left, right)
plot(x, f_1(x), label='$x - \cos(x)$')
plot([left, right], [0, 0], 'k', label="$x=0$") # The x-axis, in black
x_0 = 0
plot([x_0], [f_1(x_0)], 'g*')
plot(x, L_0(x), 'y', label='$L_0(x)$')
plot([x_0], [f_1(x_0)], 'g*')
x_1 = x_0 - f_1(x_0)/Df_1(x_0)
print(f'{x_1=}')
plot([x_1], [0], 'r*')
legend()
grid(True);
x_1=1.0
figure(figsize=(12,6))
title('Second iteration, from $x_1 = 1$')
# Shrink the domain
left = 0.7
right = 1.05
x = np.linspace(left, right)
x_2=0.7503638678402439
44 Chapter 2. Root-finding
Introduction to Numerical Methods and Analysis with Python
figure(figsize=(12,6))
title('Third iteration, from $x_2$')
# Shrink the domain some more
left = 0.735
right = 0.755
x = np.linspace(left, right)
For the bisection method, we have seen in Root Finding by Interval Halving a fairly simple way to get an upper limit on
the absolute error in the approximations.
For absolute guarantees of accuracy, things do not go quite as well for Newton’s method, but we can at least get a very
“probable” estimate of how large the error can be. This requires some calculus, and more specifically Taylor’s theorem,
reviewed in the section on Taylor’s Theorem.
So we will return to the question of both the speed and accuracy of Newton’s method in The Convergence Rate of Newton’s
Method.
On the other hand, the example graphs above illustrate that the successive linearizations become ever more accurate as
approximations of the function 𝑓 itself, so that the approximation 𝑥3 looks “perfect” on the graph — the speed of Newton’s
method looks far better than for bisection. This will also be explained in the section on The Convergence Rate of Newton’s
Method.
2.3.6 Exercises
Exercise 1
𝑓(𝑥) = 𝑥𝑘 − 𝑎
46 Chapter 2. Root-finding
Introduction to Numerical Methods and Analysis with Python
Exercise 2
(The last input parameter maxIterations could be optional, with a default like maxIterations=100.)
b) based on your function bisection2 create a third (and final!) version with usage
𝑓1 (𝑥) = 10 − 2𝑥 + sin(𝑥) = 0
Again graph the function, to find a good starting interval [𝑎, 𝑏] and initial approximation 𝑥0 .
e) This second case will behave differently than for 𝑓1 in part (c): describe the difference. (We will discuss the reasons
in class.)
References:
• Theorem 0.8 in Section 0.5 Review of Calculus in [Sauer, 2022].
• Section 1.1 Review of Calculus in [Burden et al., 2016], from Theorem 1.14 onward.
1 𝑓 (𝑘) (𝑎)
𝑓(𝑥) = 𝑓(𝑎) + 𝑓 ′ (𝑎)(𝑥 − 𝑎) + 𝑓 ″ (𝑎)(𝑥 − 𝑎)2 ⋯ (𝑥 − 𝑎)𝑘 + ⋯
2 𝑘! (2.1)
𝑓 (𝑛) (𝑎)
+ (𝑥 − 𝑎)𝑛 + 𝑅𝑛 (𝑥)
𝑛!
𝑓 (𝑛+1) (𝑐𝑥 )
𝑅𝑛 (𝑥) = (𝑥 − 𝑎)𝑛+1 (2.3)
(𝑛 + 1)!
with the value 𝑐𝑥 lying between 𝑎 and 𝑥, and so depending on 𝑥.
This gives information about the absolute error in the polynomial 𝑇𝑛 (𝑥) as an approximation of 𝑓(𝑥):
𝑀𝑛+1
|𝑓(𝑥) − 𝑇𝑛 (𝑥)| ≤ |𝑥 − 𝑎|𝑛+1
(𝑛 + 1)!
where 𝑀𝑛+1 is the maximum absolute value of 𝑓 (𝑛+1) over the relevant interval between 𝑎 and 𝑥.
Of course we typically do not know much about that constant 𝑀𝑛+1 , so often the most important thing is the power law
rate |𝑥 − 𝑎|𝑛+1 at which the error reduces as 𝑥 approaches 𝑎.
Taylor polynomials are therefore most useful when the quantity ℎ ∶= 𝑥 − 𝑎 is small, and we will most often use them in
situations where the limit as ℎ → 0 is relevant. It is convenient to change the notation a bit, treating ℎ as the variable:
A very common use of Taylor’s Theorem is the rather simple case 𝑛 = 1; linearization, to approximate a twice difer-
entiable function by a linear one. (This will be even more so when we come to system of equations, since the only such
systems that we can systematically solve exactly are linear systems.)
Taylor’s Theorem for the linearization 𝐿(𝑥) = 𝑓(𝑎) + 𝑓 ′ (𝑎)(𝑥 − 𝑎) of 𝑓 at 𝑎 then says that
𝑓 ″ (𝑐𝑥 ) 2
𝑓(𝑥) − 𝐿(𝑥) = ℎ , |𝑐𝑥 − 𝑎| < |𝑥 − 𝑎| (2.7)
2
48 Chapter 2. Root-finding
Introduction to Numerical Methods and Analysis with Python
or in terms of ℎ,
𝑓 ″ (𝑐ℎ ) 2
𝑓(𝑎 + ℎ) = 𝑓(𝑎) + 𝑓 ′ (𝑎)ℎ + ℎ , |𝑐ℎ − 𝑎| < |ℎ| (2.8)
2
Thus there is an error bound
𝑀2 2
|𝑓(𝑎 + ℎ) − (𝑓(𝑎) + 𝑓 ′ (𝑎)ℎ)| ≤ ℎ , where 𝑀2 = max |𝑓 ″ (𝑥)| (2.9)
2 |𝑥−𝑎|<|ℎ|
Of course sometimes it is enough to use the maximum over the whole domain, 𝑀2 = max |𝑓 ″ (𝑥)|.
References:
• Section 1.3.1 Forward and backward error of [Sauer, 2022], on measures of error;
• Section 2.4 Error Analysis for Iterative Methods of [Burden et al., 2016], on order of convergence.
These notes cover a number of small topics:
• Measures of error: absolute, relative, forward, backward, etc.
• Measuring the rate of convergence of a sequence of approximations.
• Big-O and little-o notation for describing how small a quantity (usually an error) is.
Several of these have been mentioned before, but they are worth gathering here.
Consider a quantity 𝑥̃ considered as an approximation of an exact value 𝑥. (This can be a number or a vector.)
For real-valued quantities, the absolute error is related to the number of correct decimal places: 𝑝 decimal places of
accuracy corresponds roughly to absolute error no more than 0.5 × 10−𝑝 .
This is often more relevant than absolute error for inherently positive quantities, but is obviously unwise where 𝑥 = 0 is a
possibility. For real-valued quantities, this is related to the number of significant digits: accuracy to 𝑝 significant digits
corresponds roughly to relative error no more than 0.5 × 10−𝑝 .
When working with computer arithmetic, 𝑝 significant bits corresponds to relative error no more than 2−(𝑝+1) .
An obvious problem is that we usually do not know the exact solution 𝑥, so cannot evaluate any of these; instead we
typically seek upper bounds on the absolute or relative error. Thus, when talking of approximate solutions to an equation
𝑓(𝑥) = 0 the concept of Definition 1.3.1 backward error introduced in the section on Newton’s Method for Solving
Equations can be very useful, for example as a step in getting bounds on the size of the error; to recap
For the case of solving simultaneous linear equations in matrix-vector form 𝐴𝑥 = 𝑏, this is 𝑏 − 𝐴𝑥,̃ also known as the
residual.
For the case of solving simultaneous linear equations in matrix-vector form 𝐴𝑥 = 𝑏, this is ‖𝑏 − 𝐴𝑥‖,
̃ also known as the
residual norm.
Remark 1.5.1
• One obvious advantage of the backward error concept is that you can actually evaluate it without knowing the exact
solution 𝑥.
• Also, one significance of backward error is that if the values of 𝑓(𝑥) are only known to be accurate within an
absolute error of 𝐸 then any approximation with absolute backward error less than 𝐸 could in fact be exact, so
there is no point in seeking greater accuracy.
• The names forward error and absolute forward error are sometimes used as synonyms for error etc. as defined
above, when they need to be distinguished from backward errors.
Definition 1.5.6
We have seen that for the sequence of approximations 𝑥𝑘 to a quantity 𝑥 given by the fixed point iteration 𝑥𝑘+1 = 𝑔(𝑥𝑘 ),
the absolute errors 𝐸𝑘 ∶= |𝑥𝑘 − 𝑥| typically have
𝐸𝑘+1
→ 𝐶 = |𝑔′ (𝑥)|.
𝐸𝑘
50 Chapter 2. Root-finding
Introduction to Numerical Methods and Analysis with Python
so that eventually the errors diminsh in a roughly geometric fashion: 𝐸𝑘 ≈ 𝐾𝐶 𝑘 . This is called linear convergence.
Aside: Why “linear” rather than “geometric”? Because there is an approximately linear relationship between consecutive
error values,
𝐸𝑛+1 ≈ 𝐶𝐸𝑛 .
This is a very common behavior for iterative numerical methods, but we will also see that a few methods do even better;
for example, when Newton’s method converges to a simple root 𝑟 of 𝑓 (one with 𝑓 ′ (𝑟) ≠ 0)
𝐸𝑘+1 ≈ 𝐶𝐸𝑘2
We have already observed experimentally the intermediate result that “𝐶 = 0” for Newton’s method in this case; that is,
𝐸𝑘+1
→ 0. (2.10)
𝐸𝑘
For most practical purposes, if you have established super-linear convergence, you can be happy, and not worry much
about refinements like the particular order 𝑝.
Consider the error formula for approximation of a function 𝑓 with the Taylor polynomial of degree 𝑛, center 𝑎:
𝑀𝑛+1
|𝑓(𝑎 + ℎ) − 𝑇𝑛 (ℎ)| ≤ |ℎ|𝑛+1 where 𝑀𝑛+1 = max |𝑓 (𝑛+1) (𝑥)|.
(𝑛 + 1)!
Since the coefficient of ℎ𝑛+1 is typicaly not known in practice, it is wise to focus on the power law part, and for this the
“big-O” and little-o” notation is convenient.
If a function 𝐸(ℎ) goes to zero at least as fast as ℎ𝑝 , we say that it is of order ℎ𝑝 , written 𝑂(ℎ𝑝 ).
More precisely, 𝐸(ℎ) is no bigger than a multiple of ℎ𝑝 for ℎ small enough; that is, there is a constant 𝐶 such that for
some positive number 𝛿
|𝐸(ℎ)|
≤ 𝐶 for |ℎ| < 𝛿.
|ℎ|𝑝
Another way to say this is in terms of the lim-sup, if you have seen that jargon:
|𝐸(ℎ)|
lim sup is finite.
ℎ→0 |ℎ|𝑝
This can be used to rephrase the above Taylor’s theorem error bound as
or
Sometimes it is enough to say that some error term is small enough to be neglected, at least when ℎ is close enough to
zero. For example, with a Taylor series we might be able to neglect the powers of 𝑥 − 𝑎 or of ℎ higher than 𝑝.
We will thus say that a quantity 𝐸(ℎ) is small of order ℎ𝑝 , written 𝑜(ℎ𝑝 ) when
|𝐸(ℎ)|
lim = 0.
ℎ→0 |ℎ|𝑝
Note the addition of the word small compared to the above description of the big-O case!
With this, the Taylor’s theorem error bound can be stated as
or
References:
• Section 1.4.1 Quadratic Convergence of Newton’s Method in [Sauer, 2022].
• Theorem 2.9 in Section 2.4 Error Analysis of Iterative Methods in [Burden et al., 2016], but done quite differently.
Jumping to the punch line, we will see that when the iterates 𝑥𝑘 given by Newton’s method converge to a simple root 𝑟
(that is, a solution of 𝑓(𝑟) = 0 with 𝑓 ′ (𝑟) ≠ 0) then the errors 𝐸𝑘 = 𝑥𝑘 − 𝑟 satisfy
In words, the error at each iteration is of the order of the square of the previous error, and so is small of order the previous
error.
52 Chapter 2. Root-finding
Introduction to Numerical Methods and Analysis with Python
(Yes, it this a slight abuse of the notation as defined above, but all will become clear and rigorous soon.)
The first key step is getting a recursive relationship between consecutive errors 𝐸𝑘 and 𝐸𝑘+1 from the recursion formula
for Newton’s method,
𝑓(𝑥𝑘 )
𝑥𝑘+1 = 𝑥𝑘 − .
𝑓 ′ (𝑥𝑘 )
Start by subtracting 𝑟:
𝑓(𝑥𝑘 ) 𝑓(𝑥 )
𝐸𝑘+1 = 𝑥𝑘+1 − 𝑟 = 𝑥𝑘 − ′
− 𝑟 = 𝐸𝑘 − ′ 𝑘
𝑓 (𝑥𝑘 ) 𝑓 (𝑥𝑘 )
The other key step is to show that the two terms at right are very close, using the linearization of 𝑓 at 𝑥𝑘 with the error
𝐸𝑘 as the small term ℎ, and noting that 𝑟 = 𝑥𝑘 − 𝐸𝑘 :
0 = 𝑓(𝑟) = 𝑓(𝑥𝑘 − 𝐸𝑘 ) = 𝑓(𝑥𝑘 ) − 𝑓 ′ (𝑥𝑘 )𝐸𝑘 + 𝑂(𝐸𝑘2 )
Solve for 𝑓(𝑥𝑘 ) to insert into the numerator above: 𝑓(𝑥𝑘 ) = 𝑓 ′ (𝑥𝑘 )𝐸𝑘 + 𝑂(𝐸𝑘2 ). (There is no need for a minus sign on
that last term; big-O terms can be of either sign, and this new one is a different but still small enough quantity!)
Inserting above,
𝑓 ′ (𝑥𝑘 )𝐸𝑘 + 𝑂(𝐸𝑘2 ) 𝑂(𝐸 2 ) 𝑂(𝐸 2 ) 𝑂(𝑒2 )
𝐸𝑘+1 = 𝐸𝑘 − ′
= 𝐸𝑘 − 𝐸𝑘 + ′ 𝑘 = ′ 𝑘 → ′ 𝑘 = 𝑂(𝐸𝑘2 )
𝑓 (𝑥𝑘 ) 𝑓 (𝑥𝑘 ) 𝑓 (𝑥𝑘 ) 𝑓 (𝑟)
As 𝑘 → ∞, 𝑓 ′ (𝐸𝑘 ) → 𝑓 ′ (𝑟) ≠ 0, so the term at right is still no larger than a multiple of 𝐸𝑘2 : it is 𝑂(𝐸𝑘2 ), as claimed.
If you wish to verify this more carefully, note that
𝑀 2
• this 𝑂(𝐸𝑘2 ) term is no bigger than 2 𝐸𝑘 where 𝑀 is an upper bound on |𝑓 ″ (𝑥)|, and
• once 𝐸𝑘 is small enough, so that 𝑥𝑘 is close enough to 𝑟, |𝑓 ′ (𝑥𝑘 )| ≥ |𝑓 ′ (𝑟)|/2.
𝑂(𝐸𝑘2 ) 𝑀 /2 𝑀
Thus the term has magnitude no bigger than ′ 𝐸2 = ′ 𝐸 2 , which meets the definition of being of
𝑓 ′ (𝑥𝑘 ) |𝑓 (𝑟)|/2 𝑘 |𝑓 (𝑟)| 𝑘
order 𝐸𝑘2 .
A more careful calculation actually shows that
|𝐸𝑘+1 | 𝑓 ″ (𝑟)
lim 2
= ∣ ′ ∣,
𝑘→∞ 𝐸𝑘 2𝑓 (𝑟)
which is the way that this result is often stated in texts. For either form, it then easily follows that
|𝐸𝑘+1 |
lim = 0,
𝑘→∞ |𝐸𝑘 |
giving the super-linear convergence already seen using the Contraction Mapping Theorem, now restated as 𝐸𝑘+1 = 𝑜(𝐸𝑘 ).
One problem for Newton’s Method (and many other numerical methods we will see) is that there is not a simple way
to get a guaranteed upper bound on the absolute error in an approximation. Our best hope is finding an interval that
is guaranteed to contain the solution, as the Bisection Method does, and we can sometimes also do that with Newton’s
Method for a real root. But that approach fails as soon as the solution is a complex number or a vector.
Fortunately, when convergnce is “fast enough” is some sense, the following heuristic or “rule of thumb” applies in many
cases:
The error in the latest approximation is typically smaller than the difference between the two most recent approximations.
When combined with the backward error, this can give a fairly reliable measure of accuracy, and so can serve as a fairly
reliable stopping condition for the loop in an iterative calculation.
Proposition
For the iterations 𝑥𝑘 given by a contraction mapping that has 𝐶 ≤ 1/2,
or in words the error in 𝑥𝑘 is smaller than the change from 𝑥𝑘−1 to 𝑥𝑘 , so the above guideline is valid.
Proposition
For a super-linearly convergent iteration, eventually |𝐸𝑘+1 |/|𝐸𝑘 | < 1/2, and from that point onwards in the iterations,
the above applies again.
References:
• Section 1.5.1 Secant Method and variants in [Sauer, 2022]
• Section 2.3 Newton’s Method and it Extensions in [Burden et al., 2016]; just the later sub-sections, on The Secant
Method and The Method of False Position).
2.7.1 Introduction
We have already seen one method for solving 𝑓(𝑥) = 0 without needing to know any derivatives of 𝑓: the Bisection
Method, a.k.a. Interval Halving. However, we have also seen that that method is far slower then Newton’s Method.
Here we explore methods that are almost the best of both worlds: about as fast as Newton’s method but not needing
derivatives.
The first of these is the Secant Method. Later in this course we will see how this has been merged with the Bisection
Method and Polynomial Interpolation to produce the current state-of-the-art approach; only perfected in the 1960’s.
# We will often need resources from the modules numpy and pyplot:
import numpy as np
from numpy import abs, cos
54 Chapter 2. Root-finding
Introduction to Numerical Methods and Analysis with Python
One quirk of the Bisection Method is that it only uses the sign of the values 𝑓(𝑎) and 𝑓(𝑏), not their magnitudes. If one
of these is far smaller than the other, one might guess that the root is closer to that end of the interval. This leads to the
idea of:
• starting with an interval [𝑎, 𝑏] known to contain a zero of 𝑓,
• connecting the two points (𝑎, 𝑓(𝑎)) and (𝑏, 𝑓(𝑏)) with a straight line, and
• finding the 𝑥-value 𝑐 where this line crosses the 𝑥-axis. In the words, aproximating the function by a secant line, in
place of the tangent line used in Newton’s Method.
The next step requires some care. The first idea (from almost a millenium ago) was to use this new approximation 𝑐 as
done with bisection: check which of the intervals [𝑎, 𝑐] and [𝑐, 𝑏] has the sign change and use it as the new interval [𝑎, 𝑏];
this is called The Method of False Position (or Regula Falsi, since the academic world used latin in those days.)
The secant line between (𝑎, 𝑓(𝑎)) and (𝑏, 𝑓(𝑏)) is
𝑓(𝑎)(𝑏 − 𝑥) + 𝑓(𝑏)(𝑥 − 𝑎)
𝐿(𝑥) =
𝑏−𝑎
and its zero is at
𝑎𝑓(𝑏) − 𝑓(𝑎)𝑏
𝑐=
𝑓(𝑏) − 𝑓(𝑎)
This is easy to implement, and an example will show that it sort of works, but with a weakness that hampers it a bit:
Remark 1.7.1
For a more concise presentation, you could omit the above def and instead import this function with
Iteration 0:
The root is in interval [0.5403023058681398, 1]
The new approximation is 0.5403023058681398, with error bound 0.4597, backward␣
↪error 0.3173
Iteration 1:
The root is in interval [0.7280103614676172, 1]
The new approximation is 0.7280103614676172, with error bound 0.272, backward␣
↪error 0.01849
Iteration 2:
The root is in interval [0.7385270062423998, 1]
The new approximation is 0.7385270062423998, with error bound 0.2615, backward␣
↪error 0.000934
Iteration 3:
The root is in interval [0.7390571666782676, 1]
The new approximation is 0.7390571666782676, with error bound 0.2609, backward␣
↪error 4.68e-05
Iteration 4:
The root is in interval [0.7390837322783136, 1]
The new approximation is 0.7390837322783136, with error bound 0.2609, backward␣
↪error 2.345e-06
Iteration 5:
The root is in interval [0.7390850630385933, 1]
The new approximation is 0.7390850630385933, with error bound 0.2609, backward␣
↪error 1.174e-07
Iteration 6:
(continues on next page)
56 Chapter 2. Root-finding
Introduction to Numerical Methods and Analysis with Python
Iteration 7:
The root is in interval [0.7390851330390691, 1]
The new approximation is 0.7390851330390691, with error bound 0.2609, backward␣
↪error 2.947e-10
Iteration 8:
The root is in interval [0.7390851332063397, 1]
The new approximation is 0.7390851332063397, with error bound 0.2609, backward␣
↪error 1.476e-11
Iteration 9:
The root is in interval [0.7390851332147188, 1]
The new approximation is 0.7390851332147188, with error bound 0.2609, backward␣
↪error 7.394e-13
Iteration 10:
The root is in interval [0.7390851332151385, 1]
The new approximation is 0.7390851332151385, with error bound 0.2609, backward␣
↪error 3.708e-14
Iteration 11:
The root is in interval [0.7390851332151596, 1]
The new approximation is 0.7390851332151596, with error bound 0.2609, backward␣
↪error 1.776e-15
Iteration 12:
The root is in interval [0.7390851332151606, 1]
The new approximation is 0.7390851332151606, with error bound 0.2609, backward␣
↪error 1.11e-16
Iteration 13:
The root is in interval [0.7390851332151607, 1]
The new approximation is 0.7390851332151607, with error bound 0.2609, backward␣
↪error 0.0
Iteration 14:
The root is in interval [0.7390851332151607, 1]
The new approximation is 0.7390851332151607, with error bound 0.2609, backward␣
↪error 0.0
The good news is that the approximations are approaching the zero reasonably fast — far faster than bisection — as
indicated by the backward errors improving by a factor of better than ten at each iteration.
The bad news is that one end gets “stuck”, so the interval does not shrink on both sides, and the error bound stays large.
This behavior is generic: with function 𝑓 of the same convexity on the interval [𝑎, 𝑏], the secant line will always cross on
the same side of the zero, so that one end-point persists; in this case, the curve is concave up, so the secant line always
crosses to the left of the root, as seen in the following graphs.
"""
fa = f(a)
fb = f(b)
for iteration in range(maxIterations):
c = (a * fb - fa * b)/(fb - fa)
fc = f(c)
abc = [a,b, c]
left = np.min(abc)
right = np.max(abc)
x = np.linspace(left, right)
figure(figsize=[12,5])
title(f"Iteration {iteration+1}, Method of False Position")
xlabel("$x$")
plot(x, f(x))
plot([left, right], [f(left), f(right)]) # the secant line
plot([left, right], [0, 0], 'k') # the x-axis line
plot(abc, f(abc), 'r*')
#show() # The Windows version of JupytLab might need this command
if fa * fc < 0:
b = c
fb = fc # N.B. When b is updated, so must be fb = f(b)
else:
a = c
fa = fc
58 Chapter 2. Root-finding
Introduction to Numerical Methods and Analysis with Python
Refinement: Alway Use the Two Most Recent Approximations — The Secant Method
The basic solution is to always discard the oldest approximation — at the cost of not always having the zero surrounded!
This gives the Secant Method.
For a mathemacal description, one typically enumerates the successive approximations as 𝑥0 , 𝑥1 , etc., so the notation
above gets translated with 𝑎 → 𝑥𝑘−2 , 𝑏 → 𝑥𝑘−1 , 𝑐 → 𝑥𝑘 ; then the formula becomes the recursive rule
60 Chapter 2. Root-finding
Introduction to Numerical Methods and Analysis with Python
Instead, we use the magnitude of 𝑏 − 𝑎 which is now |𝑥𝑘 − 𝑥𝑘−1 |, and this is only an estimate of the error. This is the
same as used for Newton’s Method; as there, it is still useful as a condition for ending the iterations and indeed tends to
be pessimistic, so that we typically do one more iteration than needed — but it is not on its own a complete guarantee of
having achieved the desired accuracy.
We could write Python code that closely follows this notation, accumulating a list of the values 𝑥𝑘 .
However, since we only ever need the two most recent values to compute the new one, we can instead just store these
three, in the same way that we recylced the variables a, b and c. Here I use more descriptive names though:
f_x_new = f(x_new)
(x_older, x_more_recent) = (x_more_recent, x_new)
(f_x_older, f_x_more_recent) = (f_x_more_recent, f_x_new)
errorEstimate = abs(x_older - x_more_recent)
if demoMode:
(continues on next page)
Note: As above, you could omit the above def and instead import this function with
Iteration 0:
The latest pair of approximations are 1 and 0.5403023058681398,
where the function's values are 0.4597 and -0.3173 respectively.
The new approximation is 0.5403023058681398, with estimated error 0.4597, backward␣
↪error 0.3173
Iteration 1:
The latest pair of approximations are 0.5403023058681398 and 0.7280103614676172,
where the function's values are -0.3173 and -0.01849 respectively.
The new approximation is 0.7280103614676172, with estimated error 0.1877, backward␣
↪error 0.01849
Iteration 2:
The latest pair of approximations are 0.7280103614676172 and 0.7396270126307336,
where the function's values are -0.01849 and 0.000907 respectively.
The new approximation is 0.7396270126307336, with estimated error 0.01162,␣
↪backward error 0.000907
Iteration 3:
The latest pair of approximations are 0.7396270126307336 and 0.7390838007832722,
where the function's values are 0.000907 and -2.23e-06 respectively.
The new approximation is 0.7390838007832722, with estimated error 0.0005432,␣
↪backward error 2.23e-06
Iteration 4:
The latest pair of approximations are 0.7390838007832722 and 0.7390851330557805,
where the function's values are -2.23e-06 and -2.667e-10 respectively.
The new approximation is 0.7390851330557805, with estimated error 1.332e-06,␣
↪backward error 2.667e-10
Iteration 5:
The latest pair of approximations are 0.7390851330557805 and 0.7390851332151607,
(continues on next page)
62 Chapter 2. Root-finding
Introduction to Numerical Methods and Analysis with Python
Iteration 6:
The latest pair of approximations are 0.7390851332151607 and 0.7390851332151607,
where the function's values are 0.0 and 0.0 respectively.
The new approximation is 0.7390851332151607, with estimated error 0.0, backward␣
↪error 0.0
"""
x_older = a
x_more_recent = b
f_x_older = f(x_older)
f_x_more_recent = f(x_more_recent)
for iteration in range(maxIterations):
x_new = (x_older * f_x_more_recent - f_x_older * x_more_recent)/(f_x_more_
↪recent - f_x_older)
f_x_new = f(x_new)
latest_three_x_values = [x_older, x_more_recent, x_new]
left = np.min(latest_three_x_values)
right = np.max(latest_three_x_values)
x = np.linspace(left, right)
figure(figsize=[12,5])
title(f"Iteration {iteration+1}, Secant Method")
xlabel("$x$")
plot(x, f(x))
plot([left, right], [f(left), f(right)]) # the secant line
plot([left, right], [0, 0], 'k') # the x-axis line
plot(latest_three_x_values, f(latest_three_x_values), 'r*')
# show() # The Windows version of JupytLab might need this command
(x_older, x_more_recent) = (x_more_recent, x_new)
(f_x_older, f_x_more_recent) = (f_x_more_recent, f_x_new)
errorEstimate = abs(x_older - x_more_recent)
64 Chapter 2. Root-finding
Introduction to Numerical Methods and Analysis with Python
Observations
• This converges faster than the First Attempt: The Method of False Position (and far faster than Bisection).
• The majority of iterations do have the root surrounded (sign-change in 𝑓), but every third one — the second and
fifth — do not.
• Comparing the error estimate to the backward error, the error estmte is in fact quite pessimistic (and so fairly
trustworthy); in fact, it is typically of similar size to the backward error at the previous iteration.
The last point is a quite common occurence: the available error estimates are often “trailing indicators”, closer to the error
in the previous approximation in an iteration. For example, recall that we saw the same thing with Newton’s Method when
we used |𝑥𝑘 − 𝑥𝑘−1 | to estimate the error 𝐸𝑘 ∶= 𝑥𝑘 − 𝑟 and saw that it is in fact closer to the previous error, 𝐸𝑘−1 .
66 Chapter 2. Root-finding
CHAPTER
THREE
References:
• Section 2.1.1 Naive Gaussian elimination of [Sauer, 2022].
• Section 6.1 Linear Systems of Equations of [Burden et al., 2016].
• Section 7.1 of [Chenney and Kincaid, 2012].
3.1.1 Introduction
The problem of solving a system of 𝑛 simultaneous linear equations in 𝑛 unknowns, with matrix-vector form 𝐴𝑥 = 𝑏, is
quite thoroughly understood as far as having a good general-purpose methods usable with any 𝑛 × 𝑛 matrix 𝐴: essentially,
Gaussian elimination (or row-reduction) as seen in most linear algebra courses, combined with some modifications to stay
well away from division by zero: partial pivoting. Also, good robust software for this general case is readily available, for
example in the Python packages NumPy and SciPy.
Nevertheless, this basic algorithm can be very slow when 𝑛 is large – as it often is when dealing with differential equations
(even more so with partial differential equations). We will see that it requires about 𝑛3 /3 arithmetic operations.
Thus I will summarise the basic method of row reduction or Gaussian elimination, and then build on it with methods for
doing things more robustly, and then with methods for doing it faster in some important special cases:
1. When one has to solve many systems 𝐴𝑥(𝑚) = 𝑏(𝑚) with the same matrix 𝐴 but different right-hand side vectors
𝑏(𝑚) .
2. When 𝐴 is banded: most elements are zero, and all the non-zero elements 𝑎𝑖,𝑗 are near the main diagonal: |𝑖 − 𝑗|
is far less than 𝑛. (Aside on notation: “far less than” is sometimes denoted ≪, as in |𝑖 − 𝑗| ≪ 𝑛.)
3. When 𝐴 is strictly diagonally dominant: each diagonal element 𝑎𝑖,𝑖 is larger in magnitude that the sum of the
magnitudes of all other elements in the same row.
Other cases not (yet) discussed in this text are
4. When 𝐴 is positive definite: symmetric (𝑎𝑖,𝑗 = 𝑎𝑗,𝑖 ) and with all eigenvalues positive. This last condition would
seem hard to verify, since computing all the eigenvalues of 𝐴 is harder that solving 𝐴𝑥 = 𝑏, but there are important
situations where this property is automatically guaranteed, such as with Galerkin and finite-element methods for
solving boundary value problems for differential equations.
5. When 𝐴 is sparse: most elements are zero, but not necessarily with all the non-zero elements near the main diagonal.
67
Introduction to Numerical Methods and Analysis with Python
Just as package numpy is used so often that there is a conventional nickname np, so numpy.linalg is usually nick-
named la.
import numpy as np
import numpy.linalg as la
# As in recent sections, we import some items from modules individually, so they can␣
↪be used by "first name only".
3.1.2 Strategy for getting from mathematical facts to a good algorithm and then to
its implentation in [Python] code
Here I take the opportunity to illustrate some useful strategies for getting from mathematical facts and ideas to good
algorithms and working code for solving a numerical problem. The pattern we will see here, and often later, is:
𝑛
1. Start with mathematical facts (like the equations ∑𝑗=1 𝑎𝑖𝑗 𝑥𝑗 = 𝑏𝑖 ).
2. Solve to get an equation for each unknown — or for an updated aproximation of each unknown — in terms of
other quantitities.
3. Specify an order of evaluation in which all the quantities at right are evaluated earlier.
In this, it is often best to start with a verbal description before specifying the details in more precise and detailed mathe-
matical form.
1. Identify cases that can lead to failure due to division by zero and such, and revise to avoid them.
2. Avoid inaccuracy due to problems like severe rounding error. One rule of thumb is that anywhere that a zero value
is a fatal flaw (in particular, division by zero), a very small value is also a hazard when rounding error is present.
So avoid very small denominators. (We will soon examine this through the phenomenon of loss of significance,
and its extreme case catastrophic cancellation.)
For example,
• Avoid repeated evaluation of exactly the same quantity.
• Avoid redundant calculations, such as ones whose value can be determined in advance; for example, values that can
be shown in advance to be zero.
• Compare and choose between alternative algorithms.
We start by considering the most basic algorithm, based on ideas seen in a linear algebra course.
The problem is best stated as a collection of equations for individual numerical values:
Given coefficients 𝑎𝑖,𝑗 1 ≤ 𝑖 ≤ 𝑛, 1 ≤ 𝑗 ≤ 𝑛 and right-hand side values 𝑏𝑖 , 1 ≤ 𝑖 ≤ 𝑛, solve for the 𝑛 unknowns
𝑛
𝑥𝑗 , 1 ≤ 𝑗 ≤ 𝑛 in the equations $∑𝑗=1 𝑎𝑖,𝑗 𝑥𝑗 = 𝑏𝑖 , 1 ≤ 𝑖 ≤ 𝑛.$
In verbal form, the basic strategy of row reduction or Gaussian elimination is this:
• Choose one equation and use it to eliminate one chosen unknown from all the other equations, leaving that chosen
equation plus 𝑛 − 1 equations in 𝑛 − 1 unknowns.
• Repeat recursively, at each stage using one of the remaining equations to eliminate one of the remaining unknowns
from all the other equations.
• This gives a final equation in just one unknown, preceeded by an equation in that unknown plus one other, and so
on: solve them in this order, from last to first.
A precise algorithm must include rules specifying all the choices indicated above. The simplest “naive” choice, which
works in most but not all cases, is to eliminate from the top to bottom and left to right:
• Use the first equation to eliminate the first unknown from all other equations.
• Repeat recursively, at each stage using the first remaining equation to eliminate the first remaining unknown. Thus,
at step 𝑘, equation 𝑘 is used to eliminate unknown 𝑥𝑘 .
• This gives one equation in just the last unknown 𝑥𝑛 ; another equation in the last two unknowns 𝑥𝑛−1 and 𝑥𝑛 , and
so on: solve them in this reverse order, evaluating the unknowns from last to first.
This usually works, but can fail because at some stage the (updated) 𝑘-th equation might not include the 𝑘-th unknown:
that is, its coefficient might be zero, leading to division by zero.
We will refine the algorithm to deal with that in the section on Partial Pivoting.
Remark 2.1.2 (Using Numpy for matrices, vectors and their products)
As of version 3.5 of Python, vectors, matrices, and their products can be handled very elegantly using Numpy arrays, with
the one quirk that the product is denoted by the at-sign @. That is, for a matrix 𝐴 and compatible matrix or vector 𝑏 both
stored in Numpy arrays, their product is given by A @ b.
This means that, along with my encouragement to totally ignore Python arrays in favor of Numpy arrays, and to usually
avoid Python lists when working with numerical data, I also recommend that you ignore the now obsolescent Numpy
matrix data type, if you happen to come across it in older material on Numpy.
Aside: Why not A * b? Because that is the more general “point-wise” array product: c = A * b gives array c with
c[i,j] equal to A[i,j] * b[i,j], which is not how matrix multiplication works.
The problem of solving 𝐴𝑥 = 𝑏 in general, when all you know is that 𝐴 is an 𝑛 × 𝑛 matrix and 𝑏 is an 𝑛-vector, can
in most cases be handled well by using standard software rather than by writing your own code. Here is an example in
Python, solving
4 2 7 𝑥1 2
⎡ 3 5 −6 ⎤ ⎡ 𝑥 ⎤=⎡ 3 ⎤
⎢ ⎥⎢ 2 ⎥ ⎢ ⎥
⎣ 1 −3 2 ⎦ ⎣ 𝑥3 ⎦ ⎣ 4 ⎦
using the array type from package numpy and the function solve from the linear algebra module numpy.linalg.
A =
[[ 4. 2. 7.]
[ 3. 5. -6.]
[ 1. -3. 2.]]
b = [2. 3. 4.]
A @ b = [42. -3. 1.]
x = la.solve(A, b)
print("numpy.linalg.solve says that the solution of Ax = b is")
print(f"x = {x}")
# Check the backward error, also known as the residual
r = b - A @ x
print(f"\nAs a check, the residual (or backward error) is")
print(f" r = b-Ax = {r},")
print(f"and its infinity (or 'maximum') norm is ||r|| = {la.norm(r, inf)}")
print("\nAside: another way to compute this is with max(abs(r)):")
print(f"||r|| = {max(abs(r))}")
print(f"and its 1-norm is ||r|| = {la.norm(r, 1)}")
The rows before 𝑖 = 𝑘 are unchanged, so they are ommited from the update; however, in a situation where we need to
complete the definitions of 𝐴(𝑘) and 𝑏(𝑘) we would also need the following inside the for k loop:
(𝑘) (𝑘−1)
𝑏𝑖 = 𝑏𝑖
end
However, the algorithm will usually be implemented by overwriting the previous values in an array with new ones, and
then this part is redundant.
The next improvement in efficiency: the updates in the first 𝑘 columns at step 𝑘 give zero values (that is the key idea of
the algorithm!), so there is no need to compute or store those zeros, and thus the only calculations needed in the above
for j from 1 to n loop are covered by for j from k+1 to n. Thus from now on we use only the latter:
except when, for demonstration purposes, we need those zeros.
Thus, the standard algorithm looks like this:
Remark 2.1.5 (Syntax for for loops and 0-based array indexing)
Since array indices in Python (and in Java, C, C++, C#, Swift, etc.) start from zero, not from one, it will be convenient
to express linear algebra algorithms in a form compatible with this.
• Every index is one less than in the above! Thus in an array with 𝑛 elements, the index values 𝑖 are 0 ≤ 𝑖 < 𝑛,
excluding n, which is the half-open interval of integers [0, 𝑛).
• In the indexing of an array, one can refer to the part the array with indices 𝑎 ≤ 𝑖 < 𝑏, excluding b, with the slice
notation a:b.
• Similarly, when specifiying the range of consecutive integers 𝑖, 𝑎 ≤ 𝑖 < 𝑏 in a for loop, one can use the expression
range(a,b).
Also, when indices are processed in order (from low to high), these notes will abuse notation slightly, refering to the values
as a set — specifically, a semi-open interval of integers.
For example, the above loop
and then this will sometimes be described in terms of the set of j values:
for j in [k,n):
This new notation needs care initially, but helps with clarity in the long run. For one thing, it means that the indices of an
𝑛-element array, [0, 𝑛 − 1), are described by range(0,n) and by 0:n. In fact, the case of “starting at the beginning”,
with index zero, can be abbreviated: range(n) is the same as range(0,n), and :b ia the same as 0:b.
Another advantage is that the index ranges a:b and b:c together cover the same indices as a:c, with no gap or dupli-
cation of b, and likewise range(a,b) and range(b,c) combine to cover range(a,c).
Here the above notational shift is made, along with eliminating the above-noted redundant formulas for values that are
either zero or are unchanged from the previous step. It is also convenient for 𝑘 to be the index of the row being used to
reduce subsequent rows, and so also the index of the column in which values below the main diagonal are being set to
zero.
Algorithm 2.1.4
(𝑘) (𝑘) (𝑘) (𝑘+1)
for k in [0, n-1): for i in [k+1, n): 𝑙𝑖,𝑘 = 𝑎𝑖,𝑘 /𝑎𝑘,𝑘 If 𝑎𝑘,𝑘 ≠ 0! for j in [k+1, n): 𝑎𝑖,𝑗 =
(𝑘) (𝑘) (𝑘+1) (𝑘) (𝑘)
𝑎𝑖,𝑗 − 𝑙𝑖,𝑘 𝑎𝑘,𝑗 end 𝑏𝑖 = 𝑏𝑖 − 𝑙𝑖,𝑘 𝑏𝑘 end end
Conversion to actual Python code is now quite straightforward; there is litle more to be done than:
• Change the way that indices are described, from 𝑏𝑖 to b[i] and from 𝑎𝑖,𝑗 to A[i,j].
• Use case consistently in array names, since the quirk in mathematical notation of using upper-case letters for matrix
names but lower case letters for their elements is gone! In these notes, matrix names will be upper-case and vector
names will be lower-case (even when a vector is considered as 1-column matrix).
• Rather than create a new array for each matrix 𝐴(0) , 𝐴(0) , etc. and each vector 𝑏(0) , 𝑏(1) , we overwite each in the
same array.
Remark 2.1.6
We will see that this simplicity in translation is quite common once algorithms have been expressed with zero-based
indexing. The main ugliness is with loops that count backwards; see below.
for k in range(n-1):
for i in range(k+1, n):
L[i,k] = A[i,k] / A[k,k]
for j in range(k+1, n):
A[i,j] -= L[i,k] * A[k,j]
b[i] -= L[i,k] * b[k]
# but it helps for displaying results and for checking the results via␣
↪residuals.
U[i,k] = 0.
Note: As usual, you could omit the above def and instead import this functions with
(U, c) = rowreduce(A, b)
print(f"Row reduction gives\nU =\n{U}")
print(f"c = {c}")
Let’s take advantage of the fact that we have used la.solve to get a very accurate approximation of the solution x of
𝐴𝑥 = 𝑏; this should also solve 𝑈 𝑥 = 𝑐, so check the backward error, a.k.a. the residual:
r = c - U@x
print(f"\nThe residual (backward error) c-Ux is {r}, with maximum norm {max(abs(r))}.
↪")
for k in range(n-1):
L[k+1:n,k] = U[k+1:n,k] / U[k,k] # compute all the L values for column k
for i in range(k+1, n):
U[i,k+1:n] -= L[i,k] * U[k,k+1:n] # Update row i
c[k+1:n] -= L[k+1:n,k] * c[k] # update c values
I will break my usual guideline by redefining rowreduce, since this is just a different statement of exactly the same
algorithm:
L = np.zeros_like(A)
for k in range(n-1):
if demomode: print(f"Step {k=}")
(continues on next page)
# but it helps for displaying results and for checking the results via␣
↪residuals.
U[i,k] = 0.0
A[[r1 r2 r3], :]
selects the indicated four columns — but only from row 2 onwards.
This gives another way to describe the update of the lower-right block U[k+1:n,k+1:n] with a single matrix multi-
plication: it is the outer product of part of column k of L after row k by the part of row k of U after column k.
To specify that the piecws of L nd U are identifies as a 1-column matrix and a 1-row matrix respectively, rather than as
vectors, the above “row/column list” method must be used, with the list being just [k] in each case.
L = np.zeros_like(A)
for k in range(n-1):
if demomode: print(f"Step {k=}")
# compute all the L values for column k:
(continues on next page)
# but it helps for displaying results and for checking the results via␣
↪residuals.
U[k+1:n,k] = 0.0
and so solved as
𝑛
𝑐𝑖 − ∑𝑗=𝑖+1 𝑢𝑖,𝑗 𝑥𝑗
𝑥𝑖 = , If 𝑢𝑖,𝑖 ≠ 0
𝑢𝑖,𝑖
Algorithm 2.1.5
𝑛
𝑐𝑖 − ∑𝑗=𝑖+1 𝑢𝑖,𝑗 𝑥𝑗
𝑥𝑛 = 𝑐𝑛 /𝑢𝑛,𝑛 for i from n-1 down to 1 𝑥𝑖 = end
𝑢𝑖,𝑖
This works so long as none of the main diagonal terms 𝑢𝑖,𝑖 is zero, because when done in this order, everything on the
right hand side is known by the time it is evaluated.
For future reference, note that the elements 𝑢𝑘,𝑘 that must be non-zero here, the ones on the main diagonal of 𝑈 , are
(𝑘)
the same as the elements 𝑎𝑘,𝑘 that must be non-zero in the row reduction stage above, because after stage 𝑘, the elements
(𝑘) (𝑛−1)
of row 𝑘 do not change any more: 𝑎𝑘,𝑘 = 𝑎𝑘,𝑘 = 𝑢𝑘,𝑘 .
Again, a zero-based version is more convenient for programming in Python (or Java, or C++):
Algorithm 2.1.6
𝑛−1
𝑐𝑖 − ∑𝑗=𝑖+1 𝑢𝑖,𝑗 𝑥𝑗
𝑥𝑛−1 = 𝑐𝑛−1 /𝑢𝑛−1,𝑛−1 for i from n-2 down to 0 𝑥𝑖 = end
𝑢𝑖,𝑖
Remark 2.1.9 (Indexing from the end of an array and counting backwards)
To express the above backwards counting in Python, we have to deal with the fact that range(a,b) counts upwards
and excludes the “end value” b. The first part is easy: the extended form range(a, b, step) increments by step
instead of by one, so that range(a, b, 1) is the same as range(a,b), and range(a, b, -1) counts down:
𝑎, 𝑎 − 1, … , 𝑏 + 1.
But it still stops just before 𝑏, so getting the values from 𝑛 − 1 down to 0 requires using 𝑏 = −1, and so the slightly quirky
expression range(n-1, -1, -1).
𝑛−1
One more bit of Python: for an 𝑛-element single-index array v, the sum of its elements ∑𝑖=0 𝑣𝑖 is given by sum(v).
𝑏−1
Thus ∑𝑖=𝑎 𝑣𝑖 , the sum over a subset of indices [𝑎, 𝑏), is given by sum(v[a:b]).
And remember that multiplication of Numpy arrays with * is pointwise.
With all the above Python details, the core code for backward substitution is:
x[n-1] = c[n-1]/U[n-1,n-1]
for i in range(n-2, -1, -1):
x[i] = (c[i] - sum(U[i,i+1:] * x[i+1:])) / U[i,i]
Remark 2.1.10
Note that the backward substitution algorithm and its Python coding have a nice mathematical advantage over the row
reduction algorithm above: the precise mathematical statement of the algorithm does not need any intermediate quantities
distinguished by superscripts (𝑘) , and correspondingly, all variables in the code have fixed meanings, rather than changing
at each step.
In other words, all uses of the equal sign are mathematically correct as equations!
This can be advantageous in creating algorithms and code that is more understandable and more readily verified to be
correct, and is an aspect of the functional programming approach. We will soon go part way to that functional ideal,
by rephrasing Gaussian elimination in a form where all variables have clear, fixed meanings, corresponding to the nat-
ural mathematical description of the process: the method of LU factorization introduced in Solving Ax = b with LU
factorization, A = L U.
x[-1] = c[-1]/U[-1,-1]
for i in range(2, n+1):
x[-i] = (c[-i] - sum(U[-i,1-i:] * x[1-i:])) / U[-i,-i]
There is still the quirk of having to “overshoot”, referring to n+1 in range to get to final index -n.
As a final demonstration, we put this second version of the code into a complete working Python function and test it:
x = backwardsubstitution(U, c)
print(f"x = {x}")
r = b - A@x
print(f"\nThe residual b - Ax = {r},")
print(f"with maximum norm {max(abs(r)):.3}.")
Since one is often just interested in the solution given by the two steps of row reduction and then backward substitution,
they can be combined in a single function by composition:
solvelinearsystem(A, b)
3.1.10 Two code testing hacks: starting from a known solution, and using randomly
generated examples
An often useful strategy in developing and testing code is to create a test case with a known solution; another is to use
random numbers to avoid accidently using a test case that in unusually easy.
Prefered Python style is to have all import statements at the top, but since this is the first time we’ve heard of module
random, I did not want it to be mentioned mysteriously above.
import random
for i in range(len(x)):
x_random[i] = random.uniform(-1, 1) # gives random real value, from uniform␣
↪distribution in [-1, 1]
print(f"x_random = {x_random}")
Create a right-hand side b that automatically makes x_random the correct solution:
b_random = A @ x_random
print(f"A =\n{A}")
print(f"\nb_random = {b_random}")
(U, c_random) = rowreduce(A, b_random)
print(f"\nU=\n{U}")
print(f"\nResidual c_random - U@x_random = {c_random - U@x_random}")
x_computed = backwardsubstitution(U, c_random)
print(f"\nx_computed = {x_computed}")
print(f"\nResidual b_random - A@x_computed = {b_random - A@x_computed}")
print(f"\nBackward error |b_random - A@x_computed| = {max(abs(b_random - A@x_
↪computed))}")
A =
[[ 4. 2. 7.]
[ 3. 5. -6.]
[ 1. -3. 2.]]
U=
[[ 4. 2. 7. ]
(continues on next page)
𝑥2 = 1
𝑥1 + 𝑥2 = 2
It is easy to see that this has the solution 𝑥1 = 𝑥2 = 1; in fact it is already in “reduced form”. However when put into
matrix form
0 1 𝑥 1
[ ][ 1 ] = [ ]
1 1 𝑥2 2
the above algorithm fails, because the fist pivot element 𝑎11 is zero:
U1 =
[[ 0. 1.]
[ 0. -inf]]
c1 = [ 1. -inf]
x1 = [nan nan]
/var/folders/zk/qv7t2p8x33ldzh_sg8b854lr0000gn/T/ipykernel_18954/2577478211.py:15:␣
↪RuntimeWarning: divide by zero encountered in true_divide
x[-1] = c[-1]/U[-1,-1]
1 1 1 𝑥1 3
⎡ 1 1 2 ⎤ ⎡ 𝑥 ⎤=⎡ 4 ⎤
⎢ ⎥⎢ 2 ⎥ ⎢ ⎥
⎣ 1 2 2 ⎦ ⎣ 𝑥3 ⎦ ⎣ 5 ⎦
The solution is 𝑥1 = 𝑥2 = 𝑥3 = 1, and this time none of th diagonal elements is zero, so it is not so obvious that a
division by zero problem will occur, but:
U2 =
[[ 1. 1. 1.]
[ 0. 0. 1.]
[ 0. 0. -inf]]
c2 = [ 3. 1. -inf]
x2 = [nan nan nan]
/var/folders/zk/qv7t2p8x33ldzh_sg8b854lr0000gn/T/ipykernel_18954/2577478211.py:15:␣
↪RuntimeWarning: divide by zero encountered in true_divide
What happens here is that the first stage subtracts the first row from each of the others …
A2[1,:] -= A2[0,:]
b2[1] -= b2[0]
A2[2,:] -= A2[0,:]
b2[2] -= b2[0]
… and the new matrix has the same problem as above at the next stage:
print(f"Now A2 is \n{A2}")
print(f"and b2 is {b2}")
Now A2 is
[[1. 1. 1.]
[0. 0. 1.]
[0. 1. 1.]]
and b2 is [3. 1. 2.]
0 1 𝑥 1
[ ][ 2 ] = [ ]
1 1 𝑥3 2
1 1016 𝑥 1 + 1016
[ ][ 1 ] = [ ]
1 1 𝑥2 2
again have the solution 𝑥1 = 𝑥2 = 1, and the only division that happens in the above algorithm for row reduction is by
that pivot element 𝑎11 = 1, ≠ 0, so with exact arithmetic, all would be well. But:
A3 =
[[1.e+00 1.e+16]
[1.e+00 1.e+00]]
b3 = [1.e+16 2.e+00]
U3 =
[[ 1.e+00 1.e+16]
[ 0.e+00 -1.e+16]]
c3 = [ 1.e+16 -1.e+16]
x3 = [2. 1.]
A3a =
[[1.e+00 1.e+15]
[1.e+00 1.e+00]]
b3a = [1.e+15 2.e+00]
U3a =
[[ 1.e+00 1.e+15]
[ 0.e+00 -1.e+15]]
c3a = [ 1.e+15 -1.e+15]
x3a = [1. 1.]
10−16 1 𝑥 1 + 10−16
[ ][ 1 ] = [ ]
1 1 𝑥2 2
Now the problem is more obvious: this system differs from the system in Example 2.1.1 just by a tiny change of 10−16 in
that pivot elements 𝑎11 , and the problem is division by a value very close to zero.
A4 =
[[1.e-16 1.e+00]
[1.e+00 1.e+00]]
b4 = [1. 2.]
U4 =
[[ 1.e-16 1.e+00]
(continues on next page)
One might think that there is no such small denominator in Example 2.1.3, but what counts for being “small” is magnitude
relative to other values — 1 is very small compared to 1016 .
To understand these problems more (and how to avoid them) we will explore Machine Numbers, Rounding Error and
Error Propagation in the next section.
There are several important cases when we can guarantee that these problem do not occur. One obvious case is when
the matrix 𝐴 is diagonal and non-singular (so with all non-zero elements); then it is already row-reduced and with all
denominators in backward substitution being non-zero.
A useful measure of being “close to diagonal” is diagonal dominance:
Loosely, each main diagonal “dominates” in size over all other elements in its row.
(so that each main diagonal element “dominates its column”) the matrix is called column-wise strictly diagonally dom-
inant.
Note that this is the same as saying that the transpose 𝐴𝑇 is SDD.
Aside: If only the corresponding non-strict inequality holds, the matrix is called diagonally dominant.
Theorem 2.1.1
For any strictly diagonally dominant matrix 𝐴, each of the intermediate matrices 𝐴(𝑘) given by the naive Gaussan elim-
ination algorithm is also strictly diagonally dominant, and so the final upper triangular matrix 𝑈 is. In particular, all
(𝑘)
the diagonal elements 𝑎𝑖,𝑖 and 𝑢𝑖,𝑖 are non-zero, so no division by zero occurs in any of these algorithms, including the
backward substitution solving for 𝑥 in 𝑈 𝑥 = 𝑐.
The corresponding fact also true if the matrix is column-wise strictly diagonally dominant: that property is also preserved
at each stage in naive Guassian elimination.
Thus in each case the diagonal elements — the elements divided by in both row reduction and backward substitution
— are in some sense safely away from zero. We will have more to say about this in the sections on pivoting and LU
factorization
For a column-wise SDD matrix, more is true: at stage 𝑘, the diagonal dominance says that the pivot elemet on the diagonal,
(𝑘−1) (𝑘−1)
𝑎𝑘,𝑘 , is larger (in magnitude) than any of the elements 𝑎𝑖,𝑘 below it, so the multipliers 𝑙𝑖,𝑘 have
(𝑘−1) (𝑘−1)
|𝑙𝑖,𝑘 | = |𝑎𝑖,𝑘 /𝑎𝑘,𝑘 | < 1.
As we will see when we look at the effects of rounding error in the sections on Machine Numbers, Rounding Error and
Error Propagation and Error bounds for linear algebra keeping intermediate values small is generally good for accuracy,
so this is a nice feature.
References:
• Sections 0.3 Floating Point Represenation of Real Numbers and 0.4 *Loss of Significance in [Sauer, 2022].
• Section 1.2 Round-off Errors and Computer Arithmetic of [Burden et al., 2016].
• Sections 1.3 and 1.4 of [Chenney and Kincaid, 2012].
3.2.1 Overview
The naive Gaussian elimination algorithm seen in the section Row Reduction/Gaussian Elimination has several related
weaknesses which make it less robust and flexible than desired.
Most obviously, it can fail even when the equations are solvable, due to its naive insistence on always working from the
top down. For example, as seen in Example 2.1.1 of that section, it fails with the system
0 1 𝑥 1
[ ][ 1 ] = [ ]
1 1 𝑥2 2
because the formula for the first multiplier 𝑙2,1 = 𝑎2,1 /𝑎1,1 gives 1/0.
Yet the equations are easily solvable, indeed with no reduction needed: the first equation just says 𝑥2 = 1, and then the
second gives 𝑥1 = 2 − 𝑥2 = 1.
All one has to do here to avoid this problem is change the order of the equations. Indeed we will see that such reordering
is all that one ever needs to do, so long as the original equation has a unique solution.
However, to develop a good strategy, we will also take account of errors introduced by rounding in computer arithmetic,
so that is our next topic.
The above claim raises the concept of robustness and the importance of both existence and uniqueness of solutions.
For example, the problem of finding the root of a continuous, monotonic function 𝑓 ∶ [𝑎, 𝑏] → ℝ with 𝑓(𝑎) and 𝑓(𝑏) of
opposite sign is well-posed. Note the care taken with details to ensure both existence and uniqueness of the solution.
For example, the bisection method is robust for the above class of problems. On the other hand, Newton’s method is
not, and if we dropped the specification of monotonicity (so allowing multiple solutons) then the bisection method in its
current form would not be robust: it would fail whenever there is more that one solution in the interval [𝑎, 𝑏].
There is a second slightly less obvious problem with the naive algorithm for Guassian elimination, closely related to the
first. As soon as the algorithm is implemented using any rounding in the arithmetic (rather than, say, working with
exact arithmetic on rational numbers) division by values that are very close to zero can lead to very large intermediate
values, which thus have very few correct decimals (correct bits); that is, very large absolute errors. These large errors
can then propagate, leading to low accuracy in the final results, as seen in Example 2.1.2 and Example 2.1.4 of Row
Reduction/Gaussian Elimination
This is the hazard of loss of significance, discussed in Section 0.4 of [Sauer, 2022] and Section 1.4 of [Chenney and
Kincaid, 2012].
So it is time to take Step 2 of the strategy described in the previous notes:
1. Identify cases that can lead to failure due to division by zero and such, and revise to avoid them.
2. Avoid inaccuracy due to problems like severe rounding error. One rule of thumb is that anywhere that a zero value
is a fatal flaw (in particular, division by zero), a very small value is also a hazard when rounding error is present.
So avoid very small denominators. …
As a very quick summary, standard computer arithmetic handles real numbers using binary machine numbers with 𝑝
significant bits, and rounding off of other numbers to such machine numbers introduces a relative error of at most 2−𝑝 .
The current dominant choice for machine numbers and arithmetic is IEEE-64, using 64 bits in total and with 𝑝 = 53
significant bits, so that 1/2𝑝 ≈ 1.11⋅10−16 , giving about fifteen significant digits. (The other bits are used for an exponent
and the sign.)
(Note: in the above, I ignore the extra problems with real numbers whose magnitude is too large or too small to be
represented: underflow and overflow. Since the allowable range of magnitudes is from 2−1022 ≈ 2.2 ⋅ 10−308 to 21024 ≈
1.8 ⋅ 10308 , this is rarely a problem in practice.)
With other systems of binary machine numbers (like older 32-bit versions, or higher precision options like 128 bits) the
significant differences are mostly encapsulated in that one number, the machine unit, 𝑢 = 1/2𝑝 .
The basic representation is a binary version of the familiar scientific or decimal floating point notation: in place of the
𝑑
form ±𝑑0 .𝑑1 𝑑2 … 𝑑𝑝−1 × 10𝑒 where the fractional part or mantissa is 𝑓 = 𝑑0 .𝑑1 𝑑2 … 𝑑𝑝−1 = 𝑑0 + 𝑑101 + ⋯ + 10𝑝−1 𝑝−1 .
Binary floating point machine numbers with 𝑝 significant bits can be described as
𝑏1 𝑏 𝑏𝑝−1
±(𝑏0 .𝑏1 𝑏2 … 𝑏𝑝−1 )2 × 2𝑒 = ± (𝑏0 + + 22 + ⋯ 𝑝−1 ) × 2𝑒
2 2 2
Just as decimal floating point numbers are typically written with the exponent chosen to have non-zero leading digit
𝑑0 ≠ 0, normalized binary floating point machine numbers have exponent 𝑒 chosen so that 𝑏0 ≠ 0. Thus in fact 𝑏0 = 1
— and so it need not be stored; only 𝑝 − 1 bits are needed to stored for the mantissa.
It turns out that the relative errors are determined solely by the number of significant bits in the mantissa, regardless of
the exponent, so we look at that part first.
The spacing of consecutive mantissa values (1.𝑏1 𝑏2 … 𝑏𝑝−1 )2 is one in the last bit, or 21−𝑝 . Thus rounding of any inter-
mediate value 𝑥 to the nearest number of this form introduces an absolute error of at most half of this: 𝑢 = 2−𝑝 , which
is called the machine unit
How large can the relative error be? It is largest for the smallest possible denominator, which is (1.00 … 0)2 = 1, so the
relative error due to rounding is also at most 2−𝑝 .
The sign has no effect on the absolute error, and the exponent changes the spacing of consecutive machine numbers by a
factor of 2𝑒 . This scales the maximum possible absolute error to 2𝑒−𝑝 , but in the relative error calculation, the smallest
possible denominator is also scaled up to 2𝑒 , so the largest possible relative error is again the machine unit, 𝑢 = 2−𝑝 .
One way to describe the machine unit u (sometimes called machine epsilon) is to note that the next number above 1 is
1 + 21−𝑝 = 1 + 2𝑢. Thus 1 + 𝑢 is at the threshold between rounding down to 1 and rounding up to a higher value.
For completely full details, you could read about the IEEE 754 Standard for Floating-Point Arithmetic and specifically
the binary64 case. (For historical reasons, this is known as “Double-precision floating-point format”, from the era when
computers were typicaly used 32-bit words, so 64-bit numbers needed two words.)
In the standard IEEE-64 number system:
• 64 bit words are used to store real numbers (a.k.a. floating point numbers, sometimes called floats.)
• There are 𝑝 = 53 bits of precision, so that 52 bits are used to store the mantissa (fractional part).
• The sign is stored with one bit 𝑠: effectively a factor of (−1)𝑠 , so 𝑠 = 0 for positive, 𝑠 = 1 for negative.
• The remaining 11 bits are use for the exponent, which allows for 211 = 2048 possibilities; these are chosen in the
range −1023 ≤ 𝑒 ≤ 1024.
• However, so far, this does not allow for the value zero! This is handled by giving a special meaning for the smallest
exponent 𝑒 = −1023, so the smallest exponent for normalized numbers is 𝑒 = −1022.
• At the other extreme, the largest exponent 𝑒 = 1024 is used to encode “infinite” numbers, which can arise when a
calculation gives a value too large to represent. (Python displays these as inf and -inf). This exponent is also
used to encode “Not a Number”, for situations like trying to divide zero by zero or multiply zero by inf.
• Thus, the exponential factors for normlaized numbers are in the range 2−1022 ≈ 2 × 10−308 to 21023 ≈ 9 × 10307 .
Since the mantissa ranges from 1 to just under 2, the range of magnitudes of normalized real numbers is thus from
2−1022 ≈ 2 × 10−308 to just under 21024 ≈ 1.8 × 10308 .
Some computational experiments:
p = 53
u = 2**(-p)
print(f"For IEEE-64 arithmetic, there are {p} bitd of precision and the machine unit␣
↪is u={u},")
print(f"and the next numbers above 1 are 1+2u = {1+2*u}, 1+4u = {1+4*u} and so on.")
for factor in [3, 2, 1.00000000001, 1]:
onePlusSmall = 1 + factor * u
print(f"1 + {factor}u rounds to {onePlusSmall}")
difference = onePlusSmall - 1
print(f"\tThis is more than 1 by {difference:.4}, which is {difference/u} times u
↪")
For IEEE-64 arithmetic, there are 53 bitd of precision and the machine unit is u=1.
↪1102230246251565e-16,
1 + 3u rounds to 1.0000000000000004
This is more than 1 by 4.441e-16, which is 4.0 times u
1 + 2u rounds to 1.0000000000000002
This is more than 1 by 2.22e-16, which is 2.0 times u
1 + 1.00000000001u rounds to 1.0000000000000002
This is more than 1 by 2.22e-16, which is 2.0 times u
1 + 1u rounds to 1.0
This is more than 1 by 0.0, which is 0.0 times u
1 - 2u rounds to 0.9999999999999998
This is less than 1 by 2.22e-16, which is 2.0 times u
1 - 1u rounds to 0.9999999999999999
This is less than 1 by 1.11e-16, which is 1.0 times u
1 - 0.500000000005u rounds to 0.9999999999999999
This is less than 1 by 1.11e-16, which is 1.0 times u
1 - 0.5u rounds to 1.0
This is less than 1 by 0.0, which is 0.0 times u
Next, look at the extremes of very small and very large magnitudes:
What happens if we compute positive numbers smaller than that smallest normalized positive number 2−1022 ?
These extremely small values are called denormalized numbers. Numbers with exponent 2−1022−𝑆 have fractional part
with 𝑆 leading zeros, so only 𝑝 − 𝑆 significant bits. So when the shift 𝑆 reaches 𝑝 = 53, there are no significant bits left,
and the value is truly zero.
The only errors in the results of Gaussian elimination come from errors in the initial data (𝑎𝑖𝑗 and 𝑏𝑖 ) and from when the
results of subsequent arithmetic operations are rounded to machine numbers. Here, we consider how errors from either
source are propagated — and perhaps amplified — in subsequent arithmetic operations and rounding.
In summary:
• When multiplying two numbers, the relative error in the sum is no worse than slightly more than the sum of the
relative errors in the numbers multiplied. (the be pedantic, it is at most the sum of those relative plus their product,
but that last piece is typically far smaller.)
• When dividing two numbers, the relative error in the quotient is again no worse than slightly more than the sum of
the relative errors in the numbers divided.
• When adding two positive numbers, the relative error is no more that the larger of the relative errors in the numbers
added, and the absolute error in the sum is no larger than the sum of the absolute errors.
• When subtracting two positive numbers, the absolute error is again no larger than the sum of the absolute errors in
the numbers subtracted, but the relative error can get far worse!
Due to the differences between the last two cases, this discussion of error propagation will use “addition” to refer only to
adding numbers of the same sign, and “subtraction” when subtracting numbers of the same sign.
More generally, we can think of rewriting the operation in terms of a pair of numbers that are both positive, and assume
WLOG that all input values are positive numbers.
Let 𝑥 and 𝑦 be exact quantities, and 𝑥𝑎 = 𝑥(1 + 𝛿𝑥 ), 𝑦𝑎 = 𝑦(1 + 𝛿𝑦 ) be approximations. The approximate product
(𝑥𝑦)𝑎 = 𝑥𝑎 𝑦𝑎 = 𝑥(1 + 𝛿𝑥 )𝑦(1 + 𝛿𝑦 ) has error
For example if the initial errors are due only to rounding, |𝛿𝑥 | ≤ 𝑢 − 2−𝑝 and similarly for |𝛿𝑦 |, so the relative error in
𝑥𝑎 𝑦𝑎 is at most 2𝑢 + 𝑢2 = 21−𝑝 + 2−2𝑝 . In this and most situations, that final “product of errors” term 𝛿𝑥 𝛿𝑦 is far smaller
than the first two, giving to a very good approximation
Exercise 1
With 𝑥𝑎 and 𝑦𝑎 as above (and positive), the approximate sum 𝑥𝑎 + 𝑦𝑎 has error
(𝑥𝑎 + 𝑦𝑎 ) − (𝑥 + 𝑦) = (𝑥𝑎 − 𝑥) + (𝑦𝑎 − 𝑦)
so the absolute error is bounded by |𝑥𝑎 − 𝑥| + |𝑦𝑎 − 𝑦|; the sum of the absolute errors.
For the relative errors, express this error as
(𝑥𝑎 + 𝑦𝑎 ) − (𝑥 + 𝑦) = (𝑥(1 + 𝛿𝑥 ) + 𝑦(1 + 𝛿𝑦 )) = 𝑥𝛿𝑥 + 𝑦𝛿𝑦
Let 𝛿 be the maximum or the relative errors, 𝛿 = max(|𝛿𝑥 |, |𝛿𝑦 |); then the absolute error is at most (|𝑥|+|𝑦|)𝛿 = (𝑥+𝑦)𝛿
and so the relative error is at most
(𝑥 + 𝑦)𝛿
= 𝛿 = max(|𝛿𝑥 |, |𝛿𝑦 |)
|𝑥 + 𝑦|
That is, the relative error in the sum is at most the sum of the relative errors, again as advertised above.
When the “input errors” in 𝑥𝑎 and 𝑦𝑎 come just from rounding to machine numbers, the error bound for the sum is no
larger: no precision is lost! Thus, if you take any collection of non-negative numbers, round the to machine numbers so
that each has relative error at must 𝑢, then the sum of these rounded values also has relative error at most 𝑢.
The above calculation for the absolute error works fine regardless of the signs of the numbers, so the absolute error of a
difference is still bounded by the sum of the absolute errors:
|(𝑥𝑎 − 𝑦𝑎 ) − (𝑥 − 𝑦)| ≤ |𝑥𝑎 − 𝑥| + |𝑦𝑎 − 𝑦|
But for subtraction, the denominator in the relative error formulas can be far smaller. WLOG let 𝑥 > 𝑦 > 0. The relative
error bound is
|(𝑥𝑎 − 𝑦𝑎 ) − (𝑥 − 𝑦)| 𝑥𝛿𝑥 + 𝑦𝛿𝑦
≤
|𝑥 − 𝑦| 𝑥−𝑦
Clearly if 𝑥 − 𝑦 is far smaller than 𝑥 or 𝑦, this can be far larger than the “input” relative errors |𝛿𝑥 | and |𝛿𝑦 |.
The extreme case is where the values 𝑥 and 𝑦 round to the same value, so that 𝑥𝑎 − 𝑦𝑎 = 0, and the relative error is 1:
“100% error”, a case of catastrophic cancellation.
Exercise 2
Let us move slightly away from the worst case scenario where the difference is exactly zero to one where it is close to
zero; this will illustrate the idea mentioned earlier that whereever a zero value is a problem in exact aritmetic, a very small
value can be a problem in approximate arithmetic.
For 𝑥 = 8.024 and 𝑦 = 8.006,
• Round each to three significant figures, giving 𝑥𝑎 and 𝑦𝑎 .
• Compute the absolute errors in each of these approximations, and in their difference as an approximation of 𝑥 − 𝑦.
• Compute the relative errors in each of these three approximations.
Then look at rounding to only two significant digits!
The problem is worst when 𝑥 and 𝑦 are close in relative terms, in that 𝑦/𝑥 is close to 1. In the case of the errors in 𝑥𝑎
and 𝑦𝑎 coming just from rounding to machine enumbers, we have:
Exercise 3
(a) Illustrate why computing the roots of the quadratic equation 𝑎𝑥2 + 𝑏𝑥 + 𝑐 = 0 with the standard formula
√
−𝑏 ± 𝑏2 − 4𝑎𝑐
𝑥=
2𝑎
can sometimes give poor accuracy when evaluated using machine arithmetic such as IEEE-64 floating-point arithmetic.
This is not alwys a problem, so identify specifically the situations when this could occur, in terms of a condition on the
coefficents 𝑎, 𝑏, and 𝑐. (It is sufficient to consider real value of the ocefficients. Also as an aside, there is no loss of
precision problem when the roots are non-real, so you only need consider quadratics with real roots.)
(b) Then describe a careful procedure for always getting accurate answers. State the procedure first with words and
mathematical formulas, and then express it in pseudo-code.
𝑓(𝑥 + ℎ) − 𝑓(𝑥)
𝐷𝑓(𝑥) = lim
ℎ→0 ℎ
by using
𝑓(𝑥 + ℎ) − 𝑓(𝑥)
𝐷𝑓(𝑥) ≈ 𝐷ℎ 𝑓(𝑥) ∶=
ℎ
with a small value of ℎ — but this inherently involves the difference of almost equal quantities, and so loss of significance.
Taylor’s theorem give an error bound if we assume exact arithmetic — worse for larger ℎ. Then the above results give a
measure of rounding error effects — worse for smaller ℎ.
This leads to the need to balance these error sources, to find an optimal choice for ℎ and the corresponding error bound.
Denote the error in approximately calculating 𝐷ℎ 𝑓(𝑥) with machine arithmetic as 𝐷̃ ℎ 𝑓(𝑥).
The error in this as an approximating of the exact derivative is
𝐸𝐴 = 𝐷̃ ℎ 𝑓(𝑥) − 𝐷ℎ 𝑓(𝑥)
is the error due to machine Arithmetic in evaluation of the difference quotient 𝐷ℎ 𝑓(𝑥), and
𝐸𝐷 = 𝐷ℎ 𝑓(𝑥) − 𝐷𝑓(𝑥)
is the error in this difference quotient as an approximation of the exact derivative 𝐷𝑓(𝑥), = 𝑓 ′ (𝑥). This error is sometimes
called the discretization error because it arises whe we replace the derivative by a discrete algebraic calculation.
Bounding the Arithmetic error 𝐸𝐴
The first source of error is rounding of 𝑓(𝑥) to a machine number; as seen above, this gives 𝑓(𝑥)(1 + 𝛿1 ), with |𝛿1 | ≤ 𝑢,
so absolute error |𝑓(𝑥)𝛿1 | ≤ |𝑓(𝑥)|𝑢.
Similarly, 𝑓(𝑥 + ℎ) is rounded to 𝑓(𝑥 + ℎ)(1 + 𝛿2 ), absolute error at most |𝑓(𝑥 + ℎ)|𝑢.
Since we are interested in fairly small values of ℎ (to keep 𝐸𝐷 under control), we can assume that |𝑓(𝑥 + ℎ)| ≈ |𝑓(𝑥)|,
so this second absolute error is also very close to |𝑓(𝑥)|𝑢.
Then the absolute error in the difference in the numerator of 𝐷ℎ 𝑓(𝑥) is at most 2|𝑓(𝑥)|𝑢 (or only a tiny bit greater).
Next the division. We can assume that ℎ is an exact machine number, for example by choosing ℎ to be a power of two,
so that division by ℎ simply shifts the power of two in the exponent part of the machine number. This has no effect on on
the relative error, but scales the absolute error by the factor 1/ℎ by which one is multiplying: the absolute error is now
bounded by
2|𝑓(𝑥)|𝑢
|𝐸𝐴 | ≤
ℎ
This is a critical step: the difference has a small absolute error, which conceals a large relative error due to the difference
being small; now the absolute error gets amplified greatly when ℎ is small.
Bounding the Discretization error 𝐸𝐷
As seen in Taylor’s Theorem and the Accuracy of Linearization — for the basic case of linearization — we have
𝑓 ″ (𝑐𝑥 ) 2
𝑓(𝑥 + ℎ) − 𝑓(𝑥) = 𝐷𝑓(𝑥)ℎ + ℎ
2
so
𝑓(𝑥 + ℎ) − 𝑓(𝑥) 𝑓 ″ (𝑐𝑥 )
𝐸𝐷 = = ℎ
ℎ 2
and with 𝑀2 = max |𝑓 ″ |,
𝑀2
|𝐸𝐷 | ≤ ℎ
2
Bounding the total absolute error, and minimizing it
The above results combine to give an upper limit on how bad the total error can be:
2|𝑓(𝑥)|𝑢 𝑀2
|𝐸| ≤ |𝐸𝐴 | + |𝐸𝐷 | ≤ + ℎ
ℎ 2
As aniticipated, the errors go in opposite directions: decreasing ℎ to reduce 𝐸𝐷 makes 𝐸𝐴 worse, and vice versa. Thus
we can expect that there is a “goldilocks” value of ℎ — neither too small nor too big — that gives the best overall bound
on the total error.
To do this, let’s clean up the notation: let
𝑀2
𝐴 = 2|𝑓(𝑥)|𝑢, 𝐷= ,
2
𝐴 2|𝑓(𝑥)|𝑢 |𝑓(𝑥)| √ √
ℎ = ℎ∗ = √ =√ = 2√ 𝑢, = 𝐾 𝑢
𝐷 𝑀2 /2 𝑀2
|𝑓(𝑥)|
using the short-hand 𝐾 = 2√ .
𝑀2
This is easily verified to give the global mimimum of 𝐸(ℎ); thus, the best error bound we can get is for this value of ℎ:
2|𝑓(𝑥)|𝑢 𝑀2 √ 2|𝑓(𝑥)| 𝑀 √
𝐸 ≤ 𝐸 ∗ ∶= 𝐸(ℎ∗ ) = √ + 𝐾 𝑢=( + 𝐾 2) 𝑢
𝐾 𝑢 2 𝐾 2
References:
• Section 2.4.1 Partial Pivoting of [Sauer, 2022].
• Section 6.2 Pivoting Stratgies of [Burden et al., 2016].
• Section 7.1 of [Chenney and Kincaid, 2012].
Remark 2.3.1
Some references describe the method of scaled partial pivoting, but here we present instead a version without the “scaling”,
because not only is it simpler, but modern research shows that it is esentially always as good, once the problem is set up
in a “sane” way.
3.3.1 Introduction
The basic row reduction method can fail due to divisoion by zero (and to have very large rouding errors when a denominator
is extremely close to zero. A more robust modification is to swap the order of the equations to avaid these problems: partial
pivotng. Here we look at a particularly robust version of this strategy, Maximal Element Partial Pivoting.
# As in recent sections, we import some items from modules individually, so they can␣
↪be used by "first name only".
We have noted two problems with the naive algorithm for Gaussian elimination: total failure due the division be zero, and
loss of precision due to dividing by very small values — or more preciselt calculations the lead to intermediate values far
(𝑘−1)
larger than the final results. The culprits in all cases are the same: the denominators are first the pivot elements 𝑎𝑘,𝑘 in
(𝑘−1)
evaluation of 𝑙𝑖,𝑘 during row reduction and then the 𝑢𝑘,𝑘 in back substitution. Further, those 𝑎𝑘,𝑘 are the final updated
values at indices (𝑘, 𝑘), so are the same as 𝑢𝑘,𝑘 . Thus it is exactly these main diagonal elements that we must deal with.
The basic strategy is that at step 𝑘, we can swap equation 𝑘 with any equation 𝑖, 𝑖 > 𝑘. Note that this involves swapping
those rows of array A and also those elements of the array b for the right-hand side: 𝑏𝑘 ↔ 𝑏𝑖 .
This approach of swapping equations (swapping rows in arrays A and b) is called pivoting, or more specifically partial
pivoting, to distinguish from the more elaborate strategy where to columns of A are also reordered (which is equivalent
to reordeting the unknowns in the equations). The row that is swapped with row 𝑘 is sometimes called the pivot row, and
the new denominator is the corresponding pivot element.
This approach is robust so long as one is using exact arithmetic: it works for any well-posed system because so long as
(𝑘−1)
the 𝐴𝑥 = 𝑏 has a unique solution — so that the original matrix 𝐴 is non-singular — at least one of the 𝑎𝑖,𝑘 , 𝑖 ≥ 𝑘
will be non-zero, and thus the swap will give a new element in position (𝑘, 𝑘) that is non-zero. (I will stop caring about
(𝑘)
superscripts to distinguish updates, but if you wish to, the elements of the new row 𝑘 could be called either 𝑎𝑘,𝑗 or even
𝑢𝑘,𝑗 , since those values are in their final state.)
The final refinement is to seek the smallest possible magnitudes for intermediate values, and thus the smallest absolute
(𝑘−1)
errors in them, by making the multipliers 𝑙𝑖,𝑘 small, in turn by making the denominator 𝑎𝑘,𝑘 = 𝑢𝑘,𝑘 as large as possible
in magnitude:
(𝑘−1) (𝑘−1)
At step 𝑘, choose the pivot row 𝑝𝑘 ≥ 𝑘 so that |𝑎𝑝𝑘 ,𝑘 | ≥ |𝑎𝑖,𝑘 | for all 𝑖 ≥ 𝑘. If there is more that one such element of
largest magnitude, use the lowest value: in particular, if 𝑝𝑘 = 𝑘 works, use it and do not swap!
I will not give a detailed algorithm for this, since we will soon implement an even better variant.
However, here are some notes on swapping values and how to avoid a possible pitfall.
a) Explain why we cannot just swap the relevant elements of rows 𝑘 and 𝑝 with:
for j in range(k,n):
A[k,j] = A[p,j]
A[p,j] = A[k,j]
A[k,k:] = A[p,k:]
A[p,k:] = A[k,k:]
temp = A[k,k:].copy()
A[k,k:] = A[p,k:]
A[p,k:] = temp
Some demonstrations
No row reduction is done here, so entire rows are swapped rather than just the elements from column 𝑘 onward:
Initially,
A =
[[ 1. -6. 2.]
[ 3. 5. -6.]
[ 4. 2. 7.]]
k = 0
p = 2
temp = A[k,k:].copy()
(continues on next page)
After swapping rows 1 <-> 3 (row indices 0 <-> 2) using slicing and a temporary␣
↪row,
A =
[[ 4. 2. 7.]
[ 3. 5. -6.]
[ 1. -6. 2.]]
k = 1
p = 2
for j in range(n):
( A[k,j] , A[p,j] ) = ( A[p,j] , A[k,j] )
print(f"After swapping rows 2 <-> 3 using a loop and tuples of elements, no temp,")
print(f"A =\n {A}")
After swapping rows 2 <-> 3 using a loop and tuples of elements, no temp,
A =
[[ 4. 2. 7.]
[ 1. -6. 2.]
[ 3. 5. -6.]]
k = 0
p = 1
( A[k,k:] , A[p,k:] ) = ( A[p,k:].copy() , A[k,k:].copy() )
print(f"After swapping rows 1 <-> 2 using tuples of slices, no loop or temp,")
print(f"A =\n {A}")
References:
• Section 2.2 The LU Factorization of [Sauer, 2022].
• Section 6.5 Matrix Factorizations of [Burden et al., 2016].
• Section 8.1 Matrix Factorizations of [Chenney and Kincaid, 2012].
Putting aside pivoting for a while, there is another direction in which the algorithm for solving linear systems 𝐴𝑥 = 𝑏 can
be improved. It starts with the idea of being more efficient when solving multiple system with the same right-hand side:
𝐴𝑥(𝑚) = 𝑏(𝑚) , 𝑚 = 1, 2, ….
However it has several other benefits:
• allowing a strategy to reduce rounding error, and
• a simpler, more elegant mathematical statement.
We will see how to merge this with partial pivoting in Solving Ax = b With Both Pivoting and LU factorization.
Some useful jargon:
The key to the LU factorization idea is finding a lower triangular matrix 𝐿 and an upper triangular matrix 𝑈 such that
𝐿𝑈 = 𝐴, and then using the fact that it is far quicker to solve a linear system when the corresponding matrix is triangular.
Indeed we will see that, if naive Gaussian elimination for 𝐴𝑥 = 𝑏 succeeds, giving row-reduced form 𝑈 𝑥 = 𝑐:
1. The matrix 𝐴 can be factorized as 𝐴 = 𝐿𝑈 with 𝑈 an 𝑛 × 𝑛 upper triangular matrix and 𝐿 an 𝑛 × 𝑛 lower
triangular matrix.
2. There is a unique such factorization with the further condition that 𝐿 is unit lower triangular, which means the
extra requirement that the value on its main daigonal are unity: 𝑙𝑘,𝑘 = 1. This is called the Doolittle Factorization
of 𝐴.
3. In the Doolittle factorization, the matrix 𝑈 is the one given by naive Gaussian elimination, and the elements of 𝐿
below its main diagonal are the multipliers arising in naive Gaussian elimination. (The other elements of 𝐿, on and
above the main diagonal, are the ones and zeros dictated by it being unit lower triangular: the same as for those
elements in the 𝑛 × 𝑛 identity matrix.)
4. The transformed right-hand side 𝑐 arising from naive Gaussian elimination is the solution of the system 𝐿𝑐 = 𝑏,
and this is solvable by an procedure caled forward substitution, very similar to the backward subsitution used to
solve 𝑈 𝑥 = 𝑐.
Putting all this together: if naive Gaussian elimination works for 𝐴, we can introduce the name 𝑐 for 𝑈 𝑥, and note that
𝐴𝑥 = (𝐿𝑈 )𝑥 = 𝐿(𝑈 𝑥) = 𝐿𝑐 = 𝑏. Then solving of the system 𝐴𝑥 = 𝑏 can be done in three steps:
1. Using 𝐴, find the Doolittle factors, 𝐿 and 𝑈 .
2. Using 𝐿 and 𝑏, solve 𝐿𝑐 = 𝑏 to get 𝑐. (Forward substitution)
3. Using 𝑈 and 𝑐, solve 𝑈 𝑥 = 𝑐 to get 𝑥. (Backward substitution)
If you believe the above claims, we already have one algorithm for finding an LU factorization; basically, do naive Gaussian
elimination, but ignore the right-hand side 𝑏 until later. However, there is another “direct” method, which does not rely
on anything we have seen before about Gaussian elimination, and has other advantages as we will see.
(If I were teaching linear algebra, I would be tempted to start here and skip Gaussian Elimination!)
This method starts by considering the apparently daunting task of solving the 𝑛2 simultaneous and nonlinear equations
for the initially unknown elements of 𝐿 and 𝑈 :
𝑛
∑ 𝑙𝑖,𝑘 𝑢𝑘,𝑗 = 𝑎𝑖,𝑗 1 ≤ 𝑖 ≤ 𝑛, 1 ≤ 𝑗 ≤ 𝑛.
𝑘=1
The first step is to insert the known information; the already-known values of elements of 𝐿 and 𝑈 . For one thing, the
sums above stop when either 𝑘 = 𝑖 or 𝑘 = 𝑗, whichever comes first, due to all the zeros in 𝐿 nd 𝑈 :
min(𝑖,𝑗)
∑ 𝑙𝑖,𝑘 𝑢𝑘,𝑗 = 𝑎𝑖,𝑗 1 ≤ 𝑖 ≤ 𝑛, 1 ≤ 𝑗 ≤ 𝑛.
𝑘=1
Next, when 𝑖 ≤ 𝑗— so that the sum ends at 𝑘 = 𝑖 and involves 𝑙𝑖,𝑖 — we can use 𝑙𝑖,𝑖 = 1.
So break up into two cases:
On and above the main diagonal (𝑖 ≤ 𝑗, so min(𝑖, 𝑗) = 𝑖):
𝑖−1
∑ 𝑙𝑖,𝑘 𝑢𝑘,𝑗 + 𝑢𝑖,𝑗 = 𝑎𝑖,𝑗 1 ≤ 𝑖 ≤ 𝑛, 𝑖 ≤ 𝑗 ≤ 𝑛.
𝑘=1
In each equation, the last term in the sum has been separated, so that we can use them to “solve” for an unknown:
𝑖−1
𝑢𝑖,𝑗 = 𝑎𝑖,𝑗 − ∑ 𝑙𝑖,𝑘 𝑢𝑘,𝑗 1 ≤ 𝑖 ≤ 𝑛, 𝑖 ≤ 𝑗 ≤ 𝑛.
𝑘=1
𝑗−1
𝑎𝑖,𝑗 − ∑𝑘=1 𝑙𝑖,𝑘 𝑢𝑘,𝑗
𝑙𝑖,𝑗 = 2 ≤ 𝑖 ≤ 𝑛, 1 ≤ 𝑗 ≤ 𝑖.
𝑢𝑗,𝑗
Here comes the characteristic step that gets us from valid equations to a useful algorithm: we can arrange these equations
in an order such that all the values at right are determined by an earlier equation!
First look at what they say for the first row and first column.
With 𝑖 = 1 in the first equation, there is no sum, and so: $𝑢1,𝑗 = 𝑎1,𝑗 , 1 ≤ 𝑗 ≤ 𝑛,$ which is the familiar fact that the
first row is unchanged in naive Gaussian elimination.
𝑎𝑖,1 𝑢𝑖,1
Next, with 𝑗 = 1 in the second equation, there is again no sum: $𝑙𝑖,1 = 𝑢1,1 , = 𝑢1,1 , 2 ≤ 𝑖 ≤ 𝑛,$ which is indeed the
multipliers in the first step of naive Gaussian elimination.
Remember that one way to think of Gaussian elimination is recursively: after step 𝑘, one just applies the same process
recursively to the smaller 𝑛 − 𝑘 × 𝑛 − 𝑘 matrix in the bottom-right-hand corner. We can do something similar here; at
stage 𝑘:
1. First use the first of the above equations to solve first for row 𝑘 of 𝑈 , meaning just 𝑢𝑘,𝑗 , 𝑗 ≥ 𝑘,
2. Then use the second equation to solve for column 𝑘 of 𝐿: 𝑙𝑖,𝑘 , 𝑖 > 𝑘.
# Import some items from modules individually, so they can be used by "first name only
↪".
and row matrix L[k, 1:k-1] by matrix U[1:k-1,k:n] gives the relevant row vector.
"""
n = len(A) # len() gives the number of rows in a 2D array.
# Initialize U as the zero matrix;
# correct below the main diagonal, with the other entries to be computed below.
U = zeros_like(A)
# Initialize L as the identity matrix;
# correct on and above the main diagonal, with the other entries to be computed␣
↪below.
L = identity(n)
# Column and row 1 (i.e Python index 0) are special:
U[0,:] = A[0,:]
L[1:,0] = A[1:,0]/U[0,0]
if demoMode:
print(f"After step k=0")
print(f"U=\n{U}")
print(f"L=\n{L}")
for k in range(1, n-1):
U[k,k:] = A[k,k:] - L[k,:k] @ U[:k,k:]
L[k+1:,k] = (A[k+1:,k] - L[k+1:,:k] @ U[:k,k])/U[k,k]
if demoMode:
print(f"After step {k=}")
print(f"U=\n{U}")
print(f"L=\n{L}")
# The last row (index "-1") is special: nothing to do for L
U[-1,-1] = A[-1,-1] - sum(L[-1,:-1]*U[:-1,-1])
if demoMode:
print(f"After the final step, k={n-1}")
print(f"U=\n{U}")
return (L, U)
print(f"A=\n{A}")
A=
[[ 4. 2. 7.]
[ 3. 5. -6.]
[ 1. -3. 2.]]
print(f"A=\n{A}")
print(f"L=\n{L}")
print(f"U=\n{U}")
print(f"L times U is \n{L@U}")
print(f"The 'residual' A - LU is \n{A - L@U}")
A=
[[ 4. 2. 7.]
[ 3. 5. -6.]
[ 1. -3. 2.]]
L=
[[ 1. 0. 0. ]
[ 0.75 1. 0. ]
[ 0.25 -1. 1. ]]
U=
[[ 4. 2. 7. ]
[ 0. 3.5 -11.25]
[ 0. 0. -11. ]]
L times U is
[[ 4. 2. 7.]
[ 3. 5. -6.]
[ 1. -3. 2.]]
The 'residual' A - LU is
[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
This is the last piece missing. The strategy is very similar to backward substitution, but slightly simplified by the ones on
the main didogonal of 𝐿. The equations 𝐿𝑐 = 𝑏 can be written much as above, separating off the last term in the sum:
𝑛
∑ 𝑙𝑖,𝑗 𝑐𝑗 = 𝑏𝑖 , 1 ≤ 𝑖 ≤ 𝑛
𝑗=1
𝑖
∑ 𝑙𝑖,𝑗 𝑐𝑗 = 𝑏𝑖 , 1 ≤ 𝑖 ≤ 𝑛
𝑗=1
𝑖−1
∑ 𝑙𝑖,𝑗 𝑐𝑗 + 𝑐𝑖 = 𝑏𝑖 , 1 ≤ 𝑖 ≤ 𝑛
𝑗=1
These are already is usable order: the right-hand side in the equation for 𝑐𝑖 involves only the 𝑐𝑗 values with 𝑗 < 𝑖,
determined by earlier equations if we run through index 𝑖 in increasing order.
First, 𝑖 = 1
0
𝑐1 = 𝑏1 − ∑ 𝑙1,𝑗 𝑐𝑗 , = 𝑏1
𝑗=1
Next, 𝑖 = 2
1
𝑐2 = 𝑏2 − ∑ 𝑙2,𝑗 𝑐𝑗 , = 𝑏2 − 𝑙2,1 𝑐1
𝑗=1
Next, 𝑖 = 3
2
𝑐3 = 𝑏3 − ∑ 𝑙3,𝑗 𝑐𝑗 , = 𝑏3 − 𝑙3,1 𝑐1 − 𝑙3,2 𝑐2
𝑗=1
I leave as an exerise expressing this is pseudo-code (adjusted to zero-based indexing); here it is in Python; also available
from the module numericalMethods with
Exercise 1
A) Express this forward substitution strategy as pseudo-code, adjusting to Python’s zero-based indexing. Spell out all the
sums in explicit rather than using ‘Σ’ notation for sums any matrix multiplication short-cut.
B) Then implement it “directly” in a Python function, with format:
function forwardSubstitution(L, b)
. . .
return c
Again do this with explicit evaluation of each sum rather than using the function sum or any matrix multiplication short-
cut.
C) Test it, using this often-useful “reverse-engineering” tactic:
1. Create suitable test arrays L and c. (Use 𝑛 at least three, and preferably larger.)
2. Compute their product, with b = L * c
3. Check if c_solution = forwardSubstitution(L, b) gives the correct value (within rounding error.)
As usual, there is also an implementation available from module numericalMethods, at … and forward substitution
…, so this is used here. (It is not in the form asked for in the above exercise!)
c = forwardSubstitution(L, b)
print(f"c = {c}")
print(f"The residual b - Lc is {b - L@c}")
print(f"\t with maximum norm {max(abs(b - L@c)):0.3}")
c = [2. 1.5 5. ]
The residual b - Lc is [0. 0. 0.]
with maximum norm 0.0
As this step is unchanged, just import the version seen in a previous section.
x = backwardSubstitution(U, c)
Exercise 2
if __name__ == "__main__":
We will add the Doolittle method and such in a while, and could use this module in assignments and projects.
Creating modules
One way to do this is with Spyder (or another Python IDE). However, if you prefer working primarily with JupyterLab
and Jupyter notebooks, one way to create this module is to first put the function def’s and testing code in a notebook
linearalgebra.ipynb and then convert that to linearalgebra.py with the JupyerLab menu command
File > Export Notebook As ... > Export Notebook to Executable Script
As an example of creating a module, I am creating one as we go along in this course, via the notebook Notebook for
generating the module numericalMethods and Python code file numericalMethods.py derived from that, which
defines module numericalMethods.
The notebook version is in the Appendices of this Jupyter book.
It was seen in the section Partial Pivoting that naive Gaussian elimination works (in the sense of avoiding division by zero)
so one good result is that
Theorem 2.4.1
Any SDD matrix has a Doolittle factorization 𝐴 = 𝐿𝑈 , with the diagonal elements of 𝑈 all non-zero, so backward
substitution also works.
For any column-wise SDD matrix, this LU factorization exists and is also “optimal”, in the sense that it follows what you
would do with maximal element partial pivoting.
This nice second property can be got for SDD matrices via a twist, or actually a transpose.
For an SDD matrix, it transpose 𝐵 = 𝐴𝑇 is column-wise SDD and so has the nice Doolitle factorization described above:
𝐵 = 𝐿𝐵 𝑈𝐵 , with 𝐿𝐵 being column-wise diagonally dominant and having ones on the main diagonal.
Transposing back, 𝐴 = 𝐵𝑇 = (𝐿𝐵 𝑈𝐵 )𝑇 = 𝑈𝐵𝑇 𝐿𝑇𝐵 , and defining 𝐿 = 𝑈𝐵𝑇 and 𝑈 = 𝐿𝑇𝐵 ,
• 𝐿 is lower triangular
• 𝑈 is upper triangular, row-wise diagonally dominant and with ones on it main diagonal: it is “unit upper triangular”.
• Thus 𝐿𝑈 is another LU factorization of 𝐴, with 𝑈 rather than 𝐿 being the factor with ones on its main diagonal.
This sort of 𝐿𝑈 factorization is called the Crout decomposition; as with the Doolittle version, if such a factorization
exists, it is unique.
Theorem 2.4.2
Every SDD matrix has a Crout decomposition, and the factor 𝑈 is SDD.
Remark 2.4.2
As was mentioned at the end of the section Row Reduction/Gaussian Elimination naive Gausion elminaor alwo worek for
positive definite matrices,amnd thus so does th Doolittle LU factirozation. However, there is another LU factorization that
works even better in that case, the Cholesky factorization; this topic might be returned to later.
References:
• Section 2.4 The PA=LU Factorization of [Sauer, 2022].
• Section 6.5 Matrix Factorizations of [Burden et al., 2016].
• Section 8.1 Matrix Factorizations of [Chenney and Kincaid, 2012].
3.5.1 Introduction
The last step in producing an algorithm for solving the general case of 𝑛 simultaneous linear equations in 𝑛 variables
that is robust, efficient and with good control of rounding error is to combine the ideas of partial pivoting from Partial
Pivoting and LU factorization from Solving Ax = b with LU factorization, A = L U.
This is sometimes described in three parts:
• permute (reorder) the rows of the matirx 𝐴 by multiplying it at left by a suitable permutation matrix 𝑃 ; one with a
single “1” in each row and each column and zeros elsewhere;
• Get the LU factorization of this matrix: 𝑃 𝐴 = 𝐿𝑈 .
• To solve 𝐴𝑥 = 𝑏
– Express as 𝑃 𝐴𝑥 = 𝐿𝑈 𝑥 = 𝑃 𝑏 (which just involves computing 𝑃 𝑏, which reorders the elements of 𝑏)
– Solve 𝐿𝑐 = 𝑃 𝑏 for 𝑐 by forward substitution
– Solve 𝑈 𝑥 = 𝑐 for 𝑥 by backward substitution: as before, this gives 𝐿𝑈 𝑥 = 𝐿𝑐 = 𝑃 𝑏 and 𝐿𝑈 𝑥 = 𝑃 𝐴𝑥,
so 𝑃 𝐴𝑥 = 𝑃 𝑏; since a permutation matrix 𝑃 is invertible (just unravel the row swaps), this ensures that
𝐴𝑥 = 𝑏.
This gives a nice formulas in terms of matrices; however we can describe it a bit more compactly and efficiently by just talk-
ing about the permutation of the rows, described by a permutation vector — an 𝑛 component vector 𝜋 = [𝜋1 , 𝜋2 , … , 𝜋𝑛 ]
whose elements are the integers from 1 to 𝑛 in some order. So that is how the algorithm will be described below.
(Aside: I use the conventional name 𝜋 for a permutation vector, partly to distinguish from the notation 𝑝𝑖 used for pivot
rows; however, feel free to use the name 𝑝 instead, especially in Julia code.)
A number of details of this sketch will now be filled in, including the very useful fact that the permutation vector (or
matrix) can be contsructed “on the fly”, as rows are swapped in partial pivoting.
Let us look at maximal element partial pivoting, but described in terms of the entries of the factors 𝐿 and 𝑈 , and updating
matrix 𝐴 with a succession of row swaps.
(For now, I omit what happens to the right-hand side vector 𝑏; that is where the permutation vecor 𝑝 will come in, as
addressed below.)
What happens if pivoting occurs at some stage 𝑘, with swapping of row 𝑘 with a row 𝑝𝑘 > 5?
One might fear that the process has to start again from the top using the modified version of matrix 𝐴, but in fact all
previous work can be reused, just swapping those rows “everywhere”.
To see this with a concrete example consider what happens if at stage 𝑘 = 5 we swap rows 5 and 10 of 𝐴.
A) Firstly, what happens to matrix 𝐴?
The previous steps of the LU factorization process only involved entries of 𝐴 in its first four rows and first four columns,
and this row swap has no effect of them. Likewise, in row reduction, changes at and below row 𝑘 = 5 have no effect on
the first four rows of the row reduced form, 𝑈 .
Thus, the only change here is to swap the entries of 𝐴 between rows 5 and 10. What is more, the subsequent calculations
only involve columns of index 𝑗 = 5 upwards, so in fact we only need to update those entries. This can be written as
Thus if we are working in Python with 𝐴 stored in a numpy array, the update is the slice operation
(except for that pesky Pythonic down-shifting of indices; to be seen in pseudo-code later!)
B) Next, look at the work done so far on 𝑈 .
That just consists of the previous rows 1 ≤ 𝑖 ≤ 4, and the swapping of rows 5 with 10 has no effect up there:
Values already computed in 𝑈 are unchanged.
C) Finally, look at the work done so far on the multipiers 𝑙𝑖,𝑗 ; that is, matrix 𝐿.
The values computed so far are the first four columns of 𝐿; the multiples 𝑙𝑖,𝑗 , 1 ≤ 𝑗 ≤ 4 of row 𝑗 subtracted from row
𝑖 > 𝑗. These do change: for example, the multiple 𝑙5,2 of row 2 is now subtracted from what was row 5 but is now row
10: thus, the new value of 𝑙10,2 is the previous value of 𝑙5,2 .
Likewise, the same is true in reverse: the new value of 𝑙5,2 is the previous value of 𝑙10,2 . This applies for all of the first
four rows, so second index 1 ≤ 𝑗 ≤ 4:
The entries of 𝐿 computed so far are swapped between rows 5 and 10, leaving the rest unchanged.
As this is again only for some columns — the first four — the swaps needed are:
The example above extends to all stages 𝑘 of row reduction or computing the LU factorization or a permute versio of
matrix 𝐴, where we adjust the pivot element at position (𝑘, 𝑘) by first swapping row 𝑘 with a row 𝑝𝑘 , ≥ 𝑘. (Allowing that
sometimes no swap is needed, so that 𝑝𝑘 = 𝑘.)
Gathering the key formulas above, this part of the algorithm is
Algorithm 2.5.1
for k from 1 to n-1
Find the pivot row 𝑝𝑘 , ≥ 𝑘.
if 𝑝𝑘 > 𝑘
Swap 𝑙𝑘,𝑗 ↔ 𝑙𝑝𝑘 ,𝑗 , 1≤𝑗<𝑘
Swap 𝑎𝑘,𝑗 ↔ 𝑎𝑝𝑘 ,𝑗 , 𝑘≤𝑗≤𝑛
end
end
Here I also adopt slice notation; for example, 𝑎𝑘,𝑘∶𝑛 denotes the slice [𝑎𝑘,𝑘 … 𝑎𝑘,𝑛 ].
end
for i from k+1 to n (Get the non-zero elements in column 𝑘 of 𝐿 — except the 1’s on its diagonal)
𝑘−1
𝑎𝑖,𝑘 − ∑𝑠=1 𝑙𝑖,𝑠 𝑢𝑠,𝑘
𝑙𝑖,𝑘 =
𝑢𝑘,𝑘
end
end
One thing is missing from this strategy so far: if we are solving with a given right-hand-side column vector 𝑏, we would
also swap its rows at each stage, with
𝑏𝑘 ↔ 𝑏𝑝𝑘
but with the LU factorization we need to keep track of these swaps for use later.
This turns out to mesh nicely with another detail: we can avoid actually copying array entries around by just keeping track
of the order in which we use rows to get zeros in other rows. Our goal will be a permutation vector 𝜋 = [𝜋1 , 𝜋2 , … 𝜋𝑛 ]
which says:
• First use row 𝜋1 to get zeros in column 1 of the 𝑛 − 1 other rows.
• Then use row 𝜋2 to get zeros in column 2 of the 𝑛 − 2 remaining rows.
• …
To do this:
• first, initialize an array 𝜋 = [1, 2, … , 𝑛]
• at stage 𝑘, if the pivot element is in row 𝑝𝑘 ≠ 𝑘, swap the corresponding elements in 𝜋 (rather than swapping entire
rows of arrays):
𝜋𝑘 ↔ 𝜋 𝑝 𝑘
Introducing the name 𝐴′ for the new version of matrix 𝐴, its row 𝑘 has entries 𝑎′𝑘,𝑗 = 𝑎𝜋𝑘 ,𝑗 .
This pattern persists through each row swap: instead of computing a succesion of updated versions of matrix 𝐴, we leave
it alone and just change the row indices:
All references to entries of 𝐴 are now done with permuted row index: 𝑎𝜋𝑖 ,𝑗
The same applies to the array 𝐿 of multipliers:
All references to entries of 𝐿 are now done with 𝑙𝜋𝑖 ,𝑗 .
Finally, since these row swaps also apply to the right-hand side 𝑏, we do the same there:
All references to entries of 𝑏 are now done with 𝑏𝜋𝑖 .
Remark 2.5.1
For the version with a permutation matrix 𝑃 , instead:
• start with an array 𝑃 that is the identity matrix, and then
• swap its rows 𝑘 ↔ 𝑝𝑘 at stage 𝑘 instead of swapping the entries of 𝜋 or the rows of 𝐴 and 𝐿.
and row matrix L[k, 1:k-1] by matrix U[1:k-1,k:n] gives the relevant row vector.
"""
(continues on next page)
A=
[[ 1. -3. 22.]
(continues on next page)
Matrix 𝐿 is not actually lower triangular, due to the permutation of its rows, but is still fine for a version of forward
substition, because
• row 𝜋1 only involves 𝑥1 (multiplied by 1) and so can be used to solve for 𝑥1
• row 𝜋2 only involves 𝑥1 and 𝑥2 (the latter multiplied by 1) and so can be used to solve for 𝑥2
• …
To solve 𝐿𝑐 = 𝑏, all one has to change from the formulas for forward substitution seen in the previous section Solving Ax
= b with LU factorization, A = L U is to put the permuted row index 𝜋𝑖 in both 𝐿 and 𝑏:
𝑖−1
𝑐𝑖 = 𝑏𝜋𝑖 − ∑ 𝑙𝜋𝑖 ,𝑗 𝑐𝑗 , 1 ≤ 𝑖 ≤ 𝑛
𝑗=1
print(f"b = {b}")
b = [2. 3. 4.]
c = forwardsubstitution(L, b, perm)
print(f"c={c}")
c=[4. 0. 1.]
Then the final step, solving 𝑈 𝑥 = 𝑏 for 𝑥, needs no change, because 𝑈 had no rows swapped, so we are done; we can
import the function backwardsubstitution seen previously
x = backwardsubstitution(U, c)
print(f"x={x}")
r = b - A@x
print(f"The residual r = b - Ax is \n{r}, with maximum norm {max(abs(r))}")
References:
• Section 2.3.1 Error Magnification and Condition Number of [Sauer, 2022].
• Section 7.5 Error Bounds and Iterative Refinement of [Burden et al., 2016] — but you may skip the last part, on
Iterative Refinement; that is not relevant here.
• Section 8.4 of [Chenney and Kincaid, 2012].
For an approximation 𝑥𝑎 of the solution 𝑥 of 𝐴𝑥 = 𝑏, the residual 𝑟 = 𝐴𝑥𝑎 − 𝑏 measures error as backward error, often
measured by a single number, the residual norm ‖𝐴𝑥𝑎 − 𝑏‖. Any norm could be used, but the maximum norm is usualt
preferred, for reasons that we will see soon.
The corresponding (dimensionless) measure of relative error is defined as
‖𝑟‖
.
‖𝑏‖
However, these can greatly underestimate the forward errors in the solution: the absolute error ‖𝑥 − 𝑥𝑎 ‖ and relative error
‖𝑥 − 𝑥𝑎 ‖
𝑅𝑒𝑙(𝑥𝑎 ) =
‖𝑥‖
To relate these to the residual, we need the concepts of a matrix norm and the condition number of a matrix.
Given any vector norm ‖ ⋅ ‖ — such as the maximum (“infinity”) norm ‖ ⋅ ‖∞ or the Euclidean norm (length) ‖ ⋅ ‖2 — the
correponding induced matrix norm is
‖𝐴𝑥‖
‖𝐴‖ ∶= max , = max ‖𝐴𝑥‖
𝑥≠0 ‖𝑥‖ ‖𝑥‖=1
This maximum exists for ethe rof these vector norms, and for the infinity norm there ia an explicit formula for it: for any
𝑚 × 𝑛 matrix,
𝑛
𝑚
‖𝐴‖∞ = max ∑ |𝑎𝑖𝑗 |
𝑖=1
𝑗=1
(On the other hand, it is far harder to compute the Euclidean norm of a matrix: the formula requires computing eigen-
values.)
Note that when the matrix is a vector considered as a matrix with a single column — so 𝑛 = 1 — the sum goes away, and
this agrees with the infinity vector norm. This allows us to consider vectors as being just matrices with a single column,
which we will often do from now on.
3.6. Error bounds for linear algebra, condition numbers, matrix norms, etc. 115
Introduction to Numerical Methods and Analysis with Python
These induced matrix norms have many properties in common with Euclidean length and other vector norms, but there
can also be products, and then one has to be careful.
1. ‖𝐴‖ ≥ 0 (positivity)
2. ‖𝐴‖ = 0 if and only if 𝐴 = 0 (definiteness)
3. ‖𝑐𝐴‖ = |𝑐| ‖𝐴‖ for any constant 𝑐 (absolute homogeneity)
4. ‖𝐴 + 𝐵‖ ≤ ‖𝐴‖ + ‖𝐵‖ (sub-additivity or the triangle inequality),
and when the product of two matrices makes sense (including matrix-vector products),
5. ‖𝐴𝐵‖ ≤ ‖𝐴‖ ‖𝐵‖ (sub-multiplicativity)
Note the failure to always have equality with products. Indeed one can have 𝐴𝐵 = 0 with 𝐴 and 𝐵 both non-zero, such
as when 𝐴 is a singular matrix and 𝐵 is a null-vector for it.
𝜅(𝐴) ∶= ‖𝐴‖‖𝐴−1 ‖
Note that for a singular matrix, this is undefined: we can intuitively say that the condition number is then infinite.
At the other extreme, the identity matrix 𝐼 has norm 1 and condition number 1 (using any norm), and this is the best
possible because in general 𝜅(𝐴) ≥ 1. (This follows from property 5, sub-multiplicativity.)
Aside: estimating ‖𝐴−1 ‖∞ and thence the condition number, and numpy.linalg.cond
In Python, good approximations of condition numbers are given by the function numpy.linalg.cond.
As with numpy.linalg.norm, the default numpy.linalg.cond(A) gives 𝜅2 (𝐴), based on the Euclidian length
‖ ⋅ ‖2 for vectors; to get the infinity norm version 𝜅∞ (𝐴) use numpy.linalg.cond(A, numpy.inf).
This is not done exactly, since computing the inverse is a lot of work for large matrices, and good estimates can be got
far more quickly. The basic idea is start with the formula
and instead compute the maximum over some finite selection of values for 𝑥: call them 𝑥(𝑘) . Then to evaluate 𝑦(𝑘) =
𝐴−1 𝑥(𝑘) , express this through the equation 𝐴𝑦(𝑘) = 𝑥(𝑘) . Once we have an LU factorization for 𝐴 (which one probably
would have when exploring errors in a numerical solution of 𝐴𝑥 = 𝑏) each of these systems can be solved relatively fast:
Then
Condition numbers, giving upper limit on the ratio of forward error to backward error, measure the amplification of errors,
and have counterparts in other contexts. For example, with an approximation 𝑟𝑎 of a root 𝑟 of the equation 𝑓(𝑥) = 0, the
1
ratio of forward error to backward error is bounded by max 1/|𝑓 ′ (𝑥)| = , where the maximum only need be
min |𝑓 ′ (𝑥)|
taken over an interval known to contain both the root and the approximation. This condition number becomes “infinite”
for a multiple root, 𝑓 ′ (𝑟) = 0, related to the problems we have seen in that case.
Careful calculation of an approximate solution 𝑥𝑎 of 𝐴𝑥 = 𝑏 can often get a residual that is at the level of machine
rounding error, so that roughly the relative backward error is of size comparable to the machine unit, 𝑢. The condition
number then guarantees that the (forward) relative error is no greater than about 𝑢 𝜅(𝐴).
In terms of significant bits, with 𝑝 bit machine arithmetic, one can hope to get 𝑝 − log2 (𝜅(𝐴)) significant bits in the result,
but can not rely on more, so one loses log2 (𝜅(𝐴)) significant bits. Compare this to the observation that one can expect to
lose at least 𝑝/2 significant bits when using the approximation 𝐷𝑓(𝑥) ≈ 𝐷ℎ 𝑓(𝑥) − (𝑓(𝑥 + ℎ) = 𝑓(𝑥))/ℎ.
A well-conditioned problem is one that is not too highly sensitive to errors in rounding or input data; for an eqution
𝐴𝑥 = 𝑏, this corresponds to the condition number of 𝐴 not being to large; the matrix 𝐴 is then sometimes also called
well-conditioned. This is of course vague, but might typically mean that 𝑝−log2 (𝜅(𝐴)) is a sufficient number of significant
bits for a particular purpose.
A problem that is not deemed well-conditioned is called ill-conditioned, so that a matrix of uncomfortably large condition
number is also sometimes called ill-conditioned. An ill-conditioned problem might still be well-posed, but just requiring
careful and precise solution methods.
1
𝐻𝑖,𝑗 =
𝑖+𝑗−1
3.6. Error bounds for linear algebra, condition numbers, matrix norms, etc. 117
Introduction to Numerical Methods and Analysis with Python
For example
and for larger or smaller 𝑛, one simply adds or remove rows below and columns at right.
These matrices arise in important situations like finding the polynomial of degree 𝑛 − 1 that fits given data in the sense
of minimizing the root-mean-square error — as we will discuss later in this course if there is time and interest.
Unfortunately as 𝑛 increases the condition number grows rapidly, causing severe rounding error problems. To illustrate
this, I will do something that one should usually avoid: compute the inverse of these matrices. This is also a case that
shows the advatage of the LU factorization, since one computes the inverse by succesively computing each column, by
solving 𝑛 different systems of equations, each with the same matrix 𝐴 on the left-hand side.
import numpy as np
from numpy import inf
from numericalMethods import lu_factorize, forwardsubstitution, backwardsubstitution,␣
↪solvelinearsystem
def inverse(A):
"""Use sparingly; there is usually a way to avoid computing inverses that is␣
↪faster and with less rounding error!"""
n = len(A)
A_inverse = np.zeros_like(A)
(L, U) = lu_factorize(A)
for i in range(n):
b = np.zeros(n)
b[i] = 1.0
c = forwardsubstitution(L, b)
A_inverse[:,i] = backwardsubstitution(U, c)
return A_inverse
def hilbert(n):
H = np.zeros([n,n])
for i in range(n):
for j in range(n):
H[i,j] = 1.0/(1.0 + i + j)
return H
for n in range(2,6):
H_n = hilbert(n)
print(f"H_{n} is")
print(H_n)
H_n_inverse = inverse(H_n)
print("and its inverse is")
print(H_n_inverse)
print("to verify, their product is")
print(H_n @ H_n_inverse)
print()
H_2 is
[[1. 0.5 ]
[0.5 0.33333333]]
and its inverse is
[[ 4. -6.]
[-6. 12.]]
to verify, their product is
[[1. 0.]
[0. 1.]]
H_3 is
[[1. 0.5 0.33333333]
[0.5 0.33333333 0.25 ]
[0.33333333 0.25 0.2 ]]
and its inverse is
[[ 9. -36. 30.]
[ -36. 192. -180.]
[ 30. -180. 180.]]
to verify, their product is
[[1. 0. 0.]
[0. 1. 0.]
[0. 0. 1.]]
H_4 is
[[1. 0.5 0.33333333 0.25 ]
[0.5 0.33333333 0.25 0.2 ]
[0.33333333 0.25 0.2 0.16666667]
[0.25 0.2 0.16666667 0.14285714]]
and its inverse is
[[ 16. -120. 240. -140.]
[ -120. 1200. -2700. 1680.]
[ 240. -2700. 6480. -4200.]
[ -140. 1680. -4200. 2800.]]
to verify, their product is
[[ 1.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
[-3.55271368e-15 1.00000000e+00 -1.13686838e-13 -1.13686838e-13]
[-3.55271368e-15 5.68434189e-14 1.00000000e+00 0.00000000e+00]
[ 0.00000000e+00 -5.68434189e-14 0.00000000e+00 1.00000000e+00]]
H_5 is
[[1. 0.5 0.33333333 0.25 0.2 ]
[0.5 0.33333333 0.25 0.2 0.16666667]
[0.33333333 0.25 0.2 0.16666667 0.14285714]
[0.25 0.2 0.16666667 0.14285714 0.125 ]
[0.2 0.16666667 0.14285714 0.125 0.11111111]]
and its inverse is
[[ 2.500e+01 -3.000e+02 1.050e+03 -1.400e+03 6.300e+02]
[-3.000e+02 4.800e+03 -1.890e+04 2.688e+04 -1.260e+04]
[ 1.050e+03 -1.890e+04 7.938e+04 -1.176e+05 5.670e+04]
[-1.400e+03 2.688e+04 -1.176e+05 1.792e+05 -8.820e+04]
[ 6.300e+02 -1.260e+04 5.670e+04 -8.820e+04 4.410e+04]]
to verify, their product is
[[ 1.00000000e+00 4.54747351e-13 0.00000000e+00 0.00000000e+00
0.00000000e+00]
[-2.84217094e-14 1.00000000e+00 -1.81898940e-12 -1.81898940e-12
1.81898940e-12]
[-4.26325641e-14 9.09494702e-13 1.00000000e+00 0.00000000e+00
(continues on next page)
3.6. Error bounds for linear algebra, condition numbers, matrix norms, etc. 119
Introduction to Numerical Methods and Analysis with Python
Note how the inverses have some surprisingly large elements; this is the matrix equivalent of a number being very close
to zero and so with a very large reciprocal.
Since we have the inverses, we can compute the matrix norms of each 𝐻𝑛 and its inverse, and thence their condition
numbers; then this can be compared to the approximations of these condition numbers given by numpy.linalg.
cond
for n in range(2,6):
H_n = hilbert(n)
print(f"H_{n} is")
print(H_n)
print(f"with infinity norm {norm(H_n, inf)}")
H_n_inverse = inverse(H_n)
print("and its inverse is")
print(H_n_inverse)
print(f"with infinity norm {norm(H_n_inverse, inf)}")
print(f"Thus the condition number of H_{n} is {norm(H_n, inf) * norm(H_n_inverse,␣
↪inf)}")
H_2 is
[[1. 0.5 ]
[0.5 0.33333333]]
with infinity norm 1.5
and its inverse is
[[ 4. -6.]
[-6. 12.]]
with infinity norm 18.000000000000007
Thus the condition number of H_2 is 27.00000000000001
For comparison, numpy.linalg.cond gives 27.00000000000001
H_3 is
[[1. 0.5 0.33333333]
[0.5 0.33333333 0.25 ]
[0.33333333 0.25 0.2 ]]
with infinity norm 1.8333333333333333
and its inverse is
[[ 9. -36. 30.]
[ -36. 192. -180.]
[ 30. -180. 180.]]
with infinity norm 408.00000000000165
Thus the condition number of H_3 is 748.000000000003
For comparison, numpy.linalg.cond gives 748.0000000000027
H_4 is
[[1. 0.5 0.33333333 0.25 ]
[0.5 0.33333333 0.25 0.2 ]
[0.33333333 0.25 0.2 0.16666667]
(continues on next page)
H_5 is
[[1. 0.5 0.33333333 0.25 0.2 ]
[0.5 0.33333333 0.25 0.2 0.16666667]
[0.33333333 0.25 0.2 0.16666667 0.14285714]
[0.25 0.2 0.16666667 0.14285714 0.125 ]
[0.2 0.16666667 0.14285714 0.125 0.11111111]]
with infinity norm 2.283333333333333
and its inverse is
[[ 2.500e+01 -3.000e+02 1.050e+03 -1.400e+03 6.300e+02]
[-3.000e+02 4.800e+03 -1.890e+04 2.688e+04 -1.260e+04]
[ 1.050e+03 -1.890e+04 7.938e+04 -1.176e+05 5.670e+04]
[-1.400e+03 2.688e+04 -1.176e+05 1.792e+05 -8.820e+04]
[ 6.300e+02 -1.260e+04 5.670e+04 -8.820e+04 4.410e+04]]
with infinity norm 413279.999999164
Thus the condition number of H_5 is 943655.9999980911
For comparison, numpy.linalg.cond gives 943655.9999999335
Next, experiment with solving equations, to compare residuala with actual errors.
I will use the testing strategy of starting with a known solution 𝑥, from which the right-hand side 𝑏 is computed; then
slight simulated error is introduced to 𝑏. Running this repeatedly with use of different random “errors” gives an idea of
the actual error.
for n in range(2,6):
print(f"{n=}")
H_n = hilbert(n)
x = np.linspace(1.0, n, n)
print(f"x is {x}")
b = H_n @ x
print(f"b is {b}")
error_scale = 1e-8
b_imperfect = b + 2.0 * error_scale * (random(n) - 0.5) # add random "errors"␣
↪between -error_scale and error_scale
3.6. Error bounds for linear algebra, condition numbers, matrix norms, etc. 121
Introduction to Numerical Methods and Analysis with Python
print()
n=2
x is [1. 2.]
b is [2. 1.16666667]
b has been slightly changed to [2. 1.16666667]
The residual maximum norm is 7.2687527108428185e-09
and the relative backward error ||r||/||b|| is 3.634e-09
The absolute error is 9.218e-08
The relative error is 4.609e-08
For comparison, the relative error bound from the formula above is 9.813e-08
Beware: the relative error is larger than the relative backward error by a factor␣
↪12.682367
n=3
x is [1. 2. 3.]
b is [3. 1.91666667 1.43333333]
b has been slightly changed to [3.00000001 1.91666667 1.43333333]
The residual maximum norm is 7.658041312197383e-09
and the relative backward error ||r||/||b|| is 2.553e-09
The absolute error is 1.367e-06
The relative error is 4.555e-07
For comparison, the relative error bound from the formula above is 1.909e-06
Beware: the relative error is larger than the relative backward error by a factor␣
↪178.4471
n=4
x is [1. 2. 3. 4.]
b is [4. 2.71666667 2.1 1.72142857]
b has been slightly changed to [4. 2.71666667 2.1 1.72142857]
The residual maximum norm is 6.2034000158917024e-09
and the relative backward error ||r||/||b|| is 1.551e-09
The absolute error is 5.916e-05
The relative error is 1.479e-05
For comparison, the relative error bound from the formula above is 4.401e-05
Beware: the relative error is larger than the relative backward error by a factor␣
↪9536.5336
n=5
x is [1. 2. 3. 4. 5.]
b is [5. 3.55 2.81428571 2.34642857 2.01746032]
b has been slightly changed to [5.00000001 3.55 2.81428571 2.34642857 2.
↪01746033]
Beware: the relative error is larger than the relative backward error by a factor␣
↪83063.018
References:
• Section 2.5 Iterative Methods in [Sauer, 2022], sub-sections 2.5.1 to 2.5.3.
• Chapter 7 Iterative Techniques in Linear Algebra in [Burden et al., 2016], sections 7.1 to 7.3.
• Section 8.4 in [Chenney and Kincaid, 2012].
3.7.1 Introduction
This topic is a huge area, with lots of ongoing research; this section just explores the first few methods in the field:
1. The Jacobi Method.
2. The Gauss-Seidel Method.
The next three major topics for further study are:
3. The Method of Succesive Over-Relaxation (“SOR”). This is usually done as a modification of the Gauss-Seidel
method, though the strategy of “over-relaxation” can also be applied to other iterative methods such as the Jacobi
method.
4. The Conjugate Gradient Method (“CG”). This is beyond the scope of this course; I mention it because in the realm
of solving linear systems that arise in the solution of differential equations, CG and SOR are the basis of many of
the most modern, advanced methods.
5. Preconditioning.
The basis of the Jacobi method for solving 𝐴𝑥 = 𝑏 is splitting 𝐴 as 𝐷 + 𝑅 where 𝐷 is the diagonal of 𝐴:
𝑑𝑖,𝑖 = 𝑎𝑖,𝑖
𝑑𝑖,𝑗 = 0, 𝑖≠𝑗
so that 𝑅 = 𝐴 − 𝐷 has
𝑟𝑖,𝑖 = 0
𝑟𝑖,𝑗 = 𝑎𝑖,𝑗 , 𝑖≠𝑗
Visually
𝑎11 0 0 …
⎡ 0 𝑎22 0 … ⎤
𝐷=⎢ ⎥
⎢ 0 0 𝑎33 … ⎥
⎣ ⋮ ⋮ ⋮ ⋱ ⎦
It is easy to solve 𝐷𝑥 = 𝑏: the equations are just 𝑎𝑖𝑖 𝑥𝑖 = 𝑏𝑖 with solution 𝑥𝑖 = 𝑏𝑖 /𝑎𝑖𝑖 .
Thus we rewrite the equation 𝐴𝑥 = 𝐷𝑥 + 𝑅𝑥 = 𝑏 in the fixed point form
𝐷𝑥 = 𝑏 − 𝑅𝑥
and then use the familiar fixed point iteration strategy of inserting the currect approximation at right and solving for the
new approximation at left:
𝐷𝑥(𝑘) = 𝑏 − 𝑅𝑥(𝑘−1)
Note: We could make this look closer to the standard fixed-point iteration form 𝑥𝑘 = 𝑔(𝑥𝑘−1 ) by dividing out 𝐷 to get
but — as is often the case — it will be better to avoid matrix inverses by instead solving this easy system. This “inverse
avoidance” becomes far more important when we get to the Gauss-Seidel method!
x = jacobi_basic(A, b, n)
B) Then refine this to apply an error tolerance, but also avoiding infinite loops by imposing an upper limit on the number
of iterations:
Test this with the matrices of form 𝑇 below for several values of 𝑛, increasingly geometrically. To be cautious initially,
try 𝑛 = 2, 4, 8, 16, …
To analyse the Jacobi method — answering questions like for which matrices it works, and how quickly it converges —
and also to improve on it, it helps to described a key strategy underlying it, which is this: approximate the matrix 𝐴 by
another one 𝐸 one that is easier to solve with, chosen so that the discrepacy 𝑅 = 𝐴 − 𝐸 is small enough. Thus, repeatedly
solving the new easier equations 𝐸𝑥(𝑘) = 𝑏(𝑘) plays a similar role to repeatedly solving tangent line approximations in
Newton’s method.
Of course to be of any use, 𝐸 must be somewhat close to 𝐴; the remainder 𝑅 must be small enough. We can make
this requirement precise with the use of matrix norms introduced in Error bounds for linear algebra, condition numbers,
matrix norms, etc. and an upgrade of the contraction mapping theorem seen in Solving Equations by Fixed Point Iteration
(of Contraction Mappings).
where 𝑐 = 𝐸 −1 𝑏 and 𝑆 = 𝐸 −1 𝑅.
For vector-valued functions we extend the previous Definition 1.2.2 in Section Solving Equations by Fixed Point Iteration
(of Contraction Mappings) as:
With this, it turns out that the above iteration converges if 𝑆 is “small enough” in the sense that ‖𝑆‖ = 𝐶 < 1 — and it
is enough that this works for any choice of matrix norm!
Theorem 2.7.2
If 𝑆 ∶= 𝐸 −1 𝑅 = 𝐸 −1 𝐴−𝐼 has ‖𝑆‖ = 𝐶 < 1 for any choice of matrix norm, then the iterative scheme 𝑥(𝑘) = 𝑐−𝑆𝑥(𝑘−1)
with 𝑐 = 𝐸 −1 𝑏 converges to the solution of 𝐴𝑥 = 𝑏 for any choice of the initial approximation 𝑥(0) . (Aside: the zero
vector is an obvious and popular choice for 𝑥(0) .)
Incidentally, since this condition guarantees that there exists a unique solution to 𝐴𝑥 = 𝑏, it also shows that 𝐴 is non-
singular.
Proof. (sketch)
The main idea is that for 𝑔(𝑥) = 𝑐 − 𝑆𝑥,
‖𝑔(𝑥) − 𝑔(𝑦)‖ = ‖(𝑐 − 𝑆𝑥) − (𝑐 − 𝑆𝑦)‖ = ‖𝑆(𝑦 − 𝑥)‖ ≤ ‖𝑆‖‖𝑦 − 𝑥‖ ≤ 𝐶‖𝑥 − 𝑦‖,
For the Jacobi method, 𝐸 = 𝐷 so 𝐸 −1 is the diagonal matrix with elements 1/𝑎𝑖,𝑖 on the main diagonal, zero elsewhere.
The product 𝐸 −1 𝐴 then multiplies each row 𝑖 of 𝐴 by 1/𝑎𝑖,𝑖 , giving
• First, sum the absolute values of elements in each row 𝑖; with the common factor 1/|𝑎𝑖,𝑖 |, this gives
(|𝑎𝑖,1 | + |𝑎𝑖,2 | + ⋯ |𝑎𝑖,𝑖−1 | + |𝑎𝑖,𝑖+1 | + ⋯ |𝑎𝑖,𝑛 |) /|𝑎𝑖,𝑖 |.
Such a sum, skipping index 𝑗 = 𝑖, can be abbreviated as
( ∑ |𝑎𝑖,𝑗 |) /|𝑎𝑖,𝑖 |
1≤𝑗≤𝑛,𝑗≠𝑖
𝑛
𝐶 = ‖𝐸 −1 𝐴‖∞ = max [( ∑ |𝑎𝑖,𝑗 |) /|𝑎𝑖,𝑖 |]
𝑖=1
1≤𝑗≤𝑛,𝑗≠𝑖
and the contraction condition 𝐶 < 1 becomes the requirement that each of these 𝑛 “row sums” is less than 1:
Multiplying each of the inequalities by the denominator |𝑎𝑖,𝑖 | gives 𝑛 conditions
This is strict diagonal dominance, as in Definition 2.1.1 in the section Row Reduction/Gaussian Elimination, and as dis-
cussed there, one way to think of this is that such a matrix 𝐴 is close to its main diagonal 𝐷, which is the intuitive condition
that the approximation of 𝐴 by 𝐷 as done in the Jacobi method is “good enough”.
And indeed, combining this result with Theorem 2.7.2 gives:
By the way, other matrix norms give other conditions guaranteeing convergence; perhaps the most useful of these others
is that it is also sufficient for 𝐴 to be column-wise strictly diagonally dominant as in Definition 2.1.2.
𝐴𝐿 𝑥 = 𝑏 − 𝑈 𝑥
𝐴𝐿 𝑥(𝑘) = 𝑏 − 𝑈 𝑥(𝑘−1)
Here we definitely do not use the inverse of 𝐴𝐿 when calculating! Instead, solve with forward substitution.
However to analyse convergence, the mathematical form
𝑥(𝑘) = 𝐴−1 −1
𝐿 𝑏 − (𝐴𝐿 𝑈 )𝑥
(𝑘−1)
is useful: the iteration map is now 𝑔(𝑥) = 𝑐 − 𝑆𝑥 with 𝑐 = (𝐿 + 𝐷)−1 𝑏 and 𝑆 = (𝐿 + 𝐷)−1 𝑈 .
Arguing as above, we see that convergence is guaranteed if ‖(𝐿 + 𝐷)−1 𝑈 ‖ < 1. However it is not so easy in general to
get a formula for ‖(𝐿 + 𝐷)−1 𝑈 ‖; what one can get is slightly disappointing in that, despite the 𝑅 = 𝑈 here being in some
sense “smaller” than the 𝑅 = 𝐿 + 𝑈 for the Jacobi method, the general convergence guarantee looks no better:
However, in practice the convergence rate as given by 𝐶 = 𝐶𝐺𝑆 = ‖(𝐿 + 𝐷)−1 𝑈 ‖ is often better than for the 𝐶 =
𝐶𝐽 = ‖𝐷−1 (𝐿 + 𝑈 )‖ for the Jacobi method.
Sometimes this reduces the number of iterations enough to outweigh the extra computational effort involved in each
iteration and make this faster overall than the Jacobi method — but not always.
Exercise 2: Implement and test the Gauss-Seidel method, and compare to Jacobi
Do the two versions as above and use the same test cases.
Then compare the speed/cost of the two methods: one way to do this is by using Python’s “stop watch”, function time.
time: see the description of Python module time in the Python manual.
3.7.5 A family of test cases, arising from boundary value problems for differential
equations
𝑡𝑖,𝑖 = 1 + 2ℎ2
𝑡𝑖,𝑖+1 = 𝑡𝑖,𝑖+1 = −ℎ2
𝑡𝑖,𝑗 = 0, |𝑖 − 𝑗| > 1
and variants of this arise in the solutions of boundary value problems for ODEs like
Reference:
Section 6.6 Special Types of Matrices in [Burden et al., 2016], the sub-sections on Band Matrices and Tridiagonal Matrices.
Differential equations often lead to the need to solve systems of equations 𝑇 𝑥 = 𝑏 where the matrix 𝑇 has this speical
form:
𝑑1 𝑢1
⎡ 𝑙 𝑑2 𝑢2 ⎤ 𝑥1 𝑏1
⎢ 1 ⎥⎡ ⎤ ⎡ 𝑏 ⎤
𝑙2 𝑑3 𝑢3 𝑥2
𝑇𝑥 = ⎢ ⎥⎢ ⎥=⎢ 2 ⎥
⎢ ⋱ ⋱ ⋱ ⎥⎢ ⋮ ⎥ ⎢ ⋮ ⎥
⎢ 𝑙𝑛−2 𝑑𝑛−1 𝑢𝑛−1 ⎥⎣ 𝑥𝑛 ⎦ ⎣ 𝑏𝑛 ⎦
⎣ 𝑙𝑛−1 𝑑𝑛 ⎦
with all “missing” entries being zeros. The notation used here suggests one efficient way to store such a matrix: as three
1D arrays 𝑑, 𝑙 and 𝑢.
(Such equations also arise in other important situations, such as spline interpolation)
It can be verified that LU factorization preserves all the non-zero values, so that the Doolittle algorithm — if it succeeds
without any division by zero — gives 𝑇 = 𝐿𝑈 with the form
1 𝐷1 𝑢1
⎡ 𝐿 1 ⎤ ⎡ 𝐷2 𝑢2 ⎤
⎢ 1 ⎥ ⎢ ⎥
𝐿2 1 𝐷3 𝑢3
𝐿=⎢ ⎥, 𝑈 = ⎢ ⎥
⎢ ⋱ ⋱ ⎥ ⎢ ⋱ ⋱ ⎥
⎢ 𝐿𝑛−2 1 ⎥ ⎢ 𝐷𝑛−1 𝑢𝑛−1 ⎥
⎣ 𝐿𝑛−1 1 ⎦ ⎣ 𝐷𝑛 ⎦
Note that the first non-zero element in each column is unchanged, as with a full matrix, but now it means that the upper
diagonal elements 𝑢𝑖 are unchanged.
Again, one way to describe and store this information is with just the two new 1D arrays 𝐿 and 𝐷, along with the unchanged
array 𝑢.
3.8.2 Algorithms
3.8. Faster Methods for Solving 𝐴𝑥 = 𝑏 for Tridiagonal and Banded matrices, and Strict Diagonal
129
Dominance
Introduction to Numerical Methods and Analysis with Python
As we have seen, approximating derivatives to higher order of accuracy and approximating derivatives of order greater
than two requires more than three nodes, but the locations needed are all close to the ones where the derivative is being
approximated. For example, the simplest symmetric approximation of the fourth derivative 𝐷4 𝑓(𝑥) used values from
𝑓(𝑥 − 2ℎ) to 𝑓(𝑥 + 2ℎ). Then row 𝑖 of the corresponding matrix has all its non-zero elements at locations (𝑖, 𝑖 − 2) to
(𝑖, 𝑖 + 2): the non-zero elements lie in the narrow “band” where |𝑖 − 𝑗| ≤ 2, and thus on five “central” diagonals.
This is a penta-digonal matrix, and an example of the larger class of banded matrices: ones in which all the non-zero
elements have indices −𝑝 ≤ 𝑗 − 𝑖 ≤ 𝑞 for 𝑝 and 𝑞 smaller than 𝑛 — usually far smaller; 𝑝 = 𝑞 = 2 for a penta-digonal
matrix.
Let us recap the general Doolittle algorithm for computing an LU factorization:
With a banded matrix, many of the entries at right are zero, particularly in the two sums, which is where most of the
operations are. Thus we can rewrite, exploiting the fact that all elements with indices 𝑗 − 𝑖 < −𝑝 or 𝑗 − 𝑖 > 𝑞 are
zero. To start with, the top diagonal is not modified, as already noted for the tridiagonal case: 𝑢𝑘,𝑘+𝑞 = 𝑎𝑘,𝑘+𝑞 for
1 ≤ 𝑘 ≤ 𝑛 − 𝑞.
𝑢1,𝑗 = 𝑎1,𝑗
end
The top non-zero diagonal is unchanged:
for k from 1 to n - q
𝑢𝑘,𝑘+𝑞 = 𝑎𝑘,𝑘+𝑞
end
The left column requires no sums:
for i from 2 to 1+p
𝑙𝑖,1 = 𝑎𝑖,1 /𝑢1,1
end
The main loop:
for k from 2 to n
for j from k to min(n, k+q-1)
𝑘−1
𝑢𝑘,𝑗 = 𝑎𝑘,𝑗 − ∑ 𝑙𝑘,𝑠 𝑢𝑠,𝑗
𝑠=𝑚𝑎𝑥(1,𝑘−𝑝,𝑗−𝑞)
end
for i from k+1 to min(n,k+p-1)
𝑘−1
𝑙𝑖,𝑘 = ⎛
⎜𝑎𝑖,𝑘 − ∑ 𝑙𝑖,𝑠 𝑢𝑠,𝑘 ⎞
⎟ /𝑢𝑘,𝑘
⎝ 𝑠=𝑚𝑎𝑥(1,𝑖−𝑝,𝑘−𝑞) ⎠
end
end
It is common for a banded matrix to have equal band-width on either side, 𝑝 = 𝑞, as with tridiagonal and pentadiagonal
matrices. Then the algorithm is somewhat simpler:
3.8. Faster Methods for Solving 𝐴𝑥 = 𝑏 for Tridiagonal and Banded matrices, and Strict Diagonal
131
Dominance
Introduction to Numerical Methods and Analysis with Python
for k from 2 to n
for j from k to min(n, k+p-1)
𝑘−1
𝑢𝑘,𝑗 = 𝑎𝑘,𝑗 − ∑ 𝑙𝑘,𝑠 𝑢𝑠,𝑗
𝑠=𝑚𝑎𝑥(1,𝑗−𝑝)
end
for i from k+1 to min(n,k+p)
𝑘−1
𝑙𝑖,𝑘 = ⎛
⎜𝑎𝑖,𝑘 − ∑ 𝑙𝑖,𝑠 𝑢𝑠,𝑘 ⎞
⎟ /𝑢𝑘,𝑘
⎝ 𝑠=𝑚𝑎𝑥(1,𝑖−𝑝) ⎠
end
end
These algorithms for banded matrices do no pivoting, and that is highly desirable, because pivoting creates non-zero
elements outside the “band” and so can force one back to the general algorithm. Fortunately, we have seen one case
where this is fine: the matrix being either row-wise or column-wise strictly diagonally dominant.
References:
• Section 12.1 Power Iteration Methods of [Sauer, 2022].
• Section 7.2 Eigenvalues and Eigenvectors of [Burden et al., 2016].
• Chapter 8, More on Linear Equations of [Chenney and Kincaid, 2012], in particular section 3 Power Method, and
also section 2 Eigenvalues and Eigenvectors as background reading.
The eigenproblem for a square 𝑛 × 𝑛 matrix 𝐴 is to compute some or all non-trivial solutions of
𝐴𝑣 ⃗ = 𝜆𝑣.⃗
(By non-trivial, I mean to exclude 𝑣 ⃗ = 0, which gives a solution for any 𝜆.) That is, to compute the eigenvalues 𝜆 (of
which generically there are 𝑛, but sometimes less) and the eigenvectors 𝑣 ⃗ corresponding to each.
With eigenproblems, and particularly those arising from differential equations, one often needs only the few smallest
and/or largest eigenvalues. For these, the power method described next can be adapted, leading to the shifted inverse
power method.
Here we often restict our attention to the case of a real symmetric matrix (𝐴𝑇 = 𝐴, or 𝐴𝑖𝑗 = 𝐴𝑗𝑖 ), or a Hermitian matrix
(𝐴𝑇 = 𝐴∗ ), for which many things are a bit simpler:
• all eigenvalues are real,
• for symmetric matrices, all eigenvectors are also real,
• there is a complete set of orthogonal eigenvectors 𝑣𝑘⃗ , 1 ≤ 𝑖 ≤ 𝑛 that form a basis for all vectors, and so on.
However, the methods described here can be used more generally, or can be made to work with minor adjustments.
The eigenvalue are roots of the characteristic polynomial, det(𝐴 − 𝜆𝐼); repeated roots are possible, and they will all be
named, so there are always values 𝜆𝑖 , 1 ≤ 𝑖 ≤ 𝑛. Here, these eigenvalues will be enumerated in decreasing order of
magnitude:
Generically, all the magnitudes are different, which makes things works more easily, so that will sometimes be assumed
while developing the intuition of the method.
The basic tool is the Power Method, which will usually but not always succeed in computing the eigenvalue of largest
magnitude, 𝜆1 , and a corresponding eigenvector 𝑣1⃗ . Its success mainly involves assuming there being a unique largest
eigenvalue: 𝜆1 > 𝜆𝑖 for 𝑖 > 1.
In its simplest form, one starts with a unit-length vector 𝑥⃗ 0 , so ‖𝑥⃗ 0 ‖ = 1, constructs the successive multiples 𝑦 ⃗ 𝑘 = 𝐴𝑘 𝑥⃗ 0
by successive multiplications, and rescales at each stage to the unit vectors 𝑥⃗ 𝑘 = 𝑦 ⃗ 𝑘 /‖𝑦 ⃗ 𝑘 ‖.
Note that 𝑦 ⃗ 𝑘+1 = 𝐴𝑥⃗ 𝑘 , so that once 𝑥⃗ 𝑘 is approximately an eigenvector for eigenvalue 𝜆, 𝑦 ⃗ 𝑘+1 ≈ 𝜆𝑥⃗ 𝑘 , leading to the
eigenvalue approximation
3.9. Computing Eigenvalues and Eigenvectors: the Power Method, and a bit beyond 133
Introduction to Numerical Methods and Analysis with Python
Exercise 1
3 1 1
𝐴=⎡
⎢ 1 8 1 ⎤
⎥
⎣ 1 1 4 ⎦
This all real eigenvalues, all within 2 of the diagonal elements (this claim should be explained as part of the project
write-up), so start with it.
As a debugging strategy, you could replace all those off-diagonal ones by a small value 𝛿:
3 𝛿 𝛿
𝐴𝛿 = ⎡
⎢ 𝛿 8 𝛿 ⎤
⎥
⎣ 𝛿 𝛿 4 ⎦
Then the Gershgorin circle theorem ensures that each eigenvalue is within 2𝛿 of an entry on the main diagonal. Further-
more, if 𝛿 is small enough that the circles of radius 2𝛿 centered on the diagonal elements do not overlap, then there is one
eigenvalue in each circle.
You could even start with 𝛿 = 0, for which yuo know exactly the eigenvalues: they are the diagonal elements.
Here and below you could check your work with Numpy, using function numpy.linalg.eig(A).
However, that is almost cheating, so note that there is also a backward error check: see how small ‖𝐴𝑣 − 𝜆𝑣‖/‖𝑣‖ is.
import numpy as np
import numpy.linalg as la
help(la.eig)
delta = 0.01
A = np.array([[3, delta, delta],[delta, 8, delta],[delta, delta, 4]])
[eigenvalues, eigenvectors] = la.eig(A)
Some details are omitted above; above all, how to decide the number of iterations.
One approach is to use the fact that an eigenvector-eigenvalue pair satisfies 𝐴𝑣 ⃗ − 𝜆𝑣 ⃗ = 0, so the “residual norm”
‖𝐴𝑥⃗ 𝑘 − 𝑟(𝑘) 𝑥⃗ 𝑘 ‖
, = ‖𝑦 ⃗ 𝑘+1 − 𝑟(𝑘) 𝑥⃗ 𝑘 ‖ since ‖𝑥⃗ 𝑘 ‖ = 1
‖𝑥⃗ 𝑘 ‖
‖𝑦 ⃗ 𝑘+1 − 𝑟(𝑘) 𝑥⃗ 𝑘 ‖ ≤ 𝜖.
Alternatively, keep the for loop, but exit early (with break) if this condition is met.
I generally recommend this for-if-break form for implementing iterative methods, because it makes avoidance of
infinite loops simpler, and avoids the common while loop issue that you do not yet have an error estimate when the loop
starts.
Exercise 2
The next step is to note that if 𝐴 is nonsingular, its inverse 𝐵 = 𝐴−1 has the same eigenvectors, but with eigenvalues
𝜇𝑖 = 1/𝜆𝑖 .
Thus we can apply the power method to 𝐵 in order to compute its largest eigenvalue, which is 𝜇𝑛 = 1/𝜆𝑛 , along with
the corresponding eigenvector 𝑣𝑛⃗ .
The main change to the above is that
However, as usual one can (and should) avoid actually computing the inverse. Instead, express the above as the sysem of
equations.
𝐴𝑦 ⃗ 𝑘+1 = 𝑥𝑘⃗ .
Here is an important case where the LU factorization method can speed things up greatly: a single LU factorization is
needed, after which for each 𝑘 one only has to do the far quicker forward and backward substitution steps: 𝑂(𝑛2 ) cost
for each iteration instead of 𝑂(𝑛3 /3).
3.9. Computing Eigenvalues and Eigenvectors: the Power Method, and a bit beyond 135
Introduction to Numerical Methods and Analysis with Python
Exercise 3
Implement this basic algorithm (with a fixed iteration count, as in Example 1), and then create a second version that
imposes an accuracy target (as in Example 2).
3.9.3 Getting other eigenvalues with the Shifted Inverse Power Method
The inverse power method computes the eigenvalue closest to 0; by shifting, we can compute the eigenvalue closest to any
chosen value 𝑠. Then by searching various values of 𝑠, we can hope to find all the eigenvectors. As a variant, once we
have 𝜆1 and 𝜆𝑛 , we can search nearby for other large or small eigenvalues: often the few largest and/or the few smallest
are most important.
With a symmetric (or Hermitian) matrix, once the eigenvalue of largest magnitude, 𝜆1 is known, the rest are known to
be real values in the interval [−|𝜆1 |, |𝜆1 |], so we know roughly where to seek them.
The main idea here is that for any number 𝑠, matrix 𝐴 − 𝑠𝐼 has eigenvalues 𝜆𝑖 − 𝑠, with the same eigenvectors as 𝐴:
Thus, applying the inverse power method to 𝐴 − 𝑠𝐼 computes its largest eigenvalue 𝛾, and then 𝜆 = 1/(𝛾 + 𝑠) is the
eigenvalue of 𝐴 closest to 𝑠.
Exercise 4
As above, implement this, probably sarting with a fixed iteration count version.
For the test case above, some plausible initial choices for the shifts are each if the entries on the main diagonal, and as
above, testing with 𝐴𝑠
3.9.4 Further topics: getting all the eigenvalues with the QR Method, etc.
The above methods are not ideal when many or all of the eigenvalues of a matrix are wanted; then a variety of more
advanced methods have been developed, starting with the QR (factorization) Method.
We will not address the details of that method in this course, but one way to think about it for a symmetric matrix is that:
• The eigenvectors are orthogonal.
• Thus, if after computing 𝜆1 and 𝑣1⃗ , one uses the power iteration starting with 𝑥⃗ 0,2 orthogonal to 𝑣1⃗ , then all the
new iterates 𝑥⃗ 𝑘,2 will stay orthogonal, and one will get the eigenvector corresponding to the largest remaining
eigenvector: you get 𝑣2⃗ and 𝜆2 .
• Continuing likewise, one can get the eigenvalues in descending order of magnitude.
• As a modification, one can do all these almost in parallel: at iteration 𝑘, have an approximation 𝑥⃗ 𝑘,𝑖 for each 𝜆𝑖 and
at each stage, got by adjusting these new approximations so that 𝑥⃗ 𝑘,𝑖 is orthogonal to all the approximations 𝑥⃗ 𝑘,𝑗 ,
𝑗 < 𝑖, for all the previous (larger) eigenvalues. This uses a variant of the Gram-Schmidt method for orthogonalizing
a set of vectors.
References:
• Section 2.7 Nonlinear Systems of Equations of [Sauer, 2022]; in particular Sub-section 2.7.1 Multivariate Newton’s
Method.
• Chapter 10 Numerical Solution of Nonlinear Systems of Equations of [Burden et al., 2016]; in particular Sections
10.1 and 10.2.
3.10.1 Background
𝐹 (𝑥) = 0, 𝐹 ∶ ℝ𝑛 → ℝ𝑛
However, I use capital letters for vector-valued functions, for analogy to the use of capital letter for matrices.
𝑓(𝑥(𝑘) )
Rewriting Newton’s method according to this new style, $𝑥(𝑘+1) = 𝑥(𝑘) − 𝐷𝑓(𝑥(𝑘) )
$
or to avoid explicit division and introducing the useful increment 𝛿 (𝑘) ∶= 𝑥(𝑘+1) − 𝑥(𝑘) ,
For vector valued functions, we will see in a while that an analogous result is true:
where 𝐷𝐹 (𝑥) is the 𝑛 × 𝑛 matrix of all the partial derivatives (𝐷𝑥𝑗 𝐹𝑖 )(𝑥) or (𝐷𝑗 𝐹𝑖 )(𝑥), where 𝑥 = (𝑥1 , 𝑥2 , … , 𝑥𝑛 ).
To justify the above result, we need at least a case of Taylor’s Theorem for functions of several variables, for both
𝑓 ∶ ℝ𝑛 → ℝ and 𝐹 ∶ ℝ𝑛 → ℝ𝑛 ; just for linear approximations. This material from multi-variable calculus will be
reviewed when we need it.
Warning: although mathematically this can be written with matrix inverses as
(𝑘+1) (𝑘) (𝑘) −1 (𝑘)
X =X − (𝐷F(X ) (F(X ),
evaluation of the inverse is in general about three times slower than solving the linear system, so is best avoided. (We
have seen a good compromise; Solving Ax = b with LU factorization, A = L U the LU factorization of a matrix.)
Even avoiding matrix inversion, this involves repeatedly solving systems of 𝑛 simultaneous linear equations in 𝑛 unknowns,
𝐴𝑥 = 𝑏, where the matrix 𝐴 is 𝐷F(𝑥(𝑘) ), and that will be seen to involve about 𝑛3 /3 arithmetic operations.
It also requires computing the new values of these 𝑛2 partial derivatives at each iteration, also potentially with a cost
proportional to 𝑛3 .
When 𝑛 is large, as is common with differential equations problems, this factor of 𝑛3 indicates a potentially very large
cost per iteration, so various modifications have been developed to reduce the computational cost of each iteration (with
the trade-off being that more iterations are typically needed): so-called quasi-Newton methods.
FOUR
References:
• Chapter 3 Interpolation of [Sauer, 2022].
• Chapter 3 Interpolation and Polynomial Approximation of [Burden et al., 2016].
• Chapter 4 of [Kincaid and Chenney, 1990].
References:
• Section 3.1 Data and Interpolating Functions in [Sauer, 2022].
• Section 3.1 Interpolation and the Lagrange Polynomial in [Burden et al., 2016].
• Section 4.1 in [Chenney and Kincaid, 2012].
4.1.1 Introduction
Numerical methods for dealing with functions require describing them, at least approximately, using a finite list of num-
bers, and the most basic approach is to approximate by a polynomial. (Other important choices are rational functions and
“trigonometric polynomials”: sums of multiples of sines and cosines.) Such polynomials can then be used to approximate
derivatives and integrals.
The simplest idea for approximating 𝑓(𝑥) on domain [𝑎, 𝑏] is to start with a finite collection of node values 𝑥𝑖 ∈ [𝑎, 𝑏],
0 ≤ 𝑖 ≤ 𝑛 and then seek a polynomial 𝑝 which collocates with 𝑓 at those values: 𝑝(𝑥𝑖 ) = 𝑓(𝑥𝑖 ) for 0 ≤ 𝑖 ≤ 𝑛. Actually,
we can put the function aside for now, and simply seek a polynomial that passes through a list of points (𝑥𝑖 , 𝑦𝑖 ); later we
will achieve collocation with 𝑓 by choosing 𝑦𝑖 = 𝑓(𝑥𝑖 ).
In fact there are infinitely many such polynomials: given one, add to it any polynomial with zeros at all of the 𝑛 + 1 notes.
So to make the problem well-posed, we seek the collocating polynomial of lowest degree.
(Note: although the degree is typically 𝑛, it can be less; as an extreme example, if all 𝑦𝑖 are equal to 𝑐, then 𝑃 (𝑥) is that
constant 𝑐.)
139
Introduction to Numerical Methods and Analysis with Python
Historically there are several methods for finding 𝑃𝑛 and proving its uniqueness, in particular, the divided difference
method introduced by Newton and the Lagrange polynomial method. However for our purposes, and for most modern
needs, a different method is easiest, and it also introduces a strategy that will be of repeated use later in this course: the
Method of Undertermined Coefficients or MUC.
In general, this method starts by assuming that the function wanted is a sum of unknown multiples of a collection of
𝑛
known functions. Here, 𝑃 (𝑥) = 𝑐𝑛 𝑥𝑛 + 𝑐𝑛−1 𝑥𝑛−1 + ⋯ + 𝑐1 𝑥 + 𝑐0 = ∑𝑗=0 𝑐𝑗 𝑥𝑗 .
(Note: any of the 𝑐𝑖 could be zero, including 𝑐𝑛 , in which case the degree is less than 𝑛.)
The unknown factors (𝑐0 ⋯ 𝑐𝑛 ) are the undetermined coefficients.
Next one states the problem as a system of equations for these undetermined coefficients, and solves them.
Here, we have 𝑛 + 1 conditions to be met:
𝑛
𝑃 (𝑥𝑖 ) = ∑ 𝑐𝑗 𝑥𝑗𝑖 = 𝑦𝑖 , 0≤𝑖≤𝑛
𝑗=0
This is a system if 𝑛 + 1 simultaneous linear equations in 𝑛 + 1 iunknowns, so the question of existence and uniqueness is
exactly the question of whether the corresponding matrix is singular, and so is equivalent to the case of all 𝑦𝑖 = 0 having
only the solution with all 𝑐𝑖 = 0.
Back in terms of polynomials, this is the claim that the only polynomial of degree at most 𝑛 with distinct zeros 𝑥0 … 𝑥𝑛
is the zero function. And this is true, because any non-trivial polynomial with those 𝑛 + 1 distinct roots is of degree at
least 𝑛 + 1, so the only “degree n” polynomial fitting this data is 𝑃 (𝑥) ≡ 0. The theorem is proven.
The proof of this theorem is completely constructive; it gives the only numerical method we need, and which is the one
implemented in Numpy through the pair of functions numpy.polyfit and numpy.polyval. (Aside: here as in
many places, Numpy mimics the names and functionality of corresponding Matlab tools.)
Briefly, the algorithm is this (indexing from 0 !)
• Create the 𝑛 + 1 × 𝑛 + 1 matrix 𝑉 with elements
𝑣𝑖,𝑗 = 𝑥𝑗𝑖 , 0 ≤ 𝑖 ≤ 𝑛, 0 ≤ 𝑗 ≤ 𝑛
and the 𝑛 + 1-element column vector 𝑦 with elements 𝑦𝑖 as above.
• Solve 𝑉 𝑐 = 𝑦 for the vector of coefficients 𝑐𝑗 as above.
I use the name 𝑉 because this is called the Vandermonde Matrix.
Example 3.1.1
As usual, I concoct a first example with known correct answer, by using a polynomial as 𝑓:
def f(x):
return 4 + 7*x - 2*x**2 -5*x**3 + 3*x**4
xnodes = array([1., 2., 7., 5., 4.]) # They do not need to be in order
nnodes = len(xnodes)
n = nnodes-1
print(f"The x nodes 'x_i' are {xnodes}")
ynodes = zeros_like(xnodes)
for i in range(nnodes):
ynodes[i] = f(xnodes[i])
print(f"The y values at the nodes are {ynodes}")
V = zeros([nnodes, nnodes])
for i in range(nnodes):
for j in range(nnodes):
V[i,j] = xnodes[i]**j
Solve, using our functions seen in earlier sections and gathered in Notebook for generating the module numericalMethods
4.1.2 Functions for computing the coefficients and evaluating the polynomials
We will use this procedure several times, so it time to put it into a functions — and add a pretty printer for polynomials.
These are returned in an array c of the same length as x and y, even if the␣
↪degree is less than the normal length(x)-1,
"""
nnodes = len(x)
n = nnodes - 1
V = zeros([nnodes, nnodes])
for i in range(nnodes):
for j in range(nnodes):
V[i,j] = x[i]**j
(U, z) = rowReduce(V, y)
c = backwardSubstitution(U, z)
return c
def showPolynomial(c):
print("P(x) = ", end="")
n = len(c)-1
print(f"{c[0]:.4}", end="")
if n > 0:
coeff = c[1]
if coeff > 0:
print(f" + {coeff:.4}x", end="")
elif coeff < 0:
print(f" - {-coeff:.4}x", end="")
if n > 1:
for j in range(2, len(c)):
coeff = c[j]
if coeff > 0:
print(f" + {coeff:.4}x^{j}", end="")
elif coeff < 0:
print(f" - {-coeff:.4}x^{j}", end="")
print()
print(P_i_new)
showPolynomial(c_new)
n is now 3, the nodes are now [-2. 0. 1. 2.], with f(x_i) values [70. 4. 7.␣
↪18.]
There are several ways to assess the accuracy of this fit; we start graphically, and later consider the maximum and root-
mean-square (RMS) errors.
figure(figsize=[12,6])
plot(x, f(x), label="y=f(x)")
plot(xnodes, ynodes, "*", label="nodes")
P_n_x = evaluatePolynomial(x, c)
plot(x, P_n_x, label="y = P_n(x)")
legend()
grid(True);
n = 3
xnodes_g = linspace(a_g, b_g, n+1)
ynodes_g = zeros_like(xnodes_g)
for i in range(len(xnodes_g)):
ynodes_g[i] = g(xnodes_g[i])
print(f"{n=}")
print(f"node x values {xnodes_g}")
print(f"node y values {ynodes_g}")
c_g = fitPolynomial(xnodes_g, ynodes_g)
print(f"The coefficients of P are {c_g}")
showPolynomial(c_g)
P_values = evaluatePolynomial(c_g, xnodes_g)
print(f"The values of P(x_i) are {P_values}")
n=3
node x values [-1. -0.33333333 0.33333333 1. ]
node y values [0.36787944 0.71653131 1.39561243 2.71828183]
The coefficients of P are [0.99519577 0.99904923 0.54788486 0.17615196]
P(x) = 0.9952 + 0.999x + 0.5479x^2 + 0.1762x^3
The values of P(x_i) are [-0.01593727 -0.00316622 -0.91810613 -1.04290824]
There are several ways to assess the accuracy of this fit. We start graphically, and later consider the maximum and
root-mean-square (RMS) errors.
x_g = linspace(a_g - 0.25, b_g + 0.25) # Go a bit beyond the nodes in each direction
figure(figsize=[14,10])
title("With $g(x) = e^x$")
plot(x_g, g(x_g), label="y = $g(x)$")
plot(xnodes_g, ynodes_g, "*", label="nodes")
P_g = evaluatePolynomial(x_g, c_g)
plot(x_g, P_g, label=f"y = $P_{n}(x)$")
legend()
grid(True);
References:
• Section 3.2.1 Interpolation error formula in [Sauer, 2022].
• Section 3.1 Interpolation and the Lagrange Polynomial in [Burden et al., 2016].
• Section 4.2 Errors in Polynomial Interpolation in [Kincaid and Chenney, 1990]..
4.2.1 Introduction
When a polynomial 𝑃𝑛 is given by collocation to 𝑛 + 1 points (𝑥𝑖 , 𝑦𝑖 ), 𝑦𝑖 = 𝑓(𝑥𝑖 ) on the graph of a function 𝑓, one can
ask how accurate it is as an approximation of 𝑓 at points 𝑥 other than the nodes: what is the error 𝐸(𝑥) = 𝑓(𝑥) − 𝑃 (𝑥)?
As is often the case, the result is motivated by considering the simplest “non-trival case”: 𝑓 a polynomial of degree one
too high for an exact fit, so of degree 𝑛 + 1. The result is also analogous to the familiar error formula for Taylor polynonial
approximations.
import numpy as np
from matplotlib.pyplot import figure, plot, title, grid, legend
from numericalMethods import fitPolynomial, evaluatePolynomial
# showPolynomial
Theorem 3.2.1
For a function 𝑓 with continuous derivative of order 𝑛 + 1 𝐷𝑛+1 𝑓, the polynomial 𝑃𝑛 of degree at most 𝑛 that fits the
points (𝑥𝑖 , 𝑓(𝑥𝑖 )) 0 ≤ 𝑖 ≤ 𝑛 differs from 𝑓 by
𝐷𝑛+1 𝑓(𝜉𝑥 ) 𝑛
𝐸𝑛 (𝑥) = 𝑓(𝑥) − 𝑃𝑛 (𝑥) = ∏(𝑥 − 𝑥𝑖 ) (4.1)
(𝑛 + 1)! 𝑖=0
Observation 3.2.1
This is rather similar to the error formula for the Taylor polynomial 𝑝𝑛 with center 𝑥0 :
𝐷𝑛+1 𝑓(𝜉𝑥 )
𝑒𝑛 (𝑥) = 𝑓(𝑥) − 𝑝𝑛 (𝑥) = (𝑥 − 𝑥0 )𝑛+1 , some 𝜉𝑥 between 𝑥0 and 𝑥. (4.2)
(𝑛 + 1)!
This is effectively the limit of Equation (4.1) when all the 𝑥𝑖 congeal to 𝑥0 .
An important special case is when there is a single parameter ℎ describing the spacing of the nodes; when they are the
𝑏−𝑎
equally spaced values 𝑥𝑖 = 𝑎 + 𝑖ℎ, 0 ≤ 𝑖 ≤ 𝑛, so that 𝑥0 = 𝑎 and 𝑥𝑛 = 𝑏 with ℎ = . Then there is a somewhat
𝑛
more practically usable error bound:
Theorem 3.2.2
For 𝑥 ∈ [𝑎, 𝑏] and the above equaly spaced nodes in that interval [𝑎, 𝑏],
𝑀𝑛+1 𝑛+1
|𝐸𝑛 (𝑥)| = |𝑓(𝑥) − 𝑃𝑛 (𝑥)| ≤ ℎ , = 𝑂(ℎ𝑛+1 ), (4.3)
𝑛+1
A major practical problem with this error bound is that is does not in general guarantee convergence 𝑃𝑛 (𝑥) → 𝑓(𝑥) as
𝑛 → ∞ with fixed interval [𝑎, 𝑏], because in some cases 𝑀𝑛+1 grows too fast.
A famous example is the “Witch of Agnesi” (so-called because it was introduced by Maria Agnesi, author of the first
textbook on differential and integral calculus).
def agnesi(x):
return 1/(1+x**2)
agnesi_x = agnesi(x)
plot(x, agnesi_x, label="Witch of Agnesi")
xnodes = np.linspace(a, b, n+1)
ynodes = agnesi(xnodes)
c = fitPolynomial(xnodes, ynodes)
P_n = evaluatePolynomial(x, c)
plot(xnodes, ynodes, 'r*', label="Collocation nodes")
plot(x, P_n, label="P_n(x)")
legend()
grid(True)
figure(figsize=[14, 5])
title(f"Error curve")
E_n = P_n - agnesi_x
plot(x, E_n)
grid(True);
The curve fits better in the central part, but gets worse towards the ends!
One hint as to why is to plot the polynomial factor in the error formula above:
The approach of least squares approximation is introduced in the next section Least-squares Fitting to Data; that can
be appropriate when the original data is not exact (due to measurement error in an experiment, for example) so a good
approximation at each node can be more appropriate than exact collocation at each but with implausable behavior between
the nodes.
When instead exact collocation is sought, piecewise interpolation is typically used. This involves collocation with multiple
polynomials of a fixed degree, each on a part of the domain. Then for each such polynomial 𝑀𝑚+1 in the above error
formula is independent of the number 𝑁 of nodes and with the nodes on interval [𝑎, 𝑏] at equal spacing ℎ = (𝑏 −𝑎)/(𝑁 −
1), one has the convergence result
𝑀𝑚+1 𝑚+1 1
|𝐸𝑚 (𝑥) ≤ ℎ = 𝑂(ℎ𝑚+1 ) = 𝑂 ( 𝑚+1 ) , → 0 as 𝑁 → ∞.
𝑚+1 𝑁
This only requires that 𝑓 has a continuous derivatives up to order 𝑚 + 1.
The simplest case of this — quite often used in computer graphics, including matplotlib.pyplot.plot — is to
divide the domain into 𝑁 − 1 sub-intervals of equal width separated by nodes 𝑥𝑖 = 𝑎 + 𝑖ℎ, 0 ≤ 𝑖 ≤ 𝑁 , and then
approximate 𝑓(𝑥) linearly on each sub-interval by using the two surrounding nodes 𝑥𝑖 and 𝑥𝑖+1 determined by having
𝑥𝑖 ≤ 𝑥 ≤ 𝑥𝑖+1 : this is piecewise linear interpolation.
This gives the approximating function 𝐿𝑁 (𝑥), and the above error formula, now with 𝑚 = 1, says that the worst absolute
error anywhere in the interval [𝑎, 𝑏] is
𝑀2 2
|𝐸2 (𝑥)| = |𝑓(𝑥) − 𝐿𝑁 (𝑥)| ≤ ℎ , 𝑀2 = max |𝑓 ″ (𝑥)|.
2 𝑥∈[𝑎,𝑏]
Thus for any 𝑓 that is is twice continuously differentiable the error at each 𝑥-value converges to zero as 𝑁 → ∞. Further,
it is uniform convergence: the maximum error over all points in the domain goes to zero.
Integrating this piecewise linear approximation over interval [𝑎, 𝑏] gives the Compound Trapezoid Rule approximation of
𝑏
∫𝑎 𝑓(𝑥)𝑑𝑥. As we will soon see, this also has error at worst 𝑂(ℎ2 ), = 𝑂(1/𝑁 2 ): each doubling of effort reduces errors
by a factor of about four.
Also, you might have heard of Simpson’s Rule for approximating definite integrals (and anyway, you will soon!): that uses
piecewise quadratic interpolation and we will see that this improves the errors to 𝑂(ℎ4 ), = 𝑂(1/𝑁 4 ): each doubling of
effort reduces errors by a factor of about 16.
which turns out to be a polynomial of degree 𝑛 + 1 that takes its maximum absolute value of 1 at the 𝑛 + 2 points
𝑖
cos ( 𝑛+1 𝜋) , 0 ≤ 𝑖 ≤ 𝑛 + 1.
There are a number of claims here: most are simple consequences of the definition and what is known about the roots
and extreme values of cosine. The one surprising fact is that 𝑇𝑛 (𝑥) is a polynomial of degree 𝑛, known as a Chebyshev
polynomial. The notation comes from an alternative transliteration, Tchebyshev, of this Russian name.
This can be checked by induction. The first few cases are easy to check: 𝑇0 (𝑥) = 1, 𝑇1 (𝑥) = 𝑥 and 𝑇2 (𝑥) = cos 2𝜃 =
2 cos2 𝜃 − 1 = 2𝑥2 − 1. In general, let 𝜃 = cos−1 𝑥 so that cos 𝜃 = 𝑥. Then trigonometric identities give
and similarly
Since 𝑇0 and 𝑇1 are known to be polynomials, the same follows for each successive 𝑛 from this formula. The induction
also shows that
𝑇𝑛 (𝑥) = 2𝑛−1 𝑥𝑛 + terms involving lower powers of 𝑥
so in particular the degree is 𝑛.
With this information, the error formula can be written in a special form. Firstly 𝑤𝑛+1 is then a polynomial of degree
𝑛 + 1 with the same roots as 𝑇𝑛+1 , so is a multiple of the latter function. Secondly, the leading coefficient of 𝑤𝑛+1 is
1, compared to 2𝑛+1 for the Chebyshev polynomial, so 𝑤𝑛+1 = 𝑇𝑛+1 /2𝑛 . Finally, the maximum of 𝑤𝑛+1 is seen to be
1/2𝑛 and we have the result that
Theorem 3.3.1
When a polynomial approximation 𝑝(𝑥) to a function 𝑓(𝑥) on the interval [−1, 1] is constructed by collocation at the
roots of 𝑇𝑛+1 , the error is bounded by
1
|𝑓(𝑥) − 𝑝(𝑥)| ≤ max |𝑓 (𝑛+1) (𝑡)|
2𝑛 (𝑛 + 1)! −1≤𝑡≤1
When the interval is [𝑎, 𝑏] and the collocation points are the appropriately rescaled Chebychev points as given in (4.4).
(𝑏 − 𝑎)𝑛+1
|𝑓(𝑥) − 𝑝(𝑥)| ≤ max |𝑓 (𝑛+1) (𝑥)|
22𝑛+1 (𝑛+ 1)! 𝑎≤𝑥≤𝑏
This method works well in many cases. Further, it is known that any continuous on any interval [𝑎, 𝑏] can be approximated
arbitrarily well by polynomials, in the sense that the maximum error over the whole interval can be made as small as one
likes [this is the Weierstrass Approximation Theorem]. However, collocation at these Chebyshev nodes will not work for
all continuous functions: indeed no choice of points will work for all cases, as is made precise in theorem 6 on page 288
of [Kincaid and Chenney, 1990]. One way to understand the problem is that the error bound relies on derivatives of ever
higher order, so does not even apply to some continuous functions.
This suggests a new strategy: break the interval [𝑎, 𝑏] into smaller interval, approximate on each interval by a polynomial
of some small degree, and join these polynomials together. Hopefully, the errors will only depend on a few derivatives,
and so will be more controllable, while using enough nodes and small enough intervals will allow the errors to be made
as small as desired. This fruitful idea is dealt with next.
The idea of approximating a function (or interpolating between a set of data points) with a function that is piecewise
polynomial takes its simplest form using continuous piecewise linear functions. Indeed, this is the method most commonly
used to produce a graph from a large set of data points: for example, the command plot from matplotlib.pyplot
(for Python) or PyPlot (for Julia) does it.
The idea is simply to draw straight lines between each successive data point. It is worth analysing this simple method
before considering more accurate approaches.
Consider a set of 𝑛 + 1 points (𝑥0 , 𝑦0 ), (𝑥1 , 𝑦1 ), … , (𝑥𝑛 , 𝑦𝑛 ) again, this time requiring the 𝑥 values to be in increasing
order. Then define the linear functions
𝑦 − 𝑦𝑖
𝐿𝑖 (𝑥) = 𝑦𝑖 + (𝑥 − 𝑥𝑖 ) 𝑖+1 , 𝑥𝑖 ≤ 𝑥 ≤ 𝑥𝑖+1 , 0 ≤ 𝑖 < 𝑛
𝑥𝑖+1 − 𝑥𝑖
with the values 𝐿(𝑥𝑖 ) = 𝑦𝑖 at all nodes, so that the definition is consistent at the points where the domains join, also
guaranteeing continuity.
(𝑡0 , 𝑦0 ), … , (𝑡𝑛 , 𝑦𝑛 )
and is linear in each of the 𝑛 interval between them, the “smoothest” curve that one can get is the continuous one given
by using linear interpolation between each consecutive pair of points. Less smooth functions are possible, for example
the piecewise constant approximation where 𝐿(𝑥) = 𝑦𝑖 for 𝑥𝑖 ≤ 𝑥 < 𝑥𝑖+1 .
The general strategy of spline interpolation is to approximate with a piecewise polynomial function, with some fixed
degree 𝑘 for the polynomials, and is as smooth as possible at the joins between different polynomials. Smoothness is
measured by the number of continuous derivatives that the function has, which is only in question at the knots of course.
The traditional and most important case is that of cubic splines interpolants, which have the form
These conditions automatically give continuity, but leave many degrees of freedom to impose more smoothness. Each
cubic is described by four coefficients and so there are 4𝑛 in all, and the interpolation conditions give only 2𝑛 conditions.
There are 𝑛 − 1 knots where different cubics join, so requiring 𝑆 to have continuous first and second derivatives imposes
2(𝑛 − 1) further conditions for a total of 4𝑛 − 2. This is the best smoothness possible without 𝑆(𝑥) becoming a single
cubic, and leaves two degrees of freedom. These will be dealt with later, but one approach is imposing zero second
derivatives at each end of the interval.
Thus we have the equations
′
𝑆𝑖−1 (𝑡𝑖 ) = 𝑆 ′ (𝑡𝑖 )
and
′′
𝑆𝑖−1 (𝑡𝑖 ) = 𝑆 ′′ (𝑡𝑖 ),
1 ≤ 𝑖 ≤ 𝑛 − 1.
The brute force method would be to write something like
𝑆𝑖 (𝑥) = 𝑎𝑖 𝑥3 + 𝑏𝑖 𝑥2 + 𝑐𝑖 𝑥 + 𝑑𝑖
which would leave to a set of 4𝑛 simultaneous linear equations for these 4𝑛 unknowns once the two missing conditions
have been chosen.
This could then be solved numerically, but the size and cost of the problem can be considerably reduced, to a tridiagonal
system of 𝑛 − 1 equations.
Start by considering the second derivative of 𝑆(𝑥), which must be continuous and piecewise linear. Its values at the knots
can be called 𝑥𝑖 = 𝑆𝑖′′ (𝑡𝑖 ) and the lengths of the interval called ℎ𝑖 = 𝑥𝑖+1 − 𝑥𝑖 so that
𝑧𝑖 𝑧
𝑆𝑖′′ (𝑥) = (𝑡𝑖+1 − 𝑥) + 𝑖+1 (𝑥 − 𝑡𝑖 )
ℎ𝑖 ℎ𝑖
Integrating twice,
𝑧𝑖 𝑧
𝑆𝑖 (𝑥) = (𝑡𝑖+1 − 𝑥)3 + 𝑖+1 (𝑥 − 𝑡𝑖 )3 + 𝐶𝑖 (𝑡𝑖+1 − 𝑥) + 𝐷𝑖 (𝑥 − 𝑡𝑖 )
6ℎ𝑖 6ℎ𝑖
The interpolation conditions then determine 𝐶𝑖 and 𝐷𝑖 :
𝑧𝑖 𝑧 𝑦 𝑧ℎ 𝑦 𝑧 ℎ
𝑆𝑖 (𝑥) = (𝑡 − 𝑥)3 + 𝑖+1 (𝑥 − 𝑡𝑖 )3 + ( 𝑖 − 𝑖 𝑖 ) (𝑡𝑖+1 − 𝑥) + ( 𝑖+1 − 𝑖+1 𝑖 ) (𝑥 − 𝑡𝑖 ) (4.5)
6ℎ𝑖 𝑖+1 6ℎ𝑖 ℎ𝑖 6 ℎ𝑖 6
In effect, three quarters of the equations have been solved explicitly, leaving only the 𝑧𝑖 to be determined using the
remaining condition of the continuity of 𝑆 ′ (𝑥).
Differentiating the above expression and evaluating at the appropriate points gives the expressions
ℎ𝑖 ℎ 𝑦 𝑦
𝑆𝑖′ (𝑡𝑖 ) = − 𝑧 − 𝑖 𝑧𝑖+1 − 𝑖 + 𝑖+1 (4.6)
3 𝑖 6 ℎ𝑖 ℎ𝑖
ℎ𝑖−1 ℎ 𝑦 𝑦
′
𝑆𝑖−1 (𝑡𝑖 ) = − 𝑧 + 𝑖−1 𝑧𝑖 − 𝑖−1 + 𝑖 (4.7)
6 𝑖−1 3 ℎ𝑖−1 ℎ𝑖−1
Equating these at the internal knots (and simplifying a bit) gives
6 6
ℎ𝑖−1 𝑧𝑖−1 + 2(ℎ𝑖 + ℎ𝑖−1 )𝑧𝑖 + ℎ𝑖 𝑧𝑖+1 = (𝑦 − 𝑦𝑖 ) − (𝑦 − 𝑦𝑖−1 ) (4.8)
ℎ𝑖 𝑖+1 ℎ𝑖−1 𝑖
These are 𝑛−1 linear equations in the 𝑛+1 unknowns 𝑧𝑖 , so various different cubic spline interpolants can be constructed
by adding two extra conditions in the form of two more linear equations. The traditional way is the one mentioned above:
require the second derivative to vanish at the two endpoints. That is
𝑆 ′′ (𝑡0 ) = 𝑆 ′′ (𝑡𝑛 ) = 0
Solving tridiagonal systems is far more efficient if it can be done without pivoting by the method seen earlier, and this is
a good method if the matrix is diagonally dominant.
That is true here: recalling that the 𝑡𝑖 are in increasing order, each ℎ𝑖 is positive, so each diagonal element is at least twice
the sum of the absolute values of all other elements in the same row. This result incidentally also shows that the equations
have a unique solution, which means that the natural cubic spline exists and is determined uniquely by the data, requiring
about 𝑂(𝑛) operations.
Evaluation of 𝑆(𝑥) is then done by finding the 𝑖 such that 𝑡𝑖 ≤ 𝑥 < 𝑡𝑖+1 and then evaluating the appropriate case in (4.5).
When the spline is to be used to approximate a function 𝑓(𝑥) one useful alternative choice of boundary conditions is to
specify the derivative of the spline function to match that of 𝑓 at the endpoints:
𝑦1 − 𝑦0 𝑓(𝑡1 ) − 𝑓(𝑡0 )
𝑑0 ∶= =
ℎ0 𝑡1 − 𝑡 0
𝑦𝑛 − 𝑦𝑛−1 𝑓(𝑡𝑛 ) − 𝑓(𝑡𝑛−1 )
𝑑𝑛 ∶= =
ℎ𝑛−1 𝑡𝑛 − 𝑡𝑛−1
using:
In conjunction with equation (4.8), this gives the new tridiagonal system
2ℎ0 ℎ0 𝑧0
⎡ ℎ 2(ℎ0 + ℎ1 ) ℎ1 ⎤⎡ 𝑧 ⎤
⎢ 0 ⎥⎢ 1 ⎥
⎢ ⋱ ⋱ ⋱ ⎥⎢ ⋮ ⎥
⎢ ℎ𝑛−2 2(ℎ𝑛−2 + ℎ𝑛−1 ) ℎ𝑛−1 ⎥ ⎢ 𝑧𝑛−1 ⎥
⎣ ℎ𝑛−1 2ℎ𝑛−1 ⎦ ⎣ 𝑧𝑛 ⎦
6 ((𝑦1 − 𝑦0 )/ℎ0 − 𝑑0 )
⎡ 6((𝑦2 − 𝑦1 )/ℎ1 − (𝑦1 − 𝑦0 )/ℎ1 ) ⎤
⎢ ⎥
= ⎢ ⋮ ⎥
⎢ 6((𝑦𝑛 − 𝑦𝑛−1 )/ℎ𝑛−1 − (𝑦𝑛−1 − 𝑦𝑛−2 )/ℎ𝑛−2 ) ⎥
⎣ 6 (𝑑𝑛 − (𝑦𝑛 − 𝑦𝑛−1 )/ℎ𝑛−1 ) ⎦
As in the case of the tridiagonal system for natural splines, the rows of the matrix also satisfy the condition of diagonal
dominance, so again this system has a unique solution that can be computed accurately with only 𝑂(𝑛) operations and no
pivoting.
If the exact derivatives mentioned in (4.10) are available, the errors are bounded as follows
Theorem 3.4.1
Suppose that 𝑓(𝑥) is four times continuously differentiable on the interval [𝑎, 𝑏], with max𝑎≤𝑥≤𝑏 |𝑓 (4) (𝑥)| ≤ 𝑀 . Then
the clamped cubic spline approximation 𝑆(𝑥) using the points 𝑎 = 𝑡0 < 𝑡1 < ⋯ < 𝑡𝑛 = 𝑏 and 𝑦𝑖 = 𝑓(𝑡𝑖 ) satisfies
4
5
|𝑓(𝑥) − 𝑆(𝑥)| ≤ 𝑀 ( max ℎ𝑖 )
384 0≤𝑖≤𝑛−1
There is also an error bound of the same “fourth order” form for the natural cubic spline: that is, one of the form of some
constant depending on 𝑓 times the fourth power of max0≤𝑖≤𝑛−1 ℎ𝑖 . However it is far more complicated to describe: see
page 138 of [Burden et al., 2016] for more comments on this.
When we have studied methods for approximating derivatives, it will be possible to establish error bounds for modified
clamped splines with various approximations for the derivatives at the endpoints, so that they depend only on the values
of 𝑓 at the knots. With care, these more practical approximations can also be made fourth order accurate.
and counting constants suggests that there should be a unique cubic ℎ with these properties. From now on, I will use
“cubic” to include the degenerate cases that are actually quadratics and so on.
To determine this cubic it is convenient to put it in the form
𝑎 = 𝑦0 , 𝑏 = 𝑦0′
𝑦1 − 𝑦 0 𝑦0′ 𝑦1′ − 𝑦0′ 2(𝑦1 − 𝑦0 )
𝑐 = − , 𝑑= −
ℎ2 ℎ 3ℎ2 3ℎ3
With more points, one could look for higher order polynomials, but it is useful in some cases to construct a piecewise
cubic approximation, with the cubic between each consecutive pair of nodes determined only by the value of the function
and its derivative at those nodes. Thus the piecewise Hermite cubic approximation to 𝑓 on the interval [𝑎, 𝑏] for the points
𝑎 = 𝑡0 < 𝑡1 < ⋯ < 𝑡𝑛 is given by a set of 𝑛 cubics
with
𝑎𝑖 = 𝑦𝑖 , 𝑏𝑖 = 𝑦𝑖′
𝑦𝑖+1 − 𝑦𝑖 𝑦′
𝑐𝑖 = 2
− 𝑖
ℎ𝑖 ℎ𝑖
′ ′
𝑦𝑖+1 − 𝑦𝑖 2(𝑦𝑖+1 − 𝑦𝑖 )
𝑑𝑖 = −
3ℎ2𝑖 3ℎ3𝑖
where 𝑦𝑖 ∶= 𝑓(𝑡𝑖 ), 𝑦𝑖′ ∶= 𝑓 ′ (𝑡𝑖 ) and ℎ𝑖 ∶= 𝑡𝑖+1 − 𝑡𝑖 . Most often, the points are equally spaced so that
ℎ𝑖 − ℎ ∶= (𝑏 − 𝑎)/𝑛.
There is an error formula for this (which is also an error formula for a clamped spline in the case 𝑛 = 1)
Theorem 3.4.2
For 𝑥 ∈ [𝑡𝑡 , 𝑡𝑖+1 ]
𝑓 (4) (𝜉)
𝑓(𝑥) − 𝐻(𝑥) = [(𝑥 − 𝑡𝑖 )(𝑥 − 𝑡𝑖+1 )]2
4!
where 𝜉 ∈ [𝑡𝑡 , 𝑡𝑖+1 ] Thus if |𝑓 (4) (𝑥)| ≤ 𝑀𝑖 for 𝑥 ∈ [𝑡𝑡 , 𝑡𝑖+1 ],
𝑀𝑖 4
|𝑓(𝑥) − 𝐻(𝑥)| ≤ ℎ
384 𝑖
Thus the accuracy is about as good as for clamped splines: the trade off is that the Hermite approximation is less smooth
(only one continuous derivative at the nodes), but the error is “localised”. That is, if the fourth derivative of 𝑓 is large or
non-existent in one interval, the accuracy of the Hermite approximation only suffers in that interval, not over the whole
domain.
However this comparison is a bit unfair, as the Hermite approximation uses the extra information about the derivatives
of 𝑓. This is also often impractical: either the derivatives are not known, or there is no known function 𝑓 but only a
collection of values 𝑦𝑖 .
To overcome this problem, the derivatives needed in the above formulas can be approximated from the 𝑦𝑖 as was done for
modified clamped splines. To do this properly, it is worth taking a thorough look at methods for approximating derivatives
and bounding the accuracy of such approximations.
References:
• Chapter 4 Least Squares of [Sauer, 2022], sections 1 and 2.
• Section 8.1 Discrete Least Squares Approximation of [Burden et al., 2016].
import numpy as np
from matplotlib.pyplot import figure, plot, title, legend, grid, loglog
from numpy.random import random
from numericalMethods import solveLinearSystem
from numericalMethods import evaluatePolynomial
import numericalMethods as nm
We have seen that when trying to fit a curve to a large collection of data points, fitting a single polynomial to all of them
can be a bad approach. This is even more so if the data itself is inaccurate, due for example to measurement error.
Thus an important approach is to find a function of some simple form that is close to the given points but not necesarily
fitting them exactly: given 𝑁 points
𝑒𝑖 = 𝑦𝑖 − 𝑓(𝑥𝑖 ),
The first decision to be made is how to measure the overall error in the fit, since the error is now a vector of values
𝑒 = {𝑒𝑖 }, not a single number. Two approaches are widely used:
• Min-Max: minimize the maximum of the absolute errors at each point, ‖𝑒‖𝑚𝑎𝑥 or ‖𝑒‖∞ , = max |𝑒𝑖 |
1≤𝑖≤𝑛
𝑛
• Least Squares: Minimize the sum of the squares of the errors, ∑ 𝑒2𝑖
1
but this often fails completely. In the following example, all three lines minimize this measure of error, along with
infinitely many others: any line that passes below half of the points and above the other half.
figure(figsize=[12,6])
plot(xdata, ydata, 'b*', label="Data")
xplot = np.linspace(0.,5.)
ylow = xplot -0.5
yhigh = xplot + 0.5
yflat = 2.5*np.ones_like(xplot)
plot(xplot, ylow, label="low")
plot(xplot, yhigh, label="high")
plot(xplot, yflat, label="flat")
legend(loc="best");
The Min-Max method is important and useful, but computationally difficult. One hint is the presence of absolute values
in the formula, which get in the way of using calculus to get equations for the minimum.
Thus the easiest and most common approach is Least Squares, or equivalently, minimizing the root-mean-square error,
which is just the Euclidean length ‖𝑒‖2 of the error vector 𝑒. That “geometrical” interpretation of the goal can be useful.
So we start with that.
The simplest approach is to seek the straight line 𝑦 = 𝑓(𝑥) = 𝑐0 + 𝑐1 𝑥 that minimizes the total square sum error,
Note well that the unknowns here are just the two values 𝑐0 and 𝑐1 , and 𝐸 is s fairly simple polynomial function of them.
The minimum error must occur at a critical point of this function, where both partial derivatives are zero:
𝜕𝐸
= 2 ∑(𝑐0 + 𝑐1 𝑥𝑖 − 𝑦𝑖 ) = 0,
𝜕𝑐0 𝑖
𝜕𝐸
= 2 ∑(𝑐0 + 𝑐1 𝑥𝑖 − 𝑦𝑖 )𝑥𝑖 = 0.
𝜕𝑐1 𝑖
These are just simultaneous linear equations, which is the secret of why the least squares approach is so much easier than
any alternative. The equations are:
∑𝑖 1 ∑ 𝑖 𝑥𝑖 𝑐 ∑ 𝑖 𝑦𝑖
[ ][ 0 ] = [ ]
∑𝑖 𝑥𝑖 ∑𝑖 𝑥2𝑖 𝑐1 ∑ 𝑖 𝑥𝑖 𝑦𝑖
𝑚𝑗 = ∑ 𝑥𝑗𝑖 , 𝑝𝑗 = ∑ 𝑥𝑗𝑖 𝑦𝑖
𝑖 𝑖
𝑀𝑐 = 𝑝
with
𝑚0 𝑚1 𝑝 𝑐
𝑀 =[ ],𝑝 = [ 0 ],𝑐 = [ 0 ].
𝑚1 𝑚2 𝑝1 𝑐1
N = 10
x = np.linspace(-1, 1, N)
# Emulate a straight line with measurement errors:
# random(N) gives N values uniformly distributed in the range [0,1],
# and so with mean 0.5.
# Thus subtracting 1/2 simulates more symmetric "errors", of mean zero.
yline = 2*x + 3
y = yline + (random(N) - 0.5)
figure(figsize=[12,6])
plot(x, yline, 'g', label="The original line")
plot(x, y, '*', label="Data")
c = linefit(x, y)
print("The coefficients are", c)
xplot = np.linspace(-1, 1, 100)
plot(xplot, evaluatePolynomial(xplot, c), 'r', label="Linear least squares fit")
legend(loc="best");
𝑝(𝑥) = 𝑐0 + 𝑐1 𝑥 + ⋯ + 𝑐𝑛 𝑥𝑛
(𝑥1 , 𝑦1 ), … (𝑥𝑁 , 𝑦𝑁 )
Note that when 𝑁 = 𝑛 + 1, the solution is the interpolating polynomial, with error zero.
The necessary conditions for a minimum are that all 𝑛 + 1 partial derivatives of 𝐸 are zero:
𝜕𝐸
= 2 ∑ (𝑦𝑖 − ∑ 𝑐𝑘 𝑥𝑘𝑖 ) 𝑥𝑗𝑖 = 0, 0 ≤ 𝑗 ≤ 𝑛.
𝜕𝑐𝑗 𝑖 𝑘
This gives
∑ ∑ (𝑐𝑘 𝑥𝑗+𝑘
𝑖 ) = ∑ (∑ 𝑥𝑗+𝑘
𝑖 ) 𝑐𝑘 = ∑ 𝑦𝑖 𝑥𝑗𝑖 , 0 ≤ 𝑗 ≤ 𝑛,
𝑖 𝑘 𝑘 𝑖 𝑖
∑ 𝑚𝑗+𝑘 𝑐𝑘 = 𝑝𝑗 , 0 ≤ 𝑗 ≤ 𝑛.
𝑘
𝑚0 𝑚1 … 𝑚𝑛 𝑝0 𝑐0
⎡ 𝑚 𝑚2 … 𝑚𝑛+1 ⎤ ⎡ 𝑝 ⎤ ⎡ 𝑐 ⎤
𝑀 =⎢ 1 ⎥, 𝑝 = ⎢ 1 ⎥, 𝑐 = ⎢ 1 ⎥.
⎢ ⋮ ⋮ ⋱ ⋮ ⎥ ⎢ ⋮ ⎥ ⎢ ⋮ ⎥
⎣ 𝑚𝑛 𝑚𝑛+1 … 𝑚2𝑛 ⎦ ⎣ 𝑝𝑛 ⎦ ⎣ 𝑐𝑛 ⎦
"""
N = len(x)
m = np.zeros(2*n+1)
for k in range(2*n+1):
m[k] = sum(x**k)
M = np.zeros([n+1,n+1])
for i in range(n+1):
for j in range(n+1):
M[i, j] = m[i+j]
p = np.zeros(n+1)
for k in range(n+1):
p[k] = sum(x**k * y)
c = solveLinearSystem(M, p)
return c
N = 10
n = 3
xdata = np.linspace(0, np.pi/2, N)
ydata = np.sin(xdata)
figure(figsize=[14,5])
(continues on next page)
The discrepancy between the original function sin(𝑥) and this cubic is:
figure(figsize=[14,5])
plot(xplot, np.sin(xplot) - evaluatePolynomial(xplot, c))
title("errors fitting at N = 10 points")
grid(True);
N = 50
xdata = np.linspace(0, np.pi/2, N)
ydata = np.sin(xdata)
(continues on next page)
figure(figsize=[14,5])
plot(xdata, ydata, 'b.', label="sin(x) data")
plot(xplot, np.sin(xplot), 'b', label="sin(x) curve")
c = fitPolynomialLeastSquares(xdata, ydata, n)
print("The coefficients are", c)
plot(xplot, evaluatePolynomial(xplot, c), 'r', label="Cubic least squares fit")
legend(loc="best")
grid(True);
figure(figsize=[14,5])
plot(xplot, np.sin(xplot) - evaluatePolynomial(xplot, c))
title("errors fitting at N = 50 points")
grid(True);
When data (𝑥𝑖 , 𝑦𝑖 ) is inherently positive, it is often natural to seek an approximate power law relationship
𝑦𝑖 ≈ 𝑐𝑥𝑝𝑖
That is, one seeks the power 𝑝 and scale factor 𝑐 that minimizes error in some sense.
When the magnitudes and the data 𝑦𝑖 vary greatly, it is often appropriate to look at the relative errors
𝑐𝑥𝑝𝑖 − 𝑦𝑖
𝑒𝑖 = ∣ ∣
𝑦𝑖
and this can be shown to be very close to looking at the absolute errors of the logarithms
Introducing the new variables 𝑋𝑖 = ln(𝑥𝑖 ), 𝑌𝑖 = ln(𝑌𝑖 ) and 𝐶 = ln(𝑐), this becomes the familiar problem of finding a
linear approxation of the data 𝑌𝑖 by 𝐶 + 𝑝𝑋𝑖 .
4.5.6 A simulation
cexact = 2.0
pexact = 1.5
x = np.logspace(0.01, 2.0, 10)
xplot = np.logspace(0.01, 2.0) # For graphs later
yexact = cexact * x**pexact
y = yexact * (1.0 + (random(len(yexact))- 0.5)/2)
figure(figsize=[12,6])
plot(x, yexact, '.', label='exact')
plot(x, y, '*', label='noisy')
legend()
grid(True);
figure(figsize=[12,6])
loglog(x, yexact, '.', label='exact')
loglog(x, y, '*', label='noisy')
legend()
grid(True)
X = np.log(x)
Y = np.log(y)
Cp = fitPolynomialLeastSquares(X, Y, 1)
C = np.exp(Cp[0])
p = Cp[1]
print(f"{C=}, {p=}")
C=2.0413344277069996, p=1.474126908463272
figure(figsize=[12,6])
plot(x, yexact, '.', label='exact')
plot(x, y, '*', label='noisy')
plot(xplot, C * xplot**p)
legend()
grid(True);
figure(figsize=[12,6])
loglog(x, yexact, '.', label='exact')
loglog(x, y, '*', label='noisy')
loglog(xplot, C * xplot**p)
legend()
grid(True);
References:
• Chapter 4 Least Squares of [Sauer, 2022], sections 1 and 2.
• Section 8.1 Discrete Least Squares Approximation of [Burden et al., 2016].
4.6.1 Introduction
We have seen that one common and important approach to approximating data
(𝑥𝑖 , 𝑦𝑖 ), 1 ≤ 𝑖 ≤ 𝑁
by a polynomial 𝑦 = 𝑝(𝑥) = 𝑐0 + ⋯ 𝑐𝑛 𝑥𝑛 of degree at most 𝑛 is to minimize the “average” of the errors
𝑒𝑖 = 𝑦𝑖 − 𝑓(𝑥𝑖 ),
√
√𝑁
in the sense of the root-mean-square error 𝐸𝑅𝑀𝑆 = √∑ 𝑒2𝑖 . Equivalently, we will avoid the square root and just
⎷ 𝑖=1
minimize the sum of the squares of the errors:
𝑁
𝐸(𝑐0 , 𝑐1 , … , 𝑐𝑛 ) = ∑ 𝑒2𝑖
𝑖=1
One way to derive the needed formulas is by seeking the critical point og th abev function via teh 𝑛 + 1 equations
𝜕𝐸
= 0, 0≤𝑖≤𝑛
𝜕𝑐𝑖
Fortunately these gives a systems of linear equations, and it has a unique solution, thus giving the desired global minimum.
However, there is another “geometrical” approach, that is also relevant as an introduction to strategies also used for
other minimization problems, for example with application to the numerical solutions of boundary value problems for
differential equations.
4.6.3 Linear least squares: minimizing RMS error by minimizing “Euclidean” dis-
tance with geometry
For approximation by a polynomial 𝑦 = 𝑝(𝑥) = 𝑐0 + ⋯ 𝑐𝑛 𝑥𝑛 , we can think of the data 𝑦𝑖 , 1 ≤ 𝑖 ≤ 𝑁 as giving a point
in 𝑁 -dimensional space (𝑅)𝑁 , and the approximations as giving another point with coordinates 𝑦𝑖̃ ∶= 𝑝(𝑥𝑖 ).
Then the least squares problem is to minimize the Euclidean distance ‖𝑦 − 𝑦‖̃ 2 .
One way to think of this is that we attempt unsuccessfully to solve the collocation equations 𝑝(𝑥𝑖 ) = 𝑦𝑖 as an over-
determined sytem of 𝑁 equations in 𝑛 + 1 unknowns 𝐴𝑐 = 𝑦, where
1 𝑥1 𝑥21 … 𝑥𝑛1
⎡ 1 𝑥2 𝑥22 … 𝑥𝑛2 ⎤
⎢ ⎥
⋮ ⋮ ⋮ ⋮
𝐴=⎢ ⎥
⎢ 1 𝑥𝑖 𝑥2𝑖 … 𝑥𝑛𝑖 ⎥
⎢ ⋮ ⋮ ⋮ ⋮ ⎥
⎣ 1 𝑥𝑁 𝑥2𝑁 … 𝑥𝑛𝑁 ⎦
Recall that (𝑥, 𝐴𝑦) = (𝐴𝑇 𝑥, 𝑦) where 𝐴𝑇 is the transpose of 𝐴: the mirror image with 𝑎𝑇𝑖,𝑗 = 𝑎𝑗,𝑖 .
Using this gives
(𝐴𝑇 (𝑦 − 𝑦),
̃ 𝑐′ ) = 0 for every 𝑐′ ∈ (𝑅)𝑛+1 .
𝑀 𝑐 = 𝐴𝑇 𝑦
where 𝑀 ∶= 𝐴𝑇 𝐴
Since here 𝐴 is 𝑁 × (𝑛 + 1), 𝐴𝑇 is (𝑛 + 1) × 𝑁 , and the product 𝑀 is an (𝑛 + 1) × (𝑛 + 1) square matrix.
Further calculation shows that in fact
𝑚0 𝑚1 … 𝑚𝑛
⎡ 𝑚 𝑚2 … 𝑚𝑛+1 ⎤ 𝑁
𝑀 =⎢ 1 ⎥, 𝑚𝑘 = ∑ 𝑥𝑘𝑖
⎢ ⋮ ⋮ ⋱ ⋮ ⎥ 𝑖=1
⎣ 𝑚𝑛 𝑚𝑛+1 … 𝑚2𝑛 ⎦
so these equations are the same ones 𝑀 𝑐 = 𝑝 given by the previous calculus derivation.
FIVE
𝑓(𝑥 + ℎ) − 𝑓(𝑥)
𝐷𝑓(𝑥) ≈ 𝐷ℎ 𝑓(𝑥) ∶= (5.1)
ℎ
and
𝑓(𝑥 − ℎ) − 2𝑓(𝑥) + 𝑓(𝑥 + ℎ)
𝐷2 𝑓(𝑥) ≈ 𝛿 2 𝑓(𝑥) ∶= . (5.2)
ℎ2
For the first case we can use the Taylor formula for 𝑛 = 1,
1
𝑓(𝑥 + ℎ) = 𝑓(𝑥) + 𝐷𝑓(𝑥)ℎ + 𝐷2 𝑓(𝜉𝑥 )ℎ2 where 𝜉𝑥 is between 𝑥 and 𝑥 + ℎ
2
(see Equations (2.5) or (2.8) in the section Taylor’s Theorem and the Accuracy of Linearization); this gives
173
Introduction to Numerical Methods and Analysis with Python
where the integers 𝑙 and 𝑟 can be negative, positive or zero. The assumed form then is
𝑓(𝑥 + ℎ) − 𝑓(𝑥)
𝐷𝑓(𝑥) = + 𝑂(ℎ)
ℎ
has 𝑘 = 1, 𝑙 = 0, 𝑟 = 1, 𝑝 = 1.
𝐷2 𝑓(𝑥) 2 𝐷3 𝑓(𝑥) 3
𝑓(𝑥 + ℎ) = 𝑓(𝑥) + 𝐷𝑓(𝑥)ℎ + ℎ + ℎ +⋯
2 6
If you are not sure how accurate the result is, you might need to initially be vague about how may terms are needed, so I
will do it that way and then go back and be more specific once we know more.
A series for 𝑓(𝑥 + 2ℎ) is also needed:
𝐷2 𝑓(𝑥) 𝐷3 𝑓(𝑥)
𝑓(𝑥 + 2ℎ) = 𝑓(𝑥) + 𝐷𝑓(𝑥)(2ℎ) + (2ℎ)2 + (2ℎ)3 + ⋯
2 6
𝐷2 𝑓(𝑥) 2 𝐷3 𝑓(𝑥) 3
= 𝑓(𝑥) + 2𝐷𝑓(𝑥)ℎ + 4ℎ + 8ℎ + ⋯
2 6
4𝐷3 𝑓(𝑥) 3
= 𝑓(𝑥) + 2𝐷𝑓(𝑥)ℎ + 2𝐷2 𝑓(𝑥)ℎ2 + ℎ +⋯
3
Insert these into the above three-point formula, and see how close it is to the exact derivative:
Now gather terms with the same power of ℎ (which is also gathering terms with the same order of derivative):
2
−3𝑓(𝑥) + 4𝑓(𝑥 + ℎ) − 𝑓(𝑥 + 2ℎ) −3 + 4 − 1 4−2 4/4 − 2/2 4/12 − 4/6
= 𝑓(𝑥) + 𝐷𝑓(𝑥) + 𝐷2 𝑓(𝑥) + 𝐷3 𝑓(𝑥) +⋯
2ℎ 2ℎ 2 ℎ ℎ
𝐷3 𝑓(𝑥) 2
= 𝐷𝑓(𝑥) − ℎ +⋯
3
and it is clear that the omitted terms have higher power of ℎ: ℎ3 and up. That is, they are 𝑂(ℎ3 ), or more conveniently
𝑜(ℎ2 ).
Thus we have confirmed that the error in this approximation is
𝑓(𝑥 + 𝑖ℎ) = 𝑓(𝑥) + (𝑖ℎ)𝐷𝑓(𝑥) + (𝑖ℎ)2 /2𝐷2 𝑓(𝑥) + ⋯ + (𝑖ℎ)𝑗 /𝑗!𝐷𝑗 𝑓(𝑥) + ⋯ + (𝑖ℎ)𝑝+𝑘 /(𝑝 + 𝑘)!𝐷𝑝+𝑘 𝑓(𝑥) + 𝑜(ℎ𝑝+𝑘 )
Then these can be rearranged, putting the terms with the same derivative 𝐷𝑗 𝑓(𝑥) together — all of which have the same
factor ℎ𝑗 in the numeriator, and so the same factor ℎ𝑗−𝑝 overall:
The final “small” term 𝑜(ℎ𝑝 ) comes from the terms 𝑜(ℎ𝑝+𝑘 ) in each Taylor’s formula term, each divided by ℎ𝑘 .
We want this whole thing to be approximately 𝐷𝑘 𝑓(𝑥), and the strategy is to match the coefficients of the derivatives:
𝑙𝑘 𝐶 𝑙 + ⋯ + 𝑟 𝑘 𝐶 𝑟 = 1 =
𝐶𝑙 + ⋯ + 𝐶 𝑟 = 0 (5.3)
𝑗 𝑗
𝑙 𝐶𝑙 + ⋯ + 𝑟 𝐶𝑟 = 0, 𝑗 ≠ 𝑘 (5.4)
𝑘 𝑘
𝑙 𝐶𝑙 + ⋯ + 𝑟 𝐶𝑟 = 1 (5.5)
(5.6)
And indeed it can be verified that the resulting matrix for this system of equations is non-singular, and so there is a unique
solution for the coefficients 𝐶𝑙 … 𝐶𝑟 .
Exercise A
𝐷2 𝑓(𝑥) 2 𝐷3 𝑓(𝑥) 3
𝑓(𝑥 + ℎ) = 𝑓(𝑥) + 𝐷𝑓(𝑥)ℎ + ℎ + ℎ + 𝑂(ℎ4 )
2 6
and
4𝐷3 𝑓(𝑥) 3
𝑓(𝑥 + 2ℎ) = 𝑓(𝑥) + 2𝐷𝑓(𝑥)ℎ + 2𝐷2 𝑓(𝑥)ℎ2 + ℎ + 𝑂(ℎ4 )
3
B) Verify the result in Example 4.1.3.
Again, do this by hand, and exploit the symmetry. Note that it works a bit better than expected, due to the symmetry.
Theorem 4.1.1
From the above degree of precision result, one can determine the coefficients by requiring degree of precision 𝑝 + 𝑘 − 1,
and for this it is enough to require exactness for each of the simple monomial functions 1, 𝑥, 𝑥2 , and so on up to 𝑥𝑝+𝑘−1 .
Also, this only needs to be tested at 𝑥 = 0, since “translating” the variables does not effect the result.
This is probably the simplest method in practice.
Example 4.1.4
Let us revisit Example 4.1.2. The goal is to get exactness in
𝐶0 × 1 + 𝐶 1 × 1 + 𝐶 2 × 1
= 0,
ℎ
so
𝐶0 + 𝐶1 + 𝐶2 = 0
𝐶1 + 2𝐶2 = 1
We need at least three equations for the three unknown coefficients, so continue with 𝑓(𝑥) = 𝑥2 , 𝐷𝑓(0) = 0:
𝐶1 + 4𝐶2 = 0
• The first equation then gives 𝐶0 = −𝐶1 − 𝐶2 = −3/2 all as claimed above.
So far the degree of precision has been shown to be at least 2. In some cases it is better, so let us check by looking at
𝑓(𝑥) = 𝑥3 :
𝐷𝑓(𝑥) = 0, whereas
Remark 4.1.1
If you want to verify more rigorously the order of accuracy of a formula devised by this method, one can use the “checking”
procedure with Taylor polynomials and their error terms as done in Example 4.1.2 above.
References:
• Section 5.1.3 Extrapolation in [Sauer, 2022].
• Section 4.2 Richardson Extrapolation im [Burden et al., 2016].
• Section 4.2 Estimating Derivatives and Richardson Extrapolation in [Chenney and Kincaid, 2012].
5.2.1 Motivation
Thus we would like to produce new approximation formulas of higher order 𝑝; that is, with error 𝑂(ℎ𝑝 ) for 𝑝 greater than
the values 𝑝 = 1 for Δℎ 𝑓(𝑥) or 𝑝 = 2 for 𝛿ℎ2 𝑓(𝑥).
5.2.2 Procedure
The general framework for this is an exact quantity 𝑄0 for which we have an approximation formula 𝑄(ℎ) with
𝑄(ℎ) ≈ 𝑄0 + 𝑐𝑝 ℎ𝑝 ,
and evaluate for two values of ℎ; most often either ℎ and 2ℎ (or ℎ and ℎ/2, which ismore or less equivalent.)
That gives
𝑄(2ℎ) ≈ 𝑄0 + 𝑐𝑝 (2ℎ)𝑝 = 𝑄0 + 𝑐𝑝 2𝑝 ℎ𝑝 ,
and with only 𝑄0 and 𝑐𝑝 unknown, this is two (approximate) linear equations in two unknowns, so we can solve for the
desired quantity 𝑄0 by basic Gaussian elimination. This gives
2𝑝 𝑄(ℎ) − 𝑄(2ℎ)
𝑄0 ≈ =∶ 𝑄𝑞 (ℎ).
2𝑝 − 1
But is this new approximation any better than the original? Using the more complete error formula above for 𝑄(ℎ) and
its version with ℎ replaced by 2ℎ,
one gets
2𝑝 𝜙(ℎ) − 𝜙(2ℎ)
𝑄𝑞 (ℎ) = = 𝑄0 + 𝑂(ℎ𝑞 ),
2𝑝 − 1
so indeed an improvement, since 𝑞 > 𝑝.
We can get a useful practical error estimate by rewriting the above result as
𝑄(ℎ) − 𝑄(2ℎ)
𝑄0 ≈ 𝑄(ℎ) + (5.7)
2𝑝 − 1
so that the quantity
𝑄(ℎ) − 𝑄(2ℎ)
𝐸ℎ ∶= ≈ 𝑄0 − 𝑄(ℎ) (5.8)
2𝑝 − 1
is approximately the error in 𝑄(ℎ). Thus,
1. Richardson extrapolation can be viewed as “correcting” 𝑄ℎ by subtracting of this estimated error:
𝑄0 ≈ 𝑄𝑞 (ℎ) = 𝑄ℎ + 𝐸ℎ
2. This magnitude |𝐸ℎ | of this error estimate can be used as a (typically pessimistic!) estimate of the error in the cor-
reted result 𝑄𝑞 . Sometimes makes sens to use an even more cautious error estimate, by discarding the denominator
2𝑝 − 1: using |𝑄(ℎ) − 𝑄(2ℎ)| as an estimate of the error in the extrapolated value 𝑄𝑞 .
Either way, these follow the pervasive pattern of using the change between the two most recent approximations as an error
estimate.
Note the analogy to Newton’s method for solving 𝑓(𝑥) = 0, which can be broken into the two steps
• estimate the error in approximate root 𝑥𝑛 as 𝐸𝑛 ∶= −𝑓(𝑥𝑛 )/𝑓 ′ (𝑥𝑛 )
• update the approximation to 𝑥𝑛+1 = 𝑥𝑛 + 𝐸𝑛 .
Finally, note that this is always extrapolation, in the sense of “going beyond”: the new approximation is on the opposite
side of the better of the original approximations from the less accurate of them.
Example 4.2.1
For the basic forward difference approximation above, this process give a three-point method of second order accuracy
(𝑞 = 2):
Exercise 1(a)
Apply Richardson extrapolation to the standard three-point, second order accurate approximation 𝑄(ℎ) ∶= 𝛿ℎ2 𝑓(𝑥) of the
second derivative 𝑄0 ∶= 𝐷2 𝑓(𝑥) as given above, and verify that it gives a fourth-order accurate five-point approximation
formula.
Exercise 1(b)
As a supplementary exercise, one could verify the order of accuracy directly with Taylor polynomials, or verify that the
new formula has degree of precision 𝑑 = 5, and hence is of order 𝑝 = 4 due to the formula 𝑑 = 𝑝 + 𝑘 − 1 for
approximations of 𝑘-th derivatives, given in the notes for Day 11.
One could also derive the same formula “from scratch” using the Method of Undetermined Coefficients.
Exercise 2
Apply Richardson extrapolation to the above one-sided three-point, second order accurate approximation of the derivative
𝐷𝑓(𝑥), and verify that it gives a third-order accurate four-point approximation formula.
But note something strange about this new formula: it skips 𝑓(𝑥 + 3ℎ).
Here, instead of extrapolating, one is probably better off applying the Method of Undetermined Coefficients directly with
data 𝑓(𝑥), 𝑓(𝑥 + ℎ), 𝑓(𝑥 + 2ℎ), 𝑓(𝑥 + 3ℎ) and 𝑓(𝑥 + 4ℎ): what order of accuracy does that give?
5.2.3 A variant, more useful for integration and ODE boundary value problems: pa-
rameter 𝑛
A slight variant of the above is approximation with an integer parameter 𝑛, such as approximations of integrals by the
(composite) trapezoid rule with 𝑛 intervals, 𝑇𝑛 , or the approximate solution of an ordinary differential equation at the
above-described collection of 𝑛 + 1 equally spaced values in domain [𝑎, 𝑏]. Then a more natural ntito of teh approxatio
formula is 𝑄𝑛 instead of 𝑄(ℎ).
The errors of the form 𝑐𝑝 ℎ𝑝 + 𝑂(ℎ𝑞 ) become
1 𝑐𝑝 1
𝑄𝑛 = 𝑄0 + 𝑂 ( 𝑝
) = 𝑄0 + 𝑝 + 𝑂 ( 𝑞 ) .
𝑛 𝑛 𝑛
The main difference is that to work with integer values of 𝑛, it must be the quantity that is doubled, whereas doubling of
ℎ would correspond to halving of 𝑛.
The extrapolation formula becomes
2𝑝 𝑄2𝑛 − 𝑄𝑛 1
𝑄0 = + 𝑂( 𝑞). (5.9)
2𝑝 − 1 𝑛
Remark 4.2.1
For the slightly more general case of increasing from 𝑛 to 𝑘𝑛, one gets
𝑘𝑝 𝑄𝑘𝑛 − 𝑄𝑛 1
𝑄0 = + 𝑂( 𝑞).
𝑘𝑝 − 1 𝑛
This can be summarized with the same verbal form as the original formula:
• 2𝑝 times the more accurate approximation,
• minus the less accurate approximation,
• all divided by (2𝑝 − 1)
Also
The error in the more accurate approximation is approximated by the difference between the two approximations, divided
by (2𝑝 − 1)
As with the “ℎ” form above, this extrapolation can be broken into two steps
𝑄2𝑛 − 𝑄𝑛
𝐸2𝑛 ∶= ,
2𝑝 − 1
1
𝑄0 = 𝑄2𝑛 + 𝐸2𝑛 + 𝑂 ( ).
𝑛𝑞
so 𝐸2𝑛 estimates the error in 𝑄2𝑛 , and the improved approxmation can be expressed as
𝑄2𝑛 + 𝐸2𝑛 .
The new improved approximation formulas have the same sort of error formula, but for order 𝑞 instead of order 𝑝, so we
could extrapolate again to get an even higher order method, and this can be done numerous times if there is a suitable
power series in ℎ or 1/𝑛 for the errors.
That is not so useful for derivative approximations, where one can get the same or better results with the method of
underermined coefficients, but can be very useful for integration methods, and for the related task of solving boundary
value problems for ordinary differential equations.
For example, it can be applied to the composite trapezoid rule, giving the composite Simpson’s rule at the first step, and
then a succession of approximations of ever higher order – this is known as the Romberg method.
Repeated Richardson extrapolation can also be applied to the approximate solution of dofferential equations; we might
explore that later.
5.3.1 Introduction
The objective of this and several subsequent sections is to develop methods for approxmating a definite integral
𝑏
𝐼 = ∫ 𝑓(𝑥) 𝑑𝑥
𝑎
This is arguably even more important than approximating derivatives, for several reasons; in particular, because there
are many functions for which antiderivative formulas cannot be found, so that the result of the Fundamental Theorem of
Calculus, that
𝑏
∫ 𝑓(𝑥) 𝑑𝑥 = 𝐹 (𝑏) − 𝐹 (𝑎), for 𝐹 any antiderivative of 𝑓
𝑎
The idea is to approximate 𝑓 ∶ [𝑎, 𝑏] → ℝ by collocation at the end points of this interval:
𝑓(𝑎)(𝑏 − 𝑥) + 𝑓(𝑏)(𝑥 − 𝑎)
𝑓(𝑥) ≈ 𝐿(𝑥) ∶= , = 𝑓𝑎𝑣𝑒 (𝑏 − 𝑎)
𝑏−𝑎
Then the approximation — which will be called 𝑇1 , for reasons that will becom clear soon — is
𝑏
𝑓(𝑎) + 𝑓(𝑏)
𝐼 ≈ 𝑇1 = ∫ 𝐿(𝑥)𝑑𝑥 = (𝑏 − 𝑎)
𝑎 2
This can be interpreted as replacing 𝑓(𝑥) by 𝑓𝑎𝑣𝑒 the average of the value at the end points, and inegrting that simple
function.
For the example 𝑓(𝑥) = 𝑒𝑥 on [−1, 3]
a = 1
b = 3
def f(x): return exp(x)
The approximation 𝑇1 is the area of the orange trapezoid (hence the name!) which is also the area of the green rectangle.
The idea here is to approximate 𝑓 ∶ [𝑎, 𝑏] → ℝ by its value at the midpoint of the interval, like the building blocks in a
Riemann sum with the middel being the intuitive best choice of where to put the rectangle.
𝑎+𝑏
𝑓(𝑥) ≈ 𝑓𝑚𝑖𝑑 ∶= 𝑓 ( )
2
Then the approximation — which will be called 𝑀1 — is
𝑏
𝑎+𝑏
𝐼 ≈ 𝑀1 = ∫ 𝑓𝑚𝑖𝑑 𝑑𝑥 = 𝑓 ( ) (𝑏 − 𝑎)
𝑎 2
f_mid = f((a+b)/2)
figure(figsize=[14,10])
plot(x, f(x))
plot([a, a, b, b, a], [0, f_mid, f_mid, 0, 0], 'r', label="Midpoint Rule")
grid(True)
f_mid = f((a+b)/2)
figure(figsize=[14,10])
plot(x, f(x))
plot([a, a, b, b, a], [0, f(a), f(b), 0, 0], label="Trapezoid Rule")
plot([a, a, b, b, a], [0, f_ave, f_ave, 0, 0], '-.', label="Trapezoid Rule area")
plot([a, a, b, b, a], [0, f_mid, f_mid, 0, 0], 'r', label="Midpoint Rule")
legend()
grid(True)
These graphs indicate that the trapezoid rule will over-estimate the error for this and any function that is convex up on
the interval [𝑎, 𝑏]. With closer examination it can perhaps be seen that the Midpoint Rule will instead underestimate in
this situation, because its “overshoot” at left is less than its “undershoot” at right.
We can derive error formulas that confirm this, and which are the basis for both practical error estimates and for deriving
more accurate approximation methods.
The first such method will be to use multiple small intervals instead of a single bigger one (using piecewise polynomial
approximation) and for that, it is convenient to define ℎ = 𝑏 − 𝑎 which will become the parameter that we reduce in order
to improve accuracy.
For a function 𝑓 that is twice differentiable on interval [𝑎, 𝑏], the error in the Trapezoid Rule is
𝑏
(𝑏 − 𝑎)3 ″
∫ 𝑓(𝑥)𝑑𝑥 − 𝑇1 = − 𝑓 (𝜉) for some 𝜉 ∈ [𝑎, 𝑏]
𝑎 12
It will be convenient to define ℎ ∶= 𝑏 − 𝑎 so that this becomes
𝑏
ℎ3 ″
∫ 𝑓(𝑥)𝑑𝑥 − 𝑇1 = − 𝑓 (𝜉) for some 𝜉 ∈ [𝑎, 𝑏].
𝑎 12
These will be verified below, using the error formulas for Taylor polynomials and collocation polynomials.
For now, note that:
• The results confirm that for a function that is convex up, the Trapezoid Rule overestimates and the Midpoint Rule
underestimates.
• The ratio of the errors is approximately −2. This will be used to get a better result by using a weighted average:
Simpson’s Rule.
• The errors are 𝑂(ℎ3 ). This opens the door to Richardson Extrapolation, as will be seen soon in the method of
Romberg Integration.
One side benefit of the following verifications is that they also offer illustrations of how the two fundamental error formulas
help us: Taylor’s Formula and its cousin the error formula for polynomial collocation.
To help prove the above formulas, we introduce a result that also helps in various places later:
with 𝑓 continuous and the “weight function” 𝑤(𝑥) positive valued (actually, it is enough that 𝑤(𝑥) ≥ 0 and it is not zero
everyhere), there is a point 𝜉 ∈ [𝑎, 𝑏] that gives a “weighted average value” for 𝑓(𝑥) in the sense that
𝑏 𝑏 𝑏
∫ 𝑓(𝑥)𝑤(𝑥) 𝑑𝑥 = ∫ 𝑓(𝜉)𝑤(𝑥) 𝑑𝑥, = 𝑓(𝜉) ∫ 𝑤(𝑥) 𝑑𝑥
𝑎 𝑎 𝑎
Proof. As 𝑓 is continuous on the closed, bounded interval [𝑎, 𝑏], the Extreme Value Theorem from calculus says that
𝑓 has a minimum 𝐿 and a maximum 𝐻 on this interval: 𝐿 ≤ 𝑓(𝑥) ≤ 𝐻. Since 𝑤(𝑥) ≥ 0, this gives
and by integrating,
𝑏 𝑏 𝑏
𝐿 ∫ 𝑤(𝑥) 𝑑𝑥 ≤ ∫ 𝑓(𝑥)𝑤(𝑥) 𝑑𝑥 ≤ 𝐻 ∫ 𝑤(𝑥) 𝑑𝑥
𝑎 𝑎 𝑎
𝑏
Dividing by ∫𝑎 𝑤(𝑥) 𝑑𝑥 (which is positive),
𝑏
∫𝑎 𝑓(𝑥)𝑤(𝑥) 𝑑𝑥
𝐿≤ 𝑏
≤𝐻
∫𝑎 𝑤(𝑥) 𝑑𝑥
and the Mean Value Theorem says that 𝑓 attains this value for some 𝜉 ∈ [𝐿, 𝐻]:
𝑏
∫𝑎 𝑓(𝑥)𝑤(𝑥) 𝑑𝑥
𝑓(𝜉) = 𝑏
(5.10)
∫𝑎 𝑤(𝑥) 𝑑𝑥
𝑏
(𝑏 − 𝑎)3
A bit of calculus gives ∫ (𝑥 − 𝑎)(𝑏 − 𝑥) 𝑑𝑥 = , so
𝑎 6
Here symmetry helps, by eliminating the first (potentialy biggest) term in the error: we use the fact that 𝑎 = 𝑐 − ℎ/2 and
𝑏 = 𝑐 + ℎ/2
𝑏 𝑐+ℎ/2
𝑐+ℎ/2
∫ 𝑓 ′ (𝑐)(𝑥 − 𝑐) 𝑑𝑥 = 𝑓 ′ (𝑐) ∫ 𝑥 − 𝑐 𝑑𝑥 = [(𝑥 − 𝑐)2 /2]𝑐−ℎ/2 = (ℎ/2)2 − (ℎ/2)2 = 0
𝑎 𝑐−ℎ/2
and much as above, the Integral Mean Value Theorem can be used, this time with weight function 𝑤(𝑥) = (𝑥 − 𝑐)2 , ≥ 0:
𝑏
𝑓 ″ (𝜉)
𝐼 − 𝑀1 = ∫ (𝑥 − 𝑐)2 𝑑𝑥
2 𝑎
𝑏 ℎ/2
ℎ/2
Another caluclus exercise: ∫ (𝑥 − 𝑐)2 𝑑𝑥 = ∫ 𝑥2 𝑑𝑥 = [𝑥3 /3]−ℎ/2 = ℎ3 /12, so indeed,
𝑎 −ℎ/2
𝑓 ″ (𝜉) 3
𝐼 − 𝑀1 = ℎ
24
𝑓(𝑥) ≈ 𝑓(𝑎)
leading to
𝑏 𝑏
𝐼 ∶= ∫ 𝑓(𝑥) 𝑑𝑥 ≈ 𝐿1 ∶= ∫ 𝑓(𝑎) 𝑑𝑥 = 𝑓(𝑎)(𝑏 − 𝑎)
𝑎 𝑎
with 𝑥𝑖 = 𝑎 + 𝑖ℎ as before.
Proof. This time use Taylor’s Theorem just for the constant approximation with center 𝑎:
That is,
Using the Integral Mean Value Theorem again, now with weight 𝑤(𝑥) = 𝑥 − 𝑎 gives
𝑏 𝑏
(𝑏 − 𝑎)2 ℎ2 ′
∫ 𝑓 ′ (𝜉𝑥 )(𝑥 − 𝑎)𝑑𝑥 = 𝑓 ′ (𝜉) ∫ (𝑥 − 𝑎)𝑑𝑥 = 𝑓 ′ (𝜉) = 𝑓 (𝜉) for some 𝜉 ∈ [𝑎, 𝑏]
𝑎 𝑎 2 2
and inserting this into the previous formula gives the result.
References:
• Section 5.2.3 and 5.2.4 of Chapter 5 Numerical Differentiation and Integration in [Sauer, 2022].
• Section 4.4 Composite Numerical Integration of [Burden et al., 2016].
5.4.1 Introduction
5.4. Definite Integrals, Part 2: The Composite Trapezoid and Midpoint Rules 189
Introduction to Numerical Methods and Analysis with Python
In turn, the most straightforward way to do this is to use 𝑛 sub-intervals of equal width ℎ = (𝑏 − 𝑎)/𝑛, so that the
sub-interval endpoints are 𝑥0 = 𝑎 + 𝑖ℎ, 0 ≤ 𝑖 ≤ 𝑛: that is
sub-intervals [𝑥𝑖−1 , 𝑥𝑖 ], 1 ≤ 𝑖 ≤ 𝑛 separated by the nodes.
𝑎, 𝑎 + ℎ, 𝑎 + 2ℎ, … , 𝑏 − ℎ, 𝑏
Using the Midpoint Rule on each interval and summing gives a formula that could be familiar:
𝑥0 + 𝑥 1 𝑥 + 𝑥2 𝑥 + 𝑥𝑛
𝑀𝑛 ∶= 𝑓 ( )ℎ + 𝑓 ( 1 ) ℎ + ⋯ + 𝑓 ( 𝑛−1 )ℎ
2 2 2
𝑎 + (𝑎 + ℎ) (𝑎 + ℎ) + (𝑎 + 2ℎ) (𝑏 − ℎ) + 𝑏
=𝑓( )ℎ + 𝑓 ( )ℎ + ⋯ + 𝑓 ( )ℎ
2 2 2
= [𝑓(𝑎 + ℎ/2) + 𝑓(𝑎 + 3ℎ/2) + ⋯ + 𝑓(𝑏 − ℎ/2)] ℎ
This is a Riemann Sum as used in the definition of the defnite integral; possibly the best and natural one in most situations,
by using the midpoints of each interval. The theory of definite integrals also guarantees that 𝑀𝑛 → 𝐼 as 𝑛 → ∞ so long
as the function 𝑓 is continuous — the next question for us will be “how fast?*
This is also a Riemann sum, with intervals if length ℎ/2 at each end, using value at teh ends of thos intervals, and the rest
of width ℎ, with the Midpoint Rule used. So again, we know that 𝑇𝑛 → 𝐼 as 𝑛 → ∞ and next want to know “how fast?*
In brief, the errors for ech of rhese rules is the sum of the errors for each of the pieces; I will just state them for now.
Firstly,
𝑛
ℎ3 ″
𝐼 − 𝑀𝑛 = ∑ 𝑓 (𝜉𝑖 ), for some 𝜉𝑖 ∈ [𝑥𝑖−1 , 𝑥𝑖 ]
𝑖=1
24
ℎ3 𝑛 ″
𝐼 − 𝑀𝑛 = ∑ 𝑓 (𝜉𝑖 )
24 𝑖=1
and as we will see, this sum can have each 𝑓 ″ (𝜉𝑖 ) replace by an “average value” 𝑓 ″ (𝜉𝑖 ), 𝜉 ∈ [𝑎, 𝑏]:
ℎ3 𝑛 ″ ℎ3 ″ ℎ2
𝐼 − 𝑀𝑛 = ∑ 𝑓 (𝜉) = 𝑛𝑓 (𝜉) = (𝑏 − 𝑎)𝑓 ″ (𝜉)
24 𝑖=1 24 24
𝐼 − 𝑀𝑛 = 𝑂(ℎ2 )
Similarly,
ℎ2
𝐼 − 𝑇𝑛 = − (𝑏 − 𝑎)𝑓 ″ (𝜉) = 𝑂(ℎ2 )
12
again with 𝜉 ∈ [𝑎, 𝑏], but note well: these two 𝜉 values are probably not the same!
Ignoring the 𝜉 values being different, this suggests again that we can cancel some of the errors wi ha weighted average:
2𝑀𝑛 + 𝑇𝑛
𝑆2𝑛 ∶=
3
Indeed we will see that the main, 𝑂(ℎ2 ), errors cancel out, and also due to symmetry, the error is even in ℎ, so that
𝐼 − 𝑆2𝑛 = 𝑂(ℎ4 )
The name is because this is the Composite Simpson’s Rule, and the interleaving of the different 𝑥 values used by 𝑀𝑛 and
𝑇𝑛 means that is uses 2𝑛 + 1 nodes, and so 2𝑛 sub-intervals.
A key step in getting more useful error formulas for approximations of integrals is the following result:
That is, the value of the function at 𝑐 is the average of its values at those other points.
Proof. The proof is rather similar to that of The Integral Mean Value Theorem in the previous section; essentially replacing
the integral there by a sum:
As 𝑓 is continuous on the closed, bounded interval [𝑎, 𝑏], the Extreme Value Theorem from calculus says that 𝑓 has a
minimum 𝐿 and a maximum 𝐻 on this interval. Each of the values 𝑓(𝑥𝑖 ) is in interval [𝐿, 𝐻] so their average is also:
𝑛
∑𝑖=1 𝑓(𝑥𝑖 )
𝑓(𝑥𝑖 ) ∈ [𝐿, 𝐻] and thus ∈ [𝐿, 𝐻]
𝑛
The Mean Value Theorem then says that 𝑓 attains this mean value for some 𝜉 ∈ [𝐿, 𝐻].
5.4. Definite Integrals, Part 2: The Composite Trapezoid and Midpoint Rules 191
Introduction to Numerical Methods and Analysis with Python
Completing the derivation of the error formulas for these composite rules
I will spell this out for the Composite Trapezoid Rule; it works very similarly for the “midpoint” case.
First, break the exact integral up as
𝑏 𝑛 𝑥𝑖
𝐼 = ∫ 𝑓(𝑥) 𝑑𝑥 = ∑ 𝐼 (𝑖) , where 𝐼 (𝑖) = ∫ 𝑓(𝑥) 𝑑𝑥
𝑎 𝑖=1 𝑥𝑖−1
Similarly,
𝑛
𝑇𝑛 = ∑ 𝑇 (𝑖)
𝑖=1
𝑓(𝑥𝑖−1 ) + 𝑓(𝑥𝑖 )
𝑇 (𝑖) = ℎ
2
The error in 𝑇𝑛 is the sum of the errors in each piece:
𝑛 𝑛
𝐼 − 𝑇𝑛 = ∑ 𝐼 (𝑖) − ∑ 𝑇 (𝑖)
𝑖=1 𝑖=1
𝑛
= ∑(𝐼 (𝑖) − 𝑇 (𝑖) )
𝑖=1
𝑛
ℎ3 ″
= ∑− 𝑓 (𝜉𝑖 ), 𝑥𝑖 ∈ [𝑥𝑖−1 , 𝑥𝑖 ]
𝑖=1
12
ℎ3 𝑛 ″
=− ∑ 𝑓 (𝜉𝑖 )
12 𝑖=1
Now we can use the above mean value result (with 𝑓 ″ in place of 𝑓) to replace the last sum above by 𝑛𝑓 ″ (𝜉), some
𝜉 ∈ [𝑎, 𝑏], so that as claimed,
ℎ3 ″ ℎ2
𝐼 − 𝑇𝑛 = − 𝑛𝑓 (𝜉), = − (𝑏 − 𝑎)𝑓 ″ (𝜉) = 𝑂(ℎ2 ),
12 12
using ℎ𝑛 = 𝑏 − 𝑎.
Starting from
ℎ3 𝑛 ″ ℎ2 𝑛
𝐼 − 𝑇𝑛 = − ∑ 𝑓 (𝜉𝑖 ), = − ∑(𝑓 ″ (𝜉𝑖 )ℎ)
12 𝑖=1 12 𝑖=1
note that the sum in the second version is a Riemann sum for approximating the integral
𝑏
𝐼 ″ ∶= ∫ 𝑓 ″ (𝑥) 𝑑𝑥, = [𝑓 ′ (𝑥)]𝑏𝑎 = 𝑓 ′ (𝑏) − 𝑓 ′ (𝑎),
𝑎
so it seems that
𝑓 ′ (𝑏) − 𝑓 ′ (𝑎) 2
𝐼 − 𝑇𝑛 ≈ − ℎ , = 𝑂(ℎ2 )
12
A virtue of this form is that now we have a good chance of evaluating the coefficient of ℎ2 , so this given a “practical error
formula” when 𝑓 ′ (𝑥) is known.
Another useful fact (not proven in these notes) is that the error for the basic Trapezoid rule can be computed with the
help of Taylor’s Theorem in a series:
𝑏
𝑇1 = ∫ 𝑓(𝑥) 𝑑𝑥 = 𝐵2 𝐷2 𝑓(𝜉2 )ℎ3 + 𝐵4 𝐷4 (𝜉4 )ℎ5 + ⋯
𝑎
so that
𝑏
𝐷𝑓(𝑏) − 𝐷𝑓(𝑎) 2
𝑇𝑛 = ∫ 𝑓(𝑥) 𝑑𝑥 + ℎ + 𝑂(ℎ4 )
𝑎 12
The last form is the setup for Richardson extrapolation — and the previous one with a succession of “big-O” terms is the
setup for repeated Richardson extrapolation, to get a succession of approximations with errors 𝑂(ℎ2 ), then 𝑂(ℎ4 ), then
𝑂(ℎ6 ), and so on: Definite Integrals, Part 4: Romberg Integration.
There are similar formulas for the Composite Midpoint Rule, like
ℎ2 𝐷𝑓(𝑏) − 𝐷𝑓(𝑎) 2
𝐼 − 𝑀𝑛 = (𝑏 − 𝑎)𝑓 ″ (𝜉) = ℎ + 𝑂(ℎ4 )
24 24
but we will see why the Composite Trapezoid Rule is far more useful for Richardson extrapolation.
5.4.3 Appendix: The Composite Left-hand Endpoint Rule, and its Error
The Composite Left-hand Endpoint Rule with 𝑛 sub-intervals of equal width ℎ = (𝑏 − 𝑎)/𝑛 is
𝑛−1 𝑛−1
𝐿𝑛 = ∑ 𝑓(𝑥𝑖 )ℎ, = ∑ 𝑓(𝑎 + 𝑖ℎ)ℎ
𝑖=0 𝑖=0
To study its errors, start as with the Compound Trapezoid Rule: break the integral up as
𝑏 𝑛 𝑥𝑖
𝐼 = ∫ 𝑓(𝑥) 𝑑𝑥 = ∑ 𝐼 (𝑖) , where 𝐼 (𝑖) = ∫ 𝑓(𝑥) 𝑑𝑥
𝑎 𝑖=1 𝑥𝑖−1
𝐿(𝑖) = 𝑓(𝑥𝑖−1 )ℎ
5.4. Definite Integrals, Part 2: The Composite Trapezoid and Midpoint Rules 193
Introduction to Numerical Methods and Analysis with Python
Then the error in 𝐿𝑛 is again the sum of the errors in each piece:
𝑛 𝑛
𝐼 − 𝐿𝑛 = ∑ 𝐼 (𝑖) − ∑ 𝐿(𝑖)
𝑖=1 𝑖=1
𝑛
= ∑(𝐼 (𝑖) − 𝐿(𝑖) )
𝑖=1
𝑛
ℎ2 ′
=∑ 𝑓 (𝜉𝑖 ), 𝑥𝑖 ∈ [𝑥𝑖−1 , 𝑥𝑖 ]
𝑖=1
2
ℎ2 𝑛 ′
= ∑ 𝑓 (𝜉𝑖 )
2 𝑖=1
The Generalized Mean Value Theorem — now with 𝑓 ′ in place of 𝑓 — allows us to replace the last sum above by 𝑛𝑓 ′ (𝜉),
some 𝜉 ∈ [𝑎, 𝑏], so that as claimed,
ℎ2 ′ ℎ
𝐼 − 𝐿𝑛 = 𝑛𝑓 (𝜉), = (𝑏 − 𝑎)𝑓 ′ (𝜉) = 𝑂(ℎ)
2 2
Remark 4.4.1
As with the Composite Trapezoid Rule, one can also get
𝑏
𝑓(𝑏) − 𝑓(𝑎)
𝐿𝑛 = ∫ 𝑓(𝑥) 𝑑𝑥 + ℎ + 𝑂(ℎ2 )
𝑎 2
References:
• Sections 5.2.2 and 5.2.3 of Chapter 5 Numerical Differentiation and Integration in [Sauer, 2022].
• Sections 4.3 and 4.4 of Chapter 5 Numerical Differentiation and Integration in [Burden et al., 2016].
5.5.1 Introduction
The Composite Simpson’s Rule can be be derived in several ways. The traditional approach is to devise Simpson’s
Rule by approximating the integrand function with a colocating quadratic (using three equally spaced nodes) and then
“compounding”, as seen with the Trapezoid and Midpoint Rules.
We have already seen another approach: using a 2:1 weighted average of the Trapezoid and Midpoint Rules with th goal
of cancelling their 𝑂(ℎ2 ) error terms.
This section will show a third approach, based on Richardson extrapolation: this will set us up for Romberg Integration.
From the section on The Composite Trapezoid and Midpoint Rules, we have
𝑏
𝐷𝑓(𝑏) − 𝐷𝑓(𝑎) 2
𝑇𝑛 = ∫ 𝑓(𝑥) 𝑑𝑥 + ℎ + 𝑂(ℎ4 ), = 𝐼 + 𝑐2 ℎ2 + 𝑂(ℎ4 )
𝑎 12
where 𝐼 is the integral to be approximated (the “Q” in the section on Richardson Extrapolation, and 𝑐2 = (𝐷𝑓(𝑏) −
𝐷𝑓(𝑎))/12.
Thus the “n form” of Richardson Extrapolation with 𝑝 = 2 gives a new approximation that I will call 𝑆2𝑛 :
4𝑇2𝑛 − 𝑇𝑛
𝑆2𝑛 =
4−1
To start, look at the simplest case of this:
4𝑇2 − 𝑇1
𝑆2 =
3
Definfing ℎ = (𝑏 − 𝑎)/2, the ingredients are
𝑓(𝑎) + 𝑓(𝑏) 𝑓(𝑎) + 𝑓(𝑏)
𝑇1 = (𝑏 − 𝑎) = 2ℎ = (𝑓(𝑎) + 𝑓(𝑏))ℎ
2 2
and
𝑓(𝑎) 𝑓(𝑏)
𝑇2 = [ + 𝑓(𝑎 + ℎ) + ]ℎ
2 2
so
[2𝑓(𝑎) + 4𝑓(𝑎 + ℎ) + 2𝑓(𝑏)] − [𝑓(𝑎) + 𝑓(𝑏)] 𝑓(𝑎) + 4𝑓(𝑎 + ℎ) + 𝑓(𝑏)
𝑆2 = ℎ, = ℎ
3 3
which is the basic Simpson’s Rule. The subscript “2” is because this uses two intervals, with ℎ = (𝑏 − 𝑎)/2
Rather than derive this the traditional way — by fitting a quadratic to the function values at 𝑥 = 𝑎, 𝑎 + ℎ and 𝑏 — this
can be confirmed “a postiori” by showing that the degree of precision is at least 2, so that it is exact for all quadratics.
And actually we get a bonus, thanks to some symmetry.
For 𝑓(𝑥) = 1, the exact integral is 𝐼 = 𝑏 − 𝑎, = 2ℎ, and also
1+4×1+1
𝑆2 = ℎ, = 2ℎ
3
𝑏
For 𝑓(𝑥) = 𝑥, the exact integral is 𝐼 = ∫𝑎 𝑥 𝑑𝑥 = [𝑥2 /2]𝑏𝑎 = (𝑏2 − 𝑎2 )/2 = (𝑏 − 𝑎)(𝑏 + 𝑎)/2 = (𝑎 + 𝑏)ℎ
and
𝑎 + 4(𝑎 + 𝑏)/2 + 𝑏 𝑎 + 2(𝑎 + 𝑏) + 𝑏
𝑆2 = ℎ= ℎ = (𝑎 + 𝑏)ℎ
3 3
However, it is sufficient to traslate the domain to the symmetric interval [−ℎ, ℎ], so redo the 𝑓(𝑥) = 𝑥 case this easier
way:
ℎ
The exact integral is ∫−ℎ 𝑥 𝑑𝑥 = 0 (because the function is odd)
−ℎ + 4 × 0 + ℎ
𝑆2 = ℎ=0
3
5.5. Definite Integrals, Part 3: The (Composite) Simpson’s Rule and Richardson Extrapolation 195
Introduction to Numerical Methods and Analysis with Python
ℎ
For 𝑓(𝑥) = 𝑥2 , again do it just on the symmetric interval [−ℎ, ℎ]: the exact integral is ∫−ℎ 𝑥2 𝑑𝑥 = [𝑥3 /3]ℎ−ℎ = 2ℎ3 /3
and
(−ℎ)2 + 4 × 02 + ℎ2
𝑆2 = ℎ = 2ℎ3 /3
3
So the degree of precision is at least 2, as expected.
What about cubics? Check with 𝑓(𝑥) = 𝑥3 , again on interval [−ℎ, ℎ].
Almost no calculation is needed: symmetry does it all for us:
ℎ
• on one hand, the exact integral is zero due to the function being odd on a symmetric interval: ∫−ℎ 𝑥3 𝑑𝑥 =
[𝑥4 /4]ℎ−ℎ = 0
• on the other hand,
(−ℎ)3 + 4 × 03 + ℎ3
𝑆2 = ℎ=0
3
The degree of precision is at least 3.
Our luck ends here, but looking at 𝑓(𝑥) = 𝑥4 is informative:
For 𝑓(𝑥) = 𝑥4 ,
ℎ
• the exact integral is ∫−ℎ 𝑥4 𝑑𝑥 = [𝑥5 /5]ℎ−ℎ = 2ℎ5 /5
• on the other hand
(−ℎ)4 + 4 × 04 + ℎ4
𝑆2 = ℎ = 2ℎ5 /3
3
So there is a discrepancy of (4/15)ℎ5 , = 𝑂(ℎ5 ).
This Simpson’s Rule has degree of precision 3: it is exact for all cubics, but not for all quartics.
The last result also indicate the order of error:
𝑆2 − 𝐼 = 𝑂(ℎ5 )
Just as for the composite Trapezoid and Midpoint Rules, when we combine multiple simple Simpson’s Rule approx-
imations with 2𝑛 intervals each of width ℎ = (𝑏 − 𝑎)/(2𝑛), the error is roughly multiplied by 𝑛, so ℎ5 goes to
𝑛ℎ5 , = (𝑏 − 𝑎)ℎ4 , leading to
𝑆2𝑛 − 𝐼 = 𝑂(ℎ4 )
As we now know that this will be exact for cubics, use third order Tayloe polynomials:
𝑆2 = ℎ𝑓(0)(𝐶1 + 𝐶2 + 𝐶3 )
+ ℎ2 𝑓 (1) (0)(−𝐶1 + 𝐶3 )
+ ℎ3 𝑓 (2) (0)(𝐶1 /2 + 𝐶3 /2)
+ ℎ4 𝑓 (3) (0)(−𝐶1 /6 + 𝐶3 /6)
+ ℎ5 (𝐶1 𝑓 (4) (𝜉− ) + 𝐶3 𝑓 (4) (𝜉+ ))/24
𝐶1 + 𝐶 2 + 𝐶 3 = 2
−𝐶1 + 𝐶3 = 0
𝐶1 /2 + 𝐶3 /2 = 1/3
𝐶1 = 1/3, 2𝐶1 + 𝐶2 = 2
and thus
𝐶1 = 𝐶3 = 1/3, 𝐶2 = 4/3
as claimed above.
References:
• Section 5.3 Romberg Integration of [Sauer, 2022].
• Section 4.5 Romberg Integration of [Burden et al., 2016].
5.6.1 Introduction
Romberg Integration is based on repeated Richardson extrapolalation from the composite trapezoidal rule, starting with
one interval and repeatedly doubling. Our notation starts with
𝑅𝑖,0 = 𝑇2𝑖 , 𝑖 = 0, 1, 2, …
where
and the second index will indicate the number of extapolation steps done (none so far!)
Actually we only need this 𝑇𝑛 formula for the single trapezoidal rule, to get
𝑓(𝑎) + 𝑓(𝑏)
𝑅0,0 = 𝑇1 = (𝑏 − 𝑎),
2
because the most efficient way to get the other values is recursively, with
𝑇𝑛 + 𝑀𝑛
𝑇2𝑛 =
2
where 𝑀𝑛 is the composite midpoint rule,
𝑛
𝑏−𝑎
𝑀𝑛 = ℎ ∑ 𝑓(𝑎 + (𝑘 − 1/2)ℎ), ℎ=
𝑘=1
𝑛
4𝑗 𝑅𝑖,𝑗−1 − 𝑅𝑖−1,𝑗−1
𝑅𝑖,𝑗 = , 𝑗 = 1, 2, … , 𝑖
4𝑗 − 1
which can also be expressed as
𝑅𝑖,𝑗−1 − 𝑅𝑖−1,𝑗−1
𝑅𝑖,𝑗 = 𝑅𝑖,𝑗−1 + 𝐸𝑖,𝑗−1 , where 𝐸𝑖,𝑗−1 = is an error estimate.
4𝑗 − 1
4𝑇2𝑛 − 𝑇𝑛
𝑅𝑖,1 = 𝑆2𝑛 = , 𝑛 = 2𝑖−1
4−1
The above can now be arranged into a basic algorithm. It does a fixed number 𝑀 of levels of extrapolation so using 2𝑀
intervals; a refinement would be to use the above error estimate 𝐸𝑖,𝑗−1 as the basis for a stopping condition.
for j from 1 to i:
4𝑗 𝑅𝑖,𝑗−1 − 𝑅𝑖−1,𝑗−1
𝑅𝑖,𝑗 =
4𝑗 − 1
end for
𝑛 ← 2𝑛
ℎ ← ℎ/2
end for
SIX
MINIMIZATION
References:
• Section 13.1 Unconstrained Optimization Without Derivatives of [Sauer, 2022], in particular sub-section 13.1.1
Golden Section Search.
• Section 11.1, One-Variable Case in Chapter 11 Optimization of [Chenney and Kincaid, 2012].
6.1.1 Introduction
The goal of this section is to find the minimum of a function 𝑓(𝑥) and more specifically to find its location: the argument
𝑝 such that 𝑓(𝑝) ≤ 𝑓(𝑥) for all 𝑥 in the domain of 𝑓.
Several features are similar to what we have seen with zero-finding:
• Some restictions on the function 𝑓 are needed:
– with zero-finding, to guarantee existence of a solution, we needed at least an interval [𝑎, 𝑏] on which the
function is continuous and with a sign change between the endpoints;
– for minimization, the criterion for existence is simply an interval [𝑎, 𝑏] on which the function is continuous.
• With zero-finding, we needed to compare the values of the function at three points 𝑎 < 𝑐 < 𝑏 to determine a new,
smaller interval containing the root; with minimzation, we instead need to compare the values of the function at
four points 𝑎 < 𝑐 < 𝑑 < 𝑏 to determine a new, smaller interval containing the minimum.
• There are often good reasons to be able to do this without using derivatives.
As is often the case, a guarantee of a unique solution helps to devise a robust algorithm:
• to guarantee uniqueness of a zero in interval [𝑎, 𝑏], we needed an extra condition like the function being monotonic;
• to guarantee uniqueness of a minimum in interval [𝑎, 𝑏], the condition we use is being monomodal: The function
is decreasing near 𝑎, increasing near 𝑏, and changes between decreasing and increasing only once (which must
therefore happen at the minimum.)
So we assume from now on that the function is monomodal on the interval [𝑎, 𝑏].
201
Introduction to Numerical Methods and Analysis with Python
6.1.2 Step 1: finding a smaller interval within [𝑎, 𝑏] that contains the minimum
As claimed above, three points are not enough: even if for 𝑎 < 𝑐 < 𝑏 we have 𝑓(𝑎) > 𝑓(𝑐) and 𝑓(𝑐) < 𝑓(𝑏), the
minimum could be either to the left or the right of 𝑐.
So instead, choose two internal points 𝑐 and 𝑑, 𝑎 < 𝑐 < 𝑑 < 𝑏.
• if 𝑓(𝑐) < 𝑓(𝑑), the function is increasing on at least part of the interval [𝑐, 𝑑], so the transition from decreasing to
increasing is to the left of 𝑑: the minimum is in [𝑎, 𝑑];
• if instead 𝑓(𝑐) > 𝑓(𝑑), the “mirror image” argument shows that the minimum is in [𝑐, 𝑏].
What about the borderline case when 𝑓(𝑐) = 𝑓(𝑑)? The monomodal function cannot be either increasing or decreasing
on all of [𝑐, 𝑑] so must first decrease and then increase: the minimum is in [𝑐, 𝑑], and so is in either of the above intervals.
So we almost have a first algorithm, except for the isue of choosing; given an interval [𝑎, 𝑏] on which function 𝑓 is
monomodal:
1. Choose two internal points 𝑐 and 𝑑, with 𝑎 < 𝑐 < 𝑑 < 𝑏
2. Evaluate 𝑓(𝑐) and 𝑓(𝑑).
3. If 𝑓(𝑐) < 𝑓(𝑑), replace the interval [𝑎, 𝑏] by [𝑎, 𝑑]; else replace it by [𝑐, 𝑏].
4. If the new interval is short enough to locate the minimum with sufficient accuracy (e.g. its length is less that twice
the error tolerance) stop; its midpoint is a sufficiently accurate approximate answer); othewise, repaeat from step
(1).
6.1.3 Step 2: choosing the internal points so that the method is guaranteed to con-
verge
6.1.4 Step 3: choosing the internal points so that the method converges as fast as
possible
References:
• Chapter 13 Optimization of [Sauer, 2022], in particular sub-sections 13.2.2 Stepest Descent and 13.1.3 Nelder-Mead.
• Chapter 11 Optimization of [Chenney and Kincaid, 2012].
6.2.1 Introduction
This future section will focus on two methods for computing the minimum (and its location) of a function 𝑓(𝑥, 𝑦, … ) of
several variables:
• Steepest Descent where the gradient is used iteratively to find the direction in which to search for a nw approxiate
lovaitoi wher 𝑓 has a lower value.
• The method of Nelder and Mead, which does not use derivatives.
6.2. Finding the Minimum of a Function of Several Variables — Coming Soon 203
Introduction to Numerical Methods and Analysis with Python
SEVEN
References:
• Sections 6.1.1 Euler’s Method in [Sauer, 2022].
• Section 5.2 Euler’s Method in [Burden et al., 2016].
• Sections 7.1 and 7.2 in [Chenney and Kincaid, 2012].
import numpy as np
#from matplotlib import pyplot as plt
# Shortcuts for some favorite commands:
from numpy import linspace
from matplotlib.pyplot import figure, plot, grid, title, xlabel, ylabel, legend
𝑑𝑢
= 𝑓(𝑡, 𝑢(𝑡)), 𝑎 ≤ 𝑡 ≤ 𝑏
𝑑𝑡
with the initial condition
𝑢(𝑎) = 𝑢0
I will follow the common custom of referring to the independent variable as “time”.
For now, 𝑢(𝑡) is real-valued, but little will change when we later let it be vector-valued (and/or complex-valued).
205
Introduction to Numerical Methods and Analysis with Python
Sometimes, we need to be more careful and explicit in describing the function that solves the above initial value problem;
then the input parameters 𝑎 and 𝑢0 = 𝑢(𝑎) will be included of the function’s formula:
𝑢(𝑡) = 𝑢(𝑡; 𝑎, 𝑢0 )
(It is standard mathematical convention to separate parameters like 𝑎 and 𝑢0 from variables like 𝑡 by putting the former
after a semicolon.
7.1.2 Examples
A lot of useful intuition comes from these four fairly simple examples.
Example (Integration)
If the derivative depends only on the independent variable 𝑡, so that
𝑑𝑢
= 𝑓(𝑡), 𝑎 ≤ 𝑡 ≤ 𝑏
𝑑𝑡
the solution is given by integration:
𝑡
𝑢(𝑡) = 𝑢0 + ∫ 𝑓(𝑠) 𝑑𝑠.
𝑎
and this gives us a back-door way to use numerical methods for solving ODEs to evaluate definite integrals.
Example (Integration)
The simplest case with 𝑢 present in 𝑓 is 𝑓(𝑡, 𝑢) = 𝑓(𝑢) = 𝑢. But it does not hurt to add a constant, so:
𝑑𝑢
= 𝑘𝑢, 𝑘 a constant.
𝑑𝑡
The solution is
𝑢(𝑡) = 𝑢0 𝑒𝑘(𝑡−𝑎)
We will see that this simple example contains the essence of ideas relevant far more generally.
Example (A nonlinear equation, wiht solutions existing only for a finite time)
In the previous examples, 𝑓(𝑡, 𝑢) is linear in 𝑢 (consider 𝑡 as fixed); nonlinearities can lead to more difficult behavior. The
equation
𝑑𝑢
= 𝑢2 , 𝑢(𝑎) = 𝑢0
𝑑𝑡
can be solved by separation of variables — or for now you can just verify the solution
1
𝑢(𝑡) = , 𝑇 = 𝑎 + 1/𝑢0 .
𝑇 −𝑡
Note that if 𝑢0 > 0, the only exists for 𝑡 < 𝑇 . (The solution is also valid for 𝑇 > 0, but that part has no connection to
the initial data at 𝑡 = 𝑎.)
This example warns us that the IVP might not be well-posed when we set the interval [𝑎, 𝑏] in advance: all we can
guarantee in general is that a solution exists up to some time 𝑏, 𝑏 > 𝑎.
𝑑𝑢
= − sin 𝑡 − 𝑘(𝑢 − cos 𝑡)
𝑑𝑡
where 𝑘 is large and positive. Its family of solutions is
Once we know 𝑢(𝑡) (or a good approximation) at some time 𝑡, we also know the value of 𝑢′ (𝑡) = 𝑓(𝑡, 𝑢(𝑡)) there; in
particular, we know that 𝑢(𝑎) = 𝑢0 and so 𝑢′ (𝑎) = 𝑓(𝑎, 𝑢0 ).
This allows us to approximate 𝑢 for slightly larger values of the argument (which I will call “time”) using its tangent line:
This leads to the simplest approximation: choose a step size ℎ determining equally spaced times 𝑡𝑖 = 𝑎 + 𝑖ℎ and define
— recursively — a sequence of approximations 𝑈𝑖 ≈ 𝑢(𝑡𝑖 ) with
𝑈0 = 𝑢0
𝑈𝑖+1 = 𝑈𝑖 + ℎ𝑓(𝑡𝑖 , 𝑈𝑖 ))
If we choose a number of time steps 𝑛 and set ℎ = (𝑏−𝑎)/𝑛 for 0 ≤ 𝑖 ≤ 𝑛, the second equation is needed for 0 ≤ 𝑖 < 𝑛,
ending with 𝑈𝑛 ≈ 𝑢(𝑡𝑛 ) = 𝑢(𝑏).
This “two-liner” does not need a pseudo-code description; instead, we can go directly to a rudimentary Python function
for Euler’s Method:
Exercise 1
Show that for the integration case 𝑓(𝑡, 𝑢) = 𝑓(𝑡), Euler’s method is the same as the composite left-hand endpoint rule,
as in the section Definite Integrals, Part 2.
a = 0.
b = 3/4*np.pi
u_0 = 3.
n = 20
figure(figsize=[12,8])
title(f"The exact solution is y = cos(x) + {u_0 - np.cos(a)}")
plot(t, u, "g", label="Exact solution")
plot(t, U, ".:b", label=f"Euler's answer for h={(b-a)/n:0.4g}")
legend()
grid(True);
"""
return k*u
figure(figsize=[12,8])
title(f"The exact solution is $u = {u_0} \, \exp({k} \, t)$")
plot(t, u, "g", label="Exact solution")
(continues on next page)
legend()
grid(True);
figure(figsize=[12,8])
title(f"The exact solution is $y = {u_0} \, \exp({k} \, t)$")
plot(t, u, "g", label="Exact solution")
plot(t10, U10, ".:r", label=f"Euler's answer for h={(b-a)/10:0.4g}")
plot(t20, U20, ".:b", label=f"Euler's answer for h={(b-a)/20:0.4g}")
legend()
grid(True);
a = 0.
b = 0.9
u_0 = 1.
figure(figsize=[12,8])
title(f"The exact solution is $u = 1/({a + 1/u_0} - t)$")
plot(t, u, "g", label=f"Exact solution")
plot(t100, U100, ".:r", label=f"Euler's answer for h={(b-a)/100:0.2g}")
plot(t200, U200, ".:b", label=f"Euler's answer for h={(b-a)/200:0.2g}")
(continues on next page)
a = 0.
b = 0.999
u_0 = 1.
n = 200
u = u3(tplot, a, u_0)
figure(figsize=[12,8])
title(f"The exact solution is $u = 1/({T} - t)$")
plot(tplot, u, "g", label=f"Exact solution")
plot(t, U, ":b", label=f"Euler's answer for h={(b-a)/n:0.4g}")
legend()
grid(True);
Clearly Euler’s method can never produce the vertical asymptote. The best we can do is improve accuracy by using more,
smaller time steps:
n = 10000
u = u3(tplot, a, u_0)
figure(figsize=[12,8])
title(f"The exact solution is $u = 1/({T} - t)$")
plot(tplot, u, "g", label="Exact solution")
plot(t, U, ":b", label=f"Euler's answer for h={(b-a)/n:0.4}")
legend()
grid(True);
The general solution is u(t) = u(t; a, u_0, k) = \cos t + (u_0 - cos(a)) e^{-K (t-
↪a)}
"""
return -np.sin(t) - k*(u - np.cos(t))
With enough steps (small enough step size ℎ), all is well:
a = 0.
b = 2 * np.pi # One period
u_0 = 2.
k = 40.
n = 400
figure(figsize=[12,8])
title(f"The exact solution is u = cos t + {u_0-1:0.4g} exp(-{k} t)")
plot(t, u, "g", label=f"Exact solution for {k=}")
(continues on next page)
However, with large steps (still small enough to handle the cos 𝑡 part), there is a catastrophic failure, with growing oscil-
lations that, as we will see, are a characteristic feature of instability.
n = 124
figure(figsize=[12,8])
title(f"The exact solution is u = cos t + {u_0-1:0.3} exp(-{k} t)")
plot(t, u, "g", label=f"Exact solution for {k=}")
plot(t, U, '.:b', label=f"Euler's answer for h={(b-a)/n:0.3}")
legend()
grid(True);
To show that the 𝑘 part is the problem, reduce 𝑘 while leaving the rest unchanged:
k = 10.
figure(figsize=[12,8])
title(f"The exact solution is u = cos t + {u_0-1:0.3} exp(-{k} t)")
plot(t, u, "g", label=f"Exact solution for {k=}")
plot(t, U, '.:b', label=f"Euler's answer for h={(b-a)/n:0.3}")
legend()
grid(True);
It is sometime useful to adjust the time step size; for example reducing it when the derivative is larger, (as happens in
Example 3 above). This gives a slight variant, now expressed in pseudo-code:
Input: 𝑓, 𝑎, 𝑏, 𝑛
𝑡0 = 𝑎
𝑈0 = 𝑢0
for i in [0, 𝑛):
Choose ℎ𝑖 somehow
𝑡𝑖+1 = 𝑡𝑖 + ℎ𝑖
𝑈𝑖+1 = 𝑈𝑖 + ℎ𝑖 𝑓(𝑡𝑖 , 𝑈𝑖 )
end for
In a later section, we will see how to estimate errors within an algorithm, and then how to use such error estimates to
guide the choice of step size.
A great amount of intuition about numerical methods for solving ODE IVPs comes from that “simplest nontrivial exam-
ple”, number 2 above. We can solve it with constant step size ℎ, and thus study its errors and accuracy. The recursion
relation is now
with solution
𝑈𝑖 = 𝑢0 (1 + ℎ𝑘)𝑖 .
So each is a geometric series: the difference is that the growth factor is 𝐺 = (1 + ℎ𝑘) for Euler’s method, vs 𝑔 = 𝑒𝑘ℎ =
1 + ℎ𝑘 + (ℎ𝑘)2 /2 + ⋯ = 1 + ℎ𝑘 + 𝑂(ℎ2 ) for the ODE.
Ths deviation at each time step is 𝑂(ℎ2 ), suggesting that by the end 𝑡 = 𝑏, at step 𝑛, the error will be 𝑂(𝑛ℎ2 ) =
𝑏−𝑎 2
𝑂( ℎ ) = 𝑂(ℎ).
ℎ
This is in fact what happens, but to verify that, we must deal with the challenge that once an error enters at one step, it is
potentially amplified at each subsequent step, so the errors introduced at each step do not simply get summed like they
did with definite integrals.
𝐸𝑖 = 𝑢(𝑡𝑖 ) − 𝑈𝑖
We will approach this by first considering the new error added at each step, the local truncation error (or discretization
error).
At the first step this is the same as above:
𝑒1 = 𝑢(𝑡1 ) − 𝑈1 = 𝑢(𝑎 + ℎ) − 𝑈1
However at later steps we compare the results 𝑈𝑖+1 to what the solution would be if it were exact at the start of that step:
that is, if 𝑈𝑖 were exact.
Using the notation 𝑢(𝑡; 𝑡𝑖 , 𝑈𝑖 ) introduced above for the solution of the ODE with initial condition 𝑢(𝑡𝑖 ) = 𝑈𝑖 , the location
truncation error at step 𝑖 is the discrepancy at time 𝑡𝑖+1 between what Euler’s method and the exact solution give when
both start at that point (𝑡𝑖 , 𝑈𝑖 ):
𝑒𝑖 = 𝑢(𝑡; 𝑡𝑖 , 𝑈𝑖 ) − 𝑈𝑖+1
The first term is the difference at 𝑡 = 𝑡2 of two solutions with values at 𝑡 = 𝑡1 being 𝑢(𝑡1 ) and 𝑈1 respectively. As the
ODE is linear and homogeneous, this is the solution of the same ODE with value at 𝑡 = 𝑡1 being 𝑢(𝑡1 ) − 𝑈1 , which is
𝑒1 : that solution is 𝑒1 𝑒𝑦(𝑡−𝑡1 , so at 𝑡 = 𝑡2 it is 𝑒1 𝑒𝑘ℎ . Thus the global error after two steps is
𝐸2 = 𝑒2 + (𝑒𝑘ℎ )𝑒1 ∶
the error from the previous step has been amplified by the growth factor 𝑔 = 𝑒𝑘ℎ :
𝐸2 = 𝑒2 + 𝑔𝑒1 ∶
To get a bound on the global error from the formula above, we first need a bound on the local truncation errors 𝑒𝑖 .
Taylor’s theorem gives 𝑒𝑘ℎ = 1 + 𝑘ℎ + 𝑒𝑘𝜉 (𝑘ℎ)2 /2, 0 < 𝜉 < 𝑘ℎ, so
and thus
𝑒𝑘ℎ 2
|𝑒𝑖 | ≤ |𝑈𝑖 | ℎ
2
Also, since 1+𝑘ℎ < 𝑒𝑘ℎ , |𝑈𝑖 | < |𝑢(𝑡𝑖 )| = |𝑢0 |𝑒𝑘(𝑡𝑖 −𝑎) , and we only need this to the beginning of the last step, 𝑖 ≤ 𝑛−1,
for which
Thus
|𝑢0 |𝑒𝑘(𝑏−ℎ−𝑎) 𝑒𝑘ℎ 2 |𝑢 𝑒𝑘(𝑏−𝑎) | 2
|𝑒𝑖 | ≤ ℎ = 0 ℎ
2 2
That is,
|𝑢0 𝑒𝑘(𝑏−𝑎) |
|𝑒𝑖 | ≤ 𝐶ℎ2 where 𝐶 ∶=
2
… and using this to complete the bound on the global truncation error
Using this bound on the local errors 𝑒𝑖 in the above sum for the global error 𝐸𝑖 ,
𝑔𝑖 − 1 2
|𝐸𝑖 | ≤ 𝐶ℎ2 (1 + 𝑔 + ⋯ 𝑔𝑖−1 ) = 𝐶 ℎ
𝑔−1
Since 𝑔𝑖 = 𝑒𝑘ℎ𝑖 = 𝑒𝑘(𝑡𝑖 −𝑎) and the denominator 𝑔 − 1 = 𝑒𝑘ℎ − 1 > 𝑘ℎ, we get
|𝑢0 𝑒𝑘(𝑏−𝑎) |
• The first is the constant which is roughly half of the maximum value of the exact solution over the
2
interval [𝑎, 𝑏].
𝑒𝑘(𝑡𝑖 −𝑎) − 1
• The second depends on 𝑡, and
𝑘
• The third is ℎ, showing the overall order of accuracy: first order: the overall absolute error is 𝑂(ℎ).
A very similar result applies to the solution 𝑢(𝑡; 𝑎, 𝑢0 ) of the more general initial value problem
𝑑𝑢
= 𝑓(𝑡, 𝑢), 𝑢(𝑎) = 𝑢0
𝑑𝑡
so long as the function 𝑓 is “somewhat well-behaved” in that it satisfies a so-called Lipschitz Condition: that there is some
constant 𝐾 such that
𝜕𝐹
∣ (𝑡, 𝑢)∣ ≤ 𝐾
𝜕𝑢
for the relevant time values 𝑎 ≤ 𝑡 ≤ 𝑏.
(Aside: As you might have seen in a course on differential equations, such a Lipschitz condition is necessary to even
guarantee that the initial value problem has a unique solution, so it is a quite reasonable requirement.)
Then this constant 𝐾 plays the part of the exponential growth factor 𝑘 above:
first one shows that the local trunction error is bounded by
|𝑢0 𝑒𝐾(𝑏−𝑎) |
|𝑒𝑖 | ≤ 𝐶ℎ2 where now 𝐶 ∶= ;
2
then calculating as above bounds the global truncation error with
|𝑢0 𝑒𝐾(𝑏−𝑎) | 𝑒𝐾(𝑡𝑖 −𝑎) − 1
|𝐸𝑖 | ≤ ℎ, = 𝑂(ℎ)
2 𝑘
As with definite integrals, this is not very impressive, so in the next section on Runge-Kutta Methods we will explore several
widely used methods that improve to second order and then fourth order accuracy. Later, we will see how to get even
higher orders.
But first, we can illustrate how this exponential growth of errors looks in some examples, and coapr the the better behaved
errors in definite integrals.
This will be done by looking at the effect of a small change in the initial value, to simulate an error that arises there.
a = 0.
b = 2*np.pi
u_0 = 1. # Original value
n = 100
But now “perturb” the initial value in all cases by this much:
delta_u_0 = 0.1
figure(figsize=[12,8])
title("The solution before perturbing $u(0)$ was $u = \cos(x)$")
plot(t, u, "g", label="Original exact solution")
plot(t, U, ".:b", label="Euler's answer before perturbation")
plot(t, U_perturbed, ".:r", label="Euler's answer after perturbation")
legend()
grid(True);
k = 1.
a = 0.
b = 2.
u_0 = 1. # Original value
delta_u_0 = 0.1
n = 100
figure(figsize=[12,8])
title("The solution before perturbing $u(0)$ was $u = {u_0} \, \exp({k} \, t)$")
plot(t, u, "g", label="Original exact solution")
plot(t, U, ".:b", label="Euler's answer before perturbation")
plot(t, U_perturbed, ".:r", label="Euler's answer after perturbation")
legend()
grid(True);
figure(figsize=[12,8])
title("Error")
plot(t, u - U_perturbed, '.:')
U_perturbed
grid(True);
References:
• Sections 6.4 Runge-Kutta Methods and Applications in [Sauer, 2022].
• Section 5.4 Runge-Kutta Methods in [Burden et al., 2016].
• Sections 7.1 and 7.2 in [Chenney and Kincaid, 2012].
import numpy as np
# Shortcuts for some favorite commands:
from numpy import linspace
from matplotlib.pyplot import figure, plot, grid, title, xlabel, ylabel, legend
7.2.1 Introduction
The original Runge-Kutta method is the fourth order accurate one to be described below, which is still used a lot, though
with some modifications.
However, the name is now applied to a variety of methods based on a similar strategy, so first, here are a few simpler
methods, all of some value, at least for small, low precision calculations.
The simplest of all methods of this general form is Euler’s method. To set up the notation to be used below, rephrase it
this way:
To get from (𝑡, 𝑢) to an approximation of (𝑡 + ℎ, 𝑢(𝑡 + ℎ)), use the approximation
𝐾1 = ℎ𝑓(𝑡, 𝑢)
𝑢(𝑡 + ℎ) ≈ 𝑢 + 𝐾1
We have seen that the global error of Euler’s method is 𝑂(ℎ): it is only first order accurate. This is often insufficient, so
it is more common even for small, low precision calculation to use one of several second order methods:
The Explicit Trapezoid Method (a.k.a. the Improved Euler method or Huen’s method)
One could try to adapt the trapezoid method for integrating 𝑓(𝑡) to solve 𝑑𝑢/𝑑𝑡 = 𝑓(𝑡)
𝑡+ℎ
𝑓(𝑡) + 𝑓(𝑡 + ℎ)
𝑢(𝑡 + ℎ) = 𝑢(𝑡) + ∫ 𝑓(𝑠)𝑑𝑠 ≈ 𝑢(𝑡) + ℎ
𝑡 2
to solving the ODE 𝑑𝑢/𝑑𝑡 = 𝑓(𝑡, 𝑢) but there is a problem that needs to be overcome:
we get
On one hand, one can in fact use this formula, by solving the equation at each time step for the unknown 𝑈𝑖+1 ; for
example, one can use methods seen in earlier sections such as fixed point iteration or the secant method.
We will return to this in a later section; however, for now we get around this more simply by inserting an approximation
at right — the only one we know so far, given by Euler’s Method. That is:
• replace 𝑢(𝑡 + ℎ) at right by the tangent line approximation 𝑢(𝑡 + ℎ) ≈ 𝑢(𝑡) + ℎ𝑓(𝑡, 𝑢(𝑡)), giving
𝑓(𝑡, 𝑢(𝑡)) + 𝑓(𝑡 + ℎ, 𝑢(𝑡) + ℎ𝑓(𝑡, 𝑢(𝑡)))
𝑢(𝑡 + ℎ) ≈ 𝑢(𝑡) + ℎ
2
and for the formulas in terms of the 𝑈𝑖 , replace 𝑈𝑖+1 at right by 𝑈𝑖+1 ≈ 𝑈𝑖 + ℎ𝑓(𝑡𝑖 , 𝑈𝑖 ), giving
𝐾1 = ℎ𝑓(𝑡, 𝑢)
𝐾2 = ℎ𝑓(𝑡 + ℎ, 𝑢 + 𝐾1 )
1
𝑢(𝑡 + ℎ) ≈ 𝑢 + (𝐾1 + 𝐾2 )
2
For equal sized time steps, this leads to
𝑈0 = 𝑢0
1
𝑈𝑖+1 = 𝑈𝑖 + (𝐾1 + 𝐾2 ),
2
where
𝐾1 = ℎ𝑓(𝑡𝑖 , 𝑈𝑖 )
𝐾2 = ℎ𝑓(𝑡𝑖+1 , 𝑈𝑖 + 𝐾1 )
We will see that, despite the mere first order accuracy of the Euler approximation used in getting 𝐾2 , this method is
second order accurate; the key is the fact that any error in the approximation used for 𝑓(𝑡 + ℎ, 𝑢(𝑡 + ℎ)) gets multiplied
by ℎ.
Exercise 1
A) Verify that for the simple case where 𝑓(𝑡, 𝑢) = 𝑓(𝑡), this gives the same result as the composite trapezoid rule for
integration.
B) Do one step of this method for the canonical example 𝑑𝑢/𝑑𝑡 = 𝑘𝑢, 𝑢(𝑡0 ) = 𝑢0 . It will have the form 𝑈1 = 𝐺𝑈0
where the growth factor 𝐺 approximates the factor 𝑔 = 𝑒𝑘ℎ for the exact solution 𝑢(𝑡1 ) = 𝑔𝑢(𝑡0 ) of the ODE.
C) Compare to 𝐺 = 1 + 𝑘ℎ seen for Euler’s method.
D) Use the previous result to express 𝑈𝑖 in terms of 𝑈0 = 𝑢0 , as done for Euler’s method.
u = np.empty_like(t)
u[0] = u_0
for i in range(n):
K_1 = f(t[i], u[i])*h
K_2 = f(t[i]+h, u[i]+K_1)*h
u[i+1] = u[i] + (K_1 + K_2)/2.
return (t, u)
As always, this function can now also be imported from numericalMethods, with
Examples
For all methods in this section, we will solve for versions of Example 2 and 4 in the section Basic Concepts and Euler’s
Method.
𝑑𝑢
= 𝑓1 (𝑡, 𝑢) = 𝑘𝑢
𝑑𝑡
with solution
and
𝑑𝑢
= 𝑘(cos(𝑡) − 𝑢) − sin(𝑡)
𝑑𝑡
with solution
For comparison, the same examples are done below with Euler’s method.
a = 1.
b = 3.
u_0 = 2.
k = 1.5
n = 40
(continues on next page)
figure(figsize=[14,5])
title(f"Error")
plot(t, u - U, '.:')
grid(True);
a = 1.
b = a + 4 * np.pi # Two periods
u_0 = 2.
k = 2.
n = 80
figure(figsize=[14,5])
title(f"Error")
plot(t, U - u, '.:')
grid(True);
If we start with the Midpoint Rule for integration in place of the Trapezoid Rule, we similarly get an approximation
𝑢(𝑡 + ℎ) ≈ 𝑢(𝑡) + ℎ𝑓(𝑡 + ℎ/2, 𝑢(𝑡 + ℎ/2))
This has the slight extra complication that it involves three values of 𝑢 including 𝑢(𝑡 + ℎ/2) which we are not trying to
evaluate. We deal with that by making yet another approximation, using an average of 𝑢 values:
𝑢(𝑡) + 𝑢(𝑡 + ℎ)
𝑢(𝑡 + ℎ/2) ≈
2
leading to
𝑢(𝑡) + 𝑢(𝑡 + ℎ)
𝑢(𝑡 + ℎ) ≈ 𝑢(𝑡) + ℎ𝑓 (𝑡 + ℎ/2, )
2
and in terms of 𝑈𝑖 ≈ 𝑢(𝑡𝑖 ), the Implicit Midpoint Rule
𝑈𝑖 + 𝑈𝑖+1
𝑈𝑖+1 = 𝑈𝑖 + ℎ𝑓 (𝑡 + ℎ/2, )
2
We will see late that this is a particularly useful method in some situations, such as long-time solutions of ODEs that
describe the motion of physical systems with conservation of momentum, angular momentum and kinetic energy.
However, for now we again seek a more straightforward explicit method; using the same tangent line approximation
strategy as above gives the Explicit Midpoint Rule
𝐾1 = ℎ𝑓(𝑡, 𝑢)
𝐾2 = ℎ𝑓(𝑡 + ℎ/2, 𝑢 + 𝐾1 /2)
𝑢(𝑡 + ℎ) ≈ 𝑢 + 𝐾2
and thus for equal-sized time steps
𝑈0 = 𝑢0
𝑈𝑖+1 = 𝑈𝑖 + 𝐾2
where
𝐾1 = ℎ𝑓(𝑡𝑖 , 𝑈𝑖 )
𝐾2 = ℎ𝑓(𝑡𝑖 + ℎ/2, 𝑈𝑖 + 𝐾1 /2)
A) Verify that for the simple case where 𝑓(𝑡, 𝑢) = 𝑓(𝑡), this give the same result as the composite midpoint rule for
integration (same comment as above).
B) Do one step of this method for the canonical example 𝑑𝑢/𝑑𝑡 = 𝑘𝑢, 𝑢(𝑡0 ) = 𝑢0 . It will have the form 𝑈1 = 𝐺𝑈0
where the growth factor 𝐺 approximates the factor 𝑔 = 𝑒𝑘ℎ for the exact solution 𝑢(𝑡1 ) = 𝑔𝑢(𝑡0 ) of the ODE.
C) Compare to the growth factors 𝐺 seen for previous methods, and to the growth factor 𝑔 for the exact solution.
Exercise 3
A) Apply Richardson extrapolation to one step of Euler’s method, using the values given by step sizes ℎ and ℎ/2.
B) This should give a second order accurate method, so compare it to the above two methods.
u = np.empty_like(t)
u[0] = u_0
for i in range(n):
K_1 = f(t[i], u[i])*h
K_2 = f(t[i]+h/2, u[i]+K_1/2)*h
u[i+1] = u[i] + K_2
return (t, u)
Examples
a = 1.
b = 3.
u_0 = 2.
k = 1.5
n = 40
figure(figsize=[14,5])
title(f"Error")
(continues on next page)
Observation: The errors are very similar to those for the Explicit Trapezoid Method, not “half as much and of opposite
sign” as seen with integration; the exercises give a hint as to why this is so.
a = 1.
b = a + 4 * np.pi # Two periods
u_0 = 2.
k = 2.
n = 80
figure(figsize=[12,5])
title(f"Error")
plot(t, u - U, '.:')
grid(True);
Observation: This time, the errors are slightly better than for the Explicit Trapezoid Method but still not “half as much ”
as seen with integration; this is because this equation has a mix of integration (the “sin” and “cos” parts) and exponential
growth (the “Ku” part.)
𝐾1 = ℎ𝑓(𝑡, 𝑢)
𝐾2 = ℎ𝑓(𝑡 + ℎ/2, 𝑢 + 𝐾1 /2)
𝐾3 = ℎ𝑓(𝑡 + ℎ/2, 𝑢 + 𝐾2 /2)
𝐾4 = ℎ𝑓(𝑡 + ℎ, 𝑢 + 𝐾3 )
1
𝑢(𝑡 + ℎ) ≈ 𝑢 + (𝐾1 + 2𝐾2 + 2𝐾3 + 𝐾4 )
6
The derivation of this is far more complicated than those above, and is omitted. For now, we will instead assess its
accuracy “a postiori”, through the next exercise and some examples.
Exercise 4
A) Verify that for the simple case where 𝑓(𝑡, 𝑢) = 𝑓(𝑡), this gives the same result as the composite ximpson’s Rule for
integration.
B) Do one step of this method for the canonical example 𝑑𝑢/𝑑𝑡 = 𝑘𝑢, 𝑢(𝑡0 ) = 𝑢0 . It will have the form 𝑈1 = 𝐺𝑈0
where the growth factor 𝐺 approximates the factor 𝑔 = 𝑒𝑘ℎ for the exact solution 𝑢(𝑡1 ) = 𝑔𝑢(𝑡0 ) of the ODE.
C) Compare to the growth factors 𝐺 seen for previous methods, and to the growth factor 𝑔 for the exact solution.
u = np.empty_like(t)
u[0] = u_0
for i in range(n):
K_1 = f(t[i], u[i])*h
K_2 = f(t[i]+h/2, u[i]+K_1/2)*h
K_3 = f(t[i]+h/2, u[i]+K_2/2)*h
K_4 = f(t[i]+h, u[i]+K_3)*h
u[i+1] = u[i] + (K_1 + 2*K_2 + 2*K_3 + K_4)/6
return (t, u)
Examples
a = 1.
b = 3.
u_0 = 2.
k = 1.5
n = 20
figure(figsize=[14,5])
title(f"Solving du/dt = {k}u, u({a})={u_0} by the Runge-Kutta Method")
plot(t, u, "g", label="Exact solution")
plot(t, U, ".:b", label=f"Solution with h={(b-a)/n:0.4}")
legend()
grid(True)
figure(figsize=[14,5])
title(f"Error")
plot(t, u - U, ".:")
grid(True);
a = 1.
b = a + 4 * np.pi # Two periods
u_0 = 2.
k = 2.
n = 40
figure(figsize=[14,5])
title(f"Solving du/dt = {k}(cos(t) - u) - sin(t), u({a})={u_0} by the Runge-Kutta␣
↪Method")
figure(figsize=[14,5])
title(f"Error")
plot(t, u - U, ".:")
grid(True);
7.2.6 For comparison: the above examples done with Euler’s Method
a = 1.
b = 3.
u_0 = 2.
k = 1.5
n = 80
figure(figsize=[14,5])
title(f"Error")
plot(t, u - U, ".:")
grid(True);
a = 1.
b = a + 4 * np.pi # Two periods
u_0 = 2.
k = 2.
n = 160
figure(figsize=[12,5])
title(f"Solving du/dt = {k}(cos(t) - u) - sin(t), u({a})={u_0} by Euler's method")
plot(t, u, "g", label="Exact solution")
plot(t, U, ".:b", label=f"Euler's answer with h={(b-a)/n:0.4}")
legend()
grid(True)
figure(figsize=[14,5])
title(f"Error")
plot(t, u - U, ".:")
grid(True);
References:
• Subection 6.2.1 Local and global truncation error in [Sauer, 2022].
• Section 5.2 Euler’s Method in [Burden et al., 2016].
• Section 8.5 of [Kincaid and Chenney, 1990]
All the methods seen so far for solving ODE IVP’s are one-step methods: they fit the general form
𝑈𝑖+1 = 𝐹 (𝑡𝑖 , 𝑈𝑖 , ℎ)
𝐹 (𝑡, 𝑈 , ℎ) = 𝑈 + ℎ𝑓(𝑡, 𝑈 ),
and even the Runge-Kutta method has a similar form, but it is long and ugly.
For these, there is a general result that gives a bound on the globl truncation error (“GTE”) once one has a suitable bound
on the local truncation error (“LTE”). This is very useful, because bounds on the LTE are usually far easier to derive.
Theorem 6.3.1
When solving the ODE IVP
on interval 𝑡 ∈ [𝑎, 𝑏] by a one step method, one has a bound on the local truncation error
and the ODE itself satisfies the Lipschitz Condition that for some constant 𝐾,
𝜕𝐹
∣ (𝑡, 𝑢)∣ ≤ 𝐾
𝜕𝑢
then there is a bound on the global truncation error:
𝑒𝐾(𝑡𝑖 −𝑎) − 1 𝑝
|𝐸𝑖 | = |𝑈𝑖 − 𝑢(𝑡𝑖 ; 𝑎, 𝑢0 )| ≤ 𝐶 ℎ , = 𝑂(ℎ𝑝 )
𝑘
So yet again, there is a loss of one factor of ℎ in going from local to global error, as first seen with the composite rules for
definite integrals.
We saw a glimpse of this for Euler’s method, in the section Basic Concepts and Euler’s Method where the Taylor’s Theorem
error formula canbe used to get the LTE bound
|𝑢0 𝑒𝐾(𝑏−𝑎) |
|𝑒𝑖 | ≤ 𝐶ℎ2 where 𝐶 =
2
and this leads to to GTE bound
|𝑢0 𝑒𝐾(𝑏−𝑎) | 𝑒𝐾(𝑡𝑖 −𝑎) − 1
|𝐸𝑖 | ≤ ℎ.
2 𝑘
• For Euler’s method, it was stated in section Basic Concepts and Euler’s Method, (and verified for the test case of
𝑑𝑢/𝑑𝑡 = 𝑘𝑢) that the global truncation error is of first order n step-size ℎ:
• The Explicit (and Implicit) Trapezoid and Midpoint rules, the local truncation error is 𝑂(ℎ3 ) and so their global
truncation error is 𝑂(ℎ2 ) — they are second order accurate, just as for the corresponding approximate integration
rules.
• The classical Runge-Kutta method, has local truncation error 𝑂(ℎ5 ) and so its global truncation error is 𝑂(ℎ4 ) —
just as for the composite Simpson’s Rule, to which it corresponds for the “integration” case 𝑑𝑦/𝑑𝑡 = 𝑓(𝑡).
References:
• Section 6.3 Systems of Ordinary Differential Equations in [Sauer, 2022], to Sub-section 6.3.1 Higher order equations.
• Section 5.9 Higher Order Equations and Systems of Differential Equations in [Burden et al., 2016].
The short version of this section is that the numerical methods and algorithms developed so for for the initial value problem
𝑑𝑢
= 𝑓(𝑡, 𝑢(𝑡)), 𝑎≤𝑡≤𝑏
𝑑𝑡
𝑢(𝑎) = 𝑢0
all also work for system of first order ODEs by simply letting 𝑢 and 𝑓 be vector-valued, and for that, the Python code
requires only one small change.
Also, higher order ODE’s (and systems of them) can be converted into systems of first order ODEs.
To convert
𝑦″ = 𝑓(𝑡, 𝑦, 𝑦′ )
Next this can be put into vector form. Defining the vector-valued functions
𝑢(𝑡)
̃ = ⟨𝑢1 (𝑡), 𝑢2 (𝑡)⟩
̃ 𝑢(𝑡))
𝑓(𝑡, ̃ = ⟨𝑢1 (𝑡), 𝑓(𝑡, 𝑢2 (𝑡), 𝑢2 (𝑡))⟩
𝑑𝑢̃ ̃ 𝑢(𝑡)),
= 𝑓(𝑡, ̃ 𝑎≤𝑡≤𝑏
𝑑𝑡
𝑢(𝑎)
̃ = 𝑢̃0
𝑑𝑢̃ ̃ 𝑢(𝑡)),
= 𝑓(𝑡, ̃ 𝑎≤𝑡≤𝑏
𝑑𝑡
𝑢(𝑎)
̃ = 𝑢̃0
In this and subsequent sections, numerical methods for higher order equations and systems will be compared using several
test cases:
𝑑2 𝑦 𝑑𝑦
𝑀 = −𝐾𝑦 − 𝐷
𝑑𝑡2 𝑑𝑡
with initial conditions
(7.1)
𝑦(𝑎) = 𝑦0
𝑑𝑦
∣ = 𝑣0
𝑑𝑡 𝑡=𝑎
𝑑 𝑦 0 1 𝑦
[ ]=[ ][ ′ ]
𝑑𝑡 𝑦′ −𝐾 −𝐷 𝑦
Exact solutions
For testing of numerical methods in this and subsequent sections, here are the exact solutions.
They depend on whether
√
• 𝐷 < 𝐷0 ∶= 2 𝐾𝑀 : underdamped,
• 𝐷 > 𝐷0 : overdamped, or
• 𝐷 = 𝐷0 : critically damped.
The variable can be rescaled to the case 𝐾 = 𝑀 = 1, so that will be done from now on, but of course you can easily
experiment with other parameter values by editing copies of the Jupyter notebooks.
The equation
so for large 𝐾, it has two very disparate time scales, with only the slower scale of much significance after an initial
transient.
This is a convenient “toy” example for testing two refinements to algorithms:
• Variable time step sizes, so that they can be short during the initial transient and longer later, when only the 𝑒−𝑡
behavior is significant.
• Implicit methods that can effectively suppress the fast but extremely small 𝑒−𝑘𝑡 terms while hanling the larger,
slower terms accurately.
The examples below will use 𝐾 = 100, but as usual, you are free to experiment with other values.
Both the above equations are constant coefficient linear, which is convenient for the sake of having exact solution to
compare with, but one famous nonlinear example is worth exporing too.
A pendulum with mass 𝑚 concentrated at a distnace 𝐿 from the axis of rotation and that can rotate freely in a vertical
plane about that axis and with possible friction proportional to 𝐷, can be modeled in terms of its angular position 𝜃 and
angular velocity 𝜔 = 𝜃′ by
or in system form
𝑑 𝜃 𝜔
[ ]=[ 𝑔 𝐷 ]
𝑑𝑡 𝜔 − 𝐿 sin 𝜃 − 𝑀𝜔
These notes will mostly look at the frictionelss case 𝐷 = 0, which has conserved energy
𝑀𝐿 2
𝐸(𝜃, 𝜔) = 𝜔 − 𝑀 𝑔 cos 𝜃
2
For this, the solution fall into three qualitatively different cases depending on whether the energy is less than, equal to,
or greater than the “critical energy” 𝑀 𝑔, which is the energy of the unstable stationary solutions 𝜃(𝑡) = 𝜋( mod 2𝜋),
𝜔(𝑡) = 0: “balancing at the top”:
• For 𝐸 < 𝑀 𝑔, a solution can never reach the top, so the pendulum rocks back and forth, reach maximum height at
𝜃 = ± arccos(−𝐸/(𝑀 𝑔))
√
• For 𝐸 > 𝑀 𝑔, solutions have angular speed |𝜔| ≥ 𝐸 − 𝑀 𝑔 > 0 so it never drops to zero, and so the direction
of rotation can never reverse: solutions rotate in one direction for ever.
• For 𝐸 √= 𝑀 𝑔, one special type of solution is those up-side down stationary ones. Any other solution always has
|𝜔| = 𝐸 − 𝑀 𝑔 cos 𝜃 > 0, and so never stops or reverses direction but instead approaches the above stationary
point asymptotically both as 𝑡 → ∞ and 𝑡 → ∞. To visualize concretely, the solution starting at the bottom with
𝜃(0) = 0, 𝜔(0) = √2𝑔/𝐿 has 𝜃(𝑡) → ±𝜋 and 𝜔(𝑡) → 0 as 𝑡 → ±∞.
import numpy as np
# Shortcuts for some favorite mathemtcial functions and numbers:
from numpy import sqrt, sin, cos, pi
from matplotlib.pyplot import figure, plot, grid, title, xlabel, ylabel, legend, show
The Euler’s method code from before does not quite work, but only slight modification is needed; that “scalar” version
becomes
# Only the following three lines change for the system version
n_unknowns = len(u_0)
u = np.zeros([n+1, n_unknowns])
u[0] = np.array(u_0) # In case u_0 is a single number (the scalar case)
for i in range(n):
u[i+1] = u[i] + f(t[i], u[i])*h
return (t, u)
The above functions are available in module numericalMethods; they will be used in later sections.
M = 1.0
K = 1.0
D = 0.0
y_0 = 1.0
Dy_0 = 0.0
u_0 = [y_0, Dy_0]
a = 0.0
periods = 4
b = 2 * pi * periods
stepsperperiod = 1000
n = stepsperperiod * periods
figure(figsize=[14,7])
title(f"y and dy/dt with {K/M=}, {D=} by Euler's method with {stepsperperiod} steps␣
↪per period")
plot(t, Y, label="y")
plot(t, DY, label="dy/dt")
legend()
xlabel("t")
grid(True)
# Phase plane diagram; for D=0 the exact solutions are ellipses (circles if M = k)
title(f"The orbits of the mass-spring system, {K/M=}, {D=} by Euler's method with
↪{stepsperperiod} steps per period")
figure(figsize=[10,4])
E_0 = E_mass_spring(y_0, Dy_0)
E = E_mass_spring(Y, DY)
title("Energy variation")
plot(t, E - E_0)
xlabel("t")
grid(True)
Damped
figure(figsize=[14,7])
title(f"y and dy/dt with {K/M=}, {D=} by Euler's method with {stepsperperiod} steps␣
↪per period")
plot(t, Y, label="y")
plot(t, DY, label="dy/dt")
legend()
xlabel("t")
grid(True)
plot(Y, DY)
xlabel("y")
ylabel("dy/dt")
plot(Y[0], DY[0], "g*", label="start")
plot(Y[-1], DY[-1], "r*", label="end")
legend()
grid(True)
As above, the previous “scalar” function for this method needs just three lines of code modified.
Before:
"""
h = (b-a)/n
t = np.linspace(a, b, n+1) # Note: "n" counts steps, so there are n+1 values for␣
↪t.
u = np.empty_like(t)
u[0] = u_0
for i in range(n):
K_1 = f_mass_spring(t[i], u[i])*h
K_2 = f_mass_spring(t[i]+h/2, u[i]+K_1/2)*h
K_3 = f_mass_spring(t[i]+h/2, u[i]+K_2/2)*h
K_4 = f_mass_spring(t[i]+h, u[i]+K_3)*h
u[i+1] = u[i] + (K_1 + 2*K_2 + 2*K_3 + K_4)/6
return (t, u)
After:
"""
h = (b-a)/n
t = np.linspace(a, b, n+1) # Note: "n" counts steps, so there are n+1 values for␣
↪t.
# Only the following three lines change for the system version.
n_unknowns = len(u_0)
u = np.zeros([n+1, n_unknowns])
u[0] = np.array(u_0)
for i in range(n):
K_1 = f_mass_spring(t[i], u[i])*h
K_2 = f_mass_spring(t[i]+h/2, u[i]+K_1/2)*h
K_3 = f_mass_spring(t[i]+h/2, u[i]+K_2/2)*h
K_4 = f_mass_spring(t[i]+h, u[i]+K_3)*h
u[i+1] = u[i] + (K_1 + 2*K_2 + 2*K_3 + K_4)/6
return (t, u)
M = 1.0
k = 1.0
D = 0.0
y_0 = 1.0
Dy_0 = 0.0
u_0 = [y_0, Dy_0]
a = 0.0
periods = 4
b = 2 * pi * periods
(continues on next page)
stepsperperiod = 25
n = stepsperperiod * periods
figure(figsize=[14,7])
title(f"y and dy/dt with {k/M=}, {D=} by Runge-Kutta with {stepsperperiod} steps per␣
↪period")
7.4.5 Appendix: the Explicit Trapezoid and Midpoint Methods for systems
Yet again, the previous functions for these methods need just three lines of code modified.
The demos are just for the non-dissipative case, where the solution is known to be 𝑦 = cos 𝑡, 𝑑𝑡/𝑑𝑡 = − sin 𝑡.
For a fairer comparison of “accuracy vs computational effort” to the Runge-Kutta method, twice as many time steps are
used so that the same number of function evaluations are used for these three methods.
"""
h = (b-a)/n
t = np.linspace(a, b, n+1)
# Only the following three lines change for the systems version
n_unknowns = len(u_0)
u = np.zeros([n+1, n_unknowns])
u[0] = np.array(u_0)
(continues on next page)
for i in range(n):
K_1 = f_mass_spring(t[i], u[i])*h
K_2 = f_mass_spring(t[i]+h, u[i]+K_1)*h
u[i+1] = u[i] + (K_1 + K_2)/2.
return (t, u)
M = 1.0
k = 1.0
D = 0.0
y_0 = 1.0
Dy_0 = 0.0
u_0 = [y_0, Dy_0]
a = 0.0
periods = 4
b = 2 * pi * periods
stepsperperiod = 50
n = stepsperperiod * periods
figure(figsize=[14,7])
title(f"y and dy/dt with {k/M=}, {D=} by explicit trapezoid with {stepsperperiod}␣
↪steps per period")
At first glance this is foing well, keeping the orbits circular. However note the discrepancy between the start and end
points: these should be the same, as they are (visually) with the Runge-Kutta method.
"""
h = (b-a)/n
t = np.linspace(a, b, n+1)
# Only the following three lines change for the systems version.
n_unknowns = len(u_0)
u = np.zeros([n+1, n_unknowns])
u[0] = np.array(u_0)
for i in range(n):
K_1 = f_mass_spring(t[i], u[i])*h
K_2 = f_mass_spring(t[i]+h/2, u[i]+K_1/2)*h
u[i+1] = u[i] + K_2
return (t, u)
M = 1.0
k = 1.0
D = 0.0
y_0 = 1.0
Dy_0 = 0.0
u_0 = [y_0, Dy_0]
a = 0.0
periods = 4
b = 2 * pi * periods
stepsperperiod = 50
n = stepsperperiod * periods
figure(figsize=[14,7])
title(f"y and dy/dt with {k/M=}, {D=} by explicit midpoint with {stepsperperiod}␣
↪steps per period")
References:
• Section 6.5 Variable Step-Size Methods in [Sauer, 2022].
• Section 5.5 Error Control and the Runge-Kutta-Fehlberg Method in [Burden et al., 2016].
• Section 7.3 in [Chenney and Kincaid, 2012].
𝑑𝑢
= 𝑓(𝑡, 𝑢) 𝑎 ≤ 𝑡 ≤ 𝑏, 𝑢(𝑎) = 𝑢0
𝑑𝑡
We now allow the possibility that 𝑢 and 𝑓 are vector-valued as in the section on Systems of ODEs and Higher Order ODEs,
but omitting the tilde notation 𝑢,̃ 𝑓.̃
Algorithm 6.5.1
Input: 𝑓, 𝑎, 𝑏, 𝑛
𝑡0 = 𝑎
𝑈0 = 𝑢0
ℎ = (𝑏 − 𝑎)/𝑛
for i in [0, 𝑛):
Choose step size ℎ𝑖 somehow!
𝑡𝑖+1 = 𝑡𝑖 + ℎ𝑖
𝑈𝑖+1 = 𝑈𝑖 + ℎ𝑖 𝑓(𝑡𝑖 , 𝑈𝑖 )
end
We now consider how to choose each step size, by estimating the error in each step, and aiming to have error per unit
time below some limit like 𝜖/(𝑏 − 𝑎), so that the global error is no more than about 𝜖.
As usual, the theoretical error bounds like 𝑂(ℎ2𝑖 ) for a single step of Euler’s method are not enough for quantitative tasks
like choosing ℎ𝑖 , but they do motivate more practical estimates.
Starting at a point (t, u(t)), we can estimate the error in Euler’s method approximato at a slightly later time 𝑡𝑖 + ℎ by using
two approximations of 𝑈 (𝑡 + ℎ):
• The value given by a step of Euler’s method with step size ℎ: call this 𝑈 ℎ
ℎ/2
• The value given by taking two steps of Euler’s method each with step size ℎ/2: call this 𝑈2 , because it involves
2 steps of size ℎ/2.
ℎ/2
The first order accuracy of Euler’s method gives 𝑒ℎ = 𝑢(𝑡 + ℎ) − 𝑈 ℎ ≈ 2(𝑢(𝑡 + ℎ) − 𝑈2 ), so that
ℎ/2
𝑈2 − 𝑈ℎ
𝑒ℎ ≈
2
Exercise A
Write a formula for 𝑈ℎ and 𝑒ℎ if one starts from the point (𝑡𝑖 , 𝑈𝑖 ), so that (𝑡𝑖 + ℎ, 𝑈 ℎ ) is the proposed value for the next
point (𝑡𝑖+1 , 𝑈𝑖+1 ) in the approximate solution — but only if 𝑒ℎ is small enough!
Error tolerance
One simple criterion for accuracy is that the estimated error in this step be no more than some overall upper limit on the
error in each time step, 𝑇 . That is, accept the step size ℎ if
|𝑒ℎ | ≤ 𝑇
If this error tolerance is not met, we must choose a new step size ℎ′ , and we can predict roughly its error behavior using
the known order natue of the error in Euler’s method: scaling dowen to ℎ′ = 𝑠ℎ, the error in a single step scales with ℎ2
𝑒
(in general it scales with ℎ𝑝+1 for a method of order 𝑝), and so to reduce the error by the needed factor ℎ one needs
𝑇
approximately
𝑇
𝑠2 =
|𝑒ℎ |
and so using 𝑒ℎ ≈ 𝑒ℎ̃ = |𝑈 ℎ/2 − 𝑈 ℎ | suggests using
1/2
𝑇
𝑠=( )
|𝑈 ℎ/2 − 𝑈 ℎ |
However this new step size might have error that is still slightly too large, leading to a second failure. Another is that one
might get into an infinite loop of step size reduction.
So refinements of this choice must be considered.
If we simply follow the above aproach, the step size, once reduced, will never be increased. This could lead to great
inefficiency, through using an unecessarily small step size just because at an earlier part of the time domain, accuracy
required very small steps.
Thus, after a successful time step, one might consider increasing ℎ for the next step. This could be done using exactly the
above formula, but again there are risks, so again refinement of this choice must be considered.
One problem is that if the step size gets too large, the error estimate can become unreliable; another is that one might
need some minimum “temporal resolution”, for nice graphs and such.
Both suggest imposing an upper limit on the step size ℎ.
7.5.4 Another strategy for getting error estimates: two (related) Runge-Kutta meth-
ods
The recurring strategy of estimating errors by the difference of two different approximations — one expected to be far
better than the other — can be used in a nice way here. I will first illustrate with the simplest version, using Euler’s Mathod
and the Explicit Trapezoid Method.
Recall that the increment in Euler’s Method from time 𝑡 to time 𝑡 + ℎ is
𝐾1 = ℎ𝑓(𝑡, 𝑈 )
𝐾1 = ℎ𝑓(𝑡, 𝑈 )
𝐾2 = ℎ𝑓(𝑡 + ℎ, 𝑈 + 𝐾1 )
Thus we can use the difference, |𝐾1 − (𝐾1 + 𝐾2 )/2| = |(𝐾1 − 𝐾2 )/2| as an error estimate. In fact to be cautious, one
often drops the factor of 1/2, so using approximation 𝑒ℎ̃ = |𝐾1 − 𝐾2 |.
One has to be careful: this estimates the error in Euler’s Method, and one has to use it that way: using the less accurate
value 𝐾1 as the update.
A basic algorithm for the time step starting with 𝑡𝑖 , 𝑈𝑖 is
Algorithm 6.5.2
𝐾1 ← ℎ𝑓(𝑡𝑖 , 𝑈𝑖 )
𝐾2 ← ℎ𝑓(𝑡𝑖 + ℎ, 𝑈𝑖 + 𝐾1 )
𝑒ℎ ← |𝐾1 − 𝐾2 |
𝑠 ← √𝑇 /𝑒ℎ
if 𝑒ℎ < 𝑇
𝑈𝑖+1 = 𝑈𝑖 + 𝐾1
𝑡𝑖+1 = 𝑡𝑖 + ℎ
Increase ℎ for the next time step:
ℎ ← 𝑠ℎ
else: (not good enough: reduce ℎ and try again)
ℎ ← 𝑠ℎ
Algorithm 6.5.3
𝐾1 = ℎ𝑓(𝑡𝑖 , 𝑈𝑖 )
𝐾2 = ℎ𝑓(𝑡𝑖 + ℎ, 𝑈𝑖 + 𝐾1 )
𝑒ℎ = |𝐾1 − 𝐾2 |
𝑠 = 0.9√𝑇 /𝑒ℎ
if 𝑒ℎ < 𝑇
𝑈𝑖+1 = 𝑈𝑖 + 𝐾1
𝑡𝑖+1 = 𝑡𝑖 + ℎ
Increase ℎ for the next time step:
ℎ ← min(0.9𝑠ℎ, ℎ𝑚𝑎𝑥 )
else: (not good enough; reduce ℎ and try again)
ℎ ← max(0.9𝑠ℎ, ℎ𝑚𝑖𝑛 )
Start again from 𝐾1 = …
end
Exercise B
𝑑𝑢/𝑑𝑡 = 𝐾𝑢
and
𝑑𝑢/𝑑𝑡 = 𝐾(cos(𝑡) − 𝑢) − sin(𝑡)
(𝐾 = 1 is enough.)
import numpy as np
from matplotlib.pyplot import figure, plot, title, grid
# Use Python lists rather than Numpy arrays; they are easier to "increment"
# Initialize variables holding the current values of t and U
steps = 0
t_i = a
U_i = u_0
t = [t_i]
U = [U_i]
h = h_max # Start optimistically!
while t_i < b and steps < steps_max:
K_1 = h*f(t_i, U_i)
K_2 = h*f(t_i + h/2, U_i + K_1/2)
errorEstimate = abs(K_1 - K_2)
s = 0.9 * np.sqrt(errorTolerance/errorEstimate)
if errorEstimate <= errorTolerance: # Success!
t_i += h
U_i += K_1
t.append(t_i)
U.append(U_i)
# Adjust step size up, but not too big
h = min(s*h, h_max)
else: #Innacurate; reduce step size and try again
h = max(s*h, h_min)
if demoMode: print(f"{t_i=}: Decreasing step size to {h:0.3e} and trying␣
↪again.")
# A refinement not mentioned above; the next step should not overshoot t=b:
if t_i + h > b:
h = b - t_i
steps += 1
# Convert out to Numpy arrays, so that they can be used as input to numerical␣
↪functions and such:
t = np.array(t)
U = np.array(U)
return (t, U)
# Note: if the step count ran out, this does not reach t=b, but at least it is␣
↪correct as far as it goes
k = 1.
a = 1.
b = 3.
(continues on next page)
errorTolerance = 1e-2
time_start = time()
(t, U) = euler_error_control(f, a, b, u_0, errorTolerance, demoMode=True)
time_end = time()
time_elapsed = time_end - time_start
steps = len(U) - 1
h_ave = (b-a)/steps
U_exact = u(t)
U_error = U-U_exact
U_max = max(abs(U_error))
print()
print(f"With {errorTolerance=}, this took {steps} time steps, of average length {h_
↪ave:0.3}")
figure(figsize=[14,5])
title(f"Solution to du/dt={k}u, u({a})={u_0}")
plot(t, U, ".:")
grid(True)
figure(figsize=[14,5])
title(f"Error in the above")
plot(t, U_error, ".:")
grid(True);
errorTolerance = 1e-3
time_start = time()
(t, U) = euler_error_control(f, a, b, u_0, errorTolerance, demoMode=True)
time_end = time()
time_elapsed = time_end - time_start
steps = len(U) - 1
h_ave = (b-a)/steps
U_exact = u(t)
U_error = U-U_exact
U_max = max(abs(U_error))
print()
print(f"With {errorTolerance=}, this took {steps} time steps, of average length {h_
↪ave:0.3}")
figure(figsize=[14,5])
title(f"Solution to du/dt={k}u, u({a})={u_0}")
plot(t, U, ".:")
grid(True)
figure(figsize=[14,5])
(continues on next page)
With errorTolerance=0.001, this took 119 time steps, of average length 0.0168
The maximum absolute error is 0.265
The maximum absolute error per time step is 0.00223
The time taken to solve was 0.000433 seconds
errorTolerance = 1e-4
time_start = time()
(t, U) = euler_error_control(f, a, b, u_0, errorTolerance, demoMode=True)
time_end = time()
time_elapsed = time_end - time_start
steps = len(U) - 1
h_ave = (b-a)/steps
U_exact = u(t)
U_error = U-U_exact
(continues on next page)
figure(figsize=[14,5])
title(f"Solution to du/dt={k}u, u({a})={u_0}")
plot(t, U, ".:")
grid(True)
figure(figsize=[14,5])
title(f"Error in the above")
plot(t, U_error, ".:")
grid(True);
With errorTolerance=0.0001, this took 380 time steps, of average length 0.00526
The maximum absolute error is 0.084
The maximum absolute error per time step is 0.000221
The time taken to solve was 0.00127 seconds
In practice, one usually needs at least second order accuracy, and one approach to that is using computing a “candidates”
for the next time step with a second order accurate Runge-Kutta method and also a third order accurate one, the latter
used only to get an error estimate for the former.
Perhaps the simplest of these is based on adding error estimation to the Explicit Trapezoid Rule. Omitting the step size
adjustment for now, the main ingredients are:
Algorithm 6.5.4
𝐾1 = ℎ𝑓(𝑡, 𝑈 )
𝐾2 = ℎ𝑓(𝑡 + ℎ, 𝑈 + 𝐾1 )
(So far, as for the explicit trapezoid method)
𝐾3 = ℎ𝑓(𝑡 + ℎ/2, 𝑈 + (𝐾1 + 𝐾2 )/4)
(a midpoint approximation, using the above)
𝛿2 = (𝐾1 + 𝐾2 )/2
(The order 2 increment as for the explicit trapezoid method)
𝛿3 = (𝐾1 + 4𝐾3 + 𝐾2 )/6
(An order 3 increment — note the resemblance to Simpson’s Rule for integration. This is only used to get the final error
estimate below)
𝑒ℎ = |𝛿2 − 𝛿3 |, = |𝐾1 − 2𝐾3 + 𝐾2 |/3
Again, if this step is accepted, one uses the explicit trapezoid rule step: 𝑈𝑖+1 = 𝑈𝑖 + 𝛿2 .
The scale factor 𝑠 for step size adjustment must be modified for a method order 𝑝 (with 𝑝 = 2 now):
• Changing step size by a factor 𝑠 will change the error 𝑒ℎ in a single time step by a factor of about 𝑠𝑝+1 .
• Thus, we want a new step with this rescaled error of about 𝑠𝑝+1 𝑒ℎ roughly matching the tolerance 𝑇 . Equating
would give 𝑠𝑝+1 𝑒ℎ = 𝑇 , so 𝑠 = (𝑇 /𝑒ℎ ))1/(𝑝+1) , but as noted above, since we are using only an approximation 𝑒ℎ̃
of 𝑒ℎ it is typical to include a “safety factor” of about 0.9, so something like
1/(𝑝+1)
𝑇
𝑠 = 0.9 ( )
|𝑒ℎ̃ |
Thus for this second order accurate method, we then get
1/3
3𝑇
𝑠 = 0.9 ( )
|𝐾1 − 2𝐾3 + 𝐾2 |
Exercise C
Implement The explicit trapezoid method with error control, and test on the two familiar examples
𝑑𝑢/𝑑𝑡 = 𝐾𝑢
and
𝑑𝑢/𝑑𝑡 = 𝐾(cos(𝑡) − 𝑢) − sin(𝑡)
(𝐾 = 1 is enough.)
7.5.6 Fourth order accurate methods with error control: Runge-Kutta-Felberg and
some newer refinements
The details involve some messy coefficients; see the references above for those.
The basic idea is to devise a fifth order accurate Runge-Kutta method such that we can also get a fourth order accurate
method from the same colection of stages 𝐾𝑖 values. One catch is that any such fifth order method requires six stages
(not five as you might have guessed).
The first such method, still widely used, is the Runge-Kutta-Felberg Mathod published by Erwin Fehlberg in 1970:
𝐾6 = 𝑓(𝑡 + 21 ℎ, 𝑈 − 8
27 𝐾1 + 2𝐾2 − 3544
513 𝐾3 + 1859
4104 𝐾4 − 11
40 𝐾5 )
25 1408 2197
𝛿4 = 216 𝐾1 + 2565 𝐾3 + 4104 𝐾4 − 15 𝐾5
(The order 4 increment that will actually be used)
16 6656 28561 9 2
𝛿5 = 135 𝐾1 + 12825 𝐾3 + 56430 𝐾4 − 50 𝐾5 + 55 𝐾6
(The order 5 increment, used only to get the following error estimate)
1 128 2197 1 2
𝑒ℎ̃ = 360 𝐾1 − 4275 𝐾3 + 75240 𝐾4 + 50 𝐾5 + 55 𝐾6
This method is typically used with the relative error control mentioned above, and since the order is 𝑝 = 4, the recom-
mended step-size rescaling factor is
1/5 1/5
𝑇 𝑈𝑖 𝑇 𝑈𝑖
𝑠 = 0.9 ∣ ∣ , = 0.9 ∣ 1 128 2197 1 2
∣ ,
𝑒ℎ̃ 360 𝐾1 − 4275 𝐾3 + 75240 𝐾4 + 50 𝐾5 + 55 𝐾6
Newer software often uses a variant called the Dormand–Prince method published in 1980; for example this is the default
method in the module scipy.integrate within Python package SciPy. This is similar in form to “R-K-F”, but has
somewhat smaller errors.
The basic usage is
because the output is an “object” containing many items; the ones we need for now are t and y, extracted with
t = ode_solution.t
y = ode_solution.y
where “RK45” refers to the Dormand–Prince method. Other options include method="RK23", which is second order
accurate, and very similar to the above The explicit trapezoid method with error control
Notes:
• SciPy’s notation is 𝑑𝑦/𝑑𝑡 = 𝑓(𝑡, 𝑦), so the result is called y, not u
• The initial data y_0 must be in a list or numpy array, even if it is a single number.
• The output y is a 2D array, even if it is a single equation rather than a system.
• This might output very few values; for more output times (for better graphs?), try something like
t_plot = np.linspace(a, b)
[t, y] = solve_ivp(f, [a, b], y_0, t_eval=t_plot)
Example
import numpy as np
import matplotlib.pyplot as plt
# Get an ODE IVP solve function, from module "integrate" within package "scipy"
from scipy.integrate import solve_ivp
To read more about this SciPy function scipy.integrate.solve_ivp, run the folowing help command in the notebook version
of this section:
help(solve_ivp)
a = 0.0
b = 2.0
y_0 = [1.0]
t_plot = np.linspace(a, b)
time_start = time()
ode_solution = solve_ivp(f, [a, b], y_0, t_eval=t_plot)
time_end = time()
time_elapsed = time_end - time_start
print(f"Time take to solve: {time_elapsed:0.3} seconds")
# The output is an "object" containing many items; the ones we need for now are t and␣
↪y.
figure(figsize=[14,5])
plt.title("Computed Solution")
plt.plot(t, y, ".:")
grid(True)
y_exact = np.exp(t)
errors = y - y_exact
figure(figsize=[14,5])
plt.title("Errors")
plt.plot(t, errors, ".:")
grid(True);
t_plot = np.linspace(a, b)
time_start = time()
ode_solution = solve_ivp(f, [a, b], y_0, t_eval=t_plot, rtol=1e-12, atol=1e-12)
time_end = time()
time_elapsed = time_end - time_start
print(f"Time take to solve: {time_elapsed:0.3} seconds")
t = ode_solution.t
y = ode_solution.y[0]
figure(figsize=[14,5])
plt.title("Computed Solution")
plt.plot(t, y, ".:")
grid(True)
y_exact = np.exp(t)
errors = y - y_exact
figure(figsize=[14,5])
(continues on next page)
References:
• Section 6.7 Multistep Methods of [Sauer, 2022].
• Section 5.6 Multistep Methods of [Burden et al., 2016].
7.6.1 Introduction
When approximating derivatives we saw that there is a distinct advantage in accuracy to using the centered difference
approximation
𝑑𝑓 𝑓(𝑡 + ℎ) − 𝑓(𝑡 − ℎ)
(𝑡) ≈ 𝛿ℎ 𝑓(𝑡) ∶=
𝑑𝑡 2ℎ
(with error 𝑂(ℎ2 )) over the forward difference approximation
𝑑𝑓 𝑓(𝑡 + ℎ) − 𝑓(𝑡)
(𝑡) ≈ Δℎ 𝑓(𝑡) ∶=
𝑑𝑡 ℎ
(which has error 𝑂(ℎ)).
However Euler’s method used the latter, and all ODE methods seen so far avoid using values at “previous” times like
𝑡 − ℎ. There is a reason for this, as using data from previous times introduces some complications, but sometimes those
are worth overcoming, so let us look into this.
In this section, we look at one simple multi-step method, based on the above centered-differnce derivative approximation.
Future sections will look at higher order methods such as the Adams-Bashforth and Adams-Moulton methods.
Inserting the above centered difference approximation of the derivative into the ODE 𝑑𝑢/𝑑𝑡 = 𝑓(𝑡, 𝑢) gives
𝑢(𝑡 + ℎ) − 𝑢(𝑡 − ℎ)
≈ 𝑓(𝑡, 𝑢(𝑡))
ℎ
which leads to the leap-frog method
𝑈𝑖+1 − 𝑈𝑖−1
= 𝑓(𝑡𝑖 , 𝑈𝑖 )
2ℎ
or
so that the new approximate value of 𝑢(𝑡) depends on approximate values at multiple previous times.
More specifically, this is called an 𝑠-step method.
This jargon is consistent with all methods seen in earlier sections being called one-step methods. For example, Euler’s
method can be written as
The leap-frog method already illustrates two of the complications that arise with multistep methods:
• The initial data 𝑢(𝑎) = 𝑢0 gives 𝑈0 , but then the above formula gives 𝑈1 in terms of 𝑈0 and the non-existent value
𝑈−1 ; a different method is needed to get 𝑈1 . More generally, with an 𝑠-step methods, one needs to compute the
first 𝑠 − 1 steps, up to 𝑈𝑠−1 , by some other method.
• Leap-frog needs a constant step size ℎ; the strategy of error estimation and error control using variable step size is
still possible with some multistep methods, but that is distinctly more complicated than we have seen with one-step
methods, and is not addressed in these notes.
Fortunately, many differential equations can be handled well by choosing a suitable fixed step size ℎ. Thus, in these notes
we work only with equal step sizes, so that our times are 𝑡𝑖 = 𝑎 + 𝑖ℎ and we aim for approximations 𝑈𝑖 ≈ 𝑢(𝑎 + 𝑖ℎ).
Using the fact that the centered difference approximation is second order accurate, one can verify that
𝑢(𝑡𝑖+1 ) − 𝑢(𝑡𝑖−1 )
− 𝑓(𝑡, 𝑢(𝑡𝑖 )) = 𝑂(ℎ2 )
2ℎ
(Alternatively one can get this by inserting quadratic Taylor polynomials centered at 𝑡𝑖 , and their error terms.)
The definition of local trunctation error needs to be extended slightly: it is the error 𝑈𝑖+1 − 𝑢(𝑡𝑖+1 ) when one starts with
exact values for all previous steps; that is, assuming 𝑈𝑗 = 𝑢(𝑡𝑗 ) for all 𝑗 ≤ 𝑖.
The above results then shows that the local truncation error in each step is 𝑈𝑖+1 − 𝑢(𝑡𝑖+1 ) = 𝑂(ℎ3 ), so that the “local
truncation error per unit time” is
𝑈𝑖+1 −𝑢(𝑡𝑖+1 )
$ ℎ = 𝑂(ℎ2 )$.
says that when a one-step methods has local truncation error per unit time of 𝑂(ℎ𝑝 ) it also has global truncation error of
the same order. The situation is a bit more complicated with multi-step methods, but loosely:
if the errors in a multistep method has local truncation 𝑂(ℎ𝑝 ) and it converges (i.e. the global error goes to zero at
ℎ → 0) then it does so at the expected rate of 𝑂(ℎ𝑝 ).
But multi-step methods can fail to converge, even if the local truncation error is of high order! This is dealt with via the
concept of stability; not considered here, but addressed in both references above, and a topic for future expansion of
these notes.
In particular, when the leap-frog method converges it is second order accurate, just like the centered difference approxi-
mation of 𝑑𝑢/𝑑𝑡 that it is built upon.
7.6.4 The speed advantage of multi-step methods like the leapfrog method
This second order accuracy illustrates a major potential advantage of multi-step methods: whereas any one-step Runge-
Kutta method that is second order accurate (such as the explicit trapezoid or explicit midpoint methods) require at least
two evaluations of 𝑓(𝑡, 𝑢) for each time step, the leapfrog methods requires only one.
More generally, for every 𝑠, there are 𝑠-step methods with errors 𝑂(ℎ𝑠 ) that require only one evaluation of 𝑓(𝑡, 𝑢) per
time step — for example, the Adams-Bashforth methods, as seen at
• Section Adams-Bashforth Multistep Methods
• https://en.wikipedia.org/wiki/Linear_multistep_method#Adams-Bashforth_methods
• https://en.m.wikiversity.org/wiki/Adams-Bashforth_and_Adams-Moulton_methods
• [Sauer, 2022] Section 6.7.1 and 6.7.2
• [Burden et al., 2016] Section 5.6
In comparison, any explicit one-step method order 𝑝 require at least 𝑝 evaluations of 𝑓(𝑡, 𝑢) per time step.
(See the Implicit Methods: Adams-Moulton for the distinction between explicit and implicit methods.)
import numpy as np
# Shortcuts for some favorite mathemtcial functions and numbers:
from numpy import sqrt, sin, cos, pi, exp
# Shortcuts for some favorite graphics commands:
from matplotlib.pyplot import figure, plot, grid, title, xlabel, ylabel, legend, show
import numericalMethods as nm
As seen in the section Systems of ODEs and Higher Order ODEs the Damped Mass-Spring Equation (6.4.1) is
𝑑2 𝑦 𝑑𝑦
𝑀 2
= −𝐾𝑦 − 𝐷
𝑑𝑡 𝑑𝑡
with initial conditions
𝑦(𝑎) = 𝑦0
𝑑𝑦
∣ = 𝑣0
𝑑𝑡 𝑡=𝑎
with first-order system form
𝑑𝑢0
= 𝑢1
𝑑𝑡
𝑑𝑢1 𝐾 𝐷
= − 𝑢0 − 𝑢
𝑑𝑡 𝑀 𝑀 1
with initial conditions
𝑢0 (𝑎) = 𝑦0
𝑢1 (𝑎) = 𝑣0
and the solutions seen in that section are given by function y_mass_spring
M = 1.
K = 1.
D = 0.
U_0 = [1., 0.]
a = 0.
periods = 4
b = 2 * pi * periods
# Note: In the notes on systems, the second order methods were tested with 50 is per␣
↪period
stepsperperiod = 100 # Equal cost per unit time as for the explicit trapezoid and␣
↪midpoint and Runge-Kutta methods
n = stepsperperiod * periods
h = (b-a)/n
(t_1step, U_1step) = nm.rungekutta_system(f_mass_spring, a, a+h, U_0, 1)
U_1 = U_1step[-1]
(t, U) = leapfrog(f_mass_spring, a, b, U_0, U_1, n)
Y = U[:,0]
DY = U[:,1]
y = y_mass_spring(t, t_0=a, u_0=U_0, K=K, M=M, D=D) # Exact solution
figure(figsize=[14,7])
title(f"y and dy/dt with {K/M=}, {D=} by leap-frog with {periods} periods,
↪{stepsperperiod} steps per period")
figure(figsize=[10,4])
title("Error in Y")
plot(t, y-Y)
xlabel("t")
grid(True)
title(f"The orbits")
plot(Y, DY)
xlabel("y")
ylabel("dy/dt")
plot(y[0], DY[0], "g*", label="start")
plot(y[-1], DY[-1], "r*", label="end")
legend()
grid(True)
D = 0.
periods = 32
b = 2 * pi * periods
# Note: In the notes on systems, the second order methods were tested with 50 steps␣
↪per period
stepsperperiod = 100 # Equal cost per unit time as for the explicit trapezoid and␣
↪midpoint and Runge-Kutta methods
n = stepsperperiod * periods
y = U[:,0]
Dy = U[:,1]
figure(figsize=[14,7])
title(f"y with {K/M=}, {D=} by leap-frog with {periods} periods, {stepsperperiod}␣
↪steps per period")
plot(t, y, label="y")
xlabel("t")
grid(True)
title(f"The orbits of the mass-spring system, {K/M=}, {D=} by leap-frog with {periods}
↪ periods, {stepsperperiod} steps per period")
plot(y, Dy)
xlabel("y")
ylabel("dy/dt")
plot(y[0], Dy[0], "g*", label="start")
plot(y[-1], Dy[-1], "r*", label="end")
legend()
grid(True)
This is an example in instability: reducing the step-size only postpones the problem, but does not avoid it.
In future sections it will be seen that the leap-frog method is stable (and a good choice) for “conservative” systems like
the undamped mass-spring system, but unstable otherwise, such as for the damped case.
D = 0.1
periods = 32
b = 2 * pi * periods
# Note: In the notes on systems, the second order methods were tested with 50 steps␣
↪per period
stepsperperiod = 100 # Equal cost per unit time as for the explicit trapezoid and␣
↪midpoint and Runge-Kutta methods
n = stepsperperiod * periods
h = (b-a)/n
(continues on next page)
Y = U[:,0]
DY = U[:,1]
y = y_mass_spring(t, t_0=a, u_0=U_0, K=K, M=M, D=D) # Exact solution
figure(figsize=[14,7])
title(f"y with {K/M=}, {D=} by leap-frog with {periods} periods, {stepsperperiod}␣
↪steps per period")
plot(t, y, label="y")
xlabel("t")
grid(True)
figure(figsize=[10,4])
title("Error in Y")
plot(t, y-Y)
xlabel("t")
grid(True)
References:
• Section 6.7 Multistep Methods in [Sauer, 2022].
• Section 5.6 Multistep Methods in [Burden et al., 2016].
7.7.1 Introduction
so that the new approximate value of 𝑢(𝑡) depends on approximate values at multiple previous times. (The shift of
indexing to describe “present” in terms of “past” wil be convenient here.)
This is called an 𝑠-step method: the Runge-Kutta family of methods are all one-step.
We will be more specifically interted in what are called linear multistep methods, where the function at right is a linear
combination of value of 𝑢(𝑡) and 𝑓(𝑡, 𝑢(𝑡)).
So for now we look at
The Adams-Bashforth methods are a case of this with the only 𝑎𝑖 term being 𝑎𝑠−1 = 1:
As wil be verified later, the 𝑠-step version of this is accurate to order 𝑠, so one can get arbitrarily high order of accuracy
by using enough steps.
Aside. The case 𝑠 = 1 is Euler’s method, now written as
The Adams-Bashforth methods are probably the most comomnly used explicit, one-stage multi-step methods; we will see
more about the alternatives of implicit and multi-stage options in future sections. (Note that all Runge-Kutta methods
(except Euler’s) are multi-stage: the explicit trapezoid and midpoint methods are 2-stage; the classical Runge-Kutta
method is 4-stage.)
The most basic Adams-Bashforth multi-step method is the 2-step method, which can be thought of this way:
1. Start with the two most recent values, 𝑈𝑖−1 ≈ 𝑢(𝑡𝑖)−ℎ and 𝑈𝑖−2 ≈ 𝑢(𝑡𝑖 − 2ℎ)
2. Use the derivative approximations 𝐹𝑖−1 ∶= 𝑓(𝑡𝑖−1 , 𝑈𝑖−1 ) ≈ 𝑢′ (𝑡𝑖−1 ) and 𝐹𝑖−2 ∶= 𝑓(𝑡𝑖−2 , 𝑈𝑖−2 ) ≈ 𝑢′ (𝑡𝑖−2 ) and
linear extrapolation to “predict” the value of 𝑢′ (𝑡𝑖 − ℎ/2); one gets: 𝑢′ (𝑡𝑖 − ℎ/2) ≈ 32 𝑢′ (𝑡𝑖 − ℎ) − 12 𝑢′ (𝑡𝑖 − 2ℎ) ≈
3 1
2 𝐹𝑖−1 − 2 𝐹𝑖−2
#import numericalMethods as nm
M = 1.0
K = 1.0
D = 0.0
y_0 = 1.0
v_0 = 0.0
U_0 = [y_0, v_0]
a = 0.0
periods = 4
b = 2*pi * periods
# Using the same time step size as with the leapfrog method in the previous section.
stepsperperiod = 100
n = int(stepsperperiod * periods)
h = (b-a)/n
(t_1step, U_1step) = rungekutta_system(f_mass_spring, a, a+h, U_0, 1)
U_1 = U_1step[-1,:]
(t, U) = adamsbashforth2(f_mass_spring, a, b, U_0, U_1, n)
Y = U[:,0]
DY = U[:,1]
y = y_mass_spring(t, t_0=a, u_0=U_0, K=K, M=M, D=D) # Exact solution
figure(figsize=[10,4])
title(f"{K/M=}, {D=} by 2-step Adams-Bashforth with $periods periods, {stepsperperiod}
↪ steps per period")
(continues on next page)
figure(figsize=[10,4])
title("Error in Y")
plot(t, y-Y)
xlabel("t")
grid(True)
title("The orbit")
plot(Y, DY)
xlabel("y")
plot(Y[1], DY[1], "g*", label="start")
plot(Y[-1], DY[-1], "r*", label="end")
legend()
grid(True)
D = 0.0
(continues on next page)
periods = 16
b = 2*pi * periods
# Using the same time step size as with the leapfrog method in the previous section.
stepsperperiod = 100
n = int(stepsperperiod * periods)
h = (b-a)/n
(t_1step, U_1step) = rungekutta_system(f_mass_spring, a, a+h, U_0, 1)
U_1 = U_1step[-1,:]
(t, U) = adamsbashforth2(f_mass_spring, a, b, U_0, U_1, n)
Y = U[:,0]
DY = U[:,1]
y = y_mass_spring(t, t_0=a, u_0=U_0, K=K, M=M, D=D) # Exact solution
figure(figsize=[10,4])
title("K/M=$(K/M), D=$D by 2-step Adams-Bashforth with $periods periods,
↪$stepsperperiod steps per period")
figure(figsize=[10,4])
title("Error in Y")
plot(t, y-Y)
xlabel("t")
grid(True)
title("The orbits")
plot(Y, DY)
xlabel("y")
xlabel("dy/dt")
plot(Y[1], DY[1], "g*", label="start")
plot(Y[-1], DY[-1], "r*", label="end")
legend()
grid(True)
In comparison to the (also second order accurate) leap-frog method, this is distinctly worse; the errors are more than
twice as large, and the solution fails to stay on the circle; unlike leapfrog, the energy 𝐸(𝑡) = 21 (𝑦2 (𝑡) + 𝐷𝑦2 (𝑡)) is not
conserved.
On the other hand …
This is an example in stability; in future sections it will be seen that the the Adams-Bashforth methods are all stable for
these equations for small enough step size ℎ, and so converge to the correct solution as ℎ → 0.
Looking back, this suggests (correctly) that while the leapfrog method is well-suited to conservative equations, Adams-
Bashforth methods are much preferable for more general equations.
D = 0.5
periods = 4
b = 2*pi * periods
# Using the same time step size as with the leapfrog method in the previous section.
(continues on next page)
h = (b-a)/n
(t_1step, U_1step) = rungekutta_system(f_mass_spring, a, a+h, U_0, 1)
U_1 = U_1step[-1,:]
(t, U) = adamsbashforth2(f_mass_spring, a, b, U_0, U_1, n)
Y = U[:,0]
DY = U[:,1]
y = y_mass_spring(t, t_0=a, u_0=U_0, K=K, M=M, D=D) # Exact solution
figure(figsize=[10,4])
title(f"{K/M=}, {D=} by 2-step Adams-Bashforth with {periods} periods,
↪{stepsperperiod} steps per period")
figure(figsize=[10,4])
title("Error in Y")
plot(t, y-Y)
xlabel("t")
grid(True)
The strategy described above of polynomial approximation, extrapolation, and integration can be generalized to get the 𝑠
step Adams-Bashforth method, of order 𝑠; to get the approximation 𝑈𝑖 of 𝑢(𝑡𝑖 ) from data at the 𝑠 most recent previous
times 𝑡𝑖−1 to 𝑡𝑖−𝑠 :
1. Find the collocating polynomial 𝑝(𝑡) of degree 𝑠 − 1 through (𝑡𝑖−1 , 𝐹𝑖−1 ) … (𝑡𝑖−𝑠 , 𝐹𝑖−𝑠 )
2. Use this on the interval (𝑡𝑖−1 , 𝑡𝑖 ) (extrapolation) as an approximation of 𝑢′ (𝑡) = 𝑓(𝑡, 𝑢(𝑡)) in that interval.
𝑡𝑖 𝑡𝑖
3. Use 𝑢(𝑡𝑖 ) = 𝑢(𝑡𝑖−1 ) + ∫𝑡 𝑢′ (𝜏 )𝑑𝜏 ≈ 𝑢(𝑡𝑖−1 ) + ∫𝑡 𝑝(𝜏 )𝑑𝜏 , where the latter integral can be evaluated exactly.
𝑖−1 𝑖−1
Again, one does not actually evaluate this integral; it is enough to verify that the resulting form will be
with the coefficients being the same for any 𝑓(𝑡, 𝑢) and any ℎ.
In fact, the polynomial fitting and integration can be skipped: thecoefficients can be derived by the method of undetermined
coefficients as seen in Approximating Derivatives by the Method of Undetermined Coefficients and this also established that
the local truncation error is 𝑂(ℎ𝑠 ):
• insert Taylor polynomial approximations of 𝑢(𝑡𝑖−𝑘 ) = 𝑢(𝑡𝑖 )−𝑘ℎ) and 𝑓(𝑡𝑖−𝑘 , 𝑢(𝑡𝑖−𝑘 )) = 𝑢′ (𝑡𝑖−𝑘 ) = 𝑢′ (𝑡𝑖 −𝑘ℎ)
into $𝑈𝑖 = 𝑈𝑖−1 + ℎ(𝑏0 𝑓(𝑡𝑖−𝑠 , 𝑈𝑖−𝑠 ) + ⋯ + 𝑏𝑠−1 𝑓(𝑡𝑖−1 , 𝑈𝑖−1 ))$
• solve for the 𝑠 coefficients 𝑏0 … 𝑏𝑠−1 that give the highest power for the residual error: the terms in the first 𝑠 powers
of ℎ (from ℎ0 = 1 to ℎ𝑠−1 ) can be cancelled, leaving an error 𝑂(ℎ𝑠 ).
The first few Adams-Bashforth formulas are:
• 𝑠 = 1: 𝑏0 = 1, $𝑈𝑖 = 𝑈𝑖−1 + ℎ𝐹𝑖−1 = 𝑈𝑖−1 + ℎ𝑓(𝑡𝑖−1 , 𝑈𝑖−1 ) (Euler's method)$
ℎ
• 𝑠 = 2: 𝑏0 = −1/2, 𝑏1 = 3/2, $𝑈𝑖 = 𝑈𝑖−1 + 2 (3𝐹𝑖−1 − 𝐹𝑖−2 ) (as above)$
ℎ
• 𝑠 = 3: 𝑏0 = 5/12, 𝑏1 = −16/12, 𝑏2 = 23/12, $𝑈𝑖 = 𝑈𝑖−1 + 12 (23𝐹𝑖−1 − 16𝐹𝑖−2 + 5𝐹𝑖−3 )$
• 𝑠 = 4: 𝑏0 = −9/24, 𝑏1 = 37/24, 𝑏2 = −59/24, 𝑏3 = 55/24, $𝑈𝑖 = 𝑈𝑖−1 +
ℎ
24 (55𝐹𝑖−1 − 59𝐹𝑖−2 + 37𝐹𝑖−3 − 9𝐹𝑖−4 )$
D = 0.0
periods = 16
b = 2*pi * periods
# Using the same time step size as for leapfrog method in the previous section.
stepsperperiod = 100
n = int(stepsperperiod * periods)
# We need U_1 and U_2, and get them with the Runge-Kutta method;
# this is overkill for accuracy, but since only two steps are needed, the time cost␣
↪is negligible.
h = (b-a)/n
(t_2step, U_2step) = rungekutta_system(f_mass_spring, a, a+2*h, U_0, 2)
U_1 = U_2step[1,:]
U_2 = U_2step[2,:]
(t, U) = adamsbashforth3(f_mass_spring, a, b, U_0, U_1,U_2, n)
Y = U[:,0]
DY = U[:,1]
y = y_mass_spring(t, t_0=a, u_0=U_0, K=K, M=M, D=D) # Exact solution
figure(figsize=[10,4])
title(f"{K/M=}, {D=} by 3-step Adams-Bashforth with {periods} periods,
↪{stepsperperiod} steps per period")
figure(figsize=[10,4])
title("Error in Y")
plot(t, y-Y)
xlabel("t")
grid(True)
Comparing to the leap-frog method, this higher order method at last has smaller errors (and they can be got even smaller
by increasing the number of steps) but the leapfrog method is still better at keeping the solutions on the circle.
D = 0.5
periods = 4
b = 2*pi * periods
# Note: In the notes on systems, the second order Runge-Kutta methods were tested␣
↪with 50 steps per period
stepsperperiod = 100 # Equal cost per unit time as for the explicit trapezoid and␣
↪midpoint and Runge-Kutta methods
n = int(stepsperperiod * periods)
# We need U_1 and U_2, and get them with the Runge-Kutta method;
# this is overkill for accuracy, but since only two steps are needed, the time cost␣
↪is negligible.
h = (b-a)/n
(t_2step, U_2step) = rungekutta_system(f_mass_spring, a, a+2*h, U_0, 2)
(continues on next page)
Y = U[:,0]
DY = U[:,1]
y = y_mass_spring(t, t_0=a, u_0=U_0, K=K, M=M, D=D) # Exact solution
figure(figsize=[10,4])
title(f"{K/M=}, {D=} by 3-step Adams-Bashforth with {periods} periods,
↪{stepsperperiod} steps per period")
figure(figsize=[10,4])
title("Error in Y")
plot(t, y-Y)
xlabel("t")
grid(True)
The fourth-order, four step method does at last appear to surpass leap-frog on the conservative case:
D = 0.0
periods = 16
b = 2*pi * periods
# Using the same time step size as for leapfrog method in the previous section.
stepsperperiod = 100
n = int(stepsperperiod * periods)
# We need U_1, U_2 and U_3, and get them with the Runge-Kutta method;
# this is overkill for accuracy, but since only three steps are needed, the time cost␣
↪is negligible.
h = (b-a)/n
(t_3step, U_3step) = rungekutta_system(f_mass_spring, a, a+3*h, U_0, 3)
U_1 = U_3step[1,:]
U_2 = U_3step[2,:]
U_3 = U_3step[3,:]
(continues on next page)
Y = U[:,0]
DY = U[:,1]
y = y_mass_spring(t, t_0=a, u_0=U_0, K=K, M=M, D=D) # Exact solution
figure(figsize=[10,4])
title(f"{K/M=}, {D=} by 4-step Adams-Bashforth with {periods} periods,
↪{stepsperperiod} steps per period")
figure(figsize=[10,4])
title("Error in Y")
plot(t, y-Y)
xlabel("t")
grid(True)
title("The orbit")
plot(Y, DY)
xlabel("y")
plot(Y[1], DY[1], "g*", label="start")
plot(Y[-1], DY[-1], "r*", label="end")
legend()
grid(True)
D = 0.5
# Using the same time step size as for leapfrog method in the previous section.
stepsperperiod = 50
n = int(stepsperperiod * periods)
# We need U_1, U_2 and U_3, and get them with the Runge-Kutta method.
h = (b-a)/n
(t_3step, U_3step) = rungekutta_system(f_mass_spring, a, a+3*h, U_0, 3)
U_1 = U_3step[1,:]
U_2 = U_3step[2,:]
U_3 = U_3step[3,:]
(t, U) = adamsbashforth4(f_mass_spring, a, b, U_0, U_1, U_2, U_3, n)
Y = U[:,0]
DY = U[:,1]
y = y_mass_spring(t, t_0=a, u_0=U_0, K=K, M=M, D=D) # Exact solution
figure(figsize=[10,4])
title(f"{K/M=}, {D=} by 4-step Adams-Bashforth with {periods} periods,
↪{stepsperperiod} steps per period")
figure(figsize=[10,4])
title("Error in Y")
plot(t, y-Y)
xlabel("t")
grid(True)
Finally, an “equal cost” comparison to the forht order Runge-Kutta method results in section Systems of ODEs and Higher
Order ODEs with four times as many steps per unit time: the fourth order Adams-Bashforth method come out ahead in
these two test cases.
D = 0.0
periods = 16
b = 2*pi * periods
stepsperperiod = 100
n = int(stepsperperiod * periods)
# We need U_1, U_2 and U_3, and get them with the Runge-Kutta method;
# this is overkill for accuracy, but since only three steps are needed, the time cost␣
↪is negligible.
h = (b-a)/n
(t_3step, U_3step) = rungekutta_system(f_mass_spring, a, a+3*h, U_0, 3)
U_1 = U_3step[1,:]
U_2 = U_3step[2,:]
U_3 = U_3step[3,:]
(t, U) = adamsbashforth4(f_mass_spring, a, b, U_0, U_1, U_2, U_3, n)
Y = U[:,0]
DY = U[:,1]
y = y_mass_spring(t, t_0=a, u_0=U_0, K=K, M=M, D=D) # Exact solution
figure(figsize=[10,4])
title(f"{K/M=}, {D=} by 4-step Adams-Bashforth with {periods} periods,
↪{stepsperperiod} steps per period")
figure(figsize=[10,4])
title("Error in Y")
(continues on next page)
title("The orbit")
plot(Y, DY)
xlabel("y")
plot(Y[1], DY[1], "g*", label="start")
plot(Y[-1], DY[-1], "r*", label="end")
legend()
grid(True)
D = 0.5
periods = 4
b = 2*pi * periods
stepsperperiod = 100
n = int(stepsperperiod * periods)
# We need U_1, U_2 and U_3, and get them with the Runge-Kutta method;
# this is overkill for accuracy, but since only three steps are needed, the time cost␣
↪is negligible.
h = (b-a)/n
(t_3step, U_3step) = rungekutta_system(f_mass_spring, a, a+3*h, U_0, 3)
U_1 = U_3step[1,:]
U_2 = U_3step[2,:]
U_3 = U_3step[3,:]
(t, U) = adamsbashforth4(f_mass_spring, a, b, U_0, U_1, U_2, U_3, n)
Y = U[:,0]
DY = U[:,1]
y = y_mass_spring(t, t_0=a, u_0=U_0, K=K, M=M, D=D) # Exact solution
(continues on next page)
figure(figsize=[10,4])
title(f"{K/M=}, {D=} by 4-step Adams-Bashforth with {periods} periods,
↪{stepsperperiod} steps per period")
figure(figsize=[10,4])
title("Error in Y")
plot(t, y-Y)
xlabel("t")
grid(True)
title("The orbit")
plot(Y, DY)
xlabel("y")
plot(Y[1], DY[1], "g*", label="start")
plot(Y[-1], DY[-1], "r*", label="end")
legend()
grid(True)
7.7.3 Exercises
Exercise A
Verify the derivation of Equation (7.2) for the second order Adams-Bashforth method, via polynomial collocation and
integration.
Exercise B
References:
• Section 6.7 Multistep Methods in [Sauer, 2022].
• Section 5.6 Multistep Methods in [Burden et al., 2016].
7.8.1 Introduction
So far, most methods we have seen give the new approximation value with an explicit formula for it in terms of previous
(and so already known) values; the general explicit s-step method seen in Adams-Bashforth Multistep Methods was
However, we briefly saw two implict methods back in Runge-Kutta Methods, in the process of deriving the explicit
trapezoid and explicit midpoint methods: the implicit trapezoid method (or just the trapezoid method, as this is the
real thing, before the further approximations were used to get an explicit formula)
𝑓(𝑡𝑖 , 𝑈𝑖 ) + 𝑓(𝑡𝑖+1 , 𝑈𝑖+1 ))
𝑈𝑖+1 = 𝑈𝑖 + ℎ
2
and the Implicit Midpoint Method
𝑈𝑖 + 𝑈𝑖+1
𝑈𝑖+1 = 𝑈𝑖 + ℎ𝑓 (𝑡 + ℎ/2, )
2
These are clearly not as simple to work with as explicit methods, but the equation solving can often be done. In particular
for linear differential equations, these give linear equations for the unknown 𝑈𝑖+1 , so even for systems, they can be solved
by the method seen earler in these notes.
Another strategy is noting that these are fixed point equations so that fixed point iterato can be used. The factor ℎ at right
helps; it can be shown that for small enough ℎ (how small depends on the function 𝑓), these are contraction mappings and
so fixed point iteration works.
This ide can be combined with linear multistep methods, and one important case is modifying the Adams-Bashforth
method by allowing 𝐹𝑖 = 𝑓(𝑡𝑖 , 𝑈𝑖 ) to appear at right: this gives the Adams-Moulton form
𝑠 = 0 ∶ 𝑏0 = 1
𝑈𝑖 − ℎ𝑓(𝑡𝑖 , 𝑈𝑖 )) = 𝑈𝑖−1 The backward Euler method
𝑠 = 1 ∶ 𝑏0 = 𝑏1 = 1/2
ℎ ℎ
𝑈𝑖 − 𝑓(𝑡𝑖 , 𝑈𝑖 ) = 𝑈𝑖−1 + (𝐹𝑖−1 ) The (implicit) trapezoid method
2 2
𝑠 = 2 ∶ 𝑏0 = −1/12, 𝑏1 = 8/12, 𝑏2 = 5/12
5ℎ ℎ
𝑈𝑖 − 𝑓(𝑡𝑖 , 𝑈𝑖 ) = 𝑈𝑖−1 + (−𝐹𝑖−2 + 8𝐹𝑖−1 )
12 12
𝑠 = 3 ∶ 𝑏0 = 1/24, 𝑏1 = −5/24, 𝑏2 = 19/24, 𝑏3 = 9/24
9ℎ ℎ
𝑈𝑖 − 𝑓(𝑡 , 𝑈 ) = 𝑈𝑖−1 + (𝐹𝑖−3 − 5𝐹𝑖−2 + 19𝐹𝑖−1 )
24 𝑖 𝑖 24
The use of 𝐹𝑖−𝑘 notation emphasizes that these earlier values of 𝐹𝑖−𝑘 = 𝑓(𝑡𝑖−𝑘 , 𝑈𝑖−𝑘 ) are known from a previous step,
so can be stored for reuse.
The backward Euler method has not been mentioned before; it comes from using the backward counterpart of the forward
difference approximation of the derivative:
𝑢(𝑡) − 𝑢(𝑡 − ℎ)
𝑢′ (𝑡) ≈
ℎ
Like Euler’s method it is only first order accurate, but it has excellent stability properties, which makes it useful in some
situations.
Rather than implementing any of these, the next section introduces a strategy for deriving explicit methods of comparable
accuracy, much as in Runge-Kutta Methods Euler’s method (Adams-Bashforth 𝑠 = 1) was combined with the trapezoid
method (Adams-Moulton 𝑠 = 1) to get the explicit trapezoid method: an explicit method with the same order of accuracy
as the latter of this pair.
7.8.2 Exercises
Coming soon.
References:
• Section 6.7 Multistep Methods in [Sauer, 2022].
• Section 5.6 Multistep Methods in [Burden et al., 2016].
7.9.1 Introduction
We have seen one predictor-corrector method already: the explicit trapezoid method, which uses Euler’s method to predict
a first approximation of the solution, and then corrects this using an approximation of the implicit trapezoid method.
We will look at combining the 𝑠-step Adams-Bashforth and Adams-Moulton methods to achieive an two-stage, 𝑠-step
method of same order of accuracy as the latter while being explicit — the explicit trapezoid method is this in the simplest
case 𝑠 = 1.
import numpy as np
from matplotlib import pyplot as plt
# Shortcuts for some favorite commands:
from matplotlib.pyplot import figure, plot, grid, title, xlabel, ylabel, legend
References:
• Section 6.6 Implicit Methods and Stiff Equations in [Sauer, 2022].
• Section 5.11 Stiff Equations in [Burden et al., 2016].
import numpy as np
from matplotlib import pyplot as plt
# Shortcuts for some favorite commands:
from matplotlib.pyplot import figure, plot, grid, title, xlabel, ylabel, legend
7.10.1 Introduction
Coming soon …
Exercises
307
CHAPTER
EIGHT
As a first test case, we will solve 𝑥 − cos(𝑥) = 0, which can be shown to have a unique root that lies in the interval [0, 1].
Then other equations can be tried.
8.2 Exercise 1
Create a Python function implementing the first, simplest algorithm from the section on Root finding by interval halving,
which perfomrs a fixed number of iterations, max_iterations. (This was called “N” there, but in code I encourage
using more descriptive names for variables.)
This be used as: root = bisection1(f, a, b, max_iterations)
Test it with the above example, and then try solving at least one other equation.
The main task is to create a Python function whose input specifies a function f, the interval end-points a and b, and an
upper limit tol on the allowable absolute error in the result; and whose output is both an approximate root c and a bound
errorBound on its absolute error.
That is, we require that there is an exact root 𝑟 near 𝑐, in that
|𝑟 − 𝑐| ≤ errorBound ≤ TOL.
I give a definition for the test function 𝑓. Note that I get the cosine function from the module numpy rather than the
standard Python module math, because numpy will be far more useful for us, and so I encourage you to avoid module
math as much as possible!
309
Introduction to Numerical Methods and Analysis with Python
This helps with the readability of large collections of code, avoiding the need to look further up the file to see where an
object like cos comes from. (It is also essential if you happen to use two functions of the same name from different
modules, though in the current example, one is unlikely to want both math.cos and numpy.cos.)
Here is a description of the bisection method algorithm in pseudocode, as used in our text book and these notes: a mix of
notations from mathematics and computer code, whatever makes the ideas clearest.
Input: f (a continuous function from and to the real numbers), a and b (real numbers, 𝑎 < 𝑏 with 𝑓(𝑎) and 𝑓(𝑏)
of opposite sign) errorTolerance (the maximum allowable absolute error)
Output will be: r (an approximation of a solution of 𝑓(𝑟) = 0) errorBound (an upper limit on the absolute
error in that approximation).
𝑎+𝑏
𝑐= errorBound = c - a while errorBound > errorTolerance: if f(a) f(c) > 0: 𝑎←𝑐 else: 𝑏←𝑐
2
𝑎+𝑏
end if 𝑐 = errorBound = c - a end while r = c
2
Output: r, errorBound
8.4 Exercise 2
Create Python/Numpy code for the more refined algorthn mabvm, wolving to a specified maximum allownble absolte
error, so with usage (root, errorBound) = bisection(f, a, b, TOL)
Again test by solving 𝑥 − cos 𝑥 = 0, using the fact that there is a solution in the interval (−1, 1), but this time solve
accurate to within 10−4 , and otu the the final error bound as well as the apprxomate root.
8.5 Exercise 3
NINE
9.1 Exercise 1
The equation 𝑥3 − 2𝑥 + 1 = 0 can be written as a fixed point equation in many ways, including
𝑥3 + 1
1. 𝑥 =
2
and
√
3
2. 𝑥 = 2𝑥 − 1
For each of these options:
(a) Verify that its fixed points do in fact solve the above cubic equation.
(b) Determine whether fixed point iteration with it will converge to the solution 𝑟 = 1. (assuming a “good enough” initial
approximation).
Note: computational experiments can be a useful start, but prove your answers mathematically!
311
Introduction to Numerical Methods and Analysis with Python
TEN
10.1 Exercise 1
313
Introduction to Numerical Methods and Analysis with Python
ELEVEN
11.1 Exercise 1
𝑓(𝑥) = 𝑥𝑘 − 𝑎
11.2 Exercise 2
(The last input parameter maxIterations could be optional, with a default like maxIterations=100.)
b) based on your function bisection2 create a third (and final!) version with usage
𝑓1 (𝑥) = 10 − 2𝑥 + sin(𝑥) = 0
315
Introduction to Numerical Methods and Analysis with Python
Again graph the function, to find a good starting interval [𝑎, 𝑏] and initial approximation 𝑥0 .
e) This second case will behave differently than for 𝑓1 in part (c): describe the difference. (We will discuss the reasons
in class.)
TWELVE
Note: This builds on the previous exercise comparing the Bisection and Newton’s Methods; just adding the Secant
Method.
A) Write a Python function implementing the secant method with usage
Update your previous implementations of the bisection method and Newton’s method to mimic this interfacce:
Aside: the last parameter maxIterations could be optional, with a default like maxIterations=100.
B) Use these to solve the equation
10 − 2𝑥 + sin(𝑥) = 0
317
Introduction to Numerical Methods and Analysis with Python
THIRTEEN
13.1 Exercise 1
Verify that when dividing two numbers, the relative error in the quotient is no worse than slightly more than the sum of
the relative errors in the numbers divided. (Mimic the argument for the corresponding result on products.)
13.2 Exercise 2
13.3 Exercise 3
(a) Illustrate why computing the roots of the quadratic equation 𝑎𝑥2 + 𝑏𝑥 + 𝑐 = 0 with the standard formula
√
−𝑏 ± 𝑏2 − 4𝑎𝑐
𝑥=
2𝑎
can sometimes give poor accuracy when evaluated using machine arithmetic such as IEEE-64 floating-point arithmetic.
This is not alwys a problem, so identify specifically the situations when this could occur, in terms of a condition on the
coefficents 𝑎, 𝑏, and 𝑐. (It is sufficient to consider real value of the ocefficients. Also as an aside, there is no loss pr precisio
prbe, when th roots are non-real, so you only need consider quadratics with real roots.)
(b) Then describe a careful procedure for always getting accurate answers. State the procedure first with words and
mathematical formulas, and then express it in pseudo-code.
319
Introduction to Numerical Methods and Analysis with Python
320 Chapter 13. Exercises on Machine Numbers, Rounding Error and Error Propagation
CHAPTER
FOURTEEN
14.1 Exercise 1
4. 2. 1. 0.693147
⎡ 9. 3. 1. ⎤ 𝑥 = ⎡ 1.098612 ⎤
⎢ ⎥ ⎢ ⎥
⎣ 25. 5. 1. ⎦ ⎣ 1.609438 ⎦
by naive Gaussian elimination. Do this by hand, rounding each intermediate result to four significant digits, and write
down each intermediate version of the system of equations.
B) Compute the residual vector 𝑟 ∶= 𝑏−𝐴𝑥𝑎 and residual maximum norm ‖𝑟‖max = ‖𝑏−𝐴𝑥𝑎 ‖max for your approximation.
Residual calculations must be done to high precision, so I recommend that you do this part with Python in a notebook.
14.2 Exercise 2
Repeat Exercise 1, except using maximal element partial pivoting. Then compare the residuals for the two methods (with
and without pivoting), and comment.
14.3 Exercise 3
4. 2. 1.
𝐴=⎡
⎢ 9. 3. 1. ⎤
⎥
⎣ 25. 5. 1. ⎦
0.693147
𝑏=⎡ ⎤
⎢ 1.098612 ⎥
⎣ 1.609438 ⎦
as above.
321
Introduction to Numerical Methods and Analysis with Python
FIFTEEN
15.1 Exercise 1
Show that for a three-point one-sided difference approximation of the first derivative
𝐷2 𝑓(𝑥) 2 𝐷3 𝑓(𝑥) 3
𝑓(𝑥 + ℎ) = 𝑓(𝑥) + 𝐷𝑓(𝑥)ℎ + ℎ + ℎ + 𝑂(ℎ4 )
2 6
and
4𝐷3 𝑓(𝑥) 3
𝑓(𝑥 + 2ℎ) = 𝑓(𝑥) + 2𝐷𝑓(𝑥)ℎ + 2𝐷2 𝑓(𝑥)ℎ2 + ℎ + 𝑂(ℎ4 )
3
15.2 Exercise 2
323
Introduction to Numerical Methods and Analysis with Python
15.3 Exercise 3
Verify that the most accurate three-point centered difference approximation of 𝐷2 𝑓(𝑥), form
15.4 Exercise 4
15.5 Exercise 5
Derive a symmetric five-point approximation of the second derivative, using the Method of Undetermined Coefficients; I
recomend that you use the simpler second, “monomials” approach.
Note: try to exploit symmetry to reduce the number of equations that need to be solved.
15.6 Exercise 6
Use the symmetric centered difference approxmation of the second derivative and Richardson extrapolation to get another
more accurate approximation of this derivative.
Then compare to the result in Exercise 5.
Python Tutorial
325
CHAPTER
SIXTEEN
INTRODUCTION
This is a selection of notes on Python, particularly about using the packages Numpy and Matplotlib which might not have
been encountered in a first course on Python programming.
These are excerpts from the Jupyter book Python for Scientific Computing, which might also be useful for review and
reference.
327
Introduction to Numerical Methods and Analysis with Python
SEVENTEEN
I suggest that you get Anaconda for you own computers, even if you also have access to it via computers on-campus. Get
a sufficiently recent version: at least version 3.9; we will use some of the newer features, especially for working more
easily with matrices and vectors.
Anaconda is a free download from https://www.anaconda.com/products/individual
Once Anaconda is installed, you access its compoments by opening the Anaconda Navigator. The most important of
these for us will be JupyterLab, for working with Jupyter notebooks; see the Project Jupyter site.
Other Anaconda components of possible interest are:
• Spyder an Integrated Development Environment (like IDLE, but far fancier!) for writing and running Python code
files (suffix .py). This has more advanced editing and debugging tools than JupyterLab, so some readers might
prefer to develop Python code in Spyder (or even with IDLE) and then copy the code into a notebook for final
presentation. Aside: “Spyder” is a portmanteau for ScientificPYthon DEvelopment enviRonment.)
• IPython console provides a command line for executing Python code, akin to what you might be familar with
when using IDLE. (Note that within Spyder, one pane is an iPython console, along-side the editing pane.) (Aside:
“IPython” is short for Interactive Python; the “I” is capitalized; it is not an Apple product!)
An alternative is to use the online resource Colab provided by Google. This works entirely with Jupyter notebooks, stored
in the Colab website. Colab does not support directly running Python code files. However, it does support uploading your
own modules in Python code files, from which a notebook can import stuff. (Never mind if you do not yet know about
creating modules and importing from them; we will review that later.)
329
Introduction to Numerical Methods and Analysis with Python
EIGHTEEN
331
Introduction to Numerical Methods and Analysis with Python
10. Only use the Markdown section heading syntax # ..., ## ... and so on for items that are legitimately the titles
of sections, sub-sections, and so on. Otherwise, emphasize with the *...* and **...** syntax for italics and
boldface respectively.
11. When the result of a numerical calculation is only an approximation (almost always in this course!) aim to also
provide an upper bound on the absolute error — and if that is not feasable, then at least provide a plausable estimate
of the error, or a measure of backward error.
12. Avoid infinite loops! (In JupyterLab, the disk at top-right is black while code is running; goes white when finished;
in case of execution never finishing, you can stop it with the square “Stop” button above.) One strategy is to avoid
while loops, in particular while True or while 1; I will illustrate the folowing approach in sample codes
for iterative methods:
• set a maximum number of iterations maxiterations
• use a for loop for iteration in range(maxiterations): instead of while not weAre-
Done:
• Implement the stopping condition with if we_are_done: break somewhere inside the for loop.
13. The last step before you submit a notebook or otherwise share it is to do a “clean run”, restarting the kernel and
then running every cell, and then read it through, checking that the Python code has run succesfully, its output is
correct, and so on.
Usually the easiest way to do this in JupyterLab is with the “fast forward” double right-arrow button at the top;
alternatively, there is a menu item Kernel > Restart Kernel and Run All Cells ...
If you get Python errors but still want to share the file (for example, to ask for debugging help), this run will stop
at the first error, so scan down to that first error, use menu item Run > Run Selected Cell and All
Below, and repeat at each error till the run gets to the last cell.
332 Chapter 18. Suggestions and Notes on Python and Jupyter Notebook Usage
CHAPTER
NINETEEN
PYTHON BASICS
For some tasks, Python can be used interactively, much like a scientific calculator (and later, like a graphing scientific
calculator) and that is how we will work for this section.
There are multiple ways of doing this, and I invite you to experiment with them all at some stage:
• With the “bare” IPython console; this is available in many ways, including from the Anaconda Navigator by opening
Qt Console
• With Spyder, which can be accessed from within the Anaconda Navigator by opening Spyder; within that, there is
an interactive Python console frame at bottom-right.
• In Jupyter notebooks (like this one), which can be accessed from within the Anaconda Navigator by opening
JupyterLab.
I suggest learning one tool at a time, starting with this Jupyter notebook; however if you wish to also one of both of the
alternatives above, go ahead.
A Jupyter notebook consists of a sequence of cells, and there are two types that we care about:
• Code cells, which contain Python code
• Markdown cells (like all the ones so far), which contain text and mathematics, using the Markdown markup
notation for text combined with LaTeX markup notation for mathematical formulas.
Running a Code cell executes its Python code; running a Markdown cell formats its content for nicer output.
There are several ways to run a cell:
• The “Play” (triangle icon) button above
• Typing “Shift-Return” (or “Shift-Enter”)
Both of these run the current cell and then move down to the next cell as “current”.
Also, the menu “Run” above has various options — hopefully self-explanatory, at least after some experimentation.
To get a Markdown cell back from nicely formatted “display mode” back to “edit mode”, double click in it.
It will be convenient to see the value of an expression by simply making it the last line of a cell.
333
Introduction to Numerical Methods and Analysis with Python
2 + 2
To see several values from a cell, put them on that last line, separated by commas; the results will appear also separated
by commas, and in parentheses:
2 + 2, 6 - 1
(4, 5)
(As we will see in the next section, these are tuples; a very convenient way of grouping information.)
Python, like most programming languages distinguishes integers from floating point (“real”) numbers; even “1.0” is dif-
ferent from ‘1’, the decimal point showing that the former is a floating point number.
One difference is that there is a maximum possible floating point number of about 10300 , whereas Python integers can be
arbitrarily large — see below with exponentiation.
Addition and subtraction with “+” and “-” are obvious, as seen above.
2 + 2, 6 - 2
(4, 4)
However, there are a few points to note with multiplication, exponentiation, and division.
Firstly, the usual multiplication sign not in the standard character setor on standard keyboards, so an asterisk “*” is used
instead:
2 * 3
Likewise exponentiation needs a special “keyboard-friendly” notation, and it is a dobuble asterisk “**” (not “^”):
2**3
Numbers with exponents can be expressed using this exponential notation, but note how Python outputs them:
5.3*10**21
5.3e+21
7*10**-8
7e-08
This “e” notation is the standard way to describe numbers with exponents, and you can input them that way too.
When printing numbers, Python decides if the number is big enough or small enough to be worth printing with an exponent:
Aside: note that either “e” or “E” can be used in input of exponents. However in most contexts, Python is case sensitive;
we will see an example soon.
19.3.2 Division
6/3
2.0
5/3
1.6666666666666667
6//3
5//3
6%3
5%3
Python uses j for the square root of -1 rather than i, and complex numbers are writen as a + bj or just bj for purely
imaginary numbers. (Here ‘a’ and ‘b mut be literal numbers, nt mnakem of variables.)
The coeffcient b in the imaginary part is always needed, even if it is 1.
but
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Input In [16], in <cell line: 1>()
----> 1 j
Both “and” and “or” are use lazy or “short-circuiting” evaluation: if the T/F at left determines the overall truth value
answer (it is False with and, True with or) then the term at right is not evaluated.
For example, the following avoids division by zero (the syntax will be explained later, but hopefuly it is mostly clear.)
19.5 Comparisons
Comparisons of numbers and tests for equality exist, with two catches:
• There are no keyboard symbols for ≤, ≥ or ≠, so <=, >= and != are used
• Equality is indicated with a double equals sign == because a single equals sign has another job, as seen below under
Naming quantities, and displaying information with print.
So in all:
One convenient difference from most programming languages is that comparisons can be chained, as seen in the example
above:
is equivalant to
but is both more readable and more efficient, because the middle term is only evaluated once.
Like and and or, this is short-circuiting.
Chaining can even be done in cases where usual mathematical style forbids, with reversing of the direction of the inequal-
ities:
2 < 4 > 3
True
Chunks of text can be described by surrounding them either with double quotes (“real quotation marks”) or so-called
‘single quotes’, (which are actualy apostrophes).
"I recommend using 'double quote' characters, except perhaps when quote marks must␣
↪appear within a string, like here."
"I recommend using 'double quote' characters, except perhaps when quote marks must␣
↪appear within a string, like here."
However, using apostrophes (single right quotes) is also allowed by Python, perhaps because they are slightly easier to
type.
greeting = "Hello"
audience = "world"
sentence = greeting + " " + audience + "."
print(sentence)
print(3*(greeting+' '))
Hello world.
Hello Hello Hello
It often helps to give names to quantities, which is done with assignment statements. For example, to solve for 𝑥 in the
very simple equation 𝑎𝑥 + 𝑏 = 0 for given values of 𝑎 and 𝑏, let us name all three quantities:
a = 3
b = 7
x = -b/a
print(a, b, x)
3 7 -2.3333333333333335
That output does not explain things very clearly; we can add explanatory text to the printed results in several ways. Most
basically:
The above prints six items — three text strings (each quoted), three numbers — with the input items separated by commas.
There is an alternative notation for such output, called “f-strings”, which use braces to specify that the value of a variable
be inserted into a string of text to display:
In this example each pair of braces inserts the value of a variable; one can instead put an expression in braces to have it
evaluated and that value displayed. So in fact we could skip the variable 𝑥 entirely:
A final shortcut with print: if an expression in braces ends with =, both the expression and its value are displayed:
Python includes many modules and packages that add useful defintions of functions, constants and such. For us, the
most fundamental is math, which is a standard part of Python.
Aside: we will soon learn about another package numpy and then mostly use that instead of math. I mention math for
now because it is a core part of Python whereas numpy is a separate “add-on”, and you should be aware of math if only
because many references on doing mathematical stuff with Python will refer to it.
We can access specific items with a from … import command; to start with, two variables containing famous values:
pi is approximately 3.141592653589793
e is approximately 2.718281828459045
Each is accurate to about 16 significant digits; that is the precision of the 64-bit number system standard in most modern
computers.
We can now return to the comment above about case sensitivity. Compare the results of the following two input lines:
print(f'e = {e}')
e = 2.718281828459045
print(f'E = {E}')
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Input In [33], in <cell line: 1>()
----> 1 print(f'E = {E}')
sin(pi/4)
0.7071067811865475
cos(pi)
-1.0
cosh(0)
1.0
Notes
• All Python functions need parentheses around their arguments; no lazy shortcuts with trig. functions and such.
• Trig. functions use radians, not degrees.
import math
we get access to many functions such as tan, but need to address them by fully qualified name, like math.tan:
print(f"tan(pi/4) = {math.tan(pi/4)}")
tan(pi/4) = 0.9999999999999999
If we had not already imported pi above, its full name math.pi would also be needed, as in:
print(f"tan(pi/4) = {math.tan(math.pi/4)}")
tan(pi/4) = 0.9999999999999999
Aside: The imperfection of computer arithmetic becomes clear: the exact value is 1 of course. However, the precision
of about 16 decimal places is far greater than for any experimental measurement (so far), so this is usually not a problem
in practice.
There is another way of importing all the contents of a module so that they are available on a “first name basis”, called a
wild import; it is like the single-item imports above with from math import ... but no with an asterisk (often
called the wild-card character in this context) as the name of the items to import:
log(10)
2.302585092994046
Wild imports can be convenient when working interactively (using Python like a scientific calculator) through saving a bit
of typing, but I strongly recommend using the previous more specific approaches when creating files and programs.
One reason is that it is then unambiguous where a given item came from, even if multiple import commands are used.
For example, we will see later than there are both math.cos and numpy.cos, and they behave differently in some
situations.
Another reason for explicit imports is for internal efficiency in storage and execution time; wild imports can potentially
load thousands of items from a large module even if only a few are used.
We just saw that there is a function log in module math, but which base does it use?
log(10)
2.302585092994046
log(e)
1.0
Evaluating these two expressions reveals that “log()” is the natural logarithm, base 𝑒;
what mathematicians usually call “ln”.
For the base ten version “log10 ” (sometimes just called “log”) use log10:
log10(100)
2.0
A square root can of course be computed using the 1/2 power, as with
16**0.5
4.0
but this function is important enough to have a named form provided by module math:
sqrt(2)
1.4142135623730951
If you are doing the exercises in these notes for a course, record both your input and the resulting output, in a document
that you will submit for feedback and then grading, suc as a moified copy of this notebook.
I also encourage you to make notes of other things that you learn, beyond what must be submitted — they will help later
in the course.
Also, every document that you submit, hand-written or electronic, should start with:
• A title,
• your name (possibly also with your email address), and
TWENTY
In this book names are sometimes a description formed from several words, and since spaces are forbidden, this is generally
done by using camelCase; for example, an error estimate might be in a variable errorEstimate. (Another popular
style is to use the underscore as a proxy for a space, as with error_estimate.)
One place where underscores are used in names is where the corresponding mathematical notation would have a subscript;
for example, the mathematical name 𝑥0 becomes x_0. (This mimics the LaTeX notation for subscripts.)
345
Introduction to Numerical Methods and Analysis with Python
TWENTYONE
21.1 Foreword
With this and all future sections, start by creating your own Jupyter notebook; perhaps by copying relevant cells from this
notebook and then adding your work.
If you also wish to start practicing with the Spyder IDE, then in addition use it to create a Python code file with that code,
and run the commands there too.
Later you might find it preferable to develop code in Spyder and then copy the working code and notes into a notebook
for final presentation — Spyder has better tools for debugging.
The first step beyond using Python merely as a calculator is storing value in variables, for reuse later in more elaborate
calculations. For example, to find both roots of a quadratic equation
𝑎𝑥2 + 𝑏𝑥 + 𝑐 = 0
we want the values of each coefficient and are going to use each of them twice, which we might want to do without typing
in each coefficient twice over.
21.2.1 Example
2𝑥2 − 8𝑥 + 6 = 0
using the quadratic formula. But first we need to get the square root function:
347
Introduction to Numerical Methods and Analysis with Python
a = 2
b = -10
c = 8
(Aside: why did I number the roots 0 and 1 instead of 1 and 2? The answer is coming up soon.)
Where are the results? They have been stored in variables rather than printed out, so to see them, use the print function:
print('The smaller root is', root0, 'and the larger root is', root1)
Aside: This is the first mention of the function print, for output to the screen (or to files). You can probably learn
enough about its usage from examples in this and subsequent units of the course, but for more information see also the
notes on Formatted Output and Some Text String Manipulation
Formatted Output and Some Text String Manipulation
A short-cut for printing the value of a variable is to simply enter its name on th last line of a cell:
root0
1.0
root0, root1
(1.0, 4.0)
Note that the output is parenthesized: this, as will be explained below, is a tuple
LastName = "LeMesurier"
FirstName = 'Brenton'
print("Hello, my name is", FirstName, LastName)
Note that either ‘single quotes’ or “double quotes” can be use to surround text, but one must be consistent within each
piece of text. I recomned always uising double quotes (which are true quotation characters) not single quotes (which are
actually apostrophes: for one thing many other languages require this, so it could help to get into the habit.
348 Chapter 21. Python Variables, Including Lists and Tuples, and Arrays from Package Numpy
Introduction to Numerical Methods and Analysis with Python
21.4 Lists
Python has several ways of grouping together information into one variable. We first look at lists, which can collect all
kinds of information together:
name + phone
Individual entries (“elements”) can be extracted from lists; note that Python always counts from 0, and indices go in
[brackets], not (parentheses) or {braces}:
LastName = name[0]
FirstName = name[1]
print(FirstName, LastName)
Brenton LeMesurier
name[1] = 'John'
print(name[1])
print(name)
John
['LeMesurier', 'John']
We can use the list of coefficients to specify the quadratic, and store both roots in a new list.
But let’s shorten the name first, by making “q” a synonym for “coefficients”:
q = coefficients
print('The list of coefficients is', q)
a = q[0]
b = q[1]
c = q[2]
Alternatively one can unpack the elements of a list into separate variables with
(a, b, c) = q
21.4.1 The equals sign = creates synonyms for lists; not copies
Note that it says above that the statement q = coefficients makes q is a synonym for coefficients, not a
copy of its values. To see this, note that when we make a change to q it also applies to coefficients (and vice versa):
print("q is", q)
print("coefficients is", coefficients)
q[0] = 4
print("q is now", q)
print("coefficients is now", coefficients)
q is [2, -10, 8]
coefficients is [2, -10, 8]
q is now [4, -10, 8]
coefficients is now [4, -10, 8]
coefficients[0] = 2
350 Chapter 21. Python Variables, Including Lists and Tuples, and Arrays from Package Numpy
Introduction to Numerical Methods and Analysis with Python
Python allows you to count backwards from the end of a list, by using negative indices:
• index -1 refers to the last element
• index -k refers to the element k from the end.
For example:
digits = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print('The last digit is', digits[-1])
print('The third to last digit is', digits[-3])
This also works with the Numpy arrays: for vectors, matrices, and beyond and Tuples introduced below.
21.5 Tuples
One other useful kind of Python collection is a tuple, which is a lot like a list except that it is immutable: you cannot
change individual elements. Tuples are denoted by surrounding the elements with parentheses “(…)” in place of the
brackets “[…]” used with lists:
(2, -10, 8)
print(qtuple)
(2, -10, 8)
qtuple[2]
Actually, we have seen tuples before without the name being mentioned: when a list of expressions is put on one line
separated by commas, the result is a tuple. This is because when creating a tuple, the surrounding parentheses can usually
be omitted:
('LeMesurier', 'Brenton')
There are some rules limiting which names can be used for variables:
• The first character must be a letter.
• All characters must be “alphanumeric”: only letters of digits.
• However, the underscore “_” (typed with “shift dash”) is an honorary letter: it can be used where you are tempted
to have a space.
Note well: no dashes “-” or spaces, or any other punctuation.
When you are tempted to use a space in a name, such as when the name is a desrciptive phrase, it is recommended to use an
underscore. (Another option is to capitalize the first letter of each new word: so-called camelCase or UpperCamelCase.)
21.6.1 Exercise A
It will soon be convenient to group the input data to and output values from a calculation in tuples.
Do this by rewriting the quadratic solving exercise using a tuple “coefficients” containing the coefficients (a, b, c) of a
quadratic 𝑎𝑥2 + 𝑏𝑥 + 𝑐 and putting the roots into a tuple named “roots”.
Break this up into three steps, each in its own code cell (an organizational pattern that will be important later):
1. Input: create the input tuple.
2. Calculation: use this tuple to compute the tuple of roots.
3. Output: print the roots.
This is only a slight variation of what is done above with lists, but the difference will be important later.
As mentioned above, a major difference from lists is that tuples are immutable; their contents cannot be changed: I
cannot change the lead cofficient of the quadratic above with
qtuple[0] = 4
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Input In [25], in <cell line: 1>()
----> 1 qtuple[0] = 4
352 Chapter 21. Python Variables, Including Lists and Tuples, and Arrays from Package Numpy
Introduction to Numerical Methods and Analysis with Python
This difference between mutable objects like lists and immutable ones like tuples comes up in multiple places in Python.
The one other case that we are most likely to encounter in this course is strings of text, which are in some sense “tuples
of characters”. For example, the characters of a string can be addressed with indices, and concatenated:
language = "Python"
print(f"The initial letter of '{language}' is '{language[0]}'")
print(f"The first three letters are '{language[0:3]}'")
languageversion = language + ' 3'
print(f"We are using version '{languageversion}'")
Aside: Here a new feature of printing and string manipulation is used, “f-string formatting” (new in Python version 3.6).
For details, see the notes on formatted output and some text string manipulation mentioned above.
Also as with tuples, one cannot change the entries via indexing; we cannot “lowercase” that name with
language[0] = "p"
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Input In [27], in <cell line: 1>()
----> 1 language[0] = "p"
Many mathematical calculations involve vectors, matrices and other arrays of numbers. At first glance, Python lists and
tuples look like vectors, but as seen above, “addition” of such objects does not do what you want with vectors.
Thus we need a type of object that is specifically an array of numbers of the same type that can be manipulatd like a vector
or matrix. There is not a suitable entity for this in the core Python language, but Python has a method to add features
using modules and packages, and the most important one for us is Numpy: this provides for suitable numerical arrays
through objects of type ndarray, and provides tools for working with them, like the function array() for creating
arrays from lists or tuples. (Numpy also provides a large collection of other tools for numerical computing, as we will see
later.)
import numpy
Then the function array is accessed by its “fully-qualified name” numpy.array, and we can create an ndarray
that serves for storing a vector:
u = numpy.array([1, 2, 3])
array([1, 2, 3])
print(u)
[1 2 3]
Note: As you might have noticed above, displaying the value of a variable by simply typing its name describes it in
more detail than the print function; sometimes it is a description that could be used to create the object. Thus I will
sometimes use both display methods below, as a reminder of the syntax and semantics of Numpy arrays.
As seen above, if we just want that one function, we can import it specifically with the command
v = array([4, 5, 6, 7])
print(v)
[4 5 6 7]
21.8.2 Notes
1. Full disclosure: Python’s core collection of resources does provide another kind of object called an array, but we
will never use that in this course, and I advise you to avoid it: the Numpy ndarray type of array is far better for
what we want to do! The name “ndarray” refers to the possibility of creating n-dimensional arrays — for example,
to store matrices — which is one of several important advantages.
2. There is another add-on package Pylab, which contains most of Numpy plus some stuff for graphics (from package
Matplotlib, which we will meet later, in Section 8) That is intended to reproduce a Matlab-like environment, espe-
cially when used in Spyder, which is deliberately Matlab-like. So you could instead use from pylab import
*, and that will sometimes be more convenient. However, when you search for documentation, you will find it by
searching for numpy, not for pylab. For example the full name for function array is numpy.array and once
we import Numpy with import numpy we can get help on that with the command help(numpy.array).
Beware: this help information is sometimes very lengthy, and “expert-friendly” rather than “beginner-friendly”.
Thus, now is a good time to learn that when the the up-array and down-array keys get to the top or bottom of a cell in a
notebook, they keep moving to the previous or next cell, skipping past the output of any code cell.
help(numpy.array)
The function help can also give information about a type of object, such as an ndarray. Note that ndarray is
referred to as a class; if that jargon is unfamiliar, you can safely ignore it for now, but if curious you can look at the brief
notes on classes, objects, attributes and methods
Beware: this help information is even more long-winded, and tells you far more about numpy arrays than you need to
know for now! So make use of that down-arrow key.
354 Chapter 21. Python Variables, Including Lists and Tuples, and Arrays from Package Numpy
Introduction to Numerical Methods and Analysis with Python
help(numpy.ndarray)
Numpy arrays (more pedantically, objects of type ndarray) are in some ways quite similar to lists, and as seen above,
one way to create an array is to convert a list:
list0 = [1, 2, 3]
list1 = [4, 5, 6]
array0 = array(list0)
array1 = array(list1)
list0
[1, 2, 3]
array0
array([1, 2, 3])
print(list0)
[1, 2, 3]
print(array0)
[1 2 3]
We can skip the intermediate step of creating lists and instead create arrays directly:
Printing makes these seem very similar, though an array is displayed without commas between elements. Note that this
is like the style for a single-row matrix.
list0 = [1, 2, 3]
array0 = [1 2 3]
print(list0 + list1)
[1, 2, 3, 4, 5, 6]
print(array0 + array1)
[ 5 12 9]
print(2 * list0)
[1, 2, 3, 1, 2, 3]
print(2 * array0)
[ 2 14 6]
A list can have other lists as its elements, and likewise an array can be described as having other arrays as its elements, so
that a matrix can be described as a succession of rows. First, a list of lists can be created:
print(listoflists)
listoflists[1][-1]
356 Chapter 21. Python Variables, Including Lists and Tuples, and Arrays from Package Numpy
Introduction to Numerical Methods and Analysis with Python
matrix = array(listoflists)
print(matrix)
[[1 2 3]
[4 5 6]]
matrix*3
array([[ 3, 6, 9],
[12, 15, 18]])
anothermatrix
array([[4, 5, 6],
[1, 7, 3]])
print(anothermatrix)
[[4 5 6]
[1 7 3]]
Note that we must use the notation array([…]) to do this; without the function array() we would get a list of arrays, which
is a different animal, and much less fun for doing mathematics with:
[array([4, 5, 6]),
array([1, 7, 3]),
array([4, 5, 6]),
array([1, 7, 3]),
array([4, 5, 6]),
array([1, 7, 3])]
21.8.6 Referring to array elements with double indices, or with successive single
indices
matrix[1,2]
but you can also use a single index to extract an “element” that is a row:
matrix[1]
array([4, 5, 6])
and you can use indices successively, to specify first a row and then an element of that row:
matrix[1][2]
This ability to manipulate rows of a matrix can be useful for linear algebra. For example, in row reduction we might want
to subtract four times the first row from the second row, and this is done with:
Note well the effect of Python indexing starting at zero: the indices used with a vector or matrix are all one less than you
might expect based on the notation seen in a linear algebra course.
Arrays with three or more indices are possible, though we will not see much of them in this course:
arrays_now_in_3D
358 Chapter 21. Python Variables, Including Lists and Tuples, and Arrays from Package Numpy
Introduction to Numerical Methods and Analysis with Python
array([[[ 1, 2, 3],
[ 0, -3, -6]],
[[ 4, 5, 6],
[ 1, 7, 3]]])
print(arrays_now_in_3D)
[[[ 1 2 3]
[ 0 -3 -6]]
[[ 4 5 6]
[ 1 7 3]]]
Exercise B
2 3 3 0
Create two arrays, containing the matrices $𝐴 = [ ], 𝐵 =[ ]$ Then look at what is given by the
1 4 2 1
formula
C = A * B
D = A @ B
360 Chapter 21. Python Variables, Including Lists and Tuples, and Arrays from Package Numpy
CHAPTER
TWENTYTWO
22.1 Introduction
To move beyond using Python to execute a simple sequence of commands, two new algorithmic features are needed:
decision making and repetition (with variation). In this unit we look at decision making, using conditional statements; if
statements.
We first see the simplest case of either doing something or not, depending on some condition; here, avoiding division by
zero. The code we will use first is:
if b == 0:
print("Do you really want to try dividing by zero?!")
print(f"{a}/{b} = {a/b}")
a = 4
b = 3
if b == 0:
print("Do you really want to try dividing by zero?!")
print(f"{a}/{b} = {a/b}")
4/3 = 1.3333333333333333
a = 4
b = 0
361
Introduction to Numerical Methods and Analysis with Python
if b == 0:
print("Do you really want to try dividing by zero?!")
print(f"a/b = {a/b}")
---------------------------------------------------------------------------
ZeroDivisionError Traceback (most recent call last)
Input In [5], in <cell line: 3>()
1 if b == 0:
2 print("Do you really want to try dividing by zero?!")
----> 3 print(f"a/b = {a/b}")
a = 4
b = 0
if b == 0:
print("You cannot divide by zero! Try changing the value of b and rerun this cell.
↪")
else:
print(f"{a}/{b} = {a/b}")
You cannot divide by zero! Try changing the value of b and rerun this cell.
answer = (b == 0)
print(f"The answer is {answer}")
362 Chapter 22. Decision Making With if, else, and elif
Introduction to Numerical Methods and Analysis with Python
Note that logical statements like b == 0 have value either True or False — and remember that these are case
sensitive; “true” and “false” have no special meaning.
if answer:
print("You cannot divide by zero! Try changing the value of b and rerunning this␣
↪cell.")
else:
print(f"{a}/{b} = {a/b}")
You cannot divide by zero! Try changing the value of b and rerunning this cell.
a = 4
b = 3
answer = (b == 0)
print(f"The answer is {answer}")
if answer:
print("You cannot divide by zero! Try changing the value of b and rerun this cell.
↪")
else:
print(f"{a}/{b} = {a/b}")
4/3 = 1.3333333333333333
a = 4
for b in (3, 0):
print(f"With a = {a} and b = {b},")
if b == 0:
print("b is 0; you cannot divide by zero!")
else:
print(f"{a}/{b} = {a/b}")
With a = 4 and b = 3,
4/3 = 1.3333333333333333
With a = 4 and b = 0,
b is 0; you cannot divide by zero!
Here is another way to do the same thing, this time putting the more “normal” case first, which tends to improve readability:
a = 4
b = 3
4/3 = 1.3333333333333333
More than two possibilities can be handled, using an elif clause (short for “else, if”). Let’s see this, while also intro-
ducing inequality comparisons:
x = 4
if x > 0:
print("x is positive")
elif x < 0:
print("x is negative")
else: # By process of elimination ...
print("x is zero")
x is positive
22.2.1 Exercise A
Test the above by changing to various other values of x and rerunning. As mentioned above, ensure that you exercise all
three possibilities.
While experimenting in a notebook (or Python code file), one way to do this is to edit in new values for x and then re-run,
but for final presentation and submission of your work, do this by one of the methods seen above:
• make multiple copies of the code, one for each chosen value of x.
• be more adventurous by working out how to use a for statement to run a list of test cases. This is looking ahead
to Iteration with for.
n = 36
364 Chapter 22. Decision Making With if, else, and elif
Introduction to Numerical Methods and Analysis with Python
if n % 10 == 0:
print(f"{n} is a multiple of ten")
elif n % 5 == 0:
print(f"{n} is an odd multiple of five")
elif n % 2 == 0:
print(f"{n} is even, but not a multiple of five.")
else:
print(f"{n} has no factor of either two or five.")
22.3.1 Exercise B
Again, test all four possibilities by using a suitable collection of values for n.
Start with written preparation and planning of your procedure, and check that with me before creating any Python code.
You can do this on paper or a text file, but as a goal for how best to do things, you could also try to present this planing
in a Jupyter notebook; then you could append the Python work to that notebook later.
As part of this planning, select a collection of test cases that explore all the possibilities: keep in mind the goal that each
line of code is executed in a least one test case (bearing in mind that if statements can skip some lines of code, depending
on whether various statements are true or false).
22.4.1 Exercise C. Planning for robust handling of all possible “quadratics” 𝑎𝑥2 +
𝑏𝑥 + 𝑐 = 0
In the next exercise, we will create Python code that correctly handles the task of finding all the real roots of a quadratic
equation 𝑎𝑥2 + 𝑏𝑥 + 𝑐 = 0 for the input of any real numbers 𝑎, 𝑏 and 𝑐.
Before writing any code, we need a written plan, distinguishing all qualitatively different cases for the input, and deciding
how to handle each of them. (This planning and decision making will be an important requirement in many later exercises.)
Produce quadratic solving code that is robust; handling all possible triples of real numbers a, b and c for the coefficients
in equation 𝑎𝑥2 + 𝑏𝑥 + 𝑐 = 0.
Not all choices of those coefficients will give two distinct real roots, so work out all the possibilities, and try to handle
them all.
366 Chapter 22. Decision Making With if, else, and elif
CHAPTER
TWENTYTHREE
23.1 Introduction
Our main objective is to learn how to define our own Python functions, and see how these can help us to deal with sub-tasks
done repeatedly (with variation) within a larger task.
Typically, core mathematical tasks (like evaluating a function 𝑓(𝑥) or solving an equation 𝑓(𝑥) = 0) will be done within
Python functions designed to communicate with other parts of the Python code, rather than getting their input interactively
or returning results by displaying them on the screen.
In this course, the notebook approach is prioritized; that makes it easier to produce a single document that combines
text and mathematical information, Python code, and computed results, making for a more “literate”, understandable
presentation.
Example A. A very simple function for computing the mean of two numbers
To illustrate the syntax, let’s start with a very simple function: computing the mean of two numbers.
mean_of_2_and_6 = mean(2, 6)
print('The mean of 2 and 6 is', mean_of_2_and_6)
367
Introduction to Numerical Methods and Analysis with Python
23.1.1 Notes
Also, multiple values can be given in a return line; they are then output as a tuple.
mean_and_difference(2, 5)
(3.5, 3)
A more subtle point: all the variables appearing in the function’s def line (here a and b) and any created inside with
an assignment statement (here just mean_of_a_and_b) are purely local to the function; they do not exist outside the
function. For that reason, when you call a function, you have to do something with the return value, like assign it to a
variable (as done here with mean_of_a_and_b) or use it as input to another function (see below).
Aside: there is a way that a variable can be shared between a function and the rest of the file in which the function
definition appears; so-called global variables, using global statements. However, it is generally good practice to avoid
them as much as possible, so I will do so in these notes.
To illustrate this point about local variables, let us look for the values of variables a and mean_of_a_and_b in the
code after the function is called:
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Input In [6], in <cell line: 2>()
1 print('After using function mean:')
----> 2 print(f'a = {a}')
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Input In [7], in <cell line: 1>()
----> 1 print('mean_of_a_and_b =', mean_of_a_and_b)
print(f'mean310 = {mean310}')
mean310 = 6.5
The same name can even be used locally in a function and also outside it in the same file; they are different objects with
independent values:
def double(a):
print(f"At the start of function 'double', a = {a}")
a = 2 * a
print(f"A bit later in that function, a = {a}")
return a
a = 1
print(f"Before calling function 'double', a = {a}")
b = double(a)
print(f"After calling function 'double', b = {b}, but a = {a} again.")
Warning about keeping indentation correct: The line a = 1 is after the function definition, not part of it, as indicated
by the reduced indentation; however Python code editing tools like JuperLab and Spyder will default to indenting each
new line as much as the previous one when you end a line by typing “return”. Thus, when typing in a function def, it is
important to manually reduce the indentation (“dedent”) at the end of the definition. The same is true for all statements
that end with a colon and so control a following block of code, like if, else, elif and for.
Example B. More on multiple output values, with tuples
Often, a function computes and returns several quantities; one example is a function version of our quadratic equation
solver, which takes three input parameters and computes a pair of roots. Here is a very basic function for this, ignoring
for now possible problems like division by zero:
Now what is returned is a tuple, and it can be stored into a single variable:
However, it is often convenient to store each returned value into a separate variable, using tuple notation at left in the
assignment statement:
When tuples were introduced in the section on Python variables, etc., they were described as “a parenthesized list of values
separated by commas”, but note that above, no parentheses were used: the return line was
not
Tuples can have a single member, but then to make it clear to Python that it is a tuple, there must always be a comma
after that sole element. Compare:
tuple_a = (1,)
print(tuple_a)
(1,)
tuple_b = 2,
print(tuple_b)
(2,)
not_a_tuple_c = (3)
print(not_a_tuple_c)
Note that the code blocks for some of the functions above start with a comment surrounded by a triplet of quote characters
at each end, and this sort of comment can continue over multiple lines.
In addition to making it easier to have long comments, this sort of comment provides some self-documentation for the
function — not just in the code for the function, but also in the Python help system:
help(mean_and_difference)
mean_and_difference(a, b)
Compute the mean and difference of two numbers.
help(solve_quadratic)
About the reference to module __main__: that is the standard name for code that is not explicitly part of any other
module, such as anything defined in the current file rather than imported from elsewhere. Compare to what we get when
a function comes from another module:
help(sqrt)
sqrt(x, /)
Return the square root of x.
As you might expect, all objects provided by standard modules like math and numpy have some documentation provided;
help is useful in cases like this where you cannot see the Python code that defines the function.
When a function def lacks such a self-documentation comment, help still tells us something; the syntax for using it,
and where it come from:
help(mean)
mean(a, b)
Refine the above function solve_quadratic for solving quadratics, and make it robust; handling all pos-
sible input triples of real numbers in a “reasonable” way. (If you have done Exercise C of the section
{doc}(decisions-with-if-else-elif, this is essentially just integrating that code into a function.)
Not all choices of those coefficients will give two distinct real roots, so work out all the possibilities, and try to handle
them all.
1. Its input arguments are three real (floating point) numbers, giving the coefficients 𝑎, 𝑏, and 𝑐 of the quadratic
equation $ax^2 + bx + c = 0.
2. It always returns a result (usually a pair of numerical values for the roots) for any “genuine” quadratic equation,
meaning one with 𝑎 ≠ 0.
3. If the quadratic has real roots, these are output with a return statement — no print commands in the function.
4. In cases where it is not a genuine quadratic, or there are no real roots, return the special value None.
5. As an optional further refinement: in the “prohibited” case 𝑎 = 0, have it produce a custom error message, by
“raising an exception”. We will learn more about handling “exceptional” situations later, but for now you could just
use the command:
raise ZeroDivisionError("The coefficient 'a' of x^2 cannot have the value zero.")
or
raise ValueError("The coefficient 'a' of x^2 cannot have the value zero.")
Ultimately put this in a cell in a Jupyter notebook (suggested name: “exercises_on_functions.ipynb”); if you prefer to
develop it with Spyder, I suggest the filename “quadratic_solver_c.py”
23.6.1 Testing
Test and demonstrate this function with a list of test cases, including:
1. 2𝑥2 − 10𝑥 + 8 = 0
2. 𝑥2 − 2𝑥 + 1 = 0
3. 𝑥2 + 2 = 0
4. 𝑥2 + 6𝑥 + 25 = 0
5. 4𝑥 − 10 = 0
Sometimes a function has numerous input arguments, and then it might be hard to remember what order they go in.
Even with just a few arguments, there can be room for confusion; for example, in the above function
solve_quadratic do we give the coefficients in order c_2, c_1, c_0 as for 𝑐2 𝑥2 + 𝑐1 𝑥 + 𝑐0 = 0, or in
order c_0, c_1, c_2 as for 𝑐0 + 𝑐1 𝑥 + 𝑐2 𝑥2 = 0?
To improve readability and help avoid errors, Python has a nice optional feature of specifying input arguments by name;
they are then called keyword arguments, and can be given in any order.
For example:
(2.0, -5.0)
When you are specifying the parameters by name, there is no need to have them in any particular order. For example, if
you like to write polynomials “from the bottom up”, as with −10 + 3𝑥 + 𝑥2 , which is 𝑐0 + 𝑐1 𝑥 + 𝑐2 𝑥2 , you could do this:
(2.0, -5.0)
In mathematical computing, we often wish to define a (Python) function that does something with a (mathematical)
function. A simple example is implementing the basic difference quotient approximation of the derivative
𝑓(𝑥 + ℎ) − 𝑓(𝑥)
𝑓 ′ (𝑥) = 𝐷𝑓(𝑥) ≈
ℎ
with a function Df_approximation, whose input will include the function 𝑓 as well as the two numbers 𝑥 and ℎ.
Python makes this fairly easy, since Python functions, like numbers, can be the values of variables, and given as input to
other functions in the same way: in fact the statement
def p(x):
return 2*x**2 - 10*x + 8
x0 = 1
h = 1e-4
A bit more about keyword arguments: they can be mixed with positional arguments, but once an argument is given in
keyword forms all later ones must be also. Thus it works to do this:
Input In [27]
Dfxh = Df_approximation(p, x=2, 0.000001)
^
SyntaxError: positional argument follows keyword argument
Sometimes it makes sense for a function to have default values for arguments, so that not all argument values need to be
specified. For example, the value ℎ = 10−8 is in some sense close to “ideal”, so let us make that the default, by giving h
a “suggested” value as part of the function’s (new, improved) definition:
The value for input argument h can now optionally be omitted when the function is used, getting the same result as before:
big_h = 0.01
Df_x0_h = Df_approximation(p, x0, big_h)
print(f'Using h={big_h}, Df({x0}) is approximately {Df_x0_h}')
23.9.1 Arguments with default values must come after all others in the def
When default values are given for some arguments but not all, these must appear in the function definition after all the
arguments without default values, as is the case with h=1e-8 above.
Note: Some students might be interested in “anonymous functions”, also known as “lambda functions”, so here is a brief
introduction. However, this topic is not needed for this course; it is only a convenience, and if you are new to computer
programming, I suggest that you skip this section for now.
One inconvenience in the above example with Df_approximation is that we had to first put the values of each input
argument into three variables. Sometimes we would rather skip that step, and indeed we have seen that we could put the
numerical argument values in directly:
However, we still needed to define the function first, and give it a name, p.
If the function is only ever used this one time, we can avoid this, specifying the function directly as an input argument
value to the function Df_approximation, without first naming it.
This is done with what is called an anonymous function, or for mysterious historical reasons, a lambda function.
For the example above, we can do this:
We can even do it all in a single line by composing two functions, print and Df_approximation:
creates a function that is mathematically the same as function p above; it just has no name.
In general, the form is a single-line expression with four elements:
• It starts with lambda
• next is a list of input argument names, separated by commas if there are more than one (but no parentheses!?)
• then a colon
• and finally, a formula involving the input variables.
We can, if we want, assign a lambda function to a variable, so we could have defined p as
though I am not sure if that has any advantage over doing this with def:
As an example of that, and also of having a lambda function that returns multiple values, here is yet another quadratic
equation solver:
Anonymous functions have most of the fancy features of functions created with def, with the big exception that they
must be defined on a single line. For example, they also allow the use of keyword arguments, allowing the input argument
values to be specified by keyword in any order. It is also possible to give default values to some arguments at the end of
the argument list.
To show off a few of these refinements:
(4.0, 1.0)
Df_approximation(sqrt, 4.0)
0.24999997627617176
0.249843945007866
TWENTYFOUR
24.1 Introduction
The last fundamental tool for describing algorithms is iteration or “looping”: tasks that repeat the same sequence of actions
repeatedly, with possible variation like using different input values at each repetition.
In Python — as in most programming languages — there are two versions:
• when the number of iterations to be done is determined before we start — done with for loops;
• when we must decide “on the fly” whether we are finished by checking some conditions as part of each repetition
— done with while loops.
This Unit covers the first case, of for loops; the more flexible while loops will be introduced in thssection on Iteration
with while.
We can apply the same sequence of commands for each of a list of values by using a for statement, followed by an
indented list of statements.
24.2.1 Example A
We can compute the square roots of several numbers. Here I give those numbers in a tuple (i.e., in parentheses). They
could just as well be in a list [i.e., in brackets.]
import math
379
Introduction to Numerical Methods and Analysis with Python
One very common choice for the values to use in a for loop is a range of consecutive integers, or more generally, equally
spaced integers. The first n natural numbers are given by range(n) — remember that for Python, the natural numbers
start with 0, so this range is the semi-open interval of integers [0, 𝑛) = {𝑖 ∶ 0 ≤ 𝑖 < 𝑛}, and so 𝑛 is the first value not in
the range!
24.3.1 Example B
for n in range(8):
if n % 4 == 0:
print(f'{n} is a multiple of 4')
elif n % 2 == 0:
# A multiple of 2 but not of 4, or we would have stopped with the above "match
↪".
0 is a multiple of 4
1 is odd
2 is an odd multiple of 2
3 is odd
4 is a multiple of 4
5 is odd
6 is an odd multiple of 2
7 is odd
range(m, n)
Again, the terminal value n is the first value not in the range, so this gives [𝑚, 𝑛) = {𝑖 ∶ 𝑚 ≤ 𝑖 < 𝑛}
24.4.1 Example C
10 cubed is 1000
11 cubed is 1331
12 cubed is 1728
13 cubed is 2197
14 cubed is 2744
The final, most general use of the function range is generating integers with equal spacing other than 1, by giving it three
arguments:
24.5.1 Example D
2^0 = 1
2^2 = 4
2^4 = 16
2^6 = 64
2^8 = 256
Even though the first argument has the “default” of zero, I did not omit it here: what would happen if I did so?
We sometimes want to count down, which is done by using a negative value for the increment in function range.
24.6.1 Example E
Before running the following code, work out what it will do — this might be a bit surprising at first.
10 seconds
9 seconds
8 seconds
7 seconds
6 seconds
5 seconds
4 seconds
3 seconds
2 seconds
1 seconds
I will illustrate two methods; with and without using an array to store the values.
"""
Method 1: producing a numpy array of the factorials after displaying them one at a␣
↪time.
We need to create an array of the desired length before we compute the values that go␣
↪into it,
import numpy
factorials[0] = 1
print(f"0! = {factorials[0]}")
for i in range(1,n):
factorials[i] = factorials[i-1] * i
print(f"{i}! = {factorials[i]}")
print(f"The whole array is {factorials}")
0! = 1
1! = 1
2! = 2
3! = 6
4! = 24
The whole array is [ 1 1 2 6 24]
Note that if we just want to print them one at a time inside the for loop, we do not need the array; we can just keep
track of the most recent value:
"""
Method 2: storing just the most recent value[s] needed.
"""
i_factorial = 1
print(f"0! = {i_factorial}")
for i in range(1,n):
i_factorial *= i # Note: this "*=" notation gives a short-hand for "i_factorial␣
↪= i_factorial * i"
print(f"{i}! = {i_factorial}")
0! = 1
1! = 1
2! = 2
3! = 6
4! = 24
`i_factorial *= i`
means
The same pattern works for many arithmetic operators, so that for example
sum += f(x)*h
means
you might have to look carefully to check that the array reference on each side is the same, whereas that is made clear by
saying
Exercise C
Write a Python function that inputs a natural number 𝑛, and with the help of a for loop, computes and prints the first 𝑛
Fibonacci numbers. Note that these are defined by the formulas
𝐹0 = 1
𝐹1 = 1
𝐹𝑖 = 𝐹𝑖−1 + 𝐹𝑖−2 for 𝑖 ≥ 2
For now it is fine for your function to deliver its output to the screen with function print() and not use any return
line; we will come back to this issue below.
Follow method 2 above, without the need for an array.
Plan before you code. Before you create any Python code, work out the mathematical, algorithmic details of this process
and write down your plan in a mix of words and mathematical notation — and then check that with me before you proceed.
My guideline here is that this initial written description should make sense even to someone who knows little or nothing
about Python or any other programming language.
One issue in particular is how to deal with the first two values; the ones not given by the recursion formula 𝐹𝑖 = 𝐹𝑖−1 +
𝐹𝑖−2 . In fact, this initialization is often an important first step to deal with when designing an iterative algorithm.
TWENTYFIVE
25.1 Introduction
The last fundamental tool for describing algorithms is iteration or “looping”: tasks that repeat the same sequence of actions
repeatedly, with possible variation like using different input values at each repetition.
The section on Iteration with for covered the easier case where the number of iterations to be done is determined before
we start; now we consider the case where we must decide “on the fly” whether the iteration is finished, by checking some
conditions as part of each repetition; this is usualy done with while loops.
Often, calculating numerical approximate solutions follows a pattern of iterative improvement, like
1. Get an initial approximation.
2. Use the best current approximation to compute a new, hopefully better one.
3. Check the accuracy of this new approximation.
4. If the new approximation is good enough, stop — otherwise, repeat from step 2.
For this, a while loop can be used. Its general meaning is:
387
Introduction to Numerical Methods and Analysis with Python
We are now ready for illustrations that do something more mathematically substantial: computing cube roots using only
a modest amount of basic arithmetic. For now this is just offered as an example of programming methods, and the rapid
success might be mysterious, but is explained in a numerical methods course like Math 245. Also, the phrase “backward
error” should be familiar to students of numerical methods.
Note how the backward error allows us to check accuracy without relying on the fact that — in this easy case — we
already know the answer. Change from ‘a=8’ to ‘a=20’ to see the advantage!
# The answer "root" should satisfy root**3 - a = 0, so check how close we are:
while abs(root**3 - a) > error_tolerance:
root = (2*root + a/root**2)/3
print(f'The new approximation is {root:20.16g}, with backward error of
↪{abs(root**3 - a):e}')
print('Done!')
print(f'The cube root of {a:g} is approximately {root:20.16g}')
print(f'The backward error in this approximation is {abs(root**3 - a):.2e}')
Aside D: I have thrown in some more refinements of output format control, “:20.16g”, “:e” and “:.2e”. If you are curious,
you could try to work out what they do from these examples, or read up on this, for example in the notes on formatted
output But that is not essential, at least for now.
As a variant of Example F: The first n factorials in the previous section, if we want to compute the all factorials that are
less than N, we do not know in advance how many there are, which is a problem with a for loop.
Thus, in place of the for loop used there, we can do this:
N = 1000
"""
Compute and print all factorials less than N
"""
i = 0
i_factorial= 1
print(f"{i}! = {i_factorial}")
while i_factorial < N:
i += 1
i_factorial *= i
if i_factorial < N: # We have to check again, in case the latest value overshoots
print(f"{i}! = {i_factorial}")
0! = 1
1! = 1
2! = 2
3! = 6
4! = 24
5! = 120
6! = 720
If we want to store all the values, we cannot create an array of the correct length in advance, as was done in Example F.
This is one place where Python lists have an advantage over Numpy arrays; lists can be extended incrementally. Also, the
way we do this introduces a new kind of Python programming tool: a method for transforming an object.
In an exercise like the above, it might be nice to accumulate a list of all the results, but the number of them is not known
in advance, so the array creation strategy seen in Example F cannot be used.
This is one place where Python lists have an advantage over Numpy arrays; lists can be extended incrementally. Also, the
way we do this introduces a new kind of Python programming tool: a method for transforming an object. The general
syntax for methods is
object.method(...)
which has the effect of transforming the object, and can take a tuple of arguments, or none. Thus, it is sort of like
25.3. Appending to lists, and our first use of Python methods 389
Introduction to Numerical Methods and Analysis with Python
We start with an empty list and then append values with the method .append().
[]
[2]
[2, 3]
25.3.2 Example D: Storing a list of the values of the factorials the factorials less
than N
Now we use this new list manipulation tool to create the desired list of factorial values: creating a list of all values 𝑖! with
𝑖! < 𝑁 .
"""
Collecting a Python list of all the factorials less than N.
"""
factorials = [] # Start with an empty list
i = 0
i_factorial = 1
print(f"{i}! = {i_factorial}")
factorials.append(i_factorial)
while i_factorial < N:
i += 1
i_factorial *= i
if i_factorial < N: # We have to check again, in case the latest value overshoots
print(f"{i}! = {i_factorial}")
factorials.append(i_factorial)
print()
print(f"The list of all factorials less that {N} is {factorials}")
0! = 1
1! = 1
2! = 2
3! = 6
4! = 24
5! = 120
6! = 720
The list of all factorials less that 1000 is [1, 1, 2, 6, 24, 120, 720]
factorials = numpy.array(factorials)
Write a Python function that inputs a natural number 𝑁 , and with the help of a while loop, computes and prints in turn
each Fibonacci number less than or equal to 𝑁 .
For now the values are only printed, and so one does not need to store them all; only a couple of the most recent ones.
Note well; this is all 𝐹𝑖 ≤ 𝑁 , not the Fibonacci numbers up to 𝐹𝑁 . Thus we do not know how many there are initially:
this is the scenario where while loops are more natural than for loops.
Written planning. Again, start by working out and writing down your mathematical plan, and check it with me before
creating any Python code.
25.3.4 Exercise B: all output via a return statement; no print to screen in the
function
Modify your function from the previous exercise to cumulate the needed Fibonacci numbers in a Python list, and return
this list. This time, your function itself should not print anything: instead, your file will display the results with a single
print function after invoking the function.
NOTE: This approach of separating the calculations in a function from subsequent display of results is the main way
that we will arrange things from now on.
25.3. Appending to lists, and our first use of Python methods 391
Introduction to Numerical Methods and Analysis with Python
TWENTYSIX
26.1 Introduction
To work with files of Python code and create moduels fo ue from witin notebooks, it is convenient to use a software tool
that supports creating and editing such files and then running the code: a so-called Integrated Development Environment,
or “IDE”.
For that task, these notes describe the use of Spyder, [from “Scientic PYthon Development enviRonment] which is
included in Anaconda and can be opened from the Anaconda Navigator. However there are several other widely used
alternatives, such as the fairly rudimentary IDLE and the more advanced PyCharm One reon for peferring Spyder over
IDLE is that Spyder integrates the “Scientific Python” packages Numpy, Matlpotlib and Scipy that we will be using.
393
Introduction to Numerical Methods and Analysis with Python
To start learning the use of the Spyder IDE, we will reproduce Exercise A of the section on functions.
1. Open Spyder (from Anaconda-Navigator).
2. In Spyder, create a new file, using menu “File”, item “New File …”.
3. Save this file into the folder numerical-methods” (or whatever you named it) for this course, using the name “exer-
ciseA.py”.
4. Sypder puts some stuff in the newly created “empty” file, in particular a “triple quoted comment” near the top. Edit
this comment to include a title, your name, the date and some decription, and maybe some contact information.
Mine would look something like:
""" For Exercise A @author: Brenton LeMesurier <[email protected]> Last
revised January 0, 1970 """
This special triple quoting notation is used for introductory comments and documention for a file, as well as for
functions; we will see one nice use of this soon, with the function help.
5. Immediately after these opening comments, copy in any import statements needed by the function
solve_quadratic in Exercise 3A.
Note: It is good practice for all imports to be near the top of a code file or notebook, straight after any introductory text.
6. Copy the code for the function solve_quadratic developed in Exercise 3A into this file, below the import
sttemwnt (if any).
7. Then copy in the code for the test cases after the function definition.
8. Run the code in the folder, using the “Run” button atop Spyder (it looks like a triangular “Play” icon).
Aside: For more substantial Python code development, it can be better to first develop the code in a file like this, using
Spyder (or another IDE), and then copy the working code into a notebook for presentation; that allows you to make use
of more powerful testing and debugging tools.
On the other hand, there is a potential drawback with putting the function definition and test cases in one Python file: if
an error (“exception”) occurs for any test case, the rest of the file is not executed. For now, avoid this by having at most
one such exceptional case, and putting it last.
We will later learn several better ways of handling this sort of situation — one seen already is the above approach of
testing from separate cells in a notebook.
26.2.2 Exercise B. using code from a Python code file with command import
Functions (and variables) defined in a code file can be used from a notebook or from another code file by importing:
effectively we have created a module exercise_A, and can access the function solve_quadratic defined there
from a notebook by using the command from exercise_A import solve_quadratic
Note that the base-name of the file is now being used as a variable name — that is why code file names should follow the
same naming rules as variable names.
To experiment a bit more with defining and using your own modules:
1. Create a notebook modules.ipynb — you can cut and paste from this notebook, but leave out irrelevant stuff:
copy mainly the headings and the statements of exercises.
2. Make a copy of the file exercise_A.py named functions.py.
394 Chapter 26. Code Files, Modules, and an Integrated Development Environment
Introduction to Numerical Methods and Analysis with Python
3. In that file functions_exercise.py, remove the test cases for Exercise A, and add the definition of function
Df_CD_approximation that was created in Exercise A of the section Functions. So now this new file defines
a module functions which just defines these functions, not running any test cases using them or producing any
output when run.
4. In notebook modules.ipynb, import these two functions, and below the import command, copy in the test
case cells used in section Functions for both Exercises A and B.
5. Run the noteboook.
This combination of notebook with importing from a module can give a notebook presentation that is more concise and
less cluttered; this will be a distict advantage at times later in the course where the collection of function definitions is far
longer.
26.2.3 Exercise C. Start building a module of functions you create in this course.
Make a copy of the module file modules.py with a name like numerical_methods.py.
As the course progresses, you will create some more useful functions, such as one for solving equations using Newton’s
method: gather those in this module numerical_methods.
396 Chapter 26. Code Files, Modules, and an Integrated Development Environment
CHAPTER
TWENTYSEVEN
27.1 Introduction
Although we have now seen all the essential tools for describing mathematical calculations and working with functions,
there is one more algorithmic tool that can be very convenient, and is also of fundamental importance in the study of
functions in both mathematics and theoretical computer science.
This is recursively defined functions: where the definition of the function refers to other values of the same function.
To avoid infinite loops, the other values are typically for input values that are in some sense “earlier” or “smaller”, such
as lower values of a natural number.
We have already seen two familiar examples in Unit 5; the factorial:
0! = 1
𝑛! = 𝑛 ⋅ (𝑛 − 1)! for 𝑛 ≥ 1
and the Fibonacci numbers
𝐹0 = 1
𝐹1 = 1
𝐹𝑛 = 𝐹𝑛−1 + 𝐹𝑛−2 for 𝑛 ≥ 2
These can be implemented using iteration, as seen in Unit 5; here are two versions:
def factorial_iterative(n):
n_factorial = 1
for i in range(2, n+1): # n+1 to include the factor n
n_factorial *= i
return n_factorial
n = 5
print(f'{n}! = {factorial_iterative(n)}')
print('Test some edge cases:')
print('0!=', factorial_iterative(0))
print('1!=', factorial_iterative(1))
397
Introduction to Numerical Methods and Analysis with Python
5! = 120
Test some edge cases:
0!= 1
1!= 1
def first_n_factorials(n):
factorials = [1]
for i in range(1, n):
factorials.append(factorials[-1]*i)
return factorials
n = 10
print(f'The first {n} factorials (so up to {n-1}!) are {first_n_factorials(n)}')
The first 10 factorials (so up to 9!) are [1, 1, 2, 6, 24, 120, 720, 5040, 40320,␣
↪362880]
However we can also use a form that is closer to the mathematical statement.
First, let us put the factorial definition in more standard mathematical notation for functions
𝑓(0) = 1
𝑓(𝑛) = 𝑛 ⋅ 𝑓(𝑛 − 1) for 𝑛 ≥ 1
Next to make it more algorithmic and start the segue towards Python code, distinguish the two cases with an if
if n = 0:
𝑓(𝑛) = 1
else:
𝑓(𝑛) = 𝑛 × 𝑓(𝑛 − 1)
Here that is in Python code:
def factorial_recursive(n):
if n == 0:
return 1
else:
return n * factorial_recursive(n-1)
n = 9
print(f'{n}! = {factorial_recursive(n)}')
9! = 362880
Yes, Python functions are allowed to call themselves, though one must beware that this could lead to an infinite loop.
27.3.1 Exercise A
Can you see how you could cause that problem with this function?
Try, and you will see that Python has some defences against this kind of infinite loop.
It can be illuminating to trace the steps with some extra print commands:
n = 5
nfactorial = factorial_recursive_with_tracing(n, trace=True)
print('The final result is', nfactorial)
n = 5
n = 4
n = 3
n = 2
n = 1
n = 0
0! = 1
1! = 1
2! = 2
3! = 6
4! = 24
5! = 120
The final result is 120
Experiment: Try sending this tracing version into an infinite loop. (In JupyterLab, note the square “stop” button to the
right of the triangular “play” button!)
Write and test a recursive function to evaluate the 𝑛-th Fibonacci number 𝐹𝑛 , mimicking the first, simplest recursive
version for the factorial above.
Do this without using either lists, arrays, or special variables holding the two previous values!
If you have access to Spyder (or another Python IDE) develop and test this in a Python file “exercise7b.py” and submit
that as well as a final notebook for this unit.
Test for 𝑛 at least 5.
Write and test a second recursive function to evaluate the first 𝑛 Fibonacci numbers, adding the option of tracing output,
as in the second recursive example above.
Again test for 𝑛 at least 5.
Again, develop and test this in a Python file “exercise7c.py” initially if you have access to a suitable IDE.
Comment on why this illustrates that, although recursive implementations can be very concise and elegant, they are
sometimes very inefficient compared to expressing the calculation as an iteration with for or while loops.
If you have not done so already, put all your code for the above exercises into a single Jupyter notebook.
Make sure that, like all documents produced in this course, the notebook and any other files submitted have an appropriate
title, your name, and the correct date at the top!
Note the way I express the date as “Last modified”; keep this up-to-date when you revise.
Some recursive algorithms are so-called tail recursive, which means that when a function calls itself, the “calling” invo-
cation of the function has nothing more to do; the task is handed off entirely to the new invocation. This means that it
can be possible to “clean up” by getting rid of all memory and such associated with the calling invocation of the function,
eliminating that nesting seen in the above tracing and potentially improving efficiency by a lot.
Some programming languages do this “clean up” of so-called “tail calls”; indeed functional progamming languages forbid
variables to have their values changed within a function (so that functions in such a language are far more like real
mathematical functions), and this rules out many while loop algorithms, like those above. Then recursion is a central
tool, and there is a high piority on implementing recursion in this efficient way.
For example, here is a tail recursive approach to the factorial:
def factorial_tail_recursive(n):
'''For convenience, we wrap the actual "working" function inside one with simpler␣
↪input:
'''
def tail_factorial(result_so_far, n):
print(f'result_so_far = {result_so_far}, n = {n}')
if n == 0:
return result_so_far
else:
return tail_factorial(result_so_far*n, n-1)
result_so_far = 1
return tail_factorial(result_so_far, n)
n = 9
print(f'factorial_tail_recursive gives {n}! = {factorial_tail_recursive(n)}')
print('\nFor comparison,')
print(f'factorial_recursive gives {n}! = {factorial_recursive(n)}')
print(f'factorial_iterative gives {n}! = {factorial_iterative(n)}')
result_so_far = 1, n = 9
result_so_far = 9, n = 8
result_so_far = 72, n = 7
result_so_far = 504, n = 6
result_so_far = 3024, n = 5
result_so_far = 15120, n = 4
result_so_far = 60480, n = 3
result_so_far = 181440, n = 2
result_so_far = 362880, n = 1
result_so_far = 362880, n = 0
factorial_tail_recursive gives 9! = 362880
For comparison,
factorial_recursive gives 9! = 362880
factorial_iterative gives 9! = 362880
However, tail recursion is in general equivalent to iteration with a while loop, with the input and output of the tail
recursive function instead being variables that are updated in the loop. Thus it is mostly a matter of preference as to how
one expresses the algorithm.
For example, the above can be rather straightforwardly translated to the following:
TWENTYEIGHT
Numerical data is often presented with graphs, and the tools we use for this come from the module matplotlib.
pyplot which is part of the Python package matplotlib. (A Python package is essentially a module that also
contains other modules.)
Matplotlib is a huge collection of graphics tools, of which we see just a few here. For more information, the home site
for Matplotlib is http://matplotlib.org and the section on pyplot is at http://matplotlib.org/1.3.1/api/pyplot_api.html
However, another site that I find easier as an introduction is https://scipy-lectures.org/intro/matplotlib/
In fact, that whole site https://scipy-lectures.org/ is quite useful a a reference on Python, Numpy, and so on.
Note: the descriptions here are for now about working in notebooks: see the note below on differences when using Spyder
and IPython.
In a notebook, we can choose between having the figures produced by Matplotlib appear “inline” (that is, within the
notebook window) or in separate windows. For now we will use the inline option, which is the default, but can also be
specified explicitly with the command
%matplotlib inline
To activate that, uncomment the line below; that is, remove the leading hash character “#”
%matplotlib inline
This is an IPython magic command, indicated by starting with the percent character “%” — you can read more about
them at https://ipython.org/ipython-doc/dev/interactive/magics.html
Alternatively, one can have figures appear in separate windows, which might be useful when you want to save them to
files, or zoom and pan around the image. That can be chosen with the magic command
403
Introduction to Numerical Methods and Analysis with Python
%matplotlib tk
#%matplotlib tk
As far as I know, this magic works for Windows and Linux as well as Mac OS; let me know if it does not!
We need some Numpy stuff, for example to create arrays of numbers to plot.
Note that this is Numpy only: Python lists and tuples do not work for this, and nor do the versions of functions like sin
from module math!
# Import a few favorites, and let them be known by their first names:
from numpy import linspace, sin, cos, pi
And for now, just the one main matplotlib graphics function, plot
To plot the graph of a function, we first need a collection of values for the abscissa (horizontal axis). The function linspace
gives an array containing a specified number of equally spaced values over a specified interval, so that
Not quite what you expected? To get values with ten intervals in between them, you need 11 values:
We could use these 11 values to graph a function, but the result is a bit rough, because the given points are joined with
straight line segments:
plot(tenintervals, sin(tenintervals))
[<matplotlib.lines.Line2D at 0x7fa69a063d30>]
Here we see the default behavior of joining the given points with straight lines.
Aside: That text output above the graph is a message returned as the output value of function plot; that is what happens
when you execute a function but do not “use” its return value by either saving its result into a variable or making it input
to another function.
You might want to suppress that, and that can be done by saving its return value into a variable (which you can then
ignore).
It turns out that 50 points is often a good choice for a smooth-looking curve, so the function linspace has this as a default
input parameter: you can omit that third input value, and get 50 points.
Let’s use this to plot some trig. functions.
x = linspace(-pi, pi)
print(x)
As we have seen when using plot to produce inline figures in a Jupyter notebook, plot commands in different cells
produce separate figures.
To combine curves on a single graph, one way is to use successive plot commands within the same cell:
On the other hand, when plotting externally, or from a Python script file or the IPython command line, successive plot
commands keep adding to the same figure until you explicitly specify otherwise, with the function figure introduced
below.
Aside on message clean up: a Juypter cell only displays the output of the last function invoked in the cell (along with anything
explicitly output with a print function), so I only needed to intercept the message from the last plot command.
Several curves can be specified in a single plot command (which also works with external figure windows of course.)
Note that even with multiple curves in a single plot command, markers can be specified on some, none or all: Matplotlib
uses the difference between an array and a text string to recognize which arguments specify markers instead of data.
Here are some other marker options — particularly useful if you need to print in back-and-white.
x = linspace(-1,1)
plotmessage = plot(x, x, x, x**2, x, x**3, x, x**4, x, x**5, x, x**6, x, x**7)
With enough curves (more than ten? It depends on the version of matplotlib in use) the color sequence eventually repeats
– but you probably don’t want that many curves on one graph.
x = linspace(-1,1)
plotmessage = plot(x, x, x, x**2, x, x**3, x, x**4, x, x**5,
x, x**6, x, x**7, x, x**8, x, x**9, x, x**10,
x, -x)
Aside on long lines of code: The above illustrates a little Python coding hack: one way to have a long command continue
over several lines is simply to have parentheses wrapped around the part that spans multiple lines—when a line ends with
an opening parenthesis not yet matched, Python knowns that something is still to come.
28.9.1 Aside: using IPython magic commands in Spyder and with the IPython com-
mand line
If using Spyder and the IPython command line, there is a similar choice of where graphs appear, but with a few differences
to note:
• With the “inline” option (which is again the default) figures then appear in a pane within the Spyder window.
• The “tk” option works exactly as with notebooks, with each figure appearing in its own window.
• Note: Any such IPython magic commands must be entered at the IPython interactive command line, not in a
Python code file.
A curve can also be specified by a single array of numbers: these are taken as the values of a sequence, indexed Pythonically
from zero, and plotted as the ordinates (vertical values):
From within a single Jupyter cell, or when working with Python files or in the IPython command window (as used within
Spyder), successive plot commands keep adding to the previous figure. To instead start the next plot in a separate
figure, first create a new “empty” figure, with the function matplotlib.pyplot.figure.
With a full name as long as that, it is worth importing so that it can be used on a first name basis:
x = linspace(0, 2*pi)
plot(x, sin(x))
figure()
plotmessage = plot(x, cos(x), 'o')
The figure command can also do other things, like attach a name or number to a figure when it is displayed externally,
and change from the default size.
So even though this is not always needed in a notebook, from now on each new figure will get an explicit figure
command. Revisiting the last example:
x = linspace(0, 2*pi)
figure(99)
# What does 99 do?
# See with external "tk" display of figures,
# as with `%matplotlib tk`
(continues on next page)
Curves can be decorated in different ways. We have already seen some options, and there are many more. One can specify
the color, line styles like dashed or dash-dot instead of solid, many different markers, and to have both markers and lines.
As seen above, this can be controlled by an optional text string argument after the arrays of data for a curve:
figure()
plot(x, sin(x), '*-')
plotmessage = plot(x, cos(x), 'r--')
These three-part curve specifications can be combined: in the following, plot knows that there are two curves each
specified by three arguments, not three curves each specified by just an “x-y” pair:
figure()
plotmessage = plot(x, sin(x), 'g-.', x, cos(x), 'm+-.')
28.13 Exercises
There are many commands for refining the appearance of a figure after its initial creation with plot. Experiment yourself
with the commands title, xlabel, ylabel, grid, and legend.
Using the functions mentioned above, produce a refined version of the above sine and cosine graph, with:
• a title at the top
• labels on both axes
• a legend identifying each curve
• a grid or “graph paper” background, to make it easier to judge details like where a function has zeros.
Then work out how to save this figure to a file (probably in format PNG), and turn that in, along with the file used to
create it.
This ismost readily done with externally displayed figures; that is, with %matplotlib tk. Making that change to tk
in a notebook requires then restarting the kernel for it to take effect; use the menu Kernel above and select “Restart Kernel
and Run All Cells …*
For your own edification, explore other features of externally displayed figures, like zooming and panning: this cannot be
done with inline figures.
For some of these, you will probably need to read up. For simple things, there is a function help, which is best used in
the IPython interactive input window (within Spyder for example), but I will illustrate it here.
The entry for plot is unusually long! It provides details about all the options mentioned above, like marker styles. So
this might be a good time to learn how to clear the output in a cell, to unclutter the view: either use the above menu “Edit*
or open the menu with Control-click or right-click on the code cell; then use “Clear Outputs” to remove the output of just
the current cell.
help(plot)
The jargon used in help can be confusing at first; fortunately there are other online sources that are more readable and
better illustrated, like http://scipy-lectures.github.io/intro/matplotlib/matplotlib.html mentioned above.
However, that does not cover everything; the official pyplot documentation at http://matplotlib.org/1.3.1/api/pyplot_api.
html is more complete: explore its search feature.
So far I have encourage you to use explicit, specific import commands, because this is good practice when developing
larger programs. However, for quick interactive work in the IPython command window and Jupyter notebooks, there is
a sometimes useful shortcut: the IPython “magic” command
%pylab
adds everything from Numpy and the main parts of Matplotlib, including all the items imported above. (This creates the
so-called pylab environment: that name combines “Python” with “Matlab”, as its goal is to produce an environment very
similar to Matlab.)
Note that such “magic” commands are part of the IPython interactive interface, not Python language commands, so they
must be used either in a IPython notebook or in the IPython command window (within Spyder), not in a Python “.py”
file.
However, there is a way to access magics in python scripts; the above can be achieved in such a file with:
get_ipython().run_line_magic('pylab', '')
%matplotlib inline
is achieved with
get_ipython().run_line_magic('matplotlib', 'inline')
TWENTYNINE
Note: We will see more tools for linear algebra in the section on Scipy Tools, which introduces the package Scipy.
As of version 3.5, Python can handle matrix multiplication on Numpy arrays, using the at sign “@”:
C = A @ B
D = A * B
import numpy as np
print(f"A is\n{A}")
print(f"B is\n{B}")
print(f"C is\n{C}")
419
Introduction to Numerical Methods and Analysis with Python
A is
[[1 2 3]
[4 5 6]]
B is
[[1 2]
[3 4]
[5 6]]
C is
[[10 9 8]
[ 7 6 5]]
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-4-3bb26336abf9> in <module>
----> 1 print(f"The array product A * B fails:\n{A * B}")
ValueError: operands could not be broadcast together with shapes (2,3) (3,2)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-6-55b834c682c2> in <module>
----> 1 print(f"Matrix product A times C fails:\n{A @ C}")
ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with␣
↪gufunc signature (n?,k),(k,m?)->(n?,m?) (size 2 is different from 3)
print(f"A-transpose is\n{A.T}")
A-transpose is
[[1 4]
[2 5]
[3 6]]
This works with Python lists and Numpy arrays, and we have seen some of it before; I review it here because it will help
with doing the row operations of linear algebra.
print(f"A:\n{A}")
print(f"The column of index 1 (presented as a row vector): {A[:,1]}")
print(f"The row of index 1: {A[1,:]}")
print(f"The first 2 elements of the row of index 1: {A[1,:2]}")
print(f"Another way to say the above: {A[1,0:2]}")
print(f"The bottom-right element: {A[-1,-1]}")
print(f"The 2x2 sub-matrix in the bottom-right corner:\n{A[-2:,-2:]}")
A:
[[1 2 3]
[4 5 6]]
The column of index 1 (presented as a row vector): [2 5]
The row of index 1: [4 5 6]
The first 2 elements of the row of index 1: [4 5]
Another way to say the above: [4 5]
The bottom-right element: 6
The 2x2 sub-matrix in the bottom-right corner:
[[2 3]
[5 6]]
29.2.2 Synonyms for array names with the equal sign (not copying!)
If we use the equal sign between two array names, that makes them synonyms, referring to the same values:
A is
[[1 2 3]
[4 5 6]]
Anickname is
[[1 2 3]
[4 5 6]]
Anickname is now
[[12 2 3]
[ 4 5 6]]
and so is A!:
[[12 2 3]
[ 4 5 6]]
Thus if we want a separate new array or list with the same elements initially, we must make a copy with the method
.copy(), not the equal sign:
A is
[[1 2 3]
[4 5 6]]
Acopy is
[[1 2 3]
[4 5 6]]
Acopy is now
[[54 2 3]
[ 4 5 6]]
A is still
[[1 2 3]
[4 5 6]]
Exercise A
Create a Numpy array (not a Numpy matrix; those are now mostly obsolete!) containing the matrix
4. 2. 1.
𝐴=⎡
⎢ 9. 3. 1. ⎤
⎥
⎣ 25. 5. 1. ⎦
Exercise B
Create arrays 𝑐 and 𝑑 containing respectively the last row of 𝐴 and the middle column of 𝐴.
Note: Do this by manipulating the array A with indexing and slicing operations, without entering any numerical values
for array entries explicity.
29.2. Slicing: Extracting rows, columns, and other rectangular chunks from matrices 423
Introduction to Numerical Methods and Analysis with Python
THIRTY
30.1 Introduction
The package package Scipy provides a a great array of funtions for scinetific computing; her we wil just exlore one part of
it: some additional tools for linear algebra from module linalg within the package Scipy. This provides tools for solving
simultaneous linear equations, for variations on the LU factorization seen in a numerical methods course, and much more.
This module has the standard standard nickname la, so import it using that:
import scipy.linalg as la
SciPy usually needs stuff from NumPy, so let’s import that also:
import numpy as np
30.1.1 Exercise C
4. 2. 1.
𝐴=⎡
⎢ 9. 3. 1. ⎤
⎥
⎣ 25. 5. 1. ⎦
and
425
Introduction to Numerical Methods and Analysis with Python
30.1.2 Exercise D
Exercise D-bonus
Check further by computing the maximum error, or maximum norm, or infinity norm of this: ‖𝑟‖∞ = ‖𝐴𝑥 − 𝑏‖∞ ;
that is, the maximum of the absolute values of the elements of 𝑟, max𝑛𝑖=1 |𝑟𝑖 |.
30.1.3 Exercise E
Next use the Scipy function lu to compute the 𝑃 𝐴 = 𝐿𝑈 factorization of 𝐴. Check you work by verifying that:
1. 𝑃 is a permutation matrix (a one in each row and column; the rest all zeros)
2. 𝑃 𝐴 is got by permutating the rows of 𝐴
3. 𝐿 is (unit) lower triangular (all zeros above the main diagonal; ones on the main diagonal)
4. 𝑈 is upper triangular (all zeros below the main diagonal)
5. The products 𝑃 𝐴 and 𝐿𝑈 are the same (or very close; there might be rounding error).
6. The “residual matrix” 𝑅 = 𝑃 𝐴 − 𝐿𝑈 is zero (or very close; there might be rounding error).
30.1.4 Exercise F
30.1.5 Optional Exercise G (on the matrix norm, seen in a numerical methods
course)
426 Chapter 30. Package Scipy and More Tools for Linear Algebra
CHAPTER
THIRTYONE
31.1 Introduction
This unit starts with the methods for approximating definite integrals seen in a calculus course like the (Composite)
Midpoint Rule and (Composite) Trapezoidal Rule.
As bonus exercises, we will then work through some more advanced methods as seen in a numerical methods course; the
algorithms will be stated below.
We will also learn more about working with modules: how to run tests while developing a module, and how to incorporate
demonstrations into a notebook once a module is working.
31.2 Exercises
M_n = midpoint0(f, a, b, n)
𝑏
which returns the approximation of 𝐼 = ∫𝑎 𝑓(𝑥) 𝑑𝑥 given by the [Composite] Midpoint Rule with 𝑛 intervals of equal
width.
That is,
𝑏−𝑎
𝑀𝑛 = ℎ (𝑓(𝑎 + ℎ/2) + 𝑓(𝑎 + 3ℎ/2) + 𝑓(𝑎 + 5ℎ/2) + ⋯ + 𝑓(𝑏 − ℎ/2)) , ℎ= .
𝑛
In this first version, accumulate the sum using a for loop.
𝑒
𝑑𝑥
Test this with 𝐼1 ∶= ∫ and several choices of 𝑛, such as 10 and 100.
1 𝑥
Work out the exact value, and use this to display the size of the errors in each approximation.
427
Introduction to Numerical Methods and Analysis with Python
Express the above Midpoint Rule in summation notation “Σ” and reimplement as
M_n = midpoint1(f, a, b, n)
using the Python function sum to avoid any loops. (Doing it this way give a code that is more concise and closer to
mathematical form, so hopefully more readable; it will also probably run faster.)
(Summation-and-Integration-Exercise-C)
T_n = trapezoidal(f, a, b, n)
𝑏
which returns the approximation of 𝐼 = ∫𝑎 𝑓(𝑥) 𝑑𝑥 given by the [Composite] Trapezoidal Rule with 𝑛 intervals of equal
width.
That is,
Place the test cases below the function definitions within one or more if blocks starting
if __name__ == "__main__":
(Each of those long dashes is a pair of underscores.)
The contents of such an if block are executed when you run the file directly (as if it were a normal Python program file)
but are ignored when the module is used with import from another file. Thus, these blocks can be used while developing
the module, and later to provide a demonstration of the module’s capabilities.
As you may have seen, importing a module a second or subsequent time from another Python file or notebook does not get
the updated version of the module’s contents unless you restart the Python kernel. (Python does this to avoid duplicated
effort when the same module is mentioned in several import statements in a file.) Thus while revising a module, it is more
convenient to treat it like a normal Python file, testing from within by this method rather than by importing elsewhere.
Once the module is working for the above three methods, create a notebook which imports those functions and runs various
examples with them. (That is, copy all the stuff from within the above if __name__ == "__main__": blocks to
the notebook.)
The notebook should also describe the mathematical background; in particular, the formulas for all the methods.
(Summation-and-Integration-Exercise-E)
2𝑀𝑛 + 𝑇𝑛
Define another Python function which uses the above functions to compute the “weighted average” 𝑆𝑛 = .
3
Yes, this is the Composite Simpson Rule, so make it
S_n = simpson(f, a, b, n)
Implement this, using these three formulas plus the above function for the composite midpoint rule.
One natural data structure is a 2D array with unused entries above the main diagonal. However, you might consider how
to store this triangular collection of data as a list of lists, succesively of lengths 1, 2 and so on up to 𝑛.
Add further test cases; one interesting type of example is a periodic function.
THIRTYTWO
32.1 Introduction
Random numbers are often useful both for simulation of physical processes and for generating a collection of test cases.
Here we will do a mathematical simulation: approximating 𝜋 on the basis that the unit circle occupies a fraction 𝜋/4 of
the 2 × 2 square enclosing it.
32.1.1 Disclaimer
Actually, the best we have available is pseudo-random numbers, generated by algorithms that actually produce a very
long but eventually repeating sequence of numbers.
The pseudo-random number generator we use are provided by package Numpy in its module random – full name numpy.
random. This module contains numerous random number generators; here we look at just a few.
We introduce the abbreviation “npr” for this, along with the standard abbreviations “np” for numpy and “plt” for module
matplotlib.pyplot witit package `matplotlib
431
Introduction to Numerical Methods and Analysis with Python
First, the function rand (full name numpy.random.rand) generates uniformly-distributed real numbers in the semi-
open interval [0, 1).
To generate a single value, use it with no argument:
n_samples = 4
for sample in range(n_samples):
print(npr.rand())
0.6831120389814106
0.5548018157238266
0.8695626004658878
0.3752686546706374
To generate an array of values all at once, one can specify how many as the first and only input argument:
pseudorandomnumbers = npr.rand(n_samples)
print(pseudorandomnumbers)
However, the first method has an advantage in some situations: neither the whole list of integers from 0 to n_samples
- 1 nor the collection of random numbers is stored at any time: instead, just one value at a time is provided, used, and
then “forgotten”. This can be beneficial or even essential when a very large number of random values is used; it is not
unusual for a simulation to require more random values than the computer’s memory can hold.
We can also generate multi-dimensional arrays, by giving the lengths of each dimension as arguments:
numbers2d = npr.rand(2,3)
print('A two-dimensional array of random numbers:\n', numbers2d)
numbers3d = npr.rand(2,3,4)
print('\nA three-dimensional array of random numbers:\n', numbers3d)
The function randn has the same interface, but generates numbers with the standard normal distribution of mean zero,
standard deviation one:
n_samples = 10**7
normf_samples = npr.randn(n_samples)
mean = sum(normf_samples)/n_samples
print('The mean of these', n_samples, 'samples is', mean)
standard_deviation = np.sqrt(sum(normf_samples**2)/n_samples - mean**2)
print('and their standard deviation is', standard_deviation)
Note The exact mean and standard deviation of the standard normal distribtion are 0 and 1 respectively, so the slight
variations above are due to these being only a sample mean and sample standard deviation.
matplotlib.random has a function hist(x, bins, ...) for plotting histograms, so we can check what this
normal distribution actually looks like.
Input parameter x is the list of values, and when input parameter bins is given an integer value, the data is binned into
that many equally wide intervals.
The function hist also returns three values:
• n, the number of values in each bin (the bar heights on the histogram)
• bins (which I prefer to call bin_edges), the list of values of the edges between the bins
• patches, which we can ignore!
It is best to assigned this output to variables; otherwise the numerous values are sprayed over the screen.
# Note: the three output values must be assigned to variables, even though we do not␣
↪need them here.
One can generate pseudo-random integers, uniformly distributed between specified lower and upper values.
n_dice = 60
dice_rolls = npr.randint(1, 6+1, n_dice)
print(n_dice, 'random dice rolls:\n', dice_rolls)
# Count each outcome: this needs a list instead of an array:
dice_rolls_list = list(dice_rolls)
for value in (1, 2, 3, 4, 5, 6):
count = dice_rolls_list.count(value)
print(value, 'occured', count, 'times')
This time, it is best to explicitly specify a list of the edges of the bins, by making the second argument bins a list.
With six values, seven edges are needed, and it looks nicest if they are centered on the integers.
Run the above several times, redrawing the histogram each time; you should see a lot of variation.
Things average out with more rolls:
n_dice = 10**6
dice_rolls = npr.randint(1, 6+1, n_dice)
# Count each outcome: this needs a list instead of an array:
dice_rolls_list = list(dice_rolls)
for value in (1, 2, 3, 4, 5, 6):
count = dice_rolls_list.count(value)
print(value, 'occured a fraction', count/n_dice, 'of the time')
(n, bin_edges, patches) = plt.hist(dice_rolls, bins = bin_edges)
Exercise A: approximating 𝜋
We can compute approximations of 𝜋 by using the fact that the unit circle occupies a fraction 𝜋/4 of the circumscribed
square:
plt.figure(figsize=[8, 8])
angle = np.linspace(0, 2*np.pi)
# Red circle
plt.plot(np.cos(angle), np.sin(angle), 'r')
# Blue square
plt.plot([-1,1], [-1,-1], 'b') # bottom side of square
plt.plot([1,1], [-1,1], 'b') # right side of square
plt.plot([1,-1], [1,1], 'b') # top side of square
plt.plot([-1,-1], [1,-1], 'b') # left side of square
[<matplotlib.lines.Line2D at 0x7fc7ea44be10>]
Exercise B
It takes a lot of samples to get decent accuracy, so after part (a) is working, experiment with successively more samples;
increase the number 𝑁 of samples per trial by big steps, like factors of 100.
For each choice of sample size 𝑁 , compute the mean and standard deviation, and plot the histogram.
THIRTYTHREE
The basic tool for displaying information from Python to the screen is the function print(), as in
print("Hello world.")
Hello world.
What this actually does is convert one or more items given as its input arguments to strings of text, and then displays them
on a line of output, each separated from the next by a single space. So it also handles numbers:
print(7)
print(1/3)
7
0.3333333333333333
x = 1
y = 2
print(x,"+",y,"=",x+y)
1 + 2 = 3
439
Introduction to Numerical Methods and Analysis with Python
When assembling multiple pieces of information, the above syntax can get messy; also, it automatically inserts a blank
space between items, so does not allow us to avoid the space before “am” and “pm”.
Python has several methods for finer string manipulation; my favorite is f-strings, introduced in Python version 3.6, so
I will describe only it. (The two older approaches are the “%” operator and the .format() method; if you want
to know about them — if only for a reading knowledge of code that uses them — there is a nice overview at https:
//realpython.com/python-string-formatting/)
The key idea of f-strings is that when an “f” is added immediately before the opening quote of a string, parts of that string
within braces {…} are taken as input which is processed and the results inserted into a modifed string. For example, the
previous print command above could instead be done with precise control over blank spaces as follows:
Sometimes this is useful, because strings get used in other places besides print(), such as the titles and labels of a
graph, and as we will see next, it can be convenient to assemble a statement piece at a time and then print the whole thing;
for this, we use “addition” of strings, which is concatenation.
Note also the explicit insertion of spaces and such.
440 Chapter 33. Formatted Output and Some Text String Manipulation
Introduction to Numerical Methods and Analysis with Python
When do stores open? Stores open at 7am. — When do stores close? Stores close at␣
↪11pm.
The information printed above came out alright, but there are several reasons we might want finer control over the display,
especially with real numbers (type float)
• Choosing the number of significant digits or correct decimals to display for a float (real number).
• Controlling the width a number’s display (for example, in order to line up columns).
First, width control. The following is slightly ugly due to the shifts right.
4^0 = 1
4^1 = 4
4^2 = 16
4^3 = 64
4^4 = 256
4^5 = 1024
4^6 = 4096
4^7 = 16384
4^8 = 65536
4^9 = 262144
4^10 = 1048576
To line things up, we can specify that each output item has as many spaces as the widest of them needs: 2 and 7 columns
respectively, with syntax{quantity:width}
4^ 0 = 1
4^ 1 = 4
4^ 2 = 16
4^ 3 = 64
4^ 4 = 256
4^ 5 = 1024
4^ 6 = 4096
4^ 7 = 16384
4^ 8 = 65536
4^ 9 = 262144
4^10 = 1048576
That is still a bit strange with the exponents, because the output is right-justified. Left-justified would be better, and is
done with a “<” before the width. (As you might guess, “>” can be used to explicitly specify right-justification).
4^0 = 1
4^1 = 4
4^2 = 16
4^3 = 64
4^4 = 256
4^5 = 1024
4^6 = 4096
4^7 = 16384
4^8 = 65536
4^9 = 262144
4^10 = 1048576
Next, dealing with float (real) numbers: alignment, significant digits, and scientific notation vs fixed decimal form.
Looking at:
4^-0 = 1.0
4^-1 = 0.25
4^-2 = 0.0625
4^-3 = 0.015625
4^-4 = 0.00390625
4^-5 = 0.0009765625
4^-6 = 0.000244140625
4^-7 = 6.103515625e-05
4^-8 = 1.52587890625e-05
4^-9 = 3.814697265625e-06
4^-10 = 9.5367431640625e-07
442 Chapter 33. Formatted Output and Some Text String Manipulation
Introduction to Numerical Methods and Analysis with Python
4^-0 = 1.000000
4^-1 = 0.250000
4^-2 = 0.062500
4^-3 = 0.015625
4^-4 = 0.003906
4^-5 = 0.000977
4^-6 = 0.000244
4^-7 = 0.000061
4^-8 = 0.000015
4^-9 = 0.000004
4^-10 = 0.000001
4^-0 = 1.000000e+00
4^-1 = 2.500000e-01
4^-2 = 6.250000e-02
4^-3 = 1.562500e-02
4^-4 = 3.906250e-03
4^-5 = 9.765625e-04
4^-6 = 2.441406e-04
4^-7 = 6.103516e-05
4^-8 = 1.525879e-05
4^-9 = 3.814697e-06
4^-10 = 9.536743e-07
4^-0 = 1
4^-1 = 0.25
4^-2 = 0.0625
4^-3 = 0.015625
4^-4 = 0.00390625
4^-5 = 0.000976562
4^-6 = 0.000244141
4^-7 = 6.10352e-05
4^-8 = 1.52588e-05
4^-9 = 3.8147e-06
4^-10 = 9.53674e-07
To control precision, the width specifier gains a “decimal” p, with {...:w.p} specifying p decimal places or significant
digits, depending on context. Also, the width can be omitted if only precision matters, not spacing.
Let’s ask for 9 digits:
4^-0 = 1.000000000
4^-1 = 0.250000000
4^-2 = 0.062500000
(continues on next page)
4^-0 = 1.000000000e+00
4^-1 = 2.500000000e-01
4^-2 = 6.250000000e-02
4^-3 = 1.562500000e-02
4^-4 = 3.906250000e-03
4^-5 = 9.765625000e-04
4^-6 = 2.441406250e-04
4^-7 = 6.103515625e-05
4^-8 = 1.525878906e-05
4^-9 = 3.814697266e-06
4^-10 = 9.536743164e-07
Another observation: the last text string ‘scientific notation’ was too long for the specified 16 columns, and the “e” format
for the second version of the power definitely needed more than the 0 columns requested. So they went over, rather than
getting truncated — the width specification is just a minimum.
444 Chapter 33. Formatted Output and Some Text String Manipulation
Introduction to Numerical Methods and Analysis with Python
• https://realpython.com/python-f-strings/
• As always, consider the function help(), as below
help(print)
print(...)
print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)
print(1,"+",1,'=',1+1,sep="")
1+1=2
Also, by default the output line is ended at the end of a print command; technically, an “end of line” character “\n” is
added at the end of the output string.
One can change this with the optional argument “end”, for example to allow a single line to be assembled in pieces, or to
specify double spacing.
No new line after print:
Double spacing:
10
446 Chapter 33. Formatted Output and Some Text String Manipulation
CHAPTER
THIRTYFOUR
This is a very basic introduction to object-oriented prgramming in Python: defining classes and using them to create
objects with methods (a cousin of functions) that act on those objects.
These are illustrated with objects that are vectors, and in particular 3-component vectors that have a cross product.
The first class will be for vectors wih three components, labeld ‘x’, ‘y’ and ‘z’.
On its own, this would be rather redundant, since numpy arrays could be used, but it serves to introduce some basic ideas,
and then prepare for the real goal: 3-vectors with cross product.
This first class VeryBasic3Vector illustrates some basic features of creating and using classes; however, it will be
superceded soon!
Almost every class has a method with special name __init__, which is use to create objects of this class. In this case,
__init__ sets the three attributes on a VeryBasic3Vector — its x, y, and z components.
This class has just one other method, for the scalar (“dot”) product of two such vectors.
class VeryBasic3Vector():
"""A first, minimal class for 3-component vectors, offering just creation and the␣
↪scalar ("dot") product."""
a = VeryBasic3Vector([1, 2, 3])
print(f"The x, y, and z attributes of object a are {a.x}, {a.y} and {a.z}")
b = VeryBasic3Vector([4, 5, 2])
print(f"Object b contains the vector <{b.x}, {b.y}, {b.z}>")
447
Introduction to Numerical Methods and Analysis with Python
This way of printing values one attribute at a time gets tiresome, so soon, an alternative will be introduced, in improved
class BasicNVector
The attributes of an object can also be set directly:
a.y = 5
print(f"a is now <{a.x}, {a.y}, {a.z}>")
Methods are used as follows, with the variable before the “.” part being self and any parenthesized variables being the
arguments to the method definition.
A second class BasicNVector with some improvements over the above class VeryBasic3Vector:
• Allowinv vecors of any length, “N”
• Methods for addition, subtraction and vector-by-scalar multiplication.
• The special method __str__ (using this name is mandatory) to output an object’s values as a string — for using
in print, for example.
Aside: here and below, the names of these special methods like __str__ start and end with a pair of underscores, “__”.
class BasicNVector():
"""This improves on class VeryBasic3Vector by:
- allowing any number of components,
- adding several methods for vector arithmetic, and
- defining the special method __str__() to help display the vector's value."""
# The special method names wrapped in double underscores, __add__, __sub__ and __
↪ str__,
# have special meanings, as will be revealed below.
448
Chapter 34. Classes, Objects, Attributes, Methods: Very Basic Object-Oriented Programming in
Python
Introduction to Numerical Methods and Analysis with Python
def __str__(self):
"""How to convert the value to a text string.
As above, this uses angle brackets <...> for vectors, to distinguish from␣
↪lists [...] and tuples (...)"""
string = '<'
for component in self.list[:-1]:
string += f'{component}, '
string += f"{self.list[-1]}>"
return string
We need to create new objects of class BasicNVector to use these new methods:
c = BasicNVector([1, 2, 3, 4])
d = BasicNVector([4, 5, 2, 3])
The new method “__str__” makes it easer to display the value of a BasicNVector, by just using print:
print("c = ", c)
c_times_3 = c.times(3)
e = c.__add__(d)
print(f"{c} + {d} = {e}")
… but that special name also means that it also specifies how the operation “+” works on a pair of BasicNVector objects:
f = c + d
print(f'{c} + {d} = {f}')
A new class can be defined by refining an existing one, by adding methods and such, to avoid defining everything from
scratch. The basic syntax for creating class ChildClass based on an existing parent class named ParentClass is
class ChildClass(ParentClass):
Here we define class Vector3, which is restricted to vectors with 3 components, and uses that restriction to allow
defining the vector cross product.
In addition, it makes the operator “*” do cross multiplication on such objects, by also defining the special method
__mul__:
class Vector3(BasicNVector):
"""Restrict to BasicNVector objects of length 3, and then add the vector cross␣
↪product"""
#BasicNVector.__init__(self, list_of_components)
else: # Complain!
raise ValueError('The length of a Vector3 object must be 3.')
__mul__ = cross_product
Again, we need some Vector3 objects; the above BasicNVector objects do not know about the cross product.
But note that the previously-defined methods for class BasicNVector also work for Vector3 objects, so for example
we can still print with the help of method __str__ from there.
u = Vector3([1, 2, 3])
v = Vector3([4, 5, 10])
print(f'The vector cross product of {u} with {v} is {u.cross_product(v)}')
print(f'The vector cross product of {v} with {u} is {v*u}')
The vector cross product of <1, 2, 3> with <4, 5, 10> is <5, 2, -3>
The vector cross product of <4, 5, 10> with <1, 2, 3> is <-5, -2, 3>
450
Chapter 34. Classes, Objects, Attributes, Methods: Very Basic Object-Oriented Programming in
Python
Introduction to Numerical Methods and Analysis with Python
This is what happens with inappropriate input, thanks to that raise command:
w = Vector3([1, 2, 3, 4])
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-16-4dc0f24dc07a> in <module>
----> 1 w = Vector3([1, 2, 3, 4])
Aside: That’s ugly: as seen in the notes on Exceptions and Exception Handling, a more careful usage would be:
try:
w = Vector3([1, 2, 3, 4])
except Exception as what_just_happened:
print(f"Well, at least we tried, but: '{what_just_happened.args[0]}'")
Well, at least we tried, but: 'The length of a Vector3 object must be 3.'
452
Chapter 34. Classes, Objects, Attributes, Methods: Very Basic Object-Oriented Programming in
Python
CHAPTER
THIRTYFIVE
35.1 Introduction
This unit addresses the almost inevitable occurence of unanticipated errors in code, and methods to detect and handle
exceptions.
Note: This “notebook plus modules” organization is useful when the collection of function definitions usd in a project is
lengthy, and would otherwise clutter up the notebook and hamper readability.
Ideally, when we write a computer program for a mathematical task, we will plan in advance for every possibility, and
design an algorithm to handle all of them. For example, anticipating the possibility of division by zero and avoiding it is
a common issue in making a program robust.
However this is not always feasible; in particular while still developing a program, there might be situations that you have
not yet anticipated, and so it can be useful to write a program that will detect problems that occur while the program is
running, and handle them in a reasonable way.
We start by considering a very basic code for our favorite example, solving quadratic equations.
453
Introduction to Numerical Methods and Analysis with Python
Try it repeatedly, with some “destructive testing”: seek input choices that will cause various problems.
For this, it is is useful to have an interactive loop to ask for test cases:
Let me know what problems you found; we will work on detecting and handling all of them.
Some messages I get ended with these lines, whose meaning we will explore:
• ZeroDivisionError: float division by zero
• ValueError: math domain error
• ValueError: could not convert string to float …
Here is a minimal way to catch all problems, and at least apologize for failing to solve the equation:
• it first tries to run the code in the (indented) block introduced by the colon after the statement try
• if anything goes wrong (in Python jargon, if any exception occurs) it gives up on that try block and runs the code
in the block under the statement except.
One thing has been lost though: the messages like “float division by zero” as seen above, which say what sort of exception
occured.
We can regain some of that, by having except statement save that message into a variable:
This version detects every possible exception and handles them all in the same way, whether it be a problem in arithmetic
(like the dreaded division by zero) or the user making a typing error in the input of the coefficients. Try answering “one”
when asked for a coefficient!
Python divides exceptions into many types, and the statement except can be given the name of an exception type, so
that it then handles only that type of exception.
For example, in the case of division be zero, where we originally got a message
# Exception handling, version 3: as above, but with special handing for division by␣
↪zero.
However, this still crashes with other errors, lke typos in the input. To detect several types of exception, and handle each
in an appropriate way, there can be a list of except statements, each with a block of code to run when that exception is
detected. The type Exception was already seen above; it is just the totally generic case, so can be used as a catch-all after
a list of exception types have been handled.
Experiment a bit, and you will see how these multiple except statements are used:
• the first except clause that matches is used, and any later ones are ignored;
• only if none matches does the code go back to simply “crashing”, as with version 0 above.
For programs with interactive input, a useful pattern for robustly handling errors or surprises in the input is a while-try-
except pattern, with a form like:
try_again = True
while try_again:
try:
Get input
Do stuff with it
try_again = False
except Exception as message:
print("Exception", message, " occurred; please try again.")
Maybe actually fix the problem, and if successful: try_again = False
One can refine this by adding except clauses for as many specific exception types as are relevant, with more specific
handling for each.
Copy your latest version of a “quadratic_solver” function here into your module name something like “math246” created
for Unit 9) — or into a new file named something like quadratic_solvers.py. Then augment that function with
multiple except clauses to handle all exceptions that we can get to occur.
First, read about the possibilities, for example in Section 5 of the official Python 3 Standard Library Reference Manual
at https://docs.python.org/3/library/exceptions.html, or other sources that you can find.
Two exceptions of particular importance for us are ValueError and ArithmeticError, and sub-types of the
latter like ZeroDivisionError and OverflowError. (Note the “CamelCase” capitalization of each word in an
exception name: it is essential to get this right, since Python is case-sensitive.)
Aside: If you find a source on Python exceptions that you prefer to the above references, please let us all know!
Using a basic code for Newton’s Method (such as the one I provide in module root_finders) experiment with ex-
ception handling for the possibility of division by zero.
(You could then do likewise with the Secant Method.)
Appendices
459
CHAPTER
THIRTYSIX
36.1 Index
import numpy as np
import matplotlib.pyplot as plt
461
Introduction to Numerical Methods and Analysis with Python
# Demo
if __name__ == "__main__": # Do this if running the .py file directly, but not when␣
↪importing [from] it
Iteration 1:
The root is in interval [0.0, 1]
The new approximation is 0.5, with backward error 0.378
Iteration 2:
The root is in interval [0.5, 1]
The new approximation is 0.75, with backward error 0.0183
Iteration 3:
The root is in interval [0.5, 0.75]
The new approximation is 0.625, with backward error 0.186
Iteration 4:
The root is in interval [0.625, 0.75]
The new approximation is 0.6875, with backward error 0.0853
Iteration 5:
The root is in interval [0.6875, 0.75]
The new approximation is 0.71875, with backward error 0.0339
Iteration 6:
The root is in interval [0.71875, 0.75]
The new approximation is 0.734375, with backward error 0.00787
Iteration 7:
The root is in interval [0.734375, 0.75]
The new approximation is 0.7421875, with backward error 0.0052
Iteration 8:
The root is in interval [0.734375, 0.7421875]
The new approximation is 0.73828125, with backward error 0.00135
Iteration 10:
The root is in interval [0.73828125, 0.740234375]
The new approximation is 0.7392578125, with backward error 0.000289
# Demo
if __name__ == "__main__": # Do this if running the .py file directly, but not when␣
↪importing [from] it
root=0.7389377567153446, errorEstimate=0.00036613568156118603
"""
x = x0
for k in range(1, maxIterations+1):
fx = f(x)
Dfx = Df(x)
# Note: a careful, robust code would check for the possibility of division by␣
↪zero here,
# but for now I just want a simple presentation of the basic mathematical␣
↪idea.
dx = fx/Dfx
x -= dx # Aside: this is shorthand for "x = x - dx"
errorEstimate = abs(dx)
if demoMode:
print(f"At iteration {k} x = {x} with estimated error {errorEstimate:0.3},
↪ backward error {abs(f(x)):0.3}")
# Demo
if __name__ == "__main__": # Do this if running the .py file directly, but not when␣
↪importing [from] it
# Demo
if __name__ == "__main__": # Do this if running the .py file directly, but not when␣
↪importing [from] it
Iteration 1:
The root is in interval [0.5403023058681398, 1]
The new approximation is 0.5403023058681398, with error bound 0.46, backward error␣
↪0.317
Iteration 3:
The root is in interval [0.7385270062423998, 1]
The new approximation is 0.7385270062423998, with error bound 0.261, backward␣
↪error 0.000934
Iteration 4:
The root is in interval [0.7390571666782676, 1]
The new approximation is 0.7390571666782676, with error bound 0.261, backward␣
↪error 4.68e-05
Iteration 5:
The root is in interval [0.7390837322783136, 1]
The new approximation is 0.7390837322783136, with error bound 0.261, backward␣
↪error 2.34e-06
Iteration 6:
The root is in interval [0.7390850630385933, 1]
The new approximation is 0.7390850630385933, with error bound 0.261, backward␣
↪error 1.17e-07
Iteration 7:
The root is in interval [0.7390851296998365, 1]
The new approximation is 0.7390851296998365, with error bound 0.261, backward␣
↪error 5.88e-09
Iteration 8:
The root is in interval [0.7390851330390691, 1]
The new approximation is 0.7390851330390691, with error bound 0.261, backward␣
↪error 2.95e-10
Iteration 9:
The root is in interval [0.7390851332063397, 1]
The new approximation is 0.7390851332063397, with error bound 0.261, backward␣
↪error 1.48e-11
Iteration 10:
The root is in interval [0.7390851332147188, 1]
The new approximation is 0.7390851332147188, with error bound 0.261, backward␣
↪error 7.39e-13
Iteration 11:
The root is in interval [0.7390851332151385, 1]
The new approximation is 0.7390851332151385, with error bound 0.261, backward␣
↪error 3.71e-14
Iteration 12:
The root is in interval [0.7390851332151596, 1]
The new approximation is 0.7390851332151596, with error bound 0.261, backward␣
↪error 1.89e-15
Iteration 13:
Iteration 14:
The root is in interval [0.7390851332151607, 1]
The new approximation is 0.7390851332151607, with error bound 0.261, backward␣
↪error 0.0
Iteration 15:
The root is in interval [0.7390851332151607, 1]
The new approximation is 0.7390851332151607, with error bound 0.261, backward␣
↪error 0.0
f_x_new = f(x_new)
(x_older, x_more_recent) = (x_more_recent, x_new)
(f_x_older, f_x_more_recent) = (f_x_more_recent, f_x_new)
errorEstimate = abs(x_older - x_more_recent)
if demoMode:
print(f"The latest pair of approximations are {x_older} and {x_more_
↪recent},")
# Demo
if __name__ == "__main__": # Do this if running the .py file directly, but not when␣
↪importing [from] it
Iteration 1:
The latest pair of approximations are 1 and 0.5403023058681398,
where the function's values are 0.46 and -0.317 respectively.
The new approximation is 0.5403023058681398, with estimated error 0.46, backward␣
↪error 0.317
Iteration 2:
The latest pair of approximations are 0.5403023058681398 and 0.7280103614676171,
where the function's values are -0.317 and -0.0185 respectively.
The new approximation is 0.7280103614676171, with estimated error 0.188, backward␣
↪error 0.0185
Iteration 3:
The latest pair of approximations are 0.7280103614676171 and 0.7396270126307336,
where the function's values are -0.0185 and 0.000907 respectively.
The new approximation is 0.7396270126307336, with estimated error 0.0116, backward␣
↪error 0.000907
Iteration 4:
The latest pair of approximations are 0.7396270126307336 and 0.7390838007832723,
where the function's values are 0.000907 and -2.23e-06 respectively.
The new approximation is 0.7390838007832723, with estimated error 0.000543,␣
↪backward error 2.23e-06
Iteration 5:
The latest pair of approximations are 0.7390838007832723 and 0.7390851330557806,
where the function's values are -2.23e-06 and -2.67e-10 respectively.
The new approximation is 0.7390851330557806, with estimated error 1.33e-06,␣
↪backward error 2.67e-10
Iteration 6:
The latest pair of approximations are 0.7390851330557806 and 0.7390851332151607,
where the function's values are -2.67e-10 and 0.0 respectively.
The new approximation is 0.7390851332151607, with estimated error 1.59e-10,␣
↪backward error 0.0
Iteration 7:
The latest pair of approximations are 0.7390851332151607 and 0.7390851332151607,
where the function's values are 0.0 and 0.0 respectively.
The new approximation is 0.7390851332151607, with estimated error 0.0, backward␣
↪error 0.0
Immediately updated to the following, but I leave the first version for reference.
# Demo
if __name__ == "__main__": # Do this if running the .py file directly, but not when␣
↪importing [from] it
A =
[[ 1. -3. 15.]
[ 5. 200. 7.]
[ 4. 11. -6.]]
b = [4. 3. 2.]
Step k=0
The multipliers in column 1 are [5. 4.]
The updated matrix is
[[ 1. -3. 15.]
[ 0. 215. -68.]
[ 0. 23. -66.]]
The updated right-hand side is
[ 4. -17. -14.]
Step k=1
The multipliers in column 2 are [0.10697674]
The updated matrix is
(continues on next page)
# Demo
if __name__ == "__main__": # Do this if running the .py file directly, but not when␣
↪importing [from] it
x = backwardSubstitution(U, c, demoMode=True)
print("")
print(f"x = {x}")
r = b - A@x
print(f"The residual b - Ax = {r},")
print(f"with maximum norm {max(abs(r)):.3}.")
x_3 = 0.20742911452558213
x_2 = -0.013464280057025185
x_1 = 0.8481704419451925
# Demo
if __name__ == "__main__": # Do this if running the .py file directly, but not when␣
↪importing [from] it
x = solveLinearSystem(A, b)
print("")
print(f"x = {x}")
(continues on next page)
36.3.2 LU factorization …
and row matrix L[k, 1:k-1] by matrix U[1:k-1,k:n] gives the relevant row vector.
"""
n = len(A) # len() gives the number of rows in a 2D array.
# Initialize U as the zero matrix;
# correct below the main diagonal, with the other entries to be computed below.
U = np.zeros_like(A)
# Initialize L as the identity matrix;
# correct on and above the main diagonal, with the other entries to be computed␣
↪below.
L = np.identity(n)
# Column and row 1 (i.e Python index 0) are special:
U[0,:] = A[0,:]
L[1:,0] = A[1:,0]/U[0,0]
if demoMode:
print(f"After step k=0")
print(f"U=\n{U}")
print(f"L=\n{L}")
for k in range(1, n-1):
U[k,k:] = A[k,k:] - L[k,:k] @ U[:k,k:]
L[k+1:,k] = (A[k+1:,k] - L[k+1:,:k] @ U[:k,k])/U[k,k]
if demoMode:
print(f"After step {k=}")
print(f"U=\n{U}")
print(f"L=\n{L}")
# The last row (index "-1") is special: nothing to do for L
U[-1,-1] = A[-1,-1] - sum(L[-1,:-1]*U[:-1,-1])
if demoMode:
print(f"After the final step, k={n-1}")
print(f"U=\n{U}")
return (L, U)
# Demo
if __name__ == "__main__": # Do this if running the .py file directly, but not when␣
↪importing [from] it
print(f"A=\n{A}")
print(f"L=\n{L}")
print(f"U=\n{U}")
print(f"L times U is \n{L@U}")
print(f"The 'residual' A - LU is \n{A - L@U}")
# Demo
if __name__ == "__main__": # Do this if running the .py file directly, but not when␣
↪importing [from] it
c = forwardSubstitution(L, b, demoMode=True)
print("")
print(f"c = {c}")
# Demo
if __name__ == "__main__": # Do this if running the .py file directly, but not when␣
↪importing [from] it
x = backwardSubstitution(U, c, demoMode=True)
print("")
print(f"The residual c - Ux for the backward substitution step is {c - U@x}")
print(f"\t with maximum norm {np.max(np.abs(c - U@x)):0.3}")
print(f"The residual b - Ax for the whole solving process is {b - A@x}")
print(f"\t with maximum norm {np.max(np.abs(b - A@x)):0.3}")
# Demo
if __name__ == "__main__": # Do this if running the .py file directly, but not when␣
↪importing [from] it
def inverse(A):
"""Use sparingly; there is usually a way to avoid computing inverses that is␣
↪faster and with less rounding error!"""
n = len(A)
A_inverse = np.zeros_like(A)
(L, U) = luFactorize(A)
for i in range(n):
b = np.zeros(n)
b[i] = 1.0
c = forwardSubstitution(L, b)
A_inverse[:,i] = backwardSubstitution(U, c)
return A_inverse
# Demo
if __name__ == "__main__": # Do this if running the .py file directly, but not when␣
↪importing [from] it
def hilbert(n):
H = np.zeros([n,n])
for i in range(n):
for j in range(n):
H[i,j] = 1.0/(1.0 + i + j)
return H
for n in range(2,4):
H_n = hilbert(n)
print(f"The Hilbert matrix H_{n} is")
print(H_n)
H_n_inverse = inverse(H_n)
print("and its inverse is")
print(H_n_inverse)
print("to verify, their product is")
print(H_n @ H_n_inverse)
print()
These are returned in an array c of the same length as x and y, even if the␣
↪degree is less than the normal length(x)-1,
"""
nnodes = len(x)
n = nnodes - 1
(continues on next page)
def showPolynomial(c):
print("P(x) = ", end="")
n = len(c)-1
print(f"{c[0]:.4}", end="")
if n > 0:
coeff = c[1]
if coeff > 0:
print(f" + {coeff:.4}x", end="")
elif coeff < 0:
print(f" - {-coeff:.4}x", end="")
if n > 1:
for j in range(2, len(c)):
coeff = c[j]
if coeff > 0:
print(f" + {coeff:.4}x^{j}", end="")
elif coeff < 0:
print(f" - {-coeff:.4}x^{j}", end="")
print()
(More to come.)
h = (b-a)/n
t = np.linspace(a, b, n+1) # Note: "n" counts steps, so there are n+1 values for␣
↪t.
# Only the following two lines will need to change for the systems version
U = np.empty_like(t)
U[0] = u_0
for i in range(n):
U[i+1] = U[i] + f(t[i], U[i])*h
return (t, U)
# Demo
if __name__ == "__main__": # Do this if running the .py file directly, but not when␣
↪importing [from] it.
a = 1.
b = 3.
u_0 = 2.
K = 0.5
n = 10
plt.figure(figsize=[12,4])
plt.title(f"Error")
plt.plot(t, U - u_1(t))
plt.grid(True)
# Only the following three lines change for the systems version
n_unknowns = len(u_0)
U = np.zeros([n+1, n_unknowns])
U[0] = np.array(u_0)
for i in range(n):
U[i+1] = U[i] + f(t[i], U[i])*h
(continues on next page)
36.5. Solving Initial Value Problems for Ordinary Differential Equations 477
Introduction to Numerical Methods and Analysis with Python
# Demo
if __name__ == "__main__": # Do this if running the .py file directly, but not when␣
↪importing [from] it.
M = 1.0
k = 1.0
D = 0.1
u_0 = [1.0, 0.0]
a = 0.0
b = 8 * np.pi # Four periods
n=10000
(t, U) = eulerSystem(f, a, b, u_0, n)
y = U[:,0]
Dy = U[:,1]
plt.figure(figsize=[12,6])
plt.title(f"U_0, = y with {k=}, {D=} — by Euler with {n} steps")
plt.plot(t, y)
plt.xlabel('t')
plt.ylabel('y')
plt.grid(True)
plt.figure(figsize=[12,6])
plt.title(f"U_1, = dy/dt with {k=}, {D=} — by Euler with {n} steps")
plt.plot(t, Dy)
plt.xlabel('t')
plt.ylabel('dy/dt')
plt.grid(True)
plt.figure(figsize=[12,6])
plt.title(f"U_0=y and U_1=dy/dt with {k=}, {D=} — by Euler with {n} steps")
plt.plot(t, U)
plt.xlabel('t')
plt.ylabel('y and dy/dt')
plt.grid(True)
else:
plt.title(f"The orbits of the damped mass-spring system, k={k}, D={D} — by␣
↪Euler with {n} steps")
plt.plot(y, Dy)
plt.xlabel('y')
plt.ylabel('dy/dt')
plt.plot(y[0], Dy[0], "g*", label="start")
plt.plot(y[-1], Dy[-1], "r*", label="end")
(continues on next page)
# Only the following two lines will need to change for the systems version
U = np.empty_like(t)
U[0] = u_0
for i in range(n):
K_1 = f(t[i], U[i])*h
K_2 = f(t[i]+h, U[i]+K_1)*h
U[i+1] = U[i] + (K_1 + K_2)/2.
return (t, U)
# Demo
if __name__ == "__main__": # Do this if running the .py file directly, but not when␣
↪importing [from] it.
a = 1.
b = 3.
u_0 = 2.
K = 1.
n = 10
plt.figure(figsize=[12,4])
plt.title(f"Error")
plt.plot(t, U - u_1(t))
plt.grid(True)
# Demo
if __name__ == "__main__": # Do this if running the .py file directly, but not when␣
↪importing [from] it.
The solution for a=0 is u(t) = u(t; 0, u_0) = cos t + (u_0 - 1) e^(-Kt)
The solution in general is u(t) = u(t; a, u_0) = cos t + C e^(-K t), C = (u_0␣
↪- cos(a)) exp(K a)
(continues on next page)
36.5. Solving Initial Value Problems for Ordinary Differential Equations 479
Introduction to Numerical Methods and Analysis with Python
a = 1.
b = a + 4 * np.pi # Two periods
u_0 = 2.
K = 2.
n = 50
plt.figure(figsize=[12,4])
plt.title(f"Error")
plt.plot(t, U - u_2(t))
plt.grid(True)
# Only the following two lines will need to change for the systems version
U = np.empty_like(t)
U[0] = u_0
for i in range(n):
K_1 = f(t[i], U[i])*h
K_2 = f(t[i]+h/2, U[i]+K_1/2)*h
U[i+1] = U[i] + K_2
return (t, U)
# Demo
if __name__ == "__main__": # Do this if running the .py file directly, but not when␣
↪importing [from] it.
a = 1.
b = 3.
u_0 = 2.
K = 1.
n = 10
plt.figure(figsize=[12,4])
plt.title(f"Error")
plt.plot(t, U - u_1(t))
plt.grid(True)
# Demo
if __name__ == "__main__": # Do this if running the .py file directly, but not when␣
↪importing [from] it.
The solution for a=0 is u(t) = u(t; 0, u_0) = cos t + (u_0 - 1) e^(-Kt)
The solution in general is u(t) = u(t; a, u_0) = cos t + C e^(-K t), C = (u_0␣
↪- cos(a)) exp(K a)
"""
return K*(np.cos(t) - u) - np.sin(t)
def u_2(t): return np.cos(t) + C * np.exp(-K*t)
a = 1.
b = a + 4 * np.pi # Two periods
u_0 = 2.
K = 2.
n = 50
plt.figure(figsize=[12,4])
plt.title(f"Error")
plt.plot(t, U - u_2(t))
plt.grid(True)
36.5. Solving Initial Value Problems for Ordinary Differential Equations 481
Introduction to Numerical Methods and Analysis with Python
# Only the following two lines will need to change for the systems version
U = np.empty_like(t)
U[0] = u_0
for i in range(n):
K_1 = f(t[i], U[i])*h
K_2 = f(t[i]+h/2, U[i]+K_1/2)*h
K_3 = f(t[i]+h/2, U[i]+K_2/2)*h
K_4 = f(t[i]+h, U[i]+K_3)*h
U[i+1] = U[i] + (K_1 + 2*K_2 + 2*K_3 + K_4)/6
return (t, U)
# Demo
if __name__ == "__main__": # Do this if running the .py file directly, but not when␣
↪importing [from] it.
a = 1.
b = 3.
u_0 = 2.
K = 1.
n = 10
plt.figure(figsize=[12,4])
plt.title(f"Error")
plt.plot(t, U - u_1(t))
plt.grid(True)
# Demo
if __name__ == "__main__": # Do this if running the .py file directly, but not when␣
↪importing [from] it.
The solution for a=0 is u(t) = u(t; 0, u_0) = cos t + (u_0 - 1) e^(-Kt)
The solution in general is u(t) = u(t; a, u_0) = cos t + C e^(-K t), C = (u_0␣
↪- cos(a)) exp(K a)
"""
return K*(np.cos(t) - u) - np.sin(t)
def u_2(t): return np.cos(t) + C * np.exp(-K*t)
a = 1.
b = a + 4 * np.pi # Two periods
u_0 = 2.
(continues on next page)
plt.figure(figsize=[12,4])
plt.title(f"Error")
plt.plot(t, U - u_2(t))
plt.grid(True)
# Only the following three lines change for the systems version — the same lines␣
↪as for eulerSystem and so on.
n_unknowns = len(u_0)
U = np.zeros([n+1, n_unknowns])
U[0] = np.array(u_0)
for i in range(n):
K_1 = f(t[i], U[i])*h
K_2 = f(t[i]+h/2, U[i]+K_1/2)*h
K_3 = f(t[i]+h/2, U[i]+K_2/2)*h
K_4 = f(t[i]+h, U[i]+K_3)*h
U[i+1] = U[i] + (K_1 + 2*K_2 + 2*K_3 + K_4)/6
return (t, U)
# Demo
if __name__ == "__main__": # Do this if running the .py file directly, but not when␣
↪importing [from] it.
36.5. Solving Initial Value Problems for Ordinary Differential Equations 483
Introduction to Numerical Methods and Analysis with Python
plt.figure(figsize=[12,6])
plt.title(f"U_0, = y with {k=}, {D=} — by Runge-Kutta with {n} steps")
plt.plot(t, y)
plt.xlabel('t')
plt.ylabel('y')
plt.grid(True)
plt.figure(figsize=[12,6])
plt.title(f"U_1, = dy/dt with {k=}, {D=} — by Runge-Kutta with {n} steps")
plt.plot(t, Dy)
plt.xlabel('t')
plt.ylabel('dy/dt')
plt.grid(True)
else:
plt.title(f"The orbits of the damped mass-spring system, k={k}, D={D} — by␣
↪Runge-Kutta with {n} steps")
plt.plot(y, Dy)
plt.xlabel('y')
plt.ylabel('dy/dt')
plt.plot(y[0], Dy[0], "g*", label="start")
plt.plot(y[-1], Dy[-1], "r*", label="end")
plt.legend()
plt.grid(True)
36.5.2 For the future: an attempt to use the IMPLICIT Midpoint Method
to solve du/dt = f(t, u) for t in [a, b], with initial value u(a) = u_0
Note:
- the default of iterations=1 gives the Explicit Midpoint Method
- iterations=0 gives Euler's Method
"""
h = (b-a)/n
t = np.linspace(a, b, n+1) # Note: "n" counts steps, so there are n+1 values for␣
↪t.
# Only the following three lines change for the systems version — the same lines␣
↪ as for eulerSystem and so on.
n_unknowns = len(u_0)
U = np.zeros([n+1, n_unknowns])
U[0] = np.array(u_0)
(continues on next page)
for i in range(n):
K = f(t[i], U[i])*h
# A few iterations of the fixed point method
for iteration in range(iterations):
K = f(t[i]+h/2, U[i]+K/2)*h
U[i+1] = U[i] + K
return (t, U)
# Demo
if __name__ == "__main__": # Do this if running the .py file directly, but not when␣
↪importing [from] it.
plt.figure(figsize=[12,6])
plt.title(f"U_0, = y with {k=}, {D=} — by FPMidpointSystem with {n} steps,
↪{iterations} iterations")
plt.plot(t, y)
plt.xlabel('t')
plt.ylabel('y')
plt.grid(True)
else:
plt.title(f"The orbits of the damped mass-spring system, k={k}, D={D} — by␣
↪FPMidpointSystem with {n} steps, {iterations} iterations")
plt.plot(y, Dy)
plt.xlabel('y')
plt.ylabel('dy/dt')
plt.plot(y[0], Dy[0], "g*", label="start")
plt.plot(y[-1], Dy[-1], "r*", label="end")
plt.legend()
plt.grid(True)
36.5. Solving Initial Value Problems for Ordinary Differential Equations 485
Introduction to Numerical Methods and Analysis with Python
36.6 For some examples in Chapter Initial Value Problems for Ordi-
nary Differential Equations
THIRTYSEVEN
This section describes some of the core algorithms of linear algebra using the same indexing conventions as in most modern
programming languages: Python, C, Java, C++, javascript, Objective-C, C#, Swift, etc. (In fact, almost everything except
Matlab and Fortran.)
The key elements of this are:
• Indices for vectors and other arrays start at 0.
• Ranges of indices are described with semi-open intervals [𝑎, 𝑏).
This “index interval” notation has two virtues: it emphasizes the mathematical fact that the order in which things are
done is irrelevant (such as within sums), and it more closely resembles the way that most programming languages
specify index ranges. For example, the indices 𝑖 of a Python array with 𝑛 elements are 0 ≤ 𝑖 < 𝑛, or [0, 𝑛), and
the Python notations range(n), range(0,n), :n and 0:n all describe this. Similarly, in Java, C, C++ etc.,
one can loop over the indices 𝑖 ∈ [𝑎, 𝑏) with for(i=a, i<b, i+=1)
The one place that the indexing is still a bit tricky is counting backwards!
For this, note that the index range 𝑖 = 𝑏, 𝑏 − 1, … 𝑎 is 𝑏 ≥ 𝑖 > 𝑎 − 1, which in Python is range(b, a-1, -1).
I include Python code for comparison for just the three most basic algorithms: “naive” LU factorization and forward and
backward substitution, without pivoting. The rest are good exercises for learning how to program loops and sums.
In this careful version, the original matrix 𝐴 is called 𝐴(0) , and the new versions at each stage are called 𝐴(1) , 𝐴(2) , and
so on to 𝐴(𝑛−1) , which is the row-reduced form also called 𝑈 ; likewise with the right-hand sides 𝑏(0) = 𝑏, 𝑏(1) up to
𝑏(𝑛−1) = 𝑐.
However, in software all those super-scripts can be ignored, just updating arrays A and b.
Algorithm 2.1
for k in [0, n-1)
for i in [k+1, n)
(𝑘) (𝑘)
𝑙𝑖,𝑘 = 𝑎𝑖,𝑘 /𝑎𝑘,𝑘
for j in [k+1, n)
(𝑘+1) (𝑘) (𝑘)
𝑎𝑖,𝑗 = 𝑎𝑖,𝑗 − 𝑙𝑖,𝑘 𝑎𝑘,𝑗
end
487
Introduction to Numerical Methods and Analysis with Python
Actually this skips formulas for some elements of the new matrix 𝐴(𝑘+1) , because they are either zero or are unchanged
from 𝐴(𝑘) :
the rows before 𝑖 = 𝑘 are unchanged, and for columns before 𝑗 = 𝑘, the new entries are zeros.
Algorithm 2.2
for j in [0, n)
𝑢0,𝑗 = 𝑎0,𝑗
end
for i in [1, n)
𝑙𝑖,0 = 𝑎𝑖,0 /𝑢0,0
end
for k in [1, n)
for j in [k, n)
𝑢𝑘,𝑗 = 𝑎𝑘,𝑗 − ∑ 𝑙𝑘,𝑠 𝑢𝑠,𝑗
𝑠∈[0,𝑘)
end
for i in [k+1, n)
𝑙𝑖,𝑘 = ⎛
⎜𝑎𝑖,𝑘 − ∑ 𝑙𝑖,𝑠 𝑢𝑠,𝑘 ⎞
⎟ /𝑢𝑘,𝑘
⎝ 𝑠∈[0,𝑘) ⎠
end
end
Algorithm 2.3
𝑐0 = 𝑏 0
for i in [1, n)
488 Chapter 37. Linear algebra algorithms using 0-based indexing and semi-open intervals
Introduction to Numerical Methods and Analysis with Python
𝑐𝑖 = 𝑏𝑖 − ∑ 𝑙𝑖,𝑗 𝑐𝑗
𝑗∈[0,𝑖)
end for
Algorithm 2.4
𝑥𝑛−1 = 𝑐𝑛−1 /𝑢𝑛−1,𝑛−1
for i from n-2 down to 0
𝑐𝑖 − ∑𝑗∈[𝑖+1,𝑛) 𝑢𝑖,𝑗 𝑥𝑗
𝑥𝑖 =
𝑢𝑖,𝑖
end
For the Python implementation, we need a range of indices that change by a step of -1 instead of 1. This can be done with
an optional third argument to range: range(a, b, step) generates a succession values starting with a, each value
differing from its predecessor by step, and stopping just before b. That last rule requires care whe the step is negative:
for example, range(3, 0, -1) gives the sequence {3, 2, 1}. So to count down to zero, one has to use 𝑏 = −1! That
is, to count down from 𝑛 − 1 and end at 0, one uses range(n-1, -1, -1).
import numpy as np
Algorithm 2.5
𝑥𝑛−1 = 𝑐𝑛−1 /𝑢𝑛−1,𝑛−1
for d in [2, n+1):
𝑖=𝑛−𝑑
𝑐𝑖 − ∑𝑗∈[𝑖+1,𝑛) 𝑢𝑖,𝑗 𝑥𝑗
𝑥𝑖 =
𝑢𝑖,𝑖
end
Apart the choices of pivot rows and updating of the permutation vector 𝑝, the only change from the non-pivoting version
is that all row indices change from 𝑖 to 𝑝𝑖 and so on, in both 𝑈 and 𝑐; column indices are unchanged.
In the following description, I will discard the above distinction between the successive matrices 𝐴(𝑘) and vectors 𝑏(𝑘) ,
and instead refer to 𝐴 and 𝑏 like variable arrays in a programming language, with their elements being updated. Likewise,
the permutation will be stored in a variable array 𝑝.
Algorithm 2.6
Initialize the permuation vector as 𝑝 = [0, 1, … , 𝑛 − 1]
for k in [0, n-1)
Search elements 𝑎𝑝𝑖 ,𝑘 for 𝑖 ∈ [𝑘, 𝑛) and find the index r of the one with largest absolute value.
If 𝑟 ≠ 𝑘, swap 𝑝𝑘 with 𝑝𝑟
for i in [k+1, n)
𝑙𝑝𝑖 ,𝑘 = 𝑎𝑝𝑖 ,𝑘 /𝑎𝑝𝑘 ,𝑘
for j in [k+1, n)
𝑎𝑝𝑖 ,𝑗 = 𝑎𝑝𝑖 ,𝑗 − 𝑙𝑝𝑖 ,𝑘 𝑎𝑝𝑘 ,𝑗
end
𝑏𝑝𝑖 = 𝑏𝑝𝑖 − 𝑙𝑝𝑖 ,𝑘 𝑏𝑝𝑘
end
end
490 Chapter 37. Linear algebra algorithms using 0-based indexing and semi-open intervals
Introduction to Numerical Methods and Analysis with Python
37.5.2 The Doolittle LU factorization algorithm with maximal element partial pivot-
ing
Algorithm 2.7
for k in [0, n)
Search elements 𝑎𝑝𝑖 ,𝑘 for 𝑖 ∈ [𝑘, 𝑛) and find the index r of the one with largest absolute value.
If 𝑟 ≠ 𝑘, swap 𝑝𝑘 with 𝑝𝑟
for j in [k, n)
Note that for 𝑘 = 0, the sum can [and should!] be omitted in the following line:
𝑢𝑝𝑘 ,𝑗 = 𝑎𝑝𝑘 ,𝑗 − ∑ 𝑙𝑝𝑘 ,𝑠 𝑢𝑝𝑠 ,𝑗
𝑠∈[0,𝑘)
end
for i in [k+1, n)
Note that for 𝑘 = 0, the sum can [and should!] be omitted in the following line:
𝑙𝑝𝑖 ,𝑘 = ⎛
⎜𝑎𝑝𝑖 ,𝑘 − ∑ 𝑙𝑝𝑖 ,𝑠 𝑢𝑝𝑠 ,𝑘 ⎞
⎟ /𝑢𝑝𝑘 ,𝑘
⎝ 𝑠∈[0,𝑘) ⎠
end
end
Algorithm 2.8
𝑐𝑝0 = 𝑏𝑝0 /𝑙𝑝0 ,0
for i in [1, n)
𝑐𝑝𝑖 = 𝑏𝑝𝑖 − ∑ 𝑙𝑝𝑖 ,𝑗 𝑐𝑝𝑗
𝑗∈[0,𝑖)
end
Algorithm 2.9
𝑥𝑛−1 = 𝑐𝑝𝑛−1 /𝑢𝑝𝑛−1 ,𝑛−1
for i from n-2 down to 0
𝑐𝑝𝑖 − ∑ 𝑢𝑝𝑖 ,𝑗 𝑥𝑗
𝑗∈[𝑖+1,𝑛)
𝑥𝑖 =
𝑢𝑝𝑖 ,𝑖
end
𝑑0 𝑢0
⎡ 𝑙 𝑑1 𝑢1 ⎤
⎢ 0 ⎥
𝑙1 𝑑2 𝑢2
Describe a tridiagonal matrix with three 1D arrays as $𝑇 = ⎢ ⎥$
⎢ ⋱ ⋱ ⋱ ⎥
⎢ 𝑙𝑛−3 𝑑𝑛−2 𝑢𝑛−2 ⎥
⎣ 𝑙𝑛−2 𝑑𝑛−1 ⎦
with all “missing” entries being zeros, and the right had side of system 𝑇 𝑥 = 𝑏 as
𝑏0
⎡ 𝑏 ⎤
⎢ 2 ⎥
⎢ ⋮ ⎥
⎣ 𝑏𝑛−1 ⎦
1
⎡ 𝐿 1 ⎤
⎢ 0 ⎥
⎢ 𝐿1 1 ⎥,
The factorization has the form 𝑇 = 𝐿𝑈 with $𝐿 = 𝑈 =
⎢ ⋱ ⋱ ⎥
⎢ 𝐿𝑛−3 1 ⎥
⎣ 𝐿𝑛−2 1 ⎦
𝐷0 𝑢0
⎡ 𝐷1 𝑢1 ⎤
⎢ ⎥
⎢ 𝐷2 𝑢2 ⎥$
⎢ ⋱ ⋱ ⎥
⎢ 𝐷𝑛−2 𝑢𝑛−2 ⎥
⎣ 𝐷𝑛−1 ⎦
so just the arrays 𝐿 and 𝐷 are to be computed.
Algorithm 2.10
𝐷0 = 𝑑0
for i in [1, n)
𝐿𝑖−1 = 𝑙𝑖−1 /𝐷𝑖−1
𝐷𝑖 = 𝑑𝑖 − 𝐿𝑖−1 𝑢𝑖−1
end
492 Chapter 37. Linear algebra algorithms using 0-based indexing and semi-open intervals
Introduction to Numerical Methods and Analysis with Python
Algorithm 2.11
𝑐0 = 𝑏 0
for i in [1, n)
𝑐𝑖 = 𝑏𝑖 − 𝐿𝑖−1 𝑐𝑖−1
end
Algorithm 2.12
𝑥𝑛−1 = 𝑐𝑛−1 /𝐷𝑛−1
for i from n-2 down to 0
𝑥𝑖 = (𝑐𝑖 − 𝑢𝑖 𝑥𝑖+1 )/𝐷𝑖
end
Algorithm 2.13
The top row is unchanged:
for j in [0, p+1)
𝑢0,𝑗 = 𝑎0,𝑗
end The top non-zero diagonal is also unchanged:
for k in [1, n - p)
𝑢𝑘,𝑘+𝑝 = 𝑎𝑘,𝑘+𝑝
end for
The left column requires no sums:
for i in [1, p+1)
𝑙𝑖,0 = 𝑎𝑖,0 /𝑢0,0
end for
end
for i in [k+1, min(n,k+p+1))
𝑙𝑖,𝑘 = ⎛
⎜𝑎𝑖,𝑘 − ∑ 𝑙𝑖,𝑠 𝑢𝑠,𝑘 ⎞
⎟ /𝑢𝑘,𝑘
⎝ 𝑠∈[max(0,𝑖−𝑝),𝑘) ⎠
end
end
Algorithm 2.14
𝑐0 = 𝑏0 /𝑙0,0
for i in [1, n)
𝑐𝑖 = 𝑏𝑖 − ∑ 𝑙𝑖,𝑗 𝑐𝑗
𝑗∈[𝑚𝑎𝑥(0,𝑖−𝑝),𝑖)
end
Algorithm 2.15
𝑥𝑛−1 = 𝑐𝑛−1 /𝑢𝑛−1,𝑛−1
for i from n-2 down to 0
𝑐𝑖 − ∑𝑗∈[𝑖+1,min(𝑛,𝑖+𝑝+1)) 𝑢𝑖,𝑗 𝑥𝑗
𝑥𝑖 =
𝑢𝑖,𝑖
end
494 Chapter 37. Linear algebra algorithms using 0-based indexing and semi-open intervals
CHAPTER
THIRTYEIGHT
38.2 To Do
495
Introduction to Numerical Methods and Analysis with Python
• camelCase everything (e.g. change from numerical_methods to numericalMethods), except where an underscore
is used to indicate a subscript. (For one thing that harmonises with Julia style.)
• When using <br>, it should appear on its own line, not at end-of-line. (For PDF output; HTML output is more
forgiving.) This is relevant to some pseudo-code appearence in the PDF; e.g in Basic Concepts and Euler’s Method:
• See https://jupyterbook.org/en/stable/content/myst.html
• Move to MyST Markdown notation {doc}, {ref}, {eq} and so on.
• To number equations for referencing, use MyST-Markdown-augmented $$...$$ notation, as with $$2+2=4$$
(eq-obvious)
• If the top-level section atop a file is labelled as with (section-label)= then the usage {ref}<section-label>
can be used instead of {doc}<file-base-name> this is posibly useful if the file name can have suffix either
“-python” or “-julia” but I want to use the same cross-reference text.
THIRTYNINE
BIBLIOGRAPHY
497
Introduction to Numerical Methods and Analysis with Python
[BFB16] Richard L. Burden, J. Douglas Faires, and Annette M. Burden. Numerical Analysis. Cengage, 10th edition,
2016.
[CK12] Ward Chenney and David Kincaid. Numerical Mathematics and Computing. Cengage, 7 edition, 2012.
[KC90] David Kincaid and Ward Chenney. Numerical Analysis. Brooks/Cole, 1990.
[Sau22] Timothy Sauer. Numerical Analysis. Pearson, 3rd edition, 2022.
499
Introduction to Numerical Methods and Analysis with Python
500 Bibliography
PROOF INDEX
a-contraction-mapping-theorem appendix-backward-banded-0-
a-contraction-mapping-theorem (main/fixed- based
point-iteration-python), 22 appendix-backward-banded-0-based
(main/linear-algebra-with-0-based-indexing-
a-derivative-based-fixed-point- and-semiopen-intervals), 494
theorem
a-derivative-based-fixed-point-theorem appendix-backward-mepp-0-based
(main/fixed-point-iteration-python), 22 appendix-backward-mepp-0-based
(main/linear-algebra-with-0-based-indexing-
absolute-backward-error and-semiopen-intervals), 491
absolute-backward-error (main/error-measures-
convergence-rates), 50 appendix-backward-substitution-0-
based-2
absolute-error appendix-backward-substitution-0-
absolute-error (main/error-measures-convergence- based-2 (main/linear-algebra-with-0-based-
rates), 49 indexing-and-semiopen-intervals), 489
algorithm-Doolittle-factorization appendix-backward-tridiagonal-0-
algorithm-Doolittle-factorization based
(main/linear-equations-3-lu-factorization- appendix-backward-tridiagonal-0-based
python), 101 (main/linear-algebra-with-0-based-indexing-
and-semiopen-intervals), 493
algorithm-plu-1
algorithm-plu-1 (main/linear-equations-4-plu- appendix-forward-banded-0-based
factorization-python), 109 appendix-forward-banded-0-based
(main/linear-algebra-with-0-based-indexing-
algorithm-plu-2 and-semiopen-intervals), 494
algorithm-plu-2 (main/linear-equations-4-plu-
factorization-python), 111 appendix-forward-mepp-0-based
appendix-forward-mepp-0-based (main/linear-
algorithm-plu-fragment algebra-with-0-based-indexing-and-semiopen-
algorithm-plu-fragment (main/linear-equations- intervals), 491
4-plu-factorization-python), 109
appendix-forward-substitution-0-
another-way-to-count-backwards based
another-way-to-count-backwards appendix-forward-substitution-0-based
(main/linear-equations-1-row-reduction-python), (main/linear-algebra-with-0-based-indexing-
78 and-semiopen-intervals), 488
501
Introduction to Numerical Methods and Analysis with Python
appendix-forward-tridiagonal-0- backward-substitution-redux
based backward-substitution-redux (main/linear-
appendix-forward-tridiagonal-0-based equations-7-tridiagonal-banded-and-SDD-
(main/linear-algebra-with-0-based-indexing- matrices), 129
and-semiopen-intervals), 493
bisection-for
appendix-lu-banded-0-based bisection-for (main/root-finding-by-interval-
appendix-lu-banded-0-based (main/linear- halving-python), 16
algebra-with-0-based-indexing-and-semiopen-
intervals), 493 bisection-step
bisection-step (main/root-finding-by-interval-
appendix-lu-doolittle-0-based halving-python), 15
appendix-lu-doolittle-0-based (main/linear-
algebra-with-0-based-indexing-and-semiopen- bisection-while
intervals), 488 bisection-while (main/root-finding-by-interval-
halving-python), 17
appendix-lu-mepp-0-based
appendix-lu-mepp-0-based (main/linear-algebra- bisection-x-cosx
with-0-based-indexing-and-semiopen-intervals), bisection-x-cosx (main/root-finding-by-interval-
491 halving-python), 9
appendix-lu-tridiagonal-0-based check-with-taylor`
appendix-lu-tridiagonal-0-based check-with-taylor` (main/derivatives-and-the-
(main/linear-algebra-with-0-based-indexing- method-of-undetermined-coefficents), 178
and-semiopen-intervals), 492
choose-=step-size-2
appendix:gaussian-elimination-0- choose-=step-size-2 (main/ODE-IVP-5-error-
based control), 259
appendix:gaussian-elimination-0-based
(main/linear-algebra-with-0-based-indexing- choose-step-size-1
and-semiopen-intervals), 487 choose-step-size-1 (main/ODE-IVP-5-error-
control), 258
backward-error
backward-error (main/newtons-method-python), ?? collocation-error-formula
collocation-error-formula (main/polynomial-
backward-error-redux collocation-error-formulas-python), 148
backward-error-redux (main/error-measures-
convergence-rates), 50 collocation-error-formula-equally-
spaced-nodes
backward-substitution- collocation-error-formula-equally-
backward-substitution- (main/linear-equations- spaced-nodes (main/polynomial-
1-row-reduction-python), 78 collocation-error-formulas-python), 148
backward-substitution-0-based collocation-error-formula-remark
backward-substitution-0-based (main/linear- collocation-error-formula-remark
algebra-with-0-based-indexing-and-semiopen- (main/polynomial-collocation-error-formulas-
intervals), 489 python), 153
backward-substitution-1 comparison-to-taylor-error-formula
backward-substitution-1 (main/linear- comparison-to-taylor-error-formula
equations-1-row-reduction-python), 77 (main/polynomial-collocation-error-formulas-
python), 148
convergence-of-order-p error-bound-chebychev-collocation
convergence-of-order-p (main/error-measures- error-bound-chebychev-collocation
convergence-rates), 51 (main/polynomial-collocation-chebychev),
155
definition-absolute-error
definition-absolute-error (main/fixed-point- error-bounds-clamped-splines
iteration-python), 23 error-bounds-clamped-splines
(main/piecewise-polynomial-approximation-
definition-columnwise-strictly- and-splines), 159
diagonally-dominant
error-bounds-hermite-cubics
definition-columnwise-strictly-
diagonally-dominant (main/linear- error-bounds-hermite-cubics (main/piecewise-
equations-1-row-reduction-python), 85 polynomial-approximation-and-splines), 160
definition-contraction-mapping error-left-endpoint-rule
definition-contraction-mapping error-left-endpoint-rule (main/integrals-1-
(main/fixed-point-iteration-python), 21 building-blocks-python), 188
definition-error error-redux
error-redux (main/error-measures-convergence-rates),
definition-error (main/fixed-point-iteration-
49
python), 23
definition-mapping errors-when-approximating-
definition-mapping (main/fixed-point-iteration-
derivatives
python), 20 errors-when-approximating-derivatives
(main/machine-numbers-rounding-error-and-
definition-psychologically-triangular error-propagation-python), 93
definition-psychologically-triangular euler-variable-h
(main/linear-equations-4-plu-factorization-
python), 114 euler-variable-h (main/ODE-IVP-5-error-control),
256
definition-strictly-diagonally- example-1-x-4cosx
dominant example-1-x-4cosx (main/fixed-point-iteration-
definition-strictly-diagonally- python), 20
dominant (main/linear-equations-1-row-
reduction-python), 85 example-2
example-2 (main/newtons-method-python), ??
definition-tridiagonal
definition-tridiagonal (main/linear-equations- example-2-x-cosx
7-tridiagonal-banded-and-SDD-matrices), 128
example-2-x-cosx (main/fixed-point-iteration-
python), 22
definition-vector-valued-contraction-
mapping example-3
definition-vector-valued-contraction-
example-3 (main/newtons-method-python), ??
mapping (main/linear-equations-6-iterative-
methods-python), 125 example-3-x-cosx-fpi
dolittle-general example-3-x-cosx-fpi (main/fixed-point-iteration-
python), 24
dolittle-general (main/linear-equations-7-
tridiagonal-banded-and-SDD-matrices), 130 example-4
example-4 (main/fixed-point-iteration-python), 27
example-almost-division-by-zero example-three-point-centered-
example-almost-division-by-zero difference
(main/linear-equations-1-row-reduction-python), example-three-point-centered-
83 difference (main/derivatives-and-the-
method-of-undetermined-coefficents), 175
example-avoiding-small-
denominators example-three-point-one-sided-
example-avoiding-small-denominators difference
(main/linear-equations-1-row-reduction-python), example-three-point-one-sided-
84 difference (main/derivatives-and-the-
method-of-undetermined-coefficents), 174
example-basic-forward-difference
example-basic-forward-difference example-three-point-one-sided-
(main/derivatives-and-the-method-of-
undetermined-coefficents), 174
difference-method-2
example-three-point-one-sided-
example-hilbert-matrices difference-method-2 (main/derivatives-
and-the-method-of-undetermined-coefficents),
example-hilbert-matrices (main/linear-
177
equations-5-error-bounds-condition-numbers-
python), 117 explicit-midpoint-algorithm
example-integration explicit-midpoint-algorithm (main/ODE-IVP-
2-Runge-Kutta-python), 229
example-integration (main/ODE-IVP-1-basics-
and-Euler-python), ?? explicit-trapezoid-algorithm
example-less-obvious-division-by- explicit-trapezoid-algorithm (main/ODE-
zero IVP-2-Runge-Kutta-python), 225
example-less-obvious-division-by-zero forward-substitution
(main/linear-equations-1-row-reduction-python),
82 forward-substitution (main/linear-equations-7-
tridiagonal-banded-and-SDD-matrices), 129
example-newton-x-cosx
example-newton-x-cosx (main/newtons-method-
gaussian-elimination
python), ?? gaussian-elimination (main/linear-equations-1-
row-reduction-python), 72
example-nonlinear-ode
example-nonlinear-ode (main/ODE-IVP-1-basics-
gaussian-elimination-0-based
and-Euler-python), ?? gaussian-elimination-0-based (main/linear-
equations-1-row-reduction-python), 73
example-obvious-division-by-zero
example-obvious-division-by-zero
gaussian-elimination-inserting-zeros
(main/linear-equations-1-row-reduction-python), gaussian-elimination-inserting-zeros
81 (main/linear-equations-1-row-reduction-python),
71
example-simplest-real-ode
example-simplest-real-ode (main/ODE-IVP-1-
gemepp-0-based
basics-and-Euler-python), ?? gemepp-0-based (main/linear-algebra-with-0-based-
indexing-and-semiopen-intervals), 490
example-stiff-ode
example-stiff-ode (main/ODE-IVP-1-basics-and-
generalized-mean-value-theorem
Euler-python), ?? generalized-mean-value-theorem
(main/integrals-2-composite-rules), 191
geometrical-derivation-of-least- multistep-method
squares multistep-method (main/ODE-IVP-6-multi-step-
geometrical-derivation-of-least- methods-introduction-python), 271
squares (main/least-squares-fitting), 165
multistep-method-redux
integral-mean-value-theorem multistep-method-redux (main/ODE-IVP-7-
integral-mean-value-theorem (main/integrals- multi-step-methods-Adams-Bashforth-python),
1-building-blocks-python), 186 281
interpolation-example-1 naive-gaussian-elimination
interpolation-example-1 (main/polynomial- naive-gaussian-elimination (main/linear-
collocation+approximation-python), 140 equations-1-row-reduction-python), 71
interpolation-example-2 no-scaled-partial-pivoting
interpolation-example-2 (main/polynomial- no-scaled-partial-pivoting (main/linear-
collocation+approximation-python), 143 equations-2-pivoting-python), 95
interpolation-example-3 numpy-math-functions
interpolation-example-3 (main/polynomial- numpy-math-functions (main/root-finding-by-
collocation+approximation-python), 145 interval-halving-python), 12
inverse-power-method numpy-matplotlib
inverse-power-method (main/eigenproblems- numpy-matplotlib (main/root-finding-by-interval-
python), 135 halving-python), 9
linear-convergence odeivp-onestep-order-of-global-error
linear-convergence (main/error-measures-
odeivp-onestep-order-of-global-error
convergence-rates), 50
(main/ODE-IVP-3-error-results-one-step-
lu-banded methods), 238
lu-banded (main/linear-equations-7-tridiagonal- power-method
banded-and-SDD-matrices), 130
power-method (main/eigenproblems-python), 133
lu-banded-symmetric
lu-banded-symmetric (main/linear-equations-7-
proposition-1
tridiagonal-banded-and-SDD-matrices), 131 proposition-1 (main/newtons-method-convergence-
rate), ??
lu-factorization
lu-factorization (main/linear-equations-7-
proposition-1-fpi-iterates-converge-
tridiagonal-banded-and-SDD-matrices), 129 to-fp
proposition-1-fpi-iterates-converge-
mathematically-correct-notation to-fp (main/fixed-point-iteration-python),
mathematically-correct-notation 19
(main/linear-equations-1-row-reduction-python),
78 proposition-2
proposition-2 (main/newtons-method-convergence-
midpoint-rule-error rate), ??
midpoint-rule-error (main/integrals-1-building-
blocks-python), 186 proposition-2-ivp-fpi-version
proposition-2-ivp-fpi-version (main/fixed-
module-numerical-methods point-iteration-python), 20
module-numerical-methods (main/newtons-
method-python), ??
proposition-3 remark-LU-with-P
proposition-3 (main/fixed-point-iteration-python), 23 remark-LU-with-P (main/linear-equations-4-plu-
factorization-python), 111
python-array-dicing
python-array-dicing (main/linear-equations-1- remark-dolittle
row-reduction-python), 76 remark-dolittle (main/linear-equations-3-lu-
factorization-python), 101
python-array-slicing
python-array-slicing (main/linear-equations-1- remark-float-vs-int
row-reduction-python), 75 remark-float-vs-int (main/linear-equations-1-
row-reduction-python), 70
python-complex-numbers
python-complex-numbers (main/newtons-method- remark-module-linalg
python), ?? remark-module-linalg (main/linear-equations-1-
row-reduction-python), 67
python-counting-backwards
python-counting-backwards (main/linear- remark-numpy-linalg-norm
equations-1-row-reduction-python), 78 remark-numpy-linalg-norm (main/linear-
equations-5-error-bounds-condition-numbers-
python-dot python), 116
python-dot (main/eigenproblems-python), 133
remark-numpy-matrix-product
python-splat-* remark-numpy-matrix-product (main/linear-
python-splat-* (main/linear-equations-1-row- equations-1-row-reduction-python), 69
reduction-python), 79
remark-other-matrix-norms
relative-error remark-other-matrix-norms (main/linear-
relative-error (main/error-measures-convergence- equations-5-error-bounds-condition-numbers-
rates), 49 python), 116
remark-1 remark-positive-definite-also-works
remark-1 (main/integrals-2-composite-rules), 194 remark-positive-definite-also-works
(main/linear-equations-3-lu-factorization-
remark-1-not-quite-zero-values-and- python), 107
rounding
remark-1-not-quite-zero-values-and- remark-positive-definite-matrices-
rounding (main/linear-equations-1-row- also-work
reduction-python), 71 remark-positive-definite-matrices-
also-work (main/linear-equations-1-row-
remark-1-to-0-easy reduction-python), 86
remark-1-to-0-easy (main/linear-equations-1-row-
reduction-python), 73 remark-python-for-0-based
remark-python-for-0-based (main/linear-
remark-12 equations-1-row-reduction-python), 72
remark-12 (main/fixed-point-iteration-python), 24
remark-python-style
remark-19 remark-python-style (main/newtons-method-
remark-19 (main/linear-equations-1-row-reduction- python), ??
python), 82
remark-use-the-module
remark-5 remark-use-the-module (main/root-finding-
remark-5 (main/error-measures-convergence-rates), 50 without-derivatives-python), 56
richardson-forward-differences theorem-LU-SDD
richardson-forward-differences theorem-LU-SDD (main/linear-equations-3-lu-
(main/richardson-extrapolation), 180 factorization-python), 106
richardson0n-to-kn theorem-collocation
richardson0n-to-kn (main/richardson- theorem-collocation (main/polynomial-
extrapolation), 181 collocation+approximation-python), 139
rkf theorem-gaus-seidel-convergence
rkf (main/ODE-IVP-5-error-control), 266 theorem-gaus-seidel-convergence
(main/linear-equations-6-iterative-methods-
robust python), 127
robust (main/machine-numbers-rounding-error-and-
error-propagation-python), 87
theorem-jacobi-convergence
theorem-jacobi-convergence (main/linear-
romberg-integration equations-6-iterative-methods-python), 126
romberg-integration (main/integrals-4-romberg-
integration), 198
theorem-loss-of-precision
theorem-loss-of-precision (main/machine-
runge-kutta numbers-rounding-error-and-error-propagation-
python), 93
runge-kutta (main/ODE-IVP-2-Runge-Kutta-python),
233 theorem-matrix-iteration-
secant-method convergence
theorem-matrix-iteration-convergence
secant-method (main/root-finding-without-
(main/linear-equations-6-iterative-methods-
derivatives-python), 61
python), 125
separatrices theorem-row-reduction-preserves-
separatrices (main/ODE-IVP-4-system-higher-order-
equations-python), 242
sdd
theorem-row-reduction-preserves-sdd
stiffness (main/linear-equations-1-row-reduction-python),
85
stiffness (main/ODE-IVP-4-system-higher-order-
equations-python), 241 todo-ode-2
todo-ode-2 (main/ODE-IVP-2-Runge-Kutta-python),
super-linear 223
super-linear (main/error-measures-convergence-
rates), 51 trapezoid-rule-error
trapezoid-rule-error (main/integrals-1-building-
taylors-theorem-a blocks-python), 185
taylors-theorem-a (main/taylors-theorem), ??
trapezoid-step-size
taylors-theorem-h trapezoid-step-size (main/ODE-IVP-5-error-
taylors-theorem-h (main/taylors-theorem), ?? control), 265
theorem-1 triangular-matrix
theorem-1 (main/derivatives-and-the-method-of- triangular-matrix (main/linear-equations-3-lu-
undetermined-coefficents), 176 factorization-python), 99
uniformly-contracting
uniformly-contracting (main/fixed-point-
iteration-python), 22
vector-valued-contraction-mapping-
theorem
vector-valued-contraction-mapping-
theorem (main/linear-equations-6-iterative-
methods-python), 125
well-posed
well-posed (main/machine-numbers-rounding-error-
and-error-propagation-python), 87