
MA385: Numerical Analysis I

Niall Madden
November 22, 2012
Contents
1 MA385 Introductory Lecture 2
2 Taylor's Theorem Part II 5
3 Solving nonlinear equations 7
4 The Secant Method 9
5 From Secant to Newton's Method 11
6 Analysing Newton's Method 13
7 Matlab class 1 14
8 Wrap up: Iteration 16
9 Initial Value Problems 20
10 One step methods 21
11 Analysis of one-step methods 23
12 RK2 25
13 Matlab Class 2: RK1 and RK2 27
14 Higher-order methods 30
15 From IVPs to Linear Systems 32
16 Gaussian Elimination 35
17 Matlab Class 3: functions 37
18 Triangular Matrices 38
19 LU-factorisation 40
20 Putting it all together 42
21 Norms of vectors and matrices 44
22 Computing ||A||_2 46
23 Condition Numbers 47
24 Gerschgorin's theorems 49
Lecture 1 (Tuesday 04/09/12)
1 MA385 Introductory Lecture
1.1 Welcome to MA385
This is Semester I of the honours numerical analysis course.
You may be taking this course if you are enrolled in 4th
Financial Mathematics, 3rd Science (Honours Maths), the
H.Dip. in Applied Science (Maths), year one of the two-year
MA program, or just taking it as an option as part of your
mathematical science degree.
The basic information for the course is as follows:
Lecturer: Dr Niall Madden, School of Maths. My office is in
room ADB-1013 (formerly known as C213, Áras de Brún).
Email: [email protected]
Lectures: Tuesday and Thursday, 2.10-3.00 in ADB-1020 (C219).
Tutorial/Lab: TBA (to begin during Week 3)
Assessment: Lab assignments, written problem sets, a 2-hour
exam in December.
1.1.1 Text books
The main text-book is Süli and Mayers, An Introduction to
Numerical Analysis, Cambridge University Press [1]. This
should be available from the library at 519.4 MAY, and there
are copies in the book shop. It is very well suited to this
course: though it does not over-complicate the material, it
approaches topics with a reasonable amount of rigour. There
is a good selection of interesting problems. The scope of
the book is almost perfect for the course, especially for
those students taking both semesters. You should buy this
book.
Other books include:
G.W. Stewart, Afternotes on Numerical Analysis, SIAM [3].
In the library at 519.4 STE. Moreover, the full text is
freely available online to NUI Galway users! This book is
very easy to read, and students who would enjoy a bit more
discussion at an easier pace might turn to it.
Cleve Moler, Numerical Computing with MATLAB [2]. The
emphasis is on the implementation of algorithms in Matlab,
but the techniques are well explained and there are some
nice exercises. Also, it is freely available online.
James F. Epperson, An introduction to numerical methods and
analysis [5]. There are five copies in the library at 519.4.
Michelle Schatzman, Numerical Analysis [8].
Stoer and Bulirsch, Introduction to Numerical Analysis,
Springer [6]. A very complete reference for this course.
Quarteroni, et al., Numerical Mathematics, Springer. Also
very thorough, and with sample Matlab code.
1.1.2 Web site
The on-line content for the course will be hosted at
NUIGalway.BlackBoard.com and on the Mathematics web-server
at http://www.maths.nuigalway.ie/MA385. There you'll find
various pieces of information, including lecture summaries,
these notes, problem sets, announcements, etc.
If you are registered for MA385, you should be automatically
enrolled onto the blackboard site. If you are enrolled in
MA530, please send an email to me.
A synopsis of each lecture will be posted to the website
before the class. They contain most of the main remarks,
statements of theorems, results and exercises. However,
they will not contain proofs of theorems, examples,
solutions to exercises, etc.
You should try to print these before class. It will make
following the lecture easier, and you'll know what notes to
take down.
1.1.3 What is Numerical Analysis?
It's the design, analysis and implementation of numerical
methods that yield exact or approximate solutions to
mathematical problems.
It does not involve endless, tedious calculations. We will
not (usually) implement Newton's Method by hand, waste time
doing the arithmetic of Gaussian Elimination, etc.
The Design of a numerical method is perhaps the most
interesting; it's often about finding a clever way of
swapping the problem for one that is easier to solve, but
has the same or similar solution. If the two problems have
the same solution, then the method is exact. If they are
similar, then it is approximate.
The Analysis is the mathematical part; it usually culminates
in proving a theorem that tells us (at least) one of the
following:
- the method will work: that our algorithm will yield the
solution we are looking for;
- how much effort is required;
- if the method is approximate, how close the approximate
solution will be to the true one. A description of this
aspect of the course, to quote the Introduction to [5], is
being "rigorously imprecise or approximately precise".
We'll look at the implementation of the methods in labs.
1.1.4 Topics
1. We'll start off with a review of Taylor's theorem. It is
the starting point for the algorithms of the next two
sections.
2. Root-finding and solving non-linear equations.
3. Initial value ordinary differential equations.
4. Matrix Algorithms I: solving systems of linear equations.
5. Matrix Algorithms II: estimating eigenvalues and
eigenvectors.
We will also see how these methods can be applied to
real-world problems, including Financial Mathematics.
1.1.5 Learning outcomes
When you have successfully completed this course, you will
be able to demonstrate your factual knowledge of the core
topics (root-finding, solving ODEs, solving linear systems,
estimating eigenvalues), using appropriate mathematical
syntax and terminology.
Moreover, you will be able to describe the fundamental
principles of the concepts (e.g., Taylor's Theorem)
underpinning Numerical Analysis. Then, you will apply these
principles to design algorithms for solving mathematical
problems, and discover the properties of these algorithms.
Students will gain the ability to use Matlab to implement
these algorithms, and to adapt the codes for more general
problems, and for new techniques.
1.1.6 Mathematical Preliminaries
Anyone who can remember their first and second years of
analysis and algebra should be able to handle this course.
Students who know a little about differential equations
(initial value and boundary value) will find certain
sections (particularly in Semester II) somewhat easier than
those who haven't.
If it's been a while since you covered basic calculus, you
will find it very helpful to revise the following: the
Intermediate Value Theorem; Rolle's Theorem; the Mean Value
Theorem; Taylor's Theorem; and the triangle inequality:
|a + b| ≤ |a| + |b|. You'll find them in any good text
book, e.g., Appendix 1 of Süli and Mayers.
You'll also find it helpful to recall some basic linear
algebra, particularly relating to eigenvalues and
eigenvectors. Consider the statement: "all the eigenvalues
of a real symmetric matrix are real". If you are unsure
what any of the terms used mean, or if you didn't know that
it's true, you should have a look at a book on Linear
Algebra.
1.1.7 Why take this course?
Many industry and academic environments require graduates
who can solve real-world problems using a mathematical
model, but these models can often only be resolved using
numerical methods. To quote one Financial Engineer: "We
prefer approximate (numerical) solutions to exact models
rather than exact solutions to simplified models."
Another expert, who leads a group in fund management with
DB London, when asked what sort of graduates he would hire,
gave a list of specific skills that included:
- a programming language, and a 4th-generation language
such as Matlab (or S-PLUS);
- Numerical Analysis.
Recent graduates of our Financial Mathematics, Computing
and Mathematics degrees often report to us that they were
hired because they had some numerical analysis background,
or were required to go and learn some before they could do
some proper work. This is particularly true in the
financial sector, games development, and the mathematics
civil services (e.g., the Met Office, CSO).
References
[1] E. Süli and D. Mayers, An Introduction to Numerical
Analysis, Cambridge University Press, 2003. 519.4 MAY.
[2] Cleve Moler, Numerical Computing with MATLAB, SIAM.
Also available free from http://www.mathworks.com/moler
[3] G.W. Stewart, Afternotes on Numerical Analysis, SIAM,
1996. 519.4 STE.
[4] G.W. Stewart, Afternotes goes to Graduate School, SIAM,
1998. 519.4 STE.
[5] James F. Epperson, An introduction to numerical methods
and analysis. 519.4 EPP.
[6] Stoer and Bulirsch, Introduction to Numerical Analysis,
Springer.
[7] Quarteroni, Sacco and Saleri, Numerical Mathematics,
Springer.
[8] Michelle Schatzman, Numerical Analysis: a mathematical
introduction. 515 SCH.
Lecture 1 Taylor's Theorem (Thu 06/09/12)
1.2 Taylor's Theorem
Taylor's theorem is perhaps the most important mathematical
tool in Numerical Analysis. Provided we can evaluate the
derivatives of a given function at some point, it gives us
a way of approximating the function by a polynomial.
Working with polynomials, particularly ones of degree 3 or
less, is much easier than working with arbitrary functions.
For example, polynomials are easy to differentiate and
integrate. Most importantly for the next section of this
course, their zeros are easy to find.
Our study of Taylor's theorem starts with our old friend,
the mean value theorem.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Theorem 1.1 (Mean Value Theorem). If f is a function that
is continuous and differentiable for all a ≤ x ≤ b, then
there is a point c ∈ [a, b] such that

    (f(b) - f(a)) / (b - a) = f'(c).
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
This is just a consequence of Rolle's Theorem, and has a
few different interpretations. One is that the slope of the
line that intersects f at the points a and b is equal to
the slope of the tangent to f at some point between a and b.
There are many important consequences of the MVT, some of
which we'll return to later. Right now, we are interested
in the fact that the MVT tells us that we can approximate
the value of a function by a near-by value, with accuracy
that depends on f'.
Or we can think of it as approximating f by a line:
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
But what if we wanted a better approximation? We could
replace our function with, say, a quadratic polynomial.
Let p_2(x) = b_0 + b_1(x - a) + b_2(x - a)^2 and solve for
the coefficients b_0, b_1 and b_2 so that

    p_2(a) = f(a),   p_2'(a) = f'(a),   p_2''(a) = f''(a).

This gives

    p_2(x) = f(a) + (x - a) f'(a) + ((x - a)^2 / 2) f''(a).
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Next, if we try to construct an approximating cubic of the
form

    p_3(x) = b_0 + b_1(x - a) + b_2(x - a)^2 + b_3(x - a)^3
           = sum_{k=0}^{3} b_k (x - a)^k,

so that p_3(a) = f(a), p_3'(a) = f'(a), p_3''(a) = f''(a),
and p_3'''(a) = f'''(a). Note: we can write this more
succinctly, using the mathematical short-hand:

    p_3^(k)(a) = f^(k)(a)   for k = 0, 1, 2, 3.

Again we find that

    b_k = f^(k)(a) / k!   for k = 0, 1, 2, 3.
As you can probably guess, this formula can be easily
extended for arbitrary k, giving us the Taylor Polynomial.

Definition 1.2 (Taylor Polynomial). The Taylor¹ Polynomial
of degree k (also called the Truncated Taylor Series) that
approximates the function f about the point x = a is

    p_k(x) = f(a) + (x - a) f'(a) + ((x - a)^2 / 2) f''(a)
             + ((x - a)^3 / 3!) f'''(a) + . . .
             + ((x - a)^k / k!) f^(k)(a).
In the next lecture, we'll return to this topic with a
particular emphasis on quantifying the error in the Taylor
Polynomial.

¹ Brook Taylor, 1685-1731, England. He (re)discovered this
polynomial approximation in 1712, though its importance was
not realised for another 50 years.
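As a quick illustration (not part of the original notes): the Taylor Polynomial in Definition 1.2 is easy to evaluate in code. The course labs use Matlab; the sketch below is our own Python translation of the same idea, with all names our own. For f(x) = e^x about a = 0, every derivative at a equals 1, so the coefficients are simply 1/j!.

```python
import math

def taylor_poly(f_derivs, a, k, x):
    """Evaluate the degree-k Taylor polynomial about x = a.
    f_derivs(j) must return the j-th derivative of f at a."""
    return sum(f_derivs(j) * (x - a) ** j / math.factorial(j)
               for j in range(k + 1))

# f(x) = e^x about a = 0: every derivative at 0 is 1.
derivs = lambda j: 1.0
for k in (1, 2, 3):
    pk = taylor_poly(derivs, 0.0, k, 1.0)
    print(k, pk, abs(math.e - pk))  # error shrinks as k grows
```

Running this reproduces the behaviour of Figures 2.1 and 2.2 in the next lecture: the error at x = 1 decreases as the degree k increases.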
Lecture 2 Taylor's Theorem Part II (Thu 06/09/12)
2 Taylor's Theorem Part II
2.1 Taylor Polynomials
Recall from Tuesday: the Taylor Polynomial of degree k that
approximates the function f about the point x = a is

    p_k(x) = f(a) + (x - a) f'(a) + ((x - a)^2 / 2) f''(a)
             + ((x - a)^3 / 3!) f'''(a) + . . .
             + ((x - a)^k / k!) f^(k)(a).
It is also called the Truncated Taylor Series.
Example 2.1. Write down the Taylor polynomial of degree k
that approximates f(x) = e^x about the point x = 0.

[Fig. 2.1: Taylor polynomials for f(x) = e^x about x = 0]
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
As Figure 2.1 suggests, in this case p_3(x) is a more
accurate estimate of e^x than p_2(x), which is more
accurate than p_1(x). This is demonstrated in Figure 2.2,
where the difference between f and p_k is shown.

[Fig. 2.2: Error in Taylor polynomials for f(x) = e^x about x = 0]
2.2 The Remainder
The remainder (ha!) of this lecture is spent examining the
accuracy of the Taylor polynomial as an approximation. In
particular, we would like to find a formula for the
remainder, or error:

    R_n(x) := f(x) - p_n(x).

With a little bit of effort one can prove that

    R_n(x) = ((x - a)^(n+1) / (n + 1)!) f^(n+1)(ξ),
    for some ξ between a and x.

We won't prove this in class, since it is quite standard
and features in other courses you have taken. But for the
sake of completeness, a proof is included in Section 2.5
below.
Example 2.2. With f(x) = e^x and a = 0, we get that

    R_n(x) = (x^(n+1) / (n + 1)!) e^ξ,   for some ξ ∈ [0, x].

Example 2.3. How many terms are required in the Taylor
Polynomial for e^x about x = 0 to ensure that the error at
x = 1 is no more than 10^(-6)?
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
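Example 2.3 can be checked numerically (this sketch is ours, not part of the notes). From Example 2.2, since e^ξ ≤ e for ξ ∈ [0, 1], the error at x = 1 is bounded by e/(n + 1)!, so a short loop finds the smallest degree n for which that bound drops below the tolerance:

```python
import math

def degree_needed(tol):
    """Smallest degree n with e / (n+1)! <= tol, i.e. the degree
    of the Taylor polynomial for e^x about 0 whose error at
    x = 1 is guaranteed to be at most tol."""
    n = 0
    while math.e / math.factorial(n + 1) > tol:
        n += 1
    return n

print(degree_needed(1e-6))  # prints 9
```

So a polynomial of degree n = 9 (ten terms, counting the constant) suffices: e/10! ≈ 7.5e-07 ≤ 10^(-6), while e/9! ≈ 7.5e-06 is too big.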
There are other ways of representing the remainder,
including the Integral Representation of the Remainder:

    R_n(x) = ∫_a^x (f^(n+1)(t) / n!) (x - t)^n dt.   (2.1)
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Other issues related to Taylor polynomials, which we will
touch on later in the course, include constructing the
Taylor series for a function of two variables: f = f(x, y).
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.3 Two applications of Taylor's Theorem
The reasons for emphasising Taylor's theorem so early in
this course are that:
- it introduces us to the concepts of approximation and
error estimation, but in a very simple setting;
- it is the basis for deriving methods for solving both
nonlinear equations and initial value ordinary differential
equations.
With the last point in mind, we'll now outline how to
derive:
1. Newton's method for nonlinear equations;
2. Euler's method for IVPs.
This is just a taster: we'll return to these topics later
in the semester.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.4 Exercises
Exercise 2.1. Prove the Integral Mean Value Theorem: there
exists a point c ∈ [a, b] such that

    f(c) = (1 / (b - a)) ∫_a^b f(x) dx.

Exercise 2.2. Write down the formula for the Taylor
Polynomial for
(i) f(x) = sqrt(1 + x) about the point x = 0,
(ii) f(x) = sin(x) about the point x = 0,
(iii) f(x) = log(x) about the point x = 1.

Exercise 2.3. The Fundamental Theorem of Calculus tells us
that ∫_a^x f'(t) dt = f(x) - f(a). This can be rearranged
to get f(x) = f(a) + ∫_a^x f'(t) dt. Use this and
integration by parts to deduce (2.1) for the case n = 1.
(Hint: check Wikipedia!)
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.5 A proof of Taylor's Theorem
Here is a proof of Taylor's theorem. It wasn't covered in
class. One of the ingredients needed is the Generalised
Mean Value Theorem: if the functions F and G are continuous
and differentiable, etc., then, for some point c ∈ [a, b],

    (F(b) - F(a)) / (G(b) - G(a)) = F'(c) / G'(c).   (2.2)
Theorem 2.4 (Taylor's Theorem). Suppose we have a function
f that is sufficiently differentiable on the interval
[a, x], and a Taylor polynomial for f about the point
x = a:

    p_n(x) = f(a) + (x - a) f'(a) + ((x - a)^2 / 2) f''(a)
             + ((x - a)^3 / 3!) f'''(a) + . . .
             + ((x - a)^n / n!) f^(n)(a).   (2.3)

If the remainder is written as R_n(x) := f(x) - p_n(x),
then

    R_n(x) = ((x - a)^(n+1) / (n + 1)!) f^(n+1)(ξ),   (2.4)

for some point ξ between a and x.
Proof. We want to prove that, for any n = 0, 1, 2, . . .,
there is a point ξ ∈ [a, x] such that

    f(x) = p_n(x) + R_n(x).

If x = a then this is clearly the case, because
f(a) = p_n(a) and R_n(a) = 0.
For the case x ≠ a, we will use a proof by induction.
The Mean Value Theorem tells us that there is some point
ξ ∈ [a, x] such that

    (f(x) - f(a)) / (x - a) = f'(ξ).

Using that p_0(a) = f(a) and that R_0(x) = (x - a) f'(ξ),
we can rearrange to get

    f(x) = p_0(a) + R_0(x),

as required.
Now we will assume that (2.3)-(2.4) are true for the case
n = k - 1, and use this to show that they are true for
n = k. From the Generalised Mean Value Theorem (2.2),
there is some point c such that

    R_k(x) / (x - a)^(k+1)
        = (R_k(x) - R_k(a)) / ((x - a)^(k+1) - (a - a)^(k+1))
        = R_k'(c) / ((k + 1)(c - a)^k),

where we have used that R_k(a) = 0. Rearranging, we see
that we can write R_k in terms of its own derivative:

    R_k(x) = R_k'(c) (x - a)^(k+1) / ((k + 1)(c - a)^k).   (2.5)
So now we need an expression for R_k'. This is done by
noting that it also happens to be the remainder for the
Taylor polynomial of degree k - 1 for the function f':

    R_k'(x) = f'(x) - d/dx [ f(a) + (x - a) f'(a)
              + ((x - a)^2 / 2!) f''(a) + . . .
              + ((x - a)^k / k!) f^(k)(a) ]
            = f'(x) - [ f'(a) + (x - a) f''(a) + . . .
              + ((x - a)^(k-1) / (k - 1)!) f^(k)(a) ].

But the expression in brackets on the last line is the
formula for the Taylor Polynomial of degree k - 1 for f'.
By our inductive hypothesis:

    R_k'(c) = ((c - a)^k / k!) f^(k+1)(ξ),

for some ξ. Substitute this into (2.5) above and we are
done.
Lecture 3 Solving nonlinear equations (Tue 11/09/12)
3 Solving nonlinear equations
3.1 Introduction
Linear equations are of the form:

    find x such that ax + b = 0,

and are easy to solve. Some nonlinear problems are also
easy to solve, e.g.,

    find x such that ax^2 + bx + c = 0.

Cubic and quartic equations also have solutions for which
we can obtain a formula. But, in the majority of cases,
there is no formula. So we need a numerical method.
References:
- Süli and Mayers [1, Chapter 1]. I'll follow this pretty
closely in lectures.
- Stewart (Afternotes ...), [3, Lectures 1-5]. A
well-presented introduction, with lots of diagrams to give
an intuitive introduction.
- Moler (Numerical Computing with MATLAB) [2, Chap. 4].
Gives a brief introduction to the methods we study, and a
description of MATLAB functions for solving these problems.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Our generic problem is:
Let f be a continuous function on the interval [a, b].
Find ξ ∈ [a, b] such that f(ξ) = 0.
Then ξ is the solution to f(x) = 0.
This leads to two natural questions:
1. How do we know there is a solution?
2. How do we find it?
The following gives sufficient conditions for the existence
of a solution:

Proposition 3.1. Let f be a real-valued function that is
defined and continuous on a bounded closed interval
[a, b] ⊂ R. Suppose that f(a)f(b) ≤ 0. Then there exists
ξ ∈ [a, b] such that f(ξ) = 0.

OK, now we know there is a solution to f(x) = 0, but how do
we actually solve it? Usually we don't! Instead we
construct a sequence of estimates {x_0, x_1, x_2, x_3, . . .}
that converges to the true solution. So now we have to
answer these questions:
(1) How can we construct the sequence x_0, x_1, . . .?
(2) How do we show that lim_{k→∞} x_k = ξ?
There are some subtleties here, particularly with part (2).
What we would like to say is that at each step the error is
getting smaller. That is,

    |ξ - x_k| < |ξ - x_{k-1}|   for k = 1, 2, 3, . . . .

But we can't. Usually all we can say is that the bounds on
the error are getting smaller. That is: let ε_k be a bound
on the error at step k,

    |ξ - x_k| < ε_k;

then ε_{k+1} < μ ε_k for some number μ ∈ (0, 1). It is
easiest to explain this in terms of an example, so we'll
study the simplest method: Bisection.
3.2 Bisection
The most elementary algorithm is the Bisection Method (also
known as Interval Bisection). Suppose that we know that
f(x) = 0 has a solution, ξ, in the interval [a, b].

Method 3.2 (Bisection). Set x_0 = a and x_1 = b. Set
x_2 = (x_0 + x_1)/2. For every subsequent k, set x_{k+1} to
be the midpoint of x_k and x_j, where x_j is the point to
the left or right of x_k such that f(x_k) f(x_j) ≤ 0. Stop
when |f(x_{k+1})| or |x_{k+1} - x_k| is sufficiently small.
Alternatively, we can give some pseudocode for the method.
Set eps to be the stopping criterion.

while ( abs(b-a) > eps )
do
    if ( f(a) == 0 )
        return a;
    else if ( f(b) == 0 )
        return b;
    fi;
    c = (b+a)/2;
    if ( f(c) == 0 )
        return c;
    else
        if ( f(a)*f(c) < 0 )
            b = c;
        else
            a = c;
        fi;
    fi;
od;
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
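The pseudocode above translates directly into a short program. The course labs use Matlab; as an illustration here is our own Python version (function and variable names are ours):

```python
def bisection(f, a, b, eps=1e-6):
    """Bisection method: assumes f is continuous and
    f(a)*f(b) <= 0, so that [a, b] brackets a root."""
    if f(a) == 0:
        return a
    if f(b) == 0:
        return b
    while abs(b - a) > eps:
        c = (a + b) / 2
        if f(c) == 0:
            return c
        if f(a) * f(c) < 0:
            b = c  # the root lies in [a, c]
        else:
            a = c  # the root lies in [c, b]
    return (a + b) / 2

print(bisection(lambda x: x * x - 2, 0.0, 2.0))  # approx 1.414214
```

Each pass through the loop halves the bracketing interval, which is exactly the behaviour recorded in Table 3.1 below.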
[Fig. 3.1: Solving x^2 - 2 = 0 with the Bisection Method:
the graph of f(x) = x^2 - 2, with x_0 = a, x_1 = b,
x_2 = 1, x_3 = 1.5 marked on the x-axis]
 k   x_k        |ξ - x_k|   |x_k - x_{k-1}|
 0   0.000000   1.41
 1   2.000000   5.86e-01
 2   1.000000   4.14e-01    1.00
 3   1.500000   8.58e-02    5.00e-01
 4   1.250000   1.64e-01    2.50e-01
 5   1.375000   3.92e-02    1.25e-01
 6   1.437500   2.33e-02    6.25e-02
 7   1.406250   7.96e-03    3.12e-02
 8   1.421875   7.66e-03    1.56e-02
 9   1.414062   1.51e-04    7.81e-03
10   1.417969   3.76e-03    3.91e-03
 .       .          .           .
22   1.414214   5.72e-07    9.54e-07

Table 3.1: Solving x^2 - 2 = 0 with the Bisection Method
Example 3.3. Find an estimate for sqrt(2) that is correct
to 6 decimal places. Solution: try to solve the equation
f(x) := x^2 - 2 = 0. Then proceed as shown in Figure 3.1
and Table 3.1.
Note that at steps 4 and 10 in Table 3.1 the error actually
increases, although the bound on the error is decreasing.
3.3 The bisection method works
To actually implement the Bisection Method, we construct a
sequence of estimates x_0, x_1, x_2, x_3, . . . as follows.
Choose a small number eps.
1. Set x_0 = a and x_1 = b.
2. Set x_2 = (x_0 + x_1)/2.
3. If |f(x_2)| ≤ eps, stop and take x_2 as your answer.
4. Otherwise, set x_3 to be the midpoint of the interval to
the left or right of x_2 where f changes sign.
5. Continue as above until you find a point x_k such that
|f(x_k)| ≤ eps.
The main advantages of the Bisection method are:
- It will always work.
- After k steps we know that

Proposition 3.4.

    |ξ - x_k| ≤ (1/2)^(k-1) |b - a|,   for k = 2, 3, 4, . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
A disadvantage of bisection is that it is not as efficient
as some other methods we'll investigate later.
3.4 Improving upon bisection
The bisection method is not very efficient. Our next goals
will be to derive better methods, particularly the Secant
Method and Newton's method. We also have to come up with
some way of expressing what we mean by "better"; and we'll
have to use Taylor's theorem in our analyses.
3.5 Exercises
Exercise 3.1. Does Proposition 3.1 mean that, if there is a
solution to f(x) = 0 in [a, b], then f(a)f(b) ≤ 0? That is,
is f(a)f(b) ≤ 0 a necessary condition for there being a
solution to f(x) = 0? Give an example that supports your
answer.
Exercise 3.2. Suppose we want to find ξ ∈ [a, b] such that
f(ξ) = 0 for some given f, a and b. Write down an estimate
for the number of iterations K required to ensure that, for
a given ε, we know |ξ - x_k| ≤ ε for all k ≥ K. In
particular, how does this estimate depend on f, a and b?
Exercise 3.3. How many (decimal) digits of accuracy are
gained at each step?
Exercise 3.4. Let f(x) = e^x - 2x - 2. Show that the
problem

    find ξ ∈ [0, 2] such that f(ξ) = 0

has a solution. Taking x_0 = 0 and x_1 = 2, use 6 steps of
the bisection method to estimate ξ. Give an upper bound for
the error |ξ - x_6|.
Lecture 4 The Secant Method (Thu 13/09/12)
4 The Secant Method
4.1 Motivation
Looking back at Table 3.1 we notice that, at step 4, the
error increases rather than decreases. You could argue that
this is because we didn't take into account how close x_3
is to the true solution. We could improve upon the
bisection method as described below. The idea is: given
x_{k-1} and x_k, take x_{k+1} to be the zero of the line
that intersects the points (x_{k-1}, f(x_{k-1})) and
(x_k, f(x_k)). See Figure 4.1.
[Fig. 4.1: The Secant Method for Example 4.2: two panels
showing f(x) = x^2 - 2; the secant line through
(x_0, f(x_0)) and (x_1, f(x_1)) gives x_2, and the secant
line through (x_1, f(x_1)) and (x_2, f(x_2)) gives x_3]
Method 4.1 (Secant²). Choose x_0 and x_1 so that there is a
solution in [x_0, x_1]. Then define

    x_{k+1} = x_k - f(x_k) (x_k - x_{k-1}) / (f(x_k) - f(x_{k-1})).   (4.6)
Example 4.2. Use the Secant Method to solve the nonlinear
problem x^2 - 2 = 0 in [0, 2]. The results are shown in
Table 4.1. By comparing Tables 3.1 and 4.1, we see that for
this example the Secant method is much more efficient than
Bisection. We'll return to why this is later.
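The iteration (4.6) is equally short in code. This Python sketch is our own (the course labs use Matlab), with names and stopping rules chosen for illustration:

```python
def secant(f, x0, x1, tol=1e-12, max_iter=50):
    """Secant method (4.6): each new iterate is the zero of the
    straight line through (x0, f(x0)) and (x1, f(x1))."""
    for _ in range(max_iter):
        fx0, fx1 = f(x0), f(x1)
        if fx1 == fx0:  # secant line is flat: cannot continue
            break
        x0, x1 = x1, x1 - fx1 * (x1 - x0) / (fx1 - fx0)
        if abs(x1 - x0) < tol:
            break
    return x1

print(secant(lambda x: x * x - 2, 0.0, 2.0))  # approx 1.414214
```

Starting from x_0 = 0, x_1 = 2 this reproduces the iterates x_2 = 1, x_3 = 1.333333, . . . recorded in Table 4.1.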
k   x_k        |ξ - x_k|
0   0.000000   1.41
1   2.000000   5.86e-01
2   1.000000   4.14e-01
3   1.333333   8.09e-02
4   1.428571   1.44e-02
5   1.413793   4.20e-04
6   1.414211   2.12e-06
7   1.414214   3.16e-10
8   1.414214   4.44e-16

Table 4.1: Solving x^2 - 2 = 0 using the Secant Method

The Method of Bisection could be written as the weighted
average x_{k+1} = (1 - λ_k) x_k + λ_k x_{k-1}, with
λ_k = 1/2. We can also think of the Secant method as being
a weighted average, but with λ_k chosen to obtain faster
convergence to the true solution. Looking at Figure 4.1
above, you could say that we should choose λ_k depending on
which is smaller: f(x_{k-1}) or f(x_k). If (for example)
f(x_{k-1}) < f(x_k), then probably
|ξ - x_{k-1}| < |ξ - x_k|. This gives another formulation
of the Secant Method:

    x_{k+1} = x_k (1 - λ_k) + x_{k-1} λ_k,   (4.7)

where

    λ_k = f(x_k) / (f(x_k) - f(x_{k-1})).

When it's written in this form it is sometimes called a
relaxation method.

² The name comes from the name of a line that intersects a
curve at two points. There is a related method called false
position, which was known in India in the 3rd century BC,
and in China in the 2nd century BC.
Finally, we remark that the formulation in (4.6) can lead
to Newton's method (see next lecture) by observing that

    (x_k - x_{k-1}) / (f(x_k) - f(x_{k-1})) ≈ 1 / f'(x_k).
4.2 Order of Convergence
To compare different methods, we need the following
concept:

Definition 4.3 (Linear Convergence). Suppose that
ξ = lim_{k→∞} x_k. Then we say that the sequence
{x_k}_{k=0}^∞ converges to ξ at least linearly if there is
a sequence of positive numbers {ε_k}_{k=0}^∞, and
μ ∈ (0, 1), such that

    lim_{k→∞} ε_k = 0,   (4.8a)
and
    |ξ - x_k| ≤ ε_k   for k = 0, 1, 2, . . .,   (4.8b)
and
    lim_{k→∞} ε_{k+1} / ε_k = μ.   (4.8c)

So, for example, the bisection method converges at least
linearly.
The reason for the expression "at least" is that we usually
can only show that a set of upper bounds for the errors
converges linearly. If (4.8b) can be strengthened to the
equality |ξ - x_k| = ε_k, then {x_k}_{k=0}^∞ converges
linearly (not just at least linearly).
As we have seen, there are methods that converge more
quickly than bisection. We state this more precisely:
Definition 4.4 (Order of Convergence). Let
ξ = lim_{k→∞} x_k. Suppose there exists μ > 0 and a
sequence of positive numbers {ε_k}_{k=0}^∞ such that (4.8a)
and (4.8b) both hold. Then we say that the sequence
{x_k}_{k=0}^∞ converges with at least order q if

    lim_{k→∞} ε_{k+1} / (ε_k)^q = μ.

Two particular values of q are important to us:
(i) If q = 1, and we further have that 0 < μ < 1, then the
rate is linear.
(ii) If q = 2, the rate is quadratic for any μ > 0.
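Definition 4.4 also suggests a practical way to estimate q from computed errors: if ε_{k+1} ≈ μ (ε_k)^q, then taking logarithms of successive ratios gives q ≈ log(ε_{k+1}/ε_k) / log(ε_k/ε_{k-1}). The following is our own Python sketch (not from the notes), applied to a few of the secant-method errors from Table 4.1:

```python
import math

def estimate_order(errors):
    """Estimate the order q from a list of successive errors,
    assuming errors[k+1] ~ mu * errors[k]**q."""
    qs = []
    for k in range(1, len(errors) - 1):
        num = math.log(errors[k + 1] / errors[k])
        den = math.log(errors[k] / errors[k - 1])
        qs.append(num / den)
    return qs

# Errors |sqrt(2) - x_k| from secant steps 2..6 of Table 4.1:
errs = [4.14e-01, 8.09e-02, 1.44e-02, 4.20e-04, 2.12e-06]
print(estimate_order(errs))  # estimates hover around 1.6
```

The estimates are noisy for so few steps, but they cluster near the golden mean (1 + sqrt(5))/2 ≈ 1.618, the order of the Secant Method discussed in the next lecture.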
Lecture 5 From Secant to Newton's Method (Tue 18/09/12)
5 From Secant to Newton's Method
5.1 Analysis of the Secant Method
Last week we were introduced to the Secant Method: choose
x_0 and x_1 so that there is a solution in [x_0, x_1]. Then
define

    x_{k+1} = x_k - f(x_k) (x_k - x_{k-1}) / (f(x_k) - f(x_{k-1})).

Now we want to prove that the Secant Method converges,
though only linearly. We'll investigate exactly how rapidly
it converges in a lab.
One simple mathematical tool that we use is Theorem 1.1
(the Mean Value Theorem; see also [1, p. 420]).

Proposition 5.1. Suppose that f and f' are real-valued
functions, continuous and defined in an interval
I = [ξ - h, ξ + h] for some h > 0. If f(ξ) = 0 and
f'(ξ) ≠ 0, then the sequence (4.6) converges at least
linearly to ξ.
Before we prove this, we note the following:
- We wish to show that |ξ - x_{k+1}| < |ξ - x_k|.
- From Theorem 1.1, there is a point w_k ∈ [x_{k-1}, x_k]
such that

    (f(x_k) - f(x_{k-1})) / (x_k - x_{k-1}) = f'(w_k).

- Also by the MVT, there is a point z_k between x_k and ξ
such that

    (f(x_k) - f(ξ)) / (x_k - ξ) = f(x_k) / (x_k - ξ) = f'(z_k).

Therefore f(x_k) = (x_k - ξ) f'(z_k).
- Suppose that f'(ξ) > 0. (If f'(ξ) < 0, just tweak the
arguments accordingly.) Saying that f' is continuous in the
region [ξ - h, ξ + h] means that, for any ε > 0 there is a
δ > 0 such that

    |f'(x) - f'(ξ)| < ε   for any x ∈ [ξ - δ, ξ + δ].

- Take ε = f'(ξ)/4. Then |f'(x) - f'(ξ)| < f'(ξ)/4, and
thus

    (3/4) f'(ξ) ≤ f'(x) ≤ (5/4) f'(ξ)   for any x ∈ [ξ - δ, ξ + δ].

- Then, so long as w_k and z_k are both in [ξ - δ, ξ + δ],

    f'(z_k) / f'(w_k) ≤ 5/3.
Given enough time and effort we could show that the Secant
Method converges faster than linearly. In particular, the
order of convergence is q = (1 + sqrt(5))/2 ≈ 1.618. This
number arises as the only positive root of q^2 - q - 1. It
is called the Golden Mean, and arises in many areas of
Mathematics, including finding an explicit expression for
the Fibonacci Sequence: f_0 = 1, f_1 = 1,
f_{k+1} = f_k + f_{k-1} for k = 1, 2, 3, . . . . That
gives f_0 = 1, f_1 = 1, f_2 = 2, f_3 = 3, f_4 = 5,
f_5 = 8, f_6 = 13, . . . .
The connection here is that it turns out that
ε_{k+1} ≤ C ε_k ε_{k-1}. Repeatedly using this we get:
- Let r = |x_1 - x_0|, so that ε_0 ≤ r and ε_1 ≤ r.
- Then ε_2 = C ε_1 ε_0 ≤ C r^2.
- Then ε_3 = C ε_2 ε_1 ≤ C (C r^2) r = C^2 r^3.
- Then ε_4 = C ε_3 ε_2 ≤ C (C^2 r^3)(C r^2) = C^4 r^5.
- Then ε_5 = C ε_4 ε_3 ≤ C (C^4 r^5)(C^2 r^3) = C^7 r^8.
- And in general, ε_k ≤ C^(f_k - 1) r^(f_k).
5.2 From Secant to Newton
These notes are loosely based on Section 1.4 of [1] (i.e., Süli and Mayers, An Introduction to Numerical Analysis). See also [3, Lecture 2] and [5, §3.5]. The Secant method is often written as x_{k+1} = x_k − f(x_k)φ(x_k, x_{k−1}), where the function φ is chosen so that x_{k+1} is the root of the secant line joining the points (x_{k−1}, f(x_{k−1})) and (x_k, f(x_k)). A related idea is to construct a method x_{k+1} = x_k − f(x_k)φ(x_k), where we choose φ so that x_{k+1} is the point where the line through (x_k, f(x_k)) with slope f′(x_k) cuts the x-axis. This is shown in Figure 5.2.
We attempt to solve x² − 2 = 0, taking x_0 = 2. Taking x_1 to be the zero of the tangent to f(x) at x = 2, we get x_1 = 1.5. Taking x_2 to be the zero of the tangent to f(x) at x = 1.5, we get x_2 = 1.4167, which is very close to the true solution of τ ≈ 1.4142.
Lecture 5 From Secant to Newton's Method 12 Tue 18/09/12
[Figure: two plots of f(x) = x² − 2 on [1, 2.5]. The first marks x_0 and (x_0, f(x_0)) with the tangent line there; the second also marks x_1 and (x_1, f(x_1)) and its tangent.]
Fig. 5.2: Finding x_1 using Newton's Method
Method 5.2 (Newton's Method³).
1. Choose any x_0 in [a, b],
2. For k = 0, 1, 2, . . . , set x_{k+1} to the root of the line through (x_k, f(x_k)) with slope f′(x_k).
By writing down the equation for the line at (x_k, f(x_k)) with slope f′(x_k), show that the formula for the iteration is
    x_{k+1} = x_k − f(x_k)/f′(x_k). (5.9)
Example 5.3. Use Newton's Method to solve the nonlinear problem x² − 2 = 0 in [0, 2]. The results are shown in Table 5.2.
By comparing Table 4.1 and Table 5.2, we see that, for this example, Newton's method is again more efficient than the Secant method.
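The results in Table 5.2 are easy to reproduce. Here is a minimal sketch in Python (the labs in this course use Matlab; the translation is direct):

```python
import math

def newton(f, df, x0, steps):
    """Newton iterates: x_{k+1} = x_k - f(x_k)/f'(x_k), cf. (5.9)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] - f(xs[-1]) / df(xs[-1]))
    return xs

tau = math.sqrt(2.0)
# Solve x^2 - 2 = 0 starting from x0 = 2, as in Example 5.3.
xs = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, 2.0, 5)
for k, x in enumerate(xs):
    print(f"{k}  {x:.6f}  {abs(tau - x):.2e}")
```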
Deriving Newton's method geometrically certainly has an intuitive appeal. However, to analyse the method, we
³Sir Isaac Newton, 1643-1727, England. Easily one of the greatest scientists of all time. The method we are studying appeared in his celebrated Principia Mathematica in 1687, but it is believed he had used it as early as 1669.
k    x_k        |τ − x_k|    |x_k − x_{k−1}|
0 2.000000 5.86e-01
1 1.500000 8.58e-02 5.00e-01
2 1.416667 2.45e-03 8.33e-02
3 1.414216 2.12e-06 2.45e-03
4 1.414214 1.59e-12 2.12e-06
5 1.414214 2.34e-16 1.59e-12
Table 5.2: Solving x² − 2 = 0 using Newton's Method
need a more abstract derivation based on a Truncated
Taylor Series that we met in Lecture 2:
Exercise 5.1.
(i) Write down the equation of the line that intersects (x_{k−1}, f(x_{k−1})) and (x_k, f(x_k)). Hence show how to derive (4.6).
(ii) Can you construct a problem for which the bisection method will work, but the secant method will fail completely? If so, give an example.
Exercise 5.2.
(i) Write down Newton's Method as applied to the function f(x) = x³ − 2. Simplify the computation as much as possible. What is achieved if we find the root of this function?
(ii) Do three iterations by hand of Newton's Method applied to f(x) = x³ − 2, with x_0 = 1.
Lecture 6 Analysing Newton's Method 13 Thu 19/09/12
6 Analysing Newton's Method
6.1 Newton Error Formula
In Lecture 5 we introduced:
Newton's Method
To find τ ∈ [a, b] such that f(τ) = 0, choose x_0 in [a, b] and then
    x_{k+1} = x_k − f(x_k)/f′(x_k), for k = 0, 1, 2, . . . .
We saw in Table 5.2 that this method can be much more efficient than, say, Bisection: it yields estimates that converge far more quickly to τ. Bisection converges (at least) linearly, whereas Newton's converges quadratically, i.e., with (at least) order q = 2.
In order to prove that this is so, we need to
1. Show that it converges.
2. Write down a recursive formula for the error.
3. Use this to find the limit of |τ − x_{k+1}|/|τ − x_k|².
Step 2 is usually the crucial part.
There are two parts to the proof. The first involves deriving the so-called Newton Error formula. Then we'll apply this to prove (quadratic) convergence. In all cases we'll assume that the functions f, f′ and f″ are defined and continuous on an interval I_δ = [τ − δ, τ + δ] around the root τ.
The proof is essentially the same as Epperson [5, Thm 3.2].
Proposition 6.1 (Newton Error Formula). If f(τ) = 0 and
    x_{k+1} = x_k − f(x_k)/f′(x_k),
then there is a point η_k between τ and x_k such that
    τ − x_{k+1} = −((τ − x_k)²/2) · (f″(η_k)/f′(x_k)).
Example 6.2. As an application of Newton's error formula, we'll show that the number of correct decimal digits in the approximation doubles at each step.
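To see the doubling concretely, one can run Newton's method for x² − 2 in exact rational arithmetic. This is a small sketch in Python (exact arithmetic is used deliberately, so that round-off does not mask the effect):

```python
from fractions import Fraction
from decimal import Decimal, getcontext

getcontext().prec = 60          # enough digits to display tiny errors
sqrt2 = Decimal(2).sqrt()

errs = []
x = Fraction(2)                 # x0 = 2, stored exactly
for k in range(5):
    x = x - (x * x - 2) / (2 * x)   # Newton step for f(x) = x^2 - 2
    errs.append(abs(Decimal(x.numerator) / Decimal(x.denominator) - sqrt2))
    print(k + 1, errs[-1])
```

The exponents of the successive errors roughly double at each step, exactly as the error formula predicts.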
6.2 Convergence of Newton's Method
We'll now complete our analysis of this section by proving the convergence of Newton's method.
Proposition 6.3. Let us suppose that f is a function such that
f is continuous and real-valued, with continuous f″, defined on some closed interval I_δ = [τ − δ, τ + δ];
f(τ) = 0 and f′(τ) ≠ 0;
there is some positive constant A such that
    |f″(x)|/|f′(y)| ≤ A for all x, y ∈ I_δ.
Let h = min{δ, 1/A}. If |τ − x_0| ≤ h, then Newton's Method converges quadratically.
6.3 Exercises
Exercise 6.1. [5, Exer 3.5.1] If f is such that |f″(x)| ≤ 3 and |f′(x)| ≥ 1 for all x, and if the initial error in Newton's Method is less than 1/2, give an upper bound for the error at each of the first 3 steps.
Exercise 6.2. Here is (yet) another scheme, called Steffensen's Method: Choose x_0 ∈ [a, b] and for k = 0, 1, 2, . . . , set
    x_{k+1} = x_k − (f(x_k))² / (f(x_k + f(x_k)) − f(x_k)).
How does this relate to Newton's Method?
Exercise 6.3. ([1, Exer 1.6]) The proof of the convergence of Newton's method given in Prop. 6.3 uses that f′(τ) ≠ 0. Suppose that it is the case that f′(τ) = 0.
(i) What can we say about the root τ?
(ii) Show that the Newton Error formula can be extended to give
    τ − x_{k+1} = ((τ − x_k)/2) · (f″(η_k)/f″(ξ_k)),
for some points η_k and ξ_k between τ and x_k.
(iii) What does the above error formula tell us about the convergence of Newton's method in this case?
Lecture 7 Matlab class 1 14 Tue 25/09/12
7 Matlab class 1
The goal of this lecture/lab is to gain familiarity with the fundamental tasks that can be accomplished with Matlab: defining vectors, computing functions, and plotting. We'll then see how to implement and analyse some numerical schemes in Matlab.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Matlab is an interactive environment for mathematical and scientific computing. It is the standard tool for numerical computing in industry and research.
Matlab stands for Matrix Laboratory. It specialises in matrix and vector computations, but includes functions for graphics, numerical integration and differentiation, solving differential equations, etc.
Matlab differs most significantly from, say, Maple, in not having a facility for abstract computation.
7.1 The Basics
Matlab is an interpretive environment: you type a command and it will execute it immediately.
The default data-type is a matrix of double precision floating-point numbers. A scalar variable is an instance of a 1 × 1 matrix. To check this, set, say,
>> t=10
and use the
>> size(t)
command to find the numbers of rows and columns of t.
A vector may be declared as follows:
>> x = [1 2 3 4 5 6 7]
This generates a vector x, with x_1 = 1, x_2 = 2, etc. However, this could also be done with x=1:7.
More generally, a vector may be declared as follows:
>> x = a:h:b;
This sets x_1 = a, x_2 = a + h, x_3 = a + 2h, . . . , x_n = b. If h is omitted, it is assumed to be 1.
The i-th element of a vector is accessed by typing x(i). The element in row i and column j of a matrix is given by A(i,j).
Most scalar functions return a matrix when given a matrix as an argument. For example, if x is a vector, then y = sin(x) sets y to be a vector such that y_i = sin(x_i).
Matlab has most of the standard mathematical functions: sin, cos, exp, log, etc. In each case, write the function name followed by the argument in round brackets, e.g.,
>> exp(x) for e^x.
The * operator performs matrix multiplication. For element-by-element multiplication use .*
For example, y = x.*x sets y_i = (x_i)². So does y = x.^2. Similarly, y=1./x sets y_i = 1/x_i.
If you put a semicolon at the end of a line of Matlab, the line is executed, but the output is not shown. (This is useful if you are dealing with large vectors.) If no semicolon is used, the output is shown in the command window.
7.2 Plotting functions
Define a vector >> x=[0 1 2 3] and then set
>> f = x.^2 -2
To plot these vectors use:
>> plot(x, f)
If the picture isn't particularly impressive, then this might be because Matlab is actually only plotting the 4 points that you defined. To make this clearer, use
>> plot(x, f, '-o')
This means to plot the vector f as a function of the vector x, placing a circle at each point, and joining adjacent points with a straight line.
Try instead: >> x=0:0.1:3 and f = x.^2 -2 and plot them again.
To define a function in terms of any variable, type:
>> F = @(x)(x.^2 -2);
Now you can use this function as follows:
>> plot(x, F(x));
Take care to note that Matlab is case sensitive.
In this last case, it might be helpful to also observe where the function cuts the x-axis. That can be done by also plotting the line joining, for example, the points (0, 0) and (3, 0). This can be done by:
>> plot(x, F(x), [0,3], [0,0]);
Also try >> plot(x, F(x), [0,3], [0,0], '--');
Notice the difference?
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
7.3 Bisection
Revise the lecture notes from Lecture 3 on the Bisection Method.
Suppose we want to find a solution to e^x − (2 − x)³ = 0 in the interval [0, 5] using Bisection.
Define the function f as:
>> f = @(x)(exp(x) - (2-x).^3);
Taking x_1 = 0 and x_2 = 5, do 8 iterations of the Bisection method.
Complete the table below. You may use that the solution is (approximately) τ = 0.7261444658054950.
k    x_k    |τ − x_k|
1
2
3
4
5
6
7
8
Implementing the Bisection method by hand is very tedious. Here is a program that will do it for you. You don't need to type it all in; you can download it from
http://www.maths.nuigalway.ie/MA385/lab1/Bisection.m
%% Using the Bisection method to find a zero of f(x)
% Lab 1 of MA385
clear; % Erase all stored variables
fprintf('\n\n---------\n Using Bisection ');
% The function is
f = @(x)(exp(x) - (2-x).^3);
disp('Solving f=0 with the function');
disp(f);
%% The true solution is
tau = 0.72614446580549503614;
fprintf('The true solution is %12.8f', tau);
%% Our initial guesses are x_1=0 and x_2=5;
x(1)=0;
fprintf('\n\n%2d | %1.8e | %1.3e \n', ...
    1, x(1), abs(tau - x(1)));
x(2)=5;
fprintf('%2d | %1.8e | %1.3e \n', 2, x(2), abs(tau-x(2)));
for k=2:10
    x(k+1) = (x(k-1)+x(k))/2;
    if ( f(x(k+1))*f(x(k-1)) < 0)
        x(k)=x(k-1);
    end
    fprintf('%2d | %1.8e | %1.3e\n', k+1, x(k+1), ...
        abs(tau - x(k+1)));
end
Read the code carefully. If there is a line you do not understand, then ask a tutor, or look up the on-line help. For example, find out what clear on Line 3 does by typing >> doc clear.
. . . . . . . . . . . . . . . . . . . . . . . . . .
Q1. Suppose we wanted an estimate x_k for τ so that |τ − x_k| ≤ 10^{−10}.
(i) In Lecture 3 we saw that |x_k − τ| ≤ (1/2)^{k−1}|x_1 − x_0|. Use this to estimate how many iterations are required in theory.
(ii) Use the program above to find how many iterations are required in practice.
7.4 The Secant method
Recall that the Secant Method is: Choose x_0 and x_1 so that there is a solution in [x_0, x_1]. Then define
    x_{k+1} = x_k − f(x_k)(x_k − x_{k−1})/(f(x_k) − f(x_{k−1})), for k = 1, 2, 3, . . .
Q2 (a) Adapt the program above to implement the secant method.
(b) Use it to find a solution to e^x − (2 − x)³ = 0 in the interval [0, 5].
(c) How many iterations are required to ensure that the error is less than 10^{−10}?
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Q3 Recall from Definition 4.4 that the order of convergence of a sequence {ε_0, ε_1, ε_2, . . . } is q if
    lim_{k→∞} ε_{k+1}/ε_k^q = μ,
for some constant μ.
We would like to verify that q = (1 + √5)/2 ≈ 1.618. This is difficult to do computationally because, after a relatively small number of iterations, the round-off error becomes significant. But we can still try!
Adapt the program above so that at each iteration it displays
    |τ − x_k|, |τ − x_k|^{1.618}, |τ − x_k|²,
and so deduce that the order of convergence is greater than 1 (so better than bisection), less than 2 (so not as good as Newton's method that we'll see in tomorrow's class), and roughly (1 + √5)/2.
Q4 The bisection method is popular because it is robust: it will always work, subject to minimal constraints. However, it is slow: if the Secant method works, then it converges much more quickly. How can we combine these two algorithms to get a fast, robust method? Consider the following problem:
    Solve f(x) = 1 − 2/(x² − 2x + 2) on [−10, 1].
You should find that the bisection method works (slowly) for this problem, but the Secant method will fail. So write a hybrid algorithm that switches between the bisection method and the secant method as appropriate.
Take care to document your code carefully, to show which algorithm is used when.
How many iterations are required?
Q5 Still have time to spare? Why not try implementing Newton's method?
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
7.5 To Finish
Before you leave the class send an email to
[email protected] with your name, ID number,
and answers to Questions 1, 2 and 3. For bonus credit,
submit your solution to Q4 too.
If you have worked in small groups, you should still
submit separate emails.
Lecture 8 Wrap up: Iteration 16 Thu 27/09/12
8 Wrap up: Iteration
8.1 Introducing Fixed Point Iteration
Newton's method can be considered to be a particular instance of a very general approach called Fixed Point Iteration or Simple Iteration.
The basic idea is:
    If we want to solve f(x) = 0 in [a, b], find a function g(x) such that, if τ is such that f(τ) = 0, then g(τ) = τ. Choose x_0 and for k ≥ 0 set x_{k+1} = g(x_k).
Example 8.1. Suppose that f(x) = exp(x) − 2x − 1 and we are trying to find a solution to f(x) = 0 in [1, 2]. Then we can take g(x) = ln(2x + 1).
We have reformulated the problem in Example 8.1 above, from:
    For f(x) = exp(x) − 2x − 1, find τ ∈ [1, 2] such that f(τ) = 0,
to:
    For g(x) = ln(2x + 1), find τ ∈ [1, 2] such that g(τ) = τ.
If we take the initial estimate x_0 = 1, then we get the following results:
k    x_k       |τ − x_k|
0    1.0000    2.564e-1
1    1.0986    1.578e-1
2    1.1623    9.415e-2
3    1.2013    5.509e-2
4    1.2246    3.187e-2
5    1.2381    1.831e-2
...
10   1.2558    6.310e-4
To make this table, I used a numerical scheme to solve the problem quite accurately, to get τ = 1.256431. In general we don't know τ in advance; otherwise we wouldn't need such a scheme! I have just given the quantities |τ − x_k| here so we can observe that the method is converging, and get an idea of how quickly it is converging.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
We have to be quite careful with this method: not every choice of g is suitable.
Suppose we want the solution to f(x) = x² − 2 = 0 in [1, 2]. We could choose g(x) = x² + x − 2. Taking x_0 = 1, we get the iterations shown opposite:
k    x_k
0    1
1    0
2    -2
3    0
4    -2
5    0
...
This sequence doesn't converge!
So we need to refine the method in a way that ensures that it will. Before we do that in a formal way, consider the following...
Example 8.2. Use the Mean Value Theorem to show that the fixed point method x_{k+1} = g(x_k) converges if |g′(x)| < 1 for all x near the fixed point.
This is an important example, mostly because it introduces the tricks of using that g(τ) = τ and g(x_k) = x_{k+1}. But it is not a rigorous theory. That requires some ideas such as the contraction mapping theorem.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
8.2 A short tour of fixed points and contractions
A variant of the famous Fixed Point Theorem⁴ is: Suppose that g(x) is defined and continuous on [a, b], and that g(x) ∈ [a, b] for all x ∈ [a, b]. Then there exists a point τ ∈ [a, b] such that g(τ) = τ. That is, g(x) has a fixed point in the interval [a, b].
Try to convince yourself that it is true by sketching the graphs of a few functions that send all points in the interval, say, [1, 2], to that interval, as in Figure 8.1.
[Figure: the square [a, b] × [a, b], with the graph of a function g(x) staying inside it.]
Fig. 8.1: Sketch of a function g(x) such that, if a ≤ x ≤ b, then a ≤ g(x) ≤ b
The next ingredient we need is to observe that g is a contraction. That is, g(x) is continuous and defined on [a, b] and there is a number L ∈ (0, 1) such that
    |g(α) − g(β)| ≤ L|α − β| for all α, β ∈ [a, b]. (8.10)
⁴L. E. J. Brouwer, 1881-1966, Netherlands
Proposition 8.3 (Contraction Mapping Theorem). Suppose that the function g is real-valued, defined and continuous on [a, b], and that it maps every point in [a, b] to some point in [a, b], and is a contraction on [a, b]. Then
(i) g(x) has a fixed point τ ∈ [a, b],
(ii) the fixed point is unique,
(iii) the sequence {x_k} for k = 0, 1, 2, . . . , defined by x_0 ∈ [a, b] and x_k = g(x_{k−1}) for k = 1, 2, . . . , converges to τ.
Proof:
8.3 Convergence of Fixed Point Iteration
We now know how to apply the Fixed-Point Method and to check if it will converge. Of course we can't perform an infinite number of iterations, and so the method will yield only an approximate solution. Suppose we know that we want the solution to be accurate to, say, 10^{−6}; how many steps are needed? That is, how big do we need to take k so that
    |τ − x_k| ≤ 10^{−6}?
The answer is obtained by first showing that
    |τ − x_k| ≤ (L^k/(1 − L)) |x_1 − x_0|. (8.11)
Example 8.4. If g(x) = ln(2x + 1) and x_0 = 1, and we want |τ − x_k| ≤ 10^{−6}, then we can use (8.11) to determine the number of iterations required.
This calculation only gives an upper bound for the number of iterations. It is correct, but not necessarily sharp. In Table 8.1 it is shown that this level of accuracy is achieved after 23 iterations. Even so, 23 iterations is quite a lot for such a simple problem. So we can conclude that this method is not as fast as, say, Newton's Method. However, it is perhaps the most generalizable.
k    x_k        |τ − x_k|
0    1.00000    2.56e-01
1    1.09861    1.58e-01
2    1.16228    9.41e-02
3    1.20134    5.51e-02
4    1.22456    3.19e-02
5    1.23812    1.83e-02
6    1.24595    1.05e-02
...
22   1.25643    1.29e-06
23   1.25643    7.32e-07
Table 8.1: Applying Fixed Point Iteration for E.g. 8.1
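The calculation behind Example 8.4 can be sketched as follows (a Python sketch; the value L = 2/3 is the bound on |g′(x)| = 2/(2x + 1) for x ∈ [1, 2] obtained via the Mean Value Theorem, cf. Exercise 8.2):

```python
import math

# Bound (8.11): |tau - x_k| <= L**k/(1 - L) * |x1 - x0|.
# For g(x) = ln(2x + 1) on [1, 2], the MVT gives L = max|g'| = 2/3.
L = 2.0 / 3.0
x0 = 1.0
x1 = math.log(2.0 * x0 + 1.0)
tol = 1e-6

k = 1
while L ** k / (1.0 - L) * abs(x1 - x0) > tol:
    k += 1
print(k)   # smallest k for which the bound guarantees the accuracy
```

This gives k = 32, comfortably above the 23 iterations that Table 8.1 shows are needed in practice: the bound is correct but not sharp.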
8.4 Knowing When to Stop
Suppose you wish to program one of the above methods. You will get your computer to repeat one of the iterative methods until your solution is sufficiently close to the true solution:
x[0] = 0
tol = 1e-6
i = 0
while (abs(tau - x[i]) > tol) // This is the
                              // stopping criterion
    x[i+1] = g(x[i])          // Fixed point iteration
    i = i+1
end
All very well, except you don't know τ. If you did, you wouldn't need a numerical method. Instead, we could choose the stopping criterion based on how close successive estimates are:
while (abs(x[i-1] - x[i]) > tol)
This is fine if the solution is not close to zero. E.g., if it's about 1, we should get roughly 6 accurate figures. But if τ = 10^{−7} then it is quite useless: x_k could be ten times larger than τ. The problem is that we are estimating the absolute error.
Instead, we usually work with the relative error:
while (abs((x[i-1] - x[i])/x[i]) > tol)
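In Python, the relative-error test looks like this (a sketch using the fixed-point example from §8.1; the tolerance and iteration cap are arbitrary choices):

```python
import math

def fixed_point_rel(g, x0, tol=1e-6, max_iter=100):
    """Iterate x_{k+1} = g(x_k), stopping when the relative change is small."""
    x_old = x0
    for k in range(1, max_iter + 1):
        x_new = g(x_old)
        if abs((x_old - x_new) / x_new) <= tol:   # relative-error test
            return x_new, k
        x_old = x_new
    return x_old, max_iter

x, k = fixed_point_rel(lambda x: math.log(2.0 * x + 1.0), 1.0)
print(x, k)
```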
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
8.5 What has Newton's method ever done for me?
When studying a numerical method (or indeed any piece of Mathematics) it is important to know why you are doing this. Sometimes it is: so that you can understand
other topics later; because it is interesting/beautiful in its own right; or (most commonly) because it is useful.
Here are some examples of each of these:
1. The analyses we have used in this section allowed us to consider some important ideas in a simple setting. Examples include:
Convergence, including rates of convergence.
Fixed-point theory, and contractions. We'll be seeing analogous ideas in the next section (Lipschitz conditions).
The approximation of functions by polynomials (Taylor's Theorem). This point will reoccur in the next section, and all throughout next semester.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2. Applications come from lots of areas of science and engineering. Less obvious might be applications to financial mathematics.
The celebrated Black-Scholes equation for pricing a put option can be written as
    ∂V/∂t + (1/2)σ²S² ∂²V/∂S² + rS ∂V/∂S − rV = 0,
where
V(S, t) is the current value of the right (but not the obligation) to buy or sell (put or call) an asset at a future time T;
S is the current value of the underlying asset;
r is the current interest rate (because the value of the option has to be compared with what we would have gained by investing the money we paid for it);
σ is the volatility of the asset's price.
Often one knows S, T and r, but not σ. The method of implied volatility is to take data from the market and then find the value of σ which would give the data as the solution to the B-S equation. This is a nonlinear problem, and so Newton's method can be used. See Chapters 13 and 14 of Higham's An Introduction to Financial Option Valuation for more details.
(We will return to the Black-Scholes problem again at the end of the next section.)
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3. Some of these ideas are interesting and beautiful. Consider Newton's method. Suppose that we want to find the complex n-th roots of unity: the set of numbers {z_0, z_1, z_2, . . . , z_{n−1}} whose n-th power is 1. We could just express these in terms of polar coordinates:
    z = e^{iθ} where θ = 2kπ/n,
for some k ∈ ℤ = {. . . , −2, −1, 0, 1, 2, . . . } and i = √−1.
Example: 1, i, −1 and −i are the 4th roots of unity. Recall that, plotted in the Argand Plane, these points form a regular polygon.
If we were to use Newton's method to find a given root, we could try to solve f(z) = 0 with f(z) = z^n − 1.
Applying Newton's method gives the iteration:
    z_{k+1} = z_k − ((z_k)^n − 1)/(n(z_k)^{n−1}).
However, there are n possible solutions; given a particular starting point, which root will the method converge to? If we take a number of points in a region of space, iterate on each of them, and then colour the points to indicate the ones that converge to the same root, we get the famous Julia⁵ set, an example of a fractal. One such Julia set, generated by the Matlab script Julia.m (which you can download from the course website), is shown below in Figure 8.2.
Fig. 8.2: A contour plot of a Julia set with n = 5
Exercise 8.1. Is it possible for g to be a contraction on [a, b] but not have a fixed point in [a, b]? Give an example to support your answer.
Exercise 8.2. Show that g(x) = ln(2x + 1) is a contraction on [1, 2]. Give an estimate for L. (Hint: Use the Mean Value Theorem.)
Exercise 8.3. Consider the function g(x) = x²/4 + 5x/4 − 1/2.
1. It has two fixed points: what are they?
⁵Gaston Julia, French mathematician, 1893-1978. The famous paper which introduced these ideas was published in 1918, when he was just 25. Interest later waned until the 1970s, when Mandelbrot's computer experiments reinvigorated interest.
2. For each of these, find the largest region around them such that g is a contraction on that region.
Exercise 8.4. Although we didn't prove it in class, it turns out that a fixed point method given by
    x_{k+1} = g(x_k)
will converge with order p to the point τ if g(τ) = τ and
    g′(τ) = g″(τ) = · · · = g^{(p−1)}(τ) = 0.
(i) Use a Taylor Series expansion to prove this.
(ii) We can think of Newton's Method for the problem f(x) = 0 as fixed point iteration with g(x) = x − f(x)/f′(x). Use this to show that Newton's method converges with order 2, providing that f′(τ) ≠ 0.
See http://www.maths.nuigalway.ie/MA385/PS1.pdf for a summary of the exercises on solving nonlinear equations.
Lecture 9 Initial Value Problems 20 Tue 02/10/12
9 Initial Value Problems
9.1 Introduction
(This introduction is ripped off from [5, Chap. 6]. However, the rest of the notes mostly follow [1, Chap. 12].)
The growth of some tumours can be modelled by a nonlinear first-order differential equation, R′(t) = f(R(t)), subject to the initial condition R(t_0) = a, where R is the radius of the tumour at time t, and f is a rather complicated nonlinear expression (see [5, Chap. 6] for the full model). Clearly, it would be useful to know the value of R at certain times in the future. Though it's essentially impossible to solve for R exactly, we can accurately estimate it. In this section, we'll study techniques for this.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Initial Value Problems (IVPs) are differential equations of the form: Find y(t) such that
    dy/dt = f(t, y) for t > t_0, and y(t_0) = y_0. (9.1)
Here y′ = f(t, y) is the differential equation and y(t_0) = y_0 is the initial value.
Some IVPs are easy to solve. For example:
    y′ = t² with y(1) = 1.
Just integrate the differential equation to get that y(t) = t³/3 + C, and use the initial value to find the constant of integration. This gives the solution y(t) = (t³ + 2)/3. However, most problems are much harder, and some don't have solutions at all.
In many cases, it is possible to determine that a given problem does indeed have a solution, even if we can't write it down. The idea is that the function f should be Lipschitz, a notion closely related to that of a contraction (8.10).
Definition 9.1. A function f satisfies a Lipschitz⁶ Condition (with respect to its second argument) in the rectangular region D if there is a positive real number L such that
    |f(t, u) − f(t, v)| ≤ L|u − v| (9.2)
for all (t, u) ∈ D and (t, v) ∈ D.
Example 9.2. For each of the following functions f, show that it satisfies a Lipschitz condition, and give an upper bound on the Lipschitz constant L.
⁶Rudolf Otto Sigismund Lipschitz, Germany, 1832-1903. Made many important contributions to science in areas that include differential equations, number theory, Fourier Series, celestial mechanics, and analytic mechanics.
(i) f(t, y) = y/(1 + t)² for 0 ≤ t < ∞.
(ii) f(t, y) = 4y − e^t for all t.
(iii) f(t, y) = (1 + t²)y + sin(t) for 1 ≤ t ≤ 2.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
The reason we are interested in functions satisfying Lipschitz conditions is as follows:
Proposition 9.3 (Picard's⁷ Theorem). Suppose that the real-valued function f(t, y) is continuous for t ∈ [t_0, t_M] and y ∈ [y_0 − C, y_0 + C]; that |f(t, y_0)| ≤ K for t_0 ≤ t ≤ t_M; and that f satisfies the Lipschitz condition (9.2). If
    C ≥ (K/L)(e^{L(t_M − t_0)} − 1),
then (9.1) has a unique solution on [t_0, t_M]. Furthermore,
    |y(t) − y(t_0)| ≤ C for t_0 ≤ t ≤ t_M.
You are not required to know this theorem for this course. However, it's important to be able to determine when a given f satisfies a Lipschitz condition.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Exercise 9.1. For the following functions, show that they satisfy a Lipschitz condition on the corresponding domain, and give an upper bound for L:
(i) f(t, y) = 2y/t⁴ for t ∈ [1, ∞),
(ii) f(t, y) = 1 + t sin(ty) for 0 ≤ t ≤ 2.
Exercise 9.2. Many text books, instead of giving the Lipschitz condition (9.2), give the following: There is a finite, positive, real number L such that
    |∂f/∂y (t, y)| ≤ L for all (t, y) ∈ D.
Is this statement stronger than (i.e., more restrictive than), equivalent to, or weaker than (i.e., less restrictive than) statement (9.2)? Justify your answer.
⁷Charles Émile Picard, France, 1856-1941
Lecture 10 One step methods 21 Thu 04/10/12
10 One step methods
10.1 Euler's Method
Classical numerical methods for IVPs attempt to generate approximate solutions at a finite set of discrete points t_0 < t_1 < t_2 < · · · < t_n. The simplest is Euler's Method⁸, and may be motivated as follows:
Suppose we know y(t_i), and want to find y(t_{i+1}). From the DE, we know the slope of the tangent to y at t_i. So, if this is similar to the slope of the line joining (t_i, y(t_i)) and (t_{i+1}, y(t_{i+1})):
    y′(t_i) = f(t_i, y(t_i)) ≈ (y_{i+1} − y_i)/(t_{i+1} − t_i).
[Figure: the tangent to y(t) at t = t_i, and the secant line joining the points (t_i, y_i) and (t_{i+1}, y_{i+1}) on y(t), with step size h = t_{i+1} − t_i.]
Method 10.1 (Euler's Method). Choose equally spaced points t_0, t_1, . . . , t_n, so that
    t_i − t_{i−1} = h = (t_n − t_0)/n for i = 1, . . . , n.
Let y_i denote the approximation for y(t) at t = t_i (i.e., y_i ≈ y(t_i)). Set
    y_{i+1} = y_i + h f(t_i, y_i), i = 0, 1, . . . , n − 1. (10.1)
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Example 10.2. Taking h = 1, estimate y(4) where
    y′(t) = y/(1 + t²), y(0) = 1.
If we had chosen h = 4 we would have required only one step: y_n = y_0 + 4f(t_0, y_0) = 5. However, this would not be very accurate. With a little work one can show that the solution to this problem is y(t) = e^{tan⁻¹(t)}, and so y(4) = 3.7652. Hence the computed solution with h = 1 is much more accurate than the computed solution with h = 4. This is also demonstrated in Figure 10.1 below, and in Table 10.1, where we see that the error seems to be proportional to h.
⁸Leonhard Euler, 1707-1783, Switzerland. One of the greatest Mathematicians of all time, he made vital contributions to geometry, calculus, number theory, finite differences, special functions, differential equations, continuum mechanics, astronomy, elasticity, acoustics, light, hydraulics, music, cartography, and much more.
Fig. 10.1: Euler's method for Example 10.2 with h = 4, h = 2, h = 1 and h = 1/2
n    h      y_n      |y(t_n) − y_n|
1    4      5.0      1.235
2    2      4.2      0.435
4    1      3.960    0.195
8    1/2    3.881    0.115
16   1/4    3.831    0.065
32   1/8    3.800    0.035
Table 10.1: Error in Euler's method for Example 10.2
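Table 10.1 can be reproduced with a few lines of code. Here is a sketch in Python (the course labs use Matlab):

```python
import math

def euler(f, t0, y0, tN, n):
    """Euler's method (10.1) with n equal steps from t0 to tN."""
    h = (tN - t0) / n
    t, y = t0, y0
    for _ in range(n):
        y = y + h * f(t, y)
        t = t + h
    return y

f = lambda t, y: y / (1.0 + t * t)
exact = math.exp(math.atan(4.0))          # y(4) = e^{tan^{-1}(4)}
for n in (1, 2, 4, 8, 16, 32):
    yn = euler(f, 0.0, 1.0, 4.0, n)
    print(n, round(yn, 3), round(abs(exact - yn), 3))
```

The error column (roughly) halves each time n doubles, consistent with the error being proportional to h.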
10.2 One-step methods
Euler's method is an example of the more general one-step methods, which have the form:
    y_{i+1} = y_i + h Φ(t_i, y_i; h). (10.2)
To get Euler's method, just take Φ(t_i, y_i; h) = f(t_i, y_i).
In the introduction, we motivated Euler's method with a geometrical argument. An alternative, more mathematical way of deriving Euler's Method is to use a Truncated Taylor Series.
This again motivates formula (10.1), and also suggests that at each step the method introduces a (local) error of h²y″(η)/2. (More on this later.)
10.3 Errors
We'll now give an error analysis for general one-step methods, and then look at Euler's Method as a specific example. First, some definitions.
Definition 10.3. Global Error: E_i = y(t_i) − y_i.
Definition 10.4. Truncation Error:
    T_i := (y(t_{i+1}) − y(t_i))/h − Φ(t_i, y(t_i); h). (10.3)
It can be helpful to think of T_i as representing how much the difference equation (10.1) differs from the differential equation. We can also determine the truncation error for Euler's method directly from a Taylor series. The relationship between the global error and the truncation errors is explained in the following (important!) result, which in turn is closely related to Proposition 9.3:
Proposition 10.5. Let Φ(·) be Lipschitz with constant L. Then
    |E_n| ≤ T ( e^{L(t_n - t_0)} - 1 ) / L,     (10.4)
where T = max_{i=0,1,...,n} |T_i|.
Exercise 10.1. An important step in the proof of Prop. 10.5, which we didn't do in class, requires the observation that if |E_{i+1}| ≤ |E_i|(1 + hL) + h|T_i|, then
    |E_i| ≤ (T/L) [ (1 + hL)^i - 1 ],   i = 0, 1, ..., N.
Use induction to show that this is indeed the case.
Lecture 11 Analysis of one-step methods 23 Tue 09/10/12
11 Analysis of one-step methods
11.1 Recall...
Last Thursday we were introduced to the idea of one-step methods for solving IVPs:
    y_{i+1} = y_i + h Φ(t_i, y_i; h).
The only example we have seen so far is Euler's method: take Φ(t_i, y_i; h) = f(t_i, y_i).
We then defined the Global Error, E_i = y(t_i) - y_i, and the Truncation Error,
    T_i := ( y(t_{i+1}) - y(t_i) )/h - Φ(t_i, y(t_i); h).
The relationship between these was then shown to be
    |E_n| ≤ T ( e^{L(t_n - t_0)} - 1 ) / L,
where T = max_{i=0,1,...,n} |T_i|, providing that Φ is Lipschitz.
11.2 Example: Euler's method
For Euler's method, we get
    T = max_{0≤j≤n} |T_j| ≤ (h/2) max_{t_0 ≤ t ≤ t_n} |y''(t)|.
Example 11.1. Given the problem:
    y' = 1 + t + y/t for t > 1;   y(1) = 1,
find an approximation for y(2).
(i) Give an upper bound for the global error taking n = 4 (i.e., h = 1/4).
(ii) What n should you take to ensure that the global error is no more than 0.1?
To answer these questions we need to use (10.4):
    |E_n| ≤ (T/L) ( e^{L(t_n - t_0)} - 1 ),
which requires that we find L and an upper bound for T.
In this instance, L is easy:
(Note: This is a particularly easy example. Often we need to employ the mean value theorem. See [1, Eg 12.2].)
To find T we need an upper bound for |y''(t)| on [1, 2], even though we don't know y(t). However, we do know y'(t)....
Plug these values into (10.4) to find E_n ≤ 0.644. In fact, the true answer is 0.43, so we see that (10.4) is somewhat pessimistic.
To answer (ii): What n should you take to ensure that the global error is no more than 0.1? (We should get n = 26. This is not that sharp: n = 19 will do.)
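These numbers can be checked with a short computation. The following Python sketch uses the fact, which you can verify by substitution, that y(t) = t log t + t^2 solves this IVP, so that max |y''| = max |1/t + 2| = 3 on [1, 2], and L = max |∂f/∂y| = max 1/t = 1.

```python
import math

# Example 11.1: y' = 1 + t + y/t on [1, 2], y(1) = 1.
# By substitution, y(t) = t*log(t) + t^2 solves the IVP, so y''(t) = 1/t + 2,
# giving max |y''| = 3 on [1, 2]; also |df/dy| = 1/t <= 1, so L = 1.
f = lambda t, y: 1 + t + y / t

n, L = 4, 1.0
h = 1.0 / n
T = (h / 2) * 3.0                              # bound on the truncation error
bound = (T / L) * (math.exp(L * (2 - 1)) - 1)  # right-hand side of (10.4)

t, y = 1.0, 1.0                                # Euler's method, for comparison
for i in range(n):
    y += h * f(t, y)
    t += h
err = abs((2 * math.log(2) + 4) - y)           # y(2) = 2 log 2 + 4
print(bound, err)
```

This prints a bound of about 0.644 against an actual error of about 0.43, as stated above.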
11.3 Convergence and Consistency
We are often interested in the convergence of a method. That is, is it true that
    lim_{h→0} y_n = y(t_n)?
Or, equivalently, that
    lim_{h→0} E_n = 0?
Given that the global error for Euler's method can be bounded:
    |E_n| ≤ h ( max |y''(t)| / (2L) ) ( e^{L(t_n - t_0)} - 1 ) = hK,     (11.1)
we can say it converges.
Definition 11.2. The order of accuracy of a numerical method is p if there is a constant K so that
    |T_n| ≤ K h^p.
So Euler's method is 1st-order.
One of the requirements for convergence is Consistency:

Definition 11.3. A one-step method y_{n+1} = y_n + h Φ(t_n, y_n; h) is consistent with the differential equation y'(t) = f(t, y(t)) if f(t, y) ≡ Φ(t, y; 0).
11.4 Higher-order methods
Next we'll try to develop methods that are of higher order than Euler's method; that is, for which we can show
    |E_n| ≤ K h^p for some p > 1.
Suppose we numerically solve some differential equation and estimate the error. If we think this error is too large, we could redo the calculation with a smaller value of h. Or we could use a better method, for example Runge-Kutta methods. These are higher-order methods that rely on evaluating f(t, y) a number of times at each step in order to improve accuracy.
We'll first motivate one such method and then, in the next lecture, look at the general framework.
11.5 Modified Euler Method
Recall the motivation for Euler's method from Section 10.1. We can do something similar for the Modified Euler method.
In Euler's method, we use the slope of the tangent to y at t_i as an approximation for the slope of the secant line joining the points (t_i, y(t_i)) and (t_{i+1}, y(t_{i+1})).
One could argue, given the diagram below, that the slope of the tangent to y at t = (t_i + t_{i+1})/2 = t_i + h/2 would be a better approximation. This would give
    y(t_{i+1}) ≈ y_i + h f( t_i + h/2, y(t_i + h/2) ).
However, we don't know y(t_i + h/2), but we can approximate it using Euler's method: y(t_i + h/2) ≈ y_i + (h/2) f(t_i, y_i).
[Figure: the tangent to y(t) at t = t_i + h/2, compared with the secant line joining the points (t_i, y_i) and (t_{i+1}, y(t_{i+1})) on y(t).]
Example 11.4. Use the Modified Euler method to approximate y(1) where
    y(0) = 1,   y'(t) = y log(1 + t^2).
This has the solution y(t) = (1 + t^2)^t exp(-2t + 2 tan^{-1} t). In Table 11.1 below we compare the error in the solution to this problem using Euler's method (left) and the Modified Euler method (right) for various values of n.
Clearly we get a much more accurate result using the Modified Euler method. Even more importantly, we get a higher order of accuracy: if we halve h, the error in the Modified method is reduced by a factor of four.

            Euler                     Modified
    n       E_n        E_n/E_{n-1}   E_n        E_n/E_{n-1}
    1       3.02e-01                 7.89e-02
    2       1.90e-01   1.59          2.90e-02   2.72
    4       1.11e-01   1.72          8.20e-03   3.54
    8       6.02e-02   1.84          2.16e-03   3.79
   16       3.14e-02   1.91          5.55e-04   3.90
   32       1.61e-02   1.95          1.40e-04   3.95
   64       8.13e-03   1.98          3.53e-05   3.98
  128       4.09e-03   1.99          8.84e-06   3.99

Table 11.1: Errors in solutions to Example 11.4 using Euler's and Modified Euler methods
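A short Python sketch of the Modified Euler method (the course labs use Matlab; the names here are my own) reproduces the Modified columns of Table 11.1.

```python
import math

def modified_euler(f, t0, T, y0, n):
    """Modified (midpoint) Euler: an Euler half-step to t_i + h/2,
    then a full step using the slope there."""
    h = (T - t0) / n
    t, y = t0, y0
    for i in range(n):
        y_mid = y + (h / 2) * f(t, y)
        y = y + h * f(t + h / 2, y_mid)
        t = t + h
    return y

# Example 11.4: y' = y*log(1 + t^2), y(0) = 1, with exact solution
# y(t) = (1 + t^2)^t * exp(-2t + 2*arctan(t)), so y(1) = 2*exp(pi/2 - 2).
f = lambda t, y: y * math.log(1 + t**2)
exact = 2.0 * math.exp(math.pi / 2 - 2)
for n in [1, 2, 4, 8]:
    print(n, abs(exact - modified_euler(f, 0.0, 1.0, 1.0, n)))
```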
Exercise 11.1. Given the problem:
    y(1) = 1,   y' = (t - 1) sin(y),
find an approximation for y(2).
(i) Give an upper bound for the global error taking n = 4 (i.e., h = 1/4).
(ii) What n should you take to ensure that the global error is no more than 10^{-3}?
Exercise 11.2. As a special case in which the error of Euler's method can be analysed directly, consider Euler's method applied to
    y'(t) = y(t),   y(0) = 1.
The true solution is y(t) = e^t.
(i) Show that the solution to Euler's method can be written as
    y_i = (1 + h)^{t_i/h},   i ≥ 0.
(ii) Show that
    lim_{h→0} (1 + h)^{1/h} = e.
This then shows that, if we denote by y_n(T) the approximation for y(T) obtained using Euler's method with n intervals between t_0 and T, then
    lim_{n→∞} y_n(T) = e^T.
Hint: Let w = (1 + h)^{1/h}, so that log w = (1/h) log(1 + h). Now use l'Hospital's rule to find lim_{h→0} w.
Exercise 11.3. An alternative definition of consistency (see [1, p321]) is:
    A one-step method is consistent with the corresponding differential equation if the truncation error is such that for any ε > 0 there exists h(ε) > 0 such that |T_n| < ε for 0 ≤ h ≤ h(ε) and any pair of points on the solution curve.
Show that this is equivalent to Definition 11.3. State what, if any, assumptions you make about Φ.
Lecture 12 RK2 25 Thu 11/10/11
12 RK2
12.1 Introduction
The goal of today's lecture is to provide you with some techniques for deriving your own methods for accurately solving IVPs. Rather than using formal theory, the approach will be based on carefully chosen examples.
Towards the end of the last lecture, we studied a heuristic derivation of the so-called Modified (Midpoint) Euler method:
    y_{i+1} = y_i + h f( t_i + h/2, y_i + (h/2) f(t_i, y_i) ).
This is an example of the (large) family of 2nd-order Runge-Kutta (RK2) methods, given by:
    y_{i+1} = y_i + h (a k_1 + b k_2),
    k_1 = f(t_i, y_i),
    k_2 = f(t_i + αh, y_i + βh k_1).     (12.1)
If we choose a, b, α and β in the right way, then the error for the method will be bounded by Kh^2, for some constant K.
An (uninteresting) example of such a method: if we take a = 1 we get b = 0, and so it reduces to Euler's method. If we choose α = β = 1/2, a = 0, b = 1, we get the Modified method above.
Our aim now is to deduce general rules for choosing a, b, α and β. We'll see that if we pick any one of these four parameters, then the requirement that the method be consistent and second-order determines the other three.
12.2 Using consistency
By demanding that RK2 be consistent, we get that a + b = 1.
12.3 Ensuring that RK2 is 2nd-order
Next we need to know how to choose α and β. The formal way is to use a two-dimensional Taylor series expansion. It is quite technical, and not suitable for doing in class; detailed notes on it are given in Section 12.5 below. Instead we'll take a less rigorous, heuristic approach.
Because we expect that, for a second-order accurate method, |E_n| ≤ Kh^2 where K depends on y'''(t), if we choose a problem for which y'''(t) ≡ 0, we expect no error...
In the above example, the right-hand side of the differential equation, f(t, y), depended only on t. Now we'll try the same trick: using a problem with a simple known solution (and zero error), but for which f depends explicitly on y.
Consider the DE y(1) = 1, y'(t) = y(t)/t. It has a simple solution: y(t) = t. We now use that any RK2 method should be exact for this problem to deduce that α = β.
Now we collect the above results together and show that the second-order Runge-Kutta (RK2) methods are:
    y_{i+1} = y_i + h (a k_1 + b k_2),
    k_1 = f(t_i, y_i),   k_2 = f(t_i + αh, y_i + βh k_1),
where we choose any b ≠ 0 and then set
    a = 1 - b,   α = 1/(2b),   β = α.
It is easy to verify that the Modified method satisfies these criteria.
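As a sketch (in Python; the α and β here follow the reconstruction of the derivation above), the whole RK2 family can be coded with b as the single free parameter:

```python
def rk2_step(f, t, y, h, b):
    """One step of the RK2 family (12.1) with a = 1 - b, alpha = beta = 1/(2b)."""
    a = 1.0 - b
    alpha = 1.0 / (2.0 * b)
    k1 = f(t, y)
    k2 = f(t + alpha * h, y + alpha * h * k1)
    return y + h * (a * k1 + b * k2)

# b = 1 gives the Modified (midpoint) method; b = 1/2 gives the
# Improved Euler method of Exercise 12.1.
f = lambda t, y: y / (1 + t**2)
print(rk2_step(f, 0.0, 1.0, 1.0, b=1.0), rk2_step(f, 0.0, 1.0, 1.0, b=0.5))
```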
12.4 Exercises
Exercise 12.1. Another popular RK2 method, called the Improved Euler method, is obtained by choosing α = 1.
(i) Use the Improved Euler method to find an approximation for y(4) when
    y(0) = 1,   y' = y/(1 + t^2),
taking n = 2. (If you wish, use Matlab.)
(ii) Using a diagram similar to the one above for the Modified Euler method, justify the assertion that the Improved Euler method is more accurate than the basic Euler method.
(iii) Show that the method is consistent.
(iv) Write out what this method would be for the problem y'(t) = λy for a constant λ. How does this relate to the Taylor series expansion for y(x_{i+1}) about the point x_i? [Note: this exercise will make a little more sense after next Thursday's class.]
Exercise 12.2. As we know, if a one-step method has order p then there is a constant K such that |E_n| ≤ Kh^p. It is common practice to numerically verify the order of convergence of a method by computing
    log_2 ( |E_n| / |E_{2n}| ).
Give a mathematical justification for this formula. [Note: this exercise will make a little more sense after next Tuesday's class.]
Exercise 12.3. In his seminal paper of 1901, Carl Runge gave the following example of what we now call a Runge-Kutta 2 method:
    y_{n+1} = y_n + (h/4) [ f(t_n, y_n) + 3 f( t_n + (2/3)h, y_n + (2/3)h f(t_n, y_n) ) ].
(i) Show that it is consistent.
(ii) Show how this method fits into the general framework of RK2 methods.
(iii) Use it to estimate the solution at the point t = 2 to y(1) = 1, y' = 1 + t + y/t, taking n = 2 time steps.
(iv) Write down what this method is for the problem y'(t) = λy(t). How does this compare with the Taylor series for y(t_{i+1}) about the point t_i?
12.5 Formal Derivation of RK2
This section is based on [1, p422]. We didn't actually cover this in class, instead opting to deduce the same result in an easier, but unrigorous, way in Section 12.3. If we were to do it properly, this is how we would do it.
We will require that the Truncation Error (10.3) be second-order. More precisely, we want to be able to say that
    |T_n| = | ( y(t_{n+1}) - y(t_n) )/h - Φ(t_n, y(t_n); h) | ≤ Ch^2,
where C is some constant that does not depend on n or h.
So the problem is to find expressions (using Taylor series) for both ( y(t_{n+1}) - y(t_n) )/h and Φ(t_n, y(t_n); h) that only have O(h^2) remainders. To do this we need to recall two ideas from 2nd year calculus:
- To differentiate a function f( a(t), b(t) ) with respect to t:
      df/dt = (∂f/∂a)(da/dt) + (∂f/∂b)(db/dt);
- The Taylor series for a function of 2 variables, truncated at the 2nd term, is:
      f(x + δ_1, y + δ_2) = f(x, y) + δ_1 (∂f/∂x)(x, y) + δ_2 (∂f/∂y)(x, y) + C (max{δ_1, δ_2})^2,
  for some constant C. See [1, p422] for details.
To get an expression for y(t_{n+1}) - y(t_n), use a Taylor series:
    y(t_{n+1}) = y(t_n) + h y'(t_n) + (h^2/2) y''(t_n) + O(h^3)
               = y(t_n) + h f(t_n, y(t_n)) + (h^2/2) [ f(t_n, y(t_n)) ]' + O(h^3)
               = y(t_n) + h f(t_n, y(t_n)) + (h^2/2) [ ∂_t f(t_n, y(t_n)) + y'(t_n) ∂_y f(t_n, y(t_n)) ] + O(h^3)
               = y(t_n) + h f(t_n, y(t_n)) + (h^2/2) [ ∂_t f + f ∂_y f ](t_n, y(t_n)) + O(h^3).
This gives
    ( y(t_{n+1}) - y(t_n) )/h = f(t_n, y(t_n)) + (h/2) [ ∂_t f + f ∂_y f ](t_n, y(t_n)) + O(h^2).     (12.2)
Next we expand the expression f( t_i + αh, y_i + βh f(t_i, y_i) ) using a (two-dimensional) Taylor series:
    f( t_n + αh, y_n + βh f(t_n, y_n) ) = f(t_n, y_n) + αh ∂_t f(t_n, y_n) + βh f(t_n, y_n) ∂_y f(t_n, y_n) + O(h^2).
This leads to the following expansion for Φ(t_n, y(t_n); h):
    Φ(t_n, y(t_n); h) = (a + b) f(t_n, y(t_n)) + h [ bα ∂_t f + bβ f ∂_y f ](t_n, y(t_n)) + O(h^2).     (12.3)
So now, if we are to subtract (12.3) from (12.2) and leave only terms of O(h^2), we have to choose a + b = 1 and bα = 1/2 = bβ. That is, choose α and let
    β = α,   b = 1/(2α),   a = 1 - b.     (12.4)
(For a more detailed exposition, see [1, Chap 12].)
Lecture 13 Matlab Class 2: RK1 and RK2 27 Tue 16/10/12
13 Matlab Class 2: RK1 and RK2
In this lab-based lecture you'll extend your familiarity with Matlab, use it to implement some numerical methods for IVPs, and study their order of accuracy.
13.1 Four ways to define a vector
We know that the most fundamental object in Matlab is a matrix. The simplest (nontrivial) example of a matrix is a vector. So we need to know how to define vectors. Here are several ways we could define the vector x = (0, 0.2, 0.4, ..., 1.8, 2.0):

x = 0:0.2:2 % From 0 to 2 in steps of 0.2
x = linspace(0, 2, 11); % 11 equally spaced
% points with x(1)=0, x(11)=2.
x = [0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, ...
1.6, 1.8, 2.0]; % Define points individually

The last way is rather tedious, but this one is worse:

x(1)=0.0; x(2)=0.2; x(3)=0.4; x(4)=0.6; ...

We'll see a less tedious way of doing this last approach in Section 13.3 below.
13.2 Script files
Matlab is an interpretative environment: if you type a (correct) line of Matlab code, and hit return, it will execute it immediately. For example, try >> exp(1) to get a decimal approximation of e.
However, we usually want to string together a collection of Matlab operations and run them repeatedly. To do that, it is best to store these commands in a script file. This is done by making a file called, for example, Lab2.m and placing in it a series of Matlab commands. A script file is run from the Matlab command window by typing the name of the script, e.g., >> Lab2
Try putting some of the above commands for defining a vector into a script file and run it.
13.3 for-loops
When we want to run a particular set of operations a fixed number of times we use a for-loop. It works by iterating over a vector; at each iteration the iterand takes each value stored in the vector in turn. For example, here is another way to define the vector above:

for i=0:10 % 0:10 is the vector [0,1,..., 10]
x(i+1) = i*0.2;
end
13.4 Functions
In Matlab we can define a function in a way that is quite similar to the mathematical definition of a function. The syntax is >> Name = @(Var)(Formula); Examples:

f = @(x)(exp(x) - 2*x - 1);
g = @(x)(log(2*x + 1));

Now we can call, say, f(1) to compute e - 3. Note also that if x is a vector, so too is f(x).
But a more interesting example would be to try
>> xk = 1;
and then repeat the line
>> xk = g(xk)
Try this and observe that the values of xk seem to be converging. This is because we are using Fixed Point Iteration.
Later today, we'll need to know how to define functions of two variables. This can be done as:

f = @(y,t)(y./(1 + t.^2));
13.5 Plotting functions
Matlab has two ways to plot functions. The easiest way is to use a function called ezplot:

ezplot(f, [0, 2]);

plots the function f(x) for 0 ≤ x ≤ 2. A more flexible approach is to use the plot function, which can plot one vector as a function of another. Try these examples below, making sure you first have defined the vector x and the functions f and g:

figure(1);
plot(x, f(x));
figure(2);
plot(x, f(x), x, g(x), '--', x, x, '-.');

Can you work out what the syntax '--' and '-.' does? If not, ask a tutor. Also try

plot(x, f(x), 'g-o', x, g(x), 'r--x', ...
x, x, '-.');

13.6 How to learn more
You can't expect these notes to provide an encyclopedic guide to Matlab, just enough information to get started. There are many good references online. As an exercise, access Learning MATLAB by Tobin Driscoll through the NUI Galway library portal. Read Section 1.6: "Things about MATLAB that are very nice to know, but which often do not come to the attention of beginners."
13.7 Initial Value Problems
The particular example of an IVP that we'll look at in this lab is: estimate y(4) given that
    y'(t) = y/(1 + t^2), for t > 0, and y(0) = 1.     (13.1)
The true solution to this is y(t) = e^{arctan(t)}. If you don't want to solve this problem by hand, you could use Maple. The command is:

dsolve({D(y)(t)=y(t)/(1+t^2),y(0)=1},y(t));
13.8 Euler's Method
Euler's Method is:
- Choose n, the number of points at which you will estimate y(t). Let h = (t_n - t_0)/n, and t_i = t_0 + ih.
- For i = 0, 1, 2, ..., n-1 set y_{i+1} = y_i + h f(t_i, y_i).
Then y(t_n) ≈ y_n. As shown during Lecture 10, the global error for Euler's method can be bounded:
    |E_n| := |y(T) - y_n| ≤ Kh,
for some constant K that does not depend on h (or n). That is, if we halve h (i.e., double n), the error is reduced by half.
Download the Matlab script file Euler.m. It can be run in Matlab simply by typing >> Euler
It implements Euler's method for n = 1. Read the file carefully and make sure you understand it.
The program computes a vector y that contains the estimates for y at the time-values specified in the vector t. However, Matlab indexes all vectors from 1, and not 0. So t(1) = t_0, t(2) = t_1, ..., t(n+1) = t_n.
By changing the value of n, complete the table. We want to use this table to verify that Euler's Method is 1st-order accurate. That is:
    |E_n| ≤ Kh^ρ with ρ = 1.
A computational technique that verifies the order of the method is to estimate ρ by
    ρ ≈ log_2 ( |E_n| / |E_{2n}| ).     (13.2)
Use the data in the table to verify that ρ ≈ 1 for Euler's Method.
    n      y_n     E_n          ρ
    2      4.2     4.347e-01
    4      3.96    1.947e-01
    8
   16
   32
   64
  128
  256
  512
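The estimate (13.2) can be checked against the two rows already given. For example, in Python:

```python
import math

# Estimate the order via (13.2) from the two table rows given above.
E2, E4 = 4.347e-1, 1.947e-1
rho = math.log2(E2 / E4)
print(rho)   # about 1.16; the estimates approach 1 as n grows
```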
13.9 More Matlab: formatted output
When you read the code for the Euler.m script file, you'll see that it uses the disp function to display messages and values. It is not a very flexible function. For example, it only prints one piece of information per line, and it doesn't display in scientific notation where appropriate.
A better approach is to use the more flexible, if complicated, fprintf function. Its basic syntax is:
fprintf('Here is a message\n'); where the \n indicates a new line.
Using fprintf to display the contents of a variable is a little more involved, and depends on how we want the data displayed. For example:
- To display an integer:
  fprintf('Using n=%d steps\n', n);
- To display a floating point number:
  fprintf('y(n)=%f\n', Y(n+1));
- To display in exponential notation:
  fprintf('Error=%e\n', Error);
- To display 3 decimal places:
  fprintf('y(n)=%.3f\n', Y(n+1)); or
  fprintf('Error=%.3e\n', Error);
Use these to improve the formatting of the output from your Euler.m script. To get more information on fprintf, type >> doc fprintf , or ask one of the tutors.
13.10 Runge-Kutta 2 Methods
The RK2 method is:
    y_{i+1} = y_i + h (a k_1 + b k_2),
where k_1 = f(t_i, y_i) and k_2 = f(t_i + αh, y_i + βh k_1). In Lecture 12, we saw that if we pick any b ≠ 0, and let
    a = 1 - b,   α = 1/(2b),   β = α,     (13.3)
then we get a second-order method:
    |E_n| := |y(T) - y_n| ≤ Kh^2.     (13.4)
13.11 The Modified Euler Method
If we choose b = 1, we get the so-called Modified or midpoint Euler method from Lecture 12. We saw a heuristic justification for the assertion that it should be more accurate than the basic Euler method.
Adapt the Euler.m script to implement this method, and complete the table below.

    n      y_n      E_n          ρ
    2      3.720    4.526e-02
    4      3.786    2.108e-02
    8
   16
   32
   64
  128
  256
  512
Important: Send a summary of your numerical results, and your code for the Modified method, to [email protected] before the end of the lab.
13.12 Next week...
This lab serves as an introduction to a more substantial assignment that combines a Matlab programming exercise and a written homework assignment. More details in Thursday's class....
Lecture 14 Higher-order methods 30 Thu 18/10/12
14 Higher-order methods
The goal of today's class is to introduce the most commonly used higher-order method: the so-called Runge-Kutta 4 method.
14.1 Runge-Kutta 4
It is possible to construct methods that have higher rates of convergence, for example to find a method such that
    |y(t_n) - y_n| ≤ Ch^4.
However, writing down the general form of the RK4 method, and then deriving conditions on the parameters, is rather complicated. Therefore, we'll simply state the most popular RK4 method, without proving that it is 4th-order.
4th-Order Runge-Kutta Method (RK4):
    k_1 = f(t_i, y_i),
    k_2 = f(t_i + h/2, y_i + (h/2) k_1),
    k_3 = f(t_i + h/2, y_i + (h/2) k_2),
    k_4 = f(t_i + h, y_i + h k_3),
    y_{i+1} = y_i + (h/6) ( k_1 + 2 k_2 + 2 k_3 + k_4 ).
It can be interpreted as follows:
- k_1 is the slope of y(t) at t_i.
- k_2 is an approximation for the slope of y(t) at t_i + h/2 (using Euler's method).
- k_3 is an improved approximation for the slope of y(t) at t_i + h/2.
- k_4 is an approximation for the slope of y(t) at t_{i+1}, computed using the slope at y(t_i + h/2).
- Finally, the slope of the secant line joining y(t) at the points t_i and t_{i+1} is approximated using a weighted average of the above values.
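These four stages translate directly into code. A Python sketch (the labs use Matlab; the names here are my own), applied to the test problem (13.1):

```python
import math

def rk4(f, t0, T, y0, n):
    """The classical 4th-order Runge-Kutta method stated above."""
    h = (T - t0) / n
    t, y = t0, y0
    for i in range(n):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + (h / 2) * k1)
        k3 = f(t + h / 2, y + (h / 2) * k2)
        k4 = f(t + h, y + h * k3)
        y = y + (h / 6) * (k1 + 2 * k2 + 2 * k3 + k4)
        t = t + h
    return y

# Test problem (13.1): y' = y/(1 + t^2), y(0) = 1, so y(4) = e^{arctan 4}.
f = lambda t, y: y / (1 + t**2)
exact = math.exp(math.atan(4))
for n in [4, 8, 16]:
    print(n, abs(exact - rk4(f, 0.0, 4.0, 1.0, n)))
```

The errors drop by a factor of roughly 16 each time n is doubled, as a 4th-order method should.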
Example 14.1. Recall the test problem (13.1). In Table 14.1 below we give the errors in the solutions computed using various methods and values of n.
14.2 RK4: consistency and convergence
Although we won't do a detailed analysis of RK4, we can do a little. In particular, we would like to show it is
(i) consistent,
(ii) convergent, and 4th-order, at least for some examples.
                 |y(t_n) - y_n|
    n      Euler       Modified     RK4
    1      1.235       3.653e-01    6.037e-01
    2      4.347e-01   4.526e-02    1.953e-01
    4      1.947e-01   2.108e-02    1.726e-02
    8      1.154e-01   1.146e-02    9.192e-04
   16      6.530e-02   4.274e-03    5.712e-05
   32      3.501e-02   1.315e-03    3.474e-06
   64      1.817e-02   3.650e-04    2.122e-07
  128      9.263e-03   9.619e-05    1.308e-08
  256      4.678e-03   2.469e-05    8.112e-10
  512      2.351e-03   6.254e-06    5.050e-11
 1024      1.178e-03   1.574e-06    3.154e-12
 2048      5.899e-04   3.948e-07    1.839e-13

Table 14.1: Errors in solutions to Example 14.1 using Euler's, Modified, and RK4 methods
Example 14.2. It is easy to see that RK4 is consistent:

Example 14.3. In general, showing the rate of convergence is tricky. Instead, we'll demonstrate how the method relates to a Taylor series expansion for the problem y' = λy, where λ is a constant.
14.3 The (Butcher) Tableau
A great number of RK methods have been proposed and used through the years. A unified approach to representing and studying them was developed by John Butcher of the University of Auckland, New Zealand. In his notation, we write an s-stage method as
    Φ(t_i, y_i; h) = Σ_{j=1}^{s} b_j k_j,
where
    k_1 = f(t_i + c_1 h, y_i),
    k_2 = f(t_i + c_2 h, y_i + a_{21} h k_1),
    k_3 = f(t_i + c_3 h, y_i + a_{31} h k_1 + a_{32} h k_2),
    ...
    k_s = f(t_i + c_s h, y_i + a_{s1} h k_1 + ... + a_{s,s-1} h k_{s-1}).
The most convenient way to represent the coefficients is in a tableau:

    c_2  | a_21
    c_3  | a_31   a_32
    ...  |
    c_s  | a_s1   a_s2   ...   a_s,s-1
    -----+-----------------------------
         | b_1    b_2    ...   b_s-1    b_s

The tableau for the basic Euler method is trivial:

    0 |
    --+---
      | 1

So far we've seen two RK2 methods: the Modified method that you implemented in Lecture 13, and the Improved method of Exercise 12.1. Their tableaux are:

    0   |
    1/2 | 1/2
    ----+----------
        | 0    1

and

    0   |
    1   | 1
    ----+----------
        | 1/2  1/2

The tableau for the RK4 method above is:

    0   |
    1/2 | 1/2
    1/2 | 0    1/2
    1   | 0    0    1
    ----+--------------------
        | 1/6  2/6  2/6  1/6

You should now convince yourself that these tableaux do indeed correspond to the methods we did in class.
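One convenience of this notation is that a single stepper can drive every explicit RK method from its tableau. A Python sketch (the coefficient names c_j, a_{jl}, b_j follow the reconstruction of the notation above):

```python
def rk_step(f, t, y, h, A, b, c):
    """One step of the explicit s-stage Runge-Kutta method with
    coefficient matrix A (strictly lower triangular), weights b, nodes c."""
    s = len(b)
    k = []
    for j in range(s):
        yj = y + h * sum(A[j][l] * k[l] for l in range(j))
        k.append(f(t + c[j] * h, yj))
    return y + h * sum(b[j] * k[j] for j in range(s))

# The RK4 tableau above:
A = [[0, 0, 0, 0], [1/2, 0, 0, 0], [0, 1/2, 0, 0], [0, 0, 1, 0]]
b = [1/6, 2/6, 2/6, 1/6]
c = [0, 1/2, 1/2, 1]

f = lambda t, y: y / (1 + t**2)
print(rk_step(f, 0.0, 1.0, 1.0, A, b, c))   # one RK4 step for (13.1)
```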
14.4 Exercises
Exercise 14.1. We claim that, for RK4:
    |E_N| = |y(t_N) - y_N| ≤ Kh^4,
for some constant K. How could you verify that the statement is true in this instance (i.e., using the data of Table 14.1)? Give an estimate for K.
Exercise 14.2. Recall the problem in Example 11.1: Estimate y(2) given that
    y(1) = 1,   y' = f(t, y) := 1 + t + y/t.
(i) Show that f(t, y) satisfies a Lipschitz condition and give an upper bound for L.
(ii) Use Euler's method with h = 1/4 to estimate y(2). Using the true solution, calculate the error.
(iii) Repeat this for an RK2 method taking h = 1/2.
(iv) Use RK4 with h = 1 to estimate y(2).
Exercise 14.3. Here is the tableau for a three-stage Runge-Kutta method, known as RK3-2:

    0   |
    1/2 | 1/2
    1   | -1   2
    ----+--------------
        | 1/6  2/3  1/6

(i) Write out the equations for this method in the same form that RK4 was given in Section 14.1.
(ii) (a) Show that the method is consistent.
     (b) For the general s-stage Runge-Kutta method, what conditions must one have on the terms in the tableau in order to ensure consistency?
(iii) Write a Matlab program that implements this method. Test it for your own choice of a differential equation for which you know the exact solution.
Now use this program to check the order of convergence of the method. If possible, have it compute the error for different values of n (e.g., n = 2, n = 4, n = 8, etc.). Then produce a log-log plot of these n against the corresponding errors.
Explain how one should interpret the slope of this line, and why.
(iv) Consider the initial value problem:
    y(0) = 1,   y'(t) = y(t).
Using that the solution is y(t) = e^t, write out a Taylor series for y(t_{i+1}) about y(t_i) up to terms of order h^4 (note: use that h = t_{i+1} - t_i).
Write out what the RK3-2 method would be for this problem. Show that it agrees with the Taylor series expansion up to terms of order h^3.
Lecture 15 From IVPs to Linear Systems 32 Tue 23/10/12
15 From IVPs to Linear Systems
The goal of today's class is to highlight the many important aspects of the numerical solution of IVPs that are not covered in detail in this course. We have the additional goal of seeing how these methods relate to the earlier section of the course (nonlinear problems) and the next section (linear equation solving).
We complete our study of numerical methods for IVPs by giving brief consideration to each of the following: systems of ODEs; higher-order equations; implicit methods; and problems in two dimensions.
15.1 Systems of ODEs
So far we have only solved single IVPs. However, most interesting problems are coupled systems: find functions y and z such that
    y'(t) = f_1(t, y, z),
    z'(t) = f_2(t, y, z).
This does not present much of a problem to us. For example, the Euler method is extended to
    y_{i+1} = y_i + h f_1(t_i, y_i, z_i),
    z_{i+1} = z_i + h f_2(t_i, y_i, z_i).
Example 15.1. In pharmacokinetics, the flow of drugs between the blood and major organs can be modelled by
    dy/dt = k_21 z(t) - (k_12 + k_elim) y(t),
    dz/dt = k_12 y(t) - k_21 z(t),
    y(0) = d,   z(0) = 0,
where y is the concentration of a given drug in the bloodstream and z is its concentration in another organ. The parameters k_21, k_12 and k_elim are determined from physical experiments.
Euler's method for this is:
    y_{i+1} = y_i + h ( -(k_12 + k_elim) y_i + k_21 z_i ),
    z_{i+1} = z_i + h ( k_12 y_i - k_21 z_i ).
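A Python sketch of Euler's method for this system; the rate constants below are made-up values, purely for illustration:

```python
def euler_system(f1, f2, t0, T, y0, z0, n):
    """Euler's method applied componentwise to y' = f1(t,y,z), z' = f2(t,y,z)."""
    h = (T - t0) / n
    t, y, z = t0, y0, z0
    for i in range(n):
        # evaluate both slopes at the old (y, z) before updating either
        y, z = y + h * f1(t, y, z), z + h * f2(t, y, z)
        t = t + h
    return y, z

# Example 15.1 with illustrative (made-up) parameter values:
k12, k21, kelim, d = 0.3, 0.2, 0.1, 1.0
f1 = lambda t, y, z: k21 * z - (k12 + kelim) * y
f2 = lambda t, y, z: k12 * y - k21 * z
print(euler_system(f1, f2, 0.0, 10.0, d, 0.0, 1000))
```

A sanity check: with k_elim = 0 no drug leaves the system, and the two Euler updates cancel, so y_i + z_i is preserved (up to rounding).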
15.2 Higher-Order problems
So far we've only considered 1st-order IVPs:
    y'(t) = f(t, y);   y(t_0) = y_0.
However, the methods can easily be extended to higher-order problems:
    y''(t) + a(t) y'(t) = f(t, y);   y(t_0) = y_0,   y'(t_0) = y_1.
We do this by converting the problem to a system: set z(t) = y'(t). Then:
    z'(t) = -a(t) z(t) + f(t, y),   z(t_0) = y_1,
    y'(t) = z(t),   y(t_0) = y_0.
Example 15.2. Consider the following 2nd-order IVP:
    y''(t) - 3y'(t) + 2y(t) + e^t = 0,   y(1) = e,   y'(1) = 2e.
Let z = y'. Then:
    z'(t) = 3z(t) - 2y(t) - e^t,   z(1) = 2e,
    y'(t) = z(t),   y(1) = e.
Euler's method is:
    z_{i+1} = z_i + h ( 3z_i - 2y_i - e^{t_i} ),
    y_{i+1} = y_i + h z_i.
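A Python sketch of this Euler scheme. By substitution one can check that y(t) = t e^t solves the IVP, which gives something to measure the error against:

```python
import math

def euler_2nd_order(t0, T, y0, z0, n):
    """Euler's method for the first-order system z' = 3z - 2y - e^t, y' = z."""
    h = (T - t0) / n
    t, y, z = t0, y0, z0
    for i in range(n):
        # tuple assignment: both right-hand sides use the old y and z
        y, z = y + h * z, z + h * (3 * z - 2 * y - math.exp(t))
        t = t + h
    return y

# y(t) = t*e^t solves the IVP, so y(2) = 2e^2.
for n in [250, 1000, 4000]:
    err = abs(2 * math.exp(2) - euler_2nd_order(1.0, 2.0, math.e, 2 * math.e, n))
    print(n, err)
```

The printed errors shrink roughly in proportion to h, as expected of a first-order method.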
15.3 Implicit methods
Although we will not cover it in class, there are many problems for which the one-step methods we have seen will give a useful solution only when the step size, h, is small enough. For larger h, the solution can be very unstable. Such problems are called stiff problems. They can be solved, but are best done with so-called implicit methods, the simplest of which is the Implicit Euler Method:
    y_{i+1} = y_i + h f(t_{i+1}, y_{i+1}).
Note that y_{i+1} appears on both sides of the equation. To implement this method, we need to be able to solve this nonlinear problem. The most common method for doing this is Newton's method.
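A Python sketch of one implicit Euler step, with the per-step equation g(Y) = Y - y_i - h f(t_{i+1}, Y) = 0 solved by Newton's method. The stiff model problem y' = -100y is my own choice of illustration; since f is linear in y here, Newton converges in a single iteration.

```python
lam = 100.0
f = lambda t, y: -lam * y
dfdy = lambda t, y: -lam            # df/dy, needed for Newton's method

def implicit_euler_step(t, y, h):
    Y = y                           # initial Newton guess
    for it in range(20):
        g = Y - y - h * f(t + h, Y)
        dg = 1.0 - h * dfdy(t + h, Y)
        Y_next = Y - g / dg
        if abs(Y_next - Y) < 1e-14:
            return Y_next
        Y = Y_next
    return Y

h, t, y = 0.1, 0.0, 1.0
for i in range(10):
    y = implicit_euler_step(t, y, h)
    t += h
print(y)   # stays bounded and decays, even though h*lam = 10
```

For comparison, explicit Euler with the same h multiplies y by (1 - h*lam) = -9 at every step, so it explodes.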
15.4 Towards Partial Differential Equations
So far we've only considered ordinary differential equations: these are DEs which involve functions of just one variable. In our examples above, this variable was time.
But of course many physical phenomena vary in space and time, and so the solutions to the differential equations that model them depend on two or more variables. The derivatives expressed in the equations are partial derivatives, and so they are called partial differential equations (PDEs).
We will take a brief look at how to solve these (and how not to solve them). This will motivate the following section, on solving systems of linear equations.
Students of financial mathematics will be familiar with the Black-Scholes equation for pricing an option:
    ∂V/∂t - (1/2) σ^2 S^2 ∂^2V/∂S^2 - rS ∂V/∂S + rV = 0.
With a little effort (see, e.g., Chapter 5 of The Mathematics of Financial Derivatives: a student introduction, by Wilmott, Howison, and Dewynne), this can be transformed to the simpler-looking heat equation:

    ∂u/∂t(t, x) = ∂²u/∂x²(t, x),   for (x, t) ∈ [0, L] × [0, T],

and with the initial and boundary conditions

    u(0, x) = g(x)   and   u(t, 0) = a(t), u(t, L) = b(t).

Example 15.3. If L = π, g(x) = sin(x), a(t) = b(t) ≡ 0, then u(t, x) = e^{−t} sin(x) (see Figure 15.1).
Fig. 15.1: The true solution to the heat equation.
In general, however, this problem can't be solved explicitly for arbitrary g, a, b, so a numerical scheme is used. Suppose we somehow knew ∂²u/∂x²; then we could just use Euler's method:

    u(t_{i+1}, x) = u(t_i, x) + h ∂²u/∂x²(t_i, x).

Although we don't know ∂²u/∂x²(t_i, x), we can approximate it.
The algorithm is as follows:

1. Divide [0, T] into N intervals of width h, giving the grid {0 = t_0, t_1, t_2, . . . , t_{N−1}, t_N = T}, with t_i = t_0 + ih.

2. Divide [0, L] into M intervals of width H, giving the grid {0 = x_0, x_1, . . . , x_{M−1}, x_M = L}, with x_j = x_0 + jH.

3. Denote by u_{i,j} the approximation for u(t, x) at (t_i, x_j).
4. For each i = 0, 1, . . . , N − 1, use the following approximation for ∂²u/∂x²(t_i, x_j):

       δ²_x u_{i,j} = (1/H²)(u_{i,j−1} − 2u_{i,j} + u_{i,j+1}),

   for j = 1, 2, . . . , M − 1, and then take

       u_{i+1,j} := u_{i,j} + h δ²_x u_{i,j}.
This scheme is called an explicit method: if we know u_{i,j−1}, u_{i,j} and u_{i,j+1}, then we can explicitly calculate u_{i+1,j}. See Figure 15.2.

Fig. 15.2: A Finite Difference Grid (t is time, x is space). If we know u_{i,j−1}, u_{i,j} and u_{i,j+1}, we can calculate u_{i+1,j}.
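A small Python sketch of the explicit scheme applied to Example 15.3, with made-up grid parameters (for this equation the classical stability condition is h/H² ≤ 1/2, which the choice below satisfies):

```python
import math

def heat_explicit(M, N, T):
    """Explicit scheme for u_t = u_xx on [0, pi] x [0, T],
    with u(0, x) = sin(x) and boundary values a(t) = b(t) = 0."""
    H, h = math.pi / M, T / N
    u = [math.sin(j * H) for j in range(M + 1)]
    for _ in range(N):
        new = [0.0] * (M + 1)            # boundary entries stay 0
        for j in range(1, M):
            new[j] = u[j] + (h / H**2) * (u[j-1] - 2*u[j] + u[j+1])
        u = new
    return u

M, N, T = 20, 400, 1.0                   # h/H^2 is about 0.10: stable
u = heat_explicit(M, N, T)
exact = [math.exp(-T) * math.sin(j * math.pi / M) for j in range(M + 1)]
err = max(abs(a - b) for a, b in zip(u, exact))
print(err)   # small; try N = 40 (h/H^2 near 1) to see the instability
```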
Unfortunately, as we will see in class, this method is not very stable. Unless we are very careful choosing h and H, huge errors occur in the approximation for larger i (time steps).

Instead one might use an implicit method: if we know u_{i−1,j}, we compute u_{i,j−1}, u_{i,j} and u_{i,j+1} simultaneously. More precisely: solve u_{i,j} − h δ²_x u_{i,j} = u_{i−1,j} for i = 1, then i = 2, i = 3, etc. Expanding the δ²_x term we get, for each i = 1, 2, 3, . . . , the set of simultaneous equations

    u_{i,0} = a(t_i),
    −α u_{i,j−1} + β u_{i,j} − α u_{i,j+1} = u_{i−1,j},   j = 1, 2, . . . , M − 1,
    u_{i,M} = b(t_i),

where α = h/H² and β = 2h/H² + 1. This could be
expressed more clearly as the matrix-vector equation:

    Ax = f,

where

    A = [  1    0    0    0   . . .   0    0    0    0 ]
        [ -α    β   -α    0   . . .   0    0    0    0 ]
        [  0   -α    β   -α   . . .   0    0    0    0 ]
        [                  . . .                       ]
        [  0    0    0    0   . . .  -α    β   -α    0 ]
        [  0    0    0    0   . . .   0   -α    β   -α ]
        [  0    0    0    0   . . .   0    0    0    1 ]
and

    x = (u_{i,0}, u_{i,1}, u_{i,2}, . . . , u_{i,M−2}, u_{i,M−1}, u_{i,M})^T,

    f = (a(t_i), u_{i−1,1}, u_{i−1,2}, . . . , u_{i−1,M−2}, u_{i−1,M−1}, b(t_i))^T.
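As a sketch, one implicit time step is then just a linear solve. Here it is in Python/NumPy with a dense matrix for clarity (a practical code would exploit the tridiagonal structure; the grid parameters are made up):

```python
import numpy as np

def implicit_step(u_prev, h, H, a=0.0, b=0.0):
    """Solve A x = f for one implicit step: alpha = h/H^2, beta = 2h/H^2 + 1."""
    M = len(u_prev) - 1
    alpha, beta = h / H**2, 2 * h / H**2 + 1
    A = np.eye(M + 1)
    for j in range(1, M):
        A[j, j-1], A[j, j], A[j, j+1] = -alpha, beta, -alpha
    f = u_prev.astype(float)
    f[0], f[M] = a, b          # boundary rows: u_{i,0} = a(t_i), u_{i,M} = b(t_i)
    return np.linalg.solve(A, f)

# ten large time steps (h = 0.1) of the heat problem in Example 15.3: stable
M = 20
x = np.linspace(0.0, np.pi, M + 1)
u = np.sin(x)
for _ in range(10):
    u = implicit_step(u, h=0.1, H=np.pi / M)
print(np.max(np.abs(u - np.exp(-1.0) * np.sin(x))))   # modest O(h) error
```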
So all we have to do now is solve this system of
equations. That is what the next section of the course is
about.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Exercise 15.1. Write down the Euler Method for the following 3rd-order IVP:

    y''' + 2y'' + 2y = x² − 1,   y(0) = 1,  y'(0) = 0,  y''(0) = 1.
Exercise 15.2. Use a Taylor series to provide a derivation for the formula

    ∂²u/∂x²(t_i, x_j) ≈ (1/H²)(u_{i,j−1} − 2u_{i,j} + u_{i,j+1}).
Lecture 16 Gaussian Elimination 35 Thu 25/10/12
16 Gaussian Elimination

16.1 Solving linear systems: Introduction

This section of the course is concerned with solving systems of linear equations (simultaneous equations). All problems are of the form: find the set of real numbers x_1, x_2, . . . , x_n such that

    a_11 x_1 + a_12 x_2 + · · · + a_1n x_n = b_1
    a_21 x_1 + a_22 x_2 + · · · + a_2n x_n = b_2
        ...
    a_n1 x_1 + a_n2 x_2 + · · · + a_nn x_n = b_n

where the a_ij and b_i are real numbers.
It is natural to rephrase this using the language of linear algebra:

    Find x = (x_1, x_2, . . . , x_n)^T ∈ R^n such that
        Ax = b,                                       (16.1)

where A ∈ R^{n×n} is an n × n matrix and b ∈ R^n is a (column) vector with n entries.

In this section, we'll try to find clever ways of solving this system. In particular,

1. we'll argue that it's unnecessary and (more importantly) expensive to try to compute A^{−1};

2. we'll have a look at Gaussian Elimination, but will study it in the context of LU-factorisation;

3. after a detour to talk about matrix norms, we'll calculate the condition number of a matrix.
16.2 A short review of linear algebra

- A vector x ∈ R^n is an ordered n-tuple of real numbers, x = (x_1, x_2, . . . , x_n)^T.

- A matrix A ∈ R^{m×n} is a rectangular array of n columns of m real numbers.

- We assume you know how to multiply a matrix by a vector, and a matrix by a matrix.

- You can write a dot product as (x, y) = x^T y = Σ_{i=1}^{n} x_i y_i.

- Recall that (Ax)^T = x^T A^T, so (x, Ay) = x^T Ay = (A^T x)^T y = (A^T x, y).

- Letting I be the identity matrix, if there is a matrix B such that AB = BA = I, then we call B the inverse of A and write B = A^{−1}. If there is no such matrix, we say that A is singular.
Lemma 16.1. The following are equivalent:

1. for any b, Ax = b has a solution;
2. if there is a solution to Ax = b, it is unique;
3. for all x ∈ R^n, if Ax = 0 then x = 0;
4. the columns (or rows) of A are linearly independent;
5. there exists a matrix A^{−1} ∈ R^{n×n} such that AA^{−1} = I = A^{−1}A;
6. det(A) ≠ 0.

Item 5 of the above list suggests that we could solve (16.1) as follows: find A^{−1} and then compute x = A^{−1}b. However this is not the best way to proceed. We only need x, and computing A^{−1} requires quite a bit of work...
In first-year linear algebra courses, we would compute

    A^{−1} = (1/det(A)) adj(A),

where adj(A) is the adjoint (adjugate) matrix. But it turns out that computing det(A) directly is time-consuming.
16.3 The time it takes to compute det(A)

Let d_n be the number of multiplications required to compute the determinant of an n × n matrix using the Method of Minors (also known as Laplace's Formula). We know that if

    A = [ a  b ]
        [ c  d ]

then det(A) = ad − bc. So d_2 = 2. If

    A = [ a  b  c ]
        [ d  e  f ]
        [ g  h  j ]

then

    det(A) = a · det[ e f; h j ] − b · det[ d f; g j ] + c · det[ d e; g h ],

so d_3 = 3d_2 + 3 (three 2 × 2 determinants, plus the three multiplications by the cofactors). Next:
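We can let a computer do the counting. A small Python sketch of the Method of Minors, instrumented to count multiplications (expanding along the first row, each of the n terms costs one multiplication plus a smaller determinant, giving the recurrence d_n = n(d_{n−1} + 1), which grows roughly like n!):

```python
def det_minors(A, count):
    """Determinant by expansion along the first row, counting multiplications."""
    n = len(A)
    if n == 1:
        return A[0][0]
    det = 0.0
    for k in range(n):
        minor = [row[:k] + row[k+1:] for row in A[1:]]   # delete row 1, column k
        count[0] += 1                                    # a_{1k} times its cofactor
        det += (-1) ** k * A[0][k] * det_minors(minor, count)
    return det

for n in range(2, 9):
    count = [0]
    det_minors([[float(i == j) for j in range(n)] for i in range(n)], count)
    print(n, count[0])   # d_2 = 2, d_3 = 9, d_4 = 40, ...
```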
16.4 Gaussian Elimination

In the earlier sections of this course, we studied approximate methods for solving a problem: we replaced the problem (e.g., finding a zero of a nonlinear function) with an easier problem (finding the zero of a line) that has a similar solution. Here we will look at an exact method: we replace the problem with one that is easier to solve and has the same solution.

We'll tackle problem (16.1) by Gaussian Elimination.⁹
Example 16.2. Consider the problem:

    [  1  3  1 ] [ x_1 ]   [ 5 ]
    [ -3  1  2 ] [ x_2 ] = [ 0 ]
    [  2  2  5 ] [ x_3 ]   [ 9 ]
We can perform a sequence of elementary row operations to yield the system:

    [ 1  3  1 ] [ x_1 ]   [ 5 ]
    [ 0  2  1 ] [ x_2 ] = [ 3 ]
    [ 0  0  5 ] [ x_3 ]   [ 5 ]

In the latter form, the problem is much easier: from the 3rd row, it is clear that x_3 = 1; substitute into row 2 to get x_2 = 1, and then into row 1 to get x_1 = 1.

In our next class we'll see an alternative way of viewing this.
Recall that in Gaussian elimination, at each step in the process, we perform an elementary row operation such as

    A = [ a_11  a_12  a_13 ]
        [ a_21  a_22  a_23 ]
        [ a_31  a_32  a_33 ]

being replaced by

    [ a_11              a_12              a_13             ]
    [ a_21 + μ_21 a_11  a_22 + μ_21 a_12  a_23 + μ_21 a_13 ]
    [ a_31              a_32              a_33             ]

        = A + μ_21 [ 0     0     0    ]
                   [ a_11  a_12  a_13 ]
                   [ 0     0     0    ]

where μ_21 = −a_21/a_11. Because

    [ 0     0     0    ]   [ 0  0  0 ] [ a_11  a_12  a_13 ]
    [ a_11  a_12  a_13 ] = [ 1  0  0 ] [ a_21  a_22  a_23 ]
    [ 0     0     0    ]   [ 0  0  0 ] [ a_31  a_32  a_33 ]
⁹ Carl Friedrich Gauss, Germany, 1777-1855. Although he produced many very important original ideas, this wasn't one of them. The Chinese knew of Gaussian Elimination about 2000 years ago. His actual contributions included major discoveries in the areas of number theory, geometry, and astronomy.
we can write the row operation as (I + μ_21 E^(21))A, where E^(pq) is the matrix of all zeros, except for e_pq = 1.

In general each of the row operations in Gaussian Elimination can be written as

    (I + μ_pq E^(pq))A,   where 1 ≤ q < p ≤ n,       (16.2)

and (I + μ_pq E^(pq)) is an example of a Unit Lower Triangular Matrix.

As we will see, each step of the process will involve multiplying A by a unit lower triangular matrix, resulting in an upper triangular matrix.
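A quick NumPy check of (16.2) on a made-up 3 × 3 matrix: multiplying by I + μ_21 E^(21), with μ_21 = −a_21/a_11, zeros out the (2,1) entry.

```python
import numpy as np

A = np.array([[2.0, 1.0, 1.0],
              [4.0, 3.0, 1.0],
              [6.0, 1.0, 5.0]])

mu21 = -A[1, 0] / A[0, 0]        # mu_21 = -a_21/a_11 = -2
E21 = np.zeros((3, 3))
E21[1, 0] = 1.0                  # E^(21): all zeros except e_21 = 1
M = np.eye(3) + mu21 * E21       # a unit lower triangular matrix
print(M @ A)                     # row 2 becomes [0, 1, -1]
```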
Exercise 16.1. Verify that the two linear systems of equations in Example 16.2 have the same solution.

Exercise 16.2. Show that det(αA) = αⁿ det(A) for any A ∈ R^{n×n} and any scalar α ∈ R.

Note: the purpose of this exercise is not just to help develop skills at establishing important properties of matrices. It also gives us another reason to avoid trying to calculate the determinant of the coefficient matrix in order to find the inverse, and thus the solution to the problem. For example, if A ∈ R^{10×10} and the system is rescaled by α = 0.1, then det(A) is rescaled by 10^{−10}: it is almost singular!
Lecture 17 Matlab Class 3: functions 37 Tue 30/10/12
17 Matlab Class 3: functions

We digress from our study of linear solvers to revisit the numerical approximation of IVPs, using Matlab implementations of RK methods.

While our main goal is ostensibly to write some Matlab code that can employ a range of RK methods, our real goal is to learn about Matlab function files. This is an important concept that is applicable to any problem type that can be tackled using Matlab or Octave.

17.1 Scripts vs functions

In earlier labs we have developed Matlab script files. These contain a sequence of commands. The name of the file ends with .m. When we type the name of the script file, those commands are executed. The result is the same as if we typed those lines in the Matlab command window. Storing them in a script file makes it easy to edit and re-run the commands.

A Matlab function file also ends in .m, but has the purpose of computing a value (or vector) that is returned to the calling command line. The results of intermediate computations are not stored or returned. (For those who have some programming experience, this means that all variables are local to the function.)
For more information on scripts and functions, see
Chapter 3 of Learning Matlab by Tobin Driscoll, pub-
lished by SIAM, and (legally!) accessible through the
library website.
17.2 Function syntax

A function file always begins with the keyword function. The syntax is

    function RetVal = FunctionName(Arg1, Arg2, ...)

For example, download the SolveEuler.m file from the course web-page. You'll notice it begins with the line:

    function yN = SolveEuler(f, t0, y0, tN, n)

The components of this line are:

- function is the keyword that indicates that this is a function file, and not a script file.
- yN is the return value.
- SolveEuler is the name of the function. Notice that it is the same as the name of the file.
- The arguments to this function are:
  - the data for the IVP: f, t0, y0;
  - tN, the time at which we wish to estimate the solution (i.e., estimate y(t_N));
  - n, the number of time-steps to use.
Read the rest of the SolveEuler.m file. You'll notice that, although it is 28 lines long, only 9 of them actually do anything. The rest of the lines are comments, or blank lines, included so that it is easier to see what the function is doing.

If you have any questions, please ask one of the tutors.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

You can also download TestEuler.m. This is used to verify that the Euler solver gives reasonable results. For example, you should be able to observe that it is 1st-order accurate.
17.3 Your tasks

As a first step, to see if you understand the use of function files and script files, write a function, similar to SolveEuler, that implements the so-called Modified Euler method. This repeats an exercise that you did for Lab 2.

Now modify the code so that it takes the RK2 parameters as arguments: a, b, α and β. By taking α = β = 1/2, a = 0, and b = 1, verify that, again, the results for the Modified Euler method are obtained. See Section 13.11.

Next review Lecture 14. There, several different methods are described in terms of their (Butcher) tableaux. Can you extend your code to use, say, RK4?

Last, but certainly not least, do Exercise 14.3. For the purpose of this lab, pay particular attention to 14.3-(iii). It asks you to write a Matlab implementation for a method called RK3-2.

17.4 What you should submit as homework

At the very least, submit your Matlab code to solve 14.3-(iii). In addition, try to write a more general version of the programme that can implement a Runge-Kutta method, up to four stages, for any tableau. Find examples of an RK2 and an RK3 method that were not covered in lectures, and verify that your program works for these. You can find examples in, e.g., Numerical Methods for Ordinary Differential Equations (2nd Ed) by J.C. Butcher. Ideally, you should submit two files:

- a function file that implements the methods;
- a script file that tests the function file for several examples of RK methods.

For this question, marks will be given for the quality of the coding, and the generality of the implementation. For more details, please come to Thursday's lecture.
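For comparison, here is what such a tableau-driven explicit RK solver can look like, sketched in Python rather than the Matlab the lab asks for (the classical RK4 tableau used to test it is standard):

```python
import math

def rk_explicit(A, b, c, f, t0, y0, tN, n):
    """Explicit Runge-Kutta method given by the Butcher tableau (A, b, c)."""
    s, h = len(b), (tN - t0) / n
    t, y = t0, y0
    for _ in range(n):
        k = []
        for i in range(s):
            yi = y + h * sum(A[i][j] * k[j] for j in range(i))  # A strictly lower
            k.append(f(t + c[i] * h, yi))
        y += h * sum(b[i] * k[i] for i in range(s))
        t += h
    return y

# classical RK4 tableau
A4 = [[0, 0, 0, 0], [0.5, 0, 0, 0], [0, 0.5, 0, 0], [0, 0, 1, 0]]
b4, c4 = [1/6, 1/3, 1/3, 1/6], [0, 0.5, 0.5, 1]
y = rk_explicit(A4, b4, c4, lambda t, y: y, 0.0, 1.0, 1.0, 50)
print(abs(y - math.e))   # tiny: RK4 is 4th-order accurate
```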
Lecture 18 Triangular Matrices 38 Thu 01/11/12
18 Triangular Matrices

In Lecture 16 we recalled Gaussian Elimination for solving systems of linear equations, expressed in matrix-vector form. We saw that each step of that algorithm could be expressed as multiplying the matrix by a lower triangular one.

The goal of this lecture is to gain an understanding of the properties of triangular matrices, and how they relate to Gaussian Elimination.

18.1 Unit Lower Triangular and Upper Triangular Matrices

Definition 18.1. L ∈ R^{n×n} is a Lower Triangular Matrix (LTM) if the only non-zero entries are on or below the main diagonal, i.e., if l_ij = 0 for 1 ≤ i < j ≤ n. It is a Unit Lower Triangular Matrix (ULTM) if l_ii = 1.

U ∈ R^{n×n} is an Upper Triangular Matrix (UTM) if u_ij = 0 for 1 ≤ j < i ≤ n. It is a Unit Upper Triangular Matrix (UUTM) if u_ii = 1.
Example 18.2.

Triangular matrices have many important properties. A very important one is: the determinant of a triangular matrix is the product of the diagonal entries (for proof, see Exercise 18.1). Some more properties are noted in Theorem 18.5 below. The style of proof used in that theorem recurs throughout this section of the course, and so is worth studying for its own sake. The key idea is to partition the matrix and then apply inductive arguments.

Definition 18.3. X is a submatrix of A if it can be obtained by deleting some rows and columns of A.

Definition 18.4. The Leading Principal Submatrix of order k of A ∈ R^{n×n} is the matrix A^(k) ∈ R^{k×k} obtained by deleting all but the first k rows and columns of A. (Simply put, it's the k × k matrix in the top left-hand corner of A.)
Next recall that if A and V are matrices of the same size, and each is partitioned

    A = [ B  C ]         V = [ W  X ]
        [ D  E ],            [ Y  Z ],

where B is the same size as W, C is the same size as X, etc., then

    AV = [ BW + CY   BX + CZ ]
         [ DW + EY   DX + EZ ].

Armed with this, we are now ready to state our theorem on lower triangular matrices.
Theorem 18.5. For any integer n ≥ 2:

(i) If L_1 and L_2 are n × n Lower Triangular (LT) matrices, then so too is their product L_1 L_2.

(ii) If L_1 and L_2 are n × n Unit Lower Triangular (ULT) matrices, then so too is their product L_1 L_2.

(iii) L_1 is nonsingular if and only if all the l_ii ≠ 0. In particular, all ULT matrices are nonsingular.

(iv) The inverse of an LTM is an LTM. The inverse of a ULTM is a ULTM.
We'll prove just part (iv) of Theorem 18.5, and for LTMs. The other parts are left to Exercise 18.2. We restate Part (iv) as follows:

    Suppose that L ∈ R^{n×n} is a Lower Triangular Matrix (LTM), with n ≥ 2, and that there is a matrix L^{−1} ∈ R^{n×n} such that L^{−1}L = I_n. Then L^{−1} is also a Lower Triangular Matrix.

Proof.
Theorem 18.6. Statements analogous to Theorem 18.5 hold for upper triangular and unit upper triangular matrices. (For proof, see Exercise 18.3.)
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
18.2 Exercises
Exercise 18.1. Let L be a Lower Triangular n × n matrix. Show that

    det(L) = ∏_{j=1}^{n} l_jj.

Hence give a necessary and sufficient condition for L to be invertible. What does that tell us about Unit Lower Triangular Matrices?
Exercise 18.2. Prove Parts (i)-(iii) of Theorem 18.5.
Exercise 18.3. Prove Theorem 18.6.
Exercise 18.4. Suppose the n × n matrices A and C are both LTMs, and that there is an n × n matrix B such that AB = C. Must B be an LTM?

Suppose A and C are ULTMs. Must B be ULT? Why?

Exercise 18.5. Construct an alternative proof of Theorem 18.5 (iv) as follows: Suppose that L is a non-singular lower triangular matrix. If b ∈ R^n is such that b_i = 0 for i = 1, . . . , k ≤ n, and y solves Ly = b, then y_i = 0 for i = 1, . . . , k. (Hint: partition L by the first k rows and columns.)
18.3 Extra: some details on the proof of Theorem 18.5

The proof is by induction. First show that the statement is true for n = 2 by writing down the inverse of a 2 × 2 nonsingular lower triangular matrix.

Next we assume that the result holds true for k = 2, 3, . . . , n − 1. For the case where L ∈ R^{n×n}, partition L by the last row and column:

    L = [ L^(n−1)   0    ]
        [ r^T       l_nn ]

where L^(n−1) is the LT matrix with n − 1 rows and columns obtained by deleting the last row and column of L, 0 = (0, 0, . . . , 0)^T, and r ∈ R^{n−1}. Do the same for L^{−1}:

    L^{−1} = [ X     y ]
             [ z^T   λ ],

and use the fact that their product is I_n:

    LL^{−1} = [ L^(n−1)X + 0z^T     L^(n−1)y + 0λ ]   =   [ I_{n−1}   0 ]
              [ r^T X + l_nn z^T    r^T y + l_nn λ ]       [ 0^T       1 ].

This gives that:

- L^(n−1)X = I_{n−1}, so, by the inductive hypothesis, X is an LT matrix.

- L^(n−1)y = 0. Since L^(n−1) is invertible, we get that y = (L^(n−1))^{−1} 0 = 0.

- z^T = −(r^T X)/l_nn, which exists because, since det(L) ≠ 0, we know that l_nn ≠ 0.

- Similarly, λ = (1 − r^T y)/l_nn.
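A quick numerical sanity check of Theorem 18.5 with NumPy, on a random 5 × 5 example (a check, not a proof):

```python
import numpy as np

rng = np.random.default_rng(1)
L = np.tril(rng.random((5, 5))) + np.eye(5)   # LT with nonzero diagonal entries
Linv = np.linalg.inv(L)

# (i) products of LT matrices are LT; (iv) the inverse of an LTM is an LTM
print(np.allclose(L @ L, np.tril(L @ L)))     # True
print(np.allclose(Linv, np.tril(Linv)))       # True
```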
Lecture 19 LU-factorisation 40 Tue 5/11/12
19 LU-factorisation

The goal of today's class is to see that applying Gaussian elimination to solve the linear system Ax = b is equivalent to factoring A as the product of a unit lower triangular (ULT) and an upper triangular (UT) matrix.

19.1 Factorising A

It was mentioned towards the end of Lecture 16 that each elementary row operation in Gaussian Elimination (GE) involves replacing A with (I + μ_rs E^(rs))A. But (I + μ_rs E^(rs)) is a unit lower triangular matrix. Also, when we are finished we have an upper triangular matrix. So we can write the whole process as

    L_k L_{k−1} L_{k−2} · · · L_2 L_1 A = U,

where each of the L_i is a ULT matrix. But Theorem 18.5 tells us that the product of ULT matrices is itself a ULT matrix. So we can write the whole process as

    L̃A = U.

Theorem 18.5 also tells us that the inverse of a ULT matrix exists and is a ULT matrix. So we can write

    A = LU,

where L is unit lower triangular and U is upper triangular. This is called...

Definition 19.1. The LU-factorisation of a matrix A is a unit lower triangular matrix L and an upper triangular matrix U such that LU = A.
Example 19.2. If A = [ 3  2 ]
                     [ 1  2 ]   then:

Example 19.3. If A = [ 3  1  1 ]
                     [ 2  4  3 ]
                     [ 0  2  4 ]   then:
19.2 A formula for LU-factorisation

In the above examples, we deduced the factorisation by inspection. But the process suggests an algorithm/formula. That is, we need to work out formulae for L and U where

    a_ij = (LU)_ij = Σ_{k=1}^{n} l_ik u_kj,   1 ≤ i, j ≤ n.

Since L and U are triangular:

    If i ≤ j then a_ij = Σ_{k=1}^{i} l_ik u_kj.
    If j < i then a_ij = Σ_{k=1}^{j} l_ik u_kj.

The first of these equations can be written as

    a_ij = Σ_{k=1}^{i−1} l_ik u_kj + l_ii u_ij.

But l_ii = 1, so:

    u_ij = a_ij − Σ_{k=1}^{i−1} l_ik u_kj,   i = 1, . . . , j;  j = 1, . . . , n.    (19.1a)

And from the second:

    l_ij = (1/u_jj) ( a_ij − Σ_{k=1}^{j−1} l_ik u_kj ),   i = 2, . . . , n;  j = 1, . . . , i − 1.    (19.1b)
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Example 19.4. Find the LU-factorisation of

    A = [ 1  0   1  2 ]
        [ 2  2   1  4 ]
        [ 3  4  -2  4 ]
        [ 4  6  -5  0 ]
Details of Example 19.4: First, using (19.1a) with i = 1, we have u_1j = a_1j:

    U = [ 1  0     1     2    ]
        [ 0  u_22  u_23  u_24 ]
        [ 0  0     u_33  u_34 ]
        [ 0  0     0     u_44 ]

Then, using (19.1b) with j = 1, we have l_i1 = a_i1/u_11:

    L = [ 1  0     0     0 ]
        [ 2  1     0     0 ]
        [ 3  l_32  1     0 ]
        [ 4  l_42  l_43  1 ]
Next, using (19.1a) with i = 2, we have u_2j = a_2j − l_21 u_1j:

    U = [ 1  0   1     2    ]
        [ 0  2  -1     0    ]
        [ 0  0   u_33  u_34 ]
        [ 0  0   0     u_44 ]

then (19.1b) with j = 2 gives l_i2 = (a_i2 − l_i1 u_12)/u_22:

    L = [ 1  0  0     0 ]
        [ 2  1  0     0 ]
        [ 3  2  1     0 ]
        [ 4  3  l_43  1 ]

Etc....
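The formulae (19.1) translate directly into code, working column by column. A plain-Python sketch, applied to a small 4 × 4 example whose factors work out to integers:

```python
def lu_factorise(A):
    """LU-factorisation via (19.1): L unit lower triangular, U upper triangular."""
    n = len(A)
    L = [[float(i == j) for j in range(n)] for i in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(j + 1):            # (19.1a): column j of U
            U[i][j] = A[i][j] - sum(L[i][k] * U[k][j] for k in range(i))
        for i in range(j + 1, n):         # (19.1b): column j of L
            L[i][j] = (A[i][j] - sum(L[i][k] * U[k][j] for k in range(j))) / U[j][j]
    return L, U

A = [[1, 0, 1, 2], [2, 2, 1, 4], [3, 4, -2, 4], [4, 6, -5, 0]]
L, U = lu_factorise(A)
print(L)   # unit lower triangular factor
print(U)   # upper triangular factor
```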
19.3 Existence of an LU-factorisation

Question: Is this factorisation possible for all matrices?

The answer is no. For example, suppose one of the u_ii = 0 in (19.1b). We'll now characterise the matrices which can be factorised.

Example 19.5.

To prove the next theorem we need the Binet-Cauchy Theorem: det(AB) = det(A) det(B).

Theorem 19.6. If n ≥ 2 and A ∈ R^{n×n} is such that every leading principal submatrix A^(k) of A is nonsingular for 1 ≤ k < n, then A has an LU-factorisation.
19.4 Exercises

Exercise 19.1. Many textbooks and computing systems compute the factorisation A = LDU, where L and U are unit lower and unit upper triangular matrices respectively, and D is a diagonal matrix. Use Theorem 19.6 to show such a factorisation exists.

Exercise 19.2. Suppose that A ∈ R^{n×n} is nonsingular and symmetric, and that every leading principal submatrix of A is nonsingular. Use Exercise 19.1 to show that A can be factorised as A = LDL^T.

How would this factorisation be used to solve Ax = b?
Lecture 20 Putting it all together 42 Thu 08/11/12
20 Putting it all together

In today's class we'll see how to use the LU-factorisation of a matrix to solve a linear system. We'll also study the computational efficiency of the method.

20.1 Solving LUx = b

From Theorem 19.6, we can factorise an A ∈ R^{n×n} (that satisfies the theorem's hypotheses) as A = LU. But our overarching goal is to solve the problem: find x ∈ R^n such that Ax = b, for some b ∈ R^n. We do this by first solving Ly = b for y ∈ R^n, and then Ux = y. Because L and U are triangular, this is easy. The process is called back-substitution (the lower triangular solve is sometimes called forward-substitution).
Example 20.1. Use LU-factorisation to solve

    [ 1  0   1  2 ] [ x_1 ]   [  2 ]
    [ 2  2   1  4 ] [ x_2 ] = [  3 ]
    [ 3  4  -2  4 ] [ x_3 ]   [  1 ]
    [ 4  6  -5  0 ] [ x_4 ]   [ -1 ]

Solution: Take the LU-factorisation from Example 19.4. Then solve

    [ 1  0  0  0 ] [ y_1 ]   [  2 ]
    [ 2  1  0  0 ] [ y_2 ] = [  3 ]
    [ 3  2  1  0 ] [ y_3 ]   [  1 ]
    [ 4  3  2  1 ] [ y_4 ]   [ -1 ]

to get y = (2, -1, -3, 0)^T. And then

    [ 1  0   1   2 ] [ x_1 ]   [  2 ]
    [ 0  2  -1   0 ] [ x_2 ] = [ -1 ]
    [ 0  0  -3  -2 ] [ x_3 ]   [ -3 ]
    [ 0  0   0  -4 ] [ x_4 ]   [  0 ]

to get x = (1, 0, 1, 0)^T.
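The two triangular solves are a few lines each. A Python sketch (L must be unit lower triangular, U upper triangular with nonzero diagonal; the matrices and right-hand side below follow the worked example, with signs as reconstructed here):

```python
def solve_lu(L, U, b):
    """Solve LUx = b: first Ly = b (forward), then Ux = y (backward)."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):                    # forward-substitution; l_ii = 1
        y[i] = b[i] - sum(L[i][k] * y[k] for k in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):          # back-substitution
        x[i] = (y[i] - sum(U[i][k] * x[k] for k in range(i + 1, n))) / U[i][i]
    return x

L = [[1, 0, 0, 0], [2, 1, 0, 0], [3, 2, 1, 0], [4, 3, 2, 1]]
U = [[1, 0, 1, 2], [0, 2, -1, 0], [0, 0, -3, -2], [0, 0, 0, -4]]
x = solve_lu(L, U, [2, 3, 1, -1])
print(x)   # the solution is x = (1, 0, 1, 0)
```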
20.2 Pivoting

Example 20.2. Suppose we want to compute the LU-factorisation of

    A = [ 0  2  4 ]
        [ 2  4  3 ]
        [ 3  1  1 ]

We can't compute l_21 because u_11 = 0. But if we swap rows 1 and 3 we get the matrix in Example 19.3, and so can form an LU-factorisation. This is like changing the order of the linear equations we want to solve.

If

    P = [ 0  0  1 ]                [ 3  1  1 ]
        [ 0  1  0 ]    then PA =   [ 2  4  3 ]
        [ 1  0  0 ]                [ 0  2  4 ]

This is called pivoting, and P is the permutation matrix.

Definition 20.3. P ∈ R^{n×n} is a Permutation Matrix if every entry is either 0 or 1 (it is a Boolean matrix) and all the row and column sums are 1.

Theorem 20.4. For any A ∈ R^{n×n} there exists a permutation matrix P such that PA = LU.

For a proof, see p. 53 of the textbook.
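A tiny NumPy illustration of Example 20.2 (the final check, that P Pᵀ = I, i.e. permutation matrices are orthogonal, is an extra fact not stated in the notes):

```python
import numpy as np

A = np.array([[0.0, 2, 4],
              [2, 4, 3],
              [3, 1, 1]])
P = np.array([[0.0, 0, 1],
              [0, 1, 0],
              [1, 0, 0]])              # swaps rows 1 and 3

PA = P @ A
print(PA[0, 0])                        # 3.0: the first pivot is now nonzero
print(np.allclose(P @ P.T, np.eye(3))) # True: P^(-1) = P^T
```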
20.3 The computational cost

How efficient is the method of LU-factorisation for solving Ax = b? That is, how many computational steps (additions and multiplications) are required? In the textbook [1], you'll find a discussion in Section 2.6 that goes roughly as follows.

Suppose we want to compute l_{i,j}. From the formula (19.1) from Tuesday, we see that this would take j − 2 additions, j − 1 multiplications, 1 subtraction and 1 division: a total of 2j − 1 operations. Recall that

    1 + 2 + · · · + k = k(k + 1)/2,   and
    1² + 2² + · · · + k² = k(k + 1)(2k + 1)/6.

So the number of operations required for computing L is

    Σ_{i=2}^{n} Σ_{j=1}^{i−1} (2j − 1) = Σ_{i=2}^{n} (i² − 2i + 1)
        = (1/6)n(n + 1)(2n + 1) − n(n + 1) + n ≈ Cn³

for some C. A similar (slightly smaller) number of operations is required for computing U. (For a slightly different approach that yields cruder estimates, but requires a little less work, have a look at Lecture 10 of Afternotes [3].)

This doesn't tell us how long a computer program will take to run, but it does tell us how the execution time grows with n. For example, if n = 100 and the program takes a second to execute, then for n = 1000 we'd expect it to take about a thousand seconds.
20.4 Towards error estimates

Unlike, say, Euler's method or Newton's method, LU-factorisation is an exact technique; after a finite number of steps, it gives us exactly the solution we are looking for, with no error. Or at least it would, if we could work with exact arithmetic. But we (i.e., computers) can't. Most computer systems represent numbers using only 8 or 16 digits. This approximation leads to round-off error. It transpires that the implementation of LU-factorisation, and related methods, can cause this error to be greatly magnified: this phenomenon will be our main focus over the next few lectures.

You might remember from earlier sections of the course that we had to assume functions were well-behaved in the sense that

    |f(x) − f(y)| / |x − y| ≤ L,

for some number L, so that our numerical schemes (e.g., fixed point iteration, Euler's method, etc.) would work. If a function doesn't satisfy a condition like this, we say it is ill-conditioned. One of the consequences is that a small error in the inputs gives a large error in the outputs. We'd like to be able to express similar ideas about matrices: that ||A(u − v)|| = ||Au − Av|| is not too large compared to ||u − v||. To do this we use the notion of a norm to describe the relative sizes of the vectors u and Au.
20.5 Norms

When we want to consider the size of a real number, without regard to sign, we use the absolute value. Important properties of this function are:

1. |x| ≥ 0 for all x.
2. |x| = 0 if and only if x = 0.
3. |αx| = |α||x|.
4. |x + y| ≤ |x| + |y| (triangle inequality).

This notion can be extended to vectors and matrices.

Definition 20.5. Let R^n be the set of all vectors of length n of real numbers. The function || · || is called a norm on R^n if, for all u, v ∈ R^n:

1. ||v|| ≥ 0;
2. ||v|| = 0 if and only if v = 0;
3. ||αv|| = |α| ||v|| for any α ∈ R;
4. ||u + v|| ≤ ||u|| + ||v|| (triangle inequality).
20.6 Exercises

Exercise 20.1. Recall Exercise 18.1. Form a conjecture concerning the determinant of an upper triangular matrix. Prove it. Use this to derive a method of computing the determinant of a matrix that is more efficient than the approach of Section 16.3.
Exercise 20.2. Consider the matrix

    H_3 = [ 1    1/2  1/3 ]
          [ 1/2  1/3  1/4 ]
          [ 1/3  1/4  1/5 ]

(i) Write down the LU-factorisation of H_3.

(ii) Hence solve the linear systems H_3 x = b where

    (a) b = (1/2, 1/6, 1/12)^T, and
    (b) b = (1/6, 1/12, 1/20)^T.
Lecture 21 Norms of vectors and matrices 44 Tue 13/11/12
21 Norms of vectors and matrices

21.1 Three norms on R^n

Norms on vectors in R^n give us some information about the size of the vector. But there are different ways of measuring the size: you could take the absolute value of the largest entry, you could look at the distance from the origin, etc...

Let v ∈ R^n: v = (v_1, v_2, . . . , v_{n−1}, v_n)^T.

Definition 21.1.

(i) The 1-norm (also known as the taxicab or Manhattan norm) is

    ||v||_1 = Σ_{i=1}^{n} |v_i|.

(ii) The 2-norm (a.k.a. the Euclidean norm) is

    ||v||_2 = ( Σ_{i=1}^{n} v_i² )^{1/2}.

Note, if v is a vector in R^n, then

    v^T v = v_1² + v_2² + · · · + v_n² = ||v||_2².

(iii) The ∞-norm (also known as the max-norm) is

    ||v||_∞ = max_{i=1,...,n} |v_i|.
Example 21.2. If v = (2, 4, 4) then
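A quick numerical check of the three norms in NumPy, using the vector from Example 21.2:

```python
import numpy as np

v = np.array([2.0, 4.0, 4.0])
print(np.linalg.norm(v, 1))        # 10.0 : |2| + |4| + |4|
print(np.linalg.norm(v, 2))        # 6.0  : sqrt(4 + 16 + 16) = sqrt(36)
print(np.linalg.norm(v, np.inf))   # 4.0  : largest |v_i|
```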
In Figure 21.1, the first diagram shows the unit ball in R² given by the 1-norm: the vectors x = (x_1, x_2) in R² such that ||x||_1 = |x_1| + |x_2| = 1 are all found on the diamond (top left). In the second diagram, the vectors have (x_1² + x_2²)^{1/2} = 1 and so are arranged in a circle (top right). The bottom diagram gives the unit ball in || · ||_∞: the largest component of each vector is 1.

It is easy to show that || · ||_1 and || · ||_∞ are norms (see Exercise 21.1). And it is not hard to show that || · ||_2 satisfies conditions (1), (2) and (3) of Definition 20.5. But it takes a little bit of effort to show that || · ||_2 satisfies the triangle inequality. First we need

Lemma 21.3 (Cauchy-Schwarz).

    | Σ_{i=1}^{n} u_i v_i | ≤ ||u||_2 ||v||_2,   for all u, v ∈ R^n.

The proof can be found in any textbook on analysis. We can now apply it to show that
Fig. 21.1: The unit vectors in R²: ||x||_1 = 1, ||x||_2 = 1, ||x||_∞ = 1.
Lemma 21.4.

    ||u + v||_2 ≤ ||u||_2 + ||v||_2.

It follows directly that

Corollary 21.5. || · ||_2 is a norm.
21.2 Subordinate Matrix Norms

Definition 21.6. Given any norm || · || on R^n, there is a subordinate matrix norm on R^{n×n} defined by

    ||A|| = max_{v ∈ R^n_*} ||Av|| / ||v||,      (21.1)

where A ∈ R^{n×n} and R^n_* = R^n \ {0}.

You might wonder why we define a matrix norm like this. The reason is that we like to think of A as an operator on R^n: if v ∈ R^n then Av ∈ R^n. So rather than the norm giving us information about the size of the entries of a matrix, it tells us how much the matrix can change the size of a vector.

The only problem is that the definition isn't very useful: for a given A, we'd have to calculate ||Av||/||v|| for all v. And there are rather a lot of them. Fortunately, there are some easier ways of computing the more important norms.
21.3 Computing Matrix Norms

The formula for a subordinate matrix norm in Definition 21.6 is sensible, but not much use if we actually want to compute, say, ||A||_1, ||A||_∞ or ||A||_2. Fortunately there are reasonably tractable ways of doing this. We'll see that:

- The ∞-norm of a matrix is just the largest absolute-value row sum.
- The 1-norm of a matrix is just the largest absolute-value column sum.
- The 2-norm of the matrix A is the square root of the largest eigenvalue of A^T A.

21.4 The max-norm on R^{n×n}

Theorem 21.7. For any A ∈ R^{n×n}, the subordinate matrix norm associated with || · ||_∞ on R^n can be computed by

    ||A||_∞ = max_{i=1,...,n} Σ_{j=1}^{n} |a_ij|.

A similar result holds for the 1-norm, the proof of which is left as an exercise.

Theorem 21.8.

    ||A||_1 = max_{j=1,...,n} Σ_{i=1}^{n} |a_ij|.      (21.2)
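These row-sum and column-sum formulas are easy to check numerically on a made-up 3 × 3 matrix, along with a brute-force comparison against Definition 21.6:

```python
import numpy as np

A = np.array([[1.0, -7, 2],
              [3, 4, -5],
              [2, 0, 6]])
print(np.linalg.norm(A, np.inf))   # 12.0 : largest absolute row sum (row 2)
print(np.linalg.norm(A, 1))        # 13.0 : largest absolute column sum (column 3)

# the definition max ||Av||/||v|| never exceeds the row-sum formula
rng = np.random.default_rng(0)
ratios = [np.linalg.norm(A @ v, np.inf) / np.linalg.norm(v, np.inf)
          for v in rng.standard_normal((1000, 3))]
print(max(ratios) <= 12.0 + 1e-9)  # True
```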
21.5 Exercises

Exercise 21.1. Show that || · ||_1 and || · ||_∞ are norms on R^n.

Exercise 21.2. Show that || · ||_2 on R^n satisfies conditions (1)-(3) in Definition 20.5.

Exercise 21.3. Show that, for any subordinate matrix norm on R^{n×n}, the norm of the identity matrix is 1.

Exercise 21.4. Prove Theorem 21.8. Hint: Suppose that

    Σ_{i=1}^{n} |a_ij| ≤ C,   j = 1, 2, . . . , n;

show that for any vector x ∈ R^n,

    Σ_{i=1}^{n} |(Ax)_i| ≤ C ||x||_1.

Now find a vector x such that Σ_{i=1}^{n} |(Ax)_i| = C ||x||_1. Deduce (21.2).
Lecture 22 Computing ‖A‖_2 46 Thu 15/11/12
22 Computing ‖A‖_2
We saw in Lecture 21 that it is easy to compute the 1- and ∞-norms of a given square matrix, A. As we shall see today, computing the 2-norm is a little harder. However, later we'll need estimates not just for ‖A‖, but also for ‖A^{-1}‖. And, unlike the 1- and ∞-norms, we can estimate ‖A^{-1}‖_2 without explicitly forming A^{-1}.
22.1 Eigenvalues
We begin by recalling some important facts about eigenvalues and eigenvectors.
Definition 22.1. Let A ∈ R^{n×n}. We call λ ∈ C an eigenvalue of A if there is a non-zero vector x ∈ C^n such that Ax = λx. We call any such x an eigenvector associated with λ.
Some properties of eigenvalues:
(i) If A is a real symmetric matrix (i.e., A = A^T), its eigenvalues and eigenvectors are all real-valued.
(ii) If λ is an eigenvalue of a nonsingular matrix A, then 1/λ is an eigenvalue of A^{-1}.
(iii) If x is an eigenvector associated with the eigenvalue λ, then so too is αx for any non-zero scalar α.
(iv) An eigenvector may be normalised so that ‖x‖_2^2 = x^T x = 1.
(v) There are n eigenvalues λ_1, λ_2, ..., λ_n associated with the real symmetric matrix A. Let x^(1), x^(2), ..., x^(n) be the associated normalised eigenvectors. Then the eigenvectors are linearly independent and so form a basis for R^n. That is, any vector v ∈ R^n can be written as a linear combination:

    v = Σ_{i=1}^n α_i x^(i).

(vi) Furthermore, these eigenvectors are orthogonal and, being normalised, orthonormal:

    (x^(i))^T x^(j) = 1 if i = j, and 0 if i ≠ j.
Lemma 22.2. For any matrix A, the eigenvalues of A^T A are real and non-negative.
Proof:
Part of the above proof involved showing that, if (A^T A)x = λx, then

    λ = ‖Ax‖_2^2 / ‖x‖_2^2.
This at the very least tells us that

    ‖A‖_2 := max_{x ∈ R^n_⋆} ‖Ax‖_2/‖x‖_2 ≥ max_{i=1,...,n} √λ_i.

With a bit more work, we can show that if λ_1 ≤ λ_2 ≤ ... ≤ λ_n are the eigenvalues of B = A^T A, then

    ‖A‖_2 = max_{i=1,...,n} √λ_i = √λ_n.
22.2 Singular values
The singular values of a matrix A are the square roots of the eigenvalues of A^T A. They play a very important role in matrix analysis and in areas of applied linear algebra, such as image and text processing. Our interest here is in their relationship to ‖A‖_2.
Theorem 22.3. Let A ∈ R^{n×n}, and let λ_1 ≤ λ_2 ≤ ... ≤ λ_n be the eigenvalues of B = A^T A. Then

    ‖A‖_2 = max_{i=1,...,n} √λ_i = √λ_n.
Proof:
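Theorem 22.3 can be checked numerically: form B = A^T A, take its largest eigenvalue, and compare the square root with a library-computed 2-norm. A small sketch in Python with NumPy (the test matrix is arbitrary):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])

# Lemma 22.2: the eigenvalues of B = A^T A are real and non-negative.
B = A.T @ A
lams = np.linalg.eigvalsh(B)   # B is symmetric; returned in ascending order
assert np.all(lams >= -1e-12)

# Theorem 22.3: ||A||_2 = sqrt(lambda_n), the largest eigenvalue of B.
two_norm = np.sqrt(lams[-1])
assert np.isclose(two_norm, np.linalg.norm(A, 2))
```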
22.3 Exercises
Exercise 22.1. For any vector x ∈ R^n, show that ‖x‖_∞ ≤ ‖x‖_2 and ‖x‖_2^2 ≤ ‖x‖_1 ‖x‖_∞. For each of these inequalities, give an example for which the equality holds. Deduce that ‖x‖_∞ ≤ ‖x‖_2 ≤ ‖x‖_1.
Exercise 22.2. Show that if x ∈ R^n, then ‖x‖_1 ≤ n ‖x‖_∞ and that ‖x‖_2 ≤ √n ‖x‖_∞.
Lecture 23 Condition Numbers 47 Tue 20/11/12
23 Condition Numbers
23.1 Consistency of matrix norms
It should be clear from (21.1) that, if ‖·‖ is a subordinate matrix norm, then for any u ∈ R^n,

    ‖Au‖ ≤ ‖A‖ ‖u‖.

This is an important result: we'll need it later. There is an analogous statement for the product of two matrices:
Definition 23.1. A matrix norm is consistent if

    ‖AB‖ ≤ ‖A‖ ‖B‖, for all A, B ∈ R^{n×n}.

Theorem 23.2. Any subordinate matrix norm is consistent.
The proof is left to Exercise 23.1. You should note
that there are matrix norms which are not consistent.
See Exercise 23.2.
23.2 A short note on computer representation of numbers
This course is concerned with the analysis of numerical methods for solving problems. It is implicit that these methods are implemented on a computer. However, computers are finite machines: there are limits on the amount of information they can store. In particular, there are limits on the number of digits that can be stored for each number, and on the size of the numbers that can be stored. Because of this, just working with a computer introduces an error.
Modern computers don't store numbers in decimal (base 10), but as binary (base 2) floating point numbers of the form ±a × 2^{±b}. Most often 8 bytes (64 bits, or binary digits) are used to store the four components of this number: the two signs, a (the mantissa) and b (the exponent). One bit is used for each of the signs, 52 for a, and 11 for b. This is called double precision. Note that a then has roughly 16 decimal digits.
(Some older computer systems use single precision, where a has 23 bits (giving 8 decimal digits) and b has 7; so too do many new GPU-based systems.)
When we try to store a real number x on a computer, we actually store the nearest floating-point number. That is, we end up storing x + δx, where |δx| is the round-off error and |δx|/|x| is the relative error.
Since this is not a course on computer architecture, we'll simplify a little and just take it that single and double precision systems lead to relative errors of about 10^{-8} and 10^{-16} respectively.
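These relative errors can be observed directly. The sketch below, in Python with NumPy, reads off the machine epsilon of the single and double precision formats and checks the round-off made when storing x = 0.1, which has no exact binary representation:

```python
import numpy as np

# Machine epsilon: the gap between 1.0 and the next representable number.
eps_double = np.finfo(np.float64).eps   # about 2.2e-16
eps_single = np.finfo(np.float32).eps   # about 1.2e-7

# Storing x = 0.1 in single precision: we actually store x + dx.
x = 0.1
stored = np.float32(x)
rel_err = abs(float(stored) - x) / x    # |dx| / |x|

# The relative error is bounded by the single-precision epsilon.
assert rel_err < eps_single
```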
23.3 Condition Numbers
(Please refer to pp. 68-70 of [1] for a thorough development of the concepts of local condition number and relative local condition number.)
Definition 23.3. The condition number of a nonsingular matrix A is

    κ(A) = ‖A‖ ‖A^{-1}‖.

If κ(A) ≫ 1 then we say A is ill-conditioned.
We now come to the most important theorem of this section. It tells us how the condition number of a matrix is related to (numerical) error.
Theorem 23.4. Suppose that A ∈ R^{n×n} is nonsingular and that b, x ∈ R^n are non-zero. If Ax = b and A(x + δx) = b + δb, then

    ‖δx‖/‖x‖ ≤ κ(A) ‖δb‖/‖b‖.
Theorem 23.4 means that, if we are solving the system
Ax = b but because of (e.g.) round-o error, there is
an error in the right-hand side, then the relative error in
the solution is bounded by the condition number of A
multiplied by the relative error in the right-hand side.
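The bound in Theorem 23.4 can be observed numerically. A sketch in Python with NumPy, using the matrix from Example 23.5 and an arbitrary small perturbation δb of the right-hand side:

```python
import numpy as np

A = np.array([[10.0, 12.0],
              [0.08, 0.1]])
b = np.array([1.0, 1.0])
db = 1e-6 * np.array([1.0, -1.0])   # an arbitrary small perturbation

# Solve Ax = b and the perturbed system A(x + dx) = b + db.
x = np.linalg.solve(A, b)
dx = np.linalg.solve(A, b + db) - x

# Theorem 23.4: ||dx||/||x|| <= kappa(A) * ||db||/||b||  (infinity-norm here).
kappa = np.linalg.cond(A, np.inf)
lhs = np.linalg.norm(dx, np.inf) / np.linalg.norm(x, np.inf)
rhs = kappa * np.linalg.norm(db, np.inf) / np.linalg.norm(b, np.inf)
assert lhs <= rhs
```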
Example 23.5. Suppose we are using a computer to solve Ax = b where

    A = [ 10    12  ]        b = [ 1 ]
        [ 0.08  0.1 ]  and       [ 1 ].

But, due to round-off error, the right-hand side has a relative error (in the ∞-norm) of 10^{-6}. Then
For every matrix norm we get a different condition number. Consider the following example:
Example 23.6. Let A be the n × n matrix

    A = [ 1 0 0 ... 0 ]
        [ 1 1 0 ... 0 ]
        [ 1 0 1 ... 0 ]
        [ .       .   ]
        [ 1 0 0 ... 1 ].

What are κ_1(A) and κ_∞(A)?
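For a concrete instance, we can compute both condition numbers for the n = 5 case of this matrix. A sketch in Python with NumPy; the values it produces (κ_1 = n^2 = 25 but κ_∞ = 4) illustrate how strongly the answer can depend on the choice of norm:

```python
import numpy as np

n = 5
# Ones on the diagonal and down the first column, zeros elsewhere.
A = np.eye(n)
A[:, 0] = 1.0

kappa1 = np.linalg.cond(A, 1)          # ||A||_1 * ||A^{-1}||_1
kappa_inf = np.linalg.cond(A, np.inf)  # ||A||_inf * ||A^{-1}||_inf

# kappa_1 grows like n^2, while kappa_inf stays at 4 for all n >= 2.
assert np.isclose(kappa1, n * n)
assert np.isclose(kappa_inf, 4.0)
```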
23.4 Estimating κ_2
In the previous example we computed κ(A) by finding the inverse of A. In general, that is not practical. However, we can estimate the condition number of a matrix. In particular, this is usually possible for κ_2(A). Recall that Theorem 22.3 tells us that ‖A‖_2 = √λ_n, where λ_n is the largest eigenvalue of B = A^T A.
Furthermore, we can show that

    κ_2(A) = (λ_n / λ_1)^{1/2},

where λ_1 and λ_n are, respectively, the smallest and largest eigenvalues of B = A^T A.
Hint: We first show that A^T A and AA^T are similar, i.e., they have the same eigenvalues. This can be done by noting that, if A^T A x = λx, then (AA^T)Ax = A(A^T A)x = λAx. That is, if λ is an eigenvalue of A^T A, with corresponding eigenvector x, then λ is also an eigenvalue of AA^T with corresponding eigenvector Ax. Now use this, and the fact that for any nonsingular matrix X, (X^T)^{-1} = (X^{-1})^T, to prove the result.
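The point of this formula is that κ_2(A) can be found from the eigenvalues of B = A^T A alone, without ever forming A^{-1}. A sketch in Python with NumPy (the test matrix is arbitrary):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])

# Eigenvalues of B = A^T A, in ascending order (B is symmetric).
lams = np.linalg.eigvalsh(A.T @ A)

# kappa_2(A) = sqrt(lambda_n / lambda_1) -- no inverse of A needed.
kappa2 = np.sqrt(lams[-1] / lams[0])

# Compare with the library's 2-norm condition number.
assert np.isclose(kappa2, np.linalg.cond(A, 2))
```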
23.5 Exercises
Exercise 23.1. Prove that all subordinate matrix norms
are consistent.
Exercise 23.2. One might think it intuitive to define the max norm of a matrix as follows:

    ‖A‖_∞ = max_{i,j} |a_{ij}|.

Show that this is indeed a norm on R^{n×n}. Show, however, that it is not consistent.
Exercise 23.3. Let A be the matrix

    A = [ 2 0 0 0 0 ]
        [ 1 2 0 0 0 ]
        [ 1 0 2 0 0 ]
        [ 1 0 0 2 0 ]
        [ 1 0 0 0 2 ].

What are κ_1(A) and κ_∞(A)? (Hint: You'll need to find A^{-1}. Recall that the inverse of a lower triangular matrix is itself lower triangular.)
Note: to do this using Maple:
> with(linalg):
> n:=5;
> f:= (i,j) -> piecewise( (i=j),2, (j=1),1);
> A := matrix(5,5, f);
> evalf(cond(A,1)); evalf(cond(A,2));
> evalf(cond(A,infinity));
Exercise 23.4. Let A be the matrix

    A = [ 0.1  0    0   ]
        [ 10   0.1  10  ]
        [ 0    0    0.1 ].

Compute κ_∞(A). Suppose we wish to solve the system of equations Ax = b on a single precision computer system (i.e., the relative error in any stored number is approximately 10^{-8}). Give an upper bound on the relative error in the computed solution x.
Lecture 24 Gerschgorins theorems 49 Thu 22/11/12
24 Gerschgorin's theorems
The goal of this, our final class of the year, is to learn a simple but very useful approach to estimating the eigenvalues of matrices. This can be used, for example, when computing ‖A‖_2 and κ_2(A).
24.1 Gerschgorin's First Theorem
(See Section 5.4 of [1].)
Two theorems about the eigenvalues of a matrix are very useful if we just want a quick estimate of their magnitudes.
Definition 24.1. Given a matrix A ∈ R^{n×n}, the Gerschgorin^10 discs D_i are the discs in the complex plane with centre a_{ii} and radius

    r_i = Σ_{j=1, j≠i}^n |a_{ij}|.

So D_i = { z ∈ C : |a_{ii} − z| ≤ r_i }.
Theorem 24.2 (Gerschgorin's First Theorem). All the eigenvalues of A are contained in the union of the Gerschgorin discs.
Proof.
The proof was given in class. Here is an outline. Let λ be an eigenvalue of A, so Av = λv for the corresponding eigenvector v.
Suppose that v_i is the entry of v with largest absolute value; that is, |v_i| = ‖v‖_∞. Then

    (Av)_i = Σ_{j=1}^n a_{ij} v_j = λ v_i.

This can be rearranged as

    (λ − a_{ii}) v_i = Σ_{j≠i} a_{ij} v_j.

By the triangle inequality,

    |λ − a_{ii}| |v_i| = | Σ_{j≠i} a_{ij} v_j | ≤ Σ_{j≠i} |a_{ij}| |v_j| ≤ Σ_{j≠i} |a_{ij}| |v_i|,

since |v_j| ≤ |v_i| for all j. Dividing by |v_i| gives

    |λ − a_{ii}| ≤ Σ_{j≠i} |a_{ij}| = r_i,

as required.
^10 Semyon Aranovich Gerschgorin, 1901-1933, Belarus.
Note that we made no assumption on the symmetry
of A. However, if A is symmetric, then its eigenvalues
are real and so the theorem can be simplied.
Example 24.3. Let

    A = [ 4 2 1 ]
        [ 2 3 0 ]
        [ 1 0 2 ].
24.2 Gerschgorin's Second Theorem
Theorem 24.4 (Gerschgorin's Second Theorem). If k of the discs are disjoint from (have an empty intersection with) the others, then their union contains k eigenvalues.
Proof. We won't do the proof in class, and you are not expected to know it. Here is a sketch of it: let B(ε) be the matrix with entries

    b_{ij} = a_{ij} if i = j, and b_{ij} = ε a_{ij} if i ≠ j.

So B(1) = A, and B(0) is the diagonal matrix whose entries are the diagonal entries of A.
The eigenvalues of B(0) are its diagonal entries, and they coincide with the centres of the Gerschgorin discs of A; the Gerschgorin discs of B(0) are just these points.
The eigenvalues of B(ε) are the zeros of the characteristic polynomial det(B(ε) − λI). Since the coefficients of this polynomial depend continuously on ε, so too do the eigenvalues.
Now as ε varies from 0 to 1, the eigenvalues of B(ε) trace paths in the complex plane, and at the same time the radii of the Gerschgorin discs increase from 0 to the radii of the discs of A. If a particular eigenvalue was in a certain disc for ε = 0, the corresponding eigenvalue is in the corresponding disc for all ε.
Thus if one of the discs of A is disjoint from the others, it must contain an eigenvalue.
The same reasoning applies if k of the discs of A are disjoint from the others; their union must contain k eigenvalues.
24.3 Using Gerschgorin's theorems
Example 24.5. Locate the regions containing the eigenvalues of

    A = [ 3  1  2 ]
        [ 1 -4  0 ]
        [ 2  0  6 ].

(The actual eigenvalues are approximately 7.018, 2.130 and -4.144.)
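Gerschgorin's First Theorem is easy to check numerically: compute the disc radii from the off-diagonal row sums and verify that every eigenvalue lands in some disc. A sketch in Python with NumPy, using a sample symmetric matrix (the entries are chosen for illustration):

```python
import numpy as np

A = np.array([[3.0,  1.0, 2.0],
              [1.0, -4.0, 0.0],
              [2.0,  0.0, 6.0]])

# Gerschgorin radii: r_i = sum of |a_ij| over j != i.
radii = np.sum(np.abs(A), axis=1) - np.abs(np.diag(A))

# A is symmetric, so its eigenvalues are real.
eigs = np.linalg.eigvalsh(A)

# Theorem 24.2: every eigenvalue lies in some disc |z - a_ii| <= r_i.
for lam in eigs:
    assert any(abs(lam - A[i, i]) <= r + 1e-12
               for i, r in enumerate(radii))
```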
Example 24.6. Use Gerschgorin's theorems to find upper and lower bounds for the singular values of the matrix

    A = [ 4 1 2 ]
        [ 2 3 1 ]
        [ 1 1 4 ].
Example 24.7. Let D ∈ R^{n×n} be the matrix with the same diagonal entries as A, and zeros for all the off-diagonals. That is:

    D = [ a_{1,1}  0        ...  0            0       ]
        [ 0        a_{2,2}  ...  0            0       ]
        [ .                 .                 .       ]
        [ 0        0        ...  a_{n-1,n-1}  0       ]
        [ 0        0        ...  0            a_{n,n} ].
Some iterative schemes (e.g., Jacobi's and Gauss-Seidel's) for solving systems of equations involve successive multiplication by the matrix T = D^{-1}(A − D). Proving that these methods work often involves showing that, if λ is an eigenvalue of T, then |λ| < 1. Using Gerschgorin's theorem, we can show that this is indeed the case if A is strictly diagonally dominant.
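The Gerschgorin argument here is that every disc of T = D^{-1}(A − D) has centre 0 (its diagonal is zero) and radius Σ_{j≠i} |a_{ij}|/|a_{ii}|, which strict diagonal dominance makes less than 1. A sketch in Python with NumPy, for an arbitrarily chosen strictly diagonally dominant matrix:

```python
import numpy as np

# A strictly diagonally dominant matrix: |a_ii| > sum of |a_ij|, j != i.
A = np.array([[5.0, 1.0, 2.0],
              [1.0, 4.0, 1.0],
              [2.0, 1.0, 6.0]])

D = np.diag(np.diag(A))
T = np.linalg.solve(D, A - D)   # T = D^{-1}(A - D); diagonal of T is zero

# Gerschgorin discs of T are centred at 0 with these radii.
radii = np.sum(np.abs(T), axis=1)
assert np.all(radii < 1)        # strict diagonal dominance of A

# Hence every eigenvalue of T satisfies |lambda| < 1.
spectral_radius = max(abs(np.linalg.eigvals(T)))
assert spectral_radius < 1
```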
24.4 Exercises
Exercise 24.1. A real matrix A = {a_{i,j}} is strictly diagonally dominant if

    |a_{ii}| > Σ_{j=1, j≠i}^n |a_{ij}|, for i = 1, ..., n.

Show that all strictly diagonally dominant matrices are nonsingular.
Exercise 24.2. Let

    A = [ 4 1 0 ]
        [ 1 4 1 ]
        [ 0 1 3 ].

Use Gerschgorin's theorems to give an upper bound for κ_2(A).
Exercise 24.3. The problem with the matrix in Example 23.5 is that the magnitudes of the entries in the first and second rows of A are very different. This means that det(A) is quite small and thus that ‖A^{-1}‖_∞ is quite large. Let D = diag(A), i.e., D is a matrix of the same size as A with

    d_{ij} = a_{ij} if i = j, and d_{ij} = 0 if i ≠ j.

For this problem, what is the condition number (in the ∞-norm) of the matrix D^{-1} A?
The end...