1998 Book NumericalAnalysis
1998 Book NumericalAnalysis
Editorial Board
S. Axler F. W. Gehring K.A. Ribet
Numerical Analysis
With II Illustrations
i Springer
Rainer Kress
Institut fUr Numerische und
Angewandte Mathematik
Universităt G6ttingen
D-37083 G6ttingen
Germany
Editorial Board
S. Axler F. W. Gehring K.A. Ribet
Department of Department of Department of
Mathematics Mathematics Mathematics
San Francisco State University University of Michigan University of California
San Francisco, CA 94132 Ann Arbor, MI 48109 at Berkeley
USA USA Berkeley, CA 94720
USA
987654321
ISBN 978-1-4612-6833-8
Preface
1 Introduction 1
2 Linear Systems 5
2.1 Examples for Systems of Equations . 6
2.2 Gaussian Elimination. 11
2.3 LR Decomposition 18
2.4 QR Decomposition 19
Problems · ....... 23
8 Interpolation 151
8.1 Polynomial Interpolation. 152
8.2 Trigonometric Interpolation 161
8.3 Spline Interpolation 169
8.4 Bezier Polynomials 179
Problems . 186
References 317
Index 322
Glossary of Symbols
Norms
Miscellaneous
E element inclusion
C set inclusion
u,n union and intersection of sets
o empty set
Oem) a quantity of order m
o end of proof
1
Introduction
Xj = jh, j = 0, ... ,n + 1,
where the step size is given by h = 1/ (n + 1) with n E IN. At the internal
grid points x j, j = 1, ... , n, we replace the differential quotient in the
differential equation (2.1) by the difference quotient
AU = F(U). (2.3)
For obvious reasons, the above matrix A is called a tridiagonal matrix, and
the vector F is diagonal; i.e., the jth component of F depends only on
the jth component of u. If (2.1) is a linear differential equation, i.e., if f
depends linearly on the second variable u, then the tridiagonal system of
equations (2.3) also is linear.
The following two questions will be addressed later in the book (see
Chapter 11):
1. Can we establish existence and uniqueness of a solution to the system
of equations (2.3) for sufficiently small step size h, provided that the
boundary value problem (2.1)-(2.2) itself is uniquely solvable?
2. How large is the error between the approximate solution Uj and the
exact solution u(Xj)? Do we have convergence of the approximate
solution towards the exact solution as h -+ O?
At this point we would like only to point out that the discretization of
boundary value problems for ordinary differential equations leads to sys-
tems of equations with a large number of unknowns, since we expect that
in order to achieve a reasonably accurate approximation we need to choose
the step size h sufficiently small. 0
Obviously, for each point Xij, this difference operator has nonvanishing
weights only at the four neighboring points on the vertical and horizontal
line through Xi}' This observation also illustrates why the set of grid points
with nonvanishing weights is called the star associated with the Laplace
difference operator. Using this difference approximation leads to the system
of equations
1
h2 [4Uij -Ui+l,j -Ui-l,j -Ui,j+l-Ui,j-d = !(Xij,Uij), i,j = 1, ... ,n,
for approximate values Uij to the exact solution u( Xij). This system has to
be complemented by the boundary conditions
at the grid points on the horizontal parts of the boundary aD. In order to
write this system in matrix form we rearrange the unknowns by ordering
them row by row and setting
2.1 Examples for Systems of Equations 9
B -I
-I B -I
1 -I B -I
A = h2
-I B -I
-I B
AU = F(U), (2.6)
cp(x) -1 1
K(x, y)cp(y) dy = f(x), x E [0,1],
1
1 1 n
K(x, y)cp(y) dy ~ - L K(x, Xk)'P(Xk)
o n k=]
with equidistant grid points Xk = kin, k = 1, ... , n. If we require the
approximated equation to be satisfied only at the grid points, we arrive at
the system of linear equations
1 n
'Pj - - LK(Xj,Xk)CPk
n k=]
= f(xj), j = 1, ... ,n,
for approximate values 'Pj to the exact solution cp(Xj). As in the preced-
ing examples, we postpone the question of unique solvability of the linear
system and the convergence and error analysis (see Chapter 12). 0
Example 2.4 In this last example we will briefly touch on the method of
least squares. Consider some (physical) quantity u depending on time t and
a parameter vector a = (a], ... , an)T E IRn in terms of a known function
If m = n, this system consists of n equations for the n unknowns a], ... ,an'
However, in general, the measurements will be contaminated by errors.
Therefore, usually one will take m > n measurements and then will try to
determine a by requiring the deviations
~-o
8aj - , j = 1, ... ,n,
lead to the normal equations
j = 1, ... ,n,
2.2 Gaussian Elimination 11
At this point, the reader should be convinced of the need for effective
methods for solving large systems of linear and nonlinear equations and be
willing to be introduced to such methods in the subsequent chapters. We
also wish to note that the discretization of differential equations leads to
sparse matrices, whereas for the least squares problem and the discretiza-
tion of integral equations one is faced with full matrices.
that is,
+ annxn = Yn·
Assuming that the reader is familiar with basic linear algebra, we recall the
following various ways of saying that the matrix A is nonsingular:
1. The inverse matrix A -1 exists.
2. For each Y the linear system Ax = Y has a unique solution.
3. The homogeneous system Ax = 0 has only the trivial solution.
4. The determinant of A satisfies det A :j:. O.
5. The rows (columns) of A are linearly independent.
The very basic idea of the Gaussian elimination method is to use the first
equation to eliminate the first unknown from the last n - 1 equations, then
use the new second equation to eliminate the second unknown from the last
n - 2 equations, etc. This way, by n - 1 such eliminations the given linear
12 2. Linear Systems
Xm = f-
mm
(zm - t
k=m+l
bm,kXk), m =n - 1, n - 2, ... , 1.
(2)
an2 X2 +
with the new coefficients given by
.- a(1)
blk·- lk' k = 1, ... ,n,
Here, for the coefficients and right-hand sides of the original system we
have set aj~) := ajk and y]l) := Yj'
Proceeding in this way, the given n x n system for the unknowns Xl, ... , X n
is equivalently transformed into an (n -1) x (n -1) system for the unknowns
X2, ... , X n . Adding a multiple of one row of a matrix to another row does
not change the value of its determinant. Therefore, in the above elimina-
tion the determinant of the system remains the same (with the exception
of a possible change of its sign if the order of rows or columns is changed).
Hence, the resulting (n - 1) x (n - 1) system for X2, ... , X n again has a
nonvanishing determinant, and we can apply precisely the same procedure
to eliminate the second unknown X2 from the remaining (n - 1) x (n - 1)
system.
By repeating this process we complete the forward elimination, by which
the system of linear equations
(l)x _ y(l)
+ a nn n - n
+
+
(m) (m)
(m+l)._ (m)
a jk . - a jk -
ajm a mk
(m) j, k = m + 1, ... ,n,
amm
m = 1, ... ,n-1.
(m) (m)
ajm Ym
(m)
j = m+ 1, ... ,n,
amm
14 2. Linear Systems
The coefficients and the right-hand sides of the final triangular system are
given by
.- a(j)
b jk·- jk' k = j, ... ,n, j = 1, ... ,n,
and
· '.= y(j)
ZJ J' J' - 1 ... ,n.
- ,
The condition a~~ ;f. 0, which is necessary for performing the algorithm,
always can be achieved by a reordering of the rows or columns, since oth-
erwise the matrix A would not be nonsingular.
We would like to compress the operations of one elimination step into
the following scheme
a b
e d
where the rectangle illustrates the remaining part of the matrix and the
right-hand side for which the elimination has to be performed. Here, a
stands for the elimination element, or pivot element; the elements b in
the elimination row remain unchanged; the elements e of the elimination
column are replaced by zero (with the exception of the pivot element a)j
and the remaining elements d are changed according to the rule
be
d-+d--.
a
We note that in computer calculations, of course, the new values for the
coefficients of the matrix and the right-hand sides can be stored in the
locations held by the old values.
More explicitly, the entire Gaussian elimination can be written in the
following algorithmic form.
Algorithm 2.5 (Gaussian elimination)
1. Forward elimination:
For m = 1, ... ,n - 1 do
for j = m + 1, ... ,n do
ajmYm
Yj:= Yj - - - -
a mm
2.2 Gaussian Elimination 15
2. Backward substitution:
Xm
Xm := - -
a mm
The solutions can be found from the triangular system by arbitrarily choos-
ing Xr+l, ... , X n and then recursively determining X r , ... , Xo. This way we
obtain the (n - r)-dimensional solution manifold.
In order to control the influence of roundoff errors we want to keep the
quotient a;:l /a~,l" small; i.e., we want to have a large pivot element a~,l".
Therefore, instead of only requiring a~,l" oj:. 0, in practice, either complete
pivoting or partial row or column pivoting is employed. For complete piv-
oting, both the rows and the columns are reordered such that a~~ has
maximal absolute value in the (n - m + 1) x (n - m + 1) matrix remaining
for the mth forward elimination step. In order to minimize the additional
computational cost caused by pivoting, for row (or column) pivoting the
rows (or columns) are reordered such that a~,l" has maximal absolute value
in the elimination column (or row), i.e., in the mth column (or row). Of
course, in the actual implementation of the Gaussian elimination algorithm
the reordering of rows and columns need not be done explicitly. Instead,
the interchange may be done only implicitly by leaving the pivot element
at its original location and keeping track of the interchange of rows and
columns through the associated permutation matrix.
The following example illustrates that partial pivoting does not always
prevent loss of accuracy in the numerical computations.
Xl + 200X2 = 100
Xl + X2 = 1
with the exact solution Xl = 100/199 = 0.502 ... , X2 = 99/199 = 0.497 ....
For the following computations we use two-decimal-digit floating-point
16 2. Linear Systems
arithmetic. Column pivoting leads to all as pivot element, and the elimi-
nation yields
Xl + 200X2 = 100
- 200X2 = -99,
since 199 = 200 in two-digit floating-point representation. From the second
equation we then have X2 = 0.50 (0.495 = 0.50 in two decimal digits), and
from the first equation it finally follows that Xl = O.
However, if by complete pivoting we choose al2 as pivot element, the
elimination leads to
Xl + 200X2 = 100
= 0.5
(0.995 = 1.00 in two decimal digits), and from this we get the solution
Xl = 0.5, X2 = 0.5 (0.4975 = 0.50 in two decimal digits), which is correct
to two decimal digits. 0
an = t
k=l
k = n(n 2+ 1) ,
because f31,r = O. Adding ran and f3n,r we obtain the following result.
multiplications.
where ei is the ith column of the identity matrix. Then the n solutions
will provide the columns of the inverse matrix A-I. We would
Xl, ... , X n
like to stress that one does not want to solve a system Ax = y by first
18 2. Linear Systems
computing A-I and then evaluating x = A-I y, since this generally leads
to considerably higher computational costs.
The Gauss-Jordan method is an elimination algorithm that in each step
eliminates the unknown both above and below the diagonal. The com-
plete elimination procedure transforms the system equivalently into a di-
agonal system. The multiplication count shows a computational cost of
order n 3 /2 + O(n 2 ), i.e., an increase of 50 percent over Gaussian elimina-
tion. Hence, the Gauss-Jordan method is rarely used in applications. For
details we refer to [26, 27].
2.3 LR Decomposition
In the sequel we will indicate how Gaussian elimination provides an LR
decomposition (or factorization) of a given matrix.
Definition 2.8 A factorization of a matrix A into a product
A=LR
of a lower (left) triangular matrix L and an upper (right) triangular matrix
R is called an LR decomposition of A.
A matrix A = (ajk) is called lower triangular or left triangular if ajk = 0
for j < k; it is called upper triangular or right triangular if ajk = 0 for
j > k. The product of two lower (upper) triangular matrices again is lower
(upper) triangular, lower (upper) triangular matrices with nonvanishing
diagonal elements are nonsingular, and the inverse matrix of a lower (upper)
triangular matrix again is lower (upper) triangular (see Problem 2.14).
Theorem 2.9 For a nonsingular matrix A, Gaussian elimination (without
reordering rows and columns) yields an LR decomposition.
Proof. In the first elimination step we multiply the first equation by ajI/all
and subtract the result from the jth equation; i.e., the matrix Al = A is
multiplied from the left by the lower triangular matrix
1
a21
1
all
L1 =
anI
1
all
The resulting matrix A 2 = L 1 A 1 is of the form
A = 2 (al~ An - : ) '
2.4 QR Decomposition 19
2.4 QR Decomposition
We conclude this chapter by describing a second elimination method for
linear systems, which leads to a QR decomposition.
20 2. Linear Systems
A=QR
of a unitary matrix Q and an upper (right) triangular matrix R is called a
QR decomposition of A.
QQ* = Q*Q = I.
The product of two unitary matrices again is unitary.
In terms of the columns of the matrices A = (al,"" an) and
Q = (ql, ... , qn) and the coefficients of R = (rjk), the QR decomposition
A = QR means that
k
ak = Lrikqi, k = 1, ... ,n. (2.7)
i=l
Hence, the vectors aI, ... , an of <en have to be orthonormalized from the
left to the right into an orthonormal basis ql, ... , qn' This, for example, can
be achieved by the Gram-Schmidt orthonormalization procedure (see The-
orem 3.18). However, since the Gram-Schmidt orthonormalization tends to
be numerically unstable, we describe the QR decomposition by Householder
matrices.
Definition 2.11 A matrix H of the form
H = 1- 2vv*,
H* = I* - 2(vv*)* = 1- 2vv* = H
and
x=vv*x+y
2.4 QR Decomposition 21
Hx =x - 2vv*x = -vv*x + y;
i.e., H x has the opposite component -vv*x in the v-direction and the
same component y orthogonal to v. Because of this property, Householder
matrices are also called elementary reflection matrices.
We now describe the elimination of the unknown Xl by multiplying A
from the left by a Householder matrix HI = I - 2VI vi. By al we denote
the first column of A and by ek the kth column of the identity matrix; in
particular, el = (1,0, ... ,0)*. Then the first column bl of the product HIA
is given by
bl = HIAel = Hlal = al - 2vlvial.
We would like to achieve that bl = O"el with 0" -I- O. Hence, except for the
first row, VI must be a multiple of al. Therefore, we try
(2.8)
with
all -I- 0,
all = o.
Then we have
and
u *l al = a *l al =f II~
all vaial = 2"1*
u l UI·
From the two possible signs in (2.8) the positive sign yields the numerically
more stable variant.
The same procedure is now repeated for the remaining (n - 1) x (n - 1)
matrix. The corresponding (n - 1) x (n - 1) Householder matrix has to be
completed as an n x n Householder matrix. In general, if A k is an n x n
matrix of the form
1
H k = ( 0k
Hn - l ·· ·HIA =R
with Householder matrices HI, ... ,Hn - I and an upper triangular matrix
R. From this we obtain
A=QR
Problems
2.1 Solve the linear system
X3 = 10
by Gaussian elimination.
2.2 Write a computer program for the solution of a system of linear equations
by Gaussian elimination with partial pivoting and test it for various examples.
You will need this code as part of other numerical algorithms later in this book.
det (~ ~) = detAdet(D - 1
CA- B).
t
k=l
k = ~ n(n + 1) and
~
L...Jk
k=l
2
= 61 n(n+ 1)(2n+ 1)
2.1 Prove the analogue of Theorem 2.7 for the number of additions in Gaussian
elimination.
A=
bn - 1 an-l Cn-l
bn an
and lad> led > 0 and lanl > Ibnl > 0 are nonsingular.
2.9 Show that Gaussian elimination for tridiagonal n x n matrices requires 4n
multiplications.
24 2. Linear Systems
2.10 Show that the diagonal elements of a positive definite matrix are positive.
2xI + X2 2X3
{(1m A)-I Re A + (Re A)-limA} Imx = (1m A)-11m y - (Re A)-l Rey.
2.20 Use QR decomposition to prove Hadamard's inequality
n
I det AI 2 ~ II L lajkj2
j=1 k=1
n
Ilxllt := L IXjl, Ilxll<Xl:= J=l,
. max... ,n
IXjl
j=l
for x = (Xl, ... , x n ) T. It is an easy exercise for the reader to verify that
the norm axioms (N1)-(N3) are satisfied. The triangle inequality for the
norms 11·111 and II· 11<Xl follows immediately from the triangle inequality in
IR or ~. The verification of the triangle inequality for the norm II . 112 is
postponed until Section 3.2. 0
The norms in Example 3.2 are denoted the £1, £2, and £<Xl norm, respec-
tively. For obvious reasons the £2 norm is also called the Euclidean norm,
r,
and the £<Xl norm is called the maximum norm. The three norms are special
cases of the £p norm
defined for any real number p ~ 1. The £<Xl norm is the limiting case of
(3.1) as p -t 00 (see Problem 3.1).
Remark 3.3 For each norm, the second triangle inequality
For two elements x, y in a normed space Ilx - yll is called the distance
between x and y.
lim
n-+oo
IIx n - xii = 0,
i.e., if for every c > 0 there exists an integer N(c) such that IIxn - xII < c
for all n ~ N(c). The element x is called the limit of the sequence (x n ),
and we write
lim X n = x
n-+oo
or
x n -t x, n -t 00.
IIx - yll = IIx - Xn +Xn - yll ~ IIx - xnll + IIxn - yll -t 0, n -t 00.
Definition 3.6 Two norms on a linear space are called equivalent if they
have the same convergent sequences.
Theorem 3.7 Two norms 11·lla and 11·11 b on a linear space X are equivalent
if and only if there exist positive numbers c and C such that
for all x EX. The limits with respect to the two norms coincide.
Proof. Provided that the conditions are satisfied, from IIx n - xll a -t 0,
n -t 00, it follows that IIx n - xllb -t 0, n -t 00, and vice versa.
Conversely, let the two norms be equivalent and assume that there is
no C > 0 such that IIxlib ~ Cllxll a for all x E X. Then there exists a
sequence (x n ) with IIxnll a = 1 and IIxnllb ~ n 2 . Now, the sequence (Yn)
with Yn := xn/n converges to zero with respect to II . lIa, whereas with
respect to II . lib it is divergent because of IIYnilb ~ n. 0
As in Example 3.2,
IIxli oo := . max
J=l, ... ,n
10jI (3.2)
Assume that there is no c > such that cllxll oo :S IIxll for all x EX.
Then there exists a sequence (xv) with Ilxvll = 1 such that IIx v ll oo 2 v.
Consider the sequence (Yv) with Yv := x v /llx v ll oo and write
n
Yv =L 0jvUj'
j=1
e
and also IIYv(l) - yll :S CIIYv(i) - ylloo ~ 0, ~ 00. But on the other hand
we have IIYvll = l/lIx v ll oo ~ 0, v ~ 00. Therefore, y = 0, and consequently
IIYv(l)lIoo ~ 0, e ~ 00, which contradicts IIYvlloo = 1 for all v. 0
Proof. Let Ul, ... , Un be a basis of X and let (xv) be a bounded sequence.
Then writing
n
Xv = Lajvuj
j=l
and using the norm (3.2), as in the proof of Theorem 3.8 we deduce that
each of the sequences (ajv), j = 1, ... , n, is bounded in ceo Hence, by
the Bolzano-Weierstrass theorem we can select convergent subsequences
aj,v(l) -+ aj, e-+ 00, for each j = 1, ... ,n. This now implies
n
xv(l) -+ L ajuj E X, e-+ 00,
j=l
for x = (XI, ... ,Xn)T and y = (Yl' ... ,Yn)T. (Note that (x, y) = y*x.)
Theorem 3.14 For a scalar product we have the Cauchy-Schwarz inequal-
ity
l(x,y)1 2 ::; (x,x)(y,y)
for all x, y EX, with equality if and only if x and yare linearly dependent.
Proof. The inequality is trivial for x = O. For x :f. 0 it follows from
(ax + (3y, ax + (3y) = laI 2 (x, x) + 2 Re{a,8(x, y)} + 1{31 2 (y, y)
= (x,x)(y,y) -1(x,y)jZ,
where we have set a = -(X,X)-1/2(X,y) and {3 = (X,X)I/2.Since (".) is
positive definite, this expression is nonnegative, and it is equal to zero if
and only if ax + {3y = O. In the latter case x and yare linearly dependent
because (3 :f. o. 0
Proof. From
n
L(Xkqk =0
k=l
for the orthonormal system {ql' ... , qn}, by taking the scalar product with
qj, we immediately have that (Xj = 0 for j = 1, ... , n. 0
Theorem 3.18 Let {uo, UI, ... } be a finite or countable number of linearly
independent elements of a pre-Hilbert space. Then there exists a uniquely
determined orthogonal system {qo, ql , ... } of the form
span{ uo,· .. ,Un-I, un} = span{ qo, ... , qn-l, un} = span{qo, ... ,qn-l, qn}.
Hence, the existence of qn is established.
32 3. Basic Functional Analysis
Assume that {qo, ql, ... } and {qO' ql, ... } are two orthogonal sets of el-
ements with the required properties. Then clearly qo = Uo = qo. Assume
that we have shown that equality holds up to qn-l = qn-l. Then, since
qn - qn E span{uo, ... , un-d, we can represent qn - iin as a linear combi-
nation of ql, ... , qn-l; Le.,
n-l
qn - qn = L O:kqk·
k=O
=0,
whence qn = iin· o
sincexn-x+xo-txo,n-too. 0
max L
IIAlh = k=l, lajkl, (3.5)
... ,n.
)=1
n
IIAlloo = . max
)=l,... ,n L lajkl, (3.6)
k=l
(3.7)
In this case the norms are also called matrix norms. (Note that in (3.5)-
(3.7) both the domain and the range are given the same norm.)
Proof. By Theorem 3.8 it suffices to prove boundedness of A with respect
to one norm. For II . lit we can estimate
n n n n
IIAIIt:S k=l,
max L lajkl·
... ,n.
(3.8)
)=1
and choose Z E IR
n
with Zi = 1 and Zk = 0 for k =I: i. Then IIzII1 = 1 and
3.4 Matrix Norms 35
Hence n
IIAlh = sup
I/xl/,=1
IIAxlh ~ IIAzlh = max L
k=l, ... ,n j=1
jajkl, (3.9)
IIAlioo ~ 1==1,
. max... ,n
L lajkl· (3.10)
k=1
Hence
n
IIAlloo = sup
I/xl/ oo =l
IIAxiloo ~ IIAzlioo = . max L
J=1, ... ,n k=1
lajkl, (3.11)
Therefore,
n
IIAII~ ~ L lajkl 2 ,
j,k=l
and (3.7) is proven. In this inequality equality does not hold, in general, as
can be seen by considering the identity matrix. 0
Ax = AX.
with some (n -1) x (n-1) matrix An-I. By the induction assumption there
exists a unitary (n - 1) x (n - 1) matrix Qn-l such that Q~_IAn-IQn-1
is upper triangular. Then
are positive.
38 3. Basic Functional Analysis
and have
and
1
For the second part, by Theorem 3.27 there exists a unitary matrix Q
such that
bu bl2 b13 . . bIn
b 22 b 23 . . b 2n
B = Q* AQ = b 33 . . b3n
(
bnn
is upper triangular. Because of det(AI - A) = det(AI - B), the eigenvalues
of A are given by Aj = bjj , j = 1, ... , n. We set
b:= max
l~j~k~n
Ib'kl
J
and
6 := min (1, (n ~ l)b)
and define the diagonal matrix
D := diag(l, 6, 6 2 , ... ,6 n - l
)
)
bll 6b l2 62 bl3 6n - I bln
b 22 6b 23 6n - 2b2n
C= b33 6 n - 3 b 3n
(
40 3. Basic Functional Analysis
3.5 Completeness
Definition 3.33 A sequence (x n ) of elements in a normed space X is
called a Cauchy sequence if for every c > 0 there exists an integer N(c)
such that
Ilxn - xmll < c
for all n, m ~ N(c), i.e., if limn,m-Too IIx n - xmll = O.
Theorem 3.34 Every convergent sequence is a Cauchy sequence.
Proof. Let X n -+ x, n -+ 00. Then, for c > 0 there exists N(c) E IN such
that IIx n - xii < c/2 for all n ~ N(c). Now the triangle inequality yields
The fact that the converse of Theorem 3.34 is not true in general gives
rise to the following definition.
Definition 3.35 A subset U of a normed space X is called complete if
every Cauchy sequence of elements in U converges to an element in U. A
normed space is called a Banach space if it is complete. A pre-Hilbert space
is called a Hilbert space if it is complete.
The subset of rational numbers is not complete in JR. In order to give
further examples, we introduce some infinite-dimensional normed spaces.
The set C[a, b) of continuous functions f : [a, b) -+ JR equipped with
pointwise addition and scalar multiplication,
(f + g)(x) := f(x) + g(x), (af)(x):= af(x),
obviously is a linear space. Since the monomials x f-t x n , n = 0, 1, ... , are
linearly independent (see Theorem 8.2), C[a, b) has infinite dimension.
3.5 Completeness 41
Example 3.36 The linear space C[a, b] lurnished with the maximum norm
11/1100 := zE[a,b]
max l!(x)1
is a Banach space.
Proof. The norm axioms (N1)-(N3) are trivially satisfied. The triangle in-
equality follows from
for some Xo E [a, b]. Since the condition IIfn - 11100 < € is equivalent to
I/n(x) - f(x)1 < € for all x E [a, b], convergence of a sequence of continuous
functions in the maximum norm is equivalent to uniform convergence on
[a, b]. Since the Cauchy criterion is sufficient for uniform convergence of a
sequence of continuous functions to a continuous limit function, the space
C[a, b] is complete with respect to the maximum norm. 0
Example 3.37 The linear space C[a, b] equipped with the L 1 norm
II/lh := l b
I/(x)1 dx
is not complete.
Proof. The norm axioms are trivially satisfied. Without loss of generality
we take [a, b] = [0,2] and choose
o :S x :S 1,
fn(x) := {
1, 1 :S x :S 2.
Il/n - Imlh =
Jto (x
n
- x m ) dx:S _1_ -t 0, n -t 00,
n +1
and therefore Un) is a Cauchy sequence. Now we assume that Un) con-
verges with respect to the L 1 norm to a continuous function f; Le.,
11
Then
11
o
If(x)1 dx :S 11
0
I/(x) - xnl dx +
0
x n dx :S II! - Inlh + --1
1
n+
-t 0
42 3. Basic Functional Analysis
for n -t 00, whence f(x) = 0 follows for 0 ::; x ::; 1. Furthermore, we have
1
2If (x) - 11 dx = 1 2If (X) - fn(x)1 dx ::; Ilf - fnll) -t 0, n -t 00.
This implies that f(x) = 1 for 1 ::; x ::; 2, and we have a contradiction,
since f is continuous.
However, we note that the space £1 [a, b] of measurable and Lebesgue
integrable real-valued functions is complete with respect to the £1 norm
(see [5, 51, 59]). 0
Example 3.38 The linear space C[a, b] equipped with the £2 norm
b ) 1/2
IIfI12:= ( llf(xWdx
is not complete.
and recall from Theorem 3.8 that there exists C > 0 such that
for all v, J.l E IN. Hence for j = 1, ... , n the (ajv) are Cauchy sequences
in ceo Therefore, there exist al, .. . ,an such that ajv -t aj, v -t 00, for
j = 1, ... ,n, since the Cauchy criterion is sufficient for convergence in ceo
Then we have convergence,
n
Xv -t x:= Lajuj E X, v -t 00,
j=1
Remark 3.40 Complete sets are closed, and each closed subset of a com-
plete subset is complete.
Ax=x.
Theorem 3.44 Each contraction operator has at most one fixed point.
Proof. Assume that x and yare two different fixed points of the contraction
operator A. Then
Then we have
Hence, for m > n, by the triangle inequality and the geometric series it
follows that
(3.12)
qn
::::: (qn + qn+l + ... + qm-l)llxl - xoll : : : l Ilxl - xoll.
-q
Since qn -+ 0, n -+ 00, this implies that (x n ) is a Cauchy sequence, and
therefore because U is complete there exists an element x E U such that
X n -+ x, n -+ 00. Finally, the continuity of A from Remark 3.42 yields
i.e., x is a fixed point of A. That this fixed point is unique we have already
settled by Theorem 3.44. 0
1
f(x):= X+--
l+x
as a consequence of
x - Bx =z
has a unique solution x EX. The successive approximations
IIBII
Ilxn - xII ~ 1 -IIBII IIx n - xn-III
for all n E :IN. Furthermore, the inverse operator (I - B)-I is bounded by
-1 1
11([ - B) II ~ 1 _ IIBII
Ilxnll :S t
k=O
IIBkzl1 :S t
k=O
IIBllkllzll :S 1 ~~ik" '
and therefore, since X n -t (I - B)-l z, n -t 00, it follows that
which is valid for all u, v E U. From this, sufficiency of the condition (3.13)
is obvious, since U is a linear subspace.
To establish the necessity we assume that v is a best approximation and
(w - v, uo) :f. 0 for some Uo E U. Then, since U is a linear subspace, we
may assume that (w - v, uo) E JR. Choosing
(w - v,uo)
u =v + Il uoll 2 UO,
I 1 (w-V,UO)2 2
11 = I w - vi 2 -
2
II W - U Il uol1 2 < Ilw - vII ,
p2 =P and IIPII = 1.
It is called the orthogonal projection from X onto U.
1
Ilw - un ll 2 ~ d2 + -,
n
n E IN, (3.15)
2 2 2
<4d
- +-+-
n m
for all w E X. Therefore, P is bounded with IIPII :::; 1. From Remark 3.25
and p 2 = P it follows that IIPII 2: 1, which concludes the proof. 0
Problems
3.1 Show that (3.1) defines a norm on ~n for p ~ 1 and that
lim
p-too
IIxll p = IIxll oo
for all x E ~n.
50 3. Basic Functional Analysis
3.2 Indicate the closed balls {x E lR? : IIxli p :5 I} for p = 1,2,00. What
properties do they have in common?
3.3 Show that (3.1) does not define a norm on (;n for 0 < p < 1.
3.4 For the e1 and eoo norms on (;n show that Ilxli oo :5l1xlh :5 nllxll oo .
3.5 Let X and Y be normed spaces with norms II . IIx and II . IIY, respectively.
Show that
II(x,y)11 := Ilxllx + Ilylly,
lI(x,y)1I := (lIxll3c + lIyll~)1/2,
lI(x, y)1I := max(lIxllx, lIylly),
for (x, y) E X x Y define norms on the product X x Y.
Sn:= LXk
k=1
converges. The limit S = Iim n -+ oo Sn is called the sum of the series. Show that
in a Banach space X the convergence of the series
00
is a sufficient condition for the convergence of the series 2:;'=1 Xk and that
3.8 A norm 1I·lIa on a linear space X is called stronger than a norm 1I·lIb if every
sequence converging with respect to the norm II . lIa also converges with respect
to the norm II . lib. Show that II . lIa is stronger than II . lib if and only if there
exists a positive number C such that IIxllb :5 Cllxll a for all x E X. Show that on
CIa, bl the maximum norm is stronger than the L2 norm (and stronger than the
L 1 norm). Construct a counterexample to demonstrate that the maximum norm
and the L 2 norm (and the maximum norm and the L 1 norm) are not equivalent.
3.9 Show that in a normed space the operations of addition and multiplication
by a scalar are continuous functions. Show that in a pre-Hilbert space the scalar
product is a continuous function.
Problems 51
3.10 Show that a norm 11.11 on a linear space X is generated by a scalar product
if and only if the parallelogram equality
holds for all x, y EX. Show that the f 1 and foo norms on <en are not generated
by scalar products.
3.12 Show that eigenvectors of a matrix for different eigenvalues are linearly
independent.
3.13 Let X and Y be normed spaces and denote by L(X, Y) the linear space
of all bounded linear operators A : X -t Y. Show that L(X, Y) equipped with
converges (with respect to any norm on <en), and denote the sum of the series by
eA. Show that if A is an eigenvalue of A, then e A is an eigenvalue of eA.
LB
00
k
= (I - B)-l
k=O
n=l
n=l
3.20 Show that the best approximation to a function f E e[o, 211"] in the £2
norm with respect to the space of trigonometric polynomials of degree at most n
is given by the partial sum
n
= ~o + L +L
n
ak =- 11
11" 0
21f
f(x)coskxdx, bk =- 11
11" 0
21f
f(x)sinkxdx.
4
Iterative Methods for Linear Systems
We note that Theorem 4.1 remains valid for bounded linear operators
B : X --t X in infinite-dimensional Banach spaces with the definition of
the spectral radius appropriately modified. However, the proof requires a
different and deeper analysis.
For the iterative solution of a system of linear equations of the form
Ax =y
we distinguish different methods by the way in which the original system
is transformed into an equivalent fixed-point form. We decompose A by
a~1 0
AL = a~1 a32 0
(
anI an2 an,n-I
° )
an-I,n
°
4.1 Jacobi and Gauss-Seidel Iterations 55
We assume that all the diagonal entries of A are different from zero. Hence
the inverse D-l of D exists.
In the method attributed to Jacobi, which is sometimes also called the
method of simultaneous displacements, the system Ax = y is transformed
into the equivalent form
qoo := . max
J=l •...• n
'"'
L.J
n
k=l
a·k
_J_
ajjI I< 1 (4.1)
ki'j
or
ql := max
k=l, ....n . n
l: I-l...-
ajj I
a·k
<1 (4.2)
J=l
j#
(t
or
2
q2 := I:jk 1 ) 1/2 < 1. (4.3)
j.k=l JJ
j#
Then the Jacobi method, or method of simultaneous displacements,
n
Xv+l.j = - l: a·k
k=lajj
Xv.k +
_J_
y.
ajj
_J, j = 1, ... , n, v = 0,1,2, ... ,
k¥j
Proof. The Jacobi matrix _D- l (A L + AR) has diagonal entries zero and
off-diagonal entries -ajk/ajj. Hence by Theorem 3.26 we have
with arbitrarily chosen starting element xo. For the actual computations
we rewrite this as
(D + Adx v+1 = -ARX v + y, v = 0, 1,2, ... ,
and solve the linear system for Xv+l with the lower triangular matrix D+AL
by forward substitution. This leads to the Gauss-Seidel iteration scheme
in the following explicit form:
j-l n
Xv+l,j = - L a·k
_J_
a"JJ
Xv+l,k - L a·k
_J_ Xv,k
y.
+ _J , j = 1, ... , n.
k=l k=j+l a··
JJ
a··
JJ
PI := tI
k=2
alk
all
I, Pj:= ~ Iajk
k=lajj
IPk + k=j+l Iajk
ajj
tI, j = 2, ... ,n.
converges for each y E {;n and each Xo E {;n to the the unique solution of
Ax = y (in any norm on {;n) . We have the a priori error estimate
pI'
Ilxv- xlloo 'S -1- Ilxl - xoll oo
-P
and the a posteriori error estimate
P
Ilxv- xlloo 'S -1--P IIx v - xv-dloo
for all v E IN.
58 4. Iterative Methods for Linear Systems
L a··
a'k a'k
xi = - _J_ Xk - '"' _J_ zk, j = 1, ... ,n.
LJ a··
k=1 JJ k=i+l JJ
2 -1
-1 2 -1
-1 2 -1
A=
-1 2-1
-1 2
Proof. Obviously, q<Xl = 1; i.e., (4.1) is not fulfilled. We have the recursion
1 1 1
Pi = 2 Pj-l + 2 ' j = 2, ... ,n -1, Pn = 2 Pn-l·
From this, by induction, it follows that
1
Pj = 1 - 2j , j = 1, ... ,n -1,
Therefore,
1
P = 1 - 2n - 1 < 1,
and this implies convergence of the Gauss-Seidel iterations by Theorem
4.3. 0
4.1 Jacobi and Gauss-Seidel Iterations 59
and
ajk = 0, j E N, k E M.
Otherwise the matrix is called irreducible.
A reducible matrix A, after a reordering of the rows and columns, can
be partitioned into a 2 x 2 block matrix of the form
A= (~~~ A~2)
(see Problem 4.5). Therefore, solving a linear system with the matrix A
can be reduced to solving two smaller linear systems with the matrices All
and A 22 .
Theorem 4.7 Assume that the matrix A = (ajk) is irreducible and weakly
row-diagonally dominant; i.e., A is row-diagonally dominant,
n
L lajkl ~ lajjl, j = 1, ... ,n, (4.6)
k=l
k:f-j
with inequality holding for at least one row j. Then the Jacobi iterations
converge for each y E en and each XQ E en to the unique solution of
Ax = y (in any norm on en).
Proof. By (4.6) and Theorem 3.26 we have that IIBlioo ~ 1 for the Jacobi
matrix B = - V- l (A L + A R ). Therefore, from Theorem 3.32 it follows that
p(B) ~ 1 for the spectral radius.
60 4. Iterative Methods for Linear Systems
Now assume that there exists an eigenvalue Aof B with IAI = 1. For the
associated eigenvector we may assume that Ilxlloo = 1. Then from Ax = Bx
we obtain the inequality
t
k=l
I I= 1,
:jk
J)
j E N.
k#j
Therefore, we have p(B) < 1, and the statement of the theorem follows
from Theorem 4.1. 0
xv+! = Xv + D-1(y - Ax v ),
i.e., in components
Xv+l,j = Xv,j + a
W
JJ
. [yj - t
k=l
ajkXv,k], j = 1, ... , n,
is known as the Jacobi method with relaxation. The weight factor w > 0 is
called the relaxation parameter.
Theorem 4.9 Assume that the Jacobi matrix B := _D- 1 (A L + A R ) has
real eigenvalues and spectral radius less than one. Then the spectral radius
of the iteration matrix
for the Jacobi method with relaxation becomes minimal for the relaxation
parameter
2
Wopt = -::-----:-------=--
2 - A max - Amin
I - D- 1 A) _ Amax - Amin
p( Wopt - 2_ \ _ \ . '
.l\max Amln
where Amin and A max denote the smallest and the largest eigenvalue of B,
respectively. In the case Amin :j:. - Amax the convergence of the Jacobi method
with optimal relaxation parameter is faster than the convergence of the
Jacobi method without relaxation.
Proof. For W > 0 the equation Bu = AU is equivalent to
From this, elementary algebra yields the optimal parameter Wopt and the
spectral radius p(I - woptD-1 A) as stated in the theorem. 0
W
XII+l j + j- j
= XII'j a j = 1, ... ,n,
,
and
(2 - w)D + wA - W(AR - Ad = 2[D + wALl
we deduce that
for 0 < w < 2 we now can conclude that 1J.t1 < 1 for 0 < w < 2. Hence
convergence of the SOR method for 0 < w < 2 follows from Theorem 4.1. 0
64 4. Iterative Methods for Linear Systems
Without going into detail, we wish to say that a much wider class of
matrices arising in the discretization of differential equations enjoys the
property of being consistently ordered in the sense of Definition 4.13. For
a more comprehensive study we refer to [61, 63, 66].
Theorem 4.15 (Young) Assume that A is a consistently ordered matrix
and that the eigenvalues of the Jacobi matrix -D-1(AL + A R ) are real
with spectral radius A = p[-D-1(A L + A R )] < 1. Then the SOR method
converges for all 0 < w < 2. The spectral radius of the SOR matrix B(w)
is minimal for
2
Wopt = ~ > 1.
l+vl-A2 -
In this case we have
1- Jl- A2
p[B(wopd] = 1 + Vf=A2 .
4.2 Relaxation Methods 65
Proof. From
A=J,t+w-l (4.8)
Viiw
is an eigenvalue of
J,t + w - 1 = Vii WA
yields
W1AI JW2A2 )2
J,t=
( --+
2
--+I-w
4
has two real solutions, and only one of them belongs to the interval (0,2),
namely
2
WO(A) = ~>1.
1+ 1- A
2 -
Therefore, we have
W:1, J
(
-+ --+I-w
)
0< w ~ wo(A),
p[B(w)] = 4 ' (4.11)
{
wo(A) < w < 2.
The function
wA . /w 2A2
/(w) := 2 + Y-4- + 1 - w
has the properties /(0) = 1 and
2
j'(w) = ~+ wA - 2 < O.
22vw2A2+4-4w
The latter follows from
A2 (4 - 4w + w2A2 ) < 4 - 4wA 2 + w2A4 = (2 _ wA 2)2.
Therefore, the spectral radius described by (4.11) is strictly monotonically
decreasing for 0 < w < Wo and strictly monotonically increasing for
Wo < w < 2 (see Figure 4.1). Since p[B(O)] = p[B(2)] = 1, we finally
obtain that p[B(w)] < 1 for all 0 < w < 2 and that p[B(w)] assumes its
minimum for w = wo(A) with value p[B(wo(A))] = wo(A) - 1. 0
p[B(w)]
1.---
1 wo 2 w
Example 4.11 For the tridiagonal matrix A from Example 4.5 we have
N(SOR) 11"
N(Jacobi) ~ 4(n + I)
for the optimal relaxation parameter.
Proof. Using the trigonometric addition theorem
1 . 1I"j(k - I) I . 1I"j(k + I) 1I"j. 1I"jk
- sm
2 n+1
+-
2
sm
n+1
= cos - - sm - - ,
n+1 n+1
it can be seen that the Jacobi matrix
o1
101
101
101
1 0
v}" k
. -
= sm 1I"jk
-, k 1 ... , n,
=, J
. = 1,... , n.
, n+1
Hence,
68 4. Iterative Methods for Linear Systems
and
For example, for n = 30 the optimal SOR method is about forty times
as fast as the Jacobi method. Note that the improvement on the speed
of convergence improves as n increases. The fact that in Example 4.17,
and, more generally, in almost all linear systems arising in the discretiza-
tion of boundary value problems, the optimal relaxation parameter has the
property W > 1 explains why the method is known as the overrelaxation
method.
TO := y - Axo,
A80 = TO
4.3 Two-Grid Methods 69
in order that Xl satisfy (4.12). We observe that the correction term 150 will,
in general, be small compared to xo, and therefore it is unnecessary to solve
the defect correction equation exactly. Hence we write
(4.14)
for the solution of (4.12). By Theorem 4.1, the iteration (4.15) converges
to the unique solution X of A;iprox[Y - Ax] = 0, provided that the spectral
radius of the iteration matrix 1- A;iproxA is less than one. Since the unique
solution X of Ax = Y trivially satisfies A;iprox[Y - Ax] = 0, we then have
convergence of the scheme (4.15) to the unique solution of (4.12). For a
rapid convergence it is desirable that the spectral radius be close to zero,
which will be the case if A;iprox is a reasonable approximation to A-I. For
a more complete introduction to the defect correction principle we refer to
[56].
Here we wish to indicate briefly two applications. Firstly, the defect cor-
rection principle (4.14) can be used to improve on the accuracy of an
approximate solution xo, obtained for example by Gaussian elimination.
Then, in principle, the computation of Xo corresponds to some approxima-
tion Xo = A;iproxY obtained from an LR decomposition. This means that
evaluating 150 = A;iproxro is achieved by applying again the same elimi-
nation algorithm to the defect correction equation. This way, the defect
correction principle provides a simple tool to improve on the accuracy of a
solution to a linear system obtained by elimination.
Secondly, we would like to illustrate the more systematic use of the defect
correction principle for the development of multigrid methods as a powerful
tool for the fast iterative solution of linear systems arising in the discretiza-
tion of differential and integral equations. For the sake of simplicity we will
confine ourselves to the case of two-grid iterations.
The basic idea of two-grid methods is to use the defect correction princi-
ple with the approximate inverse A;plprox for the matrix Aline of a large lin-
ear system corresponding to a fine approximation grid given simply by the
exact inverse of the matrix Acoarse of a smaller linear system, correspond-
ing to a coarse approximation grid. Of course, a number of mathematical
problems arise in the design of such methods concerning the appropriate
70 4. Iterative Methods for Linear Systems
relation between the fine and coarse grid and the transfer between the two
grids. We will outline some ideas on the structure of two-grid methods by
again considering the simple model problem from Example 2.1 as a typical
case.
Recall that the solution vector U(h) E rn.n of the linear system
(4.16)
2 -1
-1 2-1
-1 2-1
-1 2-1
-1 2
h=_1_
n+1
in the matrix A(h) and the solution U(h). We assume that n is odd because
later we want to choose the coarser grid by doubling the mesh width.
We start from the Jacobi iteration with relaxation
(4.17)
4 . 21fjh
J.Lj = h 2 sm 2 j = 1, .. . ,n, (4.18)
(h)
vj,k = sm
. ( 'kh)
1fJ , k = 1, ... ,n, j = 1, ... ,n. (4.19)
Note that by Theorem 3.29, the eigenvectors of the Hermitian matrix A(h)
form an orthogonal basis for rn.n (see Problem 4.18). The vt), j = 1, ... , n,
are also eigenvectors of the Jacobi matrix 1- [D(h)j-l A(h), with eigenvalues
From Theorem 4.9 we observe that w = 1 is the optimal choice for the
Jacobi iteration with relaxation. However, it will turn out that in the con-
text of two-grid methods the damped, or underrelaxed, Jacobi method with
o < w < 1 is more important. This is due to the following observation.
Since the vt), j = 1, ... , n, provide a basis for IRn , we can represent the
difference between the exact solution U(h) and the l/th iteration U II in the
form
n
U (h) - UII -- 'L-t
" u'},11 v(h)
A<.
j .
j=1
Qj,lI+l = { 1 - 2w sm
. 2 j
1T
2 h} Qj,II' j = 1, . .. ,n,
for the coefficients Qj,II' In particular, if we choose w = 0.5, we have that
From this we observe that even though convergence of the iterations (4.17)
becomes slower when we decrease w, for w = 0.5 the convergence restricted
to the subspace
Wn := span{v!!.±,!, ... ,vn }
2
This fact can be expressed by saying that the damped Jacobi iteration is a
smoothing iteration. In the sequel we will consider only the damping factor
w = 0.5.
The slow convergence with respect to low frequencies will now be taken
care of by the defect correction principle through incorporating a so-called
coarse grid correction on the grid with mesh width 2h. For this we need
to transfer vectors corresponding to the fine grid to vectors correspond-
ing to the coarse grid and vice versa. The transfer from the fine grid
to the coarse grid requires a restriction and corresponds to a mapping
n
R(h) : IR -+ IR ":;-' . Note that we only need to consider this mapping for
the interior grid points. Instead of choosing the restriction (R(h)Y)k = Y2k,
72 4. Iterative Methods for Linear Systems
1 2 3 4 5 6 7
I I I I I I I
1 2 3
FIGURE 4.2. Restriction operator of the two-grid method for n =7
The corresponding matrix is
2 1
1 2 1
1 2 1
1 2
n- 1
R
(h)V(h)
J
= C2V(2h)
J J'
R(h) (h)
vn +1 -
__ 2 (2h)
j - SjV j ,
.
J = 1,... , -2- ,
(4 21)
.
between the eigenvectors (4.19) for the fine and the coarse grid (see Problem
4.19). Here we have set
The transfer from the coarse grid to the fine grid is called prolongation
and corresponds to a mapping p(h) : IR.!!j-! --+ IRn . The simplest choice for
p(h) is given by the piecewise linear interpolation (see Chapter 8)
n-l
k=I""'-2-
n+l
k = 1""'-2-
4.3 Two-Grid Methods 73
n-l
for y E IR --r, as illustrated in Figure 4.3. The corresponding matrix is
given by p(h) = 2R(h).i. Either by direct computation or from (4.21) and
the fact that the matrices p(h) and 2R(h) are adjoint one can establish that
(see Problem 4.19)
(4.22)
1 2 3 4 5 6 7
I I I I I I I
1 2 3
FIGURE 4.3. Prolongation operator of the two-grid method for n =7
where IN (UII , F(h») denotes the result of N steps of the damped Jacobi
iterations (4.17) with starting element UII' Obviously, the iteration matrix
corresponding to this two-grid method is given by
(4.23)
Theorem 4.18 For the spectral radius ofT we have that p(T) = 0.5; i.e.,
the two-grid iterations converge.
74 4. Iterative Methods for Linear Systems
Proof. We note that from (4.18) and (4.19), with h replaced by 2h, we have
that
whence
n-1
j = 1""'-2-'
follows. From this, using (4.20)-(4.22) and R(h)v~ = 0, it can be derived
2
that
Tv(h)
(h~ (4.24)
( TV n + 1 _ j
for J. -- 1, ... , -2-
n-l an d
(h) _ 1 (h)
Tv!l.±.! - -2 v!l.±.! . (4.25)
2 2
. n+1
J = 1""'-2-'
and the eigenvalue zero of multiplicity n2"l. This implies the assertion on
the spectral radius of T. 0
Theorem 4.18 shows that the two-grid method is a very fast iteration. As
compared to the classical Jacobi and Gauss-Seidel methods and also to the
SOR method with optimal relaxation parameter, it decreases the spectral
radius from a value close to one to one-half, which causes a substantial
increase in the speed of convergence. However, for practical computations
it has the disadvantage that in each step the solution of a system with half
the number of unknows is required.
This drawback of the two-grid method is remedied by the multigrid
method. Whereas for the two-grid method as described above only two
grids are used, the multigrid method uses M > 2 different grids with mesh
widths hI-' = 21-' h, p, = 1, ... , M, obtained from the mesh width h on the
finest grid. The multigrid method is defined recursively. The method for
M + 1 grids performs one or several steps of the damped Jacobi iteration
on the finest grid with mesh width h and uses as approximate inverse for
the defect correction one or several steps of the multigrid iteration on the
M grids with mesh widths 2h, 4h, ... ,2 M h. To be more explicit, the three-
grid method uses one or several steps of the two-grid method as the defect
Problems 75
correction of the damped Jacobi iteration on the finest grid; the four-grid
method uses one or several steps of the three-grid method as the defect
correction; and so on. To describe further details of the multigrid method,
in particular showing that the computational cost of one step of a multigrid
iteration is proportional to the cost of the Jacobi iterations on the finest
grid provided that the coarsest grid is coarse enough, is beyond the aim of
this introduction. For a comprehensive study we refer to [8, 26, 29, 63].
Problems
4.1 Consider the solution of the linear system
4.2 Write a computer program for the Jacobi method, the Gauss-Seidel method,
and the SOR method and test it for various examples.
4.3 Show that a matrix A has spectral radius peA) < 1 if and only if it satisfies
Iim v -+ oo A V = O.
4.4 Prove that the Jacobi method converges for strictly column-diagonally dom-
inant matrices (compare (4.5)).
),
where All is a kxkmatrix and A 22 is an (n-k)x(n-k) matrix with 1 ~ k ~ n-l.
4.6 Show that the matrix A from Example 4.5 is irreducible and weakly row-
diagonally dominant.
4.7 Let
A=(~ : ~).
Show that for 1 ~ 20 < 2 the Gauss-Seidel method is convergent and the Jacobi
method is not.
76 4. Iterative Methods for Linear Systems
A=(=i ~~ -!)
show that the Jacobi method is convergent and the Gauss-Seidel method is not.
4.9 For the matrix
A = (-~ ~ =~)
show that the Gauss-Seidel method is convergent and the Jacobi method is not.
4.10 Show that the matrix
2 0 -1
A~ 0
( -1
2
-1
-1
2
-1
-1 )
0
-1 -1 0 2
is irreducible and that the Jacobi method is not convergent.
4.11 Show that the iteration matrix of the Gauss-Seidel method has eigenvalue
zero.
4.12 Consider the variant of the Gauss-Seidel iteration where the components
are iterated from the nth component backward to the first component. What is
the iteration matrix of this method? Obtain a symmetric method by alternating
one step of the forward Gauss-Seidel method and one step of the backward
Gauss-Seidel method. What is the iteration matrix of this method?
4.13 Show that the Jacobi iteration converges for a matrix A if and only if it
converges for the transposed matrix AT.
4.14 Show that the matrix A of Example 2.2 is irreducible, positive definite,
and weakly row-diagonally dominant.
4.15 Compute the eigenvalues of the Jacobi iteration matrix for the matrix A
of Example 2.2.
4.16 Let A = (ajk) be a nonnegative n x n matrix, i.e., ajk ::::: 0, j, k = 1, ... , n,
and let p(A) < 1. Show that I - A is nonsingular and (I - A)-l is nonnegative.
4.17 Give a counterexample to show that the Jacobi method, in general, does
not converge for positive definite matrices (see Theorem 4.12).
4.18 Show by direct computations that the eigenvectors given by (4.19) are
orthogonal.
4.19 Prove the relations (4.21), (4.22), (4.24), and (4.25).
for the two-grid iteration matrix with N damped Jacobi iterations at each step.
5
Ill-Conditioned Linear Systems
of degree n in the least squares sense, i.e., with respect to the £2 norm.
Using the monomials x H x k , k = 0,1, ... , n, as a basis of the subspace
Pn C e[O, 1] of polynomials of degree less than or equal to n (see Theorem
8.2), from Corollary 3.53 and the integrals
it follows that the coefficients 0:0, ... , O:n of the best approximation are
uniquely determined by the normal equations
L·J + k1 + 1
n
k=O
O:k =
1 0
1
f(x)x i dx, j = O, ... ,n. (5.1)
1
1 xi
r··-
).- 0
- - dx '
1+x
j =O, ... ,n.
we deduce that
n ao al az a3 a4 a5 a6
1 0.9314 -0.4766
2 0.9860 -0.8040 0.3274
3 0.9972 -0.9389 0.6645 -0.2247
4 0.9994 -0.9830 0.8630 -0.5334 0.1543
5 0.9999 -0.9956 0.9512 -0.7688 0.4191 -0.1059
6 0.9999 -0.9989 0.9843 -0.9011 0.6672 -0.3242 0.0727
n ao al az a3 a4 a5 a6
1 0.93 -0.47
2 0.98 -0.80 0.32
3 0.99 -0.95 0.70 -0.24
4 1.00 -1.16 1.63 -1.69 0.72
5 1.06 -2.74 12.68 -31.16 33.87 -13.25
6 1.39 -16.58 151.09 -584.79 1071.93 -926.75 304.49
Despite the fact that the changes in the right-hand sides are less than
0.000005, we obtain drastic changes in the solution. Therefore, qualitatively
we may say that our linear system provides an example of an ill-conditioned
system. The matrix of this example is known as the Hilbert matrix. 0
Ax =y (5.2)
and
(5.3)
respectively. Then
whence
Theorem 5.3 shows that the condition number may serve as a measure of
stability for linear operator equations and, in particular, for linear systems.
A linear system with a small condition number is stable, whereas a large
condition number indicates instability. We call a linear system with a small
condition number well-conditioned. Otherwise, it is called ill-conditioned.
By Theorem 3.31, the condition number of a Hermitian matrix A in the
Euclidean norm is given by
IAmaxl
cond 2 (A) = IAminl '
where Amax and Amin denote the eigenvalues of A with largest and smallest
modulus, respectively. Table 5.3 is obtained by employing the QR algorithm
(see Section 7.4) for the computation of matrix eigenvalues. It illustrates
quantitatively the degree of instability, i.e., the ill-conditionedness of the
linear system from Example 5.1.
n 2 3 4 5 6
Amax 1.27 1.41 1.50 1.57 1.62
Amin 6.57.10- 2 2.69.10- 3 9.67.10- 5 3.29.10- 6 1.08.10- 7
cond 2 19.3 5.24.102 1.55.104 4.77.105 1.50.101
J.lI 2': J.l2 2': ... 2': J.lr > J.lr+l = ... = J.ln = 0
and orthonormal vectors UI, ... , Un E <en and VI, ... , Vm E <em such that
Ax = LJ.lj(x,Uj)Vj. (5.6)
j=l
Each system (J.lj, Uj, Vj) with these properties is called a singular system of
the matrix A.
Proof. The Hermitian and semipositive definite matrix A * A of rank r has
n orthonormal eigenvectors UI, ... , Un with nonnegative eigenvalues
(5.7)
which we may assume to be ordered according to J.lI 2': J.l2 2': ... 2': J.lr > 0
and J.lr+l = ... = J.ln = O. We define
1
Vj:= - AUj, j = 1, ... ,r.
J.lj
Then, using (5.7) we have
y = L(y,Vj)Vj, (5.12)
j=1
since A*vj = 0 for j = r + 1, ... , m. For the vector Xo defined by (5.11) we
have that r
in terms of the orthonormal basis VI, ... , V m . Let Xo be given by (5.11) and
let x E (:n be arbitrary. Then
(5.13)
Therefore, there exists a smallest integer p = p(<5) such that F(P) ~ O. Note
that p ~ r. In actual computations, this stopping parameter p is determined
by terminating the sum (5.14) when the right-hand side of (5.17) becomes
smaller or equal to zero for the first time.
In order to show the convergence (5.16), we note that IIAx p - yOliz ~ <5
implies
The spectral cutoff method requires the full solution of the eigenvalue
problem for the matrix A * A, which we will describe in Chapter 7. As an
alternative, in the following section we shall describe the Tikhonov regu-
larization, which can be performed without explicitly knowing the singular
value decomposition.
(5.19)
Proof. (Compare to the proof of Theorem 3.51.) We first note the relation
which is valid for all x, X o E (Cn. From this it is obvious that the solution
X o of (5.18) satisfies (5.20).
Conversely, let X o be a solution of (5.20) and assume that
a.X o + A* Ax o =I A*y.
From the proof of Theorem 5.7 we know that the eigenvalues of the
Hermitian matrix a1 + A* A are given by a + J.LJ, j = 1, ... ,no Hence by
Theorem 3.27 we have that
error
Etotal
Eapprox
~------E data
a
FIGURE 5.1. Total error for Tikhonov regularization
This decomposition shows that the total error consists of two parts:
It reflects the influence of the incorrect data and, for fixed 0, becomes large
as 0: -t 0, if the smallest positive singular value J.Lr is close to zero (see also
Problem 5.16). The second term,
(5.24)
This discrecancy principle for Tikhonov regularization is regular in the
sense that if the error level 0 tends to zero, then
(5.25)
Proof. We have to show that the function F : (0,00) -t IR defined by
~
2
F(o:) = LJ (
0:
2)2 I(y eS ,Vj)1 2 - 02.
j=l 0: + J.Lj
Therefore, F is continuous and strictly monotonically increasing with the
limits F(o:) -t _0 2 < 0,0: -t 0, and F(o:) -t lIyeSlI~ - 02 > 0,0: -t 00.
Hence, F has exactly one zero 0: = 0:(0).
Note that the condition lI yeS - yl12 ~ 0 < lI yeSII2 implies that y =I- O. Using
(5.23), (5.24), and the triangle inequality we can estimate
and
alJAx a lJ2 = IJAA*(yO - Ax a )1I2 S IJAA*1I 26.
Combining these two inequalities and using lI y o112 ~ IJyll2 - 6 yields
This implies that a --+ 0, 6 --+ O. Now the convergence (5.25) follows from
the representations (5.13) for At y and (5.19) for X a (with y replaced by
yO) and the fact that lI yo - yl12 --+ 0, 6 --+ o. 0
n ao at a2 a3 a4 a5 a6
1 0.9315 -0.4767
2 0.9862 -0.8052 0.3285
3 0.9987 -0.9546 0.7021 -0.2491
4 1.0015 -1.0193 1.0154 -0.7605 0.2644
5 0.9992 -0.9659 0.7236 -0.1458 -0.2838 0.1735
6 0.9995 -0.9618 0.6564 0.0254 -0.2818 -0.1512 0.2166
Problems
5.1 For the condition number of linear operators show that
and
5.3 Determine cond2(A) for the matrix A of Example 2.1 and discuss its be-
havior for large n.
Problems 91
~)
7
A=(~ 11
2
A= CO
i')
1 4
10 5
1
4 5 10
0 -1 7
A= ( ~
Show that one can improve the condition of a matrix by scaling through calcu-
lating condcx:>(DA) where D is the diagonal matrix
Show that
condcx:>(A) :s; condcx:>(DA)
for all n x n diagonal matrices D (see Problem 5.6).
5.9 Show that for an m x n matrix A the n x n matrix A' A is Hermitian and
positive semidefinite.
o
A=(~ o
1
-1 ~).
92 5. Ill-Conditioned Linear Systems
5.11 Show that At y is the least squares solution of Ax = y with minimal norm.
5.12 Show that the pseudo-inverse At is uniquely determined by the properties
5.13 For the pseudo-inverse show that (At)t = A and (At)* = (A')t.
5.14 Give an example to show that in general, (AB)t =F BtAt.
5.15 What is the pseudo-inverse of A : (;n -t (;m given by Ax = (x, a)b with
a E (;m and b E (;n?
(Ax,y)y = (x,A'y)x
for all x E X and y E Y. Use this result to formulate and prove a generalization
of Theorem 5.9 for the minimization of
is a bounded linear operator that does not have a bounded inverse; i.e., show
that differentiation is an ill-posed problem.
6
Iterative Methods for
Nonlinear Systems
l'f/.en the equation f(x) = x has a unique solution xED, and the successive
approximations
Xv+1 := f(x v ), v = 0,1,2, ... ,
with arbitrary Xo E D converge to this solution. We have the a priori error
estimate
Proof. Equipped with the norm 11·11 = 1·1 the space rn. is complete. By the
mean value theorem, for x, y E D with x < y, we have that
monotonically if f has positive slope and that it converges with values al-
ternating above and below the fixed point if f has negative slope. In both
cases the slope of the function f has absolute value less than one in a
neighborhood of the fixed point. From drawing a corresponding figure for
a function with a slope of absolute value greater than one it can be seen
that the corresponding iteration will move away from the fixed point (see
Problem 6.2).
: f
I
xo
The following theorem states that for a fixed point x with 1!,(x)1 < 1 we
always can find starting points Xo ensuring convergence of the successive
approximations.
Theorem 6.2 Let x be a fixed point of a continuously differentiable func-
tion f such that 1!,(x)1 < 1. Then the method of successive approximations
xv+! := f(x v ) is locally convergent; i.e., there exists a neighborhood B of
the fixed point x such that the successive approximations converge to x for
all Xo E B.
Proof. Since!, is continuous and 1!,(x)1 < 1, there exist constants 0 < q < 1
and J > 0 such that 1!'(y)1 :s q for all y E B := [x - 6, x + J]. Then we have
that
If(y) - xl = If(y) - f(x)1 :s qly - xl :s Iy - xl :s J
for all y E Bj i.e., f maps B into itself and is a contraction f : B -+ B.
Now the statement of the theorem follows from Theorem 6.1. 0
Theorem 6.2 expresses the fact that for a fixed point x with 1!,(x)1 < 1
the sequence x v+ 1 : = f (x v) converges if the starting point Xo is sufficiently
close to x. In practical situations the problem of how to obtain such a good
initial guess is unresolved in general. Frequently, however, a good estimate
of the fixed point might be known a priori from the underlying application
or might be deduced from analytic observations.
The following examples illustrate that in some cases we also have global
convergence, where the successive approximations converge for each start-
ing point in the domain of definition of the function f.
96 6. Iterative Methods for Nonlinear Systems
v Xv Xv
0 0.30000000 0.40000000
1 0.42000000 0.48000000
2 0.48720000 0.49920000
3 0.49967232 0.49999872
l/a 2/a
FIGURE 6.2. Division by iteration
Example 6.4 For computing the square root of a positive real number a
by an iterative method we consider the function f : (0,00) --+ (0,00) given
by
f(x):= ~ (x+~).
By solving the quadratic equation f(x) = x it can be seen that f has
the fixed point x = va. By the arithmetic geometric mean inequality we
have that f(x) > va for x > 0; i.e., f maps the open interval (0,00) into
[va, 00), and therefore it maps the closed interval [va, 00) into itself. From
j'(x) =~ (1- :2)
it follows that
q := sup 1j'(x)1 = ~.
va'5,x<oo 2
Hence f: [va, 00) --+ [va, 00) is a contraction. Therefore, by Theorem 6.1
the successive approximations
converge to the square root v'a for each Xo > 0, and we have the a posteriori
error estimate
Iv'a - xvi::; Ixv - xv-II·
Figure 6.3 illustrates the convergence. The numerical results again are for
a = 2. 0
v Xv
0 5.00000000
1 2.70000000
2 1.72037037
3 1.44145537
4 1.41447098
5 1.41421359
6 1.41421356
In both of Examples 6.3 and 6.4 the numerical values exhibit a very
rapid convergence. This is due to the fact that because of f'(x) = 0 at the
fixed point, the contraction number is very small. We shall elaborate on
this observation later when we consider Newton's method.
v Xv V Xv
0 1.00000000 7 0.72210243
1 0.54030231
2 0.85755322
3 0.65428979 45 0.73908513
4 0.79348036 46 0.73908514
5 0.70136877 47 0.73908513
6 0.76395968 48 0.73908513
I(x) := cosx.
Here we have
q = sup 1f'(x)1 = sin 1 < 1,
O:S;x:S1
98 6. Iterative Methods for Nonlinear Systems
and Theorem 6.1 implies that the successive approximations Xv+l := cos Xv
converge to the unique solution x of cos x = x for each Xo E [0,1]. Table
6.1 illustrates the convergence, which is notably slower than in the two
previous examples. 0
°
lim x-+ oo h(x) = 00. Therefore, the function f(x) := -lnx has a unique
fixed point x. Since this fixed point must satisfy < x < 1, the derivative
g(x) := e- x
by Theorem 6.1 it follows that for arbitrary Xo > 0 the successive approx-
imations Xv+l = e- xv converge to the unique solution of x = e- x . 0
AX + (1 - A)Y E D
for all x, y E D and all A E (0,1), Le., if the straight line connecting x and
y is contained in D,
Theorem 6.7 Let DC R n be open and convex and let f : D -t R n be a
mapping
f(x) = (fl(Xl, .. "Xn), ... ,fn(Xl,.,.,xn)T,
where the Ii : D -t R, j = 1, ... , n, are continuously differentiable func-
tions. By
we denote the Jacobian matrix of f. Then we have the mean value theorem
(6.1)
where the integral on the left-hand side has to be understood as the vector
of the integrals over the components of g. The function .\ f---+ IIg(.\) II is
continuous, since the norm is a continuous function. Therefore, the integral
on the right-hand side of (6.1) is well-defined. Consider the equidistant
subdivision .\i = i/m, i = 0,1, ... , m, for m E IN. Then we have the
converging Riemann sums
1
and
1
~ g(.\i) (.\i - .\i-d -t g(.\) d.\, m -t 00.
From the second limit, by the continuity of the norm we conclude that
1r
1
d
Ii(x) -li(y) = d.\ fj[.\x + (1 - .\)y] d.\, j = 1, . .. ,n.
0
af.
d
d.\ Ii[.\x + (1 - .\)y] = Ln
axl [.\x + (1 - .\)y] (Xk - Yk),
k=1 k
100 6. Iterative Methods for Nonlinear Systems
and therefore
of
1o k=1 Xk [.\x + (1 - .\)Y](Xk - Yk) d.\;
L 1 n
h(x) -h(y) = 0 j
f(x) - f(y) = 1 1
J'(.\x + (1 - .\)y] (x - y) d.\.
From this, with the aid of (6.1) and the continuity of .\ H f'[.\x + (1- .\)y],
we obtain
< max
- O~>'~1
1If'[.\x + (1 - .\)y]lIl1x - yll,
in some norm 11·11 on IRn. Then the equation f (x) = x has a unique solution
xED, and the successive approximations
converge for each Xo E D to this fixed point. We have the a priori error
estimate
q"
Ilx" - xII ~ -l-q II x l - xoll
and the a posteriori error estimate
q
Ilx" - xii ~ -l-q Ilx" - x"-111
for all v E IN.
sup . max
xEDJ=l, ... ,n
L
n
8f ' (x) < 1,
1_J
8Xk
I
k=l
sup max
xED k=l, ... ,n.
Ln
8f ' (x) < 1,
1_ J
8 Xk
I
J=l
sup
xED
[t I~~j
j,k=l k
(X)1
2
,] 1/2 <1
X2 = O.5sinxl + 0.5COSX2
we have
j'(x) = ( -0.5 sinx l -0.5 COSX2 )
0.5 cos Xl -0.5 sinx2 '
and therefore 11f'(x)112 ~ J(f.5 for all x E lR2. Hence Theorem 6.8 is
applicable. o
The reader will not be surprised to learn that for speeding up convergence
of the successive approximations, concepts developed for linear equations
like relaxation methods or multigrid methods can also be successfully em-
ployed in the nonlinear case. However, since we discussed these methods
in some detail in Sections 4.2 and 4.3 for linear equations, we shall refrain
from repeating the analysis for nonlinear equations.
Geometrically, the affine linear function g describes the tangent line to the
graph of the function f at the point Xo.
This consideration can be extended to the case of more than one variable.
Given an approximation Xo to a zero of f, by Taylor's formula we still have
the approximation (6.2), where now, as in the previous section,
f(x) = 0
X v +l := 2x v - ax~.
f(x) =: x 2 - a
(a) f satisfies
11!'(x) - !'(y)11 :S T'llx - yll
for all x, y E D and some constant T' > O.
(b) The Jacobian matrix !' (x) is nonsingular for all xED, and there
exists a constant f3 > 0 such that
the inequality
1
q <-
2
is satisfied.
(d) For r := 20 the closed ball B[xo, r] := {x : Ilx - xoll :S r} is contained
in D.
Then f has a unique zero x· in B[xo, r]. Starting with Xo the Newton
iteration
xv+! := Xv - [J'(x v )t 1 f(x v ), V = 0,1, ... , (6.4)
is well-defined. The sequence (xv) converges to the zero x· of f, and we
have the error estimate
Proof. 1. Let x, y, zED. From the proof of Theorem 6.7 we know that
f(y) - f(x) = 1 1
!'[AX + (1 - A)Y] (y - x) dA.
Hence
and estimating with the aid Qf (6.1) and condition (a) we find that
From this, with the help of the triangle inequality, the induction assump-
tion, and condition (c), we obtain that
From this we observe that (XII) is a Cauchy sequence, since q < 1/2 and by
Theorem 3.39 the limit
x· = lim XII
11-"+00
and deduce from the above estimate and Theorem 3.48 that the radius p
of B[x·, p) can be chosen such that f'(x) is nonsingular on B[x·, p] and
1I[f'(x·))-lll::; (3 for all x E B[x·,p] and some constant (3 > O.
Since f is continuous, f(x·) = 0 implies that there exists 8 < p/2 such
I}
that
. {p4(3' 2(32"(
IIf(xo)1I < mm
for all IIxo - x·1I < 8. Then, after setting a := 1I[f'(XO)]-l f(xo)11 we have
the inequalities
and
2a ::; 2(3l1f(xo)1I < ~ .
Hence for the open and convex ball B(x·, p) and for each Xo with
Ilxo - x·11 < 8 the assumptions of Theorem 6.14 are satisfied. 0
V XII
0 1.00000000
1 0.75036387
2 0.73911289
3 0.73908513
4 0.73908513
108 6. Iterative Methods for Nonlinear Systems
v Xv
0 1.00000000
1 0.53788284
2 0.56698699
3 0.56714329
4 0.56714329
Proof. Using condition (b) of Theorem 6.14 and the inequality (6.5) we can
estimate
Ilx· - xv+lll = Ilx· - Xv + It(xv)t 1 f(xv)11
:s Illt(xv))-lllllf(x·) - f(x v ) - t(xv)(x· - xv)11
:s fJ2'Y IIx· - x v ll 2 ,
since f(x·) = O. o
6.2 Newton's Method 109
Geometrically, in the one-dimensional case this means that the tangent line
of f at Xv is replaced by the parallel to the tangent line of f at Xo passing
through (xv, f(x v )).
Theorem 6.21 Under the assumptions of Theorem 6.14 the simplified
Newton method converges linearly to the unique zero of f in B[xo, r].
Proof. Recall that the function
Then estimating with the help of conditions (b), (c) and (d) and the in-
equality (6.5) we obtain
In the secant method for a function of one variable the derivative f'(x v )
is approximated by the difference quotient and the corresponding iterative
scheme is given by
Xv - Xv-l
Xv+! := Xv - f(x ) _ f(Xv-l) f(x v ), v = 0,1,.... (6.9)
v
Pi ()
x := bOX n-i + biX n-2 + b2X n-3 + ... + bn-2 X + bn-i,
using (6.10) we compute
n-i n
Pi(X) (x - Z) + bn = L bmxn-i-m(X - Z) + bn = L amX n- m = p(X).
m=O m=O
This implies that for a zero z the Horner scheme provides the coefficients
of the polynomial obtained by dividing P by the linear factor x - z. In
addition, we have that
and in particular,
p'(z) = Pi(Z).
Hence, applying the Horner recursion to the polynomial Pi yields the value
of the derivative p'(z). By repeating this process recursively, we can deter-
mine all the derivatives of P at the point z, since by induction, from (6.11)
we obtain that
whence
px n
( ) =aox +aix
n-i
+a2x n-2 +···+an-ix+an
ao al a2 an-l an
Z bo bl b2 bn - l bn
Z b'0 b~ b~ b~_l
Z b"
0 b"
1 b"
2 b"
n-2
b(n-l) b(n-l)
Z 0 1
b(n)
Z 0
g'(z) = 1 - -!.
m
.
Therefore, by Theorem 6.2, at a multiple zero Newton's method is locally
convergent. Obviously, the convergence at a multiple zero is only linear.
However, one can modify Newton's method for multiple zeros such that
the quadratic convergence is preserved (see Problem 6.14).
For finding complex zeros, in principle one can apply Newton's method
in ce. For this one has to keep in mind that for polynomials with real coef-
ficients, the starting values need to be complex, since otherwise Newton's
method would produce only real approximations. For the conjugate com-
plex zeros of a polynomial with real coefficients Bairstow's method avoids
working in the complex plane by using the fact that for two conjugate zeros,
the product of the linear factors (x - z)(x - z) is a polynomial of degree
two with real coefficients. The basic idea is to write the polynomial p of
degree n in the form
p(x) = (x 2 - ux - v)q(x) + a(x - u) + b,
where q is a polynomial of degree n - 2, and a and b are constants depending
on u, v E JR.. The factor x 2 - ux - v corresponds to two conjugate complex
zeros of p if the pair u, v solves the nonlinear system a(u, v) = 0, b( u, v) = O.
The latter can be solved by Newton's method, and once the solution u, v is
known, the two zeros of p are obtained by solving the quadratic equation
x 2 - ux - v = O.
We conclude this section with some consideration of the question of sta-
bility. In particular, we show that the zeros of polynomials can be quite
sensitive to small changes in the coefficients even if all the zeros are simple
and well separated from each other.
Let p and q be polynomials of degree n and assume that Zo is a simple
zero of p. Consider the perturbed polynomial
p(',€) :=p+€q,
q(Zo)
z(c) ~ Zo - c.,--(). (6.13)
p Zo
Example 6.24 The polynomial
has the zeros 1, 2, ... ,10, which are well separated from each other. We
perturb the coefficient of x 9 by choosing q(x) := 55x 9 . Since p'(lO) = 9!,
by (6.13), the zero Zo = 10 of the polynomial p is perturbed into
55· 109 5
10 - 9! c ~ 10 - 1.5 . 10 c.
where
6.4 Least Squares Problems 115
Xl = Xo - AM gradg(xo) (6.16)
og
Yj(x) := -~ (x), (6.17)
UXj
which we have to solve for the difference Xl - xo. Similarly, one step of the
steepest descent (6.16) can be transformed into
Now recall the least squares problem of Example 2.4. In a slight refor-
mulation, this problem consists in minimizing the function
m
g(x) := L)Ii(x) - U;]2
;=1
over some domain D, where the Ii : D --* JR are given functions and the
U; E JR are given constants for i = 1, ... , m. We compute the derivatives
og ~ ali
ax. (x) = 2 L)Ii(x) - Ui] ax. (x)
J i=l J
116 6. Iterative Methods for Nonlinear Systems
and
In this case the matrix (ajk) contains second derivatives of the functions
Ii- However, since these derivatives are multiplied by the factor [Ii(x) -Ui],
which will become small by minimizing g, it is justified to neglect this term.
Note that if Newton's method converges, it always will converge to a zero,
even if we do not use the exact Jacobian for the computation, provided that
the approximate Jacobian at the limit is nonsingular. Hence, we simplify
and replace (6.17) by
;:.- a
Ii Ii
ajk(X) := 2 L. ~ (x) ~ (x)
a (6.20)
i=l uXj UXk
and note that ajj(x) > O.
Now the Levenberg-Marquardt method combines (6.18) and (6.19) by
first introducing the n x n matrix A = (ajk) with entries
ajj = (1 + 'Y)ajj, ajk = ajk, j =j:. k,
where'Y is some positive parameter, and then replacing (6.18) and (6.19)
by
(6.21)
Obviously, for large 'Y the matrix A will become diagonally dominant, and
(6.21) will get close to the steepest descent, with
and A = l/T For'Y --+ 0, on the other hand, (6.21) will turn into the Newton
step (6.18). This ability to gradually vary between Newton's method and
the steepest descent method is one of the basic features of the Levenberg-
Marquardt method, which we describe as follows:
1. Choose an initial guess xo, some moderately sized value for 'Y, and a
factor 0:, say 'Y = 0.001 and °: = 10.
2. Solve the linear system (6.21) to obtain Xl.
3. If g(xd > g(xo), then reject Xl as a new approximation, replace 'Y by
O:'Y, and go back and repeat step two.
4. If g(Xl) < g(xo), then accept Xo as a new approximation, replace Xo
by Xl and'Y by 'Y/a, and go back to step two.
5. Terminate when the difference Ig(xd - g(xo)1 is smaller than some
given tolerance.
For a detailed analysis of this method we refer to [44]. For a study of
nonlinear optimization methods and their relation to nonlinear systems we
refer to [20].
Problems 117
Problems
6.1 Prove Brouwer's fixed point theorem in IR; i.e., show that if D C IR is a
closed and bounded interval and if f : D -+ D is continuous, then f has a (not
necessarily unique) fixed point.
v square roots
6.6 Let the sequence (xv) in IR converge to x such that Xv -# x for all v E IN
and
Xv+l - x = (q + ~v )(xv - x), v = 0,1, ... ,
where Iql < 1 and ~v -+ 0, v -+ 00. Show that
(Xv+l - x v )2
Yv := x v - ---'--'--,---'---
Xv+2 - 2xv+l + Xv
is well-defined for sufficiently large v and that
lim Yv - x -- 0',
v~oo Xv - X
Le., the sequence (Yv) converges to x more rapidly than the sequence (xv).
This method for speeding up the convergence of sequences is known as Aitken's
82 method.
+ 3a)
Xv(x~
Xv+l:= 2 3 , v = 0, 1, ... ,
Xv +a
is a method of order three for computing the square root of a positive number a.
118 6. Iterative Methods for Nonlinear Systems
6.10 Prove an analogue of Corollary 6.16 for the secant method (6.9).
6.11 Give conditions for monotone convergence of Newton's method for a func-
tion of one variable.
6.12 Show that Newton's method for the function f(x) := x n -a, x> 0, where
n> 1 and a > 0, converges globally to a 1 / n .
6.13 Assume that the polynomial p with real coefficients has only real zeros
and denote the largest zero by Zl. Show that for any initial point Xo with Xo > Zl
Newton's method converges to Zl.
p(X v )
Xv+l := Xv - m -(-) ,
pi Xv
v = 0,1, ... ,
converges locally and quadratically to the zero z.
A V + 1 := A v [2I - AA v ], v = 0, 1, ... ,
converges quadratically to the inverse A-\ provided that 111- AAol1 < 1.
6.16 Write a computer program for finding n simple zeros of a polynomial of
degree n with real coefficients. Use this code for the computation of the zeros of
the Laguerre polynomial £4 (x) = x 4 - 16 x 3 + 72 x 2 - 96 x + 24.
6.19 Write a computer program for solving a least squares problem by the
Levenberg-Marquardt method.
starting with Zo = °
6.20 The set of all points ( E <C for which the fixed point iteration Zv+l := z~ +(
remains bounded is called the Mandelbrot set. Write a
computer program for visualizing the Mandelbrot set.
7
Matrix Eigenvalue Problems
the equations
Le., similar matrices have the same characteristic polynomial. This invari-
ance allows one to transform a given matrix A by a similarity transfor-
mation into a matrix of simpler form with the same eigenvalues as A. In
particular, the iterative methods successively construct sequences of similar
matrices that converge to a diagonal matrix or an upper (or lower) trian-
gular matrix from which the eigenvalues can be read off as the diagonal
elements.
7.1 Examples
We begin by illustrating how the discretization of eigenvalue problems for
differential operators leads to eigenvalue problems for large matrices.
Example 7.1 The vibrations of a string are modeled by the so-called wave
equation
82w 1 82w
8x 2 c2 8t 2 '
where w = w(x, t) denotes the vertical elongation and c is the speed of
sound in the string. Assuming that the string is clamped at x = 0 and
x = 1, the boundary conditions w(O, t) = w(l, t) = 0 must be satisfied for
all times t. Obviously, the time-harmonic wave
with frequency w solves the wave equation, provided that the space-dependent
part v satisfies
-V" = Avon [0,1],
v(O) = v(l) = O.
show that the functions vm(x) = sin m1rX are eigenfunctions of D with the
eigenvalues Am = m 2 1r 2 for m = 1,2, .... It can be shown that these are
the only eigenvalues and eigenfunctions of D.
For discussing an approximate solution we consider the slightly more
general differential equation
-v" + pu = Avon [0,1]
with boundary conditions v(O) = v(l) = 0, where P E e[O, 1] is a given pos-
itive function. We can proceed as in Example 2.1 and choose an equidistant
mesh Xj = jh, j = 0, ... ,n+ 1, with step size h = l/(n+ 1) and n E IN. At
the internal grid points Xj, j = 1, ... , n, we replace the differential quotient
by the difference quotient
for approximate values Vj to the exact solution v(Xj). Here, we have set
Pj := p(Xj) for j = 0, ... ,n + 1. This system has to be complemented by
the two boundary conditions Vo = Vn+l = O. For an abbreviated notation
we introduce the n x n tridiagonal matrix
1 -1
A = h2
and the vector u = (VI, ... ,vn)T. Then the above system of equations,
including the boundary conditions, reads
Au = AU;
i.e., the eigenvalue problem for the differential operator D is approximated
by the eigenvalue problem for the matrix A. 0
for a linear integral operator with continuous kernel K. For the numerical
approximation we proceed as in Example 2.3 and approximate the integral
by the rectangular rule with equidistant quadrature points Xk = kin for
k = 1, ... , n. If we require the approximated equation to be satisfied only
at the grid points, we arrive at the approximating system of equations
1 n
- LK(xj,Xk)CPk
n
= ACPj, j = I, ... ,n,
k=l
for approximate values CPj to the exact solution cp(Xj). Hence, we approx-
imate the eigenvalues of the integral operator by the eigenvalues of the
matrix with entries K (x j , X k) In. Of course, instead of the rectangular rule
any other quadrature rule can be used. A discussion of the convergence of
the matrix eigenvalues to the eigenvalues of the integral operator is again
beyond the aim of this introduction. 0
Hence
n
Ax = LAk(X,Xk)Xk
k=j
and
n n
(Ax, x) = LAkl(x,Xk)12::; Aj LI(x,XkW = Aj(X,X).
k=j k=j
This implies
sup (Ax,x) <A
- j,
xEV; ( X,X )
x#O
A=
1
3 5 1
32) ,
( 214
\ . (Ax, x ) .
/\j = mm max ( ) ' J = 1, ... , n,
x, x
U;EM; xEU;
x#O
and the continuity of the function x H (Ax, x), the supremum is attained;
i.e., the maximum exists.
By Xl, X2, ... ,xn we denote orthonormal eigenvectors corresponding to
the eigenvalues Al ~ A2 ~ ... ~ An. First, we show that for a given
subspace Vj of dimension n + 1 - j there exists a vector x E Vj such that
(X,Xk) = 0, k= j + 1, ... ,no (7.1)
Let Zl, ... , Zn+l-j be a basis of V j . Then we can represent each x E Vj by
n+l-j
X= L aiZi· (7.2)
i=l
In order to guarantee (7.1), the n +1- j coefficients al, ... , a n +1-j must
satisfy the n - j linear equations
n+l-j
L ai(zi,xk) = 0, k = j + 1, ... ,no
i=l
x = L(X,Xk)Xk
k=l
we obtain that
j j
and hence
(Ax,x) ::; (Bx,x) + IIA - BI12I1xll~.
By the Courant minimum maximum principle of Theorem 7.4 this implies
and therefore
where the elements a~l' ... , a~n represent a permutation of the diagonal
elements au, ... ,ann of A such that a~ 1 2: a~2 2: ... 2: a~n·
Proof. Use B = diag(aj j ) and 11·11 = II· 112 in the preceding corollary. 0
and
kf-j
Proof. Assume that Ax = AX and IIxli oo = 1, and for x = (Xl, ... ,xn)T
choose j such that IXil = Ilxlloo = 1. Then
n n
IA-aiil = I(A-aii)xil = Laikxk ~ Llaikl,
k=l k=l
k#-i k#-i
and therefore
j=1
For the discussion of the case of equality, we first note that any unitary
transformation of a normal matrix is again normal. This is a consequence
of the identity
N(A):=
(
.t
J,k=1
lajkl
2
)
(7.4)
j#
The main idea of the Jacobi method for real symmetric matrices is to
successively reduce N(A) by elementary plane rotation matrices such that
in the limit the matrix becomes diagonal (with the eigenvalues as diagonal
entries).
7.3 The Jacobi Method 129
Lemma 7.11 For each pair j < k and each 'P E rn. the matrix
1
which coincides with the identity matrix except for Ujj = Ukk = cos'p and
Ukj = -Ujk = sin'P (and which describes a rotation in the xjxk-plane) is
unitary.
( cOS'P
sin 'P
- sin'P
cos 'P )( cos'p
- sin'P
sin 'P
cos'p ) (~ ~ )
and
( cos'p
- sin'P
sin'P ) ( C?S'P
cos'P sm'P
- sin'P
cos'p ) (~ ~ ).
Lemma 7.12 Let A be a real symmetric matrix and let U be the unitary
matrix of Lemma 7.11. Then B = U* AU is also real and symmetric and
has the entries
. 2 . 2
bj j = ajj cos 2 'P + ajk sm 'P + akk sm 'P,
. 2 'P - ajk sm
bkk = ajj sm . 2
'P + akk cos 'P,
2
i.e., the matrix B differs from A only in the jth and kth rows and columns.
Proof. The matrix B is real, since A and U are real, and it is symmetric,
since the unitary transformation of a Hermitian matrix is again Hermitian.
Elementary calculations show that
with bjj , bjk , bkj, and bkk as stated in the theorem. For i -:J j, k we have
that
n
bij = L uisasrurj = aijUjj + aikUkj = aij cos r.p + aik sin r.p
r,s=l
and
n
bik = L uisasrUrk = aijUjk + aikUkk = -aij sinr.p + aik cosr.p.
T,s=l
Finally, we have
n
bil = L uisasrUrl = ail
r,s=l
for i, l -:J j, k. o
Lemma 7.13 For
7r
r.p = "4 ' ajj = akkl
the transformation of Lemma 7.12 annihilates the elements
bjk = bkj = 0
and reduces the off-diagonal elements according to
[N(B)]2 = [N(AW - 2a;k'
Proof. bjk = bkj = 0 follows immediately from Lemma 7.12. Applying
Lemma 7.8 to the matrices
and
yields
+ 2ajk + akk = bj j + b2kk'
2
aj2 j 2 2
From this, with the aid of Lemmas 7.8 and 7.12 we find that
n n
[N(B)f = II B II} - L b;i = IIAII} - L b;i
i=l i=l
n
= [N(A)j2 + L(a;i - b;i) = [N(A)f - 2a;k'
i=l
Note that the quantities required for the computation of the elements of
the transformed matrix can be obtained by the trigonometric identities
1
=
cos2~
J1 + tan22~ ,
cos J
~ = ~ (1 + cos 2~), sin ~ = ~ (1 - J cos 2~).
The sign of the root in the expression for sin ~ has to be chosen such that
it coincides with the sign of tan 2~.
The classical Jacobi method generates a sequence (A,,) of similar matri-
ces by starting with the given matrix A o := A and choosing the unitary
transformation at the vth step according to Lemma 7.13 such that the non-
diagonal element of A"-1 with largest absolute value is annihilated. It is
obvious that the elements annihilated in one step of the Jacobi iteration,
in general, do not remain zero during subsequent steps. However, we can
establish the following convergence result.
Theorem 7.14 The classical Jacobi method converges; i.e., the sequence
(A,,) converges to a diagonal matrix with the eigenvalues of A as diagonal
elements.
[N(AW :s (n 2
- n) max a;l
i,l=l, ... ,n
i#-l
we obtain that
2 [N(A)F
a .k > ..::....,--'..-'-'-:-
J -n(n-1)
for the nondiagonal element ajk with largest modulus. Hence, from Lemma
7.13 we deduce that
where
q ._
.-
(1 _n(n-1)
2 ) 1/2
Note that for large n the value of q is close to one, indicating a slow
convergence of the Jacobi method. Writing A v = (ajk,v) by Corollary 7.6
we have the a posteriori error estimate
after performing v steps of the Jacobi method. Further error estimates can
be derived from Gerschgorin's Theorem 7.7.
Approximations to the eigenvectors can be obtained by successively mul-
tiplying the unitary transformations of each step. We have A v = Q~AQv,
where Qv = UI ... Uv is the product of the elementary unitary transforma-
tions for each step. From
Av ~ D = diag(AI,'." An)
it follows that AQv ~ QvD. Hence the columns Qv = (UI,"" un) of Qv
satisfy AUj ~ AjUj for j = 1, ... , n; i.e., they provide approximations to
the eigenvectors.
In each step, the classical Jacobi method requires the determination of
the nondiagonal element with largest modulus. In order to reduce the com-
putational costs, in the cyclic Jacobi method the nondiagonal elements are
annihilated in the order
(1,2), ... , (1, n), (2,3), ... , (2, n), (3,4), ... , (n - 1, n)
independent of their size. Convergence results can also be established for
this variant (see [27]).
A further refinement is to choose a constant threshold and to annihilate
in each cyclic sweep only those off-diagonal elements that are larger in
absolute value than the threshold. Of course, the threshold needs to be
lowered after each sweep, Le., after performing a full cycle. For details we
refer to [48, 65].
Example 7.15 For the matrix
-D
2 -1
-1 2
o -1
the first six transformed matrices for the classical Jacobi method are given
by
1.0000 0.0000 -0.7071 )
Al = 0.0000 3.0000 -0.7071 ,
( -0.7071 -0.7071 2.0000
(
0.6340 -0.2768 -0.1704 )
A3 = -0.2768 3.3864 0.0000 ,
-0.1704 0.0000 1.9796
(
0.6064 0.0000 -0.1695 )
A4 = 0.0000 3.4140 0.0169 ,
-0.1695 0.0169 1.9796
whence
n
Allvo = L akAA;xk
k=l
follows. Scaling after each step by the factor 1/Al leads to
and consequently
and
The fact that we need to find only the direction of the eigenvectors
motivates us to interpret the power method as a successive iteration of
subspaces. For
and corresponding eigenvectors Xl, X2, ... , X n . Assume that for some m
with 1 ~ m < n we have that IAm I > IAm+ll and define
SnU = {O}.
Then the orthogonal projections PAvs and PT of ce n onto A"S and T,
respectively, satisfy
and from this, with the aid of SnU = {O} and the linear independence of the
Yj,we conclude that a1 = ... = am = O. Hence, B indeed is nonsingular.
We denote the entries of the inverse of B by B- 1 = (Cjk)' Then
m
from
n
Allu = ~ akAkxk
k=m+1
we conclude that there exists a constant L > 0 such that
Am+l11l
II
I---.x:- ' = 1, ... ,m,
Ilwjll - Xj 2 ~ L j (7.6)
Analogously, we have
m
PAVSTl = ~l3kllwkll (7.9)
k=l
7.4 The QR Algorithm 137
and
m
Ll3kll(Wkll,Wjll) = (TJ,Wjll), j = 1, ... ,m. (7.10)
k=I
v E IN, (7.11)
for some constant C I . We denote the right-hand sides of (7.8) and (7.10) by
a and bll , respectively. Again from (7.6) and the Cauchy-Schwarz inequality
we have that
v E IN, (7.12)
for some constant C2 . Now, considering the linear system (7.10) as a per-
turbation of (7.8), from Theorem 5.3 we can conclude that
(7.13)
for the vectors Q: = (Q:I, ... ,Q:m)T and 1311 = (I3III, ... ,l3mll)T and some
constant C3 . From (7.7) and (7.9), using (7.6), (7.13), and the triangle in-
equality, the assertion of the lemma follows. 0
is a unitary matrix such that its first m columns represented by the matrix
Pt form a basis of T. Then P; API = 0, since T is invariant with respect
to A, and P; PI = O. Therefore, the unitary transformation yields
P* AP = ( Pt API
Pi API
denotes a unitary matrix such that its first m columns represented by the
matrix QI v form a basis of AV S, then for
B 12 ,v )
B 22 ,v
for m = 1, ... , n - 1.
for m = 1, ... , n - 1. Let qlO, ... , qnO be an orthonormal basis of {;n and
let the subspaces
Sm :=span{qlO, ... ,qmo}
satisfy
Sm n Um = {O}, m = 1, ... , n - 1.
Assume that for each v E lN we have constructed an orthonormal system
qlv, . .. , qnv with the property
and define Qv = (ql v, ... ,qnv). Then for the sequence of matrices
Av = (ajk,v) given by
(7.15)
we have convergence:
lim ajk,v = 0,
v-too
1<k <j :s n,
and
lim ajj,v
v-too
= Aj, j = 1, ... ,n.
Proof. 1. Without loss of generality we may assume that IIxjll2 1 for
j = 1, ... , n. From Lemma 7.18 it follows that
(7.16)
r := max Am+11
- - < 1.
I
m=l,... ,n-l Am
To prove this we assume to the contrary that the vectors Wl v , ..• , W nv are
not linearly independent for all sufficiently large v. Then there exists a
sequence Vt such that the vectors Wl vt , ... , w nvt are linearly dependent for
each f E IN. Hence there exist complex numbers 0u, ... ,Ont such that
n n
L 0ktWknt =0 and L IOktl 2
= 1. (7.18)
k=l k=l
Then
span{PI, ... ,Pm} = Tm , m = 1; ... , n - 1,
and by repeating the above argument,
span{Vlv, ... ,Vmv } = AVSm , m = 1, ... ,n - 1, (7.19)
it follows that
qmll = <(JmIlVmll,
= 1, ... ,no
m
For the matrices A = (aI, ... , an) and Q = (ql, ... , qn) this corresponds to
a QR decomposition
A=QR
as described in detail in Section 2.4. Now assume that All = Q~_lAQII-I
has been determined according to (7.15). To generate All+! from this, a
QR decomposition of the matrix AQII-I is required, since
A II 8 m = AA II - 18 m = span{Aql,II-I, .. "Aqm,lI-d·
142 7. Matrix Eigenvalue Problems
(7.22)
of A v by
AQv-1 = Qv-IAv = QV-IQvR v = QvR v ,
where Qv = QV-I Qv. From this we find that
(7.23)
Hence the two equations (7.22) and (7.23) represent one step of the succes-
sive iterations of subspaces as described in Theorem 7.19.
Now the QR algorithm consists in performing these iterations starting
from the canonical basis el, ... ,en, which means that in the first step a
QR decomposition is required for Al = A = (Ael,"" Ae n ).
Theorem 7.20 (QR algorithm) Let A be a diagonalizable matrix with
eigenvalues
(7.24)
and setting
A V+ I := RvQv
for v = 0, 1,2, . . .. Then for A v = (ajk,v) we have convergence:
and
lim ajj,v
v---+oo
= Aj, j = 1, ... ,no
Proof. This is just a special case of Theorem 7.19. o
We proceed with a discussion of the assumption (7.24). Define the ma-
trices X := (Xl,"" x n ) and Y := X-I = (Yjk). Then the identity I = XY
means that
n
ej = LXkYkj, j = 1, ... ,no
k=l
7.4 The QR Algorithm 143
1An-l
\ _1«1.
0-
Having reduced the elements of the last row to almost zero, the last row
and column of the matrix may be neglected. This means that the smallest
eigenvalue is deflated by canceling the last row and column, and the same
procedure can be applied to the remaining (n - 1) x (n -1) matrix with the
parameter 0- changed to be close to An-I. This so-called shift and deflation
strategy leads to a tremendous speeding up of the convergence. For details
we refer to [27, 65].
The computational costs of one step of the QR algorithm is reduced when
the matrix has a large number of zero entries. For example, for tridiagonal
matrices all matrices generated in the QR algorithm remain tridiagonal. In
the following section we will consider so-called Hessenberg matrices, which
differ from upper triangular matrices only by a non-zero first subdiagonal. It
can be shown (see Problem 7.16) that the Hessenberg form is also invariant
with respect to the QR algorithm. Hence, for practical computations it is
convenient first to transform the matrix into Hessenberg form.
In general, comparing the computational costs, for symmetric matrices
the QR algorithm is superior to the Jacobi method. However, the actual
programming for the Jacobi method is very simple as compared with the
QR algorithm. Hence for small matrix size n the Jacobi method is still
attractive.
A= (~ll A)'
where A is an (n - 1) x (n - 1) matrix and (it an (n - 1) vector. Then
considering a Householder matrix HI of the form
7.5 Hessenberg Matrices 145
AHl• = (~ll * )
al AHi
and
HlAH; = (;ll~l Hl~Hi)'
As shown in the proof of Theorem 2.13, choosing
where
Ul = al =fO'(l,o,oo.,of
and
0' = { I:~~ IJaial , a21 i 0,
Jaial , a21 = 0,
eliminates all elements of al with the exception of the first component.
Hence the first column of the transformed matrix is of the required form.
Now assume that A k is an n x n matrix of the form
Hk = (~ H~-k)'
where Ik denotes the k x k identity matrix and Hn - k is an (n - k) x (n - k)
Householder matrix, it follows that
AkHic = ( Bk
ak ° *
An-kH~_k )
and
HkAkHic = ( ° Hn-kak
Bk *
Hn-kAn-kH~_k
).
Now, proceeding as above, we can choose Hn - k such that all elements of
Hn-kak vanish with the exception of the first component. This procedure
reduces a further column into Hessenberg form. We can summarize our
analysis in the following theorem.
146 7. Matrix Eigenvalue Problems
al C2
C2 a2 C3
C3 a3 C4
A=
and
p~(A) = (ak - A)p~_1 (A) - C%P~_2(A) - Pk-I (A), k = 2, ... , n, (7.26)
starting with PO(A) = 1 and pdA) = al - A.
Proof. The recursion (7.25) follows by expanding det(A k - AI) with respect
to the last column, and (7.26) is obtained by differentiating (7.25). 0
Example 1.24 The n x n tridiagonal matrix
2 -1
-1 2 -1
-1 2 -1
A=
-1 2-1
-1 2
has the eigenvalues
\ 4' 2 j7r . 1
Aj = sm 2(n+l) , J= , ... ,n
7.5 Hessenberg Matrices 147
(see Example 4.17). Table 7.1 gives the results of the Newton iteration using
(7.25) and (7.26) for computing the smallest eigenvalue Amin = Al and the
largest eigenvalue Amax = An for n = 10. The starting values are obtained
from the Gerschgorin estimates IA - 21 ~ 2 following from Theorem 7.7. 0
Amax Amin
4.00000000 0.00000000
3.95000000 0.05000000
3.92542110 0.07457890
3.91933549 0.08066451
3.91898705 0.08101295
3.91898595 0.08101405
3.91898595 0.08101405
bn,n-l~n-l + (b nn - A)~n = 0,
and ~n = 1. This is an n x n upper triangular linear system for the n
unknowns 0:,6, ... ,~n-l, and it can be solved by backward substitution.
Setting
bll - A b12 .. b1,n-l
C = b21 21
b - A : : b2,~-1
(
bn,n-l
by Cramer's rule we have that
1
1 = ~n = detC = (-1)n- b21 ·· ·bn,n-IO:
det(B - AI) det(B - AI)
148 7. Matrix Eigenvalue Problems
that is,
p(A) = (_1)n-lb21···bn,n_lO(A).
Differentiating the last equation yields
p' (A) = (_1)n-l b21 ... bn,n-l 0' (A),
and therefore
p(A) O(A)
P'(A) = O'(A)'
By differentiating the above linear system with respect to A we obtain
the linear system
(b l l - A)TJI + + b1,n-ITJn-1 = 6 + (3,
+ b2 ,n-ITJn-l = 6,
bn,n-ITJn-1 = ~n
for the derivatives (3 = 0', TJI = ~~, ... , TJn-1 = ~~_I' This linear sys-
tem again can be solved by backward substitution for the n unknowns
(3, TJI, ... , TJn-l· Thus we have proven the following theorem.
Theorem 7.25 Let B = (bjk) be an irreducible Hessenberg matrix and let
A E <C. Starting from ~n = 1, TJn = 0, compute recursively
~n-k = bn-k+l,n-k
1 t bn-k+l,j~j},
{A~n-k+1 - j=n-k+1
TJn-k =b 1
n-k+l,n-k
t bn-k+I,jTJj}
{~n-k+l + ATJn-k+l - j=n-k+1
for k = 1, ... , n - 1 and
n
0= -A6 + 2:>lj~j,
j=l
n
(3 = -6 - ATJI + I>ljTJj.
j=1
Then for the characteristic polynomial of B we have
p(A) 0
p'(A) = -g .
Problems 149
Problems
7.1 For the eigenvalues (repeated according to their algebraic multiplicity) of
an n x n matrix A show that
n
LAj
n
7.2 For Example 7.1 show that in the case p = 0 the eigenvalues of the matrix
A converge to the eigenvalues of the differential operator D as n ~ 00.
7.3 Show that the eigenvalues of the adjoint matrix A· are the complex conju-
gate of the eigenvalues of the matrix A.
7.4 Show that for the eigenvalues of Hermitian matrices the geometric and the
algebraic multiplicities coincide.
To check the estimates, compute the eigenvalues by finding the zeros of the char-
acteristic polynomial.
7.8 Write a computer program for the Jacobi method and test it for various
examples.
7.10 Show convergence ofthe cyclic Jacobi method with threshold [N(AW j(2n 2 ).
7.11 Let A be a diagonalizable n x n matrix with eigenvalues AI, ... ,An and
eigenvectors Xl, ... ,X n , and assume that lAd> IA21 ~ IA31 ~ ... ~ IAnl. Starting
from Vo E <en with Vo If. span{X2, ... , X n } show that the sequence
Av v v = 0, 1,2, ... ,
Vv+l := IIAv v l12 '
150 7. Matrix Eigenvalue Problems
(Avv,vv)
Rv : = -'--:c,.--,:-=--'- v = 0, 1,2, ... ,
IIvvll~
satisfies the estimate
v
IRv-Ad:::;Cr , v=0,1,2, ... ,
1.13 Write a computer program for the QR algorithm and test it for various
examples.
1.14 Verify the numerical results of Table 5.2 for the Hilbert matrix.
1.16 Show that the Hessenberg form of a matrix is preserved by the QR algo-
rithm.
1.11 Show that the number of multiplications required for the transformation
of a matrix into Hessenberg form via Householder transformations according to
Theorem 7.22 is 5n 3 /3 + O(n 2 ).
1.18 Write a computer program for transforming a matrix into Hessenberg form
via Householder transformations according to Theorem 7.22.
j=1
for a real (or complex) variable x and with real (or complex) coefficients
ao, ... ,an' A polynomial p E Pn is said to be of degree n if an =I- O. In
this chapter, we consider Pn as a subspace of the linear space C[a, b] of
continuous real- (or complex-) valued functions on the interval [a, b], where
a < b. For m E IN we denote by Cm[a, b] the linear space of m times
continuously differentiable real- (or complex-) valued functions on [a, b].
We recall the following basic uniqueness property of algebraic polynomi-
als as part of the fundamental theorem of algebra. Since we will use this
property frequently, it is appropriate to include a simple proof by induction.
Theorem 8.1 For n E IN U{O}, each polynomial in P n that has more than
n (complex) zeros, where each zero is counted repeatedly according to its
multiplicity, must vanish identically; i. e., all its coefficients must be equal
to zero.
Proof. Obviously, the statement is true for n = O. Assume that it has been
proven for some n 2: O. By using the binomial formula for x k = [(x-z)+z]k
we can rewrite the polynomial p E Pn+l in the form
n+l
p(x) = L bk(x - Z)k + bo
k=l
with the coefficients bo , bl , ... ,bn+t depending on ao, at, . .. ,an+1 and z.
If z is a zero of p, then we must have bo = 0, and this implies that
p(x) = (x - z)q(x) with q E Pn . Obviously, q has more than n zeros,
since p has more than n + 1 zeros. Hence, by the induction assumption, q
must vanish identically, and this implies that p vanishes identically. 0
that is,
n
L:akxk = 0, x E [a,b].
k=O
Then the polynomial with coefficients ao, aI, ... ,an has more than n dis-
tinct zeros, and from Theorem 8.1 it follows that all the coefficients must
be zero. 0
The linear independence of the monomials Uo, ... , Un implies that they
form a basis for Pn and that Pn has dimension n + 1.
Theorem 8.3 Given n + 1 distinct points Xo, . .. ,Xn E [a, b] and n + 1
values Yo, ... ,Yn E nt, there exists a unique polynomial Pn E Pn with the
properly
Pn(Xj)=Yj, j=O, ... ,n. (8.1)
In the Lagrange representation, this interpolation polynomial is given by
(8.2)
°
hold, where 8jk = 1 for k = j, and 8jk = for k ::J j. It follows that Pn
given by (8.2) is in Pn , and it fulfills the required interpolation conditions
Pn(Xj) = Yj, j = 0, ... ,no
To prove uniqueness of the interpolation polynomial we assume that
Pn,l, Pn,2 E Pn are two polynomials satisfying (8.1). Then the difference
Pn := Pn,l - Pn,2 satisfies Pn(Xj) = 0, j = 0, ... , n; i.e., the polynomial
Pn E Pn has n + 1 zeros and therefore by Theorem 8.1 must be identically
zero. This implies that Pn,l = Pn,2' 0
n large the Lagrange factors become very large and highly oscillatory, which
causes ill-conditioning of the Lagrange interpolation polynomial. Already
in 1676, in his study of quadrature formulae (see Theorem 9.3), Newton
had obtained a representation of the interpolation polynomial that is more
practical for computational purposes. For its description we need to give
the following definition.
D oj '.-
- Yj, j =O, ... ,n,
Dk- l _ Dk- l
k.=
DJ '
j+l j
,
.
J= 0 , ... ,n-,
k k = 1, ... ,n.
Xj+k - Xj
We notice that the points Xo, ... ,Xn need not be in ascending order. It
is convenient to arrange the divided differences according to the tableau
Xo Yo = D8
DA
Xl Yl = D? D 02
D 1
1
D30
X2 Y2 = D~ D 21
D 2l
X3 Y3 = Dg
which we illustrate by the following example. Obviously, for the full tableau
the computational cost is of order O(n 2 ).
j+k j+k 1
Dj = L . Yrn II
.. Xrn - Xi
' j = 0, ... ,n - k, k = 1, ... ,n. (8.4)
rn=J '=J
i#rn
we obtain
+ Yj+k
j+k-l
D 1
Xj+k - Xi + Yj i!!l
j+k 1
-xJ-·---X-i = l;
j+k
Ym g
j+k
i#m
1
-x-m---X-i
and therefore
dn(Xj) =0, j=O, ... ,n-l.
Hence, by Theorem 8.1 it follows that dn = 0, and therefore Pn = Pn. 0
Analogously to the Horner scheme (see (6.10)), the value of the Newton
interpolation polynomial at a point x can be obtained by nested multipli-
cations according to
(8.6)
proven for degree k - 1 for some k ~ 2. Then the right-hand side of (8.6)
describes a polynomial P E Pk, and by the induction assumption we find
that the interpolation conditions
(x· - Xi)Y· - (x· - Xi+k)Y·
p(Xj)= J J J J =Yj, j=i+I, ... ,i+k-I,
XHk - Xi
as well as p(Xi) = Yi and p(Xi+k) = Yi+k are fulfilled. o
The main application of polynomial interpolation consists in the approx-
imation of continuous functions I : [a, b) -+ IR. In this case, given n + 1
distinct points Xo, . .. , Xn E [a, b], by
L n : C[a, b) -t Pn
(Rnf)(x) =
I(n+l)(~)
(n + I)! !!
n
(x - Xj), x E [a, b), (8.8)
with the step width h = Xl -xo. For the polynomial q2(X) = (x-xo)(x-xd
we have that
h2
max
xE[xQ,xI]
Iq2(X)1 = -4 .
Therefore, by Corollary 8.11, the error occurring in linear interpolation of
a twice continuously differentiable function f can be estimated by
h2
I(Rd)(x)1 ~ -8 max
yE[xQ,xI]
1!"(y)l, x E [xo,xd· (8.9)
For example, the error in linear interpolation with step size h = 0.01 for
the sine function is less than or equal to h 2 /8 = 0.0000125. 0
and
8.1 Polynomial Interpolation 159
(see [16]) ensures that for each f E CIa, b] there exists a sequence of poly-
nomials Pn E Pn such that IIPn - flloo -+ 0 as n -+ 00. As a consequence of
the Chebyshev alternation theorem from approximation theory (see [16]),
for the uniquely determined best approximation Pn to f in the maximum
norm with respect to Pn, the error Pn - f has at least n + 1 zeros in [a, b].
Then taking the sequence of these zeros as the sequence of interpolation
points implies the statement of the theorem. 0
(8.10)
Proof. Obviously, the polynomial P2n+l belongs to P2n +l , since the Hermite
factors have degree 2n + 1. From (8.3), by elementary calculations it can
be seen that (see Problem 8.7)
From this it follows that the polynomial (8.11) satisfies the Hermite inter-
polation property (8.10).
To prove uniqueness of the Hermite interpolation polynomial we assume
that P2n+l,1, P2n+l,2 E P2n+l are two polynomials having the interpolation
property (8.10). Then the difference P2n+l := P2n+l,1 - P2n+l,2 satisfies
i.e., the polynomial P2n+l E P 2n+l has n + 1 zeros of order two and there-
fore, by Theorem 8.1, must be identically equal to zero. This implies that
P2n+l,1 = P2n+l,2· 0
1(2n+2)W n 2
(Rnf)(x) = (2n + 2)! II
(x - Xj), x E [a, b], (8.13)
)=0
for some T > 0. For example, functions defined on closed planar or spatial
curves always may be viewed as periodic functions. Polynomial interpola-
tion is not appropriate for periodic functions, since algebraic polynomials
162 8. Interpolation
with real (or complex) coefficients ao, ... , an and bl , ... , bn . A trigonomet-
ric polynomial q E Tn is said to be of degree n if lanl + Ibnl > o.
From the addition theorems for the cosine and sine functions it follows
that qlq2 E T n1 + n2 if ql E Tn. and q2 E T n2 . This justifies speaking of
trigonometric polynomials.
Theorem 8.21 A trigonometric polynomial in Tn that has more than 2n
distinct zeros in the periodicity interval [0,271") must vanish identically; i.e.,
all its coefficients must be equal to zero.
Proof. We consider a trigonometric polynomial q E Tn of the form
n
q(t) = a; + L[ak cos kt + bk sin kt]. (8.14)
k=l
Setting bo = 0,
p(z):= L IkZn+k,
k=-n
we have the relation
q(t) = z-np(z).
8.2 Trigonometric Interpolation 163
Theorem 8.22 The cosine functions Ck(t) := cos kt, k = 0,1, ... , n, and
the sine functions Sk(t) := sin kt, k = 1, ... , n, are linearly independent in
the function space e[O, 27r].
that is,
n n
L ak cos kt + L bk sin kt = 0, t E [0,27r].
k=O k=1
Then the trigonometric polynomial with coefficients ao, ... , an and bl , ... , bn
has more than 2n distinct zeros in [0, 27r), and from Theorem 8.21 it follows
that all the coefficients must be zero. Note that this linear independence
also can be deduced from Theorem 3.17. 0
Theorem 8.22 implies that the cosines Ck, k = 0,1, ... , n, and sines
Sk, k = 1, ... ,n, form a basis for Tn and that Tn has dimension 2n + 1.
Theorem 8.23 Given 2n+ 1 distinct points to, ... , t2n E [0,27r) and 2n+ 1
values Yo, ... ,Y2n E JR., there exists a uniquely determined trigonometric
polynomial qn E Tn with the property
(8.17)
(8.18)
2n . t - ti
sm--
fk(t) = II 2
. tk - ti '
k = 0, .. . ,2n.
i=O sm---
i#k 2
164 8. Interpolation
Proof. The function qn belongs to Tn, since the Lagrange factors are trigono-
metric polynomials of degree n. The latter is a consequence of
. t - to . t - tl tl - to t1 + to)
sm -2- sm -2- = 2"1 cos --2- -
1 (
2" cos t - --2- ;
°,
2n 1 _ ei(2n+l)tk
ijtk
L...J e
'"' = .
1 - ettk =
j=O
Assume that the coefficients rk solve (8.20). Then, with the aid of (8.19),
we obtain
2n n 2n
LYj e- imtj = L rk L ei(k-m)tj = (2n + Ibm;
j=O k=-n j=O
8.2 Trigonometric Interpolation 165
2n
'Yk =2 ~1 LYj e-
iktj
, k = -n, ... , n. (8.21)
n j=O
On the other hand, again with the aid of (8.19), for 'Yk given by (8.21) we
have that
n 1 2n 2n
'"" 'Yk e iktj
L.J
= 2n + 1 '""
L.J
Ym '"" eik(tj - t = Yj,
L.J
m
} j = 0, ... , 2nj
k=-n m=O k=O
i.e., the linear system (8.20) has a unique solution, which is given by (8.21).
From this, using the relation (8.15) between the real representation (8.14)
and the complex representation (8.16) of trigonometric polynomials, we
derive the following theorem.
21fj )
qn ( 2n + 1 = Yj, j = 0, ... , 2n.
Its coefficients are given by
2 2n 21fjk
ak = 2n + 1 L Yj cos 21
j=O n+
' k = O, ... ,n,
2 2n 2 'k
bk '"" . 1fJ k = I, ... ,n.
- 2n + 1 L.J Yj sm 21 '
j=O n+
qn(t ) aO + LJ
="2 ~[ .
ak coskt + bk smkt1+ 2'
an cosnt
k=l
1 'k
L
2n-l
bk =- Yj sin 7l'J , k = 1, ... , n - 1.
n j=O n
Obviously, the trigonometric interpolation polynomials of Theorems 8.24
and 8.25 may be viewed as discretized versions of the Fourier series, where
the integrals giving the coefficients of the Fourier series (see Problem 3.20)
are approximated by the rectangular quadrature rule at an equidistant grid
(see Corollary 9.27). Therefore, trigonometric interpolation on an equidis-
tant grid is also known as the discrete Fourier transform.
An effective numerical evaluation of trigonometric polynomials can be
done analogously to the Horner scheme for algebraic polynomials. For the
polynomial
n
p(z) = LCkZk
k=O
the recursion (6.10) of the Horner scheme has the form
starting with bn = Cn, and it delivers p(z) = boo Assuming that the coeffi-
cients Ck are real, we substitute z = e it and separate into real and imaginary
parts, bk = Uk + iVk, to obtain Un = Cn, Vn = 0, and the recursion
k = 0, ... ,n - 1. (8.22)
Let m := n/2 = 2P - 1
and W := e- 27ri / n . Then wn 1, w m = -1, and
(8.22) reads
Ck = -1 ~ k·
L.., YjW J, k = 0, ... ,n - 1.
n j=O
Now, the basic idea of the Cooley-Thkey algorithm is to break this sum
into two parts for j even and j odd; i.e.,
11
Ck = 2 rk + 2 W k <5k , k = 0, ... , n - 1,
where
rk := -
1" m-l
<5 k := -1 L..,
" Y2j+lW 2k
J , k = 0, ... ,n-1.
m j=O m j=O
M p = p2 P+l = 2n log2 n,
168 8. Interpolation
i.e., that the computational cost is reduced significantly from order O(n Z )
to order O( n logz n).
The actual numerical implementation is based on writing the indices k
and j in a binary representation
p-l p-l
q
k = [ko, ... ,kp-d = L kq2 , j = [jo, ... ,jp-d = L jq2 q
q=O q=O
since
for q + r 2: p. Inserting this into (8.22), we can split the long sum into p
nested short sums, and the Fourier transform becomes
1 ..
~ _ 2''']p_1 k
X .•. x L...J e 2 0 Y[jo, ... ,jp-d'
jp_1=0
L
jp_q=o
1
k
L
_ 21rijp_l
X··· x e 2 0 Y[jo, ... ,jp- d
jp-1=0
for q = 1, ... ,p and jo, ... ,jp-q-l, kq- 1 , ..• , k o E {O, I}. Then clearly,
1
c[ko, ... ,k p -1 1-- -n Sp[k p -1, ... ,koJ' (8.23)
and setting
SOl.
Jo, ... ,Jp-l . l'
. J = Y[J' o,···,)p-l
we have the recursive relation
S[q]O,··.,Jp-q-l,
. k q-1,'''' k 0 ] = S[q-l.
Jo,···,}p-q-l,
Ok
I
k 1
q-2,···, 0
°°
From the error estimate (8.9) for linear interpolation, we see that for piece-
wise linear interpolation we have uniform convergence IISn - flloo -+
for n -+ 00 on [a, b], provided that h := maxj=l .... ,n IXj - xj-ll -+ and
f E C 2 [a, b]. The main advantage of this method is its simplicity and its
stability with respect to errors in the interpolation values. However, since
by (8.9) linear interpolation has an error only of order O(h 2 ), for achieving
a prescribed accuracy it usually requires a much finer discretization than
some of the higher-order methods described below.
m
X+ .-
.- {
0, x < 0,
for m E IN. The m +n functions
Uk(X):= (x-xo)k, k=O, ... ,m,
(8.24)
Vk(X) := (x - Xk)+, k = 1, ... ,n -1,
are linearly independent. In order to see this, let
m n-l
L akUk + L {3k vk = O.
k=O k=l
Then, in particular,
m
for j = 1, ... ,n. This is true for j = 1, since on [xo, Xl] the spline s coincides
with an element of Pm. Now assume that we have the representation (8.25)
for some j 2: 1. Then the difference
m j-l
p(x) := s(x) - L Uk(X - xo)k - L f3k(X - Xk)~
k=O k=l
restricted to the interval [Xj, Xj+I] is in Pm. Since the spline s is in C m - l [a, b]
and p vanishes on [xo, Xj] we have that
p(i)(Xj) = 0, i = 0, ... ,m-1.
Hence p(x) = f3j(x - Xj)+- on [Xj,xj+d for some constant f3j, and because
°
(x - Xj)+- = on [xo,Xj], the representation (8.25) is proven for j + 1. 0
where
R:= ib[j(l)(X) - s(l)(x)]s(l)(x)dx.
R = (_I)l-l i= l
j=l
Xj
Xj-1
[f'(x) - s,(x)]s(m)(x) dx
Xj
n
(_I)l-l 2:)/(x) - s(x)]s(m) (X) =0,
j=l
Xj-l
l b
[s(l) (xWdx = O.
This implies that s(l) = 0, and therefore s E Pl - 1 on [a, b]. Now the bound-
ary conditions s(j) (a) = 0, j = 0, ... , e- 1, yield s = o. 0
From the proof it can be seen that Lemmas 8.28 and 8.29 remain valid
if the boundary conditions (8.27) are replaced by
s(l+j) (a) = s(l+j) (b) = 0, j = 0, ... ,e - 2,
or, provided that 1 is periodic with period b-a, by the periodicity condition
s(j)(a) = s(j)(b), j = 1, ... , e- 1.
Consequently, the following conclusions drawn from Lemma 8.29 are also
true for these two end conditions. However, from a practical point of view
only the latter modification is of relevance.
Theorem 8.30 Let m = 2f - 1 with e E IN and e ~ 2. Then, given n + 1
values Yo, ... , Yn and m - 1 boundary data al,· .. , al-l and b1, ... ,bl- 1 ,
there exists a unique spline s E S;:' satisfying the interpolation conditions
m n-l
m n-l
j
L aku~) (a) + L (3k vk )(a) = aj, j = 1, ... , £ - 1, (8.32)
k=O k=l
m n-l
Laku~)(b)+L(3kvkj)(b)=b j , j=I, ... ,£-I,
k=O k=l
I, Ixi :S 0.5,
Bo(x) :=
{ 0,
Ixi > 0.5,
and define recursively
Bm+i(x):= 1-1x+!
2
Bm(y)dy, XEIR, m=O,I, .... (8.33)
Ixl :S 1,
(8.34)
Ixl ~ 1,
0, Ixl ~ 1.5,
0, Ixl ~ 2.
Proof. This is trivial for m = 0, and we assume that it has been proven for
degree m - 1 for some m 2:: 1. Let
m
LCXkBm(X - k) = 0, x Elm . (8.38)
k=O
Then, with the aid of (8.33), differentiating (8.38) yields
a i:~!! Bm(x) dx = 0.
This finally implies a = 0, since the B m are nonnegative, and the proof is
finished. 0
X - a - hk)
Bm,k(X) := B m ( h ' x E [a, b], (8.39)
The use of the B-splines as a basis opens up another possibility for the
computation of an interpolating spline. We only consider the case m = 3,
i.e., cubic splines. From (8.36) we note that
B~(O) = 0, B~(±l) = 1= ~ •
176 8. Interpolation
8(X) = 2::
n+1
O:k
( X - Xk )
B3 - h - , x E [a, b], (8.40)
k=-l
1 2 1
{; O:j-1 + :3 O:j + {; 0:)+1 = Yj, j = 0, ... , n, (8.41 )
h3 / 2 2 111"112,
III - 81100 ::; -2- 111'% and III' - 8'1100 ::; h
1
/
Proof. The error function r := f - 8 has n + 1 zeros XO, ••• , X n . Hence, the
distance between two consecutive zeros of r is less than or equal to h. By
Rolle's theorem, the derivative r' has n zeros with distance less than or
equal to 2h. Choose z E [a,b] such that Ir'(z)1 = Ilr'lIoo. Then the closest
zero ( of r' has distance I( - zl ::; h, and by the Cauchy-Schwarz inequality
we can estimate
Choose x E [a, b] such that Ir(x)1 = IIrll oo . Then the closest zero ~ of r
has distance I~ - xl ~ h/2, and we can estimate
Hence, the cubic spline (8.40) has second derivatives given by the difference
formula
s" (Xj) = ~2 [OJ-l - 20j + OJ+l], j = 0, ... , n. (8.42)
for j = 1, ... , n - 1,
and
From this and the linear system (8.41), for the special case of the interpo-
lation conditions (8.26) and the boundary conditions (8.27), it follows that
the n + 1 values of s" at the grid points satisfy the system
48"(XO) + 2s"(xt} = Fo ,
since trivially £1 r = O.
8.4 Bezier Polynomials 179
are called Bernstein polynomials of degree n for the interval [a, b].
Some basic properties of Bernstein polynomials are described in the fol-
lowing theorem.
Theorem 8.37 The Bernstein polynomials are nonnegative on [0,1] and
provide a partition of unity; i.e.,
(8.48)
and n
L BI: (t) = 1, t E JR.. (8.49)
k=O
They satisfy the relations
and
B[;(t) = (1 - t)B[;-l(t), B~(t) = tB~=i(t) (8.51)
°
for all t E JR. and n E IN. The point t = is a zero of BI: of orner k, and
t = 1 is a zero of order n - k. Each of the polynomials BI: assumes its
maximum value only at t = kin. They satisfy the recursion relation
(8.52)
for n E IN and k = 1, ... , n - 1. The polynomials BEf, ... ,B;: form a basis
of Pn ·
Proof. The first five properties are obvious. The statement on the maximum
of BI: is a consequence of
Then
dj
L bk dt
n
j B'k(t) = 0, t E [0,1]'
k=O
and therefore
n
p(x) =L bkB'k(X; a, b), x E [a, b], (8.53)
k=O
is the smallest convex set containing the points bo, ... , bn (see Problem
8.19).
For computing the derivatives of a Bezier curve we first note that
182 8. Interpolation
implies that
-n B 0n - 1
, k = 0,
n(Bk-I
n- 1 _ B kn - I)
, k = 1, ... , n - 1, (8.54)
n- I
n B n-I' k=n.
With this identity we are ready to establish the following theorem.
Theorem 8.39 Let
n
p(t) = I: bkB'k(t), t E [0,1],
k=O
n-(j+l)
n!
[n - (j + I)]!
I: 6 i +lbk B;-(j+l) (t),
k=O
From Corollary 8.40 we note that pU) (0) depends only on bo , ... ,bj and
that pU)(l) depends only on bn - j , ... , bn . In particular, we have that
(8.55)
i.e., at the two endpoints the Bezier curve has the same tangent lines as
the Bezier polygon. Through the affine transformation (8.46) these results
on the derivatives carryover to the general interval [a, b].
bo bo
Figure 8.2 illustrates by two Bezier polynomials of degree two in rn? how
the shape of the curve is influenced by the location of the control points bi.
From (8.55) we also observe how to patch two Bezier polynomials of degree
two together smoothly such that the tangent lines at the joints coincide,
i.e., such that the two polynomials match up to a Bezier spline of degree
two. The Bezier polynomials have the same tangent lines at the joints if
the Bezier polygons do. This is illustrated by Figure 8.3.
bo
(8.57)
for i = 0, ... ,n- k and k = 1, ... ,n.
Proof. We insert the recursion formulae (8.51) and (8.52) for the Bernstein
polynomials into the definition (8.56) for the subpolynomials and obtain
k-l
b~(t) = biB~(t) + L bi+jBj(t) + biHB~(t)
j=l
k-l k
Since bo(t) = p(t), starting the recursion with b2(t) = bk, from (8.57) we
can compute p(t) by successive convex combinations of the Bezier points
bo, ... , bn , which clearly is a numerically stable procedure. Since (8.57) is
similar in structure to the divided differences in Definition 8.4, the compu-
tations can be arranged in a tableau analogous to the one for the divided
differences.
From the coefficients of the de Casteljau tableau we can construct two
Bezier polynomials on the subintervals [0, t] and [t,l] that coincide with
the original Bezier polynomial on the full interval [0,1].
8.4 Bezier Polynomials 185
j j
= (;)x (1- x)n- .
Problems
8.1 Let UI, ... ,U n E C[a, b) be linearly independent and let XI, ... ,X n E [a,b)
be distinct. For given values YI, ... ,Yn E JR consider the interpolation problem
of finding a function U E Un := span{ UI, ... ,Un} with the property
8.3 Write a computer program for the Neville scheme of Theorem 8.9.
8.5 Let Xo, ... , Xn E JR be n + 1 distinct points. Show that the Vandennonde
matrix V with entries (xj) for j, k = 0, 1, ... n has determinant
8.8 Prove Theorem 8.19, i.e., the representation of the remainder in Hermite
interpolation.
8.11 For the trigonometric interpolation from Theorem 8.24 with 2n+l equidis-
tant interpolation points show that the Lagrange factors are given by
where
1 sin(n+~+I)t
F(t):= --1 t
n+ sin-
2
for t i- 0, ±271", ±471", .... Prove that
8.12 For the trigonometric interpolation from Theorem 8.24 with 2n+l equidis-
tant interpolation points show that
for all n E IN and all continuous 271"-periodic functions 1 and use the Weierstrass
approximation theorem for periodic functions.
8.13 For the trigonometric interpolation from Theorem 8.24 with 2n+ 1 equidis-
tant interpolation points show that
for n = 1,2, ... and k = 0, ±1, ±2, ... , and use the fact that the Fourier series
for continuously differentiable functions is uniformly convergent.
8.15 Given n distinct points Zl, ... , Zn (/. la, b], n distinct points Xl, ... ,X n in
fa, b], and n values Yl, ... ,Yn E JR, show that there exists a unique function of
the form
n
U(X) = L: -ak-
Xk + Zk
k=l
8.17 Use the fact that the second derivative of a cubic spline is a piecewise linear
function to derive the linear system (8.43) without using the B-spline (8.36).
Hint: On each subinterval integrate the piecewise linear function for s" twice and
eliminate the integration constants through the interpolation conditions. Then
use the continuity of s' to obtain the linear system.
k
L nB;:(t) = t,
n
t E JR.,
k=O
and
n 2
L
k=O
k B kn (t) -
-
n2
n --1t 2
_ -
n
t
+-,
n
t E JR..
8.20 Give the Bezier representation of the (cubic) Hermite factors of Theorem
8.18 for the case of two interpolation points. Draw the graphs of the Hermite
factors and their Bezier polygons.
9
Numerical Integration
QU) := l b
f(x) dx (9.1)
l
by
b
QnU) := (Lnf)(x) dx (9.3)
ak = I 1
qn+I (Xk )
l
a
b
qn+I(x) dx,
X - Xk
k = 0, ... ,n, (9.4)
with
b lb II
ak =
la
lk(X)dx =
a.
n
)=0
j#
x
-x
Xk - Xj
j dx,
(9.5)
for all P E P n .
Proof. From (9.3) and LnP = P for all P E Pn it follows that
i.e., the quadrature is exact for all P E Pn . On the other hand, from (9.5)
we obtain
f
k=O
akf(xk) = f
k=O
ak(Lnf)(Xk) = l a
b
(Lnf)(x) dx
Xk = a + kh, k = 0, ... , n,
and step width h = (b-a)/n is called the Newton-Cotes quadrature formula
of order n. Its weights are given by
ak =h
(_l)n-k
k! (n _ k)! 10
r IIn .
(z - J) dz, k = O, ... ,n, (9.6)
)=0
j#
and
aO + al = j1 dx =
-I
2,
-aD + al = jl -I
X dx = 0,
and imply that ao = al = 1. Hence, for a general interval the trapezoidal
rule has the form
1
6 b-a h
a f(x) dx ~ -2- [f(a) + f(b)) = "2 [f(xo) + f(xd)·
-aD + az = jl xdx = 0,
-I
9.1 Interpolatory Quadratures 193
which imply that ao = a2 = 1/3 and al = 4/3. Hence, for a general interval,
Simpson's rule is given by
Jrf(x)
b
b- a[
dx~ -6- f(a) + 4f (a-2-
+ b) + f(b) ] =:3h[f(xo)+4f(xt}+ f(X2)]'
a
n ak
1 1
1 - - Trapezoidal rule
2 2
1 4 1
2 - - Simpson's rule
3 3 3
3 9 9 3
3 - - - Newton's three-eights rule
8 8 8 8
14 64 24 64 14
4 - - - - - Milne's rule
45 45 45 45 45
Jar f(x) dx -
b
b- a h3
-2- [/(a) + I(b)] = -12 I"(~) (9.7)
is given by
Since the first factor of the integrand is nonpositive on [a, b] and since
by I'Hopital's rule the second factor is continuous, from the mean value
theorem for integrals we obtain that
b
E 1 (J) = f(z) - (Ld)(z) {a (x - a)(x - b) dx
(z-a)(z-b) ia
for some z E [a, b]. From this, with the aid of the error representation for
linear interpolation from Theorem 8.10 and the integral
{b h3
i (x - a)(x - b) dx = -6 '
a
both the integral and the value obtained from Simpson's rule are zero.
Hence, this polynomial of degree three is integrated exactly by Simpson's
rule.
b a[f(a) + 4f (a-2-
ia{b f(x) dx - --i- + b) + f(b) ] = - 90
h5 f(4)(~) (9.8)
l
is given by
b
E 2(f) = [J(X) - (Ld)(x)]dx. (9.9)
and consequently
l
b 2 f(x) - p(x)
E2(f) = (X - xo)(x - Xl) (X - X2) ( )( )2( ) dx.
a X - Xo X - Xl X - X2
As in the proof of Theorem 9.4, the first factor of the integrand is non-
positive on [a, b], and the second factor is continuous. Hence, by the mean
value theorem for integrals, we obtain that
E 2(f) = (
z- Xo
f(z) - p(z)
)( )2(
z - Xl Z - X2
)
lb
a
2
(X - xo)(x - xt} (X - X2) dx
for some z E [a, b]. Analogous to Theorem 8.10, it can be shown that
f(4)(0
f(z) - p(z) = -4-!- (z - xo)(z - XI)2(Z - X2)
for some ~ E [a, b]. From this, with the aid of the integral
(b - a)5
120
we conclude the statement of the theorem. o
196 9. Numerical Integration
Jro
l
In2= ~
1 +x
by the trapezoidal rule yields
In 2 ~~ [1 + ~] = 0.75.
For f(x) := 1/(1 + x) we have
~~ 111"1100 = ~ ,
and hence, from Theorem 9.4, we obtain the estimate lin 2 - o. 751 ~ 0.167
as compared to the true error In 2 - 0.75 = -0.056 ....
Simpson's rule yields
In 2 ~ 6"
1[1 + 1 +4t + 21] = 3625 = 0.6944 ... ,
and from Theorem 9.5 and
h5 (4) _ _1_
90 IIf 1100 - 120
we find the estimate lin 2 - 0.69441 ~ 0.0084 as compared to the true error
In 2 - 25/36 = -0.0012. . .. 0
n min j"(x) ~
:tEla,b] k=l :tEla,b]
and the continuity of j" we conclude that there exists ~ E [a, bj such that
L j"(~k) = nj"(~),
k=l
for f E era, bj. Its error can be represented and estimated as follows.
Theorem 9.8 Let f : [a, bj -t lR be four-times continuously differentiable.
Then the error for the composite Simpson's rule is given by
Proof. Using Theorem 9.5, the proof is analogous to the proof of Theorem
9.7. 0
Table 9.2 gives the error between the exact value of the integral from Ex-
ample 9.6 and its numerical approximation by the composite trapezoidal
rule and the composite Simpson's rule. Clearly, if the number n of quadra-
ture points is doubled, i.e., if the step size h is halved, then the error for
the trapezoidal rule is reduced by the factor 1/4 and for Simpson's rule by
the factor 1/16, as predicted in Theorems 9.7 and 9.8.
198 9. Numerical Integration
1 -0.05685282
2 -0.01518615 -0.00129726
4 -0.00387663 -0.00010679
8 -0.00097467 -0.00000735
16 -0.00024402 -0.00000047
32 -0.00006103 -0.00000003
l
gent if
b
QnU) -t QU) = f(x) dx, n -t 00,
for all n 2: N(c). Now with the aid of the triangle inequality and using
(9.12) we can estimate
n
IQn(f) - Q(f)1 ::; L n
lain)llf(xi » - p(xin»1 + IQn(P) - Q(p)1
k=O
+ lb Ip(x) - f(x)1 dx
Cc c (b - a)c
<
- 2(C+b-a)
+-+
2 2(C+b-a)
=c
t
k=O
lain) I = t
k=O
n
ai ) = Qn(l) -+ I a
b
dx =b- a, n -+ 00,
From Theorems 9.7 and 9.8 and Corollary 9.11 we observe that the com-
posite trapezoidal rule and the composite Simpson's rule are convergent.
On the other hand, using the fact that the conditions of Theorem 9.10 are
necessary for convergence, it can be shown that the Newton-Cotes quadra-
tures do not converge for all continuous functions (see Problem 9.5).
Lakx~
n
k=O
= l a
b
xidx, i = 0, ... ,2n + 1.
Q(f) := l b
w(x)f(x) dx, (9.14)
J:
where w denotes some weight function. We assume that w (a, b) -+ IR.
is continuous and positive and that the integral w(x) dx exists. Typical
examples are given by
Qn(f) := l b
w(x)(Lnf)(x) dx.
1 a
b
w(x)f(x) dx
n
~ :~:::>kf(Xk)
k=O
with n+ 1 distinct quadrature points is called a Gaussian quadrature formula
if it integrates all polynomials P E P2n+1 exactly, i.e., if
L akP(xk) = 1w(x)p(x) dx
n b
(9.15)
k=O a
l b
w(x)qn+l (x)q(x) dx =0 (9.16)
1 a
b
w(x)qn+l (x)q(x) dx =
n
L akqn+l (Xk)q(Xk) = 0
k=O
for all q E Pn . o
Lemma 9.14 Let Xo, ... ,Xn be n+ 1 distinct points satisfying the condition
(9.16). Then the corresponding polynomial interpolatory quadrature is a
Gaussian quadrature formula.
Proof. Let L n denote the polynomial interpolation operator for the interpo-
lation points Xo, ... ,Xn . By construction, for the interpolatory quadrature
l
we have
t
k=O
akf(xk) =
a
b
w(x)(Lnf)(x) dx (9.17)
for all f E era, b]. Each P E P2n+ 1 can be represented in the form
P = LnP + qn+lq
(9.18)
and
Pn=span{qo, ... ,qn}, n=O,I, .... (9.19)
Proof. This follows by the Gram-Schmidt orthogonalization procedure from
Theorem 3.18 applied to the linearly independent functions un(x) := x n
for n = 0,1, ... and the scalar product
(f,g):= l b
w(x)f(x)g(x) dx
for f, g E C[a, b]. The positive definiteness of the scalar product is a conse-
quence of w being positive in (a, b). 0
for n > O. Hence, since w is positive on (a, b), the polynomial qn must
have at least one zero in (a, b) where the sign of qn changes. Denote by
Xl, ... , X m the zeros of qn in (a, b) where qn changes its sign. We assume
that m < n and set rm(x) := (x - xd··· (x - x m ). Then r m E Pn - 1 and
l
therefore
b
w(x)rm(x)qn(x) dx = O.
However, this integral must be different from zero, since rmqn does not
change its sign on (a, b) and does not vanish identically. Hence, we have
arrived at a contradiction, and consequently m = n. 0
Theorem 9.17 For each n = 0,1, ... there exists a unique Gaussian quad-
rature formula of order n. Its quadrature points are given by the zeros of
the orthogonal polynomial qn+l of degree n + 1.
Theorem 9.18 The weights of the Gaussian quadrature formulae are all
positive.
Proof. Define
2
fk(X):= [qn+l(X)] , k = O, ... ,n.
x - Xk
Then
ak[q~+l (Xk)f = Ln ajfk(xj) = jb w(x)fk(x) dx > 0,
j=O a
since!k E P2n, and the theorem is proven. 0
En(f) := jb w(x)f(x) dx - t
a k=O
akf(xk)
we can write
En(f) = fb w(x)[J(x) - (Hnf) (x)] dx.
Then as in the proofs of Theorems 9.7 and 9.8, using the mean value the-
l
orem we obtain
b
f(z) - (Hnf)(z) 2
En(f) = [qn+l ()]2
Z a
w(x)[qn+l(x)] dx
for some z E [a, b]. Now the proof is finished with the aid of the error
representation for Hermite interpolation from Theorem 8.19. 0
204 9. Numerical Integration
Obviously To(x) = 1 and T 1 (x) = x. From the addition theorem for the
cosine function, cos(n + 1) t + cos( n - 1) t = 2 cos t cos nt, we can deduce the
recursion formula
t T~X) dx = Jr cosntcosmtdt =
L1 1- x 2
1r
2'
n = m > 0,
o
0, n f:. m.
Hence, the orthogonal polynomials qn of Lemma 9.15 are given by
qn = 21 - nT n . The zeros of Tn and hence the quadrature points are given
by
Xk = cos (2~: l 1r) , k = O, ... ,n - 1.
The weights can be most easily derived from the exactness conditions
m = O, ... ,n-l,
for the interpolation quadrature, Le., from
+ l)m m=O,
L ak cos (2k 2n
n-1
1r =
{ 1r,
rt
i-I
I(x)
vI - x 2
dx _ ~
n
I:
k=O
I (cos 2k + 1
2n
1r) = 1r1(2n)(~)
22n - I (2n)!
al = r dx = 2.
t
i-I
Hence the first Gauss-Legendre formula is given by
(9.20)
206 9. Numerical Integration
/1
ness conditions
a1 + az = dx = 2,
-1
inr (a + b
b
a
f(x) dx
b- a
~ -2- 2:
n
k=O
akf -2- + -2-
b- a
Xk
)
with an error of order O(h 2n ). These composite Gaussian rules are used
quite frequently in practice. We illustrate their convergence behavior by
Table 9.3, which gives the error between the exact value of the integral
from Example 9.6 and its numerical approximation by composite Gaussian
quadrature of orders one and two. As predicted by our error analysis, if the
number n of quadrature points is doubled, i.e., if the step size h is halved,
then the error for the Gaussian quadrature of orders one and two is reduced
roughly by the factor 1/4 and 1/16, respectively.
m n=1 n=2
1 0.02648051 0.00083949
2 0.00743289 0.00007054
4 0.00192729 0.00000489
8 0.00048663 0.00000031
16 0.00012197 0.00000002
32 0.00003051 0.00000000
1 1
Bn(x) dx = 0, n E IN. (9.22)
and
- m ~ sin21fkx
B 2m - 1(X) = 2(-1) L...J (21fk)2m-l (9.26)
k=l
9.4 Quadrature of Periodic Functions 209
for m = 1,2, .... This follows from (9.21) and (9.22) and the elementary
Fourier expansion for the piecewise linear function ih (see Problem 9.13).
Let Xk = a + kh, k = 0, ... , n, be an equidistant subdivision of the
interval [a, b] with step size h = (b - a)/n and recall the definition of the
trapezoidal sum
(9.27)
where [T] denotes the largest integer smaller than or equal to T'
Proof. Let 9 E cm[o, 1]. Then, by m - 1 partial integrations and using
(9.23) we find that
t
io B 1 (z)g'(z)dz= 21 [g(I)+g(O)]- i t g(z)dz
o
and observing that the odd Bernoulli numbers vanish leads to
['q-] b .
1
1
L.
1
g(z) dz =- [g(O) + g(I)] - ~ [g(2 i -l)(I) - g(2 i -l)(0)]
o 2 J=1
(2J)!
210 9. Numerical Integration
IXk
+1
f(x) dx =
h
"2 (f(Xk) + f(Xk+dl
+( _l)mh m i:k+ 1
B m (X ~ a) f(m)(x) dx.
2: t/ (2~k).
coincides with the rectangular rule
12~ f(x)dx ~
For its error
2~+l r
h
IEn(f)I::; If(2m+l)(x)1 dx,
n Jo
where
1
2L
00
C:= k 2m + 1 .
k=l
Proof. From Theorem 9.26 we have that
Corollary 9.27 illustrates why for periodic functions the simple rectangu-
lar rule is superior to any other quadrature rule (see Problem 9.12). Note
that the rectangular rule can also be obtained by integrating the trigono-
metric interpolation polynomials of Theorems 8.24 and 8.25.
In the following theorem we give an example of derivative-free error esti-
mates for numerical quadrature rules in the spirit of Davis [15]. They have
the advantage that they do not need the computation of higher derivatives
for the evaluation of the estimates. However, they require the integrand to
be analytic, and their proofs need complex analysis.
Theorem 9.28 Let f : lR -+ lR be analytic and 21r-periodic. Then there
°
exists a strip D = lR x (-a, a) C cr:; with a > such that f can be extended
to a holomorphic and 21r -periodic bounded function f : D -+ cr:;. The error
for the rectangular rule can be estimated by
°
expansion provides a holomorphic extension of f into some open disk in the
complex plane with radius r(x) > and center x. The extended function
again has period 21r, since the coefficients of the Taylor series at x and
at x + 21r coincide for the 21r-periodic function f : lR --+ lR. The disks
corresponding to all points of the interval [0,21r] provide an open covering
of [0, 21r]. Since [0,21r] is compact, a finite number of these disks suffices to
cover [0, 21r]. Then we have an extension into a strip D with finite width
2a contained in the union of the finite number of disks. Without loss of
generality we may assume that f is bounded on D.
From the residue theorem we have that
1
+ 2 71"
io
nz /-io<+271" nz 41ri n (21rk)
. cot- f(z)dz-. cot-f(z)dz=-- L f -
to< 2 -to< 2 n k=l n
for each °< a < a. This implies that
1
i o<+271" nz 21r
Re. i cot -2 f(z) dz = -
to< n
since by the Schwarz reflection principle, f enjoys the symmetry property
f(z) = f(z). By Cauchy's integral theorem we have
1
i o<+271" 1271"
Re. f(z) dz = f(x) dx,
to< 0
and combining the last two equations yields
1
i
o<+271" ( nz)
En(f)=Re l-icot f(z)dz
io<
T
212 9. Numerical Integration
l b
f(x) dx = T~(J) + '"f1h 2 + O(h4 )
for some constant 'Y! depending on f but not on h. Hence, for half the step
size, we have that
From these two equations we can eliminate the terms containing h 2 j Le.,
we multiply the first equation by -1/3 and the second equation by 4/3 and
add both equations to obtain
of the composite trapezoidal rule with step sizes hand h/2 leads to a
quadrature formula with the improved error order O(h4 ). The quadrature
T;(f) coincides with the composite Simpson's rule for the step size h/2.
If f is six-times continuously differentiable, by linearly combining the
Euler-Maclaurin formulae for the step sizes hand h/2 we obtain an error
representation of the form
l b
f(x) dx = T;(f) + 12h4 + O(h6 )
for some constant 12 depending only on f. From this and the corresponding
formula
ira
b 4
h
f(x) dx = T; (f) + 12 16 + O(h 6 )
for step size h/2, by eliminating the terms containing h 4 we obtain the
quadrature formula
3 [2
1 16T%(f) - Th(f)
Th(f) := 15 2]
with an error of order O(h 6 ). Note that the actual numerical evaluation of
T'K(f) requires the values for the composite trapezoidal rule for the step
sizes h, h/2, and h/4.
Obviously, this procedure can be repeated, and this leads to the sequence
of Romberg quadrature formulae. Let
Tf(f) := T~k(f), k = 0,1,2, ... ,
be the trapezoidal sums for the step sizes h k := h/2 k . Then for m = 1,2, ...
the Romberg quadratures are recursively defined by
It
Then for the Romberg quadratures we have the error estimate
l b
f(x) dx - T~(J) _ ~l Ij,i[J(2 j -l) (b) _ f(2 j -l)(a)) (2:) 2j
(9.29)
for i = 1, ... ,m and k = 0, 1, .... Here the sum on the left-hand side is set
equal to zero for i = m. By the Euler-Maclaurin expansion this is true for
i = 1 with 'Yj,l = b2j j(2j)! for j = 1, ... , m - 1 and
As an abbreviation we set
Assume that (9.29) has been shown for some 1 ::; i < m. Then, using (9.28),
we obtain
4
i
4i _ 1
[ib f(x) dx - Ti.+l. (f) - f;
a
m-l (
2k+1
h )2 'Yj,iFj
j
]
- 4i
1
_ 1
[i a
b
f(x) dx - Ti.(f) -
.
f;
m-l ( h ) 2j
2k 'Yj,iFj
]
where
4i - j - 1
'Yj,i+l = 4i _ 1 'Yj,i, j = i + 1, ... ,m - 1.
Now with the aid of the induction assumption we can estimate
::; 'Ym,i+l11f(2m)1100 (~ ) 2m ,
where
4i - m +1
'Ym,i+l = 4i - 1
and the proof is complete. o
From Theorem 9.29 we conclude that the Romberg quadrature Ti: in-
tegrates polynomials of degree less than or equal to 2m - 1 exactly. For
h = b- a the Romberg quadrature TO' uses 2m - 1 + 1 equidistant integration
points. Therefore, TJ coincides with the trapezoidal rule, T;f with Simp-
son's rule, and TJ with Milne's rule. Similarly, TI, Tf, and T2 correspond
9.5 Romberg Integration 215
to the composite trapezoidal rule, the composite Simpson's rule, and the
composite Milne rule, respectively. For m 2:: 4 the number of the quadrature
points in TO' is greater than the degree of exactness. The Romberg formula
TJ uses nine quadrature points, and this is the number of quadrature points
where the Newton-Cotes formulae start having negative weights.
Theorem 9.30 The quadrature weights of the Romberg formulae are pos-
itive.
k
1_ [22m+1Tm
Qm+l := __
4m _ 1 k+ 1
+ 2Tm
k
+ 4 m+lQm
k+l
] (9.30)
for k = 1,2, ... and m = 1,2 ... and show by induction that
Tm+1 = _1_ [T m + Qm]. (9.31)
k 4m _ 1 k k
4m + 1 1
T;:+l + Q~+l = 4m _ 1 [TM-l + QM-l] - 4m _ 1 [4mTM-l - Tr]
i.e., (9.31) also holds for m + 1. Now, from (9.30) and (9.31), by induction
with respect to m, it can be deduced that the weights of are positiveTr
and that the weights of Q'k are nonnegative. 0
lim Tr(f) =
m-+oo
Ia
b
f(x) dx and lim Tr(f)
k-4OO
= I a
b
f(x) dx
Proof. This follows from Theorems 9.29 and 9.30 and Corollary 9.11. 0
For continuous functions, the trapezoidal sums converge as the step size
tends to zero. This motivates us to consider a polynomial in h 2 interpolating
the values Tl (f), ... , Tl+m (f) at the interpolation points h%, ... , h%+m and
evaluate it at h = o.
Theorem 9.32 Denote by L'k the uniquely determined polynomial in h 2
of degree less than or equal to m with the interpolation property
(9.32)
Proof. Obviously, (9.32) is true for m = o. Assume that it has been proven
for m - 1. Then, using the Neville scheme from Theorem 8.9, we obtain
k TI T; Tt r:
1 -0.05685282
2 -0.01518615 -0.00129726
4 -0.00387663 -0.00010679 -0.00002742
8 -0.00097467 -0.00000735 -0.00000072 -0.00000030
16 -0.00024402 -0.00000047 -0.00000001 -0.00000000
32 -0.00006103 -0.00000003 -0.00000000 -0.00000000
We finish this section with the corresponding Table 9.5 for the integral
t
Jo
2
JX dx = 3 (9.33)
k Tl n Tt Tt T~
1 0.166667
2 0.063113 0.028595
4 0.023384 0.010140 0.008910
8 0.008536 0.003587 0.003151 0.003059
16 0.003085 0.001268 0.001114 0.001082 0.001074
32 0.001108 0.000448 0.000394 0.000382 0.000380
1 1
f(x) dx
Jt Jr
21f
f(x) dx = g(t) dt,
o o
where
g(t) := w'(t) f(w(t)), 0 < t < 211".
Now assume that the function w has derivatives
w U) (0) = w U)(211") = 0, j = 1, ... ,p - 1, (9.34)
and
(9.35)
for some p E IN. Then we may expect that the function 9 and some of
its derivatives up to a certain order vanish at t = 0 and t = 211"; i.e., 9
can be considered as a sufficiently smooth 211"-periodic function, and the
rectangular rule may be applied to the transformed integral. This yields
the quadrature formula
(9.36)
218 9. Numerical Integration
and
for some constants 0 < Co < Cl depending on the function w. From (9.38) it
is obvious that the quadrature points are graded towards the two endpoints
x = 0 and x = 1 of the integration interval.
For substitutions with the properties (9.34), (9.35), and (9.37), from the
Euler-Maclaurin expansion applied to the integral over 9 we now will derive
an estimate for the remainder term
En(J) := 1°1
f(x) dx -
n-l
L ad(xk)'
k=l
Then, clearly
for j = 0, ... , q.
9.6 Improper Integrals 219
Theorem 9.33 Let p E IN and assume that w satisfies (9.34), (9.35), and
(9.37). Further, let q E IN and j E s2q+l,a with 0< 0: ~ 1 such that
Then from
g(r+l)(t) = t;
r [
uj(t) w'(t) jU+1)(W(t» -it
+ d U 'fU)(w(t»
:(t)]
u~(t) w'(t), j = r + 1,
for the coefficients uj. In particular, we have
for some constant C 1 , and with the aid of (9.43), we further obtain that
luj(t) f(j)(w(t))1 ::; C 21IfIl2q+1,a[t(211" - t)]a p-r-1, 0 < t < 211", (9.44)
for some constant C 2 and r = 0, ... ,2q + 1 and j = 0, ... , r. From this,
since QP > 2q + 1, we observe that for r = 0, ... , 2q the derivatives g(r) can
be continuously extended from (0,211") onto [0,211"] with values
Furthermore, from (9.44) and the assumption QP > 2q + 1 we see that the
integral of g(2 q+1) over [0,211"] exists as an improper integral and
Wp(t) := [1 21<
[s(211" - S)]P-1 dS]
-1
1 t
[s(211" - s)]P-1ds . (9.45)
wp(t) := [1 2
1< sin P - 1
~ dS] -1 it sin P - 1
~ ds (9.46)
w(t) =
[1o
2
1< exp (11"
- - - -11")]
S
- ds -1
211" - S
it 0
exp (11"
- - - -11")
S
- ds
211" - S
Problems 221
and
[1 1
Jo VX dx = 2. (9.48)
Table 9.6 gives the error between the exact value and the numerical ap-
proximation obtained by using the substitution (9.47).
Problems
9.1 Show that the error for the composite trapezoidal rule can be expressed in
l -l
the form
b b
f(x)dx-Th(f) = KT(X)f"(x)dx,
for k = 1, ... ,n. Use this error representation for an alternative proof of Theorem
9.7.
9.2 Show that the error for the composite Simpson's rule can be expressed in
the form
222 9. Numerical Integration
h ( 3 1 4
$ x $
18 x - Xk-2) - 24 (x - Xk-2) , Xk-2 Xk-l,
Ks(x) :=
{ h 3 1 4
- (Xk -x) - - (Xk -x)
18 24 '
for k = 2,4, ... , n. Use this error representation for an alternative proof of The-
orem 9.8.
9.4 Show that the weight a4 for the Newton-Cotes formula of order eight is
negative.
9.5 For the remainder En of the Newton-Cotes formula of order n on the in-
terval [-1,1), applied to the Chebyshev polynomial T n + l , show that
n 1
E (T ) _ (n+ 1)!4 + n E IN.
n n+1 - nn+2
where
(n - 1)' 4 n + 1
Tn = 3 nn+2 --t 00, n --t 00.
Io
n ( ) z
n+l
dz = -2 II ( )
0
z
n+2
dz
for n odd.
9.6 Compute the weights for the polynomial interpolatory quadratures with
equidistant quadrature points
b-a
Xk = a + (k + 1) -n+
-2' k = 0, 1, ... , n,
for n = 0, 1, 2 and obtain representations of the quadrature errors. These formulae
are called open Newton-Cotes quadmtures, since the two endpoints a and b are
omitted.
Problems 223
I
b b n
/(x) dx::::: ~ a ~ /(Xk)
a k=1
with distinct quadrature points XI, ... ,X n E [a, b] and equal weights is called a
Chebyshev quadrature if it integrates polynomials in Pn exactly. Find the Cheby-
shev quadratures for n = 1,2,3,4. (Chebyshev quadratures exist only for n < 8.)
9.8 Show that there exists no polynomial interpolatory quadrature of order n
that integrates polynomials of degree 2n + 2 exactly.
9.9 The Chebyshev polynomial of the second kind Un of degree n is defined by
[II ~Un(X)Um(x)dx= i t5 nm .
9.10 Show that the quadrature points and quadrature weights for the Gauss-
Chebyshev quadrature of order n - 1 for the integral
[II ~/(x)dx
are given by
k+l
Xk = cos--
n+l
7r
and
• 2 k +1
ak =-
n+l
-
11"
sm - -
n+l
7r
for k = 0, ... ,n - 1.
9.11 Find the quadrature weights aO,al,a2,a3, and the (remaining) quadrature
points x I, X2 of a quadrature formula of the form
1
21r
1 211"
----dx=-
o 5-4cosx 3
by the rectangular rule and Simpson's rule convince yourself of the superiority of
the rectangular rule for periodic functions.
9.13 Verify the Fourier series (9.25) and (9.26) for the periodic Bernoulli poly-
nomials.
is absolutely and locally uniformly convergent for all x E [0,1] and all t E (-1,1).
9.16 Write a computer program for Romberg integration and test it for various
examples.
9.19 Show that the functions (9.45), (9.46), and (9.47) are strictly monoton-
ically increasing, infinitely differentiable, and map [0,211"] onto [0,1] such that
(9.34), (9.35), and (9.37) are satisfied.
9.20 Write a computer program for the numerical quadrature (9.36) using the
substitution (9.47) and test it for various examples.
10
Initial Value Problems
ness theorem for initial value problems. In Section 10.2 we will describe
some variants of the simplest method for the numerical solution of initial
value problems, which was first used by Euler. These methods are special
cases of so-called single-step methods, for which we will give a convergence
and error analysis in Section 10.3. This section also includes a short dis-
cussion of the Runge-Kutta method as the most widely used single-step
method. The final section, Section lOA, is concerned with the description
and analysis of multistep methods.
We wish to note explicitly that this chapter is also meant to serve as
an application of some of the material provided in Chapters 8 and 9 on
interpolation and numerical integration.
if (x, u(x)) E G and u'(x) = f(x, u(x)) for all x E [a, b].
Geometrically speaking, the differential equation (10.1) defines a field of
directions on G. Solving the differential equation means looking for func-
tions whose graphs match this field of directions.
Systems of ordinary differential equations can be included in the dis-
cussion as follows. If G c lRn + 1 is a domain and f : G -+ lRn , then a
continuously differentiable function u : [a, b] -+ lRn is called a solution of
the system of ordinary differential equations of the first order
u' = f(x, u)
if (x,u(x)) E G and u'(x) = f(x,u(x)) for all x E [a,b]. More explicitly,
this system reads
... ,
dp
- =ap
dt
dp 2
- = ap-bp
dt
with positive constants a and b contains a correction term that slows down
the growth rate for large populations and is known as the Verhulst equa-
tion. It was introduced by Verhulst in 1938 as a model for the .growth of
the human population. In general, for a given growth rate r one wants to
determine the development of the population p(t) in time for a given initial
population PO at time t = to. 0
Definition 10.4 The initial value problem for the ordinary differential
equation
u' = f(x,u) (10.2)
consists in finding a continuously differentiable solution u satisfying the
initial condition
u(xo) = Uo (10.3)
for a given initial point Xo and a given initial value uo.
°
Lipschitz constant. Then for each initial data pair (xo, uo) E G there exists
an interval [xo - a, Xo + a] with a > such that the initial value problem
(10.2)-{10.3) has a unique solution in this interval.
Proof. Firstly, we transform the initial value problem equivalently into the
Volterra integral equation
u(x) = Uo + l f(~,
Xo
x
u(f,)) d~. (10.5)
Since D is open, we can choose a > ° such that the closed rectangle
B := {(x, u) E lRn + 1 : Ix - xol ::; a, lIu - uoll ::; Ma}
that is,
Ilu - uoll oo :S M a,
which implies that the solution remains within the rectangle B. We consider
the closed subset
The operator A indeed maps U into itself, since the function Au is con-
tinuous and satisfies IIAu - uoll oo :S M a. With the aid of the Lipschitz
condition (10.4) and using (6.1) we can estimate
Exploiting the fact that in Theorem 10.5 the width a of the interval
is determined by the Lipschitz constant L, which is independent of the
initial point (xo,uo), one can assure global existence of the solution; i.e.,
the solution to the initial value problem exists and is unique until it leaves
the domain G of definition for the differential equation.
Note that on a convex domain each function that is continuously dif-
ferentiable with respect to u satisfies a Lipschitz condition (see the mean
value Theorem 6.7).
230 10. Initial Value Problems
La
Ilu - uvlloo ~ 1 _ La Iluv- uv-Illoo , v = 1,2, ....
Proof. This follows from Theorem 3.46. 0
u' = x 2 + u2 , u(O) = 0,
on G = (-0.5,0.5) x (-0.5,0.5). For f(x, u) := x 2 + u 2 we have
on G. Hence for any a < 0.5 and M = 0.5 the rectangle B from the proof
of Theorem 10.5 satisfies BeG. Furthermore, we can estimate
for all (x, u), (x, v) E G; Le., f satisfies a Lipschitz condition with Lipschitz
constant L = 1. Thus in this case the contraction number in the Picard-
Lindelof theorem is given by La < 0.5.
Here, the iteration (10.6) reads
U3(X) = i
r [e + 9~6 + 2~10 e
189 + 3969
4
]
de = 3
x
3
+
x
7
63 +
2x
ll
2079
X
+ 59535
15
o
with the error estimate
1 1
lIu - u31100 ~ II u3- uzlloo = 2079.2 10 + 59535.2 15 = 0.00000047 ....
In this example three steps of the Picard-Lindel6f iteration give eight dec-
imal places of accuracy. However, the example is not typical, since in gen-
eral, the integrations required in each iteration step will not be available
explicitly as in the present case. 0
u' = x2 + u2 , u(O) = 0,
from Example 10.7. Table 10.1 gives the difference between the exact so-
lution as computed by the Picard-Lindelof iterations in Example 10.7 and
the approximate solution obtained by Euler's method for various step sizes
h. We observe a linear convergence as h -+ O. 0
l
X1
h
f({,u({»d{ ~"2 [f(xo,u(xo)) + f(xl,u(xd)]'
Xo
which yields
h
Ul = Uo +"2 [f(xo,uo) + f(Xl,ud]· (10.9)
Since the solution of the nonlinear equation (10.9) will deliver only an
approximation to the solution of the initial value problem, there is no need
to solve (10.9) with high accuracy. Using the approximate value from the
explicit Euler method as a starting point and carrying out only one itera-
tion, we arrive at the following method.
234 10. Initial Value Problems
Definition 10.12 The predictor corrector method for the Euler method
for the numerical solution of the initial value problem (10.7), also known
as the improved Euler method or Heun method, constructs approximations
Uj to the exact solution u(x j) at the equidistant grid points
Xj := Xo + jh, j = 1,2, ... ,
by
h
Uj+I := Uj + 2" [f(xj, Uj) + f(xj+l' Uj + hf(xj, Uj))], j = 0,1, ....
Example 10.13 Consider again the initial value problem from Example
10.7. Table 10.2 gives the difference between the exact solution as computed
by the Picard-Lindelofiterations and the approximate solution obtained by
the improved Euler method for various step sizes h. We observe quadratic
convergence as h -t O. 0
In the following section we will show that the Euler method and the
improved Euler method are convergent with convergence order one and
two, respectively, as observed in the special cases of Examples 10.9 and
10.13.
must be fulfilled for the exact solution u. We also expect that the order of
this convergence will influence the accuracy of the approximate solution.
These considerations are made more precise by the following definition.
Definition 10.16 For each (x, u) E G denote by TJ = TJ(O the unique
solution to the initial value problem
TJ' = f(~, TJ), TJ(x) = u,
with initial data (x, u). Then
1
d(x, u; h) ;= Ii [TJ(x + h) - TJ(x)]- ep(x, u; h)
is called the local discretization error. The single-step method is called con-
sistent (with the initial value problem) if
lim d(x, Uj h)
h-+O
= °
uniformly for all (x, u) E G, and it is said to have consistency order p if
Id(x, Uj h)1 :s Kh P
for all (x,u) E G, all h > 0, and some constant K.
236 10. Initial Value Problems
lim ep(x, u; h)
h-+O
= f(x, u)
uniformly for all (x, u) E G.
Proof. Since we assume f to be bounded, we have
1](x+t)-1](x) = l t
1]'(X+S)ds= l tf
(x+s,1](X+S))dS-70, t-70,
<
h
I
Jrh[1]'(X+t)-1]'(X)]dtl
o - O~t9
max 11]'(x+t)-1]'(x)1
= max O~t:Sh
If(x + t,1](x + t)) - f(x,1](x))1 -70, h -7 0,
uniformly for all (x, u) E G. From this we obtain that
= ~ l h
[1]'(X + t) -1]'(x)]dt -7 0, h -7 0,
uniformly for all (x, u) E G. This now implies that the two conditions
Ll -7 0, h -7 0, and ep -7 f, h -7 0, are equivalent. 0
for some 0 < 0 < 1 and a bound K 1 for 6(fxx+ 2fxuf+ fuuF+ fufx+ f~f)·
From Taylor's formula for functions of two variables we have the estimate
1
If(x + h, u + k) - f(x, u) - hfx(x, u) - kfu(x, u)1 :::; 2 Kz(lhl + Ikl)z
with a bound K z for the second derivatives fxx, fxu, and fuu. From this,
setting k = hf(x, u), in view of (10.12) we obtain
1
If(x + h,u + hf(x,u)) - f(x,u) - hTJ"(X)1 :::; 2 K z (1 + Ko)zh z
with some bound K o for f, whence
follows. Now combining (10.13) and (10.14), with the aid of the triangle
inequality and using the differential equation, we can establish consistency
order two. 0
is called the maximal global error. The single-step method is called conver-
gent if
lim E(h) = 0,
h---+O
E(h) ~ HhP
j = 0, 1, ... ,
holds.
I~j+ll ~ (1 + A)I~olejA + (1 + A) ~ (e jA - 1) +B
Proof. We first show that consistency implies convergence and assume that
the single-step method is consistent. For the difference of two consecutive
errors we compute
Hence
(10.15)
where
c(h) := max I~(x, u(x); h)1
a~x9
satisfies
c(h) -t 0, h -t 0,
since we assume consistency. The inequality (10.15) implies that
From this, applying Lemma 10.21 for A = hM and B = hc(h) and using
eo = 0, we obtain the estimate
(10.16)
We now show that convergence implies consistency and assume that the
°
single-step method is convergent; i.e., for h -t the approximations
(10.17)
g(x,u) :=~(x,u;O)
and observe that by Theorem 10.17 the single-step method is also consistent
with the initial value problem
Proof. By Theorems 10.18, 10.19, 10.22, and 10.23 it remains only to verify
the Lipschitz condition of the function ~ for the improved Euler method
given by (10.11). From the Lipschitz condition for f we obtain
1 1
~ 2 If (x, u) - f(x, v)1 + 2 lf (x + h, u + hf(x, u)) - f(x + h, v + hf(x, v)) I
L L ( 1+ hL) lu-vl;
~2Iu-vl+2I[u+hf(x,u)1-[v+hf(x,v)11~L
2
i.e., ~ also satisfies a Lipschitz condition. o
10.3 Single-Step Methods 241
k2 =f ( Xj + ~ ,Uj + ~ k1),
k 3 = f (Xj + ~ ,Uj + ~ k2 ),
242 10. Initial Value Problems
and
h
Uj+l = Uj + 6 (k i + 2k2 + 2k3 + k 4 ).
For the differential equation u' = f(x) the Runge-Kutta method coin-
cides with Simpson's rule for numerical integration.
Theorem 10.26 The Runge-Kutta method is consistent. If f is four-times
continuously differentiable, then it has consistency order four and hence
convergence order four.
Proof. The function <P describing the Runge-Kutta method is given recur-
sively by
where
<PI (x, u; h) = f(x, u),
The error estimate in Theorem 10.23 is not practical in general, since the
constants M and K have to be determined from higher-order derivatives of
f. Therefore, in practice, the error is estimated by the following heuristic
consideration. For convergence order p, the error between the approximate
solution ii(x; h) at the point x, obtained with step size h, and the exact
solution u(x) satisfies
ii(x; h) - u(x) ';:;j ch P
for some constant c. Correspondingly, for step size h/2 we have that
10.4 Multistep Methods 243
Hence we may consider (10.19) as an estimate for the error occurring with
the smaller step size h/2. However, we need to keep in mind that (10.19)
does not provide an exact bound and might fail in particular situations.
Nevertheless, it can be used for controlling the step size during the course of
the numerical calculations in order to adjust the actual step size according
to the required accuracy.
Solving for u(x) in (10.19) yields
2P u (Xi~) -u(x;h)
u(x) ~ 2P _ 1 (10.20)
Xj+r-k
f(~, u(~)) d~
with 1 :::; k :::; r by an interpolatory quadrature, i.e., by
where PEPs with 0 :::; s :::; r is the uniquely determined polynomial with
the interpolation property
Le., by setting
l
xi + r
uj+r - Uj+r-k = p(~) d~. (10.21)
Xj+r-k
(10.22)
is called the local discretization error. The multistep method is called con-
sistent (with the initial value problem) if
lim ~(x, Uj h) = 0
h--+O
~(x, u; h) = l1
h
x rh
+
x+(r-k)h
[f(~, u(~)) - p(O] d~,
246 10. Initial Value Problems
for all ~ in the interval x + (r - k)h ::; ~ ::; x + rh and some constant K
depending on f and its derivatives up to order s + 1. 0
Definition 10.30 The starting values uj, j = 0, ... ,r - I, are called con-
sistent if
lim [uj(h) - u(Xj)] = 0, j = 0, ... , r - 1.
h-tO
They are said to have consistency order p if
To make sure that the consistency order of the starting values coincides
with the consistency order of the multistep method, the single-step method
for computing the starting values has to be chosen accordingly.
Secondly, multistep methods can be unstable, as illustrated by the fol-
lowing example.
and approximate
U'(XO) ~ p'(xo).
Using the fact that the approximation for the derivative is exact for poly-
nomials of degree less than or equal to two, simple calculations show that
(see Problem 10.15)
u(x) - p(x) I~ 1
lIulll llool(x - xt}(x - x2)1,
I
-6
x - Xo
and from this, passing to the limit x -+ XO, it follows that the error for the
derivative can be estimated by
By approximating
p'(xo) ~ u'(xo) = f(xo, uo)
we derive a multistep method of the form
From (10.27) it follows that (10.28) is consistent with order two if f is twice
continuously differentiable.
Now we consider the initial value problem
with the solution u(x) = e- x . Here the multistep method (10.28) reads
Table 10.3 gives the error ej = Uj -e- Xj between the approximate and exact
solutions for the step sizes h = 0.1 and h = 0.01. For the starting values,
Uo = 1 and Ul = e- h have been used with ten-decimal-digits accuracy. The
last column gives the quotient qj := ej/ej-l of the error in two consecutive
steps.
h = 0.1 h = 0.01
j Xj ej qj j Xj ej qj
where a and A are complex numbers. Substituting into (10.29) shows that
(10.30) solves (10.29) if and only if A is a solution of the so-called charac-
teristic equation
A2 - 4A + (3 - 2h) = O.
This quadratic equation has two solutions, namely
Al,2 = 2 =f VI + 2h.
Therefore, the general solution of (10.29) is given by
Uj = aA{ + bA~.
The two constants a and b are determined by the conditions Uo = 1 and
Ul = e- h and have the values
A2 - e- h
a = = 1 + O(h 2 )
A2 - Al
and
e- h - Al
b = ----::.
A2 - Al
The term aA{ in the solution to the difference equation approximates the
solution e- Xj = e- jh to the initial value problem, since
aA{ = [1 + O(h 2 )] [1 - h + O(h 2 )F ~ e- jh .
However, the additional term bA~ grows exponentially, and the relation
Uj - u(Xj) ~ A2 = 3 + h + O(h 2 )
Uj-l - u(xj-d
explains the last column of Table 10.3. o
Roughly speaking, for multistep methods with r ~ 2, the (homogeneous)
difference equation of order r occurring in the multistep method has r lin-
early independent solutions, whereas the approximated differential equa-
tion has only one solution. Hence only one of the solutions to the difference
equation corresponds to the differential equation. Therefore, convergence
of the multistep method can be expected only when the additional solu-
tions to the difference equation remain bounded. Note that these additional
solutions will always be activated by errors in the starting values and by
round-off errors. For this reason we proceed by investigating the stability
of the difference equation.
Definition 10.32 The linear difference equation
r-l
Uj+r +L amuj+m = 0, j = 0,1, ... , (10.31)
m=O
with constant coefficients ao, . .. ,ar - l is called stable if all its solutions are
bounded.
10.4 Multistep Methods 249
k=O
= A
j
t (~)l(An-kp)(A).
From this it can be deduced that if A is a zero of the characteristic poly-
nomial p of multiplicity 8, then for n = 0,1, ... , 8 - 1 the sequence (10.33)
solves the difference equation.
Now assume that AI, ... , Ak are the zeros of the characteristic polynomial
(10.32) and have multiplicities 81, ... , 8k j i.e.,
k
p(A) =
Al)SI. II (A -
1=1
Then the general solution of the homogeneous difference equation (10.31)
is given by
k sl-l
Uj = L L alsjS A{ (10.34)
1=1 s=o
with r arbitrary constants also To establish this we need to show that the
coefficients als can be chosen such that arbitrarily given initial conditions
k 81-1
are fulfilled. The homogeneous adjoint system to the system (10.35) reads
r-l
L (3jP A{ = 0, S = 0, ... , SI - 1, 1 = 1, . .. , k.
j=O
j=O
Lemma 10.34 For k = 0,1, ... , r -1, let Uj,k denote the unique solutions
to the homogeneous difference equation (10.31) with initial values
Then for a given right-hand side cr , Cr+ 1, ... , the unique solution to the
inhomogeneous difference equation
r-l
Zj+r +L amzj+m = Cj+r, j = 0,1, . .. , (10.36)
m=O
Proof. Setting Um,r-l = 0 for m = -1, -2, ... , we can rewrite (10.37) in
the form
r-l
Zj =L Zk Uj,k + Wj, j = 0,1, ... ,
k=O
where
L
00
r r 00
L amwj+m = L am L Ck+rUj+m-k-l,r-1
m=O m=O k=O
r j
= L am L Ck+rUj+m-k-l,r-1
m=O k=O
j r
Now the proof is completed by noting that each solution to the inhomo-
geneous difference equation (10.36) is uniquely determined by its r initial
values Zo, ZI,' .. , Zr-l' 0
Uj+r + L amuj+m = 0
m=O
is stable.
Single-step methods are always stable, since the associated difference
equation Uj+l - Uj = 0 clearly satisfies the root condition.
Remark 10.36 The multistep methods (10.21) are stable.
Proof. The corresponding characteristic polynomial p(A) := Ar - Ar - k ful-
fills the root condition. 0
Then the assertion follows by using the estimate 1 + A ~ eA. The inequality
(10.38) is true for j = 1. Assume that it has been proven up to some j ~ 1.
Then we have
j j
we obtain
r-l r-l r-l
ej+r + L amej+m = uj+r + L amuj+m - U(Xj+r) - L amu(xj+m)
m=O m=O m=O
where
where
c(h) = max I~(x, u(x); h)1
a~x~b
for some constant C and ,(h) := d(h) + c(h). Now Lemma 10.37 implies
that
whence
E(h) ::; C(1 + Ch)r(h)e(b-a)C -t 0, h -t 0,
The basic advantage of multistep methods results from the fact that for
arbitrary convergence order, in each step only one new evaluation of the
function f is required. In contrast, for single-step methods the number of
function evaluations required in each step is equal, in general, to the conver-
gence order. Therefore, multistep methods are much faster than single-step
methods. However, it should be noted that readjusting the step size dur-
ing the computation is more involved due to the need to recompute the
corresponding starting values for the new step size.
Problems
10.1 Find the exact solution of the initial value problem
u' = _u 2 , u(O) = 1,
10.2 Consider the initial value problem u' = u, u(O) = 1, and show that the
approximate solution from the Euler method is given by Uj = (1 + h)j.
10.4 Show that Euler's method fails to approximate the solution u(x) = (~x) 3/2
of the initial value problem u' = U 1 / 3 , u(O) = O. Explain this failure.
10.5 Show that the differential equation u' = ax with a E lR is solved exactly
by the improved Euler method.
Problems 255
k1 = I(xj, Uj),
2h 2h)
k3 = I ( Xj + 3" ,Uj + 3" k2 ,
and
Uj+l = Uj + 4"h (k + 3k3)
1
kl = !(Xj,Uj),
and
h
Uj+l = Uj + "6 (k 1 + 4k2 + k3)
is consistent and has consistency order three if ! is three-times continuously
differentiable. This method is known as Kutta's third-order method.
10.9 Show that the Runge-Kutta method (see Definition 10.25) has consistency
order four if ! is four-times continuously differentiable.
10.10 Write a computer program for the Runge-Kutta method and test it for
various examples.
10.11 The population p = p(t) and q = q(t) of two interacting animal species
that have a predator prey relationship is modeled by the system of the Lotka-
Volterra equations
~~ = OIp + {3pq, ~; = 'Yq + ~pq
with constant coefficients 01 < 0, {3 > 0, 'Y > 0, and ~ < 0, complemented by initial
conditions p(O) = po and q(O) = qo. (Explain the significance of the signs of the
constants for the model.) For the coefficients 01 = -1, {3 = 0.01, 'Y = 0.25, and
~ = -0.01, test the stability of the solutions by solving the initial value problem
256 10. Initial Value Problems
numerically by the Runge-Kutta method for the four different initial conditions
po = 30 ± 1 and qo = 80 ± 1. Visualize the numerical results by a phase diagram,
i.e., by the curve {(p(t), q(t) : t E [0, T)} for sufficiently large T > O.
10.19 Attempt to approximate the unique solution u(x) = 2 of the initial value
problem
u' = xu(u - 2), u(O) = 2,
numerically by any of the methods described in this chapter. Discuss the results
by relating them to the solution of the initial value problem with perturbed initial
condition u(O) = 2 + a for small a E JR.
For each s the value F(s) can be computed approximately by one of the
numerical methods of Chapter 10 for the solution of initial value problems,
extended appropriately to the case of a second-order equation. Note that
for a nonlinear differential equation the equation F(s) = is nonlinear.
For finding a zero of F the Newton method of Section 6.2 can be em-
°
ployed. For the computation of the derivative F'(s), which is required for
Newton's method, we assume that the solution u to the initial value prob-
lem (11.3) depends in a continuously differentiable manner on the parame-
ter s. This can be assured by appropriate assumptions on f (see [12]). We
set
8u
v:= 8s
and differentiate the differential equation and the initial condition (11.3)
with respect to s to obtain
v"(x,s) = fu(x,u(x,s),u'(x,s))v(x,s)
(11.4)
+ fU I (x, u(x, s), u' (x, s))v' (x, s)
and
v(a,s) = 0, v'(a,s) = 1. (11.5)
Since
F'(s) = v(b,s),
computing the derivative of F requires solving the additional linear initial
value problem (11.4)-(11.5) for v, where u is known from solving (11.3).
Note that from a numerical approximation, u is known only at grid points.
Summarizing, we obtain the following method.
Algorithm 11.1 The shooting method with Newton iterations consists of
the following steps:
1. Choose an initial slope s E JR.
2. Solve numerically the initial value problem for
with initial conditions u(a) = 0::, u'(a) = s and the initial value problem for
u(x) = 1
ellO _ e- IOO
{(ellO _ l)e- lOx + (1 _ e-IOO)e llx }.
The unique solution to the associated initial value problem with initial
conditions u(O) = 1 and u'(O) = s is given by
11 - S -lOx 10 + s llx
ux
() = - - e +--e.
21 21
11.1 Shooting Methods 261
11 - S -lOO 10 + s 110 1
s =--e
F() +--e -.
21 21
From F(s) = 0 we deduce that the exact initial slope s satisfies
e- 110
_ e- 210
-10<s=-10+21 1-e- 2lO
= 21 -2110- 10- 9
9
u(1O -10 + 10- 9 ) e- lOO + -21- e 110 ~ 2.8.10 37 .
, '
Le., small changes in s will cause very large changes in the values of the
solution at the other endpoint. Hence, we cannot expect that this bound-
ary value problem can be numerically solved by the simple version of the
shooting method. 0
U(Xn,un-l,sn-d - (3 = O.
For the solution of this system Newton's method can again be used. For
details we refer to [36, 50].
Proof. Assume that Ul and U2 are two solutions to the boundary value
problem. Then the difference U = UI - U2 solves the homogeneous boundary
value problem
-U" + qu = 0, u(a) = u(b) = O.
This implies u' = 0 on [a, b], since q 2: o. Hence U is constant on [a, b],
and the boundary conditions finally yield U = 0 on [a, b]. Therefore, the
boundary value problem (11.7)-(11.8) has at most one solution.
11.2 Finite Difference Methods 263
This system is uniquely solvable. Assume that C 1 and C2 solve the homoge-
neous system. Then U = C 1 U1 + C2U2 yields a solution to the homogeneous
boundary value problem. Hence U = 0, since we have already established
uniqueness for the boundary value problem. From this we conclude that
C 1 = C2 = 0 because U1 and U2 are linearly independent, and the exis-
tence proof is complete. 0
for approximate values Uj to the exact solution u(Xj). Here we have set
qj := q(Xj)and rj := r(xj). The system has to be complemented by the
two boundary conditions
Uo = U n +1 = O. (11.11)
For an abbreviated notation we introduce the n x n tridiagonal matrix
-1
264 11. Boundary Value Problems
and the vectors U = (Ul,"" un)T and R = (rl, .. " rn)T. Then our system
of equations, including the boundary conditions, reads
AU=R. (11.12)
Lemma 11.6 Denote by A the matrix of the finite difference method for
q 2: 0 and by A o the corresponding matrix for q = O. Then
Proof. The columns of the inverse A-I = (aI, ,an) satisfy Aaj = ej for
j = 1, ... ,n with the canonical unit vectors el, , en in ffi n. The Jacobi
iterations for the solution of Az = ej starting with Zo = 0 are given by
h4 h4
u(x+h)-2u(x)+u(x-h) = h 2 u"(X) + 24 u(4)(x+O+h)+ 24 u(4)(x-O_h),
(11.13)
Since
A(U - U) = Z,
and from this, using Lemma 11.6 and the estimate (11.13), we obtain
h2
lu(xj) -ujl:::; IIA- 1 Zlloo:::; 121Iu(4)lIooIiAolelloo, j = 1, ... ,n, (11.14)
266 11. Boundary Value Problems
Theorem 11.8 confirms that as in the case of the initial value problems
in Chapter 10, the order of the local discretization error is inherited by
the global error. Note that the assumption in Theorem 11.8 on the dif-
ferentiability of the solution is satisfied if q and r are twice continuously
differentiable.
The error estimate in Theorem 11.8 is not practical in general, since it
requires a bound on the fourth derivative of the unknown exact solution.
Therefore, in practice, analogously to (10.19) the error is estimated from
the numerical results for step sizes hand h/2. Similarly, as in (10.20), a
Richardson extrapolation can be employed to obtain a fourth-order approx-
imation.
Of course, the finite difference approximation can be extended to the
general linear ordinary differential equation of second order
_ul/ + pu' + qu = r
by using the approximation
(11.15)
for the first derivative. This approximation again has an error of order
O(h 2 ) (see Problem 11.9). Besides Richardson extrapolation, higher-order
approximations can be obtained by using higher-order difference approxi-
mations for the derivatives such as
1
ul/(x) ~ 12h2 [-u(x + 2h) + 16u(x + h)
(11.16)
-30u(x) + 16u(x - h) - u(x - 2h)],
with step size h = 1/(n+ 1) and n E IN. Then we approximate the Laplacian
at the internal grid points by
1
t6.U(Xij) ~ h2 {U(Xi+l,j) + U(Xi-l,j) + u(xi,j+d + u(xi,j-d - 4U(Xij)}
for approximate values Uij to the exact solution U(Xij). Here we have set
qij := q(Xij) and rij := r(xij). This system has to be complemented by the
boundary conditions
From the proof of Lemma 11.6 it can be seen that its statement also holds
for the corresponding matrices of the system (11.19)-(11.20). Lemma 11.7
implies that
Theorem 11.10 Assume that the solution to the boundary value problem
(11.17)-(11.18) is four-times continuously differentiable. Then the error of
the finite difference approximation can be estimated by
Theorem 11.11 (Riesz) Let X be a Hilbert space. Then for each bounded
linear function F : X -+ ce there exists a unique element f E X such that
for all u EX. The norms of the element / and the linear junction F
coincide; i. e.,
Ilfll = IIFII· (11.23)
Proof. Uniqueness follows from the observation that because of the positive
definiteness of the scalar product, f = 0 is the only element representing
the zero function F = 0 in the sense of (11.22). For F :f 0 choose w E X
with F( w) :f O. Since F is continuous, the nullspace
F(9)9)
F(u) = ( u'lf9TI2
for all u EX, which completes the proof of (11.22).
From (11.22) and the Cauchy-Schwarz inequality we have that
for all u E X.
Hence
IIAull ~ ellull (11.25)
for all u E X. From (11.25) we observe that Au = 0 implies u = 0; i.e., A
is injective.
Next we show that the range A(X) is closed. Let v be an element of the
closure A(X) and let (v n ) be a sequence from A(X) with V n -+ v, n -+ 00.
Then we can write V n = AU n with some Un E X, and from (11.25) we find
that
(11.26)
is a consequence of (11.25). o
Definition 11.14 Let X be a complex (or real) linear space. Then a func-
tion S : X X X -+ ce (or IR) is called sesquilinear if it is linear with respect
to the first variable and antilinear with respect to the second variable, i. e.,
if
S(au + {3v, w) = as(u, w) + {3S(v, w)
and
S(u,av + {3w) = as(u,v) + j3S(u,w)
for all u, v, w E X and a, {3 E ce (or IR). A sesquilinear function on a
normed space X is called bounded if
IS(u,v)1 ~ Cllullllvil
for all u, v E X and some positive constant C. It is called strictly coercive
if
ReS(u,u) ~ cllull 2
for all u E X and some positive constant e.
Note that for a real linear space, sesquilinear functions are bilinear func-
tions, i.e., linear with respect to both variables. Each bounded and strictly
11.3 The Riesz and Lax-Milgram Theorems 271
for all u,v EX. Then we have (u,A1v - A 2 v) = 0 for all u,v E X, which
implies Atv = A 2 v for all v E X by setting u = A1v - A 2 v. 0
for all u, v EX, and by Theorem 11.11 there exists a uniquely determined
element f such that
F(v) = (v, f)
for all vEX. Hence, the equation (11.27) is equivalent to the equation
Au = f.
However, the latter equation is uniquely solvable as a consequence of the
Lax-Milgram Theorem 11.13. 0
Since the coercivity constants for A and S coincide, from (11.23) and
(11.26) we conclude that
Ilull -< !c IIFII (11.28)
Theorem 11.17 For a bounded and strictly coercive linear operator A the
Galerkin equations (11.30) have a unique solution. It satisfies the error
estimate
Ilun - ull ::; M inf Ilv - ull,
vEX n
(11.32)
(11.33)
the Galerkin equations (11.30) are equivalent to the system of linear equa-
tions
n
LO:k(Wj,Awk) = (Wj,!), j = 1, ... ,n. (11.35)
k=l
274 11. Boundary Value Problems
Here we assume that p E C I [a, b] and q, r E C[a, b] such that p(x) > 0
and q(x) ~ 0 for all x E [a,b]. Multiplying the differential equation by
v and performing a partial integration, it follows that each solution u to
(11.36)-(11.37) satisfies
S(v, u) = F(v) (11.38)
for all v E C I [a, b] with v( a) = v( b) = 0, where we have set
S(u, v) := l b
(pu'V' + quv) dx (11.39)
l
and
b
F(v) := rvdx. (11.40)
l
that
b
[(pul)I - qu + r]vdx = 0 (11.41)
11.4 Weak Solutions 275
for almost all x E [a, b] and some constant c, since by Fubini's theorem
276 11. Boundary Value Problems
for all v E CI [a, b) with v(a) = v(b) = O. Hence both sides of (11.43) have
the same weak derivative.
(11.44)
is a Hilbert space.
Proof. It is readily checked that HI [a, b) is a linear space and that (11.44)
defines a scalar product. Let (un) denote an HI Cauchy sequence. Then
(un) and (u~) are both L 2 Cauchy sequences. From the completeness of
L 2 [a, b) we obtain the existence of u E L 2 [a, b) and w E L 2 [a, b) such that
lIu n - ul12 -t 0 and lIu~ - wl12 -t 0 as n -t 00. Then for all v E C I [a, b)
with v(a) = v(b) = 0 we can estimate
Proof. Since C[a, b) is dense in L 2 [a, b], for each u E HI[a, b) and € > 0 there
exists w E C[a, b) such that Ilu' - wl12 < €. Then we define v E CI [a, b) by
v(x) := u(a) + l x
w({) d{,
u(x) - v(x) = l x
{u'W - w({)} d~.
By the Cauchy-Schwarz inequality this implies lIu - vl12 < (b - a)€, and
the proof is complete. 0
follows for all x, y E [a, b]. Therefore, every function u E HI [a, b] belongs to
C[a, b], or more precisely, it coincides almost everywhere with a continuous
function. 0
for some constant C. The latter inequality means that the HI norm is
stronger than the maximum norm (in one space dimension!).
Proof. Since the HI norm is stronger than the maximum norm, each HI
convergent sequence of elements of HJ [a, b] has its limit in HJ [a, b]. There-
fore HJ [a, b] is a closed subspace of HI [a, b], and the statement follows from
Remark 3.40. 0
Theorem 11.24 Assume that p > 0 and q ~ O. Then there exists a unique
weak solution to the boundary value problem (11.36)-(11.37).
by the Cauchy~Schwarz inequality. For u EHJ [a, b], from (11.45) and the
Cauchy-Schwarz inequality we obtain that
for all u E HJ
[a, b] and some positive constant Cj i.e., S is strictly coercive.
Finally, by the Cauchy-Schwarz inequality we have
We note that from (11.28) and the previous inequality it follows that
(11.46)
for the weak solution u to the boundary value problem (11.36)-( 11.37).
Theorem 11.25 Each weak solution to the boundary value problem (11.36)-
(11.37) is also a classical solution; i. e., it is twice continuously differen-
tiable.
Proof. Define
f(x) := l [q(~)u(O
x
- r(~)] d~, x E [a, b].
l b
[pu' - f]v' dx = 0
l (P(~)u'(~)
and x
vo(x) := - f(~) - c]d~, x E [a,b].
11.5 The Finite Element Method 279
Hence
pu' = 1+ c,
and since I and p are in C [a, b] with p(x) > 0 for all x E [a, b], we can
1
1
h (Xk+1 - x),
0,
Each u E X n can be represented in the form
and
S(wj,wj+d = S(Wj+I,Wj)
1 {XHI 1 (Xj+1
= ~2 J p(x) dx + h2 J q(x)(Xj+! - x)(x - Xj) dx,
xJ xJ
11.5 The Finite Element Method 281
F(wj) = h1 {lX'
J r(x)(x-Xj-ddx+) lX'+1 r(x)(xj+l-x)dx } .
X)-l x)
These equations illustrate two general features of the finite element meth-
ods. Firstly, it is characteristic for the finite element method that the co-
efficients are computed by the same formula for each subinterval, i.e., for
each of the finite elements into which the total interval is subdivided.
Secondly, as already mentioned earlier, the Galerkin method is only
semidiscrete. In order to make it fully discrete, a numerical quadrature
has to be applied. If we remain within our framework of approximations
and approximate P, q, and r by linear splines, we obtain
and
-1 h
S(wj,wj+d ~ 2h (Pj + pj+d + 12 (qj + qj+d
for the matrix elements, and
for the right-hand sides. Here, as above, we have set Pj = p(Xj), qj = q(Xj),
and rj = r(xj). Similar to the linear system (11.10)-(11.11) for the finite
difference method, the tridiagonal linear system is irreducible and weakly
row-diagonally dominant. It also is accessible to convergence acceleration
of the Jacobi iterations by relaxation and multigrid methods.
In order to derive an error estimate for the semidiscrete version of the
finite element method with linear splines from Theorem 11.17, we need an
estimate for the interpolation error for linear splines with respect to the
Hi norm (see also Theorem 8.33).
Lemma 11.26 Let f ∈ C²[a,b]. Then the remainder R₁f := f − L₁f
for the linear interpolation at the two endpoints a and b can be estimated
by

$$\|R_1 f\|_{L^2} \le (b-a)^2\,\|f''\|_{L^2}, \qquad \|(R_1 f)'\|_{L^2} \le (b-a)\,\|f''\|_{L^2}. \tag{11.49}$$
Proof. For each function g ∈ C¹[a,b] satisfying g(a) = 0, from

$$g(x) = \int_a^x g'(\xi)\,d\xi$$

and the Cauchy–Schwarz inequality we obtain Friedrich's inequality

$$\|g\|_{L^2} \le (b-a)\,\|g'\|_{L^2} \tag{11.50}$$

for functions g ∈ C¹[a,b] with g(a) = 0 (or g(b) = 0). Using the interpolation
property (R₁f)(a) = (R₁f)(b) = 0 and partial integration, (11.49) follows
with the aid of Friedrich's inequality (11.50) applied to g = R₁f. □
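The two bounds of Lemma 11.26 are easy to test numerically. Below is a small sanity check of ours (the test function sin 3x and the crude rectangle-rule L² norms are our choices, not the book's):

    import numpy as np

    # Check the bounds of Lemma 11.26 for f(x) = sin(3x) on [a, b] = [0, 1].
    a, b = 0.0, 1.0
    x = np.linspace(a, b, 20001)
    dx = x[1] - x[0]
    l2 = lambda g: np.sqrt(dx * np.sum(g**2))        # crude L2 norm

    f = np.sin(3 * x)
    L1f = f[0] + (f[-1] - f[0]) * (x - a) / (b - a)  # linear interpolation at a, b
    R1f = f - L1f                                    # remainder R_1 f
    dR1f = np.gradient(R1f, x)                       # (R_1 f)'
    d2f = -9 * np.sin(3 * x)                         # f''

    print(l2(R1f), "<=", (b - a)**2 * l2(d2f))   # first estimate of (11.49)
    print(l2(dR1f), "<=", (b - a) * l2(d2f))     # second estimate of (11.49)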
(11.51)

whence the assertion follows. Now (11.51) is a consequence of the error
estimate for the Galerkin method of Theorem 11.17. □
Since S(v, u) = F(v) and S(v, uₙ) = F(v) for all v ∈ Xₙ, using the symmetry
of S we have

$$S(u - u_n, v) = 0$$

for all v ∈ Xₙ. Inserting the Galerkin approximation to z, which we denote
by zₙ, into the last equation and subtracting from (11.52), we obtain

(11.53)

Since S is bounded, from (11.53) and (11.51), applied to u − uₙ and z − zₙ,
we can conclude that

for some constant C₂. Now the assertion of the theorem follows from the
last two inequalities. □
Problems
11.1 Consider multiple shooting for the boundary value problem
11.2 Write a computer program for multiple shooting using the Newton method
and the Runge-Kutta method and test it for various examples.
11.3 Show that the boundary value problem for the differential equation
11.4 Show that the general solution of the linear differential equation (11.7) is
given by (11.9).
11.5 For p ∈ C¹[a,b] and q ∈ C[a,b] show that the boundary value problem

$$u'' + pu' + qu = r \ \text{ in } [a,b], \quad u(a) = u(b) = 0,$$

is solvable for each right-hand side r ∈ C[a,b] if and only if the boundary value
problem

$$u'' + pu' + qu = 0 \ \text{ in } [a,b], \quad u(a) = u(b) = 0,$$

admits only the trivial solution u = 0.
11.6 Find the solution of the boundary value problem
11.7 Write a computer program for the finite difference method (11.10)-(11.11)
and test it for various examples.
11.8 Find the explicit solution for the finite difference approximation (11.10)-
(11.11) for the boundary value problem
11.9 Show that the error in the finite difference approximation (11.15) is of
order O(h²) and that the error in the approximation (11.16) is of order O(h⁴).
11.10 Prove the estimate ‖u₀‖∞ ≤ 1/8 for the solution to the boundary value
problem (11.21).
11.11 In the space C[a,b] with scalar product

$$(u, v) := \int_a^b u(x)\,v(x)\,dx,$$

consider the functional defined by

$$F(u) := \int_a^b u(x)\,dx.$$

Show that F is linear and bounded. Is there an f ∈ C[a,b] such that F(u) = (u, f)
for all u ∈ C[a,b]? Does your answer agree with the Riesz Theorem 11.11?
11.12 In the pre-Hilbert space of Problem 11.11, for a fixed x ∈ [a,b] consider
the point evaluation functional F : C[a,b] → ℂ defined by

$$F(u) := u(x).$$
$$u_{n+1} = u_n - \alpha_n p_n,$$
$$p_n = r_n + \beta_{n-1}\,p_{n-1},$$
$$r_n = r_{n-1} - \alpha_{n-1} A p_{n-1},$$
$$\alpha_{n-1} = (r_{n-1}, p_{n-1}) / (A p_{n-1}, p_{n-1}),$$
$$\beta_{n-1} = -(r_n, A p_{n-1}) / (A p_{n-1}, p_{n-1}).$$
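These recursions (they belong to the conjugate gradient iteration treated in Problem 11.15, whose statement is abridged here) translate almost verbatim into code. A minimal sketch of ours for a symmetric positive definite matrix A, with the residual convention rₙ = Auₙ − f:

    import numpy as np

    def conjugate_gradient(A, f, u0, steps):
        """Iterate u_{n+1} = u_n - alpha_n p_n following the recursions above."""
        u = u0.copy()
        r = A @ u - f            # residual r_0
        p = r.copy()             # initial search direction p_0 = r_0
        for _ in range(steps):
            Ap = A @ p
            alpha = (r @ p) / (Ap @ p)
            u = u - alpha * p
            r = r - alpha * Ap            # r_n = r_{n-1} - alpha_{n-1} A p_{n-1}
            beta = -(r @ Ap) / (Ap @ p)   # beta_{n-1} = -(r_n, A p_{n-1}) / (A p_{n-1}, p_{n-1})
            p = r + beta * p              # p_n = r_n + beta_{n-1} p_{n-1}
        return u

    A = np.array([[4.0, 1.0], [1.0, 3.0]])
    f = np.array([1.0, 2.0])
    print(conjugate_gradient(A, f, np.zeros(2), 2))  # exact after 2 steps for a 2x2 system
    print(np.linalg.solve(A, f))                     # reference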
11.16 Show that under the assumptions of Problem 11.15 for the Galerkin
equations the SOR method of Section 4.2 converges for 0 < ω < 2.
11.17 Show that the weak derivative, if it exists, is unique and that each func-
tion with vanishing weak derivative must be constant almost everywhere.
11.18 Write a computer program for the finite element method with linear
splines and test it for various examples. Compare the numerical results with
those for the finite difference method.
11.19 Let B₋₁, B₀, B₁, …, Bₙ, Bₙ₊₁, Bₙ₊₂ denote the cubic B-splines for the
equidistant grid xⱼ := a + jh, j = 0, …, n + 1, with step size h = (b − a)/n. Show
that

u₀ := B₀ − 4B₋₁,  u₁ := B₁ − B₋₁,
11.20 Formulate and prove analogues of Theorems 11.27 and 11.28 for the finite
element approximation using cubic splines as in Problem 11.19.
12
Integral Equations
The topic of the last chapter of this book is linear integral equations, of
which

$$\int_a^b K(x,y)\,\varphi(y)\,dy = f(x), \quad x \in [a,b],$$

and

$$\varphi(x) - \int_a^b K(x,y)\,\varphi(y)\,dy = f(x), \quad x \in [a,b],$$

are typical examples. In these equations the function φ is the unknown,
and the so-called kernel K and the right-hand side f are given functions.
The above equations are called Fredholm integral equations of the first and
second kind, respectively. Since both the theory and the numerical approx-
imations for integral equations of the first kind are far more complicated
than for integral equations of the second kind, we will confine our presen-
tation to the latter case.
Integral equations provide an important tool for solving boundary value
problems for both ordinary and partial differential equations (see Problem
12.1 and [39]). Their historical development is closely related to the solution
of boundary value problems in potential theory in the last decades of the
nineteenth century. Progress in the theory of integral equations also had a
great impact on the development of functional analysis.
Omitting the proofs, we will present the main results of the Riesz theory
for compact operators as the foundation of the existence theory for integral
equations of the second kind. Then we will develop the fundamental ideas
of the Nyström method and the collocation method as the two most important
methods for their numerical solution.
The integral equation

$$\varphi(x) - \int_a^b K(x,y)\,\varphi(y)\,dy = f(x), \quad x \in [a,b], \tag{12.1}$$

with continuous kernel K has a unique solution φ ∈ C[a,b] for each right-hand
side f ∈ C[a,b] if and only if the homogeneous integral equation

$$\varphi(x) - \int_a^b K(x,y)\,\varphi(y)\,dy = 0, \quad x \in [a,b], \tag{12.2}$$

has only the trivial solution.
has only the trivial solution. The importance of this result originates from
the fact that it reduces the difficult problem of establishing existence of
a solution to the inhomogeneous integral equation to the simpler problem
of showing that the homogeneous integral equation allows only the trivial
solution φ = 0, and it extends the corresponding statement for systems
of linear equations to the case of integral equations. Actually, Fredholm
derived his results by interpreting integral equations as a limiting case of
linear systems by considering the integral as a limit of Riemann sums and
passing to the limit in Cramer's rule for the solution of linear systems. For
the solution of integral equations with continuous kernels, Fredholm's ap-
proach is still the most elegant and shortest. However, since it is restricted
to the case of continuous kernels, it is more convenient to consider the
above equations as a special case of operator equations of the second kind
with a compact operator, as presented by Riesz in 1918.
Definition 12.1 A linear operator A : X → Y from a normed space X
into a normed space Y is called compact if for each bounded sequence (φₙ)
in X the sequence (Aφₙ) contains a convergent subsequence in Y, i.e., if
each sequence from the set {Aφ : φ ∈ X, ‖φ‖ ≤ 1} contains a convergent
subsequence.
$$(A\varphi)(x) := \int_a^b K(x,y)\,\varphi(y)\,dy, \quad x \in [a,b], \tag{12.3}$$

$$|\varphi(x)| \le C$$

for all x ∈ [a,b] and all φ ∈ U, and for every ε > 0 there exists δ > 0 such
that

$$|\varphi(x) - \varphi(y)| < \varepsilon$$

for all x, y ∈ [a,b] with |x − y| < δ and all φ ∈ U.
Proof. For all φ ∈ C[a,b] with ‖φ‖∞ ≤ 1 and all x ∈ [a,b], we have that

$$|(A\varphi)(x)| \le \int_a^b |K(x,y)|\,dy \le (b-a) \max_{x,y \in [a,b]} |K(x,y)|;$$

i.e., the set U := {Aφ : φ ∈ C[a,b], ‖φ‖∞ ≤ 1} ⊂ C[a,b] is bounded. Since
K is uniformly continuous on the square [a,b] × [a,b], for every ε > 0 there
exists δ > 0 such that

$$|K(x,z) - K(y,z)| < \frac{\varepsilon}{b - a}$$

for all x, y ∈ [a,b] with |x − y| < δ. Then

$$|(A\varphi)(x) - (A\varphi)(y)| \le \int_a^b |K(x,z) - K(y,z)|\,dz < \varepsilon$$

for all x, y ∈ [a,b] with |x − y| < δ and all φ ∈ C[a,b] with ‖φ‖∞ ≤ 1; i.e.,
U is equicontinuous. Hence A is compact by the Arzelà–Ascoli Theorem
12.3. □
In our analysis we also will need an explicit expression for the norm of
the integral operator A.
and thus

$$\|A\|_\infty \le \max_{a \le x \le b} \int_a^b |K(x,y)|\,dy.$$

For the reverse inequality, choose x₀ ∈ [a,b] such that the maximum is
attained and, for ε > 0, define ψ ∈ C[a,b] by

$$\psi(y) := \frac{K(x_0, y)}{|K(x_0, y)| + \varepsilon}\,, \quad y \in [a,b].$$

Then ‖ψ‖∞ ≤ 1 and

$$(A\psi)(x_0) = \int_a^b \frac{|K(x_0,y)|^2}{|K(x_0,y)| + \varepsilon}\,dy \ge \int_a^b \frac{|K(x_0,y)|^2 - \varepsilon^2}{|K(x_0,y)| + \varepsilon}\,dy = \int_a^b |K(x_0,y)|\,dy - \varepsilon(b - a).$$

Hence

$$\|A\|_\infty = \sup_{\|\varphi\|_\infty \le 1} \|A\varphi\|_\infty \ge \|A\psi\|_\infty \ge \int_a^b |K(x_0,y)|\,dy - \varepsilon(b - a),$$

and since this holds for all ε > 0, we have
It also can be shown that the integral operator remains compact if the
kernel K is merely weakly singular (see [39]). A kernel K is said to be
weakly singular if it is defined and continuous for all x, y ∈ [a,b], x ≠ y,
and there exist positive constants M and α ∈ (0,1] such that

$$|K(x,y)| \le M\,|x - y|^{\alpha - 1} \quad \text{for all } x, y \in [a,b],\ x \ne y.$$
The general idea for approximately solving an equation

$$\varphi - A\varphi = f$$

of the second kind is to replace it by an equation

$$\varphi_n - A_n\varphi_n = f_n$$

with approximating sequences Aₙ → A and fₙ → f as n → ∞. For
computational purposes, the approximating equations will be chosen such that
In order to develop a similar analysis for the case where the sequence
(Aₙ) is merely pointwise convergent, i.e., Aₙφ → Aφ, n → ∞, for all φ, we
will have to bridge the gap between norm and pointwise convergence. This
goal will be achieved through the concept of collectively compact operator
sequences and the following uniform boundedness principle.
Theorem 12.7 Let the sequence Aₙ : X → Y of bounded linear operators
mapping a Banach space X into a normed space Y be pointwise bounded;
i.e., for each φ ∈ X there exists a constant Cφ such that ‖Aₙφ‖ ≤ Cφ for
all n ∈ ℕ. Then the sequence (Aₙ) is uniformly bounded; i.e.,
sup_{n∈ℕ} ‖Aₙ‖ < ∞.
Proof. In the first step we show that there exist ψ ∈ X, ρ > 0, and M > 0
such that

$$\|A_n \varphi\| \le M \tag{12.6}$$

for all φ ∈ X with ‖φ − ψ‖ ≤ ρ and all n ∈ ℕ. Assume that this is not
possible. Then, by induction, we construct sequences (nₖ) in ℕ, (ρₖ) in ℝ,
and (φₖ) in X such that

$$\|A_{n_k} \varphi\| \ge k$$

for k = 0, 1, 2, … and all φ with ‖φ − φₖ‖ ≤ ρₖ, and

$$0 < \rho_k \le \tfrac{1}{2}\,\rho_{k-1}$$

for k = 1, 2, ….

We initiate the induction by setting n₀ = 1, ρ₀ = 1, and φ₀ = 0. Assume
that nₖ ∈ ℕ, ρₖ > 0, and φₖ ∈ X are given. Then there exist nₖ₊₁ ∈ ℕ
and φₖ₊₁ ∈ X satisfying ‖φₖ₊₁ − φₖ‖ ≤ ρₖ/2 and ‖A_{nₖ₊₁} φₖ₊₁‖ ≥ k + 2.
Otherwise, we would have ‖Aₙφ‖ ≤ k + 2 for all φ ∈ X with ‖φ − φₖ‖ ≤ ρₖ/2
and all n ∈ ℕ, and this contradicts our assumption. Set ρₖ₊₁ ∈ (0, ρₖ/2]
small enough that, by the continuity of A_{nₖ₊₁}, we have ‖A_{nₖ₊₁} φ‖ ≥ k + 1
for all φ ∈ X with ‖φ − φₖ₊₁‖ ≤ ρₖ₊₁.

Then for all φ ∈ X with ‖φ − φⱼ‖ ≤ ρⱼ, where j > k, by the triangle
inequality we have

$$\|\varphi - \varphi_k\| \le \|\varphi - \varphi_j\| + \frac{1}{2}\,\rho_{j-1} + \cdots + \frac{1}{2}\,\rho_k \le \rho_k;$$

i.e., the closed balls ‖φ − φₖ‖ ≤ ρₖ are nested. Since ρₖ ≤ 2⁻ᵏ, the sequence
(φₖ) is a Cauchy sequence; its limit φ* belongs to every one of these balls,
whence ‖A_{nₖ} φ*‖ ≥ k for all k, in contradiction to the pointwise boundedness.
This proves (12.6).
Now, in the second step, from the validity of (12.6) we deduce for each
φ ∈ X with ‖φ‖ ≤ 1 and for all n ∈ ℕ the estimate

$$\|A_n \varphi\| = \frac{1}{\rho}\,\|A_n(\rho\varphi + \psi) - A_n \psi\| \le \frac{2M}{\rho}\,.$$

This completes the proof. □
Proof. Assume that (12.7) is not valid. Then there exist ε₀ > 0, a sequence
(nₖ) in ℕ with nₖ → ∞, k → ∞, and a sequence (φₖ) in X with ‖φₖ‖ ≤ 1
such that

(12.8)
(12.10)

The first term on the right-hand side of (12.10) tends to zero as j → ∞,
since the operator sequence (Bₙ) is pointwise convergent. The second term
tends to zero as j → ∞, since the operator sequence (Bₙ) is uniformly
bounded by Theorem 12.7 and since we have the convergence (12.9). Therefore,
passing to the limit j → ∞ in (12.10) yields a contradiction to (12.8),
and the proof is complete. □
$$(I - A)^{-1} = I + (I - A)^{-1} A$$

suggests
(12.12)
where
From Lemma 12.9 we conclude that ‖Sₙ‖ → 0, n → ∞. Hence for
sufficiently large n we have ‖Sₙ‖ ≤ q < 1. For these n, by the Neumann series
Theorem 3.48, the inverse operators (I − Sₙ)⁻¹ exist and are uniformly
bounded by
Note that both error estimates (12.5) and (12.11) show that the accuracy
of the approximate solution essentially depends on how well Aₙφ
approximates Aφ for the exact solution φ.
$$\varphi - A\varphi = f$$

is approximated by the solution of

$$\varphi_n - A_n\varphi_n = f,$$

which reduces to solving a finite-dimensional linear system.
Then the values φⱼ⁽ⁿ⁾ := φₙ(xⱼ), j = 0, …, n, at the quadrature points
satisfy the linear system

$$\varphi_j^{(n)} - \sum_{k=0}^{n} a_k K(x_j, x_k)\,\varphi_k^{(n)} = f(x_j), \quad j = 0, \ldots, n. \tag{12.14}$$
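Setting up and solving (12.14) takes only a few lines. The sketch below is ours, not taken from the book; it accepts arbitrary quadrature points and weights, returns the Nyström interpolant of (12.15), and assumes the kernel accepts NumPy arrays and broadcasts.

    import numpy as np

    def nystroem(kernel, f, x, w):
        """Solve phi - A_n phi = f via the linear system (12.14).

        x, w: quadrature points and weights of the chosen rule.
        Returns a function evaluating the Nystroem interpolant (12.15)
        at an array of points t.
        """
        A = np.eye(len(x)) - w[np.newaxis, :] * kernel(x[:, np.newaxis], x[np.newaxis, :])
        phi = np.linalg.solve(A, f(x))        # nodal values phi_j^(n)
        # phi_n(t) = f(t) + sum_k a_k K(t, x_k) phi_k^(n)
        return lambda t: f(t) + kernel(t[:, np.newaxis], x[np.newaxis, :]) @ (w * phi)

Exchanging the quadrature rule means exchanging only x and w; this is exactly the flexibility emphasized below.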
Proof. The first statement is trivial. For a solution φⱼ⁽ⁿ⁾, j = 0, …, n, of
the system (12.14) the function φₙ defined by (12.15) has values

$$\varphi_n(x_j) = f(x_j) + \sum_{k=0}^{n} a_k K(x_j, x_k)\,\varphi_k^{(n)} = \varphi_j^{(n)}, \quad j = 0, \ldots, n.$$
(12.16)

$$\|A_n\varphi\|_\infty \le \max_{a \le x \le b} \sum_{k=0}^{n} |a_k K(x, x_k)|,$$

and
for all x₁, x₂ ∈ [a,b]. From (12.17) and (12.18) we see that

$$\|A(\varphi\psi_\varepsilon) - A\varphi\|_\infty \le \max_{x,y \in [a,b]} |K(x,y)| \int_a^b \{1 - \psi_\varepsilon(y)\}\,dy \to 0, \quad \varepsilon \to 0,$$

for all φ ∈ C[a,b] with ‖φ‖∞ = 1. Using this result, we derive

$$\|A - A_n\|_\infty = \sup_{\|\varphi\|_\infty = 1} \|(A - A_n)\varphi\|_\infty \ge \sup_{\|\varphi\|_\infty = 1}\,\sup_{\varepsilon > 0} \|(A - A_n)(\varphi\psi_\varepsilon)\|_\infty$$
$$\|A\varphi - A_n\varphi\|_\infty = \max_{a \le x \le b} \left| \int_a^b K(x,y)\,\varphi(y)\,dy - \sum_{k=0}^{n} a_k K(x, x_k)\,\varphi(x_k) \right|$$

and requires a uniform estimate for the error of the quadrature applied
to the integration of K(x,·)φ. Therefore, from the error estimate (12.11),
it follows that under suitable regularity assumptions on the kernel K and
the exact solution φ, the convergence order of the underlying quadrature
formulae carries over to the convergence order of the approximate solutions
to the integral equation. We illustrate this by the case of the trapezoidal
rule. Under the assumption φ ∈ C²[a,b] and K ∈ C²([a,b] × [a,b]), by
Theorem 9.7, we can estimate

$$\|A\varphi - A_n\varphi\|_\infty \le \frac{1}{12}\,h^2\,(b - a) \max_{a \le x,y \le b} \left| \frac{\partial^2}{\partial y^2}\,[K(x,y)\,\varphi(y)] \right|.$$
$$\varphi(x) - \frac{1}{2} \int_0^1 (x + 1)\,e^{-xy}\,\varphi(y)\,dy = e^{-x} - \frac{1}{2} + \frac{1}{2}\,e^{-(x+1)}, \quad 0 \le x \le 1, \tag{12.19}$$

with exact solution φ(x) = e⁻ˣ. For its kernel we have

$$\max_{0 \le x \le 1} \frac{1}{2} \int_0^1 (x + 1)\,e^{-xy}\,dy = \sup_{0 < x \le 1} \frac{x + 1}{2x}\,(1 - e^{-x}) < 1.$$

Therefore, by the Neumann series Theorem 3.48 and the operator norm
(12.4), equation (12.19) is uniquely solvable.
We use the (composite) trapezoidal rule for approximately solving the
integral equation (12.19) by the Nyström method. Table 12.1 gives the
difference between the exact and approximate solutions and clearly shows
the expected convergence rate O(h²).

We now use the (composite) Simpson's rule for the integral equation
(12.19). The numerical results in Table 12.2 show the convergence order
O(h⁴), which we expect from the error estimate (12.11) and the convergence
order for Simpson's rule from Theorem 9.8. □
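To reproduce the behavior of Table 12.1 qualitatively, here is a compact, self-contained sketch of ours applying the Nyström method with the composite trapezoidal rule to equation (12.19); halving the step size should reduce the error by a factor of about four.

    import numpy as np

    def nystroem_trapezoid_error(n):
        """Nystroem method for (12.19) on n subintervals; returns the max error."""
        x = np.linspace(0.0, 1.0, n + 1)
        w = np.full(n + 1, 1.0 / n)
        w[0] = w[-1] = 0.5 / n                           # trapezoidal weights
        K = 0.5 * (x[:, None] + 1.0) * np.exp(-x[:, None] * x[None, :])
        f = np.exp(-x) - 0.5 + 0.5 * np.exp(-(x + 1.0))
        phi = np.linalg.solve(np.eye(n + 1) - w[None, :] * K, f)
        return np.max(np.abs(phi - np.exp(-x)))          # exact solution is e^{-x}

    for n in (8, 16, 32, 64):
        print(n, nystroem_trapezoid_error(n))            # errors decrease like O(h^2)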
After comparing Tables 12.1 and 12.2, we wish to emphasize the major
advantage of Nyström's method over other methods like the collocation
method, which we will discuss in the next section. The matrix and the
right-hand side of the linear system (12.14) are obtained by just evaluating
the kernel K and the given function f at the quadrature points. Therefore,
without any further computational effort we can improve considerably on
the approximations by choosing a more accurate numerical quadrature for-
mula.
In the next example we consider an integral equation with a periodic
kernel and a periodic solution.
$$\psi(t) + \frac{ab}{\pi} \int_0^{2\pi} \frac{\psi(\tau)\,d\tau}{a^2 + b^2 - (a^2 - b^2)\cos(t + \tau)} = f(t), \tag{12.20}$$

where a ≥ b > 0. This integral equation arises from the solution of the
Dirichlet problem for the Laplace equation in an ellipse with semiaxes a
and b (see [39]). Any solution ψ to the homogeneous form of equation
(12.20) clearly must be a 2π-periodic analytic function, since the kernel is
a 2π-periodic analytic function with respect to the variable t. Hence, we
can expand ψ into a uniformly convergent Fourier series

$$\psi(t) = \sum_{m=0}^{\infty} \alpha_m \cos mt + \sum_{m=1}^{\infty} \beta_m \sin mt.$$
Inserting this into the homogeneous integral equation and using the inte-
grals (see Problem 12.10)
(12.21)
Using the integrals (12.21), it can be seen that the right-hand side becomes
The actual size of the error, i.e., the constant factor in the exponential
decay, depends on the parameters a and b, which describe the location of
the singularities of the integrands in the complex plane; i.e., they determine
the width of the strip of the complex plane into which the kernel can be
extended as a holomorphic function.
Note that for periodic analytic functions the rectangular rule generally
yields better approximations than Simpson's rule (see Problem 9.12). □
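This remark is easy to verify experimentally. A small sketch of ours compares the rectangular rule and Simpson's rule on the 2π-periodic analytic integrand e^{cos t} (our choice of test function; the reference value is 2π·I₀(1)):

    import numpy as np

    g = lambda t: np.exp(np.cos(t))      # 2*pi-periodic analytic test integrand
    exact = 7.954926521012845            # reference value 2*pi*I_0(1)

    for n in (4, 8, 16, 32):
        t = np.linspace(0.0, 2 * np.pi, 2 * n + 1)
        h = np.pi / n
        rect = h * np.sum(g(t[:-1]))                            # rectangular rule
        simp = h / 3 * (g(t[0]) + 4 * np.sum(g(t[1:-1:2]))
                        + 2 * np.sum(g(t[2:-1:2])) + g(t[-1]))  # composite Simpson
        print(n, abs(rect - exact), abs(simp - exact))
    # the rectangular-rule error decays exponentially, Simpson's only like O(h^4)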
in the form

$$L_n f = \sum_{k=0}^{n} f(x_k)\,\ell_k \tag{12.23}$$
(12.24)

and immediately see that equation (12.24) is equivalent to the linear system

$$\sum_{k=0}^{n} \gamma_k \left\{ u_k(x_j) - (A u_k)(x_j) \right\} = f(x_j), \quad j = 0, \ldots, n, \tag{12.25}$$

for the coefficients γ₀, …, γₙ. If we use the Lagrange basis for Xₙ and write
φₙ = Σₖ γₖℓₖ, then of course γⱼ = φₙ(xⱼ), j = 0, …, n, and the system (12.25)
becomes

$$\gamma_j - \sum_{k=0}^{n} \gamma_k\,(A\ell_k)(x_j) = f(x_j), \quad j = 0, \ldots, n. \tag{12.26}$$
From the systems (12.25) and (12.26) it is obvious that the collocation
method is only semidiscrete, since in general, additional approximations
are needed in order to compute the matrix entries (Auₖ)(xⱼ) or (Aℓₖ)(xⱼ).
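To see the structure of (12.26) concretely, here is a minimal sketch of ours for collocation with the hat-function Lagrange basis, where the entries (Aℓₖ)(xⱼ) are approximated by a fine trapezoidal quadrature (one instance of the additional approximations just mentioned):

    import numpy as np

    def collocation_hats(kernel, f, x):
        """Collocation system (12.26) for the hat-function Lagrange basis on x."""
        n = len(x)
        y = np.linspace(x[0], x[-1], 2001)           # fine quadrature grid
        wy = np.full(y.shape, y[1] - y[0])
        wy[0] = wy[-1] = 0.5 * (y[1] - y[0])         # trapezoidal weights
        # hat functions l_k sampled on the fine grid via linear interpolation
        L = np.array([np.interp(y, x, np.eye(n)[k]) for k in range(n)])
        Al = (kernel(x[:, None], y[None, :]) * wy) @ L.T   # entries (A l_k)(x_j)
        return np.linalg.solve(np.eye(n) - Al, f(x))       # gamma_j = phi_n(x_j)

    # e.g. for equation (12.19):
    nodes = np.linspace(0.0, 1.0, 17)
    gamma = collocation_hats(lambda x, y: 0.5 * (x + 1.0) * np.exp(-x * y),
                             lambda x: np.exp(-x) - 0.5 + 0.5 * np.exp(-(x + 1.0)),
                             nodes)
    print(np.max(np.abs(gamma - np.exp(-nodes))))    # error of order h^2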
The collocation method can be interpreted as a projection method; i.e.,
since the interpolating function is uniquely determined by its values at the
interpolation points, equation (12.24) is equivalent to
(12.27)
This equation can be considered as an equation in the whole space C[a,b],
because any solution φₙ = LₙAφₙ + Lₙf automatically belongs to Xₙ.
Hence, our general error and convergence results for operator equations of
the second kind can be applied to the collocation method.
Theorem 12.16 Let A : C[a,b] → C[a,b] be a compact linear operator
such that I − A is injective, and assume that the interpolation operators
Lₙ : C[a,b] → Xₙ satisfy ‖LₙA − A‖∞ → 0, n → ∞. Then, for sufficiently
large n, the approximate equation (12.27) is uniquely solvable for all
f ∈ C[a,b], and we have the error estimate

(12.28)

for some positive constant C depending on A.
$$\varphi(x) - \int_a^b K(x,y)\,\varphi(y)\,dy = f(x), \quad x \in [a,b], \tag{12.29}$$

$$\varphi_n(x) - \int_a^b [L_n K(\cdot\,, y)](x)\,\varphi_n(y)\,dy = (L_n f)(x), \quad x \in [a,b], \tag{12.30}$$

$$\sum_{k=0}^{n} \gamma_k \left\{ u_k(x_j) - \int_a^b K(x_j, y)\,u_k(y)\,dy \right\} = f(x_j), \quad j = 0, \ldots, n, \tag{12.31}$$
and
(12.32)
u₀, …, uₙ, and for the collocation points x₀, …, xₙ. We briefly discuss two
possibilities, based on linear splines and on trigonometric polynomials.
First we consider piecewise linear interpolation. Let xⱼ = a + jh,
j = 0, …, n, denote an equidistant subdivision with step size h = (b − a)/n,
and let Xₙ be the space of continuous functions on [a,b] whose restrictions
on each of the subintervals [xⱼ₋₁, xⱼ], j = 1, …, n, coincide with a linear
function. As in Section 11.5, the Lagrange basis is given by

$$\ell_k(x) = \begin{cases} \dfrac{1}{h}\,(x - x_{k-1}), & x_{k-1} \le x \le x_k, \\[4pt] \dfrac{1}{h}\,(x_{k+1} - x), & x_k \le x \le x_{k+1}, \\[4pt] 0, & \text{otherwise}, \end{cases}$$

for k = 0, …, n. For piecewise linear interpolation we have that

$$\|L_n f\|_\infty \le \max_{j=0,\ldots,n} |f(x_j)| \le \|f\|_\infty,$$
Theorem 12.18 The collocation method with linear splines converges for
integral equations of the second kind with continuous kernels.
Provided that the exact solution of the integral equation is twice continuously
differentiable, from the error estimate (8.9) for linear interpolation
and Corollary 12.17 we derive an error estimate of the form

$$\|\varphi_n - \varphi\|_\infty \le C\,h^2\,\|\varphi''\|_\infty$$

for the linear spline collocation approximate solution φₙ. Here, C denotes
some constant depending on the kernel K.
In general, in most practical problems the evaluation of the matrix entries
in (12.32) will require a numerical quadrature for integrals of the form
∫ₐᵇ K(xⱼ, y) ℓₖ(y) dy. To be consistent with our approximations, we replace
K(xⱼ, ·) by its piecewise linear interpolation; i.e., we approximate

$$\int_a^b K(x_j, y)\,\ell_k(y)\,dy \approx \sum_{i=0}^{n} K(x_j, x_i) \int_a^b \ell_i(y)\,\ell_k(y)\,dy$$
with

$$K_n(x, y) := \sum_{i=0}^{n} K(x, x_i)\,\ell_i(y);$$

and using the fact that for the piecewise linear spline interpolation we have
‖Lₙ‖∞ = 1, from (8.9) we obtain
Lemma 12.20 Let f ∈ C¹[0, 2π]. Then for the remainder in trigonometric
interpolation we have

(12.34)

where cₙ → 0, n → ∞.
$$\int_0^{2\pi} f'(t)\,e^{-imt}\,dt = im \int_0^{2\pi} f(t)\,e^{-imt}\,dt = 2\pi i m\,a_m$$
for all φ ∈ C[0, 2π]. Hence, ‖LₙA − A‖∞ ≤ cₙM → 0, n → ∞, and Theorem
12.16 can be applied to obtain the following result.
Theorem 12.21 The collocation method with trigonometric polynomials
converges for integral equations of the second kind with continuously
differentiable periodic kernels and right-hand sides.
One possibility for the implementation of the collocation method is to
use the trigonometric monomials as basis functions. Then the integrals
∫₀^{2π} K(tⱼ, τ) e^{ikτ} dτ have to be evaluated numerically. Replacing the kernel
by its trigonometric interpolation leads to the quadrature formula

for j = 0, …, 2n − 1. Using fast Fourier transform techniques (see Section
8.2) these quadratures can be carried out very rapidly. A second, even
more efficient, possibility is to use the Lagrange basis
(12.35)

for k = 0, …, 2n − 1, which can be derived from Theorem 8.25 (see Problem
12.13).

For the evaluation of the matrix coefficients ∫₀^{2π} K(tⱼ, τ) ℓₖ(τ) dτ we
proceed analogously to the preceding case of linear splines. We approximate
these integrals by replacing K(tⱼ, ·) by its trigonometric interpolation
polynomial; i.e., we approximate
(12.36)

for m, k = 0, …, 2n − 1. Note that despite the global nature of the
trigonometric interpolation and its Lagrange basis, due to the simple structure of
the weights (12.36) in the quadrature rule, the computation of the matrix
elements is not too costly. The only additional computational effort besides
the kernel evaluation is the computation of the row sums

$$\sum_{m=0}^{2n-1} (-1)^m K(t_j, t_m)$$

for j = 0, …, 2n − 1. We omit the analysis of the additional error in the
fully discrete method caused by the numerical quadrature.
Example 12.22 For the integral equation (12.20) from Example 12.15,
Table 12.5 gives the error between the exact solution and the collocation
approximation.
TABLE 12.5. Collocation method for equation (12.20)
n    t = 0    t = π/2    t = π
12.5 Stability
For finite-dimensional approximations of a given operator equation we have
to distinguish three condition numbers, namely, the condition numbers of
the original operator and of the approximating operator as mappings in
the underlying normed spaces, and the condition number of the linear sys-
tem for the actual numerical solution. This latter system we can influence,
for example in the collocation method by the choice of the basis for the
approximating subspaces.
Consider an equation of the second kind φ − Aφ = f in a Banach space
X and approximating equations φₙ − Aₙφₙ = fₙ under the assumptions of
Theorem 12.6, i.e., norm convergence, or of Theorem 12.10, i.e., collective
compactness and pointwise convergence. Then, recalling Definition 5.2 of
the condition number, from Theorems 12.6 and 12.10 it follows that the
condition numbers cond(I − Aₙ) are uniformly bounded. Hence, for the
condition of the approximating scheme, we mainly have to be concerned
with the condition of the linear system for the actual computation of the
solution of φₙ − Aₙφₙ = fₙ.
For the discussion of the condition number for the Nyström method we
recall the linear system (12.14) and denote by Aₙ the matrix with the
entries aₖK(xⱼ, xₖ). We introduce operators Rₙ : C[a,b] → ℝⁿ⁺¹ by

and Mₙ : ℝⁿ⁺¹ → C[a,b], where MₙΦ is the piecewise linear interpolation
with (MₙΦ)(xⱼ) = Φⱼ, j = 0, …, n, for Φ = (Φ₀, …, Φₙ)ᵀ. (If a < x₀, we
set (MₙΦ)(x) = Φ₀ for a ≤ x ≤ x₀; and if xₙ < b, we set (MₙΦ)(x) = Φₙ
for xₙ ≤ x ≤ b.) Then clearly, ‖Rₙ‖∞ = ‖Mₙ‖∞ = 1.
From Theorem 12.11 we conclude that
and
Theorem 12.23 For the Nyström method the condition numbers for the
linear system are uniformly bounded.

This theorem states that the Nyström method essentially preserves the
stability of the original integral equation.
For the collocation method, we introduce the matrices Eₙ with entries
uₖ(xⱼ) and Aₙ with entries (Auₖ)(xⱼ). Since Xₙ = span{u₀, …, uₙ} is
assumed to be such that the interpolation problem with respect to the
collocation points x₀, …, xₙ is uniquely solvable, the matrix Eₙ is invertible
(see Problem 8.1). In addition, let the operator Wₙ : ℝⁿ⁺¹ → C[a,b] be
defined by

$$W_n : \gamma \mapsto \sum_{k=0}^{n} \gamma_k u_k$$

for γ = (γ₀, …, γₙ)ᵀ, and recall the operators Rₙ and Mₙ from above.
Then we have

and

$$(E_n - A_n)^{-1} = E_n^{-1} R_n (I - L_n A)^{-1} L_n M_n.$$

From these three relations, and the fact that by Theorems 12.7 and 12.16
the sequence of operators (I − LₙA)⁻¹Lₙ is uniformly bounded, we obtain
the following theorem.
Theorem 12.24 Under the assumptions of Theorem 12.16, for the collocation
method the condition number of the linear system satisfies

This theorem suggests that the basis functions must be chosen with caution.
For a poor choice, like monomials, the condition number of Eₙ can
grow quite rapidly. However, for the Lagrange basis, i.e., for the linear
system (12.26), Eₙ becomes the identity matrix, with condition number one.
In addition, ‖Lₙ‖ enters in the estimate on the condition number of the
linear system, and, for example, for polynomial or trigonometric polynomial
interpolation we have ‖Lₙ‖ → ∞, n → ∞ (see Theorem 8.16).
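The influence of the basis can be demonstrated in a few lines. In this sketch of ours, Eₙ is formed on equidistant points for the monomial basis (a Vandermonde matrix) and compared with the Lagrange basis, for which Eₙ is the identity:

    import numpy as np

    # Condition number of E_n = (u_k(x_j)) on equidistant points in [0, 1].
    for n in (4, 8, 12, 16):
        x = np.linspace(0.0, 1.0, n + 1)
        E = np.vander(x, increasing=True)    # monomial basis u_k(x) = x^k
        print(n, np.linalg.cond(E))          # grows rapidly with n
    # For the Lagrange basis u_k(x_j) = delta_{jk}, so E_n = I with condition one.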
In the context of stability we will conclude this chapter with a few re-
marks on integral equations of the first kind.
Theorem 12.25 implies that integral equations of the first kind with con-
tinuous (or weakly singular) kernels are improperly posed problems in the
sense of Hadamard, as described in Chapter 5.
Of course, the ill-posed nature of an equation has consequences for its
numerical treatment. The fact that an operator does not have a bounded
inverse means that the condition numbers of its finite-dimensional approx-
imations grow with the quality of the approximation. Hence, a careless dis-
cretization of ill-posed problems leads to a numerical behavior that at first
glance seems to be paradoxical. Namely, increasing the degree of discretiza-
tion, i.e., increasing the accuracy of the approximation for the operator, will
cause the approximate solution to the equation to become less and less re-
liable. Therefore, straightforward application of the methods described in
this chapter to integral equations of the first kind with continuous kernels
will generate numerical nonsense.
To make this remark more vivid, we consider the approximate solution
of an integral equation of the first kind by the analogue of the linear system
(12.14) for the Nyström method, i.e., by

$$\sum_{k=0}^{n} a_k K(x_j, x_k)\,\varphi_k^{(n)} = f(x_j), \quad j = 0, \ldots, n.$$
The integral equation of the first kind

$$\int_0^1 (x + 1)\,e^{-xy}\,\varphi(y)\,dy = 1 - e^{-(x+1)}, \quad 0 \le x \le 1, \tag{12.37}$$

has the unique solution φ(x) = e⁻ˣ (see Problem 12.20). Table 12.6 gives
the difference between the exact solution and the solution obtained by the
quadrature method using the (composite) trapezoidal rule.
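The instability is easy to watch. A short sketch of ours discretizes (12.37) by the trapezoidal rule, as in the quadrature method above, and prints the condition number together with the error against the exact solution e⁻ˣ:

    import numpy as np

    def first_kind_trapezoid(n):
        """Quadrature method for the first-kind equation (12.37)."""
        x = np.linspace(0.0, 1.0, n + 1)
        w = np.full(n + 1, 1.0 / n)
        w[0] = w[-1] = 0.5 / n
        A = w[None, :] * (x[:, None] + 1.0) * np.exp(-x[:, None] * x[None, :])
        f = 1.0 - np.exp(-(x + 1.0))
        phi = np.linalg.solve(A, f)
        return np.linalg.cond(A), np.max(np.abs(phi - np.exp(-x)))

    for n in (4, 8, 16, 32):
        print(n, *first_kind_trapezoid(n))
    # the condition number explodes as n grows: refining the discretization
    # makes the computed "solution" less reliable, not more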
Problems
12.1 Show that the boundary value problem for the differential equation

$$-u'' + qu = r \quad \text{in } [0,1]$$

with boundary conditions u(0) = u(1) = 0 is equivalent to the integral equation

$$u(x) + \int_0^1 G(x,y)\,q(y)\,u(y)\,dy = \int_0^1 G(x,y)\,r(y)\,dy, \quad x \in [0,1],$$

where

$$G(x,y) := \begin{cases} (1 - x)\,y, & 0 \le y \le x \le 1, \\ (1 - y)\,x, & 0 \le x \le y \le 1, \end{cases}$$

is the so-called Green's function of the boundary value problem.
12.2 Show that linear combinations of compact linear operators are compact
and that the product of two bounded linear operators is compact if one of the
factors is compact.
12.3 Show that the integral operator with continuous kernel is a compact
operator from L²[a,b] into L²[a,b].
12.4 Show that the Volterra integral equation of the second kind

with continuous kernel K has a unique continuous solution φ for each continuous
right-hand side f.
Hint: Show that the homogeneous equation allows only the trivial solution and
use Theorem 12.2.
by successive approximations.
12.7 Show that a sequence (φₙ) of functions φₙ : [a,b] → ℝ that is equicontinuous
and converges pointwise on [a,b] to some function φ : [a,b] → ℝ converges
uniformly on [a,b].
12.9 For the integral operator A and the numerical integration operators Aₙ using
the (composite) trapezoidal rule, derive bounds on ‖(Aₙ − A)A‖∞ and
‖(Aₙ − A)Aₙ‖∞. Relate the results to Lemma 12.9.
12.11 Write a computer program for the Nyström method allowing the use of
different quadrature formulae and test it for various examples.
12.12 Use the quadrature formula (9.36) with the substitution (9.47) in a
Nyström method for the integral equation (12.19). Compare the numerical
results with those obtained from the trapezoidal and Simpson's rule.
12.13 Verify the Lagrange basis (12.35) and the integrals (12.36).
12.14 Show that the Fourier series of a continuously differentiable periodic func-
tion is uniformly convergent.
12.15 In the degenerate kernel approximation the integral equation of the second
kind with continuous kernel K is approximated by the solutions of the
corresponding equation with a degenerate kernel of the form

$$K_n(x,y) = \sum_{j=0}^{n} a_j(x)\,b_j(y).$$
Show how the solution of the approximate equation can be reduced to solving
a system of linear equations. Give an error and convergence analysis based on
Theorem 12.6.
12.16 Use the results of Problem 12.15 to prove Theorem 12.2 for the case of
an integral equation of the second kind with continuous kernel.
12.17 Construct degenerate kernels via interpolation of the kernel K with re-
spect to the first variable and relate this particular degenerate kernel method to
the collocation method (see Problem 12.15).
12.18 The idea of two-grid and multigrid iterations can also be applied to
integral equations of the second kind. For its theoretical foundation assume
the sequence of operators Aₙ : X → X to be either norm convergent (i.e.,
‖Aₙ − A‖ → 0, n → ∞) or collectively compact and pointwise convergent (i.e.,
Aₙφ → Aφ, n → ∞, for all φ ∈ X). Show that the defect correction iteration

using the preceding coarser level converges, provided that n is sufficiently large.
Show that the defect correction iteration

using the coarsest level converges, provided that the approximation A₀ is
sufficiently close to A.
with zₙ quadrature points. Show that each iteration step requires the following
computations. First,

$$g_{n,\nu} := f_n + (A_n - A_m)\,\varphi_{n,\nu}$$

has to be evaluated at the zₘ quadrature points xⱼ⁽ᵐ⁾, j = 1, …, zₘ, on the level
m and at the zₙ quadrature points xⱼ⁽ⁿ⁾, j = 1, …, zₙ, on the level n, by setting
x = xⱼ⁽ᵐ⁾ and x = xⱼ⁽ⁿ⁾, respectively. Then a linear system has to be solved
for the values φₙ,ᵥ₊₁(xⱼ⁽ᵐ⁾) at the zₘ quadrature points xⱼ⁽ᵐ⁾. Finally, the values
at the zₙ quadrature points xⱼ⁽ⁿ⁾, j = 1, …, zₙ, are obtained from the Nyström
interpolation.

Make an operation count for one step of the defect correction iteration. Set up
the corresponding equations for the collocation method.
12.20 Show that the integral equation (12.37) has a unique solution.
References
[1] Anderssen, R.S., de Hoog, F.R., and Lukas, M.A. The Application
and Numerical Solution of Integral Equations. Sijthoff and Noordhoff,
Alphen aan den Rijn 1980.
[5] Aubin, J.P. Applied Functional Analysis. John Wiley & Sons, New
York 1979.
[11] Ciarlet, P.G. The Finite Element Method for Elliptic Problems. North-Holland,
Amsterdam 1978.
[19] Delves, L.M. and Mohamed, J.L. Computational Methods for Integral
Equations. Cambridge University Press, Cambridge 1985.
[20] Dennis, J.E. and Schnabel, R.B. Numerical Methods for Uncon-
strained Optimization and Nonlinear Equations. Prentice-Hall, En-
glewood Cliffs 1983.
[21] Engels, H. Numerical Quadrature and Cubature. Academic Press,
New York 1980.
[22] Engl, H.W., Hanke, M., and Neubauer, A. Regularization of Inverse
Problems. Kluwer Academic Publishers, Dordrecht 1996.
[23] Farin, G. Curves and Surfaces for Computer Aided Geometric Design.
A Practical Guide. 2nd edition. Academic Press, Boston 1990.
[25] Golberg, M.A. and Chen, C.S. Discrete Projection Methods for Inte-
gral Equations. Computational Mechanics Publications, Southamp-
ton 1997.
[26] Golub, G. and Ortega, J.M. Scientific Computing. Academic Press,
Boston 1993.
[27] Golub, G. and van Loan, C. Matrix Computations. Johns Hopkins
University Press, Baltimore 1989.
[32] Hairer, E., Nørsett, S.P., and Wanner, G. Solving Ordinary Differential
Equations. Nonstiff Problems. Springer-Verlag, Berlin 1987.
[43] Louis, A.K. Inverse und schlecht gestellte Probleme. Teubner, Stutt-
gart 1989.
[50] Roberts, S.M. and Shipman, J.S. Two-Point Boundary Value Prob-
lems: Shooting Methods. Elsevier, New York 1972.
[53] Schumaker, L.L. Spline Functions: Basic Theory. John Wiley & Sons,
Chichester 1981.