NUMERICAL LINEAR ALGEBRA
Lecture Notes Mathematics

Rolf Rannacher
Institute of Applied Mathematics
Heidelberg University

Heidelberg University Publishing
Contents

0 Introduction
  0.1 Basic notation of Linear Algebra and Analysis
  0.2 Linear algebraic systems and eigenvalue problems
  0.3 Numerical approaches
  0.4 Applications and origin of problems
      0.4.1 Gaussian equalization calculus
      0.4.2 Discretization of elliptic PDEs
      0.4.3 Hydrodynamic stability analysis

Bibliography
Index
0 Introduction
The subject of this course is numerical algorithms for solving problems in Linear Algebra,
such as linear algebraic systems and corresponding matrix eigenvalue problems. The
emphasis is on iterative methods suitable for large-scale problems arising, e. g., in the
discretization of partial differential equations and in network problems.
At first, we introduce some standard notation in the context of (finite dimensional) vector
spaces of functions and their derivatives. Let K denote the field of real or complex
numbers R or C , respectively. Accordingly, for n ∈ N , let Kn denote the n-dimensional
vector space of n-tuples x = (x1 , . . . , xn ) with components xi ∈ K, i = 1, . . . , n . For
these, addition and scalar multiplication are defined componentwise by x + y := (x₁ + y₁, . . . , xₙ + yₙ) and αx := (αx₁, . . . , αxₙ), α ∈ K. A set of vectors {a¹, . . . , aᵏ} ⊂ Kⁿ is called “linearly independent” if
\[ \sum_{i=1}^k c_i a^i = 0, \ c_i \in K \quad\Rightarrow\quad c_i = 0, \ i = 1, \dots, k. \]
Each (finite dimensional) vector space, such as Kⁿ, possesses a basis. The special “Cartesian[1] basis” {e¹, . . . , eⁿ} is formed by the “Cartesian unit vectors” eⁱ := (δ_{1i}, . . . , δ_{ni}), with the usual Kronecker symbol δ_{ii} = 1 and δ_{ij} = 0 for i ≠ j. The elements of this basis are mutually orthonormal, i. e., with respect to the Euclidian scalar product there holds
\[ (e^i, e^j)_2 := \sum_{k=1}^n e^i_k e^j_k = \delta_{ij}. \]
“Matrices” A ∈ K^{n×n} are two-dimensional square arrays of numbers from K written in the form A = (a_{ij})_{i,j=1}^n, where the first index, i,
[1] René Descartes (1596–1650): French mathematician and philosopher (“(ego) cogito ergo sum”); worked in the Netherlands and later in Stockholm; first to recognize the close relation between geometry and arithmetic and founded analytic geometry.
refers to the row and the second one, j , to the column (counted from the left upper corner
of the array) at which the element aij is positioned. Usually, matrices are square arrays,
but in some situations also rectangular matrices may occur. The set of (square) matrices forms a vector space with addition and scalar multiplication defined in the natural elementwise sense, A + B := (a_{ij} + b_{ij})_{i,j=1}^n and αA := (α a_{ij})_{i,j=1}^n.
Matrices are used to represent linear mappings in Kⁿ with respect to a given basis, mostly a Cartesian basis, ϕ(x) = Ax. By Ā^T = (a^T_{ij})_{i,j=1}^n, we denote the conjugate “transpose” of a matrix A = (a_{ij})_{i,j=1}^n ∈ K^{n×n} with the elements a^T_{ij} = ā_{ji}. For matrices A, B ∈ K^{n×n} there holds (AB)^T = B^T A^T. Matrices for which A = Ā^T are called “symmetric” in the case K = R and “Hermitian” in the case K = C.
In the “quadratic” case the solvability of the system (0.2.1) is equivalent to any one of
the following properties of the coefficient matrix A ∈ Kn×n :
- Ax = 0 implies x = 0 .
- rank(A) = n .
- det(A) ≠ 0 .
- All eigenvalues of A are nonzero.
The associated “eigenvalue problem” consists in finding numbers λ ∈ C (“eigenvalues”) and corresponding nonzero vectors w ∈ Kⁿ (“eigenvectors”) such that
\[ Aw = \lambda w. \qquad (0.2.2) \]
Eigenvalues are just the zeros of the characteristic polynomial χA (z) := det(A − zI)
of A, so that by the fundamental theorem of algebra each n×n-matrix has exactly
n eigenvalues counted according to their (algebraic) multiplicities. The corresponding
eigenvectors span linear subspaces of Kn called “eigenspaces”.
Eigenvalue problems play an important role in many problems from science and engi-
neering, e. g., they represent energy levels in physical models (e. g., Schrödinger equation
in Quantum Mechanics) or determine the stability or instability of solutions of dynamical
systems (e. g., Navier-Stokes equations in hydrodynamics).
We will mainly consider numerical methods for solving quadratic linear systems and asso-
ciated eigenvalue problems. The emphasis will be on medium- and large-scale problems,
i. e., problems of dimension n ≈ 10⁴ − 10⁹, which at the upper end impose particularly
strong requirements on the algorithms with respect to storage and work efficiency. Prob-
lems of that size usually involve matrices with special structure such as “band structure”
and/or extreme “sparsity”, i. e., only very few matrix elements in each row are non-zero.
Most of the classical methods, which have originally been designed for “full” but smaller
matrices, cannot be realistically applied to such large problems. Therefore, modern meth-
ods extensively exploit the particular sparsity structure of the matrices. These methods
split into two classes, “direct methods” and “iterative methods”.
Definition 0.1: A “direct” method for the solution of a linear system Ax = b is an
algorithm, which (neglecting round-off errors) delivers the exact solution x in finitely
many arithmetic steps. “Gaussian elimination” is a typical example of such a “direct
method”. In contrast to that an “iterative method” constructs a sequence of approximate solutions {xᵗ}_{t∈N}, which only in the limit t → ∞ converges to the exact solution, i. e., lim_{t→∞} xᵗ = x. “Richardson iteration” or more general fixed-point methods of similar kind are typical examples of such “iterative methods”. In analyzing a direct method, we are mainly interested in the work count, i. e., the asymptotic number of arithmetic operations needed for achieving the final result depending on the problem size, e. g., O(n³), while
in an iterative method, we look at the work count needed for one iteration step and the number of iteration steps for reducing the initial error by a certain fixed factor, e. g., 10⁻¹, or the asymptotic speed of convergence (“linear”, “quadratic”, etc.).
However, there is no sharp separation between the two classes of “direct” or “iterative”
methods as many theoretically “direct” methods are actually used in “iterative” form in
practice. A typical method of this type is the classical “conjugate gradient (CG) method”,
which in principle is a direct method (terminating after n iteration steps) but is usually stopped like an iterative method already after m ≪ n steps.
In the following, we present some applications from which large linear algebra problems originate. This illustrates what the various possible structures of matrices may look like. Thereby, we have
to deal with scalar or vector-valued functions u = u(x) ∈ Kn for arguments x ∈ Kn . For
derivatives of differentiable functions, we use the notation
\[ \partial_x u := \frac{\partial u}{\partial x}, \quad \partial_x^2 u := \frac{\partial^2 u}{\partial x^2}, \ \dots, \quad \partial_i u := \frac{\partial u}{\partial x_i}, \quad \partial_{ij}^2 u := \frac{\partial^2 u}{\partial x_i \partial x_j}, \ \dots, \]
and analogously also for higher-order derivatives. With the nabla operator ∇ the “gra-
dient” of a scalar function and the “divergence” of a vector function are written as
grad u = ∇u := (∂1 u, ..., ∂d u)T and div u = ∇ · u := ∂1 u1 + ... + ∂d ud , respectively.
For a vector β ∈ Rd the derivative in direction β is written as ∂β u := β · ∇u . Combi-
nation of gradient and divergence yields the so-called “Laplacian operator”
\[ \Delta u := \nabla\cdot\nabla u = \partial_1^2 u + \dots + \partial_d^2 u. \]
The symbol ∇ᵐu denotes the “tensor” of all partial derivatives of order m of u, i. e., in two dimensions u = u(x₁, x₂), ∇²u = (∂₁ⁱ∂₂ʲu)_{i+j=2}.
\[ \Delta_2 := \Big( \sum_{j=1}^m |u(x_j) - y_j|^2 \Big)^{1/2} \]
becomes minimal. (The “Chebyshev[2] equalization problem” in which the “maximal deviation” Δ_∞ := max_{j=1,...,m} |u(x_j) − y_j| is minimized poses much more severe difficulties
and is therefore used only for smaller n.) For the solution of the Gaussian equalization
problem, we set y := (y₁, . . . , y_m), c := (c₁, . . . , c_n) and A := (u_k(x_j))_{j=1,...,m, k=1,...,n} ∈ R^{m×n}, leading to the “normal equation”
\[ A^T A c = A^T y, \qquad (0.4.3) \]
a linear n×n-system with a positive definite (and hence regular) coefficient matrix A^T A.
In the particular case of polynomial fitting, i. e., u_k(x) = x^{k−1}, the “optimal” solution
\[ u(x) = \sum_{k=1}^n c_k x^{k-1} \]
is called the “Gaussian equalization parabola” for the points (x_j, y_j), j = 1, . . . , m. Because of
the regularity of the “Vandermonde[3] determinant”
\[
\det
\begin{bmatrix}
1 & x_1 & \cdots & x_1^{n-1} \\
1 & x_2 & \cdots & x_2^{n-1} \\
\vdots & \vdots & & \vdots \\
1 & x_n & \cdots & x_n^{n-1}
\end{bmatrix}
= \prod_{\substack{j,k=1 \\ j<k}}^{n} (x_k - x_j) \neq 0,
\]
for mutually distinct points xj there holds rank(A) = n , i. e., the equalization parabola
is uniquely determined.
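To make the preceding calculus concrete, here is a small numerical sketch (assuming NumPy; the measurement data are invented for illustration) that fits a Gaussian equalization parabola by solving the normal equation (0.4.3):

```python
import numpy as np

# Sample measurements (x_j, y_j), j = 1,...,m  (made-up data for illustration)
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0])
y = np.array([1.1, 1.8, 3.2, 4.9, 7.1, 9.8, 13.2])

n = 3                                    # fit u(x) = c_1 + c_2 x + c_3 x^2
A = np.vander(x, N=n, increasing=True)   # a_jk = x_j^{k-1}

# Normal equation  A^T A c = A^T y  (positive definite since the x_j are distinct)
c = np.linalg.solve(A.T @ A, A.T @ y)
print("coefficients c:", c)
print("residual norm ||y - Ac||_2 =", np.linalg.norm(y - A @ c))

# Cross-check with the (more stable) built-in least-squares solver
c_ls, *_ = np.linalg.lstsq(A, y, rcond=None)
print("lstsq agrees:", np.allclose(c, c_ls))
```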
[2] Pafnuty Lvovich Chebyshev (1821–1894): Russian mathematician; prof. in St. Petersburg; contributions to number theory, probability theory and especially to approximation theory; developed the general theory of orthogonal polynomials.
[3] Alexandre-Théophile Vandermonde (1735–1796): French mathematician; gifted musician, came late to mathematics and published here only four papers (nevertheless member of the Academy of Sciences in Paris); contributions to the theory of determinants and combinatorial problems (curiously enough the determinant called after him does not appear explicitly in his papers).
[Figure: uniform mesh on the unit square with the n = m² “interior” mesh points numbered row-wise; mesh width h = 1/(m+1).]
\[ u_h(P) = g_h(P), \quad P \in \partial\Omega_h, \qquad (0.4.6) \]
[4] Pierre Simon Marquis de Laplace (1749–1827): French mathematician and astronomer; prof. in Paris; founded, among other fields, probability calculus.
For any numbering of the mesh points in Ω_h and ∂Ω_h, Ω_h = {P_i, i = 1, ..., n}, ∂Ω_h = {P_i, i = n+1, ..., n+m}, we obtain a quadratic linear system for the vector of approximate mesh values U = (U_i)_{i=1}^n, U_i := u_h(P_i),
\[ AU = F, \qquad (0.4.8) \]
with A = (a_{ij})_{i,j=1}^n and F = (b_j)_{j=1}^n given by
\[ a_{ij} := \sigma(P_i, P_j), \qquad b_j := f_h(P_j) - \sum_{i=n+1}^{n+m} \sigma(P_j, P_i)\, g_h(P_i). \]
In the considered special case of the unit square and row-wise numbering of the interior mesh points of Ω_h the 5-point difference approximation of the Laplacian yields the following sparse matrix of dimension n = m²:
\[
A = \frac{1}{h^2}
\begin{bmatrix}
B_m & -I_m & & \\
-I_m & B_m & -I_m & \\
& -I_m & B_m & \ddots \\
& & \ddots & \ddots
\end{bmatrix}
\Bigg\}\, n,
\qquad
B_m =
\begin{bmatrix}
4 & -1 & & \\
-1 & 4 & -1 & \\
& -1 & 4 & \ddots \\
& & \ddots & \ddots
\end{bmatrix}
\Bigg\}\, m,
\]
where Im is the m×m-unit matrix. The matrix A is a very sparse band matrix with
half-band width m , symmetric and (irreducibly) diagonally dominant. This implies that
it is regular and positive definite. In three dimensions the corresponding matrix has di-
mension n = m3 and half-band width m2 and shares all the other mentioned properties
of its two-dimensional analogue. In practice, n ≈ 10⁴ up to n ≈ 10⁷ in three dimensions.
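The block structure described above can be assembled directly. The following sketch (assuming SciPy; the mesh size is chosen arbitrarily) builds A as a Kronecker sum and checks its sparsity and symmetry:

```python
import scipy.sparse as sp
import scipy.sparse.linalg as spla

m = 50                        # interior points per direction
h = 1.0 / (m + 1)             # mesh width
n = m * m                     # dimension of the system

# 1d second-difference matrix T = tridiag(-1, 2, -1)
T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(m, m), format="csr")
I = sp.identity(m, format="csr")

# 2d five-point matrix: A = (1/h^2) (T (x) I + I (x) T); its diagonal blocks are
# B_m = T + 2I = tridiag(-1, 4, -1), the off-diagonal blocks are -I_m.
A = (sp.kron(T, I) + sp.kron(I, T)) / h**2

print("dimension:", A.shape, " average nonzeros per row:", A.nnz / n)  # at most 5
print("symmetry check, ||A - A^T||_F =", spla.norm(A - A.T))
```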
If problem (0.4.4) is only part of a larger mathematical model involving complex domains
and several physical quantities such as (in chemically reacting flow models) velocity, pres-
sure, density, temperature and chemical species, the dimension of the complete system
may reach up to n ≈ 10⁷ − 10⁹.
To estimate a realistic size of the algebraic problem oriented by the needs of a practical
application, we consider the above model problem (Poisson equation on the unit square)
with an adjusted right-hand side and boundary function such that the exact solution is
given by u(x, y) = sin(πx) sin(πy) ,
For this setting the error analysis of the difference approximation yields the estimate
\[ \max_{\Omega_h} |u - u_h| \approx \frac{1}{24}\, d^2 M_4(u)\, h^2 \approx 8 h^2, \qquad (0.4.10) \]
where M₄(u) := max_Ω̄ |∇⁴u| ≈ π⁴ (see the lecture notes Rannacher [3]). In order to guarantee a relative error below TOL = 10⁻³, we have to choose h ≈ 10⁻², corresponding to n ≈ 10⁴ in two and n ≈ 10⁶ in three dimensions. The concrete structure of the matrix
A depends on the numbering of mesh points used:
i) Row-wise numbering: The lexicographical ordering of mesh points leads to a band
matrix with band width 2m + 1. The sparsity within the band would be largely reduced
by Gaussian elimination (so-called “fill-in”).
[Figure: row-wise (lexicographical) numbering of the interior mesh points, illustrated for a 5×5 mesh.]
ii) Diagonal numbering: The successive numbering diagonally to the Cartesian coor-
dinate directions leads to a band matrix with less band volume. This results in less fill-in
within Gaussian elimination.
[Figure: diagonal numbering of the interior mesh points, illustrated for a 5×5 mesh.]
[Figure: checkerboard (red-black) numbering of the interior mesh points, illustrated for a 5×5 mesh.]
For large linear systems of dimension n > 10⁵ direct methods such as Gaussian elimination are difficult to realize since they are generally very storage and work demanding. For a matrix of dimension n = 10⁶ and band width m = 10² Gaussian elimination already requires about 10⁸ storage places. This is particularly undesirable if also the band
is sparse as in the above example with at most 5 non-zero elements per row. In this
case those iterative methods are more attractive, in which essentially only matrix-vector
multiplications occur with matrices of similar sparsity pattern as that of A .
As illustrative examples, we consider simple fixed-point iterations for solving a linear
system Ax = b with a regular n×n-coefficient matrix. The system is rewritten as
\[ a_{jj} x_j + \sum_{\substack{k=1 \\ k\ne j}}^n a_{jk} x_k = b_j, \qquad j = 1, \dots, n. \]
Resolving for the diagonal term yields the fixed-point iteration of the “Jacobi method”,
\[ x_j^t = a_{jj}^{-1}\Big( b_j - \sum_{k\ne j} a_{jk} x_k^{t-1} \Big), \qquad j = 1, \dots, n, \quad t = 1, 2, \dots . \qquad (0.4.11) \]
When computing x_j^t the preceding components x_r^t, r < j, are already known. Hence, in order to accelerate the convergence of the method, one may use this new information in the computation of x_j^t. This idea leads to the “Gauß-Seidel[5] method”:
\[ x_j^t = a_{jj}^{-1}\Big( b_j - \sum_{k<j} a_{jk} x_k^{t} - \sum_{k>j} a_{jk} x_k^{t-1} \Big), \qquad j = 1, \dots, n. \qquad (0.4.12) \]
[5] Philipp Ludwig von Seidel (1821–1896): German mathematician; Prof. in Munich; contributions to analysis (method of least error-squares) and celestial mechanics and astronomy.
The Gauß-Seidel method has the same arithmetic complexity as the Jacobi method but
under certain conditions (satisfied in the above model situation) it converges twice as fast.
However, though very simple and maximal storage economical, both methods, Jacobi as
well as Gauß-Seidel, are by far too slow in practical applications. Much more efficient iter-
ative methods are the Krylov-space methods. The best known examples are the classical
“conjugate gradient method” (“CG method”) of Hestenes and Stiefel for solving linear
systems with positive definite matrices and the “Arnoldi method” for solving correspond-
ing eigenvalue problems. Iterative methods with minimal complexity can be constructed
using multi-scale concepts (e. g., geometric or algebraic “multigrid methods”). The latter
type of methods will be discussed below.
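As an illustration of the two fixed-point iterations above, the following sketch (assuming NumPy, dense storage, and an invented small diagonally dominant test system) performs Jacobi and Gauß-Seidel sweeps:

```python
import numpy as np

def jacobi_step(A, b, x_old):
    """One Jacobi sweep: x_j = (b_j - sum_{k!=j} a_jk x_old_k) / a_jj."""
    D = np.diag(A)
    return (b - A @ x_old + D * x_old) / D

def gauss_seidel_step(A, b, x_old):
    """One Gauss-Seidel sweep: already updated components are reused immediately."""
    x = x_old.copy()
    for j in range(len(b)):
        x[j] = (b[j] - A[j, :j] @ x[:j] - A[j, j+1:] @ x_old[j+1:]) / A[j, j]
    return x

A = np.array([[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]])
b = np.array([2.0, 4.0, 10.0])
x_exact = np.linalg.solve(A, b)

for step in (jacobi_step, gauss_seidel_step):
    x = np.zeros_like(b)
    for t in range(50):
        x = step(A, b, x)
    print(step.__name__, "error:", np.linalg.norm(x - x_exact))
```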
where v̂ is the velocity vector field of the flow, p̂ its hydrostatic pressure, ν the kinematic
viscosity (for normalized density ρ ≡ 1), and q the control pressure. The flow is driven
by a prescribed flow velocity v in at the Dirichlet (inflow) boundary (at the left end), a
prescribed mean pressure P at the Neumann (outflow) boundary (at the right end) and
the mean pressure q at the control boundary ΓQ . The (artificial) “free outflow” (also
called “do nothing”) boundary condition in (0.4.13) has proven successful especially in
modeling pipe flow since it is satisfied by Poiseuille flow (see Heywood et al. [42]).
[Figure 5: Configuration of the channel flow around an obstacle S, with inflow boundary Γ_in, outflow boundary Γ_out, and control boundaries Γ_Q.]
Fig. 5 shows the configuration of a channel flow around an obstacle controlled by pressure
prescription at ΓQ , and Figure 6 the computational mesh and streamline plots of two
flows for different Reynolds numbers and control values, one stable and one unstable.
Figure 6: Computational mesh (top), uncontrolled stable (middle) and controlled unstable
(bottom) stationary channel flow around an obstacle.
For deciding whether these base flows are stable or unstable, within the usual linearized
stability analysis, one investigates the following eigenvalue problem corresponding to the
Navier-Stokes operator linearized about the considered base flow:
From the location of the eigenvalues in the complex plane, one can draw the following
conclusion: If an eigenvalue λ ∈ C of (0.4.14) has Re λ < 0 , the base flow is unstable,
otherwise it is said to be “linearly stable”. This means that the solution of the linearized
nonstationary perturbation problem
\[
\partial_t w - \nu\Delta w + \hat v\cdot\nabla w + w\cdot\nabla \hat v + \nabla q = 0, \quad \nabla\cdot w = 0, \ \text{in } \Omega, \qquad
w|_{\Gamma_{\rm rigid}\cup\Gamma_{\rm in}} = 0, \quad \nu\partial_n w - qn|_{\Gamma_{\rm out}\cup\Gamma_Q} = 0, \qquad (0.4.15)
\]
with some constant A ≥ 1. After discretization the eigenvalue problem (0.4.14) in function space is translated into a nonsymmetric algebraic eigenvalue problem, which is usually of high dimension n ≈ 10⁵ − 10⁶. Therefore its solution can be achieved only by
iterative methods.
However, “linear stability” does not guarantee full “nonlinear stability” due to effects
caused by the “non-normality” of the operator governing problem (0.4.14), which may
cause the constant A to become large. This is related to the possible “deficiency” (dis-
crepancy of geometric and algebraic multiplicity) or a large “pseudo-spectrum” (range
12 Introduction
of large resolvent norm) of the critical eigenvalue. This effect is commonly accepted as
explanation of the discrepancy in the stability properties of simple base flows such as
Couette flow and Poiseuille flow predicted by linear eigenvalue-based stability analysis
and experimental observation (see, e. g., Trefethen & Embree [22] and the literature cited
therein).
1 Linear Algebraic Systems and Eigenvalue Problems
In this chapter, we introduce the basic notation and facts about the normed real or
complex vector spaces Kn of n-dimensional vectors and Kn×n of corresponding n × n-
matrices. The emphasis is on square matrices as representations of linear mappings in
Kn and their spectral properties.
We recall some basic topological properties of the finite dimensional “normed” (vector)
space Kn , where depending on the concrete situation K = R (real space) or K = C
(complex space). In the following each point x ∈ Kn is expressed by its canonical coor-
dinate representation x = (x1 , . . . , xn ) in terms of a (fixed) Cartesian basis {e1 , . . . , en }
of Kn ,
\[ x = \sum_{i=1}^n x_i e^i . \]
The notion of a “norm” can be defined on any vector space V over K, finite or infinite dimensional, by requiring (N1) definiteness, ‖x‖ = 0 ⇒ x = 0, (N2) homogeneity, ‖αx‖ = |α| ‖x‖ for α ∈ K, and (N3) the triangle inequality, ‖x + y‖ ≤ ‖x‖ + ‖y‖. The resulting pair {V, ‖·‖} is called a “normed space”.
Remark 1.1: The property ‖x‖ ≥ 0 is a consequence of the other conditions. With (N2), we obtain ‖0‖ = 0 and then with (N3) and (N2) 0 = ‖x − x‖ ≤ ‖x‖ + ‖−x‖ = 2‖x‖. With the help of (N3) we obtain the useful inequality
\[ \big|\, \|x\| - \|y\| \,\big| \le \|x - y\|, \qquad x, y \in K^n. \qquad (1.1.1) \]
Example 1.1: The standard example of a vector norm is the “Euclidian norm”
\[ \|x\|_2 := \Big( \sum_{i=1}^n |x_i|^2 \Big)^{1/2} . \]
The first two norm properties, (N1) and (N2), are obvious, while the triangle inequality
is a special case of the “Minkowski inequality” provided below in Lemma 1.4. Other
examples of useful norms are the “maximum norm” (or “l∞ norm”) and the “l1 norm”
\[ \|x\|_\infty := \max_{1\le i\le n} |x_i|, \qquad \|x\|_1 := \sum_{i=1}^n |x_i| . \]
The norm properties of ‖·‖_∞ and ‖·‖_1 are immediate consequences of the corresponding properties of the modulus function. Between the l1 norm and the maximum norm there are the so-called “lp norms” for 1 < p < ∞:
\[ \|x\|_p := \Big( \sum_{i=1}^n |x_i|^p \Big)^{1/p} . \]
Again the first two norm properties, (N1) and (N2), are obvious and the triangle inequality
is the Minkowski inequality provided in Lemma 1.4, below.
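For a quick numerical illustration of these norm definitions (a minimal sketch assuming NumPy; the vector is arbitrary):

```python
import numpy as np

x = np.array([3.0, -4.0, 1.0])

norm_1   = np.sum(np.abs(x))                     # l1 norm
norm_2   = np.sqrt(np.sum(np.abs(x)**2))         # Euclidian norm
norm_inf = np.max(np.abs(x))                     # maximum norm
p = 3
norm_p   = np.sum(np.abs(x)**p) ** (1.0 / p)     # lp norm

# the definitions agree with the library routines
assert np.isclose(norm_1, np.linalg.norm(x, 1))
assert np.isclose(norm_2, np.linalg.norm(x, 2))
assert np.isclose(norm_inf, np.linalg.norm(x, np.inf))
assert np.isclose(norm_p, np.linalg.norm(x, p))
print(norm_1, norm_2, norm_inf, norm_p)
```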
“compact”, “diameter”, and “neighborhood” for point sets in Kⁿ in analogy to the corresponding situation in K. We use the maximum norm ‖·‖_∞ in the following discussion, but we will see later that this is independent of the chosen norm. For any a ∈ Kⁿ and r > 0, we use the ball
\[ K_r(a) := \{ x \in K^n : \|x - a\|_\infty < r \} \]
as standard neighborhood of a with radius r. This neighborhood is “open” since for each point x ∈ K_r(a) there exists a neighborhood K_δ(x) ⊂ K_r(a); accordingly the complement K_r(a)ᶜ is “closed”. The “closure” of K_r(a) is defined by \bar K_r(a) := K_r(a) ∪ ∂K_r(a) with the “boundary” ∂K_r(a) = {x ∈ Kⁿ : ‖x − a‖_∞ = r} of K_r(a).
Proof. i) For any Cauchy sequence (xᵏ)_{k∈N}, in view of |x_i| ≤ ‖x‖_∞, i = 1, . . . , n, for x ∈ Kⁿ, also the component sequences (x_iᵏ)_{k∈N}, i = 1, . . . , n, are Cauchy sequences in K and therefore converge to limits x_i ∈ K. Then, the vector x := (x₁, . . . , xₙ) ∈ Kⁿ is limit of the vector sequence (xᵏ)_{k∈N} with respect to the maximum norm.
ii) For any bounded vector sequence (xᵏ)_{k∈N} the component sequences (x_iᵏ)_{k∈N}, i = 1, . . . , n, are likewise bounded. By successively applying the theorem of Bolzano-Weierstraß in K, in the first step, we obtain a convergent subsequence (x₁^{k_{1j}})_{j∈N} of (x₁ᵏ)_{k∈N} with x₁^{k_{1j}} → x₁ (j → ∞), in the next step a convergent subsequence (x₂^{k_{2j}})_{j∈N} of (x₂^{k_{1j}})_{j∈N} with x₂^{k_{2j}} → x₂ (j → ∞), and so on. After n selection steps, we eventually obtain a subsequence (x^{k_{nj}})_{j∈N} of (xᵏ)_{k∈N}, for which all component sequences (x_i^{k_{nj}})_{j∈N}, i = 1, . . . , n, converge. Then, with the limit values x_i ∈ K, we set x := (x₁, . . . , xₙ) ∈ Kⁿ and have the convergence x^{k_{nj}} → x (j → ∞). Q.E.D.
The following important result states that on the (finite dimensional) vector space Kⁿ the notion of convergence, induced by any norm ‖·‖, is equivalent to the convergence with respect to the maximum norm, i. e., to the componentwise convergence.
Theorem 1.2 (Equivalence of norms): All norms on the finite dimensional vector space Kⁿ are equivalent to the maximum norm, i. e., for each norm ‖·‖ there are positive constants m, M such that
\[ m\,\|x\|_\infty \le \|x\| \le M\,\|x\|_\infty , \qquad x \in K^n . \]
Proof. Let ‖·‖ be a vector norm. For any vector x = ∑_{i=1}^n x_i eⁱ ∈ Kⁿ there holds
\[ \|x\| \le \sum_{k=1}^n |x_k|\, \|e^k\| \le M \|x\|_\infty , \qquad M := \sum_{k=1}^n \|e^k\| . \]
We set
\[ S_1 := \{ x \in K^n : \|x\|_\infty = 1 \}, \qquad m := \inf\{ \|x\|, \ x \in S_1 \} \ge 0 . \]
We want to show that m > 0 since then, in view of ‖x‖_∞⁻¹ x ∈ S₁, it follows that also m ≤ ‖x‖_∞⁻¹ ‖x‖ for x ≠ 0, and consequently, m‖x‖_∞ ≤ ‖x‖. Suppose that m = 0. Then, there is a sequence (xᵏ)_{k∈N} in S₁ with ‖xᵏ‖ → 0 (k → ∞), and by the theorem of Bolzano-Weierstraß a subsequence, again denoted by (xᵏ)_{k∈N}, converges with respect to ‖·‖_∞ to some x ∈ Kⁿ. Since
\[ \big| 1 - \|x\|_\infty \big| = \big|\, \|x^k\|_\infty - \|x\|_\infty \,\big| \le \|x^k - x\|_\infty \to 0 \quad (k \to \infty), \]
there holds ‖x‖_∞ = 1, while ‖x‖ ≤ ‖x − xᵏ‖ + ‖xᵏ‖ ≤ M‖x − xᵏ‖_∞ + ‖xᵏ‖ → 0, i. e., x = 0, a contradiction. Q.E.D.
Remark 1.2: i) For the two foregoing theorems, the theorem of Bolzano-Weierstrass and
the theorem of norm equivalence, the finite dimensionality of Kn is decisive. Both theo-
rems do not hold in infinite-dimensional normed spaces such as the space l2 of (infinite)
l2 -convergent sequences or the space C[a, b] of continuous functions on [a, b].
ii) A subset M ⊂ Kn is called “compact” (or more precisely “sequentially compact”),
if each sequence of vectors in M possesses a convergent subsequence with limit in M .
Then, the theorem of Bolzano-Weierstrass implies that the compact subsets in Kn are
exactly the bounded and closed subsets in Kn .
iii) A point x ∈ Kn is called “accumulation point” of a set M ⊂ Kn if each neighborhood
of x contains at least one point from M \ {x} . The set of accumulation points of M is
denoted by H(M) (closed “hull” of M ). A point x ∈ M \ H(M) is called “isolated” .
Remark 1.3: In many applications there occur pairs {x, y} (or more generally tuples)
of points x, y ∈ Kn . These form the so-called “product space” V = Kn × Kn , which may
be equipped with the generic norm ‖{x, y}‖ := (‖x‖² + ‖y‖²)^{1/2}. Since this space may be identified with the 2n-dimensional Euclidian space K^{2n} all results on subsets of Kⁿ
carry over to subsets of Kn × Kn . This can be extended to more general product spaces
of the form V = Kn1 × · · · × Knm .
Remark 1.4: i) If the strict definiteness (S3) is relaxed, (x, x) ∈ R, (x, x) ≥ 0, the
sesquilinear form becomes a so-called “semi-scalar product”.
ii) From property (S2) (linearity in the first argument) and (S1) (conjugate symmetry),
we obtain the conjugate linearity in the second argument. Hence, a scalar product is a
special kind of “sesquilinear form” (if K = C ) or “bilinear form” (if K = R )
Lemma 1.1: For a scalar product (·, ·) on Kⁿ there holds the “Cauchy-Schwarz inequality”
\[ |(x, y)|^2 \le (x, x)(y, y), \qquad x, y \in K^n . \]
Proof. The assertion is obviously true for y = 0 . Hence, we can now assume that y = 0 .
For arbitrary α ∈ K there holds
\[ 0 \le (x - \alpha y, x - \alpha y) = (x, x) - \bar\alpha (x, y) - \alpha (y, x) + |\alpha|^2 (y, y) . \]
Choosing α := (x, y)(y, y)⁻¹ yields
\[ 0 \le (x, x) - |(x, y)|^2 (y, y)^{-1} \]
and, consequently, 0 ≤ (x, x)(y, y) − |(x, y)|². This is the asserted inequality. Q.E.D.
The “Euclidian” scalar product (·, ·)₂ corresponds to the “Euclidian” norm ‖x‖₂ = (x, x)₂^{1/2}.
Proof. The norm properties (N1) and (N2) are obvious. It remains to show (N3). Using
the Cauchy-Schwarz inequality, we obtain
[1] Ludwig Otto Hölder (1859–1937): German mathematician; Prof. in Tübingen; contributions first to the theory of Fourier series and later to group theory; found 1884 the inequality named after him.
[2] William Henry Young (1863–1942): English mathematician; worked at several universities world-wide, e. g., in Calcutta, Liverpool and Wales; contributions to differential and integral calculus, topological set theory and geometry.
Lemma 1.2 (Young inequality): For p, q ∈ R with 1 < p, q < ∞ and 1/p + 1/q = 1,
there holds the inequality
\[ |xy| \le \frac{|x|^p}{p} + \frac{|y|^q}{q} , \qquad x, y \in K . \qquad (1.1.4) \]
Proof. The logarithm ln(x) is on R₊, in view of ln″(x) = −1/x² < 0, a concave function. Hence, for x, y ∈ K there holds:
\[ \ln\Big( \tfrac{1}{p}|x|^p + \tfrac{1}{q}|y|^q \Big) \ge \tfrac{1}{p}\ln(|x|^p) + \tfrac{1}{q}\ln(|y|^q) = \ln(|x|) + \ln(|y|) . \]
Because of the monotonicity of the exponential function eˣ it further follows that for x, y ∈ K:
\[ \tfrac{1}{p}|x|^p + \tfrac{1}{q}|y|^q \ge \exp\big( \ln(|x|) + \ln(|y|) \big) = \exp(\ln(|x|))\exp(\ln(|y|)) = |x||y| = |xy| , \]
Lemma 1.3 (Hölder inequality): For the Euclidian scalar product there holds, for arbitrary p, q ∈ R with 1 < p, q < ∞ and 1/p + 1/q = 1, the so-called “Hölder inequality”
\[ |(x, y)_2| \le \|x\|_p\, \|y\|_q , \qquad x, y \in K^n . \]
[3] Hermann Minkowski (1864–1909): Russian-German mathematician; Prof. in Göttingen; several contributions to pure mathematics; introduced the non-euclidian 4-dimensional space-time continuum (“Minkowski space”) for describing the theory of relativity of Einstein.
Proof. For p = 1 and p = ∞ the inequality follows from the triangle inequality on R:
\[ \|x + y\|_1 = \sum_{i=1}^n |x_i + y_i| \le \sum_{i=1}^n |x_i| + \sum_{i=1}^n |y_i| = \|x\|_1 + \|y\|_1 , \]
\[ \|x + y\|_\infty = \max_{1\le i\le n} |x_i + y_i| \le \max_{1\le i\le n} |x_i| + \max_{1\le i\le n} |y_i| = \|x\|_\infty + \|y\|_\infty . \]
Let now 1 < p < ∞ and q be defined by 1/p + 1/q = 1, i. e., q = p/(p − 1). Setting ξ_i := |x_i + y_i|^{p−1}, i = 1, . . . , n, we obtain
\[ \|\xi\|_q^q = \sum_{i=1}^n |\xi_i|^q = \sum_{i=1}^n |x_i + y_i|^p = \|x + y\|_p^p . \]
(x − PM x, y)2 = 0 ∀y ∈ M, (1.1.8)
\[ x = \sum_{i=1}^n (x, a^i)_2\, a^i , \qquad (1.1.11) \]
[4] Marc-Antoine Parseval des Chênes (1755–1836): French mathematician; worked on partial differential equations in physics (only five mathematical publications); known by the identity named after him, which he stated without proof and connection to Fourier series.
Proof. From the representation x = ∑_{j=1}^n α_j aʲ, taking the scalar product with aⁱ it follows that
\[ (x, a^i)_2 = \sum_{j=1}^n \alpha_j (a^j, a^i)_2 = \alpha_i , \qquad i = 1, \dots, n , \]
and further
\[ \|x\|_2^2 = (x, x)_2 = \sum_{i,j=1}^n (x, a^i)_2\, \overline{(x, a^j)_2}\, (a^i, a^j)_2 = \sum_{i=1}^n |(x, a^i)_2|^2 . \]
\[
b^1 := \|a^1\|_2^{-1} a^1 , \qquad
\tilde b^k := a^k - \sum_{j=1}^{k-1} (a^k, b^j)_2\, b^j , \quad b^k := \|\tilde b^k\|_2^{-1}\, \tilde b^k , \quad k = 2, \dots, n , \qquad (1.1.13)
\]
Proof. First, we show that the construction process of the bᵏ does not stop with k < n. The vectors bᵏ are linear combinations of the a¹, . . . , aᵏ. If for some k ≤ n
\[ a^k - \sum_{j=1}^{k-1} (a^k, b^j)_2\, b^j = 0 , \]
then aᵏ would be a linear combination of b¹, . . . , b^{k−1} and therefore of a¹, . . . , a^{k−1}, in contradiction to the assumption that {a¹, . . . , aⁿ} is a basis. Now, we show by induction that the Gram-Schmidt process yields an orthonormal basis. Obviously ‖b¹‖₂ = 1. Let now {b¹, . . . , bᵏ}, for k ≤ n, be an already constructed orthonormal system. Then, for l = 1, . . . , k, there holds
\[ (\tilde b^{k+1}, b^l)_2 = (a^{k+1}, b^l)_2 - \sum_{j=1}^{k} (a^{k+1}, b^j)_2 \underbrace{(b^j, b^l)_2}_{=\,\delta_{jl}} = 0 , \]
and ‖b^{k+1}‖₂ = 1, i. e., {b¹, . . . , b^{k+1}} is also an orthonormal system. Q.E.D.
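The construction (1.1.13) translates directly into code. The following sketch (assuming NumPy; the input basis is random and only illustrative) implements the classical Gram-Schmidt process and checks orthonormality:

```python
import numpy as np

def gram_schmidt(a):
    """Orthonormalize the columns of a (assumed linearly independent), following
    (1.1.13): b~^k = a^k - sum_{j<k} (a^k, b^j)_2 b^j,  b^k = b~^k / ||b~^k||_2."""
    n = a.shape[1]
    b = np.zeros_like(a, dtype=complex)
    for k in range(n):
        v = a[:, k].astype(complex)
        for j in range(k):
            v = v - np.vdot(b[:, j], a[:, k]) * b[:, j]   # subtract projections
        b[:, k] = v / np.linalg.norm(v)                   # normalize
    return b

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))                 # columns form a basis (almost surely)
B = gram_schmidt(A)
print(np.allclose(B.conj().T @ B, np.eye(5)))   # orthonormality check
```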
[5] Jørgen Pedersen Gram (1850–1916): Danish mathematician, employee and later owner of an insurance company, contributions to algebra (invariants theory), probability theory, numerics and forestry; the orthonormalization algorithm named after him had already been used before by Cauchy 1836.
[6] Erhard Schmidt (1876–1959): German mathematician, Prof. in Berlin, there founder of the Institute for Applied Mathematics 1920, after the war Director of the Mathematical Institute of the Academy of Sciences of the GDR; contributions to the theory of integral equations and Hilbert spaces and later to general topology.
We now consider linear mappings from the n-dimensional vector space Kn into the m-
dimensional vector space Km , where not necessarily m = n . However, the special case
m = n plays the most important role. A mapping ϕ = (ϕ1 , . . . , ϕm ) : Kn → Km is called
“linear”, if for x, y ∈ Kⁿ and α, β ∈ K there holds ϕ(αx + βy) = αϕ(x) + βϕ(y).
The action of a linear mapping ϕ on a vector space can be described in several ways. It
obviously suffices to prescribe the action of ϕ on the elements of a basis of the space,
e. g., a Cartesian basis {ei , i = 1, . . . , n},
\[ x = \sum_{i=1}^n x_i e^i \ \mapsto\ \varphi(x) = \varphi\Big( \sum_{i=1}^n x_i e^i \Big) = \sum_{i=1}^n x_i\, \varphi(e^i) . \]
Thereby, to each vector (or point) x ∈ Kn a “coordinate vector” x̂ = (xi )ni=1 is uniquely
associated. If the images ϕ(x) are expressed with respect to a Cartesian basis of Km ,
m m
n
ϕ(x) = ϕj (x)ej = ϕj (ei ) xi ej ,
j=1 j=1 i=1
=: aji
By this matrix A ∈ Km×n the linear mapping ϕ is uniquely described with respect to
the chosen bases of Kn and Km . In the following discussion, for simplicity, we identify
the point x ∈ Kn with its special cartesian coordinate representation x̂ . Here, we follow
the convention that in the notation Km×n for matrices the first parameter m stands for
the dimension of the target space Km , i. e., the number of rows in the matrix, while the
second one n corresponds to the dimension of the initial space Kn , i. e., the number of
columns. Accordingly, for a matrix entry aij the first index refers to the row number and
the second one to the column number of its position in the matrix. We emphasize that this
is only one of the possible concrete representations of the linear mapping ϕ : Kn → Km .
In this sense each quadratic matrix A ∈ Kn×n represents a linear mapping in Kn . The
identity map ϕ(x) = x is represented by the “identity matrix” I = (δij )ni,j=1 where
δij := 1 for i = j and δij = 0 else (the usual “Kronecker symbol”).
Clearly, two matrices A, A′ ∈ K^{m×n} are identical, i. e., a_{ij} = a′_{ij}, if and only if Ax = A′x for all x ∈ Kⁿ. To a general matrix A ∈ K^{m×n}, we associate the “adjoint transpose” Ā^T = (a^T_{ij}) ∈ K^{n×m} by setting a^T_{ij} := ā_{ji}. A quadratic matrix A ∈ K^{n×n} is called “regular”, if the corresponding linear mapping is injective and surjective, i. e., bijective, with “inverse” denoted by A⁻¹ ∈ K^{n×n}. Further, to each matrix A ∈ K^{n×n}, we associate the following
quantities, which are uniquely determined by the corresponding linear mapping ϕ :
– “determinant” of A : det(A) .
The following property of the determinant will be useful below: det(ĀT ) = det(A).
Lemma 1.6: For a square matrix A = (aij )ni,j=1 ∈ Kn×n the following statements are
equivalent:
i) A is regular with inverse A−1 .
ii) The equation Ax = 0 has only the zero solution x = 0 (injectivity).
iii) The equation Ax = b has for any b ∈ Kn a solution (surjectivity).
iv) det(A) ≠ 0 .
v) The adjoint transpose ĀT is regular with inverse (ĀT )−1 = (A−1 )T .
Proof. For the proof, we refer to the standard linear algebra literature. Q.E.D.
Lemma 1.7: For a general matrix A ∈ K^{m×n}, we introduce its “range” and its “kernel” (or “null space”)
\[ \operatorname{range}(A) := \{ Ax , \ x \in K^n \} \subset K^m , \qquad \operatorname{kern}(A) := \{ x \in K^n , \ Ax = 0 \} \subset K^n . \]
There holds
\[ \operatorname{range}(A) = \operatorname{kern}(\bar A^T)^{\perp} , \]
i. e., the equation Ax = b has a solution if and only if (b, y)₂ = 0 for all y ∈ kern(Ā^T).
Proof. For the proof, we refer to the standard linear algebra literature. Q.E.D.
In many practical applications the governing matrices have special properties, which
require the use of likewise special numerical methods. Some of the most important prop-
erties are those of “symmetry” or “normality” and “definiteness”.
or equivalently,
Lemma 1.8: For a Hermitian positive definite matrix A ∈ Kn×n the main diagonal
elements are real and positive, aii > 0 , and the element with largest modulus lies on the
main diagonal.
Proof. i) From aii = āii it follows that aii ∈ R. The positiveness follows via testing by
the Cartesian unit vector ei yielding aii = (Aei , ei )2 > 0.
ii) Let a_{ij} ≠ 0 be an element of A with maximal modulus and suppose that i ≠ j. Testing now by x = eⁱ − sign(a_{ij}) eʲ ≠ 0, we obtain the following contradiction to the definiteness of A:
Remark 1.5 (Exercises): i) If a matrix A ∈ Kn×n is positive definite (or more generally
just satisfies (Ax, x)2 ∈ R for x ∈ Cn ), then it is necessarily Hermitian. This does not
need to be true for real matrices A ∈ Rn×n .
ii) The general form of a scalar product (·, ·) on Kn is given by (x, y) = (Ax, y)2 with
a (Hermitian) positive definite matrix A ∈ Kn×n .
Lemma 1.9: A unitary matrix Q ∈ K^{n×n} is regular and its inverse is Q⁻¹ = Q̄^T. Further, there holds:
\[ (Qx, Qy)_2 = (x, y)_2 , \quad x, y \in K^n , \qquad (1.1.20) \]
\[ \|Qx\|_2 = \|x\|_2 , \quad x \in K^n . \qquad (1.1.21) \]

Proof. First, we show that Q̄^T is the inverse of Q. Let qⁱ ∈ Kⁿ denote the column vectors of Q satisfying by definition (qⁱ, qʲ)₂ = q̄^{iT} qʲ = δ_{ij}. This implies:
\[
\bar Q^T Q =
\begin{bmatrix}
\bar q^{1T} q^1 & \dots & \bar q^{1T} q^n \\
\vdots & \ddots & \vdots \\
\bar q^{nT} q^1 & \dots & \bar q^{nT} q^n
\end{bmatrix}
=
\begin{bmatrix}
1 & \dots & 0 \\
\vdots & \ddots & \vdots \\
0 & \dots & 1
\end{bmatrix}
= I ,
\]
and further
\[ \|Qx\|_2 = (Qx, Qx)_2^{1/2} = (x, \bar Q^T Q x)_2^{1/2} = \|x\|_2 , \quad x \in K^n , \]
which completes the proof. Q.E.D.
The matrix
\[
Q^{(ij)}_\theta =
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 \\
0 & \cos(\theta) & 0 & -\sin(\theta) & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & \sin(\theta) & 0 & \cos(\theta) & 0 \\
0 & 0 & 0 & 0 & 1
\end{bmatrix}
\]
(with the cos/sin entries placed in rows and columns i and j)
describes a rotation in the (xi , xj )-plane about the origin x = 0 with angle θ ∈ [0, 2π) .
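A small sketch (assuming NumPy; indices and angle are arbitrary) that builds such a plane rotation and verifies its orthogonality and the norm invariance (1.1.21):

```python
import numpy as np

def rotation_matrix(n, i, j, theta):
    """Plane rotation in the (x_i, x_j)-plane of R^n (0-based indices)."""
    Q = np.eye(n)
    c, s = np.cos(theta), np.sin(theta)
    Q[i, i] = c; Q[j, j] = c
    Q[i, j] = -s; Q[j, i] = s
    return Q

Q = rotation_matrix(5, 1, 3, 0.3)
print(np.allclose(Q.T @ Q, np.eye(5)))        # Q is orthogonal (unitary with K = R)
x = np.arange(5, dtype=float)
print(np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x)))   # norm invariance
```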
Remark 1.6: i) In view of the relations (1.1.20) and (1.1.21) Euclidian scalar product
and norm of vectors are invariant under unitary transformations. This explains why it is
the Euclidian norm, which is used for measuring length or distance of vectors in Rn .
ii) The Schwarz inequality (1.1.3) allows the definition of an “angle” between two vectors
in Rn . For any number α ∈ [−1, 1] there is exactly one θ ∈ [0, π] such that α = cos(θ).
By
\[ \cos(\theta) = \frac{(x, y)_2}{\|x\|_2 \|y\|_2} , \qquad x, y \in K^n \setminus \{0\} , \]
a θ ∈ [0, π] is uniquely determined. This is then the “angle” between the two vectors
x and y . The relation (1.1.20) states that the Euclidian scalar product of two vectors
in Kn is invariant under rotations. By some rotation Q in Rn , we can achieve that
Qx, Qy ∈ span{e(1) , e(2) } and Qx = x2 e(1) . Then, there holds
(x, y)₂ = (Qx, Qy)₂ = ‖x‖₂ (e⁽¹⁾, Qy)₂ = ‖x‖₂ (Qy)₁ = ‖x‖₂ ‖Qy‖₂ cos(θ) = ‖x‖₂ ‖y‖₂ cos(θ),
i. e., θ is actually the “angle” between the two vectors in the sense of elementary geometry.
[Figure: sketch of the angle θ between the rotated vectors Qx and Qy in the (x₁, x₂)-plane, with Qx = ‖x‖₂ e⁽¹⁾ and cos(θ) = (Qy)₁/‖Qy‖₂.]
Ax = b, (1.1.22)
for x ∈ Rn . Here, rank(A) < rank[A, b] is allowed, i. e., the system does not need to
possess a solution in the normal sense. In this case an appropriately extended notion
of “solution” is to be used. In the following, we consider the so-called “method of least error-squares”, which goes back to Gauss. In this approach a vector x̄ ∈ Rⁿ is sought with minimal defect norm ‖d‖₂ = ‖b − Ax̄‖₂. Clearly, this extended notion of “solution” coincides with the traditional one if rank(A) = rank([A, b]).
AT Ax̄ = AT b. (1.1.24)
Proof. i) Let x̄ be a solution of the normal equation. Then, for arbitrary x ∈ Rn there
holds
i. e., x̄ has least error-squares. In turn, for such a least error-squares solution x̄ there
holds
\[
0 = \frac{\partial}{\partial x_i} \|Ax - b\|_2^2 \Big|_{x=\bar x}
= \frac{\partial}{\partial x_i} \sum_{j} \Big( \sum_{k=1}^n a_{jk} x_k - b_j \Big)^2 \Big|_{x=\bar x}
= 2 \sum_{j} a_{ji} \Big( \sum_{k=1}^n a_{jk} \bar x_k - b_j \Big) = 2\,(A^T A \bar x - A^T b)_i ,
\]
AT Ax̄ = AT s = AT s + AT r = AT b,
i. e., x̄ solves the normal equation. In case that rank(A) = n there holds kern(A) = {0}. Observing A^T A x = 0 and kern(A^T) ⊥ range(A), we conclude Ax = 0 and x = 0. The matrix A^T A ∈ R^{n×n} is then regular and consequently x̄ uniquely determined. In case that rank(A) < n, for any other solution x¹ of the normal equation, we have
This follows from the non-negativity of the function F(y) := ‖x̄ + y‖₂² and its uniform strict convexity, which also implies uniqueness of the minimal solution. Q.E.D.
For the computation of the “solution with smallest error-squares” of a non-quadratic
system Ax = b, we have to solve the normal equation AT Ax = AT b. Efficient methods
for this task will be discussed in the next chapter.
Lemma 1.10: For any matrix A ∈ K^{m×n} the matrices Ā^T A ∈ K^{n×n} and AĀ^T ∈ K^{m×m} are Hermitian (symmetric) and positive semi-definite. In the case m ≥ n and if rank(A) = n the matrix Ā^T A is even positive definite.
i. e., ĀT A is Hermitian and positive semi-definite. The argument for AĀT is analogous.
In case that m ≥ n and rank(A) = n the matrix viewed as mapping A : Rⁿ → Rᵐ is injective, i. e., ‖Ax‖₂ = 0 implies x = 0. Hence, the matrix Ā^T A is positive definite.
Q.E.D.
Aw = λw. (1.1.26)
ii) The vector space of all eigenvectors of an eigenvalue λ is called “eigenspace” and
denoted by Eλ . Its dimension is the “geometric multiplicity” of λ . The set of all eigen-
values of a matrix A ∈ Kn×n is called its “spectrum” and denoted by σ(A) ⊂ C . The
matrix function R_A(z) := (zI − A)⁻¹ is called the “resolvent” of A and Res(A) := {z ∈ C | zI − A is regular} the corresponding “resolvent set”.
iii) The eigenvalues are just the zeros of the “characteristic polynomial” χ_A(z) := det(A − zI) ∈ P_n of A.
Hence, by the fundamental theorem of algebra there are exactly n eigenvalues counted
accordingly to their multiplicity as zeros of χA , their so-called “algebraic multiplicities”.
The algebraic multiplicity is always greater than or equal to the geometric multiplicity. If it
is strictly greater, then the eigenvalue is called “deficient” and the difference the “defect”
of the eigenvalue.
iv) The eigenvalues of a matrix can be determined independently of each other. One speaks
of the “partial eigenvalue problem” if only a small number of the eigenvalues (e. g., the
largest or the smallest one) and the corresponding eigenvectors are to be determined. In
the “full eigenvalue problem” one seeks all eigenvalues with corresponding eigenvectors.
For a given eigenvalue λ ∈ C (e. g., obtained as a zero of the characteristic polynomial)
a corresponding eigenvector can be determined as any solution of the (singular) problem
(A − λI)w = 0. (1.1.27)
Conversely, for a given eigenvector w ∈ Kⁿ (e. g., obtained by the “power method” described below), one obtains the corresponding eigenvalue by evaluating any of the quotients (choosing w_i ≠ 0)
\[ \lambda = \frac{(Aw)_i}{w_i} , \qquad \text{or the “Rayleigh[7] quotient”} \quad \lambda = \frac{(Aw, w)_2}{(w, w)_2} . \]
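The “power method” is only referred to here and described later in these notes; the following generic sketch (assuming NumPy; matrix and iteration count are arbitrary) combines a plain power iteration with the Rayleigh quotient and the componentwise quotients just mentioned:

```python
import numpy as np

def power_method(A, steps=200, rng=np.random.default_rng(1)):
    """Generic power iteration: approximate eigenvector for the eigenvalue of
    largest modulus, together with its Rayleigh quotient."""
    w = rng.standard_normal(A.shape[0])
    for _ in range(steps):
        w = A @ w
        w /= np.linalg.norm(w)
    lam = (A @ w) @ w / (w @ w)      # Rayleigh quotient (Aw, w)_2 / (w, w)_2
    return lam, w

A = np.array([[2.0, 1.0], [1.0, 3.0]])
lam, w = power_method(A)
print(lam, (A @ w) / w)              # componentwise quotients (Aw)_i / w_i
```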
Since
\[ \det(\bar A^T - \bar z I) = \overline{\det(A^T - zI)} = \overline{\det((A - zI)^T)} = \overline{\det(A - zI)} , \]
the eigenvalues of the matrices A and Ā^T are related by λ ∈ σ(A) ⟺ λ̄ ∈ σ(Ā^T).
[7] John William Strutt (Lord Rayleigh) (1842–1919): English mathematician and physicist; worked at the beginning as (aristocratic) private scholar, 1879–1884 Professor for Experimental Physics in Cambridge; fundamental contributions to theoretical physics: scattering theory, acoustics, electro-magnetics, gas dynamics.
The dual eigenvector w* may also be normalized by ‖w*‖₂ = 1 or, what is more suggested by numerical purposes, by (w, w*)₂ = 1. In the “degenerate” case (w, w*)₂ = 0, and only then, the problem
\[ A w^1 - \lambda w^1 = w \qquad (1.1.31) \]
has a solution w¹ ∈ Kⁿ, a so-called “generalized eigenvector”, by construction satisfying
\[ (A - \lambda I)^2 w^1 = (A - \lambda I) w = 0 , \]
i. e., w¹ ∈ kern((A − λI)²) and, consequently, in view of the above definition, the eigenvalue λ has “defect” α(λ) ≥ 1. This construction can be continued, i. e., if (w¹, w*)₂ = 0, also the problem Aw² − λw² = w¹ has a solution w² ∈ Kⁿ, which is then a “generalized eigenvector” of level two, by construction satisfying (A − λI)³w² = 0. In this way, we may obtain “generalized eigenvectors” wᵐ ∈ Kⁿ of level m for which (A − λI)^{m+1}wᵐ = 0 and (wᵐ, w*)₂ ≠ 0. Then, the eigenvalue λ has defect α(λ) = m.
Example 1.3: The following special matrices Cm (λ) occur as building blocks, so-called
“Jordan blocks”, in the “Jordan[8] normal form” of a matrix A ∈ K^{n×n} (see below):
\[
C_m(\lambda) =
\begin{bmatrix}
\lambda & 1 & & & 0 \\
& \lambda & 1 & & \\
& & \ddots & \ddots & \\
& & & \lambda & 1 \\
0 & & & & \lambda
\end{bmatrix}
\in K^{m\times m}, \qquad \text{eigenvalue } \lambda \in \mathbb{C},
\]
\[ \chi_{C_m(\lambda)}(z) = (z - \lambda)^m \ \Rightarrow\ \sigma = m , \qquad \operatorname{rank}(C_m(\lambda) - \lambda I) = m - 1 \ \Rightarrow\ \rho = 1 . \]
[8] Marie Ennemond Camille Jordan (1838–1922): French mathematician; Prof. in Paris; contributions to algebra, group theory, calculus and topology.
Definition 1.8: Two matrices A, B ∈ Kn×n are called “similar (to each other)”, if there
is a regular matrix T ∈ Kn×n such that
B = T −1 AT. (1.1.32)
Lemma 1.11: For any two similar matrices A, B ∈ Kn×n there holds:
a) det(A) = det(B).
b) σ(A) = σ(B).
c) trace(A) = trace(B).
Proof. i) The product theorem for determinants implies that det(AB) = det(A) det(B)
and further det(T −1 ) = det(T )−1 . This implies that
det(B) = det(T −1 AT ) = det(T −1 ) det(A) det(T ) = det(T )−1 det(A) det(T ) = det(A).
Definition 1.9 (Normal forms): i) Any matrix A ∈ Kn×n is similar to its “canonical
normal form” JA (“Jordan normal form”) which is a block diagonal matrix with main
32 Linear Algebraic Systems and Eigenvalue Problems
diagonal blocks, the “Jordan blocks”, of the form as shown in Example 1.3. Here, the
“algebraic” multiplicity of an eigenvalue corresponds to the number of occurrences of this
eigenvalue on the main diagonal of JA , while its “geometric” multiplicity corresponds to
the number of Jordan blocks containing λ .
ii) A matrix A ∈ K^{n×n}, which is similar to a diagonal matrix, then having its eigenvalues on the main diagonal, is called “diagonalizable”, i. e., there is a regular matrix W ∈ K^{n×n} such that
\[ W^{-1} A W = \operatorname{diag}(\lambda_1, \dots, \lambda_n) . \]
This relation implies that the transformation matrix W = [w¹, . . . , wⁿ] has the eigenvectors wⁱ corresponding to the eigenvalues λ_i as column vectors. This means that diagonalizability of a matrix is equivalent to the existence of a basis of eigenvectors.
iii) A matrix A ∈ Kn×n is called “unitarily diagonalizable” if it is diagonalizable with
a unitary transformation matrix. This is equivalent to the existence of an orthonormal
basis of eigenvectors.
Positive definite Hermitian matrices A ∈ Kn×n have very special spectral properties.
These are collected in the following lemma and theorem, the latter one being the basic
result of matrix analysis (“spectral theorem”).
Lemma 1.12: i) A Hermitian matrix has only real eigenvalues and eigenvectors to dif-
ferent eigenvalues are mutually orthogonal.
ii) A Hermitian matrix is positive definite if and only if all its (real) eigenvalues are pos-
itive.
iii) Two normal matrices A, B ∈ Kn×n commute, AB = BA, if and only if they possess
a common basis of eigenvectors.
Proof. For the proofs, we refer to the standard linear algebra literature. Q.E.D.
Theorem 1.5 (Spectral theorem): For square Hermitian matrices, A = ĀT , or more
general for “normal” matrices, ĀT A = AĀT , algebraic and geometric multiplicities of
eigenvalues are equal, i. e., these matrices are diagonalizable. Further, they are even
unitarily diagonalizable, i. e., there exists an orthonormal basis of eigenvectors.
Proof. For the proof, we refer to the standard linear algebra literature. Q.E.D.
We now consider the vector space of all m×n-matrices A ∈ K^{m×n}. This vector space may be identified with the vector space of mn-vectors, K^{m×n} ≅ K^{mn}. Hence, all statements for vector norms carry over to norms for matrices. In particular, all norms for m×n-matrices are equivalent and the convergence of sequences of matrices is again the componentwise convergence:
\[ A^k \to A \ (k \to \infty) \quad \Longleftrightarrow \quad a^k_{ij} \to a_{ij} \ (k \to \infty), \quad i = 1, \dots, m, \ j = 1, \dots, n . \]
Now, we restrict the further discussion to square matrices A ∈ Kn×n . For an arbitrary
vector norm ‖·‖ on Kⁿ a norm for matrices A ∈ K^{n×n} is generated by
\[ \|A\| := \sup_{x\in K^n\setminus\{0\}} \frac{\|Ax\|}{\|x\|} = \sup_{x\in K^n, \|x\|=1} \|Ax\| . \]
The definiteness and homogeneity are obvious and the triangle inequality follows from that holding for the given vector norm. This matrix norm is called the “natural matrix norm” corresponding to the vector norm ‖·‖. In the following for both norms, the matrix norm and the generating vector norm, the same notation is used. For a natural matrix norm there always holds ‖I‖ = 1. Such a “natural” matrix norm is automatically “compatible” with the generating vector norm, i. e., it satisfies
\[ \|Ax\| \le \|A\|\, \|x\| , \quad x \in K^n . \]
Further it is “submultiplicative”,
\[ \|AB\| \le \|A\|\, \|B\| , \quad A, B \in K^{n\times n} . \]
Not all matrix norms are “natural” in the above sense. For instance, the square-sum norm (also called “Frobenius[9] norm”)
\[ \|A\|_F := \Big( \sum_{j,k=1}^n |a_{jk}|^2 \Big)^{1/2} \]
Lemma 1.13 (Spectral norm): For an arbitrary square matrix A ∈ K^{n×n} the product matrix Ā^T A ∈ K^{n×n} is always Hermitian and positive semi-definite. For the spectral norm of A there holds
\[ \|A\|_2 = \max\{ |\lambda|^{1/2} , \ \lambda \in \sigma(\bar A^T A) \} , \]
and for Hermitian A, ‖A‖₂ = max{|λ|, λ ∈ σ(A)}.
[9] Ferdinand Georg Frobenius (1849–1917): German mathematician; Prof. in Zurich and Berlin; contributions to the theory of differential equations, to determinants and matrices as well as to group theory.
Proof. i) Let the matrix A ∈ K^{n×n} be Hermitian. For any eigenvalue λ of A and corresponding eigenvector x there holds
\[ |\lambda| = \frac{\|\lambda x\|_2}{\|x\|_2} = \frac{\|Ax\|_2}{\|x\|_2} \le \|A\|_2 . \]
In turn, using an orthonormal basis {a¹, . . . , aⁿ} of eigenvectors of A and the representation x = ∑_i x_i aⁱ, we obtain
\[ \|Ax\|_2 = \Big\| A \sum_i x_i a^i \Big\|_2 = \Big\| \sum_i \lambda_i x_i a^i \Big\|_2 \le \max_i |\lambda_i|\, \Big\| \sum_i x_i a^i \Big\|_2 = \max_i |\lambda_i|\, \|x\|_2 , \]
and consequently,
\[ \frac{\|Ax\|_2}{\|x\|_2} \le \max_i |\lambda_i| . \]
and ‖Ā^T A‖₂ ≤ ‖Ā^T‖₂ ‖A‖₂ = ‖A‖₂² (observe that ‖A‖₂ = ‖Ā^T‖₂). This completes the proof. Q.E.D.
Lemma 1.14 (Natural matrix norms): The natural matrix norms generated by the l∞ norm ‖·‖_∞ and the l1 norm ‖·‖_1 are the so-called “maximal-row-sum norm” and the “maximal-column-sum norm”, respectively,
\[ \|A\|_\infty := \max_{1\le i\le n} \sum_{j=1}^n |a_{ij}| , \qquad \|A\|_1 := \max_{1\le j\le n} \sum_{i=1}^n |a_{ij}| . \qquad (1.1.38) \]
Proof. We give the proof only for the l∞ norm. For the l1 norm the argument is analogous.
i) The maximal row sum ‖·‖_∞ is a matrix norm. The norm properties (N1)–(N3) follow from the corresponding properties of the modulus. For the matrix product AB there holds
\[
\|AB\|_\infty = \max_{1\le i\le n} \sum_{j=1}^n \Big| \sum_{k=1}^n a_{ik} b_{kj} \Big|
\le \max_{1\le i\le n} \sum_{k=1}^n |a_{ik}| \sum_{j=1}^n |b_{kj}|
\le \Big( \max_{1\le i\le n} \sum_{k=1}^n |a_{ik}| \Big)\Big( \max_{1\le k\le n} \sum_{j=1}^n |b_{kj}| \Big) = \|A\|_\infty \|B\|_\infty .
\]
ii) In view of
\[ \|Ax\|_\infty = \max_{1\le j\le n} \Big| \sum_{k=1}^n a_{jk} x_k \Big| \le \max_{1\le j\le n} \sum_{k=1}^n |a_{jk}| \max_{1\le k\le n} |x_k| = \|A\|_\infty \|x\|_\infty , \]
the maximal row-sum is compatible with the maximum norm ‖·‖_∞ and there holds
\[ \|A\|_\infty = \max_{1\le j\le n} \sum_{k=1}^n |a_{jk}| = \sum_{k=1}^n |a_{mk}| , \]
with an index m for which the row sum is maximal. For k = 1, . . . , n, we set
\[ z_k := \begin{cases} |a_{mk}|/a_{mk} , & \text{for } a_{mk} \ne 0 , \\ 0 , & \text{otherwise} , \end{cases} \]
so that ‖z‖_∞ ≤ 1 and the vector v := Az satisfies
\[ v_m = \sum_{k=1}^n a_{mk} z_k = \sum_{k=1}^n |a_{mk}| = \|A\|_\infty . \]
Consequently,
\[ \|A\|_\infty = v_m \le \|v\|_\infty = \|Az\|_\infty \le \sup_{\|y\|_\infty = 1} \|Ay\|_\infty , \]
which, together with the compatibility estimate above, yields ‖A‖_∞ = sup_{‖y‖_∞=1}‖Ay‖_∞. Q.E.D.
i. e., all eigenvalues of A are contained in a circle in C with center at the origin and radius ‖A‖. Especially with ‖A‖_∞, we obtain the eigenvalue bound
\[ \max_{\lambda\in\sigma(A)} |\lambda| \le \|A\|_\infty = \max_{1\le i\le n} \sum_{j=1}^n |a_{ij}| . \qquad (1.1.40) \]
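A quick numerical check of the bound (1.1.40) (a minimal sketch assuming NumPy; the matrix is made up):

```python
import numpy as np

A = np.array([[2.0, -1.0, 0.5],
              [0.0,  3.0, 1.0],
              [1.0, -2.0, 4.0]])

norm_inf = np.max(np.abs(A).sum(axis=1))   # maximal row sum, ||A||_inf
norm_1   = np.max(np.abs(A).sum(axis=0))   # maximal column sum, ||A||_1
norm_2   = np.linalg.norm(A, 2)            # spectral norm

eigs = np.linalg.eigvals(A)
print("max |lambda| =", np.max(np.abs(eigs)))
print("norm bounds:", norm_inf, norm_1, norm_2)   # each is an upper bound
assert np.max(np.abs(eigs)) <= min(norm_inf, norm_1) + 1e-12
```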
Since the eigenvalues of Ā^T and A are related by λ(Ā^T) = λ̄(A), using the bound (1.1.40) simultaneously for Ā^T and A yields the following refined bound:
\[ \max_{\lambda\in\sigma(A)} |\lambda| \le \min\{ \|A\|_\infty , \|A\|_1 \} . \qquad (1.1.41) \]
The following lemma contains a useful result on the regularity of small perturbations
of the unit matrix.
Lemma 1.15 (Perturbation of unity): Let ‖·‖ be any natural matrix norm on K^{n×n} and B ∈ K^{n×n} a matrix with ‖B‖ < 1. Then, the perturbed matrix I + B is regular, its inverse is given as the (convergent) “Neumann[10] series”
\[ (I + B)^{-1} = \sum_{k=0}^{\infty} (-B)^k , \qquad (1.1.42) \]
and there holds
\[ \|(I + B)^{-1}\| \le \frac{1}{1 - \|B\|} . \qquad (1.1.43) \]
Proof. i) First, we show the regularity of I + B and the bound (1.1.43). For all x ∈ Kⁿ there holds
\[ \|(I + B)x\| \ge \|x\| - \|Bx\| \ge (1 - \|B\|)\, \|x\| . \]
In view of 1 − ‖B‖ > 0 this implies that I + B is injective and consequently regular. Then, the following estimate implies (1.1.43):
\[ \|(I + B)^{-1}x\| \le (1 - \|B\|)^{-1} \|(I + B)(I + B)^{-1}x\| = (1 - \|B\|)^{-1} \|x\| , \quad x \in K^n . \]
ii) Next, we consider the partial sums
\[ S := \lim_{k\to\infty} S_k , \qquad S_k := \sum_{s=0}^k (-B)^s . \]
S is well defined due to the fact that {S_k}_{k∈N} is a Cauchy sequence with respect to the matrix norm ‖·‖ (and, by the norm equivalence in finite dimensional normed spaces,
[10] Carl Gottfried Neumann (1832–1925): German mathematician; since 1858 “Privatdozent” and since 1863 apl. Prof. in Halle. After holding professorships in Basel and Tübingen he moved 1868 to Leipzig where he worked for more than 40 years. He contributed to the theory of (partial) differential and integral equations, especially to the Dirichlet problem. The “Neumann boundary condition” and the “Neumann series” are named after him. In mathematical physics he worked on analytical mechanics and potential theory. Together with A. Clebsch he founded the journal “Mathematische Annalen”.
with respect to any matrix norm). By employing the triangle inequality, using the matrix
norm property and the limit formula for the geometric series, we see that
\[
\|S\| = \lim_{k\to\infty} \|S_k\| = \lim_{k\to\infty} \Big\| \sum_{s=0}^k (-B)^s \Big\|
\le \lim_{k\to\infty} \sum_{s=0}^k \|B\|^s = \lim_{k\to\infty} \frac{1 - \|B\|^{k+1}}{1 - \|B\|} = \frac{1}{1 - \|B\|} .
\]
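A small numerical illustration of the Neumann series and of the bound (1.1.43) (a sketch assuming NumPy; the matrix B is random and scaled so that ‖B‖₂ < 1):

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((4, 4))
B *= 0.5 / np.linalg.norm(B, 2)          # enforce ||B||_2 = 0.5 < 1

# Partial sums S_k = sum_{s=0}^k (-B)^s of the Neumann series for (I + B)^{-1}
S, term = np.eye(4), np.eye(4)
for _ in range(60):
    term = term @ (-B)
    S += term

exact = np.linalg.inv(np.eye(4) + B)
print("series error:", np.linalg.norm(S - exact, 2))
print("bound check:", np.linalg.norm(exact, 2),
      "<=", 1.0 / (1.0 - np.linalg.norm(B, 2)))
```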
Corollary 1.2: Let A ∈ K^{n×n} be a regular matrix and Ã another matrix such that
\[ \|\tilde A - A\| < \frac{1}{\|A^{-1}\|} . \qquad (1.1.44) \]
Then, also à is regular. This means that the “resolvent set” Res(A) of a matrix A ∈
Kn×n is open in Kn×n and the only “singular” points are just the eigenvalues of A , i. e.,
there holds C = Res(A) ∪ σ(A) .
by Lemma 1.15 the matrix I + A−1 (Ã − A) is regular. Then, also the product matrix
A(I + A−1 (à − A)) is regular, which implies the regularity of à . Q.E.D.
For simplicity, we restrict the following discussion to the special situation of an au-
tonomous system, i. e., F (t, ·) ≡ F (·), and a stationary particular solution u(t) ≡ u ∈ Rn ,
i. e., to the solution of the nonlinear system
F (u) = 0. (1.2.48)
where the higher-order term depends on bounds on u and u as well as on the smoothness
properties of F (·) .
Theorem 1.6: Suppose that the Jacobian A := F′(u) is diagonalizable and that all its eigenvalues have negative real part. Then, the solution u of (1.2.48) is exponentially stable in the sense of Definition 1.10 with the constants κ = |Re λ_max| and K = cond₂(W), where λ_max is the eigenvalue of A with largest (negative) real part and W = [w¹, . . . , wⁿ] the column matrix formed by the (normalized) eigenbasis of A. If A is normal then K = cond₂(W) = 1.
\[ W^{-1} A W = \Lambda , \qquad A = W \Lambda W^{-1} . \]
Using this notation the perturbation equation can be rewritten in the form
\[ \dot v(t) = \Lambda v(t) , \qquad v := W^{-1} w , \]
with the explicit solution v_i(t) = e^{λ_i t} v_i(0). This implies:
\[ \|v(t)\|_2^2 = \sum_{i=1}^n |v_i(t)|^2 = \sum_{i=1}^n e^{2\,\mathrm{Re}\,\lambda_i t}\, |(W^{-1}w)_i(0)|^2 \le e^{2\,\mathrm{Re}\,\lambda_{\max} t}\, \|(W^{-1}w)(0)\|_2^2 , \]
and consequently,
The condition number of W can become arbitrarily large depending on the “non-orthogo-
nality” of the eigenbasis of the Jacobian A .
ii) The assertion now follows by combining (1.2.51) and (1.2.49) within a continuation
argument. The proof is complete. Q.E.D.
Following the argument in the proof of Theorem 1.6, we see that the occurrence of
just one eigenvalue with Re λ > 0 inevitably causes dynamic instability of the solution
u, i. e., arbitrarily small perturbations may grow in time without bound. Denoting by S : Rⁿ → C¹([0, ∞); Rⁿ) the “solution operator” of the linearized perturbation equation (1.2.50), i. e., w(t) = S(t)w⁰, this can be formulated as sup_{t>0} ‖S(t)‖₂ = ∞.
The result of Theorem 1.6 can be extended to the case of a non-diagonalizable Jacobian
A = F′(u). In this case, one obtains a stability behavior of the form
where α ≥ 1 is the defect of the most critical eigenvalue λ_max, i. e., that eigenvalue with largest real part Re λ_max < 0. This implies that
\[ \sup_{t>0} \|S(t)\|_2 \approx \Big( \frac{\alpha}{e} \Big)^{\alpha} \frac{1}{|\mathrm{Re}\,\lambda_{\max}|^{\alpha}} , \qquad (1.2.54) \]
i. e., for −1 ≪ Re λ_min < 0 initially small perturbations may grow beyond a value at which nonlinear instability is triggered. Summarizing, we are interested in the case that all eigenvalues of A = F′(u) have negative real part, suggesting stability in the sense of Theorem 1.6, and especially want to compute the most “critical” eigenvalue, i. e., that λ ∈ σ(A) with maximal Re λ < 0, to detect whether the corresponding solution operator S(t) may behave in a critical way.
The following result, which is sometimes addressed as the “easy part of the Kreiss[11] matrix theorem”, indicates in which direction this analysis has to go.
Lemma 1.16: Let A := F′(u) and z ∈ C \ σ(A) with Re z > 0. Then, for the solution operator S(t) of the linearized perturbation equation (1.2.50), there holds
\[ \|(zI - A)^{-1}\|_2 \le |\mathrm{Re}\, z|^{-1} \sup_{t>0} \|S(t)\|_2 . \qquad (1.2.55) \]
Proof. We continue using the notation from the proof of Theorem 1.6. If ‖S(t)‖₂ is unbounded over [0, ∞), the asserted estimate holds trivially. Hence, let us assume that sup_{t>0} ‖S(t)‖₂ < ∞. Next, integrating this over 0 ≤ t < ∞ and observing Re z > 0 and lim_{t→∞} e^{−tz} w(t) = 0 yields
\[ -(zI - A)^{-1} w^0 = \int_0^\infty e^{-tz} S(t)\, dt\; w^0 . \]
From this, we conclude
\[ \|(zI - A)^{-1}\|_2 \le \int_0^\infty e^{-t|\mathrm{Re}\, z|}\, dt\ \sup_{t>0}\|S(t)\|_2 \le |\mathrm{Re}\, z|^{-1} \sup_{t>0} \|S(t)\|_2 , \]
[11] Heinz-Otto Kreiss (1930–2015): Swedish/US-American mathematician; worked in Numerical Analysis and in the new field Scientific Computing in the early 1960s; born in Hamburg, Germany, he studied and worked at the Kungliga Tekniska Högskolan in Stockholm, Sweden; he published a number of books; later he became Prof. at the California Institute of Technology and University of California, Los Angeles (UCLA).
of Theorem 1.6 would indicate stability of solutions to (1.2.50), there may be points z in the right complex half plane for which ‖(zI − A)⁻¹‖₂ ≫ |Re z|⁻¹ and consequently, by (1.2.55), sup_{t>0} ‖S(t)‖₂ ≫ 1.
Hence, even small perturbations of the particular solution u may be largely amplified
eventually triggering nonlinear instability.
The estimate (1.2.55) makes us search for points z ∈ C \ σ(A) with Re z > 0 and large resolvent norm ‖(zI − A)⁻¹‖₂.
This suggests the concept of the “pseudo-spectrum” of the matrix A , which goes back
to Landau [9] and has been extensively described and applied in the stability analysis of
dynamical systems, e. g., in Trefethen [20] and Trefethen & Embree [22].
Remark 1.7: The concept of a pseudo-spectrum is interesting only for non-normal operators, since for a normal operator σ_ε(A) is just the union of ε-circles around its eigenvalues. This follows from the estimate (see Dunford & Schwartz [8] or Kato [12])
\[ \|(zI - A)^{-1}\|_2 = \operatorname{dist}(z, \sigma(A))^{-1} \quad \text{for normal } A . \]
Remark 1.8: The concept of the “pseudo-spectrum” can be introduced in much more
general situations, such as that of closed linear operators in abstract Hilbert or Banach
spaces (see Trefethen & Embree [22]). Typically hydrodynamic stability analysis concerns
differential operators defined on bounded domains. This situation fits into the Hilbert-
space framework of “closed unbounded operators with compact inverse”.
Using the notion of the pseudo-spectrum the estimate (1.2.55) can be expressed in the following form
\[ \sup_{t\ge 0} \|S(t)\|_2 \ge \sup\Big\{ \frac{|\mathrm{Re}\, z|}{\varepsilon} : \ \varepsilon > 0 , \ z \in \sigma_\varepsilon(A) , \ \mathrm{Re}\, z > 0 \Big\} , \qquad (1.2.59) \]
or
Below, we will present methods for computing estimates for the pseudo-spectrum of a
matrix. This will be based on related methods for solving the partial eigenvalue problem.
To this end, we provide some results on several basic properties of the pseudo-spectrum.
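As a computational side remark (a sketch assuming NumPy; the non-normal test matrix is invented), the ε-pseudo-spectrum can be sampled on a grid via the characterization used in the proof below: z ∈ σ_ε(A) exactly when the smallest singular value of zI − A is at most ε.

```python
import numpy as np

# A small non-normal test matrix (made up); its pseudo-spectra are much larger
# than eps-circles around the eigenvalues.
A = np.array([[-1.0, 50.0],
              [ 0.0, -2.0]])

xs = np.linspace(-4.0, 2.0, 121)
ys = np.linspace(-3.0, 3.0, 121)
sig_min = np.empty((len(ys), len(xs)))

for iy, y in enumerate(ys):
    for ix, x in enumerate(xs):
        z = x + 1j * y
        # smallest singular value of zI - A, i.e. 1 / ||(zI - A)^{-1}||_2
        sig_min[iy, ix] = np.linalg.svd(z * np.eye(2) - A, compute_uv=False)[-1]

eps = 0.5
in_right_half_plane = (sig_min <= eps) & (xs[None, :] > 0.0)
print("eigenvalues:", np.linalg.eigvals(A))
print("sigma_eps reaches into Re z > 0:", bool(in_right_half_plane.any()))
```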
ii) Let 0 ∉ σ(A). Then, the ε-pseudo-spectra of A and that of its inverse A⁻¹ are related by
\[ \sigma_\varepsilon(A) \subset \big\{ z \in \mathbb{C}\setminus\{0\} : \ z^{-1} \in \sigma_{\delta(z)}(A^{-1}) \big\} \cup \{0\} , \qquad (1.2.61) \]
Proof. The proof of part (i) can be found in Trefethen & Embree [22]. For completeness,
we recall a sketch of the argument. The proof of part (ii) is taken from Gerecht et al.
[35].
ia) In all three definitions, we have σ(A) ⊂ σ_ε(A). Let z ∈ σ_ε(A) in the sense of definition (a). There exists a w ∈ Kⁿ with ‖w‖₂ = 1, such that ‖(zI − A)⁻¹w‖₂ ≥ ε⁻¹. Hence, there is a v ∈ Kⁿ with ‖v‖₂ = 1, and s ∈ (0, ε), such that (zI − A)⁻¹w = s⁻¹v or (zI − A)v = sw. Let Q(v, w) ∈ K^{n×n} denote the unitary matrix, which rotates the unit vector v into the unit vector w, such that sw = sQ(v, w)v. Then, z ∈ σ(A + E) where E := sQ(v, w) with ‖E‖₂ ≤ ε, i. e., z ∈ σ_ε(A) in the sense of definition (b). Let now z ∈ σ_ε(A) be in the sense of definition (b), i. e., there exists E ∈ K^{n×n} with ‖E‖₂ ≤ ε such that (A + E)w = zw, with some w ∈ Kⁿ, w ≠ 0. Hence, (A − zI)w = −Ew, and therefore,
Hence, z ∈ σε (A) in the sense of (a). This proves the equivalence of definitions (a) and
(b).
ib) Next, let again z ∈ σ_ε(A) \ σ(A) in the sense of definition (a). Then,
\[
\varepsilon \ge \|(A - zI)^{-1}\|_2^{-1}
= \Big( \sup_{w\in K^n\setminus\{0\}} \frac{\|(A - zI)^{-1}w\|_2}{\|w\|_2} \Big)^{-1}
= \inf_{v\in K^n\setminus\{0\}} \frac{\|(A - zI)v\|_2}{\|v\|_2} .
\]
Hence, there exists a v ∈ Kⁿ with ‖v‖₂ = 1, such that ‖(A − zI)v‖₂ ≤ ε, i. e., z ∈ σ_ε(A) in the sense of definition (c). By the same argument, now used in the reversed direction, we see that z ∈ σ_ε(A) in the sense of definition (c) implies that also z ∈ σ_ε(A) in the sense of definition (a). Thus, definition (a) is also equivalent to condition (c).
iia) We use the definition (c) from part (i) for the ε-pseudo-spectrum. Let z ∈ σ_ε(A) and accordingly v ∈ Kⁿ, ‖v‖₂ = 1, satisfying ‖(A − zI)v‖₂ ≤ ε. Then, for a suitably normalized vector w ∈ Kⁿ,
we conclude that
\[ \|(A^{-1} - z^{-1} I)\, w\|_2 \le \frac{\varepsilon}{|z|(|z| - \varepsilon)} \le \frac{\varepsilon}{1 - \varepsilon} . \]
This completes the proof. Q.E.D.
This completes the proof. Q.E.D.
The next proposition relates the size of the resolvent norm ‖(zI − A)⁻¹‖₂ to easily computable quantities in terms of the eigenvalues and eigenvectors of the matrix A = F′(u).
Proof. The argument of the proof is recalled from Gerecht et al. [35] where it is developed within a function space setting and has therefore to be simplified here for the finite dimensional situation.
i) Let B ∈ K^{n×n} be a matrix with ‖B‖₂ ≤ 1. We consider the perturbed eigenvalue problem
(A + εB)vε = λε vε . (1.2.64)
Since this is a regular perturbation and λ non-deficient, there exist corresponding eigenvalues λ_ε ∈ C and eigenvectors v_ε ∈ Kⁿ, ‖v_ε‖₂ = 1, such that λ_ε → λ and v_ε → v (ε → 0). From (1.2.64),
we conclude that
\[ \|(A - \lambda_\varepsilon I) v_\varepsilon\|_2 \le \varepsilon \|B\|_2 \|v_\varepsilon\|_2 \le \varepsilon \|v_\varepsilon\|_2 , \]
and from this, if λ_ε is not an eigenvalue of A,
\[
\|(A - \lambda_\varepsilon I)^{-1}\|_2^{-1}
= \Big( \sup_{y\in K^n} \frac{\|(A - \lambda_\varepsilon I)^{-1} y\|_2}{\|y\|_2} \Big)^{-1}
= \Big( \sup_{x\in K^n} \frac{\|x\|_2}{\|(A - \lambda_\varepsilon I) x\|_2} \Big)^{-1}
= \inf_{x\in K^n} \frac{\|(A - \lambda_\varepsilon I) x\|_2}{\|x\|_2}
\le \frac{\|(A - \lambda_\varepsilon I) v_\varepsilon\|_2}{\|v_\varepsilon\|_2} \le \varepsilon .
\]
ii) Next, we analyze the dependence of the eigenvalue λε on ε in more detail. Subtracting
the equation for v from that for vε , we obtain
  ω(ε) := (Bv_ε, v*)₂ / [ (v_ε, v*)₂ (Bv, v*)₂ ] → 1   (ε → 0).
The unitary matrix S acts like a Householder transformation mapping v into ṽ ∗ (s. the
discussion in Section 2.3.1, below). In fact, observing v2 = ṽ ∗ 2 = 1 , there holds
Sv22 = v22 − 2 Re (v, w)2(v, w)2 − 2 Re (v, w)2 (w, v)2 + 4 Re (v, w)22w22 = v22 ,
we have B2 = S2 = 1. Hence, for this particular choice of the matrix B , we have
as asserted. Q.E.D.
Remark 1.9: i) We note that the statement of Theorem 1.7 becomes trivial if the matrix
A is normal. In this case primal and dual eigenvectors coincide and, in view of Remark
1.7, σ_ε(A) is the union of ε-circles around its eigenvalues λ . Hence, observing ‖w*‖₂ =
‖w‖₂ = 1 and setting ω(ε) ≡ 1 , we trivially have λ_ε := λ − ε ∈ σ_ε(A) as asserted.
ii) If A is non-normal it may have a nontrivial pseudo-spectrum. Then, a large norm
‖w*‖₂ ≫ 1 of the dual eigenvector corresponding to a critical eigenvalue λ_crit with
Re λ_crit < 0 indicates that the ε-pseudo-spectrum σ_ε(A) , even for small ε , reaches into
the right complex half plane.
iii) If the eigenvalue λ ∈ σ(A) considered in Theorem 1.7 is deficient, the normalization
(w, w ∗)2 = 1 is not possible. In this case, as discussed above, there is still another
mechanism for triggering nonlinear instability.
First, we analyze the “conditioning” of quadratic linear systems. There are two main
sources of errors in solving an equation Ax = b :
a) errors in the “theoretical” solution caused by errors in the data, i. e., the elements
of A and b ,
b) errors in the “numerical” solution caused by round-off errors in the course of the
solution process.
Ax = b (1.3.66)
with regular coefficient matrix A ∈ Kⁿˣⁿ . The matrix A and the vector b are faulty by
small errors δA and δb , so that actually the perturbed system Ãx̃ = b̃ , Ã := A + δA , b̃ := b + δb , is solved.
Theorem 1.8 (Perturbation theorem): Let the matrix A ∈ Kn×n be regular and the
perturbation satisfy δA < A−1 −1 . Then, the perturbed matrix à = A + δA is also
regular and for the resulting relative error in the solution there holds
  ‖δx‖ / ‖x‖ ≤ cond(A) / (1 − cond(A)‖δA‖/‖A‖) · { ‖δb‖/‖b‖ + ‖δA‖/‖A‖ } ,   (1.3.68)

with the so-called “condition number” cond(A) := ‖A‖ ‖A⁻¹‖ of the matrix A .
such that also A + δA = A[I + A−1 δA] is regular by Lemma 1.15. From
  (A + δA) δx = δb − δA x ,

it follows that

  ‖δx‖ ≤ ‖(A + δA)⁻¹‖ { ‖δb‖ + ‖δA‖ ‖x‖ }
       = ‖[ A(I + A⁻¹δA) ]⁻¹‖ { ‖δb‖ + ‖δA‖ ‖x‖ }
       = ‖(I + A⁻¹δA)⁻¹ A⁻¹‖ { ‖δb‖ + ‖δA‖ ‖x‖ }
       ≤ ‖(I + A⁻¹δA)⁻¹‖ ‖A⁻¹‖ { ‖δb‖ + ‖δA‖ ‖x‖ }
       ≤ ‖A⁻¹‖ / (1 − ‖A⁻¹‖ ‖δA‖) · { ‖δb‖ + ‖δA‖ ‖x‖ }
       ≤ ‖A⁻¹‖ ‖A‖ ‖x‖ / (1 − ‖A⁻¹‖ ‖δA‖) · { ‖δb‖/(‖A‖‖x‖) + ‖δA‖/‖A‖ } .
  cond₂(A) := ‖A‖₂ ‖A⁻¹‖₂ = |λ_max| / |λ_min|
with the eigenvalues λmax and λmin of A with largest and smallest modulus, respectively.
Accordingly, the quantity cond2 (A) is called the “spectral condition (number)” of A . In
the case cond(A)‖δA‖/‖A‖ ≪ 1 , the stability estimate (1.3.68) takes the form

  ‖δx‖ / ‖x‖ ≈ cond(A) { ‖δb‖/‖b‖ + ‖δA‖/‖A‖ } ,
i. e., cond(A) is the amplification factor by which relative errors in the data A and b
affect the relative error in the solution x .
Corollary 1.3: Let the condition of A be of size cond(A) ∼ 10^s . If the elements of
A and b are faulty with a relative error of size

  ‖δA‖/‖A‖ ≈ 10^{−k} ,   ‖δb‖/‖b‖ ≈ 10^{−k}   (k > s),

then the relative error in the solution can be expected to be of size

  ‖δx‖/‖x‖ ≈ 10^{s−k} .

In the case ‖·‖ = ‖·‖_∞ , one may lose s decimals in accuracy.
Example 1.4: Consider the following coefficient matrix A and its inverse A−1 :
  A = [ 1.2969  0.8648 ]        A⁻¹ = 10⁸ [  0.1441  −0.8648 ]
      [ 0.2161  0.1441 ]                  [ −0.2161   1.2969 ]
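The extreme sensitivity of systems with this coefficient matrix is easy to check numerically. The following NumPy sketch is illustrative; the chosen right-hand side and perturbation are ours and not part of the example in the text.

```python
import numpy as np

# the ill-conditioned 2x2 matrix from the example above
A = np.array([[1.2969, 0.8648],
              [0.2161, 0.1441]])

cond_inf = np.linalg.norm(A, np.inf) * np.linalg.norm(np.linalg.inv(A), np.inf)
print(f"cond_inf(A) ~ {cond_inf:.3e}")     # of order 10^8

# illustrative right-hand side (exact solution x = (0, 1)) and a tiny perturbation
b  = np.array([0.8648, 0.1441])
db = np.array([1e-8, -1e-8])

x  = np.linalg.solve(A, b)
xp = np.linalg.solve(A, b + db)

rel_b = np.linalg.norm(db, np.inf) / np.linalg.norm(b, np.inf)
rel_x = np.linalg.norm(xp - x, np.inf) / np.linalg.norm(x, np.inf)
print(f"relative error in b: {rel_b:.1e}, resulting relative error in x: {rel_x:.1e}")
```

A relative data error of order 10⁻⁸ is amplified to an error of order 1 in the solution, in agreement with the condition number of order 10⁸.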
Finally, we demonstrate that the stability estimate (1.3.68) is essentially sharp. Let
A be a positive definite n × n-matrix with smallest and largest eigenvalues λ1 and λn
and corresponding normalized eigenvectors w1 and wn , respectively. We choose
  δA ≡ 0 ,   b ≡ w_n ,   δb ≡ ε w_1   (ε ≠ 0).

The corresponding unperturbed and perturbed solutions are

  x = λ_n⁻¹ w_n ,   x̃ = λ_n⁻¹ w_n + ε λ_1⁻¹ w_1 .
A classical example (due to Wilkinson) illustrates the ill-conditioning of computing eigenvalues
via the characteristic polynomial in monomial form:

  χ_A(z) = ∏_{j=1}^{20} (z − j) = z²⁰ − 210 z¹⁹ + … + 20! ,

with coefficients b₁ = −210 (of z¹⁹), … , b₂₀ = 20! . The coefficient b₁ is perturbed:
b̃₁ = −210 + 2⁻²³ ≈ −210.000000119 … , which results in the

  relative error  |b̃₁ − b₁| / |b₁| ∼ 10⁻¹⁰ .
Then, the perturbed polynomial χ̃_A(z) has two roots λ± ∼ 16.7 ± 2.8i, far away from
the true roots.
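This effect is easily reproduced numerically. The short Python sketch below is illustrative; it builds the monomial coefficients with numpy.poly, perturbs the coefficient of z¹⁹ by 2⁻²³, and inspects the resulting roots.

```python
import numpy as np

# coefficients of chi_A(z) = prod_{j=1}^{20} (z - j) in monomial form (leading coefficient first)
coeffs = np.poly(np.arange(1, 21))

perturbed = coeffs.copy()
perturbed[1] += 2.0 ** (-23)          # b1 = -210  ->  -210 + 2^-23

roots = np.roots(perturbed)
worst = max(roots, key=lambda r: abs(r.imag))
print("root farthest from the real axis:", worst)
print("largest imaginary part:", abs(worst.imag))
```

Although the coefficient perturbation is of relative size 10⁻¹⁰, several roots acquire imaginary parts of order 1, confirming the ill-conditioning of the monomial representation.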
The above example shows that via the characteristic polynomial eigenvalues may be
computed reliably only for very special matrices, for which χA (z) can be computed with-
out determining its monomial form. Examples of some practical importance are, e. g.,
“tridiagonal matrices” or more general “Hessenberg12 matrices”.
  tridiagonal matrix:                 Hessenberg matrix:
  [ a₁  b₁            ]              [ a₁₁   a₁₂   ⋯   a₁ₙ  ]
  [ c₂  a₂   ⋱        ]              [ a₂₁   a₂₂   ⋯   a₂ₙ  ]
  [     ⋱    ⋱   bₙ₋₁ ]              [       ⋱     ⋱   ⋮    ]
  [          cₙ   aₙ  ]              [ 0     aₙ,ₙ₋₁    aₙₙ  ]
Next, we provide a useful estimate which will be the basis for estimating the condi-
tioning of the eigenvalue problem.
Lemma 1.18: Let A, B ∈ Kⁿˣⁿ be arbitrary matrices and ‖·‖ a natural matrix norm.
Then, for any eigenvalue λ of A , which is not an eigenvalue of B , there holds

  ‖(λI − B)⁻¹(A − B)‖ ≥ 1 .

For an eigenvector w of A corresponding to λ there holds

  (A − B) w = (λI − B) w ,   (λI − B)⁻¹(A − B) w = w .

Consequently,

  1 ≤ sup_{x∈Kⁿ\{0}} ‖(λI − B)⁻¹(A − B)x‖ / ‖x‖ = ‖(λI − B)⁻¹(A − B)‖ ,
12
Karl Hessenberg (1904–1959): German mathematician; dissertation “Die Berechnung der Eigenwerte
und Eigenlösungen linearer Gleichungssysteme”, TU Darmstadt 1942.
13
Semyon Aranovich Gershgorin (1901–1933): Russian mathematician; since 1930 Prof. in Leningrad
(St. Petersburg); worked in algebra, complex function theory, differential equations and numerics.
If the sets U ≡ ∪_{i=1}^m K_{j_i} and V ≡ ∪_{j=1}^n K_j \ U are disjoint, then U contains exactly m
eigenvalues of A and V exactly n − m eigenvalues (counted with their algebraic multiplicities).
Proof. i) We set B ≡ D = diag(a_jj) in Lemma 1.18 and take the “maximal row sum”
as natural matrix norm. Then, it follows that for λ ≠ a_jj :

  ‖(λI − D)⁻¹(A − D)‖_∞ = max_{j=1,…,n} ( 1/|λ − a_jj| ) Σ_{k=1, k≠j}^n |a_jk| ≥ 1 ,
Next, from the estimate of Lemma 1.18, we derive the following basic stability result
for the eigenvalue problem.
i. e., A is “similar” to the diagonal matrix Λ = diag(λi (A)). Since λ = λ(B) is not an
eigenvalue of A ,
i. e., the eigenvalue problem of “Hermitian” (or more general “normal”) matrices is well
conditioned. For general “non-normal” matrices the conditioning of the eigenvalue prob-
lem may be arbitrarily bad, cond₂(W) ≫ 1 .
1.4 Exercises
b) ( Σ_{i=1}^n x_i λ_i )⁻¹ ≤ Σ_{i=1}^n x_i⁻¹ λ_i ,  x_i ∈ ℝ₊ ,  0 ≤ λ_i ≤ 1 ,  Σ_{i=1}^n λ_i = 1 .

c) max_{0≤x≤1} { x² (1 − x)^{2n} } ≤ (1 + n)⁻² .
Exercise 1.2 (Some useful facts about norms and scalar products):
Verify the following claims for vectors x, y ∈ Rn and the Euclidean norm · 2 and scalar
product (·, ·)2 :
a) 2x22 + 2y22 = x + y22 + x − y22 (Parallelogram identity).
b) |(x, y)2| ≤ x2 y2 (Schwarz inequality).
c) For any symmetric, positive definite matrix A ∈ Rn×n the bilinear form (x, y)A :=
(Ax, y)2 is a scalar product. i) Can any scalar product on Rn×n be written in this form?
ii) How has this to be formulated for complex matrices A ∈ Cn×n ?
Exercise 1.4 (Some useful facts about vector spaces and matrices):
a) Formulate the Gram-Schmidt algorithm for orthonormalizing a set of linearly indepen-
dent vectors {x1 , . . . , xm } ⊂ Rn :
b) How can one define the square root A1/2 of a symmetric, positive definite matrix
A ∈ Rn×n ?
c) Show that a positive definite matrix A ∈ Cn×n is automatically Hermitian, i. e.,
A = ĀT . This is not necessarily true for real matrices A ∈ Rn×n , i. e., for real matrices
the definition of positiveness usually goes together with the requirement of symmetry.
Exercise 1.6: Recall the proofs of the following facts about matrices:
a) The diagonal elements of a (Hermitian) positive definite matrix A ∈ Kn×n are real
and positive.
b) For the trace tr(A) := Σ_{i=1}^n a_ii of a Hermitian matrix A ∈ Kⁿˣⁿ with eigenvalues
λ_i ∈ σ(A) there holds

  tr(A) = Σ_{i=1}^n λ_i .
Exercise 1.7: Let B ∈ Kⁿˣⁿ be a matrix, which for some matrix norm ‖·‖ satisfies
‖B‖ < 1 . Prove that the matrix I − B is regular with inverse satisfying

  ‖(I − B)⁻¹‖ ≤ 1 / (1 − ‖B‖) .
Exercise 1.8: Prove that each connected component of k Gerschgorin circles (that are
disjoint from all other n−k circles) of a matrix A ∈ ℂⁿˣⁿ contains exactly k eigenvalues of
A (counted according to their algebraic multiplicities). This implies that such a matrix,
for which all Gerschgorin circles are mutually disjoint, has exactly n simple eigenvalues
and is therefore diagonalizable.
Exercise 1.9: Let A, B ∈ Kn×n be two Hermitian matrices. Then, the following state-
ments are equivalent:
i) A and B commute, i. e., AB = BA .
ii) A and B possess a common basis of eigenvectors.
iii) AB is Hermitian.
Does the above equivalence in an appropriate sense also hold for two general “normal”
matrices A, B ∈ Kn×n , i. e., if ĀT A = AĀT and B̄ T B = B B̄ T ?
ϕ(αx + βy, z) = ᾱϕ(x, z) + β̄ϕ(y, z), ϕ(z, αx + βy) = αϕ(z, x) + βϕ(z, y), α, β ∈ K.
i) Show that for any regular matrix A ∈ Kn×n the sesquilinear form ϕ(x, y) := (Ax, Ay)2
is a scalar product on Kn .
ii) In an earlier exercise, we have seen that each scalar product (x, y) on Kn can be written
in the form (x, y) = (x, Ay)2 with a (Hermitian) positive definite matrix A ∈ Kn×n . Why
does this statement not contradict (i)?
where λmin (A) and λmax (A) denote the minimal and maximal (real) eigenvalues of A ,
respectively. (Hint: Use that a Hermitian matrix possesses an ONB of eigenvectors.)
Exercise 1.12: Let A ∈ Kⁿˣⁿ and 0 ∉ σ(A) . Show that the ε-pseudo-spectra of A
and that of its inverse A⁻¹ are related by

  σ_ε(A) ⊂ { z ∈ ℂ\{0} : z⁻¹ ∈ σ_δ(z)(A⁻¹) } ∪ {0},
In this chapter, we collect some basic results on so-called “direct” methods for solving
linear systems and matrix eigenvalue problems. A “direct” method delivers the exact
solution theoretically in finitely many arithmetic steps, at least under the assumption of
“exact” arithmetic. However, to get useful results a “direct” method has to be carried
to its very end. In contrast to this, so-called “iterative” methods produce sequences of
approximate solutions of increasing accuracy, which theoretically converge to the exact
solution in infinitely many arithmetic steps. However, “iterative” methods may yield
useful results already after a small number of iterations. Usually “direct” methods are
very robust but, due to their usually high storage and work requirements, feasible only
for problems of moderate size. Here, the meaning of “moderate size” depends very much
on the currently available computer power, i. e., today reaches up to dimension n ≈
10⁵–10⁶ . Iterative methods need less storage and as multi-level algorithms may even
show optimal arithmetic complexity, i. e., a fixed improvement in accuracy is achieved in
O(n) arithmetic operations. These methods can be used for really large-scale problems
of dimension reaching up to n ≈ 10⁶–10⁹ but at the price of less robustness and higher
algorithmic complexity. Such modern “iterative” methods are the main subject of this
book and will be discussed in the next chapters.
In the following, we discuss “direct methods” for solving (real) quadratic linear systems
Ax = b . (2.1.1)
It is particularly easy to solve staggered systems, e. g., those with an upper triangular
matrix A = (ajk ) as coefficient matrix
This requires N_backsubst = n²/2 + O(n) arithmetic operations. The same holds true if
the coefficient matrix is lower triangular and the system is solved by the corresponding
“forward substitution”.
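The backward substitution x_n = c_n/r_nn, x_i = (c_i − Σ_{k>i} r_ik x_k)/r_ii can be written in a few lines; the following Python sketch is illustrative (the test data are ours).

```python
import numpy as np

def backward_substitution(R, c):
    """Solve R x = c for an upper triangular, regular matrix R
    (about n^2/2 multiplications/additions)."""
    n = len(c)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (c[i] - R[i, i+1:] @ x[i+1:]) / R[i, i]
    return x

R = np.array([[2.0, 1.0, 1.0],
              [0.0, 3.0, 2.0],
              [0.0, 0.0, 4.0]])
c = np.array([9.0, 13.0, 8.0])
print(backward_substitution(R, c))   # -> [2., 3., 2.]
```

Forward substitution for a lower triangular matrix works analogously, running through the rows from top to bottom.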
Definition 2.1: For quantifying the arithmetic work required by an algorithm, i. e., its
“(arithmetic) complexity”, we use the notion “arithmetic operation” (in short “a. op.”),
which means the equivalent of “1 multiplication + 1 addition” or “1 division” (assuming
that the latter operations take about the same time on a modern computer).
The classical direct method for solving linear systems is the elimination method of
Gauß1 which transforms the system Ax = b in several “elimination steps” (assuming
“exact” arithmetic) into an equivalent upper triangular system Rx = c, which is then
solved by backward substitution. In practice, due to round-off errors, the resulting upper
triangular system is not exactly equivalent to the original problem and this unavoidable
error needs to be controlled by additional algorithmical steps (“final iteration”, or
“Nachiteration” in German). In the elimination process two elementary transformations are
applied to the matrix A , which do not alter the solution of system (2.1.1): “permutation
of two rows of the matrix” and “addition of a scalar multiple of a row to another row of
the matrix”. Also the “permutation of columns” of A is admissible if the unknowns xi
are accordingly renumbered.
In the practical realization of Gaussian elimination the elementary transformations
are applied to the composed matrix [A, b] . In the following, we assume the matrix A
to be regular. First, we set A^(0) ≡ A, b^(0) ≡ b and determine an element a_r1^(0) ≠ 0, r ∈ {1, . . . , n} .
(Such an element exists since otherwise A would be singular.) Permute the 1-st and the
r-th row. Let the result be the matrix [Ã^(0), b̃^(0)] . Then, for j = 2, . . . , n, we multiply
the 1-st row by q_j1 and subtract the result from the j-th row,

  q_j1 ≡ ã_j1^(0) / ã_11^(0) (= ã_j1^(0) / a_r1^(0)) ,   a_ji^(1) := ã_ji^(0) − q_j1 ã_1i^(0) ,   b_j^(1) := b̃_j^(0) − q_j1 b̃_1^(0) .
The result is

  [A^(1), b^(1)] = [ ã_11^(0)  ã_12^(0)  …  ã_1n^(0) │ b̃_1^(0) ]
                   [ 0         a_22^(1)  …  a_2n^(1) │ b_2^(1)  ]
                   [ ⋮           ⋮              ⋮    │   ⋮      ]
                   [ 0         a_n2^(1)  …  a_nn^(1) │ b_n^(1)  ] .
The transition [A(0) , b(0) ] → [Ã(0) , b̃(0) ] → [A(1) , b(1) ] can be expressed in terms of matrix
multiplication as follows:
1
Carl Friedrich Gauß (1777–1855): Eminent German mathematician, astronomer and physicist;
worked in Göttingen; fundamental contributions to arithmetic, algebra and geometry; founder of modern
number theory, determined the planetary orbits by his “equalization calculus”, further contributions to
earth magnetism and construction of an electro-magnetic telegraph.
  P1 = identity matrix with rows 1 and r interchanged (a transposition matrix),

  G1 = [  1             ]
       [ −q21   1       ]
       [   ⋮        ⋱   ]
       [ −qn1         1 ]   (Frobenius matrix).

Both matrices, P1 and G1 , are regular with determinants det(P1) = ±1 , det(G1) = 1 ,
and there holds

  P1⁻¹ = P1 ,    G1⁻¹ = [ 1            ]
                        [ q21   1      ]
                        [  ⋮       ⋱   ]
                        [ qn1        1 ] .
The systems Ax = b and A(1) x = b(1) have obviously the same solution,
Ax = b ⇐⇒ A(1) x = G1 P1 Ax = G1 P1 b = b(1) .
Definition 2.2: The element a_r1^(0) = ã_11^(0) is called “pivot element” and the whole substep
of its determination “pivot search”. For reasons of numerical stability one usually makes
the choice

  |a_r1^(0)| = max_{j=1,…,n} |a_j1^(0)| .
The whole process incl. permutation of rows is called “column pivoting” . If the elements
of the matrix A are of very different size “total pivoting” is advisable. This consists in
the choice

  |a_rs^(0)| = max_{j,k=1,…,n} |a_jk^(0)| ,
and subsequent permutation of the 1-st row with the r-th row and the 1-st column with
the s-th column. According to the column permutation also the unknowns xk have to
be renumbered. However, “total pivoting” is costly so that simple “column pivoting” is
usually preferred.
In the i-th elimination step the corresponding matrices have the analogous form

  Pi = identity matrix with rows i and r interchanged,

  Gi = [ 1                          ]
       [    ⋱                       ]
       [       1                    ]
       [      −q_{i+1,i}   1        ]
       [        ⋮              ⋱    ]
       [      −q_{n,i}            1 ] .
The matrix A(1) generated in the first step is again regular. The same is true for
the reduced submatrix to which the next elimination step is applied. By repeating this
elimination process, one obtains in n − 1 steps a sequence of matrices,

  [A, b] = [A^(0), b^(0)] → [A^(1), b^(1)] → · · · → [A^(n−1), b^(n−1)] =: [R, c] ,

where

  [A^(i), b^(i)] = G_i P_i [A^(i−1), b^(i−1)] ,   [A^(0), b^(0)] := [A, b] ,

with (unitary) permutation matrices P_i and (regular) Frobenius matrices G_i of the above
form. The end result

  [R, c] = G_{n−1} P_{n−1} · · · G_1 P_1 [A, b]

is an upper triangular system Rx = c , which has the same solution as the original system
Ax = b . By the i-th elimination step [A(i−1) , b(i−1) ] → [A(i) , b(i) ] the subdiagonal elements
in the i-th column are made zero. The resulting free places are used for storing the
elements qi+1,i , . . . , qn,i of the matrices G−1
i (i = 1, . . . , n − 1) . Since in this elimination
step the preceding rows 1 to i are not changed, one works with matrices of the form
  [ r11        r12        ⋯  r1i        r1,i+1           ⋯  r1n          │ c1        ]
  [ λ21        r22        ⋯  r2i        r2,i+1           ⋯  r2n          │ c2        ]
  [ λ31        λ32        ⋱  r3i        r3,i+1           ⋯  r3n          │ c3        ]
  [  ⋮          ⋮             ⋮           ⋮                   ⋮          │  ⋮        ]
  [ λi1        λi2        ⋯  rii        ri,i+1           ⋯  rin          │ ci        ]
  [ λ_{i+1,1}  λ_{i+1,2}  ⋯  λ_{i+1,i}  a_{i+1,i+1}^(i)  ⋯  a_{i+1,n}^(i)│ b_{i+1}^(i)]
  [  ⋮          ⋮             ⋮           ⋮                   ⋮          │  ⋮        ]
  [ λ_{n,1}    λ_{n,2}    ⋯  λ_{n,i}    a_{n,i+1}^(i)    ⋯  a_{n,n}^(i)  │ b_n^(i)   ] .
Here, the subdiagonal elements λk+1,k , . . . , λnk in the k-th column are permutations of
the elements qk+1,k , . . . , qnk of G−1
k since the permutations of rows (and only those) are
applied to the whole composed matrix. As end result, we obtain the matrix
  [ r11   ⋯               r1n │ c1 ]
  [ l21   r22         ⋯  r2n  │ c2 ]
  [  ⋮      ⋱    ⋱        ⋮   │  ⋮ ]
  [ ln1   ⋯   l_{n,n−1}  rnn  │ cn ] .
P A = LR , P := Pn−1 · · · P1 . (2.1.5)
Ly = P b, Rx = y, (2.1.6)
Proof. i) We give the proof only for the case that pivoting is not necessary, i. e., Pi = I .
Then, R = G_{n−1} · · · G_1 A and G_1⁻¹ · · · G_{n−1}⁻¹ R = A . In view of L = G_1⁻¹ · · · G_{n−1}⁻¹ the first
assertion follows.
ii) To prove uniqueness let A = L_1R_1 = L_2R_2 be two LR decompositions. Then, L_2⁻¹L_1 =
R_2R_1⁻¹ = I since L_2⁻¹L_1 is lower triangular with ones on the main diagonal and R_2R_1⁻¹
is upper triangular. Consequently, L_1 = L_2 and R_1 = R_2 , which was to be shown. Q.E.D.
arithmetic operations. This is just the work count of computing the corresponding decom-
position P A = LR , while the solution of the two triangular systems (2.1.6) only requires
n2 + O(n) arithmetic operations.
requires n−k divisions and (n−k) + (n−k)2 combined multiplications and additions
resulting altogether in
  Σ_{k=1}^{n−1} k² + O(n²) = (1/3) n³ + O(n²)  a. op.
for the n−1 steps of forward elimination. By this all elements of the matrices L and R
are computed. The work count of the forward and backward elimination in (2.1.6) follows
by similar considerations. Q.E.D.
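The elimination with column pivoting and the storage of the multipliers q_ji in the freed subdiagonal positions translate into a compact routine. The following Python/NumPy sketch is illustrative (function name and test matrix are ours); it returns P, L, R with P A = L R.

```python
import numpy as np

def lr_decomposition(A):
    """Gaussian elimination with column pivoting: P A = L R,
    L unit lower triangular (the multipliers q_ji), R upper triangular."""
    A = A.astype(float).copy()
    n = A.shape[0]
    piv = np.arange(n)
    for i in range(n - 1):
        # column pivoting: row r with largest |a_ri| in column i
        r = i + int(np.argmax(np.abs(A[i:, i])))
        if r != i:
            A[[i, r]] = A[[r, i]]
            piv[[i, r]] = piv[[r, i]]
        # eliminate below the pivot; store the multipliers in the freed places
        A[i+1:, i] /= A[i, i]
        A[i+1:, i+1:] -= np.outer(A[i+1:, i], A[i, i+1:])
    L = np.tril(A, -1) + np.eye(n)
    R = np.triu(A)
    P = np.eye(n)[piv]
    return P, L, R

A = np.array([[3., 1., 6.], [2., 1., 3.], [1., 1., 1.]])
P, L, R = lr_decomposition(A)
print(np.allclose(P @ A, L @ R))   # True
```

The cost is dominated by the rank-one update in the inner step, giving the n³/3 + O(n²) operation count stated above.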
Example 2.1: Gaussian elimination with column pivoting applied to the system

  [ 3 1 6 ] [ x1 ]   [ 2 ]
  [ 2 1 3 ] [ x2 ] = [ 7 ]
  [ 1 1 1 ] [ x3 ]   [ 4 ]

is carried out on the composed tableau [A, b] :

  pivoting:              elimination:              pivoting:                elimination:
  3  1  6 │ 2            3    1    6  │ 2          3    1    6  │ 2          3    1    6    │ 2
  2  1  3 │ 7     →      2/3  1/3  −1 │ 17/3   →   1/3  2/3  −1 │ 10/3   →   1/3  2/3  −1   │ 10/3
  1  1  1 │ 4            1/3  2/3  −1 │ 10/3       2/3  1/3  −1 │ 17/3       2/3  1/2  −1/2 │ 4

Backward substitution yields

  x3 = −8 ,   x2 = (3/2)(10/3 + x3) = −7 ,   x1 = (1/3)(2 − x2 − 6x3) = 19 .
LR decomposition:

  P1 = I ,   P2 = [ 1 0 0 ]
                  [ 0 0 1 ]
                  [ 0 1 0 ] ,

  P A = [ 3 1 6 ]  =  L R  =  [ 1    0    0 ] [ 3  1    6    ]
        [ 1 1 1 ]             [ 1/3  1    0 ] [ 0  2/3  −1   ]
        [ 2 1 3 ]             [ 2/3  1/2  1 ] [ 0  0    −1/2 ] .
Example 2.2: For demonstrating the importance of the pivoting process, we consider
the following linear 2×2-system:
  [ 10⁻⁴  1 ] [ x1 ]   [ 1 ]
  [ 1     1 ] [ x2 ] = [ 2 ]   (2.1.8)
Example 2.3: The positive effect of column pivoting is achieved only if all row sums of
the matrix A are of similar size. As an example, we consider the 2×2-system
  [ 2  20000 ] [ x1 ]   [ 20000 ]
  [ 1  1     ] [ x2 ] = [ 2     ] ,

which results from (2.1.8) by scaling the first row by the factor 20,000 . Since in the first
column the element with largest modulus is on the main diagonal the Gauß algorithm
with and without pivoting yields the same unacceptable result (x1 , x2)ᵀ = (0, 1)ᵀ . To
avoid this effect, we apply an “equilibration” step before the elimination, i. e., we multiply
A by a diagonal matrix D,

  Ax = b  →  DAx = Db ,   d_i = ( Σ_{j=1}^n |a_ij| )⁻¹ ,   (2.1.9)
such that all row sums of A are scaled to 1 . An even better stabilization in the case
of matrix elements of very different size is “total pivoting”. Here, an equilibration step,
row-wise and column-wise, is applied before the elimination.
We briefly discuss the conditioning of the solution of a linear system by Gaussian elim-
ination. For any (regular) matrix A there exists an LR decomposition like P A = LR.
Then, there holds
R = L−1 P A, R−1 = (P A)−1 L.
Due to column pivoting the elements of the triangular matrices L and L−1 are all less
or equal one and there holds
Consequently,
Then, the general perturbation theorem, Theorem 1.8, yields the following estimate for
the solution of the equation LRx = P b (considering only perturbations of the right-hand
side b ):
Theorem 2.2 (Round-off error influence): The matrix A ∈ Rn×n be regular, and the
linear system Ax = b be solved by Gaussian elimination with column pivoting. Then, the
actually computed perturbed solution x+δx under the influence of round-off error is exact
solution of a perturbed system (A + δA)(x + δx) = b , where (eps = “machine accuracy”)
  ‖δA‖_∞ / ‖A‖_∞ ≤ 1.01 · 2^{n−1} (n³ + 2n²) eps ,   (2.1.10)

and, consequently,

  ‖δx‖_∞ / ‖x‖_∞ ≤ cond(A) / (1 − cond(A)‖δA‖_∞/‖A‖_∞) · { 1.01 · 2^{n−1} (n³ + 2n²) eps } .   (2.1.11)
This estimate is, as practical experience shows, by far too pessimistic since it is oriented
at the worst case scenario and does not take into account round-off error cancellations.
Incorporating the latter effect would require a statistical analysis. Furthermore, the above
estimate applies to arbitrary full matrices. For “sparse” matrices with many zero entries
much more favorable estimates are to be expected. Altogether, we see that Gaussian
elimination is, in principle, a well-conditioned algorithm, i. e., the influence of round-off
errors is bounded in terms of the problem dimension n and the condition cond(A) , which
describes the conditioning of the numerical problem to be solved.
2
James Hardy Wilkinson (1919–1986): English mathematician; worked at National Physical Labora-
tory in London (since 1946); fundamental contributions to numerical linear algebra, especially to round-off
error analysis; co-founder of the famous NAG software library (1970).
  a_jk = Σ_{i=1}^{min(j,k)} l_ji r_ik .   (2.1.12)
Here, the ordering of the computation of ljk , rjk is not prescribed a priori. In the so-called
“algorithm of Crout3 ” the matrix A = LR is tessellated as follows:
  [ 1              ]
  [ 2   3          ]
  [ 2   4   5      ]
  [ ⋮   ⋮   ⋮  ⋱   ]
  [ 2   4   6   ⋯  ]

(the numbers indicate the order in which the rows of R (odd) and the columns of L (even) are computed).
  k = 1, · · · , n :   a_1k = Σ_{i=1}^{1} l_1i r_ik   ⇒   r_1k := a_1k ,
  j = 2, · · · , n :   a_j1 = Σ_{i=1}^{1} l_ji r_i1   ⇒   l_j1 := r_11⁻¹ a_j1 ,
  k = 2, · · · , n :   a_2k = Σ_{i=1}^{2} l_2i r_ik   ⇒   r_2k := a_2k − l_21 r_1k ,
  ⋮

and, for j = 2, · · · , n :

  r_jk := a_jk − Σ_{i=1}^{j−1} l_ji r_ik ,                 k = j, j + 1, · · · , n ,
                                                                                        (2.1.13)
  l_kj := r_jj⁻¹ ( a_kj − Σ_{i=1}^{j−1} l_ki r_ij ) ,      k = j + 1, j + 2, · · · , n .
The Gaussian elimination and the direct computation of the LR decomposition differ only
in the ordering of the arithmetic operations and are algebraically equivalent.
3
Prescott D. Crout (1907–1984): US-American mathematician and engineer; Prof. at Massachusetts
Institute of Technology (MIT); contributions to numerical linear algebra (“A short method for evaluating
determinants and solving systems of linear equations with real or complex coefficients”, Trans. Amer.
Inst. Elec. Eng. 60, 1235–1241, 1941) and to numerical electro dynamics.
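The recursion (2.1.13) can be implemented directly; the following Python sketch is illustrative (no pivoting, and it assumes all pivots r_jj are nonzero).

```python
import numpy as np

def crout_lr(A):
    """Direct computation of A = L R (L unit lower triangular, R upper triangular)
    following the recursion (2.1.13): alternately the j-th row of R and the
    j-th column of L are computed."""
    n = A.shape[0]
    L = np.eye(n)
    R = np.zeros((n, n))
    for j in range(n):
        for k in range(j, n):                     # j-th row of R
            R[j, k] = A[j, k] - L[j, :j] @ R[:j, k]
        for k in range(j + 1, n):                 # j-th column of L
            L[k, j] = (A[k, j] - L[k, :j] @ R[:j, j]) / R[j, j]
    return L, R

A = np.array([[4., 3., 2.], [2., 4., 1.], [2., 1., 5.]])
L, R = crout_lr(A)
print(np.allclose(L @ R, A))   # True
```

As stated above, this is algebraically equivalent to Gaussian elimination; only the ordering of the arithmetic operations differs.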
Ly = P b , Rx = y . (2.1.14)
This variant of the Gaussian algorithm is preferable if the same linear system is succes-
sively to be solved for several right-hand sides b . Because of the unavoidable round-off
error one usually obtains an only approximate LR decomposition
L̃R̃ = P A
and using this in (2.1.14) an only approximate solution x(0) with (exact) “residual”
(negative “defect”)
  d^(0) := b − Ax^(0) ≠ 0 .

Using the already computed approximate triangular decomposition L̃R̃ ∼ P A, one solves
(again approximately) the so-called “correction equation”

  A k = d^(0) ,

and sets x^(1) := x^(0) + k . If the correction equation were solved exactly, then x^(1) = x
would be the exact solution of the system Ax = b . In general, x^(1)
is a better approximation to x than x(0) even if the defect equation is solved only
approximately. This, however, requires the computation of the residual (defect) d with
higher accuracy by using extended floating point arithmetic. This is supported by the
following error analysis.
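As an illustration of the correction iteration (“Nachiteration”), the sketch below deliberately computes the triangular factors in single precision and the residual in double precision; it is a minimal Python example (using SciPy's lu_factor/lu_solve), not the text's algorithm verbatim.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def refine(A, b, n_steps=3):
    """Solve Ax = b with an (inexact) LU factorization plus a few correction steps:
    d = b - A x is recomputed, the correction k solves the factored system, x += k."""
    lu, piv = lu_factor(A.astype(np.float32))          # deliberately low-precision factors
    x = lu_solve((lu, piv), b.astype(np.float32)).astype(np.float64)
    for _ in range(n_steps):
        d = b - A @ x                                  # residual in double precision
        k = lu_solve((lu, piv), d.astype(np.float32)).astype(np.float64)
        x = x + k
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50)) + 50 * np.eye(50)    # well-conditioned test matrix
b = rng.standard_normal(50)
x = refine(A, b)
print(np.linalg.norm(b - A @ x) / np.linalg.norm(b))   # close to double-precision accuracy
```

Already two or three correction steps recover essentially full working accuracy, provided the residual is evaluated accurately enough.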
For simplicity, let us assume that P = I . We suppose the relative error in the LR
decomposition of the matrix A to be bounded by a small number ε . Due to the general
perturbation result of Theorem 1.8 there holds the estimate
  ‖A − Ã‖ / ‖A‖ ≤ ε̃ ≪ 1 ,

and, consequently,

Since

  L̃R̃ = A − A + L̃R̃ = A [ I − A⁻¹(A − L̃R̃) ] ,

we can use Lemma 1.15 to conclude
has the exact solution x = (−100, 103.921 . . .)T . Gaussian elimination, with 3-decimal
arithmetic and correct rounding, yields the approximate triangular matrices
  L̃ = [ 1      0 ] ,     R̃ = [ 1.05  1.02 ] ,
      [ 0.990  1 ]            [ 0     0.01 ]

  L̃R̃ − A = [ 0       0      ]   (correct within machine accuracy).
            [ 5·10⁻⁴  2·10⁻⁴ ]

The resulting “solution” x^(0) = (−97, 1.101)ᵀ has the residual

  d^(0) = b − Ax^(0) = { (0, 0)ᵀ             in 3-decimal computation,
                       { (0.065, 0.035)ᵀ     in 6-decimal computation.
The correction equation L̃R̃ k = d^(0) has the solution k^(1) = (−2.9, 102.899)ᵀ (obtained by
3-decimal computation). Hence, one correction step yields the approximate solution

  x^(1) = x^(0) + k^(1) = (−99.9, 104.0)ᵀ ,

which is much closer to the exact solution x .
  forward elimination:
  [ A │ I ]   →   [ r11  ⋯  r1n │ 1         ]
                  [      ⋱   ⋮  │ ∗  ⋱      ]
                  [          rnn│ ∗  ⋯   1  ]
  A = [ 3 1 6 ] :   forward elimination:   [ 3 1 6 │ 1 0 0 ]
      [ 2 1 3 ]                            [ 2 1 3 │ 0 1 0 ]   →   …
      [ 1 1 1 ]                            [ 1 1 1 │ 0 0 1 ]

  backward elimination and scaling:

  [ 3  0    0    │ −6    15  −9   ]        [ 1 0 0 │ −2  5  −3 ]
  [ 0  2/3  0    │  2/3  −2   2   ]   →    [ 0 1 0 │  1 −3   3 ]
  [ 0  0   −1/2  │ −1/2   1  −1/2 ]        [ 0 0 1 │  1 −2   1 ]

  ⇒   A⁻¹ = [ −2   5  −3 ]
            [  1  −3   3 ]
            [  1  −2   1 ] .
An alternative method for computing the inverse of a matrix is the so-called “exchange
algorithm” (sometimes called “Gauß-Jordan algorithm”). Let be given a not necessarily
quadratic linear system
yields for j = 1, . . . , m , j ≠ p :

  ( a_j1 − a_jq a_p1/a_pq ) x_1 + . . . + ( a_j,q−1 − a_jq a_p,q−1/a_pq ) x_{q−1} + (a_jq/a_pq) y_p
    + ( a_j,q+1 − a_jq a_p,q+1/a_pq ) x_{q+1} + . . . + ( a_jn − a_jq a_pn/a_pq ) x_n = y_j .
The result is a new system, which is equivalent to the original one,

  Ã ( x_1, …, x_{q−1}, y_p, x_{q+1}, …, x_n )ᵀ = ( y_1, …, y_{p−1}, x_q, y_{p+1}, …, y_m )ᵀ .   (2.1.18)
If we succeed with replacing all components of x by those of y , the result is the solution
of the system in the form x = A⁻¹ y . In the case m = n , we obtain the inverse A⁻¹ , but in general
with permutated rows and columns. In determining the pivot element it is advisable, for
stability reasons, to choose an element apq of maximal modulus.
Proof. Suppose the algorithm stops after r exchange steps. Let at this point x1 , . . . , xr
be exchanged against y1 , . . . , yr so that the resulting system has the form
  [  ∗        ∗  ]   ( y_1, …, y_r, x_{r+1}, …, x_n )ᵀ  =  ( x_1, …, x_r, y_{r+1}, …, y_m )ᵀ ,
  [  ∗        0  ]

where the upper-left block is of size r × r and the lower-right zero block of size (m−r) × (n−r) .
Hence, dim(kern(A)) ≥ n−r . On the other hand, because y1 , · · · , yr can be freely chosen,
we have dim(range(A)) ≥ r . Further, observing dim(range(A)) + dim(kern(A)) = n it
follows that rank(A) = dim(range(A)) = r . This completes the proof. Q.E.D.
For a quadratic linear system with regular coefficient matrix A the Gauß-Jordan
algorithm for computing the inverse A−1 is always applicable.
Example 2.6: We compute the inverse of the matrix of the system

  [  1    2   1 ] [ x1 ]   [ y1 ]
  [ −3   −5  −1 ] [ x2 ] = [ y2 ]
  [ −7  −12  −2 ] [ x3 ]   [ y3 ]

by the exchange algorithm (exchange steps x2 ↔ y3, x3 ↔ y1, x1 ↔ y2):

        x1    x2    x3           x1     y3     x3           x1     y3     y1           y2    y3    y1
  y1     1     2     1     y1  −1/6   −1/6    2/3     x3   1/4    1/4    3/2     x3    −2     1     1
  y2    −3    −5    −1  →  y2 −1/12   5/12   −1/6  →  y2  −1/8    3/8   −1/4  →  x1    −8     3    −2
  y3    −7   −12    −2     x2 −7/12  −1/12   −1/6     x2  −5/8   −1/8   −1/4     x2     5    −2     1

Reordering rows and columns yields the

  inverse:   A⁻¹ = [ −2  −8   3 ]
                   [  1   5  −2 ]
                   [  1  −2   1 ] .
multiplications and additions and subsequently n² divisions. Hence, the total work count
for computing the inverse is n³ + O(n²) a. op.
ii) In the Gauß-Jordan algorithm the k-th exchange step requires 2n + 1 divisions in the
pivot row and column and (n − 1)² multiplications and additions for the update of the
remaining submatrix, hence altogether n² + O(n) a. op. The computation of the inverse
requires n exchange steps so that the total work count again becomes n³ + O(n²) a. op.
Q.E.D.
Q.E.D.
The application of Gaussian elimination for the solution of large linear systems of size
n > 104 poses technical difficulties if the primary main memory of the computer is not
large enough for storing the matrices occurring during the process (fill-in problem). In
this case secondary (external) memory has to be used, which increases run-time because
of slower data transfer. However, many large matrices occurring in practice have special
structures, which allow for memory saving in the course of Gaussian elimination.
Definition 2.3: A matrix A ∈ ℝⁿˣⁿ is called “band matrix” of “band type” (m_l, m_r)
with 0 ≤ m_l , m_r ≤ n − 1 , if

  a_jk = 0   for k < j − m_l  or  k > j + m_r   (j, k = 1, . . . , n),

i. e., the elements of A outside of the main diagonal and of m_l + m_r secondary diagonals
are zero. The quantity m = m_l + m_r + 1 is called the “band width” of A.
Theorem 2.4 (Band matrices): Let A ∈ Rn×n be a band matrix of band type (ml , mr ),
for which Gaussian elimination can be applied without pivoting, i. e., without permutation
of rows. Then, all reduced matrices are also band matrices of the same band type and
the matrix factors L and R in the triangular decomposition of A are band matrices of
type (ml , 0) and (0, mr ), respectively. The work count for the computation of the LR
decomposition A = LR is
are simply obtained by short recursion formulas (sometimes called “Thomas⁴ algo-
rithm”),
α1 = a1 , β1 = b1 ,
i = 2, . . . , n − 1 : γi = ci /αi−1 , αi = ai − γi βi−1 , βi = bi ,
γn = cn /αn−1 , αn = an − γn βn−1 .
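Combined with the corresponding forward and backward substitutions, this recursion gives an O(n) solver for tridiagonal systems. The Python sketch below is illustrative; a denotes the diagonal, b the superdiagonal and c the subdiagonal entries as above.

```python
import numpy as np

def thomas_solve(a, b, c, f):
    """Solve the tridiagonal system with diagonal a[0..n-1], superdiagonal b[0..n-2],
    subdiagonal c[1..n-1] (c[0] unused) and right-hand side f, via the factors
    alpha, gamma of the Thomas recursion; no pivoting, O(n) operations."""
    n = len(a)
    alpha = np.empty(n); gamma = np.empty(n)
    alpha[0] = a[0]
    for i in range(1, n):
        gamma[i] = c[i] / alpha[i - 1]
        alpha[i] = a[i] - gamma[i] * b[i - 1]
    y = np.empty(n); y[0] = f[0]               # forward substitution L y = f
    for i in range(1, n):
        y[i] = f[i] - gamma[i] * y[i - 1]
    x = np.empty(n); x[-1] = y[-1] / alpha[-1]  # backward substitution R x = y
    for i in range(n - 2, -1, -1):
        x[i] = (y[i] - b[i] * x[i + 1]) / alpha[i]
    return x

n = 6
a = 2.0 * np.ones(n); b = -1.0 * np.ones(n - 1); c = -1.0 * np.ones(n)
f = np.ones(n)
x = thomas_solve(a, b, c, f)
A = np.diag(a) + np.diag(b, 1) + np.diag(c[1:], -1)
print(np.allclose(A @ x, f))   # True
```

For diagonally dominant tridiagonal matrices (as in the discretization examples) this works without pivoting.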
Definition 2.4: A matrix A = (aij )ni,j=1 ∈ Rn×n is called “diagonally dominant”, if there
holds
  Σ_{k=1, k≠j}^n |a_jk| ≤ |a_jj| ,   j = 1, . . . , n.   (2.2.21)
4
Llewellyn Thomas (1903–1992): British physicist and applied mathematician; studied at Cambridge
University, since 1929 Prof. of physics at Ohio State University, after the war, 1946, staff member at
Watson Scientific Computing Laboratory at Columbia University, since 1968 Visiting Professor at North
Carolina State University until retirement; best known for his contributions to Atomic Physics, thesis
(1927) “Contributions to the theory of the motion of electrified particles through matter and some effects
of that motion”; his name is frequently attached to an efficient version of the Gaussian elimination method
for tridiagonal matrices.
  a_jk^(1) = a_jk − q_j1 a_1k ,   q_j1 = a_j1 / a_11 ,   j = 2, . . . , n ,  k = 1, . . . , n .

Hence, for j = 2, . . . , n, there holds

  Σ_{k=2, k≠j}^n |a_jk^(1)|  ≤  Σ_{k=2, k≠j}^n |a_jk| + |q_j1| Σ_{k=2, k≠j}^n |a_1k|
                             ≤  ( Σ_{k=1, k≠j}^n |a_jk| ) − |a_j1| + |q_j1| ( Σ_{k=2}^n |a_1k| − |a_1j| )
                             ≤  |a_jj| − |a_j1| + |q_j1| |a_11| − |q_j1| |a_1j|
                             =  |a_jj| − |q_j1 a_1j|  ≤  |a_jj − q_j1 a_1j| = |a_jj^(1)| ,

where the diagonal dominance of the 1-st and the j-th row and |q_j1| |a_11| = |a_j1| have been used.
The matrix A^(1) = G_1 A^(0) is regular and obviously again diagonally dominant. Conse-
quently, a_22^(1) ≠ 0 . This property is maintained in the course of the elimination process,
i. e., the elimination is possible without any row permutations. Q.E.D.
Remark 2.1: If in (2.2.21) for all j ∈ {1, . . . , n} the strict inequality holds, then the
matrix A is called “strictly diagonally dominant” . The proof of Theorem 2.5 shows that
for such matrices Gaussian elimination is applicable without pivoting, i. e., such a matrix
is necessarily regular. The above model matrix is diagonally dominant but not strictly
diagonally dominant. Its regularity will be shown later by other arguments based on a
slightly more restrictive assumption.
Proof. For the (symmetric) positive matrix A there holds a11 > 0 . The relation
  a_jk^(1) = a_jk − (a_j1/a_11) a_1k = a_kj − (a_k1/a_11) a_1j = a_kj^(1) ,
  Σ_{j,k=1}^n a_jk x_j x_k
    = Σ_{j,k=2}^n a_jk x_j x_k + 2 x_1 Σ_{k=2}^n a_1k x_k + a_11 x_1²
    = Σ_{j,k=2}^n ( a_jk − a_k1 a_1j / a_11 ) x_j x_k + (1/a_11) ( a_11 x_1 + Σ_{k=2}^n a_1k x_k )²
    = Σ_{j,k=2}^n a_jk^(1) x_j x_k + (1/a_11) ( a_11 x_1 + Σ_{k=2}^n a_1k x_k )² ,

where a_jk = a_kj has been used.
A = LR = R̃T DLT ,
and, consequently, L = R̃T and R = DLT . This proves the following theorem.
with the matrix L̃ := L D^{1/2} . For computing the Cholesky decomposition it suffices to
compute the matrices D and L . This reduces the required work count to (1/6) n³ + O(n²) a. op.
The direct computation of the Cholesky decomposition
starts from the relation A = L̃L̃ᵀ , which can be viewed as a system of n(n + 1)/2
equations for the quantities l̃_jk , k ≤ j . Multiplying this out,

  [ l̃_11        0    ] [ l̃_11  ⋯  l̃_n1 ]   [ a_11  ⋯  a_1n ]
  [  ⋮     ⋱         ] [       ⋱   ⋮   ] = [  ⋮         ⋮   ] ,
  [ l̃_n1   ⋯   l̃_nn ] [           l̃_nn ]   [ a_n1  ⋯  a_nn ]
5
André-Louis Cholesky (1875–1918): French mathematician; military career as engineer officer; con-
tributions to numerical linear algebra, “Cholesky decomposition”; killed in battle shortly before the end of
World War I, his discovery was published posthumously in “Bulletin Géodésique”.
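Resolving the n(n+1)/2 equations of A = L̃L̃ᵀ row by row yields the usual Cholesky recursion; a compact, illustrative Python sketch (names and test matrix are ours):

```python
import numpy as np

def cholesky(A):
    """Compute the factor Lt with A = Lt Lt^T for a symmetric positive definite A
    (about n^3/6 multiplications/additions)."""
    n = A.shape[0]
    Lt = np.zeros_like(A, dtype=float)
    for j in range(n):
        s = A[j, j] - Lt[j, :j] @ Lt[j, :j]
        if s <= 0.0:
            raise ValueError("matrix is not positive definite")
        Lt[j, j] = np.sqrt(s)
        for k in range(j + 1, n):
            Lt[k, j] = (A[k, j] - Lt[k, :j] @ Lt[j, :j]) / Lt[j, j]
    return Lt

A = np.array([[4., 2., 2.], [2., 5., 3.], [2., 3., 6.]])
Lt = cholesky(A)
print(np.allclose(Lt @ Lt.T, A))   # True
```

The failure of the square root (s ≤ 0) also serves as a practical test for positive definiteness.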
Ax = b, (2.3.25)
AT Ax = AT b. (2.3.26)
In the rank-deficient case, rank(A) < n , a particular solution x̂ of the normal system is
not unique, but of the general form x̂ + y with any element y ∈ kern(A). In this case
uniqueness is achieved by requiring the “least error-squares” solution to have minimal
Euclidian norm, x̂2 .
We recall that the matrix AT A is symmetric and positive semi-definite, and even
positive definite if A has maximal rank rank(A) = n. In the latter case the normal
equation can, in principle, be solved by the Cholesky algorithm for symmetric positive
definite matrices. However, in general the matrix AT A is rather ill-conditioned. In fact,
for m = n, we have that cond₂(AᵀA) = cond₂(A)² .
But AT A is not positive definite: (−1, 1) · AT A · (−1, 1)T = −0.01 , i. e., in this case the
Cholesky algorithm will not yield a solution.
We will now describe a method, by which the normal equation can be solved without
explicitly forming the product AT A. For later purposes, from now on, we admit complex
matrices.
Theorem 2.8 (QR decomposition): Let A ∈ Km×n be any rectangular matrix with
m ≥ n and rank(A) = n . Then, there exists a uniquely determined orthonormal matrix
Q ∈ Kᵐˣⁿ with the property Q̄ᵀQ = I ,   (2.3.28)
and a uniquely determined upper triangular matrix R ∈ Kn×n with real diagonal rii >
0 , i = 1, . . . , n , such that
A = QR. (2.3.29)
  q_1 ≡ ‖a_1‖₂⁻¹ a_1 ,   k = 2, . . . , n :   q̃_k ≡ a_k − Σ_{i=1}^{k−1} (a_k, q_i)₂ q_i ,   q_k ≡ ‖q̃_k‖₂⁻¹ q̃_k .
Since by assumption rank(A) = n the n column vectors {a1 , . . . , an } are linearly in-
dependent and the orthonormalization process does not terminate before k = n. By
construction the matrix Q ≡ [q1 , . . . , qn ] is orthonormal. Further, for k = 1, . . . , n, there
holds:
  a_k = q̃_k + Σ_{i=1}^{k−1} (a_k, q_i)₂ q_i = ‖q̃_k‖₂ q_k + Σ_{i=1}^{k−1} (a_k, q_i)₂ q_i

and

  a_k = Σ_{i=1}^{k} r_ik q_i ,   r_kk ≡ ‖q̃_k‖₂ ∈ ℝ₊ ,   r_ik ≡ (a_k, q_i)₂ .
Setting rik ≡ 0, for i > k, this is equivalent to the equation A = QR with the upper
triangular matrix R = (rik ) ∈ Kn×n .
ii) Uniqueness: For proving the uniqueness of the QR decomposition let A = Q1 R1 and
A = Q_2R_2 be two such decompositions. Since R_1 and R_2 are regular (det(R_i) > 0),
it follows that

  Q := Q̄_2ᵀ Q_1 = R_2 R_1⁻¹

is upper triangular. Since Q̄ᵀQ = R_1R_2⁻¹ R_2R_1⁻¹ = I , it follows that Q is orthonormal and diagonal with
|λ_i| = 1 . From QR_1 = R_2 , we infer that λ_i r_ii¹ = r_ii² > 0 and, consequently, λ_i ∈ ℝ and
λ_i = 1 . Hence, Q = I , i. e.,

  R_1 = R_2 ,   Q_1 = A R_1⁻¹ = A R_2⁻¹ = Q_2 .
transforms into
AT Ax = RT QT QRx = RT Rx = RT QT b,
and, consequently, in view of the regularity of RT ,
Rx = QT b. (2.3.30)
This triangular system can now be solved by backward substitution in O(n2 ) arithmetic
operations. Since
AT A = RT R (2.3.31)
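In practice the least-squares problem is therefore solved via a QR decomposition without ever forming AᵀA. The short Python sketch below is illustrative; it uses NumPy's built-in (Householder-based) reduced QR factorization and synthetic test data.

```python
import numpy as np

def lstsq_qr(A, b):
    """Least-error-squares solution of the overdetermined system A x ~ b
    via A = QR and solution of the triangular system R x = Q^T b."""
    Q, R = np.linalg.qr(A)                  # "reduced" QR: Q is m x n, R is n x n
    return np.linalg.solve(R, Q.T @ b)      # R is triangular, so this is cheap

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 3))
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true + 1e-3 * rng.standard_normal(20)
print(lstsq_qr(A, b))                       # close to (1, -2, 0.5)
```

Since cond₂(R) = cond₂(A), this avoids the squaring of the condition number that occurs when the normal equation with AᵀA is solved directly.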
The Gram-Schmidt algorithm used in the proof of Theorem 2.8 for orthonormalizing
the column vectors of the matrix A is not suitable in practice because of its inherent
(not to be confused with the “scalar product” v̄ᵀv = ‖v‖₂² , which maps vectors to scalars).
The Householder transformation

  S = I − 2vv̄ᵀ ∈ Kᵐˣᵐ

acts on a vector u = αv + βv^⊥ (with v̄ᵀv = 1 and v̄ᵀv^⊥ = 0) as a reflection:

  S u = (I − 2vv̄ᵀ)(αv + βv^⊥) = αv + βv^⊥ − 2α(v̄ᵀv)v − 2β(v̄ᵀv^⊥)v = −αv + βv^⊥ .
6
Alston Scott Householder (1904–1993): US-American mathematician; Director of Oak Ridge National
Laboratory (1948-1969), thereafter Prof. at the Univ. of Tennessee; worked in mathematical biology,
best known for his fundamental contributions to numerics, especially to numerical linear algebra.
  A^(i−1) = [ ∗  ⋯  ∗ │ ∗  ⋯  ∗ ]
            [    ⋱  ⋮ │ ⋮      ⋮ ]
            [       ∗ │ ∗  ⋯  ∗ ]
            [─────────┼─────────]
            [       0 │ ∗  ⋯  ∗ ]   (rows i, . . . , m)
            [         │ ⋮      ⋮ ]
            [         │ ∗  ⋯  ∗ ]
                        (columns i, . . . , n)
In the i-th step the Householder transformation Si ∈ Km×m is determined such that
Si A(i−1) = A(i) .
From this, we obtain the desired QR decomposition of A simply by striking out the last
m − n columns in Q̃ and the last m − n rows in R̃ :
  A = [ Q │ ∗ ] · [ R ]  } n rows
                  [ 0 ]  } m − n rows      =  Q R ,
       (n) (m−n)
We remark that here the diagonal elements of R do not need to be positive, i. e., the
Householder algorithm does generally not yield the “uniquely determined” special QR
decomposition given by Theorem 2.8.
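The successive application of the Householder reflections S_i can be sketched as follows; this is an illustrative Python implementation (function name, sign handling and test matrix are ours), applying each reflection only to the active trailing block.

```python
import numpy as np

def householder_qr(A):
    """Householder orthogonalization: returns unitary Q (m x m) and upper triangular
    R (m x n) with A = Q R.  Step i applies S_i = I - 2 v v^T to annihilate the
    subdiagonal entries of column i."""
    m, n = A.shape
    R = A.astype(float).copy()
    Q = np.eye(m)
    for i in range(min(n, m - 1)):
        a = R[i:, i]
        v = a.copy()
        # sign choice oriented by sgn(a_ii) to avoid cancellation
        v[0] += (1.0 if a[0] >= 0 else -1.0) * np.linalg.norm(a)
        norm_v = np.linalg.norm(v)
        if norm_v == 0.0:
            continue
        v /= norm_v
        R[i:, :] -= 2.0 * np.outer(v, v @ R[i:, :])
        Q[:, i:] -= 2.0 * np.outer(Q[:, i:] @ v, v)
    return Q, R

A = np.array([[3., 1.], [4., 2.], [0., 5.]])
Q, R = householder_qr(A)
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(3)))   # True True
```

Striking out the last m − n columns of Q and rows of R gives the reduced factorization A = QR of Theorem 2.8 up to the signs of the diagonal of R.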
Now, we describe the transformation process in more detail. Let ak be the column
vectors of the matrix A .
Step 1: S1 is chosen such that S1 a1 ∈ span{e1 } . The vector a1 is reflected with respect
to one of the axes span{a_1 + ‖a_1‖₂e_1} or span{a_1 − ‖a_1‖₂e_1} into the x_1-axis. The choice
of the axis is oriented by sgn(a_11) in order to minimize round-off errors. In case a_11 ≥ 0
this choice is

  v_1 = (a_1 + ‖a_1‖₂e_1) / ‖a_1 + ‖a_1‖₂e_1‖₂ ,    v_1^⊥ = (a_1 − ‖a_1‖₂e_1) / ‖a_1 − ‖a_1‖₂e_1‖₂ .

Then, the matrix A^(1) = (I − 2v_1v̄_1ᵀ)A has the column vectors

  a_1^(1) = −‖a_1‖₂ e_1 ,   a_k^(1) = a_k − 2(a_k, v_1)₂ v_1 ,   k = 2, . . . , n.

(Figure: sketch of the reflection axis (“Spiegelungsachse”) mapping a_1 onto −‖a_1‖₂e_1 on the x_1-axis.)
The application of the (unitary) matrix Si to A(i−1) leaves the first i − 1 rows and
columns of A(i−1) unchanged. For the construction of vi , we use the considerations of
the 1-st step for the submatrix:
  Ã^(i−1) = [ ã_ii^(i−1)  ⋯  ã_in^(i−1) ]
            [    ⋮               ⋮      ]   =  [ ã_i^(i−1), . . . , ã_n^(i−1) ] .
            [ ã_mi^(i−1)  ⋯  ã_mn^(i−1) ]

It follows that

  ṽ_i = ( ã_i^(i−1) − ‖ã_i^(i−1)‖₂ ẽ_i ) / ‖ ã_i^(i−1) − ‖ã_i^(i−1)‖₂ ẽ_i ‖₂ ,
  ṽ_i^⊥ = ( ã_i^(i−1) + ‖ã_i^(i−1)‖₂ ẽ_i ) / ‖ ã_i^(i−1) + ‖ã_i^(i−1)‖₂ ẽ_i ‖₂ ,
Remark 2.2: For a quadratic matrix A ∈ Kⁿˣⁿ the computation of the QR decom-
position by the Householder algorithm costs about twice the work needed for the LR
decomposition of A , i. e., N_QR = (2/3) n³ + O(n²) a. op.
The methods for solving linear systems and equalization problems become numerically
unreliable if the matrices are very ill-conditioned. It may happen that a theoretically
regular matrix appears as singular for the (finite arithmetic) numerical computation or
vice versa. The determination of the rank of a matrix cannot be accomplished with suf-
ficient reliability by the LR or the QR decomposition. A more accurate approach for
treating rank-deficient matrices uses the so-called “singular value decomposition (SVD)”.
This is a special orthogonal decomposition, which transforms the matrix from both sides.
For more details, we refer to the literature, e. g., to the introductory textbook by Deuflhard
& Hohmann [33].
Let A ∈ Kᵐˣⁿ be given. Further let Q ∈ Kᵐˣᵐ and Z ∈ Kⁿˣⁿ be orthonormal
matrices. Then, there holds

  ‖Q̄ᵀ A Z‖₂ = ‖A‖₂ ,   cond₂(Q̄ᵀ A Z) = cond₂(A) .

Hence this two-sided transformation does not change the conditioning of the matrix A .
For suitable matrices Q and Z , we obtain precise information about the rank of A
and the equalization problem can by accurately solved also for a rank-deficient matrix.
However, the numerically stable determination of such transformations is costly as will
be seen below.
A = W ΛW̄ T (2.4.34)
From (2.4.33), one sees that for the column vectors ui , v i of U, V , there holds
The ONS {u1, . . . , un } can be extended to an ONB {u1 , . . . , um} of Rm such that the
associated matrix U := [u1 , . . . , um ] is unitary. Then, in matrix notation there holds
ii) Case m ≤ n (underdetermined system): We apply the result of (i) to the transposed
matrix AT ∈ Rn×m , obtaining
AT = Ũ Σ̃Ṽ T , A = Ṽ Σ̃T Ũ T .
Then, setting U := Ṽ , V := Ũ , and observing that, in view of the above discussion, the
eigenvalues of (AT )T AT = AAT ∈ Rm×m are among those of AT A ∈ Rn×n besides n−m
zero eigenvalues. Hence, Σ̃T has the desired form. Q.E.D.
We now collect some important consequences of the decomposition (2.4.33). Suppose
that the singular values are ordered like σ1 ≥ · · · ≥ σr > σr+1 = · · · = σp = 0, p =
min(m, n) . Then, there holds (proof exercise):
- rank(A) = r ,
- range(A) = span{u1 , . . . , ur } ,
- A = U_r Σ_r V_rᵀ ≡ Σ_{i=1}^r σ_i u^i v^{iT} (singular decomposition of A ),
- ‖A‖₂ = σ_1 = σ_max ,
We now consider the problem of computing the “numerical rank” of a matrix. Let
where eps is the “machine accuracy” (maximal relative round-off error). If the matrix
elements come from experimental measurements, then the parameter ε should be related
to the measurement error. The concept of “numerically rank-deficient” has something in
common with that of the ε-pseudospectrum discussed above.
  A_k = Σ_{i=1}^k σ_i u^i v^{iT} ,
Proof. Since
U T Ak V = diag(σ1 , . . . , σk , 0, . . . , 0)
it follows that rank(Ak ) = k . Further, we obtain
U T (A − Ak )V = diag(0, . . . , 0, σk+1, . . . , σp )
A − Ak 2 = σk+1 .
It remains to show that for any other matrix B with rank k, the following inequality
holds
A − B2 ≥ σk+1 .
To this end, we choose an ONB {x1 , . . . , xn−k } of kern(B) . For dimensional reasons
there obviously holds span{x^1, . . . , x^{n−k}} ∩ span{v^1, . . . , v^{k+1}} ≠ {0} , so that there is a
vector z in this intersection with ‖z‖₂ = 1 satisfying

  Bz = 0 ,   Az = Σ_{i=1}^{k+1} σ_i (v^{iT} z) u^i

and, consequently,

  ‖A − B‖₂² ≥ ‖(A − B)z‖₂² = ‖Az‖₂² = Σ_{i=1}^{k+1} σ_i² (v^{iT} z)² ≥ σ_{k+1}² .

Here, we have used that z = Σ_{i=1}^{k+1} (v^{iT} z) v^i and therefore

  1 = ‖z‖₂² = Σ_{i=1}^{k+1} (v^{iT} z)² .
x = (AT A)−1 AT b.
In the case rank(A) < n the normal equation has infinitely many solutions. Out of these
solutions, one selects one with minimal euclidian norm, which is then uniquely determined.
This particular solution is called “minimal solution” of the equalization problem. Using
the singular value decomposition the solution formula (2.4.36) can be extended to this
“irregular” situation.
  x̄ = Σ_{i=1}^r ( u^{iT} b / σ_i ) v^i
is the uniquely determined “minimal solution” of the normal equation. The corresponding
least squares error satisfies
  ρ² = ‖Ax̄ − b‖₂² = Σ_{i=r+1}^m (u^{iT} b)² .

Setting z = Vᵀx , we conclude

  ‖Ax − b‖₂² = ‖Σz − Uᵀb‖₂² = Σ_{i=1}^r (σ_i z_i − u^{iT} b)² + Σ_{i=r+1}^m (u^{iT} b)² .

This expression becomes minimal for

  σ_i z_i = u^{iT} b ,   i = 1, . . . , r ,

where

  A⁺ = V Σ⁺ Uᵀ ,   Σ⁺ = diag(σ_1⁻¹, . . . , σ_r⁻¹, 0, . . . , 0) ∈ ℝⁿˣᵐ .
The matrix
A+ = V Σ+ U T (2.4.38)
is called “pseudo-inverse” of the matrix A (or “Penrose7 inverse” (1955)). The pseudo-
inverse is the unique solution of the matrix minimization problem
with the Frobenius norm · F . Since the identity in (2.4.37) holds for all b it follows
that
7
Roger Penrose (1931–): English mathematician; Prof. at Birkbeck College in London (1964) and since
1973 Prof. at the Univ. of Oxford; fundamental contributions to the theory of semigroups, to matrix
calculus and to the theory of “tesselations” as well as in Theoretical Physics to Cosmology, Relativity
and Quantum Mechanics.
In numerical practice the definition of the pseudo-inverse has to use the (suitably defined)
numerical rank. The numerically stable computation of the singular value decomposition
is rather costly. For details, we refer to the literature, e. g., the book by Golub & van Loan
[36].
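With an SVD routine at hand, the minimal solution x̄ = A⁺b is obtained by inverting only those singular values above a tolerance defining the numerical rank. The Python sketch below is illustrative; the tolerance choice is a common convention, not prescribed by the text.

```python
import numpy as np

def minimal_solution(A, b, rtol=None):
    """Minimal-norm least-squares solution x = A^+ b via the SVD; singular values
    below rtol * sigma_max are treated as zero (numerical rank)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    if rtol is None:
        rtol = max(A.shape) * np.finfo(float).eps
    r = int(np.sum(s > rtol * s[0]))            # numerical rank
    coeff = (U[:, :r].T @ b) / s[:r]
    return Vt[:r].T @ coeff, r

A = np.array([[1., 2., 3.],
              [2., 4., 6.],     # rank-deficient: row 2 = 2 * row 1
              [1., 0., 1.]])
b = np.array([1., 2., 1.])
x, r = minimal_solution(A, b)
print("numerical rank:", r)
print("x =", x, " residual norm =", np.linalg.norm(A @ x - b))
```

Choosing the tolerance relative to σ_max (or to the measurement error of the data) implements the notion of numerical rank discussed above.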
In the following, we again consider general square matrices A ∈ Kn×n . The direct way of
computing eigenvalues of A would be to follow the definition of what an eigenvalue is and
to compute the zeros of the corresponding characteristic polynomial χA (z) = det(zI − A)
by a suitable method such as, e. g., the Newton method. However, the mathematical task
of determining the zeros of a polynomial may be highly ill-conditioned if the polynomial is
given in “monomial expansion”, although the original task of determining the eigenvalues
of a matrix is mostly well-conditioned. This is another nice example of a mathematical
problem the conditioning of which significantly depends on the choice of its formulation.
In general the eigenvalues cannot be computed via the characteristic polynomial. This
is feasible only in special cases when the characteristic polynomial does not need to be
explicitly built up, such as for tri-diagonal matrices or so-called “Hessenberg8 matrices”.
det(A − zI) = det(T −1 [B − zI]T ) = det(T −1 ) det(B − zI) det(T ) = det(B − zI),
similar matrices A, B have the same characteristic polynomial and therefore also the
same eigenvalues. For any eigenvalue λ of A with a corresponding eigenvector w there
8
Karl Hessenberg (1904–1959): German mathematician; dissertation “Die Berechnung der Eigenwerte
und Eigenlösungen linearer Gleichungssysteme”, TU Darmstadt 1942.
holds
Aw = T −1 BT w = λw,
i. e., T w is an eigenvector of B corresponding to the same eigenvalue λ . Further, al-
gebraic and geometric multiplicity of eigenvalues of similar matrices are the same. A
“reduction method” reduces a given matrix A ∈ Cn×n by a sequence of similarity trans-
formations to a simply structured matrix for which the eigenvalue problem is then easier
to solve,
In order to prepare for the following discussion of reduction methods, we recall (without
proof) some basic results on matrix normal forms.
Theorem 2.12 (Jordan normal form): Let the matrix A ∈ Cn×n have the (mutually
different) eigenvalues λi , i = 1, . . . , m, with algebraic and geometric multiplicities σi and
ρ_i , respectively. Then, there exist numbers r_k^(i) ∈ ℕ , k = 1, . . . , ρ_i , with σ_i = r_1^(i) + . . . + r_{ρ_i}^(i) ,
such that A is similar to the Jordan normal form

  J_A = diag( C_{r_1^(1)}(λ_1), . . . , C_{r_{ρ_1}^(1)}(λ_1), . . . , C_{r_1^(m)}(λ_m), . . . , C_{r_{ρ_m}^(m)}(λ_m) ) ,

a block-diagonal matrix built from the Jordan blocks C_{r_k^(i)}(λ_i) .
Here, the numbers r_k^(i) are uniquely determined up to their ordering.
The following theorem of Schur9 concerns the case that in the similarity transformation
only unitary matrices are allowed.
Theorem 2.13 (Schur normal form): Let the matrix A ∈ Cn×n have the eigenvalues
λi , i = 1, . . . , n (counted accordingly to their algebraic multiplicities). Then, there exists
a unitary matrix U ∈ Cn×n such that
9
Issai Schur (1875–1941): Russian-German mathematician; Prof. in Bonn (1911–1916) and in Berlin
(1916–1935), where he founded a famous mathematical school; because of his jewish origin persecuted
he emigrated 1939 to Palestine; fundamental contributions especially to the Representation Theory of
Groups and to Number Theory.
  ŪᵀAU = [ λ_1      ∗  ]
         [      ⋱      ]   .   (2.5.40)
         [ 0       λ_n ]
Lemma 2.3 (Diagonalization): For any matrix A ∈ Cn×n the following statements
are equivalent:
i) A is diagonalizable.
ii) There exists an ONB in Cn of eigenvectors of A .
iii) For all eigenvalues of A algebraic and geometric multiplicity coincide.
In general, the direct transformation of a given matrix into normal form in finitely
many steps is possible only if all its eigenvectors are a priori known. Therefore, first one
transforms the matrix in finitely many steps into a similar matrix of simpler structure
(e. g., Hessenberg form) and afterwards applies other mostly iterative methods of the form
Here, the transformation matrices Ti should be given explicitly in terms of the elements
of A(i−1) . Further, the eigenvalue problem of the matrix A(i) = Ti−1 A(i−1) Ti should not
be worse conditioned than that of A(i−1) .
Let ‖·‖ be any natural matrix norm generated by a vector norm ‖·‖ on ℂⁿ . For
any two similar matrices, B = T⁻¹AT ∼ A , there holds

  B = T⁻¹ A T ,   δA = T δB T⁻¹ ,

and, therefore,

  ‖B‖ ≤ cond(T) ‖A‖ ,   ‖δA‖ ≤ cond(T) ‖δB‖ .

This implies that

  ‖δA‖ / ‖A‖ ≤ cond(T)² ‖δB‖ / ‖B‖ .   (2.5.41)
Hence, for large cond(T) ≫ 1 even small perturbations in B may affect its eigenvalues
significantly more than those in A . In order to guarantee the stability of the reduction
approach, in view of
the transformation matrices Ti are to be chosen such that cond(Ti ) does not become too
large. This is especially achieved for the following three types of transformations:
The Givens and the Householder transformations are unitary with spectral condition
cond2 (T) = 1.
c) Elimination matrices of the form

  T = [ 1                        ]
      [    ⋱                     ]
      [       1                  ]
      [      l_{i+1,i}   1       ]
      [        ⋮             ⋱   ]
      [      l_{n,i}           1 ] ,   |l_jk| ≤ 1   ⇒   cond_∞(T) ≤ 4 .
In the following, we consider only the eigenvalue problem of real matrices. The fol-
lowing theorem provides the basis of the so-called “Householder algorithm”.
Theorem 2.14 (Hessenberg normal form): To each matrix A ∈ Rn×n there exists
a sequence of Householder matrices Ti , i = 1, . . . , n − 2, such that T AT T with T =
Tn−2 . . . T1 is a Hessenberg matrix . For symmetric A the transformed matrix T AT T is
tri-diagonal.
Proof. Let A = [a_1 , . . . , a_n] and a_k the column vectors of A . In the first step u_1 =
(0, u_12, . . . , u_1n)ᵀ ∈ ℝⁿ , ‖u_1‖₂ = 1 , is determined such that with T_1 = I − 2u_1u_1ᵀ there
holds T_1 a_1 ∈ span{e_1 , e_2} . Then,
  A^(1) = T_1 A T_1 = [ a_11   ∗       ]
                      [  ∗             ]
                      [  0     Ã^(1)   ]
                      [  ⋮             ] ,

i. e., in the first column only the first two entries of A^(1) are nonzero.
In the next step, we apply the same procedure to the reduced matrix Ã(1) . After n−2
steps, we obtain a matrix A(n−2) which has Hessenberg form. With A also A(1) = T1 AT1
is symmetric and then also A(n−2) . The symmetric Hessenberg matrix A(n−2) is tri-
diagonal. Q.E.D.
Remark 2.4: For a symmetric matrix A ∈ ℝⁿˣⁿ the Householder algorithm for reduc-
ing it to tri-diagonal form requires (2/3) n³ + O(n²) a. op. and the reduction of a general
matrix to Hessenberg form (5/3) n³ + O(n²) a. op. For this purpose the alternative method
of Wilkinson using Gaussian elimination steps and row permutation is more efficient as
it requires only half as many arithmetic operations. However, the row permutation de-
stroys the possible symmetry of the original matrix. The oldest method for reducing a
real symmetric matrix to tri-diagonal form goes back to Givens10 (1958). It uses (uni-
tary) Givens rotation matrices. Since this algorithm requires twice as many arithmetic
operations as the Householder algorithm it is not further discussed. For details, we refer
to the literature, e. g., the textbook by Stoer & Bulirsch II [50].
The classical method for computing the eigenvalues of a tri-diagonal or Hessenberg matrix
is based on the characteristic polynomial without explicitly determining the coefficients
in its monomial expansion. The method of Hyman11 (1957) computes the characteristic
polynomial χ_A(·) of a Hessenberg matrix A ∈ ℝⁿˣⁿ . Let us assume that the matrix
A does not separate into two submatrices of Hessenberg form, i. e., a_{j+1,j} ≠ 0, j =
1, . . . , n−1. With a function c(·) still to be chosen, we consider the linear system
10
James Wallace Givens, 1910–1993: US-American mathematician; worked at Oak Ridge National
Laboratory; known by the named after him matrix transformation “Givens rotation” (“Computation of
plane unitary rotations transforming a general matrix to triangular form”, SIAM J. Anal. Math. 6,
26-50, 1958).
11
Morton Allan Hyman: Dutch mathematician; PhD Techn. Univ. Delft 1953, Eigenvalues and
eigenvectors of general matrices, Twelfth National Meeting A.C.M., Houston, Texas, 1957.
Consequently, c(z) = const. det(A − zI), and we obtain a recursion formula for deter-
mining the characteristic polynomial χA (z) = det(zI − A) .
The polynomials p_i ∈ P_i are the i-th principal minors of det(zI − A), i. e., p_n = χ_A . To
see this, we expand the (i + 1)-th principal minor with respect to its (i + 1)-th column:

  p_{i+1}(z) = det [ a_1 − z   b_1                          ]
                   [ b_1       a_2 − z   ⋱                  ]
                   [           ⋱         ⋱        b_i       ]
                   [                     b_i   a_{i+1} − z  ]
             = (a_{i+1} − z) p_i(z) − b_i² p_{i−1}(z) .
Often it is useful to know the derivative χ_A′(·) of χ_A(·) (e. g., in using the Newton
method for computing the zeros of χ_A(·)). This is achieved by the recursion formula

  q_0(z) = 0 ,   q_1(z) = −1 ,
  q_i(z) = −p_{i−1}(z) + (a_i − z) q_{i−1}(z) − b_{i−1}² q_{i−2}(z) ,   i = 2, . . . , n ,
  q_n(z) = χ_A′(z) .
For verifying this, we compute (A − zI) w(z) . For i = 1, . . . , n − 1 (b0 := 0) there holds
We will now describe a method for the determination of zeros of the characteristic polyno-
mial χA of a real symmetric (irreducible) tridiagonal matrix A ∈ Rn×n . Differentiating
in the identity (2.5.43) yields
  [ (A − zI) w(z) ]′ = −w(z) + (A − zI) w′(z) = ( 0, . . . , 0, −w_n′(z) )ᵀ .
We set z = λ with some eigenvalue λ of A and multiply by −w(λ) to obtain
The value of the existence of a Sturm chain of a polynomial p consists in the following
result.
12
Jacques Charles François Sturm (1803–1855): French-Swiss mathematician; Prof. at École Poly-
technique in Paris since 1840; contributions to Mathematical Physics, differential equations, (“Sturm-
Liouville problem”) and Differential Geometry.
Proof. We consider the number of sign changes N(a) for increasing a . N(a) remains
constant as long as a does not pass a zero of one of the pi . Let now a be a zero of one
of the pi . We distinguish two cases:
i) Case p_i(a) = 0 for i ≠ n : In this case p_{i+1}(a) ≠ 0 , p_{i−1}(a) ≠ 0 . Therefore, the sign of
pj (a) , j ∈ {i − 1, i, i + 1} for sufficiently small h > 0 shows a behavior that is described
by one of the following two tables:
In each case N(a − h) = N(a) = N(a + h) and the number of sign changes does not
change.
ii) Case pn (a) = 0 : In this case the behavior of pj (a) , j ∈ {n − 1, n} , is described by
one of the following two tables (because of (S2)):
  p_0(x) = 1 ,   p_1(x) = a_1 − x ,
  i = 2, . . . , n :   p_i(x) = (a_i − x) p_{i−1}(x) − b_{i−1}² p_{i−2}(x) ,
has the sign distribution +, . . . , + , which shows that N(x) = 0 . Consequently, N(ζ)
corresponds to the number of zeros λ of χA with λ < ζ . For the eigenvalues λi of A
it follows that
In order to determine the i-th eigenvalue λ_i , one starts from an interval [a_0 , b_0] containing
all eigenvalues, i. e., a_0 < λ_1 ≤ · · · ≤ λ_n < b_0 . Then, the interval is bisected and it is tested using the Sturm
sequence which of the two new subintervals contains λ_i . Continuing this process for
t = 0, 1, 2, . . ., one obtains:
  μ_t := (a_t + b_t) / 2 ,
  a_{t+1} := a_t ,  b_{t+1} := μ_t   if N(μ_t) ≥ i ,                               (2.5.45)
  a_{t+1} := μ_t ,  b_{t+1} := b_t   if N(μ_t) < i .
2.6 Exercises
Exercise 2.1: a) Construct examples of real matrices, which are symmetric, diagonally
dominant and regular but indefinite (i. e. neither positive nor negative definite), and vice
versa those, which are positive (or negative) definite but not diagonally dominant. This
demonstrates that these two properties of matrices are independent of each other.
b) Show that a matrix A ∈ Kn×n for which the conjugate transpose ĀT is strictly
diagonally dominant is regular.
c) Show that a strictly diagonally dominant real matrix, which is symmetric and has
positive diagonal elements is positive definite.
Exercise 2.2: Let A = (aij )ni,j=1 ∈ Rn×n be a symmetric, positive definite matrix. The
Gaussian elimination algorithm (without pivoting) generates a sequence of matrices A =
A(0) → . . . → A(k) → . . . → A(n−1) = R, where R = (rij )ni,j=1 is the resulting upper-right
triangular matrix. Prove that the algorithm is “stable” in the following sense:
  k = 1, . . ., n − 1 :   a_ii^(k) ≤ a_ii^(k−1) ,  i = 1, . . ., n ,    max_{1≤i,j≤n} |r_ij| ≤ max_{1≤i,j≤n} |a_ij| .
Exercise 2.3: The “LR decomposition” of a regular matrix A ∈ Rn×n is the represen-
Exercise 2.4: Let A ∈ Rn×n be a regular matrix that admits an “LR decomposition”.
In the text it is stated that Gaussian elimination (without pivoting) has an algorithmic
complexity of (1/3) n³ + O(n²) a. op., and that in case of a symmetric matrix this reduces
to (1/6) n³ + O(n²) a. op. Hereby, an “a. op.” (arithmetic operation) consists of exactly one
multiplication (with addition) or of a division.
Question: What are the algorithmic complexities of these algorithms in case of a band
matrix of type (ml , mr ) with ml = mr = m? Give explicit numbers for the model matrix
introduced in the text with m = 102 , n = m2 = 104 , and m = 104 , n = m2 = 108 ,
respectively.
Ax = b, (3.0.1)
with a real square matrix A = (aij )ni,j=1 ∈ Rn×n and a vector b = (bj )nj=1 ∈ Rn . Here,
we concentrate on the higher-dimensional case n ≫ 10³ , such that, besides arithmetical
complexity, also storage requirement becomes an important issue. In practice, high-
complexity, also storage requirement becomes an important issue. In practice, high-
dimensional matrices usually have very special structure, e. g., band structure and extreme
sparsity, which needs to be exploited by the solution algorithms. The most cost-intensive
parts of the considered algorithms are simple matrix-vector multiplications x → Ax .
Most of the considered methods and results are also applicable in the case of matrices
and right-hand sides with complex entries.
For the construction of cheap iterative methods for solving problem (3.0.1), one rewrites
it in form of an equivalent fixed-point problem,
Ax = b ⇔ Cx = Cx − Ax + b ⇔ x = (I − C −1 A)x + C −1 b,
with a suitable regular matrix C ∈ Rn×n , the so-called “preconditioner”. Then, starting
from some initial value x0 , one uses a simple fixed-point iteration,
xt = (I − C −1 A) xt−1 + C −1
b, t = 1, 2, . . . . (3.1.2)
=: c
=: B
Here, the matrix B = I − C −1 A is called the “iteration matrix” of the fixed-point
iteration. Its properties are decisive for the convergence of the method. In practice,
such a fixed-point iteration is organized in form of a “defect correction” iteration, which
essentially requires in each step only a matrix-vector multiplication and the solution of a
linear system with the matrix C as coefficient matrix:
Example 3.1: The simplest method of this type is the (damped) Richardson1 method,
which for a suitable parameter θ ∈ (0, 2λmax (A)−1 ] uses the matrices
1
Lewis Fry Richardson (1881–1953): English mathematician and physicist; worked at several institu-
tions in in England and Scotland; a typical “applied mathematician”; pioneered modeling and numerics
in weather prediction.
99
100 Iterative Methods for Linear Algebraic Systems
In view of the Banach fixed-point theorem a sufficient criterion for the convergence
of the fixed-point iteration (3.1.2) is the contraction property of the corresponding fixed-
point mapping g(x) := Bx + c ,
in some vector norm · . For a given iteration matrix B the property B < 1 may
depend on the particular choice of the norm. Hence, it is desirable to characterize the
convergence of this iteration in terms of norm-independent properties of B . For this, the
appropriate quantity is the “spectral radius”
Obviously, spr(B) is the radius of the smallest circle in C around the origin, which
contains all eigenvalues of B . For any natural matrix norm · , there holds
Bx2
spr(B) = B2 = sup . (3.1.6)
x∈Rn \{0} x2
However, we note that spr(·) does not define a norm on Rn×n since the triangle inequality
does not hold in general.
Theorem 3.1 (Fixed-point iteration): The fixed-point iteration (3.1.2) converges for
any starting value x0 if and only if
In case of convergence the limit is the uniquely determined fixed point x . The asymptotic
convergence behavior with respect to any vector norm · is characterized by
xt − x 1/t
sup lim sup = ρ. (3.1.8)
x0 ∈Rn t→∞ x0 − x
Hence, the number of iteration steps necessary for an asymptotic error reduction by a
small factor TOL > 0 is approximately given by
ln(1/TOL)
t(TOL) ≈ . (3.1.9)
ln(1/ρ)
3.1 Fixed-point iteration and defect correction 101
Proof. Assuming the existence of a fixed point x , we introduce the notation et := xt −x.
Recalling that x = Bx + c, we find
i) In case that spr(B) < 1, in view of Lemma 3.1 below, there exists a vector norm · B,ε
depending on B and some ε > 0 chosen sufficiently small, such that the corresponding
natural matrix norm · B,ε satisfies
λt e0 = λt w = B t w = B t e0 = et → 0 (t → ∞).
This necessarily requires spr(B) = |λ| < 1. As byproduct of this argument, we see that
in this particular case
et 1/t
= ρ, t ∈ N.
e0
iii) For an arbitrary small ε > 0 let · B,ε again be the above special norm for which
BB,ε ≤ ρ + ε. Then, by the norm equivalence for any other vector norm · there
exist positive numbers m = m(B, ε), M = M(B, ε) such that
Since ε > 0 can be chosen arbitrarily small and recalling the last identity in (ii), we
obtain the asserted identity (3.1.8).
102 Iterative Methods for Linear Algebraic Systems
xt − x
≤ (ρ + ε)t ≈ TOL, t ≥ t(TOL),
x0 − x
ln(1/10)
t(10−1 ) ≈
ln(1/ρ)
more iterations. For example, for ρ ∼ 0.99, which is not at all unrealistic, we have
t1 ∼ 230 . For large systems with n
106 this means substantial work even if each
iteration step only requires O(n) arithmetic operations.
We have to provide the auxiliary lemma used in the proof of Theorem 3.1.
Lemma 3.1 (Spectral radius): For any matrix B ∈ Rn×n and any small ε > 0 there
exists a natural matrix norm · B,ε , such that
Proof. The matrix B is similar to an upper triangular matrix (e. g., its Jordan normal
form), ⎡ ⎤
r11 · · · r1n
⎢ .. ⎥
B = T −1 RT , R = ⎢ ⎣
..
. . ⎥
⎦,
0 rnn
with the eigenvalues of B on its main diagonal. Hence,
⎡ ⎤
⎡ ⎤ 0 r12 δr13 · · · δ n−2 r1n
1 0 ⎡ ⎤ ⎢ ⎥
⎢ ⎥ r11 0 ⎢
⎢
..
.
..
.
..
. ... ⎥
⎥
⎢ ⎥ ⎢ ⎥ ⎢ ⎥
⎢ δ ⎥
Sδ = ⎢ .. ⎥ R0 = ⎢
⎣
..
. ⎥ Qδ = ⎢
⎦ ⎢
..
.
..
. δrn−2,n ⎥,
⎥
⎢ . ⎥ ⎢ ⎥
⎣ ⎦ 0 rnn ⎢ .. ⎥
⎣ . rn−1,n ⎦
0 δ n−1
0
B = T −1 RT = T −1 Sδ Rδ Sδ−1 T.
This implies
Bxδ
Bδ = sup ≤ spr(B) + μδ,
x∈Rn \{0} xδ
and setting δ := ε/μ the desired vector norm is given by · B,ε := · δ . Q.E.D.
In using an iterative method, one needs “stopping criteria”, which for some prescribed
accuracy TOL terminates the iteration, in the ideal case, once this required accuracy is
reached.
104 Iterative Methods for Linear Algebraic Systems
i) Strategy 1. From the Banach fixed-point theorem, we have the general error estimate
q
xt − x ≤ xt − xt−1 , (3.1.11)
1−q
with the “contraction constant” q = B < 1 . For a given error tolerance TOL > 0 the
iteration could be stopped when
The realization of this strategy requires an quantitatively correct estimate of the norm
B or of spr(B) . That has to be generated from the computed iterates xt , i. e., a
posteriori in the course of the computation. In general the iteration matrix B = I −C −1 A
cannot be computed explicitly with acceptable work. Methods for estimating spr(B) will
be considered in the chapter about the iterative solution of eigenvalue problems, below.
ii) Strategy 2. Alternatively, one can evaluate the “residual” Axt − b . Observing that
et = xt − x = A−1 (Axt − b) and x = A−1 b, it follows that
1 1
et ≤ A−1 Axt − b , ≥ ,
b A x
and further
et Axt − b Axt − b
≤ A−1 A = cond(A) .
x b b
This leads us to the stopping criterion
Axt − b
cond(A) ≤ TOL. (3.1.13)
b
The evaluation of this criterion requires an estimate of cond(A), which may be as costly
as the solution of the equation Ax = b itself. Using the spectral norm · 2 the condition
number is related to the singular values of A (square roots of the eigenvalues of AT A),
σmax
cond2 (A) = .
σmin
Again generating accurate estimates of these eigenvalues may require more work than the
solution of Ax = b . This short discussion shows that designing useful stopping criteria
for iterative methods is an not at all an easy task. However, in the context of linear
systems originating from the “finite element discretization” (“FEM”) of partial differen-
tial equations there are approaches based on the concept of “Galerkin orthogonality”,
which allow for a systematic balancing of iteration and discretization errors. In this way,
practical stopping criteria can be designed, by which the iteration may be terminated
once the level of the discretization error is reached. Here, the criterion is essentially the
approximate solution’s “violation of Galerkin orthogonality” (s. Meidner et al. [43] and
Rannacher et al. [45] for more details).
3.1 Fixed-point iteration and defect correction 105
The construction of concrete iterative methods for solving the linear system Ax = b by
defect correction requires the specification of the preconditioner C . For this task two
particular goals have to be observed:
– spr(I − C−1 A) should be as small as possible.
– The correction equation Cδxt = b − Axt−1 should be solvable with O(n) a. op.,
requiring storage space not much exceeding that for storing the matrix A itself.
Unfortunately, these requirements contradict each other. The two extreme cases are:
C=A ⇒ spr(I−C−1 A) = 0
C = θ−1 I ⇒ spr(I−C−1 A) ≈ 1.
The simplest preconditioners are defined using the natural additive decomposition of the
matrix, A = L + D + R , where
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
a11 ··· 0 0 ··· 0 0 a12 · · · a1n
⎢ ⎥ ⎢ ⎥ ⎢ .. ⎥
⎢ . .. ⎥ ⎢ a ..
. ⎥ ⎢ . .. . .. ⎥
⎢ ⎥ ⎢ 21 ⎥ ⎢ . ⎥
D=⎢ .. ⎥ L = ⎢ . .. .. ⎥ R=⎢ .. ⎥.
⎢ . ⎥ ⎢ .. . . ⎥ ⎢ . an−1,n ⎥
⎣ ⎦ ⎣ ⎦ ⎣ ⎦
0 ··· ann an1 · · · an,n−1 0 0 ··· 0
Further, we assume that the main diagonal elements of A are nonzero, aii = 0.
or written component-wise:
n
aii xti = bi − aij xtj , i = 1, . . . , n,
j=1
2
Carl Gustav Jakob Jacobi (1804–1851): German mathematician; already as child highly gifted;
worked in Königsberg and Berlin; contributions to many parts of mathematics: Number Theory, elliptic
functions, partial differential equations, functional determinants, and Theoretical Mechanics.
106 Iterative Methods for Linear Algebraic Systems
(D + L)xt = b − Rxt−1 , t = 1, 2, . . . .
one sees that Jacobi and Gauß-Seidel method have exactly the same arithmetic com-
plexity per iteration step and require the same amount of storage. However, since
the latter method uses a better approximation of the matrix A as preconditioner it
is expected to have an iteration matrix with smaller spectral radius, i. e., converges
faster. It will be shown below that this is actually the case for certain classes of
matrices A.
The arithmetic complexity is about that of Jacobi and Gauß-Seidel method. But
the parameter ω can be optimized for a certain class of matrices resulting in a
significantly faster convergence than that of the other two simple methods.
For a symmetric, positive definite matrix A the ILU method naturally becomes
the ILLT method (“Incomplete Cholesky decomposition”). The ILU decomposition
is obtained by the usual recursive process for the direct computation of the LU
decomposition from the relation LU = A by setting all matrix elements to zero,
which correspond to index pairs {i, j} for which aij = 0 :
i−1
i = 1, ..., n : r̃il = ail − ˜lik r̃kl (l = 1, ..., n)
k=1
i−1
˜lii = 1, ˜lki = r̃ −1 aki − ˜lkl r̃li (k = i + 1, ..., n)
ii
l=1
˜lij = 0, r̃ij = 0, for aij = 0 .
3.1 Fixed-point iteration and defect correction 107
If this process stops because some r̃ii = 0 , we set r̃ii := δ > 0 and continue. The
iteration of the ILU method reads as follows:
We note that here, L and U stand for “lower” and “upper” triangular matrix,
respectively, in contrast to the notion L and R for “left” and “right” triangular
matrix as used before in the context of multiplicative matrix decomposition.
Again this preconditioner is cheap, for sparse matrices, O(n) a. op. per iteration
step, but its convergence is difficult to analyze and will not be discussed further.
However, in certain situations the ILU method plays an important role as a robust
“smoothing iteration” within “multigrid methods” to be discussed below.
5. ADI method (“Alternating-Direction Implicit Iteration”):
The ADI method can be applied to matrices A which originate from the discretiza-
tion of certain elliptic partial differential equations, in which the contributions from
the different spatial directions (x-direction and y-direction in 2D) are separated in
the form A = Ax + Ay . A typical example is the central difference approximation of
the Poisson equation described in Chapter 0.4.2. The iteration of the ADI method
reads as follows:
(Ax + ωI)(Ay + ωI)xt = (Ax + ωI)(Ay + ωI) − A xt−1 + b, t = 1, 2, . . . .
Here, the matrices Ax + ωI and Ay + ωI are tri-diagonal, such that the second
goal “solution efficiency” is achieved, while the full matrix A is five-diagonal. This
method can be shown to converge for any choice of the parameter ω > 0 . For
certain classes of matrices the optimal choice of ω leads to convergence, which is
at least as fast as that of the optimal SOR method. We will not discuss this issue
further since the range of applicability of the ADI method is rather limited.
where the submatrices Aij are of small dimension, 3−10 , such that the explicit inversion
108 Iterative Methods for Linear Algebraic Systems
of the diagonal blocks Aii is possible without spoiling the overall complexity of O(n) a. op.
per iteration step.
In the following, we will give a complete convergence analysis of Jacobi and Gauß-Seidel
method. As already stated above, both methods have the same arithmetic cost (per
iteration step) and require not much more storage as needed for storing the matrix A.
This simplicity suggests that both methods may not be very fast, which will actually be
seen below at the model matrix in Example (2.7) of Section 2.2.
Theorem 3.2 (Strong row-sum criterion): If the row sums or the column sums of
the matrix A ∈ Rn×n satisfy the condition (strict diagonal dominance)
n
n
|ajk | < |ajj | or |akj | < |ajj |, j = 1, . . . , n, (3.1.20)
k=1,k
=j k=1,k
=j
then, spr(J) < 1 and spr(H1 ) < 1 , i. e., Jacobi and Gauß-Seidel method converge.
Proof. First, assume that the matrix A is strictly diagonally dominant. Let λ ∈ σ(J)
and μ ∈ σ(H1 ) with corresponding eigenvectors v and w , respectively. Then, noting
that ajj = 0 , we have
λv = Jv = −D −1 (L+R)v
and
μw = H1 w = −(D+L)−1 Rw ⇔ μw = −D −1 (μL+R)w.
From this it follows that for v∞ = w∞ = 1 and using the strict diagonal dominance
of A :
1 n
|λ| ≤ D −1 (L+R)∞ = max |ajk | < 1.
j=1,...,n |ajj | k=1,k
=j
so that also spr(H1 ) < 1 . If instead of A its transpose AT is strictly diagonally dominant,
we can argue analogously since, in view of λ(ĀT ) = λ(A) , the spectral radii of these two
matrices coincide. Q.E.D.
3.1 Fixed-point iteration and defect correction 109
However, this matrix is strictly diagonally dominant in some of its rows, which together
with an additional structural property of A can be used to guarantee convergence of
Jacobi and Gauß-Seidel method.
(simultaneous row and column permutation) with matrices Ã11 ∈ Rp×p , Ã22 ∈ Rq×q , Ã21
∈ Rq×p , p, q > 0, p + q = n . It is called “irreducible” if it is not reducible.
Lemma 3.2 (Irreducibility): A matrix A ∈ Rn×n is irreducible if and only if the as-
sociated directed graph
# $
G(A) := knots P1 , ..., Pn , edges Pj Pk ⇔ ajk =
0, j, k = 1, ..., n
is connected, i. e., for each pair of knots {Pj , Pk } there exists a directed connection between
Pj and Pk .
110 Iterative Methods for Linear Algebraic Systems
Theorem 3.3 (Weak row-sum criterion): Let the matrix A ∈ Rn×n be irreducible
and diagonally dominant,
n
|ajk | ≤ |ajj | j = 1, . . . , n, (3.1.21)
k=1,k
=j
and let for at least one index r ∈ {1, . . . , n} the corresponding row sum satisfy
n
|ark | < |arr |. (3.1.22)
k=1,k
=r
Then, A is regular and spr(J) < 1 and spr(H1 ) < 1 , i. e., Jacobi and Gauß-Seidel
method converge. An analogous criterion holds in terms of the column sums of A .
Proof. i) Because of the assumed irreducibility of the matrix A there necessarily holds
n
|ajk | > 0 , j = 1, . . . , n ,
k=1
spr(J) ≤ 1 , spr(H1 ) ≤ 1.
ii) Suppose now that there is an eigenvalue λ ∈ σ(J) with modulus |λ| = 1 . Let v ∈ Cn
be a corresponding eigenvector with a component vs satisfying |vs | = v∞ = 1. There
holds
|λ| |vi | ≤ |aii |−1 |aik | |vk | , i = 1, . . . , n. (3.1.23)
k
=i
By the assumed irreducibility of A in the sense of Lemma 3.2 there exist a chain of indices
i1 , . . . , im such that asi1 = 0 , . . .,aim r = 0 . Hence, by multiple use of the inequality
3.2 Acceleration methods 111
Consequently, there must hold spr(J) < 1 . Analogously, we also conclude spr(H1 ) < 1.
Finally, in view of A = D(I −J) the matrix A must be regular. Q.E.D.
For practical problems Jacobi and Gauß-Seidel method are usually much too slow. There-
fore, one tries to improve their convergence by several strategies, two of which will be
discussed below.
The SOR method can be interpreted as combining the Gauß-Seidel method with an extra
“relaxation step”. Starting from a standard Gauß-Seidel step in the t-th iteration,
1
x̃tj = bj − xtk − xt−1
k ,
ajj k<j k>j
the next iterate xtj is generated as a convex linear combination (“relaxation”) of the form
with a parameter ω ∈ (0, 2). For ω = 1 this is just the Gauß-Seidel iteration. For ω < 1,
one speaks of “underrelaxation” and for ω > 1 of “overrelaxation”. The iteration matrix
of the SOR methods is obtained from the relation
as
Hω = −(D + ωL)−1 [(ω − 1) D + ωR].
112 Iterative Methods for Linear Algebraic Systems
or in componentwise notation:
ω
xti = (1 − ω)xt−1
i + bi − aij xt
j − aij xt−1
j , i = 1, . . . , n. (3.2.25)
aii j<i j>i
The following lemma shows that in the relaxation parameter has to be picked in the range
0 < ω < 2 if one wants to guarantee convergence.
Lemma 3.3 (Relaxation): For an arbitrary matrix A ∈ Rn×n with regular D there
holds
Proof. We have
3
Alexander Markowitsch Ostrowski (1893–1986): Russian-German-Swiss mathematician; studied at
Marburg, Göttingen (with D. Hilbert and E. Landau) and Hamburg, since 1927 Prof. in Basel; worked
on Dirichlet series, in Valuation Theory and especially in Numerical Analysis: “On the linear iteration
procedures for symmetric matrices”, Rend. Mat. Appl. 5, 140–163 (1954).
4
Edgar Reich (1927–2009): US-American mathematician of German origin; start as Electrical Engineer
at MIT (Massachusetts, USA) and Rand Corp. working there on numerical methods and Queuing Theory:
“On the convergence of the classical iterative method for solving linear simultaneous equations”, Ann.
Math. Statist. 20. 448–451 (1949); PhD at UCLA and 2-year postdoc at Princeton, since 1956 Prof.
at Univ. of Minnesota (Minneapolis, USA), work in Complex Analysis especially on quasi-conformal
mappings.
3.2 Acceleration methods 113
2 2 λmax (D)
spr(H1 ) ≤ 1 − + , μ := , (3.2.28)
μ μ(μ + 1) λmin (A)
and
Observing v T Lv = v T LT v implies
ωv T Av = (1−λ) v T Dv + ω (1−λ) v T Lv
λωv T Av = (1−λ)(1−ω) v T Dv − (1−λ) ωv T Lv,
1+λ 2 − ω v T Dv
μ := = > 0.
1−λ ω v T Av
Resolving this for λ, we finally obtain the estimate
114 Iterative Methods for Linear Algebraic Systems
μ − 1
|λ| = < 1, (3.2.29)
μ + 1
where
v T Dv maxy2 =1 y T Dy λmax (D)
μ= ≤ ≤ .
v T Av miny2 =1 y T Ay λmin (A)
This completes the proof. Q.E.D.
Remark 3.3: The estimate (3.2.28) for the convergence rate of the Gauß-Seidel method
in the case of a symmetric, positive definite matrix A has an analogue for the Jacobi
method,
1
spr(J) ≤ 1 − , (3.2.30)
μ
λDv = Dv − Av.
Multiplying by v and observing that A as well as D are positive definite, then yields
v T Av 1
λ=1− T
≤1− .
v Dv μ
for μ
1 , indicates that the Gauß-Seidel method may be almost twice as fast as the
Jacobi method. That this is actually the case will be seen below for a certain class of
matrices.
are independent of the parameter α, i. e., equal to the eigenvalues of the matrix J = J(1).
The importance of this property lies in the fact that in this case there are explicit
relations between the eigenvalues of J and those of Hω .
3.2 Acceleration methods 115
Example 3.2: Though the condition of “consistent ordering” appears rather strange
and restrictive, it is satisfied for a large class of matrices. Consider the model matrix in
Subsection 0.4.2 of Chapter 0. Depending on the numbering of the mesh points matrices
with different block structures are encountered.
i) If the mesh points are numbered in a checker-board manner a block-tridiagonal matrix
⎡ ⎤
D1 A12
⎢ ⎥
⎢ A ..
. ⎥
⎢ 21 D2 ⎥
A=⎢ .. .. ⎥,
⎢ . . A ⎥
⎣ r−1,r ⎦
Ar,r−1 Dr
occurs where the Di are diagonal and regular. Such a matrix is consistently ordered,
which is seen by applying a suitable similarity transformation,
⎡ ⎤
I
⎢ ⎥
⎢ αI ⎥
⎢ ⎥
T =⎢ .. ⎥, αD −1 L + α−1 D −1 R = T (D −1 L + D −1 R)T −1 .
⎢ . ⎥
⎣ ⎦
αr−1 I
occurs where the Ai are tridiagonal and the Dij diagonal. Such a matrix is consistently
ordered, which is seen by first applying the same similarity transformation as above,
⎡ ⎤
A1 α−1 D12
⎢ ⎥
⎢ αD ..
. ⎥
⎢ A ⎥
T AT −1 = ⎢
21 2
.. .. ⎥,
⎢ . . α Dr−1,r ⎥
−1
⎣ ⎦
αDr,r−1 Ar
S = diag{S1 , . . . , Sm },
⎡ ⎤
S1 A1 S1−1 α−1 D12
⎢ ⎥
⎢ S2 A2 S2−1
..
. ⎥
⎢ αD21 ⎥
ST AT −1 S −1 =⎢ .. .. ⎥.
⎢ . . α−1 Dr−1,r ⎥
⎣ ⎦
αDr,r−1 Sr Ar Sr−1
Here, it has been used that the blocks Dij are diagonal. Since the main-diagonal blocks
are tri-diagonal, they split like Ai = Di + Li + Ri and there holds
Si Ai Si−1 = Di + αL + α−1 R.
Theorem 3.5 (Optimal SOR method): Let the matrix A ∈ Rn×n be consistently or-
dered and 0 ≤ ω ≤ 2. Then, the eigenvalues μ ∈ σ(J) and λ ∈ σ(Hω ) are related
through the identity
λ1/2 ωμ = λ + ω − 1. (3.2.31)
and
(λ + ω − 1)v = −λ1/2 ω λ1/2 D −1 L + λ−1/2 D −1 R v = λ1/2 ωJ(λ1/2 ) v.
Thus, v is eigenvector of J(λ1/2 ) corresponding to the eigenvalue
λ+ω−1
μ= .
λ1/2 ω
Then, by the assumption on A also μ ∈ σ(J). In turn, for μ ∈ σ(J), by the same relation
we see that λ ∈ σ(Hω ). Q.E.D.
As direct consequence of the above result, we see that for consistently ordered matrices
the Gauß-Seidel matrix (case ω = 1 ) either has spectral radius spr(H1 ) = 0 or there holds
In case spr(J) < 1 the Jacobi method converges. For reducing the error by the factor
10−1 the Gauß-Seidel method only needs half as many iterations than the Jacobi method
and is therefore to be preferred. However, this does not necessarily hold in general since
one can construct examples for which one or the other method converges or diverges.
For consistently ordered matrices from the identity (3.2.31), we can derive a formula
for the “optimal” relaxation parameter ωopt with spr(Hωopt ) ≤ spr(Hω ), ω ∈ (0, 2). If
there holds ρ := spr(J) < 1, then:
3.2 Acceleration methods 117
,
ω−1 , ωopt ≤ ω
spr(Hω ) = 6 2
1
4
ρ ω + ρ ω − 4(ω − 1)
2 2 , ω ≤ ωopt .
1 ........................................................................................ ...
...
..
....................
................... ...
...............
......... ....
.
....... ...
0.8 spr(Bω ) ......
....
.... ...
..
... ...
...
.. ....
... ..
... .....
0.6 ... ..
.. ...
.....
....
0.4
0.2
0 - ω
1 ωopt 2
Figure 3.1: Spectral radius of the SOR matrix Hω as function of ω
.
Then, there holds
6
2 1− 1 − ρ2
ωopt = 6 , spr(Hωopt ) = ωopt − 1 = 6 < 1. (3.2.33)
1 + 1 − ρ2 1 + 1 − ρ2
In general the exact value for spr(J) is not known. Since the left-sided derivative of
the function f (ω) = spr(Hω ) for ω → ωopt is singular, in estimating ωopt it is better to
take a value slightly larger than the exact one. Using inclusion theorems for eigenvalues
or simply the bound ρ ≤ J∞ one obtains estimates ρ̄ ≥ ρ . In case ρ̄ < 1 this yields
an upper bound ω̄ ≥ ωopt
2 2
ω̄ := 6 ≥ 6 = ωopt
1 + 1 − ρ̄2 1 + 1 − ρ2
for which
6
1− 1 − ρ̄2
spr(Hω̄ ) = ω̄ − 1 = 6 < 1. (3.2.34)
1+ 1 − ρ̄2
xt = Bxt−1 + c, t = 1, 2, . . . , (3.2.35)
with diagonalizable iteration matrix B . First, we describe the general principle of this
approach and then apply it to a symmetrized version of the SOR method. Suppose that
the above fixed-point iteration converges to the solution x ∈ Rn of the linear system
Ax = b ⇔ x = Bx + c, (3.2.36)
i. e., that spr(B) < 1 . The idea of Chebyshev acceleration is to construct linear combi-
nations
t
y t := γst xs , t ≥ 1, (3.2.37)
s=0
with certain coefficients γst , such that the new sequence (y t )t≥0 converges faster to the
fixed point x than the original sequence (xt )t≥0 . Once the fixed-point has been reached,
i. e., xt ≈ x , the new iterates should also be close to x . This imposes the consistency
condition
t
γst = 1. (3.2.38)
s=0
t
t
y −x=
t
γst (xs − x) = γst B s (x0 − x) = pt (B)(x0 − x), (3.2.39)
s=0 s=0
t
pt (z) = γst z s , pt (1) = 1. (3.2.40)
s=0
The eigenvalues λ ∈ spr(B) are usually not known, but rather the bound spr(B) ≤ 1 − δ
with some small δ > 0 may be available. Hence, this optimization problem has to be
relaxed to
This optimization problem can be explicitly solved in the case σ(B) ∈ R . Therefore, we
make the following assumption.
σ(B) ⊂ R. (3.2.44)
Remark 3.4: In general the iteration matrix B cannot be assumed to be symmetric and
not even similar to a symmetric matrix (e. g., in the Gauß-Seidel method with H1 = −(D+
L)−1 LT ). But if this were the case (e. g., in the Richardson method with B = I − θA or
in the Jacobi method with J = −D −1 (L + LT ) ) the analysis of the new sequence (yt )t≥0
may proceed as follows. Taking spectral-norms, we obtain
Hence, the convergence can be improved by choosing the polynomial pt such the the
norm pt (B)2 becomes minimal,
y t − x2
≤ min pt (B)2 B t 2 ≤ Bt2 . (3.2.46)
x0 − x2 pt ∈Pt ,pt (1)=1
Using the representation of the spectral norm, valid for symmetric matrices,
and observing σ(B) ∈ [−1 + δ, 1 − δ], for same small δ > 0, the optimization problem
takes the form
min max |pt (x)|. (3.2.48)
pt ∈Pt ,pt (1)=1 |x|≤1−δ
The solution of the optimization problem (3.2.43) is given by the well-known Cheby-
shev polynomials (of the first kind), which are the orthogonal polynomials obtained by
120 Iterative Methods for Linear Algebraic Systems
successively orthogonalizing (using the the Gram-Schmidt algorithm with exact arith-
metic) the monomial basis {1, x, x2 , . . . , xt } with respect to the scalar product
" 1
dx
(p, q) := p(x)q(x) √ , p, q ∈ Pt ,
−1 1 − x2
defined on the function space C[−1, 1] . These polynomials, named Tt ∈ Pt , are usually
normalized to satisfy Tt (1) = 1 ,
⎧
" 1
dx ⎨ 0, t = s,
Tt (x)Ts (x) √ = π, t = s = 0,
−1 1 − x2 ⎩
π/2, t = s = 0.
They can be written in explicit form as (see, e. g., Stoer & Bulirsch [50] or Rannacher [1]):
⎧
⎨ (−1) cosh(t arccosh(−x)), x ≤ −1,
t
Tschebyscheff-Polynome
1
0.8
tp(1,x)
tp(2,x)
0.6 tp(3,x)
tp(4,x)
tp(5,x)
0.4
0.2
y-Achse
-0.2
-0.4
-0.6
-0.8
-1
-1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1
x-Achse
That the so defined functions are actually polynomials can be seen by induction.
Further, there holds the three-term recurrence relation
which allows the numerically stable computation and evaluation of the Chebyshev poly-
nomials. Sometimes the following alternative global representation is useful:
√ √
Tt (x) = 12 [x + x2 − 1 ]t + [x − x2 − 1 ]t , x ∈ R. (3.2.51)
Tt (1 + 2 x−b )
p(x) := Ct (x) = b−a
c−b
, x ∈ [a, b]. (3.2.53)
Tt (1 + 2 b−a )
1 2γ t
min max |p(x)| = c−b
= ≤ 2γ t , (3.2.54)
p∈Pt ,p(c)=1 x∈[a,b] Tt (1 + 2 b−a ) 1 + γ 2t
where √
1 − 1/ κ c−a
γ := √ , κ := .
1 + 1/ κ c−b
Proof. i) By affine transformation, which does not change the max-norm, we may restrict
ourselves to the standard case [a, b] = [−1, 1] and c ∈ R \ [−1, 1] . Then, Ct (x) = C̃Tt (x)
with constant C̃ = Tt (c)−1 . The Chebyshev polynomial Tt (x) = cos(t arccos(x)) attains
the values ±1 at the points xi = cos(iπ/t), i = 0, . . . , t, and it alternates between 1
and −1 , i. e., Tt (xi ) and Tt (xi+1 ) have opposite signs. Furthermore, max[−1,1] |Tt | = 1 ,
implying max[−1,1] |Ct | = |C̃| .
ii) Assume now the existence of q ∈ Pt such that max[−1,1] |q| < max[−1,1] |Ct | = |C̃|
and q(c) = 1 . Then, the polynomial r = Ct − q changes sign t-times in the interval
[−1, 1] since sign r(xi ) = sign Tt (xi ), i = 0, . . . , t. Thus, r has at least t zeros in [−1, 1] .
Additionally, r(c) = 0 . Hence, r ∈ Pt has at least t + 1 zeros; thus, r ≡ 0, which leads
to a contradiction.
iii) By definition, there holds |Tt (x)| ≤ 1, x ∈ [−1, 1] . This implies that
1
max |Ct (x)| = c−b
.
x∈[a,b] Tt (1 + 2 b−a )
122 Iterative Methods for Linear Algebraic Systems
The assertion then follows from the explicit representation of the Tt given above and
some manipulations (for details see the proof of Theorem 3.11, below). Q.E.D.
We now assume σ(B) ⊂ (−1, 1) , i. e., convergence of the primary iteration. Moreover,
we assume that a parameter ρ ∈ (−1, 1) is known such that σ(B) ⊂ [−ρ, ρ] . With
the parameters a = −ρ, b = ρ, and c = 1, we use the polynomials pt = Ct given in
Theorem 3.6 in defining the secondary iteration (3.2.37). This results in the “Chebyshev-
accelerated” iteration scheme. This is a consistent choice since Tt (1) = 1 .
The naive evaluation of the secondary iterates (3.2.37) would require to store the
whole convergence history of the base iteration (xt )t≥0 , which may not be possible for
large problems. Fortunately, the three-term recurrence formula (3.2.50) for the Chebyshev
polynomials carries over to the corresponding iterates (y t)t≥0 , making the whole process
feasible at all.
Since the Tt satisfy the three-term recurrence (3.2.50), and so do the polynomials
p = Ct from (3.2.53):
2x
μt+1 pt+1 (x) = μt pt (x) − μt−1 pt−1 (x), t ≥ 1, μt = Tt (1/ρ), (3.2.55)
ρ
with initial functions
T1 (x/ρ) x/ρ
p0 (x) ≡ 1, p1 (x) = = = x,
T1 (1/ρ) 1/ρ
i. e., a0,0 = 1 and a1,0 = 0, a1,1 = 1 . We also observe the important relation
2
μt+1 = μt − μt−1 , μ0 = 1, μ1 = 1/ρ. (3.2.56)
ρ
which can be concluded from (3.2.55) observing that pt (1) = 1 . With these preparations,
we can now implement the Chebyshev acceleration scheme. With the limit x := limt→∞ xt ,
we obtain for the error y t − x = ẽt = pt (B)e0 :
μt μt−1
y t+1 = x + ẽt+1 = x + pt+1 (B)e0 = x + 2 Bpt (B)e0 − pt−1 (B)e0
ρμt+1 μt+1
μt μt−1 t−1 μt μt−1 t−1
=x+2 Bẽt − ẽ = x + 2 B(y t − x) − (y − x)
ρμt+1 μt+1 ρμt+1 μt+1
μt μt−1 t−1 1 2
=2 By t − y + μt+1 − μt B + μt−1 x.
ρμt+1 μt+1 μt+1 ρ
Now, using the fixed-point relation x = Bx+c and the recurrence (3.2.56), we can remove
the appearance of x in the above recurrence obtaining
3.2 Acceleration methods 123
μt μt−1 t−1 μt
y t+1 = 2 By t − y +2 c, y 0 = x0 , y 1 = x1 = Bx0 + c. (3.2.57)
ρμt+1 μt+1 ρμt+1
Hence, the use of Chebyshev acceleration for the primary iteration (3.2.35) consists in
evaluating the three-term recurrences (3.2.56) and (3.2.57), which is of similar costs as
the primary iteration (3.2.35) itself, in which the most costly step is the matrix-vector
multiplication By t .
In order to quantify the acceleration effect of this process, we write the secondary
iteration in the form
t
yt − x = γst (xs − x) = pt (B)(x0 − x),
s=0
Tt (x/ρ)
pt (x) = Ct (x) = .
Tt (1/ρ)
Hence, for the primary and the secondary iteration, we find the asymptotic error behavior
et 1/t
lim sup = spr(B) ≤ ρ = 1 − δ, (3.2.58)
e0
ẽt 1/t 1 − 1/√κ √
lim sup ≤ √ ≤ 1 − c δ, (3.2.59)
e0 1 + 1/ κ
We want to apply the concept of Chebyshev acceleration to the SOR method with the
iteration matrix (recalling that A is symmetric)
Hω = (D + ωL)−1 (1 − ω) D − ωLT , ω ∈ (0, 2).
However, it is not obvious whether this matrix is diagonalizable. Therefore, one introduces
a symmetrized version of the SOR method, which is termed “SSOR method”,
or equivalently,
xt = (D + ωLT )−1 [(1−ω) D − ωL](D + ωL)−1 [(1−ω)D − ωLT ]xt−1 + b + b , (3.2.60)
The SSOR-iteration matrix is similar to a symmetric matrix, which is seen from the
relation
The optimal relaxation parameter of the SSOR method is generally different from that of
the SOR method.
Remark 3.5: In one step of the SSOR method the SOR loop is successively applied
twice, once in the standard “forward” manner based on the splitting A = (L + D) + LT
and then in “backward” form based on A = L + (D + LT ) . Hence, it is twice as expensive
compared to the standard SOR method. But this higher cost is generally not compensated
by faster convergence. Hence, the SSOR method is attractive mainly in connection with
the Chebyshev acceleration as described above and not so much as a stand-alone method.
In the following, we consider a class of iterative methods, which are especially designed for
linear systems with symmetric and positive definite coefficient matrices A , but can also
be extended to more general situations. In this section, we use the abbreviated notation
(·, ·) := (·, ·)2 and · := · 2 for the Euclidian scalar product and norm.
Let A ∈ Rn×n be a symmetric positive definite (and hence regular) matrix,
This matrix generates the so-called “A-scalar product” and the corresponding “A-norm”,
Accordingly, vectors with the property (x, y)A = 0 are called “A-orthogonal”. The
positive definite matrix A has important properties. Its eigenvalues are real and positive
0 < λ := λ1 ≤ . . . ≤ λn =: Λ and there exists an ONB of eigenvectors {w1 , . . . , wn } . For
its spectral radius and spectral condition number, there holds
Λ
spr(A) = Λ , cond2 (A) = . (3.3.63)
λ
3.3 Descent methods 125
The basis for the descent methods discussed below is provided by the following theorem,
which characterizes the solution of the linear system Ax = b as the minimum of a
quadratic functional.
Q(x) < Q(y) ∀ y ∈ Rn \ {x}, Q(y) := 12 (Ay, y)2 − (b, y)2 . (3.3.64)
1 ∂ ∂
n n n
∂Q
(x) = ajk xj xk − bk xk = aik xk − bi = 0, i = 1, . . . , n,
∂xi 2 ∂xi j,k=1 ∂xi k=1 k=1
i. e., Ax = b . Q.E.D.
We note that the gradient of Q in a point y ∈ R is given by n
grad Q(y) = 1
2
(A + AT )y − b = Ay − b. (3.3.65)
This coincides with the “defect” of the point y with respect to the equation Ax = b
(negative “residual” b−Ay ). The so-called “descent methods”, starting from some initial
point x(0) ∈ Rn , determine a sequence of iterates xt , t ≥ 1, by the prescription
Here, the “descent directions” r t are a priori determined or adaptively chosen in the
course of the iteration. The prescription for choosing the “step length” αt is called “line
search”. In view of
d
Q(xt + αr t ) = gradQ(xt + αr t ) · r t = (Axt − b, r t ) + α(Ar t , r t ),
dα
we obtain the formula
(g t, r t )
αt = − , g t := Axt − b = gradQ(xt ).
(Ar t , r t )
Definition 3.3: The general descent method, starting from some initial point x0 ∈ Rn ,
determines a sequence of iterates xt ∈ Rn , t ≥ 1, by the prescription
126 Iterative Methods for Linear Algebraic Systems
i) gradient g t = Axt − b,
ii) descent direction r t ,
(g t , r t )
iii) step length αt = − ,
(Ar t , r t )
iv) descent step xt+1 = xt + αt r t .
Each descent step as described in the above definition requires two matrix-vector
multiplications. By rewriting the algorithm in a slightly different way, one can save one
of these multiplications at the price of additionally storing the vector Ar t .
General descent algorithm:
i. e., the minimization of the functional Q(·) is equivalent to the minimization of the
Defect norm Ay − bA−1 or the error norm y − xA .
The various descent methods essentially differ by the choice of the descent directions
r t . One of the simplest a priori strategies uses in a cyclic way the Cartesian coordinate
direction {e1 , . . . , en } . The resulting method is termed “coordinate relaxation” and is
sometimes used in the context of nonlinear systems. For solving linear systems it is much
too slow as it is in a certain sense equivalent to the Gauß-Seidel method (exercise). A
more natural choice are the directions of steepest descent of Q(·) in the points xt :
In case that (Ag t , g t ) = 0 for some t ≥ 0 there must hold g t = 0 , i. e., the iteration can
only terminate with Axt = b .
3.3 Descent methods 127
Theorem 3.8 (Gradient method): For a symmetric positive definite matrix A ∈ Rn×n
the gradient method converges for any starting point x0 ∈ Rn to the solution of the linear
system Ax = b .
λy2 ≤ (y, Ay) ≤ Λy2 , Λ−1 y2 ≤ (y, A−1y) ≤ λ−1 y2,
with λ = λmin(A) and Λ = λmax (A) . In the case xt = x , i. e., E(xt ) = 0 and g t = 0 ,
we conclude that
g t 4 g t 4 λ
≥ = ,
t t t −1 t
(g , Ag )(g , A g ) Λg λ g
t 2 −1 t 2 Λ
and, consequently,
Since 0 < 1 − 1/κ < 1 for any x0 ∈ Rn the error functional E(xt ) → 0 (t → ∞) , i. e.,
xt → x (t → ∞). Q.E.D.
For the quantitative estimation of the speed of convergence of the gradient method,
we need the following result of Kantorovich5 .
Lemma 3.4 (Lemma of Kantorovich): For a symmetric and positive definite matrix
5
Leonid Vitalyevich Kantorovich (1912–1986): Russian Mathematician; Prof. at the U of Leningrad
(1934–1960), at the Academy of Sciences (1961–1971) and at the U Moscow (1971-1976); fundamental
contributions to linear optimization in Economy, to Functional Analysis and to Numerics (Theorem of
Newton-Kantorovich).
128 Iterative Methods for Linear Algebraic Systems
λΛ y4
4 ≤ , y ∈ Rn . (3.3.69)
(λ + Λ)2 (y, Ay)(y, A−1y)
n
n
ψ(ζ) = λ−1
i ζi , ϕ(ζ) = ( λi ζi )−1 .
i=1 i=1
−1
n
Since the function f (λ) = λ is convex it follows from 0 ≤ ζi ≤ 1 and i=1 ζi = 1
that
n
n
λ−1
i ζi ≥ ( λi ζi )−1 .
i=1 i=1
f(λ)
g(λ)
λ1 λn λ
Obviously, the graph of ϕ(ζ) lies, for all arguments ζ on the curve f (λ) , and that of
ψ(ζ) between the curves f (λ) and g(λ) (shaded area). This implies that
Theorem 3.9 (Error estimate for gradient method): Let the matrix A ∈ Rn×n be
symmetric positive definite. Then, for the gradient method the following error estimate
holds:
1 − 1/κ t
xt − xA ≤ x0 − xA , t ∈ N, (3.3.70)
1 + 1/κ
with the spectral condition number κ = cond2 (A) = Λ/λ of A . For reducing the initial
error by a factor TOL the following number of iterations is required:
Proof. i) In the proof of Theorem 3.8 the following error identity was shown:
g t 4
E(xt+1 ) = 1 − t −1
E(xt ).
(g , Ag ) (g , A g )
t t t
This together with the inequality (3.3.69) in the Lemma of Kantorovich yields
λΛ λ − Λ 2
E(xt+1 ) ≤ 1−4 E(xt
) = E(xt ).
(λ + Λ)2 λ+Λ
obtaining 1 κ + 1 −1
t(TOL) > ln ln .
TOL κ−1
Since ! %
x+1 1 1 1 1 1 2
ln =2 + + + . . . ≥
x−1 x 3 x3 5 x5 x
this is satisfied for t(TOL) ≥ 12 κ ln(1/TOL) . Q.E.D.
The relation
shows that the descent directions r t = −g t used in the gradient method in consecutive
steps are orthogonal to each other, while g t+2 may be far away form being orthogonal
to g t . This may lead to strong oscillations in the convergence behavior of the gradient
method especially for matrices A with large condition number, i. e., λ Λ . In the two-
130 Iterative Methods for Linear Algebraic Systems
dimensional case this effect can be illustrated by the contour lines of the functional Q(·),
which are eccentric ellipses, leading to a zickzack path of the iteration (see Fig. 3.3.1).
The gradient method utilizes the particular structure of the functional Q(·), i. e., the
distribution of the eigenvalues of the matrix A , only locally from one iterate xt to the next
one, xt+1 . It seems more appropriate to utilize the already obtained information about
the global structure of Q(·) in determining the descent directions, e. g., by choosing the
descent directions mutually orthogonal. This is the basic idea of the “conjugate gradient
method” (“CG method”) of Hestenes6 and Stiefel7 (1952), which successively generates
a sequence of descent directions dt which are mutually “A-orthogonal”, i. e., orthogonal
with respect to the scalar product (·, ·)A .
For developing the CG method, we start from the ansatz
with a set of linearly independent vectors di and seek to determine the iterates in the
form
t−1
xt = x0 + αi di ∈ x0 + Bt , (3.3.74)
i=0
such that
Q(xt ) = min
0
Q(y) ⇔ Axt − bA−1 = min
0
Ay − bA−1 . (3.3.75)
y∈x +Bt y∈x +Bt
Setting the derivatives of Q(·) with respect to the αi to zero, we see that this is equivalent
6
Magnus R. Hestenes (1906–1991): US-American mathematician; worked at the National Bureau of
Standards (NBS) and the University of California at Los Angeles (UCLA); contributions to optimization
and control theory and to numerical linear algebra.
7
Eduard Stiefel (1909–1978): Swiss mathematician; since 1943 Prof. for Applied Mathematics at
the ETH Zurich; important contributions to Topology, Groupe Theory, Numerical Linear Algebra (CG
method), Approximation Theory and Celestrian Mechanics.
3.3 Descent methods 131
(Axt − b, dj ) = 0 , j = 0, . . . , t − 1, (3.3.76)
or in compact form: Axt − b = g t ⊥ Bt . Inserting the above ansatz for xt into this
orthogonality condition, we obtain a regular linear system for the coefficients αi , i =
0, . . . , t−1,
n
αi (Adi , dj ) = (b, dj ) − (Ax0 , dj ), j = 0, . . . , t − 1. (3.3.77)
i=1
Remark 3.6: We note that (3.3.76) does not depend on the symmetry of the matrix A .
Starting from this relation one may construct CG-like methods for linear systems with
asymmetric and even indefinite coefficient matrices. Such methods are generally termed
“projection methods”. Methods of this type will be discussed in more detail below.
Recall that the Galerkin equations (3.3.76) are equivalent to minimizing the defect
norm Axt − bA−1 or the error norm xt − xA on x0 + Bt . Natural choices for the
spaces Bt are the so-called Krylov9 spaces
with some vector d0 , e. g., the (negative) initial defect d0 = b−Ax0 of an arbitrary vector
x0 . This is motivated by the observation that from At d0 ∈ Kt (d0 ; A), we necessarily obtain
8
Boris Grigorievich Galerkin (1871–1945): Russian civil engineer and mathematician; Prof. in St.
Petersburg; contributions to Structural Mechanics especially Plate Bending Theory.
9
Aleksei Nikolaevich Krylov (1863–1945): Russian mathematician; Prof. at the Sov. Academy of
Sciences in St. Petersburg; contributions to Fourier Analysis and differential equations, applications in
ship building.
132 Iterative Methods for Linear Algebraic Systems
t−1
(dt , Adi ) = (−g t , Adi ) + βjt−1 (dj , Adi ) = (−g t + βit−1 di , Adi ). (3.3.80)
j=0
For i < t − 1, we have (g t , Adi ) = 0 since Adi ∈ Kt (d0 ; A) and, consequently, βit−1 = 0.
For i = t − 1, the condition
The next iterates xt+1 and g t+1 = Axt+1 − b are then determined by
(g t, dt )
αt = − , xt+1 = xt + αt dt , g t+1 = g t + αt Adt . (3.3.83)
(dt , Adt )
These are the recurrence equations of the CG method. By construction there holds
g t 2 g t+1 2
αt = , βt = , (3.3.87)
(dt , Adt ) g t 2
are automatically A-orthogonal. This implies that the vectors d0 , . . . , dt are linearly
independent and that therefore span{d0, . . . , dn−1 } = Rn . We formulate the properties of
the CG method derived so far in the following theorem.
Theorem 3.10 (CG method): Let the matrix A ∈ Rn×n be symmetric positive defi-
nite. Then, (assuming exact arithmetic) the CG method terminates for any starting vector
x0 ∈ Rn after at most n steps at xn = x . In each step there holds
Q(xt ) = min
0
Q(y), (3.3.88)
y∈x +Bt
and, equivalently,
In view of the result of Theorem 3.10 the CG method formally belongs to the class of
“direct” methods. In practice, however, it is used like an iterative method, since:
1. Because of round-off errors the descent directions dt are not exactly A-orthogonal
such that the iteration does not terminate.
2. For large matrices one obtains accurate approximations already after t n itera-
tions.
As preparation for the main theorem about the convergence of the CG method, we provide
the following auxiliary lemma.
Lemma 3.5 (Polynomial norm bounds): Let A be a symmetric positive definite ma-
trix with spectrum σ(A) ⊂ [a, b] . Then, for any polynomial p ∈ Pt , p(0) = 1 there holds
with the natural matrix norm · A generated from the A-norm · A . Let λi , i =
1, . . . , n, be the eigenvalues and {w 1 , . . . , w n } a corresponding ONS of eigenvectors of the
symmetric, positive definite matrix A. Then, for arbitrary y ∈ Rn there holds
n
y= γ i wi , γi = (y, wi ),
i=1
and, consequently,
n
n
p(A)y2A = λi p(λi )2 γi2 ≤ M 2 λi γi2 = M 2 y2A .
i=1 i=1
This implies
p(A)yA
p(A)A = sup ≤ M,
y∈Rn , y
=0 yA
which completes the proof. Q.E.D.
with the spectral condition number κ = cond2 (A) = Λ/λ of A . For reducing the initial
error by a factor TOL the following number of iteration is required:
√
t(TOL) ≈ 12 κ ln(2/TOL). (3.3.92)
# 1 − 6λ/Λ t
$
min sup |p(μ)| ≤ 2 6 .
p∈Pt , p(0)=1 λ≤μ≤Λ 1 + λ/Λ
3.3 Descent methods 135
This is again a problem of approximation theory with respect to the max-norm (Chebyshev
approximation), which can be solved using the Chebyshev polynomials described above
in Subsection 3.2.2. The solution pt ∈ Pt is give by
Λ + λ − 2μ Λ + λ −1
pt (μ) = Tt Tt ,
Λ−λ Λ−λ
with the t-th Chebyshev polynomial Tt on [−1, 1] . There holds
Λ + λ −1
sup pt (μ) = Tt .
λ≤μ≤Λ Λ−λ
Hence,
√κ − 1 t
sup pt (μ) ≤ 2 √ ,
λ≤μ≤Λ κ+1
which implies (3.3.91).
ii) For deriving (3.3.92), we require
√κ − 1 t(ε)
2 √ ≤ TOL,
κ+1
and, equivalently,
2 √κ + 1 −1
t(TOL) > ln ln √ .
TOL κ−1
Since ! %
x+1 1 1 1 1 1 2
ln =2 + + + ... ≥ ,
x−1 x 3x 3 5x 5 x
1√
this is satisfied for t(TOL) ≥ 2 κ ln(2/TOL) . Q.E.D.
√
Since κ = condnat (A) > 1, we have κ < κ. Observing that the function f (λ) =
(1 − λ−1 ) (1 + λ−1 )−1 is strictly monotonically increasing for λ > 0 (f (λ) > 0), there
holds
136 Iterative Methods for Linear Algebraic Systems
√
1 − 1/ κ 1 − 1/κ
√ < ,
1 + 1/ κ 1 + 1/κ
implying that the CG method should converge faster than the gradient method. This is
actually the case in practice. Both methods converge the faster the smaller the condition
number is. However, in case Λ
λ , which is frequently the case in practice, even the
CG method is too slow. An acceleration can be achieved by so-called “preconditioning”,
which will be described below.
For solving a general linear system Ax = b, with regular but not necessarily symmetric
and positive definite matrix A ∈ Rn , by the CG method, one may consider the equivalent
system
AT Ax = AT b (3.3.93)
with the symmetric, positive definite matrix AT A. Applied to this system the CG method
takes the following form:
Since cond2 (AT A) ≈ cond2 (A)2 the convergence of this variant of the CG method may
be rather slow. However, its realization does not require the explicit evaluation of the
matrix product AT A but only the computation of the matrix-vector products z = Ay
and AT z .
On the basis of the formulation (3.3.75) the standard CG method is limited to linear
systems with symmetric, positive definite matrices. But starting from the (in this case
equivalent) Galerkin formulation (3.3.76) the method becomes meaningful also for more
general matrices. In fact, in this way one can derive effective generalizations of the CG
method also for nonsymmetric and even indefinite matrices. These modified CG methods
are based on the Galerkin equations (3.3.76) and differ in the choices of “ansatz spaces”
3.3 Descent methods 137
This leads to the general class of “Krylov space methods”. Most popular representatives
are the following methods, which share one or the other property with the original CG
method but generally do not allow for a similarly complete error analysis.
Axt − b = min
0
Ay − b. (3.3.96)
y∈x +Kt
Since this method minimizes the residual over spaces of increasing dimension as
the CG method also the GMRES methods yields the exact solution after at most
n steps. However, for general nonsymmetric matrices the iterates xt cannot be
obtained by a simple tree-term recurrence as in the CG method. It uses a full
recurrence, which results in high storage requirements. Therefore, to limit the costs
the GMRES method is stopped after a certain number of steps, say k steps, and
then restarted with xk as new starting vector. The latter variant is denoted by
“GMRES(k) method”.
In the BiCG method the iterates xt are obtained by a three-term recurrence but
for an unsymmetric matrix the residual minimization property gets lost and the
method may not even converge. Additional stability is provided in the “BiCGstab”
method.
Both methods, GMRES(k) and BiCGstab, are especially designed for unsymmetric but
definite matrices. They have there different pros and cons and are both not universally
applicable. One can construct matrices for which one or the other of the methods does
not work. The methods for the practical computation of the iterates xt in the Krylov
spaces Kt are closely related to the Lanczos and Arnoldi algorithms used for solving the
corresponding eigenvalue problems discussed in Chapter 4, below.
138 Iterative Methods for Linear Algebraic Systems
The error estimate (3.3.91) for the CG method indicates a particularly good convergence
if the condition number of the matrix A is close to one. In case of large cond2 (A)
1 ,
one uses “preconditioning”, i. e., the system Ax = b is transformed into an equivalent
one, Ãx̃ = b̃ with a better conditioned matrix Ã. To this end, let C be a symmetric,
positive definite matrix, which is explicitly given in product form
C = KK T , (3.3.98)
with a regular matrix K . The system Ax = b can equivalently be written in the form
K −1 A (K T )−1 K T x = K −1 b . (3.3.99)
à x̃ b̃
Then, the CG method is formally applied to the transformed system Ãx̃ = b̃ , while it is
hoped that cond2 (Ã) cond2 (A) for an appropriate choice of C . The relation
shows that for C ≡ A the matrix à is similar to I, and thus cond2 (Ã) = cond2 (I) = 1.
Consequently, one chooses C = KK T such that C −1 is a good approximation to A−1 .
The CG method for the transformed system Ãx̃ = b̃ can then be written in terms of
the quantities A, b, and x as so-called “PCG method” (“Preconditioned CG” method)
as follows:
Compared to the normal CG method the PCG iteration in each step additionally requires
the solution of the system Cρt+1 = r t+1 , which is easily accomplished using the decompo-
sition C = KK T . In order to preserve the work complexity O(n) a. op. in each step the
triangular matrix K should have a sparsity pattern similar to that of the lower triangular
part L of A . This condition is satisfied by the following popular preconditioners:
1) Diagonal preconditioning (scaling): C := D = D 1/2 D 1/2 .
The scaling ensures that the elements of A are brought to approximately the same size,
especially with ãii = 1 . This reduces the condition number since
max1≤i≤n aii
cond2 (A) ≥ . (3.3.101)
min1≤i≤n aii
Example: The matrix A = diag{λ1 = ... = λn−1 = 1, λn = 10k } has the condition number
3.3 Descent methods 139
cond2 (A) = 10k , while the scaled matrix à = D −1/2 AD −1/2 has the optimal condition
number cond2 (Ã) = 1.
2) SSOR preconditioning: We choose
C := (D + L)D −1 (D + LT ) = D + L + LT + LD −1 LT
= (D
1/2
+LD −1/2)(D
1/2
D −1/2 LT),
+
K KT
or, more generally, involving a relaxation parameter ω ∈ (0, 2),
1 1 1 −1 1
C := D+L D D + LT
2−ω ω ω ω
1 −1/2 1
=6 1/2
(D + ωLD )6 (D 1/2 + ωD −1/2 LT ).
(2−ω)ω (2−ω)ω
K KT
Obviously, the triangular matrix K has the same sparsity pattern as L . Each step of the
preconditioned iteration costs about twice as much work as the basic CG method. For an
optimal choice of the relaxation parameter ω (not easy to determine) there holds
6
cond2 (Ã) = cond2 (A).
The matrix L generally has nonzero elements in the whole band of A, which requires
much more memory than A itself. This can be avoided by performing (such as in the
ILU approach discussed in Subsection 3.1.2) only an “incomplete” Cholesky decomposition
where within the elimination process some of the lji are set to zero, e. g., those for which
aji = 0. This results in an incomplete decomposition
A = L̃L̃T + E (3.3.102)
with a lower triangular matrix L̃ = (˜lij )ni,j=1, which has a similar sparsity pattern as A .
In this case, one speaks of the “ICCG(0) variant”. In case of a band matrix A, one
may allow the elements of L̃ to be nonzero in further p off-diagonals resulting in the
so-called “ICCG(p) variant” of the ICCG method, which is hoped to provide a better
approximation C −1 ≈ A−1 for increasing p. Then, for preconditioning the matrix
C = KK T := L̃L̃T (3.3.103)
140 Iterative Methods for Linear Algebraic Systems
is used. Although, there is no full theoretical justification yet for the success of the
ICCG preconditioning practical tests show a significant improvement in the convergence
behavior. This may be due to the fact that, though the condition number is not necessar-
ily decreased, the eigenvalues of the corresponding transformed matrix à cluster more
around λ = 1.
At the end of the discussion of the classical iterative methods for solving linear systems
Ax = b , we will determine their convergence rates for the model situation already de-
scribed in Section 0.4.2 of Chapter 0. We consider the so-called “1-st boundary value
problem of the Laplace operator”
∂2u ∂2u
− 2
(x, y) − 2 (x, y) = f (x, y) for (x, y) ∈ Ω
∂x ∂y (3.4.104)
u(x, y) = 0 for (x, y) ∈ ∂Ω,
on the unit square Ω = (0, 1) × (0, 1) ⊂ R2 . For solving this problem the domain Ω is
covered by a uniform mesh as shown in Fig. 3.4.
1 2 3 4
1
h= m+1
mesh size
5 6 7 8
9 10 11 12 n = m2 number of unknown
mesh values
13 14 15 16
h
The “interior” mesh points are numbered row-wise. On this mesh the second deriva-
tives in the differential equation (3.4.104) are approximated by second-order central dif-
ference quotients leading to the following difference equations for the mesh unknowns
U(x, y) ≈ u(x, y) :
# $
−h−2 U(x+h, y) − 2U(x, y) + U(x−h, y) + U(x, y+h) − 2U(x, y) + U(x, y−h) = f (x, y).
Observing the boundary condition u(x, y) = 0 for (x, y) ∈ ∂Ω this set of difference
3.4 A model problem 141
Ax = b, (3.4.105)
for the vector x ∈ Rn of unknown mesh values xi ≈ u(Pi ) , Pi interior mesh point. The
matrix A has the already known form
⎡ ⎤⎫ ⎡ ⎤⎫
B −I ⎪
⎪ 4 −1 ⎪
⎪
⎢ ⎥⎪
⎪
⎪ ⎢ ⎥⎪
⎪
⎪
⎢ −I B −I ⎥⎬ ⎢ −1 4 −1 ⎥⎬
⎢ ⎥ ⎢ ⎥
A= ⎢ . ⎥ n B = ⎢ . ⎥ m
⎢
⎣ −I B . . ⎥ ⎪
⎦⎪
⎪
⎢
⎣ −1 4 .. ⎥ ⎦
⎪
⎪
⎪
⎪
⎪ ⎪
⎪
. ⎭ . ⎭
.. .. .. ..
. .
with the m×m-unit matrix I . The right-hand side is given by b = h2 (f (P1 ), . . . , f (Pn ))T .
The matrix A has several special properties:
- “sparse band matrix” with bandwidth 2m + 1 ;
- “irreducible” and “strongly diagonally dominant”;
- “symmetric” and “positive definite”;
- “consistently ordered”;
- “of nonnegative type” (“M-matrix”): aii > 0, aij ≤ 0, i = j.
The importance of this last property will be illustrated in an exercise.
For this matrix eigenvalues and eigenvectors can be explicitly determined (h = 1/(m+1)) :
Hence,
π2 2
ρ := spr(J) = μmax (J) = cos[hπ] = 1 − h + O(h4 ). (3.4.107)
2
For the iteration matrices of the Gauß-Seidel and the optimal SOR iteration matrices,
142 Iterative Methods for Linear Algebraic Systems
Now, we make a comparison of the convergence speed of the various iterative methods
considered above. The reduction of the initial error x(0) − x2 in a fixed-point iteration
by the factor ε 1 requires about T (ε) iterations,
ln(1/ε)
T (ε) ≈ , ρ = spr(B), B = I − C−1 A iteration matrix. (3.4.110)
ln(1/ρ)
ln(1/ε) ln(1/ε) 2
TJ (ε) ≈ − ≈ 2 2 2 = 2 n ln(1/ε),
ln(1 − 2 h )
π2 2 π h π
ln(1/ε) ln(1/ε) 1
TGS (ε) ≈ − ≈ = 2 n ln(1/ε),
ln(1 − π 2 h2 ) π 2 h2 π
ln(1/ε) ln(1/ε) 1√
TSOR (ε) ≈ − ≈ = n ln(1/ε).
ln(1 − 2πh) 2πh 2π
The gradient method and the CG method require for the reduction of the initial error
x0 − x2 by the factor ε 1 the following numbers of iterations:
1 2 2
TG (ε) = κ ln (2/ε) ≈ 2 2 ln(1/ε) ≈ 2 n ln(1/ε),
2 π h π
1√ 1 1√
TCG (ε) = κ ln(2/ε) ≈ ln(2/ε) ≈ n ln(2/ε).
2 πh π
We see that the Jacobi method and the gradient method converge with about the same
speed. The CG method is only half as fast as the (optimal) SOR method, but it does
not require the determination of an optimal parameter (while the SOR method does not
require the matrix A to be symmetric). The Jacobi method with Chebyshev acceleration
is as fast as the “optimal” SOR method but also does not require the determination of
an optimal parameter (but a guess for spr(J) ).
For the special right-hand side function f (x, y) = 2π 2 sin(πx) sin(πy) the exact solu-
tion of the boundary value problem is given by
The error caused by the finite difference discretization considered above can be estimated
3.4 A model problem 143
as follows:
π4 2
max |u(Pi ) − xi | ≤ h + O(h4 ). (3.4.112)
Pi 12
Hence, for achieving a relative accuracy of TOL = 10−3 (three decimals) a mesh size
√
12 −3/2
h ≈ 10 ≈ 10−2 ,
π2
is required. This results in n ≈ 104 unknowns. In this case, we obtain for the above
spectral radii, conditions numbers and numbers of iterations required for error reduction
by ε = 10−4 (including a safety factor of 1/10) the following values (ln(1/ε) ∼ 10) :
For the comparison of the various solution methods, we also have to take into account
the work in each iteration step. For the number “OP” of “a. op.” (1 multiplication + 1
addition) per iteration step there holds:
As final result, we see that the computation of the approximate solution of the boundary
value model problem (3.4.104) with a prescribed accuracy TOL by the Jacobi method,
the Gauß-Seidel method and the gradient method requires O(n2 ) a. op. In this case
a direct method such as the Cholesky algorithm requires O(n2 ) = O(m2 n) a. op. but
significantly more storage space. The (optimal) SOR method and the CG method only
require O(n3/2 ) a. op.
For the model problem with n = 104 , we have the following total work “TW” required
for the solution of the system (3.4.105) to discreetization accuracy ε = 10−4 :
has optimal solution complexity O(n) . For such a multigrid (“MG”) method, we can
expect work counts like TWM G ≈ 4 · 25n ≈ 106 a. op..
Remark 3.8: For the 3-dimensional version of the above model problem, we have
8
λmax ≈ 12h−2 , λmin ≈ 3π 2 , κ ≈ ,
3π 2 h2
and consequently the same estimates for ρJ , ρGS and ρSOR as well as for the iteration
numbers TJ , TGS , TSOR, TCG , as in the 2-dimensional case. In this case the total work
per iteration step is OPJ , OPGS , OPSOR ≈ 8 N, OPCG ≈ 12 N . Hence, the resulting
total work amounts to
while that for the multigrid method increases only to TWM G ≈ 4 · 50n ≈ 2 · 108 a. op..
Remark 3.9: For the interpretation of the above work counts, we have to consider the
computing power of available computer cores, e. g., 200 MFlops (200 million “floating-
point” oper./sec.) of a standard desktop computer. Here, the solution of the 3-dimensional
model problem by the optimal SOR method takes about 1, 5 minutes while the multigrid
method only needs less than 1 second.
3.5 Exercises
What are the limits of the iterates in case of convergence? (Hint: The eigenvalues of the
matrices B are to be estimated. This can be done via appropriate matrix norms or also
via the determinants.
is to be solved by the Jacobi and the Gauß-Seidel method. How many iterations are
approximately (asymptotically) required for reducing the initial error x0 − x2 by the
factor 10−6 ? (Hint: Use the error estimate stated in the text.)
Exercise 3.3: Show that the two definitions of “irreducibility” of a matrix A ∈ Rn×n
given in the text are equivalent.
Hint: Use the fact that "reducibility" of the system Ax = b , i. e., the existence of
simultaneous row and column permutations resulting in

  P^T A P = Ã = ⎛ Ã_11    0   ⎞ ,    Ã_11 ∈ R^{p×p} , Ã_22 ∈ R^{q×q} , n = p + q ,
                 ⎝ Ã_21  Ã_22 ⎠

means that the subsystem for the first p unknowns can be solved independently of the
remaining q unknowns.
Exercise 3.4: Examine the convergence of the Jacobi and Gauss-Seidel methods for
solving the linear system Ai x = b (i = 1, 2) for the following two matrices
        ⎡ 2  −1   2 ⎤              ⎡  5   5   0 ⎤
  A1 =  ⎢ 1   2  −2 ⎥ ,      A2 =  ⎢ −1   5   4 ⎥ .
        ⎣ 2   2   2 ⎦              ⎣  2   3   8 ⎦
(Hint: Use the convergence criteria stated in the text, or estimate the spectral radius)
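As a purely numerical cross-check (not a replacement for the requested argument), one can assemble the Jacobi and Gauß-Seidel iteration matrices explicitly and compute their spectral radii, e.g. with numpy; the decomposition A = L + D + R is the one used in the text.

import numpy as np

def spectral_radii(A):
    """Spectral radii of the Jacobi and Gauss-Seidel iteration matrices."""
    D = np.diag(np.diag(A))
    L = np.tril(A, -1)
    R = np.triu(A, 1)
    J  = -np.linalg.solve(D, L + R)      # Jacobi iteration matrix
    H1 = -np.linalg.solve(D + L, R)      # Gauss-Seidel iteration matrix
    rho = lambda B: max(abs(np.linalg.eigvals(B)))
    return rho(J), rho(H1)

A1 = np.array([[2.0, -1.0, 2.0], [1.0, 2.0, -2.0], [2.0, 2.0, 2.0]])
A2 = np.array([[5.0, 5.0, 0.0], [-1.0, 5.0, 4.0], [2.0, 3.0, 8.0]])
for name, A in (("A1", A1), ("A2", A2)):
    rJ, rGS = spectral_radii(A)
    print(f"{name}: spr(J) = {rJ:.3f}, spr(H1) = {rGS:.3f}")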
for which the spectral radius of the iteration matrix Bω becomes minimal and sketch the
graph of the function f (ω) = spr(Bω ) .
Exercise 3.6: Let A ∈ R^{n×n} be a symmetric (and therefore diagonalizable) matrix with
eigenvalues λ_i ∈ R, i = 1, . . . , n. Show that for any polynomial p ∈ P_k there holds

  ‖p(A)‖_2 = max_{i=1,...,n} |p(λ_i)| .
Exercise 3.7: For the computation of the inverse A−1 of a regular matrix A ∈ Rn×n
the following two fixed-point iterations are considered:
Give (sufficient) criteria for the convergence of these iterations. For this task (computation
of a matrix inverse), what would the Newton iteration look like?
Exercise 3.9: The method of Chebyshev acceleration can be applied to any convergent
fixed-point iteration
xt = Bxt−1 + c, t = 1, 2, . . . ,
with symmetric iteration matrix B. Here, the symmetry of B guarantees the relation
‖p(B)‖_2 = spr(p(B)) = max_{λ∈σ(B)} |p(λ)| for any polynomial p ∈ P_k , which is crucial for
the analysis of the acceleration effect. In the text this has been carried out for the SSOR
(Symmetric Successive Over-Relaxation) method. Repeat the steps of this analysis for
the Jacobi method for solving the linear system Ax = b with symmetric matrix A ∈ Rn×n .
with a symmetric positive definite matrix A ∈ R^{n×n} and a not necessarily square
matrix B ∈ R^{n×m} , m ≤ n . The coefficient matrix cannot be positive definite since some
of its main diagonal elements are zero, so that most of the iterative methods discussed in the
text cannot be applied directly to this system.
i) Assume that the coefficient matrix is regular. Can the damped Richardson method,
  ⎛ x^t ⎞     ⎡ ⎛ I  O ⎞      ⎛ A    B ⎞ ⎤ ⎛ x^{t−1} ⎞      ⎛ b ⎞
  ⎝ y^t ⎠  =  ⎣ ⎝ O  I ⎠ − θ ⎝ B^T  O ⎠ ⎦ ⎝ y^{t−1} ⎠ + θ ⎝ c ⎠ ,
be made convergent in this case for appropriately chosen damping parameter θ ? (Hint:
Investigate whether the coefficient matrix may have positive AND negative eigenvalues.)
ii) A classical approach to solving this saddle-point system is based on the equivalent
“Schur-complement formulation”:
Exercise 3.11: The general “descent method” for the iterative solution of a linear system
Ax = b with symmetric positive definite matrix A ∈ RN ×N has the form
The so-called "Coordinate Relaxation" uses descent directions r^t , which are obtained by
cycling through the Cartesian unit vectors {e^1 , . . . , e^n } . Verify that a full n-cycle of
this method is equivalent to one step of the Gauß-Seidel iteration.
AT Ax = AT b.
The square matrix AT A is symmetric and also positive definite, provided A has full rank.
Formulate the CG method for solving the normal equation without explicitly computing
the matrix product AT A. How many matrix-vector products with A are necessary per
iteration (compared to the CG method applied to Ax = b)? Relate the convergence speed
of this iteration to the singular values of the matrix A.
Exercise 3.13: For solving a linear system Ax = b with symmetric positive definite co-
efficient matrix A one may use the Gauß-Seidel, the (optimal) SOR method, the gradient
method, or the CG method. Recall the estimates for the asymptotic convergence speed
of these iterations expressed in terms of the spectral condition number κ = cond2 (A) and
compare the corresponding performance results.
In order to derive convergence estimates for the Gauß-Seidel and (optimal) SOR method,
assume that A is consistently ordered and that the spectral radius of the Jacobi iteration
matrix is given by
spr(J) = 1 − 1/κ .
Discuss the pros and cons of the considered methods.
Exercise 3.14: Consider the symmetric “saddle point system” from Exercise 3.10
  ⎛ A    B ⎞ ⎛ x ⎞     ⎛ b ⎞
  ⎝ B^T  O ⎠ ⎝ y ⎠  =  ⎝ c ⎠ ,
with a symmetric positive definite matrix A ∈ R^{n×n} and a not necessarily square
matrix B ∈ R^{n×m} , m ≤ n , with full rank. The coefficient matrix cannot be positive
definite since some of its main diagonal elements are zero.
A classical approach to solving this saddle-point system is based on the equivalent "Schur-
complement formulation":

  B^T A^{-1} B y = B^T A^{-1} b − c ,    A x = b − B y .
Exercise 3.15: For the gradient method and the CG method for a symmetric, positive
definite matrix A there hold the error estimates
  ‖x^t_grad − x‖_A ≤ ( (1 − 1/κ)/(1 + 1/κ) )^t ‖x^0_grad − x‖_A ,

  ‖x^t_cg − x‖_A ≤ 2 ( (1 − 1/√κ)/(1 + 1/√κ) )^t ‖x^0_cg − x‖_A ,
with the condition number κ := cond2 (A) = λmax /λmin . Show that for reducing the
initial error by a factor ε the following numbers of iterations are required:

  t_grad(ε) ≈ ½ κ ln(1/ε) ,    t_cg(ε) ≈ ½ √κ ln(2/ε) .
Exercise 3.16: The SSOR preconditioning of the CG method for a symmetric, positive
definite matrix A with the usual additive decomposition A = L + D + LT uses the
parameter dependent matrix
  C := 1/(2−ω) · ( (1/ω) D + L ) ( (1/ω) D )^{-1} ( (1/ω) D + L^T ) ,    ω ∈ (0, 2).
Write this matrix in the form C = KK T with a regular, lower-triangular matrix K and
explain why C −1 may be viewed as an approximation to A−1 .
Exercise 3.17: The model matrix A ∈ Rn×n , n = m2 , originating from the 5-point
discretization of the Poisson problem on the unit square,
        ⎡  B  −I           ⎤                     ⎡  4  −1           ⎤
        ⎢ −I   B  −I       ⎥                     ⎢ −1   4  −1       ⎥
   A =  ⎢     −I   B   ⋱  ⎥  (dimension n),  B = ⎢     −1   4   ⋱  ⎥  (dimension m).
        ⎣           ⋱   ⋱ ⎦                     ⎣           ⋱   ⋱ ⎦
Show that the inverse A^{-1} = (a^{(-1)}_{ij})^n_{i,j=1} has nonnegative elements a^{(-1)}_{ij} ≥ 0 , i. e., A is a
so-called "M-matrix" ("(inverse) monotone" matrix). This implies that the solution x of
a linear system Ax = b with nonnegative right-hand side b, bi ≥ 0 , is also nonnegative
xi ≥ 0. (Hint: consider the Jacobi matrix J = −D −1 (L + R) and the representation of
the inverse (I − J)−1 as a Neumann series.)
Exercise 3.18: In the text, we formulated the sequence of iterates {xt }t≥1 of the CG-
method formally as the solution xt of the optimization problem
with the Krylov spaces K_t(d^0; A) = span{d^0, Ad^0, · · · , A^{t−1} d^0}. The so-called "Gener-
alized minimal residual method" (GMRES), instead, formally constructs a sequence of
iterates {x^t_gmres}_{t≥1} by
i) Prove that the GMRES method allows for an error inequality similar to the one that
was derived for the CG method:
ii) Prove that in case of A being a symmetric, positive definite matrix, this leads to the
same asymptotic convergence rate as for the CG method.
iii) Show that the result obtained in (i) can also be applied to the case of A being similar
to a diagonal matrix D = diag_i(λ_i) ∈ C^{n×n} , i. e.,
  A = T D T^{-1} ,  with a regular matrix T ∈ C^{n×n} .
What makes this result rather cumbersome in contrast to the case of a symmetric, positive
matrix discussed in (ii)?
Remark: The advantage of the GMRES method lies in the fact that it is, in principle,
applicable to any regular matrix A . However, good convergence estimates for the general
case are hard to prove.
Exercise 3.19: Repeat the analysis of the convergence properties of the various solution
methods for the 3-dimensional version of the model problem considered in the text. The
underlying boundary value problem has the form
  − ( ∂²/∂x² + ∂²/∂y² + ∂²/∂z² ) u(x, y, z) = f(x, y, z) ,   (x, y, z) ∈ Ω = (0, 1)³ ⊂ R³ ,
  u(x, y, z) = 0 ,   (x, y, z) ∈ ∂Ω ,
Using again row-wise numbering of the mesh points the resulting linear system for the
mesh values Uijk ≈ u(Pijk ) takes the form
       ⎡  B    −I_{m²}      ⎤         ⎡  C   −I_m       ⎤         ⎡  6  −1       ⎤
  A =  ⎢ −I_{m²}  B    ⋱   ⎥ ,  B =  ⎢ −I_m   C   ⋱   ⎥ ,  C =  ⎢ −1   6  ⋱   ⎥ ,
       ⎣             ⋱   ⋱ ⎦         ⎣            ⋱  ⋱ ⎦         ⎣          ⋱ ⋱ ⎦
  of dimensions n = m³, m², and m, respectively.
In this case the corresponding eigenvalues and eigenvectors are explicitly given by
  λ_ijk = 6 − 2 ( cos[ihπ] + cos[jhπ] + cos[khπ] ) ,   i, j, k = 1, . . . , m,
  w^{ijk} = ( sin[pihπ] sin[qjhπ] sin[rkhπ] )^m_{p,q,r=1} .
For the exact solution u(x, y, z) = sin(πx) sin(πy) sin(πz) there holds the error estimate
  max_Ω |U_ijk − u(P_ijk)| ≤ (π⁴/8) h² + O(h⁴) ,
which dictates a mesh size h = 10−2 in order to guarantee a desired relative discretization
accuracy of TOL = 10−3 .
a) Determine formulas for the condition number cond2 (A) and the spectral radius spr(J)
in terms of the mesh size h.
b) Give the number of iterations of the Jacobi, Gauß-Seidel and optimal SOR method as
well as the gradient and CG method approximately needed for reducing the initial error
to size ε = 10−4 (including a small safety factor).
c) Give a rough estimate (in terms of h) of the total number of a. op. per iteration step
for the methods considered.
4 Iterative Methods for Eigenvalue Problems
4.1 Methods for the partial eigenvalue problem
In this section, we discuss iterative methods for solving the partial eigenvalue problem of
a general matrix A ∈ Kn×n .
Definition 4.1: The "Power method" of v. Mises1 generates, starting from some initial
point z^0 ∈ C^n with ‖z^0‖_2 = 1, a sequence of iterates z^t ∈ C^n , t = 1, 2, . . . , by

  z^t := ‖A z^{t−1}‖_2^{-1} A z^{t−1} ,    (4.1.1)

together with the eigenvalue approximations

  λ_t := (A z^t)_r / z^t_r ,    r ∈ {1, . . . , n} : |z^t_r| = max_{j=1,...,n} |z^t_j| .    (4.1.2)
In practice, this is not really a restrictive assumption since, due to round-off errors, it will
be satisfied in general.
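For illustration, a minimal Python/numpy realization of the iteration (4.1.1)/(4.1.2); the random start vector, the stopping criterion and the test matrix are choices made here and are not part of the definition.

import numpy as np

def power_method(A, tol=1e-10, maxit=1000, seed=0):
    """Power iteration z^t = A z^{t-1}/||A z^{t-1}||_2 with the eigenvalue
    approximation lambda_t = (A z^t)_r / z^t_r at a maximal component r."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(A.shape[0])
    z /= np.linalg.norm(z)
    lam_old = np.inf
    for _ in range(maxit):
        w = A @ z
        z = w / np.linalg.norm(w)
        Az = A @ z
        r = np.argmax(np.abs(z))            # index of a maximal component
        lam = Az[r] / z[r]
        if abs(lam - lam_old) < tol * abs(lam):
            break
        lam_old = lam
    return lam, z

# Example: modulus-wise largest eigenvalue of a random symmetric matrix.
B = np.random.default_rng(1).standard_normal((50, 50))
A = B + B.T
lam, z = power_method(A)
evals = np.linalg.eigvalsh(A)
print(lam, evals[np.argmax(np.abs(evals))])   # reference value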
Theorem 4.1 (Power method): Let the matrix A be diagonalizable and let the eigen-
value with largest modulus be separated from the other eigenvalues, i. e., |λn | > |λi |, i =
1, . . . , n − 1. Further, let the starting vector z 0 have a nontrivial component with respect
to the eigenvector w n . Then, there are numbers σt ∈ C, |σt | = 1 such that
z t − σt w n → 0 (t → ∞), (4.1.4)
1
Richard von Mises (1883–1953): Austrian mathematician; Prof. of applied Mathematics in Straßburg
(1909-1918), in Dresden and then founder of the new Institute of Applied Mathematics in Berlin (1919-
1933), emigration to Turkey (Istanbul) and eventually to the USA (1938); Prof. at Harvard University;
important contributions to Theoretical Fluid Mechanics (introduction of the “stress tensor”), Aerodynam-
ics, Numerics, Statistics and Probability Theory.
Furthermore,
  A^t z^0 = Σ_{i=1}^{n} α_i λ_i^t w^i = λ_n^t { α_n w^n + Σ_{i=1}^{n−1} α_i (λ_i/λ_n)^t w^i }
and consequently, since |λ_i/λ_n| < 1, i = 1, . . . , n − 1 ,

  A^t z^0 = λ_n^t α_n { w^n + o(1) }   (t → ∞).

This implies
  z^t = λ_n^t α_n { w^n + o(1) } / ( |λ_n^t α_n| ‖w^n + o(1)‖_2 )
      = ( λ_n^t α_n / |λ_n^t α_n| ) w^n + o(1) ,    σ_t := λ_n^t α_n / |λ_n^t α_n| .
The iterates z^t converge to span{w^n}. Further, since α_n ≠ 0, it follows that

  λ_t := (A z^t , z^t)_2 ,   ‖z^t‖_2 = 1 .    (4.1.6)
In this case {w1 , . . . , wn } can be chosen as ONB of eigenvectors such that there holds
  λ_t = (A^{t+1} z^0 , A^t z^0)_2 / ‖A^t z^0‖_2²
      = Σ_{i=1}^{n} |α_i|² λ_i^{2t+1} / Σ_{i=1}^{n} |α_i|² λ_i^{2t}
      = λ_n · ( |α_n|² + Σ_{i=1}^{n−1} |α_i|² (λ_i/λ_n)^{2t+1} ) / ( |α_n|² + Σ_{i=1}^{n−1} |α_i|² (λ_i/λ_n)^{2t} )
      = λ_max + O( (λ_{n−1}/λ_max)^{2t} ) .
Here, the convergence of the eigenvalue approximations is twice as fast as in the non-
Hermitian case.
Remark 4.1: The power method converges the faster, the better the modulus-
wise largest eigenvalue λ_n is separated from the other eigenvalues. The proof of conver-
gence can be extended to the case of diagonalizable matrices with multiple “maximum”
eigenvalue for which |λn | = |λi | necessarily implies λn = λi . For even more general,
non-diagonalizable matrices convergence is not guaranteed. The proof of Theorem 4.1
suggests that the constant in the convergence estimate (4.1.5) depends on the dimension
n and may therefore be very large for large matrices. The proof that this is actually not
the case is posed as an exercise.
For practical computation the power method is of only limited value, as its convergence is
very slow in general if |λn−1 /λn | ∼ 1. Further, it only delivers the “largest” eigenvalue. In
most practical applications the “smallest” eigenvalue is wanted, i. e., that which is closest
to zero. This is accomplished by the so-called “Inverse Iteration” of Wielandt2 . Here, it
is assumed that one already knows a good approximation λ̃ for an eigenvalue λk of the
matrix A to be computed (obtained by other methods, e. g., Lemma of Gershgorin, etc.)
such that
Definition 4.2: The “Inverse Iteration” consists in the application of the power method
to the matrix (A − λ̃I)−1 , where the so-called “shift” λ̃ is taken as an approximation to
the desired eigenvalue λk . Starting from an initial point z 0 the method generates iterates
z t as solutions of the linear systems
  (A − λ̃ I) z̃^t = z^{t−1} ,   z^t := ‖z̃^t‖_2^{-1} z̃^t ,   t = 1, 2, . . . ,

with corresponding eigenvalue approximations

  μ_t := [(A − λ̃ I)^{-1} z^t]_r / z^t_r ,    r ∈ {1, . . . , n} : |z^t_r| = max_{j=1,...,n} |z^t_j| ,    (4.1.10)
2
Helmut Wielandt (1910–2001): German mathematician; Prof. in Mainz (1946-1951) and Tübingen
(1951-1977); contributions to Group Theory, Linear Algebra and Matrix Theory.
In the evaluation of the eigenvalue approximations (4.1.10) and (4.1.11) the not yet
known vector z̃^{t+1} := (A − λ̃ I)^{-1} z^t is needed. Its computation requires carrying the
iteration, possibly unnecessarily, one step further by solving the corresponding linear
system (A − λ̃ I) z̃^{t+1} = z^t . This can be avoided by using the formulas
  λ_t := (A z^t)_r / z^t_r ,   or in the symmetric case   λ_t := (A z^t , z^t)_2 ,    (4.1.12)
We collect the above results for the special case of the computation of the “smallest”
eigenvalue λmin = λ1 of a diagonalizable matrix A in the following theorem.
Theorem 4.2 (Inverse Iteration): Let the matrix A be diagonalizable and suppose
that the eigenvalue with smallest modulus is separated from the other eigenvalues, i. e.,
|λ1 | < |λi |, i = 2, . . . , n. Further, let the starting vector z 0 have a nontrivial component
with respect to the eigenvector w 1 . Then, for the “Inverse Iteration” (with shift λ̃ := 0)
there are numbers σt ∈ C, |σt | = 1 such that
z t − σt w 1 → 0 (t → ∞), (4.1.16)
Remark 4.2: The inverse iteration allows the approximation of any eigenvalue of A for
which a sufficiently good approximation is known, where “sufficiently good” depends on
the separation of the desired eigenvalue of A from the other ones. The price to be paid
for this flexibility is that each iteration step requires the solution of the nearly singular
system (A − λ̃ I) z̃^t = z^{t−1} . This means that the better the approximation λ̃ ≈ λ_k , i. e.,
the faster the convergence of the Inverse Iteration is, the more expensive is each iteration
step. This effect is further amplified if the Inverse Iteration is used with “dynamic shift”
λ̃ := λtk , in order to speed up its convergence.
In this variant each step requires the solution of the system (A − λ̃ I) z̃^t = z^{t−1} with the freshly updated shift λ̃ = λ_k^{t−1} .
Example 4.1: We want to apply the considered methods to the eigenvalue problem of
the model matrix from Section 3.4. The determination of vibration mode and frequency of
a membrane over the square domain Ω = (0, 1)2 (drum) leads to the eigenvalue problem
of the Laplace operator
  −∂²w/∂x²(x, y) − ∂²w/∂y²(x, y) = μ w(x, y)   for (x, y) ∈ Ω ,
  w(x, y) = 0   for (x, y) ∈ ∂Ω .     (4.1.18)
This eigenvalue problem in function space shares several properties with that of a sym-
metric, positive definite matrix in Rn . First, there are only countably many real, positive
eigenvalues with finite (geometric) multiplicities. The corresponding eigenspaces span
the whole space L²(Ω) . The smallest of these eigenvalues, μ_min > 0, and the associated
eigenfunction, wmin , describe the fundamental tone and the fundamental oscillation mode
of the drum. The discretization by the 5-point difference operator leads to the matrix
eigenvalue problem
Az = λz, λ = h2 μ, (4.1.19)
of the boundary value problem discussed in Section 3.4. Using the notation from above,
the eigenvalues of A are explicitly given by
We are interested in the smallest eigenvalue λmin of A , which by h−2 λmin ≈ μmin yields
an approximation to the smallest eigenvalue of problem (4.1.18). For λmin and the next
eigenvalue λ∗ > λmin there holds
For computing λmin , we may use the inverse iteration with shift λ = 0 . This requires in
each iteration the solution of a linear system like
Az t = z t−1 . (4.1.20)
  t(ε) ≈ log(ε h²) / log(2/5) ≈ log(n) .    (4.1.23)
This strategy for computing μ_min is not very efficient if each of the subproblems
(4.1.20) is solved accurately by the PCG method. For reducing the work, one may use an
iteration-dependent stopping criterion for the inner PCG iteration, by which its accuracy
is balanced against that of the outer inverse iteration.
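A sketch of this procedure in Python (scipy assumed): the matrix A − λ̃I is factored once by an LU decomposition, so that each step of the inverse iteration only needs one forward/backward substitution; here the subproblems are solved directly rather than by PCG. The 1D tridiagonal matrix used as test case merely stands in for the model matrix.

import numpy as np
from scipy.linalg import lu_factor, lu_solve

def inverse_iteration(A, shift=0.0, maxit=100, tol=1e-10):
    """Inverse iteration with fixed shift: solve (A - shift*I) ztilde = z,
    normalize, and take the Rayleigh quotient as eigenvalue approximation."""
    n = A.shape[0]
    lu, piv = lu_factor(A - shift * np.eye(n))   # factor once, reuse every step
    z = np.ones(n) / np.sqrt(n)
    lam_old = np.inf
    for _ in range(maxit):
        ztilde = lu_solve((lu, piv), z)
        z = ztilde / np.linalg.norm(ztilde)
        lam = z @ (A @ z)                        # Rayleigh quotient (A symmetric)
        if abs(lam - lam_old) < tol * abs(lam):
            break
        lam_old = lam
    return lam, z

# 1D toy matrix tridiag(-1, 2, -1) standing in for the model matrix.
m = 50
A = 2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)
lam_min, _ = inverse_iteration(A, shift=0.0)
h = 1.0 / (m + 1)
print(lam_min, 2.0 - 2.0 * np.cos(np.pi * h))    # exact smallest eigenvalue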
Remark 4.3: Another type of iterative methods for computing single eigenvalues of sym-
metric or nonsymmetric large-scale matrices is the “Jacobi-Davidson method” (Davidson
[30]), which is based on the concept of defect correction. This method will not be dis-
cussed in these lecture notes, we rather refer to the literature, e. g., Crouzeix et al. [29]
and Sleijpen & Van der Vorst [48].
4.2 Methods for the full eigenvalue problem
In this section, we consider iterative methods for solving the full eigenvalue problem of
an arbitrary matrix A ∈ Rn×n . Since these methods use successive factorizations of
matrices, which for general full matrices have arithmetic complexity O(n3 ), they are only
applied to matrices with special sparsity pattern such as general Hessenberg or symmetric
tridiagonal matrices. In the case of a general matrix therefore at first a reduction to such
special structure has to be performed (e. g., by applying Householder transformations as
discussed in Section 2.5.1). As an application of such a method, we discuss the computation
of the singular value decomposition of a general matrix. In order to avoid confusion
between “indexing” and “exponentiation”, in the following, we use the notation A(t)
instead of the short version At for elements in a sequence of matrices.
I) The “LR method” of Rutishauser3 (1958), starting from some initial guess A(1) := A,
generates a sequence of matrices A^(t) , t ∈ N, by the prescription

  A^(t) = L^(t) R^(t)  (LR decomposition) ,    A^(t+1) := R^(t) L^(t) ,   t = 1, 2, . . . .

Since
  A^(t+1) = R^(t) L^(t) = L^(t)^{-1} L^(t) R^(t) L^(t) = L^(t)^{-1} A^(t) L^(t) ,
all iterates A(t) are similar to A and therefore have the same eigenvalues as A. Under
certain conditions on A , one can show that, with the eigenvalues λi of A :
                                  ⎡ λ_1       ∗  ⎤
  lim_{t→∞} A^(t) = lim_{t→∞} R^(t) = ⎢      ⋱      ⎥ ,    lim_{t→∞} L^(t) = I .    (4.2.25)
                                  ⎣  0       λ_n ⎦
3
Heinz Rutishauser (1918–1970): Swiss mathematician and computer scientist; since 1962 Prof. at
ETH Zurich; contributions to Numerical Linear Algebra (LR method: “Solution of eigenvalue problems
with the LR transformation”, Appl. Math. Ser. nat. Bur. Stand. 49, 47-81(1958).) and Analysis as well
as to the foundation of Computer Arithmetik.
4
J. F. G. Francis: “The QR transformation. A unitary analogue to the LR transformation”, Computer
J. 4, 265-271 (1961/1962).
II) The "QR method" of Francis4 (1961), starting from A^(1) := A , generates a sequence
of matrices A^(t) , t ∈ N, by the prescription

  A^(t) = Q^(t) R^(t)  (QR decomposition) ,    A^(t+1) := R^(t) Q^(t) ,   t = 1, 2, . . . ,

where Q^(t) is unitary and R^(t) is an upper triangular matrix with positive diagonal
elements (in order to ensure its uniqueness). The QR decomposition can be obtained,
e. g., by employing Householder transformations. Because of the high costs of this method
for a general full matrix the QR method is economical only for Hessenberg matrices or,
in the symmetric case, only for tridiagonal matrices. Since
A(t+1) = R(t) Q(t) = Q(t)T Q(t) R(t) Q(t) = Q(t)T A(t) Q(t) ,
all iterates A(t) are similar to A and therefore have the same eigenvalues as A. The
proof of convergence of the QR method will use the following auxiliary lemma.
Lemma 4.1: Let E (t) ∈ Rn×n , t ∈ N, be regular matrices, which satisfy limt→∞ E (t) = I
and possess the QR decompositions E (t) = Q(t) R(t) with rii > 0. Then, there holds
Proof. Since
  ‖E^(t) − I‖_2 = ‖Q^(t) R^(t) − Q^(t) Q^(t)T‖_2 = ‖Q^(t) (R^(t) − Q^(t)T)‖_2 = ‖R^(t) − Q^(t)T‖_2 → 0 ,
it follows that q^(t)_jk → 0 (t → ∞) for j < k . In view of
  I = Q^(t)T Q^(t) ,
we conclude that

  q^(t)_jj → ±1 ,   q^(t)_jk → 0   (t → ∞),  j > k.
Hence Q(t) → diag(±1) (t → ∞). Since
Theorem 4.3 (QR method): Let the eigenvalues of the matrix A ∈ Rn×n be separated
with respect to their modulus, i. e., |λ1 | > |λ2 | > . . . > |λn | . Then, the matrices A(t) =
(a^(t)_jk)_{j,k=1,...,n} generated by the QR method converge like

  { lim_{t→∞} a^(t)_jj | j = 1, . . . , n } = { λ_1 , . . . , λ_n } .    (4.2.28)
Proof. The separation assumption implies that all eigenvalues of the matrix A are
simple. There holds
A(t) = R(t−1) Q(t−1) = Q(t−1)T Q(t−1) R(t−1) Q(t−1) = Q(t−1)T A(t−1) Q(t−1)
(4.2.29)
= . . . = [Q(1) . . . Q(t−1) ]T A[Q(1) . . . Q(t−1) ] =: P (t−1)T AP (t−1) .
and, consequently,
By the assumption on the separation of the eigenvalues λi , we have |λj /λk | < 1, j > k ,
which yields
N (t) → 0 , RN (t) R−1 → 0 (t → ∞).
Then, for the (uniquely determined) QR decomposition Q̃^(t) R̃^(t) = I + R N^(t) R^{-1} with
r̃^(t)_ii > 0, Lemma 4.1 implies
At = Q[I + RN (t) R−1 ]RΛt S = Q[Q̃(t) R̃(t) ]RΛt S = [QQ̃(t) ][R̃(t) RΛt S]
Since the QR decomposition of a matrix is unique up to the scaling of the column vectors
of the unitary matrix Q , there must hold
with certain diagonal matrices D^(t) = diag(±1). Then, recalling again the relation
(4.2.29) and observing that
A = W ΛW −1 = QRΛ[QR]−1 = QRΛR−1 QT ,
we conclude that
In case that W −1 does not possess an LR decomposition, then the eigenvalues λi do not
appear ordered according to their modulus. Q.E.D.
Remark 4.4: The separation assumption |λ1 | > |λ2 | > . . . > |λn | means that all eigen-
values of A are simple, which implies that A is necessarily diagonalizable. For more
general matrices the convergence of the QR method is not guaranteed. However, conver-
gence in a suitable sense can be shown in case of multiple eigenvalues (such as in the model
problem of Section 3.4). For a more detailed discussion, we refer to the literature, e. g.,
Deuflhard & Hohmann [33], Stoer & Bulirsch [50], Golub & Loan [36], and Parlett [44].
The speed of convergence of the QR method, i. e., the convergence of the off-diagonal
elements in A(t) to zero, is determined by the size of the quotients
  | λ_j / λ_k | < 1 ,   j > k .
The better the eigenvalues of A are separated with respect to their modulus, the faster
the convergence. This suggests using the QR algorithm with a "shift" σ , i. e., applying it
to the matrix A − σI , such that

  | (λ_j − σ)/(λ_k − σ) | ≪ | λ_j/λ_k | < 1 ,
for the most interesting eigenvalues. The QR method with (dynamic) shift starts from
some initial guess A^(1) = A and constructs a sequence of matrices A^(t) , t ∈ N, by the
prescription

  A^(t) − σ_t I = Q^(t) R^(t)  (QR decomposition) ,    A^(t+1) := R^(t) Q^(t) + σ_t I ,   t = 1, 2, . . . .
For this algorithm a modified version of the proof of Theorem 4.3 yields a convergence
estimate
  |a^(t)_jk| ≤ c | (λ_j − σ_1)/(λ_k − σ_1) | · · · | (λ_j − σ_t)/(λ_k − σ_t) | ,   j > k ,    (4.2.34)

for the lower off-diagonal elements of the iterates A^(t) = (a^(t)_jk)^n_{j,k=1} .
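The prescription above is easily illustrated numerically. The following Python toy sketch performs the shifted QR iteration with numpy; taking the lower-right diagonal entry as shift and omitting both the preliminary Hessenberg reduction and deflation are simplifications made here for brevity, so this is not how production codes proceed.

import numpy as np

def qr_iteration(A, maxit=200, tol=1e-10, use_shift=True):
    """Shifted QR iteration: A_t - sigma_t I = Q_t R_t, A_{t+1} = R_t Q_t + sigma_t I.
    The diagonal of the (approximately triangular) final iterate approximates
    the eigenvalues of A."""
    T = np.array(A, dtype=float, copy=True)
    n = T.shape[0]
    I = np.eye(n)
    for _ in range(maxit):
        sigma = T[-1, -1] if use_shift else 0.0
        Q, R = np.linalg.qr(T - sigma * I)
        T = R @ Q + sigma * I
        if np.max(np.abs(np.tril(T, -1))) < tol:   # lower part ~ 0: stop
            break
    return T

rng = np.random.default_rng(0)
B = rng.standard_normal((6, 6))
A = B.T @ B + np.eye(6)            # s.p.d. test matrix with separated eigenvalues
T = qr_iteration(A)
print(np.sort(np.diag(T)))
print(np.sort(np.linalg.eigvalsh(A)))   # reference eigenvalues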
Remark 4.5: For positive definite matrices the QR method converges twice as fast as the
corresponding LR method, but requires about twice as much work in each iteration. Under
certain structural assumptions on the matrix A , one can show that the QR method with
varying shifts converges with quadratic order for Hermitian tridiagonal matrices and even
with cubic order for unitary Hessenberg matrices (see Wang & Gragg [56]).
As with the LR method, for reasons of economy, the QR method is also applied only to pre-
reduced matrices for which the computation of the QR decomposition is of acceptable cost,
e. g., Hessenberg matrices, symmetric tridiagonal matrices or more general band matrices
with bandwidth 2m + 1 ≪ n = m² (e. g., the model matrix considered in Section 3.4).
This is justified by the following observation.
The numerically stable computation of the singular value decomposition (SVD) is rather
costly. For more details, we refer to the literature, e. g., the book by Golub & van Loan
[36]. The SVD of a matrix A ∈ Cn×k is usually computed by a two-step procedure. In the
first step, the matrix is reduced to a bidiagonal matrix. This requires O(kn2 ) operations,
assuming that k ≤ n . The second step is to compute the SVD of the bidiagonal matrix.
This step needs an iterative method since the problem to be solved is generically nonlinear.
For fixed accuracy requirements (e. g., round-off error level) this takes O(n) iterations,
each costing O(n) operations. Thus, the first step is more expensive and the overall
cost is O(kn2 ) operations (see Trefethen & Bau [54]). The first step can be done using
Householder reflections for a cost of O(kn2 + n3 ) operations, assuming that only the
singular values are needed and not the singular vectors.
The second step can then very efficiently be done by the QR algorithm. The LAPACK
subroutine DBDSQR[9] implements this iterative method, with some modifications to
cover the case where the singular values are very small. Together with a first step using
Householder reflections and, if appropriate, QR decomposition, this forms the LAPACK
DGESVD[10] routine for the computation of the singular value decomposition.
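For orientation, this LAPACK driver is also what one obtains from standard numerical libraries; e.g., in Python, scipy.linalg.svd can be asked explicitly for the QR-based driver 'gesvd' (a usage sketch, not part of the text).

import numpy as np
from scipy.linalg import svd

A = np.random.default_rng(0).standard_normal((200, 120))
# Full SVD A = U diag(s) V^T via the LAPACK driver DGESVD.
U, s, Vh = svd(A, lapack_driver='gesvd')
print(s[:5])                                  # a few largest singular values
print(np.allclose(A, U[:, :120] * s @ Vh))    # reconstruction check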
If the matrix A is very large, i. e., n ≥ 104 − 108 , the method described so far for
computing the SVD is too expensive. In this situation, particularly if A ∈ Cn×n is square
and regular, the matrix is first reduced to smaller dimension,
with m ≪ n , by using, e. g., the Arnoldi process described below in Section 4.3.1, and
then the above method is applied to this reduced matrix. For an appropriate choice
of the orthonormal transformation matrix Q(m) ∈ Cn×m the singular values of A(m)
are approximations of those of A , especially the “largest” ones (by modulus). If one is
interested in the "smallest" singular values of A , which is typically the case in applications,
the dimensional reduction process has to be applied to the inverse matrix A−1 .
4.3 Krylov space methods
“Krylov space methods” for solving eigenvalue problems follow essentially the same idea
as in the case of the solution of linear systems. The original high-dimensional problem
is reduced to smaller dimension by applying the Galerkin approximation in appropriate
subspaces, e. g., so-called "Krylov spaces", which are successively constructed using the
given matrix and sometimes also its transpose. The work per iteration should amount to
about one matrix-vector multiplication. We will consider the two most popular variants
of such methods, the “Arnoldi5 method” for general, not necessarily Hermitian matrices,
and its specialization for Hermitian matrices, the “Lanczos6 method”.
First, we introduce the general concept of such a “model reduction” by “Galerkin
approximation”. Consider a general eigenvalue problem
Az = λz, (4.3.35)
with a higher-dimensional matrix A ∈ Cn×n , n ≥ 104 , which may have resulted from the
discretization of the eigenvalue problem of a partial differential operator. This eigenvalue
problem can equivalently be written in variational form as
5
Walter Edwin Arnoldi (1917–1995): US-American engineer; graduated in Mechanical Engineering
at the Stevens Institute of Technology in 1937; worked at United Aircraft Corp. from 1939 to 1977;
main research interests included modelling vibrations, Acoustics and Aerodynamics of aircraft propellers;
mainly known for the “Arnoldi iteration”; the paper “The principle of minimized iterations in the solution
of the eigenvalue problem”, Quart. Appl. Math. 9, 17-29 (1951), is one of the most cited papers in
Numerical Linear Algebra.
6
Cornelius (Cornel) Lanczos (1893–1974): Hungarian mathematician and physicist; PhD in 1921 on
Relativity Theory; assistant to Albert Einstein 1928–1929; contributions to exact solutions of the Einstein
field equation; discovery of the fast Fourier transform (FFT) 1940; worked at the U.S. National Bureau
of Standards after 1949; invented the “Lanczos algorithm” for finding eigenvalues of large symmetric
matrices and the related conjugate gradient method; in 1952 he left the USA for the School of Theoretical
Physics at the Dublin Institute for Advanced Studies in Ireland, where he succeeded Schrödinger and
stayed until 1968; Lanczos was author of many classical text books.
Within the framework of Galerkin approximation this is usually written in compact form
as a generalized eigenvalue problem
Aα = λMα, (4.3.39)
for the coefficient vector α = (α_j)^m_{j=1} , involving the matrices A = ( (A q^j , q^i)_2 )^m_{i,j=1} and
M = ( (q^j , q^i)_2 )^m_{i,j=1} .
In the following, we use another formulation. With the Cartesian representations of
the basis vectors q^i = (q^i_j)^n_{j=1} the Galerkin eigenvalue problem (4.3.37) is written in the
form

  Σ_{j=1}^m Σ_{k,l=1}^n α_j a_kl q^j_k q̄^i_l = λ Σ_{j=1}^m Σ_{k=1}^n α_j q^j_k q̄^i_k ,   i = 1, . . . , m.    (4.3.40)
Then, using the matrix Q^(m) := [q^1 , . . . , q^m] ∈ C^{n×m} and the vector α = (α_j)^m_{j=1} ∈ C^m ,
this can be written as the (standard) eigenvalue problem

  H^(m) α = λ α     (4.3.42)

of the reduced matrix H^(m) := Q̄^(m)T A Q^(m) ∈ C^{m×m} . If the reduced matrix H^(m) has a
particular structure, e. g., a Hessenberg matrix or a symmetric tridiagonal matrix, then,
the lower-dimensional eigenvalue problem (4.3.42) can efficiently be solved by the QR
method. Its eigenvalues may be considered as approximations to some of the dominant
eigenvalues of the original matrix A and are called “Ritz7 eigenvalues” of A . In view of
this preliminary consideration the “Krylov methods” consist in the following steps:
7
Walter Ritz (1878–1909): Swiss physicist; Prof. in Zürich and Göttingen; contributions to Spectral
Theory in Nuclear Physics and Electromagnetism.
to be determined the whole process has to be applied to the inverse matrix A−1 ,
which possibly makes the construction of the subspace Km expensive.
Remark 4.6: In the above form the Krylov method for eigenvalue problems is analogous
to its version for (real) linear systems described in Section 3.3.3. Starting from the
variational form of the linear system
  x ∈ R^n :   (A x, y)_2 = (b, y)_2   ∀ y ∈ R^n ,

we obtain the following reduced system for x^m = Σ_{j=1}^m α_j q^j :

  Σ_{j=1}^m Σ_{k,l=1}^n α_j a_kl q^j_k q^i_l = Σ_{k=1}^n b_k q^i_k ,   i = 1, . . . , m.
The "power method" for computing the largest eigenvalue of a matrix only uses the
current iterate A^m q , m ≪ n , for some normalized starting vector q ∈ C^n , ‖q‖_2 = 1 , but
ignores the information contained in the already obtained iterates {q, Aq, . . . , A^{m−1} q}.
This suggests forming the so-called "Krylov matrix"

  [ q, Aq, A²q, . . . , A^{m−1} q ] ∈ C^{n×m} .
The columns of this matrix are not orthogonal. In fact, since At q converges to the
direction of the eigenvector corresponding to the largest (in modulus) eigenvalue of A ,
this matrix tends to be badly conditioned with increasing dimension m. Therefore, one
constructs an orthogonal basis by the Gram-Schmidt algorithm. This basis is expected to
yield good approximations of the eigenvectors corresponding to the m largest eigenvalues,
for the same reason that Am−1 q approximates the dominant eigenvector. However, in
this simplistic form the method is unstable due to the instability of the standard Gram-
Schmidt algorithm. Instead the “Arnoldi method” uses a stabilized version of the Gram-
Schmidt process to produce a sequence of orthonormal vectors, {q 1 , q 2 , q 3 , . . .} called
the “Arnoldi vectors”, such that for every m, the vectors {q 1 , . . . , q m } span the Krylov
subspace K_m . For the following, we define the orthogonal projection operator

  proj_u(v) := ( (v, u)_2 / (u, u)_2 ) u ,

which projects the vector v onto span{u}. With this notation the classical Gram-Schmidt
orthonormalization process uses the recurrence formulas:
  q^1 = ‖q‖_2^{-1} q ;   for t = 2, . . . , m :
  q̃^t = A^{t−1} q − Σ_{j=1}^{t−1} proj_{q^j}(A^{t−1} q) ,    q^t = ‖q̃^t‖_2^{-1} q̃^t .    (4.3.43)
Here, the t-th step projects out the component of At−1 q in the directions of the already
determined orthonormal vectors {q 1 , . . . , q t−1 }. This algorithm is numerically unstable
due to round-off error accumulation. There is a simple modification, the so-called “mod-
ified Gram-Schmidt algorithm”, where the t-th step projects out the component of Aq t
in the directions of {q 1 , . . . , q t−1 }:
  q^1 = ‖q‖_2^{-1} q ;   for t = 2, . . . , m :
  q̃^t = A q^{t−1} − Σ_{j=1}^{t−1} proj_{q^j}(A q^{t−1}) ,    q^t = ‖q̃^t‖_2^{-1} q̃^t .    (4.3.44)
Then, with the setting hi,t−1 := (Aq t−1 , q i )2 , from (4.3.44), we infer that
  A q^{t−1} = Σ_{i=1}^{t} h_{i,t−1} q^i ,   t = 2, . . . , m + 1.    (4.3.45)
This algorithm gives the same result as the original formula (4.3.43) but introduces smaller
errors in finite-precision arithmetic. Its cost is asymptotically 2nm2 a. op.
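The difference between the two variants is easily observed numerically. The following sketch (numpy assumed) orthonormalizes an ill-conditioned Krylov basis with both the classical and the modified algorithm and measures the resulting loss of orthogonality ‖QᵀQ − I‖; the matrix and dimensions are arbitrary choices for illustration.

import numpy as np

def classical_gs(V):
    """Classical Gram-Schmidt: all projections use the original column."""
    Q = np.zeros_like(V, dtype=float)
    for t in range(V.shape[1]):
        v = V[:, t] - Q[:, :t] @ (Q[:, :t].T @ V[:, t])
        Q[:, t] = v / np.linalg.norm(v)
    return Q

def modified_gs(V):
    """Modified Gram-Schmidt: projections are subtracted one after another."""
    Q = V.astype(float).copy()
    for t in range(V.shape[1]):
        for j in range(t):
            Q[:, t] -= (Q[:, j] @ Q[:, t]) * Q[:, j]
        Q[:, t] /= np.linalg.norm(Q[:, t])
    return Q

# Ill-conditioned Krylov basis [q, Aq, ..., A^{m-1} q] of a symmetric matrix A.
rng = np.random.default_rng(0)
n, m = 200, 12
B = rng.standard_normal((n, n)); A = B + B.T
V = np.empty((n, m)); V[:, 0] = rng.standard_normal(n)
for t in range(1, m):
    V[:, t] = A @ V[:, t - 1]
for gs in (classical_gs, modified_gs):
    Q = gs(V)
    print(gs.__name__, np.linalg.norm(Q.T @ Q - np.eye(m)))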
Definition 4.3 (Arnoldi algorithm): For a general matrix A ∈ Cn×n the Arnoldi
method determines a sequence of orthonormal vectors q^t ∈ C^n , 1 ≤ t ≤ m ≪ n
(“Arnoldi basis”), by applying the modified Gram-Schmidt method (4.3.46) to the basis
{q, Aq, . . . , Am−1 q} of the Krylov space Km :
Let Q(m) denote the n×m-matrix formed by the first m Arnoldi vectors {q 1 , q 2 , . . . , q m },
and let H (m) be the (upper Hessenberg) m × m-matrix formed by the numbers hjk :
  Q^(m) := [q^1 , q^2 , . . . , q^m] ,      H^(m) :=
     ⎡ h_11   h_12   h_13   · · ·   h_1m ⎤
     ⎢ h_21   h_22   h_23   · · ·   h_2m ⎥
     ⎢  0     h_32   h_33   · · ·   h_3m ⎥
     ⎢  ⋮      ⋱      ⋱      ⋱      ⋮   ⎥
     ⎣  0     · · ·    0   h_m,m−1  h_mm ⎦ .
The matrices Q^(m) are orthonormal and in view of (4.3.45) satisfy the "Arnoldi relation"

  A Q^(m) = Q^(m) H^(m) + h_{m+1,m} q^{m+1} (e^m)^T .
Multiplying by Q̄(m)T from the left and observing Q̄(m)T Q(m) = I and Q̄(m)T q m+1 = 0 ,
we infer that
H (m) = Q̄(m)T AQ(m) . (4.3.48)
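A compact implementation of the Arnoldi process with the modified Gram-Schmidt inner loop, returning Q^(m) and H^(m); it is written for a real matrix (so that Q̄ᵀ = Qᵀ), and the start vector, the dimensions and the symmetric test matrix are chosen only for illustration.

import numpy as np

def arnoldi(A, q, m):
    """Arnoldi process (modified Gram-Schmidt): returns Q (n x m) with
    orthonormal columns spanning K_m and the Hessenberg matrix H = Q^T A Q."""
    n = A.shape[0]
    Q = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    Q[:, 0] = q / np.linalg.norm(q)
    for t in range(m):
        w = A @ Q[:, t]
        for j in range(t + 1):
            H[j, t] = Q[:, j] @ w
            w -= H[j, t] * Q[:, j]
        H[t + 1, t] = np.linalg.norm(w)
        if H[t + 1, t] < 1e-14:                 # invariant subspace found
            return Q[:, :t + 1], H[:t + 1, :t + 1]
        Q[:, t + 1] = w / H[t + 1, t]
    return Q[:, :m], H[:m, :m]

rng = np.random.default_rng(0)
n = 400
B = rng.standard_normal((n, n)); A = B + B.T
Q, H = arnoldi(A, rng.standard_normal(n), m=40)
ritz = np.sort(np.linalg.eigvals(H).real)
exact = np.sort(np.linalg.eigvalsh(A))
print(ritz[-3:])      # largest Ritz values ...
print(exact[-3:])     # ... approximate the largest eigenvalues of A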
In the limit case m = n the matrix H (n) is similar to A and, therefore, has the same
eigenvalues. This suggests that even for m n the eigenvalues of the reduced matrix
H (m) may be good approximations to some eigenvalues of A . When the algorithm stops
(in exact arithmetic) for some m < n with h_{m+1,m} = 0, then the Krylov space K_m is an
invariant subspace of the matrix A and the reduced matrix H^(m) = Q̄^(m)T A Q^(m) has m
eigenvalues in common with A (exercise), i. e., σ(H^(m)) ⊂ σ(A) .
The following lemma provides an a posteriori bound for the accuracy in approximating
eigenvalues of A by those of H (m) .
Lemma 4.3: Let {μ, w} be an eigenpair of the Hessenberg matrix H^(m) and let v =
Q^(m) w , so that {μ, v} is an approximate eigenpair of A. Then, there holds

  ‖A v − μ v‖_2 = |h_{m+1,m}| |w_m| .    (4.3.49)
The relation (4.3.49) does not provide a priori information about the convergence of
the eigenvalues of H (m) against those of A for m → n , but in view of σ(H (n) ) = σ(A)
this is not the question. Instead, it allows for an a posteriori check on the basis of the
computed quantities hm+q,m and wm whether the obtained pair {μ, w} is a reasonable
approximation.
Remark 4.7: i) Typically, the Ritz eigenvalues converge to the extreme (“maximal”)
eigenvalues of A. If one is interested in the “smallest” eigenvalues, i. e., those which
are closest to zero, the method has to be applied to the inverse matrix A−1 , similar to
the approach used in the “Inverse Iteration”. In this case the main work goes into the
generation of the Krylov space Km = span{q, A−1 q, . . . , (A−1 )m−1 q}, which requires the
successive solution of linear systems,
v 0 := q, Av 1 = v 0 , ... Av m = v m−1 .
Remark 4.8: The algorithm (4.3.46) can be used also for the stable orthonormalization
of a general basis {v 1 , . . . , v m } ⊂ Cn :
  u^1 = ‖v^1‖_2^{-1} v^1 ;   for t = 2, . . . , m :
  u^{t,1} = v^t ,   u^{t,j+1} = u^{t,j} − proj_{u^j}(u^{t,j}) ,  j = 1, . . . , t − 1 ,   u^t = ‖u^{t,t}‖_2^{-1} u^{t,t} .    (4.3.50)
This “modified” Gram-Schmidt algorithm (with exact arithmetic) gives the same result
as its “classical” version (exercise)
  u^1 = ‖v^1‖_2^{-1} v^1 ;   for t = 2, . . . , m :
  ũ^t = v^t − Σ_{j=1}^{t−1} proj_{u^j}(v^t) ,    u^t = ‖ũ^t‖_2^{-1} ũ^t .    (4.3.51)
Both algorithms have the same arithmetic complexity (exercise). In each step a vector is
determined orthogonal to its preceding one and also orthogonal to any errors introduced
in the computation, which enhances stability. This is supported by the following stability
estimate for the resulting “orthonormal” matrix U = [u1 , . . . , um ]
  ‖U^T U − I‖_2 ≤ ( c_1 cond_2(A) ε ) / ( 1 − c_2 cond_2(A) ε ) .    (4.3.52)
As in the solution of linear systems by Krylov space methods, e. g., the GMRES
method, the high storage needs for general matrices are avoided in the case of Hermitian
matrices due to the availability of short recurrences in the orthonormalization process.
This is exploited in the “Lanczos method”. Suppose that the matrix A is Hermitian.
Then, the recurrence formula of the Arnoldi method
  q̃^t = A q^{t−1} − Σ_{j=1}^{t−1} (A q^{t−1}, q^j)_2 q^j ,   t = 2, . . . , m + 1,

collapses to a three-term recurrence,

  q̃^t = A q^{t−1} − (A q^{t−1}, q^{t−1})_2 q^{t−1} − (A q^{t−1}, q^{t−2})_2 q^{t−2}
       = A q^{t−1} − α_{t−1} q^{t−1} − β_{t−2} q^{t−2} ,
  with α_{t−1} := (A q^{t−1}, q^{t−1})_2 ,  β_{t−2} := (A q^{t−1}, q^{t−2})_2 . Further,

  ‖q̃^t‖_2 = (q^t, q̃^t)_2 = (q^t, A q^{t−1} − α_{t−1} q^{t−1} − β_{t−2} q^{t−2})_2 = (q^t, A q^{t−1})_2 = (A q^t, q^{t−1})_2 = β_{t−1} .
This implies that also β_{t−1} ∈ R and β_{t−1} q^t = q̃^t . Collecting the foregoing relations, we
obtain

  A Q^(m) = Q^(m) T^(m) + β_m q^{m+1} (e^m)^T ,

where the matrix T^(m) = tridiag(β_{i−1}, α_i, β_i) ∈ R^{m×m} is real symmetric. From this so-called "Lanczos relation",
we finally obtain

  T^(m) = Q̄^(m)T A Q^(m) .
Definition 4.4 (Lanczos Algorithm): For a Hermitian matrix A ∈ Cn×n the Lanczos
method determines a set of orthonormal vectors {q^1 , . . . , q^m }, m ≪ n , by applying the
modified Gram-Schmidt method to the basis {q, Aq, . . . , Am−1 q} of the Krylov space Km :
After the matrix T (m) is calculated, one can compute its eigenvalues λi and their
corresponding eigenvectors w i , e. g., by the QR algorithm. The eigenvalues and eigen-
vectors of T (m) can be obtained in as little as O(m2 ) work. It can be proven that the
eigenvalues are approximate eigenvalues of the original matrix A. The Ritz eigenvectors
v i of A can then be calculated by v i = Q(m) w i .
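A sketch of the Lanczos recurrence for a real symmetric matrix; note that, without the reorthogonalization practical codes add, rounding errors eventually produce spurious ("ghost") copies of converged Ritz values, an effect ignored in this toy version. Matrix, dimensions and start vector are arbitrary choices.

import numpy as np

def lanczos(A, q, m):
    """Lanczos three-term recurrence for a real symmetric matrix A.
    Returns Q (n x m) and the symmetric tridiagonal T = Q^T A Q."""
    n = A.shape[0]
    Q = np.zeros((n, m))
    alpha = np.zeros(m)
    beta = np.zeros(m - 1)
    Q[:, 0] = q / np.linalg.norm(q)
    for t in range(m):
        w = A @ Q[:, t]
        alpha[t] = Q[:, t] @ w
        w -= alpha[t] * Q[:, t]
        if t > 0:
            w -= beta[t - 1] * Q[:, t - 1]
        if t < m - 1:
            beta[t] = np.linalg.norm(w)
            Q[:, t + 1] = w / beta[t]
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    return Q, T

rng = np.random.default_rng(0)
n = 500
B = rng.standard_normal((n, n)); A = B + B.T
Q, T = lanczos(A, rng.standard_normal(n), m=60)
print(np.sort(np.linalg.eigvalsh(T))[-3:])   # Ritz values ...
print(np.sort(np.linalg.eigvalsh(A))[-3:])   # ... vs. exact extreme eigenvalues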
As an application of the Krylov space methods described so far, we discuss the computa-
tion of the pseudo-spectrum of a matrix Ah ∈ Rn×n , which resulted from the discretization
of a dynamical system governed by a differential operator in the context of linearized sta-
bility analysis. Hence, we are interested in the most “critical” eigenvalues, i. e., in those
which are close to the origin or to the imaginary axis. This requires considering the inverse
matrix T = A_h^{-1} . Thereby, we follow ideas developed in Trefethen & Embree
[22], Trefethen [21], and Gerecht et al. [35]. The following lemma collects some useful
facts on the pseudo-spectra of matrices.
where σmin (B) denotes the smallest singular value of the matrix B , i. e.,
σε (Q̄T T Q) = σε (T ). (4.3.56)
The characterization of the ε-pseudo-spectrum in terms of perturbed spectra
leads one to simply take a number of random matrices E of norm less than ε and to
plot the union of the usual spectra σ(T + E) . The resulting pictures are called the
“poor man’s pseudo-spectra”. This approach is rather expensive since in order to obtain
precise information of the ε-pseudo-spectrum a really large number of random matrices
are needed. It cannot be used for higher-dimensional matrices.
where v̂ is the stationary “base flow” the stability of which is to be investigated. This
eigenvalue problem is posed on the linear manifold described by the incompressibility con-
straint ∇·v = 0. Hence after discretization the resulting algebraic eigenvalue problems in-
herit the saddle-point structure of (4.3.58). We discuss this aspect in the context of a finite
element Galerkin discretization with finite element spaces Hh ⊂ H01 (Ω)d and Lh ⊂ L2 (Ω).
Let {ϕih , i = 1, . . . , nv := dim Hh } and {χjh , j = 1, . . . , np := dim Lh } be standard nodal
bases of the finite element spaces H_h and L_h , respectively. The eigenvector v_h ∈ H_h
and the pressure q_h ∈ L_h possess the expansions v_h = Σ_{i=1}^{n_v} v^i_h ϕ^i_h , q_h = Σ_{j=1}^{n_p} q^j_h χ^j_h ,
with coefficient vectors v_h = (v^i_h)^{n_v}_{i=1} ∈ C^{n_v} and q_h = (q^j_h)^{n_p}_{j=1} ∈ C^{n_p} , respectively. With this notation the discretization of the eigenvalue
problem (4.3.58) results in a generalized algebraic eigenvalue problem of the form
  ⎛ S_h    B_h ⎞ ⎛ v_h ⎞         ⎛ M_h  0 ⎞ ⎛ v_h ⎞
  ⎝ B_h^T   0  ⎠ ⎝ q_h ⎠  =  λ_h ⎝  0   0 ⎠ ⎝ q_h ⎠ ,      (4.3.59)
with the so-called stiffness matrix Sh , gradient matrix Bh and mass matrix Mh defined
by
  S_h := ( a(v̂_h; ϕ^j_h, ϕ^i_h) )^{n_v}_{i,j=1} ,   B_h := ( (χ^j_h, ∇·ϕ^i_h)_{L²} )^{n_v,n_p}_{i,j=1} ,   M_h := ( (ϕ^j_h, ϕ^i_h)_{L²} )^{n_v}_{i,j=1} .
For simplicity, we suppress terms stemming from pressure and transport stabilization.
The generalized eigenvalue problem (4.3.59) can equivalently be written in the form
  ⎛ M_h  0 ⎞ ⎛ S_h    B_h ⎞^{-1} ⎛ M_h  0 ⎞ ⎛ v_h ⎞         ⎛ M_h  0 ⎞ ⎛ v_h ⎞
  ⎝  0   0 ⎠ ⎝ B_h^T   0  ⎠      ⎝  0   0 ⎠ ⎝ q_h ⎠  =  μ_h ⎝  0   0 ⎠ ⎝ q_h ⎠ ,      (4.3.60)
where μ_h = λ_h^{-1} . Since the pressure q_h only plays the role of a silent variable, (4.3.60)
reduces to the (standard) generalized eigenvalue problem
Th vh = μh Mh vh , (4.3.61)
The approach described below for computing eigenvalues of general matrices T ∈ Rn×n
can also be applied to this non-standard situation.
Computation of eigenvalues
For computing the eigenvalues of a (general) matrix T ∈ Rn×n , we use the Arnoldi
process, which produces a lower-dimensional Hessenberg matrix the eigenvalues of which
approximate those of T :
  H^(m) = Q̄^(m)T T Q^(m) =
     ⎛ h_1,1   h_1,2   h_1,3   · · ·   h_1,m ⎞
     ⎜ h_2,1   h_2,2   h_2,3   · · ·   h_2,m ⎟
     ⎜  0      h_3,2   h_3,3   · · ·   h_3,m ⎟
     ⎜  ⋮       ⋱       ⋱       ⋱      ⋮   ⎟
     ⎝  0      · · ·    0    h_m,m−1  h_m,m ⎠ ,
By back-transformation of this eigenvector from the Krylov space Km into the space Rn ,
we obtain a corresponding approximate eigenvector of the full matrix T .
We want to determine the “critical” part of the ε-pseudo-spectrum of the discrete operator
Ah , which approximates the unbounded differential operator A . As discussed above, this
requires the computation of the smallest singular value of the inverse matrix T = A−1 h .
Since the dimension nh of T in practical applications is very high, nh ≈ 104 − 108 , the
direct computation of singular values of T or even a full singular value decomposition is
prohibitively expensive. Therefore, the first step is the reduction of the problem to lower
dimension by projection onto a Krylov space resulting in a (complex) Hessenberg matrix
H^(m) ∈ C^{m×m} the inverse of which, H^{(m)−1} , may then be viewed as a low-dimensional
approximation to Ah capturing the critical “smallest” eigenvalues of Ah and likewise its
pseudo-spectra. The pseudo-spectra of H (m) may then be computed using the approach
described in Section 4.2.2. By Lemma 1.17 the pseudo-spectrum of H (m) is closely related
to that of H (m)−1 but involving constants, which are difficult to control. Therefore, one
tends to prefer to directly compute the pseudo-spectra of H (m)−1 as an approximation
to that of Ah . This, however, is expensive for larger m since the inversion of the matrix
H (m) costs O(m3 ) operations. Dealing directly with the Hessenberg matrix H (m) looks
more attractive. Both procedures are discussed in the following. We choose a section
D ⊂ C (around the origin), in which we want to determine the pseudo-spectrum. Let
D := {z ∈ C| { Re z, Im z} ∈ [ar , br ] × [ai , bi ]} for certain values ar < br and ai < bi .
To determine the pseudo-spectrum in the complete rectangle D , we cover D by a grid
with spacing dr and di , such that k points lie on each grid line. For each grid point, we
compute the corresponding ε-pseudo-spectrum.
i) Computation of the pseudo-spectra σε (H (m)−1 ): For each z ∈ D \ σ(H (m)−1 ) the
quantity
  ε(z, H^{(m)−1}) := ‖(zI − H^{(m)−1})^{-1}‖_2^{-1} = σ_min(zI − H^{(m)−1})
determines the smallest ε > 0 , such that z ∈ σε (H (m)−1 ). Then, for any point z ∈ D ,
by computing σmin (zI − H (m)−1 ) , we obtain an approximation of the smallest ε , such
that z ∈ σε (H (m)−1 ) . For computing σmin := σmin (zI − H (m)−1 ) , we recall its definition
as smallest (positive) eigenvalue of the Hermitian, positive definite matrix
The linear systems in each iteration can be solved by pre-computing either directly an
LR decomposition of S , or if this is too ill-conditioned, first a QR decomposition
zI − H (m)−1 = QR,
which then yields a Cholesky decomposition of S :
zI − H (m) = UΣV̄ T ,
For that, we use the LAPACK routine dgesvd within MATLAB. Since the operation count
of the singular value decomposition grows like O(m²) , in our sample calculation, we limit
the dimension of the Krylov space by m ≤ 200 .
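The grid-based computation can be sketched in a few lines of Python; here σ_min(zI − H) is evaluated by a full SVD at every grid point, directly for the Hessenberg matrix H^(m) (the second of the two variants discussed). The matrix, the section D and the grid parameters below are placeholders, and the contour plot via matplotlib is only indicated in a comment.

import numpy as np

def pseudospectrum_grid(H, re_range, im_range, k=100):
    """sigma_min(z I - H) on a k x k grid over [re_range] x [im_range];
    contour lines at eps = 1e-1, ..., 1e-4 bound the eps-pseudo-spectra."""
    xs = np.linspace(*re_range, k)
    ys = np.linspace(*im_range, k)
    I = np.eye(H.shape[0])
    sig = np.empty((k, k))
    for i, y in enumerate(ys):
        for j, x in enumerate(xs):
            z = x + 1j * y
            sig[i, j] = np.linalg.svd(z * I - H, compute_uv=False)[-1]
    return xs, ys, sig

# Toy example: a small random Hessenberg matrix standing in for H^(m).
rng = np.random.default_rng(0)
H = np.triu(rng.standard_normal((30, 30)), -1)
xs, ys, sig = pseudospectrum_grid(H, (-8.0, 8.0), (-8.0, 8.0), k=60)
print(sig.min(), sig.max())
# To plot:  import matplotlib.pyplot as plt
#           plt.contour(xs, ys, sig, levels=[1e-4, 1e-3, 1e-2, 1e-1]); plt.show()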
- The mesh size h in the finite element discretization on the domain Ω ⊂ Rn for
reducing the infinite dimensional problem to a matrix eigenvalue problem of di-
mension nh .
- The dimension of the Krylov space Km,h in the Arnoldi method for the reduction
of the n_h-dimensional (inverse) matrix T_h to the much smaller Hessenberg matrix H_h^{(m)} .
Only for an appropriate choice of these parameters one obtains a reliable approximation
to the pseudo-spectrum of the differential operator A . First, h is refined and m is in-
creased until no significant change in the boundaries of the ε-pseudo-spectrum is observed
anymore.
First, the interval Ω = (−10, 10) is discretized by eightfold uniform refinement resulting
in the finest mesh size h = 20 · 2−8 ≈ 0.078 and nh = 256 . The Arnoldi algorithm
for the corresponding discrete eigenvalue problem of the inverse matrix A−1 h generates
a Hessenberg matrix H_h^{(m)} of dimension m = 200 . The resulting reduced eigenvalue
problem is solved by the QR method. For the determination of the corresponding pseudo-
spectra, we export the Hessenberg matrix H_h^{(m)} into a MATLAB file. For this, we use the
routine DGESVD in LAPACK (singular value decomposition) on meshes with 10×10 and
with 100 × 100 points. The ε-pseudo-spectra are computed for ε = 10−1, 10−2 , ..., 10−10
leading to the results shown in Fig. 4.1. We observe that all eigenvalues have negative real
part but also that the corresponding pseudo-spectra reach far into the positive half-plane
of C , i. e., small perturbations of the matrix may have strong effects on the location of
the eigenvalues. Further, we see that already a grid with 10 × 10 points yields sufficiently
good approximations of the pseudo-spectrum of the matrix H_h^{(m)} .
−νΔv + v · ∇v = 0, in Ω. (4.3.67)
−νΔv1 + x2 ∂1 v1 + v2 = λv1 ,
(4.3.68)
−νΔv2 + x2 ∂1 v2 = λv2 ,
in Ω with the boundary conditions v|ΓD = 0, ∂n v|Γout = 0 . For discretizing this problem,
we use the finite element method described above with conforming Q1 -elements combined
Figure 4.2: Computed pseudo-spectra of the linearized Burgers operator with Dirichlet
inflow condition for ν = 0.01 and h = 2−7 (left) and h = 2−8 (right) computed by
the Arnoldi method with m = 100. The “dots” represent eigenvalues and the lines the
boundaries of the ε-pseudo-spectra for ε = 10−1 , ..., 10−4.
Figure 4.3: Computed pseudo-spectra of the linearized Burgers operator with Dirichlet
inflow condition for ν = 0.01 and h = 2−8 computed by the Arnoldi method with m =
100 (left) and m = 200 (right). The “dots” represent eigenvalues and the lines the
boundaries of the ε-pseudo-spectra for ε = 10−1 , ..., 10−4.
Now, we turn to Neumann inflow conditions. In this particular case the first eigenval-
ues and eigenfunctions of the linearized Burgers operator can be determined analytically
as λk = νk 2 π 2 , vk (x) = (sin(kπx2 ), 0)T , for k ∈ Z . All these eigenvalues are degenerate.
However, there exists another eigenvalue λ4 ≈ 1.4039 between the third and fourth one,
which is not of this form, but also degenerate.
We use this situation for studying the dependence of the proposed method for com-
puting pseudo-spectra on the size of the viscosity parameter, 0.001 ≤ ν ≤ 0.01 . Again
the discretization uses the mesh size h = 2−7 , Krylov spaces of dimension m = 100 and
a grid of spacing k = 100 . By varying these parameters, we find that only eigenvalues
with Reλ ≤ 6 and corresponding ε-pseudo-spectra with ε ≥ 10−4 are reliably computed.
The results are shown in Fig. 4.4.
Figure 4.4: Computed pseudo-spectra of the linearized (around Couette flow) Burgers op-
erator with Neumann inflow conditions for ν = 0.01 (left) and ν = 0.001 (right):
The dots represent eigenvalues and the lines the boundaries of the ε-pseudo-spectra for
ε = 10−1 , . . . , 10−4.
For Neumann inflow conditions the most critical eigenvalue is significantly smaller
than the corresponding most critical eigenvalue for Dirichlet inflow conditions, which
suggests weaker stability properties in the “Neumann case”. Indeed, in Fig. 4.4, we
see that the 0.1-pseudo-spectrum reaches into the negative complex half-plane indicating
instability for such perturbations. This effect is even more pronounced for ν = 0.001
with λNcrit ≈ 0.0098 .
In this last example, we present some computational results for the 2d Navier-Stokes
benchmark “channel flow around a cylinder” with the configuration shown in Section 0.4.3
(see Schäfer & Turek [65]). The geometry data are as follows: channel domain Ω :=
(0.00m, 2.2m) × (0.00m, 0.41m), diameter of circle D := 0.10m, center of circle at a :=
(0.20m, 0.20m) (slightly nonsymmetric position). The Reynolds number is defined in
terms of the diameter D and the maximum inflow velocity Ū = max |v in| = 0.3m/s
(parabolic profile), Re = Ū 2 D/ν . The boundary conditions are v|Γrigid = 0, v|Γin =
v in , ν∂n v − np|Γout = 0. The viscosity is chosen such that the Reynolds number is small
enough, 20 ≤ Re ≤ 40 , to guarantee stationarity of the base flow as shown in Fig. 4.5.
Already for Re = 60 the flow turns nonstationary (time periodic).
Figure 4.5: Configuration of the “channel flow” benchmark and x1 -component of the ve-
locity for Re = 40 .
We want to investigate the stability of the computed base flow for several Reynolds
numbers in the range 20 ≤ Re ≤ 60 and inflow conditions imposed on the admissible
perturbations, Dirichlet or Neumann (“free”), by determining the corresponding critical
eigenvalues and pseudo-spectra. This computation uses a “stationary code” employing the
Newton method for linearization, which is known to potentially yield stationary solutions
even at Reynolds numbers for which such solutions may not be stable.
We begin with the case of perturbations satisfying (homogeneous) Dirichlet inflow condi-
tions. The pseudo-spectra of the critical eigenvalues for Re = 40 and Re = 60 are shown
in Fig. 4.6.
The computation has been done on meshes obtained by four to five uniform refinements
of the (locally adapted) meshes used for computing the base flow. In the Arnoldi method,
we use Krylov spaces of dimension m = 100 . Computations with m = 200 give almost
the same results. For Re = 40 the relevant 10−2 -pseudo-spectrum does not reach into
the negative complex half-plane indicating stability of the corresponding base solution in
this case, as expected in view of the result of nonstationary computations. Obviously the
transition from stationary to nonstationary (time periodic) solutions occurs in the range
40 ≤ Re ≤ 60 . However, for this “instability” the sign of the real part of the critical
eigenvalue seems to play the decisive role and not so much the size of the corresponding
pseudo-spectrum.
4.4 Exercises
Exercise 4.1: The proof of convergence of the “power method” applied to a symmetric,
positive definite matrix A ∈ Rn×n resulted in the identity
  λ^t = (A z^t , z^t)_2
      = [ λ_n^{2t+1} ( |α_n|² + Σ_{i=1}^{n−1} |α_i|² (λ_i/λ_n)^{2t+1} ) ] / [ λ_n^{2t} ( |α_n|² + Σ_{i=1}^{n−1} |α_i|² (λ_i/λ_n)^{2t} ) ]
      = λ_max + O( (λ_{n−1}/λ_max)^{2t} ) ,
Show that the constant in the "O"-term on the right-hand side is uniformly bounded with
respect to the dimension n of A but depends linearly on |λ_n| .
Exercise 4.2: The “inverse iteration” may be accelerated by employing a dynamic “shift”
taken from the presceding eigenvalue approximation (λ0k ≈ λk ):
  (A − λ_k^{t−1} I) z̃^t = z^{t−1} ,   z^t = z̃^t/‖z̃^t‖ ,   μ_k^t = ((A − λ_k^{t−1} I)^{-1} z^t , z^t)_2 ,   λ_k^t = 1/μ_k^t + λ_k^{t−1} .
Investigate the convergence of this method for the computation of the smallest eigenvalue
λ1 = λmin of a symmetric, positive definite matrix A ∈ Rn×n . In detail, show the
convergence estimate
  |λ_1 − λ^t| ≤ |λ^t − λ^{t−1}| ∏_{j=0}^{t−1} ( (λ_1 − λ^j)/(λ_2 − λ^j) )² · ‖z^0‖_2² / |α_1|² .
A(0) := A,
A(t+1) := R(t) Q(t) , with A(t) = Q(t) R(t) , t ≥ 0.
(Hint: Use the fact that the QR decomposition of A yields a Cholesky decomposition of
the matrix ĀT A.)
Exercise 4.5: For a matrix A ∈ Cn×n and an arbitrary vector q ∈ Cn , q = 0, form the
Krylov spaces Km := span{q, Aq, . . . , Am−1 q}. Suppose that for some 1 ≤ m ≤ n there
holds Km−1 = Km = Km+1 .
i) Show that then Km = Km+1 = · · · = Kn = Cn and dimKm = m.
ii) Let {q 1 , . . . , q m } be an ONB of Km and set Qm := [q 1 , . . . , q m ]. Show that there
holds σ(QmT AQm ) ⊂ σ(A) . In the case m = n there holds σ(QnT AQn ) = σ(A) .
Exercise 4.6: Recall the two versions of the Gram-Schmidt algorithm, the “classical”
one and the “modified” one described in the text, for the successive orthogonalization of
a general, linearly independent set {v^1 , . . . , v^m } ⊂ R^n .
i) Verify that both algorithms, used with exact arithmetic, yield the same result.
ii) Determine the computational complexity of these two algorithms, i. e., the number of
arithmetic operations for computing the corresponding orthonormal set {u1 , . . . , um }.
Exercise 4.8: Consider the model eigenvalue problem from the text, which originates
from the 7-point discretization of the Poisson problem on the unit cube:
            ⎡  B    −I_{m²}      ⎤         ⎡  C   −I_m       ⎤         ⎡  6  −1       ⎤
  A = h^{-2} ⎢ −I_{m²}  B    ⋱   ⎥ ,  B =  ⎢ −I_m   C   ⋱   ⎥ ,  C =  ⎢ −1   6  ⋱   ⎥ ,
            ⎣             ⋱   ⋱ ⎦         ⎣            ⋱  ⋱ ⎦         ⎣          ⋱ ⋱ ⎦
  of dimensions n = m³, m², and m, respectively,
where h = 1/(m + 1) is the mesh size. In this case the corresponding eigenvalues and
eigenvectors are explicitly given by
  λ^h_ijk = h^{-2} ( 6 − 2( cos[ihπ] + cos[jhπ] + cos[khπ] ) ) ,   i, j, k = 1, . . . , m,
  w^h_ijk = ( sin[pihπ] sin[qjhπ] sin[rkhπ] )^m_{p,q,r=1} .
For this discretization, there holds the theoretical a priori error estimate
  |λ_ijk − λ^h_ijk| / |λ_ijk| ≤ (1/12) λ_ijk h² ,
where λijk = (i2 + j 2 + k 2 )π 2 are the exact eigenvalues of the Laplace operator (and h
sufficiently small).
i) Verify this error estimate using the given values for λijk and λhijk .
ii) Derive an estimate for the number of eigenvalues (not counting multiplicities) of the
Laplace operator that can be approximated reliably for a mesh size of h = 2−7 , if a
uniform relative accuracy of TOL = 10−3 is required.
iii) How small does the mesh size h have to be chosen if the first 1000 eigenvalues (counting
multiplicities for simplicity) of the Laplace operator have to be computed with relative
accuracy TOL = 10−3 ? How large would the dimension n of the resulting system matrix
A be in this case? (Hint: We are interested in an upper bound, so simplify accordingly.)
Exercise 4.9: Formulate the “inverse iteration” of Wielandt and the “Lanczos algo-
rithm” (combined with the QR method) for computing the smallest eigenvalue of a large
symmetric positive definite matrix A ∈ Rn×n . Suppose that matrix vector products as
well as solving linear systems occurring in these processes can be accomplished with O(n)
a. op.:
i) Compare the arithmetic work (# of a. op.) of these two approaches for performing 100
iterations.
ii) How do the two methods compare if not only the smallest but the 10 smallest eigen-
values are to be computed?
Exercise 4.10: The Krylov space method applied for general matrices A ∈ Cn×n re-
quires complex arithmetic, but many software packages provide only real arithmetic.
i) Verify that a (complex) linear system Ax = b can equivalently be written in the
following real “(2n × 2n)-block form”:
  ⎛  Re A   Im A ⎞ ⎛  Re x ⎞     ⎛  Re b ⎞
  ⎝ −Im A   Re A ⎠ ⎝ −Im x ⎠  =  ⎝ −Im b ⎠ .
ii) Formulate (necessary and sufficient) conditions on A, which guarantee that this (real)
coefficient block-matrix is a) regular, b) symmetric and c) positive definite?
5 Multigrid Methods

A_h x_h = b_h ,    (5.1.1)
  x^{t+1}_h = x^t_h + θ_h (b_h − A_h x^t_h) = (I_h − θ_h A_h) x^t_h + θ_h b_h ,    (5.1.2)
with a damping parameter 0 < θh ≤ 1 . The symmetric, positive definite matrix Ah pos-
sesses an ONS of eigenvectors {whi , i = 1, ..., nh } corresponding to the ordered eigenvalue
λmin (Ah ) = λ1 ≤ ... ≤ λn = λmax (Ah ) =: Λh . The expansion of the initial error
  e^0_h := x^0_h − x_h = Σ_{i=1}^{n_h} ε_i w^i_h

then yields for the error after t steps of (5.1.2):

  |e^t_h|² = Σ_{i=1}^{n_h} ε_i² (1 − θ_h λ_i)^{2t} .    (5.1.3)
The assumption 0 < θh ≤ Λ−1 h is sufficient for the convergence of the Richardson iteration.
Because of |1 − θ_h λ_i| ≪ 1 for larger λ_i and |1 − θ_h λ_1| ≈ 1 , "high-frequency" components
of the error decay fast, but “low-frequency” components only very slowly. The same holds
for the residuum rht = bh − Ah xth = Ah eth . Hence already after a few iterations there holds
  |r^t_h|² ≈ Σ_{i=1}^{[n/2]} ε_i² λ_i² (1 − θ_h λ_i)^{2t} ,   [n/2] := max{m ∈ N | m ≤ n/2} .    (5.1.4)
This may be interpreted as follows: The iterated defect rht on the mesh Th is “smooth”.
Hence, it can be approximated well on the coarser mesh T2h with mesh size 2h . The
resulting defect equation for the computation of the correction to the approximation xth
on Th would be less costly because of its smaller dimension n2h ≈ nh /4 .
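The smoothing property is easily demonstrated for the 1D model matrix. In the following Python sketch, the damped Richardson iteration with θ_h = 1/λ_max (an admissible choice by the condition above) is applied to an initial error consisting of one low- and one high-frequency Fourier mode; the concrete dimensions and mode numbers are arbitrary illustration choices.

import numpy as np

# Smoothing property of damped Richardson for A = tridiag(-1, 2, -1):
# high-frequency error components are damped by |1 - theta*lambda_i| << 1,
# low-frequency ones only by about 1 - O(h^2).
n = 63
h = 1.0 / (n + 1)
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
lam_max = 2.0 - 2.0 * np.cos(np.pi * n * h)
theta = 1.0 / lam_max

x_exact = np.zeros(n)                       # solve A x = 0, exact solution x = 0
k_low, k_high = 2, 50                       # one low and one high frequency mode
grid = np.arange(1, n + 1) * h
x = np.sin(k_low * np.pi * grid) + np.sin(k_high * np.pi * grid)  # initial error

for t in range(5):                          # a few Richardson sweeps
    x = x + theta * (0.0 - A @ x)
    e = x - x_exact
    # Fourier coefficients of the error w.r.t. the two sine eigenmodes
    c_low = 2 * h * np.sin(k_low * np.pi * grid) @ e
    c_high = 2 * h * np.sin(k_high * np.pi * grid) @ e
    print(f"sweep {t+1}: low-frequency {c_low:+.3f}, high-frequency {c_high:+.3e}")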
This defect correction process in combination with recursive coarsening can be carried
on to a coarsest mesh, on which the defect equation can finally be solved exactly. The
most important components of this multigrid process are the “smoothing iteration”, xνh =
Shν (x0h ) and certain transfer operations between functions defined on different meshes. The
smoothing operation Sh (·) is usually given in form of a simple fixed-point iteration (e. g.,
the Richardson iteration)
xν+1
h = Sh (xνh ) := (Ih − Ch−1 Ah )xνh + Ch−1 bh ,
For the formulation of the multigrid process, we consider a sequence of nested grids
Tl = Thl , l = 0, ..., L , of increasing fineness h0 > ... > hl > ... > hL (for instance
obtained by successively refining a coarse starting grid) and corresponding finite element
spaces Vl := Vhl of increasing dimension nl , which are subspaces of the “continuous” solu-
5.1 Multigrid methods for linear systems 189
tion space V = H01 (Ω) (first-order Sobolev1 space on Ω including zero Dirichlet boundary
conditions). Here, we think of spaces of continuous, with respect to the mesh Th piece-
wise linear (on triangular meshes) or piecewise (isoparametric) bilinear (on quadrilateral
meshes) functions. For simplicity, we assume that the finite element spaces are hierachi-
cally ordered,
This structural assumption eases the analysis of the multigrid process but is not essential
for its practical success.
As usual, we write the continuous problem and its corresponding finite element Galerkin
approximation in compact variational form
Here, a(u, ϕ) := (Lu, ϕ)L2 is the “energy bilinear form” corresponding to the (elliptic)
differential operator L and (f, ϕ)L2 the L2 -scalar product on the solution domain Ω . In
the model problem discussed above this notation has the explicit form Lu = −Δu and
" "
a(u, ϕ) = ∇u(x)∇ϕ(x) dx, (f, ϕ)L2 = f (x)ϕ(x) dx.
Ω Ω
The finite element subspace Vh ⊂ V has a natural so-called “nodal basis” (Lagrange basis)
{b1 , . . . , bnh } characterized by the interpolation property bi (aj ) = δij , i, j = 1, . . . , nh ,
where aj are the nodal points of the mesh Th . Between the finite element function
uh ∈ Vh and the corresponding nodal-value vector xh ∈ Rnh , we have the connection
uh (aj ) = xh,j , j = 1, ..., nh ,
nh nh
j
uh = xh,j b = uh (aj )bj .
j=1 j=1
Using this notation the discrete problems (5.1.7) can be written in the following form:
nh
xh,j a(bj , bi ) = (f, bi )L2 , i = 1, . . . , nh ,
j=1
1
Sergei Lvovich Sobolev (1908–1989): Russian mathematician; worked in Leningrad (St. Petersburg)
and later at the famous Steklov-Institute for Mathematics of the Academy of Sciences in Moscow; funda-
mental contributions to the theory of partial differential equations concept of generalized (distributional)
solutions, “Sobolev spaces”; worked also on numerical methods, numerical quadrature.
190 Multigrid Methods
Ah xh = bh , (5.1.8)
a(uh , vh ) = (Ah xh , yh )2 .
The system matrix Ah is symmetric and positive definite by construction and has a
condition number of size cond2 (Ah ) = O(h−2 ). Additionally, we will use the so-called
“mass matrix” Mh = (mij )ni,j=1
h
defined by
The mass matrix Mh is also symmetric and positive definite by construction and has a
condition number of size cond2 (Ah ) = O(1).
For the exact “discrete” solution uh ∈ Vh there holds the error estimate
This process is called “complexity-optimal” if the arithmetic work for achieving this accu-
racy is of size O(nh ) uniformly with respect to the mesh size h . We will see below that
the multigrid method is actually optimal is this sense if all its components are properly
chosen.
Let u0L ∈ VL be an initial guess for the exact discrete solution uL ∈ VL on mesh level L
(For example, u0L = 0 or u0L = uL−1 if such a coarse-grid solution is available.). Then,
u0L is “smoothed” (“pre-smoothed”) by applying ν steps, e. g., of the damped Richardson
iteration starting from ū0L := u0L . This reads in variational from as follows:
5.1 Multigrid methods for linear systems 191
# $
L , ϕL )L2 + θL (f, ϕL )L2 − a(ūL , ϕL )
(ūkL , ϕL )L2 = (ūk−1 ∀ϕL ∈ VL ,
k−1
(5.1.11)
where θL = λmax (Ah )−1 . For the resulting smoothed approximation ūνL , we define the
“defect” dL ∈ VL (without actually computing it) as follows:
Since VL−1 ⊂ VL , we obtain the “defect equation” on the next coarser mesh TL−1 as
a(qL−1 , ϕL−1 ) = (dL , ϕL−1 )L2 = (f, ϕL−1 )L2 − a(ūνL , ϕL−1 ) ∀ϕL−1 ∈ VL−1 . (5.1.13)
The correction qL−1 ∈ VL−1 is now computed either exactly (for instance by a direct
0
solver) or only approximately by a defect correction iteration qL−1 → qL−1
R
using the
sequence of coarser meshes TL−2 , ..., T0 . The result qL−1 ∈ VL−1 is then interpreted as
R
The correction step may involve a damping parameter ωL ∈ (0, 1] in order to minimize
the residual of ū¯L . This practically very useful trick will not be further discussed here,
i. e., in the following, we will mostly set ωL = 1 . The obtained corrected approximation
ū¯L is again smoothed (“post-smoothing”) by applying another μ steps of the damped
Richardson iteration starting from ū¯0L := ū¯L :
# $
L , ϕL )L2 + θL (f, ϕL )L2 − a(ūL , ϕL )
(ū¯kL , ϕL )L2 = (ū¯k−1 ∀ϕL ∈ VL .
¯k−1 (5.1.15)
The result is then accepted as the next multigrid iterate, u1L := ū¯μL , completing one step
of the multigrid iteration (“multigrid cycle”) on mesh level L. Each such cycle consists of
ν + μ Richardson smoothing steps (on level L), which each requires the inversion of the
mass matrix Mh , and the solution of the correction equation on the next coarser mesh.
Now, we will formulate the multigrid algorithm using a more abstract, functional
analytic notation, in order to better understand its structure and to ease its convergence
analysis. To the system matrices Al = Ahl on the sequence of meshes Tl , l = 0, 1, . . . , L,
we associate operators Al : Vl → Vl by setting
In the finite element context these operators are naturally chosen as rll−1 = Pl−1 , the L2
projection onto Vl−1 , and pll−1 = id., the natural embedding of Vl−1 ⊂ Vl into Vl .
192 Multigrid Methods
Now, using this notation, we reformulate the multigrid process introduced above for
solving the linear system on the finest mesh TL :
AL uL = fL := PL f. (5.1.17)
Multigrid process: Starting from an initial vector u0L ∈ VL iterates utL are constructed
by the recursive formula
(t+1) (t)
uL = MG(L, uL , fL ). (5.1.18)
(t)
Let the t-th multigrid iterate uL be determined.
Coarse grid solution: For l = 0 , the operation MG(0, 0, g0) yields the exact solution of
the system A0 v0 = g0 (obtained for instance by a direct method),
dl := gl − Al v̄l ; (5.1.22)
iii) Restriction:
v) Prolongation:
ql := pll−1 ql−1
R
; (5.1.26)
vii) Post-smoothing:
ut+1 1
L := vl . (5.1.29)
We collect the afore mentioned algorithmic steps into a compact systematics of the multi-
grid cycle utL → ut+1
L :
qL−1 = Ã−1 ˜
L−1 dL−1 (R-times defect correction)
If the defect equation AL−1 qL−1 = d˜L−1 on the coarser mesh TL−1 is solved “exactly”, one
speaks of a “two-grid method”. In practice, the two-grid process is continued recursively
to the “multigrid method” up to the coarsest mesh T0 . This process can be organized
in various ways depending essentially on the choice of the iteration parameter R , which
determines how often the defect correction step is repeated on each mesh level. In practice,
for economical reasons, only the cases R = 1 and R = 2 play a role. The schemes of the
corresponding multigrid cycles, the “V-cycle” and the “W-cycle”, are shown in Fig. 5.1.
Here, “•” represent “smoothing” and “defect correction” on the meshes Tl , and lines
“−” stand for the transfer between consecutive mesh levels.
v4
v3
v2
v1
v0
v4
v3
v2
v1
v0
Figure 5.1: Scheme of a multigrid algorithm organized as V- (top left), F- (top right), and
W-cycle (bottom line).
194 Multigrid Methods
The V-cycle is very efficient (if it works at all), but often suffers from instabilities
caused by irregularities in the problem to be solved, such as strong anisotropies in the
differential operator, boundary layers, corner singularities, nonuniformities and deteriora-
tions in the meshes (local mesh refinement and cell stretching), etc.. In contrast to that,
the W-cycle is more robust but usually significantly more expensive. Multigrid methods
with R ≥ 3 are too inefficient. A good compromise between robustness and efficiency is
provided by the so-called “F-cycle” shown in Fig. 5.1. This process is usually started on
the finest mesh TL with an arbitrary initial guess u0L (most often u0L = 0). However,
for economical reasons, one may start the whole multigrid process on the coarsest mesh
T0 and then use the approximate solutions obtained on intermediate meshes as starting
values for the iteration on the next finer meshes. This “nested” version of the multigrid
method will be studied more systematically below.
Nested multigrid: Starting from some initial vector u0 := A−1 0 f0 on the coarsest mesh
T0 , for l = 1, ..., L, successively approximations ũl ≈ ul are computed by the following
recursion:
Remark 5.1: Though the multigrid iteration in V-cycle modus may be unstable, it can
be used as preconditioners for an “outer” CG (in the symmetric case) or GMRES iteration
(in the nonsymmetric case). This approach combines the robustness of the Krylov space
method with the efficiency of the multigrid methods and has been used very successfully
for the solution of various nonstandard problems, involving singularities, indefiniteness,
saddle-point structure, and multi-physics coupling.
Remark 5.2: There is not something like the multigrid algorithm. The successful real-
ization of the multigrid concept requires a careful choice of the various components such
as the “smoother” Sl , the coarse-mesh operators Al , and the mesh-transfer operators
rll−1 , pll−1 , specially adapted to the particular structure of the problem considered. In the
following, we discuss these algorithmic componenents in the context of the finite element
discretization, e. g., of the model problem from above.
i) Smoothers: “Smoothers” are usually simple fixed-point iterations, which could princi-
pally also be used as “solvers”, but with a very bad convergence rate. They are applied
on each mesh level only a few times (ν, μ ∼ 1 − 4), for damping out the high-frequency
components in the errors or the residuals. In the following, we consider the damped
Richardson iteration with iteration matrix
as smoother, which, however, only works for very simple (scalar) and non-degenerate
problems.
5.1 Multigrid methods for linear systems 195
Remark 5.3: More powerful smoothers are based on the Gauß-Seidel and the ILU iter-
ation. These methods also work well for problems with certain pathologies. For example,
in case of strong advection in the differential equation, if the mesh points are numbered
in the direction of the transport, the system matrix has a dominant lower triangular part
L , for which the Gauß-Seidel method is nearly “exact”. For problems with degenerate
coefficients in one space direction or on strongly anisotropic meshes the system matrix has
a dominant tridiagonal part, for which the ILU method is nearly “exact”. For indefinite
saddle-point problems certain “block” iterations are used, which are specially adapted to
the structure of the problem. Examples are the so-called “Vanka-type” smoothers, which
are used in solving the “incompressible” Navier-Stokes equations in Fluid Mechanics.
ii) Grid transfers: In the context of a finite element discretization with nested subspaces
V0 ⊂ V1 ⊂ ... ⊂ Vl ⊂ ... ⊂ VL the generic choice of the prolongation pll−1 : Vl−1 → Vl
is the cellwise embedding, and of the restriction rll−1 : Vl → Vl−1 the L2 projection. For
other discretizations (e. g., finite difference schemes), one uses appropriate interpolation
operators (e. g., bi/trilinear interpolation).
iii) Corse-grid operators: The operators Al on the several spaces Vl do not need to
correspond to the same discretization of the original “continuous” problem. This as-
pect becomes important in the use of mesh-dependent numerical diffusion (“upwinding”,
“streamline diffusion”, etc.) for the treatment of stronger transport. Here, we only con-
sider the ideal case that all Al are defined by the same finite element discretization on
the mesh family {Tl }l=0,...,L . In this case, we have the following useful identity:
In the following analysis of the multigrid process, for simplicity, we will make the choice
ωl = 1.
The classical analysis of the multigrid process is based on its interpretation as a defect-
correction iteration and the concept of recursive application of the two-grid method. For
196 Multigrid Methods
simplicity, we assume that only pre-smoothing is used, i. e., ν > 0, μ = 0, and that in the
correction step no damping is applied, i. e., ωl = 1 . Then, the two-grid algorithm can be
written in the following form:
−1 L−1
L = SL (uL ) + pL−1 AL−1 rL
ut+1 fL − AL SLν (utL )
ν t L
= SLν (utL ) + pLL−1 A−1
L−1 rL AL uL − SL (uL ) .
L−1 ν t
SL (vL ) := SL vL + gL ,
et+1 t
L = ZGL (ν)eL . (5.1.35)
Proof. We write
The first term on the right-hand side describes the quality of the approximation of the fine-
grid solution on the next coarser mesh, while the second term represents the smoothing
effect. The goal of the further analysis is now to show that the smoother SL (·) possesses
the so-called “smoothing property”,
(A−1 L −1 L−1
L − pL−1 AL−1 rL )vL L2 ≤ ca hL vL L2 ,
2
vL ∈ VL , (5.1.40)
with positive constants cs , ca independent of the mesh level L . Combination of these two
estimates then yields the asserted estimate (5.1.36). For sufficiently frequent smoothing,
we have ρZG := cν −1 < 1 and the two-grid algorithm converges with a rate uniformly
w.r.t. the mesh level L . All constants appearing in the following are independent of L .
i) Smoothing property: The selfadjoint operator AL possesses real, positive eigenvalues
0 < λ1 ≤ ... ≤ λi ≤ ... ≤ λnL =: ΛL and a corresponding L2 -ONS of eigenfunctions
{w , ..., w } , such that each vL ∈ VL can be written as vL = ni=1
1 nL L
γi w i , γi = (vL , w i )L2 .
For the Richardson iteration operator,
SL := IL − θL AL : VL → VL , θL = Λ−1
L , (5.1.41)
there holds
nL λi ν i
AL SLν vL = γ i λi 1 − w, (5.1.42)
i=1
ΛL
and, consequently,
nL λi 2ν
AL SLν vL 2L2 = γi2 λ2i 1 −
i=1
ΛL
λ 2 λi 2ν
nL
i
≤ Λ2L max 1− γi2
1≤i≤nL ΛL ΛL i=1
λ 2 λi 2ν
i
= Λ2L max 1− vL 2L2 .
1≤i≤nL ΛL ΛL
By the relation (exercise)
it follows that
ii) Approximation property: We recall that in the present context of nested subspaces Vl
prolongationen and restriction operators are given by
Lv = fL in Ω, v = 0 on ∂Ω, (5.1.46)
or in “weak” formulation
There holds
i. e., vL and vL−1 are the Ritz projections of v into VL and VL−1 , respectively. For
these the following L2 -error estimates hold true:
vL − vL2 ≤ ch2L ∇2 vL2 , vL−1 − vL2 ≤ ch2L−1 ∇2 vL2 . (5.1.49)
This together with the a priori estimate (5.1.48) and observing hL−1 ≤ 4hL implies that
vL − vL−1 L2 ≤ ch2L ∇2 vL2 ≤ ch2L fL L2 , (5.1.50)
and, consequently,
A−1 L −1 L−1
L fL − pL−1 AL−1 rL fL L2 ≤ chL fL L2 .
2
(5.1.51)
A−1 L −1 L−1
L − pL−1 AL−1 rL L2 ≤ chL ,
2
(5.1.52)
The foregoing result for the two-grid algorithm will now be used for inferring the
convergence of the full multigrid method.
Theorem 5.2 (Multigrid conver- gence): Suppose that the two-grid algorithm con-
verges with rate ρZG (ν) → 0 for ν → ∞ , uniformly with respect to the mesh level L .
Then, for sufficiently frequent smoothing the multigrid method with R ≥ 2 (W-cycle)
converges with rate ρM G < 1 independent of the mesh level L,
Proof. The proof is given by induction with respect to the mesh level L . We consider
only the relevant case R = 2 (W-cycle) and, for simplicity, will not try to optimize the
constants occurring in the course of the argument. Let ν be chosen sufficiently large such
that the convergence rate of the two-grid algorithm is ρZG ≤ 1/8 . We want to show that
then the convergence rate of the full multigrid algorithm is ρM G ≤ 1/4, uniformly with
respect to the mesh level L . For L = 1 this is obviously fulfilled. Suppose now that
also ρM G ≤ 1/4 for mesh level L − 1. Then, on mesh level L , starting from the iterate
utL , with the approximative solution qL−1
2
(after 2-fold application of the coarse-mesh
correction) and the exact solution q̂L−1 of the defect equation on mesh level L − 1 , there
holds
According to the induction assumption (observing that the starting value of the multigrid
iteration on mesh level L − 1 is zero and that ρ̂L−1 = A−1 L−1
L−1 rL dL ) it follows that
q̂L−1 − qL−1
2
L2 ≤ ρ2MG q̂L−1 L2 = ρ2MG A−1
L−1 rL AL SL (uL − uL )L2 .
L−1 ν t
(5.1.55)
Combination of the last two relations implies for the iteration error etL := utL − uL that
−1 L−1
t
et+1
L L2 ≤ ρZG + ρMG AL−1 rL AL SL L2 eL L2 .
2 ν
(5.1.56)
The norm on the right-hand side has been estimated already in connection with the
convergence analysis of the two-grid algorithm. Recalling the two-grid operator
there holds
A−1
L−1 rL AL SL = SL − ZGL ,
L−1 ν ν
und, consequently,
A−1
L−1 rL AL SL L2 ≤ SL L2 + ZGL L2 ≤ 1 + ρZG ≤ 2.
L−1 ν ν
(5.1.57)
Remark 5.4: For well-conditioned problems (symmetric, positive definite operator, reg-
ular coefficients, quasi-uniform meshes, etc.) one achieves multigrid convergence rates in
the range ρM G = 0, 05 − 0, 5 . The above analysis only applies to the W-cycle since in
part (ii), we need that R ≥ 2 . The V-cycle cannot be treated on the basis of the two-
grid analysis. In the literature there are more general approaches, which allow to prove
convergence of multigrid methods under much weaker conditions.
Next, we analyze the computational complexity of the full multigrid algorithm. For
this, we introduce the following notation:
C0 = OP(A−1
0 )/n0 ,
Cs = max {OP(Sl )/nl }, Cd = max {OP(dl )/nl },
1≤l≤L 1≤l≤L
Theorem 5.3 (Multigrid complexity): Under the condition q := Rκ < 1, for the full
multigrid cycle MGL there holds
OP(MGL ) ≤ CL nL , (5.1.60)
where
(ν + μ)Cs + Cd + Cr + Cp
CL = + C0 q L .
1−q
The multigrid algorithm for approximating the nL -dimensional discrete solution uL ∈ VL
on the finest mesh TL within discretization accuracy O(h2L ) requires O(nL ln(nL )) a. op.,
and therefore has (almost) optimal complexity.
Proof. We set Cl := OP(MGl )/nl . One multigrid cycle contains the R-fold application
of the same algorithm on the next coarser mesh. Observing nl−1 ≤ κnl and setting
Ĉ := (ν + μ)Cs + Cd + Cr + Cp it follows that
This implies the asserted estimate (5.1.60). The total complexity of the algotrithm then
results from the relations
−2/d ln(nL )
ρtMG ≈ h2L ≈ nL , t≈− .
ln(ρMG )
is essential. This means for the W-cycle (R = 2) that by the transition from mesh Tl−1
to the next finer mesh Tl the number of mesh points (dimension of spaces) sufficiently
increases, comparibly to the situation of uniform mesh refinement,
nl ≈ 4nl−1 . (5.1.62)
Remark 5.5: In an adaptively driven mesh refinement process with only local mesh
refinement the condition (5.1.61) is usually not satisfied. Mostly only nl ≈ 2nl−1 can be
expected. In such a case the multigrid process needs to be modified in order to preserve
good efficiency. This may be accomplished by applying the cost-intensive smoothing only
to those mesh points, which have been newly created by the transition from mesh Tl−1 to
mesh Tl . The implementation of a multigrid algorithm on locally refined meshes requires
much care in order to achieve optimal complexity of the overall algorithm.
The nested multigrid algorithm turns out to be really complexity optimal, as it requires
only O(nL ) a. op. for producing a sufficiently accurate approximation to the discrete
solution uL ∈ VL .
Theorem 5.4 (Nested multigrid): The nested multigrid algorithm is of optimal com-
plexity, i. e., it delivers an approximation to the discrete solution uL ∈ VL on the finest
mesh TL with discretization accuracy O(h2L ) with complexity O(nL ) a. op. independent
of the mesh level L.
Proof. The accuracy requirement for the multigrid algorithm on mesh level TL is
i) First, we want to show that, under the assumptions of Theorem 5.2, the result (5.1.63)
is achievable by the nested multigrid algorithm on each mesh level L by using a fixed
(sufficiently large) number t∗ of multigrid cycles. Let etL := utL −uL be again the iteration
error on mesh level L . By assumption et0 = 0, t ≥ 1 . In case u0L := utL−1 there holds
provided that κ−2 ρtMG < 1. Obviously there exists a t∗ , such that (5.1.63) is satisfied for
t ≥ t∗ uniformly with respect to L .
ii) We can now carry out the complexity analysis. Theorem 5.3 states that one cycle
of the simple multigrid algorithm MG(l, ·, ·) on the l-th mesh level requires Wl ≤ c∗ nl
a. op. (uniformly with respect to l ). Let now Ŵl be the number of a. op. of the nested
multigrid algorithm on mesh level l . Then, by construction there holds
Iterating this relation, we obtain with κ := max1≤l≤L nl−1 /nl < 1 that
used as components of other iterative methods, such as the Krylov space methods, for
accelerating certain computation-intensive substeps. In the following, we will only briefly
describe these different approaches.
To this system, we may apply a nonlinear version of the multigrid method described in
Section 5.1 again yielding an algorithm of optimal complexity, at least in principly (for
details see, e. g., Brand et al. [27] and Hackbusch [37]). However, this approach suffers
from stability problems in case of irregularities of the underlying continuous problem,
such as anisotropies in the operator, the domain or the computational mesh, which may
spoil the convergence of the method or render it inefficient. One cause may be the lack
of approximation property in case that the continuous eigenvalue problem is not well
approximated on coarser meshes, which is essential for the convergence of the multigrid
method. The possibility of such a pathological situation is illustrated by the following
example, which suggests to use the multigrid concept not directly but rather for accel-
erating the cost-intensive components of other more robust methods such as the Krylov
space methods (or the Jacobi-Davidson method) described above.
with coefficients ν > 0 and c = (c1 , c2 ) ∈ R2 . The (real) eigenvalues are explicitly given
by
b2 + b22
λ= 1 + νπ 2 (n21 + n22 ), n1 , n2 ∈ N,
4ν
with corresponding (non-normalized) eigenfunctions
b x + b x
1 1 2 2
w(x1 , x2 ) = exp sin(n1 πx1 ) sin(n2 πx2 ).
2ν
The corresponding adjoint eigenvalue problem has the eigenfunctions
b x +b x
w ∗ (x1 , x2 ) = exp −
1 1 2 2
sin(n1 πx1 ) sin(n2 πx2 ).
2ν
204 Multigrid Methods
This shows, first, that the underlying differential operator in (5.2.66) is non-normal and
secondly that the eigenfunctions develop strong boundary layers for small parameter val-
ues ν (transport-dominant case). In particular, the eigenvalues depend very strongly
on ν. For the “direct” application of the multigrid method to this problem, this means
that the “coarse-grid problems”, due to insufficient mesh resolution, have completely dif-
ferent eigenvalues than the “fine-grid” problem, leading to insufficient approximation for
computing meaningful corrections. This renders the multigrid iteration, being based on
successive smoothing and coarse-grid correction, inefficient and may even completely spoil
convergence.
The most computation-intensive part of the Arnoldi and Lanczos methods in the case of
the approximation of the smallest (by modulus) eigenvalues of a high-dimensional matrix
A ∈ Kn×n is the generation of the Krylov space
which requires the solution of a small number m n but high-dimensional linear systems
with A as coefficient matrix. Even though the Krylov space does not need to be explicitly
constructed in the course of the modified Gram-Schmidt algorithm for the generation
of an orthonormal basis {q 1 , . . . , q m } of Km , this process requires the same amount
of computation. This computational “acceleration” by use of multigrid techniques is
exploited in Section 4.3.2 on the computation of pseudospectra. We want to illustrate this
for the simpler situation of the “inverse iteration” for computing the smallest eigenvalue
of a symmetric, positive definite matrix A ∈ Rn×n .
Recall Example 4.1 in Section 4.1.1, the eigenvalue problem of the Laplace operator
on the unit square:
∂2w ∂2w
− 2
(x, y) − (x, y) = μw(x, y) for (x, y) ∈ Ω,
∂x ∂y 2 . (5.2.67)
w(x, y) = 0 for (x, y) ∈ ∂ Ω.
The discretization of this eigenvalue problem by the 5-point difference scheme on a uniform
Cartesian mesh or the related finite element method with piecewise linear trial functions
leads to the matrix eigenvalue problem
Az = λz, λ = h2 μ, (5.2.68)
For computing λ1 , we may use the inverse iteration with shift λ = 0 . This requires in
each step the solution of a problem like
1 (A−1 z t , z t )2
:= = (z̃ t+1 , z t )2 , (5.2.70)
λt1 z t 22
z 0 22 λ2 2t
|λ1 − λt1 | ≤ λt1 . (5.2.72)
|α1 |2 λ1
Observing that λt1 ≈ λ1 ≈ h2 and h2 z 0 22 = h2 ni=1 |zi0 |2 ≈ v 02L2 , where v 0 ∈ H01 (Ω)
is the continuous eigenfunction corresponding to the eigenvector z 0 , we obtain
λ 2t
2
|λ1 − λt1 | ≤ c ≤ c 0.42t . (5.2.73)
λ1
i. e., the convergence is independent of the mesh size h or the dimension n = m2 ≈ h−2
of A. However, in view of the relation μ1 = h−2 λ1 achieving a prescribed accuracy in the
approximation of μ1 requires the scaling of the tolerance in computing λ1 by a factor
h2 , which introduces a logarithmic h-dependence in the work count of the algorithm,
log(εh2 )
t(ε) ≈ ≈ log(n). (5.2.74)
log(2/5)
Now, using a multigrid solver of optimal complexity O(n) in each iteration step (4.1.20)
the total complexity of computing the smallest eigenvalue λ1 becomes O(n log(n)) .
Remark 5.6: For the systematic use of multigrid acceleration within the Jacobi-Davidson
method for nonsymmetric eigenvalue problems, we refer to Heuveline & Bertsch [41]. This
combination of a robust iteration and multigrid acceleration seems presently to be the
most efficient approach to solving large-scale symmetric or unsymmetric eigenvalue prob-
lems.
206 Multigrid Methods
5.3 Exercises
−Δu = f, in Ω, u = 0, on ∂Ω,
on the unit square Ω = (0, 1)2 ⊂ R2 by the finite element Galerkin method using linear
shape and test functions on a uniform Cartesian triangulation Th = {K} with cells K
(rectangular triangles) of width h > 0. The lowest-order finite element space on the mesh
Th is given by
Its dimension is dimVh = nh , which coincides with the number of interior nodal points
ai , i = 1, . . . , nh , of the mesh Th . Let {ϕ1h , . . . , ϕnhh } denote the usual “nodal basis” (so-
called “Lagrange basis”) of the finite element subspace Vh defined by the interpolation
condition ϕih (aj ) = δij . Make a sketch of this situation, especially of the mesh Th and a
nodal basis function ϕih .
Then, the finite element Galerkin approximation in the space Vh as described in the text
results in the following linear system for the nodal value vector xh = (x1h , . . . , xnhh ) :
Ah xh = bh ,
where ai , i = 1, 2, 3 , are the three vertices of the triangle K and |K| its area. This
quadrature rule is exact for linear polynomials. The result is a matrix and right-hand
side vector which are exactly the same as resulting from the finite difference discretization
of the Poisson problem on the mesh Th described in the text.
Exercise 5.2: Analyze the proof for the convergence of the two-grid algorithm given in
the text for its possible extension to the case the restriction rll−1 : Vl → Vl−1 is defined
by local bilinear interpolation rather than by global L2 -projection onto the coarser mesh
Tl−1 . What is the resulting difficulty? Do you have an idea how to get around it?
−Δu + ∂1 u = f in Ω, u = 0 on ∂Ω,
leads to asymmetric system matrices Ah . In this case the analysis of the multigrid
algorithm requires some modifications. Try to extend the proof given in the text for the
5.3 Exercises 207
convergence of the two-grid algorithm for this case if again the (damped) Richardson
iteration is chosen as smoother,
xt+1
h = xth − θt (Ah xth − bh ), t = 0, 1, 2, . . . .
What is the resulting difficulty and how can one get around it?
−Δu = f, in Ω, u = 0, on ∂Ω,
on the unit square Ω = (0, 1)2 ⊂ R2 by the finite element Galerkin method using linear
shape and test functions. Let (Tl )l≥0 be a sequence of equidistant Cartesian meshes of
width hl = 2−l . The discrete equations on mesh level l are solved by a multigrid method
with (damped) Richardson smooting and the natural embedding as prolongation and the
L2 projection as restriction. The number of pre- and postsmoothing steps is ν = 2 and
μ = 0 , respectively. How many arithmetic operations are approximately required for a
V-cycle and a W-cycle depending on the dimension nl = dimVl ?
For which of these matrices do the Jacobi method, the Gauß-Seidel method and the CG
method converge unconditionally for any given initial point u0 ?
Exercise 5.7: Derive best possible estimates for the eigenvalues of the matrix
⎡ ⎤
1 10−3 10−4
⎢ ⎥
A=⎢ ⎣ 10
−3
2 10−3 ⎥⎦
10−4 10−3 3
by the enclusion lemma of Gerschgorin. (Hint: Precondition the matrix by scaling, i. e.,
by using an appropriate similarity transformation with a diagonal matrix A → D −1 AD .)
Exercise 5.8: Formulate the power method for computing the largest (by modulus)
eigenvalue of a matrix A ∈ Cn×n . Distinguish between the case of a general matrix
and the special case of a Hermitian matrix.
i) Under which conditions is convergence guaranteed?
ii) Which of these conditions is the most critical one?
iii) State an estimate for the convergence speed.
A Solutions of exercises
In this section solutions are collected for the exercises at the end of the individual chapters.
These are not to be understood as ”‘blue-print”’ solutions but rather as suggestions in
sketchy form for stimulating the reader’s own work.
A.1 Chapter 1
x + y2 + x − y2 = (x + y, x + y) + (x − y, x − y)
= (x, x) + (y, y) + (x, y) + (y, x) + (x, x) + (y, y) − (x, y) − (y, x)
= 2 x2 + 2 y2.
Similarly
c) The properties of a scalar product follow immediately from those of the Euclidean
scalar product and those assumed for the matrix A .
i) Yes. Let ·, · be an arbitrary scalar product and let {ei }1≤i≤n
be a Cartesian
basis of
Rn , such that any x, y ∈ Rn have the respresentations x = i xi ei , y = i xi ei ∈ Rn .
Define
a matrix A ∈ Rn×n by aij := ej , ei . Then, there holds (Ax, y) = ij aij xj yi =
ij xj yi ej , ei = x, y . Furthermore, A is obviously symmetric and positive definite
due to the same properties of the scalar product < ·, · >.
209
210 Solutions of exercises
2. There exists a (Hermitian) positive definite matrix A ∈ Cn×n such that x, y =
(Ax, y), x, y ∈ Cn .
Ax2 Ay2
Ax2 = x2 ≤ sup x = A2 x2 .
x y∈R n y2
c) There holds
This relation is not true for any matrix norm. As a counter example, employ the elemen-
twise maximum norm Amax := maxi,j=1,··· ,n |aij | to
( ) ( ) ( )
1 1 1 0 2 0
· = .
0 0 1 0 0 0
λx2 Ax2
|λ| = = ≤ A2 .
x2 x2
Ax2 = A xi ai 2 = λi xi ai 2 ≤ max |λi | xi ai 2 | = max |λi | x2 ,
i i
i i i
and consequently,
Ax2
≤ max |λi |.
x2 i
e) There holds
and ĀT A2 ≤ ĀT 2 A2 = A22 (observe that A2 = ĀT 2 due to Ax2 =
ĀT x̄2 , x ∈ Cn ).
A.1 Chapter 1 211
Setting x = ei and y = ej , we see that aij + āji ∈ R and i(aij − āji ) ∈ R, i.e.,
Hence, aij = Re aij + iIm aij = Re aji − iIm aji = Re āji + iIm āji = āji .
Remark: The above argument only uses that x̄T Ax ∈ R, x ∈ Cn .
n
Solution A.1.5: a) v∞ = max1≤i≤n |vi | and v1 = i=1 |vi |.
b) The “spectrum” Σ(A) is defined as Σ(A) := {λ ∈ C, λ eigenvalue of A}.
Solution A.1.6: a) aii ∈ R follows directly from the property aii = āii of a Hermitian
matrix. Positiveness follows via testing by ei , which yields aii = ēTi Aei > 0.
b) The trace of a matrix is invariant under coordinate
transformation,
i. e. similarity
transformation (may be seen by direct calculation ijk bij ajk bki = i aii or by applying
the product formula for determinants to the characteristic polynomial. Observing that a
Hermitian matrix is similar to a diagonal matrix with its eigenvalues on the main diagonal
implies the asserted identity.
c) Assume that A is singular. Then ker(A) = ∅ and there exists x = 0 such that A x = 0,
i. e., zero is an eigenvalue of A. But this contradicts the statement of Gerschgorin’s Lemma
which bounds all eigenvalues away from zero due to the strict diagonal dominance. If
212 Solutions of exercises
all diagonal entries aii > 0 , then also by Gerschgorin’s lemma all Gerschgorin circles
(and consequently all eigenvalues) are contained in the right complex half-plane. If A
is additionally Hermitian, all these eigenvalues are real and positive and A consequently
positive definite.
S is well defined due to the fact that {Sn }n∈N is a Cauchy sequence with respect to the
matrix norm · (and, by the norm equivalence in finite dimensional normed spaces,
with respect to any matrix norm). By employing the triangle inequality, using the matrix
norm property and the limit formula for the geometric series, one proofs that
n
n
1 − Bn+1 1
S = lim Sn = lim B s ≤ lim Bs = lim = .
n→∞ n→∞ n→∞ n→∞ 1 − B 1 − B
s=0 s=0
Obviously A(0) is a diagonal matrix with eigenvalues λi (0) = aii . Now observe that the
“evolution” of the ith eigenvalue λi (t) is a continuous function in t (This follows from
the fact that a root t0 of a polynomial pα is locally arbitrarily differentiable with respect
to its coefficients – a direct consequence of the implicit function theorem employed to
p(α, t) = pα (t) and a special treatment of multiple roots).
Furthermore, the Gerschgorin circles of A(t), 0 ≤ t ≤ 1 have all the same origin, only
the radii are strictly increasing. So, Gerschgorin’s Lemma implies that the image of the
function t → λi (t) lies entirely in the union of all Gerschgorin circles of A(1). And due
to the fact that it is continous obviously in the connected component containing aii .
Solution A.1.9: (i)⇒(ii): Suppose that A and B commute. First observe that for an
arbitrary eigenvector z of B with eigenvalue λ there holds:
So, Az is either 0 or also an eigenvector of B with eigenvalue λ. Due to the fact that
B is Hermitian there exists an orthonormal basis {vi }ni=1 of eigenvectors of B . So we
A.1 Chapter 1 213
Solution A.1.10: i) Let A ∈ Kn×n be an arbitrary, regular matrix and define ϕ(x, y) :=
(Ax, Ay)2 . It is clear that ϕ is a sesquilinear form. Furthermore symmetry and positivity
follow directly from the corresponding property of ( . ). For definiteness observe that
(Ax, Ax) = 0 ⇒ Ax = 0 ⇒ x = 0 due to the regularity of A.
ii) The earlier result does not contradict (i) because there holds
So, if λ1 = λ2 it must hold that (v 1 , v 2 ) = 0. Yes, this result is true in general for normal
matrices (over C) and—more generally—known as the “spectral theorem for normal op-
erators” (see [Bosch, Lineare Algebra, p. 266, Satz 7.5/8], for details).
ii) Let v be an eigenvector for the eigenvalue λmax . There holds:
The corresponding equality for λmin (A) follows by a similar argument. λmin(A) ≤
λmax (A) is obvious.
214 Solutions of exercises
Solution A.1.12: i) We use the definition (c) from the text for the ε-pseudo-spectrum.
Let z ∈ σε (A) and accordingly v ∈ Kn , v = 1 , satisfying (A − zI)v ≤ ε . Then,
we conclude that
ε ε
(A − z −1 I)w ≤ ≤ .
|z|(|z| − ε) 1−ε
This completes the proof.
A.2 Chapter 2
On the other hand, a symmetric, positive definite but not (strictly) diagonally dominant
matrix is given by
⎛ ⎞
3 2 2
⎜ ⎟
B=⎜ ⎝2 3 2⎠ ,
⎟
2 2 3
or typically system matrices arising from higher order difference approximations, e. g. the
“stretched” 5-point stencil for the Laplace problem in 1D:
A.2 Chapter 2 215
⎛ ⎞
30 −16 1
⎜ ⎟
⎜−16 30 −16 1 ⎟
⎜ ⎟
⎜ 1 −16 30 −16 1 ⎟
⎜ ⎟
n 1 ⎜⎜ . .
⎟
⎟ ∈ Rn×n .
B = .
12h ⎜
⎜
⎟
⎟
⎜ 1 −16 30 −16 1 ⎟
⎜ ⎟
⎜ ⎟
⎝ 1 −16 30 −16⎠
1 −16 30
Note: To prove that the above B n ∈ Rn×n is positive definite, compute det(B k ) > 0 for
k = 1, · · · , 3 and derive a recursion formula of the form
Solution A.2.2: The result of the first k − 1 elimination steps is a block matrix A(k−1)
of the form
& k−1 '
R ∗ k−1
A(k−1)
= k−1 , with A ∈ R(n−k)×(n−k) pos. def. (by induction).
0 A
(k)
The submatrix A obtained in the k-th step is again positive definite. Hence the result
(i) implies
(k) (k) (k−1) (k−1)
max |aij | ≤ max |aii | ≤ max |aii | ≤ max |aij |.
k≤i,j≤n k≤i≤n k≤i≤n k≤i,j≤n
Since in the k-th elimination step the first k − 1 rows are not changed anymore induction
with respect to k = 1, . . ., n yields:
(n−1) (0)
max |rij | = max |aij | ≤ max |aij | ≤ max |aij |.
1≤i,j≤n 1≤i,j≤n 1≤i,j≤n 1≤i,j≤n
We have to show the following group properties for the matrix multiplication ◦ :
(G1) Closedness: L1 , L2 ∈ L ⇒ L1 ◦ L2 ∈ L .
(G2) Associative law: L1 , L2 , L3 ∈ L ⇒ L1 ◦ (L2 ◦ L3 ) = (L1 ◦ L2 ) ◦ L3 .
(G3) Neutral element I : L ∈ L ⇒ L ◦ I = L .
(G4) Inverse: L ∈ L ⇒ ∃L−1 ∈ L : L ◦ L−1 = I .
(G1) follows by computation. (G2) and (G3) follow from the properties of matrix multi-
plication. (G4) is seen through determination of the inverse by simultaneous elimination:
⎡ ⎤ ⎡ ⎤
1 0 1 0 1 0
⎢ ⎥ ⎢ ⎥
⎢ ..
.
..
. ⎥ ⇒ L−1 = ⎢ ..
. ⎥ ∈ L.
⎣ ⎦ ⎣ ⎦
0 1 ∗ 1 ∗ 1
The argument for R is analogous. The group R is also not abelian as the following 2 × 2
example shows:
& '& ' & ' & ' & '& '
1 1 −1 1 −1 2 −1 0 −1 1 1 1
= = = .
0 1 0 1 0 1 0 1 0 1 0 1
ii) For proving the uniqueness of the LR-decompositiong let for a regular matrix A ∈ Rn×n
two LR-decompositiongs A = L1 R1 = L2 R2 be given. Then, by (i) L1 , L2 ∈ L as well
as R1 , R2 ∈ R and consequently
for the n − 1 steps of the forward elimination for computing the matrix R and si-
multanously of the matrix L . For the sparse model matrix, we have Nband = 108 +
O(106) a. op. in contrast to N = 13 1012 + O(108) a. op. for a full matrix.
ii) If A is additionally symmetric (and positive definite) one obtaines the Cholesky de-
composition from the LR decomposition by
Because of the symmetry of all resulting rduced submatrices only the elements on the
main diagonal and the upper diagonals need to be computed. This reduces the work to
Nband = 21 nm2 +O(nm) a. Op. , i. e., for the model matrix to Nband = 12 108 +O(106) a. op.,
and Nband = 12 1016 + O(1012 ) a. op., respectively.
Solution A.2.5: a) The first step of the Gaussian elimination applied on the extended
matrix [A|b] produces:
⎡ ⎤ ⎡ ⎤
1 3 −4 1 1 3 −4 1
⎢ ⎥ ⎢ ⎥
⎢ 3 9 −2 1 ⎥ ⎢ 0 0 −2 ⎥
⎢ ⎥ ⎢ 10 ⎥
⎢ ⎥ → ⎢ ⎥.
⎢ 4 12 −6 1 ⎥ ⎢ 0 0 10 −3 ⎥
⎣ ⎦ ⎣ ⎦
2 6 2 1 0 0 10 −1
The linear system is not solvable because of rank A = 2 = 3 = rank [A|b] . Observe in
particular, that A does not have full rank.
b) A straightforward calculation leads to the following normal equation:
⎡ ⎤ ⎡ ⎤
⎡ ⎤ 1 3 −4 ⎡ ⎤ ⎡ ⎤ 1
1 3 4 2 ⎢ ⎥ x1 1 3 4 2 ⎢ ⎥
⎢ ⎥ ⎢ 3 9 −2 ⎥ ⎢ ⎥ ⎢ ⎥⎢ 1 ⎥
⎢ 3 9 12 6 ⎥ ⎢ ⎥ ⎢ x ⎥ = ⎢ 3 9 12 6 ⎥⎢ ⎥.
⎣ ⎢
⎦ 4 12 −6 ⎣⎥ 2 ⎦ ⎣ ⎦⎢ 1 ⎥
⎣ ⎦ ⎣ ⎦
−4 −2 −6 2 x3 −4 −2 −6 2
2 6 2 1
218 Solutions of exercises
Hence, ⎡ ⎤⎡ ⎤ ⎡ ⎤
30 90 −30 x1 10
⎢ ⎥⎢ ⎥ ⎢ ⎥
⎢ 90 270 −90 ⎥ ⎢ x2 ⎥ = ⎢ 30 ⎥ .
⎣ ⎦⎣ ⎦ ⎣ ⎦
−30 −90 60 x3 −10
Because of Rank A = 2 < 3, the kernel of the matrix AT A ∈ R3×3 is one dimensional.
Gaussian elimination:
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
30 90 −30 10 30 90 −30 10 3 9 0 1
⎢ ⎥ ⎢ ⎥ ⎢ ⎥
⎢ 90 270 −90 30 ⎥ → ⎢ 0 0 0 0 ⎥ → ⎢ 0 0 0 0 ⎥
⎣ ⎦ ⎣ ⎦ ⎣ ⎦
−30 −90 60 −10 0 0 30 0 0 0 1 0,
A.3 Chapter 3
3
B1 = max |aij | = 0.9 < 1.
j=1,2,3
i=1
This implies convergence due to the fact that spr(B) ≤ B1 < 1 and, hence, the iteration
is contractive. (Observe that the maximal absolute row sum does not imply convergence
because B||∞ = 1.4 > 1 .) The limit z = limt→∞ xt fullfills z = B z + c . Hence,
z = (I − B)−1 c.
3
λi = det(B) = −1.
i=1
This implies that at least for one of the eigenvalues there must hold |λ| ≥ 1. So, for
the choice (ii) the fixed point iteration cannot be convergent in general: In particular, if
x0 − x happens to be an eigenvector corresponding to the above eigenvalue λ it holds
Solution A.3.2: For a general fixed point iteration xt+1 = Bxt + c the following error
A.3 Chapter 3 219
It follows by induction that in order to reduce the initial error by at least a factor of ε it
is necessary to perform the following number of iterations:
9 :
log10 (ε)
t= .
log10 (spr(B))
hence, spr(J) = 1/3 and spr(H1 ) = 1/9 . Therefore, the necessary number of iterations
is 9 : 9 :
6 6
tJ = + 1 = 13, and tH1 = = 7, respectively.
log10 (3) log10 (9)
Solution A.3.4: a) In case of the matrix A1 , the iteration matrices for the Jacobi and
220 Solutions of exercises
are real valued. For any other choice of ω they are complex. Therefore:
, 2 √
1 − ω + 18 ω 2 + 12 ω 1 − ω + 16 ω , 0 ≤ ω ≤ 8 − 4 3,
1 2
spr(Bω ) = √
ω − 1, 8 − 4 3 < ω ≤ 2.
A.3 Chapter 3 221
1 ............................................................................................ .
...
..
...................... ...
.....................
................... ...
.
................... .
............... ...
.......... ..
........ ...
....... ...
0.8 spr(Bω ) ......
....
.... .
...
.
... ...
... ...
... ..
... ...
...
... ...
.
.
... ...
0.6 ...
... ....
.. ...
..
.. ..
.....
.
0.4
0.2
0 - ω
1 ωopt 2
and
⎛ ⎞
λk1
⎜ ⎟
T −1 Ak T = T −1 AT · T −1 AT · · · T −1 AT = ⎜
⎝
..
. ⎟
⎠ ∀k ∈ N.
λkd
for an arbitrary polynomial p. So, the spectral radius of p(A) is exactly maxi=1,...,n |p(λi )|.
222 Solutions of exercises
Hence,
g(X) − g(Y ) = (X − Y )(I − AC) ≤ X − Y || I − AC.
Therefore, if I − AC =: q < 1, then g is a contraction. The corresponding fixed-
point iteration converges for every initial value X0 . The limit Z fulfills the equation
Z = Z(I − AC) + C or ZAC = C . This is equivalent to Z = A−1 .
So, if q < 1 the fixed point iteration converges for every initial value X0 ∈ Rn×n to the
limit A with the a priori error estimate
b) We have
Xt = g(Xt−1 ), g(X) := X(2I − AX).
Let Z be an arbitrary fixed point of g . It necessarily fulfills the equation Z = Z(2I−AZ)
or Z = ZAZ . Suppose that Z is regular, then Z = A−1 . Note that this assumption is
essential because the singular matrix Z = 0 is always a valid fixed point of g . To prove
convergence (under a yet to be stated assumption) we observe that:
This implies
Xt − Z ≤ A Xt−1 − Z2 .
So, for Z = A−1 and under the condition that
1
X0 − Z <
A
This iteration is exactly Newton’s method for calculating the inverse of a matrix.
Remark: It is sufficient to choose a starting value X0 that fulfills the convergence
criterion (for the preconditioner C ) in (a):
1
1 > I − AX0 = A A−1 − X0 ⇐⇒ A−1 − X0 < ,
A
Solution A.3.8: Let J be the Jordan normal form of B and T a corresponding trans-
formation matrix such that
T −1 BT = J.
Let p(X) = α0 + α1 X + α2 X 2 + · · · + αk X k be an arbitrary polynomial. Then,
T −1 p(B)T = α0 + α1 T −1 BT + α2 (T −1 BT )2 + · · · + αk (T −1 BT )k = p(J).
Furthermore, observe that multiplication (or addition) of upper triangular matrices yields
another upper triangular matrix, whose diagonal elements are formed by elementwise
multiplication (or addition) of the corresponding diagonal elements of the multiplicands.
Consequently, p(J) is an upper triangular matrix of the form
⎛ ⎞
p(λ1 )
⎜ ⎟
⎜ p(λ2 ) ∗ ⎟
⎜ ⎟
p(J) = ⎜ .. ⎟,
⎜ 0 . ⎟
⎝ ⎠
p(λn )
xt = −D −1 (L + LT )xt−1 + D −1 b,
t
pt (z) = γst z s , pt (1) = 1.
s=0
It holds
y t − x2 ≤ pt (B)2 x0 − x2 ,
with pt (B)2 = maxλ∈σ(B) |p(λ)|. So, the optimal choice for the polynomial pt (z) would
be the solution of the minimzation problem
due to the fact that the resulting iteration matrix B is similar to a symmetric matrix
D −1/2 (L + LT )D −1/2 .
This optimization problem can be solved analytically. The solutions are given by rescaled
Chebyshev polynomials: x
Tt 1−δ
pt (x) := Ct (x) = δ
.
Tt 1 + 2−2δ
Solution A.3.10: a) No, the damped Richardson equation cannot be made convergent in
general. A necessary (and sufficient) condition for convergence of the damped Richardson
equation (applied to a symmetric coefficient matrix) for arbitrary starting values is that
(& ' & ')
I O A B
spr −θ < 1.
O I BT O
For this to hold true, it is necessary that the eigenvalues of the coefficient matrix are
sufficiently small – this can be controlled by θ and is therefore not a problem – and that
all eigenvalues are positive. But this does not need to be the case, consider, e. g.,
( ) ( )
1 0 −1
A= , B= .
0 1 0
B T AB)il = bji ajk bkl = bkl akj bkl = B T AB)li 1 ≤ i, j ≤ m,
jk jk
xT B T ABx ≥ Bx2 ≥ 0 ∀x ∈ Rm .
ζ = B T A−1 b − c.
B T A−1 B = L + D + LT
Due to the specific choice of decent directions r (t) = et+1 in the coordinate relaxation,
(t+1) (t)
there holds xj = xj for j = t + 1. Consequently, it suffices to show that in the step
t → t + 1 the (t + 1)-th component is set to the correct value. Inserting the step length
(t)
gt+1 1
(t)
αt+1 = = bt+1 − at+1,k xk
at+1,t+1 at+1,t+1 k
Solution A.3.12: The CG method applied to the normal equation reads: Given an
initial value x0 and an initial decent direction
(g (t) , g (t) )
αt = , y (t+1) = y (t) + αt d(t) , g (t+1) = g (t) + αt AT Ad(t) ,
(Ad(t) , Ad(t) )
(g (t+1) , g (t+1) )
βt = , d(t+1) = −g (t+1) + βt d(t) .
(g (t) , g (t) )
with the set of singular values S(A) of A. This implies that for symmetric A the relation
κ(AT A) = κ(A)2 holds and therefore a much slower convergence speed has to be expected.
for the different methods in terms of κ = cond2 (A) = Λ/λ (with Λ maximal absolute
eigenvalue and λ minimal absolute eigenvalue) are as follows:
A.3 Chapter 3 227
1 2 1 1 2
Gauß-Seidel: spr(H1 ) = spr(J)2 = 1 − = 1−2 +O ,
6 κ κ κ
1 − 1 − spr(J)2 √ 1 1
Optimal SOR: spr(Hopt ) = 6 = 1 − 8√ + O ,
1 + 1 − spr(J)2 κ κ
1 − 1/κ 1 1 2
Gradient method: = 1−2 +O ,
1 + 1/κ κ κ
√
1 − 1/ κ 1 1
CG method: √ = 1−2√ +O .
1 + 1/ κ κ κ
B T A−1 By = B T A−1 b − c
(g (t) , g (t) )
αt = , y (t+1) = y (t) + αt d(t) , g (t+1) = g (t) + αt B T A−1 Bd(t) ,
(A−1 Bd(t) , Bd(t) )
(g (t+1) , g (t+1) )
βt = , d(t+1) = −g (t+1) + βt d(t) .
(g (t) , g (t) )
Observe that in each step it is only necessary to compute two matrix vector products
(one with B and one with B T ) and one matrix vector product with A−1 when eval-
uating A−1 Bd(t) . This can be done with the help of an iterative method, e. g. with a
preconditioned Richardson method (as introduced in the text)
ξ t = ξ t−1 + C −1 (b − Aξ t−1 ).
Now, without loss of generality, both bases are greater than 1, so that
; <
κ+1 1
⇐⇒ t(ε) ≥ log .
κ+1 ε
228 Solutions of exercises
κ+1 #1 $
Finally, observe that log κ−1
=2 κ
+ 1 1
3 κ3
+··· ≥ 2 κ1 . Hence,
1 1
⇐= 2 t(ε) ≥ log .
κ ε
√
The corresponding result for the CG method follows by replacing κ with κ.
Solution A.3.16: The matrix C can be written in the form C = KK T with the help
of 1
1
K=6 D + L D −1/2 .
(2 − ω) ω ω
A close look reveals that the iteration matrix HωSSOR of the SSOR method can be ex-
pressed in terms of C and A :
HωSSOR = I − C −1 A.
Solution A.3.17: For the model problem matrix A it holds that spr(A) < 1. Hence,
the inverse (I − J)−1 is well defined and the Neumann series converges:
∞
−1
(I − J) = J k.
k=0
∞
A−1 = D −1 J k.
k=0
Finally, observe that the multiplication of two arbitrary matrices with non-negative entries
yields another matrix with non-negative entries. Therefore the matrices J k = D −k (−L −
R)k are elementwise non-negative. So, A−1 viewed as the sum of elementwise non-negative
matrices has the same property.
Solution A.3.18: i) The stated inequality is solely a result of the special choice of x0 +
Kt (d0 ; A) as affine subspace for the optimization problem – it holds:
# $ # $
x0 + Kt (d0 , A) = x0 + span A0 d0 , · · · , At−1 d0 = x0 + p(A)d0 : p ∈ Pt−1 .
ii) Due to the fact that A is symmetric and positive definite there exists an orthonormal
basis {oi } of eigenvectors of A with corresponding eigenvalues {λi }. Let y ∈ R be an
n
We conclude that p(A)2 ≤ supi |p(λi )| and consequently (Let λ be the smallest and Λ
be the biggest eigenvalue of A ):
But this is (up to the different norms) the very same inequality that was derived for the
CG method. So, with the same line of reasoning one derives
;√ <t
κ−1
Axgmres − b2 ≤
t
√ A(x0 − x)2 .
κ+1
The difficulty of this result lies in the fact that the λi are generally complex valued, so
some a priori asumption has to be made in order to control maxi |λi |.
and hence,
4
cond2 (A) ≈ .
π 2 h2
In analogy to the text, it holds that the eigenvalues of the Jacobi iteration matrix J =
I − D −1 A are given by
230 Solutions of exercises
1
μijk = cos[ihπ] + cos[jhπ] + cos[khπ] , i, j, k = 0, . . . , m.
2
Consequently,
π2 2
spr(J) = 1 −
h + O(h4 ).
2
b) Due to the fact that the matrix A is consistently ordered, it holds
spr(H1 ) = ρ2 = 1 − π 2 h2 + O(h4 ),
6
1 − 1 − ρ2 1 − πh + O(h2 )
spr(Hωopt ) = 6 = = 1 − 2πh + O(h2 ).
1+ 1−ρ 2 1 + πh + O(h2)
A.4 Chapter 4
n
Solution A.4.1: There holds z 0 = i=1 αi w i and z t = At z 0 −1 t 0
2 A z and therefore
n
(At+1 z 0 , At z 0 )2 |αi |2 λ2t+1
λt = (Az t , z t )2 = = i=1 i
A z 2
t 0 2 n
i=1 |α i | 2 λ2t
i
#
2 λi 2t+1
$
(λn )2t+1 |αn |2 + n−1 |α i |
= i=1
n−1 λi 2t
λn
2 λi 2t
2 λi 2t λi
|αn |2 + n−1 i=1 |αi | λn
+ n−1 i=1 |αi | λn λn
−1
= λn
|αn |2 + n−1 2 λi 2t
i=1 |αi | λn
n−1
2 λi 2t λi
i=1 |αi | λn λn
−1
= λn + λn n−1 λ 2t =: λn + λn Et .
|αn |2 + i=1 |αi |2 λni
A.4 Chapter 4 231
Hence,
z 0 22 λn−1 2t
|λt − λn | ≤ |λn | .
|αn |2 λn
Solution A.4.2: Let μi := (λi − λ)−1 be the eigenvalues of the matrix (A − λI)−1 .
Further, we note that μmax = (λmin − λ)−1 . The corresponding iterates generated by the
inverse iteration are μt = (λt − λ)−1 with λt := 1/μt + λ . We begin with the identity
Next,
5t−1 n 5t−1
|α1 |2 (λ1 − λt−1 )−1 j=0 (λ1 − λj )−2 + i=2 |αi | (λi − λ
2 t−1 −1
) j=0 (λi − λj )−2
t
μ = 5t−1 n 5t−1
|α1 | 2
j=0 (λ1 − λj )−2 + i=2 |αi |
2
j=0 (λi − λ )
j −2
5 n
2 λ1 −λt−1
5t−1 λ −λj 2
(λ1 − λt−1 )−1 t−1j=0 (λ1 − λ )
j −2 |α |2 +
1 i=2 |αi | λi −λt−1
1
j=0 λi −λj
= 5t−1 5
λ1 −λj 2
j=0 (λ1 − λ )
j −2
|α1 |2 + ni=2 |αi |2 t−1 j=0 λi −λj
−λj 2 n 5 λ1 −λj 2 −λt−1
1 |α1 |2 + ni=2 |αi |2 λλ1i −λ j + i=2 |αi |2 t−1 j=0 λi −λj 1 − λλ1i −λt−1
= n 5t−1 λ1 −λj 2
λ1 − λ t−1
|α1 | + i=2 |αi |
2 2
j=0 λi −λj
n 5t−1 λ1 −λj 2 −λt−1
i=2 |αi | 1 − λλ1i −λ
2
1 1 j=0 λi −λj t−1
= + 5 λ1 −λj 2
λ1 − λt−1 λ1 − λt−1 |α1 |2 + ni=2 |αi |2 t−1 j=0 λi −λj
1 1
=: + Et .
λ1 − λt−1 λ1 − λt−1
232 Solutions of exercises
This yields
t−1
t 1 1 λ1 − λj 2 z 0 22
μ − ≤ .
λ1 − λt−1 λ1 − λt−1 j=0 λ2 − λj |α1 |2
Solution A.4.3: It suffices to prove the following two statements about the QR-iteration.
The assertion then follows by induction.
Gn−1 Gn−2 · · · G1 A = R
with ⎛ ⎞
Ii−1 0
⎜ ⎟
Gi = ⎜
⎝ G̃i ⎟,
⎠
0 In−i−1
and an orthogonal component G̃i ∈ R2×2 that eliminates the lower left off diagonal entry
of the block ( ) ( )
∗i,i ∗i,i+1 ∗ ∗
G̃i = .
ai+1,i ai+1,i+1 0 ∗
A.4 Chapter 4 233
Apart from eliminating the entry ai+1,i , the orthogonal matrix Gi only acts on the upper
right part of the (intermediate) matrix. Consequently, R is an upper triangular matrix
and it holds
à = RQ = RGT1 GT2 · · · GTn−1 .
Similarly, it follows by induction that multiplication with GTi from the right only intro-
duces at most one (lower-left) off-diagonal element at position ∗i+1,i , so à is indeed a
Hessenberg matrix.
Now, let A be symmetric. It holds QR = A = AT = RT QT and consequently R =
QT RT QT . We conclude that
à = RQ = QT RT QT Q = (RQ)T = ÃT .
yields
projKm A Qm v = λ Qm v.
But by definition of m it holds that Km is A-invariant, i. e. AKm ⊂ Km , hence
projKm (A Qm v) = A Qm v and therefore λ ∈ σ(A).
In case of m = n there is Km = Cn . Consequently, Qm ∈ Cn×n is a regular matrix and
234 Solutions of exercises
i−1
α) ũi := v i − (uj , v i )uj ,
j=1
β) ui := ũi /ũi .
α) ũi,1 := v i ,
ũi,k := ũi,k−1 − (uk−1, ũi,k−1)uk−1, for k = 2, . . . , i,
β) ui := ũi,i /ũi,i .
hence
α) ũi,1 := v i ,
ũi,k := ũi,k−1 − (uk−1, v i )uk−1, for k = 2, . . . , i,
one observes that the algorithmic complexity of both variants are exactly the same. Both
consist of i − 1 scalar-products (with n a. op.) with vector scaling and vector addition
(with n a. op.) in step (α) which sums up to
m
(i − 1) (n + n) = n m(m − 1) a. op.
i=1
Solution A.4.8: i) With the help of the Taylor expansion of the cosine:
|λijkk − λhijk | =
! %
2 (i2 + j 2 + k 2 )π 2 h2 (i4 + j 4 + k 4 )π 4 h4
(i + j 2 + k 2 )π 2 − h−2 6 − 6 − − − O(h6 )
2! 4!
(i4 + j 4 + k 4 )π 4 h2 1 1
= + O(h4 ) ≤ λ2ijk h2 + O(h4 ) ≤ λ2ijk h2 (for h sufficiently small).
4! 4! 12
ii) The maximal eigenvalue λmax that can be reliably computed with a relative tolerance
TOL fulfills the relation
1 12 TOL
λmax h2 ≈ TOL =⇒ λmax ≈ .
12 h2
The number of reliably approximateable eigenvalues (not counting multiplicities) is the
cardinality of the set
#2 λmax $
i + j 2 + k 2 : (i2 + j 2 + k 2 ) ≤ 2 , i, j, k ∈ N, 1 ≤ i, j, k ≤ m .
π
For the concrete choice of numbers this leads to:
# $
# i2 + j 2 + k 2 : (i2 + j 2 + k 2 ) ≤ 19, i, j, k ∈ N ,
Solution A.4.9: i) The inverse iteration for determining the smallest eigenvalue (with
shift λ = 0 ) reads
with intermediate guesses μt = (A−1 z t , z t ) for the smallest eigenvalue. One iteration of
the inverse iteration consists of 1 solving step consisting of cn a. op and a normalization
step of roughly 2n a. op. Determining the final guess for the eigenvalue needs another
solving step and a scalar product, in total (c + 1) n a. op. So, for 100 iteration steps we
end up with
(101 c + 201)n a. op.
The Lanczos algorithm reads: Given initial q 0 = 0, q 1 = q−1 q, β1 = 0 compute for
1 ≤ t ≤ m − 1:
r t = A−1 q t , αt = (r t , q t ), st = r t − αt q t − βt q t−1
β t+1 = st , q t+1 = st /β t+1 ,
and a final step r m = A−1 q m , αm = (r m , q m ). This procedure takes cn a. op. for the
matrix vector product with additional 5n a. op. per round. In total (respecting initial
and final computations):
(101 c + 501) n a. op.
The Lanczos algorithm will construct a tridiagonal matrix T m (with m = 100 in our
case) of which we still have to compute the eigenvalues with the help of the QR method:
B (0) = T m ,
B (i) = Q(i) R(i) , B (i+1) = R(i) Q(i) .
A.4 Chapter 4 237
From a previous exercise we already know that the intermediate B (i) will retain the
tridiagonal matrix property, so that a total workload of O(m) a. op. per round of the
QR method can be assumed. For simplicity, we assume that the number of required QR
iterations (to achieve good accuracy) also scales with O(m). Then, the total workload of
QR method is O(m2 ) a. op.
ii) Assume that it is possible to start the inverse iteration with a suitable guess for each
of the 10 desired eigenvalues. Still, it is necessary to do the full 100 iterations for each
eigenvalue independently, resulting in
The Lanczos algorithm, in contrast, already approximates the first 10 eigenvalues simul-
taneoulsy for the choice m = 100 (see results of the preceding exercise). Hence, we end
up with the same number of a. op.:
(except for some possibly higher workload in the QR iteration). Given the fact that c is
usually o moderate size somewhere around 5, the Lanczos algorithm clearly wins.
Ax = b
⇐⇒ (Re A + i Im A)(Re x + i Im x) = Re b + i Im b
!
Re A Re x − Im A Im x = Re b
⇐⇒
−Re A Im x − Im A Re x = −Im b
( )( ) ( )
Re A Im A Re x Re b
⇐⇒ = .
−Im A Re A −Im x −Im b
ii) For all three properties it holds that they are fullfilled by the block-matrix à if and
only if the correspondig complex valued matrix A has the analogous property (in the
complex sense):
a) From the above identity we deduce that the complex valued linear system of equations
(in the first line) is uniquely solvable for arbitrary b ∈ Cn if and only if the same holds
true for the real valued linear equation (in the last line) for arbitrary (Re b, Im b) ∈ R2n .
Thus à is regular iff A is regular.
b) Observe that
à symmetric
⇐⇒ Im A = −Im AT and Re A = Re AT
⇐⇒ Re A + Im A = Re A − Im AT
⇐⇒ A = ĀT .
238 Solutions of exercises
Solution A.4.11: The statement follows immediately from the equivalent definition
# $
σε (T ) = z ∈ C : σmin (zI − T ) ≤ ε , with
# $
σmin (T ) := min λ1/2 : λ ∈ σ(T̄ T T )
and by the observation that similar matrices yield the same set of eigenvalues:
# $
σmin (T ) = min λ1/2 : λ ∈ σ(T̄ T T )
# $
= min λ1/2 : λ ∈ σ(Q̄T T̄ T QQ̄T T Q)
T
= min λ1/2 : λ ∈ σ (Q̄T T Q) (Q̄T T Q)
= σmin (Q−1 T Q).
A.5 Chapter 5
Solution A.5.1: Let ai be an arbitrary nodal point and ϕih be the corresponding nodal
basis function. Its support consists of 6 triangles T1 , · · · , T6 :
Outside of ∪6i=1 Ti the function ϕih is zero. Due to the fact that ϕih is continuous and
cellwise linear, its gradient is cellwise defined and constant with values
; < ; <
1 1 1 0
∇ϕih = , ∇ϕih = ,
K1 h 1 K2 h 1
; < ; <
1 −1 1 −1
∇ϕih = , ∇ϕih = ,
K3 h 0 K4 h −1
; < ; <
1 0 1 1
∇ϕih = , ∇ϕih = ,
K5 h −1 K6 h 0
A.5 Chapter 5 239
where h denotes the length of the catheti of the triangles. With these preliminaries it
follows immediately that
6
|Kμ |
3
1 2
bi = f (aj )ϕih (aj ) = 6 h f (ai ) = h2 f (ai ).
μ=1
3 j=1
6
For the stiffness matrix aij = (∇ϕih , ∇ϕjh ), we have to consider three distinct cases: a)
where ai = aj , b) where ai and aj are endpoints of a cathetus, and c) where they are
endpoints of a hypotenuse:
6
|Kμ |
3
1
a) aii = ∇ϕih (aν ), ∇ϕih (aν ) = h2 3 (2 + 1 + 1 + 2 + 1 + 1) h−2 = 4.
μ=1
3 ν=1
6
6
|Kμ |
3
1
b) aij = ∇ϕih (aν ), ∇ϕjh (aν ) = h2 3 (−1 − 1) h−2 = −1.
μ=1
3 ν=1
6
6
|Kμ |
3
1
c) aij = ∇ϕih (aν ), ∇ϕjh (aν ) = h2 3 0 = 0.
μ=1
3 ν=1
6
This is, up to the factor h^{-2}, exactly the stencil of the finite difference discretization described in the text.
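The computation can be retraced in a few lines. The following sketch (not from the notes) recomputes the entries 4 and −1 from the tabulated cellwise gradients, for an arbitrarily chosen mesh width h.

```python
import numpy as np

h = 0.1                       # mesh width (arbitrary illustrative value)
area = 0.5 * h * h            # area |K_mu| of each right triangle

# cellwise constant gradients of phi_h^i on K_1, ..., K_6, as tabulated above
grads = np.array([[1, 1], [0, 1], [-1, 0], [-1, -1], [0, -1], [1, 0]]) / h

a_ii = sum(area * g @ g for g in grads)
print(a_ii)                   # 4.0, independent of h  (case a)

# case b): only the two triangles adjacent to the common cathetus contribute,
# and on each of them the product of the two gradients equals -1/h^2
a_ij = 2 * area * (-1.0 / h ** 2)
print(a_ij)                   # -1.0

# the finite element row (4, -1, -1, -1, -1, 0, 0) thus equals h^2 times the
# 5-point finite-difference stencil h^{-2}(4, -1, -1, -1, -1)
```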
Solution A.5.2: The principal idea for the convergence proof of the two-grid algorithm
was to prove a contraction property for
e_L^{(t+1)} = ZG_L(\nu)\, e_L^{(t)}, \qquad ZG_L(\nu) = \big( A_L^{-1} - p_{L-1}^{L} A_{L-1}^{-1} r_L^{L-1} \big) A_L S_L^{\nu},
which is obtained by combining a “smoothing property” for \| A_L S_L^{\nu} \| with the “approximation property”
\big\| A_L^{-1} - p_{L-1}^{L} A_{L-1}^{-1} r_L^{L-1} \big\| \le c_a h_L^{2}.
The first property is completely independent of the choice of restriction that is used. The
second, however, poses major difficulties for our choice of restriction: In analogy to the
proof given in the text, let \psi_L \in V_L be arbitrary. Now, v_L := A_L^{-1} \psi_L is the solution of the variational problem
a(v, \varphi) = (\psi_L, \varphi) \quad \forall \varphi \in V, \qquad a(\tilde{v}, \varphi) = (r_L^{L-1} \psi_L, \varphi) \quad \forall \varphi \in V.
We can employ the usual a priori error estimate (for the Ritz projection) and the bound
\| r_L^{L-1} \psi_L \| \le c\, \| \psi_L \|.
But now, r_L^{L-1} is not the L^2-projection. So we have to assume that in general r_L^{L-1} \psi_L \ne \psi_L, and hence v \ne \tilde{v}. This is a problem, because a necessary bound of the form required for the approximation property is no longer available.
Solution A.5.3: This time, the problem in carrying the proof over to the given situation arises in the smoothing property; the proof of the approximation property does not need symmetry. We still have an inverse property of the form \| A_L \| \le c\, h_L^{-2}. So, it remains to show that
\| S_L \| \le c < 1
for S_L = I_L - \theta A_L with a constant c that is independent of L. Because A_L is not symmetric, it is not possible to copy the arguments (which utilize spectral theory) from the text. We proceed differently: First of all, observe that for all u_L \in V_L it holds that
Hence, A_L is positive definite, or, equivalently, for all (complex-valued) eigenvalues \lambda_i, i = 1, \dots, N_L, of A_L it holds:
with a constant c independent of L. The smoothing property now follows from the general observation that for every \varepsilon > 0 there exists an (operator, or induced matrix) norm \| \cdot \|_* with
\| S_L \|_* \le c + \varepsilon.
The question remains whether this extends to an L-independent convergence rate in the norm \| \cdot \|.
Solution A.5.4: Applying one step of the Richardson iteration \bar{x}^{n+1} = \bar{x}^n + \theta (b - A_l \bar{x}^n) essentially requires one matrix-vector multiplication with a complexity of 9 N_l a. op. (since at most 9 matrix entries per row are non-zero). Together with the necessary additions, the smoothing operation S_l^{\nu} needs 11\, \nu N_l a. op.
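For illustration only (this sketch is not part of the notes and uses scipy.sparse, which is assumed to be available), a Richardson smoothing step can be written as follows; the matrix is a 1D stand-in with three nonzeros per row, and the damping parameter θ = 1/4 is an ad-hoc choice for this particular matrix.

```python
import numpy as np
import scipy.sparse as sp

def richardson_smoother(A, b, x, theta, nu):
    """nu steps of x <- x + theta*(b - A x); each step is dominated by one sparse matvec."""
    for _ in range(nu):
        x = x + theta * (b - A @ x)
    return x

# 1D model problem as a stand-in: tridiag(-1, 2, -1) stored in CSR format
N = 100
A = sp.diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(N, N), format="csr")
b = np.ones(N)
x = richardson_smoother(A, b, np.zeros(N), theta=0.25, nu=2)
print(np.linalg.norm(b - A @ x))   # residual after two smoothing steps
```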
Calculating the defect d^l = f^l - A_l x^l needs another 10 N_l a. op. For the L^2 projection onto the coarser grid, we need to calculate
\tilde{d}^{\,l-1} := r_l^{l-1} d^l.
This can be done very efficiently: Let \{\varphi_i^l\} be the nodal basis on level l. The i-th component of the L^2 projection \tilde{d}^{\,l-1} is given by
\tilde{d}^{\,l-1}_i = (r_l^{l-1} d^l, \varphi_i^{l-1}) = (d^l, \varphi_i^{l-1}).
Moreover, each coarse-grid basis function can be expanded in the fine-grid basis,
\varphi_i^{l-1} = \sum_{j=1}^{N_l} \mu_{ij}\, \varphi_j^l,
where at most 9 of the values \mu_{ij} are nonzero. This reduces the computation of the L^2 projection to
\tilde{d}^{\,l-1}_i = \sum_{j=1}^{N_l} \mu_{ij}\, (d^l, \varphi_j^l) = \sum_{j=1}^{N_l} \mu_{ij}\, d^l_j
and needs 9 N_l a. op. In contrast, the prolongation is relatively cheap with roughly 2 N_l a. op. (interpolating intermediate values, neglecting the one in the middle and the boundary, . . . ). Additionally, we account another N_l a. op. for adding the correction. In total (with one pre- and one post-smoothing step):
(2 \cdot 11 + 10 + 9 + 2 + 1)\, N_l = 44\, N_l \quad \text{a. op. on level } l.
The dimension of the subspaces behaves roughly like
N_{l-k} \approx 2^{-2k} N_l.
Within a V-cycle all operations have to be done exactly once on every level, hence (neglecting the cost for solving on the coarsest level) we end up with
\sum_{k=0}^{l} 44\, N_{l-k} = \sum_{k=0}^{l} \frac{44}{2^{2k}}\, N_l = \frac{4}{3}\, 44\, N_l \left( 1 - 2^{-(2l+2)} \right) \le \frac{4}{3} \cdot 44\, N_l \quad \text{a. op.}
For the W-cycle, level l-k is visited 2^k times, hence
\sum_{k=0}^{l} 2^{k} \cdot 44\, N_{l-k} = \sum_{k=0}^{l} \frac{44}{2^{k}}\, N_l = 2 \cdot 44\, N_l \left( 1 - 2^{-(l+1)} \right) \le 2 \cdot 44\, N_l \quad \text{a. op.}
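The two geometric-series bounds can be confirmed with a few lines of arithmetic (an illustrative sketch, not from the notes); the values of N_l and of the number of levels are arbitrary.

```python
# work per level: 44*N on the finest grid, levels shrink by a factor 4 in size
N_l, levels = 1_000_000, 10

v_cycle = sum(44 * N_l * 4 ** (-k) for k in range(levels + 1))
w_cycle = sum(2 ** k * 44 * N_l * 4 ** (-k) for k in range(levels + 1))

print(v_cycle, 4 / 3 * 44 * N_l)   # V-cycle work stays below (4/3)*44*N_l
print(w_cycle, 2 * 44 * N_l)       # W-cycle work stays below 2*44*N_l
```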
Solution A.5.5: a) A matrix A \in R^{n \times n} is diagonalizable if there exists a regular T \in R^{n \times n} and a diagonal matrix D \in R^{n \times n} such that
T^{-1} A T = D.
b) A matrix A = (a_{ij}) is diagonally dominant if there holds
\sum_{j=1,\, j \ne i}^{n} |a_{ij}| \le |a_{ii}|, \qquad i = 1, \dots, n.
g) A Gerschgorin circle is a closed disc, denoted by \bar{K}_\rho(a_{ii}), associated with a row (or column) of a matrix; it is determined by the diagonal value a_{ii} (the center) and the absolute sum of the off-diagonal elements, \rho = \sum_{j \ne i} |a_{ij}| (or \rho = \sum_{j \ne i} |a_{ji}|, respectively), as radius. The union of all Gerschgorin
circles of a matrix has the property that it contains all eigenvalues of the matrix.
h) The restriction r_l^{l-1} : V_l \to V_{l-1} is used to transfer an intermediate value v_l \in V_l to the next coarser level V_{l-1}, typically a given finite element function to the next coarser mesh. The prolongation operator p_{l-1}^{l} : V_{l-1} \to V_l does the exact opposite: it transfers an intermediate result from V_{l-1} to the next finer level.
i) Given an arbitrary b \in C^n, the Krylov space is defined as
K_m(b; A) = \operatorname{span}\big\{ b, Ab, \dots, A^{m-1} b \big\}.
The orthonormalization step reads
\tilde{u}^k = v^k - \sum_{i=1}^{k-1} (v^k, u^i)\, u^i, \qquad u^k = \tilde{u}^k / \|\tilde{u}^k\|.
In the classical Gram-Schmidt method this is done in a straightforward manner; in the modified version a slightly different algorithm is used:
\tilde{u}^{k,1} := v^k, \qquad \tilde{u}^{k,i+1} := \tilde{u}^{k,i} - (\tilde{u}^{k,i}, u^i)\, u^i, \quad i = 1, \dots, k-1,
with u^k = \tilde{u}^{k,k} / \|\tilde{u}^{k,k}\|. Both algorithms are equivalent in exact arithmetic, but the latter is much more stable in floating-point arithmetic.
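The difference in floating-point behaviour can be observed directly. The sketch below (my own illustration, not from the notes) applies both variants to a deliberately ill-conditioned set of column vectors and measures the loss of orthogonality via the size of I − U^T U.

```python
import numpy as np

def classical_gs(V):
    U = np.zeros_like(V)
    for k in range(V.shape[1]):
        u = V[:, k] - U[:, :k] @ (U[:, :k].T @ V[:, k])  # subtract all projections of v^k at once
        U[:, k] = u / np.linalg.norm(u)
    return U

def modified_gs(V):
    U = V.copy()
    for k in range(V.shape[1]):
        for i in range(k):
            U[:, k] -= (U[:, k] @ U[:, i]) * U[:, i]      # project out u^i from the updated vector
        U[:, k] /= np.linalg.norm(U[:, k])
    return U

rng = np.random.default_rng(0)
Q1, _ = np.linalg.qr(rng.standard_normal((50, 10)))
Q2, _ = np.linalg.qr(rng.standard_normal((10, 10)))
V = Q1 @ np.diag(10.0 ** -np.arange(10)) @ Q2             # condition number about 1e9

for gs in (classical_gs, modified_gs):
    U = gs(V)
    print(gs.__name__, np.linalg.norm(np.eye(10) - U.T @ U))  # modified GS loses far less orthogonality
```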
Solution A.5.6: i) The matrix A1 fulfils the weak row-sum criterion. Therefore the
Jacobi and Gauß-Seidel methods converge. Furthermore, A1 is symmetric and positive
definite (because it is regular and diagonally dominant), hence the CG method is applicable.
ii) For A_2 the Jacobi matrix reads
J = \frac{1}{2} \begin{pmatrix} 0 & 1 & -1 \\ 1 & 0 & 1 \\ -1 & 1 & 0 \end{pmatrix},
with eigenvalues \lambda_1 = -1 and \lambda_2 = \lambda_3 = \frac{1}{2}, so that \rho(J) = 1. Hence, no convergence in general. The Gauß-Seidel matrix is
H_1 = \frac{1}{8} \begin{pmatrix} 0 & 4 & -4 \\ 0 & 2 & 2 \\ 0 & -1 & 3 \end{pmatrix},
with eigenvalues \lambda_1 = 0 and \lambda_{2,3} = \frac{5}{16} \pm i\, \frac{\sqrt{7}}{16}, so that \rho(H_1) = \frac{\sqrt{2}}{4} < 1. Hence, the Gauß-Seidel iteration does converge. Moreover, A_2 is symmetric and positive definite (because it is regular and diagonally dominant), hence the CG method is applicable.
iii) The matrix A_3 is not symmetric, so the CG method is not directly applicable. For the Jacobi method,
J = \frac{1}{2} \begin{pmatrix} 0 & 1 & -1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix},
with corresponding eigenvalues \lambda_1 = 0, \lambda_{2,3} = \pm \frac{1}{2}. Hence, the method does converge. Similarly, for the Gauß-Seidel method,
H_1 = \frac{1}{8} \begin{pmatrix} 0 & 4 & -4 \\ 0 & 2 & 2 \\ 0 & 3 & -1 \end{pmatrix},
with eigenvalues \lambda_1 = 0, \lambda_{2,3} = \frac{1}{16} \pm \frac{\sqrt{33}}{16}, so that \rho(H_1) = \frac{1 + \sqrt{33}}{16} < 1. The method does converge.
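The convergence statements for A_2 and A_3 can be double-checked numerically. In the sketch below (illustrative only), the matrices A_2 and A_3 are taken to be the ones consistent with the iteration matrices J and H_1 displayed above; since the exercise statement itself is not reproduced here, this reconstruction is an assumption.

```python
import numpy as np

def jacobi_matrix(A):
    D = np.diag(np.diag(A))
    return np.eye(len(A)) - np.linalg.solve(D, A)          # J = I - D^{-1} A

def gauss_seidel_matrix(A):
    DL = np.tril(A)                                        # D + L
    U = np.triu(A, k=1)
    return -np.linalg.solve(DL, U)                         # H = -(D + L)^{-1} U

# assumed forms of A2 and A3, chosen to reproduce the displayed J and H1
A2 = np.array([[2., -1., 1.], [-1., 2., -1.], [1., -1., 2.]])
A3 = np.array([[2., -1., 1.], [-1., 2., -1.], [-1., -1., 2.]])

for name, A in (("A2", A2), ("A3", A3)):
    rho_J = np.max(np.abs(np.linalg.eigvals(jacobi_matrix(A))))
    rho_H = np.max(np.abs(np.linalg.eigvals(gauss_seidel_matrix(A))))
    print(name, rho_J, rho_H)   # A2: rho(J) = 1 (no convergence), rho(H1) < 1; A3: both < 1
```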
Now, we choose d \in R in such a way that the Gerschgorin circle defined by the first column has minimal radius but is still disjoint from the other two Gerschgorin circles. Therefore, a suitable choice of d must fulfil (the first two Gerschgorin circles must not touch):
1 + 1.1 \times 10^{-3}\, d < 2 - 10^{-3} - 10^{-3}\, d^{-1}.
Solving this quadratic inequality leads to the necessary condition d > 0.001001 (and . . . ), hence d = 0.0011 is a suitable choice. This improves the radius of the first Gerschgorin circle to
\rho_1 = (1.1 \times 10^{-3})^2 = 1.21 \times 10^{-6}, \quad \text{i.e., the disc } \bar{K}_{1.21 \times 10^{-6}}(1).
Similarly, for the third Gerschgorin circle and with the choice D = diag(1, 1, d) :
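The effect of such a diagonal similarity scaling on the (column-wise) Gerschgorin radii can be illustrated with a short sketch. The matrix below is a hypothetical example with a structure similar to the one discussed above, not the exercise's actual matrix, and the scaling D = diag(d, 1, 1) for the first circle is likewise an assumption.

```python
import numpy as np

def column_gerschgorin_radii(A):
    """Column-wise Gerschgorin radii rho_j = sum_{i != j} |a_ij|."""
    return np.sum(np.abs(A), axis=0) - np.abs(np.diag(A))

# hypothetical test matrix (NOT the matrix from the exercise, which is not shown here)
A = np.array([[1.0, 1.0e-3, 1.0e-4],
              [1.1e-3, 2.0, 1.0e-3],
              [1.0e-4, 1.0e-3, 3.0]])

for d in (1.0, 1.1e-3):
    D = np.diag([d, 1.0, 1.0])
    B = np.linalg.inv(D) @ A @ D     # similar to A, hence the same eigenvalues
    print(d, column_gerschgorin_radii(B))
# shrinking d shrinks the first disc around a_11 while enlarging the second one,
# which is exactly the trade-off expressed by the inequality above
```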
The power method generates the normalized iterates
\tilde{z}^t := A z^{t-1}, \qquad z^t = \tilde{z}^t / \|\tilde{z}^t\|,
together with the eigenvalue approximation
\lambda^t := \frac{(A z^t)_r}{z^t_r},
where r is an index such that |z^t_r| = \max_{j=1,\dots,n} |z^t_j|. In the case of a Hermitian matrix A, the eigenvalue approximation can be determined with the help of the Rayleigh quotient:
\lambda^t := \frac{(A z^t, z^t)_2}{\| z^t \|_2^2}.
i) The power method converges if A is diagonalizable and the eigenvalue of largest modulus is separated from the other eigenvalues, i. e., |\lambda_n| > |\lambda_i| for i < n. Furthermore, the starting vector z^0 must have a non-trivial component in the direction of the eigenvector w^n corresponding to \lambda_n.
ii) The separation of the largest eigenvalue from the others is the most crucial restriction, because the convergence rate is directly connected to this property (see iii)); the other two conditions are usually fulfilled in practice (even if only due to round-off errors).
iii) The power method has the following a priori error estimate (for a general matrix):
\lambda^t = \lambda_{\max} + O\Big( \Big| \frac{\lambda_{n-1}}{\lambda_n} \Big|^t \Big), \qquad t \to \infty.
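A compact implementation of the power method with both eigenvalue estimates discussed above could look as follows; this is an illustrative sketch, not the notes' own code, and the test matrix and number of steps are arbitrary choices.

```python
import numpy as np

def power_method(A, z0, steps=100, hermitian=False):
    """Power iteration z <- Az/||Az|| with the eigenvalue estimates described above."""
    z = z0 / np.linalg.norm(z0)
    for _ in range(steps):
        w = A @ z
        z = w / np.linalg.norm(w)
        Az = A @ z
        if hermitian:
            lam = (Az @ z) / (z @ z)          # Rayleigh quotient
        else:
            r = np.argmax(np.abs(z))          # index of the component of largest modulus
            lam = Az[r] / z[r]
    return lam, z

A = np.diag([1.0, 2.0, 5.0]) + 0.1 * np.ones((3, 3))   # illustrative symmetric test matrix
lam, _ = power_method(A, np.ones(3), hermitian=True)
print(lam, max(np.linalg.eigvalsh(A)))                 # converges to the dominant eigenvalue
```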