Literature
The material of this course is based on the following textbooks:
G. Golub, C. Van Loan. Matrix computations. Baltimore, 1996.
R. B. Lehoucq, D. C. Sorensen, and C. Yang. ARPACK Users' Guide: Solution of
Large-Scale Eigenvalue Problems with Implicitly Restarted Arnoldi Methods. Philadelphia, 1998.
Y. Saad. Numerical methods for large eigenvalue problems. Manchester, 1992.
L. Trefethen, D. Bau. Numerical linear algebra. Philadelphia, 1997.
D. Watkins. Fundamentals of matrix computations. New York, 2002.
The following books are also useful to complement the material of these notes:
J. Demmel. Applied numerical linear algebra. Philadelphia, 1997.
N.J. Higham. Accuracy and stability of numerical algorithms. Philadelphia, 2002.
G.W. Stewart. Matrix algorithms. Philadelphia, 1998-2001, 2 Volumes.
G.W. Stewart, J.G. Sun. Matrix perturbation theory. Boston, 1990.
Chapter 0
Introduction
The main topics of Numerical Linear Algebra are the solution of different classes of eigenvalue
problems and linear systems.
For the eigenvalue problem we discuss different classes.
(a) The standard eigenvalue problem: For a real or complex matrix $A \in \mathbb{C}^{n,n}$, determine
$x \in \mathbb{C}^n$, $\lambda \in \mathbb{C}$, such that
$$Ax = \lambda x.$$
The standard eigenvalue problem is a special case of the generalized eigenvalue problem:
For real or complex matrices $A, E \in \mathbb{C}^{m,n}$, determine $x \in \mathbb{C}^n$, $\lambda \in \mathbb{C}$, such that
$$Ax = \lambda E x.$$
In many applications the coefficient matrices have extra properties such as being real and
symmetric or complex Hermitian.
For linear systems
$$Ax = b, \qquad x \in \mathbb{C}^n, \quad b \in \mathbb{C}^m,$$
with $A \in \mathbb{C}^{m,n}$, the coefficient matrices may again have extra properties.
We will concentrate in this course on the numerical solution of standard and generalized
eigenvalue problems and the solution of linear systems. We will briefly review some of the
standard techniques for small scale problems and put an emphasis on large scale problems.
Applications: Eigenvalue problems arise in
the vibrational analysis of structures and vehicles (classical mechanics);
the analysis of the spectra and energy levels of atoms and molecules (quantum mechanics);
model reduction techniques, where a large scale model is reduced to a small scale model
by discarding parts that are only weakly important;
many other applications.
Linear systems arise in almost any area of science and engineering.
The methods discussed in this course can be summarized as follows:
Eigenvalue problems (EVP), $A$ small: QR algorithm, QZ algorithm.
Eigenvalue problems (EVP), $A$ large: Lanczos, Arnoldi, Jacobi-Davidson.
Linear systems (LS), $A$ large: CG, GMRES.
Chapter 1
Matrix theory

1.1 Basics

1.1.1 Eigenvalues and Eigenvectors
1.1.2 Matrix norms

$$\|A\|_p = \max_{x \neq 0} \frac{\|Ax\|_p}{\|x\|_p}, \qquad \|A\|_1 = \max_{j} \sum_{i=1}^{m} |a_{ij}|.$$
1.1.3

1.1.4 Subspaces

1.1.5 Invariant subspaces
1.2 Matrix decompositions

1.2.1 Schur decomposition
Remark 9 In the Schur decomposition U can be chosen such that the eigenvalues of A appear
in arbitrary order on the diagonal.
Definition 10 A matrix $T \in \mathbb{R}^{n,n}$ is called a quasi-upper triangular matrix if $T$ is a block upper triangular matrix and the diagonal blocks have size at most $2 \times 2$.
Theorem 11 (Murnaghan, Wintner, 1931)
Let $A \in \mathbb{R}^{n,n}$. Then there exists a real orthogonal matrix $Q$ (i.e., $Q^T Q = I$) such that
$$T = Q^T A Q$$
is quasi-upper triangular.
Proof: The proof is similar to that of the Theorem of Schur: If $A$ has a real eigenvector,
then we can proceed with the induction as in the complex case. Otherwise, for a complex
eigenvector $v = v_1 + i v_2$ associated with the complex eigenvalue $\lambda = \lambda_1 + i \lambda_2$, with $v_1, v_2 \in \mathbb{R}^n$,
$\lambda_1, \lambda_2 \in \mathbb{R}$, $\lambda_2 \neq 0$, we have from
$$A (v_1 + i v_2) = (\lambda_1 + i \lambda_2)(v_1 + i v_2)$$
that
$$A v_1 = \lambda_1 v_1 - \lambda_2 v_2, \qquad A v_2 = \lambda_2 v_1 + \lambda_1 v_2,$$
i.e.,
$$A [v_1\ v_2] = [v_1\ v_2] \begin{bmatrix} \lambda_1 & \lambda_2 \\ -\lambda_2 & \lambda_1 \end{bmatrix}.$$
Hence $\operatorname{Span}\{v_1, v_2\}$ is a two-dimensional invariant subspace, and with a unitary $Q$ whose first two columns span this subspace we obtain
$$Q^H A Q = \begin{bmatrix} \begin{matrix} \lambda_1 & \lambda_2 \\ -\lambda_2 & \lambda_1 \end{matrix} & A_{12} \\ 0 & A_{22} \end{bmatrix}.$$
The remaining steps of the induction are as in the complex case.
Proof: Exercise.
Corollary 15 If $A \in \mathbb{C}^{n,n}$ is normal, then there exists a unitary $U \in \mathbb{C}^{n,n}$ such that $U^H A U$ is diagonal.
Proof: Exercise. $\Box$
Theorem 16 (Generalized Schur form) Let $A, E \in \mathbb{C}^{n,n}$ be such that the pair $(E, A)$ is
regular, i.e., $\det(\lambda E - A) \neq 0$ for some $\lambda \in \mathbb{C}$. Then there exist unitary $U, V \in \mathbb{C}^{n,n}$ such that
$$S = U^H E V, \qquad T := U^H A V$$
are upper triangular.
Proof: Let $(E_k)$ be a sequence of nonsingular matrices that converges to $E$. For every $k$ let
$$Q_k^H A E_k^{-1} Q_k = T_k$$
be a Schur decomposition of $A E_k^{-1}$ and let $Z_k^H (E_k^{-1} Q_k) = S_k^{-1}$ be a QR decomposition. Then
both $Q_k^H A Z_k = T_k S_k$ and $Q_k^H E_k Z_k = S_k$ are upper triangular.
Using the Bolzano-Weierstrass theorem it follows that the bounded sequence $(Q_k, Z_k)$ has a
convergent subsequence with limit $(Q, Z)$, where $Q, Z$ are unitary. Then $Q^H A Z = T$ and
$Q^H E Z = S$ are upper triangular. $\Box$
1.2.2 The singular value decomposition

Theorem (SVD) For every $A \in \mathbb{C}^{m,n}$ there exist unitary matrices $U \in \mathbb{C}^{m,m}$ and $V \in \mathbb{C}^{n,n}$ such that
$$A = U \Sigma V^H, \qquad \Sigma = \begin{bmatrix} \Sigma_r & 0 \\ 0 & 0 \end{bmatrix} \in \mathbb{C}^{m,n}, \qquad \Sigma_r = \operatorname{diag}(\sigma_1, \dots, \sigma_r), \quad \sigma_1 \geq \dots \geq \sigma_r > 0.$$
Furthermore, $\sigma_1 = \|A\|_2$ and $\sigma_1, \dots, \sigma_r$ are uniquely determined.

Proof: Exercise. $\Box$
(a) We have $A^H A = V \Sigma^H \Sigma V^H$ and
$$A A^H = U \Sigma V^H V \Sigma^H U^H = U \Sigma \Sigma^H U^H = U \Sigma^2 U^H,$$
i.e., $\sigma_1^2, \dots, \sigma_r^2$ are the nonzero eigenvalues of $AA^H$ and $A^H A$, respectively.
(b) Since $AV = U\Sigma$, one has $\operatorname{Kernel}(A) = \operatorname{Span}\{v_{r+1}, \dots, v_n\}$ and $\operatorname{Image}(A) = \mathcal{R}(A) = \operatorname{Span}\{u_1, \dots, u_r\}$.
(c) The SVD allows optimal low-rank approximation of $A$: since
$$A = U \Sigma V^H = \sum_{j=1}^{r} \sigma_j u_j v_j^H,$$
the truncated sum $\sum_{i=1}^{k} \sigma_i u_i v_i^H$ has rank at most $k$ and
$$\inf_{B \in \mathbb{C}^{m,n},\ \operatorname{Rank}(B) \leq k} \|A - B\|_2 = \sigma_{k+1},$$
where $\sigma_{r+1} := 0$.
(d) If A is real, then also U and V can be chosen real.
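To illustrate item (c), the following sketch (assuming NumPy is available; the test matrix and the truncation rank are arbitrary illustrative choices) computes the best rank-$k$ approximation from the SVD and checks that the 2-norm error equals $\sigma_{k+1}$.

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 6))          # arbitrary test matrix
k = 2                                    # truncation rank

U, s, Vh = np.linalg.svd(A, full_matrices=False)   # A = U @ diag(s) @ Vh
A_k = U[:, :k] @ np.diag(s[:k]) @ Vh[:k, :]        # best rank-k approximation

# By the statement above, the 2-norm error of the best rank-k
# approximation equals the (k+1)-st singular value.
print(np.linalg.norm(A - A_k, 2), s[k])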
1.3 Perturbation theory

In the analysis of numerical methods we will have to study the question: how do eigenvalues, eigenvectors,
and invariant subspaces behave under small perturbations of the matrix?
1.3.1 Canonical angles

Let $\mathcal{U}, \mathcal{V} \subseteq \mathbb{C}^n$ be subspaces. Consider $x \in \mathcal{U}$, $y \in \mathcal{V}$ with $\|x\| = \|y\| = 1$ such that $|\langle x, y \rangle| = \max$.
W.l.o.g. we can choose $x$ and $y$ such that their scalar product is real and nonnegative.
Otherwise we can take $z \in \mathbb{C}$ with $|z| = 1$, so that $\langle zx, y \rangle = z \langle x, y \rangle$ is real and nonnegative.
Then $|\langle x, y \rangle| = |\langle zx, y \rangle|$.
(a) Choose $x_1 \in \mathcal{U}$ and $y_1 \in \mathcal{V}$ with $\|x_1\| = \|y_1\| = 1$ such that
$$\langle x_1, y_1 \rangle = \max \{\operatorname{Re} \langle x, y \rangle \mid x \in \mathcal{U},\ y \in \mathcal{V},\ \|x\| = \|y\| = 1\}.$$
Then $\langle x_1, y_1 \rangle$ is real, $\theta_1 = \arccos \langle x_1, y_1 \rangle$ is called the first canonical angle, and $x_1, y_1$ are
called first canonical vectors.
(b) Suppose that we have determined $j-1$ canonical angles and vectors, i.e.,
$$x_1, \dots, x_{j-1} \in \mathcal{U}, \qquad y_1, \dots, y_{j-1} \in \mathcal{V}$$
are determined with $(x_1, \dots, x_{j-1})$ and $(y_1, \dots, y_{j-1})$ orthonormal.
Choose $x_j \in \mathcal{U}$ and $y_j \in \mathcal{V}$ with $x_j \perp x_1, \dots, x_{j-1}$ and $y_j \perp y_1, \dots, y_{j-1}$ and $\|x_j\| = \|y_j\| = 1$, so that
$$\langle x_j, y_j \rangle$$
has maximal real part. Then $\langle x_j, y_j \rangle$ is real,
$$\theta_j := \arccos \langle x_j, y_j \rangle$$
is the $j$-th canonical angle, and $x_j, y_j$ are $j$-th canonical vectors. Proceeding inductively
we obtain $k$ canonical angles $0 \leq \theta_1 \leq \dots \leq \theta_k \leq \pi/2$ and orthonormal bases $(x_1, \dots, x_k)$
and $(y_1, \dots, y_k)$ of $\mathcal{U}$, $\mathcal{V}$, respectively.
Proof: Exercise. $\Box$

With $X = [x_1, \dots, x_k]$ and $Y = [y_1, \dots, y_k]$ we have
$$X^H Y = (\langle x_i, y_j \rangle) = \begin{bmatrix} \cos\theta_1 & & 0 \\ & \ddots & \\ 0 & & \cos\theta_k \end{bmatrix}$$
with $\cos\theta_1 \geq \dots \geq \cos\theta_k \geq 0$, and this is an SVD.
Practical computation of canonical angles and vectors
(a) Determine orthonormal bases of $\mathcal{U}$ and $\mathcal{V}$, i.e., isometric matrices $P, Q \in \mathbb{C}^{n,k}$ with
$\mathcal{R}(P) = \mathcal{U}$, $\mathcal{R}(Q) = \mathcal{V}$.
(b) Compute the SVD of $P^H Q$,
$$P^H Q = U \Sigma V^H,$$
with the diagonal matrix $\Sigma = \operatorname{diag}(\cos\theta_1, \dots, \cos\theta_k)$. Then
$$\underbrace{(PU)^H}_{X^H} \underbrace{QV}_{Y} = \Sigma,$$
i.e., the columns of $X = PU$ and $Y = QV$ are canonical vectors.
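A minimal NumPy sketch of this procedure (the two subspaces are arbitrary illustrative choices): orthonormal bases are obtained from QR factorizations, and the canonical angles are the arccosines of the singular values of $P^H Q$.

import numpy as np

rng = np.random.default_rng(1)
n, k = 10, 3
# Arbitrary subspaces U = R(P), V = R(Q), given by isometric P, Q
P, _ = np.linalg.qr(rng.standard_normal((n, k)))
Q, _ = np.linalg.qr(rng.standard_normal((n, k)))

U, sigma, Vh = np.linalg.svd(P.conj().T @ Q)
theta = np.arccos(np.clip(sigma, -1.0, 1.0))   # canonical angles
X = P @ U                                      # canonical vectors for U
Y = Q @ Vh.conj().T                            # canonical vectors for V
print(theta)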
1.3.2 Distance between subspaces

For subspaces $\mathcal{U}, \mathcal{V} \subseteq \mathbb{C}^n$ we measure the distance by
$$d(\mathcal{U}, \mathcal{V}) := \max_{x \in \mathcal{U},\ \|x\| = 1}\ \min_{y \in \mathcal{V}} \|x - y\|.$$

Lemma 24 Let
$$\mathcal{U} = \mathcal{R}\left(\begin{bmatrix} I_m \\ 0 \end{bmatrix}\right) \qquad \text{and} \qquad \tilde{\mathcal{U}} = \mathcal{R}\left(\begin{bmatrix} I_m \\ X \end{bmatrix}\right).$$
Chapter 2
2.1 The power method

Idea: Take an arbitrary $q \in \mathbb{C}^n \setminus \{0\}$ and form the sequence $q, Aq, A^2 q, \dots$. What will happen?
Assumption: $A$ is diagonalizable. Let $\lambda_1, \dots, \lambda_n$ with $|\lambda_1| \geq \dots \geq |\lambda_n|$ be the eigenvalues
of $A$ and let $(v_1, \dots, v_n)$ be a basis of eigenvectors. Then there exist $c_1, \dots, c_n$ with
$$q = c_1 v_1 + \dots + c_n v_n.$$
Further assumption: $c_1 \neq 0$ (this happens with probability 1 if $q$ is random). Then,
$$Aq = c_1 \lambda_1 v_1 + \dots + c_n \lambda_n v_n, \qquad A^k q = c_1 \lambda_1^k v_1 + \dots + c_n \lambda_n^k v_n.$$
For $|\lambda_1| > 1$ the powers $|\lambda_1^k|$ will grow, so we scale as
$$\frac{1}{\lambda_1^k} A^k q = c_1 v_1 + c_2 \left(\frac{\lambda_2}{\lambda_1}\right)^k v_2 + \dots + c_n \left(\frac{\lambda_n}{\lambda_1}\right)^k v_n.$$
Hence, if $|\lambda_1| > |\lambda_2|$, then $\frac{1}{\lambda_1^k} A^k q \to c_1 v_1$ as $k \to \infty$.
Definition 25 A sequence $(x_k)$ converges linearly to $x$, if there exists $r$ with $0 < r < 1$ such
that
$$\lim_{k \to \infty} \frac{\|x_{k+1} - x\|}{\|x_k - x\|} = r.$$
Then $r$ is called the convergence rate of the sequence.
We say that the convergence $(x_k) \to x$ is of order $m \geq 2$ if
$$\lim_{k \to \infty} \frac{\|x_{k+1} - x\|}{\|x_k - x\|^m} = c \neq 0.$$
(a) In each step one computes $q_k = \frac{1}{\sigma_k} A q_{k-1}$, where $\sigma_k$ is a scaling factor, e.g., $\sigma_k = \|A q_{k-1}\|$.
(b) Forming the full product $A q_{k-1}$ costs $2n^2$ flops and the scaling $O(n)$ flops. Hence $m$
iterations will cost about $2n^2 m$ flops.
(c) If (as is very common) $|\lambda_2 / \lambda_1| \approx 1$, then the convergence is very slow.
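A minimal sketch of the scaled power method (NumPy; the test matrix, tolerance, and iteration limit are illustrative assumptions):

import numpy as np

def power_method(A, q0, maxit=500, tol=1e-10):
    # Scaled power iteration: approximates the dominant eigenpair of A.
    q = q0 / np.linalg.norm(q0)
    lam_old = np.inf
    for _ in range(maxit):
        w = A @ q                      # one matrix-vector product per step
        lam = q.conj() @ w             # Rayleigh quotient approximation
        q = w / np.linalg.norm(w)      # scaling
        if abs(lam - lam_old) < tol * abs(lam):
            break
        lam_old = lam
    return lam, q

A = np.diag([5.0, 2.0, 1.0]) + 0.01 * np.ones((3, 3))
lam, q = power_method(A, np.ones(3))
print(lam)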
2.2

A good approximation to the eigenvalue associated with an approximate eigenvector $w$ is the Rayleigh quotient
$$\frac{w^H A w}{w^H w}.$$
If $\lambda$ is a simple eigenvalue of $A$ with right eigenvector $v$ and left eigenvector $w$, i.e.,
$$Av = \lambda v, \qquad w^H A = \lambda w^H, \qquad \|v\| = 1 = \|w\|,$$
then for small perturbations $\Delta A$ we have, in first approximation, that $A + \Delta A$ has an eigenvalue
$\lambda + \Delta\lambda$ with
$$|\Delta\lambda| \leq \frac{1}{|w^H v|} \|\Delta A\|.$$
We then have that $1/|w^H v|$ is a condition number for simple eigenvalues. For normal matrices
we can take $v = w$ and thus $|w^H v| = 1$. Normal matrices thus have well-conditioned eigenvalues.
2.3 Subspace iteration

To compute several eigenvalues and the associated invariant subspace, we can generalize the
power method to the subspace iteration. Consider $A \in \mathbb{C}^{n,n}$ with eigenvalues $\lambda_1, \dots, \lambda_n$,
where $|\lambda_1| \geq \dots \geq |\lambda_n|$.
Idea: Instead of $q_0 \in \mathbb{C}^n$, consider a set of linearly independent vectors $\{w_1, \dots, w_m\} \subseteq \mathbb{C}^n$.
Set
$$W_0 := [w_1, \dots, w_m] \in \mathbb{C}^{n,m},$$
and form the sequence $W_0, AW_0, A^2 W_0, \dots$ via
$$W_k := A^k W_0 = \left[ A^k w_1, \dots, A^k w_m \right], \qquad k \geq 1.$$
In general, we expect $\mathcal{R}(W_k)$ to converge to the invariant subspace $\mathcal{U}$ associated with the $m$
eigenvalues $\lambda_1, \dots, \lambda_m$. This iteration is called subspace iteration.

Theorem 33 Let $A \in \mathbb{C}^{n,n}$ with eigenvalues $\lambda_1, \dots, \lambda_n$ satisfy
$$|\lambda_1| \geq \dots \geq |\lambda_m| > |\lambda_{m+1}| \geq \dots \geq |\lambda_n|.$$
Let $\mathcal{U}$, $\mathcal{V}$ be the invariant subspaces associated with $\lambda_1, \dots, \lambda_m$ and $\lambda_{m+1}, \dots, \lambda_n$, respectively.
Furthermore, let $W \in \mathbb{C}^{n,m}$ with $\operatorname{Rank}(W) = m$ and $\mathcal{R}(W) \cap \mathcal{V} = \{0\}$. Then for the
iteration $W_0 := W$, $W_{k+1} = A W_k$ for $k \geq 0$ and for every $\rho$ with $\frac{|\lambda_{m+1}|}{|\lambda_m|} < \rho < 1$, there exists
a constant $c$ such that
$$d(\mathcal{R}(W_k), \mathcal{U}) \leq c\, \rho^k, \qquad k \geq 1.$$
Proof: We prove the theorem for the case that $A$ is diagonalizable. We perform a similarity
transformation
$$A_{\mathrm{new}} = S^{-1} A_{\mathrm{old}} S = \begin{bmatrix} A_1 & 0 \\ 0 & A_2 \end{bmatrix},$$
with $A_1 = \operatorname{diag}(\lambda_1, \dots, \lambda_m)$ and $A_2 = \operatorname{diag}(\lambda_{m+1}, \dots, \lambda_n)$. Then $A_1$ is nonsingular, since
$|\lambda_1| \geq \dots \geq |\lambda_m| > 0$. Set
$$\mathcal{U}_{\mathrm{new}} = S^{-1} \mathcal{U}_{\mathrm{old}} = \mathcal{R}\left(\begin{bmatrix} I_m \\ 0 \end{bmatrix}\right) \qquad \text{and} \qquad \mathcal{V}_{\mathrm{new}} = S^{-1} \mathcal{V}_{\mathrm{old}} = \mathcal{R}\left(\begin{bmatrix} 0 \\ I_{n-m} \end{bmatrix}\right).$$
Furthermore, let
$$W_{\mathrm{new}} = S^{-1} W_{\mathrm{old}} = \begin{bmatrix} Z_1 \\ Z_2 \end{bmatrix}.$$
Then $Z_1$ is nonsingular (Exercise), and with $X_0 := Z_2 Z_1^{-1}$ we have
$$\mathcal{R}(W_0) = \mathcal{R}\left(\begin{bmatrix} I_m \\ X_0 \end{bmatrix}\right) \qquad \text{as well as} \qquad \mathcal{R}(W_k) = \mathcal{R}\left(A^k \begin{bmatrix} I_m \\ X_0 \end{bmatrix}\right).$$
Now
$$A^k \begin{bmatrix} I_m \\ X_0 \end{bmatrix} = \begin{bmatrix} A_1^k & 0 \\ 0 & A_2^k \end{bmatrix} \begin{bmatrix} I_m \\ X_0 \end{bmatrix} = \begin{bmatrix} A_1^k \\ A_2^k X_0 \end{bmatrix} = \begin{bmatrix} I_m \\ \underbrace{A_2^k X_0 A_1^{-k}}_{=:X_k} \end{bmatrix} A_1^k,$$
and thus
$$\mathcal{R}(W_k) = \mathcal{R}\left(\begin{bmatrix} I_m \\ X_k \end{bmatrix}\right),$$
i.e., we perform the iteration not only for $W_0$ but simultaneously also for all $W_0^{(j)} := [w_1, \dots, w_j]$,
since
$$A^k W_0^{(j)} = \left[ A^k w_1, \dots, A^k w_j \right], \qquad j = 1, \dots, m.$$
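As an illustration of this section, here is a small NumPy sketch of subspace iteration. To keep the columns well conditioned it re-orthonormalizes the iterates by a QR decomposition in every step (a common practical variant, often called orthogonal iteration), which does not change the spanned subspace; the test matrix and block size are illustrative assumptions.

import numpy as np

def subspace_iteration(A, W0, maxit=200):
    # Iterate W_{k+1} = A W_k, re-orthonormalized via QR in every step.
    Q, _ = np.linalg.qr(W0)
    for _ in range(maxit):
        Z = A @ Q
        Q, _ = np.linalg.qr(Z)        # R(Q) = R(A^k W0)
    # Ritz values: eigenvalues of the small projected matrix Q^H A Q
    return Q, np.linalg.eigvals(Q.conj().T @ A @ Q)

rng = np.random.default_rng(2)
A = np.diag([10.0, 5.0, 1.0, 0.5, 0.1]) + 0.01 * rng.standard_normal((5, 5))
Q, ritz = subspace_iteration(A, rng.standard_normal((5, 2)))
print(np.sort(ritz.real))   # approximates the two dominant eigenvalues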
2.4 The QR algorithm

2.4.1 Unitary subspace iteration

In the unitary subspace iteration one starts with a unitary $Q_0$, computes QR decompositions $A Q_{k-1} = Q_k R_k$, and sets $A_k := Q_k^H A Q_k$. If we partition
$$A_k = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}, \qquad A_{11} \in \mathbb{C}^{m,m},\ A_{22} \in \mathbb{C}^{n-m,n-m},$$
then the block $A_{21}$ converges to $0$ for $k \to \infty$. Since this happens for all $m$ simultaneously, it
follows that $A_k$ converges to a block upper triangular matrix.
Another question is whether we can directly move from $A_{k-1}$ to $A_k$. To see this, observe that
$$A_{k-1} = Q_{k-1}^{-1} A Q_{k-1}, \qquad A_k = Q_k^{-1} A Q_k,$$
and hence
$$A_k = Q_k^{-1} Q_{k-1}\, A_{k-1}\, \underbrace{Q_{k-1}^{-1} Q_k}_{=:U_k} = U_k^{-1} A_{k-1} U_k.$$
Thus we can reformulate the $k$-th step of the unitary subspace iteration
$$A Q_{k-1} = Q_k R_k$$
as
$$A_{k-1} = Q_{k-1}^{-1} A Q_{k-1} = Q_{k-1}^{-1} Q_k R_k = U_k R_k,$$
i.e., $A_{k-1} = U_k R_k$ is a QR decomposition of $A_{k-1}$.

If we partition
$$A_k = \begin{bmatrix} A_{11}^{(k)} & A_{12}^{(k)} \\ A_{21}^{(k)} & A_{22}^{(k)} \end{bmatrix}, \qquad A_{11}^{(k)} \in \mathbb{C}^{m,m},\ A_{22}^{(k)} \in \mathbb{C}^{n-m,n-m},$$
then for every $\rho$ with $\frac{|\lambda_{m+1}|}{|\lambda_m|} < \rho < 1$ there exists a constant $c$ such that
$$\|A_{21}^{(k)}\| \leq c\, \rho^k.$$
Proof: (Sketch) Let $\mathcal{U}$ be the invariant subspace associated with $\lambda_1, \dots, \lambda_m$ and
$$\mathcal{U}_k = \operatorname{Span}\left\{ q_1^{(k)}, \dots, q_m^{(k)} \right\}, \qquad \text{where} \qquad Q_k = \left[ q_1^{(k)}, \dots, q_n^{(k)} \right].$$
One can show that
$$\|A_{21}^{(k)}\| \leq 2\sqrt{2}\, \|A\|\, d(\mathcal{U}, \mathcal{U}_k).$$
Then, using the convergence results for the subspace iteration, there exists a constant $c > 0$
with
$$d(\mathcal{U}, \mathcal{U}_k) \leq c\, \rho^k. \qquad \Box$$
Remark 37 In the presented form the algorithm has two major disadvantages:
(a) It is expensive, since it costs $O(n^3)$ flops per iteration step.
(b) The convergence is slow (only linear).
A way to address the two problems is the Hessenberg reduction and the use of shifts.
2.4.2 Hessenberg reduction

A Householder transformation is a matrix of the form
$$P = I - \frac{2}{v^H v} v v^H$$
for $v \in \mathbb{C}^n \setminus \{0\}$. Householder transformations are Hermitian and unitary (Exercise). Multiplication with a Householder transformation is geometrically a reflection of a vector $x \in \mathbb{C}^n$
at the hyperplane $\operatorname{Span}\{v\}^{\perp}$ (Exercise).
A typical task: Reflect $x \in \mathbb{C}^n \setminus \{0\}$ to a multiple of the first unit vector, i.e., determine $v$
and from this $P$ such that
$$Px = \mp \|x\| e_1.$$
To determine such a $v$ we make the ansatz $v = x + \alpha e_1$ and obtain
$$v = x \pm \|x\| e_1 \qquad \text{and} \qquad Px = \mp \|x\| e_1 \qquad \text{(Exercise)}.$$
For numerical stability one chooses
$$v = \begin{cases} x + \|x\| e_1, & x_1 \geq 0, \\ x - \|x\| e_1, & x_1 < 0. \end{cases}$$
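A short NumPy sketch of this construction (the sign choice follows the case distinction above; the test vector is an arbitrary example):

import numpy as np

def householder_vector(x):
    # Return v such that P = I - 2 v v^H / (v^H v) maps x to -+ ||x|| e_1.
    v = x.astype(float).copy()
    sign = 1.0 if x[0] >= 0 else -1.0   # stable sign choice
    v[0] += sign * np.linalg.norm(x)
    return v

x = np.array([3.0, 4.0, 0.0])
v = householder_vector(x)
P = np.eye(len(x)) - 2.0 * np.outer(v, v) / (v @ v)
print(P @ x)    # approximately (-5, 0, 0), i.e., -||x|| e_1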
A Givens rotation $G_{i,j}(c,s)$ coincides with the identity matrix except for the entries
$$g_{ii} = c, \qquad g_{ij} = s, \qquad g_{ji} = -\bar{s}, \qquad g_{jj} = \bar{c},$$
where $|c|^2 + |s|^2 = 1$, i.e., the matrix differs from an identity only in the positions $(i,i)$, $(i,j)$, $(j,i)$, $(j,j)$.
Multiplication of a matrix with a Givens rotation allows one to zero an element in any position.
E.g., choose
$$\tilde{G}_{1,2}(c,s) = \begin{bmatrix} c & s \\ -\bar{s} & \bar{c} \end{bmatrix}$$
with $|c|^2 + |s|^2 = 1$ such that $-\bar{s} a_{11} + \bar{c} a_{21} = 0$; then we have
$$\tilde{G}_{1,2}(c,s) \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = \begin{bmatrix} * & * \\ 0 & * \end{bmatrix}.$$
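A small sketch (real arithmetic for simplicity) of the $2 \times 2$ rotation that zeroes the $(2,1)$ entry, as in the example above:

import numpy as np

def givens(a, b):
    # Return c, s with c^2 + s^2 = 1 such that [[c, s], [-s, c]] @ [a, b] = [r, 0].
    r = np.hypot(a, b)
    if r == 0.0:
        return 1.0, 0.0
    return a / r, b / r

a11, a21 = 3.0, 4.0
c, s = givens(a11, a21)
G = np.array([[c, s], [-s, c]])
print(G @ np.array([[a11, 1.0], [a21, 2.0]]))   # zero appears in position (2,1)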
(a) For a Hessenberg matrix $H \in \mathbb{C}^{n,n}$ the QR decomposition can be performed in $O(n^2)$
flops using Givens rotations.
(b) Hessenberg matrices are invariant under QR iterations.
Theorem 39 (Implicit Q Theorem) Let $A \in \mathbb{C}^{n,n}$ and let $Q = [q_1, \dots, q_n]$, $U = [u_1, \dots, u_n]$
be unitary matrices such that
$$H = Q^{-1} A Q = (h_{ij}) \qquad \text{and} \qquad G = U^{-1} A U = (g_{ij})$$
are unreduced Hessenberg matrices. If $q_1 = u_1$, then there exists a unitary diagonal matrix $D = \operatorname{diag}(1, d_2, \dots, d_n)$ such that $U = QD$, i.e., the columns $q_i$ and $u_i$ coincide up to unimodular scalar factors, and $|h_{i+1,i}| = |g_{i+1,i}|$ for $i = 1, \dots, n-1$.
Proof: Exercise.
2.4.3 Deflation

If $h_{m+1,m} = 0$, then
$$H = \begin{bmatrix} H_{11} & H_{12} \\ 0 & H_{22} \end{bmatrix}, \qquad H_{11} \in \mathbb{C}^{m,m},\ H_{22} \in \mathbb{C}^{n-m,n-m},$$
i.e., we can split our problem into the two subproblems $H_{11}$, $H_{22}$.
Algorithm (QR Algorithm with Hessenberg reduction and shifts)
Given: $A \in \mathbb{C}^{n,n}$.
(a) Compute $U_0$ unitary such that
$$H_0 := U_0^H A U_0$$
is in Hessenberg form. We may assume that $H_0$ is unreduced, otherwise we can deflate
right away.
(b) Iterate for $k = 1, 2, \dots$ until deflation happens: choose a shift $\sigma_k$, compute the QR decomposition $H_{k-1} - \sigma_k I = Q_k R_k$, and set $H_k := R_k Q_k + \sigma_k I$.
(c) If $h_{m+1,m}^{(k)} = 0$ or $h_{m+1,m}^{(k)} = O(\mathrm{eps})$, then we have deflation and we can continue with
smaller problems.
(d) If $\sigma_k$ is an eigenvalue, then deflation happens immediately after one step.
Shift strategies:
(a) Rayleigh-quotient shift: For the special case that $A$ is Hermitian, the sequence $A_k$
converges to a diagonal matrix. Then $q_n^{(k)}$ is a good approximation to an eigenvector,
and a good approximation to the eigenvalue is the Rayleigh quotient
$$r(q_n^{(k)}) = (q_n^{(k)})^H A\, q_n^{(k)} = h_{nn}^{(k)},$$
which is used as shift $\sigma_k$.
(b) Wilkinson shift: Problems with the Rayleigh-quotient shift arise when the matrix is
real and has nonreal eigenvalues, and also, e.g., for
$$A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.$$
A QR iteration for $A_0 = A$ yields $Q_0 = A_0$, $R_0 = I$ and hence
$$R_0 Q_0 = I A_0 = A_0,$$
i.e., the algorithm stagnates. To avoid such situations, for $A \in \mathbb{C}^{n,n}$ in the $k$-th step,
one considers the trailing $2 \times 2$ submatrix
$$B = \begin{bmatrix} a_{n-1,n-1}^{(k)} & a_{n-1,n}^{(k)} \\ a_{n,n-1}^{(k)} & a_{n,n}^{(k)} \end{bmatrix}$$
of $A_k$ and chooses as shift the eigenvalue of $B$ that is closest to $a_{n,n}^{(k)}$ (the Wilkinson shift).
2.4.4 Implicit shifts

Carrying out $l$ QR steps with shifts $\sigma_1, \dots, \sigma_l$ on a Hessenberg matrix $H$ produces $H_l = Q^H H Q$ with $Q = Q_1 \cdots Q_l$.
This opens the question whether we can compute $Q$ directly without carrying out $l$ QR iterations.

Lemma 41 $M := (H - \sigma_l I) \cdots (H - \sigma_1 I) = Q_1 \cdots Q_l \underbrace{R_l \cdots R_1}_{=:R} = QR$.

Proof: By induction one shows that
$$Q_1 \cdots Q_j R_j \cdots R_1 = (H - \sigma_j I)(H - \sigma_{j-1} I) \cdots (H - \sigma_1 I), \qquad j = 1, \dots, l.$$
For $j = 1$ this is the QR decomposition $Q_1 R_1 = H - \sigma_1 I$. For the step $j-1 \to j$ one uses $H^{(j)} = (Q_1 \cdots Q_{j-1})^H H (Q_1 \cdots Q_{j-1})$ and the induction assumption (I.A.):
$$Q_1 \cdots Q_j R_j \cdots R_1 = Q_1 \cdots Q_{j-1} (H^{(j)} - \sigma_j I) R_{j-1} \cdots R_1 = (H - \sigma_j I)\, Q_1 \cdots Q_{j-1} R_{j-1} \cdots R_1 \stackrel{\text{I.A.}}{=} (H - \sigma_j I) \cdots (H - \sigma_1 I). \qquad \Box$$
This leads to the idea to compute $M$ and then the Householder QR decomposition of $M$, i.e.,
$M = QR$, and to set
$$\tilde{H} = Q^H H Q = H_l.$$
This means that one just needs one QR decomposition instead of $l$ QR decompositions. On the other hand we would have to compute $M$, i.e., $l - 1$ matrix-matrix
multiplications. But this can be avoided by computing $\tilde{H}$ directly from $H$ using the implicit
Q Theorem.
Implicit shift strategy:
(a) Compute
$$M e_1 = (H - \sigma_l I) \cdots (H - \sigma_1 I)\, e_1,$$
the first column of $M$. Only its first $l+1$ entries are in general nonzero. If $l$ is not too
large, then this costs only $O(1)$ flops.
(b) Determine a Householder matrix $\tilde{P}_0$ such that $\tilde{P}_0 (M e_1)$ is a multiple of $e_1$, set
$$P_0 = \begin{bmatrix} \tilde{P}_0 & 0 \\ 0 & I_{n-l-1} \end{bmatrix},$$
and transform $H$ with $P_0$ as $H \mapsto P_0 H P_0$. This destroys the Hessenberg form only in the leading part of the matrix (a "bulge").
(c) Determine Householder matrices $P_1, \dots, P_{n-2}$ that restore the Hessenberg form by chasing the bulge down the diagonal; each $P_k$ differs from the identity only in a few consecutive rows and columns and satisfies $P_k e_1 = e_1$.
(d) $P_0$ has the same first column as $Q$. Since for $k \geq 1$ we have $P_k e_1 = e_1$, also
$$P_0 P_1 \cdots P_{n-2}$$
has the same first column as $P_0$ and hence as $Q$. With the implicit Q Theorem, $Q$ and $P_0 \cdots P_{n-2}$, and therefore also $\tilde{H}$ and $Q^H H Q$, are essentially equal; thus we obtain $H_l$ without ever forming $M$.
2.5 The QZ algorithm
For the solution of the generalized eigenvalue problem $\lambda E x = A x$ with a regular pair $(E, A)$, $E, A \in \mathbb{C}^{n,n}$, we can extend the ideas of the QR algorithm. The basic idea is to apply the QR algorithm
implicitly to $E^{-1} A$, without actually computing the inverse and the product.
The first step is a transformation to Hessenberg-triangular form, i.e., one computes unitary
matrices $U_0, V_0$ such that
$$U_0^H E V_0 = S_0, \qquad U_0^H A V_0 = T_0,$$
with $S_0$ upper triangular and $T_0$ upper Hessenberg. If $S_0$ were invertible, then $S_0^{-1} T_0$ would
be of upper Hessenberg form.

Algorithm. Hessenberg-triangular reduction
For a given regular pair $(E, A)$ with $E, A \in \mathbb{C}^{n,n}$ the algorithm computes unitary matrices
$U_0, V_0$ such that
$$U_0^H E V_0 = S_0, \qquad U_0^H A V_0 = T_0,$$
with $S_0$ upper triangular and $T_0$ upper Hessenberg.
Use the QR decomposition to compute $\tilde{U}$ such that $\tilde{U}^H E$ is upper triangular and set $A := \tilde{U}^H A = [a_{ij}]$, $E := \tilde{U}^H E = [e_{ij}]$.
For $j = 1, \dots, n-2$
  for $i = n, n-1, \dots, j+2$
    Determine a $2 \times 2$ Householder or Givens matrix $\tilde{P}$ such that
    $$\tilde{P} \begin{bmatrix} a_{i-1,j} \\ a_{ij} \end{bmatrix} = \begin{bmatrix} * \\ 0 \end{bmatrix}$$
    and set $A := PA = [a_{ij}]$, $E := PE = [e_{ij}]$, where $P = \operatorname{diag}(I_{i-2}, \tilde{P}, I_{n-i})$.
    Determine a $2 \times 2$ Householder or Givens matrix $\tilde{Q}$ such that
    $$\begin{bmatrix} e_{i,i-1} & e_{ii} \end{bmatrix} \tilde{Q} = \begin{bmatrix} 0 & * \end{bmatrix}$$
    and set $A := AQ$, $E := EQ$, where $Q = \operatorname{diag}(I_{i-2}, \tilde{Q}, I_{n-i})$.
  end
end
This algorithm costs about $5n^3$ flops, plus an extra $\frac{23}{6} n^3$ flops if $\tilde{U}$, $V$ are desired.
If $A$ is not unreduced, then we can again deflate the problem into subproblems, i.e., as
$$\lambda E - A = \lambda \begin{bmatrix} E_{11} & E_{12} \\ 0 & E_{22} \end{bmatrix} - \begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{bmatrix},$$
and continue with the subproblems.
Every zero element on the diagonal of $E$ corresponds to an infinite eigenvalue, i.e., a zero eigenvalue of the reversed pencil $\mu A - E$. If an element on the diagonal of $E$ is zero, then we can introduce a $0$ in the
position $(n, n-1)$ of $A$ and move the $0$ to the bottom of the diagonal of $E$ (Exercise). This
can be repeated until all zero elements are in the bottom part of the diagonal of $E$ and the
corresponding part of $A$ is triangular. Then we have deflated all the infinite eigenvalues and
have obtained
$$U_0^H E V_0 = S_0 = \begin{bmatrix} E_{11} & E_{12} \\ 0 & E_{22} \end{bmatrix}, \qquad U_0^H A V_0 = T_0 = \begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{bmatrix},$$
with $E_{22}$ strictly upper triangular and $A_{22}$ nonsingular upper triangular by the regularity
of $(E, A)$.
In the top pair $(E_{11}, A_{11})$ the matrix $E_{11}$ is invertible, and in principle we can apply the implicit
QR algorithm to $E_{11}^{-1} A_{11}$. This leads to the QZ algorithm of Moler and Stewart from 1973
(Exercise).
Chapter 3
For a given function $u(x, y)$, we get on a two-dimensional grid in each grid point $(i, j)$
approximations $u_{ij} = u(x_i, y_j)$ for $i = 1, \dots, m$ and $j = 1, \dots, n$. Here the approximation to $v = -\Delta u$ in the grid point $(i, j)$, obtained via the 5-point difference star, satisfies
$$\begin{bmatrix} & -1 & \\ -1 & 4 & -1 \\ & -1 & \end{bmatrix}$$
together with further boundary conditions. This we can write as a matrix equation
$$v = [v_{11}, \dots, v_{1n}, v_{21}, \dots, v_{2n}, \dots, v_{m1}, \dots, v_{mn}]^T = Au, \qquad u = [u_{11}, \dots, u_{1n}, u_{21}, \dots, u_{2n}, \dots, u_{m1}, \dots, u_{mn}]^T.$$
Instead of explicitly storing $A$ we can write a subroutine that computes $v = Au$ from $u$
via
for i=1,...,m
  for j=1,...,n
    v[i,j] = 4*u[i,j] - u[i,j+1] - u[i,j-1] - u[i+1,j] - u[i-1,j]
  end
end
where terms with indices outside the grid (e.g. u[i,0] or u[m+1,j]) are omitted or replaced by the given boundary values.
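A runnable NumPy version of this matrix-free matrix-vector product (assuming homogeneous Dirichlet boundary conditions, i.e., values outside the grid are taken to be zero; this boundary choice is an assumption made for the sake of the example):

import numpy as np

def apply_laplacian_5pt(u):
    # Apply the 5-point stencil to an m x n grid function u (zero outside the grid).
    m, n = u.shape
    up = np.zeros((m + 2, n + 2))
    up[1:-1, 1:-1] = u                     # zero padding = Dirichlet boundary
    v = (4.0 * up[1:-1, 1:-1]
         - up[1:-1, 2:] - up[1:-1, :-2]    # u[i, j+1] and u[i, j-1]
         - up[2:, 1:-1] - up[:-2, 1:-1])   # u[i+1, j] and u[i-1, j]
    return v

u = np.arange(12.0).reshape(3, 4)
print(apply_laplacian_5pt(u))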
Another example is a symmetric Toeplitz matrix
$$A = \begin{bmatrix} a_0 & a_1 & \cdots & a_n \\ a_1 & a_0 & \ddots & \vdots \\ \vdots & \ddots & \ddots & a_1 \\ a_n & \cdots & a_1 & a_0 \end{bmatrix},$$
which is determined by the numbers $a_0, \dots, a_n$ and also need not be stored explicitly.
A pair $(\lambda, v)$ is an eigenvalue/eigenvector pair if the residual vanishes,
$$Av - \lambda v = 0, \qquad Av - \lambda v \in \mathbb{C}^n.$$
3.1 Krylov Spaces

For symmetric $A \in \mathbb{R}^{n,n}$ consider the Rayleigh quotient
$$r(x) = \frac{x^T A x}{x^T x} \qquad \text{(Exercise)}.$$
For an isometric matrix $Q_k \in \mathbb{R}^{n,k}$ set
$$M_k := \max_{x \in \mathcal{R}(Q_k)\setminus\{0\}} r(x) = \max_{y \neq 0} \frac{y^T Q_k^T A Q_k y}{y^T Q_k^T Q_k y},$$
and analogously
$$m_k := \min_{x \in \mathcal{R}(Q_k)\setminus\{0\}} r(x).$$
The goal is to choose $Q_k$ such that $M_k \approx \lambda_1$ and $m_k \approx \lambda_n$ already for small $k$. Let $u_k, w_k \in \mathcal{R}(Q_k)$ be vectors with $M_k = r(u_k)$ and $m_k = r(w_k)$.
The gradient of the Rayleigh quotient is
$$\nabla r(x) = \frac{2}{x^T x} \left( Ax - r(x)\, x \right) \qquad \text{(Exercise)}.$$
To improve the approximations $M_k$ and $m_k$ in the next step, the enlarged search space should contain the directions of steepest ascent and descent, i.e.,
$$\nabla r(u_k) \in \operatorname{Span}\{q_1, \dots, q_{k+1}\}, \tag{3.1}$$
$$\nabla r(w_k) \in \operatorname{Span}\{q_1, \dots, q_{k+1}\}. \tag{3.2}$$
Since $\nabla r(x) \in \operatorname{Span}\{x, Ax\}$, the conditions (3.1) and (3.2) are satisfied if
$$\operatorname{Span}\{q_1, q_2\} = \operatorname{Span}\{q_1, Aq_1\},$$
$$\operatorname{Span}\{q_1, q_2, q_3\} = \operatorname{Span}\{q_1, Aq_1, A^2 q_1\},$$
$$\vdots$$
$$\operatorname{Span}\{q_1, \dots, q_{k+1}\} = \operatorname{Span}\{q_1, Aq_1, A^2 q_1, \dots, A^k q_1\}.$$
Definition 44 Let $A \in \mathbb{C}^{n,n}$, $x \in \mathbb{C}^n$ and $l \in \mathbb{N}$.
(a) $K_l(A, x) := \left[ x, Ax, A^2 x, \dots, A^{l-1} x \right]$ is called the Krylov matrix for $A$ and $x$.
(b)
$$\mathcal{K}_l(A, x) := \mathcal{R}(K_l(A, x)) = \operatorname{Span}\left\{ x, Ax, A^2 x, \dots, A^{l-1} x \right\}$$
is called the Krylov space for $A$ and $x$.
We have just observed that for symmetric matrices $A \in \mathbb{R}^{n,n}$, already after a few matrix-vector
multiplications Krylov spaces yield good approximations to the eigenvalues $\lambda_1$ and $\lambda_n$, i.e., eigenvalues at the exterior of the spectrum. We expect that a similar property holds for general
matrices $A \in \mathbb{C}^{n,n}$. Heuristic: Krylov spaces are good search spaces!
In the following we present a few properties of Krylov spaces. In particular, we construct the
relationship to minimal polynomials and Hessenberg matrices.
Reminder: Let $A \in \mathbb{C}^{n,n}$, $x \in \mathbb{C}^n$; then there exists a unique normalized (monic) polynomial $p$ of
smallest degree such that
$$0 = p(A) x = A^m x + \alpha_{m-1} A^{m-1} x + \dots + \alpha_1 A x + \alpha_0 x.$$
Then $p$ is called the minimal polynomial of $x$ with respect to $A$.
Lemma 45 Let $A \in \mathbb{C}^{n,n}$, $x \in \mathbb{C}^n$ and let $\nu$ be the degree of the minimal polynomial of $x$
with respect to $A$. Then
(a) $\dim \mathcal{K}_m(A, x) = m$ for $m \leq \nu$.
(b) $\mathcal{K}_{\nu}(A, x)$ is $A$-invariant.
(c) $\mathcal{K}_m(A, x) = \mathcal{K}_{\nu}(A, x)$ for $m \geq \nu$.

Proof: Exercise. $\Box$
Lemma 46 Let $A \in \mathbb{C}^{n,n}$ and $g_1 \in \mathbb{C}^n$ be such that $g_1, Ag_1, \dots, A^{m-1} g_1$ are linearly independent. Suppose that $g_2, \dots, g_n$ are such that $G = [g_1, g_2, \dots, g_n]$ is nonsingular and let
$B = G^{-1} A G = [b_{ij}]$. Then the following are equivalent:
(a) $b_{jk} = 0$ for $k = 1, \dots, m-1$ and $j = k+2, \dots, n$, i.e., the first $m-1$ columns of $B$ have upper Hessenberg structure,
$$B = \begin{bmatrix}
b_{11} & \cdots & \cdots & b_{1m} & \cdots & b_{1n} \\
b_{21} & \ddots & & \vdots & & \vdots \\
0 & \ddots & \ddots & \vdots & & \vdots \\
\vdots & \ddots & b_{m,m-1} & b_{mm} & \cdots & b_{mn} \\
\vdots & & 0 & \vdots & & \vdots \\
0 & \cdots & 0 & b_{n,m} & \cdots & b_{nn}
\end{bmatrix};$$
(b) $\operatorname{Span}\{g_1, \dots, g_k\} = \mathcal{K}_k(A, g_1)$ for $k = 1, \dots, m$.

Proof: Exercise.
3.2 Hessenberg reduction

$$A = Q H Q^H$$
So far: Householder transformations. New: the Arnoldi method.
We want $Q^H A Q = H$, i.e.,
$$A [q_1, \dots, q_n] = [q_1, \dots, q_n] \begin{bmatrix} h_{11} & \cdots & \cdots & h_{1n} \\ h_{21} & \ddots & & \vdots \\ & \ddots & \ddots & \vdots \\ 0 & & h_{n,n-1} & h_{nn} \end{bmatrix}.$$
Comparing the $k$-th columns gives
$$A q_k = \sum_{i=1}^{k+1} h_{ik} q_i, \qquad \text{i.e.,} \qquad q_{k+1} = \frac{1}{h_{k+1,k}} \left( A q_k - \sum_{i=1}^{k} h_{ik} q_i \right),$$
and from the orthonormality of the $q_i$ we obtain $h_{jk} = q_j^H A q_k$ for $j = 1, \dots, k$.

Algorithm (Arnoldi)
1) Start: choose $x \neq 0$ and set $q_1 := \frac{x}{\|x\|}$.
2) For $k = 1, 2, \dots, n-1$
(a) $\tilde{q}_{k+1} := A q_k - \sum_{i=1}^{k} h_{ik} q_i$, where $h_{ik} = q_i^H A q_k$,
(b) $h_{k+1,k} := \|\tilde{q}_{k+1}\|$,
(c) $q_{k+1} = \frac{1}{h_{k+1,k}} \tilde{q}_{k+1}$.
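A compact NumPy sketch of the Arnoldi algorithm above (with modified Gram-Schmidt orthogonalization; the breakdown tolerance and test matrix are illustrative assumptions):

import numpy as np

def arnoldi(A, x, m, tol=1e-12):
    # m steps of Arnoldi: returns Q (n x (m+1)) and H ((m+1) x m) with A Q_m = Q_{m+1} H.
    n = len(x)
    Q = np.zeros((n, m + 1), dtype=complex)
    H = np.zeros((m + 1, m), dtype=complex)
    Q[:, 0] = x / np.linalg.norm(x)
    for k in range(m):
        w = A @ Q[:, k]
        for i in range(k + 1):                 # modified Gram-Schmidt
            H[i, k] = Q[:, i].conj() @ w
            w = w - H[i, k] * Q[:, i]
        H[k + 1, k] = np.linalg.norm(w)
        if H[k + 1, k] < tol:                  # "good breakdown"
            return Q[:, :k + 1], H[:k + 1, :k + 1]
        Q[:, k + 1] = w / H[k + 1, k]
    return Q, H

A = np.diag(np.arange(1.0, 11.0)) + 0.01 * np.ones((10, 10))
Q, H = arnoldi(A, np.ones(10), 5)
print(np.linalg.eigvals(H[:5, :5]).real)       # Ritz values tend to approximate extremal eigenvalues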
Remark 47
(a) The algorithm stops if $h_{m+1,m} = 0$ for some $m$ ("good breakdown"). Then
$$A q_k = \sum_{j=1}^{k+1} h_{jk} q_j \qquad \text{for } k = 1, \dots, m-1, \qquad \text{and} \qquad A q_m = \sum_{j=1}^{m} h_{jm} q_j.$$
Thus,
$$A [q_1, \dots, q_m] = [q_1, \dots, q_m] \underbrace{\begin{bmatrix} h_{11} & \cdots & h_{1,m-1} & h_{1m} \\ h_{21} & \ddots & \vdots & \vdots \\ 0 & \ddots & \ddots & \vdots \\ 0 & \cdots & h_{m,m-1} & h_{mm} \end{bmatrix}}_{=:H_m}.$$
(b) If no good breakdown occurs, then after $m$ steps we have
$$A [q_1, \dots, q_m] = [q_1, \dots, q_{m+1}] \begin{bmatrix} h_{11} & \cdots & h_{1m} \\ h_{21} & \ddots & \vdots \\ 0 & \ddots & \vdots \\ 0 & \cdots & h_{m+1,m} \end{bmatrix} = Q_m H_m + h_{m+1,m}\, q_{m+1} e_m^T.$$
This is called the Arnoldi relation. Due to the orthonormality of the $q_i$ we also have $Q_m^H A Q_m = H_m$.
Consequence: Due to the relation between Hessenberg matrices and Krylov spaces it follows
that
$$\operatorname{Span}\{q_1, \dots, q_l\} = \mathcal{K}_l(A, x)$$
for l = 1, . . . , m + 1. Thus the Arnoldi algorithm computes orthonormal bases of Krylov
spaces.
Application: The Arnoldi algorithm as a projection method for $A \in \mathbb{C}^{n,n}$, $n$ large.
1) For a given start vector $x \neq 0$ compute the vectors $q_1, q_2, \dots$ with the Arnoldi method.
2) Stop after $m \ll n$ steps,
(a) either due to a good breakdown in step $m$,
(b) or because we cannot store more vectors or because we have convergence. Note
that the computation of
$$\tilde{q}_{k+1} = A q_k - \sum_{j=1}^{k} h_{jk} q_j$$
becomes more and more expensive in each further step (concerning flops and storage).
After $m$ steps we have $\operatorname{Span}\{q_1, \dots, q_l\} = \mathcal{K}_l(A, x)$ for $l = 1, \dots, m$. We then look for pairs $(\theta, v)$ with $v \in \mathcal{K} := \mathcal{K}_m(A, x) = \mathcal{R}(Q_m)$ and
$$Av - \theta v \perp \mathcal{K},$$
so-called Ritz pairs. This is equivalent to $H_m z = \theta z$ with $v = Q_m z$:

Proof:
$$Q_m^H A Q_m z = \theta z = Q_m^H Q_m\, \theta z \iff Q_m^H (Av - \theta v) = 0 \iff Av - \theta v \perp \mathcal{R}(Q_m). \qquad \Box$$
Remark 50 We can compute the eigenvalues of $H_m$ by the Francis QR algorithm, and here
we can exploit that $H_m$ is already in Hessenberg form. To compute the eigenvectors, we carry
out one step of inverse iteration with a computed eigenvalue $\tilde\theta$ of $H_m$ as shift, i.e., we choose
a start vector $w_0 \in \mathbb{C}^m$, $\|w_0\| = 1$, solve
$$(H_m - \tilde\theta I_m)\, \tilde{w}_1 = w_0$$
for $\tilde{w}_1$ and set $w_1 = \frac{\tilde{w}_1}{\|\tilde{w}_1\|}$. Since $\tilde\theta$ is already a good eigenvalue approximation, i.e., a good
shift, one step is usually enough.
To check whether a Ritz pair is a good approximation, we can compute the residual. A
small residual means a small backward error, i.e., if $Av - \theta v$ is small and the eigenvalue is
well-conditioned, then $(\theta, v)$ is a good approximation to an eigenvalue/eigenvector pair of $A$.
Theorem 51 Let $A$, $Q_m$, $H_m$, $h_{m+1,m}$ be the results of $m$ steps of the Arnoldi algorithm.
Furthermore, let $z = [z_1, \dots, z_m]^T \in \mathbb{C}^m$ be an eigenvector of $H_m$ associated with $\theta \in \mathbb{C}$.
Then $(\theta, v)$, with $v = Q_m z$, is a Ritz pair of $A$ with respect to $\mathcal{R}(Q_m)$ and
$$\|Av - \theta v\| = |h_{m+1,m}|\, |z_m|.$$
Proof:
$$Av - \theta v = \left( Q_m H_m + h_{m+1,m}\, q_{m+1} e_m^T \right) z - \theta Q_m z = Q_m (H_m z - \theta z) + h_{m+1,m}\, z_m\, q_{m+1} = h_{m+1,m}\, z_m\, q_{m+1},$$
and since $\|q_{m+1}\| = 1$ the claim follows. $\Box$
Remark 52 (a) For the computation of the residual $Av - \theta v$ we do not need to determine
the Ritz vector $v$ explicitly.
(b) After some iterations in finite precision arithmetic the orthonormality of the $q_i$ deteriorates. This happens in particular when $|h_{m+1,m}|$ is small. Then the Ritz values
deteriorate as well. This can be fixed by re-orthonormalization using the modified Gram-Schmidt method for $q_1, \dots, q_m$.
(c) We know how to detect good approximations but we are not sure that they occur.
3.3 The Lanczos algorithm

If $A = A^H \in \mathbb{C}^{n,n}$ is Hermitian, then the Hessenberg matrix $T = Q^H A Q$ is Hermitian and thus tridiagonal,
$$T = \begin{bmatrix}
\alpha_1 & \beta_1 & & & 0 \\
\beta_1 & \alpha_2 & \beta_2 & & \\
& \beta_2 & \ddots & \ddots & \\
& & \ddots & \ddots & \beta_{n-1} \\
0 & & & \beta_{n-1} & \alpha_n
\end{bmatrix} \in \mathbb{R}^{n,n}.$$
It is even real, since the diagonal of a Hermitian matrix is real and, furthermore, we have
$\beta_m = h_{m+1,m} = \|\tilde{q}_{m+1}\| \geq 0$ for $m = 1, \dots, n-1$. With $Q = [q_1, \dots, q_n]$, comparing the columns
in $AQ = QT$, we obtain that
$$A q_1 = \alpha_1 q_1 + \beta_1 q_2,$$
$$A q_k = \beta_{k-1} q_{k-1} + \alpha_k q_k + \beta_k q_{k+1}, \qquad k = 2, \dots, n-1,$$
$$A q_n = \beta_{n-1} q_{n-1} + \alpha_n q_n.$$
This is called a 3-term recursion. Since $q_1, \dots, q_n$ are orthonormal we have, furthermore, that
$$\alpha_k = q_k^H A q_k.$$
Algorithm (Lanczos, 1950)
Given $A^H = A \in \mathbb{C}^{n,n}$.
1) Start: choose $x \neq 0$, set $q_1 := \frac{x}{\|x\|}$, $q_0 := 0$, and $\beta_0 := 0$.
2) For $k = 1, 2, \dots, m$
(a) $\alpha_k := q_k^H A q_k$,
(b) $r_k = A q_k - \beta_{k-1} q_{k-1} - \alpha_k q_k$,
(c) If $r_k = 0$ STOP. Otherwise set $\beta_k := \|r_k\|$ and $q_{k+1} := \frac{1}{\beta_k} r_k$.

Remark 53
(b) As in the Arnoldi algorithm, after $m$ steps we have (due to Lemma 46) that
$$\operatorname{Span}\{q_1, \dots, q_l\} = \mathcal{K}_l(A, x), \qquad l = 1, \dots, m.$$
The eigenvalue/eigenvector pairs of the matrix
$$T_m = \begin{bmatrix}
\alpha_1 & \beta_1 & & 0 \\
\beta_1 & \alpha_2 & \ddots & \\
& \ddots & \ddots & \beta_{m-1} \\
0 & & \beta_{m-1} & \alpha_m
\end{bmatrix}$$
then again yield Ritz pairs of $A$ with respect to $\mathcal{K}_m(A, x)$, as in the Arnoldi algorithm.
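A minimal NumPy sketch of the Lanczos iteration above for a real symmetric matrix; the eigenvalues of the small tridiagonal $T_m$ are the Ritz values. (No re-orthogonalization is done, which in floating point can degrade orthogonality, as noted for Arnoldi; the test matrix and number of steps are illustrative assumptions.)

import numpy as np

def lanczos(A, x, m):
    # m steps of Lanczos for symmetric A: returns alpha, beta of T_m and the basis Q.
    n = len(x)
    Q = np.zeros((n, m + 1))
    alpha = np.zeros(m)
    beta = np.zeros(m)
    Q[:, 0] = x / np.linalg.norm(x)
    for k in range(m):
        r = A @ Q[:, k]
        if k > 0:
            r -= beta[k - 1] * Q[:, k - 1]
        alpha[k] = Q[:, k] @ r
        r -= alpha[k] * Q[:, k]
        beta[k] = np.linalg.norm(r)
        if beta[k] == 0.0:                 # good breakdown
            break
        Q[:, k + 1] = r / beta[k]
    return alpha, beta, Q

rng = np.random.default_rng(3)
M = rng.standard_normal((100, 100))
A = (M + M.T) / 2                          # symmetric test matrix
alpha, beta, Q = lanczos(A, rng.standard_normal(100), 20)
T = np.diag(alpha) + np.diag(beta[:-1], 1) + np.diag(beta[:-1], -1)
ritz = np.linalg.eigvalsh(T)
print(ritz[0], ritz[-1])                   # approximate extremal eigenvalues of A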
3.4 The nonsymmetric Lanczos algorithm

A disadvantage of the Arnoldi algorithm is that the recursion becomes more and more expensive
the more steps we perform. So we might ask whether there also exists a 3-term recursion for
$A \neq A^H$.
Idea: Transform $A$ to tridiagonal form and allow the transformation matrix to become non-unitary, i.e., we want to determine
$$X^{-1} A X = T$$
with $T$ tridiagonal, or equivalently
$$AX = X \begin{bmatrix}
\alpha_1 & \gamma_1 & & 0 \\
\beta_1 & \alpha_2 & \ddots & \\
& \ddots & \ddots & \gamma_{n-1} \\
0 & & \beta_{n-1} & \alpha_n
\end{bmatrix}.$$
If we write $X$ as $X = [x_1, \dots, x_n]$, then we obtain
$$A x_k = \gamma_{k-1} x_{k-1} + \alpha_k x_k + \beta_k x_{k+1}$$
for $k = 1, \dots, n-1$ and with $\gamma_0 := 0$, $x_0 := 0$. Furthermore,
$$T^H = X^H A^H X^{-H} = Y^{-1} A^H Y$$
with $Y := X^{-H}$. Then clearly $Y^H X = I$, i.e., $y_j^H x_i = \delta_{ij}$ with
$$Y = [y_1, \dots, y_n].$$
Such families of vectors $(x_1, \dots, x_n)$ and $(y_1, \dots, y_n)$ are called bi-orthogonal. We again compare columns in $A^H Y = Y T^H$. This means that
$$A^H y_k = \bar{\beta}_{k-1} y_{k-1} + \bar{\alpha}_k y_k + \bar{\gamma}_k y_{k+1}$$
for $k = 1, \dots, n-1$ and $\beta_0 := 0$, $y_0 := 0$. Since $Y^H X = I$, it follows that
$$\alpha_k = y_k^H A x_k.$$
Moreover,
$$\beta_k x_{k+1} = A x_k - \gamma_{k-1} x_{k-1} - \alpha_k x_k =: r_k$$
and
$$\bar{\gamma}_k y_{k+1} = A^H y_k - \bar{\beta}_{k-1} y_{k-1} - \bar{\alpha}_k y_k =: s_k.$$
Furthermore, we have
$$1 = y_{k+1}^H x_{k+1} = \frac{1}{\gamma_k \beta_k} s_k^H r_k$$
for all $k \geq 1$. But there is still freedom in the computation of $\beta_k$, $\gamma_k$. We could in principle
use one of the variants
$$\beta_k = \|r_k\|, \qquad \gamma_k = \|s_k\|, \qquad \gamma_k = \bar{\beta}_k, \qquad \dots$$
Remark 54
1) After $m$ steps, if we do not have a breakdown (since we cannot divide by 0), we have
$$A \underbrace{[x_1, \dots, x_m]}_{=:X_m} = [x_1, \dots, x_m, x_{m+1}] \begin{bmatrix}
\alpha_1 & \gamma_1 & & 0 \\
\beta_1 & \ddots & \ddots & \\
& \ddots & \ddots & \gamma_{m-1} \\
& & \beta_{m-1} & \alpha_m \\
0 & & & \beta_m
\end{bmatrix}.$$
If we denote the submatrix consisting of the first $m$ rows by $T_m$, then we have
$$A X_m = X_m T_m + \beta_m x_{m+1} e_m^T$$
and analogously
$$A^H Y_m = Y_m T_m^H + \bar{\gamma}_m\, y_{m+1} e_m^T.$$
2) Due to the relationship with Krylov spaces and Hessenberg matrices, we have
$$\operatorname{Span}\{x_1, \dots, x_l\} = \mathcal{K}_l(A, x_1), \qquad l = 1, \dots, m,$$
and
$$\operatorname{Span}\{y_1, \dots, y_l\} = \mathcal{K}_l(A^H, y_1), \qquad l = 1, \dots, m.$$
Furthermore, by bi-orthogonality,
$$T_m = Y_m^H A X_m.$$
Some of the eigenvalues of this matrix are typically good approximations to eigenvalues
of $A$. Let $\theta \in \mathbb{C}$ and $z \in \mathbb{C}^m$, $v = X_m z$. If $(\theta, z)$ is an eigenvalue/eigenvector pair of
$T_m$, then
$$Y_m^H A X_m z = T_m z = \theta z = Y_m^H X_m\, \theta z,$$
i.e., $Y_m^H (Av - \theta v) = 0$, i.e., $Av - \theta v \perp \mathcal{R}(Y_m)$.
3.5

For $A \in \mathbb{C}^{n,n}$ and $x_1 \in \mathbb{C}^n$ Krylov space methods construct $X_m = [x_1, \dots, x_m] \in \mathbb{C}^{n,m}$ such
that
$$\operatorname{Span}\{x_1, \dots, x_l\} = \mathcal{K}_l(A, x_1), \qquad l = 1, \dots, m.$$
We stop the method after $m \ll n$ steps, choose $\mathcal{K} = \mathcal{K}_m(A, x_1)$ as search space together
with a test space $\mathcal{L} \subseteq \mathbb{C}^n$, and then we compute pairs
$$\theta \in \mathbb{C}, \qquad v \in \mathcal{K} \setminus \{0\},$$
such that
$$Av - \theta v \perp \mathcal{L}.$$
In the Arnoldi and the symmetric Lanczos algorithm we choose $\mathcal{L} = \mathcal{K}$, in the nonsymmetric
Lanczos method $\mathcal{L} = \mathcal{K}_m(A^H, y_1)$.
Then the obvious question is whether among the computed pairs $(\theta, v)$ there are good approximations to eigenvalue/eigenvector pairs.

Lemma 55 Denoting by $\Pi_{m-1}$ the set of polynomials of degree less than or equal to $m-1$,
we have
$$\mathcal{K}_m(A, x) = \{p(A)x \mid p \in \Pi_{m-1}\}.$$
Proof: Let $w \in \mathcal{K}_m(A, x)$. Then there exist $\alpha_0, \dots, \alpha_{m-1} \in \mathbb{C}$ with
$$w = \alpha_0 x + \alpha_1 Ax + \dots + \alpha_{m-1} A^{m-1} x = p(A) x. \qquad \Box$$
The convergence is typically very good for eigenvalues in the outer part of the spectrum and
very slow for the eigenvalues in the interior. Quantitative results using Lemma 55 are based
on optimal polynomial approximations, but a complete convergence analysis in all cases is an
open problem.
3.6 The implicitly restarted Arnoldi method

The Arnoldi algorithm becomes more and more expensive the more iterations one performs,
and the alternative nonsymmetric Lanczos is unstable. Can we resolve the problem with the
Arnoldi algorithm?
Ideas:
1) Use restarts: After $m$ steps of the Arnoldi algorithm with start vector $x$ choose $p \in \Pi_{m-1}$ with
$$|p(\lambda_i)| = \begin{cases} \text{large} & \text{for desired } \lambda_i, \\ \text{small} & \text{for undesired } \lambda_i. \end{cases}$$
Then choose $p(A)x$ as new start vector and restart the Arnoldi algorithm. Since the start
vector has been enlarged in the components of the direction of the desired eigenvectors,
we expect that the Krylov spaces contain, in particular, good approximations to the
desired eigenvectors.
The disadvantage is that, since we restart with a single new vector, we lose a lot of the information already obtained. It would be ideal to choose the new start vector $p(A)x$ optimally in the sense that we keep as much information as possible, i.e., maximally many
approximations to desired eigenvalues. This is again an open problem.
2) To preserve more information, we proceed as follows. If $k$ eigenvalues are desired, then
we run the Arnoldi algorithm for $m = k + l$ steps. Then we keep $k$ vectors and throw away
$l$ vectors. This leads to the
IRA (implicitly restarted Arnoldi method), Sorensen, 1992.
The strategy of the implicitly restarted Arnoldi algorithm is as follows:
1) After $m$ steps of the Arnoldi algorithm (without good breakdown) we have the Arnoldi
relation
$$A Q_m = Q_m H_m + h_{m+1,m}\, q_{m+1} e_m^T,$$
where $Q_m = [q_1, \dots, q_m]$ is isometric and $H_m \in \mathbb{C}^{m,m}$ is in Hessenberg form.
2) Choose $l$ shifts $\nu_1, \dots, \nu_l \in \mathbb{C}$. (Details will be described later!)
3) Carry out $l$ steps of the QR algorithm with shifts $\nu_1, \dots, \nu_l$:
$H^{(1)} = H_m$
For $j = 1, \dots, l$
  $H^{(j)} - \nu_j I = U_j R_j$ (QR decomposition)
  $H^{(j+1)} = R_j U_j + \nu_j I$
end
$\tilde{H}_m = H^{(l+1)}$, $U = U_1 \cdots U_l$.
Then $\tilde{H}_m = U^H H_m U$, and furthermore every $U_i$ has the form $U_i = G_{12}^{(i)} \cdots G_{m-1,m}^{(i)}$ with
Givens rotations $G_{j,j+1}^{(i)}$. Hence $U$ is a product of $l$ upper Hessenberg matrices, and its last row has the form
$$e_m^T U = [0, \dots, 0, \sigma, *, \dots, *]$$
for some $\sigma \in \mathbb{C}$, with only the last $l+1$ entries possibly nonzero.
4) Set $\underbrace{Q_m U}_{=:\tilde{Q}_m} = [\tilde{q}_1, \dots, \tilde{q}_m]$. We then partition $\tilde{Q}_m$ as
$$\tilde{Q}_m = [\tilde{Q}_k\ \ \tilde{Q}_l] = [\tilde{q}_1, \dots, \tilde{q}_m]$$
and set $\tilde{Q}_j = [\tilde{q}_1, \dots, \tilde{q}_j]$ for $j = 1, \dots, m$. In the same way we partition $\tilde{H}_m$. Then
$$A [\tilde{Q}_k\ \ \tilde{Q}_l] = [\tilde{Q}_k\ \ \tilde{Q}_l] \begin{bmatrix} \tilde{H}_k & * \\ \tilde{\beta}\, e_1 e_k^T & \tilde{H}_l \end{bmatrix} + f_{m+1}\, [0, \dots, 0, \sigma, *, \dots, *],$$
where $f_{m+1} := h_{m+1,m}\, q_{m+1}$. Comparing the first $k$ columns gives
$$A \tilde{Q}_k = \tilde{Q}_k \tilde{H}_k + f_{k+1} e_k^T, \qquad f_{k+1} := \tilde{\beta}\, \tilde{q}_{k+1} + \sigma\, h_{m+1,m}\, q_{m+1}.$$
This is again an Arnoldi relation, since one easily sees that $\tilde{Q}_k^H f_{k+1} = 0$. In particular, this
Arnoldi relation is the same as that after $k$ steps of the restarted Arnoldi algorithm
with the restart vector $\tilde{q}_1$. This is the reason for the terminology implicit restart.
5) Then we perform $l$ further Arnoldi steps and begin again with 1) until sufficiently many
eigenvalues are found.
This concept works very nicely but some questions remain.
1) How to choose the shifts $\nu_1, \dots, \nu_l$?
2) What is the relation between $q_1$ and $\tilde{q}_1$?
3) What is the relation between the corresponding Krylov spaces?
Lemma 56 Let $p(t) = (t - \nu_1) \cdots (t - \nu_l)$. Then, with the notation and assumptions above,
$$p(A)\, Q_m = Q_m U R + F_m,$$
where $F_m = [\,0\ \ \hat{F}_m\,]$ and $R = R_l \cdots R_1$.
Proof: By induction over $l$. For the step $l-1 \to l$ we have
$$p(A)\, Q_m = (A - \nu_1 I)(A - \nu_2 I) \cdots (A - \nu_l I)\, Q_m$$
$$\stackrel{\text{I.H.}}{=} (A - \nu_1 I) \left[ Q_m (H_m - \nu_2 I) \cdots (H_m - \nu_l I) + F^{(l-1)} \right]$$
$$= \left[ Q_m (H_m - \nu_1 I) + h_{m+1,m}\, q_{m+1} e_m^T \right] (H_m - \nu_2 I) \cdots (H_m - \nu_l I) + (A - \nu_1 I) F^{(l-1)}$$
$$= Q_m\, p(H_m) + \underbrace{h_{m+1,m}\, q_{m+1} \underbrace{e_m^T (H_m - \nu_2 I) \cdots (H_m - \nu_l I)}_{=[0,\dots,0,*,\dots,*],\ l \text{ trailing entries}} + (A - \nu_1 I) F^{(l-1)}}_{=:F_m},$$
where, by the induction hypothesis, $F^{(l-1)}$ has nonzero entries only in its last $l-1$ columns, so that $F_m = [\,0\ \ \hat{F}_m\,]$ has nonzero entries only in its last $l$ columns. Since $p(H_m) = UR$ by Lemma 41, the claim follows. $\Box$
From Lemma 56 we obtain, since $Q_m U = \tilde{Q}_m$,
$$p(A)\, Q_m = \tilde{Q}_m \begin{bmatrix} r_{11} & \cdots & r_{1m} \\ & \ddots & \vdots \\ 0 & & r_{mm} \end{bmatrix} + \begin{bmatrix} 0 & \hat{F}_m \end{bmatrix}.$$
Comparing the first columns yields
$$p(A)\, q_1 = \tilde{q}_1\, r_{11},$$
i.e., $\tilde{q}_1 = \gamma\, p(A)\, q_1$ with $\gamma = 1/r_{11}$. Here $r_{11} \neq 0$, since otherwise we would have $p(A) q_1 = 0$, i.e.,
the minimal polynomial of $q_1$ with respect to $A$ would have degree $\leq l$. But then the
Krylov space $\mathcal{K}_l(A, q_1)$ would be invariant, i.e., the Arnoldi algorithm would have had a good
breakdown after at most $l$ steps, which is a contradiction to the fact that we have done $m > l$
steps without breakdown.
2) We could proceed as in 1), but we use the following approach: Since we know already
that
$$\mathcal{R}(Q_j) = \mathcal{K}_j(A, q_1), \qquad j = 1, \dots, m,$$
and
$$\mathcal{R}(\tilde{Q}_j) = \mathcal{K}_j(A, \tilde{q}_1), \qquad j = 1, \dots, m,$$
it follows that
$$\mathcal{R}(\tilde{Q}_j) = \operatorname{Span}\left\{ p(A) q_1, A\, p(A) q_1, \dots, A^{j-1} p(A) q_1 \right\}
= \operatorname{Span}\left\{ p(A) q_1, p(A) A q_1, \dots, p(A) A^{j-1} q_1 \right\}
= p(A) \operatorname{Span}\left\{ q_1, A q_1, \dots, A^{j-1} q_1 \right\}
= p(A)\, \mathcal{R}(Q_j). \qquad \Box$$
As a consequence we obtain that $\mathcal{R}(\tilde{Q}_j) = (A - \nu_1 I) \cdots (A - \nu_l I)\, \mathcal{R}(Q_j)$, $j = 1, \dots, m$.
This corresponds to $l$ steps of subspace iteration with shifts $\nu_1, \dots, \nu_l$. Then we can expect
that $\mathcal{K}_j(A, \tilde{q}_1)$ contains better approximations to eigenvectors than $\mathcal{K}_j(A, q_1)$.
Choice of shifts: Suppose that $A$ is diagonalizable and $(v_1, \dots, v_n)$ is a basis of eigenvectors
associated with the eigenvalues $\lambda_1, \dots, \lambda_n$. Then
$$q_1 = c_1 v_1 + \dots + c_n v_n$$
with $c_i \in \mathbb{C}$, and hence
$$\tilde{q}_1 = \gamma\, p(A)\, q_1 = \gamma \left( c_1\, p(\lambda_1) v_1 + \dots + c_n\, p(\lambda_n) v_n \right),$$
i.e., eigenvector components for which $|p(\lambda_i)|$ is small are damped. The shifts $\nu_1, \dots, \nu_l$ should therefore be chosen close to unwanted eigenvalues; a common choice ("exact shifts") is to take $l$ unwanted Ritz values, i.e., eigenvalues of $H_m$.
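In practice the implicitly restarted Arnoldi method is available through the ARPACK library cited in the literature list; a typical call via SciPy (the test matrix, the number of eigenvalues k, and the choice which='LM' for largest-modulus eigenvalues are illustrative assumptions) looks as follows:

import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 2000
# Sparse test matrix: 1D Laplacian (tridiagonal with 2 on the diagonal, -1 off it)
A = sp.diags([-np.ones(n - 1), 2 * np.ones(n), -np.ones(n - 1)], [-1, 0, 1], format="csr")

# k eigenvalues of largest magnitude via implicitly restarted Arnoldi (ARPACK);
# ncv is the dimension m = k + l of the Arnoldi basis kept between restarts.
vals, vecs = spla.eigs(A, k=6, which="LM", ncv=30)
print(np.sort(vals.real))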
3.7 The Jacobi-Davidson method

If only a specific pair $(\lambda, u)$ of a large sparse matrix $A \in \mathbb{C}^{n,n}$ is desired (e.g., the eigenvalue
largest in modulus or the one nearest to some $\sigma \in \mathbb{C}$), then alternative methods can be considered.
1. Idea. Apply the Arnoldi algorithm with shift-and-invert, i.e., apply the Arnoldi algorithm
to $(A - \sigma I)^{-1}$. This has the disadvantage that per step we have to solve a linear system
with $A - \sigma I$. In practice, this has to be done very accurately, otherwise the convergence
deteriorates. If a sparse LR decomposition of $A - \sigma I$ can be determined, then this is
acceptable, otherwise this is problematic.
2. Idea. We use again a projection method. We choose an isometric matrix $Q_m \in \mathbb{C}^{n,m}$
with $m \ll n$ and use $\mathcal{R}(Q_m)$ as search and test space. Then we have again
$$Q_m^H A \underbrace{Q_m z}_{=:v} = \theta z \iff Av - \theta v \perp \mathcal{R}(Q_m).$$
Let $(\theta_k, v_k)$ be such a Ritz pair, $\|v_k\| = 1$, with residual $r_k = A v_k - \theta_k v_k \perp v_k$, and let $P = I - v_k v_k^H$ be the orthogonal projector onto $v_k^{\perp}$. We look for a correction $x \perp v_k$ and $\delta \in \mathbb{C}$ such that
$$u = v_k + x, \qquad \lambda = \theta_k + \delta$$
is an exact eigenpair of $A$. Then
$$P x = x, \qquad P r_k = r_k,$$
and applying $P$ to $(A - \lambda I)x = -(A - \lambda I)v_k$ gives
$$P (A - \lambda I) x = -P r_k = -r_k, \qquad \text{i.e.,} \qquad P (A - \lambda I) P x = -r_k \quad \text{and} \quad x \perp v_k.$$
Unfortunately we cannot solve the last equation, since we do not know $\lambda$. Therefore, we
replace $\lambda$ by the Ritz value $\theta_k$,
$$P (A - \theta_k I) P x = -r_k, \qquad x \perp v_k. \tag{3.3}$$
The (approximate) solution $x$ of (3.3) is orthogonalized against the previous basis vectors,
$$\tilde{q}_{k+1} = x - \sum_{i=1}^{k} h_{ik} q_i, \qquad h_{ik} = q_i^H x,$$
normalized, and used to expand the search space. This leads to the Jacobi-Davidson method.

Remark
(a) The solution of the linear system has to be done very accurately; if this is not done
accurately enough, then the information about the operator is not injected well enough.
Furthermore, restarts, locking and purging are easier in the Jacobi-Davidson method,
since we do not need to preserve an Arnoldi relation.
(b) On the other hand the Jacobi-Davidson algorithm approximates only one pair at a time
while the Arnoldi method determines several eigenvalues at the same time.
3.8 Generalized eigenvalue problems

For large scale generalized eigenvalue problems $\lambda E x = A x$ with regular pairs $(E, A)$ we
can apply all the methods from the previous sections whenever the solution of linear systems
$(\sigma_0 E - A)x = b$ is feasible for chosen shifts $\sigma_0$.
Set $\theta = (\sigma_0 - \lambda)^{-1}$ and replace $\lambda E x = A x$ by $\theta x = (\sigma_0 E - A)^{-1} E x =: \hat{A} x$, and apply the
previous methods to $\hat{A}$.
Chapter 4
4.1 Splitting methods
The basic idea of splitting methods is to split the matrix $A$ into two summands, $A = M - N$,
and to transform the linear system into a fixed-point equation
$$M x = N x + b.$$
This immediately leads to an iterative method via the recursion
$$M x^{(k+1)} = N x^{(k)} + b.$$
Remark 62
(a) If $A$, $M$ are nonsingular and $\rho(M^{-1} N) < 1$, where
$$\rho(B) := \max_{\lambda \in \sigma(B)} |\lambda|$$
denotes the spectral radius of $B$, then the iteration converges to $x = A^{-1} b$ for every starting vector $x^{(0)}$.

Example 63 We split $A$ as
$$A = \underbrace{\begin{bmatrix} 0 & \cdots & 0 \\ * & \ddots & \vdots \\ * & * & 0 \end{bmatrix}}_{=:L} + \underbrace{\begin{bmatrix} * & & 0 \\ & \ddots & \\ 0 & & * \end{bmatrix}}_{=:D} + \underbrace{\begin{bmatrix} 0 & * & * \\ \vdots & \ddots & * \\ 0 & \cdots & 0 \end{bmatrix}}_{=:R},$$
i.e., into its strictly lower triangular part $L$, its diagonal part $D$, and its strictly upper triangular part $R$.
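As an illustration of the splitting idea, here is a minimal NumPy sketch of the iteration with the choice $M = D$, $N = -(L+R)$ (the classical Jacobi splitting; the test system and iteration count are illustrative assumptions):

import numpy as np

def jacobi(A, b, x0, maxit=200):
    # Splitting iteration M x_{k+1} = N x_k + b with M = D, N = -(L + R).
    D = np.diag(np.diag(A))
    LR = A - D                       # L + R (strictly lower + strictly upper part)
    x = x0.copy()
    for _ in range(maxit):
        x = np.linalg.solve(D, b - LR @ x)
    return x

# Strictly diagonally dominant test system, so that rho(M^{-1} N) < 1 and the iteration converges
A = np.array([[4.0, 1.0, 0.0], [1.0, 5.0, 2.0], [0.0, 2.0, 6.0]])
b = np.array([1.0, 2.0, 3.0])
print(jacobi(A, b, np.zeros(3)), np.linalg.solve(A, b))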
4.2

4.2.1 Steepest Descent

For symmetric positive definite $A \in \mathbb{R}^{n,n}$ the solution of $Ax = b$ is the unique minimizer of the functional
$$\varphi(x) = \frac{1}{2} x^T A x - b^T x,$$
and the negative gradient of $\varphi$ at $x$ is the residual $r = b - Ax$ (Proof: Exercise). The method of steepest
descent therefore iterates: given $x_{k-1}$, compute $r_{k-1} = b - A x_{k-1}$; if $r_{k-1} = 0$, stop, since then $x_{k-1} = A^{-1} b$.
Otherwise set
$$\alpha_k = \frac{r_{k-1}^T r_{k-1}}{r_{k-1}^T A r_{k-1}}, \qquad x_k = x_{k-1} + \alpha_k r_{k-1}.$$
One can show that
$$\varphi(x_k) + \frac{1}{2} b^T A^{-1} b \leq \left( 1 - \frac{1}{\kappa_2(A)} \right) \left( \varphi(x_{k-1}) + \frac{1}{2} b^T A^{-1} b \right).$$
We thus have global convergence for all start vectors. But the method has many disadvantages.
(a) The convergence is very slow if $\kappa_2(A)$ is large.
(b) Furthermore, even if the functional $\varphi$ decreases very quickly, this is not automatically true
for the residual.
The main reasons for these disadvantages are the following.
1) We minimize only in one search direction $r_k$, but we have many more directions than
one (namely $r_0, \dots, r_k$).
2) The search directions are not different enough.
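A minimal NumPy sketch of the steepest descent iteration described above (the symmetric positive definite test matrix, tolerance, and iteration limit are illustrative assumptions):

import numpy as np

def steepest_descent(A, b, x0, maxit=1000, tol=1e-10):
    # Gradient descent for phi(x) = 0.5 x^T A x - b^T x with exact line search.
    x = x0.copy()
    for _ in range(maxit):
        r = b - A @ x                     # negative gradient of phi at x
        if np.linalg.norm(r) < tol:
            break
        alpha = (r @ r) / (r @ (A @ r))   # optimal step length
        x = x + alpha * r
    return x

A = np.array([[3.0, 1.0], [1.0, 2.0]])    # symmetric positive definite
b = np.array([1.0, 1.0])
print(steepest_descent(A, b, np.zeros(2)), np.linalg.solve(A, b))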
4.2.2 A-conjugate search directions

To improve the convergence behavior of the descent method we add a little modification. The
basic idea is to choose in every step, instead of the negative gradient, a direction $p \in \mathbb{R}^n$ with
$p \not\perp r$. Then we also find in this direction a decrease of $\varphi$. Thus in every step, instead of $r_k$,
we choose a search direction $p_k$ with $p_k^T r_k \neq 0$.
We require the following conditions for $p_{k+1}$ and $x_{k+1}$:
R1) $p_1, \dots, p_{k+1}$ are linearly independent.
R2) $\varphi(x_{k+1}) = \min_{x \in \mathcal{R}_{k+1}} \varphi(x)$, where $\mathcal{R}_{k+1} := x_0 + \operatorname{Span}\{p_1, \dots, p_{k+1}\}$.
R3) $p_{k+1}$ and $x_{k+1}$ can be computed with little effort.
The goal is thus to determine $p_{k+1}$ such that $\varphi(x_{k+1}) = \min_{x \in \mathcal{R}_{k+1}} \varphi(x)$ and all three conditions R1)-R3) are satisfied.
If the mixed term were not there, then we could minimize separately over the two variables.
Thus we choose $p_{k+1}$ so that
$$p_{k+1}^T A P_k = 0$$
and obtain
$$\min_{x \in \mathcal{R}_{k+1}} \varphi(x) = \underbrace{\min_{y \in \mathbb{R}^k} \varphi(x_0 + P_k y)}_{\text{Sol. } y = y_k} + \underbrace{\min_{\alpha \in \mathbb{R}} \left( \frac{1}{2} \alpha^2\, p_{k+1}^T A p_{k+1} - \alpha\, p_{k+1}^T r_0 \right)}_{\text{Sol. } \alpha_{k+1} = \frac{p_{k+1}^T r_0}{p_{k+1}^T A p_{k+1}}}.$$
Thus $x_{k+1} = x_0 + P_k y_k + \alpha_{k+1} p_{k+1} = x_k + \alpha_{k+1} p_{k+1}$, i.e., search directions satisfying
$$p_i^T A p_j = 0, \qquad i \neq j, \quad i, j = 1, \dots, k+1$$
($A$-conjugate directions) allow a cheap update of the iterates.
Either $x_k = A^{-1} b$, or
$$r_0 = b - A x_0 \notin \operatorname{Span}\{A p_1, \dots, A p_k\}.$$
In the latter case there exists $p_{k+1} \perp \operatorname{Span}\{A p_1, \dots, A p_k\}$ with $p_{k+1}^T r_0 \neq 0$. Since
$x_k \in x_0 + \operatorname{Span}\{p_1, \dots, p_k\}$, we have
$$r_k = b - A x_k \in r_0 + \operatorname{Span}\{A p_1, \dots, A p_k\}$$
and thus also
$$p_{k+1}^T r_k = p_{k+1}^T r_0 \neq 0. \qquad \Box$$
Remark 68 From the proof of Lemma 67 we have the following observation. Since
$p^T r_k = p^T r_0$ for $p \in \operatorname{Span}\{A p_1, \dots, A p_k\}^{\perp}$, we have, in particular, that $p_{k+1}^T r_k = p_{k+1}^T r_0$, and
thus
$$\alpha_{k+1} = \frac{p_{k+1}^T r_0}{p_{k+1}^T A p_{k+1}} = \frac{p_{k+1}^T r_k}{p_{k+1}^T A p_{k+1}}.$$
We can then finally show that also the first requirement R1) is satisfied.
Lemma 69 The search directions $p_1, \dots, p_k$ are linearly independent.
Proof: The matrix $P_k^T A P_k = \operatorname{diag}\left( p_1^T A p_1, \dots, p_k^T A p_k \right)$ is invertible, since $A$ is positive
definite. Thus $P_k$ has full rank, i.e., the columns $p_1, \dots, p_k$ are linearly independent. $\Box$
The resulting method of $A$-conjugate search directions thus updates
$$x_{k+1} = x_k + \alpha_{k+1} p_{k+1}, \qquad \alpha_{k+1} = \frac{p_{k+1}^T r_k}{p_{k+1}^T A p_{k+1}}.$$
4.2.3 The conjugate gradient method

We have seen that the choice of $A$-conjugate search directions has many advantages (an easy
computation of $x_{k+1}$ from $x_k$ and a guaranteed convergence in at most $n$ steps in exact
arithmetic). On the other hand we would like to keep the advantage of steepest descent,
that the function $\varphi$ decreases maximally in the direction of the negative gradient, i.e., this is
heuristically a good search direction. The idea is then to use the freedom in $p_{k+1}$ to choose
that $p_{k+1}$ which is nearest to $r_k$, the direction of the negative gradient, i.e., to choose $p_{k+1}$ so
that
$$\|p_{k+1} - r_k\| = \min_{p \perp \operatorname{Span}\{A p_1, \dots, A p_k\}} \|p - r_k\|. \tag{4.1}$$
At first sight this looks strange, since we wanted to choose directions that allow an easy
solution of the optimization problem and here we introduce another optimization problem.
We will see now that this optimization problem is easy to solve, since it will turn out that
$p_{k+1}$ is just a linear combination of $p_k$ and $r_k$.
In the following, under the same assumptions as before, we choose the $A$-conjugate search
directions to minimize (4.1) for $k = 0, \dots, m$. Let $P_k = [p_1, \dots, p_k]$; we then show that
$p_{k+1} \in \operatorname{Span}\{p_k, r_k\}$.
Lemma 70 Let $k \in \{1, \dots, m\}$ and let $z_k \in \mathbb{R}^k$ be such that
$$\|r_k - A P_k z_k\| = \min_{z \in \mathbb{R}^k} \|r_k - A P_k z\|.$$
Then $p_{k+1} := r_k - A P_k z_k$ solves (4.1) and satisfies $p_{k+1} \in \operatorname{Span}\{p_k, r_k\}$.
Indeed, writing $z_k = \begin{bmatrix} w \\ \zeta \end{bmatrix}$ with $w \in \mathbb{R}^{k-1}$, $\zeta \in \mathbb{R}$, and using $r_k - r_{k-1} = -\alpha_k A p_k$, we get
$$p_{k+1} = r_k - A P_{k-1} w + \frac{\zeta}{\alpha_k} (r_k - r_{k-1}) = \left( 1 + \frac{\zeta}{\alpha_k} \right) r_k + s_k, \qquad s_k = -\frac{\zeta}{\alpha_k} r_{k-1} - A P_{k-1} w \in \operatorname{Span}\{r_{k-1}, A P_{k-1} w\}.$$
One can show that, after scaling, this leads to
$$p_{k+1} = r_k + \beta_{k+1} p_k, \qquad \beta_{k+1} = -\frac{p_k^T A r_k}{p_k^T A p_k}.$$
Hence $p_{k+1}$ can be constructed directly from $p_k$ and $r_k$ without solving a minimization problem.
Algorithm: (CG, conjugate gradient method - Hestenes/Stiefel, 1952)
For $A \in \mathbb{R}^{n,n}$ symmetric positive definite and $b \in \mathbb{R}^n$ the algorithm computes the solution
$\hat{x} = A^{-1} b$ of $Ax = b$.
1) Start: $x_0 \in \mathbb{R}^n$, $r_0 = b - A x_0$, $p_1 = r_0$.
2) Iterate for $k = 1, 2, \dots$ until convergence
(a) $\alpha_k = \frac{p_k^T r_{k-1}}{p_k^T A p_k}$,
(b) $x_k = x_{k-1} + \alpha_k p_k$,
(c) $r_k = b - A x_k$,
(d) $\beta_{k+1} = -\frac{p_k^T A r_k}{p_k^T A p_k}$,
(e) $p_{k+1} = r_k + \beta_{k+1} p_k$.
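A direct NumPy transcription of the CG algorithm above (the stopping tolerance and the test problem are illustrative assumptions):

import numpy as np

def cg(A, b, x0, tol=1e-10, maxit=None):
    # Conjugate gradient method for symmetric positive definite A.
    n = len(b)
    maxit = maxit or n
    x = x0.copy()
    r = b - A @ x
    p = r.copy()
    for _ in range(maxit):
        Ap = A @ p
        alpha = (p @ r) / (p @ Ap)        # step (a)
        x = x + alpha * p                 # step (b)
        r = b - A @ x                     # step (c)
        if np.linalg.norm(r) < tol:
            break
        beta = -(p @ (A @ r)) / (p @ Ap)  # step (d)
        p = r + beta * p                  # step (e)
    return x

rng = np.random.default_rng(4)
M = rng.standard_normal((50, 50))
A = M @ M.T + 50 * np.eye(50)             # symmetric positive definite test matrix
b = rng.standard_normal(50)
x = cg(A, b, np.zeros(50))
print(np.linalg.norm(A @ x - b))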
Remark 73 There are many theoretical results behind this simple algorithm, for example the
convergence after at most $n$ steps in exact arithmetic, since the CG algorithm is a special
case of the algorithm of $A$-conjugate search directions. The iterate $x_k$ satisfies
$$\varphi(x_k) = \min_{x \in \mathcal{R}_k} \varphi(x), \qquad \mathcal{R}_k = x_0 + \operatorname{Span}\{p_1, \dots, p_k\} = x_0 + \mathcal{K}_k(A, r_0).$$
For this reason one calls the CG algorithm a Krylov space method.
4.2.4 Convergence of the CG algorithm

Using the relationship of the CG algorithm with Krylov spaces allows a detailed convergence
analysis. For this we introduce a special norm.
Definition 74 Let $A \in \mathbb{R}^{n,n}$ be symmetric and positive definite. Then the norm defined by
$$\|x\|_A := \sqrt{x^T A x}$$
on $\mathbb{R}^n$ is called the A-norm or energy norm.
We would like to estimate the error
$$e_k := A^{-1} b - x_k = A^{-1}(b - A x_k) = A^{-1} r_k,$$
where $(x_k)$ is the sequence generated by the CG algorithm.
Theorem 75 (Optimality of CG in the A-norm) Let $A \in \mathbb{R}^{n,n}$ be symmetric and
positive definite and let $(x_k)$ be the CG sequence generated from a starting vector $x_0$. If
$r_{k-1} \neq 0$, then
$$\|e_k\|_A = \|A^{-1} b - x_k\|_A < \|A^{-1} b - x\|_A$$
for all $x \in x_0 + \mathcal{K}_k(A, r_0)$ with $x \neq x_k$.
Proof: We know that $x_k \in x_0 + \mathcal{K}_k(A, r_0)$. Let $x \in x_0 + \mathcal{K}_k(A, r_0)$ be arbitrary and set
$\Delta x = x_k - x$, i.e., $\Delta x \in \mathcal{K}_k(A, r_0)$, and moreover
$$\tilde{e} := A^{-1} b - x = A^{-1} b - (x_k - \Delta x) = e_k + \Delta x.$$
Then
$$\|\tilde{e}\|_A^2 = \tilde{e}^T A \tilde{e} = (e_k + \Delta x)^T A (e_k + \Delta x) = e_k^T A e_k + 2 e_k^T A \Delta x + \Delta x^T A \Delta x$$
and
$$2 e_k^T A \Delta x = 2 r_k^T A^{-1} A \Delta x = 2 r_k^T \Delta x = 0,$$
since $\Delta x \in \mathcal{K}_k(A, r_0) = \operatorname{Span}\{r_0, \dots, r_{k-1}\}$ and $r_k \perp r_j$ for $j = 0, \dots, k-1$ due to
Theorem 71. Thus we obtain
$$\|\tilde{e}\|_A^2 = \|e_k\|_A^2 + \|\Delta x\|_A^2 > \|e_k\|_A^2,$$
if $\Delta x \neq 0$. $\Box$
Corollary 76 With $\Pi_k^0 := \{p \in \Pi_k \mid p(0) = 1\}$ we have
$$\|e_k\|_A = \min_{p \in \Pi_k^0} \|p(A)\, e_0\|_A \leq \min_{p \in \Pi_k^0} \max_{\lambda \in \sigma(A)} |p(\lambda)|\ \|e_0\|_A. \tag{4.2}$$
Proof: Since $x_k \in x_0 + \mathcal{K}_k(A, r_0)$, we can write $x_k = x_0 + p_{k-1}(A)\, r_0$ with $p_{k-1} \in \Pi_{k-1}$. Thus, we have
$$e_k = A^{-1} r_k = \underbrace{A^{-1} r_0}_{=e_0} - p_{k-1}(A)\, r_0 = e_0 - p_{k-1}(A)\, A e_0 = \underbrace{(I - p_{k-1}(A)\, A)}_{=:p_k(A)}\, e_0,$$
where $p_k \in \Pi_k^0$. The uniqueness of $p_k$ and the first equality in (4.2) then follow from Theorem 75. To prove
the inequality in (4.2), let $(v_1, \dots, v_n)$ be an orthonormal basis of eigenvectors of $A$ associated with the
eigenvalues $\lambda_1, \dots, \lambda_n$. Furthermore, let $\tilde{p} \in \Pi_k^0$ and
$$e_0 = c_1 v_1 + \dots + c_n v_n$$
with $c_1, \dots, c_n \in \mathbb{R}$. Then,
$$\tilde{p}(A)\, e_0 = c_1 \tilde{p}(\lambda_1) v_1 + \dots + c_n \tilde{p}(\lambda_n) v_n.$$
By the orthogonality of the $v_i$ we obtain
$$\|e_0\|_A^2 = e_0^T A e_0 = \sum_{i=1}^{n} c_i^2 \lambda_i$$
and
$$\|\tilde{p}(A)\, e_0\|_A^2 = \sum_{i=1}^{n} c_i^2\, \tilde{p}(\lambda_i)^2\, \lambda_i \leq \max_{\lambda \in \sigma(A)} \tilde{p}(\lambda)^2 \sum_{i=1}^{n} c_i^2 \lambda_i. \qquad \Box$$
Remark 77
1) From Corollary 76 we conclude that the CG algorithm converges fast if $A$
has an appropriate spectrum, i.e., one for which there exist polynomials $p$ with $p(0) = 1$
and small degree such that $|p(\lambda)|$ is small for all $\lambda \in \sigma(A)$. This is e.g. the case if
(a) the eigenvalues occur in clusters,
(b) all eigenvalues are far from the origin
(then $\kappa_2(A) = \frac{\lambda_{\max}}{\lambda_{\min}}$ is not too large).
2) With the help of Chebyshev polynomials one can prove that
$$\frac{\|e_k\|_A}{\|e_0\|_A} \leq 2 \left( \frac{\sqrt{\kappa_2(A)} - 1}{\sqrt{\kappa_2(A)} + 1} \right)^k, \qquad \frac{\|e_k\|_2}{\|e_0\|_2} \leq 2 \sqrt{\kappa_2(A)} \left( \frac{\sqrt{\kappa_2(A)} - 1}{\sqrt{\kappa_2(A)} + 1} \right)^k.$$
4.2.5 Relation between CG and Lanczos

In this section we use the same notation as in previous sections. Consider the matrices
$$R_k = [r_0, \dots, r_{k-1}], \qquad P_k = [p_1, \dots, p_k], \qquad B_k = \begin{bmatrix} 1 & -\beta_2 & & 0 \\ & 1 & \ddots & \\ & & \ddots & -\beta_k \\ 0 & & & 1 \end{bmatrix}.$$
Using the equations $p_1 = r_0$ and $p_i = r_{i-1} + \beta_i p_{i-1}$ for $i = 2, \dots, k$ (see Section 4.2.3) we
obtain
$$R_k = P_k B_k.$$
Then the matrix $R_k^T A R_k$ is tridiagonal, since
$$R_k^T A R_k = B_k^T P_k^T A P_k B_k = B_k^T \begin{bmatrix} p_1^T A p_1 & & 0 \\ & \ddots & \\ 0 & & p_k^T A p_k \end{bmatrix} B_k.$$
Furthermore, we know from Theorem 75 that $r_0, \dots, r_{k-1}$ are orthogonal and span a
Krylov space, i.e., $\left( \frac{r_0}{\|r_0\|}, \dots, \frac{r_{k-1}}{\|r_{k-1}\|} \right)$ is an orthonormal basis of $\mathcal{K}_k(A, r_0)$.
This leads to an interesting conclusion. If $q_1 := \frac{r_0}{\|r_0\|}$ and if $q_1, \dots, q_k$ are the vectors generated
by the Lanczos algorithm, then by the implicit Q Theorem
$$q_j = \pm \frac{r_{j-1}}{\|r_{j-1}\|}, \qquad j = 1, \dots, k.$$
Thus, the tridiagonal matrix generated by the Lanczos algorithm is (except for signs and scaling) the
matrix $R_k^T A R_k$, i.e.,
$$\text{CG} \leftrightarrow \text{Lanczos}.$$
Application: In the course of the CG algorithm we can generate the tridiagonal matrix
$R_k^T A R_k$ and obtain information about extremal eigenvalues of $A$ and the condition number
$\kappa_2(A) = \frac{\lambda_{\max}}{\lambda_{\min}}$.
4.2.6 GMRES

For general (non-Hermitian) $A$ the idea is to determine $x_k$ such that
$$\|b - A x_k\|_2 = \min_{x \in x_0 + \mathcal{K}_k(A, r_0)} \|b - A x\|_2. \tag{4.3}$$
This means that we have to solve in each step the least-squares problem
$$\|b - A x_k\|_2 = \min_{x \in x_0 + \mathcal{K}_k(A, r_0)} \|b - A x\|_2.$$
In Section 4.2.5 we have seen that the CG algorithm ($A$ Hermitian) corresponds to the Lanczos algorithm.
We expect that in the general case ($A$ general)
$$\text{GMRES} \leftrightarrow \text{Arnoldi}.$$
After $k$ steps of the Arnoldi algorithm (without breakdown) we have the Arnoldi relation
$$A Q_k = Q_k H_k + h_{k+1,k}\, q_{k+1} e_k^T = Q_{k+1} H_{k+1,k},$$
where
$$H_{k+1,k} = \begin{bmatrix}
h_{11} & \cdots & \cdots & h_{1k} \\
h_{21} & \ddots & & \vdots \\
0 & \ddots & \ddots & \vdots \\
\vdots & \ddots & h_{k,k-1} & h_{kk} \\
0 & \cdots & 0 & h_{k+1,k}
\end{bmatrix} \in \mathbb{C}^{k+1,k}.$$
If $q_1 = \frac{r_0}{\|r_0\|}$, then every $x \in x_0 + \mathcal{K}_k(A, r_0)$ can be written as $x = x_0 + Q_k y$ for a $y \in \mathbb{C}^k$. Then
$$b - Ax = r_0 - A Q_k y = r_0 - Q_{k+1} H_{k+1,k}\, y = Q_{k+1} \left( \|r_0\| e_1 - H_{k+1,k}\, y \right),$$
since $r_0 = \|r_0\|\, q_1 = \|r_0\|\, Q_{k+1} e_1$. Hence
$$\|b - Ax\|_2 = \left\| \|r_0\| e_1 - H_{k+1,k}\, y \right\|_2, \tag{4.4}$$
i.e., (4.3) reduces to a small least-squares problem with the Hessenberg matrix $H_{k+1,k}$.
In general, for a least-squares problem $\min_y \|c - M y\|_2$ with QR decomposition
$$M = QR, \qquad R = \begin{bmatrix} R_1 \\ 0 \end{bmatrix}, \qquad Q^H c = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix},$$
the solution is $y = R_1^{-1} c_1$ and the minimal residual norm is $\|c_2\|_2$.
For the Hessenberg matrix $H_{k+1,k}$ the QR decomposition can be updated from that of $H_{k,k-1}$: if $\tilde{Q}_k^H H_{k,k-1} = \begin{bmatrix} R_{k-1} \\ 0 \end{bmatrix}$, then
$$\begin{bmatrix} \tilde{Q}_k^H & 0 \\ 0 & 1 \end{bmatrix} H_{k+1,k} = \begin{bmatrix} \tilde{Q}_k^H & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} H_{k,k-1} & h_k \\ 0 & h_{k+1,k} \end{bmatrix} = \begin{bmatrix} R_{k-1} & * \\ 0 & * \\ 0 & h_{k+1,k} \end{bmatrix},$$
where $h_k$ denotes the first $k$ entries of the last column of $H_{k+1,k}$.
Thus, the element $h_{k+1,k}$ can be eliminated by a single Givens rotation, and we obtain the QR
decomposition of $H_{k+1,k}$ from that of $H_{k,k-1}$ by this Givens rotation (this costs only $O(n)$ flops).
Algorithm (GMRES)
For $A \in \mathbb{C}^{n,n}$ invertible, $b \in \mathbb{C}^n$, and a starting vector $x_0 \in \mathbb{C}^n$, the algorithm computes the
solution $\hat{x} = A^{-1} b$ of $Ax = b$.
1) Start: $r_0 = b - A x_0$, $h_{1,0} = \|r_0\|$.
2) For $k = 1, 2, \dots$ until convergence
a) $q_k = \frac{r_{k-1}}{h_{k,k-1}}$,
b) $r_k = A q_k - \sum_{j=1}^{k} h_{jk} q_j$ with $h_{jk} = q_j^H A q_k$,
c) $h_{k+1,k} = \|r_k\|$,
d) Determine $y_k$ such that $\left\| \|r_0\| e_1 - H_{k+1,k}\, y_k \right\|$ is minimal,
e) $x_k = x_0 + Q_k y_k$.
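A compact NumPy sketch of GMRES following the algorithm above: it builds the Arnoldi basis and in each step solves the small least-squares problem with the rectangular Hessenberg matrix. For simplicity it uses numpy.linalg.lstsq instead of the Givens-rotation update described above; tolerances and the test problem are illustrative assumptions.

import numpy as np

def gmres(A, b, x0, m=50, tol=1e-10):
    # GMRES: minimize ||b - A x|| over x0 + K_k(A, r0), k = 1, ..., m.
    n = len(b)
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)
    Q = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    Q[:, 0] = r0 / beta
    for k in range(m):
        w = A @ Q[:, k]
        for i in range(k + 1):                    # Arnoldi (modified Gram-Schmidt)
            H[i, k] = Q[:, i] @ w
            w -= H[i, k] * Q[:, i]
        H[k + 1, k] = np.linalg.norm(w)
        # small least-squares problem:  min_y || ||r0|| e1 - H_{k+1,k} y ||
        e1 = np.zeros(k + 2)
        e1[0] = beta
        y, *_ = np.linalg.lstsq(H[:k + 2, :k + 1], e1, rcond=None)
        resid = np.linalg.norm(e1 - H[:k + 2, :k + 1] @ y)
        if resid < tol or H[k + 1, k] < 1e-14:    # converged or lucky breakdown
            break
        Q[:, k + 1] = w / H[k + 1, k]
    return x0 + Q[:, :k + 1] @ y

rng = np.random.default_rng(5)
A = np.eye(100) + 0.05 * rng.standard_normal((100, 100))   # well-conditioned test matrix
b = rng.standard_normal(100)
x = gmres(A, b, np.zeros(100))
print(np.linalg.norm(A @ x - b))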
Remark 78 As for the CG algorithm, GMRES can also be analyzed via polynomial approximation in $\tilde{\Pi}_k = \{p \in \Pi_k \mid p(0) = 1\}$: we have
$$x = x_0 + \tilde{p}(A)\, r_0 \qquad \text{for a } \tilde{p} \in \Pi_{k-1},$$
since $x \in x_0 + \mathcal{K}_k(A, r_0)$. Therefore,
$$r_k := b - A x_k = b - A x_0 - A\, \tilde{p}(A)\, r_0 = \left( I - A\, \tilde{p}(A) \right) r_0 = p(A)\, r_0$$
for a $p \in \tilde{\Pi}_k$. $\Box$
Consequence The GMRES algorithm (in most cases) converges fast if
1) the spectrum of $A$ is appropriate and
2) $\kappa(V)$ is small, where $A = V \Lambda V^{-1}$ is a diagonalization of $A$, i.e., if $A$ is not too far from a normal matrix (since for a normal matrix
$V$ can be chosen unitary, i.e., with condition number 1).
Remark 80 Convergence acceleration can again be achieved via preconditioning, i.e., instead
of $Ax = b$ we solve $M^{-1} A x = M^{-1} b$, where $My = c$ is easy to solve and $M$ is chosen such that the
spectrum of $M^{-1} A$ is appropriate.
Summary:
$Ax = \lambda x$, $A = A^H$: Lanczos.
$Ax = \lambda x$, $A \neq A^H$: Arnoldi, (nonsymmetric) Lanczos.
$Ax = b$, $A = A^H$: CG.
$Ax = b$, $A \neq A^H$: GMRES, BiCG.