Matrix Computations
THIRD EDITION

Gene H. Golub
Department of Computer Science
Stanford University

Charles F. Van Loan
Department of Computer Science
Cornell University

Dedicated to
ALSTON S. HOUSEHOLDER
AND
JAMES H. WILKINSON
Contents
2 Matrix Analysis
2.1 Basic Ideas from Linear Algebra
2.2 Vector Norms
2.3 Matrix Norms
2.4 Finite Precision Matrix Computations
2.5 Orthogonality and the SVD
2.6 Projections and the CS Decomposition
2.7 The Sensitivity of Square Linear Systems
Bibliography
Index
Preface to the Third Edition
the unsymmetric Lanczos process and the Arnoldi iteration. The "unsymmetric
component" of Chapter 10 (Iterative Methods for Linear Systems)
has likewise been broadened with a whole new section devoted to various
Krylov space methods designed to handle the sparse unsymmetric linear
system problem.
In §12.5 (Updating Orthogonal Decompositions) we included a new subsection
on ULV updating. Toeplitz matrix eigenproblems and orthogonal
matrix eigenproblems are discussed in §12.6.
Both of us look forward to continuing the dialog with our readers. As
we said in the Preface to the Second Edition, "It has been a pleasure to
deal with such an interested and friendly readership."
Many individuals made valuable Third Edition suggestions, but Greg
Ammar, Mike Heath, Nick Trefethen, and Steve Vavasis deserve special
thanks.
Finally, we would like to acknowledge the support of Cindy Robinson
at Cornell. A dedicated assistant makes a big difference.
Software
LAPACK
Many of the algorithms in this book are implemented in the software package
LAPACK.
Our LAPACK references are spare in detail but rich enough to "get you
started." Thus, when we say that _TRSV can be used to solve a triangular
system Ax = b, we leave it to you to discover through the LAPACK manual
that A can be either upper or lower triangular and that the transposed
system A^T x = b can be handled as well. Moreover, the underscore is a
placeholder whose mission is to designate type (single, double, complex,
etc.).
LAPACK stands on the shoulders of two other packages that are milestones
in the history of software development. EISPACK was developed in
the early 1970s and is dedicated to solving symmetric, unsymmetric, and
generalized eigenproblems:

B.T. Smith, J.M. Boyle, Y. Ikebe, V.C. Klema, and C.B. Moler (1976).
Matrix Eigensystem Routines: EISPACK Guide, 2nd ed., Lecture Notes
in Computer Science, Volume 6, Springer-Verlag, New York.
B.S. Garbow, J.M. Boyle, J.J. Dongarra, and C.B. Moler (1977). Matrix
Eigensystem Routines: EISPACK Guide Extension, Lecture Notes in
Computer Science, Volume 51, Springer-Verlag, New York.

LINPACK was developed in the late 1970s for linear equations and least
squares problems:

J.J. Dongarra, J.R. Bunch, C.B. Moler, and G.W. Stewart (1979). LINPACK
Users' Guide, SIAM Publications, Philadelphia, PA.
EISPACK and LINPACK have their roots in a sequence of papers that feature
Algol implementations of some of the key matrix factorizations. These
papers are collected in

J.H. Wilkinson and C. Reinsch, eds. (1971). Handbook for Automatic
Computation, Vol. 2, Linear Algebra, Springer-Verlag, New York.
NETLIB
A wide range of software including LAPACK, EISPACK, and LINPACK is
available electronically via Netlib:
World Wide Web: http://www.netlib.org/index.html
Anonymous ftp: ftp://ftp.netlib.org
Via email, send a one-line message:

mail netlib@ornl.gov
send index
to get started.
Pre-1970 Classics
V.N. Faddeeva (1959). Computational Methods of Linear Algebra, Dover,
New York.

Basic Material from Linear Algebra. Systems of Linear Equations. The Proper
Numbers and Proper Vectors of a Matrix.
Introductory (General)
A.R. Gourlay and G.A. Watson (1973). Computational Methods for Matrix
Eigenproblems, John Wiley & Sons, New York.

Introduction. Background Theory. Reductions and Transformations. Methods for
the Dominant Eigenvalue. Methods for the Subdominant Eigenvalue. Inverse Iteration.
Jacobi's Methods. Givens and Householder's Methods. Eigensystem of
a Symmetric Tridiagonal Matrix. The LR and QR Algorithms. Extensions of Jacobi's
Method. Extension of Givens' and Householder's Methods. QR Algorithm for
Hessenberg Matrices. Generalized Eigenvalue Problems. Available Implementations.
R.J. Gault, R.F. Hoskins, J.A. Milner and M.J. Pratt (1974). Computational
Methods in Linear Algebra, John Wiley and Sons, New York.

Eigenvalues and Eigenvectors. Error Analysis. The Solution of Linear Equations by
Elimination and Decomposition Methods. The Solution of Linear Systems of Equations
by Iterative Methods. Errors in the Solution Sets of Equations. Computation
of Eigenvalues and Eigenvectors. Errors in Eigenvalues and Eigenvectors. Appendix:
A Survey of Essential Results from Linear Algebra.
T.F. Coleman and C.F. Van Loan (1988). Handbook for Matrix Computations,
SIAM Publications, Philadelphia, PA.

Fortran 77. The Basic Linear Algebra Subprograms. Linpack. MATLAB.
Advanced (General)
N.J. Higham (1996). Accuracy and Stability of Numerical Algorithms,
SIAM Publications, Philadelphia, PA.

Principles of Finite Precision Computation. Floating Point Arithmetic. Basics.
Summation. Polynomials. Norms. Perturbation Theory for Linear Systems. Triangular
Systems. LU Factorization and Linear Equations. Cholesky Factorization.
Iterative Refinement. Block LU Factorization. Matrix Inversion. Condition Number
Estimation. The Sylvester Equation. Stationary Iterative Methods. Matrix Powers.
QR Factorization. The Least Squares Problem. Underdetermined Systems. Vandermonde
Systems. Fast Matrix Multiplication. The Fast Fourier Transform and
Applications. Automatic Error Analysis. Software Issues in Floating Point Arithmetic.
A Gallery of Test Matrices.
L.N. Trefethen and D. Bau III (1997). Numerical Linear Algebra, SIAM
Publications, Philadelphia, PA.

Matrix-Vector Multiplication. Orthogonal Vectors and Matrices. Norms. The Singular
Value Decomposition. More on the SVD. Projectors. QR Factorization. Gram-Schmidt
Orthogonalization. MATLAB. Householder Triangularization. Least Squares
Problems. Conditioning and Condition Numbers. Floating Point Arithmetic. Stability.
More on Stability. Stability of Householder Triangularization. Stability of Back
Substitution. Conditioning of Least-Squares Problems. Stability of Least-Squares
Algorithms. Gaussian Elimination. Pivoting. Stability of Gaussian Elimination.
Cholesky Factorization. Eigenvalue Problems. Overview of Eigenvalue Algorithms.
Reduction to Hessenberg/Tridiagonal Form. Rayleigh Quotient, Inverse Iteration.
QR Algorithm Without Shifts. QR Algorithm With Shifts. Other Eigenvalue Algorithms.
Computing the SVD. Overview of Iterative Methods. The Arnoldi Iteration.
How Arnoldi Locates Eigenvalues. GMRES. The Lanczos Iteration. Orthogonal
Polynomials and Gauss Quadrature. Conjugate Gradients. Biorthogonalization
Methods. Preconditioning. The Definition of Numerical Analysis.
Analytical
F.R. Gantmacher (1959). The Theory of Matrices Vol. 1, Chelsea, New
York.

Matrices and Operations on Matrices. The Algorithm of Gauss and Some of its
Applications. Linear Operators in an n-dimensional Vector Space. The Characteristic
Polynomial and the Minimum Polynomial of a Matrix. Functions of Matrices.
Equivalent Transformations of Polynomial Matrices; Analytic Theory of Elementary
Divisors. The Structure of a Linear Operator in an n-dimensional Space. Matrix
Equations. Linear Operators in a Unitary Space. Quadratic and Hermitian Forms.
L.A. Hageman and D.M. Young (1981). Applied Iterative Methods, Academic
Press, New York.

Background on Linear Algebra and Related Topics. Background on Basic Iterative
Methods. Polynomial Acceleration. Chebyshev Acceleration. An Adaptive Chebyshev
Procedure Using Special Norms. Adaptive Chebyshev Acceleration. Conjugate
Gradient Acceleration. Special Methods for Red/Black Partitionings. Adaptive Procedures
for the Successive Overrelaxation Method. The Use of Iterative Methods in the
Solution of Partial Differential Equations. Case Studies. The Nonsymmetrizable
Case.
I.S. Duff, A.M. Erisman, and J.K. Reid (1986). Direct Methods for Sparse
Matrices, Oxford University Press, New York.

Introduction. Sparse Matrices: Storage Schemes and Simple Operations. Gaussian
Elimination for Dense Matrices: The Algebraic Problem. Gaussian Elimination
for Dense Matrices: Numerical Considerations. Gaussian Elimination for Sparse
Matrices: An Introduction. Reduction to Block Triangular Form. Local Pivotal
Strategies for Sparse Matrices. Ordering Sparse Matrices to Special Forms. Implementing
Gaussian Elimination: Analyse with Numerical Values. Implementing
Gaussian Elimination with Symbolic Analyse. Partitioning, Matrix Modification,
and Tearing. Other Sparsity-Oriented Issues.
Y. Saad (1996). Iterative Methods for Sparse Linear Systems, PWS Publishing
Co., Boston.

Background in Linear Algebra. Discretization of PDEs. Sparse Matrices. Basic
Iterative Methods. Projection Methods. Krylov Subspace Methods, Part I. Krylov
Subspace Methods, Part II. Methods Related to the Normal Equations. Preconditioned
Iterations. Preconditioning Techniques. Parallel Implementations. Parallel
Preconditioners. Domain Decomposition Methods.
S. Van Huffel and J. Vandewalle (1991). The Total Least Squares Problem:
Computational Aspects and Analysis, SIAM Publications, Philadelphia,
PA.

Introduction. Basic Principles of the Total Least Squares Problem. Extensions of the
Basic Total Least Squares Problem. Direct Speed Improvement of the Total Least
Squares Computations. Iterative Speed Improvement for Solving Slowly Varying
Total Least Squares Problems. Algebraic Connections Between Total Least Squares
and Least Squares Problems. Sensitivity Analysis of Total Least Squares and Least
Squares Problems in the Presence of Errors in All Data. Statistical Properties of the
Total Least Squares Problem. Algebraic Connections Between Total Least Squares
Estimation and Classical Linear Regression in Multicollinearity Problems. Conclusions.
Eigenvalue Problems
B.N. Parlett (1980). The Symmetric Eigenvalue Problem, Prentice-Hall,
Englewood Cliffs, NJ.

Basic Facts about Self-Adjoint Matrices. Tasks, Obstacles, and Aids. Counting
Eigenvalues. Simple Vector Iterations. Deflation. Useful Orthogonal Matrices.
Tridiagonal Form. The QL and QR Algorithms. Jacobi Methods. Eigenvalue
Bounds. Approximation from a Subspace. Krylov Subspaces. Lanczos Algorithms.
Subspace Iteration. The General Linear Eigenvalue Problem.
High Performance
Edited Volumes
D.J. Rose and R.A. Willoughby, eds. (1972). Sparse Matrices and Their
Applications, Plenum Press, New York.
J.R. Bunch and D.J. Rose, eds. (1976). Sparse Matrix Computations,
Academic Press, New York.
I.S. Duff and G.W. Stewart, eds. (1979). Sparse Matrix Proceedings, 1978,
SIAM Publications, Philadelphia, PA.
I.S. Duff, ed. (1981). Sparse Matrices and Their Uses, Academic Press,
New York.
B. Kågström and A. Ruhe, eds. (1983). Matrix Pencils, Proc. Pite Havsbad,
1982, Lecture Notes in Mathematics 973, Springer-Verlag, New
York and Berlin.
J. Cullum and R.A. Willoughby, eds. (1986). Large Scale Eigenvalue Problems,
North-Holland, Amsterdam.
A. Wouk, ed. (1986). New Computing Environments: Parallel, Vector, and
Systolic, SIAM Publications, Philadelphia, PA.
M.H. Schultz, ed. (1988). Numerical Algorithms for Modern Parallel Computer
Architectures, IMA Volumes in Mathematics and Its Applications,
Number 13, Springer-Verlag, Berlin.
E.F. Deprettere, ed. (1988). SVD and Signal Processing, Elsevier, Amsterdam.
B.N. Datta, C.R. Johnson, M.A. Kaashoek, R. Plemmons, and E.D. Sontag,
eds. (1988). Linear Algebra in Signals, Systems, and Control, SIAM
Publications, Philadelphia, PA.
G.H. Golub and P. Van Dooren, eds. (1991). Numerical Linear Algebra,
Digital Signal Processing, and Parallel Algorithms, Springer-Verlag,
Berlin.
R. Vaccaro, ed. (1991). SVD and Signal Processing II: Algorithms, Analysis,
and Applications, Elsevier, Amsterdam.
R.J. Plemmons and C.D. Meyer, eds. (1993). Linear Algebra, Markov
Chains, and Queuing Models, Springer-Verlag, New York.
M.S. Moonen, G.H. Golub, and B.L.R. de Moor, eds. (1993). Linear
Algebra for Large Scale and Real-Time Applications, Kluwer, Dordrecht,
The Netherlands.
J.D. Brown, M.T. Chu, D.C. Ellison, and R.J. Plemmons, eds. (1994). Proceedings
of the Cornelius Lanczos International Centenary Conference,
SIAM Publications, Philadelphia, PA.
R.V. Patel, A.J. Laub, and P.M. Van Dooren, eds. (1994). Numerical
Linear Algebra Techniques for Systems and Control, IEEE Press, Piscataway,
New Jersey.
M. Moonen and B. De Moor, eds. (1995). SVD and Signal Processing III:
Algorithms, Analysis, and Applications, Elsevier, Amsterdam.
Chapter 1
Matrix Multiplication Problems
The proper study of matrix computations begins with the study of the
matrix-matrix multiplication problem. Although this problem is simple
mathematically it is very rich from the computational point of view. We
begin in §1.1 by looking at the several ways that the matrix multiplication
problem can be organized. The "language" of partitioned matrices
is established and used to characterize several linear algebraic "levels" of
computation.
If a matrix has structure, then it is usually possible to exploit it. For
example, a symmetric matrix can be stored in half the space as a general
matrix. A matrix-vector product that involves a matrix with many zero
entries may require much less time to execute than a full matrix times a
vector. These matters are discussed in §1.2.
In §1.3 block matrix notation is established. A block matrix is a matrix
with matrix entries. This concept is very important from the standpoint of
both theory and practice. On the theoretical side, block matrix notation
allows us to prove important matrix factorizations very succinctly. These
factorizations are the cornerstone of numerical linear algebra. From the
computational point of view, block algorithms are important because they
are rich in matrix multiplication, the operation of choice for many new high
performance computer architectures.
These new architectures require the algorithm designer to pay as much
attention to memory traffic as to the actual amount of arithmetic. This
aspect of scientific computation is illustrated in §1.4 where the critical is-
sues of vector pipeline computing are discussed: stride, vector length, the
number of vector loads and stores, and the level of vector re-use.
how these two styles of expression complement each other. Along the way
we pick up notation and acquaint the reader with the kind of thinking that
underpins the matrix computation area. The discussion revolves around
the matrix multiplication problem, a computation that can be organized in
several ways.
We denote the vector space of all m-by-n real matrices by R^{m x n}:

$$A \in \mathbb{R}^{m \times n} \iff A = (a_{ij}) = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix}, \qquad a_{ij} \in \mathbb{R}.$$

The matrix sum and the matrix product are defined entrywise:

$$C = A + B \implies c_{ij} = a_{ij} + b_{ij},$$

$$C = AB \implies c_{ij} = \sum_{k=1}^{p} a_{ik}b_{kj}.$$
The dot product of two n-vectors is the scalar

$$c = x^T y = \sum_{i=1}^{n} x_i y_i.$$

Algorithm 1.1.1 (Dot Product) If x, y ∈ R^n, then this algorithm computes
their dot product c = x^T y.

c = 0
for i = 1:n
    c = c + x(i)y(i)
end
The dot product of two n-vectors involves n multiplications and n additions.
It is an "O(n)" operation, meaning that the amount of work is linear in
the dimension. The saxpy computation is also an O(n) operation, but it
returns a vector instead of a scalar.
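To make the scalar-versus-vector distinction concrete, here is a minimal NumPy sketch of the two kernels (the function names and test values are illustrative assumptions, not part of the text):

import numpy as np

def dot(x, y):
    # Algorithm 1.1.1: accumulate c = x^T y with n multiplies and n adds.
    c = 0.0
    for xi, yi in zip(x, y):
        c += xi * yi
    return c

def saxpy(a, x, y):
    # y <- a*x + y, an O(n) operation that returns a vector, not a scalar.
    for i in range(len(y)):
        y[i] = a * x[i] + y[i]
    return y

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])
print(dot(x, y))                 # 32.0
print(saxpy(2.0, x, y.copy()))   # [ 6.  9. 12.]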
Next consider the update

y = Ax + y

where x ∈ R^n and y ∈ R^m are given. This generalized saxpy operation is
referred to as a gaxpy. A standard way that this computation proceeds is
to update the components one at a time:

$$y_i = \sum_{j=1}^{n} a_{ij}x_j + y_i, \qquad i = 1{:}m.$$
for i = 1:m
    for j = 1:n
        y(i) = A(i,j)x(j) + y(i)
    end
end
An alternative algorithm results if we regard Ax as a linear combination of
A's columns, e.g.,

$$\begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}\begin{bmatrix} 7 \\ 8 \end{bmatrix} = \begin{bmatrix} 1\cdot 7 + 2\cdot 8 \\ 3\cdot 7 + 4\cdot 8 \\ 5\cdot 7 + 6\cdot 8 \end{bmatrix} = 7\begin{bmatrix} 1 \\ 3 \\ 5 \end{bmatrix} + 8\begin{bmatrix} 2 \\ 4 \\ 6 \end{bmatrix} = \begin{bmatrix} 23 \\ 53 \\ 83 \end{bmatrix}.$$
for j = 1:n
    for i = 1:m
        y(i) = A(i,j)x(j) + y(i)
    end
end
Note that the inner loop in either gaxpy algorithm carries out a saxpy
operation. The column version was derived by rethinking what matrix-vector
multiplication "means" at the vector level, but it could also have
been obtained simply by interchanging the order of the loops in the row
version. In matrix computations, it is important to relate loop interchanges
to the underlying linear algebra.
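The two orderings compute the same thing with different memory access patterns; a small NumPy sketch (our own illustration, reusing the 3-by-2 example above):

import numpy as np

def gaxpy_row(A, x, y):
    # Row version: the inner loop sweeps across row i (a dot-product pattern).
    m, n = A.shape
    for i in range(m):
        for j in range(n):
            y[i] += A[i, j] * x[j]
    return y

def gaxpy_col(A, x, y):
    # Column version: loops interchanged; the inner loop is a saxpy that
    # adds the multiple x(j) of column j into y.
    m, n = A.shape
    for j in range(n):
        for i in range(m):
            y[i] += A[i, j] * x[j]
    return y

A = np.array([[1., 2.], [3., 4.], [5., 6.]])
x = np.array([7., 8.])
print(gaxpy_row(A, x, np.zeros(3)))   # [23. 53. 83.]
print(gaxpy_col(A, x, np.zeros(3)))   # same result, different access pattern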
If A ∈ R^{m x n}, then a row partitioning has the form

$$A = \begin{bmatrix} r_1^T \\ \vdots \\ r_m^T \end{bmatrix}, \qquad r_i \in \mathbb{R}^n. \tag{1.1.1}$$

Thus, for the 3-by-2 example above, r_1^T = [1 2], r_2^T = [3 4], and r_3^T = [5 6].
With the row partitioning (1.1.1) Algorithm 1.1.3 can be expressed as follows:

for i = 1:m
    y(i) = r_i^T x + y(i)
end
Alternatively, a matrix is a collection of column vectors:

$$A = [\,c_1, \ldots, c_n\,], \qquad c_j \in \mathbb{R}^m. \tag{1.1.2}$$

With (1.1.2) we see that Algorithm 1.1.4 is a saxpy procedure that accesses
A by columns:

for j = 1:n
    y = x(j)c_j + y
end

In this context appreciate y as a running vector sum that undergoes repeated
saxpy updates.
With the colon notation, A(k, :) designates the kth row of A,

$$A(k, :) = [\,a_{k1}, \ldots, a_{kn}\,].$$

The kth column is specified by

$$A(:,k) = \begin{bmatrix} a_{1k} \\ \vdots \\ a_{mk} \end{bmatrix}.$$

The two gaxpy variants then become

for i = 1:m
    y(i) = A(i, :)x + y(i)
end

and

for j = 1:n
    y = x(j)A(:,j) + y
end
respectively. With the colon notation we are able to suppress iteration
details. This frees us to think at the vector level and focus on larger com-
putational issues.
The outer product operation xy^T "looks funny" but is perfectly legal, e.g.,

$$\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}\begin{bmatrix} 4 & 5 \end{bmatrix} = \begin{bmatrix} 4 & 5 \\ 8 & 10 \\ 12 & 15 \end{bmatrix}.$$

This is because xy^T is the product of two "skinny" matrices and the number
of columns in the left matrix x equals the number of rows in the right matrix
y^T. The entries in the outer product update A = A + xy^T are prescribed by
for i = 1:m
    for j = 1:n
        A(i,j) = A(i,j) + x(i)y(j)
    end
end
The mission of the j-loop is to add a multiple of y^T to the ith row of A,
i.e.,
for i = 1:m
    A(i, :) = A(i, :) + x(i)y^T
end
On the other hand, if we make the i-loop the inner loop, then its task is to
add a multiple of x to the jth column of A:

for j = 1:n
    A(:,j) = A(:,j) + y(j)x
end
Note that both outer product algorithms amount to a set of saxpy updates.
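Both orderings are easy to express in NumPy; the following sketch (function names ours) shows that each produces A + xy^T:

import numpy as np

def outer_update_rows(A, x, y):
    # Row-oriented: add the multiple x(i) of y^T to row i of A.
    for i in range(A.shape[0]):
        A[i, :] += x[i] * y
    return A

def outer_update_cols(A, x, y):
    # Column-oriented: add the multiple y(j) of x to column j of A.
    for j in range(A.shape[1]):
        A[:, j] += y[j] * x
    return A

x = np.array([1., 2., 3.])
y = np.array([4., 5.])
print(outer_update_rows(np.zeros((3, 2)), x, y))   # equals x y^T
print(outer_update_cols(np.zeros((3, 2)), x, y))   # same matrix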
Finally, in the outer product version, the result is regarded as the sum of
outer products. We now turn to the matrix multiplication update

C = AB + C

where A ∈ R^{m x p}, B ∈ R^{p x n}, and C ∈ R^{m x n}. The starting point is the
familiar triply-nested loop algorithm:

for i = 1:m
    for j = 1:n
        for k = 1:p
            C(i,j) = A(i,k)B(k,j) + C(i,j)
        end
    end
end

This is the "ijk variant" because we identify the rows of C (and A) with i,
the columns of C (and B) with j, and the summation index with k.
We consider the update C = AB + C instead of just C = AB for two
reasons. We do not have to bother with C = 0 initializations and updates
of the form C = AB + C arise more frequently in practice.
The three loops in the matrix multiplication update can be arbitrarily
ordered giving 3! = 6 variations. Thus,
for j = 1:n
    for k = 1:p
        for i = 1:m
            C(i,j) = A(i,k)B(k,j) + C(i,j)
        end
    end
end
is the jki variant. Each of the six possibilities (ijk, jik, ikj, jki, kij,
kji) features an inner loop operation (dot product or saxpy) and has its
own pattern of data flow. For example, in the ijk variant, the inner loop
oversees a dot product that requires access to a row of A and a column of
B. The jki variant involves a saxpy that requires access to a column of C
and a column of A. These attributes are summarized in Table 1.1.1 along
with an interpretation of what is going on when the middle and inner loop
are considered together. Each variant involves the same amount of floating
point arithmetic.
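The equivalence of all 3! = 6 orderings is easy to check numerically; here is a small NumPy sketch (the driver function and test sizes are our own illustration):

import numpy as np
from itertools import permutations

def matmul_update(A, B, C, order="ijk"):
    # Run the triply nested update C = AB + C with the loops in the given
    # order; every ordering performs the same 2mnp flops.
    m, p = A.shape
    _, n = B.shape
    ranges = {"i": range(m), "j": range(n), "k": range(p)}
    for a in ranges[order[0]]:
        for b in ranges[order[1]]:
            for c in ranges[order[2]]:
                idx = dict(zip(order, (a, b, c)))
                i, j, k = idx["i"], idx["j"], idx["k"]
                C[i, j] += A[i, k] * B[k, j]
    return C

A, B = np.random.rand(4, 3), np.random.rand(3, 5)
for order in ["".join(q) for q in permutations("ijk")]:
    assert np.allclose(matmul_update(A, B, np.zeros((4, 5)), order), A @ B)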
This is the idea behind Algorithm 1.1.5. Using the colon notation we can
highlight this dot-product formulation:

for i = 1:m
    for j = 1:n
        C(i,j) = A(i, :)B(:,j) + C(i,j)
    end
end

If

$$A^T = [\,a_1, \ldots, a_m\,] \qquad \text{and} \qquad B = [\,b_1, \ldots, b_n\,],$$

then Algorithm 1.1.6 has this interpretation:

for i = 1:m
    for j = 1:n
        c_{ij} = a_i^T b_j + c_{ij}
    end
end
Note that the "mission" of the j·loop is to compute the ith row of the
update. To emphasize this we could write
fori= l:m-
cf = afB +c'f
end
where
c- [1]
is a row partitioning of C. To say the same thing with the colon notation
we write
fori= l:m
C(i, :) = A(i, :)B + C(i, :}
end
Either way we see that the inner two loops of the ijk variant define a
row-oriented gaxpy operation.
Now consider the column partitioning

$$C = [\,c_1, \ldots, c_n\,].$$

By comparing jth columns in C = AB + C we see that

$$c_j = \sum_{k=1}^{p} B(k,j)A(:,k) + c_j, \qquad j = 1{:}n.$$

These vector sums can be put together with a sequence of saxpy updates.
for j = 1:n
    for k = 1:p
        C(:,j) = A(:,k)B(k,j) + C(:,j)
    end
end
Note that the k-loop oversees a gaxpy operation:
for j = l:n
C(:,j) = AB(:,j) + C(:,j)
end
for k = 1:p
    for j = 1:n
        for i = 1:m
            C(i,j) = A(i,k)B(k,j) + C(i,j)
        end
    end
end
where

$$A = [\,a_1, \ldots, a_p\,] \qquad \text{and} \qquad B = \begin{bmatrix} b_1^T \\ \vdots \\ b_p^T \end{bmatrix} \tag{1.1.3}$$

with a_k ∈ R^m and b_k ∈ R^n. We therefore obtain

$$AB = \sum_{k=1}^{p} a_k b_k^T$$

where the a_k and b_k are defined by the partitionings in (1.1.3).
To illustrate a scalar-level verification, consider the claim (AB)^T = B^T A^T.
At the (i,j) level we have

$$[B^T A^T]_{ij} = \sum_{k=1}^{p} [B^T]_{ik}[A^T]_{kj} = \sum_{k=1}^{p} b_{ki}a_{jk} = [AB]_{ji}.$$

Scalar-level proofs such as this one are usually not very insightful. However,
they are sometimes the only way to proceed.
Problems
P1.1.1 Suppose A ∈ R^{n x n} and z ∈ R^r are given. Give a saxpy algorithm for computing
the first column of M = (A − z_1 I)···(A − z_r I).
P1.1.5 Formulate an outer product algorithm for the update C = AB^T + C where
A ∈ R^{m x r}, B ∈ R^{n x r}, and C ∈ R^{m x n}.
P1.1.6 Suppose we have real n-by-n matrices C, D, E, and F. Show how to compute
real n-by-n matrices A and B with just three real n-by-n matrix multiplications so that
(A + iB) = (C + iD)(E + iF). Hint: Compute W = (C + D)(E − F).
C.L. Lawson, R.J. Hanson, D.R. Kincaid, and F.T. Krogh (1979). "Basic Linear
Algebra Subprograms for FORTRAN Usage," ACM Trans. Math. Soft. 5, 308-323.
C.L. Lawson, R.J. Hanson, D.R. Kincaid, and F.T. Krogh (1979). "Algorithm 539,
Basic Linear Algebra Subprograms for FORTRAN Usage," ACM Trans. Math. Soft.
5, 324-325.
J.J. Dongarra, J. Du Croz, S. Hammarling, and R.J. Hanson (1988). "An Extended Set
of Fortran Basic Linear Algebra Subprograms," ACM Trans. Math. Soft. 14, 1-17.
J.J. Dongarra, J. Du Croz, S. Hammarling, and R.J. Hanson (1988). "Algorithm 656 An
Extended Set of Fortran Basic Linear Algebra Subprograms: Model Implementation
and Test Programs," ACM Trans. Math. Soft. 14, 18-32.
J.J. Dongarra, J. Du Croz, I.S. Duff, and S.J. Hammarling (1990). "A Set of Level 3
Basic Linear Algebra Subprograms," ACM Trans. Math. Soft. 16, 1-17.
J.J. Dongarra, J. Du Croz, I.S. Duff, and S.J. Hammarling (1990). "Algorithm 679. A
Set of Level 3 Basic Linear Algebra Subprograms: Model Implementation and Test
Programs," ACM Trans. Math. Soft. 16, 18-28.
X X X 0 0
X X X X 0
0 X X X X
0 0 X X X
0 0 0 X X
0 0 0 0 X
0 0 0 0 0
0 0 0 0 0
Consider the product of two upper triangular 3-by-3 matrices:

$$AB = \begin{bmatrix} a_{11}b_{11} & a_{11}b_{12} + a_{12}b_{22} & a_{11}b_{13} + a_{12}b_{23} + a_{13}b_{33} \\ 0 & a_{22}b_{22} & a_{22}b_{23} + a_{23}b_{33} \\ 0 & 0 & a_{33}b_{33} \end{bmatrix}.$$

It suggests that the product is upper triangular and that its upper triangular
entries are the result of abbreviated inner products. Indeed, since
a_{ik}b_{kj} = 0 whenever k < i or j < k we see that

$$c_{ij} = \sum_{k=i}^{j} a_{ik}b_{kj}$$

and so we obtain:
Algorithm 1.2.1 (Triangular Matrix Multiplication) If A, B ∈ R^{n x n}
are upper triangular, then this algorithm computes C = AB.
C = 0
for i = 1:n
    for j = i:n
        for k = i:j
            C(i,j) = A(i,k)B(k,j) + C(i,j)
end
end
end
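A direct NumPy rendering of Algorithm 1.2.1, with a check against the full product (the helper name and test sizes are ours):

import numpy as np

def tri_matmul(A, B):
    # Algorithm 1.2.1: C = AB for upper triangular A and B. The k-loop is
    # abbreviated to i:j, so C is upper triangular and ~n^3/3 flops suffice.
    n = A.shape[0]
    C = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            for k in range(i, j + 1):
                C[i, j] += A[i, k] * B[k, j]
    return C

A = np.triu(np.random.rand(5, 5))
B = np.triu(np.random.rand(5, 5))
assert np.allclose(tri_matmul(A, B), A @ B)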
To quantify the savings in this algorithm we need some tools for measuring
the amount of work.
1.2.4 Flops

Obviously, upper triangular matrix multiplication involves less arithmetic
than when the matrices are full. One way to quantify this is with the notion
of a flop. A flop¹ is a floating point operation. A dot product or saxpy
operation of length n involves 2n flops because there are n multiplications
and n adds in either of these vector operations.
The gaxpy y = Ax + y where A ∈ R^{m x n} involves 2mn flops as does an
m-by-n outer product update of the form A = A + xy^T.
The matrix multiply update C = AB + C where A ∈ R^{m x p}, B ∈ R^{p x n},
and C ∈ R^{m x n} involves 2mnp flops.
Flop counts are usually obtained by summing the amount of arithmetic
associated with the most deeply nested statements in an algorithm. For
matrix-matrix multiplication, this is the statement,

C(i,j) = A(i,k)B(k,j) + C(i,j)

which involves two flops and is executed mnp times as a simple loop accounting
indicates. Hence the conclusion that general matrix multiplication
requires 2mnp flops.
Now let us investigate the amount of work involved in Algorithm 1.2.1.
Note that c_{ij} (i ≤ j) requires 2(j − i + 1) flops. Using the heuristics

$$\sum_{q=1}^{p} q = \frac{p(p+1)}{2} \approx \frac{p^2}{2} \qquad \text{and} \qquad \sum_{q=1}^{p} q^2 = \frac{p^3}{3} + \frac{p^2}{2} + \frac{p}{6} \approx \frac{p^3}{3},$$

we find that triangular matrix multiplication requires one-sixth the number
of flops as full matrix multiplication:
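Carrying the count through explicitly:

$$\sum_{i=1}^{n}\sum_{j=i}^{n} 2(j-i+1) \;=\; \sum_{i=1}^{n}\sum_{q=1}^{n-i+1} 2q \;\approx\; \sum_{i=1}^{n} (n-i+1)^2 \;=\; \sum_{r=1}^{n} r^2 \;\approx\; \frac{n^3}{3}.$$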
We throw aW&y the low order tenus since their inclusion does not contribute
to what the Oop count "says." For example, an exact Oop count of Algo-
rithm 1.2.1 ~ thal precl&ely n 3 /3 + n 2 + 2n/3 Bops are involved. For
tu
tIn ~e fim edition of book - deftned,a flop to bel the amount of work ~ia&ed
witb au openailoo of \be form au • e&;J + CI(}G,Io1 , I.e., a~ po~ add, a floa&i.og
point multiply, aod 80Dl8 auba!7iptinc. Tblla, au "old flop~ iQ\101\.W two •a- flof&" lo
deft.Diog a flop to be & ~ floaUq point operaioD we are optiag Cor a more prec:IM
measure of &ritbmetlc complexity.
1.2. EXPLOITING STRUCTURE 19
large n (the typical situation of interest) we see that the exact 8op count
offers no insight beyond the n 3 /3 approximation.
Flop counting is a necessarily crude approach to the measuring of program
efficiency since it ignores subscripting, memory traffic, and the countless
other overheads associated with program execution. We must not infer
too much from a comparison of flop counts. We cannot conclude, for example,
that triangular matrix multiplication is six times faster than square
matrix multiplication. Flop counting is just a "quick and dirty" accounting
method that captures only one of the several dimensions of the efficiency
issue.
$$A(p{:}q, c) = \begin{bmatrix} a_{pc} \\ \vdots \\ a_{qc} \end{bmatrix} \in \mathbb{R}^{q-p+1}.$$
$$s = \sum_{i=1}^{n} x_i y_{n-i+1}.$$
Consider the 6-by-6 matrix with lower bandwidth p = 1 and upper bandwidth q = 2:

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & 0 & 0 & 0 \\ a_{21} & a_{22} & a_{23} & a_{24} & 0 & 0 \\ 0 & a_{32} & a_{33} & a_{34} & a_{35} & 0 \\ 0 & 0 & a_{43} & a_{44} & a_{45} & a_{46} \\ 0 & 0 & 0 & a_{54} & a_{55} & a_{56} \\ 0 & 0 & 0 & 0 & a_{65} & a_{66} \end{bmatrix};$$
then, if the matrix is stored band-by-column in A.band, the gaxpy y = Ax + y
proceeds as follows:

for j = 1:n
    ytop = max(1, j − q)
    ybot = min(n, j + p)
    atop = max(1, q + 2 − j)
    abot = atop + ybot − ytop
    y(ytop:ybot) = x(j)A.band(atop:abot, j) + y(ytop:ybot)
end
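Here is a 0-based NumPy sketch of this band gaxpy, assuming the packing implied by the loop above, i.e., a_{ij} sits in row q + i − j of the band array (the helper name, packing loop, and test sizes are our own illustration):

import numpy as np

def band_gaxpy(A_band, p, q, x, y):
    # y <- Ax + y where A is n-by-n with lower bandwidth p and upper
    # bandwidth q; A_band[q + i - j, j] holds a_ij (0-based indices).
    n = len(x)
    for j in range(n):
        ytop = max(0, j - q)
        ybot = min(n - 1, j + p)
        for i in range(ytop, ybot + 1):
            y[i] += A_band[q + i - j, j] * x[j]
    return y

# Pack a 6-by-6 matrix with p = 1, q = 2 and check against a dense gaxpy.
n, p, q = 6, 1, 2
A = np.triu(np.tril(np.random.rand(n, n), q), -p)
A_band = np.zeros((p + q + 1, n))
for j in range(n):
    for i in range(max(0, j - q), min(n, j + p + 1)):
        A_band[q + i - j, j] = A[i, j]
x = np.random.rand(n)
assert np.allclose(band_gaxpy(A_band, p, q, x, np.zeros(n)), A @ x)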
1.2.7 Symmetry

We say that A ∈ R^{n x n} is symmetric if A^T = A. Thus,

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 5 \\ 3 & 5 & 6 \end{bmatrix}$$

is symmetric.
If

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 5 \\ 3 & 5 & 6 \end{bmatrix},$$

then in a store-by-diagonal scheme we represent A with the vector

A.diag = [ 1 4 6 2 5 3 ].
In general, the entry a_{i,i+k} (the ith entry of the kth superdiagonal) is
housed in A.diag(i + t) where t = nk − k(k−1)/2. Thus, if D(A, k) denotes
the n-by-n matrix that agrees with A on its kth superdiagonal and is zero
everywhere else, then by symmetry

$$A = D(A,0) + \sum_{k=1}^{n-1}\left( D(A,k) + D(A,k)^T \right).$$
With this data structure the symmetric gaxpy y = Ax + y proceeds as follows:

for i = 1:n
    y(i) = A.diag(i)x(i) + y(i)
end
for k = 1:n−1
    t = nk − k(k−1)/2
    {y = D(A,k)x + y}
    for i = 1:n−k
        y(i) = A.diag(i + t)x(i + k) + y(i)
    end
    {y = D(A,k)^T x + y}
    for i = 1:n−k
        y(i + k) = A.diag(i + t)x(i) + y(i + k)
    end
end
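A runnable NumPy transcription of this store-by-diagonal gaxpy, checked on the 3-by-3 example above (function name ours):

import numpy as np

def diag_gaxpy(A_diag, x, y):
    # y <- Ax + y for symmetric A stored by diagonal: A_diag holds the main
    # diagonal, then the first superdiagonal, then the second, and so on.
    n = len(x)
    for i in range(n):                     # y = D(A,0)x + y
        y[i] += A_diag[i] * x[i]
    for k in range(1, n):
        t = n * k - k * (k - 1) // 2       # offset of the kth superdiagonal
        for i in range(n - k):             # y = D(A,k)x + y
            y[i] += A_diag[i + t] * x[i + k]
        for i in range(n - k):             # y = D(A,k)^T x + y
            y[i + k] += A_diag[i + t] * x[i]
    return y

A = np.array([[1., 2., 3.], [2., 4., 5.], [3., 5., 6.]])
A_diag = np.array([1., 4., 6., 2., 5., 3.])
x = np.array([1., 1., 1.])
assert np.allclose(diag_gaxpy(A_diag, x, np.zeros(3)), A @ x)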
Note that the inner loops oversee vector multiplications.
Suppose we want to overwrite B with the product AB where A, B ∈ R^{n x n}.
We cannot simply convert

C(1:n,1:n) = 0
for j = 1:n
    for k = 1:n
        C(:,j) = C(:,j) + A(:,k)B(k,j)
    end
end

to

for j = 1:n
    for k = 1:n
        B(:,j) = B(:,j) + A(:,k)B(k,j)
    end
end

because B(:,j) is needed throughout the entire k-loop. A linear workspace
is needed to hold the jth column of the product until it is "safe" to overwrite
B(:,j):
for j = 1:n
    w(1:n) = 0
    for k = 1:n
        w(:) = w(:) + A(:,k)B(k,j)
    end
    B(:,j) = w(:)
end
A linear workspace overhead is usually not important in a matrix compu-
tation that has a 2-dimensional array of the same order.
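The workspace idea translates directly into NumPy (names and test sizes are our own illustration):

import numpy as np

def inplace_premultiply(A, B):
    # Overwrite B with A@B using only a length-n work vector: column j of
    # the product accumulates in w before it is safe to overwrite B[:, j].
    n = B.shape[0]
    w = np.zeros(n)
    for j in range(B.shape[1]):
        w[:] = 0.0
        for k in range(n):
            w += A[:, k] * B[k, j]
        B[:, j] = w
    return B

A = np.random.rand(4, 4)
B = np.random.rand(4, 4)
expected = A @ B
assert np.allclose(inplace_premultiply(A, B), expected)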
Problems

P1.2.1 Give an algorithm that overwrites A with A² where A ∈ R^{n x n} is (a) upper
triangular and (b) square. Strive for a minimum workspace in each case.
P1.2.2 Suppose A ∈ R^{n x n} is upper Hessenberg and that scalars λ_1,...,λ_r are given.
Give a saxpy algorithm for computing the first column of M = (A − λ_1 I)···(A − λ_r I).
P1.2.3 Give a column saxpy algorithm for the n-by-n matrix multiplication problem
P1.2.6 Suppose X ∈ R^{n x p} and A ∈ R^{n x n}, with A symmetric and stored by diagonal.
Give an algorithm that computes Y = X^T AX and stores the result by diagonal. Use
separate arrays for A and Y.
P1.2.7 Suppose a ∈ R^n is given and that A ∈ R^{n x n} has the property that a_{ij} =
a_{|i−j|+1}. Give an algorithm that overwrites y with Ax + y where x, y ∈ R^n are given.
P1.2.8 Suppose a ∈ R^n is given and that A ∈ R^{n x n} has the property that a_{ij} =
a_{((i+j−1) mod n)+1}. Give an algorithm that overwrites y with Ax + y where x, y ∈ R^n
are given.
A to obtain

$$A = \begin{bmatrix} A_{11} & \cdots & A_{1r} \\ \vdots & & \vdots \\ A_{q1} & \cdots & A_{qr} \end{bmatrix}, \qquad A_{ij} \in \mathbb{R}^{m_i \times n_j}.$$

Block matrices combine just like matrices with scalar entries as long as
certain dimension requirements are met. For example, if

$$B = \begin{bmatrix} B_{11} & \cdots & B_{1r} \\ \vdots & & \vdots \\ B_{q1} & \cdots & B_{qr} \end{bmatrix}, \qquad B_{ij} \in \mathbb{R}^{m_i \times n_j},$$

is partitioned exactly like A, then

$$C = A + B = \begin{bmatrix} C_{11} & \cdots & C_{1r} \\ \vdots & & \vdots \\ C_{q1} & \cdots & C_{qr} \end{bmatrix}, \qquad C_{ij} = A_{ij} + B_{ij},$$

where C_{ij} is the submatrix of C with upper left entry in row μ + 1 and
column ν + 1, with μ = m_1 + ⋯ + m_{i−1} and ν = n_1 + ⋯ + n_{j−1}.
Block matrix multiplication behaves similarly. If A = [A_1, ..., A_s] is a
blocking of A by block columns with A_k ∈ R^{m x p_k} and

$$B = \begin{bmatrix} B_1 \\ \vdots \\ B_s \end{bmatrix}, \qquad B_k \in \mathbb{R}^{p_k \times n},$$

is the conformal blocking of B by block rows, then

$$AB = \sum_{k=1}^{s} A_k B_k.$$

Proof. We set s = 2 and leave the general s case to the reader. (See
P1.3.6.) For 1 ≤ i ≤ m and 1 ≤ j ≤ n we have

$$[AB]_{ij} = \sum_{k=1}^{p_1+p_2} a_{ik}b_{kj} = \sum_{k=1}^{p_1} a_{ik}b_{kj} + \sum_{k=p_1+1}^{p_1+p_2} a_{ik}b_{kj} = [A_1B_1]_{ij} + [A_2B_2]_{ij}. \qquad \square$$
The general result merges these observations. Suppose

$$A = \begin{bmatrix} A_{11} & \cdots & A_{1s} \\ \vdots & & \vdots \\ A_{q1} & \cdots & A_{qs} \end{bmatrix}, \qquad A_{\alpha\gamma} \in \mathbb{R}^{m_\alpha \times p_\gamma},$$

and

$$B = \begin{bmatrix} B_{11} & \cdots & B_{1r} \\ \vdots & & \vdots \\ B_{s1} & \cdots & B_{sr} \end{bmatrix}, \qquad B_{\gamma\beta} \in \mathbb{R}^{p_\gamma \times n_\beta}.$$

If C = AB is blocked conformably,

$$C = \begin{bmatrix} C_{11} & \cdots & C_{1r} \\ \vdots & & \vdots \\ C_{q1} & \cdots & C_{qr} \end{bmatrix}, \qquad C_{\alpha\beta} \in \mathbb{R}^{m_\alpha \times n_\beta},$$

then

$$C_{\alpha\beta} = \sum_{\gamma=1}^{s} A_{\alpha\gamma}B_{\gamma\beta}, \qquad \alpha = 1{:}q, \quad \beta = 1{:}r.$$
$$A(3{:}5, 1{:}2) = \begin{bmatrix} a_{31} & a_{32} \\ a_{41} & a_{42} \\ a_{51} & a_{52} \end{bmatrix}.$$
While on the subject of submatrices, recall from §1.1.8 that if i and j are
scalars, then A(i, :) designates the ith row of A and A(:,j) designates the
jth column of A.
Consider the gaxpy y = Ax + y where A ∈ R^{m x n}, x ∈ R^n, y ∈ R^m, and

$$A = \begin{bmatrix} A_1 \\ \vdots \\ A_q \end{bmatrix}, \quad A_i \in \mathbb{R}^{m_i \times n}, \qquad y = \begin{bmatrix} y_1 \\ \vdots \\ y_q \end{bmatrix}, \quad y_i \in \mathbb{R}^{m_i}.$$

We refer to A_i as the ith block row. If m.vec = (m_1, ..., m_q) is the vector
of block row "heights", then from

$$\begin{bmatrix} y_1 \\ \vdots \\ y_q \end{bmatrix} = \begin{bmatrix} A_1 \\ \vdots \\ A_q \end{bmatrix} x + \begin{bmatrix} y_1 \\ \vdots \\ y_q \end{bmatrix}, \qquad \text{i.e., } y_i = A_i x + y_i,$$

we obtain
last = 0
for i = 1:q
    first = last + 1
    last = first + m.vec(i) − 1                        (1.3.1)
    y(first:last) = A(first:last, :)x + y(first:last)
end
Each time through the loop an "ordinary" gaxpy is performed so Algorithms
1.1.3 and 1.1.4 apply.
Another way to block the gaxpy computation is to partition A and x as
follows:

$$A = [\,A_1, \ldots, A_r\,], \quad A_j \in \mathbb{R}^{m \times n_j}, \qquad x = \begin{bmatrix} x_1 \\ \vdots \\ x_r \end{bmatrix}, \quad x_j \in \mathbb{R}^{n_j}.$$

If n.vec = (n_1, ..., n_r) is the vector of block column widths, then from

$$y = [\,A_1, \ldots, A_r\,]\begin{bmatrix} x_1 \\ \vdots \\ x_r \end{bmatrix} + y = \sum_{j=1}^{r} A_j x_j + y$$

we obtain
last = 0
for j = 1:r
    first = last + 1
    last = first + n.vec(j) − 1                        (1.3.2)
    y = A(:, first:last)x(first:last) + y
end
Again, the gaxpy's performed each time through the loop can be carried
out with Algorithm 1.1.3 or 1.1.4.
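A compact NumPy version of the column-blocked gaxpy (1.3.2), with an arbitrary block-width vector (names and sizes are ours):

import numpy as np

def block_gaxpy_cols(A, x, y, n_vec):
    # (1.3.2): sweep through the block columns of A; each pass is an
    # ordinary gaxpy y = A_j x_j + y.
    last = 0
    for nj in n_vec:
        first = last
        last = first + nj
        y += A[:, first:last] @ x[first:last]
    return y

A = np.random.rand(5, 6)
x = np.random.rand(6)
y = block_gaxpy_cols(A, x, np.zeros(5), n_vec=[2, 3, 1])
assert np.allclose(y, A @ x)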
Next consider the n-by-n update C = AB + C with uniform block size
ℓ = n/N and the partitioning

$$[\,C_1, \ldots, C_N\,] = [\,A_1, \ldots, A_N\,]\begin{bmatrix} B_{11} & \cdots & B_{1N} \\ \vdots & & \vdots \\ B_{N1} & \cdots & B_{NN} \end{bmatrix} + [\,C_1, \ldots, C_N\,].$$
for β = 1:N
    j = (β−1)ℓ + 1:βℓ
    for α = 1:N
        i = (α−1)ℓ + 1:αℓ                              (1.3.4)
        C(:,j) = A(:,i)B(i,j) + C(:,j)
    end
end
Alternatively, with A = [A_1, ..., A_N] blocked by block columns and
B = [B_1; ...; B_N] conformably blocked by block rows,

$$C = \sum_{\gamma=1}^{N} A_\gamma B_\gamma + C$$

and so

for γ = 1:N
    k = (γ−1)ℓ + 1:γℓ
    C = A(:,k)B(k,:) + C                               (1.3.5)
end
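The block outer product form (1.3.5) is one line per block in NumPy; a quick sketch with an illustrative block size:

import numpy as np

def block_outer_matmul(A, B, C, ell):
    # (1.3.5): C = AB + C as a sum of block outer products, block size ell.
    n = A.shape[1]
    for k in range(0, n, ell):
        C += A[:, k:k + ell] @ B[k:k + ell, :]
    return C

A, B = np.random.rand(6, 6), np.random.rand(6, 6)
assert np.allclose(block_outer_matmul(A, B, np.zeros((6, 6)), 2), A @ B)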
Consider the complex matrix multiplication update

$$C_1 + iC_2 = (A_1 + iA_2)(B_1 + iB_2) + (C_1 + iC_2)$$

where all the matrices are real and i² = −1. Comparing the real and
imaginary parts we find

$$\begin{aligned} C_1 &= A_1B_1 - A_2B_2 + C_1 \\ C_2 &= A_1B_2 + A_2B_1 + C_2 \end{aligned}$$

and this can be expressed as follows:

$$\begin{bmatrix} C_1 \\ C_2 \end{bmatrix} = \begin{bmatrix} A_1 & -A_2 \\ A_2 & A_1 \end{bmatrix}\begin{bmatrix} B_1 \\ B_2 \end{bmatrix} + \begin{bmatrix} C_1 \\ C_2 \end{bmatrix}.$$

This suggests how real matrix software might be applied to solve complex
matrix problems. The only snag is that the explicit formation of the 2-by-2
block matrix requires each A_i to be stored twice.
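A quick NumPy check of the block form against a genuinely complex product (random data, our own illustration):

import numpy as np

# Real arithmetic only: form [C1; C2] = [A1 -A2; A2 A1][B1; B2].
A1, A2 = np.random.rand(3, 3), np.random.rand(3, 3)
B1, B2 = np.random.rand(3, 3), np.random.rand(3, 3)

big_A = np.block([[A1, -A2], [A2, A1]])   # note: A1 and A2 each appear twice
big_B = np.vstack([B1, B2])
C1, C2 = np.vsplit(big_A @ big_B, 2)

C = (A1 + 1j * A2) @ (B1 + 1j * B2)        # reference complex product
assert np.allclose(C1, C.real) and np.allclose(C2, C.imag)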
recur down to the n = 1 level. When the block size gets sufficiently small
(n ≤ n_min), it may be sensible to use conventional matrix multiplication
when finding the P_i. Here is the overall procedure (see the sketch below):
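A NumPy sketch of the strass recursion, assuming n is a power of 2: seven half-sized products are formed and the recursion falls back to conventional multiplication once n ≤ n_min (the NumPy fallback and the test sizes are implementation choices of ours):

import numpy as np

def strass(A, B, nmin=64):
    # Strassen's procedure: 7 recursive half-sized multiplies instead of 8.
    n = A.shape[0]
    if n <= nmin:
        return A @ B                        # conventional multiply at the base
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    P1 = strass(A11 + A22, B11 + B22, nmin)
    P2 = strass(A21 + A22, B11, nmin)
    P3 = strass(A11, B12 - B22, nmin)
    P4 = strass(A22, B21 - B11, nmin)
    P5 = strass(A11 + A12, B22, nmin)
    P6 = strass(A21 - A11, B11 + B12, nmin)
    P7 = strass(A12 - A22, B21 + B22, nmin)
    C11 = P1 + P4 - P5 + P7
    C12 = P3 + P5
    C21 = P2 + P4
    C22 = P1 - P2 + P3 + P6
    return np.block([[C11, C12], [C21, C22]])

A, B = np.random.rand(128, 128), np.random.rand(128, 128)
assert np.allclose(strass(A, B, nmin=16), A @ B)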
Example 1.3.1 If n = 1024 and n_min = 64, then strass involves (7/8)^{10−6} ≈ .6 the
arithmetic of the conventional algorithm.
Problems

P1.3.1 Generalize (1.3.3) so that it can handle the variable block-size problem covered
by Theorem 1.3.3.
P1.3.2 Generalize (1.3.4) and (1.3.5) so that they can handle the variable block-size
case.
P1.3.3 Adapt strass so that it can handle square matrix multiplication of any order.
Hint: If the "current" A has odd dimension, append a zero row and column.
P1.3.4 Prove that if

$$A = \begin{bmatrix} A_{11} & \cdots & A_{1r} \\ \vdots & & \vdots \\ A_{q1} & \cdots & A_{qr} \end{bmatrix}$$

is a blocking of the matrix A, then
P1.3.5 Suppose n is even and define the following function from R^n to R:

$$f(x) = x(1{:}2{:}n)^T x(2{:}2{:}n) = \sum_{i=1}^{n/2} x_{2i-1}x_{2i}.$$

(a) Show that if x, y ∈ R^n then

$$x^T y = \sum_{i=1}^{n/2} (x_{2i-1} + y_{2i})(x_{2i} + y_{2i-1}) - f(x) - f(y).$$

(b) Show how the identity in (a) can be used to compute the entries

$$c_{ij} = \sum_{k=1}^{n} a_{ik}b_{kj}$$

of C = AB with roughly half the number of multiplications.
P1.3.7 Use Lemmas 1.3.1 and 1.3.2 to prove Theorem 1.3.3. In particular, set
S. Winograd (1968). "A New Algorithm for Inner Product," IEEE Trans. Comp. C-17,
693-694.
V. Strassen (1969). "Gaussian Elimination is Not Optimal," Numer. Math. 13, 354-356.
V. Pan (1984). "How Can We Speed Up Matrix Multiplication?," SIAM Review 26,
393-416.

Many of these methods have dubious practical value. However, with the publication of

D. Bailey (1988). "Extra High Speed Matrix Multiplication on the Cray-2," SIAM J.
Sci. and Stat. Comp. 9, 603-607.

it is clear that the blanket dismissal of these fast procedures is unwise. The "stability"
of the Strassen algorithm is discussed in §2.4.10. See also

N.J. Higham (1990). "Exploiting Fast Matrix Multiplication within the Level 3 BLAS,"
ACM Trans. Math. Soft. 16, 352-368.
C.C. Douglas, M. Heroux, G. Slishman, and R.M. Smith (1994). "GEMMW: A Portable
Level 3 BLAS Winograd Variant of Strassen's Matrix-Matrix Multiply Algorithm,"
J. Comput. Phys. 110, 1-10.
[Fig. 1.4.1: a three-stage addition unit; the scalars x and y enter and the sum z emerges.]

and think of the addition unit as an assembly line with three "work stations".
The input scalars x and y proceed along the assembly line spending
one cycle at each of three stations. The sum z emerges after three cycles.
[Fig. 1.4.2: the pipelined addition unit in steady state; the stages Adjust Exponents, Add, and Normalize simultaneously process successive x_i, y_i pairs.]
Note that when a single, "free standing" addition is performed, only one of
the three stations is active during the computation.
Now consider a vector addition z = x + y. With pipelining, the x and y
vectors are streamed through the addition unit. Once the pipeline is filled
and steady state reached, a z_i is produced every cycle. In Fig. 1.4.2 we
depict what the pipeline might look like once this steady state is achieved.
In this case, vector speed is about three times scalar speed because the time
for an individual add is three cycles.
first = 1
while first ≤ n
    last = min{n, first + v_L − 1}
    Vector load x(first:last).
    Vector load y(first:last).
    Vector add: z(first:last) = x(first:last) + y(first:last).
    Vector store z(first:last).
    first = last + 1
end
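The same "stripmining" pattern in NumPy, where each slice stands in for one hardware vector operation (function name and vector length are illustrative):

import numpy as np

def stripmined_add(x, y, vL=64):
    # Process z = x + y in segments of at most vL, the hardware vector
    # length, mirroring the vector load/add/store loop above.
    n = len(x)
    z = np.empty(n)
    first = 0
    while first < n:
        last = min(n, first + vL)
        z[first:last] = x[first:last] + y[first:last]   # one vector op
        first = last
    return z

x, y = np.random.rand(1000), np.random.rand(1000)
assert np.allclose(stripmined_add(x, y, vL=64), x + y)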
(If the cycle time is given in seconds, then the rate is in flops per second.)
The asymptotic rate of performance is given by
for i = 1:m
    for j = 1:n
        for k = 1:p
            C(i,j) = A(i,k)B(k,j) + C(i,j)
        end
    end
end
This is the ijk variant and its innermost loop oversees a length-p dot product.
Thus, our performance model predicts the following cycle counts:

Variant   Cycles
ijk       mnp + mn·τ_dot(p/v_L)
jik       mnp + mn·τ_dot(p/v_L)
ikj       mnp + mp·τ_sax(n/v_L)
jki       mnp + np·τ_sax(m/v_L)
kij       mnp + mp·τ_sax(n/v_L)
kji       mnp + np·τ_sax(m/v_L)
Here is a table that specifies the A, B, and C strides associated with each
of these possibilities:
If y = Ax + y is computed with the conventional saxpy-based gaxpy

for j = 1:n
    y = A(:,j)x(j) + y
end

then

T_1 = n(τ_sax + n)

cycles are required.
In §1.2.7 we introduced the lower triangular storage scheme for sym-
metric matrices and obtained this version of the gaxpy:
for j = 1:n
    for i = 1:j−1
        y(i) = A.vec((i−1)n − i(i−1)/2 + j)x(j) + y(i)
    end
    for i = j:n
        y(i) = A.vec((j−1)n − j(j−1)/2 + i)x(j) + y(i)
    end
end
Notice that the first i-loop does not define a unit stride saxpy. If we assume
that a length n, nonunit stride saxpy is equivalent to n unit-length saxpys
(a worst case scenario), then this implementation involves
cycles.
In §1.2.8 we developed the store-by-diagonal version:

for i = 1:n
    y(i) = A.diag(i)x(i) + y(i)
end
for k = 1:n−1
    t = nk − k(k−1)/2
    {y = D(A,k)x + y}
    for i = 1:n−k
        y(i) = A.diag(i + t)x(i + k) + y(i)
    end
    {y = D(A,k)^T x + y}
    for i = 1:n−k
        y(i + k) = A.diag(i + t)x(i) + y(i + k)
    end
end
In this case both inner loops define a unit stride vector multiply (vm) and
our model of execution predicts

T_3 = n(2τ_vm + n)

cycles.
The example shows how the choice of data structure can affect the stride
attributes of an algorithm. Store by diagonal seems attractive because it
represents the matrix compactly and has unit stride. However, a careful
which-is-best analysis would depend upon the values of τ_sax and τ_vm and
the precise penalties for nonunit stride computation and excess storage.
The complexity of the situation would call for careful benchmarking.
for α = 1:m_1
    i = (α−1)v_L + 1:αv_L
    for β = 1:n_1
        j = (β−1)v_L + 1:βv_L
        A(i,j) = A(i,j) + x(i)y(j)^T
    end
end
Each column of the submatrix A(i,j) must be loaded, updated, and then
stored. Not forgetting to account for the vector touches associated with x
and y we see that approximately
vector touches are required. (Low order terms do not contribute to the
analysis.)
Now consider the gaxpy update y = Ax + y where y ∈ R^m, x ∈ R^n and
A ∈ R^{m x n}. Breaking this computation down into segments of length v_L
gives

for α = 1:m_1
    i = (α−1)v_L + 1:αv_L
    for β = 1:n_1
        j = (β−1)v_L + 1:βv_L
        y(i) = y(i) + A(i,j)x(j)
    end
end
Again, each column of submatrix A(i,j) must be read but the only writing
to memory involves subvectors of y. Thus, the number of vector touches
for an m-by-n gaxpy is
Partition B and C into blocks of ℓ contiguous columns,

$$B = [\,B_1, \ldots, B_N\,], \qquad C = [\,C_1, \ldots, C_N\,],$$

where we assume that n = ℓN. From the expansion

$$C_\alpha = AB_\alpha + C_\alpha = \sum_{k=1}^{n} A(:,k)B_\alpha(k,:) + C_\alpha$$
we obtain the following computational framework:
for α = 1:N
    Load B_α and C_α into cache.
    for k = 1:n
        Load A(:,k) into cache and update C_α:
        C_α = A(:,k)B_α(k,:) + C_α
    end
    Store C_α in main memory.
end

Note that if M is the cache size measured in floating point words, then we
must have

2nℓ + n ≤ M.                                           (1.4.1)
Let Γ_1 be the number of floating point numbers that flow (in either direction)
between cache and main memory. Note that every entry in B is loaded
into cache once, every entry in C is loaded into cache once and stored back
in main memory once, and every entry in A is loaded into cache N = n/ℓ
times. It follows that

$$\Gamma_1 = 3n^2 + \frac{n^3}{\ell}.$$
²The discussion which follows would also apply if the matrices were on a disk and
needed to be brought into main memory.
To make Γ_1 small we choose ℓ as large as (1.4.1) allows, i.e.,

$$\ell = \frac{M - n}{2n},$$

obtaining

$$\Gamma_1 \approx 3n^2 + \frac{2n^4}{M - n}.$$

(We use "≈" to emphasize the approximate nature of our analysis.) If cache
is large enough to house the entire B and C matrices with room left over
for a column of A, then ℓ = n and Γ_1 ≈ 4n². At the other extreme, if we
can just fit three columns in cache, then ℓ = 1 and Γ_1 ≈ n³.
Now let us regard A = (A_{αγ}), B = (B_{γβ}), and C = (C_{αβ}) as N-by-N
block matrices with uniform block size ℓ = n/N. With this blocking the
computation of

$$C_{\alpha\beta} = \sum_{\gamma=1}^{N} A_{\alpha\gamma}B_{\gamma\beta} + C_{\alpha\beta}, \qquad \alpha = 1{:}N, \quad \beta = 1{:}N,$$

proceeds as follows:
for α = 1:N
    for β = 1:N
        Load C_{αβ} into cache.
        for γ = 1:N
            Load A_{αγ} and B_{γβ} into cache.
            C_{αβ} = C_{αβ} + A_{αγ}B_{γβ}
        end
        Store C_{αβ} in main memory.
    end
end
In this case the main memory/cache traffic sums to

$$\Gamma_2 = 2n^2 + \frac{2n^3}{\ell}$$

because each entry in A and B is loaded N = n/ℓ times and each entry
in C is loaded once and stored once. We can minimize this by choosing ℓ
to be as large as possible subject to the constraint that three blocks fit in
cache, i.e., 3ℓ² ≤ M.
With ℓ = √(M/3) we obtain

$$\frac{\Gamma_1}{\Gamma_2} \approx \frac{3n^2 + 2n^4/(M-n)}{2n^2 + 2\sqrt{3}\,n^3/\sqrt{M}}.$$

The key quantity here is n²/M, the ratio of matrix size (in floating point
words) to cache size. As this ratio grows we find that

$$\frac{\Gamma_1}{\Gamma_2} \approx \frac{n}{\sqrt{3M}},$$

showing that the second blocking strategy is superior from the standpoint
of data motion to and from the cache. The fundamental conclusion to be
reached from all of this is that blocking affects data motion.
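A quick numerical check of the two traffic estimates (the cache size M and matrix sizes below are illustrative choices of ours):

import math

def gamma1(n, M):          # column-blocked: 3n^2 + 2n^4/(M - n)
    return 3 * n**2 + 2 * n**4 / (M - n)

def gamma2(n, M):          # square-blocked: 2n^2 + 2*sqrt(3)*n^3/sqrt(M)
    return 2 * n**2 + 2 * math.sqrt(3) * n**3 / math.sqrt(M)

M = 2**16                  # a 64K-word cache, chosen for illustration
for n in (500, 1000, 2000):
    # The ratio tracks the asymptotic estimate n / sqrt(3M).
    print(n, gamma1(n, M) / gamma2(n, M), n / math.sqrt(3 * M))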
A(i,j) with v((j − 1)m + i). For algorithms that access matrix data by
column this is a good arrangement since the column entries are contiguous
in memory.
Problems

P1.4.1 Consider the matrix product D = ABC where A ∈ R^{m x r}, B ∈ R^{r x n}, and
C ∈ R^{n x q}. Assume that all the matrices are stored by column and that the time required
to execute a unit-stride saxpy operation of length k is of the form t(k) = (L + k)μ where L
is a constant and μ is the cycle time. Based on this model, when is it more economical to
compute D as D = (AB)C instead of as D = A(BC)? Assume that all matrix multiplies
are done using the jki (gaxpy) algorithm.
P1.4.2 What is the total time spent in the jki variant on the saxpy operations assuming
that all the matrices are stored by column and that the time required to execute a unit-stride
saxpy operation of length k is of the form t(k) = (L + k)μ where L is a constant
and μ is the cycle time? Specialize the algorithm so that it efficiently handles the case
when A and B are n-by-n and upper triangular. Does it follow that the triangular
implementation is six times faster as the flop count suggests?
P1.4.3 Give an algorithm for computing C = A^T BA where A and B are n-by-n and
B is symmetric. Arrays should be accessed in unit stride fashion within all innermost
loops.
P1.4.4 Suppose A ∈ R^{m x n} is stored by column in A.col(1:mn). Assume that m = ℓ_1 M
and n = ℓ_2 N and that we regard A as an M-by-N block matrix with ℓ_1-by-ℓ_2 blocks.
Given i, j, α, and β that satisfy 1 ≤ i ≤ ℓ_1, 1 ≤ j ≤ ℓ_2, 1 ≤ α ≤ M, and 1 ≤ β ≤ N,
determine k so that A.col(k) houses the (i,j) entry of A_{αβ}. Give an algorithm that
overwrites A.col with A stored by block. How big of a work array is required?
J.J. Dongarra, F.G. Gustavson, and A. Karp (1984). "Implementing Linear Algebra
Algorithms for Dense Matrices on a Vector Pipeline Machine," SIAM Review 26,
91-112.
J.M. Ortega and R.G. Voigt (1985). "Solution of Partial Differential Equations on Vector
and Parallel Computers," SIAM Review 27, 149-240.

A very detailed look at matrix computations in hierarchical memory systems can be
found in

J.J. Dongarra and A. Hinds (1979). "Unrolling Loops in Fortran," Software Practice
and Experience 9, 219-229.
J.J. Dongarra and S. Eisenstat (1984). "Squeezing the Most Out of an Algorithm in
Cray Fortran," ACM Trans. Math. Soft. 10, 221-230.
B.L. Buzbee (1986). "A Strategy for Vectorization," Parallel Computing 3, 187-192.
K. Gallivan, W. Jalby, and U. Meier (1987). "The Use of BLAS3 in Linear Algebra on a
Parallel Processor with a Hierarchical Memory," SIAM J. Sci. and Stat. Comp. 8,
1079-1084.
J.J. Dongarra and D. Walker (1995). "Software Libraries for Linear Algebra Computations
on High Performance Computers," SIAM Review 37, 151-180.
Chapter 2
Matrix Analysis
I_n = [\,e_1, \ldots, e_n\,]

where e_k is the kth "canonical" vector:

$$e_k = (\underbrace{0, \ldots, 0}_{k-1}, 1, \underbrace{0, \ldots, 0}_{n-k})^T.$$

The canonical vectors arise frequently in matrix analysis and if their dimension
is ever ambiguous, we use superscripts, i.e., e_k^{(n)} ∈ R^n.
The identity

$$B^{-1} = A^{-1} - B^{-1}(B - A)A^{-1} \tag{2.1.3}$$

shows how the inverse changes if the matrix changes.
The Sherman-Morrison-Woodbury formula gives a convenient expression
for the inverse of (A + UV^T) where A ∈ R^{n x n} and U and V are n-by-k:

$$(A + UV^T)^{-1} = A^{-1} - A^{-1}U(I + V^T A^{-1}U)^{-1}V^T A^{-1}. \tag{2.1.4}$$
The determinant can be defined recursively through cofactor expansion:

$$\det(A) = \sum_{j=1}^{n} (-1)^{j+1} a_{1j}\det(A_{1j}).$$

Here, A_{1j} is an (n−1)-by-(n−1) matrix obtained by deleting the first row
and jth column of A. Useful properties of the determinant include
det(AB) = det(A)det(B), det(A^T) = det(A), det(cA) = cⁿdet(A) for
c ∈ R, and the fact that A is nonsingular if and only if det(A) ≠ 0.
2.1.5 Differentiation

Suppose α is a scalar and that A(α) is an m-by-n matrix with entries a_{ij}(α).
If a_{ij}(α) is a differentiable function of α for all i and j, then by Ȧ(α) we
mean the matrix

$$\dot{A}(\alpha) = \frac{d}{d\alpha}A(\alpha) = \left(\frac{d}{d\alpha}a_{ij}(\alpha)\right) = (\dot{a}_{ij}(\alpha)).$$
Problems

P2.1.1 Show that if A ∈ R^{m x n} has rank p, then there exists an X ∈ R^{m x p} and a
Y ∈ R^{n x p} such that A = XY^T, where rank(X) = rank(Y) = p.
P2.1.2 Suppose A(α) ∈ R^{m x r} and B(α) ∈ R^{r x n} are matrices whose entries are differentiable
functions of the scalar α. Show

$$\frac{d}{d\alpha}\left[ A(\alpha)B(\alpha) \right] = \dot{A}(\alpha)B(\alpha) + A(\alpha)\dot{B}(\alpha).$$

P2.1.4 Suppose A ∈ R^{n x n}, b ∈ R^n and that φ(x) = ½x^T Ax − x^T b. Show that the
gradient of φ is given by ∇φ(x) = ½(A^T + A)x − b.
P2.1.5 Suppose that both A and A + uv^T are nonsingular where A ∈ R^{n x n} and u, v ∈ R^n.
Show that if x solves (A + uv^T)x = b, then it also solves a perturbed right hand side
problem of the form Ax = b + αu. Give an expression for α in terms of A, b, and v.
S.J. Leon (1980). Linear Algebra with Applications, Macmillan, New York.
G. Strang (1993). Introduction to Linear Algebra, Wellesley-Cambridge Press, Wellesley,
MA.
D. Lay (1994). Linear Algebra and Its Applications, Addison-Wesley, Reading, MA.
C. Meyer (1997). A Course in Applied Linear Algebra, SIAM Publications, Philadelphia,
PA.

More advanced treatments include Gantmacher (1959), Horn and Johnson (1985, 1991),
and

A.S. Householder (1964). The Theory of Matrices in Numerical Analysis, Ginn (Blaisdell),
Boston.
M. Marcus and H. Minc (1964). A Survey of Matrix Theory and Matrix Inequalities,
Allyn and Bacon, Boston.
J.N. Franklin (1968). Matrix Theory, Prentice Hall, Englewood Cliffs, NJ.
R. Bellman (1970). Introduction to Matrix Analysis, Second Edition, McGraw-Hill, New
York.
P. Lancaster and M. Tismenetsky (1985). The Theory of Matrices, Second Edition,
Academic Press, New York.
J.M. Ortega (1987). Matrix Theory: A Second Course, Plenum Press, New York.
2.2.1 Definitions

A vector norm on R^n is a function f: R^n → R that satisfies the following
properties:

f(x) ≥ 0,  x ∈ R^n,  (f(x) = 0 iff x = 0)
f(x + y) ≤ f(x) + f(y),  x, y ∈ R^n
f(αx) = |α|f(x),  α ∈ R, x ∈ R^n

We denote such a function with a double bar notation: f(x) = ||x||. Subscripts
on the double bar are used to distinguish between various norms.
A useful class of vector norms are the p-norms defined by

$$\| x \|_p = \left( |x_1|^p + \cdots + |x_n|^p \right)^{1/p}, \qquad p \ge 1.$$

A unit vector with respect to the norm ||·|| is a vector x that satisfies
||x|| = 1.
2.2.2 Some Vector Norm Properties

A classic result concerning p-norms is the Hölder inequality:

$$|x^T y| \le \| x \|_p \| y \|_q, \qquad \frac{1}{p} + \frac{1}{q} = 1. \tag{2.2.2}$$

Example 2.2.1 If x = (1.234, .05674)^T and x̂ = (1.235, .05128)^T, then ||x − x̂||_∞/||x||_∞
≈ .0043. Note that x̂_1 has about three significant digits that are correct while
only one significant digit in x̂_2 is correct.
2.2.4 Convergence
We say that a sequence {x^{(k)}} of n-vectors converges to x if

$$\lim_{k \to \infty} \| x^{(k)} - x \| = 0.$$

Note that because of (2.2.4), convergence in the α-norm implies convergence
in the β-norm and vice versa.
Problems

P2.2.1 Show that if x ∈ R^n, then lim_{p→∞} ||x||_p = ||x||_∞.
P2.2.2 Prove the Cauchy-Schwarz inequality (2.2.3) by considering the inequality
0 ≤ (ax + by)^T(ax + by) for suitable scalars a and b.
P2.2.3 Verify that ||·||_1, ||·||_2, and ||·||_∞ are vector norms.
P2.2.4 Verify (2.2.5)-(2.2.7). When is equality achieved in each result?
P2.2.5 Show that in R^n, x^{(i)} → x if and only if x_k^{(i)} → x_k for k = 1:n.
P2.2.6 Show that any vector norm on R^n is uniformly continuous by verifying the
inequality | ||x|| − ||y|| | ≤ ||x − y||.
P2.2.7 Let ||·|| be a vector norm on R^m and assume A ∈ R^{m x n}. Show that if
rank(A) = n, then ||x||_A = ||Ax|| is a vector norm on R^n.
P2.2.8 Let x and y be in R^n and define ψ: R → R by ψ(α) = ||x − αy||_2. Show that
ψ is minimized when α = x^T y / y^T y.
P2.2.9 (a) Verify that ||z||_p = (|z_1|^p + ⋯ + |z_n|^p)^{1/p} is a vector norm on C^n. (b) Show
that if z ∈ C^n then ||z||_p ≤ ||Re(z)||_p + ||Im(z)||_p. (c) Find a constant c_n such
that c_n(||Re(z)||_2 + ||Im(z)||_2) ≤ ||z||_2 for all z ∈ C^n.
P2.2.10 Prove or disprove:

Although a vector norm is "just" a generalization of the absolute value concept, there
are some noteworthy subtleties:

J.D. Pryce (1984). "A New Measure of Relative Error for Vectors," SIAM J. Num.
Anal. 21, 202-21.
2.3.1 Definitions

Since R^{m x n} is isomorphic to R^{mn}, the definition of a matrix norm should be
equivalent to the definition of a vector norm. In particular, f: R^{m x n} → R
is a matrix norm if the following three properties hold:

f(A) ≥ 0,  A ∈ R^{m x n},  (f(A) = 0 iff A = 0)
f(A + B) ≤ f(A) + f(B),  A, B ∈ R^{m x n}
f(αA) = |α|f(A),  α ∈ R, A ∈ R^{m x n}

The most frequently used matrix norms are the Frobenius norm,

$$\| A \|_F = \sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n} |a_{ij}|^2} \tag{2.3.1}$$

and the p-norms

$$\| A \|_p = \sup_{x \ne 0} \frac{\| Ax \|_p}{\| x \|_p}, \tag{2.3.2}$$

built on the vector p-norms that we discussed in the previous section.
The verification that (2.3.1) and (2.3.2) are matrix norms is left as an
exercise. It is clear that ||A||_p is the p-norm of the largest vector obtained
by applying A to a unit p-norm vector:

$$\| A \|_p = \max_{\| x \|_p = 1} \| Ax \|_p.$$

Not every matrix norm satisfies the submultiplicative property

$$\| AB \| \le \| A \|\,\| B \|. \tag{2.3.4}$$

For example, if ||A||_Δ = max |a_{ij}| and

$$A = B = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix},$$

then ||AB||_Δ > ||A||_Δ ||B||_Δ. For the most part we work with norms that
satisfy (2.3.4).
The p-norms have the important property that for every A ∈ R^{m x n} and
x ∈ R^n we have ||Ax||_p ≤ ||A||_p ||x||_p. More generally, for any vector
norm ||·||_α on R^n and ||·||_β on R^m we have ||Ax||_β ≤ ||A||_{α,β} ||x||_α,
where ||A||_{α,β} is a matrix norm defined by

$$\| A \|_{\alpha,\beta} = \sup_{x \ne 0} \frac{\| Ax \|_\beta}{\| x \|_\alpha}. \tag{2.3.5}$$

We say that ||·||_{α,β} is subordinate to the vector norms ||·||_α and ||·||_β.
Since the set {x ∈ R^n : ||x||_α = 1} is compact and ||·||_β is continuous, it
follows that

$$\| A \|_{\alpha,\beta} = \max_{\| x \|_\alpha = 1} \| Ax \|_\beta. \tag{2.3.6}$$
$$\| A \|_2 \le \| A \|_F \le \sqrt{n}\,\| A \|_2 \tag{2.3.7}$$

$$\| A \|_1 = \max_{1 \le j \le n} \sum_{i=1}^{m} |a_{ij}| \tag{2.3.9}$$

$$\| A \|_\infty = \max_{1 \le i \le m} \sum_{j=1}^{n} |a_{ij}| \tag{2.3.10}$$

$$\frac{1}{\sqrt{m}}\,\| A \|_1 \le \| A \|_2 \le \sqrt{n}\,\| A \|_1 \tag{2.3.12}$$
(2.3.13)
The proofs of these relations are not hard and are left as exercises.
A sequence {A^{(k)}} ∈ R^{m x n} converges if lim_{k→∞} ||A^{(k)} − A|| = 0.
Choice of norm is irrelevant since all norms on R^{m x n} are equivalent.
Since

$$\left( \sum_{k=0}^{N} F^k \right)(I - F) = I - F^{N+1},$$

it follows that

$$(I - F)^{-1} = \lim_{N \to \infty} \sum_{k=0}^{N} F^k.$$

From this it is easy to show that

$$\| (I - F)^{-1} \|_p \le \sum_{k=0}^{\infty} \| F \|_p^k = \frac{1}{1 - \| F \|_p}. \qquad \square$$
Equation (2.1.3) says that (A + E)^{-1} − A^{-1} = −A^{-1}E(A + E)^{-1} and so
by taking norms we find

$$\| (A+E)^{-1} - A^{-1} \|_p \le \| A^{-1} \|_p \| E \|_p \| (A+E)^{-1} \|_p \le \frac{\| A^{-1} \|_p^2\, \| E \|_p}{1 - r}.$$
Problems

P2.3.9 Suppose u ∈ R^m and v ∈ R^n. Show that if E = uv^T then ||E||_F = ||E||_2 =
||u||_2 ||v||_2 and that ||E||_∞ ≤ ||u||_∞ ||v||_1.
P2.3.10 Suppose A ∈ R^{m x n}, y ∈ R^m, and 0 ≠ s ∈ R^n. Show that E = (y − As)s^T/s^T s
has the smallest 2-norm of all m-by-n matrices E that satisfy (A + E)s = y.

F.L. Bauer and C.T. Fike (1960). "Norms and Exclusion Theorems," Numer. Math. 2,
137-44.
L. Mirsky (1960). "Symmetric Gauge Functions and Unitarily Invariant Norms," Quart.
J. Math. 11, 50-59.
A.S. Householder (1964). The Theory of Matrices in Numerical Analysis, Dover Publications,
New York.
N.J. Higham (1992). "Estimating the Matrix p-Norm," Numer. Math. 62, 539-555.
together with zero. Notice that for a nonzero f ∈ F we have m ≤ |f| ≤ M
where

$$m = \beta^{L-1} \qquad \text{and} \qquad M = \beta^U(1 - \beta^{-t}). \tag{2.4.1}$$

As an example, if β = 2, t = 3, L = 0, and U = 2, then the non-negative
elements of F are represented by hash marks on the axis displayed in Fig.
2.4.1. Notice that the floating point numbers are not equally spaced.

[Fig. 2.4.1: the nonnegative elements of F for β = 2, t = 3, L = 0, U = 2, marked on the interval from 0 to 2.]

The unit roundoff is given by

$$u = \tfrac{1}{2}\beta^{1-t}. \tag{2.4.4}$$
Let a and b be any two floating point numbers and let "op" denote any
of the four arithmetic operations +, −, ×, ÷. If a op b ∈ G, then in our
model of floating point arithmetic we assume that the computed version of
(a op b) is given by fl(a op b). It follows that fl(a op b) = (a op b)(1 + ε)
with |ε| ≤ u. Thus,

$$|fl(a \text{ op } b) - (a \text{ op } b)| \le u\,|a \text{ op } b|,$$

showing that there is small relative error associated with individual arithmetic
operations¹. It is important to realize, however, that this is not
necessarily the case when a sequence of operations is involved.
Example 2.4.1 If β = 10, t = 3 floating point arithmetic is used, then it can be shown
that fl[fl(10^{−4} + 1) − 1] = 0, implying a relative error of 1. On the other hand, the
exact answer is given by fl[fl(10^{−4}) + fl(1 − 1)] = 10^{−4}. Floating point arithmetic is
not always associative.
2.4.3 Cancellation

Another important aspect of finite precision arithmetic is the phenomenon
of catastrophic cancellation. Roughly speaking, this term refers to the extreme
loss of correct significant digits when small numbers are additively
computed from large numbers. A well-known example taken from Forsythe,
Malcolm and Moler (1977, pp. 14-16) is the computation of e^{−a} via Taylor
series with a > 0. The roundoff error associated with this method is

¹There are important exceptions. On machines whose additive floating point operations
satisfy fl(a ± b) = (1 + ε_1)a ± (1 + ε_2)b where |ε_1|, |ε_2| ≤ u, the inequality
|fl(a ± b) − (a ± b)| ≤ u|a ± b| need not hold.
62 CHAPTER 2. MATRIX ANALYSIS
approximately u times the largest partial 8UJIL For large a, this error can
actually be larger than the exact exponential and there will be 110 correct
digits in the answer no matter how many terms in the series are summed.
On the other hand, if enough terms in the Taylor series for r!' are added and
the result reciprocated, then an estimate of e-a to full preciai.on is attained.
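The phenomenon is easy to reproduce; here is a small Python sketch (the function name, a = 30, and the term count are illustrative choices of ours):

import math

def exp_neg_taylor(a, nterms=100):
    # Sum the Taylor series for e^{-a} directly: the large, alternating
    # partial sums wipe out the correct digits of the tiny answer.
    s, term = 1.0, 1.0
    for k in range(1, nterms):
        term *= -a / k
        s += term
    return s

a = 30.0
print(exp_neg_taylor(a))          # garbage: roundoff swamps the true value
print(1.0 / exp_neg_taylor(-a))   # summing e^{a} and reciprocating is fine
print(math.exp(-a))               # reference value, about 9.36e-14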
for all i and j. A better way to say the same thing results if we adopt two
conventions. If A and B are in R^{m x n}, then

B = |A|   means   b_{ij} = |a_{ij}|,  i = 1:m, j = 1:n,

and

B ≤ A   means   b_{ij} ≤ a_{ij},  i = 1:m, j = 1:n.

With these conventions the rounding result becomes |fl(A) − A| ≤ u|A|.
A relation such as this can be easily turned into a norm inequality, e.g.,
||fl(A) − A||_1 ≤ u||A||_1. However, when quantifying the rounding errors
in a matrix manipulation, the absolute value notation can be a lot more
informative because it provides a comment on each (i,j) entry.
s = 0
for k = 1:n
    s = s + x(k)y(k)                                    (2.4.7)
end

The computed result then satisfies

$$fl(x^T y) = \sum_{k=1}^{n} x_k y_k (1 + \gamma_k)$$

where

$$(1 + \gamma_k) = (1 + \delta_k)\prod_{j=k}^{n}(1 + \varepsilon_j)$$

with |δ_k|, |ε_j| ≤ u.
Lemma 2.4.1 If (1 + α) = Π_{k=1}^{n}(1 + α_k) where |α_k| ≤ u and nu ≤ .01, then

|α| ≤ 1.01nu.

Proof. See Higham (1996, p. 75). □

Applying this result to (2.4.9) under the "reasonable" assumption nu ≤ .01
gives

$$|fl(x^T y) - x^T y| \le 1.01\,nu\,|x|^T|y|. \tag{2.4.10}$$

Notice that if |x^T y| ≪ |x|^T|y|, then the relative error in fl(x^T y) may not
be small.
Analogous roundoff results hold for other basic manipulations, e.g.,

$$fl(\alpha A) = \alpha A + E, \qquad |E| \le u|\alpha A|, \tag{2.4.14}$$

and

$$fl(A + B) = (A + B) + E, \qquad |E| \le u|A + B|. \tag{2.4.15}$$
As a consequence of these two results, it is easy to verify that computed
saxpy's and outer product updates satisfy componentwise bounds of the
same form. For matrix multiplication, a dot product based computation
yields

$$fl(AB) = AB + E, \qquad |E| \le nu\,|A|\,|B| + O(u^2). \tag{2.4.18}$$

The same result applies if a gaxpy or outer product based procedure is used.
Notice that matrix multiplication does not necessarily give small relative
error since |AB| may be much smaller than |A||B|, e.g.,

$$\begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ -.99 & 0 \end{bmatrix} = \begin{bmatrix} .01 & 0 \\ 0 & 0 \end{bmatrix}.$$
It is easy to obtain norm bounds from the roundoff results developed thus
far. If we look at the 1-norm error in floating point matrix multiplication,
then it is easy to show from (2.4.18) that

$$\| fl(AB) - AB \|_1 \le nu\,\| A \|_1 \| B \|_1 + O(u^2).$$

For example, in the 2-by-2 upper triangular case a calculation shows that

$$fl(AB) = \begin{bmatrix} a_{11}b_{11}(1+\epsilon_1) & \left( a_{11}b_{12}(1+\epsilon_2) + a_{12}b_{22}(1+\epsilon_3) \right)(1+\epsilon_4) \\ 0 & a_{22}b_{22}(1+\epsilon_5) \end{bmatrix}$$
where the |ε_i| ≤ u. It follows that fl(AB) = ÂB̂ exactly, with

$$\hat{A} = \begin{bmatrix} a_{11} & a_{12}(1+\epsilon_3)(1+\epsilon_4) \\ 0 & a_{22}(1+\epsilon_5) \end{bmatrix}$$

and

$$\hat{B} = \begin{bmatrix} b_{11}(1+\epsilon_1) & b_{12}(1+\epsilon_2)(1+\epsilon_4) \\ 0 & b_{22} \end{bmatrix},$$

i.e., the computed product is the exact product of slightly perturbed matrices.
Now suppose

$$A = B = \begin{bmatrix} .99 & .0010 \\ .0010 & .99 \end{bmatrix}$$
and that we execute Algorithm 1.3.1 using 2-digit floating point arithmetic.
Among other things, the following quantities are computed:
the small off-diagonal elements is not lost. Indeed, for the above A and B
a conventional matrix multiply gives c_12 = .0020.
Failure to produce a componentwise accurate Ĉ can be a serious shortcoming
in some applications. For example, in Markov processes the a_{ij},
b_{ij}, and c_{ij} are transition probabilities and are therefore nonnegative. It
may be critical to compute c_{ij} accurately if it reflects a particularly important
probability in the modeled phenomenon. Note that if A ≥ 0 and
B ≥ 0, then conventional matrix multiplication produces a product Ĉ that
has small componentwise relative error:

$$|\hat{C} - AB| \le nu\,|A|\,|B| + O(u^2) = nu\,|AB| + O(u^2).$$

This follows from (2.4.18). Because we cannot say the same for the Strassen
approach, we conclude that Algorithm 1.3.1 is not attractive for certain
nonnegative matrix multiplication problems if relatively accurate ĉ_{ij} are
required.
Extrapolating from this discussion we reach two fairly obvious but important
conclusions:

• Different methods for computing the same quantity can produce substantially
different results.
• Whether or not an algorithm produces satisfactory results depends
upon the type of problem solved and the goals of the user.

These observations are clarified in subsequent chapters and are intimately
related to the concepts of algorithm stability and problem condition.
Problems

P2.4.1 Show that if (2.4.7) is applied with y = x, then fl(x^T x) = x^T x(1 + α) where
|α| ≤ nu + O(u²).
P2.4.2 Prove (2.4.3).
P2.4.3 Show that if E ∈ R^{m x n} with m ≥ n, then || |E| ||_2 ≤ √n ||E||_2. This result is
useful when deriving norm bounds from absolute value bounds.
P2.4.4 Assume the existence of a square root function satisfying fl(√x) = √x(1 + ε)
with |ε| ≤ u. Give an algorithm for computing ||x||_2 and bound the rounding errors.
P2.4.5 Suppose A and B are n-by-n upper triangular floating point matrices. If C =
fl(AB) is computed using one of the conventional §1.1 algorithms, does it follow that
C = ÃB̃ where Ã and B̃ are close to A and B?
P2.4.6 Suppose A and B are n-by-n floating point matrices and that A is nonsingular
with || |A^{-1}||A| ||_∞ = τ. Show that if Ĉ = fl(AB) is obtained using any of the
algorithms in §1.1, then there exists a B̂ so Ĉ = AB̂ and ||B̂ − B||_∞ ≤ nuτ||B||_∞ +
O(u²).
P2.4.7 Prove (2.4.18).
T.E. Hull and J.R. Swenson (1966). "Tests of Probabilistic Models for Propagation of
Roundoff Errors," Comm. ACM 9, 108-13.
J. Larson and A. Sameh (1978). "Efficient Calculation of the Effects of Roundoff Errors,"
ACM Trans. Math. Soft. 4, 228-36.
W. Miller and D. Spooner (1978). "Software for Roundoff Analysis, II," ACM Trans.
Math. Soft. 4.
J.M. Yohe (1979). "Software for Interval Arithmetic: A Reasonably Portable Package,"
ACM Trans. Math. Soft. 5.

Anyone engaged in serious software development needs a thorough understanding of
floating point arithmetic. A good way to begin acquiring knowledge in this direction is
to read about the IEEE floating point standard in

D. Goldberg (1991). "What Every Computer Scientist Should Know About Floating
Point Arithmetic," ACM Surveys 23, 5-48.

See also

R.P. Brent (1978). "A Fortran Multiple Precision Arithmetic Package," ACM Trans.
Math. Soft. 4, 57-70.
R.P. Brent (1978). "Algorithm 524: MP, a Fortran Multiple Precision Arithmetic Package,"
ACM Trans. Math. Soft. 4, 71-81.
J.W. Demmel (1984). "Underflow and the Reliability of Numerical Software," SIAM J.
Sci. and Stat. Comp. 5, 887-919.
U.W. Kulisch and W.L. Miranker (1986). "The Arithmetic of the Digital Computer,"
SIAM Review 28, 1-40.
W.J. Cody (1988). "ALGORITHM 665 MACHAR: A Subroutine to Dynamically Determine
Machine Parameters," ACM Trans. Math. Soft. 14, 303-311.
D.H. Bailey, H.D. Simon, J.T. Barton, M.J. Fouts (1989). "Floating Point Arithmetic
in Future Supercomputers," Int'l J. Supercomputing Appl. 3.
D.H. Bailey (1993). "Algorithm 719: Multiprecision Translation and Execution of FORTRAN
Programs," ACM Trans. Math. Soft. 19, 288-310.

The subtleties associated with the development of high-quality software, even for "simple"
problems, are immense. A good example is the design of a subroutine to compute
2-norms:

J.M. Blue (1978). "A Portable FORTRAN Program to Find the Euclidean Norm of a
Vector," ACM Trans. Math. Soft. 4, 15-23.
Ft.- IY1 ~ ol ibe S t - elpnllm aud cxller .,... nn.r ...._ pnxllld--
R.P. Brent (1970). "Error Analysis of Algorithms for Matrix Multiplication and Triangular Decomposition Using Winograd's Identity," Numer. Math. 16, 145-156.
W. Miller (1975). "Computational Complexity and Numerical Stability," SIAM J. Computing 4, 97-107.
N.J. Higham (1992). "Stability of a Method for Multiplying Complex Matrices with Three Real Matrix Multiplications," SIAM J. Matrix Anal. Appl. 13, 681-687.
J.W. Demmel and N.J. Higham (1992). "Stability of Block Algorithms with Fast Level-3 BLAS," ACM Trans. Math. Soft. 18, 274-291.
2.5 Orthogonality and the SVD

2.5.1 Orthogonality

A set of vectors {x_1,...,x_k} in R^m is orthogonal if x_i^T x_j = 0 whenever i != j and orthonormal if x_i^T x_j = delta_ij. Intuitively, orthogonal vectors are maximally independent for they point in totally different directions.

A collection of subspaces S_1,...,S_p in R^m is mutually orthogonal if x^T y = 0 whenever x in S_i and y in S_j for i != j. The orthogonal complement of a subspace S in R^m is defined by

    S^perp = { y in R^m : y^T x = 0 for all x in S }

and it is not hard to show that ran(A)^perp = null(A^T). The vectors v_1,...,v_k form an orthonormal basis for a subspace S in R^m if they are orthonormal and span S.
A matrix Q in R^{m x m} is said to be orthogonal if Q^T Q = I. If Q = [q_1,...,q_m] is orthogonal, then the q_i form an orthonormal basis for R^m. It is always possible to extend a set of orthonormal vectors to a full orthonormal basis {v_1,...,v_m} for R^m. Orthogonal transformations leave the Frobenius norm and the 2-norm invariant: if Q and Z are orthogonal, then

    || QAZ ||_F = || A ||_F    (2.5.1)

and

    || QAZ ||_2 = || A ||_2.    (2.5.2)
Theorem 2.5.2 (Singular Value Decomposition) If A is a real m-by-n matrix, then there exist orthogonal matrices

    U = [u_1,...,u_m] in R^{m x m}   and   V = [v_1,...,v_n] in R^{n x n}

such that

    U^T AV = diag(sigma_1,...,sigma_p) in R^{m x n},   p = min{m, n},

where sigma_1 >= sigma_2 >= ... >= sigma_p >= 0.

Proof. Let x in R^n and y in R^m be unit 2-norm vectors that satisfy Ax = sigma*y with sigma = || A ||_2. From Theorem 2.5.1 there exist V_2 in R^{n x (n-1)} and U_2 in R^{m x (m-1)} so V = [x V_2] in R^{n x n} and U = [y U_2] in R^{m x m} are orthogonal. It is not hard to show that U^T AV has the following structure:

    U^T AV = [ sigma  w^T ]
             [   0     B  ]  =  A_1.

Since || A_1 [sigma; w] ||_2^2 >= (sigma^2 + w^T w)^2, we have || A_1 ||_2^2 >= sigma^2 + w^T w. But sigma^2 = || A ||_2^2 = || A_1 ||_2^2, and so w = 0. An obvious induction argument completes the proof. □
The sigma_i are the singular values of A and the vectors u_i and v_i are the ith left singular vector and the ith right singular vector respectively. It is easy to verify by comparing columns in the equations AV = U*Sigma and A^T U = V*Sigma^T that

    Av_i = sigma_i u_i   and   A^T u_i = sigma_i v_i,   i = 1:min{m, n}.
2.5. 0KI"HOGONALJTY AND THE SVD 71
    A = [  .96  1.72 ]  =  U Sigma V^T  =  [ .6  -.8 ] [ 3  0 ] [ .8   .6 ]^T
        [ 2.28   .96 ]                     [ .8   .6 ] [ 0  1 ] [ .6  -.8 ]
The SVD reveals a great deal about the structure of a matrix. If the SVD of A is given by Theorem 2.5.2, and we define r by

    sigma_1 >= ... >= sigma_r > sigma_{r+1} = ... = sigma_p = 0,

then

    rank(A) = r    (2.5.3)
    null(A) = span{v_{r+1},...,v_n}    (2.5.4)
    ran(A)  = span{u_1,...,u_r},    (2.5.5)

and, in the case m >= n,

    min_{x != 0}  || Ax ||_2 / || x ||_2  =  sigma_n.    (2.5.9)
    A = U_1 Sigma_1 V^T

where

    U_1 = U(:, 1:n) = [ u_1,...,u_n ] in R^{m x n}

and

    Sigma_1 = Sigma(1:n, 1:n) = diag(sigma_1,...,sigma_n) in R^{n x n}.

We refer to this much-used, trimmed down version of the SVD as the thin SVD.
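As a concrete illustration (a Python/NumPy sketch added here, not part of the original text), the full and thin SVDs differ only in the shape of U; NumPy's full_matrices flag switches between them.

    import numpy as np

    A = np.array([[0.96, 1.72],
                  [2.28, 0.96],
                  [0.00, 0.00]])   # a 3-by-2 example, so m > n

    # Full SVD: U is m-by-m, a complete orthonormal basis for R^m.
    U, s, Vt = np.linalg.svd(A, full_matrices=True)    # U: 3x3

    # Thin SVD: U1 = U(:,1:n) is m-by-n, Sigma1 is n-by-n.
    U1, s1, Vt1 = np.linalg.svd(A, full_matrices=False)  # U1: 3x2

    # Both reproduce A.
    assert np.allclose(U1 @ np.diag(s1) @ Vt1, A)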
Theorem 2.5.3 Let the SVD of A in R^{m x n} be given by Theorem 2.5.2. If k < r = rank(A) and

    A_k = sum_{i=1}^{k} sigma_i u_i v_i^T,    (2.5.10)

then

    min_{rank(B)=k} || A - B ||_2 = || A - A_k ||_2 = sigma_{k+1}.    (2.5.11)
Proof. Since U^T A_k V = diag(sigma_1,...,sigma_k, 0,...,0) it follows that rank(A_k) = k and that U^T(A - A_k)V = diag(0,...,0, sigma_{k+1},...,sigma_p) and so || A - A_k ||_2 = sigma_{k+1}.

Now suppose rank(B) = k for some B in R^{m x n}. It follows that we can find orthonormal vectors x_1,...,x_{n-k} so null(B) = span{x_1,...,x_{n-k}}. A dimension argument shows that

    span{x_1,...,x_{n-k}} ∩ span{v_1,...,v_{k+1}} != {0}.

Let z be a unit 2-norm vector in this intersection. Since Bz = 0 and

    Az = sum_{i=1}^{k+1} sigma_i (v_i^T z) u_i,

we have

    || A - B ||_2^2 >= || (A - B)z ||_2^2 = || Az ||_2^2 = sum_{i=1}^{k+1} sigma_i^2 (v_i^T z)^2 >= sigma_{k+1}^2,

completing the proof. □
Theorem 2.5.3 says that the smallest singular value of A is the 2-norm
distance of A to the set of all rank-deficient matrices. It also follows that
the set of full rank matrices in Rmxn is both open and dense.
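A short sketch (Python/NumPy, added for illustration) of Theorem 2.5.3 in action: truncating the SVD yields the nearest rank-k matrix, and the 2-norm error equals sigma_{k+1}.

    import numpy as np

    def nearest_rank_k(A, k):
        # A_k = sum_{i=1}^{k} sigma_i u_i v_i^T  -- SVD truncation
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

    A = np.random.randn(6, 4)
    k = 2
    Ak = nearest_rank_k(A, k)
    s = np.linalg.svd(A, compute_uv=False)
    # || A - A_k ||_2 = sigma_{k+1}  (s[k] in 0-based indexing)
    assert np.isclose(np.linalg.norm(A - Ak, 2), s[k])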
Finally, if r_eps = rank(A, eps), then

    sigma_1 >= ... >= sigma_{r_eps} > eps >= sigma_{r_eps + 1} >= ... >= sigma_p,    p = min{m, n}.

We have more to say about the numerical rank issue in Sections 5.5 and 12.2.
Problems

P2.5.1 Show that if S is real and S^T = -S, then I - S is nonsingular and the matrix (I - S)^{-1}(I + S) is orthogonal. This is known as the Cayley transform of S.
is orthogonal.

P2.5.4 Establish properties (2.5.3)-(2.5.5).

P2.5.6 For the 2-by-2 matrix A = [ w  x ; y  z ], derive expressions for sigma_max(A) and sigma_min(A) that are functions of w, x, y, and z.

P2.5.7 Show that any matrix in R^{m x n} is the limit of a sequence of full rank matrices.

P2.5.8 Show that if A in R^{m x n} has rank n, then || A(A^T A)^{-1} A^T ||_2 = 1.

P2.5.9 What is the nearest rank-one matrix to a given 2-by-2 matrix A in the Frobenius norm?
Forsythe and Moler (1967) offer a good account of the SVD's role in the analysis of the Ax = b problem. Their proof of the decomposition is more traditional than ours in that it makes use of the eigenvalue theory for symmetric matrices. Historical SVD references include

I.C. Gohberg and M.G. Krein (1969). Introduction to the Theory of Linear Non-Self-Adjoint Operators, Amer. Math. Soc., Providence, R.I.
F. Smithies (1970). Integral Equations, Cambridge University Press, Cambridge.

Reducing the rank of a matrix as in Theorem 2.5.3 when the perturbing matrix is constrained is discussed in

J.W. Demmel (1987). "The Smallest Perturbation of a Submatrix which Lowers the Rank and Constrained Total Least Squares Problems," SIAM J. Numer. Anal. 24, 199-206.
2.6 Projections and the CS Decomposition
where P_i is the orthogonal projection onto S_i. The distance between a pair of subspaces can be characterized in terms of the blocks of a certain orthogonal matrix.

Theorem 2.6.1 Suppose

    W = [ W_1  W_2 ]        Z = [ Z_1  Z_2 ]
          k    n-k                k    n-k

are n-by-n orthogonal matrices. If S_1 = ran(W_1) and S_2 = ran(Z_1), then

    dist(S_1, S_2) = || W_1^T Z_2 ||_2 = || Z_1^T W_2 ||_2.

Proof. With Q = W^T Z partitioned so that Q_11 = W_1^T Z_1 is k-by-k, the orthonormality of the columns of Q implies

    || Q_12 ||_2^2 = 1 - sigma_min(Q_11)^2,

and therefore || Q_21 ||_2 = || Q_12 ||_2. □
Note that if S_1 and S_2 are subspaces in R^n with the same dimension, then their distance is determined by the singular values of W_1^T Z_1. The CS decomposition makes this precise. In its "thin" version: if Q_1 and Q_2 are the blocks of a matrix with orthonormal columns, then there exist orthogonal U_1, U_2, and V_1 such that

    [ U_1   0  ]^T [ Q_1 ] V_1  =  [ C ]
    [  0   U_2 ]   [ Q_2 ]         [ S ]

where C = diag(cos theta_1,...,cos theta_n), S = diag(sin theta_1,...,sin theta_n), and

    0 <= theta_1 <= theta_2 <= ... <= theta_n <= pi/2.

Proof. Since || Q_11 ||_2 <= || Q ||_2 = 1, the singular values of Q_11 are all in the interval [0, 1]. The proof proceeds by partitioning off the singular values of Q_11 that are equal to one.
Theorem 2.6.3 (CS Decomposition) If

    Q = [ Q_11  Q_12 ]
        [ Q_21  Q_22 ]

is a 2-by-2 (arbitrary) partitioning of an n-by-n orthogonal matrix, then there exist orthogonal

    U = [ U_1   0  ]        and        V = [ V_1   0  ]
        [  0   U_2 ]                       [  0   V_2 ]

such that

              [ I  0  0  0  0  0 ]
              [ 0  C  0  0  S  0 ]
    U^T QV =  [ 0  0  0  0  0  I ]
              [ 0  0  0  I  0  0 ]
              [ 0  S  0  0 -C  0 ]
              [ 0  0  I  0  0  0 ]

where C = diag(c_1,...,c_p) and S = diag(s_1,...,s_p) are square diagonal matrices with 0 < c_i, s_i < 1.

Proof. See Paige and Saunders (1981) for details. We have suppressed the dimensions of the zero submatrices, some of which may be empty. □
The essential message of the decomposition is that the SVDs of the Q_ij are highly related.
Example 2.6.1 With the indicated 2-by-2 partitioning, a 5-by-5 orthogonal matrix Q can be reduced by orthogonal U and V to the block cosine-sine structure of Theorem 2.6.3, with the cosines and sines appearing explicitly in U^T QV.

The angles associated with the cosines and sines turn out to be very important in a number of applications. See Section 12.4.
Problems
2.7 The Sensitivity of Square Linear Systems

2.7.2 Condition

A precise measure of linear system sensitivity can be obtained by considering the parameterized system

    (A + eps*F)x(eps) = b + eps*f,    x(0) = x,

with the convention that kappa(A) = infinity for singular A. Using the inequality || b || <= || A || || x || it follows from (2.7.2) that

    || x(eps) - x || / || x || <= kappa(A)(rho_A + rho_b) + O(eps^2)

where

    rho_A = |eps| || F || / || A ||    and    rho_b = |eps| || f || / || b ||

quantify the relative perturbations in A and b. For the 2-norm we have

    kappa_2(A) = || A ||_2 || A^{-1} ||_2 = sigma_1(A) / sigma_n(A).    (2.7.5)

This result may be found in Kahan (1966) and shows that kappa_p(A) measures the relative p-norm distance from A to the set of singular matrices.
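A small illustration (Python/NumPy sketch, an addition to the text) of (2.7.5): kappa_2(A) is the ratio of extreme singular values, and 1/kappa_2(A) is the relative 2-norm distance to the nearest singular matrix.

    import numpy as np

    A = np.array([[1.0, 0.0],
                  [0.0, 1e-6]])
    s = np.linalg.svd(A, compute_uv=False)
    kappa2 = s[0] / s[-1]          # kappa_2(A) = sigma_1 / sigma_n
    print(kappa2)                  # 1e6 for this diagonal example
    # Zeroing the smallest singular value produces the nearest
    # singular matrix, sigma_n away in the 2-norm.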
For any norm, we also have

    kappa(A) >= 1.    (2.7.8)
    B_n = [ 1  -1  ...  -1 ]
          [ 0   1  ...  -1 ]
          [ .       .    . ]   in R^{n x n}    (2.7.9)
          [ 0   0  ...   1 ]

    ... = (2*eps / (1 - r)) kappa(A). □
The Ax = b problem

    [ 1     0     ] [ x_1 ]   [ 1     ]
    [ 0    10^-6  ] [ x_2 ] = [ 10^-6 ]

has solution x = (1, 1)^T and condition kappa_inf(A) = 10^6. If Db = (10^-8, 0)^T, DA = 0, and (A + DA)y = b + Db, then y = (1 + 10^-8, 1)^T and the inequality (2.7.10) says

    10^-8 = || x - y ||_inf / || x ||_inf <= (|| Db ||_inf / || b ||_inf) kappa_inf(A) = 10^-8 * 10^6 = 10^-2.

Thus, the upper bound in (2.7.10) can be a gross overestimate of the error induced by the perturbation. On the other hand, if Db = (0, 10^-8)^T, DA = 0, and (A + DA)y = b + Db, then this inequality says

    10^-2 = || x - y ||_inf / || x ||_inf <= 2 x 10^-8 * 10^6.

Thus, there are perturbations for which the bound in (2.7.10) is essentially attained.
Note that by properly choosing E and f the perturbed system can take on certain qualities. For example, if E = |A| and f = |b| and omega is small, then the computed solution satisfies a nearby system in the componentwise sense. Oettli and Prager (1964) show that for a given A, b, computed solution x, E, and f the smallest omega possible in (2.7.12) is given by

    omega_min = max_i  |Ax - b|_i / (E|x| + f)_i.
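The Oettli-Prager formula is easy to evaluate; the following sketch (Python/NumPy, an addition to the text) computes omega_min for the componentwise choice E = |A|, f = |b|, assuming every denominator entry is positive.

    import numpy as np

    def oettli_prager_omega(A, b, xhat, E=None, f=None):
        # omega_min = max_i |A xhat - b|_i / (E|xhat| + f)_i
        if E is None: E = np.abs(A)    # componentwise backward error
        if f is None: f = np.abs(b)
        r = np.abs(A @ xhat - b)
        d = E @ np.abs(xhat) + f       # assumed > 0 componentwise
        return np.max(r / d)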
Problems
S. Chandrasekaran and I.C.F. Ipsen (1995). "On the Sensitivity of Solution Components in Linear Systems of Equations," SIAM J. Matrix Anal. Appl. 16, 93-112.

The reciprocal of the condition number measures how near a given Ax = b problem is to singularity. The importance of knowing how near a given problem is to a difficult or insoluble problem has come to be appreciated in many computational settings. See

... (1991), Ciarlet (1992), Datta (1995), Higham (1996), Trefethen and Bau (1996), and Demmel (1996). Some MATLAB functions important to this chapter are lu, cond, rcond, and the "backslash" operator "\". LAPACK connections include
Chapter 3
General Linear Systems

3.1 Triangular Systems

    x_1 = b_1 / l_11
    x_2 = (b_2 - l_21 x_1) / l_22

    (L + F)x = b    (3.1.1)

For a proof, see Higham (1996). It says that the computed solution exactly satisfies a slightly perturbed system. Moreover, each entry in the perturbing matrix F is small relative to the corresponding element of L.

    (U + F)x = b    (3.1.2)
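For reference, here is a plain sketch (Python, added here rather than taken from the book) of the two triangular solves whose computed solutions satisfy (3.1.1) and (3.1.2): row-oriented forward and back substitution.

    def forward_sub(L, b):
        # Solve Lx = b, L lower triangular and nonsingular.
        n = len(b)
        x = [0.0] * n
        for i in range(n):
            s = sum(L[i][j] * x[j] for j in range(i))
            x[i] = (b[i] - s) / L[i][i]
        return x

    def back_sub(U, b):
        # Solve Ux = b, U upper triangular and nonsingular.
        n = len(b)
        x = [0.0] * n
        for i in range(n - 1, -1, -1):
            s = sum(U[i][j] * x[j] for j in range(i + 1, n))
            x[i] = (b[i] - s) / U[i][i]
        return x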
Consider now a block lower triangular system

    [ L_11              ] [ X_1 ]   [ B_1 ]
    [ L_21  L_22        ] [  .  ] = [  .  ]    (3.1.3)
    [  .     .     .    ] [  .  ]   [  .  ]
    [ L_N1  ...   L_NN  ] [ X_N ]   [ B_N ]

Assume that the diagonal blocks are square. Paralleling the development of Algorithm 3.1.3, we solve the system L_11 X_1 = B_1 for X_1 and then remove X_1 from block equations 2 through N:

    [ L_22              ] [ X_2 ]   [ B_2 - L_21 X_1 ]
    [ L_32  L_33        ] [  .  ] = [        .        ]
    [ L_N2  L_N3  ...   ] [ X_N ]   [ B_N - L_N1 X_1 ]

Continuing in this way we obtain the following block saxpy forward elimination scheme:

    for j = 1:N
        Solve L_jj X_j = B_j for X_j
        for i = j+1:N    (3.1.4)
            B_i = B_i - L_ij X_j
        end
    end
Notice that the i-loop oversees a single block saxpy update of the form B_i = B_i - L_ij X_j. If n = rN, then the level-3 fraction of (3.1.4) is approximately

    1 - Nr^2/n^2 = 1 - 1/N.

Thus, for large N almost all flops are level-3 flops and it makes sense to choose N as large as possible subject to the constraint that the underlying architecture can achieve a high level of performance when processing block saxpy's of width at least r = n/N.
Lu E R"x" bt E 1R"
£ 21 E JR(m-n)Xn b, E 1Rm-n
Problems

P3.1.1 Give an algorithm for computing a nonzero z in R^n such that Uz = 0 where U in R^{n x n} is upper triangular with u_nn = 0 and u_11 ... u_{n-1,n-1} != 0.

P3.1.2 Discuss how the determinant of a square triangular matrix could be computed with minimum risk of overflow and underflow.

P3.1.3 Rewrite Algorithm 3.1.4 given that U is stored by column in a length n(n+1)/2 array u.vec.

P3.1.4 Write a detailed version of (3.1.4). Do not assume that N divides n.

P3.1.5 Prove all the facts about triangular matrices that are listed in Section 3.1.8.

P3.1.6 Suppose S, T in R^{n x n} are upper triangular and that (ST - lambda*I)x = b is a nonsingular system. Give an O(n^2) algorithm for computing x. Note that the explicit formation of ST - lambda*I requires O(n^3) flops. Hint: Suppose

    S_+ = [ sigma  u^T ],   T_+ = [ tau  v^T ],   b_+ = [ beta ]
          [   0    S_c ]         [  0    T_c ]         [ b_c  ]

where S_+ = S(k-1:n, k-1:n), T_+ = T(k-1:n, k-1:n), b_+ = b(k-1:n), and sigma, tau, beta in R. Show that if we have a vector x_c such that

    (S_c T_c - lambda*I)x_c = b_c

and w_c = T_c x_c is available, then

    x_+ = [ mu  ],    mu = (beta - sigma*v^T x_c - u^T w_c) / (sigma*tau - lambda)
          [ x_c ]

solves (S_+ T_+ - lambda*I)x_+ = b_+. Observe that x_+ and w_+ = T_+ x_+ each require O(n-k) flops.

P3.1.7 Suppose the matrices R_1,...,R_p in R^{n x n} are all upper triangular. Give an O(pn^2) algorithm for solving the system (R_1 ... R_p - lambda*I)x = b assuming that the matrix of coefficients is nonsingular. Hint: Generalize the solution to the previous problem.
3.2 The LU Factorization

Consider the 2-by-2 system

    3x_1 + 5x_2 = 9
    6x_1 + 7x_2 = 4.

If we multiply the first equation by 2 and subtract it from the second we obtain

    3x_1 + 5x_2 =   9
         - 3x_2 = -14.

This is n = 2 Gaussian elimination. Our objective in this section is to give a complete specification of this central procedure and to describe what it does in the language of matrix factorizations. This means showing that the algorithm computes a unit lower triangular matrix L and an upper triangular matrix U so that A = LU, e.g.,

    [ 3  5 ]   [ 1  0 ] [ 3   5 ]
    [ 6  7 ] = [ 2  1 ] [ 0  -3 ].

The solution to the original Ax = b problem is then found by a two step triangular solve process: Ly = b followed by Ux = y.
For x in R^n with x_k != 0 we define the Gauss vector

    tau^(k) = (0,...,0, tau_{k+1},...,tau_n)^T,    tau_i = x_i / x_k,    i = k+1:n,    (3.2.1)

and the Gauss transformation M_k = I - tau^(k) e_k^T. Applying M_k to x zeros its last n-k components:

            [ 1  ...  0          0  ...  0 ] [  x_1    ]   [ x_1 ]
            [ .       .          .       . ] [   .     ]   [  .  ]
    M_k x = [ 0  ...  1          0  ...  0 ] [  x_k    ] = [ x_k ]
            [ 0  ... -tau_{k+1}  1  ...  0 ] [ x_{k+1} ]   [  0  ]
            [ .       .          .       . ] [   .     ]   [  .  ]
            [ 0  ... -tau_n      0  ...  1 ] [  x_n    ]   [  0  ]
    for i = k+1:n
        C(i, :) = C(i, :) - tau_i C(k, :)
    end
Example 3.2.1 If

    C = [ 1  4   7 ]            [ 0 ]
        [ 2  5   8 ],    tau =  [ 2 ],
        [ 3  6  10 ]            [-1 ]

then

    (I - tau e_1^T) C = [ 1   4   7 ]
                        [ 0  -3  -6 ]
                        [ 4  10  17 ].
If a computed tau is used in a Gauss transform update and fl((I - tau e_k^T)C) denotes the computed result, then the result equals the exact update plus an error matrix whose entries are bounded by a modest multiple of u(|C| + |tau||C(k, :)|). Clearly, if tau has large components, then the errors in the update may be large in comparison to |C|. For this reason, care must be exercised when Gauss transformations are employed, a matter that is pursued in Section 3.4.
    A = [ 1  4   7 ]
        [ 2  5   8 ].
        [ 3  6  10 ]

If

    M_1 = [  1  0  0 ]                    [ 1   4    7 ]
          [ -2  1  0 ],   then   M_1 A =  [ 0  -3   -6 ].
          [ -3  0  1 ]                    [ 0  -6  -11 ]

Likewise,

    M_2 = [ 1   0  0 ]                          [ 1   4   7 ]
          [ 0   1  0 ]   ==>   M_2(M_1 A)  =    [ 0  -3  -6 ].
          [ 0  -2  1 ]                          [ 0   0   1 ]
Extrapolating from this example observe that during the kth step

    k = 1
    while (A(k,k) != 0) & (k <= n-1)
        tau(k+1:n) = A(k+1:n, k)/A(k, k)    (3.2.2)
        A(k+1:n, :) = A(k+1:n, :) - tau(k+1:n)A(k, :)
        k = k + 1
    end

The entry A(k, k) must be checked to avoid a zero divide. These quantities are referred to as the pivots and their relative magnitude turns out to be critically important.
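(3.2.2) translates almost line for line into code. The following sketch (Python/NumPy, not part of the original text) halts at the first zero pivot, mirroring the while-loop above, and stores the multipliers in place below the diagonal.

    import numpy as np

    def lu_no_pivot(A):
        # Overwrites a copy of A with U on and above the diagonal and
        # the multipliers (the tau vectors of (3.2.1)) below it.
        A = A.astype(float).copy()
        n = A.shape[0]
        k = 0
        while k < n - 1 and A[k, k] != 0.0:
            A[k+1:, k] /= A[k, k]                         # multipliers
            A[k+1:, k+1:] -= np.outer(A[k+1:, k], A[k, k+1:])
            k += 1
        return A   # unit lower triangle = L, upper triangle = U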
    A = LU    (3.2.3)

where

    L = M_1^{-1} M_2^{-1} ... M_{n-1}^{-1}    and    U = M_{n-1} ... M_1 A.    (3.2.4)

It is clear that L is a unit lower triangular matrix because each M_k^{-1} is unit lower triangular. The factorization (3.2.3) is called the LU factorization of A.

As suggested by the need to check for zero pivots in (3.2.2), the LU factorization need not exist. For example, it is impossible to find l_ij and u_ij so

    [ 1  2  3 ]   [  1    0    0 ] [ u_11  u_12  u_13 ]
    [ 2  4  7 ] = [ l_21   1   0 ] [  0    u_22  u_23 ].
    [ 3  5  3 ]   [ l_31  l_32  1 ] [  0     0    u_33 ]

To see this equate entries and observe that we must have u_11 = 1, u_12 = 2, l_21 = 2, u_22 = 0, and l_31 = 3. But when we then look at the (3,2) entry we obtain the contradictory equation 5 = l_31 u_12 + l_32 u_22 = 6.
As we now show, a zero pivot in (3.2.2) can be identified with a singular leading principal submatrix.

Theorem 3.2.1 A in R^{n x n} has an LU factorization if det(A(1:k, 1:k)) != 0 for k = 1:n-1. If the LU factorization exists and A is nonsingular, then the LU factorization is unique and det(A) = u_11 ... u_nn.

Proof. Suppose k-1 steps in (3.2.2) have been executed. At the beginning of step k the matrix A has been overwritten by M_{k-1} ... M_1 A = A^(k-1). Note that a_kk^(k-1) is the kth pivot. Since the Gauss transformations are
    for k = 1:n-1
        rows = k+1:n
        A(rows, k) = A(rows, k)/A(k, k)
        A(rows, rows) = A(rows, rows) - A(rows, k)A(k, rows)
    end
3.2.7 Where is L?

Algorithm 3.2.3 represents L in terms of the multipliers. In particular, if -tau^(k) is the vector of multipliers associated with M_k then upon termination, A(k+1:n, k) = tau^(k). One of the more happy "coincidences" in matrix computations: since A(k+1:n, k) houses the kth vector of multipliers tau^(k), it follows that A(i, k) houses l_ik for all i > k.
    A = [ 1  4   7 ]   [ 1  0  0 ] [ 1   4   7 ]
        [ 2  5   8 ] = [ 2  1  0 ] [ 0  -3  -6 ],
        [ 3  6  10 ]   [ 3  2  1 ] [ 0   0   1 ]

then upon completion,

    A = [ 1   4   7 ]
        [ 2  -3  -6 ].
        [ 3   2   1 ]

If b = (1, 1, 1)^T, then y = (1, -1, 0)^T solves Ly = b and x = (-1/3, 1/3, 0)^T solves Ux = y.
    for k = 1:n-1
        A(k+1:n, k) = A(k+1:n, k)/A(k, k)
        for i = k+1:n
            for j = k+1:n
                A(i, j) = A(i, j) - A(i, k)A(k, j)
            end
        end
    end

There are five other versions: kji, ikj, ijk, jik, and jki. The last of these results in an implementation that features a sequence of gaxpy's and forward eliminations. In this formulation, the Gauss transformations are not
and

    A(j:n, j) = sum_{k=1}^{j} L(j:n, k)U(k, j).

The first equation is a lower triangular system that can be solved for the vector U(1:j-1, j). Once this is accomplished, the second equation can be rearranged to produce recipes for U(j, j) and L(j+1:n, j). Indeed, if we set

    v(j:n) = A(j:n, j) - sum_{k=1}^{j-1} L(j:n, k)U(k, j)
           = A(j:n, j) - L(j:n, 1:j-1)U(1:j-1, j),

then L(j+1:n, j) = v(j+1:n)/v(j) and U(j, j) = v(j). Thus, L(j+1:n, j) is a scaled gaxpy and we obtain
    L = I; U = 0
    for j = 1:n
        if j = 1
            v(j:n) = A(j:n, j)
        else
            Solve L(1:j-1, 1:j-1)z = A(1:j-1, j) for z    (3.2.5)
            and set U(1:j-1, j) = z.
            v(j:n) = A(j:n, j) - L(j:n, 1:j-1)z
        end
        if j < n
            L(j+1:n, j) = v(j+1:n)/v(j)
        end
        U(j, j) = v(j)
    end
3.2.10 Block LU

It is possible to organize Gaussian elimination so that matrix multiplication becomes the dominant operation. The key to the derivation of this block procedure is to partition A in R^{n x n} as follows:

    A = [ A_11  A_12 ]  r
        [ A_21  A_22 ]  n-r
           r     n-r

    [ A_11  A_12 ]   [ L_11     0     ] [ I_r  0 ] [ U_11    U_12  ]
    [ A_21  A_22 ] = [ L_21  I_{n-r}  ] [ 0    Ã ] [  0    I_{n-r} ]

and

    [ A_11  A_12 ]   [ L_11    0   ] [ I_r  0 ] [ U_11  U_12 ]
    [ A_21  A_22 ] = [ L_21  L_22  ] [ 0    Ã ] [  0    U_22 ]

is the LU factorization of A. Thus, after L_11, L_21, U_11, and U_12 are computed, we repeat the process on the level-3 updated (2,2) block Ã.
    lambda = 1
    while lambda <= n
        mu = min(n, lambda + r - 1)
        Use Algorithm 3.2.1 to overwrite A(lambda:mu, lambda:mu)
        with its LU factors L and U.
        Solve LZ = A(lambda:mu, mu+1:n) for Z and overwrite
        A(lambda:mu, mu+1:n) with Z.
        Solve WU = A(mu+1:n, lambda:mu) for W and overwrite
        A(mu+1:n, lambda:mu) with W.
        A(mu+1:n, mu+1:n) = A(mu+1:n, mu+1:n) - WZ
        lambda = mu + 1
    end
    [ 1  2 ]   [ 1  0 ]
    [ 3  4 ] = [ 3  1 ] [ 1   2 ]
    [ 5  6 ]   [ 5  2 ] [ 0  -2 ]

depicts the m > n situation, while

    [ 1  2  3 ]   [ 1  0 ] [ 1   2   3 ]
    [ 4  5  6 ] = [ 4  1 ] [ 0  -3  -6 ]

depicts the m < n situation. The LU factorization of A in R^{m x n} is guaranteed to exist if A(1:k, 1:k) is nonsingular for k = 1:min(m, n).

The square LU factorization algorithms above need only minor modification to handle the rectangular case. For example, to handle the m > n case we modify Algorithm 3.2.1 as follows:

    for k = 1:n
        rows = k+1:m
        A(rows, k) = A(rows, k)/A(k, k)
        if k < n
            cols = k+1:n
            A(rows, cols) = A(rows, cols) - A(rows, k)A(k, cols)
        end
    end

This algorithm requires mn^2 - n^3/3 flops.
    A = [ 0  1 ].
        [ 1  0 ]

While A has perfect 2-norm condition, it fails to have an LU factorization because it has a singular leading principal submatrix.

Clearly, modifications are necessary if Gaussian elimination is to be effectively used in general linear system solving. The error analysis in the following section suggests the needed modifications.
Problems

P3.2.1 Suppose the entries of A(eps) in R^{n x n} are continuously differentiable functions of the scalar eps. Assume that A = A(0) and all its principal submatrices are nonsingular. Show that for sufficiently small eps, the matrix A(eps) has an LU factorization A(eps) = L(eps)U(eps) and that L(eps) and U(eps) are both continuously differentiable.

P3.2.2 Suppose we partition A in R^{n x n}

    A = [ A_11  A_12 ]
        [ A_21  A_22 ]

P3.2.4 Describe a variant of Gaussian elimination that introduces zeros into the columns of A in the order n:-1:2 and which produces the factorization A = UL where U is unit upper triangular and L is lower triangular.

P3.2.5 Matrices in R^{n x n} of the form N(y, k) = I - y e_k^T where y in R^n are said to be Gauss-Jordan transformations. (a) Give a formula for N(y, k)^{-1} assuming it exists. (b) Given x in R^n, under what conditions can y be found so N(y, k)x = e_k? (c) Give an algorithm using Gauss-Jordan transformations that overwrites A with A^{-1}. What conditions on A ensure the success of your algorithm?

P3.2.6 Extend (3.2.5) so that it can also handle the case when A has more rows than columns.

P3.2.7 Show how A can be overwritten with L and U in (3.2.3). Organize the three loops so that unit stride access prevails.

P3.2.8 Develop a version of Gaussian elimination in which the innermost of the three loops is a dot product.
Schur complements (P3.2.2) arise in many applications. For a survey of both practical and theoretical interest, see

R.W. Cottle (1974). "Manifestations of the Schur Complement," Lin. Alg. and Its Applic. 8, 189-211.

Schur complements are known as "Gauss transforms" in some application areas. The use of Gauss-Jordan transformations (P3.2.5) is detailed in Fox (1964). See also

J.J. Dongarra, F.G. Gustavson, and A. Karp (1984). "Implementing Linear Algebra Algorithms for Dense Matrices on a Vector Pipeline Machine," SIAM Review 26, 91-112.
J.M. Ortega (1988). "The ijk Forms of Factorization Methods I: Vector Computers," Parallel Computing 7, 135-147.
D.H. Bailey, K. Lee, and H.D. Simon (1991). "Using Strassen's Algorithm to Accelerate the Solution of Linear Systems," J. Supercomputing 4, 357-371.
J.W. Demmel, N.J. Higham, and R.S. Schreiber (1995). "Stability of Block LU Factorization," Numer. Lin. Alg. with Applic. 2, 173-190.
3.3 Roundoff Analysis of Gaussian Elimination

    (A + E)x = b + e,    || E ||_inf <= u|| A ||_inf,    || e ||_inf <= u|| b ||_inf.    (3.3.1)
    LU = A + H    (3.3.3)

The proof is by induction on n. If

    A = [ alpha  w^T ]  1
        [   v     B  ]  n-1
            1    n-1

then z = fl(v/alpha) and A_1 = fl(B - zw^T) are computed in the first step of the algorithm. We therefore have

    z = (1/alpha)v + f,    |f| <= u|v|/|alpha|,    (3.3.5)

and

    A_1 = B - zw^T + F.

Thus,

    LU = A + [    0         0     ]  =  A + H.
             [ alpha*f   H_1 + F  ]

From (3.3.6) it follows that
Were it not for the possibility of a large |L||U| term, (3.3.9) would compare favorably with the ideal bound in (3.3.1). (The factor n is of no consequence, cf. the Wilkinson quotation in Section 2.4.6.) Such a possibility exists, for there is nothing in Gaussian elimination to rule out the appearance of small pivots. If a small pivot is encountered, then we can expect large numbers to be present in L and U.

We stress that small pivots are not necessarily due to ill-conditioning, as the example A = [ eps  1 ; 1  0 ] bears out. Thus, Gaussian elimination can give arbitrarily poor results, even for well-conditioned problems. The method is unstable.

In order to repair this shortcoming of the algorithm, it is necessary to introduce row and/or column interchanges during the elimination process with the intention of keeping the numbers that arise during the calculation suitably bounded. This idea is pursued in the next section.
Example 3.3.1 Suppose beta = 10, t = 3 floating point arithmetic is used to solve

    [ .001  1.00 ] [ x_1 ]   [ 1.00 ]
    [ 1.00  2.00 ] [ x_2 ] = [ 3.00 ].

Applying Gaussian elimination we get

    L = [   1   0 ]        U = [ .001    1.00 ]
        [ 1000  1 ]            [  0     -1000 ]

and

    LU = [ .001  1.00 ]   [ 0   0 ]
         [ 1.00  2.00 ] + [ 0  -2 ]  =  A + H.
ID&ie of IHI- U- co au to 1111lw the problem ~ding the triaDgular II)'1ICem eolven of §3.1,
then llllinc tbe RID8 ~ arithmeUc- obu.iD a o::omput.ed IOiutioD z = (0, l)T.
This Is in coutrut to the exact; 110lution :r: .: (L002. .. , .998 ... )T,
... <= || e || <= || A^{-1} || || r ||. Assume consistency between the matrix and vector norm.

P3.3.4 Using 2-digit, base 10, floating point arithmetic, compute the LU factorization of ...
J.H. Wilkinson (1961). "Error Analysis of Direct Methods of Matrix Inversion," J. ACM 8, 281-330.

Various improvements in the bounds and simplifications in the analysis have occurred over the years. See

B.A. Chartres and J.C. Geuder (1967). "Computable Error Bounds for Direct Solution of Linear Equations," J. ACM 14, 63-71.
J.K. Reid (1971). "A Note on the Stability of Gaussian Elimination," J. Inst. Math. Applic. 8, 374-75.
C.C. Paige (1973). "An Error Analysis of a Method for Solving Matrix Equations," Math. Comp. 27, 355-59.
C. de Boor and A. Pinkus (1977). "A Backward Error Analysis for Totally Positive Linear Systems," Numer. Math. 27, 485-90.
H.H. Robertson (1977). "The Accuracy of Error Estimates for Systems of Linear Algebraic Equations," J. Inst. Math. Applic. 20, 409-14.
J.J. Du Croz and N.J. Higham (1992). "Stability of Methods for Matrix Inversion," IMA J. Num. Anal. 12, 1-19.
3.4 Pivoting
The analysis in the previous section shows that we must take steps to ensure that no large entries appear in the computed triangular factors L and U. The example

    A = [ .0001  1 ] = [   1      0 ] [ .0001      1 ]  =  LU
        [ 1      1 ]   [ 10,000   1 ] [   0    -9999 ]
then

    P = [ 0  0  0  1 ]
        [ 1  0  0  0 ]
        [ 0  0  1  0 ].
        [ 0  1  0  0 ]

An n-by-n permutation matrix should never be explicitly stored. It is much more efficient to represent a general permutation matrix P with an integer n-vector p. One way to do this is to let p(k) be the column index of the sole "1" in P's kth row. Thus, p = [4 1 3 2] is the appropriate encoding of the above P. It is also possible to encode P on the basis of where the "1" occurs in each column, e.g., p = [2 4 3 1].
An interchange permutation E is obtained by swapping two rows of the identity, e.g.,

    E = [ 0  0  0  1 ]
        [ 0  1  0  0 ]
        [ 0  0  1  0 ].
        [ 1  0  0  0 ]

Interchange permutations can be used to describe row and column swapping. With the above 4-by-4 example, EA is A with rows 1 and 4 interchanged. Likewise, AE is A with columns 1 and 4 swapped.
If P = E_n ... E_1 and each E_k is the identity with rows k and p(k) interchanged, then p(1:n) is a useful vector encoding of P. Indeed, x in R^n can be overwritten by Px as follows:
fork= l:n
x(k) +-+ x(p(k))
end
Here, the"+-+" notation means "swap contents." Since each E1c is symmetric
and pT = E 1 ···En, the representation can also be used to overwrite x with
pTx:
for k = n: - 1:1
x(k) +-+ x(p(k))
end
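In code the two overwriting loops look as follows (a Python sketch added here for illustration, with 0-based indices); the same vector p serves for both Px and P^T x because each E_k is symmetric.

    def apply_P(p, x):
        # Overwrite x with Px, where E_k swaps positions k and p[k].
        for k in range(len(x)):              # k = 1:n
            x[k], x[p[k]] = x[p[k]], x[k]
        return x

    def apply_PT(p, x):
        # Overwrite x with P^T x by running the swaps in reverse order.
        for k in range(len(x) - 1, -1, -1):  # k = n:-1:1
            x[k], x[p[k]] = x[p[k]], x[k]
        return x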
    A = [ 3  17   10 ]
        [ 2   4   -2 ].
        [ 6  18  -12 ]
To get the smallest possible multipliers in the first Gauss transform using row interchanges we need a_11 to be the largest entry in the first column. Thus, if E_1 is the interchange permutation

    E_1 = [ 0  0  1 ]
          [ 0  1  0 ]
          [ 1  0  0 ]

then

    E_1 A = [ 6  18  -12 ]
            [ 2   4   -2 ]
            [ 3  17   10 ]

and

    M_1 = [   1    0  0 ]                      [ 6  18  -12 ]
          [ -1/3   1  0 ]   ==>   M_1 E_1 A =  [ 0  -2    2 ].
          [ -1/2   0  1 ]                      [ 0   8   16 ]
If

    E_2 = [ 1  0  0 ]        and        M_2 = [ 1   0    0 ]
          [ 0  0  1 ]                         [ 0   1    0 ]
          [ 0  1  0 ]                         [ 0  1/4   1 ]

then

    M_2 E_2 M_1 E_1 A = [ 6  18  -12 ]
                        [ 0   8   16 ].
                        [ 0   0    6 ]
The example illustrates the basic idea behind the row interchanges. In general we have:

    for k = 1:n-1
        Determine an interchange matrix E_k with E_k(1:k, 1:k) = I_k
        such that if z is the kth column of E_k A, then
        |z(k)| = || z(k:n) ||_inf.
        A = E_k A
        Determine the Gauss transform M_k such that if v is the
        kth column of M_k A, then v(k+1:n) = 0.
        A = M_k A
    end
All the information necessary to do this is contained in the array A and the pivot vector p. Indeed, the calculation

    for k = 1:n-1
        b(k) <-> b(p(k))
        b(k+1:n) = b(k+1:n) - b(k)A(k+1:n, k)
    end
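Putting the pieces together, the following sketch (Python/NumPy, an illustration rather than the book's own code) factors with partial pivoting, stores the multipliers in place, records the pivot vector p, and then applies the update loop above followed by back substitution.

    import numpy as np

    def gepp_solve(A, b):
        A = A.astype(float).copy(); b = b.astype(float).copy()
        n = A.shape[0]
        p = np.zeros(n - 1, dtype=int)
        for k in range(n - 1):
            mu = k + np.argmax(np.abs(A[k:, k]))   # pivot row
            A[[k, mu], :] = A[[mu, k], :]          # row interchange
            p[k] = mu
            A[k+1:, k] /= A[k, k]                  # multipliers
            A[k+1:, k+1:] -= np.outer(A[k+1:, k], A[k, k+1:])
        # forward elimination of b using p and the stored multipliers
        for k in range(n - 1):
            b[k], b[p[k]] = b[p[k]], b[k]
            b[k+1:] -= b[k] * A[k+1:, k]
        # back substitution with U
        x = np.zeros(n)
        for i in range(n - 1, -1, -1):
            x[i] = (b[i] - A[i, i+1:] @ x[i+1:]) / A[i, i]
        return x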
    [ 1   0   0 ] [ 1  0  0 ] [   1   0  0 ] [ 0  0  1 ]       [ 6  18  -12 ]
    [ 0   1   0 ] [ 0  0  1 ] [ -1/3  1  0 ] [ 0  1  0 ] A  =  [ 0   8   16 ]
    [ 0  1/4  1 ] [ 0  1  0 ] [ -1/2  0  1 ] [ 1  0  0 ]       [ 0   0    6 ]
3.4.4 Where is L?

Gaussian elimination with partial pivoting computes the LU factorization of a row-permuted version of A. If the upper triangularization is

    M_{n-1}E_{n-1} ... M_1 E_1 A = U,    (3.4.1)

then PA = LU where P = E_{n-1} ... E_1 and L(k+1:n, k) is a permuted version of the kth Gauss vector, k <= n-2. The proof is a messy subscripting argument.
    [ 0  0  1 ] [ 3  17   10 ]   [  1     0    0 ] [ 6  18  -12 ]
    [ 1  0  0 ] [ 2   4   -2 ] = [ 1/2    1    0 ] [ 0   8   16 ].
    [ 0  1  0 ] [ 6  18  -12 ]   [ 1/3  -1/4   1 ] [ 0   0    6 ]
3.4.5 The Gaxpy Version

In Section 3.2 we developed outer product and gaxpy schemes for computing the LU factorization. Having just incorporated pivoting in the outer product version, it is natural to do the same with the gaxpy approach. Recall from (3.2.5) the general structure of the gaxpy LU process:

    L = I
    U = 0
    for j = 1:n
        if j = 1
            v(j:n) = A(j:n, j)
        else
            Solve L(1:j-1, 1:j-1)z = A(1:j-1, j) for z
            and set U(1:j-1, j) = z.
            v(j:n) = A(j:n, j) - L(j:n, 1:j-1)z
        end
        if j < n
            L(j+1:n, j) = v(j+1:n)/v(j)
        end
        U(j, j) = v(j)
    end

With partial pivoting we search |v(j:n)| for its maximal element and proceed accordingly. Assuming A is nonsingular so no zero pivots are encountered we obtain
    L = I; U = 0
    for j = 1:n
        if j = 1
            v(j:n) = A(j:n, j)
        else
            Solve L(1:j-1, 1:j-1)z = A(1:j-1, j)
            for z and set U(1:j-1, j) = z.
            v(j:n) = A(j:n, j) - L(j:n, 1:j-1)z
        end
        if j < n
            Determine mu with j <= mu <= n so |v(mu)| = || v(j:n) ||_inf.
            p(j) = mu
            v(j) <-> v(mu)
            A(j, j+1:n) <-> A(mu, j+1:n)
            L(j+1:n, j) = v(j+1:n)/v(j)
            if j > 1
                L(j, 1:j-1) <-> L(mu, 1:j-1)
            end
        end
        U(j, j) = v(j)
    end
Here we are assuming that P, L, and U are the computed analogs of the exact P, L, and U as produced by the above algorithms. Pivoting implies that the elements of the computed L are bounded by one. Thus || L ||_inf <= n and we obtain the bound

    rho = max_{i,j,k}  |a_ij^(k)| / || A ||_inf    (3.4.5)

where A^(k) is the computed version of the matrix A^(k) = M_k E_k ... M_1 E_1 A. It follows that

    || H ||_inf <= 8n^3 rho || A ||_inf u + O(u^2).    (3.4.6)
Whether or not this compares favorably with the ideal bound (3.3.1) hinges upon the size of the growth factor rho. (The factor n^3 is not an operating factor in practice and may be ignored in this discussion.) The growth factor measures how large the numbers become during the process of elimination. In practice, rho is usually of order 10 but it can also be as large as 2^{n-1}. Despite this, most numerical analysts regard the occurrence of serious element growth in Gaussian elimination with partial pivoting as highly unlikely in practice. The method can be used with confidence.
Example 3.4.3 If Gaussian elimination with partial pivoting is applied to the problem

    [ .001  1.00 ] [ x_1 ]   [ 1.00 ]
    [ 1.00  2.00 ] [ x_2 ] = [ 3.00 ]

with beta = 10, t = 3 floating point arithmetic, then ...

    A = [ A_11  A_12 ]  r
        [ A_21  A_22 ]  n-r
           r     n-r
The first step in the block reduction is typical and proceeds as follows:

• Use scalar Gaussian elimination with partial pivoting (e.g., a rectangular version of Algorithm 3.4.1) to compute a permutation P_1 in R^{n x n}, unit lower triangular L_11 in R^{r x r}, and upper triangular U_11 in R^{r x r}.

Theorem 3.4.2 If Gaussian elimination with complete pivoting is used to compute the upper triangularization of A, then

    PAQ = LU

where P = E_{n-1} ... E_1, Q = F_1 ... F_{n-1} and L is a unit lower triangular matrix with |l_ij| <= 1. The kth column of L below the diagonal is a permuted version of the kth Gauss vector. In particular, if M_k = I - tau^(k) e_k^T then L(k+1:n, k) = g(k+1:n) where g = E_{n-1} ... E_{k+1} tau^(k).

Proof. The proof is similar to the proof of Theorem 3.4.1. Details are left to the reader. □
    PAQ = LU = [ L_11     0      ] [ U_11  U_12 ].
               [ L_21  I_{n-r}   ] [  0     0   ]

Here L_11 and U_11 are r-by-r and L_21 and U_12^T are (n-r)-by-r. Thus, Gaussian elimination with complete pivoting can in principle be used to determine the rank of a matrix. Yet roundoff errors make the probability of encountering an exactly zero pivot remote. In practice one would have to "declare" A to have rank k if the pivot element in step k+1 was sufficiently small. The numerical rank determination problem is discussed in detail in Section 5.5.

Wilkinson (1961) has shown that in exact arithmetic the elements of the matrix A^(k) = M_k E_k ... M_1 E_1 A F_1 ... F_k satisfy

    |a_ij^(k)| <= k^{1/2} (2 * 3^{1/2} * 4^{1/3} ... k^{1/(k-1)})^{1/2} max_{i,j} |a_ij|.    (3.4.8)

The upper bound is a rather slow-growing function of k. This fact coupled with vast empirical evidence suggesting that rho is always modestly sized (e.g., rho = 10) permits us to conclude that Gaussian elimination with complete pivoting is stable. The method solves a nearby linear system (A + E)x = b exactly in the sense of (3.3.1). However, there appears to be no practical justification for choosing complete pivoting over partial pivoting except in cases where rank determination is an issue.
Example 3.4.6 If Gaussian elimination with complete pivoting is applied to the problem

    [ .001  1.00 ] [ x_1 ]   [ 1.00 ]
    [ 1.00  2.00 ] [ x_2 ] = [ 3.00 ]

with beta = 10, t = 3 floating arithmetic, then

    P = [ 0  1 ],    Q = [ 0  1 ],
        [ 1  0 ]         [ 1  0 ]

    L = [  1    0 ],    U = [ 2.00   1.00 ]
        [ .500  1 ]         [ 0.00  -.499 ]

and x = (1.00, 1.00)^T. Compare with Examples 3.3.1 and 3.4.3.
illustrate the kind of analysis required to prove that pivoting can be safely avoided, we consider the case of diagonally dominant matrices. We say that A in R^{n x n} is strictly diagonally dominant if

    |a_ii| > sum_{j != i} |a_ij|,    i = 1:n.

The following theorem shows how this property can ensure a nice, no-pivoting LU factorization.
Theorem If A^T is strictly diagonally dominant, then A has an LU factorization and |l_ij| <= 1.

Proof. Partition

    A = [ alpha  w^T ]
        [   v     C  ]

where alpha is 1-by-1 and note that after one step of the outer product LU process we have the factorization

    A = [    1      0 ] [ 1        0          ] [ alpha  w^T ].
        [ v/alpha   I ] [ 0  C - vw^T/alpha   ] [   0     I  ]

The multipliers satisfy |v_i/alpha| <= 1 since |alpha| > sum_i |v_i|. The Schur complement C' = C - vw^T/alpha inherits the columnwise dominance, since for each j

    sum_{i != j} |c'_ij| <= sum_{i != j} |c_ij| + (|w_j|/|alpha|) sum_{i != j} |v_i|
                         < (|c_jj| - |w_j|) + (|w_j|/|alpha|)(|alpha| - |v_j|)
                         = |c_jj| - |w_j||v_j|/|alpha|  <=  |c'_jj|.

By induction C' = L_1 U_1 with unit lower triangular L_1, and so

    A = [    1       0  ] [ alpha  w^T ]  =  LU. □
        [ v/alpha   L_1 ] [   0    U_1 ]
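A quick test of the hypothesis is straightforward; this sketch (Python/NumPy, an addition) checks strict columnwise diagonal dominance, the condition under which the factorization above goes through with all multipliers bounded by one.

    import numpy as np

    def is_column_diag_dominant(A):
        # |a_jj| > sum_{i != j} |a_ij| for every column j
        B = np.abs(A)
        d = np.diag(B)
        return bool(np.all(d > B.sum(axis=0) - d))

    A = np.array([[ 4.0, -1.0,  0.0],
                  [-1.0,  4.0, -1.0],
                  [ 0.0, -1.0,  4.0]])
    print(is_column_diag_dominant(A))  # True: safe to factor without pivoting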
Problems

P3.4.1 Let A = LU be the LU factorization of n-by-n A with |l_ij| <= 1. Let a_i^T and u_i^T denote the ith rows of A and U, respectively. Verify the equation

    u_i^T = a_i^T - sum_{j=1}^{i-1} l_ij u_j^T.
... flops.

P3.4.4 Suppose X is the computed inverse obtained via (3.4.9). Give an upper bound for || AX - I ||_F.

P3.4.5 Prove Theorem 3.4.2.

P3.4.6 Extend Algorithm 3.4.3 so that it can factor an arbitrary rectangular matrix.

P3.4.7 Write a detailed version of the block elimination algorithm outlined in Section 3.4.7.
H.J. Bowdler, R.S. Martin, G. Peters, and J.H. Wilkinson (1966). "Solution of Real and Complex Systems of Linear Equations," Numer. Math. 8, 217-34. See also Wilkinson and Reinsch (1971, 93-110).

The conjecture that |a_ij^(k)| <= n max|a_ij| when complete pivoting is used has been proven in the real n = 4 case in

C.W. Cryer (1968). "Pivot Size in Gaussian Elimination," Numer. Math. 12, 335-45.
J.K. Reid (1971). "A Note on the Stability of Gaussian Elimination," J. Inst. Math. Applic. 8, 374-75.
P.A. Businger (1971). "Monitoring the Numerical Stability of Gaussian Elimination," Numer. Math. 16, 360-61.
A.M. Cohen (1974). "A Note on Pivot Size in Gaussian Elimination," Lin. Alg. and Its Applic. 8, 361-68.
A.M. Erisman and J.K. Reid (1974). "Monitoring the Stability of the Triangular Factorization of a Sparse Matrix," Numer. Math. 22, 183-86.
J. Day and B. Peterson (1988). "Growth in Gaussian Elimination," Amer. Math. Monthly 95, 489-513.
N.J. Higham and D.J. Higham (1989). "Large Growth Factors in Gaussian Elimination with Pivoting," SIAM J. Matrix Anal. Appl. 10, 155-164.
L.N. Trefethen and R.S. Schreiber (1990). "Average-Case Stability of Gaussian Elimination," SIAM J. Matrix Anal. Appl. 11, 335-360.
N. Gould (1991). "On Growth in Gaussian Elimination with Complete Pivoting," SIAM J. Matrix Anal. Appl. 12, 354-361.
A. Edelman (1992). "The Complete Pivoting Conjecture for Gaussian Elimination is False," The Mathematica Journal 2, 58-61.
S.J. Wright (1993). "A Collection of Problems for Which Gaussian Elimination with Partial Pivoting is Unstable," SIAM J. Sci. and Stat. Comp. 14, 231-238.
L.V. Foster (1994). "Gaussian Elimination with Partial Pivoting Can Fail in Practice," SIAM J. Matrix Anal. Appl. 15, 1354-1362.
I.S. Duff, A.M. Erisman, and J.K. Reid (1986). Direct Methods for Sparse Matrices, Oxford University Press.

The connection between small pivots and near singularity is reviewed in

T.F. Chan (1985). "On the Existence and Computation of LU Factorizations with Small Pivots," Math. Comp. 42, 535-547.

A pivot strategy that we did not discuss is pairwise pivoting. In this approach, 2-by-2 Gauss transformations are used to zero the lower triangular portion of A. The technique is appealing in certain multiprocessor environments because only adjacent rows are combined in each step. See

S. Serbin (1980). "On Factoring a Class of Complex Symmetric Matrices Without Pivoting," Math. Comp. 35, 1231-1234.

Just as there are six "conventional" versions of scalar Gaussian elimination, there are also six conventional block formulations of Gaussian elimination. For a discussion of these procedures and their implementation see

K. Gallivan, W. Jalby, U. Meier, and A.H. Sameh (1988). "Impact of Hierarchical Memory Systems on Linear Algebra Algorithm Design," Int'l J. Supercomputer Applic. 2, 12-48.
3.5 Improving and Estimating Accuracy

Consider a 2-by-2 system Ax = b in which kappa_inf(A) is approximately 700 and x = (2, -3)^T. Here is what we find for various machine precisions:
3.5.2 Scaling

Let beta be the machine base and define the diagonal matrices D_1 and D_2 by

    D_1 = diag(beta^{r_1},...,beta^{r_n})
    D_2 = diag(beta^{c_1},...,beta^{c_n}).

The solution to the n-by-n linear system Ax = b can be found by solving the scaled system (D_1^{-1} A D_2)y = D_1^{-1} b using Gaussian elimination and then setting x = D_2 y. The scalings of A, b, and y require only O(n^2) flops and may be accomplished without roundoff. Note that D_1 scales equations and D_2 scales unknowns.
It follows from Heuristic II that if x and y denote the computed versions of the exact x and y, then the error in x is roughly u kappa_inf(D_1^{-1} A D_2) when measured in the appropriate norm. Thus, if kappa_inf(D_1^{-1} A D_2) can be made considerably smaller than kappa_inf(A), then we might expect a correspondingly more accurate x, provided errors are measured in the "D_2" norm defined by || z ||_{D_2} = || D_2^{-1} z ||_inf. This is the objective of scaling. Note that it encompasses two issues: the condition of the scaled problem and the appropriateness of appraising error in the D_2-norm.

An interesting but very difficult mathematical problem concerns the exact minimization of kappa_p(D_1^{-1} A D_2) for general diagonal D_i and various p. What results there are in this direction are not very practical. This is hardly discouraging, however, when we recall that (3.5.3) is heuristic and it makes little sense to minimize exactly a heuristic bound. What we seek is a fast, approximate method for improving the quality of the computed solution x.

One technique of this variety is simple row scaling. In this scheme D_2 is the identity and D_1 is chosen so that each row in D_1^{-1} A has approximately the same inf-norm. Row scaling reduces the likelihood of adding a very small number to a very large number during elimination--an event that can greatly diminish accuracy.
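A minimal sketch of simple row scaling (Python/NumPy, an illustration added here): D_1 is chosen so that every row of D_1^{-1}A has roughly unit inf-norm, and powers of the base beta keep the scaling itself free of roundoff.

    import numpy as np

    def row_scale(A, b, beta=2.0):
        # D1 = diag(beta^r_i) with beta^r_i close to the inf-norm of row i.
        r = np.floor(np.log(np.abs(A).max(axis=1)) / np.log(beta))
        d = beta ** r                  # exact powers of the base
        return A / d[:, None], b / d   # D1^{-1} A and D1^{-1} b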
Slightly more complicated than simple row scaling is row-column equilibration. Here, the object is to choose D_1 and D_2 so that the inf-norm of each row and column of D_1^{-1} A D_2 belongs to the interval [1/beta, 1] where beta is the base of the floating point system. For work along these lines see McKeeman (1962).

It cannot be stressed too much that simple row scaling and row-column equilibration do not "solve" the scaling problem. Indeed, either technique can render a worse x than if no scaling whatever is used. The ramifications of this point are thoroughly discussed in Forsythe and Moler (1967, chapter 11). The basic recommendation is that the scaling of equations and
are each solved using beta = 10, t = 3 arithmetic, then solutions x = (0.00, 1.00)^T and x = (1.00, 1.00)^T are respectively computed. Note that x = (1.0001..., .9999...)^T is the exact solution.
    r = b - Ax
    Solve Ly = Pr.    (3.5.4)
    Solve Uz = y.
    x_new = x + z
Then in exact arithmetic Ax_new = Ax + Az = (b - r) + r = b. Unfortunately, the naive floating point execution of these formulae renders an x_new that is no more accurate than x. This is to be expected since r = fl(b - Ax) has few, if any, correct significant digits. (Recall Heuristic I.) Consequently, z = fl(A^{-1} r) is approximately A^{-1} times noise, i.e., noise, and so is a very poor correction from the standpoint of improving the accuracy of x. However, Skeel (1980) has done an error analysis that indicates when (3.5.4) gives an improved x_new from the standpoint of backward error. In particular, if a certain quantity identified in that analysis is not too big, then (3.5.4) produces an x_new such that (A + E)x_new = b for very small E. Of course, if Gaussian elimination with partial pivoting is used then the computed x already solves a nearby system. However, this may not be the case for some of the pivot strategies that are used to preserve sparsity. In this situation, the fixed precision iterative improvement
step (3.5.4) can be very worthwhile and cheap. See Arioli, Demmel, and Duff (1988).

For (3.5.4) to produce a more accurate x, it is necessary to compute the residual b - Ax with extended precision floating point arithmetic. Typically, this means that if t-digit arithmetic is used to compute PA = LU, x, y, and z, then 2t-digit arithmetic is used to form b - Ax, i.e., double precision. The process can be iterated. In particular, once we have computed PA = LU and initialize x = 0, we repeat the following:

    r = b - Ax (Double Precision)
    Solve Ly = Pr for y.    (3.5.5)
    Solve Uz = y for z.
    x = x + z
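A sketch of mixed precision refinement (Python with NumPy/SciPy, added for illustration): the factorization and the triangular solves run in single precision while the residual is accumulated in double, following the recipe above.

    import numpy as np
    from scipy.linalg import lu_factor, lu_solve   # PA = LU with pivoting

    def refine(A, b, steps=5):
        A32 = A.astype(np.float32)
        fac = lu_factor(A32)                  # "t-digit" factorization
        x = np.zeros_like(b)
        for _ in range(steps):
            r = b - A @ x                     # residual in double (2t digits)
            z = lu_solve(fac, r.astype(np.float32)).astype(np.float64)
            x = x + z
        return x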
    || A ||_inf = max_{1<=i<=n} sum_{j=1}^{n} |a_ij|.

The idea behind their estimator is to choose d so that the solution y of the associated linear system is large in norm and then set

    kappa_est = || A ||_inf || y ||_inf / || d ||_inf.

The success of this method hinges on how close the ratio || y ||_inf / || d ||_inf is to its maximum value || A^{-1} ||_inf.

Consider the case when A = T is upper triangular. The relation between d and y is completely specified by the following column version of back substitution:

    p(1:n) = 0
    for k = n:-1:1
        Choose d(k).
        y(k) = (d(k) - p(k))/T(k, k)    (3.5.6)
        p(1:k-1) = p(1:k-1) + y(k)T(1:k-1, k)
    end

Normally, we use this algorithm to solve a given triangular system Ty = d. Now, however, we are free to pick the right-hand side d subject to the "constraint" that y is large relative to d.

One way to encourage growth in y is to choose d(k) from the set {-1, +1} so as to maximize y(k). If p(k) >= 0, then set d(k) = -1. If p(k) < 0, then set d(k) = +1. In other words, (3.5.6) is invoked with d(k) = -sign(p(k)). Since d is then a vector of the form d(1:n) = (+-1,...,+-1)^T, we obtain the estimator kappa_est = || T ||_inf || y ||_inf.

A more reliable estimator results if d(k) in {-1, +1} is chosen so as to encourage growth both in y(k) and the updated running sum given by p(1:k-1) + T(1:k-1, k)y(k). In particular, at step k we compute

    y(k)+ = (1 - p(k))/T(k, k)
    y(k)- = (-1 - p(k))/T(k, k)
    p(k)+ = p(1:k-1) + T(1:k-1, k)y(k)+
    p(k)- = p(1:k-1) + T(1:k-1, k)y(k)-

and set y(k) = y(k)+ if |y(k)+| + || p(k)+ ||_1 >= |y(k)-| + || p(k)- ||_1 and y(k) = y(k)- otherwise. This gives
    p(1:n) = 0
    for k = n:-1:1
        y(k)+ = (1 - p(k))/T(k, k)
        y(k)- = (-1 - p(k))/T(k, k)
        p(k)+ = p(1:k-1) + T(1:k-1, k)y(k)+
        p(k)- = p(1:k-1) + T(1:k-1, k)y(k)-
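The simpler d(k) = -sign(p(k)) variant is only a few lines of code; here is a sketch (Python/NumPy, an addition to the text) for an upper triangular T.

    import numpy as np

    def estimate_kappa_inf(T):
        # Grow y while solving Ty = d with d(k) chosen from {-1, +1},
        # following (3.5.6) with d(k) = -sign(p(k)).
        n = T.shape[0]
        y = np.zeros(n)
        p = np.zeros(n)
        for k in range(n - 1, -1, -1):        # k = n:-1:1
            d = -1.0 if p[k] >= 0 else 1.0
            y[k] = (d - p[k]) / T[k, k]
            p[:k] += y[k] * T[:k, k]
        # estimate: ||T||_inf * ||y||_inf  (note ||d||_inf = 1)
        return np.abs(T).sum(axis=1).max() * np.abs(y).max()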
Problems

P3.5.1 Show by example that there may be more than one way to equilibrate a matrix.

P3.5.2 Solve the system ... using Gaussian elimination with partial pivoting. Do one step of iterative improvement using t = 4 arithmetic to compute the residual. (Do not forget to round the computed residual to two digits.)

P3.5.3 Suppose P(A + E) = LU, where P is a permutation, L is lower triangular with |l_ij| <= 1, and U is upper triangular. Show that kappa_inf(A) >= || A ||_inf / (|| E ||_inf + mu) where
mu = min |u_ii|. Conclude that if a small pivot is encountered when Gaussian elimination with pivoting is applied to A, then A is ill-conditioned. The converse is not true. (Let A = B_n.)
P3.5.4 (Kahan 1966) Consider the system Ax = b where ...

P3.5.5 Consider the matrix

    T = [ 1  M  -M ]
        [ 0  1  -M ],    M in R.
        [ 0  0   1 ]

What estimate of kappa_inf(T) is produced when (3.5.6) is applied with d(k) = -sign(p(k))? What estimate does Algorithm 3.5.1 produce? What is the true kappa_inf(T)?

P3.5.6 What does Algorithm 3.5.1 produce when applied to the matrix B_n given in (2.7.9)?
C.B. Moler (1967). "Iterative Refinement in Floating Point," J. ACM 14, 316-71.
R.D. Skeel (1980). "Iterative Refinement Implies Numerical Stability for Gaussian Elimination," Math. Comp. 35, 817-832.
G.W. Stewart (1981). "On the Implicit Deflation of Nearly Singular Systems of Linear Equations," SIAM J. Sci. and Stat. Comp. 2, 136-140.

The condition estimator that we described is given in

A.K. Cline, C.B. Moler, G.W. Stewart, and J.H. Wilkinson (1979). "An Estimate for the Condition Number of a Matrix," SIAM J. Num. Anal. 16, 368-75.

Other references concerned with the condition estimation problem include

C.G. Broyden (1973). "Some Condition Number Bounds for the Gaussian Elimination Process," J. Inst. Math. Applic. 12, 273-86.
F. Lemeire (1973). "Bounds for Condition Numbers of Triangular Matrices," Lin. Alg. and Its Applic. 11.
R.S. Varga (1976). "On Diagonal Dominance Arguments for Bounding || A^{-1} ||_inf," Lin. Alg. and Its Applic. 14, 211-17.
G.W. Stewart (1980). "The Efficient Generation of Random Orthogonal Matrices with an Application to Condition Estimators," SIAM J. Num. Anal. 17, 403-9.
D.P. O'Leary (1980). "Estimating Matrix Condition Numbers," SIAM J. Sci. Stat. Comp. 1, 205-9.
R.G. Grimes and J.G. Lewis (1981). "Condition Number Estimation for Sparse Matrices," SIAM J. Sci. and Stat. Comp. 2, 384-88.
A.K. Cline, A.R. Conn, and C. Van Loan (1982). "Generalizing the LINPACK Condition Estimator," in Numerical Analysis, ed. J.P. Hennart, Lecture Notes in Mathematics no. 909, Springer-Verlag, New York.
A.K. Cline and R.K. Rew (1983). "A Set of Counterexamples to Three Condition Number Estimators," SIAM J. Sci. and Stat. Comp. 4, 602-611.
W. Hager (1984). "Condition Estimates," SIAM J. Sci. and Stat. Comp. 5, 311-316.
N.J. Higham (1987). "A Survey of Condition Number Estimation for Triangular Matrices," SIAM Review 29, 575-596.
N.J. Higham (1988). "FORTRAN Codes for Estimating the One-Norm of a Real or Complex Matrix, with Applications to Condition Estimation (Algorithm 674)," ACM Trans. Math. Soft. 14, 381-396.
C.H. Bischof (1990). "Incremental Condition Estimation," SIAM J. Matrix Anal. Appl. 11, 644-659.
C.H. Bischof (1990). "Incremental Condition Estimation for Sparse Matrices," SIAM J. Matrix Anal. Appl. 11, 312-322.
G. Auchmuty (1991). "A Posteriori Error Estimates for Linear Equations," Numer. Math. 61, 1-6.
N.J. Higham (1993). "Optimization by Direct Search in Matrix Computations," SIAM J. Matrix Anal. Appl. 14, 317-333.
D.J. Higham (1995). "Condition Numbers and Their Condition Numbers," Lin. Alg. and Its Applic. 214, 193-213.
Chapter 4
Special Linear Systems

4.1 The LDM^T and LDL^T Factorizations
The proof shows that the LDM^T factorization can be found by using Gaussian elimination to compute A = LU and then determining D and M from the equation U = DM^T. However, an interesting alternative algorithm can be derived by computing L, D, and M directly.

Assume that we know the first j-1 columns of L, diagonal entries d_1,...,d_{j-1} of D, and the first j-1 rows of M for some j with 1 <= j <= n. To develop recipes for L(j+1:n, j), M(j, 1:j-1), and d_j we equate jth columns in the equation A = LDM^T. In particular,

    A(1:n, j) = Lv    (4.1.1)

where v = DM^T e_j. The "top" half of (4.1.1) defines v(1:j) as the solution of a known lower triangular system:

    L(1:j, 1:j)v(1:j) = A(1:j, j).

Once v is known we obtain

    d(j) = v(j)
    M(j, i) = v(i)/d(i),    i = 1:j-1.

The "bottom" half of (4.1.1) says L(j+1:n, 1:j)v(1:j) = A(j+1:n, j) which can be rearranged to obtain a recipe for the jth column of L:

    L(j+1:n, j)v(j) = A(j+1:n, j) - L(j+1:n, 1:j-1)v(1:j-1).

Thus, L(j+1:n, j) is a scaled gaxpy operation and overall we obtain

    for j = 1:n
        Solve L(1:j, 1:j)v(1:j) = A(1:j, j) for v(1:j).
        for i = 1:j-1
            M(j, i) = v(i)/d(i)    (4.1.2)
        end
        d(j) = v(j)
        L(j+1:n, j) =
            (A(j+1:n, j) - L(j+1:n, 1:j-1)v(1:j-1))/v(j)
    end
    for j = 1:n
        { Solve L(1:j, 1:j)v(1:j) = A(1:j, j). }
        v(1:j) = A(1:j, j)
        for k = 1:j-1
            v(k+1:j) = v(k+1:j) - v(k)A(k+1:j, k)
        end
        { Compute M(j, 1:j-1) and store in A(1:j-1, j). }
        for i = 1:j-1
            A(i, j) = v(i)/A(i, i)
        end
        { Store d(j) in A(j, j). }
        A(j, j) = v(j)
        { Compute L(j+1:n, j) and store in A(j+1:n, j). }
        for k = 1:j-1
            A(j+1:n, j) = A(j+1:n, j) - v(k)A(j+1:n, k)
        end
        A(j+1:n, j) = A(j+1:n, j)/v(j)
    end
Example 4.1.1

    A = [ 10  10  20 ]   [ 1  0  0 ] [ 10  0  0 ] [ 1  1  2 ]
        [ 20  25  40 ] = [ 2  1  0 ] [  0  5  0 ] [ 0  1  0 ]
        [ 30  50  61 ]   [ 3  4  1 ] [  0  0  1 ] [ 0  0  1 ]

and upon completion, Algorithm 4.1.1 overwrites A as follows:

    A = [ 10  1  2 ]
        [  2  5  0 ].
        [  3  4  1 ]
4.1.2 Symmetry and the LDL^T Factorization

There is redundancy in the LDM^T factorization if A is symmetric.

Theorem 4.1.2 If A = LDM^T is the LDM^T factorization of a nonsingular symmetric matrix A, then L = M.

Proof. The matrix M^{-1}AM^{-T} = M^{-1}LD is both symmetric and lower triangular and therefore diagonal. Since D is nonsingular, this implies that M^{-1}L is also diagonal. But M^{-1}L is unit lower triangular and so M^{-1}L = I. □
columns. Recall that in the jth step of (4.1.2) the vector v(1:j) is defined by the first j components of DM^T e_j. Since M = L, this says that

    v(1:j) = [ d(1)L(j, 1)         ]
             [        .            ]
             [ d(j-1)L(j, j-1)     ].
             [ d(j)                ]

Hence, the vector v(1:j-1) can be obtained by a simple scaling of L's jth row. The formula v(j) = A(j, j) - L(j, 1:j-1)v(1:j-1) can be derived from the jth equation in L(1:j, 1:j)v = A(1:j, j), rendering
    for j = 1:n
        for i = 1:j-1
            v(i) = L(j, i)d(i)
        end
        v(j) = A(j, j) - L(j, 1:j-1)v(1:j-1)
        d(j) = v(j)
        L(j+1:n, j) =
            (A(j+1:n, j) - L(j+1:n, 1:j-1)v(1:j-1))/v(j)
    end
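Rendered in code, this LDL^T procedure looks as follows (a Python/NumPy sketch added here, not the book's own listing); note that only the lower triangle of A is read.

    import numpy as np

    def ldl_fact(A):
        # A symmetric: returns unit lower triangular L and d
        # with A = L diag(d) L^T.
        n = A.shape[0]
        L = np.eye(n)
        d = np.zeros(n)
        v = np.zeros(n)
        for j in range(n):
            v[:j] = L[j, :j] * d[:j]               # v(i) = L(j,i)d(i)
            d[j] = A[j, j] - L[j, :j] @ v[:j]      # d(j) = v(j)
            L[j+1:, j] = (A[j+1:, j] - L[j+1:, :j] @ v[:j]) / d[j]
        return L, d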
Example 4.1.2

    A = [ 10  20   30 ]   [ 1  0  0 ] [ 10  0  0 ] [ 1  2  3 ]
        [ 20  45   80 ] = [ 2  1  0 ] [  0  5  0 ] [ 0  1  4 ]
        [ 30  80  171 ]   [ 3  4  1 ] [  0  0  1 ] [ 0  0  1 ]

and so if Algorithm 4.1.2 is applied, A is overwritten by

    A = [ 10  20  30 ]
        [  2   5  80 ].
        [  3   4   1 ]
Problems

Algorithm 4.1.1 is related to the methods of Crout and Doolittle in that outer product updates are avoided. See Chapter 4 of Fox (1964) or Stewart (1973, 131-149). An Algol procedure may be found in

H.J. Bowdler, R.S. Martin, G. Peters, and J.H. Wilkinson (1966). "Solution of Real and Complex Systems of Linear Equations," Numer. Math. 8, 217-234.

See also

M. Arioli, J. Demmel, and I. Duff (1989). "Solving Sparse Linear Systems with Sparse Backward Error," SIAM J. Matrix Anal. Appl. 10, 165-190.
J.R. Bunch, J.W. Demmel, and C.F. Van Loan (1989). "The Strong Stability of Algorithms for Solving Symmetric Linear Systems," SIAM J. Matrix Anal. Appl. 10, 494-499.
A. Barrlund (1991). "Perturbation Bounds for the LDL^T and LU Decompositions," BIT 31, 358-363.
D.J. Higham and N.J. Higham (1992). "Backward Error and Condition of Structured Linear Systems," SIAM J. Matrix Anal. Appl. 13, 162-175.
4.2 Positive Definite Systems

Consider the symmetric 2-by-2 case

    A = [ a_11  a_12 ].
        [ a_12  a_22 ]

Taking x = e_1 and x = e_2 shows a_11 > 0 and a_22 > 0, while x = (1, 1)^T and x = (1, -1)^T give a_11 + a_22 +- 2a_12 > 0. The last two equations imply |a_12| <= (a_11 + a_22)/2. From these results we see that the largest entry in A is on the diagonal and that it is positive. This turns out to be true in general. A symmetric positive definite matrix has a "weighty" diagonal. The mass on the diagonal is not blatantly obvious as in the case of diagonal dominance but it has the same effect in that it precludes the need for pivoting. See Section 3.4.10.

We begin with a few comments about the property of positive definiteness and what it implies in the unsymmetric case with respect to pivoting. We then focus on the efficient organization of the Cholesky procedure which can be used to safely factor a symmetric positive definite A. Gaxpy, outer product, and block versions are developed. The section concludes with a few comments about the semidefinite case.
    A = [  e  m ] = [     1      0 ] [ e      0     ] [ 1  m/e ]    (4.2.1)
        [ -m  e ]   [  -m/e      1 ] [ 0  e + m^2/e ] [ 0   1  ]

The theorem suggests when it is safe not to pivot. Assume that the computed factors L, D, and M satisfy

    ...    (4.2.2)

where c is a constant of modest size. It follows from (4.2.1) and the analysis in Section 3.3 that if these factors are used to compute a solution to Ax = b, then the computed solution x satisfies (A + E)x = b with

    || E ||_F <= u(3n|| A ||_F + 5cn^2(|| T ||_2 + || ST^{-1}S ||_2)) + O(u^2).    (4.2.3)

If the quantity || ST^{-1}S ||_2 / || A ||_2 is not too large then it is safe not to pivot. In other words, the norm of the skew part S has to be modest relative to the condition of the symmetric part T. Sometimes it is possible to estimate this quantity in an application. This is trivially the case when A is symmetric, for then it equals zero.
Proof. From Theorem 4.1.2, there exists a unit lower triangular L and a diagonal D = diag(d_1,...,d_n) such that A = LDL^T. Since the d_k are positive, the matrix G = L diag(sqrt(d_1),...,sqrt(d_n)) is real lower triangular with positive diagonal entries. It also satisfies A = GG^T. Uniqueness follows from the uniqueness of the LDL^T factorization. □
is positive definite.
    G(j:n, j) = v(j:n)/sqrt(v(j)).
    for j = 1:n
        v(j:n) = A(j:n, j)
        for k = 1:j-1
            v(j:n) = v(j:n) - G(j, k)G(j:n, k)
        end
        G(j:n, j) = v(j:n)/sqrt(v(j))
    end
It is possible to arrange the computations so that G overwrites the lower
triangle of A.
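An in-place rendering of the gaxpy Cholesky idea (Python/NumPy sketch, added here): G overwrites the lower triangle of a copy of A.

    import numpy as np

    def chol_gaxpy(A):
        # A symmetric positive definite; returns lower triangular G
        # with A = G G^T, using only the lower triangle of A.
        G = np.tril(A.astype(float))
        n = G.shape[0]
        for j in range(n):
            # v(j:n) = A(j:n,j) - sum_k G(j,k) G(j:n,k), then scale
            G[j:, j] -= G[j:, :j] @ G[j, :j]
            G[j:, j] /= np.sqrt(G[j, j])
        return G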
    A = [ alpha  v^T ] = [ beta          0     ] [ 1        0          ] [ beta  v^T/beta ]    (4.2.6)
        [   v     B  ]   [ v/beta    I_{n-1}   ] [ 0  B - vv^T/alpha   ] [  0     I_{n-1} ]

Here, beta = sqrt(alpha) and we know that alpha > 0 because A is positive definite. Note that B - vv^T/alpha is positive definite because it is a principal submatrix of X^T AX where

    X = [ 1  -v^T/alpha ].
        [ 0    I_{n-1}  ]

If we have the Cholesky factorization G_1 G_1^T = B - vv^T/alpha, then from (4.2.6) it follows that A = GG^T with

    G = [ beta      0  ].
        [ v/beta   G_1 ]
    for k = 1:n
        A(k, k) = sqrt(A(k, k))
        A(k+1:n, k) = A(k+1:n, k)/A(k, k)
        for j = k+1:n
            A(j:n, j) = A(j:n, j) - A(j:n, k)A(j, k)
        end
    end

This algorithm involves n^3/3 flops. Note that the j-loop computes the lower triangular part of the outer product update

    A(k+1:n, k+1:n) = A(k+1:n, k+1:n) - A(k+1:n, k)A(k+1:n, k)^T.

Recalling our discussion in Section 1.4.8 about gaxpy versus outer product updates, it is easy to show that Algorithm 4.2.1 involves fewer vector touches than Algorithm 4.2.2 by a factor of two.
    A_ij = sum_{k=1}^{j} G_ik G_jk^T.

Defining

    S = A_ij - sum_{k=1}^{j-1} G_ik G_jk^T

we obtain

    for j = 1:N
        for i = j:N
            S = A_ij - sum_{k=1}^{j-1} G_ik G_jk^T
            if i = j
                Compute Cholesky factorization S = G_jj G_jj^T.
            else
                Solve G_ij G_jj^T = S for G_ij.
            end
            Overwrite A_ij with G_ij.
        end
    end

The overall process involves n^3/3 flops like the other Cholesky procedures that we have developed. The procedure is rich in matrix multiplication assuming a suitable blocking of the matrix A. For example, if n = rN and each A_ij is r-by-r, then the level-3 fraction is approximately 1 - (1/N^2).
Algorithm 4.2.3 is incomplete in the sense that we have not specified how the products G_ik G_jk^T are formed or how the r-by-r Cholesky factorizations S = G_jj G_jj^T are computed. These important details would have to be worked out carefully in order to extract high performance.

Another block procedure can be derived from the gaxpy Cholesky algorithm. After r steps of Algorithm 4.2.1 we know the matrices G_11 in R^{r x r} and G_21 in R^{(n-r) x r} in

    [ A_11  A_12 ]   [ G_11     0      ] [ I_r  0 ] [ G_11     0      ]^T
    [ A_21  A_22 ] = [ G_21  I_{n-r}   ] [ 0    Ã ] [ G_21  I_{n-r}   ]
equality

    g_ii^2 <= sum_{k=1}^{i} g_ik^2 = a_ii.

This shows that the entries in the Cholesky triangle are nicely bounded. The same conclusion can be reached from the equation || G ||_2^2 = || A ||_2.

The roundoff errors associated with the Cholesky factorization have been extensively studied in a classical paper by Wilkinson (1968). Using the results in this paper, it can be shown that if x is the computed solution to Ax = b, obtained via any of our Cholesky procedures, then x solves the perturbed system (A + E)x = b where || E ||_2 <= c_n u|| A ||_2 and c_n is a small constant depending upon n. Moreover, Wilkinson shows that if q_n u kappa_2(A) <= 1 where q_n is another small constant, then the Cholesky process runs to completion, i.e., no square roots of negative numbers arise.
Example 4.2.2 If Algorithm 4.2.2 is applied to the positive definite matrix

    A = [ 100  15    .01 ]
        [  15  2.3   .01 ]
        [ .01  .01  1.00 ]

and beta = 10, t = 2, rounded arithmetic is used, then g_11 = 10, g_21 = 1.5, g_31 = .001 and g_22 = 0.00. The algorithm then breaks down trying to compute g_32.
which holds since A(1:2, 1:2) is also semidefinite. This is a quadratic equation in x and for the inequality to hold, the discriminant 4a_12^2 - 4a_11 a_22 must be negative. Implication (4.2.10) follows from (4.2.8). □
    for k = 1:n
        if A(k, k) > 0
            A(k, k) = sqrt(A(k, k))
            A(k+1:n, k) = A(k+1:n, k)/A(k, k)
            for j = k+1:n
                A(j:n, j) = A(j:n, j) - A(j:n, k)A(j, k)    (4.2.11)
            end
        end
    end
    r = 0
    for k = 1:n
        Find q (k <= q <= n) so A(q, q) = max{A(k, k),...,A(n, n)}
        if A(q, q) > 0
            r = r + 1
            piv(k) = q
            A(k, :) <-> A(q, :)
            A(:, k) <-> A(:, q)
            A(k, k) = sqrt(A(k, k))
            A(k+1:n, k) = A(k+1:n, k)/A(k, k)
            for j = k+1:n
                A(j:n, j) = A(j:n, j) - A(j:n, k)A(j, k)
            end
        end
    end
In practice, a tolerance is used to detect small A(k, k). However, the situation is quite tricky and the reader should consult Higham (1989). In addition, Section 5.5 has a discussion of tolerances in the rank detection problem. Finally, we remark that a truly efficient implementation of Algorithm 4.2.4 would only access the lower triangular portion of A.
Problems

P4.2.1 ... is symmetric and positive definite. (b) Formulate an algorithm for solving (A + iB)(x + iy) = (b + ic), where b, c, x, and y are in R^n. It should involve 8n^3/3 flops. How much storage is required?

P4.2.2 Suppose A in R^{n x n} is symmetric and positive definite. Give an algorithm for computing an upper triangular matrix R in R^{n x n} such that A = RR^T.

P4.2.3 Let A in R^{n x n} be positive definite and set T = (A + A^T)/2 and S = (A - A^T)/2. (a) Show that || A^{-1} ||_2 <= || T^{-1} ||_2 and x^T A^{-1} x <= x^T T^{-1} x for all x in R^n. (b) Show that if A = LDM^T, then d_k >= 1/|| T^{-1} ||_2 for k = 1:n.

P4.2.4 Find a 2-by-2 real matrix A with the property that x^T Ax > 0 for all real nonzero 2-vectors x but which is not positive definite when regarded as a member of C^{2 x 2}.

P4.2.5 Suppose A in R^{n x n} has a positive diagonal. Show that if both A and A^T are strictly diagonally dominant, then A is positive definite.

P4.2.6 Show that the function f(x) = (x^T Ax)^{1/2} is a vector norm on R^n if and only if A is positive definite.

P4.2.7 Modify Algorithm 4.2.1 so that if the square root of a negative number is encountered, then the algorithm finds a unit vector x so x^T Ax < 0 and terminates.

P4.2.8 The numerical range W(A) of a complex matrix A is defined to be the set W(A) = {x^H Ax : x^H x = 1}. Show that if 0 is not in W(A), then A has an LU factorization.

P4.2.9 Formulate an m < n version of the polar decomposition for A in R^{m x n}.

P4.2.10 Suppose A = I + uu^T where A in R^{n x n} and || u ||_2 = 1. Give explicit formulae for the diagonal and subdiagonal of A's Cholesky factor.

P4.2.11 Suppose A in R^{n x n} is symmetric positive definite and that its Cholesky factor is available. Let e_k = I_n(:, k). For 1 <= i < j <= n, let alpha_ij be the smallest real that makes A + alpha(e_i e_j^T + e_j e_i^T) singular. Likewise, let alpha_ii be the smallest real that makes A + alpha e_i e_i^T singular. Show how to compute these quantities using the Sherman-Morrison-Woodbury formula. How many flops are required to find all the alpha_ij?
Symmetric positive definite systems constitute the most important class of special Ax = b problems. Algol programs for these problems are given in

J.H. Wilkinson (1968). "A Priori Error Analysis of Algebraic Processes," Proc. International Congress Math. (Moscow: Izdat. Mir, 1968), pp. 629-39.
J. Meinguet (1983). "Refined Error Analysis of Cholesky Factorization," SIAM J. Numer. Anal. 20, 1243-1250.
A. Kielbasinski (1987). "A Note on Rounding Error Analysis of Cholesky Factorization," Lin. Alg. and Its Applic. 88/89, 487-494.
N.J. Higham (1990). "Analysis of the Cholesky Decomposition of a Semidefinite Matrix," in Reliable Numerical Computation, M.G. Cox and S.J. Hammarling (eds), Oxford University Press, Oxford, UK, 161-185.
R. Carter (1991). "Y-MP Floating Point and Cholesky Factorization," Int'l J. High Speed Computing 3, 215-222.
J-Guang Sun (1992). "Rounding Error and Perturbation Bounds for the Cholesky and LDL^T Factorizations," Lin. Alg. and Its Applic. 173, 77-97.

Nearness/sensitivity issues associated with positive semi-definiteness and the polar decomposition are presented in

Computationally-oriented references for the polar decomposition and the square root are given in Sections 8.6 and 11.2 respectively.
    L = [  1    0  ]        and        U = [ a  w^T ]
        [ v/a  L_1 ]                       [ 0  U_1 ]

have the desired bandwidth properties and satisfy A = LU. []

for k = 1:n-1
    for i = k+1:min(k+p,n)
        A(i,k) = A(i,k)/A(k,k)
    end
    for j = k+1:min(k+q,n)
        for i = k+1:min(k+p,n)
            A(i,j) = A(i,j) - A(i,k)A(k,j)
        end
    end
end

If n > p and n > q then this algorithm involves about 2npq flops. Band
versions of Algorithm 4.1.1 (LDM^T) and all the Cholesky procedures also
exist, but we leave their formulation to the exercises.
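The band elimination above translates directly into NumPy. The following is a minimal sketch (our own function name, no pivoting, dense storage for clarity; a serious implementation would use a band data structure):

    import numpy as np

    def band_lu(A, p, q):
        # Sketch of band Gaussian elimination: A has lower bandwidth p
        # and upper bandwidth q. L's multipliers overwrite the strictly
        # lower part of A; U overwrites the upper part.
        A = np.array(A, dtype=float)
        n = A.shape[0]
        for k in range(n - 1):
            rows = slice(k + 1, min(k + p, n - 1) + 1)
            cols = slice(k + 1, min(k + q, n - 1) + 1)
            A[rows, k] /= A[k, k]                            # multipliers
            A[rows, cols] -= np.outer(A[rows, k], A[k, cols])
        return A

Because the update only touches a p-by-q window at each step, the cost is the 2npq flops quoted above rather than the 2n^3/3 of full elimination.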
Given a nonsingular upper triangular U in R^{n x n} with upper bandwidth q and
b in R^n, the following algorithm overwrites b with the solution to Ux = b.

for j = n:-1:1
    b(j) = b(j)/U(j,j)
    for i = max(1,j-q):j-1
        b(i) = b(i) - U(i,j)b(j)
    end
end

If n > q then this algorithm requires about 2nq flops.
Theorem 4.3.2 Suppose A in R^{n x n} is nonsingular and has upper and lower
bandwidths q and p, respectively. If Gaussian elimination with partial piv-
oting is used to compute Gauss transformations M_j = I - a^(j) e_j^T, j = 1:n-1,
and permutations P_1, ..., P_{n-1} such that M_{n-1}P_{n-1} ... M_1P_1 A = U is upper
triangular, then U has upper bandwidth p + q and a_i^(j) = 0 whenever i <= j
or i > j + p.

4.3.4 Hessenberg LU
As an example of an unsymmetric band matrix computation, we show how
Gaussian elimination with partial pivoting can be applied to factor an upper
Hessenberg matrix H. (Recall that if H is upper Hessenberg then h_{ij} = 0,
i > j + 1.) After k - 1 steps of Gaussian elimination with partial pivoting
we are left with a matrix of the following form:
    [ x x x x x ]
    [ 0 x x x x ]
    [ 0 0 x x x ]        (k = 3, n = 5)
    [ 0 0 x x x ]
    [ 0 0 0 x x ]

By virtue of the special structure of this matrix, we see that the next
permutation, P_3, is either the identity or the identity with rows 3 and 4
interchanged. Moreover, the next Gauss transformation M_3 has a single
nonzero multiplier in the (4,3) position. This illustrates the kth step
of the following algorithm.

Given an upper Hessenberg H in R^{n x n}, the following algorithm overwrites
H with its upper triangular reduction via Gaussian elimination with partial
pivoting:

for k = 1:n-1
    if |H(k+1,k)| > |H(k,k)|
        H(k, k:n) <-> H(k+1, k:n)
    end
    if H(k,k) != 0
        t = H(k+1,k)/H(k,k)
        H(k+1, k+1:n) = H(k+1, k+1:n) - t*H(k, k+1:n)
        H(k+1, k) = t
    end
end

This algorithm requires n^2 flops. We next turn to the band Cholesky
factorization: given a symmetric positive definite A in R^{n x n} with
bandwidth p, the following algorithm computes a lower triangular G with
lower bandwidth p such that A = GG^T, with G(i,j) overwriting A(i,j) for
i >= j.
for j = 1:n
    for k = max(1,j-p):j-1
        lambda = min(k+p, n)
        A(j:lambda,j) = A(j:lambda,j) - A(j,k)A(j:lambda,k)
    end
    lambda = min(j+p, n)
    A(j:lambda,j) = A(j:lambda,j)/sqrt(A(j,j))
end

If n > p then this algorithm requires about n(p^2 + 3p) flops and n square
roots. Of course, in a serious implementation an appropriate data structure
for A should be used. For example, if we just store the nonzero lower
triangular part, then a (p + 1)-by-n array would suffice. (See §1.2.6.)
If our band Cholesky procedure is coupled with appropriate band trian-
gular solve routines then approximately np^2 + 7np + 2n flops and n square
roots are required to solve Ax = b. For small p it follows that the square
roots represent a significant portion of the computation and it is prefer-
able to use the LDL^T approach. Indeed, a careful flop count of the steps
A = LDL^T, Ly = b, Dz = y, and L^T x = z reveals that np^2 + 8np + n flops
and no square roots are needed.
As a sample narrow band LDL^T solution procedure, consider the symmetric
positive definite tridiagonal case. Setting

    L = [ 1                    ]
        [ e_1  1               ]
        [      .    .          ]
        [        e_{n-1}   1   ]

and D = diag(d_1, ..., d_n), we deduce from the equation A = LDL^T that:

    a_{11}    = d_1
    a_{k,k-1} = e_{k-1} d_{k-1},                                   k = 2:n
    a_{kk}    = d_k + e_{k-1}^2 d_{k-1} = d_k + e_{k-1} a_{k,k-1},  k = 2:n

Thus, the d_k and e_k can be computed as follows:

    d_1 = a_{11}
    for k = 2:n
        e_{k-1} = a_{k,k-1}/d_{k-1};  d_k = a_{kk} - e_{k-1} a_{k,k-1}
    end

To obtain the solution to Ax = b we solve Ly = b, Dz = y, and L^T x = z.
With overwriting we obtain

Algorithm 4.3.6 Given an n-by-n symmetric, tridiagonal, positive definite
matrix A and b in R^n, the following algorithm overwrites b with the solu-
tion to Ax = b. It is assumed that the diagonal of A is stored in d(1:n) and
the superdiagonal in e(1:n-1).
for k = 2:n
    t = e(k-1); e(k-1) = t/d(k-1); d(k) = d(k) - t e(k-1)
end
for k = 2:n
    b(k) = b(k) - e(k-1)b(k-1)
end
b(n) = b(n)/d(n)
for k = n-1:-1:1
    b(k) = b(k)/d(k) - e(k)b(k+1)
end
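Here is a minimal NumPy sketch of Algorithm 4.3.6 (the function name is ours; d, e, and b are overwritten exactly as in the text):

    import numpy as np

    def spd_tridiag_solve(d, e, b):
        # Sketch: solve Ax = b for symmetric positive definite tridiagonal A
        # with diagonal d(0:n-1) and superdiagonal e(0:n-2) via A = L D L^T.
        d, e, b = map(lambda v: np.array(v, dtype=float), (d, e, b))
        n = len(d)
        for k in range(1, n):                  # factorization
            t = e[k-1]
            e[k-1] = t / d[k-1]
            d[k] -= t * e[k-1]
        for k in range(1, n):                  # Ly = b
            b[k] -= e[k-1] * b[k-1]
        b[n-1] /= d[n-1]                       # Dz = y and L^T x = z
        for k in range(n - 2, -1, -1):
            b[k] = b[k] / d[k] - e[k] * b[k+1]
        return b

The whole computation is O(n) with no square roots, which is the point of preferring LDL^T to Cholesky in the tridiagonal case.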
Suppose we must solve the unit lower bidiagonal systems A^(k) x^(k) = b^(k),
k = 1:m, and that m > n. Suppose we have arrays E(1:n-1, 1:m) and B(1:n, 1:m)
with the property that E(1:n-1, k) houses the subdiagonal of A^(k) and
B(1:n, k) houses the kth right hand side b^(k). We can overwrite b^(k) with
the solution x^(k) as follows:

for k = 1:m
    for i = 2:n
        B(i,k) = B(i,k) - E(i-1,k)B(i-1,k)
    end
end

The problem with this algorithm, which sequentially solves each bidiagonal
system in turn, is that the inner loop does not vectorize. This is because
of the dependence of B(i,k) on B(i-1,k). If we interchange the k and i
loops we get

for i = 2:n
    for k = 1:m
        B(i,k) = B(i,k) - E(i-1,k)B(i-1,k)        (4.3.1)
    end
end

Now the inner loop vectorizes well as it involves a vector multiply and a
vector add. Unfortunately, (4.3.1) is not a unit stride procedure. However,
this problem is easily rectified if we store the subdiagonals and right-hand
sides by row. That is, we use the arrays E(1:m, 1:n-1) and B(1:m, 1:n)
and store the subdiagonal of A^(k) in E(k, 1:n-1) and b^(k)^T in B(k, 1:n).
The computation (4.3.1) then transforms to

for i = 2:n
    for k = 1:m
        B(k,i) = B(k,i) - E(k,i-1)B(k,i-1)
    end
end

illustrating once again the effect of data structure on performance.
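A minimal NumPy sketch of the unit-stride, vectorized formulation (array layout and function name as assumed in the text, by row):

    import numpy as np

    def bidiag_solve_by_row(E, B):
        # Sketch: E is m-by-(n-1), with E[k,:] the subdiagonal of the kth
        # unit lower bidiagonal system; B is m-by-n with B[k,:] the kth
        # right hand side. B is overwritten with the m solutions.
        m, n = B.shape
        for i in range(1, n):
            # one vector multiply and one vector add across all m systems
            B[:, i] -= E[:, i-1] * B[:, i-1]
        return B

Each pass of the loop reads and writes contiguous columns, so the inner work is a single stride-one vector operation over all m systems at once.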
Problems

P4.3.1 Derive a banded LDM^T procedure similar to Algorithm 4.3.1.

P4.3.2 Show how the output of Algorithm 4.3.4 can be used to solve the upper Hes-
senberg system Hx = b.

P4.3.3 Give an algorithm for solving an unsymmetric tridiagonal system Ax = b that
uses Gaussian elimination with partial pivoting. It should require only four n-vectors of
floating point storage for the factorization.

P4.3.4 For C in R^{n x n} define the profile indices m(C,i) = min{ j : c_{ij} != 0 }, where
i = 1:n. Show that if A = GG^T is the Cholesky factorization of A, then m(A,i) =
m(G,i) for i = 1:n. (We say that G has the same profile as A.)

P4.3.5 Suppose A in R^{n x n} is symmetric positive definite with profile indices m_i =
m(A,i) where i = 1:n. Assume that A is stored in a one-dimensional array v as follows:
v = (a_{11}, a_{2,m_2}, ..., a_{22}, a_{3,m_3}, ..., a_{33}, ..., a_{n,m_n}, ..., a_{nn}). Write an algorithm that
overwrites v with the corresponding entries of the Cholesky factor G and then uses this
factorization to solve Ax = b. How many flops are required?

P4.3.6 For C in R^{n x n} define p(C,i) = max{ j : c_{ij} != 0 }. Suppose that A in R^{n x n} has an
LU factorization A = LU and that

    m(A,1) <= m(A,2) <= ... <= m(A,n)
    p(A,1) <= p(A,2) <= ... <= p(A,n)

Show that m(A,i) = m(L,i) and p(A,i) = p(U,i) for i = 1:n. Recall the definition of
m(A,i) from P4.3.4.

P4.3.7 Develop a gaxpy version of Algorithm 4.3.1.

P4.3.8 Develop a unit stride, vectorizable algorithm for solving the symmetric positive
definite tridiagonal systems A^(k) x^(k) = b^(k), k = 1:m. Assume that the diagonals, superdiagonals,
and right hand sides are stored by row in arrays D, E, and B and that b^(k) is overwritten
with x^(k).

P4.3.9 Develop a version of Algorithm 4.3.1 in which A is stored by diagonal.

P4.3.10 Give an example of a 3-by-3 symmetric positive definite matrix whose tridiag-
onal part is not positive definite.

P4.3.11 Consider the Ax = b problem where

    A = [  2 -1  0  ...  0 -1 ]
        [ -1  2 -1        0   ]
        [  0 -1  2  .         ]
        [          .    .     ]
        [  0         2  -1    ]
        [ -1  0 ... 0 -1  2   ]

This kind of matrix arises in boundary value problems with periodic boundary conditions.
(a) Show A is singular. (b) Give conditions that b must satisfy for there to exist a solution
and specify an algorithm for solving it. (c) Assume that n is even and consider the
permutation

    P = [ e_1, e_3, ..., e_{n-1}, e_2, e_4, ..., e_n ]

where e_k is the kth column of I_n. Describe the permuted system P^T AP(P^T x) = P^T b
and show how to solve it. Assume that there is a solution and ignore pivoting.
R.S. Martin and J.H. Wilkinson (1965). "Symmetric Decomposition of Positive Definite
Band Matrices," Numer. Math. 7, 355-61.
R.S. Martin and J.H. Wilkinson (1967). "Solution of Symmetric and Unsymmetric Band
Equations and the Calculation of Eigenvalues of Band Matrices," Numer. Math. 9,
279-301.
E.L. Allgower (1973). "Exact Inverses of Certain Band Matrices," Numer. Math. 21,
279-84.
Z. Bohte (1975). "Bounds for Rounding Errors in the Gaussian Elimination for Band
Systems," J. Inst. Math. Applic. 16, 133-42.
I.S. Duff (1977). "A Survey of Sparse Matrix Research," Proc. IEEE 65, 500-535.
N.J. Higham (1990). "Bounding the Error in Gaussian Elimination for Tridiagonal
Systems," SIAM J. Matrix Anal. Appl. 11, 521-530.

A topic of considerable interest in the area of banded matrices deals with methods for
reducing the width of the band. See

C. Fischer and R.A. Usmani (1969). "Properties of Some Tridiagonal Matrices and Their
Application to Boundary Value Problems," SIAM J. Num. Anal. 6, 127-42.
D.J. Rose (1969). "An Algorithm for Solving a Special Class of Tridiagonal Systems of
Linear Equations," Comm. ACM 12, 234-36.
H.S. Stone (1973). "An Efficient Parallel Algorithm for the Solution of a Tridiagonal
Linear System of Equations," J. ACM 20, 27-38.
M.A. Malcolm and J. Palmer (1974). "A Fast Method for Solving a Class of Tridiagonal
Systems of Linear Equations," Comm. ACM 17, 14-17.
J. Lambiotte and R.G. Voigt (1975). "The Solution of Tridiagonal Linear Systems on
the CDC-STAR 100 Computer," ACM Trans. Math. Soft. 1, 308-29.
H.S. Stone (1975). "Parallel Tridiagonal Equation Solvers," ACM Trans. Math. Soft. 1,
289-307.
D. Kershaw (1982). "Solution of Single Tridiagonal Linear Systems and Vectorization of
the ICCG Algorithm on the Cray-1," in G. Rodrigue (ed), Parallel Computations,
Academic Press, NY, 1982.
N.J. Higham (1986). "Efficient Algorithms for Computing the Condition Number of a
Tridiagonal Matrix," SIAM J. Sci. and Stat. Comp. 7, 150-165.

Chapter 4 of George and Liu (1981) contains a nice survey of band methods for positive
definite systems.
    A = P [ e_1  1   ] P^T
          [ 1    e_2 ]

has small diagonal entries and large numbers surface in the factorization.
With symmetric pivoting, the pivots are always selected from the diagonal
and trouble results if these numbers are small relative to what must be
zeroed off the diagonal. Thus, LDL^T with symmetric pivoting cannot be
recommended as a reliable approach to symmetric indefinite system solving.
It seems that the challenge is to involve the off-diagonal entries in the
pivoting process while at the same time maintaining symmetry.
In this section we discuss two ways to do this. The first method is due
to Aasen (1971) and it computes the factorization

    PAP^T = LTL^T        (4.4.1)

where L = (l_{ij}) is unit lower triangular and T is tridiagonal. P is a permu-
tation chosen such that |l_{ij}| <= 1. In contrast, the diagonal pivoting method
due to Bunch and Parlett (1971) computes a permutation P such that

    PAP^T = LDL^T        (4.4.2)

where D is a direct sum of 1-by-1 and 2-by-2 pivot blocks.
To motivate Aasen's method, suppose n = 5 and that after one step A has
been transformed to

    A^(1) = M_1 P_1 A P_1^T M_1^T = [ a_1  b_1  0   0   0  ]
                                    [ b_1  x    x   x   x  ]
                                    [ 0    v_3  x   x   x  ]
                                    [ 0    v_4  x   x   x  ]
                                    [ 0    v_5  x   x   x  ]

where P_1 is a permutation chosen so that the entries in the Gauss trans-
formation M_1 are bounded by unity in modulus. Scanning the vector
(v_3 v_4 v_5)^T for its largest entry, we now determine a 3-by-3 permutation P~_2
that brings this entry to the top. If P_2 = diag(I_2, P~_2) and M_2 is the
corresponding Gauss transformation, then

    A^(2) = M_2 P_2 A^(1) P_2^T M_2^T = [ a_1  b_1  0    0   0 ]
                                        [ b_1  a_2  b_2  0   0 ]
                                        [ 0    b_2  x    x   x ]
                                        [ 0    0    x    x   x ]
                                        [ 0    0    x    x   x ]

Continuing in this way we obtain PAP^T = LTL^T with

    L = (M_{n-2}P_{n-2} ... M_1P_1P^T)^{-1}.

Analysis of L reveals that its first column is e_1 and that its subdiagonal
entries in column k with k > 1 are "made up" of the multipliers in M_{k-1}.
The efficient implementation of the Parlett-Reid method requires care
when computing the update

    A^(k+1) = M_k P_k A^(k) P_k^T M_k^T.        (4.4.3)

To see this, let B denote the trailing (n-k)-by-(n-k) symmetric block that
is modified during the kth step and let w be the associated vector of
multipliers. If

    u = Be_1 - (b_{11}/2)w,

then the lower half of the symmetric matrix B_+ = B - wu^T - uw^T can
be formed in 2(n-k)^2 flops. Summing this quantity as k ranges from 1
to n-2 indicates that the Parlett-Reid procedure requires 2n^3/3 flops,
twice what we would like.
Example 4.4.1 If the Parlett-Reid algorithm is applied to

    A = [ 0 1 2 3 ]
        [ 1 2 2 2 ]
        [ 2 2 3 3 ]
        [ 3 2 3 4 ]

then

    P_1 = [ e_1 e_4 e_3 e_2 ]
    M_1 = I_4 - (0, 0, 2/3, 1/3)^T e_2^T
    P_2 = [ e_1 e_2 e_4 e_3 ]
    M_2 = I_4 - (0, 0, 0, 1/2)^T e_3^T

and PAP^T = LTL^T, where P = [ e_1, e_3, e_4, e_2 ] and T is tridiagonal.
For clarity, we temporarily ignore pivoting and assume that the factoriza-
tion A = LTL^T exists where L is unit lower triangular with L(:,1) = e_1.
Aasen's method is organized as follows:

for j = 1:n
    Compute h(1:j) where h = TL^T e_j = He_j.
    Compute a(j).
    if j <= n-1
        Compute b(j).        (4.4.4)
    end
    if j <= n-2
        Compute L(j+2:n, j+1).
    end
end

Thus, the mission of the jth Aasen step is to compute the jth column of
T and the (j+1)-st column of L. The algorithm exploits the fact that the
matrix H = TL^T is upper Hessenberg. As can be deduced from (4.4.4),
the computation of a(j), b(j), and L(j+2:n, j+1) hinges upon the vector
h(1:j) = H(1:j,j). Let us see why.
Consider the jth column of the equation A = LH. Since H is upper
Hessenberg, A(j+1:n, j) = L(j+1:n, 1:j)h(1:j) + L(j+1:n, j+1)h(j+1). Thus,
if

    v(j+1:n) = A(j+1:n, j) - L(j+1:n, 1:j)h(1:j)

then

    v(j+1) = h(j+1)

and so from that same equation we obtain the following recipe for the
(j+1)-st column of L:

    L(j+2:n, j+1) = v(j+2:n)/v(j+1),        b(j) = v(j+1).

With these recipes we can completely describe the Aasen procedure:
for j = 1:n
    Compute h(1:j) where h = TL^T e_j.
    if j = 1 or j = 2
        a(j) = h(j)
    else
        a(j) = h(j) - b(j-1)L(j,j-1)
    end
    if j <= n-1                                       (4.4.7)
        v(j+1:n) = A(j+1:n,j) - L(j+1:n, 1:j)h(1:j)
        b(j) = v(j+1)
    end
    if j <= n-2
        L(j+2:n, j+1) = v(j+2:n)/v(j+1)
    end
end
From A = LH we also have A(1:j, j) = L(1:j, 1:j)h(1:j). This lower
triangular system could be solved for h(1:j) since we know the first
j columns of L. However, a much more efficient way to compute H(1:j,j)
follows from the equation H = TL^T.
Collecting results and using a work array l(1:n) for L(j,1:j) we see that
the computation of h(1:j) in (4.4.7) can be organized as follows:

if j = 1
    h(1) = A(1,1)
elseif j = 2
    h(1) = b(1); h(2) = A(2,2)                            (4.4.9)
else
    l(0) = 0; l(1) = 0; l(2:j-1) = L(j, 2:j-1); l(j) = 1
    h(j) = A(j,j)
    for k = 1:j-1
        h(k) = b(k-1)l(k-1) + a(k)l(k) + b(k)l(k+1)
        h(j) = h(j) - l(k)h(k)
    end
end
Note that with this O(j) method for computing h(1:j), the gaxpy calcula-
tion of v(j+1:n) is the dominant operation in (4.4.7). During the jth step
this gaxpy involves about 2j(n-j) flops. Summing this for j = 1:n shows
that Aasen's method requires n^3/3 flops. Thus, the Aasen and Cholesky
algorithms entail the same amount of arithmetic.
To stabilize the process, pivoting is incorporated: at each step the largest
entry of v(j+1:n) is permuted into position j+1 so that |l_{ij}| <= 1. The
permutation is recorded in an integer vector piv:

for j = 1:n
    Compute h(1:j) via (4.4.9).
    if j = 1 or j = 2
        a(j) = h(j)
    else
        a(j) = h(j) - b(j-1)L(j,j-1)
    end
    if j <= n-1
        v(j+1:n) = A(j+1:n,j) - L(j+1:n, 1:j)h(1:j)
        Find q so |v(q)| = || v(j+1:n) ||_inf with j+1 <= q <= n.
        piv(j) = q; v(j+1) <-> v(q); L(j+1, 2:j) <-> L(q, 2:j)
        A(j+1, j+1:n) <-> A(q, j+1:n)
        A(j+1:n, j+1) <-> A(j+1:n, q)
        b(j) = v(j+1)
    end
    if j <= n-2
        L(j+2:n, j+1) = v(j+2:n)
        if v(j+1) != 0
            L(j+2:n, j+1) = L(j+2:n, j+1)/v(j+1)
        end
    end
end
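The following NumPy sketch implements the unpivoted recursion (4.4.4)-(4.4.7); it is a hypothetical simplification of the pivoted algorithm above, and for clarity it forms h with a slice product rather than the O(j) recurrence (4.4.9):

    import numpy as np

    def aasen_nopivot(A):
        # Sketch: A = L T L^T with L unit lower triangular (L[:,0] = e_1)
        # and T tridiagonal. No pivoting, so a zero v[0] would break it.
        n = A.shape[0]
        L, T = np.eye(n), np.zeros((n, n))
        beta = np.zeros(n)                        # beta[j] = T[j+1, j]
        for j in range(n):
            h = np.empty(j + 1)                   # h = H(0:j, j), H = T L^T
            h[:j] = T[:j, :j+1] @ L[j, :j+1]      # rows 0..j-1 of T are done
            h[j] = A[j, j] - L[j, :j] @ h[:j]
            T[j, j] = h[j] if j == 0 else h[j] - beta[j-1] * L[j, j-1]
            if j <= n - 2:
                v = A[j+1:, j] - L[j+1:, :j+1] @ h
                beta[j] = T[j+1, j] = T[j, j+1] = v[0]
                if j <= n - 3:
                    L[j+2:, j+1] = v[1:] / v[0]   # assumes v[0] != 0
        return L, T

On a small symmetric test matrix one can verify np.allclose(L @ T @ L.T, A); the pivoted version differs only by the row/column swaps recorded in piv.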
Aasen's method is stable in the same sense that Gaussian elimination with
partial pivoting is stable. That is, the exact factorization of a matrix near
A is obtained provided || T^ ||_2 / || A ||_2 ~ 1, where T^ is the computed version
of the tridiagonal matrix T. In general, this is almost always the case.
In a practical implementation of the Aasen algorithm, the lower trian-
gular portion of A would be overwritten with L and T. Here is the n = 5
case:
    [ a_1                       ]
    [ b_1   a_2                 ]
    [ l_32  b_2   a_3           ]
    [ l_42  l_43  b_3   a_4     ]
    [ l_52  l_53  l_54  b_4  a_5 ]

We now turn to the diagonal pivoting method, which computes the factor-
ization (4.4.2). At the first step a permutation P_1 and an integer s (1 or 2)
are determined such that

    P_1 A P_1^T = [ E    C^T ]   s
                  [ C    B   ]   n-s

with E nonsingular. The permuted matrix is then factored:

    P_1 A P_1^T = [ I_s       0       ] [ E   0                ] [ I_s   E^{-1}C^T ]
                  [ CE^{-1}   I_{n-s} ] [ 0   B - CE^{-1}C^T   ] [ 0     I_{n-s}   ]

and the process is repeated on the Schur complement B - CE^{-1}C^T.
For the sake of stability, the s-by-s "pivot" E should be chosen so that the
entries in

    A~ = B - CE^{-1}C^T        (4.4.10)

are suitably bounded. To this end, let a in (0,1) be given and define the
size measures

    mu_0 = max_{i,j} |a_{ij}|,        mu_1 = max_i |a_{ii}|.

It can be shown that if an s = 1 pivot is used then the entries of (4.4.10)
grow as in (4.4.11), while s = 2 implies (4.4.12).
Example 4.4.2 If the pivot strategy is applied to

    A = [ 1  10  20 ]
        [ 10  1  30 ]
        [ 20 30   1 ]

then in the first step lambda = 20, r = 3, sigma = 30, and p = 2. The permutation
P = [ e_3 e_2 e_1 ] is applied giving

    PAP^T = [ 1  30  20 ]
            [ 30  1  10 ]
            [ 20 10   1 ]

A 2-by-2 pivot is then used to produce the reduction

    PAP^T = [ 1      0      0 ] [ 1   30    0      ] [ 1  0  .3115 ]
            [ 0      1      0 ] [ 30   1    0      ] [ 0  1  .6563 ]
            [ .3115  .6563  1 ] [ 0    0  -11.79   ] [ 0  0  1     ]
Consider a matrix of the form

    A = [ C    B ]
        [ B^T  0 ]        (4.4.13)

where C is symmetric positive definite and B has full column rank. These
conditions ensure that A is nonsingular.
Of course, the methods of this section apply to A. However, they do not
exploit its structure because the pivot strategies "wipe out" the zero (2,2)
block. On the other hand, here is a tempting approach that does exploit
A's block structure:

(a) Compute the Cholesky factorization of C, C = GG^T.
(b) Solve GK = B for K in R^{n x p}.
(c) Compute the Cholesky factorization of K^T K = B^T C^{-1} B, HH^T = K^T K.

However, it is clear by considering steps (b) and (c) above that the accuracy
of the computed solution depends upon kappa(C) and this quantity may be
much greater than kappa(A). The situation has been carefully analyzed and
various structure-exploiting algorithms have been proposed. A brief review
of the literature is given at the end of the section.
But before we close it is interesting to consider a special case of the
equilibrium system

    [ C    B ] [ x ]   [ f ]
    [ B^T  0 ] [ y ] = [ g ]        (4.4.14)

that clarifies what it means for an algorithm to be stable and illustrates
how perturbation analysis can structure the search for better methods.
In several important applications, g = 0, C is diagonal, and the solution
subvector y is of primary importance. A manipulation of (4.4.14) shows
that this vector is specified by

    (B^T C^{-1} B) y = B^T C^{-1} f.        (4.4.15)

Looking at this we are again led to believe that kappa(C) should have a bearing
on the accuracy of the computed y. However, it can be shown that the
sensitivity of y does not in fact depend on the diagonal scaling C; this is
the content of (4.4.16).
One stable approach chooses V so that B^T C^{-1} V = 0. The system

    [ B  V ] [ y ]
             [ q ] = f

is then solved, implying f = By + Vq. Thus, B^T C^{-1} f = B^T C^{-1} B y and
(4.4.15) holds.
Problems

P4.4.1 Show that if all the 1-by-1 and 2-by-2 principal submatrices of an n-by-n
symmetric matrix A are singular, then A is zero.

P4.4.2 Show that no 2-by-2 pivots can arise in the Bunch-Kaufman algorithm if A is
positive definite.

P4.4.3 Arrange Algorithm 4.4.1 so that only the lower triangular portion of A is
referenced and so that a(j) overwrites A(j,j) for j = 1:n, b(j) overwrites A(j+1,j) for
j = 1:n-1, and L(i,j) overwrites A(i,j-1) for j = 2:n-1 and i = j+1:n.

P4.4.4 Suppose A in R^{n x n} is nonsingular, symmetric, and strictly diagonally dominant.
Give an algorithm that computes a factorization of Pi A Pi^T in which R in R^{k x k} and
M in R^{(n-k) x (n-k)} are lower triangular and nonsingular and Pi is a permutation.

P4.4.5 Show how to compute a symmetric indefinite factorization in which

    D = [ D_1   0   ]
        [ 0    -D_2 ]

where D_1 in R^{k x k} and D_2 in R^{p x p} have positive diagonal entries.

P4.4.6 Prove (4.4.11) and (4.4.12).

P4.4.7 Show that -(B^T C^{-1} B)^{-1} is the (2,2) block of A^{-1} where A is given by (4.4.13).

P4.4.8 The point of this problem is to consider a special case of (4.4.15). Suppose

    C = I_n + a e_k e_k^T,   a > -1,

and e_k = I_n(:,k). (Note that C is just the identity with a added to the (k,k) entry.)
Assume that B in R^{n x p} has rank p and examine the dependence of the solution of
(4.4.15) on a.
J.R. Bunch and B.N. Parlett (1971). "Direct Methods for Solving Symmetric Indefinite
Systems of Linear Equations," SIAM J. Num. Anal. 8, 639-55.
J.R. Bunch (1971). "Analysis of the Diagonal Pivoting Method," SIAM J. Num. Anal.
8, 656-680.
J.R. Bunch (1974). "Partial Pivoting Strategies for Symmetric Matrices," SIAM J. Num.
Anal. 11, 521-528.
J.R. Bunch, L. Kaufman, and B.N. Parlett (1976). "Decomposition of a Symmetric
Matrix," Numer. Math. 27, 95-109.
J.R. Bunch and L. Kaufman (1977). "Some Stable Methods for Calculating Inertia and
Solving Symmetric Linear Systems," Math. Comp. 31, 162-79.
I.S. Duff, N.I.M. Gould, J.K. Reid, J.A. Scott, and K. Turner (1991). "The Factorization
of Sparse Symmetric Indefinite Matrices," IMA J. Num. Anal. 11, 181-204.
Before using any symmetric Ax = b solver, it may be advisable to equilibrate A. An
O(n^2) algorithm for accomplishing this task is given in

J.R. Bunch (1971). "Equilibration of Symmetric Matrices in the Max-Norm," J. ACM 18, 566-572.
Analogues of the symmetric indefinite solvers that we have presented exist for skew-
symmetric systems. See

J.R. Bunch (1982). "A Note on the Stable Decomposition of Skew-Symmetric Matrices,"
Math. Comp. 38, 475-480.
The equilibrium system literature is scattered among the several application areas where
it has an important role to play. Nice overviews with pointers to this literature include

G. Strang (1988). "A Framework for Equilibrium Equations," SIAM Review 30, 283-297.
S.A. Vavasis (1994). "Stable Numerical Algorithms for Equilibrium Systems," SIAM J.
Matrix Anal. Appl. 15, 1108-1131.

Some of these papers make use of the QR factorization and other least squares ideas
that are discussed in the next chapter and §12.1.
Problems with structure abound in matrix computations and perturbation theory
has a key role to play in the search for stable, efficient algorithms. For equilibrium sys-
tems, there are several results like (4.4.15) that underpin the most effective algorithms.
See
G.W. Stewart (1989). "On Scaled Projections and Pseudoinverses," Lin. Alg. and Its
Applic. 112, 189-193.
D.P. O'Leary (1990). "On Bounds for Scaled Projections and Pseudoinverses," Lin. Alg.
and Its Applic. 132, 115-117.
M.J. Todd (1990). "A Dantzig-Wolfe-like Variant of Karmarkar's Interior-Point Linear
Programming Algorithm," Operations Research 38, 1006-1018.
    [ D_1  F_1            0     ] [ x_1 ]   [ b_1 ]
    [ E_1  D_2    .             ] [ x_2 ]   [ b_2 ]
    [        .     .   F_{n-1}  ] [  :  ] = [  :  ]        (4.5.1)
    [ 0      E_{n-1}   D_n      ] [ x_n ]   [ b_n ]

Here we assume that all blocks are q-by-q and that the x_i and b_i are in
R^q. In this section we discuss both a block LU approach to this problem as
well as a divide and conquer scheme known as cyclic reduction. Kronecker
product systems are briefly mentioned.
Define the leading principal block submatrices

    A_k = [ D_1  F_1           0    ]
          [ E_1  D_2    .           ]
          [         .    .  F_{k-1} ]        k = 1:n.
          [ 0      E_{k-1}   D_k    ]

Comparing blocks in

    A = [ I                ] [ U_1  F_1           ]
        [ L_1  I           ] [      U_2   .       ]        (4.5.3)
        [       .    .     ] [          . F_{n-1} ]
        [ 0    L_{n-1}  I  ] [ 0          U_n     ]

we obtain the procedure

    U_1 = D_1
    for i = 2:n
        Solve L_{i-1}U_{i-1} = E_{i-1} for L_{i-1}.        (4.5.4)
        U_i = D_i - L_{i-1}F_{i-1}
    end

The procedure is defined so long as the U_i are nonsingular. This is assured,
for example, if the matrices A_1, ..., A_n are nonsingular.
Having computed the factorization (4.5.3), the vector x in (4.5.1) can
be obtained via block forward and back substitution:

    y_1 = b_1
    for i = 2:n
        y_i = b_i - L_{i-1}y_{i-1}
    end                                                   (4.5.5)
    Solve U_n x_n = y_n for x_n.
    for i = n-1:-1:1
        Solve U_i x_i = y_i - F_i x_{i+1} for x_i.
    end

To carry out both (4.5.4) and (4.5.5), each U_i must be factored since linear
systems involving these submatrices are solved. This could be done using
Gaussian elimination with pivoting. However, this does not guarantee the
stability of the overall process. To see this just consider the case when the
block size q is unity.
Suppose A is block diagonally dominant in the sense that

    || D_i^{-1} ||_1 ( || F_{i-1} ||_1 + || E_i ||_1 ) <= 1,   i = 1:n,   E_n = F_0 = 0.    (4.5.6)

Then the factorization (4.5.3) exists and it is possible to show that the L_i
and U_i satisfy the inequalities

    || L_i ||_1 <= 1        (4.5.7)
    || U_i ||_1 <= || A ||_1        (4.5.8)

Note that the blocks in (4.5.1) may themselves be banded. To illustrate,
suppose n = 2, the D_i are diagonal, and E_1 and F_1 are tridiagonal. Applying
(4.5.4) and (4.5.5) to

    [ D_1  F_1 ] [ x_1 ]   [ b_1 ]
    [ E_1  D_2 ] [ x_2 ] = [ b_2 ]        (4.5.9)

gives

    U_1 = D_1                (diagonal)
    L_1 = E_1 U_1^{-1}       (tridiagonal)
    U_2 = D_2 - L_1 F_1      (pentadiagonal)
    y_1 = b_1
    y_2 = b_2 - E_1(D_1^{-1} y_1)
    U_2 x_2 = y_2
    D_1 x_1 = y_1 - F_1 x_2
Consequently, some very simple n-by-n calculations with the original banded
blocks render the solution.
On the other hand, the naive application of band Gaussian elimination
to the system (4.5.9) would entail a great deal of unnecessary work and
storage as the system has bandwidth n + 1. However, we mention that by
permuting the rows and columns of the system via the permutation

    P = [ e_1, e_{n+1}, e_2, e_{n+2}, ..., e_n, e_{2n} ]        (4.5.10)

a narrow banded system results. In the n = 5 case,
4.5. BLOCK SYSTEMS 177
            [ x x 0 x 0 0 0 0 0 0 ]
            [ x x x 0 0 0 0 0 0 0 ]
            [ 0 x x x 0 x 0 0 0 0 ]
            [ x 0 x x x 0 0 0 0 0 ]
    PAP^T = [ 0 0 0 x x x 0 x 0 0 ]
            [ 0 0 x 0 x x x 0 0 0 ]
            [ 0 0 0 0 0 x x x 0 x ]
            [ 0 0 0 0 x 0 x x x 0 ]
            [ 0 0 0 0 0 0 0 x x x ]
            [ 0 0 0 0 0 0 x 0 x x ]
This matrix has upper and lower bandwidth equal to three and so a very
reasonable solution procedure results by applying band Gaussian elimina-
tion to this permuted version of A.
The subject of bandwidth-reducing permutations is important. See
George and Liu (1981, Chapter 4). We also refer the reader to Varah
(1972) and George (1974) for further details concerning the solution of block
tridiagonal systems.
We next describe the method of block cyclic reduction for solving Ax = b
where

    A = [ D  F        0 ]
        [ F  D   .      ]
        [     .    .  F ]  in R^{nq x nq}        (4.5.11)
        [ 0       F   D ]

where F and D are q-by-q matrices that satisfy DF = FD. We also assume
that n = 2^k - 1. These conditions hold in certain important applications
such as the discretization of Poisson's equation on a rectangle. In that
situation,

    D = [  4 -1       0 ]
        [ -1  4   .     ]
        [      .   . -1 ]        (4.5.12)
        [ 0      -1  4  ]
and F = -I_q. The integer n is determined by the size of the mesh and can
often be chosen to be of the form n = 2^k - 1. (Sweet (1977) shows how to
proceed when the dimension is not of this form.)
The basic idea behind cyclic reduction is to halve the dimension of the
problem on hand repeatedly until we are left with a single q-by-q system
for the unknown subvector x_{2^{k-1}}. This system is then solved by standard
means. The previously eliminated x_i are found by a back-substitution
process.
The general procedure is adequately motivated by considering the case
n = 7:

    b_1 = Dx_1 + Fx_2
    b_2 = Fx_1 + Dx_2 + Fx_3
    b_3 = Fx_2 + Dx_3 + Fx_4
    b_4 = Fx_3 + Dx_4 + Fx_5        (4.5.13)
    b_5 = Fx_4 + Dx_5 + Fx_6
    b_6 = Fx_5 + Dx_6 + Fx_7
    b_7 = Fx_6 + Dx_7

For i = 2, 4, and 6 we multiply equations i-1, i, and i+1 by F, -D, and
F, respectively, and add the resulting equations to obtain

    (2F^2 - D^2)x_2 + F^2 x_4                      = F(b_1 + b_3) - Db_2
    F^2 x_2 + (2F^2 - D^2)x_4 + F^2 x_6            = F(b_3 + b_5) - Db_4
    F^2 x_4 + (2F^2 - D^2)x_6                      = F(b_5 + b_7) - Db_6
Thus, with this tactic we have removed the odd-indexed x_i and are left
with a reduced block tridiagonal system of the form

    D^(1)x_2 + F^(1)x_4              = b_2^(1)
    F^(1)x_2 + D^(1)x_4 + F^(1)x_6   = b_4^(1)
    F^(1)x_4 + D^(1)x_6              = b_6^(1)

where D^(1) = 2F^2 - D^2 and F^(1) = F^2 commute. Applying the same elim-
ination strategy as above, we multiply these three equations respectively
by F^(1), -D^(1), and F^(1). When these transformed equations are added
together, we obtain the single equation

    ( 2[F^(1)]^2 - [D^(1)]^2 ) x_4 = F^(1)( b_2^(1) + b_6^(1) ) - D^(1) b_4^(1)

which we write as

    D^(2) x_4 = b_4^(2).

This completes the cyclic reduction. We now solve this (small) q-by-q sys-
tem for x_4. The vectors x_2 and x_6 are then found by solving the systems

    D^(1)x_2 = b_2^(1) - F^(1)x_4
    D^(1)x_6 = b_6^(1) - F^(1)x_4
4.5. BLOCK SYSTEMS 179
Finally, we use the first, third, fifth, and seventh equations in (4.5.13) to
compute x_1, x_3, x_5, and x_7, respectively.
For general n of the form n = 2^k - 1 we set D^(0) = D, F^(0) = F, b^(0) = b
and compute:

for p = 1:k-1
    F^(p) = [F^(p-1)]^2
    D^(p) = 2F^(p) - [D^(p-1)]^2
    r = 2^p
    for j = 1:2^{k-p} - 1                                        (4.5.14)
        b_{jr}^(p) = F^(p-1)( b_{jr-r/2}^(p-1) + b_{jr+r/2}^(p-1) ) - D^(p-1) b_{jr}^(p-1)
    end
end

The x_i are then recovered by back substitution (with x_0 = x_{n+1} = 0):

for p = k-1:-1:0
    r = 2^p
    for j = 1:2^{k-p-1}
        if j = 1
            c = b_{(2j-1)r}^(p) - F^(p) x_{2jr}
        elseif j = 2^{k-p-1}
            c = b_{(2j-1)r}^(p) - F^(p) x_{(2j-2)r}                  (4.5.16)
        else
            c = b_{(2j-1)r}^(p) - F^(p)( x_{2jr} + x_{(2j-2)r} )
        end
        Solve D^(p) x_{(2j-1)r} = c for x_{(2j-1)r}.
    end
end
The amount of work required to perform these recursions depends greatly
upon the sparsity of the D^(p) and F^(p). In the worst case when these
matrices are full, the overall flop count has order log(n)q^3. Care must be
exercised in order to ensure stability during the reduction. For further
details, see Buneman (1969).
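For the scalar (q = 1) case the recursions (4.5.14) and (4.5.16) fit in a few lines of NumPy. The following sketch (function name and 1-based padding are our own devices) mirrors the index arithmetic of the text:

    import numpy as np

    def cyclic_reduction(D, F, b):
        # Sketch: tridiagonal system with constant diagonal D and
        # off-diagonal F, n = 2^k - 1, scalar blocks (q = 1).
        n = len(b)
        k = round(np.log2(n + 1))
        assert 2**k - 1 == n
        bb = np.zeros(n + 1); bb[1:] = b          # 1-based copy of b
        Ds, Fs = [float(D)], [float(F)]
        for p in range(1, k):                     # reduction (4.5.14)
            Fp = Fs[-1]**2
            Dp = 2.0*Fp - Ds[-1]**2
            r = 2**p
            for j in range(1, 2**(k - p)):
                i = j*r
                bb[i] = Fs[-1]*(bb[i - r//2] + bb[i + r//2]) - Ds[-1]*bb[i]
            Ds.append(Dp); Fs.append(Fp)
        x = np.zeros(n + 2)                       # x[0] = x[n+1] = 0
        for p in range(k - 1, -1, -1):            # back substitution (4.5.16)
            r = 2**p
            for j in range(1, 2**(k - p - 1) + 1):
                i = (2*j - 1)*r
                x[i] = (bb[i] - Fs[p]*(x[i - r] + x[i + r])) / Ds[p]
        return x[1:n + 1]

Running this on the 7-point example below reproduces x = (1, 2, 3, 4, 5, 6, 7)^T.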
Example 4.5.1 Suppose q = 1, D = 4, F = -1, and that cyclic reduction is applied to the
7-by-7 system Ax = b with b = (2, 4, 6, 8, 10, 12, 22)^T. The reductions yield

    [ -14   1    0 ] [ x_2 ]   [ -24 ]
    [   1  -14   1 ] [ x_4 ] = [ -48 ]        p = 1
    [   0   1  -14 ] [ x_6 ]   [ -80 ]

    [ -194 ][ x_4 ] = [ -776 ]                p = 2

The x_i are then determined via (4.5.16):

    p = 2:  x_4 = 4
    p = 1:  x_2 = 2,  x_6 = 6
    p = 0:  x_1 = 1,  x_3 = 3,  x_5 = 5,  x_7 = 7
If X in R^{m x n}, then vec(X) is the vector obtained by stacking its columns:

    vec(X) = [ X(:,1) ]
             [   :    ]  in R^{mn}.
             [ X(:,n) ]

Solving the Kronecker product system

    (B (x) C) x = d

is equivalent to solving the matrix equation CXB^T = D for X, where
x = vec(X) and d = vec(D). This has efficiency ramifications. To illustrate,
suppose B, C in R^{n x n} are symmetric positive definite. If A = B (x) C is
treated as a general matrix and factored in order to solve for x, then O(n^6)
flops are required since B (x) C is in R^{n^2 x n^2}. On the other hand, the solution
approach

1. Compute the Cholesky factorizations B = GG^T and C = HH^T.
2. Solve BZ = D^T for Z using G.
3. Solve CX = Z^T for X using H.
4. x = vec(X).

involves O(n^3) flops.
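Here is a minimal sketch of the four-step approach, assuming SciPy's cho_factor/cho_solve are available (the function name and column-major reshaping convention are our own):

    import numpy as np
    from scipy.linalg import cho_factor, cho_solve

    def kron_spd_solve(B, C, d):
        # Sketch: solve (B kron C)x = d with B, C symmetric positive
        # definite n-by-n, via C X B^T = D where d = vec(D) (Fortran order).
        n = B.shape[0]
        D = d.reshape(n, n, order="F")
        Bf, Cf = cho_factor(B), cho_factor(C)   # two O(n^3) factorizations
        Z = cho_solve(Bf, D.T)                  # B Z = D^T
        X = cho_solve(Cf, Z.T)                  # C X = Z^T
        return X.reshape(-1, order="F")         # x = vec(X)

The point of the reshaping trick is that the n^2-by-n^2 Kronecker matrix is never formed; all work happens on n-by-n matrices.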
Problems

P4.5.4 ... in O(n) flops, where alpha, beta in R and b in R^n are given and each matrix
of coefficients A_k is nonsingular. The advisability of going for such a quick solution is a
complicated matter that depends upon the condition numbers of A and the A_k and on
other factors.

P4.5.5 Verify (4.5.16)-(4.5.19).

P4.5.7 Show how to compute the SVD of B (x) C from the SVDs of B and C.

P4.5.8 If A, B, and C are matrices, then it can be shown that (A (x) B) (x) C = A (x) (B (x) C)
and so we just write A (x) B (x) C for this matrix. Show how to solve the linear system
(A (x) B (x) C)x = d assuming that A, B, and C are symmetric positive definite.
J.A. George (1974). "On Block Elimination for Sparse Linear Systems," SIAM J. Num.
Anal. 11, 585-603.
R. Fourer (1984). "Staircase Matrices and Systems," SIAM Review 26, 1-70.
M.L. Merriam (1985). "On the Factorization of Block Tridiagonals with Storage Con-
straints," SIAM J. Sci. and Stat. Comp. 6, 182-192.

The property of block diagonal dominance and its various implications is the central
theme in

D.G. Feingold and R.S. Varga (1962). "Block Diagonally Dominant Matrices and Gen-
eralizations of the Gerschgorin Circle Theorem," Pacific J. Math. 12, 1241-50.

Early methods that involve the idea of cyclic reduction are described in

O. Buneman (1969). "A Compact Non-Iterative Poisson Solver," Report 294, Stanford
University Institute for Plasma Research, Stanford, California.
Other literature concerned with cyclic reduction includes

F.W. Dorr (1970). "The Direct Solution of the Discrete Poisson Equation on a Rectan-
gle," SIAM Review 12, 248-63.
B.L. Buzbee, F.W. Dorr, J.A. George, and G.H. Golub (1971). "The Direct Solution of
the Discrete Poisson Equation on Irregular Regions," SIAM J. Num. Anal. 8, 722-36.
F.W. Dorr (1973). "The Direct Solution of the Discrete Poisson Equation in O(n^2)
Operations," SIAM Review 15, 412-415.
P. Concus and G.H. Golub (1973). "Use of Fast Direct Methods for the Efficient Nu-
merical Solution of Nonseparable Elliptic Equations," SIAM J. Num. Anal. 10,
1103-20.
B.L. Buzbee and F.W. Dorr (1974). "The Direct Solution of the Biharmonic Equation
on Rectangular Regions and the Poisson Equation on Irregular Regions," SIAM J.
Num. Anal. 11, 753-63.
D. Heller (1976). "Some Aspects of the Cyclic Reduction Algorithm for Block Tridiagonal
Linear Systems," SIAM J. Num. Anal. 13, 484-96.
P.N. Swarztrauber and R.A. Sweet (1973). "The Direct Solution of the Discrete Poisson
Equation on a Disk," SIAM J. Num. Anal. 10, 900-907.
R.A. Sweet (1974). "A Generalized Cyclic Reduction Algorithm," SIAM J. Num. Anal.
11, 506-20.
M.A. Diamond and D.L.V. Ferreira (1976). "On a Cyclic Reduction Method for the
Solution of Poisson's Equation," SIAM J. Num. Anal. 13, 54-70.
R.A. Sweet (1977). "A Cyclic Reduction Algorithm for Solving Block Tridiagonal Sys-
tems of Arbitrary Dimension," SIAM J. Num. Anal. 14, 706-20.
P.N. Swarztrauber and R. Sweet (1989). "Vector and Parallel Methods for the Direct
Solution of Poisson's Equation," J. Comp. Appl. Math. 27, 241-263.
S. Bondeli and W. Gander (1994). "Cyclic Reduction for Special Tridiagonal Systems,"
SIAM J. Matrix Anal. Appl. 15, 321-330.

For certain matrices that arise in conjunction with elliptic partial differential equations,
block elimination corresponds to rather natural operations on the underlying mesh. A
classical example of this is the method of nested dissection described in

J.R. Bunch (1976). "Block Methods for Solving Sparse Linear Systems," in Sparse
Matrix Computations, J.R. Bunch and D.J. Rose (eds), Academic Press, New York.
Bordered linear systems as presented in P4.5.4 are discussed in

W. Govaerts and J.D. Pryce (1990). "Block Elimination with One Iterative Refinement
Solves Bordered Linear Systems Accurately," BIT 30, 490-507.
W. Govaerts (1991). "Stable Solvers and Block Elimination for Bordered Systems,"
SIAM J. Matrix Anal. Appl. 12, 469-483.
W. Govaerts and J.D. Pryce (1993). "Mixed Block Elimination for Linear Systems with
Wider Borders," IMA J. Num. Anal. 13, 161-180.

Kronecker product references include
Suppose x(0:n) in R^{n+1} and define the Vandermonde matrix

    V = V(x_0, ..., x_n) = [ 1     1    ...  1    ]
                           [ x_0   x_1  ...  x_n  ]        (4.6.1)
                           [ :                :   ]
                           [ x_0^n x_1^n ... x_n^n ]

In this section we consider the primal and dual Vandermonde systems

    Vz = b    and    V^T a = f.        (4.6.2)

The dual system is the polynomial interpolation problem: if p(x) = a_0 +
a_1 x + ... + a_n x^n, then V^T a = f says p(x_i) = f_i for i = 0:n. The Newton
representation of p is the key to an efficient solution. Its divided difference
coefficients c(0:n) are obtained from

    c(0:n) = f(0:n)
    for k = 0:n-1
        for i = n:-1:k+1                                    (4.6.3)
            c(i) = (c(i) - c(i-1))/(x(i) - x(i-k-1))
        end
    end

and the monomial coefficients a(0:n) then follow from

    a(0:n) = c(0:n)
    for k = n-1:-1:0
        for i = k:n-1                                       (4.6.4)
            a_i = a_i - x_k a_{i+1}
        end
    end
Combining this iteration with (4.6.3) renders the following algorithm:
Algorithm 4.6.1 Given x(0:n) in R^{n+1} with distinct entries and f = f(0:n)
in R^{n+1}, the following algorithm overwrites f with the solution a = a(0:n)
to the dual system V(x_0, ..., x_n)^T a = f.

for k = 0:n-1
    for i = n:-1:k+1
        f(i) = (f(i) - f(i-1))/(x(i) - x(i-k-1))
    end
end
for k = n-1:-1:0
    for i = k:n-1
        f(i) = f(i) - f(i+1)x(k)
    end
end

This algorithm requires 5n^2/2 flops.
These algorithms have pleasing matrix descriptions. Define the unit lower
bidiagonal matrix L_k(a) in R^{(n+1) x (n+1)} by

    L_k(a) = [ I_k               ]
             [      1            ]
             [     -a  1         ]
             [          .   .    ]
             [            -a  1  ]

and the diagonal matrix D_k by

    D_k = diag( 1, ..., 1, x_{k+1} - x_0, ..., x_n - x_{n-k-1} )

with k+1 leading ones. From (4.6.3) we have

    U^T = D_{n-1}^{-1} L_{n-1}(1) ... D_0^{-1} L_0(1).

Similarly, from (4.6.4) we have

    z = V^{-1}b = U(Lb)
      = ( L_0(1)^T D_0^{-1} ... L_{n-1}(1)^T D_{n-1}^{-1} )( L_{n-1}(x_{n-1}) ... L_0(x_0) b )
This z calculation gives the following algorithm.

Algorithm 4.6.2 Given x(0:n) in R^{n+1} with distinct entries and b = b(0:n)
in R^{n+1}, the following algorithm overwrites b with the solution z to the
primal system V(x_0, ..., x_n)z = b.

for k = 0:n-1
    for i = n:-1:k+1
        b(i) = b(i) - x(k)b(i-1)
    end
end
for k = n-1:-1:0
    for i = k+1:n
        b(i) = b(i)/(x(i) - x(i-k-1))
    end
    for i = k:n-1
        b(i) = b(i) - b(i+1)
    end
end
This algorithm requires 5n^2/2 flops.
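The primal solver translates directly into NumPy; here is a minimal sketch (the convention V[i,j] = x_j^i and the function name are ours):

    import numpy as np

    def vandermonde_solve(x, b):
        # Sketch of the two-sweep primal solve Vz = b, where
        # V[i, j] = x[j]**i for i, j = 0..n; O(n^2) flops, O(n) storage.
        x = np.asarray(x, dtype=float)
        b = np.array(b, dtype=float)
        n = len(x) - 1
        for k in range(n):                      # apply the L_k(x_k) factors
            for i in range(n, k, -1):
                b[i] -= x[k] * b[i-1]
        for k in range(n - 1, -1, -1):          # apply D_k^{-1}, then L_k(1)^T
            for i in range(k + 1, n + 1):
                b[i] /= x[i] - x[i-k-1]
            for i in range(k, n):
                b[i] -= b[i+1]
        return b

A quick check: with x = (0, 1, 2) and z = (1, 1, 1), b = Vz = (3, 3, 5) and the sketch returns (1, 1, 1).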
Example 4.6.2 If Algorithm 4.6.2 is applied to a 4-by-4 Vandermonde system with
nodes x = (1, 2, 3, 4), the first k-loop overwrites b with L_2(3)L_1(2)L_0(1)b and the
second k-loop then produces

    z = L_0(1)^T D_0^{-1} L_1(1)^T D_1^{-1} L_2(1)^T D_2^{-1} ( L_2(3)L_1(2)L_0(1)b ).
4.6.3 Stability
Algorithms 4.6.1 and 4.6.2 are discussed and analyzed in Bjorck and Pereyra
(1970). Their experience is that these algorithms frequently produce sur-
prisingly accurate solutions, even when V is ill-conditioned. They also
show how to update the solution when a new coordinate pair (x_{n+1}, f_{n+1})
is added. The basic ideas extend to confluent Vandermonde systems, i.e.,
systems involving matrices like

    V = [ 1      0      1     ]
        [ x_1    1      x_2   ]
        [ x_1^2  2x_1   x_2^2 ]

The discrete Fourier transform (DFT) matrix F_n in C^{n x n} has entries
[F_n]_{kj} = w_n^{kj} for 0 <= k, j <= n-1, where

    w_n = exp(-2 pi i/n) = cos(2 pi/n) - i sin(2 pi/n).

For example,

    F_4 = [ 1   1   1   1 ]
          [ 1  -i  -1   i ]
          [ 1  -1   1  -1 ]
          [ 1   i  -1  -i ]
If x in C^n, then its DFT is the vector F_n x. The DFT has an extremely
important role to play throughout applied mathematics and engineering.
If n is highly composite, then it is possible to carry out the DFT in
many fewer than the O(n^2) flops required by conventional matrix-vector
multiplication. To illustrate this we set n = 2^t and proceed to develop
the radix-2 fast Fourier transform (FFT). The starting point is to look
at an even-order DFT matrix when we permute its columns so that the
even-indexed columns come first. Consider the case n = 8. Noting that
w_8^{kj} = w_8^{kj mod 8} we have
    F_8 = [ 1  1    1    1    1    1    1    1   ]
          [ 1  w    w^2  w^3  w^4  w^5  w^6  w^7 ]
          [ 1  w^2  w^4  w^6  1    w^2  w^4  w^6 ]
          [ 1  w^3  w^6  w    w^4  w^7  w^2  w^5 ]        w = w_8.
          [ 1  w^4  1    w^4  1    w^4  1    w^4 ]
          [ 1  w^5  w^2  w^7  w^4  w    w^6  w^3 ]
          [ 1  w^6  w^4  w^2  1    w^6  w^4  w^2 ]
          [ 1  w^7  w^6  w^5  w^4  w^3  w^2  w   ]
With c the even-odd column permutation,

    F_8(:, c) = [ F_4   Omega_4 F_4 ]
                [ F_4  -Omega_4 F_4 ]

where

    Omega_4 = diag(1, w, w^2, w^3).

It follows that if x is an 8-vector, then

    F_8 x = F_8(:, c)x(c) = [ F_4   Omega_4 F_4 ] [ x(0:2:7) ]
                            [ F_4  -Omega_4 F_4 ] [ x(1:2:7) ]

Thus, by simple scalings we can obtain the 8-point DFT y = F_8 x from the
4-point DFTs y_T = F_4 x(0:2:7) and y_B = F_4 x(1:2:7).
In general, for even n = 2m,

    y(0:m-1) = y_T + d .* y_B
    y(m:n-1) = y_T - d .* y_B

where d = (1, w_n, ..., w_n^{m-1})^T and

    y_T = F_m x(0:2:n-1),
    y_B = F_m x(1:2:n-1).

For n = 2^t we can recur on this process until n = 1, for which F_1 x = x:
function y = FFT(x, n)
if n = 1
    y = x
else
    m = n/2; w = e^{-2 pi i/n}
    y_T = FFT(x(0:2:n-1), m); y_B = FFT(x(1:2:n-1), m)
    d = ( 1, w, ..., w^{m-1} )^T; z = d .* y_B
    y = [ y_T + z ]
        [ y_T - z ]
end
This is a member of the fast Fourier transform family of algorithms. It
has a nonrecursive implementation that is best presented in terms of a
factorization of F_n. Indeed, it can be shown that F_n = A_t ... A_1 P_n where
each A_q = I_r (x) B_L with

    L = 2^q,        r = n/L,

    B_L = [ I_{L/2}   Omega_{L/2} ]
          [ I_{L/2}  -Omega_{L/2} ]

and Omega_{L/2} = diag(1, w_L, ..., w_L^{L/2-1}).
The matrix P_n is called the bit reversal permutation, the description of
which we omit. (Recall the definition of the Kronecker product "(x)" from
§4.5.5.) Note that with this factorization, y = F_n x can be computed as
follows:

x = P_n x
for q = 1:t
    L = 2^q; r = n/L                                        (4.6.5)
    x = (I_r (x) B_L)x
end

The matrices A_q = (I_r (x) B_L) have 2 nonzeros per row and it is this sparsity
that makes it possible to implement the DFT in O(n log n) flops. In fact,
a careful implementation involves 5n log_2 n flops.
The DFT matrix has the property that

    F_n^{-1} = (1/n) conj(F_n).        (4.6.6)

That is, the inverse of F_n is obtained by conjugating its entries and scaling
by n. A fast inverse DFT can be obtained from a (forward) FFT merely
by replacing all root-of-unity references with their complex conjugates and
scaling by n at the end.
The value of the DFT is that many "hard problems" are made simple
by transforming into Fourier space (via F_n). The sought-after solution
is then obtained by transforming the Fourier space solution into original
coordinates (via F_n^{-1}).
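The recursive FFT above is only a few lines of Python. This sketch follows the pseudocode exactly (the function name is ours; note that NumPy's np.fft.fft uses the same sign convention for w_n, so the two agree):

    import numpy as np

    def fft_radix2(x):
        # Sketch of the recursive radix-2 FFT; len(x) must be a power of 2.
        # Computes y = F_n x with [F_n]_{kj} = exp(-2*pi*1j*k*j/n).
        n = len(x)
        if n == 1:
            return np.asarray(x, dtype=complex)
        m = n // 2
        yT = fft_radix2(x[0::2])                        # even-indexed part
        yB = fft_radix2(x[1::2])                        # odd-indexed part
        d = np.exp(-2j * np.pi * np.arange(m) / n)      # the scalings
        z = d * yB
        return np.concatenate([yT + z, yT - z])

Checking np.allclose(fft_radix2(x), np.fft.fft(x)) on a random vector of length 2^t confirms the construction.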
Problems

P4.6.2 (Gautschi 1975a) Verify the following inequality for the n = 1 case above:
...
Equality results if the x_i are all on the same ray in the complex plane.
The latter reference includes an Algol procedure. Error analyses of Vandermonde system
solvers include several papers cited therein.
The basic algorithms presented can be extended to cover confluent Vandermonde sys-
tems, block Vandermonde systems, and Vandermonde systems that are based on other
polynomial bases.
The FFT literature is very extensive and scattered. For an overview of the area couched
in Kronecker product notation, see

C.F. Van Loan (1992). Computational Frameworks for the Fast Fourier Transform,
SIAM Publications, Philadelphia, PA.

The point of view in this text is that different FFTs correspond to different factorizations
of the DFT matrix. These are sparse factorizations in that the factors have very few
nonzeros per row.
T in R^{n x n} is Toeplitz if there are scalars r_{-n+1}, ..., r_0, ..., r_{n-1} such that
t_{ij} = r_{j-i} for all i and j. Thus,

    T = [ r_0    r_1    r_2    r_3 ]   [ 3 1 7 6 ]
        [ r_{-1} r_0    r_1    r_2 ] = [ 4 3 1 7 ]
        [ r_{-2} r_{-1} r_0    r_1 ]   [ 0 4 3 1 ]
        [ r_{-3} r_{-2} r_{-1} r_0 ]   [ 9 0 4 3 ]

is Toeplitz.
Toeplitz matrices belong to the larger class of persymmetric matrices.
We say that B in R^{n x n} is persymmetric if it is symmetric about its northeast-
southwest diagonal, i.e., b_{ij} = b_{n-j+1,n-i+1} for all i and j. This is equivalent
to requiring B = EB^TE where E = [ e_n, ..., e_1 ] = I_n(:, n:-1:1) is the
n-by-n exchange matrix, i.e.,

    E = [ 0 0 0 1 ]
        [ 0 0 1 0 ]
        [ 0 1 0 0 ]
        [ 1 0 0 0 ]

It is easy to verify that (a) Toeplitz matrices are persymmetric and (b) the
inverse of a nonsingular Toeplitz matrix is persymmetric. In this section we
show how the careful exploitation of (b) enables us to solve Toeplitz systems
with O(n^2) flops. The discussion focuses on the important case when T is
also symmetric and positive definite. Unsymmetric Toeplitz systems and
connections with circulant matrices and the discrete Fourier transform are
briefly discussed.
Suppose we have real scalars r_1, ..., r_n such that for k = 1:n the matrices

    T_k = [ 1        r_1    ...  r_{k-1} ]
          [ r_1      1            :      ]
          [ :             .       r_1    ]
          [ r_{k-1}  ...   r_1    1      ]

are positive definite. Three algorithms are described in this section:

• Durbin's algorithm for the Yule-Walker problem T_n y = -(r_1, ..., r_n)^T.
• Levinson's algorithm for the general right hand side problem T_n x = b.
• Trench's algorithm for computing B = T_n^{-1}.
    z = y - a E_k T_k^{-1} r = y + a E_k y.

Underlying this is the congruence

    [ I        0 ] [ T_k      E_k r ] [ I  E_k y ]   [ T_k   0         ]
    [ y^T E_k  1 ] [ r^T E_k  1     ] [ 0  1     ] = [ 0     1 + r^T y ]        (4.7.1)
As it stands, this algorithm would require 3n^2 flops to generate y = y^(n).
It is possible, however, to reduce the amount of work even further by ex-
ploiting some of the above expressions: the quantities

    b_k = 1 + [r^(k)]^T y^(k)

satisfy b_k = (1 - a_{k-1}^2) b_{k-1}, so each can be obtained from its
predecessor in O(1) flops.
Algorithm 4.7.1 (Durbin) Given real numbers 1 = r_0, r_1, ..., r_n such
that T = (r_{|i-j|}) in R^{n x n} is positive definite, the following algorithm com-
putes y in R^n such that Ty = -(r_1, ..., r_n)^T.

y(1) = -r(1); b = 1; a = -r(1)
for k = 1:n-1
    b = (1 - a^2)b
    a = -( r(k+1) + r(k:-1:1)^T y(1:k) )/b
    z(1:k) = y(1:k) + a y(k:-1:1)
    y(1:k+1) = [ z(1:k) ]
               [ a      ]
end

This algorithm requires 2n^2 flops. We have included an auxiliary vector z
for clarity, but it can be avoided.
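Here is a minimal NumPy sketch of Algorithm 4.7.1 (function name ours; the auxiliary vector z is avoided because the reversed-slice product is materialized before the update):

    import numpy as np

    def durbin(r):
        # Sketch: solve the Yule-Walker system T y = -r, where T is the
        # symmetric positive definite Toeplitz matrix with unit diagonal
        # and r = (r_1, ..., r_n).
        r = np.asarray(r, dtype=float)
        n = len(r)
        y = np.empty(n)
        y[0] = -r[0]
        beta, alpha = 1.0, -r[0]
        for k in range(1, n):
            beta *= 1.0 - alpha * alpha
            alpha = -(r[k] + r[k-1::-1] @ y[:k]) / beta
            y[:k] = y[:k] + alpha * y[k-1::-1]
            y[k] = alpha
        return y

With r = (.5, .2, .1) this returns y = (-75, 12, -5)/140, matching the worked example below.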
We then compute

    b = (1 - a^2)b = 56/75
    a = -( r_3 + r_2 y_1 + r_1 y_2 )/b = -1/28
    z_1 = y_1 + a y_2 = -225/420
    z_2 = y_2 + a y_1 = 36/420

giving the final solution y = (-75, 12, -5)^T/140.
Now consider the general right hand side problem. We wish to solve

    [ T_k      E_k r ] [ v   ]   [ b(1:k)  ]
    [ r^T E_k  1     ] [ mu  ] = [ b_{k+1} ]        (4.7.2)

given that x solves the kth order system T_k x = b(1:k). Here, r =
(r_1, ..., r_k)^T as above. Assume also that the solution to the kth
order Yule-Walker system T_k y = -r is also available. From T_k v + mu E_k r = b(1:k)
it follows that

    v = x + mu E_k y        (4.7.3)

and so

    mu = b_{k+1} - r^T E_k v
       = b_{k+1} - r^T E_k x - mu r^T y
       = ( b_{k+1} - r^T E_k x )/( 1 + r^T y ).
y(1) = -r(1); x(1) = b(1); b = 1; a = -r(1)
for k = 1:n-1
    b = (1 - a^2)b
    mu = ( b(k+1) - r(1:k)^T x(k:-1:1) )/b
    v(1:k) = x(1:k) + mu y(k:-1:1)
    x(1:k+1) = [ v(1:k) ]
               [ mu     ]
    if k < n-1
        a = -( r(k+1) + r(1:k)^T y(k:-1:1) )/b
        z(1:k) = y(1:k) + a y(k:-1:1)
        y(1:k+1) = [ z(1:k) ]
                   [ a      ]
    end
end

This algorithm requires 4n^2 flops. The vectors z and v are for clarity and
can be avoided in a detailed implementation.
Example 4.7.2 Suppose we wish to solve the symmetric positive definite Toeplitz
system

    [ 1   .5  .2 ] [ x_1 ]   [ 4  ]
    [ .5  1   .5 ] [ x_2 ] = [ -1 ]
    [ .2  .5  1  ] [ x_3 ]   [ 3  ]

using the above algorithm. After one pass through the loop we obtain x = (6, -4)^T and
y = (-8/15, 1/15)^T. We then compute

    b = (1 - a^2)b = 56/75
    mu = ( b_3 - r_1 x_2 - r_2 x_1 )/b = 285/56
    v_1 = x_1 + mu y_2 = 355/56
    v_2 = x_2 + mu y_1 = -376/56

giving the final solution x = (355, -376, 285)^T/56.
Trench's algorithm exploits the same bordering idea to compute T_n^{-1}.
Write

    T_n^{-1} = [ A      Er ]^{-1}   [ B    v ]
               [ r^T E  1  ]      = [ v^T  g ]        (4.7.4)

where A = T_{n-1}, E = E_{n-1}, and r = (r_1, ..., r_{n-1})^T. From the equation

    [ A      Er ] [ v ]   [ 0 ]
    [ r^T E  1  ] [ g ] = [ 1 ]

it follows that Av = -gEr = -gE(r_1, ..., r_{n-1})^T and g = 1 - r^T Ev.
If y solves the (n-1)-st order Yule-Walker system Ay = -r, then these
equations imply g = 1/(1 + r^T y) and v = gEy.
Thus, the last row and column of T_n^{-1} are specified:

    T_n^{-1} = [ u u u u u k ]
               [ u u u u u k ]
               [ u u u u u k ]
               [ u u u u u k ]
               [ u u u u u k ]
               [ k k k k k k ]

Here u and k denote the unknown and the known entries respectively, and
n = 6.
followa:
lc k k k k k k k k k k k
k u u u u k k u u u k k
~eym.
- k
k u
k
u
u
u u
u u
u u
u
u
u
k
lc
le
-
(U.$) k
k
k
u
u
k k
u
u
u
u
k
k
lr;
k
k
k
k
k k k k k k k k k k k /r;
4.7. TOEPL11"1: AND RELATED SYSTEMS 199
k k k k k k k k k k k k
k k k k k k k k k k k k
--
1"'1"'¥""'· k k u u k k
k k u u k
k k k k k k
k k k k k k
" -
(4.7.5) k
k
k
k
k u
k k
k k
k k
k k
k k
k k
k k
k
k
k
k
k k k k k k
k k k k k k
prr~ k k k k k k
k k k k k k
k k k k k k
k k k k k k
Of course, when computing a matrix that is both symmetric and persym-
metric, such as T_n^{-1}, it is only necessary to compute the "upper wedge" of
the matrix, e.g.,

    x x x x x x
      x x x x        (n = 6)
        x x

With this last observation, we are ready to present the overall algorithm.
Example 4.7.3 If Trench's algorithm is applied to

    T = [ 1   .5  .2 ]
        [ .5  1   .5 ]
        [ .2  .5  1  ]

then we obtain g = 75/56, b_{11} = 75/56, b_{12} = -5/7, b_{13} = 5/56, and b_{22} = 12/7.

Moreover, the computed solution to the Yule-Walker system T_n y = -r(1:n) satisfies
an error bound of the form (4.7.7), in which products of the factors (1 + |a_k|) appear.
in O(k) flops. This means that in principle it is possible to solve an unsym-
metric Toeplitz system in O(n^2) flops. However, the stability of the process
cannot be assured unless the matrices T_k = T(1:k, 1:k) are sufficiently well
conditioned.
4.7.7 Circulant Systems
A very important class of Toeplitz matrices are the circulant matrices. Here
is an example:

    C(v) = [ v_0  v_4  v_3  v_2  v_1 ]
           [ v_1  v_0  v_4  v_3  v_2 ]
           [ v_2  v_1  v_0  v_4  v_3 ]
           [ v_3  v_2  v_1  v_0  v_4 ]
           [ v_4  v_3  v_2  v_1  v_0 ]
If

    S_5 = [ 0 0 0 0 1 ]
          [ 1 0 0 0 0 ]
          [ 0 1 0 0 0 ]        (n = 5)
          [ 0 0 1 0 0 ]
          [ 0 0 0 1 0 ]

and v = ( v_0, v_1, ..., v_{n-1} )^T, then C(v) = [ v, S_n v, S_n^2 v, ..., S_n^{n-1} v ].
s;:-
There are important connectiona between circulant matrices, 'lbeplitz
matrices, and the DFI'. First of all, it can be shown that
(4.7.10)
This means that a. product of the form y = C(v)x can be solved a.t "FFT
speed":
x= Fn:r
v=Fnv
z =v.•x
y = F;; 1 z
In other words, three DFTs and a vector multiply suffice to carry out the
product of a circulant matrix and a vector. Products of this form are called
convolutions and they are ubiquitous in signal processing and other areas.
Toeplitz-vector products can also be computed fast. The key idea is
that any Toeplitz matrix can be "embedded" in a circulant. For example,
a 3-by-3 Toeplitz matrix T is the leading 3-by-3 submatrix of a suitably
chosen 5-by-5 circulant C. In general, if T = (t_{ij}) is an n-by-n Toeplitz
matrix, then T = C(1:n, 1:n) where C is a (2n-1)-by-(2n-1) circulant with

    C(:, 1) = [ T(1:n, 1)       ]
              [ T(1, n:-1:2)^T  ]
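The FFT-speed circulant product and the Toeplitz embedding both fit in a short NumPy sketch (function names ours; np.fft includes the 1/n scaling in ifft, matching (4.6.6)):

    import numpy as np

    def circulant_matvec(v, x):
        # Sketch of (4.7.10): y = C(v) x via three FFTs and a pointwise
        # multiply, since C(v) = F_n^{-1} diag(F_n v) F_n.
        return np.fft.ifft(np.fft.fft(v) * np.fft.fft(x))

    def toeplitz_matvec(col, row, x):
        # Sketch: T has first column col and first row row. Embed T in a
        # (2n-1)-by-(2n-1) circulant, pad x with zeros, keep the first n
        # entries of the circulant product. (Take .real for real data.)
        n = len(col)
        v = np.concatenate([col, row[:0:-1]])   # C(:,1) per the display above
        y = circulant_matvec(v, np.concatenate([x, np.zeros(n - 1)]))
        return y[:n]

Because only FFTs of length about 2n are involved, a Toeplitz-vector product costs O(n log n) flops instead of the O(n^2) of a dense multiply.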
Problems

P4.7.1 For any v in R^n define the vectors v_+ = (v + E_n v)/2 and v_- = (v - E_n v)/2.
Suppose A in R^{n x n} is symmetric and persymmetric. Show that if Ax = b then Ax_+ = b_+
and Ax_- = b_-.
P4.7.2 Let U in R^{n x n} be the unit upper triangular matrix with the property that
U(1:k-1, k) = E_{k-1} y^{(k-1)} where y^{(k)} is defined by (4.7.1). Show that U^T T_n U is diagonal.

P4.7.3 Suppose x in R^n and let S = S_n be the downshift permutation of §4.7.7. Show
that if X = [ x, Sx, ..., S^{n-1}x ], then X^T X is Toeplitz.

P4.7.4 Consider the LDL^T factorization of an n-by-n symmetric, tridiagonal, positive
definite Toeplitz matrix. Show that d_n and l_{n,n-1} converge as n -> infinity.

P4.7.5 Show that the product of two lower triangular Toeplitz matrices is Toeplitz.

P4.7.6 Give an algorithm for determining mu in R such that T_n + mu I is singular.
P4.7.9 Suppose the m-by-m matrices A_{-3}, ..., A_3 are given and that

    A = [ A_0    A_1    A_2    A_3 ]
        [ A_{-1} A_0    A_1    A_2 ]
        [ A_{-2} A_{-1} A_0    A_1 ]
        [ A_{-3} A_{-2} A_{-1} A_0 ]

a block Toeplitz matrix. Show how the ideas of this section can be extended to this
block setting.

P4.7.11 Suppose the solutions to the systems in (4.7.8) are available and that all the
matrices involved are nonsingular. Develop a fast unsymmetric Toeplitz solver for Tx = b
assuming that T's leading principal submatrices are all nonsingular.
P4.7.12 A matrix H in C^{n x n} is Hankel if H(n:-1:1, :) is Toeplitz. Show that if
A in R^{n x n} is defined by

    a_{ij} = integral of cos(i theta) cos(j theta) d theta

then A is the sum of a Hankel matrix and a Toeplitz matrix. Hint: Make use of the
identity cos(u + v) = cos(u)cos(v) - sin(u)sin(v).
P4.7.13 Verify that F_n C(v) = diag(F_n v) F_n.
P4.7.14 Show that it is possible to embed a symmetric Toeplitz matrix into a symmetric
circulant matrix.
P4.7.15 Consider the kth order Yule-Walker system T_k y^(k) = -r^(k) that arises in
(4.7.1):

    T_k ( y_1^(k), ..., y_k^(k) )^T = -( r_1, ..., r_k )^T.

Show that if L is the unit lower triangular matrix whose (k+1)-st row is

    ( y_k^(k), ..., y_1^(k), 1, 0, ..., 0 ),

then L T_n L^T = diag(1, b_1, ..., b_{n-1}) where b_k = 1 + [r^(k)]^T y^(k). Thus, the Durbin
algorithm can be thought of as a fast method for computing the LDL^T factorization of
T_n^{-1}.
Anyone who ventures into the vast Toeplitz method literature should first read

J.R. Bunch (1985). "Stability of Methods for Solving Toeplitz Systems of Equations,"
SIAM J. Sci. Stat. Comp. 6, 349-364

for a clarification of stability issues. As is true with the "fast algorithms" area in general,
unstable Toeplitz techniques abound and caution must be exercised. See also

J. Durbin (1960). "The Fitting of Time Series Models," Rev. Inst. Int. Stat. 28, 233-43.
N. Levinson (1947). "The Wiener RMS Error Criterion in Filter Design and Prediction,"
J. Math. Phys. 25, 261-78.
W.F. Trench (1964). "An Algorithm for the Inversion of Finite Toeplitz Matrices," J.
SIAM 12, 515-22.

A more detailed description of the nonsymmetric Trench algorithm is given in

S. Zohar (1969). "Toeplitz Matrix Inversion: The Algorithm of W.F. Trench," J. ACM
16, 592-601.
Fast Toeplitz system solving has attracted an enormous amount of attention and a sam-
pling of interesting algorithmic ideas may be found in

G. Ammar and W.B. Gragg (1988). "Superfast Solution of Real Positive Definite
Toeplitz Systems," SIAM J. Matrix Anal. Appl. 9, 61-76.
T.F. Chan and P. Hansen (1992). "A Look-Ahead Levinson Algorithm for Indefinite
Toeplitz Systems," SIAM J. Matrix Anal. Appl. 13, 490-506.
D.R. Sweet (1993). "The Use of Pivoting to Improve the Numerical Performance of
Algorithms for Toeplitz Matrices," SIAM J. Matrix Anal. Appl. 14, 468-493.
T. Kailath and J. Chun (1994). "Generalized Displacement Structure for Block-Toeplitz,
Toeplitz-Block, and Toeplitz-Derived Matrices," SIAM J. Matrix Anal. Appl. 15,
114-128.
T. Kailath and A.H. Sayed (1995). "Displacement Structure: Theory and Applications,"
SIAM Review 37, 297-386.
Important Toeplitz matrix applications are discussed in

J. Makhoul (1975). "Linear Prediction: A Tutorial Review," Proc. IEEE 63(4), 561-80.
J. Markel and A. Gray (1976). Linear Prediction of Speech, Springer-Verlag, Berlin and
New York.
A.V. Oppenheim (1978). Applications of Digital Signal Processing, Prentice-Hall, En-
glewood Cliffs.
Hankel matrices are constant along their antidiagonals and arise in several important
applications. See

G. Heinig and P. Jankowski (1990). "Parallel and Superfast Algorithms for Hankel
Systems of Equations," Numer. Math. 58, 109-127.
R.W. Freund and H. Zha (1993). "A Look-Ahead Algorithm for the Solution of General
Hankel Systems," Numer. Math. 64, 295-322.

The DFT/Toeplitz/circulant connection is discussed in Van Loan (1992), cited above.
Orthogonalization and
Least Squares
Suppose x = ( sqrt(3), 1 )^T. If

    Q = [ cos(theta)   sin(theta) ]        theta = -30 degrees,
        [ -sin(theta)  cos(theta) ]

then Q^T x = ( 2, 0 )^T. Thus, a rotation of -30 degrees zeroes the second component of x. If

    Q = [ cos(30)  sin(30)  ]   [ sqrt(3)/2   1/2        ]
        [ sin(30)  -cos(30) ] = [ 1/2         -sqrt(3)/2 ]

then Q^T x = ( 2, 0 )^T. Thus, by reflecting x we can also zero its second
component.
    P = I - (2/(v^T v)) vv^T        (5.1.1)

If v = x + alpha e_1, then v^T x = x^T x + alpha x_1 and v^T v = x^T x + 2 alpha x_1 + alpha^2,
and therefore

    Px = ( 1 - 2 (x^T x + alpha x_1)/(x^T x + 2 alpha x_1 + alpha^2) ) x
         - 2 alpha (v^T x/(v^T v)) e_1.

In order for the coefficient of x to be zero, we set alpha = +-|| x ||_2, for then

    Px = -+ || x ||_2 e_1.

This leads to the following procedure for computing v (normalized so that
v(1) = 1) and the scalar beta = 2/(v^T v):

n = length(x); sigma = x(2:n)^T x(2:n)
v = [ 1       ]
    [ x(2:n)  ]
if sigma = 0
    beta = 0
else
    mu = sqrt(x(1)^2 + sigma)
    if x(1) <= 0
        v(1) = x(1) - mu
    else
        v(1) = -sigma/(x(1) + mu)
    end
    beta = 2v(1)^2/(sigma + v(1)^2)
    v = v/v(1)
end
This algorithm involves about 3n flops and renders a computed Householder
matrix that is orthogonal to machine precision, a concept discussed below.
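A minimal NumPy sketch of the above Householder vector computation (the function name is ours; the branching matches the sign-safe choice of v(1)):

    import numpy as np

    def house(x):
        # Sketch: return v (with v[0] = 1) and beta so that
        # P = I - beta*v*v^T satisfies P x = ||x||_2 * e_1.
        x = np.asarray(x, dtype=float)
        sigma = x[1:] @ x[1:]
        v = x.copy(); v[0] = 1.0
        if sigma == 0.0:
            beta = 0.0
        else:
            mu = np.sqrt(x[0]**2 + sigma)
            v[0] = x[0] - mu if x[0] <= 0.0 else -sigma / (x[0] + mu)
            beta = 2.0 * v[0]**2 / (sigma + v[0]**2)
            v = v / v[0]
        return v, beta

The division -sigma/(x(1) + mu) avoids the cancellation that the naive formula x(1) - mu would suffer when x(1) > 0.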
[v, beta] = house(A(j:m,j))
A(j:m, j:n) = (I_{m-j+1} - beta vv^T)A(j:m, j:n)
A(j+1:m, j) = v(2:m-j+1)

    || P^ - P ||_2 = O(u)        (5.1.3)

Each Householder vector has the form

    v^(j) = ( 0, ..., 0, 1, v_{j+1}^(j), ..., v_n^(j) )^T

with j-1 leading zeros.
for j = 1:r
    C = Q_j C
end

The storage of the Householder vectors v^(1), ..., v^(r) and the corresponding
beta_j (if convenient) amounts to a factored form representation of Q. To
illustrate the economies of the factored form representation, suppose that
we have an array A and that A(j+1:n, j) houses v^(j)(j+1:n), the essential
part of the jth Householder vector. The overwriting of C in R^{n x q} with
Q^T C can then be implemented as follows:

for j = 1:r
    v(j:n) = [ 1             ]        (5.1.4)
             [ A(j+1:n, j)   ]
    C(j:n, :) = C(j:n, :) - beta_j v(j:n)( v(j:n)^T C(j:n, :) )
end
Q = I_n
for j = 1:r
    Q = QQ_j
end

It is more economical, however, to accumulate in reverse order,

Q = I_n
for j = r:-1:1
    Q = Q_j Q
end

because then the leading portion of Q remains the identity and can be
skipped:

Q = I_n
for j = r:-1:1
    v(j:n) = [ 1; A(j+1:n, j) ]
    Q(j:n, j:n) = (I - beta_j v(j:n)v(j:n)^T)Q(j:n, j:n)        (5.1.5)
end

Suppose Q = Q_1 ... Q_r is a product of n-by-n Householder matrices. Since
each Q_j is a rank-1 modification of the identity, Q can be written in the
block representation

    Q = I + WY^T        (5.1.6)

where W and Y are n-by-r matrices. The key to computing the block
representation (5.1.6) is the following lemma.
Suppose Q = I + WY^T with W, Y in R^{n x j}. If P = I - beta vv^T and
z = -beta Qv, then

    Q_+ = QP = I + W_+ Y_+^T

where W_+ = [ W  z ] and Y_+ = [ Y  v ] are each n-by-(j+1).

Proof.
QP = (I + WY^T)(I - beta vv^T) = I + WY^T - beta Qvv^T
   = I + WY^T + zv^T = I + [ W  z ][ Y  v ]^T.  []
Example 5.1.3 If n = 4, r = 2, and ( 1, .6, 0, .8 )^T and ( 0, 1, .8, .6 )^T are the
Householder vectors associated with Q_1 and Q_2 (both with beta = 1), then
Q = Q_1 Q_2 = I + WY^T with

    W = [ -1.0   1.080 ]        Y = [ 1    0  ]
        [ -0.6  -0.352 ]            [ .6   1  ]
        [  0.0  -0.800 ]            [ 0    .8 ]
        [ -0.8   0.264 ]            [ .8   .6 ]
    G(i,k,theta) = [ 1 ... 0  ... 0  ... 0 ]
                   [ :      .            : ]
                   [ 0 ...  c ...  s ... 0 ]   i
                   [ :          .        : ]        (5.1.7)
                   [ 0 ... -s ...  c ... 0 ]   k
                   [ :               .   : ]
                   [ 0 ... 0  ... 0  ... 1 ]
                          i      k

where c = cos(theta) and s = sin(theta) for some theta. Givens rotations are clearly
orthogonal.
Premultiplication by G(i,k,theta)^T amounts to a counterclockwise rotation
of theta radians in the (i,k) coordinate plane. Indeed, if x in R^n and
y = G(i,k,theta)^T x, then

    y_j = { c x_i - s x_k    j = i
          { s x_i + c x_k    j = k
          { x_j              j != i, k

From these formulae it is clear that y_k can be forced to be zero by setting
    c = x_i/sqrt(x_i^2 + x_k^2),   s = -x_k/sqrt(x_i^2 + x_k^2).        (5.1.8)

Thus, given a and b we seek c and s so that

    [ c   s ]^T [ a ]   [ r ]
    [ -s  c ]   [ b ] = [ 0 ]

function: [c, s] = givens(a, b)
if b = 0
    c = 1; s = 0
else
    if |b| > |a|
        tau = -a/b; s = 1/sqrt(1 + tau^2); c = s tau
    else
        tau = -b/a; c = 1/sqrt(1 + tau^2); s = c tau
    end
end
This algorithm requires 5 flops and a single square root. Note that it does
not compute theta and so it does not involve inverse trigonometric functions.
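The givens function is tiny in Python; this sketch (our function name) keeps the overflow-safe quotient structure of the pseudocode:

    import numpy as np

    def givens(a, b):
        # Sketch: return c, s with [c s; -s c]^T [a; b] = [r; 0],
        # without forming sqrt(a^2 + b^2) directly.
        if b == 0.0:
            return 1.0, 0.0
        if abs(b) > abs(a):
            tau = -a / b
            s = 1.0 / np.sqrt(1.0 + tau * tau)
            return s * tau, s
        tau = -b / a
        c = 1.0 / np.sqrt(1.0 + tau * tau)
        return c, c * tau

Dividing by the larger of |a| and |b| keeps tau bounded by 1 in magnitude, which is what protects the computation from overflow.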
Example 5.1.4 If x = ( 1, 2, 3, 4 )^T, cos(theta) = 1/sqrt(5), and sin(theta) = -2/sqrt(5), then
G(2,4,theta)^T x = ( 1, sqrt(20), 3, 0 )^T.
    A([i, k], :) = [ c   s ]^T A([i, k], :)
                   [ -s  c ]

and requires just 6n flops:

for j = 1:n
    tau_1 = A(i,j)
    tau_2 = A(k,j)
    A(i,j) = c tau_1 - s tau_2
    A(k,j) = s tau_1 + c tau_2
end

Likewise, if G(i,k,theta) is in R^{m x m}, then the update A <- AG(i,k,theta) affects just
two columns of A,

    A(:, [i, k]) = A(:, [i, k]) [ c   s ]
                                [ -s  c ]

and requires 6m flops:

for j = 1:m
    tau_1 = A(j,i)
    tau_2 = A(j,k)
    A(j,i) = c tau_1 - s tau_2
    A(j,k) = s tau_1 + c tau_2
end
If

    Z = [ c   s ]
        [ -s  c ]

then we define the scalar rho by

if c = 0
    rho = 1
elseif |s| < |c|
    rho = sign(c)s/2        (5.1.9)
else
    rho = 2 sign(s)/c
end

Essentially, this amounts to storing s/2 if the sine is smaller and 2/c if the
cosine is smaller. With this encoding, it is possible to reconstruct +-Z as
follows:

if rho = 1
    c = 0; s = 1
elseif |rho| < 1
    s = 2 rho; c = sqrt(1 - s^2)        (5.1.10)
else
    c = 2/rho; s = sqrt(1 - c^2)
end
Consider a sequence of updates A_{k+1} = Q_k^T A_k Z_k, k = 1:p.
Assume that the above Householder and Givens algorithms are used for
both the generation and application of the Q_k and Z_k. Let Q^_k and Z^_k be
the orthogonal matrices that would be produced in the absence of roundoff.
It can be shown that the computed result is the exact outcome of the Q^_k
and Z^_k applied to a matrix near A, i.e., the process is backward stable.
Fast Givens transformations achieve a similar effect without square roots.
They are based on 2-by-2 matrices of the form

    M_1 = [ beta_1  1       ]        (5.1.12)
          [ 1       alpha_1 ]

together with a diagonal D = diag(d_1, d_2), for which

    M_1^T D M_1 = [ d_2 + beta_1^2 d_1            d_1 beta_1 + d_2 alpha_1 ]
                  [ d_1 beta_1 + d_2 alpha_1      d_1 + alpha_1^2 d_2      ]  =: D_1.

If x_2 != 0, alpha_1 = -x_1/x_2, and beta_1 = -alpha_1 d_2/d_1, then

    M_1^T x = [ x_2(1 + gamma_1) ]        (5.1.13)
              [ 0                ]

where gamma_1 = -alpha_1 beta_1 = alpha_1^2 d_2/d_1 >= 0, and D_1 = (1 + gamma_1)diag(d_2, d_1)
is again diagonal.
In practice these 2-by-2 transformations are embedded in n-by-n analogs
of Givens rotations. A "type 1" transformation F(i,k,alpha,beta) agrees with
the identity except that

    F(i,i) = beta,  F(i,k) = 1,  F(k,i) = 1,  F(k,k) = alpha,        (5.1.14)

while a "type 2" transformation agrees with the identity except that

    F(i,i) = 1,  F(i,k) = alpha,  F(k,i) = beta,  F(k,k) = 1.        (5.1.15)

Encapsulating all this we obtain a function [alpha, beta, type] = fast.givens(x, d)
that, given x in R^2 and positive d in R^2, determines alpha, beta, and the
transformation type so that the second component of F^T x is zero while
F^T diag(d_1, d_2)F remains diagonal; d is overwritten accordingly.
Problems

P5.1.6 Suppose x and y are unit vectors in R^n. Give an algorithm using Givens
transformations which computes an orthogonal Q such that Q^T x = y.

P5.1.7 Determine c = cos(theta) and s = sin(theta) such that
...
Householder matrices are named after A.S. Householder, who popularized their use in
numerical analysis. However, the properties of these matrices have been known for quite
some time. See

H.W. Turnbull and A.C. Aitken (1961). An Introduction to the Theory of Canonical
Matrices, Dover Publications, New York, pp. 102-5.

Other references concerned with Householder transformations include

C.H. Bischof and C. Van Loan (1987). "The WY Representation for Products of House-
holder Matrices," SIAM J. Sci. and Stat. Comp. 8, s2-s13.
R. Schreiber and B.N. Parlett (1988). "Block Reflectors: Theory and Computation,"
SIAM J. Numer. Anal. 25, 189-205.
R.S. Schreiber and C. Van Loan (1989). "A Storage-Efficient WY Representation for
Products of Householder Transformations," SIAM J. Sci. and Stat. Comp. 10,
52-57.
C. Puglisi (1992). "Modification of the Householder Method Based on the Compact WY
Representation," SIAM J. Sci. and Stat. Comp. 13, 723-726.
X. Sun and C.H. Bischof (1995). "A Basis-Kernel Representation of Orthogonal Matri-
ces," SIAM J. Matrix Anal. Appl. 16, 1184-1196.

Givens rotations, named after W. Givens, are also referred to as Jacobi rotations. Jacobi
devised a symmetric eigenvalue algorithm based on these transformations in 1846. See
§8.4. The Givens rotation storage scheme discussed in the text is detailed in

G.W. Stewart (1976). "The Economical Storage of Plane Rotations," Numer. Math. 25,
137-38.
5.2.1 Householder QR
We begin with a QR factorization method that utilizes Householder trans-
formations. The essence of the algorithm can be conveyed by a small ex-
ample. Suppose m = 6, n = 5, and assume that Householder matrices H_1
and H_2 have been computed so that

    H_2 H_1 A = [ x x x x x ]
                [ 0 x x x x ]
                [ 0 0 * x x ]
                [ 0 0 * x x ]
                [ 0 0 * x x ]
                [ 0 0 * x x ]

Concentrating on the starred entries, we determine a 4-by-4 Householder
matrix H~_3 that zeros all but the first of them. If H_3 = diag(I_2, H~_3), then

    H_3 H_2 H_1 A = [ x x x x x ]
                    [ 0 x x x x ]
                    [ 0 0 x x x ]
                    [ 0 0 0 x x ]
                    [ 0 0 0 x x ]
                    [ 0 0 0 x x ]
Algorithm 5.2.1 (Householder QR) Given A in R^{m x n} with m >= n, the
following algorithm finds Householder matrices H_1, ..., H_n such that if
Q = H_1 ... H_n, then Q^T A = R is upper triangular. The upper triangular
part of A is overwritten by R and components j+1:m of the jth Householder
vector are stored in A(j+1:m, j), j < m.

for j = 1:n
    [v, beta] = house(A(j:m, j))
    A(j:m, j:n) = (I_{m-j+1} - beta vv^T)A(j:m, j:n)
    if j < m
        A(j+1:m, j) = v(2:m-j+1)
    end
end
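Building on the house sketch given earlier, here is a minimal NumPy rendering of the same loop (function name ours):

    import numpy as np

    def householder_qr(A):
        # Sketch: on return, R sits in the upper triangle of A and the
        # essential parts of the Householder vectors sit below the
        # diagonal; betas holds the scalars beta_j.
        A = np.array(A, dtype=float)
        m, n = A.shape
        betas = np.zeros(n)
        for j in range(n):
            v, beta = house(A[j:, j])
            A[j:, j:] -= beta * np.outer(v, v @ A[j:, j:])
            if j < m - 1:
                A[j+1:, j] = v[1:]
            betas[j] = beta
        return A, betas

Q is never formed explicitly; it is held in the factored form described in §5.1 and applied via the stored vectors and betas when needed.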
Upon completion, the upper triangle of A houses R and position (i,j) with
i > j houses v_i^(j), the ith component of the jth Householder vector (here
m = 6, n = 5). This algorithm requires 2n^2(m - n/3) flops.
Let r be a blocking parameter. A block version of the factorization proceeds
as follows:

lambda = 1; k = 0
while lambda <= n
    tau = min(lambda + r - 1, n); k = k + 1
    Using Algorithm 5.2.1, upper triangularize A(lambda:m, lambda:tau),
        generating Householder matrices H_lambda, ..., H_tau.        (5.2.1)
    Use Algorithm 5.1.2 to get the block representation
        I + W_k Y_k^T = H_lambda ... H_tau.
    A(lambda:m, tau+1:n) = (I + W_k Y_k^T)^T A(lambda:m, tau+1:n)
    lambda = tau + 1
end

The zero-nonzero structure of the Householder vectors that define the ma-
trices H_lambda, ..., H_tau implies that the first lambda - 1 rows of W_k
and Y_k are zero. This fact would be exploited in a practical implementation.
The proper way to regard (5.2.1) is through the partitioning

    A = [ A_1 | A_2 | ... | A_N ]

where block column A_k is processed during the kth step. In the kth step of
(5.2.1), a block Householder is formed that zeros the subdiagonal portion
of A_k. The remaining block columns are then updated.
The roundoff properties of (5.2.1) are essentially the same as those for
Algorithm 5.2.1. There is a slight increase in the number of flops required
because of the W-matrix computations. However, as a result of the block-
ing, all but a small fraction of the flops occur in the context of matrix mul-
tiplication. In particular, the level-3 fraction of (5.2.1) is approximately
1 - 2/N. See Bischof and Van Loan (1987) for further details.
To illustrate the Givens approach, consider a 4-by-3 matrix A. The rota-
tions zero the subdiagonal entries one at a time, working up each column
from the bottom: entries (4,1), (3,1), (2,1), (4,2), (3,2), and finally (4,3),
with each rotation combining the 2-vector formed by the entry to be zeroed
and its northern neighbor:

    [ x x x ]                [ x x x ]
    [ x x x ]   ->  ...  ->  [ 0 x x ]  = R
    [ x x x ]                [ 0 0 x ]
    [ x x x ]                [ 0 0 0 ]

Clearly, if G_j denotes the jth Givens rotation in the reduction, then
Q^T A = R is upper triangular where Q = G_1 ... G_t and t is the total
number of rotations.
for j = 1:n
    for i = m:-1:j+1
        [c, s] = givens(A(i-1, j), A(i, j))
        A(i-1:i, j:n) = [ c   s ]^T A(i-1:i, j:n)
                        [ -s  c ]
    end
end
This algorithm requires 3n^2(m - n/3) flops. Note that we could use (5.1.9)
to encode (c, s) in a single number rho which could then be stored in the zeroed
entry A(i, j). An operation such as x <- Q^T x could then be implemented
by using (5.1.10), taking care to reconstruct the rotations in the proper
order.
Other sequences of rotations can be used to upper triangularize A. For
example, if we replace the for statements in Algorithm 5.2.2 with

for i = m:-1:2
    for j = 1:min{i-1, n}

then the zeros are introduced in a different order. Another possibility is to
pair each subdiagonal entry with the diagonal entry in its column:

for j = 1:n
    for i = m:-1:j+1
        [c, s] = givens(A(j, j), A(i, j))
        A([j i], j:n) = [ c   s ]^T A([j i], j:n)
                        [ -s  c ]
    end
end
If A has structure, fewer rotations may be needed. For example, when A is
upper Hessenberg only the subdiagonal must be zeroed. We then compute
G(3,4,theta_3) to zero the current (4,3) entry, thereby obtaining

    G(3,4,theta_3)^T G(2,3,theta_2)^T G(1,2,theta_1)^T A = [ x x x x x x ]
                                                           [ 0 x x x x x ]
                                                           [ 0 0 x x x x ]
                                                           [ 0 0 0 x x x ]
                                                           [ 0 0 0 x x x ]
                                                           [ 0 0 0 0 x x ]
fori"" l:m
d(i) = 1
end
for j = l:n
for i = m: - l:j + 1
[ et, {3, type J = fast.givens(A(i- l:i,j), d(i- l:i))
if type= 1
A(i -l:i,j:n) = [ ~ ! r A(i- l:i,j:n)
end
else
end
This algorithm requires 2n^2(m - n/3) flops. As we mentioned in the pre-
vious section, it is necessary to guard against overflow in fast Givens algo-
rithms such as the above. This means that M, D, and A must be periodi-
cally scaled if their entries become large.
If the QR factorization of a narrow band matrix is required, then the
fast Givens approach is attractive because it involves no square roots. (We
found LDL^T preferable to Cholesky in the narrow band case for the same
reason; see §4.3.6.) In particular, if A in R^{m x n} has upper bandwidth q and
lower bandwidth p, then Q^T A = R has upper bandwidth p + q. In this
case Givens QR requires about O(np(p+q)) flops and O(np) square roots.
Thus, the square roots are a significant portion of the overall computation
if p, q << n.
Comparing the kth columns in A = QR shows that span{a_1, ..., a_k} =
span{q_1, ..., q_k}, k = 1:n.
The matrices Q_1 = Q(1:m, 1:n) and Q_2 = Q(1:m, n+1:m) can be easily
computed from a factored form representation of Q.
If A = QR is a QR factorization of A in R^{m x n} and m >= n, then we refer
to A = Q(:, 1:n)R(1:n, 1:n) as the thin QR factorization. The next result
addresses the uniqueness issue for the thin QR factorization.
Theorem 5.2.2 Suppose A in R^{m x n} has full column rank. The thin QR
factorization

    A = Q_1 R_1

is unique where Q_1 in R^{m x n} has orthonormal columns and R_1 is upper tri-
angular with positive diagonal entries. Moreover, R_1 = G^T where G is the
lower triangular Cholesky factor of A^T A.

Proof. Since A^T A = (Q_1 R_1)^T (Q_1 R_1) = R_1^T R_1 we see that G = R_1^T is the
Cholesky factor of A^T A. This factor is unique by Theorem 4.2.5. Since
Q_1 = AR_1^{-1} it follows that Q_1 is also unique. []
It follows that if

    A^(k) = [ z  B ]
              1  n-k

then r_{kk} = || z ||_2, q_k = z/r_{kk}, and (r_{k,k+1}, ..., r_{kn}) = q_k^T B. We then
compute the outer product update A^(k+1) = B - q_k(r_{k,k+1}, ..., r_{kn}) and proceed
to the next step. This completely describes the kth step of MGS.
for k = 1:n
    R(k, k) = ||A(1:m, k)||_2
    Q(1:m, k) = A(1:m, k)/R(k, k)
    for j = k+1:n
        R(k, j) = Q(1:m, k)^T A(1:m, j)
        A(1:m, j) = A(1:m, j) - Q(1:m, k)R(k, j)
    end
end
This algorithm requires 2mn^2 flops. It is not possible to overwrite A with both Q1 and R1. Typically, the MGS computation is arranged so that A is overwritten by Q1, and the matrix R1 is stored in a separate array.
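A direct NumPy transcription of the MGS algorithm above makes this storage arrangement explicit: a copy of A is overwritten column by column with Q1 while R1 accumulates in a separate array. The function name is our own:

import numpy as np

def mgs(A):
    A = A.astype(float).copy()
    m, n = A.shape
    R = np.zeros((n, n))
    for k in range(n):
        R[k, k] = np.linalg.norm(A[:, k])
        A[:, k] /= R[k, k]                 # kth column of Q1
        for j in range(k + 1, n):
            R[k, j] = A[:, k] @ A[:, j]
            A[:, j] -= R[k, j] * A[:, k]
    return A, R                            # A now holds Q1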
for j = 1:n
    x = A(j:m, j)
    v = x +/- e^{i theta} ||x||_2 e_1   where x_1 = r e^{i theta}
    beta = 2/(v^H v)
    A(j:m, j:n) = (I_{m-j+1} - beta v v^H) A(j:m, j:n)
end

The reduction involves 8n^2(m - n/3) real flops, four times the number required to execute Algorithm 5.2.1. If Q = P_1 ... P_n is the product of the Householder transformations, then Q is unitary and Q^H A = R in C^{m x n} is complex and upper triangular.
Problems

P5.2.1 Adapt the Householder QR algorithm so that it can efficiently handle the case when A in R^{m x n} has lower bandwidth p and upper bandwidth q.

P5.2.2 Adapt the Householder QR algorithm so that it computes the factorization A = QL where L is lower triangular and Q is orthogonal. Assume that A is square. This involves rewriting the Householder vector function v = house(x) so that (I - 2vv^T/v^T v)x is zero everywhere but its bottom component.

P5.2.3 Adapt the Givens QR factorization algorithm so that the zeros are introduced by diagonal. That is, the entries are zeroed in the order (m,1), (m-1,1), (m,2), (m-2,1), (m-1,2), (m,3), etc.

P5.2.4 Adapt the fast Givens QR factorization algorithm so that it efficiently handles the case when A is n-by-n and tridiagonal. Assume that the subdiagonal, diagonal, and superdiagonal of A are stored in e(1:n-1), a(1:n), f(1:n-1) respectively. Design your algorithm so that these vectors are overwritten by the nonzero portion of T.

P5.2.5 Suppose L in R^{m x n} with m >= n is lower triangular. Show how Householder matrices H_1, ..., H_n can be used to determine a lower triangular L_1 in R^{n x n} so that

H_n ... H_1 L = [ L_1 ]
                [  0  ]
A = [ R ]
    [ 0 ]

where 0 is the (m-n)-by-n zero matrix. Verify that this statement is true after the first step of each method is completed.

P5.2.10 Reverse the loop orders in Algorithm 5.2.5 (MGS QR) so that R is computed column-by-column.

P5.2.11 Develop a complex version of the Givens QR factorization. Refer to P5.1.5 where complex Givens rotations are the theme. Is it possible to organize the calculations so that the diagonal elements of R are nonnegative?
The idea of using Householder transformations to solve the LS problem was proposed in

P. Businger and G.H. Golub (1965). "Linear Least Squares Solutions by Householder Transformations," Numer. Math. 7, 269-76. See also Wilkinson and Reinsch (1971, pp. 111-18).

G.H. Golub (1965). "Numerical Methods for Solving Linear Least Squares Problems," Numer. Math. 7, 206-16.
min_{x in R^n} ||Ax - b||_2   (5.3.1)

In this section we pursue these two solution approaches for the case when A has full column rank. Methods based on normal equations and the QR factorization are detailed and compared.
A^T A x_LS = A^T b.

These are called the normal equations. Since grad phi(x) = A^T(Ax - b) where phi(x) = (1/2)||Ax - b||_2^2, we see that solving the normal equations is tantamount to solving the gradient equation grad phi = 0. We call

r_LS = b - A x_LS

the minimum residual and use

rho_LS = ||A x_LS - b||_2

to denote its size. Note that if rho_LS is small, then we can "predict" b with the columns of A.
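As a sketch of the normal equations method under the full rank assumption, one can form A^T A, take its Cholesky factorization, and solve two triangular systems; SciPy's cho_factor/cho_solve are used here for brevity:

import numpy as np
from scipy.linalg import cho_factor, cho_solve

def ls_normal_equations(A, b):
    C = A.T @ A                    # n-by-n, symmetric positive definite
    d = A.T @ b
    return cho_solve(cho_factor(C), d)

As discussed below, this method is fast but its accuracy is governed by kappa_2(A)^2.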
So far we have been assuming that A in R^{m x n} has full column rank. This assumption is dropped in Section 5.5. However, even if rank(A) = n, we can expect trouble in the above procedures if A is nearly rank deficient. When assessing the quality of a computed LS solution xhat_LS, there are two important issues to bear in mind:
Suppose x_LS and xhat_LS minimize ||Ax - b||_2 and ||(A + dA)x - (b + db)||_2 respectively, where dA and db are perturbations with entries of order 10^{-6}. Let r_LS and rhat_LS be the corresponding minimum residuals.
The example suggests that the sensitivity of x_LS depends upon kappa_2(A)^2. At the end of this section we develop a perturbation theory for the LS problem and the kappa_2(A)^2 factor will return.
A = [   1       1    ]        b = [    2    ]
    [ 10^-3     0    ]            [  10^-3  ]
    [   0     10^-3  ]            [  10^-3  ]

then kappa_2(A) ~ 1.4 x 10^3, x_LS = (1, 1)^T, and rho_LS = 0. If the normal equations method is executed with base 10, t = 6 arithmetic, then a divide-by-zero occurs during the solution process, since

fl(A^T A) = [ 1 1 ]
            [ 1 1 ]

is exactly singular. On the other hand, if 7-digit arithmetic is used, then xhat_LS ~ (2.000001, 0)^T and ||x_LS - xhat_LS||_2/||x_LS||_2 ~ u kappa_2(A)^2.
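The same failure mode can be reproduced in IEEE double precision by shrinking 10^-3 to 10^-8, so that fl(1 + delta^2) = 1; this is a sketch under that substitution, not the base-10 arithmetic of the example:

import numpy as np

delta = 1e-8
A = np.array([[1.0, 1.0], [delta, 0.0], [0.0, delta]])
b = np.array([2.0, delta, delta])       # exact LS solution is (1, 1)

C = A.T @ A                             # fl(A^T A) = [[1, 1], [1, 1]]
print(np.linalg.matrix_rank(C))         # 1: the normal equations break down
Q, R = np.linalg.qr(A)
print(np.linalg.solve(R, Q.T @ b))      # QR still recovers approximately (1, 1)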
Q^T A = R = [ R_1 ]  n
            [  0  ]  m-n        (5.3.3)

where R_1 is upper triangular. If

Q^T b = [ c ]  n
        [ d ]  m-n

then ||Ax - b||_2^2 = ||R_1 x - c||_2^2 + ||d||_2^2 for any x in R^n. Thus, x_LS solves the upper triangular system R_1 x = c and

rho_LS = ||d||_2.
We conclude that the full rank LS problem can be readily solved once we have computed the QR factorization of A. Details depend on the exact QR procedure. If Householder matrices are used and Q^T is applied in factored form to b, then the computed solution xhat_LS minimizes ||(A + dA)x - (b + db)||_2 where

||dA||_F <= (6m - 3n + 41) n u ||A||_F + O(u^2)   (5.3.5)

and

||db||_2 <= (6m - 3n + 40) u ||b||_2 + O(u^2).   (5.3.6)

These inequalities are established in Lawson and Hanson (1974, p.90ff) and show that xhat_LS satisfies a "nearby" LS problem. (We cannot address the relative error in xhat_LS without an LS perturbation theory, to be discussed shortly.) We mention that similar results hold if Givens QR is used.
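A compact sketch of the QR approach (using a full rather than factored-form Q, which suffices for illustration; solve_triangular is from SciPy):

import numpy as np
from scipy.linalg import solve_triangular

def ls_qr(A, b):
    m, n = A.shape
    Q, R = np.linalg.qr(A, mode='complete')   # full m-by-m Q
    qtb = Q.T @ b
    x = solve_triangular(R[:n, :n], qtb[:n])  # R1 x = c
    rho = np.linalg.norm(qtb[n:])             # rho_LS = ||d||_2
    return x, rho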
If MGS is applied to the augmented matrix A_+ = [A b], we obtain

A_+ = [ A  b ] = [ Q_1  q_{n+1} ] [ R_1  z ]
                                  [  0  rho ]

and x_LS is obtained by solving R_1 x = z. If fast Givens transformations are used to compute M^T A = S = [S_1; 0] (n and m-n rows) with S_1 upper triangular, and

M^T b = [ c ]  n
        [ d ]  m-n

then x_LS is obtained by solving the nonsingular upper triangular system S_1 x = c.
The computed solution xhat_LS obtained in this fashion can be shown to solve a nearby LS problem in the sense of (5.3.4)-(5.3.6). This may seem surprising since large numbers can arise during the calculation. An entry in the scaling matrix D can double in magnitude after a single fast Givens update. However, largeness in D must be exactly compensated for by largeness in M, since D^{-1/2}M is orthogonal at all stages of the computation.
(5.3.7)

and

sin(theta) = rho_LS/||b||_2.

Differentiating the identity that defines x(t) and setting t = 0, it can be shown that

xdot(0) = (A^T A)^{-1} A^T (f - Ex) + (A^T A)^{-1} E^T r.   (5.3.12)
By substituting this result into (5.3.11), taking norms, and using the easily verified inequalities ||f||_2 <= ||b||_2 and ||E||_2 <= ||A||_2 we obtain

||xhat - x||_2 / ||x||_2  <=  eps { ||A||_2 ||(A^T A)^{-1} A^T||_2 ( ||b||_2/(||A||_2 ||x||_2) + 1 )
                                    + (||r||_2/(||A||_2 ||x||_2)) ||A||_2^2 ||(A^T A)^{-1}||_2 }  +  O(eps^2).

Thus,

||xhat - x||_2 / ||x||_2  <=  eps { kappa_2(A) ( 1/cos(theta) + 1 ) + kappa_2(A)^2 sin(theta)/cos(theta) }  +  O(eps^2).

Now write rhat = b + eps f - (A + eps E)xhat and observe that r = r(0) and rhat = r(eps). Using (5.3.12) it can be shown that

rdot(0) = (I - A(A^T A)^{-1} A^T)(f - Ex) - A(A^T A)^{-1} E^T r.
tan(theta) kappa_2(A)^2  =  ( rho_LS / sqrt(||b||_2^2 - rho_LS^2) ) kappa_2(A)^2.
Thus, in nonzero residual problems it is the square of the condition number that measures the sensitivity of x_LS. In contrast, residual sensitivity depends just linearly on kappa_2(A). These dependencies are confirmed by Example 5.3.1.

Thus, we may conclude that if rho_LS is small and kappa_2(A) is large, then the method of normal equations does not solve a nearby problem and will usually render an LS solution that is less accurate than a stable QR approach. Conversely, the two methods produce comparably inaccurate results when applied to large residual, ill-conditioned problems.

Finally, we mention two other factors that figure in the debate about QR versus normal equations:

At the very minimum, this discussion should convince you how difficult it can be to choose the "right" algorithm!
Problems

P5.3.2 Show that ||r - rhat||_2 <= kappa_2(A) eps ||x||_2.

P5.3.3 Let A in R^{m x n} with m > n and b in R^m and define Atilde = [A b] in R^{m x (n+1)}. Show that sigma_1(Atilde) >= sigma_1(A) and sigma_{n+1}(Atilde) <= sigma_n(A). Thus, the condition grows if a column is added to a matrix.

P5.3.4 Let A in R^{m x n} (m >= n), w in R^n, and define

B = [ A   ]
    [ w^T ]

Show that sigma_n(B) >= sigma_n(A) and sigma_1(B) <= sqrt(||A||_2^2 + ||w||_2^2). Thus, the condition of a matrix may increase or decrease if a row is added.

P5.3.6 (Cline 1973) Suppose that A in R^{m x n} has rank n and that Gaussian elimination with partial pivoting is used to compute the factorization PA = LU, where L in R^{m x n} is unit lower trapezoidal, U in R^{n x n} is upper triangular, and P in R^{m x m} is a permutation. Explain how the decomposition in P5.2.5 can be used to find a vector z in R^n such that ||Lz - Pb||_2 is minimized. Show that if Ux = z, then ||Ax - b||_2 is minimized. Show that this method of solving the LS problem is more efficient than Householder QR from the flop point of view whenever m <= 5n/3.

P5.3.8 The matrix C = (A^T A)^{-1}, where rank(A) = n, arises in many statistical applications and is known as the variance-covariance matrix. Assume that the factorization A = QR is available. (a) Show C = (R^T R)^{-1}. (b) Give an algorithm for computing the diagonal of C that requires n^3/3 flops. (c) Show that
Our restriction to least squares approximation is not a vote against minimization in other norms. There are occasions when it is advisable to minimize ||Ax - b||_p for p = 1 and infinity. Some algorithms for doing this are described in

A.K. Cline (1976). "A Descent Method for the Uniform Solution to Overdetermined Systems of Equations," SIAM J. Num. Anal. 13, 293-300.
R.H. Bartels, A.R. Conn, and C. Charalambous (1978). "On Cline's Direct Method for Solving Overdetermined Linear Systems in the L-infinity Sense," SIAM J. Num. Anal. 15, 255-70.
T.F. Coleman and Y. Li (1992). "A Globally and Quadratically Convergent Affine Scaling Method for Linear L1 Problems," Mathematical Programming 56, Series A, 189-222.
Y. Li (1993). "A Globally Convergent Method for Lp Problems," SIAM J. Optimization 3, 609-629.
Y. Zhang (1993). "A Primal-Dual Interior Point Approach for Computing the L1 and L-infinity Solutions of Overdetermined Linear Systems," J. Optimization Theory and Applications 77, 323-341.

The use of Gauss transformations to solve the LS problem has attracted some attention because they are cheaper to use than Householder or Givens matrices. See

G. Peters and J.H. Wilkinson (1970). "The Least Squares Problem and Pseudo-Inverses," Comp. J. 13, 309-16.
A.K. Cline (1973). "An Elimination Method for the Solution of Linear Least Squares Problems," SIAM J. Num. Anal. 10, 283-89.
R.J. Plemmons (1974). "Linear Least Squares by Elimination and MGS," J. Assoc. Comp. Mach. 21, 581-85.

G.H. Golub and J.H. Wilkinson (1966). "Note on the Iterative Refinement of Least Squares Solution," Numer. Math. 9, 139-48.
A. van der Sluis (1975). "Stability of the Solutions of Linear Least Squares Problems," Numer. Math. 23, 241-54.
Y. Saad (1986). "On the Condition Number of Some Gram Matrices Arising from Least Squares Approximation in the Complex Plane," Numer. Math. 48, 337-348.
A. Bjorck (1987). "Stability Analysis of the Method of Seminormal Equations," Lin. Alg. and Its Applic. 88/89, 31-48.
F.L. Bauer (1965). "Elimination with Weighted Row Combinations for Solving Linear Equations and Least Squares Problems," Numer. Math. 7, 338-52. See also Wilkinson and Reinsch (1971, pp. 119-33).

Least squares problems often have special structure which, of course, should be exploited.

M.G. Cox (1981). "The Least Squares Solution of Overdetermined Linear Equations having Band or Augmented Band Structure," IMA J. Num. Anal. 1, 3-22.
G. Cybenko (1984). "The Numerical Stability of the Lattice Algorithm for Least Squares Linear Prediction Problems," BIT 24, 441-455.
P.C. Hansen and H. Gesmar (1993). "Fast Orthogonal Decomposition of Rank-Deficient Toeplitz Matrices," Numerical Algorithms 4, 151-166.

The use of Householder matrices to solve sparse LS problems requires careful attention to avoid excessive fill-in.

J.K. Reid (1967). "A Note on the Least Squares Solution of a Band System of Linear Equations by Householder Reductions," Comp. J. 10, 188-89.
I.S. Duff and J.K. Reid (1976). "A Comparison of Some Methods for the Solution of Sparse Over-Determined Systems of Linear Equations," J. Inst. Math. Applic. 17, 267-80.
P.E. Gill and W. Murray (1976). "The Orthogonal Factorization of a Large Sparse Matrix," in Sparse Matrix Computations, ed. J.R. Bunch and D.J. Rose, Academic Press, New York, pp. 177-200.
L. Kaufman (1979). "Application of Dense Householder Transformations to a Sparse Matrix," ACM Trans. Math. Soft. 5, 442-51.
I.S. Duff (1974). "Pivot Selection and Row Ordering in Givens Reduction on Sparse Matrices," Computing 13, 239-48.
J.A. George and M.T. Heath (1980). "Solution of Sparse Linear Least Squares Problems Using Givens Rotations," Lin. Alg. and Its Applic. 34, 69-83.
There are interesting choices for Q and Z and these, together with the column pivoted QR factorization, are discussed in this section.

is its QR factorization, then rank(A) = 2 but ran(A) does not equal any of the subspaces span{q_1, q_2}, span{q_1, q_3}, or span{q_2, q_3}.

Fortunately, the Householder QR factorization procedure (Algorithm 5.2.1) can be modified in a simple way to produce an orthonormal basis for ran(A). The modified algorithm computes the factorization

Q^T A Pi = [ R_11  R_12 ]  r
           [  0     0   ]  m-r        (5.4.1)
              r    n-r

where r = rank(A), Q is orthogonal, R_11 is upper triangular, and Pi is a permutation. For each k we have

a_{c_k} = sum_{i=1}^{min{r,k}} r_{ik} q_i  in  span{q_1, ..., q_r}

implying

ran(A) = span{q_1, ..., q_r}.
The matrices Q and Pi are products of Householder matrices and interchange matrices respectively. Assume for some k that we have computed Householder matrices H_1, ..., H_{k-1} and permutations Pi_1, ..., Pi_{k-1} such that

(H_{k-1} ... H_1) A (Pi_1 ... Pi_{k-1}) = R^(k-1) = [ R_11^(k-1)  R_12^(k-1) ]  k-1
                                                    [     0       R_22^(k-1) ]  m-k+1   (5.4.2)

where R_11^(k-1) is nonsingular and upper triangular. Now suppose

R_22^(k-1) = [ z_k^(k-1), ..., z_n^(k-1) ]

is a column partitioning and let p >= k be the smallest index such that

||z_p^(k-1)||_2 = max { ||z_k^(k-1)||_2, ..., ||z_n^(k-1)||_2 }.   (5.4.3)

Note that if k-1 = rank(A), then this maximum is zero and we are finished. Otherwise, let Pi_k be the n-by-n identity with columns p and k interchanged and determine a Householder matrix H_k such that if R^(k) = H_k R^(k-1) Pi_k, then R^(k)(k+1:m, k) = 0. In other words, Pi_k moves the largest column in R_22^(k-1) to the lead position and H_k zeroes all of its subdiagonal components.
The column norms do not have to be recomputed at each stage if we exploit the property

Q^T z = [ alpha ]  1           ==>   ||w||_2^2 = ||z||_2^2 - alpha^2
        [   w   ]  s-1

which holds for any orthogonal matrix Q in R^{s x s}. This reduces the overhead associated with column pivoting from O(mn^2) flops to O(mn) flops because we can get the new column norms by updating the old column norms, e.g.,
for j = 1:n
    c(j) = A(1:m, j)^T A(1:m, j)
end
r = 0; tau = max{c(1), ..., c(n)}
Find smallest k with 1 <= k <= n so c(k) = tau
while tau > 0
    r = r + 1
    piv(r) = k;  A(1:m, r) <-> A(1:m, k);  c(r) <-> c(k)
    [v, beta] = house(A(r:m, r))
    A(r:m, r:n) = (I_{m-r+1} - beta v v^T) A(r:m, r:n)
    A(r+1:m, r) = v(2:m-r+1)
    for i = r+1:n
        c(i) = c(i) - A(r, i)^2
    end
    if r < n
        tau = max{c(r+1), ..., c(n)}
        Find smallest k with r+1 <= k <= n so c(k) = tau
    else
        tau = 0
    end
end
This algorithm requires 4mnr - 2r^2(m+n) + 4r^3/3 flops where r = rank(A). As with the nonpivoting procedure, Algorithm 5.2.1, the orthogonal matrix Q is stored in factored form in the subdiagonal portion of A.
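An illustrative NumPy version of Algorithm 5.4.1 follows, with the column-norm downdating trick; unlike the text's version, Q is formed explicitly rather than kept in factored form, and a small tolerance stands in for the exact test tau > 0:

import numpy as np

def house(x):
    # Householder vector v and beta with (I - beta v v^T) x = alpha e1,
    # |alpha| = ||x||_2, sign chosen to avoid cancellation.
    v = x.astype(float).copy()
    alpha = -np.sign(x[0] if x[0] != 0 else 1.0) * np.linalg.norm(x)
    v[0] -= alpha
    beta = 2.0 / (v @ v) if (v @ v) > 0 else 0.0
    return v, beta

def qr_col_pivot(A, tol=1e-12):
    A = A.astype(float).copy()
    m, n = A.shape
    Q = np.eye(m)
    piv = np.arange(n)
    c = (A * A).sum(axis=0)                 # squared column norms
    r = 0
    while r < min(m, n) and c[r:].max() > tol:
        k = r + int(np.argmax(c[r:]))       # largest remaining column
        A[:, [r, k]] = A[:, [k, r]]
        c[[r, k]] = c[[k, r]]
        piv[[r, k]] = piv[[k, r]]
        v, beta = house(A[r:, r])
        A[r:, r:] -= beta * np.outer(v, v @ A[r:, r:])
        Q[:, r:] -= beta * np.outer(Q[:, r:] @ v, v)
        c[r+1:] -= A[r, r+1:] ** 2          # downdate the column norms
        r += 1
    return Q, np.triu(A), piv, r            # A[:, piv] ~ Q @ R; r estimates rank(A)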
If Algorithm 5.4.1 is applied to

A = [  1  2  3 ]
    [  4  5  6 ]
    [  7  8  9 ]
    [ 10 11 12 ]

then Pi = [e_3 e_2 e_1] and to three significant digits we obtain the corresponding factors Q and R.
5.4.3 Bidiagonalization

Suppose A in R^{m x n} and m >= n. We next show how to compute orthogonal U_B (m-by-m) and V_B (n-by-n) such that

U_B^T A V_B = [ d_1 f_1  0  ...  0   ]
              [  0  d_2 f_2 ...  0   ]
              [          ...         ]
              [  0   0  ... 0   d_n  ]
              [  0   0  ... 0    0   ]

U_B = U_1 ... U_n and V_B = V_1 ... V_{n-2} can each be determined as a product of Householder matrices:
[ x x x x ]      [ x x x x ]      [ x x 0 0 ]
[ x x x x ]  U1  [ 0 x x x ]  V1  [ 0 x x x ]
[ x x x x ]  --> [ 0 x x x ]  --> [ 0 x x x ]
[ x x x x ]      [ 0 x x x ]      [ 0 x x x ]

      [ x x 0 0 ]      [ x x 0 0 ]      [ x x 0 0 ]
  U2  [ 0 x x x ]  V2  [ 0 x x 0 ]  U3  [ 0 x x 0 ]
  --> [ 0 0 x x ]  --> [ 0 0 x x ]  --> [ 0 0 x x ]
      [ 0 0 x x ]      [ 0 0 x x ]      [ 0 0 0 x ]
In general, U_k introduces zeros into the kth column, while V_k zeros the appropriate entries in row k. Overall we have:
If Algorithm 5.4.2 is applied to

A = [  1  2  3 ]
    [  4  5  6 ]
    [  7  8  9 ]
    [ 10 11 12 ]

then to three significant digits we obtain

B = [ 12.8  21.8    0   ]        V_B = [ 1.00   0.00   0.00 ]
    [  0    2.24  -.613 ]              [ 0.00  -.667  -.745 ]
    [  0     0      0   ]              [ 0.00  -.745   .667 ]
    [  0     0      0   ]

with a corresponding orthogonal U_B in R^{4 x 4} whose first two columns are

U_B(:, 1:2) = [ -.0776  -.833 ]
              [ -.311   -.451 ]
              [ -.543   -.069 ]
              [ -.776    .312 ]
5.4.4 R-Bidiagonalization

A faster method of bidiagonalizing when m >> n results if we upper triangularize A first before applying Algorithm 5.4.2. In particular, suppose we compute the QR factorization A = QR with R_1 in R^{n x n} upper triangular and then bidiagonalize R_1, obtaining U_R^T R_1 V_B = B_1. If U = Q diag(U_R, I_{m-n}) and V = V_B, then

U^T A V = [ B_1 ]  =  B
          [  0  ]

is a bidiagonalization of A.

The idea of computing the bidiagonalization in this manner is mentioned in Lawson and Hanson (1974, p.119) and more fully analyzed in Chan (1982a). We refer to this method as R-bidiagonalization. By comparing its flop count (2mn^2 + 2n^3) with that for Algorithm 5.4.2 (4mn^2 - 4n^3/3) we see that it involves fewer computations (approximately) whenever m >= 5n/3.
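For reference, here is a compact (not performance-oriented) NumPy sketch of Householder bidiagonalization in the spirit of Algorithm 5.4.2; U_k zeros the kth column below the diagonal and V_k zeros row k beyond the superdiagonal. The names are our own:

import numpy as np

def house(x):
    v = x.astype(float).copy()
    alpha = -np.sign(x[0] if x[0] != 0 else 1.0) * np.linalg.norm(x)
    v[0] -= alpha
    beta = 2.0 / (v @ v) if (v @ v) > 0 else 0.0
    return v, beta

def bidiagonalize(A):
    B = A.astype(float).copy()
    m, n = B.shape
    U, V = np.eye(m), np.eye(n)
    for k in range(n):
        v, beta = house(B[k:, k])                        # U_k
        B[k:, k:] -= beta * np.outer(v, v @ B[k:, k:])
        U[:, k:] -= beta * np.outer(U[:, k:] @ v, v)
        if k < n - 2:
            w, gamma = house(B[k, k+1:])                 # V_k
            B[k:, k+1:] -= gamma * np.outer(B[k:, k+1:] @ w, w)
            V[:, k+1:] -= gamma * np.outer(V[:, k+1:] @ w, w)
    return U, B, V                                       # U^T A V = B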
Problems

P5.4.1 Suppose A in R^{m x n} with m < n. Give an algorithm for computing the factorization

U^T A V = [ B  0 ]

where B is an m-by-m upper bidiagonal matrix. (Hint: Obtain the form

[ x x 0 0 0 0 ]
[ 0 x x 0 0 0 ]
[ 0 0 x x 0 0 ]
[ 0 0 0 x x 0 ]

using Householder matrices and then "chase" the (m, m+1) entry up the (m+1)st column by applying Givens rotations from the right.)

P5.4.2 Show how to efficiently bidiagonalize an n-by-n upper triangular matrix using Givens rotations.

P5.4.3 Show how to upper bidiagonalize a tridiagonal matrix T in R^{n x n} using Givens rotations.

P5.4.4 Let A in R^{m x n} and assume that 0 != v satisfies ||Av||_2 = sigma_n(A)||v||_2. Let Pi be a permutation such that if Pi^T v = w, then |w_n| = ||w||_inf. Show that if A Pi = QR is the QR factorization of A Pi, then |r_nn| <= sqrt(n) sigma_n(A). Thus, there always exists a permutation Pi such that the QR factorization of A Pi "displays" near rank deficiency.

P5.4.5 Let x, y in R^m and Q in R^{m x m} be given with Q orthogonal. Show that if

Q^T x = [ alpha ]  1           Q^T y = [ beta ]  1
        [   u   ]  m-1                 [  v   ]  m-1

then u^T v = x^T y - alpha beta.

P5.4.6 Let A = [a_1, ..., a_n] in R^{m x n} and b in R^m be given. For any subset of A's columns {a_{c_1}, ..., a_{c_k}} define
The computation of the SVD is detailed in Section 8.6. But here are some of the standard references concerned with its calculation:

G.H. Golub and W. Kahan (1965). "Calculating the Singular Values and Pseudo-Inverse of a Matrix," SIAM J. Num. Anal. 2, 205-24.
P.A. Businger and G.H. Golub (1969). "Algorithm 358: Singular Value Decomposition of a Complex Matrix," Comm. ACM 12, 564-65.
G.H. Golub and C. Reinsch (1970). "Singular Value Decomposition and Least Squares Solutions," Numer. Math. 14, 403-20. See also Wilkinson and Reinsch (1971, pp. 134-51).
T.F. Chan (1982). "An Improved Algorithm for Computing the Singular Value Decomposition," ACM Trans. Math. Soft. 8, 72-83.
P.A. Businger and G.H. Golub (1965). "Linear Least Squares Solutions by Householder Transformations," Numer. Math. 7, 269-76. See also Wilkinson and Reinsch (1971, pp. 111-18).
L.V. Foster (1986). "Rank and Null Space Calculations Using Matrix Decomposition without Column Interchanges," Lin. Alg. and Its Applic. 74, 47-71.
T.F. Chan (1987). "Rank Revealing QR Factorizations," Lin. Alg. and Its Applic. 88/89, 67-82.
T.F. Chan and P.C. Hansen (1992). "Some Applications of the Rank Revealing QR Factorization," SIAM J. Sci. and Stat. Comp. 13, 727-741.
J.L. Barlow and U.B. Vemulapati (1992). "Rank Detection Methods for Sparse Matrices," SIAM J. Matrix Anal. Appl. 13, 1279-1297.
T-M. Hwang, W-W. Lin, and E.K. Yang (1992). "Rank-Revealing LU Factorizations," Lin. Alg. and Its Applic. 175, 115-141.
C.H. Bischof and P.C. Hansen (1992). "A Block Algorithm for Computing Rank-Revealing QR Factorizations," Numerical Algorithms 2, 371-392.
S. Chandrasekaran and I.C.F. Ipsen (1994). "On Rank-Revealing Factorizations," SIAM J. Matrix Anal. Appl. 15, 592-622.
R.D. Fierro and P.C. Hansen (1995). "Accuracy of TSVD Solutions Computed from Rank-Revealing Decompositions," Numer. Math. 70, 453-472.
Q^T A Z = [ T_11  0 ]  r
          [  0    0 ]  m-r        r = rank(A)
             r   n-r

then

||Ax - b||_2^2 = ||(Q^T A Z) Z^T x - Q^T b||_2^2 = ||T_11 w - c||_2^2 + ||d||_2^2

where

Z^T x = [ w ]  r           Q^T b = [ c ]  r
        [ y ]  n-r                 [ d ]  m-r

Clearly, if x is to minimize the sum of squares, then we must have w = T_11^{-1} c. For x to have minimal 2-norm, y must be zero, and thus

x_LS = Z [ T_11^{-1} c ]
         [      0      ]
Theorem 5.5.1 Suppose U^T A V = Sigma is the SVD of A in R^{m x n} with r = rank(A). If U = [u_1, ..., u_m] and V = [v_1, ..., v_n] are column partitionings and b in R^m, then

x_LS = sum_{i=1}^{r} (u_i^T b / sigma_i) v_i   (5.5.1)

minimizes ||Ax - b||_2 and has the smallest 2-norm of all minimizers. Moreover, for any x in R^n with alpha = V^T x,

||Ax - b||_2^2 = sum_{i=1}^{r} (sigma_i alpha_i - u_i^T b)^2 + sum_{i=r+1}^{m} (u_i^T b)^2,

so rho_LS^2 = sum_{i=r+1}^{m} (u_i^T b)^2.

If we define A^+ = V Sigma^+ U^T where Sigma^+ = diag(1/sigma_1, ..., 1/sigma_r, 0, ..., 0) in R^{n x m}, then x_LS = A^+ b and rho_LS = ||(I - AA^+)b||_2. A^+ is referred to as the pseudo-inverse of A. It is the unique minimal Frobenius norm solution to the problem

min_{X in R^{n x m}} ||AX - I_m||_F.   (5.5.3)

For example, if A = [1 0; 0 0; 0 0], then

A^+ = [ 1 0 0 ]
      [ 0 0 0 ]
R = [ R_11  R_12 ]  r
    [  0     0   ]  m-r
       r    n-r

Given this reduction, the LS problem can be readily solved. Indeed, for any x in R^n we have

||Ax - b||_2^2 = ||(Q^T A Pi)(Pi^T x) - Q^T b||_2^2 = ||R_11 z - (c - R_12 w)||_2^2 + ||d||_2^2

where

Pi^T x = [ z ]  r           Q^T b = [ c ]  r
         [ w ]  n-r                 [ d ]  m-r

Thus, if x is an LS minimizer, then we must have

x = Pi [ R_11^{-1}(c - R_12 w) ]
       [           w           ]

If w is set to zero in this expression, then we obtain the basic solution

x_B = Pi [ R_11^{-1} c ]    (5.5.4)
         [      0      ]

Since column pivoting tends to force near rank deficiency to surface in a small trailing block of R, it would seem practically impossible for near rank deficiency to go unnoticed. The following example shows that this is not quite the case.
Example 5.5.1 Let

T_n(c) = diag(1, s, ..., s^{n-1}) [ 1 -c -c ... -c ]
                                  [ 0  1 -c ... -c ]
                                  [       ...      ]
                                  [ 0  0 ...  1 -c ]
                                  [ 0  0 ...  0  1 ]

with c^2 + s^2 = 1 and c, s > 0. (See Lawson and Hanson (1974, p.31).) These matrices are unaltered by Algorithm 5.4.1 and thus ||R_22^(k)||_2 >= s^{n-1} for k = 1:n-1. This inequality implies (for example) that the matrix T_100(.2) has no particularly small trailing principal submatrix since r_nn = s^{99} ~ .13. However, it can be shown that sigma_n = O(10^{-8}).
From (5.5.7) and (5.5.8) it follows that |sigma_k - sigmahat_k| <= eps sigma_1 for k = 1:n. Thus, if A has rank r then we can expect n - r of the computed singular values to be small. Near rank deficiency in A cannot escape detection when the SVD of A is computed.
Example 5.5.2 For the matrix T_100(.2) in Example 5.5.1, sigma_n ~ .367 x 10^{-8}.
One approach to estimating r = rank(A) from the computed singular values is to have a tolerance delta > 0 and a convention that A has "numerical rank" rhat if the sigmahat_i satisfy

sigmahat_1 >= ... >= sigmahat_rhat > delta >= sigmahat_{rhat+1} >= ... >= sigmahat_n.

With this convention we may regard

x_rhat = sum_{i=1}^{rhat} (u_i^T b / sigmahat_i) v_i

as an approximation to x_LS. Since ||x_rhat||_2 ~ 1/sigmahat_rhat <= 1/delta, delta may also be chosen with the intention of producing an approximate LS solution with suitably small norm. In Section 12.1, we discuss more sophisticated methods for doing this.
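A sketch of this numerical-rank convention and of the truncated solution x_rhat, assuming NumPy's SVD:

import numpy as np

def tsvd_solve(A, b, delta):
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    rhat = int(np.sum(s > delta))            # numerical rank for tolerance delta
    coeffs = (U[:, :rhat].T @ b) / s[:rhat]
    return Vt[:rhat].T @ coeffs, rhat        # x_rhat and rhat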
If sigmahat_rhat >> delta, then we have reason to be comfortable with x_rhat because A can then be unambiguously regarded as a rank-rhat matrix (modulo delta). On the other hand, {sigmahat_1, ..., sigmahat_n} might not clearly split into subsets of small and large singular values, making the determination of rhat by this means somewhat arbitrary. This leads to more complicated methods for estimating rank which we now discuss in the context of the LS problem.
For example, suppose r = n and assume for the moment that the perturbation in (5.5.10) is zero. Thus sigma_i = sigmahat_i for i = 1:n. Denote the ith columns of the matrices U, W, V, and Z by u_i, w_i, v_i, and z_i, respectively. Subtracting x_rhat from x_LS and taking norms we obtain an upper bound on the error. Again rhat could be chosen to minimize the upper bound. See Varah (1973) for practical details and also the LAPACK manual.
Problems

P5.5.1 Show that if

A = [ T  S ]  r
    [ 0  0 ]  m-r
      r  n-r

where r = rank(A) and T is nonsingular, then

X = [ T^{-1}  0 ]  r
    [   0     0 ]  n-r
        r    m-r

satisfies AXA = A and (AX)^T = AX. In this case, we say that X is a (1,3) pseudo-inverse of A. Show that for general A, x_B = Xb where X is a (1,3) pseudo-inverse of A.

P5.5.2 Define B(lambda) in R^{n x m} by B(lambda) = (A^T A + lambda I)^{-1} A^T, where lambda > 0. Show that

||B(lambda) - A^+||_2 = lambda / ( sigma_r(A)(sigma_r(A)^2 + lambda) ),   r = rank(A).
C.L. Lawson and R.J. Hanson (1969). "Extensions and Applications of the Householder Algorithm for Solving Linear Least Squares Problems," Math. Comp. 23, 787-812.
G.H. Golub and V. Pereyra (1973). "The Differentiation of Pseudo-Inverses and Nonlinear Least Squares Problems Whose Variables Separate," SIAM J. Num. Anal. 10, 413-32.

Standard treatments of LS perturbation theory may be found in Lawson and Hanson (1974), Stewart and Sun (1990), Bjorck (1996), and

P.A. Wedin (1973). "Perturbation Theory for Pseudo-Inverses," BIT 13, 217-32.
G.W. Stewart (1977). "On the Perturbation of Pseudo-Inverses, Projections, and Linear Least Squares," SIAM Review 19, 634-62.

Even for full rank problems, column pivoting seems to produce more accurate solutions. The error analysis in the following paper attempts to explain why.

L.S. Jennings and M.R. Osborne (1974). "A Direct Error Analysis for Least Squares," Numer. Math. 22, 322-32.

Various other aspects of rank deficiency are discussed in

J.M. Varah (1973). "On the Numerical Solution of Ill-Conditioned Linear Systems with Applications to Ill-Posed Problems," SIAM J. Num. Anal. 10, 257-67.
G.W. Stewart (1984). "Rank Degeneracy," SIAM J. Sci. and Stat. Comp. 5, 403-413.
P.C. Hansen (1987). "The Truncated SVD as a Method for Regularization," BIT 27, 534-553.
G.W. Stewart (1987). "Collinearity and Least Squares Regression," Statistical Science 2, 68-100.

We have more to say on the subject in Sections 12.1 and 12.2.
Assume rank(A) = n and that x_D solves (5.6.3). It follows that the solution x_LS to (5.6.1) satisfies

This shows that row weighting in the LS problem affects the solution. (An important exception occurs when b in ran(A), for then x_D = x_LS.)

One way of determining D is to let d_k be some measure of the uncertainty in b_k, e.g., the reciprocal of the standard deviation in b_k. The tendency is for r_k = e_k^T(b - A x_D) to be small whenever d_k is large. The precise effect of d_k on r_k can be clarified as follows. Define

If D = I_4 then x_D = (-1, .85)^T and r = b - A x_D = (.3, ..., -.1, .2)^T. On the other hand, if D = diag(1000, 1, 1, 1) then x_D ~ (-1.43, 1.21)^T and r = b - A x_D ~ (.000428, -.571428, -.142853, .285714)^T.
(5.6.8)

Notice that this problem is defined even if A and B are rank deficient. Although Paige's technique can be applied when this is the case, we shall describe it under the assumption that both these matrices have full rank. The first step is to compute the QR factorization of A:
r^(0) = 0; x^(0) = 0
for k = 0, 1, ...
    [ f^(k) ]   [ b ]   [  I   A ] [ r^(k) ]
    [ g^(k) ] = [ 0 ] - [ A^T  0 ] [ x^(k) ]

    Solve [  I   A ] [ p^(k) ]   [ f^(k) ]
          [ A^T  0 ] [ z^(k) ] = [ g^(k) ]

    [ r^(k+1) ]   [ r^(k) ]   [ p^(k) ]
    [ x^(k+1) ] = [ x^(k) ] + [ z^(k) ]
end

Using the QR factorization A = QR, the augmented system

[  I   A ] [ r ]   [ b ]
[ A^T  0 ] [ x ] = [ 0 ]

transforms to

[ I_n      0      R_1 ] [ f ]   [ c ]
[  0    I_{m-n}    0  ] [ g ] = [ d ]
[ R_1^T    0       0  ] [ z ]   [ 0 ]

where

Q^T r = [ f ]  n           Q^T b = [ c ]  n
        [ g ]  m-n                 [ d ]  m-n
this heuristic.
Problems

P5.6.1 Let

Delta = diag(1, ..., 1, 1+delta, 1, ..., 1),   delta > -1,

where the perturbed entry is in the kth position. Denote the LS solution to min ||Delta(Ax - b)||_2 by x(delta) and its residual by r(delta) = b - Ax(delta). (a) Show

r(delta) = ( I - (delta/(1 + delta e_k^T A(A^T A)^{-1} A^T e_k)) A(A^T A)^{-1} A^T e_k e_k^T ) r(0).

(b) Letting r_k(delta) stand for the kth component of r(delta), show

r_k(delta) = r_k(0) / (1 + delta e_k^T A(A^T A)^{-1} A^T e_k).

(c) Use (b) to verify (5.6.5).

P5.6.3 Show how the SVD can be used to solve the generalized LS problem when the matrices A and B in (5.6.8) are rank deficient.
(a) Assuming that the QR factorization of A is available, how many flops per iteration are required? (b) Show that the above iteration results by setting g^(k) = 0 in the iterative improvement scheme given in Section 5.6.4.

Row and column weighting in the LS problem is discussed in Lawson and Hanson (SLS, pp. 186-88). The various effects of scaling are discussed in

A. van der Sluis (1969). "Condition Numbers and Equilibration of Matrices," Numer. Math. 14, 14-23.
G.W. Stewart (1984b). "On the Asymptotic Behavior of Scaled Singular Value and QR Decompositions," Math. Comp. 43, 483-489.

The theoretical and computational aspects of the generalized least squares problem appear in
Method                             Flops
Gaussian Elimination               2n^3/3
Householder Orthogonalization      4n^3/3
Modified Gram-Schmidt              2n^3
Bidiagonalization                  8n^3/3
Singular Value Decomposition       12n^3
Q^T A Pi = [ R_1  R_2 ]   (5.7.1)

where R_1 in R^{m x m} is upper triangular and Pi is a permutation. A solution of Ax = b is then obtained by solving R_1 z_1 = Q^T b and setting

x = Pi [ z_1 ]
       [  0  ]
The minimum 2-norm solution can be computed via the QR factorization of A^T:

A^T = QR   (QR factorization)
Solve R(1:m, 1:m)^T z = b.
x = Q(:, 1:m) z
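A minimal NumPy/SciPy sketch of this procedure for a full rank A with m < n:

import numpy as np
from scipy.linalg import solve_triangular

def min_norm_solution(A, b):
    Q, R = np.linalg.qr(A.T)                  # A^T = QR, Q is n-by-m
    z = solve_triangular(R.T, b, lower=True)  # R^T z = b
    return Q @ z                              # x = Q(:,1:m) z

Since x lies in ran(A^T), it is the minimum 2-norm solution of Ax = b.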
Proof. Let E and f be defined by dA/eps and db/eps. Note that rank(A + tE) = m for all 0 < t < eps and that

x(t) = (A + tE)^T ((A + tE)(A + tE)^T)^{-1} (b + tf)

satisfies (A + tE)x(t) = b + tf. By differentiating this expression with respect to t and setting t = 0 in the result we obtain

xdot(0) = (I - A^T(AA^T)^{-1}A) E^T (AA^T)^{-1} b + A^T(AA^T)^{-1}(f - Ex).

Since

||x||_2 = ||A^T(AA^T)^{-1}b||_2 >= sigma_m(A) ||(AA^T)^{-1}b||_2

and

||I - A^T(AA^T)^{-1}A||_2 = min(1, n - m),

we have

||xhat - x||_2/||x||_2 = ||x(eps) - x(0)||_2/||x(0)||_2 = eps ||xdot(0)||_2/||x(0)||_2 + O(eps^2)
                      <= eps min(1, n - m) { ||E||_2/||A||_2 + ||f||_2/||b||_2 + ||E||_2/||A||_2 } kappa_2(A) + O(eps^2)

from which the theorem follows. []
Problems

P5.7.1 Derive the expression for xdot(0) given in the proof of Theorem 5.7.1.

P5.7.2 Find the minimum norm solution to the system Ax = b when A = [1 2 3] and b = 1.

P5.7.3 Show how triangular system solving can be avoided when using the QR factorization to solve an underdetermined system.

P5.7.4 Suppose b, x in R^n are given. Consider the following problems:
R.E. Cline and R.J. Plemmons (1976). "l2-Solutions to Underdetermined Linear Systems," SIAM Review 18, 92-106.
M. Arioli and A. Laratta (1985). "Error Analysis of an Algorithm for Solving an Underdetermined System," Numer. Math. 46, 255-268.
J.W. Demmel and N.J. Higham (1993). "Improved Error Bounds for Underdetermined System Solvers," SIAM J. Matrix Anal. Appl. 14, 1-14.

The QR factorization can of course be used to solve linear systems. See
Parallel Matrix
Computations

The parallel matrix computation area has been the focus of intense research. Although much of the work is machine/system dependent, a number of basic strategies have emerged. Our aim is to present these along with a picture of what it is like to "think parallel" during the design of a matrix computation.
The distributed and shared memory paradigms are considered. We use matrix-vector multiplication to introduce the notion of a node program in Section 6.1. Load balancing, speed-up, and synchronization are also discussed. In Section 6.2 matrix-matrix multiplication is used to show the effect of blocking on granularity and to convey the spirit of two-dimensional data flow. Two parallel implementations of the Cholesky factorization are given in Section 6.3.

Sorensen, and van der Vorst (1991), and Golub and Ortega (1993) and the excellent review papers by Heller (1978), Ortega and Voigt (1985), Gallivan, Plemmons, and Sameh (1990), and Demmel, Heath, and van der Vorst (1993).
important interconnection schemes include the mesh and torus (for their close correspondence with two-dimensional arrays), the hypercube (for its generality and optimality), and the tree (for its handling of divide and conquer procedures). See Ortega and Voigt (1985) for a discussion of the possibilities. Our immediate goal is to develop a ring algorithm for (6.1.1). Matrix multiplication on a torus is discussed in Section 6.2.

Each processor has an identification number. The mu-th processor is designated by Proc(mu). We say that Proc(lambda) is a neighbor of Proc(mu) if there is a direct physical connection between them. Thus, in a p-processor ring, Proc(p-1) and Proc(1) are neighbors of Proc(p).
6.1.2 Communication

To describe the sending and receiving of messages we adopt a simple notation:

Scalars and vectors are matrices and therefore messages. In our model, if Proc(mu) executes the instruction send(V_loc, lambda), then a copy of the local matrix V_loc is sent to Proc(lambda) and the execution of Proc(mu)'s node program resumes immediately. It is legal for a processor to send a message to itself. To emphasize that a matrix is stored in a local memory we use the subscript "loc."

If Proc(mu) executes the instruction recv(U_loc, lambda), then the execution of its node program is suspended until a message is received from Proc(lambda). Once received, the message is placed in a local matrix U_loc and Proc(mu) resumes execution of its node program.

Although the syntax and semantics of our send/receive notation is adequate for our purposes, it does suppress a number of important details:
and store each column in a processor, i.e., x(1 + (mu-1)r:mu r) in Proc(mu). (In this context "in" means "is stored in.") Note that each processor houses a contiguous portion of x.

In the store-by-row scheme we regard x as a p-by-r matrix

where A_{mu tau} in R^{r x r} and x_mu, y_mu, z_mu in R^r. We assume that at the start of computation Proc(mu) houses x_mu, y_mu, and the mu-th block row of A. Upon completion we set as our goal the overwriting of y_mu by z_mu. From the Proc(mu) perspective, the computation of

z_mu = y_mu + sum_{tau=1}^{p} A_{mu tau} x_tau

involves local data (A_{mu tau}, y_mu, x_mu) and nonlocal data (x_tau, tau != mu). To make the nonlocal portions of x available, we circulate its subvectors around the ring. For example, in the p = 3 case we rotate the x_1, x_2, and x_3 as follows:
Algorithm 6.1.1 Suppose A in R^{n x n}, x in R^n, and y in R^n are given and that z = y + Ax. If each processor in a p-processor ring executes the following node program and n = rp, then upon completion Proc(mu) houses z(1 + (mu-1)r:mu r) in y_loc. Assume the following local memory initializations: p, mu (the node id), left and right (the neighbor id's), n, row = 1+(mu-1)r:mu r, A_loc = A(row, :), x_loc = x(row), y_loc = y(row).
for t = 1:p
    send(x_loc, right)
    recv(x_loc, left)
    tau = mu - t
    if tau <= 0
        tau = tau + p
    end
    { x_loc = x(1 + (tau-1)r:tau r) }
    y_loc = y_loc + A_loc(:, 1 + (tau-1)r:tau r) x_loc
end
The index tau names the currently available x subvector. Once it is known, it is possible to carry out the update of the locally housed portion of y. The send-recv pair passes the currently housed x subvector to the right and waits to receive the next one from the left. Synchronization is achieved because the local y update cannot begin until the "new" x subvector arrives. It is impossible for one processor to "race ahead" of the others or for an x subvector to pass another in the merry-go-round. The algorithm is tailored to the ring topology in that only nearest neighbor communication is involved. The computation is also perfectly load balanced, meaning that each processor has the same amount of computation and communication. Load imbalance is discussed further in Section 6.1.7.
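The following sequential NumPy simulation of Algorithm 6.1.1 may clarify the dataflow; the list rotation stands in for the send/recv pairs of a real ring, and all names are illustrative:

import numpy as np

def ring_gaxpy(A, x, y, p):
    n = len(x); r = n // p                   # assume n = r*p
    xloc = [x[mu*r:(mu+1)*r].copy() for mu in range(p)]
    yloc = [y[mu*r:(mu+1)*r].copy() for mu in range(p)]
    for t in range(1, p + 1):
        xloc = xloc[-1:] + xloc[:-1]         # every processor "sends right"
        for mu in range(1, p + 1):
            tau = mu - t
            if tau <= 0:
                tau += p
            cols = slice((tau-1)*r, tau*r)
            yloc[mu-1] += A[(mu-1)*r:mu*r, cols] @ xloc[mu-1]
    return np.concatenate(yloc)              # equals y + A @ x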
The design of a parallel program involves subtleties that do not arise in the uniprocessor setting. For example, if we inadvertently reverse the order of the send and the recv, then each processor starts its node program by waiting for a message from its left neighbor. Since that neighbor in turn is waiting for a message from its left neighbor, a state of deadlock results.
seconds to carry out. Here alpha_d is the time required to initiate the send or recv and beta_d is the reciprocal of the rate at which a message can be transferred. Note that this model does not take into consideration the "distance" between the sender and receiver. Clearly, it takes longer to pass a message halfway around a ring than to a neighbor. That is why it is always desirable to arrange (if possible) a distributed computation so that communication is just between neighbors.

During each step in Algorithm 6.1.1 an r-vector is sent and received and 2r^2 flops are performed. If the computation proceeds at R flops per second and there is no idle waiting associated with the recv, then each y update requires approximately (2r^2/R) + 2(alpha_d + beta_d r) seconds.
Another instructive statistic is the computation-to-communication ratio. A related quantity is the efficiency

E = T(1)/(p T(p))

where T(k) is the time required to execute the program on k processors. If computation proceeds at R flops/sec and communication is modeled by (6.1.3), then a reasonable estimate of T(k) for Algorithm 6.1.1 is given by

T(k) = sum_{t=1}^{k} [ 2(n/k)^2/R + 2(alpha_d + beta_d(n/k)) ] = 2n^2/(kR) + 2 alpha_d k + 2 beta_d n

and so

E = 1 / ( 1 + R(alpha_d p^2 + beta_d p n)/n^2 ).

Efficiency improves with increasing n and degrades with increasing p or R. In practice, benchmarking is the only dependable way to assess efficiency.
A concept related to efficiency is speed-up. We say that a parallel algorithm for a particular problem achieves speed-up S if

S = T_seq/T_par

where T_par is the time required for execution of the parallel program and T_seq is the time required by one processor when the best uniprocessor procedure is used. For some problems, the fastest sequential algorithm does not parallelize and so two distinct algorithms are involved in the speed-up assessment.
We mention that speed-up analyses are particularly illuminating in settings where the nodes are able to overlap computation and communication.
if tau <= mu
    y_loc = y_loc + A_loc(:, 1 + (tau-1)r:tau r) x_loc
end

then the overall number of flops is halved. This solves the superfluous flops problem but it creates a load imbalance problem. Proc(mu) oversees about 2 mu r^2 flops, an increasing function of the processor id mu. Consider the following r = p = 3 example:
[ z1 ]   [ a 0 0 0 0 0 0 0 0 ] [ x1 ]   [ y1 ]
[ z2 ]   [ a a 0 0 0 0 0 0 0 ] [ x2 ]   [ y2 ]
[ z3 ]   [ a a a 0 0 0 0 0 0 ] [ x3 ]   [ y3 ]
[ z4 ]   [ b b b b 0 0 0 0 0 ] [ x4 ]   [ y4 ]
[ z5 ] = [ b b b b b 0 0 0 0 ] [ x5 ] + [ y5 ]
[ z6 ]   [ b b b b b b 0 0 0 ] [ x6 ]   [ y6 ]
[ z7 ]   [ g g g g g g g 0 0 ] [ x7 ]   [ y7 ]
[ z8 ]   [ g g g g g g g g 0 ] [ x8 ]   [ y8 ]
[ z9 ]   [ g g g g g g g g g ] [ x9 ]   [ y9 ]

Here, Proc(1) handles the "a" part, Proc(2) handles the "b" part, and Proc(3) handles the "g" part.
However, if processors 1, 2, and 3 compute (z1, z4, z7), (z2, z5, z8), and (z3, z6, z9), respectively, then approximate load balancing results:

[ z1 ]   [ a 0 0 0 0 0 0 0 0 ] [ x1 ]   [ y1 ]
[ z4 ]   [ b b b b 0 0 0 0 0 ] [ x2 ]   [ y4 ]
[ z7 ]   [ g g g g g g g 0 0 ] [ x3 ]   [ y7 ]
[ z2 ]   [ a a 0 0 0 0 0 0 0 ] [ x4 ]   [ y2 ]
[ z5 ] = [ b b b b b 0 0 0 0 ] [ x5 ] + [ y5 ]
[ z8 ]   [ g g g g g g g g 0 ] [ x6 ]   [ y8 ]
[ z3 ]   [ a a a 0 0 0 0 0 0 ] [ x7 ]   [ y3 ]
[ z6 ]   [ b b b b b b 0 0 0 ] [ x8 ]   [ y6 ]
[ z9 ]   [ g g g g g g g g g ] [ x9 ]   [ y9 ]

The amount of arithmetic still increases with mu, but the effect is not noticeable if n >> p.
The development of the general algorithm requires some index manipulation. Assume that Proc(mu) is initialized with A_loc = A(mu:p:n, :) and y_loc = y(mu:p:n), and assume that the contiguous x-subvectors circulate as before. If at some stage x_loc contains x(1 + (tau-1)r:tau r), then the update must be confined to the lower triangular portion of A.

Algorithm 6.1.2 Suppose A in R^{n x n}, x in R^n and y in R^n are given and that z = y + Ax. Assume that n = rp and that A is lower triangular. If each processor in a p-processor ring executes the following node program, then upon completion Proc(mu) houses z(mu:p:n) in y_loc. Assume the following local memory initializations: p, mu (the node id), left and right (the neighbor id's), n, A_loc = A(mu:p:n, :), y_loc = y(mu:p:n), and x_loc = x(1 + (mu-1)r:mu r).
r = n/p
for t = 1:p
    send(x_loc, right)
    recv(x_loc, left)
    tau = mu - t
    if tau <= 0
        tau = tau + p
    end
    { x_loc = x(1 + (tau-1)r:tau r) }
    for alpha = 1:r
        for beta = 1:mu + (alpha-1)p - (tau-1)r
            y_loc(alpha) = y_loc(alpha) + A_loc(alpha, beta + (tau-1)r) x_loc(beta)
        end
    end
end

Having to map indices back and forth between "node space" and "global space" is one aspect of distributed matrix computations that requires care and (hopefully) compiler assistance.
6.1.8 Tradeoffs

As we did in Section 1.1, let us develop a column-oriented gaxpy and anticipate its performance. With the block column partitioning

A = [A_1, ..., A_p],   A_mu in R^{n x r},   r = n/p,

where x_mu = x(1 + (mu-1)r:mu r). Assume that Proc(mu) contains A_mu and x_mu. Its contribution to the gaxpy is the product A_mu x_mu and involves local data. However, these products must be summed. We assign this task to Proc(1) which we assume contains y. The strategy is then for each processor to compute A_mu x_mu and to send the result to Proc(1).
Algorithm 6.1.3 Suppose A in R^{n x n}, x in R^n and y in R^n are given and that z = y + Ax. If each processor in a p-processor network executes the following node program and n = rp, then upon completion Proc(1) houses z. Assume the following local memory initializations: p, mu (the node id), n, x_loc = x(1 + (mu-1)r:mu r), A_loc = A(:, 1 + (mu-1)r:mu r), and (in Proc(1) only) y_loc = y.

if mu = 1
    y_loc = y_loc + A_loc x_loc
    for t = 2:p
        recv(w_loc, t)
        y_loc = y_loc + w_loc
    end
else
    w_loc = A_loc x_loc
    send(w_loc, 1)
end
At first glance this seems to be much less attractive than the row-oriented Algorithm 6.1.1. The additional responsibilities of Proc(1) mean that it has more arithmetic to perform by a factor of about

(2n^2/p + np) / (2n^2/p) = 1 + p^2/(2n)

and more messages to process by a factor of about p. This imbalance becomes less critical if n >> p and the communication parameters alpha_d and beta_d are small enough. Another possible mitigating factor is that Algorithm 6.1.3 manipulates length-n vectors whereas Algorithm 6.1.1 works with length n/p vectors. If the nodes are capable of vector arithmetic, then the longer vectors may raise the level of performance.

This brief comparison of Algorithms 6.1.1 and 6.1.3 reminds us once again that different implementations of the same computation can have very different performance characteristics.
Global Memory

Here we assume that n = rp and that A_mu = A(row, :) in R^{r x n}, y_mu in R^r, and x_mu in R^r. We use the following algorithm to introduce the basic ideas and notations.

r = n/p
row = 1 + (mu-1)r:mu r
x_loc = x
y_loc = y(row)
for j = 1:n
    a_loc = A(row, j)
    y_loc = y_loc + a_loc x_loc(j)
end
y(row) = y_loc
Indeed, if more than one processor is executing this code fragment at the same time, then there may be a loss of information. Consider the following sequence:

Proc(1) reads y
Proc(2) reads y
Proc(1) writes y
Proc(2) writes y

The contribution of Proc(1) is lost because Proc(1) and Proc(2) obtain the same version of y. As a result, the effect of the Proc(1) write is erased by the Proc(2) write.

To prevent this kind of thing from happening most shared memory systems support the idea of a critical section. These are special, isolated portions of a node program that require a "key" to enter. Throughout the system, there is only one key and so the net effect is that only one processor can be executing in a critical section at any given time.

This use of the critical section concept controls the update of y in a way that ensures correctness. The algorithm is dynamically scheduled because the order in which the summations occur is determined as the computation unfolds. Dynamic scheduling is very important in problems with irregular structure.
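As a toy illustration, assuming Python's threading module, the lock below plays the role of the critical-section "key": each thread does its local product lock-free and then performs the read-update-write of the shared y one at a time. All names are our own:

import threading
import numpy as np

p, n = 4, 8
A = np.random.default_rng(1).standard_normal((n, n))
x = np.ones(n)
y = np.zeros(n)
key = threading.Lock()
r = n // p

def node(mu):
    w = A[:, mu*r:(mu+1)*r] @ x[mu*r:(mu+1)*r]   # local, lock-free work
    with key:                                    # only one processor at a time
        y[:] = y + w                             # protected read-update-write

threads = [threading.Thread(target=node, args=(mu,)) for mu in range(p)]
for t in threads: t.start()
for t in threads: t.join()
# y now equals A @ x

Without the lock, two threads could read the same version of y and one contribution would be lost, exactly as in the read/write sequence above.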
Problems

P6.1.1 Modify Algorithm 6.1.1 so that it can handle arbitrary n.

P6.1.2 Modify Algorithm 6.1.2 so that it efficiently handles the upper triangular case.

P6.1.3 (a) Modify Algorithms 6.1.3 and 6.1.4 so that they overwrite y with z = y + A^m x for a given positive integer m that is available to each processor. (b) Modify Algorithms 6.1.3 and 6.1.4 so that y is overwritten by z = y + A^T A x.

P6.1.4 Modify Algorithm 6.1.3 so that upon completion, the local array A_loc in Proc(mu) houses the mu-th block column of A + xy^T.

P6.1.5 Modify Algorithm 6.1.4 so that (a) A is overwritten by the outer product update A + xy^T, (b) x is overwritten with A^2 x, (c) y is overwritten by a unit 2-norm vector in the direction of y + A^5 x, and (d) it efficiently handles the case when A is lower triangular.
B.N. Datta (1989). "Parallel and Large-Scale Matrix Computations in Control: Some Ideas," Lin. Alg. and Its Applic. 121, 243-264.
A. Edelman (1993). "Large Dense Numerical Linear Algebra in 1993: The Parallel Computing Influence," Int'l J. Supercomputer Applic. 7, 113-128.

Managing and modelling communication in a distributed memory environment is an important, difficult problem. See

For snapshots of basic linear algebra computation on a distributed memory system, see

O. McBryan and E.F. van de Velde (1987). "Hypercube Algorithms and Implementations," SIAM J. Sci. and Stat. Comp. 8, s227-s287.
S.L. Johnsson and C.T. Ho (1988). "Matrix Transposition on Boolean n-cube Configured Ensemble Architectures," SIAM J. Matrix Anal. Appl. 9, 419-454.
T. Dehn, M. Eiermann, K. Giebermann, and V. Sperling (1995). "Structured Sparse Matrix Vector Multiplication on Massively Parallel SIMD Architectures," Parallel Computing 21, 1867-1894.
J. Choi, J.J. Dongarra, and D.W. Walker (1995). "Parallel Matrix Transpose Algorithms on Distributed Memory Concurrent Computers," Parallel Computing 21, 1387-1406.
L. Colombet, Ph. Michallon, and D. Trystram (1996). "Parallel Matrix-Vector Product on Rings with a Minimum of Communication," Parallel Computing 22, 289-310.

The implementation of a parallel algorithm is usually very challenging. It is important to have compilers and related tools that are able to handle the details. See

D.P. O'Leary and G.W. Stewart (1986). "Assignment and Scheduling in Parallel Matrix Factorization," Lin. Alg. and Its Applic. 77, 275-300.
J. Dongarra and D.C. Sorensen (1987). "A Portable Environment for Developing Parallel Programs," Parallel Computing 5, 175-186.
K. Connolly, J.J. Dongarra, D. Sorensen, and J. Patterson (1988). "Programming Methodology and Performance Issues for Advanced Computer Architectures," Parallel Computing 5, 41-58.
P. Jacobson, B. Kagstrom, and M. Rannar (1992). "Algorithm Development for Distributed Memory Multicomputers Using Conlab," Scientific Programming 1, 185-203.
C. Ancourt, F. Coelho, F. Irigoin, and R. Keryell (1993). "A Linear Algebra Framework for Static HPF Code Distribution," Proceedings of the 4th Workshop on Compilers for Parallel Computers, Delft, The Netherlands.
D. Bau, I. Kodukula, V. Kotlyar, K. Pingali, and P. Stodghill (1993). "Solving Alignment Using Elementary Linear Algebra," in Proceedings of the 7th International Workshop on Languages and Compilers for Parallel Computing, Lecture Notes in Computer Science 892, Springer-Verlag, New York, 46-60.
M. Wolfe (1996). High Performance Compilers for Parallel Computing, Addison-Wesley, Reading, MA.
(6.2.1)

[ D_1, ..., D_N ] = [ C_1, ..., C_N ] + A [ B_1, ..., B_N ]   (6.2.2)

and so

D_j = C_j + A B_j = C_j + sum_{tau=1}^{N} A_tau B_{tau j}.   (6.2.3)

If the tasks D_j are assigned to the p processors in wrap-around fashion, with Proc(mu) handling tasks mu, mu+p, ..., mu+(k-1)p, and if the cost of task j grows linearly with j, then Proc(mu) performs approximately

F(mu) ~ ( k mu + k^2 p/2 ) (2n^3/N^2)

flops, where N = kp.
The quotient F(p)/F(1) is a measure of load balancing from the flop point of view. Since

F(p)/F(1) = (kp + k^2 p/2)/(k + k^2 p/2) = 1 + 2(p - 1)/(2 + kp)

we see that arithmetic balance improves with increasing k. A similar analysis shows that the communication overheads are well balanced as k increases.

On the other hand, the total number of global memory reads and writes associated with Algorithm 6.2.1 increases with the square of k. If the start-up parameter alpha in (6.1.5) is large, then performance can degrade with increased k.

The optimum choice for k given these two opposing forces is system dependent. If communication is fast, then smaller tasks can be supported without penalty and this makes it easier to achieve load balancing. A multiprocessor with this attribute supports fine-grained parallelism. However, if granularity is too fine in a system with high-performance nodes, then it may be impossible for the node programs to perform at level-2 or level-3 speeds simply because there just is not enough local linear algebra. Again, benchmarking is the only way to clarify these issues.
6.2.2 Torus

A torus is a two-dimensional processor array in which each row and column is a ring. See Figure 6.2.1. A processor id in this context is an ordered pair and each processor has four neighbors. In the displayed example, Proc(1,3) has west neighbor Proc(1,2), east neighbor Proc(1,4), south neighbor Proc(2,3), and north neighbor Proc(4,3).
To show what it is like to organize a toroidal matrix computation, we develop an algorithm for the matrix multiplication D = C + AB where A, B, C in R^{n x n}. Assume that the torus is p1-by-p1 and that n = r p1. Regard A = (A_ij), B = (B_ij), and C = (C_ij) as p1-by-p1 block matrices with r-by-r blocks. Assume that Proc(i,j) contains A_ij, B_ij, and C_ij and that its mission is to overwrite C_ij with

D_ij = C_ij + sum_{k=1}^{p1} A_ik B_kj.

We develop the general algorithm from the p1 = 3 case, displaying the torus in cellular form as follows:

A11 B11 | A12 .  | A13 .
B21 .   |  .     |  .
B31 .   |  .     |  .

(Pay no attention to the "dots." They are later replaced by various A_ij and B_ij.)
Our plan is to "ratchet" the first block row of A and the first block column of B through Proc(1,1) in a coordinated fashion. The pairs A11 and B11, A12 and B21, and A13 and B31 meet, are multiplied, and added into a running sum array C_loc:

A12 B21 | A13 .  | A11 .
B31 .   |  .     |  .
B11 .   |  .     |  .
for t = 1:3
    send(A_loc, west)
    send(B_loc, north)
    recv(A_loc, east)
    recv(B_loc, south)
    C_loc = C_loc + A_loc B_loc
end

The variant

for t = 1:3
    send(A_loc, west)
    recv(A_loc, east)
    send(B_loc, north)
    recv(B_loc, south)
    C_loc = C_loc + A_loc B_loc
end

also works. However, this induces unnecessary delays into the process because the B submatrix is not sent until the new A submatrix arrives.
We next consider the activity in Proc(1,2), Proc(1,3), Proc(2,1), and Proc(3,1). At this point in the development, these processors merely help circulate blocks A11, A12, and A13 and B11, B21, and B31, respectively. If B22, B32, and B12 flowed through Proc(1,2) during these steps, then Proc(1,2) could build D12 at the same time, and likewise for Proc(1,3) if B13, B23, and B33 are available during t = 1:3. To this end we initialize the torus as follows:

A11 B11 | A12 B22 | A13 B33
B21     | B32     | B13
B31     | B12     | B23

Then with westward flow of the A_ij and northward flow of the B_ij, the first cell row becomes

A12 B21 | A13 B32 | A11 B13

after one step and

A13 B31 | A11 B12 | A12 B23

after two steps.
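A NumPy simulation of this toroidal flow (Cannon's algorithm) may help; block row i of A is pre-rotated west by i-1 positions and block column j of B north by j-1 positions, exactly as in the initialization above. The routine is an illustrative sketch of ours:

import numpy as np

def cannon_matmul(A, B, C, p1, r):
    # n = p1*r is assumed; Ab[i][j] plays the role of Proc(i+1,j+1)'s A block.
    blk = lambda M: [[M[i*r:(i+1)*r, j*r:(j+1)*r].copy() for j in range(p1)]
                     for i in range(p1)]
    Ab, Bb, D = blk(A), blk(B), blk(C)
    for i in range(p1):                      # initial skew of A rows
        Ab[i] = Ab[i][i:] + Ab[i][:i]
    for j in range(p1):                      # initial skew of B columns
        col = [Bb[i][j] for i in range(p1)]
        col = col[j:] + col[:j]
        for i in range(p1):
            Bb[i][j] = col[i]
    for t in range(p1):
        for i in range(p1):
            for j in range(p1):
                D[i][j] += Ab[i][j] @ Bb[i][j]
        for i in range(p1):                  # A blocks flow west
            Ab[i] = Ab[i][1:] + Ab[i][:1]
        for j in range(p1):                  # B blocks flow north
            col = [Bb[i][j] for i in range(p1)]
            col = col[1:] + col[:1]
            for i in range(p1):
                Bb[i][j] = col[i]
    return np.block(D)                       # equals C + A @ B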
Problems

P6.2.1 An upper triangular matrix can be overwritten with its square without any additional workspace. Write a dynamically scheduled, shared-memory procedure for doing this.
L.E. Cannon (1969). A Cellular Computer to Implement the Kalman Filter Algorithm, Ph.D. Thesis, Montana State University.
K.H. Cheng and S. Sahni (1987). "VLSI Systems for Band Matrix Multiplication," Parallel Computing 4, 239-258.
G. Fox, S.W. Otto, and A.J. Hey (1987). "Matrix Algorithms on a Hypercube I: Matrix Multiplication," Parallel Computing 4, 17-31.
J. Berntsen (1989). "Communication Efficient Matrix Multiplication on Hypercubes," Parallel Computing 12, 335-342.
H.V. Jagadish and T. Kailath (1989). "A Family of New Efficient Arrays for Matrix Multiplication," IEEE Trans. Comput. 38, 149-155.
P. Bjorstad, F. Manne, T. Sorevik, and M. Vajtersic (1992). "Efficient Matrix Multiplication on SIMD Computers," SIAM J. Matrix Anal. Appl. 13, 386-401.
K. Mathur and S.L. Johnsson (1994). "Multiplication of Matrices of Arbitrary Shape on a Data Parallel Computer," Parallel Computing 20, 919-952.
R. Mathias (1995). "The Instability of Parallel Prefix Matrix Multiplication," SIAM J. Sci. Comp. 16, 956-973.
6.3 Factorizations

In this section we present a pair of parallel Cholesky factorizations. To illustrate what a distributed memory factorization looks like, we implement the gaxpy Cholesky algorithm on a ring. A shared memory implementation of outer product Cholesky is also detailed.

This equation is obtained by equating the mu-th column in the n-by-n equation A = GG^T. Once the vector v(mu:n) is found then G(mu:n, mu) is a simple scaling:

G(mu:n, mu) = v(mu:n)/sqrt(v(mu)).

For clarity, we first assume that n = p and that Proc(mu) initially houses A(mu:n, mu). Upon completion, each processor overwrites its A-column with the corresponding G-column. For Proc(mu) this process involves mu-1 saxpy updates of the form

for j = 1:mu-1
    Receive a G-column from the left neighbor.
    If necessary, send a copy of the received G-column to the right neighbor.
    Update A(mu:n, mu).
end
Generate G(mu:n, mu) and, if necessary, send it to the right neighbor.
j = 1
while j < mu
    recv(g_loc(j:n), left)
    if mu < n
        send(g_loc(j:n), right)
    end
    A_loc(mu:n) = A_loc(mu:n) - g_loc(mu) g_loc(mu:n)
    j = j + 1
end
A_loc(mu:n) = A_loc(mu:n)/sqrt(A_loc(mu))
if mu < n
    send(A_loc(mu:n), right)
end
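For reference, here is a sequential NumPy sketch of the gaxpy Cholesky computation that Algorithm 6.3.1 distributes: column mu is updated by the mu-1 previously generated G-columns and then scaled. The function name is our own:

import numpy as np

def gaxpy_cholesky(A):
    A = A.astype(float).copy()
    n = A.shape[0]
    G = np.zeros_like(A)
    for mu in range(n):
        v = A[mu:, mu].copy()
        for j in range(mu):                # the mu-1 saxpy updates
            v -= G[mu, j] * G[mu:, j]
        G[mu:, mu] = v / np.sqrt(v[0])
    return G                               # A = G @ G.T for s.p.d. A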
[Diagram: the G-columns flowing through Proc(1), Proc(2), and Proc(3)]
- The right neighbor must still have more G-columns to generate. Otherwise, a G-column will be sent to an inactive processor.

This kind of reasoning is quite typical in distributed memory matrix computations.

Let us examine the behavior of Algorithm 6.3.1 under the assumption that n >> p. It is not hard to show that Proc(mu) performs

F(mu) = sum_{k=1}^{n/p} 2(n - (mu + (k-1)p))(mu + (k-1)p) ~ n^3/(3p)

flops. Each processor receives and sends just about every G-column. Using our communication overhead model (6.1.3), we see that the time each processor spends communicating is given by

sum_{j=1}^{n} 2(alpha_d + beta_d(n - j)) ~ 2 alpha_d n + beta_d n^2.

If we assume that computation proceeds at R flops per second, then the computation/communication ratio for Algorithm 6.3.1 is approximately given by (n/p)(1/(3 beta_d R)). Thus, communication overheads diminish in importance as n/p grows.
Problem.
G.A. Ceil&: NKl M.T. Headl (18116). "Mairix F8ctoriutioa. 011 a H~ • in M.T.
H.-th (ed) (1986). ~ of Fine SIAM Con/- on Hypereube M~
~ SIAM Publicatiom, Philadelphia, Pa.
I.C.F. I . - , Y. Saad., &Dd M. Sc:hulb (1986). " n - LineR SyRe~m 011 a lUng of
Pnx-n.• Lift. A.lg• ..ale. Appiic. 77, 205-2:m.
D.P. O'Leary and G.W. S~ (1986). •AMp"*" BOd Scheduling iD Pamllel Mairix
FadorisMio11,• Lin. A.lg. and lu App!ic.. TT, 275-300.
R.S. Schrl!lib. (1988). •sJock Atpnlum for Puallel Mlld!iDea," iD NWM!riaJl Algo-
rithnu for Modern Pflnllld Camputer A~hlfU. M.H. Schultli (ed), IMA Volumes
in Mub.emailce aDd Ita Application~~, NUIDber 13, Sprinpr'-Verles, Berlh1, 191-207.
S.L. Jobn.o11 and W. Lichte~~Aein (1993). "Bkx:k Cydk Dell8e Linear Alpbn,ft SIAM
J. Sci.Comp. 1-l, 1257-1286.
R.N. Kapur and J.C. Brow~~e (1984}. "'''echn.iqUM !or SoWing Blodr. 'I'ridiaconel Sys&em~~
on Recollfigun&bla Airay Com~," SlAM J. Sci. ond SUU. Camp. 5, 701-119.
G.J. Davis (1986). "Column LU Pi...otillg on a Hypm-cube Multip~.ft SlAM J.
Alg. and DUe. Metlwd.1, .538-550.
J.M. Delosme ud I.C.F. Ipseo. (1986). ~Panllel Solution of Symmet.ric ~'M Definite
Systems with Hyperbolic Ro&Miooa," Lin. Alg. and Iu Applic. TT, ~112.
A. Potb.en, S. Jha, a.nd U. V~~~J~BPul.ati (1987). "Ortbogona.l Fac:torizatjoo on a Dis-
tributed Memory Multi~." in Hyperr:u/18 MuJI.i~or1, ed. M.T. Heath,
SIAM P~, 1981.
C.H. Billchof (1988). "QR Factorizatinn Algorithms for Coane Grain Disiributed Sya...
tema," PhD~ Depc. of Computer Scl,nce, Cornell Univemty, libca, NY.
G.A. Geillt aAd C.H. &amine (1988). "LU Factorization Aliorithma on Dillt.ributed.
Memory M~ Arclrlt.ect~~n~~~," SIAM J. Sci. and S""- Comp. 9, 639--649.
J.M. Ortega and C.H. Romil1e (1988). "The ijlc F'ormll of Factoriation Methods Il:
Pe.rallel Sywtema," Panalld Cmn,.mng 1, 149-162.
M. MliiTUI:hi SAd Y. Robst (1989). "'ptimal Algoritbml for GaUMian Eliminat.ion on
an MIMD Computer,R Parallel Computing 11!, 183-194.
Parallel triaDplar syatan .wine is ~ in
R. Momoye aad D. Laurie (1982). MA Ptw:tical Algorithm for tbe SolutioD of'I'riangulat
S}'llteml on a Panllel ~ Synem." IEEE lhlru. Comp. C-31, UJ76-1082.
D.J. EYaDI aDd R. Dunbar (1983). "'The Pa.allel Solution of 'I\"iangul8t Syatem. of
Equ.ations,R IEBB ThiN. Ctnn.p. C-3!, 201-204.
C. H. Romine and J.M. Ortega (1988). "Pwaalel SolutioD of'Ina.Dgular Systems of Equa.
tions," Parallel Cornptlting 6, 109-114.
M.T. Hmt.h IIZld C.R. Romine (1988). "ParaDe!. Solution of~ System. on Dis-
tributed Memory M~" SIAM J. Sci. and St4L Comp. 9, ~.
G. U ll.lld T. ColemaD (1988). "A PamDel niNl&ular SoMr for a Dishibuied-Memory
M~r," SIAM J. Sci. lind Stat. COfiiJI. 9, 485-502.
S.C. E-...t, M.T. Heatb, C.S. Bmbl, and C.H. Romine (1988). "Modified Cyclic
Algorithlm fot Solving 'I'ciuJpiM s,.._ 011 Di.tributed Memory Uultipmcwra,"
SIAM J. Sci. and SfGL Ccmp. 9, 589-600.
N.J. Bigh&m (1995). "Stability of Parallel niNl&ular Syatem Solwm.~ SIAM J. Scf.
Com-p. 111. 401)-.(13.
Papers oo the parallal OOU!pllt8tion of the LU aDd ChoiMky fac:torisat.ioD iDclude
R.P. Brent and F.T. Luk (1982). "Computing the Cholesky Factorization Using a Systolic Architecture," Proc. 6th Australian Computer Science Conf., 296-302.
D.P. O'Leary and G.W. Stewart (1985). "Data Flow Algorithms for Parallel Matrix Computations," Comm. of the ACM 28, 840-853.
J.M. Delosme and I.C.F. Ipsen (1986). "Parallel Solution of Symmetric Positive Definite Systems with Hyperbolic Rotations," Lin. Alg. and Its Applic. 77, 75-112.
R.E. Funderlic and A. Geist (1986). "Torus Data Flow for Parallel Computation of Missized Matrix Problems," Lin. Alg. and Its Applic. 77, 149-164.
M. Cosnard, M. Marrakchi, and Y. Robert (1988). "Parallel Gaussian Elimination on an MIMD Computer," Parallel Computing 6, 275-296.

Parallel methods for banded and sparse systems include

S.L. Johnsson (1985). "Solving Narrow Banded Systems on Ensemble Architectures," ACM Trans. Math. Soft. 11, 271-288.
S.L. Johnsson (1986). "Band Matrix System Solvers on Ensemble Architectures," in Supercomputers: Algorithms, Architectures, and Scientific Computation, eds. F.A. Matsen and T. Tajima, University of Texas Press, Austin TX, 196-216.
S.L. Johnsson (1987). "Solving Tridiagonal Systems on Ensemble Architectures," SIAM J. Sci. and Stat. Comp. 8, 354-392.
U. Meier (1985). "A Parallel Partition Method for Solving Banded Systems of Linear Equations," Parallel Computing 2, 33-43.
H. van der Vorst (1987). "Large Tridiagonal and Block Tridiagonal Linear Systems on Vector and Parallel Computers," Parallel Comput. 5, 45-54.
R. Bevilacqua, B. Codenotti, and F. Romani (1988). "Parallel Solution of Block Tridiagonal Linear Systems," Lin. Alg. and Its Applic. 104, 39-57.
E. Gallopoulos and Y. Saad (1989). "A Parallel Block Cyclic Reduction Algorithm for the Fast Solution of Elliptic Equations," Parallel Computing 10, 143-160.
J.M. Conroy (1989). "A Note on the Parallel Cholesky Factorization of Wide Banded Matrices," Parallel Computing 10, 239-246.
M. Hegland (1991). "On the Parallel Solution of Tridiagonal Systems by Wrap-Around Partitioning and Incomplete LU Factorization," Numer. Math. 59, 453-472.
M.T. Heath, E. Ng, and B.W. Peyton (1991). "Parallel Algorithms for Sparse Linear Systems," SIAM Review 33, 420-460.
V. Mehrmann (1993). "Divide and Conquer Methods for Block Tridiagonal Systems," Parallel Computing 19, 257-280.
P. Raghavan (1995). "Distributed Sparse Gaussian Elimination and Orthogonal Factorization," SIAM J. Sci. Comp. 16, 1462-1477.

Parallel QR factorization procedures are of interest in real-time signal processing. Details may be found in

W.M. Gentleman and H.T. Kung (1981). "Matrix Triangularization by Systolic Arrays," SPIE Proceedings, Vol. 298, 19-26.
D.E. Heller and I.C.F. Ipsen (1983). "Systolic Networks for Orthogonal Decompositions," SIAM J. Sci. and Stat. Comp. 4, 261-269.
M. Cosnard, J.M. Muller, and Y. Robert (1986). "Parallel QR Decomposition of a Rectangular Matrix," Numer. Math. 48, 239-250.
L. Elden and R. Schreiber (1986). "An Application of Systolic Arrays to Linear Discrete Ill-Posed Problems," SIAM J. Sci. and Stat. Comp. 7, 892-903.
F.T. Luk (1986). "A Rotation Method for Computing the QR Factorization," SIAM J. Sci. and Stat. Comp. 7, 452-459.
J.J. Modi and M.R.B. Clarke (1986). "An Alternative Givens Ordering," Numer. Math. 43, 83-90.
P. Amodio and L. Brugnano (1995). "The Parallel QR Factorization Algorithm for Tridiagonal Linear Systems," Parallel Computing 21, 1097-1110.
S. Chen, D. Kuck, and A. Sameh (1978). "Practical Parallel Band Triangular System Solvers," ACM Trans. Math. Soft. 4, 270-277.
A. Sameh and D. Kuck (1978). "On Stable Parallel Linear System Solvers," J. Assoc. Comp. Mach. 25, 81-91.
P. Swarztrauber (1979). "A Parallel Algorithm for Solving General Tridiagonal Equations," Math. Comp. 33, 185-199.
S. Chen, J. Dongarra, and C. Hsiung (1984). "Multiprocessing Linear Algebra Algorithms on the Cray X-MP-2: Experiences with Small Granularity," J. Parallel and Distributed Computing 1, 22-31.
J.J. Dongarra and A.H. Sameh (1984). "On Some Parallel Banded System Solvers," Parallel Computing 1, 223-235.
J.J. Dongarra and R.E. Hiromoto (1984). "A Collection of Parallel Linear Equation Routines for the Denelcor HEP," Parallel Computing 1, 133-142.
J.J. Dongarra and T. Hewitt (1986). "Implementing Dense Linear Algebra Algorithms Using Multitasking on the Cray X-MP-4 (or Approaching the Gigaflop)," SIAM J. Sci. and Stat. Comp. 7, 347-350.
J.J. Dongarra, A. Sameh, and D. Sorensen (1986). "Implementation of Some Concurrent Algorithms for Matrix Factorization," Parallel Computing 3, 25-34.
A. George, M.T. Heath, and J. Liu (1986). "Parallel Cholesky Factorization on a Shared-Memory Multiprocessor," Lin. Alg. and Its Applic. 77, 165-187.
J.J. Dongarra and D.C. Sorensen (1987). "Linear Algebra on High Performance Computers," Appl. Math. and Comp. 20, 57-88.
K. Dackland, E. Elmroth, and B. Kågström (1992). "Parallel Block Factorizations on the Shared Memory Multiprocessor IBM 3090 VF/600J," Int'l J. Supercomputer Applications 6, 69-97.
Chapter 7
The Unsymmetric
Eigenvalue Problem
Having discussed linear equations and least squares, we now direct our
attention to the third major problem area in matrix computations, the
algebraic eigenvalue problem. The unsymmetric problem is considered in
this chapter and the more agreeable symmetric case in the next.
Our first task is to present the decompositions of Schur and Jordan
along with the basic properties of eigenvalues and invariant subspaces. The
contrasting behavior of these two decompositions sets the stage for §7.2
in which we investigate how the eigenvalues and invariant subspaces of
a matrix are affected by perturbation. Condition numbers are developed
that permit estimation of the errors that can be expected to arise because
of roundoff.
The key algorithm of the chapter is the justly famous QR algorithm.
This procedure is the most complex algorithm presented in this book and its
development is spread over three sections. We derive the basic QR iteration
in §7.3 as a natural generalization of the simple power method. The next two sections are devoted to making this iteration practical: §7.4 develops the Hessenberg reduction and §7.5 the shifted, implicitly performed QR step.
Chapters 1-3 and §§5.1-5.2 are assumed. Within this chapter the sections build upon one another in sequence.
If $X \in \mathbb{C}^{n \times p}$ satisfies
$$AX = XB,$$
then $\mathrm{ran}(X)$ is invariant and $By = \lambda y \Rightarrow A(Xy) = \lambda(Xy)$. Thus, if $X$ has full column rank, then $AX = XB$ implies that $\lambda(B) \subseteq \lambda(A)$. If $X$ is square and nonsingular, then $\lambda(A) = \lambda(B)$ and we say that $A$ and $B = X^{-1}AX$ are similar. In this context, $X$ is called a similarity transformation.
7.1.2 Decoupling
Many eigenvalue computations involve breaking the given problem down
into a collection of smaller eigenproblems. The following result is the basis
for these reductions.
Lemma 7.1.1 If $T \in \mathbb{C}^{n \times n}$ is partitioned as follows,
$$T = \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{bmatrix} \begin{matrix} p \\ n-p \end{matrix}$$
then $\lambda(T) = \lambda(T_{11}) \cup \lambda(T_{22})$.

Lemma 7.1.2 Suppose $A \in \mathbb{C}^{n \times n}$, $B \in \mathbb{C}^{p \times p}$, and $X \in \mathbb{C}^{n \times p}$ satisfy $AX = XB$ with $\mathrm{rank}(X) = p$. Then there exists a unitary $Q \in \mathbb{C}^{n \times n}$ such that
$$Q^H A Q = T = \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{bmatrix} \begin{matrix} p \\ n-p \end{matrix} \qquad (7.1.2)$$
and $\lambda(T_{11}) = \lambda(A) \cap \lambda(B)$.

Proof. Let
$$X = Q \begin{bmatrix} R_1 \\ 0 \end{bmatrix} \begin{matrix} p \\ n-p \end{matrix}$$
be a QR factorization of $X$ and partition
$$Q^H A Q = T = \begin{bmatrix} T_{11} & T_{12} \\ T_{21} & T_{22} \end{bmatrix} \begin{matrix} p \\ n-p \end{matrix}.$$
By using the nonsingularity of $R_1$ and the equations $T_{21}R_1 = 0$ and $T_{11}R_1 = R_1 B$, we can conclude that $T_{21} = 0$ and $\lambda(T_{11}) = \lambda(B)$. The conclusion now follows because from Lemma 7.1.1, $\lambda(A) = \lambda(T) = \lambda(T_{11}) \cup \lambda(T_{22})$. $\Box$
Example 7.1.1 If
$$A = \begin{bmatrix} 67.00 & 177.60 & -63.20 \\ -20.40 & 95.88 & -87.16 \\ 22.80 & 67.84 & 12.12 \end{bmatrix},$$
$X = [20, -9, -12]^T$ and $B = [25]$, then $AX = XB$. Moreover, if the orthogonal matrix $Q$ is defined by
$$Q = \begin{bmatrix} -.800 & .360 & .480 \\ .360 & .928 & -.096 \\ .480 & -.096 & .872 \end{bmatrix},$$
then $Q^T X = [-25, 0, 0]^T$ and $Q^T A Q = T$ is block upper triangular with $T(1,1) = 25$ and a trailing 2-by-2 block whose eigenvalues are $75 \pm 100i$. A calculation shows that $\lambda(A) = \{25,\ 75 + 100i,\ 75 - 100i\}$.
Lemma 7.1.2 says that a matrix can be reduced to block triangular form
using unitary similarity transformations if we know one of its invariant
subspaces. By induction we can readily establish the decomposition of
Schur (1909).
The proof proceeds by induction: if $Ax = \lambda x$ with $\| x \|_2 = 1$ and $U = [\,x\ U_1\,]$ is unitary, then
$$U^H A U = \begin{bmatrix} \lambda & w^H \\ 0 & C \end{bmatrix} \begin{matrix} 1 \\ n-1 \end{matrix}$$
and a Schur decomposition of the $(n-1)$-by-$(n-1)$ matrix $C$ completes the argument.
Example 7.1.2 If
$$A = \begin{bmatrix} 3 & 8 \\ -2 & 3 \end{bmatrix} \quad \mbox{and} \quad Q = \begin{bmatrix} .8944i & .4472 \\ -.4472 & -.8944i \end{bmatrix},$$
then $Q$ is unitary and
$$Q^H A Q = T = \begin{bmatrix} 3 + 4i & -6 \\ 0 & 3 - 4i \end{bmatrix}.$$
Suppose
$$T = \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{bmatrix} \begin{matrix} p \\ q \end{matrix}$$
and define the linear transformation $\phi(Z) = T_{11}Z - ZT_{22}$. If $Z$ solves $\phi(Z) = -T_{12}$ and
$$Y = \begin{bmatrix} I_p & Z \\ 0 & I_q \end{bmatrix},$$
then $Y^{-1}TY = \mathrm{diag}(T_{11}, T_{22})$.
Suppose
$$Q^H A Q = T = \begin{bmatrix} T_{11} & T_{12} & \cdots & T_{1q} \\ 0 & T_{22} & \cdots & T_{2q} \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & T_{qq} \end{bmatrix} \qquad (7.1.5)$$
is a Schur decomposition of $A \in \mathbb{C}^{n \times n}$ and assume that the $T_{ii}$ are square. If $\lambda(T_{ii}) \cap \lambda(T_{jj}) = \emptyset$ whenever $i \neq j$, then there exists a nonsingular matrix $Y \in \mathbb{C}^{n \times n}$ such that
$$(QY)^{-1} A (QY) = \mathrm{diag}(T_{11}, \ldots, T_{qq}).$$
Proof. A proof can be obtained by using Lemma 7.1.5 and induction. $\Box$
If the $T_{ii}$ in (7.1.5) are all 1-by-1, then the columns of $X = QY$ provide a full eigenvector basis:
$$X^{-1} A X = \mathrm{diag}(\lambda_1, \ldots, \lambda_n) \qquad (7.1.7)$$
and
$$A X(:, i) = \lambda_i X(:, i), \qquad i = 1{:}n. \qquad (7.1.8)$$
Note that if $y_i^H$ is the $i$th row of $X^{-1}$, then $y_i^H A = \lambda_i y_i^H$. Thus, the columns of $X^{-H}$ are left eigenvectors and the columns of $X$ are right eigenvectors.
Example 7.1.4 If
$$A = \begin{bmatrix} 5 & -1 \\ -2 & 6 \end{bmatrix} \quad \mbox{and} \quad X = \begin{bmatrix} 1 & 1 \\ 1 & -2 \end{bmatrix},$$
then $X^{-1}AX = \mathrm{diag}(4, 7)$.
Theorem (Jordan Decomposition) If $A \in \mathbb{C}^{n \times n}$, then there exists a nonsingular $X = [\,X_1, \ldots, X_q\,]$ such that $X^{-1}AX = \mathrm{diag}(J_1, \ldots, J_q)$, where
$$J_i = \begin{bmatrix} \lambda_i & 1 & & 0 \\ & \lambda_i & \ddots & \\ & & \ddots & 1 \\ 0 & & & \lambda_i \end{bmatrix}$$
is $m_i$-by-$m_i$ and $m_1 + \cdots + m_q = n$.

Proof. See Halmos (1958, pp. 112 ff.). $\Box$

The $J_i$ are referred to as Jordan blocks. The number and dimensions of the Jordan blocks associated with each distinct eigenvalue are unique, although their ordering along the diagonal is not.
The matrix
$$A = \begin{bmatrix} 1+\epsilon & 1 \\ 0 & 1-\epsilon \end{bmatrix} \qquad (7.1.9)$$
has an eigenvector matrix whose 2-norm condition is of order $1/\epsilon$.
These observations serve to highlight the difficulties associated with ill-conditioned similarity transformations. In particular, the computed similarity transform satisfies
$$\mathrm{fl}(X^{-1}AX) = X^{-1}AX + E \qquad (7.1.10)$$
where
$$\| E \|_2 \approx {\bf u}\,\kappa_2(X) \| A \|_2. \qquad (7.1.11)$$
This is a reminder that for nonnormal matrices, eigenvalues do not have the
"predictive power" of singular values when it comes to Ax = b sensitivity
matters. Eigenvalues of nonnormal matrices have other shortcomings. See
§11.3.4.
Problems
P7.1.1 Show that if $T \in \mathbb{C}^{n \times n}$ is upper triangular and normal, then $T$ is diagonal.
P7.1.2 Verify that if $X$ diagonalizes the 2-by-2 matrix in (7.1.9) and $\epsilon \le 1/2$, then $\kappa_2(X) \ge 1/\epsilon$.
P7.1.3 Suppose $A \in \mathbb{C}^{n \times n}$ has distinct eigenvalues. Show that if $Q^H A Q = T$ is its Schur decomposition and $AB = BA$, then $Q^H B Q$ is upper triangular.
P7.1.4 Show that if $A$ and $B^H$ are in $\mathbb{C}^{m \times n}$ with $m \ge n$, then
$$\lambda(AB) = \lambda(BA) \cup \underbrace{\{ 0, \ldots, 0 \}}_{m-n}.$$
P7.1.5 Given $A \in \mathbb{C}^{n \times n}$, use the Schur decomposition to show that for every $\epsilon > 0$ there exists a diagonalizable matrix $B$ such that $\| A - B \|_2 \le \epsilon$. This shows that the set of diagonalizable matrices is dense in $\mathbb{C}^{n \times n}$ and that the Jordan canonical form is not a continuous matrix decomposition.
P7.1.6 Suppose $A_k \rightarrow A$ and that $Q_k^H A_k Q_k = T_k$ is a Schur decomposition of $A_k$. Show that $\{Q_k\}$ has a converging subsequence $\{Q_{k_i}\}$ with the property that
$$\lim_{i \rightarrow \infty} Q_{k_i} = Q$$
where $Q^H A Q = T$ is upper triangular. This shows that the eigenvalues of a matrix are continuous functions of its entries.
P7.1.7 Justify (7.1.10) and (7.1.11).
P7.1.8 Suppose the vectors $x_k$ and $y_k$ are generated by a recurrence of the form
$$\begin{bmatrix} x_{k+1} \\ y_{k+1} \end{bmatrix} = A_h \begin{bmatrix} x_k \\ y_k \end{bmatrix}$$
where $A_h$ is a 2-by-2 matrix. For each case, compute $\lambda(A_h)$ and use the previous problem to discuss $\lim x_k$ and $\lim y_k$ as $k \rightarrow \infty$.
P7.1.11 If $J \in \mathbb{R}^{d \times d}$ is a Jordan block, what is $\kappa_\infty(J)$?
P7.1.12 Show that if
p
R q
R. Bellman (1970). Introduction to Matrix Analysis, 2nd ed., McGraw-Hill, New York.
I.C. Gohberg, P. Lancaster, and L. Rodman (1986). Invariant Subspaces of Matrices With Applications, John Wiley and Sons, New York.
M. Marcus and H. Minc (1964). A Survey of Matrix Theory and Matrix Inequalities, Allyn and Bacon, Boston.
L. Mirsky (1963). An Introduction to Linear Algebra, Oxford University Press, Oxford.
The Schur decomposition originally appeared in
I. Schur (1909). "On the Characteristic Roots of a Linear Substitution with an Application to the Theory of Integral Equations," Math. Ann. 66, 488-510 (German).
A proof very similar to ours is given on page 105 of
K.-C. Toh and L.N. Trefethen (1994). "Pseudozeros of Polynomials and Pseudospectra of Companion Matrices," Numer. Math. 68, 403-425.
F. Kittaneh (1995). "Singular Values of Companion Matrices and Bounds on Zeros of
Polynomials," SIAM J. Matrix Anal. Appl. 16, 333-340.
$$\lambda(A) \subseteq \bigcup_{i=1}^{n} D_i \quad \mbox{where} \quad D_i = \left\{ z \in \mathbb{C} : |z - d_i| \le \sum_{j=1}^{n} |f_{ij}| \right\}.$$
It can also be shown that if the Gershgorin disk $D_i$ is isolated from the other disks, then it contains precisely one of $A$'s eigenvalues. See Wilkinson (1965, pp. 71 ff.).
Example 7.2.1 If
$$A = \begin{bmatrix} 10 & 2 & 3 \\ -1 & 0 & 2 \\ 1 & -2 & 1 \end{bmatrix}$$
then $\lambda(A) \approx \{10.226,\ .3870 + 2.2216i,\ .3870 - 2.2216i\}$ and the Gershgorin disks are $D_1 = \{ z : |z - 10| \le 5 \}$, $D_2 = \{ z : |z| \le 3 \}$, and $D_3 = \{ z : |z - 1| \le 3 \}$.
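The disk radii are just off-diagonal absolute row sums, so they are easy to compute. Here is a minimal NumPy sketch (the function name is our own) that reproduces the disks of Example 7.2.1; it applies the theorem with $X = I$:

import numpy as np

def gershgorin_disks(A):
    # Disk i has center a_ii and radius sum_{j != i} |a_ij|.
    centers = np.diag(A)
    radii = np.abs(A).sum(axis=1) - np.abs(centers)
    return list(zip(centers, radii))

A = np.array([[10., 2., 3.],
              [-1., 0., 2.],
              [ 1., -2., 1.]])
print(gershgorin_disks(A))      # [(10.0, 5.0), (0.0, 3.0), (1.0, 3.0)]
print(np.linalg.eigvals(A))     # each eigenvalue lies in the union of the disks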
For some very important eigenvalue routines it is possible to show that the
computed eigenvalues are the exact eigenvalues of a matrix A+ E where E
is small in norm. Consequently, we must understand how the eigenvalues
of a matrix can be affected by small perturbations. A sample result that
sheds light on this issue is the following theorem.
Theorem 7.2.2 (Bauer-Fike) If $\mu$ is an eigenvalue of $A + E \in \mathbb{C}^{n \times n}$ and $X^{-1}AX = D = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$, then
$$\min_{\lambda \in \lambda(A)} |\lambda - \mu| \le \kappa_p(X) \| E \|_p$$
where $\| \cdot \|_p$ denotes any of the $p$-norms.

Theorem 7.2.3 Let $Q^H A Q = D + N$ be a Schur decomposition of $A \in \mathbb{C}^{n \times n}$ with $D$ diagonal and $N$ strictly upper triangular ($N^p = 0$). If $\mu \in \lambda(A + E)$, then
$$\min_{\lambda \in \lambda(A)} |\lambda - \mu| \le \max(\theta, \theta^{1/p})$$
where
$$\theta = \| E \|_2 \sum_{k=0}^{p-1} \| N \|_2^k.$$
Proof. Define
$$\delta = \min_{\lambda \in \lambda(A)} |\lambda - \mu|.$$
The theorem is true if $\delta = 0$, so assume $\delta > 0$, in which case $\mu I - D - N$ is nonsingular and (7.2.1) gives $1 \le \| ((\mu I - D) - N)^{-1} \|_2 \| E \|_2$. If $\delta > 1$ then
$$\| ((\mu I - D) - N)^{-1} \|_2 \le \frac{1}{\delta} \sum_{k=0}^{p-1} \| N \|_2^k$$
and so from (7.2.1), $\delta \le \theta$. If $\delta \le 1$ then
$$\| ((\mu I - D) - N)^{-1} \|_2 \le \frac{1}{\delta^p} \sum_{k=0}^{p-1} \| N \|_2^k$$
and so from (7.2.1), $\delta^p \le \theta$. Thus, $\delta \le \max(\theta, \theta^{1/p})$. $\Box$
Example 7.2.2 If
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 4.001 \end{bmatrix} \quad \mbox{and} \quad E = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ .001 & 0 & 0 \end{bmatrix},$$
then $\lambda(A+E) \approx \{1.0001,\ 4.0582,\ 3.9427\}$ and $A$'s matrix of eigenvectors satisfies $\kappa_2(X) \approx 10^7$. The Bauer-Fike bound in Theorem 7.2.2 has order $10^4$, while the Schur bound in Theorem 7.2.3 has order $10^0$.
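The Bauer-Fike bound is easy to check numerically. A minimal NumPy sketch of our own (not from the text):

import numpy as np

A = np.array([[1., 2., 3.],
              [0., 4., 5.],
              [0., 0., 4.001]])
E = np.zeros((3, 3)); E[2, 0] = 0.001

lam, X = np.linalg.eig(A)
mu = np.linalg.eigvals(A + E)
# worst perturbation suffered by any eigenvalue of A
worst = max(np.min(np.abs(lam - m)) for m in mu)
bound = np.linalg.cond(X) * np.linalg.norm(E, 2)   # Bauer-Fike with p = 2
print(worst, bound)   # the bound (order 10^4) is very pessimistic here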
Example 7.2.3 If
$$A = \begin{bmatrix} 0 & I_9 \\ 0 & 0 \end{bmatrix} \quad \mbox{and} \quad E = \begin{bmatrix} 0 & 0 \\ 10^{-10} & 0 \end{bmatrix},$$
then for all $\lambda \in \lambda(A)$ and $\mu \in \lambda(A + E)$, $|\lambda - \mu| = 10^{-1}$. In this example a change of order $10^{-10}$ in $A$ results in a change of order $10^{-1}$ in its eigenvalues.
Example 7.2.4 If
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 4.001 \end{bmatrix} \quad \mbox{and} \quad E = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ .001 & 0 & 0 \end{bmatrix},$$
then $\lambda(A + E) \approx \{1.0001,\ 4.0582,\ 3.9427\}$ and $s(1) \approx .8 \times 10^0$, $s(4) \approx .2 \times 10^{-3}$, and $s(4.001) \approx .2 \times 10^{-3}$. Observe that $\| E \|_2/s(\lambda)$ is a good estimate of the perturbation that each eigenvalue undergoes.
If
$$A = \begin{bmatrix} 1 & a \\ 0 & 1 \end{bmatrix} \quad \mbox{and} \quad F = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix},$$
then $\lambda(A + \epsilon F) = \{ 1 \pm \sqrt{a\epsilon} \}$. Note that if $a \neq 0$, then the eigenvalues of $A + \epsilon F$ are not differentiable at $\epsilon = 0$; their rate of change at the origin is infinite. In general, if $\lambda$ is a defective eigenvalue of $A$, then $O(\epsilon)$ perturbations in $A$ can result in $O(\epsilon^{1/p})$ perturbations in $\lambda$ if $\lambda$ is associated with a $p$-dimensional Jordan block. See Wilkinson (1965, pp. 77 ff.) for a more detailed discussion.
Suppose
$$Q^H A Q = T = \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{bmatrix} \begin{matrix} r \\ n-r \end{matrix} \qquad (7.2.2)$$
is a Schur decomposition with $Q = [\,Q_1\ Q_2\,]$, $Q_1 \in \mathbb{C}^{n \times r}$, (7.2.3) so that $\mathrm{ran}(Q_1)$ is an invariant subspace. A key quantity is the separation
$$\mathrm{sep}(T_{11}, T_{22}) = \min_{X \neq 0} \frac{\| T_{11}X - XT_{22} \|_F}{\| X \|_F}. \qquad (7.2.4)$$
Theorem 7.2.4 shows that if $\| E \|_2$ is small enough relative to $\mathrm{sep}(T_{11}, T_{22})$, then $A + E$ has an invariant subspace $\mathrm{ran}(\hat{Q}_1)$ near $\mathrm{ran}(Q_1)$; in particular (Corollary 7.2.5),
$$\mathrm{dist}(\mathrm{ran}(Q_1), \mathrm{ran}(\hat{Q}_1)) \le 4\,\frac{\| E \|_2}{\mathrm{sep}(T_{11}, T_{22})}.$$
Proof. Using the SVD of $P$, it can be shown that
$$\| P(I + P^H P)^{-1/2} \|_2 \le \| P \|_2. \qquad (7.2.5)$$
The corollary follows because the required distance is the norm of $Q_2^H \hat{Q}_1 = P(I + P^H P)^{-1/2}$. $\Box$

Thus, the reciprocal of $\mathrm{sep}(T_{11}, T_{22})$ can be thought of as a condition number that measures the sensitivity of $\mathrm{ran}(Q_1)$ as an invariant subspace.
Example 7.2.5 Suppose
$$A = T = \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{bmatrix}$$
and observe that $AQ_1 = Q_1 T_{11}$ where $Q_1 = [\,e_1\ e_2\,] \in \mathbb{R}^{4 \times 2}$. A calculation shows that $\mathrm{sep}(T_{11}, T_{22}) \approx .0003$. If $E$ is a perturbation with entries of order $10^{-6}$ and we examine the Schur decomposition of $A + E$, then we find that $Q_1$ gets perturbed to
$$\hat{Q}_1 = \begin{bmatrix} -.9999 & -.0003 \\ .0003 & -.9999 \\ -.0005 & -.0026 \\ .0000 & .0003 \end{bmatrix}.$$
The same analysis applies to single eigenvectors. If $T_{11} = \lambda$ is 1-by-1, then there exists $p \in \mathbb{C}^{n-1}$ with
$$\| p \|_2 \le 4\,\frac{\| E \|_2}{\sigma_{\min}(T_{22} - \lambda I)}$$
such that $\hat{q}_1 = (q_1 + Q_2 p)/\sqrt{1 + p^H p}$ is a unit 2-norm eigenvector for $A + E$. Moreover,
$$\mathrm{dist}(\mathrm{span}\{q_1\}, \mathrm{span}\{\hat{q}_1\}) \le 4\,\frac{\| E \|_2}{\sigma_{\min}(T_{22} - \lambda I)}.$$
Proof. The result follows from Theorem 7.2.4, Corollary 7.2.5, and the observation that if $T_{11} = \lambda$, then $\mathrm{sep}(T_{11}, T_{22}) = \sigma_{\min}(T_{22} - \lambda I)$. $\Box$
Note that $\sigma_{\min}(T_{22} - \lambda I)$ roughly measures the separation of $\lambda$ from the eigenvalues of $T_{22}$. We have to say "roughly" because $\sigma_{\min}(T_{22} - \lambda I)$ can be much smaller than $\min \{ |\mu - \lambda| : \mu \in \lambda(T_{22}) \}$.
Example 7.2.6 If
$$A = \begin{bmatrix} 1.01 & 0.01 \\ 0.00 & 0.99 \end{bmatrix},$$
then the eigenvalue $\lambda = .99$ has condition $1/s(.99) \approx 1.118$ and associated eigenvector $x = [.4472, -.8944]^T$. On the other hand, the eigenvalue $\lambda = 1.00$ of the "nearby" matrix
$$A + E = \begin{bmatrix} 1.01 & 0.01 \\ 0.00 & 1.00 \end{bmatrix}$$
has an eigenvector $\hat{x} = [.7071, -.7071]^T$.
Problems
P7.2.6 Suppose
$$A = \begin{bmatrix} \lambda & v^T \\ 0 & T_{22} \end{bmatrix}$$
and that $\lambda \notin \lambda(T_{22})$. Show that if $a = \mathrm{sep}(\lambda, T_{22})$, then
$$s(\lambda) = \frac{1}{\sqrt{1 + \| (T_{22} - \lambda I)^{-T} v \|_2^2}} \ge \frac{a}{\sqrt{a^2 + \| v \|_2^2}}.$$
P7.2.7 Show that the condition of a simple eigenvalue is preserved under unitary similarity transformations.
P7.2.8 With the same hypothesis as in the Bauer-Fike theorem (Theorem 7.2.2), show
F.L. Bauer and C.T. Fike (1960). "Norms and Exclusion Theorems," Numer. Math. 2, 137-141.
A.S. Householder (1964). The Theory of Matrices in Numerical Analysis, Blaisdell, New York.
The following papers are concerned with the effect of perturbations on the eigenvalues of a general matrix:
A. Ruhe (1970). "Perturbation Bounds for Means of Eigenvalues and Invariant Subspaces," BIT 10, 343-354.
A. Ruhe (1970). "Properties of a Matrix with a Very Ill-Conditioned Eigenproblem," Numer. Math. 15, 57-60.
J.H. Wilkinson (1972). "Note on Matrices with a Very Ill-Conditioned Eigenproblem," Numer. Math. 19, 176-178.
W. Kahan, B.N. Parlett, and E. Jiang (1982). "Residual Bounds on Approximate Eigensystems of Nonnormal Matrices," SIAM J. Numer. Anal. 19, 470-484.
J.H. Wilkinson (1984). "On Neighboring Matrices with Quadratic Elementary Divisors," Numer. Math. 44, 1-21.
J.V. Burke and M.L. Overton (1992). "Stable Perturbations of Nonsymmetric Matrices," Lin. Alg. and Its Applic. 171, 249-273.
Wilkinson's work on nearest defective matrices is typical of a growing body of literature that is concerned with "nearness" problems. See
N.J. Higham (1985). "Nearness Problems in Numerical Linear Algebra," PhD Thesis, University of Manchester, England.
C. Van Loan (1985). "How Near is a Stable Matrix to an Unstable Matrix?," Contemporary Mathematics, Vol. 47, 465-477.
J.W. Demmel (1987). "On the Distance to the Nearest Ill-Posed Problem," Numer. Math. 51, 251-289.
J.W. Demmel (1987). "A Counterexample for two Conjectures About Stability," IEEE Trans. Auto. Cont. AC-32, 340-342.
A. Ruhe (1987). "Closest Normal Matrix Found!," BIT 27, 585-598.
R. Byers (1988). "A Bisection Method for Measuring the Distance of a Stable Matrix to the Unstable Matrices," SIAM J. Sci. and Stat. Comp. 9, 875-881.
J.W. Demmel (1988). "The Probability that a Numerical Analysis Problem is Difficult," Math. Comp. 50, 449-480.
N.J. Higham (1989). "Matrix Nearness Problems and Applications," in Applications of Matrix Theory, M.J.C. Gover and S. Barnett (eds), Oxford University Press, Oxford UK, 1-27.
Aspects of eigenvalue condition are discussed in
C. Van Loan (1987). "On Estimating the Condition of Eigenvalues and Eigenvectors," Lin. Alg. and Its Applic. 88/89, 715-732.
C.D. Meyer and G.W. Stewart (1988). "Derivatives and Perturbations of Eigenvectors," SIAM J. Num. Anal. 25, 679-691.
G.W. Stewart and G. Zhang (1991). "Eigenvalues of Graded Matrices and the Condition Numbers of Multiple Eigenvalues," Numer. Math. 58, 703-712.
J.-G. Sun (1992). "On Condition Numbers of a Nondefective Multiple Eigenvalue," Numer. Math. 61, 265-276.
The relationship between the eigenvalue condition number, the departure from normality, and the condition of the eigenvector matrix is discussed in
P. Henrici (1962). "Bounds for Iterates, Inverses, Spectral Variation and Fields of Values of Non-normal Matrices," Numer. Math. 4, 24-40.
P. Eberlein (1965). "On Measures of Non-normality for Matrices," Amer. Math. Monthly 72, 995-996.
R.A. Smith (1967). "The Condition Numbers of the Matrix Eigenvalue Problem," Numer. Math. 10, 232-240.
G. Loizou (1969). "Nonnormality and Jordan Condition Numbers of Matrices," J. ACM 16, 580-584.
A. van der Sluis (1975). "Perturbations of Eigenvalues of Non-normal Matrices," Comm. ACM 18, 30-36.
The paper by Henrici also contains a result similar to Theorem 7.2.3. Penetrating treatments of invariant subspace perturbation include
T. Kato (1966). Perturbation Theory for Linear Operators, Springer-Verlag, New York.
C. Davis and W.M. Kahan (1970). "The Rotation of Eigenvectors by a Perturbation, III," SIAM J. Num. Anal. 7, 1-46.
G.W. Stewart (1971). "Error Bounds for Approximate Invariant Subspaces of Closed Linear Operators," SIAM J. Num. Anal. 8, 796-808.
G.W. Stewart (1973). "Error and Perturbation Bounds for Subspaces Associated with Certain Eigenvalue Problems," SIAM Review 15, 727-764.
Detailed analyses of the function $\mathrm{sep}(\cdot,\cdot)$ and the map $X \mapsto AX + XA^T$ are given in
J. Varah (1979). "On the Separation of Two Matrices," SIAM J. Num. Anal. 16, 216-222.
R. Byers and S.G. Nash (1987). "On the Singular Vectors of the Lyapunov Operator," SIAM J. Alg. and Disc. Methods 8, 59-66.
Gershgorin's Theorem can be used to derive a comprehensive perturbation theory. See Wilkinson (1965, chapter 2). The theorem itself can be generalized and extended in various ways; see
R.S. Varga (1970). "Minimal Gershgorin Sets for Partitioned Matrices," SIAM J. Num. Anal. 7, 493-507.
R.J. Johnston (1971). "Gershgorin Theorems for Partitioned Matrices," Lin. Alg. and Its Applic. 4, 205-220.
for $k = 1, 2, \ldots$
$\quad z^{(k)} = A q^{(k-1)}$
$\quad q^{(k)} = z^{(k)} / \| z^{(k)} \|_2$ $\qquad (7.3.3)$
$\quad \lambda^{(k)} = [q^{(k)}]^H A q^{(k)}$
end

If $q^{(0)}$ is not deficient in the direction of $x_1$ (i.e., $a_1 \neq 0$ in its eigenvector expansion), then
$$\mathrm{dist}(\mathrm{span}\{q^{(k)}\}, \mathrm{span}\{x_1\}) = O\left( \left| \frac{\lambda_2}{\lambda_1} \right|^k \right)$$
and moreover,
$$|\lambda_1 - \lambda^{(k)}| = O\left( \left| \frac{\lambda_2}{\lambda_1} \right|^k \right).$$
If $|\lambda_1| > |\lambda_2| \ge \cdots \ge |\lambda_n|$ then we say that $\lambda_1$ is a dominant eigenvalue. Thus, the power method converges if $\lambda_1$ is dominant and if $q^{(0)}$ has a component in the direction of the corresponding dominant eigenvector $x_1$. The behavior of the iteration without these assumptions is discussed in Wilkinson (1965, p. 570) and Parlett and Poole (1973).
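To make (7.3.3) concrete, here is a minimal NumPy sketch (the function name and step count are our own choices, not from the text); it can be run on the matrix of Example 7.3.1 below:

import numpy as np

def power_method(A, q, steps=30):
    # Iteration (7.3.3): q <- Aq/||Aq||_2, followed by a Rayleigh quotient.
    q = q / np.linalg.norm(q)
    for _ in range(steps):
        z = A @ q
        q = z / np.linalg.norm(z)
    return np.vdot(q, A @ q), q   # eigenvalue and eigenvector estimates

Each step costs one matrix-vector product, which is why the method is attractive when $A$ is large and sparse.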
Example 7.3.1 If
$$A = \begin{bmatrix} -261 & 209 & -49 \\ -530 & 422 & -98 \\ -800 & 631 & -144 \end{bmatrix}$$
then $\lambda(A) = \{10, 4, 3\}$. Applying (7.3.3) with $q^{(0)} = [1, 0, 0]^T$ we find
$k$ $\qquad \lambda^{(k)}$
1 13.0606
2 10.7191
3 10.2073
4 10.0633
5 10.0198
6 10.0063
7 10.0020
8 10.0007
9 10.0002
In practice, the usefulness of the power method depends upon the ratio $|\lambda_2|/|\lambda_1|$, since it dictates the rate of convergence. The danger that $q^{(0)}$ is deficient in $x_1$ is a less worrisome matter because rounding errors sustained during the iteration typically ensure that the subsequent $q^{(k)}$ have a component in this direction. Moreover, it is typically the case in applications where the dominant eigenvalue and eigenvector are desired that an a priori estimate of $x_1$ is known. Normally, by setting $q^{(0)}$ to be this estimate, the dangers of a small $a_1$ are minimized.
Note that the only thing required to implement the power method is a subroutine capable of computing matrix-vector products of the form $Aq$. It is not necessary to store $A$ in an $n$-by-$n$ array. For this reason, the algorithm can be of interest when $A$ is large and sparse and when there is a sufficient gap between $|\lambda_1|$ and $|\lambda_2|$.
Estimates for the error $|\lambda^{(k)} - \lambda_1|$ can be obtained by applying the perturbation theory developed in the previous section. Define the vector $r^{(k)} = Aq^{(k)} - \lambda^{(k)}q^{(k)}$ and observe that $(A + E^{(k)})q^{(k)} = \lambda^{(k)}q^{(k)}$ where $E^{(k)} = -r^{(k)}[q^{(k)}]^H$. Thus $\lambda^{(k)}$ is an eigenvalue of $A + E^{(k)}$, and the results of §7.2 can be brought to bear.
Suppose
$$Q = [\,\underbrace{Q_\alpha}_{r}\ \ \underbrace{Q_\beta}_{n-r}\,], \qquad Q^H A Q = T = \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{bmatrix} \begin{matrix} r \\ n-r \end{matrix} \qquad (7.3.6)$$
is a Schur decomposition with $\lambda(T_{11}) = \{\lambda_1, \ldots, \lambda_r\}$, and write $T = D + N$ with $D$ diagonal and $N$ strictly upper triangular.
If $|\lambda_r| > |\lambda_{r+1}|$, then the subspace $D_r(A) = \mathrm{ran}(Q_\alpha)$ is said to be a dominant invariant subspace. It is the unique invariant subspace associated with the eigenvalues $\lambda_1, \ldots, \lambda_r$. The following theorem shows that with reasonable assumptions, the subspaces $\mathrm{ran}(Q_k)$ generated by (7.3.4) converge to $D_r(A)$ at a rate proportional to $|\lambda_{r+1}/\lambda_r|^k$.

Theorem 7.3.1 Let the Schur decomposition of $A \in \mathbb{C}^{n \times n}$ be given by (7.3.5) and (7.3.6) with $n \ge 2$. Assume that $|\lambda_r| > |\lambda_{r+1}|$ and that $\theta \ge 0$ satisfies
$$(1 + \theta)|\lambda_r| > \| N \|_F.$$
If $Q_0 \in \mathbb{C}^{n \times r}$ has orthonormal columns and
$$d = \mathrm{dist}(D_r(A^H), \mathrm{ran}(Q_0)) < 1,$$
then the matrices $Q_k$ generated by (7.3.4) satisfy
$$\mathrm{dist}(D_r(A), \mathrm{ran}(Q_k)) \le (1+\theta)^{n-2} \left( 1 + \frac{\| T_{12} \|_F}{\sqrt{1 - d^2}\,\mathrm{sep}(T_{11}, T_{22})} \right) \left( \frac{|\lambda_{r+1}| + \| N \|_F/(1+\theta)}{|\lambda_r| - \| N \|_F/(1+\theta)} \right)^k.$$
Proof. The proof is given in an appendix at the end of this section. $\Box$

The condition $d < 1$ in Theorem 7.3.1 ensures that the initial $Q$ matrix is not deficient in certain eigendirections:
$$d < 1 \iff D_r(A^H)^\perp \cap \mathrm{ran}(Q_0) = \{0\}.$$
The theorem essentially says that if this condition holds and if $\theta$ is chosen large enough, then
$$\mathrm{dist}(D_r(A), \mathrm{ran}(Q_k)) \le c \left| \frac{\lambda_{r+1}}{\lambda_r} \right|^k$$
where $c$ depends on $\mathrm{sep}(T_{11}, T_{22})$ and $A$'s departure from normality. Needless to say, convergence can be very slow if the gap between $|\lambda_r|$ and $|\lambda_{r+1}|$ is not sufficiently wide.
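A minimal NumPy sketch of the orthogonal iteration (7.3.4) itself, under the assumptions of the theorem (the function name is ours):

import numpy as np

def orthogonal_iteration(A, Q, steps=100):
    # Iteration (7.3.4): Z_k = A Q_{k-1}; Q_k R_k = Z_k (QR factorization).
    for _ in range(steps):
        Q, _ = np.linalg.qr(A @ Q)
    return Q   # ran(Q) approaches the dominant invariant subspace D_r(A)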
Example 7.3.2 If (7.3.4) is applied to the matrix $A$ in Example 7.3.1, with $Q_0 = [e_1, e_2]$, we find:
$k$ $\qquad \mathrm{dist}(D_2(A), \mathrm{ran}(Q_k))$
1 .0052
2 .0047
3 .0039
4 .0030
5 .0023
6 .0017
7 .0013
If
$$\mathrm{dist}(D_i(A^H), \mathrm{span}\{q_1^{(0)}, \ldots, q_i^{(0)}\}) < 1, \qquad i = 1{:}n, \qquad (7.3.7)$$
then it follows from Theorem 7.3.1 that
$$\mathrm{dist}(\mathrm{span}\{q_1^{(k)}, \ldots, q_i^{(k)}\}, \mathrm{span}\{q_1, \ldots, q_i\}) \rightarrow 0,$$
and so the matrices
$$T_k = Q_k^H A Q_k$$
are converging to upper triangular form. Thus, it can be said that the method of orthogonal iteration computes a Schur decomposition provided the original iterate $Q_0 \in \mathbb{C}^{n \times n}$ is not deficient in the sense of (7.3.7).
The QR iteration arises naturally by considering how to compute the matrix $T_k$ directly from its predecessor $T_{k-1}$. From (7.3.4) and the definition of $T_{k-1}$ it can be shown that $T_{k-1}$ can be overwritten with $T_k$ by computing a QR factorization and then multiplying the factors in reverse order:

for $k = 1, 2, \ldots$
$\quad A = QR$ (QR factorization)
$\quad A = RQ$
end

If this QR iteration is applied to the matrix of Example 7.3.1, then the strictly lower triangular elements diminish steadily from iteration to iteration.
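In NumPy the basic (unshifted) iteration is a two-line loop; this sketch of ours makes the $A = QR$, $A = RQ$ overwriting explicit:

import numpy as np

def qr_iteration(A, steps=200):
    A = np.asarray(A, dtype=float).copy()
    for _ in range(steps):
        Q, R = np.linalg.qr(A)   # A = QR
        A = R @ Q                # A = RQ = Q^T (QR) Q, a similarity
    return A   # tends toward (quasi-)triangular form

The diagonal of the returned matrix reveals the (real) eigenvalues when the iteration converges.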
7.3.4 LR Iterations
We conclude with some remarks about power iterations that rely on the LU factorization rather than the QR factorization. Let $G_0 \in \mathbb{C}^{n \times r}$ have rank $r$. Corresponding to (7.3.4) we have the following iteration:

for $k = 1, 2, \ldots$
$\quad Z_k = A G_{k-1}$
$\quad Z_k = G_k R_k$ (LU factorization) $\qquad (7.3.8)$
end

It can be shown that if we set $L_0 = G_0$, then the $T_k$ can be generated as follows:
$T_0 = L_0^{-1} A L_0$
for $k = 1, 2, \ldots$ $\qquad (7.3.10)$
$\quad T_{k-1} = L_k R_k$ (LU factorization)
$\quad T_k = R_k L_k$
end
Iterations (7.3.8) and (7.3.10) are known as treppeniteration and the LR iteration, respectively. Under reasonable assumptions, the $T_k$ converge to upper triangular form. To successfully implement either method, it is necessary to pivot. See Wilkinson (1965, p. 602).
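For illustration only, here is an unpivoted NumPy sketch of (7.3.10) (our own code; as just noted, a practical implementation must pivot):

import numpy as np

def lu_nopivot(T):
    # LU factorization without pivoting; assumes the pivots are nonzero.
    n = T.shape[0]
    L, U = np.eye(n), np.asarray(T, dtype=float).copy()
    for k in range(n - 1):
        m = U[k+1:, k] / U[k, k]
        L[k+1:, k] = m
        U[k+1:, :] -= np.outer(m, U[k, :])
    return L, U

def lr_iteration(T, steps=100):
    # Iteration (7.3.10): T = L R (LU factorization), then T = R L.
    for _ in range(steps):
        L, R = lu_nopivot(T)
        T = R @ L   # similarity transform L^{-1} T L
    return T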
Appendix
In order to establish Theorem 7.3.1 we need the following lemma, which is concerned with bounding the powers of a matrix and its inverse.

Lemma 7.3.2 Let $Q^H A Q = T = D + N$ be a Schur decomposition of $A \in \mathbb{C}^{n \times n}$ where $D$ is diagonal and $N$ strictly upper triangular. Let $\lambda$ and $\mu$ denote the largest and smallest eigenvalues of $A$ in absolute value. If $\theta \ge 0$, then for all $k \ge 0$ we have
$$\| A^k \|_2 \le (1+\theta)^{n-1} \left( |\lambda| + \frac{\| N \|_F}{1+\theta} \right)^k. \qquad (7.3.11)$$
If $A$ is nonsingular and $\theta \ge 0$ satisfies $(1+\theta)|\mu| > \| N \|_F$, then for all $k \ge 0$ we also have
$$\| A^{-k} \|_2 \le \frac{(1+\theta)^{n-1}}{\left( |\mu| - \| N \|_F/(1+\theta) \right)^k}. \qquad (7.3.12)$$

Proof. With the diagonal scaling $\Delta = \mathrm{diag}(1, 1+\theta, \ldots, (1+\theta)^{n-1})$ one has $\| \Delta N \Delta^{-1} \|_2 \le \| N \|_F/(1+\theta)$ and $\kappa_2(\Delta) = (1+\theta)^{n-1}$, so
$$\| A^{-k} \|_2 \le \kappa_2(\Delta) \left( \frac{1}{|\mu| - \| \Delta N \Delta^{-1} \|_2} \right)^k \le (1+\theta)^{n-1} \left( \frac{1}{|\mu| - \| N \|_F/(1+\theta)} \right)^k. \quad \Box$$
Proof of Theorem 7.3.1
It is easy to show by induction that $A^k Q_0 = Q_k (R_k \cdots R_1)$. By substituting (7.3.5) and (7.3.6) into this equality we obtain
$$T^k \begin{bmatrix} V_0 \\ W_0 \end{bmatrix} = \begin{bmatrix} V_k \\ W_k \end{bmatrix} (R_k \cdots R_1)$$
where $V_k = Q_\alpha^H Q_k$ and $W_k = Q_\beta^H Q_k$. Using Lemma 7.1.5 we know that a matrix $X \in \mathbb{C}^{r \times (n-r)}$ exists such that
$$\begin{bmatrix} I_r & X \\ 0 & I_{n-r} \end{bmatrix}^{-1} \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{bmatrix} \begin{bmatrix} I_r & X \\ 0 & I_{n-r} \end{bmatrix} = \begin{bmatrix} T_{11} & 0 \\ 0 & T_{22} \end{bmatrix}$$
and so
$$\begin{bmatrix} T_{11}^k & 0 \\ 0 & T_{22}^k \end{bmatrix} \begin{bmatrix} V_0 - XW_0 \\ W_0 \end{bmatrix} = \begin{bmatrix} V_k - XW_k \\ W_k \end{bmatrix} (R_k \cdots R_1).$$
Below we establish that the matrix $V_0 - XW_0$ is nonsingular, and this enables us to express $W_k(V_k - XW_k)^{-1}$ in terms of $T_{22}^k$, $T_{11}^{-k}$, and the initial data. Since
$$\| [\,I_r,\ -X\,] \|_2 \le 1 + \| X \|_F$$
we have
$$\mathrm{dist}(D_r(A), \mathrm{ran}(Q_k)) \le \| T_{22}^k \|_2\, \| (V_0 - XW_0)^{-1} \|_2\, \| T_{11}^{-k} \|_2\, (1 + \| X \|_F). \qquad (7.3.13)$$
To prove the theorem we must look at each of the four factors in the upper bound.
To prove the theorem we must look at each of the four factors in the upper
bound.
Since sep(T11 , T22 ) is the smallest singular value of the linear transfor-
mation c/l(X)= TuX- XTzz it readily follows from c/l(X) = -T12 that
(7.3.14)
and
$$V_0 - XW_0 = [\,I_r,\ -X\,] \begin{bmatrix} V_0 \\ W_0 \end{bmatrix} = [\,I_r,\ -X\,] \begin{bmatrix} Q_\alpha^H \\ Q_\beta^H \end{bmatrix} Q_0 = (Q_\alpha - Q_\beta X^H)^H Q_0 \qquad (7.3.16)$$
where the matrix $(Q_\alpha - Q_\beta X^H)(I + XX^H)^{-1/2}$ arises after normalization. The columns of this matrix are orthonormal. They are also a basis for $D_r(A^H)$ because
$$A^H (Q_\alpha - Q_\beta X^H) = (Q_\alpha - Q_\beta X^H) T_{11}^H.$$
This last fact follows from the equation $A^H Q = Q T^H$. From Theorem 2.6.1
Problems
P7.3.1 (a) Show that if $X \in \mathbb{C}^{n \times n}$ is nonsingular, then $\| A \|_X = \| X^{-1}AX \|_2$ defines a matrix norm with the property that $\| AB \|_X \le \| A \|_X \| B \|_X$. (b) Let $A \in \mathbb{C}^{n \times n}$ and set $\rho = \max |\lambda_i|$. Show that for any $\epsilon > 0$ there exists a nonsingular $X \in \mathbb{C}^{n \times n}$ such that $\| A \|_X = \| X^{-1}AX \|_2 \le \rho + \epsilon$. Conclude that there is a constant $M$ such that $\| A^k \|_2 \le M(\rho + \epsilon)^k$ for all non-negative integers $k$. (Hint: Set $X = Q\,\mathrm{diag}(1, \alpha, \ldots, \alpha^{n-1})$ where $Q^H A Q = D + N$ is $A$'s Schur decomposition.)
P7.3.2 Verify that (7.3.10) calculates the matrices $T_k$ defined by (7.3.9).
P7.3.3 Suppose $A \in \mathbb{C}^{n \times n}$ is nonsingular and that $Q_0 \in \mathbb{C}^{n \times p}$ has orthonormal columns. The following iteration is referred to as inverse orthogonal iteration:

for $k = 1, 2, \ldots$
$\quad$ Solve $A Z_k = Q_{k-1}$
$\quad Z_k = Q_k R_k$ (QR factorization)
end

Explain why this iteration can usually be used to compute the $p$ smallest eigenvalues of $A$ in absolute value. Note that to implement this iteration it is necessary to be able to solve linear systems that involve $A$. When $p = 1$, the method is referred to as the inverse power method.
P7.3.4 Assume $A \in \mathbb{R}^{n \times n}$ has eigenvalues $\lambda_1, \ldots, \lambda_n$ that satisfy
$$\lambda = \lambda_1 = \lambda_2 = \lambda_3 = \lambda_4 > |\lambda_5| \ge \cdots \ge |\lambda_n|$$
where $\lambda$ is positive. Assume that $A$ has two Jordan blocks associated with $\lambda$. Discuss the convergence properties of the power method when applied to this matrix. Discuss how the convergence might be accelerated.
B.N. Parlett and W.G. Poole (1973). "A Geometric Theory for the QR, LU, and Power Iterations," SIAM J. Num. Anal. 10, 389-412.
The convergence of the QR iteration is discussed in
J.H. Wilkinson (1965). "Convergence of the LR, QR, and Related Algorithms," Comp. J. 8, 77-84.
B.N. Parlett (1965). "Convergence of the Q-R Algorithm," Numer. Math. 7, 187-193. (Correction in Numer. Math. 10, 163-164.)
B.N. Parlett (1966). "Singular and Invariant Matrices Under the QR Algorithm," Math. Comp. 20, 611-615.
B.N. Parlett (1968). "Global Convergence of the Basic QR Algorithm on Hessenberg Matrices," Math. Comp. 22, 803-817.
Wilkinson (AEP, chapter 9) also discusses the convergence theory for this important algorithm. Deeper insight into the convergence of the QR algorithm and its connection to other important algorithms can be attained by reading
Theorem 7.4.1 (Real Schur Decomposition) If $A \in \mathbb{R}^{n \times n}$, then there exists an orthogonal $Q \in \mathbb{R}^{n \times n}$ such that
$$Q^T A Q = \begin{bmatrix} R_{11} & R_{12} & \cdots & R_{1m} \\ 0 & R_{22} & \cdots & R_{2m} \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & R_{mm} \end{bmatrix} \qquad (7.4.2)$$
where each $R_{ii}$ is either a 1-by-1 matrix or a 2-by-2 matrix having complex conjugate eigenvalues.

Proof. The complex eigenvalues of $A$ must come in conjugate pairs, since the characteristic polynomial $\det(zI - A)$ has real coefficients. If $\lambda = \gamma + i\mu$ ($\mu \neq 0$) is an eigenvalue with eigenvector $x = y + iz$, then
$$A [\,y\ z\,] = [\,y\ z\,] \begin{bmatrix} \gamma & \mu \\ -\mu & \gamma \end{bmatrix}.$$
Choosing an orthogonal $U$ whose first two columns span $\mathrm{span}\{y, z\}$ gives
$$U^T A U = \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{bmatrix} \begin{matrix} 2 \\ n-2 \end{matrix}$$
where $\lambda(T_{11}) = \{\lambda, \bar{\lambda}\}$. By induction, there exists an orthogonal $\tilde{U}$ so $\tilde{U}^T T_{22} \tilde{U}$ has the required structure. The theorem follows by setting $Q = U\,\mathrm{diag}(I_2, \tilde{U})$. $\Box$

The theorem shows that any real matrix is orthogonally similar to an upper quasi-triangular matrix. It is clear that the real and imaginary parts of the complex eigenvalues can be easily obtained from the 2-by-2 diagonal blocks.
For example, a matrix in real Schur form with two complex conjugate pairs and two real eigenvalues has the pattern
$$\begin{bmatrix} \times & \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times & \times \\ 0 & 0 & \times & \times & \times & \times \\ 0 & 0 & \times & \times & \times & \times \\ 0 & 0 & 0 & 0 & \times & \times \\ 0 & 0 & 0 & 0 & 0 & \times \end{bmatrix}.$$
Overall we obtain the following algorithm:
$$A \stackrel{P_1}{\longrightarrow} \begin{bmatrix} \times&\times&\times&\times&\times&\times \\ \times&\times&\times&\times&\times&\times \\ 0&\times&\times&\times&\times&\times \\ 0&\times&\times&\times&\times&\times \\ 0&\times&\times&\times&\times&\times \\ 0&\times&\times&\times&\times&\times \end{bmatrix} \stackrel{P_2}{\longrightarrow} \begin{bmatrix} \times&\times&\times&\times&\times&\times \\ \times&\times&\times&\times&\times&\times \\ 0&\times&\times&\times&\times&\times \\ 0&0&\times&\times&\times&\times \\ 0&0&\times&\times&\times&\times \\ 0&0&\times&\times&\times&\times \end{bmatrix}$$
$$\stackrel{P_3}{\longrightarrow} \begin{bmatrix} \times&\times&\times&\times&\times&\times \\ \times&\times&\times&\times&\times&\times \\ 0&\times&\times&\times&\times&\times \\ 0&0&\times&\times&\times&\times \\ 0&0&0&\times&\times&\times \\ 0&0&0&\times&\times&\times \end{bmatrix} \stackrel{P_4}{\longrightarrow} \begin{bmatrix} \times&\times&\times&\times&\times&\times \\ \times&\times&\times&\times&\times&\times \\ 0&\times&\times&\times&\times&\times \\ 0&0&\times&\times&\times&\times \\ 0&0&0&\times&\times&\times \\ 0&0&0&0&\times&\times \end{bmatrix} = H$$
$$(P_1 \cdots P_{k-1})\, A\, (P_1 \cdots P_{k-1}) = \begin{bmatrix} B_{11} & B_{12} & B_{13} \\ B_{21} & B_{22} & B_{23} \\ 0 & B_{32} & B_{33} \end{bmatrix} \begin{matrix} k-1 \\ 1 \\ n-k \end{matrix}$$
is upper Hessenberg through its first $k - 1$ columns. Suppose $\tilde{P}_k$ is an order $n-k$ Householder matrix such that $\tilde{P}_k B_{32}$ is a multiple of $e_1^{(n-k)}$. If $P_k = \mathrm{diag}(I_k, \tilde{P}_k)$, then $P_k (P_1 \cdots P_{k-1}) A (P_1 \cdots P_{k-1}) P_k$ is upper Hessenberg through its first $k$ columns.
Example 7.4.2 If
$$A = \begin{bmatrix} 1 & 5 & 7 \\ 3 & 0 & 6 \\ 4 & 3 & 1 \end{bmatrix} \quad \mbox{and} \quad U_0 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & .6 & .8 \\ 0 & .8 & -.6 \end{bmatrix}$$
then
$$U_0^T A U_0 = H = \begin{bmatrix} 1.00 & 8.60 & -.20 \\ 5.00 & 4.96 & -.72 \\ 0.00 & 2.28 & -3.96 \end{bmatrix}.$$
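The following NumPy sketch (ours, not the text's Algorithm 7.4.2) carries out the Householder reduction just described; up to signs in $U_0$, it reproduces Example 7.4.2:

import numpy as np

def hessenberg(A):
    n = A.shape[0]
    H, U0 = np.asarray(A, dtype=float).copy(), np.eye(n)
    for k in range(n - 2):
        x = H[k+1:, k]
        v = x.copy()
        v[0] += np.copysign(np.linalg.norm(x), x[0])
        if np.linalg.norm(v) == 0:
            continue
        v = v / np.linalg.norm(v)
        # P_k = diag(I_{k+1}, I - 2 v v^T); apply it from both sides.
        H[k+1:, k:] -= 2.0 * np.outer(v, v @ H[k+1:, k:])
        H[:, k+1:]  -= 2.0 * np.outer(H[:, k+1:] @ v, v)
        U0[:, k+1:] -= 2.0 * np.outer(U0[:, k+1:] @ v, v)
    return H, U0   # U0.T @ A @ U0 = H, with H upper Hessenberg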
$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \begin{matrix} r \\ n-r \end{matrix}$$
Notice that the updates of the (1,2) and (2,2) blocks are rich in level-3 operations given that $Q_1$ is in WY form. This fully illustrates the overall process as $Q_1^T A Q_1$ is block upper Hessenberg through its first block column. We next repeat the computations on the first $r$ columns of $Q_1^T A_{22} Q_1$. After $N - 2$ such steps we obtain a block upper Hessenberg matrix
$$U_0^T A U_0 = H = \begin{bmatrix} H_{11} & H_{12} & \cdots & H_{1N} \\ H_{21} & H_{22} & \cdots & H_{2N} \\ & \ddots & \ddots & \vdots \\ 0 & & H_{N,N-1} & H_{NN} \end{bmatrix}$$
where each $H_{ij}$ is $r$-by-$r$ and $U_0 = Q_1 \cdots Q_{N-2}$ with each $Q_i$ in WY form. The overall algorithm has a level-3 fraction of the form $1 - O(1/N)$.
Note that the subdiagonal blocks in $H$ are upper triangular and so the matrix has lower bandwidth $r$. It is possible to reduce $H$ to actual Hessenberg form by using Givens rotations to zero all but the first subdiagonal.
Dongarra, Hammarling, and Sorensen (1987) have shown how to proceed directly to Hessenberg form using a mixture of gaxpy's and level-3 updates. Their idea involves minimal updating after each Householder transformation is generated. For example, suppose the first Householder $P_1$ has been computed. To generate $P_2$ we need just the second column of $P_1AP_1$, not the full outer product update. To generate $P_3$ we need just the third column of $P_2P_1AP_1P_2$, etc. In this way, the Householder matrices can be determined using only gaxpy operations. No outer product updates are involved. Once a suitable number of Householder matrices are known they can be aggregated and applied in a level-3 fashion.
Since $w_1 = e_1$ it follows that $[\,w_1, \ldots, w_k\,]$ is upper triangular and thus $w_i = \pm I_n(:, i) = \pm e_i$ for $i = 2{:}k$. Since $w_i = V^T q_i$ and $h_{i,i-1} = w_i^T G w_{i-1}$, it follows that $v_i = \pm q_i$ and $|h_{i,i-1}| = |g_{i,i-1}|$ for $i = 2{:}k$.

A companion matrix has the form
$$C = \begin{bmatrix} 0 & 0 & \cdots & 0 & -c_0 \\ 1 & 0 & \cdots & 0 & -c_1 \\ 0 & 1 & & 0 & -c_2 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & -c_{n-1} \end{bmatrix}. \qquad (7.4.4)$$
where
$$(M_{k-1}\Pi_{k-1} \cdots M_1\Pi_1)\, A\, (M_{k-1}\Pi_{k-1} \cdots M_1\Pi_1)^{-1} = \begin{bmatrix} B_{11} & B_{12} & B_{13} \\ B_{21} & B_{22} & B_{23} \\ 0 & B_{32} & B_{33} \end{bmatrix} \begin{matrix} k-1 \\ 1 \\ n-k \end{matrix}$$
is upper Hessenberg through its first $k - 1$ columns. A permutation $\tilde{\Pi}_k$ of order $n - k$ is then determined such that the first element of $\tilde{\Pi}_k B_{32}$ is maximal in absolute value. This makes it possible to determine a stable Gauss transformation $\tilde{M}_k = I - z_k e_1^T$, also of order $n - k$, such that all but the first component of $\tilde{M}_k(\tilde{\Pi}_k B_{32})$ is zero. Defining $\Pi_k = \mathrm{diag}(I_k, \tilde{\Pi}_k)$ and $M_k = \mathrm{diag}(I_k, \tilde{M}_k)$, we see that the updated matrix is upper Hessenberg through its first $k$ columns. Note that $\tilde{M}_k^{-1} = I + z_k e_1^T$ and so some very simple rank-one updates are involved in the reduction.
A careful operation count reveals that the Gauss reduction to Hessenberg form requires only half the number of flops of the Householder method. However, as in the case of Gaussian elimination with partial pivoting, there is a (fairly remote) chance of $2^n$ growth. See Businger (1969). Another difficulty associated with the Gauss approach is that the eigenvalue condition numbers of the reduced matrix can be degraded, since the transformations are not orthogonal.
Problems
P7.4.1 Suppose $A \in \mathbb{R}^{n \times n}$ and $z \in \mathbb{R}^n$. Give a detailed algorithm for computing an orthogonal $Q$ such that $Q^TAQ$ is upper Hessenberg and $Q^Tz$ is a multiple of $e_1$. (Hint: Reduce $z$ first and then apply Algorithm 7.4.2.)
P7.4.2 Specify a complete reduction to Hessenberg form using Gauss transformations and verify that it only requires $5n^3/3$ flops.
P7.4.3 In some situations, it is necessary to solve the linear system $(A + zI)x = b$ for many different values of $z \in \mathbb{R}$ and $b \in \mathbb{R}^n$. Show how this problem can be efficiently and stably solved using the Hessenberg decomposition.
P7.4.4 Give a detailed algorithm for explicitly computing the matrix $U_0$ in Algorithm 7.4.2. Design your algorithm so that $H$ is overwritten by $U_0$.
P7.4.5 Suppose $H \in \mathbb{R}^{n \times n}$ is an unreduced upper Hessenberg matrix. Show that there exists a diagonal matrix $D$ such that each subdiagonal element of $D^{-1}HD$ is equal to one. What is $\kappa_2(D)$?
P7.4.6 Suppose $W, Y \in \mathbb{R}^{n \times n}$ and define the matrices $C$ and $B$ by
$$C = W + iY, \qquad B = \begin{bmatrix} W & -Y \\ Y & W \end{bmatrix}.$$
Show that if $\lambda \in \lambda(C)$ is real, then $\lambda \in \lambda(B)$. Relate the corresponding eigenvectors.
P7.4.7 Show how to compute $c$ and $s$ with $c^2 + s^2 = 1$ so that
$$\begin{bmatrix} c & s \\ -s & c \end{bmatrix}^T \begin{bmatrix} w & x \\ y & z \end{bmatrix} \begin{bmatrix} c & s \\ -s & c \end{bmatrix} = \begin{bmatrix} \lambda & \alpha \\ \beta & \lambda \end{bmatrix}$$
where $\alpha\beta = -\mu^2$ and $\lambda \pm i\mu$ are the eigenvalues of the 2-by-2 matrix.
P7.4.8 Suppose $(\lambda, x)$ is a known eigenvalue-eigenvector pair for the upper Hessenberg matrix $H \in \mathbb{R}^{n \times n}$. Give an algorithm for computing an orthogonal matrix $P$ such that
$$P^T H P = \begin{bmatrix} \lambda & w^T \\ 0 & H_1 \end{bmatrix}$$
where $H_1 \in \mathbb{R}^{(n-1) \times (n-1)}$ is upper Hessenberg. Compute $P$ as a product of Givens rotations.
P7.4.9 Suppose $H \in \mathbb{R}^{n \times n}$ has lower bandwidth $p$. Show how to compute $Q \in \mathbb{R}^{n \times n}$, a product of Givens rotations, such that $Q^T H Q$ is upper Hessenberg. How many flops are required?
P7.4.10 Show that if $C$ is a companion matrix with distinct eigenvalues $\lambda_1, \ldots, \lambda_n$, then $VCV^{-1} = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ where
$$V = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ \lambda_1 & \lambda_2 & \cdots & \lambda_n \\ \vdots & \vdots & & \vdots \\ \lambda_1^{n-1} & \lambda_2^{n-1} & \cdots & \lambda_n^{n-1} \end{bmatrix}.$$
F.D. Murnaghan and A. Wintner (1931). "A Canonical Form for Real Matrices Under Orthogonal Transformations," Proc. Nat. Acad. Sci. 17, 417-420.
A thorough treatment of the reduction to Hessenberg form is given in Wilkinson (1965, chapter 6), and Algol procedures for both the Householder and Gauss methods appear in
R.S. Martin and J.H. Wilkinson (1968). "Similarity Reduction of a General Matrix to Hessenberg Form," Numer. Math. 12, 349-368. See also Wilkinson and Reinsch (1971, pp. 339-358).
Fortran versions of the Algol procedures in the last reference are in Eispack. Givens rotations can also be used to compute the Hessenberg decomposition. See
W. Rath (1982). "Fast Givens Rotations for Orthogonal Similarity," Numer. Math. 40, 47-56.
The high performance computation of the Hessenberg reduction is discussed in
J.J. Dongarra, L. Kaufman, and S. Hammarling (1986). "Squeezing the Most Out of Eigenvalue Solvers on High Performance Computers," Lin. Alg. and Its Applic. 77, 113-136.
J.J. Dongarra, S. Hammarling, and D.C. Sorensen (1989). "Block Reduction of Matrices to Condensed Forms for Eigenvalue Computations," J. Comp. Appl. Math. 27, 215-227.
M.W. Berry, J.J. Dongarra, and Y. Kim (1995). "A Parallel Algorithm for the Reduction of a Nonsymmetric Matrix to Block Upper Hessenberg Form," Parallel Computing 21, 1189-1211.
The possibility of exponential growth in the Gauss transformation approach was first pointed out in
P. Businger (1969). "Reducing a Matrix to Hessenberg Form," Math. Comp. 23, 819-821.
However, the algorithm should be regarded in the same light as Gaussian elimination with partial pivoting: stable for all practical purposes. See Eispack, pp. 56-58.
Aspects of the Hessenberg decomposition for sparse matrices are discussed in
I.S. Duff and J.K. Reid (1975). "On the Reduction of Sparse Matrices to Condensed Forms by Similarity Transformations," J. Inst. Math. Applic. 15, 217-224.
Once an eigenvalue of an unreduced upper Hessenberg matrix is known, it is possible to zero the last subdiagonal entry using Givens similarity transformations. See
P.A. Businger (1971). "Numerically Stable Deflation of Hessenberg and Symmetric Tridiagonal Matrices," BIT 11, 262-270.
Some interesting mathematical properties of the Hessenberg form may be found in
W. Enright (1979). "On the Efficient and Reliable Numerical Solution of Large Linear Systems of O.D.E.s," IEEE Trans. Auto. Cont. AC-24, 905-908.
G.H. Golub, S. Nash, and C. Van Loan (1979). "A Hessenberg-Schur Method for the Problem AX + XB = C," IEEE Trans. Auto. Cont. AC-24, 909-913.
A. Laub (1981). "Efficient Multivariable Frequency Response Computations," IEEE Trans. Auto. Cont. AC-26, 407-408.
C.C. Paige (1981). "Properties of Numerical Algorithms Related to Computing Controllability," IEEE Trans. Auto. Cont. AC-26, 130-138.
G. Miminis and C.C. Paige (1982). "An Algorithm for Pole Assignment of Time Invariant Linear Systems," International J. of Control 35, 341-354.
C. Van Loan (1982). "Using the Hessenberg Decomposition in Control Theory," in Algorithms and Theory in Filtering and Control, D.C. Sorensen and R.J. Wets (eds), Mathematical Programming Study No. 18, North Holland, Amsterdam, pp. 102-111.
The advisability of posing polynomial root problems as companion matrix eigenvalue problems is discussed in
K.-C. Toh and L.N. Trefethen (1994). "Pseudozeros of Polynomials and Pseudospectra of Companion Matrices," Numer. Math. 68, 403-425.
A. Edelman and H. Murakami (1995). "Polynomial Roots from Companion Matrix Eigenvalues," Math. Comp. 64, 763-776.
7.5.1 Deflation
Without loss of generality we may assume that each Hessenberg matrix H
in (7.5.1) is unreduced. If not, then at some stage we have
$$H = \begin{bmatrix} H_{11} & H_{12} \\ 0 & H_{22} \end{bmatrix} \begin{matrix} p \\ n-p \end{matrix}$$
where $1 \le p < n$ and the problem decouples into two smaller problems involving $H_{11}$ and $H_{22}$. The term deflation is also used in this context, usually when $p = n - 1$ or $n - 2$.
In practice, decoupling occurs whenever a subdiagonal entry in $H$ is suitably small. For example, in Eispack if
$$|h_{p+1,p}| \le c\,{\bf u} \left( |h_{pp}| + |h_{p+1,p+1}| \right) \qquad (7.5.2)$$
for a small constant $c$, then $h_{p+1,p}$ is "declared" to be zero. This is justified since rounding errors of order ${\bf u}\| H \|$ are already present throughout the matrix.
and $\mu$ is fixed from iteration to iteration, then the theory of §7.3 says that the $p$th subdiagonal entry in $H$ converges to zero with rate
$$\left| \frac{\lambda_{p+1} - \mu}{\lambda_p - \mu} \right|^k.$$
Of course, if $\lambda_p = \lambda_{p+1}$, then there is no convergence at all. But if, for example, $\mu$ is much closer to $\lambda_n$ than to the other eigenvalues, then the zeroing of the $(n, n-1)$ entry is rapid. In the extreme case we have the following:
Theorem 7.5.1 Let $\mu$ be an eigenvalue of an $n$-by-$n$ unreduced Hessenberg matrix $H$. If $\tilde{H} = RU + \mu I$, where $H - \mu I = UR$ is the QR factorization of $H - \mu I$, then $\tilde{h}_{n,n-1} = 0$ and $\tilde{h}_{nn} = \mu$.

Proof. Since $H$ is an unreduced Hessenberg matrix the first $n - 1$ columns of $H - \mu I$ are independent, regardless of $\mu$. Thus, if $UR = (H - \mu I)$ is the QR factorization then $r_{ii} \neq 0$ for $i = 1{:}n-1$. But if $H - \mu I$ is singular then $r_{11} \cdots r_{nn} = 0$. Thus, $r_{nn} = 0$ and $\tilde{H}(n, :) = [\,0, \ldots, 0, \mu\,]$. $\Box$
These observations motivate the single shift QR iteration:

for $k = 1, 2, \ldots$
$\quad \mu = H(n, n)$
$\quad H - \mu I = UR$ (QR factorization) $\qquad (7.5.4)$
$\quad H = RU + \mu I$
end
If the current $(n, n-1)$ entry is small, say
$$H = \begin{bmatrix} \times & \times & \times & \times \\ \times & \times & \times & \times \\ 0 & \times & \times & \times \\ 0 & 0 & \epsilon & a \end{bmatrix},$$
then one step of (7.5.4) with shift $\mu = a$ produces a new $(n, n-1)$ entry of order $\epsilon^2$, precisely what we would expect of a quadratically converging algorithm.
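A direct NumPy transcription of (7.5.4), ours for illustration only; it forms full QR factorizations rather than exploiting the Hessenberg structure:

import numpy as np

def qr_single_shift(H, steps=20):
    H = np.asarray(H, dtype=float).copy()
    n = H.shape[0]
    for _ in range(steps):
        mu = H[n-1, n-1]                          # Rayleigh-quotient shift
        Q, R = np.linalg.qr(H - mu * np.eye(n))   # H - mu I = QR
        H = R @ Q + mu * np.eye(n)                # H = RQ + mu I
    return H   # H[n-1, n-2] -> 0 quadratically near convergence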
Example 7.5.2 If $H$ is a 3-by-3 upper Hessenberg matrix with an eigenvalue near 7 and $UR = H - 7I$ is the QR factorization, then $\tilde{H} = RU + 7I$ is given by
$$\tilde{H} \approx \begin{bmatrix} -0.5384 & 1.6908 & 0.8351 \\ 0.3076 & 6.5264 & -6.6555 \\ 0.0000 & 2 \cdot 10^{-5} & 7.0119 \end{bmatrix}.$$
Near-perfect shifts as above almost always ensure a small $\tilde{h}_{n,n-1}$. However, this is just a heuristic. There are examples in which $\tilde{h}_{n,n-1}$ is a relatively large matrix entry even though $\sigma_{\min}(H - \mu I) \approx {\bf u}$.
$$\begin{array}{rcl} H - a_1 I &=& U_1 R_1 \\ H_1 &=& R_1 U_1 + a_1 I \\ H_1 - a_2 I &=& U_2 R_2 \\ H_2 &=& R_2 U_2 + a_2 I \end{array} \qquad (7.5.6)$$
These equations can be manipulated to show that
$$(U_1 U_2)(R_2 R_1) = M \qquad (7.5.7)$$
where $M$ is defined by
$$M = (H - a_1 I)(H - a_2 I). \qquad (7.5.8)$$
Note that $M$ is a real matrix even if $G$'s eigenvalues are complex since
$$M = H^2 - sH + tI$$
where
$$s = a_1 + a_2 = h_{mm} + h_{nn} = \mathrm{trace}(G) \in \mathbb{R}$$
and
$$t = a_1 a_2 = h_{mm}h_{nn} - h_{mn}h_{nm} = \det(G) \in \mathbb{R}.$$
Thus, (7.5.7) is the QR factorization of a real matrix and we may choose $U_1$ and $U_2$ so that $Z = U_1U_2$ is real orthogonal. It then follows that $H_2 = Z^T H Z$ is real.
Unfortunately, roundoff error almost always prevents an exact return to the real field. A real $H_2$ could be guaranteed if we

• explicitly form the real matrix $M = H^2 - sH + tI$,
• compute the real QR factorization $M = ZR$, and
• set $H_2 = Z^THZ$.

But since the first of these steps requires $O(n^3)$ flops, this is not a practical course of action. Fortunately, only the first column of $M$ is needed. Since $H$ is upper Hessenberg, $Me_1 = [\,x, y, z, 0, \ldots, 0\,]^T$ where
$$\begin{array}{rcl} x &=& h_{11}^2 + h_{12}h_{21} - s\,h_{11} + t \\ y &=& h_{21}(h_{11} + h_{22} - s) \\ z &=& h_{21}h_{32}. \end{array}$$
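These three numbers are all the implicit double shift needs to get started. A small sketch of the computation in NumPy (ours; 0-based indices):

import numpy as np

def francis_first_column(H):
    # First column of M = H^2 - sH + tI; only three entries are nonzero
    # because H is upper Hessenberg.
    n = H.shape[0]
    s = H[n-2, n-2] + H[n-1, n-1]                            # trace(G)
    t = H[n-2, n-2]*H[n-1, n-1] - H[n-2, n-1]*H[n-1, n-2]    # det(G)
    x = H[0, 0]**2 + H[0, 1]*H[1, 0] - s*H[0, 0] + t
    y = H[1, 0]*(H[0, 0] + H[1, 1] - s)
    z = H[1, 0]*H[2, 1]
    return x, y, z   # P_0 sends (x, y, z) to a multiple of e_1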
$$P_0 H P_0 = \begin{bmatrix} \times&\times&\times&\times&\times&\times \\ \times&\times&\times&\times&\times&\times \\ \times&\times&\times&\times&\times&\times \\ \times&\times&\times&\times&\times&\times \\ 0&0&0&\times&\times&\times \\ 0&0&0&0&\times&\times \end{bmatrix}$$
Now the mission of the Householder matrices P1, ... , Pn-2 is to restore this
matrix to upper Hessenberg form. The calculation proceeds as follows:
$$P_0HP_0 \stackrel{P_1}{\longrightarrow} \begin{bmatrix} \times&\times&\times&\times&\times&\times \\ \times&\times&\times&\times&\times&\times \\ 0&\times&\times&\times&\times&\times \\ 0&\times&\times&\times&\times&\times \\ 0&\times&\times&\times&\times&\times \\ 0&0&0&0&\times&\times \end{bmatrix} \stackrel{P_2}{\longrightarrow} \begin{bmatrix} \times&\times&\times&\times&\times&\times \\ \times&\times&\times&\times&\times&\times \\ 0&\times&\times&\times&\times&\times \\ 0&0&\times&\times&\times&\times \\ 0&0&\times&\times&\times&\times \\ 0&0&\times&\times&\times&\times \end{bmatrix}$$
$$\stackrel{P_3}{\longrightarrow} \begin{bmatrix} \times&\times&\times&\times&\times&\times \\ \times&\times&\times&\times&\times&\times \\ 0&\times&\times&\times&\times&\times \\ 0&0&\times&\times&\times&\times \\ 0&0&0&\times&\times&\times \\ 0&0&0&\times&\times&\times \end{bmatrix} \stackrel{P_4}{\longrightarrow} \begin{bmatrix} \times&\times&\times&\times&\times&\times \\ \times&\times&\times&\times&\times&\times \\ 0&\times&\times&\times&\times&\times \\ 0&0&\times&\times&\times&\times \\ 0&0&0&\times&\times&\times \\ 0&0&0&0&\times&\times \end{bmatrix}$$
Clearly, the general $P_k$ has the form $P_k = \mathrm{diag}(I_k, \tilde{P}_k, I_{n-k-3})$ where $\tilde{P}_k$ is a 3-by-3 Householder matrix. For example,
$$P_2 = \begin{bmatrix} 1&0&0&0&0&0 \\ 0&1&0&0&0&0 \\ 0&0&\times&\times&\times&0 \\ 0&0&\times&\times&\times&0 \\ 0&0&\times&\times&\times&0 \\ 0&0&0&0&0&1 \end{bmatrix}.$$
have the same first column. Hence, $Z_1e_1 = Ze_1$, and we can assert that $Z_1$ essentially equals $Z$ provided that the upper Hessenberg matrices $Z^THZ$ and $Z_1^THZ_1$ are each unreduced.
The implicit determination of $H_2$ from $H$ outlined above was first described by Francis (1961) and we refer to it as a Francis QR step. The complete Francis step is summarized as follows:
spot any possible decoupling. How this is done is illustrated in the following algorithm, which repeatedly partitions the current Hessenberg matrix as
$$H = \begin{bmatrix} H_{11} & H_{12} & H_{13} \\ 0 & H_{22} & H_{23} \\ 0 & 0 & H_{33} \end{bmatrix} \begin{matrix} p \\ n-p-q \\ q \end{matrix}$$
where $H_{33}$ is upper quasi-triangular and $H_{22}$ is unreduced.
This algorithm requires $25n^3$ flops if $Q$ and $T$ are computed. If only the eigenvalues are desired, then $10n^3$ flops are necessary. These flop counts are very approximate and are based on the empirical observation that on average only two Francis iterations are required before the lower 1-by-1 or 2-by-2 decouples.
The roundoff properties of the QR algorithm are what one would expect of any orthogonal matrix technique. The computed real Schur form $T$ is orthogonally similar to a matrix near to $A$, i.e.,
$$Q^T(A + E)Q = T$$
where $Q^TQ = I$ and $\| E \|_2 \approx {\bf u}\| A \|_2$. The computed $Q$ is almost orthogonal in the sense that $Q^TQ = I + F$ where $\| F \|_2 \approx {\bf u}$.
The order of the eigenvalues along the diagonal of $T$ is somewhat arbitrary. But as we discuss in §7.6, any ordering can be achieved by using a simple procedure for swapping two adjacent diagonal entries.
7.5.7 Balancing
Finally, we mention that if the elements of $A$ have widely varying magnitudes, then $A$ should be balanced before applying the QR algorithm. This is an $O(n^2)$ calculation in which a diagonal matrix $D$ is computed so that the rows and corresponding columns of $D^{-1}AD$ have roughly equal norms.
Problems
D. Watkins and L. Elsner (1991). "Chasing Algorithms for the Eigenvalue Problem," SIAM J. Matrix Anal. Appl. 12, 374-384.
D.S. Watkins and L. Elsner (1991). "Convergence of Algorithms of Decomposition Type for the Eigenvalue Problem," Lin. Alg. and Its Applic. 143, 19-47.
J. Erxiong (1992). "A Note on the Double-Shift QL Algorithm," Lin. Alg. and Its Applic. 171, 121-132.
Algol procedures for LR and QR methods are given in
R.S. Martin and J.H. Wilkinson (1968). "The Modified LR Algorithm for Complex Hessenberg Matrices," Numer. Math. 12, 369-376. See also Wilkinson and Reinsch (1971, pp. 396-403).
R.S. Martin, G. Peters, and J.H. Wilkinson (1970). "The QR Algorithm for Real Hessenberg Matrices," Numer. Math. 14, 219-231. See also Wilkinson and Reinsch (1971, pp. 359-371).
High-performance and parallel variants of the QR iteration are discussed in
Z. Bai and J.W. Demmel (1989). "On a Block Implementation of Hessenberg Multishift QR Iteration," Int'l J. of High Speed Comput. 1, 97-112.
G. Shroff (1991). "A Parallel Algorithm for the Eigenvalues and Eigenvectors of a General Complex Matrix," Numer. Math. 58, 779-806.
R.A. Van De Geijn (1993). "Deferred Shifting Schemes for Parallel QR Methods," SIAM J. Matrix Anal. Appl. 14, 180-194.
A.A. Dubrulle and G.H. Golub (1994). "A Multishift QR Iteration Without Computation of the Shifts," Numerical Algorithms 7, 173-181.
satisfies
$$\| T^{(k)} \|_\infty \le c\,{\bf u}\,\| A \|_\infty \qquad (7.6.2)$$

• Set $x = U_0 z$.

This last point is perhaps the most interesting aspect of inverse iteration and requires some justification since $\lambda$ can be comparatively inaccurate if it is ill-conditioned. Assume for simplicity that $\lambda$ is real and let
$$H - \lambda I = \sum_{i=1}^{n} \sigma_i u_i v_i^T = U \Sigma V^T$$
be the SVD of $H - \lambda I$. From what we said about the roundoff properties of the QR algorithm in §7.5.6, there exists a matrix $E \in \mathbb{R}^{n \times n}$ such that $H + E - \lambda I$ is singular and $\| E \|_2 \approx {\bf u}\| H \|_2$. It follows that $\sigma_n \approx {\bf u}\sigma_1$ and
$$\| (H - \lambda I)v_n \|_2 \approx {\bf u}\sigma_1,$$
i.e., $v_n$ is a good approximate eigenvector. Clearly if the starting vector $q^{(0)}$ has the expansion
$$q^{(0)} = \sum_{i=1}^{n} \gamma_i u_i,$$
then
$$z^{(1)} = \sum_{i=1}^{n} \frac{\gamma_i}{\sigma_i} v_i$$
is "rich" in the direction $v_n$. Note that if $s(\lambda) \approx |u_n^T v_n|$ is small, then $z^{(1)}$ is rather deficient in the direction $u_n$.
This explains (heuristically) why another step of inverse iteration is not likely to produce an improved approximate eigenvector, especially if $\lambda$ is ill-conditioned. For more details, see Peters and Wilkinson (1979).
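A minimal NumPy sketch of the process (ours). In practice $H$ would be Hessenberg and the solve done with a factorization of $H - \lambda I$; one step normally suffices:

import numpy as np

def inverse_iteration(A, lam, steps=1):
    n = A.shape[0]
    q = np.random.default_rng(0).standard_normal(n)
    q = q / np.linalg.norm(q)
    for _ in range(steps):
        z = np.linalg.solve(A - lam * np.eye(n), q)   # nearly singular, by design
        q = z / np.linalg.norm(z)
    return q   # approximate eigenvector for the computed eigenvalue lam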
$$A = \begin{bmatrix} 1 & 1 \\ 10^{-10} & 1 \end{bmatrix}$$
has eigenvalues $\lambda_1 = .99999$ and $\lambda_2 = 1.00001$ and corresponding eigenvectors $x_1 = [1, -10^{-5}]^T$ and $x_2 = [1, 10^{-5}]^T$. The condition of both eigenvalues is of order $10^5$. The approximate eigenvalue $\mu = 1$ is an exact eigenvalue of $A + E$ where
$$E = \begin{bmatrix} 0 & 0 \\ -10^{-10} & 0 \end{bmatrix}.$$
$$Q^T A Q = T = \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{bmatrix} \begin{matrix} p \\ q \end{matrix}$$
and $\lambda(T_{11}) \cap \lambda(T_{22}) = \emptyset$, then the first $p$ columns of $Q$ span the unique invariant subspace associated with $\lambda(T_{11})$. (See §7.1.4.) Unfortunately, the Francis iteration supplies us with a real Schur decomposition $Q_F^T A Q_F = T_F$ in which the eigenvalues appear somewhat randomly along the diagonal of $T_F$. This poses a problem if we want an orthonormal basis for an invariant subspace whose associated eigenvalues are not at the top of $T_F$'s diagonal. Clearly, we need a method for computing an orthogonal matrix $Q_D$ such that $Q_D^T T_F Q_D$ is upper quasi-triangular with appropriate eigenvalue ordering.
A look at the 2-by-2 case suggests how this can be accomplished. Suppose
$$Q_F^T A Q_F = T_F = \begin{bmatrix} \lambda_1 & t_{12} \\ 0 & \lambda_2 \end{bmatrix}, \qquad \lambda_1 \neq \lambda_2,$$
and that we wish to reverse the order of the eigenvalues. Note that $T_Fx = \lambda_2 x$ where
$$x = \begin{bmatrix} t_{12} \\ \lambda_2 - \lambda_1 \end{bmatrix}.$$
A Givens rotation $G$ with $Gx$ a multiple of $e_1$ accomplishes the swap, for then $(GT_FG^T)e_1 = \lambda_2 e_1$. Within a sweep over the diagonal, the updates have the form
$$T(1{:}k{+}1,\ k{:}k{+}1) = T(1{:}k{+}1,\ k{:}k{+}1)\begin{bmatrix} c & s \\ -s & c \end{bmatrix}$$
$$Q(1{:}n,\ k{:}k{+}1) = Q(1{:}n,\ k{:}k{+}1)\begin{bmatrix} c & s \\ -s & c \end{bmatrix}.$$
This algorithm requires k(12n) flops, where k is the total number of required
swaps. The integer k is never greater than (n- p)p.
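For the 2-by-2 case discussed above, the swap can be written in a few lines of NumPy (ours); it assumes $\lambda_1 \neq \lambda_2$:

import numpy as np

def swap_2x2(T):
    # T = [[l1, t12], [0, l2]]; returns G T G^T = [[l2, *], [0, l1]].
    l1, t12, l2 = T[0, 0], T[0, 1], T[1, 1]
    x = np.array([t12, l2 - l1])          # eigenvector: T x = l2 x
    c, s = x / np.hypot(x[0], x[1])       # Givens: G x is a multiple of e_1
    G = np.array([[c, s], [-s, c]])
    return G @ T @ G.T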
The swapping gets a little more complicated when $T$ has 2-by-2 blocks along its diagonal. See Ruhe (1970) and Stewart (1976) for details. Of course, these interchanging techniques can be used to sort the eigenvalues, say from maximum to minimum modulus.
Computing invariant subspaces by manipulating the real Schur decomposition is extremely stable. If $\hat{Q} = [\hat{q}_1, \ldots, \hat{q}_n]$ denotes the computed orthogonal matrix $Q$, then $\| \hat{Q}^T\hat{Q} - I \|_2 \approx {\bf u}$ and there exists a matrix $E$ satisfying $\| E \|_2 \approx {\bf u}\| A \|_2$ such that $(A + E)\hat{q}_i \in \mathrm{span}\{\hat{q}_1, \ldots, \hat{q}_p\}$ for $i = 1{:}p$.
$$T = \begin{bmatrix} T_{11} & T_{12} & \cdots & T_{1q} \\ 0 & T_{22} & \cdots & T_{2q} \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & T_{qq} \end{bmatrix} \begin{matrix} n_1 \\ n_2 \\ \vdots \\ n_q \end{matrix} \qquad (7.6.3)$$
In other words, $Y_{ij}$ looks just like the identity except that $Z_{ij}$ occupies the $(i, j)$ block position. It follows that if $Y_{ij}^{-1}TY_{ij} = \tilde{T} = (\tilde{T}_{ij})$, then $\tilde{T}$ and $T$ are identical except in the $i$th block row and $j$th block column; in particular,
$$\tilde{T}_{ij} = T_{ii}Z_{ij} - Z_{ij}T_{jj} + T_{ij}.$$
$$F z_k - \sum_{i=1}^{k} g_{ik} z_i = c_k.$$
Thus, once we know $z_1, \ldots, z_{k-1}$ then we can solve the quasi-triangular system
$$(F - g_{kk}I) z_k = c_k + \sum_{i=1}^{k-1} g_{ik} z_i. \qquad (7.6.5)$$
for $k = 1{:}r$
$\quad C(1{:}p, k) = C(1{:}p, k) + C(1{:}p, 1{:}k{-}1)\,G(1{:}k{-}1, k)$
$\quad$ Solve $(F - G(k,k)I)z = C(1{:}p, k)$ for $z$.
$\quad C(1{:}p, k) = z$
end
for $j = 2{:}q$
$\quad$ for $i = 1{:}j{-}1$
$\qquad$ Solve $T_{ii}Z - ZT_{jj} = -T_{ij}$ for $Z$ using Algorithm 7.6.2.
$\qquad$ for $k = j{+}1{:}q$
$\qquad\quad T_{ik} = T_{ik} - ZT_{jk}$
$\qquad$ end
$\qquad$ for $k = 1{:}q$
$\qquad\quad Q_{kj} = Q_{ki}Z + Q_{kj}$
$\qquad$ end
$\quad$ end
end
The computed solution $\tilde{Z}$ of the Sylvester equation satisfies
$$\frac{\| \tilde{Z} - Z \|_F}{\| Z \|_F} \lesssim {\bf u}\,\frac{\| T \|_F}{\mathrm{sep}(T_{ii}, T_{jj})}.$$
For details, see Golub, Nash, and Van Loan (1979). Since $\mathrm{sep}(T_{ii}, T_{jj})$ is small whenever the spectra of $T_{ii}$ and $T_{jj}$ are close, there can be a substantial loss of accuracy whenever the subsets $\lambda(T_{ii})$ are insufficiently separated. Moreover, if $Z$ satisfies (7.6.6), then the transformation
$$Y_{ij} = \begin{bmatrix} I & Z \\ 0 & I \end{bmatrix}$$
can be very ill-conditioned. Note: $\kappa_F(Y_{ij}) = 2n + \| Z \|_F^2$.
Confronted with these difficulties, Bavely and Stewart (1979) develop an algorithm for block diagonalizing that dynamically determines the eigenvalue ordering and partitioning in (7.6.3) so that all the $Z$ matrices in Algorithm 7.6.3 are bounded in norm by some user-supplied tolerance. They find that the condition of $Y$ can be controlled by controlling the condition of the $Y_{ij}$.
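The small Sylvester systems $T_{ii}Z - ZT_{jj} = -T_{ij}$ can be prototyped via the Kronecker product; this sketch (ours) is meant only to make the linear-algebraic content explicit, since Algorithm 7.6.2 exploits the quasi-triangular structure far more cheaply:

import numpy as np

def solve_sylvester(T11, T22, C):
    # Solve T11 Z - Z T22 = C; solvable iff lambda(T11), lambda(T22) are disjoint.
    p, q = T11.shape[0], T22.shape[0]
    K = np.kron(np.eye(q), T11) - np.kron(T22.T, np.eye(p))
    z = np.linalg.solve(K, C.flatten(order="F"))   # vec(Z), column-major
    return z.reshape((p, q), order="F")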
Suppose
$$Q^T A Q = T = \begin{bmatrix} T_{11} & u & T_{13} \\ 0 & \lambda & v^T \\ 0 & 0 & T_{33} \end{bmatrix} \begin{matrix} k-1 \\ 1 \\ n-k \end{matrix}$$
is upper quasi-triangular and that $\lambda \notin \lambda(T_{11}) \cup \lambda(T_{33})$. It follows that if we solve the linear systems $(T_{11} - \lambda I)w = -u$ and $(T_{33} - \lambda I)^T z = -v$, then
$$x = Q \begin{bmatrix} w \\ 1 \\ 0 \end{bmatrix} \quad \mbox{and} \quad y = Q \begin{bmatrix} 0 \\ 1 \\ z \end{bmatrix}$$
are the associated right and left eigenvectors, respectively. Note that the condition of $\lambda$ is prescribed by $1/s(\lambda) = \sqrt{(1 + w^Tw)(1 + z^Tz)}$.
At this point, we know that the geometric multiplicity of $\lambda$ is 4, i.e., $C$'s Jordan form has 4 blocks ($p_1 - p_0 = 4 - 0 = 4$).
Now suppose $U_2^T L V_2 = \Sigma_2$ is the SVD of $L$ and that we find that $L$ has unit rank. If we again order the singular values from small to large, then $\Sigma_2 = U_2^T L V_2$ clearly has its single nonzero entry in the bottom right corner.
Besides allowing us to introduce more zeroes into the upper triangle, the SVD of $L$ also enables us to deduce the dimension of the null space of $N_2^2$. Since
$$N_2^2 = \begin{bmatrix} 0 & LM \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} L \\ 0 \end{bmatrix} \begin{bmatrix} 0 & M \end{bmatrix}$$
and $\begin{bmatrix} L \\ 0 \end{bmatrix}$ has full column rank,
$$V^T C V = \begin{bmatrix}
\lambda & 0 & 0 & 0 & \times & \times & \times \\
0 & \lambda & 0 & 0 & \times & \times & \times \\
0 & 0 & \lambda & 0 & \times & \times & \times \\
0 & 0 & 0 & \lambda & \times & \times & \times \\
0 & 0 & 0 & 0 & \lambda & \times & a \\
0 & 0 & 0 & 0 & 0 & \lambda & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & \lambda
\end{bmatrix}$$
with the first four rows corresponding to 4 blocks of order 1 or larger, the next two rows to 2 blocks of order 2 or larger, and the last row to 1 block of order 3 or larger.
Problems
P7.6.1 Give a complete algorithm for solving a real, $n$-by-$n$, upper quasi-triangular system $Tx = b$.
P7.6.2 Suppose $U^{-1}AU = \mathrm{diag}(\alpha_1, \ldots, \alpha_m)$ and $V^{-1}BV = \mathrm{diag}(\beta_1, \ldots, \beta_n)$. Show that if $\phi(X) = AX + XB$, then $\lambda(\phi) = \{ \alpha_i + \beta_j : i = 1{:}m,\ j = 1{:}n \}$. What are the corresponding eigenvectors? How can these decompositions be used to solve $AX + XB = C$?
P7.6.3 Show that if $Y = \begin{bmatrix} I & Z \\ 0 & I \end{bmatrix}$ then $\kappa_2(Y) = [\,2 + \sigma^2 + \sqrt{4\sigma^2 + \sigma^4}\,]/2$ where $\sigma = \| Z \|_2$.
P7.6.4 Derive the system (7.6.5).
P7.6.5 Assume that T E Rnxn is block upper triangular and partitioned as follows:
(b) Show that the iteration
$$A_{k+1} = \frac{1}{2}\left( A_k + A_k^{-1} \right), \qquad A_0 = A,$$
the matrix analogue of the scalar recursion $\mu_{k+1} = (\mu_k + 1/\mu_k)/2$, converges to
$$\mathrm{sign}(A) = X \begin{bmatrix} I_p & 0 \\ 0 & -I_{n-p} \end{bmatrix} X^{-1}.$$
(c) Suppose
$$M = \begin{bmatrix} M_{11} & M_{12} \\ 0 & M_{22} \end{bmatrix} \begin{matrix} p \\ n-p \end{matrix}$$
with the property that $\lambda(M_{11})$ is in the open right half plane and $\lambda(M_{22})$ is in the open left half plane. Show that
$$\mathrm{sign}(M) = \begin{bmatrix} I_p & Z \\ 0 & -I_{n-p} \end{bmatrix}$$
for some $Z$, and that
$$U = \begin{bmatrix} I_p & -Z/2 \\ 0 & I_{n-p} \end{bmatrix} \;\Longrightarrow\; U^{-1} M U = \begin{bmatrix} M_{11} & 0 \\ 0 & M_{22} \end{bmatrix}.$$
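The Newton iteration in part (b) is easy to experiment with; a sketch of ours, assuming $A$ has no eigenvalues on the imaginary axis:

import numpy as np

def matrix_sign(A, steps=50):
    A = np.asarray(A, dtype=float).copy()
    for _ in range(steps):
        A = 0.5 * (A + np.linalg.inv(A))   # A_{k+1} = (A_k + A_k^{-1})/2
    return A   # sign(A): eigenvalues +/-1, same invariant subspaces as A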
G.H. Golub and J.H. Wilkinson (1976). "Ill-Conditioned Eigensystems and the Computation of the Jordan Canonical Form," SIAM Review 18, 578-619.
Papers concerned with block diagonalization and the computation of the Jordan canonical form include
C. Bavely and G.W. Stewart (1979). "An Algorithm for Computing Reducing Subspaces by Block Diagonalization," SIAM J. Num. Anal. 16, 359-367.
B. Kågström and A. Ruhe (1980a). "An Algorithm for Numerical Computation of the Jordan Normal Form of a Complex Matrix," ACM Trans. Math. Soft. 6, 398-419.
B. Kågström and A. Ruhe (1980b). "Algorithm 560 JNF: An Algorithm for Numerical Computation of the Jordan Normal Form of a Complex Matrix," ACM Trans. Math. Soft. 6, 437-443.
J.W. Demmel (1983). "A Numerical Analyst's Jordan Canonical Form," Ph.D. Thesis, Berkeley.
Papers that are concerned with estimating the error in a computed eigenvalue and/or eigenvector include
S.P. Chan and B.N. Parlett (1977). "Algorithm 517: A Program for Computing the Condition Numbers of Matrix Eigenvalues Without Computing Eigenvectors," ACM Trans. Math. Soft. 3, 186-203.
H.J. Symm and J.H. Wilkinson (1980). "Realistic Error Bounds for a Simple Eigenvalue and Its Associated Eigenvector," Numer. Math. 35, 113-126.
C. Van Loan (1987). "On Estimating the Condition of Eigenvalues and Eigenvectors," Lin. Alg. and Its Applic. 88/89, 715-732.
Z. Bai, J. Demmel, and A. McKenney (1993). "On Computing Condition Numbers for the Nonsymmetric Eigenproblem," ACM Trans. Math. Soft. 19, 202-223.
As we have seen, the $\mathrm{sep}(\cdot,\cdot)$ function is of great importance in the assessment of a computed invariant subspace. Aspects of this quantity and the associated Sylvester equation are discussed in
J. Varah (1979). "On the Separation of Two Matrices," SIAM J. Num. Anal. 16, 216-222.
R. Byers (1984). "A Linpack-Style Condition Estimator for the Equation AX - XB^T = C," IEEE Trans. Auto. Cont. AC-29, 926-928.
K. Datta (1988). "The Matrix Equation XA - BX = R and Its Applications," Lin. Alg. and Its Applic. 109, 91-105.
N.J. Higham (1993). "Perturbation Theory and Backward Error for AX - XB = C," BIT 33, 124-136.
J. Gardiner, M.R. Wette, A.J. Laub, J.J. Amato, and C.B. Moler (1992). "Algorithm 705: A FORTRAN-77 Software Package for Solving the Sylvester Matrix Equation AXB^T + CXD^T = E," ACM Trans. Math. Soft. 18, 232-238.
Numerous algorithms have been proposed for the Sylvester equation, but those described in
R.H. Bartels and G.W. Stewart (1972). "Solution of the Equation AX + XB = C," Comm. ACM 15, 820-826.
G.H. Golub, S. Nash, and C. Van Loan (1979). "A Hessenberg-Schur Method for the Matrix Problem AX + XB = C," IEEE Trans. Auto. Cont. AC-24, 909-913.
are among the more reliable in that they rely on orthogonal transformations. A constrained Sylvester equation problem is considered in
J.B. Barlow, M.M. Monahemi, and D.P. O'Leary (1992). "Constrained Matrix Sylvester Equations," SIAM J. Matrix Anal. Appl. 13, 1-9.
The Lyapunov problem $FX + XF^T = -C$ where $C$ is non-negative definite has a very important role to play in control theory. See
S. Barnett and C. Storey (1968). "Some Applications of the Lyapunov Matrix Equation," J. Inst. Math. Applic. 4, 33-42.
G. Hewer and C. Kenney (1988). "The Sensitivity of the Stable Lyapunov Equation," SIAM J. Control Optim. 26, 321-344.
A.R. Ghavimi and A.J. Laub (1995). "Residual Bounds for Discrete-Time Lyapunov Equations," IEEE Trans. Auto. Cont. 40, 1244-1249.
Several authors have considered generalizations of the Sylvester equation, i.e., $\sum F_i X G_i = C$. These include
P. Lancaster (1970). "Explicit Solution of Linear Matrix Equations," SIAM Review 12, 544-566.
H. Wimmer and A.D. Ziebur (1972). "Solving the Matrix Equations $\sum f_p(A)\,X\,g_p(B) = C$," SIAM Review 14, 318-323.
W.J. Vetter (1975). "Vector Structures and Solutions of Linear Matrix Equations," Lin. Alg. and Its Applic. 10, 181-188.
Some ideas about improving computed eigenvalues, eigenvectors, and invariant subspaces may be found in
J.J. Dongarra, C.B. Moler, and J.H. Wilkinson (1983). "Improving the Accuracy of Computed Eigenvalues and Eigenvectors," SIAM J. Numer. Anal. 20, 23-46.
J.W. Demmel (1987). "Three Methods for Refining Estimates of Invariant Subspaces," Computing 38, 43-57.
Hessenberg/QR iteration techniques are fast, but not very amenable to parallel computation. Because of this there is a hunger for radically new approaches to the eigenproblem. Here are some papers that focus on the matrix sign function and related ideas that have high performance potential:
C.S. Kenney and A.J. Laub (1991). "Rational Iterative Methods for the Matrix Sign Function," SIAM J. Matrix Anal. Appl. 12, 273-291.
C.S. Kenney, A.J. Laub, and P.M. Papadopoulos (1992). "Matrix Sign Algorithms for Riccati Equations," IMA J. of Math. Control Inform. 9, 331-344.
C.S. Kenney and A.J. Laub (1992). "On Scaling Newton's Method for Polar Decomposition and the Matrix Sign Function," SIAM J. Matrix Anal. Appl. 13, 688-706.
N.J. Higham (1994). "The Matrix Sign Decomposition and Its Relation to the Polar Decomposition," Lin. Alg. and Its Applic. 212/213, 3-20.
L. Adams and P. Arbenz (1994). "Towards a Divide and Conquer Algorithm for the Real Nonsymmetric Eigenvalue Problem," SIAM J. Matrix Anal. Appl. 15, 1333-1353.
If $\lambda \in \lambda(A, B)$ and $0 \neq x \in \mathbb{C}^n$ satisfies $Ax = \lambda Bx$, then $x$ is referred to as an eigenvector of the pencil $A - \lambda B$.

7.7.1 Background
The first thing to observe about the generalized eigenvalue problem is that there are $n$ eigenvalues if and only if $\mathrm{rank}(B) = n$. If $B$ is rank deficient
$$A = \begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix},\ B = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \;\Rightarrow\; \lambda(A, B) = \{1\}$$
$$A = \begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix},\ B = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \;\Rightarrow\; \lambda(A, B) = \emptyset$$
$$A = \begin{bmatrix} 1 & 2 \\ 0 & 0 \end{bmatrix},\ B = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \;\Rightarrow\; \lambda(A, B) = \mathbb{C}$$
Note that C will be affected by roundoff errors of order ull A ll2ll B- 1 ll2·
If B is ill-conditioned, then this can rule out the possibility of computing
any generalized eigenvalue accurately---€ven those eigenvalues that may be
regarded as well-conditioned.
Example 7.7.1 Consider a 2-by-2 pair (A, B) with ill-conditioned B for which lambda(A, B) = {2, 1.07 x 10^6}. With 7-digit floating point arithmetic, we find lambda(fl(AB^{-1})) = {1.562539, 1.01 x 10^6}. The poor quality of the computed small eigenvalue stems from kappa_2(B) ~ 2 x 10^6. On the other hand, both eigenvalues are obtained to nearly full working accuracy when orthogonal equivalence transformations (the QZ approach developed below) are used.
We say that the pencils A - lambda*B and A_1 - lambda*B_1 are equivalent if (7.7.2) holds with nonsingular Q and Z.
It follows that both Q_k^H A Z_k = R_k S_k and Q_k^H B_k Z_k = S_k are also upper triangular. Using the Bolzano-Weierstrass theorem, we know that the bounded sequence {(Q_k, Z_k)} has a converging subsequence, lim (Q_{k_i}, Z_{k_i}) = (Q, Z). It is easy to show that Q and Z are unitary and that Q^H A Z and Q^H B Z are upper triangular. The assertions about lambda(A, B) follow from the identity

    det(A - lambda*B) = det(QZ^H) prod_{i=1}^{n} (t_ii - lambda*s_ii).   []
Small changes in A and B can induce large changes in the eigenvalue lambda_i = t_ii/s_ii if s_ii is small. However, as Stewart (1978) argues, it may not be appropriate to regard such an eigenvalue as "ill-conditioned." The reason is that the reciprocal mu_i = s_ii/t_ii might be a very well behaved eigenvalue for the pencil mu*A - B. In the Stewart analysis, A and B are treated symmetrically and the eigenvalues are regarded more as ordered pairs (t_ii, s_ii) than as quotients. With this point of view it becomes appropriate to measure eigenvalue perturbations in the chordal metric chord(a, b) defined by

    chord(a, b) = |a - b| / ( sqrt(1 + a^2) * sqrt(1 + b^2) ).
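The chordal metric is a one-line function; this small sketch shows why it is the right yardstick here: two huge eigenvalues that differ wildly in absolute terms are nonetheless "close to infinity" and hence close in the chordal sense.

    import math

    def chord(a, b):
        # Chordal distance between a and b (stereographic metric on R plus infinity).
        return abs(a - b) / (math.sqrt(1.0 + a * a) * math.sqrt(1.0 + b * b))

    print(chord(1.0e6, 2.0e6))   # ~5e-7: both points are "near infinity"
    print(chord(1.0, 2.0))       # ~0.316: a genuinely large perturbation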
Stewart shows that if lambda is a distinct eigenvalue of A - lambda*B and lambda~ is the corresponding eigenvalue of the perturbed pencil A~ - lambda*B~ with ||A - A~||_2 ~ ||B - B~||_2 ~ eps, then

    chord(lambda, lambda~) <= eps / sqrt( (y^H A x)^2 + (y^H B x)^2 ) + O(eps^2)

where x and y have unit 2-norm and satisfy Ax = lambda*Bx and y^H A = lambda*y^H B. Note that the denominator in the upper bound is symmetric in A and B. The "truly" ill-conditioned eigenvalues are those for which this denominator is small.
The extreme case when t_kk = s_kk = 0 for some k has been studied by Wilkinson (1979). He makes the interesting observation that when this occurs, the remaining quotients t_ii/s_ii can assume arbitrary values.
To illustrate the reduction, suppose n = 5 and that B has already been made upper triangular. The (5,1) entry of A is zeroed first by a Givens rotation Q_45^T applied from the left; this fills in the (5,4) entry of B, which is then removed by a rotation Z_45 applied from the right:

    A = Q_45^T A = [ x x x x x ]      B = Q_45^T B = [ x x x x x ]
                   [ x x x x x ]                     [ 0 x x x x ]
                   [ x x x x x ]                     [ 0 0 x x x ]
                   [ x x x x x ]                     [ 0 0 0 x x ]
                   [ 0 x x x x ]                     [ 0 0 0 + x ]

    B = B Z_45 =   [ x x x x x ]      A = A Z_45 =   [ x x x x x ]
                   [ 0 x x x x ]                     [ x x x x x ]
                   [ 0 0 x x x ]                     [ x x x x x ]
                   [ 0 0 0 x x ]                     [ x x x x x ]
                   [ 0 0 0 0 x ]                     [ 0 x x x x ]

Note that Z_45 combines only the last two columns of A and so does not disturb the newly created zero in A's (5,1) position. Zeros are similarly introduced into the (4,1) and (3,1) positions of A, each left rotation being followed by a right rotation that restores B's triangularity.
A is now upper Hessenberg through its first column. The reduction is completed by zeroing a_52, a_42, and a_53. As is evident above, two orthogonal transformations are required for each a_ij that is zeroed: one to do the zeroing and the other to restore B's triangularity. Either Givens rotations or 2-by-2 modified Householder transformations can be used. Overall we have:
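The printed algorithm statement is summarized by the following minimal NumPy sketch of the Hessenberg-triangular reduction, assuming Givens rotations throughout. It is a teaching version, not a substitute for the tuned LAPACK routine; the helper name givens is an illustrative choice.

    import numpy as np

    def givens(a, b):
        # c, s with [[c, s], [-s, c]] @ [a, b]^T = [r, 0]^T
        if b == 0.0:
            return 1.0, 0.0
        r = np.hypot(a, b)
        return a / r, b / r

    def hess_triangular(A, B):
        # Reduce the pencil (A, B) to Hessenberg-triangular form.
        A = np.array(A, dtype=float); B = np.array(B, dtype=float)
        n = A.shape[0]
        Q0, R = np.linalg.qr(B)            # make B upper triangular first
        A, B = Q0.T @ A, R
        for j in range(n - 2):
            for i in range(n - 1, j + 1, -1):
                # Left rotation in rows (i-1, i) zeroes A[i, j] ...
                c, s = givens(A[i - 1, j], A[i, j])
                G = np.array([[c, s], [-s, c]])
                A[i - 1:i + 1, :] = G @ A[i - 1:i + 1, :]
                B[i - 1:i + 1, :] = G @ B[i - 1:i + 1, :]
                # ... which fills B[i, i-1]; a right rotation in columns
                # (i-1, i) restores B's triangularity.
                c, s = givens(B[i, i], B[i, i - 1])
                Z = np.array([[c, s], [-s, c]])
                A[:, i - 1:i + 1] = A[:, i - 1:i + 1] @ Z
                B[:, i - 1:i + 1] = B[:, i - 1:i + 1] @ Z
                B[i, i - 1] = 0.0          # clean residual roundoff
        return A, B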
Example 7.7.3 If

    A = [ 10  1  2 ]        B = [ 1 2 3 ]
        [  1  2 -1 ]   and      [ 4 5 6 ]
        [  1  1  2 ]            [ 7 8 9 ]

and orthogonal matrices Q and Z are defined by

    Q = [ -.1231  -.9917   .0378 ]        Z = [ 1.0000   0.0000   0.0000 ]
        [ -.4924   .0279  -.8699 ]            [ 0.0000  -.8944   -.4472 ]
        [ -.8616   .1257   .4917 ]            [ 0.0000   .4472   -.8944 ]

then A_1 = Q^T A Z and B_1 = Q^T B Z are given by

    A_1 = [ -2.5849  1.5413  2.4221 ]        B_1 = [ -8.1240  3.6332  14.2024 ]
          [ -9.7631   .0874  1.9239 ]              [  0.0000  0.0000   1.8739 ]
          [  0.0000  2.7233  -.7612 ]              [  0.0000  0.0000    .7612 ]

A_1 is upper Hessenberg and B_1 is upper triangular; note that B_1 inherits B's rank deficiency through the zero in its (2,2) position.
7.7.5 Deflation

In describing the QZ iteration we may assume without loss of generality that A is an unreduced upper Hessenberg matrix and that B is a nonsingular upper triangular matrix. The first of these assertions is obvious, for if a_{k+1,k} = 0 then

    A - lambda*B = [ A_11 - lambda*B_11   A_12 - lambda*B_12 ]   k
                   [         0            A_22 - lambda*B_22 ]   n-k
                            k                   n-k

and we may proceed to solve the two smaller generalized eigenproblems A_11 - lambda*B_11 and A_22 - lambda*B_22. On the other hand, if b_kk = 0 for some k, then it is possible to introduce a zero into A's (n, n-1) position and thereby deflate. To illustrate, suppose n = 5 and k = 3.
The zero on B's diagonal can be "pushed down" to the (5,5) position as follows using Givens rotations. A rotation of rows 3 and 4 (applied to both A and B) zeroes b_44 against b_34; the fill-in this creates in A's (4,2) position is removed by a rotation of columns 2 and 3, which leaves B triangular. A rotation of rows 4 and 5 then zeroes b_55 against b_45, and the resulting fill-in in A's (5,3) position is removed by a rotation of columns 3 and 4 (this last rotation also fills B's (3,3) entry, so the diagonal zero has indeed moved down). At this point B is upper triangular with its last row entirely zero:

    A = [ x x x x x ]      B = [ x x x x x ]
        [ x x x x x ]          [ 0 x x x x ]
        [ 0 x x x x ]          [ 0 0 x x x ]
        [ 0 0 x x x ]          [ 0 0 0 0 x ]
        [ 0 0 0 x x ]          [ 0 0 0 0 0 ]

Finally, a column rotation Z_45 zeroes a_54 without disturbing B's triangularity (the last row of B is zero, so no fill-in occurs):

    A = A Z_45 = [ x x x x x ]      B = B Z_45 = [ x x x x x ]
                 [ x x x x x ]                   [ 0 x x x x ]
                 [ 0 x x x x ]                   [ 0 0 x x x ]
                 [ 0 0 x x x ]                   [ 0 0 0 x x ]
                 [ 0 0 0 0 x ]                   [ 0 0 0 0 0 ]

The problem then decouples as before.
We now describe a QZ step for an n = 6 Hessenberg-triangular pair. Analogous to the implicit-shift Francis iteration, the step begins with an initial Householder transformation Q_0 (determined by the shift) acting on rows 1:3 of A and B:

    A = Q_0 A = [ x x x x x x ]      B = Q_0 B = [ x x x x x x ]
                [ x x x x x x ]                  [ x x x x x x ]
                [ x x x x x x ]                  [ x x x x x x ]
                [ 0 0 x x x x ]                  [ 0 0 0 x x x ]
                [ 0 0 0 x x x ]                  [ 0 0 0 0 x x ]
                [ 0 0 0 0 x x ]                  [ 0 0 0 0 0 x ]

Transformations Z_1 and Z_2 applied from the right restore B to triangular form:

    B = B Z_1 Z_2 = [ x x x x x x ]      A = A Z_1 Z_2 = [ x x x x x x ]
                    [ 0 x x x x x ]                      [ x x x x x x ]
                    [ 0 0 x x x x ]                      [ 0 x x x x x ]
                    [ 0 0 0 x x x ]                      [ 0 x x x x x ]
                    [ 0 0 0 0 x x ]                      [ 0 0 0 x x x ]
                    [ 0 0 0 0 0 x ]                      [ 0 0 0 0 x x ]

A Householder transformation Q_1 acting on rows 2:4 then restores A to Hessenberg form through its first column, at the price of new fill-in in B:

    B = Q_1 B = [ x x x x x x ]
                [ 0 x x x x x ]
                [ 0 x x x x x ]
                [ 0 x x x x x ]
                [ 0 0 0 0 x x ]
                [ 0 0 0 0 0 x ]

Notice that with this step the unwanted nonzero elements have been shifted down and to the right from their original position. This illustrates a typical step in the QZ iteration. Notice that Q = Q_0 Q_1 ... Q_{n-2} has the same first column as Q_0. By the way the initial Householder matrix was determined, we can apply the Implicit Q theorem and assert that Q^T (AB^{-1}) Q is indeed essentially the same matrix that we would obtain by applying the Francis iteration to M = AB^{-1} directly. Overall we have:
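In practice one rarely codes this chase by hand; LAPACK's QZ machinery is exposed through scipy.linalg.qz. A short sketch of its use (random test matrices, complex output so that both factors are genuinely triangular and the eigenvalues are the diagonal ratios):

    import numpy as np
    from scipy.linalg import qz

    A = np.random.rand(6, 6)
    B = np.random.rand(6, 6)

    # Generalized Schur form: A = Q AA Z^H, B = Q BB Z^H with AA, BB triangular.
    AA, BB, Q, Z = qz(A, B, output='complex')
    evals = np.diag(AA) / np.diag(BB)   # generalized eigenvalues t_ii / s_ii

    assert np.allclose(Q @ AA @ Z.conj().T, A)
    assert np.allclose(Q @ BB @ Z.conj().T, B)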
At each stage of the overall iteration the matrices are partitioned as

    A = [ A_11 A_12 A_13 ]  p           B = [ B_11 B_12 B_13 ]  p
        [  0   A_22 A_23 ]  n-p-q           [  0   B_22 B_23 ]  n-p-q
        [  0    0   A_33 ]  q               [  0    0   B_33 ]  q
           p  n-p-q   q                        p  n-p-q   q

where A_22 is unreduced upper Hessenberg, A_33 is upper quasi-triangular, and q is chosen as large as possible. The iteration proceeds as follows until q = n:

    if q < n
        if B_22 is singular
            Zero a_{n-q,n-q-1}
        else
            Apply Algorithm 7.7.2 to A_22 and B_22:
            A = diag(I_p, Q, I_q)^T A diag(I_p, Z, I_q)
            B = diag(I_p, Q, I_q)^T B diag(I_p, Z, I_q)
        end
    end
The QZ algorithm is backward stable: the computed quasi-triangular T and triangular S satisfy

    Q_0^T (A + E) Z_0 = T        Q_0^T (B + F) Z_0 = S

where Q_0 and Z_0 are exactly orthogonal and ||E||_2 ~ u||A||_2 and ||F||_2 ~ u||B||_2.
Example 7.7.6 If the QZ algorithm is applied to a 5-by-5 pair (A, B) in which B is unit upper triangular with -1 in every entry above its diagonal, then the subdiagonal elements of A converge to zero as the iteration proceeds, revealing the generalized eigenvalues from the bottom up.
Problems
P7.7.1 Suppose

    U^T B V = [ D 0 ]  r          U = [ U_1 U_2 ]        V = [ V_1 V_2 ]
              [ 0 0 ]  n-r              r   n-r                r   n-r
                r n-r

is the SVD of B, where D is r-by-r and r = rank(B). Show that if lambda(A, B) = C, then U_2^T A V_2 is singular.
P7.7.2 Define F: R^n -> R by

    F(x) = (1/2) || Ax - ( (x^T B^T A x) / (x^T B^T B x) ) Bx ||_2^2

where A and B are in R^{n x n}. Show that if grad F(x) = 0, then Ax is a multiple of Bx.
P7.7.3 Suppose A and B are in R^{n x n}. Give an algorithm for computing orthogonal Q and Z such that Q^T A Z is upper Hessenberg and Z^T B Q is upper triangular.
P7.7.4 Suppose

    A = [ A_11 A_12 ]    and    B = [ B_11 B_12 ]
        [  0   A_22 ]               [  0   B_22 ]

with A_11, B_11 in R^{k x k} and A_22, B_22 in R^{j x j}. Under what circumstances do there exist

    X = [ I  X_12 ]    and    Y = [ I  Y_12 ]
        [ 0    I  ]               [ 0    I  ]

so that Y^{-1} A X and Y^{-1} B X are both block diagonal? This is the generalized Sylvester equation problem. Specify an algorithm for the case when A_11, A_22, B_11, and B_22 are upper triangular. See Kagstrom (1994).
P7.7.5 Suppose mu is not in lambda(A, B). Relate the eigenvalues and eigenvectors of A_1 = (A - mu*B)^{-1} A and B_1 = (A - mu*B)^{-1} B to the generalized eigenvalues and eigenvectors of A - lambda*B.
P7.7.6 Suppose A, B, C, D in R^{n x n}. Show how to compute orthogonal matrices Q, Z, U, and V such that Q^T A U is upper Hessenberg and U^T C Z, Q^T B V, and V^T D Z are all upper triangular. Note that this converts the pencil AC - lambda*BD to Hessenberg-triangular form. Your algorithm should not form the products AC or BD explicitly and should not compute any matrix inverse. See Van Loan (1975).
A good general volume that covers many aspects of the A - lambda*B problem is
B. Kagstrom and A. Ruhe (1983). Matrix Pencils, Proc. Pite Havsbad, 1982, Lecture Notes in Mathematics 973, Springer-Verlag, New York and Berlin.
The perturbation theory for the generalized eigenvalue problem is treated in
G.W. Stewart (1972). "On the Sensitivity of the Eigenvalue Problem Ax = lambda*Bx," SIAM J. Num. Anal. 9, 669-86.
G.W. Stewart (1973). "Error and Perturbation Bounds for Subspaces Associated with Certain Eigenvalue Problems," SIAM Review 15, 727-64.
G.W. Stewart (1975). "Gershgorin Theory for the Generalized Eigenvalue Problem Ax = lambda*Bx," Math. Comp. 29, 600-606.
G.W. Stewart (1978). "Perturbation Theory for the Generalized Eigenvalue Problem," in Recent Advances in Numerical Analysis, ed. C. de Boor and G.H. Golub, Academic Press, New York.
A. Pokrzywa (1986). "On Perturbations and the Equivalence Orbit of a Matrix Pencil," Lin. Alg. and Its Applic. 82, 99-121.
The QZ algorithm and its variants are presented in
C.B. Moler and G.W. Stewart (1973). "An Algorithm for Generalized Matrix Eigenvalue Problems," SIAM J. Num. Anal. 10, 241-56.
L. Kaufman (1974). "The LZ Algorithm to Solve the Generalized Eigenvalue Problem," SIAM J. Num. Anal. 11, 997-1024.
R.C. Ward (1975). "The Combination Shift QZ Algorithm," SIAM J. Num. Anal. 12, 835-853.
C.F. Van Loan (1975). "A General Matrix Eigenvalue Algorithm," SIAM J. Num. Anal. 12, 819-834.
L. Kaufman (1977). "Some Thoughts on the QZ Algorithm for Solving the Generalized Eigenvalue Problem," ACM Trans. Math. Soft. 3, 65-75.
R.C. Ward (1981). "Balancing the Generalized Eigenvalue Problem," SIAM J. Sci. and Stat. Comp. 2, 141-152.
P. Van Dooren (1982). "Algorithm 590: DSUBSP and EXCHQZ: Fortran Routines for Computing Deflating Subspaces with Specified Spectrum," ACM Trans. Math. Software 8, 376-382.
D. Watkins and L. Elsner (1994). "Theory of Decomposition and Bulge-Chasing Algorithms for the Generalized Eigenvalue Problem," SIAM J. Matrix Anal. Appl. 15, 943-967.
Just as the Hessenberg decomposition is important in its own right, so is the Hessenberg-
triangular decomposition that serves as a QZ front end. See
W. Enright and S. Serbin (1978). "A Note on the Efficient Solution of Matrix Pencil
Systems," BIT 18, 276-81.
Other solution frameworks are proposed in
V.N. Kublanovskaja and V.N. Fadeeva (1964). "Computational Methods for the Solution of a Generalized Eigenvalue Problem," Amer. Math. Soc. Transl. 2, 271-90.
G. Peters and J.H. Wilkinson (1970a). "Ax = lambda*Bx and the Generalized Eigenproblem," SIAM J. Num. Anal. 7, 479-92.
G. Rodrigue (1973). "A Gradient Method for the Matrix Eigenvalue Problem Ax = lambda*Bx," Numer. Math. 22, 1-16.
H.R. Schwartz (1974). "The Method of Coordinate Relaxation for (A - lambda*B)x = 0," Num. Math. 23, 135-52.
A. Jennings and M.R. Osborne (1977). "Generalized Eigenvalue Problems for Certain Unsymmetric Band Matrices," Lin. Alg. and Its Applic. 29, 139-50.
V.N. Kublanovskaya (1984). "AB Algorithm and Its Modifications for the Spectral Problem of Linear Pencils of Matrices," Numer. Math. 43, 329-342.
C. Oara (1994). "Proper Deflating Subspaces: Properties, Algorithms, and Applications," Numerical Algorithms 7, 355-373.
The general Ax = lambda*Bx problem is central to some important control theory applications. See
P. Van Dooren (1981). "A Generalized Eigenvalue Approach for Solving Riccati Equations," SIAM J. Sci. and Stat. Comp. 2, 121-135.
P. Van Dooren (1981). "The Generalized Eigenstructure Problem in Linear System Theory," IEEE Trans. Auto. Cont. AC-26, 111-128.
W.F. Arnold and A.J. Laub (1984). "Generalized Eigenproblem Algorithms and Software for Algebraic Riccati Equations," Proc. IEEE 72, 1746-1754.
J.W. Demmel and B. Kagstrom (1988). "Accurate Solutions of Ill-Posed Problems in Control Theory," SIAM J. Matrix Anal. Appl., 126-145.
U. Flaschka, W-W. Li, and J-L. Wu (1992). "A KQZ Algorithm for Solving Linear-Response Eigenvalue Equations," Lin. Alg. and Its Applic. 165, 93-123.
Rectangular generalized eigenvalue problems arise in certain applications. See
G.L. Thompson and R.L. Weil (1970). "Reducing the Rank of A - lambda*B," Proc. Amer. Math. Soc. 26, 548-54.
G.L. Thompson and R.L. Weil (1972). "Roots of Matrix Pencils Ay = lambda*By: Existence, Calculations, and Relations to Game Theory," Lin. Alg. and Its Applic. 5, 207-26.
G.W. Stewart (1994). "Perturbation Theory for Rectangular Matrix Pencils," Lin. Alg. and Its Applic. 208/209, 297-301.
The Kronecker structure of the pencil A - lambda*B is analogous to the Jordan structure of A - lambda*I: it provides very useful information about the underlying application.
J.H. Wilkinson (1978). "Linear Differential Equations and Kronecker's Canonical Form," in Recent Advances in Numerical Analysis, ed. C. de Boor and G.H. Golub, Academic Press, New York, pp. 231-65.
Interest in the Kronecker structure has led to a host of new algorithms and analyses.
J.H. Wilkinson (1979). "Kronecker's Canonical Form and the QZ Algorithm," Lin. Alg. and Its Applic. 28, 285-303.
P. Van Dooren (1979). "The Computation of Kronecker's Canonical Form of a Singular Pencil," Lin. Alg. and Its Applic. 27, 103-40.
J.W. Demmel (1983). "The Condition Number of Equivalence Transformations that Block Diagonalize Matrix Pencils," SIAM J. Numer. Anal. 20, 599-610.
J.W. Demmel and B. Kagstrom (1987). "Computing Stable Eigendecompositions of Matrix Pencils," Lin. Alg. and Its Applic. 88/89, 139-186.
B. Kagstrom (1985). "The Generalized Singular Value Decomposition and the General A - lambda*B Problem," BIT 24, 568-583.
B. Kagstrom (1986). "RGSVD: An Algorithm for Computing the Kronecker Structure and Reducing Subspaces of Singular A - lambda*B Pencils," SIAM J. Sci. and Stat. Comp. 7, 185-211.
J. Demmel and B. Kagstrom (1986). "Stably Computing the Kronecker Structure and Reducing Subspaces of Singular Pencils A - lambda*B for Uncertain Data," in Large Scale Eigenvalue Problems.
The Symmetric Eigenvalue Problem
The sections of this chapter depend upon one another as follows:

             §8.4
              ^
              |
    §8.1 -> §8.2 -> §8.3 -> §8.6 -> §8.7
              |
              v
             §8.5
LAPACK: SVD
    _GESVD    A = U*Sigma*V^T, the singular value decomposition
    _BDSQR    SVD of real bidiagonal matrix
    _GEBRD    bidiagonalization of general matrix
    _ORGBR    generates the orthogonal transformations
    _GBBRD    bidiagonalization of band matrix
Theorem (Minimax Characterization). If A in R^{n x n} is symmetric, then

    lambda_k(A) = max_{dim(S)=k} min_{0 != y in S} (y^T A y)/(y^T y)

for k = 1:n.
Proof. Let Q^T A Q = diag(lambda_i) be the Schur decomposition with lambda_k = lambda_k(A) and Q = [q_1, q_2, ..., q_n]. Define

    S_k = span{q_1, ..., q_k},

the invariant subspace associated with lambda_1, ..., lambda_k. It is easy to show that

    max_{dim(S)=k} min_{0 != y in S} (y^T A y)/(y^T y) >= lambda_k(A)
    lambda(A) is contained in the union over i = 1:n of [d_i - r_i, d_i + r_i]

where A = D + F with D = diag(d_1, ..., d_n) and F = A - D, and r_i = sum_{j != i} |f_ij| for i = 1:n. See Theorem 7.2.1.
Example 8.1.2 A symmetric matrix with diagonal entries 2, 5, and -1 and off-diagonal absolute row sums .3, .5, and .4 has Gerschgorin intervals [1.7, 2.3], [4.5, 5.5], and [-1.4, -.6]; its eigenvalues are 1.9984, 5.0224, and -1.0208.
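Gershgorin intervals for a symmetric matrix take three lines of NumPy. The matrix below is an assumed reconstruction consistent with the quoted intervals (its off-diagonal signs are not recoverable, but the intervals depend only on absolute values):

    import numpy as np

    def gershgorin_intervals(A):
        # lambda(A) lies in the union of [d_i - r_i, d_i + r_i].
        d = np.diag(A)
        r = np.sum(np.abs(A), axis=1) - np.abs(d)   # off-diagonal row sums
        return [(di - ri, di + ri) for di, ri in zip(d, r)]

    A = np.array([[2.0, .2, .1],
                  [ .2, 5.0, .3],
                  [ .1, .3, -1.0]])
    print(gershgorin_intervals(A))      # [1.7, 2.3], [4.5, 5.5], [-1.4, -.6]
    print(np.linalg.eigvalsh(A))        # each eigenvalue lies inside the union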
Example 8.1.3 If

    A = [ 6.8  2.4 ]    and    E = [ .002  .003 ]
        [ 2.4  8.2 ]               [ .003  .001 ],

then lambda(A) = {5, 10} and lambda(A + E) = {4.9988, 10.004}, confirming that

    1.95 x 10^{-5} = |4.9988 - 5|^2 + |10.004 - 10|^2 <= ||E||_F^2 = 2.3 x 10^{-5}.
Several more useful perturbation results follow from the minimax property.

Theorem 8.1.7 (Interlacing Property) If A in R^{n x n} is symmetric and A_r = A(1:r, 1:r), then

    lambda_{r+1}(A_{r+1}) <= lambda_r(A_r) <= lambda_r(A_{r+1}) <= ... <= lambda_2(A_{r+1}) <= lambda_1(A_r) <= lambda_1(A_{r+1})

for r = 1:n-1.
Example 8.1.5 If

    A = [ 1  1  1  1 ]
        [ 1  2  3  4 ]
        [ 1  3  6 10 ]
        [ 1  4 10 20 ]

then lambda(A_1) = {1}, lambda(A_2) = {.3820, 2.6180}, lambda(A_3) = {.1270, 1.0000, 7.873}, and lambda(A_4) = {.0380, .4538, 2.2034, 26.3047}.
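The interlacing property is easy to observe numerically; this sketch prints the spectra of the leading principal submatrices of the example matrix so that consecutive rows can be checked against Theorem 8.1.7:

    import numpy as np

    A = np.array([[1, 1, 1, 1],
                  [1, 2, 3, 4],
                  [1, 3, 6, 10],
                  [1, 4, 10, 20]], dtype=float)

    # Spectra of A_r = A[:r, :r]; each row interlaces the next.
    for r in range(1, 5):
        print(r, np.linalg.eigvalsh(A[:r, :r]))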
Theorem. If A in R^{n x n} is symmetric and Q = [Q_1 Q_2] is orthogonal with ran(Q_1) invariant for A, then

    Q^T A Q = D = [ D_1  0  ]
                  [  0  D_2 ]                        (8.1.1)

and lambda(A) = lambda(D_1) union lambda(D_2).
Proof. If

    Q^T A Q = [ D_1   E_21^T ]
              [ E_21   D_2   ],

then from AQ = QD we have AQ_1 - Q_1 D_1 = Q_2 E_21. Since ran(Q_1) is invariant, the columns of Q_2 E_21 are also in ran(Q_1) and therefore perpendicular to the columns of Q_2. Thus,

    0 = Q_2^T (AQ_1 - Q_1 D_1) = Q_2^T Q_2 E_21 = E_21,

and so (8.1.1) holds. It is easy to show

    det(A - lambda*I_n) = det(Q^T A Q - lambda*I_n) = det(D_1 - lambda*I_r) det(D_2 - lambda*I_{n-r}),

confirming that lambda(A) = lambda(D_1) union lambda(D_2).  []
In the case r = 1, suppose d measures the separation of the approximate eigenvalue from the rest of the spectrum and that

    ||E||_2 <= d/4.

Then there exists p in R^{n-1}, with ||p||_2 bounded in terms of ||E||_2/d, such that q~_1 = (q_1 + Q_2 p)/sqrt(1 + p^T p) is a unit 2-norm eigenvector for A + E.
Proof. Apply Theorem 8.1.10 and Corollary 8.1.11 with r = 1 and observe that if D_1 = (lambda), then d = sep(D_1, D_2).  []
is orthogonal. Let q~_i = Qe_i, i = 1, 2, 3. Thus q~_i is the perturbation of A's eigenvector q_i = e_i. A calculation shows that, because they are associated with nearby eigenvalues, the eigenvectors q_1 and q_2 cannot be computed accurately. On the other hand, since lambda_1 and lambda_2 are well separated from lambda_3, they define a two-dimensional subspace that is not particularly sensitive, as

    dist( span{q_1, q_2}, span{q~_1, q~_2} ) = .01.
Theorem 8.1.13 Suppose A in R^{n x n} is symmetric, that Q_1 in R^{n x r} has orthonormal columns, and that S in R^{r x r} is symmetric with AQ_1 - Q_1 S = E_1. Then there exist eigenvalues mu_1, ..., mu_r in lambda(A) such that

    |mu_k - lambda_k(S)| <= sqrt(2) ||E_1||_2

for k = 1:r.
Proof. With Q = [Q_1 Q_2] orthogonal, Q^T A Q can be regarded as a perturbation B + E of the block diagonal matrix B = diag(Q_1^T A Q_1, Q_2^T A Q_2). The theorem follows by noting that for any x in R^r and y in R^{n-r} we have

    || E [x; y] ||_2 <= || E_1 x ||_2 + || E_1^T Q_2 y ||_2 <= || E_1 ||_2 ||x||_2 + || E_1 ||_2 ||y||_2,

so that ||E||_2 <= sqrt(2) ||E_1||_2.
Example. If

    A = [ 6.8  2.4 ]        Q_1 = [  .8 ]        S = (5.1) in R^{1 x 1},
        [ 2.4  8.2 ],              [ -.6 ],

then

    A Q_1 - Q_1 S = [ -.08 ] = E_1.
                    [  .06 ]

The theorem predicts that A has an eigenvalue within sqrt(2) ||E_1||_2 ~ .1415 of 5.1. This is true since lambda(A) = {5, 10}.
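The residual bound of Theorem 8.1.13 can be checked directly with the numbers of this example:

    import numpy as np

    A = np.array([[6.8, 2.4], [2.4, 8.2]])
    q1 = np.array([0.8, -0.6])            # trial eigenvector (unit 2-norm)
    s = 5.1                               # trial eigenvalue

    E1 = A @ q1 - s * q1                  # residual [-.08, .06]
    bound = np.sqrt(2) * np.linalg.norm(E1)       # ~ .1415
    lam = np.linalg.eigvalsh(A)                   # {5, 10}
    print(bound, np.min(np.abs(lam - s)))         # 0.1 <= 0.1415: bound holds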
is the Schur decomposition of Q_1^T A Q_1 and Q_1 Z = [y_1, ..., y_r], then there exist mu_k in lambda(A) with |mu_k - theta_k| suitably bounded for k = 1:r.
Proof. Omitted.
In Theorem 8.1.15, the theta_k are called Ritz values, the y_k are called Ritz vectors, and the (theta_k, y_k) are called Ritz pairs.
The usefulness of Theorem 8.1.13 is enhanced if we weaken the assump-
tion that the columns of Q 1 are orthonormal. As can be expected, the
bounds deteriorate with the loss of orthogonality.
Theorem 8.1.16 Suppose A in R^{n x n} is symmetric and that X_1 in R^{n x r} has full column rank with ||X_1^T X_1 - I_r||_2 = tau < 1. Then analogous residual bounds hold for k = 1:r.
Proof. Let X_1 = ZP be the polar decomposition of X_1. Recall from §4.2.10 that this means Z in R^{n x r} has orthonormal columns and P in R^{r x r} is a symmetric positive semidefinite matrix that satisfies P^2 = X_1^T X_1. Taking norms in the defining equation gives

    || X_1 ||_2^2 <= 1 + tau.                      (8.1.6)
It follows that lambda_r(A) and lambda_r(X^T A X) have the same sign, and so we have shown that A and X^T A X have the same number of positive eigenvalues. If we apply this result to -A, we conclude that A and X^T A X have the same number of negative eigenvalues. Obviously, the number of zero eigenvalues possessed by each matrix is also the same.  []
For instance, for one full column rank X in R^{3 x 3} one finds lambda(X^T A X) = {134.769, .3555, -.1252}, which has the same inertia as lambda(A), in accordance with the law of inertia.
Problems
P8.1.1 Without using any of the results in this section, show that the eigenvalues of a 2-by-2 symmetric matrix must be real.
P8.1.6 Suppose A, E in R^{n x n} are symmetric and consider the Schur decomposition A + tE = QDQ^T where we assume that Q = Q(t) and D = D(t) are continuously differentiable functions of t in R. Show that D'(t) = diag(Q(t)^T E Q(t)) where the matrix on the right is the diagonal part of Q(t)^T E Q(t). Establish the Wielandt-Hoffman theorem by integrating both sides of this equation from 0 to 1 and taking Frobenius norms to show that

    sum_{i=1}^{n} (lambda_i(A + E) - lambda_i(A))^2 <= ||E||_F^2.

P8.1.9 Characterize min ||BX - XC||_F, where the min is taken over all matrices X in R^{m x n}.
P8.1.10 Prove the inequality (8.1.3).
P8.1.11 Suppose A in R^{n x n} is symmetric and C in R^{n x r} has full column rank, and assume that r << n. By using Theorem 8.1.8 relate the eigenvalues of A + CC^T to the eigenvalues of A.
The perturbation theory for the symmetric eigenvalue problem is surveyed in Wilkinson (1965, chapter 2), Parlett (1980, chapters 10 and 11), and Stewart and Sun (1990, chapters 4 and 5). Some representative papers in this well-researched area include
G.W. Stewart (1973). "Error and Perturbation Bounds for Subspaces Associated with Certain Eigenvalue Problems," SIAM Review 15, 727-64.
C.C. Paige (1974). "Eigenvalues of Perturbed Hermitian Matrices," Lin. Alg. and Its Applic. 8, 1-10.
A. Ruhe (1975). "On the Closeness of Eigenvalues and Singular Values for Almost Normal Matrices," Lin. Alg. and Its Applic. 11, 87-94.
W. Kahan (1975). "Spectra of Nearly Hermitian Matrices," Proc. Amer. Math. Soc. 48, 11-17.
A. Schonhage (1979). "Arbitrary Perturbations of Hermitian Matrices," Lin. Alg. and Its Applic. 24, 143-49.
P. Deift, T. Nanda, and C. Tomei (1983). "Ordinary Differential Equations and the Symmetric Eigenvalue Problem," SIAM J. Numer. Anal. 20, 1-22.
D.S. Scott (1985). "On the Accuracy of the Gershgorin Circle Theorem for Bounding the Spread of a Real Symmetric Matrix," Lin. Alg. and Its Applic. 65, 147-155.
J.-G. Sun (1995). "A Note on Backward Error Perturbations for the Hermitian Eigenvalue Problem," BIT 35, 385-393.
R.-C. Li (1996). "Relative Perturbation Theory (I): Eigenvalue and Singular Value Variations," Technical Report UCB/CSD-94-855, Department of EECS, University of California at Berkeley.
R.-C. Li (1996). "Relative Perturbation Theory (II): Eigenspace and Singular Subspace Variations," Technical Report UCB/CSD-94-856, Department of EECS, University of California at Berkeley.
    T_0 = U_0^T A U_0
    for k = 1, 2, ...
        T_{k-1} = U_k R_k    (QR factorization)        (8.2.1)
        T_k = R_k U_k
    end

Since T_k = R_k U_k = U_k^T T_{k-1} U_k, it follows by induction that

    T_k = (U_0 U_1 ... U_k)^T A (U_0 U_1 ... U_k).     (8.2.2)
where Q = [q_1, ..., q_n] is orthogonal and |lambda_1| > |lambda_2| >= ... >= |lambda_n|. Let the vectors q^(k) be specified by (8.2.3) and define theta_k in [0, pi/2] by

    cos(theta_k) = |q_1^T q^(k)|.

Proof. From the definition of the iteration, it follows that q^(k) is a multiple of A^k q^(0), and so if

    q^(0) = a_1 q_1 + ... + a_n q_n,    a_1^2 + ... + a_n^2 = 1,    a_1 = cos(theta_0) != 0,

then

    tan(theta_k)^2 = ( sum_{i=2}^{n} a_i^2 lambda_i^{2k} ) / ( a_1^2 lambda_1^{2k} ) <= tan(theta_0)^2 (lambda_2/lambda_1)^{2k}.

This proves (8.2.4). A similar manipulation establishes the bound (8.2.5) on |lambda^(k) - lambda_1|.  []
Example 8.2.1 The eigenvalues of

    A = [ -1.6407   1.0814   1.2014   1.1539 ]
        [  1.0814   4.1573   7.4035  -1.0463 ]
        [  1.2014   7.4035   2.7890  -1.5737 ]
        [  1.1539  -1.0463  -1.5737   8.6944 ]

are given by lambda(A) = {12, 8, -4, -2}. If (8.2.3) is applied to this matrix with q^(0) = [1, 0, 0, 0]^T, then

    k     lambda^(k)
    1      2.3156
    2      8.6802
    3     10.3163
    4     11.0663
    5     11.5259
    6     11.7747
    7     11.8967
    8     11.9534
    9     11.9792
   10     11.9907

Observe the convergence to lambda_1 = 12 with rate |lambda_2/lambda_1|^{2k} = (8/12)^{2k} = (4/9)^k.
Computable error bounds for the power method can be obtained by using Theorem 8.1.13. If

    || Aq^(k) - lambda^(k) q^(k) ||_2 = delta,

then there exists lambda in lambda(A) such that |lambda^(k) - lambda| <= sqrt(2) delta.
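A minimal NumPy sketch of the power method, reporting at each step both the Rayleigh-quotient estimate lambda^(k) and the computable bound sqrt(2)*delta just described; the test matrix is the one from Example 8.2.1.

    import numpy as np

    def power_method(A, q0, steps=10):
        # Power iteration: returns (lambda^(k), sqrt(2)*delta) per step.
        q = q0 / np.linalg.norm(q0)
        history = []
        for _ in range(steps):
            z = A @ q
            q = z / np.linalg.norm(z)
            lam = q @ A @ q                          # Rayleigh quotient
            delta = np.linalg.norm(A @ q - lam * q)  # residual norm
            history.append((lam, np.sqrt(2) * delta))
        return history

    A = np.array([[-1.6407, 1.0814, 1.2014, 1.1539],
                  [ 1.0814, 4.1573, 7.4035, -1.0463],
                  [ 1.2014, 7.4035, 2.7890, -1.5737],
                  [ 1.1539, -1.0463, -1.5737, 8.6944]])
    for k, (lam, bnd) in enumerate(power_method(A, np.array([1.0, 0, 0, 0])), 1):
        print(k, lam, bnd)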
Suppose Aq_i = lambda_i q_i for i = 1:n. If

    x = sum_{i=1}^{n} a_i q_i,    then    (A - lambda*I)^{-1} x = sum_{i=1}^{n} ( a_i / (lambda_i - lambda) ) q_i.

Thus, if lambda ~ lambda_j and a_j is not too small, then this vector has a strong component in the direction of q_j. This process is called inverse iteration and it requires the solution of a linear system with matrix of coefficients A - lambda*I.
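A sketch of inverse iteration (dense solve at each step; a practical code would factor A - shift*I once and reuse the factorization):

    import numpy as np

    def inverse_iteration(A, shift, x0, steps=3):
        # Repeatedly solve (A - shift*I) z = x and normalize.
        n = A.shape[0]
        x = x0 / np.linalg.norm(x0)
        for _ in range(steps):
            z = np.linalg.solve(A - shift * np.eye(n), x)
            x = z / np.linalg.norm(z)
        return x, x @ A @ x   # eigenvector estimate and its Rayleigh quotient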
The scalar

    lambda = r(x) = (x^T A x)/(x^T x)

minimizes ||(A - lambda*I)x||_2. (See also Theorem 8.1.14.) The scalar r(x) is called the Rayleigh quotient of x. Clearly, if x is an approximate eigenvector, then r(x) is a reasonable choice for the corresponding eigenvalue.
Combining this idea with inverse iteration gives rise to the Rayleigh quotient iteration:

    x_0 given, ||x_0||_2 = 1
    for k = 0, 1, ...
        mu_k = r(x_k)                                      (8.2.6)
        Solve (A - mu_k I) z_{k+1} = x_k for z_{k+1}
        x_{k+1} = z_{k+1} / || z_{k+1} ||_2
    end
Example 8.2.2 If (8.2.6) is applied to

    A = [ 1   1   1   1    1    1  ]
        [ 1   2   3   4    5    6  ]
        [ 1   3   6  10   15   21  ]
        [ 1   4  10  20   35   56  ]
        [ 1   5  15  35   70  126  ]
        [ 1   6  21  56  126  252  ]

with x_0 = [1, 1, 1, 1, 1, 1]^T / sqrt(6), then

    k      mu_k
    0    153.8333
    1    120.0571
    2     49.5011
    3     13.8687
    4     15.4959
    5     15.5534

The iteration is converging to the eigenvalue lambda = 15.5534732737.
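A direct transcription of (8.2.6), reproducing this run. The test matrix is built from binomial coefficients, which matches the displayed entries; note that as mu_k converges the solve becomes nearly singular, which is harmless here because only the direction of z is used.

    import numpy as np
    from math import comb

    def rayleigh_quotient_iteration(A, x0, steps=6):
        # RQI: the shift is refreshed from the current Rayleigh quotient.
        x = x0 / np.linalg.norm(x0)
        mus = []
        for _ in range(steps):
            mu = x @ A @ x
            mus.append(mu)
            z = np.linalg.solve(A - mu * np.eye(A.shape[0]), x)
            x = z / np.linalg.norm(z)
        return mus

    A = np.array([[float(comb(i + j, i)) for j in range(6)] for i in range(6)])
    print(rayleigh_quotient_iteration(A, np.ones(6) / np.sqrt(6)))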
Rayleigh quotient iteration is, in fact, cubically convergent: if x_k = c_k q_1 + s_k u_k where q_1 is a unit eigenvector, u_k is a unit vector orthogonal to q_1, and c_k^2 + s_k^2 = 1, then it can be shown that

    |s_{k+1}| = O(|s_k|^3).                            (8.2.7)
A straightforward generalization of the power method that can be used to compute higher-dimensional invariant subspaces is orthogonal iteration. Given Q_0 in R^{n x r} with orthonormal columns:

    for k = 1, 2, ...
        Z_k = A Q_{k-1}                                (8.2.8)
        Q_k R_k = Z_k    (QR factorization)
    end

Note that if r = 1, then this is just the power method. Moreover, the sequence {Q_k e_1} is precisely the sequence of vectors produced by the power iteration with starting vector q^(0) = Q_0 e_1.
In order to analyze the behavior of (8.2.8), assume that

    Q^T A Q = D = diag(lambda_i)                       (8.2.9)

is a Schur decomposition of A with |lambda_1| >= ... >= |lambda_n|, and partition

    Q = [ Q_a  Q_b ],   D = diag(D_1, D_2),   D_1 in R^{r x r},    (8.2.10)
          r    n-r

where Q_a has r columns. If |lambda_r| > |lambda_{r+1}|, then

    D_r(A) = ran(Q_a)

is the dominant invariant subspace of dimension r. It is the unique invariant subspace associated with the eigenvalues lambda_1, ..., lambda_r.
The following theorem shows that with reasonable assumptions, the subspaces ran(Q_k) generated by (8.2.8) converge to D_r(A) at a rate proportional to |lambda_{r+1}/lambda_r|^k.

Theorem 8.2.2 Let the Schur decomposition of A in R^{n x n} be given by (8.2.9) and (8.2.10) with n >= 2. Assume that |lambda_r| > |lambda_{r+1}| and that the n-by-r matrices {Q_k} are defined by (8.2.8). If theta in [0, pi/2] is specified by cos(theta) = sigma_min(Q_a^T Q_0) > 0, then

    dist(D_r(A), ran(Q_k)) <= tan(theta) |lambda_{r+1}/lambda_r|^k.
Example 8.2.3 If (8.2.8) is applied to the matrix of Example 8.2.1 with r = 2 and Q_0 = I_4(:, 1:2), then

    k    dist(D_2(A), ran(Q_k))
    1         0.8806
    2         0.4091
    3         0.1121
    4         0.0313
    5         0.0106
    6         0.0044
    7         0.0020
    8         0.0010
    9         0.0005
   10         0.0002
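This table can be reproduced with a few lines of NumPy. The subspace distance is computed via principal angles; the helper names are illustrative:

    import numpy as np

    def orthogonal_iteration(A, Q0, steps=10):
        # (8.2.8): Z_k = A Q_{k-1}, then re-orthonormalize via thin QR.
        Q = Q0
        for _ in range(steps):
            Q, _ = np.linalg.qr(A @ Q)
        return Q

    def subspace_dist(Q1, Q2):
        # sin of the largest principal angle between equal-dimension subspaces
        s = np.linalg.svd(Q1.T @ Q2, compute_uv=False)
        return np.sqrt(max(0.0, 1.0 - min(s) ** 2))

    A = np.array([[-1.6407, 1.0814, 1.2014, 1.1539],
                  [ 1.0814, 4.1573, 7.4035, -1.0463],
                  [ 1.2014, 7.4035, 2.7890, -1.5737],
                  [ 1.1539, -1.0463, -1.5737, 8.6944]])
    w, V = np.linalg.eigh(A)
    Qstar = V[:, np.argsort(-np.abs(w))[:2]]       # basis for D_2(A)
    for k in range(1, 11):
        Qk = orthogonal_iteration(A, np.eye(4)[:, :2], steps=k)
        print(k, subspace_dist(Qstar, Qk))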
The computation can be organized so that all of the nested subspaces span{q_1^(k), ..., q_i^(k)} are tracked at once. If

    dist( D_i(A), span{q_1^(0), ..., q_i^(0)} ) < 1,    i = 1:n-1,    (8.2.11)

then it can be shown that

    dist( span{q_1^(k), ..., q_i^(k)}, span{q_1, ..., q_i} ) = O( |lambda_{i+1}/lambda_i|^k ),

and the matrices T_k = Q_k^T A Q_k are converging to diagonal form. Thus, it can be said that the method of orthogonal iteration computes a Schur decomposition if r = n and the original iterate Q_0 in R^{n x n} is not deficient in the sense of (8.2.11).
The QR iteration arises by considering how to compute the matrix T_k directly from its predecessor T_{k-1}: from (8.2.1) we have T_k = R_k U_k = U_k^T T_{k-1} U_k, so each new iterate is an orthogonal similarity transformation of its predecessor.
Example 8.2.4 If the QR iteration (8.2.1) is applied to the matrix in Example 8.2.1, then the off-diagonal entries decrease as follows:

    k   |T_k(2,1)|  |T_k(3,1)|  |T_k(4,1)|  |T_k(3,2)|  |T_k(4,2)|  |T_k(4,3)|
    1     3.9254      1.8122      3.3892      4.2492      2.8367      1.1679
    2     2.6491      1.2841      2.1908      1.1587      3.1473      0.2294
    3     2.0147      0.6154      0.5082      0.0997      0.9859      0.0748
    4     1.6930      0.2408      0.0970      0.0723      0.2596      0.0440
    5     1.2928      0.0866      0.0173      0.0665      0.0667      0.0233
    6     0.9222      0.0299      0.0030      0.0405      0.0169      0.0118
    7     0.6346      0.0101      0.0005      0.0219      0.0043      0.0059
    8     0.4292      0.0034      0.0001      0.0113      0.0011      0.0030
    9     0.2880      0.0011      0.0000      0.0057      0.0003      0.0015
   10     0.1926      0.0004      0.0000      0.0029      0.0001      0.0007
Note that a single QR iteration involves O(n^3) flops. Moreover, since convergence is only linear (when it exists), it is clear that the method is a prohibitively expensive way to compute Schur decompositions. Fortunately, these practical difficulties can be overcome as we show in the next section.
Problems
P8.2.1 Suppose A_0 in R^{n x n} is symmetric and positive definite and consider the following iteration:

    for k = 1, 2, ...
        A_{k-1} = G_k G_k^T    (Cholesky)
        A_k = G_k^T G_k
    end

(a) Show that this iteration is defined. (b) Show that if A_0 = [a b; b c] with a >= c has eigenvalues lambda_1 >= lambda_2 > 0, then the A_k converge to diag(lambda_1, lambda_2).
P8.2.2 Prove (8.2.7).
P8.2.3 Suppose A in R^{n x n} is symmetric and define the function f: R^{n+1} -> R^{n+1} by

    f(x; lambda) = [ Ax - lambda*x    ]
                   [ (x^T x - 1)/2    ]

where x in R^n and lambda in R. Suppose x_+ and lambda_+ are produced by applying Newton's method to f at the "current point" defined by x_c and lambda_c. Give expressions for x_+ and lambda_+ assuming that ||x_c||_2 = 1 and lambda_c = x_c^T A x_c.
Notes and References for Sec. 8.2
The following references are concerned with the method of orthogonal iteration (a.k.a. the method of simultaneous iteration):
G.W. Stewart (1969). "Accelerating the Orthogonal Iteration for the Eigenvalues of a Hermitian Matrix," Numer. Math. 13, 362-76.
M. Clint and A. Jennings (1970). "The Evaluation of Eigenvalues and Eigenvectors of Real Symmetric Matrices by Simultaneous Iteration," Comp. J. 13, 76-80.
H. Rutishauser (1970). "Simultaneous Iteration Method for Symmetric Matrices," Numer. Math. 16, 205-23. See also Wilkinson and Reinsch (1971, pp. 284-302).
Suppose that Householder matrices P_1, ..., P_{k-1} have been determined such that if A_{k-1} = (P_1 ... P_{k-1})^T A (P_1 ... P_{k-1}), then

    A_{k-1} = [ B_11  B_12   0   ]   k-1
              [ B_21  B_22  B_23 ]   1
              [  0    B_32  B_33 ]   n-k
                k-1    1     n-k

is tridiagonal through its first k-1 columns. If P~_k is an order n-k Householder matrix such that P~_k B_32 is a multiple of e_1 and P_k = diag(I_k, P~_k), then the leading k-by-k principal submatrix of

    A_k = P_k A_{k-1} P_k = [ B_11      B_12          0        ]   k-1
                            [ B_21      B_22      B_23 P~_k    ]   1
                            [  0     P~_k B_32  P~_k B_33 P~_k ]   n-k

is tridiagonal. Clearly, if U_0 = P_1 ... P_{n-2}, then U_0^T A U_0 = T is tridiagonal.
In the calculation of A_k it is important to exploit symmetry during the formation of the matrix P~_k B_33 P~_k. To be specific, suppose that P~_k has the form

    P~_k = I - beta*v*v^T,    beta = 2/(v^T v),    0 != v in R^{n-k}.

Note that if p = beta*B_33*v and w = p - (beta*p^T v/2)*v, then

    P~_k B_33 P~_k = B_33 - v*w^T - w*v^T.

Since only the upper triangular portion of this matrix needs to be calculated, we see that the transition from A_{k-1} to A_k can be accomplished in only 4(n - k)^2 flops.
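The rank-2 update B_33 - v*w^T - w*v^T is the heart of the reduction. A minimal sketch of the whole tridiagonalization using it (dense working matrix; a real code updates only the upper triangle and stores the Householder vectors):

    import numpy as np

    def house(x):
        # Householder vector v and beta with (I - beta v v^T) x = -sign(x_1)||x|| e_1
        v = np.array(x, dtype=float)
        sigma = np.linalg.norm(x)
        if sigma == 0.0:
            v[0] = 1.0
            return v, 0.0
        v[0] += (1.0 if v[0] >= 0 else -1.0) * sigma
        return v, 2.0 / (v @ v)

    def tridiagonalize(A):
        T = np.array(A, dtype=float)
        n = T.shape[0]
        for k in range(n - 2):
            x = T[k + 1:, k].copy()
            v, beta = house(x)
            # Symmetric rank-2 update of the trailing principal submatrix:
            p = beta * (T[k + 1:, k + 1:] @ v)
            w = p - (beta * (p @ v) / 2.0) * v
            T[k + 1:, k + 1:] -= np.outer(v, w) + np.outer(w, v)
            # Column/row k: apply P to x explicitly.
            newcol = x - beta * (v @ x) * v
            T[k + 1:, k] = newcol
            T[k, k + 1:] = newcol
        return T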
Example 8.3.1

    [ 1  0   0 ]^T [ 1 3 4 ] [ 1  0   0 ]   [ 1    5      0   ]
    [ 0 .6  .8 ]   [ 3 2 8 ] [ 0 .6  .8 ] = [ 5  10.32   1.76 ]
    [ 0 .8 -.6 ]   [ 4 8 3 ] [ 0 .8 -.6 ]   [ 0   1.76  -5.32 ]
Note that if T has a zero subdiagonal, then the eigenproblem splits into a pair of smaller eigenproblems. In particular, if t_{k+1,k} = 0, then lambda(T) = lambda(T(1:k, 1:k)) union lambda(T(k+1:n, k+1:n)).
The unreducedness of T can be characterized through the Krylov matrix K(A, q_1, n) = [q_1, Aq_1, ..., A^{n-1}q_1]:

    Q^T K(A, Q(:,1), n) = [ Q^T q_1, (Q^T A Q)(Q^T q_1), ..., (Q^T A Q)^{n-1}(Q^T q_1) ] = [ e_1, Te_1, ..., T^{n-1}e_1 ] = R

is upper triangular with the property that r_11 = 1 and r_ii = t_21 t_32 ... t_{i,i-1} for i = 2:n. Clearly, if R is nonsingular, then T is unreduced. If R is singular and r_kk is its first zero diagonal entry, then k >= 2 and t_{k,k-1} is the first zero subdiagonal entry.  []

The next result shows that Q is essentially unique once Q(:,1) is specified.
Proof. Define the orthogonal matrix W = Q^T V and observe that W(:,1) = I_n(:,1) = e_1 and W^T T W = S. By Theorem 8.3.1, W^T K(T, e_1, k) is upper triangular with full column rank. But K(T, e_1, k) is upper triangular and so by the essential uniqueness of the thin QR factorization, W(:, 1:k) must equal I_n(:, 1:k) up to column signs. This says that Q(:,i) = +/- V(:,i) for i = 1:k. The comments about the subdiagonal entries follow from this since t_{i+1,i} = Q(:, i+1)^T A Q(:, i) and s_{i+1,i} = V(:, i+1)^T A V(:, i) for i = 1:n-1.  []
The QR factorization QR = T - sI of a shifted tridiagonal matrix can be computed by working down the subdiagonal with Givens rotations:

    for k = 1:n-1
        [c, s] = givens(t_kk, t_{k+1,k})
        m = min{k+2, n}
        T(k:k+1, k:m) = [ c s ; -s c ]^T T(k:k+1, k:m)
    end

This requires O(n) flops. If the rotations are accumulated, then O(n^2) flops are needed.
    T = U_0^T A U_0    (tridiagonal)
    for k = 0, 1, ...
        Determine real shift mu.                       (8.3.2)
        T - mu*I = UR    (QR factorization)
        T = RU + mu*I
    end

If

    T = [ a_1  b_1          0   ]
        [ b_1  a_2   .          ]
        [       .    .  b_{n-1} ]
        [ 0      b_{n-1}   a_n  ]

then one reasonable choice for the shift is mu = a_n. However, a more effective choice is to shift by the eigenvalue of

    T(n-1:n, n-1:n) = [ a_{n-1}  b_{n-1} ]
                      [ b_{n-1}    a_n   ]

that is closer to a_n. This is known as the Wilkinson shift and it is given by

    mu = a_n - sign(d) b_{n-1}^2 / ( |d| + sqrt(d^2 + b_{n-1}^2) ),    d = (a_{n-1} - a_n)/2.    (8.3.3)
    [ x x 0 0 0 0 ]        [ x x 0 0 0 0 ]
    [ x x x + 0 0 ]        [ x x x 0 0 0 ]
    [ 0 x x x 0 0 ]   ->   [ 0 x x x + 0 ]
    [ 0 + x x x 0 ]        [ 0 0 x x x 0 ]
    [ 0 0 0 x x x ]        [ 0 0 + x x x ]
    [ 0 0 0 0 x x ]        [ 0 0 0 0 x x ]

    [ x x 0 0 0 0 ]        [ x x 0 0 0 0 ]
    [ x x x 0 0 0 ]        [ x x x 0 0 0 ]
    [ 0 x x x 0 0 ]   ->   [ 0 x x x 0 0 ]
    [ 0 0 x x x + ]        [ 0 0 x x x 0 ]
    [ 0 0 0 x x x ]        [ 0 0 0 x x x ]
    [ 0 0 0 + x x ]        [ 0 0 0 0 x x ]
Thus, it follows from the Implicit Q theorem that the tridiagonal matrix Z^T T Z produced by this zero-chasing technique is essentially the same as the tridiagonal matrix obtained by the explicit method. (We may assume that all tridiagonal matrices in question are unreduced, for otherwise the problem decouples.)
Note that at any stage of the zero-chasing, there is only one nonzero entry outside the tridiagonal band. Consider the window

    T(k:k+3, k:k+3) = [ a_k  b_k  z_k   0  ]
                      [ b_k  a_p  b_p   0  ]
                      [ z_k  b_p  a_q  b_q ]
                      [  0    0   b_q  a_r ]

with bulge z_k. The update T <- G_k^T T G_k with the rotation

    G_k = [ 1   0  0  0 ]
          [ 0   c  s  0 ]
          [ 0  -s  c  0 ]
          [ 0   0  0  1 ]

moves the bulge down one position, producing

    [ a_k  b_k   0    0  ]
    [ b_k  a_p  b_p  z_p ]
    [  0   b_p  a_q  b_q ]
    [  0   z_p  b_q  a_r ].

The update can be performed in about 26 flops once c and s have been determined from the equation b_k s + z_k c = 0. Overall we obtain:
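A minimal NumPy sketch of one implicit symmetric QR step with Wilkinson shift (dense working matrix; a production code such as the printed algorithm operates on the two defining vectors of T and costs O(n)):

    import numpy as np

    def givens(a, b):
        if b == 0.0:
            return 1.0, 0.0
        r = np.hypot(a, b)
        return a / r, b / r

    def wilkinson_shift(a1, a2, b):
        # Eigenvalue of [[a1, b], [b, a2]] closer to a2, formula (8.3.3).
        d = (a1 - a2) / 2.0
        sgn = 1.0 if d >= 0 else -1.0
        return a2 - sgn * b * b / (abs(d) + np.hypot(d, b))

    def implicit_symmetric_qr_step(T):
        T = np.array(T, dtype=float)
        n = T.shape[0]
        mu = wilkinson_shift(T[n-2, n-2], T[n-1, n-1], T[n-1, n-2])
        x, z = T[0, 0] - mu, T[1, 0]          # implicit first rotation data
        for k in range(n - 1):
            c, s = givens(x, z)
            G = np.array([[c, s], [-s, c]])
            T[k:k+2, :] = G @ T[k:k+2, :]     # similarity transform:
            T[:, k:k+2] = T[:, k:k+2] @ G.T   # T <- G T G^T
            if k < n - 2:
                x, z = T[k+1, k], T[k+2, k]   # the bulge to be chased next
        return T

Applying this step to the matrix of Example 8.3.2 below reproduces the dramatic reduction of the (4,3) entry.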
Example 8.3.2 If one implicit QR step with Wilkinson shift is applied to

    T = [ 1  1  0   0  ]
        [ 1  2  1   0  ]
        [ 0  1  3  .01 ]
        [ 0  0 .01  4  ],

the result is

    T = [ .5000   .5916    0         0        ]
        [ .5916  1.785    .1808      0        ]
        [  0      .1808  3.7140     .0000044  ]
        [  0       0      .0000044  4.002497  ].

Notice how rapidly the (4,3) entry has been reduced. In the overall algorithm the matrix is repartitioned after each step,

    T = [ T_11   0    0   ]   p
        [  0   T_22   0   ]   n-p-q
        [  0    0    T_33 ]   q

where T_33 is diagonal and T_22 is unreduced, and the implicit QR step is applied to T_22.
Example 8.3.3 If Algorithm 8.3.3 is applied to

    A = [ 1 2 0 0 ]
        [ 2 3 4 0 ]
        [ 0 4 5 6 ]
        [ 0 0 6 7 ],

the subdiagonal entries change as follows during the execution of the algorithm:

    Iteration     a_21      a_32      a_43
        1        1.6817    3.2344    .8649
        2        1.6142    2.5755    .0006
        3        1.6245    1.6965    10^{-13}
        4        1.6245    1.6965    converg.
        5        1.5117     .0150
        6        1.1195    10^{-9}
        7         .7071    converg.
        8        converg.
The computed eigenvalues lambda~_i obtained via Algorithm 8.3.3 are the exact eigenvalues of a matrix that is near to A, i.e., Q~_0^T (A + E) Q~_0 = diag(lambda~_i) where Q~_0^T Q~_0 = I and ||E||_2 ~ u||A||_2. Using Corollary 8.1.6 we know that the absolute error in each lambda~_i is small in the sense that |lambda~_i - lambda_i| ~ u||A||_2. If Q~ = [q~_1, ..., q~_n] is the computed matrix of orthonormal eigenvectors, then the accuracy of q~_i depends on the separation of lambda_i from the remainder of the spectrum. See Theorem 8.1.12.
If all of the eigenvalues and a few of the eigenvectors are desired, then
it is cheaper not to accumulate Q in Algorithm 8.3.3. Instead, the desired
eigenvectors can be found via inverse iteration with T. See §8.2.2. Usually
just one step is sufficient to get a good eigenvector, even with a random
initial vector.
If just a few eigenvalues and eigenvectors are required, then the special
techniques in §8.5 are appropriate.
It is interesting to note the connection between Rayleigh quotient iteration and the symmetric QR algorithm. Suppose we apply the latter to the tridiagonal matrix T in R^{n x n} with shift sigma = e_n^T T e_n = t_nn, where e_n = I_n(:, n). If T - sigma*I = QR, then we obtain T = RQ + sigma*I. From the equation (T - sigma*I)Q = R^T it follows that

    (T - sigma*I) Q(:, n) = r_nn e_n,

so Q(:, n) is a multiple of (T - sigma*I)^{-1} e_n: the last column of Q is the result of one step of inverse iteration with Rayleigh-quotient shift applied to e_n. Orthogonal iteration can likewise be enhanced. Consider one step

    Z_k = A Q_{k-1}
    Q_k R_k = Z_k    (QR factorization)

Theorem 8.1.14 says that we can minimize || A Q_k - Q_k S ||_F by setting S = S_k = Q_k^T A Q_k. If U_k^T S_k U_k = D_k is the Schur decomposition of S_k in R^{r x r} and Q~_k = Q_k U_k, then

    || A Q~_k - Q~_k D_k ||_F = || A Q_k - Q_k S_k ||_F,

showing that the columns of Q~_k are the best possible basis to take after k steps from the standpoint of minimizing the residual. This defines the Ritz acceleration idea:
For a representative example with r = 2, the distances dist(D_2(A), ran(Q_k)) produced with Ritz acceleration decrease as follows:

    k    dist(D_2(A), ran(Q_k))
    0        .2 x 10^{-1}
    1        .5 x 10^{-3}
    2        .1 x 10^{-4}
    3        .3 x 10^{-6}
    4        .8 x 10^{-8}
Problems
P8.3.5 Suppose A in C^{n x n} is Hermitian. Show how to construct unitary Q such that Q^H A Q = T is real, symmetric, and tridiagonal.
The first two references contain Algol programs. Algol procedures for the explicit and
implicit tridiagonal QR algorithm are given in
H. Bowdler, R.S. Martin, C. Reinsch, and J.H. Wilkinson (1968). "The QR and QL
Algorithms for Symmetric Matrices," Numer. Math. 11, 293-306. See also Wilkinson
and Reinsch (1971, pp.227-40).
A. Dubrulle, R.S. Martin, and J.H. Wilkinson (1968). "The Implicit QL Algorithm," Numer. Math. 12, 377-83. See also Wilkinson and Reinsch (1971, pp. 241-48).
The "QL" algorithm is identical to the QR algorithm except that at each step the matrix
T- >.I is factored into a product of an orthogonal matrix and a lower triangular matrix.
Other papers concerned with these methods include
G.W. Stewart (1970). "Incorporating Original Shifts into the QR Algorithm for Sym-
metric Tridiagonal Matrices," Comm. ACM 13, 365--67.
A. Dubrulle (1970). "A Short Note on the Implicit QL Algorithm for Symmetric Tridi-
agonal Matrices," Numer. Math. 15, 450.
Extensions to Hermitian and skew-symmetric matrices are described in
D. Mueller (1966). "Householder's Method for Complex Matrices and Hermitian Matri-
ces," Numer. Math. 8, 72-92.
R.C. Ward and L.J. Gray (1978). "Eigensystem Computation for Skew-Symmetric and
A Class of Symmetric Matrices," ACM Trans. Math. Soft. 4, 278-85.
The convergence properties of Algorithm 8.3.3 are detailed in Lawson and Hanson (1974, Appendix B), as well as in
C.P. Huang {1981). "On the Convergence of the QR Algorithm with Origin Shifts for
Normal Matrices," IMA J. Num. Anal. 1, 127-33.
Interesting papers concerned with shifting in the tridiagonal QR algorithm include
F.L. Bauer and C. Reinsch {1968). "Rational QR Transformations with Newton Shift
for Symmetric Tridiagonal Matrices," Numer. Math. 11, 264-72. See also Wilkinson
and Reinsch {1971, pp.257-65).
G.W. Stewart {1970). "Incorporating Origin Shifts into the QR Algorithm for Symmetric
Tridiagonal Matrices," Comm. Assoc. Comp. Mach. 13, 365-67.
Some parallel computation possibilities for the algorithms in this section are discussed in
S. Lo, B. Philippe, and A. Sameh {1987). "A Multiprocessor Algorithm for the Symmet-
ric Tridiagonal Eigenvalue Problem," SIAM J. Sci. and Stat. Comp. 8, s155-s165.
H.Y. Chang and M. Salama (1988). "A Parallel Householder Tridiagonalization Strategy
Using Scattered Square Decomposition," Parallel Computing 6, 297-312.
Another way to compute a specified subset of eigenvalues is via the rational QR algo-
rithm. In this method, the shift is determined using Newton's method. This makes it
possible to "steer" the iteration towards desired eigenvalues. See
C. Reinsch and F.L. Bauer {1968). "Rational QR Transformation with Newton's Shift
for Symmetric Tridiagonal Matrices," Numer. Math. 11, 264-72. See also Wilkinson
and Reinsch {1971, pp.257-65).
Papers concerned with the symmetric QR algorithm for banded matrices include
R.S. Martin and J.H. Wilkinson (1967). "Solution of Symmetric and Unsymmetric Band Equations and the Calculation of Eigenvectors of Band Matrices," Numer. Math. 9, 279-301. See also Wilkinson and Reinsch (1971, pp. 70-92).
R.S. Martin, C. Reinsch, and J.H. Wilkinson (1970). "The QR Algorithm for Band Symmetric Matrices," Numer. Math. 16, 85-92. See also Wilkinson and Reinsch (1971, pp. 266-72).
    off(A) = sqrt( sum_{i=1}^{n} sum_{j=1, j != i}^{n} a_ij^2 ),

i.e., the "norm" of the off-diagonal elements. The tools for doing this are rotations of the form

    J(p, q, theta) = [ 1 ... 0 ... 0 ... 0 ]
                     [ :      c     s    : ]   p
                     [ :     -s     c    : ]   q
                     [ 0 ... 0 ... 0 ... 1 ]
                             p     q

which we call Jacobi rotations. Jacobi rotations are no different from Givens rotations, cf. §5.1.8. We submit to the name change in this section to honor the inventor.
The basic step in a Jacobi eigenvalue procedure involves (1) choosing an index pair (p, q) that satisfies 1 <= p < q <= n, (2) computing a cosine-sine pair (c, s) such that

    [ b_pp  b_pq ]   [  c  s ]^T [ a_pp  a_pq ] [  c  s ]
    [ b_qp  b_qq ] = [ -s  c ]   [ a_qp  a_qq ] [ -s  c ]

is diagonal, and (3) overwriting A with B = J^T A J where J = J(p, q, theta).
In this case b_pq = 0, and since the Frobenius norm is preserved by orthogonal transformations,

    off(B)^2 = || B ||_F^2 - sum_{i=1}^{n} b_ii^2
             = || A ||_F^2 - sum_{i=1}^{n} a_ii^2 + ( a_pp^2 + a_qq^2 - b_pp^2 - b_qq^2 )
             = off(A)^2 - 2 a_pq^2.

It is in this sense that A moves closer to diagonal form with each Jacobi step.
Before we discuss how the index pair (p, q) can be chosen, let us look at the actual computations associated with the (p, q) subproblem.
Setting tau = (a_qq - a_pp)/(2 a_pq), the tangent t = s/c of the required rotation angle satisfies the quadratic

    t^2 + 2*tau*t - 1 = 0.

It turns out to be important to select the smaller of the two roots,

    t = -tau +/- sqrt(1 + tau^2),

whereupon c and s can be resolved from the formulae

    c = 1/sqrt(1 + t^2),    s = t*c.
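These formulae are exactly what the book's sym.schur2 computes; here is a faithful Python sketch. Picking the smaller-magnitude root of the quadratic is done in a cancellation-free way:

    import numpy as np

    def sym_schur2(A, p, q):
        # (c, s) so that rows/cols p, q of J^T A J have zero (p, q) entry.
        if A[p, q] == 0.0:
            return 1.0, 0.0
        tau = (A[q, q] - A[p, p]) / (2.0 * A[p, q])
        if tau >= 0:
            t = 1.0 / (tau + np.hypot(1.0, tau))    # smaller root, positive
        else:
            t = -1.0 / (-tau + np.hypot(1.0, tau))  # smaller root, negative
        c = 1.0 / np.sqrt(1.0 + t * t)
        return c, t * c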
Choosing t to be the smaller of the two roots ensures that |theta| <= pi/4 and has the effect of minimizing the difference between B and A.
In the classical Jacobi iteration the pair (p, q) is chosen so that |a_pq| is the largest off-diagonal entry. In that case off(A)^2 <= N(a_pq^2 + a_qp^2) where N = n(n-1)/2, and so

    off(B)^2 <= ( 1 - 1/N ) off(A)^2.
By induction, if A^(k) denotes the matrix A after k Jacobi updates, then

    off(A^(k))^2 <= ( 1 - 1/N )^k off(A^(0))^2.

Thus the classical Jacobi procedure converges at least linearly; in fact, the asymptotic convergence rate is quadratic.

Example 8.4.1 Applying the classical Jacobi iteration to

    A = [ 1  1  1  1 ]
        [ 1  2  3  4 ]
        [ 1  3  6 10 ]
        [ 1  4 10 20 ]

we find

    sweep    O(off(A))
      0        10^2
      1        10^1
      2        10^{-2}
      3        10^{-11}
      4        10^{-17}
In the cyclic Jacobi method the off-diagonal search is abandoned and the subproblems are processed in row-by-row order:

    (p, q) = (1,2), (1,3), (1,4), (2,3), (2,4), (3,4), (1,2), ...

With sym.schur2(A, p, q) denoting the computation of the (c, s) pair for the (p, q) subproblem, a sweep-based version is:

    V = I_n
    eps = tol ||A||_F
    while off(A) > eps
        for p = 1:n-1
            for q = p+1:n
                (c, s) = sym.schur2(A, p, q)
                A = J(p, q, theta)^T A J(p, q, theta)
                V = V J(p, q, theta)
            end
        end
    end

The cyclic Jacobi method also converges quadratically. (See Wilkinson (1962) and van Kempen (1966).) However, since it does not require off-diagonal search, it is considerably faster than Jacobi's original algorithm.
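A direct, self-contained sketch of a cyclic sweep (it forms the full rotation matrix for clarity, which costs O(n^3) per rotation; a serious code touches only rows/columns p and q):

    import numpy as np

    def cyclic_jacobi(A, tol=1e-12):
        A = np.array(A, dtype=float)
        n = A.shape[0]
        V = np.eye(n)

        def off(M):
            return np.sqrt(np.sum(M**2) - np.sum(np.diag(M)**2))

        def sym_schur2(A, p, q):
            if A[p, q] == 0.0:
                return 1.0, 0.0
            tau = (A[q, q] - A[p, p]) / (2.0 * A[p, q])
            t = (1.0 / (tau + np.hypot(1.0, tau)) if tau >= 0
                 else -1.0 / (-tau + np.hypot(1.0, tau)))
            c = 1.0 / np.sqrt(1.0 + t * t)
            return c, t * c

        eps = tol * np.linalg.norm(A, 'fro')
        while off(A) > eps:
            for p in range(n - 1):
                for q in range(p + 1, n):
                    c, s = sym_schur2(A, p, q)
                    J = np.eye(n)
                    J[p, p] = J[q, q] = c
                    J[p, q], J[q, p] = s, -s
                    A = J.T @ A @ J
                    V = V @ J
        return np.diag(A), V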
Example 8.4.2 If the cyclic Jacobi method is applied to the matrix in Example 8.4.1, we find

    sweep    O(off(A))
      0        10^2
      1        10^1
      2        10^{-1}
      3        10^{-6}
      4        10^{-16}
for some ordering of A's eigenvalues lambda_i. The constant in this quadratic bound depends mildly on n.
Although the cyclic Jacobi method converges quadratically, it is not generally competitive with the symmetric QR algorithm. For example, if we just count flops, then 2 sweeps of Jacobi is roughly equivalent to a complete QR reduction to diagonal form with accumulation of transformations. However, for small n this liability is not very dramatic. Moreover, if an approximate eigenvector matrix V is known, then V^T A V is almost diagonal, a situation that Jacobi can exploit but not QR.
Another interesting feature of the Jacobi method is that it can compute the eigenvalues with small relative error if A is positive definite. To appreciate this point, note that the Wilkinson analysis cited above, coupled with the §8.1 perturbation theory, ensures that the computed eigenvalues lambda~_1 >= ... >= lambda~_n satisfy

    |lambda~_i - lambda_i(A)| / lambda_i(A) ~ u ||A||_2 / lambda_i(A) <= u kappa_2(A).
For the Jacobi method, however, the relative error can be shown to be of order u kappa_2(D^{-1} A D^{-1}) where D = diag(sqrt(a_11), ..., sqrt(a_nn)), and this is generally a much smaller approximating bound. The key to establishing this result is some new perturbation theory and a demonstration that if A_+ is a computed Jacobi update obtained from the current matrix A_c, then the eigenvalues of A_+ are relatively close to the eigenvalues of A_c in the sense of (8.4.4). To make the whole thing work in practice, the termination criterion is not based upon the comparison of off(A) with u||A||_F but rather on the size of each |a_ij| compared to u*sqrt(a_ii a_jj). This work is typical of a new genre of research concerned with high-accuracy algorithms based upon careful, componentwise error analysis. See Mathias (1995).
Note that all the rotations within each of the three rotation sets are "non-conflicting." That is, subproblems (1,2) and (3,4) can be carried out in parallel. Likewise the (1,3) and (2,4) subproblems can be executed in parallel, as can subproblems (1,4) and (2,3). In general, for even n the N = n(n-1)/2 off-diagonal index pairs can be partitioned into n-1 rotation sets, each consisting of n/2 non-conflicting pairs. For n = 8, one such partitioning begins

    rot.set(1) = { (1,2), (3,4), (5,6), (7,8) }
    rot.set(2) = { (1,4), (2,6), (3,8), (5,7) }
    rot.set(3) = { (1,6), (4,8), (2,7), (3,5) }
    rot.set(4) = { (1,8), (6,7), (4,5), (2,3) }

and continues in the same round-robin fashion through rot.set(7).
    V = I_n
    eps = tol ||A||_F
    top = 1:2:n;  bot = 2:2:n
    while off(A) > eps
        for set = 1:n-1
            for k = 1:n/2
                p = min(top(k), bot(k))
                q = max(top(k), bot(k))
                (c, s) = sym.schur2(A, p, q)
                A = J(p, q, theta)^T A J(p, q, theta)
                V = V J(p, q, theta)
            end
            [top, bot] = music(top, bot, n)
        end
    end

Notice that the k-loop steps through n/2 independent, nonconflicting subproblems.
The ordered pairs denote the indices of the housed columns. The first index names the left column and the second index names the right column. Thus, the left and right columns in Proc(3) during step 3 are 2 and 7 respectively. Note that in between steps the columns are shuffled according to the permutation implicit in music and that nearest neighbor communication prevails. At each step, each processor oversees a single subproblem. This involves (a) computing an orthogonal V_small in R^{2 x 2} that solves a local 2-by-2 Schur problem, (b) using the 2-by-2 V_small to update the two housed columns of A and V, (c) sending the 2-by-2 V_small to all the other processors, and (d) receiving the V_small matrices from the other processors and updating the local portions of A and V accordingly. Since A is stored by column, communication is necessary to carry out the V_small updates because they affect rows of A. For example, in the second step of the n = 8 problem, Proc(2) must receive the 2-by-2 rotations associated with subproblems (1,4), (3,8), and (5,7). These come from Proc(1), Proc(3), and Proc(4) respectively. In general, the sharing of the rotation matrices can be conveniently implemented by circulating the 2-by-2 V_small matrices in "merry go round" fashion around the ring. Each processor copies a passing 2-by-2 V_small into its local memory and then appropriately updates the locally housed portions of A and V.
The termination criterion in Algorithm 8.4.4 poses something of a problem in a distributed memory environment in that the values of off(.) and ||A||_F require access to all of A. However, these global quantities can be computed during the V matrix merry-go-round phase. Before the circulation of the V's begins, each processor can compute its contribution to ||A||_F and off(.). These quantities can then be summed by each processor if they are placed on the merry-go-round and read at each stop. By the end of one revolution each processor has its own copy of ||A||_F and off(.).
Here, each A_ij is r-by-r. In block Jacobi the (p, q) subproblem involves computing the 2r-by-2r Schur decomposition

    [ V_pp  V_pq ]^T [ A_pp  A_pq ] [ V_pp  V_pq ]   [ D_pp   0   ]
    [ V_qp  V_qq ]   [ A_qp  A_qq ] [ V_qp  V_qq ] = [  0    D_qq ]

and then applying to A the block Jacobi rotation made up of the V_ij. If we call this block rotation V, then it is easy to show that the update annihilates the (p, q) and (q, p) blocks of A.
Problems
P8.4.1 Let the scalar gamma be given along with the matrix

    A = [ w  x ]
        [ x  z ].

It is desired to compute an orthogonal matrix

    J = [  c  s ]
        [ -s  c ]

such that the (1,1) entry of J^T A J equals gamma. Show that this requirement leads to the equation

    (w - gamma)*tau^2 - 2*x*tau + (z - gamma) = 0,

where tau = c/s. Verify that this quadratic has real roots if gamma satisfies lambda_2 <= gamma <= lambda_1, where lambda_1 and lambda_2 are the eigenvalues of A.
P8.4.2 Let A in R^{n x n} be symmetric. Give an algorithm that computes the factorization

    Q^T A Q = gamma*I + F

where Q is a product of Jacobi rotations, gamma = trace(A)/n, and F has zero diagonal entries. Discuss the uniqueness of Q.
P8.4.3 Formulate Jacobi procedures for (a) skew-symmetric matrices and (b) complex Hermitian matrices.
P8.4.4 Partition the n-by-n real symmetric matrix A as follows:

    A = [ alpha  v^T ]   1
        [   v    A_1 ]   n-1
             1    n-1
Jacobi's original paper is one of the earliest references found in the numerical analysis literature:
C.G.J. Jacobi (1846). "Uber ein Leichtes Verfahren Die in der Theorie der Sacularstorungen Vorkommenden Gleichungen Numerisch Aufzulosen," Crelle's J. 30, 51-94.
Prior to the QR algorithm, the Jacobi technique was the standard method for solving dense symmetric eigenvalue problems. Early attempts to improve upon it include
H. Rutishauser (1966). "The Jacobi Method for Real Symmetric Matrices," Numer. Math. 9, 1-10. See also Wilkinson and Reinsch (1971, pp. 202-11).
N. Mackey (1995). "Hamilton and Jacobi Meet Again: Quaternions and the Eigenvalue Problem," SIAM J. Matrix Anal. Applic. 16, 421-435.
The method is also useful when a nearly diagonal matrix must be diagonalized. See
J.H. Wilkinson (1968). "Almost Diagonal Matrices with Multiple or Close Eigenvalues," Lin. Alg. and Its Applic. 1, 1-12.
Establishing the quadratic convergence of the classical and cyclic Jacobi iterations has attracted much attention:
P. Henrici (1958). "On the Speed of Convergence of Cyclic and Quasicyclic Jacobi Methods for Computing the Eigenvalues of Hermitian Matrices," SIAM J. Appl. Math. 6, 144-62.
E.R. Hansen (1962). "On Quasicyclic Jacobi Methods," ACM J. 9, 118-35.
J.H. Wilkinson (1962). "Note on the Quadratic Convergence of the Cyclic Jacobi Process," Numer. Math. 6, 296-300.
E.R. Hansen (1963). "On Cyclic Jacobi Methods," SIAM J. Appl. Math. 11, 448-59.
A. Schonhage (1964). "On the Quadratic Convergence of the Jacobi Process," Numer. Math. 6, 410-12.
H.P.M. van Kempen (1966). "On Quadratic Convergence of the Special Cyclic Jacobi Method," Numer. Math. 9, 19-22.
P. Henrici and K. Zimmermann (1968). "An Estimate for the Norms of Certain Cyclic Jacobi Operators," Lin. Alg. and Its Applic. 1, 489-501.
K.W. Brodlie and M.J.D. Powell (1975). "On the Convergence of Cyclic Jacobi Methods," J. Inst. Math. Applic. 15, 279-87.
Detailed error analyses that establish important componentwise error bounds include
H.H. Goldstine and L.P. Horowitz (1959). "A Procedure for the Diagonalization of Normal Matrices," J. Assoc. Comp. Mach. 6, 176-95.
G. Loizou (1972). "On the Quadratic Convergence of the Jacobi Method for Normal Matrices," Comp. J. 15, 274-76.
A. Ruhe (1972). "On the Quadratic Convergence of the Jacobi Method for Normal Matrices," BIT 7, 305-13.
See also
M.H.C. Paardekooper (1971). "An Eigenvalue Algorithm for Skew Symmetric Matrices," Numer. Math. 17, 189-202.
D. Hacon (1993). "Jacobi's Method for Skew-Symmetric Matrices," SIAM J. Matrix Anal. Appl. 14, 619-628.
Essentially, the analysis and algorithmic developments presented in the text carry over to the normal case with minor modification. For non-normal matrices, the situation is considerably more difficult. Consult
J. Greenstadt (1955). "A Method for Finding Roots of Arbitrary Matrices," Math. Tables and Other Aids to Comp. 9, 47-52.
C.E. Froberg (1965). "On Triangularization of Complex Matrices by Two Dimensional Unitary Transformations," BIT 5, 230-34.
J. Boothroyd and P.J. Eberlein (1968). "Solution to the Eigenproblem by a Norm-Reducing Jacobi-Type Method (Handbook)," Numer. Math. 11, 1-12. See also Wilkinson and Reinsch (1971, pp. 327-38).
A. Ruhe (1968). "On the Quadratic Convergence of a Generalization of the Jacobi Method to Arbitrary Matrices," BIT 8, 210-31.
A. Ruhe (1969). "The Norm of a Matrix After a Similarity Transformation," BIT 9, 53-58.
Jacobi methods for complex symmetric matrices have also been developed.
Although the symmetric QR algorithm is generally much faster than the Jacobi method, there are special settings where the latter technique is of interest. As we illustrated, on a parallel computer it is possible to perform several rotations concurrently, thereby accelerating the reduction of the off-diagonal elements. See
A. Sameh (1971). "On Jacobi and Jacobi-like Algorithms for a Parallel Computer," Math. Comp. 25, 579-90.
J.J. Modi and J.D. Pryce (1985). "Efficient Implementation of Jacobi's Diagonalization Method on the DAP," Numer. Math. 46, 443-454.
D.S. Scott, M.T. Heath, and R.C. Ward (1986). "Parallel Block Jacobi Eigenvalue Algorithms Using Systolic Arrays," Lin. Alg. and Its Applic. 77, 345-356.
P.J. Eberlein (1987). "On Using the Jacobi Method on a Hypercube," in Hypercube Multiprocessors, ed. M.T. Heath, SIAM Publications, Philadelphia.
G. Shroff and R. Schreiber (1989). "On the Convergence of the Cyclic Jacobi Method for Parallel Block Orderings," SIAM J. Matrix Anal. Appl. 10, 326-346.
M.H.C. Paardekooper (1991). "A Quadratically Convergent Parallel Jacobi Process for Diagonally Dominant Matrices with Nondistinct Eigenvalues," Lin. Alg. and Its Applic. 145, 71-88.
If

    T = [ a_1  b_1          0   ]
        [ b_1  a_2   .          ]
        [       .    .  b_{n-1} ]            (8.5.1)
        [ 0      b_{n-1}   a_n  ]

is unreduced, then a(lambda) equals the number of T's eigenvalues that are less than lambda. Here, the polynomials p_r(x) are defined by (8.5.2) and we have the convention that p_r(lambda) has the opposite sign of p_{r-1}(lambda) if p_r(lambda) = 0.
Proof. It follows from Theorem 8.1.7 that the eigenvalues of T_{r-1} weakly separate those of T_r. To prove that the separation must be strict, suppose that p_r(mu) = p_{r-1}(mu) = 0 for some r and mu. It then follows from (8.5.2) and the assumption that T is unreduced that p_0(mu) = p_1(mu) = ... = p_r(mu) = 0, a contradiction. Thus, we must have strict separation.
The assertion about a(lambda) is established in Wilkinson (1965, pp. 300-301). We mention that if p_r(lambda) = 0, then its sign is assumed to be opposite the sign of p_{r-1}(lambda).  []
Example 8.5.1 If

    T = [  1 -1  0  0 ]
        [ -1  2 -1  0 ]
        [  0 -1  3 -1 ]
        [  0  0 -1  4 ]

then lambda(T) ~ {.254, 1.82, 3.18, 4.74}. The sequence

    { p_0(2), p_1(2), p_2(2), p_3(2), p_4(2) } = { 1, -1, -1, 0, 1 }

confirms that exactly two of T's eigenvalues are less than lambda = 2.
Example 8.5.2 If (8.5.3) is applied to the matrix of Example 8.5.1 with k = 3, then the values shown in the following table are generated:

    y         z         x        a(x)
    0.0000    5.0000    2.5000    2
    0.0000    2.5000    1.2500    1
    1.2500    2.5000    1.3750    1
    1.3750    2.5000    1.9375    2
    1.3750    1.9375    1.6563    1
    1.6563    1.9375    1.7969    1

We conclude from the output that lambda_3(T) is in [1.7969, 1.9375]. Note: lambda_3(T) ~ 1.82.
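A minimal sketch of Sturm-count bisection. The count a(x) is obtained from the inertia of T - x*I via the LDL^T recurrence (a production code perturbs exact zeros in the recurrence more carefully than the simple guard used here):

    import numpy as np

    def sturm_count(a, b, lam):
        # a(lam): number of eigenvalues of tridiag(b; a; b) less than lam.
        count, d = 0, a[0] - lam
        if d < 0:
            count += 1
        for i in range(1, len(a)):
            if d == 0.0:
                d = 1e-300                     # crude guard against breakdown
            d = (a[i] - lam) - b[i - 1] ** 2 / d
            if d < 0:
                count += 1
        return count

    def bisect_eig(a, b, k, y, z, tol=1e-12):
        # k-th smallest eigenvalue in [y, z] by bisection on a(x).
        while z - y > tol * (abs(y) + abs(z) + 1.0):
            x = 0.5 * (y + z)
            if sturm_count(a, b, x) >= k:
                z = x
            else:
                y = x
        return 0.5 * (y + z)

    a = np.array([1.0, 2.0, 3.0, 4.0])
    b = np.array([-1.0, -1.0, -1.0])
    print([bisect_eig(a, b, k, -10.0, 10.0) for k in range(1, 5)])
    # ~ [.254, 1.82, 3.18, 4.74], matching Example 8.5.1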
If

    A - mu*I = L D L^T

is the LDL^T factorization of A - mu*I with D = diag(d_1, ..., d_n), then the number of negative d_i equals the number of lambda_i(A) that are less than mu. See Parlett (1980, p. 46) for details.
Lemma 8.5.2 Suppose D = diag(d_1, ..., d_n) in R^{n x n} has the property that d_1 > ... > d_n. Assume that rho != 0 and that z in R^n has no zero components. If

    (D + rho*z*z^T) v = lambda*v,    v != 0,

then z^T v != 0 and D - lambda*I is nonsingular.

Theorem 8.5.3 Suppose D = diag(d_1, ..., d_n) in R^{n x n} and that the diagonal entries satisfy d_1 > ... > d_n. Assume that rho != 0 and that z in R^n has no zero components. If V in R^{n x n} is orthogonal such that V^T (D + rho*z*z^T) V = diag(lambda_1, ..., lambda_n) with lambda_1 >= ... >= lambda_n, then
(a) the lambda_i are the n zeros of the secular function f(lambda) = 1 + rho*z^T (D - lambda*I)^{-1} z;
(b) if rho > 0, then lambda_1 > d_1 > lambda_2 > ... > lambda_n > d_n, while if rho < 0, then d_1 > lambda_1 > d_2 > ... > d_n > lambda_n.

Proof. If (D + rho*z*z^T) v = lambda*v, then

    v in span{ (D - lambda*I)^{-1} z }

and

    z^T v ( 1 + rho*z^T (D - lambda*I)^{-1} z ) = 0.

By Lemma 8.5.2, z^T v != 0 and so this shows that if lambda is in lambda(D + rho*z*z^T), then f(lambda) = 0. We must show that all the zeros of f are eigenvalues of D + rho*z*z^T and that the interlacing relations (b) hold. To do this we look more carefully at the equations that define f(lambda) and f'(lambda). In either case, it follows that the zeros of f are precisely the eigenvalues of D + rho*z*z^T.  []

The theorem suggests that to compute V we (a) find the roots lambda_1, ..., lambda_n of f using a Newton-like procedure and then (b) compute the columns of V by normalizing the vectors (D - lambda_i I)^{-1} z for i = 1:n. The same plan of attack can be followed even if there are repeated d_i and zero z_i.
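A sketch of step (a) using a bracketing root finder instead of the fast specialized Newton-like iterations of production codes; the interlacing result (b) supplies one root bracket per interval when rho > 0:

    import numpy as np
    from scipy.optimize import brentq

    def secular_eigs(d, z, rho):
        # Roots of f(lam) = 1 + rho * sum(z_i^2/(d_i - lam)), assuming
        # rho > 0 and d_1 > ... > d_n: one root above d_1 and one in each
        # interval (d_{i+1}, d_i).
        d = np.asarray(d, dtype=float); z = np.asarray(z, dtype=float)
        f = lambda lam: 1.0 + rho * np.sum(z**2 / (d - lam))
        hi = d[0] + rho * np.sum(z**2) + 1e-8     # upper bound for lambda_1
        brackets = [(d[0], hi)] + [(d[i + 1], d[i]) for i in range(len(d) - 1)]
        eigs = []
        for lo, up in brackets:
            pad = 1e-10 * (up - lo)
            eigs.append(brentq(f, lo + pad, up - pad))
        return np.array(eigs)

    d = np.array([4.0, 3.0, 2.0, 1.0]); z = np.full(4, 0.5); rho = 1.0
    print(np.sort(secular_eigs(d, z, rho)))
    print(np.sort(np.linalg.eigvalsh(np.diag(d) + rho * np.outer(z, z))))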
Theorem 8.5.4 If D = diag(d_1, ..., d_n) and z in R^n, then there exists an orthogonal matrix V_1 such that if V_1^T D V_1 = diag(mu_1, ..., mu_n) and w = V_1^T z, then

    mu_1 > mu_2 > ... > mu_r >= mu_{r+1} >= ... >= mu_n,

w_i != 0 for i = 1:r, and w_i = 0 for i = r+1:n.
Proof. We give a constructive proof based upon two elementary operations. (a) Suppose d_i = d_j for some i < j. Let J(i, j, theta) be a Jacobi rotation in the (i, j) plane with the property that the jth component of J(i, j, theta)^T z is zero. It is not hard to show that J(i, j, theta)^T D J(i, j, theta) = D. Thus, we can zero a component of z if there is a repeated d_i. (b) If z_i = 0, z_j != 0, and i < j, then let P be the identity with columns i and j interchanged. It follows that P^T D P is diagonal, (P^T z)_i != 0, and (P^T z)_j = 0. Thus, we can permute all the zero z_i to the "bottom." Clearly, repetition of (a) and (b) eventually renders the desired canonical structure. V_1 is the product of the rotations.  []

See Barlow (1993) and the references therein for a discussion of the solution procedures that we have outlined above.
Let T in R^{n x n} be tridiagonal, fix m ~ n/2, and set

    v = [ e_m     ]
        [ theta*e_1 ]                          (8.5.6)

where e_m in R^m and e_1 in R^{n-m} are canonical unit vectors. Note that for all rho in R the matrix T~ = T - rho*v*v^T is identical to T except in its "middle four" entries:

    T~(m:m+1, m:m+1) = [ a_m - rho          b_m - rho*theta       ]
                       [ b_m - rho*theta    a_{m+1} - rho*theta^2 ].

If we set rho*theta = b_m, then T~ = diag(T_1, T_2) is block diagonal with tridiagonal blocks T_1 and T_2. Suppose Q_1^T T_1 Q_1 = D_1 and Q_2^T T_2 Q_2 = D_2 are Schur decompositions of the half-sized problems, and set U = diag(Q_1, Q_2). Then

    U^T T U = D + rho*z*z^T

where

    D = diag(D_1, D_2)

is diagonal and

    z = U^T v = [ Q_1^T e_m        ]
                [ theta*Q_2^T e_1  ].

Comparing these equations we see that the effective synthesis of the two half-sized Schur decompositions requires the quick and stable computation of an orthogonal V such that V^T (D + rho*z*z^T) V is diagonal.
The overall divide and conquer process is best understood through its computation tree: T splits into T(0) and T(1); these split into T(00), T(01) and T(10), T(11); and so on. In general, a node T(b) splits into half-sized tridiagonal matrices T(b0) and T(b1), and the synthesis step above is applied at each node on the way back up the tree.
Problems
P8.5.7 Suppose

    A = [ D     v     ]
        [ v^T   alpha ]

where D = diag(d_1, ..., d_{n-1}) has distinct diagonal entries and v in R^{n-1} has no zero entries. (a) Show that if lambda is in lambda(A), then D - lambda*I_{n-1} is nonsingular. (b) Show that if lambda is in lambda(A), then lambda is a zero of

    f(lambda) = alpha - lambda - v^T (D - lambda*I_{n-1})^{-1} v.

P8.5.8 It is known that lambda is in lambda(T) where T in R^{n x n} is symmetric and tridiagonal with no zero subdiagonal entries. Show how to compute x(1:n-1) from the equation Tx = lambda*x given that x_n = 1.
P8.5.9 Suppose A = S + alpha*u*u^T where S in R^{n x n} is skew-symmetric, u in R^n, and alpha in R. Show how to compute an orthogonal Q such that Q^T A Q = T + alpha*e_1*e_1^T where T is tridiagonal and skew-symmetric and e_1 is the first column of I_n.
W. Barth, R.S. Martin, and J.H. Wilkinson (1967). "Calculation of the Eigenvalues of a Symmetric Tridiagonal Matrix by the Method of Bisection," Numer. Math. 9, 386-93. See also Wilkinson and Reinsch (1971, pp. 249-256).
K.K. Gupta (1972). "Solution of Eigenvalue Problems by Sturm Sequence Method," Int. J. Numer. Meth. Eng. 4, 379-404.
Various aspects of the divide and conquer algorithm discussed in this section are detailed in
G.H. Golub (1973). "Some Modified Matrix Eigenvalue Problems," SIAM Review 15, 318-44.
J.R. Bunch, C.P. Nielsen, and D.C. Sorensen (1978). "Rank-One Modification of the Symmetric Eigenproblem," Numer. Math. 31, 31-48.
J.J.M. Cuppen (1981). "A Divide and Conquer Method for the Symmetric Eigenproblem," Numer. Math. 36, 177-95.
J.J. Dongarra and D.C. Sorensen (1987). "A Fully Parallel Algorithm for the Symmetric Eigenvalue Problem," SIAM J. Sci. and Stat. Comp. 8, S139-S154.
S. Crivelli and E.R. Jessup (1995). "The Cost of Eigenvalue Computation on Distributed Memory MIMD Computers," Parallel Computing 21, 401-422.
The very delicate computations required by the method are carefully analyzed in
J.L. Barlow (1993). "Error Analysis of Update Methods for the Symmetric Eigenvalue Problem," SIAM J. Matrix Anal. Appl. 14, 598-618.
Various generalizations to banded symmetric eigenproblems have been explored.
P. Arbenz, W. Gander, and G.H. Golub (1988). "Restricted Rank Modification of the Symmetric Eigenvalue Problem: Theoretical Considerations," Lin. Alg. and Its Applic. 104, 75-95.
P. Arbenz and G.H. Golub (1988). "On the Spectral Decomposition of Hermitian Matrices Subject to Indefinite Low Rank Perturbations with Applications," SIAM J. Matrix Anal. Appl. 9, 40-58.
A related divide and conquer method based on the "arrowhead" matrix (see P8.5.7) is given in
M. Gu and S.C. Eisenstat (1995). "A Divide-and-Conquer Algorithm for the Symmetric Tridiagonal Eigenproblem," SIAM J. Matrix Anal. Appl. 16, 172-191.
and

    lambda(AA^T) = { sigma_1^2, ..., sigma_n^2, 0, ..., 0 }    (m - n zeros).    (8.6.2)

Moreover, the perturbed matrix A + E has the symmetric embedding

    [   0        (A+E)^T ]
    [ A+E           0    ],

whose nonzero eigenvalues are +/- sigma_i(A + E). Applying the symmetric perturbation theory of §8.1 to this embedding shows that |sigma_k(A + E) - sigma_k(A)| <= ||E||_2 for k = 1:n.

Example 8.6.1 For a pair A and A + E with ||E||_2 = .01 one has sigma(A) = {9.5080, .7729} and sigma(A + E) = {9.5145, .7706}. It is clear that for i = 1:2 we have |sigma_i(A + E) - sigma_i(A)| <= ||E||_2 = .01.
This last result says that by adding a column to a matrix, the largest singular value increases and the smallest singular value is diminished.

Example 8.6.2 If

    A = [ 1  6 11 ]
        [ 2  7 12 ]
        [ 3  8 13 ]
        [ 4  9 14 ]
        [ 5 10 15 ]

and A_k = A(:, 1:k), then

    sigma(A_1) = { 7.4162 }
    sigma(A_2) = { 19.5377, 1.8095 }
    sigma(A_3) = { 35.1272, 2.4654, 0.0000 }.
Example 8.6.3 For the matrices A and A + E of Example 8.6.1,

    sum_{k=1}^{2} ( sigma_k(A + E) - sigma_k(A) )^2 = .472 x 10^{-4} <= ||E||_F^2 = 10^{-4},

illustrating the Wielandt-Hoffman-type bound for singular values.
Assume that ran(V_1) and ran(U_1) form a singular subspace pair for A, and partition

    U^H A V = [ A_11   0  ]   r          U^H E V = [ E_11  E_12 ]   r
              [  0   A_22 ]   m-r                  [ E_21  E_22 ]   m-r
                 r   n-r                              r    n-r

and assume that the singular values of A_11 are separated from those of A_22 by delta > 0. If ||E||_F is sufficiently small relative to delta, then there exist matrices P and Q of norm O(||E||_F/delta) such that ran(V_1 + V_2 Q) and ran(U_1 + U_2 P) is a singular subspace pair for A + E.
Roughly speaking, the theorem says that O(eps) changes in A can alter a singular subspace by an amount eps/delta, where delta measures the separation of the relevant singular values.
Example 8.6.4 The matrix A = diag(2.000, 1.001, .999) in R^{3 x 3} has singular subspace pairs (span{v_i}, span{u_i}) for i = 1, 2, 3 where v_i = e_i and u_i = e_i. Suppose

    A + E = [ 2.000   .010   .010 ]
            [  .010  1.001   .010 ]
            [  .010   .010   .999 ].

The computed singular vectors of A + E show that the subspace associated with the well-separated sigma_1 is barely perturbed, e.g., u~_1 ~ [.9999, .0101, .0051]^T, whereas u~_2 and u~_3 are substantially rotated within span{e_2, e_3} because sigma_2 and sigma_3 are nearly equal. The two-dimensional subspace span{u_2, u_3} itself, however, is insensitive.
One computational approach is to form C = A^T A and apply the symmetric QR algorithm, but forming the cross-product matrix explicitly is numerically unwise. Instead, A is first bidiagonalized: Householder matrices U_B and V_B are computed so that

    U_B^T A V_B = [ B ]        B = [ d_1  f_1          0    ]
                  [ 0 ],           [      d_2   .           ]    B in R^{n x n}.
                                   [             .  f_{n-1} ]
                                   [ 0              d_n     ]

The remaining problem is thus to compute the SVD of B. To this end, consider applying an implicit-shift QR step (Algorithm 8.3.2) to the tridiagonal matrix T = B^T B:

    Compute the eigenvalue lambda of

        T(m:n, m:n) = [ d_m^2 + f_{m-1}^2    d_m f_m           ]        m = n-1
                      [ d_m f_m              d_n^2 + f_{n-1}^2 ]

    that is closer to d_n^2 + f_{n-1}^2. Compute c_1 = cos(theta_1) and s_1 = sin(theta_1) such that

        [  c_1  s_1 ]^T [ d_1^2 - lambda ]   [ x ]
        [ -s_1  c_1 ]   [ d_1 f_1        ] = [ 0 ]

    and set G_1 = G(1, 2, theta_1).
We then can determine Givens rotations U1, V2, U2 , ••• , Vn-1, and Un-l to
chase the unwanted nonzero element down the bidiagonal:
X X + 0 0 0
0 X X 0 0 0
0 0 X X 0 0
B ← U_1^T B =
0 0 0 X X 0
0 0 0 0 X X
0 0 0 0 0 X
X X 0 0 0 0
0 X X 0 0 0
B ← B V_2
0 + X X 0 0
0 0 0 X X 0
0 0 0 0 X X
0 0 0 0 0 X
X X 0 0 0 0
0 X X + 0 0
0 0 X X 0 0
B ← U_2^T B
0 0 0 X X 0
0 0 0 0 X X
0 0 0 0 0 X
and so on. The process terminates with a new bidiagonal matrix B̄ that is related to B as follows:

    B̄ = (U_{n−1}^T ··· U_1^T) B (G_1 V_2 ··· V_{n−1}) = Ū^T B V̄.
454 CHAPTER 8. THE SYMMETRIC EIGENVALUE PROBLEM
Since each V_i has the form V_i = G(i, i+1, θ_i) where i = 2:n−1, it follows that V̄ e_1 = Q e_1. By the implicit Q theorem we can assert that V̄ and Q are essentially the same. Thus, we can implicitly effect the transition from T to T̄ = B̄^T B̄ by working directly on the bidiagonal matrix B.

Of course, for these claims to hold it is necessary that the underlying tridiagonal matrices be unreduced. Since the subdiagonal entries of B^T B are of the form d_{i−1} f_{i−1}, it is clear that we must search the bidiagonal band for zeros. If f_k = 0 for some k, then
    B = [ B_1   0  ]   k
        [ 0    B_2 ]   n−k
          k    n−k
and the original SVD problem decouples into two smaller problems involv-
ing the matrices B 1 and B 2 . If dk = 0 for some k < n, then premultiplication
by a sequence of Givens transformations can zero fk. For example, if n =
6 and k = 3, then by rotating in row planes (3,4), (3,5), and (3,6) we can
zero the entire third row:
        [ X  X  0  0  0  0 ]             [ X  X  0  0  0  0 ]
        [ 0  X  X  0  0  0 ]             [ 0  X  X  0  0  0 ]
  B  =  [ 0  0  0  X  0  0 ]    (3,4)    [ 0  0  0  0  +  0 ]
        [ 0  0  0  X  X  0 ]    ---->    [ 0  0  0  X  X  0 ]
        [ 0  0  0  0  X  X ]             [ 0  0  0  0  X  X ]
        [ 0  0  0  0  0  X ]             [ 0  0  0  0  0  X ]

        [ X  X  0  0  0  0 ]             [ X  X  0  0  0  0 ]
        [ 0  X  X  0  0  0 ]             [ 0  X  X  0  0  0 ]
 (3,5)  [ 0  0  0  0  0  + ]    (3,6)    [ 0  0  0  0  0  0 ]
 ---->  [ 0  0  0  X  X  0 ]    ---->    [ 0  0  0  X  X  0 ]
        [ 0  0  0  0  X  X ]             [ 0  0  0  0  X  X ]
        [ 0  0  0  0  0  X ]             [ 0  0  0  0  0  X ]
y = d_1^2 − λ;  z = d_1 f_1
for k = 1:n−1
    Determine c = cos(θ) and s = sin(θ) such that

        [ y  z ] [  c  s ]  =  [ *  0 ]
                 [ −s  c ]

    B = B·G(k, k+1, θ)
    y = b_{kk};  z = b_{k+1,k}
    Determine c = cos(θ) and s = sin(θ) such that

        [  c  s ]^T [ y ]  =  [ * ]
        [ −s  c ]   [ z ]     [ 0 ]

    B = G(k, k+1, θ)^T B
    if k < n − 1
        y = b_{k,k+1};  z = b_{k,k+2}
    end
end
An efficient implementation of this algorithm would store B's diagonal and superdiagonal in vectors d(1:n) and f(1:n−1) respectively and would require 30n flops and 2n square roots. Accumulating U requires 6mn flops. Accumulating V requires 6n^2 flops.
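For concreteness, here is a minimal NumPy sketch of one implicit-shift step in the spirit of the loop above. It works on a dense copy of B for clarity (a serious implementation would touch only the two bands, and roundoff leaves negligible off-band residue in this dense version); the function and variable names are illustrative, not from any library:

import numpy as np

def rotate(y, z):
    # c, s with s*y + c*z = 0 and c^2 + s^2 = 1
    if z == 0.0:
        return 1.0, 0.0
    r = np.hypot(y, z)
    return y / r, -z / r

def gk_svd_step(B):
    # One Golub-Kahan implicit-shift SVD step on upper bidiagonal B.
    n = B.shape[0]
    d, f = np.diag(B), np.diag(B, 1)
    # shift = eigenvalue of the trailing 2x2 of T = B^T B closer to t22
    t11 = d[-2]**2 + (f[-3]**2 if n > 2 else 0.0)
    t12 = d[-2] * f[-2]
    t22 = d[-1]**2 + f[-2]**2
    dd = (t11 - t22) / 2.0
    mu = t22 - t12**2 / (dd + np.copysign(np.hypot(dd, t12), dd))
    y, z = d[0]**2 - mu, d[0] * f[0]
    for k in range(n - 1):
        c, s = rotate(y, z)                      # column rotation in planes (k, k+1)
        G = np.array([[c, s], [-s, c]])
        B[:, k:k+2] = B[:, k:k+2] @ G
        y, z = B[k, k], B[k + 1, k]              # bulge below the diagonal
        c, s = rotate(y, z)                      # row rotation in planes (k, k+1)
        G = np.array([[c, s], [-s, c]])
        B[k:k+2, :] = G.T @ B[k:k+2, :]
        if k < n - 2:
            y, z = B[k, k + 1], B[k, k + 2]      # bulge right of the superdiagonal
    return B

B = np.diag([1.0, 2.0, 3.0, 4.0]) + np.diag([0.5, 0.5, 0.5], 1)
for _ in range(8):
    B = gk_svd_step(B)                           # f_{n-1} decays rapidly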
Typically, after a few of the above SVD iterations, the superdiagonal entry f_{n−1} becomes negligible. Criteria for smallness within B's band are usually of the form

    |f_i| ≤ ε (|d_i| + |d_{i+1}|)

where ε is a small multiple of the unit roundoff. After each step the band is inspected and b_{i,i+1} is set to zero if |b_{i,i+1}| ≤ ε(|b_{ii}| + |b_{i+1,i+1}|) for any i = 1:n−1. One then finds the largest q and the smallest p such that if

    B = [ B_11    0      0   ]   p
        [ 0      B_22    0   ]   n−p−q
        [ 0       0    B_33  ]   q
          p     n−p−q    q

then B_33 is diagonal and B_22 has a nonzero superdiagonal. The SVD step is applied to B_22, and the whole process repeats until q = n.
Problems
P8.6.1 Show that if B ∈ R^{n×n} is an upper bidiagonal matrix having a repeated singular value, then B must have a zero on its diagonal or superdiagonal.
P8.6.2 Give formulae for the eigenvectors of

    [ 0    A^T ]
    [ A    0   ]

in terms of the singular vectors of A ∈ R^{m×n} where m ≥ n.
P8.6.3 Give an algorithm for reducing a complex matrix A to real bidiagonal form
using complex Householder transformations.
P8.6.4 Relate the singular values and vectors of A = B + iC (B, C ∈ R^{m×n}) to those of

    [ B  −C ]
    [ C   B ]
P8.6.5 Complete the proof of Theorem 8.6.1.
P8.6.6 Assume that n = 2m and that S ∈ R^{n×n} is skew-symmetric and tridiagonal. Show that there exists a permutation P ∈ R^{n×n} such that P^T S P has the following form:

    P^T S P = [ 0    −B^T ]   m
              [ B     0   ]   m

Describe B. Show how to compute the eigenvalues and eigenvectors of S via the SVD of B. Repeat for the case n = 2m + 1.
P8.6.7 (a) Let

    B = [ w  x ]
        [ y  z ]

be real. Give a stable algorithm for computing c and s with c^2 + s^2 = 1 such that

    [  c  s ]^T  B
    [ −s  c ]

is symmetric. (b) Combining (a) with the symmetric 2-by-2 Jacobi diagonalization, develop a Jacobi-like algorithm for computing the SVD of A ∈ R^{n×n}. For a given (p, q) with p < q, Jacobi transformations J(p, q, θ_1) and J(p, q, θ_2) are determined such that if

    B = J(p, q, θ_1)^T A J(p, q, θ_2),

then b_pq = b_qp = 0. Show

    off(B)^2 = off(A)^2 − b_pq^2 − b_qp^2.

How might p and q be determined? How could the algorithm be adapted to handle the case when A ∈ R^{m×n} with m > n?
P8.6.8 Let x and y be in R^m and define the orthogonal matrix Q by

    Q = [  c  s ]
        [ −s  c ]

Give a stable algorithm for computing c and s such that the columns of [x, y]Q are orthogonal to each other.
P8.6.9 Suppose B ∈ R^{n×n} is upper bidiagonal with b_nn = 0. Show how to construct orthogonal U and V (products of Givens rotations) so that U^T B V is upper bidiagonal with a zero nth column.
P8.6.10 Suppose B ∈ R^{n×n} is upper bidiagonal with diagonal entries d(1:n) and super-diagonal entries f(1:n−1). State and prove a singular value version of Theorem 8.5.1.
G.H. Golub and W. Kahan (1965). "Calculating the Singular Values and Pseudo-Inverse
of a Matrix," SIAM J. Num. Anal. Ser. B 2, 205-24.
and then came some early implementations:
P.A. Businger and G.H. Golub (1969). "Algorithm 358: Singular Value Decomposition
of a Complex Matrix," Comm. ACM 12, 564-65.
G.H. Golub and C. Reinsch (1970). "Singular Value Decomposition and Least Squares
Solutions," Numer. Math. 14, 403-20. See also Wilkinson and Reinsch (1971, 134-51).
Interesting algorithmic developments associated with the SVD appear in
J.J.M. Cuppen (1983). "The Singular Value Decomposition in Product Form," SIAM
J. Sci. and Stat. Comp. 4, 216-222.
J.J. Dongarra (1983). "Improving the Accuracy of Computed Singular Values," SIAM
J. Sci. and Stat. Comp. 4, 712-719.
S. Van Huffel, J. Vandewalle, and A. Haegemans (1987). "An Efficient and Reliable
Algorithm for Computing the Singular Subspace of a Matrix Associated with its
Smallest Singular Values," J. Comp. and Appl. Math. 19, 313-330.
P. Deift, J. Demmel, L.-C. Li, and C. Tomei (1991). "The Bidiagonal Singular Value
Decomposition and Hamiltonian Mechanics," SIAM J. Num. Anal. 28, 1463-1516.
R. Mathias and G.W. Stewart (1993). "A Block QR Algorithm and the Singular Value
Decomposition," Lin. Alg. and Its Applic. 182, 91-100.
A. Björck, E. Grimme, and P. Van Dooren (1994). "An Implicit Shift Bidiagonalization
Algorithm for Ill-Posed Problems," BIT 34, 510-534.
The polar decomposition of a matrix can be computed immediately from its SVD. How-
ever, special algorithms have been developed just for this purpose.
J.C. Nash (1975). "A One-Sided Transformation Method for the Singular Value Decom-
position and Algebraic Eigenproblem," Comp. J. 18, 74-76.
P.C. Hansen (1988). "Reducing the Number of Sweeps in Hestenes Method," in Singular
Value Decomposition and Signal Processing, ed. E.F. Deprettere, North Holland.
K. Veselić and V. Hari (1989). "A Note on a One-Sided Jacobi Algorithm," Numer.
Math. 56, 627-633.
Numerous parallel implementations have been developed.
F.T. Luk (1980). "Computing the Singular Value Decomposition on the ILLIAC IV,"
ACM Trans. Math. Soft. 6, 524-39.
R.P. Brent and F.T. Luk (1985). "The Solution of Singular Value and Symmetric Eigen-
value Problems on Multiprocessor Arrays," SIAM J. Sci. and Stat. Comp. 6, 69-84.
R.P. Brent, F.T. Luk, and C. Van Loan (1985). "Computation of the Singular Value
Decomposition Using Mesh Connected Processors," J. VLSI Computer Systems 1,
242-270.
F.T. Luk (1986). "A Triangular Processor Array for Computing Singular Values," Lin.
Alg. and Its Applic. 77, 259-274.
M. Berry and A. Sameh (1986). "Multiprocessor Jacobi Algorithms for Dense Symmetric
Eigenvalue and Singular Value Decompositions," in Proc. International Conference
on Parallel Processing, 433-440.
R. Schreiber (1986). "Solving Eigenvalue and Singular Value Problems on an Undersized
Systolic Array," SIAM J. Sci. and Stat. Comp. 7, 441-451.
C.H. Bischof and C. Van Loan (1986). "Computing the SVD on a Ring of Array Proces-
sors," in Large Scale Eigenvalue Problems, eds. J. Cullum and R. Willoughby, North
Holland, 51-66.
C.H. Bischof (1987). "The Two-Sided Block Jacobi Method on Hypercube Architec-
tures," in Hypercube Multiprocessors, ed. M.T. Heath, SIAM Press, Philadelphia.
C.H. Bischof (1989). "Computing the Singular Value Decomposition on a Distributed
System of Vector Processors," Parallel Computing 11, 171-186.
S. Van Huffel and H. Park (1994). "Parallel Tri- and Bidiagonalization of Bordered
Bidiagonal Matrices," Parallel Computing 20, 1107-1128.
B. Lang (1996). "Parallel Reduction of Banded Matrices to Bidiagonal Form," Parallel
Computing 22, 1-18.
The divide and conquer algorithms devised for the symmetric eigenproblem have
SVD analogs:
E.R. Jessup and D.C. Sorensen (1994). "A Parallel Algorithm for Computing the Sin-
gular Value Decomposition of a Matrix," SIAM J. Matrix Anal. Appl. 15, 530-548.
M. Gu and S.C. Eisenstat (1995). "A Divide-and-Conquer Algorithm for the Bidiagonal
SVD," SIAM J. Matrix Anal. Appl. 16, 79-92.
Careful analyses of the SVD calculation include
J.W. Demmel and W. Kahan (1990). "Accurate Singular Values of Bidiagonal Matrices,"
SIAM J. Sci. and Stat. Comp. 11, 873-912.
K.V. Fernando and B.N. Parlett (1994). "Accurate Singular Values and Differential qd
Algorithms," Numer. Math. 67, 191-230.
S. Chandrasekaran and I.C.F. Ipsen (1994). "Backward Errors for Eigenvalue and Sin-
gular Value Decompositions," Numer. Math. 68, 215-223.
High accuracy SVD calculation and connections among the Cholesky, Schur, and singu-
lar value computations are discussed in
J.W. Demmel and K. Veselić (1992). "Jacobi's Method is More Accurate than QR,"
SIAM J. Matrix Anal. Appl. 13, 1204-1245.
R. Mathias (1995). "Accurate Eigensystem Computations by Jacobi Methods," SIAM
J. Matrix Anal. Appl. 16, 977-1003.
    C(μ) = μA + (1 − μ)B    (8.7.1)

If there exists a μ_0 ∈ [0, 1] such that C(μ_0) is non-negative definite and null(C(μ_0)) = null(A) ∩ null(B), then A and B can be simultaneously diagonalized. To see this, let

    Q_1^T C(μ_0) Q_1 = [ D  0 ]
                       [ 0  0 ]

be the Schur decomposition of C(μ_0) and define X_1 = Q_1 diag(D^{−1/2}, I_{n−k}). If A_1 = X_1^T A X_1, B_1 = X_1^T B X_1, and C_1 = X_1^T C(μ_0) X_1, then A_1 = diag(A_11, 0) and B_1 = diag(B_11, 0) where the k-by-k leading blocks satisfy

    μ_0 A_11 + (1 − μ_0) B_11 = I_k,

and the simultaneous diagonalization of A and B reduces to that of A_11 and B_11.
Example 8.7.1 If

    A = [ 229  163 ]          B = [ 81  59 ]
        [ 163  116 ]   and        [ 59  43 ]

then A − λB is a symmetric-definite pencil, and with

    X = [  3  −5 ]
        [ −4   7 ]

we have X^T A X = diag(5, −1) and X^T B X = diag(1, 2), so that λ(A, B) = {5, −.5}.
The scalar

    c(A, B) = min_{||x||_2 = 1} sqrt( (x^T A x)^2 + (x^T B x)^2 )

is called the Crawford number of the pencil A − λB. Symmetric-definiteness is preserved under perturbations (E_A, E_B) that satisfy

    ε^2 = || E_A ||_2^2 + || E_B ||_2^2 < c(A, B)^2,

and in this case the eigenvalues of (A + E_A) − λ(B + E_B) can be paired with those of A − λB so that corresponding eigenvalues differ by an amount bounded in terms of ε/c(A, B), for i = 1:n.
Example 8.7.2 Suppose B = GG^T where G is lower triangular with unit entries in its first column and tiny (≈ .001) entries on its trailing diagonal, so that B is positive definite but severely ill-conditioned, and let A be a fixed symmetric matrix. The two smallest eigenvalues of A − λB are

    α_1 = −0.619402940600584        α_2 = 1.627440079051887.

If 17-digit floating point arithmetic is used, then these eigenvalues are computed to full machine precision when the symmetric QR algorithm is applied to fl(D^{−1/2} V^T A V D^{−1/2}), where B = V D V^T is the Schur decomposition of B. On the other hand, if Algorithm 8.7.1 is applied, then

    α̂_1 = −0.619373517376444        α̂_2 = 1.627516601905228.

The reason for obtaining only four correct significant digits is that κ_2(B) ≈ 10^{18}.
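Algorithm 8.7.1 is the Cholesky-based reduction just referred to. A minimal NumPy sketch of the idea — not the book's exact pseudocode — makes the two-step structure plain:

import numpy as np

def symdef_eig(A, B):
    # A symmetric, B symmetric positive definite: reduce A x = lambda B x
    # to the symmetric eigenproblem for C = G^{-1} A G^{-T},
    # where B = G G^T is the Cholesky factorization.
    G = np.linalg.cholesky(B)
    C = np.linalg.solve(G, np.linalg.solve(G, A.T).T)   # G^{-1} A G^{-T}
    C = (C + C.T) / 2.0                                 # enforce symmetry
    lam, Y = np.linalg.eigh(C)
    X = np.linalg.solve(G.T, Y)                         # eigenvectors x = G^{-T} y
    return lam, X

As the example warns, the eigenvalues computed this way inherit an error of roughly unit roundoff times κ_2(B).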
minimizes

    f(λ) = || Ax − λBx ||_B    (8.7.5)

where || · ||_B is defined by || z ||_B^2 = z^T B^{−1} z. The mathematical properties of this problem lead naturally to the generalized singular value decomposition: given A ∈ R^{m×n} and B ∈ R^{p×n}, there exist orthogonal U ∈ R^{m×m} and V ∈ R^{p×p} and a nonsingular X ∈ R^{n×n} such that

    U^T A X = C = diag(c_1, ..., c_n)

and

    V^T B X = S = diag(s_1, ..., s_q)

where q = min(p, n).
Proof. The proof of this decomposition appears in Van Loan (1976). We present a more constructive proof along the lines of Paige and Saunders (1981). For clarity we assume that null(A) ∩ null(B) = {0} and p ≥ n. We leave it to the reader to extend the proof so that it covers the remaining cases.

Let

    [ A ]  =  [ Q_1 ]  R        Q_1 ∈ R^{m×n}, Q_2 ∈ R^{p×n}    (8.7.6)
    [ B ]     [ Q_2 ]

be a QR factorization of the stacked matrix.
The elements of the set σ(A, B) = { c_1/s_1, ..., c_q/s_q } are referred to as the generalized singular values of A and B. Note that σ ∈ σ(A, B) implies that σ^2 ∈ λ(A^T A, B^T B). The theorem is a generalization of the SVD in that if B = I_n, then σ(A, B) = σ(A).

Our proof of the GSVD is of practical importance since Stewart (1983) and Van Loan (1985) have shown how to stably compute the CS decomposition. The only tricky part is the inversion of W^T R to get X. Note that the columns of X = [x_1, ..., x_n] satisfy

    s_i^2 A^T A x_i = c_i^2 B^T B x_i,    i = 1:n.
Problems
P8.7.1 Suppose A ∈ R^{n×n} is symmetric and G ∈ R^{n×n} is lower triangular and nonsingular. Give an efficient algorithm for computing C = G^{−1} A G^{−T}.
P8.7.2 Suppose A ∈ R^{n×n} is symmetric and B ∈ R^{n×n} is symmetric positive definite. Give an algorithm for computing the eigenvalues of AB that uses the Cholesky factorization and the symmetric QR algorithm.
P8.7.3 Show that if C is real and diagonalizable, then there exist symmetric matrices A and B, B nonsingular, such that C = AB^{−1}. This shows that symmetric pencils A − λB are essentially general.
P8.7.4 Show how to convert an Ax = λBx problem into a generalized singular value problem if A and B are both symmetric and non-negative definite.
P8.7.5 Given Y ∈ R^{n×n}, show how to compute Householder matrices H_2, ..., H_n so that Y H_n ··· H_2 = T is upper triangular. Hint: H_k zeros out the kth row.
P8.7.6 Suppose

    [ 0    A ] x  =  λ [ B_1   0  ] x
    [ A^T  0 ]         [ 0    B_2 ]

where A ∈ R^{m×n}, B_1 ∈ R^{m×m}, and B_2 ∈ R^{n×n}. Assume that B_1 and B_2 are positive definite with Cholesky triangles G_1 and G_2 respectively. Relate the generalized eigenvalues of this problem to the singular values of G_1^{−1} A G_2^{−T}.
P8.7.7 Suppose A and B are both symmetric positive definite. Show how to compute λ(A, B) and the corresponding eigenvectors using the Cholesky factorization and CS decomposition.
G.W. Stewart (1976). "A Bibliographical Tour of the Large Sparse Generalized Eigen-
value Problem," in Sparse Matrix Computations, eds. J.R. Bunch and D.J. Rose,
Academic Press, New York.
Some papers of particular interest include
G.W. Stewart (1979). "Perturbation Bounds for the Definite Generalized Eigenvalue
Problem," Lin. Alg. and Its Applic. 23, 69-86.
See also
L. Elsner and J.-G. Sun (1982). "Perturbation Theorems for the Generalized Eigen-
value Problem," Lin. Alg. and Its Applic. 48, 341-357.
J. Guang Sun (1982). "A Note on Stewart's Theorem for Definite Matrix Pairs," Lin.
Alg. and Its Applic. 48, 331-339.
J.-G. Sun (1983). "Perturbation Analysis for the Generalized Singular Value Prob-
lem," SIAM J. Numer. Anal. 20, 611-625.
C.C. Paige (1984). "A Note on a Result of Sun J.-Guang: Sensitivity of the CS and
GSV Decompositions," SIAM J. Numer. Anal. 21, 186-191.
The generalized SVD and some of its applications are discussed in
C.F. Van Loan (1976). "Generalizing the Singular Value Decomposition," SIAM J. Num.
Anal. 13, 76-83.
C.C. Paige and M. Saunders (1981). "Towards A Generalized Singular Value Decompo-
sition," SIAM J. Num. Anal. 18, 398-405.
B. Kågström (1985). "The Generalized Singular Value Decomposition and the General
A − λB Problem," BIT 24, 568-583.
Stable methods for computing the CS and generalized singular value decompositions are
described in
G.W. Stewart (1983). "A Method for Computing the Generalized Singular Value De-
composition," in Matrix Pencils, eds. B. Kågström and A. Ruhe, Springer-Verlag,
New York, pp. 207-20.
C.F. Van Loan (1985). "Computing the CS and Generalized Singular Value Decompo-
sition," Numer. Math. 46, 479--492.
M.T. Heath, A.J. Laub, C. C. Paige, and R.C. Ward (1986). "Computing the SVD of a
Product of Two Matrices," SIAM J. Sci. and Stat. Comp. 7, 1147-1159.
C. C. Paige (1986). "Computing the Generalized Singular Value Decomposition," SIAM
J. Sci. and Stat. Comp. 7, 1126-1146.
L.M. Ewerbring and F.T. Luk (1989). "Canonical Correlations and Generalized SVD;
Applications and New Algorithms," J. Comput. Appl. Math. 27, 37-52.
J. Erxiong (1990). "An Algorithm for Finding Generalized Eigenpairs of a Symmetric
Definite Matrix Pencil," Lin.Alg. and Its Applic. 132, 65--91.
P.C. Hansen (1990). "Relations Between SVD and GSVD of Discrete Regularization
Problems in Standard and General Form," Lin. Alg. and Its Applic. 141, 165-176.
H. Zha (1991). "The Restricted Singular Value Decomposition of Matrix Triplets," SIAM
J. Matrix Anal. Appl. 12, 172-194.
B. De Moor and G.H. Golub (1991). "The Restricted Singular Value Decomposition:
Properties and Applications," SIAM J. Matrix Anal. Appl. 12, 401-425.
V. Hari (1991). "On Pairs of Almost Diagonal Matrices," Lin. Alg. and Its Applic.
148, 193-223.
B. De Moor and P. Van Dooren (1992). "Generalizing the Singular Value and QR
Decompositions," SIAM J. Matrix Anal. Appl. 13, 993-1014.
H. Zha (1992). "A Numerical Algorithm for Computing the Restricted Singular Value
Decomposition of Matrix Triplets," Lin. Alg. and Its Applic. 168, 1-25.
R.-C. Li (1993). "Bounds on Perturbations of Generalized Singular Values and of Asso-
ciated Subspaces," SIAM J. Matrix Anal. Appl. 14, 195-234.
K. Veselić (1993). "A Jacobi Eigenreduction Algorithm for Definite Matrix Pairs," Nu-
mer. Math. 64, 241-268.
Z. Bai and H. Zha (1993). "A New Preprocessing Algorithm for the Computation of the
Generalized Singular Value Decomposition," SIAM J. Sci. Comp. 14, 1007-1012.
L. Kaufman (1993). "An Algorithm for the Banded Symmetric Generalized Matrix
Eigenvalue Problem," SIAM J. Matrix Anal. Appl. 14, 372-389.
G.E. Adams, A.W. Bojanczyk, and F.T. Luk (1994). "Computing the PSVD of Two
2×2 Triangular Matrices," SIAM J. Matrix Anal. Appl. 15, 366-382.
Z. Drmač (1994). The Generalized Singular Value Problem, Ph.D. Thesis, FernUniver-
sität, Hagen, Germany.
R.-C. Li (1994). "On Eigenvalue Variations of Rayleigh Quotient Matrix Pencils of a
Definite Pencil," Lin. Alg. and Its Applic. 208/209, 471-483.
Chapter 9
Lanczos Methods
    r(x) = (x^T A x)/(x^T x),    x ≠ 0.

Recall from Theorem 8.1.2 that the maximum and minimum values of r(x) are λ_1(A) and λ_n(A), respectively. Suppose {q_i} ⊆ R^n is a sequence of orthonormal vectors and define the scalars M_k and m_k by

    M_k = max_{y ≠ 0} r(Q_k y),    m_k = min_{y ≠ 0} r(Q_k y),    Q_k = [q_1, ..., q_k].
(This assumes ∇r(u_k) ≠ 0.) Likewise, if v_k ∈ span{q_1, ..., q_k} satisfies r(v_k) = m_k, then it makes sense to require

    ∇r(v_k) ∈ span{q_1, ..., q_{k+1}}.    (9.1.2)

Thus, we are led to the problem of computing orthonormal bases for the Krylov subspaces

    K(A, q_1, k) = span{q_1, Aq_1, ..., A^{k−1} q_1}.
9.1.2 Tridiagonalization
In order to find this basis efficiently we exploit the connection between the tridiagonalization of A and the QR factorization of K(A, q_1, n). Recall that if Q^T A Q = T is tridiagonal with Q e_1 = q_1, then

    T = [ α_1  β_1                 0      ]
        [ β_1  α_2   ...                  ]
        [      ...   ...    β_{n−1}       ]
        [ 0        β_{n−1}   α_n          ]

and equating columns in AQ = QT, we find

    A q_k = β_{k−1} q_{k−1} + α_k q_k + β_k q_{k+1},    β_0 q_0 ≡ 0,

for k = 1:n−1. With r_k = (A − α_k I) q_k − β_{k−1} q_{k−1}, this rearranges into the Lanczos iteration:

r_0 = q_1; β_0 = 1; q_0 = 0; k = 0
while (β_k ≠ 0)
    q_{k+1} = r_k / β_k;  k = k + 1;  α_k = q_k^T A q_k    (9.1.3)
    r_k = (A − α_k I) q_k − β_{k−1} q_{k−1};  β_k = || r_k ||_2
end

There is no loss of generality in choosing the β_k to be positive. The q_k are called Lanczos vectors.
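In NumPy, (9.1.3) is only a few lines. The sketch below stores all the Lanczos vectors (so that T_k can be checked against Q_k^T A Q_k); it does no reorthogonalization and no breakdown handling:

import numpy as np

def lanczos(A, q1, m):
    n = q1.size
    Q = np.zeros((n, m))
    alpha, beta = np.zeros(m), np.zeros(m)
    r, b = q1 / np.linalg.norm(q1), 1.0      # r_0 = q1, beta_0 = 1
    q_old = np.zeros(n)
    for k in range(m):
        q = r / b                            # q_{k+1} = r_k / beta_k
        Q[:, k] = q
        w = A @ q
        alpha[k] = q @ w                     # alpha_k = q_k^T A q_k
        r = w - alpha[k] * q - b * q_old     # (A - alpha_k I) q_k - beta_{k-1} q_{k-1}
        b = np.linalg.norm(r)                # beta_k
        beta[k] = b
        q_old = q
        if b == 0.0:
            break
    return alpha, beta, Q

After m steps, the extreme eigenvalues of T_m = diag(alpha) + diag(beta[:-1], +1) + diag(beta[:-1], -1) are typically excellent estimates of A's extreme eigenvalues.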
Theorem 9.1.1 Let A ∈ R^{n×n} be symmetric and assume q_1 ∈ R^n has unit 2-norm. Then the Lanczos iteration (9.1.3) runs until k = m, where m = rank(K(A, q_1, n)). Moreover, for k = 1:m we have

    A Q_k = Q_k T_k + r_k e_k^T    (9.1.4)

where

    T_k = [ α_1  β_1                0     ]
          [ β_1  α_2   ...                ]
          [      ...   ...    β_{k−1}     ]
          [ 0        β_{k−1}   α_k        ]

and Q_k = [q_1, ..., q_k] has orthonormal columns that span K(A, q_1, k).

Proof. The proof is by induction on k. Suppose the iteration has produced Q_k = [q_1, ..., q_k] such that ran(Q_k) = K(A, q_1, k) and Q_k^T Q_k = I_k. It is easy to see from (9.1.3) that (9.1.4) holds. Thus, Q_k^T A Q_k = T_k + Q_k^T r_k e_k^T. Since α_i = q_i^T A q_i for i = 1:k and r_k is orthogonal to the columns of Q_k, the induction goes through. □

If T_k s_i = θ_i s_i with y_i = Q_k s_i for i = 1:k, then A y_i = θ_i y_i + r_k (e_k^T s_i), and so, taking norms and recalling that || r_k ||_2 = |β_k|,

    || A y_i − θ_i y_i ||_2 = |β_k| |e_k^T s_i|,    i = 1:k.

Note that in the terminology of Theorem 8.1.15, the (θ_i, y_i) are Ritz pairs for the subspace ran(Q_k).
Another way that T_k can be used to provide estimates of A's eigenvalues is described in Golub (1974) and involves the judicious construction of a rank-one matrix E such that ran(Q_k) is invariant for A + E. In particular, if we use the Lanczos method to compute A Q_k = Q_k T_k + r_k e_k^T and set E = τ w w^T, where τ = ±1 and w = a q_k + b r_k, then it can be shown that suitable choices of a and b render ran(Q_k) invariant for A + E.

The quality of θ_1 = max λ(T_k) as an estimate of λ_1 = λ_1(A) is quantified by the Kaniel-Paige theory, a typical result being

    λ_1 ≥ θ_1 ≥ λ_1 − (λ_1 − λ_n) tan(φ_1)^2 / c_{k−1}(1 + 2ρ_1)^2

where cos(φ_1) = |q_1^T z_1|, ρ_1 = (λ_1 − λ_2)/(λ_2 − λ_n), and c_{k−1}(x) is the Chebyshev polynomial of degree k − 1.
Since θ_1 maximizes the Rayleigh quotient over K(A, q_1, k), any w = p(A) q_1 with deg(p) ≤ k − 1 gives a lower bound. If q_1 = Σ_{i=1}^{n} d_i z_i is the eigenvector expansion of q_1, then

    θ_1 ≥ λ_1 − (λ_1 − λ_n) ·
        [ Σ_{i=2}^{n} d_i^2 p(λ_i)^2 ] / [ d_1^2 p(λ_1)^2 + Σ_{i=2}^{n} d_i^2 p(λ_i)^2 ].

We can make the lower bound tight by selecting a polynomial p(x) that is large at x = λ_1 in comparison to its value at the remaining eigenvalues. One way of doing this is to set

    p(x) = c_{k−1}( −1 + 2 (x − λ_n)/(λ_2 − λ_n) ),

where c_j(z) is generated by the Chebyshev recursion c_j(z) = 2 z c_{j−1}(z) − c_{j−2}(z), c_0 = 1, c_1 = z. These polynomials are bounded by unity on [−1, 1], but grow very rapidly outside this interval. By defining p(x) this way it follows that |p(λ_i)| is bounded by unity for i = 2:n, while p(λ_1) = c_{k−1}(1 + 2ρ_1). Thus,

    θ_1 ≥ λ_1 − (λ_1 − λ_n) · ((1 − d_1^2)/d_1^2) · (1 / c_{k−1}(1 + 2ρ_1)^2),

which is the stated bound since tan(φ_1)^2 = (1 − d_1^2)/d_1^2.
Rounding out this discussion, consider the quality of the power method estimate γ_1 = r(v) based on the vector

    v = A^{k−1} q_1 = Σ_{i=1}^{n} d_i λ_i^{k−1} z_i.

Using the proof and notation of Theorem 9.1.3, it is easy to show that

    γ_1 ≥ λ_1 − (λ_1 − λ_n) tan(φ_1)^2 (λ_2/λ_1)^{2(k−1)}.    (9.1.5)

(Hint: Set p(x) = x^{k−1} in the proof.) Thus, we can compare the quality of the lower bounds for θ_1 and γ_1 by comparing

    L_{k−1} = 1 / c_{k−1}(1 + 2ρ_1)^2    and    R_{k−1} = (λ_2/λ_1)^{2(k−1)}.

This is done in the following table for representative values of k and λ_2/λ_1. The superiority of the Lanczos estimate is self-evident. This should be no surprise, since θ_1 is the maximum of r(x) = x^T A x / x^T x over all of K(A, q_1, k), while γ_1 = r(v) for a particular v in K(A, q_1, k), namely v = A^{k−1} q_1.
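The comparison is easy to tabulate. The following sketch prints the two bound factors for a few representative values of k and λ_2/λ_1 (with λ_1 = 1 and λ_n = 0 chosen purely for illustration):

import numpy as np

def cheb(k, x):                        # c_k(x) for x >= 1, via the cosh form
    return np.cosh(k * np.arccosh(x))

lam1, lamn = 1.0, 0.0
for ratio in (0.95, 0.75, 0.50):       # lambda_2 / lambda_1
    lam2 = ratio * lam1
    rho1 = (lam1 - lam2) / (lam2 - lamn)
    for k in (5, 10, 25):
        L = 1.0 / cheb(k - 1, 1.0 + 2.0 * rho1) ** 2   # Lanczos factor
        R = ratio ** (2 * (k - 1))                     # power method factor
        print(f"lam2/lam1 = {ratio:.2f}  k = {k:2d}  L = {L:8.1e}  R = {R:8.1e}")

L is smaller than R in every row, which is the point of the table.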
Problems
Suppose A ∈ R^{n×n} is symmetric, v ∈ R^n is nonzero, a = (v^T A v)/(v^T v), and z = Av − av. (a) Show that the interval [a − δ, a + δ] must contain an eigenvalue of A where δ = || z ||_2 / || v ||_2. (b) Consider the new approximation ỹ = av + bz and show how to determine the scalars a and b so that

    ā = (ỹ^T A ỹ)/(ỹ^T ỹ)

is maximized. (c) Relate the above computations to the first two steps of the Lanczos process.
C. Lanczos (1950). "An Iteration Method for the Solution of the Eigenvalue Problem of
Linear Differential and Integral Operators," J. Res. Nat. Bur. Stand. 45, 255--82.
Although the convergence of the Ritz values is alluded to in this paper, for more details we refer the reader to
I.S. Duff (1974). "Pivot Selection and Row Ordering in Givens Reduction on Sparse
Matrices," Computing 13, 239-48.
I.S. Duff and J.K. Reid (1976). "A Comparison of Some Methods for the Solution of
Sparse Over-Determined Systems of Linear Equations," J. Inst. Maths. Applic. 17,
267-80.
L. Kaufman (1979). "Application of Dense Householder Transformations to a Sparse
Matrix," ACM Trans. Math. Sof!. 5, 442-50.
If α_k is computed as

    α_k = q_k^T (A q_k − β_{k−1} q_{k−1}),

then the whole Lanczos process can be implemented with just two n-vectors of storage.
Let T̃_k denote the computed counterpart of T_k:

    T̃_k = [ α̃_1  β̃_1               0     ]
          [ β̃_1  α̃_2   ...               ]
          [      ...    ...   β̃_{k−1}    ]
          [ 0        β̃_{k−1}   α̃_k      ]

Paige (1971, 1976) shows that if r̃_k is the computed analog of r_k, then

    A Q̃_k = Q̃_k T̃_k + r̃_k e_k^T + E_k    (9.2.1)

where

    || E_k ||_2 ≈ u || A ||_2.    (9.2.2)
This indicates that the important equation A Q_k = Q_k T_k + r_k e_k^T is satisfied to working precision.

Unfortunately, the picture is much less rosy with respect to the orthogonality among the q_i. (Normality is not an issue. The computed Lanczos vectors essentially have unit length.) If β̃_k = fl(|| r̃_k ||_2) and we compute q̃_{k+1} = fl(r̃_k / β̃_k), then a simple analysis shows that β̃_k q̃_{k+1} = r̃_k + w_k where || w_k ||_2 ≈ u || r̃_k ||_2 ≈ u || A ||_2. Thus, we may conclude that

    | q̃_{k+1}^T q̃_i | ≈ ( | r̃_k^T q̃_i | + u || A ||_2 ) / |β̃_k|

for i = 1:k, so significant departures from orthogonality can occur whenever β̃_k is small.
    A = [ 2.64  −.48 ]
        [ −.48  2.36 ]

has eigenvalues λ_1 = 3 and λ_2 = 2. If the Lanczos algorithm is applied to this matrix with q_1 = [.810, −.586]^T and three-digit floating point arithmetic is performed, then q_2 = [.707, .707]^T. Loss of orthogonality occurs because span{q_1} is almost invariant for A. (The vector x = [.8, −.6]^T is the eigenvector associated with λ_1.)
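The same phenomenon is easy to reproduce in ordinary double precision. Running the plain recursion (9.1.3) on a matrix with two well-separated dominant eigenvalues, orthogonality of the computed Lanczos vectors is lost within a few dozen steps (a sketch with illustrative sizes):

import numpy as np

n, m = 500, 60
rng = np.random.default_rng(1)
A = np.diag(np.r_[np.linspace(0.0, 1.0, n - 2), 10.0, 20.0])
Q = np.zeros((n, m))
r = rng.standard_normal(n)
r /= np.linalg.norm(r)
b, q_old = 1.0, np.zeros(n)
for k in range(m):                       # plain Lanczos, no safeguards
    q = r / b
    Q[:, k] = q
    w = A @ q
    a = q @ w
    r = w - a * q - b * q_old
    b, q_old = np.linalg.norm(r), q
print(np.linalg.norm(Q.T @ Q - np.eye(m)))   # grows from ~1e-15 to O(1)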
satisfies τ < 1, then there exist eigenvalues μ_1, ..., μ_k ∈ λ(A) that are close to the Ritz values θ_1, ..., θ_k in a sense made precise by Paige's analysis. Moreover, for the orthogonal projection

    w = r_k − Σ_{i=1}^{k} (q_i^T r_k) q_i ∈ span{q_1, ..., q_k}^⊥    (9.2.4)
and a companion result (9.2.5) relates the size of y_i^T q̃_{k+1} to the convergence of the Ritz pair (θ_i, y_i).
That is, the most recently computed Lanczos vector qk+l tends to have a
nontrivial and unwanted component in the direction of any converged Ritz
vector. Consequently, instead of orthogonalizing qk+ 1 against all of the
previously computed Lanczos vectors, we can achieve the same effect by
orthogonalizing it against the much smaller set of converged Ritz vectors.
The practical aspects of enforcing orthogonality in this way are dis-
cussed in Parlett and Scott (1979). In their scheme, known as selective
orthogonalization, a computed Ritz pair (θ, y) is called "good" if it satisfies

    || A y − θ y ||_2 ≈ sqrt(u) || A ||_2,

and subsequent Lanczos vectors are orthogonalized against the good Ritz vectors. Thus, if we have a bound for || I_k − Q̃_k^T Q̃_k ||_2 we can generate a bound for || I_{k+1} − Q̃_{k+1}^T Q̃_{k+1} ||_2 by applying the lemma with S = Q̃_k and d = q̃_{k+1}. (In this case δ ≈ u and we assume that q̃_{k+1} has been orthogonalized against the set of currently good Ritz vectors.) It is possible to estimate the norm of Q̃_k^T q̃_{k+1} from a simple recurrence that spares one the need for accessing q_1, ..., q_k. See Kahan and Parlett (1974) or Parlett and Scott (1979). The overhead is minimal, and when the bounds signal loss of orthogonality, it is time to contemplate the enlargement of the set of good Ritz vectors. Then and only then is T̃_k diagonalized.
    Q^T A Q = T = [ M_1    B_1^T              0       ]
                  [ B_1    M_2    ...                 ]    (9.2.6)
                  [        ...    ...    B_{r−1}^T    ]
                  [ 0           B_{r−1}   M_r         ]

where

    Q = [X_1, ..., X_r],    X_i ∈ R^{n×p},

is orthogonal, each M_i ∈ R^{p×p}, and each B_i ∈ R^{p×p} is upper triangular. Comparing blocks in AQ = QT shows that

    A X_k = X_{k−1} B_{k−1}^T + X_k M_k + X_{k+1} B_k,    X_0 B_0^T ≡ 0,

for k = 1:r−1. From the orthogonality of Q we have

    M_k = X_k^T A X_k.
Defining R_k = A X_k − X_k M_k − X_{k−1} B_{k−1}^T, the next block X_{k+1} and the upper triangular B_k are obtained from the QR factorization R_k = X_{k+1} B_k. After k block steps we have the band matrix

    T_k = [ M_1    B_1^T              0       ]
          [ B_1    M_2    ...                 ]
          [        ...    ...    B_{k−1}^T    ]
          [ 0           B_{k−1}   M_k         ]
Using an argument similar to the one used in the proof of Theorem 9.1.1, we can show that the X_k are mutually orthogonal provided none of the R_k are rank-deficient. However, if rank(R_k) < p for some k, then it is possible to choose the columns of X_{k+1} such that X_{k+1}^T X_i = 0 for i = 1:k. See Golub and Underwood (1977).
Because T_k has bandwidth p, it can be efficiently reduced to tridiagonal form using an algorithm of Schwartz (1968). Once tridiagonal form is achieved, the Ritz values can be obtained via the symmetric QR algorithm.
In order to intelligently decide when to use block Lanczos, it is necessary
to understand how the block dimension affects convergence of the Ritz
values. The following generalization of Theorem 9.1 .3 sheds light on this
issue.
Theorem 9.2.2 Let A be an n-by-n symmetric matrix with eigenvalues λ_1 ≥ ··· ≥ λ_n and corresponding orthonormal eigenvectors z_1, ..., z_n. Let μ_1 ≥ ··· ≥ μ_p be the p largest eigenvalues of the matrix T_k obtained after k steps of the block Lanczos iteration (9.2.7). If Z_1 = [z_1, ..., z_p] and cos(θ_p) = σ_p(Z_1^T X_1) > 0, then for i = 1:p,

    λ_i ≥ μ_i ≥ λ_i − ε_i^2

where

    ε_i^2 = (λ_1 − λ_n) [ tan(θ_p) / c_{k−1}(1 + 2γ_i) ]^2,
    γ_i = (λ_i − λ_{p+1}) / (λ_{p+1} − λ_n),

and c_{k−1}(z) is the Chebyshev polynomial of degree k − 1.
Of the several computational variants of the Lanczos Method, Algorithm 9.2.1 is the
most stable. For details, see
C.C. Paige (1972). "Computational Variants of the Lanczos Method for the Eigenprob-
lem," J. Inst. Math. Applic. 10, 373-81.
Other practical details associated with the implementation of the Lanczos procedure are
discussed in
D.S. Scott (1979). "How to Make the Lanczos Algorithm Converge Slowly," Math.
Comp. 33, 239-47.
B.N. Parlett, H. Simon, and L.M. Stringer (1982). "On Estimating the Largest Eigen-
value with the Lanczos Algorithm," Math. Comp. 38, 153-166.
B.N. Parlett and B. Nour-Omid (1985). "The Use of a Refined Error Bound When
Updating Eigenvalues of Tridiagonals," Lin. Alg. and Its Applic. 68, 179-220.
J. Kuczyński and H. Woźniakowski (1992). "Estimating the Largest Eigenvalue by the
Power and Lanczos Algorithms with a Random Start," SIAM J. Matrix Anal. Appl.
13, 1094-1122.
The behavior of the Lanczos method in the presence of roundoff error was originally
reported in
C.C. Paige (1971). "The Computation of Eigenvalues and Eigenvectors of Very Large
Sparse Matrices," Ph.D. thesis, University of London.
C.C. Paige (1976). "Error Analysis of the Lanczos Algorithm for Tridiagonalizing a Sym-
metric Matrix," J. Inst. Math. Applic. 18, 341-49.
C.C. Paige (1980). "Accuracy and Effectiveness of the Lanczos Algorithm for the Sym-
metric Eigenproblem," Lin. Alg. and Its Applic. 34, 235-58.
C.C. Paige (1970). "Practical Use of the Symmetric Lanczos Process with Reorthogo-
nalization," BIT 10, 183-95.
G.H. Golub, R. Underwood, and J.H. Wilkinson (1972). "The Lanczos Algorithm for the
Symmetric Ax= >.Bx Problem," Report STAN-CS-72-270, Department of Computer
Science, Stanford University, Stanford, California.
B.N. Parlett and D.S. Scott (1979). "The Lanczos Algorithm with Selective Orthogo-
nalization," Math. Comp. 33, 217-38.
H. Simon (1984). "Analysis of the Symmetric Lanczos Algorithm with Reorthogonaliza-
tion Methods," Lin. Alg. and Its Applic. 61, 101-132.
Without any reorthogonalization it is necessary either to monitor the loss of orthogonal-
ity and quit at the appropriate instant or else to devise some scheme that will aid in the
distinction between the ghost eigenvalues and the actual eigenvalues. See
W. Kahan and B.N. Parlett (1976). "How Far Should You Go with the Lanczos Process?"
in Sparse Matrix Computations, eds. J. Bunch and D. Rose, Academic Press, New
York, pp. 131-44.
J. Cullum and R.A. Willoughby (1979). "Lanczos and the Computation in Specified
Intervals of the Spectrum of Large, Sparse Real Symmetric Matrices," in Sparse Matrix
Proc. 1978, eds. I.S. Duff and G.W. Stewart, SIAM Publications, Philadelphia, PA.
B.N. Parlett and J.K. Reid (1981). "Tracking the Progress of the Lanczos Algorithm for
Large Symmetric Eigenproblems," IMA J. Num. Anal. 1, 135-55.
D. Calvetti, L. Reichel, and D.C. Sorensen ( 1994). "An Implicitly Restarted Lanczos
Method for Large Symmetric Eigenvalue Problems," ETNA 2, 1-21.
The block Lanczos algorithm is discussed in
J. Cullum and W.E. Donath (1974). "A Block Lanczos Algorithm for Computing the q
Algebraically Largest Eigenvalues and a Corresponding Eigenspace of Large Sparse
Real Symmetric Matrices," Proc. of the 1974 IEEE Conf. on Decision and Control,
Phoenix, Arizona, pp. 505-9.
R. Underwood (1975). "An Iterative Block Lanczos Method for the Solution of Large
Sparse Symmetric Eigenproblems," Report STAN-CS-75-495, Department of Com-
puter Science, Stanford University, Stanford, California.
G.H. Golub and R. Underwood (1977). "The Block Lanczos Method for Computing
Eigenvalues," in Mathematical Software III, ed. J. Rice, Academic Press, New York,
pp. 364-77.
J. Cullum (1978). "The Simultaneous Computation of a Few of the Algebraically Largest
and Smallest Eigenvalues of a Large Sparse Symmetric Matrix," BIT 18, 265-75.
A. Ruhe (1979). "Implementation Aspects of Band Lanczos Algorithms for Computation
of Eigenvalues of Large Sparse Symmetric Matrices," Math. Comp. 33, 680-87.
The block Lanczos algorithm generates a symmetric band matrix whose eigenvalues can
be computed in any of several ways. One approach is described in
A.K. Cline, G.H. Golub, and G.W. Platzman (1976). "Calculation of Normal Modes of
Oceans Using a Lanczos Method," in Sparse Matrix Computations, eds. J.R. Bunch
and D.J. Rose, Academic Press, New York, pp. 409-26.
T. Ericsson and A. Ruhe (1980). "The Spectral Transformation Lanczos Method for the
Numerical Solution of Large Sparse Generalized Symmetric Eigenvalue Problems,"
Math. Comp. 35, 1251-68.
R.G. Grimes, J.G. Lewis, and H.D. Simon (1994). "A Shifted Block Lanczos Algorithm
for Solving Sparse Symmetric Generalized Eigenproblems," SIAM J. Matrix Anal.
Appl. 15, 228-272.
We show that both of these requirements are met if the q_k are Lanczos vectors.
After k steps of the Lanczos algorithm we obtain the factorization

    A Q_k = Q_k T_k + r_k e_k^T    (9.3.3)

where

    T_k = [ α_1  β_1               0     ]
          [ β_1  α_2   ...               ]    (9.3.4)
          [      ...   ...   β_{k−1}     ]
          [ 0        β_{k−1}   α_k       ]

With x_k = x_0 + Q_k y_k and T_k y_k = Q_k^T r_0, we obtain x_k by updating the factorization T_k = L_k D_k L_k^T, where D_k = diag(d_1, ..., d_k) and L_k is unit lower bidiagonal with subdiagonal entries μ_1, ..., μ_{k−1}:

d_1 = α_1
for i = 2:k
    μ_{i−1} = β_{i−1} / d_{i−1}
    d_i = α_i − β_{i−1} μ_{i−1}
end

Defining C_k = Q_k L_k^{−T} = [c_1, ..., c_k], a comparison of columns in C_k L_k^T = Q_k gives

    c_k = q_k − μ_{k−1} c_{k−1}.    (9.3.6)

Note that we need only calculate the quantities

    μ_{k−1} = β_{k−1} / d_{k−1},    d_k = α_k − β_{k−1} μ_{k−1}    (9.3.7)

in order to obtain L_k and D_k from their predecessors.
Since x_k = x_0 + Q_k y_k = x_0 + C_k L_k^T y_k, we set p_k = L_k^T y_k, so that L_k D_k p_k = Q_k^T r_0:

    [ L_{k−1} D_{k−1}               0   ] [ ρ_1  ]   [ q_1^T r_0 ]
    [                                   ] [ ...  ] = [    ...    ]
    [ 0 ... 0   μ_{k−1} d_{k−1}    d_k  ] [ ρ_k  ]   [ q_k^T r_0 ]

It follows that

    p_k = [ p_{k−1} ]
          [ ρ_k     ]

where

    ρ_k = ( q_k^T r_0 − μ_{k−1} d_{k−1} ρ_{k−1} ) / d_k

and thus,

    x_k = x_0 + C_k p_k = x_{k−1} + ρ_k c_k.

This is precisely the kind of recursive formula for x_k that we need. Together with (9.3.6) and (9.3.7) it enables us to make the transition from (q_{k−1}, c_{k−1}, x_{k−1}) to (q_k, c_k, x_k) with a minimal amount of work and storage.
A further simplification results if we set q_1 to be a unit vector in the direction of the initial residual r_0 = b − A x_0. With this choice for a Lanczos starting vector, q_k^T r_0 = 0 for k ≥ 2.
r_0 = b − A x_0
β_0 = || r_0 ||_2
q_0 = 0
k = 0
while β_k ≠ 0
    q_{k+1} = r_k / β_k
    k = k + 1
    α_k = q_k^T A q_k
    r_k = (A − α_k I) q_k − β_{k−1} q_{k−1}
    β_k = || r_k ||_2
    if k = 1
        d_1 = α_1
        c_1 = q_1
        ρ_1 = β_0 / α_1
        x_1 = ρ_1 q_1
    else
        μ_{k−1} = β_{k−1} / d_{k−1}
        d_k = α_k − β_{k−1} μ_{k−1}
        c_k = q_k − μ_{k−1} c_{k−1}
        ρ_k = −μ_{k−1} d_{k−1} ρ_{k−1} / d_k
        x_k = x_{k−1} + ρ_k c_k
    end
end
x = x_k
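A direct transliteration of this algorithm into NumPy (a sketch for symmetric positive definite A, with no safeguards against small d_k):

import numpy as np

def lanczos_solve(A, b_rhs, x0, tol=1e-12, maxit=200):
    x = x0.astype(float).copy()
    r = b_rhs - A @ x
    beta = np.linalg.norm(r)                  # beta_0
    beta0 = beta
    q_old = np.zeros_like(x)
    c = np.zeros_like(x)
    d = rho = 0.0
    k = 0
    while beta > tol and k < maxit:
        q = r / beta                          # q_{k+1} = r_k / beta_k
        beta_prev = beta
        k += 1
        w = A @ q
        alpha = q @ w                         # alpha_k
        r = w - alpha * q - beta_prev * q_old
        beta = np.linalg.norm(r)              # beta_k
        if k == 1:
            d, c, rho = alpha, q.copy(), beta0 / alpha
        else:
            mu = beta_prev / d                # mu_{k-1}
            d_prev, rho_prev = d, rho
            d = alpha - beta_prev * mu        # d_k
            c = q - mu * c                    # c_k
            rho = -mu * d_prev * rho_prev / d # rho_k
        x = x + rho * c                       # x_k = x_{k-1} + rho_k c_k
        q_old = q
    return x

For symmetric positive definite A this produces, in exact arithmetic, the same iterates as the conjugate gradient method of §10.2.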
    T_k J_1 ··· J_{k−1} = L_k = [ d_1                          0   ]
                                [ e_1   d_2                        ]
                                [ f_1   e_2   d_3                  ]
                                [       ...   ...    ...           ]
                                [ 0        f_{k−2}  e_{k−1}   d_k  ]

where the J_i are Givens rotations. A related approach works with

    H_k = [    T_k     ]
          [ β_k e_k^T  ]

This (k+1)-by-k matrix is upper Hessenberg and figures in the MINRES method of Paige and Saunders (1975). In this technique x_k minimizes || A x − b ||_2 over the set x_0 + span{q_1, ..., q_k}.
    U^T A V = [ B ]      B = [ α_1  β_1             0     ]
              [ 0 ]          [      α_2   ...             ]    (9.3.8)
                             [            ...   β_{n−1}   ]
                             [ 0                 α_n      ]

Recall from §5.4.3 that this factorization may be computed using House-
holder transformations and that it serves as a front end for the SVD algo-
rithm.
Unfortunately, if A is large and sparse, then we can expect large, dense submatrices to arise during the Householder bidiagonalization. Consequently, it would be nice to develop a means for computing B directly without any orthogonal updates of the matrix A.

Proceeding just as we did in §9.1.2 we compare columns in the equations AV = UB and A^T U = V B^T for k = 1:n and obtain

    A v_k = α_k u_k + β_{k−1} u_{k−1},      β_0 u_0 ≡ 0
    A^T u_k = α_k v_k + β_k v_{k+1},        β_n v_{n+1} ≡ 0    (9.3.9)

Defining

    r_k = A v_k − β_{k−1} u_{k−1}
    p_k = A^T u_k − α_k v_k

we may conclude from orthonormality that α_k = ± || r_k ||_2, u_k = r_k / α_k, β_k = ± || p_k ||_2, and v_{k+1} = p_k / β_k. Properly sequenced, these equations define the Lanczos method for bidiagonalizing a rectangular matrix:
v_1 = given unit 2-norm n-vector
p_0 = v_1; β_0 = 1; k = 0; u_0 = 0
while β_k ≠ 0
    v_{k+1} = p_k / β_k
    k = k + 1
    r_k = A v_k − β_{k−1} u_{k−1}    (9.3.10)
    α_k = || r_k ||_2
    u_k = r_k / α_k
    p_k = A^T u_k − α_k v_k
    β_k = || p_k ||_2
end
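A NumPy sketch of (9.3.10), returning the bands of B and the columns of U and V (no reorthogonalization):

import numpy as np

def lanczos_bidiag(A, v1, steps):
    m, n = A.shape
    U, V = np.zeros((m, steps)), np.zeros((n, steps))
    alpha, beta = np.zeros(steps), np.zeros(steps)
    p, b = v1 / np.linalg.norm(v1), 1.0        # p_0 = v_1, beta_0 = 1
    u_old = np.zeros(m)
    for k in range(steps):
        v = p / b                              # v_{k+1} = p_k / beta_k
        V[:, k] = v
        r = A @ v - b * u_old                  # r_k = A v_k - beta_{k-1} u_{k-1}
        a = np.linalg.norm(r)                  # alpha_k
        u = r / a
        U[:, k] = u
        p = A.T @ u - a * v                    # p_k = A^T u_k - alpha_k v_k
        b = np.linalg.norm(p)                  # beta_k
        alpha[k], beta[k] = a, b
        u_old = u
        if b == 0.0:
            break
    return U, V, alpha, beta

Here U^T A V reproduces the upper bidiagonal B with diagonal alpha and superdiagonal beta[:-1], up to the loss of orthogonality discussed in §9.2.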
If the bidiagonalization is carried to completion, the LS solution is expressible as

    x_LS = Σ_{i=1}^{n} y_i v_i

where y = [y_1, ..., y_n]^T solves the bidiagonal system B y = [u_1^T b, ..., u_n^T b]^T. Note that because B is upper bidiagonal, we cannot solve for y until the bidiagonalization is complete. Moreover, we are required to save the vectors v_1, ..., v_n, an unhappy circumstance if n is large.

The development of a sparse least squares algorithm based on the bidiagonalization can be accomplished more favorably if A is reduced to lower
bidiagonal form

    U^T A V = B = [ α_1              0   ]
                  [ β_1   α_2            ]
                  [       β_2   ...      ]
                  [             ...  α_n ]
                  [ 0            β_n     ]

where V = [v_1, ..., v_n] and U = [u_1, ..., u_m] are orthogonal. Comparing columns in the equations A^T U = V B^T and A V = U B, we obtain recurrences analogous to (9.3.9) that generate the u_k and v_k. Working with the QR factorization of the leading portion of B and setting W_k = V_k R_k^{−1}, the k-th least squares iterate obeys a short recursion that involves the last column of W_k. The net result is a sparse LS algorithm referred to as LSQR that requires only a few n-vectors of storage to implement.
Problems
P9.3.1 Modify Algorithm 9.3.1 so that it implements the indefinite symmetric solver
outlined in §9.3.2.
P9.3.2 How many vector workspaces are required to implement efficiently (9.3.10)?
P9.3.3 Suppose A is rank deficient and ak = 0 in (9.3.10). How could Uk be obtained
so that the iteration could continue?
P9.3.4 Work out the lower bidiagonal version of (9.3.10) and detail the least squares solver sketched in §9.3.4.
Much of the material in this section has been distilled from the following papers:
O. Widlund (1978). "A Lanczos Method for a Class of Nonsymmetric Systems of Linear
Equations," SIAM J. Numer. Anal. 15, 801-12.
B.N. Parlett (1980). "A New Look at the Lanczos Algorithm for Solving Symmetric
Systems of Linear Equations," Lin. Alg. and Its Applic. 29, 323-46.
G.H. Golub, F.T. Luk, and M. Overton (1981). "A Block Lanczos Method for Computing
the Singular Values and Corresponding Singular Vectors of a Matrix," ACM Trans.
Math. Soft. 7, 149-69.
J. Cullum, R.A. Willoughby, and M. Lake (1983). "A Lanczos Algorithm for Computing
Singular Values and Vectors of Large Matrices," SIAM J. Sci. and Stat. Comp. 4,
197-215.
Y. Saad (1987). "On the Lanczos Method for Solving Symmetric Systems with Several
Right Hand Sides," Math. Comp. 48, 651-662.
M. Berry and G.H. Golub (1991). "Estimating the Largest Singular Values of Large
Sparse Matrices via Modified Moments," Numerical Algorithms 1, 353-374.
C.C. Paige, B.N. Parlett, and H.A. Van Der Vorst (1995). "Approximate Solutions and
Eigenvalue Bounds from Krylov Subspaces," Numer. Linear Algebra with Applic. 2,
115-134.
    h_{k+1,k} q_{k+1} = A q_k − Σ_{i=1}^{k} h_{ik} q_i ≡ r_k

where h_{ik} = q_i^T A q_k for i = 1:k. It follows that if r_k ≠ 0, then q_{k+1} is specified by

    q_{k+1} = r_k / h_{k+1,k}

where h_{k+1,k} = || r_k ||_2. These equations define the Arnoldi process and in strict analogy to the symmetric Lanczos process (9.1.3) we obtain:
r_0 = q_1
h_{10} = 1
k = 0
while (h_{k+1,k} ≠ 0)
    q_{k+1} = r_k / h_{k+1,k}
    k = k + 1
    r_k = A q_k    (9.4.1)
    for i = 1:k
        h_{ik} = q_i^T r_k
        r_k = r_k − h_{ik} q_i
    end
    h_{k+1,k} = || r_k ||_2
end
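A modified Gram-Schmidt realization of (9.4.1) in NumPy (a sketch; the O(kn) work per step and the storage of all Arnoldi vectors are exactly the costs discussed below):

import numpy as np

def arnoldi(A, q1, m):
    n = q1.size
    Q = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    Q[:, 0] = q1 / np.linalg.norm(q1)
    for k in range(m):
        r = A @ Q[:, k]
        for i in range(k + 1):                 # orthogonalize against q_1..q_{k+1}
            H[i, k] = Q[:, i] @ r
            r -= H[i, k] * Q[:, i]
        H[k + 1, k] = np.linalg.norm(r)
        if H[k + 1, k] == 0.0:                 # invariant subspace found
            return Q[:, :k + 1], H[:k + 1, :k + 1]
        Q[:, k + 1] = r / H[k + 1, k]
    return Q, H

The Ritz values are the eigenvalues of the square Hessenberg block, np.linalg.eigvals(H[:m, :m]).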
We assume that q_1 is a given unit 2-norm starting vector. The q_k are called the Arnoldi vectors and they define an orthonormal basis for the Krylov subspace K(A, q_1, k):

    span{q_1, ..., q_k} = K(A, q_1, k).    (9.4.2)

The situation after k steps is summarized by the k-step Arnoldi factorization

    A Q_k = Q_k H_k + r_k e_k^T    (9.4.3)

where H_k is upper Hessenberg:

    H_k = [ h_11   h_12   ...      h_1k ]
          [ h_21   h_22   ...      h_2k ]
          [        ...    ...           ]
          [ 0        h_{k,k−1}     h_kk ]
If r_k = 0, then the columns of Q_k define an invariant subspace and λ(H_k) ⊆ λ(A). Otherwise, the focus is on how to extract information about A's eigensystem from the Hessenberg matrix H_k and the matrix Q_k of Arnoldi vectors.

If y ∈ R^k is a unit 2-norm eigenvector for H_k and H_k y = λ y, then from (9.4.3)

    (A − λ I) x = (e_k^T y) r_k

where x = Q_k y. We call λ a Ritz value and x the corresponding Ritz vector. The size of |e_k^T y| · || r_k ||_2 can be used to obtain error bounds, although the relevant perturbation theorems are not as routine to apply as in the symmetric case.
Some numerical properties of the Arnoldi iteration are discussed in Wilkinson (1965, p.382). As with the symmetric Lanczos iteration, loss of orthogonality among the q_i is an issue. But two other features of (9.4.1) must be addressed before a practical Arnoldi eigensolver can be obtained:

• The Arnoldi vectors q_1, ..., q_k are referenced in step k and the computation of H_k(1:k, k) involves O(kn) flops. Thus, there is a steep penalty associated with the generation of long Arnoldi sequences.

• The eigenvalues of H_k do not approximate the eigenvalues of A in the style of Kaniel and Paige. This is in contrast to the symmetric case where information about A's extremal eigenvalues emerges quickly. With Arnoldi, the early extraction of eigenvalue information depends crucially on the choice of q_1.

These realities suggest a framework in which we use Arnoldi with repeated, carefully chosen restarts and a controlled iteration maximum. (Recall the s-step Lanczos process of §9.2.7.)
Any restart vector q_+ ∈ K(A, q_1, m) can be written q_+ = p(A) q_1 for some polynomial of degree m − 1. If A v_i = λ_i v_i for i = 1:n and q_1 has the eigenvector expansion

    q_1 = a_1 v_1 + ··· + a_n v_n,

then

    q_+ = a_1 p(λ_1) v_1 + ··· + a_n p(λ_n) v_n.

Note that K(A, q_+, m) is rich in eigenvectors that are emphasized by p(λ). That is, if p(λ_wanted) is large compared to p(λ_unwanted), then the Krylov space K(A, q_+, m) will have much better approximations to the eigenvector x_wanted than to the eigenvector x_unwanted. (It is possible to couch this argument in terms of Schur vectors and invariant subspaces rather than in terms of particular eigenvectors.)

Thus the act of picking a good restart vector q_+ from K(A, q_1, m) is the act of picking a polynomial "filter" that tunes out unwanted portions of the spectrum. Various heuristics for doing this have been developed based on computed Ritz vectors. See Saad (1980, 1984, 1992).
We describe a method due to Sorensen (1992) that determines the restart vector implicitly using the QR iteration with shifts. The restart occurs after every m steps and we assume that m > j where j is the number of sought-after eigenvalues. The choice of the Arnoldi length parameter m depends on the problem dimension n, the effect of orthogonality loss, and system storage constraints.

After m steps we have the Arnoldi factorization A Q_m = Q_m H_m + r_m e_m^T. With p = m − j shifts μ_1, ..., μ_p chosen (say) as unwanted eigenvalues of H_m, we perform p shifted QR steps:

H^{(1)} = H_m
for i = 1:p
    H^{(i)} − μ_i I = V_i R_i    (QR factorization)
    H^{(i+1)} = R_i V_i + μ_i I
end
(2) [V]_{mi} = 0 for i = 1:j−1. This is because each V_i is upper Hessenberg and so V = V_1 ··· V_p ∈ R^{m×m} has lower bandwidth p = m − j. It can be shown that the restart vector produced by this process satisfies

    q_+ = α (A − μ_1 I) ··· (A − μ_p I) q_1    (9.4.4)

where α is a scalar.
Suppose A ∈ R^{n×n} and that a nonsingular Q exists such that Q^{−1} A Q = T is tridiagonal:

    T = [ α_1  γ_1                 0     ]
        [ β_1  α_2   ...                 ]    (9.4.5)
        [      ...   ...    γ_{n−1}      ]
        [ 0        β_{n−1}   α_n         ]

With the column partitionings

    Q = [ q_1, ..., q_n ]
    Q^{−T} = P = [ p_1, ..., p_n ]

we find upon comparing columns in AQ = QT and A^T P = P T^T that

    A q_k = γ_{k−1} q_{k−1} + α_k q_k + β_k q_{k+1}        (γ_0 q_0 ≡ 0)
    A^T p_k = β_{k−1} p_{k−1} + α_k p_k + γ_k p_{k+1}      (β_0 p_0 ≡ 0)

for k = 1:n−1. These equations together with the biorthogonality condition P^T Q = I_n imply

    α_k = p_k^T A q_k

and

    β_k q_{k+1} = r_k ≡ (A − α_k I) q_k − γ_{k−1} q_{k−1},
    γ_k p_{k+1} = s_k ≡ (A^T − α_k I) p_k − β_{k−1} p_{k−1}.

There is some flexibility in choosing the scale factors β_k and γ_k. Note that

    γ_k = s_k^T r_k / β_k.
If the resulting iteration (9.4.7) is run for k steps, then the situation at the bottom of the loop is summarized by the equations

    A Q_k = Q_k T_k + r_k e_k^T
    A^T P_k = P_k T_k^T + s_k e_k^T.

If r_k = 0, then the iteration terminates and span{q_1, ..., q_k} is an invariant subspace for A. If s_k = 0, then the iteration also terminates and span{p_1, ..., p_k} is an invariant subspace for A^T. However, if neither of these conditions is true and s_k^T r_k = 0, then the tridiagonalization process ends without any invariant subspace information. This is called serious breakdown. See Wilkinson (1965, p.389) for an early discussion of the matter.
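A sketch of the resulting two-sided iteration with the scaling β_k = ||r_k||_2 and γ_k = s_k^T r_k / β_k (NumPy; no look-ahead — the loop simply stops on termination or breakdown):

import numpy as np

def unsym_lanczos(A, q1, p1, m):
    n = q1.size
    Q, P = np.zeros((n, m)), np.zeros((n, m))
    alpha, beta, gamma = np.zeros(m), np.zeros(m), np.zeros(m)
    q = q1 / np.linalg.norm(q1)
    p = p1 / (p1 @ q)                    # enforce p_1^T q_1 = 1
    q_old, p_old = np.zeros(n), np.zeros(n)
    b = g = 0.0                          # beta_{k-1}, gamma_{k-1}
    for k in range(m):
        Q[:, k], P[:, k] = q, p
        a = p @ (A @ q)                  # alpha_k = p_k^T A q_k
        alpha[k] = a
        r = A @ q - a * q - g * q_old    # r_k = beta_k q_{k+1}
        s = A.T @ p - a * p - b * p_old  # s_k = gamma_k p_{k+1}
        b = np.linalg.norm(r)
        if b == 0.0:
            break                        # invariant subspace for A
        g = (s @ r) / b                  # gamma_k
        if g == 0.0:
            break                        # s_k = 0, or serious breakdown
        beta[k], gamma[k] = b, g
        q_old, p_old = q, p
        q, p = r / b, s / g
    return Q, P, alpha, beta, gamma

In exact arithmetic P^T Q = I and P^T A Q is tridiagonal with diagonal alpha, subdiagonal beta, and superdiagonal gamma — until breakdown or roundoff intervenes.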
A block analog of (9.4.7) produces a factorization P^T A Q = T in which T is block tridiagonal,

    T = [ M_1   C_1              0      ]
        [ B_1   M_2   ...               ]
        [       ...   ...   C_{r−1}     ]
        [ 0         B_{r−1}  M_r        ]

where all the blocks are p-by-p. Let Q = [Q_1, ..., Q_r] and P = [P_1, ..., P_r] be conformable partitionings of Q and P. Comparing block columns in the equations AQ = QT and A^T P = P T^T we obtain residual blocks R_k and S_k; if S_k^T R_k is nonsingular, then Q_{k+1} and P_{k+1} can be scaled so that they satisfy P_{k+1}^T Q_{k+1} = I_p. Serious breakdown in this setting is associated with having a singular S_k^T R_k.
One way of solving the serious breakdown problem in (9.4.7) is to go after a factorization of the form (9.4.10) in which the block sizes are dynamically determined. Roughly speaking, in this approach matrices Q_{k+1} and P_{k+1} are built up column by column with special recursions that culminate in the production of a nonsingular P_{k+1}^T Q_{k+1}. The computations are arranged so that the biorthogonality conditions P_i^T Q_{k+1} = 0 and Q_i^T P_{k+1} = 0 hold for i = 1:k.

A method of this form belongs to the family of look-ahead Lanczos methods. The length of a look-ahead step is the width of the Q_{k+1} and P_{k+1} that it produces. If that width is one, a conventional Lanczos step may be taken. Length-2 look-ahead steps are discussed in Parlett, Taylor, and Liu (1985). The notion of incurable breakdown is also presented by these authors. Freund, Gutknecht, and Nachtigal (1993) cover the general case along with a host of implementation details, including floating point considerations.
Problems
P9.4.1 Prove that the Arnoldi vectors in (9.4.1) are mutually orthogonal.
P9.4.2 Prove (9.4.4).
P9.4.3 Prove (9.4.6).
P9.4.4 Give an example of a starting vector for which the unsymmetric Lanczos iteration (9.4.7) breaks down without rendering any invariant subspace information. Use a small concrete matrix such as

    A = [ 0  0  1 ]
        [ 1  0  0 ]
        [ 0  1  0 ]
References for the Arnoldi iteration and its practical implementation include Saad (1992)
and
W.E. Arnoldi (1951). "The Principle of Minimized Iterations in the Solution of the
Matrix Eigenvalue Problem," Quarterly of Applied Mathematics 9, 17-29.
Y. Saad (1980). "Variations of Arnoldi's Method for Computing Eigenelements of Large
Unsymmetric Matrices," Lin. Alg. and Its Applic. 34, 269-295.
Y. Saad (1984). "Chebyshev Acceleration Techniques for Solving Nonsymmetric Eigen-
value Problems," Math. Comp. 42, 567-588.
D.C. Sorensen (1992). "Implicit Application of Polynomial Filters in a k-Step Arnoldi
Method," SIAM J. Matrix Anal. Appl. 13, 357-385.
D.C. Sorensen (1995). "Implicitly Restarted Arnoldi/Lanczos Methods for Large Scale
Eigenvalue Calculations," in Proceedings of the ICASE/LaRC Workshop on Parallel
Numerical Algorithms, May 23-25, 1994, D.E. Keyes, A. Sameh, and V. Venkatakr-
ishnan (eds), Kluwer.
R.B. Lehoucq (1995). "Analysis and Implementation of an Implicitly Restarted Arnoldi
Iteration," Ph.D. thesis, Rice University, Houston, Texas.
R.B. Lehoucq (1996). "Restarting an Arnoldi Reduction," Report MCS-P591-0496, Ar-
gonne National Laboratory, Argonne, Illinois.
R.B. Lehoucq and D.C. Sorensen (1996). "Deflation Techniques for an Implicitly Restarted
Iteration," SIAM J. Matrix Analysis and Applic, to appear.
R.B. Morgan (1996). "On Restarting the Arnoldi Method for Large Nonsymmetric
Eigenvalue Problems," Math. Comp. 65, 1213-1230.
Related papers include
A. Ruhe (1984). "Rational Krylov Algorithms for Eigenvalue Computation," Lin. Alg.
and Its Applic. 58, 391-405.
A. Ruhe (1994). "Rational Krylov Algorithms for Nonsymmetric Eigenvalue Problems
II: Matrix Pairs," Lin. Alg. and Its Applic. 191, 283-295.
A. Ruhe (1994). "The Rational Krylov Algorithm for Nonsymmetric Eigenvalue Prob-
lems III: Complex Shifts for Real Matrices," BIT 34, 165-176.
T. Huckle (1994). "The Arnoldi Method for Normal Matrices," SIAM J. Matrix Anal.
Appl. 15, 479-489.
C.C. Paige, B.N. Parlett, and H.A. Van Der Vorst (1995). "Approximate Solutions and
Eigenvalue Bounds from Krylov Subspaces," Numer. Linear Algebra with Applic. 2,
115-134.
K.C. Toh and L.N. Trefethen (1996). "Calculation of Pseudospectra by the Arnoldi
Iteration," SIAM J. Sci. Comp. 17, 1-15.
The unsymmetric Lanczos process and related look-ahead ideas are nicely presented in
B.N. Parlett, D. Taylor, and Z. Liu (1985). "A Look-Ahead Lanczos Algorithm for
Unsymmetric Matrices," Math. Comp. 44, 105-124.
R.W. Freund, M. Gutknecht, and N. Nachtigal (1993). "An Implementation of the
Look-Ahead Lanczos Algorithm for Non-Hermitian Matrices," SIAM J. Sci. and
Stat. Comp. 14, 137-158.
See also
Y. Saad (1982). "The Lanczos Biorthogonalization Algorithm and Other Oblique Pro-
jection Methods for Solving Large Unsymmetric Eigenproblems," SIAM J. Numer.
Anal. 19, 485-506.
G.A. Geist (1991). "Reduction of a General Matrix to Tridiagonal Form," SIAM J.
Matrix Anal. Appl. 12, 362-373.
C. Brezinski, M. Zaglia, and H. Sadok (1991). "Avoiding Breakdown and Near Break-
down in Lanczos Type Algorithms," Numer. Alg. 1, 261-284.
S.K. Kim and A.T. Chronopoulos (1991). "A Class of Lanczos-Like Algorithms Imple-
mented on Parallel Computers," Parallel Comput. 17, 763-778.
B.N. Parlett (1992). "Reduction to Tridiagonal Form and Minimal Realizations," SIAM
J. Matrix Anal. Appl. 13, 567-593.
M. Gutknecht (1992). "A Completed Theory of the Unsymmetric Lanczos Process and
Related Algorithms, Part I," SIAM J. Matrix Anal. Appl. 13, 594-639.
M. Gutknecht (1994). "A Completed Theory of the Unsymmetric Lanczos Process and
Related Algorithms, Part II," SIAM J. Matrix Anal. Appl. 15, 15-58.
Z. Bai (1994). "Error Analysis of the Lanczos Algorithm for the Nonsymmetric Eigen-
value Problem," Math. Comp. 62, 209-226.
T. Huckle (1995). "Low-Rank Modification of the Unsymmetric Lanczos Algorithm,"
Math. Comp. 64, 1577-1588.
Z. Jia (1995). "The Convergence of Generalized Lanczos Methods for Large Unsymmetric
Eigenproblems," SIAM J. Matrix Anal. Applic. 16, 543-562.
M.T. Chu, R.E. Funderlic, and G.H. Golub (1995). "A Rank-One Reduction Formula
and Its Applications to Matrix Factorizations," SIAM Review 37, 512-530.
Other papers include
H.A. Van der Vorst (1982). "A Generalized Lanczos Scheme," Math. Comp. 39, 559-562.
D. Boley and G.H. Golub (1984). "The Lanczos-Arnoldi Algorithm and Controllability,"
Syst. Control Lett. 4, 317-324.
Chapter 10
Iterative Methods for Linear Systems
for i = 1:n
    x_i^{(k+1)} = ( b_i − Σ_{j=1}^{i−1} a_{ij} x_j^{(k)} − Σ_{j=i+1}^{n} a_{ij} x_j^{(k)} ) / a_{ii}    (10.1.2)
end

Note that in the Jacobi iteration one does not use the most recently available information when computing x_i^{(k+1)}. For example, x_1^{(k)} is used in the calculation of x_2^{(k+1)} even though component x_1^{(k+1)} is known. If we revise the Jacobi iteration so that we always use the most current estimate of the exact x_i then we obtain:

for i = 1:n
    x_i^{(k+1)} = ( b_i − Σ_{j=1}^{i−1} a_{ij} x_j^{(k+1)} − Σ_{j=i+1}^{n} a_{ij} x_j^{(k)} ) / a_{ii}
end

This defines what is called the Gauss-Seidel iteration.
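In NumPy the two sweeps look as follows (a sketch for dense A; for sparse problems one would of course exploit structure, as is stressed below):

import numpy as np

def jacobi_sweep(A, b, x):
    # x_i^{(k+1)} = (b_i - sum_{j != i} a_ij x_j^{(k)}) / a_ii
    d = np.diag(A)
    return (b - A @ x + d * x) / d

def gauss_seidel_sweep(A, b, x):
    # same formula, but each new component is used as soon as it is computed
    x = x.copy()
    for i in range(b.size):
        x[i] = (b[i] - A[i, :i] @ x[:i] - A[i, i+1:] @ x[i+1:]) / A[i, i]
    return x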
For both the Jacobi and Gauss-Seidel iterations, the transition from x^{(k)} to x^{(k+1)} can be succinctly described in terms of the matrices L, D, and U defined by:

    L = [ 0                             0 ]
        [ a_21   0                        ]
        [ a_31   a_32   0                 ]
        [ ...           ...               ]
        [ a_n1   a_n2   ...  a_{n,n−1}  0 ]

    D = diag(a_11, ..., a_nn)

    U = [ 0   a_12   ...         a_1n    ]
        [     0      ...                 ]
        [            ...    a_{n−2,n}    ]
        [                   a_{n−1,n}    ]
        [ 0                     0        ]

In particular, the Jacobi step has the form M_J x^{(k+1)} = N_J x^{(k)} + b where M_J = D and N_J = −(L + U). On the other hand, Gauss-Seidel is defined by M_G x^{(k+1)} = N_G x^{(k)} + b with M_G = (D + L) and N_G = −U.
For example, consider the Jacobi iteration, D x^{(k+1)} = −(L + U) x^{(k)} + b. One condition that guarantees ρ(M_J^{−1} N_J) < 1 is strict diagonal dominance. Indeed, if A has that property (defined in §3.4.10), then

    ρ(M_J^{−1} N_J) ≤ || D^{−1}(L + U) ||_∞ = max_{1≤i≤n} Σ_{j≠i} |a_{ij}| / |a_{ii}| < 1.

Usually, the "more dominant" the diagonal, the more rapid the convergence, but there are counterexamples. See P10.1.7.
A more complicated spectral radius argument is needed to show that Gauss-Seidel converges for symmetric positive definite A. The key step shows that any eigenvalue λ of the Gauss-Seidel iteration matrix can be written so that

    |λ|^2 = | (−a + bi) / (1 + a + bi) |^2 = (a^2 + b^2) / (1 + 2a + a^2 + b^2)

with a > 0, whence |λ| < 1. This result is frequently applicable because many of the matrices that arise from discretized elliptic PDE's are symmetric positive definite. Numerous other results of this flavor appear in the literature.
In general, each step of an iteration M x^{(k+1)} = N x^{(k)} + b requires a diagonal or triangular solve plus a matrix-vector product. This computation requires about twice as many flops as there are nonzero entries in the matrix A. It makes no sense to be more precise about the work involved because the actual implementation depends greatly upon the structure of the problem at hand.
In order to stress this point we consider the application of (10.1.3) to the NM-by-NM block tridiagonal system

    [ T   −I              0 ] [ g_1 ]   [ f_1 ]
    [ −I   T   ...          ] [ g_2 ]   [ f_2 ]
    [      ...  ...   −I    ] [ ... ] = [ ... ]    (10.1.6)
    [ 0        −I      T    ] [ g_M ]   [ f_M ]

where

    T = [ 4  −1          0 ]      g_j = [ G(1,j) ]      f_j = [ F(1,j) ]
        [ −1  4  ...       ]            [ G(2,j) ]            [ F(2,j) ]
        [        ...   −1  ]            [  ...   ]            [  ...   ]
        [ 0      −1     4  ]            [ G(N,j) ]            [ F(N,j) ]
for j = 1:M
    for i = 1:N
        G(i,j) = ( F(i,j) + G(i−1,j) + G(i+1,j) + G(i,j−1) + G(i,j+1) ) / 4
    end
end
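The double loop above translates directly (a sketch; the array G carries an explicit ring of zero boundary values so the five-point update applies uniformly):

import numpy as np

def model_problem_gs(F, sweeps):
    N, M = F.shape
    G = np.zeros((N + 2, M + 2))          # zero boundary ring, 1-based interior
    for _ in range(sweeps):
        for j in range(1, M + 1):
            for i in range(1, N + 1):
                G[i, j] = (F[i-1, j-1] + G[i-1, j] + G[i+1, j]
                           + G[i, j-1] + G[i, j+1]) / 4.0
    return G[1:-1, 1:-1]

Note the absence of any matrix data structure: the five-point structure of (10.1.6) lives entirely in the update formula.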
Subject to this constraint, we consider how to choose the ν_j(k) so that the error in y^{(k)} is minimized.
Recalling from the proof of Theorem 10.1.1 that x^{(k)} − x = (M^{−1}N)^k e^{(0)} where e^{(0)} = x^{(0)} − x, we see that

    y^{(k)} − x = Σ_{j=0}^{k} ν_j(k) (x^{(j)} − x) = Σ_{j=0}^{k} ν_j(k) (M^{−1}N)^j e^{(0)}.

Thus,

    y^{(k)} − x = p_k(G) e^{(0)}    (10.1.11)

where G = M^{−1}N and

    p_k(z) = Σ_{j=0}^{k} ν_j(k) z^j.

Note that the condition (10.1.10) implies p_k(1) = 1.
At this point we assume that G is symmetric with eigenvalues λ_i that satisfy −1 < α ≤ λ_n ≤ ··· ≤ λ_1 ≤ β < 1. It follows that

    || p_k(G) ||_2 = max_{1≤i≤n} |p_k(λ_i)| ≤ max_{α≤λ≤β} |p_k(λ)|.

Thus, to make the norm of p_k(G) small, we need a polynomial p_k(z) that is small on [α, β] subject to the constraint that p_k(1) = 1.
Consider the Chebyshev polynomials c_j(z) generated by the recursion c_j(z) = 2 z c_{j−1}(z) − c_{j−2}(z) where c_0(z) = 1 and c_1(z) = z. These polynomials satisfy |c_j(z)| ≤ 1 on [−1, 1] but grow rapidly off this interval. As a consequence, the polynomial

    p_k(z) = c_k( −1 + 2(z − α)/(β − α) ) / c_k(μ)

where

    μ = −1 + 2(1 − α)/(β − α) = 1 + 2(1 − β)/(β − α)

satisfies p_k(1) = 1 and tends to be small on [α, β].
tacitly assuming that n is large and thus the retrieval of x^{(0)}, ..., x^{(k)} for large k would be inconvenient or even impossible.

Fortunately, it is possible to derive a three-term recurrence among the y^{(k)} by exploiting the three-term recurrence among the Chebyshev polynomials. In particular, it can be shown that if

    ω_{k+1} = 2 μ c_k(μ) / c_{k+1}(μ),

then

    y^{(k+1)} = ω_{k+1} ( y^{(k)} − y^{(k−1)} + γ M^{−1}(b − A y^{(k)}) ) + y^{(k−1)},
    γ = 2 / (2 − α − β),

where y^{(0)} = x^{(0)} and y^{(1)} = x^{(1)}. We refer to this scheme as the Chebyshev semi-iterative method associated with M y^{(k+1)} = N y^{(k)} + b. For the acceleration to be effective we need good lower and upper bounds α and β. As in SOR, these parameters may be difficult to ascertain except in a few structured problems.

Chebyshev semi-iterative methods are extensively analyzed in Varga (1962, chapter 5), as well as in Golub and Varga (1961).
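A sketch of the accelerated iteration for the Jacobi splitting M = D, with the bounds α and β supplied as inputs (for the 1-D model problem in the demo they are known exactly; the first step is the damped basic step y^{(0)} + γ D^{−1} r^{(0)}):

import numpy as np

def chebyshev_jacobi(A, b, a_bnd, b_bnd, steps):
    # Chebyshev acceleration of D y^{(k+1)} = (D - A) y^{(k)} + b, assuming the
    # eigenvalues of G = I - D^{-1} A lie in [a_bnd, b_bnd] with -1 < a_bnd, b_bnd < 1.
    d = np.diag(A)
    mu = (2.0 - a_bnd - b_bnd) / (b_bnd - a_bnd)
    gamma = 2.0 / (2.0 - a_bnd - b_bnd)
    y_old = np.zeros_like(b)
    y = y_old + gamma * (b - A @ y_old) / d          # first (damped) step
    c_km1, c_k = 1.0, mu                             # c_0(mu), c_1(mu)
    for _ in range(steps):
        c_kp1 = 2.0 * mu * c_k - c_km1               # Chebyshev recursion
        omega = 2.0 * mu * c_k / c_kp1               # omega_{k+1}
        y, y_old = omega * (y - y_old + gamma * (b - A @ y) / d) + y_old, y
        c_km1, c_k = c_k, c_kp1
    return y

n = 50                                               # 1-D model problem
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
bnd = np.cos(np.pi / (n + 1))                        # Jacobi spectral bound
y = chebyshev_jacobi(A, np.ones(n), -bnd, bnd, 200)
print(np.linalg.norm(np.ones(n) - A @ y))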
for i = 1:n
    x_i^{(k+1)} = ω ( b_i − Σ_{j<i} a_{ij} x_j^{(k+1)} − Σ_{j>i} a_{ij} x_j^{(k)} ) / a_{ii}
                  + (1 − ω) x_i^{(k)}    (10.1.13)
end

In matrix terms this successive over-relaxation (SOR) step is M_ω x^{(k+1)} = N_ω x^{(k)} + b with M_ω = (D + ωL)/ω and N_ω = ((1 − ω)D − ωU)/ω. A symmetric variant follows each forward sweep with a backward sweep:

    M_ω x^{(k+1/2)} = N_ω x^{(k)} + b
    M_ω^T x^{(k+1)} = N_ω^T x^{(k+1/2)} + b    (10.1.15)

It is clear that G = M_ω^{−T} N_ω^T M_ω^{−1} N_ω is the iteration matrix for this method. From the definitions of M_ω and N_ω it follows that

    M_ω + N_ω^T = ((2 − ω)/ω) D.    (10.1.16)

If D has positive diagonal entries and K K^T = N_ω^T D^{−1} N_ω is the Cholesky factorization, then K^T G K^{−T} = K^T (M_ω D^{−1} M_ω^T)^{−1} K. Thus, G is similar to a symmetric matrix and has real eigenvalues.

The iteration (10.1.15) is called the symmetric successive over-relaxation (SSOR) method. It is frequently used in conjunction with the Chebyshev semi-iterative acceleration.
Problems
=
PlO.l.l Show that the Jacobi iteration can be written in the form x<k+t) x<•>+Hr(k)
where r<•l = b- Ax<•>. Repeat for the Gauss-Seidel iteration.
P10.1.2 Show that if A is strictly diagonally dominant, then the Gauss-Seidel iteration
converges.
P10.1.3 Show that the Jacobi iteration converges for 2-by-2 symmetric positive definite
systems.
P10.1.4 Show that if A = M − N is singular, then we can never have ρ(M^{−1}N) < 1 even if M is nonsingular.
P10.1.5 Prove (10.1.16).
P10.1.6 Prove the converse of Theorem 10.1.1. In other words, show that if the iteration M x^{(k+1)} = N x^{(k)} + b always converges, then ρ(M^{−1}N) < 1.
P10.1.7 (Supplied by R.S. Varga) Suppose that A_1 and A_2 are two matrices with A_1 having the more dominant diagonal. Let J_1 and J_2 be the associated Jacobi iteration matrices. Show that ρ(J_1) > ρ(J_2) is possible, thereby refuting the claim that greater diagonal dominance implies more rapid Jacobi convergence.
P10.1.8 The Chebyshev algorithm is defined in terms of parameters

    ω_{k+1} = 2 c_k(1/ρ) / ( ρ c_{k+1}(1/ρ) )

where c_k(λ) = cosh( k cosh^{−1}(λ) ) with λ > 1. (a) Show that 1 < ω_k < 2 for k > 1 whenever 0 < ρ < 1. (b) Verify that ω_{k+1} < ω_k. (c) Determine lim ω_k as k → ∞.
P10.1.9 Consider the 2-by-2 matrix

    A = [ 1   s ]
        [ −s  1 ]

(a) Under what conditions will Gauss-Seidel converge with this matrix? (b) For what range of ω will the SOR method converge? What is the optimal choice for this parameter? (c) Repeat (a) and (b) for the matrix

    A = [ I_n    S  ]
        [ −S^T  I_m ]
As we mentioned, Young (1971) has the most comprehensive treatment of the SOR method. The object of "SOR theory" is to guide the user in choosing the relaxation parameter ω. In this setting, the ordering of equations and unknowns is critical. See
M.J.M. Bernal and J.H. Verner (1968). "On Generalizing of the Theory of Consistent
Orderings for Successive Over-Relaxation Methods," Numer. Math. 12, 216-22.
D.M. Young (1970). "Convergence Properties of the Symmetric and Unsymmetric Over-
Relaxation Methods," Math. Comp. 24, 793-807.
D.M. Young (1972). "Generalization of Property A and Consistent Ordering," SIAM J.
Num. Anal. 9, 454-63.
R.A. Nicolaides (1974). "On a Geometrical Aspect of SOR and the Theory of Consistent
Ordering for Positive Definite Matrices," Numer. Math. 12, 99-104.
L. Adams and H. Jordan (1986). "Is SOR Color-Blind?" SIAM J. Sci. Stat. Comp. 7,
490-506.
M. Eiermann and R.S. Varga (1993). "Is the Optimal ω Best for the SOR Iteration
Method," Lin. Alg. and Its Applic. 182, 257-277.
G.H. Golub and R.S. Varga (1961). "Chebychev Semi-Iterative Methods, Successive
Over-Relaxation Iterative Methods, and Second-Order Richardson Iterative Methods,
Parts I and II," Numer. Math. 3, 147-56, 157-68.
This work is premised on the assumption that the underlying iteration matrix has real
eigenvalues. How to proceed when this is not the case is discussed in
T.A. Manteuffel (1977). "The Tchebychev Iteration for Nonsymmetric Linear Systems,"
Numer. Math. 28, 307-27.
M. Eiermann and W. Niethammer (1983). "On the Construction of Semi-iterative Meth-
ods," SIAM J. Numer. Anal. 20, 1153-1160.
W. Niethammer and R.S. Varga (1983). "The Analysis of k-step Iterative Methods for
Linear Systems from Summability Theory," Numer. Math. 41, 177-206.
G.H. Golub and M. Overton (1988). ''The Convergence of Inexact Chebychev and
Richardson Iterative Methods for Solving Linear Systems," Numer. Math. 53, 571-
594.
D. Calvetti, G.H. Golub, and L. Reichel (1994). "An Adaptive Chebyshev Iterative
Method for Nonsymmetric Linear Systems Based on Modified Moments," Numer.
Math. 67, 21-40.
Other unsymmetric methods include
J.W. Sheldon (1955). "On the Numerical Solution of Elliptic Difference Equations,"
Math. Tables Aids Comp. 9, 101-12.
The parallel implementation of the classical iterations has received some attention. See
D.J. Evans (1984). "Parallel SOR Iterative Methods," Parallel Computing 1, 3-18.
N. Patel and H. Jordan (1984). "A Parallelized Point Rowwise Successive Over-Relaxation
Method on a Multiprocessor," Parallel Computing 1, 207-222.
R.J. Plemmons (1986). "A Parallel Block Iterative Scheme Applied to Computations in
Structural Analysis," SIAM J. Alg. and Disc. Methods 7, 337-347.
C. Kamath and A. Sameh (1989). "A Projection Method for Solving Nonsymmetric
Linear Systems on Multiprocessors," Parallel Computing 9, 291-312.
We have seen that the condition κ(A) is an important issue when direct methods are applied to Ax = b. However, the condition of the system also has a bearing on iterative method performance. See
M. Arioli and F. Romani (1985). "Relations Between Condition Numbers and the Con-
vergence of the Jacobi Method for Real Positive Definite Matrices," Numer. Math.
46, 31-42.
M. Arioli, I.S. Duff, and D. Ruiz (1992). "Stopping Criteria for Iterative Solvers," SIAM
J. Matrix Anal. Appl. 13, 138-144.
A. Dax (1990). "The Convergence of Linear Stationary Iterative Processes for Solving
Singular Unstructured Systems of Linear Equations," SIAM Review 32, 611-635.
Finally, the effect of rounding errors on the methods of this section is treated in
the residual of x_c. If the residual is nonzero, then there exists a positive α such that φ(x_c + α r_c) < φ(x_c). In the method of steepest descent (with
This gives:

x_0 = initial guess
r_0 = b − A x_0
k = 0
while r_k ≠ 0
    k = k + 1
    α_k = r_{k−1}^T r_{k−1} / r_{k−1}^T A r_{k−1}    (10.2.1)
    x_k = x_{k−1} + α_k r_{k−1}
    r_k = b − A x_k
end
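As a sketch (symmetric positive definite A assumed):

import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, maxit=5000):
    x = x0.astype(float).copy()
    r = b - A @ x
    k = 0
    while np.linalg.norm(r) > tol and k < maxit:
        k += 1
        Ar = A @ r
        a = (r @ r) / (r @ Ar)     # exact line search along the residual
        x = x + a * r
        r = b - A @ x              # recomputed residual, as in (10.2.1)
    return x, k

The iteration count grows with κ_2(A), which motivates the conjugate direction ideas developed next.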
It can be shown that

    || x_k − x ||_A ≤ ( (κ_2(A) − 1)/(κ_2(A) + 1) )^k || x_0 − x ||_A,    (10.2.3)

so convergence is painfully slow when A is ill-conditioned. The remedy is to search along a more carefully chosen set of directions:
x_0 = initial guess
r_0 = b − A x_0
k = 0
while r_k ≠ 0
    k = k + 1
    Choose a direction p_k such that p_k^T r_{k−1} ≠ 0.    (10.2.4)
    α_k = p_k^T r_{k−1} / p_k^T A p_k
    x_k = x_{k−1} + α_k p_k
    r_k = b − A x_k
end
Note that
Our goal is to choose the search directions in a way that guarantees con-
vergence without the shortcomings of steepest descent.
If

    x_k = x_0 + P_{k-1} y + α p_k

where P_{k-1} = [p_1, ..., p_{k-1}], y ∈ ℝ^{k-1}, and α ∈ ℝ, then

    φ(x_k) = φ(x_0 + P_{k-1} y) + α y^T P_{k-1}^T A p_k + (α²/2) p_k^T A p_k − α p_k^T r_0 .

If p_k ∈ span{A p_1, ..., A p_{k-1}}^⊥, then the cross term α y^T P_{k-1}^T A p_k is zero and the search for the minimizing x_k splits into a pair of uncoupled minimizations, one for y and one for α:

    min_{y,α} φ(x_k) = min_y φ(x_0 + P_{k-1} y) + min_α ( (α²/2) p_k^T A p_k − α p_k^T r_0 ) .
Note that if y_{k-1} solves the first min problem then x_{k-1} = x_0 + P_{k-1} y_{k-1} minimizes φ over x_0 + span{p_1, ..., p_{k-1}}. The solution to the α min problem is given by α_k = p_k^T r_0 / p_k^T A p_k. Note that because of A-conjugacy,

    A^{-1} b ∉ x_0 + span{p_1, ..., p_{k-1}}  ⇒  b ∉ A x_0 + span{A p_1, ..., A p_{k-1}}
                                              ⇒  r_0 ∉ span{A p_1, ..., A p_{k-1}} .

Thus there exists a p ∈ span{A p_1, ..., A p_{k-1}}^⊥ such that p^T r_0 ≠ 0. But x_{k-1} ∈ x_0 + span{p_1, ..., p_{k-1}} and so r_{k-1} ∈ r_0 + span{A p_1, ..., A p_{k-1}}. It follows that p^T r_{k-1} = p^T r_0 ≠ 0. □
The search directions in (10.2.6) are said to be A-conjugate because p_i^T A p_j = 0 for all i ≠ j. Note that if P_k = [p_1, ..., p_k] is the matrix of these vectors, then

    P_k^T A P_k = diag(p_1^T A p_1, ..., p_k^T A p_k)

is nonsingular since A is positive definite and the search directions are nonzero. It follows that P_k has full column rank. This guarantees convergence in (10.2.6) in at most n steps because x_n (if we get that far) minimizes φ(x) over ran(P_n) = ℝ^n.
    x_0 = initial guess
    k = 0
    r_0 = b − A x_0
    while r_k ≠ 0
        k = k + 1
        if k = 1
            p_1 = r_0
        else                                               (10.2.7)
            Let p_k minimize || p − r_{k-1} ||_2 over all vectors
            p ∈ span{A p_1, ..., A p_{k-1}}^⊥.
        end
        α_k = p_k^T r_{k-1} / p_k^T A p_k
        x_k = x_{k-1} + α_k p_k
        r_k = b − A x_k
    end
    x = x_k
where P_{k-1} = [p_1, ..., p_{k-1}] and z_{k-1} solves

    min_{z ∈ ℝ^{k-1}} || r_{k-1} − A P_{k-1} z ||_2 .

Proof. Suppose z_{k-1} solves the above LS problem and let p be the associated minimum residual:
But this means that y_k solves the linear system (P_k^T A P_k) y = P_k^T (b − A x_0). Thus

    0 = P_k^T (b − A x_0) − P_k^T A P_k y_k = P_k^T (b − A(x_0 + P_k y_k)) = P_k^T r_k .
To prove (10.2.10) we note from (10.2.8) that

    {A p_1, ..., A p_{k-1}} ⊆ span{r_0, ..., r_{k-1}}

and so from Lemma 10.2.2,

    p_k = r_{k-1} − [A p_1, ..., A p_{k-1}] z_{k-1} ∈ span{r_0, ..., r_{k-1}} .

It follows that

    [p_1, ..., p_k] = [r_0, ..., r_{k-1}] T

for some upper triangular T. Since the search directions are independent, T is nonsingular. This shows

    span{p_1, ..., p_k} = span{r_0, ..., r_{k-1}} .

Using (10.2.8) we see that

    r_k ∈ span{r_{k-1}, A p_k} ⊆ span{r_{k-1}, A r_0, ..., A r_{k-1}} .

The Krylov space connection in (10.2.10) follows from this by induction. Finally, to establish the mutual orthogonality of the residuals, we note from (10.2.9) that r_k is orthogonal to any vector in the range of P_k. But from (10.2.10) this subspace contains r_0, ..., r_{k-1}. □
Using these facts we next show that p_k is a simple linear combination of its predecessor p_{k-1} and the "current" residual r_{k-1}.
Corollary 10.2.4 The residuals and search directions in (10.2.7) have the property that p_k ∈ span{p_{k-1}, r_{k-1}} for k ≥ 2.
Proof. Write

    z_{k-1} = [ w ]
              [ μ ] ,    w ∈ ℝ^{k-2}, μ ∈ ℝ.

Using Lemma 10.2.2 and the identity A p_{k-1} = (r_{k-2} − r_{k-1})/α_{k-1},

    p_k = r_{k-1} − A P_{k-2} w − μ A p_{k-1} = (1 + μ/α_{k-1}) r_{k-1} + s_{k-1}

where

    s_{k-1} = −(μ/α_{k-1}) r_{k-2} − A P_{k-2} w
            ∈ span{r_{k-2}, A P_{k-2} w}
            ⊆ span{r_{k-2}, A p_1, ..., A p_{k-2}}
            ⊆ span{r_0, ..., r_{k-2}} .

Because the r_i are mutually orthogonal, it follows that s_{k-1} and r_{k-1} are orthogonal to each other. Thus, the least squares problem of Lemma 10.2.2 boils down to choosing μ and w such that

    || p_k ||_2² = (1 + μ/α_{k-1})² || r_{k-1} ||_2² + || s_{k-1} ||_2²

is minimum. Since the 2-norm of r_{k-2} − A P_{k-2} z is minimized by z_{k-2} giving residual p_{k-1}, it follows that s_{k-1} is a multiple of p_{k-1}. Consequently, p_k ∈ span{r_{k-1}, p_{k-1}}. □
We are now set to derive a very simple expression for p_k. Without loss of generality we may assume from Corollary 10.2.4 that

    p_k = r_{k-1} + β_k p_{k-1} .

Since p_{k-1}^T A p_k = 0 it follows that β_k = −p_{k-1}^T A r_{k-1} / p_{k-1}^T A p_{k-1}, and we obtain:
    x_0 = initial guess
    k = 0
    r_0 = b − A x_0
    while r_k ≠ 0
        k = k + 1
        if k = 1
            p_1 = r_0
        else
            β_k = −p_{k-1}^T A r_{k-1} / p_{k-1}^T A p_{k-1}
            p_k = r_{k-1} + β_k p_{k-1}                    (10.2.11)
        end
        α_k = p_k^T r_{k-1} / p_k^T A p_k
        x_k = x_{k-1} + α_k p_k
        r_k = b − A x_k
    end
    x = x_k
In this implementation, the method requires three separate matrix-vector multiplications per step. However, by computing residuals recursively via r_k = r_{k-1} − α_k A p_k and substituting

    p_{k-1}^T A r_{k-1} = −r_{k-1}^T r_{k-1} / α_{k-1}        (10.2.12)

and

    p_{k-1}^T A p_{k-1} = r_{k-2}^T r_{k-2} / α_{k-1}        (10.2.13)

into the formula for β_k, we obtain the following more efficient version:
If R_k = [r_0, ..., r_{k-1}] and B_k is the unit upper bidiagonal matrix

          [ 1  −β_2             0   ]
          [     1   −β_3            ]
    B_k = [          ⋱      ⋱       ]
          [               1   −β_k  ]
          [ 0                   1   ]

then P_k B_k = R_k, and the columns of R_k Δ^{-1}, Δ = diag(|| r_0 ||_2, ..., || r_{k-1} ||_2), form an orthonormal basis for the subspace span{r_0, A r_0, ..., A^{k-1} r_0}. Consequently, the columns of this matrix are essentially the Lanczos vectors of Algorithm 9.3.1, i.e.,

    q_i = ± r_{i-1} / || r_{i-1} ||_2 ,    i = 1:k.
Moreover, the tridiagonal matrix associated with these Lanczos vectors is given by

        (10.2.14)

The diagonal and subdiagonal of this matrix involve quantities that are readily available during the conjugate gradient iteration. Thus, we can obtain good estimates of A's extremal eigenvalues (and condition number) as we generate the x_k in Algorithm 10.2.1.
    x = initial guess
    k = 0
    r = b − A x
    ρ_0 = || r ||_2²
    while ( √ρ_k > ε || b ||_2 ) and ( k < k_max )
        k = k + 1
        if k = 1
            p = r
        else                                               (10.2.16)
            β_k = ρ_{k-1} / ρ_{k-2}
            p = r + β_k p
        end
        w = A p
        α_k = ρ_{k-1} / p^T w
        x = x + α_k p
        r = r − α_k w
        ρ_k = || r ||_2²
    end
This algorithm requires one matrix-vector multiplication and 10n flops per iteration. Notice that just four n-vectors of storage are essential: x, r, p, and w. The subscripting of the scalars is not necessary and is only done here to facilitate comparison with Algorithm 10.2.1.
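A NumPy transliteration of (10.2.16) might look as follows. The names cg_solve, eps, and kmax are ours (eps and kmax play the roles of ε and k_max), and as the text suggests the scalar subscripts are dropped.

    import numpy as np

    def cg_solve(A, b, x0, eps=1e-10, kmax=1000):
        # Conjugate gradients (10.2.16): one A-multiply and O(n) vector work per step.
        x = x0.copy()
        r = b - A @ x
        rho_new = r @ r                    # rho_k = ||r||_2^2
        bnorm = np.linalg.norm(b)
        k = 0
        while np.sqrt(rho_new) > eps * bnorm and k < kmax:
            k += 1
            if k == 1:
                p = r.copy()
            else:
                beta = rho_new / rho_old   # beta_k = rho_{k-1}/rho_{k-2}
                p = r + beta * p
            w = A @ p
            alpha = rho_new / (p @ w)      # alpha_k = rho_{k-1}/p^T w
            x = x + alpha * p
            r = r - alpha * w
            rho_old, rho_new = rho_new, r @ r
        return x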
It is also possible to base the termination criterion on heuristic estimates of the error A^{-1} r_k by approximating || A^{-1} ||_2 with the reciprocal of the smallest eigenvalue of the tridiagonal matrix T_k given in (10.2.14).
The idea of regarding conjugate gradients as an iterative method began with Reid (1971). The iterative point of view is useful, but then the rate of convergence is central to the method's success. The method performs well when A is near the identity, either in the sense of a low rank perturbation or in the sense of norm.
Theorem 10.2.5 If A = I + B is an n-by-n symmetric positive definite matrix and rank(B) = r, then Algorithm 10.2.1 converges in at most r + 1 steps.

Proof. The dimension of

    span{r_0, A r_0, A² r_0, ...} = span{r_0, B r_0, B² r_0, ...}

cannot exceed r + 1. Since p_1, ..., p_k span this subspace and are independent, the iteration cannot progress beyond r + 1 steps. □
The accuracy of the {x_k} is often much better than this theorem predicts. However, a heuristic version of Theorem 10.2.6 turns out to be very useful:
Problems
P10.2.5 Give formulae for the entries of the tridiagonal matrix T_k in (10.2.14).
P10.2.6 Compare the work and storage requirements associated with the practical implementation of Algorithms 9.3.1 and 10.2.1.
P10.2.7 Show that if A ∈ ℝ^{n×n} is symmetric positive definite and has k distinct eigenvalues, then the conjugate gradient method does not require more than k + 1 steps to converge.
P10.2.8 Use Theorem 10.2.6 to verify that
J.E. Dennis Jr. and K. Turner (1987). "Generalized Conjugate Directions," Lin. Alg. and Its Applic. 88/89, 187-209.
G.W. Stewart (1973). "Conjugate Direction Methods for Solving Systems of Linear Equations," Numer. Math. 21, 284-97.
G. Golub and D. O'Leary (1989). "Some History of the Conjugate Gradient and Lanczos Methods," SIAM Review 31, 50-102.
M.R. Hestenes (1990). "Conjugacy and Gradients," in A History of Scientific Computing, Addison-Wesley, Reading, MA.
S. Ashby, T.A. Manteuffel, and P.E. Saylor (1990). "A Taxonomy for Conjugate Gradient Methods," SIAM J. Numer. Anal. 27, 1542-1568.

The classic reference for the conjugate gradient method is

M.R. Hestenes and E. Stiefel (1952). "Methods of Conjugate Gradients for Solving Linear Systems," J. Res. Nat. Bur. Stand. 49, 409-36.
An exact arithmetic analysis of the method may be found in chapter 2 of
The idea of using the conjugate gradient method as an iterative method was first dis-
cussed in
J.K. Reid (1971). "On the Method of Conjugate Gradients for the Solution of Large Sparse Systems of Linear Equations," in Large Sparse Sets of Linear Equations, ed. J.K. Reid, Academic Press, New York, pp. 231-54.
Several authors have attempted to explain the algorithm's behavior in finite precision
arithmetic. See
Finally, we mention that the method can be used to compute an eigenvector of a large
sparse symmetric matrix:
A. Ruhe and T. Wiberg (1972). "The Method of Conjugate Gradients Used in Inverse
Iteration," BIT 12, 543-54.
10.3.1 Derivation

Consider the n-by-n symmetric positive definite linear system Ax = b. The idea behind preconditioned conjugate gradients is to apply the "regular" conjugate gradient method to the transformed system

    Ã x̃ = b̃ ,        (10.3.1)

where Ã = C^{-1} A C^{-1}, b̃ = C^{-1} b, x̃ = C x, and C is symmetric positive definite.
    k = 0
    x̃_0 = initial guess (Ã x̃_0 ≈ b̃)
    r̃_0 = b̃ − Ã x̃_0
    while r̃_k ≠ 0
        k = k + 1
        if k = 1
            p̃_1 = r̃_0
        else                                               (10.3.2)
            β̃_k = r̃_{k-1}^T r̃_{k-1} / r̃_{k-2}^T r̃_{k-2}
            p̃_k = r̃_{k-1} + β̃_k p̃_{k-1}
        end
        α̃_k = r̃_{k-1}^T r̃_{k-1} / p̃_k^T C^{-1} A C^{-1} p̃_k
        x̃_k = x̃_{k-1} + α̃_k p̃_k
        r̃_k = r̃_{k-1} − α̃_k C^{-1} A C^{-1} p̃_k
    end
    k = 0
    x_0 = initial guess (A x_0 ≈ b)
    r_0 = b − A x_0
    while C^{-1} r_k ≠ 0
        k = k + 1
        if k = 1
            C p_1 = C^{-1} r_0
        else                                               (10.3.3)
            β_k = (C^{-1} r_{k-1})^T (C^{-1} r_{k-1}) / (C^{-1} r_{k-2})^T (C^{-1} r_{k-2})
            C p_k = C^{-1} r_{k-1} + β_k C p_{k-1}
        end
        α_k = (C^{-1} r_{k-1})^T (C^{-1} r_{k-1}) / (C p_k)^T (C^{-1} A C^{-1}) (C p_k)
        C x_k = C x_{k-1} + α_k C p_k
        C^{-1} r_k = C^{-1} r_{k-1} − α_k (C^{-1} A C^{-1}) C p_k
    end
    C x = C x_k
    k = 0
    r_0 = b − A x_0
    while r_k ≠ 0
        Solve M z_k = r_k.
        k = k + 1
        if k = 1
            p_1 = z_0
        else
            β_k = r_{k-1}^T z_{k-1} / r_{k-2}^T z_{k-2}
            p_k = z_{k-1} + β_k p_{k-1}
        end
        α_k = r_{k-1}^T z_{k-1} / p_k^T A p_k
        x_k = x_{k-1} + α_k p_k
        r_k = r_{k-1} − α_k A p_k
    end
    x = x_k
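The following sketch realizes the iteration above; the preconditioner enters only through a user-supplied callable M_solve that returns the solution of M z = r (our naming). For example, M_solve = lambda r: r / np.diag(A) implements a Jacobi (diagonal) preconditioner.

    import numpy as np

    def pcg_solve(A, b, x0, M_solve, tol=1e-10, kmax=1000):
        # Preconditioned conjugate gradients; M_solve(r) returns z with M z = r.
        x = x0.copy()
        r = b - A @ x
        bnorm = np.linalg.norm(b)
        k = 0
        while np.linalg.norm(r) > tol * bnorm and k < kmax:
            z = M_solve(r)                 # solve M z_k = r_k
            rz = r @ z
            k += 1
            if k == 1:
                p = z.copy()
            else:
                beta = rz / rz_old         # beta_k = r_{k-1}^T z_{k-1} / r_{k-2}^T z_{k-2}
                p = z + beta * p
            w = A @ p
            alpha = rz / (p @ w)           # alpha_k = r_{k-1}^T z_{k-1} / p_k^T A p_k
            x = x + alpha * p
            r = r - alpha * w
            rz_old = rz
        return x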
    i ≠ j        (10.3.4)
    i ≠ j        (10.3.5)
The choice of a good preconditioner can have a dramatic effect upon the
rate of convergence. Some of the possibilities are now discussed.
For purposes of illustration, we assume that the A_i are tridiagonal and the E_i are diagonal. Matrices with this structure arise from the standard 5-
point discretization of self-adjoint elliptic partial differential equations over
a two-dimensional domain.
The 3-by-3 case is sufficiently general. Our discussion is based upon
Concus, Golub, and Meurant (1985). Let
    G_1 G_1^T = B_1 = A_1
    F_1 = E_1 G_1^{-T}
    G_2 G_2^T = B_2 = A_2 − F_1 F_1^T = A_2 − E_1 B_1^{-1} E_1^T
    F_2 = E_2 G_2^{-T}
    G_3 G_3^T = B_3 = A_3 − F_2 F_2^T = A_3 − E_2 B_2^{-1} E_2^T

In the block preconditioner the B_i are redefined using tridiagonal approximations Λ_i to their inverses:

    B_1 = A_1
    B_2 = A_2 − E_1 Λ_1 E_1^T ,    Λ_1 (tridiagonal) ≈ B_1^{-1}
    B_3 = A_3 − E_2 Λ_2 E_2^T ,    Λ_2 (tridiagonal) ≈ B_2^{-1}
Note that all the B_i are tridiagonal. Clearly, the Λ_i must be carefully chosen to ensure that the B_i are also symmetric and positive definite. It then follows that the G_i are lower bidiagonal. The F_i are full, but they need not be explicitly formed. For example, in the course of solving the system M z = r we must solve a system of the form

    G_1 w_1 = r_1
    G_2 w_2 = r_2 − F_1 w_1 = r_2 − E_1 G_1^{-T} w_1
    G_3 w_3 = r_3 − F_2 w_2 = r_3 − E_2 G_2^{-T} w_2
    [ A_1                   B_1 ] [ x_1 ]     [ d_1 ]
    [       A_2             B_2 ] [ x_2 ]     [ d_2 ]
    [             ⋱          ⋮  ] [  ⋮  ]  =  [  ⋮  ]        (10.3.8)
    [                  A_p  B_p ] [ x_p ]     [ d_p ]
    [ B_1^T B_2^T  ⋯  B_p^T   Q ] [  z  ]     [  f  ]
if the unknowns are properly sequenced. See Meurant (1984). Here, the
A; are symmetric positive definite, the B; are sparse, and the last block
column is generally much narrower than the others.
An example with p = 2 serves to connect (10.3.8) and its block structure
with the underlying problem geometry and the chosen domain decomposi-
tion. Suppose we are to solve Poisson's equation on the following domain:
+++++++++
+++++++++
+++++++++
+++++++++
+++++++++
+++++++++
+++++++++
•••••••••
X X X X X X X X X X X X X X X X X X X
X X X X X X X X X X X X X X X X X X X
X X X X X X X X X X X X X X X X X X X
X X X X X X X X X X X X X X X X X X X
X X X X X X X X X X X X X X X X X X X
The + unknowns belong to the first subdomain, the x unknowns to the second, and the • unknowns make up the interface. Ordering the subdomain unknowns first and the interface unknowns last accounts for the zero blocks in (10.3.8). Also observe that the number of interface unknowns is typically small compared to the overall number of unknowns.
Now let us explore the preconditioning possibilities associated with (10.3.8). We continue with the p = 2 case for simplicity. If we set

    M = L [ M_1^{-1}     0        0     ] L^T        (10.3.9)
          [    0      M_2^{-1}    0     ]
          [    0         0      S^{-1}  ]

where

    L = [ M_1     0     0 ]
        [  0     M_2    0 ]
        [ B_1^T  B_2^T  S ]

then

    M = [ M_1     0     B_1 ]
        [  0     M_2    B_2 ]        (10.3.10)
        [ B_1^T  B_2^T  S + B_1^T M_1^{-1} B_1 + B_2^T M_2^{-1} B_2 ]
        end
        x_k = x_{k-2} + ω_k ( γ_{k-1} z_{k-1} + x_{k-1} − x_{k-2} )
        r_k = b − A x_k
    end
    x = x_k
Problems
Our discussion of the preconditioned conjugate gradient is drawn from several sources including
P. Concus, G.H. Golub, and D.P. O'Leary (1976). "A Generalized Conjugate Gradient Method for the Numerical Solution of Elliptic Partial Differential Equations," in Sparse Matrix Computations, ed. J.R. Bunch and D.J. Rose, Academic Press, New York.
G.H. Golub and G. Meurant (1983). Résolution Numérique des Grandes Systèmes Linéaires, Collection de la Direction des Études et Recherches de l'Electricité de France, vol. 49, Eyrolles, Paris.
O. Axelsson (1985). "A Survey of Preconditioned Iterative Methods for Linear Systems of Equations," BIT 25, 166-187.
P. Concus, G.H. Golub, and G. Meurant (1985). "Block Preconditioning for the Conjugate Gradient Method," SIAM J. Sci. and Stat. Comp. 6, 220-252.
O. Axelsson and G. Lindskog (1986). "On the Rate of Convergence of the Preconditioned Conjugate Gradient Method," Numer. Math. 48, 499-523.
J.A. Meijerink and H.A. van der Vorst (1977). "An Iterative Solution Method for Linear Equation Systems of Which the Coefficient Matrix is a Symmetric M-Matrix," Math. Comp. 31, 148-62.
T.A. Manteuffel (1979). "Shifted Incomplete Cholesky Factorization," in Sparse Matrix Proceedings, 1978, ed. I.S. Duff and G.W. Stewart, SIAM Publications, Philadelphia, PA.
T.F. Chan, K.R. Jackson, and B. Zhu (1983). "Alternating Direction Incomplete Factorizations," SIAM J. Numer. Anal. 20, 239-257.
G. Rodrigue and D. Wolitzer (1984). "Preconditioning by Incomplete Block Cyclic Reduction," Math. Comp. 42, 549-566.
O. Axelsson (1985). "Incomplete Block Matrix Factorization Preconditioning Methods. The Ultimate Answer?", J. Comput. Appl. Math. 12&13, 3-18.
O. Axelsson (1986). "A General Incomplete Block Matrix Factorization Method," Lin. Alg. Appl. 74, 179-190.
H. Elman (1986). "A Stability Analysis of Incomplete LU Factorization," Math. Comp. 47, 191-218.
T. Chan (1991). "Fourier Analysis of Relaxed Incomplete Factorization Preconditioners," SIAM J. Sci. Statist. Comput. 12, 668-680.
J.H. Bramble, J.E. Pasciak, and A.H. Schatz (1986). "The Construction of Preconditioners for Elliptic Problems by Substructuring I," Math. Comp. 47, 103-134.
J.H. Bramble, J.E. Pasciak, and A.H. Schatz (1986). "The Construction of Preconditioners for Elliptic Problems by Substructuring II," Math. Comp. 49, 1-17.
G. Meurant (1989). "Domain Decomposition Methods for Partial Differential Equations on Parallel Computers," to appear, Int'l J. Supercomputing Applications.
W.D. Gropp and D.E. Keyes (1992). "Domain Decomposition with Local Mesh Refinement," SIAM J. Sci. Statist. Comput. 13, 967-993.
D.E. Keyes, T.F. Chan, G. Meurant, J.S. Scroggs, and R.G. Voigt (eds) (1992). Domain Decomposition Methods for Partial Differential Equations, SIAM Publications, Philadelphia, PA.
M. Mu (1995). "A New Family of Preconditioners for Domain Decomposition," SIAM J. Sci. Comp. 16, 289-306.
Various aspects of polynomial preconditioners are discussed in

O.G. Johnson, C.A. Micchelli, and G. Paul (1983). "Polynomial Preconditioners for Conjugate Gradient Calculations," SIAM J. Numer. Anal. 20, 362-376.
S.C. Eisenstat (1981). "Efficient Implementation of a Class of Preconditioned Conjugate Gradient Methods," SIAM J. Sci. and Stat. Computing 2, 1-4.
Y. Saad (1985). "Practical Use of Polynomial Preconditionings for the Conjugate Gradient Method," SIAM J. Sci. and Stat. Comp. 6, 865-882.
L. Adams (1985). "m-step Preconditioned Conjugate Gradient Methods," SIAM J. Sci. and Stat. Comp. 6, 452-463.
S.F. Ashby (1987). "Polynomial Preconditioning for Conjugate Gradient Methods," Ph.D. Thesis, Dept. of Computer Science, University of Illinois.
S. Ashby, T. Manteuffel, and P. Saylor (1989). "Adaptive Polynomial Preconditioning for Hermitian Indefinite Linear Systems," BIT 29, 583-609.
R.W. Freund (1990). "On Conjugate Gradient Type Methods and Polynomial Preconditioners for a Class of Complex Non-Hermitian Matrices," Numer. Math. 57, 285-312.
S. Ashby, T. Manteuffel, and J. Otto (1992). "A Comparison of Adaptive Chebyshev and Least Squares Polynomial Preconditioning for Hermitian Positive Definite Linear Systems," SIAM J. Sci. Stat. Comp. 13, 1-29.
Numerous vector/parallel implementations of the cg method have been developed. See

P.F. Dubois, A. Greenbaum, and G.H. Rodrigue (1979). "Approximating the Inverse of a Matrix for Use on Iterative Algorithms on Vector Processors," Computing 22, 257-268.
H.A. van der Vorst (1982). "A Vectorizable Variant of Some ICCG Methods," SIAM J. Sci. and Stat. Comp. 3, 350-356.
G. Meurant (1984). "The Block Preconditioned Conjugate Gradient Method on Vector Computers," BIT 24, 623-633.
T. Jordan (1984). "Conjugate Gradient Preconditioners for Vector and Parallel Processors," in G. Birkoff and A. Schoenstadt (eds), Proceedings of the Conference on Elliptic Problem Solvers, Academic Press, NY.
H.A. van der Vorst (1986). "The Performance of Fortran Implementations for Preconditioned Conjugate Gradients on Vector Computers," Parallel Computing 3, 49-58.
M.K. Seager (1986). "Parallelizing Conjugate Gradient for the Cray X-MP," Parallel Computing 3, 35-47.
O. Axelsson and B. Polman (1986). "On Approximate Factorization Methods for Block Matrices Suitable for Vector and Parallel Processors," Lin. Alg. and Its Applic. 77, 3-26.
D.P. O'Leary (1987). "Parallel Implementation of the Block Conjugate Gradient Algorithm," Parallel Computing 5, 127-140.
R. Melhem (1987). "Toward Efficient Implementation of Preconditioned Conjugate Gradient Methods on Vector Supercomputers," Int'l J. Supercomputing Applications 1, 70-98.
E.L. Poole and J.M. Ortega (1987). "Multicolor ICCG Methods for Vector Computers," SIAM J. Numer. Anal. 24, 1394-1418.
C.C. Ashcraft and R. Grimes (1988). "On Vectorizing Incomplete Factorization and SSOR Preconditioners," SIAM J. Sci. and Stat. Comp. 9, 122-151.
U. Meier and A. Sameh (1988). "The Behavior of Conjugate Gradient Algorithms on a Multivector Processor with a Hierarchical Memory," J. Comput. Appl. Math. 24, 13-32.
W.D. Gropp and D.E. Keyes (1988). "Complexity of Parallel Implementation of Domain Decomposition Techniques for Elliptic Partial Differential Equations," SIAM J. Sci. and Stat. Comp. 9, 312-326.
H. van der Vorst (1989). "High Performance Preconditioning," SIAM J. Sci. and Stat. Comp. 10, 1174-1185.
H. Elman (1989). "Approximate Schur Complement Preconditioners on Serial and Parallel Computers," SIAM J. Sci. Stat. Comput. 10, 581-605.
O. Axelsson and V. Eijkhout (1989). "Vectorizable Preconditioners for Elliptic Difference Equations in Three Space Dimensions," J. Comput. Appl. Math. 27, 299-321.
S.L. Johnsson and K. Mathur (1989). "Experience with the Conjugate Gradient Method for Stress Analysis on a Data Parallel Supercomputer," International Journal on Numerical Methods in Engineering 27, 523-546.
L. Mansfield (1991). "Damped Jacobi Preconditioning and Coarse Grid Deflation for Conjugate Gradient Iteration on Parallel Computers," SIAM J. Sci. and Stat. Comp. 12, 1314-1323.
V. Eijkhout (1991). "Analysis of Parallel Incomplete Point Factorizations," Lin. Alg. and Its Applic. 154-156, 723-740.
S. Doi (1991). "On Parallelism and Convergence of Incomplete LU Factorizations," Appl. Numer. Math. 7, 417-436.
G. Strang (1986). "A Proposal for Toeplitz Matrix Calculations," Stud. Appl. Math. 74, 171-176.
T.F. Chan (1988). "An Optimal Circulant Preconditioner for Toeplitz Systems," SIAM J. Sci. Stat. Comp. 9, 766-771.
R.H. Chan (1989). "The Spectrum of a Family of Circulant Preconditioned Toeplitz Systems," SIAM J. Num. Anal. 26, 503-506.
R.H. Chan (1991). "Preconditioners for Toeplitz Systems with Nonnegative Generating Functions," IMA J. Num. Anal. 11, 333-345.
T. Huckle (1992). "Circulant and Skewcirculant Matrices for Solving Toeplitz Matrix Problems," SIAM J. Matrix Anal. Appl. 13, 767-777.
T. Huckle (1992). "A Note on Skew-Circulant Preconditioners for Elliptic Problems," Numerical Algorithms 2, 279-286.
R.H. Chan, J.G. Nagy, and R.J. Plemmons (1993). "FFT-Based Preconditioners for Toeplitz Block Least Squares Problems," SIAM J. Num. Anal. 30, 1740-1768.
M. Hanke and J.G. Nagy (1994). "Toeplitz Approximate Inverse Preconditioner for Banded Toeplitz Matrices," Numerical Algorithms 7, 183-199.
R.H. Chan, J.G. Nagy, and R.J. Plemmons (1994). "Circulant Preconditioned Toeplitz Least Squares Iterations," SIAM J. Matrix Anal. Appl. 15, 80-97.
T.F. Chan and J.A. Olkin (1994). "Circulant Preconditioners for Toeplitz Block Matrices," Numerical Algorithms 6, 89-101.
J.K. Reid (1972). "The Use of Conjugate Gradients for Systems of Linear Equations Possessing Property A," SIAM J. Num. Anal. 9, 325-32.
D.P. O'Leary (1980). "The Block Conjugate Gradient Algorithm and Related Methods," Lin. Alg. and Its Applic. 29, 293-322.
R.C. Chin, T.A. Manteuffel, and J. de Pillis (1984). "ADI as a Preconditioning for Solving the Convection-Diffusion Equation," SIAM J. Sci. and Stat. Comp. 5, 281-299.
I. Duff and G. Meurant (1989). "The Effect of Ordering on Preconditioned Conjugate Gradients," BIT 29, 635-657.
A. Greenbaum and G. Rodrigue (1989). "Optimal Preconditioners of a Given Sparsity Pattern," BIT 29, 610-634.
O. Axelsson and P. Vassilevski (1989). "Algebraic Multilevel Preconditioning Methods I," Numer. Math. 56, 157-177.
O. Axelsson and P. Vassilevski (1990). "Algebraic Multilevel Preconditioning Methods II," SIAM J. Numer. Anal. 27, 1569-1590.
M. Hanke and M. Neumann (1990). "Preconditionings and Splittings for Rectangular Systems," Numer. Math. 57, 85-96.
A. Greenbaum (1992). "Diagonal Scalings of the Laplacian as Preconditioners for Other Elliptic Differential Operators," SIAM J. Matrix Anal. Appl. 13, 826-846.
P.E. Gill, W. Murray, D.B. Ponceleón, and M.A. Saunders (1992). "Preconditioners for Indefinite Systems Arising in Optimization," SIAM J. Matrix Anal. Appl. 13, 292-311.
G. Meurant (1992). "A Review on the Inverse of Symmetric Tridiagonal and Block Tridiagonal Matrices," SIAM J. Matrix Anal. Appl. 13, 707-728.
S. Holmgren and K. Otto (1992). "Iterative Solution Methods and Preconditioners for Block-Tridiagonal Systems of Equations," SIAM J. Matrix Anal. Appl. 13, 863-886.
S.A. Vavasis (1992). "Preconditioning for Boundary Integral Equations," SIAM J. Matrix Anal. Appl. 13, 905-925.
P. Joly and G. Meurant (1993). "Complex Conjugate Gradient Methods," Numerical Algorithms 4, 379-406.
X.-C. Cai and O. Widlund (1993). "Multiplicative Schwarz Algorithms for Some Nonsymmetric and Indefinite Problems," SIAM J. Numer. Anal. 30, 936-952.
Bear in mind that there is a large gap between our algorithmic specifications and production software. A good place to build an appreciation for this point is the Templates book by Barrett et al (1993). The book by Saad (1996) is also highly recommended.
    k = 0
    r_0 = b − A x_0
    while r_k ≠ 0
        k = k + 1
        if k = 1
            p_1 = A^T r_0
        else
            β_k = (A^T r_{k-1})^T (A^T r_{k-1}) / (A^T r_{k-2})^T (A^T r_{k-2})
            p_k = A^T r_{k-1} + β_k p_{k-1}
        end
        α_k = (A^T r_{k-1})^T (A^T r_{k-1}) / (A p_k)^T (A p_k)
        x_k = x_{k-1} + α_k p_k
        r_k = r_{k-1} − α_k A p_k
    end
    x = x_k
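In NumPy the CGNR iteration can be sketched as follows (our naming; the vector s carries A^T r_k so that only one multiply by A and one by A^T are needed per step):

    import numpy as np

    def cgnr_solve(A, b, x0, tol=1e-10, kmax=1000):
        # CGNR: conjugate gradients applied implicitly to A^T A x = A^T b.
        x = x0.copy()
        r = b - A @ x
        s = A.T @ r                        # s_k = A^T r_k
        k = 0
        while np.linalg.norm(r) > tol * np.linalg.norm(b) and k < kmax:
            k += 1
            if k == 1:
                p = s.copy()
            else:
                beta = (s @ s) / ss_old    # ratio of successive ||A^T r||^2 values
                p = s + beta * p
            Ap = A @ p
            alpha = (s @ s) / (Ap @ Ap)
            x = x + alpha * p
            r = r - alpha * Ap
            ss_old = s @ s
            s = A.T @ r
        return x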
    k = 0
    y_0 = initial guess (A A^T y_0 ≈ b)
    r_0 = b − A A^T y_0
    while r_k ≠ 0
        k = k + 1
        if k = 1
            p_1 = r_0
        else
            β_k = r_{k-1}^T r_{k-1} / r_{k-2}^T r_{k-2}
            p_k = r_{k-1} + β_k p_{k-1}
        end
        α_k = r_{k-1}^T r_{k-1} / p_k^T A A^T p_k
        y_k = y_{k-1} + α_k p_k
        r_k = r_{k-1} − α_k A A^T p_k
    end
    y = y_k
Making the substitutions x_k ← A^T y_k and p_k ← A^T p_k and simplifying, we obtain the Conjugate Gradient Normal Equation Error (CGNE) method:
    k = 0
    r_0 = b − A x_0
    while r_k ≠ 0
        k = k + 1
        if k = 1
            p_1 = A^T r_0
        else
            β_k = r_{k-1}^T r_{k-1} / r_{k-2}^T r_{k-2}
            p_k = A^T r_{k-1} + β_k p_{k-1}
        end
        α_k = r_{k-1}^T r_{k-1} / p_k^T p_k
        x_k = x_{k-1} + α_k p_k
        r_k = r_{k-1} − α_k A p_k
    end
    x = x_k
over the set y_0 + K(A A^T, b − A A^T y_0, k). With the change of variable x = A^T y it can be shown that x_k minimizes

over

        (10.4.1)

Thus CGNE minimizes the error at each step and that explains the "E" in "CGNE."
are equivalent and that the former is the normal equation version of the
latter. If we apply CGNR to this square root system and simplify the
results, then we obtain
    k = 0
    r_0 = b − A x_0
    while r_k ≠ 0
        k = k + 1
        if k = 1
            p_1 = r_0
        else
            β_k = r_{k-1}^T A r_{k-1} / r_{k-2}^T A r_{k-2}
            p_k = r_{k-1} + β_k p_{k-1}
            A p_k = A r_{k-1} + β_k A p_{k-1}
        end
        α_k = r_{k-1}^T A r_{k-1} / (A p_k)^T (A p_k)
        x_k = x_{k-1} + α_k p_k
        r_k = r_{k-1} − α_k A p_k
    end
    x = x_k
It follows from our comments about CGNR that || A^{-1/2}(b − Ax) ||_2 is minimized over the set x_0 + K(A, r_0, k) during the kth iteration.
10.4.4 GMRES

In §9.3.2 we briefly discussed the Lanczos-based MINRES method for symmetric, possibly indefinite, Ax = b problems. In that method the iterate x_k minimizes || b − A x ||_2 over the set

    S_k = x_0 + span{r_0, A r_0, ..., A^{k-1} r_0} = x_0 + K(A, r_0, k).        (10.4.2)

The key idea behind the algorithm is to express x_k in terms of the Lanczos vectors q_1, q_2, ..., q_k which span K(A, r_0, k) if q_1 is a multiple of the initial residual r_0 = b − A x_0.

In the Generalized Minimum Residual (GMRES) method of Saad and Schultz (1986) the same approach is taken except that the iterates are expressed in terms of Arnoldi vectors instead of Lanczos vectors in order to handle unsymmetric A. After k steps of the Arnoldi iteration (9.4.1) we have the factorization

    A Q_k = Q_{k+1} H̃_k        (10.4.3)

where the columns of Q_{k+1} = [ Q_k  q_{k+1} ] are the orthonormal Arnoldi vectors and

          [ h_11   h_12    ⋯      h_1k    ]
          [ h_21   h_22    ⋯      h_2k    ]
    H̃_k = [  0      ⋱               ⋮     ]  ∈ ℝ^{(k+1)×k} .
          [  ⋮    h_{k,k-1}       h_kk    ]
          [  0      ⋯      0   h_{k+1,k}  ]
    r_0 = b − A x_0
    h_10 = || r_0 ||_2
    k = 0
    while ( h_{k+1,k} > 0 )
        q_{k+1} = r_k / h_{k+1,k}
        k = k + 1
        r_k = A q_k
        for i = 1:k
            h_ik = q_i^T r_k
            r_k = r_k − h_ik q_i
        end
        h_{k+1,k} = || r_k ||_2
        x_k = x_0 + Q_k y_k  where  || h_10 e_1 − H̃_k y_k ||_2 = min
    end
    x = x_k
The upper Hessenberg least squares problem can be efficiently solved using Givens rotations. In practice there is no need to form x_k until one is happy with its residual.

The main problem with "unlimited GMRES" is that the kth iteration involves O(kn) flops. Thus, like Arnoldi, a practical GMRES implementation requires a restart strategy to avoid excessive amounts of computation and memory traffic. For example, if at most m steps are tolerable, then x_m can be used as the initial vector for the next GMRES sequence.
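A compact NumPy sketch of GMRES(m) follows. The names are ours, the orthogonalization is modified Gram-Schmidt as in the loop above, and for brevity the small Hessenberg least squares problem is solved with a dense least squares call instead of the Givens rotation update described in the text.

    import numpy as np

    def gmres_restarted(A, b, x0, m=20, tol=1e-10, max_restarts=50):
        # GMRES(m): Arnoldi factorization plus a small least squares solve,
        # restarted every m steps with the current iterate as initial guess.
        x = x0.copy()
        n = len(b)
        bnorm = np.linalg.norm(b)
        for _ in range(max_restarts):
            r = b - A @ x
            beta = np.linalg.norm(r)
            if beta <= tol * bnorm:
                break
            Q = np.zeros((n, m + 1))
            H = np.zeros((m + 1, m))
            Q[:, 0] = r / beta
            k_eff = m
            for k in range(m):
                w = A @ Q[:, k]
                for i in range(k + 1):          # modified Gram-Schmidt
                    H[i, k] = Q[:, i] @ w
                    w = w - H[i, k] * Q[:, i]
                H[k + 1, k] = np.linalg.norm(w)
                if H[k + 1, k] == 0.0:          # "lucky" breakdown: exact solution found
                    k_eff = k + 1
                    break
                Q[:, k + 1] = w / H[k + 1, k]
            e1 = np.zeros(k_eff + 1)
            e1[0] = beta                         # minimize || beta e_1 - H y ||_2
            y, *_ = np.linalg.lstsq(H[:k_eff + 1, :k_eff], e1, rcond=None)
            x = x + Q[:, :k_eff] @ y
        return x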
10.4.5 Preconditioning

Preconditioning is the other key to making GMRES effective. Analogous to the development of the preconditioned conjugate gradient method in §10.3, we obtain a nonsingular matrix M = M_1 M_2 that approximates A in some sense and then apply GMRES to the system Ã x̃ = b̃ where Ã = M_1^{-1} A M_2^{-1}, b̃ = M_1^{-1} b, and x̃ = M_2 x. If we write down the GMRES iteration for the tilde system and manipulate the equations to restore the original variables, then the resulting iteration requires the solution of linear systems that involve the preconditioner M. Thus, the act of finding a good preconditioner M = M_1 M_2 is the act of making Ã = M_1^{-1} A M_2^{-1} look as much as possible like the identity subject to the constraint that linear systems with M are easy to solve.
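For a small dense problem one can experiment with this idea by forming the tilde system explicitly and feeding it to the gmres_restarted sketch above (illustration only; a practical code never forms M_1^{-1} A M_2^{-1} and instead applies the preconditioner solves inside the Arnoldi loop):

    # M1, M2 are the preconditioner factors, A and b the original data.
    A_t = np.linalg.solve(M1, A @ np.linalg.inv(M2))   # tilde A = M1^{-1} A M2^{-1}
    b_t = np.linalg.solve(M1, b)                       # tilde b = M1^{-1} b
    x_t = gmres_restarted(A_t, b_t, np.zeros_like(b))
    x = np.linalg.solve(M2, x_t)                       # recover x from tilde x = M2 x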
holds.
As might be expected, it is possible to develop recursions so that Xk
can be computed as a simple combination of Xk-l and qk-l, instead of as
a linear combination of all the previous q-vectors.
10.4. OTHER KRYLOV SUBSPACE METHODS 551
10.4.7 QMR

Another iteration that runs off of the unsymmetric Lanczos process is the quasi-minimal residual (QMR) method of Freund and Nachtigal (1991). As in BiCG the kth iterate has the form x_k = x_0 + Q_k y_k. It is easy to show that after k steps in (9.4.7) we have the factorization

    A Q_k = Q_{k+1} T̃_k

where T̃_k ∈ ℝ^{(k+1)×k} is tridiagonal. It follows that if q_1 = (b − A x_0)/ρ, then

    b − A x_k = b − A(x_0 + Q_k y_k)
              = r_0 − A Q_k y_k
              = r_0 − Q_{k+1} T̃_k y_k
              = Q_{k+1} (ρ e_1 − T̃_k y_k) .

If y_k is chosen to minimize the 2-norm of this vector, then in exact arithmetic x_0 + Q_k y_k defines the GMRES iterate. In QMR, y_k is chosen to minimize || ρ e_1 − T̃_k y_k ||_2 .
10.4.8 Summary
The methods that we have presented do not submit to a linear ranking.
The choice of a technique is complicated and depends on a host of factors.
A particularly cogent assessment of the major algorithms is given in Barrett
et al (1993).
Problems
P10.4.1 Analogous to (10.2.16), develop efficient implementations of the CGNR, CGNE, and conjugate residual methods.
P10.4.2 Establish the mathematical equivalence of the CGNR and the LSQR method outlined in §9.3.4.
P10.4.3 Prove (10.4.3).
P10.4.4 Develop an efficient preconditioned GMRES implementation, proceeding as we did in §10.3 for the preconditioned conjugate gradient method. (See (10.3.2) and (10.3.3) in particular.)
P10.4.5 Prove that the GMRES least squares problem has full rank.
P10.4.5 Prove that the GMRES least squares problem has full rank.
Krylov space methods and analysis are featured in the following papers:
W.E. Arnoldi (1951). "The Principle of Minimized Iterations in the Solution of the Matrix Eigenvalue Problem," Quart. Appl. Math. 9, 17-29.
Y. Saad (1981). "Krylov Subspace Methods for Solving Large Unsymmetric Linear Systems," Math. Comp. 37, 105-126.
Y. Saad (1984). "Practical Use of Some Krylov Subspace Methods for Solving Indefinite and Nonsymmetric Linear Systems," SIAM J. Sci. and Stat. Comp. 5, 203-228.
Y. Saad (1989). "Krylov Subspace Methods on Supercomputers," SIAM J. Sci. and Stat. Comp. 10, 1200-1232.
C.-M. Huang and D.P. O'Leary (1993). "A Krylov Multisplitting Algorithm for Solving Linear Systems of Equations," Lin. Alg. and Its Applic. 194, 9-29.
C.C. Paige, B.N. Parlett, and H.A. van der Vorst (1995). "Approximate Solutions and Eigenvalue Bounds from Krylov Subspaces," Numer. Linear Algebra with Applic. 2, 115-134.
Preconditioning ideas for unsymmetric problems are discussed in the following papers:

M. Benzi, C.D. Meyer, and M. Tuma (1996). "A Sparse Approximate Inverse Preconditioner for the Conjugate Gradient Method," SIAM J. Sci. Comput. 17, to appear.
D.M. Young and K.C. Jea (1980). "Generalized Conjugate Gradient Acceleration of Nonsymmetrizable Iterative Methods," Lin. Alg. and Its Applic. 34, 159-94.
O. Axelsson (1980). "Conjugate Gradient Type Methods for Unsymmetric and Inconsistent Systems of Linear Equations," Lin. Alg. and Its Applic. 29, 1-16.
K.C. Jea and D.M. Young (1983). "On the Simplification of Generalized Conjugate Gradient Methods for Nonsymmetrizable Linear Systems," Lin. Alg. and Its Applic. 52/53, 399-417.
V. Faber and T. Manteuffel (1984). "Necessary and Sufficient Conditions for the Existence of a Conjugate Gradient Method," SIAM J. Numer. Anal. 21, 352-362.
Y. Saad and M. Schultz (1985). "Conjugate Gradient-Like Algorithms for Solving Nonsymmetric Linear Systems," Math. Comp. 44, 417-424.
H.A. van der Vorst (1986). "An Iterative Solution Method for Solving f(A)x = b Using Krylov Subspace Information Obtained for the Symmetric Positive Definite Matrix A," J. Comp. and App. Math. 18, 249-263.
M.A. Saunders, H.D. Simon, and E.L. Yip (1988). "Two Conjugate Gradient-Type Methods for Unsymmetric Linear Equations," SIAM J. Num. Anal. 25, 927-940.
R. Freund (1992). "Conjugate Gradient-Type Methods for Linear Systems with Complex Symmetric Coefficient Matrices," SIAM J. Sci. Statist. Comput. 13, 425-448.
Y. Saad (1982). "The Lanczos Biorthogonalization Algorithm and Other Oblique Projection Methods for Solving Large Unsymmetric Systems," SIAM J. Numer. Anal. 19, 485-506.
Y. Saad (1987). "On the Lanczos Method for Solving Symmetric Systems with Several Right Hand Sides," Math. Comp. 48, 651-662.
C. Brezinski and H. Sadok (1991). "Avoiding Breakdown in the CGS Algorithm," Numer. Alg. 1, 199-206.
C. Brezinski, M. Zaglia, and H. Sadok (1992). "A Breakdown Free Lanczos Type Algorithm for Solving Linear Systems," Numer. Math. 63, 29-38.
S.K. Kim and A.T. Chronopoulos (1991). "A Class of Lanczos-Like Algorithms Implemented on Parallel Computers," Parallel Comput. 17, 763-778.
W. Joubert (1992). "Lanczos Methods for the Solution of Nonsymmetric Systems of Linear Equations," SIAM J. Matrix Anal. Appl. 13, 926-943.
R.W. Freund, M. Gutknecht, and N. Nachtigal (1993). "An Implementation of the Look-Ahead Lanczos Algorithm for Non-Hermitian Matrices," SIAM J. Sci. and Stat. Comp. 14, 137-158.
R.W. Freund and N. Nachtigal (1991). "QMR: A Quasi-Minimal Residual Method for Non-Hermitian Linear Systems," Numer. Math. 60, 315-339.
R.W. Freund (1993). "A Transpose-Free Quasi-Minimum Residual Algorithm for Non-Hermitian Linear Systems," SIAM J. Sci. Comput. 14, 470-482.
R.W. Freund and N.M. Nachtigal (1994). "An Implementation of the QMR Method Based on Coupled Two-term Recurrences," SIAM J. Sci. Comp. 15, 313-337.
The residuals in BiCG tend to display erratic behavior, prompting the development of stabilizing techniques:
H. van der Vorst (1992). "BiCGSTAB: A Fast and Smoothly Converging Variant of the Bi-CG for the Solution of Nonsymmetric Linear Systems," SIAM J. Sci. and Stat. Comp. 13, 631-644.
M. Gutknecht (1993). "Variants of BiCGSTAB for Matrices with Complex Spectrum," SIAM J. Sci. and Stat. Comp. 14, 1020-1033.
G.L.G. Sleijpen and D.R. Fokkema (1993). "BiCGSTAB(ℓ) for Linear Equations Involving Unsymmetric Matrices with Complex Spectrum," Electronic Transactions on Numerical Analysis 1, 11-32.
C. Brezinski and M. Redivo-Zaglia (1995). "Look-Ahead in BiCGSTAB and Other Product-Type Methods for Linear Systems," BIT 35, 169-201.

In some applications it is awkward to produce matrix-vector product code for both Ax and A^T x. Transpose-free methods are popular in this context. See

P. Sonneveld (1989). "CGS, A Fast Lanczos-Type Solver for Nonsymmetric Linear Systems," SIAM J. Sci. and Stat. Comp. 10, 36-52.
G. Radicati di Brozolo and Y. Robert (1989). "Parallel Conjugate Gradient-like Algorithms for Solving Sparse Nonsymmetric Linear Systems on a Vector Multiprocessor," Parallel Computing 11, 233-240.
C. Brezinski and M. Redivo-Zaglia (1994). "Treatment of Near-Breakdown in the CGS Algorithms," Numerical Algorithms 7, 33-73.
E.M. Kasenally (1995). "GMBACK: A Generalized Minimum Backward Error Algorithm for Nonsymmetric Linear Systems," SIAM J. Sci. Comp. 16, 698-719.
C.C. Paige, B.N. Parlett, and H.A. van der Vorst (1995). "Approximate Solutions and Eigenvalue Bounds from Krylov Subspaces," Num. Lin. Alg. with Applic. 2, 115-133.
M. Hochbruck and Ch. Lubich (1996). "On Krylov Subspace Approximations to the Matrix Exponential Operator," SIAM J. Numer. Anal., to appear.
M. Hochbruck and Ch. Lubich (1996). "Error Analysis of Krylov Methods in a Nutshell," SIAM J. Sci. Comput., to appear.
Connections between the pseudoinverse of a rectangular matrix A and the conjugate
gradient method applied to AT A are pointed out in the paper
Functions of Matrices
11.1.1 A Definition

There are many ways to establish rigorously the notion of a matrix function. See Rinehart (1955). Perhaps the most elegant approach is in terms of a line integral. Suppose f(z) is analytic inside and on a closed contour Γ which encircles λ(A). We define f(A) to be the matrix

    f(A) = (1/2πi) ∮_Γ f(z)(zI − A)^{-1} dz        (11.1.1)

    ⇒  f_ki = (1/2πi) ∮_Γ f(z) e_k^T (zI − A)^{-1} e_i dz .

Notice that the entries of (zI − A)^{-1} are analytic on Γ and that f(A) is defined whenever f(z) is analytic in a neighborhood of λ(A).
For the case when the B_i are Jordan blocks we obtain the following:

Theorem 11.1.1 Let X^{-1} A X = diag(J_1, ..., J_q) be the Jordan canonical form (JCF) of A ∈ ℂ^{n×n} with

          [ λ_i   1          0  ]
    J_i = [      λ_i   ⋱        ]
          [             ⋱    1  ]
          [  0              λ_i ]

being an m_i-by-m_i Jordan block. If f(z) is analytic on an open set containing λ(A), then

    f(A) = X diag(f(J_1), ..., f(J_q)) X^{-1}

where

             [ f(λ_i)   f^{(1)}(λ_i)   ⋯   f^{(m_i−1)}(λ_i)/(m_i−1)! ]
    f(J_i) = [            f(λ_i)       ⋱              ⋮              ]
             [                         ⋱        f^{(1)}(λ_i)         ]
             [   0                               f(λ_i)              ]
Proof. In view of the remarks preceding the statement of the theorem, it suffices to examine f(G) where

    G = λ I + E ,    E = (δ_{i,j−1}) ,

is a q-by-q Jordan block. Suppose (zI − G) is nonsingular. Since

    (zI − G)^{-1} = Σ_{k=0}^{q−1} E^k / (z − λ)^{k+1} ,

it follows that

    f(G) = Σ_{k=0}^{q−1} [ (1/2πi) ∮_Γ f(z)/(z − λ)^{k+1} dz ] E^k = Σ_{k=0}^{q−1} (f^{(k)}(λ)/k!) E^k .  □
These results illustrate the close connection between f(A) and the eigensystem of A. Unfortunately, the JCF approach to the matrix function problem has dubious computational merit unless A is diagonalizable with a well-conditioned matrix of eigenvectors. Indeed, rounding errors of order u·κ_2(X) can be expected to contaminate the computed result, since a linear system involving the matrix X must be solved. The following example suggests that ill-conditioned similarity transformations should be avoided when computing a function of a matrix.
Example 11.1.1 If

    A = [ 1 + 10^{-5}        1       ]
        [     0         1 − 10^{-5}  ]

then any matrix of eigenvectors is a column-scaled version of

    X = [ 1        1        ]
        [ 0   −2·10^{-5}    ]

and has a 2-norm condition number of order 10^5. Using a computer with machine precision u ≈ 10^{-7} we find

    fl[ X diag(exp(1 + 10^{-5}), exp(1 − 10^{-5})) X^{-1} ] = [ 2.718309  2.750000 ]
                                                              [ 0.000000  2.718254 ]

while

    e^A = [ 2.718309  2.718282 ]
          [ 0.000000  2.718255 ] .
11.1.3 A Schur Decomposition Approach

Some of the difficulties associated with the Jordan approach to the matrix function problem can be circumvented by relying upon the Schur decomposition. If A = QTQ^H is the Schur decomposition of A, then

    f(A) = Q f(T) Q^H .

For this to be effective, we need an algorithm for computing functions of upper triangular matrices. Unfortunately, an explicit expression for f(T) is very complicated as the following theorem shows.
where S_ij is the set of all strictly increasing sequences of integers that start at i and end at j, and f[λ_{s_0}, ..., λ_{s_k}] is the kth order divided difference of f at {λ_{s_0}, ..., λ_{s_k}}.

Proof. See Descloux (1963), Davis (1973), or Van Loan (1975). □
Computing f(T) via Theorem 11.1.3 would require O(2^n) flops. Fortunately, Parlett (1974) has derived an elegant recursive method for determining the strictly upper triangular portion of the matrix F = f(T). It requires only 2n³/3 flops and can be derived from the following commutativity result:

    F T = T F .        (11.1.3)

Indeed, by comparing (i,j) entries in this equation, we find

    Σ_{k=i}^{j} f_ik t_kj = Σ_{k=i}^{j} t_ik f_kj ,    j > i .        (11.1.4)

From this we conclude that f_ij is a linear combination of its neighbors to its left and below in the matrix F. For example, the entry f_25 depends upon f_22, f_23, f_24, f_55, f_45, and f_35. Because of this, the entire upper triangular portion of F can be computed one superdiagonal at a time beginning with the diagonal, f(t_11), ..., f(t_nn). The complete procedure is as follows:
    for p = 1:n−1
        for i = 1:n−p
            j = i + p
            s = t_ij (f_jj − f_ii)
            for k = i+1:j−1
                s = s + t_ik f_kj − f_ik t_kj
            end
            f_ij = s/(t_jj − t_ii)
        end
    end
This algorithm requires 2n³/3 flops. Assuming that A = QTQ^H is the Schur decomposition of A, f(A) = QFQ^H where F = f(T). Clearly, most of the work in computing f(A) by this approach is in the computation of the Schur decomposition, unless f is extremely expensive to evaluate.
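A direct NumPy rendering of the recurrence is given below (our naming). It assumes T has distinct diagonal entries; as the discussion that follows makes clear, close or repeated eigenvalues require special treatment.

    import numpy as np

    def funm_parlett(T, f):
        # Parlett recurrence: F = f(T) for upper triangular T with
        # distinct diagonal entries; f is a scalar function.
        n = T.shape[0]
        F = np.zeros_like(T, dtype=complex)
        for i in range(n):
            F[i, i] = f(T[i, i])               # diagonal of F: scalar values f(t_ii)
        for p in range(1, n):                  # one superdiagonal at a time
            for i in range(n - p):
                j = i + p
                s = T[i, j] * (F[j, j] - F[i, i])
                for k in range(i + 1, j):
                    s += T[i, k] * F[k, j] - F[i, k] * T[k, j]
                F[i, j] = s / (T[j, j] - T[i, i])
        return F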
Example 11.1.2 If

        [ 1  2  3 ]
    T = [ 0  3  4 ]
        [ 0  0  5 ]

and f(z) = (1 + z)/z, then F = (f_ij) = f(T) is defined by

    f_11 = (1 + 1)/1 = 2
    f_22 = (1 + 3)/3 = 4/3
    f_33 = (1 + 5)/5 = 6/5
    f_12 = t_12 (f_22 − f_11)/(t_22 − t_11) = −2/3
    f_23 = t_23 (f_33 − f_22)/(t_33 − t_22) = −4/15
    f_13 = [ t_13 (f_33 − f_11) + (t_12 f_23 − f_12 t_23) ]/(t_33 − t_11) = −1/15 .
eigenvalues are clustered in blocks T_11, ..., T_pp along the diagonal of T. In particular, we must compute a partitioning of T into these blocks. Next, we compute the submatrices F_ii = f(T_ii) for i = 1:p. Since the eigenvalues of T_ii are presumably close, these calculations require special methods. (Some possibilities are discussed in the next two sections.) Once the diagonal blocks of F are known, the blocks in the strict upper triangle of F can be found recursively, as in the scalar case. To derive the governing equations, we equate (i,j) blocks in FT = TF for i < j and obtain the following generalization of (11.1.4):

    F_ij T_jj − T_ii F_ij = T_ij F_jj − F_ii T_ij + Σ_{k=i+1}^{j−1} ( T_ik F_kj − F_ik T_kj ) .        (11.1.5)

This is a linear system whose unknowns are the elements of the block F_ij and whose right-hand side is "known" if we compute the F_ij one block superdiagonal at a time. We can solve (11.1.5) using the Bartels-Stewart algorithm (Algorithm 7.6.2).
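Each block step amounts to a Sylvester equation, for which SciPy's solve_sylvester can stand in for the Bartels-Stewart algorithm. Below is a sketch of one step under our naming, with C_mid denoting the accumulated middle sum in (11.1.5) (zero when j = i + 1); solvability requires λ(T_ii) ∩ λ(T_jj) = ∅.

    from scipy.linalg import solve_sylvester

    def block_parlett_step(Tii, Tjj, Tij, Fii, Fjj, C_mid):
        # Solve  F_ij T_jj - T_ii F_ij = T_ij F_jj - F_ii T_ij + C_mid  for F_ij.
        rhs = Tij @ Fjj - Fii @ Tij + C_mid
        # solve_sylvester(a, b, q) solves a x + x b = q, so take a = -T_ii.
        return solve_sylvester(-Tii, Tjj, rhs)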
The block Schur approach described here is useful when computing real functions of real matrices. After computing the real Schur form A = QTQ^T, the block algorithm can be invoked in order to handle the 2-by-2 bumps along the diagonal of T.
Problems
P11.1.1 Using the definition (11.1.1) show that (a) Af(A) = f(A)A, (b) f(A) is upper triangular if A is upper triangular, and (c) f(A) is Hermitian if A is Hermitian.
P11.1.2 Rewrite Algorithm 11.1.1 so that f(T) is computed column by column.
P11.1.3 Suppose A = X diag(λ_i) X^{-1} where X = [x_1, ..., x_n] and X^{-T} = [y_1, ..., y_n]. Show that if f(A) is defined, then

    f(A) = Σ_{i=1}^{n} f(λ_i) x_i y_i^T .
P11.1.4 Suppose

    T = [ T_11  T_12 ]   p
        [  0    T_22 ]   q
           p      q

and that f(T) is defined. Show that

    f(T) = [ F_11  F_12 ]
           [  0    F_22 ]

where F_11 = f(T_11) and F_22 = f(T_22).
The contour integral representation of f(A) given in the text is useful in functional analysis because of its generality. See

N. Dunford and J. Schwartz (1958). Linear Operators, Part I, Interscience, New York.

As we discussed, other definitions of f(A) are possible. However, for the matrix functions typically encountered in practice, all these definitions are equivalent. See
J.S. Frame (1964). "Matrix Functions and Applications, Part II," IEEE Spectrum 1 (April), 102-8.
J.S. Frame (1964). "Matrix Functions and Applications, Part IV," IEEE Spectrum 1 (June), 123-31.
The following are concerned with the Schur decomposition and its relationship to the
f(A) problem:
D. Davis (1973). "Explicit Functional Calculus," Lin. Alg. and Its Applic. 6, 193-99.
J. Descloux (1963). "Bounds for the Spectral Norm of Functions of Matrices," Numer.
Math. 5, 185--90.
C.F. Van Loan (1975). "A Study of the Matrix Exponential," Numerical Analysis Report No. 10, Dept. of Maths., University of Manchester, England.
Algorithm 11.1.1 and the various computational difficulties that arise when it is applied
to a matrix having close or repeated eigenvalues are discussed in
B.N. Parlett (1976). "A Recurrence Among the Elements of Functions of Triangular Matrices," Lin. Alg. and Its Applic. 14, 117-21.
A compromise between the Jordan and Schur approaches to the f(A) problem results if
A is reduced to block diagonal form as described in §7.6.3. See
C.S. Kenney and A.J. Laub (1989). "Condition Estimates for Matrix Functions," SIAM
J. Matrix Anal. Appl. 10, 191-209.
C.S. Kenney and A.J. Laub (1994). "Small-Sample Statistical Condition Estimates for General Matrix Functions," SIAM J. Sci. Comp. 15, 36-61.
A theme in this chapter is that if A is nonnormal, then there is more to computing f(A) than just computing f(z) on λ(A). The pseudo-eigenvalue concept is a way of understanding this phenomenon. See
We begin by bounding || f(A) − g(A) || using the Jordan and Schur matrix function representations. We follow this discussion with some comments on the evaluation of matrix polynomials.
being an m_i-by-m_i Jordan block. If f(z) and g(z) are analytic on an open set containing λ(A), then

    || f(A) − g(A) ||_2 ≤ κ_2(X) max_{1≤i≤q} Σ_{r=0}^{m_i−1} | f^{(r)}(λ_i) − g^{(r)}(λ_i) | / r!

where κ_2(X) = || X ||_2 || X^{-1} ||_2.
Proof. Let h(z) = f(z) − g(z) and set H = (h_ij) = h(A). Let S_ij^{(r)} denote the set of strictly increasing integer sequences (s_0, ..., s_r) with the property that s_0 = i and s_r = j. Notice that

    S_ij = ∪_{r=1}^{j−i} S_ij^{(r)}

and so from Theorem 11.1.3, we obtain the following for all i < j:

    h_ij = Σ_{r=1}^{j−i}  Σ_{s ∈ S_ij^{(r)}}  t_{s_0,s_1} t_{s_1,s_2} ⋯ t_{s_{r−1},s_r} h[λ_{s_0}, ..., λ_{s_r}] .

Moreover,

    | h[λ_{s_0}, ..., λ_{s_r}] | ≤ (1/r!) sup_{z ∈ Ω} | h^{(r)}(z) |        (11.2.1)

and, with N the strictly upper triangular portion of T,

    Σ_{s ∈ S_ij^{(r)}} | t_{s_0,s_1} | ⋯ | t_{s_{r−1},s_r} | = ( |N|^r )_{ij} = 0   if  j < i + r .        (11.2.2)

The theorem now follows by taking absolute values in the expression for h_ij and then using (11.2.1) and (11.2.2). □
The bounds in the above theorems suggest that there is more to approximating f(A) than just approximating f(z) on the spectrum of A. In particular, we see that if the eigensystem of A is ill-conditioned and/or A's departure from normality is large, then the discrepancy between f(A) and g(A) may be considerably larger than the maximum of |f(z) − g(z)| on λ(A). Thus, even though approximation methods avoid eigenvalue computations, they appear to be influenced by the structure of A's eigensystem, a point that we pursue further in the next section.
Example 11.2.1 If

        [ −.01   1    1   ]
    A = [   0    0    1   ]
        [   0    0   .01  ]

and f(z) = e^z and g(z) = 1 + z + z²/2, then || f(A) − g(A) || ≈ 10^{-5} in either the Frobenius norm or the 2-norm. Since κ_2(X) ≈ 10^7, the error predicted by Theorem 11.2.1 is O(1), rather pessimistic. On the other hand, the error predicted by the Schur decomposition approach is O(10^{-2}).
    f(z) = Σ_{k=0}^{∞} c_k z^k

    sin(A) = Σ_{k=0}^{∞} (−1)^k A^{2k+1}/(2k+1)!

The following theorem bounds the errors that arise when matrix functions such as these are approximated via truncated Taylor series. If

    f(z) = Σ_{k=0}^{∞} c_k z^k        (11.2.4)

then

    | e_ij | ≤ (1/(q+1)!) max_{0≤s≤1} || A^{q+1} f^{(q+1)}(As) ||_2 .
Example 11.2.2 If

    A = [ −49  24 ]
        [ −64  31 ]

then

    e^A = [ −0.735759  0.551819 ]
          [ −1.471518  1.103638 ] .

For q = 59, Theorem 11.2.4 predicts that

    || e^A − Σ_{k=0}^{q} A^k/k! ||_2 ≤ (n/(q+1)!) max_{0≤s≤1} || A^{q+1} e^{As} ||_2 ≈ 0 .

However,

    fl( Σ_{k=0}^{59} A^k/k! ) = [ −22.25880  −1.432766 ]
                                [ −61.49931  −3.474280 ] .

The problem is that some of the partial sums have large elements. For example, I + ⋯ + A^{17}/17! has entries of order 10^7. Since the machine precision is approximately 10^{-7}, rounding errors larger than the norm of the solution are sustained.
it is possible to "build up" the sine and cosine of a matrix from suitably truncated Taylor series approximants. Here k is a positive integer chosen so that, say, || A ||_∞ ≈ 2^k. See Serbin and Blalock (1979).
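A sketch of this buildup, assuming the standard double-angle identities cos(2X) = 2cos(X)² − I and sin(2X) = 2 sin(X) cos(X); the function name and the fixed number of Taylor terms are our choices.

    import numpy as np

    def cos_sin_double_angle(A, terms=8):
        # Taylor series for cos and sin at A/2^k, followed by k doublings.
        nrm = np.linalg.norm(A, np.inf)
        k = 0 if nrm <= 1 else int(np.ceil(np.log2(nrm)))   # ||A/2^k||_inf <= 1
        B = A / (2.0 ** k)
        n = A.shape[0]
        C = np.eye(n); S = np.zeros((n, n)); T = np.eye(n)
        for j in range(1, terms + 1):        # T = B^j / j! after the update
            T = T @ B / j
            if j % 4 == 1:   S += T
            elif j % 4 == 2: C -= T
            elif j % 4 == 3: S -= T
            else:            C += T
        for _ in range(k):                   # double-angle recursions (update S first)
            S = 2.0 * S @ C
            C = 2.0 * C @ C - np.eye(n)
        return C, S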
    A_2 = A²
    A_3 = A A_2
    F_1 = b_9 A_3 + b_8 A_2 + b_7 A + b_6 I
    F_2 = A_3 F_1 + b_5 A_2 + b_4 A + b_3 I
    F   = A_3 F_2 + b_2 A_2 + b_1 A + b_0 I

Let s = Σ_{k=0}^{t} β_k 2^k be the binary expansion of s with β_t ≠ 0.
    Z = A; q = 0
    while β_q = 0
        Z = Z²; q = q + 1
    end
    F = Z
    for k = q+1:t
        Z = Z²
        if β_k ≠ 0
            F = F Z
        end
    end
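The loop translates almost verbatim into Python; here is a sketch for integer s ≥ 1 that works on the bits of s directly rather than on a precomputed expansion (our naming).

    import numpy as np

    def matrix_power_binary(A, s):
        # Compute A^s (integer s >= 1) with about log2(s) matrix squarings.
        Z = A.copy()
        while s % 2 == 0:          # strip trailing zero bits (beta_q = 0)
            Z = Z @ Z
            s //= 2
        F = Z.copy()
        s //= 2
        while s > 0:               # remaining bits beta_{q+1}, ..., beta_t
            Z = Z @ Z
            if s % 2 == 1:
                F = F @ Z
            s //= 2
        return F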
    F = ∫_a^b f(At) dt .

If (d⁴/dz⁴) f(zt) = f^{(4)}(zt) is continuous for t ∈ [a, b] and if f^{(4)}(At) is defined on this same interval, then it can be shown that F̃ = F + E where

        (11.2.7)

Let f_ij and e_ij denote the (i,j) entries of F̃ and E, respectively. Under the above assumptions we can apply the standard error bounds for Simpson's rule and obtain
the inequality (11.2.7), which now follows since || E ||_2 ≤ n max |e_ij| and
Problems
P11.2.1 (a) Suppose G = λI + E is a p-by-p Jordan block, where E = (δ_{i,j−1}). Show that

    (λI + E)^k = Σ_{i=0}^{min{p−1,k}} ( k choose i ) λ^{k−i} E^i .

(b) Use (a) and Theorem 11.1.1 to prove Theorem 11.2.3.
P11.2.2 Verify (11.2.2).
P11.2.3 Show that if || A ||_2 < 1, then log(I + A) exists and satisfies the bound
    sin(A) ≈ Σ_{k=0}^{q} (−1)^k A^{2k+1} / (2k+1)!
C.F. Van Loan (1978). "A Note on the Evaluation of Matrix Polynomials," IEEE Trans. Auto. Cont. AC-24, 320-21.
Other aspects of matrix function computation are discussed in
N.J. Higham and P.A. Knight (1995). "Matrix Powers in Finite Precision Arithmetic,"
SIAM J. Matrix Anal. Appl. 16, 343-358.
R. Mathias (1993). "Approximation of Matrix-Valued Functions," SIAM J. Matrix Anal.
Appl. 14, 1061-1063.
S. Friedland (1991). "Revisiting Matrix Squaring," Lin. Alg. and Its Applic. 154-156,
59-63.
H. Bolz and W. Niethammer (1988). "On the Evaluation of Matrix Functions Given by
Power Series," SIAM J. Matrix Anal. Appl. 9, 202-209.
The Newton and Lagrange representations for f(A) and their relationship to other matrix function definitions is discussed in
The "double angle" method for computing the cosine of matrix is analyzed in
S. Serbin and S. Blalock (1979). "An Algorithm for Computing the Matrix Cosine,"
SIAM J. Sci. Stat. Comp. 1, 198-204.
The square root is a particularly important matrix function. See §4.2.10. Several approaches are possible:
A. Björck and S. Hammarling (1983). "A Schur Method for the Square Root of a Matrix," Lin. Alg. and Its Applic. 52/53, 127-140.
N.J. Higham (1986). "Newton's Method for the Matrix Square Root," Math. Comp. 46, 537-550.
N.J. Higham (1987). "Computing Real Square Roots of a Real Matrix," Lin. Alg. and Its Applic. 88/89, 405-430.
    e^{At} = Σ_{k=0}^{∞} (At)^k / k! .
Numerous algorithms for computing eAt have been proposed, but most of
them are of dubious numerical quality, as is pointed out in the survey article
by Moler and Van Loan (1978). In order to illustrate what the computa-
tional difficulties are, we present a "scaling and squaring" method based
upon Pade approximation. A brief analysis of the method follows that in-
volves some eAt perturbation theory and comments about the shortcomings
of eigenanalysis in settings where non-normality prevails.
where

    N_pq(z) = Σ_{k=0}^{p} [ (p+q−k)! p! / ( (p+q)! k! (p−k)! ) ] z^k

and

    D_pq(z) = Σ_{k=0}^{q} [ (p+q−k)! q! / ( (p+q)! k! (q−k)! ) ] (−z)^k .

Notice that R_{p0}(z) = 1 + z + ⋯ + z^p/p! is the pth order Taylor polynomial.
Unfortunately, the Padé approximants are good only near the origin, as the following identity reveals. If

    || A ||_∞ / 2^j ≤ 1/2 ,

then there exists an E ∈ ℝ^{n×n} such that

    F_pq ≡ [ R_pq(A/2^j) ]^{2^j} = e^{A+E}
    A E = E A
    || E ||_∞ ≤ ε(p,q) || A ||_∞

where

    ε(p,q) = 2^{3−(p+q)} p! q! / [ (p+q)! (p+q+1)! ] .

These results form the basis of an effective e^A procedure with error control. Using the above formulae it is easy to establish the inequality:
    j = max(0, 1 + floor(log_2 || A ||_∞));  A = A/2^j
    D = I; N = I; X = I; c = 1
    for k = 1:q
        c = c(q − k + 1)/[(2q − k + 1)k]
        X = AX; N = N + cX; D = D + (−1)^k cX
    end
    Solve DF = N for F using Gaussian elimination.
    for k = 1:j
        F = F²
    end

This algorithm requires about 2(q + j + 1/3)n³ flops. The roundoff error properties of this algorithm have essentially been analyzed by Ward (1977).
The special Horner techniques of §11.2 can be applied to quicken the computation of D = D_qq(A) and N = N_qq(A). For example, if q = 8 we have N_qq(A) = U + AV and D_qq(A) = U − AV where

    U = c_0 I + c_2 A_2 + (c_4 I + c_6 A_2 + c_8 A_4) A_4

and

    V = c_1 I + c_3 A_2 + (c_5 I + c_7 A_2) A_4 ,

with A_2 = A² and A_4 = A_2². Clearly, N and D can be found in 5 matrix multiplies rather than the 7 required by Algorithm 11.3.1.
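Putting the pieces together, a bare-bones NumPy version of the scaling-and-squaring procedure reads as follows. It is a sketch only: the function name expm_pade is ours, q is fixed rather than chosen from the error bound, and no error control is performed.

    import numpy as np

    def expm_pade(A, q=6):
        # Scaling and squaring with the diagonal (q,q) Pade approximant,
        # in the spirit of Algorithm 11.3.1.
        nrm = np.linalg.norm(A, np.inf)
        j = 0 if nrm == 0 else max(0, int(np.ceil(np.log2(nrm))) + 1)
        As = A / (2.0 ** j)                 # now ||As||_inf <= 1/2
        n = A.shape[0]
        D = np.eye(n); N = np.eye(n); X = np.eye(n); c = 1.0
        for k in range(1, q + 1):
            c = c * (q - k + 1) / ((2 * q - k + 1) * k)
            X = As @ X
            N = N + c * X
            D = D + ((-1) ** k) * c * X
        F = np.linalg.solve(D, N)           # D F = N via Gaussian elimination
        for _ in range(j):                  # undo the scaling: square j times
            F = F @ F
        return F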
    || e^{(A+E)t} − e^{At} ||_2 / || e^{At} ||_2  ≤  ( || E ||_2 / || e^{At} ||_2 ) ∫_0^t || e^{A(t−s)} ||_2 || e^{(A+E)s} ||_2 ds
where

    α(A) = max { Re(λ) : λ ∈ λ(A) }        (11.3.3)

and

    M_S(t) = Σ_{k=0}^{n−1} || N t ||_2^k / k! ,

N being the strictly upper triangular portion of A's Schur form. The quantity α(A) is called the spectral abscissa and with a little manipulation it can be shown that

    || e^{(A+E)t} − e^{At} ||_2 / || e^{At} ||_2 ≤ t || E ||_2 M_S(t) exp( t M_S(t) || E ||_2 ) .
Notice that M_S(t) = 1 if and only if A is normal, suggesting that the matrix exponential problem is "well behaved" if A is normal. This observation is confirmed by the behavior of the matrix exponential condition number ν(A, t). For example,

    A = [ −1   M ]        ⇒        e^{At} = e^{−t} [ 1  tM ]
        [  0  −1 ]                                 [ 0   1 ]
Problems
P11.3.1 Show that e^{(A+B)t} = e^{At} e^{Bt} for all t if and only if AB = BA. (Hint: Express both sides as power series in t and compare the coefficients of t.)
P11.3.2 Suppose that A is skew-symmetric. Show that both e^A and the (1,1) Padé approximant R_11(A) are orthogonal. Are there any other values of p and q for which R_pq(A) is orthogonal?
P11.3.3 Show that if A is nonsingular, then there exists a matrix X such that A = e^X. Is X unique?
P11.3.4 Show that if

    exp( [ −A^T  P ] T )  =  [ F_11  F_12 ]
         [  0    A ]         [  0    F_22 ]

then

    F_22^T F_12 = ∫_0^T e^{A^T t} P e^{At} dt .
P11.3.5 Give an algorithm for computing e^A when A = uv^T, u, v ∈ ℝ^n.
P11.3.6 Suppose A ∈ ℝ^{n×n} and that v ∈ ℝ^n has unit 2-norm. Define the function φ(t) = || e^{At} v ||_2²/2 and show that

    φ̇(t) ≤ 2 μ(A) φ(t)

where μ(A) = λ_1((A + A^T)/2). Conclude that || e^{At} ||_2 ≤ e^{μ(A)t} for t ≥ 0.
P11.3.7 Prove the three pseudospectra properties given in the text.
C.B. Moler and C.F. Van Loan (1978). "Nineteen Dubious Ways to Compute the Exponential of a Matrix," SIAM Review 20, 801-36.

Scaling and squaring with Padé approximants (Algorithm 11.3.1) and a careful implementation of Parlett's Schur decomposition method (Algorithm 11.1.1) were found to be among the less dubious of the nineteen methods scrutinized. Various aspects of Padé
R.S. Varga (1961). "On Higher-Order Stable Implicit Methods for Solving Parabolic Partial Differential Equations," J. Math. Phys. 40, 220-31.

There are many applications in control theory calling for the computation of the matrix exponential. In the linear optimal regulator problem, for example, various integrals involving the matrix exponential are required. See

J. Johnson and C.L. Phillips (1971). "An Algorithm for the Computation of the Integral of the State Transition Matrix," IEEE Trans. Auto. Cont. AC-16, 204-5.
C.F. Van Loan (1978). "Computing Integrals Involving the Matrix Exponential," IEEE Trans. Auto. Cont. AC-23, 395-404.
An understanding of the map A --> exp(At) and its sensitivity is helpful when assessing
the performance of algorithms for computing the matrix exponential. Work in this di-
rection includes
B. Kågström (1977). "Bounds and Perturbation Bounds for the Matrix Exponential," BIT 17, 39-57.
C.F. Van Loan (1977). "The Sensitivity of the Matrix Exponential," SIAM J. Num.
Anal. 14, 971-81.
R. Mathias (1992). "Evaluating the Frechet Derivative of the Matrix Exponential,"
Numer. Math. 63, 213-226.
The computation of a logarithm of a matrix is an important area demanding much more work. These calculations arise in various "system identification" problems. See
Special Topics
where A ∈ ℝ^{m×n} (m ≥ n), b ∈ ℝ^m, B ∈ ℝ^{p×n}, d ∈ ℝ^p, and α ≥ 0. The generalized singular value decomposition of §8.7.3 sheds light on the solvability of (12.1.2). Indeed, if

    U^T A X = D_A = diag(α_1, ..., α_n) ,
    V^T B X = D_B = diag(β_1, ..., β_p)        (12.1.4)

is the GSVD of (A, B) and we set b̃ = U^T b, d̃ = V^T d, and y = X^{-1} x, then (12.1.2) transforms to minimizing || D_A y − b̃ ||_2 subject to || D_B y − d̃ ||_2 ≤ α. With r = rank(B), the constraint set is nonempty if and only if

    Σ_{i=r+1}^{p} d̃_i² ≤ α² .

If equality holds, then feasibility forces y_i = d̃_i/β_i for i = 1:r and

    y_i = { d̃_i / β_i        i = 1:r
          { b̃_i / α_i        i = r+1:n, α_i ≠ 0        (12.1.6)
          { 0                i = r+1:n, α_i = 0

solves the LSQI problem. Otherwise

    Σ_{i=r+1}^{p} d̃_i² < α²        (12.1.7)

and the vector y defined by

    y_i = { b̃_i / α_i        α_i ≠ 0, i = 1:n
          { d̃_i / β_i        α_i = 0

is a minimizer of || D_A y − b̃ ||_2. If this vector is also feasible, then we have a solution to (12.1.2). (This is not necessarily the solution of minimum 2-norm, however.) We therefore assume that this minimizer is infeasible. This implies that the solution to the LSQI problem occurs on the boundary of the feasible set. Thus, our remaining goal is to minimize || D_A y − b̃ ||_2 subject to || D_B y − d̃ ||_2 = α.
Using the method of Lagrange multipliers with h(λ, y) = || D_A y − b̃ ||_2² + λ( || D_B y − d̃ ||_2² − α² ), we see that the equations 0 = ∂h/∂y_i, i = 1:n, lead to the linear system
    else
        x = Σ_{i=1}^{r} (b̃_i/σ_i) v_i
    end

is given by

    ( 8/(λ+4) )² + ( 2/(λ+1) )² = 1 .

For this problem we find λ* = 4.57132 and x = [ .93334  .35898 ]^T.
In the general ridge regression problem one has some criteria for selecting the ridge parameter λ, e.g., || x(λ) ||_2 = α for some given α. We describe a λ-selection procedure that is discussed in Golub, Heath, and Wahba (1979).

Set D_k = I − e_k e_k^T = diag(1, ..., 1, 0, 1, ..., 1) ∈ ℝ^{m×m} and let x_k(λ) solve

    min || D_k (A x − b) ||_2² + λ || x ||_2² .

Thus, x_k(λ) is the solution to the ridge regression problem with the kth row of A and kth component of b deleted, i.e., the kth experiment is ignored. Now consider choosing λ so as to minimize the cross-validation weighted square error C(λ) defined by

    C(λ) = (1/m) Σ_{k=1}^{m} w_k ( a_k^T x_k(λ) − b_k )² ,

where a_k^T is the kth row of A and the w_k are nonnegative weights. Since x_k(λ) does not depend on b_k, we see that (a_k^T x_k(λ) − b_k)² is the increase in the sum of squares resulting when the kth row is "reinstated." Minimizing C(λ) is tantamount to choosing λ such that the final model is not overly dependent on any one experiment.
A more rigorous analysis can make this statement precise and also suggest a method for minimizing C(λ). Assuming that λ > 0, an algebraic manipulation shows that

    x_k(λ) = x(λ) + ( (a_k^T x(λ) − b_k) / (1 − a_k^T z_k) ) z_k        (12.1.11)

where z_k = (A^T A + λI)^{-1} a_k and x(λ) = (A^T A + λI)^{-1} A^T b. Applying −a_k^T to (12.1.11) and then adding b_k to each side of the resulting equation gives

    b_k − a_k^T x_k(λ) = ( b_k − a_k^T x(λ) ) / ( 1 − a_k^T z_k ) .

Noting that the residual r = (r_1, ..., r_m)^T = b − A x(λ) is given by the formula r = [ I − A(A^T A + λI)^{-1} A^T ] b, we see that

    C(λ) = (1/m) Σ_{k=1}^{m} w_k ( r_k / (∂r_k/∂b_k) )² .
With unit weights and the SVD A = UΣV^T, this expression becomes

    C(λ) = (1/m) || ( I − A(A^T A + λI)^{-1} A^T ) b ||_2² / [ 1 − (1/m) Σ_{i=1}^{n} σ_i²/(σ_i² + λ) ]² .

The minimization of this expression is discussed in Golub, Heath, and Wahba (1979).
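Once the SVD of A is available, x(λ) is cheap to evaluate for many trial values of λ, which is how a C(λ) minimization or a constraint || x(λ) ||_2 = α is typically carried out in practice. A minimal sketch (our naming):

    import numpy as np

    def ridge_svd(A, b, lam):
        # x(lambda) = (A^T A + lambda I)^{-1} A^T b via the SVD of A.
        U, sig, Vt = np.linalg.svd(A, full_matrices=False)
        f = sig / (sig**2 + lam)            # filter factors sigma_i/(sigma_i^2 + lambda)
        return Vt.T @ (f * (U.T @ b))

One SVD then supports scanning λ over a grid, e.g. to locate where || x(λ) ||_2 = α or where an evaluated C(λ) is smallest.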
Let

    B^T = Q [ R ]   p
            [ 0 ]   n−p

be the QR factorization of B^T and set

    A Q = [ A_1  A_2 ] ,    A_1 ∈ ℝ^{m×p}, A_2 ∈ ℝ^{m×(n−p)} .

If x = Q [ y ; z ] with y ∈ ℝ^p and z ∈ ℝ^{n−p}, then Bx = R^T y and || Ax − b ||_2 = || A_1 y + A_2 z − b ||_2. Thus, y is determined from the constraint equation R^T y = d and the vector z is obtained by solving the unconstrained LS problem

    min_z || A_2 z − (b − A_1 y) ||_2 .

Combining these steps we obtain:

    B^T = QR    (QR factorization)
    Solve R(1:p, 1:p)^T y = d for y.
    A = AQ
    Find z so || A(:, p+1:n) z − (b − A(:, 1:p) y) ||_2 is minimized.
    x = Q(:, 1:p) y + Q(:, p+1:n) z
Note that this approach to the LSE problem involves two factorizations and
a matrix multiplication.
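A NumPy sketch of this null space method, assuming B ∈ ℝ^{p×n} has full row rank p (the function name lse_nullspace is ours):

    import numpy as np

    def lse_nullspace(A, B, b, d):
        # Minimize ||Ax - b||_2 subject to Bx = d via the QR factorization of B^T.
        p, n = B.shape
        Q, R = np.linalg.qr(B.T, mode='complete')    # B^T = QR with Q n-by-n
        y = np.linalg.solve(R[:p, :p].T, d)          # R(1:p,1:p)^T y = d
        AQ = A @ Q
        z, *_ = np.linalg.lstsq(AQ[:, p:], b - AQ[:, :p] @ y, rcond=None)
        return Q[:, :p] @ y + Q[:, p:] @ z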
    (12.1.14)

for large λ. The generalized singular value decomposition of §8.7.3 sheds light on the quality of the approximation. Let

    U^T A X = diag(α_1, ..., α_n) = D_A ∈ ℝ^{m×n}
    V^T B X = diag(β_1, ..., β_p) = D_B ∈ ℝ^{p×n}

be the GSVD of (A, B) and assume that both matrices have full rank for clarity. If U = [u_1, ..., u_m], V = [v_1, ..., v_p], and X = [x_1, ..., x_n], then it is easy to show that

    x = Σ_{i=1}^{p} (v_i^T d / β_i) x_i + Σ_{i=p+1}^{n} (u_i^T b / α_i) x_i        (12.1.15)

and

    x(λ) − x        (12.1.17)
Problems
P 12.1.1 {a) Show that if nuii(A) n null( B) ~ {0}, then (12.1.2) canncrt ha~ a unique
solution. (b) Give an example which shows that the converae is not true. (Hint: A+b
feasible.)
P12.1.2 Let Po(:r), ... ,p,.(:r) be given polynomlu and (:ro, flo), ... , (:r.n , y.,.) a given
set of coordinate pairs with :r,
E (a, b). It Is desired to find a polynomial p(:r) =
:L;-o Oi<PA:(:r) such that :L:.O(p{:r,)-";)
2 Is minimized subject to the constraint that
b N 2
l0
IP'' (:r)):ad:r Rj h ~ (p(zo- 1)- 2p~;') + P(Zi+l) ) ~ Q2
where zo = a+ ih and b =a+ N h. Show that thie leade to an LSQI problem of the form
(12.1.1).
P12.1.S Suppo&e Y = [111 , .. . ,1/, J E R" x• has lhe properly that
yTy = diag(4, .. . ,4) d1 ~ d:a ~ .. · ~ d~o > 0.
Show that if Y = QR is the QR factorization of Y , then R is diagonal with Jr;tJ = d;.
P12.1.4 (a) Show that if (A^T A + λI)x = A^T b, λ > 0, and || x ||_2 = α, then z =
(Ax - b)/λ solves the dual equations (AA^T + λI)z = -b with || A^T z ||_2 = α. (b) Show
that if (AA^T + λI)z = -b and || A^T z ||_2 = α, then x = -A^T z satisfies
(A^T A + λI)x = A^T b, || x ||_2 = α.
P12.1.5 Suppose A is the m-by-1 matrix of ones and let b ∈ R^m. Show that the
cross-validation technique with unit weights prescribes an optimal λ given by
    λ = ...
where ᾱ = (b_1 + ... + b_m)/m and s = sum_{i=1}^{m} (b_i - ᾱ)^2 / (m - 1).
    A = [ A_1 ]
        [ A_2 ]

where A_1 ∈ R^{n×n} is nonsingular and A_2 ∈ R^{(m-n)×n}. Show that
Assume that B and C are positive definite and that Z ∈ R^{n×n} is a nonsingular matrix
with the property that Z^T B Z = diag(λ_1, ..., λ_n) and Z^T C Z = I_n. Assume that
λ_1 ≥ ... ≥ λ_n. (a) Show that the set of feasible x is empty unless λ_n ≤ β^2/γ^2 ≤ λ_1.
(b) Using Z, show how the two-constraint problem can be converted to a single-constraint
problem of the form

    min || Ax - b ||_2    subject to    y^T W y = β^2 - λ_n γ^2.
G.E. Forsythe and G.H. Golub (1965). "On the Stationary Values of a Second-Degree Polynomial on the Unit Sphere," SIAM J. App. Math. 14, 1050-68.
L. Elden (1980). "Perturbation Theory for the Least Squares Problem with Linear Equality Constraints," SIAM J. Num. Anal. 17, 338-50.
W. Gander (1981). "Least Squares with a Quadratic Constraint," Numer. Math. 36, 291-307.
L. Elden (1983). "A Weighted Pseudoinverse, Generalized Singular Values, and Constrained Least Squares Problems," BIT 22, 487-502.
G.W. Stewart (1984). "On the Asymptotic Behavior of Scaled Singular Value and QR Decompositions," Math. Comp. 43, 483-490.
G.H. Golub and U. von Matt (1991). "Quadratically Constrained Least Squares and Quadratic Problems," Numer. Math. 59, 561-580.
T.F. Chan, J.A. Olkin, and D. Cooley (1992). "Solving Quadratically Constrained Least Squares Using Black Box Solvers," BIT 32, 481-495.
Other computational aspects of the LSQI problem involve updating and the handling of
banded and sparse problems. See
K. Schittkowski and J. Stoer (1979). "A Factorization Method for the Solution of Constrained Linear Least Squares Problems Allowing for Subsequent Data Changes," Numer. Math. 31, 431-463.
D.P. O'Leary and J.A. Simmons (1981). "A Bidiagonalization-Regularization Procedure for Large Scale Discretizations of Ill-Posed Problems," SIAM J. Sci. and Stat. Comp. 2, 474-489.
A. Bjorck (1984). "A General Updating Algorithm for Constrained Linear Least Squares Problems," SIAM J. Sci. and Stat. Comp. 5, 394-402.
L. Elden (1984). "An Algorithm for the Regularization of Ill-Conditioned, Banded Least Squares Problems," SIAM J. Sci. and Stat. Comp. 5, 237-254.
M.J.D. Powell and J.K. Reid (1968). "On Applying Householder's Method to Linear Least Squares Problems," Proc. IFIP Congress, pp. 122-26.
C. Van Loan (1985). "On the Method of Weighting for Equality Constrained Least Squares Problems," SIAM J. Numer. Anal. 22, 851-864.
J.L. Barlow, N.K. Nichols, and R.J. Plemmons (1988). "Iterative Methods for Equality Constrained Least Squares Problems," SIAM J. Sci. and Stat. Comp. 9, 892-906.
J.L. Barlow (1988). "Error Analysis and Implementation Aspects of Deferred Correction for Equality Constrained Least-Squares Problems," SIAM J. Num. Anal. 25, 1340-1358.
J.L. Barlow and S.L. Handy (1988). "The Direct Solution of Weighted and Equality Constrained Least-Squares Problems," SIAM J. Sci. Stat. Comp. 9, 704-716.
J.L. Barlow and U.B. Vemulapati (1992). "A Note on Deferred Correction for Equality Constrained Least Squares Problems," SIAM J. Num. Anal. 29, 249-256.
M. Wei (1992). "Perturbation Theory for the Rank-Deficient Equality Constrained Least Squares Problem," SIAM J. Num. Anal. 29, 1462-1481.
M. Wei (1992). "Algebraic Properties of the Rank-Deficient Equality-Constrained and Weighted Least Squares Problems," Lin. Alg. and Its Applic. 161, 27-44.
M. Gulliksson and P-A. Wedin (1992). "Modifying the QR-Decomposition to Constrained and Weighted Linear Least Squares," SIAM J. Matrix Anal. Appl. 13, 1298-1313.
A. Bjorck and C.C. Paige (1994). "Solution of Augmented Linear Systems Using Orthogonal Factorizations," BIT 34, 1-24.
M. Gulliksson (1994). "Iterative Refinement for Constrained and Weighted Linear Least Squares," BIT 34, 239-253.
M. Gulliksson (1995). "Backward Error Analysis for the Constrained and Weighted Linear Least Squares Problem When Using the Weighted QR Factorization," SIAM J. Matrix Anal. Appl. 16, 675-687.
Generalized factorizations have an important bearing on generalized least squares problems.
C.C. Paige (1985). "The General Linear Model and the Generalized Singular Value Decomposition," Lin. Alg. and Its Applic. 70, 269-284.
C.C. Paige (1990). "Some Aspects of Generalized QR Factorization," in Reliable Numerical Computations, M. Cox and S. Hammarling (eds), Clarendon Press, Oxford.
E. Anderson, Z. Bai, and J. Dongarra (1992). "Generalized QR Factorization and Its Applications," Lin. Alg. and Its Applic. 162/163/164, 243-271.
with

    x_r̂ = sum_{i=1}^{r̂} (u_i^T b / σ_i) v_i

where

    A = U Σ V^T = sum_{i=1}^{r} σ_i u_i v_i^T                         (12.2.1)

is the SVD of A and r̂ is some numerically determined estimate of r. Note
that x_r̂ minimizes || A_r̂ x - b ||_2 where

    A_r̂ = sum_{i=1}^{r̂} σ_i u_i v_i^T .
A=
I 1
Il+E
[ 0 0
0
I,
I
l b = [ -i l,
f = 2, and P = I, then min II B1z- b ll2 = 0, but II Bib ll2 = O{Ijf}.
On the other hand, any proper subset involving the third column of A is
strongly independent but renders a much worse residual.
This example shows that there can be a trade-off between the independence
of the chosen columns and the norm of the residual that they render.
How to proceed in the face of this trade-off requires additional mathematical
machinery in the form of useful bounds on σ_r̂(B_1), the smallest singular
value of B_1.
Theorem 12.2.1 Let the SVD of A ∈ R^{m×n} be given by (12.2.1), and
define the matrix B_1 ∈ R^{m×r̂}, r̂ ≤ rank(A), by

    AP = [ B_1  B_2 ]
           r̂    n-r̂

where P ∈ R^{n×n} is a permutation. If

    P^T V = [ V_11  V_12 ]  r̂
            [ V_21  V_22 ]  n-r̂
               r̂    n-r̂

and V_11 is nonsingular, then

    σ_r̂(A) / || V_11^{-1} ||_2  ≤  σ_r̂(B_1)  ≤  σ_r̂(A).

Proof. For any unit 2-norm vector w one shows that

    || B_1 w ||_2^2 = || Σ_1 V_11^T w ||_2^2 + || Σ_2 V_12^T w ||_2^2 .

The theorem now follows because || Σ_1 V_11^T w ||_2 ≥ σ_r̂(A)/|| V_11^{-1} ||_2.  □
Applying QR with column pivoting to V(:, 1:r̂)^T gives

    V(:, 1:r̂)^T P = Q [ R_11  R_12 ],    i.e.,    P^T V(:, 1:r̂) = [ R_11^T ] Q^T.
                                                                  [ R_12^T ]

Note that R_11 is nonsingular and that || V_11^{-1} ||_2 = || R_11^{-1} ||_2. Heuristically,
column pivoting tends to produce a well-conditioned R_11, and so the overall
process tends to produce a well-conditioned V_11. Thus we obtain
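In outline, the resulting subset selection procedure can be sketched as follows in Python/NumPy (our own illustration of the idea; SciPy's pivoted QR stands in for the column-pivoting step, and the function name is hypothetical):

    import numpy as np
    from scipy.linalg import qr, lstsq

    def svd_subset_select(A, b, rhat):
        # SVD-based subset selection sketch: QR with column pivoting on
        # V(:,1:rhat)^T picks rhat "strongly independent" columns of A;
        # the LS problem is then solved over those columns only.
        U, s, Vt = np.linalg.svd(A)
        _, _, piv = qr(Vt[:rhat, :], pivoting=True)
        cols = piv[:rhat]
        z, *_ = lstsq(A[:, cols], b)
        x = np.zeros(A.shape[1])
        x[cols] = z
        return x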
A=
[
3
7
2
4
4
5
1.0001
-3.0002
2.9999
l ul'
,
b =
-1 4 5.0003
A is close to being rank 2 in the sense that u,(A) "".0001. Setting f = 2 in Algorithm
12.2.1 leads to x = [0 0.2360 - 0.0085]T with II Ax- b II, = .1966. (The permutation
Pis given by P = [e3 e2 e,].) Note that XLS = [828.1056 -8278569 828.0536]T
with minimum residual II AxLs - b II, = 0.0343.
    U = [ U_1  U_2 ]
          r    m-r

is a partitioning of the matrix U in (12.2.1) and where Q_1 = B_1 (B_1^T B_1)^{-1/2}.
Using Theorem 2.6.1 we obtain
Noting that
P12.2.1 Suppose A ∈ R^{m×n} and that || u^T A ||_2 = σ with u^T u = 1. Show that if
u^T (Ax - b) = 0 for x ∈ R^n and b ∈ R^m, then || x ||_2 ≥ |u^T b| / σ.
P12.2.2 Show that if B_1 ∈ R^{m×k} is comprised of k columns from A ∈ R^{m×n}, then
σ_k(B_1) ≤ σ_k(A).
P12.2.3 In equation (12.2.2) we know that the matrix

    P^T V = [ V_11  V_12 ]  r̂
            [ V_21  V_22 ]  n-r̂
               r̂    n-r̂

is orthogonal. Thus, || V_11^{-1} ||_2 = || V_22^{-1} ||_2 from the CS decomposition
(Theorem 2.6.3). Show how to compute P by applying the QR with column pivoting
algorithm to [ V_12^T  V_22^T ]. (For r̂ > n/2, this procedure would be more economical
than the technique discussed in the text.) Incorporate this observation in Algorithm 12.2.1.
G.H. Golub, V. Klema, and G.W. Stewart (1976). "Rank Degeneracy and Least Squares Problems," Technical Report TR-456, Department of Computer Science, University of Maryland, College Park, MD.
A subset selection procedure based upon the total least squares fitting technique of §12.3 is given in
S. Van Huffel and J. Vandewalle (1987). "Subset Selection Using the Total Least Squares Approach in Collinearity Problems with Errors in the Variables," Lin. Alg. and Its Applic. 88/89, 695-714.
The literature on subset selection is vast and we refer the reader to
    min || Dr ||_2    subject to    b + r ∈ range(A)                   (12.3.1)

In this problem, there is a tacit assumption that the errors are confined to
the "observation" b. When error is also present in the "data" A, then it
may be more natural to consider the problem
where D = diag(d_1, ..., d_m) and T = diag(t_1, ..., t_{n+1}) are nonsingular.
This problem, discussed in Golub and Van Loan (1980), is referred to as
the total least squares (TLS) problem.
If a minimizing [E_0, r_0] can be found for (12.3.2), then any x satisfying
(A + E_0)x = b + r_0 is called a TLS solution. However, it should be realized
that (12.3.2) may fail to have a solution altogether. For example, if

    A = [ 1  0 ],    b = [ 1 ],    E_ε = [ 0  0 ],
        [ 0  0 ]         [ 1 ]           [ 0  ε ]

then for all ε > 0, b ∈ ran(A + E_ε). However, there is no smallest value of
|| [E, r] ||_F for which b + r ∈ ran(A + E).
A generalization of (12.3.2) results if we allow multiple right-hand sides.
In particular, if B ∈ R^{m×k}, then we have the problem
where E ∈ R^{m×n} and R ∈ R^{m×k} and the matrices D = diag(d_1, ..., d_m)
and T = diag(t_1, ..., t_{n+k}) are nonsingular. If [E_0, R_0] solves (12.3.3),
then any X ∈ R^{n×k} that satisfies (A + E_0)X = (B + R_0) is said to be a
TLS solution to (12.3.3).
In this section we discuss some of the mathematical properties of the
total least squares problem and show how it can be solved using the SVD.
Chapter 5 is the only prerequisite. A very detailed treatment of the TLS
problem is given in the monograph by Van Huffel and Vandewalle (1991).
Let C = D [A, B] T = [C_1, C_2], C_1 ∈ R^{m×n}, C_2 ∈ R^{m×k}, have the SVD
C = U Σ V^T with the partitionings

    U = [ U_1  U_2 ],    V = [ V_11  V_12 ]  n,    Σ = [ Σ_1   0  ]  n
          n    k             [ V_21  V_22 ]  k         [  0   Σ_2 ]  k
                                n     k                   n    k

If σ_n(C_1) > σ_{n+1}(C), then the matrix [E_0, R_0] defined by
solves (12.3.3). If T_1 = diag(t_1, ..., t_n) and T_2 = diag(t_{n+1}, ..., t_{n+k}), then
the matrix

    X_TLS = -T_1 V_12 V_22^{-1} T_2^{-1}

exists and is the unique solution to (A + E_0)X = B + R_0.
Proof. We first establish two results that follow from the assumption
σ_n(C_1) > σ_{n+1}(C). From the equation CV = UΣ we have C_1 V_12 + C_2 V_22 =
U_2 Σ_2. We wish to show that V_22 is nonsingular. Suppose V_22 x = 0 for some
unit 2-norm x. It follows from V_12^T V_12 + V_22^T V_22 = I_k that || V_12 x ||_2 = 1. But
then

    σ_{n+1}(C) ≥ || U_2 Σ_2 x ||_2 = || C_1 V_12 x ||_2 ≥ σ_n(C_1),

a contradiction. Thus, the submatrix V_22 is nonsingular.
The other fact that follows from σ_n(C_1) > σ_{n+1}(C) concerns the strict
separation of σ_n(C) and σ_{n+1}(C). From Corollary 8.3.3 we have σ_n(C) ≥
σ_n(C_1) and so σ_n(C) ≥ σ_n(C_1) > σ_{n+1}(C).
Now we are set to prove the theorem. If ran(B + R) ⊂ ran(A + E),
then there is an X (n-by-k) so (A + E)X = B + R, i.e.,
Thus, the matrix in curly brackets has, at most, rank n. By following the
argument in Theorem 2.5.3, it can be shown that

    || D[E, R]T ||_F^2  ≥  sum_{i=n+1}^{n+k} σ_i(C)^2 .

    T^{-1} [   X  ]  =  [ V_12 ] S
           [ -I_k ]     [ V_22 ]

for some k-by-k matrix S. From the equations T_1^{-1} X = V_12 S and -T_2^{-1} =
V_22 S we see that S = -V_22^{-1} T_2^{-1}. Thus, we must have

    X = T_1 V_12 S = -T_1 V_12 V_22^{-1} T_2^{-1} = X_TLS.  □
If σ_n(C) = σ_{n+1}(C), then the TLS problem may still have a solution,
although it may not be unique. In this case, it may be desirable to single
out a "minimal norm" solution. To this end, consider the T-norm defined
on R^{n×k} by || Z ||_T = || T_1^{-1} Z T_2 ||_2. If X is given by (12.3.5), then from the
CS decomposition (Theorem 2.6.3) we have

    || X ||_T^2 = || V_12 V_22^{-1} ||_2^2 = (1 - σ_k(V_22)^2) / σ_k(V_22)^2 .

This suggests choosing V in Theorem 12.3.1 so that σ_k(V_22) is maximized.
Choose an orthogonal Q so that the last row of V(:, n+1-p:n+1)Q is zero
except for its last entry, and overwrite V(:, n+1-p:n+1) with V(:, n+1-p:n+1)Q.
Then:

    if v_{n+1,n+1} ≠ 0
        for i = 1:n
            x_i = -t_i v_{i,n+1} / (t_{n+1} v_{n+1,n+1})
        end
    end

This algorithm requires about 2mn^2 + 12n^3 flops and most of these are
associated with the SVD computation.
Example 12.3.1 The TLS problem min || [e, r] ||_F subject to (a + e)x = b + r,
where a = [1, 2, 3, 4]^T and b = [2.01, 3.99, 5.80, 8.30]^T, has solution
x_TLS = 2.0212, e = [-.0045, -.0209, -.1048, .0855]^T, and
r = [.0022, .0103, .0519, -.0423]^T. Note that for this data x_LS = 2.0197.
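For the unscaled case D = I, T = I, the SVD-based TLS computation reduces to a few lines. The following Python/NumPy sketch is ours, not the book's algorithm statement:

    import numpy as np

    def tls(A, b):
        # Total least squares for a single right-hand side (D = I, T = I):
        # take the right singular vector of C = [A b] associated with the
        # smallest singular value and scale its last component to -1.
        C = np.column_stack([A, b])
        _, _, Vt = np.linalg.svd(C)
        v = Vt[-1, :]
        if v[-1] == 0:
            raise ValueError("nongeneric TLS problem: v[n+1] = 0")
        return -v[:-1] / v[-1]

Running this on the data of Example 12.3.1 reproduces x_TLS ≈ 2.0212.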
In the case D = I, the TLS solution minimizes

    ψ(x) = sum_{i=1}^{m} |a_i^T x - b_i|^2 / ( x^T T_1^{-2} x + t_{n+1}^{-2} ),

a sum of squared distances from the points [a_i; b_i] to the set

    P_x = { [a; b] : a ∈ R^n, b ∈ R, b = x^T a }

where distance in R^{n+1} is measured by the norm || z || = || Tz ||_2. A great
deal has been written about this kind of fitting. See Pearson (1901) and
Madansky (1959).
Problems
P12.3.1 Consider the TLS problem (12.3.2) with nonsingular D and T. (a) Show that
if rank(A) < n, then (12.3.2) has a solution if and only if b ∈ ran(A). (b) Show that if
rank(A) = n, then (12.3.2) has no solution if A^T D^2 b = 0 and t_{n+1} || Db ||_2 ≥ σ_n(DAT_1),
where T_1 = diag(t_1, ..., t_n).
P12.3.2 Show that if C = D[A, b]T = [A_1, d] and σ_n(C) > σ_{n+1}(C), then the TLS
solution x satisfies (A_1^T A_1 - σ_{n+1}(C)^2 I)x = A_1^T d.
P12.3.3 Show how to solve (12.3.2) with the added constraint that the first p columns
of the minimizing E are zero.
G.H. Golub and C.F. Van Loan (1980). "An Analysis of the Total Least Squares Problem," SIAM J. Num. Anal. 17, 883-93.
The bearing of the SVD on the TLS problem is set forth in
G.H. Golub and C. Reinsch (1970). "Singular Value Decomposition and Least Squares Solutions," Numer. Math. 14, 403-420.
G.H. Golub (1973). "Some Modified Matrix Eigenvalue Problems," SIAM Review 15, 318-334.
S. Van Huffel and J. Vandewalle (1991). The Total Least Squares Problem: Computational Aspects and Analysis, SIAM Publications, Philadelphia.
If some of the columns of A are known exactly, then it is sensible to force the TLS perturbation matrix E to be zero in the same columns. Aspects of this constrained TLS problem are discussed in
J.W. Demmel (1987). "The Smallest Perturbation of a Submatrix which Lowers the Rank and Constrained Total Least Squares Problems," SIAM J. Numer. Anal. 24, 199-206.
S. Van Huffel and J. Vandewalle (1988). "The Partial Total Least Squares Algorithm," J. Comp. and App. Math. 21, 333-342.
S. Van Huffel and J. Vandewalle (1988). "Analysis and Solution of the Nongeneric Total Least Squares Problem," SIAM J. Matrix Anal. Appl. 9, 360-372.
S. Van Huffel and J. Vandewalle (1989). "Analysis and Properties of the Generalized Total Least Squares Problem AX = B When Some or All Columns in A are Subject to Error," SIAM J. Matrix Anal. Appl. 10, 294-315.
S. Van Huffel and H. Zha (1991). "The Restricted Total Least Squares Problem: Formulation, Algorithm, and Properties," SIAM J. Matrix Anal. Appl. 12, 292-309.
S. Van Huffel (1992). "On the Significance of Nongeneric Total Least Squares Problems," SIAM J. Matrix Anal. Appl. 13, 20-35.
M. Wei (1992). "The Analysis for the Total Least Squares Problem with More than One Solution," SIAM J. Matrix Anal. Appl. 13, 746-763.
S. Van Huffel and H. Zha (1993). "An Efficient Total Least Squares Algorithm Based On a Rank-Revealing Two-Sided Orthogonal Decomposition," Numerical Algorithms 4, 101-133.
C.C. Paige and M. Wei (1993). "Analysis of the Generalized Total Least Squares Problem AX = B when Some of the Columns are Free of Error," Numer. Math. 65, 177-202.
R.D. Fierro and J.R. Bunch (1994). "Collinearity and Total Least Squares," SIAM J. Matrix Anal. Appl. 15, 1167-1181.
Other references concerned with least squares fitting when there are errors in the data matrix include
K. Pearson (1901). "On Lines and Planes of Closest Fit to Points in Space," Phil. Mag. 2, 559-72.
A. Wald (1940). "The Fitting of Straight Lines if Both Variables are Subject to Error," Annals of Mathematical Statistics 11, 284-300.
A. Madansky (1959). "The Fitting of Straight Lines When Both Variables Are Subject to Error," J. Amer. Stat. Assoc. 54, 173-205.
I. Linnik (1961). Method of Least Squares and Principles of the Theory of Observations, Pergamon Press, New York.
W.G. Cochran (1968). "Errors of Measurement in Statistics," Technometrics 10, 637-66.
R.F. Gunst, J.T. Webster, and R.L. Mason (1976). "A Comparison of Least Squares and Latent Root Regression Estimators," Technometrics 18, 75-83.
G.W. Stewart (1977c). "Sensitivity Coefficients for the Effects of Errors in the Independent Variables in a Linear Regression," Technical Report TR-571, Department of Computer Science, University of Maryland, College Park, MD.
A. Van der Sluis and G.W. Veltkamp (1979). "Restoring Rank and Consistency by Orthogonal Projection," Lin. Alg. and Its Applic. 28, 257-78.
Example 12.4.1
Theorem 12.4.1 Suppose A ∈ R^{m×n} and let {z_1, ..., z_t} be an orthonormal
basis for null(A). Define Z = [z_1, ..., z_t] and let {w_1, ..., w_q} be an
orthonormal basis for null(BZ) where B ∈ R^{p×n}. If W = [w_1, ..., w_q],
then the columns of ZW form an orthonormal basis for null(A) ∩ null(B).
When the SVD is used to compute the orthonormal bases in this theorem
we obtain the following procedure:
The amount of work required by this algorithm depends upon the relative
sizes of m, n, p, and t.
We mention that a practical implementation of this algorithm requires
a means for deciding when a computed singular value σ̂_i is negligible. The
use of a tolerance δ for this purpose (e.g., σ̂_i < δ ⇒ σ̂_i = 0) implies that
the columns of the computed Y "almost" define a common null space of A
and B in the sense that || AY ||_2 ≈ || BY ||_2 ≈ δ.
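A minimal NumPy realization of this idea (ours, not the book's algorithm statement; the tolerance argument plays the role of δ) might look as follows:

    import numpy as np

    def common_null_basis(A, B, tol=1e-12):
        # Orthonormal basis for null(A) ∩ null(B): Z spans null(A),
        # W spans null(B Z), and the columns of Z W span the intersection.
        def null_basis(M):
            _, s, Vt = np.linalg.svd(M)
            rank = int(np.sum(s > tol))
            return Vt[rank:, :].T
        Z = null_basis(A)
        if Z.shape[1] == 0:
            return Z
        return Z @ null_basis(B @ Z)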
Example 12.4.2 For a pair of 2-by-3 matrices A and B with
null(A) ∩ null(B) = span{x}, x = [1, -2, -3]^T, applying Algorithm 12.4.2
we find

    Y = [ .2673, -.5345, -.8018 ]^T ≈ x / || x ||_2 .
subject to:

    || u || = || v || = 1,
    u^T u_i = 0,    i = 1:k-1,
    v^T v_i = 0,    i = 1:k-1.

Note that the principal angles satisfy 0 ≤ θ_1 ≤ ... ≤ θ_q ≤ π/2. The vectors
{u_1, ..., u_q} and {v_1, ..., v_q} are called the principal vectors between the
subspaces F and G.
Principal angles and vectors arise in many important statistical applications.
The largest principal angle is related to the notion of distance
between equidimensional subspaces that we discussed in §2.6.3. If p = q
then dist(F, G) = sqrt(1 - cos(θ_q)^2) = sin(θ_q).
If the columns of Q_F ∈ R^{m×p} and Q_G ∈ R^{m×q} define orthonormal bases
for F and G respectively, then the cos(θ_k) are the singular values of Q_F^T Q_G
and the principal vectors are obtained from the corresponding singular vectors.
Typically, the spaces F and G are defined as the ranges of given matrices
A ∈ R^{m×p} and B ∈ R^{m×q}. In this case the desired orthonormal bases can
be obtained by computing the QR factorizations of these two matrices.

    C = Q_A^T Q_B
    Compute the SVD Y^T C Z = diag(cos(θ_k)).
    Q_A Y(:, 1:q) = [u_1, ..., u_q]
    Q_B Z = [v_1, ..., v_q]
Proof. The proof follows from the observation that if cos(θ_k) = 1, then
u_k = v_k.  □
Example 12.4.3 For the matrices A and B of this example, the cosines of the
principal angles between ran(A) and ran(B) are 1.000 and .856.
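In NumPy the whole computation takes a few lines (a sketch of ours; np.linalg.qr computes thin QR factorizations, and clipping guards against roundoff pushing a cosine slightly above 1):

    import numpy as np

    def principal_angles(A, B):
        # Cosines of the principal angles between ran(A) and ran(B),
        # together with the principal vectors, via QR + SVD of Q_A^T Q_B.
        QA, _ = np.linalg.qr(A)
        QB, _ = np.linalg.qr(B)
        Y, c, Zt = np.linalg.svd(QA.T @ QB)
        q = min(A.shape[1], B.shape[1])
        return np.clip(c[:q], -1.0, 1.0), QA @ Y[:, :q], QB @ Zt.T[:, :q]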
Problems
P12.4.1 Show that if A and B are m-by-p matrices, with p ≤ m, then

    min_{Q^T Q = I_p} || A - BQ ||_F^2 = sum_{i=1}^{p} ( σ_i(A)^2 - 2σ_i(B^T A) + σ_i(B)^2 ).

P12.4.2 Extend Algorithm 12.4.2 so that it can compute an orthonormal basis for
null(A_1) ∩ ... ∩ null(A_s).
P12.4.3 Extend Algorithm 12.4.3 to handle the case when A and B are rank deficient.
P12.4.4 Relate the principal angles and vectors between ran(A) and ran(B) to the
eigenvalues and eigenvectors of the generalized eigenvalue problem
P12.4.5 Suppose A, B ∈ R^{m×n} and that A has full column rank. Show how to compute
a symmetric matrix X ∈ R^{n×n} that minimizes || AX - B ||_F. Hint: Compute the SVD
of A.
When B = I, this problem amounts to finding the closest orthogonal matrix to A. This
is equivalent to the polar decomposition problem of §4.2.10. See
A. Bjorck and C. Bowie (1971). "An Iterative Algorithm for Computing the Best Estimate of an Orthogonal Matrix," SIAM J. Num. Anal. 8, 358-64.
N.J. Higham (1986). "Computing the Polar Decomposition-with Applications," SIAM J. Sci. and Stat. Comp. 7, 1160-1174.
If A is reasonably close to being orthogonal itself, then Bjorck and Bowie's technique is
more efficient than the SVD algorithm.
The problem of minimizing || AX - B ||_F subject to the constraint that X is symmetric is studied in
N.J. Higham (1988). "The Symmetric Procrustes Problem," BIT 28, 133-43.
A. Bjorck and G.H. Golub (1973). "Numerical Methods for Computing Angles Between
Linear Subspaces," Math. Comp. 27, 579-94.
G.H. Golub and H. Zha (1994). "Perturbation Analysis of the Canonical Correlations of
Matrix Pairs," Lin. Alg. and Its Applic. 210, 3-28.
Suppose n = 4 and that rotations J_3, J_2, J_1 have been determined so that
J_1^T J_2^T J_3^T w = ±|| w ||_2 e_1. Starting with the upper triangular R we
update as follows:

    R ← J_3^T R  (fill-in appears in position (4,3)),      w ← J_3^T w
    R ← J_2^T R  (the fill-in moves up one row),           w ← J_2^T w
    R ← J_1^T R = H  (upper Hessenberg),                   w ← J_1^T w = ±|| w ||_2 e_1

Consequently,

    (J_1^T ... J_{n-1}^T)(R + w v^T) = H ± || w ||_2 e_1 v^T = H_1        (12.5.3)
A careful assessment of the work reveals that about 26n^2 flops are required.
The vector w = Q^T u requires 2n^2 flops. Computing H and accumulating
the J_k into Q involves 12n^2 flops. Finally, computing R_1 and multiplying
the G_k into Q involves 12n^2 flops.
The technique readily extends to the case when B is rectangular. It can
also be generalized to compute the QR factorization of B + UV^T where
rank(UV^T) = p > 1.
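For the square case B ∈ R^{n×n}, the Givens-based update can be sketched as follows in Python/NumPy. This is our own illustration, not the book's code; it recomputes the QR factorization of B + u v^T in O(n^2) work per rotation sweep:

    import numpy as np

    def qr_rank_one_update(Q, R, u, v):
        # Given B = Q R (square), return the QR factorization of B + u v^T:
        # rotate w = Q^T u into ||w|| e_1 (R becomes upper Hessenberg), add
        # the rank-one term to row 1, then rezero the subdiagonal.
        def givens(a, b):
            r = np.hypot(a, b)
            return (1.0, 0.0) if r == 0 else (a / r, b / r)

        Q, R, w = Q.copy(), R.copy(), Q.T @ u
        n = R.shape[0]
        for k in range(n - 1, 0, -1):           # J_k rotations: zero w[k]
            c, s = givens(w[k - 1], w[k])
            G = np.array([[c, s], [-s, c]])
            w[k - 1:k + 1] = G @ w[k - 1:k + 1]
            R[k - 1:k + 1, :] = G @ R[k - 1:k + 1, :]
            Q[:, k - 1:k + 1] = Q[:, k - 1:k + 1] @ G.T
        R[0, :] += w[0] * v                     # now w = w[0] e_1
        for k in range(n - 1):                  # G_k rotations: retriangularize
            c, s = givens(R[k, k], R[k + 1, k])
            G = np.array([[c, s], [-s, c]])
            R[k:k + 2, :] = G @ R[k:k + 2, :]
            Q[:, k:k + 2] = Q[:, k:k + 2] @ G.T
        return Q, R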
Suppose

    A = [ a_1, ..., a_n ],    a_i ∈ R^m,                              (12.5.5)

has QR factorization A = QR and that Ã is obtained by deleting the kth
column a_k. Deleting the kth column of R gives Q^T Ã = H, a matrix that is
upper triangular except for nonzero subdiagonal entries in columns k through
n - 1, e.g.,
X X X X X
0 X X X X
0 0 X X X
H = 0 0 X X X m = 7, n = 6, k = 3
0 0 0 X X
0 0 0 0 X
0 0 0 0 0
Clearly, the unwanted subdiagonal elements h_{k+1,k}, ..., h_{n,n-1} can be zeroed
by a sequence of Givens rotations: G_{n-1}^T ... G_k^T H = R_1. Here, G_i is
a rotation in planes (i, i+1).
    [ w_{k+1} ]
    [    0    ]  =

with J_{k+1}^T ... J_{m-1}^T applied so that the result is upper triangular. We
illustrate this by continuing with the above example:

    H ← J_6^T H =
        x x x x x x
        0 x x x x x
        0 0 x x x x
        0 0 0 x x x
        0 0 0 x 0 x
        0 0 0 x 0 0
        0 0 0 0 0 0

    H ← J_5^T H =
        x x x x x x
        0 x x x x x
        0 0 x x x x
        0 0 0 x x x
        0 0 0 x 0 x
        0 0 0 0 0 x
        0 0 0 0 0 0

    H ← J_4^T H =
        x x x x x x
        0 x x x x x
        0 0 x x x x
        0 0 0 x x x
        0 0 0 0 x x
        0 0 0 0 0 x
        0 0 0 0 0 0
    Ā = [ w^T ]
        [  A  ]

    diag(1, Q^T) Ā = [ w^T ] = H,        A = Q_1 R_1,
                     [ R_1 ]

    P = [ 0   I_{m-k} ]
        [ I_k    0    ]

    A = [ z^T ]   1
        [ A_1 ]   m-1                                                 (12.5.6)

where A ∈ R^{m×n} with m > n and z ∈ R^n. Our task is to find a lower
triangular G_1 such that G_1 G_1^T = A_1^T A_1. There are several approaches to
this interesting and important problem. Simply because it is an opportunity
to introduce some new ideas, we present a downdating procedure that relies
on hyperbolic transformations.
(12.5.7)
and suppose that we can find H ∈ R^{(n+1)×(n+1)} such that H^T S H = S with
the property that
(12.5.8)
we obtain the equation c x_2 = s x_1. Note that there is no solution to this
equation if x_1 = x_2 ≠ 0, a clue that hyperbolic rotations are not as
numerically solid as their Givens rotation counterparts. If |x_1| ≠ |x_2| then
it is possible to compute the cosh-sinh pair:
    if x_2 = 0
        s = 0; c = 1
    else                                                              (12.5.9)
        if |x_2| < |x_1|
            τ = x_2/x_1;  c = 1/sqrt(1 - τ^2);  s = cτ
        elseif |x_1| < |x_2|
            τ = x_1/x_2;  s = 1/sqrt(1 - τ^2);  c = sτ
        end
    end
Observe that the norm of the hyperbolic rotation produced by this algorithm
gets large as |x_1| gets close to |x_2|.
Now any matrix H = H(p, n+1, θ) ∈ R^{(n+1)×(n+1)} that is the identity
everywhere except h_{pp} = h_{n+1,n+1} = cosh(θ) and h_{p,n+1} = h_{n+1,p} =
-sinh(θ) satisfies H^T S H = S where S is prescribed in (12.5.7). Using
(12.5.9), we attempt to generate hyperbolic rotations H_k = H(1, k, θ_k) for
k = 2:n+1 so that
This turns out to be possible if A has full column rank. Hyperbolic rotation
H_k zeros entry (k+1, k). In other words, if A has full column rank, then
it can be shown that each call to (12.5.9) results in a cosh-sinh pair. See
Alexander, Pan, and Plemmons (1988).
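As an illustration of how (12.5.9) is used, here is a NumPy sketch (ours, not the book's algorithm) of a Cholesky downdate R_1^T R_1 = R^T R - z z^T built from hyperbolic rotations; it assumes full column rank so that |z_k| < |r_kk| at every step and only the first branch of (12.5.9) is exercised:

    import numpy as np

    def cosh_sinh_pair(x1, x2):
        # The cosh-sinh computation (12.5.9), first branch: |x2| < |x1|.
        if x2 == 0:
            return 1.0, 0.0
        t = x2 / x1
        c = 1.0 / np.sqrt(1.0 - t * t)
        return c, c * t

    def cholesky_downdate(R, z):
        # Upper triangular R with R^T R = A^T A; remove the row z so that
        # the returned R1 satisfies R1^T R1 = R^T R - z z^T.
        R1, z = R.astype(float).copy(), z.astype(float).copy()
        for k in range(R1.shape[0]):
            c, s = cosh_sinh_pair(R1[k, k], z[k])   # c^2 - s^2 = 1
            rk, zk = R1[k, k:].copy(), z[k:].copy()
            R1[k, k:] = c * rk - s * zk             # hyperbolic rotation
            z[k:] = -s * rk + c * zk                # zeros z[k]
        return R1

Each rotation preserves r^T r - z^T z, which is why the downdated factor emerges exactly.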
(12.5.10)

    l 0 0 0 0 0 0
    l l 0 0 0 0 0
    l l l 0 0 0 0
    l l l l 0 0 0                                                     (12.5.11)
    h h h h e 0 0
    h h h h e e 0
    h h h h e e e
    w w w w y y y

we have deduced that the numerical rank is four. In practice, this involves
comparisons with a small tolerance as discussed in §5.5.7.
Using zeroing techniques similar to those presented in §12.5.3, the bot-
tom row can be zeroed with a sequence of row rotations giving
    x 0 0 0 0 0 0
    x x 0 0 0 0 0
    x x x 0 0 0 0
    x x x x 0 0 0
    x x x x x 0 0
    x x x x x x 0
    x x x x x x x
    0 0 0 0 0 0 0
It follows that
X 0 0 0 0 0 0
X X 0 0 0 0 0
X X X 0 0 0 0
X X X X 0 0 0
X X X X X 0 0
X X X X X X 0
h h h h h h e
with small h's and e. We can repeat the condition estimation and zero
chasing on the leading 6-by-6 portion thereby producing (perhaps) another
row of small numbers:
X 0 0 0 0 0 0
X X 0 0 0 0 0
X X X 0 0 0 0
X X X X 0 0 0
X X X X X 0 0
h h h h h e 0
h h h h h e e
(If not, then the revealed rank is 6.) Continuing in this way, we can restore
any lower triangular matrix to rank-revealing form.
In the event that the y vector in (12.5.11) is small, we can reach rank-revealing
form by a different, more efficient route. We start with a sequence
of left and right Givens rotations to zero all but the first component of y:
    l 0 0 0 0 0 0     l 0 0 0 0 0 0
    l l 0 0 0 0 0     l l 0 0 0 0 0
    l l l 0 0 0 0     l l l 0 0 0 0
    l l l l 0 0 0     l l l l 0 0 0
    h h h h e 0 0     h h h h e 0 0
    h h h h e e e     h h h h e e 0
    h h h h e e e     h h h h e e e
    x x x x y y 0     x x x x y y 0

    l 0 0 0 0 0 0     l 0 0 0 0 0 0
    l l 0 0 0 0 0     l l 0 0 0 0 0
    l l l 0 0 0 0     l l l 0 0 0 0
    l l l l 0 0 0     l l l l 0 0 0
    h h h h e e 0     h h h h e 0 0
    h h h h e e 0     h h h h e e 0
    h h h h e e e     h h h h e e e
    x x x x y 0 0     x x x x y 0 0
Here, "U_ij" means a rotation of rows i and j and "V_ij" means a rotation of
columns i and j. It is important to observe that there is no intermingling
of small and large numbers during this process. The h's and e's are still
small.
    l 0 0 0 0 0 0
    l l 0 0 0 0 0
    l l l 0 0 0 0
    l l l l 0 0 0                                                     (12.5.12)
    h h h h e 0 0
    h h h h e e 0
    h h h h e e e
    y y y y y 0 0
    l 0 0 0 0 0 0     l 0 0 0 0 0 0
    l l 0 0 0 0 0     l l 0 0 0 0 0
    l l l 0 0 0 0     l l l 0 u 0 0
    l l l l u 0 0     l l l l u 0 0
    h h h h e 0 0     h h h h e 0 0
    h h h h e e e     h h h h e e 0
    h h h h e e e     h h h h e e e
    x x x 0 y 0 0     x x 0 0 y 0 0

    l 0 0 0 0 0 0     l 0 0 0 u 0 0
    l l 0 0 u 0 0     l l 0 0 u 0 0
    l l l 0 u 0 0     l l l 0 u 0 0
    l l l l u 0 0     l l l l u 0 0
    h h h h e e 0     h h h h e 0 0
    h h h h e e 0     h h h h e e 0
    h h h h e e e     h h h h e e e
    x 0 0 0 y 0 0     0 0 0 0 y 0 0
in planes (1,5), (2,5), (3,5), and (4,5) can remove the u's:
    l 0 0 0 0 0 0     l 0 0 0 0 0 0
    l l 0 0 u 0 0     l l 0 0 0 0 0
    l l l 0 u 0 0     l l l 0 u 0 0
    l l l l u 0 0     l l l l u 0 0
    h h h h e 0 0     h h h h e 0 0
    h h h h e e 0     h h h h e e 0
    h h h h e e e     h h h h e e e
    y 0 0 0 y 0 0     y y 0 0 y 0 0

    l 0 0 0 0 0 0     l 0 0 0 0 0 0
    l l 0 0 0 0 0     l l 0 0 0 0 0
    l l l 0 0 0 0     l l l 0 0 0 0
    l l l l u 0 0     l l l l 0 0 0
    h h h h e 0 0     h h h h e 0 0
    h h h h e e 0     h h h h e e 0
    h h h h e e e     h h h h e e e
    y y y 0 y 0 0     y y y y y 0 0
thus producing the structure displayed in (12.5.12). All the y's are small
and thus a sequence of row rotations U_67, U_57, ..., U_17 can be constructed
to clean out the bottom row giving the rank-revealed form

    l 0 0 0 0 0 0
    l l 0 0 0 0 0
    l l l 0 0 0 0
    l l l l 0 0 0
    h h h h e 0 0
    h h h h e e 0
    h h h h e e e
    0 0 0 0 0 0 0
Problems
P12.5.1 Suppose we have the QR factorization for A ∈ R^{m×n} and now wish to minimize
|| (A + uv^T)x - b ||_2 where u, b ∈ R^m and v ∈ R^n are given. Give an algorithm for
solving this problem that requires O(mn) flops. Assume that Q must be updated.
P12.5.2 Suppose we have the QR factorization QR = A ∈ R^{m×n}. Give an algorithm
for computing the QR factorization of the matrix Ã obtained by deleting the kth row of
A. Your algorithm should require O(mn) flops.
P12.5.3 Suppose T ∈ R^{n×n} is tridiagonal and symmetric and that v ∈ R^n. Show how
the Lanczos algorithm can be used (in principle) to compute an orthogonal Q ∈ R^{n×n}
in O(n^2) flops such that Q^T (T + vv^T) Q = T̃ is also tridiagonal.
P12.5.4 Suppose

    A = [ c^T ],       c ∈ R^n,  B ∈ R^{(m-1)×n}
        [  B  ]
has full column rank and m > n. Using the Sherman-Morrison-Woodbury formula show
that

    1/σ_min(B)^2  ≤  1/σ_min(A)^2  +  || (A^T A)^{-1} c ||_2^2 / (1 - c^T (A^T A)^{-1} c).

P12.5.5 As a function of x_1 and x_2, what is the 2-norm of the hyperbolic rotation
produced by (12.5.9)?
P12.5.6 Show that the hyperbolic reduction in §12.5.4 does not break down if A has
full column rank.
P12.5.7 Assume

    A = [ R ]
        [ E ]

where R and E are square with

    ρ = || E ||_2 / σ_min(R) < 1.

Show that if

    Q = [ Q_11  Q_12 ]
        [ Q_21  Q_22 ]

is orthogonal and

    Q^T [ R ]  =  [ H_1 ]
        [ E ]     [ H_2 ]

then || H_2 ||_2 ≤ ρ || H_1 ||_2.
Notes and References for Sec. 12.5
Numerous aspects of the updating problem are presented in
P.E. Gill, G.H. Golub, W. Murray, and M.A. Saunders (1974). "Methods for Modifying Matrix Factorizations," Math. Comp. 28, 505-35.
Applications in the area of optimization are covered in
R.H. Bartels (1971). "A Stabilization of the Simplex Method," Numer. Math. 16, 414-434.
P.E. Gill, W. Murray, and M.A. Saunders (1975). "Methods for Computing and Modifying the LDV Factors of a Matrix," Math. Comp. 29, 1051-77.
D. Goldfarb (1976). "Factored Variable Metric Methods for Unconstrained Optimization," Math. Comp. 30, 796-811.
J.E. Dennis and R.B. Schnabel (1983). Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Prentice-Hall, Englewood Cliffs, NJ.
W.W. Hager (1989). "Updating the Inverse of a Matrix," SIAM Review 31, 221-239.
S.K. Eldersveld and M.A. Saunders (1992). "A Block-LU Update for Large-Scale Linear Programming," SIAM J. Matrix Anal. Appl. 13, 191-201.
Updating issues in the least squares setting are discussed in
S.J. Olszanskyj, J.M. Lebak, and A.W. Bojanczyk (1994). "Rank-k Modification Methods for Recursive Least Squares Problems," Numerical Algorithms 7, 325-354.
L. Elden and H. Park (1994). "Block Downdating of Least Squares Solutions," SIAM J. Matrix Anal. Appl. 15, 1018-1034.
Another important topic concerns the updating of condition estimates:
W.R. Ferng, G.H. Golub, and R.J. Plemmons (1991). "Adaptive Lanczos Methods for Recursive Condition Estimation," Numerical Algorithms 1, 1-20.
G. Shroff and C.H. Bischof (1992). "Adaptive Condition Estimation for Rank-One Updates of QR Factorizations," SIAM J. Matrix Anal. Appl. 13, 1264-1278.
D.J. Pierce and R.J. Plemmons (1992). "Fast Adaptive Condition Estimation," SIAM J. Matrix Anal. Appl. 13, 274-291.
G.W. Stewart (1979). "The Effects of Rounding Error on an Algorithm for Downdating a Cholesky Factorization," J. Inst. Math. Applic. 23, 203-13.
A.W. Bojanczyk, R.P. Brent, P. Van Dooren, and F.R. de Hoog (1987). "A Note on Downdating the Cholesky Factorization," SIAM J. Sci. and Stat. Comp. 8, 210-221.
C.S. Henkel, M.T. Heath, and R.J. Plemmons (1988). "Cholesky Downdating on a Hypercube," in G. Fox (1988), 1592-1598.
C.-T. Pan (1993). "A Perturbation Analysis of the Problem of Downdating a Cholesky Factorization," Lin. Alg. and Its Applic. 183, 103-115.
L. Elden and H. Park (1994). "Perturbation Analysis for Block Downdating of a Cholesky Decomposition," Numer. Math. 68, 457-468.
Updating and downdating the ULV and URV decompositions and related topics are covered in
C.H. Bischof and G.M. Shroff (1992). "On Updating Signal Subspaces," IEEE Trans. Signal Proc. 40, 96-105.
G.W. Stewart (1992). "An Updating Algorithm for Subspace Tracking," IEEE Trans. Signal Proc. 40, 1535-1541.
G.W. Stewart (1993). "Updating a Rank-Revealing ULV Decomposition," SIAM J. Matrix Anal. Appl. 14, 494-499.
G.W. Stewart (1994). "Updating URV Decompositions in Parallel," Parallel Computing 20, 151-172.
H. Park and L. Elden (1995). "Downdating the Rank-Revealing URV Decomposition," SIAM J. Matrix Anal. Appl. 16, 138-155.
Finally, we mention the following paper concerned with SVD updating:
M. Moonen, P. Van Dooren, and J. Vandewalle (1992). "A Singular Value Decomposition Updating Algorithm," SIAM J. Matrix Anal. Appl. 13, 1015-1038.
Partition conformably with r = rank(C):

    Q^T A Q = B = [ B_11  B_12 ]  r
                  [ B_21  B_22 ]  n-r
                     r    n-r

and set

    y = Q^T x = [ y_1 ]  r
                [ y_2 ]  n-r

where d = Q^T c, i.e.,

    p(λ) = sum_{i=1}^{n} d_i^2  prod_{j=1, j≠i}^{n} (λ_j - λ)  =  0.

    d_k^2 = prod_{j=1}^{n-1} (λ̂_j - λ_k) / prod_{j=1, j≠k}^{n} (λ_j - λ_k),    k = 1:n.      (12.6.1)
This determines each d_k up to its sign. Thus there are 2^n different solutions
c = Qd to the original problem.
A related inverse eigenvalue problem involves finding a tridiagonal matrix

    T = [ a_1  b_1                   ]
        [ b_1  a_2   .               ]
        [       .    .     b_{n-1}   ]
        [          b_{n-1}   a_n     ]

such that T has prescribed eigenvalues {λ_1, ..., λ_n} and T(2:n, 2:n) has
prescribed eigenvalues {λ̂_1, ..., λ̂_{n-1}} with

    λ_1 > λ̂_1 > λ_2 > ... > λ_{n-1} > λ̂_{n-1} > λ_n.

We show how to compute the tridiagonal T via the Lanczos process. Note
that the λ̂_i are the stationary values of

    φ(y) = y^T A y / y^T y
    T = [ 1   r^T ]
        [ r    G  ]

is symmetric, positive definite, and Toeplitz with r ∈ R^{n-1}. Our goal is to
compute the smallest eigenvalue λ_min(T) of T given that

    [ 1   r^T ] [ α ]  =  λ [ α ],
    [ r    G  ] [ y ]       [ y ]

i.e.,

    α + r^T y = λα,
    αr + Gy = λy.
We have dealt with similar functions in §8.5 and §12.1. In this case, f
always has a negative slope
(12.6.2)
(12.6.3)
Since λ^(k) < λ_min(G), this system is positive definite and Algorithm 4.7.1
is applicable if we simply apply it to the normalized Toeplitz matrix
(G - λ^(k) I)/(1 - λ^(k)).
A starting value that satisfies (12.6.2) can be obtained by examining
the Durbin algorithm when it is applied to T_λ = (T - λI)/(1 - λ). For
this matrix the "r" vector is r/(1 - λ) and so the Durbin algorithm (4.7.1)
transforms to
12.6. MODIFIED/STRUCTURED EIGENPROBLEMS 625
(12.6.4)
end
From the discussion in §4.7.2 we know that β_1, ..., β_k > 0 implies that
T_λ(1:k+1, 1:k+1) is positive definite. Hence, a suitably modified (12.6.4)
can be used to compute m(λ), the largest index m such that β_1, ..., β_m are
all positive but β_{m+1} ≤ 0. Note that if m(λ) = n - 2, then (12.6.2)
holds. This suggests the following bisection scheme:
The bracketing interval [L, R] always contains a λ such that m(λ) = n - 2
and so the current λ has this property upon termination.
There are several possible choices for a starting interval. One idea is to
set L = 0 and R = 1 - |r_1| since
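The following Python sketch (ours; the helper names are hypothetical, and a standard Durbin-style recursion stands in for the transformed algorithm (12.6.4)) implements the resulting bisection on the positive-definiteness of T_λ:

    import numpy as np

    def count_positive_betas(r):
        # Durbin-style recursion on a unit-diagonal symmetric Toeplitz
        # matrix with off-diagonal vector r; returns the largest m such
        # that beta_1, ..., beta_m are all positive.
        n1 = len(r)
        y = np.array([-r[0]])
        alpha, beta, m = -r[0], 1.0, 0
        for k in range(1, n1 + 1):
            beta = (1.0 - alpha * alpha) * beta      # beta_k
            if beta <= 0.0:
                return m
            m += 1
            if k == n1:
                break
            alpha = -(r[k] + r[k - 1::-1] @ y) / beta
            y = np.concatenate([y + alpha * y[::-1], [alpha]])
        return m

    def toeplitz_lambda_min(r, tol=1e-10):
        # Bisection for the smallest eigenvalue of T = toeplitz([1, r]):
        # T_lam = (T - lam I)/(1 - lam) is Toeplitz with off-diagonal
        # vector r/(1 - lam), and it is positive definite iff lam < lam_min.
        n = len(r) + 1
        L, R = 0.0, 1.0 - abs(r[0])
        while R - L > tol:
            lam = 0.5 * (L + R)
            if count_positive_betas(r / (1.0 - lam)) >= n - 1:
                L = lam            # T - lam*I still positive definite
            else:
                R = lam
        return 0.5 * (L + R)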
    Q^T ( (A + A^T)/2 ) Q = diag(cos(θ_1), ..., cos(θ_n))

where m = n/2.
The cosines c_1, ..., c_n are called the Schur parameters and, as we mentioned,
the corresponding sines are the subdiagonal entries of H. Using
these numbers it is possible to construct explicitly a pair of bidiagonal
matrices B_C, B_S ∈ R^{n×n} with the property that

    σ(B_C(1:m, 1:m)) = { cos(θ_1/2), ..., cos(θ_m/2) }                 (12.6.5)
    σ(B_S(1:m, 1:m)) = { sin(θ_1/2), ..., sin(θ_m/2) }                 (12.6.6)

The singular values of B_C(1:m, 1:m) and B_S(1:m, 1:m) can be computed
using the bidiagonal SVD algorithm. The angle θ_k can be accurately computed
from sin(θ_k/2) if 0 < θ_k ≤ π/2 and accurately computed from
cos(θ_k/2) if π/2 ≤ θ_k < π. The construction of B_C and B_S is based
on three facts:

1. H is similar to

    H̃ = H_o H_e

where H_o and H_e are the odd and even reflection products

    H_o = G_1 G_3 ... G_{n-1}
    H_e = G_2 G_4 ... G_n.

These matrices are block diagonal with 2-by-2 and 1-by-1 blocks, i.e.,

    H_o = diag(R(φ_1), R(φ_3), ..., R(φ_{n-1}))                        (12.6.7)
    H_e = diag(1, R(φ_2), R(φ_4), ..., R(φ_{n-2}), -1)                 (12.6.8)

where

    R(φ) = [ -cos(φ)  sin(φ) ]                                        (12.6.9)
           [  sin(φ)  cos(φ) ]

2. The eigenvalues of the symmetric tridiagonal matrices

    C = (H_o + H_e)/2    and    S = (H_o - H_e)/2                     (12.6.10)

are given by

    λ(C) = { ±cos(θ_1/2), ..., ±cos(θ_m/2) }                          (12.6.11)
    λ(S) = { ±sin(θ_1/2), ..., ±sin(θ_m/2) }                          (12.6.12)
and U_S^T S V_S = B_S

    F_3 = G_3 G_4 G_5 G_6 G_7 G_8
    F_5 = G_5 G_6 G_7 G_8
    F_7 = G_7 G_8.

Since reflections are symmetric and G_i G_j = G_j G_i if |i - j| ≥ 2, we see that

    F_3 H F_3^T = (G_3 G_4 G_5 G_6 G_7 G_8)(G_1 G_2 G_3 G_4 G_5 G_6 G_7 G_8)(G_3 G_4 G_5 G_6 G_7 G_8)^T
               = (G_3 G_4 G_5 G_6 G_7 G_8) G_1 G_2
               = G_1 G_3 G_2 G_4 G_5 G_6 G_7 G_8,

    F_5 (F_3 H F_3^T) F_5^T = (G_5 G_6 G_7 G_8)(G_1 G_3 G_2 G_4 G_5 G_6 G_7 G_8)(G_5 G_6 G_7 G_8)^T
                            = (G_5 G_6 G_7 G_8) G_1 G_3 G_2 G_4
                            = G_1 G_3 G_5 G_2 G_4 G_6 G_7 G_8,

    F_7 (F_5 F_3 H F_3^T F_5^T) F_7^T = (G_7 G_8)(G_1 G_3 G_5 G_2 G_4 G_6 G_7 G_8)(G_7 G_8)^T
                                      = (G_7 G_8) G_1 G_3 G_5 G_2 G_4 G_6
                                      = (G_1 G_3 G_5 G_7)(G_2 G_4 G_6 G_8) = H_o H_e.
The second of the three facts that we need to establish relates the eigenvalues
of H̃ = H_o H_e to the eigenvalues of the C and S matrices defined
in (12.6.10). It follows from (12.6.7) and (12.6.8) that these matrices are
symmetric, tridiagonal, and unreduced, e.g.,

    C = (1/2) [ 1-c_1     s_1        0         0      ]
              [  s_1    c_1-c_2     s_2        0      ]
              [   0       s_2     c_2-c_3     s_3     ]
              [   0        0        s_3     c_3-c_4   ]

    S = (1/2) [ -1-c_1    s_1        0         0      ]
              [   s_1   c_1+c_2    -s_2        0      ]
              [    0      -s_2   -c_2-c_3     s_3     ]
              [    0        0       s_3     c_3+c_4   ]

By working with the definitions it is easy to verify that

    (H̃ + H̃^T)/2
and

    R(φ/2) R(φ) R(φ/2) = [ -1  0 ]
                         [  0  1 ].

Thus, if Q_o = diag(R(φ_1/2), R(φ_3/2), ..., R(φ_{n-1}/2)) and
Q_e = diag(1, R(φ_2/2), ..., R(φ_{n-2}/2), 1), then from (12.6.7) and (12.6.8)
H_o and H_e have the following Schur decompositions:

    Q_o^T H_o Q_o = D_o = diag(-1, 1, -1, 1, ..., -1, 1)
    Q_e^T H_e Q_e = D_e = diag(1, -1, 1, -1, ..., 1, -1).

The matrices

    Q_o^T C Q_e = (1/2) Q_o^T (H_o + H_e) Q_e = (1/2)( D_o (Q_o^T Q_e) + (Q_o^T Q_e) D_e )
    Q_o^T S Q_e = (1/2) Q_o^T (H_o - H_e) Q_e = (1/2)( D_o (Q_o^T Q_e) - (Q_o^T Q_e) D_e )
X X X 0 0 0 0 0
X X X 0 0 0 0 0
0 X X X X 0 0 0
0 X X X X 0 0 0
QoQe
0 0 0 X X X X 0
0 0 0 X X X X 0
0 0 0 0 0 X X X
0 0 0 0 0 X X X
(The main ideas from this point on are amply communicated with n = 8
examples.) If D_o(i,i) and D_e(j,j) have the opposite sign, then C^(1)_{ij} = 0,
from which we conclude that C^(1) has the form

    C^(1) = Q_o^T C Q_e =
        [ a_1 b_1  0   0   0   0   0   0  ]
        [  0   0  b_2  0   0   0   0   0  ]
        [  0  a_2  0  b_3  0   0   0   0  ]
        [  0   0  a_3  0  b_4  0   0   0  ]
        [  0   0   0  a_4  0  b_5  0   0  ]
        [  0   0   0   0  a_5  0  b_6  0  ]
        [  0   0   0   0   0  a_6  0   0  ]
        [  0   0   0   0   0   0  a_7 b_8 ]

Analogously, if D_o(i,i) and D_e(j,j) have the same sign, then S^(1)_{ij} = 0,
from which we conclude that S^(1) = Q_o^T S Q_e has the form

        [  0   0  f_1  0   0   0   0   0  ]
        [ e_2 d_2  0   0   0   0   0   0  ]
        [  0   0  d_3  0  f_3  0   0   0  ]
        [  0  e_4  0  d_4  0   0   0   0  ]
        [  0   0   0   0  d_5  0  f_5  0  ]
        [  0   0   0  e_6  0  d_6  0   0  ]
        [  0   0   0   0   0   0  d_7 f_7 ]
        [  0   0   0   0   0   0  e_8  0  ]

        [ a_1 b_1  0   0   0   0   0   0  ]
        [  0  a_2 b_3  0   0   0   0   0  ]
        [  0   0  a_4 b_5  0   0   0   0  ]
        [  0   0   0  a_6  0   0   0   0  ]
    =   [  0   0   0   0  b_2  0   0   0  ]
        [  0   0   0   0  a_3 b_4  0   0  ]
        [  0   0   0   0   0  a_5 b_6  0  ]
        [  0   0   0   0   0   0  a_7 b_8 ]
Problems
P12.6.1 Let A ∈ R^{m×n} and consider the problem of finding the stationary values of

    R(x, y) = y^T A x / ( || y ||_2 || x ||_2 ),    y ∈ R^m, x ∈ R^n

subject to the constraints

    C^T x = 0,    C ∈ R^{n×p},  n ≥ p,
    D^T y = 0,    D ∈ R^{m×q},  m ≥ q.

Show how to solve this problem by first computing complete orthogonal decompositions
of C and D and then computing the SVD of a certain submatrix of a transformed A.
P12.6.2 Suppose A ∈ R^{m×n} and B ∈ R^{m×p}. Assume that rank(A) = n and rank(B) =
p. Using the methods of this section, show how to solve
to the eigenvalues and eigenvectors of A = A_1 A_2 A_3 A_4. Assume that the diagonal blocks
in A are square.
P12.6.6 Prove that if (12.6.2) holds, then (12.6.3) converges to λ_min(T) monotonically
from the right.
P12.6.7 Recall from §4.7 that it is possible to compute the inverse of a symmetric positive
definite Toeplitz matrix in O(n^2) flops. Use this fact to obtain an initial bracketing
interval for (12.6.5) that is based on || T^{-1} ||_∞ and || G^{-1} ||_∞.
P12.6.8 A matrix A ∈ R^{n×n} is centrosymmetric if it is symmetric and persymmetric,
i.e., A = E_n A E_n where E_n = I_n(:, n:-1:1). Show that if n = 2m and Q is the
orthogonal matrix

    Q = (1/sqrt(2)) [ I_m   I_m ]
                    [ E_m  -E_m ]

then

    Q^T A Q = [ A_11 + A_12 E_m          0         ]
              [        0          A_11 - A_12 E_m  ]

where A_11 = A(1:m, 1:m) and A_12 = A(1:m, m+1:n). Show that if n = 2m, then the
Schur decomposition of a centrosymmetric matrix can be computed with one-fourth the
flops that it takes to compute the Schur decomposition of a symmetric matrix, assuming
that the QR algorithm is used in both cases. Repeat the problem if n = 2m + 1.
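A NumPy sketch (ours, illustrating the splitting in P12.6.8 for n = 2m): the spectrum of a centrosymmetric A is obtained from two half-size symmetric eigenproblems:

    import numpy as np

    def centrosymmetric_eig(A):
        # Eigenvalues of centrosymmetric A with n = 2m via the orthogonal
        # Q above: Q^T A Q decouples into A11 + A12*E and A11 - A12*E.
        n = A.shape[0]
        m = n // 2
        E = np.eye(m)[:, ::-1]                   # exchange matrix E_m
        A11, A12 = A[:m, :m], A[:m, m:]
        w1 = np.linalg.eigvalsh(A11 + A12 @ E)   # both blocks are symmetric
        w2 = np.linalg.eigvalsh(A11 - A12 @ E)
        return np.sort(np.concatenate([w1, w2]))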
P12.6.9 Suppose F, G ∈ R^{n×n} are symmetric and that

    Q = [ Q_1  Q_2 ]
          p    n-p

is an n-by-n orthogonal matrix. Show how to compute Q and p so that

    f(Q, p) = tr(Q_1^T F Q_1) + tr(Q_2^T G Q_2)

is maximized. Hint: tr(Q_1^T F Q_1) + tr(Q_2^T G Q_2) = tr(Q_1^T (F - G) Q_1) + tr(G).
P12.6.10 Suppose A ∈ R^{n×n} is given and consider the problem of minimizing
|| A - S ||_F over all symmetric positive semidefinite matrices S that have rank r or less.
Show that

    S = sum_{i=1}^{min{k,r}} λ_i q_i q_i^T
P12.6.11 Verify for general n (even) that H is similar to H_o H_e where these matrices
are defined in §12.6.4.
P12.6.12 Verify that the bidiagonal matrices B_C(1:m, 1:m) and B_S(1:m, 1:m) in §12.6.4
have nonzero entries on their diagonal and superdiagonal and specify their values.
P12.6.13 A real 2n-by-2n matrix M is Hamiltonian if and only if J^T M J = -M^T, where

    J = [  0    I_n ]
        [ -I_n   0  ].

(a) Show that the eigenvalues of a Hamiltonian matrix come in plus-minus pairs. (b) A
matrix S ∈ R^{2n×2n} is symplectic if J^T S J = S^{-T}. Show that if S is symplectic and
M is Hamiltonian, then S^{-1} M S is also Hamiltonian. (c) Show that if Q ∈ R^{2n×2n} is
orthogonal and symplectic, then

    Q = [  Q_1  Q_2 ]
        [ -Q_2  Q_1 ]

where Q_1^T Q_1 + Q_2^T Q_2 = I_n and Q_1^T Q_2 is symmetric. Thus, a Givens rotation of
the form G(i, i+n, θ) is orthogonal symplectic, as is the direct sum of n-by-n Householders.
(d) Show how to compute a symplectic orthogonal U such that U^T M U has an upper
Hessenberg (1,1) block H, a diagonal (2,1) block D, and (2,2) block -H^T.
G.H. Golub (1973). "Some Modified Matrix Eigenvalue Problems," SIAM Review 15, 318-44.
D. Boley and G.H. Golub (1987). "A Survey of Matrix Inverse Eigenvalue Problems," Inverse Problems 3, 595-622.
References for the stationary value problem include
G.E. Forsythe and G.H. Golub (1965). "On the Stationary Values of a Second-Degree Polynomial on the Unit Sphere," SIAM J. App. Math. 14, 1050-68.
G.H. Golub and R. Underwood (1970). "Stationary Values of the Ratio of Quadratic Forms Subject to Linear Constraints," Z. Angew. Math. Phys. 21, 318-26.
S. Leon (1994). "Maximizing Bilinear Forms Subject to Linear Constraints," Lin. Alg. and Its Applic. 210, 49-58.
An algorithm for minimizing x^T A x where x satisfies Bx = d and || x ||_2 = 1 is presented in
W. Gander, G.H. Golub, and U. von Matt (1991). "A Constrained Eigenvalue Problem," in Numerical Linear Algebra, Digital Signal Processing, and Parallel Algorithms, G.H. Golub and P. Van Dooren (eds), Springer-Verlag, Berlin.
Selected papers that discuss a range of inverse eigenvalue problems include
G.H. Golub and J.H. Welsch (1969). "Calculation of Gauss Quadrature Rules," Math. Comp. 23, 221-30.
S. Friedland (1975). "On Inverse Multiplicative Eigenvalue Problems for Matrices," Lin. Alg. and Its Applic. 12, 127-38.
D.L. Boley and G.H. Golub (1978). "The Matrix Inverse Eigenvalue Problem for Periodic Jacobi Matrices," in Proc. Fourth Symposium on Basic Problems of Numerical Mathematics, Prague, pp. 63-76.
W.E. Ferguson (1980). "The Construction of Jacobi and Periodic Jacobi Matrices with Prescribed Spectra," Math. Comp. 35, 1203-1220.
J. Kautsky and G.H. Golub (1983). "On the Calculation of Jacobi Matrices," Lin. Alg. and Its Applic. 52/53, 439-456.
D. Boley and G.H. Golub (1984). "A Modified Method for Restructuring Periodic Jacobi Matrices," Math. Comp. 42, 143-150.
W.B. Gragg and W.J. Harrod (1984). "The Numerically Stable Reconstruction of Jacobi Matrices from Spectral Data," Numer. Math. 44, 317-336.
S. Friedland, J. Nocedal, and M.L. Overton (1987). "The Formulation and Analysis of Numerical Methods for Inverse Eigenvalue Problems," SIAM J. Numer. Anal. 24, 634-667.
M.T. Chu (1992). "Numerical Methods for Inverse Singular Value Problems," SIAM J. Num. Anal. 29, 885-903.
G. Ammar and G. He (1995). "On an Inverse Eigenvalue Problem for Unitary Matrices," Lin. Alg. and Its Applic. 218, 263-271.
H. Zha and Z. Zhang (1995). "A Note on Constructing a Symmetric Matrix with Specified Diagonal Entries and Eigenvalues," BIT 35, 448-451.
G. Cybenko and C. Van Loan (1986). "Computing the Minimum Eigenvalue of a Symmetric Positive Definite Toeplitz Matrix," SIAM J. Sci. and Stat. Comp. 7, 123-131.
W.F. Trench (1989). "Numerical Solution of the Eigenvalue Problem for Hermitian Toeplitz Matrices," SIAM J. Matrix Anal. Appl. 10, 135-146.
L. Reichel and L.N. Trefethen (1992). "Eigenvalues and Pseudo-eigenvalues of Toeplitz Matrices," Lin. Alg. and Its Applic. 162/163/164, 153-186.
S.L. Handy and J.L. Barlow (1994). "Numerical Solution of the Eigenproblem for Banded, Symmetric Toeplitz Matrices," SIAM J. Matrix Anal. Appl. 15, 205-214.
Hamiltonian eigenproblems (see P12.6.13) occur throughout optimal control theory and are very important.
C.C. Paige and C. Van Loan (1981). "A Schur Decomposition for Hamiltonian Matrices," Lin. Alg. and Its Applic. 41, 11-32.
C. Van Loan (1984). "A Symplectic Method for Approximating All the Eigenvalues of a Hamiltonian Matrix," Lin. Alg. and Its Applic. 61, 233-252.
R. Byers (1986). "A Hamiltonian QR Algorithm," SIAM J. Sci. and Stat. Comp. 7, 212-229.
V. Mehrmann (1988). "A Symplectic Orthogonal Method for Single Input or Single Output Discrete Time Optimal Quadratic Control Problems," SIAM J. Matrix Anal. Appl. 9, 221-247.
G. Ammar and V. Mehrmann (1991). "On Hamiltonian and Symplectic Hessenberg Forms," Lin. Alg. and Its Applic. 149, 55-72.
A. Bunse-Gerstner, R. Byers, and V. Mehrmann (1992). "A Chart of Numerical Methods for Structured Eigenvalue Problems," SIAM J. Matrix Anal. Appl. 13, 419-453.
Other papers on modified/structured eigenvalue problems include
Other papers on modified/structured eigenvalue problems include
J.O. Aasen (1971). "On the Reduction of a Symmetric Matrix to Tridiagonal Form," BIT 11, 233-42.
N.N. Abdelmalek (1971). "Roundoff Error Analysis for Gram-Schmidt Method and Solution of Linear Least Squares Problems," BIT 11, 345-68.
G.E. Adams, A.W. Bojanczyk, and F.T. Luk (1994). "Computing the PSVD of Two 2x2 Triangular Matrices," SIAM J. Matrix Anal. Appl. 15, 366-382.
L. Adams (1985). "m-step Preconditioned Conjugate Gradient Methods," SIAM J. Sci. and Stat. Comp. 6, 452-463.
L. Adams and P. Arbenz (1994). "Towards a Divide and Conquer Algorithm for the Real Nonsymmetric Eigenvalue Problem," SIAM J. Matrix Anal. Appl. 15, 1333-1353.
L. Adams and T. Crockett (1984). "Modeling Algorithm Execution Time on Processor Arrays," Computer 17, 38-43.
L. Adams and H. Jordan (1986). "Is SOR Color-Blind?" SIAM J. Sci. Stat. Comp. 7, 490-506.
S.T. Alexander, C.T. Pan, and R.J. Plemmons (1988). "Analysis of a Recursive Least Squares Hyperbolic Rotation Algorithm for Signal Processing," Lin. Alg. and Its Applic. 98, 3-40.
E.L. Allgower (1973). "Exact Inverses of Certain Band Matrices," Numer. Math. 21, 279-84.
A.R. Amir-Moez (1965). Extremal Properties of Linear Transformations and Geometry of Unitary Spaces, Texas Tech University Mathematics Series, no. 243, Lubbock, Texas.
G.S. Ammar and W.B. Gragg (1988). "Superfast Solution of Real Positive Definite Toeplitz Systems," SIAM J. Matrix Anal. Appl. 9, 61-76.
G.S. Ammar, W.B. Gragg, and L. Reichel (1985). "On the Eigenproblem for Orthogonal Matrices," Proc. IEEE Conference on Decision and Control, 1963-1966.
G.S. Ammar and G. He (1995). "On an Inverse Eigenvalue Problem for Unitary Matrices," Lin. Alg. and Its Applic. 218, 263-271.
G.S. Ammar and V. Mehrmann (1991). "On Hamiltonian and Symplectic Hessenberg Forms," Lin. Alg. and Its Applic. 149, 55-72.
P. Amodio and L. Brugnano (1995). "The Parallel QR Factorization Algorithm for Tridiagonal Linear Systems," Parallel Computing 21, 1097-1110.
C. Ancourt, F. Coelho, F. Irigoin, and R. Keryell (1993). "A Linear Algebra Framework for Static HPF Code Distribution," Proceedings of the 4th Workshop on Compilers for Parallel Computers, Delft, The Netherlands.
A.A. Anda and H. Park (1994). "Fast Plane Rotations with Dynamic Scaling," SIAM J. Matrix Anal. Appl. 15, 162-174.
E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra, J. DuCroz, A. Greenbaum, S. Hammarling, A. McKenney, S. Ostrouchov, and D. Sorensen (1995). LAPACK Users' Guide, Release 2.0, 2nd ed., SIAM Publications, Philadelphia, PA.
E. Anderson, Z. Bai, and J. Dongarra (1992). "Generalized QR Factorization and Its Application," Lin. Alg. and Its Applic. 162/163/164, 243-271.
N. Anderson and I. Karasalo (1975). "On Computing Bounds for the Least Singular Value of a Triangular Matrix," BIT 15, 1-4.
S. Barnett and C. Storey (1968). "Some Applications of the Lyapunov Matrix Equation," J. Inst. Math. Applic. 4, 33-42.
R. Barrett, M. Berry, T.F. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine, and H. van der Vorst (1993). Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, SIAM Publications, Philadelphia, PA.
A. Barrlund (1991). "Perturbation Bounds for the LDL^T and LU Decompositions," BIT 31, 358-363.
A. Barrlund (1994). "Perturbation Bounds for the Generalized QR Factorization," Lin. Alg. and Its Applic. 207, 251-271.
R.H. Bartels (1971). "A Stabilization of the Simplex Method," Numer. Math. 16, 414-434.
R.H. Bartels, A.R. Conn, and C. Charalambous (1978). "On Cline's Direct Method for Solving Overdetermined Linear Systems in the L∞ Sense," SIAM J. Num. Anal. 15, 255-70.
R.H. Bartels and G.W. Stewart (1972). "Solution of the Equation AX + XB = C," Comm. ACM 15, 820-26.
S.G. Bartels and D.J. Higham (1992). "The Structured Sensitivity of Vandermonde-Like Systems," Numer. Math. 62, 17-34.
W. Barth, R.S. Martin, and J.H. Wilkinson (1967). "Calculation of the Eigenvalues of a Symmetric Tridiagonal Matrix by the Method of Bisection," Numer. Math. 9, 386-93. See also Wilkinson and Reinsch (1971, 249-256).
V. Barwell and J.A. George (1976). "A Comparison of Algorithms for Solving Symmetric Indefinite Systems of Linear Equations," ACM Trans. Math. Soft. 2, 242-51.
K.J. Bathe and E.L. Wilson (1973). "Solution Methods for Eigenvalue Problems in Structural Mechanics," Int. J. Numer. Meth. Eng. 6, 213-26.
S. Batterson (1994). "Convergence of the Francis Shifted QR Algorithm on Normal Matrices," Lin. Alg. and Its Applic. 201, 181-195.
S. Batterson and J. Smillie (1989). "The Dynamics of Rayleigh Quotient Iteration," SIAM J. Num. Anal. 26, 621-636.
D. Bau, I. Kodukula, V. Kotlyar, K. Pingali, and P. Stodghill (1993). "Solving Alignment Using Elementary Linear Algebra," in Proceedings of the 7th International Workshop on Languages and Compilers for Parallel Computing, Lecture Notes in Computer Science 892, Springer-Verlag, New York, 46-60.
F.L. Bauer (1963). "Optimally Scaled Matrices," Numer. Math. 5, 73-87.
F.L. Bauer (1965). "Elimination with Weighted Row Combinations for Solving Linear Equations and Least Squares Problems," Numer. Math. 7, 338-52. See also Wilkinson and Reinsch (1971, 119-33).
F.L. Bauer and C.T. Fike (1960). "Norms and Exclusion Theorems," Numer. Math. 2, 123-44.
F.L. Bauer and C. Reinsch (1968). "Rational QR Transformations with Newton Shift for Symmetric Tridiagonal Matrices," Numer. Math. 11, 264-72. See also Wilkinson and Reinsch (1971, pp. 257-65).
F.L. Bauer and C. Reinsch (1971). "Inversion of Positive Definite Matrices by the Gauss-Jordan Method," in Handbook for Automatic Computation Vol. 2, Linear Algebra, J.H. Wilkinson and C. Reinsch, eds., Springer-Verlag, New York, 45-49.
C. Bavely and G.W. Stewart (1979). "An Algorithm for Computing Reducing Subspaces by Block Diagonalization," SIAM J. Num. Anal. 16, 359-67.
C. Beattie and D.W. Fox (1989). "Localization Criteria and Containment for Rayleigh Quotient Iteration," SIAM J. Matrix Anal. Appl. 10, 80-93.
R. Beauwens and P. de Groen, eds. (1992). Iterative Methods in Linear Algebra, Elsevier (North-Holland), Amsterdam.
T. Beelen and P. Van Dooren (1988). "An Improved Algorithm for the Computation of Kronecker's Canonical Form of a Singular Pencil," Lin. Alg. and Its Applic. 105, 9-65.
R. Bellman (1970). Introduction to Matrix Analysis, 2nd ed., McGraw-Hill, New York.
H. Bolz and W. Niethammer (1988). "On the Evaluation of Matrix Functions Given by Power Series," SIAM J. Matrix Anal. Appl. 9, 202-209.
S. Bondeli and W. Gander (1994). "Cyclic Reduction for Special Tridiagonal Systems," SIAM J. Matrix Anal. Appl. 15, 321-330.
J. Boothroyd and P.J. Eberlein (1968). "Solution to the Eigenproblem by a Norm-Reducing Jacobi-Type Method (Handbook)," Numer. Math. 11, 1-12. See also Wilkinson and Reinsch (1971, pp. 327-38).
H.J. Bowdler, R.S. Martin, G. Peters, and J.H. Wilkinson (1966). "Solution of Real and Complex Systems of Linear Equations," Numer. Math. 8, 217-234. See also Wilkinson and Reinsch (1971, 93-110).
H.J. Bowdler, R.S. Martin, C. Reinsch, and J.H. Wilkinson (1968). "The QR and QL Algorithms for Symmetric Matrices," Numer. Math. 11, 293-306. See also Wilkinson and Reinsch (1971, pp. 227-240).
J.H. Bramble, J.E. Pasciak, and A.H. Schatz (1986). "The Construction of Preconditioners for Elliptic Problems by Substructuring I," Math. Comp. 47, 103-134.
J.H. Bramble, J.E. Pasciak, and A.H. Schatz (1986). "The Construction of Preconditioners for Elliptic Problems by Substructuring II," Math. Comp. 49, 1-17.
R. Bramley and A. Sameh (1992). "Row Projection Methods for Large Nonsymmetric Linear Systems," SIAM J. Sci. Statist. Comput. 13, 168-193.
R.P. Brent (1970). "Error Analysis of Algorithms for Matrix Multiplication and Triangular Decomposition Using Winograd's Identity," Numer. Math. 16, 145-156.
R.P. Brent (1978). "A Fortran Multiple Precision Arithmetic Package," ACM Trans. Math. Soft. 4, 57-70.
R.P. Brent (1978). "Algorithm 524 MP, a Fortran Multiple Precision Arithmetic Package," ACM Trans. Math. Soft. 4, 71-81.
R.P. Brent and F.T. Luk (1982). "Computing the Cholesky Factorization Using a Systolic Architecture," Proc. 6th Australian Computer Science Conf., 295-302.
R.P. Brent and F.T. Luk (1985). "The Solution of Singular Value and Symmetric Eigenvalue Problems on Multiprocessor Arrays," SIAM J. Sci. and Stat. Comp. 6, 69-84.
R.P. Brent, F.T. Luk, and C. Van Loan (1985). "Computation of the Singular Value Decomposition Using Mesh Connected Processors," J. VLSI Computer Systems 1, 242-270.
C. Brezinski and M. Redivo-Zaglia (1994). "Treatment of Near-Breakdown in the CGS Algorithms," Numer. Alg. 7, 33-73.
C. Brezinski and M. Redivo-Zaglia (1995). "Look-Ahead in BiCGSTAB and Other Product-Type Methods for Linear Systems," BIT 35, 169-201.
C. Brezinski and H. Sadok (1991). "Avoiding Breakdown in the CGS Algorithm," Numer. Alg. 1, 199-206.
C. Brezinski, M. Zaglia, and H. Sadok (1991). "Avoiding Breakdown and Near Breakdown in Lanczos Type Algorithms," Numer. Alg. 1, 261-284.
C. Brezinski, M. Zaglia, and H. Sadok (1992). "A Breakdown Free Lanczos Type Algorithm for Solving Linear Systems," Numer. Math. 63, 29-38.
K.W. Brodlie and M.J.D. Powell (1975). "On the Convergence of Cyclic Jacobi Methods," J. Inst. Math. Applic. 15, 279-87.
J.D. Brown, M.T. Chu, D.C. Ellison, and R.J. Plemmons, eds. (1994). Proceedings of the Cornelius Lanczos International Centenary Conference, SIAM Publications, Philadelphia, PA.
C.G. Broyden (1973). "Some Condition Number Bounds for the Gaussian Elimination Process," J. Inst. Math. Applic. 12, 273-86.
A. Buckley (1974). "A Note on Matrices A = I + H, H Skew-Symmetric," Z. Angew. Math. Mech. 54, 125-26.
A. Buckley (1977). "On the Solution of Certain Skew-Symmetric Linear Systems," SIAM J. Num. Anal. 14, 566-70.
J.R. Bunch (1971). "Analysis of the Diagonal Pivoting Method," SIAM J. Num. Anal. 8, 656-680.
R. Byers (1984). "A Linpack-Style Condition Estimator for the Equation AX- X aT=
C," IEEE Tinns. Auto. Cont. AC-.1!9, 926-928.
R. Byers (1986) "A Hamiltonian QR Algorithm," SIAM J. Sci. and Stat. Camp. 7,
212-229.
R. Byers (1988). "A Bisection Method for Measuring the Distance of a Stable Matrix to
the Unstable Matrices," SIAM J. Sci. Stat. Comp. 9, 875-881.
R. Byers and S.G. Nash (1987). "On the Singular Vectors of the Lyapunov Operator,"
SIAM J. Alg. and Disc. Methods 8, 5!Hl6.
X.-C. Cai and 0. Widlund (1993). "Multiplicative Schwarz Algorithms for Some Non-
symmetric and Indefinite Problems," SIAM J. Numer. Anal. 30, 936-952.
D. Calvetti, G.H. Golub, and L. Reichel (1994). "An Adapti11e Chebyshev Iterati11e
Method for Nonsymmetric Linear Systems Bas.ed on Modified Moments," Numer.
Math. 67, 21-40.
D. Calvetti and L. Reichel (1992). "A Chebychev-Vandermonde Sol11er," Lin. Alg. and
Its Applic. 171!, 219-229.
D. Calvetti and L. Reichel (1993). "Fast In11ersion of Vandermonde-Like Matrices In-
volving Orthogonal Polynomials," BIT 33, 473-484.
D. Calvetti, L. Reichel, and D.C. Sorensen (1994). "An Implicitly Restarted Lanczos
Method for Large Symmetric Eigenvalue Problems," ETNA 2, 1-21.
L.E. Cannon (1969). A Cellular Computer to Implement the Kalman Filter Algorithm,
Ph.D. Thesis, Montana State University.
R. Carter (1991). "Y-MP Floating Point and Cholesky Factorization," Jnt'l J. High
Speed. Cumputing 3, 215-222.
F. Chaitin-Chatelin and V. Fraysee (1996). Lectures on nnite Precision Computations,
SIAM Publications, Philadelphia, PA.
R.H. Chan (1989). "The Spectrum of a Family of Circulant Preconditioned Toeplitz
Systems," SIAM J. Num. Anal. 26, 503-506.
R.H. Chan (1991). "Preconditioners for Toeplitz Systems with Nonnegative Generating
Functions," IMA J. Num. Anal. 11, 333-345.
R.H. Chan, J.G. Nagy, and R.J. Plemmons (1993). "FFT bllll<>:l Preconditioners for
Toeplitz Block Least Squares Problems," SIAM J. Num. Anal. 30, 174Q-1768.
R.H. Chan, J.G. Nagy, and R.J. Plemmons (1994). "Circulant Preconditioned Toeplitz
Least Squares Iterations," SIAM J. Matriz Anal. Appl. 15, 8Q-97.
S.P. Chan and B.N. Parlett (1977). "Algorithm 517: A Program for Computing the
Condition Numbers of Matrix Eigenvalues Without Computing Eigenvectors," ACM
Tinns. Math. Soft. 3, 186-203.
T.F. Chan (1982). "An Improved Algorithm for Computing the Singular Value Decom-
position," ACM Tinns. Math. Soft. 8, 72-83.
T.F. Chan (1984). "Deflated Decomposition Solutions of Nearly Singular Systems,"
SIAM J. Num. Anal. 21, 738-754.
T.F. Chan (1985). "On the Existence and Computation of LU Factorizations with Small Pivots," Math. Comp. 42, 535-548.
T.F. Chan (1987). "Rank Revealing QR Factorizations," Lin. Alg. and Its Applic.
88/89, 67-82.
T.F. Chan (1988). "An Optimal Circulant Preconditioner for Toeplitz Systems," SIAM J. Sci. Stat. Comp. 9, 766-771.
T.F. Chan (1991). "Fourier Analysis of Relaxed Incomplete Factorization Preconditioners," SIAM J. Sci. Statist. Comput. 12, 668-680.
T.F. Chan and P. Hansen (1992). "A Look-Ahead Levinson Algorithm for Indefinite
Toeplitz Systems," SIAM J. Matriz Anal. Appl. 13, 49Q-506.
T.F. Chan and P. Hansen (1992). "Some Applications of the Rank Revealing QR Fac-
torization," SIAM J. Sci. and Stat. Comp. 13, 727-741.
T.F. Chan, K.R. Jackson, and B. Zhu (1983). "Alternating Direction Incomplete Fac-
torizations," SIAM J. Numer. Anal. 20, 239-257.
T.F. Chan and J.A. Olkin (1994). "Circulant Preconditioners for Toeplitz Block Matri-
ces," Numerical Algorithms 6, 89-101.
T.F. Chan, J.A. Olkin, and D. Cooley (1992). "Solving Quadratically Constrained Least Squares Using Black Box Solvers," BIT 32, 481-495.
S. Chandrasekaran and I.C.F. Ipsen (1994). "On Rank-Revealing Factorizations," SIAM J. Matrix Anal. Appl. 15, 592-622.
S. Chandrasekaran and I.C.F. Ipsen (1994). "Backward Errors for Eigenvalue and Singular Value Decompositions," Numer. Math. 68, 215-223.
S. Chandrasekaran and I.C.F. Ipsen (1995). "On the Sensitivity of Solution Components in Linear Systems of Equations," SIAM J. Matrix Anal. Appl. 16, 93-112.
H.Y. Chang and M. Salama (1988). "A Parallel Householder Tridiagonalization Strategy Using Scattered Square Decomposition," Parallel Computing 6, 297-312.
J.P. Charlier, M. Vanbegin, and P. Van Dooren (1988). "On Efficient Implementation of Kogbetliantz's Algorithm for Computing the Singular Value Decomposition," Numer. Math. 52, 279-300.
J.P. Charlier and P. Van Dooren (1987). "On Kogbetliantz's SVD Algorithm in the Presence of Clusters," Lin. Alg. and Its Applic. 95, 135-160.
B.A. Chartres and J.C. Geuder (1967). "Computable Error Bounds for Direct Solution of Linear Equations," J. ACM 14, 63-71.
F. Chatelin (1993). Eigenvalues of Matrices, John Wiley and Sons, New York.
S. Chen, J. Dongarra, and C. Hsiung (1984). "Multiprocessing Linear Algebra Algorithms on the Cray X-MP-2: Experiences with Small Granularity," J. Parallel and Distributed Computing 1, 22-31.
S. Chen, D. Kuck, and A. Sameh (1978). "Practical Parallel Band Triangular Systems Solvers," ACM Trans. Math. Soft. 4, 270-277.
K.H. Cheng and S. Sahni (1987). "VLSI Systems for Band Matrix Multiplication,"
Parallel Computing 4, 239-258.
R.C. Chin, T.A. Manteuffel, and J. de Pillis (1984). "ADI as a Preconditioning for Solving the Convection-Diffusion Equation," SIAM J. Sci. and Stat. Comp. 5, 281-299.
J. Choi, J.J. Dongarra, and D.W. Walker (1995). "Parallel Matrix Transpose Algorithms on Distributed Memory Concurrent Computers," Parallel Computing 21, 1387-1406.
M.T. Chu (1992). "Numerical Methods for Inverse Singular Value Problems," SIAM J.
Num. Anal. 29, 885--903.
M.T. Chu, R.E. Funderlic, and G.H. Golub (1995). "A Rank-One Reduction Formula
and Its Applications to Matrix Factorizations," SIAM Review 37, 512--530.
P.G. Ciarlet (1989). Introduction to Numerical Linear Algebra and Optimisation, Cambridge University Press.
A.K. Cline (1973). "An Elimination Method for the Solution of Linear Least Squares
Problems," SIAM J. Num. Anal. 10, 283-89.
A.K. Cline (1976). "A Descent Method for the Uniform Solution to Overdetermined
Systems of Equations," SIAM J. Num. Anal. 13, 293-309.
A.K. Cline, A.R. Conn, and C. Van Loan (1982). "Generalizing the LINPACK Condition Estimator," in Numerical Analysis, ed. J.P. Hennart, Lecture Notes in Mathematics no. 909, Springer-Verlag, New York.
A.K. Cline, G.H. Golub, and G.W. Platzman (1976). "Calculation of Normal Modes of Oceans Using a Lanczos Method," in Sparse Matrix Computations, ed. J.R. Bunch and D.J. Rose, Academic Press, New York, pp. 409-26.
A.K. Cline, C.B. Moler, G.W. Stewart, and J.H. Wilkinson (1979). "An Estimate for the Condition Number of a Matrix," SIAM J. Num. Anal. 16, 368-75.
A.K. Cline and R.K. Rew (1983). "A Set of Counterexamples to Three Condition Number Estimators," SIAM J. Sci. and Stat. Comp. 4, 602-611.
R.E. Cline and R.J. Plemmons (1976). "L2-Solutions to Underdetermined Linear Sys-
tems," SIAM Review 18, 92-106.
M. Clint and A. Jennings (1970). "The Evaluation of Eigenvalues and Eigenvectors of Real Symmetric Matrices by Simultaneous Iteration," Comp. J. 13, 76-80.
M. Clint and A. Jennings (1971). "A Simultaneous Iteration Method for the Unsymmetric Eigenvalue Problem," J. Inst. Math. Applic. 8, 111-21.
B.N. Datta, C.R. Johnson, M.A. Kaashoek, R. Plemmons, and E.D. Sontag, eds. (1988). Linear Algebra in Signals, Systems, and Control, SIAM Publications, Philadelphia, PA.
K. Datta (1988). "The Matrix Equation XA - BX = R and Its Applications," Lin. Alg. and Its Applic. 109, 91-105.
C. Davis and W.M. Kahan (1970). "The Rotation of Eigenvectors by a Perturbation, III," SIAM J. Num. Anal. 7, 1-46.
D. Davis (1973). "Explicit Functional Calculus," Lin. Alg. and Its Applic. 6, 193-99.
G.J. Davis (1986). "Column LU Pivoting on a Hypercube Multiprocessor," SIAM J.
Alg. and Disc. Methods 7, 538-550.
J. Day and B. Peterson (1988). "Growth in Gaussian Elimination," Amer. Math.
Monthly 95, 489-513.
A. Dax (1990). "The Convergence of Linear Stationary Iterative Processes for Solving Singular Unstructured Systems of Linear Equations," SIAM Review 32, 611-635.
C. de Boor (1979). "Efficient Computer Manipulation of Tensor Products," ACM Trans.
Math. Soft. 5, 173-182.
C. de Boor and A. Pinkus (1977). "A Backward Error Analysis for Totally Positive Linear Systems," Numer. Math. 27, 485-90.
T. Dehn, M. Eiermann, K. Giebermann, and V. Sperling (1995). "Structured Sparse Matrix Vector Multiplication on Massively Parallel SIMD Architectures," Parallel Computing 21, 1867-1894.
P. Deift, J. Demmel, L.-C. Li, and C. Tomei (1991). "The Bidiagonal Singular Value Decomposition and Hamiltonian Mechanics," SIAM J. Num. Anal. 28, 1463-1516.
P. Deift, T. Nanda, and C. Tomei (1983). "Ordinary Differential Equations and the Symmetric Eigenvalue Problem," SIAM J. Numer. Anal. 20, 1-22.
T. Dekker and W. Hoffman (1989). "Rehabilitation of the Gauss-Jordan Algorithm,"
Numer. Math. 54, 591-599.
T.J. Dekker and J.F. Traub (1971). "The Shifted QR Algorithm for Hermitian Matrices,"
Lin. Alg. and Its Applic. 4, 137-54.
J.M. Delosme and I.C.F. Ipsen (1986). "Parallel Solution of Symmetric Positive Definite Systems with Hyperbolic Rotations," Lin. Alg. and Its Applic. 77, 75-112.
C.J. Demeure (1989). "Fast QR Factorization of Vandermonde Matrices," Lin. Alg. and Its Applic. 122/123/124, 165-194.
J.W. Demmel (1983). "A Numerical Analyst's Jordan Canonical Form," Ph.D. Thesis,
Berkeley.
J.W. Demmel (1983). "The Condition Number of Equivalence Transformations that
Block Diagonalize Matrix Pencils," SIAM J. Numer. Anal. 20, 599-610.
J.W. Demmel (1984). "Underflow and the Reliability of Numerical Software," SIAM J.
Sci. and Stat. Comp. 5, 887-919.
J.W. Demmel (1987). "Three Methods for Refining Estimates of Invariant Subspaces,"
Computing 38, 43-57.
J.W. Demmel (1987). "On the Distance to the Nearest Ill-Posed Problem," Numer.
Math. 51, 251-289.
J.W. Demmel (1987). "A Counterexample for two Conjectures About Stability," IEEE
1hlns. Auto. Cont. AC-3!!, 34G-342.
J.W. Demmel (1987). "The smallest perturbation of a submatrix which lowers the rank
and constrained total least squares problems, SIAM J. Numer. Anal. 24, 199-206.
J.W. Demmel (1988). "The Probability that a Numerical Analysis Problem is Difficult,"
Math. Comp. 50, 449-480.
J.W. Demmel (1992). "The Componentwise Distance to the Nearest Singular Matrix," SIAM J. Matrix Anal. Appl. 13, 10-19.
J.W. Demmel (1996). Numerical Linear Algebra, SIAM Publications, Philadelphia, PA.
J.W. Demmel and W. Gragg (1993). "On Computing Accurate Singular Values and
Eigenvalues of Matrices with Acyclic Graphs," Lin. Alg. and Its Applic. 185, 203-
217.
J.W. Demmel, M.T. Heath, and H.A. van der Vorst (1993). "Parallel Numerical Linear Algebra," in Acta Numerica 1993, Cambridge University Press.
J.W. Demmel and N.J. Higham (1992). "Stability of Block Algorithms with Fast Level-3
BLAS," ACM 1hms. Math. Soft. 18, 274-291.
J.W. Demmel and N.J. Higham (1993). "Improved Error Bounds for Underdetermined
System Solvers," SIAM J. Matrix Anal. Appl. 14, 1-14.
J.W. Demmel, N.J. Higham, and R.S. Schreiber (1995). "Stability of Block LU Factorization," Numer. Lin. Alg. with Applic. 2, 173-190.
J.W. Demmel and B. Kågström (1986). "Stably Computing the Kronecker Structure and Reducing Subspaces of Singular Pencils A - λB for Uncertain Data," in Large Scale Eigenvalue Problems, J. Cullum and R.A. Willoughby (eds), North-Holland, Amsterdam.
J.W. Demmel and B. Kågström (1987). "Computing Stable Eigendecompositions of Matrix Pencils," Lin. Alg. and Its Applic. 88/89, 139-186.
J.W. Demmel and B. Kågström (1988). "Accurate Solutions of Ill-Posed Problems in Control Theory," SIAM J. Matrix Anal. Appl. 9, 126-145.
J.W. Demmel and W. Kahan (1990). "Accurate Singular Values of Bidiagonal Matrices,"
SIAM J. Sci. and Stat. Comp. 11, 873-912.
J.W. Demmel and K. Veselic (1992). "Jacobi's Method is More Accurate than QR,"
SIAM J. Matrix Anal. Appl. 13, 1204-1245.
B. De Moor and G.H. Golub (1991). "The Restricted Singular Value Decomposition: Properties and Applications," SIAM J. Matrix Anal. Appl. 12, 401-425.
B. De Moor and P. Van Dooren (1992). "Generalizing the Singular Value and QR
Decompositions," SIAM J. Matrix Anal. Appl. 13, 993-1014.
J.E. Dennis and R.B. Schnabel (1983). Numerical Methods for Unconstrained Optimiza-
tion and Nonlinear Equations, Prentice-Hall, Englewood Cliffs, NJ.
J.E. Dennis Jr. and K. Turner (1987). "Generalized Conjugate Directions," Lin. Alg.
and Its Applic. 88/89, 187-209.
E.F. Deprettere, ed. (1988). SVD and Signal Processing, Elsevier, Amsterdam.
J. Descloux (1963). "Bounds for the Spectral Norm of Functions of Matrices," Numer.
Math. 5, 185-90.
M.A. Diamond and D.L.V. Ferreira (1976). "On a Cyclic Reduction Method for the
Solution of Poisson's Equation," SIAM J. Num. Anal. 13, 54-70.
S. Doi (1991). "On Parallelism and Convergence of Incomplete LU Factorizations," Appl. Numer. Math. 7, 417-436.
J.J. Dongarra (1983). "Improving the Accuracy of Computed Singular Values," SIAM
J. Sci. and Stat. Comp. 4, 712-719.
J.J. Dongarra, J.R. Bunch, C.B. Moler, and G.W. Stewart (1979). LINPACK Users' Guide, SIAM Publications, Philadelphia, PA.
J.J. Dongarra, J. Du Croz, I.S. Duff, and S.J. Hammarling (1990). "A Set of Level 3 Basic Linear Algebra Subprograms," ACM Trans. Math. Soft. 16, 1-17.
J.J. Dongarra, J. Du Croz, I.S. Duff, and S.J. Hammarling (1990). "Algorithm 679. A Set of Level 3 Basic Linear Algebra Subprograms: Model Implementation and Test Programs," ACM Trans. Math. Soft. 16, 18-28.
J.J. Dongarra, J. Du Croz, S. Hammarling, and R.J. Hanson (1988). "An Extended Set of Fortran Basic Linear Algebra Subprograms," ACM Trans. Math. Soft. 14, 1-17.
J.J. Dongarra, J. Du Croz, S. Hammarling, and R.J. Hanson (1988). "Algorithm 656. An Extended Set of Fortran Basic Linear Algebra Subprograms: Model Implementation and Test Programs," ACM Trans. Math. Soft. 14, 18-32.
J.J. Dongarra, I. Duff, P. Gaffney, and S. McKee, eds. (1989). Vector and Parallel Computing, Ellis Horwood, Chichester, England.
J.J. Dongarra, I. Duff, D. Sorensen, and H. van der Vorst (1990). Solving Linear Systems on Vector and Shared Memory Computers, SIAM Publications, Philadelphia, PA.
J.J. Dongarra and S. Eisenstat (1984). "Squeezing the Most Out of an Algorithm in Cray Fortran," ACM Trans. Math. Soft. 10, 221-230.
J.J. Dongarra, F.G. Gustavson, and A. Karp (1984). "Implementing Linear Algebra Algorithms for Dense Matrices on a Vector Pipeline Machine," SIAM Review 26, 91-112.
J.J. Dongarra, S. Hammarling, and D.C. Sorensen (1989). "Block Reduction of Matrices to Condensed Forms for Eigenvalue Computations," J. Comp. Appl. Math. 27, 215-227.
J.J. Dongarra, S. Hammarling, and J.H. Wilkinson (1992). "Numerical Considerations in Computing Invariant Subspaces," SIAM J. Matrix Anal. Appl. 13, 145-161.
J.J. Dongarra and T. Hewitt (1986). "Implementing Dense Linear Algebra Algorithms
Using Multitasking on the Cray X-MP-4 (or Approaching the Gigaflop)," SIAM J.
Sci. and Stat. Comp. 7, 347-350.
J.J. Dongarra and A. Hinds (1979). "Unrolling Loops in Fortran," Software Practice and Experience 9, 219-229.
J.J. Dongarra and R.E. Hiromoto (1984). "A Collection of Parallel Linear Equation Routines for the Denelcor HEP," Parallel Computing 1, 133-142.
J.J. Dongarra, L. Kaufman, and S. Hammarling (1986). "Squeezing the Most Out of Eigenvalue Solvers on High Performance Computers," Lin. Alg. and Its Applic. 77,
113-136.
J.J. Dongarra, C.B. Moler, and J.H. Wilkinson (1983). "Improving the Accuracy of Computed Eigenvalues and Eigenvectors," SIAM J. Numer. Anal. 20, 23-46.
J.J. Dongarra and A.H. Sameh (1984). "On Some Parallel Banded System Solvers,"
Parallel Computing 1, 223-235.
J.J. Dongarra, A. Sameh, and D. Sorensen (1986). "Implementation of Some Concurrent Algorithms for Matrix Factorization," Parallel Computing 3, 25-34.
J.J. Dongarra and D.C. Sorensen (1986). "Linear Algebra on High Performance Com-
puters," Appl. Math. and Comp. 20, 57-88.
J.J. Dongarra and D.C. Sorensen (1987). "A Portable Environment for Developing
Parallel Programs," Pamllel Computing 5, 17&-186.
J.J. Dongarra and D.C. Sorensen (1987). "A Fully Parallel Algorithm for the Symmetric
Eigenvalue Problem," SIAM J. Sci. and Stat. Comp. 8, S139-S154.
J.J. Dongarra and D. Walker (1995). "Software Libraries for Linear Algebra Computa-
tions on High Performance Computers," SIAM Review 37, 151-180.
F.W. Dorr (1970). "The Direct Solution of the Discrete Poisson Equation on a Rectangle," SIAM Review 12, 248-63.
F.W. Dorr (1973). "The Direct Solution of the Discrete Poisson Equation in O(n^2) Operations," SIAM Review 15, 412-415.
C.C. Douglas, M. Heroux, G. Slishman, and R.M. Smith (1994). "GEMMW: A Portable
Level 3 BLAS Winograd Variant of Strassen's Matrix-Matrix Multiply Algorithm,"
J. Comput. Phys. 110, 1-10.
Z. Drmač (1994). The Generalized Singular Value Problem, Ph.D. Thesis, FernUniversität, Hagen, Germany.
Z. Drmač, M. Omladič, and K. Veselić (1994). "On the Perturbation of the Cholesky Factorization," SIAM J. Matrix Anal. Appl. 15, 1319-1332.
P.F. Dubois, A. Greenbaum, and G.H. Rodrigue (1979). "Approximating the Inverse of a Matrix for Use in Iterative Algorithms on Vector Processors," Computing 22, 257-268.
A.A. Dubrulle (1970). "A Short Note on the Implicit QL Algorithm for Symmetric Tridiagonal Matrices," Numer. Math. 15, 450.
A.A. Dubrulle and G.H. Golub (1994). "A Multishift QR Iteration Without Computation of the Shifts," Numerical Algorithms 7, 173-181.
A.A. Dubrulle, R.S. Martin, and J.H. Wilkinson (1968). "The Implicit QL Algorithm," Numer. Math. 12, 377-83. See also Wilkinson and Reinsch (1971, pp. 241-48).
J.J. Du Croz and N.J. Higham (1992). "Stability of Methods for Matrix Inversion," IMA J. Num. Anal. 12, 1-19.
I.S. Duff (1974). "Pivot Selection and Row Ordering in Givens Reduction on Sparse Matrices," Computing 13, 239-48.
I.S. Duff (1977). "A Survey of Sparse Matrix Research," Proc. IEEE 65, 500-535.
I.S. Duff, ed. (1981). Sparse Matrices and Their Uses, Academic Press, New York.
I.S. Duff, A.M. Erisman, and J.K. Reid (1986). Direct Methods for Sparse Matrices, Oxford University Press.
I.S. Duff, N.I.M. Gould, J.K. Reid, J.A. Scott, and K. Turner (1991). "The Factorization of Sparse Indefinite Matrices," IMA J. Num. Anal. 11, 181-204.
I.S. Duff and G. Meurant (1989). "The Effect of Ordering on Preconditioned Conjugate Gradients," BIT 29, 635-657.
I.S. Duff and J.K. Reid (1975). "On the Reduction of Sparse Matrices to Condensed
Forms by Similarity Transformations," J. Inst. Math. Applic. 15, 217-24.
I.S. Duff and J.K. Reid (1976). "A Comparison of Some Methods for the Solution of
Sparse Over-Determined Systems of Linear Equations," J. Inst. Math. Applic. 17,
267-80.
I.S. Duff and G.W. Stewart, eds. (1979). Sparse Matrix Proceedings, 1978, SIAM
Publications, Philadelphia, PA.
N. Dunford and J. Schwartz (1958). Linear Operators, Part I, Interscience, New York.
J. Durbin (1960). "The Fitting of Time Series Models," Rev. Inst. Int. Stat. 28, 233-43.
P.J. Eberlein (1965). "On Measures of Non-Normality for Matrices," Amer. Math. Monthly 72, 995-96.
P.J. Eberlein (1970). "Solution to the Complex Eigenproblem by a Norm-Reducing Jacobi-type Method," Numer. Math. 14, 232-45. See also Wilkinson and Reinsch (1971, pp. 404-17).
P.J. Eberlein (1971). "On the Diagonalization of Complex Symmetric Matrices," J. Inst.
Math. Applic. 7, 377-83.
P.J. Eberlein (1987). "On Using the Jacobi Method on a Hypercube," in Hypercube
Multiprocessors, ed. M.T. Heath, SIAM Publications, Philadelphia, PA.
P.J. Eberlein and C.P. Huang (1975). "Global Convergence of the QR Algorithm for Unitary Matrices with Some Results for Normal Matrices," SIAM J. Numer. Anal. 12, 421-453.
C. Eckart and G. Young (1939). "A Principal Axis Transformation for Non-Hermitian
Matrices," Bull. Amer. Math. Soc. 45, 118-21.
A. Edelman (1992). "The Complete Pivoting Conjecture for Gaussian Elimination is
False," The Mathematica Journal 2, 58-61.
A. Edelman (1993). "Large Dense Numerical Linear Algebra in 1993: The Parallel
Computing Influence," Int'l J. Supercomputer Appl. 7, 113-128.
A. Edelman, E. Elmroth, and B. Kågström (1996). "A Geometric Approach to Perturbation Theory of Matrices and Matrix Pencils," SIAM J. Matrix Anal., to appear.
A. Edelman and W. Mascarenhas (1995). "On the Complete Pivoting Conjecture for a
Hadamard Matrix of Order 12," Linear and Multilinear Algebm 38, 181-185.
A. Edelman and H. Murakami (1995). "Polynomial Roots from Companion Matrix
Eigenvalues," Math. Comp. 64, 763-776.
M. Eiermann and W. Niethammer (1983). "On the Construction of Semi-iterative Meth-
ods," SIAM J. Numer. Anal. 20, 1153-1160.
M. Eiermann, W. Niethammer, and R.S. Varga (1992). "Acceleration of Relaxation Methods for Non-Hermitian Linear Systems," SIAM J. Matrix Anal. Appl. 13, 979-991.
M. Eiermann and R.S. Varga (1993). "Is the Optimal ω Best for the SOR Iteration Method?," Lin. Alg. and Its Applic. 182, 257-277.
V. Eijkhout (1991). "Analysis of Parallel Incomplete Point Factorizations," Lin. Alg.
and Its Applic. 154-156, 723-740.
S.C. Eisenstat (1984). "Efficient Implementation of a Class of Preconditioned Conjugate
Gradient Methods," SIAM J. Sci. and Stat. Computing 2, 1-4.
S.C. Eisenstat, H. Elman, and M. Schultz (1983). "Variational Iterative Methods for
Nonsymmetric Systems of Equations," SIAM J. Num. Anal. 20, 345-357.
S.C. Eisenstat, M.T. Heath, C.S. Henkel, and C.H. Romine (1988). "Modified Cyclic
Algorithms for Solving Triangular Systems on Distributed Memory Multiprocessors,"
SIAM J. Sci. and Stat. Comp. 9, 589-600.
J.S. Frame (1964). "Matrix Functions and Applications, Part IV," IEEE Spectrum 1 (June), 123-31.
J.G.F. Francis (1961). "The QR Transformation: A Unitary Analogue to the LR Transformation, Parts I and II," Comp. J. 4, 265-72, 332-45.
J.N. Franklin (1968). Matrix Theory, Prentice Hall, Englewood Cliffs, NJ.
T.L. Freeman and C. Phillips (1992). Parallel Numerical Algorithms, Prentice Hall,
New York.
R.W. Freund (1990). "On Conjugate Gradient Type Methods and Polynomial Pre-
conditioners for a Class of Complex Non-Hermitian Matrices," Numer. Math. 57,
285-312.
R.W. Freund (1992). "Conjugate Gradient-Type Methods for Linear Systems with Complex Symmetric Coefficient Matrices," SIAM J. Sci. Statist. Comput. 13, 425-448.
R.W. Freund (1993). "A Transpose-Free Quasi-Minimal Residual Algorithm for Non-Hermitian Linear Systems," SIAM J. Sci. Comput. 14, 470-482.
R.W. Freund and N. Nachtigal (1991). "QMR: A Quasi-Minimal Residual Method for Non-Hermitian Linear Systems," Numer. Math. 60, 315-339.
R.W. Freund and N.M. Nachtigal (1994). "An Implementation of the QMR Method
Based on Coupled Two-term Recurrences," SIAM J. Sci. Comp. 15, 313-337.
R.W. Freund, G.H. Golub, and N. Nachtigal (1992). "Iterative Solution of Linear Systems," Acta Numerica 1, 57-100.
R.W. Freund, M. Gutknecht, and N. Nachtigal (1993). "An Implementation of the Look-Ahead Lanczos Algorithm for Non-Hermitian Matrices," SIAM J. Sci. and Stat. Comp. 14, 137-158.
R.W. Freund and H. Zha (1993). "A Look-Ahead Algorithm for the Solution of General
Hankel Systems," Numer. Math. 64, 295-322.
S. Friedland (1975). "On Inverse Multiplicative Eigenvalue Problems for Matrices," Lin. Alg. and Its Applic. 12, 127-38.
S. Friedland (1991). "Revisiting Matrix Squaring," Lin. Alg. and Its Applic. 154-156, 59-63.
S. Friedland, J. Nocedal, and M.L. Overton (1987). "The Formulation and Analysis of Numerical Methods for Inverse Eigenvalue Problems," SIAM J. Numer. Anal. 24, 634-667.
C.E. Froberg (1965). "On Triangularization of Complex Matrices by Two Dimensional Unitary Transformations," BIT 5, 230-34.
R.E. Funderlic and A. Geist (1986). "Torus Data Flow for Parallel Computation of Missized Matrix Problems," Lin. Alg. and Its Applic. 77, 149-164.
G. Galimberti and V. Pereyra (1970). "Numerical Differentiation and the Solution of Multidimensional Vandermonde Systems," Math. Comp. 24, 357-64.
G. Galimberti and V. Pereyra (1971). "Solving Confluent Vandermonde Systems of Hermitian Type," Numer. Math. 18, 44-60.
K.A. Gallivan, M. Heath, E. Ng, J. Ortega, B. Peyton, R. Plemmons, C. Romine, A. Sameh, and R. Voigt (1990). Parallel Algorithms for Matrix Computations, SIAM Publications, Philadelphia, PA.
K.A. Gallivan, W. Jalby, and U. Meier (1987). "The Use of BLAS3 in Linear Algebra on a Parallel Processor with a Hierarchical Memory," SIAM J. Sci. and Stat. Comp. 8, 1079-1084.
K.A. Gallivan, W. Jalby, U. Meier, and A.H. Sameh (1988). "Impact of Hierarchical Memory Systems on Linear Algebra Algorithm Design," Int'l J. Supercomputer Applic. 2, 12-48.
K.A. Gallivan, R.J. Plemmons, and A.H. Sameh (1990). "Parallel Algorithms for Dense Linear Algebra Computations," SIAM Review 32, 54-135.
E. Gallopoulos and Y. Saad (1989). "A Parallel Block Cyclic Reduction Algorithm for
the Fast Solution of Elliptic Equations," Parallel Computing 10, 143-160.
W. Gander (1981). "Least Squares with a Quadratic Constraint," Numer. Math. 36,
291-307.
W. Gander, G.H. Golub, and U. von Matt (1991). "A Constrained Eigenvalue Problem," in Numerical Linear Algebra, Digital Signal Processing, and Parallel Algorithms, G.H. Golub and P. Van Dooren (eds), Springer-Verlag, Berlin.
D. Gannon and J. Van Rosendale (1984). "On the Impact of Communication Complexity on the Design of Parallel Numerical Algorithms," IEEE Trans. Comp. C-33, 1180-1194.
F.R. Gantmacher (1959). The Theory of Matrices, Vols. 1 and 2, Chelsea, New York.
B.S. Garbow, J.M. Boyle, J.J. Dongarra, and C.B. Moler (1972). Matrix Eigensystem Routines: EISPACK Guide Extension, Lecture Notes in Computer Science, Volume 51, Springer-Verlag, New York.
J. Gardiner, M.R. Wette, A.J. Laub, J.J. Amato, and C.B. Moler (1992). "Algorithm 705: A FORTRAN-77 Software Package for Solving the Sylvester Matrix Equation AXB^T + CXD^T = E," ACM Trans. Math. Soft. 18, 232-238.
W. Gautschi (1975). "Norm Estimates for Inverses of Vandermonde Matrices," Numer. Math. 23, 337-47.
W. Gautschi (1975). "Optimally Conditioned Vandermonde Matrices," Numer. Math. 24, 1-12.
G.A. Geist (1991). "Reduction of a General Matrix to Tridiagonal Form," SIAM J. Matrix Anal. Appl. 12, 362-373.
G.A. Geist and M.T. Heath (1986). "Matrix Factorization on a Hypercube," in M.T. Heath (ed) (1986). Proceedings of First SIAM Conference on Hypercube Multiprocessors, SIAM Publications, Philadelphia, PA.
G.A. Geist and C.H. Romine (1988). "LU Factorization Algorithms on Distributed Memory Multiprocessor Architectures," SIAM J. Sci. and Stat. Comp. 9, 639-649.
W.M. Gentleman (1973). "Least Squares Computations by Givens Transformations without Square Roots," J. Inst. Math. Appl. 12, 329-36.
W.M. Gentleman (1973). "Error Analysis of QR Decompositions by Givens Transfor-
mations," Lin. Alg. and Its Applic. 10, 189-97.
W.M. Gentleman and H.T. Kung (1981). "Matrix Triangularization by Systolic Arrays,"
SPIE Proceedings, Vol. 298, 19-26.
J.A. George (1973). "Nested Dissection of a Regular Finite Element Mesh," SIAM J.
Num. Anal. 10, 345-63.
J.A. George (1974). "On Block Elimination for Sparse Linear Systems," SIAM J. Num.
Anal. 11, 585--603.
J.A. George and M.T. Heath (1980). "Solution of Sparse Linear Least Squares Problems
Using Givens Rotations," Lin. Alg. and Its Applic. 34, 69-83.
A. George, M.T. Heath, and J. Liu (1986). "Parallel Cholesky Factorization on a Shared Memory Multiprocessor," Lin. Alg. and Its Applic. 77, 165-187.
A. George and J. W-H. Liu (1981). Computer Solution of Large Sparse Positive Definite
Systems. Prentice-Hall Inc., Englewood Cliffs, New Jersey.
A.R. Ghavimi and A.J. Laub (1995). "Residual Bounds for Discrete-Time Lyapunov Equations," IEEE Trans. Auto. Cont. 40, 1244-1249.
N.E. Gibbs and W.G. Poole, Jr. (1974). "Tridiagonalization by Permutations," Comm. ACM 17, 20-24.
N.E. Gibbs, W.G. Poole, Jr., and P.K. Stockmeyer (1976). "An Algorithm for Reducing the Bandwidth and Profile of a Sparse Matrix," SIAM J. Num. Anal. 13, 236-50.
N.E. Gibbs, W.G. Poole, Jr., and P.K. Stockmeyer (1976). "A Comparison of Several Bandwidth and Profile Reduction Algorithms," ACM Trans. Math. Soft. 2, 322-30.
P.E. Gill, G.H. Golub, W. Murray, and M.A. Saunders (1974). "Methods for Modifying Matrix Factorizations," Math. Comp. 28, 505-35.
P.E. Gill and W. Murray (1976). "The Orthogonal Factorization of a Large Sparse Matrix," in Sparse Matrix Computations, ed. J.R. Bunch and D.J. Rose, Academic Press, New York, pp. 177-200.
P.E. Gill, W. Murray, D.B. Ponceleón, and M.A. Saunders (1992). "Preconditioners
for Indefinite Systems Arising in Optimization," SIAM J. Matrix Anal. Appl. 13,
292-311.
P.E. Gill, W. Murray, and M.A. Saunders (1975). "Methods for Computing and Modifying the LDV Factors of a Matrix," Math. Comp. 29, 1051-77.
P.E. Gill, W. Murray, and M.H. Wright (1991). Numerical Linear Algebra and Optimization, Vol. 1, Addison-Wesley, Reading, MA.
W. Givens (1958). "Computation of Plane Unitary Rotations Transforming a General Matrix to Triangular Form," SIAM J. Appl. Math. 6, 26-50.
J. Gluchowska and A. Smoktunowicz (1990). "Solving the Linear Least Squares Problem
with Very High Relative Accuracy," Computing 45, 345-354.
I.C. Gohberg and M.G. Krein (1969). Introduction to the Theory of Linear Non-Self-Adjoint Operators, Amer. Math. Soc., Providence, R.I.
I.C. Gohberg, P. Lancaster, and L. Rodman (1986). Invariant Subspaces of Matrices With Applications, John Wiley and Sons, New York.
D. Goldberg (1991). "What Every Computer Scientist Should Know About Floating Point Arithmetic," ACM Surveys 23, 5-48.
D. Goldfarb (1976). "Factored Variable Metric Methods for Unconstrained Optimization," Math. Comp. 30, 796-811.
H.H. Goldstine and L.P. Horowitz (1959). "A Procedure for the Diagonalization of Normal Matrices," J. Assoc. Comp. Mach. 6, 176-95.
G.H. Golub (1965). "Numerical Methods for Solving Linear Least Squares Problems,"
Numer. Math. 7, 206--16.
G.H. Golub (1969). "Matrix Decompositions and Statistical Computation," in Statistical
Computation , ed. R.C. Milton and J.A. Nelder, Academic Press, New York, pp.
365--97.
G.H. Golub (1973). "Some Modified Matrix Eigenvalue Problems," SIAM Review 15,
318-334.
G.H. Golub (1974). "Some Uses of the Lanczos Algorithm in Numerical Linear Algebra,"
in Topics in NumericBI Analysis, ed., J.J.H. Miller, Academic Press, New York.
G.H. Golub, M. Heath, and G. Wahba (1979). "Generalized Cross-Validation as a Method for Choosing a Good Ridge Parameter," Technometrics 21, 215-23.
G.H. Golub, A. Hoffman, and G.W. Stewart (1988). "A Generalization of the Eckart-Young-Mirsky Approximation Theorem," Lin. Alg. and Its Applic. 88/89, 317-328.
G.H. Golub and W. Kahan (1965). "Calculating the Singular Values and Pseudo-Inverse of a Matrix," SIAM J. Num. Anal. 2, 205-24.
G.H. Golub, V. Klema and G.W. Stewart (1976). "Rank Degeneracy and Least Squares
Problems," Technical Report TR-456, Department of Computer Science, University
of Maryland, College Park, MD.
G.H. Golub, F.T. Luk, and M. Overton (1981). "A Block Lanczos Method for Computing the Singular Values and Corresponding Singular Vectors of a Matrix," ACM Trans. Math. Soft. 7, 149-69.
G.H. Golub and G. Meurant (1983). Résolution Numérique des Grands Systèmes Linéaires, Collection de la Direction des Études et Recherches de l'Électricité de France, vol. 49, Eyrolles, Paris.
G.H. Golub and C.D. Meyer (1986). "Using the QR Factorization and Group Inversion to Compute, Differentiate, and Estimate the Sensitivity of Stationary Probabilities for Markov Chains," SIAM J. Alg. and Disc. Methods 7, 273-281.
G.H. Golub, S. Nash, and C. Van Loan (1979). "A Hessenberg-Schur Method for the Matrix Problem AX + XB = C," IEEE Trans. Auto. Cont. AC-24, 909-13.
G.H. Golub and D. O'Leary (1989). "Some History of the Conjugate Gradient and Lanczos Methods," SIAM Review 31, 50-102.
G.H. Golub and J.M. Ortega (1993). Scientific Computing: An Introduction with Par-
allel Computing, Academic Press, Boston.
G.H. Golub and M. Overton (1988). "The Convergence of Inexact Chebychev and
Richardson Iterative Methods for Solving Linear Systems," Numer. Math. 53, 571-
594.
G.H. Golub and V. Pereyra (1973). "The Differentiation of Pseudo-Inverses and Nonlinear Least Squares Problems Whose Variables Separate," SIAM J. Num. Anal. 10,
413-32.
G.H. Golub and V. Pereyra (1976). "Differentiation of Pseudo-Inverses, Separable Nonlinear Least Squares Problems and Other Tales," in Generalized Inverses and Applications, ed. M.Z. Nashed, Academic Press, New York, pp. 303-24.
G.H. Golub and C. Reinsch (1970). "Singular Value Decomposition and Least Squares Solutions," Numer. Math. 14, 403-20. See also Wilkinson and Reinsch (1971, pp. 134-51).
G.H. Golub and W.P. Tang (1981). "The Block Decomposition of a Vandermonde Matrix
a.nd Its Applications," BIT 21, 505-17.
G.H. Golub and R. Underwood (1977). "The Block Lanczos Method for Computing Eigenvalues," in Mathematical Software III, ed. J. Rice, Academic Press, New York, pp. 364-77.
G.H. Golub, R. Underwood, and J.H. Wilkinson (1972). "The Lanczos Algorithm for the Symmetric Ax = λBx Problem," Report STAN-CS-72-270, Department of Computer Science, Stanford University, Stanford, California.
G.H. Golub and P. Van Dooren, eds. (1991). Numerical Linear Algebra, Digital Signal Processing, and Parallel Algorithms, Springer-Verlag, Berlin.
G.H. Golub and C.F. Van Loan (1979). "Unsymmetric Positive Definite Linear Systems," Lin. Alg. and Its Applic. 28, 85-98.
G.H. Golub and C.F. Van Loan (1980). "An Analysis of the Total Least Squares Problem," SIAM J. Num. Anal. 17, 883-93.
G.H. Golub and J.M. Varah (1974). "On a Characterization of the Best l2-Scaling of a Matrix," SIAM J. Num. Anal. 11, 472-79.
G.H. Golub and R.S. Varga (1961). "Chebychev Semi-Iterative Methods, Successive
Over-Relaxation Iterative Methods, and Second-Order Richardson Iterative Methods,
Parts I and II," Numer. Math. 3, 147-56, 157-68.
G.H. Golub and J.H. Welsch (1969). "Calculation of Gauss Quadrature Rules," Math. Comp. 23, 221-30.
G.H. Golub and J.H. Wilkinson (1966). "Note on Iterative Refinement of Least Squares Solutions," Numer. Math. 9, 139-48.
G.H. Golub and J.H. Wilkinson (1976). "Ill-Conditioned Eigensystems and the Computation of the Jordan Canonical Form," SIAM Review 18, 578-619.
G.H. Golub and H. Zha (1994). "Perturbation Analysis of the Canonical Correlations of Matrix Pairs," Lin. Alg. and Its Applic. 210, 3-28.
N. Gould (1991). "On Growth in Gaussian Elimination with Complete Pivoting," SIAM
J. Matrix Anal. Appl. 12, 354-361.
R.J. Goult, R.F. Hoskins, J.A. Milner and M.J. Pratt (1974). Computational Methods in Linear Algebra, John Wiley and Sons, New York.
A.R. Gourlay (1970). "Generalization of Elementary Hermitian Matrices," Comp. J.
13, 411-12.
A.R. Gourlay and G.A. Watson (1973). Computational Methods for Matrix Eigenprob-
lems, John Wiley & Sons, New York.
W. Govaerts (1991). "Stable Solvers and Block Elimination for Bordered Systems," SIAM J. Matrix Anal. Appl. 12, 469-483.
W. Govaerts and J.D. Pryce (1990). "Block Elimination with One Iterative Refinement Solves Bordered Linear Systems Accurately," BIT 30, 490-507.
W. Govaerts and J.D. Pryce (1993). "Mixed Block Elimination for Linear Systems with Wider Borders," IMA J. Num. Anal. 13, 161-180.
W.B. Gragg (1986). "The QR Algorithm for Unitary Hessenberg Matrices," J. Comp.
Appl. Math. 16, 1-8.
W.B. Gragg and W.J. Harrod (1984). "The Numerically Stable Reconstruction of Jacobi Matrices from Spectral Data," Numer. Math. 44, 317-336.
W.B. Gragg and L. Reichel (1990). "A Divide and Conquer Method for Unitary and Orthogonal Eigenproblems," Numer. Math. 57, 695-718.
A. Graham (1981). Kronecker Products and Matrix Calculus with Applications, Ellis Horwood Ltd., Chichester, England.
B. Green (1952). "The Orthogonal Approximation of an Oblique Structure in Factor
Analysis," Psychometrika 17, 429-40.
A. Greenbaum (1992). "Diagonal Scalings of the Laplacian as Preconditioners for Other Elliptic Differential Operators," SIAM J. Matrix Anal. Appl. 13, 826-846.
A. Greenbaum and G. Rodrigue (1989). "Optimal Preconditioners of a Given Sparsity Pattern," BIT 29, 610-634.
A. Greenbaum and Z. Strakos (1992). "Predicting the Behavior of Finite Precision Lanczos and Conjugate Gradient Computations," SIAM J. Matrix Anal. Appl. 13,
121-137.
A. Greenbaum and L.N. Trefethen (1994). "GMRES/CR and Arnoldi/Lanczos as Matrix Approximation Problems," SIAM J. Sci. Comp. 15, 359-368.
J. Greenstadt (1955). "A Method for Finding Roots of Arbitrary Matrices," Math.
Tables and Other Aids to Comp. 9, 47-52.
R.G. Grimes and J.G. Lewis (1981). "Condition Number Estimation for Sparse Matri-
ces," SIAM J. Sci. and Stat. Comp. 2, 384-88.
R.G. Grimes, J.G. Lewis, and H.D. Simon (1994). "A Shifted Block Lanczos Algorithm for Solving Sparse Symmetric Generalized Eigenproblems," SIAM J. Matrix Anal. Appl. 15, 228-272.
W.D. Gropp and D.E. Keyes (1988). "Complexity of Parallel Implementation of Domain Decomposition Techniques for Elliptic Partial Differential Equations," SIAM J. Sci. and Stat. Comp. 9, 312-326.
W.D. Gropp and D.E. Keyes (1992). "Domain Decomposition with Local Mesh Refinement," SIAM J. Sci. Statist. Comput. 13, 967-993.
M. Gu and S.C. Eisenstat (1995). "A Divide-and-Conquer Algorithm for the Bidiagonal
SVD," SIAM J. Matru Anal. Appl. 16, 79-92.
M. Gu and S.C. Eisenstat (1995). "A Divide-and-Conquer Algorithm for the Symmetric
'fridiagona.l Eigenprohlem," SIAM J. Matri:J; Anal. Appl. 16, 172-191.
M. Gulliksson (1994). "Iterative Refinement for Constrained and Weighted Linear Least
Squares," BIT 34, 239-253.
M. Gulliksson (1995). "Backward Error Analysis for the Constrained and Weighted Linear Least Squares Problem When Using the Weighted QR Factorization," SIAM J. Matrix Anal. Appl. 16, 675-687.
M. Gulliksson and P-A. Wedin (1992). "Modifying the QR-Decomposition to Constrained and Weighted Linear Least Squares," SIAM J. Matrix Anal. Appl. 13, 1298-1313.
R.F. Gunst, J.T. Webster, and R.L. Mason (1976). "A Comparison of Least Squares
and Latent Root Regression Estimators," Technometrics 18, 75-83.
K.K. Gupta (1972). "Solution of Eigenvalue Problems by Sturm Sequence Method," Int. J. Numer. Meth. Eng. 4, 379-404.
M. Gutknecht (1992). "A Completed Theory of the Unsymmetric Lanczos Process and Related Algorithms, Part I," SIAM J. Matrix Anal. Appl. 13, 594-639.
M. Gutknecht (1993). "Variants of BiCGSTAB for Matrices with Complex Spectrum," SIAM J. Sci. and Stat. Comp. 14, 1020-1033.
M. Gutknecht (1994). "A Completed Theory of the Unsymmetric Lanczos Process and Related Algorithms, Part II," SIAM J. Matrix Anal. Appl. 15, 15-58.
W. Hackbusch (1994). Iterative Solution of Large Sparse Systems of Equations, Springer-Verlag, New York.
D. Hacon (1993). "Jacobi's Method for Skew-Symmetric Matrices," SIAM J. Matrix Anal. Appl. 14, 619-628.
L.A. Hageman and D.M. Young (1981). Applied Iterative Methods, Academic Press, New York.
W.W. Hager (1984). "Condition Estimates," SIAM J. Sci. and Stat. Comp. 5, 311-316.
W.W. Hager (1988). Applied Numerical Linear Algebra, Prentice-Hall, Englewood Cliffs, NJ.
S.J. Hammarling (1974). "A Note on Modifications to the Givens Plane Rotation," J. Inst. Math. Appl. 13, 215-18.
S.J. Hammarling (1985). "The Singular Value Decomposition in Multivariate Statistics," ACM SIGNUM Newsletter 20, 2-25.
S.L. Handy and J.L. Barlow (1994). "Numerical Solution of the Eigenproblem for
Banded, Symmetric Toeplitz Matrices," SIAM J. Matrix Anal. Appl. 15, 205-214.
M. Hanke and J.G. Nagy (1994). "Toeplitz Approximate Inverse Preconditioner for
Banded Toeplitz Matrices," Numerical Algorithms 7, 183-199.
M. Hanke and M. Neumann (1990). "Preconditionings and Splittings for Rectangular
Systems," Numer. Math. 57, 85-96.
E.R. Hansen (1962). "On Quasicyclic Jacobi Methods," J. ACM 9, 118-35.
E.R. Hansen (1963). "On Cyclic Jacobi Methods," SIAM J. Appl. Math. 11, 448-59.
P.C. Hansen (1987). "The Truncated SVD as a Method for Regularization," BIT 27,
534-553.
P.C. Hansen (1988). "Reducing the Number of Sweeps in Hestenes Method," in Singular
Value Decomposition and Signal Processing, ed. E.F. Deprettere, North Holland.
P.C. Hansen (1990). "Relations Between SVD and GSVD of Discrete Regularization
Problems in Standard and General Form," Lin.Alg. and Its Applic. 141, 165-176.
P.C. Hansen and H. Gesmar (1993). "Fast Orthogonal Decomposition of Rank-Deficient
Toeplitz Matrices," Numerical Algorithms 4, 151-166.
R.J. Hanson and C.L. Lawson (1969). "Extensions and Applications of the Householder Algorithm for Solving Linear Least Squares Problems," Math. Comp. 23, 787-812.
V. Hari (1982). "On the Global Convergence of the Eberlein Method for Real Matrices,"
Numer. Math. 39, 361-370.
V. Hari (1991). "On Pairs of Almost Diagonal Matrices," Lin. Alg. and Its Applic.
148, 193-223.
M.T. Heath, ed. (1986). Proceedings of First SIAM Conference on Hypercube Multipro-
cessors, SIAM Publications, Philadelphia, PA.
M.T. Heath, ed. (1987). Hypercube Multiprocessors, SIAM Publications, Philadelphia,
PA.
M.T. Heath (1997). Scientific Computing: An Introductory Survey, McGraw-Hill, New
York.
M.T. Heath, A.J. Laub, C.C. Paige, and R.C. Ward (1986). "Computing the SVD of a
Product of Two Matrices," SIAM J. Sci. and Stat. Comp. 7, 1147-1159.
M.T. Heath, E. Ng, and B.W. Peyton (1991). "Parallel Algorithms for Sparse Linear
Systems," SIAM Review 33, 420-460.
M.T. Heath and C.H. Romine (1988). "Parallel Solution of Triangular Systems on Dis-
tributed Memory Multiprocessors," SIAM J. Sci. and Stat. Comp. 9, 558-588.
M. Hegland (1991). "On the Parallel Solution of Tridiagonal Systems by Wrap-Around
Partitioning and Incomplete LU Factorization," Numer. Math. 59, 453-472.
G. Heinig and P. Jankowski (1990). "Parallel and Superfast Algorithms for Hankel
Systems of Equations," Numer. Math. 58, 109-127.
D.E. Heller (1976). "Some Aspects of the Cyclic Reduction Algorithm for Block Tridi-
agonal Linear Systems," SIAM J. Num. Anal. 13, 484-96.
D.E. Heller (1978). "A Survey of Parallel Algorithms in Numerical Linear Algebra,"
SIAM Review 20, 740-777.
D.E. Heller and I.C.F. Ipsen (1983). "Systolic Networks for Orthogonal Decompositions,"
SIAM J. Sci. and Stat. Comp. 4, 261-269.
B.W. Helton (1968). "Logarithms of Matrices," Proc. Amer. Math. Soc. 19, 733-36.
H.V. Henderson and S.R. Searle (1981). "The Vec-Permutation Matrix, The Vec Operator, and Kronecker Products: A Review," Linear and Multilinear Algebra 9, 271-288.
B. Hendrickson and D. Womble (1994). "The Torus-Wrap Mapping for Dense Matrix
Calculations on Massively Parallel Computers," SIAM J. Sci. Comput. 15, 1201-
1226.
C.S. Henkel, M.T. Heath, and R.J. Plemmons (1988). "Cholesky Downdating on a
Hypercube," in G. Fox (1988), 1592-1598.
P. Henrici (1958). "On the Speed of Convergence of Cyclic and Quasicyclic Jacobi
Methods for Computing the Eigenvalues of Hermitian Matrices," SIAM J. Appl.
Math. 6, 144-62.
P. Henrici (1962). "Bounds for Iterates, Inverses, Spectral Variation and Fields of Values of Non-normal Matrices," Numer. Math. 4, 24-40.
P. Henrici and K. Zimmermann (1968). "An Estimate for the Norms of Certain Cyclic Jacobi Operators," Lin. Alg. and Its Applic. 1, 489-501.
M.R. Hestenes (1980). Conjugate Direction Methods in Optimization, Springer-Verlag,
Berlin.
M.R. Hestenes (1990). "Conjugacy and Gradients," in A History of Scientific Computing, Addison-Wesley, Reading, MA.
M.R. Hestenes and E. Stiefel (1952). "Methods of Conjugate Gradients for Solving Linear Systems," J. Res. Nat. Bur. Stand. 49, 409-36.
G. Hewer and C. Kenney (1988). "The Sensitivity of the Stable Lyapunov Equation," SIAM J. Control Optim. 26, 321-344.
D.J. Higham (1995). "Condition Numbers and Their Condition Numbers," Lin. Alg. and Its Applic. 214, 193-213.
D.J. Higham and N.J. Higham (1992). "Componentwise Perturbation Theory for Linear Systems with Multiple Right-Hand Sides," Lin. Alg. and Its Applic. 174, 111-129.
D.J. Higham and N.J. Higham (1992). "Backward Error and Condition of Structured
Linear Systems," SIAM J. Matrix Anal. Appl. 13, 162-175.
D.J. Higham and L.N. Trefethen (1993). "Stiffness of ODEs," BIT 33, 285-303.
N.J. Higham (1985). "Nearness Problems in Numerical Linear Algebra," PhD Thesis,
University of Manchester, England.
N.J. Higham (1986). "Newton's Method for the Matrix Square Root," Math. Comp.
46, 537--550.
N.J. Higham (1986). "Computing the Polar Decomposition-ith Applications," SIAM
J. Sci. and Stat. Comp. 7, 116(}-1174.
N.J. Higham (1986). "Efficient Algorithms for computing the condition number of a
tridiagonal matrix," SIAM J. Sci. and Stat. Comp. 7, 15(}-165.
N.J. Higham (1987). "A Survey of Condition Number Estimation for Triangular Matri-
ces," SIAM Review f9, 575--596.
N.J. Higham (1987). "Error Analysis of the Bjorck-Pereyra Algorithms for Solving Van-
dermonde Systems," Numer. Math. 50, 613-632.
N.J. Higham (1987). "Computing Real Square Roots of a Real Matrix," Lin. Alg. and
Its Applic. 88/89, 405-430.
N.J. Higham (1988). "Fast Solution of Vandermonde-like Systems Involving Orthogonal
Polynomials," IMA J. Num. Anal. 8, 473-486.
N.J. Higham (1988). "Computing a Nearest Symmetric Positive Semidefinite Matrix,"
Lin. Alg. and Its Applic. 103, 103-118.
N.J. Higham (1988). "The Symmetric Procrustes Problem," BIT fB, 133-43.
N.J. Higham (1988). "FORTRAN Codes for Estimating the One-Norm of a Real or
Complex Matrix with Applications to Condition Estimation (Algorithm 674)," ACM
Trans. Math. Soft. 14, 381-396.
N.J. Higham (1989). "Matrix Nearness Problems and Applications," in Applications of
Matrix Theory, M.J.C. Gover and S. Barnett (eds), Oxford University Press, Oxford
UK, 1-27.
N.J. Higham (1989). "The Accuracy of Solutions to Triangular Systems," SIAM J. Num.
Anal. f6, 1252-1265.
N.J. Higham (1990). "Bounding the Error in Gaussian Elimination for Tridiagonal
Systems," SIAM J. Matrix Anal. Appl. 11, 521-530.
N.J. Higham (1990). "Stability Analysis of Algorithms for Solving Confluent Vandermonde-
like Systems," SIAM J. Matrix Anal. Appl. 11, 23-41.
N.J. Higham (1990). "Analysis of the Cholesky Decomposition of a Semidefinite Matrix,"
in Reliable Numerical Computation, M.G. Cox and S.J. Hammarling (eds), Oxford
University Press, Oxford, UK, 161-185.
662 BIBLIOGRAPHY
N.J. Higham (1990). "Exploiting Fa.st Matrix Multiplication within the Level3 BLAS,"
ACM TI-ans. Math. Soft. 16, 352-368.
N.J. Higham (1991). "Iterative Refinement Enhances the Stability of QR Factorization
Methods for Solving Linear Equations," BIT 31, 447--468.
N.J. Higham (1992). "Stability of a Method for Multiplying Complex Matrices with
Three Real Matrix Multiplications," SIAM J. Matri:r: Anal. Appl. 13, 681-687.
N.J. Higham (1992). "Estimating the Matrix p-Norm," Numer. Math. 62, 539-556.
N.J. Higham (1993). "Optimization by Direct Search in Matrix Computations," SIAM
J. Matri:r: Anal. Appl. 14, 317-333.
N.J. Higham (1993). "Perturbation Theory and Backward Error for AX - XB = C,"
BIT 33, 124-136.
N.J. Higham (1994). "The Matrix Sign Decomposition and Its Relation to the Polar Decomposition," Lin. Alg. and Its Applic. 212/213, 3-20.
N.J. Higham (1994). "A Survey of Componentwise Perturbation Theory in Numerical
Linear Algebra," in Mathematics of Computation 1943-1993: A Half Century of
Computational Mathematics, W. Gautschi (ed.), Volume 48 of Proceedings of Symposia
posia in Applied Mathematics, American Mathematical Society, Providence, Rhode
Island.
N.J. Higham (1995). "Stability of Parallel Triangular System Solvers," SIAM J. Sci.
Comp. 16, 40Q-413.
N.J. Higham (1996). Accuracy and Stability of Numerical Algorithms, SIAM Publica-
tions, Philadelphia, PA.
N.J. Higham and D.J. Higham (1989). "Large Growth Factors in Gaussian Elimination with Pivoting," SIAM J. Matrix Anal. Appl. 10, 155-164.
N.J. Higham and P.A. Knight (1995). "Matrix Powers in Finite Precision Arithmetic," SIAM J. Matrix Anal. Appl. 16, 343-358.
N.J. Higham and P. Papadimitriou (1994). "A Parallel Algorithm for Computing the
Polar Decomposition," Parallel Comp. 20, 1161-1173.
R.W. Hockney (1965). "A Fast Direct Solution of Poisson's Equation Using Fourier
Analysis," J. ACM 12, 95-113.
R.W. Hockney and C.R. Jesshope (1988). Parallel Computers 2, Adam Hilger, Bristol
and Philadelphia.
W. Hoffman and B.N. Parlett (1978). "A New Proof of Global Convergence for the
Tridiagonal QL Algorithm," SIAM J. Num. Anal. 15, 929-37.
S. Holmgren and K. Otto (1992). "Iterative Solution Methods and Preconditioners for
Block-Tridiagonal Systems of Equations," SIAM J. Matri:r: Anal. Appl. 13, 863-886.
H. Hotelling (1957). "The Relations of the Newer Multivariate Statistical Methods to Factor Analysis," Brit. J. Stat. Psych. 10, 69-79.
P.D. Hough and S.A. Vavasis (1996). "Complete Orthogonal Decomposition for Weighted Least Squares," SIAM J. Matrix Anal. Appl., to appear.
A.S. Householder (1958). "Unitary Triangularization of a Nonsymmetric Matrix," J. ACM 5, 339-42.
A.S. Householder (1964). The Theory of Matrices in Numerical Analysis, Dover Publications, New York.
A.S. Householder (1968). "Moments and Characteristic Roots II," Numer. Math. 11, 126-28.
R. Horn and C. Johnson (1985). Matrix Analysis, Cambridge University Press, New York.
R. Horn and C. Johnson (1991). Topics in Matrix Analysis, Cambridge University Press, New York.
C.P. Huang (1975). "A Jacobi-Type Method for Triangularizing an Arbitrary Matrix,"
SIAM J. Num. Anal. 12, 566-70.
C.P. Huang (1981). "On the Convergence of the QR Algorithm with Origin Shifts for
Normal Matrices," IMA J. Num. Anal. 1, 127-33.
C.-M. Huang and D.P. O'Leary (1993). "A Krylov Multisplitting Algorithm for Solving Linear Systems of Equations," Lin. Alg. and Its Applic. 194, 9-29.
T. Huckle (1992). "Circulant and Skewcirculant Matrices for Solving Toeplitz Matrix Problems," SIAM J. Matrix Anal. Appl. 13, 767-777.
T. Huckle (1992). "A Note on Skew-Circulant Preconditioners for Elliptic Problems," Numerical Algorithms 2, 279-286.
T. Huckle (1994). "The Arnoldi Method for Normal Matrices," SIAM J. Matrix Anal. Appl. 15, 479-489.
T. Huckle (1995). "Low-Rank Modification of the Unsymmetric Lanczos Algorithm," Math. Comp. 64, 1577-1588.
T.E. Hull and J.R. Swensen (1966). "Tests of Probabilistic Models for Propagation of Roundoff Errors," Comm. ACM 9, 108-13.
T-M. Hwang, W-W. Lin, and E.K. Yang (1992). "Rank-Revealing LU Factorizations," Lin. Alg. and Its Applic. 175, 115-141.
Y. Ikebe (1979). "On Inverses of Hessenberg Matrices," Lin. Alg. and Its Applic. 24, 93-97.
I.C.F. Ipsen, Y. Saad, and M. Schultz (1986). "Dense Linear Systems on a Ring of Processors," Lin. Alg. and Its Applic. 77, 205-239.
C.G.J. Jacobi (1846). "Über ein leichtes Verfahren, die in der Theorie der Säcularstörungen vorkommenden Gleichungen numerisch aufzulösen," Crelle's J. 30, 51-94.
P. Jacobson, B. Kågström, and M. Rannar (1992). "Algorithm Development for Distributed Memory Multicomputers Using Conlab," Scientific Programming 1, 185-203.
H.V. Jagadish and T. Kailath (1989). "A Family of New Efficient Arrays for Matrix Multiplication," IEEE Trans. Comput. 38, 149-155.
W. Jalby and B. Philippe (1991). "Stability Analysis and Improvement of the Block Gram-Schmidt Algorithm," SIAM J. Sci. Stat. Comp. 12, 1058-1073.
M. Jankowski and H. Wozniakowski (1977). "Iterative Refinement Implies Numerical Stability," BIT 17, 303-311.
K.C. Jea and D.M. Young (1983). "On the Simplification of Generalized Conjugate Gradient Methods for Nonsymmetrizable Linear Systems," Lin. Alg. and Its Applic. 52/53, 399-417.
A. Jennings (1977). "Influence of the Eigenvalue Spectrum on the Convergence Rate of the Conjugate Gradient Method," J. Inst. Math. Applic. 20, 61-72.
A. Jennings (1977). Matrix Computation for Engineers and Scientists, John Wiley and Sons, New York.
A. Jennings and J.J. McKeown (1992). Matrix Computation (2nd ed), John Wiley and Sons, New York.
A. Jennings and D.R.L. Orr (1971). "Application of the Simultaneous Iteration Method to Undamped Vibration Problems," Int. J. Numer. Meth. Eng. 3, 13-24.
A. Jennings and M.R. Osborne (1977). "Generalized Eigenvalue Problems for Certain Unsymmetric Band Matrices," Lin. Alg. and Its Applic. 29, 139-50.
A. Jennings and W.J. Stewart (1975). "Simultaneous Iteration for the Partial Eigensolution of Real Matrices," J. Inst. Math. Applic. 15, 351-62.
L.S. Jennings and M.R. Osborne (1974). "A Direct Error Analysis for Least Squares," Numer. Math. 22, 322-32.
P.S. Jenson (1972). "The Solution of Large Symmetric Eigenproblems by Sectioning," SIAM J. Num. Anal. 9, 534-45.
E.R. Jessup and D.C. Sorensen (1994). "A Parallel Algorithm for Computing the Singular Value Decomposition of a Matrix," SIAM J. Matrix Anal. Appl. 15, 530-548.
Z. Jia (1995). "The Convergence of Generalized Lanczos Methods for Large Unsymmetric Eigenproblems," SIAM J. Matrix Anal. Appl. 16, 543-562.
J. Johnson and C.L. Phillips (1971). "An Algorithm for the Computation of the Integral of the State Transition Matrix," IEEE Trans. Auto. Cont. AC-16, 204-5.
O.G. Johnson, C.A. Micchelli, and G. Paul (1983). "Polynomial Preconditioners for
Conjugate Gradient Calculations," SIAM J. Numer. Anal. 20, 362-376.
R.J. Johnston (1971). "Gershgorin Theorems for Partitioned Matrices," Lin. Alg. and Its Applic. 4, 205-20.
B. Kågström and L. Westin (1989). "Generalized Schur Methods with Condition Estimators for Solving the Generalized Sylvester Equation," IEEE Trans. Auto. Cont. AC-34, 745-751.
W. Kahan (1966). "Numerical Linear Algebra," Canadian Math. Bull. 9, 757-801.
W. Kahan (1975). "Spectra of Nearly Hermitian Matrices," Proc. Amer. Math. Soc.
48, 11-17.
W. Kahan and B.N. Parlett (1976). "How Far Should You Go with the Lanczos Process?"
in Sparse Matrix Computations, ed. J. Bunch and D. Rose, Academic Press, New
York, pp. 131-44.
W. Kahan, B.N. Parlett, and E. Jiang (1982). "Residual Bounds on Approximate Eigensystems of Nonnormal Matrices," SIAM J. Numer. Anal. 19, 470-484.
D. Kahaner, C.B. Moler, and S. Nash (1988). Numerical Methods and Software, Prentice-
Hall, Englewood Cliffs, NJ.
T. Kailath and J. Chun (1994). "Generalized Displacement Structure for Block-Toeplitz, Toeplitz-Block, and Toeplitz-Derived Matrices," SIAM J. Matrix Anal. Appl. 15, 114-128.
T. Kailath and A.H. Sayed (1995). "Displacement Structure: Theory and Applications," SIAM Review 37, 297-386.
C. Kamath and A. Sameh (1989). "A Projection Method for Solving Nonsymmetric Linear Systems on Multiprocessors," Parallel Computing 9, 291-312.
S. Kaniel (1966). "Estimates for Some Computational Techniques in Linear Algebra," Math. Comp. 20, 369-78.
I.E. Kaporin (1994). "New Convergence Results and Preconditioning Strategies for the Conjugate Gradient Method," Num. Lin. Alg. Applic. 1, 179-210.
R.N. Kapur and J.C. Browne (1984). "Techniques for Solving Block Tridiagonal Systems on Reconfigurable Array Computers," SIAM J. Sci. and Stat. Comp. 5, 701-719.
I. Karasalo (1974). "A Criterion for Truncation of the QR Decomposition Algorithm for the Singular Linear Least Squares Problem," BIT 14, 156-66.
E.M. Kasenally (1995). "GMBACK: A Generalized Minimum Backward Error Algorithm for Nonsymmetric Linear Systems," SIAM J. Sci. Comp. 16, 698-719.
T. Kato (1966). Perturbation Theory for Linear Operators, Springer-Verlag, New York.
L. Kaufman (1974). "The LZ Algorithm to Solve the Generalized Eigenvalue Problem," SIAM J. Num. Anal. 11, 997-1024.
L. Kaufman (1977). "Some Thoughts on the QZ Algorithm for Solving the Generalized Eigenvalue Problem," ACM Trans. Math. Soft. 3, 65-75.
L. Kaufman (1979). "Application of Dense Householder Transformations to a Sparse Matrix," ACM Trans. Math. Soft. 5, 442-51.
L. Kaufman (1987). "The Generalized Householder Transformation and Sparse Matri-
ces," Lin. Alg. and Its Applic. 90, 221-234.
L. Kaufman (1993). "An Algorithm for the Banded Symmetric Generalized Matrix Eigenvalue Problem," SIAM J. Matrix Anal. Appl. 14, 372-389.
J. Kautsky and G.H. Golub (1983). "On the Calculation of Jacobi Matrices," Lin. Alg.
and Its Applic. 52/53, 439-456.
C.S. Kenney and A.J. Laub (1989). "Condition Estimates for Matrix Functions," SIAM J. Matrix Anal. Appl. 10, 191-209.
C.S. Kenney and A.J. Laub (1991). "Rational Iterative Methods for the Matrix Sign
Function," SIAM J. Matriz Anal. Appl. 12, 273-291.
C.S. Kenney and A.J. Laub (1992). "On Scaling Newton's Method for Polar Decompo-
sition and the Matrix Sign Function," SIAM J. Matrix Anal. Appl. 13, 688-706.
C.S. Kenney and A.J. Laub (1994). "Small-Sample Statistical Condition Estimates for
General Matrix Functions," SIAM J. Sci. Comp. 15, 36-61.
D. Kershaw (1982). "Solution of Single Tridiagonal Linear Systems and Vectorization of the ICCG Algorithm on the Cray-1," in G. Rodrigue (ed), Parallel Computation, Academic Press, NY, 1982.
D.E. Keyes, T.F. Chan, G. Meurant, J.S. Scroggs, and R.G. Voigt (eds) (1992). Do-
main Decomposition Methods for Partial Differential Equations, SIAM Publications,
Philadelphia, PA.
A. Kielbasinski (1987). "A Note on Rounding Error Analysis of Cholesky Factorization," Lin. Alg. and Its Applic. 88/89, 487-494.
S.K. Kim and A.T. Chronopoulos (1991). "A Class of Lanczos-Like Algorithms Implemented on Parallel Computers," Parallel Comput. 17, 763-778.
F. Kittaneh (1995). "Singular Values of Companion Matrices and Bounds on Zeros of Polynomials," SIAM J. Matrix Anal. Appl. 16, 333-340.
P.A. Knight (1993). "Error Analysis of Stationary Iteration and Associated Problems,"
Ph.D. thesis, Department of Mathematics, University of Manchester, England.
P.A. Knight (1995). "Fast Rectangular Matrix Multiplication and the QR Decomposition," Lin. Alg. and Its Applic. 221, 69-81.
D. Knuth (1981). The Art of Computer Programming, Vol. 2: Seminumerical Algorithms, 2nd ed., Addison-Wesley, Reading, Massachusetts.
E.G. Kogbetliantz (1955). "Solution of Linear Equations by Diagonalization of Coefficient Matrix," Quart. Appl. Math. 13, 123-132.
S. Kourouklis and C.C. Paige (1981). "A Constrained Least Squares Approach to the General Gauss-Markov Linear Model," J. Amer. Stat. Assoc. 76, 620-25.
V.N. Kublanovskaya (1961). "On Some Algorithms for the Solution of the Complete Eigenvalue Problem," USSR Comp. Math. Phys. 3, 637-57.
V.N. Kublanovskaya (1984). "AB Algorithm and Its Modifications for the Spectral Problem of Linear Pencils of Matrices," Numer. Math. 43, 329-342.
V.N. Kublanovskaya and V.N. Faddeeva (1964). "Computational Methods for the Solution of a Generalized Eigenvalue Problem," Amer. Math. Soc. Transl. 2, 271-90.
J. Kuczynski and H. Wozniakowski (1992). "Estimating the Largest Eigenvalue by the Power and Lanczos Algorithms with a Random Start," SIAM J. Matrix Anal. Appl. 13, 1094-1122.
U.W. Kulisch and W.L. Miranker (1986). "The Arithmetic of the Digital Computer,"
SIAM Review 28, 1-40.
V. Kumar, A. Grama, A. Gupta and G. Karypis (1994). Introduction to Parallel Com-
puting: Design and Analysis of Algorithms, Benjamin/Cummings, Reading, MA.
H.T. Kung (1982). "Why Systolic Architectures?," Computer 15, 37-46.
C.D. La Budde (1964). "Two Classes of Algorithms for Finding the Eigenvalues and Eigenvectors of Real Symmetric Matrices," J. ACM 11, 53-58.
S. Lakshmivarahan and S.K. Dhall (1990). Analysis and Design of Parallel Algorithms: Arithmetic and Matrix Problems, McGraw-Hill, New York.
J. Lambiotte and R.G. Voigt (1975). "The Solution of Tridiagonal Linear Systems on the CDC STAR-100 Computer," ACM Trans. Math. Soft. 1, 308-29.
P. Lancaster (1970). "Explicit Solution of Linear Matrix Equations," SIAM Review 12, 544-66.
P. Lancaster and M. Tismenetsky (1985). The Theory of Matrices, Second Edition, Academic Press, New York.
C. Lanczos (1950). "An Iteration Method for the Solution of the Eigenvalue Problem of Linear Differential and Integral Operators," J. Res. Nat. Bur. Stand. 45, 255-82.
B. Lang (1996). "Parallel Reduction of Banded Matrices to Bidiagonal Form," Parallel Computing 22, 1-18.
J. Larson and A. Sameh (1978). "Efficient Calculation of the Effects of Roundoff Errors," ACM Trans. Math. Soft. 4, 228-36.
A. Laub (1981). "Efficient Multivariable Frequency Response Computations," IEEE
Trans. Auto. Cont. AC-26, 407-8.
A. Laub (1985). "Numerical Linear Algebra Aspects of Control Design Computations," IEEE Trans. Auto. Cont. AC-30, 97-108.
C.L. Lawson and R.J. Hanson (1969). "Extensions and Applications of the Householder Algorithm for Solving Linear Least Squares Problems," Math. Comp. 23, 787-812.
C.L. Lawson and R.J. Hanson (1974). Solving Least Squares Problems, Prentice-Hall,
Englewood Cliffs, NJ. Reprinted with a detailed "new developments" appendix in
1996 by SIAM Publications, Philadelphia, PA.
C.L. Lawson, R.J. Hanson, D.R. Kincaid, and F.T. Krogh (1979). "Basic Linear Algebra Subprograms for FORTRAN Usage," ACM Trans. Math. Soft. 5, 308-323.
C.L. Lawson, R.J. Hanson, D.R. Kincaid, and F.T. Krogh (1979). "Algorithm 539, Basic Linear Algebra Subprograms for FORTRAN Usage," ACM Trans. Math. Soft. 5, 324-325.
D. Lay (1994). Linear Algebra and Its Applications, Addison-Wesley, Reading, MA.
N.J. Lehmann (1963). "Optimale Eigenwerteinschliessungen," Numer. Math. 5, 246-72.
R.B. Lehoucq (1995). "Analysis and Implementation of an Implicitly Restarted Arnoldi Iteration," Ph.D. thesis, Rice University, Houston, Texas.
R.B. Lehoucq (1996). "Restarting an Arnoldi Reduction," Report MCS-P591-0496, Argonne National Laboratory, Argonne, Illinois.
R.B. Lehoucq and D.C. Sorensen (1996). "Deflation Techniques for an Implicitly Restarted Arnoldi Iteration," SIAM J. Matrix Analysis and Applic., to appear.
F.T. Leighton (1992). Introduction to Parallel Algorithms and Architectures, Morgan
Kaufmann, San Mateo, CA.
F. Lemeire (1973). "Bounds for Condition Numbers of Triangular Value of a Matrix," Lin. Alg. and Its Applic. 11, 1-2.
S.J. Leon (1980). Linear Algebra with Applications, Macmillan, New York.
S.J. Leon (1994). "Maximizing Bilinear Forms Subject to Linear Constraints," Lin. Alg. and Its Applic. 210, 49-58.
N. Levinson (1947). "The Wiener RMS Error Criterion in Filter Design and Prediction," J. Math. Phys. 25, 261-78.
J. Lewis, ed. (1994). Proceedings of the Fifth SIAM Conference on Applied Linear Algebra, SIAM Publications, Philadelphia, PA.
G. Li and T. Coleman (1988). "A Parallel Triangular Solver for a Distributed-Memory
Multiprocessor," SIAM J. Sci. and Stat. Comp. 9, 485-502.
K. Li and T-Y. Li (1993). "A Homotopy Algorithm for a Symmetric Generalized Eigenproblem," Numerical Algorithms 4, 167-195.
K. Li, T-Y. Li, and Z. Zeng (1994). "An Algorithm for the Generalized Symmetric Tridiagonal Eigenvalue Problem," Numerical Algorithms 8, 269-291.
R-C. Li (1993). "Bounds on Perturbations of Generalized Singular Values and of Associated Subspaces," SIAM J. Matrix Anal. Appl. 14, 195-234.
R-C. Li (1994). "On Eigenvalue Variations of Rayleigh Quotient Matrix Pencils of a
Definite Pencil," Lin. Alg. and Its Applic. 208/209, 471-483.
R-C. Li (1995). "New Perturbation Bounds for the Unitary Polar Factor," SIAM J.
Matrix Anal. Appl. 16, 327-332.
R.-C. Li (1996). "Relative Perturbation Theory (I) Eigenvalue and Singular Value Variations," Technical Report UCB//CSD-94-855, Department of EECS, University of California at Berkeley.
R.-C. Li (1996). "Relative Perturbation Theory (II) Eigenspace and Singular Subspace Variations," Technical Report UCB//CSD-94-856, Department of EECS, University of California at Berkeley.
Y. Li (1993). "A Globally Convergent Method for Lp Problems," SIAM J. Optimization
3, 609-629.
W-W. Lin and C.W. Chen (1991). "An Acceleration Method for Computing the Generalized Eigenvalue Problem on a Parallel Computer," Lin. Alg. and Its Applic. 146, 49-65.
I. Linnik (1961). Method of Least Squares and Principles of the Theory of Observations, Pergamon Press, New York.
E. Linzer (1992). "On the Stability of Solution Methods for Band Toeplitz Systems," Lin. Alg. and Its Applic. 170, 1-32.
S. Lo, B. Philippe, and A. Sameh (1987). "A Multiprocessor Algorithm for the Symmetric Tridiagonal Eigenvalue Problem," SIAM J. Sci. and Stat. Comp. 8, s155-s165.
R.S. Martin and J.H. Wilkinson (1968). "Similarity Reduction of a General Matrix to Hessenberg Form," Numer. Math. 12, 349-68. See also Wilkinson and Reinsch (1971, pp. 339-58).
R.S. Martin and J.H. Wilkinson (1968). "The Modified LR Algorithm for Complex Hessenberg Matrices," Numer. Math. 12, 369-76. See also Wilkinson and Reinsch (1971, pp. 396-403).
R.S. Martin and J.H. Wilkinson (1968). "Householder's Tridiagonalization of a Symmetric Matrix," Numer. Math. 11, 181-95. See also Wilkinson and Reinsch (1971, pp. 212-26).
R.S. Martin and J.H. Wilkinson (1968). "Reduction of a Symmetric Eigenproblem Ax = λBx and Related Problems to Standard Form," Numer. Math. 11, 99-110.
R.S. Martin, G. Peters, and J.H. Wilkinson (1965). "Symmetric Decomposition of a Positive Definite Matrix," Numer. Math. 7, 362-83.
R.S. Martin, G. Peters, and J.H. Wilkinson (1966). "Iterative Refinement of the Solution of a Positive Definite System of Equations," Numer. Math. 8, 203-16.
R.S. Martin, C. Reinsch, and J.H. Wilkinson (1970). "The QR Algorithm for Band Symmetric Matrices," Numer. Math. 16, 85-92. See also Wilkinson and Reinsch (1971, pp. 266-72).
W.F. Mascarenhas (1994). "A Note on Jacobi Being More Accurate than QR," SIAM
J. Matrix AnaL Appl. 15, 215-218.
R. Mathias (1992). "Matrices with Positive Definite Hermitian Part: Inequalities and Linear Systems," SIAM J. Matrix Anal. Appl. 13, 640-654.
R. Mathias (1992). "Evaluating the Frechet Derivative of the Matrix Exponential,"
Numer. Math. 63, 213-226.
R. Mathias (1993). "Approximation of Matrix-Valued Functions," SIAM J. Matrix Anal.
Appl. 14, 1061-1063.
R. Mathias (1993). "Perturbation Bounds for the Polar DecomJXJSition," SIAM J. Matrix
Anal. Appl. 14, 588-597.
R. Mathias (1995). "Accurate Eigensystem Computations by Jacobi Methods," SIAM
J. Matm Anal. Appl. 16, 977-1003.
R. MathiM (1995). "The Instability of Parallel Prefix Matrix Multiplication," SIAM J.
Sci. Camp. 16, 95&-973.
R. MathiM and G.W. Stewart (1993). "A Block QR Algorithm and the Singular Value
Decomposition," Lin. Alg. and Its Applic. 181l, 91-100.
K. Mathur and S.L. Johnsson (1994). "Multiplication of Matrices of Arbitrary Shape on
a Data Parallel Computer," Parallel Computing eo, 919-952.
B. Mattingly, C. Meyer, and J. Ortega (1989). "Orthogonal Reduction on Vector Com-
puters," SIAM J. Sci. and Stat. Camp. 10, 372-381.
O. McBryan and E.F. van de Velde (1987). "Hypercube Algorithms and Implementations," SIAM J. Sci. and Stat. Comp. 8, s227-s287.
C. McCarthy and G. Strang (1973). "Optimal Conditioning of Matrices," SIAM J. Num. Anal. 10, 370-88.
S.F. McCormick (1972). "A General Approach to One-Step Iterative Methods with
Application to Eigenvalue Problems," J. Comput. Sys. Sci. 6, 354-72.
W.M. McKeeman (1962). "Crout with Equilibration and Iteration," Comm. ACM. 5,
553-55.
K. Meerbergen, A. Spence, and D. Roose (1994). "Shift-Invert and Cayley Transforms for the Detection of Rightmost Eigenvalues of Nonsymmetric Matrices," BIT 34, 409-423.
V. Mehrmann (1988). "A Symplectic Orthogonal Method for Single Input or Single Output Discrete Time Optimal Quadratic Control Problems," SIAM J. Matrix Anal. Appl. 9, 221-247.
V. Mehrmann (1993). "Divide and Conquer Methods for Block Tridiagonal Systems,"
Parallel Computing 19, 257-280.
U. Meier (1985). "A Parallel Partition Method for Solving Banded Systems of Linear Equations," Parallel Computing 2, 33-43.
M. Mu (1995). "A New Family of Preconditioners for Domain Decomposition," SIAM J. Sci. Comp. 16, 289-306.
D. Mueller (1966). "Householder's Method for Complex Matrices and Hermitian Matri-
ces," Numer. Math. 8, 72-92.
F.D. Murnaghan and A. Wintner (1931). "A Canonical Form for Real Matrices Under Orthogonal Transformations," Proc. Nat. Acad. Sci. 17, 417-20.
N. Nachtigal, S. Reddy, and L. Trefethen (1992). "How Fast Are Nonsymmetric Matrix Iterations?," SIAM J. Matrix Anal. Appl. 13, 778-795.
N. Nachtigal, L. Reichel, and L. Trefethen (1992). "A Hybrid GMRES Algorithm for
Nonsymmetric Linear Systems," SIAM J. Matrix Anal. Appl. 13, 796-825.
T. Nanda (1985). "Differential Equations and the QR Algorithm," SIAM J. Numer. Anal. 22, 310-321.
J.C. Nash (1975). "A One-Sided Transformation Method for the Singular Value Decomposition and Algebraic Eigenproblem," Comp. J. 18, 74-76.
M.Z. Nashed (1976). Generalized Inverses and Applications, Academic Press, New York.
R.A. Nicolaides (1974). "On a Geometrical Aspect of SOR and the Theory of Consistent Ordering for Positive Definite Matrices," Numer. Math. 12, 99-104.
W. Niethammer and R.S. Varga (1983). "The Analysis of k-step Iterative Methods for Linear Systems from Summability Theory," Numer. Math. 41, 177-206.
B. Noble and J.W. Daniel (1977). Applied Linear Algebra, Prentice-Hall, Englewood
Cliffs.
Y. Notay (1992). "On the Robustness of Modified Incomplete Factorization Methods," J. Comput. Math. 40, 121-141.
C. Oara (1994). "Proper Deflating Subspaces: Properties, Algorithms, and Applications," Numerical Algorithms 7, 355-373.
W. Oettli and W. Prager (1964). "Compatibility of Approximate Solutions of Linear Equations with Given Error Bounds for Coefficients and Right Hand Sides," Numer. Math. 6, 405-409.
D.P. O'Leary (1980). "Estimating Matrix Condition Numbers," SIAM J. Sci. Stat.
Comp. 1, 205-9.
D.P. O'Leary (1980). "The Block Conjugate Gradient Algorithm and Related Methods,"
Lin. Alg. and Its Applic. 2g, 293-322.
D.P. O'Leary (1987). "Parallel Implementation of the Block Conjugate Gradient Alg<>-
rithm," Parallel Computers 5, 127-140.
D.P. O'Leary (1990). "On Bounds for Scaled Projectione and Pseudoinverses," Lin. Alg.
and Its Applic. 132, 115-117.
D.P. O'Leary and J.A. Simmons (1981). "A Bidiagonalization-Regularization Procedure
for Large Scale Discretizations of Ill-Posed Problems," SIAM J. Sci. and Stat. Comp.
2, 474-489.
D.P. O'Leary and G.W. Stewart (1985). "Data Flow Algorithms for Parallel Matrix
Computations," Comm. ACM 28, 841-853.
D.P. O'Leary and G.W. Stewart (1986). "Assignment and Scheduling in Parallel Matrix
Factorization," Lin. Alg. and Its Applic. 77, 275-300.
S.J. Olszanskyj, J.M. Lebak, and A.W. Bojanczyk (1994). "Rank-k Modification Meth-
ods for Recursive Least Squares Problems," Numerical Algorithms 7, 325-354.
A.V. Oppenheim (1978). Applications of Digital Signal Processing, Prentice-Hall, Englewood Cliffs.
J.M. Ortega (1987). Matrix Theory: A Second Course, Plenum Press, New York.
J.M. Ortega (1988). "The ijk Forms of Factorization Methods I: Vector Computers,"
Parollel Computers 7, 135-147.
J.M. Ortega (1988). Introduction to Parallel and Vector Solution of Linear Systems,
Plenum Press, New York.
J.M. Ortega and C.H. Romine (1988). "The ijk Forms of Factorization Methods ll:
Parallel Systems," Parnllel Computing 7, 149--162.
J.M. Ortega and R.G. Voigt (1985). "Solution of Partial Differential Equations on Vector
and Parallel Computers," SIAM Review !J7, 149--240.
C.-T. Pan (1993). "A Perturbation Analysis of the Problem of Downdating a Cholesky Factorization," Lin. Alg. and Its Applic. 183, 103-115.
V. Pan (1984). "How Can We Speed Up Matrix Multiplication?," SIAM Review 26,
393-416.
H. Park (1991). "A Parallel Algorithm for the Unbalanced Orthogonal Procrustes Prob-
lem," Parallel Computing 17, 913-923.
H. Park and L. Elden (1995). "Downdating the Rank-Revealing URV Decomposition,"
SIAM J. Matrix Anal. Appl. 16, 138-155.
B.N. Parlett (1965). "Convergence of the Q-R Algorithm," Numer. Math. 7, 187-93. (Correction in Numer. Math. 10, 163-64.)
B.N. Parlett (1966). "Singular and Invariant Matrices Under the QR Algorithm," Math. Comp. 20, 611-15.
B.N. Parlett (1967). "Canonical Decomposition of Hessenberg Matrices," Math. Comp. 21, 223-27.
B.N. Parlett (1968). "Global Convergence of the Basic QR Algorithm on Hessenberg
Matrices," Math. Comp. 22, 803-17.
B.N. Parlett (1971). "Analysis of Algorithms for Reflections in Bisectors," SIAM Review
13, 197-208.
B.N. Parlett (1974). "The Rayleigh Quotient Iteration and Some Generalizations for Nonnormal Matrices," Math. Comp. 28, 679-93.
B.N. Parlett (1976). "A Recurrence Among the Elements of Functions of Triangular Matrices," Lin. Alg. and Its Applic. 14, 117-21.
B.N. Parlett (1980). The Symmetric Eigenvalue Problem, Prentice-Hall, Englewood
Cliffs, NJ.
B.N. Parlett (1980). "A New Look at the Lanczos Algorithm for Solving Symmetric
Systems of Linear Equations," Lin. Alg. and Its Applic. 29, 323--46.
B.N. Parlett (1992). "Reduction to Tridiagonal Fonn and Minimal Realizations," SIAM
J. Matrix Anal. Appl. 13, 567-593.
B.N. Parlett (1995). "The New qd Algorithms," ACTA Numerica 5, 459-491.
B.N. Parlett and B. Nour-Omid (1985). "The Use of a Relined Error Bound When
Updating Eigenvalues of Tridiagonals, • Lin. Alg. and Its Applic. 68, 179-220.
B.N. Parlett and W.G. Poole (1973). "A Geometric Theory for the QR, LU, and Power
Iterations," SIAM J. Num. Anal. 10, 389-412.
B.N. Parlett and J.K. Reid (1970). "On the Solution of a System of Linear Equations
Whose Matrix is Symmetric but not Definite," BIT 10, 386-97.
B.N. Parlett and J.K. Reid (1981). "Tracking the Progress of the Lanczos Algorithm for Large Symmetric Eigenproblems," IMA J. Num. Anal. 1, 135-55.
B.N. Parlett and C. Reinsch (1969). "Balancing a Matrix for Calculation of Eigenvalues and Eigenvectors," Numer. Math. 13, 292-304. See also Wilkinson and Reinsch (1971, pp. 315-26).
B.N. Parlett and R. Schreiber (1988). "Block Reflectors: Theory and Computation,"
SIAM J. Num. Anal. 25, 189-205.
B.N. Parlett and D.S. Scott (1979). "The Lanczos Algorithm with Selective Orthogo-
nalization," Math. Comp. 33, 217-38.
B.N. Parlett, H. Simon, and L.M. Stringer (1982). "On Estimating the Largest Eigen-
value with the Lanczos Algorithm," Math. Comp. 38, 153-166.
B.N. Parlett, D. Taylor, and Z. Liu (1985). "A Look-Ahead Lanczos Algorithm for Unsymmetric Matrices," Math. Comp. 44, 105-124.
N. Patel and H. Jordan (1984). "A Parallelized Point Rowwise Successive Over-Relaxation Method on a Multiprocessor," Parallel Computing 1, 207-222.
R.V. Patel, A.J. Laub, and P.M. Van Dooren, eds. (1994). Numerical Linear Algebra Techniques for Systems and Control, IEEE Press, Piscataway, New Jersey.
D.A. Patterson and J.L. Hennessy (1989). Computer Architecture: A Quantitative Approach, Morgan Kaufmann Publishers, Inc., Palo Alto, CA.
M.S. Paterson and L.J. Stockmeyer (1973). "On the Number of Nonscalar Multiplications Necessary to Evaluate Polynomials," SIAM J. Comp. 2, 60-66.
K. Pearson (1901). "On Lines and Planes of Closest Fit to Points in Space," Phil. Mag. 2, 559-72.
G. Peters and J.H. Wilkinson (1969). "Eigenvalues of Ax = λBx with Band Symmetric A and B," Comp. J. 12, 398-404.
G. Peters and J.H. Wilkinson (1970). "The Least Squares Problem and Pseudo-Inverses," Comp. J. 13, 309-16.
G. Peters and J.H. Wilkinson (1970). "Ax = λBx and the Generalized Eigenproblem," SIAM J. Num. Anal. 7, 479-92.
G. Peters and J.H. Wilkinson (1971). "The Calculation of Specified Eigenvectors by Inverse Iteration," in Wilkinson and Reinsch (1971, pp. 418-39).
G. Peters and J.H. Wilkinson (1979). "Inverse Iteration, Ill-Conditioned Equations, and Newton's Method," SIAM Review 21, 339-60.
D.J. Pierce and R.J. Plemmons (1992). "Fast Adaptive Condition Estimation," SIAM J. Matrix Anal. Appl. 13, 274-291.
S. Pissanetsky (1984). Sparse Matrix Technology, Academic Press, New York.
R.J. Plemmons (1974). "Linear Least Squares by Elimination and MGS," J. Assoc. Comp. Mach. 21, 581-85.
R.J. Plemmons (1986). "A Parallel Block Iterative Scheme Applied to Computations in Structural Analysis," SIAM J. Alg. and Disc. Methods 7, 337-347.
R.J. Plemmons and C.D. Meyer, eds. (1993). Linear Algebra, Markov Chains, and Queuing Models, Springer-Verlag, New York.
A. Pokrzywa (1986). "On Perturbations and the Equivalence Orbit of a Matrix Pencil,"
Lin. Alg. and Its Applic. 82, 99-121.
E.L. Poole and J.M. Ortega (1987). "Multicolor ICCG Methods for Vector Computers," SIAM J. Numer. Anal. 24, 1394-1418.
D.A. Pope and C. Tompkins (1957). "Maximizing Functions of Rotations: Experiments
Concerning Speed of Diagonalization of Symmetric Matrices Using Jacobi's Method,"
J. ACM 4, 459-66.
A. Pothen, S. Jha, and U. Vemulapati (1987). "Orthogonal Factorization on a Distributed Memory Multiprocessor," in Hypercube Multiprocessors, ed. M.T. Heath, SIAM Publications, 1987.
M.J.D. Powell and J.K. Reid (1968). "On Applying Householder's Method to Linear Least Squares Problems," Proc. IFIP Congress, pp. 122-26.
R. Pratap (1995). Getting Started with MATLAB, Saunders College Publishing, Fort
Worth, TX.
J.D. Pryce (1984). "A New Measure of Relative Error for Vectors," SIAM J. Num. Anal. 21, 202-21.
C. Puglisi (1992). "Modification of the Householder Method Based on the Compact WY Representation," SIAM J. Sci. and Stat. Comp. 13, 723-726.
S. Qiao (1986). "Hybrid Algorithm for Fast Toeplitz Orthogonalization," Numer. Math. 53, 351-366.
S. Qiao (1988). "Recursive Least Squares Algorithm for Linear Prediction Problems,"
SIAM J. Matrix Anal. Appl. 9, 323--328.
C.M. Rader and A.O. Steinhardt (1988). "Hyperbolic Householder Transforms," SIAM J. Matrix Anal. Appl. 9, 269-290.
G. Radicati di Brozolo and Y. Robert (1989). "Parallel Conjugate Gradient-like Algo-
rithms for Solving Sparse Nonsymmetric Linear Systems on a Vector Multiprocessor,"
Parallel Computing 11, 233-240.
P. Raghavan (1995). "Distributed Sparse Gaussian Elimination and Orthogonal Factorization," SIAM J. Sci. Comp. 16, 1462-1477.
W. Rath (1982). "Fast Givens Rotations for Orthogonal Similarity," Numer. Math. 40,
47-56.
P.A. Regalia and S. Mitra (1989). "Kronecker Products, Unitary Matrices, and Signal Processing Applications," SIAM Review 31, 586-613.
L. Reichel (1991). "Fast QR Decomposition of Vandermonde-Like Matrices and Polynomial Least Squares Approximation," SIAM J. Matrix Anal. Appl. 12, 552-564.
A. Ruhe (1984). "Rational Krylov Algorithms for Eigenvalue Computation," Lin. Alg. and Its Applic. 58, 391-405.
A. Ruhe (1987). "Closest Normal Matrix Found!," BIT 27, 585-598.
A. Ruhe (1994). "Rational Krylov Algorithms for Nonsymmetric Eigenvalue Problems II. Matrix Pairs," Lin. Alg. and Its Applic. 197, 283-295.
A. Ruhe (1994). "The Rational Krylov Algorithm for Nonsymmetric Eigenvalue Problems III: Complex Shifts for Real Matrices," BIT 34, 165-176.
A. Ruhe and T. Wiberg (1972). "The Method of Conjugate Gradients Used in Inverse Iteration," BIT 12, 543-54.
H. Rutishauser (1958). "Solution of Eigenvalue Problems with the LR Transformation,"
Nat. Bur. Stand. App. Math. Ser. 49, 47-81.
H. Rutishauser (1966). "Bestimmung der Eigenwerte Orthogonaler Matrizen," Numer.
Math. 9, 104-108.
H. Rutishauser (1966). "The Jacobi Method for Real Symmetric Matrices," Numer.
Math. 9, 1-10. See also Wilkinson and Reinsch (1971, pp. 202-11).
H. Rutishauser (1970). "Simultaneous Iteration Method for Symmetric Matrices," Numer. Math. 16, 205-23. See also Wilkinson and Reinsch (1971, pp. 284-302).
Y. Saad (1980). "On the Rates of Convergence of the Lanczos and the Block Lanczos
Methods," SIAM J. Num. Anal.11, 687-706.
Y. Saad (1980). "Variations of Arnoldi's Method for Computing Eigenelements of Large
Unsymmetric Matrices.," Lin. Alg. and Its Applic. 34, 269-295.
Y. Saad (1981). "Krylov Subspace Methods for Solving Large Unsymmetric Linear
Systems," Math. Camp. 37, 105-126.
Y. Saad (1982). "The Lanczos Biorthogonalization Algorithm and Other Oblique Pro-
jection Metods for Solving Large Unsymmetric Systems," SIAM J. Numer. Anal.
19, 485-506.
Y. Saad (1984). "Practical Use of Some Krylov Subspace Methods for Solving Indefinite
and Nonsymmetric Linear Systems," SIAM J. Sci. and Stat. Camp. 5, 203-228.
Y. Saad (1985). "Practical Use of Polynomial Preconditionings for the Conjugate Gradient Method," SIAM J. Sci. and Stat. Comp. 6, 865-882.
Y. Saad (1986). "On the Condition Number of Some Gram Matrices Arising from Least
Squares Approximation in the Complex Plane," Numer. Math. 48, 337-348.
Y. Saad (1987). "On the Lanczos Method for Solving Symmetric Systems with Several
Right Hand Sides," Math. Comp. 48, 651-662.
Y. Saad (1988). "Preconditioning Techniques for Indefinite and Nonsymmetric Linear Systems," J. Comput. Appl. Math. 24, 89-105.
Y. Saad (1989). "Krylov Subspace Methods on Supercomputers," SIAM J. Sci. and Stat. Comp. 10, 1200-1232.
Y. Saad (1992). Numerical Methods for Large Eigenvalue Problems: Theory and Algorithms, John Wiley and Sons, New York.
Y. Saad (1993). "A Flexible Inner-Outer Preconditioned GMRES Algorithm," SIAM J.
Sci. Comput. 14, 461-469.
Y. Saad (1996). Iterative Methods for Sparse Linear Systems, PWS Publishing Co., Boston.
Y. Saad and M.H. Schultz (1985). "Conjugate Gradient-Like Algorithms for Solving Nonsymmetric Linear Systems," Math. Comp. 44, 417-424.
Y. Saad and M.H. Schultz (1986). "GMRES: A Generalized Minimal Residual Algorithm for Solving Nonsymmetric Linear Systems," SIAM J. Sci. and Stat. Comp. 7, 856-869.
Y. Saad and M.H. Schultz (1989). "Data Communication in Parallel Architectures," J.
Dist. Pamllel Camp. 11, 131-150.
Y. Saad and M.H. Schultz (1989). "Data Communication in Hypercubes," J. Dist.
Parallel Camp. 6, 115-135.
A. Sameh (1971). "On Jacobi and Jacobi-like Algorithms for a Parallel Computer,"
Math. Camp. BS, 579-90.
A. Sameh and D. Kuck (1978). "On Stable Parallel Linear System Solvers," J. Assoc. Comp. Mach. 25, 81-91.
A. Sameh, J. Lermit, and K. Noh (1975). "On the Intermediate Eigenvalues of Symmetric Sparse Matrices," BIT 12, 543-54.
M.A. Saunders (1995). "Solution of Sparse Rectangular Systems," BIT 35, 588-604.
M.A. Saunders, H.D. Simon, and E.L. Yip (1988). "Two Conjugate Gradient-Type Methods for Unsymmetric Linear Equations," SIAM J. Num. Anal. 25, 927-940.
K. Schittkowski and J. Stoer (1979). "A Factorization Method for the Solution of Constrained Linear Least Squares Problems Allowing for Subsequent Data Changes," Numer. Math. 31, 431-463.
W. Schonauer (1987). Scientific Computing on Vector Computers, North Holland, Amsterdam.
P. Schonemann (1966). "A Generalized Solution of the Orthogonal Procrustes Problem," Psychometrika 31, 1-10.
A. Schonhage (1964). "On the Quadratic Convergence of the Jacobi Process," Numer. Math. 6, 410-12.
A. Schonhage (1979). "Arbitrary Perturbations of Hermitian Matrices," Lin. Alg. and
Its Applic. 24, 143-49.
R.S. Schreiber (1986). "Solving Eigenvalue and Singular Value Problems on an Undersized Systolic Array," SIAM J. Sci. and Stat. Comp. 7, 441-451.
R.S. Schreiber (1988). "Block Algorithms for Parallel Machines," in Numerical Algorithms for Modern Parallel Computer Architectures, M.H. Schultz (ed), IMA Volumes in Mathematics and Its Applications, Number 13, Springer-Verlag, Berlin, 197-207.
R.S. Schreiber and B.N. Parlett (1987). "Block Reflectors: Theory and Computation," SIAM J. Numer. Anal. 25, 189-205.
R.S. Schreiber and C. Van Loan (1989). "A Storage-Efficient WY Representation for
Products of Householder Transformations," SIAM J. Sci. and Stat. Comp. 10,
52-57.
M.H. Schultz, ed. (1988). Numerical Algorithms for Modern Pamllel Computer Archi-
tectures, IMA Volumes in Mathematics and Its Applications, Number 13, Springer-
Verlag, Berlin.
I. Schur (1909). "On the Characteristic Roots of a Linear Substitution with an Application to the Theory of Integral Equations," Math. Ann. 66, 488-510 (German).
H.R. Schwarz (1968). "Tridiagonalization of a Symmetric Band Matrix," Numer. Math. 12, 231-41. See also Wilkinson and Reinsch (1971, 273-83).
H.R. Schwarz (1974). "The Method of Coordinate Relaxation for (A - λB)x = 0," Num. Math. 23, 135-52.
D. Scott (1978). "Analysis of the Symmetric Lanczos Process," Electronic Research Laboratory Technical Report UCB/ERL M78/40, University of California, Berkeley.
D.S. Scott (1979). "Block Lanczos Software for Symmetric Eigenvalue Problems," Re-
port ORNL/CSD-48, Oak Ridge National Laboratory, Union Carbide Corporation,
Oak Ridge, Tennessee.
D.S. Scott (1979). "How to Make the Lanczos Algorithm Converge Slowly," Math.
Camp. 33, 239-47.
D.S. Scott (1984). "Computing a Few Eigenvalues and Eigenvectors of a Symmetric
Band Matrix," SIAM J. Sci. and Stat. Comp. 5, 658--666.
D.S. Scott (1985). "On the Accuracy of the Gershgorin Circle Theorem for Bounding
the Spread of a Real Symmetric Matrix," Lin. Alg. and Its Applic. 65, 147-155.
D.S. Scott, M.T. Heath, and R.C. Ward (1986). "Parallel Block Jacobi Eigenvalue
Algorithms Using Systolic Arrays," Lin. Alg. and Its Applic. 77, 345-356.
M.K. Seager (1986). "Parallelizing Conjugate Gradient for the Cray X-MP," Parallel Computing 3, 35-47.
J.J. Seaton (1969). "Diagonalization of Complex Symmetric Matrices Using a Modified Jacobi Method," Comp. J. 12, 156-57.
S. Serbin (1980). "On Factoring a Class of Complex Symmetric Matrices Without Piv-
oting," Math. Comp. 35, 1231-1234.
S. Serbin and S. Blalock (1979). "An Algorithm for Computing the Matrix Cosine," SIAM J. Sci. Stat. Comp. 1, 198-204.
J.W. Sheldon (1955). "On the Numerical Solution of Elliptic Difference Equations," Math. Tables Aids Comp. 9, 101-12.
W. Shougen and Z. Shuqin (1991). "An Algorithm for Ax = λBx with Symmetric and Positive Definite A and B," SIAM J. Matrix Anal. Appl. 12, 654-660.
G. Shroff (1991). "A Parallel Algorithm for the Eigenvalues and Eigenvectors of a General Complex Matrix," Numer. Math. 58, 779-806.
G. Shroff and C.H. Bischof (1992). "Adaptive Condition Estimation for Rank-One Updates of QR Factorizations," SIAM J. Matrix Anal. Appl. 13, 1264-1278.
G. Shroff and R. Schreiber (1989). "On the Convergence of the Cyclic Jacobi Method for Parallel Block Orderings," SIAM J. Matrix Anal. Appl. 10, 326-346.
H. Simon (1984). "Analysis of the Symmetric Lanczos Algorithm with Reorthogonalization Methods," Lin. Alg. and Its Applic. 61, 101-132.
B. Singer and S. Spilerman (1976). "The Representation of Social Processes by Markov Models," Amer. J. Sociology 82, 1-54.
R.D. Skeel (1979). "Scaling for Numerical Stability in Gaussian Elimination," J. ACM 26, 494-526.
R.D. Skeel (1980). "Iterative Refinement Implies Numerical Stability for Gaussian Elimination," Math. Comp. 35, 817-832.
R.D. Skeel (1981). "Effect of Equilibration on Residual Size for Partial Pivoting," SIAM J. Num. Anal. 18, 449-55.
G.L.G. Sleijpen and D.R. Fokkema (1993). "BiCGSTAB(ℓ) for Linear Equations Involving Unsymmetric Matrices with Complex Spectrum," Electronic Transactions on Numerical Analysis 1, 11-32.
B.T. Smith, J.M. Boyle, Y. Ikebe, V.C. Klema, and C.B. Moler (1970). Matrix Eigensystem Routines: EISPACK Guide, 2nd ed., Lecture Notes in Computer Science, Volume 6, Springer-Verlag, New York.
R.A. Smith (1967). "The Condition Numbers of the Matrix Eigenvalue Problem," Numer. Math. 10, 232-40.
F. Smithies (1970). Integral Equations, Cambridge University Press, Cambridge.
P. Sonneveld (1989). "CGS, A Fast Lanczos-Type Solver for Nonsymmetric Linear Systems," SIAM J. Sci. and Stat. Comp. 10, 36-52.
D.C. Sorensen (1992). "Implicit Application of Polynomial Filters in a k-Step Arnoldi Method," SIAM J. Matrix Anal. Appl. 13, 357-385.
D.C. Sorensen (1995). "Implicitly Restarted Arnoldi/Lanczos Methods for Large Scale Eigenvalue Calculations," in Proceedings of the ICASE/LaRC Workshop on Parallel Numerical Algorithms, May 23-25, 1994, D.E. Keyes, A. Sameh, and V. Venkatakrishnan (eds), Kluwer.
G.W. Stewart (1969). "Accelerating The Orthogonal Iteration for the Eigenvalues of a
Hermitian Matrix," Numer. Math. 13, 362-76.
G.W. Stewart (1970). "Incorporating Original Shifts into the QR Algorithm for Sym-
metric Tridiagonal Matrices," Comm. ACM 13, 365-67.
G.W. Stewart (1971). "Error Bounds for Approximate Invariant Subspaces of Closed Linear Operators," SIAM J. Num. Anal. 8, 796-808.
G.W. Stewart (1972). "On the Sensitivity of the Eigenvalue Problem Ax = λBx," SIAM J. Num. Anal. 9, 669-86.
G.W. Stewart (1973). "Error and Perturbation Bounds for Subspaces Associated with Certain Eigenvalue Problems," SIAM Review 15, 727-64.
G.W. Stewart (1973). Introduction to Matrix Computations, Academic Press, New York.
G.W. Stewart (1973). "Conjugate Direction Methods for Solving Systems of Linear
Equations," Numer. Math. f1, 284-97.
G.W. Stewart (1974). "The Numerical Treatment of Large Eigenvalue Problems," Proc.
IFIP Congress 74, North-Holland, pp. 666-72.
G.W. Stewart (1975). "The Convergence of the Method of Conjugate Gradients at Isolated Extreme Points in the Spectrum," Numer. Math. 24, 85-93.
G.W. Stewart (1975). "Gershgorin Theory for the Generalized Eigenvalue Problem Ax=
>.Bx," Math. Camp. 89, 600-606.
G.W. Stewart (1975). "Methods of Simultaneous Iteration for Calculating Eigenvectors
of Matrices," in Topics in Numerical Analysis II, ed. John J.H. Miller, Academic
Press, New York, pp. 185-96.
G.W. Stewart (1976). "The Economical Storage of Plane Rotations," Numer. Math.
85, 137-38.
G.W. Stewart (1976). "Simultaneous Iteration for Computing Invariant Subspaces of
Non-Hermitian Matrices," Numer. Math. 85, 123-36.
G.W. Stewart (1976). "Algorithm 406: HQR3 and EXCHNG: Fortran Subroutines for
Ca.lculating and Ordering the Eigenvalues of a Real Upper Hessenberg Matrix," ACM
7rans. Math. Soft. 8, 275--80.
G.W. Stewart (1976). "A Bibliographical Tour of the Large Sparse Generalized Eigen-
value Problem," in Sparse Matrix Computations, ed. J.R. Bunch and D.J. Rose,
Academic Press, New York.
G.W. Stewart (1977). "Perturbation Bounds for the QR Factorization of a Matrix,"
SIAM J. Num. Anal. 14, 509-18.
G.W. Stewart (1977). "On the Perturbation of Pseudo-Inverses, Projections and Linear
Least Squares Problems," SIAM Review 19, 634-662.
G.W. Stewart (1977). "Sensitivity Coefficients for the Effects of Errors in the Indepen-
dent Variables in a Linear Regression," Technical Report TR-571, Department of
Computer Science, University of Maryland, College Park, MD.
G.W. Stewart (1978). "Perturbation Theory for the Generalized Eigenvalue Problem," in Recent Advances in Numerical Analysis, ed. C. de Boor and G.H. Golub, Academic Press, New York.
G.W. Stewart (1979). "A Note on the Perturbation of Singular Values," Lin. Alg. and Its Applic. 28, 213-16.
G.W. Stewart (1979). "Perturbation Bounds for the Definite Generalized Eigenvalue Problem," Lin. Alg. and Its Applic. 23, 69-86.
G.W. Stewart (1979). "The Effects of Rounding Error on an Algorithm for Downdating
a Cholesky Factorization," J. Inst. Math. Applic. 23, 203-13.
G.W. Stewart (1980). "The Efficient Generation of Random Orthogonal Matrices with
an Application to Condition Estimators," SIAM J. Num. Anal. 17, 403-9.
G.W. Stewart (1981). "On the Implicit Deflation of Nearly Singular Systems of Linear Equations," SIAM J. Sci. and Stat. Comp. 2, 136-140.
G.W. Stewart (1983). "A Method for Computing the Generalized Singular Value Decomposition," in Matrix Pencils, ed. B. Kågström and A. Ruhe, Springer-Verlag, New York, pp. 207-20.
G.W. Stewart (1984). "A Second Order Perturbation Expansion for Small Singular
Values," Lin. Alg. and Its Applic. 56, 231-236.
G.W. Stewart (1984). "Rank Degeneracy," SIAM J. Sci. and Stat. Camp. 5, 403-413.
G.W. Stewart (1984). "On the Asymptotic Behavior of Scaled Singular Value and QR
Decompositions," Math. Camp. 43, 483-490.
G.W. Stewart (1985). "A Jacobi-Like Algorithm for Computing the Schur Decomposi-
tion of a Nonhermitian Matrix," SIAM J. Sci. and Stat. Camp. 6, 853-862.
G.W. Stewart (1987). "Collinearity and Least Squares Regression," Statistical Science
8, 68-100.
G.W. Stewart (1989). "On Scaled Projections and Pseudoinverses," Lin. Alg. and Its
Applic. 112, 189--193.
G.W. Stewart (1992). "An Updating Algorithm for Subspace Tracking," IEEE Trans. Signal Proc. 40, 1535-1541.
G.W. Stewart (1993). "Updating a Rank-Revealing ULV Decomposition," SIAM J.
Matrix Anal. Appl. 14, 494-499.
G.W. Stewart (1993). "On the Perturbation of LU Cholesky, and QR Factorizations,"
SIAM J. Matrix Anal. Appl. 14, 1141-1145.
G.W. Stewart (1993). "On the Early History of the Singular Value Decomposition,"
SIAM Review 35, 551-566.
G.W. Stewart (1994). "Perturbation Theory for Rectangular Matrix Pencils," Lin. Alg.
and Applic. 208/209, 297-301.
G.W. Stewart (1994). "Updating URV Decompositions in Parallel," Pamllel Computing
20, 151-172.
G.W. Stewart and J.-G. Sun (1990). Matrix Perturbation TheoMJ, Academic PreBS, San
Diego.
G.W. Stewart and G. Zheng (1991). "Eigenvalues of Graded Matrices and the Condition
Numbers of Multiple Eigenvalues," Numer. Math. 58, 703-712.
M. Stewart and P. Van Dooren (1996). "Stability Issues in the Factorization of Structured Matrices," SIAM J. Matrix Anal. Appl. 18, to appear.
H.S. Stone (1973). "An Efficient Parallel Algorithm for the Solution of a Tridiagonal
Linear System of Equations," J. ACM 20, 27-38.
H.S. Stone (1975). "Parallel Tridiagonal Equation Solvers," ACM Trans. Math. Soft. 1, 289-307.
G. Strang (1988). "A Framework for Equilibrium Equations," SIAM Review 30, 283-297.
G. Strang (1993). Introduction to Linear Algebra, Wellesley-Cambridge Press, Wellesley, MA.
V. Strassen (1969). "Gaussian Elimination is Not Optimal," Numer. Math. 13, 354-356.
J.-G. Sun (1982). "A Note on Stewart's Theorem for Definite Matrix Pairs," Lin. Alg.
and Its Applic. 48, 331-339.
J.-G. Sun (1983). "Perturbation Analysis for the Generalized Singular Value Problem,"
SIAM J. Numer. Anal. 20, 611-625.
J.-G. Sun (1992). "On Condition Numbers of a Nondefective Multiple Eigenvalue,"
Numer. Math. 61, 265-276.
J.-G. Sun (1992). "Rounding Error and Perturbation Bounds for the Cholesky and LDL^T Factorizations," Lin. Alg. and Its Applic. 173, 77-97.
J.-G. Sun (1995}. "A Note on Backward Error Perturbations for the Hermitian Eigen-
value Problem," BIT 35, 385-393.
J.-G. Sun (1995}. "On Perturbation Bounds for the QR Factorization," Lin. Alg. and
Its Applic. 215, 95-112.
X. Sun and C.H. Bischof (1995). "A Basis-Kernel Representation of Orthogonal Matrices," SIAM J. Matrix Anal. Appl. 16, 1184-1196.
P.N. Swarztrauber (1979). "A Parallel Algorithm for Solving General Tridiagonal Equations," Math. Comp. 33, 185-199.
P.N. Swarztrauber and R.A. Sweet (1973). "The Direct Solution of the Discrete Poisson Equation on a Disk," SIAM J. Num. Anal. 10, 900-907.
P.N. Swarztrauber and R.A. Sweet (1989). "Vector and Parallel Methods for the Direct Solution of Poisson's Equation," J. Comp. Appl. Math. 27, 241-263.
D.R. Sweet (1991). "Fast Block Toeplitz Orthogonalization," Numer. Math. 58, 613-629.
D.R. Sweet (1993). "The Use of Pivoting to Improve the Numerical Performance of Algorithms for Toeplitz Matrices," SIAM J. Matrix Anal. Appl. 14, 468-493.
R.A. Sweet (1974). "A Generalized Cyclic Reduction Algorithm," SIAM J. Num. Anal. 11, 506-20.
R.A. Sweet (1977). "A Cyclic Reduction Algorithm for Solving Block Tridiagonal Systems of Arbitrary Dimension," SIAM J. Num. Anal. 14, 706-20.
H.J. Symm and J.H. Wilkinson (1980). "Realistic Error Bounds for a Simple Eigenvalue and Its Associated Eigenvector," Numer. Math. 35, 113-26.
P.T.P. Tang (1994). "Dynamic Condition Estimation and Rayleigh-Ritz Approximation," SIAM J. Matrix Anal. Appl. 15, 331-346.
R.A. Tapia and D.L. Whitley (1988). "The Projected Newton Method Has Order 1 + √2 for the Symmetric Eigenvalue Problem," SIAM J. Num. Anal. 25, 1376-1382.
G.L. Thompson and R.L. Weil (1970). "Reducing the Rank of A - λB," Proc. Amer. Math. Soc. 26, 548-54.
G.L. Thompson and R.L. Weil (1972). "Roots of Matrix Pencils Ay = λBy: Existence, Calculations, and Relations to Game Theory," Lin. Alg. and Its Applic. 5, 207-26.
M.J. Todd (1990). "A Dantzig-Wolfe-like Variant of Karmarkar's Interior-Point Linear Programming Algorithm," Operations Research 38, 1006-1018.
K.-C. Toh and L.N. Trefethen (1994). "Pseudozeros of Polynomials and Pseudospectra
of Companion Matrices," Numer. Math. 68, 403-425.
L.N. Trefethen (1992). "Pseudospectra of Matrices," in Numerical Analysis 1991, D.F. Griffiths and G.A. Watson (eds), Longman Scientific and Technical, Harlow, Essex, UK, 234-262.
L.N. Trefethen and D. Bau III (1997). Numerical Linear Algebra, SIAM Publications, Philadelphia, PA.
L.N. Trefethen and R.S. Schreiber (1990). "Average-Case Stability of Gaussian Elimination," SIAM J. Matrix Anal. Appl. 11, 335-360.
L.N. Trefethen, A.E. Trefethen, S.C. Reddy, and T.A. Driscoll (1993). "Hydrodynamic
Stability Without Eigenvalues," Science 261, 578-584.
W.F. Trench (1964). "An Algorithm for the Inversion of Finite Toeplitz Matrices," J.
SIAM 12, 515-22.
W.F. Trench (1989). "Numerical Solution of the Eigenvalue Problem for Hermitian Toeplitz Matrices," SIAM J. Matrix Anal. Appl. 10, 135-146.
N.K. Tsao (1975). "A Note on Implementing the Householder Transformations," SIAM J. Num. Anal. 12, 53-58.
H.W. Turnbull and A.C. Aitken (1961). An Introduction to the Theory of Canonical
Matrices, Dover Publications, New York, pp. 102-5.
F. Uhlig (1973). "Simultaneous Block Diagonalization of Two Real Symmetric Matrices," Lin. Alg. and Its Applic. 7, 281-89.
F. Uhlig (1976). "A Canonical Form for a Pair of Real Symmetric Matrices That Generate a Nonsingular Pencil," Lin. Alg. and Its Applic. 14, 189-210.
R. Underwood (1975). "An Iterative Block Lanczos Method for the Solution of Large Sparse Symmetric Eigenproblems," Report STAN-CS-75-495, Department of Computer Science, Stanford University, Stanford, California.
R.J. Vaccaro, ed. (1991). SVD and Signal Processing II: Algorithms, Analysis, and Applications, Elsevier, Amsterdam.
R.J. Vaccaro (1994). "A Second-Order Perturbation Expansion for the SVD," SIAM J. Matrix Anal. Applic. 15, 661-671.
R.A. Van De Geijn (1993). "Deferred Shifting Schemes for Parallel QR Methods," SIAM J. Matrix Anal. Appl. 14, 180-194.
J. Vandergraft (1971). "Generalized Rayleigh Methods with Applications to Finding Eigenvalues of Large Matrices," Lin. Alg. and Its Applic. 4, 353-68.
A. Van der Sluis (1969). "Condition Numbers and Equilibration Matrices," Numer. Math. 14, 14-23.
A. Van der Sluis (1970). "Condition, Equilibration, and Pivoting in Linear Algebraic Systems," Numer. Math. 15, 74-86.
A. Van der Sluis (1975). "Stability of the Solutions of Linear Least Squares Problem," Numer. Math. 23, 241-54.
A. Van der Sluis (1975). "Perturbations of Eigenvalues of Non-normal Matrices," Comm. ACM 18, 30-36.
A. Van der Sluis and H.A. Van der Vorst (1986). "The Rate of Convergence of Conjugate Gradients," Numer. Math. 48, 543-560.
A. Van der Sluis and G.W. Veltkamp (1979). "Restoring Rank and Consistency by Orthogonal Projection," Lin. Alg. and Its Applic. 28, 257-78.
H. Van de Vel (1977). "Numerical Treatment of a Generalized Vandermonde Systems of Equations," Lin. Alg. and Its Applic. 17, 149-74.
E.F. Van de Velde (1994). Concurrent Scientific Computing, Springer-Verlag, New York.
H.A. Van der Vorst (1982). "A Vectorizable Variant of Some ICCG Methods," SIAM J. Sci. and Stat. Comp. 3, 350-356.
H.A. Van der Vorst (1982). "A Generalized Lanczos Scheme," Math. Comp. 99, 559--
562.
H.A. Vander Vorst {1986). "The Performance of Fortran Implementations for Precon-
ditioned Conjugate Gradients on Vector Computers," Parollel Computing 9, 49--58.
H.A. Vander Vorst {1986). "An Iterative Solution Method for Solving f(A)x =bUsing
Krylov Subspace Information Obtained for the Symmetric Positive Definite Matrix
A," J. Comp. and App. Math. 18, 249--263.
H. Vander Vorst (1987). "Large Tridiagonal and Block Tridiagonal Linear Systems on
Vector and Parallel Computers," Parollel Comput. 5, 45-54.
H. VanDer Vorst (1989). "High Performance Preconditioning," SIAM J. Sci. and Stat.
Comp. 10, 1174-1185.
H.A. Van Der Vorst (1992). "BiCGSTAB: A Fast and Smoothly Converging Variant of
the Bi-CG for the Solution of Nonsymmetric Linear Systems," SIAM J. Sci. and
Stat. Comp. 19, 631--{)44.
P. Van Dooren (1979). "The Computation of Kronecker's Canonical Form of a Singular Pencil," Lin. Alg. and Its Applic. 27, 103-40.
P. Van Dooren (1981). "A Generalized Eigenvalue Approach for Solving Riccati Equations," SIAM J. Sci. and Stat. Comp. 2, 121-135.
P. Van Dooren (1981). "The Generalized Eigenstructure Problem in Linear System Theory," IEEE Trans. Auto. Cont. AC-26, 111-128.
P. Van Dooren (1982). "Algorithm 590: DSUBSP and EXCHQZ: Fortran Routines for Computing Deflating Subspaces with Specified Spectrum," ACM Trans. Math. Software 8, 376-382.
S. Van Huffel (1992). "On the Significance of Nongeneric Total Least Squares Problems," SIAM J. Matrix Anal. Appl. 13, 20-35.
S. Van Huffel and H. Park (1994). "Parallel Tri- and Bidiagonalization of Bordered Bidiagonal Matrices," Parallel Computing 20, 1107-1128.
S. Van Huffel and J. Vandewalle (1987). "Subset Selection Using the Total Least Squares
Approach in Collinearity Problems with Errors in the Variables," Lin. Alg. and Its
Applic. 88/89, 695-714.
S. Van Huffel and J. Vandewalle (1988). "The Partial Total Least Squares Algorithm," J. Comp. and App. Math. 21, 333-342.
S. Van Huffel and J. Vandewalle (1988). "Analysis and Solution of the Nongeneric Total Least Squares Problem," SIAM J. Matrix Anal. Appl. 9, 360-372.
S. Van Huffel and J. Vandewalle (1989). "Analysis and Properties of the Generalized Total Least Squares Problem AX ≈ B When Some or All Columns in A are Subject to Error," SIAM J. Matrix Anal. Appl. 10, 294-315.
S. Van Huffel and J. Vandewalle (1991). The Total Least Squares Problem: Computa-
tional Aspects and Analysis, SIAM Publications, Philadelphia, PA.
S. Van Huffel, J. Vandewalle, and A. Haegemans (1987). "An Efficient and Reliable Algorithm for Computing the Singular Subspace of a Matrix Associated with its Smallest Singular Values," J. Comp. and Appl. Math. 19, 313-330.
S. Van Huffel and H. Zha (1991). "The Restricted Total Least Squares Problem: Formulation, Algorithm, and Properties," SIAM J. Matrix Anal. Appl. 12, 292-309.
S. Van Huffel and H. Zha (1993). "An Efficient Total Least Squares Algorithm Based On a Rank-Revealing Two-Sided Orthogonal Decomposition," Numerical Algorithms 4, 101-133.
H.P.M. van Kempen (1966). "On Quadratic Convergence of the Special Cyclic Jacobi
Method," Numer. Math. 9, 19--22.
C.F. Van Loan (1973). "Generalized Singular Values With Algorithms and Applica-
tions," Ph.D. thesis, University of Michigan, Ann Arbor.
C.F. Van Loan (1975). "A General Matrix Eigenvalue Algorithm," SIAM J. Num. Anal. 12, 819-834.
C.F. Van Loan (1975). "A Study of the Matrix Exponential," Numerical Analysis Report No. 10, Dept. of Maths., University of Manchester, England.
C.F. Van Loan (1976). "Generalizing the Singular Value Decomposition," SIAM J. Num. Anal. 13, 76-83.
C.F. Van Loan (1977). "On the Limitation and Application of Pade Approximation to the Matrix Exponential," in Pade and Rational Approximation, ed. E.B. Saff and R.S. Varga, Academic Press, New York.
C.F. Van Loan (1977). "The Sensitivity of the Matrix Exponential," SIAM J. Num. Anal. 14, 971-81.
C.F. Van Loan (1978). "Computing Integrals Involving the Matrix Exponential," IEEE Trans. Auto. Cont. AC-23, 395-404.
C.F. Van Loan (1978). "A Note on the Evaluation of Matrix Polynomials," IEEE Trans. Auto. Cont. AC-24, 320-21.
C.F. Van Loan (1982). "Using the Hessenberg Decomposition in Control Theory," in Algorithms and Theory in Filtering and Control, D.C. Sorensen and R.J. Wets (eds), Mathematical Programming Study No. 18, North Holland, Amsterdam, pp. 102-11.
C.F. Van Loan (1984). "A Symplectic Method for Approximating All the Eigenvalues
of a Hamiltonian Matrix," Lin. Alg. and Its Applic. 61, 233-252.
C.F. Van Loan (1985). "How Near is a Stable Matrix to an Unstable Matrix?," Con-
tempomry Mathematics, Vol. 47, 46fr477.
C.F. Van Loan (1985). "On the Method of Weighting for Equality Constrained Least
Squares Problems," SIAM J. Numer. Anal. 22, 851-864.
C.F. VWI Loan (1985). "Computing the CS and Genemlized Singular Value Decompo-
sition," Nvmer. Math. 46, 479-492.
C.F. Van Loan (1987). "On Estimating the Condition of Eigenvalues and Eigenvectors,"
Lin. Alg. and Its Applic. 88/89, 715-732.
C.F. Van Loan (1992). Computational Frameworks for the Fast Fourier Transform, SIAM Publications, Philadelphia, PA.
C.F. Van Loan (1997). Introduction to Scientific Computing: A Matrix-Vector Approach Using Matlab, Prentice Hall, Upper Saddle River, NJ.
J.M. Varah (1968). "The Calculation of the Eigenvectors of a General Complex Matrix by Inverse Iteration," Math. Comp. 22, 785-91.
J.M. Varah (1968). "Rigorous Machine Bounds for the Eigensystem of a General Complex Matrix," Math. Comp. 22, 793-801.
J.M. Varah (1970). "Computing Invariant Subspaces of a General Matrix When the
Eigensystem is Poorly Determined," Math. Comp. 24, 137-49.
J.M. Varah (1972). "On the Solution of Block-Tridiagonal Systems Arising from Certain Finite-Difference Equations," Math. Comp. 26, 859-68.
J.M. Varah (1973). "On the Numerical Solution of Ill-Conditioned Linear Systems with Applications to Ill-Posed Problems," SIAM J. Num. Anal. 10, 257-67.
J.M. Va.rah (1979). "On the Separation of Two Matrices," SIAM J. Num. Anal. 16,
212-22.
J.M. Varah (1993). "Errors and Perturbations in Vandermonde Systems," IMA J. Num. Anal. 13, 1-12.
J.M. Varah (1994). "Backward Error Estimates for Toeplitz Systems," SIAM J. Matrix Anal. Appl. 15, 408-417.
R.S. Varga (1961). "On Higher-Order Stable Implicit Methods for Solving Parabolic Partial Differential Equations," J. Math. Phys. 40, 220-31.
R.S. Varga (1962). MatTi% Itemtive Analysis, Prentice-Hall, Englewood Cliffs, NJ.
R.S. Varga (1970). "Minimal Gershgorin Sets for Partitioned Matrices," SIAM J. Num. Anal. 7, 493-507.
R.S. Varga (1976). "On Diagonal Dominance Arguments for Bounding ‖A^{-1}‖_∞," Lin. Alg. and Its Applic. 14, 211-17.
S.A. Vavasis (1994). "Stable Numerical Algorithms for Equilibrium Systems," SIAM J. Matrix Anal. Appl. 15, 1108-1131.
S.A. Vavasis (1992). "Preconditioning for Boundary Integral Equations," SIAM J. Matrix Anal. Appl. 13, 905-925.
K. Veselić (1993). "A Jacobi Eigenreduction Algorithm for Definite Matrix Pairs," Numer. Math. 64, 241-268.
K. Veselić and V. Hari (1989). "A Note on a One-Sided Jacobi Algorithm," Numer. Math. 56, 627-633.
W.J. Vetter (1975). "Vector Structures and Solutions of Linear Matrix Equations," Lin. Alg. and Its Applic. 10, 181-88.
C. Vuik and H.A. Van der Vorst (1992). "A Comparison of Some GMRES-like Methods," Lin. Alg. and Its Applic. 160, 131-162.
A. Wald (1940). "The Fitting of Straight Lines if Both Variables are Subject to Error," Annals of Mathematical Statistics 11, 284-300.
B. Walden, R. Karlson, and J. Sun (1995). "Optimal Backward Perturbation Bounds for the Linear Least Squares Problem," Numerical Lin. Alg. with Applic. 2, 271-286.
H.F. Walker (1988). "Implementation of the GMRES Method Using Householder Trans-
formations," SIAM J. Sci. Stat. Camp. 9, 152-163.
R.C. Ward (1975). "The Combination Shift QZ Algorithm," SIAM J. Num. Anal. 12, 835-853.
R.C. Ward (1977). "Numerical Computation of the Matrix Exponential with Accuracy Estimate," SIAM J. Num. Anal. 14, 600-14.
R.C. Ward (1981). "Balancing the Generalized Eigenvalue Problem," SIAM J. Sci. and Stat. Comp. 2, 141-152.
R.C. Ward and L.J. Gray (1978). "Eigensystem Computation for Skew-Symmetric and a Class of Symmetric Matrices," ACM Trans. Math. Soft. 4, 278-85.
D.S. Watkins (1982). "Understanding the QR Algorithm," SIAM Review 24, 427-440.
D.S. Watkins (1991). Fundamentals of Matrix Computations, John Wiley and Sons, New York.
D.S. Watkins (1993). "Some Perspectives on the Eigenvalue Problem," SIAM Review 35, 430-471.
D.S. Watkins and L. Elsner (1991). "Chasing Algorithms for the Eigenvalue Problem," SIAM J. Matrix Anal. Appl. 12, 374-384.
D.S. Watkins and L. Elsner (1991). "Convergence of Algorithms of Decomposition Type
for the Eigenvalue Problem," Lin.Aig. and Its Applic. 143, 19-47.
D.S. Watkins and L. Elsner (1994). "Theory or Decomposition and Bulge-ChBSing Al-
gorithms for the Generalized Eigenvalue Problem," SIAM J. Matri:r: Anal. Appl. 15,
943-967.
G.A. Watson (1988). "The Smallest Perturbation or a Submatrix which Lowers the Rank
of the Matrix," IMA J. Numer. Anal. 8, 29~304.
P.A. Wedin (1972). "Perturbation Bounds in Connection with the Singular Value Decomposition," BIT 12, 99-111.
P.A. Wedin (1973). "Perturbation Theory for Pseudo-Inverses," BIT 13, 217-32.
P.A. Wedin (1973). "On the Almost Rank-Deficient Case of the Least Squares Problem,"
BIT 13, 344-54.
M. Wei (1992). "Perturbation Theory for the Rank-Deficient Equality Constrained Least
Squares Problem," SIAM J. Num. Anal. IJ9, 1462-1481.
M. Wei (1992). "Algebraic Properties of the Rank-Deficient Equality-Constrained and
Weighted Least Squares Problems," Lin. Alg. and Its Applic. 161, 27-44.
M. Wei (1992). "The Analysis for the Total Least Squares Problem with More than One
Solution," SIAM J. Matrix Anal. Appl. 13, 746-763.
O. Widlund (1978). "A Lanczos Method for a Class of Nonsymmetric Systems of Linear Equations," SIAM J. Numer. Anal. 15, 801-12.
J.H. Wilkinson (1961). "Error Analysis of Direct Methods of Matrix Inversion," J. ACM 8, 281-330.
J.H. Wilkinson (1962). "Note on the Quadratic Convergence of the Cyclic Jacobi Process," Numer. Math. 4, 296-300.
J.H. Wilkinson (1963). Rounding Errors in Algebraic Processes, Prentice-Hall, Englewood Cliffs, NJ.
J.H. Wilkinson (1965). The Algebraic Eigenvalue Problem, Clarendon Press, Oxford, England.
J.H. Wilkinson (1965). "Convergence of the LR, QR, and Related Algorithms," Camp.
J. 8, 77-84.
J.H. Wilkinson (1968). "Global Convergence of Tridiagonal QR Algorithm With Origin
Shifts," Lin. Alg. and Its Applic. I, 409-20.
J.H. Wilkinson (1968). "Almost Diagonal Matrices with Multiple or Close Eigenvalues,"
Lin. Alg. and Its Applic. I, 1-12.
J.H. Wilkinson (1968). •A Priori Error Analysis of Algebraic Processes," Proc. Inter-
national Congress Math. (Moscow: Izdat. Mir, 1968), pp. 629-39.
J.H. Wilkinson (1971). "Modern Error Analysis," SIAM Retliew 13, 548-{;8.
J.H. Wilkinson (1972). "Note on Matrices with a Very Ill-Conditioned Eigenproblem,"
Numer. Math. 19, 176-78.
J.H. Wilkinson (1977). "Some Recent Advances in Numerical Linear Algebra," in The State of the Art in Numerical Analysis, ed. D.A.H. Jacobs, Academic Press, New York, pp. 1-53.
J.H. Wilkinson (1978). "Linear Differential Equations and Kronecker's Canonical Form," in Recent Advances in Numerical Analysis, ed. C. de Boor and G.H. Golub, Academic Press, New York, pp. 231-65.
J.H. Wilkinson (1979). "Kronecker's Canonical Form and the QZ Algorithm," Lin. Alg. and Its Applic. 28, 285-303.
J.H. Wilkinson (1984). "On Neighboring Matrices with Quadratic Elementary Divisors,"
Numer. Math. 44, 1-21.
J.H. Wilkinson and C. Reinsch, eds. (1971). Handbook for Automatic Computation, Vol. 2, Linear Algebra, Springer-Verlag, New York.
H. Wimmer and A.D. Ziebur (1972). "Solving the Matrix Equations Σ fp(A)Xgp(B) = C," SIAM Review 14, 318-23.
S. Winograd (1968). "A New Algorithm for Inner Product," IEEE Trans. Comp. C-17, 693-694.
M. Wolfe (1996). High Performance Compilers for Parallel Computing, Addison-Wesley, Reading, MA.
A. Wouk, ed. (1986). New Computing Environments: Parallel, Vector, and Systolic, SIAM Publications, Philadelphia, PA.
H. Wozniakowski (1978). "Roundoff-Error Analysis of Iterations for Large Linear Systems," Numer. Math. 30, 301-314.
H. Wozniakowski (1980). "Roundoff Error Analysis of a New Class of Conjugate Gradient Algorithms," Lin. Alg. and Its Applic. 29, 507-29.
A. Wragg (1973). "Computation of the Exponential of a Matrix I: Theoretical Considerations," J. Inst. Math. Applic. 11, 369-75.
A. Wragg (1975). "Computation of the Exponential of a Matrix II: Practical Consider-
ations," J. Inst. Math. Applic. 15, 273-78.
S.J. Wright (1993). "A Collection of Problems for Which Gaussian Elimination with
Partial Pivoting is Unstable," SIAM J. Sci. and Stat. Comp. 14, 231-238.
J.M. Yohe (1979). "Software for Interval Arithmetic: A Reasonably Portable Package," ACM Trans. Math. Soft. 5, 50-63.
D.M. Young (1970). "Convergence Properties of the Symmetric and Unsymmetric Over-Relaxation Methods," Math. Comp. 24, 793-807.
D.M. Young (1971). Iterative Solution of Large Linear Systems, Academic Press, New York.
D.M. Young (1972). "Generalization of Property A and Consistent Ordering," SIAM J.
Num. Anal. 9, 454-63.
D.M. Young and K.C. Jea (1980). "Generalized Conjugate Gradient Acceleration of Nonsymmetrizable Iterative Methods," Lin. Alg. and Its Applic. 34, 159-94.
L. Yu. Kolotilina and A. Yu. Yeremin (1993). "Factorized Sparse Approximate Inverse Preconditioning I: Theory," SIAM J. Matrix Anal. Applic. 14, 45-58.
L. Yu. Kolotilina and A. Yu. Yeremin (1995). "Factorized Sparse Approximate Inverse
Preconditioning II: Solution of 3D FE Systems on Massively Parallel Computers,"
Intern. J. High Speed Comput. 7, 191-215.
H. Zha (1991). "The Restricted Singular Value Decomposition of Matrix Triplets," SIAM
J. Matrix Anal. Appl. 12, 172-194.
H. Zha (1992). "A Numerical Algorithm for Computing the Restricted Singular Value Decomposition of Matrix Triplets," Lin. Alg. and Its Applic. 168, 1-25.
H. Zha (1993). "A Componentwise Perturbation Analysis of the QR Decomposition," SIAM J. Matrix Anal. Appl. 14, 1124-1131.
H. Zha and Z. Zhang (1995). "A Note on Constructing a Symmetric Matrix with Specified Diagonal Entries and Eigenvalues," BIT 35, 448-451.
H. Zhang and W.F. Moss (1994). "Using Parallel Banded Linear System Solvers in Generalized Eigenvalue Problems," Parallel Computing 20, 1089-1106.
Y. Zhang (1993). "A Primal-Dual Interior Point Approach for Computing the L1 and L∞ Solutions of Overdetermined Linear Systems," J. Optimization Theory and Applications 77, 323-341.
S. Zohar (1969). "Toeplitz Matrix Inversion: The Algorithm of W.F. Trench," J. ACM 16, 592-601.