
Matrix Computations

THIRD EDITION

Gene H. Golub
Department of Computer Science
Stanford University

Charles F. Van Loan


Department of Computer Science
Cornell University

The Johns Hopkins University Press


Baltimore and London
DEDICATED TO

ALSTON S. HOUSEHOLDER
AND

JAMES H. WILKINSON
Contents

Preface to the Third Edition xi


Software xiii
Selected References xv

1 Matrix Multiplication Problems 1

1.1 Basic Algorithms and Notation 2


1.2 Exploiting Structure 16
1.3 Block Matrices and Algorithms 24
1.4 Vectorization and Re-Use Issues 34

2 Matrix Analysis 48
2.1 Basic Ideas from Linear Algebra 48
2.2 Vector Norms 52
2.3 Matrix Norms 54
2.4 Finite Precision Matrix Computations 59
2.5 Orthogonality and the SVD 69
2.6 Projections and the CS Decomposition 75
2.7 The Sensitivity of Square Linear Systems 80

3 General Linear Systems 87


3.1 Triangular Systems 88
3.2 The LU Factorization 94
3.3 Roundoff Analysis of Gaussian Elimination 104
3.4 Pivoting 109
3.5 Improving and Estimating Accuracy 123
4 Special Linear Systems 133
4.1 The LDM^T and LDL^T Factorizations 135
4.2 Positive Definite Systems 140
4.3 Banded Systems 152
4.4 Symmetric Indefinite Systems 161
4.5 Block Systems 174
4.6 Vandermonde Systems and the FFT 183
4.7 Toeplitz and Related Systems 193

5 Orthogonalization and Least Squares 206


5.1 Householder and Givens Matrices 208
5.2 The QR Factorization 223
5.3 The Full Rank LS Problem 236
5.4 Other Orthogonal Factorizations 248
5.5 The Rank Deficient LS Problem 256
5.6 Weighting and Iterative Improvement 264
5.7 Square and Underdetermined Systems 270

6 Parallel Matrix Computations 275


6.1 Basic Concepts 276
6.2 Matrix Multiplication 292
6.3 Factorizations 300

7 The Unsymmetric Eigenvalue Problem 308


7.1 Properties and Decompositions 310
7.2 Perturbation Theory 320
7.3 Power Iterations 330
7.4 The Hessenberg and Real Schur Forms 341
7.5 The Practical QR Algorithm 352
7.6 Invariant Subspace Computations 362
7.7 The QZ Method for Ax = λBx 375

8 The Symmetric Eigenvalue Problem 391


8.1 Properties and Decompositions 393
8.2 Power Iterations 405
8.3 The Symmetric QR Algorithm 414
8.4 Jacobi Methods 426
8.5 Tridiagonal Methods 439
8.6 Computing the SVD 448
8.7 Some Generalized Eigenvalue Problems 461

9 Lanczos Methods 470


9.1 Derivation and Convergence Properties 471
9.2 Practical Lanczos Procedures 479
9.3 Applications to Ax = b and Least Squares 490
9.4 Arnoldi and Unsymmetric Lanczos 499

10 Iterative Methods for Linear Systems 508


10.1 The Standard Iterations 509
10.2 The Conjugate Gradient Method 520
10.3 Preconditioned Conjugate Gradients 532
10.4 Other Krylov Subspace Methods 544

11 Functions of Matrices 555


11.1 Eigenvalue Methods 556
11.2 Approximation Methods 562
11.3 The Matrix Exponential 572

12 Special Topics 579


12.1 Constrained Least Squares 580
12.2 Subset Selection Using the SVD 590
12.3 Total Least Squares 595
12.4 Computing Subspaces with the SVD 601
12.5 Updating Matrix Factorizations 606
12.6 Modified/Structured Eigenproblems 621

Bibliography 637
Index 687
Preface to the Third Edition

The field of matrix computations continues to grow and mature. In


the Third Edition we have added over 300 new references and 100 new
problems. The LINPACK and EISPACK citations have been replaced with
appropriate pointers to LAPACK with key codes tabulated at the beginning
of appropriate chapters.
In the First Edition and Second Edition we identified a small number
of global references: Wilkinson (1965), Forsythe and Moler (1967), Stewart
(1973), Hanson and Lawson (1974) and Parlett (1980). These volumes are
as important as ever to the research landscape, but there are some mag-
nificent new textbooks and monographs on the scene. See The Literature
section that follows.
We continue as before with the practice of giving references at the end
of each section and a master bibliography at the end of the book.
The earlier editions suffered from a large number of typographical errors
and we are obliged to the dozens of readers who have brought these to our
attention. Many corrections and clarifications have been made.
Here are some specific highlights of the new edition. Chapter 1 (Matrix
Multiplication Problems) and Chapter 6 (Parallel Matrix Computations)
have been completely rewritten with less formality. We think that this
facilitates the building of intuition for high performance computing and
draws a better line between algorithm and implementation on the printed
page.
In Chapter 2 (Matrix Analysis) we expanded the treatment of CS de-
composition and included a proof. The overview of floating point arithmetic
has been brought up to date. In Chapter 4 (Special Linear Systems) we
embellished the Toeplitz section with connections to circulant matrices and
the fast Fourier transform. A subsection on equilibrium systems has been
included in our treatment of indefinite systems.
A more accurate rendition of the modified Gram-Schmidt process is
offered in Chapter 5 (Orthogonalization and Least Squares). Chapter 8
(The Symmetric Eigenproblem) has been extensively rewritten and rear-
ranged so as to minimize its dependence upon Chapter 7 (The Unsymmet-
ric Eigenproblem). Indeed, the coupling between these two chapters is now
so minimal that it is possible to read either one first.
In Chapter 9 (Lanczos Methods) we have expanded the discussion of
the unsymmetric Lanczos process and the Arnoldi iteration. The "unsym-
metric component" of Chapter 10 (Iterative Methods for Linear Systems)
has likewise been broadened with a whole new section devoted to various
Krylov space methods designed to handle the sparse unsymmetric linear
system problem.
In §12.5 (Updating Orthogonal Decompositions) we included a new sub-
section on ULV updating. Toeplitz matrix eigenproblems and orthogonal
matrix eigenproblems are discussed in §12.6.
Both of us look forward to continuing the dialog with our readers. As
we said in the Preface to the Second Edition, "It has been a pleasure to
deal with such an interested and friendly readership."
Many individuals made valuable Third Edition suggestions, but Greg
Ammar, Mike Heath, Nick Trefethen, and Steve Vavasis deserve special
thanks.
Finally, we would like to acknowledge the support of Cindy Robinson
at Cornell. A dedicated assistant makes a big difference.
Software

LAPACK
Many of the algorithms in this book are implemented in the software pack-
age LAPACK:

E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra, J. Du Croz,
A. Greenbaum, S. Hammarling, A. McKenney, S. Ostrouchov, and D.
Sorensen (1995). LAPACK Users' Guide, Release 2.0, 2nd ed., SIAM
Publications, Philadelphia.

Pointers to some of the more important routines in this package are given
at the beginning of selected chapters:

Chapter 1. Level-1, Level-2, Level-3 BLAS


Chapter 3. General Linear Systems
Chapter 4. Positive Definite and Band Systems
Chapter 5. Orthogonalization and Least Squares Problems
Chapter 7. The Unsymmetric Eigenvalue Problem
Chapter 8. The Symmetric Eigenvalue Problem

Our LAPACK references are spare in detail but rich enough to "get you
started." Thus, when we say that _TRSV can be used to solve a triangular
system Ax = b, we leave it to you to discover through the LAPACK manual
that A can be either upper or lower triangular and that the transposed
system A^T x = b can be handled as well. Moreover, the underscore is a
placeholder whose mission is to designate type (single, double, complex,
etc.).
LAPACK stands on the shoulders of two other packages that are mile-
stones in the history of software development. EISPACK was developed in
the early 1970s and is dedicated to solving symmetric, unsymmetric, and
generalized eigenproblems:

B.T. Smith, J.M. Boyle, Y. Ikebe, V.C. Klema, and C.B. Moler (1976).
Matrix Eigensystem Routines: EISPACK Guide, 2nd ed., Lecture Notes
in Computer Science, Volume 6, Springer-Verlag, New York.

B.S. Garbow, J.M. Boyle, J.J. Dongarra, and C.B. Moler (1977). Matrix
Eigensystem Routines: EISPACK Guide Extension, Lecture Notes in
Computer Science, Volume 51, Springer-Verlag, New York.

LINPACK was developed in the late 1970s for linear equations and least
squares problems:

J.J. Dongarra, J.R. Bunch, C.B. Moler, and G.W. Stewart (1979). LINPACK
Users' Guide, SIAM Publications, Philadelphia.

EISPACK and LINPACK have their roots in a sequence of papers that feature
Algol implementations of some of the key matrix factorizations. These
papers are collected in

J.H. Wilkinson and C. Reinsch, eds. (1971). Handbook for Automatic
Computation, Vol. 2, Linear Algebra, Springer-Verlag, New York.

NETLIB
A wide range of software including LAPACK, EISPACK, and LINPACK is
available electronically via Netlib:
World Wide Web: http://www.netlib.org/index.html
Anonymous ftp: ftp://ftp.netlib.org
Via email, send a one-line message:
mail netlib@ornl.gov
send index

to get started.

Complementing LAPACK and defining a very popular matrix computation
environment is MATLAB:

MATLAB User's Guide, The MathWorks Inc., Natick, Massachusetts.

M. Marcus (1993). Matrices and MATLAB: A Tutorial, Prentice Hall, Up-
per Saddle River, NJ.
R. Pratap (1995). Getting Started with MATLAB, Saunders College Pub-
lishing, Fort Worth, TX.

Many of the problems in Matrix Computations are best posed to students
as MATLAB problems. We make extensive use of MATLAB notation in the
presentation of algorithms.
Selected References

Each section in the book concludes with an annotated list of references.


A master bibliography is given at the end of the text.
Useful books that collectively cover the field are cited below. Chapter
titles are included if appropriate but do not infer too much from the level
of detail because one author's chapter may be another's subsection. The
citations are classified as follows:

Pre-1970 Classics. Early volumes that set the stage.

Introductory (General). Suitable for the undergraduate classroom.
Advanced (General). Best for practitioners and graduate students.
Analytical. For the supporting mathematics.
Linear Equation Problems. Ax = b.
Linear Fitting Problems. Ax ≈ b.
Eigenvalue Problems. Ax = λx.
High Performance. Parallel/vector issues.
Edited Volumes. Useful, thematic collections.

Within each group the entries are specified in chronological order.

Pre-1970 Classics
V.N. Faddeeva (1959). Computational Methods of Linear Algebra, Dover,
New York.
Basic Material from Linear Algebra. Systems of Linear Equations. The Proper
Numbers and Proper Vectors of a Matrix.

E. Bodewig (1959). Matrix Calculus, North Holland, Amsterdam.
Matrix Calculus. Direct Methods for Linear Equations. Indirect Methods for Linear
Equations. Inversion of Matrices. Geodetic Matrices. Eigenproblems.

R.S. Varga (1962). Matrix Iterative Analysis, Prentice-Hall, Englewood
Cliffs, NJ.
Matrix Properties and Concepts. Nonnegative Matrices. Basic Iterative Methods
and Comparison Theorems. Successive Overrelaxation Iterative Methods. Semi-
Iterative Methods. Derivation and Solution of Elliptic Difference Equations. Alter-
nating Direction Implicit Iterative Methods. Matrix Methods for Parabolic Partial
Differential Equations. Estimation of Acceleration Parameters.


J.H. Wilkinson (1963). Rounding Errors in Algebraic Processes, Prentice-
Hall, Englewood Cliffs, NJ.
The Fundamental Arithmetic Operations. Computations Involving Polynomials.
Matrix Computations.

A.S. Householder (1964). The Theory of Matrices in Numerical Analysis, Blais-
dell, New York. Reprinted in 1974 by Dover, New York.
Some Basic Identities and Inequalities. Norms, Bounds, and Convergence. Localiza-
tion Theorems and Other Inequalities. The Solution of Linear Systems: Methods of
Successive Approximation. Direct Methods of Inversion. Proper Values and Vectors:
Normalization and Reduction of the Matrix. Proper Values and Vectors: Successive
Approximation.

L. Fox (1964). An Introduction to Numerical Linear Algebra, Oxford Uni-
versity Press, Oxford, England.
Introduction, Matrix Algebra. Elimination Methods of Gauss, Jordan, and Aitken.
Compact Elimination Methods of Doolittle, Crout, Banachiewicz, and Cholesky.
Orthogonalization Methods. Condition, Accuracy, and Precision. Comparison of
Methods, Measure of Work. Iterative and Gradient Methods. Iterative Methods for
Latent Roots and Vectors. Transformation Methods for Latent Roots and Vectors.
Notes on Error Analysis for Latent Roots and Vectors.

J.H. Wilkinson (1965). The Algebraic Eigenvalue Problem, Clarendon Press,
Oxford, England.
Theoretical Background. Perturbation Theory. Error Analysis. Solution of Lin-
ear Algebraic Equations. Hermitian Matrices. Reduction of a General Matrix to
Condensed Form. Eigenvalues of Matrices of Condensed Forms. The LR and QR
Algorithms. Iterative Methods.

G.E. Forsythe and C. Moler (1967). Computer Solution of Linear Algebraic
Systems, Prentice-Hall, Englewood Cliffs, NJ.
Reader's Background and Purpose of Book. Vector and Matrix Norms. Diagonal
Form of a Matrix Under Orthogonal Equivalence. Proof of Diagonal Form Theorem.
Types of Computational Problems in Linear Algebra. Types of Matrices Encountered
in Practical Problems. Sources of Computational Problems of Linear Algebra.
Condition of a Linear System. Gaussian Elimination and LU Decomposition. Need
for Interchanging Rows. Scaling Equations and Unknowns. The Crout and Doolittle
Variants. Iterative Improvement. Computing the Determinant. Nearly Singular
Matrices. Algol 60 Program. Fortran, Extended Algol, and PL/1 Programs. Matrix
Inversion. An Example: Hilbert Matrices. Floating Point Round-Off Analysis.
Rounding Error in Gaussian Elimination. Convergence of Iterative Improvement.
Positive Definite Matrices; Band Matrices. Iterative Methods for Solving Linear
Systems. Nonlinear Systems of Equations.

Introductory (General)
A.R. Gourlay and G.A. Watson (1973). Computational Methods for Matrix
Eigenproblems, John Wiley & Sons, New York.
Introduction. Background Theory. Reductions and Transformations. Methods for
the Dominant Eigenvalue. Methods for the Subdominant Eigenvalue. Inverse It-
eration. Jacobi's Methods. Givens and Householder's Methods. Eigensystem of
a Symmetric Tridiagonal Matrix. The LR and QR Algorithms. Extensions of Ja-
cobi's Method. Extension of Givens' and Householder's Methods. QR Algorithm for
Hessenberg Matrices. Generalized Eigenvalue Problems. Available Implementations.

G.W. Stewart (1973). Introduction to Matrix Computations, Academic
Press, New York.
Preliminaries. Practicalities. The Direct Solution of Linear Systems. Norms, Lim-
its, and Condition Numbers. The Linear Least Squares Problem. Eigenvalues and
Eigenvectors. The QR Algorithm.

R.J. Gault, R.F. Hoskins, J.A. Milner and M.J. Pratt (1974). Computa-
tional Methods in Linear Algebra, John Wiley and Sons, New York.
Eigenvalues and Eigenvectors. Error Analysis. The Solution of Linear Equations by
Elimination and Decomposition Methods. The Solution of Linear Systems of Equa-
tions by Iterative Methods. Errors in the Solution Sets of Equations. Computation
of Eigenvalues and Eigenvectors. Errors in Eigenvalues and Eigenvectors. Appendix
- A Survey of Essential Results from Linear Algebra.

T.F. Coleman and C.F. Van Loan (1988). Handbook for Matrix Computa-
tions, SIAM Publications, Philadelphia, PA.
Fortran 77, The Basic Linear Algebra Subprograms, Linpack, MATLAB.

W.W. Hager (1988). Applied Numerical Linear Algebra, Prentice-Hall, En-
glewood Cliffs, NJ.
Introduction. Elimination Schemes. Conditioning. Nonlinear Systems. Least
Squares. Eigenproblems. Iterative Methods.

P.G. Ciarlet (1989). Introduction to Numerical Linear Algebra and Opti-
misation, Cambridge University Press.
A Summary of Results on Matrices. General Results in the Numerical Analysis of
Matrices. Sources of Problems in the Numerical Analysis of Matrices. Direct Meth-
ods for the Solution of Linear Systems. Iterative Methods for the Solution of Linear
Systems. Methods for the Calculation of Eigenvalues and Eigenvectors. A Review of
Differential Calculus. Some Applications. General Results on Optimization. Some
Algorithms. Introduction to Nonlinear Programming. Linear Programming.

D.S. Watkins (1991). Fundamentals of Matrix Computations, John Wiley
and Sons, New York.
Gaussian Elimination and Its Variants. Sensitivity of Linear Systems; Effects of
Roundoff Errors. Orthogonal Matrices and the Least Squares Problem. Eigenvalues
and Eigenvectors I. Eigenvalues and Eigenvectors II. Other Methods for the Sym-
metric Eigenvalue Problem. The Singular Value Decomposition.

P. Gill, W. Murray, and M.H. Wright (1991). Numerical Linear Algebra
and Optimization, Vol. 1, Addison-Wesley, Reading, MA.
Introduction. Linear Algebra Background. Computation and Condition. Linear
Equations. Compatible Systems. Linear Least Squares. Linear Constraints I: Linear
Programming. The Simplex Method.

A. Jennings and J.J. McKeowen (1992). Matrix Computation (2nd ed),
John Wiley and Sons, New York.
Basic Algebraic and Numerical Concepts. Some Matrix Problems. Computer Imple-
mentation. Elimination Methods for Linear Equations. Sparse Matrix Elimination.
Some Matrix Eigenvalue Problems. Transformation Methods for Eigenvalue Prob-
lems. Sturm Sequence Methods. Vector Iterative Methods for Partial Eigensolution.
Orthogonalization and Re-Solution Techniques for Linear Equations. Iterative Meth-
ods for Linear Equations. Non-linear Equations. Parallel and Vector Computing.

B.N. Datta (1995). Numerical Linear Algebra and Applications, Brooks/Cole
Publishing Company, Pacific Grove, California.
Review of Required Linear Algebra Concepts. Floating Point Numbers and Errors in
Computations. Stability of Algorithms and Conditioning of Problems. Numerically
Effective Algorithms and Mathematical Software. Some Useful Transformations in
Numerical Linear Algebra and Their Applications. Numerical Matrix Eigenvalue
Problems. The Generalized Eigenvalue Problem. The Singular Value Decomposition.
A Taste of Roundoff Error Analysis.

M.T. Heath (1997). Scientific Computing: An Introductory Survey, McGraw-
Hill, New York.
Scientific Computing. Systems of Linear Equations. Linear Least Squares. Eigen-
values and Singular Values. Nonlinear Equations. Optimization. Interpolation. Nu-
merical Integration and Differentiation. Initial Value Problems for ODEs. Boundary
Value Problems for ODEs. Partial Differential Equations. Fast Fourier Transform.
Random Numbers and Simulation.

C.F. Van Loan (1997). Introduction to Scientific Computing: A Matrix-
Vector Approach Using MATLAB, Prentice Hall, Upper Saddle River, NJ.
Power Tools of the Trade. Polynomial Interpolation. Piecewise Polynomial Interpo-
lation. Numerical Integration. Matrix Computations. Linear Systems. The QR and
Cholesky Factorizations. Nonlinear Equations and Optimization. The Initial Value
Problem.

Advanced (General)
N.J. Higham (1996). Accuracy and Stability of Numerical Algorithms,
SIAM Publications, Philadelphia, PA.
Principles of Finite Precision Computation. Floating Point Arithmetic. Basics.
Summation. Polynomials. Norms. Perturbation Theory for Linear Systems. Tri-
angular Systems. LU Factorization and Linear Equations. Cholesky Factorization.
Iterative Refinement. Block LU Factorization. Matrix Inversion. Condition Number
Estimation. The Sylvester Equation. Stationary Iterative Methods. Matrix Powers.
QR Factorization. The Least Squares Problem. Underdetermined Systems. Van-
dermonde Systems. Fast Matrix Multiplication. The Fast Fourier Transform and
Applications. Automatic Error Analysis. Software Issues in Floating Point Arith-
metic. A Gallery of Test Matrices.

J.W. Demmel (1996). Numerical Linear Algebra, SIAM Publications, Philadel-
phia, PA.
Introduction. Linear Equation Solving. Linear Least Squares Problems. Nonsym-
metric Eigenvalue Problems. The Symmetric Eigenproblem and Singular Value De-
composition. Iterative Methods for Linear Systems and Eigenvalue Problems. Iter-
ative Algorithms for Eigenvalue Problems.

L.N. Trefethen and D. Bau III (1997). Numerical Linear Algebra, SIAM
Publications, Philadelphia, PA.
Matrix-Vector Multiplication. Orthogonal Vectors and Matrices. Norms. The Sin-
gular Value Decomposition. More on the SVD. Projectors. QR Factorization. Gram-
Schmidt Orthogonalization. MATLAB. Householder Triangularization. Least-Squares
Problems. Conditioning and Condition Numbers. Floating Point Arithmetic. Stabil-
ity. More on Stability. Stability of Householder Triangularization. Stability of Back
Substitution. Conditioning of Least-Squares Problems. Stability of Least-Squares
Algorithms. Gaussian Elimination. Pivoting. Stability of Gaussian Elimination.
Cholesky Factorization. Eigenvalue Problems. Overview of Eigenvalue Algorithms.
Reduction to Hessenberg/Tridiagonal Form. Rayleigh Quotient, Inverse Iteration.
QR Algorithm Without Shifts. QR Algorithm With Shifts. Other Eigenvalue Al-
gorithms. Computing the SVD. Overview of Iterative Methods. The Arnoldi Itera-
tion. How Arnoldi Locates Eigenvalues. GMRES. The Lanczos Iteration. Orthogo-
nal Polynomials and Gauss Quadrature. Conjugate Gradients. Biorthogonalization
Methods. Preconditioning. The Definition of Numerical Analysis.

Analytical
F.R. Gantmacher (1959). The Theory of Matrices Vol. 1, Chelsea, New
York.
Matrices and Operations on Matrices. The Algorithm of Gauss and Some of its
Applications. Linear Operators in an n-dimensional Vector Space. The Character-
istic Polynomial and the Minimum Polynomial of a Matrix. Functions of Matrices.
Equivalent Transformations of Polynomial Matrices, Analytic Theory of Elementary
Divisors. The Structure of a Linear Operator in an n-dimensional Space. Matrix
Equations. Linear Operators in a Unitary Space. Quadratic and Hermitian Forms.

F.R. Gantmacher (1959). The Theory of Matrices Vol. 2, Chelsea, New
York.
Complex Symmetric, Skew-Symmetric, and Orthogonal Matrices. Singular Pencils
of Matrices. Matrices with Nonnegative Elements. Application of the Theory of Ma-
trices to the Investigation of Systems of Linear Differential Equations. The Problem
of Routh-Hurwitz and Related Questions.

A. Berman and R.J. Plemmons (1979). Nonnegative Matrices in the Math-
ematical Sciences, Academic Press, New York. Reprinted with additions
in 1994 by SIAM Publications, Philadelphia, PA.
Matrices Which Leave a Cone Invariant. Nonnegative Matrices. Semigroups of Non-
negative Matrices. Symmetric Nonnegative Matrices. Generalized Inverse-Positivity.
M-Matrices. Iterative Methods for Linear Systems. Finite Markov Chains. Input-
Output Analysis in Economics. The Linear Complementarity Problem.

G.W. Stewart and J. Sun (1990). Matrix Perturbation Theory, Academic
Press, San Diego.
Preliminaries. Norms and Metrics. Linear Systems and Least Squares Problems. The
Perturbation of Eigenvalues. Invariant Subspaces. Generalized Eigenvalue Problems.

R. Horn and C. Johnson (1985). Matrix Analysis, Cambridge University
Press, New York.
Review and Miscellanea. Eigenvalues, Eigenvectors, and Similarity. Unitary Equiv-
alence and Normal Matrices. Canonical Forms. Hermitian and Symmetric Matrices.
Norms for Vectors and Matrices. Location and Perturbation of Eigenvalues. Positive
Definite Matrices.

R. Horn and C. Johnson (1991). Topics in Matrix Analysis, Cambridge
University Press, New York.
The Field of Values. Stable Matrices and Inertia. Singular Value Inequalities. Ma-
trix Equations and the Kronecker Product. The Hadamard Product. Matrices and
Functions.

Linear Equation Problems


D.M. Young (1971). Iterative Solution of Large Linear Systems, Academic
Press, New York.
Introduction. Matrix Preliminaries. Linear Stationary Iterative Methods. Conver-
gence of the Basic Iterative Methods. Eigenvalues of the SOR Method for Con-
sistently Ordered Matrices. Determination of the Optimum Relaxation Parameter.
Norms of the SOR Method. The Modified SOR Method: Fixed Parameters. Nonsta-
tionary Linear Iterative Methods. The Modified SOR Method: Variable Parameters.
Semi-Iterative Methods. Extensions of the SOR Theory; Stieltjes Matrices. Gener-
alized Consistently Ordered Matrices. Group Iterative Methods. Symmetric SOR
Method and Related Methods. Second Degree Methods. Alternating Direction Im-
plicit Methods. Selection of an Iterative Method.

L.A. Hageman and D.M. Young (1981). Applied Iterative Methods, Aca-
demic Press, New York.
Background on Linear Algebra and Related Topics. Background on Basic Iterative
Methods. Polynomial Acceleration. Chebyshev Acceleration. An Adaptive Cheby-
shev Procedure Using Special Norms. Adaptive Chebyshev Acceleration. Conjugate
Gradient Acceleration. Special Methods for Red/Black Partitionings. Adaptive Pro-
cedures for Successive Overrelaxation Method. The Use of Iterative Methods in the
Solution of Partial Differential Equations. Case Studies. The Nonsymmetrizable
Case.

A. George and J. W-H. Liu (1981). Computer Solution of Large Sparse
Positive Definite Systems, Prentice-Hall Inc., Englewood Cliffs, New
Jersey.
Introduction. Fundamentals. Some Graph Theory Notation and Its Use in the
Study of Sparse Symmetric Matrices. Band and Envelope Methods. General Sparse
Methods. Quotient Tree Methods for Finite Element and Finite Difference Prob-
lems. One-Way Dissection Methods for Finite Element Problems. Nested Dissection
Methods. Numerical Experiments.

S. Pissanetsky (1984). Sparse Matrix Technology, Academic Press, New
York.
Fundamentals. Linear Algebraic Equations. Numerical Errors in Gaussian Elimi-
nation. Ordering for Gauss Elimination: Symmetric Matrices. Ordering for Gauss
Elimination: General Matrices. Sparse Eigenanalysis. Sparse Matrix Algebra. Con-
nectivity and Nodal Assembly. General Purpose Algorithms.

I.S. Duff, A.M. Erisman, and J.K. Reid (1986). Direct Methods for Sparse
Matrices, Oxford University Press, New York.
Introduction. Sparse Matrices: Storage Schemes and Simple Operations. Gaussian
Elimination for Dense Matrices: The Algebraic Problem. Gaussian Elimination
for Dense Matrices: Numerical Considerations. Gaussian Elimination for Sparse
Matrices: An Introduction. Reduction to Block Triangular Form. Local Pivotal
Strategies for Sparse Matrices. Ordering Sparse Matrices to Special Forms. Im-
plementing Gaussian Elimination: Analyse with Numerical Values. Implementing
Gaussian Elimination with Symbolic Analyse. Partitioning, Matrix Modification,
and Tearing. Other Sparsity-Oriented Issues.

R. Barrett, M. Berry, T.F. Chan, J. Demmel, J. Donato, J. Dongarra, V.
Eijkhout, R. Pozo, C. Romine, H. van der Vorst (1993). Templates for
the Solution of Linear Systems: Building Blocks for Iterative Methods,
SIAM Publications, Philadelphia, PA.
Introduction. Why Use Templates? What Methods are Covered? Iterative Methods.
Stationary Methods. Nonstationary Iterative Methods. Survey of Recent Krylov
Methods. Jacobi, Incomplete, SSOR, and Polynomial Preconditioners. Complex
Systems. Stopping Criteria. Data Structures. Parallelism. The Lanczos Connection.
Block Iterative Methods. Reduced System Preconditioning. Domain Decomposition
Methods. Multigrid Methods. Row Projection Methods.

W. Hackbusch (1994). Iterative Solution of Large Sparse Systems of Equa-
tions, Springer-Verlag, New York.
Introduction. Recapitulation of Linear Algebra. Iterative Methods. Methods of
Jacobi and Gauss-Seidel and SOR. Iteration in the Positive Definite Case. Analysis
in the 2-Cyclic Case. Analysis for M-Matrices. Semi-Iterative Methods. Transfor-
mations, Secondary Iterations, Incomplete Triangular Decompositions. Conjugate
Gradient Methods. Multi-Grid Methods. Domain Decomposition Methods.

O. Axelsson (1994). Iterative Solution Methods, Cambridge University
Press.
Direct Solution Methods. Theory of Matrix Eigenvalues. Positive Definite Matri-
ces, Schur Complements, and Generalized Eigenvalue Problems. Reducible and Irre-
ducible Matrices and the Perron-Frobenius Theory for Nonnegative Matrices. Basic
Iterative Methods and Their Rates of Convergence. M-Matrices, Convergent Split-
tings, and the SOR Method. Incomplete Factorization Preconditioning Methods.
Approximate Matrix Inverses and Corresponding Preconditioning Methods. Block
Diagonal and Schur Complement Preconditionings. Estimates of Eigenvalues and
Condition Numbers for Preconditioned Matrices. Conjugate Gradient and Lanczos-
Type Methods. Generalized Conjugate Gradient Methods. The Rate of Convergence
of the Conjugate Gradient Method.

Y. Saad (1996). Iterative Methods for Sparse Linear Systems, PWS Pub-
lishing Co., Boston.
Background in Linear Algebra. Discretization of PDEs. Sparse Matrices. Basic
Iterative Methods. Projection Methods. Krylov Subspace Methods - Part I. Krylov
Subspace Methods - Part II. Methods Related to the Normal Equations. Precon-
ditioned Iterations. Preconditioning Techniques. Parallel Implementations. Parallel
Preconditioners. Domain Decomposition Methods.

Linear Fitting Problems


C.L. Lawson and R.J. Hanson (1974). Solving Least Squares Problems,
Prentice-Hall, Englewood Cliffs, NJ. Reprinted with a detailed "new
developments" appendix in 1996 by SIAM Publications, Philadelphia,
PA.
Introduction. Analysis of the Least Squares Problem. Orthogonal Decomposition by
Certain Elementary Transformations. Orthogonal Decomposition by Singular Value
Decomposition. Perturbation Theorems for Singular Values. Bounds for the Con-
dition Number of a Triangular Matrix. The Pseudoinverse. Perturbation Bounds
for the Pseudoinverse. Perturbation Bounds for the Solution of Problem LS. Nu-
merical Computations Using Elementary Orthogonal Transformations. Computing
the Solution for the Overdetermined or Exactly Determined Full Rank Problem.
Computation of the Covariance Matrix of the Solution Parameters. Computing the
Solution for the Underdetermined Full Rank Problem. Computing the Solution for
Problem LS with Possibly Deficient Pseudorank. Analysis of Computing Errors for
Householder Transformations. Analysis of Computing Errors for the Problem LS.
Analysis of Computing Errors for the Problem LS Using Mixed Precision Arithmetic.
Computation of the Singular Value Decomposition and the Solution of Problem
LS. Other Methods for Least Squares Problems. Linear Least Squares with Linear
Equality Constraints Using a Basis of the Null Space. Linear Least Squares with
Linear Equality Constraints by Direct Elimination. Linear Least Squares with Lin-
ear Equality Constraints by Weighting. Linear Least Squares with Linear Inequality
Constraints. Modifying a QR Decomposition to Add or Remove Column Vectors.
Practical Analysis of Least Squares Problems. Examples of Some Methods of Ana-
lyzing a Least Squares Problem. Modifying a QR Decomposition to Add or Remove
Row Vectors with Application to Sequential Processing of Problems Having a Large
or Banded Coefficient Matrix.

R.W. Farebrother (1987). Linear Least Squares Computations, Marcel
Dekker, New York.
The Gauss and Gauss-Jordan Methods. Matrix Analysis of Gauss's Method: The
Cholesky and Doolittle Decompositions. The Linear Algebraic Model: The Method
of Averages and the Method of Least Squares. The Cauchy-Bienayme, Laplace,
and Schmidt Procedures. Householder Procedures. Givens Procedures. Updating
the QU Decomposition. Pseudorandom Numbers. The Standard Linear Model.
Condition Numbers. Instrumental Variable Estimators. Generalized Least Squares
Estimation. Iterative Solutions of Linear and Nonlinear Least Squares Problems.
Canonical Expressions for the Least Squares Estimators and Test Statistics. Tra-
ditional Expressions for the Least Squares Updating Formulas and Test Statistics.
Least Squares Estimation Subject to Linear Constraints.

S. Van Huffel and J. Vandewalle (1991). The Total Least Squares Problem:
Computational Aspects and Analysis, SIAM Publications, Philadelphia,
PA.
Introduction. Basic Principles of the Total Least Squares Problem. Extensions of the
Basic Total Least Squares Problem. Direct Speed Improvement of the Total Least
Squares Computations. Iterative Speed Improvement for Solving Slowly Varying
Total Least Squares Problems. Algebraic Connections Between Total Least Squares
and Least Squares Problems. Sensitivity Analysis of Total Least Squares and Least
Squares Problems in the Presence of Errors in All Data. Statistical Properties of the
Total Least Squares Problem. Algebraic Connections Between Total Least Squares
Estimation and Classical Linear Regression in Multicollinearity Problems. Conclu-
sions.

A. Bjorck (1996). Numerical Methods for Least Squares Problems, SIAM
Publications, Philadelphia, PA.
Mathematical and Statistical Properties of Least Squares Solutions. Basic Numerical
Methods. Modified Least Squares Problems. Generalized Least Squares Problems.
Constrained Least Squares Problems. Direct Methods for Sparse Least Squares Prob-
lems. Iterative Methods for Least Squares Problems. Least Squares with Special
Bases. Nonlinear Least Squares Problems.

Eigenvalue Problems
B.N. Parlett (1980). The Symmetric EigentJalue Problem. Prentice-Hall,
Englewood Cliffs, NJ.
Basic Facts about Self-Adjoint Matrices. Tasks, Obstacles, and Aids. Counting
Eigenvalues. Simple Vector Iterations. Deflation. Useful Orthogonal Matrices.
Tridiagonal Form. The QL and QR Algorithms. Jacobi Methods. Eigenvalue
Bounds. Approximation from a Subspace. Krylov Subspaces. Lanczos Algorithms.
Subspace Iteration. The General Linear Eigenvalue Problem.

J. Cullum and R.A. Willoughby (1985a). Lanczos Algorithms for Large
Symmetric Eigenvalue Computations, Vol. I Theory, Birkhäuser, Boston.
Preliminaries: Notation and Definitions. Real Symmetric Problems. Lanczos Pro-
cedures, Real Symmetric Problems. Tridiagonal Matrices. Lanczos Procedures with
No Reorthogonalization for Symmetric Problems. Real Rectangular Matrices. Non-
defective Complex Symmetric Matrices. Block Lanczos Procedures, Real Symmetric
Matrices.

J. Cullum and R.A. Willoughby (1985b). Lanczos Algorithms for Large
Symmetric Eigenvalue Computations, Vol. II Programs, Birkhäuser,
Boston.
Lanczos Procedures, Real Symmetric Matrices. Hermitian Matrices. Factored In-
verses of Real Symmetric Matrices. Real Symmetric Generalized Problems. Real
Rectangular Problems. Nondefective Complex Symmetric Matrices. Real Symmet-
ric Matrices, Block Lanczos Code. Factored Inverses, Real Symmetric Matrices,
Block Lanczos Code.

Y. Saad (1992). Numerical Methods for Large Eigenvalue Problems: Theory
and Algorithms, John Wiley and Sons, New York.
Background in Matrix Theory and Linear Algebra. Perturbation Theory and Er-
ror Analysis. The Tools of Spectral Approximation. Subspace Iteration. Krylov
Subspace Methods. Acceleration Techniques and Hybrid Methods. Precondition-
ing Techniques. Non-Standard Eigenvalue Problems. Origins of Matrix Eigenvalue
Problems.

F. Chatelin (1993). Eigenvalues of Matrices, John Wiley and Sons, New
York.
Supplements from Linear Algebra. Elements of Spectral Theory. Why Compute
Eigenvalues? Error Analysis. Foundations of Methods for Computing Eigenvalues.
Numerical Methods for Large Matrices. Chebyshev's Iterative Methods.

High Performance

W. Schonauer (1987). Scientific Computing on Vector Computers, North


Holland, Amsterdam.
Introduction. The First Commercially Significant Vector Computer. The Arithmetic
Performance of the First Commercially Significant Vector Computer. Hockney's
n_{1/2} and Timing Formulae. Fortran and Autovectorization. Behavior of Programs.
Some Basic Algorithms, Recurrences. Matrix Operations. Systems of Linear Equa-
tions with Full Matrices. Tridiagonal Linear Systems. The Iterative Solution of Lin-
ear Equations. Special Applications. The Fujitsu VPs and Other Japanese Vector
Computers. The Cray-2. The IBM VF and Other Vector Processors. The Convex C1.

R.W. Hockney and C.R. Jesshope (1988). Parallel Computers 2, Adam
Hilger, Bristol and Philadelphia.
Introduction. Pipelined Computers. Processor Arrays. Parallel Languages. Parallel
Algorithms. Future Developments.

J.J. Modi (1988). Parallel Algorithms and Matrix Computation, Oxford Uni-
versity Press, Oxford.
General Principles of Parallel Computing. Parallel Techniques and Algorithms. Par-
allel Sorting Algorithms. Solution of a System of Linear Algebraic Equations. The
Symmetric Eigenvalue Problem: Jacobi's Method. QR Factorization. Singular Value
Decomposition and Related Problems.

J. Ortega (1988). Introduction to Parallel and Vector Solution of Linear
Systems, Plenum Press, New York.
Introduction. Direct Methods for Linear Equations. Iterative Methods for Linear
Equations.

J. Dongarra, I. Duff, D. Sorensen, and H. van der Vorst (1990). Solving
Linear Systems on Vector and Shared Memory Computers, SIAM Pub-
lications, Philadelphia, PA.
Vector and Parallel Processing. Overview of Current High-Performance Comput-
ers. Implementation Details and Overhead. Performance Analysis, Modeling, and
Measurements. Building Blocks in Linear Algebra. Direct Solution of Sparse Linear
Systems. Iterative Solution of Sparse Linear Systems.

Y. Robert (1990). The Impact of Vector and Parallel Architectures on the
Gaussian Elimination Algorithm, Halsted Press, New York.
Introduction. Vector and Parallel Architectures. Vector Multiprocessor Computing.
Hypercube Computing. Systolic Computing. Task Graph Scheduling. Analysis of
Distributed Algorithms. Design Methodologies.

G.H. Golub and J.M. Ortega (1993). Scientific Computing: An Introduc-
tion with Parallel Computing, Academic Press, Boston.
The World of Scientific Computing. Linear Algebra. Parallel and Vector Computing.
Polynomial Approximation. Continuous Problems Solved Discretely. Direct Solu-
tion of Linear Equations. Parallel Direct Methods. Iterative Methods. Conjugate
Gradient-Type Methods.

Edited Volumes
D.J. Rose and R.A. Willoughby, eds. (1972). Sparse Matrices and Their
Applications, Plenum Press, New York.

J.R. Bunch and D.J. Rose, eds. (1976). Sparse Matrix Computations,
Academic Press, New York.

I.S. Duff and G.W. Stewart, eds. (1979). Sparse Matrix Proceedings, 1978,
SIAM Publications, Philadelphia, PA.

I.S. Duff, ed. (1981). Sparse Matrices and Their Uses, Academic Press,
New York.

A. Bjorck, R.J. Plemmons, and H. Schneider, eds. (1981). Large-Scale
Matrix Problems, North-Holland, New York.

G. Rodrigue, ed. (1982). Parallel Computation, Academic Press, New


York.

B. Kågström and A. Ruhe, eds. (1983). Matrix Pencils, Proc. Pite Havs-
bad, 1982, Lecture Notes in Mathematics 973, Springer-Verlag, New
York and Berlin.

J. Cullum and R.A. Willoughby, eds. (1986). Large Scale Eigenvalue Prob-
lems, North-Holland, Amsterdam.

A. Wouk, ed. (1986). New Computing Environments: Parallel, Vector, and
Systolic, SIAM Publications, Philadelphia, PA.

M.T. Heath, ed. (1986). Proceedings of First SIAM Conference on Hyper-
cube Multiprocessors, SIAM Publications, Philadelphia, PA.

M.T. Heath, ed. (1987). Hypercube Multiprocessors, SIAM Publications,
Philadelphia, PA.

G. Fox, ed. (1988). The Third Conference on Hypercube Concurrent Com-
puters and Applications, Vol. II - Applications, ACM Press, New York.

M.H. Schultz, ed. (1988). Numerical Algorithms for Modern Parallel Com-
puter Architectures, IMA Volumes in Mathematics and Its Applications,
Number 13, Springer-Verlag, Berlin.

E.F. Deprettere, ed. (1988). SVD and Signal Processing. Elsevier, Ams-
terdam.

B.N. Datta, C.R. Johnson, M.A. Kaashoek, R. Plemmons, and E.D. Son-
tag, eds. (1988). Linear Algebra in Signals, Systems, and Control, SIAM
Publications, Philadelphia, PA.

J. Dongarra, I. Duff, P. Gaffney, and S. McKee, eds. (1989), Vector and


Parallel Computing, Ellis Horwood, Chichester, England.

O. Axelsson, ed. (1989). "Preconditioned Conjugate Gradient Methods,"
BIT 29:4.

K. Gallivan, M. Heath, E. Ng, J. Ortega, B. Peyton, R. Plemmons, C.
Romine, A. Sameh, and B. Voigt (1990). Parallel Algorithms for Matrix
Computations, SIAM Publications, Philadelphia, PA.

G.H. Golub and P. Van Dooren, eds. (1991). Numerical Linear Alge-
bra, Digital Signal Processing, and Parallel Algorithms, Springer-Verlag,
Berlin.

R. Vaccaro, ed. (1991). SVD and Signal Processing II: Algorithms, Analy-
sis, and Applications. Elsevier, Amsterdam.

R. Beauwens and P. de Groen, eds. (1992). Iterative Methods in Linear
Algebra, Elsevier (North-Holland), Amsterdam.

R.J. Plemmons and C.D. Meyer, eds. (1993). Linear Algebra, Markov
Chains, and Queuing Models, Springer-Verlag, New York.

M.S. Moonen, G.H. Golub, and B.L.R. de Moor, eds. (1993). Linear
Algebra for Large Scale and Real-Time Applications, Kluwer, Dordrecht,
The Netherlands.

J.D. Brown, M.T. Chu, D.C. Ellison, and R.J. Plemmons, eds. (1994). Pro-
ceedings of the Cornelius Lanczos International Centenary Conference,
SIAM Publications, Philadelphia, PA.

R.V. Patel, A.J. Laub, and P.M. Van Dooren, eds. (1994). Numerical
Linear Algebra Techniques for Systems and Control, IEEE Press, Pis-
cataway, New Jersey.

J. Lewis, ed. (1994). Proceedings of the Fifth SIAM Conference on Applied
Linear Algebra, SIAM Publications, Philadelphia, PA.

A. Bojanczyk and G. Cybenko, eds. (1995). Linear Algebra for Signal
Processing, IMA Volumes in Mathematics and Its Applications, Springer-
Verlag, New York.

M. Moonen and B. De Moor, eds. (1995). SVD and Signal Processing III:
Algorithms, Analysis, and Applications, Elsevier, Amsterdam.
Chapter 1

Matrix Multiplication
Problems

§1.1 Basic Algorithms and Notation


§1.2 Exploiting Structure
§1.3 Block Matrices and Algorithms
§1.4 Vectorization and Re-Use Issues

The proper study of matrix computations begins with the study of the
matrix-matrix multiplication problem. Although this problem is simple
mathematically it is very rich from the computational point of view. We
begin in §1.1 by looking at the several ways that the matrix multiplica-
tion problem can be organized. The "language" of partitioned matrices
is established and used to characterize several linear algebraic "levels" of
computation.
If a matrix has structure, then it is usually possible to exploit it. For
example, a symmetric matrix can be stored in half the space as a general
matrix. A matrix-vector product that involves a matrix with many zero
entries may require much less time to execute than a full matrix times a
vector. These matters are discussed in §1.2.
In §1.3 block matrix notation is established. A block matrix is a matrix
with matrix entries. This concept is very important from the standpoint of
both theory and practice. On the theoretical side, block matrix notation
allows us to prove important matrix factorizations very succinctly. These
factorizations are the cornerstone of numerical linear algebra. From the
computational point of view, block algorithms are important because they
are rich in matrix multiplication, the operation of choice for many new high
performance computer architectures.
These new architectures require the algorithm designer to pay as much
attention to memory traffic as to the actual amount of arithmetic. This
aspect of scientific computation is illustrated in §1.4 where the critical is-
sues of vector pipeline computing are discussed: stride, vector length, the
number of vector loads and stores, and the level of vector re-use.

Before You Begin


It is important to be familiar with the MATLAB language. See the
texts by Pratap (1995) and Van Loan (1996). A richer introduction to high
performance matrix computations is given in Dongarra, Duff, Sorensen, and
van der Vorst (1991). This chapter's LAPACK connections include

LAPACK: Some General Operations

    _SCAL     x ← ax                         Vector scale
    _DOT      μ ← x^T y                      Dot product
    _AXPY     y ← ax + y                     Saxpy
    _GEMV     y ← αAx + βy                   Matrix-vector multiplication
    _GER      A ← A + αxy^T                  Rank-1 update
    _GEMM     C ← αAB + βC                   Matrix multiplication

LAPACK: Some Symmetric Operations

    _SYMV     y ← αAx + βy                   Matrix-vector multiplication
    _SPMV     y ← αAx + βy                   Matrix-vector multiplication (packed)
    _SYR      A ← αxx^T + A                  Rank-1 update
    _SYR2     A ← αxy^T + αyx^T + A          Rank-2 update
    _SYRK     C ← αAA^T + βC                 Rank-k update
    _SYR2K    C ← αAB^T + αBA^T + βC         Rank-2k update
    _SYMM     C ← αAB + βC (or αBA + βC)     Symmetric/General product

LAPACK: Some Band/Triangular Operations

    _GBMV     y ← αAx + βy                   General band
    _SBMV     y ← αAx + βy                   Symmetric band
    _TRMV     x ← Ax                         Triangular
    _TPMV     x ← Ax                         Triangular packed
    _TRMM     B ← αAB (or αBA)               Triangular/General product

1.1 Basic Algorithms and Notation


Matrix computations are built upon a hierarchy of linear algebraic opera-
tions. Dot products involve the scalar operations of addition and multipli-
cation. Matrix-vector multiplication is made up of dot products. Matrix-
matrix multiplication amounts to a collection of matrix-vector products.
All of these operations can be described in algorithmic form or in the lan-
guage of linear algebra. Our primary objective in this section is to show
how these two styles of expression complement each other. Along the way
we pick up notation and acquaint the reader with the kind of thinking that
underpins the matrix computation area. The discussion revolves around
the matrix multiplication problem, a computation that can be organized in
several ways.

1.1.1 Matrix Notation


Let R denote the set of real numbers. We denote the vector space of all
m-by-n real matrices by R^{m×n}:

    A ∈ R^{m×n}  ⟺  A = (a_ij) = [ a_11 ... a_1n ]
                                 [   :         :  ]      a_ij ∈ R.
                                 [ a_m1 ... a_mn ],

If a capital letter is used to denote a matrix (e.g. A, B, Δ), then the
corresponding lower case letter with subscript ij refers to the (i,j) entry
(e.g., a_ij, b_ij, δ_ij). As appropriate, we also use the notation [A]_ij and
A(i,j) to designate the matrix elements.

1.1.2 Matrix Operations


Basic matrix operations include transposition (R^{m×n} → R^{n×m}),

    C = A^T  ⟹  c_ij = a_ji,

addition (R^{m×n} × R^{m×n} → R^{m×n}),

    C = A + B  ⟹  c_ij = a_ij + b_ij,

scalar-matrix multiplication (R × R^{m×n} → R^{m×n}),

    C = αA  ⟹  c_ij = α a_ij,

and matrix-matrix multiplication (R^{m×p} × R^{p×n} → R^{m×n}),

    C = AB  ⟹  c_ij = Σ_{k=1}^{p} a_ik b_kj.

These are the building blocks of matrix computations.



1.1.3 Vector Notation


Let R^n denote the vector space of real n-vectors:

    x ∈ R^n  ⟺  x = [ x_1 ]
                    [  :  ]      x_i ∈ R.
                    [ x_n ],

We refer to x_i as the ith component of x. Depending upon context, the
alternative notations [x]_i and x(i) are sometimes used.
Notice that we are identifying R^n with R^{n×1} and so the members of
R^n are column vectors. On the other hand, the elements of R^{1×n} are row
vectors:

    x ∈ R^{1×n}  ⟺  x = (x_1, ..., x_n).

If x is a column vector, then y = x^T is a row vector.

1.1.4 Vector Operations


Assume a ∈ R, x ∈ R^n, and y ∈ R^n. Basic vector operations include scalar-
vector multiplication,

    z = ax  ⟹  z_i = ax_i,

vector addition,

    z = x + y  ⟹  z_i = x_i + y_i,

the dot product (or inner product),

    c = x^T y  ⟹  c = Σ_{i=1}^{n} x_i y_i,

and vector multiply (or the Hadamard product),

    z = x.*y  ⟹  z_i = x_i y_i.

Another very important operation which we write in "update form" is the
saxpy:

    y = ax + y  ⟹  y_i = ax_i + y_i

Here, the symbol "=" is being used to denote assignment, not mathematical
equality. The vector y is being updated. The name "saxpy" is used in
LAPACK, a software package that implements many of the algorithms in
this book. One can think of "saxpy" as a mnemonic for "scalar a x plus
y."

1.1.5 The Computation of Dot Products and Saxpys


We have chosen to express algorithms in a stylized version of the MATLAB
language. MATLAB is a powerful interactive system that is ideal for matrix
computation work. We gradually introduce our stylized MATLAB notation
in this chapter beginning with an algorithm for computing dot products.

Algorithm 1.1.1 (Dot Product) If x, y ∈ R^n, then this algorithm com-
putes their dot product c = x^T y.

    c = 0
    for i = 1:n
        c = c + x(i)y(i)
    end

The dot product of two n-vectors involves n multiplications and n additions.
It is an "O(n)" operation, meaning that the amount of work is linear in
the dimension. The saxpy computation is also an O(n) operation, but it
returns a vector instead of a scalar.

Algorithm 1.1.2 (Saxpy) If x, y ∈ R^n and a ∈ R, then this algorithm
overwrites y with ax + y.

    for i = 1:n
        y(i) = ax(i) + y(i)
    end

It must be stressed that the algorithms in this book are encapsulations of
critical computational ideas and not "production codes."
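As a point of reference (our own illustration, not part of the text), the two
algorithms above can be typed directly into MATLAB; the test data below is
arbitrary:

    % Dot product and saxpy on sample data (Algorithms 1.1.1 and 1.1.2).
    n = 4;
    x = [1; 2; 3; 4];  y = [4; 3; 2; 1];  a = 2;
    c = 0;
    for i = 1:n
        c = c + x(i)*y(i);        % accumulates x'*y, here c = 20
    end
    for i = 1:n
        y(i) = a*x(i) + y(i);     % overwrites y with a*x + y
    end

In production MATLAB one would simply write c = x'*y and y = a*x + y; the
loops are shown only to match the algorithmic descriptions.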

1.1.6 Matrix-Vector Multiplication and the Gaxpy


Suppose A ∈ R^{m×n} and that we wish to compute the update

    y = Ax + y

where x ∈ R^n and y ∈ R^m are given. This generalized saxpy operation is
referred to as a gaxpy. A standard way that this computation proceeds is
to update the components one at a time:

    y_i = Σ_{j=1}^{n} a_ij x_j + y_i,     i = 1:m.

This gives the following algorithm.

Algorithm 1.1.3 (Gaxpy: Row Version) If A ∈ R^{m×n}, x ∈ R^n, and
y ∈ R^m, then this algorithm overwrites y with Ax + y.

    for i = 1:m
        for j = 1:n
            y(i) = A(i,j)x(j) + y(i)
        end
    end
An alternative algorithm results if we regard Ax as a linear combination of
A's columns, e.g.,

    [ 1 2 ] [ 7 ]     [ 1·7 + 2·8 ]        [ 1 ]       [ 2 ]     [ 23 ]
    [ 3 4 ] [ 8 ]  =  [ 3·7 + 4·8 ]  =  7 [ 3 ]  +  8 [ 4 ]  =  [ 53 ]
    [ 5 6 ]           [ 5·7 + 6·8 ]        [ 5 ]       [ 6 ]     [ 83 ]

Algorithm 1.1.4 (Gaxpy: Column Version) If A ∈ R^{m×n}, x ∈ R^n,
and y ∈ R^m, then this algorithm overwrites y with Ax + y.

    for j = 1:n
        for i = 1:m
            y(i) = A(i,j)x(j) + y(i)
        end
    end

Note that the inner loop in either gaxpy algorithm carries out a saxpy
operation. The column version was derived by rethinking what matrix-
vector multiplication "means" at the vector level, but it could also have
been obtained simply by interchanging the order of the loops in the row
version. In matrix computations, it is important to relate loop interchanges
to the underlying linear algebra.
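To make the distinction concrete, here is a small MATLAB experiment of our own
using the 3-by-2 data from the example above; the row version gives the same
result with the loops interchanged:

    A = [1 2; 3 4; 5 6];  x = [7; 8];  y = zeros(3,1);
    [m, n] = size(A);
    for j = 1:n                  % column version (Algorithm 1.1.4):
        for i = 1:m              % the inner loop is a saxpy with A(:,j)
            y(i) = A(i,j)*x(j) + y(i);
        end
    end
    % y is now [23; 53; 83], i.e. A*x, in agreement with the display above.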

1.1.7 Partitioning a Matrix into Rows and Columns


Algorithms 1.1.3 and 1.1.4 access the data in A by row and by column
respectively. To highlight these orientations more clearly we introduce the
language of partitioned matrices.
From the row point of view, a matrix is a stack of row vectors:

    A ∈ R^{m×n}  ⟺  A = [ r_1^T ]
                        [   :   ]      r_k ∈ R^n.                    (1.1.1)
                        [ r_m^T ],

This is called a row partition of A. Thus, if we row partition

    A = [ 1 2 ]
        [ 3 4 ]
        [ 5 6 ]

then we are choosing to think of A as a collection of rows with

    r_1^T = [ 1 2 ],   r_2^T = [ 3 4 ],   and   r_3^T = [ 5 6 ].

With the row partitioning (1.1.1) Algorithm 1.1.3 can be expressed as fol-
lows:

    for i = 1:m
        y(i) = r_i^T x + y(i)
    end

Alternatively, a matrix is a collection of column vectors:

    A ∈ R^{m×n}  ⟺  A = [ c_1, ..., c_n ],   c_k ∈ R^m.              (1.1.2)

We refer to this as a column partition of A. In the 3-by-2 example above, we
thus would set c_1 and c_2 to be the first and second columns of A respectively:

    c_1 = [ 1 ]        c_2 = [ 2 ]
          [ 3 ],             [ 4 ].
          [ 5 ]              [ 6 ]

With (1.1.2) we see that Algorithm 1.1.4 is a saxpy procedure that accesses
A by columns:

    for j = 1:n
        y = x(j)c_j + y
    end

In this context appreciate y as a running vector sum that undergoes re-
peated saxpy updates.

1.1.8 The Colon Notation


A handy way to specify a column or row of a matrix is with the "colon"
notation. If A ∈ R^{m×n}, then A(k,:) designates the kth row, i.e.,

    A(k,:) = [ a_k1, ..., a_kn ].

The kth column is specified by

    A(:,k) = [ a_1k ]
             [   :  ]
             [ a_mk ].

With these conventions we can rewrite Algorithms 1.1.3 and 1.1.4 as

    for i = 1:m
        y(i) = A(i,:)x + y(i)
    end

and

    for j = 1:n
        y = x(j)A(:,j) + y
    end
respectively. With the colon notation we are able to suppress iteration
details. This frees us to think at the vector level and focus on larger com-
putational issues.
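For readers who want to experiment, the colon notation is live MATLAB syntax.
The following sketch (our own, with the same 3-by-2 data) exercises both forms:

    A = [1 2; 3 4; 5 6];  x = [7; 8];  y = zeros(3,1);
    for i = 1:size(A,1)
        y(i) = A(i,:)*x + y(i);    % row-oriented gaxpy, one dot product per i
    end
    for j = 1:size(A,2)
        y = x(j)*A(:,j) + y;       % column-oriented gaxpy, one saxpy per j
    end
    % After both loops y = 2*(A*x) = [46; 106; 166].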

1.1.9 The Outer Product Update


As a preliminary application of the colon notation, we use it to understand
the outer product update

    A = A + xy^T,     x ∈ R^m,  y ∈ R^n.

The outer product operation xy^T "looks funny" but is perfectly legal, e.g.,

    [ 1 ]             [  4  5 ]
    [ 2 ] [ 4 5 ]  =  [  8 10 ].
    [ 3 ]             [ 12 15 ]

This is because xy^T is the product of two "skinny" matrices and the number
of columns in the left matrix x equals the number of rows in the right matrix
y^T. The entries in the outer product update are prescribed by

    for i = 1:m
        for j = 1:n
            A(i,j) = A(i,j) + x(i)y(j)
        end
    end

The mission of the j loop is to add a multiple of y^T to the ith row of A,
i.e.,

    for i = 1:m
        A(i,:) = A(i,:) + x(i)y^T
    end

On the other hand, if we make the i-loop the inner loop, then its task is to
add a multiple of x to the jth column of A:

    for j = 1:n
        A(:,j) = A(:,j) + y(j)x
    end

Note that both outer product algorithms amount to a set of saxpy updates.
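As a quick check (an illustration of ours, not from the text), the row-oriented
version of the rank-1 update reproduces the small example in MATLAB:

    A = zeros(3,2);  x = [1; 2; 3];  y = [4; 5];
    for i = 1:length(x)
        A(i,:) = A(i,:) + x(i)*y';     % add a multiple of y^T to row i
    end
    % A now equals x*y' = [4 5; 8 10; 12 15], the matrix displayed above.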

1.1.10 Matrix-Matrix Multiplication


Consider the 2-by-2 matrix-matrix multiplication AB. In the dot product
formulation each entry is computed as a dot product:

    [ 1 2 ] [ 5 6 ]     [ 1·5 + 2·7   1·6 + 2·8 ]
    [ 3 4 ] [ 7 8 ]  =  [ 3·5 + 4·7   3·6 + 4·8 ].

In the saxpy version each column in the product is regarded as a linear
combination of columns of A:

    [ 1 2 ] [ 5 6 ]  =  [ 5[1;3] + 7[2;4]    6[1;3] + 8[2;4] ].
    [ 3 4 ] [ 7 8 ]

Finally, in the outer product version, the result is regarded as the sum of
outer products:

    [ 1 2 ] [ 5 6 ]     [ 1 ]             [ 2 ]
    [ 3 4 ] [ 7 8 ]  =  [ 3 ] [ 5 6 ]  +  [ 4 ] [ 7 8 ].

Although equivalent mathematically, it turns out that these versions of


matrix multiplication can have very different levels of performance because
of their memory traffic properties. This matter is pursued in §1.4. For now,
it is worth detailing the above three approaches to matrix multiplication
because it gives us a chance to review notation and to practice thinking at
different linear algebraic levels.

1.1.11 Scalar-Level Specifications


To fix the discussion we focus on the following matrix multiplication update:

    C = AB + C
The starting point is the familiar triply-nested loop algorithm:

Algorithm 1.1.5 (Matrix Multiplication: ijk Variant) If A ∈ R^{m×p},
B ∈ R^{p×n}, and C ∈ R^{m×n} are given, then this algorithm overwrites C with
AB + C.

    for i = 1:m
        for j = 1:n
            for k = 1:p
                C(i,j) = A(i,k)B(k,j) + C(i,j)
            end
        end
    end

This is the "ijk variant" because we identify the rows of C (and A) with i,
the columns of C (and B) with j, and the summation index with k.
We consider the update C = AB + C instead of just C = AB for two
reasons. We do not have to bother with C = 0 initializations and updates
of the form C = AB + C arise more frequently in practice.
The three loops in the matrix multiplication update can be arbitrarily
ordered giving 3! = 6 variations. Thus,

    for j = 1:n
        for k = 1:p
            for i = 1:m
                C(i,j) = A(i,k)B(k,j) + C(i,j)
            end
        end
    end

is the jki variant. Each of the six possibilities (ijk, jik, ikj, jki, kij,
kji) features an inner loop operation (dot product or saxpy) and has its
own pattern of data flow. For example, in the ijk variant, the inner loop
oversees a dot product that requires access to a row of A and a column of
B. The jki variant involves a saxpy that requires access to a column of C
and a column of A. These attributes are summarized in Table 1.1.1 along
with an interpretation of what is going on when the middle and inner loop
are considered together. Each variant involves the same amount of floating

Loop     Inner     Middle                   Inner Loop
Order    Loop      Loop                     Data Access
ijk      dot       vector x matrix          A by row, B by column
jik      dot       matrix x vector          A by row, B by column
ikj      saxpy     row gaxpy                B by row, C by row
jki      saxpy     column gaxpy             A by column, C by column
kij      saxpy     row outer product        B by row, C by row
kji      saxpy     column outer product     A by column, C by column

TABLE 1.1.1. Matrix Multiplication: Loop Orderings and Properties

point arithmetic, but accesses the A, B, and C data differently.
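As a sanity check of these loop reorderings (our own MATLAB sketch on random
data), the jki variant below produces the same result as the built-in product:

    m = 4; p = 3; n = 5;
    A = randn(m,p);  B = randn(p,n);  C = randn(m,n);
    C0 = C;                          % keep the original C for the check
    for j = 1:n
        for k = 1:p
            for i = 1:m              % inner loop: saxpy on column j of C
                C(i,j) = A(i,k)*B(k,j) + C(i,j);
            end
        end
    end
    err = norm(C - (A*B + C0))       % should be at roundoff level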

1.1.12 A Dot Product Formulation


The usual matrix multiplication procedure regards AB as an array of dot
products to be computed one at a time in left-to-right, top-to-bottom order.
This is the idea behind Algorithm 1.1.5. Using the colon notation we can
highlight this dot-product formulation:

Algorithm 1.1.6 (Matrix Multiplication: Dot Product Version)


If A ∈ R^{m×p}, B ∈ R^{p×n}, and C ∈ R^{m×n} are given, then this algorithm
overwrites C with AB + C.

    for i = 1:m
        for j = 1:n
            C(i,j) = A(i,:)B(:,j) + C(i,j)
        end
    end
In the language of partitioned matrices, if

    A = [ a_1^T ]
        [   :   ]      and      B = [ b_1, ..., b_n ]
        [ a_m^T ]

then Algorithm 1.1.6 has this interpretation:

    for i = 1:m
        for j = 1:n
            c_ij = a_i^T b_j + c_ij
        end
    end

Note that the "mission" of the j-loop is to compute the ith row of the
update. To emphasize this we could write

    for i = 1:m
        c_i^T = a_i^T B + c_i^T
    end

where

    C = [ c_1^T ]
        [   :   ]
        [ c_m^T ]

is a row partitioning of C. To say the same thing with the colon notation
we write

    for i = 1:m
        C(i,:) = A(i,:)B + C(i,:)
    end

Either way we see that the inner two loops of the ijk variant define a
row-oriented gaxpy operation.

1.1.13 A Saxpy Formulation


Suppose A and C are column-partitioned as follows

    A = [ a_1, ..., a_p ]
    C = [ c_1, ..., c_n ].

By comparing jth columns in C = AB + C we see that

    c_j = Σ_{k=1}^{p} b_kj a_k + c_j,     j = 1:n.

These vector sums can be put together with a sequence of saxpy updates.

Algorithm 1.1.7 (Matrix Multiplication: Saxpy Version) If the matrices A ∈ R^{m×p}, B ∈ R^{p×n}, and C ∈ R^{m×n} are given, then this algorithm overwrites C with AB + C.

for j = 1:n
    for k = 1:p
        C(:,j) = A(:,k)B(k,j) + C(:,j)
    end
end
Note that the k-loop oversees a gaxpy operation:

for j = 1:n
    C(:,j) = AB(:,j) + C(:,j)
end
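The saxpy formulation translates directly into array code. The following Python/NumPy sketch (our own illustration, with a made-up function name) performs the column-oriented update and verifies it against AB + C.

    import numpy as np

    def update_saxpy(A, B, C):
        # saxpy version: column j of C is built up by scalar-times-vector updates
        p = A.shape[1]
        n = B.shape[1]
        for j in range(n):
            for k in range(p):
                C[:, j] += B[k, j] * A[:, k]
        return C

    A = np.random.rand(4, 3)
    B = np.random.rand(3, 5)
    C = np.random.rand(4, 5)
    assert np.allclose(update_saxpy(A, B, C.copy()), A @ B + C)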

1.1.14 An Outer Product Formulation


Consider the kji variant of Algorithm 1.1.5:

for k = 1:p
    for j = 1:n
        for i = 1:m
            C(i,j) = A(i,k)B(k,j) + C(i,j)
        end
    end
end

The inner two loops oversee the outer product update

    C = a_k b_k^T + C

where

    A = [ a_1, ..., a_p ]    and    B = [ b_1^T ]                (1.1.3)
                                        [  ...  ]
                                        [ b_p^T ]

with a_k ∈ R^m and b_k ∈ R^n. We therefore obtain

Algorithm 1.1.8 (Matrix Multiplication: Outer Product Version) If A ∈ R^{m×p}, B ∈ R^{p×n}, and C ∈ R^{m×n} are given, then this algorithm overwrites C with AB + C.

for k = 1:p
    C = A(:,k)B(k,:) + C
end
This implementation revolves around the fact that AB is the sum of p outer
products.
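As a companion illustration (ours, not from the text), the outer product version can be sketched in Python/NumPy as a sum of p rank-1 updates:

    import numpy as np

    def update_outer(A, B, C):
        # outer product version: AB accumulated as p rank-1 updates
        p = A.shape[1]
        for k in range(p):
            C += np.outer(A[:, k], B[k, :])
        return C

    A = np.random.rand(4, 3)
    B = np.random.rand(3, 5)
    C = np.random.rand(4, 5)
    assert np.allclose(update_outer(A, B, C.copy()), A @ B + C)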

1.1.15 The Notion of "Level"


The dot product and saxpy operations are examples of "level-1" operations. Level-1 operations involve an amount of data and an amount of arithmetic that is linear in the dimension of the operation. An m-by-n outer product update or gaxpy operation involves a quadratic amount of data (O(mn)) and a quadratic amount of work (O(mn)). They are examples of "level-2" operations.
The matrix update C = AB + C is a "level-3" operation. Level-3 operations involve a quadratic amount of data and a cubic amount of work. If A, B, and C are n-by-n matrices, then C = AB + C involves O(n²) matrix entries and O(n³) arithmetic operations.
The design of matrix algorithms that are rich in high-level linear algebra operations is a recurring theme in the book. For example, a high performance linear equation solver may require a level-3 organization of Gaussian elimination. This requires some algorithmic rethinking because that method is usually specified in level-1 terms, e.g., "multiply row 1 by a scalar and add the result to row 2."

1.1.16 A Note on Matrix Equations


In striving to understand matrix multiplication via outer products, we essentially established the matrix equation

    AB = Σ_{k=1}^{p} a_k b_k^T

where the a_k and b_k are defined by the partitionings in (1.1.3).

Numerous matrix equations are developed in subsequent chapters. Sometimes they are established algorithmically, like the above outer product expansion, and other times they are proved at the ij-component level. As an example of the latter, we prove an important result that characterizes transposes of products.

Theorem 1.1.1 If A ∈ R^{m×p} and B ∈ R^{p×n}, then (AB)^T = B^T A^T.

Proof. If C = (AB)^T, then

    c_{ij} = [(AB)^T]_{ij} = [AB]_{ji} = Σ_{k=1}^{p} a_{jk} b_{ki}.

On the other hand, if D = B^T A^T, then

    d_{ij} = [B^T A^T]_{ij} = Σ_{k=1}^{p} [B^T]_{ik} [A^T]_{kj} = Σ_{k=1}^{p} b_{ki} a_{jk}.

Since c_{ij} = d_{ij} for all i and j, it follows that C = D. □

Scalar-level proofs such as this one are usually not very insightful. However,
they are sometimes the only way to proceed.

1.1.17 Complex Matrices


From time to time computations that involve complex matrices are discussed. The vector space of m-by-n complex matrices is designated by C^{m×n}. The scaling, addition, and multiplication of complex matrices corresponds exactly to the real case. However, transposition becomes conjugate transposition:

    C = A^H    ⟹    c_{ij} = ā_{ji}.

The vector space of complex n-vectors is designated by C^n. The dot product of complex n-vectors x and y is prescribed by

    s = x^H y = Σ_{i=1}^{n} x̄_i y_i.

Finally, if A = B + iC ∈ C^{m×n}, then we designate the real and imaginary parts of A by Re(A) = B and Im(A) = C respectively.
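A short Python/NumPy illustration of these conventions (our own example data) may be helpful; note that np.vdot conjugates its first argument, matching the x^H y dot product described above.

    import numpy as np

    A = np.array([[1 + 2j, 3 - 1j],
                  [0 + 1j, 2 + 0j]])

    # Conjugate transposition: (A^H)_{ij} = conj(a_{ji})
    AH = A.conj().T

    # Complex dot product x^H y
    x = np.array([1 + 1j, 2 - 3j])
    y = np.array([4 + 0j, 1 + 2j])
    s = np.vdot(x, y)                       # conjugates the first argument
    assert np.isclose(s, np.sum(x.conj() * y))

    # Real and imaginary parts: A = Re(A) + i*Im(A)
    assert np.allclose(A, A.real + 1j * A.imag)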

Problems

P1.1.1 Suppose A ∈ R^{n×n} and z ∈ R^n are given. Give a saxpy algorithm for computing the first column of M = (A - z_1 I)···(A - z_n I).

P1.1.2 In the conventional 2-by-2 matrix multiplication C = AB, there are eight multiplications: a_11 b_11, a_11 b_12, a_21 b_11, a_21 b_12, a_12 b_21, a_12 b_22, a_22 b_21, and a_22 b_22. Make a table that indicates the order that these multiplications are performed for the ijk, jik, kij, ikj, jki, and kji matrix multiply algorithms.

P1.1.3 Give an algorithm for computing C = (xy^T)^k where x and y are n-vectors.

P1.1.4 Specify an algorithm for computing (XY^T)^k where X, Y ∈ R^{n×p}.

P1.1.5 Formulate an outer product algorithm for the update C = AB^T + C where A ∈ R^{m×r}, B ∈ R^{n×r}, and C ∈ R^{m×n}.

P1.1.6 Suppose we have real n-by-n matrices C, D, E, and F. Show how to compute real n-by-n matrices A and B with just three real n-by-n matrix multiplications so that (A + iB) = (C + iD)(E + iF). Hint: Compute W = (C + D)(E - F).

Notes and References for Sec. 1.1

It must be stressed that the development of quality software from any of our "semi-formal" algorithmic presentations is a long and arduous task. Even the implementation of the level-1, 2, and 3 BLAS require care:

C.L. Lawson, R.J. Hanson, D.R. Kincaid, and F.T. Krogh (1979). "Basic Linear Algebra Subprograms for FORTRAN Usage," ACM Trans. Math. Soft. 5, 308-323.
C.L. Lawson, R.J. Hanson, D.R. Kincaid, and F.T. Krogh (1979). "Algorithm 539: Basic Linear Algebra Subprograms for FORTRAN Usage," ACM Trans. Math. Soft. 5, 324-325.
J.J. Dongarra, J. Du Croz, S. Hammarling, and R.J. Hanson (1988). "An Extended Set of Fortran Basic Linear Algebra Subprograms," ACM Trans. Math. Soft. 14, 1-17.
J.J. Dongarra, J. Du Croz, S. Hammarling, and R.J. Hanson (1988). "Algorithm 656: An Extended Set of Fortran Basic Linear Algebra Subprograms: Model Implementation and Test Programs," ACM Trans. Math. Soft. 14, 18-32.
J.J. Dongarra, J. Du Croz, I.S. Duff, and S.J. Hammarling (1990). "A Set of Level 3 Basic Linear Algebra Subprograms," ACM Trans. Math. Soft. 16, 1-17.
J.J. Dongarra, J. Du Croz, I.S. Duff, and S.J. Hammarling (1990). "Algorithm 679: A Set of Level 3 Basic Linear Algebra Subprograms: Model Implementation and Test Programs," ACM Trans. Math. Soft. 16, 18-28.

Other BLAS references include

B. Kågström, P. Ling, and C. Van Loan (1991). "High-Performance Level-3 BLAS: Sample Routines for Double Precision Real Data," in High Performance Computing II, M. Durand and F. El Dabaghi (eds), North-Holland.
B. Kågström, P. Ling, and C. Van Loan (1995). "GEMM-Based Level-3 BLAS: High-Performance Model Implementations and Performance Evaluation Benchmark," in Parallel Programming and Applications, P. Fritzson and L. Finmo (eds), IOS Press, 184-188.

See also

J.R. Rice (1981). Matrix Computations and Mathematical Software, Academic Press, New York,

and a browse through the LAPACK manual.

1.2 Exploiting Structure


The efficiency of a given matrix algorithm depends on many things. Most obvious, and what we treat in this section, is the amount of required arithmetic and storage. We continue to use matrix-vector and matrix-matrix multiplication as a vehicle for introducing the key ideas. As examples of exploitable structure we have chosen the properties of bandedness and symmetry. Band matrices have many zero entries and so it is no surprise that band matrix manipulation allows for many arithmetic and storage shortcuts. Arithmetic complexity and data structures are discussed in this context.
Symmetric matrices provide another set of examples that can be used to illustrate structure exploitation. Symmetric linear systems and eigenvalue problems have a very prominent role to play in matrix computations and so it is important to be familiar with their manipulation.

1.2.1 Band Matrices and the x-0 Notation


We say that A ∈ R^{m×n} has lower bandwidth p if a_{ij} = 0 whenever i > j + p and upper bandwidth q if j > i + q implies a_{ij} = 0. Here is an example of an 8-by-5 matrix that has lower bandwidth 1 and upper bandwidth 2:

X X X 0 0
X X X X 0
0 X X X X
0 0 X X X
0 0 0 X X
0 0 0 0 X
0 0 0 0 0
0 0 0 0 0

The ×'s designate arbitrary nonzero entries. This notation is handy to indicate the zero-nonzero structure of a matrix and we use it extensively. Band structures that occur frequently are tabulated in Table 1.2.1.

1.2.2 Diagonal Matrix Manipulation


Matrices with upper and lower bandwidth zero are diagonal. If D ∈ R^{m×n} is diagonal, then we use the notation

    D = diag(d_1, ..., d_q),    q = min{m, n}    ⟺    d_i = d_{ii}.

If D is diagonal and A is a matrix, then DA is a row scaling of A and AD is a column scaling of A.

    Type of Matrix       Lower Bandwidth    Upper Bandwidth
    diagonal                    0                  0
    upper triangular            0                 n-1
    lower triangular           m-1                 0
    tridiagonal                 1                  1
    upper bidiagonal            0                  1
    lower bidiagonal            1                  0
    upper Hessenberg            1                 n-1
    lower Hessenberg           m-1                 1

    TABLE 1.2.1. Band Terminology for m-by-n Matrices

1.2.3 Triangular Matrix Multiplication


To introduce band matrix "thinking" we look at the matrix multiplication
problem C = AB when A and B are both n-by-n and upper triangular.
The 3-by-3 case is illuminating:

    C = [ a_11 b_11    a_11 b_12 + a_12 b_22    a_11 b_13 + a_12 b_23 + a_13 b_33 ]
        [    0              a_22 b_22                a_22 b_23 + a_23 b_33        ]
        [    0                  0                          a_33 b_33              ]

It suggests that the product is upper triangular and that its upper triangular entries are the result of abbreviated inner products. Indeed, since a_{ik} b_{kj} = 0 whenever k < i or j < k we see that

    c_{ij} = Σ_{k=i}^{j} a_{ik} b_{kj}

and so we obtain:
Algorithm 1.2.1 (Triangular Matrix Multiplication) If A, B ∈ R^{n×n} are upper triangular, then this algorithm computes C = AB.

C = 0
for i = 1:n
    for j = i:n
        for k = i:j
            C(i,j) = A(i,k)B(k,j) + C(i,j)
        end
    end
end
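For readers who wish to experiment, here is a small Python/NumPy sketch (ours; the function name is made up) of the abbreviated-inner-product idea behind Algorithm 1.2.1, checked against a full multiply of two upper triangular matrices.

    import numpy as np

    def tri_mult(A, B):
        # upper triangular multiply with abbreviated inner products
        n = A.shape[0]
        C = np.zeros((n, n))
        for i in range(n):
            for j in range(i, n):
                for k in range(i, j + 1):
                    C[i, j] += A[i, k] * B[k, j]
        return C

    n = 5
    A = np.triu(np.random.rand(n, n))
    B = np.triu(np.random.rand(n, n))
    assert np.allclose(tri_mult(A, B), A @ B)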

To quantify the savings in this algorithm we need some tools for measuring
the amount of work.

1.2.4 Flops
Obviously, upper triangular matrix multiplication involves less arithmetic than when the matrices are full. One way to quantify this is with the notion of a flop.¹ A flop is a floating point operation. A dot product or saxpy operation of length n involves 2n flops because there are n multiplications and n adds in either of these vector operations.
The gaxpy y = Ax + y where A ∈ R^{m×n} involves 2mn flops, as does an m-by-n outer product update of the form A = A + xy^T.
The matrix multiply update C = AB + C where A ∈ R^{m×p}, B ∈ R^{p×n}, and C ∈ R^{m×n} involves 2mnp flops.
Flop counts are usually obtained by summing the amount of arithmetic associated with the most deeply nested statements in an algorithm. For matrix-matrix multiplication, this is the statement,

    C(i,j) = A(i,k)B(k,j) + C(i,j)

which involves two flops and is executed mnp times as a simple loop accounting indicates. Hence the conclusion that general matrix multiplication requires 2mnp flops.
Now let us investigate the amount of work involved in Algorithm 1.2.1. Note that c_{ij} (i ≤ j) requires 2(j - i + 1) flops. Using the heuristics

    Σ_{p=1}^{q} p = q(q+1)/2 ≈ q²/2    and    Σ_{p=1}^{q} p² = q³/3 + q²/2 + q/6 ≈ q³/3

we find that triangular matrix multiplication requires one-sixth the number of flops as full matrix multiplication:

    Σ_{i=1}^{n} Σ_{j=i}^{n} 2(j - i + 1)  ≈  Σ_{i=1}^{n} (n - i + 1)²  ≈  n³/3.

We throw away the low order terms since their inclusion does not contribute to what the flop count "says." For example, an exact flop count of Algorithm 1.2.1 reveals that precisely n³/3 + n² + 2n/3 flops are involved. For large n (the typical situation of interest) we see that the exact flop count offers no insight beyond the n³/3 approximation.

¹In the first edition of this book we defined a flop to be the amount of work associated with an operation of the form a_{ij} = a_{ij} + a_{ik}·a_{kj}, i.e., a floating point add, a floating point multiply, and some subscripting. Thus, an "old flop" involves two "new flops." In defining a flop to be a single floating point operation we are opting for a more precise measure of arithmetic complexity.
Flop counting is a necessarily crude approach to the measuring of program efficiency since it ignores subscripting, memory traffic, and the countless other overheads associated with program execution. We must not infer too much from a comparison of flop counts. We cannot conclude, for example, that triangular matrix multiplication is six times faster than square matrix multiplication. Flop counting is just a "quick and dirty" accounting method that captures only one of the several dimensions of the efficiency issue.

1.2.5 The Colon Notation-Again


The dot product that the k-loop performs in Algorithm 1.2.1 can be succinctly stated if we extend the colon notation introduced in §1.1.8. Suppose A ∈ R^{m×n} and the integers p, q, and r satisfy 1 ≤ p ≤ q ≤ n and 1 ≤ r ≤ m. We then define

    A(r, p:q) = [ a_{rp}, ..., a_{rq} ] ∈ R^{1×(q-p+1)}.

Likewise, if 1 ≤ p ≤ q ≤ m and 1 ≤ c ≤ n, then

    A(p:q, c) = [ a_{pc} ]
                [  ...   ]   ∈ R^{q-p+1}.
                [ a_{qc} ]

With this notation we can rewrite Algorithm 1.2.1 as

C(1:n, 1:n) = 0
for i = 1:n
    for j = i:n
        C(i,j) = A(i, i:j)B(i:j, j) + C(i,j)
    end
end

We mention one additional feature of the colon notation. Negative increments are allowed. Thus, if x and y are n-vectors, then s = x^T y(n:-1:1) is the summation

    s = Σ_{i=1}^{n} x_i y_{n-i+1}.

1.2.6 Band Storage


Suppose A ∈ R^{n×n} has lower bandwidth p and upper bandwidth q and assume that p and q are much smaller than n. Such a matrix can be stored in a (p + q + 1)-by-n array A.band with the convention that

    a_{ij} = A.band(i - j + q + 1, j)                                (1.2.1)

for all (i, j) that fall inside the band. Thus, if

    A = [ a_11  a_12  a_13   0     0     0   ]
        [ a_21  a_22  a_23  a_24   0     0   ]
        [  0    a_32  a_33  a_34  a_35   0   ]
        [  0     0    a_43  a_44  a_45  a_46 ]
        [  0     0     0    a_54  a_55  a_56 ]
        [  0     0     0     0    a_65  a_66 ]

then

    A.band = [  0     0    a_13  a_24  a_35  a_46 ]
             [  0    a_12  a_23  a_34  a_45  a_56 ]
             [ a_11  a_22  a_33  a_44  a_55  a_66 ]
             [ a_21  a_32  a_43  a_54  a_65   0   ]

Here, the "0" entries are unused. With this data structure, our column-oriented gaxpy algorithm transforms to the following:

Algorithm 1.2.2 (Band Gaxpy) Suppose A ∈ R^{n×n} has lower bandwidth p and upper bandwidth q and is stored in the A.band format (1.2.1). If x, y ∈ R^n, then this algorithm overwrites y with Ax + y.

for j = 1:n
    ytop = max(1, j - q)
    ybot = min(n, j + p)
    atop = max(1, q + 2 - j)
    abot = atop + ybot - ytop
    y(ytop:ybot) = x(j)A.band(atop:abot, j) + y(ytop:ybot)
end

Notice that by storing A by column in A.band, we obtain a saxpy, column access procedure. Indeed, Algorithm 1.2.2 is obtained from Algorithm 1.1.4 by recognizing that each saxpy involves a vector with a small number of nonzeros. Integer arithmetic is used to identify the location of these nonzeros. As a result of this careful zero/nonzero analysis, the algorithm involves just 2n(p + q + 1) flops with the assumption that p and q are much smaller than n.
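The following Python/NumPy sketch (our own, with 0-based indices, so the offset q+1 in (1.2.1) becomes q) illustrates the A.band storage scheme and the band gaxpy; it builds a small banded test matrix and checks the result against Ax + y.

    import numpy as np

    def band_gaxpy(A_band, p, q, x, y):
        # y = Ax + y with A stored band-by-column: a[i,j] -> A_band[i-j+q, j]
        n = len(x)
        for j in range(n):
            ytop = max(0, j - q)
            ybot = min(n - 1, j + p)
            atop = max(0, q - j)
            abot = atop + (ybot - ytop)
            y[ytop:ybot + 1] += x[j] * A_band[atop:abot + 1, j]
        return y

    n, p, q = 6, 1, 2
    A = np.triu(np.tril(np.random.rand(n, n), q), -p)   # lower bw p, upper bw q
    A_band = np.zeros((p + q + 1, n))
    for j in range(n):
        for i in range(max(0, j - q), min(n, j + p + 1)):
            A_band[i - j + q, j] = A[i, j]

    x = np.random.rand(n)
    y = np.random.rand(n)
    assert np.allclose(band_gaxpy(A_band, p, q, x, y.copy()), A @ x + y)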

1.2.7 Symmetry

We say that A ∈ R^{n×n} is symmetric if A^T = A. Thus,

    A = [ 1 2 3 ]
        [ 2 4 5 ]
        [ 3 5 6 ]

is symmetric. Storage requirements can be halved if we just store the lower triangle of elements, e.g., A.vec = [ 1 2 3 4 5 6 ]. In general, with this data structure we agree to store the a_{ij} as follows:

    a_{ij} = A.vec((j-1)n - j(j-1)/2 + i)        (i ≥ j)                (1.2.2)

Let us look at the column-oriented gaxpy operation with the matrix A represented in A.vec.

Algorithm 1.2.3 (Symmetric Storage Gaxpy) Suppose A ∈ R^{n×n} is symmetric and stored in the A.vec style (1.2.2). If x, y ∈ R^n, then this algorithm overwrites y with Ax + y.

for j = 1:n
    for i = 1:j-1
        y(i) = A.vec((i-1)n - i(i-1)/2 + j)x(j) + y(i)
    end
    for i = j:n
        y(i) = A.vec((j-1)n - j(j-1)/2 + i)x(j) + y(i)
    end
end

This algorithm requires the same 2n² flops that an ordinary gaxpy requires. Notice that the halving of the storage requirement is purchased with some awkward subscripting.
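Here is a minimal Python sketch (ours; the helper idx and the function name are made up) of the packed symmetric gaxpy, with the 1-based index formula of (1.2.2) converted to 0-based array positions:

    import numpy as np

    def packed_sym_gaxpy(A_vec, x, y):
        # y = Ax + y, A symmetric, lower triangle stored by column in A_vec
        n = len(x)
        def idx(i, j):          # packed position of a_{ij}, i >= j (1-based in, 0-based out)
            return (j - 1) * n - j * (j - 1) // 2 + i - 1
        for j in range(1, n + 1):
            for i in range(1, j):
                y[i - 1] += A_vec[idx(j, i)] * x[j - 1]   # a_{ij} = a_{ji}
            for i in range(j, n + 1):
                y[i - 1] += A_vec[idx(i, j)] * x[j - 1]
        return y

    n = 4
    M = np.random.rand(n, n)
    A = (M + M.T) / 2
    A_vec = np.concatenate([A[j:, j] for j in range(n)])  # lower triangle by column
    x = np.random.rand(n)
    y = np.random.rand(n)
    assert np.allclose(packed_sym_gaxpy(A_vec, x, y.copy()), A @ x + y)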

1.2.8 Store by Diagonal


Symmetric matrices can also be stored by diagonal. If

    A = [ 1 2 3 ]
        [ 2 4 5 ]
        [ 3 5 6 ]

then in a store-by-diagonal scheme we represent A with the vector

    A.diag = [ 1 4 6 2 5 3 ].

In general, for k ≥ 0,

    a_{i+k,i} = A.diag(i + nk - k(k-1)/2).                (1.2.3)

Some notation simplifies the discussion of how to use this data structure in a matrix-vector multiplication.
If A ∈ R^{m×n}, then let D(A,k) ∈ R^{m×n} designate the kth diagonal of A as follows:

    [D(A,k)]_{ij} = a_{ij}   if j = i + k, 1 ≤ i ≤ m, 1 ≤ j ≤ n,
                    0        otherwise.

Thus,

    A = [ 1 2 3 ]
        [ 2 4 5 ]  =  D(A,2) + D(A,1) + D(A,0) + D(A,-1) + D(A,-2)
        [ 3 5 6 ]

where

    D(A,2)  = [ 0 0 3 ]    D(A,1)  = [ 0 2 0 ]    D(A,0)  = [ 1 0 0 ]
              [ 0 0 0 ]              [ 0 0 5 ]              [ 0 4 0 ]
              [ 0 0 0 ]              [ 0 0 0 ]              [ 0 0 6 ]

    D(A,-1) = [ 0 0 0 ]    D(A,-2) = [ 0 0 0 ]
              [ 2 0 0 ]              [ 0 0 0 ]
              [ 0 5 0 ]              [ 3 0 0 ]

Returning to our store-by-diagonal data structure, we see that the nonzero parts of D(A,0), D(A,1), ..., D(A,n-1) are sequentially stored in the A.diag scheme (1.2.3). The gaxpy y = Ax + y can then be organized as follows:

    y = D(A,0)x + Σ_{k=1}^{n-1} (D(A,k) + D(A,k)^T)x + y.

Working out the details we obtain the following algorithm.

Algorithm 1.2.4 (Store-by-Diagonal Gaxpy) Suppose A ∈ R^{n×n} is symmetric and stored in the A.diag style (1.2.3). If x, y ∈ R^n, then this algorithm overwrites y with Ax + y.

for i = 1:n
    y(i) = A.diag(i)x(i) + y(i)
end
for k = 1:n-1
    t = nk - k(k-1)/2
    { y = D(A,k)x + y }
    for i = 1:n-k
        y(i) = A.diag(i + t)x(i + k) + y(i)
    end
    { y = D(A,k)^T x + y }
    for i = 1:n-k
        y(i + k) = A.diag(i + t)x(i) + y(i + k)
    end
end
Note that the inner loops oversee vector multiplications:

    y(1:n-k) = A.diag(t+1:t+n-k) .* x(k+1:n) + y(1:n-k)
    y(k+1:n) = A.diag(t+1:t+n-k) .* x(1:n-k) + y(k+1:n)
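A Python/NumPy sketch (our own illustration) of the store-by-diagonal gaxpy follows; the diagonals are packed sequentially as in (1.2.3) and the two inner loops become the vector multiplies just noted.

    import numpy as np

    def diag_sym_gaxpy(A_diag, x, y):
        # y = Ax + y, A symmetric, diagonals 0..n-1 of the lower triangle packed in A_diag
        n = len(x)
        y += A_diag[:n] * x                      # main diagonal contribution
        for k in range(1, n):
            t = n * k - k * (k - 1) // 2         # start of diagonal k in A_diag
            d = A_diag[t:t + n - k]              # nonzero part of D(A,k)
            y[:n - k] += d * x[k:]               # y = D(A,k)x + y
            y[k:]    += d * x[:n - k]            # y = D(A,k)^T x + y
        return y

    n = 4
    M = np.random.rand(n, n)
    A = (M + M.T) / 2
    A_diag = np.concatenate([np.diag(A, -k) for k in range(n)])
    x = np.random.rand(n)
    y = np.random.rand(n)
    assert np.allclose(diag_sym_gaxpy(A_diag, x, y.copy()), A @ x + y)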

1.2.9 A Note on Overwriting and Workspaces


An undercurrent in the above discussion has been the economical use of storage. Overwriting input data is another way to control the amount of memory that a matrix computation requires. Consider the n-by-n matrix multiplication problem C = AB with the proviso that the "input matrix" B is to be overwritten by the "output matrix" C. We cannot simply transform

C(1:n,1:n) = 0
for j = 1:n
    for k = 1:n
        C(:,j) = C(:,j) + A(:,k)B(k,j)
    end
end

to

for j = 1:n
    for k = 1:n
        B(:,j) = B(:,j) + A(:,k)B(k,j)
    end
end

because B(:,j) is needed throughout the entire k-loop. A linear workspace is needed to hold the jth column of the product until it is "safe" to overwrite B(:,j):

for j = 1:n
    w(1:n) = 0
    for k = 1:n
        w(:) = w(:) + A(:,k)B(k,j)
    end
    B(:,j) = w(:)
end

A linear workspace overhead is usually not important in a matrix computation that has a 2-dimensional array of the same order.
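The workspace idea can be sketched as follows in Python/NumPy (our own example; the function name is made up); the routine overwrites B with AB using only a length-n work vector.

    import numpy as np

    def overwrite_product(A, B):
        # overwrite B with C = AB using a single length-n workspace
        n = B.shape[1]
        w = np.empty(A.shape[0])
        for j in range(n):
            w[:] = 0.0
            for k in range(A.shape[1]):
                w += A[:, k] * B[k, j]        # accumulate column j of AB
            B[:, j] = w                       # safe to overwrite only now
        return B

    A = np.random.rand(4, 4)
    B = np.random.rand(4, 4)
    C = A @ B
    assert np.allclose(overwrite_product(A, B.copy()), C)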

Problems

P1.2.1 Give an algorithm that overwrites A with A² where A ∈ R^{n×n} is (a) upper triangular and (b) square. Strive for a minimum workspace in each case.

P1.2.2 Suppose A ∈ R^{n×n} is upper Hessenberg and that scalars λ_1, ..., λ_r are given. Give a saxpy algorithm for computing the first column of M = (A - λ_1 I)···(A - λ_r I).

P1.2.3 Give a column saxpy algorithm for the n-by-n matrix multiplication problem C = AB where A is upper triangular and B is lower triangular.

P1.2.4 Extend Algorithm 1.2.2 so that it can handle rectangular band matrices. Be sure to describe the underlying data structure.

P1.2.5 A ∈ C^{n×n} is Hermitian if A^H = A. If A = B + iC, then it is easy to show that B^T = B and C^T = -C. Suppose we represent A in an array A.herm with the property that A.herm(i,j) houses b_{ij} if i ≥ j and c_{ij} if j > i. Using this data structure, write a matrix-vector multiply function that computes Re(z) and Im(z) from Re(x) and Im(x) so that z = Ax.

P1.2.6 Suppose X ∈ R^{n×p} and A ∈ R^{n×n}, with A symmetric and stored by diagonal. Give an algorithm that computes Y = X^T AX and stores the result by diagonal. Use separate arrays for A and Y.

P1.2.7 Suppose a ∈ R^n is given and that A ∈ R^{n×n} has the property that a_{ij} = a_{|i-j|+1}. Give an algorithm that overwrites y with Ax + y where x, y ∈ R^n are given.

P1.2.8 Suppose a ∈ R^n is given and that A ∈ R^{n×n} has the property that a_{ij} = a_{((i+j-1) mod n)+1}. Give an algorithm that overwrites y with Ax + y where x, y ∈ R^n are given.

P1.2.9 Develop a compact store-by-diagonal scheme for unsymmetric band matrices and write the corresponding gaxpy algorithm.

P1.2.10 Suppose p and q are n-vectors and that A = (a_{ij}) is defined by a_{ij} = a_{ji} = p_i q_j for 1 ≤ i ≤ j ≤ n. How many flops are required to compute y = Ax where x ∈ R^n is given?

Note& .llnd Referenca for Sec. 1.2


Consult the LAPACK manual foc a discuasion about appropriate data structure~~ when
syllUillltey and/or bandedneE is pr-nt. See alBo
N. Madsen, G. Roderigue, and J. Karush (1976). "Matrix Multiplication by Diagonals
on a Vector Pacallel ProcesiiOl'," lnfOJ'fiOtion Procening Leuen S, 41-45.

1.3 Block Matrices and Algorithms


Having a facility with block matrix notation is crucial in matrix computations because it simplifies the derivation of many central algorithms. Moreover, "block algorithms" are increasingly important in high performance computing. By a block algorithm we essentially mean an algorithm that is rich in matrix-matrix multiplication. Algorithms of this type turn out to be more efficient in many computing environments than those that are organized at a lower linear algebraic level.

1.3.1 Block Matrix Notation


Column and row partitionings are special cases of matrix blocking. In general we can partition both the rows and columns of an m-by-n matrix A to obtain

    A = [ A_11 ... A_1r ]  m_1
        [  ...       ... ]
        [ A_q1 ... A_qr ]  m_q
          n_1   ...  n_r

where m_1 + ... + m_q = m, n_1 + ... + n_r = n, and A_{αβ} designates the (α,β) block or submatrix. With this notation, block A_{αβ} has dimension m_α-by-n_β and we say that A = (A_{αβ}) is a q-by-r block matrix.

1.3.2 Block Matrix Manipulation

Block matrices combine just like matrices with scalar entries as long as certain dimension requirements are met. For example, if

    B = [ B_11 ... B_1r ]  m_1
        [  ...       ... ]
        [ B_q1 ... B_qr ]  m_q
          n_1   ...  n_r

then we say that B is partitioned conformably with the matrix A above. The sum C = A + B can also be regarded as a q-by-r block matrix:

    C = [ C_11 ... C_1r ]   =   [ A_11 + B_11  ...  A_1r + B_1r ]
        [  ...       ... ]      [      ...               ...    ]
        [ C_q1 ... C_qr ]       [ A_q1 + B_q1  ...  A_qr + B_qr ]

The multiplication of block matrices is a little trickier. We start with a pair of lemmas.

Lemma 1.3.1 If A ∈ R^{m×p}, B ∈ R^{p×n}, and

    A = [ A_1 ]  m_1            B = [ B_1, ..., B_r ]
        [ ... ]                       n_1        n_r
        [ A_q ]  m_q

then

    AB = C = [ C_11 ... C_1r ]  m_1
             [  ...       ... ]
             [ C_q1 ... C_qr ]  m_q
               n_1   ...  n_r

where C_{αβ} = A_α B_β for α = 1:q and β = 1:r.



Proof. First we relate scalar entries in block C_{αβ} to scalar entries in C. For 1 ≤ α ≤ q, 1 ≤ β ≤ r, 1 ≤ i ≤ m_α, and 1 ≤ j ≤ n_β we have

    [C_{αβ}]_{ij} = c_{λ+i, μ+j}

where

    λ = m_1 + ... + m_{α-1}        and        μ = n_1 + ... + n_{β-1}.

But

    c_{λ+i, μ+j} = Σ_{k=1}^{p} a_{λ+i,k} b_{k,μ+j} = Σ_{k=1}^{p} [A_α]_{ik} [B_β]_{kj} = [A_α B_β]_{ij}.

Thus, C_{αβ} = A_α B_β. □


Lemma 1.3.2 If A ∈ R^{m×p}, B ∈ R^{p×n},

    A = [ A_1, ..., A_s ]        and        B = [ B_1 ]  p_1
          p_1        p_s                        [ ... ]
                                                [ B_s ]  p_s

then

    AB = Σ_{γ=1}^{s} A_γ B_γ.

Proof. We set s = 2 and leave the general s case to the reader. (See P1.3.6.) For 1 ≤ i ≤ m and 1 ≤ j ≤ n we have

    c_{ij} = Σ_{k=1}^{p} a_{ik} b_{kj} = Σ_{k=1}^{p_1} a_{ik} b_{kj} + Σ_{k=p_1+1}^{p_1+p_2} a_{ik} b_{kj}
           = [A_1 B_1]_{ij} + [A_2 B_2]_{ij} = [A_1 B_1 + A_2 B_2]_{ij}.

Thus, C = A_1 B_1 + A_2 B_2. □
For general block matrix multiplication we have the following result:

Theorem 1.3.3 If

    A = [ A_11 ... A_1s ]  m_1            B = [ B_11 ... B_1r ]  p_1
        [  ...       ... ]                    [  ...       ... ]
        [ A_q1 ... A_qs ]  m_q                [ B_s1 ... B_sr ]  p_s
          p_1   ...  p_s                        n_1   ...  n_r

and we partition the product C = AB as follows,

    C = [ C_11 ... C_1r ]  m_1
        [  ...       ... ]
        [ C_q1 ... C_qr ]  m_q
          n_1   ...  n_r

then

    C_{αβ} = Σ_{γ=1}^{s} A_{αγ} B_{γβ},        α = 1:q, β = 1:r.

Proof. See P1.3.7. □

A very important special case arises if we set s = 2, r = 1, and n_1 = 1:

    [ A_11  A_12 ] [ x_1 ]   =   [ A_11 x_1 + A_12 x_2 ]
    [ A_21  A_22 ] [ x_2 ]       [ A_21 x_1 + A_22 x_2 ]

This partitioned matrix-vector product is used over and over again in subsequent chapters.

1.3.3 Submatrix Designation


As with "ordinary" matrix multiplication, block matrix multiplication can
be organized in several ways. To specify the computations precisely, we
need some notation.
Suppose A E lR.mxn and that i = (ito .. . , i..) and j = Ut, ... ,j,:-J are
integer vectors with the property that
i,,. .. ,i,. E {1,2, ... ,m}
j,, ... ,jc E {1,2, ... ,n}.
We let A(i,j) denote the r-by-c submatrix

A(it:.• jt) A(i~.·ic) ] •


A(i,j) =
[
A(i,.,jt) ·· · A(i,.,jc)
If the entries in the subscript vectors i and j are contiguous, then the
"colon" notation can be used to define A(i,j) in terms of the scalar entries
in A. In particular, if 1 :5 it :5 i2 :5 m and 1 :5 it :5 i2 :5 n, then
A( it :i2,)1:i2) is the submatrix obtained by extracting row& it through i2
and columns i1 through j2, e.g,

a31 a32]
a42 .
A(3:5, 1:2) =
[ a.u
a61 a52
28 CHAPTER 1. MATRIX MULTIPLICATION PROBLEMS

While on the subject of submatrices, recall from §1.1.8 that if i and j are
scalars, then A(i, :) designates the ith raw of A and A(:,j) designates the
jth column of A.

1.3.4 Block Matrix Times Vector


An important situation covered by Theorem 1.3.3 is the case of a block matrix times vector. Let us consider the details of the gaxpy y = Ax + y where A ∈ R^{m×n}, x ∈ R^n, y ∈ R^m, and

    A = [ A_1 ]  m_1            y = [ y_1 ]  m_1
        [ ... ]                     [ ... ]
        [ A_q ]  m_q                [ y_q ]  m_q

We refer to A_i as the ith block row. If m.vec = (m_1, ..., m_q) is the vector of block row "heights", then from

    [ y_1 ]     [ A_1 ]       [ y_1 ]
    [ ... ]  =  [ ... ] x  +  [ ... ]
    [ y_q ]     [ A_q ]       [ y_q ]

we obtain

last = 0
for i = 1:q
    first = last + 1
    last = first + m.vec(i) - 1                                (1.3.1)
    y(first:last) = A(first:last, :)x + y(first:last)
end

Each time through the loop an "ordinary" gaxpy is performed so Algorithms 1.1.3 and 1.1.4 apply.
Another way to block the gaxpy computation is to partition A and x as follows:

    A = [ A_1, ..., A_r ]        x = [ x_1 ]  n_1
          n_1        n_r             [ ... ]
                                     [ x_r ]  n_r

In this case we refer to A_j as the jth block column of A. If n.vec = (n_1, ..., n_r) is the vector of block column widths, then from

    y = [ A_1, ..., A_r ] [ x_1 ]  +  y  =  Σ_{j=1}^{r} A_j x_j + y
                          [ ... ]
                          [ x_r ]

we obtain

last = 0
for j = 1:r
    first = last + 1
    last = first + n.vec(j) - 1                                (1.3.2)
    y = A(:, first:last)x(first:last) + y
end

Again, the gaxpy's performed each time through the loop can be carried out with Algorithm 1.1.3 or 1.1.4.
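Both blockings are easy to prototype. The following Python/NumPy sketch (our own; the block-size lists are arbitrary) implements (1.3.1) and (1.3.2) and checks them against Ax + y.

    import numpy as np

    def block_row_gaxpy(A, x, y, m_vec):
        # block-row gaxpy in the spirit of (1.3.1)
        last = 0
        for mi in m_vec:
            first, last = last, last + mi
            y[first:last] += A[first:last, :] @ x
        return y

    def block_col_gaxpy(A, x, y, n_vec):
        # block-column gaxpy in the spirit of (1.3.2)
        last = 0
        for nj in n_vec:
            first, last = last, last + nj
            y += A[:, first:last] @ x[first:last]
        return y

    A = np.random.rand(7, 6)
    x = np.random.rand(6)
    y = np.random.rand(7)
    assert np.allclose(block_row_gaxpy(A, x, y.copy(), [3, 4]), A @ x + y)
    assert np.allclose(block_col_gaxpy(A, x, y.copy(), [2, 2, 2]), A @ x + y)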

1.3.5 Block Matrix Multiplication


Just as ordinary, scalar-level matrix multiplication can be arranged in several possible ways, so can the multiplication of block matrices. Different blockings for A, B, and C can set the stage for block versions of the dot product, saxpy, and outer product algorithms of §1.1. To illustrate this with a minimum of subscript clutter, we assume that these three matrices are all n-by-n and that n = Nℓ where N and ℓ are positive integers.
If A = (A_{αβ}), B = (B_{αβ}), and C = (C_{αβ}) are N-by-N block matrices with ℓ-by-ℓ blocks, then from Theorem 1.3.3

    C_{αβ} = Σ_{γ=1}^{N} A_{αγ} B_{γβ} + C_{αβ},        α = 1:N, β = 1:N.

If we organize a matrix multiplication procedure around this summation, then we obtain a block analog of Algorithm 1.1.5:

for α = 1:N
    i = (α-1)ℓ + 1:αℓ
    for β = 1:N
        j = (β-1)ℓ + 1:βℓ                                    (1.3.3)
        for γ = 1:N
            k = (γ-1)ℓ + 1:γℓ
            C(i,j) = A(i,k)B(k,j) + C(i,j)
        end
    end
end

Note that if ℓ = 1, then α ≡ i, β ≡ j, and γ ≡ k and we revert to Algorithm 1.1.5.
To obtain a block saxpy matrix multiply, we write C = AB + C as

    [ C_1, ..., C_N ] = [ A_1, ..., A_N ] [ B_11 ... B_1N ]  +  [ C_1, ..., C_N ]
                                          [  ...       ... ]
                                          [ B_N1 ... B_NN ]

where A_α, C_α ∈ R^{n×ℓ} and B_{αβ} ∈ R^{ℓ×ℓ}. From this we obtain


for β = 1:N
    j = (β-1)ℓ + 1:βℓ
    for α = 1:N
        i = (α-1)ℓ + 1:αℓ                                    (1.3.4)
        C(:,j) = A(:,i)B(i,j) + C(:,j)
    end
end

This is the block version of Algorithm 1.1.7.
A block outer product scheme results if we work with the blockings

    A = [ A_1, ..., A_N ]        B = [ B_1^T ]
                                     [  ...  ]
                                     [ B_N^T ]

where A_γ, B_γ ∈ R^{n×ℓ}. From Lemma 1.3.2 we have

    C = Σ_{γ=1}^{N} A_γ B_γ^T + C

and so

for γ = 1:N
    k = (γ-1)ℓ + 1:γℓ
    C = A(:,k)B(k,:) + C                                    (1.3.5)
end

This is the block version of Algorithm 1.1.8.

1.3.6 Complex Matrix Multiplication


Consider the complex matrix multiplication update

    C_1 + iC_2 = (A_1 + iA_2)(B_1 + iB_2) + (C_1 + iC_2)

where all the matrices are real and i² = -1. Comparing the real and imaginary parts we find

    C_1 = A_1 B_1 - A_2 B_2 + C_1
    C_2 = A_1 B_2 + A_2 B_1 + C_2

and this can be expressed as follows:

    [ C_1 ]   [ A_1  -A_2 ] [ B_1 ]   [ C_1 ]
    [ C_2 ] = [ A_2   A_1 ] [ B_2 ] + [ C_2 ]

This suggests how real matrix software might be applied to solve complex matrix problems. The only snag is that the explicit formation of

    [ A_1  -A_2 ]
    [ A_2   A_1 ]

requires the "double storage" of the matrices A_1 and A_2.

1.3.7 A Divide and Conquer Matrix Multiplication


We conclude this section with a completely different approach to the matrix-matrix multiplication problem. The starting point in the discussion is the 2-by-2 block matrix multiplication

    [ C_11  C_12 ]     [ A_11  A_12 ] [ B_11  B_12 ]
    [ C_21  C_22 ]  =  [ A_21  A_22 ] [ B_21  B_22 ]

where each block is square. In the ordinary algorithm, C_{ij} = A_{i1}B_{1j} + A_{i2}B_{2j}. There are 8 multiplies and 4 adds. Strassen (1969) has shown how to compute C with just 7 multiplies and 18 adds:

    P_1 = (A_11 + A_22)(B_11 + B_22)
    P_2 = (A_21 + A_22)B_11
    P_3 = A_11(B_12 - B_22)
    P_4 = A_22(B_21 - B_11)
    P_5 = (A_11 + A_12)B_22
    P_6 = (A_21 - A_11)(B_11 + B_12)
    P_7 = (A_12 - A_22)(B_21 + B_22)
    C_11 = P_1 + P_4 - P_5 + P_7
    C_12 = P_3 + P_5
    C_21 = P_2 + P_4
    C_22 = P_1 + P_3 - P_2 + P_6

These equations are easily confirmed by substitution. Suppose n = 2m so that the blocks are m-by-m. Counting adds and multiplies in the computation C = AB we find that conventional matrix multiplication involves (2m)³ multiplies and (2m)³ - (2m)² adds. In contrast, if Strassen's algorithm is applied with conventional multiplication at the block level, then 7m³ multiplies and 7m³ + 11m² adds are required. If m ≫ 1, then the Strassen method involves about 7/8ths the arithmetic of the fully conventional algorithm.
Now recognize that we can recur on the Strassen idea. In particular, we can apply the Strassen algorithm to each of the half-sized block multiplications associated with the P_i. Thus, if the original A and B are n-by-n and n = 2^q, then we can repeatedly apply the Strassen multiplication algorithm. At the bottom "level," the blocks are 1-by-1. Of course, there is no need to

recur down to the n = 1 level. When the block size gets sufficiently small (n ≤ n_min), it may be sensible to use conventional matrix multiplication when finding the P_i. Here is the overall procedure:

Algorithm 1.3.1 (Strassen Multiplication) Suppose n = 2^q and that A ∈ R^{n×n} and B ∈ R^{n×n}. If n_min = 2^d with d ≤ q, then this algorithm computes C = AB by applying Strassen's procedure recursively q - d times.

function: C = strass(A, B, n, n_min)
    if n ≤ n_min
        C = AB
    else
        m = n/2; u = 1:m; v = m+1:n;
        P_1 = strass(A(u,u) + A(v,v), B(u,u) + B(v,v), m, n_min)
        P_2 = strass(A(v,u) + A(v,v), B(u,u), m, n_min)
        P_3 = strass(A(u,u), B(u,v) - B(v,v), m, n_min)
        P_4 = strass(A(v,v), B(v,u) - B(u,u), m, n_min)
        P_5 = strass(A(u,u) + A(u,v), B(v,v), m, n_min)
        P_6 = strass(A(v,u) - A(u,u), B(u,u) + B(u,v), m, n_min)
        P_7 = strass(A(u,v) - A(v,v), B(v,u) + B(v,v), m, n_min)
        C(u,u) = P_1 + P_4 - P_5 + P_7
        C(u,v) = P_3 + P_5
        C(v,u) = P_2 + P_4
        C(v,v) = P_1 + P_3 - P_2 + P_6
    end
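Here is a Python/NumPy rendering (our own sketch; it assumes n is a power of two and does no error checking) of the recursion in Algorithm 1.3.1:

    import numpy as np

    def strass(A, B, nmin=2):
        # recursive Strassen multiply for n-by-n matrices, n a power of 2
        n = A.shape[0]
        if n <= nmin:
            return A @ B                  # conventional multiply at the bottom level
        m = n // 2
        u, v = slice(0, m), slice(m, n)
        P1 = strass(A[u, u] + A[v, v], B[u, u] + B[v, v], nmin)
        P2 = strass(A[v, u] + A[v, v], B[u, u], nmin)
        P3 = strass(A[u, u], B[u, v] - B[v, v], nmin)
        P4 = strass(A[v, v], B[v, u] - B[u, u], nmin)
        P5 = strass(A[u, u] + A[u, v], B[v, v], nmin)
        P6 = strass(A[v, u] - A[u, u], B[u, u] + B[u, v], nmin)
        P7 = strass(A[u, v] - A[v, v], B[v, u] + B[v, v], nmin)
        C = np.empty_like(A)
        C[u, u] = P1 + P4 - P5 + P7
        C[u, v] = P3 + P5
        C[v, u] = P2 + P4
        C[v, v] = P1 + P3 - P2 + P6
        return C

    A = np.random.rand(8, 8)
    B = np.random.rand(8, 8)
    assert np.allclose(strass(A, B), A @ B)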

Unlike any of our previous algorithms, strass is recursive, meaning that it calls itself. Divide and conquer algorithms are often best described in this manner. We have presented this algorithm in the style of a MATLAB function so that the recursive calls can be stated with precision.
The amount of arithmetic associated with strass is a complicated function of n and n_min. If n_min ≫ 1, then it suffices to count multiplications as the number of additions is roughly the same. If we just count the multiplications, then it suffices to examine the deepest level of the recursion as that is where all the multiplications occur. In strass there are q - d subdivisions and thus 7^{q-d} conventional matrix-matrix multiplications to perform. These multiplications have size n_min and thus strass involves about s = (2^d)³·7^{q-d} multiplications compared to c = (2^q)³, the number of multiplications in the conventional approach. Notice that

    s/c = (7/8)^{q-d}.

If d = 0, i.e., we recur on down to the 1-by-1 level, then

    s = (7/8)^q c = 7^q = n^{log_2 7} ≈ n^{2.807}.

Thus, asymptotically, the number of multiplications in the Strassen procedure is O(n^{2.807}). However, the number of additions (relative to the number of multiplications) becomes significant as n_min gets small.

Example 1.3.1 If n = 1024 and n_min = 64, then strass involves (7/8)^{10-6} ≈ .6 the arithmetic of the conventional algorithm.

Problems

P1.3.1 Generalize (1.3.3) so that it can handle the variable block-size problem covered by Theorem 1.3.3.

P1.3.2 Generalize (1.3.4) and (1.3.5) so that they can handle the variable block-size case.

P1.3.3 Adapt strass so that it can handle square matrix multiplication of any order. Hint: If the "current" A has odd dimension, append a zero row and column.

P1.3.4 Prove that if

    A = [ A_11 ... A_1r ]
        [  ...       ... ]
        [ A_q1 ... A_qr ]

is a blocking of the matrix A, then A^T is the r-by-q block matrix whose (β,α) block is A_{αβ}^T.

P1.3.5 Suppose n is even and define the following function from R^n to R:

    f(x) = x(1:2:n)^T x(2:2:n) = Σ_{i=1}^{n/2} x_{2i-1} x_{2i}

(a) Show that if x, y ∈ R^n then

    x^T y = Σ_{i=1}^{n/2} (x_{2i-1} + y_{2i})(x_{2i} + y_{2i-1}) - f(x) - f(y)

(b) Now consider the n-by-n matrix multiplication C = AB. Give an algorithm for computing this product that requires n³/2 multiplies once f is applied to the rows of A and the columns of B. See Winograd (1968) for details.

P1.3.6 Prove Lemma 1.3.2 for general s. Hint: Set p̄_γ = p_1 + ... + p_{γ-1} and

    c_{ij} = Σ_{γ=1}^{s}  Σ_{k=p̄_γ+1}^{p̄_γ+p_γ}  a_{ik} b_{kj}.

P1.3.7 Use Lemmas 1.3.1 and 1.3.2 to prove Theorem 1.3.3. In particular, set

    A_γ = [ A_1γ ]            and            B_γ = [ B_γ1 ... B_γr ]
          [  ... ]
          [ A_qγ ]

and note from Lemma 1.3.2 that

    C = Σ_{γ=1}^{s} A_γ B_γ.

Now analyze each A_γ B_γ with the help of Lemma 1.3.1.

Notes and References for Sec. 1.3

For quite some time fast methods for matrix multiplication have attracted a lot of attention within computer science. See

S. Winograd (1968). "A New Algorithm for Inner Product," IEEE Trans. Comp. C-17, 693-694.
V. Strassen (1969). "Gaussian Elimination is Not Optimal," Numer. Math. 13, 354-356.
V. Pan (1984). "How Can We Speed Up Matrix Multiplication?," SIAM Review 26, 393-416.

Many of these methods have dubious practical value. However, with the publication of

D. Bailey (1988). "Extra High Speed Matrix Multiplication on the Cray-2," SIAM J. Sci. and Stat. Comp. 9, 603-607.

it is clear that the blanket dismissal of these fast procedures is unwise. The "stability" of the Strassen algorithm is discussed in §2.4.10. See also

N.J. Higham (1990). "Exploiting Fast Matrix Multiplication within the Level 3 BLAS," ACM Trans. Math. Soft. 16, 352-368.
C.C. Douglas, M. Heroux, G. Slishman, and R.M. Smith (1994). "GEMMW: A Portable Level 3 BLAS Winograd Variant of Strassen's Matrix-Matrix Multiply Algorithm," J. Comput. Phys. 110, 1-10.

1.4 Vectorization and Re-Use Issues


The matrix manipulations discussed in this book are mostly built upon dot products and saxpy operations. Vector pipeline computers are able to perform vector operations such as these very fast because of special hardware that is able to exploit the fact that a vector operation is a very regular sequence of scalar operations. Whether or not high performance is extracted from such a computer depends upon the length of the vector operands and a number of other factors that pertain to the movement of data such as vector stride, the number of vector loads and stores, and the level of data re-use. Our goal is to build a useful awareness of these issues. We are not trying to build a comprehensive model of vector pipeline computing that might be used to predict performance. We simply want to identify the kind of thinking that goes into the design of an effective vector pipeline code. We do not mention any particular machine. The literature is filled with case studies.

1.4.1 Pipelining Arithmetic Operations


The primary reason why vector computers are fast has to do with pipelining. The concept of pipelining is best understood by making an analogy to assembly line production. Suppose the assembly of an individual automobile requires one minute at each of sixty workstations along an assembly line. If the line is well staffed and able to initiate the assembly of a new car every minute, then 1000 cars can be produced from scratch in about 1000 + 60 = 1060 minutes. For a work order of this size the line has an effective "vector speed" of 1000/1060 automobiles per minute. On the other hand, if the assembly line is understaffed and a new assembly can be initiated just once an hour, then 1000 hours are required to produce 1000 cars. In this case the line has an effective "scalar speed" of 1/60th automobile per minute.
So it is with a pipelined vector operation such as the vector add z = x + y. The scalar operations z_i = x_i + y_i are the cars. The number of elements is the size of the work order. If the start-to-finish time required for each z_i is τ, then a pipelined, length-n vector add could be completed in time much less than nτ. This gives vector speed. Without the pipelining, the vector computation would proceed at a scalar rate and would approximately require time nτ for completion.
Let us see how a sequence of floating point operations can be pipelined. Floating point operations usually require several cycles to complete. For example, a 3-cycle addition of two scalars x and y may proceed as in Fig. 1.4.1.

    FIG. 1.4.1  A 3-Cycle Adder

To visualize the operation, continue with the above metaphor and think of the addition unit as an assembly line with three "work stations". The input scalars x and y proceed along the assembly line, spending one cycle at each of three stations. The sum z emerges after three cycles.

    FIG. 1.4.2  Pipelined Addition (stages: Adjust Exponents, Add, Normalize)

Note that when a single, "free standing" addition is performed, only one of
the three stations is active during the computation.
Now consider a vector addition z = x + y. With pipelining, the x and y vectors are streamed through the addition unit. Once the pipeline is filled and steady state reached, a z_i is produced every cycle. In Fig. 1.4.2 we depict what the pipeline might look like once this steady state is achieved.
In this case, vector speed is about three times scalar speed because the time
for an individual add is three cycles.

1.4.2 Vector Operations


A vector pipeline computer comes with a repertoire of vector instructions,
such as vector add, vector multiply, vector scale, dot product, and saxpy.
We assume for clarity that these operations take place in vector registers.
Vectors travel between the registers and memory by means of vector load
and vector store instructions.
An important attribute of a vector processor is the length of its vector registers, which we designate by v_L. A length-n vector operation must be broken down into subvector operations of length v_L or less. Here is how such a partitioning might be managed in the case of a vector addition z = x + y where x and y are n-vectors:

first = 1
while first ≤ n
    last = min{n, first + v_L - 1}
    Vector load x(first:last).
    Vector load y(first:last).
    Vector add: z(first:last) = x(first:last) + y(first:last).
    Vector store z(first:last).
    first = last + 1
end

A reasonable compiler for a vector computer would automatically generate these vector instructions from a programmer-specified z = x + y command.

1.4.3 The Vector Length Issue


Suppose the pipeline for the vector operation op takes τ_op cycles to "set up." Assume that one component of the result is obtained per cycle once the pipeline is filled. The time required to perform an n-dimensional op is then given by

    T_op(n) = (τ_op + n)μ

where μ is the cycle time and v_L is the length of the vector hardware. If the vectors to be combined are longer than the vector hardware length, then as we have seen the overall vector operation must be broken down into hardware-manageable chunks. Thus, if

    n = n_1·v_L + n_0,        0 ≤ n_0 < v_L,

then we assume that

    T_op(n) = n_1(τ_op + v_L)μ                        if n_0 = 0,
    T_op(n) = (n_1(τ_op + v_L) + τ_op + n_0)μ         if n_0 > 0,

specifies the overall time required to perform a length-n op. This simplifies to

    T_op(n) = (n + τ_op·ceil(n/v_L))μ

where ceil(α) is the smallest integer such that α ≤ ceil(α). If p flops per component are involved, then the effective rate of computation for general n is given by

    R_op(n) = pn/T_op(n) = (p/μ) · 1/(1 + (τ_op/n)·ceil(n/v_L)).

(If μ is in seconds, then R_op is in flops per second.) The asymptotic rate of performance is given by

    lim_{n→∞} R_op(n) = (p/μ)/(1 + τ_op/v_L).

As a way of assessing how serious the start-up overhead is for a vector operation, Hockney and Jesshope (1988) define the quantity n_{1/2} to be the smallest n for which half of peak performance is achieved, i.e.,

    R_op(n_{1/2}) = (1/2)·(p/μ).

Machines that have big n_{1/2} factors do not perform well on short vector operations.
Let us see what the above performance model says about the design of the matrix multiply update C = AB + C where A ∈ R^{m×p}, B ∈ R^{p×n}, and C ∈ R^{m×n}. Recall from §1.1.11 that there are six possible versions of the conventional algorithm and they correspond to the six possible loop orderings of

for i = 1:m
    for j = 1:n
        for k = 1:p
            C(i,j) = A(i,k)B(k,j) + C(i,j)
        end
    end
end

This is the ijk variant and its innermost loop oversees a length-p dot product. Thus, our performance model predicts that

    T_ijk = mnp + mn·τ_dot·ceil(p/v_L)

cycles are required. A similar analysis for each of the other variants leads to the following table:

    Variant    Cycles
    ijk        mnp + mn·τ_dot·ceil(p/v_L)
    jik        mnp + mn·τ_dot·ceil(p/v_L)
    ikj        mnp + mp·τ_sax·ceil(n/v_L)
    jki        mnp + np·τ_sax·ceil(m/v_L)
    kij        mnp + mp·τ_sax·ceil(n/v_L)
    kji        mnp + np·τ_sax·ceil(m/v_L)

We make a few observations based upon some elementary integer arithmetic manipulation. Assume that τ_sax and τ_dot are roughly equal. If m, n, and p are all less than v_L, then the most efficient variants will have the longest inner loops. If m, n, and p are much bigger than v_L, then the distinction between the six options is small.

1.4.4 The Stride Issue


The "layout" of a vector operand in memory often has a bearing on execu-
tion speed. The key factor is stride. The 6tride of a stored floating point
vector is the distance (in logical memory locations) between the vector's
components. Accessing a row in a twi:Kiimensional Fortran array is not a
unit stride operation because arrays are stored by column. In C, it is just
the opposite as matrices are stored by row. Nonunit stride vector opera-
tions may interfere with the pipelining capability of a computer degrading
performance.
To clarify the stride issue we coDBider how the six variants of matrix
multiplication "pull up" data from the A, B, and C matrices in the inner
loop. This is where the vector calculation occurs (dot product or saxpy}
and there are three possibilities:

jki or kji:    for i = 1:m
                   C(i,j) = C(i,j) + A(i,k)B(k,j)
               end

ikj or kij:    for j = 1:n
                   C(i,j) = C(i,j) + A(i,k)B(k,j)
               end

ijk or jik:    for k = 1:p
                   C(i,j) = C(i,j) + A(i,k)B(k,j)
               end

Here is a table that specifies the A, B, and C strides associated with each of these possibilities:

    Variant        A Stride     B Stride     C Stride
    jki or kji     Unit          0            Unit
    ikj or kij      0           Non-Unit     Non-Unit
    ijk or jik     Non-Unit      Unit         0

Storage in column-major order is assumed. A stride of zero means that only a single array element is accessed in the inner loop. From the stride point of view, it is clear that we should favor the jki and kji variants. This may not coincide with a preference that is based on vector length considerations. Dilemmas of this type are typical in high performance computing. One goal (maximize vector length) can conflict with another (impose unit stride).
Sometimes a vector stride/vector length conflict can be resolved through the intelligent choice of data structures. Consider the gaxpy y = Ax + y where A ∈ R^{n×n} is symmetric. Assume that n ≤ v_L for simplicity. If A is stored conventionally and Algorithm 1.1.4 is used, then the central computation entails n unit stride saxpy's, each having length n:

for j = 1:n
    y = A(:,j)x(j) + y
end

Our simple execution model tells us that

    T_1 = n(τ_sax + n)

cycles are required.
In §1.2.7 we introduced the lower triangular storage scheme for symmetric matrices and obtained this version of the gaxpy:

for j = 1:n
    for i = 1:j-1
        y(i) = A.vec((i-1)n - i(i-1)/2 + j)x(j) + y(i)
    end
    for i = j:n
        y(i) = A.vec((j-1)n - j(j-1)/2 + i)x(j) + y(i)
    end
end
Notice that the first i-loop does not define a unit stride saxpy. If we assume that a length-n, nonunit stride saxpy is equivalent to n unit-length saxpys (a worst case scenario), then this implementation involves

    T_2 = Σ_{j=1}^{n} [ (j-1)(τ_sax + 1) + (τ_sax + n - j + 1) ]  ≈  (τ_sax + 2)n²/2

cycles.
In §1.2.8 we developed the store-by-diagonal version:

for i = 1:n
    y(i) = A.diag(i)x(i) + y(i)
end
for k = 1:n-1
    t = nk - k(k-1)/2
    { y = D(A,k)x + y }
    for i = 1:n-k
        y(i) = A.diag(i + t)x(i + k) + y(i)
    end
    { y = D(A,k)^T x + y }
    for i = 1:n-k
        y(i + k) = A.diag(i + t)x(i) + y(i + k)
    end
end

In this case both inner loops define a unit stride vector multiply (vm) and our model of execution predicts

    T_3 = n(2τ_vm + n)

cycles.
The example shows how the choice of data structure can affect the stride attributes of an algorithm. Store by diagonal seems attractive because it represents the matrix compactly and has unit stride. However, a careful which-is-best analysis would depend upon the values of τ_sax and τ_vm and the precise penalties for nonunit stride computation and excess storage. The complexity of the situation would call for careful benchmarking.

1.4.5 Thinking About Data Motion


Another important attribute of a matrix algorithm concerns the actual volume of data that has to be moved around during execution. Matrices sit in memory but the computations that involve their entries take place in functional units. The control of memory traffic is crucial to performance in many computers. To continue with the factory metaphor used at the beginning of this section: Can we keep the superfast arithmetic units busy with enough deliveries of matrix data and can we ship the results back to memory fast enough to avoid backlog? Fig. 1.4.3 depicts the typical situation in an advanced uniprocessor environment.

    FIG. 1.4.3  Memory Hierarchy

Details vary from machine to machine, but two "axioms" prevail:

• Each level in the hierarchy has a limited capacity and for economic reasons this capacity is usually smaller as we ascend the hierarchy.

• There is a cost, sometimes relatively great, associated with the moving of data between two levels in the hierarchy.

The design of an efficient matrix algorithm requires careful thinking about


the flow of data in between the various levels of storage. The vector touch
and data re-use issues are important in this regard.

1.4.6 The Vector Touch Issue


In many advanced computers, data is moved around in chunks, e.g., vectors. The time required to read or write a vector to memory is comparable to the time required to engage the vector in a dot product or saxpy. Thus, the number of vector touches associated with a matrix code is a very important statistic. By a "vector touch" we mean either a vector load or store.

Let's count the number of vector touches associated with an m-by-n outer product. Assume that m = m_1 v_L and n = n_1 v_L where v_L is the vector hardware length. (See §1.4.3.) In this environment, the outer product update A = A + xy^T would be arranged as follows:

for α = 1:m_1
    i = (α-1)v_L + 1:αv_L
    for β = 1:n_1
        j = (β-1)v_L + 1:βv_L
        A(i,j) = A(i,j) + x(i)y(j)^T
    end
end

Each column of the submatrix A(i,j) must be loaded, updated, and then stored. Not forgetting to account for the vector touches associated with x and y, we see that approximately

    2mn/v_L

vector touches are required. (Low order terms do not contribute to the analysis.)
Now consider the gaxpy update y = Ax + y where y ∈ R^m, x ∈ R^n, and A ∈ R^{m×n}. Breaking this computation down into segments of length v_L gives

for α = 1:m_1
    i = (α-1)v_L + 1:αv_L
    for β = 1:n_1
        j = (β-1)v_L + 1:βv_L
        y(i) = y(i) + A(i,j)x(j)
    end
end

Again, each column of submatrix A(i,j) must be read but the only writing to memory involves subvectors of y. Thus, the number of vector touches for an m-by-n gaxpy is approximately

    mn/v_L.

This is half the number required by an identically-sized outer product. Thus, if a computation can be arranged in terms of either outer products or gaxpys, then the latter is preferable from the vector touch standpoint.

1.4.7 Blocking and Re-Use

A cache is a small high-speed memory situated in between the functional units and main memory. See Fig. 1.4.3. Cache utilization colors performance because it has a direct bearing upon how data flows in between the functional units and the lower levels of memory.
To illustrate this we consider the computation of the matrix multiply update C = AB + C where A, B, C ∈ R^{n×n} reside in main memory.² All data must pass through the cache on its way to the functional units where the floating point computations are carried out. If the cache is small and n is big, then the update must be broken down into smaller parts so that the cache can "gracefully" process the flow of data.
One strategy is to block the B and C matrices,

    B = [ B_1, ..., B_N ]        C = [ C_1, ..., C_N ]
          ℓ          ℓ                 ℓ          ℓ

where we assume that n = ℓN. From the expansion

    C_α = AB_α + C_α = Σ_{k=1}^{n} A(:,k)B_α(k,:) + C_α

we obtain the following computational framework:

for α = 1:N
    Load B_α and C_α into cache.
    for k = 1:n
        Load A(:,k) into cache and update C_α:
            C_α = A(:,k)B_α(k,:) + C_α
    end
    Store C_α in main memory.
end

Note that if M is the cache size measured in floating point words, then we must have

    2nℓ + n ≤ M.                                            (1.4.1)

Let Γ_1 be the number of floating point numbers that flow (in either direction) between cache and main memory. Note that every entry in B is loaded into cache once, every entry in C is loaded into cache once and stored back in main memory once, and every entry in A is loaded into cache N = n/ℓ times. It follows that

    Γ_1 = 3n² + n³/ℓ.

²The discussion which follows would also apply if the matrices were on a disk and needed to be brought into main memory.

In the interest of keeping data motion to a minimum, we choose ℓ to be as large as possible subject to the constraint (1.4.1). We therefore set

    ℓ = (M - n)/(2n)

obtaining

    Γ_1 ≈ 3n² + 2n⁴/(M - n).

(We use "≈" to emphasize the approximate nature of our analysis.) If cache is large enough to house the entire B and C matrices with room left over for a column of A, then ℓ = n and Γ_1 = 4n². At the other extreme, if we can just fit three columns in cache, then ℓ = 1 and Γ_1 ≈ n³.
Now let us regard A = (A_{αβ}), B = (B_{αβ}), and C = (C_{αβ}) as N-by-N block matrices with uniform block size ℓ = n/N. With this blocking the computation of

    C_{αβ} = Σ_{γ=1}^{N} A_{αγ}B_{γβ} + C_{αβ},        α = 1:N, β = 1:N,

can be arranged as follows:

for α = 1:N
    for β = 1:N
        Load C_{αβ} into cache.
        for γ = 1:N
            Load A_{αγ} and B_{γβ} into cache.
            C_{αβ} = C_{αβ} + A_{αγ}B_{γβ}
        end
        Store C_{αβ} in main memory.
    end
end

In this case the main memory/cache traffic sums to

    Γ_2 = 2n² + 2n³/ℓ

because each entry in A and B is loaded N = n/ℓ times and each entry in C is loaded once and stored once. We can minimize this by choosing ℓ to be as large as possible subject to the constraint that three blocks fit in cache, i.e.,

    3ℓ² ≤ M.

Setting ℓ = √(M/3) gives

    Γ_2 ≈ 2n² + 2√3·n³/√M.

A manipulation shows that

    Γ_1/Γ_2 ≈ (3n² + 2n⁴/M)/(2n² + 2√3·n³/√M) = (3 + 2n²/M)/(2 + 2√3·n/√M).

The key quantity here is n²/M, the ratio of matrix size (in floating point words) to cache size. As this ratio grows we find that

    Γ_1/Γ_2 ≈ n/√(3M),

showing that the second blocking strategy is superior from the standpoint of data motion to and from the cache. The fundamental conclusion to be reached from all of this is that blocking affects data motion.

1.4.8 Block Matrix Data Structures


We conclude this section with a discussion about block data structures. A programming language that supports two-dimensional arrays must have a convention for storing such a structure in memory. For example, Fortran stores two-dimensional arrays in column major order. This means that the entries within a column are contiguous in memory. Thus, if 24 storage locations are allocated for A ∈ R^{4×6}, then in traditional store-by-column format the matrix entries are "lined up" in memory as depicted in Fig. 1.4.4.

    FIG. 1.4.4  Store by Column (4-by-6 case)

In other words, if A ∈ R^{m×n} is stored in v(1:mn), then we identify A(i,j) with v((j-1)m + i). For algorithms that access matrix data by column this is a good arrangement since the column entries are contiguous in memory.

In certain block matrix algorithms it is sometimes useful to store matrices by blocks rather than by column. Suppose, for example, that the matrix A above is a 2-by-3 block matrix with 2-by-2 blocks. In a store-by-column block scheme with store-by-column within each block, the 24 entries are arranged in memory as shown in Fig. 1.4.5. This data structure can be attractive for block algorithms because the entries within a given block are contiguous in memory.

    FIG. 1.4.5  Store-by-Blocks (4-by-6 case with 2-by-2 Blocks)
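The two layouts are easy to mimic in Python/NumPy (our own illustration; 0-based indexing is used, so the 1-based formula v((j-1)m + i) becomes the index (j-1)m + i - 1):

    import numpy as np

    m, n, bl = 4, 6, 2
    A = np.arange(1, m * n + 1).reshape(m, n)

    # Traditional store-by-column: A(i,j) lives at v[(j-1)m + i] (1-based).
    v_col = A.flatten(order='F')
    assert v_col[(3 - 1) * m + 2 - 1] == A[1, 2]    # entry (i,j) = (2,3), 1-based

    # Store-by-blocks: blocks taken in column order, column-major within each block.
    blocks = [A[bi:bi + bl, bj:bj + bl]
              for bj in range(0, n, bl) for bi in range(0, m, bl)]
    v_blk = np.concatenate([blk.flatten(order='F') for blk in blocks])
    print(v_blk[:8])    # the first two 2-by-2 blocks of the first block column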

Problems

P1.4.1 Consider the matrix product D = ABC where A ∈ R^{n×r}, B ∈ R^{r×p}, and C ∈ R^{p×q}. Assume that all the matrices are stored by column and that the time required to execute a unit-stride saxpy operation of length k is of the form t(k) = (L + k)μ where L is a constant and μ is the cycle time. Based on this model, when is it more economical to compute D as D = (AB)C instead of as D = A(BC)? Assume that all matrix multiplies are done using the jki (gaxpy) algorithm.

P1.4.2 What is the total time spent in the jki variant on the saxpy operations assuming that all the matrices are stored by column and that the time required to execute a unit-stride saxpy operation of length k is of the form t(k) = (L + k)μ where L is a constant and μ is the cycle time? Specialize the algorithm so that it efficiently handles the case when A and B are n-by-n and upper triangular. Does it follow that the triangular implementation is six times faster as the flop count suggests?

P1.4.3 Give an algorithm for computing C = A^T BA where A and B are n-by-n and B is symmetric. Arrays should be accessed in unit stride fashion within all innermost loops.

P1.4.4 Suppose A ∈ R^{m×n} is stored by column in A.col(1:mn). Assume that m = ℓ_1 M and n = ℓ_2 N and that we regard A as an M-by-N block matrix with ℓ_1-by-ℓ_2 blocks. Given i, j, α, and β that satisfy 1 ≤ i ≤ ℓ_1, 1 ≤ j ≤ ℓ_2, 1 ≤ α ≤ M, and 1 ≤ β ≤ N, determine k so that A.col(k) houses the (i,j) entry of A_{αβ}. Give an algorithm that overwrites A.col with A stored by block as in Figure 1.4.5. How big of a work array is required?

Notes and References for Sec. 1.4

Two excellent expositions about vector computation are

J.J. Dongarra, F.G. Gustavson, and A. Karp (1984). "Implementing Linear Algebra Algorithms for Dense Matrices on a Vector Pipeline Machine," SIAM Review 26, 91-112.
J.M. Ortega and R.G. Voigt (1985). "Solution of Partial Differential Equations on Vector and Parallel Computers," SIAM Review 27, 149-240.

A very detailed look at matrix computations in hierarchical memory systems can be found in

K. Gallivan, W. Jalby, U. Meier, and A.H. Sameh (1988). "Impact of Hierarchical Memory Systems on Linear Algebra Algorithm Design," Int'l J. Supercomputer Applic. 2, 12-48.

See also

W. Schonauer (1987). Scientific Computing on Vector Computers, North Holland, Amsterdam.
R.W. Hockney and C.R. Jesshope (1988). Parallel Computers 2, Adam Hilger, Bristol and Philadelphia.

where various models of vector processor performance are set forth. Papers on the practical aspects of vector computing include

J.J. Dongarra and A. Hinds (1979). "Unrolling Loops in Fortran," Software Practice and Experience 9, 219-229.
J.J. Dongarra and S. Eisenstat (1984). "Squeezing the Most Out of an Algorithm in Cray Fortran," ACM Trans. Math. Soft. 10, 221-230.
B.L. Buzbee (1986). "A Strategy for Vectorization," Parallel Computing 3, 187-192.
K. Gallivan, W. Jalby, and U. Meier (1987). "The Use of BLAS3 in Linear Algebra on a Parallel Processor with a Hierarchical Memory," SIAM J. Sci. and Stat. Comp. 8, 1079-1084.
J.J. Dongarra and D. Walker (1995). "Software Libraries for Linear Algebra Computations on High Performance Computers," SIAM Review 37, 151-180.
Chapter 2

Matrix Analysis

§2.1 Basic Ideas from Linear Algebra


§2.2 Vector Norms
§2.3 Matrix Norms
§2.4 Finite Precision Matrix Computations
§2.5 Orthogonality and the SVD
§2.6 Projections and the CS Decomposition
§2.7 The Sensitivity of Square Linear Systems
The analysis and derivation of algorithms in the matrix computation
area requires a facility with certain aspects of linear algebra. Some of the
basics a.re reviewed in §2.1. Norms and their manipulation are covered in
§2.2 and §2.3. In §2.4 we develop a model of finite precision arithmetic and
then use it in a typical roundoff analysis.
The next two sections deal with orthogonality, which has a prominent role to play in matrix computations. The singular value decomposition and the CS decomposition are a pair of orthogonal reductions that provide critical insight into the important notions of rank and distance between subspaces. In §2.7 we examine how the solution of a linear system Ax = b changes if A and b are perturbed. The important concept of matrix condition is introduced.

Before You Begin


References that complement this chapter include Forsythe and Moler
(1967}, Stewart (1973), Stewart and Sun (1990), and Higham (1996).

2.1 Basic Ideas from Linear Algebra


This section is a quick review of linear algebra. Readers who wish a more
detailed coverage should consult the references at the end of the section.


2.1.1 Independence, Subspace, Basis, and Dimension

A set of vectors {a_1, ..., a_n} in R^m is linearly independent if Σ_{j=1}^n α_j a_j = 0
implies α(1:n) = 0. Otherwise, a nontrivial combination of the a_j is zero
and {a_1, ..., a_n} is said to be linearly dependent.
A subspace of R^m is a subset that is also a vector space. Given a
collection of vectors a_1, ..., a_n ∈ R^m, the set of all linear combinations of
these vectors is a subspace referred to as the span of {a_1, ..., a_n}:

    span{a_1, ..., a_n} = { Σ_{j=1}^n β_j a_j : β_j ∈ R }.

If {a_1, ..., a_n} is independent and b ∈ span{a_1, ..., a_n}, then b is a unique
linear combination of the a_j.
If S_1, ..., S_k are subspaces of R^m, then their sum is the subspace defined
by S = { a_1 + a_2 + ... + a_k : a_i ∈ S_i, i = 1:k }. S is said to be a direct sum
if each v ∈ S has a unique representation v = a_1 + ... + a_k with a_i ∈ S_i.
In this case we write S = S_1 ⊕ ... ⊕ S_k. The intersection of the S_i is also
a subspace, S = S_1 ∩ S_2 ∩ ... ∩ S_k.
The subset {a_{i_1}, ..., a_{i_k}} is a maximal linearly independent subset of
{a_1, ..., a_n} if it is linearly independent and is not properly contained in any
linearly independent subset of {a_1, ..., a_n}. If {a_{i_1}, ..., a_{i_k}} is maximal,
then span{a_1, ..., a_n} = span{a_{i_1}, ..., a_{i_k}} and {a_{i_1}, ..., a_{i_k}} is a basis
for span{a_1, ..., a_n}. If S ⊆ R^m is a subspace, then it is possible to find
independent basic vectors a_1, ..., a_k ∈ S such that S = span{a_1, ..., a_k}.
All bases for a subspace S have the same number of elements. This number
is the dimension and is denoted by dim(S).

2.1.2 Range, Null Space, and Rank


There are two important subspaces associated with an m-by-n matrix A.
The range of A is defined by

    ran(A) = { y ∈ R^m : y = Ax for some x ∈ R^n },

and the null space of A is defined by

    null(A) = { x ∈ R^n : Ax = 0 }.

If A = [ a_1, ..., a_n ] is a column partitioning, then

    ran(A) = span{a_1, ..., a_n}.

The rank of a matrix A is defined by

    rank(A) = dim(ran(A)).

It can be shown that rank(A) = rank(A^T). We say that A ∈ R^{m×n} is rank
deficient if rank(A) < min{m, n}. If A ∈ R^{m×n}, then

    dim(null(A)) + rank(A) = n.

2.1.3 Matrix Inverse


The n-by-n identity matrix I_n is defined by the column partitioning

    I_n = [ e_1, ..., e_n ]

where e_k is the kth "canonical" vector:

    e_k = ( 0, ..., 0, 1, 0, ..., 0 )^T,

with the 1 in the kth position. The canonical vectors arise frequently in
matrix analysis and if their dimension is ever ambiguous, we use superscripts,
i.e., e_k^{(n)} ∈ R^n.

If A and X are in R^{n×n} and satisfy AX = I, then X is the inverse of
A and is denoted by A^{-1}. If A^{-1} exists, then A is said to be nonsingular.
Otherwise, we say A is singular.
Several matrix inverse properties have an important role to play in ma-
trix computations. The inverse of a product is the reverse product of the
inverses:

    (AB)^{-1} = B^{-1} A^{-1}.                                        (2.1.1)

The transpose of the inverse is the inverse of the transpose:

    (A^{-1})^T = (A^T)^{-1} ≡ A^{-T}.                                 (2.1.2)

The identity

    B^{-1} = A^{-1} - B^{-1}(B - A)A^{-1}                             (2.1.3)

shows how the inverse changes if the matrix changes.
The Sherman-Morrison-Woodbury formula gives a convenient expres-
sion for the inverse of (A + UV^T) where A ∈ R^{n×n} and U and V are n-by-k:

    (A + UV^T)^{-1} = A^{-1} - A^{-1}U(I + V^T A^{-1} U)^{-1} V^T A^{-1}.   (2.1.4)

A rank k correction to a matrix results in a rank k correction of the inverse.
In (2.1.4) we assume that both A and (I + V^T A^{-1} U) are nonsingular.
Any of these facts can be verified by just showing that the "proposed"
inverse does the job. For example, here is how to confirm (2.1.3):

    B ( A^{-1} - B^{-1}(B - A)A^{-1} ) = BA^{-1} - (B - A)A^{-1} = I.
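As an illustrative check of (2.1.4), the following NumPy sketch (not from the text; the
sizes and random data are arbitrary choices) compares the Sherman-Morrison-Woodbury
expression with a directly computed inverse of A + UV^T:

    import numpy as np

    np.random.seed(0)
    n, k = 6, 2
    A = np.random.randn(n, n) + n * np.eye(n)   # shifted so A is safely nonsingular
    U = np.random.randn(n, k)
    V = np.random.randn(n, k)

    Ainv = np.linalg.inv(A)
    # (2.1.4): (A + U V^T)^{-1} = A^{-1} - A^{-1} U (I + V^T A^{-1} U)^{-1} V^T A^{-1}
    C = np.eye(k) + V.T @ Ainv @ U
    smw = Ainv - Ainv @ U @ np.linalg.solve(C, V.T @ Ainv)
    print(np.allclose(smw, np.linalg.inv(A + U @ V.T)))   # True up to roundoff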

2.1.4 The Determinant


If A = (a) ∈ R^{1×1}, then its determinant is given by det(A) = a. The
determinant of A ∈ R^{n×n} is defined in terms of order n-1 determinants:

    det(A) = Σ_{j=1}^n (-1)^{j+1} a_{1j} det(A_{1j}).

Here, A_{1j} is an (n-1)-by-(n-1) matrix obtained by deleting the first row
and jth column of A. Useful properties of the determinant include

    det(AB)  = det(A) det(B)        A, B ∈ R^{n×n}
    det(A^T) = det(A)               A ∈ R^{n×n}
    det(cA)  = c^n det(A)           c ∈ R, A ∈ R^{n×n}
    det(A) ≠ 0  ⟺  A is nonsingular   A ∈ R^{n×n}

2.1.5 Differentiation
Suppose α is a scalar and that A(α) is an m-by-n matrix with entries a_{ij}(α).
If a_{ij}(α) is a differentiable function of α for all i and j, then by Ȧ(α) we
mean the matrix

    Ȧ(α) = (d/dα) A(α) = ( (d/dα) a_{ij}(α) ) = ( ȧ_{ij}(α) ).

The differentiation of a parameterized matrix turns out to be a handy way
to examine the sensitivity of various matrix problems.

Problems

P2.1.1 Show that if A ∈ R^{m×n} has rank p, then there exists an X ∈ R^{m×p} and a
Y ∈ R^{n×p} such that A = XY^T, where rank(X) = rank(Y) = p.
P2.1.2 Suppose A(α) ∈ R^{m×r} and B(α) ∈ R^{r×n} are matrices whose entries are differ-
entiable functions of the scalar α. Show

    (d/dα)[A(α)B(α)] = [(d/dα)A(α)] B(α) + A(α) [(d/dα)B(α)].

P2.1.3 Suppose A(α) ∈ R^{n×n} has entries that are differentiable functions of the scalar
α. Assuming A(α) is always nonsingular, show

    (d/dα)[A(α)^{-1}] = -A(α)^{-1} [(d/dα)A(α)] A(α)^{-1}.

P2.1.4 Suppose A ∈ R^{n×n}, b ∈ R^n and that φ(x) = (1/2) x^T A x - x^T b. Show that the
gradient of φ is given by ∇φ(x) = (1/2)(A^T + A)x - b.
P2.1.5 Assume that both A and A + uv^T are nonsingular where A ∈ R^{n×n} and u, v ∈ R^n.
Show that if x solves (A + uv^T)x = b, then it also solves a perturbed right hand side
problem of the form Ax = b + αu. Give an expression for α in terms of A, b, u, and v.

Notes and References for Sec. 2.1

There are many introductory linear algebra texts. Among them, the following are par-
ticularly useful:

P.R. Halmos (1958). Finite Dimensional Vector Spaces, 2nd ed., Van Nostrand-Reinhold,
Princeton.
S.J. Leon (1980). Linear Algebra with Applications, Macmillan, New York.
G. Strang (1993). Introduction to Linear Algebra, Wellesley-Cambridge Press, Wellesley,
MA.
D. Lay (1994). Linear Algebra and Its Applications, Addison-Wesley, Reading, MA.
C. Meyer (1997). A Course in Applied Linear Algebra, SIAM Publications, Philadelphia,
PA.

More advanced treatments include Gantmacher (1959), Horn and Johnson (1985, 1991),
and

A.S. Householder (1964). The Theory of Matrices in Numerical Analysis, Ginn (Blais-
dell), Boston.
M. Marcus and H. Minc (1964). A Survey of Matrix Theory and Matrix Inequalities,
Allyn and Bacon, Boston.
J.N. Franklin (1968). Matrix Theory, Prentice-Hall, Englewood Cliffs, NJ.
R. Bellman (1970). Introduction to Matrix Analysis, Second Edition, McGraw-Hill, New
York.
P. Lancaster and M. Tismenetsky (1985). The Theory of Matrices, Second Edition,
Academic Press, New York.
J.M. Ortega (1987). Matrix Theory: A Second Course, Plenum Press, New York.

2.2 Vector Norms


Norms serve the same purpose on vector spaces that absolute value does
on the real line: they furnish a measure of distance. More precisely, R^n
together with a norm on R^n defines a metric space. Therefore, we have the
familiar notions of neighborhood, open sets, convergence, and continuity
when working with vectors and vector-valued functions.

2.2.1 Definitions
A vector norm on R^n is a function f: R^n → R that satisfies the following
properties:

    f(x) ≥ 0                 x ∈ R^n,  (f(x) = 0 iff x = 0)
    f(x + y) ≤ f(x) + f(y)   x, y ∈ R^n
    f(αx) = |α| f(x)         α ∈ R, x ∈ R^n

We denote such a function with a double bar notation: f(x) = ||x||. Sub-
scripts on the double bar are used to distinguish between various norms.
A useful class of vector norms are the p-norms defined by

    ||x||_p = ( |x_1|^p + ... + |x_n|^p )^{1/p}        p ≥ 1.      (2.2.1)

Of these the 1, 2, and ∞ norms are the most important:

    ||x||_1 = |x_1| + ... + |x_n|
    ||x||_2 = ( |x_1|^2 + ... + |x_n|^2 )^{1/2} = (x^T x)^{1/2}
    ||x||_∞ = max_{1≤i≤n} |x_i|.

A unit vector with respect to the norm ||·|| is a vector x that satisfies
||x|| = 1.
2.2.2 Some Vector Norm Properties
A classic result concerning p-norms is the Hölder inequality:

    |x^T y| ≤ ||x||_p ||y||_q        1/p + 1/q = 1.              (2.2.2)

A very important special case of this is the Cauchy-Schwarz inequality:

    |x^T y| ≤ ||x||_2 ||y||_2.                                    (2.2.3)

All norms on R^n are equivalent, i.e., if ||·||_α and ||·||_β are norms on
R^n, then there exist positive constants c_1 and c_2 such that

    c_1 ||x||_α ≤ ||x||_β ≤ c_2 ||x||_α                           (2.2.4)

for all x ∈ R^n. For example, if x ∈ R^n, then

    ||x||_2 ≤ ||x||_1 ≤ √n ||x||_2                                (2.2.5)
    ||x||_∞ ≤ ||x||_2 ≤ √n ||x||_∞                                (2.2.6)
    ||x||_∞ ≤ ||x||_1 ≤ n ||x||_∞.                                (2.2.7)

2.2.3 Absolute and Relative Error

Suppose x̂ ∈ R^n is an approximation to x ∈ R^n. For a given vector norm
||·|| we say that

    ε_abs = || x̂ - x ||

is the absolute error in x̂. If x ≠ 0, then

    ε_rel = || x̂ - x || / || x ||

prescribes the relative error in x̂. Relative error in the ∞-norm can be
translated into a statement about the number of correct significant digits
in x̂. In particular, if

    || x̂ - x ||_∞ / || x ||_∞ ≈ 10^{-p},

then the largest component of x̂ has approximately p correct significant
digits.

Example 2.2.1 If x = (1.234  .05674)^T and x̂ = (1.235  .05128)^T, then ||x̂ - x||_∞ / ||x||_∞
≈ .0043 ≈ 10^{-3}. Note that x̂_1 has about three significant digits that are correct while
only one significant digit in x̂_2 is correct.
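A minimal NumPy sketch (illustrative only, not part of the text) that evaluates the
quantities of Example 2.2.1 and checks the equivalences (2.2.5)-(2.2.7) for the same
vector:

    import numpy as np

    x    = np.array([1.234, 0.05674])
    xhat = np.array([1.235, 0.05128])

    abs_err = np.linalg.norm(xhat - x, np.inf)
    rel_err = abs_err / np.linalg.norm(x, np.inf)
    print(rel_err)                          # about 4.4e-3, i.e. roughly 10^(-3)

    # norm equivalence checks (2.2.5)-(2.2.7)
    n = x.size
    n1, n2, ninf = (np.linalg.norm(x, p) for p in (1, 2, np.inf))
    print(n2 <= n1 <= np.sqrt(n) * n2)      # True
    print(ninf <= n2 <= np.sqrt(n) * ninf)  # True
    print(ninf <= n1 <= n * ninf)           # True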

2.2.4 Convergence
We say that a sequence {x^{(k)}} of n-vectors converges to x if

    lim_{k→∞} || x^{(k)} - x || = 0.

Note that because of (2.2.4), convergence in the α-norm implies convergence
in the β-norm and vice versa.

Problems

P2.2.1 Show that if x ∈ R^n, then lim_{p→∞} ||x||_p = ||x||_∞.
P2.2.2 Prove the Cauchy-Schwarz inequality (2.2.3) by considering the inequality
0 ≤ (ax + by)^T(ax + by) for suitable scalars a and b.
P2.2.3 Verify that ||·||_1, ||·||_2, and ||·||_∞ are vector norms.
P2.2.4 Verify (2.2.5)-(2.2.7). When is equality achieved in each result?
P2.2.5 Show that in R^n, x^{(i)} → x if and only if x_k^{(i)} → x_k for k = 1:n.
P2.2.6 Show that any vector norm on R^n is uniformly continuous by verifying the
inequality | ||x|| - ||y|| | ≤ ||x - y||.
P2.2.7 Let ||·|| be a vector norm on R^m and assume A ∈ R^{m×n}. Show that if
rank(A) = n, then ||x||_A = ||Ax|| is a vector norm on R^n.
P2.2.8 Let x and y be in R^n and define ψ: R → R by ψ(α) = ||x - αy||_2. Show that
ψ is minimized when α = x^T y / y^T y.
P2.2.9 (a) Verify that ||z||_p = (|z_1|^p + ... + |z_n|^p)^{1/p} is a vector norm on C^n. (b) Show
that if z ∈ C^n then ||z||_p ≤ ||Re(z)||_p + ||Im(z)||_p. (c) Find a constant c_n such
that c_n( ||Re(z)||_2 + ||Im(z)||_2 ) ≤ ||z||_2 for all z ∈ C^n.
P2.2.10 Prove or disprove:

Notes and References for Sec. 2.2

Although a vector norm is "just" a generalization of the absolute value concept, there
are some noteworthy subtleties:

J.D. Pryce (1984). "A New Measure of Relative Error for Vectors," SIAM J. Numer.
Anal. 21, 202-21.

2.3 Matrix Norms


The analysis of matrix algorithms frequently requires use of matrix norms.
For example, the quality of a linear system solver may be poor if the ma-
trix of coefficients is "nearly singular." To quantify the notion of near-
singularity we need a measure of distance on the space of matrices. Matrix
norms provide that measure.

2.3.1 Definitions
Since R^{m×n} is isomorphic to R^{mn}, the definition of a matrix norm should be
equivalent to the definition of a vector norm. In particular, f: R^{m×n} → R
is a matrix norm if the following three properties hold:

    f(A) ≥ 0                 A ∈ R^{m×n},  (f(A) = 0 iff A = 0)
    f(A + B) ≤ f(A) + f(B)   A, B ∈ R^{m×n}
    f(αA) = |α| f(A)         α ∈ R, A ∈ R^{m×n}

As with vector norms, we use a double bar notation with subscripts to
designate matrix norms, i.e., ||A|| = f(A).
The most frequently used matrix norms in numerical linear algebra are
the Frobenius norm,

    ||A||_F = ( Σ_{i=1}^m Σ_{j=1}^n |a_{ij}|^2 )^{1/2}            (2.3.1)

and the p-norms

    ||A||_p = sup_{x≠0} ||Ax||_p / ||x||_p.                       (2.3.2)

Note that the matrix p-norms are defined in terms of the vector p-norms
that we discussed in the previous section. The verification that (2.3.1) and
(2.3.2) are matrix norms is left as an exercise. It is clear that ||A||_p is
the p-norm of the largest vector obtained by applying A to a unit p-norm
vector:

    ||A||_p = sup_{x≠0} || A (x/||x||_p) ||_p = max_{||x||_p = 1} ||Ax||_p.

It is important to understand that (2.3.1) and (2.3.2) define families
of norms; the 2-norm on R^{m×n} is a different function for each choice of
m and n. Thus, the easily verified inequality

    ||AB||_p ≤ ||A||_p ||B||_p        A ∈ R^{m×n}, B ∈ R^{n×q}    (2.3.3)

is really an observation about the relationship between three different norms.
Formally, we say that norms f_1, f_2, and f_3 on R^{m×q}, R^{m×n}, and R^{n×q}
are mutually consistent if for all A ∈ R^{m×n} and B ∈ R^{n×q} we have
f_1(AB) ≤ f_2(A) f_3(B).
Not all matrix norms satisfy the submultiplicative property

    ||AB|| ≤ ||A|| ||B||.                                         (2.3.4)
For example, if ||A||_Δ = max |a_{ij}| and

    A = B = [ 1  1 ]
            [ 1  1 ],

then ||AB||_Δ > ||A||_Δ ||B||_Δ. For the most part we work with norms that
satisfy (2.3.4).
The p-norms have the important property that for every A ∈ R^{m×n} and
x ∈ R^n we have ||Ax||_p ≤ ||A||_p ||x||_p. More generally, for any vector
norm ||·||_α on R^n and ||·||_β on R^m we have ||Ax||_β ≤ ||A||_{α,β} ||x||_α
where ||A||_{α,β} is a matrix norm defined by

    ||A||_{α,β} = sup_{x≠0} ||Ax||_β / ||x||_α.                   (2.3.5)

We say that ||·||_{α,β} is subordinate to the vector norms ||·||_α and ||·||_β.
Since the set {x ∈ R^n : ||x||_α = 1} is compact and the norms are continuous,
it follows that

    ||A||_{α,β} = max_{||x||_α = 1} ||Ax||_β = ||Ax*||_β          (2.3.6)

for some x* ∈ R^n having unit α-norm.
2.3.2 Some Matrix Norm Properties

The Frobenius and p-norms (especially p = 1, 2, ∞) satisfy certain inequal-
ities that are frequently used in the analysis of matrix computations. For
A ∈ R^{m×n} we have

    ||A||_2 ≤ ||A||_F ≤ √n ||A||_2                                (2.3.7)

    max_{i,j} |a_{ij}| ≤ ||A||_2 ≤ √(mn) max_{i,j} |a_{ij}|       (2.3.8)

    ||A||_1 = max_{1≤j≤n} Σ_{i=1}^m |a_{ij}|                      (2.3.9)

    ||A||_∞ = max_{1≤i≤m} Σ_{j=1}^n |a_{ij}|                      (2.3.10)

    (1/√n) ||A||_∞ ≤ ||A||_2 ≤ √m ||A||_∞                         (2.3.11)

    (1/√m) ||A||_1 ≤ ||A||_2 ≤ √n ||A||_1                         (2.3.12)

                                                                  (2.3.13)

The proofs of these relations are not hard and are left as exercises.
A sequence {A^{(k)}} ⊆ R^{m×n} converges if lim_{k→∞} ||A^{(k)} - A|| = 0.
Choice of norm is irrelevant since all norms on R^{m×n} are equivalent.

2.3.3 The Matrix 2-Norm

A nice feature of the matrix 1-norm and the matrix ∞-norm is that they
are easily computed from (2.3.9) and (2.3.10). A characterization of the
2-norm is considerably more complicated.

Theorem 2.3.1 If A ∈ R^{m×n}, then there exists a unit 2-norm n-vector z
such that A^T A z = μ² z where μ = ||A||_2.

Proof. Suppose z ∈ R^n is a unit vector such that ||Az||_2 = ||A||_2. Since
z maximizes the function

    g(x) = (1/2) ||Ax||_2² / ||x||_2² = (1/2) (x^T A^T A x) / (x^T x)

it follows that it satisfies ∇g(z) = 0 where ∇g is the gradient of g. But a
tedious differentiation shows that for i = 1:n

    ∂g(z)/∂z_i = [ (z^T z) Σ_{j=1}^n (A^T A)_{ij} z_j − (z^T A^T A z) z_i ] / (z^T z)².

In vector notation this says A^T A z = (z^T A^T A z) z. The theorem follows by
setting μ = ||Az||_2. □

The theorem implies that ||A||_2² is a zero of the polynomial p(λ) =
det(A^T A − λI). In particular, the 2-norm of A is the square root of the
largest eigenvalue of A^T A. We have much more to say about eigenvalues in
Chapters 7 and 8. For now, we merely observe that 2-norm computation
is iterative and decidedly more complicated than the computation of the
matrix 1-norm or ∞-norm. Fortunately, if the object is to obtain an order-
of-magnitude estimate of ||A||_2, then (2.3.7), (2.3.11), or (2.3.12) can be
used.
As another example of "norm analysis," here is a handy result for 2-
norm estimation.

Corollary 2.3.2 If A ∈ R^{m×n}, then ||A||_2 ≤ √( ||A||_1 ||A||_∞ ).

Proof. If z ≠ 0 is such that A^T A z = μ² z with μ = ||A||_2, then μ² ||z||_1 =
||A^T A z||_1 ≤ ||A^T||_1 ||A||_1 ||z||_1 = ||A||_∞ ||A||_1 ||z||_1. □
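The following NumPy sketch (an illustrative check on random data, not from the text)
computes ||A||_1 and ||A||_∞ from (2.3.9)-(2.3.10) and verifies the 2-norm estimates of
Corollary 2.3.2 and (2.3.11):

    import numpy as np

    np.random.seed(1)
    A = np.random.randn(5, 3)

    norm1   = np.abs(A).sum(axis=0).max()   # (2.3.9): maximum column sum
    norminf = np.abs(A).sum(axis=1).max()   # (2.3.10): maximum row sum
    norm2   = np.linalg.norm(A, 2)          # largest singular value

    print(norm2 <= np.sqrt(norm1 * norminf))                 # Corollary 2.3.2: True
    m, n = A.shape
    print(norminf / np.sqrt(n) <= norm2 <= np.sqrt(m) * norminf)   # (2.3.11): True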
2.3.4 Perturbations and the Inverse

We frequently use norms to quantify the effect of perturbations or to prove
that a sequence of matrices converges to a specified limit. As an illustration
of these norm applications, let us quantify the change in A^{-1} as a function
of change in A.

Lemma 2.3.3 If F ∈ R^{n×n} and ||F||_p < 1, then I − F is nonsingular
and

    || (I − F)^{-1} ||_p ≤ 1 / (1 − ||F||_p).

Proof. Suppose I − F is singular. It follows that (I − F)x = 0 for some
nonzero x. But then ||x||_p = ||Fx||_p implies ||F||_p ≥ 1, a contradiction.
Thus, I − F is nonsingular. To obtain an expression for its inverse consider
the identity

    ( Σ_{k=0}^N F^k ) (I − F) = I − F^{N+1}.

Since ||F||_p < 1 it follows that lim_{k→∞} F^k = 0 because ||F^k||_p ≤ ||F||_p^k.
Thus,

    ( lim_{N→∞} Σ_{k=0}^N F^k ) (I − F) = I.

It follows that (I − F)^{-1} = lim_{N→∞} Σ_{k=0}^N F^k. From this it is easy to show that

    || (I − F)^{-1} ||_p ≤ Σ_{k=0}^∞ ||F||_p^k = 1 / (1 − ||F||_p). □

Note that || (I − F)^{-1} − I ||_p ≤ ||F||_p / (1 − ||F||_p) as a consequence
of the lemma. Thus, if ||F||_p = ε < 1, then O(ε) perturbations in I induce O(ε)
perturbations in the inverse. We next extend this result to general matrices.

Theorem 2.3.4 If A is nonsingular and r = ||A^{-1}E||_p < 1, then A + E
is nonsingular and || (A + E)^{-1} − A^{-1} ||_p ≤ ||E||_p ||A^{-1}||_p² / (1 − r).

Proof. Since A is nonsingular, A + E = A(I − F) where F = −A^{-1}E.
Since ||F||_p = r < 1 it follows from Lemma 2.3.3 that I − F is nonsingular
and || (I − F)^{-1} ||_p < 1/(1 − r). Now (A + E)^{-1} = (I − F)^{-1} A^{-1} and so

    || (A + E)^{-1} ||_p ≤ ||A^{-1}||_p / (1 − r).

Equation (2.1.3) says that (A + E)^{-1} − A^{-1} = −A^{-1} E (A + E)^{-1} and so
by taking norms we find

    || (A + E)^{-1} − A^{-1} ||_p ≤ ||A^{-1}||_p ||E||_p || (A + E)^{-1} ||_p
                                 ≤ ||A^{-1}||_p² ||E||_p / (1 − r). □
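To make the bound concrete, here is a small NumPy check (an illustrative sketch, with
arbitrarily chosen random data) of Theorem 2.3.4 in the ∞-norm:

    import numpy as np

    np.random.seed(2)
    n = 5
    A = np.random.randn(n, n) + n * np.eye(n)   # comfortably nonsingular
    E = 1e-6 * np.random.randn(n, n)

    Ainv  = np.linalg.inv(A)
    AEinv = np.linalg.inv(A + E)

    norm = lambda M: np.linalg.norm(M, np.inf)
    r = norm(Ainv @ E)                          # must be < 1 for the theorem to apply
    lhs = norm(AEinv - Ainv)
    rhs = norm(E) * norm(Ainv)**2 / (1 - r)
    print(r < 1 and lhs <= rhs)                 # True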

Problems

P2.3.1 Show ||AB||_p ≤ ||A||_p ||B||_p where 1 ≤ p ≤ ∞.
P2.3.2 Let B be any submatrix of A. Show that ||B||_p ≤ ||A||_p.
P2.3.3 Show that if D = diag(d_1, ..., d_k) ∈ R^{m×n} with k = min{m, n}, then
||D||_p = max |d_i|.
P2.3.4 Verify (2.3.7) and (2.3.8).
P2.3.5 Verify (2.3.9) and (2.3.10).
P2.3.6 Verify (2.3.11) and (2.3.12).
P2.3.7 Verify (2.3.13).
P2.3.8 Show that if 0 ≠ s ∈ R^n and E ∈ R^{m×n}, then
P2.3.9 Suppose u ∈ R^m and v ∈ R^n. Show that if E = uv^T then ||E||_F = ||E||_2 =
||u||_2 ||v||_2 and that ||E||_∞ ≤ ||u||_∞ ||v||_1.
P2.3.10 Suppose A ∈ R^{m×n}, y ∈ R^m, and 0 ≠ s ∈ R^n. Show that E = (y − As)s^T / (s^T s)
has the smallest 2-norm of all m-by-n matrices E that satisfy (A + E)s = y.

Notes and References for Sec. 2.3

For deeper results concerning matrix/vector norms, see

F.L. Bauer and C.T. Fike (1960). "Norms and Exclusion Theorems," Numer. Math. 2,
137-44.
L. Mirsky (1960). "Symmetric Gauge Functions and Unitarily Invariant Norms," Quart.
J. Math. 11, 50-59.
A.S. Householder (1964). The Theory of Matrices in Numerical Analysis, Dover Pub-
lications, New York.
N.J. Higham (1992). "Estimating the Matrix p-Norm," Numer. Math. 62, 539-555.

2.4 Finite Precision Matrix Computations


In part, rounding errors are what makes the matrix computation area so
nontrivial and interesting. In this section we set up a model of floating point
arithmetic and then use it to develop error bounds for floating point dot
products, saxpy's, matrix-vector products and matrix-matrix products. For
a more comprehensive treatment than what we offer, see Higham (1996) or
Wilkinson (1965). The coverage in Forsythe and Moler (1967) and Stewart
(1973) is also excellent.

2.4.1 The Floating Point Numbers

When calculations are performed on a computer, each arithmetic opera-
tion is generally affected by roundoff error. This error arises because the
machine hardware can only represent a subset of the real numbers. We
denote this subset by F and refer to its elements as floating point numbers.
Following conventions set forth in Forsythe, Malcolm, and Moler (1977, pp.
10-29), the floating point number system on a particular computer is char-
acterized by four integers: the base β, the precision t, and the exponent
range [L, U]. In particular, F consists of all numbers f of the form

    f = ±.d_1 d_2 ... d_t × β^e,    0 ≤ d_i < β,  d_1 ≠ 0,  L ≤ e ≤ U,

together with zero. Notice that for a nonzero f ∈ F we have m ≤ |f| ≤ M
where

    m = β^{L-1}    and    M = β^U (1 − β^{−t}).                  (2.4.1)

As an example, if β = 2, t = 3, L = 0, and U = 2, then the non-negative
elements of F are represented by hash marks on the axis displayed in Figure
2.4.1. Notice that the floating point numbers are not equally spaced.

FIGURE 2.4.1 Sample Floating Point Number System

A typical value for (β, t, L, U) might be (2, 56, −64, 64).
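A short script (illustrative only) that enumerates the nonnegative elements of the toy
system β = 2, t = 3, L = 0, U = 2 makes the uneven spacing and the values m = .5 and
M = 3.5 of (2.4.1) visible:

    # Enumerate the nonnegative elements of the toy system beta=2, t=3, L=0, U=2.
    beta, t, L, U = 2, 3, 0, 2
    vals = {0.0}
    for e in range(L, U + 1):
        # mantissas .d1 d2 d3 with d1 != 0
        for mant in range(beta ** (t - 1), beta ** t):
            vals.add(mant * float(beta) ** (e - t))
    print(sorted(vals))
    # [0.0, 0.5, 0.625, 0.75, 0.875, 1.0, 1.25, 1.5, 1.75, 2.0, 2.5, 3.0, 3.5]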

2.4.2 A Model of Floating Point Arithmetic

To make general pronouncements about the effect of rounding errors on a
given algorithm, it is necessary to have a model of computer arithmetic on
F. To this end define the set G by

    G = { x ∈ R : m ≤ |x| ≤ M } ∪ {0}                            (2.4.2)

and the operator fl: G → F by

    fl(x) = nearest c ∈ F to x, with ties handled by rounding away from zero.

The fl operator can be shown to satisfy

    fl(x) = x(1 + ε),     |ε| ≤ u                                (2.4.3)

where u is the unit roundoff defined by

    u = (1/2) β^{1−t}.                                           (2.4.4)

Let a and b be any two floating point numbers and let "op" denote any
of the four arithmetic operations +, −, ×, ÷. If a op b ∈ G, then in our
model of floating point arithmetic we assume that the computed version of
(a op b) is given by fl(a op b). It follows that fl(a op b) = (a op b)(1 + ε)
with |ε| ≤ u. Thus,

    | fl(a op b) − (a op b) | / | a op b |  ≤  u,      a op b ≠ 0   (2.4.5)

showing that there is small relative error associated with individual arith-
metic operations¹. It is important to realize, however, that this is not
necessarily the case when a sequence of operations is involved.

Example 2.4.1 If β = 10, t = 3 floating point arithmetic is used, then it can be shown
that fl[ fl(10^{-4} + 1) − 1 ] = 0, implying a relative error of 1. On the other hand, the
exact answer is given by fl[ 10^{-4} + fl(1 − 1) ] = 10^{-4}. Floating point arithmetic is
not always associative.

If a op b ∉ G, then an arithmetic exception occurs. Overflow and
underflow results whenever |a op b| > M or 0 < |a op b| < m respectively.
The handling of these and other exceptions is hardware/system dependent.

2.4.3 Cancellation
Another important aspect of finite precision arithmetic is the phenomenon
of catastrophic cancellation. Roughly speaking, this term refers to the ex-
treme loss of correct significant digits when small numbers are additively
computed from large numbers. A well-known example taken from Forsythe,
Malcolm and Moler (1977, pp. 14-16) is the computation of e^{-a} via Tay-
lor series with a > 0. The roundoff error associated with this method is
approximately u times the largest partial sum. For large a, this error can
actually be larger than the exact exponential and there will be no correct
digits in the answer no matter how many terms in the series are summed.
On the other hand, if enough terms in the Taylor series for e^a are added and
the result reciprocated, then an estimate of e^{-a} to full precision is attained.

¹There are important exceptions. On machines without a guard digit, additive floating point
operations satisfy fl(a ± b) = (1 + ε_1)a ± (1 + ε_2)b where |ε_1|, |ε_2| ≤ u. In such an
environment, the inequality |fl(a ± b) − (a ± b)| ≤ u|a ± b| need not hold.
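The following sketch (illustrative; the value a = 30 is an arbitrary choice) reproduces the
phenomenon in double precision: summing the alternating Taylor series for e^{-a} directly
loses all correct digits, while summing the series for e^{a} and reciprocating does not.

    import math

    def exp_taylor(x, nterms=200):
        """Sum of the first nterms terms of the Taylor series for e^x."""
        s, term = 1.0, 1.0
        for k in range(1, nterms):
            term *= x / k
            s += term
        return s

    a = 30.0
    direct     = exp_taylor(-a)          # massive cancellation among huge terms
    reciprocal = 1.0 / exp_taylor(a)     # all terms positive, no cancellation
    exact      = math.exp(-a)

    print(direct)       # wrong; roundoff is roughly u times the largest partial sum
    print(reciprocal)   # agrees with exact to near full precision
    print(exact)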

2.4.4 The Absolute Value Notation

Before we proceed with the roundoff analysis of some basic matrix calcu-
lations, we acquire some useful notation. Suppose A ∈ R^{m×n} and that we
wish to quantify the errors associated with its floating point representation.
Denoting the stored version of A by fl(A), we see that

    [fl(A)]_{ij} = fl(a_{ij}) = a_{ij}(1 + ε_{ij}),     |ε_{ij}| ≤ u       (2.4.6)

for all i and j. A better way to say the same thing results if we adopt two
conventions. If A and B are in R^{m×n}, then

    B = |A|  ⟹  b_{ij} = |a_{ij}|,   i = 1:m, j = 1:n
    B ≤ A    ⟹  b_{ij} ≤ a_{ij},     i = 1:m, j = 1:n.

With this notation we see that (2.4.6) has the form

    | fl(A) − A | ≤ u|A|.

A relation such as this can be easily turned into a norm inequality, e.g.,
|| fl(A) − A ||_1 ≤ u|| A ||_1. However, when quantifying the rounding errors
in a matrix manipulation, the absolute value notation can be a lot more
informative because it provides a comment on each (i, j) entry.

2.4.5 Roundoff in Dot Products

We begin our study of finite precision matrix computations by considering
the rounding errors that result in the standard dot product algorithm:

    s = 0
    for k = 1:n
        s = s + x(k)y(k)                                         (2.4.7)
    end

Here, x and y are n-by-1 floating point vectors.
In trying to quantify the rounding errors in this algorithm, we are
immediately confronted with a notational problem: the distinction be-
tween computed and exact quantities. When the underlying computations
are clear, we shall use the fl(·) operator to signify computed quantities.
Thus, fl(x^T y) denotes the computed output of (2.4.7). Let us bound
| fl(x^T y) − x^T y |. If

    s_p = fl( Σ_{k=1}^p x_k y_k ),                               (2.4.8)

then s_1 = x_1 y_1 (1 + δ_1) with |δ_1| ≤ u and for p = 2:n

    s_p = fl( s_{p−1} + fl(x_p y_p) )
        = ( s_{p−1} + x_p y_p (1 + δ_p) ) (1 + ε_p)      |δ_p|, |ε_p| ≤ u.

A little algebra shows that

    fl(x^T y) = s_n = Σ_{k=1}^n x_k y_k (1 + γ_k)

where

    (1 + γ_k) = (1 + δ_k) Π_{j=k}^n (1 + ε_j)

with the convention that ε_1 = 0. Thus,

    | fl(x^T y) − x^T y | ≤ Σ_{k=1}^n |x_k y_k| |γ_k|.           (2.4.9)

To proceed further, we must bound the quantities |γ_k| in terms of u. The
following result is useful for this purpose.

Lemma 2.4.1 If (1 + α) = Π_{k=1}^n (1 + α_k) where |α_k| ≤ u and nu ≤ .01, then

    |α| ≤ 1.01 nu.

Proof. See Higham (1996, p. 75). □

Applying this result to (2.4.9) under the "reasonable" assumption nu ≤ .01
gives

    | fl(x^T y) − x^T y | ≤ 1.01 nu |x|^T |y|.                   (2.4.10)

Notice that if |x^T y| ≪ |x|^T |y|, then the relative error in fl(x^T y) may not
be small.
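As a sanity check (an illustrative sketch only, with arbitrary random data), one can
carry out a dot product in single precision and compare the observed error against the
bound (2.4.10), using a double precision result as the stand-in for the exact value:

    import numpy as np

    np.random.seed(3)
    n = 10000
    x = np.random.randn(n).astype(np.float32)     # the floating point data
    y = np.random.randn(n).astype(np.float32)

    u = 2.0**-24                                  # unit roundoff for IEEE single precision
    exact = np.dot(x.astype(np.float64), y.astype(np.float64))

    s = np.float32(0.0)
    for k in range(n):                            # recurrence (2.4.7) in single precision
        s = np.float32(s + x[k] * y[k])

    bound = 1.01 * n * u * float(np.dot(np.abs(x).astype(np.float64),
                                        np.abs(y).astype(np.float64)))
    print(abs(float(s) - exact) <= bound)         # True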

2.4.6 Alternative Ways to Quantify Roundoff Error

An easier but less rigorous way of bounding α in Lemma 2.4.1 is to say
|α| ≤ nu + O(u²). With this convention we have

    | fl(x^T y) − x^T y | ≤ nu |x|^T |y| + O(u²).                (2.4.11)

Other ways of expressing the same result include

    | fl(x^T y) − x^T y | ≤ φ(n) u |x|^T |y|                     (2.4.12)

and

    | fl(x^T y) − x^T y | ≤ c n u |x|^T |y|                      (2.4.13)

where in (2.4.12) φ(n) is a "modest" function of n and in (2.4.13) c is a
constant of order unity.
We shall not express a preference for any of the error bounding styles
shown in (2.4.10)-(2.4.13). This spares us the necessity of translating the
roundoff results that appear in the literature into a fixed format. Moreover,
paying overly close attention to the details of an error bound is inconsistent
with the "philosophy" of roundoff analysis. As Wilkinson (1971, p. 567)
says,

There is still a tendency to attach too much importance to the


precise error bounds obtained by an a priori error analysis. In
my opinion, the bound itself is usually the least important part
of it. The main object of such an analysis is to expose the
potential instabilities, if any, of an algorithm so that hopefully
from the insight thus obtained one might be led to improved al-
gorithms. Usually the bound itself is weaker than it might have
been because of the necessity of restricting the mass of detail
to a reasonable level and because of the limitations imposed by
expressing the errors in terms of matrix norms. A priori bounds
are not, in general, quantities that should be used in practice.
Practical error bounds should usually be determined by some
form of a posteriori error analysis, since this takes full advan-
tage of the statistical distribution of rounding errors and of any
special features, such as sparseness, in the matrix.

It is important to keep these perspectives in mind.

2.4.7 Dot Product Accumulation

Some computers have provision for accumulating dot products in double
precision. This means that if x and y are floating point vectors with length
t mantissas, then the running sum s in (2.4.7) is built up in a register with
a 2t digit mantissa. Since the multiplication of two t-digit floating point
numbers can be stored exactly in a double precision variable, it is only
when s is written to single precision memory that any roundoff occurs. In
this situation one can usually assert that a computed dot product has good
relative error, i.e., fl(x^T y) = x^T y(1 + δ) where |δ| ≈ u. Thus, the ability
to accumulate dot products is very appealing.
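A rough software analogue (illustrative only) is to carry the running sum of a single
precision dot product in double precision and round once at the end; the relative error
then typically drops to the order of the single precision unit roundoff:

    import numpy as np

    np.random.seed(4)
    n = 100000
    x = np.random.randn(n).astype(np.float32)
    y = np.random.randn(n).astype(np.float32)

    exact = np.dot(x.astype(np.float64), y.astype(np.float64))

    acc = 0.0                                  # double precision accumulator
    for k in range(n):
        acc += float(x[k]) * float(y[k])       # products formed and summed in double
    accumulated = np.float32(acc)              # single rounding at the end

    print(abs(float(accumulated) - exact) / abs(exact))   # about 1e-7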
2.4.8 Roundoff in Other Basic Matrix Computations

It is easy to show that if A and B are floating point matrices and α is a
floating point number, then

    fl(αA) = αA + E         |E| ≤ u|αA|                          (2.4.14)

and

    fl(A + B) = (A + B) + E     |E| ≤ u|A + B|.                  (2.4.15)

As a consequence of these two results, it is easy to verify that computed
saxpy's and outer product updates satisfy

    fl(αx + y) = αx + y + z         |z| ≤ u( 2|αx| + |y| ) + O(u²)      (2.4.16)

    fl(C + uv^T) = C + uv^T + E     |E| ≤ u( |C| + 2|uv^T| ) + O(u²).   (2.4.17)

Using (2.4.10) it is easy to show that a dot product based multiplication of
two floating point matrices A and B satisfies

    fl(AB) = AB + E         |E| ≤ nu|A||B| + O(u²).              (2.4.18)

The same result applies if a gaxpy or outer product based procedure is used.
Notice that matrix multiplication does not necessarily give small relative
error since |AB| may be much smaller than |A||B|, e.g.,

    [ 1  1 ] [  1    0 ]   =   [ .01  0 ]
    [ 0  0 ] [ -.99  0 ]       [  0   0 ].

It is easy to obtain norm bounds from the roundoff results developed thus
far. If we look at the 1-norm error in floating point matrix multiplication,
then it is easy to show from (2.4.18) that

    || fl(AB) − AB ||_1 ≤ nu|| A ||_1 || B ||_1 + O(u²).         (2.4.19)

2.4.9 Forward and Backward Error Analyses

Each roundoff bound given above is the consequence of a forward error
analysis. An alternative style of characterizing the roundoff errors in an
algorithm is accomplished through a technique known as backward error
analysis. Here, the rounding errors are related to the data of the problem
rather than to its solution. By way of illustration, consider the n = 2
version of triangular matrix multiplication. It can be shown that:

    fl(AB) = [ a_11 b_11 (1+ε_1)    ( a_11 b_12 (1+ε_2) + a_12 b_22 (1+ε_3) )(1+ε_4) ]
             [        0                           a_22 b_22 (1+ε_5)                 ]

where |ε_i| ≤ u, for i = 1:5. However, if we define

    Â = [ a_11    a_12 (1+ε_3)(1+ε_4) ]
        [  0          a_22 (1+ε_5)    ]

and

    B̂ = [ b_11 (1+ε_1)    b_12 (1+ε_2)(1+ε_4) ]
        [      0                 b_22         ]

then it is easily verified that fl(AB) = ÂB̂. Moreover,

    Â = A + E        |E| ≤ 2u|A| + O(u²)
    B̂ = B + F        |F| ≤ 2u|B| + O(u²).

In other words, the computed product is the exact product of slightly per-
turbed A and B.

2.4.10 Error in Strassen Multiplication


In §1.3.8 we outlined an unconventional matrix multiplication procedure
due to Strassen (1969). It is instructive to compare the effect of roundoff
in this method with the effect of roundoff in any of the conventional matrix
multiplication methods of §1.1.
It can be shown that the Strassen approach (Algorithm 1.3.1) produces
a Ĉ = fl(AB) that satisfies an inequality of the form (2.4.19). This is
perfectly satisfactory in many applications. However, the Ĉ that Strassen's
method produces does not always satisfy an inequality of the form (2.4.18).
To see this, suppose

    A = B = [ .99    .0010 ]
            [ .0010  .99   ]

and that we execute Algorithm 1.3.1 using 2-digit floating point arithmetic.
Among other things, the following quantities are computed:

    P_3  = fl( .99(.001 − .99) )   = −.98
    P_5  = fl( (.99 + .001).99 )   =  .98
    c_12 = fl( P_3 + P_5 )         =  0.0

Now in exact arithmetic c_12 = 2(.001)(.99) = .00198 and thus Algorithm 1.3.1
produces a c_12 with no correct significant digits. The Strassen approach gets
into trouble in this example because small off-diagonal entries are combined
with large diagonal entries. Note that in conventional matrix multiplication
neither b_12 and b_22 nor a_11 and a_12 are summed. Thus the contribution of
the small off-diagonal elements is not lost. Indeed, for the above A and B
a conventional matrix multiply gives c_12 = .0020.
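The following sketch (illustrative only) mimics the 2-digit decimal arithmetic of the
example with a small rounding helper and reproduces the loss of c_12; the two intermediate
products are the ones labeled P_3 and P_5 above:

    import numpy as np

    def rd(x, digits=2):
        """Round x to the given number of significant decimal digits."""
        if x == 0.0:
            return 0.0
        e = np.floor(np.log10(abs(x)))
        return float(np.round(x, int(digits - 1 - e)))

    a11 = b11 = a22 = b22 = 0.99
    a12 = b12 = 0.001

    P3 = rd(a11 * rd(b12 - b22))      # = -0.98
    P5 = rd(rd(a11 + a12) * b22)      # =  0.98
    c12_strassen = rd(P3 + P5)        # =  0.0   (all significant digits lost)

    c12_conventional = rd(rd(a11 * b12) + rd(a12 * b22))   # = 0.002
    print(P3, P5, c12_strassen, c12_conventional)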
Failure to produce a componentwise accurate Ĉ can be a serious short-
coming in some applications. For example, in Markov processes the a_{ij},
b_{ij}, and c_{ij} are transition probabilities and are therefore nonnegative. It
may be critical to compute c_{ij} accurately if it reflects a particularly im-
portant probability in the modeled phenomena. Note that if A ≥ 0 and
B ≥ 0, then conventional matrix multiplication produces a product Ĉ that
has small componentwise relative error:

    | Ĉ − C | ≤ nu|A||B| + O(u²) = nu|C| + O(u²).

This follows from (2.4.18). Because we cannot say the same for the Strassen
approach, we conclude that Algorithm 1.3.1 is not attractive for certain
nonnegative matrix multiplication problems if relatively accurate c_{ij} are
required.
Extrapolating from this discussion we reach two fairly obvious but im-
portant conclusions:

• Different methods for computing the same quantity can produce sub-
stantially different results.
• Whether or not an algorithm produces satisfactory results depends
upon the type of problem solved and the goals of the user.

These observations are clarified in subsequent chapters and are intimately
related to the concepts of algorithm stability and problem condition.
Problems

P2.4.1 Show that if (2.4.7) is applied with y = x, then fl(x^T x) = x^T x(1 + α) where
|α| ≤ nu + O(u²).
P2.4.2 Prove (2.4.3).
P2.4.3 Show that if E ∈ R^{m×n} with m ≥ n, then || |E| ||_2 ≤ √n ||E||_2. This result is
useful when deriving norm bounds from absolute value bounds.
P2.4.4 Assume the existence of a square root function satisfying fl(√x) = √x(1 + ε)
with |ε| ≤ u. Give an algorithm for computing ||x||_2 and bound the rounding errors.
P2.4.5 Suppose A and B are n-by-n upper triangular floating point matrices. If C =
fl(AB) is computed using one of the conventional §1.1 algorithms, does it follow that
C = ÂB̂ where Â and B̂ are close to A and B?
P2.4.6 Suppose A and B are n-by-n floating point matrices and that A is nonsingular
with || |A^{-1}| |A| ||_∞ = τ. Show that if Ĉ = fl(AB) is computed using any of the
algorithms in §1.1, then there exists a B̂ so that Ĉ = AB̂ and ||B̂ − B||_∞ ≤ nuτ||B||_∞ +
O(u²).
P2.4.7 Prove (2.4.18).
Notes and References for Sec. 2.4

J.H. Wilkinson (1963). Rounding Errors in Algebraic Processes, Prentice-Hall, Engle-
wood Cliffs, NJ.
J.H. Wilkinson (1971). "Modern Error Analysis," SIAM Review 13, 548-68.
D. Kahaner, C.B. Moler, and S. Nash (1988). Numerical Methods and Software, Prentice-
Hall, Englewood Cliffs, NJ.
F. Chaitin-Chatelin and V. Frayssé (1996). Lectures on Finite Precision Computations,
SIAM Publications, Philadelphia.

More recent developments in error analysis involve interval analysis, the building of sta-
tistical models of roundoff error, and the automating of the analysis itself:

T.E. Hull and J.R. Swenson (1966). "Tests of Probabilistic Models for Propagation of
Roundoff Errors," Comm. ACM 9, 108-13.
J. Larson and A. Sameh (1978). "Efficient Calculation of the Effects of Roundoff Errors,"
ACM Trans. Math. Soft. 4, 228-36.
W. Miller and D. Spooner (1978). "Software for Roundoff Analysis, II," ACM Trans.
Math. Soft. 4.
J.M. Yohe (1979). "Software for Interval Arithmetic: A Reasonably Portable Package,"
ACM Trans. Math. Soft. 5.

Anyone engaged in serious software development needs a thorough understanding of
floating point arithmetic. A good way to begin acquiring knowledge in this direction is
to read about the IEEE floating point standard in

D. Goldberg (1991). "What Every Computer Scientist Should Know About Floating
Point Arithmetic," ACM Surveys 23, 5-48.

See also

R.P. Brent (1978). "A Fortran Multiple Precision Arithmetic Package," ACM Trans.
Math. Soft. 4, 57-70.
R.P. Brent (1978). "Algorithm 524 MP, A Fortran Multiple Precision Arithmetic Pack-
age," ACM Trans. Math. Soft. 4, 71-81.
J.W. Demmel (1984). "Underflow and the Reliability of Numerical Software," SIAM J.
Sci. and Stat. Comp. 5, 887-919.
U.W. Kulisch and W.L. Miranker (1986). "The Arithmetic of the Digital Computer,"
SIAM Review 28, 1-40.
W.J. Cody (1988). "ALGORITHM 665 MACHAR: A Subroutine to Dynamically De-
termine Machine Parameters," ACM Trans. Math. Soft. 14, 303-311.
D.H. Bailey, H.D. Simon, J.T. Barton, and M.J. Fouts (1989). "Floating Point Arithmetic
in Future Supercomputers," Int'l J. Supercomputing Appl. 3.
D.H. Bailey (1993). "Algorithm 719: Multiprecision Translation and Execution of FOR-
TRAN Programs," ACM Trans. Math. Soft. 19, 288-319.

The subtleties associated with the development of high-quality software, even for "sim-
ple" problems, are immense. A good example is the design of a subroutine to compute
2-norms:

J.M. Blue (1978). "A Portable FORTRAN Program to Find the Euclidean Norm of a
Vector," ACM Trans. Math. Soft. 4, 15-23.

For an analysis of the Strassen algorithm and other "fast" linear algebra procedures see

R.P. Brent (1970). "Error Analysis of Algorithms for Matrix Multiplication and Trian-
gular Decomposition Using Winograd's Identity," Numer. Math. 16, 145-156.
W. Miller (1975). "Computational Complexity and Numerical Stability," SIAM J. Com-
puting 4, 97-107.
N.J. Higham (1992). "Stability of a Method for Multiplying Complex Matrices with
Three Real Matrix Multiplications," SIAM J. Matrix Anal. Appl. 13, 681-687.
J.W. Demmel and N.J. Higham (1992). "Stability of Block Algorithms with Fast Level-3
BLAS," ACM Trans. Math. Soft. 18, 274-291.

2.5 Orthogonality and the SVD


Orthogonality has a very prominent role to play in matrix computations.
After establishing a few definitions we prove the extremely useful singular
value decomposition (SVD). Among other things, the SVD enables us to
intelligently handle the matrix rank problem. The concept of rank, though
perfectly clear in the exact arithmetic context, is tricky in the presence of
roundoff error and fuzzy data. With the SVD we can introduce the practical
notion of numerical rank.

2.5.1 Orthogonality
A set of vectors {x_1, ..., x_k} in R^m is orthogonal if x_i^T x_j = 0 whenever
i ≠ j and orthonormal if x_i^T x_j = δ_ij. Intuitively, orthogonal vectors are
maximally independent for they point in totally different directions.
A collection of subspaces S_1, ..., S_p in R^m is mutually orthogonal if
x^T y = 0 whenever x ∈ S_i and y ∈ S_j for i ≠ j. The orthogonal complement
of a subspace S ⊆ R^m is defined by

    S^⊥ = { y ∈ R^m : y^T x = 0 for all x ∈ S }

and it is not hard to show that ran(A)^⊥ = null(A^T). The vectors v_1, ..., v_k
form an orthonormal basis for a subspace S ⊆ R^m if they are orthonormal
and span S.
A matrix Q ∈ R^{m×m} is said to be orthogonal if Q^T Q = I. If Q =
[ q_1, ..., q_m ] is orthogonal, then the q_i form an orthonormal basis for R^m.
It is always possible to extend such a basis to a full orthonormal basis
{ v_1, ..., v_m } for R^m:

Theorem 2.5.1 If V_1 ∈ R^{n×r} has orthonormal columns, then there exists
V_2 ∈ R^{n×(n−r)} such that

    V = [ V_1  V_2 ]

is orthogonal. Note that ran(V_1)^⊥ = ran(V_2).

Proof. This is a standard result from introductory linear algebra. It is
also a corollary of the QR factorization that we present in §5.2. □
2.5.2 Norms and Orthogonal Transformations

The 2-norm is invariant under orthogonal transformation, for if Q^T Q = I,
then || Qx ||_2² = x^T Q^T Q x = x^T x = || x ||_2². The matrix 2-norm and
the Frobenius norm are also invariant with respect to orthogonal transfor-
mations. In particular, it is easy to show that for all orthogonal Q and Z
of appropriate dimensions we have

    || QAZ ||_F = || A ||_F                                      (2.5.1)

and

    || QAZ ||_2 = || A ||_2.                                     (2.5.2)

2.5.3 The Singular Value Decomposition

The theory of norms developed in the previous two sections can be used to
prove the extremely useful singular value decomposition.

Theorem 2.5.2 (Singular Value Decomposition (SVD)) If A is a real
m-by-n matrix, then there exist orthogonal matrices

    U = [ u_1, ..., u_m ] ∈ R^{m×m}   and   V = [ v_1, ..., v_n ] ∈ R^{n×n}

such that

    U^T A V = diag(σ_1, ..., σ_p) ∈ R^{m×n},   σ_1 ≥ ... ≥ σ_p ≥ 0,   p = min{m, n}.

Proof. Let x ∈ R^n and y ∈ R^m be unit 2-norm vectors that satisfy Ax =
σy with σ = || A ||_2. From Theorem 2.5.1 there exist V_2 ∈ R^{n×(n−1)} and
U_2 ∈ R^{m×(m−1)} so V = [ x  V_2 ] ∈ R^{n×n} and U = [ y  U_2 ] ∈ R^{m×m} are
orthogonal. It is not hard to show that U^T A V has the following structure:

    U^T A V = [ σ   w^T ]  ≡  A_1.
              [ 0    B  ]

Since

    || A_1 [ σ   w^T ]^T ||_2²  ≥  ( σ² + w^T w )²

and || [ σ  w^T ]^T ||_2² = σ² + w^T w, we have || A_1 ||_2² ≥ σ² + w^T w. But
σ² = || A ||_2² = || A_1 ||_2², and so we must have w = 0. An obvious induction
argument completes the proof of the theorem. □

The σ_i are the singular values of A and the vectors u_i and v_i are the
ith left singular vector and the ith right singular vector respectively. It
is easy to verify by comparing columns in the equations AV = UΣ and
A^T U = VΣ^T that

    A v_i = σ_i u_i    and    A^T u_i = σ_i v_i,      i = 1:min{m, n}.

It is convenient to have the following notation for designating singular val-
ues:

    σ_i(A)   = the ith largest singular value of A,
    σ_max(A) = the largest singular value of A,
    σ_min(A) = the smallest singular value of A.

The singular values of a matrix A are precisely the lengths of the semi-axes
of the hyperellipsoid E defined by E = { Ax : || x ||_2 = 1 }.

Example 2.5.1

    A = [ .96   1.72 ]  =  U Σ V^T,   U = [ .6  -.8 ],   Σ = [ 3  0 ],   V = [ .8   .6 ]
        [ 2.28   .96 ]                    [ .8   .6 ]        [ 0  1 ]        [ .6  -.8 ].

The SVD reveals a great deal about the structure of a matrix. If the
SVD of A is given by Theorem 2.5.2, and we define r by

    σ_1 ≥ ... ≥ σ_r > σ_{r+1} = ... = σ_p = 0,

then

    rank(A) = r                                                  (2.5.3)
    null(A) = span{ v_{r+1}, ..., v_n }                          (2.5.4)
    ran(A)  = span{ u_1, ..., u_r },                             (2.5.5)

and we have the SVD expansion

    A = Σ_{i=1}^r σ_i u_i v_i^T.                                 (2.5.6)

Various 2-norm and Frobenius norm properties have connections to the
SVD. If A ∈ R^{m×n}, then

    || A ||_F² = σ_1² + ... + σ_p²        p = min{m, n}          (2.5.7)

    || A ||_2 = σ_1                                              (2.5.8)

    min_{x≠0} || Ax ||_2 / || x ||_2 = σ_n        (m ≥ n).       (2.5.9)

2.5.4 The Thin SVD

If A = UΣV^T ∈ R^{m×n} is the SVD of A and m ≥ n, then

    A = U_1 Σ_1 V^T

where

    U_1 = U(:, 1:n) = [ u_1, ..., u_n ] ∈ R^{m×n}

and

    Σ_1 = Σ(1:n, 1:n) = diag(σ_1, ..., σ_n) ∈ R^{n×n}.

We refer to this much-used, trimmed down version of the SVD as the thin
SVD.

2.5.5 Rank Deficiency and the SVD

One of the most valuable aspects of the SVD is that it enables us to deal
sensibly with the concept of matrix rank. Numerous theorems in linear
algebra have the form "if such-and-such a matrix has full rank, then such-
and-such a property holds." While neat and aesthetic, results of this flavor
do not help us address the numerical difficulties frequently encountered in
situations where near rank deficiency prevails. Rounding errors and fuzzy
data make rank determination a nontrivial exercise. Indeed, for some small
ε we may be interested in the ε-rank of a matrix which we define by

    rank(A, ε) = min_{||A−B||_2 ≤ ε} rank(B).

Thus, if A is obtained in a laboratory with each a_{ij} correct to within ±.001,
then it might make sense to look at rank(A, .001). Along the same lines, if
A is an m-by-n floating point matrix then it is reasonable to regard A as
numerically rank deficient if rank(A, ε) < min{m, n} with ε = u|| A ||_2.
Numerical rank deficiency and ε-rank are nicely characterized in terms
of the SVD because the singular values indicate how near a given matrix is
to a matrix of lower rank.

Theorem 2.5.3 Let the SVD of A ∈ R^{m×n} be given by Theorem 2.5.2. If
k < r = rank(A) and

    A_k = Σ_{i=1}^k σ_i u_i v_i^T,                               (2.5.10)

then

    min_{rank(B)=k} || A − B ||_2 = || A − A_k ||_2 = σ_{k+1}.   (2.5.11)
Proof. Since U^T A_k V = diag(σ_1, ..., σ_k, 0, ..., 0) it follows that rank(A_k) =
k and that U^T (A − A_k) V = diag(0, ..., 0, σ_{k+1}, ..., σ_p) and so || A − A_k ||_2 =
σ_{k+1}.
Now suppose rank(B) = k for some B ∈ R^{m×n}. It follows that we can
find orthonormal vectors x_1, ..., x_{n−k} so null(B) = span{x_1, ..., x_{n−k}}.
A dimension argument shows that

    span{x_1, ..., x_{n−k}} ∩ span{v_1, ..., v_{k+1}} ≠ {0}.

Let z be a unit 2-norm vector in this intersection. Since Bz = 0 and

    Az = Σ_{i=1}^{k+1} σ_i (v_i^T z) u_i

we have

    || A − B ||_2² ≥ || (A − B)z ||_2² = || Az ||_2² = Σ_{i=1}^{k+1} σ_i² (v_i^T z)² ≥ σ_{k+1}²

completing the proof of the theorem. □

Theorem 2.5.3 says that the smallest singular value of A is the 2-norm
distance of A to the set of all rank-deficient matrices. It also follows that
the set of full rank matrices in R^{m×n} is both open and dense.
Finally, if r_ε = rank(A, ε), then

    σ_1 ≥ ... ≥ σ_{r_ε} > ε ≥ σ_{r_ε+1} ≥ ... ≥ σ_p,      p = min{m, n}.

We have more to say about the numerical rank issue in §5.5 and §12.2.
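An illustrative NumPy sketch of these ideas (the sizes and tolerance are arbitrary choices):
build a nearly rank-deficient matrix, read its ε-rank off the singular values, and check
that the truncated expansion A_k is within σ_{k+1} of A in the 2-norm, as (2.5.11) asserts.

    import numpy as np

    np.random.seed(5)
    m, n, r = 8, 6, 3
    A = np.random.randn(m, r) @ np.random.randn(r, n)   # exact rank 3
    A = A + 1e-10 * np.random.randn(m, n)                # "fuzzy" data: numerically full rank

    U, s, Vt = np.linalg.svd(A)
    eps = 1e-6
    print(np.sum(s > eps))                               # epsilon-rank: 3

    k = 2
    Ak = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]           # truncated SVD expansion (2.5.10)
    print(np.isclose(np.linalg.norm(A - Ak, 2), s[k]))   # ||A - A_k||_2 = sigma_{k+1}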

2.5.6 Unitary Matrices

Over the complex field the unitary matrices correspond to the orthogonal
matrices. In particular, Q ∈ C^{n×n} is unitary if Q^H Q = Q Q^H = I_n. Unitary
matrices preserve 2-norm. The SVD of a complex matrix involves unitary
matrices. If A ∈ C^{m×n}, then there exist unitary matrices U ∈ C^{m×m} and
V ∈ C^{n×n} such that

    U^H A V = diag(σ_1, ..., σ_p) ∈ R^{m×n}        p = min{m, n}.

Problems

P2.5.1 Show that if S is real and S^T = −S, then I − S is nonsingular and the matrix
(I − S)^{-1}(I + S) is orthogonal. This is known as the Cayley transform of S.
P2.5.2 Show that a triangular orthogonal matrix is a diagonal matrix.
P2.5.3 Show that if Q = Q_1 + iQ_2 is unitary with Q_1, Q_2 ∈ R^{n×n}, then the 2n-by-2n
real matrix

    Z = [ Q_1  -Q_2 ]
        [ Q_2   Q_1 ]

is orthogonal.
P2.5.4 Establish properties (2.5.3)-(2.5.5).
P2.5.6 For the 2-by-2 matrix A = [ w  x ; y  z ], derive expressions for σ_max(A) and
σ_min(A) that are functions of w, x, y, and z.
P2.5.7 Show that any matrix in R^{m×n} is the limit of a sequence of full rank matrices.
P2.5.8 Show that if A ∈ R^{m×n} has rank n, then || A(A^T A)^{-1} A^T ||_2 = 1.
P2.5.9 What is the nearest rank-one matrix to A = [ !. ~ ] in the Frobenius norm?
P2.5.10 Show that if A ∈ R^{m×n} then || A ||_F ≤ √(rank(A)) || A ||_2, thereby sharpening
(2.3.7).

Notes and References for Sec. 2.5

Forsythe and Moler (1967) offer a good account of the SVD's role in the analysis of the
Ax = b problem. Their proof of the decomposition is more traditional than ours in that
it makes use of the eigenvalue theory for symmetric matrices. Historical SVD references
include

E. Beltrami (1873). "Sulle Funzioni Bilineari," Giornale di Matematiche 11, 98-106.
C. Eckart and G. Young (1939). "A Principal Axis Transformation for Non-Hermitian
Matrices," Bull. Amer. Math. Soc. 45, 118-21.
G.W. Stewart (1993). "On the Early History of the Singular Value Decomposition,"
SIAM Review 35, 551-566.

One of the most significant developments in scientific computation has been the increased
use of the SVD in application areas that require the intelligent handling of matrix rank.
The range of applications is impressive. One of the most interesting is

C.B. Moler and D. Morrison (1983). "Singular Value Analysis of Cryptograms," Amer.
Math. Monthly 90, 78-87.

For generalizations of the SVD to infinite dimensional Hilbert space, see

I.C. Gohberg and M.G. Krein (1969). Introduction to the Theory of Linear Non-Self
Adjoint Operators, Amer. Math. Soc., Providence, R.I.
F. Smithies (1970). Integral Equations, Cambridge University Press, Cambridge.

Reducing the rank of a matrix as in Theorem 2.5.3 when the perturbing matrix is con-
strained is discussed in

J.W. Demmel (1987). "The Smallest Perturbation of a Submatrix which Lowers the Rank
and Constrained Total Least Squares Problems," SIAM J. Numer. Anal. 24, 199-206.
G.H. Golub, A. Hoffman, and G.W. Stewart (1987). "A Generalization of the Eckart-
Young-Mirsky Approximation Theorem," Lin. Alg. and Its Applic. 88/89, 317-328.
G.A. Watson (1988). "The Smallest Perturbation of a Submatrix which Lowers the Rank
of the Matrix," IMA J. Numer. Anal. 8, 295-304.

2.6 Projections and the CS Decomposition


If the object of a computation is to compute a matrix or a vector, then
norms are useful for assessing the accuracy of the answer or for measuring
progress during an iteration. If the object of a computation is to compute
a subspace, then to make similar comments we need to be able to quantify
the distance between two subspaces. Orthogonal projections are critical in
this regard. After the elementary concepts are established we discuss the
CS decomposition. This is an SVD-like decomposition that is handy when
having to compare a pair of subspaces. We begin with the notion of an
orthogonal projection.

2.6.1 Orthogonal Projections


Let S ⊆ R^n be a subspace. P ∈ R^{n×n} is the orthogonal projection onto
S if ran(P) = S, P² = P, and P^T = P. From this definition it is easy to
show that if x ∈ R^n, then Px ∈ S and (I − P)x ∈ S^⊥.
If P_1 and P_2 are each orthogonal projections, then for any z ∈ R^n we
have

    || (P_1 − P_2)z ||_2² = (P_1 z)^T (I − P_2)z + (P_2 z)^T (I − P_1)z.

If ran(P_1) = ran(P_2) = S, then the right-hand side of this expression is
zero showing that the orthogonal projection for a subspace is unique. If the
columns of V = [ v_1, ..., v_k ] are an orthonormal basis for a subspace S, then
it is easy to show that P = VV^T is the unique orthogonal projection onto
S. Note that if v ∈ R^n, then P = vv^T / v^T v is the orthogonal projection
onto S = span{v}.
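A brief NumPy sketch (illustrative only; the dimensions are arbitrary): build P = VV^T
from an orthonormal basis and confirm the defining properties P² = P, P^T = P, and
that (I − P)x is orthogonal to S.

    import numpy as np

    np.random.seed(6)
    # An orthonormal basis for a 2-dimensional subspace S of R^5 (via QR of random data)
    V, _ = np.linalg.qr(np.random.randn(5, 2))
    P = V @ V.T                                   # orthogonal projection onto S = ran(V)

    x = np.random.randn(5)
    print(np.allclose(P @ P, P), np.allclose(P, P.T))   # True True
    print(np.allclose(V.T @ (x - P @ x), 0))            # (I - P)x is orthogonal to S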

2.6.2 SVD-Related Projections


There are several important orthogonal projections associated with the sin-
gular value decomposition. Suppose A = UΣV^T ∈ R^{m×n} is the SVD of A
and that r = rank(A). If we have the U and V partitionings

    U = [ U_r  Ũ_r ]              V = [ V_r  Ṽ_r ]
          r    m−r                      r    n−r

then

    V_r V_r^T = projection onto null(A)^⊥ = ran(A^T)
    Ṽ_r Ṽ_r^T = projection onto null(A)
    U_r U_r^T = projection onto ran(A)
    Ũ_r Ũ_r^T = projection onto ran(A)^⊥ = null(A^T)
2.6.3 Distance Between Subspaces

The one-to-one correspondence between subspaces and orthogonal projec-
tions enables us to devise a notion of distance between subspaces. Suppose
S_1 and S_2 are subspaces of R^n and that dim(S_1) = dim(S_2). We define the
distance between these two spaces by

    dist(S_1, S_2) = || P_1 − P_2 ||_2                           (2.6.1)

where P_i is the orthogonal projection onto S_i. The distance between a
pair of subspaces can be characterized in terms of the blocks of a certain
orthogonal matrix.

Theorem 2.6.1 Suppose

    W = [ W_1  W_2 ]          Z = [ Z_1  Z_2 ]
          k    n−k                  k    n−k

are n-by-n orthogonal matrices. If S_1 = ran(W_1) and S_2 = ran(Z_1), then

    dist(S_1, S_2) = || W_1^T Z_2 ||_2 = || Z_1^T W_2 ||_2.

Proof. Note that the matrices W_1^T Z_1 and W_1^T Z_2 are submatrices of the
orthogonal matrix

    Q = W^T Z = [ Q_11  Q_12 ]  =  [ W_1^T Z_1   W_1^T Z_2 ]
                [ Q_21  Q_22 ]     [ W_2^T Z_1   W_2^T Z_2 ].

Our goal is to show that || Q_21 ||_2 = || Q_12 ||_2. Since Q is orthogonal it
follows from

    Q [ x ]  =  [ Q_11 x ]
      [ 0 ]     [ Q_21 x ]

that

    1 = || Q_11 x ||_2² + || Q_21 x ||_2²

for all unit 2-norm x ∈ R^k. Thus,

    || Q_21 ||_2² = max_{||x||_2=1} || Q_21 x ||_2² = 1 − min_{||x||_2=1} || Q_11 x ||_2²
                  = 1 − σ_min(Q_11)².

Analogously, by working with Q^T (which is also orthogonal) it is possible
to show that

    || Q_12 ||_2² = 1 − σ_min(Q_11)²,

and therefore || Q_21 ||_2 = || Q_12 ||_2. □
Note that if S_1 and S_2 are subspaces in R^n with the same dimension, then

    0 ≤ dist(S_1, S_2) ≤ 1.

The distance is zero if S_1 = S_2 and one if S_1 ∩ S_2^⊥ ≠ {0}.
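An illustrative check in NumPy (the dimensions and random data are arbitrary): for two
subspaces given by orthonormal bases W_1 and Z_1, the quantity ||W_1^T Z_2||_2 from
Theorem 2.6.1 agrees with ||P_1 − P_2||_2 from (2.6.1).

    import numpy as np

    np.random.seed(7)
    n, k = 6, 2
    W, _ = np.linalg.qr(np.random.randn(n, n))    # orthogonal; S1 = ran(W[:, :k])
    Z, _ = np.linalg.qr(np.random.randn(n, n))    # orthogonal; S2 = ran(Z[:, :k])
    W1, W2 = W[:, :k], W[:, k:]
    Z1, Z2 = Z[:, :k], Z[:, k:]

    P1, P2 = W1 @ W1.T, Z1 @ Z1.T                 # orthogonal projections onto S1 and S2
    d_proj  = np.linalg.norm(P1 - P2, 2)          # definition (2.6.1)
    d_block = np.linalg.norm(W1.T @ Z2, 2)        # Theorem 2.6.1
    print(np.isclose(d_proj, d_block))            # True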


A more refined analysis of the blocks of the Q matrix above sheds more
light on the difference between a pair of subspaces. This requires a special
SVD-like decomposition for orthogonal matrices.

2.6.4 The CS Decomposition

The blocks of an orthogonal matrix partitioned into 2-by-2 form have highly
related SVDs. This is the gist of the CS decomposition. We prove a very
useful special case first.

Theorem 2.6.2 (The CS Decomposition (Thin Version)) Consider the
matrix

    Q = [ Q_1 ]        Q_1 ∈ R^{m_1×n},  Q_2 ∈ R^{m_2×n}
        [ Q_2 ]

where m_1 ≥ n and m_2 ≥ n. If the columns of Q are orthonormal, then there
exist orthogonal matrices U_1 ∈ R^{m_1×m_1}, U_2 ∈ R^{m_2×m_2}, and V_1 ∈ R^{n×n} such
that

    [ U_1   0  ]^T [ Q_1 ] V_1  =  [ C ]
    [  0   U_2 ]   [ Q_2 ]         [ S ]

where

    C = diag(cos(θ_1), ..., cos(θ_n)),
    S = diag(sin(θ_1), ..., sin(θ_n)),

and

    0 ≤ θ_1 ≤ θ_2 ≤ ... ≤ θ_n ≤ π/2.

Proof. Since || Q_1 ||_2 ≤ || Q ||_2 = 1, the singular values of Q_1 are all in
the interval [0, 1]. Let

    U_1^T Q_1 V_1 = C = diag(c_1, ..., c_n) ∈ R^{m_1×n}

be the SVD of Q_1 where we assume

    1 = c_1 = ... = c_t > c_{t+1} ≥ ... ≥ c_n ≥ 0.

To complete the proof of the theorem we must construct the orthogonal
matrix U_2. If

    Q_2 V_1 = [ W_1  W_2 ]
                t    n−t

then

    [ U_1   0 ]^T [ Q_1 ] V_1  =  [ C          ]
    [  0    I ]   [ Q_2 ]         [ W_1   W_2  ].

Since the columns of this matrix have unit 2-norm, W_1 = 0. The columns
of W_2 are nonzero and mutually orthogonal because

    W_2^T W_2 = I_{n−t} − Σ^T Σ = diag(1 − c_{t+1}², ..., 1 − c_n²)

is nonsingular. If s_k = √(1 − c_k²) for k = 1:n, then the columns of

    Z = W_2 diag(1/s_{t+1}, ..., 1/s_n)

are orthonormal. By Theorem 2.5.1 there exists an orthogonal matrix
U_2 ∈ R^{m_2×m_2} with U_2(:, t+1:n) = Z. It is easy to verify that

    U_2^T Q_2 V_1 = diag(s_1, ..., s_n) ≡ S.

Since c_k² + s_k² = 1 for k = 1:n, it follows that these quantities are the required
cosines and sines. □
Using the same sort of techniques it is possible to prove the following more
general version of the decomposition:

Theorem 2.6.3 (CS Decomposition (General Version)) If

    Q = [ Q_11  Q_12 ]
        [ Q_21  Q_22 ]

is a 2-by-2 (arbitrary) partitioning of an n-by-n orthogonal matrix, then
there exist orthogonal

    U = [ U_1   0  ]        and        V = [ V_1   0  ]
        [  0   U_2 ]                       [  0   V_2 ]

such that

               [ I  0  0   0  0  0 ]
               [ 0  C  0   0  S  0 ]
    U^T Q V =  [ 0  0  0   0  0  I ]
               [ 0  0  0   I  0  0 ]
               [ 0  S  0   0 -C  0 ]
               [ 0  0  I   0  0  0 ]

where C = diag(c_1, ..., c_p) and S = diag(s_1, ..., s_p) are square diagonal
matrices with 0 < c_i, s_i < 1.

Proof. See Paige and Saunders (1981) for details. We have suppressed the
dimensions of the zero submatrices, some of which may be empty. □

The essential message of the decomposition is that the SVDs of the Q_ij are
highly related.
highly related.
Example 2.6.1 A 5-by-5 orthogonal matrix Q, partitioned into 2-by-2 block form as in
Theorem 2.6.3, can be reduced by suitably chosen orthogonal U and V to the structured
form U^T Q V displayed above, with the cosines and sines of the underlying angles
appearing on the diagonals of the C and S blocks.

The angles associated with the cosines and sines turn out to be very im-
portant in a. number of applications. See §12.4.

Problems

P2.6.1 Show that if P is an orthogonal projection, then Q = I − 2P is orthogonal.
P2.6.2 What are the singular values of an orthogonal projection?
P2.6.3 Suppose S_1 = span{x} and S_2 = span{y}, where x and y are unit 2-norm
vectors in R^2. Working only with the definition of dist(·,·), show that dist(S_1, S_2) =
√(1 − (x^T y)²), verifying that the distance between S_1 and S_2 equals the sine of the angle
between x and y.

Notes and References for Sec. 2.6

The following papers discuss various aspects of the CS decomposition:

C. Davis and W. Kahan (1970). "The Rotation of Eigenvectors by a Perturbation III,"
SIAM J. Num. Anal. 7, 1-46.
G.W. Stewart (1977). "On the Perturbation of Pseudo-Inverses, Projections and Linear
Least Squares Problems," SIAM Review 19, 634-662.
C.C. Paige and M. Saunders (1981). "Towards a Generalized Singular Value Decomposi-
tion," SIAM J. Num. Anal. 18, 398-405.
C.C. Paige and M. Wei (1994). "History and Generality of the CS Decomposition," Lin.
Alg. and Its Applic. 208/209, 303-326.
See §8.7 for some computational details.
For a deeper geometrical understanding of the CS decomposition and the notion of
distance between subspaces, see

T.A. Arias, A. Edelman, and S. Smith (1996). "Conjugate Gradient and Newton's
Method on the Grassmann and Stiefel Manifolds," to appear in SIAM J. Matrix Anal.
Appl.

2.7 The Sensitivity of Square Systems

We now use some of the tools developed in previous sections to analyze the
linear system problem Ax = b where A ∈ R^{n×n} is nonsingular and b ∈ R^n.
Our aim is to examine how perturbations in A and b affect the solution x.
A much more detailed treatment may be found in Higham (1996).

2.7.1 An SVD Analysis

If

    A = Σ_{i=1}^n σ_i u_i v_i^T = UΣV^T

is the SVD of A, then

    x = A^{-1}b = (UΣV^T)^{-1}b = Σ_{i=1}^n (u_i^T b / σ_i) v_i.     (2.7.1)

This expansion shows that small changes in A or b can induce relatively
large changes in x if σ_n is small.
It should come as no surprise that the magnitude of σ_n should have
a bearing on the sensitivity of the Ax = b problem when we recall from
Theorem 2.5.3 that σ_n is the distance from A to the set of singular matrices.
As the matrix of coefficients approaches this set, it is intuitively clear that
the solution x should be increasingly sensitive to perturbations.

2.7.2 Condition
A precise measure of linear system sensitivity can be obtained by consider-
ing the parameterized system

    (A + εF) x(ε) = b + εf        x(0) = x

where F ∈ R^{n×n} and f ∈ R^n. If A is nonsingular, then it is clear that x(ε)
is differentiable in a neighborhood of zero. Moreover, ẋ(0) = A^{-1}(f − Fx)
and thus, the Taylor series expansion for x(ε) has the form

    x(ε) = x + εẋ(0) + O(ε²).
Using any vector norm and consistent matrix norm we obtain

    || x(ε) − x || / || x ||  ≤  |ε| || A^{-1} || { || f || / || x || + || F || } + O(ε²).   (2.7.2)

For square matrices A define the condition number κ(A) by

    κ(A) = || A || || A^{-1} ||                                  (2.7.3)

with the convention that κ(A) = ∞ for singular A. Using the inequality
|| b || ≤ || A || || x || it follows from (2.7.2) that

    || x(ε) − x || / || x ||  ≤  κ(A)( ρ_A + ρ_b ) + O(ε²)       (2.7.4)

where

    ρ_A = |ε| || F || / || A ||    and    ρ_b = |ε| || f || / || b ||

represent the relative errors in A and b, respectively. Thus, the relative
error in x can be κ(A) times the relative error in A and b. In this sense, the
condition number κ(A) quantifies the sensitivity of the Ax = b problem.
Note that κ(·) depends on the underlying norm and subscripts are used
accordingly, e.g.,

    κ_2(A) = || A ||_2 || A^{-1} ||_2 = σ_1(A) / σ_n(A).         (2.7.5)

Thus, the 2-norm condition of a matrix A measures the elongation of the
hyperellipsoid { Ax : || x ||_2 = 1 }.
We mention two other characterizations of the condition number. For
p-norm condition numbers, we have

    1 / κ_p(A)  =  min_{A+ΔA singular} ( || ΔA ||_p / || A ||_p ).     (2.7.6)

This result may be found in Kahan (1966) and shows that κ_p(A) measures
the relative p-norm distance from A to the set of singular matrices.
For any norm, we also have

    κ(A) = lim_{ε→0}  sup_{||ΔA|| ≤ ε||A||}  || (A + ΔA)^{-1} − A^{-1} || / ( ε || A^{-1} || ).   (2.7.7)

This imposing result merely says that the condition number is a normalized
Frechet derivative of the map A → A^{-1}. Further details may be found in
Rice (1966b). Recall that we were initially led to κ(A) through differenti-
ation.
82 CHAPTER 2. MATRIX ANALYSIS

U ,oc(A) is large, then A is said to be an ill-conditioned matrix. Note that


this is a norm-dependent propertyl. However, any two condition numbers
,oc 0 (-} and ICp(·) on :~rxn are equivalent in that constants Ct IIJld C2 can be
found for which
A E JR"x".
For example, on JR.nxn we have

(2.7.8)

Thus, if a matrix is ill-conditioned in the a-norm, it is ill-conditioned in


the ,8-norm. modulo the constants c1 and ~ above.
For any of the ;rnorma, we have ~>p(A) ~ 1. Matrices with sma.ll con-
dition numbers are said to be well-conditioned . In the 2-norm, orthogonal
matrices are perfectly conditioned in that ,..2(Q) = l if Q is orthogonal.

2. '7.3 Determinants and Nearness to Singularity


It is natural to consider how well determinant size measures ill-conditioning.
If det(A) = 0 is equivalent to singularity, is det(A) ~ 0 equivalent to near
singularity? Unfortunately, there is little correlation between det{A) and
the condition of Ax = b. For example, the matrix B .. defined by

B.. =
l -1

~. ~.
... -1]
...
.
-1
. E lR.nxn (2.7.9)
[ . .
0 0 .. . 1

has determinant 1, but = n2"- 1 . On the other hand, a very well


,oc 00 (Bn)
conditioned matrix can have a very small determinant. For example,

Dn = diag(lO-t, ... , w-1} E Rnxn

satisfies ~tp(D.. ) = 1 although det(D.. ) = 10-".

2.7.4 A Rigorous Norm Bound


Recall that the derivation of (2.7.4) was valuable because it highlighted the
connection between ~>(A) and the rate of change of x(e:) ate:= 0. However,
2 It also depends upon the definition of "lm-ge.~ The matter is piU'Stled ill §3.5
2 . 7. THE SENSITIVITY OF SQUARE SYSTEMS 83

it is a little unsatisfying because it Ia conWlpot on t being "small eoougb"


and because it sheds oo light on the size of the 0( ~) term. lD this aDd the
next subsection we develop some additional Ax = b perturbation theorems
that are completely rigorous.
We first establish a uaefullemma that indicates in terms of ~(A) when
we can expect a perturbed system to be nonsingular.
Lemma 2. 7.1 Suppo1e
Ax = b A E Rnx", 0 '# b E R'"

(A+ 6A)y = b + A.b A.A E r•", A.b E R"

with II A.A II ::;; e II A II 16b II ~ t


1111d 1 II b II· lf e K(A ) = r < 1, then A+ A.A
i$ nonsingulor and
1!.!J!
< 1+r .
Uz ll- 1-r
Proof. Since II A- 1t:..A II ~ t II A- 1 II II A II
= r < 1 it follows from
Theorem 2.3.4 that (A+ 6A) is nonsinguJar. Using Lemma 2.3.3 and the
equality (I + A - 1 A.A)v == z + A- 1 A.b we find

II YII::; II(I + A- 16A)- 1 II (IIz ii+ £11 A-I JI IIb ll)


~ 1 ~,. (11 z ll +e ii A- 1 IIIIb l ) = 1 :,.(uz ll+ r \
1 ! D.
11

Since II b II = II Ax II $ II A 11 11 z II it foUows that


1
h II s 1
_ ,. (II z U+ rll z II) . o

We are now set to establish a rigorous Ax = b perturbatiou bowul


Theorem 2.7.2 If the con.ditions of Lemma 2. 7.1 hold, then

II Y-• 11 < ~~(A) (2.7.10)


11 •11 - 1-r
Proof. Since
y-z == A- 1 A.b- A- 1 A.Ay (2.7.11)
we have llv - z II ~ til A- 1 1111 b U+ (II A- 1 11 11 A li llY II and eo
ll y-z U Rbll IIY II
II z II $ t ~(A) II A 1111 z I + uc(A) II z II

2t
= - I '(A). Cl
1-r
84 CHAPTER 2. MATRIX ANALYSlS

Example 2.7'.1 The h ... ,. problem

[ ~ 10~. ][ ~ ] .. [ 10~. ]
11M eolutloa z..: ( 1 , 1 )T Md eoodition~eao(A) = IIJ6. U ab = ( llr 8 , 0 )T, dA .:r. 0,
aDd (A+ .O.A)II =II+~. theu JIIC ( 1 + 10-•, 1 )T 1111d the iDequallty (2. 7.10) ay.
to-' ... I z- II ft..., < II~ Hoo "eo(A) '"" to-• tot • 1.
I z lloo nolloo
ThUIJ, Lhe uppw bound in (2.7.10) CAll be a~ CMJreB&iiDMe of the en-or indueed by t.he
perturbation. On tbeotber baud, if 4111 = (0, t0- 8 )T, l1A = 0, Md (A+ll..A)II :o b+.O.b,
cheo thia l11equ.lity aye

~= ~ 2 )( 10-•to' .
Thus, there an petturbatloaa for which the bound in (2.7.10) is -nti..ly ettained.

2.7.5 Some Rigorous Componentwise Bounds


We conclude UUs section by showing that a more refined perturbation the-
ory is possible if compoDeDtwise perturbation bounds are in etfect and if
we make uae of the absolute value notation.
Theorem 2.1.3 Svppolt.
h = b

(A+ ~A)y = b+ ~b ~A E R"xn, ~bE IR..n


and that !~AI S tiAI and 1 ~61 S tlbl· If c5"'oo(A) = r < 1, then (A +~A)
13 n.onsingulor and

II 11 - z lloo S ~IIIA- 1 11AI IIoo.


liz Uco 1- r
Proof. Since 11 ~ lloo S €II A lloo aDd II ~b lloo S eJI b lloo the conditions of
Lemma 2.7.1 are satisfied in tbe infinity norm. This impliEs that A + AA
is nonaiugular and
11!1 lloo < 1 +r
llx lloa - 1- r ·
Now using (2.7.11) we find
IY - xl S I A- 1 11~1 + IA- 1 II~A11Yl

S ~IA- 1 IIbl + tiA- 1IIAI IYI S tiA-'IIAI (lxl + IYI) .


If we take norma, then
2.7. THE SENSlTlVlTY OF SQUARE SYSTEMS 85

The theorem follows upon division by II :t lloo· 0


We refer to the quantity IIIA- 1 11AIIIoo as the SkR.el condition number. It
has been eJiectively used in the analysis of several important linear system
computations. See §3.5.
Lastly, we report on the results of Oettll and Prager (1964) that indicate
when an approximate solution x E R" to the n-by-n system A:t = b satis-
fies a perturbed system with prescribed structure. In particular, suppose
E E anxn and f E R" are given and have nonnegative entries. We seek
AA e R'xn, Ll.b E R", and w ~ 0 such that

(A + aA)i = b + Ll.b ILl.AI 5. wE, ILl.bl 5. wf. (2.7.12)

Note that by properly choosing E and f the perturbed system can take on
certain qualities. For example, if E = IAI and f = lbl and w is small, then
i satisfies a nearby system in the componentwise sense. Oettli and Prager
(1964) show that for a given A, b, x,
E, and f the smallest w possible in
(2. 7.12) is given by

Wmtn =
1M- bli
(Eixl +f),·

If AX = b then Wm>n = 0. On the other hand, if Wmin = oo, then x does


not satisfy any system of the prescribed perturbation structure.

Proble~

P2.7.1 Show tiW if II I II~ 1, then 110(A) ~ 1.


P2. 7.2. Shaw that for a given norm, ,..(AB) ~ 110(A)1't(B) and that 110(oA) ~ 110(A) for a.l.l
nonzero a.
1'2.7.3 Relate the 2-nonn condition of X E R"x" (m;:: n) to tbe 2-norm condition of
the matricaJ
B = [I,.0 1,.
X]

C= [ ~ J.

Not- and Rafarence. for Sec. 2. 7

The condition concept ill thoroughly investigated in

J. Rice (1966). •A Theory of Condition,~ SIAM J. Hum. Anal. 3, 287-310.


W. Kahan (1!166). "Numerical Lineal' Algebra,~ CanaGi.m Math.. BulL 9, 157--1101.

Ref•encea for componentwise perturbacion theory include


86 CHAPTER 2. MATRIX ANALYSIS

W. Oettli and W. ~'rage!' (1964). "Comp.tibilhy of Appcoximate Solutions of Linear


Equa.tioJUI witb GiYeD Errox- BoWlds for Coefficients &Dd Rjgbt HIUKI Side~,~ Numer.
Mt.JtA.. 6, 405-409.
J.E. Cope and B.W. Rust (1979). ~Bounds on aolutiona of sywteros with 8CC1II'ate data,~
SIAM J. Num. Anal. l6, 9S0-63.
R.D. Sbel (1979). "Scaling for numerical stability in Gauaaia.n Elimina&ioa., ~ J. ACM
16,~26.
J.W. DeuwJel. (1992). "The Componentwile 0--.:e to the Nf!l!ln!ld; Singulac Mmix,"
SIAM J. Matri% Anal. A.ppl. 13, lo-19.
D.J. lfi&ham and N.J. Higham (1992). "Componennrise Perturbation Theory £ex- Linear
S:y~WD~ with Multiple Right-Hand Sidel,~ Lift. Alg. and lt8 Applic. 17.4, 111-129.
N.J. Highl!l.lll (1994). ~A Swvey of Componentwille Perturbation Theory in Numerical

CC)mputaQonal MatMmAeic.s, W. Gautacbi (ed.), Volume 48 of Proa:trl•


Lineae AJgebra,~ in Mathemaeic.s of Comput4tion 19.43-1993: A Half Centufll of

pona in Applied Ma~Ue.!l, Americao Mathematical Society, Providence, Rhode


of Ssrm-

Ialand.
S. Chaodruekanm and I. C. F. lpee11 (1995). "'n the Senaitivity of Solution Components
in Linear SyaWDII of Equationa, ~ SIAM J. Mat:ri:& A114L Appi. 16, 93-112.
Tbe reciproct.l of the condition n~~Jllb« lllMSUrell how neN' a given A:l: = b problem is
to singularity. The importance of kn(J'III'ing how Jlelll' a civeu problem is to a difficult or
insoluble problem ba.s come to be appreciated in ma.ny computational sedinp. See

A. Laub{1985). "Numeric&J LiDe.r AJgebra Aspects of Control Design C»nputationl,"


IEEE Tham. Auto. Cone. AC-30, 97-108.
J. L. Barlow (1986). "'n the Small.esi Pollitive Singular Value of &D M-Matrix with
Applicationa to Ergodic Markov Cbainl," SIAM J. Alg. and DUe. Stnu!t. 7, 414-
424.
J.W. Dem!nf!l (1987). "On the Distance to the Nearesi Ill-P<:Wied Problem," N'!J.f11D'.
Mt.JtA.. 51, 251-289.
J.W. Demmel (1988). 'The Probability that a Numerical Anal}'Bi.s Problem Is Difficult,n
MotA. Comp. 50, 44H80.
N.J. Higham (1989). ~Matrix Nearn- Problems and ApplicatioliS," in Applicatioru of
Matriz 1'heor,i, M.J.C. Gover a.nd S. Bamet.t (eda), Oxford University Pn., Oxford
UK, 1-27.
Chapter 3

General Linear Systems

§3.1 Triangular Systems


§3.2 The LU Factorization
§3.3 Roundoff Analysis of Gaussian Elimination
§3.4 Pivoting
§3.5 Improving and Estimating Accuracy

The problem of solving a linear system Ax = b is central in scientific


computation. In this chapter we focus on the method of Gaussian elimi-
nation, the aJgorith.m of choice when A is square, dense, and unstructured.
When A does not fall into this category, then the algorithms of Chapters
4, 5, and 10 are of interest. Some parallel Az = b solvers are discussed in
Chapter 6.
· We motivate the method of Gaussian elimination in §3.1 by discussing
the ease with which triangular systems can be solved. The conversion of
a general system to triangu.Jar form via Gauss transformations is then pre-
sented in §3.2 where the ..language" of matrix factorizations is introduced.
Unfortunately, the derived method behaves very poorly on a nontrivial class
of problems. Our error analysis in §3.3 pinpoints the difficulty and moti-
vates §3.4, where the concept of pivoting is introduced. In the final section
we comment upon the important practical issues associated with scaling,
iterative improvement, and condition estimation.

Before You Begin


Chapter 1, §§2.1-2.5, and §2.7 are assumed. Complementary references
include Forsythe and Moler (1967), Stewart (1973), Hager (1988), Watkins
88 CHAPTER 3. GENERAL LINEAR SYSTEMS

(1991), Ciadet (1992), Datta (1995), Higham (1996), 'fiefethen and Bau
(1996), and Demmel (1996). Some MATLABfunctions important to thia
chapter are lu, cond, rcoDd, and the "bacbluh" operator "\ ". LAPACK
oonnectiona include

LAPACK: 'I'riaapJar Sy.tema


.nsv :iOhW ..U - b
_DSM Sol._AX.,.B
_nan~ Cooditioo slJDate
_TUPS 8o1ft AX • B, AT X • B with e1T0r bounda
_TKTU Solw AX• 8, ATX • 8
.nm A-l

LAPACK: Gtmeral Linear Sy.tenw


.CESV Solve AX= B
.G!lCDII CoDditloo -~ via P A • LU
.IDFS Improve AX a 8, ATX"' 8 , A 8 X"" 8 .olutiou with etTOr bounda
.GESVX Solve AX = 8, AT X = 8, A H X = B wl$h condition estimate
. CETRF PA =o LU
. GETRS Solve AX= 8 , A'~' X= 8, AH.X = 8 viaPA =LV
_CETJU A-1
_GEEQU Equibbratioo

3.1 'Iriangular Systems


'Iladitional factorization methods for linear systems involve the oonversion
of the given square system to a triangular system that has the same 90lution.
This section is about the 90lutlon of triangular systems.

3.1.1 Forward Substitution


Conaidec tbe following 2-by-2lower triangular sy&tem:

If tul2:2 =F 0, then the unknowns CAD be determined sequentially:

Zt = bt/ln
Zl = (~ -l2tZt)f l-n.

This is tbe 2-by-2 vemon of an algorithm known as /OrtJHlni &uh6titution.


The general procedure is obtained by solving the ith equation in Lz = b
for z,:
3.1. TRIANGULAR SYsrEMS 89

If this is evaluated for i = 1:n, then a complete specification of z is obtained.


Note that at the ith stage the dot product of L(i, 1:i -1) and :t(l:i- 1) is
required. Since b, only is involved in the formula for z,, the former may be
overwritten by the latter:

Algorithm 3.1.1 (Forward Substitution: Row Version) H L E Rm(n


R", then this algorithm overwri.~ b with the
is lower triangular and b E
solution to Lx =b. Lis assumed to be nonsingular.
b(I) = b(l)/L(1, 1}
fori= 2:n
b(i) = (b(i)- L(i, I:i- l)b(1:i- 1))/L(i, i)
end
This algorithm requires n 2 flops. Note that L is accessed by row. The
computed solution x satisfies:

(L + F).i = b (3.1.1)

l-or a proof, see Higham (1996). It says tha.t the computed solution exactly
satisfies a slightly perturbed system. Moreover, each entry in the perturbing
matrix F is small relative to the corresponding element of L.

3.1.2 Back Substitution


The analogous Blgorithm for upper triangular systems U x = b is called
back-substitution. The recipe for Xi is prescribed by

and once again b, can be overwritten by :t1•

Algorithm 3.1.2 (Back Substitution: Row Version) If U E Rnxn


is upper triangular and b E R", then the following algorithm overwrites b
with the solution to Uz =b. U is~ to be nonsingular.
b(n) = b(n)/U(n, n)
fori= n -1:-1:1
b(i) = (b(i)- U(i, i + 1:n)b(i + l:n))/U(i, i)
end
This algorithm requires n 2 fiops and accesses U by row. The computed
solution :i: obtained by the algorithm can be shown to satisfy

(U + F)i = b (3.1.2)
90 CHAPTER 3. GENERAL LINEAR SYSTEMS

3.1.3 Column Oriented Versions


Column oriented versions of the above procedures can be obtained by re-
versing loop orders. To understand what this means from the algebraic
point of view, consider forward substitution. Once Xt is resolved, it can
be removed from equations 2 through n and we proceed with the reduced
system £(2:n, 2:n)x(2:n) = b(2:n) -x(1}L(2:n,1). We then compute x2 a.nd
remove it from equatiOllS 3 through n, etc. Thus, if this approach is applied
to

we find :r: 1 = 3 and then deal with the 2-by~2 system

Here is the complete procedure with overwriting.

Algorithm 3.1.3 (Forward Substitution: Column Version) U L E Rnxn


is lower triangular and b E Rn, then this algorithm overwrites b with the
solution to Lx = b. L is assumed to be nonsingular.
for j = l:n -1
b(j) = b(j)j L(j,j)
b(j + l:n) = b(j + 1:n) - b(j)L(j + 1:n,j}
end
b(n) = b(n)/ L(n, n)
It is also possible to obtain a column-oriented saxpy procedure for back~
substitution.

Algorithm 3.1.4 (Back Substitution: Column Version) If U E R"xn


is upper triangular and b E Rn, then this algorithm overwrites b with the
solution to Ux =b. U is assumed to be nonsingular. ·
for i = n: - 1:2
b(j) = b(j)jU(.i,j)
b(l:j- 1) = b(1:j- 1)- b(j)U(l:j- l,j}
end
b(1) = b(l)/U(1, 1)
Note that the dominant operation in both Algorithms 3.1.3 and 3.1.4 is
tbe saxpy operation. The roundoff behavior of these saxpy implementations
is essentially the same a.s for the dot product versions.
The accuracy of a computed solution to a triangular system is often
surprisingly good. See Higham {1996}.
3.1. TRIANGULAR SYSTEMS 91

3.1.4 Multiple Right Hand Sides


Consider the problem of computing a solution X E R' x' to LX = B where
L E R'xn is lower triangular and B E R'x'. This is the multiple right
htmd nck forward substitution problem. We show that such a problem
can be solved by a block algorithm that is rich in matrix multiplication
assuming that q and n are large enough. This turns out to be important in
subsequent sections where various block factorization schemes are discussed.
We mention that although we are considering here just the lower triangulw-
problem, everything we say applies to the upper triangul.ac case as well.
To develop a block fol'WBl'd substitution algorithm we partition the eqU&-
tian LX = B as follows:

(3.1.3)

Assume tbat the diagonal blacks are square. Paralleling the development of
Algorithm 3.1.3, we solve the system L 11 X 1 = 8 1 for X 1 and then remove
X 1 from block equations 2 through N:

:
£22 0
£33

[
LN3
Continuing in this way we obtain the following block saxpy forward elimi-
nation scheme:
for j = l:N
Solve L;;X; = B;
fori =i + l:N (3.1.4)
Bi = Bi - L;;X;
end
end

l l
Notice tbat the i-loop oversees a single block saxpy update of the form

B~+l = [ B~+l ] - [ L;~1.; X;.


[
BN BN LN~
For this to be handled as a matrix multiplication in a given architec-
ture it is clear that the blocking in (3.1.3} must give sufficiently "big"
X;. Let us assume that this is the case if each X; has at least r rows.
This can be accomplished if N = ceil(n/r) and X~o ... ,XN-l E R'")( 4 and
XN E Jt(n-(N-l)r)xq.
92 CHAPTER 3. GENERAL LINEAR SYSTEMS

3.1.5 The Level-3 Fraction


It is handy to adapt a measure that quantifies the amount of matrix multi-
plication in a given algorithm. To this end we define the level-9 fmction of
an algorithm to be the fraction of flops that occur in the context of matrix
multiplication. We call such flaps level-3 ji&p&.
Let us determine the level-3 fraction fur {3.1.4} with the simplifying
assumption that n = rN. {The same conclusions bold with the unequal
blacking described above.) Because there are N applications of r-by-r
forward elimination {the level-2 portion of the computation) and n 2 flops
overall, the level-3 fraction is approximately given by

Nr 2 1
1 - - == l - -
n2 N
Thus, for large N almost all flops are level-3 flaps and it makes seose to
choose N as large as possible subject to the constraint that the underlying
architecture can achieve a high level of performance when processing block
saxpy's of width at least r = njN.

3.1.6 Non-square Triangular System Solving


The problem a£ solving nonsqua.re, ~by-n trianguJar systems deserves some
mention. Consider first the lower triangular case when m ~ n, i.e.,

Lu E R"x" bt E 1R"
£ 21 E JR(m-n)Xn b, E 1Rm-n

Assume that Lu iB lower triangular, and nonsingular. If we apply forward


elimination to Lux= b1 then x solves the system provided ~ 1 (L~11 bl) =
~- Otherwise, there is no solution to the overall system. In such a case
least squares minimization may be appropriate. See Chapter 5.
Now consider the lower triangular system Lx = b when the number
of columns n exceeds the number of rows m. In this case apply forward
substitution to the square system L{l:m, l:m)x(l:m, l:m) = band prescribe
an arbitrary value for x(m + l:n). See §5.7 for additional comments on
systems that have more unknowns than equations.
The handling of no118Quare upper triangular systems is similar. Details
are left to the reader.

3.1.7 Unit Triangular Systems


A unit triangular matrix is a triangulfll' matrix with ones on the diagonal.
Many of the triangular matrix computations that follow have this added
bit of structure. It clearly poses no difficulty in the above procedures.
3.1. TRIANGULAR SYSTEMS 93

3.1.8 The Algebra of Triangular Matrices


For future reference we list a few properties about products and inverses of
triangular and unit triangula.r matrices.
• The inverse of an upper (lower) triangular matrix is upper (lower)
triangular.
• The product of two upper (lower) triangular matrices is upper (lower)
triangu.lar.
• The inverse of a unit upper (lower) triangular matrix is unit upper
(lower) triangular.
• The product of two unit upper (lower) triangular matrices is unit
upper (lower) triangular.

Problema
P3.1.1 Give an a.lgorithm for computing a. llODZe!'O z: E R" such that Uz: = 0 where
=
U E R"x" Is upper triangular with u,.,. 0 and Un · · · U...-l,n-1 yi; 0.
P3.1.l Discll8S how the determinant of a.llqUII£8 triangular matrix could be computed
w1th minimum risk of ownlow and underflow.
P3.1.3 Rewrite Algorithm 3.1.4 given thai U is stored by column in a length n(n+ l)/2
array u.ve.=.
P3.1.4 Write a. det.ailed wnrion of (3.1.4). Do not I!ISiume that N divides n.
P 3.1.5 Prove all the facts about triangular matrices that Me listed in §3.1.8.
P3.1.6 SupposeS, T E wxn 11re npper triangular and that (ST- M):z: =bill a. non-
singular system. Give a.n O(n2 ) algorithm for computill!: z. Note tbai the explicit
fonna.tion of ST- )J require& O(n3 ) fioJlli. Hint. Suppoae

s+ ; [ ~ f ]. T+ = [ ~ ;: J. b+ =[~ ]
where S+ = =
S(k -l:n,.k-1:n), T+ T(k-l;n, k-Ln), b+ =b(.k-1:n~ and a,T,Jj E R.
Show tbai ilwe Jur.ve a wctor r~ such tha& ·
(SeTa - .U)ze =be
and We = Tezc ill a.~le, then
7 ] /J- avr :&c - uT We
%+ = [ re 'T = err - J.
SOlWII (S+T+ - Al)r+ .::: b+. Obaervl! that z+ and W+ = T+z+ ea.c.h require O(n- k)
fiOJlli.
P:S.l. T Soppoc.e the matricm Rt. ... , R,. e e-x.. are all upper triangular. Give an
O(pn2 ) algorithm far 110lving thesyztem (R 1 • • • R,. -M):z: = b a.umin« that the matrix
of coefficients is nona:lngulw. Hint. Generalize the eolntiou to the previous problem.

Notes and Refareocel for Sec. 3.1

Tbe accur11cy of triangular system 110lven ill analyzed in

N.J. Higham (I989). "The Accuracy of &luti.Oila to 1\i.angula.rSystems," SIAM J. Num.


Al\d.L U, 1252-1265.
94 CHAPTER 3. GENERAL LINEAR SYSTEMS

3.2 The L U Factorization


As we have just seen, triangular systems are "easy" to solve. The idea
behind Gaussian elimination is to oonvert a given system Ax = b to an
equivalent triallgular system. The conversion is achieved by taking appro-
priate linear combinations of the equations. For example, in the system

3%1 +5x::t = 9
6x1 + 7x, = 4

if we multiply the first equation by 2 and subtract it from the second we


obtain

lx1 +5x::t = 9
-3x::~; = -14
This lB n = 2 Gaussian elimination. Our objective in this section is to give
a complete specification of this central procedure and to describe what it
does in the language of matrix factorizations. This means showing that
the algorithm computes a unit lower triangular matrix L and an upper
triangular matrix U so that A= LU, e.g.,

[! ~]
The solution to the original Ax = b problem is then found by a two step
triangular solve process:

Ly = b, Ux= y Ax= LUx= Ly =b.


The LU factorization is a "high-leveJ" algebraic description of Gaussian
elimination. Expressing the outcome of a matrix algorithm in the "lan-
guage" of matrix fa.ctorizations is a worthwhile activity. It facilitates gen-
eralization and highlights connections between algorithms that may appear
very different at the scalar level.

3.2.1 Gauss Transformations


To obtain a factorization deecription of Gaussian elimination we need a
matrix description of the zeroing proce88. At the n = 2 level if x1 "' 0 and
-r = x-a/z1, then

More generally, suppose x E Rn with XII :/; 0. If


Xi
TT = ( ____..
0, · • .,0 o1l:+lt••·•Tn) Ti =
Xt.
i = k + l:n
k
3.2. THE LU FACTORIZATION 95

and we define
(3.2.1)
then
1 0 0 0 %t %a

0 1 0 0 %At %At
= 0
0 -Tlr+l 1 0 %At+I

0 - T,. 0 1 :r,. 0

In general, a matrix of tile form M,. = I- ref € R'x" is a Go.u.u trom-


formo.tion if the tint k components of r € R" are zero. Such a matrix is
unit lower triangular. The components of r(k + l :n) are called multiplier1.
The vector .,. is called the Go.uu veaor.

3.2.2 Applying Gauss Transformations


Multiplication by a Gauss transformation is particularly simple. If C € R'x,.
and M~c = I - ref is a Gauss transform, then

M,.C """ (1- ref)C "" C- r(efC) = C- rC(k, :) .

is an outer product update. Since r(l:k) = 0 onJy C(k + l :n , :) is aJ£ected


and the update C = M,.C can be computed row-by-row as foUaws:

for 1 = k+ l:n
C(i, :) = C(i, :) - r,C(k, :)
end

This computation requires 2{n - l)r ftops.

Example 3.2.1

c =[~ :
6
~ ] , r :.
10
[ ~ ] ~ (I - ref>c ;
- 1
[:

~ 17i ] .
10

3.2.3 Roundoff Properties of Gauss Transforms


If -r is tbe computed version of aD exact Ga.usa wctot r, theo it ia_easy to
verify that
96 CHAPTER 3. GENERAL LINEAR SYSTEMS

Iff is used in a Gauss transform update and fl((I- feDC) denotes the
computed result, then

fl {(I- fei)C) = (I- Tei)C + E,

where

Clearly, if r bas large components, then the errors in the update may be
large in comparison to 101. For t.his reason, care must be exercised when
Gauss transformations are employed, a matter that is pursued in §3.4.

3.2.4 Upper Triangularizing


Assume that A E R"x". Gauss transformations M 1 , ••• ,Mn- 1 can usually
be found such that M.. -1 · · · M 2 M 1 A = U is upper triangular. To see this
we first look at the n = 3 case. Suppose

A= [ 2I 45 87] .
3 6 10

If

then
M1 - [_: -3
0
1
0
n,
[~ 7]
4
MtA = -3 -6
-6 -11

un
Likewise.

M, =
0
1 => M2(M1A) = [I 4 7]
0 -3 -6
-2 0 0 1

Extrapolating from this example observe that during the kth step

• We are confronted with a matrix A(k-l) = M,._ 1 • • • MtA that is


upper triangular in columns 1 to k- l.

• The multipliers in M~e are based on A(le-l)(k + l:n, k). In particular,


1
we need ai~- ) # 0 to proceed.

Noting that complete upper triangularization is achieved after n - 1 steps


we therefore obtain
3.2. THE LU FACI'ORIZATION 97

k=1
while (A(k,k) :F 0} & (k $ n -1)
r(k + 1:n) = A(k + l:n, k)/A(k, k) (3.2.2)
A(k + l:n, :) = A(k + l:n, :) - r(k + l:n)A(k, :)
k=k+l
end
The entry A(k, k) must be checked to avoid a zero divide. These quantities
are referred to as the pivots and their reJative magnitude turns out to be
critically important.

3.2.5 The LU Factorization


In matrix language, if (3.2.2) terminates with k = n, then it computes
Gauss tra.nsforms Mt, ... , M"-1 such that Mn-1 · · · MtA = U is upper
triangular. It is easy to check that If M,. = I- rC-"lef, then its inverse is
prescribed by M;; 1 = I + rCA:) ef and so

A=LU (3.2.3)

where
(3.2.4)
It is clear that Lis a unit lower triangular matrix because each Mt-l is unit
lower triangular. The factorization (3.2.3) is called the LU factorization of
A.
As suggested by the need to check for zero pivots in (3.2.2), the LU
factorization need not exist. For example, it is impossible to find l;; and

l[
u.;; so

1 2 3] [ 1 0 Q Uu UJ!I U13]
[ 3 5 3 ==
2 4 7
1
/21
£31
1
£32
Q 0
0
U22
0
U23
U33

To see this equate entries and observe that we must have uu = 1, u12 = 2,
l2 1 = 2, U22 = 0, and l31 = 3. But when we then look at the (3,2) entry
we obtain the contradictory equation 5 = l:31u 12 + l32u22 = 6.
As we now show, a zero pivot in (3.2.2) can be identified with a singular
leading principal submatrlx.
Theorem 3.2.1 A E R'x" has an LU factorization i/det(A(I:k,l:k)) f 0
fork= l:n- 1. If the LU factorization e:Nt.s and A i8 nonsingular, then
the LU factorization i3 unique and det(A) = uu · · · Un"·
Proof. Suppose k-1 steps in (3.2.2) have been executed. At the beginning
of step k the matrix A has been overwritten by M,._ 1 • • • M 1 A = A(k- 1).
Note that ai~- 1) is the kth pivot. Since the Gauss transformations are
98 CHAPTER 3. GENERAL LINEAR SYSTEMS

unit tower triangular it foUows by looking at the leading k-by-k portion of


this equation tbat det(A(l:k, l:k)) = ai~-l) · · ·ai~-l). Thus, if A(l:k, l:k)
is nonsingul&r then the kth pivot is nonzero.
AI!, !or uniqueness, if A= LtUt and A= /4U2 are two LU factorizatioDB
1 1 1
of a llOIUiingular A, then Li Lt = U2UI • Since L; L1 is anit lowm
1
triangular and U2U} is upper triangular, it follows that both of these
matrices must equal the identity. Hence, Lt = L, and Ut = U2.
Finally, if A = LU then det(A) = det(LU) = det(L)det(U) =
det(U) = uu ... Unft· 0

3.2.6 Some Practical Details


From the practical point of view there are several improvements that can
be made to (3.2.2). First, because zeros have already been introduced in
columns 1 through k- 1, the Gauss transform update need only be applied
to colUIDllS k through n. Of course, we need not even apply the kth Gauss
transform to A(:, k) since we know the result. So the efficient thing to do
is simply to update A(k + l:n, k + l:n). Another worthwhile observation is
that the multipliers sssociated with M~: can be stored in the locations that
they zero, i.e., A(k + l:n, k). With theae changes we obtain the following
version of (3.2.2):

Algorithm 3.2.1 {Outer Product Gaussian Elimination) Suppose


A E :an~~:n has the property that A(l:k, l:k} is nonsingula.r fork= l:n -1.
This algorithm compute5 the factorization Mn-1 ·· · M 1 A = U where U is
upper triangular and each M~t: is a Gauss transform. U is stored in the
upper triangle of A. The multipliers associated with M~c are stored in
A(k + l:n, k), i.e., A(k + l:n, k) = -M~c(k + l:n, k).

fork= l:n -1
rows= k+ l:n
A(rows,k) =A(rows,k)/A(k,k)
A(rows,rows) = A(rowa,rows)- A(rowa,k)A(k,rows)
end

This algorithm invol"''eS 2n3 /3 flops Bnd it is one of several formulations of


Gau.t.rion Elimination. Note that each p8SB through the k-loop involves an
outer product.

3.2.7 Where is L?
Algorithm 3.2.3 represents L in terms of the multipliem. la particular, if
-r<•) is the vector of multipliers 8880ciated with M~c then upon termination,
A(k + l:n, k) = r<lc). One of the more happy "ooincidences" in matrix
3.2. THE LU FACTORIZATION 99

computatiou is that if L = Mi 1 • • • M,;-! 1 , then L(k + l:n, k) = r<•l. This


follows from a careful ]ook at the product that defines L. Indeed,
n-1
L= (l+r1 1)er)···(l+r1"- 1)e!'_ 1 ) = 1+ L>flt)ef.
lt-1

Since A(k + l:n, k) houses the ktb vector of multipUers ,(It) , it follows that
A(i, k) houses la~~: for all i > k .

3.2.8 Solving a Linear System


Once A bas been factored via Algorithm 3.2.1, then Land Uare represented
1n the array A. We can then solve the system Ax =
b via the triangular
systems Ly = b and U x = y by using the methods of §3.1.

Example 3.2.2 If Algorithm 3.2.1 is applied to

A: [ i 4
5
6
! l = [ 3~
10
0
1
2 ~ l [~
4
-3
0 -~ l'
then upoo completioo,

A ::: [ ~ -~2
3
lf b-.. (l,l,l)T, thu 11 = (1,-l,O)T solves Ly t band :z: = (-l/3,1/3,0)T eolves
U:z: = ll·

3.2.9 Other Versions


Gaussian elimination, like watri.x multiplication, is a tripl~loop procedure
that can be arrWJged in several ways. Algorithm 3.2.1 corresponds to the
"kij" version of Ga.ussian el.i.m.ination if we compute the outer product
update row-by-row:

fork= l:n-1
A(k + l:n, k) = A(k + I:n, k)/A(k, k)
fori= k+ l:n
for j = k + l:n
A(i,j) = A(i,j)- A(i, k)A(k,j)
end
end
end

There are five other versioOB: kji, ikj, ijk, jik, and jki. The last of these
results in an implementation that features a sequence of gaxpy's and for-
ward eliminations. In this formulation, the Gauss transformations are not
100 CHAPTER 3. GENERAL LINEAR SYSTEMS

immediately applied to A as they 8le in the outer product version. Instea.d,


their application is delayed. The original A(:,j) is untouched until step j.
At that point in the algorithm A(:,j) is overwritten by M;-1 · · ·M1A(:,j).
The jth Gauss transformation is then computed.
To be precise, suppose 1 :5 j :5 n- 1 and assume that L(:, 1:j- 1}
and U(l:j - 1, l:j- 1) 8le known. This means that the first j -1 columns
of L and U are available. To get the jth columns of L and U we equate
jth collliiUlS in the equation A = LU: A(:,j) = LU(:,j}. From this we
conclude that

A(1:j - 1, j) = L{1:j - 1, l:j - 1)U(l:j -I, j)

and
;
A(j:n,j) =L L(j:n, k}U(k,j).
k=l

The first equation is a lower triangular system that can be solved for the
vector U(1:j -l,j). Once this iB accomplished, the second equation can be
rearranged to produce recipes for U (j, j) and L(j + 1:n, j). Indeed, if we
set
j-1
v(j:n) = A(j:n,j)- LL(j:n,k)U(k,j)
k=l
A(j:n,j)- L(j:n, l:j- 1)U(l:j -l,j),

then L(j + l:n,j) = v(j + 1:n)/v(j) and U{j,j) = v(j). Thus, L(j + 1:n,j)
is a scaled gaxpy and we obtain

L =I; U = 0
for j = 1:n
if j = 1
v{j:n) = A(j:n,j)
else
Solve L(l:j- 1, l:j- l)z = A(1:j- l,j) for z {3.2.5)
and set U(l:j- 1,j) = z.
v{j:n) = A(j:n,j)- L(j:n, l:j -1)z
end
ifj<n
L(j + l:n,j) = v(j + 1:n)/v(i)
end
U(j,j) = v(j)
end

This arrangement of Gaussian elimination iB rich in forward eliminations


and ga.xpy operations and, like Algorithm 3.2.1, requires 2n3 /3 flops.
3.2. THE L u FAcr<>RIZATION 101

3.2.10 Block LU
It is possible to organize Gaussian elimination so that matrix multiplication
becomes the dominant operation. The key to the derivation of this block
procedure is to partition A E R'xn as follows

A = [ Au A12 ] r
A·:u A22 n -r
r n-r

where r is a blocking parameter. Suppose we compute the LU factorization


L 11 U11 = A 11 Wld then solve the multiple right band side triangular systems
LuUt:z = A12 and !.,tUn = A21 for Ur:z IUld L21 respectively. It follows
that

[
Au A12 .] = [ Lu 0 ] [ Ir ~ ] [ Uu Ut:z ]
A21 An L:zt In-r 0 A 0 In-r

where A = A22- L:ztUt2· The matrix A is the Schur complement of Au


with respect to A. Note that if A = £.nU22 is the LU factorization of A,
then

Au
[ A21
A12 ] = [ Lu 0 ] [ lr ~ ] [ Uu Ut:z ]
A22 L21 L22 0 A 0 U:z2
is the LU factorization of A. Thus, after Lu, L 21 , Uu and U22, are com-
puted, we repeat the process on the level-3 updated (2,2) block A.

Algorithm 3.2.2 (Block Outer Product LU) Suppose A E Rnll'.n


and that det(A(l:k, l:k) is nonzero for k = l:n -1. Assume that r satisfies
1 $ r :$ n. The following algorithm computes A = LU via rank r updates.
Upon completion, A(i,j) is overwritten with L(i,j) fori > j Wld A(i,j) is
owrwritten with U(i,j) if j ~ i.

A=l
while,\$ n
JA. = min(n,,\ +r -1}
Use Algorithm 3.2.1 to overwrite A(A:~, A:IJ)
with its LU factors L and (J.
Solve lz = A( A:/-', p + 1:n) for Z and overwrite
A(A:IJ,IJ+ l:n) with Z.
Solve Wti = A(IJ + l:n, A:p) for W and overwrite
A(J.' + l:n, ,\:~) with W.
A(p + l:n,IJ + l:n) = A(IJ + l:n,p+ l:n)- WZ
A=p.+l
end
102 CHAPTER 3. GENERAL LINEAR SYSTEMS

This algorithm involves 2n3 /3 Oops.


Reeamug the discll88ion in §3.1.5, let us consider the level-3 fraction
for this procedure assuming that r is large enough 10 that the underlying
computer is able to compute the matrix multiply update A(p + l:n,p +
1:n} = A(~-&+ l:n, p + 1:n) - W Z at "level-3 speed." Assume tOr clarity
that n = r N. The only Hope that are not level-3 flope occur in the context
of the r-by-r LU factorizations A(A:p, A:p) = lu.
Since there are N such
systems solved in the overall computation, we see that the level-3 fraction
is given by
1- N(2r3/3) = 1- _1
2n3 /3 N'J ·
Thus, for large N almost all arithmetic takes place in the context of matrix
multiplication. A!J we have mentioned, this ensures high performance on a
wide range of computing environments.

3.2.11 The LU Factorization of a Rectangular Matrix


The LU factorization of a rectangular matrix A E JR"'x" can also be per-
formed. The m > n case is illustrated by

[: :]-[~ !][~ _; l
while
123]
[ 4 5 6 :
[10][1
4 1 0 -3
2 -63]
depict& them < n situation. The LU factorization of A E R"'x" is guaran-
teed to exist if A(l:k, l:k) is noosingular fork= l:min(m, n).
The square L U factorization algorithms above need only minor modifi-
cation to handle the rectangular case. For example, to handle the m > n
case we modify Algorithm 3.2.1 aa follows:

fork= 1:n
row11 = k+ l:m
A(rows,k) = A(row.~~,k)/A(k,k)
ifk<n
col.11 = k + l:n
A( row.!, cob) = A(row.!,col.ll) -A(row.!, k)A(k, cols)
end
end
This algorithm requires mn 2 - n 3 /3 Oops.
3.2. THE LU FACI'ORJZATION 103

3.2.12 A Note on Failure


As we know, Gaussian elimination fails unless the first n - 1 principal
submatrices are nonsingular. Thia rules out some very simple matrices,
e.g.,

A=[~~]·
While A has perfect 2-no.rm condition, it fails to have an LU factorization
because it has a singular leading principal submatrix.
Clearly, modifications are necessary if Gaussian elimination is to be
effectively used in general linear system solving. The error analysis in the
Following section suggests the needed modifications.

Problmn~~

PS.l.l Suppc»e the entrie. of A(() E R"'" are continuoUBiy difl'eren.tiable functions of
the sc.b.l (. Assume that A~ A(O} aod .U iw princip&lsubma&rics are nOIWngulu.
Show that for sufficiently smlll.l E, the .nairix A(E) ba~~ a.o LU factorimtion A(E) =
L{()U{t) and that L(() ud U(t) llle both continuously ditferenti&ble.
P3.l.2: Suppoee we partition A E R"'"

A == [ An Atz ]
A21 A22

where Au ia r-by-r. ~metbu An i.l nonaingular. ThemMJ"ix S = A22 -A21Aj"11A12


is called the Seh.w ~plement of Au in A. Show tha& if Au bae aa LU factorU&tion,
thl!ll after r st.~ of Algorithm 3.2.1, A(r + l:n,r + l:n) hou.a S. How a)ukl S be
obtaiDed. ahel- r step~ of (3.2.5)?
P3.l.3 Suwo- A E R'l(" hal: a.n LU fedoriu&.ioD. Show 00. Az ""' b caa be solved
without storinc the muhiplien by computinc the LU factoNaiioD of the n-by-(n + l)
~[Ab).

P3.2:.4 Deecribe a variant of Gu.ian elimin.tion that im.rodut::>M SM'08 into the colwnna
of A ill tbe order, n: - 1:2 aod which prod\IC!II8 tbe factoriu&.ion A = U L where U ._ u11it
upper triangulac and L ia loww triaDJuiar.
P3.2:.S MWic:s in R'x" of the form N(v,fc) =l- pef
wbm-e V E R" are .W to
be G-Jonlfm fnlru/orrrJGtioru. (a) Glwa formula few N(tt,J:)- 1 ..u.m.i..ng it emta.
(b) Given ~e R", Wlder what. cooditioDIJ c&aJI" be (l)uQd .o N(11,J:)~ = e,.? (c) Give
aa ~hm llliDg Ga1111-Jorda.ll ~ thal ~A ..rnh A- 1 • What
oond.iUou 011 A I!ZIIIW1: the suca~~~ of your aJ&ornhm7
PS.2:.15 Extend (3.2.5) .o tbal it caD &J.o haDdle the ca. wbea A hu ~ ron thAD
collliiUIL
P3.l.7 Show~ A. can be ~Ue11 wie.h L &Del U ill (3.2.S). Organize tbe three
loops 110 that umt .uide ~ ~
PS.2.8 Develop a version of G~ elimina.Cioc in whlch tbe ~ ot the three
~~a dot product.
104 CHAPTER 3. GENERAL LINEAR SYSTEMS

Note. and R.efenmce. for Sec:. 3.2

Schur complemeata (P3.2.2) arille ill many applicaJ;i.oall. For a .urvey of both praaical
and theoi'Btical i.Dienllt, -

R.W. Cottle (1974). uManifestatioll8 of the Schur Complement,~ Lin. Alg. lind /tJ
...tpplic. 8, 189-211.
Schur compleroeotll are known 811 ucauss transforma" iD some application areu.. The
uae of G&WB-Jorda.n t.ranaformations (P3.2.5) i.l detailed in Fax: (1964). See a.klo

T. Dekker and W. Hoffman (1989). •Rehabilitation of the Ge.UD-Jordan Algorithm,~


Numer. Moth. 54, 591-599.
AA - mentioned, inner product Wl'lriona of Gaw.i&D elimination have been known and
u-=1 for 110me time. The - o f Crout and Doolittle ace 1180CiaU!d with th- ijfc
technique~~. They were popular during the <UI-)'11 of desk calculacon becauae there are
far fewer imennediate results tha.n in GaWIIIian elimination. Th- methods still have
attraction because they can be implemented with accumulaud inner products. For re-
marks along these lines see Fox (1964) B.ll well u Stewart (1973, pp. 131-39). See also:

G.E. Fonythe (1960). "Crout with Pivoting,n Comm. ACM 3, 507...S.


W.M. McKeeman (1962). "Crout with Equilibration and Iteration," Comm. ACM. 5,
553-55.
Loop orderings and block illsum in LU computations are discussed in

J.J. Dongana, F.G. Gu.atavson, 1111d A. Karp (1984). "Implementing Lineec Algebra
Algorithma for Denee Matricel on a Vector Pipeline Machine," SIAM Review 26,
91-112.
J.M. Ortega (1988). "The ijfc Folliiii or FactorU;ation Methods I: Vector Computers,"
Pa.nUkl Com~ 7, 135-147.
D.H. Bailey, K.t-, and H.D. Simon (1991). "Uiins Str--.'a Algorithm to Accelerate
the Solution of LinMI' S}'lltema," J. Supercomputing 4, 357-371.
J.W. Demmel, N.J. Higham, and R.S. Schreiber (199~). "Stability of Block LU Factor-
izMion," Numer. Lin.. Afg. lliteh Applic. .2, 173-lQO.

3.3 Roundoff Analysis of Gaussian Elimina-


tion
We now assess the effect of rounding errors when the algorithms in the
previous two sections are used to solve the liDear system ;U; b. A much =
more detailed treatment of roundoff errOl' in Gaussian elimination is given
in Higham (1996).
Before we proceed with the analysis, it is useful to CODBider the nearly
ideal situation in which no roundoff occurs during the entire solution process
except when A and bare stored. Thus, if fl(b) = b+e and the stored matrix
fl(A) = A+ E is nonsingular, then we are assuming that the computed
solution x satisfies

(A+ E)x = (b +e) II E lloo ~ ull A lie"" II e floc :5 ull b lloo. (3.3.1)
3.3. RoUNDOFF ANALYSIS OF GAUSSIAN Ew.tiNATION 105

That ia, % solwa a "neuby" II)"Btem exactly. Moreover, if ~(A) ~ !


(say), then by using Theorem 2.7.2, lt can be abmm that

IJ Z - Z llco < 41V'oo(A} . (3.3.2)


llz lloo -
The bounds (3.3.1) and (3.3.2) are "best po88ible" DOlDl bounds. No geDeral
or
oo-norm error analysis a linear equation solver that requires the stmage of
A and b can render sharper bounds. As a CODSequence, we cannot justifiably
criticize ao algorithm for returning an inaccurate £ if A is ill-conditioned
relative to the maciJine precision, e.g., ~ao(A) ~ 1.

3.3.1 Errors in the LU Factorization


Let us see how the error bounds for Gall88ian ellminat.ion compare with
the ideal bounds above. We work with the infinlty norm for convenience
and focus our attention on Algorithm 3.2.3, the outer product version.
The error bounds that we derive also apply to Algorithm 3.2.4, the gaxpy
formulation.
Our 6rtt task is to quantify the roundoff errors associa~ with the
computed triangular factors.

Theorem 3.3.1 As.rume that A is an n-by-n matri2: of floating point num-


bers. If no zero pivots are encountered duri~ the ~ution of Algorithm
3.2.3, then the computed triangular m4ltrices L 4nd U sawfy

tO = A+H (3.3.3)

IHI ~ 3(n - l)u (IAI + ILIIirl) + O(u2 ). (3.3.4)

Proof. The proof'ia by induction on n. The theorem obviously holds for


n = 1. Assume it holds for all ( n - 1)-by-( n - 1) tloating point matrices. If

A=
a
[ v
wT]
B n- l
1

1 n- 1

then i = fl(vfa) and At = fl(B- zwT) are computed in the tint step of
the algorithm. We therefore have

z= 1
-u+f
a Ill ~ ul:: (3.3.5)

and

At= B - iwT +F
106 CHAPTER 3. GENERAL LINEAR SYSTEMS

The algorithm now proceeds to calculate the LU factorization of At. By


induction, we compute approximate factors it and (ft for At that satisfy

Thua,

= A +[ :! Ht ~ F ] =A + H .
From (3.3.6} it follows that

IAtl $ (1 + 2u) (IBI + lillwiT) + O(u2 },


and therefore by using (3.3.7) and (3.3.8) we have

IHt + Fl $ 3(n -l)u (1B1 + lillwiT + 1Lti1Ud) + O(u2 ).

Since la/1 :$ uJvl it is easy to verify that

thereby proving the theorem. a


We mention that if A is m-by-n, then the theorem applies with n in (3.3.4}
replaced by the smaller of n and m .

3.3.2 Triangular Solving with Inexact Triangles


We next examine the effect of roundoff error when L and (J are used by the
triangular system solvers of §3.1.

Theorem 3.3.2 Let i and U be the computed LU factors of the n-by-n


floating point matri:r A obtained by either Algorithm 3.2.3 or 3.2.-4.. Suppo~e
the methods of §3.1 are used to produce the computed solution fJ to Ly = b
and the computed solution x to Ux = y. Then (A+ E)x"" b with

lEI $ nu ( 3IAI + 5ILIIU1) + O(u2) • (3.3.9)


3.3. RoUNDOFF ANALYSIS OF GAUSSIAN EUMINATION 107

Proof. From (3.1.1) and (3.1.2) we have

(L +F)y = b IFI :s; nuiLI + O(u:l)


(U +G)% = y JGI :5: nu!Ul + O(u:l)
and thus

(i. + F)(U + G)x = (iU + FU +i.e+ FG)x =b.


From Theorem 3.3.1
1.0 =A+H,
with IHI :s; 3(n- l)u(IAI + ILIIUI) + O(u 2), and so by defining
E = H + FU + i.G + FG
we find (A + E)x = b. Moreover,

lEI :s; IHI + IFIIUI + ILitGI + O(u2 )


:5: 3nu (!AI + !LIIUI) + 2nu (ILl lUI} + O(u2). D

Were it not for the possibility of a large JLIJUI term, (3.3.9) would compare
favorably with the ideal bound in (3.3.1). (The factor n is of no conse-
quence, cf. the Wilkinson quotation in §2.4.6.) Such a possibility exists, for
there is nothing in Gaussian elimination to rule out the appearance of small
pivots. If a small pivot is encountered, then we can expect large numbers
to be present in i. and U.
We stress that small pivots are not necessarily due to ilkonditioning as
the example A= [ ~ ~ ] bears out. Thus, Gaussian elimination can give
arbitrarily poor results, even for well-conditioned problems. The method is
unstable.
In order to repair this shortroming of the algorithm, it is necessary to
intrOduce row and/or column interchanges during the elimination process
with the intention of keeping the numbers that arise during the calculation
suitably bounded. This idea is pursued in the next section.

Example 3.3.1 Su~ fJ"" 10, t = 3, llouinc point arithmetic Ia Ulllld to 110lve:

[ i~ ~:: ][ :~ ] = [ ~:: ] .
Applying G&UI!Bi.an elimination we get

L; [ 1~ ~] U= [
.001
0
1
-1000
]

and a calcul.ation llhow:a

L(J = [ ·~ 1 ~ ] + [ ~ -~ ] : A+ H.
108 CHAPTER 3. GENERAL LINEAR SYSTEMS

M-, 8 1o-• 001


u)oo ] Ia tbe bolmdlnJ 1llaUix in (3..3.4), ~
[ 10_;, 1
DQt a 8eftlnl

ID&ie of IHI- U- co au to 1111lw the problem ~ding the triaDgular II)'1ICem eolven of §3.1,
then llllinc tbe RID8 ~ arithmeUc- obu.iD a o::omput.ed IOiutioD z = (0, l)T.
This Is in coutrut to the exact; 110lution :r: .: (L002. .. , .998 ... )T,

PS.3.1 Show that i f - drop the -umption Uw A is a 8oa&inc pojni matrix in


Theorem 3.3.1, then (3.3.4) holds with the coefficient "l~replaced by "4."
P3.3.:1 Suppose A ia an n-by-n matrix and that L ~ fj ~~n~ produced by AIJOrithm
3.2.1. (a) How many ftopa are required to o::omput.e IIlLI lUI n.... 7 (b) Show fl(!LIIUI) $
(1 + 2nu)ILJIU! + O(u 2).
P3.3.3 Sup~ :r: = A- 1 b. Shaw that if e = z-% (the error) a.nd r == b- Ai (the
1111idual), then

I~~~~~ 1
:S Ue II :S II A- II Ur 11-
Aasume coDBistency ~ the matrix and vector norm.
PS.S.4 Using 2-digit, bue 10, floating point arithmetic, comput-e the LU CactorizMion
of

For this example, whe.t is the matrix H in (3.3.3)?

Not1111 and RefereoCM for Sec. 3.3

Tbe original roUPdoff 11.11alywia of G-..iau el.i.mine.lion appean In

J .H. Wilkimon ( 1961}. "Enw Ana.lyBia of Direct Mecbodl of Matrix lnvenion," J. AC!tl
8, 281-330.

Variou. itnprovementa in tbe boundt and mnplificaiion. in the llllaly.il have occurred
- the yean. See

B.A. CbartrM and J.C. Geuder (1967). "Computable Error Bounds for Direct Solution
of Linear Equationa," J. ACM 14, 63--71.
J.K. Reid (19TI). "A Not.e OQ the Stability ol GauU.n Elimillation," J. i~t. Math.
Applic. 8, 374-15.
c.c. Paige (1973). "An Error Analysia of e. Metbod lor Solvin( Mamx Equa&iolla,"
Math. c()fn,. t1, 355-59.
C. de Boor ami A. Pinkwl (1917). "A Backward Error Analysis lor Tota.lly Pomm
Lin_. S)"'teem." Nllffler. Moth. n, ~90.
H.H. R.obenacm (1917). "The Accuncy of Ern.- ~ foe Systam of Linea:r- Alp-
IJraic Equatioua," J. InD. MtUA. AppHe. to, 409---14-
J.J. Du Cros and N.J. fl1«bam (1992). "Stability ofMetboda !or Matrix lnwnion," IMA
J. Num. AnaL 11, 1-19.
3.4. PIVOTING 109

3.4 Pivoting
The analysis in the previous section shows that we must take steps to ensure
that no large entries appear in the computed triangular factors L and 0.
The example

A = [ .0001 1] = [ 1 0 ] [ .0001 1 ] = LU
1 1 .10,000 1 0 -9999

correctly identifies the source of the difficulty: relatively small pivots. A


way out of this difficulty is to interchange rows. In our example, if P is the
permutation

then

PA = [ .0~01 ~] = [ .0~)1 ~ ] [ ~ .9~99 ] = LU.

Now the triangular factors are comprised of acceptably small elements.


In this section we shaw how to determine a permuted version of A that
has a reasonably stable LU factorization. There are several ways to do
this and they each correspond to a different pivoting strategy. We focus
on partial pivoting and complete pivoting. The efficient implementation
of these strategies and their properties are discussed. We begin with a
discussion of permutation matrix manipulation.

3.4.1 Permutation Matrices


The stabilizations of Gaussian elimination that are developed in this sec-
tion involve data ID.OVeiD.ents such 88 the interchange of two matrix rows.
In keeping with our desire to describe all computatioll8 in "matrix terms,"
it is necessary to acquire a familiarity with pennutation ffl4trices. A per-
mutation matrix is just the identity with its rows re-ordered, e.g.,

P-un n
An n- by-n permutation matrix should never be explicitly stored. It is much
more efficient to represent a general permutation matrix P with au integer
n-vector p. One way to do this is to let p(k) be the column index of the
sole "1" in P's kth raw. Thus. p = [4 1 3 21 is the appropriate encoding of
the above P. It is also possible to encode P on the basis of where the "1"
occurs in each column, e.g., p = [2 431].
110 CHAPTER 3. GENERAL LINEAR SYSTEMS

H P is a permutation and A is a matrixt then P A is a row permuted


version of A and AP is a column permuted version of A. Permutation
matrices are orthogonal and 110 if P is a permutation, then p-l = pT. A
product of permutation matrices is a permutation matrix.
In this section we are particularly interested in inUn:hange pennuta-
tioru. These are permutations obtained by merely swapping two rows in
the identity, e.g.,

E- [n! ~l
Interchange permutations can be used to describe row and column swap-
ping. With the above 4-by-4 example, EA is A with rows 1 and 4 inter-
changed. Likewise, AE is A with columns 1 and 4 swapped.
If P = En··· E1 and each E,. is the identity with rows k and p(k)
interchanged, then p(1:n) is a useful vector encoding of P. Indeed, z E !Rn
can be overwritten by Px as follows:

fork= l:n
x(k) +-+ x(p(k))
end

Here, the"+-+" notation means "swap contents." Since each E1c is symmetric
and pT = E 1 ···En, the representation can also be used to overwrite x with
pTx:

for k = n: - 1:1
x(k) +-+ x(p(k))
end

It should be noted that no Boating point arithmetic is involved in a permu-


tation operation. Howewr, permutation matrix operations often involve the
irregular movement of data and can represent a significant computational
overhead.

3.4.2 Partial Pivoting: The Basic Idea


We show how interchange permutations can be used in LU computations to
guarantee that no multiplier is greater than one in absolute value. Suppose

A =
3 17
2 4
[ 6 18
10
-2
-12
l .
3.4. PIVOTING lll

To get the smallest possible multipliers in the first GaUBS trBDSform using
row interchanges we need au to be the largest entry in the first column.
ThU5, if E 1 is the interchange permutation

~l
[ 0 0
E1 = 0 1
1 0

then
-12]
[~
18
E1A = 4 -2
17 10
and

Mt - [ -:/3 01 00
-1/2 0 1
l ==> MtE1A =
[6 18 -12]
0
0
-2
8
2
16
.

Now to get the smallest possible multiplier in M 2 we need to swap rows 2


and 3. Thus, if

E2 = 1 0 0
0 0 1
[ 0 1 0
l and 01
00
1/4 1
l
then

M2E:~MtE1A = [ 0~ -~~6 ] ·
1
:
0
The example illustrates the basic idea behind the row interchanges. In
general we have:

fork= 1:n -1
Determine an interch.a.nge matrix E~c with E~c(l:k, l:k} = I~c
such that if z is the kth column of E~cA, then
=
lz(k)l II z(k:n) lloo·
A=E~~:A
Determine the Gauss transform M~~: such that if v is the
kth column of M~cA, then v(k + 1:n) = 0 .
A=M~:A
end

This particular row interchange strategy is called partial pivoting. Upon


completion we emerge with Mn- 1 En-t ·· · MtE1 A = U, an upper triangu-
lar matrix.
112 CHAPTER 3. GENERAL LINEAR SYSTEMS

As a coneequence of the partial pi¥Oting, no multiplier is l.&rger than


one in absolute value. Tbia is because

for k = 1:n- 1. Thus, partial pivoting effectively guards against arbitrarily


large multiplienJ.

3.4.3 Partial Pivoting Details


We are now set to detail the overall Gaussian Elimination with partial piv-
oting algorithm.

Algorithm 3.4.1 (GaU88 Elimination with Partial Pivoting) H


A E Rnxn, then this algorithm computes Gauss transforms Mt. · .. Mn-1
and interchange permutations Et. · · · En-l such that Mn-tEn-1 · · · MtEtA
"" U is upper triangular. No multiplier is bigger than 1 in absolute value.
A(1:k, k) is overwritten by U(l:k, k), k = l:n. A(k + 1:n, k) is overwritten
by -Mt(k + l:n,k), k = 1:n- 1. The integer vector p(l:n- 1) defines
the interchange permutations. In particular, Et interchanges rows k and
p(k), k = l:n- 1.
fork= l:n -1
. Determine 1.£ with k :S: J.1 :S: n so IA(~J, k)l = Jl A(k:n, k) lloo
A(k, k:n} ..-. A(~. k:n)
p(k) = J.1
if A(k,k) # 0
rowa = k + l:n
A(rows,k) = A(rows,k)/A(k,k)
A(rows,rows) = A(rows,rows)- A(rows,k)A(k,rows)
end
end
Note that if II A(k:n, k) 1101.') = 0 in step k, then in exact arithmetic the first
k columns of A are linearly dependent. In contrast to Algorithm 3.2.1, this
poses no difficulty. We merely skip over the zero pivot.
The overhead asaociated with partial pivoting is minimal from the stand-
point of Boating point arithmetic as there are only 0{ n 2 ) comparisons asso-
ciated with the search for the pivots. The overall algorithm involves 2n3 /3
ftops.
To solve the linear system Ax = b after invoking Algorithm 3.4.1 we
• Compute y = M .. -tEn-1 · · · M1E1b.
• Solve the upper triangular system Ux = y.
3.4. PIVOTING 113

All the information necesaary to do this is contained in the array A and the
pivot vector p. Indeed, the calculation

fork= l:n -1
b(k) - b(p{k))
b{k + l:n) ::o b(k + l:n} - b(k)A(k + l:n, k)
end

Example 3.4.1 If Algoriibm 3.4.1 is applied to

A :::: [ ~ 18~~ -1:1~~ l'


6
then upon exii
6 18
-12]
A= [ 1/3 8 16
1/2 -1/4 6
and p = [3, 3j. Tbeae two qua.ntiiiee encode all tbe information ~iated with the
reduction:

00][100][ 100][001] [6
[~ 1
1/4
0
1
0
0
0
1
1
0
-1/3
-1/2
1
0
0
1
0
1
1
0
0
0
A= 0
0

3.4.4 Where is L?
Gaussian elimination with partial pivoting computes the L U factorization of
a row permuted version of A. The proof is a messy subscripting argument.

Theorem 3.4.1 If Gauuian eliminmion with partio.l pivoting u used to


compute the upper trianguiarization

{3.4.1)

via Algorithm 3.4.1, then


PA=LU
where P = En-1 · · · E1 and Lis a unit lower tri4ngular matriz with I~; I:::;
1. The kth column of L below the diagonal u a permuted version of the
kth Gauss vector. In p.:artict&Iar, if M,. =I- r(.l:)ef, then L{k + l:n, k) =
g(k + l:n) where g = En-1 · · • E1.+1r(ir:).

~roof. A manipulation of (3.4.1) reveals that Mn-1 · · • MlPA = U where


Mn-1 = Mn-1 and

k::Sn-2.
114 CHAPTER 3. GENERAL LINEAR SYSTEMS

Since each E; ia an interchange permutation involving row j and a row 11


with p. ~ j we have E;(l:j -1, l:j -1) = 1;-l . It follows that each M~c is
a GaUBB transform with Gauss vector f(.t) = En-1 · · • E1c+ 1 T(.t). []

Ait a consequence of the theorem, it is easy to see how to change Algorithm


3.4.1 so that upon completion, A(i,j) houses L(i,j) for all i > j. We
merely apply each E.t to aU the previoUBly computed Gauss vectors. This
is accomplished by changing the line "A(k, k:n) .... A(JA, k:n)" in Algorithm
3.4.1 to "A(k,l:n) .... A(11,l:n)."

E:umple 3.4.~ Tbe factorization PA = LU ofthe matrix in Example 3.4.1 is given by

[ 0~ ~1
1
°0 l[6~ 1

18
~ -12~~ l= [ I/;
1/3 -1/4
~ ~1 l[~ 0
1
:
0
-!~6 l'
3.4.5 The Gaxpy Version
In §3.2 we developed outer product and ga.xpy schemes for computing the
LU factorization. Having just incorporated pivoting in the outer product
version, it is natural to do the same with the ga.xpy approach. Recall from
(3.2.5) the general structure of the ga.xpy LU process:

L=I
U=O
for j = l:n
ifj=l
v(j:n} = A(j:n,j)
else
Solve L(l:j - l,l:j - l}z == A{l:j - 1, j) for z
and set U(l:j - 1, j} = z.
v(j:n) = A(j:n,j)- L(j:n, l:j - l)z
end
if j <n
L(j + l:n,j} = v(j + l:n)/v(j)
end
U(i,j) = v(j)
end

With partial pivoting we search lv(j:n)l for its maximal element and pro-
ceed accordingly. Assuming A is nonsingular so no zero pivots are encoun-
tered we obtain
3.4. PIVOTING 115

L=l; U=O
for j = l:n
ifj=l
v(i:n) = A(i:n,j)
else
Solve L(l:j- 1, l:j- l)z ::o A(l:j -l,j)
for z and set U(l:j- l,j) = z.
v(i:n) = A(i:n,j)- L(j:n, l:j- l}z
~ (U~
ifj<n
Determine p. with k ~ IJ ~ n so lv(p.)l = IJ v(j:n) lloo·
p{j) = p.
v(j) .-. v{p.)
A(j,j + l:n) .-. A(J.',j + l:n)
L(j + l:n,j) = v(j + l:n)fv(j)
ifj>l
L(i, l:j- 1) .-. L(J.', l:j -1)
end
end
U(j,j) = v(j)
end

In this implementation, we emerge with the factorization P A = LU where


P = En-1 · · · E1 where E1c is obtained by interchanging rows k and p(k) of
the n-by-n identity. Ait with Algorithm 3.4.1, this procedure requires 2n3 /3
ftops and 0{ n 2 ) comparisons.

3.4.6 Error Analysis


We now examine the stability that is obta.i.ned with partial pivoting. This
requires an accounting of the rounding errors that are sustained during
elimination and during the triangular system solving. Bearing in mind
that there are no rounding errors 8S30ciated with permutation, it is not
hard to show using Theorem 3.3.2 that the e<mputed. solution :i: satisfie:~
(A+ E}x = b where

lEI :5 nu ( 3IAI + 5PTILIIU1) + O(u2 ) • {3.4.3)

Here we are assuming that P, L, and f.J are the computed analog5 of P,
L, and U as produced by the above algori~. Pivoting implies that the
elements of L are bounded by one. Thus RL Roo :5 n and we obtain the
bound

II E ~co :5 nu ( 311 A lloo + 5nll fJ lloa) + O(u 2 ). (3.4.4)


116 CHAPTER 3. GENERAL LINEAR SYSTEMS

The problem now is to bound II fl lloo· Define the grr1Wth factor p by

p=
la~A:l I
max __i _ (3.4.5)
i.,j,k II A lloo
where JV.:) is the computed version of the matrix A(.t) ""M.tE.t · · · M1E1A.
It follows that
(3.4.6)
Whether or not this compares favorably with the ideal bound (3.3.1) hinges
upon the size of the growth factor of p. (The factor n 3 is not an operating
factor in practice and may be ignored in this discussion.) The growth factor
meaaures how large the numbers become during the process of elimination.
In practice, p is usually of order 10 but it can also be as large as 2"- 1• De-
spite this, most numerical analysts regard the occurrence of serious element
growth in Gausaian elimination with partial pivoting as highly unlikely in
practice. The method can be used with confidence.

Ex.unple 8.4.3 H Ga11811ian elimination with partial pivoting is applied to the problem

with {3
[ i~ ~:~ ] [ =~
= 10, t = 3, floating point aritlunetic, then
] = [ i:~ ]

P= [~ ~]. L= [~~ u~]. iJ = [ 1.000 2.00 ]


1.00
and :t = (1.00, .996)T. Compare w:itb Example 3.3.1.

Example 3.4.2 If A e R"xn ill defined by


1 ifi=jorj=n
Gij = -1 ifi>j
{
' 0 ~benriee

then A bu 11.11 LU factorizaiion witb 1~ 1 1 $ 1 and u.." = :r-- 1 .

3.4. 7 Block Gaussian Elimination


Gaussian Elimination with partial pivoting can be organized so that it is
rich in level-3 operations. We detail a block outer product procedure but
block gaxpy and block dot product formulations are also possible. See
Dayde and Duff (1988).
Assume A e JR.nxn and for clarity that n = rN. Partition A as follows:

A == [ ~~~ ~~ ] n: r
r n-r
3.4 . PIVOTING 117

The first step in the block reduction is typical aDd proceeds as follows:
• Use scalar GaUMian elimination with partial pivoting (e.g. a rec:ta.o-
gular version of Algorithm 3.4.1) to compute permutation P 1 e R' x " ,
unit lower tria.ngular Lu e ~xr and Upper triangular Uu e R"xr so

• Apply tbe P, acroa the rest of A:

• Solve the lower triangular multiple right hand side problem

• Perform the level-3 update

With these computations we obtain the factorization

The process is then repeated on the first r columna of A.


lD geDeral, durlog step lc (1 ~ k ~ N- l) of the block algorithm we
apply scalar Gausaiao elimination to a matrix of size (n- (k- l)r)-by-r.
AD r-by-(n- kr) multiple right hand side .system is solved and a level 3
update o£ size (n- kr}-by-(n- kr) is performed. The level 3 fraction for
the <M!rall process is approximately given by 1 - 3/ (2N). Thus, for large
N the procedure is rich in matrix multiplication.

3.4.8 Complete Pivoting


Another pivot strategy ealJed complete pivoting has tbe property that the
associated growth factor bound is oonaidert.bly smaller than 2"- 1. Recall
that in partial pivoting, tbe kth pivot is determined by ranning tbe current
subcolumn A(l::n, k). In oomp.let.e pivoting, the largest entry in the cur-
rent submatrix A(k:n , l::n) is permuted into the (l:, k) position. Thus, we
compute the upper Uiangularizac:ion M"-tEn-1 · · · MtEtAFt · · · Fn-1 = U
with the property tbat in step k """ ace oonfroated with the matrix

A(lo- t ) = M.~-tE~o-t · · · MtEtAFt ·· ·Ft-t


118 CHAPTER 3. GENERAL LINEAR SYSTEMS

aud determine interchange permutations E, and F~~: &uch that

We have the a.oalog of Theorem 3.4.1


Theorem 3.4.2 If Gaus.rian elimination with compkte pivoting is used to
compute the upper triangularization
(3.4.7}

then
PAQ =:! LU
where P = En-l · · · E1 , Q = F1 · · · F,._l and L is a unit lOUJer triangular
matrix with jl;jl S 1. The kth column of L belmu the diagonal is a permuted
version of the kth Gauss vector. In particular, if M~~: "" I- -r!")ei then
L(k + l:n, k) = g(k + l:n} where g = E,._ •. ··EJo:+lT(/o) •
Proof. The proof is similar to the proof of Theorem 3.4.1. Details are left
to the reader. 0

Here is Gaussian elimination with complete pivoting in detail:

Algorithm 3.4.2 (Gaussian Elimination with Complete Pivoting)


Th.i8 algorithm computes the complete pivoting factorization P AQ = LU
where L is unit lower triangular and U is upper triangular. P = En-1 · · · E1
and Q = Ft·· ·Fn-1 are products of interchange permutations. A(l:k,k)
is overwritten by U(1:k,k),k = 1:n. A(k + l:n,k) is overwritten by L(k +
l:n,k},k = l:n- 1. E1c interchanges rows k and p(k). p._ interchanges
columns lc and q{k).
for k=l:n-1
Determine p with k $ p. $ n and ). with k $ >. $ n so
IA(p, ).)1 = max{ IA(i,j)l : i = k:n, j = k:n}
A(k, 1:n) ~ A(p, 1:n)
A(l:n, k) .... A(l:n, ).)
p(k) = IJ
q(k) = >.
if A(k, k) '1- 0
rawa = k + l:n
A(row&, k) = A(row&, k)/A(k, k)
A(row&,row.s) = A(raw.s,row.s)- A(rawa,k)A(k,rows)
end
end
3.4. PIVOTING 119

This algorithm requires 2n 3 /3 flops &Dd O(n3 ) comparisons. Unlike partial


pivoting, complete pivoting involves a significant overhead because of the
two-dimensional search at each stage.

3.4.9 Comments on Complete Pivoting


Suppose rank( A) = r < n. It follows that at the beginning of step r + 1,
A(r+ l:n,r+ l:n) = 0. This implies that E~o = F1: = M,. =I fork :::o r+l:n
and so the algorithm can be terminated after step r with the following
factorization in band:

PAQ :::o LU = [ Lu
~1
0 ] ( Uu
f .. _.. 0
U12 ] •
0

Here Lu and Uu are r-by-r and ~1 and [!'[; are (n - r)-by-r. Thus,
Gaussian elimination with complete pivoting can in principle be used to
determine the rank of a matrix. Yet roWldoff errors make the probability
of encountering an exactly zero pivot remote. In practice one would have to
"'declare" A to have rank k if the pivot element in step k + 1 wBB sufficiently
small. The numerical rank determination problem is discussed in detail in
§5.4.
Wilkinson (1961) hBB shown that in exact arithmetic the elements of
the matrix A<"l = M~oE1c · · · MtEtAFt · · • F~; satisfy
(3.4.8)
The upper bound is a rather slow-growing function of k. This fa.ct coupled
with vast empirical evidence suggesting that p is always modestly sized (e.g,
p = 10) permit us to conclude that Gauuian elimination 111ith ctJmplete
pivoti119 is stable. The method solves a nearby linear system (A+ E)% b =
exactly in the sense of (3.3.1). However, there appears to be no practical
justification for choosing complete pivoting over partial. pivoting except in
cases where rank determination is an issue.

Exaolp.le 3.4.6 H GIWIIIIiaD elimlnelion rib complete piwt.ing ill applied to the prob-
IIIID
.001 1.00 ] [ :1 ] "" [ UIO ]
[ 1.00 2.00 %2 3.00
'iritb {J = 10, t = 3, lloa&.ing: llritbmetic, tben

p,.. [ 1
o 1 ]
0 'Q=
[ o
1
1 ]
0 '
L ""
[ .500
1.oo o..oo
UIO'
] (J = [ 0.00
2.00 1.00 ]
.499
aDd t = [1.00, l.oo)T. Compece with Examples 3.3.1 NKI3.4...3.

3.4.10 The Avoidance of Pivoting


For certain classes of matrices it is not necessary to pivot. It is important
to identify such classes because pivoting wrually degrades performance. To
120 CHAPTER 3. GENERAL LINEAR SYSTEMS

illustrate the kind of analysis required to prove that pivoting can be safely
avoided, we conaider the case of diagonally dominaot matrices. We say that
A E :K')(n ia strictly diagonally dominant if

i = l:n.

The following theorem shows how this property can ensure a nice, no-
pivoting LU factorization.

Theorem 3.4.3 If AT is strictly diagona.lly dominant, then A ha3 an LU


factorization and llo;l :5 1. In other VJOnU, if Algorithm 3.4.1 is applied,
then P= I.

Proof. Partition A as follows

A= [ ~ ~]
where o: is 1-by-1 and note that after one step of the outer product LU
process we have the factorization

1 0 ] [ 1 0 ] [ o: wT ]
[ vfa. I 0 C- vwT fa 0 I ·

The theorem follows by induction on n if we can show that the transpose


of B = C -vwT / o is strictly diagonally dominant. This is because we may
then assume that B has an LU factorization B =
LtU1 and that implies

1 0 ] [ 0 WT ]
A = [ vjo L1 0 U1 :=: LU.

But the proof that BT is strictly diagonally domi.oaot is straight forward.


From the definitions we have
n-1 n-1 n-1 jw·j"-1
:Eib4;t
i-1
L I~;; - v,w;/o:l :5 L leo; I + ; L
1 1 i-1
lv•l
i-1 i-1
i.J>j
i"'j i#j '"''

:S (lc;;l-lw;f) + lw·l
; (lol-lv;l)
11
$ Is;- I
w~v; = lb,,J.o
3.4. PNOTING 121

3.4.11 Some Applications


We conclude with some examples that illustrate how to think in terms of
matrix factorizations when confronted with various linear equation situa-
tions.
Suppose A is nonsingular &lld n-by~n a.nd that B is n-by-p. Consider the
problem of finding X (n-by-p) so AX= B, i.e., the multiple right hand side
problem. U X = [ Xt, ... , xJI ] and B = ( bt, ... , b,. ] are colUlllJl partitions,
then
Compute PA ""LU.
fork::::;;; l:p
=
Solve Ly Pb~c {3.4.9)
Solve U:q, = y
end

Note that A is factored jUBt once. If B = In then we emerge with a


computed A - l .
As another example of getting the LU factorization "outside the loop,"
suppose we want to solve the linear system A""x == b where A E Rnxn,
bERn, and k is a positive integer. One approach is to compute C"" A"
and then solve Cx = b. However, the matrix multiplications can be avoided
altogether:
Compute P A = LU
for j = l:k
Overwrite b with the solution to Ly = Pb. (3.4.10)
Overwrite b with the solution to U :r. = b.
end
As a final example we show how to avoid the pitfall of explicit inverse
computation. Suppose we are given A E R"xn, d E R"', and c E R" and
that we want to computes= CC' A- 1d. One approach ill to compute X=
A -I as suggested above and then compute s = CI' X d. A more economical
procedure is to compute P A =LU and then solve the triangular systems
cr
Ly = Pd aod u X :::= !I· It follows that 8 = X. The point of this example is
to stress that when a matrix inverse is encountered in a fonnula.; we must
think in terms of solving equations rather than in terms of explicit inverse
formation.

Problem8

P3.4.1 Let A = LU be the LU factGrizati.on of n-by.n A wit.h jt,; I :5 1. Let a.'[ and uf
denote the iib row. of A Md U, re~pecti~y. Verify the equ&iion
i-1

uf = 4l' - L ltruf
j•l
122 CHAPTER 3. GENERAL LINEAR. SYSTEMS

PS.4.:J Shaw that if P AQ = LU i1 ob&aiDed via Ga&aiall eljminatioo with complete


pMH;Ing, then no elemeat of U(i, i:n) ia 1arpr in abloluce value «han luul.
PS.4.3 Su~ A E R"x. b. an W r.ctoriation &Dd tha1 L aDd U are ~. Give
aD alpiihm which call compute the (i,j) 1111try of A- 1 in appraxima&ely (n-;) 2 +(n-i) 2

ftope.
PS.4.4 Suppoee X is the compuud in\ll!IIBI! obtained via (3.4..9). Giw UJ upper bound
for ll AX- IliF·
PS.4.5 Prow Tbeorem 3.4.2.
PS.4.6 Extend Algorithm 3.4.3 so that ii can ~ 1m arllitrary rectanguJar matrix.
PS.4. 7 Write a detailed wnion of tbe block eliminalion algoriUun outlined in §3.4.7.

Notee and Reference~~ tor ~. 3.4


An Algol wnion of Algorithm 3.4.1 it: given in

H.J. Dowdier, R.S. Martin, G. Peten., Uld J.H. Willtin8on {1966}. "Solulioo of Real
and Complt!X Systems o£ Linear Equations,n Numer. Moth. 8, 217-34. See alao
Wilkinson and Reinsch (1971, 93-110).

The conjecture that la~;)l cS n maxi~~;; I when complete pivoting is u.cl bu be8n pi'O\I'&Il
in the real n = 4 case in

C.W. Cryer (1008). ~Pivot Size in Gawmiao. Elimination.~ Hamer. MAth. a, 335-45.

Other papen eoncemed with element groM.h and pi~ing include

J.K. Reid (1971). "A Note on the Sblhility oi GauBa.n Elimination,~ J.Irut. Math.
ApplN;,. 8, 374-75.
P.A. BuaiDpr (1971). ~MoniWJDg tbe Numerical Stability of G...-iaa Eliminalion,n
N - . MGJA. 16, 360-61. ·
A.M. Cohen (1Q74). "A Note on~ Size in G-..-ia.n Elimination," Li-n. Alg. and It.
Applic. 8, 361~
A.M. Eriaman and J.K. Reid (1974). "Monitorin~ the Stabili'f of the "I'rianplR Fac:-
tomar.bl of a Spvae Matrilc," Numer. MAth. n, 183-86. .
J. Day BDd B. Pettnon (1988). "Gnnrtb in G~ Elimin.tion,~ Am=. Math.
Mon.cflly 95, 489--513.
N.J. Higham &Dd D.J. HiJbam (1989). •~..aqe Gl'OII'th F8cton in Ga.uaian EJiminatioQ
with Plwting," SIAM J. M&trV AnaL AppL 10, 1M-1M.
L.N. Trefethen &Del R.S. Schnliber (1990). ·A~ Stability of GAU81ian ElJmi.
n.abm, .. SlAM J. M~ AnaL AppL ll, 335-360.
N. Gould (1991). "'n Growtb in Gau.ia.n Elimination with Complete Piwtinc," SIAM
J. MtUriz Anol. Appl. 11, 354-361.
A. Edelman (1992). "'The Complete PivotU!.c Conjecture !of Gauaaian EJin~inatioo ia
Faile.R The Matlw:m4ttm Jaunt4l !, 53-61.
S.J. Wript (1993). "A Collection ol Probimll for Which Gauasian Elimination with
Partial Pivoting is U~R SIAM J. Sri. on4 StQ. ComJ~. 1,1, 231-238.
L.V. F'ostel- (1994). "GIWMian El.i.miDIItion with Partial Pivoting Can Fail in Practice,"
SIAM J. Mlltri% AnaL Appl. 15, 1354-1362.
3.5. IMPROVING AND :EsTIMATING ACCURACY 123

A. F..clelmaA ADd W. M~ (1~). "'D


the Ccmplew ~ Coujectunt tor a
a.daman1 Mat.m ol 000. 12," Linmr and AI~ A~ 38, 181-18$.
Tbe deaipen of~ Cau.iaD e!!minatioa cod. are~ ill lbe topic olea-&
powtb ~ mul\lplien ~ chan ~ are 10JDe$Uz. c.olera&ed rw ~he _.. of
minlmlsin& fill-in. See

!.S. DWr, A.M. Erisman, UJd J.K. Reid (1086). Direct MeiiiDd.tftlr' SpGne MOlriou,
Oxford UDiV'IInhy Pre..
The OO!lllection betwMn amall plwta Uld n_. ~ ill ~ in

T .F. Chan (1085). "'n tbe Em&eaee and Comput»ion of LU ~u with email
pivoca," Moll&. Comp. 41. ~~8.
A pivot 1tnt41C that- did aot UcuM ia ~ P'wfing. Ia this appr'Ofldl. 2-by-2
G&ll8 uansf~ioo8 are u.d to 1SV the lower tri&Dculac ponion of A The techuique
ie appealin1 in certain mo.ltipr_. eovimameut.w becau. ooly ad,llloeellt - ue com-
bined in each l&ep. See

D. SciNMeD (1985). ~AIUII:fllil of Pairwiae P ivotin& in Gauaian Eliminaiion," IEEE


Thur.a. tm Compu&er.t C-j-4, 214-278.

A3 a sample paper in wbidl a a- ol JllaUicee ia ideotilied that requn no pivoting, see

s. Ssbio (1080). "'n F'actoriiJc a Cl- or Complex SymiDOtric Malricell Wi~bout Piv-
oting,~ MalA. C<nnp. 35, 1231-1234.

Just • there an six ·~· vel'lion8 of ec:alar Gallll6ao e!jmiDe&lon, there are
alao six oonveatioul block fonnula&iou of Gau.ian ~ion. For a diacualnn of
~b- procedwea aAd their i~ion -

K. GallivaD, W. Jalby, U. Meier, aud A.H. Sameb (1988). ulm~ a! Hilnrchical Mem-
ory S)'8temll on Lioe&r AJcebra Algorithm Design," lnt'l J. Supm»mputer Applic.
2, 12-48.

3.5 Improving and Estimating Accuracy


Suppoee Gauasiao elimination with partial Pivoting is used to solve the n-
by-n system Az ""'b. A.ssume t-digit, bue {3 floating poiat arithmetic is
used. Equation (3.4.6) &'leentially says that if the growth factor is modest
then the computed solution : satisfies

(A+ E)% =b, 0E lloo ~ ull A lloo. u = ~fj-c . (3.5.1)

In this section we explore tbe practical ramifiratioos of tlli! result. We begin


by &tn!IIISing the distinction that should be made between residual si2e and
accuracy. This is followed by a di&cusslon of scallDg, iterative improvemeat.,
and condition estimiWon. See Higham {1996) for a more detailed treatmeut
of these topica.
We make two notational remarks at the outset. The infinity norin is used
throughout since it is very handy in roundoff error analysis and in practieal
124 CHAPTER 3. GENERAL LINEAR SYSTEMS

error estimation. Second, whenever we refer to "Gaussian elimination" in


this section we really mean Gaussian elimination with some stabilizing pivot
strategy such as partial pivoting.

3.5.1 Residual Size Versus Accuracy


The ruiduol of & computed solution i to the linear syBtem Ax = b is the
vector b - M. A small residual means that Ai' effectively "predicts" the
right hand side b. From (3.5.1} we have II b- M !leo ~ ull A !I coli x II co
and so we obtain
Heuristic I. Gaussian elimination produces a solution x with a. relatively
small residual.
Small residuals do not imply high accuracy. Combining {3.3.2) and (3.5.1),
we see that
!IX- X lloo
II X lloo ~ U~>oo{A) • (3.5.2)

This justifies a second guiding principle.


Heuristic II. If the unit roundoff and condition satisfy u ~ w-d and
~~:00 (A) ~ lO'l, then Gaussian elimination produces a solution x that
has about d - q correct decimal digits.
If u~~:oc(A) is large, then we say that A is ill-conditioned with respect to
the machine precision.
As an illustration of the Heuristics 1 and II, consider the system

[ :: :~~ ] [ :~ ] = [ :ib~ ]
in which ~(A) ~ 700 and x =(2, -3)r. Here is what we find for various
machine precisions:

t Jl X- x IJoo II b- A.i lloo


fJ .it .i-2
II :Z: lloo II A llooll Z lloo
10 3 2.11 -3.17 5 ·10 ·:.1: 2.0 ·10 ·;i
10 4 1.986 -2.975 8 ·10- 3 1.5. w- 4
10 5 2.0019 -3.0032 1·10-3 2.1. w-6

10 6 2.00025 -3.00094 3 ·10-4 4.2 .w-1

Whether or not one is content with the computed solution x depends on


the requirements of the underlying source problem. In many applications
accuracy is not important but small residuals are. In such a situation, the
x produced by Gaussian elimination is probably adequate. On the other
hand, if the number of correct digits in x is an issue then the situation
is more complicated and the discusBion in the remainder of this section is
relevant.
3.5. IMPROVING AND EsTIMATING ACCURAcY 125

3.5.2 Scaling
Let {3 be the machine base and define the diagonal matrices D 1 and D2 by
D1 = diag(,8'"l ..• tr·)
~ = diag(,OCl ..• ~).
The solution to the n-by-n linear system Az = b can be found by solving
the scakd sy.ttem (Di" 1 AD2)Y = Di 1b using Gaussian elimination and
then setting x = D'ltl· The scali.ngs of A, b, andy require only O(n2 ) flops
and may be accomplished without roundoff. Note that D1 scales equations
and~ scales llllknowns.
It follows from Heuristic II that if x and fi are the computed versions of
x andy, then

II D2 (x- x) II<Xl = II fi- y lloo ~


1
Ult (D-1 AD) (3.5.3)
1 2
II D2 1 x IJIX> II Y lloo oo •

Thus, if ~>IXI(D1 1 AD2 ) can be made considerably smaller than ~t,x,(A), then
we might expect a correspondingly more accurate x, provided errors are
measured in the "~" norm defined by 1l z 11 02 = II D2 1z lloo· This is the
objective of scaling. Note that it encompasses two issues: the condition
of the scaled problem and the appropriateness of appraising error in the
D2-norm.
An interesting but very difficult mathematical problem concerns the
exact minimization of ~>p(D1 1 AD2 ) for general diagonal D, and various
p. What results there are in this direction are not very practical. This is
hardly discouraging, howewr, when we recall that (3.5.3) is heuristic and
it m.a.kes little sense to minimize exactly a heuristic bound. What we seek
is a fast, approximate method for improving the quality of the computed
solution x.
One technique of this wriety is simple row sca.ling. In this scheme D2 is
the identity and D1 .is chosen so that each row in Di 1 A bas apprmdm.ately
the same oo-oonn. Row scaling reduces the likelihood of adding a very
small number to a very large number during elimination-an event that
can greatly diminish accuracy.
Slightly more complicated than simple row scaling is row-column equi-
librotion. Here, the object Is to choose D1 and D 2 so that the oo-oorm
of each row and column of D1 1 AD2 belongs to the interval [1/,8, 1] where
fj is the base of the Boating point system. For work along these lines see
McKeeman ( 1962).
It cannot be stressed too much that simple row scaling and row-column
equilibration do not "solve" the acaling problem. Indeed. either technique
can render a worse %than if no scaling whatever is used. The f81Dificatioua
of this point are thoroughly diacussed. in Forsythe and Moler (1967, chap--
ter 11). The basic recommendation is that the scaling of equations and
126 CHAPTER 3. GENERAL LINEAR SYSTEMS

unknowmi must proceed on a problem-by-problem basis. General scaling


strategies are unreliable. It is best to scale (if at all) on the basis of what the
source problem procl.aims about the significance of each Oi;. Measurement
units and data error may have to be considered.

Example 3.5.1 (Forsythe aod Moler (1967, pp. 34, 40]). If

and the equivalent row--acaled problem

ate each 110lved UBing fJ = lO,t = 3 arithmetic, tben .110lutio1111 ~ = (0.00, UlO)T and
~ =:: (1.00, l.OO)T ar-e reepectiwly computed. Note that :z: =: (1.0001 ... , .9999 ... )T ill
the uuct aoluliou.

3.5.3 Iterative Improvement


Suppose .Ax = b has been solved via the partial pivoting factorization P A =
LU and that we wish to improve the accuracy of the computed solution :i:.
If we execute

r=b-A:i
Solve Ly == Pr. {3.5.4)
Solve Uz == y.
;a;_.=:i:+z
=
then in exact arithmetic A:ttuw = AZ+Az (b-r)+r =b. Unfortunately,
the naive Boating point execution of these formuJae renders an Xtuw that is
no more accurate than i:. This is to be expected since f = fl(b- A:i:) has
few, if any, correct significant digits. (Recall Heuristic I.) Consequently,
i = fl(A- 1r} ::::: A- 1 · noise ::::: noise is a very poor correction from ~
standpoint of improving the accurocy of£. However, Skeel (1980) has done
an error analysis that indicates when (3.5.4) gives an improved ;a;IWUI from
fll.e standpoint of bad:ward.s error. In particular, if the quantity

is not too big, then (3.5.4) produces an~ such that (A+ E)x.._, = b
for very small E. Of course, if Gaussian elimination with partial pivoting
is used then the computed % already solves a nearby system. However,
this may not be the case for some of the pivot strategies that are used to
preserve sparsity. In this situation, the fixed precision iterntive impnwement
3.5. IMPROVING AND EsTIMATING AccURACY 127

step (3.5.4) can be very worthwhile aod cheap. See Arioli, Demmel, and
Duff (1988).
For (3.5.4) to produce a more accurate x, it is necessary to compute the
residual b- Ai with extended precision Boating point arithmetic. Typically,
this means that ifkligit arithmetic is used to compute PA = W, x, y, and
z, then 2t.-d.igit arithmetic is used to form b-Ai, i.e., double precision. The
process can be iterated. In particular, once we have computed PA = LU
=
and initialize x 0, we repeat the following:
r = b - Ax (Double Precision)
Solve Ly = Pr for y. (3.5.5)
Solve Uz = y for z.
x=x+z

We refer to this process as mixed precision iterative improvement. The


original A must be used in the double precision computation of r. The
basic result concerning the performance of (3.5.5) is summarized in the
following heuristic:
Heuristic III. H the machine precision u and condition satisfy u = w-d
and ~too(A) :::::: lO'l, then after k executions of (3.5.5), x has approxi-
mately min(d, k(d- q)} correct digits.
Roughly speaking, if U~too(A} ~ 1, then iterative improvement can ulti-
mately produce a solution that is correct to full (single) precision. Note
that the process is relatively cheap. Each improvement costs O(n 2 ), to be
compared with the original O(n3 ) investment in the factorization PA = LU.
Of course, no improvement may result if A i.s badly enough conditioned with
respect to the machine precision.
The primary drawback of mixed precision iterative improvement is that
its implementation is somewhat machin&-dependent. This discourages its
use in software that is intended for wide distribution. The need for retaining
an original copy of A is another aggravation associated with the method.
On the other hand, mixed precision iterative improvement is usually
very easy to implemettt on a given machine that has provision for the ac-
c\lmulation of inner products, i.e., provision for the double precision calcu-
lation of inner products between the rows of A and x. In a short mantissa
computing environment the presence of an iterative improvement routine
can significantly widen the class of solvable Ax = b problems.

Exmnple 3.5.::1: u (3.5.5) il applied to the ayR.em


.986 .579 ] [ :1:1 ] [ .235 ]
[ .409 .237 Z2 .. .101
and {j == 10 and t = 3, theD. itenliw impc'ovement producea the folJowlog sequeace of
computed 110rutioos:
• [ 2.11 ] [ 1.99 ] [ 2.00 ]
z = -3.11 ' -2.99 • -3.00 •...
128 CHAPTER 3. GENERAL LINEAR SYSTEMS

3.5.4 Condition Estimation


Suppose that we have solved Ax = b via P A = LU and that we now wish
to ascertain the number of correct digits in the computed solution %. It
follows from Heuristic n that in order to do this we need an estimate of the
condition Koc(A) =II A llooll A- 1 lloo· Computing II A lloo poses no problem
as we merely use the formula

II A lloo = max
l~i~n
L la;il·
jcol

The challenge is with respect to the factor II A- 1 lloo· Conceivably, we


could estimate this quantity by II X lloo. where X = [±1, ... , Zn] and Xi
is the computed solution to Al:i = ~- {See §3.4.9.) The trouble with this
approach is its expense: k 00 = II A IIIX>II X lloo costs about three times as
much as i.
The central problem of condition estimation is how to estimAte the
condition number in O(n 2 ) Oops assuming the availability of PA = LU or
some other factorizations that are presented in subsequent chapters. An
approach described in Forsythe and Moler (SLE, p. 51) is based on iterative
improvement and the heuristic ~00 (A} :::::- II z lloc/11 x lloo where z is the first
correction of x in (3.5.5). While the resulting condition estimator is O{n2 ),
it suffers from the shortcoming of iterative improvement, namely, machine
dependency.
Cline, Moler, Stewart, and Wtlkinson (1979) have proposed a very suc-
cessful approach to the condition estimation problem without this tlaw. It
is based on exploitation of the implication

The idea behind their estimator is to choose d so that the solution y is large
in norm and then set

The success of this method hinges on how close the ratio II y lloc/11 d lloc is
to its maximum value II A -l lloo·
Consider the case when A = Tis upper triangular. The relation between
d and y is completely specified by the following column version of back
substitution:

p(l:n) =0
3.5. IMPROVING AND EsTIMATING AcCURACY 129

for k = n: - 1:1
Choose d{k).
y(k) = (d(k) - p(k))/T(k, k) {3.5.6)
p(l:k- 1) ""p(l:k- 1) + y(k)T(l:k- l,k)
end
Normally, we use thia algorithm to solw a given triangular system Ty = d.
Now, however, we are free to pick the right-hand side d subject to the
uconstraint" that y is large relatiw to d.
One way to encourage growth in y is to choose d( k) from the set
{ -1, +1} so as to maximize y(k). II p(k) ~ 0, then set d(k) = -L If
p(k) < 0, th~n set d(k) = +1. In other words, (3.5.6) is invoked with d(k)
=-sign(p(k)). Since d is then ·a vector of the form d(1:n) = (±I, ... , ±l)T,
we obtain the estimator ltoo =- II T llooll Y lloo·
A more reliable estimator results if d( k) E { -1, +1} is chosen so as
to encourage growth both in y(k) and the updated running sum given by
p(l:k- 1, k) + T(l:k- 1, k)y(k). In particular, at step k we compute

y(k)+ = (1- p(k))/T(k,k)


s(k)+ = jy(k)+l + II p{l:k -1) + T(l:k -l,k)y(k)+ 11 1
y(k)- = (-1- p(k))/T(k, k)
s(k)- = ly(k)-1 + II p(l:k -1} +T(1:k -l,k)y(k)-lh

and set

This gives

Algorithm 3.5.1 (Condition Estimator) LetT E R'x" be a nonsin-


gular upper triangular matrix. This algorithm computes unit oo-norm y
and a scalar"" so II Ty lloo::::: 1/11 T-l lloo and~::::: "oo(T)

p{1:n) = 0
for k = n: - 1:1
y(k)+ = (1- p(k))/T(k,k)
y(k)- = (-1- p(k))/T(k,k)
p(k)+ = p{l:k- I)+ T(I:k- 1, k)y(k)+
P(k)- = p(l:k- I)+ T(l:k- 1, k)y(k)-
130 CHAPTER 3. GENERAL LINEAR SYSTEMS

if ly(k)+l + II p(k)+ ll1 ;::: ly(k)-1 + II p(k)- lit


y(k) = y(k)+
p(l:k- 1) = p(k)+
else
y{k) = y(k)-
p(l:k- 1) = p(k)-
end
end
K = II Y llooll T lloo;
Y= Y/11 Y lloo
The algorithm involves several times the work of ordinary back substitution.
We are now in a position to describe a procedure for estimating the
condition of a square nonsingular matrix A whose P A ""' LU factorization
we lmow:
• Apply the lower triangular version of Algorithm 3.5.1 to r;T and ob-
tain a large norm solution to uT y = d.

• Solve the triangular systems LT r = y, Lw = Pr, and U z = w.

• koo = HA llooll Z lfc.o/11 T lloo·


Note that II z lloo :5 II A - l lloo II r lloo· The method is based on several heuris-
tics. First, if A is ill-conditioned and P A = LU, then it is usually the case
that U is correspondingly ill-conditioned. The lower triangle L tends to be
fairly well-conditioned. Thus, it is more profitable to apply the condition
estimator to U than to L. The vector r, because it solves AT pT r = d,
tends to be rich in the direction of the left singular vector associated with
O"min(A). Rightha.nd sides with this property render large solutions to the
problem Az = r.
In practice, it is found that the condition estimation technique that we
have outlined produces good order-of-magnitude estimates of the actual
condition number.

Problem~~

P3.5.1 Show by example that there may be mol'l!! than one way to equilib~ a matrix.

Ps.l5.2 Using fJ = 10, t = 2 eritbJIJd.ic, II01ve

using Gauaaian elimination witb partlal pivoting. Do one step of itecative improvement
ueing t = 4 arithiMtic to compute the residual. (Do noi forget to roUlld the computed
residual to two digita)
t
P3.5.3 Suppoae P(A +E) :; L(J, where P is • permutation, is lower triangular with
1, and U is upper triangular. Show that iec.o(A) ~ II A lloo/(11 E IJ.,.. + #) where
Iii; I :S:
3.5. IMPROVING AND EsTIMATING AccuRACY 131

J' = min IU.ol· Conclude the.& if a small pivot ill e!ICOuntend wiHm Gaullian elimina&ion
with pivotill( is applied to A, then A ill ill-amditiona:l. The CUilYt!IWII is not true. {Let
A= B .. ).
= b where
l l
P3.5.4 {Ka.ha.u 1966) Tbe syBtem. .4:1:

A = [ -~ IQ--;~ 10-1! b = [ 2(1-i~~~lO)


1 to-to 10-lo 1o-to
baa eolution z =(10- 10 - 1 1)-r. (a) Show tbt if (A+ E)11: band lEI ~ w-'IAI,
then j::r- vi :s: 10- 7 jzj. Tba& is, IIIlllll relative changes in A'a emriee do not iDduce large
changes in: tM!I!. thougb ~~:.oo(A) = 1010 . (b) DefineD = diag(1o- 6 ,lo&,1o5). Show
"'aa(DAD) ~ 5. (c) Explain what is going on in lenni of Theorem 2..7.3.

l
P3.5.5 Consider the matrix:

T = [ ~
0
! -~
0 0
-~
1
MER.

What estimate of ~{T) ia produced when (3.5.6) is applied wi~b d(k) "" -agn(p(k))?
What estimate 00. Algorithm 3.5.1 produce? What is the true ~~:.oo(T)?

P3.5.6 Wha& do. Algorithm 3.5.1 produce when spplied to thl! matrix B,. given in
(2.7.9)?

Notea and References ror Sec:. 3.5

The following papent are concerned with the scaling of k =b prob\1!11111:


F.L. Bau« (1963). "'ptimally Sa.led Matri(:eat NtJm.t!r. MalA. 5, 13-81.
P.A. Businger (1968). ~Matric:es Which Can be Opiimally Scaled,~ N11rru!r. Math. JS,
346--48.
A. van del' Sluis (1969). "Condition Numbers and Equilibration Matrices,~ Numer.
Math. 1-4, 14-23.
A. vaD du Sluis (1910). "CoDdition, Equilibn.tion, and Pi~ in Li.neer Algebraic
S}'Btemll.. ~ NumM". Math. 15, 14--86.
C. McCarthy &Dd G. Strang (1913). "''ptimal Conditioning ofMatricea, n SIAM J. Num.
AnaL 10, 3ro-88.
T. Fenn« l!.lld G. Loizou (1974). "Some New Bounds on the Condition Numbers of
Optimally Saled Ma&rices," J. ACN !1, 514-24.
G.H. Golub aod J.M. Vant.b (H174). ~on a Characterization oftbe Best L,-Scaling of a
Matrix,~ SIAM J. Num. Anal. 11, 412-19.
R. Skeel (U179). "Scaling for Numerical Stability in Gaussian Elimillation,~ J. ACM.
25, 494-626.
R. Skeel (1981). ~Effect of Equilibration on Residual. Size for Partial Pivotill(,n SIAM
J. Num. AnoL 18, ~55.
Pact of the difficulty in scaling con~ the selection or a norm in which to mauunt
errorlS. An interstin«: di8cuaion of thia freque!Uly overlooked poim a.p~ in

W. Kahan (1966). "NWDI!llica.l LinM1' Alg*a,~ CGnG4iGn Math. BvU. 9, 157-801.


Fbi' a rigoro~ aD&lyftl of itera&i...,. imp-ovmnent aDd rela&ed IDN«en, see

M. Jankowski aod M. Womialwwsk.i (1977). "Iteratiw Refinement Implies Nwnerical.


Stability,~ BIT 17, 303--311.
132 CHAPTER 3. GENERAL LINEAR SYSTEMS

C.B. Molar (1967}. "lteratiw Refinemt!IDt in Floating Point," J. ACM 14, 316-71.
R.D. Skeel (1980}. "Iterative Refinement Impliet Numerieal Stability for Ga.ussian Elim-
ination," M11th.. Comp. 35, 817-832.
G.W. Stewan: (1981). "'n tba Implicit Deflation of Ntt~~rly Singulac Systems of Linear
Equations," SIAM J. Set. and Stilt. Comp. S, 1~140.
The condition estiJnator that we described ill giwn ill

A.K. Cline, C.B. Moler, G.W. Stewart, and J.H. WilkillSOD (1919). "An Estimate for
the Condition Number of a Matrix," SIAM J. Num. Anal. 16, 368-75.
Other references concerned with the rondition estimation problem include

C.G. Broyden (1973}. "Some Condition Number Bound!! for the Gaussian Elimination
Proceaa," J. !rut. Math. Applic:. 11, 273-86.
F. Lemeire (1973). "Bound& for Condition Numbers of Triangular Value of a Matrix,"
Lin. Alg. and Ita Applil:.. 11, 1-2.
R.S. Varga (1976). "On Diagonal Dominance Arguments for Bounding II A- 1 lloo," Lin.
Alg. and Ita Applic. 14, 211-17.
G.W. S~ (1980). "The Elftciellt Generation of Random Onhogona.l. Matricl!ll with
a.n Applica.tion to Condition Estimaton," SIAM J. Num. Anal. 17, 403--9.
D.P. O'Leary (1980). "Estimating Matrix Condition Numberst SIAM J. Sci. Stat.
Comp. 1, 205-9.
R.G. Grime!l and J.G. Lewis {1981). "Condition Number- Elltimat.ion for Sparse Matri-
ces," SIAM J. Sci. and Stat. Comp. B, 384-88.
A.K. Cline, A.R. Conn, and C. Van Loan (1982). "Generalizing the LINPACK Condition
Estimator-," in Numerical Analy~i.!! , ed., J.P. Hennart, Lectur-e Notes in Mathematics
no. 909, Springer-Verlag, New York.
A.K. Cline and R.K. Rew (1983}. "A Set of Counter examplee to Tbn;~e Condition
Number- Eaiimators," SIAM J. Sci. and Sto.t. Comp. 4, 602-611.
W. Hager (1984). "Condition Estimates,~ SIAM J. Sci. and Sto.t. Comp. 5, 311-316.
N.J. Higham (1987). "A Survey of Condition Number Estimation for Triangular Matri-
CIIII," SIAM .RetMw 19, 575-596.
N.J. Higham (1988). "Fortran Codea for EBilmaiing tbe One-norm of a Real or Complex
Matrix, with Applications to Condition Elltimation," ACM Tl-cM. MotA. Soft. 14,
381-396.
N.J. Higham (1988). "FORI'RAN Codes for Estimatin&: tbe One-Norm of a Real or
Complex Matrix with ApplicatioiiiJ to Condition Eet~n (Algorithm 674}t ACM
Thm.t. Math. Soft. 14, 381-396.
C.H. Bischof (1990}. "'ncremmtal Condition Estlmaiion,w SIAM J. MatN Anlll. AppL
11' 644-659. .
C.H. Bischof (1990}. "Incremental Condition EBtimation f« Spacse Matrices," SIAM J.
Matri.l: Anal. AppL 11, 312-322.
G. Aucbmuty (1991). "A Posteriori Error Elltimates for Linear Equations: Numer.
Ma.tJ&. 61, 1-6.
N.J. Higham (199.1}. "'ptimisation by Direct; Search in Matrix Computationa," SIAM
J. Matri:l: Anal. Appl. U, 317-333.
D.J. Higham (1995}. "Condition NIIJIIbm-a and Their Condition Numbera," Lin. Alg.
and & Applie. 814, 193-213.
Chapter 4

Special Linear Systems

§4.1 The LDMT and LDLT Factorizations


§4.2 Positive Definite Systems
§4.3 Banded Systems
§4.4 Symmetric Indefinite Systems
§4.5 Block Systems
§4.6 Vandermonde Systems and the FFT
§4. 7 Toeplitz and Related Systems

It is a basic tenet of numerical analysis that structure should be ex~


ploited whenever solving a problem. In numerical linear aJgebra., this trans--
lates into an expectation that algorithms for general matrix problems can
be streamlined in the presence of such properties as symmetry, definiteness,
and spanrity. This is the central theme of the current chapter, where our
principal aim is to devise special algorithms for computing special variants
of the LU factorization.
We begin by pointing out the connection between the triangular fac-
tors L and U when A is B)'lllllletric. This is achieved by examining the
LD~ factorization in §4.1. We then tum our a.ttentioo to the important
case when A is both symmetric and positive definite, deriving the stable
Cholesky factorization in §4.2. Unsymmetric positive definite systems are
also investigated in this section. In §4.3, banded versions of Gaussian elimi-
nation and other factorization methods are discussed. We then examine the
interesting situation when A is symmetric but indefinite. Our treatment of
this problem in §4.4 highlights the numerical analyst's ambiwlence towards
pivoting. We love pivoting for the stability it induces but despise it for the

133
134 CHAPTER 4. SPECIAL LINEAR SYSTEMS

structure that it can destroy. Fortunately, there is a happy resolution to


this conftict in the symmetric indefinite problem.
Any block banded matrix is also banded and so the methods of §4.3 are
applicable. Yet, there are occasions when it pays not to adopt this point of
view. To illustrate this we consider the important case of block tridiagonal
systems in §4.5. Other block systems are discussed as well.
In the final two sections we examine some very interesting O(n2 ) algo-
rithms that can be used to solve Vandermonde and Toeplitz systems.

Before You Begin


Chapter 1, §§2.1-2.5, and §2.7, and Chapter 3 are BBSumed. Within this
chapter there are the following dependencies:
§4.5
t
§4.1 -+ §4.2 - §4.3 -+ §4.4
l
§4.6 -+ §4.7
Complementary references include George and Liu {1981), Gill, Murray,
and Wright (1991), Higham (1996), Trefethen and Ba.u (1996), and Demmel
(1996). Some MATLAB functiODB important to this chapter: chol, tril,
triu, vander, toeplitz, !ft. LAPACK connectio~ include

LAPACK: General Band Matrices


_GBSV Solve AX- B
-CGBCDI Condib;m estimator
_GBJIFS Improve AX= B, ATX = B, AH X= B !IOintiona with eiTDr bounds
.GBSVX Solve AX= 8, AT X= B, AH X= B with condition estimate
.Glm.F PA:LU
.CBTliS Soh"' AX= B, AT X= B, AH X= B via PA LU =
_GBEQIJ Equilibration

LAPACK: General Tridiagonal Matrices


.CTSY Solve AX- B
.GTCOJI Condition eatimaklr
.GTRI'S Improve AX = B, AT X = B, AH X = B soluCioDI w:iib error bound.e
.GTSVX Solve AX= B, ATX = B, A" X= B witb condition eatimaee
.cnliF PA=LU
.arns =
Sol"" AX= B, ATX B, AHX"" B via PA W =
LAPACK: Full Symmetric Positive Deftuite
.POSY Sol\oe AX- B
• POCOI =
Caodition fll&ima.t.e via P A LU
- PORFS =
Imps-ove AX B 10luUona witb error bouDda
_POSVJ Solve AX = B with condition an:imate
_POTRF A = GeT
- POTRS Solve AX = B via. A = Gd"
.POTRI A- 1
_POEQU Equilibration
4.1. THE LDMT AND LDLT FACTOR.IZATTONS 135

LAPACK: Banded Symmetric Poaitiw Dednite


.PBSV Solve AX:: B
.PBCOI Condition aJt.imate via A = GGT
_PBRFS lmpi'CJYe AX = B ~e~lutiona with enw bouDda
_PBSVI Solw AX "" B with conditioa ~
_PBtliF A :Gcfl'
_PIITliS Solve AX = B via A = Gd"

LAPACK: Tridiagonal Symmetric POBitive Deftnite


.PTSV SolVI!JlX - B
Condition mtimate via A = LDLT
.PTC!Jlr
.PTllFS lmp~"CJYe AX = B ~e~lutiona with mTDr bounds
_PTSVI Solve AX =B with condition Mtimate
_PTTRF A=LDLT
_PTTIIS Solve AX =B via A = LDLT
LAPACK: Full Symmetric Indefinite
.SYSV Solve AX= B
_SYCOJ Condition eatimat.e via P APT LDLT =
_Sll\FS lm~ AX = B aolutions with tnOT bounda
.SYSVI SoiWI AX = B with condition estimate
.SYTRF PAPT"" LDLT
.SYTRS SoiWI AX= B via PAPT = LDLT
.SY'TU A-1

LAPACK: 'I.Hangular Banded Matrice~~


- TBCOll Condition lllltimate
• TBRFS Improw AX = B, AT X = B eolutioDa with error bounds
_TBTRS SolVI! AX= 8, ATX = 8

4.1 The LDM'f and LDLT Factorizations


We want to develop a structure-exploitiDg method for solving symmetric
Ax = b problems. To do this we establish a variant of the LU factorization
in which A is factored into a three-matrix product. LDMT where D is
diagonal and L and M are unit lower triangular. Once this factorization is
obtained, the solution to Ax= b may be found in O(n2 ) Bops by solving
Ly = b (forward elimination), Dz = y, and MT z = z (back substitution).
The reason fur developing the LDM'f factorization is to set the stage for
the symmetric case for if A = AT then L = M and the work associated
with the factorization is half of that required by Gaussian elimination. The
issue of pivoting is taken up in subsequent sections.

4.1.1 The LDM'f Factorization


Our first result connects the LD~ factorization with the LU factorization.
136 CHAPTER 4. SPECIAL LINEAR SYSTEMS

Theorem 4.1.1 If all the leading principal 6UbmGtricu of A E R"x" are


nonsingular, then there e:ri&t unique unit lower triangular matrice6 L and M
and a unique diagonal matrU D = diag(dt, .•. ,dn) 6UCh that A= LDMI'.
Proof. By Theorem 3.2.1 we know that A has an LU factorization A= LU.
Set D = diag( d 11 ••• , dn) with ~ = Uoi for i = 1:n. Notice that D is non-
singular and that MI' = n- u
1 ia unit upper triangular. Thus, A= LU:;;;
LD(D- U) = LDMI'. Uniqueness follows from the uniqueness of the LU
1

factorization as described in Theorem 3.2.1. 0

The proof shows that the LDMT factorization can be found by using Gaus-
sian elimination to compute A = LU and then determining D and M from
the equation U = DMI'. However, an interesting alternative algorithm can
be derived by computing L, D, and M directly.
Assume that we know the first j - 1 columns of L, diagonal entries
d 11 ... , di-l of D, and the first j -1 rows of M for some j with 1 '5: j '5: n.
To develop recipes fur L(j + l:n,j), M(j, l:j - 1), and di we equate jth
columns in the equation A = LDA(I'. In particular,
A(l:n,j) = Lv (4.1.1)

where v = DMTei. The "top" half of (4.1.1) defines v(l:j) as the solution
of a known lower triangular system:

L(l:j, l:j)v(1:j) = A(l:j,j).


Once we know v then we compute

d(j) = v(j)
M(j,i) = v(i)/d(i) i = 1:j -1.
The "bottom" half of (4.1.1) sayB L(j + 1:n, l:j)v(l:j) = A(j + 1:n,j) which
can be rearranged to obtain a recipe for the jth column of L:
L(j + l:n,j)v(j) = A(j + l:n,j) - L(j + 1:n,l:j -l)v(1:j -1).
Thus, L(j + 1:n, j) is a scaled gaxpy operation and overall we obtain

for j = l:n
Solve L(l:j, 1:j)v(l:j)= A(l:j,j) for v(l:j).
fori= l:j -1
M(j, i) = v(i)/d(i) (4.1.2)
end
d(j) = v(j)
L(j + l:n,j) =
(A(j + 1:n,j)- L(j + l:n, l:j- 1)v(l:j- l)) fv(j)
end
4.1. THE LDMT AND LDLT FACTORIZATIONS 137

A3 with the LU factorization, it is poaaible to overwrite A with the L, D,


and M factors. H the column version of forward elimination is used to solve
for v(1:j) then we obtain the following procedure:

Algorithm 4.1.1 (LDMI') H A E R'xn has an LU factorization then


this algorithm computes unit lower triangular matrices L and M and a
diagonal matrix D = diag(d11 ••• , dn) such that A = LDMT. The entry
a;; is overwritten with '-1; if i > j , with c4 if i = i, and with m;• if i < j.

for j = l:n
{ Solve L(l:j, l:j)v(l:j) = A(1:j, j). }
u(l:j) =A(l:j,j)
fork= l:j -1
v(k + l:j) = v(k + l:j) - u(k)A(k + l:j, k)
end
{ Compute M(j, l:j- 1) and store in A(l:j- l,j). }
fori= 1:j -1
A(i,j) = v(i)/A(i,i)
end
{ Store d(j) in A(j, j). }
A(i, j) = u(j)
{Compute L(j + l:n,j) and store in A(j + l:n,j) }
fork= l:j -1
A(i + l:n) = A(i + l:n,j) - v(k)A(j + l:n, k)
end
A(j + l:n,j) = A(j + 1:n,j)Jv(j)
end

This algorithm involves the same amount of work as the LU factorization,


about 2n3 /3 flops. ·
The computed solution i to A.:r = b obtained via Algorithm 4.1.1 and
the usual triangular system solvers of §3.1 can be showu to satisfy a per-
turbed system (A + E)i = b, where

and L, b, and Mare the computed versions of L, D, and M, respectively.


A3 in the case of the LU factorization considered in the previous chapter,
the upper bound in (4.1.3) is without limit unless some form of pivoting is
done. Hence, for Algorithm 4.1.1 to be a practical procedure, it must be
modified so as to compute a factorization of the fonn P A = LD M'1', where
P is a permutation matrix chosen so that the entries in L satisfy l'-1; I $ 1.
The details of this are not pursued here since they are straightforward and
since our main object for introducing the LDM'f factorization iB to motivate
138 CHAPTER 4. SPECIAL LINEAR SYSTEMS

special methods for symmetric systems.

Example 4.1.1

A """ [ ~ ~
30 00
:
61
l = [ 3~ ~ ~
( 1
l[ ~ ~ gl[~ ! ~ l
1

0 0 1 0 0 1
aDd upon completion, Algorithm 4.1.1 ovenmt-. Au folio1w3:

A = [ lg3 4~ ~1 ]·
4.1.2 Symmetry and the LDLT Factorization
There is redundancy in the LDM'f factorization if A is symmetric.
Theorem 4.1.2 If A = LDM'T i.! the LDM'T factorization of a nonsin-
gtdar symmetric matriz A, then L M. =
Proof. The matrix = M- 1LD is both symmetric and lower
M- 1AM-T
triangular and therefore diagonal. Since D is nonsingular, this implies
that M- 1 L is also diagonal. But M- 1 L is unit lower triangular and so
M- 1L = J.o

In view of this result, it is possible to halve the work in Algorithm 4.1.1


when it is applied to a symmetric matrix. In the jth step we already know
M(j, 1:j - 1} since M = L and we presume knowledge of L's first j - 1

l
columns. Recall that in the jth step of (4.1.2) the vector v(1:j) is defined
by the first j components of DMT e;. Since M = L, this says that

d(l)L(j, l)
1
v( :j) = [
d(j- l)~(j,j - 1) .
d(j)

Hence, the vector v(1:j -1) can be obtained by a simple scaling of L's jth
row. The formula v(j) = A(j,j)- L(j,l:j- l)v(1:j -1) can be derived
from the jth equation in L(1:j, 1:j)v = A(I:j, j) rendering
for j = 1:n
fori= 1:j -1
v(i) = L(j, i)d(i)
end
v(j) = A(j,j)- L(j, 1:j- 1)v(1:j- 1)
d(j) = v(j)
L(j + l:n, j) =
(A(j + 1:n,j)- L(j + l:n, l:j- l)v(l:j- l))fv(j)
end
4.1. THE LDMT AND LDLT FACTORIZATIONS 139

With overwriting this becomes

Algorithm 4.1.2 (LDLT) U A e ll"xn is symmetric and bas an LU


factorization then this algorithm computes a unit lower ~ matrix
L and a diagonal matrix D = diag(dt, •.. , dn) so A= LDLT. The entry
Gi; is overwritten with 4; if i > j and with d; if i = j.
for j = l:n
{ Compute v(l:j). }
fori= 1:j- 1
v(i) = A(j, i)A(i, i)
end
v(j) = A(j,j)- A(j,1:j- 1)v(1:j- 1)
{ Store d(j) and compute L(j + 1:n,j). }
A(j,j) = v(j)
A(i + l:n,j) =
(A(j + 1:n,j)- A(j + 1:n, 1:j- 1)v(1:j -1))/v(j)
end
This algorithm requires n 3 /3 flops, about half the number of fiops involved
in Gaussian elimination.
In the next section, we show that if A is both symmetric and positive
definite, then Algorithm 4.1.2 not only runs to completion, but is extremely
stable. If A is symmetric but not positive definite, then pivoting may be
necessary and the methods of §4.4 are relevant.

Example 4..1.:1
1

~ ~~ ~ ! ~ ][ ~ ~ ~ I !]
10
A= [
~ ] = [ ] [ g
and eo if Algorithm 4.1.:1' i8 applied, A is overwriUen by

A=
[
10
2
3
20
5
4
30
80
1
l .

P:roblem.

P4.1.1 Show that. the LDMT &.ctoria&ion of a non.inguJM A i8 unique if it aiata.


P4.1.2 Modify Algorithm 4.1.1 ao thai. it computes a Cactoriuiiou of the Corm PA =
LDMT, where L and M are both wm bRir triiUigll1a.r, D i111 diagonal., aDd P is a
permu\aiioll U1at ill c:boaeD so It•; I ~ 1.
P4..1.3 Snppme the n-by.n symmetric mauix A = (CI4j) is llklnld in a vector eM
followr. e = (au,IJ.21 1 ···•a..loll22o• .. ,a..2, ... ,a.,,.). Rewrite Algorithm 4.1.2 with A
stored in tbi.l fuhion. Gee u much iDckring outside the inner I~ u ~ble.
140 CHAPTER 4. SPECIAL LINEAR SYSTEMS

P4.1.4 Rewrite Algorithm 4.1.2 for A lliOred by diagonal. See §1.2.8.

Nota and Refwencee for Sec. 4.1

Algorithm 4.l.lla related CO the met;hods of Crout and Doolittle in thac ou~ product
updaiell are avoided. See Chapter- 4 of Fax (HI64} or Stewart (l!n3,131-149). An Algol
procllldlm!l may be found in

H.J. Bowdler, R..S. Martin, G. Petera, and J.H. Wi.lldnaou (1966), "Solution of Real and
Complex Systems of Lineat Equations," Numer. MGtA. 8, 217-234.
See alao

G.E. Forsythe (1960). "Crout with Pivoting,~ Camm. ACM 3, 507-u8.


W.M. McKeeman (1962). ~erout with Equilibration and Iteration," Cormn. ACM 5,
553--55.
JUII& aa algorithms can be tailored to exploit structure, ao can error a.nalysis and pertiU'-
ba&.ioo th~:

M. Arioli, J. Demmel, and I. Duff (1989). "Solvi111 Sparse Linear Syatems with SpBI'IIe
Backwa:d Error," SIAM J. Matri.% Anal. AppL. 10, 165-190.
J.R. Bunch, J.W. Demmel, and C.F. Va.n Loan (1989). "The Strong Stability of Algo-
rith.ms for Solving Symmetric Linear Syatell'lll," SIAM J. MGtri:l AML Appl. 10,
494-499.
A. Ba.rrlund (1991). "Perturbation Bounds for the LDLT a.nd LU DecomJ)08itio~."
BIT 31, 358-363.
D.J. Higham and N.J. Higham (1992). "Backward Error aod Condition of Structured
Lineal' Systems,~ SIAM J. MC!trU AML Appl. 13, 162-mi.

4.2 Positive Definite Systems


A matrix A e R'xn iB politive definite if XT Ax> 0 for all nonzero X e IR.n.
Positive definite systems constitute one of the most important cl888e6 of
special Ax = b problems. COnsider the 2-by-2 symmetric case. If

A = [ au a12 ]
t121 a22

is positive definite then

X = (1, o)T => :xTAx = a 11 > 0


X = (0, l)T => xTA.x = a22 >0
X = (1, l)T => xTA.x = au + 2a12 + an > 0
X = (1, -l)T => xTA.x = au - 2a 12 + an > 0 .

The last two equations imply la12l ~ (au + a22)/2. From these results we
see that the largest entry in A is on the diagonal and that it is positive. This
turns out to be true in general. A symmetric positive definite matrix has
a "weighty" diagonal. The mass on the diagonal is not blatantly obvious
4.2. POSITIVE DEFINITE SYSTEMS 141

as in the C88e of diagonal dominance but it bas the same effect in that jt
precludes the need for pivoting. See §3.4.10.
We begin with a few comments about the property of positive definite..
ness and what. it implies in the unsymmetric case with respect to pivoting.
We then focus oo the efficient organization of the Cholesky procedure which
can be used to safely factor a symmetric positive definite A. Gaxpy, outer
product, and block versioDB are developed. The section concludes with a
few comments about the semidefinite case.

4.2.1 Positive Definiteness


Suppose A E Rnxn is positive definite. It is obvious that a positive definite
matrix is n0I18ingular for otherwise we could find a nonzero x so xT Ax = 0.
However, much more is implied by the positivity of the quadrutic form
xT Ax as the following results show.

Theorem 4.2.1 If A E m_nxn is poMtive definite and X E E:'xll: has rnnk


k, then B = xr AXe Rb.t is also positive definite.
Proof. If z E R."' satisfies 0;::: zTBz = (Xz)T A(Xz) then Xz == 0. But
since X has full column rank, this implies that z == 0. 0
Corollary 4.2.2 If A is positive definite then all its principal submatrices
are positive definite. In particular, all the diagonal entrie" are positive.
Proof. If v E n.• is an integer vector with 1 ~ Vt < · · - < Vfc ~ n, then
X= 10 (:, v) is a rank k matrix made up columns 111, ••• , V.t of the identity.
It follows from Theorem 4.2.1 that A(v, v) = XT AX is positive definite. 0
Corollary 4.2.3 If A is po.fttive definite tkn the factorization A = LDMT
ai&ts and D = diag(d 11 ••• , d,..) hM positive diagoooi entries.
Proof. From Corollary 4.2.2, it follows that the submatrices A(1:k, 1:k)
are nonsingu.lar for k = l:n and so from Theorem 4.1.1 the factorization
A= LDMT exists. U we apply Theorem 4.2.1 with X= L-T then B =
DMT£-T = L- 1 AL-T is positive definite. Since AfTL-T is unit upper
triangular, B and D have the same diagonal and it must be posi.ti"'!. 0
There are several typical situations that give rise to positive definite m.a.
trices in practice:
• The quadratic form i8 an energy function whose positivity ia guaran-
teed from physical principles.
• The matrix A equals a emu-product XT X where X has full column
rank. (Positive definiteness follows by setting A = In in Theorem
4.2.1.)
• Both A and AT are diagonally dominant and each an ia positive.
142 CHAPTER 4. SPECIAL LINEAR SYSTEMS

4.2.2 Unsymmetric Positive Definite Systems


The mere existence of an LD~ factorization does not mean that its com-
putation is advisable because the resulting factors may have unacceptably
large elements. For example, if e > 0 then the matrix

A _ [ e m ] _ [ 1 0 ] [ f 0 ] [ 1 m/t. ]
- -m t. - -m/E 1 0 e + m 2fe 0 1

> 1, then pivoting is recommended.


is positive definite. But if m/e
The following result suggests when to expect element growth in the
LDMT factorization of a positive definite matrix.

Theorem 4.2.4 Let A € ~xn be positiue definite and set T = (A+AT)/2


and S = (A - AT)/2. If A = LDMI', then

(4.2.1}

Proof. See Golub and Van Loan (1979). []

The theorem suggests when it is safe not to pivot. Assume that the com-
puted factors t, iJ, and M satisfy:

(4.2.2)

where c is a constant of modest size. It follows from (4. 2.1) and the analysis
in §3.3 that if these factors are used to compute a solution to Ax = b, then
the computed solution i: satisfies (A+ E).i: b with =
II E IIF :5 u {3nll A IIF + 5cn2(II T ll:~ +II ST- 1s !12)) + O(u2). (4.2.3)

It is easy to show that II T !12 :S II A !12. and so it follows that if


II ST- 1s lb
0
= IIAII:~ (4.2.4)

is not too large then it is safe not to pivot. In other words, the norm of the
skew part S has to be modest relative to the condition of the symmetric
part T. Sometimes it is possible to estimate 0 in an application. This is
trivially the case when A is symmetric for then 0 = o.

4.2.3 Symmetric Positive Definite Systems


When we apply the above results to a symmetric positive definite system
we know that the factorization A= LDLT exists and moreover is stable to
compute. However, in this situation another factorization is available.
4.2. POSITIVE DEFINITE SYSTEMS 143

Theorem 4.2.5 (Cholesky Factorization) lf A E Rnxn ia "!ffllnu!tric


po.rititle definite, then there e:z:Uts a unique lower triangular G E R' x" with
positive diagonal entrie.11 mel& that A = GG T.

Proof. From Theorem 4.1.2, there exists a unit lower triangular Land a
diagonal D = diag(d 1 , ••• , d.,..) such that A= LDLr. Since the dt are~
itive, the matrix G = L diag( .;d;, ... , -./d:)
is real lower triangular with
positive diagonal entries. It also satisfies A = GGr. Uniqueness follows
from the uniqueness of the LDLT factorization. []

The factorization A = GG T is known as the Chole~ky factorization and G


. is referred to as the Choluky triangle. Note that if we compute the Cholesky
=
factorization and solve the triangular systems Gy b and cT x = y, then
b = Gy = G(GT x) = (GGT)x =Ax.
Our proof of the Cholesky factorization in Theorem 4.2.5 is constructive.
However, more effective methods for computing the Cholesky triangle can
be derived by manipulating the equation A = GGT. This can be done in
several ways as we show in the next few subsections.

Example 4.2.1 The matrix

is positive definite.

4.2.4 Gaxpy Cholesky


We first derive an implementation of Cholesky that is rich in the gaxpy
operation. If we compare jth columns in the equation A = GGT then we
obtain
;
A(:, j) = L G(j, k)G(:, k) .
A:•l

This says that


j-1

G(j,j)G(:,j) = A(:,j)- LG(j,k)G(:,k) = v. (4.2.5)


k-1

If we know the first j - 1 columns of G, then v is computable. It follows


by equating components in (4.2.5) that

G(j:n,j) = v(j:n)j..fo(j).

This is a scaled gaxpy operation and so we obtain the foUowing gaxpy-based


method for computing the Cbolesky factorization:
144 CHAPTER 4. SPECIAL LINEAR SYSTEMS

for j = 1:n
v(j:n) = A(j:n, j)
fork= 1:j -1
v(j:n) = v(j:n) - G(j, k)G(j:n, k)
end
G(j:n,j) = v(j:n)/ y'vU)
end
It is possible to arrange the computations so that G overwrites the lower
triangle of A.

Algorithm 4.2.1 (Cholesky: Gaxpy Version) Given a symmetric


positive definite A E R"xn, the following algorithm computes a lower tri·
angular G E m_nxn such that A = GGT. For all i ~ j, G(i,j) overwrites
A(i,j).
for j = 1:n
ifj>1
A(j:n,j) = A(j:n,j) - A(j:n,l:j- l)A(j, l:j- l)T
end
A(j:n,j) = A(j:n,j)j..jA(j,j)
end
This algorithm requires n 3 /3 flops.

4.2.5 Outer Product Cholesky


An alternative Cholesky procedure based on outer product (rank- I) updates
can be derived from the partitioning

A _ [ Q VT ] _ [ 0 ] [ 1
(3 0 ] [ (3 vT j{J ]
- v B - v/P
ln-1 0 B- wT /a 0 ln-1 ·
(4.2.6)
Here, (3 = ..fii and we know that a > 0 because A is positive definite. Note
that B-ooT fa is positive definite because it is a principal submatrix of
J(1' AX where
X = [ 1 -vT fa ] .
0 [,._1

If we have the Cholesky factorization G 1Gf = B -wT fa, then from (4.2.6)
it follows that A = GGC" with

Thus, the Cho~esky factorization can be obtained through the repeated


application of (4.2.6), much in the the style of kji Gaussian el.i.mination.
4.2. POSITIVE DEFINITE SYSTEMS 145

Algorithm 4.2.2 (Cholesky: Outer product Version) Given a sym-


metric positive definite A E R"x", the following algorithm computes a lower
triangular G E :R'xn such that A= GcP'. For all i ~ j, G(i,j) overwrites
A(i,j).

fork= l:n
A(k,k) = y'A(k,k)
A(k + l:n,k) = A(k + l:n,k)/A(k,k)
for j = k + l:n
A(j:n,j) = A(j:n,j) -A(j:n,k)A(j,k)
end
end

This algorithm involves n 3 /3 flops. Note that the j-Joop computes the lower
triangular part of the outer product update

A(k + l:n, k + l:n) = A(k + l:n, k + l:n) - A(k + l:n, k)A(k + l:n, k)T.

Recalling our discussion in §1.4.8 about gaxpy versus outer product up-
dates, it is easy to show that Algorithm 4.2.1 involves fewer vector touches
than Algorithm 4.2.2 by a factor of two.

4.2.6 Block Dot Product Cholesky


Suppose A e R"x" is symmetric positive definite. Regard A= (~;)and its
Cholesky factor G = (G,;) as N-by-N block matrices with square diagonal
blocks. By equating (i,j) blocks in the equation A =GaT with i ~ j it
follows that
j

A;; = L G~df;c.
t-1

Defining
j-1
s= ~; - '2:: oi,.aJ,.
bool

we see that Gnd£ = S if i = j and that G,;Gfj = S if i > j. Properly


sequenced, these equatioDS can be arranged to compute all the G,;:

Algoritlun 4.2.3 (Choleaky: Block Dot Product Version) Given a


symmetric positive definite A E R'x", the following algorithm oomputes a
lower triangulal- G E rxn such that A = GG T. The lower triaugular pact
of A is overwritten by the lower triangular part of G. A is regarded aa an
N-by-N block matrix with square diagonal blocks.
146 CHAPTER 4. SPEClAL LINEAR SYSTEMS

for j = l:N
fori =j:N
j-1

S = A;; - .E
Jr,.l
GUrGJ~c
ifi=j
Compute CboJesky factorizationS= Giid£·
else
Solve G 1 ;d£= S for G1;
end
Overwrite AiJ with G,;.
end
end
The overall process involves n 3 /3 Bops like the other Cholesky procedures
that we have developed. The procedure is rich in matrix multiplication
assuming a suitable blocking of the matrix A. For ex&mple, if n = r N and
each Aii is r-by-r, t.hen the level-3 fraction is approximately 1- (l/N 2 ).
Algorithm 4.2.3 is incomplete in the sense that we have not specified how
the products GUrG;~c are formed or how the r-by-r Cholesky factorizations
S = Giid£ are computed. These important details would have to be
worked out carefully in order to extract high performance.
Another block procedure can be derived from the gaxpy Cholesky algo-
rithm. After r steps of Algorithm 4.2.1 we know the matrices G₁₁ ∈ R^{r×r}
and G₂₁ ∈ R^{(n-r)×r} in

    [ A₁₁  A₂₁^T ; A₂₁  A₂₂ ]
      = [ G₁₁  0 ; G₂₁  I_{n-r} ] [ I_r  0 ; 0  Ã ] [ G₁₁^T  G₂₁^T ; 0  I_{n-r} ].

We then perform r more steps of gaxpy Cholesky not on A but on the
reduced matrix Ã = A₂₂ - G₂₁G₂₁^T which we explicitly form exploiting
symmetry. Continuing in this way we obtain a block Cholesky algorithm
whose kth step involves r gaxpy Cholesky steps on a matrix of order n -
(k-1)r followed by a level-3 computation having order n - kr. The level-3
fraction is approximately equal to 1 - 3/(2N) if n ≈ rN.

4.2.7 Stability of the Cholesky Process


In exact arithmetic, we know that a symmetric positive definite matrix
has a Cholesky factorization. Conversely, if the Cholesky process runs to
completion with strictly positive square roots, then A is positive definite.
Thus, to find out if a matrix A is positive definite, we merely try to compute
its Cholesky factorization using any of the methods given above.
The situation in the context of roundoff error is more interesting. The
numerical stability of the Cholesky algorithm roughly follows from the in-
equality

    g_{ij}² ≤ Σ_{k=1}^{i} g_{ik}² = a_{ii}.

This shows that the entries in the Cholesky triangle are nicely bounded.
The same conclusion can be reached from the equation ‖G‖₂² = ‖A‖₂.
The roundoff errors associated with the Cholesky factorization have
been extensively studied in a classical paper by Wilkinson (1968). Using
the results in this paper, it can be shown that if x̂ is the computed solution
to Ax = b, obtained via any of our Cholesky procedures, then x̂ solves
the perturbed system (A + E)x̂ = b where ‖E‖₂ ≤ c_n u‖A‖₂ and c_n
is a small constant depending upon n. Moreover, Wilkinson shows that if
q_n uκ₂(A) ≤ 1 where q_n is another small constant, then the Cholesky process
runs to completion, i.e., no square roots of negative numbers arise.

Example 4.2.2 If Algorithm 4.2.2 is applied to the positive definite matrix

    A = [ 100  15   .01 ;
           15  2.3  .01 ;
          .01  .01  1.00 ]

and β = 10, t = 2, rounded arithmetic is used, then ĝ₁₁ = 10, ĝ₂₁ = 1.5, ĝ₃₁ = .001 and
ĝ₂₂ = 0.00. The algorithm then breaks down trying to compute g₃₂.

4.2.8 The Semidefinite Case


A matrix A is said to be positive semidefinite if x^T Ax ≥ 0 for all vectors
x. Symmetric positive semidefinite (sps) matrices are important and we
briefly discuss some Cholesky-like manipulations that can be used to solve
various sps problems. Results about the diagonal entries in an sps matrix
are needed first.

Theorem 4.2.6 If A ∈ R^{n×n} is symmetric positive semidefinite, then

    |a_{ij}| ≤ (a_{ii} + a_{jj})/2                         (4.2.7)
    |a_{ij}| ≤ √(a_{ii}a_{jj})   (i ≠ j)                   (4.2.8)
    max_{i,j} |a_{ij}| = max_{i} a_{ii}                    (4.2.9)
    a_{ii} = 0  ⇒  A(i,:) = 0, A(:,i) = 0                  (4.2.10)

Proof. If x = e_i + e_j then 0 ≤ x^T Ax = a_{ii} + a_{jj} + 2a_{ij} while x = e_i - e_j
implies 0 ≤ x^T Ax = a_{ii} + a_{jj} - 2a_{ij}. Inequality (4.2.7) follows from these
two results. Equation (4.2.9) is an easy consequence of (4.2.7).
To prove (4.2.8) assume without loss of generality that i = 1 and j = 2
and consider the inequality

    0 ≤ [ x  1 ] [ a₁₁  a₁₂ ; a₁₂  a₂₂ ] [ x ; 1 ] = a₁₁x² + 2a₁₂x + a₂₂

which holds since A(1:2,1:2) is also semidefinite. This is a quadratic in x
and for the inequality to hold, the discriminant 4a₁₂² - 4a₁₁a₂₂ must be
nonpositive. Implication (4.2.10) follows from (4.2.8). □

Consider what happens when outer product Cholesky is applied to an sps
matrix. If a zero A(k,k) is encountered then from (4.2.10) A(k:n,k) is zero
and there is "nothing to do" and we obtain

    for k = 1:n
        if A(k,k) > 0
            A(k,k) = √A(k,k)
            A(k+1:n,k) = A(k+1:n,k)/A(k,k)
            for j = k+1:n
                A(j:n,j) = A(j:n,j) - A(j:n,k)A(j,k)            (4.2.11)
            end
        end
    end

Thus, a simple change makes Algorithm 4.2.2 applicable to the semidefinite
case. However, in practice rounding errors preclude the generation of exact
zeros and it may be preferable to incorporate pivoting.

4.2.9 Symmetric Pivoting


To preserve symmetry in a symmetric A we only consider data reorderings
of the form PAP^T where P is a permutation. Row permutations (A ← PA)
or column permutations (A ← AP) alone destroy symmetry. An update of
the form

    A ← PAP^T

is called a symmetric permutation of A. Note that such an operation does
not move off-diagonal elements to the diagonal. The diagonal of PAP^T is
a reordering of the diagonal of A.
Suppose at the beginning of the kth step in (4.2.11) we symmetrically
permute the largest diagonal entry of A(k:n,k:n) into the lead position.
If that largest diagonal entry is zero then A(k:n,k:n) = 0 by virtue of
(4.2.10). In this way we can compute the factorization PAP^T = GG^T
where G ∈ R^{n×(k-1)} is lower triangular.

Algorithm 4.2.4 Suppose A ∈ R^{n×n} is symmetric positive semidefinite
and that rank(A) = r. The following algorithm computes a permutation P,
the index r, and an n-by-r lower triangular matrix G such that PAP^T =
GG^T. The lower triangular part of A(:,1:r) is overwritten by the lower
triangular part of G. P = P_r ⋯ P₁ where P_k is the identity with rows k
and piv(k) interchanged.

r = 0
for k = 1:n
    Find q (k ≤ q ≤ n) so A(q,q) = max{A(k,k), ..., A(n,n)}
    if A(q,q) > 0
        r = r + 1
        piv(k) = q
        A(k,:) ↔ A(q,:)
        A(:,k) ↔ A(:,q)
        A(k,k) = √A(k,k)
        A(k+1:n,k) = A(k+1:n,k)/A(k,k)
        for j = k+1:n
            A(j:n,j) = A(j:n,j) - A(j:n,k)A(j,k)
        end
    end
end

In practice, a tolerance is used to detect small A(k,k). However, the sit-
uation is quite tricky and the reader should consult Higham (1989). In
addition, §5.5 has a discussion of tolerances in the rank detection problem.
Finally, we remark that a truly efficient implementation of Algorithm 4.2.4
would only access the lower triangular portion of A.
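
A Python/NumPy sketch of the pivoted outer product procedure follows. For
simplicity it updates the full trailing submatrix (so that the symmetric swaps
stay consistent) and uses a hypothetical tolerance tol to terminate; the
tolerance and all names are ours.

    import numpy as np

    def pivoted_cholesky(A, tol=1e-8):
        # Outer product Cholesky with symmetric (diagonal) pivoting; an
        # illustrative sketch of Algorithm 4.2.4.  Returns a permutation
        # vector p, the rank r, and an n-by-r G with A[p][:, p] ~ G G^T.
        A = np.array(A, dtype=float)
        n = A.shape[0]
        p = np.arange(n)
        r = 0
        for k in range(n):
            q = k + np.argmax(np.diag(A)[k:])
            if A[q, q] <= tol:                 # tol is an assumed rank tolerance
                break
            r += 1
            A[[k, q], :] = A[[q, k], :]        # symmetric interchange
            A[:, [k, q]] = A[:, [q, k]]
            p[[k, q]] = p[[q, k]]
            A[k, k] = np.sqrt(A[k, k])
            A[k+1:, k] /= A[k, k]
            A[k+1:, k+1:] -= np.outer(A[k+1:, k], A[k+1:, k])
        return p, r, np.tril(A)[:, :r]

    # quick check on a rank-3 semidefinite matrix
    B = np.random.default_rng(1).standard_normal((5, 3))
    A = B @ B.T
    p, r, G = pivoted_cholesky(A)
    assert r == 3 and np.allclose(A[np.ix_(p, p)], G @ G.T)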

4.2.10 The Polar Decomposition and Square Root


Let A = U₁Σ₁V^T be the thin SVD of A ∈ R^{m×n} where m ≥ n. Note that

    A = (U₁V^T)(VΣ₁V^T) = ZP                            (4.2.12)

where Z = U₁V^T and P = VΣ₁V^T. Z has orthonormal columns and P is
symmetric positive semidefinite because

    x^T Px = (V^T x)^T Σ₁ (V^T x) = Σ_{k=1}^{n} σ_k y_k² ≥ 0

where y = V^T x. The decomposition (4.2.12) is called the polar decom-
position because it is analogous to the complex number factorization z =
e^{i·arg(z)}|z|. See §12.4.1 for further discussion.
Another important decomposition is the matrix square root. Suppose
A ∈ R^{n×n} is symmetric positive semidefinite and that A = GG^T is its
Cholesky factorization. If G = UΣV^T is G's SVD and X = UΣU^T, then
X is symmetric positive semidefinite and

    A = GG^T = (UΣV^T)(UΣV^T)^T = UΣ²U^T = (UΣU^T)(UΣU^T) = X².

Thus, X is a square root of A. It can be shown (most easily with eigen-
value theory) that a symmetric positive semidefinite matrix has a unique
symmetric positive semidefinite square root.
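
In practice both constructions reduce to a single SVD or symmetric eigen-
decomposition. The following Python/NumPy sketch (function names ours)
computes the polar factors via the thin SVD as in (4.2.12), and a semidefinite
square root via an eigendecomposition rather than the Cholesky-SVD route
described above.

    import numpy as np

    def polar(A):
        # Polar decomposition A = Z P via the thin SVD (m >= n assumed).
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        Z = U @ Vt                    # orthonormal columns
        P = (Vt.T * s) @ Vt           # symmetric positive semidefinite, V*Sigma*V^T
        return Z, P

    def sqrtm_spsd(A):
        # Symmetric positive semidefinite square root X with X^2 = A.
        w, Q = np.linalg.eigh(A)
        return (Q * np.sqrt(np.maximum(w, 0.0))) @ Q.T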

Problems

P4.2.1 Suppose that H = A + iB is Hermitian and positive definite with A, B ∈ R^{n×n}.
This means that x^H Hx > 0 whenever x ≠ 0. (a) Show that

    C = [ A  -B ; B  A ]

is symmetric and positive definite. (b) Formulate an algorithm for solving (A + iB)(x + iy)
= (b + ic), where b, c, x, and y are in R^n. It should involve 8n³/3 flops. How much
storage is required?

P4.2.2 Suppose A ∈ R^{n×n} is symmetric and positive definite. Give an algorithm for
computing an upper triangular matrix R ∈ R^{n×n} such that A = RR^T.

P4.2.3 Let A ∈ R^{n×n} be positive definite and set T = (A + A^T)/2 and S = (A - A^T)/2.
(a) Show that ‖A^{-1}‖₂ ≤ ‖T^{-1}‖₂ and x^T A^{-1}x ≤ x^T T^{-1}x for all x ∈ R^n. (b) Show
that if A = LDM^T, then d_k ≥ 1/‖T^{-1}‖₂ for k = 1:n.

P4.2.4 Find a 2-by-2 real matrix A with the property that x^T Ax > 0 for all real nonzero
2-vectors but which is not positive definite when regarded as a member of C^{2×2}.

P4.2.5 Suppose A ∈ R^{n×n} has a positive diagonal. Show that if both A and A^T are
strictly diagonally dominant, then A is positive definite.

P4.2.6 Show that the function f(x) = (x^T Ax)^{1/2} is a vector norm on R^n if and only if
A is positive definite.

P4.2.7 Modify Algorithm 4.2.1 so that if the square root of a negative number is
encountered, then the algorithm finds a unit vector x so x^T Ax < 0 and terminates.

P4.2.8 The numerical range W(A) of a complex matrix A is defined to be the set
W(A) = {x^H Ax : x^H x = 1}. Show that if 0 ∉ W(A), then A has an LU factorization.

P4.2.9 Formulate an m < n version of the polar decomposition for A ∈ R^{m×n}.

P4.2.10 Suppose A = I + uu^T where A ∈ R^{n×n} and ‖u‖₂ = 1. Give explicit formulae
for the diagonal and subdiagonal of A's Cholesky factor.

P4.2.11 Suppose A ∈ R^{n×n} is symmetric positive definite and that its Cholesky factor
is available. Let e_k = I_n(:,k). For 1 ≤ i < j ≤ n, let α_{ij} be the smallest real that makes
A + α_{ij}(e_i e_j^T + e_j e_i^T) singular. Likewise, let α_{ii} be the smallest real that makes A + α_{ii}e_i e_i^T
singular. Show how to compute these quantities using the Sherman-Morrison-Woodbury
formula. How many flops are required to find all the α_{ij}?

Notes and References for Sec. 4.2

The definiteness of the quadratic form x^T Ax can frequently be established by considering
the mathematics of the underlying problem. For example, the discretization of certain
partial differential operators gives rise to provably positive definite matrices. Aspects of
the unsymmetric positive definite problem are discussed in

A. Buckley (1974). "A Note on Matrices A = 1 + H, H Skew-Symmetric," Z. Angew.
  Math. Mech. 54, 125-26.
A. Buckley (1977). "On the Solution of Certain Skew-Symmetric Linear Systems," SIAM
  J. Num. Anal. 14, 566-70.
G.H. Golub and C. Van Loan (1979). "Unsymmetric Positive Definite Linear Systems,"
  Lin. Alg. and Its Applic. 28, 85-98.
R. Mathias (1992). "Matrices with Positive Definite Hermitian Part: Inequalities and
  Linear Systems," SIAM J. Matrix Anal. Appl. 13, 640-654.

Symmetric positive definite systems constitute the most important class of special Ax = b
problems. Algol programs for these problems are given in

R.S. Martin, G. Peters, and J.H. Wilkinson (1965). "Symmetric Decomposition of a
  Positive Definite Matrix," Numer. Math. 7, 362-83.
R.S. Martin, G. Peters, and J.H. Wilkinson (1966). "Iterative Refinement of the Solution
  of a Positive Definite System of Equations," Numer. Math. 8, 203-16.
F.L. Bauer and C. Reinsch (1971). "Inversion of Positive Definite Matrices by the Gauss-
  Jordan Method," in Handbook for Automatic Computation Vol. 2, Linear Algebra,
  J.H. Wilkinson and C. Reinsch, eds. Springer-Verlag, New York, 45-49.

The roundoff errors associated with the method are analyzed in

J.H. Wilkinson (1968). "A Priori Error Analysis of Algebraic Processes," Proc. Inter-
  national Congress Math. (Moscow: Izdat. Mir, 1968), pp. 629-39.
J. Meinguet (1983). "Refined Error Analysis of Cholesky Factorization," SIAM J. Nu-
  mer. Anal. 20, 1243-1250.
A. Kielbasinski (1987). "A Note on Rounding Error Analysis of Cholesky Factorization,"
  Lin. Alg. and Its Applic. 88/89, 487-494.
N.J. Higham (1990). "Analysis of the Cholesky Decomposition of a Semidefinite Matrix,"
  in Reliable Numerical Computation, M.G. Cox and S.J. Hammarling (eds), Oxford
  University Press, Oxford, UK, 161-185.
R. Carter (1991). "Y-MP Floating Point and Cholesky Factorization," Int'l J. High
  Speed Computing 3, 215-222.
J.-G. Sun (1992). "Rounding Error and Perturbation Bounds for the Cholesky and
  LDL^T Factorizations," Lin. Alg. and Its Applic. 173, 77-97.

The question of how the Cholesky triangle G changes when A = GG^T is perturbed is
analyzed in

G.W. Stewart (1977b). "Perturbation Bounds for the QR Factorization of a Matrix,"
  SIAM J. Num. Anal. 14, 509-18.
Z. Drmac, M. Omladic, and K. Veselic (1994). "On the Perturbation of the Cholesky
  Factorization," SIAM J. Matrix Anal. Appl. 15, 1319-1332.

Nearness/sensitivity issues associated with positive semidefiniteness and the polar de-
composition are presented in

N.J. Higham (1988). "Computing a Nearest Symmetric Positive Semidefinite Matrix,"
  Lin. Alg. and Its Applic. 103, 103-118.
R. Mathias (1993). "Perturbation Bounds for the Polar Decomposition," SIAM J. Matrix
  Anal. Appl. 14, 588-597.
R.C. Li (1995). "New Perturbation Bounds for the Unitary Polar Factor," SIAM J.
  Matrix Anal. Appl. 16, 327-332.

Computationally-oriented references for the polar decomposition and the square root are
given in §8.6 and §11.2 respectively.

4.3 Banded Systems


In many applications that involve linear systems, the matrix of coefficients
is banded. This is the case whenever the equations can be ordered so that
each unknown x_i appears in only a few equations in a "neighborhood" of
the ith equation. Formally, we say that A = (a_{ij}) has upper bandwidth q
if a_{ij} = 0 whenever j > i + q and lower bandwidth p if a_{ij} = 0 whenever
i > j + p. Substantial economies can be realized when solving banded
systems because the triangular factors in LU, GG^T, LDM^T, etc., are also
banded.
Before proceeding the reader is advised to review §1.2 where several
aspects of band matrix manipulation are discussed.

4.3.1 Band LU Factorization


Our first result shows that if A is banded and A = LU then L (U) inherits
the lower (upper) bandwidth of A.

Theorem 4.3.1 Suppose A ∈ R^{n×n} has an LU factorization A = LU. If A
has upper bandwidth q and lower bandwidth p, then U has upper bandwidth
q and L has lower bandwidth p.

Proof. The proof is by induction on n. From (3.2.6) we have the factor-
ization

    A = [ α  w^T ; v  B ]
      = [ 1  0 ; v/α  I_{n-1} ] [ 1  0 ; 0  B - vw^T/α ] [ α  w^T ; 0  I_{n-1} ].

It is clear that B - vw^T/α has upper bandwidth q and lower bandwidth p
because only the first q components of w and the first p components of v
are nonzero. Let L₁U₁ be the LU factorization of this matrix. Using the
induction hypothesis and the sparsity of w and v, it follows that

    L = [ 1  0 ; v/α  L₁ ]    and    U = [ α  w^T ; 0  U₁ ]

have the desired bandwidth properties and satisfy A = LU. □

The specialization of Gaussian elimination to banded matrices having an
LU factorization is straightforward.

Algorithm 4.3.1 (Band Gaussian Elimination: Outer Product Ver-
sion) Given A ∈ R^{n×n} with upper bandwidth q and lower bandwidth p,
the following algorithm computes the factorization A = LU, assuming it
exists. A(i,j) is overwritten by L(i,j) if i > j and by U(i,j) otherwise.

for k = 1:n-1
    for i = k+1:min(k+p,n)
        A(i,k) = A(i,k)/A(k,k)
    end
    for j = k+1:min(k+q,n)
        for i = k+1:min(k+p,n)
            A(i,j) = A(i,j) - A(i,k)A(k,j)
        end
    end
end

If n > p and n > q then this algorithm involves about 2npq flops. Band
versions of Algorithm 4.1.1 (LDM^T) and all the Cholesky procedures also
exist, but we leave their formulation to the exercises.
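
A Python/NumPy sketch of Algorithm 4.3.1 follows; for clarity it stores A as a
full n-by-n array, whereas a real band solver would use the compact data
structures of §4.3.8. The function name is ours.

    import numpy as np

    def band_lu(A, p, q):
        # Band Gaussian elimination without pivoting; a sketch of Algorithm 4.3.1.
        # A has lower bandwidth p and upper bandwidth q; L and U overwrite A.
        A = np.array(A, dtype=float)
        n = A.shape[0]
        for k in range(n - 1):
            rows = slice(k + 1, min(k + p + 1, n))
            cols = slice(k + 1, min(k + q + 1, n))
            A[rows, k] = A[rows, k] / A[k, k]
            A[rows, cols] -= np.outer(A[rows, k], A[k, cols])
        return A        # unit lower triangular L below the diagonal, U on and above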

4.3.2 Band Triangular System Solving


Analogous savings can also be made when solving banded triangular sys-
tems.

Algorithm 4.3.2 (Band Forward Substitution: Column Version)
Let L ∈ R^{n×n} be a unit lower triangular matrix having lower bandwidth
p. Given b ∈ R^n, the following algorithm overwrites b with the solution to
Lx = b.

for j = 1:n
    for i = j+1:min(j+p,n)
        b(i) = b(i) - L(i,j)b(j)
    end
end

If n > p then this algorithm requires about 2np flops.

Algorithm 4.3.3 (Band Back-Substitution: Column Version) Let
U ∈ R^{n×n} be a nonsingular upper triangular matrix having upper band-
width q. Given b ∈ R^n, the following algorithm overwrites b with the solu-
tion to Ux = b.

for j = n:-1:1
    b(j) = b(j)/U(j,j)
    for i = max(1,j-q):j-1
        b(i) = b(i) - U(i,j)b(j)
    end
end

If n > q then this algorithm requires about 2nq flops.
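
Both band substitutions are easy to express in Python/NumPy; the following
sketch mirrors Algorithms 4.3.2 and 4.3.3 with full (unpacked) triangular
matrices and function names of our choosing.

    import numpy as np

    def band_forward_sub(L, b, p):
        # Unit lower triangular L with lower bandwidth p; sketch of Algorithm 4.3.2.
        b = np.array(b, dtype=float)
        n = len(b)
        for j in range(n):
            hi = min(j + p, n - 1)
            b[j+1:hi+1] -= L[j+1:hi+1, j] * b[j]
        return b

    def band_back_sub(U, b, q):
        # Nonsingular upper triangular U with upper bandwidth q; sketch of 4.3.3.
        b = np.array(b, dtype=float)
        n = len(b)
        for j in range(n - 1, -1, -1):
            b[j] /= U[j, j]
            lo = max(0, j - q)
            b[lo:j] -= U[lo:j, j] * b[j]
        return b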

4.3.3 Band Gaussian Elimination with Pivoting


Gaussian elimination with partial pivoting can also be specialized to exploit
band structure in A. If, however, PA = LU, then the band properties of L
and U are not quite so simple. For example, if A is tridiagonal and the first
two rows are interchanged at the very first step of the algorithm, then u₁₃
is nonzero. Consequently, row interchanges expand bandwidth. Precisely
how the band enlarges is the subject of the following theorem.

Theorem 4.3.2 Suppose A ∈ R^{n×n} is nonsingular and has upper and lower
bandwidths q and p, respectively. If Gaussian elimination with partial piv-
oting is used to compute Gauss transformations

    M_j = I - α^{(j)} e_j^T        j = 1:n-1

and permutations P₁, ..., P_{n-1} such that M_{n-1}P_{n-1} ⋯ M₁P₁A = U is up-
per triangular, then U has upper bandwidth p + q and α_i^{(j)} = 0 whenever
i ≤ j or i > j + p.

Proof. Let PA = LU be the factorization computed by Gaussian elimi-
nation with partial pivoting and recall that P = P_{n-1} ⋯ P₁. Write P^T =
[e_{s₁}, ..., e_{sₙ}], where {s₁, ..., sₙ} is a permutation of {1, 2, ..., n}. If s_i > i+p
then it follows that the leading i-by-i principal submatrix of PA is singular,
since (PA)_{ij} = a_{s_i,j} for j = 1:s_i - p - 1 and s_i - p - 1 ≥ i. This implies
that U and A are singular, a contradiction. Thus, s_i ≤ i+p for i = 1:n and
therefore, PA has upper bandwidth p + q. It follows from Theorem 4.3.1
that U has upper bandwidth p + q.
The assertion about the α^{(j)} can be verified by observing that M_j need
only zero elements (j+1,j), ..., (j+p,j) of the partially reduced matrix
P_jM_{j-1}P_{j-1} ⋯ M₁P₁A. □

Thus, pivoting destroys band structure in the sense that U becomes
"wider" than A's upper triangle, while nothing at all can be said about
the bandwidth of L. However, since the jth column of L is a permutation
of the jth Gauss vector α^{(j)}, it follows that L has at most p + 1 nonzero
elements per column.

4.3.4 Hessenberg LU
As an example of an unsymmetric band matrix computation, we show how
Gaussian elimination with partial pivoting can be applied to factor an upper
Hessenberg matrix H. (Recall that if H is upper Hessenberg then h_{ij} = 0
for i > j + 1.) After k - 1 steps of Gaussian elimination with partial pivoting
we are left with an upper Hessenberg matrix of the form:

    [ x  x  x  x  x ]
    [ 0  x  x  x  x ]
    [ 0  0  x  x  x ]        k = 3, n = 5
    [ 0  0  x  x  x ]
    [ 0  0  0  x  x ]

By virtue of the special structure of this matrix, we see that the next
permutation, P₃, is either the identity or the identity with rows 3 and 4
interchanged. Moreover, the next Gauss transformation M_k has a single
nonzero multiplier in the (k+1,k) position. This illustrates the kth step
of the following algorithm.

Algorithm 4.3.4 (Hessenberg LU) Given an upper Hessenberg matrix
H ∈ R^{n×n}, the following algorithm computes the upper triangular matrix
M_{n-1}P_{n-1} ⋯ M₁P₁H = U where each P_k is a permutation and each M_k
is a Gauss transformation whose entries are bounded by unity. H(i,k) is
overwritten with U(i,k) if i ≤ k and by (M_k)_{k+1,k} if i = k+1. An integer
vector piv(1:n-1) encodes the permutations. If P_k = I, then piv(k) = 0.
If P_k interchanges rows k and k+1, then piv(k) = 1.

for k = 1:n-1
    if |H(k,k)| < |H(k+1,k)|
        piv(k) = 1; H(k,k:n) ↔ H(k+1,k:n)
    else
        piv(k) = 0
    end
    if H(k,k) ≠ 0
        t = -H(k+1,k)/H(k,k)
        for j = k+1:n
            H(k+1,j) = H(k+1,j) + tH(k,j)
        end
        H(k+1,k) = t
    end
end

This algorithm requires n² flops.
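
The following Python/NumPy sketch mirrors Algorithm 4.3.4; it returns the
overwritten array together with the pivot vector, and the function name is ours.

    import numpy as np

    def hessenberg_lu(H):
        # Gaussian elimination with partial pivoting for upper Hessenberg H;
        # a sketch of Algorithm 4.3.4.
        H = np.array(H, dtype=float)
        n = H.shape[0]
        piv = np.zeros(n - 1, dtype=int)
        for k in range(n - 1):
            if abs(H[k, k]) < abs(H[k + 1, k]):
                piv[k] = 1
                H[[k, k + 1], k:] = H[[k + 1, k], k:]
            if H[k, k] != 0:
                t = -H[k + 1, k] / H[k, k]
                H[k + 1, k + 1:] += t * H[k, k + 1:]
                H[k + 1, k] = t          # store the multiplier in the zeroed position
        return H, piv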

4.3.5 Band Cholesky


The rest of this section is devoted to banded Ax = b problems where the
matrix A is also symmetric positive definite. The fact that pivoting is
unnecessary for such matrices leads to some very compact, elegant algo-
rithms. In particular, it follows from Theorem 4.3.1 that if A = GG^T is the
Cholesky factorization of A, then G has the same lower bandwidth as A.

This leads to the following banded version of Algorithm 4.2.1, gaxpy-based
Cholesky.

Algorithm 4.3.5 (Band Cholesky: Gaxpy Version) Given a symmet-
ric positive definite A ∈ R^{n×n} with bandwidth p, the following algorithm
computes a lower triangular matrix G with lower bandwidth p such that
A = GG^T. For all i ≥ j, G(i,j) overwrites A(i,j).

for j = 1:n
    for k = max(1,j-p):j-1
        λ = min(k+p,n)
        A(j:λ,j) = A(j:λ,j) - A(j,k)A(j:λ,k)
    end
    λ = min(j+p,n)
    A(j:λ,j) = A(j:λ,j)/√A(j,j)
end

If n > p then this algorithm requires about n(p² + 3p) flops and n square
roots. Of course, in a serious implementation an appropriate data structure
for A should be used. For example, if we just store the nonzero lower
triangular part, then a (p+1)-by-n array would suffice. (See §1.2.6.)
If our band Cholesky procedure is coupled with appropriate band trian-
gular solve routines then approximately np² + 7np + 2n flops and n square
roots are required to solve Ax = b. For small p it follows that the square
roots represent a significant portion of the computation and it is prefer-
able to use the LDL^T approach. Indeed, a careful flop count of the steps
A = LDL^T, Ly = b, Dz = y, and L^T x = z reveals that np² + 8np + n flops
and no square roots are needed.
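
A Python/NumPy sketch of the gaxpy band Cholesky (Algorithm 4.3.5) is given
below; it works on a full array for readability rather than the (p+1)-by-n band
storage mentioned above, and the function name is ours.

    import numpy as np

    def band_cholesky(A, p):
        # Gaxpy band Cholesky; a sketch of Algorithm 4.3.5 for a symmetric
        # positive definite A with bandwidth p.  G overwrites the lower triangle.
        A = np.array(A, dtype=float)
        n = A.shape[0]
        for j in range(n):
            for k in range(max(0, j - p), j):
                lam = min(k + p, n - 1)
                A[j:lam+1, j] -= A[j, k] * A[j:lam+1, k]
            lam = min(j + p, n - 1)
            A[j:lam+1, j] /= np.sqrt(A[j, j])
        return np.tril(A)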

4.3.6 Tridiagonal System Solving


As a sample narrow band LDL^T solution procedure, we look at the case of
symmetric positive definite tridiagonal systems. Setting

    L = [ 1                    0 ;
          e₁   1                 ;
               ⋱    ⋱           ;
          0        e_{n-1}   1   ]

and D = diag(d₁, ..., dₙ) we deduce from the equation A = LDL^T that:

    a₁₁ = d₁
    a_{k,k-1} = e_{k-1}d_{k-1}                               k = 2:n
    a_{kk} = d_k + e_{k-1}²d_{k-1} = d_k + e_{k-1}a_{k,k-1}  k = 2:n

Thus, the d_k and e_k can be resolved as follows:

    d₁ = a₁₁
    for k = 2:n
        e_{k-1} = a_{k,k-1}/d_{k-1}; d_k = a_{kk} - e_{k-1}a_{k,k-1}
    end

To obtain the solution to Ax = b we solve Ly = b, Dz = y, and L^T x = z.
With overwriting we obtain

Algorithm 4.3.6 (Symmetric, Tridiagonal, Positive Definite Sys-
tem Solver) Given an n-by-n symmetric, tridiagonal, positive definite
matrix A and b ∈ R^n, the following algorithm overwrites b with the solu-
tion to Ax = b. It is assumed that the diagonal of A is stored in d(1:n) and
the superdiagonal in e(1:n-1).

for k = 2:n
    t = e(k-1); e(k-1) = t/d(k-1); d(k) = d(k) - te(k-1)
end
for k = 2:n
    b(k) = b(k) - e(k-1)b(k-1)
end
b(n) = b(n)/d(n)
for k = n-1:-1:1
    b(k) = b(k)/d(k) - e(k)b(k+1)
end

This algorithm requires 8n flops.
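
Algorithm 4.3.6 translates directly into Python/NumPy; in the following sketch
d, e, and b are one-dimensional arrays and the function name is ours.

    import numpy as np

    def spd_tridiag_solve(d, e, b):
        # LDL^T solve for a symmetric positive definite tridiagonal system;
        # a sketch of Algorithm 4.3.6.  d = diagonal, e = superdiagonal.
        d = np.array(d, dtype=float)
        e = np.array(e, dtype=float)
        b = np.array(b, dtype=float)
        n = len(d)
        for k in range(1, n):
            t = e[k - 1]
            e[k - 1] = t / d[k - 1]
            d[k] = d[k] - t * e[k - 1]
        for k in range(1, n):
            b[k] -= e[k - 1] * b[k - 1]
        b[n - 1] /= d[n - 1]
        for k in range(n - 2, -1, -1):
            b[k] = b[k] / d[k] - e[k] * b[k + 1]
        return b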

4.3.7 Vectorization Issues

The tridiagonal example brings up a sore point: narrow band problems and
vector/pipeline architectures do not mix well. The narrow band implies
short vectors. However, it is sometimes the case that large, independent
sets of such problems must be solved at the same time. Let us look at how
such a computation should be arranged in light of the issues raised in §1.4.
For simplicity, assume that we must solve the n-by-n unit lower bidiag-
onal systems

    A^{(k)}x^{(k)} = b^{(k)}        k = 1:m

and that m > n. Suppose we have arrays E(1:n-1,1:m) and B(1:n,1:m)
with the property that E(1:n-1,k) houses the subdiagonal of A^{(k)} and
B(1:n,k) houses the kth right hand side b^{(k)}. We can overwrite b^{(k)} with
the solution x^{(k)} as follows:

    for k = 1:m
        for i = 2:n
            B(i,k) = B(i,k) - E(i-1,k)B(i-1,k)
        end
    end

The problem with this algorithm, which sequentially solves each bidiagonal
system in turn, is that the inner loop does not vectorize. This is because
of the dependence of B(i,k) on B(i-1,k). If we interchange the k and i
loops we get

    for i = 2:n
        for k = 1:m
            B(i,k) = B(i,k) - E(i-1,k)B(i-1,k)                  (4.3.1)
        end
    end

Now the inner loop vectorizes well as it involves a vector multiply and a
vector add. Unfortunately, (4.3.1) is not a unit stride procedure. However,
this problem is easily rectified if we store the subdiagonals and right-hand-
sides by row. That is, we use the arrays E(1:m,1:n-1) and B(1:m,1:n)
and store the subdiagonal of A^{(k)} in E(k,1:n-1) and b^{(k)T} in B(k,1:n).
The computation (4.3.1) then transforms to

    for i = 2:n
        for k = 1:m
            B(k,i) = B(k,i) - E(k,i-1)B(k,i-1)
        end
    end

illustrating once again the effect of data structure on performance.
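
In a language with array operations the row-stored formulation has a one-line
inner statement; the following NumPy sketch (array and function names ours)
solves all m systems with n-1 long-vector gaxpys.

    import numpy as np

    def solve_many_bidiagonal(E, B):
        # Solve m unit lower bidiagonal systems of order n at once.
        # E is m-by-(n-1): E[k,:] holds the subdiagonal of the kth system;
        # B is m-by-n: B[k,:] holds the kth right hand side and is
        # overwritten with the kth solution.
        B = np.array(B, dtype=float)
        n = B.shape[1]
        for i in range(1, n):
            B[:, i] -= E[:, i - 1] * B[:, i - 1]   # one long vector multiply-add
        return B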

4.3.8 Band Matrix Data Structures


The above algorithms are written as if the matrix A is conventionally stored
in an n-by-n array. In practice, a band linear equation solver would be or-
ganized around a data structure that takes advantage of the many zeroes
in A. Recall from §1.2.6 that if A has lower bandwidth p and upper band-
width q it can be represented in a (p+q+1)-by-n array A.band where
band entry a_{ij} is stored in A.band(i-j+q+1,j). In this arrangement, the
nonzero portion of A's jth column is housed in the jth column of A.band.
Another possible band matrix data structure that we discussed in §1.2.8
involves storing A by diagonal in a 1-dimensional array A.diag. Regardless
of the data structure adopted, the design of a matrix computation with a
band storage arrangement requires care in order to minimize subscripting
overheads.

Problems

P4.3.1 Derive a banded LDM^T procedure similar to Algorithm 4.3.1.

P4.3.2 Show how the output of Algorithm 4.3.4 can be used to solve the upper Hes-
senberg system Hx = b.

P4.3.3 Give an algorithm for solving an unsymmetric tridiagonal system Ax = b that
uses Gaussian elimination with partial pivoting. It should require only four n-vectors of
floating point storage for the factorization.

P4.3.4 For C ∈ R^{n×n} define the profile indices m(C,i) = min{j : c_{ij} ≠ 0}, where
i = 1:n. Show that if A = GG^T is the Cholesky factorization of A, then m(A,i) =
m(G,i) for i = 1:n. (We say that G has the same profile as A.)

P4.3.5 Suppose A ∈ R^{n×n} is symmetric positive definite with profile indices m_i =
m(A,i) where i = 1:n. Assume that A is stored in a one-dimensional array v as follows:
v = (a₁₁, a_{2,m₂}, ..., a₂₂, a_{3,m₃}, ..., a₃₃, ..., a_{n,mₙ}, ..., a_{nn}). Write an algorithm that
overwrites v with the corresponding entries of the Cholesky factor G and then uses this
factorization to solve Ax = b. How many flops are required?

P4.3.6 For C ∈ R^{n×n} define p(C,i) = max{j : c_{ij} ≠ 0}. Suppose that A ∈ R^{n×n} has an
LU factorization A = LU and that

    m(A,1) ≤ m(A,2) ≤ ⋯ ≤ m(A,n)
    p(A,1) ≤ p(A,2) ≤ ⋯ ≤ p(A,n).

Show that m(A,i) = m(L,i) and p(A,i) = p(U,i) for i = 1:n. Recall the definition of
m(A,i) from P4.3.4.

P4.3.7 Develop a gaxpy version of Algorithm 4.3.1.

P4.3.8 Develop a unit stride, vectorizable algorithm for solving the symmetric positive
definite tridiagonal systems A^{(k)}x^{(k)} = b^{(k)}. Assume that the diagonals, superdiagonals,
and right hand sides are stored by row in arrays D, E, and B and that b^{(k)} is overwritten
with x^{(k)}.

P4.3.9 Develop a version of Algorithm 4.3.1 in which A is stored by diagonal.

P4.3.10 Give an example of a 3-by-3 symmetric positive definite matrix whose tridiag-
onal part is not positive definite.

P4.3.11 Consider the Ax = b problem where

    A = [  2  -1   0   0  -1 ;
          -1   2  -1   0   0 ;
           0  -1   2  -1   0 ;
           0   0  -1   2  -1 ;
          -1   0   0  -1   2 ]

This kind of matrix arises in boundary value problems with periodic boundary conditions.
(a) Show A is singular. (b) Give conditions that b must satisfy for there to exist a solution
and specify an algorithm for solving it. (c) Assume that n is even and consider the
permutation

    P = [ e₁, e₃, ..., e_{n-1}, e₂, e₄, ..., eₙ ]

where e_k is the kth column of Iₙ. Describe the permuted system P^T AP(P^T x) = P^T b
and show how to solve it. Assume that there is a solution and ignore pivoting.

Notes and References for Sec. 4.3

The literature concerned with banded systems is immense. Some representative papers
include

R.S. Martin and J.H. Wilkinson (1965). "Symmetric Decomposition of Positive Definite
  Band Matrices," Numer. Math. 7, 355-61.
R.S. Martin and J.H. Wilkinson (1967). "Solution of Symmetric and Unsymmetric Band
  Equations and the Calculation of Eigenvalues of Band Matrices," Numer. Math. 9,
  279-301.
E.L. Allgower (1973). "Exact Inverses of Certain Band Matrices," Numer. Math. 21,
  279-84.
Z. Bohte (1975). "Bounds for Rounding Errors in the Gaussian Elimination for Band
  Systems," J. Inst. Math. Applic. 16, 133-42.
I.S. Duff (1977). "A Survey of Sparse Matrix Research," Proc. IEEE 65, 500-535.
N.J. Higham (1990). "Bounding the Error in Gaussian Elimination for Tridiagonal
  Systems," SIAM J. Matrix Anal. Appl. 11, 521-530.

A topic of considerable interest in the area of banded matrices deals with methods for
reducing the width of the band. See

E. Cuthill (1972). "Several Strategies for Reducing the Bandwidth of Matrices," in
  Sparse Matrices and Their Applications, ed. D.J. Rose and R.A. Willoughby, Plenum
  Press, New York.
N.E. Gibbs, W.G. Poole, Jr., and P.K. Stockmeyer (1976). "An Algorithm for Reducing
  the Bandwidth and Profile of a Sparse Matrix," SIAM J. Num. Anal. 13, 236-50.
N.E. Gibbs, W.G. Poole, Jr., and P.K. Stockmeyer (1976). "A Comparison of Several
  Bandwidth and Profile Reduction Algorithms," ACM Trans. Math. Soft. 2, 322-30.

As we mentioned, tridiagonal systems arise with particular frequency. Thus, it is not
surprising that a great deal of attention has been focused on special methods for this
class of banded problems.

C. Fischer and R.A. Usmani (1969). "Properties of Some Tridiagonal Matrices and Their
  Application to Boundary Value Problems," SIAM J. Num. Anal. 6, 127-42.
D.J. Rose (1969). "An Algorithm for Solving a Special Class of Tridiagonal Systems of
  Linear Equations," Comm. ACM 12, 234-36.
H.S. Stone (1973). "An Efficient Parallel Algorithm for the Solution of a Tridiagonal
  Linear System of Equations," J. ACM 20, 27-38.
M.A. Malcolm and J. Palmer (1974). "A Fast Method for Solving a Class of Tridiagonal
  Systems of Linear Equations," Comm. ACM 17, 14-17.
J. Lambiotte and R.G. Voigt (1975). "The Solution of Tridiagonal Linear Systems on
  the CDC STAR-100 Computer," ACM Trans. Math. Soft. 1, 308-29.
H.S. Stone (1975). "Parallel Tridiagonal Equation Solvers," ACM Trans. Math. Soft. 1,
  289-307.
D. Kershaw (1982). "Solution of Single Tridiagonal Linear Systems and Vectorization of
  the ICCG Algorithm on the Cray-1," in G. Rodrigue (ed), Parallel Computations,
  Academic Press, NY, 1982.
N.J. Higham (1986). "Efficient Algorithms for Computing the Condition Number of a
  Tridiagonal Matrix," SIAM J. Sci. and Stat. Comp. 7, 150-165.

Chapter 4 of George and Liu (1981) contains a nice survey of band methods for positive
definite systems.

4.4 Symmetric Indefinite Systems


A symmetric matrix whose quadratic form x^T Ax takes on both positive and
negative values is called indefinite. Although an indefinite A may have an
LDL^T factorization, the entries in the factors can have arbitrary magnitude:

    [ ε  1 ; 1  0 ] = [ 1  0 ; 1/ε  1 ] [ ε  0 ; 0  -1/ε ] [ 1  1/ε ; 0  1 ].

Of course, any of the pivot strategies in §3.4 could be invoked. However,
they destroy symmetry and with it, the chance for a "Cholesky speed"
indefinite system solver. Symmetric pivoting, i.e., data reshufflings of the
form A ← PAP^T, must be used as we discussed in §4.2.9. Unfortunately,
symmetric pivoting does not always stabilize the LDL^T computation. If ε₁
and ε₂ are small then regardless of P, the matrix

    A = P [ ε₁  1 ; 1  ε₂ ] P^T

has small diagonal entries and large numbers surface in the factorization.
With symmetric pivoting, the pivots are always selected from the diagonal
and trouble results if these numbers are small relative to what must be
zeroed off the diagonal. Thus, LDL^T with symmetric pivoting cannot be
recommended as a reliable approach to symmetric indefinite system solving.
It seems that the challenge is to involve the off-diagonal entries in the
pivoting process while at the same time maintaining symmetry.
In this section we discuss two ways to do this. The first method is due
to Aasen (1971) and it computes the factorization

    PAP^T = LTL^T                                        (4.4.1)

where L = (l_{ij}) is unit lower triangular and T is tridiagonal. P is a permu-
tation chosen such that |l_{ij}| ≤ 1. In contrast, the diagonal pivoting method
due to Bunch and Parlett (1971) computes a permutation P such that

    PAP^T = LDL^T                                        (4.4.2)

where D is a direct sum of 1-by-1 and 2-by-2 pivot blocks. Again, P is
chosen so that the entries in the unit lower triangular L satisfy |l_{ij}| ≤ 1.
Both factorizations involve n³/3 flops and once computed, can be used to
solve Ax = b with O(n²) work:

    PAP^T = LTL^T, Lz = Pb, Tw = z, L^T y = w, x = P^T y  ⇒  Ax = b
    PAP^T = LDL^T, Lz = Pb, Dw = z, L^T y = w, x = P^T y  ⇒  Ax = b

The only thing "new" to discuss in these solution procedures are the Tw = z
and Dw = z systems.

In Aasen's method, the symmetric indefinite tridiagonal system Tw = z
is solved in O(n) time using band Gaussian elimination with pivoting. Note
that there is no serious price to pay for the disregard of symmetry at this
level since the overall process is O(n³).
In the diagonal pivoting approach, the Dw = z system amounts to a set
of 1-by-1 and 2-by-2 symmetric indefinite systems. The 2-by-2 problems
can be handled via Gaussian elimination with pivoting. Again, there is no
harm in disregarding symmetry during this O(n) phase of the calculation.
Thus, the central issue in this section is the efficient computation of the
factorizations (4.4.1) and (4.4.2).
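
To make the four-step solve concrete, here is a Python/SciPy sketch that
carries out Lz = Pb, Tw = z, L^T y = w, x = P^T y once the Aasen factors are in
hand. The argument names (a permutation vector p, a dense L, and the diagonal
and subdiagonal of T) are assumptions of ours; the tridiagonal solve simply
calls a general banded solver.

    import numpy as np
    from scipy.linalg import solve_triangular, solve_banded

    def solve_aasen_factored(p, L, t_diag, t_sub, b):
        # Solve A x = b given P A P^T = L T L^T; an illustrative sketch.
        # p is a permutation vector (row i of P selects entry p[i] of a vector).
        n = len(b)
        z = solve_triangular(L, b[p], lower=True, unit_diagonal=True)   # L z = P b
        ab = np.zeros((3, n))              # band storage for the tridiagonal T
        ab[0, 1:] = t_sub                  # superdiagonal (T is symmetric)
        ab[1, :] = t_diag
        ab[2, :-1] = t_sub                 # subdiagonal
        w = solve_banded((1, 1), ab, z)    # T w = z via band GE with pivoting
        y = solve_triangular(L.T, w, lower=False)                       # L^T y = w
        x = np.empty(n)
        x[p] = y                           # x = P^T y
        return x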

4.4.1 The Parlett-Reid Algorithm


Parlett and Reid (1970) show how to compute (4.4.1) using Gauss trans-
forms. Their algorithm is sufficiently illustrated by displaying the k = 2
step for the case n = 5. At the beginning of this step the matrix A has
been transformed to

    A^{(1)} = M₁P₁AP₁^T M₁^T = [ α₁  β₁  0   0   0  ;
                                 β₁  α₂  v₃  v₄  v₅ ;
                                 0   v₃  ×   ×   ×  ;
                                 0   v₄  ×   ×   ×  ;
                                 0   v₅  ×   ×   ×  ]

where P₁ is a permutation chosen so that the entries in the Gauss trans-
formation M₁ are bounded by unity in modulus. Scanning the vector
(v₃ v₄ v₅)^T for its largest entry, we now determine a 3-by-3 permutation P̃
such that

    P̃ [ v₃ ; v₄ ; v₅ ] = [ ṽ₃ ; ṽ₄ ; ṽ₅ ],    |ṽ₃| = max{|v₃|, |v₄|, |v₅|}.

If this maximal element is zero, we set M₂ = P₂ = I and proceed to the
next step. Otherwise, we set P₂ = diag(I₂, P̃) and M₂ = I - α^{(2)}e₂^T with

    α^{(2)} = ( 0  0  0  ṽ₄/ṽ₃  ṽ₅/ṽ₃ )^T

and observe that

    A^{(2)} = M₂P₂A^{(1)}P₂^T M₂^T = [ α₁  β₁  0   0   0 ;
                                       β₁  α₂  β₂  0   0 ;
                                       0   β₂  ×   ×   × ;
                                       0   0   ×   ×   × ;
                                       0   0   ×   ×   × ]

In general, the process continues for n-2 steps leaving us with a tridiagonal
matrix

    T = A^{(n-2)} = (M_{n-2}P_{n-2} ⋯ M₁P₁)A(M_{n-2}P_{n-2} ⋯ M₁P₁)^T.

It can be shown that (4.4.1) holds with P = P_{n-2} ⋯ P₁ and

    L = (M_{n-2}P_{n-2} ⋯ M₁P₁P^T)^{-1}.

Analysis of L reveals that its first column is e₁ and that its subdiagonal
entries in column k with k > 1 are "made up" of the multipliers in M_{k-1}.
The efficient implementation of the Parlett-Reid method requires care
when computing the update

    A^{(k)} = M_kP_kA^{(k-1)}P_k^T M_k^T.                      (4.4.3)

To see what is involved with a minimum of notation, suppose B = B^T has
order n - k and that we wish to form B₊ = (I - we₁^T)B(I - we₁^T)^T where
w ∈ R^{n-k} and e₁ is the first column of I_{n-k}. Such a calculation is at the
heart of (4.4.3). If we set

    u = Be₁ - (b₁₁/2)w,

then the lower half of the symmetric matrix B₊ = B - wu^T - uw^T can
be formed in 2(n-k)² flops. Summing this quantity as k ranges from 1
to n-2 indicates that the Parlett-Reid procedure requires 2n³/3 flops,
twice what we would like.

Example 4.4.1 If the Parlett-Reid algorithm is applied to

then
•=[iii!]
Pi""' [et~l!l~)
M1 !4 - {0, 0, 2/3, 1/3, )T ef
Pz =- [ et e:a 1!4 e:s ]
M:a '"' !4 - (0, 0, 0, 1/2)Tef
= LTLT, whi!R P = (e1, e:s, e.&, e:~),

*1l·
and PAPT

L-u ~~ .!. n~T=[i ·i·


4.4.2 The Method of Aasen
An n³/3 approach to computing (4.4.1) due to Aasen (1971) can be derived
by reconsidering some of the computations in the Parlett-Reid approach.
We need a notation for the tridiagonal T:

    T = [ α(1)  β(1)                       0      ;
          β(1)  α(2)  β(2)                        ;
                 ⋱     ⋱       ⋱                 ;
                     β(n-2)  α(n-1)  β(n-1)       ;
          0                  β(n-1)  α(n)         ]

For clarity, we temporarily ignore pivoting and assume that the factoriza-
tion A = LTL^T exists where L is unit lower triangular with L(:,1) = e₁.
Aasen's method is organized as follows:

    for j = 1:n
        Compute h(1:j) where h = TL^T e_j = He_j.
        Compute α(j).
        if j ≤ n-1
            Compute β(j)                                        (4.4.4)
        end
        if j ≤ n-2
            Compute L(j+2:n,j+1).
        end
    end

Thus, the mission of the jth Aasen step is to compute the jth column of
T and the (j+1)-st column of L. The algorithm exploits the fact that the
matrix H = TL^T is upper Hessenberg. As can be deduced from (4.4.4),
the computation of α(j), β(j), and L(j+2:n,j+1) hinges upon the vector
h(1:j) = H(1:j,j). Let us see why.
Consider the jth column of the equation A = LH:

    A(:,j) = L(:,1:j+1)h(1:j+1).                                (4.4.5)

This says that A(:,j) is a linear combination of the first j+1 columns of
L. In particular,

    A(j+1:n,j) = L(j+1:n,1:j)h(1:j) + L(j+1:n,j+1)h(j+1).

It follows that if we compute

    v(j+1:n) = A(j+1:n,j) - L(j+1:n,1:j)h(1:j),

then

    L(j+1:n,j+1)h(j+1) = v(j+1:n).                              (4.4.6)

Thus, L(j+2:n,j+1) is a scaling of v(j+2:n). Since L is unit lower
triangular we have from (4.4.6) that

    v(j+1) = h(j+1)

and so from that same equation we obtain the following recipe for the
(j+1)-st column of L:

    L(j+2:n,j+1) = v(j+2:n)/v(j+1).

Note that L(j+2:n,j+1) is a scaled gaxpy.
We next develop formulae for α(j) and β(j). Compare the (j,j) and
(j+1,j) entries in the equation H = TL^T. With the convention β(0) = 0
we find that h(j) = β(j-1)L(j,j-1) + α(j) and h(j+1) = v(j+1) and
so

    α(j) = h(j) - β(j-1)L(j,j-1)
    β(j) = v(j+1).

With these recipes we can completely describe the Aasen procedure:

for j = 1:n
    Compute h(1:j) where h = TL^T e_j.
    if j = 1 ∨ j = 2
        α(j) = h(j)
    else
        α(j) = h(j) - β(j-1)L(j,j-1)
    end
    if j ≤ n-1                                                  (4.4.7)
        v(j+1:n) = A(j+1:n,j) - L(j+1:n,1:j)h(1:j)
        β(j) = v(j+1)
    end
    if j ≤ n-2
        L(j+2:n,j+1) = v(j+2:n)/v(j+1)
    end
end

To complete the description we must detail the computation of h(1:j).
From (4.4.5) it follows that

    A(1:j,j) = L(1:j,1:j)h(1:j).                                (4.4.8)

This lower triangular system can be solved for h(1:j) since we know the first
j columns of L. However, a much more efficient way to compute H(1:j,j)
is obtained by exploiting the jth column of the equation H = TL^T. In
particular, with the convention that β(0)L(j,0) = 0 we have

    h(k) = β(k-1)L(j,k-1) + α(k)L(j,k) + β(k)L(j,k+1)

for k = 1:j. These are working formulae except in the case k = j because
we have not yet computed α(j) and β(j). However, once h(1:j-1) is known
we can obtain h(j) from the last row of the triangular system (4.4.8), i.e.,

    h(j) = A(j,j) - Σ_{k=1}^{j-1} L(j,k)h(k).

Collecting results and using a work array ℓ(1:n) for L(j,1:j) we see that
the computation of h(1:j) in (4.4.7) can be organized as follows:

if j = 1
    h(1) = A(1,1)
elseif j = 2
    h(1) = β(1); h(2) = A(2,2)                                  (4.4.9)
else
    ℓ(0) = 0; ℓ(1) = 0; ℓ(2:j-1) = L(j,2:j-1); ℓ(j) = 1
    h(j) = A(j,j)
    for k = 1:j-1
        h(k) = β(k-1)ℓ(k-1) + α(k)ℓ(k) + β(k)ℓ(k+1)
        h(j) = h(j) - ℓ(k)h(k)
    end
end
Note that with this O(j) method for computing h(1:j), the gaxpy calcula-
tion of v(j+1:n) is the dominant operation in (4.4.7). During the jth step
this gaxpy involves about 2j(n-j) flops. Summing this for j = 1:n shows
that Aasen's method requires n³/3 flops. Thus, the Aasen and Cholesky
algorithms entail the same amount of arithmetic.
4.4.3 Pivoting in Aasen's Method


As it now stands, the columns of L are scalings of the v-vectors in (4.4.7).
If any of these scalings are large, i.e., if any of the v(j+1)'s are small,
then we are in trouble. To circumvent this problem we need only permute
the largest component of v(j+1:n) to the top position. Of course, this
permutation must be suitably applied to the unreduced portion of A and
the previously computed portion of L.

Algorithm 4.4.1 (Aasen's Method) If A ∈ R^{n×n} is symmetric then
the following algorithm computes a permutation P, a unit lower triangular
L, and a tridiagonal T such that PAP^T = LTL^T with |L(i,j)| ≤ 1. The
permutation P is encoded in an integer vector piv. In particular, P =
P₁ ⋯ P_{n-2} where P_j is the identity with rows piv(j) and j+1 interchanged.
The diagonal and subdiagonal of T are stored in α(1:n) and β(1:n-1),
respectively. Only the subdiagonal portion of L(2:n,2:n) is computed.

for j = 1:n
    Compute h(1:j) via (4.4.9).
    if j = 1 ∨ j = 2
        α(j) = h(j)
    else
        α(j) = h(j) - β(j-1)L(j,j-1)
    end
    if j ≤ n-1
        v(j+1:n) = A(j+1:n,j) - L(j+1:n,1:j)h(1:j)
        Find q so |v(q)| = ‖v(j+1:n)‖_∞ with j+1 ≤ q ≤ n.
        piv(j) = q; v(j+1) ↔ v(q); L(j+1,2:j) ↔ L(q,2:j)
        A(j+1,j+1:n) ↔ A(q,j+1:n)
        A(j+1:n,j+1) ↔ A(j+1:n,q)
        β(j) = v(j+1)
    end
    if j ≤ n-2
        L(j+2:n,j+1) = v(j+2:n)
        if v(j+1) ≠ 0
            L(j+2:n,j+1) = L(j+2:n,j+1)/v(j+1)
        end
    end
end

Aasen's method is stable in the same sense that Gaussian elimination with
partial pivoting is stable. That is, the exact factorization of a matrix near
A is obtained provided ‖T̂‖₂/‖A‖₂ ≈ 1, where T̂ is the computed version
of the tridiagonal matrix T. In general, this is almost always the case.
In a practical implementation of the Aasen algorithm, the lower trian-
gular portion of A would be overwritten with L and T. Here is the n = 5
case:

    [ α₁                      ]
    [ β₁   α₂                 ]
    [ l₃₂  β₂   α₃            ]
    [ l₄₂  l₄₃  β₃   α₄       ]
    [ l₅₂  l₅₃  l₅₄  β₄   α₅  ]

Notice that the columns of L are shifted left in this arrangement.


4.4.4 Diagonal Pivoting Methods


We next describe the computation of the block LDL^T factorization (4.4.2).
We follow the discussion in Bunch and Parlett (1971). Suppose

    P₁AP₁^T = [ E   C^T ]  s
              [ C   B   ]  n-s
                s   n-s

where P₁ is a permutation matrix and s = 1 or 2. If A is nonzero, then it is
always possible to choose these quantities so that E is nonsingular thereby
enabling us to write

    P₁AP₁^T = [ I_s       0       ] [ E  0              ] [ I_s  E^{-1}C^T ]
              [ CE^{-1}   I_{n-s} ] [ 0  B - CE^{-1}C^T ] [ 0    I_{n-s}   ].

For the sake of stability, the s-by-s "pivot" E should be chosen so that the
entries in

    Ã = B - CE^{-1}C^T                                        (4.4.10)

are suitably bounded. To this end, let α ∈ (0,1) be given and define the
size measures

    μ₀ = max_{i,j} |a_{ij}|
    μ₁ = max_{i}   |a_{ii}|.
The Bunch-Parlett pivot strategy is as follows:

    if μ₁ ≥ αμ₀
        s = 1
        Choose P₁ so |e₁₁| = μ₁.
    else
        s = 2
        Choose P₁ so |e₂₁| = μ₀.
    end

It is easy to verify from (4.4.10) that if s = 1 then

    |ã_{ij}| ≤ (1 + α^{-1})μ₀                                  (4.4.11)

while s = 2 implies

    |ã_{ij}| ≤ ((3 - α)/(1 - α))μ₀.                            (4.4.12)

By equating (1 + α^{-1})², the growth factor associated with two s = 1 steps,
and (3 - α)/(1 - α), the corresponding s = 2 factor, Bunch and Parlett con-
clude that α = (1 + √17)/8 is optimum from the standpoint of minimizing
the bound on element growth.
The reductions outlined above are then repeated on the n - s order
symmetric matrix Ã. A simple induction argument establishes that the
factorization (4.4.2) exists and that n³/3 flops are required if the work
associated with pivot determination is ignored.

4.4.5 Stability and Efficiency


Diagonal pivoting with the above strategy is shown by Bunch (1971) to be
as stable as Gaussian elimination with complete pivoting. Unfortunately,
the overall process requires between n³/12 and n³/6 comparisons, since μ₀
involves a two-dimensional search at each stage of the reduction. The actual
number of comparisons depends on the total number of 2-by-2 pivots but
in general the Bunch-Parlett method for computing (4.4.2) is considerably
slower than the technique of Aasen. See Barwell and George (1976).
This is not the case with the diagonal pivoting method of Bunch and
Kaufman (1977). In their scheme, it is only necessary to scan two columns
at each stage of the reduction. The strategy is fully illustrated by consid-
ering the very first step in the reduction:

α = (1 + √17)/8;  λ = |a_{r1}| = max{|a₂₁|, ..., |a_{n1}|}
if λ > 0
    if |a₁₁| ≥ αλ
        s = 1; P₁ = I
    else
        σ = |a_{pr}| = max{|a_{1r}|, ..., |a_{r-1,r}|, |a_{r+1,r}|, ..., |a_{nr}|}
        if σ|a₁₁| ≥ αλ²
            s = 1; P₁ = I
        elseif |a_{rr}| ≥ ασ
            s = 1 and choose P₁ so (P₁^T AP₁)₁₁ = a_{rr}
        else
            s = 2 and choose P₁ so (P₁^T AP₁)₂₁ = a_{rp}
        end
    end
end

Overall, the Bunch-Kaufman algorithm requires n³/3 flops, O(n²) compar-
isons, and, like all the methods of this section, n²/2 storage.

Example 4.4.2 If the Bunch-Kaufman algorithm is applied to

    A = [ 1   10  20 ;
          10  1   30 ;
          20  30  1  ]

then in the first step λ = 20, r = 3, σ = 30, and p = 2. The permutation P = [e₃ e₂ e₁]
is applied giving

    PAP^T = [ 1   30  20 ;
              30  1   10 ;
              20  10  1  ].

A 2-by-2 pivot is then used to produce the reduction

    PAP^T = [ 1      0      0 ;       [ 1   30    0     ;     [ 1  0  .3115 ;
              0      1      0 ;   ×     30  1     0     ;  ×    0  1  .6563 ;
              .3115  .6563  1 ]         0   0   -11.79  ]       0  0  1     ].

4.4.6 A Note on Equilibrium Systems


A very important class of symmetric indefinite matrices have the form

    A = [ C    B ]
        [ B^T  0 ]                                             (4.4.13)

where C is symmetric positive definite and B has full column rank. These
conditions ensure that A is nonsingular.
Of course, the methods of this section apply to A. However, they do not
exploit its structure because the pivot strategies "wipe out" the zero (2,2)
block. On the other hand, here is a tempting approach that does exploit
A's block structure:

(a) Compute the Cholesky factorization of C, C = GG^T.
(b) Solve GK = B for K ∈ R^{n×p}.
(c) Compute the Cholesky factorization of K^T K = B^T C^{-1}B, HH^T = K^T K.

From this it follows that

    A = [ G    0 ] [ I_n   0    ] [ G^T   K   ]
        [ K^T  H ] [ 0    -I_p  ] [ 0     H^T ].

In principle, this triangular factorization can be used to solve the equilib-
rium system

    [ C    B ] [ x ]   [ f ]
    [ B^T  0 ] [ y ] = [ g ].                                  (4.4.14)

However, it is clear by considering steps (b) and (c) above that the accuracy
of the computed solution depends upon κ(C) and this quantity may be
much greater than κ(A). The situation has been carefully analyzed and
various structure-exploiting algorithms have been proposed. A brief review
of the literature is given at the end of the section.
But before we close it is interesting to consider a special case of (4.4.14)
that clarifies what it means for an algorithm to be stable and illustrates
how perturbation analysis can structure the search for better methods.
In several important applications, g = 0, C is diagonal, and the solution
subvector y is of primary importance. A manipulation of (4.4.14) shows
that this vector is specified by

    y = (B^T C^{-1}B)^{-1}B^T C^{-1}f.                         (4.4.15)

Looking at this we are again led to believe that κ(C) should have a bearing
on the accuracy of the computed y. However, it can be shown that

    ‖(B^T C^{-1}B)^{-1}B^T C^{-1}‖₂ ≤ ψ_B                      (4.4.16)

where the upper bound ψ_B is independent of C, a result that (correctly)
suggests that y is not sensitive to perturbations in C. A stable method for
computing this vector should respect this, meaning that the accuracy of
the computed y should be independent of C. Vavasis (1994) has developed
a method with this property. It involves the careful assembly of a matrix
V ∈ R^{n×(n-p)} whose columns are a basis for the nullspace of B^T C^{-1}. The
n-by-n linear system

    [ B  V ] [ y ; q ] = f

is then solved implying f = By + Vq. Thus, B^T C^{-1}f = B^T C^{-1}By and
(4.4.15) holds.

Problems

P4.4.1 Show that if all the 1-by-1 and 2-by-2 principal submatrices of an n-by-n
symmetric matrix A are singular, then A is zero.

P4.4.2 Show that no 2-by-2 pivots can arise in the Bunch-Kaufman algorithm if A is
positive definite.

P4.4.3 Arrange Algorithm 4.4.1 so that only the lower triangular portion of A is
referenced and so that α(j) overwrites A(j,j) for j = 1:n, β(j) overwrites A(j+1,j) for
j = 1:n-1, and L(i,j) overwrites A(i,j-1) for j = 2:n-1 and i = j+1:n.
P4.4.4 Suppose A ∈ R^{n×n} is nonsingular, symmetric, and strictly diagonally dominant.
Give an algorithm that computes the factorization

    ΠAΠ^T = [ R  0 ; S  -M ] [ R^T  S^T ; 0  M^T ]

where R ∈ R^{k×k} and M ∈ R^{(n-k)×(n-k)} are lower triangular and nonsingular and Π is
a permutation.

P4.4.5 Show that if

    A = [ A₁₁     A₁₂  ]  n
        [ A₁₂^T  -A₂₂  ]  p
           n       p

is symmetric with A₁₁ and A₂₂ positive definite, then it has an LDL^T factorization with
the property that

    D = [ D₁  0 ; 0  -D₂ ]

where D₁ ∈ R^{n×n} and D₂ ∈ R^{p×p} have positive diagonal entries.

P4.4.6 Prove (4.4.11) and (4.4.12).

P4.4.7 Show that -(B^T C^{-1}B)^{-1} is the (2,2) block of A^{-1} where A is given by (4.4.13).
P4.4.8 The point of this problem is to consider a special case of (4.4.15). Define the
matrix

    M(α) = (B^T C^{-1}B)^{-1}B^T C^{-1}

where

    C = I_n + α e_k e_k^T,    α > -1,

and e_k = I_n(:,k). (Note that C is just the identity with α added to the (k,k) entry.)
Assume that B ∈ R^{n×p} has rank p and show that

    M(α) = (B^T B)^{-1}B^T ( I_n - (α/(1 + αw^T w)) e_k w^T )

where w = (I_n - B(B^T B)^{-1}B^T)e_k. Show that if ‖w‖₂ = 0 or ‖w‖₂ = 1, then
‖M(α)‖₂ = 1/σ_min(B). Show that if 0 < ‖w‖₂ < 1, then

    ‖M(α)‖₂ ≤ max{ 1/(1 - ‖w‖₂), 1/(1 + ‖w‖₂) } / σ_min(B).

Thus, ‖M(α)‖₂ has an α-independent upper bound.

Notes and References for Sec. 4.4

The basic references for computing (4.4.1) are

J.O. Aasen (1971). "On the Reduction of a Symmetric Matrix to Tridiagonal Form,"
  BIT 11, 233-42.
B.N. Parlett and J.K. Reid (1970). "On the Solution of a System of Linear Equations
  Whose Matrix is Symmetric but not Definite," BIT 10, 386-97.

The diagonal pivoting literature includes

J.R. Bunch and B.N. Parlett (1971). "Direct Methods for Solving Symmetric Indefinite
  Systems of Linear Equations," SIAM J. Num. Anal. 8, 639-55.
J.R. Bunch (1971). "Analysis of the Diagonal Pivoting Method," SIAM J. Num. Anal.
  8, 656-680.
J.R. Bunch (1974). "Partial Pivoting Strategies for Symmetric Matrices," SIAM J. Num.
  Anal. 11, 521-528.
J.R. Bunch, L. Kaufman, and B.N. Parlett (1976). "Decomposition of a Symmetric
  Matrix," Numer. Math. 27, 95-109.
J.R. Bunch and L. Kaufman (1977). "Some Stable Methods for Calculating Inertia and
  Solving Symmetric Linear Systems," Math. Comp. 31, 163-79.
I.S. Duff, N.I.M. Gould, J.K. Reid, J.A. Scott, and K. Turner (1991). "The Factorization
  of Sparse Symmetric Indefinite Matrices," IMA J. Numer. Anal. 11, 181-204.
M.T. Jones and M.L. Patrick (1993). "Bunch-Kaufman Factorization for Real Symmetric
  Indefinite Banded Matrices," SIAM J. Matrix Anal. Appl. 14, 553-559.

Because "future" columns must be scanned in the pivoting process, it is awkward (but
possible) to obtain a gaxpy-rich diagonal pivoting algorithm. On the other hand, Aasen's
method is naturally rich in gaxpy's. Block versions of both procedures are possible. LA-
PACK uses the diagonal pivoting method. Various performance issues are discussed in

V. Barwell and J.A. George (1976). "A Comparison of Algorithms for Solving Symmetric
  Indefinite Systems of Linear Equations," ACM Trans. Math. Soft. 2, 242-51.
M.T. Jones and M.L. Patrick (1994). "Factoring Symmetric Indefinite Matrices on High-
  Performance Architectures," SIAM J. Matrix Anal. Appl. 15, 273-283.

Another idea for a cheap pivoting strategy utilizes error bounds based on more liberal
interchange criteria, an idea borrowed from some work done in the area of sparse elimi-
nation methods. See

R. Fletcher (1976). "Factorizing Symmetric Indefinite Matrices," Lin. Alg. and Its
  Applic. 14, 257-72.

Before using any symmetric Ax = b solver, it may be advisable to equilibrate A. An
O(n²) algorithm for accomplishing this task is given in

J.R. Bunch (1971). "Equilibration of Symmetric Matrices in the Max-Norm," J. ACM
  18, 566-72.

Analogues of the symmetric indefinite solvers that we have presented exist for skew-
symmetric systems. See

J.R. Bunch (1982). "A Note on the Stable Decomposition of Skew Symmetric Matrices,"
  Math. Comp. 38, 475-480.

The equilibrium system literature is scattered among the several application areas where
it has an important role to play. Nice overviews with pointers to this literature include

G. Strang (1988). "A Framework for Equilibrium Equations," SIAM Review 30, 283-291.
S.A. Vavasis (1994). "Stable Numerical Algorithms for Equilibrium Systems," SIAM J.
  Matrix Anal. Appl. 15, 1108-1131.

Other papers include

C.C. Paige (1979). "Fast Numerically Stable Computations for Generalized Linear Least
  Squares Problems," SIAM J. Num. Anal. 16, 165-71.
A. Björck and I.S. Duff (1980). "A Direct Method for the Solution of Sparse Linear
  Least Squares Problems," Lin. Alg. and Its Applic. 34, 43-67.
A. Björck (1992). "Pivoting and Stability in the Augmented System Method," Proceed-
  ings of the 14th Dundee Conference, D.F. Griffiths and G.A. Watson (eds), Longman
  Scientific and Technical, Essex, U.K.
P.D. Hough and S.A. Vavasis (1996). "Complete Orthogonal Decomposition for Weighted
  Least Squares," SIAM J. Matrix Anal. Appl., to appear.

Some of these papers make use of the QR factorization and other least squares ideas
that are discussed in the next chapter and §12.1.
Problems with structure abound in matrix computations and perturbation theory
has a key role to play in the search for stable, efficient algorithms. For equilibrium sys-
tems, there are several results like (4.4.16) that underpin the most effective algorithms.
See

A. Forsgren (1996). "On Linear Least Squares Problems with Diagonally Dominant
  Weight Matrices," Technical Report TRITA-MAT, Department of Mathematics,
  Royal Institute of Technology, S-100 44, Stockholm, Sweden.
G.W. Stewart (1989). "On Scaled Projections and Pseudoinverses," Lin. Alg. and Its
  Applic. 112, 189-193.
D.P. O'Leary (1990). "On Bounds for Scaled Projections and Pseudoinverses," Lin. Alg.
  and Its Applic. 132, 115-117.
M.J. Todd (1990). "A Dantzig-Wolfe-like Variant of Karmarkar's Interior-Point Linear
  Programming Algorithm," Operations Research 38, 1006-1018.

4.5 Block Systems


In many application areas the matrices that arise have exploitable block
structure. As a case study we have chosen to discuss block tridiagonal
systems of the form

    [ D₁   F₁                 0    ] [ x₁ ]   [ b₁ ]
    [ E₁   D₂    ⋱                 ] [ x₂ ]   [ b₂ ]
    [       ⋱     ⋱     F_{n-1}    ] [ ⋮  ] = [ ⋮  ]            (4.5.1)
    [ 0          E_{n-1}   Dₙ      ] [ xₙ ]   [ bₙ ]

Here we assume that all blocks are q-by-q and that the x_i and b_i are in
R^q. In this section we discuss both a block LU approach to this problem as
well as a divide and conquer scheme known as cyclic reduction. Kronecker
product systems are briefly mentioned.

4.5.1 Block Tridiagonal LU Factorization

We begin by considering a block LU factorization for the matrix in (4.5.1).
Define the block tridiagonal matrices A_k by

    A_k = [ D₁   F₁                 0    ]
          [ E₁   D₂    ⋱                 ]
          [       ⋱     ⋱     F_{k-1}    ]        k = 1:n.      (4.5.2)
          [ 0          E_{k-1}   D_k     ]

Comparing blocks in

    A_n = [ I                  0 ] [ U₁   F₁               0  ]
          [ L₁    I              ] [      U₂    ⋱             ]
          [       ⋱     ⋱       ] [            ⋱   F_{n-1}   ]   (4.5.3)
          [ 0     L_{n-1}     I  ] [ 0               Uₙ       ]

we formally obtain the following algorithm for the L_i and U_i:

    U₁ = D₁
    for i = 2:n
        Solve L_{i-1}U_{i-1} = E_{i-1} for L_{i-1}.               (4.5.4)
        U_i = D_i - L_{i-1}F_{i-1}
    end

The procedure is defined so long as the U_i are nonsingular. This is assured,
for example, if the matrices A₁, ..., Aₙ are nonsingular.
Having computed the factorization (4.5.3), the vector x in (4.5.1) can
be obtained via block forward and back substitution:

    y₁ = b₁
    for i = 2:n
        y_i = b_i - L_{i-1}y_{i-1}
    end                                                            (4.5.5)
    Solve Uₙxₙ = yₙ for xₙ.
    for i = n-1:-1:1
        Solve U_ix_i = y_i - F_ix_{i+1} for x_i.
    end

To carry out both (4.5.4) and (4.5.5), each U_i must be factored since linear
systems involving these submatrices are solved. This could be done using
Gaussian elimination with pivoting. However, this does not guarantee the
stability of the overall process. To see this just consider the case when the
block size q is unity.
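
The factorization (4.5.4) and the solve (4.5.5) translate almost line for line
into the following Python/NumPy sketch; the blocks are passed as lists of
q-by-q arrays and the helper name is ours. A serious implementation would
factor each U_i once rather than calling a dense solver repeatedly.

    import numpy as np

    def block_tridiag_solve(D, E, F, b):
        # Block LU solve of a block tridiagonal system; a sketch of (4.5.4)-(4.5.5).
        # D[i], E[i], F[i] are the diagonal, sub- and superdiagonal blocks and
        # b[i] are the right hand side subvectors.
        n = len(D)
        U = [None] * n
        L = [None] * n                       # L[i-1] multiplies block row i
        U[0] = np.array(D[0], dtype=float)
        for i in range(1, n):
            # Solve L[i-1] U[i-1] = E[i-1] for L[i-1]
            L[i - 1] = np.linalg.solve(U[i - 1].T, np.asarray(E[i - 1]).T).T
            U[i] = D[i] - L[i - 1] @ F[i - 1]
        y = [np.array(b[0], dtype=float)]
        for i in range(1, n):
            y.append(b[i] - L[i - 1] @ y[i - 1])
        x = [None] * n
        x[n - 1] = np.linalg.solve(U[n - 1], y[n - 1])
        for i in range(n - 2, -1, -1):
            x[i] = np.linalg.solve(U[i], y[i] - F[i] @ x[i + 1])
        return x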

4.5.2 Block Diagonal Dominance


In order to obtain satisfactory bounds on the L_i and U_i it is necessary
to make additional assumptions about the underlying block matrix. For
example, if for i = 1:n we have the block diagonal dominance relations

    ‖D_i^{-1}‖₁ (‖F_{i-1}‖₁ + ‖E_i‖₁) ≤ 1,     F₀ = Eₙ ≡ 0,       (4.5.6)

then the factorization (4.5.3) exists and it is possible to show that the L_i
and U_i satisfy the inequalities

    ‖L_i‖₁ ≤ 1                                                  (4.5.7)
    ‖U_i‖₁ ≤ ‖A‖₁.                                              (4.5.8)

4.5.3 Block Versus Band Solving


At this point it is reasonable to ask why we do not simply regard the matrix
A in (4.5.1) as a qn-by-qn matrix having scalar entries and bandwidth
2q - 1. Band Gaussian elimination as described in §4.3 could be applied.
The effectiveness of this course of action depends on such things as the
dimensions of the blocks and the sparsity patterns within each block.
To illustrate this in a very simple setting, suppose that we wish to solve

    [ D₁   F₁ ] [ x₁ ]   [ b₁ ]
    [ E₁   D₂ ] [ x₂ ] = [ b₂ ]                                 (4.5.9)

where D₁ and D₂ are diagonal and F₁ and E₁ are tridiagonal. Assume
that each of these blocks is n-by-n and that it is "safe" to solve (4.5.9) via
(4.5.3) and (4.5.5). Note that

    U₁ = D₁                     (diagonal)
    L₁ = E₁U₁^{-1}              (tridiagonal)
    U₂ = D₂ - L₁F₁              (pentadiagonal)
    y₁ = b₁
    y₂ = b₂ - E₁(D₁^{-1}y₁)
    U₂x₂ = y₂
    D₁x₁ = y₁ - F₁x₂.

Consequently, some very simple n-by-n calculations with the original banded
blocks renders the solution.
On the other hand, the naive application of band Gaussian elimination
to the system (4.5.9) would entail a great deal of unnecessary work and
storage as the system has bandwidth n + 1. However, we mention that by
permuting the rows and columns of the system via the perfect shuffle
permutation P that reorders the unknowns as

    x₁(1), x₂(1), x₁(2), x₂(2), ..., x₁(n), x₂(n)               (4.5.10)

we find (in the n = 5 case) that

X X 0 X 0 0 0 0 0 0
X X X 0 0 0 0 0 0 0
0 X X X 0 X 0 0 0 0
X 0 X X X 0 0 0 0 0
PAPT = 0 0 0 X X X 0 X 0 0
0 0 X 0 X X X 0 0 0
0 0 0 0 0 X X X 0 X
0 0 0 0 X 0 X X X 0
0 0 0 0 0 0 0 X X X
0 0 0 0 0 0 X 0 X X

This matrix has upper and lower bandwidth equal to three and so a very
reasonable solution procedure results by applying band Gaussian elimina-
tion to this permuted version of A.
The subject of bandwidth-reducing permutations is important. See
George and Liu (1981, Chapter 4). We also refer the reader to Varah
(1972) and George (1974) for further details concerning the solution of block
tridiagonal systems.

4.5.4 Block Cyclic Reduction


We next describe the method of block cyclic reduction that can be used
to solve some important special instances of the block tridiagonal system
(4.5.1). For simplicity, we assume that A has the form

    A = [ D   F            0 ]
        [ F   D   ⋱          ]
        [     ⋱   ⋱    F     ]  ∈ R^{qn×qn}                     (4.5.11)
        [ 0        F    D    ]

where F and D are q-by-q matrices that satisfy DF = FD. We also assume
that n = 2^k - 1. These conditions hold in certain important applications
such as the discretization of Poisson's equation on a rectangle. In that
situation,

    D = [  4  -1            0 ]
        [ -1   4   ⋱          ]
        [      ⋱   ⋱   -1     ]                                  (4.5.12)
        [ 0        -1    4    ]

and F = -I_q. The integer n is determined by the size of the mesh and can
often be chosen to be of the form n = 2^k - 1. (Sweet (1977) shows how to
proceed when the dimension is not of this form.)
The basic idea behind cyclic reduction is to halve the dimension of the
problem on hand repeatedly until we are left with a single q-by-q system
for the unknown subvector x_{2^{k-1}}. This system is then solved by standard
means. The previously eliminated x_i are found by a back-substitution
process.
The general procedure is adequately motivated by considering the case
n = 7:

    b₁ = Dx₁ + Fx₂
    b₂ = Fx₁ + Dx₂ + Fx₃
    b₃ = Fx₂ + Dx₃ + Fx₄
    b₄ = Fx₃ + Dx₄ + Fx₅                                        (4.5.13)
    b₅ = Fx₄ + Dx₅ + Fx₆
    b₆ = Fx₅ + Dx₆ + Fx₇
    b₇ = Fx₆ + Dx₇

For i = 2, 4, and 6 we multiply equations i-1, i, and i+1 by F, -D, and
F, respectively, and add the resulting equations to obtain

    (2F² - D²)x₂ + F²x₄            = F(b₁ + b₃) - Db₂
    F²x₂ + (2F² - D²)x₄ + F²x₆     = F(b₃ + b₅) - Db₄
    F²x₄ + (2F² - D²)x₆            = F(b₅ + b₇) - Db₆

Thus, with this tactic we have removed the odd-indexed x_i and are left
with a reduced block tridiagonal system of the form

    D^{(1)}x₂ + F^{(1)}x₄             = b₂^{(1)}
    F^{(1)}x₂ + D^{(1)}x₄ + F^{(1)}x₆ = b₄^{(1)}
    F^{(1)}x₄ + D^{(1)}x₆             = b₆^{(1)}

where D^{(1)} = 2F² - D² and F^{(1)} = F² commute. Applying the same elim-
ination strategy as above, we multiply these three equations respectively
by F^{(1)}, -D^{(1)}, and F^{(1)}. When these transformed equations are added
together, we obtain the single equation

    (2[F^{(1)}]² - [D^{(1)}]²)x₄ = F^{(1)}(b₂^{(1)} + b₆^{(1)}) - D^{(1)}b₄^{(1)}

which we write as

    D^{(2)}x₄ = b₄^{(2)}.

This completes the cyclic reduction. We now solve this (small) q-by-q sys-
tem for x₄. The vectors x₂ and x₆ are then found by solving the systems

    D^{(1)}x₂ = b₂^{(1)} - F^{(1)}x₄
    D^{(1)}x₆ = b₆^{(1)} - F^{(1)}x₄.
4.5. BLOCK SYSTEMS 179

Finally, we use the first, third, fifth, and seventh equations in (4.5.13) to
compute xa, xa, xs, and xr, respectively.
= =
Fbr general n of the form n 2~r -1 we set D(O) D, F< 0 l F, b(O) b = =
and compute:

for p = l:k -1
p<Pl = [FCP-11]2
D<P> = 2F<P> -[D<P- 1>] 2
r= 211
for i = 1:2"'-"- 1 (4.5.14)
b(pl = p(p-t) (b<P-tl + b(p- 1) ) - D(p-llb~- 1 1
jr jr-r/2 jr+r/2 Jr
end
end

The Xi are then computed as follows:


Solve D(A:-tlx 2 •-• = biJr- 1) for x 11•-1.
for p = k - 2: - 1:0
r"" 2"
fori= 1:2Jr-~l (4.5.15)
ifj=1
- b(p) - F(Plx .
C- (:lj-1)r 2Jr

elseif i = 2"-p+l
- b(p) p(p)
C- (:lj-l)r- X(2j-2)r
else
C = b~~~-l)r - p(p) {x2jr + X(2j-2)r}
end
Solve D(p)X(IIj-l)r = c for X(2j-l)r
end
end
The amount of work required to perform these recnrsions depends greatly
upon the sparsity of the D(p) and F(pl. In the worse case when these
matrices are full, tbe overall flop count baa order log(n)t/. Care must be
exercised in order to ensure stability during the reduction. For further
details, see Buneman (1969).

Example 4.5.1 Suwo- q = 1, D ""'(4), ud F =


(-1) in (4.5.14) IU1d tha& we wish lO
aolve::
4 -1 0 0 0 0 0 %1 2
-1 4 -1 0 0 0 0 %2 4
0 -1 4 -1 0 0 0 ~ 6
0 0 -1 4 -1 0 0
0 0 0 -1 4 -1 0
%;1
%1j
"" 108
0 0 0 0 -1 4 -1 %8 12
0 D 0 D 0 -1 4 ZT 22
180 CHAPTER 4. SPECIAL LINEAR SYSTEMS

By Dee~~C:illg (4.5.15) -obtain tbe redooad ll}'llteml:

[
-141
0
-141
l
01
-14
l[ l [ z,
~"
:1"8
"" -24]
-48
-80
p=1

[-194]=[~][-ns] p=2
The %i .-e theu deten:n.inad via {4.5.16):

p=2: ~ =4
Jl ""' ~~ %2 =2 %8 =6
p ... 0: %1 =1 %3 == 3

Cyclic reduction is au example of a divide &Dd conquer algorithm. Other


divide and conquer procedures are discussed in §1.3.8 and §8.6.

4.5.5 Kronecker Product Systems


If BE nmxn and C E R'xoz, then their Kronecker product is given by

Thus, A is an m-by-n block matrix whose (i,j) block is b,1C. Kronecker


products arise in conjunction with various mesh discretizatiollB and through-
out signal processing. Some of the more important properties that the
Kronecker product satisfies include

(A®B)(C®D) = '=AC®BD (4.5.16)


(A®B)T = AT®BT ( 4.5.17)
(A® B)-1 = A-1 ® s-1 (4.5.18)

where it is assumed that all the factor operatioM are defined.


Rela.ted to the Kronecker product is the "vee" operation:

vee{X) =
[
X(:, 1)
:
X(:,n)
l E Rmn.

Thus, the vee of a matrix amounts to a "stacking" of its columns. It can


be shown that

vec(Y) = (B ®C)-we( X). (4.5.19)


4.5. BLOCK SYSTEMS 181

It follows that IOlviDg a Kronecker product system,

(B®C)~ ""'d
is equi'lalent to aolving the matrix equation CXBT = D fOr X where
z = =
vee(X) alld d vee( D). Thia baa efficiency rami1icatlons. To Wusuate,
suppoee B, C E ~w. are symmetric poeitive definite. If A : B ® C is
treated as a geoera.l matrix and factored in order to solve fur ~, then 0(n•)
11 2
dops are required Binc:e B ® C e E' x.. . On tbe other h.aod, the solution
approach
1. Compute the Cholesky factorizations 8 = G(ff" and C: HJ(T.
2. Solve BZ = DT for Z using G.

J. Solve ex = zT fur x using H .

4. z = vec(X).
involves O (n 3 ) flops. Note that

B®C =GGT ®HH'I' ~ (G®H}(G®H)T

is the Cholesky factorization of B ® C because tbe Kronecker product of a


pair of lower triangular matrices is lower triangulAr. Thus, the above four·
step solution approach is a structure-exploitiog, Cbolesky method applied
to B®C.
We mention that if B Is spane, tben B®C bas the same sparsity at the
block JeRI. .For eumple, if B is tridiagonal, then B®C is block tridiagonal.

Problem.

P4.5.1 Slaow t.ha& a block diapal1y domiun& lllaU'b: is no~.


P4.5.2 Verily thai (4.5.6) impliel (4.5.1) ud (.U.8).
P4.&.3 Suppoae block eycllc reductio11 II applied with D Ciwn by (4.5.12) lllld F:: -1•.
Wh.l caD you _,. about \he bud .uw:ture of the ~ p-(P) aDd _o{P) tbat an.t
P4.5.4 Svp~ A E R',." is~ aDd that- b.w eohatioM to t.lke liiiiiK
ayll&oml AI = II ud Aw ,. 11 wbere II. 1 E R"' ue pYa. Sbow bow to IIOIIw Ule ~

iD 0(1\) &,. ...-. o./J E R lllld hER'" - gr_, eacl ~ IIIIMrix of C04Ilciell&a At- ill
nonaiJiplar. Tbe edvilability of goiBI for neb • quick eolutm il a eomplicaWd - .
dw depeada upGQ tile mDditioa aumbsa of A ud -4 lllld aU. fadon.
P4.5.5 Verily (·U.16)-(U.19).
P4.&.7 Sboao boa- to~ the SVD of B ® C from tbe SVDI of B Uld C .
P4.5.8 U A, B. Uld Cue mallic8, tbeD il cu belbowu tba& (A®B)®C • A8(B®C)
182 CHAPTER 4 . SPECIAL Lll'fEAR SYSTEMS

ud eo - ju.t write A ~ B 4P C far tbll meUtx.. Sbftr bow to .W. the u- .,._
(A e 8 e C).:= d a.umin& that A, B, ud C are I)'11IJrletrie poeiliw deli~tite.

Not• aDd RerenDce. tor Sec. 4..5


The follolriac papen provide iDiigllt iACO the ~ DVIDI* ol block mMrix computa-
UoD.:

J .M. V.nb (1972). "On the SolutioD ol Bloc:k-1\idiaaOD&l


Finite.~ Equatiolla," MiliA. Comp. 16, ~.
s,.._
Arlaillg from~

J.A. G«qe (1974). "On Block Wimioatioo for S~ u-r Syfien\1,~ SIAM J. N~~m.
AML 11, 585--603.
R.. Pourer (1984). "St..u-c- Matrio~!a ead S~" SIAM Rnial U. 1-n.
M.L. Merriam (1985). •on the Fllctoriaatioo ol Blodt 'lndiacoula With Storace Con-
araiD~" SIAM J. Sci. AM Sill*. CMIIJ'. 6, Ul2·192.

The property of block cliacooa1 dominaDc:e &lid ita variou8 impl.icatioo8 is the central
theme lD

D.G. Fusold and R..S. Vacp {1962). "Block Di~aally Domiaam Macncee aad Gen-
eraliaationa of the GenllevriD Circle Tbeoremt Ptldftc J. Mlllh. It, 1241-56.
Early methoda ~bat involve the idea of cyclic reduction are d.;:ri~ in

R..W. Hodtuey (1965). wA Fut D~ Solut»n of PoiiM>o'• Equa&ioo Uling Fourier


Anal)lllill. "J. ACJI 11, V5-113.
B.L. Bubea, G .H. Golub, &lid C.W. Nielaon (1~'10). "On Direct Method. Cor Solvinc
Poillon's Equations," SIAM J. Nvm. Anal. 7, 621-56.
The ac~~;umula\ioo of ~be rich~haod aide mua\ be done witb 1JeM care, for otbenriee
&here would be a aipillcut • of accurw:y. A Aable way of doinc thw w d-=ribed in

0. Bu~~e~UG (1960). "A Compact Noo-lteraiive PoiaJo Solvw," ~\aport 294, Stanford
Univenli'Y lniC.itu\e foe Plum& ~. S~. Califonai&.
~b.- li&erMure cooc:emed with cyelic redu~ion iDcluct.

F.W. Dow (1910). '"l'be ~ Solu&ioD oBIM Dillc:rCe Polilloe EquaUOD OD a R.ect&D-
P," SIAM Review 1.. ~.
B.L. Busbee, F.W. Dorr, J .A. Georp. aad O.H. Golub (1971). "The Direcl Solution o(
die Dl8cnCe PoM.oo Equa&ion OD ~ ft.eclou," SIAM J. Nwm.. AnoL 8. 122-36.
F.W. ~ (1973). '"The Dlrecl SolutioD of tbe Dia:tece Poia1oo Equtioa i:D O(tal)
~·SIAM Rewiew 15, 412-41&.
P. Concua IIDd G.H. Golub (1973). •U• ol Fu& Direct Mebodlf« &be Eftld1111t Nu-
morical Solation ol No~ Elllp&ic Equa&ioua,~ SIAM J. Num. Anal. 10,
U03-20.
B.L. Bua~Me aDd F .W . Dorr (1974). '7be ~ So11l~ oC tbe Blbarmoo.ic Equation
OD ~ ftecjoca aad tbe P~ F.qu&tioa on fn1wu.la.r .R.cio-.• SlAM J.
Nvm. .4-'. 11, 753-03.
D . Heller ( 1~76). "SoiDI Alpecta oHbe Cyclic Reduction Al&oriU!m for Block Tric1la«ooaa
Lin~ s.,--,-SIAM J. N-. AnoL 13, 484-fe.

Variou ~iou &lid extallliou to eyctic reducdoll llaw bee PI'OPC*d;

P.N. Swamuaat-lllld R.A. s.- (1973). -rile Dlrecl SoluUoa of the Oiacme Poaoo
Equa&»ll oa a DIU.'' SIAM J. Nwm. AML 10, ~.
4.6. VANDERMONDE SYSTEMS AND THE FFT 183

R..A. SWMt (1974}. •A a--.lis«i Cyclic RAid!ICWn Alcombm.." SIAM J. Nvm. AnaL
11,506-20.
M.A. Diamond N1d D.L.V. Ferreira (1978). "'n a Cyclic lbduction Method for the
SoluUDn of Poillloo'• EquatloD," S/Altl J. Nvm. AnaL 13, 54-10.
R.A. Sw.t {1971}. "'A Cyclic Reduction Alp1ihJD for SotviDc Diode TridiJicoD&I Sy&-
&ema of Arbitrary Dimea8ioo," SIAM J. N'lml. AnaL 14, 706-20.
P.N. Swamrauber aad R. ~ (1989). "Vector aud Par.llel Methods for the Dinlct
Solutioll ofPo_,.'• Equation,~ J. Comp. AppL M/Uh. rl, 241-263.
S. Bondellud W. Cuder (1994). "Cyclic Reduction !or Special nidiasooal Systems,~
SIAM J. Mllir'IZ AnaL Appl. 15, 321-330.
F'Ol' cmtai.D matricel tha& artae in conjUDCtion rib elllp& perlla.l diffi!nm.tial equationa,
block ellmillaiion com.'lllp<lnds to rather natural opera&lona on the underlying mesh. A
clurical example of thilll is the method of neated dillrieaioD. dellcribed in

A. George (1973}. "Nested Di~Mct.ion of a 1\egu.lu Finite Element Mesh," SIAM J.


NUffl. AnaL 10, 345-63.

We also mention the following general survey:

J.R. Bunch (1976)."Block Methoda for Solving Span~e Lipear Sy.tems," in Spar~e
Motriz ComputGHoM, J.R. Bunch and D.J. &c- (eds), Academic Pre., New York.
Bordered linear systems as pn!!lelltad in P4.5.4 are ~ in

W. eo-t. and J.D. Pryce {1990). ~Block Eliminacion wiib One Iterative Refinement
Solvm Bordered Lineel' Syatemll Accuru.ely," BIT 30, 490-507.
W. Caw.erta (1991). "Stable Sol\'el'll and Block Elimina1ioa for Borden!d Systems,"
SIAM J. Mott'U An4l. Appl. 11, 469-483.
W. Govaertl and J.D. Pryce (1993). ~Mlxa:l Block Elimination for Line&~' Sy&ems with
Wider Boniers," IMA J. Nvm. A~. 13, 161-180.
Kroneda!!r- product refer-enOM iDCiude

H.C. Andrewa and J. Kane (1970). ~Kronecks- Matricm, Computer Implementation,


and Genenlized Spectra," J. Auoc. Comput. Moda.. 17, 260--268.
C. de Boor (1979). "Efficient Computer Manipula&ion ol Tenaor Produda," A OM Than~.
Mrdh. Soft. 5, 173-182.
A. Graham (1981). K~ Product. and M~ C~ wilh AppliaJtioru, Ellilll
Horwood Ltd., ~. En&Jaod.
H.V. Henderllon and S.R. Searle (1981). "The Vec-Pennuiation Matrix, The Vee Opera-
tor, and Kronedmr Products; A Revi-," I.mcor rmd Muliilinmr Algelma 9, 271-288.
P.A. R.eplia and S. Mitra (1989). "Kronecker Products, Unitary Mat.ria!s, and Signal
Proa.iDg Appliawons,~ SIAM Review 31, 58&--013.

4.6 Vandermonde Systems and the FFT


184 CHAPTER 4. SPECIAL LINEAR SYSTEMS

ill said to be a Vandennonde matriJ:. In this section, we abow how the


systems VTa = f = /(O:n) and Vz = b = b(O:n) can be solved in O(n2 )
flops. The discrete Fourier traasform is brie8y introduced. This special. and
extremely important Va11dermonde system bas a a recursive block structure
and cao be aolved in O(nlogn) ftops. In thU .te.ction, vectors and matrices
are IUb.tcripWl from 0.

4.6.1 Polynomial Interpolation: yr a =f


Vandermonde systems arise in many approximation and interpolation prob-
leirul. Indeed, the key to obta.ining a fast Vandermonde solver is to recognize
that solving vr a = f is equivalent to polynomial interpolation. This fol-
lows because if vT a = f and

(4.6.1}

then p(xi) = /;. for i = O:n.


Recall that if the z, are distinct then there is a unique polynomial of
degree n that interpolates (xo, /o), ... , (x,., f.,). Consequently, V is non-
singular as long as the Xi are distinct. We BBBume this throughout the
section.
The first step in computing the a1 of (4.6.1) is to calculate the Newton
representation of the interpolating polynomial p:

(4.6.2)

The constants Ck are divided differences and may be determined as follows:


c(O:n) = /(O:n)
for k=O:n-1
fori= n: -1:k + 1 (4.6.3)
C;. =(eo- Co-t)/(x;.- x;.-1.:-1)
end
end

See Conte and de Boor (1980, chapter 2).


The next task iB to generate a(O:n) from c(O:n). Define the polynomials
p,.(x), ... ,vo(x} by the iteration
Pn(x) =c.,
for k = n - 1: - 1:0
p~:(x) = c~.: + (x- x~~:}Pk+l(x)
end
4.6. VANDSRMONDE SYSTEMS AND THE FFT 185

and observe that Po{.:t) = p(x). Writing


P~.:(x) = ai"'l + ai~1x + ... + c4""lzn-lr
and equating like powers of .:tIn the equation Pk =C.!:+ (.:t- :r.~~:)PA:+1 gives
the following recursion for the coefficients a~t):
cJ.n) = Cn
for k = n- 1: - 1:0
a<"'> - .f'L- .:tLa(lr+1)
.1:: -- .. A:+l
for i = k + 1:n- 1
(k) _ (A:+ 1) (lr+ I)
ai - a, - Xkat+ 1
end
an(A:) =an(11:+1)
end
Coru;equently, the coefficients lli = a~ l can be calculated as follows:
0

a(O:n) = c(O:n)
for k = n- 1: - 1:0
fori= k:n -1 (4.6.4)
lli = ll; - Xklli+l
end
end
Combining this iteration with (4.6.3) renders the following algorithm:

Algorithm. 4.6.1 Given x(O:n) E R"+ 1 with distinct entries and f =


f(O:n) E nn+ 1 ' the following algorithm overwrites I with the solution a =
a(O:n) to the Vaodermonde system V{x0 , ••• ,.:tn)Ta = f.
fork= O:n -1
for i = n: - l:k + 1
f(i) = (f(i)- f(i.- 1))/(:z:(i)- x(i- k- 1))
end
end
for k = n - 1: - 1:0
fori= k:n -1
/(i} = f(i)- f(i + l)x(k)
end
end
This algorithm requires 5n2 f2 flops.

Example 4.6.1 Sup~ Algorithm 4.6.1 II 118«1: to 110M!

[ i~ !~ .:! !!!]T[::;]
::
z [ :]

~~
186 CHAPTER 4. SPECIAL LtNEAR SYSTEMS

Tbe 8nt lo-loop compma cJie N..c.oa '•ellllliaioo of Jl(:~:):


p(z) ... 10 + 16(z - 1) + 8(;t - l)(z- a)+ (s - l)(z - 2)(z- 3).

The ..coDd At-loop comput. A .. ,4 3 2tJT (rom (10 1& 8 lj'l'.

4.6.2 The System Vz =b


Now consider the system V z = b. To derive an efticient algorithm for this
problem, we deeeribe what Algorithm 4.6.1 does in matrix-vector language.
Define tbe lower bidiagonal matrix L.~o(o) E R {n+l)x(rt+l) by

lie 0
1 0
-a 1

1
0 -a 1
and the diagonal matrix D~c by

D1e = diag( .__..


1, .. . , 1 ,Zit+l - zo, . .. ,z,. - Zn- lt-1)·
k+l

With these definitions it is easy to verify from (4.6.3) that if I = I (O:n)


and c = c(O:n) is the W!Ct.Or of divided differences tben c::::;; rf1'I where U
is the upper triangular matrix defined by

uT = v;-~ 1 L,.-t(l)···DO''Lo(t).
Similarly, from ( 4.6.4) we have

where L is the unit lower triansular matrix defined by:

LT ""' Lo(%0}T • • • Ln-t(Zn-t)T .

Thus, a = LTrfT1 where y - T = LTrfT. In other words, Algorithm 4.6.1


=
80lve!l yTa I by tacitly computing the "UL" factorization of v - a.
CoDSequeDtly, the solution to tbe system V z = b is giwn by

z = v- 1b -= U(Lb)
= (Lo(l)T Di) 1 • • • L,.- J(ll D,;2 1) (L..- t(Zn-1) · · · Lo(zo)b)
4.6. VANDERMONDE SYSTEMS AND THE FFT 187

'I'Jtis observation gives rise to the following algorithm:

Algorithm 4.6.2 Given :z:(O:n) E R"+ 1 with distinct entries and b =


b(O:n) E R"+l, the followiug algorithm mrerwrites b with the solution z =
z(O:n) to the Vandermonde system V(:ro, ... ,x.,)z =b.

fork= O:n -1
for i = n: - 1:k + 1
b(i) = b(i)- x(k)b(i- 1)
end
end
for k = n - 1: - 1:0
fori= k + l:n
b(i) = b(i)f(x(i) - x(i- k- 1))
end
for 1 = k:n -1
b(i) = b(i)- b(i + 1)
end
end
This algorithm requires 5n2 /2 flops.

Example 4.6.l Sup~ Algoriihm 4.6.2 II uaed to 801ve

l[~ l -~ l
[ 1!1 84~ 2 7!9 6 416! S2
.llJ
= [ 3
35
The fiZ'It k-loop computes tbe vector

~(3)~(2)L,(l) j l ~ [-~ ].
[

The IIIIICOnd ..loop tbea. c:alc:uJaklll

Lo(l)'DO'L,(l)'DO'L,(l)TD;' [ j l ~ [i l
4.6.3 Stability
Algorithms 4.6.1 and 4.6.2 are discussed and analyzed in BjOrck and Pereyra
(1970). Their experience is that these algorithms frequently produce sur-
priBingly accurate solutions, even when V is ill-conditioned. They also
show haw to update the solution when a new coordinate pair (xn+l. fn+t)
188 CHAP'rER 4 . SPECIAL LINEAR SYSTEMS

is added to the set of points to be interpolated, and how to aolw conjluent

l
Vc~ ~t~lteml, le., systems involving matrices lib

01 :t31
2%1 ~ .
~~

4.6.4 The Fast Fourier Transform


The discrete Fourier transform (DFT) matrix of order n is defined by

where
"'" = exp( -211'i/n) ""'ooa(211'/n)- i · sin(211'/ n).

The parameter "'" is 8D nth root of unity becouae w: = 1. I~a the n = 4


case, w•::::;; -i and

-:_: ]·
-1 -·

If :t e C', t hen its DFT is the vector F,.:t. The DIT bas an extremely
important role to play throughout applied mathematics and engineering.
U n is highly compceite, tben it is possible to carry out the DFT in
many~ than the O(n,) ftops required by conventional matrix-vector
multiplication. To illustrate this we set n =
2' and proceed to develop
the ~2/an Fourier trun.jorm (FFT}. The starting point 15 to look
at an even-order DFT matrix when we permute its colu.mns 80 that the
even-indexed colWIUliS come first. Consider the cue n a 8. Noting that
w:; = ~i mod 8 we have

1 1 1 1 1 1 1 1
l w w2 w" w$ w" w1
1 wa w• w'
1 w3 w" w
""' 1 w2 w'
w' wT w2
w"
ws
Fa= 1 w' 1 w• w' W=Ws·
1 w' 1
1 ...,s w2 w" w' w w'
1 ...,e w' w'2 1 w" w• """
w2
1 ...,r w" wS w' Wl w
""'
4.6. VANDBRMONDE SYSTEMS AND THE FFT 189

If we de6.ne the index vector c = [0 2 4 6 1 3 5 1J, then


1 1 1 1 1 1 1 1
1 w'~ w• we w w' w& w1
1 w• 1 w• w2 w• w2 we
1 w• w• w2 w3 w w1 w'
Fa(:,c) = - 1
1 1 1 1 -1 - 1 - 1
1 w, w• w,. -VI --w' -v/5 - w1
1 w• 1 w• -w2 - we ~ -w6
1 w" w• w2 - w3 --w -w1 - ws
The lines through the matrix ate there to help us think of Fn(:,c) as a
2-by-2 matrix with 4-by-4 blocks. Not ing that w~ = wl = w, we see that

F4
Fa(:, c) = [ F4
n,F,]
- n.F4

wnere

n..... ~~ ~ ~ w~]l ·
0 0 0
It follows that if z in an 8-vector, then

Fax=F(:,c)x(c) =

= [ ~ ,_g: ][~:=~~~~~~~ J.
Thus, by simple seaJinp we can obtain the 8-pcint DFT y = Faz from the
4-point DFrs YT = ·F4 x(0:2:7) and·ya = F•%(1:2:7):

y(0:4) = !IT + d. • '!/8


y(4:7) = YT - d. • JIB·

Here,

and ".•• iDdicates vector multipllcatioo. In ~. if n = 2m, then !f "'"


F"x ia given by

y{O:m - 1) = liT+ d. • YB
y(m:n- 1) .,. !IB -d. • !IB
190 CHAPTER 4. SPECIAL LINEAR SYSTEMS

where

d = (1, w,···, wm-1 (

YT = Fm:c(0:2:n- 1},
YB = Fm:c(1:2:n- 1).
For n = 2' we can recur on this process until n = 1 for which F1 x = x:
function y = F FT(x, n)
ifn=l
y=x
else
m = n/2; w = e-2-rls/n
'!IT= FFT(x(0:2:n),m); YB = F.FT(x(1:2:n),m)
d = ( 1, w, · · ·, wm-l ]T; z =d.* YB

Y
:=[YT+Z]
YT -z
end
This is a member of the fast Fourier transform family of algorithms. It
has a nonrecursive implementation that is best presented in terms of a.
factorization of F... Indeed, it can be shown that F,. =At··· A1P,. where

L=~. r =nfL

with

BL = [ h/r,f2
12 nL/2 ]
-{h/2 an
d n
L/2 = diag(I 'w}L, ... 'WLL/2-1) .
The matrix P,. is called the bit reversal pennutation, the description of
which we omit. (Recall the definition of the Kronecker product "®" from
§4.5.5.) Note that with this factorization, 11 = F,.z can be computed as
follows:
z=P,.x
for q = l:t
L=2'l,r=nfL {4.6.5)
X= (Jr ®BL)X
end
The matrices~= (lr®BL) have 2 nonzeros per row and it is this sparsity
that makes it possible to implement the DFI' in O{nlogn) Hope. In fact,
a careful implementation involves 5n log2 n flops.
The DFI' matrix has the property that

(4.6.6)
4.6. VANDERMONDE SYSTEMS AND THE FFT 191

That is, the inverse of Fn is obtained by conjugating its entries and acaling
by n. A fast inverse DFT can be obtained from a (forward) FIT merely
by replacing all root-of-unity referencs with their complex conjugate and
scaling by n at the end.
The value of the DFI' is that many '"hard problems" are made simple
by transfo.rmillg into Fourier space (via Fn)• The sought-after solution
is then obtained by trBDSforming the Fourier apace solution into original
coordinates (via F; 1).

Problema

P4.6.1 Show thai if V =- V(:o, ... ,z,.), theu


det.(V) ::; II <=•- :;).
,.~O>j~O

P4.6.2 (Gaut.achi 1975&) Verify tbe following inequality for tbe n::; 1 cue above:

II v-1 """ :::; max


o<lo<n
rr" lz~o1 +-l:rd=•I .
- - hoO

''"'
Equality resultlll if the z; an all on the same ray in the complex plane.

P 4.6.3 S uppoee w - [1 , w,, w,.,


2 ... , w,.
n/l-1 ] w" -, n -- 21 . Using co Ion notation,
...,.. .
ex preY
[1, """• ""~' ... ' w;/2-1 ]
as a subvt!ct.oc- of w when! r : 2', q = 1:t.
P4.6.4 Prove (4.6.6).
P4.6.5 Expand the opera&ion z = (I @ Bc.)z in (4.6.5) into a double loop and count
the number of fto~ required by yow- ~ (Ignore the detaik of: P..:r. =
=
P4.6.6 Suwc- n 3m &Z1d examine
G = fF,.(:,0:3:n -1) F,.(l:3:n. -1) F,.(:,2:3:n- 1)]
as a 3-by-3 block malrix, looking few IICIIIed copiM of Fm. s.-:1 011 what you find,
develop a l"8CCll'SM! radix-3 FFT BD&Joso• to the radix-2 implemen&at.ion in the text.

Not• and Refereac:ee for Sec. 4.8


Our dlac:laion of Vandermonde liDeu- 8)'lltem8 i8 drawn from the pepen

A. BjOttk and V. P.nyra (1970). "Soouiioll of VUldermonde S~ of Equat.loaa.. • Math.


Camp. !4, 893-903. .
A. BjOn:k and T. ElMng (1913). •AJgoritbms for Confluent Vandermonde s~n
N'/Jifler. Math. 11, 130-37.
The divided dillenmc:e eomputatioDB we ~ 8f8 detailed in cb.aptel' 2 of

S.D. Conie aDd C. de Boor (1980). ~ Numeriml Analv.U: An Algonthmie


Approach, 3rd ed., McG~-Hill. N- York.
192 CHAPTER 4. SPECIAL LINEAR SYSTEMS

Tbe latter , ............ iDclude. 811 Alcol prooedw.. Error ~of V&AdarmoDde l)'lltem
110lwn include

N.J. Higham (1987b). "Errol' Allal)W of the Bjiin:k-Pereyra A.lgorithms f~ Solvillg


VIIDdermoDde Sys5em~, ~ Nwner. Mtdh. 50, 6134'132.
N.J. Higham (1988a). "Fast Solution ofVaadennondo-Ub S~ lnvolviiiJ Qnbogoilal
Pol)'II:OID1als," IMA J. N-. AnAL 8, 473-486.
N.J. Higham (1990). "Stability ~ofAigoriU!Jm forcSolvingCooftueut Vaadermonde-
li.b S~" SIAM J. MaiM: AMl. Appl. 11,23-41.
S.G. Bane1s a.od D.J. Higham (1992). "Tblt Structwwi Semi.tivity ofVB.Ddennonde-Like
Systama," N'llm#/:r'. Math. 6!, 17-:W.
J.M. Van.b (1993). KErron and Perturbatiooa in Va.ndennonde Syatems," IMA J. Num.
AnGL 13, 1-12.

Interesting theoretical n!lllllta Olo.cet1Uilg the condition of Vandermoode ll)'lltemB may be


found in

W. Gautachi (1975a). KNonn Eatima.ta for In,__ ofVandennond.e MatricEB," Numer.


Math.. M, 337-47.
W. Gautachi (l975b). "Optimally Conditioned Vande:rmonde Matrices," Numer. Math.
2.4. 1-12.

The bmlic algorithms p~nted can be extended to cover confluent Vandermonde sys-
tems, block Vandermonde syaems, IIDd Vaudermonde systl!lllll that 111'8 ~ on other
polynomial ~:

G. Galimberti and v. Pereyra (1970). KNumetical Differentiation a.nd the Solution or


Multidimensional Vandermonde Systems," MaUl. Comp. !4, 357-64.
G. Galimberti and V. Pereyra (1911). "Solving Confluent Vandel'monde Systems of
HennitU..O Type," N11mer. Moth. 18, 44-00.
H. Van de Vel (1977). ~Numeric-.~ Treatment; of a Genes-alized Va.ndennonde5)'Stemll of
Equat.iona," Lm. Alg. and /u Applic. 17, 149-74.
G.H. Golub and W.P Tang (1981). "The Block Decompolition of a Vandermoode Matrix
and It.. Applicatio1111," BIT 11, 505--17.
D. Calwtti end L. Reichel (1992). "A Chebyc:hev-Vandermonde Solver,~ Lin. AZg. and
Iu Applic. I'T!, 219-7211.
D. Ca.lYet\i aDd L. Reichl!! (1993). MF'fJM IDwnioD of Vandermond~Like Mlllrices In-
wlvinl Qnhogon&l Polynomials," BIT 33, 473-484.
H. Lu (1994). MFut Solution of CoDfiueDi Vudennonde Linear Systems," SIAM J.
Mo.tri::I: AnaL APJII. 15, 1277-1289.
H. Lu ( 1996). "Solution of Va.odennoode-like Systems and Conduent Va.ndermonde-like
S)'Remll," SIAM J. Ma.b"'is Anal. Appl. 17, 121-138.

The FFI' litmat.UJ1!1 ill 'ili!fY exteuive aud acattered. Fbr an 0\W\'iew of tbe ace& couched
in Kronedcer prodlaei noiation, -

C.F. Van Loen (1992). CQ!n~ ~far the Faa' Fourier 7hatu/orm,
SIAM Pnbl~. Phlladelplria. PA.

The polni of view in tllW te~tila that difterem FFTe oorre.pond to diBenut ~na
of ihe OF"I' maUtx. 'I'hale 11111 ~e fac:toriatiolla in that the fActota have vecy few
I10Mei'OII per row.
4.1. TOEPLrrZ AND RELATED SYSTEMS 193

4. 7 Toeplitz and Related Systems


Matrices whose entries are CODStaDt along each diagonal arise in maoy ap.
plications and are called Toeplib matricu. Formally, T E R'xn is Toeplitz
if there exist scalars r -n+l, ... , ro, ••. , rn-1 such that 41; = r;-t. for al1 i
and j. Thus,

T = ro
r -1
r1 r:r1z: r,r3]
ro =
[3 11
4 3 1 7
6]
r -2 r -1 ro rt 0 4 3 1
[
r -3 r -2 r -1 ro 9 0 4 3

i.s Toeplitz.
Toeplitz matrices belong to the larger class of persymmctnc matrice...
We say that B e R'xn is persymmetric if it symmetric about its northeast-
southwest diagonal, i.e., bi; = bn-;+t,n-i+l for al1 i and j. This is equivalent
to requiring B = EBT E where E = [ e,., ... , et ] = !,.,.(:, n: - 1:1) is the
n-by-n exchange matrix, i.e.,

00 00 01 01]
E=
[ 01 1 0 0
0 0 0
.

It i.s easy to verify that (a) Toeplitz matrices are persymmetric and (b) the
inverse of a nonsingu.la.r Toeplitz matrix is persymmetric. In this section we
show how the careful exploitation of (b) enables us to solve Toeplitz systems
with O(n 2 ) fiope. The discUSBion focuses on the important case when Tis
also symmetric and positive definite. Unsymmetric Toeplitz systems and
connections with circulant matrices and the discrete Fourier transform are
brie8y discUBSed.

4.7.1 Three Problems


Assume that we have scal81"8 r 1 , ••• , r.. such that for k = l:n the matrices

1 r1 r,_, Ti;-1

r1 1 rk-2
To~; =
ri:-:l rl
Ti;-1 r.~:-2 Tt 1

are positive definite. (There is no loss of generality in normalizing the


diagonal.) Three algorithms are described in this section:
194 CHAPTER 4. SPECIAL LINEAR SYSTEMS

• Durbin's algorithm for the Yule- Walker problem T,y = -{r 1 , ... , r,JT.

• Levinson's algorithm for the general righthand side problem T,z =b.
• Trench's algorithm for computing B = 7; 1•

In deriving these methods, we denote the k-by-k exchange matrix by E~o,


i.e., E., = I~o(:, k:- 1:1).

4.7.2 Solving the Yule-Walker Equations


We begin by presenting Durbin's algorithm for the Yule--Wa.l..lmr equations
which arise in conjunction with certain linear prediction problems. Suppose
for some k that satisfies 1 $ k $ n - 1 we have solved the k-th order Yule-
Walker system T11 y = -r = -(r1, ... , r,)T. We now show how the {k+l)-st
order Yule-Walker 8}'!tem

can be solved in O(k) 8ops. First observe that

z =T;- 1 (-r- aE,r) = y- aT;- 1 E,.r


and
a = -r,+l - rT E,z.

Since Tj; is persymmetric, T; 1E, = E~.T; 1 and thus,


1

z = y- aE.,T; 1r = y +aE,y.

By substituting this into the above expression for a we find

The denominator is positive because T,.+l is positive definite and because

I E"y ] T E~.y
[ 0 1
[
rrT,E" E,r ] [ I
1 0 1
] _ [ T"
- 0
0
1 + rT y
]
.

We ha.ve illustrated the kth step of an algorithm propoeed by Durbin (1960).


It proceeds by solving the Yule-Walker systems

T~.y(l:) = -r<") = - [r1, ... , r,.f

for k = l:n u follows:


4. 7. TOEPLITZ AND RELATED SYSTEMS 195

(4.7.1)

end
As it stands, this algorithm would require 3n 2 8ops to generate y = y( .. l.
It is possible, however, to reduce the amount of work even further by ex-
ploiting some of the above expressiom:
fJt 1 + [r<"l]T y<")

1 + [ r<t-t)T r~: J [ y(l:-l) + a!:~~t-lY(t-1) ]

== (1 + [r<t-l)JTy<t-1)) + al:-1 (rr(.l:-l)]T Et-lY(Jo:-1) + rr.)


= fJk-1 + Ofc-1( -J'k-10.1:-1)
= (1- cr~-tlfJ~:-1·
Using this recursion we obtain the following algorithm:

Algorithm 4. 7 .1. (Durbin) Given real numbers 1 = ro, rt. ... , r n such
that T = (rli-jJ) E Fx" is positive definite, the following algorithm com-
putes y E JR."' such that Ty = -(r1, ... , r,.)T.
y(1) = -r(1}; {J = 1; cr = -r(l)
fork'=" 1:n -1
{J = (1 -:- a2){J
a=- (r(k + 1) + r(k:- 1:l)Ty(1:k)) J/3
z(l:k) = y(1:k) + ay(l:: - 1:1)
y(1:k + 1) = [ z(~k) ]
end
This algorithm requires 2n2 Oops. We have included an auxilia.ry vector z
for clarity, but it can be avoided.

Example 4.7.1 Suppoae we wiab to ao1ve the YuJe.WaJker 11y11tem

118inJ Ai«orithm 4.7.1.


[-~ -~ :~ l[~ l" -[:~ ]
.2 .5 1 113
Aft« oae p.- through the loop we obtain
.I

Q = 1/15, /J = 3/4, ll = [ ~~~5 ] .


196 CHAPTER 4. SPECIAL LINEAR SYSTEMS

We tbell compute
{3 = (1 - a 2 ),B "" 56/75
Q
= + '"2111 + rt1/2)//3 =
-(Tl -J/28
%1 = + 01/2 = -225/420
Ill
%2 = 112 + OJ/1 = -36/420,
giviog lhe finale:~lution v = {-75, 12, -~JT /140.

4.7.3 The General Right Hand Side Problem


With a little extra work, it is. possible to solve a symmetric positive definite
Toeplitz system that has an arbitrary right-band side. Suppose that we
have solved the system

(4.7.2}

for some k satisfying l $ k < n and that we now wish to solve

(4.7.3)

Here, r = (r 1 , ••• , r,~;)T as above. Assume also that the solution to the kth
order Yule-Walker system T,~;y = -r is also available. From T.~;V+JJE,~;r = b
it follows that

and so

jJ = b.1:+1 - rT E~cv
= b~.:+t - rT E~cz -- ~JrT 11
= (bk+l-rTEr.x)/(l+rTy).

Consequently, we can effect the transition from (4.1.2) to (4.7.3) in O(k)


flops.
Overall, we can efficiently solve the system Tnx = b by solving the sys-
tems T,~:z(k) = b(.l:) = (b1, ... , b.~;)T and T~~:y(k) = -r(k) =
(r1, ... , rt)T "in
paralleln for k = l:n. This is the gist of the following algorithm:

Algorithm 4.7.2 (Levinson)Given be R"' and real numbers 1 =


ro,rh····r" such that T = (rl•-il) e R'x"
is positive definite, the fol-
lowing algorithm computes :z: E R" such that Tx = b.

y(l) = -r(l); x(l) = b{l); 13 = 1; a= -r(l)


4.7. TOEPLITZ AND RELATED SYSTEMS 197

fork= l:n -1
=
P = (1- a')/1; p. (b(k + 1)- r(l:kf z(k~- 1:1)) f{J
v(l:k) = z{l:k) + IJY(k:- 1:1)
z(1:k + 1} = [ v(~k) ]
ifk<n-1
a= { -r(k + 1) + r(1:k)Ty(k:- 1:1)) /P
z(1:k) = y(1:k) + ay(k: - 1:1)
y(1:k +I) = [ z(~k) ]
end
end
This algorithm require! 4n2 Bops. The vectors z and u are for darity and
can be avoided in a detailed implementation.

Example 4. T.2 Suppoee we wish to solw the symmetric positive definite Toeplitz
sysk!m

[ :~ :! :~ l[:: l
1111ing the aboYe algorithm. After one peg through tbe loop
=- [ -1] wt1 obtain

a::: 1/15, fJ"" 3/4, y := [ -1~1 ;


1
] :t = [ _: ] ·

We then compute
tJ = (I - a 2 ){J "'56/75 J,1 =- (ba- 1"1%2- r2:t1)//3 = 285/56
t11 "" Zl + ,Ul/2 "' 3M/56 ""l = Z2 +Pill .::: -376/M
giving the 6.nal110lution z ={355, -376, 2MJT /56.

4.7.4 Computing the Inverse


One of the most surprising properties of a symmetric positive definite
Theplitz matrix Tn is that its complete inverse can be calculated in O(n2 )
flops. To derive the algorithm for doing this, partition r; 1 as follows

T.-t
n
= [ rTE
A Er ] - l = [ B
1 if 7
v ] (4.7.4}

where A= T,._lJ E = En-1 1 and r = (rt. ... ,rn-l)T. From the equation

[J E ~r ] [ ; ] = [~ ]
it follows that Av = --yEr = --yE(rt. ... , rn-t)T and 1 = 1 - ,-7' Ev.
IT y solves the (n- 1)-st order Yule-Walker system Ay = -r, then these
198 CHAPTER 4. SPECIAL LINEAR SYSTEMS

expressioos imply that


"1 = 1/(l+rTz.e)
u = 7Er.
Thus, tbe last row and column of T; 1 are readily obtained.
It N!mains for us to develop working formulae for the entries of tbe
submatrix Bin (4.7.4). Sl.oce AB + ErvT = 1,._ 1 , it follows that
T
B = .4- 1 - (A- 1 Er)vT ~ A- 1 + .!:!_
.., .
Now since A = T,._l ia noDSingular and Toeplitz, ita inverse .is per3ymDlet-
ric. Thus,
bti = (A-')•J + "'"J
'Y
,.. (A- 1)
n-J,n-1
+ ,,,, (4.7.5}
'Y
= b,._,,,._,
. . _ t/n-jV,.-i
"'(
+ ViVj
...,
1
= "•- J.n- t + -.., (v;v;- tln-;tln-t) .

This indicates that although B is not persymmetric, we can readily compute


an element ,.J
from its reflection acroea the northeast-southwest axis. Cou-
pling tbia with the fact that A - 1 is persymmetric enables us to detennille
B from its "edges" to its "interior."
Becauae tbe order of operations is ratber cumbersome to describe, we
preview the formal specification of the algorithm pictorially. To this end,
888tune that we know the last eolumn and row of 7;;-1:

u u u u u k
u u u u u k
T.-1 = u u u u u k
" uuuuuk
u u u u u k
k k lc k k k
Here u and k denote the unknown and the known entries respectiwly, and
=
n 6. Alternately exploiting the pensymmetry of T,; 1 and the recursion
(4.7.5), we cao compute B, the leading (n-1)-by-(n -1) block ofT;!1 , u
followa:
lc k k k k k k k k k k k
k u u u u k k u u u k k
~eym.

- k
k u
k
u

u
u u
u u
u u
u
u
u
k
lc
le
-
(U.$) k
k
k
u
u
k k
u
u
u
u
k
k
lr;
k
k
k
k
k k k k k k k k k k k /r;
4.7. TOEPL11"1: AND RELATED SYSTEMS 199

k k k k k k k k k k k k
k k k k k k k k k k k k

--
1"'1"'¥""'· k k u u k k
k k u u k
k k k k k k
k k k k k k
" -
(4.7.5) k
k
k
k
k u
k k
k k
k k
k k
k k
k k
k k
k
k
k
k
k k k k k k
k k k k k k
prr~ k k k k k k
k k k k k k
k k k k k k
k k k k k k
Of course, when computing a matrix that i.s both symmetric and persym~
metric, such as T; 1 , it is only necessary to compute the "upper wedgen of
the matrix-e.g.,
X X X X X X
X X X X (n = 6)
X X

With this last observa.tion, we are ready to prESent the overall algorithm.

Algoritlun 4. 7.3 (Trench) Given real numbers 1 = r 0 , r 11 •.• , r,. such


that T = (rlo-jl) E R"x" is positive defurite, the following algorithm com-
putes B = T;; . Only tbooe b;,j for which i $ j and i + j ~ n + 1 are
computed.
Use Algorithm 4.7.1 to solve T.. -1Y = -(rr, ... , r,._lf·
'Y = 1/(1 + r(l:n -l)T y(l:n- 1))
v(l:n- 1) =·w(n -1:- 1:1)
B(1, 1) = 7
B(1,2:n) = v(n -1: -l:l)T
for i = 2:fioor((n -1)/2} + 1
for j = i:n - i + 1
B(i,j) = B(i- l,j -1)+
(v(n + 1- j}v(n + 1- i)- v(i- l)v(j -1)) h
end
end
This algorithm requ.in!s 13n2 /4 Hops.

Exan:J.ple 4.7.3 If the~ algorithm is &pplied to compute tbe inwne B of the


poe.itl- definite Toepliu matrix

1 .5 .2 ]
.5 1 .5 '
[ .2 .5 1
200 CHAPTEll 4 . SPECIAL LJNEAR SYSTEMS

Uleo - otKabl"'f a 75/ SO. bat = 75/56, 612 s -5/7, ba2 • 5/M, aDd~ • 12/7.

4.7.5 Stability Issues


Error analyBeS for the above algorithms haw been performed by Cybenko
(1978), and we brie1ly report on some of his findings.
The key quantities turn out to be the at in {4.7.1). In exact arithmetic
these aca.lalB aatlafy
loki< 1
and can be used to bound II T- 1
n.:
{4.7.6)

Moreover, the solution to the Yule-Walker system T.. y = -r(t :n) satisfies

{4.7.7)

provided all ~ o.~; are non~negative.


x
Now if i8 the computed Durbin solution to the Yule-Walker equations
then ro = Tn% + r can be bounded as foUows

" (1 +fa~: I)
II ro II ~ u II
l<•l

where at is the computed versioo of at. By way of comparison, since


each lrol is bounded by unity, it foUows that II rc II ::;:s ui!YIIa where r c is
the residual associated with the computed solution obtained via Cbolesky.
Note that the two residuals are of comparable magnitude provided (4.7.7}
holds. Experimental evidence suggests thAt this is the case even if some of
the a,.
are negative. Similar comments apply to the numerical behavior of
the UwiD8on algorithm.
For the Trench method, tbe computed inverse iJ of T; 1 can be shown
to satisfy
II T; 1 - B 111 =: u rr" 1 + latl .
UT;1 1h t•• 1 - latl
In light o( (4.7.7) we see that the right-hand side is an approximate upper
bound for u 1J 1;1 Jl which is approximately the size of the relative error
when r,.- 1 is calculated using the Cholesky factorization.
4.7. TOEPLITZ AND RELATED SYSTEMS 201

4. 7.6 The Unsymmetric Case


Similar recunrions can be developed for the unsymmetric case. Suppoee we
are given scalars rt. .•. 1 r n- 1o P1, ••. 1 Pn-11 and b1, ... , b" and that we want
to solve a linear system Tz = b of the form

(n = 5).

In the process that follows we require the leading principle submatrices


Tk = T(l:k, l:k), k = l:n to be noDSingular. Using the same notation as
above, it can shown that if we have the solutions to the k-by-k systems

T[y = -r = - [rt r:~; · · · r;r. f


T~~:w = -p = - [p1 P2 · · · Pk ]T (4.7.8)

Tkz = b = [bl ~ · · · bk ]T,


then we can obtain solutions to

Err [~] = -[ r:+l ]


Er] [:] = -[P:+l] (4.7.9)

=
in O(k) Hops. This means that in principle it is possible to solve an unsym-
metric Toeplitz system in 0( n :l) Hops. However, the stability of the process
cannot be assured unless the matrices T~~: = T(l:k1 l:k) are sufficiently well
conditioned.

4. 7. 7 Circulant Systems
A very important class ofToeplitz matrices are the circulant matrices. Here
is an example:

~ ~ :]·
~ tl.t
tit VI)
C(t~) = t1:t v1
[ Va tl2 V} tlo tlf
tl.t 113 1.12 tit vo
202 CHAPTER 4. SPECIAL LINEAR SYSTEMS

Notice that each column of a circulant is a "downshifted" version of ita


predecessor. In particular, if we define the downshift permutation Sn by

01 0
0 0
0 0
0 0 11
Sn= 0 1 0 0 0 (n = 5)
0 0 1 0 0
1 0 0 0 1 0

and v = {vo tit · • • 11n-l JT, then C(tr) = [ v, Snv, ~v, ... , 1v ].
s;:-
There are important connectiona between circulant matrices, 'lbeplitz
matrices, and the DFI'. First of all, it can be shown that
(4.7.10)

This means that a. product of the form y = C(v)x can be solved a.t "FFT
speed":
x= Fn:r
v=Fnv
z =v.•x
y = F;; 1 z
In other words, three DFTs and a vector multiply suffice to carry out the
product of a circulant matrix and a vector. Products of this form are called
convolutions and they are ubiquitous in signal processing and other areas.
Toeplitz-vector products can also be computed fast. The key idea is
that any Toeplitz matrix can be "embedded" in a circulant. For example,

T=[!! i]
is the leading 3-by~3 submatrix of

C=r~~~~n
In general, if T = (tij) is an n--by-n Toeplitz matrix, then T = C{1:n, l:n)
where C E m_(2n-l)x(2n-l) is a circulant with

T(1:n, 1}
C( :, 1) = [ T(1, ]
n:- 1:2)T ·

Note that if y = Cx and x(n+ 1:2n-1) = 0, then y(1:n) = Tx(l:n) showing


that Toeplitz vector products can also be computed at "FFT speed."
4.7. TOEPUTZ ANO RELATED SYSTEMS 203

Probleu»

P4.7.1 Fot a.oy v E R' define \be vectora V+ = (v + E,.v)/2 a.od V- = (v- E..v)/'l.
Suppo~~e A E R" "" ia II)'UUDI!'Cric aod perll)'lDJDIItrc. Shaw that. if A% ""' b than Az + = b+
aod Az_ = 6--
P4. 7.2 Let U E R" "" be the UDit upper triaogular matrix with tbe propeny that;
U(l;k- l,J:) = E~<-lll(lo-l) where 11{lo) ia defined by (4.7.1). Show that

UTT,.U = d.iac(I,Jh, ... ,JJ,.-1).


P4.7.3 Suppose z E R" 1111d that S E R""" is ort.bogoaal. Show that if
X= [z. 811, ... ,S"- 1 .~:)

then xr X ia Toeplib..
P4. T.4 Coa.ider the LDLT factorization of ao n.by-n symmetric, t.ridiattooal., positive
definit-e Tooplltz matrix. Show thai; d.. and l,.,n-1 coovqe 1111 n-o oo.
P4. T.5 Show that the product of two lower triangular Toeplitz matrices is 'Theplitz.
P4.T.6 Give an algorithm f"" determioi:og ~ E R ,uch thai;

T,. +1-'(e,.ef +e1~)


is singu18l'. Asume T,. = (rli-jJ) is positive definite, with ro = l.
P4. T. T Rewrite Algorithm 4.7.2110 that it doe. not nquire the vectors z and v.
P4.T.8 Gi"" an algorithm ror computing ~(T•) for J: = l:n.

l
P4.T.9 Suppose A1, A1, As and~ are m-by-m mairiCEII and that.

A -- [ ~A1 ~A;a ~~A.o 1:A1 .


A1 At A3 Ao
Show that there il a permutation mNTix IT such that nT All : C = (C,i) where eech
C,; is a. 4-by-4 circulant matriL
P4. T.lO A p-by-p bl&ck matrix A= (A,;) with m-by-m b!oc.ka is block Toepli~ if thln-
l!ltist. A-p+l, ... ,A-~oAo,A1, .. .• ~-l E R'"""' so that A.i ""Ao-i• e.g.,

~]
Ao At At
A- A-t Ao At
- [ A-2 A-1 Ao A1 '
A-3 A-2 A-t Ao

Ta

ITT All , ; T~1 Tn


[
T_,
where each T,; is p-by·p ILild Toep~ Eech T,, ahould be "'ude upR of (i,j) eDtriea
selected &om the A• Dlll.tricaL (b) What au you My about the T,1 if A., "=' A-.,,
k= l:p -17
P4.7.11 Show how to compute the 8CIIutioll8 to the syste!nS in (4.7.9) giWD th3C the
204 CHAPTER 4. SPECIAL LINEAR SYSTEMS

aol.~
to the.,.... in (4.7.8) ant nai1abla. AMwne thai all the ma&ricel imvlved
oomd.ngular. ~ to dewklp a. ~ w.ymmetrie Toepllt.a eolwr for T: = b
&n!
~ ~ T's 1-.diq pri.ociple mbmatricalan~ a.ll no~.
P4.7.12 A matrix HE e-xn ill HtWca if H(n:- 1:1,:) i1 'lbeplita. Show thai if
A E R'x" ill defined by
a,;; = 1• O»(U) COB(jll)d8

thaD A i1 the sum ol a. Han.lo!l matrix aod Toeplits matrix. Hint. Mllkb 1188 of the
=
i.deutity coe(u + 11) <XIII{u} cos(v) - sin(u} ain(v).
P4.7.13 Varify tha.t F,.C(v):: dia.g(F,.w)F~~.
P4. 7.14 Show that it is poeaiblo!! to embed a l)'llliDI!tric Toeplit:~ matrix into a. symmetric
circulant matrix.
P4.7.15 Consider the kth onl.er Yule-Walkm system T~ou< 11 > = -r<•> tba& aria~!~ in

l
(4..7.1):

T, [ ·;'
Uu
=- [ 7]
r~o

Show that if
1 0 0 0 0
1111 1 0 0 0
l':l2 :11:11 1 0 0
L= W3 1132 1131 1 0

1111-l, .. -1 :U..-1,n-~ l/n-1,11-3 1/n-1,1 1

the~~ LT.. LT = dief!:(l,01 •... ,,6..-I) •here fJ11 = 1 + r<lo)T 1/(J.). Tbus, the Durbin
algorithm can be thought of as a fasi method for computing the LDLT fadorization of
...
'T'-1
~

Note. and R.efenAce. for Sec. 4.7

ADyOJIII who Y'llllt- iDio tbe w.t Toepllta metbod lit.eratUie should flmt rMd

J.R.. Blllldl (1~). "Stability of Mecb.oda for Solving Toeplib S:ya.ems of Equ.ati0011,~
SIAM J. Sci. St4t.. Comp. 6, 34.g.....;J64
for- a dari&eiion of l&abllity u.n-. N. ill true .nth the "£Bet aJgoritluna" ar-ea in general,
uutable Toeplib techniqus abound aDd caution muat be flhl'ciaed. See aiBo

G. Cybenko (1918). MEnor Analyail ol Some Signal~ Algnrithma,~ Ph.D.


~ PrineetoD Unmaiiy.
G. C,..benko (1980). "The Num.ical. St.bility ol the Levineoa-Ourllin Algorithm for
Toeplitz s,..._ of EquM;ioM, • SIAM J. Sci. Gftd St4L Camp. 1, 303-19.
E. L1Dzao (1992). "011 the Sta.bi1ity of Solution Methods br a-d Theplitz System~,~
Un..Alg. 11116 It. ApplicGaicm 170, 1-32.
J.M. Varah. (1994). "Backwvd Error &otimatee foe Toeplitz SyBCema,~ SIAM J. Ma~
Anol. Appl. 15, .(()8-.417.
A.W. Boja.oczyk, R.P. Bnmt, F.IL de Hoos, Uld D.R. S1111111t (1995). "'n tbe S~illty of
'l:le Barei8a aud ~ ThepliU ~ Algorit~~ SIAM J. Ma~ A.RGL
Appl. 16, 40-51.
4. 7. TOEPLITZ AND RELATED SYSTEMS 205

M. Stewvt aDd P. V.a Dooc'ell (1996). "Stability . _ in the F'actorization of Strac-


tured Ma.trics," SIAM J. Jlatris AnaL A,t 18, w appear.
Tbe origiDal refiereD.I::a for the three alpit.h.me dearzibm in tbia -=tioD are:

J. Durbin (1960). '"The Fittinc ol Time s..n- Model.,~ Rev. /rut. /nt. Sl4i. S8 233-43.
N. Leviulon (1941). 'The W«ner RMS Errol- Criterion in Filtm- Design and Pn!diction,"
J. M!Uh. Phf•- 15, 261-78.
W.F. 'I'reacll (1964). ~An Algorithm lor the lnwnion of Finite Toepliu Ma.tnc.," J.
SIAM 1J, 515-22.
A more detailed dtB:ription of the nonsymmetric Trench algoriUun is given in

S. Zohu (1969). '"'I'oepliU: Mairix lnvenion: The Algorithm orW.F. Trench." J. ACM
16, 592-601.
Fui Toeplib system eolving hM aitraeted an enotmoUII amount of attmltion and a sam-
pling of inten.ting algorithmic id.. may be found in

G. Ammar and W.B. Gragg (1988). "Superfalt Solution of Real Poaitive Definite
Toeplitz Systems," SIAM J. Matri: AnaL AppL 9, 61-16.
T.F. Chan and P. Hauen (1992). ~A Look-Ahead Levinaon Algorithm for Indefinite
Toeplita S}'1ltemll," SIAM J. Matri: AnaL AppL 13, 490-606.
D.R. S'WOH (1993). "The Ute of Pi"YotiDg w lmpi"CMt the N1lDHrical. Perlonnallc:e or
Al~t~ritbms for Toeplitl Mamcea,• SIAM J. Ma1:1U Aftlll. App.l. 14, 468-493.
T. Kailath and J. Chun (1994). "'GeneraJized Oilplacement Structure for Block-Toeplitz,
Thepfits-Biock, and TO!!pl.ib-Deriwd. M.atricm," SIAM J. Mal:ri:c AnaL AppL 15,
114-128.
T. Kailath ami A.H. S&yed (1995). "Disp1ecemem Structure: Theory and Applicaiions,"
SIAM Re1Mul 37, 297-386.
Important Toeplits znairix appUcationa are di8cu.ed in

J. Makhoul (1975}. "Linear Prediciiol1: A Thtorial Rmaw," Proc. IEEE 63(4), 561-80.
J. Markel aud A. Gray (1976). LiMar Pred1dlon of Sp«eh, Springer-Verlag, Berlin and
New York.
A.V. Oppenheim (1978). App&:Geioru of Diglt4l Signal Proceuing, Pnmt.ice-Hall, En-
glewood Cill&.

-
ffaAkel mairics are eonatam aloonr Uleir alliidiagpnals and llriea in ~ im.pon;am
G . Heiuig and P. Jaakowlki (1990). "Parallel &ad Supmfut AJ&oriihma for H&Db.l
Sy.tems of Equations," H - . M~Mh. 58, lt»-127.
R. W. Freund and H. Zha (1993). ~A Look-AheM!. AJgoritlun.lor the Solution of a-enJ
IUabl Sywte~m. N N - . .Vc:a.th. 64. 211.5-322.
The DFTfl'oeplit;s/circu.INU CIOIUlection ia ~ in

C.F. Van Loa.a {lDtn). C~ ~ /tw the Fon Founer- 7Fotwfonn,


SIAM Publicailom, Philai:Wphi&, PA.
Chapter 5

Orthogonalization and
Least Squares

§5.1 Householder and Givens Matrices


§5.2 The QR Factorization
§5.3 The Full Rank LS Problem
§5.4 Other Orthogonal Factorizations
§5.5 The Rank Deficient LS Problem
§5.6 Weighting and Iterative Improvement
§5.7 Square and Underdetermined Systems

This chapter is primarily concerned with the least squ8l'es solution of


overdetermined systems of equations, i.e., the minimization of II Ax- b lb
where A E ~xn with m ~nand bE R"'. The most reliable solution pro--
cedures for this problem involve the reduction of A to wriowi canonical
forms via orthogonal transformations. Householder reflections and Givens
rotations are central to this process and we begin the chapter with a d.iscus-
sion of these important transformations. In §5.2 we discUBB the computation
of the factorization A= QR where Q is orthogonal and R is upper trian-
gular. This amounts to finding an orthonormal basis for ran(A). The QR
factorization can be used to solve the full rank least squares problem as we
show in §5.3. The technique is compared with the method of normal equa-
tions after a perturbation theory is developed. In §5.4 and §5.5 we consider
methods for handling the difficult situation when A is rank deficient (or
nearly so). QR with column pivoting and the SVD are featured. In §5.6 we
discuss several steps that can be taken to improve the quality of a computed
207 CHAPTER 5. 0RTHOGONALIZATION AND LEAST SQUARES

least squares solution. Some remarks about square and underdetermined


systems are offered in §5. 7.

Before You Begin


Chapters 1, 2, and 3 and §§4.1-4.3 are assumed. Within this chapter
there are the following dependencies:
§5.6
T
§5.1 - §5.2 - §5.3 -+ §5.4 -+ §5.5
1
§5.7
Complementary references include Lawson and Hanson (1974), Fe.rebrother
(1987), and Bjorclt (1996). See also Stewart (1973), , Hager (1988), Stewart
and Sun (1990), Watkins (1991), Gill, Murray, and Wright (1991), Higham
(1996), Trefethen and Bau (1996), and Demmel (1996). Some MATLAB
functions important to this chapter are qr, svd, pinv, orth. rank, and the
"backsl.ash" operator "\.'' LAPACK connections include

LAPACK: Householder/Givens Tools


_UJIFG Gen111'8W a Householder matrix
.UJIF Householder times mairix
.L.UFl Small n Houaehoider timea matrix
.UBFB Block Houasholdu times malrix
.UilFT Compute. I - VTV H blodc reflector repre!Jentation
.UllTG ~a p1aDe rotation
.LAIICY Generates a vector of plane rotations
.U11.TY Applim a WICtor of pl1111e rotatioDII to a vector pair
.USR Applies rota&ion sequence to a mat.rix
CSRDT Real rotation timm complex vector pair
ClUJT Complex rotUioo (c real) timeli complex vector pair
CUCGV Complex rotatioll (• n.w.l) times complex vector p&ir

LAPACK: Orthogonal Factoriz.ations


.C.EQV A-QR
_GIJ3PF All "'QR
.DJIMQR Q (fi!Gonld Conn) &.i.bJM mat.rix (rm.J. caM)
~tnntQR Q (factored fonn) tirneB ma&rix (complex cue)
~DKQR ~Q (real. ea.)
_UWGQR Genera&M Q (romplex cue)
_GIJIQF A::'!:'!- ~um- tria.ll.guW1focthogooal)
~GELQF A = QL = {orthogo:o.al){lollw ~)
_GEQLF A • LQ = (~ trianguJ.&c)(ortbogonal)
_TZRQF A = RQ where A ill upper t.rapesoidal
.CESVD A•UEvT
_:ensqa SVD of rwl. bidie«oDaliDMrix
_CDJm BJdiasonaliza1ioo of ~ matrix
_DAGBa GeDera&e. tbe onhoconaJ trao..fonnaikoos
.GBBIID Bidlagooallzatioo of band mamx
208 CHAPTER 5. ORI'HOGONALIZATION AND LEAST SQUARES

LAPACK: Least Squares


_GELS Full rank miD 0AX - B ~z,. or mill UAn X - B n,.
_C£LSS SVD aolu'ion to min II AX - B II p
-GELSI Complete orthogonal. decompomioo 10luiioo to miD UAX - B II,.
~Cm:JU F..q.ailibraiea ~ mat:rD: to reduce condition

5.1 Householder and Givens Matrices


Recall that Q E R"xn is orthogonal if QTQ = QQT = In.. Orthogonal
matrices have an important role to play in least squares and eigenvalue
computations. In this section we introduce the key players in this game:
Householder retlections and Givens rotations.

5.1.1 A 2-by-2 Preview


It is instructive to examine the geometry 8SSOCiated with rotations and
reflections at the n = 2 level. A 2-by-2 orthogonal matrix Q is a rotation·
if it has the form
_ ( cos( 8} sin(8) ]
Q - - sio(8) cos(B) ·
If y = QT x, then y is obtained by rotating x counterclockwise through an
angle 8.
A 2-by-2 orthogonal matrix Q is a reflection if it has the form

Q _ [ cos(8) sin(8) ]
- sin( B) -cos( B) ·

U y = QT x = Qz, then y is obtained by reflecting the vector z across the


line defined by
cos(fJ/2) ] }
S = span { [ sin(8/2) ·
Reflections and rotations are computationally attractive becauae they are
easily constructed and because they can be used to introduce zeros in a
vector by properly choosing the rotation angle or the retlection plane.

Example 5.1.1 Suppoee: = [ l, .;3JT. I r - set

cos( -600) sin( -60") ] [ 1/2 -../3/2 ]


Q = [ -sin( -60") cos( -60") = ../3/2 1/2

then QT~ = ( 2, 0 JT. Thus, a. rotuioo of -W' ~the 8e1Xlod COQ1pooent of:. U
cos(30") sin.(30") ] [ ../3/2 1/2 ]
Q = [ llin(30") - CC111(30") = 1/2 -.iJ/2
then QT: = (2, 0 ]T. Thus, by reflecting z acrcw tbe 30" liDe - c:aa zaoo it. -=ood
compooeot.
5. 1. HOUSEHOLDER AND GrvENS MA.TRJC8S 209

5.1. 2 Householder Reftections


Let v e R" be nonzero. An n-by-n matrix P of tbe form

p = 1- __!_ttvT (5.1.1)
vTv

is called a Howelwlder reflection. {Synonyms: Householder matrix, Ho~


bolder transformation.} The wctor v Is called a Hou.tholder vtetor. If a
vector r is multiplied by P , then it 18 reftected in the hyperplane span{v}.L.
It is easy to verify that Hollllebolder matrices are symmetric and orthogonal.
Householder reflections are similar in two ways to Gauss transforma--
tions, which we introduced in §3.2.1. They are rank-1 modifications of the
identity and they can be Wled to zero selected components of a vector. In
particular, suppoae we are given 0 -F x e R" and want Px to be a multiple
of e1 = !.,(:, 1). Note that

and Px e span{e1} imply v e span {x, e l}. Setting v = x + o e1 gives

and

and therefore

Px =

In order for the coefficient of x to be zero, we set a= ±II x ll:z for then

v= :t±llr ll2el => Pr= (1-2:;.:)x=~llrll2e1. (5.1.2)

It is this simple determination of v that makes the Bouaebolder reflection


so useful.

Exaaaple 5.1.2 Ir lll:'- ( 3, 1, 5, lJ'~' and v- ( 9, 1, 5, 1J1'. &baD

p"" /-2wT ...


.,Tv
.!..
S4
[ -~ -~"' -~119 -5
-45
-9 -1 -5
=~
53
l
baa the~ t~ P% = ( - 6, 0, 0, 0, JT .
210 CHAPTER 5. 0RTHOGONALIZAT10N AND LEAST SQUARES

5.1.3 Computing the Householder Vector


There are a number of important practical details 8880ciated with the deter-
mination of a Householder matrix, i.e., the determination of a Householder
vector. One concerns the choice of sign in the definition of v in (5.1.2}.
Setting
tit = Xt - II X 112
has the nice property that Px is a positive multiple of e 1 • But this recipe is
dangerous if x is close to a positive multiple of e1 because severe cancellation
would occur. However, the formula
:&¥-II :r ~~~ -(~ + ... +x!)
til = Xt - ll X lb = :rl + II X 112 = Zt + ll X lt2
suggested by Pa.rlett (19TI) does not suffer from this defect in the :r1 > 0
case.
In practice, it is handy to normalize the Householder vector so that
v(l} = 1. This permits the storage of v(2:n) where the zeros have been
introduced in x, i.e., x(2:n). We refer to v(2:n) as the essential port of
the Householder vector. Recalling that {J = 2/vTv and letting length(:t)
specify vector dimension, we obtain the following encapsulation:

Algorithm 5.1.1 (Householder Vector) Given x E R", this function


fJ E R such that P = In- fJvvT is
computes v E R" with v(l) = 1 and
orthogonal and Px = II x lbe1.
function: [v, fJI = house(x)
n = length(x)
a = x(2:n)T x(2:n)

v = [ x(;:n) )
if (j = 0
/3=0
else
~ = v'x(l)2 +a
if x(l) <== 0
t1(l) = x(l) -~J
else
t1(l) = -u/(x(l) + IJ)
end
{3 = 2v(l) 3 J(a + v{l)3 )
t1 = tljv(l)
end
This algorithm involws about 3n flops and renders a computed Householder
matrix that is orthogonal to machine precision, a concept disc\UISeCl below.
5.1. HOUSEHOLDER AND GIVENS MATRICES 211

A production version of Algorithm 5.1.1 may invotve a preliminary scaling


of the~ vector (z .._ ~111 x II) to awid overflow.

5.1.4 Applying Householder Matrices


It is critical to exploit structure when applying a Householder reflection to
a matrix. H A E wxn and p = I - /3vvT E nmxm, then

PA =(I- {3vvT) A= A -vwT

where w = {3AT v. Likewise, if P = I - {3vvT E R"xn, then

where w:::: {JAv. Thus, an m-by-n Householder update in'IOlves a matrix-


vector multiplication and an outer product update. It requires 4mn ftops.
Failure to recognize this and to treat P as a general matrix increases work
by an order of magnitude. Householder updates never entail the explicit
formation of the Housdwlder matriz.
Both of the above Householder updates can be implemented in a way
that exploits the fact that v(l) "" 1. This feature can be important in the
computation of P A when m is small and in the computation of AP when
n is small.
AB an example of a Householder matrix update, suppose we want to
overwrite A E R"'xn (m ~ n) with B = QT A where Q is an orthogonal
matrix choeen so that B(j + I :m, j) = 0 for some j that satisfies 1 ~ j ~ n.
In addition, suppose A(j:m, l:j- 1) = 0 and that we want to store the
essential part of the Householder vector in A(j + l:m,j). The following
instructions accomplish this task:

[v,PJ = house(A(j:m,j))
A(j:m,j:n) = (Im-;+I- fjwT)A(j:m,j:n)
A(j + l:m,j) = v(2:m- j + 1)

From the computational point of view, we have applied an order m- j + 1


Householder matrix to the bottom m - j + 1 rows of A. However, math&
matically we have also applied the m-by-m Householder matrix

to A in its entirety. Regardless, the "essential" part of the HoUBebolder


vector can be recorded in the zeroed portion of A.
212 CHAPTER 5. 0RTHOGONALIZATION AND LEAST SQUARE'S

5.1.5 Roundoff Properties


The roundoff properties associated with Householder matrices are very £&..
vorable. Wilkinson (1965, pp. 152-62) shows that house produces a House-
holder vector v very Deal' the exact 11. U P = I - 200T jfJT fJ then

II P- Pl12 = O(u)

meaning that Pis ortlwgonal to machine precision. Moreover, the com-


puted updates with P are close to the exact updates with P :

fl(FA} = P(A +E) II E ll2 = O(uU A lh)


Jl(AP) = (A+ E)P II E ll2 = O(ull A !12)

5.1.6 Factored Form Representation


Many Householder based factorization algorithms that are presented in the
following sections compute products of Householder matrices

(5.1.3)

where r :5 n and each vul has the form

vU> = ( ~
0 0 . · · 0 1 vUl
J+l•
. .. • n
vUl)T ·
;-1

It is usually not necesaa.ry to compute Q explicitly even if it is involved in


subsequent calculations. For example, if C E m_nxq a.nd we wish to compute
QT C , then we merely execute the loop

for j = l:r
C=Q;C
end
The storage of the Householder vectors v< 1>· · · v(r) and the corresponding
/J; (if convenient) amounts to a factored form representation of Q. To
illustrate the economies of the factored form representation, suppose that
we have an array A and that A(j + l:n,j) houses vU>(j + l:n), the essential
part of the jth Houooholder vector. The overwriting of C E R"x 11 with
QT C can then be implemented as follows:

for j = l:r
v(j:n) = [ A(j +\:n.,j) ] (5.1.4)

C(i:n, :) =(I- {J1v(j:n)v(j:n)T)C(j:n, :)


end
5.1. HOUSEHOLDER AND GIVENS MATRIC&':i 213

This involves about 2qr(2n - r) Hops. H Q is explicitly represented as an


n·by-n matrix., QT C would involve 2n2 q Hops.
Of course, in some applications, it is necessary to explicitly form Q
(or parts of it). Two possible algorithms for computing the Householder
product matrix Q in (5.1.3) are /Of1JKJrd accumulation,

Q= In
for j = 1:r
Q=QQ;
end

and backward acct~mulaticn.,

Q =In
for j = r:- 1:1
Q=Q;Q
end

Recall that the leading (j - 1)-by- (j - 1) portion of Qi is the identity. Thus,


at the beginning of backward accumulation, Q is "mostly the identity" and
it gradually becomes full as the iteration progresses. This pattern can be
exploited to reduce the number of required flops. In contrast, Q is full
in forward accumulation after the first step. For this reason, backward
accumulation is cheaper and the strategy of choice:

Q =In
for j = r: - 1:1

v(j:n)= [ A(j +\:n,j) ] (5.1.5)


Q(j:n,j:n) = (!- fJ;v(j:n)v(j:n)T)Q(j:n,j:n)
end

This involves about 4(n2 r- nr 2 +r /3) flops.

5.1. 7 A Block Representation


Suppose Q = Q1 · · · Qr is a product of n-by-n Householder matrices as
in (5.1.3). Since each Q; is a nwk-one modification of the identity, it
follows from the structure of the Householder vectors that Q is a rank-r
modification of the identity and can be written in the form

(5.1.6)

where W and Y Bl'e n-by-r matrices. The key to computing the block
repre.sentation (5.1.6) is the following lemma.
214 CHAPTER 5. 0RTHOGONALIZATlON AND LEAST SQUARES

Lemma 5.1.1 Suppoae Q = I+ wyT is Bn n-by-n orthogonal matnZ with


W, Y E R"x.:l. If P =I- {JwT with v E R" and z = -{JQv, tlw!n

Q+=QP=l+W+Y.f
where w+ = [ w z 1andY+ = [ Y v 1are each n-by-(j + 1).
Proof.
QP = (I+ WYT) (I - {JvvT) = I+ WYT - f3Qm7
= I + WYT + ZVT = I + [ w z l [ y v IT D

By repeatedly applying the lemma, we can generate the block representa-


tion of Q in (5.1.3) from the factored form representation as follows:

Algorithm 5.1.2 Suppose Q = Ql · · · Q .. is a product of n-by-n House-


bolder matrices as described in (5.1.3). This algorithm computes matrices
W, Y E R"xr such that Q =I+ wyT,
y = tl(l)
W = -.Blv(l)
for j = 2:r
z = -/3;(1 + WYT)vUl
W={W z]
Y = [Yv(j)]
end
This algorithm involves about 2r2 n - 2r
/3 flops if the zeros in the vW are
exploited. Note that Y ia merely the matrix of Householder vectors and is
therefore unit lower triangular. Clearly, the central task in the generation
of the WY representation (5.1.6) ia the computation of theW matrix.
The block representation for products of Householder matrices is attrac-
tive in situations where Q must be applied to a matrix. Suppose C E R'xq.
It follows that the operatiOn
c- QTc =(I+ WYr)Tc = c + Y(wTc)
is rich in level-3 operations. On the other b&Dd, if Q is in factored form,
qr C is just rich in the level-2 operations of matrix-vector multiplication
and outer product updateJ. Of course, in this context the distinction be-
tween level-2 and lewl-3 diminishes as C gets narrower.
We mention that the "'WY" representation is not a generalized House-
bolder transformation from the geometric point of view. Ttue block reflec-
tors have the form Q = I- 2vv"T where V E R'xr satisfies vTV = I,..
See Schreiber and Parlett (1987) and also Schreiber and Van Loan (1989).

Example 15.1.3 [f n "' 4, r = 2, and { 1, .6, 0, .8 JT and ( 0, 1, .8., .6JT e.re the
5.1. HOUSEHOLDER AND GJVENS MATRJCBS 215

Q(h
1
1.
.. • +WY1' ii
l
" + [
-1
-6
0
- .8
1.080
- .352
- .800
.:liU
l [ 1
0
.6
1
0
.8
.8]
.6 .

5.1.8 Givens Rotations


Householder re8ectiona are exc••iingly useful ror
introducing zeros on a
grand SC8le, e.g., the annihilation of all but tbe first component of avec-
tor. However, in ealeulations where it is neoesaary to zero elements more
!electively, Giveru roCGtion.r are the transformation of choice. These are
rank· two correctiooa to the identity of tbe form

1 0 0 0

0 c ... s ... 0
G(i,k,8) = (5.1.7)
0 -6 ... c 0 k

0 0 0 1

i k

where c = cos(9) and 8 = siD(8) for some 9. Givens rotations are clearly
orthogonal.
Premultiplicatlon by G(i, k,9)T amount& to a counterclockwise rotation
of 8 radians in the (i, k) coordinate plane. Indeed. u z e R" and '!I =
T .
G(i,k,8) z, then

1/J = { :: :: ; : ~ i
z; j ~ i, /c

From theae formulae it ill clev that ,.., cao bee ~ to be zero by setting

(5.1.8)

Thus, it ill a 8lmple matter to zero a apecl6ed eDtry in a vector by using a


Givens rotation. In practice, there are better ways to compute c aDd 8 than
(5.1.8). The following algorithm. for example, guards against overflow.
216 CHAPTER 5. ORI'HOGONALIZATION AND LEAST SQUARES

Algorithm 5.1.3 Given aca1a:rs a and b, this function computes c =COl!!( B)


=
and s sin( 9) so

[_; :r[:
function: [c., s) = givens( a, b)
J= [ ~ J .
ifb=O
c= 1; s =0
else
if lbl > lal
r=-afb; s=lf~; c=s-r
else
r = -bfa; c = 1/v'f'+'?; s = cr
end
end
This algorithm requires 5 flops and a single square root. Note that it does
not compute 8 and so it does not involve inverse trigonometric functions.

Example 5.1.4 If z = {1, 2, 3, fJT, coe(9) = 1/../5, and sin(9) = -2/../5, then
G(2,4.,9)z:: [1, ..120, 3, oJT.

5.1.9 Applying Givens Rotations


It is critical that the simple structure of a Givens rotation matrix be ex~
plaited when it is involved in a matrix multiplication. Suppose A E 1R.mxn,
c = cos(B), and s = sin(B). If G(i, k, 8) e Rmx ... , then the update A ,_
G( i, k, 8)T A effects just two rows of A,

8
A([i, k), :) = [ c ] T A([i, k), :)
-s c
and requires just 6n flops:
for j = l:n
Tt = A(i,j)
= A(k,j}
7'2
A(l,j) = CTJ. - 872
A(2,j) = STJ. + cr2
end
Likewise, if G(i,k,S) E R"x .. , then the update A+- AG(i,k,B} effectB just
two colum.os of A,

A(:, [i, kl) = A(:, [i, k]) [ -~ : ]

and requires just 6m .Bops:


5.1. HOUSEHOLDER AND GrvENS MATRICES 217

for j = 1:m
= A(j,i)
-r1
= A(j,k)
-rl
A(j,i) = cr1- S"'"l
A(j, k) = "'"• + C1"J
end

5.1.10 Roundoff Properties


The numerical properties of Givens rotations are 88 favorable 88 those for
Householder reflections. In perticular, it can be shown that the computed
c and s in givens satisfy
c = c{l + ec) fc = O(u)
§ = s(1 +e.) E. = O(u).
If c and § are subsequently UBed in a Giveos update, then the computed
update is the exact update of a nearby matrix:
fl[G(i, k, fl)T AJ = G(i, k, 8f(A +E) II E !12 ~ ull A Ill
fl[AG(i, k, 0)} = (A+ E)G(i, k, 6) II E IJ:;~ ~ ull A Ill·
A detailed error analysis of Givens rotations may be found in Wilkioson
(1965, pp. 131-39).

5.1.11 Representing Products of Givens Rotations


Suppose Q = G1 · · · Ge is a product of Givens rotations. AJJ we have seen in
cormection with Householder reftections, it is more economical to keep the
orthogooa.l matrix Q in factored form than to compute explicitly the prod-
uct of the rotations. Using a. technique demonstr&ted by Stewart (1976),
it is possible to do this in a very compact way. The idea is to 8Sii0ciate a
single floating point number p with each rotation. Specifically, if

z = [
-s
e s]
e
then we define the scalar p by
ifc=O
p=l
elseif lsi < lei
p = sign(c)s/2 (5.1.9)
else
p = 2sign(s)/c
end
218 CHAPTER 5. ORI'HOGONALIZATION AND LEAST SQUARES

Essentially, this amounts to storing s/2 if the sine is smaller and 2/c if the
cosine is smaller. With this encoding, it is possible to reconstruct ±Z as
follows:
ifp=l
c=O;s=l
elseif IPl < 1
s=2p;c=~ (5.1.10)
else
c= 2/ p; 8 = v'f"="'&
end

That -Z may be generated is usually of no consequence for if Z zeros a


particular matrix entry, so does -Z. The reason for essentially storing the
smaller of c and a is that the formula v"f'=? renders poor results if x is
near unity. More detaila may be found in Stewart (1976). Of course, to
'"reconstruct" G(i, k, 9) we need i and k in addition to the associated p.
This usually poses no difficulty as we discuss in §5.2.3.

5.1.12 Error Propagation


We offer some remarks about the propagation of roundoff error in algo-
rithms that involve sequences of Householder/Givens updates. To be pre-
cise, suppose A = Ao E m.m x n is given and that matrices A 1, ••. , Ap = B
are generated via the formula

k = l:p.
Assume that the above Householder and Givens algorithms are used for
.
both the generation and application of the Q11 and z~. Let O~r and Z 11 be
the orthogonal matrice~ that would be produced in the absence of roundoff.
It can be shown that

B = (Qp · · · Qt)(A + E)(Zt · · · Zp), (5.1.11)

where II E 1!2 $ cull A l12 and c is a constant that depends mildly on n, m,


and p. In plain English, B is an exact orthogonal update of a matrix near
to A.

5.1.13 Fast Givens Transformations


The ability to introduce zeros in a selective fashion makes Givens rotations
ao important zeroing tool in certain structured problems. This hBS led to
the development of "fast GivellB" procedures. The fast Givens idea amounts
to a clever representation of Q when Q is the product of Givens rotations.
5.1. HOUSeHOLDER AND GIVENS MATRICES 219

In particular, Q is represented by a matrix pair (M, D) where JdT M = D:;;;:


diag(dt) and each dt is positive. The matrices Q, M, and Dare connected
through the formula

Q = MD- 112 = Mdiag(1fv'(4).

Note that (MD- 112 )T(MD- 112 ) =


D- 112DD- 112 = I and so the ma.-
trlx M n- 12 is orthogonal. Moreover, if F is au n-by-n matrix with
1

FT DF = n_ diagonal, then M",_.M_ _. :;;;: Dnc,. where Mnew = M F.


Thus, it is possible to update the fast Givens representation {M, D) to ob-
tain (Mnew 1 D...,.,). For this idea to be of practical interest, we must show
how to give F zeroing capabilities subject to the constraint that it ~keeps"
D diagonal.
The details are best explained at the 2-by~2 level. Let x = [xt x2]T and
D = diag(d1,~) be given and assume that d1 and d2 are positi~. Define

f3t 1 ] (5.1.12)
[ 1 Clt

and observe that

and
T [ d2 + /Jrdt dd3t + d2a1 ]
Ml DMt = dtf3t +~at dt +or~ :::: Dt .
If x2 "# 0, 0:1 = -xt!x2, and f31 = -a1d2/dt, then

M 1T X _
-
[ X2(l + 'Tl)
0
]

M'r DM1 = [ d2(l + '11) 0 ]


1
0 dt ( 1 + I'd
where 1'1 = -at/31 = (~/dt)(x1/x2) 2 .
Analogously, if we 888ume x 1 # 0 and define M2 by

{5.1.13)

where a2 = -x2/x1 and {h = -(dt/d:~)a2, then

M[ x = [ :tt (1; 'Y2) ]


and
M.2TDM:~ _ [ dt(1 +"b)
- 0
0
~(1 +"'1'2)
] ___ D
~h
220 CHAPTER 5. O!n'HOGONALlZATJON ANO LEAST SQUARES

where 72 = -o2P2 = (dt/d2)(:2/:1)3.


It La ea:ry to show tbat Cor either &= 1 or 2, the matrix J"" D 112 M~D;tt'J
is orthogonal and that it is designed so that the second component of
.fT(D- 112:) is zero. (J may actually be a reflection and thus it is half-
correct to use the popul&r tenn "fast Givens.")
Notice that the 1'i satisfy 7t72 = 1. Thus, we can always select M, in
the above so that the "growth factor" (1 + 'Yo) is bounded by 2. Matrices
of the form

that satisfy -1 S a;p, $ 0 &re 2aby·2 fast Givens tron.tformati0114. Notice


that premultipUcation by a fast Givens traD:Jformation involves half the
number of multiplies as premultiplication by a.o "ordinary" Givens trans-
formation. Also, tbe zeroing is eanied out without an explicit square root.
In the n.-by-n ease, everything "scales up" as with ordinary Givens r~
tations. The '"type I " transformations have the form

1 0 0 0

0 {J 1 0 i
F(i,k,a,{J) = (5.1.14)
0 I ·· · a 0 k

0 0 0 I

i k
while the "type 2" trBDBformations are structured as follows:

1 0 0 0

0 1 0 ... 0 i
F(i,k,a,jJ) "" (5.1.15)
0 {J 1 0 k

0 0 0 1
k
EDC8p8ulating all this we obtain

Algorithm 5.1.4 Given :r e R 2 and positive de R 2 , the following al-


gorithm computes a 2-by-2 fast Givens transformation M such that the
5.1. HOUSEHOLDER AND GIVENS MATRICES 221

second component of MT z is zero and MI' D M = D 1 is diagonal where D


= dlag(dt,d.z). If type= 1 then M has the form (5.1.12) while if type= 2
then M has the form (5.1.13). The diagonal elements of Dt overwrite d.
function: ra, P. twe 1== rast.givens(x, d)
if x(2) ¥- 0
a= -x(1)/x(2); {J = -ad(2)/d(1); 7 = -a/3
if,. ::5 1
twe= 1
T = d(1}; d(l) = (1 + 1')d(2); d(2) = (1 + 'Y)T
else
type= 2
a= 1/a; f)= lf{J; 1' = 1/7
d(1) = (1 + -y)d(l); d(2) = (1 + -y)d(2)
end
else
type= 2
a= 0; (3 =0
end
The application of fast Givens transformations is analogous to that for
ordinary Givens transformations. Even with the appropriate type of trans-
formation used, the growth factor 1 + "( may still be as large as two. Thus,
2• growth can occur in the entries of D and M after s updates. This means
that the diagonal D must be monitored during a. fast Givens procedure to
avoid overfl.ow. See Anda. and Park (1994) for how to do this efficiently.
Nevertheless, element growth in M and D is controlled becaUBe at all
times we have M n- 112 orthogonal. The roundoff properties of a fast givens
procedure are what we would expect of a Givens matrix technique. For ex-
ample, if we computed Q = fl( Mb- 112 ) where M and b are the computed
M and D, then Q is orthogonal to working precision: II (JT{J- I ll2 ~ u.

ProbleiWI

PS.l.l Execute bouse with r::: ( 1, T, 2, 3, -l]T.


P5.1.2 Let :.: and v be IIOQSerO vecton in R'". Give an alsortchm for daermining a
HOUIIebolder matrix P lll1Ch tba& P-z: ill a multiple ol V·
P5.1.3 Suppoae -z: E c- and that 'Zt = 1-z:tle~• with 9 E R. A.ume -z: :/: 0 and
define u = 'Z: + e""U ::r ll2e1 Show thai P = I - 2uuH fuHu il unitary 8lld that
Pz = -e,.n ~ ll2e1.
P5.1.4 Use HoUMbolder matricell; to show tba& det(l + rvr) = 1 + zTv where -z: and "
az-e gi.~ n-vecwra..
P5.1.5 Suppoae z E o1. Gi11e an algombm for detennilling a unitary matrix of the
fonn
Q:o [ _,c 'J
e
such that the eecond componeai of QH r ill zero.
222 CHAPTER 5. 0RTHOGONALIZATION AND LEAST SQUARES

P15.1.0 Suppoee ::~: aDd ¥are unit vee\on in R'". GMI a.a. algorithm W1in1 Giwoa
tn.uafannatioD. which computlll an orthQsmlal Q IIUCb that QT:: = V·
PIS.l. T Detenmne c = cos(8) aud s =sin(B) lllldl that

P5.1.8 Sup~ &haC Q =I+ YrrT ia orthogona.l when! Y E R'xi ud T E p,JxJ is


uppc~r triangular. Show &haC if Q+ =: QP where P = I - 2w~vT v is & Houaebolder
matrix, then Q+ can be exp~ in tbe foi'III Q+ =I+ Y+T+Y+ where Y+ E R'x(j+l)
aod T+ E RJ+l)xU+l) Is upper- trianguhlc.
Pl5.1.9 Give "' detailed implementation of Algorithm 5.1.2 with the ununption that
vW(j+1:n), the -miai pan ofthetbe jth HoUI8holdervector, iutond in A(j+I;n,j).
Sinoe Y ill eff(lct.ively ~ted in A, your prooedUR need only eet up the W matrix.
P5.1.10 Show t~ if Sill sbw....ymmetric (ST = -S), then Q =(I+ S)(I- S)- 1 Is
orthogonal. {Q is called the Cayjqr trrJm/orm of S.) Construct a Olllk-2 S 110 that if :c
ill a vector then Q:~: is zero except in the first component.
P5.1.11 Suppoae p E R>X" satisfiea upT p- r.. ll:t = ~ < l. Show tb&t all the singular
vatu• of Pare in the interval (1- E1 1 +Ej aod that II P - UVT lb $ E whe.-e P = UEvT
is &be SVD of P.
P5.1.12 Suppose A E a:tx 2 . Under what conditioDII is the cloaeBt rotation to A dl;m8l'
thao the cloeeBt reflection to A?

Note. and References for Sec;. 5.1

Ho~dm matriC8!1 aa n&IPSd alter A.S. Houaebolder, wbo popularized their use in
numerical analysis. HO\WI'\I'er, the propertlai o£ these matricea have been known for quite
IJOPHI time. See

H.W. Turnbull and A.C. Aitbn (1961). An /nirodudion t.o the Theory of Canonical
Matricu, Dover Publicationa, N- Yock, pp. 102-5.
Other refen!nc- concerned with Houaeholder tl'alliJformatlona include

A.R. Gourlay (lm>). "G~n of E.leuwmi.aey HenniiiaD Matric:el!l,~ Comp. J.


13, 411-12.
B.N. Parlett (1971). "AnalysiJ of Algorithma for Retlectio011 in Biaectom," SIAM Review
13, lg,7-208.
N.K. Taao (1975). • A Note on Implementing the Ho11118holdes- Transfonnationa. ~ SIAM
J. Num. Anal. lf, 53-58.
B. DaaJoy (11176). •on the Choace ol Sip8 f<W HoUIIebokler Ma.t.rio!e, n J. Comp. Appl.
Math. f!, 67-69.
J.J.M. CuppeD (1984). •on UpdMiog Triangulal' Product& of HoWII!Iholder M&tricee, ~
Nvnu:r. M!Uh. 4$, 4Q3....410.
L. Kaufman (1987). "The Geuera1ized HoUMiholder 'l\-ansfonnation and Span~~~ Matri-
ce.," Lin. A!g. and IC.. App~ 90, 221-234.
A detailed error ana.lysia of HoU811Jbolder traDIIfonnatioDII ill gi.veu in t-.on a11d Hanaon
{11174, 83-89).
The buic refenmCfll for block Hmmebolder ~~ and the ~ com-
putatiDDII i.Dclude

C.H. Billchof and C. Van Loan (1987). "The WY Represemation fo~ PmduettJ of Houae-
holder M!!oirices," SIAM J. Sci. and Stat. Camp. 8, s2""1113.
5.2. THE QR FACTORIZATION 223

R. Sochnibolll' aod B.N. Parleu (1987). "Block IWieciors: Th!QY aDd Computatio!l,~
SIAM J. Nt.mur. An.al.. !5, 189.205.
B.N. Plll'iett and R. Schreibet (1988). "Block RAiflecton: Thecly ud Compnta.tkm,"
SIAM J. Hum. AlldL 16, 189-l06.
R.S. Schreiber .ad C. Vall LoaD (1989}. "A Stocap--Efficient WY R.epr.entMion foc
Product. of Hotueho1del- 'I'rallsformaiio011," SIAM J. Sci. and Stat. Comp. 10,
52-57.
C. ~ ( 1992). •Modificai:ion of the H1>1J81!boMW Method a-1 oo tbe Compact WY
~." SIAM J. Sci and St.GL Camp. 13, 123---126.
X. Su.a. and C.H. BiBcbof (1995). "A BMia-Kentel ~t.ati.on of Orthogonal Ma&ri-
a.," SIAM J. Ma.triz A-'. AppL 16, 1184-1196.
GiWD~~ rot.atiou.l, IIIUDI!Jd after- W. Giveu, are allo referred to • Jacobi rota&iDns. Jacobi
devised a S)'111ID8bic eigeovalue a.lgori\hm ~ on thMe traoafonnatio011 in 1846. See
§8.4. The GiWIDII rotation atorap IICheme dillcw.ed in tbe text ia detailed in

G.W. S~ (1976). "Tbe Economical St<Jnp of Plane Rocatiou.," NVMr. Mlloth..


!5, 137-38.
Faai Given~ tr-anafonn.UioDS are a18o refernd to u "8QU&l9--root-free" Giwns tranafo~­
mationa. (Recall that a ~ root muat ordinarily be computed during the formation
of GiveDfl ti"8DSfocmat.ion.) There are several ways faat Giveru~ calculatioM can be ar-
ranged. See

M. Gentleman (1973). "Leui Squatl!lll Computations by Givene Tranafonnationa without


Squr.ce Roots," J. Imt. Math. AppL U!, 3.36.
C.F. Van LoaD (1973). "Geaeralised Singui.N- Values With Algoritbm. aod Appl.ica--
tioDS: Ph.D. thellia, UniWII'IIity of Michigan, Ann Arbor.
S. Hammacling (1974). ~A Note on Modificaii.oDS to the G i - Plaue Rotation," J.
1-rut. Math. Appl. 13, 21~18.
J.H. Wilkinson (19TT). "Some Recent Advances in Nl1llll!rica.l Linear Algebra_" in The
State of t.M Art in N~meric.Gl AnalvN, ed. D.A.H. Jacobe, Academic Preas, New
York, pp. 1-53.
A.A. Anda and H. Pack (lW4). "Fan Plane Rota&io1111 with Dynamic Sca.ling,ft SIAM
J. Mo.triz AnaL AppL 15, 162-174.

5.2 The QR Factorization


We now show how Householder and Givens transformations can be used to
compute various factorizatiooa, beginning with the QR factorization. The
QR factorization of an m-by-n matrix A is given by
A=QR
where Q E R"'xna is orthogonal andRe R"'x" is upper triangular. In thia
section we assume m ~ n. We will see that if A has full column rank,
then the first n colUDUlS of Q form an orthonormal basis for ran( A). Thus,
calculation of the QR factorization is one way to compute an orthonormal
basis for a set of vectors. Thls computation can be arranged in several ways.
We give methods based on Householder, block Householder, Givens, and
fast Givens transformations. The Gram-Schmidt orthogonaJ.ization process
and a numerically more stable varia.nt called modified Gram-Schmidt are
also discussed.
224 CHAPTER 5. ORI'HOGONALIZATION AND LEAST SQUARES

5.2.1 Householder QR
We begin with a QR factorization method that utilizes Householder trans-
formations. The essence of the algorithm can be conveyed by a small ex·
ample. Suppose m = 6, n = 5, and assume that HoWieholder matrices H 1
and H2 have been computed so that
X X X X X
0 X X X X

0 0 131 X X
H2H1A = 0 0 131 X X
0 0 131 X X
0 0 131 X X

Concentrating on the highlighted entries, we determine a Householder ma-


trix fl3 E Jl'X 4 such that

If H3 = diag(I:z, H3), then


H, [ ~ l [~ l =

X X X X X
0 X X X X
0 0 X X X
H3H:zH1A = 0 0 0 X X
0 0 0 X X
0 0 0 X X

After n such steps we obtain an upper trian.gular H.. H.. _ 1 • • • H1 A =Rand


so by setting Q = H 1 ···H.. we obtain A = QR.

Algorithm 5.2.1 (Householder QR) Given A e R"'x" with m ~ n,


the following algorithm finds Householder matrices H 1 , ••• , H .. such that if
Q = H 1 · ··H.. , then QT A= R is upper triangular. The upper triaugula.r
part of A is overwritten by the upper triangular part of R and components
j + l:m of the jth Householder vector are stored in A(j + l:m,j),j < m.

for j = l:n
[v, PI = house(A(j:m,j))
A(j:m,j:n) = (Im-,;+1- .BvvT)A(j:m,j:n)
if j < m
A(j + l:m,j) = v(2:m- j + l)
end
end
5.2. THE QR FACTORIZATION 225

This algorithm requires 2n2 {m- n/3) flops.


To clarify how A is overwritten, if
(j)
v (j) = [ O, ... ,O,l,v;+t•···•vm
.-....--
U)
lT
.:i-1

is the jth Householder vector, then upon completion

rn T12 TlJ ru T15


(1)
v2 rn T23 r:u r25
(1) (2)
v3 v3 T:J3 r:w T3,5
A ;;
(1)
v4
(2)
v,.
(3)
v_. r._. T.f6

v<1l (2) v(3) {4)


5 v~ 5 Vs T!j6

v~1) v6
('1) (3)
v6
{4)
v6 ti~S)

If the matrix Q = H 1 · · · Hn is required, then it can be accumul&ted using


(5.1.5). This accumulation requires 4(m 2n- mn 2 + n 3 f3) flops.
The computed upper triangular matrix R is the exact R for a nearby A
in the sense that zl'(A+E) = R where Z is some exact orthogonal matrix
and II E ll2 ~ ull A ll2·

5.2.2 Block Householder QR Factorization


Algorithm 5.2.1 is rich in the level-2 operations of matrix-vector multi-
plication and outer product updates. By reorganizing the computation
and uaing the block Bouaebolder repreaentation discwmed in §5.1.7 'Wt! can
obtain a level-3 procedure. The idea is to apply clusters of Householder
transformations that are represented in the WY form of §5.1.7.
A small example illustrates the main idea. Suppose n = 12 and that
the "blocldog parameter" r has the value r = 3. The first step .1.8 to gener-
ate Householders Ht, H 2 , and H3 as in Algorithm 5.2.1. However, unlike
Algorithm 5.2.1 where the H, are applied to all of A, we ollly apply Ht,
H:z, and Ha to A(:,l:3). After this is accomplished we generate the block.
representation H1H2Ha = l + W1 Yt and then perform the level·3 update

A(:, 4:12) = (I+ WYT)A(:, 4:12).


Next, we generate H,, H6 , and H 6 aa in Algorithm 5.2.1. However, these
transformations are not applied to A(:, 7:12) until their block representation
H4Hr.H6 =I+ W2Y{ is found. This illustrates the general pattern.
226 CHAPTER 5 . Oltl'JIOCONALlZATlON AND LEAST SQUAflES

.\ = 1; k- 0
while.\~ n
::smin(.\ + r - 1, n); k k + 1
1" =
Using Algorithm 5.2.1, upper triaugularize A(A:m,.\:n)
generating Houaeholder matrices H,., ... , H,.. (5.2.1)
Use Algorithm 5.1.2 to get t.he block repreeeDtation
I+ W•Y• = H,., . . . ,H,..
A(.\:m. -r + l:n) = (1 + W1l'i)TA(.\:m , r + l :n)
.\::zT+l
end
The zero-nonzero structure of the Houaehok!er vectors that define the ma-
trices H", . .. , H,. implies that the first .\ - 1 rows ofw.
and Y.a. au zero.
This fACt would be exploited in a practical implementatioo.
The proper way to regard (5.2.1) is through the partitioning

A = {At, . .. ,Ard N :: ceU(n/ r)

where block column A• is procesaed during the ktb step. In the ktb step of
(5.2.1), a block Householder is formed that zeros the subdiagonal portioo
of A~c. The remaining block oolUIDilS are then updated.
The roundoff properties of (5.2.1) are essentially the same as those Cor
Algorithm 5.2.1. There is a slight i.ncreue in tbe number of Bops required
because of theW-matrix computatious. However, as a result of the block-
ing, all but a small fraction of the flope occur in the context of matrix mul-
tiplication. In particular, the level-3 fraction of (5.2.1) is approximately
1- 2/ N. See Bischof aod Van Loan (1987) for further details.

5.2.3 Givens QR Methods


Givens rotatiooa can alao be Uled to compute the QR factorization. The
4-by-3 cue iUu.tratea the general idea:

u n~r~ ~ l~ [~ n~
X X X
X X X
)( )( )(

X X X

[~ ~ l~ [~ ~ l~ [~ ~ ]~R
X X X
X X X
X X 0
X 0 0
Here we haw highlighted the 2-vectors that deti.oe the underlying G ivewJ
rotatioos. Clearly, if G1 denotes the jth Givens rotation iD tbe reduction,
then QT A ""'R is upper triangular wbere Q = G 1 • • · Gc and t is tbe total
5.2. THE QR FAcroRIZATJON 227

number of rotations. For general m and n we ha....e:

Algorithm 5.2.2 (Givens QR) Given A E R"xn with m ~ n, the fol-


lowing algorithm overwrites A with QT A= R, where R is upper triangular
and Q is orthogonal.

for j = l:n
for i = m: - 1:j + 1
[c,s) = givens(A(i -l,j),A(i,j))
A(i -l::i,j:n) ;; [ c s
-s c
]T A(i- l:i,j:n)
end
end
This algorithm requires 3n 2 (m- n/3) Oops. Note that we could use (5.1.9)
to encode (c, s) in a. single nwnber p wltich could then be stored in the zeroed
entry A(i,j). An opera.tion such as x ~ QT x could then be implemented
by using (5.1.10), taking care to reconstruct the rotations in the proper
order.
Other sequences of rotations ca.n be used to upper triangula.rize A. For
example, if we replace the for statements in Algorithm 5.2.2 with

for i = m: - 1:2
for j = l:min{i- 1, n}

then the zeros in A a.re introduced row-by-row.


Another parameter in a Givem QR procedure concemB the planes of
rotation that are involved in the zeroing of each C;j. For example, instead
of rotating rows i- 1 and ito zero a;1 as in Algorithm 5.2.2, we could use
rows j and i:

for j = l:n
for i = m: - l:j + 1
[c,s) = givens(A(j,j),A(i,j))
A{[ji],j:n) = [ c
8 ]T A([jij,j:n)
-s c
end
end

5.2.4 Hessenberg QR via Givens


As an example of how Givens rotations can be used in structured problems,
we show how they can be employed to compute the QR factorization of an
upper Hessenberg matrix. A smaU example illustrates the general idea.
228 CHAPTER 5. 0RJ'HOGONALIZAT10N AND LEAST SQUARES

Suppose n = 6 and that after two steps we have computed


X X X X X X
0 X X X X X
0 0 X X X X
G(2, 3, 82)T G(l, 2, 8l}TA = 0 0 X X X X
0 0 0 X X X
0 0 0 0 X X

We then compute G(3, 4, 93 ) to zero the current (4,3) entry thereby obtain-
ing
X X X X X X
0 X X X X X
0 0 X X X X
G(3,4,83)TG(2,3,9"l)TG(l,2,1h)T A =
0 0 0 X X X
0 0 0 X X X
0 0 0 0 X X

Overall we have

Algorithm 5.2.3 (Hessenberg QR) If A E .n.nxn is upper Hessenberg,


then the following algorithm overwrites A with QT A = R where Q is or-
thogonal and R is upper triangular. Q = G1 · · · Gn-1 is a product of Givens
rotations where G1 has the form G1 = G(j,j + 1,8;)-
for j = L:n -1
[ cs l = givens(A(j,j),A(j + l,j))
A(j:j + l,j:n) = [ c
3 ]T A(i:j + l,j:n)
-s c
end
This algorithm requires about 3n2 flops.

5.2.5 Fast Givens QR


We can U8e the fast Givens transformationa described in §5.1.13 to compute
an (M, D) representation of Q. In particular, if M is nonsingular and D
is diagonal such that MT A = T is upper triangular and A(T M = D is
diagonal, then Q = MD- 112 is orthogonal and QT A = D- 112T -:= R is
upper triangular. Analogous to the Givens QR procedure we have:

Algorithm 5.2.4 (Fast Giveos QR) Given A E Rmxn. with m ~ n, the


following algorithm computes noDSingular ME ~rxm and positive d{l:m)
such that MI' A= Tis upper triangu.Jar, and AfT M = diag(dt. ... , d,.). A
is overwritten by T. Note: A = (MD- 112 )(D 112 T) is a QR factorization
of A.
5.2. THE QR FACTORIZATION 229

fori"" l:m
d(i) = 1
end
for j = l:n
for i = m: - l:j + 1
[ et, {3, type J = fast.givens(A(i- l:i,j), d(i- l:i))
if type= 1
A(i -l:i,j:n) = [ ~ ! r A(i- l:i,j:n)

end
else

A(i- l:i,j:n) =[~ ~ r A(i -l:i,j:n)

end
This algorithm requires 2n2 (m- n/3) flops. AE we mentioned in the pre-
vious section, it is necessary to guard against overflow in fast Givens algo-
rithms such as the above. This means that M, D, and A must be periodi-
cally scaled if their entries become large.
If the QR factorization of a narrow band matrix is required, then the
fast Givens approach is attractive because it involves no square roots. (We
found LD LT preferable to Cholesky in the narrow band case for the same
reason; see §4.3.6.) In particular, if A E Rmxn has upper bandwidth q and
lower bandwidth p, then QT A = R has upper bandwidth p + q. In this
case Givens QR requires about O{np(p+q)) flops and O(np) square roots.
Thus, the square roots are a significant portion of the overall computation
if p, q <t: n.

5.2.6 Properties of the QR Factorization


The above algorithms "prove" that the QR factorization exists. Now we
relate the colwnns of Q to ran( A) and ran(A).L and examine the uniqueness
question.
Theorem. 5.2.1 If A = QR is 4 QR factarizo.tion of a full column mnk
A E R"'xn and A = [ a 11 ... , a..] and Q = [ q1, ••• , q... ] are column parti-
tioni~, then

k= l:n.

ln particular, if Q1 = Q(l:m, l:n) and Q'l = Q(l:m, n + l:m) then


ran(A) = ran(Qt)
ran(A).L = ran(Q:z)

and A= Q1R1 with R1 = R(l:n., l:n).


230 CHAPTER 5. ORI'HOGONALIZATION AND LEAST SQUARES

Proof. Comparing kth colUDlDB in A= QR vre conclude that

a,. ;; L• r,,.q. E span{q1, ... ,q,.} . (5.2.2)

Thus, spao{alo····a,.} s; span{ql, ... ,q~:}. However, since rank(A);;


n it follows that span{ at, ... , a,.,} has dimension k and so must equal
span { q1, •.. , q,.} The rest of the theorem follows trivially. D

The matrices Ql = Q(l:m, l:n) and Q'l = Q(l:m, n + l:m) can be easily
computed from a fa.ctored form representation of Q.
II A = QR is a. QR factorization of A E Rmx" and m 2: n, then we refer
to A ;; Q(:,l:n)R(l:n, l:n) as the thin QR factorization. The next result
addresses the uniqueness issue for the thin QR factorization
Theorem 5.2.2 Suppose A E R"x" has full column rank. The thin QR
factorization
A= Q1R1
is unique wMre Q 1 E Rmxn has ort!wnormal columm and Rt is upper tri-
angular with po.ritive diagonal entriu. Moreover, R 1 = (jl' where G is the
lower triangular Cholesky factor of AT A.
Proof. Since AT A = {Q 1 R1)T (QtRi) = Rf R1 vre see that G = Rf is the
Cholesky factor of AT A. This factor is unique by Theorem 4.2.5. Since
Q 1 = AR} 1 it follows that Ql is also unique. []

How are Q1 a.nd R1 affected by perturbations in A? To answer this


question we need to extend the notion of condition to rectangular llij!.trices.
Recall from §2. 7.3 that the 2-norm condition of a square nonsingular matrix
is the ratio of the largest and smallest singular values. For rectangular
matrices with full column rank we continue with this definition:

A E Rmxn,rank(A) = n => 1t2(A) = umc:(A).


O'm.:n(A)
If the columns of A are nearly dependent, then lt'l(A) is large. Stewart
{1993) has shown that O(E) relative error in A induoes 0(Eit2(A)) relative
error in Rand Q1.

5.2. 7 Classical Gram·Schmidt


We now discuss two alternative methods that can be used to compute the
thin QR factorization A;; Q1R1 directly. If raok(A) = n, then equation
(5.2.2) can be solved for q,.:

q,., ~ (a~. - ~rieqi) /n,..


5.2. THE QR FACTORIZATION 231

Thus, we can think of Qlc as a unit 2-norm vector in the direction of


Jc-1
.;,.. = a,., - L TiA:Qi

where to ensure Zl! E span{Ql> ••• , Qlc -1} .L we cbOOI'Ie


ru, = q'[a1c i = 1:k- 1 .
This leads to the clas!ical Gmm-Schmidt (CGS) algorithm for computing
A= Q1R1.

R(l, 1) =II A(:, 1) lb


Q(:,l) =A(:, 1)/R(l, 1)
fork= 2:n
R(l:k - 1, k} = Q(1:m, l:k- l)T A(l:m, k)
z = A(l:m,A:)-Q(l:m,l:k-l)R(l:k-l,k) {5.2.3}
R(k, k) = l!.z II;,
Q(l:m, k) = z/R(k, k)
end
In the kth step of CGS, the kth columns of both Q and R are generated.

5.2.8 Modified Gram-Schmidt


Unfortunately, the CGS method has very poor numerical properties in that
there is typically a severe 1065 of orthogonality among the computed qi.
Interestingly, a rearrangement of the calculation, known as modifiM Grnm-
Schmidt (MGS), yields a much soUDder computational procedure. In the
kth step of MGS, the kth column of Q (denoted by q~c) and the kth row of
R (denoted by rf} are determined. To derive the MGS method, define the
matrix A(Jc) E Rmx(n-ir+l) by .
l-1 "
A- L,ihrf = L,q,rt = [oA<klJ. (5.2.4)

It follows that if
A(Jc} = [ z B I
1 n- k
then ru = II z IJ;,, q,., =
zfrlcJc and (rt,i+l · .. r..,.) = qfB. We then
compute the outer product A<•+ I) = B - q• (rll,k+l · · · r.m) and proceed
to the next step. This completely describes the kth step of MGS.

Algorithm 5.2.5 (Modified Gram-Schmidt) Given A e Jr":.:" with


rank(A) = n, the following algorithm computes the factorization A= Q1R1
where Q1 E Rmxn has orthonormal columns and R1 E R"xn is upper tri-
angular.
232 CHAPTER 5. 0RTHOOONALIZATION ANO LEAST SQliAR.ES

fork= l:n
R(k,k) =II A(l:m,k) II:~
Q(l:m,k) = A(l:m,k)/R(k,k)
for j = k+ l:n
R(k, j) = Q(l:m,k)TA(l:m,j)
.A.(l:m,j) = A(l:m,j)- Q{l :m, k)R(k,j)
end
end
This algorithm require~ 2mn2 fiops. It ia oot possible to overwrite A with
both Q1 and Rt. Typically, the MGS computation is arranged so that A is
overwritten by Q, and the matrix Rt ia stored iDa aeparate array.

5.2.9 Work and Accuracy


If oae is i.Dterested in computing an orthonormal baaia for ran(A), then
the Howteholder approach req\lirea 2mn2 - 2n3 / 3 flops to get Q in fac-
tored form and another 2mn 2 - 2n3 / 3 ftop8 to get the first n ooltJJDnS of
Q. (Thia requires "paying attention" to just the first n colWllD8 of Q in
(5.1.5).) Therefore, for the problem of findins an orthonormal basis for
ran(A), MGS is about twice as efficient as Householder orthogonalization.
H~, BjOrck (1967) baa shown that MGS produces a computed Q1 =
( ql, ... , q" I that &&tis fie~
Q[Qt = 1 + Eucs II Eucs II:~~ ~(A)
whereas the corresponding result ror the Ho\18eholder approach is of the
form
•T ~
Ql Ql = I + EH II EH lh ~ u .
Thtl8, if orthonormality is critical, tben MGS should be U8ed to compute
ortbo.normal bues only when the vectors to be orthogonalized are fairly
independent.
We also mention that the oomputed triaogular factor R produced by
MGS sa&Lsfies II A - QR 11 ~ ull A II and tbat there exista aQ with periectly
orthonormal colWDDS such that UA - QRII ~ uU A II· See Hlgbam (1996,
p.379).

-.~0'1]
.107100
.
5.2. THE QR FACTORIZATION 233

5.2.10 A Note on Complex QR


Most of the algorithms that we present in this book have complex ver-
sions that are fairly straight forward to derive from their real counterpart.B.
{This is not to say that everything is easy and obvious at the implementa-
tion level.) As an illustration we outline what a complex Householder QR
factorization algorithm looks like.
Starting at the level of an individual Householder tra.nsformation, sup-
pose 0 #: x e C' and that X"1 ::::: re 08 where r, 8 e R. H v = x ± e"'ll x lllel
and P =In- {JwH, {J '"'2/vHv, then Px = H"'ll x ll:zet. (See P5.1.3.)
The sign can be determined to maxim.i.ze II v !12 for the sa.ke of stability.
The upper triangularization of A e R"':.:", m ~ n, proceeds as in Algo-
rithm 5.2.1. In step j we zero the subdiagonal portion of A(j:m,j):

for j = l:n
x = A(j:m,j)
v = x ± e08 11 x llze1 where x1 = re 08 •
/3 = 2/vH /v
A(j:m,j:n) ""{Im-i+l -pvvH)A(j:m,j:n)
end

The reduction involves 8n2 (m - n/3) real flops, four times the number
required to execute Algorithm 5.2.1. H Q = P 1 • • • Pn is the product of the
Householder transformations, the11 Q is unitary and QT A= R e Rmxn is
complex and upper triangular.
Problems

Pli.:il.l Adapt tbe Houebolder QR al&orithm ao tlllll it c:&D efficiently bandle the ca.
when A E R"x" hM ~ bandwidth p and upper be.DdwidUl q.
Pli.2.2 Adapt tbe Hou.holdet QR alpiihm 110 thai it cornpa.tm the r.ctorization
A :i QL wbwe L ill loww ~aDd Q is ortbopmal. AMume t.ha& A ia ~~quare. This
=
involw. rewritin11: the Hou.holdet vectot function v bou.(z) ., thai (l-2vvT fvT v ):
i.l!l zero ewrywhere but itl bouom component.
Pli.2.3 Adapt the Giwa.l QR factorization algorithm 80 'IW t h e - ant ioUoduced by
diagonal. That ia, the eotri• a.rezaoed in the order (m, 1), (m-1, 1), (m, 2), (m- 2, l),
(m - 1, 2), (m, 3) , etc.
P5.2.4 Adap; tbe fMt GiYI!llllll QR fiK:toriation aJ«ori&hm 110 t..h.llt it efficieot.ly bandleB
the a . wbtm A i1: n.-by&n aod tridia&onal A.ume that tbe subdiagoaal. diagonal, and
s~ ol A ace stored iD e{l:n- l), o(l:n),/(l:n- l) ruapectiwly. DMigD. your
algorithm 80 thai tru- 't'eCionl are oY!lfWritten by tbe oo-ao portioo ofT.
PS.:II:.lS Suppoae L E R'""' with m ~ n is lower triaDgu)ar. Show hl:nr HoWieboider
Dllltricm Ht ... H .. CUI be Wllld. &o determille a ~ triaugular Lt E R"x .. 80 tha&

H.. ···HtL = [ ~1 ]
234 CHAPTER 5. 0RTHOGONALIZATION AND LEAST SQUARES

with the property that I'OWII 1 and 3 are left alone.


PI5.:U Sbow Ula& il
A= [: ~] m~k b"" m-k
k n-k
2
and A h.a.s full column rank, thm miD II Az - b II~ =1! d II~ - (vT d/U v 1!2) .
PS.l. 7 Suppose A E R"'" and D = dia@:(dt, ... , d..) E R"'"· Show bow to construct
an. on.bogona.l. Q such th&t qT A - DQT = R i.l!l upper- triangular. Do not worry about
efficiency-this i.l!l just an exl!l'Cise in QR manipulAtion.
P5.l.8 Show how to compute the QR factorization of the product A = Ap ···A:~ At
without explicitly multiplying the matrices At 1 ••• , Ap together. Hint: In the p =
3 cue, write Qf A == Qf AlQ,QI A:~QtQf At 8.D.d determine orthogonal Q~ so that
Q[CA.wQ;-d i.l!l upper triaoguJBl'. (Qo = !).
Pl5.l.9 Suppose A E R'x" and let E be the j)lll'mntation obtai.oed by revmaing the
order of tbe rows in I,.. (This ill juM. the exchange matrix of §4.7.) (a) Show that if
R E R""n ia upper trtaa.gular, then L == ERE i.l!l lower triangular. (b) Show bow to
compute an ort.hogonaJ Q E R'x" and a. lower triangular L E R'"" so that A = QL
IIIIIIWIIing the availability or & procedure for computing tbe QR factoriz-ation.
PS.l.lO MGS applied to A E R"x" ill numeric&lly equivalent to the tim step in Houae-
holdec QR applied to

A= [ ~"]
where 0.. ill t.he rrby-n zero matrix. Verily tba.t. th.il &latement i.l!l true aftet- the first
step of eech method ia completed.
P&.l.U llewnle the loop on:l.ss ill AJsoritlun 5.~.5 (MGS QR) 80 that Rill computed
colWIUl-by-colWIID.
PS.l.ll Develop a complex~ oftbe Givsu QR ~ Refer to P5.l.5.
where complex GiveM rota&iooa are the theme. I. ii paable to mp.oize the calcu.Jal;ioDB
110 thN tbe <fia&oll&l elemsCII of Rare DOoneptiv.7

The idea of using Householder transformatioDS to solve the LS problem was prnpoaed in

A.S. Houeebolder (1958). •unitary Triangu1arlzation of • NDDS)'IIIIIH!tric M&trix," J.


ACM. 5, 339-42.
The practical decailll wse WDI"ked out In

P. Busiuger llDd G.H. Golub (l965). ~Linear Least. Sq~ SolutiolljJ by HOUIIeholder
'l'c&usbnna.tiolla," NUJJil!r. Mtldl.. 1, 269-16. See aJ.o Wi.l.k.illiiOn and Rein8ch
(1971,111-18).
G. H. Golub (1965). ~Numerical Metboda !or Solvin~~: Linetl.l' ~ Squana Problem~!,~
Numer. MctJi. 7, 206-16.
5.2. THE QR FACTORIZATION 235

W. GJw. (1958). "Computatiozl ol PlaDe Unitary Ro&atJcaa 'IhDalormiq a G.lwU


Matrix co 'niaagular Form," SIAM J. Al'J'. Maill. tS, ~.
M. Gat ........ (1973). -Enw ~ ol QR [)ercm~jtjcvw by G~ ~
tlou." Lin. Alg. Gftd IC. A,L 10, 1-..o1.
For a ~ of 00. the QR factorisadclll caD be 1-t to IOMt OWIIIIIUUI problem. iD
stae•ic&l computat.km, -
G.H. Golub (1969). "Mto&riz Decom~aa 8lld SWiltical Compma&icm," In~
~ , ed. llC. Milton aDd J .A. Neld«, Aeademic PreM, N- York, pp.
366-97.
The behavior of the Q and R r.aor. when A ill perturbed is ~ in
G.W. Stewart (1977). "Penurbailon BouD<II for tbe QR F'Kt.orisa&loo of a Ma&rix.,"
SIAM J. N-. AnGL l-4 , ~18.
H. Zha (1993). "A CompoAeatwi8e Perturbation Aoalyaia of the QR Decompoeition."
SIAM J. M_... AfiOL AppL 4. 1124-1131.
G.W. Stewan (1993). "'n the Perturbation of LU Cholaaky, and QR FactorisatioDB,"
SJAJI J. Mo.tri:& AMi AppL 14, 1141-1145.
A. Barrlund (I UN). "Pelturb&tioD BoWlds for tbe ~ QR Faccoria&ioo," Lin.
A(f. 11M IC.. Applie. WT, ~1-271.
J..Q. Sua (1•). "'On Plltlllba&ioo 8ouDdll for tbe QR FlctoriziUoo," Lin. Alg. 1M
IC. Appk IJ6, ~112.
Tbe maio nnlt ill tba& Lbe challpa iD Q Nld R are bounded by &be colldiUoa of A t1mes
,.,_ re!Mive cbeop iD A . ~ the CQIII~Caiion 10 thai the eacrMa ill Q depeo<l
COIIWauo-'Y OD tile elltrial iD A ie m.c:u..t ill
T.F. Coleman t.lld D.C.~ (1984). ~A Now OCl tbe Compuce&ion of an Ortbonor·
me.1 S... for tbe NuU s.,_ot a~"~~~~ 19,234-242.
Rerermcs for tbe Gram-Schmich ~ iDclude include

J.ll RJce (1966). ·~on G~ 0"'~·" MaUl. Comp.


10,32&-28.
A. Bjiin:k (1961). "SolviJJC u- lAel& SqOIIW Problema by Gram-Schm!cl\ Qnboco-
n-.ln•t;on}' BIT 7, 1- :11.
N.N. Abdelmelelc (1971). ·~Error ADal)llia b Gram-Schmidt Melbod and
Sollrtioo ol Lillev te.o. SqiiANI Prob"-." BIT 11, 345-68.
J. Daaiel, W.B. Gng, L.Kanfmaa, and G.W. S~ (1971). "Aecln~icll
ud Stable A)pit.hza. Cw UpdatiDc the Gram-'lcbmidt QR ~... MotJa..
Comp. 30. m -m.
A. Rube (1!1183). "NIUZiel'ical Mpec:ta of Gru»-Scbmid\ Orihop'•lintioa of Vecton,"
Lin. Alg. an4 ltl Al'Piie. 61/53, 59Hi01.
W . Jalby and B. Philippe (1991). "Stab~ AllaJylll aDd Impr~ of tbe Blodt
Gram-Scbmid\ AJ&oriCbm," SIAM J. Sci. Sta. eorn,. 1!, 10158-1073.
A. BjOn:k aDd C .C . Palp (1092). •:t.a. and R..:apture ol Ortlaapoality in the Mocli6ed
Gram-Sdullid\ Alpiibu!," SIAM J. Mo&riz AfiOL Appl. 1:1. 171H90.
A. BjOrdc (1*). •Nam.ic~ ol Gram-Schmidt Orthopa-.li•Wt!o," Lin. Alg. ana lu
App&:. 11'1/198, 201-316.
The QR (actoriaa&ioa of a muc:tured matriJ: ill unally scrudured iQelf. See

A. W. BoJ&IIC&Yk. R.P. Bnmt, and F.R. de Hooc (1088). "QR ~ of ToepliU


Me&rlc:el," N-M:r. MGIA. 49, 81--9(.
236 CHAPTER 5. 0RTHOGONAL1ZAT10N AND LEAST SQUARES

S. Qiao(1986). "Hybrid A1gornbm for fast ToepliU Oniqoulia&.ioP,~ Numer. MatiL..


53, 351-366.
C.J. Demeure (1989). "Fui QR F'actorizaUon of Vandennonde Matrices," Lin. Alg.
and It. Appiic. lU/1!3/11.4, HllH94.
L. Reic:bel (1991). "Fa.& QR Decomposition ofVandermond&-Like Ma.tricM aDd Polyno-
mial!.-& Squares Apprmrlm.aiion,~ SIAM J. Matriz AnaL AppL 1!, 552-564.
D.R. Sweet (1991). "Fast Block Toeplitz OrthogoaalizUion," Numer. Moth. 58, 613-
629.
Vario1111 high-performance ialum pen.aini.us to the QR fact.orization are disc...-:1 in

B. Maitingly, C. Meyer, and J. Ortega (1989). "'rthogonal Reduction on Vecror Com-


puters," SIAM J. Sci. an.d Stot. Comp. 10, 372-381.
P.A. Knight (1995). "F'aat Rectangular Matrix Multiplication and the QR Decompoai-
tion," Lin. Alg. cmd IC• Appiic. S.f!l, 69-81.

5.3 The Full Rank LS Problem


Consider the problem of finding a vector x E Rn such that Ax b where=
the data matrix A E IR"'xn and the observation vector b E Rm are given and
m ~ n. When there are more equatioJUI than unlmowns, we say that the
system Ax= b is overdetermined. Usually an overdetermined system has
no exact solution since b must be a.n element of ran( A), a proper subspace
of m.m.
This suggests that we strive to minimize II Ax-bliP for some suitable
choice of p. Different norms render different optimum solutions. For exam-
ple, if A = [ 1, 1, qr
and b = [ b11 Oz, b:J ]T with b1 2! bz ~ ~ ~ 0, then it
can be verified that
p = 1 Oz
p = 2 = (bt + ~ + b:J)/3
p = = (bt + ~)/2.
Minimization in the 1-norm a.nd oo -norm is complicated by the fact that
the function /(x) = Jl Ax.- bliP is not differentiable for these values of
p. However, much progress has been made in tbis area., and there a.re
several good techniques available for 1-oorm a.nd oo-norm minimization.
See Coleman and Li (1992), Li (1993), and Zba.ug (1993).
In contrast to general p-norm minimization, the lealt 8quares (LS) prob-
lem

min (5.3.1)
sER"

is more tractable for two reasons:


• 4l(x) =~II Ax- b II~ is a. differentiable function of x a.nd so the min-
imizers of¢ satisfy the gradient equation Vt/l(z)= 0. This turns out
to be an easily constructed symmetric linear system which is positive
definite if A bas full column rank.
5.3. THE FULL ~K LS PROBLEM 237

• The 2-norm is preaerwd under orthogonal transformation. This means


that we can seek an orthogooaJ. Q such that the equivalent problem
of minimizing U(QTA)%- (QTb) 11, is "easy" to solw.

In this section we pursue these two solution approaches for the case when
A baa full column rank. Methods based on normal equations and the QR
factorization are detailed and compared.

5.3.1 Implications of Full Rank


Suppose x E R", z E R" , and a E R and consider the equality

where A E R"' xn and b E R"'. If z solve.'! the LS problem (5.3.1) then


we must have AT(Ax- b) = 0. Otherwise, if z = -AT(Az- b) aud
we make o small enough, then we obtain the contradictory inequality
II A(z + az) - b 111 < ll A% - b lh· We may also conclude that if x and
z + oz: are LS minimireJS, then z E null(A).
Tbll8, if A baa full column rank, then there is a unique LS solution XLS
and it solves the symmetric positive definite linear system

AT AxLs ;: ATb.

These are called the nonn4l equat&onl. Since V~(x) ""AT(.Az- b) where
~(z) = !II Ax- b II~, we see that solving the normal equations is tanta--
mount to solving the gradient equation V ~ = 0. We call

rts = b- A%Ls

the minimum ~ and we use the notation

PLs = uAzr.s-, n,
to deDOte ita size. Note that if PLS is small, then we can "predict" b with
the colulllD.S of A.
So far we have been &SBUming tbat A E m.mxn has full column rank.
This 888WDption is dropped in §5.5. However, even if rank(A) = n, then
we can expect trouble in the above procedures if A is nearly rank deficient.
When 9"l1'8Sing the quality of a computed LS solution its, theze are
two important laeues to beat in mind:

• Haw small is fLs = b - A%z.s compared to rLS = b- AzLS?


238 CHAPTER. 5. 0RTHOGONALIZA.TION AND LEAST SQUARES

The relatiw importance of tbeee two criteria varies &om application to


application. In &D)' case it is important to understand how XLS and Tf,$
are affected by perturbations in A and b. Our intuition tells us that if
the coJWDJIB of A are nearly depeodeut. then these quantities may be quite
seoaitive. ·

Example 5~.1 Su~

A ~ ~ ~-•
[ ] , 6A = [ ~ l~-• ] , 6= [ ~] , 6b = [ ~],
and tbaULS Uld i u minilnife M..b-612 aDd II (A+6A)z -(6+66) 1
12 respectively.
I.e\ rLS aDd fr.s bo ~be cOCYWpOIIdioc minimum ruid!U. The~~

Z£S = [ ~ ] , i£s • [ .~· lo4 ] , rt..s = [ ~ ] , rLS = { -::i: ~~l ] •

Sin~ ~(A)z lo' we bave


II Z£S- %1..$ a, ~ .9999 ·lo4 < ~(A)2 116A Ha = lOll. 10_,
"%£5 n2 - nA t12

The example suggests that the S6DSitivityo{ XLS depends upoo ~2(A) 2 . At
the end o! this &eetio.o we develop a perturbation theory {or the LS problem
and the "'2 (A) 2 factor will return.

5.3.2 The Method of Normal Equations


The most widely used method for solving the full rank LS problem is the
method of oorma1 equationa.

Algorithm 5.3.1 (Normal Equ.ationa) Given A e R."'xn with the pro~


erty tbat rank(A) =nand be R"', this algorithm computes the solution
XLS to the LS problem IDiD RAz - b lla where b e am.

Compute the lower triangular portion of C = AT A.


d=ATb
Compute tbe Cboleslcy factorization C : G(/f'.
Solve G11 = d and cf1'XLS = Sf.
Tltis algorithm requires (m + n/3)n2 ftope. The normal equation approach
is convenient becau8e it relies on staadard algorithms: Cboleslcy factoriza.
tion, matrix-matrix multipUcation, and matrix-vector multiplication. The
compremion of the m-by-n data matrix A into the (typically) much smaller
n-by-n cross-product matrix C is attractive.
5.3. THE FULL RANK LS PROBLEM 239

Let ue OODBider the accuracy of the computed DOrmal equations solution


ZLS· For clarity, 888ume that no roundoff errors occur during the formation
of C ; ATA and d = ATb. (On mauy computers inner products are accu-
mulated in double precisioo and so this is not a terribly unfair assumption.)
It follows from what we know about the roundoff properties oftbe Cboleaky
factorization (d. §4.2.7) that

(AT A + E)i:ts = ATb,


where II E lb 1':$ uR AT n,u An,~ un ATAlb and thus we can expect
II iLS- %LS II, (ATA) = (A)2 (5.3.2)
· llzu ll, ~ ~2 u~ ·

ln other words, the accuracy of the computed normal equations solution


depends on the square of the condition. This seems to be consistent with
Example 5.3.1 but more refined comments follow in §5.3.9.

Example 5.3.2 h &hould be no&ed


Jo. o£ i.olorma&ioa.
'* tbe formatioD of AT A. caD raub in a aevere

A= [ o1 ] &Dd 11 = [ 2 3 ]
10-
to- 3 10- 3

then ~2(A) ~ 1.4 · lo', %£5 = (1 IJT, eDd PLS • 0. U the oorma1 equatlou mecbod is
execuWICI with bue 10, t • Ci aritllmatic, then a divide-by-aero OCC'Iln duriJIK the eolutioo
pro<:ea~, s ioce

fl(AT A) = [ ! !]
is exactly aingular. Oo the other halld, if 7-di&it.arithmetk is ~ thea :LS ""
I 2.000001 • 0 JT aod II Z£$ - %{.$ 11,/1 %£$ 112""' U1'2(A) 3 .

5.3.3 LS Solution Via QR Factorization


Let A E Jr)(" with m ~ n and b e Rm be given and suppose that an
orthogonal matrix Q E rxm bas been computed such that

QTA=R= [R
0
1] m-n
n (5.3.3)

is upper triaogular. II
n
m-n
tbeo
240 CHAPTER 5. 0RTHOGONALIZATION AND LEAST SQUARES

for any x E R". Clearly, if rank(A) = rank(Rl) = n, then XLS is defined


by the upper triangular system R1 XLS = c. Note that

PLS = lldJlz.
We conclude that the full rank LS problem ca.u be readily aolved once we
have computed the QR factorization of A. Details depend on the exact QR
procedure. If Householder matrices are used and QT is applied in factored
form to b, then we obtain

Algorithm 5.3.2 (Householder LS Solution) If A E m..mxn has full


column rank and bE lRm, then the following algorithm computes a vector
XLS E R.n such that II AxLs- b lh is minimum.

Use Algorithm 5.2.1 to overwrite A with its QR factorization.


for j = l:n
v(J') = 1; v(j + l:m) = A(j + l:m,j)
b(j:m) = (lm-Hl - /3jvvT)b(j:m)
end
Solve R{l:n, l:n)xLs = b(l:n) using back substitution.
This method for solving the full rank LS problem requires 2n 2 (m - n/3)
flaps. The O(mn) flops associated with the updating of band the O(n 2 )
Hops associated with the back substitution are not significant compared to
the work required to factor A.
It can be shown that the computed XLS solves

minfl {A + oA)x - (b + ob) lb (5.3.4)

where
(5.3.5)
and
IJ6b liz :5 (6m- 3n + 40)null b lb + O(u 2). (5.3.6)
These inequalities are established in Lawson and Hanson {1974, p.90ff} and
show that :i: LS satisfies a "nearby" LS problem. (We cannot address the
relative error in :i: LS without an LS perturbation theory, to be discussed
shortly.) We mention that similar results hold if Givens QR is used.

5.3.4 Breakdown in Near-Rank Deficient Case


Like the method of normal equatioDB, the Houaeholder method for solving
the LS problem breaks down in the back substitution phase if rank( A) < n.
Numerically, trouble can be expected whenever ~~:~(A) "" lt-l(R) ::::~ 1/u.
This ia in contrast to the normal equations approach, where completion
of the Cholesky factorization becomes problematical once ~~::~(A) is in the
5.3. THE FULL RANK LS PROBLEM 241

Deighborhood of 1/y'U. (See Example 5.3.2.) Bence the claim in LaW1110n


and H8D80n (1974, 126-127) that for a fixed machine precision, a wider
ci881J of LS problems can be solved using Rouaeholder ortbogooallzation.

5.3.5 A Note on the MGS Approach


In principle, MGS computes the thin QR factorization A= Q1R1. This ia
enough to solve the full rank LS problem because it transforms tbe normal
equations (AT A)% = ATb to tbe upper triangular system R 1% = Qfb.
But an analysis of this approach when Qf6 is explicitly Conned intr~
duces a ~(A) 2 tenn. This is because the computed factor Q1 satisfies
. II QfQ, - In lb ~~(A) aa we mentioned in §5.2.9.
However, if MGS is applied to the augmented matrix

A+ = I A bI = I Q, 9n+i I [ ~I ; ] '

then z :a Q[b. Computing Q[b in this fashion and solving RtXLS = z


produces an LS solution iLs that is "just as good" as the Householder QR
method. That is to say, a result of the form (5.3.4}·(5.3.6) applies. See
Bjorck and Paige (1992).
It should be noted that the MGS method is slightly more expensive
than Householder QR because it always manipulates m-vec:tors whereas
the latter procedure deals with ever shorter vectors.

5.3.6 Fast Givens LS Solver


The LS problem can alao be solved using fast Givens transformations. Sup-
pose MT M = D is diagonaJ and

Sa ] n
[ 0 m-n
is upper triangular. H
n
m-n
then

for aoy ,; E R". CJearly, XLS 1s obtained by .solving the noosingular upper
tri&Dgu.lar system St% =c.
The computed solution zc.s obtained in this fashion can be shown to
solve a nearby LS problem in the seose of (5.3.4)-(5.3.6). This may seem
242 CHAPTER 5. OJO'HOGONALIZATION AND LEAST SQUARES

surprising since lacge numbers can arise during the calculation. An entry
in the sca.ling matrix D can double in magnitude after a single fast Givens
update. However, largeness in D must be exactly compensated for by large-
ness in M, since v-
1/lM is orthogonal at aJl stages of the computation.

It is this phenomenon that enables one to pUBh through a favorable error


analysis.

5.3. 7 The Sensitivity of the LS Problem


We now develop a. perturbation theory that assists in the comparison of
the normal equations 8Jld QR approa.cbes to the LS problem. The theorem
below examines bow the LS solution and its residual are affected by changes
in A and b. In so doing, the condition of the LS problem is identified.
Two easily established facts are required in the analysis:

(5.3.7}

These equations can be verified using the SVD.

Theorem 5.3.1 Suppose x, r, x, and r satisfy

II Ax- b 11 2 = min r=b-Ax

11 (A + oA)x- (b + ob) lh = min f = (b + ob) - (A+ oA)x

where A and oA are in R"xl\ with m ~ n and 0 f. b and 6b are in Rm. If

and
. (O} PLS ..i. l
sw ""llblh r

where PLs = II AxLs - h ll2. then


II ~: ;2112 $ I! { ~~:; + tan(9)~(A)2} + 0(E2) (5.3.8)

II f - r lb < e {1 + 2~(A)) min(l,m- n) + O(t2 ). (5.3.9)


II bii:J
5.3. THE FULL RANK LS PROBLEM 243

Proof. Let E and f be defined byE= oAjE and f = objf.. By hypothesis


I oA 112 < O'n(A) and so by Theorem 2.5.2 we have rank( A+ tE) = n for
all t E {0, Ej. It follows that the solution :~:(t) to

(A+ tE)T(A + tE)x(t) = (A+ tE)T(b + tj) (5.3.10)

is continuously differentiable for all t e [O,Ej. Since x = x(O) and x = x(E),


we have
i' == x + ci(O) + 0( ~)-
The assumptions b I= 0 and sin(8) I= 1 ensure that x is nonzero and so

IIi- X ll2 - f. "±(0) '"' + O(E2)- (5.3.11)


fl:~: ll2 !lxfb
In order to hound II ±(0) 11 2 , we differentiate (5.3.10) and set t = 0 in the
result. This gives

i.e.,
(5.3.12}
By substituting this result into (5.3.11 ), taking norms, and using the easily
verified inequalities II J lb ~ 11 b 11 2 and II E 11 2 ~ II A 1! 2 we obtain

ll.i:-xlb
I xll2
< E {11 A ll2ll (AT A)-lAT lb (
11
}1 : 1 ~ lb
\ 1
+ 1)
+ II A ::~r X u211 A II~ II (AT A)-I lb} +
2
0(E )-

Since AT(Az- b),;. 0, A% is orthogonal to Ax-band so

Thus,

and so by using (5.3.7)

x II~
II xII -x lb { ( 1 1) :z sin(9) } 2
:5 f. K:z(A) cos( H) + + ~(A) cos(B) + O(f. )

thereby establishing (5.3.8).


To prove (5.3.9), we define the differentiable vector function r(t) by

r(t) = (b + tf) -(A+ tE)x(t)


244 CHAPTER 5. OrtrHOGONALIZATION AND LEAST SQUARES

r
and observe that r = r(O) and = r(~). Using (5.3.12} it can be shown
that
r{O) = (I- A( AT A)- 1 Ai) (! -Ex) -A( ATA)- 1 ET r.

Since II f - r lb =ell r(O) 11 2 + O(e) we have


II r-rlb II r(O) ll2 + O(~)
II bib f II bib
< f {III- A(ATA)-lAT liz (t + II Allll~l~l: 112)
2
+ II A(AT A)-l 11211 Alb t:»2} + 0(E ).

Inequality (5.3.9) now follows because

II A 11211 X lb = II A 11211 A+b 112 ::; "z(A)II b li'JI


PLS =II (I- A(AT A)-lAT)b 112 $ Jl I - A(ATA)-lAT lbll b 112'
and
II(/- A( AT A)- 1 AT lb = min(m- n, 1). D
An interesting feature of the upper bound in (5.3.8) is the factor

2 PLS 2
tan(8)~~:2(A) = 11:2(A) .
v'll b II~ - Pls
Thus, in nonzero residual problems it is the square of the condition that
measures the sensitivity of XLS· In contrast, residual sensitivity depends
just linearly on ltl(A). These dependencies are confirmed by Example 5.3.1.

5.3.8 Normal Equations Versus QR


It is instructive to compare the normal equation and QR approaches to the
LS problem. Recall tbe following main points from our discussion:

• The sensitivity of the LS solution is roughly proportional to the quan-


2
tity 1t2(A) + PLs1t2(A) .

• The method of normal equations produces an .i LS whose relative error


depends on the square of the condition.

• The QR approach (Householder, Givens, careful MGS) solves a nearby


LS problem and therefore produces a solution that has a relative error
approximately given by U(~'>'J(A) + PLSI>2(A) ).
2
5.3. THE FULL RANK LS PROBLEM

Thus, we may conclude that if P£S ia small and 11:2(A) is large, then the
method of normal equations does not solve a nearby problem and will usu-
ally render an LS solution tbat ia less IICCUl'&te than a stable QR approach.
Conversely, the two methods produce comparably inaccurate reaulta when
applied to large residual, ill-conditioned problems.
Finally, we meatioll two other factors that figure in the debate about
QR venus normal equatiooa:

• The normal equatioiiB approach involves about half or the arithmetic


when m > n aud does not require aa much storage.
• QR approaches are applicable t.o a wider class of matrices because
the Cholesky process applied to ATA breaks down "before" the back
substitution proceea on qr A = R.

At the 'Very minimum, thl8 discussion sbould convince you how difficult it
can be to choose the "right" algorit hm!

Problema

P6.!U Allume AT Az :3 A7 b, (AT A+ F)i"" ATb, and 21 F 02 ~ <1,.(Af1 . Show that


if T = b - A% &nd r "" b - A%, tbeD r - r • A(ATA + F ) - l Fz aod

I r - ,.II, $ ~2(A) ~ ~ ~2 11 z 11
2•

P6.3.2 Alsume ~bat AT.U • ATb e.ad tbat AT At = A Tb +I wbele I I 112 $


cuHAT 121b lb &nd A bu full col\UDD rank. Sbow \hat

nz - t Ha < ~2 (A)2 I AT I~I b Ua


II z 02 - II A b U •

P6.3.3 1M A E R""'" rill m > n ADd t1 E R"' aad deftDe A ,. fA til E R",(,.+l).
Silo. thaa crt(A) ?: <1t(A) Nld <1n+t(A) $ o-,.(A). Tbaa, the CIDIIditloa CI'IJ'ft if a cduma
il added t.o a ~~latrix.
P5.lU La\ A E R", .. (m ?: n), w E R", and define

Show t~ cr,.(B) ?: ~r,.(A) 8Dd <11 (8) $ v'l A 11: + UwiJ. T'11ut, tbe OODditioD ol a
matrix lila)' t.Da.. ex ~ if a rvw ia .added.
P5.3 .8 (Cliu lln3) S~ Q.at A E R'"l(" b. taDk" .00 t.hM Gan=i•n eliminvjon
rib puUal piwdq il ~ &o OOIIIPnt the~ PA • LU, -~ L E R'"x"ll
Wlit loww ~. U e R'•" il ~ Viaqulw, .M PER",.,. ill a~
&plaiD bow tbe decompolit.illa ill P5.2.5 cu be ll8ld t.o liDd & vect.or • E R'" llldl Cia&&
II L~& - ~ u2 • miDimjW Silo. \W if U:z .. z, tJa. I A:z- It 1:.
ill m i gjg•m ~
\bat tJUa melbod of 8Dlvlq &be LS problem It IDOft ema.a& cbaa H~ QR. from
~be ftop poiDt of view ..,~ m S 'lm/3.
P5.3.8 Tbe lll&trilr C :a (AT A)- 1, wbere I'Mk(A) = n , 1ri1ee ill --.y ..-Wjc&l ~
cacioaa and ill lmowD aa tbe oariAnce-oowrioACe motri& Aauma uas
cbe ~
246 CHAPTER 5. 0Jn'HOGONALIZATION AND LEAST SQUARES

A"" QR ia avai.labla. (a} Sbcnr C = (RTR)- 1 • (b) Give N1 a.J&om.hm lor comput~ the
diagonal of C tl!5 requirel n 3 /3 flops. (c) Show that

R=[<Ro "sr] '* C=(RTR)-1""[(l+vrc1v)/a2


-Ctv/a
-TCtfa]
Ct
where Ct = (st'S)- 1. (d) Ullin~~: (c), giw ILII algorithm that OWlrWritee tbe upps- tri-
ILIIpW portion of R with tbe upper trian&uJar portion of C. Your algorithm should
l11Q.uire 2n3 /3 llo~
PIS.S. T Suppoee A E R',.;" is ll)'llliDMric and that T = b - A% where T, b, :r E R" and
: ill nouero. Show bow to compute a 8)'lllllll!tric E E R"x" wit!! minimal Fkobenius
norm 110 that. (A +E)~ == b. Hint. u. the QR factorU&tlon or [:, T] ao.d note tba&
E:z :::n• ~ (QTEQ)(QT;z:) = QTT.
P5.3.8 Show bow to compute the nearest circulant matrix to a given Toeplitz matrix.
Me:Mure distance with the FtobeniUII nonn.

Notes and References for Sec. 5.3

Our restriction to least squares approximation is not a vote against minimization in other norms. There are occasions when it is advisable to minimize || Ax − b ||_p for p = 1 and ∞. Some algorithms for doing this are described in

A.K. Cline (1976). "A Descent Method for the Uniform Solution to Overdetermined Systems of Equations," SIAM J. Num. Anal. 13, 293–300.
R.H. Bartels, A.R. Conn, and C. Charalambous (1978). "On Cline's Direct Method for Solving Overdetermined Linear Systems in the L_∞ Sense," SIAM J. Num. Anal. 15, 255–70.
T.F. Coleman and Y. Li (1992). "A Globally and Quadratically Convergent Affine Scaling Method for Linear L1 Problems," Mathematical Programming 56, Series A, 189–222.
Y. Li (1993). "A Globally Convergent Method for Lp Problems," SIAM J. Optimization 3, 609–629.
Y. Zhang (1993). "A Primal-Dual Interior Point Approach for Computing the L1 and L_∞ Solutions of Overdetermined Linear Systems," J. Optimization Theory and Applications 77, 323–341.

The use of Gauss transformations to solve the LS problem has attracted some attention because they are cheaper to use than Householder or Givens matrices. See

G. Peters and J.H. Wilkinson (1970). "The Least Squares Problem and Pseudo-Inverses," Comp. J. 13, 309–16.
A.K. Cline (1973). "An Elimination Method for the Solution of Linear Least Squares Problems," SIAM J. Num. Anal. 10, 283–89.
R.J. Plemmons (1974). "Linear Least Squares by Elimination and MGS," J. Assoc. Comp. Mach. 21, 581–85.

Important analyses of the LS problem and various solution approaches include

G.H. Golub and J.H. Wilkinson (1966). "Note on the Iterative Refinement of Least Squares Solution," Numer. Math. 9, 139–48.
A. van der Sluis (1975). "Stability of the Solutions of Linear Least Squares Problems," Numer. Math. 23, 241–54.
Y. Saad (1986). "On the Condition Number of Some Gram Matrices Arising from Least Squares Approximation in the Complex Plane," Numer. Math. 48, 337–348.
A. Björck (1987). "Stability Analysis of the Method of Seminormal Equations," Lin. Alg. and Its Applic. 88/89, 31–48.
J. Gluchowska and A. Smoktunowicz (1990). "Solving the Linear Least Squares Problem with Very High Relative Accuracy," Computing 45, 345–354.
A. Björck (1991). "Component-wise Perturbation Analysis and Error Bounds for Linear Least Squares Solutions," BIT 31, 238–244.
A. Björck and C.C. Paige (1992). "Loss and Recapture of Orthogonality in the Modified Gram-Schmidt Algorithm," SIAM J. Matrix Anal. Appl. 13, 176–190.
B. Waldén, R. Karlson, and J. Sun (1995). "Optimal Backward Perturbation Bounds for the Linear Least Squares Problem," Numerical Lin. Alg. with Applic. 2, 271–286.

The "seminormal" equations are given by R^T R x = A^T b where A = QR. In the above paper by Björck (1987) it is shown that by solving the seminormal equations an acceptable LS solution is obtained if one step of fixed precision iterative improvement is performed.
An Algol implementation of the MGS method for solving the LS problem appears in

F.L. Bauer (1965). "Elimination with Weighted Row Combinations for Solving Linear Equations and Least Squares Problems," Numer. Math. 7, 338–52. See also Wilkinson and Reinsch (1971, pp. 119–33).

Least squares problems often have special structure which, of course, should be exploited.

M.G. Cox (1981). "The Least Squares Solution of Overdetermined Linear Equations having Band or Augmented Band Structure," IMA J. Num. Anal. 1, 3–22.
G. Cybenko (1984). "The Numerical Stability of the Lattice Algorithm for Least Squares Linear Prediction Problems," BIT 24, 441–455.
P.C. Hansen and H. Gesmar (1993). "Fast Orthogonal Decomposition of Rank-Deficient Toeplitz Matrices," Numerical Algorithms 4, 151–166.

The use of Householder matrices to solve sparse LS problems requires careful attention to avoid excessive fill-in.

J.K. Reid (1967). "A Note on the Least Squares Solution of a Band System of Linear Equations by Householder Reductions," Comp. J. 10, 188–89.
I.S. Duff and J.K. Reid (1976). "A Comparison of Some Methods for the Solution of Sparse Over-Determined Systems of Linear Equations," J. Inst. Math. Applic. 17, 267–80.
P.E. Gill and W. Murray (1976). "The Orthogonal Factorization of a Large Sparse Matrix," in Sparse Matrix Computations, ed. J.R. Bunch and D.J. Rose, Academic Press, New York, pp. 177–200.
L. Kaufman (1979). "Application of Dense Householder Transformations to a Sparse Matrix," ACM Trans. Math. Soft. 5, 442–51.

Although the computation of the QR factorization is more efficient with Householder reflections, there are some settings where the Givens approach is advantageous. For example, if A is sparse, then the careful application of Givens rotations can minimize fill-in.

I.S. Duff (1974). "Pivot Selection and Row Ordering in Givens Reduction on Sparse Matrices," Computing 13, 239–48.
J.A. George and M.T. Heath (1980). "Solution of Sparse Linear Least Squares Problems Using Givens Rotations," Lin. Alg. and Its Applic. 34, 69–83.

5.4 Other Orthogonal Factorizations

If A is rank deficient, then the QR factorization need not give a basis for ran(A). This problem can be corrected by computing the QR factorization of a column-permuted version of A, i.e., AΠ = QR where Π is a permutation.
The "data" in A can be compressed further if we permit right multiplication by a general orthogonal matrix Z:

    Q^T A Z = [ T_11  0 ; 0  0 ].

There are interesting choices for Q and Z and these, together with the column pivoted QR factorization, are discussed in this section.

5.4.1 Rank Deficiency: QR with Column Pivoting

If A ∈ R^{m×n} and rank(A) < n, then the QR factorization does not necessarily produce an orthonormal basis for ran(A). For example, if A has three columns and

    A = [ q_1  q_2  q_3 ] [ x  x  x ; 0  0  x ; 0  0  x ]     (x denoting nonzero entries)

is its QR factorization, then rank(A) = 2 but ran(A) does not equal any of the subspaces span{q_1, q_2}, span{q_1, q_3}, or span{q_2, q_3}.
Fortunately, the Householder QR factorization procedure (Algorithm 5.2.1) can be modified in a simple way to produce an orthonormal basis for ran(A). The modified algorithm computes the factorization

    Q^T A Π = [ R_11  R_12 ; 0  0 ]     (r and m−r rows, r and n−r columns)     (5.4.1)

where r = rank(A), Q is orthogonal, R_11 is upper triangular and nonsingular, and Π is a permutation. If we have the column partitionings AΠ = [ a_{c_1}, ..., a_{c_n} ] and Q = [ q_1, ..., q_m ], then for k = 1:n we have

    a_{c_k} = sum_{i=1}^{min{r,k}} r_{ik} q_i ∈ span{q_1, ..., q_r}

implying

    ran(A) = span{q_1, ..., q_r}.

The matrices Q and Π are products of Householder matrices and interchange matrices respectively. Assume for some k that we have computed
Householder matrices H_1, ..., H_{k−1} and permutations Π_1, ..., Π_{k−1} such that

    (H_{k−1} ··· H_1) A (Π_1 ··· Π_{k−1}) = R^{(k−1)} = [ R_11^{(k−1)}  R_12^{(k−1)} ; 0  R_22^{(k−1)} ]     (5.4.2)

with R_11^{(k−1)} of order k−1 and R_22^{(k−1)} having m−k+1 rows, where R_11^{(k−1)} is a nonsingular and upper triangular matrix. Now suppose that

    R_22^{(k−1)} = [ z_k^{(k−1)}, ..., z_n^{(k−1)} ]

is a column partitioning and let p ≥ k be the smallest index such that

    || z_p^{(k−1)} ||_2 = max { || z_k^{(k−1)} ||_2, ..., || z_n^{(k−1)} ||_2 }.     (5.4.3)

Note that if k−1 = rank(A), then this maximum is zero and we are finished. Otherwise, let Π_k be the n-by-n identity with columns p and k interchanged and determine a Householder matrix H_k such that if R^{(k)} = H_k R^{(k−1)} Π_k, then R^{(k)}(k+1:m, k) = 0. In other words, Π_k moves the largest column in R_22^{(k−1)} to the lead position and H_k zeroes all of its subdiagonal components.
The column norms do not have to be recomputed at each stage if we exploit the property

    Q^T z = [ α ; w ],   α ∈ R, w ∈ R^{s−1}   ⟹   || w ||_2^2 = || z ||_2^2 − α^2,

which holds for any orthogonal matrix Q ∈ R^{s×s}. This reduces the overhead associated with column pivoting from O(mn^2) flops to O(mn) flops because we can get the new column norms by updating the old column norms, e.g.,

    || z_i^{(k)} ||_2^2 = || z_i^{(k−1)} ||_2^2 − r_{ki}^2.
Combining all of the above we obtain the following algorithm established


by Businger and Golub (1965):

Algorithm 5.4.1 (Householder QR With Column Pivoting) Given A ∈ R^{m×n} with m ≥ n, the following algorithm computes r = rank(A) and the factorization (5.4.1) with Q = H_1 ··· H_r and Π = Π_1 ··· Π_r. The upper triangular part of A is overwritten by the upper triangular part of R and components j+1:m of the jth Householder vector are stored in A(j+1:m, j). The permutation Π is encoded in an integer vector piv. In particular, Π_j is the identity with rows j and piv(j) interchanged.
for j = 1:n
    c(j) = A(1:m, j)^T A(1:m, j)
end
r = 0;  τ = max{ c(1), ..., c(n) }
Find smallest k with 1 ≤ k ≤ n so c(k) = τ
while τ > 0
    r = r + 1
    piv(r) = k;  A(1:m, r) ↔ A(1:m, k);  c(r) ↔ c(k)
    [v, β] = house(A(r:m, r))
    A(r:m, r:n) = (I_{m−r+1} − β v v^T) A(r:m, r:n)
    A(r+1:m, r) = v(2:m−r+1)
    for i = r+1:n
        c(i) = c(i) − A(r, i)^2
    end
    if r < n
        τ = max{ c(r+1), ..., c(n) }
        Find smallest k with r+1 ≤ k ≤ n so c(k) = τ
    else
        τ = 0
    end
end

This algorithm requires 4mnr − 2r^2(m+n) + 4r^3/3 flops where r = rank(A). As with the nonpivoting procedure, Algorithm 5.2.1, the orthogonal matrix Q is stored in factored form in the subdiagonal portion of A.
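The following is a minimal NumPy sketch of the same idea (an illustrative translation, not the book's code; it accumulates Q explicitly for clarity rather than storing it in factored form and assumes m ≥ n). The helper house follows the Householder vector construction of §5.1, and the squared column norms are downdated rather than recomputed.

    import numpy as np

    def house(x):
        # Householder vector v (v[0] = 1) and scalar beta such that
        # (I - beta*v*v') x has zeros below its first component.
        x = np.asarray(x, dtype=float)
        sigma = np.dot(x[1:], x[1:])
        if sigma == 0.0:
            return np.concatenate(([1.0], x[1:])), 0.0
        mu = np.sqrt(x[0]**2 + sigma)
        v0 = x[0] - mu if x[0] <= 0.0 else -sigma/(x[0] + mu)
        beta = 2.0*v0*v0/(sigma + v0*v0)
        return np.concatenate(([1.0], x[1:]/v0)), beta

    def qr_col_pivot(A, tol=1e-12):
        # Householder QR with column pivoting in the spirit of Algorithm 5.4.1.
        A = np.array(A, dtype=float)
        m, n = A.shape
        Q = np.eye(m)
        piv = np.arange(n)
        c = np.sum(A*A, axis=0)                   # squared column norms
        tau0 = c.max()
        r = 0
        while r < n and c[r:].max() > tol*tau0:
            k = r + int(np.argmax(c[r:]))         # pivot column
            A[:, [r, k]], c[[r, k]], piv[[r, k]] = A[:, [k, r]], c[[k, r]], piv[[k, r]]
            v, beta = house(A[r:, r])
            A[r:, r:] -= beta*np.outer(v, v @ A[r:, r:])
            Q[:, r:] -= beta*np.outer(Q[:, r:] @ v, v)
            c[r+1:] -= A[r, r+1:]**2              # downdate column norms, O(n) work
            r += 1
        return Q, np.triu(A), piv, r              # A[:, piv] ~ Q @ R, r = detected rank

After the call, A[:, piv] is close to Q @ R and the first r columns of Q give a numerical orthonormal basis for ran(A); the tolerance tol plays the role of the termination test discussed in §5.5.7.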

Example 5.4.1 If Algorithm 5.4.1 is applied to

    A = [ 1  2  3 ; 4  5  6 ; 7  8  9 ; 10  11  12 ],

then Π = [ e_3  e_2  e_1 ] and, to three significant digits, the algorithm produces the factorization AΠ = QR in which the (3,3) entry of R is .000, reflecting the fact that rank(A) = 2.
5.4.2 Complete Orthogonal Decompositions

The matrix R produced by Algorithm 5.4.1 can be further reduced if it is post-multiplied by an appropriate sequence of Householder matrices. In particular, we can use Algorithm 5.2.1 to compute

    Z_r ··· Z_1 [ R_11  R_12 ]^T = [ T_11^T ; 0 ]     (r and n−r rows)     (5.4.4)

where the Z_i are Householder transformations and T_11 is upper triangular. It then follows that

    Q^T A Z = [ T_11  0 ; 0  0 ]     (r and m−r rows, r and n−r columns)     (5.4.5)

where Z = Π Z_1 ··· Z_r. We refer to any decomposition of this form as a complete orthogonal decomposition. Note that null(A) = ran(Z(1:n, r+1:n)). See P5.2.5 for details about the exploitation of structure in (5.4.4).

5.4.3 Bidiagonalization

Suppose A ∈ R^{m×n} and m ≥ n. We next show how to compute orthogonal U_B (m-by-m) and V_B (n-by-n) such that

    U_B^T A V_B = [ d_1  f_1              ;
                         d_2  f_2         ;
                              .   .       ;
                                  d_{n−1}  f_{n−1} ;
                                           d_n     ;
                    0                              ]                         (5.4.6)

i.e., an n-by-n upper bidiagonal matrix sitting on top of an (m−n)-by-n block of zeros. The factors U_B = U_1 ··· U_n and V_B = V_1 ··· V_{n−2} can each be determined as a product of Householder matrices:

    [ x x x x ]        [ x x x x ]        [ x x 0 0 ]        [ x x 0 0 ]
    [ x x x x ]  U_1   [ 0 x x x ]  V_1   [ 0 x x x ]  U_2   [ 0 x x x ]
    [ x x x x ]  -->   [ 0 x x x ]  -->   [ 0 x x x ]  -->   [ 0 0 x x ]  --> ...
    [ x x x x ]        [ 0 x x x ]        [ 0 x x x ]        [ 0 0 x x ]
    [ x x x x ]        [ 0 x x x ]        [ 0 x x x ]        [ 0 0 x x ]

In general, U_k introduces zeros into the kth column, while V_k zeros the appropriate entries in row k. Overall we have:

Algorithm 5.4.2 (Householder Bidiagonalization) Given A ∈ R^{m×n} with m ≥ n, the following algorithm overwrites A with U_B^T A V_B = B where B is upper bidiagonal and U_B = U_1 ··· U_n and V_B = V_1 ··· V_{n−2}. The essential part of U_j's Householder vector is stored in A(j+1:m, j) and the essential part of V_j's Householder vector is stored in A(j, j+2:n).

for j = 1:n
    [v, β] = house(A(j:m, j))
    A(j:m, j:n) = (I_{m−j+1} − β v v^T) A(j:m, j:n)
    A(j+1:m, j) = v(2:m−j+1)
    if j ≤ n−2
        [v, β] = house(A(j, j+1:n)^T)
        A(j:m, j+1:n) = A(j:m, j+1:n)(I_{n−j} − β v v^T)
        A(j, j+2:n) = v(2:n−j)^T
    end
end

This algorithm requires 4mn^2 − 4n^3/3 flops. Such a technique is used in Golub and Kahan (1965), where bidiagonalization is first described. If the matrices U_B and V_B are explicitly desired, then they can be accumulated in 4m^2 n − 4n^3/3 and 4n^3/3 flops, respectively. The bidiagonalization of A is related to the tridiagonalization of A^T A. See §8.2.1.
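A compact NumPy sketch of the same reduction is given below (an illustrative translation with explicitly accumulated factors, not the book's in-place storage scheme; the simple sign-based Householder helper is an assumption).

    import numpy as np

    def householder(x):
        # Unit vector w such that (I - 2*w*w') x is a multiple of e_1.
        w = np.asarray(x, dtype=float).copy()
        w[0] += np.sign(x[0] if x[0] != 0 else 1.0)*np.linalg.norm(x)
        nrm = np.linalg.norm(w)
        return w/nrm if nrm > 0 else w

    def bidiagonalize(A):
        # Householder bidiagonalization: returns U, B, V with U' A V = B upper bidiagonal.
        A = np.array(A, dtype=float)
        m, n = A.shape
        U, V = np.eye(m), np.eye(n)
        for j in range(n):
            w = householder(A[j:, j])                  # zero A(j+1:m, j)
            A[j:, j:] -= 2.0*np.outer(w, w @ A[j:, j:])
            U[:, j:]  -= 2.0*np.outer(U[:, j:] @ w, w)
            if j < n - 2:
                w = householder(A[j, j+1:])            # zero A(j, j+2:n)
                A[j:, j+1:] -= 2.0*np.outer(A[j:, j+1:] @ w, w)
                V[:, j+1:]  -= 2.0*np.outer(V[:, j+1:] @ w, w)
        return U, A, V

For a test matrix one can check that U.T @ A0 @ V reproduces B and that B is zero except on its diagonal and first superdiagonal (up to roundoff).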

Example 5.4.2 If Algorithm 5.4.2 is applied to

    A = [ 1  2  3 ; 4  5  6 ; 7  8  9 ; 10  11  12 ],

then to three significant digits we obtain

    B = [ 12.8  21.8  0 ; 0  2.24  −.613 ; 0  0  0 ; 0  0  0 ]

and

    V_B = [ 1.00  0.00  0.00 ; 0.00  −.667  −.745 ; 0.00  −.745  .667 ],

with U_B the corresponding 4-by-4 orthogonal factor.
5.4.4 R-Bidiagonalization

A faster method of bidiagonalizing when m > n results if we upper triangularize A first before applying Algorithm 5.4.2. In particular, suppose we compute an orthogonal Q ∈ R^{m×m} such that

    Q^T A = [ R_1 ; 0 ]

is upper triangular. We then bidiagonalize the square matrix R_1,

    U_R^T R_1 V_B = B_1.

Here U_R and V_B are n-by-n orthogonal and B_1 is n-by-n upper bidiagonal. If U_B = Q·diag(U_R, I_{m−n}) then

    U_B^T A V_B = [ B_1 ; 0 ] = B

is a bidiagonalization of A.
The idea of computing the bidiagonalization in this manner is mentioned in Lawson and Hanson (1974, p.119) and more fully analyzed in Chan (1982a). We refer to this method as R-bidiagonalization. By comparing its flop count (2mn^2 + 2n^3) with that for Algorithm 5.4.2 (4mn^2 − 4n^3/3) we see that it involves fewer computations (approximately) whenever m ≥ 5n/3.
5.4.5 The SVD and its Computation

Once the bidiagonalization of A has been achieved, the next step in the Golub-Reinsch SVD algorithm is to zero the superdiagonal elements in B. This is an iterative process and is accomplished by an algorithm due to Golub and Kahan (1965). Unfortunately, we must defer our discussion of this iteration until §8.6 as it requires an understanding of the symmetric eigenvalue problem. Suffice it to say here that it computes orthogonal matrices U_Σ and V_Σ such that

    U_Σ^T B V_Σ = Σ = diag(σ_1, ..., σ_n) ∈ R^{m×n}.

By defining U = U_B U_Σ and V = V_B V_Σ we see that U^T A V = Σ is the SVD of A. The flop counts associated with this portion of the algorithm depend upon "how much" of the SVD is required. For example, when solving the LS problem, U^T need never be explicitly formed but merely applied to b as it is developed. In other applications, only the matrix U_1 = U(:, 1:n) is required. Altogether there are six possibilities and the total amount of work required by the SVD algorithm in each case is summarized in the table below. Because of the two possible bidiagonalization schemes, there are two columns of flop counts. If the bidiagonalization is achieved via Algorithm 5.4.2, the Golub-Reinsch (1970) SVD algorithm results, while if R-bidiagonalization is invoked we obtain the R-SVD algorithm detailed in Chan (1982a). By comparing the entries in this table (which are meant only as approximate estimates of work), we conclude that the R-SVD approach is more efficient unless m ≈ n.
Required        Golub-Reinsch SVD          R-SVD
Σ               4mn^2 − 4n^3/3             2mn^2 + 2n^3
Σ, V            4mn^2 + 8n^3               2mn^2 + 11n^3
Σ, U            4m^2 n − 8mn^2             4m^2 n + 13n^3
Σ, U_1          14mn^2 − 2n^3              6mn^2 + 11n^3
Σ, U, V         4m^2 n + 8mn^2 + 9n^3      4m^2 n + 22n^3
Σ, U_1, V       14mn^2 + 8n^3              6mn^2 + 20n^3

Problems

P5.4.1 Suppose A ∈ R^{m×n} with m < n. Give an algorithm for computing the factorization

    U^T A V = [ B  0 ]

where B is an m-by-m upper bidiagonal matrix. (Hint: Obtain the form

    [ x  x  0  0  0 ]
    [ 0  x  x  0  0 ]
    [ 0  0  x  x  0 ]

using Householder matrices and then "chase" the (m, m+1) entry up the (m+1)st column by applying Givens rotations from the right.)

P5.4.2 Show how to efficiently bidiagonalize an n-by-n upper triangular matrix using Givens rotations.

P5.4.3 Show how to upper bidiagonalize a tridiagonal matrix T ∈ R^{n×n} using Givens rotations.

P5.4.4 Let A ∈ R^{m×n} and assume that 0 ≠ v satisfies || Av ||_2 = σ_n(A) || v ||_2. Let Π be a permutation such that if Π^T v = w, then |w_n| = || w ||_∞. Show that if AΠ = QR is the QR factorization of AΠ, then |r_nn| ≤ sqrt(n) σ_n(A). Thus, there always exists a permutation Π such that the QR factorization of AΠ "displays" near rank deficiency.

P5.4.5 Let x, y ∈ R^m and Q ∈ R^{m×m} be given with Q orthogonal. Show that if

    Q^T x = [ α ; u ],   Q^T y = [ β ; v ],   α, β ∈ R,  u, v ∈ R^{m−1},

then u^T v = x^T y − αβ.

P5.4.6 Let A = [ a_1, ..., a_n ] ∈ R^{m×n} and b ∈ R^m be given. For any subset of A's columns { a_{c_1}, ..., a_{c_k} } define

    res[ a_{c_1}, ..., a_{c_k} ] = min_{x ∈ R^k} || [ a_{c_1}, ..., a_{c_k} ] x − b ||_2.

Describe an alternative pivot selection procedure for Algorithm 5.4.1 such that if QR = AΠ = [ a_{c_1}, ..., a_{c_n} ] in the final factorization, then for k = 1:n:

    res[ a_{c_1}, ..., a_{c_k} ] = min_i res[ a_{c_1}, ..., a_{c_{k−1}}, a_i ].
Notes and References for Sec. 5.4

R.J. Hanson and C.L. Lawson (1969). "Extensions and Applications of the Householder Algorithm for Solving Linear Least Squares Problems," Math. Comp. 23, 787–812.
P.Å. Wedin (1973). "On the Almost Rank-Deficient Case of the Least Squares Problem," BIT 13, 344–54.
G.H. Golub and V. Pereyra (1976). "Differentiation of Pseudo-Inverses, Separable Nonlinear Least Squares Problems and Other Tales," in Generalized Inverses and Applications, ed. M.Z. Nashed, Academic Press, New York, pp. 303–24.

The computation of the SVD is detailed in §8.6. But here are some of the standard references concerned with its calculation:

G.H. Golub and W. Kahan (1965). "Calculating the Singular Values and Pseudo-Inverse of a Matrix," SIAM J. Num. Anal. 2, 205–24.
P.A. Businger and G.H. Golub (1969). "Algorithm 358: Singular Value Decomposition of a Complex Matrix," Comm. ACM 12, 564–65.
G.H. Golub and C. Reinsch (1970). "Singular Value Decomposition and Least Squares Solutions," Numer. Math. 14, 403–20. See also Wilkinson and Reinsch (1971, pp. 134–51).
T.F. Chan (1982). "An Improved Algorithm for Computing the Singular Value Decomposition," ACM Trans. Math. Soft. 8, 72–83.

QR with column pivoting was first discussed in

P.A. Businger and G.H. Golub (1965). "Linear Least Squares Solutions by Householder Transformations," Numer. Math. 7, 269–76. See also Wilkinson and Reinsch (1971, pp. 111–18).

Knowing when to stop in the algorithm is difficult. In questions of rank deficiency, it is helpful to obtain information about the smallest singular value of the upper triangular matrix R. This can be done using the techniques of §3.5.4 or those that are discussed in

I. Karasalo (1974). "A Criterion for Truncation of the QR Decomposition Algorithm for the Singular Linear Least Squares Problem," BIT 14, 156–66.
N. Anderson and I. Karasalo (1975). "On Computing Bounds for the Least Singular Value of a Triangular Matrix," BIT 15, 1–4.
L.V. Foster (1986). "Rank and Null Space Calculations Using Matrix Decomposition without Column Interchanges," Lin. Alg. and Its Applic. 74, 47–71.
T.F. Chan (1987). "Rank Revealing QR Factorizations," Lin. Alg. and Its Applic. 88/89, 67–82.
T.F. Chan and P.C. Hansen (1992). "Some Applications of the Rank Revealing QR Factorization," SIAM J. Sci. and Stat. Comp. 13, 727–741.
J.L. Barlow and U.B. Vemulapati (1992). "Rank Detection Methods for Sparse Matrices," SIAM J. Matrix Anal. Appl. 13, 1279–1297.
T-M. Hwang, W-W. Lin, and E.K. Yang (1992). "Rank-Revealing LU Factorizations," Lin. Alg. and Its Applic. 175, 115–141.
C.H. Bischof and P.C. Hansen (1992). "A Block Algorithm for Computing Rank-Revealing QR Factorizations," Numerical Algorithms 2, 371–392.
S. Chandrasekaran and I.C.F. Ipsen (1994). "On Rank-Revealing Factorizations," SIAM J. Matrix Anal. Appl. 15, 592–622.
R.D. Fierro and P.C. Hansen (1995). "Accuracy of TSVD Solutions Computed from Rank-Revealing Decompositions," Numer. Math. 70, 453–472.
5.5 The Rank Deficient LS Problem

If A is rank deficient, then there are an infinite number of solutions to the LS problem and we must resort to special techniques. These techniques must address the difficult problem of numerical rank determination.
After some SVD preliminaries, we show how QR with column pivoting can be used to determine a minimizer x_B with the property that A x_B is a linear combination of r = rank(A) columns. We then discuss the minimum 2-norm solution that can be obtained from the SVD.

5.5.1 The Minimum Norm Solution

Suppose A ∈ R^{m×n} and rank(A) = r < n. The rank deficient LS problem has an infinite number of solutions, for if x is a minimizer and z ∈ null(A) then x + z is also a minimizer. The set of all minimizers

    X = { x ∈ R^n : || Ax − b ||_2 = min }

is convex, for if x_1, x_2 ∈ X and λ ∈ [0, 1], then

    || A(λ x_1 + (1−λ) x_2) − b ||_2 ≤ λ || A x_1 − b ||_2 + (1−λ) || A x_2 − b ||_2 = min || Ax − b ||_2.

Thus, λ x_1 + (1−λ) x_2 ∈ X. It follows that X has a unique element having minimum 2-norm and we denote this solution by x_LS. (Note that in the full rank case, there is only one LS solution and so it must have minimal 2-norm. Thus, we are consistent with the notation in §5.3.)

5.5.2 Complete Orthogonal Factorization and x_LS

Any complete orthogonal factorization can be used to compute x_LS. In particular, if Q and Z are orthogonal matrices such that

    Q^T A Z = [ T_11  0 ; 0  0 ],   T_11 ∈ R^{r×r},   r = rank(A),

then

    || Ax − b ||_2^2 = || (Q^T A Z) Z^T x − Q^T b ||_2^2 = || T_11 w − c ||_2^2 + || d ||_2^2

where

    Z^T x = [ w ; y ],  w ∈ R^r, y ∈ R^{n−r},   and   Q^T b = [ c ; d ],  c ∈ R^r, d ∈ R^{m−r}.

Clearly, if x is to minimize the sum of squares, then we must have w = T_11^{−1} c. For x to have minimal 2-norm, y must be zero, and thus

    x_LS = Z [ T_11^{−1} c ; 0 ].
5.5.3 The SVD and the LS Problem

Of course, the SVD is a particularly revealing complete orthogonal decomposition. It provides a neat expression for x_LS and the norm of the minimum residual ρ_LS = || A x_LS − b ||_2.

Theorem 5.5.1 Suppose U^T A V = Σ is the SVD of A ∈ R^{m×n} with r = rank(A). If U = [ u_1, ..., u_m ] and V = [ v_1, ..., v_n ] are column partitionings and b ∈ R^m, then

    x_LS = Σ_{i=1}^{r} (u_i^T b / σ_i) v_i     (5.5.1)

minimizes || Ax − b ||_2 and has the smallest 2-norm of all minimizers. Moreover

    ρ_LS^2 = || A x_LS − b ||_2^2 = Σ_{i=r+1}^{m} (u_i^T b)^2.     (5.5.2)

Proof. For any x ∈ R^n we have:

    || Ax − b ||_2^2 = || (U^T A V)(V^T x) − U^T b ||_2^2 = Σ_{i=1}^{r} (σ_i α_i − u_i^T b)^2 + Σ_{i=r+1}^{m} (u_i^T b)^2

where α = V^T x. Clearly, if x solves the LS problem, then α_i = (u_i^T b / σ_i) for i = 1:r. If we set α(r+1:n) = 0, then the resulting x clearly has minimal 2-norm. □
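Formulas (5.5.1) and (5.5.2) translate directly into a few lines of NumPy. The sketch below is illustrative (the small rank-deficient test matrix and the rank tolerance are assumptions, not taken from the text):

    import numpy as np

    A = np.array([[1., 1., 2.],
                  [1., 1., 0.],
                  [2., 2., 1.],
                  [0., 0., 1.]])          # column 2 duplicates column 1, so rank(A) = 2
    b = np.array([1., 2., 3., 4.])

    U, s, Vt = np.linalg.svd(A)
    tol = max(A.shape)*np.finfo(float).eps*s[0]
    r = int(np.sum(s > tol))                        # numerical rank

    x_ls = Vt[:r].T @ ((U[:, :r].T @ b)/s[:r])      # sum_{i<=r} (u_i'b/sigma_i) v_i
    rho  = np.linalg.norm(U[:, r:].T @ b)           # sqrt of sum_{i>r} (u_i'b)^2

    print(x_ls, rho, np.linalg.norm(A @ x_ls - b))  # rho matches the residual norm

Among all minimizers of || Ax − b ||_2, the vector x_ls produced this way has the smallest 2-norm, as the theorem asserts.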

5.5.4 The Pseudo-Inverse

Note that if we define the matrix A^+ ∈ R^{n×m} by A^+ = V Σ^+ U^T where

    Σ^+ = diag( 1/σ_1, ..., 1/σ_r, 0, ..., 0 ) ∈ R^{n×m},   r = rank(A),

then x_LS = A^+ b and ρ_LS = || (I − A A^+) b ||_2. A^+ is referred to as the pseudo-inverse of A. It is the unique minimal Frobenius norm solution to the problem

    min_{X ∈ R^{n×m}} || A X − I_m ||_F.     (5.5.3)

If rank(A) = n, then A^+ = (A^T A)^{−1} A^T, while if m = n = rank(A), then A^+ = A^{−1}. Typically, A^+ is defined to be the unique matrix X ∈ R^{n×m} that satisfies the four Moore-Penrose conditions:

    (i)  A X A = A          (iii) (A X)^T = A X
    (ii) X A X = X          (iv)  (X A)^T = X A.
These conditions amount to the requirement that A A^+ and A^+ A be orthogonal projections onto ran(A) and ran(A^T), respectively. Indeed, A A^+ = U_1 U_1^T where U_1 = U(1:m, 1:r) and A^+ A = V_1 V_1^T where V_1 = V(1:n, 1:r).

5.5.5 Some Sensitivity Issues

In §5.3 we examined the sensitivity of the full rank LS problem. The behavior of x_LS in this situation is summarized in Theorem 5.3.1. If we drop the full rank assumption, then x_LS is not even a continuous function of the data and small changes in A and b can induce arbitrarily large changes in x_LS = A^+ b. The easiest way to see this is to consider the behavior of the pseudo-inverse. If A and δA are in R^{m×n}, then Wedin (1973) and Stewart (1975) show that

    || (A + δA)^+ − A^+ ||_F ≤ 2 || δA ||_F max { || A^+ ||_2^2 , || (A + δA)^+ ||_2^2 }.

This inequality is a generalization of Theorem 2.3.4 in which perturbations in the matrix inverse are bounded. However, unlike the square nonsingular case, the upper bound does not necessarily tend to zero as δA tends to zero. If

    A = [ 1  0 ; 0  0 ; 0  0 ]   and   δA = [ 0  0 ; 0  ε ; 0  0 ],

then

    A^+ = [ 1  0  0 ; 0  0  0 ],   (A + δA)^+ = [ 1  0  0 ; 0  1/ε  0 ],

and || A^+ − (A + δA)^+ ||_2 = 1/ε. The numerical determination of an LS minimizer in the presence of such discontinuities is a major challenge.

5.5.6 QR with Column Pivoting and Basic Solutions

Suppose A ∈ R^{m×n} has rank r. QR with column pivoting (Algorithm 5.4.1) produces the factorization AΠ = QR where

    R = [ R_11  R_12 ; 0  0 ],   R_11 ∈ R^{r×r}.

Given this reduction, the LS problem can be readily solved. Indeed, for any x ∈ R^n we have

    || Ax − b ||_2^2 = || (Q^T A Π)(Π^T x) − (Q^T b) ||_2^2 = || R_11 y − (c − R_12 z) ||_2^2 + || d ||_2^2,

where

    Π^T x = [ y ; z ],  y ∈ R^r, z ∈ R^{n−r},   and   Q^T b = [ c ; d ],  c ∈ R^r, d ∈ R^{m−r}.

Thus, if x is an LS minimizer, then we must have

    x = Π [ R_11^{−1}(c − R_12 z) ; z ].

If z is set to zero in this expression, then we obtain the basic solution

    x_B = Π [ R_11^{−1} c ; 0 ].

Notice that x_B has at most r nonzero components and so A x_B involves a subset of A's columns.
The basic solution is not the minimal 2-norm solution unless the submatrix R_12 is zero, since

    || x_LS ||_2 = min_{z ∈ R^{n−r}} || x_B − Π [ R_11^{−1} R_12 ; −I_{n−r} ] z ||_2.     (5.5.4)

Indeed, this characterization of || x_LS ||_2 can be used to show

    1 ≤ || x_B ||_2 / || x_LS ||_2 ≤ sqrt( 1 + || R_11^{−1} R_12 ||_2^2 ).     (5.5.5)

See Golub and Pereyra (1976) for details.
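A basic solution is easy to form once a pivoted QR factorization is available. The following sketch uses scipy.linalg.qr with pivoting=True and illustrative rank-deficient data (the test matrix, the rank threshold, and the comparison with lstsq are assumptions made for illustration):

    import numpy as np
    from scipy.linalg import qr

    rng = np.random.default_rng(2)
    A = rng.standard_normal((8, 2))
    A = np.column_stack([A, A[:, 0] + A[:, 1], rng.standard_normal(8)])   # rank 3
    b = rng.standard_normal(8)

    Q, R, piv = qr(A, pivoting=True)                       # A[:, piv] = Q R
    r = int(np.sum(np.abs(np.diag(R)) > 1e-10*abs(R[0, 0])))   # numerical rank
    xB = np.zeros(A.shape[1])
    xB[piv[:r]] = np.linalg.solve(R[:r, :r], (Q.T @ b)[:r])    # x_B = Pi [R11^{-1} c ; 0]

    x_ls = np.linalg.lstsq(A, b, rcond=None)[0]            # minimum norm LS solution
    print(np.linalg.norm(A @ xB - b), np.linalg.norm(A @ x_ls - b))   # (nearly) equal residuals
    print(np.linalg.norm(xB), np.linalg.norm(x_ls))        # ||x_B|| >= ||x_LS||, cf. (5.5.5)

The basic solution touches only r columns of A, while x_LS generally involves them all; both achieve the same minimum residual.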

5.5.7 Numerical Rank Determination with AΠ = QR

If Algorithm 5.4.1 is used to compute x_B, then care must be exercised in the determination of rank(A). In order to appreciate the difficulty of this, suppose

    fl( H_k ··· H_1 A Π_1 ··· Π_k ) = R̂^{(k)} = [ R̂_11^{(k)}  R̂_12^{(k)} ; 0  R̂_22^{(k)} ]

(with R̂_11^{(k)} of order k) is the matrix computed after k steps of the algorithm have been executed in floating point. Suppose rank(A) = k. Because of roundoff error, R̂_22^{(k)} will not be exactly zero. However, if R̂_22^{(k)} is suitably small in norm then it is reasonable to terminate the reduction and declare A to have rank k. A typical termination criterion might be

    || R̂_22^{(k)} ||_2 ≤ ε_1 || A ||_2     (5.5.6)

for some small machine-dependent parameter ε_1. In view of the roundoff properties associated with Householder matrix computation (cf. §5.1.12), we know that R̂^{(k)} is the exact R factor of a matrix A + E_k, where

    || E_k ||_2 ≤ ε_2 || A ||_2,   ε_2 = O(u).

Using Theorem 2.5.2 we have

    σ_{k+1}(A + E_k) = σ_{k+1}( R̂^{(k)} ) ≤ || R̂_22^{(k)} ||_2.

Since σ_{k+1}(A) ≤ σ_{k+1}(A + E_k) + || E_k ||_2, it follows that

    σ_{k+1}(A) ≤ ( ε_1 + ε_2 ) || A ||_2.

In other words, a relative perturbation of O(ε_1 + ε_2) in A can yield a rank-k matrix. With this termination criterion, we conclude that QR with column pivoting "discovers" rank degeneracy if in the course of the reduction R̂_22^{(k)} is small for some k < n.
Unfortunately, this is not always the case. A matrix can be nearly rank deficient without a single R̂_22^{(k)} being particularly small. Thus, QR with column pivoting by itself is not entirely reliable as a method for detecting near rank deficiency. However, if a good condition estimator is applied to R it is practically impossible for near rank deficiency to go unnoticed.

Example 5.5.1 Let T_n(c) be the matrix

    T_n(c) = diag(1, s, ..., s^{n−1}) [ 1  −c  ···  −c ; 0  1  −c  ···  −c ; ⋮  ⋱  ⋮ ; 0  ···  1  −c ; 0  ···  0  1 ]

where c^2 + s^2 = 1 with c, s > 0. (See Lawson and Hanson (1974, p.31).) These matrices are unaltered by Algorithm 5.4.1 and thus || R̂_22^{(k)} ||_2 ≥ s^{n−1} for k = 1:n−1. This inequality implies (for example) that the matrix T_100(.2) has no particularly small trailing principal submatrix since s^{99} ≈ .13. However, it can be shown that σ_n = O(10^{−8}).
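The phenomenon in Example 5.5.1 can be reproduced with a few lines of NumPy/SciPy. The sketch below is illustrative: in exact arithmetic no column interchanges occur and |r_nn| = s^{n−1}, but in floating point the column norms of T_n(c) are equal only up to roundoff, so a library pivoted QR may deviate slightly from the exact-arithmetic statement.

    import numpy as np
    from scipy.linalg import qr

    n, c = 100, 0.2
    s = np.sqrt(1.0 - c*c)
    T = np.triu(-c*np.ones((n, n)), 1) + np.eye(n)
    T = np.diag(s**np.arange(n)) @ T                    # the matrix T_n(c)

    Q, R, piv = qr(T, pivoting=True)
    print(abs(R[-1, -1]))                               # roughly s^(n-1) ~ 0.13
    print(np.linalg.svd(T, compute_uv=False)[-1])       # sigma_n, many orders smaller

The last diagonal entry of R gives no hint of the near rank deficiency that the smallest singular value reveals.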

5.5.8 Numerical Rank and the SVD

We now focus our attention on the ability of the SVD to handle rank-deficiency in the presence of roundoff. Recall that if A = U Σ V^T is the SVD of A, then

    x_LS = Σ_{i=1}^{r} (u_i^T b / σ_i) v_i     (5.5.7)

where r = rank(A). Denote the computed versions of U, V, and Σ = diag(σ_i) by Û, V̂, and Σ̂ = diag(σ̂_i). Assume that both sequences of singular values range from largest to smallest. For a reasonably implemented SVD algorithm it can be shown that

    Û = W + ΔU,   W^T W = I_m,   || ΔU ||_2 ≤ ε     (5.5.8)

    V̂ = Z + ΔV,   Z^T Z = I_n,   || ΔV ||_2 ≤ ε     (5.5.9)

    Σ̂ = W^T (A + ΔA) Z,   || ΔA ||_2 ≤ ε || A ||_2     (5.5.10)

where ε is a small multiple of u, the machine precision. In plain English, the SVD algorithm computes the singular values of a "nearby" matrix A + ΔA. Note that Û and V̂ are not necessarily close to their exact counterparts. However, we can show that σ̂_k is close to σ_k. Using (5.5.10) and Theorem 2.5.2 we have

    σ_k = min_{rank(B)=k−1} || A − B ||_2 = min_{rank(B)=k−1} || (Σ̂ − B) − W^T (ΔA) Z ||_2.

Since

    min_{rank(B)=k−1} || Σ̂ − B ||_2 = σ̂_k,

it follows that | σ_k − σ̂_k | ≤ ε σ_1 for k = 1:n. Thus, if A has rank r then we can expect n − r of the computed singular values to be small. Near rank deficiency in A cannot escape detection when the SVD of A is computed.

Example 5.5.2 For the matrix T_100(.2) in Example 5.5.1, σ_n ≈ .367 · 10^{−8}.

One approach to estimating r = rank(A) from the computed singular values is to have a tolerance δ > 0 and a convention that A has "numerical rank" r̂ if the σ̂_i satisfy

    σ̂_1 ≥ ··· ≥ σ̂_r̂ > δ ≥ σ̂_{r̂+1} ≥ ··· ≥ σ̂_n.

The tolerance δ should be consistent with the machine precision, e.g., δ = u || A ||_∞. However, if the general level of relative error in the data is larger than u, then δ should be correspondingly bigger, e.g., δ = 10^{−2} || A ||_∞ if the entries in A are correct to two digits.
If r̂ is accepted as the numerical rank then we can regard

    x_r̂ = Σ_{i=1}^{r̂} (û_i^T b / σ̂_i) v̂_i
.Ui
as an approximation to x_LS. Since || x_r̂ ||_2 is roughly bounded by 1/σ̂_r̂ ≤ 1/δ, the tolerance δ may also be chosen with the intention of producing an approximate LS solution with suitably small norm. In §12.1, we discuss more sophisticated methods for doing this.
If σ̂_r̂ ≫ δ, then we have reason to be comfortable with x_r̂ because A can then be unambiguously regarded as a rank-r̂ matrix (modulo δ).
On the other hand, { σ̂_1, ..., σ̂_n } might not clearly split into subsets of small and large singular values, making the determination of r̂ by this means somewhat arbitrary. This leads to more complicated methods for estimating rank which we now discuss in the context of the LS problem. For example, suppose r = n and assume for the moment that ΔA = 0 in (5.5.10). Thus σ_i = σ̂_i for i = 1:n. Denote the ith columns of the matrices U, W, V, and Z by u_i, w_i, v_i, and z_i, respectively. Subtracting x_r̂ from x_LS and taking norms we obtain

    || x_r̂ − x_LS ||_2 ≤ Σ_{i=1}^{r̂} || (w_i^T b) z_i − (u_i^T b) v_i ||_2 / σ_i + Σ_{i=r̂+1}^{n} | u_i^T b | / σ_i.

From (5.5.8) and (5.5.9) it is easy to verify that

    || (w_i^T b) z_i − (u_i^T b) v_i ||_2 ≤ 2 (1 + ε) ε || b ||_2     (5.5.11)

and therefore

    || x_r̂ − x_LS ||_2 ≤ 2 (1 + ε) ε || b ||_2 Σ_{i=1}^{r̂} 1/σ_i + Σ_{i=r̂+1}^{n} | u_i^T b | / σ_i.

The parameter r̂ can be determined as that integer which minimizes the upper bound. Notice that the first term in the bound increases with r̂, while the second decreases.
On occasions when minimizing the residual is more important than accuracy in the solution, we can determine r̂ on the basis of how close we surmise || b − A x_r̂ ||_2 is to the true minimum. Paralleling the above analysis, it can be shown that the increase || b − A x_r̂ ||_2 − || b − A x_LS ||_2 is bounded by the sum of a term proportional to (n − r̂) || b ||_2, which decreases as r̂ grows, and a term of order ε || b ||_2 that grows with r̂. Again r̂ could be chosen to minimize the upper bound. See Varah (1973) for practical details and also the LAPACK manual.

5.5.9 Some Comparisons

As we mentioned, when solving the LS problem via the SVD, only Σ and V have to be computed. The following table compares the efficiency of this approach with the other algorithms that we have presented.

LS Algorithm                        Flop Count
Normal Equations                    mn^2 + n^3/3
Householder Orthogonalization       2mn^2 − 2n^3/3
Modified Gram-Schmidt               2mn^2
Givens Orthogonalization            3mn^2 − n^3
Householder Bidiagonalization       4mn^2 − 4n^3/3
R-Bidiagonalization                 2mn^2 + 2n^3
Golub-Reinsch SVD                   4mn^2 + 8n^3
R-SVD                               2mn^2 + 11n^3

Problems

P5.5.1 Show that if

    A = [ T  S ; 0  0 ],   T ∈ R^{r×r} nonsingular,   r = rank(A),

then

    X = [ T^{−1}  0 ; 0  0 ] ∈ R^{n×m}

satisfies A X A = A and (A X)^T = A X. In this case, we say that X is a (1,3) pseudo-inverse of A. Show that, for general A, x_B = X b where X is a (1,3) pseudo-inverse of A.

P5.5.2 Define B(λ) ∈ R^{n×m} by B(λ) = (A^T A + λ I)^{−1} A^T, where λ > 0. Show that

    || B(λ) − A^+ ||_2 = λ / ( σ_r(A) ( σ_r(A)^2 + λ ) ),   r = rank(A),

and therefore that B(λ) → A^+ as λ → 0.

P5.5.3 Consider the rank deficient LS problem

    min_{y,z} || [ R  S ; 0  0 ] [ y ; z ] − b ||_2

where R ∈ R^{r×r}, S ∈ R^{r×(n−r)}, y ∈ R^r, and z ∈ R^{n−r}. Assume that R is upper triangular and nonsingular. Show how to obtain the minimum norm solution to this problem by computing an appropriate QR factorization without pivoting and then solving for the appropriate y and z.

P5.5.4 Show that if A_k → A and A_k^+ → A^+, then there exists an integer k_0 such that rank(A_k) is constant for all k ≥ k_0.

P5.5.5 Show that if A ∈ R^{m×n} has rank n, then so does A + E if || E ||_2 || A^+ ||_2 < 1.
Notes and References for Sec. 5.5

The pseudo-inverse literature is vast, as evidenced by the 1,775 references in

M.Z. Nashed (1976). Generalized Inverses and Applications, Academic Press, New York.

The differentiation of the pseudo-inverse is further discussed in

C.L. Lawson and R.J. Hanson (1969). "Extensions and Applications of the Householder Algorithm for Solving Linear Least Squares Problems," Math. Comp. 23, 787–812.
G.H. Golub and V. Pereyra (1973). "The Differentiation of Pseudo-Inverses and Nonlinear Least Squares Problems Whose Variables Separate," SIAM J. Num. Anal. 10, 413–32.

Survey treatments of LS perturbation theory may be found in Lawson and Hanson (1974), Stewart and Sun (1990), Björck (1996), and

P.Å. Wedin (1973). "Perturbation Theory for Pseudo-Inverses," BIT 13, 217–32.
G.W. Stewart (1977). "On the Perturbation of Pseudo-Inverses, Projections, and Linear Least Squares," SIAM Review 19, 634–62.

Even for full rank problems, column pivoting seems to produce more accurate solutions. The error analysis in the following paper attempts to explain why.

L.S. Jennings and M.R. Osborne (1974). "A Direct Error Analysis for Least Squares," Numer. Math. 22, 322–32.

Various other aspects of rank deficiency are discussed in

J.M. Varah (1973). "On the Numerical Solution of Ill-Conditioned Linear Systems with Applications to Ill-Posed Problems," SIAM J. Num. Anal. 10, 257–67.
G.W. Stewart (1984). "Rank Degeneracy," SIAM J. Sci. and Stat. Comp. 5, 403–413.
P.C. Hansen (1987). "The Truncated SVD as a Method for Regularization," BIT 27, 534–553.
G.W. Stewart (1987). "Collinearity and Least Squares Regression," Statistical Science 2, 68–100.

We have more to say on the subject in §12.1 and §12.2.

5.6 Weighting and Iterative Improvement

The concepts of scaling and iterative improvement were introduced in the Chapter 3 context of square linear systems. Generalizations of these ideas that are applicable to the least squares problem are now offered.

5.6.1 Column Weighting

Suppose G ∈ R^{n×n} is nonsingular. A solution to the LS problem

    min || Ax − b ||_2     (5.6.1)

can be obtained by finding the minimum 2-norm solution y_LS to

    min || (AG) y − b ||_2     (5.6.2)

and then setting x_G = G y_LS. If rank(A) = n, then x_G = x_LS. Otherwise, x_G is the minimum G-norm solution to (5.6.1), where the G-norm is defined by || z ||_G = || G^{−1} z ||_2.
The choice of G is important. Sometimes its selection can be based on a priori knowledge of the uncertainties in A. On other occasions, it may be desirable to normalize the columns of A by setting

    G = G_0 = diag( 1/|| A(:,1) ||_2, ..., 1/|| A(:,n) ||_2 ).

Van der Sluis (1969) has shown that with this choice, κ_2(AG) is approximately minimized. Since the computed accuracy of y_LS depends on κ_2(AG), a case can be made for setting G = G_0.
We remark that column weighting affects singular values. Consequently, a scheme for determining numerical rank may not return the same estimates when applied to A and AG. See Stewart (1984b).
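The effect of the normalization G_0 is easy to see numerically. A minimal sketch, assuming NumPy and an illustrative matrix with badly scaled columns:

    import numpy as np

    rng = np.random.default_rng(3)
    A = rng.standard_normal((20, 5)) @ np.diag([1.0, 1e3, 1e-4, 10.0, 1e5])
    G0 = np.diag(1.0/np.linalg.norm(A, axis=0))      # column-norm scaling
    print(np.linalg.cond(A), np.linalg.cond(A @ G0))
    # A solution of min ||(A G0) y - b||_2 is mapped back via x = G0 y.

The condition number of A G_0 is typically orders of magnitude smaller than that of A when the columns of A have widely varying norms.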

5.6.2 Row Weighting

Let D = diag(d_1, ..., d_m) be nonsingular and consider the weighted least squares problem

    minimize || D(Ax − b) ||_2.     (5.6.3)

Assume rank(A) = n and that x_D solves (5.6.3). It follows that the solution x_LS to (5.6.1) satisfies

    x_D − x_LS = (A^T D^2 A)^{−1} A^T (D^2 − I)(b − A x_LS).     (5.6.4)

This shows that row weighting in the LS problem affects the solution. (An important exception occurs when b ∈ ran(A), for then x_D = x_LS.)
One way of determining D is to let d_k be some measure of the uncertainty in b_k, e.g., the reciprocal of the standard deviation in b_k. The tendency is for r_k = e_k^T (b − A x_D) to be small whenever d_k is large. The precise effect of d_k on r_k can be clarified as follows. Define

    D(δ) = diag( d_1, ..., d_{k−1}, d_k sqrt(1+δ), d_{k+1}, ..., d_m )

where δ > −1. If x(δ) minimizes || D(δ)(Ax − b) ||_2 and r_k(δ) is the kth component of b − A x(δ), then it can be shown that

    r_k(δ) = r_k / ( 1 + δ d_k^2 e_k^T A (A^T D^2 A)^{−1} A^T e_k ).     (5.6.5)

This explicit expression shows that r_k(δ) is a monotone decreasing function of δ. Of course, how r_k changes when all the weights are varied is much more complicated.

Example 5.6.1 Suppose A ∈ R^{4×2} and b ∈ R^4. If D = I_4 then x_D = [ −1, .85 ]^T and r = b − A x_D = [ .3, −.4, −.1, .2 ]^T. On the other hand, if D = diag(1000, 1, 1, 1) then x_D ≈ [ −1.43, 1.21 ]^T and r = b − A x_D ≈ [ .000428, −.571428, −.142853, .285714 ]^T. Heavily weighting the first equation drives its residual component toward zero.

5.6.3 Generalized Least Squares

In many estimation problems, the vector of observations b is related to x through the equation

    b = Ax + w     (5.6.6)

where the noise vector w has zero mean and a symmetric positive definite variance-covariance matrix σ^2 W. Assume that W is known and that W = B B^T for some B ∈ R^{m×m}. The matrix B might be given or it might be W's Cholesky triangle. In order that all the equations in (5.6.6) contribute equally to the determination of x, statisticians frequently solve the LS problem

    min || B^{−1}(Ax − b) ||_2.     (5.6.7)

An obvious computational approach to this problem is to form Ā = B^{−1} A and b̄ = B^{−1} b and then apply any of our previous techniques to minimize || Āx − b̄ ||_2. Unfortunately, x will be poorly determined by such a procedure if B is ill-conditioned.
A much more stable way of solving (5.6.7) using orthogonal transformations has been suggested by Paige (1979a, 1979b). It is based on the idea that (5.6.7) is equivalent to the generalized least squares problem,

    min_{b = Ax + Bv} v^T v.     (5.6.8)

Notice that this problem is defined even if A and B are rank deficient. Although Paige's technique can be applied when this is the case, we shall describe it under the assumption that both these matrices have full rank.
The first step is to compute the QR factorization of A:

    Q^T A = [ R_1 ; 0 ],   R_1 ∈ R^{n×n} upper triangular,   Q = [ Q_1  Q_2 ],  Q_1 ∈ R^{m×n}.

An orthogonal matrix Z ∈ R^{m×m} is then determined so that

    Q_2^T B Z = [ 0  S ],   Z = [ Z_1  Z_2 ],   Z_1 ∈ R^{m×n}, Z_2 ∈ R^{m×(m−n)},

where S is upper triangular. With the use of these orthogonal matrices the constraint in (5.6.8) transforms to

    [ Q_1^T b ; Q_2^T b ] = [ R_1 ; 0 ] x + [ Q_1^T B Z_1   Q_1^T B Z_2 ; 0   S ] [ Z_1^T v ; Z_2^T v ].

Notice that the "bottom half" of this equation determines v,

    S u = Q_2^T b,   v = Z_2 u,     (5.6.9)

while the "top half" prescribes x:

    R_1 x = Q_1^T b − ( Q_1^T B Z_1 Z_1^T + Q_1^T B Z_2 Z_2^T ) v = Q_1^T b − Q_1^T B Z_2 u.     (5.6.10)

The attractiveness of this method is that all potential ill-conditioning is concentrated in the triangular systems (5.6.9) and (5.6.10). Moreover, Paige (1979b) has shown that the above procedure is numerically stable, something that is not true of any method that explicitly forms B^{−1} A.
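The following is a small sketch of the same idea in NumPy (an illustrative variant, not Paige's code): instead of reducing Q_2^T B from the right to the form [0 S], it triangularizes Q_2^T B through a QR factorization of its transpose, so the blocks appear in a slightly different order than in (5.6.9)-(5.6.10); the names and the assumption that A has full column rank and B is nonsingular are assumptions.

    import numpy as np

    def paige_gls(A, B, b):
        # Solve min v'v subject to b = Ax + Bv using orthogonal transformations.
        m, n = A.shape                              # m > n assumed
        Q, R = np.linalg.qr(A, mode='complete')     # Q'A = [R1; 0]
        Q1, Q2, R1 = Q[:, :n], Q[:, n:], R[:n, :]
        M = Q2.T @ B                                # (m-n)-by-m, full row rank
        Zt, L = np.linalg.qr(M.T)                   # M' = Zt L, hence M Zt = L'
        w = np.linalg.solve(L.T, Q2.T @ b)          # lower triangular system
        v = Zt @ w                                  # minimum 2-norm v with Q2'(Ax+Bv) = Q2'b
        x = np.linalg.solve(R1, Q1.T @ b - Q1.T @ (B @ v))
        return x, v

When B is nonsingular, the x returned here agrees with the solution of min || B^{-1}(Ax − b) ||_2, but no explicit B^{-1} A is ever formed; all ill-conditioning is confined to the two triangular solves.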

5.6.4 Iterative Improvement

A technique for refining an approximate LS solution has been analyzed by Björck (1967, 1968). It is based on the idea that if

    [ I  A ; A^T  0 ] [ r ; x ] = [ b ; 0 ]     (5.6.11)

then || b − Ax ||_2 = min. This follows because r + Ax = b and A^T r = 0 imply A^T A x = A^T b. The above augmented system is nonsingular if rank(A) = n, which we hereafter assume.
By casting the LS problem in the form of a square linear system, the iterative improvement scheme (3.5.5) can be applied:

    r^(0) = 0;  x^(0) = 0
    for k = 0, 1, ...
        [ f^(k) ; g^(k) ] = [ b ; 0 ] − [ I  A ; A^T  0 ] [ r^(k) ; x^(k) ]
        Solve [ I  A ; A^T  0 ] [ p^(k) ; z^(k) ] = [ f^(k) ; g^(k) ]
        [ r^(k+1) ; x^(k+1) ] = [ r^(k) ; x^(k) ] + [ p^(k) ; z^(k) ]
    end

The residuals f^(k) and g^(k) must be computed in higher precision and an original copy of A must be around for this purpose.
If the QR factorization of A is available, then the solution of the augmented system is readily obtained. In particular, if A = QR and R_1 = R(1:n, 1:n), then a system of the form

    [ I  A ; A^T  0 ] [ p ; z ] = [ f ; g ]

transforms to

    [ I_n  0  R_1 ; 0  I_{m−n}  0 ; R_1^T  0  0 ] [ h ; h_2 ; z ] = [ f_1 ; f_2 ; g ]

where

    Q^T f = [ f_1 ; f_2 ],   f_1 ∈ R^n, f_2 ∈ R^{m−n},   and   Q^T p = [ h ; h_2 ],   h ∈ R^n, h_2 ∈ R^{m−n}.

Thus, p and z can be determined by solving the triangular systems R_1^T h = g and R_1 z = f_1 − h and setting p = Q [ h ; f_2 ]. Assuming that Q is stored in factored form, each iteration requires 8mn − 2n^2 flops.
The key to the iteration's success is that both the LS residual and solution are updated, not just the solution. Björck (1968) shows that if κ_2(A) ≈ β^q and t-digit, β-base arithmetic is used, then x^(k) has approximately k(t − q) correct base-β digits, provided the residuals are computed in double precision. Notice that it is κ_2(A), not κ_2(A)^2, that appears in this heuristic.
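One refinement sweep based on the QR recipe above is easy to write down. The sketch below is illustrative (ordinary working precision is used for the residuals, whereas the analysis assumes they are accumulated in higher precision; Q is the full m-by-m orthogonal factor and R1 = R(1:n,1:n)):

    import numpy as np

    def ls_refine(A, b, x, r, Q, R1):
        # One step of augmented-system refinement for min ||Ax - b||_2.
        m, n = A.shape
        f = b - r - A @ x                         # residual of the first block row
        g = -A.T @ r                              # residual of the second block row
        qf = Q.T @ f
        h = np.linalg.solve(R1.T, g)              # R1' h = g
        z = np.linalg.solve(R1, qf[:n] - h)       # R1 z = f1 - h
        p = Q @ np.concatenate([h, qf[n:]])       # p = Q [h; f2]
        return x + z, r + p

Starting from x = 0 and r = 0, the first sweep reproduces the QR least squares solution and its residual; subsequent sweeps refine both, and the gain per sweep is governed by κ_2(A) as described above.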
Problems

P5.6.1 Verify (5.6.4).

P5.6.2 Let A ∈ R^{m×n} have full rank and define the diagonal matrix

    Δ = diag( 1, ..., 1, sqrt(1+δ), 1, ..., 1 )     (the sqrt(1+δ) in the kth position)

for δ > −1. Denote the LS solution to min || Δ(Ax − b) ||_2 by x(δ) and its residual by r(δ) = b − A x(δ). (a) Show

    r(δ) = ( I − δ A (A^T A)^{−1} A^T e_k e_k^T / ( 1 + δ e_k^T A (A^T A)^{−1} A^T e_k ) ) r(0).

(b) Letting r_k(δ) stand for the kth component of r(δ), show

    r_k(δ) = r_k(0) / ( 1 + δ e_k^T A (A^T A)^{−1} A^T e_k ).

(c) Use (b) to verify (5.6.5).

P5.6.3 Show how the SVD can be used to solve the generalized LS problem when the matrices A and B in (5.6.8) are rank deficient.

P5.6.4 Let A ∈ R^{m×n} have rank n and for α ≥ 0 define

    M(α) = [ α I_m  A ; A^T  0 ].

Show that

    σ_min(M(α)) = min { α, −α/2 + sqrt( σ_n(A)^2 + (α/2)^2 ) }

and determine the value of α that minimizes κ_2(M(α)).

P5.6.5 Another iterative improvement method for LS problems is the following:

    x^(0) = 0
    for k = 0, 1, ...
        r^(k) = b − A x^(k)     (double precision)
        Solve min || A z^(k) − r^(k) ||_2 for z^(k)
        x^(k+1) = x^(k) + z^(k)
    end

(a) Assuming that the QR factorization of A is available, how many flops per iteration are required? (b) Show that the above iteration results by setting g^(k) = 0 in the iterative improvement scheme given in §5.6.4.

Notes and References for Sec. 5.6

Row and column weighting in the LS problem is discussed in Lawson and Hanson (SLS, pp. 180–88). The various effects of scaling are discussed in

A. van der Sluis (1969). "Condition Numbers and Equilibration of Matrices," Numer. Math. 14, 14–23.
G.W. Stewart (1984b). "On the Asymptotic Behavior of Scaled Singular Value and QR Decompositions," Math. Comp. 43, 483–489.

The theoretical and computational aspects of the generalized least squares problem appear in

S. Kourouklis and C.C. Paige (1981). "A Constrained Least Squares Approach to the General Gauss-Markov Linear Model," J. Amer. Stat. Assoc. 76, 620–625.
C.C. Paige (1979a). "Computer Solution and Perturbation Analysis of Generalized Linear Least Squares Problems," Math. Comp. 33, 171–183.
C.C. Paige (1979b). "Fast Numerically Stable Computations for Generalized Linear Least Squares Problems," SIAM J. Num. Anal. 16, 165–171.
C.C. Paige (1985). "The General Limit Model and the Generalized Singular Value Decomposition," Lin. Alg. and Its Applic. 70, 269–284.

Iterative improvement in the least squares context is discussed in

G.H. Golub and J.H. Wilkinson (1966). "Note on Iterative Refinement of Least Squares Solutions," Numer. Math. 9, 139–148.
A. Björck and G.H. Golub (1967). "Iterative Refinement of Linear Least Squares Solutions by Householder Transformation," BIT 7, 322–37.
A. Björck (1967). "Iterative Refinement of Linear Least Squares Solutions I," BIT 7, 257–78.
A. Björck (1968). "Iterative Refinement of Linear Least Squares Solutions II," BIT 8, 8–30.
A. Björck (1987). "Stability Analysis of the Method of Seminormal Equations for Linear Least Squares Problems," Lin. Alg. and Its Applic. 88/89, 31–48.

5.7 Square and Underdetermined Systems

The orthogonalization methods developed in this chapter can be applied to square systems and also to systems in which there are fewer equations than unknowns. In this brief section we discuss some of the various possibilities.

5.7.1 Using QR and SVD to Solve Square Systems

The least squares solvers based on the QR factorization and the SVD can be used to solve square linear systems: just set m = n. However, from the flop point of view, Gaussian elimination is the cheapest way to solve a square linear system, as shown in the following table which assumes that the right hand side is available at the time of factorization:

Method                              Flops
Gaussian Elimination                2n^3/3
Householder Orthogonalization       4n^3/3
Modified Gram-Schmidt               2n^3
Bidiagonalization                   8n^3/3
Singular Value Decomposition        12n^3

Nevertheless, there are three reasons why orthogonalization methods might be considered:

• The flop counts tend to exaggerate the Gaussian elimination advantage. When memory traffic and vectorization overheads are considered, the QR approach is comparable in efficiency.

• The orthogonalization methods have guaranteed stability; there is no "growth factor" to worry about as in Gaussian elimination.

• In cases of ill-conditioning, the orthogonal methods give an added measure of reliability. QR with condition estimation is very dependable and, of course, SVD is unsurpassed when it comes to producing a meaningful solution to a nearly singular system.

We are not expressing a strong preference for orthogonalization methods but merely suggesting viable alternatives to Gaussian elimination.
We also mention that the SVD entry in the above table assumes the availability of b at the time of decomposition. Otherwise, 20n^3 flops are required because it then becomes necessary to accumulate the U matrix.
If the QR factorization is used to solve Ax = b, then we ordinarily have to carry out a back substitution: Rx = Q^T b. However, this can be avoided by "preprocessing" b. Suppose H is a Householder matrix such that Hb = β e_n where e_n is the last column of I_n. If we compute the QR factorization of (HA)^T, then A = H R^T Q^T and the system transforms to

    R^T y = β e_n

where y = Q^T x. Since R^T is lower triangular, y = (β / r_nn) e_n and so

    x = (β / r_nn) Q(:, n).
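A small numerical check of this trick is sketched below (the matrix A and vector b are illustrative assumptions):

    import numpy as np

    n = 5
    A = np.random.rand(n, n) + n*np.eye(n)          # a well-conditioned test matrix
    b = np.random.rand(n)

    w = b.copy()
    w[-1] += np.sign(b[-1] if b[-1] != 0 else 1.0)*np.linalg.norm(b)
    w /= np.linalg.norm(w)
    H = np.eye(n) - 2.0*np.outer(w, w)              # Householder with Hb = beta*e_n
    beta = (H @ b)[-1]

    Q, R = np.linalg.qr((H @ A).T)                  # (HA)^T = QR, so A = H R^T Q^T
    x = (beta/R[-1, -1])*Q[:, -1]                   # no triangular solve needed
    print(np.linalg.norm(A @ x - b))                # should be near machine precision

The solution is read off directly from the last column of Q, so no back substitution is performed; the price is the QR factorization of (HA)^T instead of A.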

5.7.2 Underdetermined Systems

We say that a linear system

    Ax = b,   A ∈ R^{m×n},  b ∈ R^m,     (5.7.1)

is underdetermined whenever m < n. Notice that such a system either has no solution or has an infinity of solutions. In the second case, it is important to distinguish between algorithms that find the minimum 2-norm solution and those that do not necessarily do so. The first algorithm we present is in the latter category. Assume that A has full row rank and that we apply QR with column pivoting to obtain:

    Q^T A Π = [ R_1  R_2 ]

where R_1 ∈ R^{m×m} is upper triangular and R_2 ∈ R^{m×(n−m)}. Thus, Ax = b transforms to

    [ R_1  R_2 ] [ z_1 ; z_2 ] = Q^T b

where

    Π^T x = [ z_1 ; z_2 ],   z_1 ∈ R^m,  z_2 ∈ R^{n−m}.

By virtue of the column pivoting, R_1 is nonsingular because we are assuming that A has full row rank. One solution to the problem is therefore obtained by setting z_1 = R_1^{−1} Q^T b and z_2 = 0.

Algorithm 5.7.1 Given A ∈ R^{m×n} with rank(A) = m and b ∈ R^m, the following algorithm finds an x ∈ R^n such that Ax = b.

    Q^T A Π = R   (QR with column pivoting)
    Solve R(1:m, 1:m) z_1 = Q^T b.
    Set x = Π [ z_1 ; 0 ].
This algorithm requires 2m^2 n − m^3/3 flops. The minimum norm solution is not guaranteed. (A different Π could render a smaller z_1.) However, if we compute the QR factorization

    A^T = Q [ R_1 ; 0 ],   R_1 ∈ R^{m×m},

then Ax = b becomes

    [ R_1^T  0 ] Q^T x = b

where

    Q^T x = [ z_1 ; z_2 ],   z_1 ∈ R^m,  z_2 ∈ R^{n−m}.

Now the minimum norm solution does follow by setting z_2 = 0.

Algorithm 5.7.2 Given A ∈ R^{m×n} with rank(A) = m and b ∈ R^m, the following algorithm finds the minimal 2-norm solution to Ax = b.

    A^T = QR   (QR factorization)
    Solve R(1:m, 1:m)^T z = b.
    x = Q(:, 1:m) z

This algorithm requires at most 2m^2 n − 2m^3/3 flops.
The SVD can also be used to compute the minimal norm solution of an underdetermined Ax = b problem. If

    A = Σ_{i=1}^{r} σ_i u_i v_i^T,   r = rank(A),

is A's singular value expansion, then

    x = Σ_{i=1}^{r} (u_i^T b / σ_i) v_i.

As in the least squares problem, the SVD approach is desirable whenever A is nearly rank deficient.
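The two minimum-norm routes agree, as the following sketch shows (NumPy and the random full-row-rank test data are illustrative assumptions):

    import numpy as np

    m, n = 3, 7
    rng = np.random.default_rng(0)
    A = rng.standard_normal((m, n))
    b = rng.standard_normal(m)

    Q, R = np.linalg.qr(A.T)                     # A^T = QR, R is m-by-m (Algorithm 5.7.2)
    z = np.linalg.solve(R.T, b)                  # R^T z = b
    x_qr = Q[:, :m] @ z

    U, s, Vt = np.linalg.svd(A)
    x_svd = Vt[:m].T @ ((U.T @ b)/s)             # sum (u_i'b/sigma_i) v_i

    print(np.linalg.norm(A @ x_qr - b), np.linalg.norm(x_qr - x_svd))

Both vectors satisfy Ax = b and have the smallest possible 2-norm among all solutions.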

5.7.3 Perturbed Underdetermined Systems

We conclude this section with a perturbation result for full-rank underdetermined systems.

Theorem 5.7.1 Suppose rank(A) = m ≤ n and that A ∈ R^{m×n}, δA ∈ R^{m×n}, 0 ≠ b ∈ R^m, and δb ∈ R^m satisfy

    ε = max{ ε_A, ε_b } < σ_m(A),

where ε_A = || δA ||_2 / || A ||_2 and ε_b = || δb ||_2 / || b ||_2. If x and x̂ are minimum norm solutions that satisfy

    Ax = b,   (A + δA) x̂ = b + δb,

then

    || x̂ − x ||_2 / || x ||_2 ≤ ε ( min(1, n−m) + 2 ) κ_2(A) + O(ε^2).

Proof. Let E and f be defined by δA/ε and δb/ε. Note that rank(A + tE) = m for all 0 ≤ t ≤ ε and that

    x(t) = (A + tE)^T ( (A + tE)(A + tE)^T )^{−1} (b + tf)

satisfies (A + tE) x(t) = b + tf. By differentiating this expression with respect to t and setting t = 0 in the result we obtain

    ẋ(0) = ( I − A^T (A A^T)^{−1} A ) E^T (A A^T)^{−1} b + A^T (A A^T)^{−1} ( f − E x ).

Since

    || x ||_2 ≥ || A^T (A A^T)^{−1} b ||_2 ≥ σ_m(A) || (A A^T)^{−1} b ||_2,

    || I − A^T (A A^T)^{−1} A ||_2 = min(1, n−m),

and || b ||_2 ≤ || A ||_2 || x ||_2, we have

    || x̂ − x ||_2 / || x ||_2 = || x(ε) − x(0) ||_2 / || x(0) ||_2 = ε || ẋ(0) ||_2 / || x ||_2 + O(ε^2)
                              ≤ ε { min(1, n−m) || E ||_2 / || A ||_2 + || f ||_2 / || b ||_2 + || E ||_2 / || A ||_2 } κ_2(A) + O(ε^2),

from which the theorem follows. □

Note that there is no κ_2(A)^2 factor as in the case of overdetermined systems.

Problems

P5.7.1 Derive the above expression for ẋ(0).
P5.7.2 Find the minimal norm solution to the system Ax = b where A = [ 1  2  3 ] and b = 1.
P5.7.3 Show how triangular system solving can be avoided when using the QR factorization to solve an underdetermined system.
P5.7.4 Suppose b, x ∈ R^n are given. Consider the following problems:
(a) Find an unsymmetric Toeplitz matrix T so Tx = b.
(b) Find a symmetric Toeplitz matrix T so Tx = b.
(c) Find a circulant matrix C so Cx = b.
Pose each problem in the form Ap = b where A is a matrix made up of entries from x and p is the vector of sought-after matrix parameters.

Notes and References for Sec. 5.7

Interesting aspects concerning singular systems are discussed in

T.F. Chan (1984). "Deflated Decomposition Solutions of Nearly Singular Systems," SIAM J. Num. Anal. 21, 738–754.
G.H. Golub and C.D. Meyer (1986). "Using the QR Factorization and Group Inversion to Compute, Differentiate, and Estimate the Sensitivity of Stationary Probabilities for Markov Chains," SIAM J. Alg. and Disc. Methods 7, 273–281.

Papers on underdetermined systems include

R.E. Cline and R.J. Plemmons (1976). "l2-Solutions to Underdetermined Linear Systems," SIAM Review 18, 92–106.
M. Arioli and A. Laratta (1985). "Error Analysis of an Algorithm for Solving an Underdetermined System," Numer. Math. 46, 255–268.
J.W. Demmel and N.J. Higham (1993). "Improved Error Bounds for Underdetermined System Solvers," SIAM J. Matrix Anal. Appl. 14, 1–14.

The QR factorization can of course be used to solve linear systems. See

N.J. Higham (1991). "Iterative Refinement Enhances the Stability of QR Factorization Methods for Solving Linear Equations," BIT 31, 447–468.
Chapter 6

Parallel Matrix Computations

§6.1 Basic Concepts
§6.2 Matrix Multiplication
§6.3 Factorizations

The parallel matrix computation area has been the focus of intense research. Although much of the work is machine/system dependent, a number of basic strategies have emerged. Our aim is to present these along with a picture of what it is like to "think parallel" during the design of a matrix computation.
The distributed and shared memory paradigms are considered. We use matrix-vector multiplication to introduce the notion of a node program in §6.1. Load balancing, speed-up, and synchronization are also discussed. In §6.2 matrix-matrix multiplication is used to show the effect of blocking on granularity and to convey the spirit of two-dimensional data flow. Two parallel implementations of the Cholesky factorization are given in §6.3.

Before You Begin

Chapter 1, §4.1, and §4.2 are assumed. Within this chapter there are the following dependencies:

    §6.1 → §6.2 → §6.3

Complementary references include the books by Schonauer (1987), Hockney and Jesshope (1988), Modi (1988), Ortega (1988), Dongarra, Duff, Sorensen, and van der Vorst (1991), and Golub and Ortega (1993) and the excellent review papers by Heller (1978), Ortega and Voigt (1985), Gallivan, Plemmons, and Sameh (1990), and Demmel, Heath, and van der Vorst (1993).

6.1 Basic Concepts

In this section we introduce the distributed and shared memory paradigms using the gaxpy operation

    z = y + Ax,   A ∈ R^{n×n},  x, y, z ∈ R^n     (6.1.1)

as an example. In practice, there is a fuzzy line between these two styles of parallel computing and typically a blend of our comments apply to any particular machine.

6.1.1 Distributed Memory Systems

In a distributed memory multiprocessor each processor has a local memory and executes its own node program. The program can alter values in the executing processor's local memory and can send data in the form of messages to the other processors in the network. The interconnection of the processors defines the network topology and one simple example that is good enough for our introduction is the ring. See Figure 6.1.1. Other

    Proc(1) --- Proc(2) --- Proc(3) --- Proc(4)
       |___________________________________|

    FIGURE 6.1.1 A Four-Processor Ring

important interconnection schemes include the mesh and torus (for their close correspondence with two-dimensional arrays), the hypercube (for its generality and optimality), and the tree (for its handling of divide and conquer procedures). See Ortega and Voigt (1985) for a discussion of the possibilities. Our immediate goal is to develop a ring algorithm for (6.1.1). Matrix multiplication on a torus is discussed in §6.2.
Each processor has an identification number. The μth processor is designated by Proc(μ). We say that Proc(λ) is a neighbor of Proc(μ) if there is a direct physical connection between them. Thus, in a p-processor ring, Proc(p−1) and Proc(1) are neighbors of Proc(p).
Important factors in the design of an effective distributed memory algorithm include (a) the number of processors and the capacity of the local memories, (b) how the processors are interconnected, (c) the speed of computation relative to the speed of interprocessor communication, and (d) whether or not a node is able to compute and communicate at the same time.

6.1.2 Communication

To describe the sending and receiving of messages we adopt a simple notation:

    send( {matrix}, {id of the receiving processor} )

    recv( {matrix}, {id of the sending processor} )

Scalars and vectors are matrices and therefore messages. In our model, if Proc(μ) executes the instruction send(V_loc, λ), then a copy of the local matrix V_loc is sent to Proc(λ) and the execution of Proc(μ)'s node program resumes immediately. It is legal for a processor to send a message to itself. To emphasize that a matrix is stored in a local memory we use the subscript "loc."
If Proc(μ) executes the instruction recv(U_loc, λ), then the execution of its node program is suspended until a message is received from Proc(λ). Once received, the message is placed in a local matrix U_loc and Proc(μ) resumes execution of its node program.
Although the syntax and semantics of our send/receive notation is adequate for our purposes, it does suppress a number of important details:

• Message assembly overhead. In practice, there may be a penalty associated with the transmission of a matrix whose entries are not contiguous in the sender's local memory. We ignore this detail.

• Message tagging. Messages need not arrive in the order they are sent, and a system of message tagging is necessary so that the receiver is not "confused." We ignore this detail by assuming that messages do arrive in the order that they are sent.

• Message interpretation overhead. In practice a message is a bit string, and a header must be provided that indicates to the receiver the dimensions of the matrix and the format of the floating point words that are used to represent its entries. Going from message to stored matrix takes time, but it is an overhead that we do not try to quantify.

These simplifications enable us to focus on high-level algorithmic ideas. But it should be remembered that the success of a particular implementation may hinge upon the control of these hidden overheads.
6.1.3 Some Distributed Data Structures

Before we can specify our first distributed memory algorithm, we must consider the matter of data layout. How are the participating matrices and vectors distributed around the network?
Suppose x ∈ R^n is to be distributed among the local memories of a p-processor network. Assume for the moment that n = rp. Two "canonical" approaches to this problem are store-by-row and store-by-column.
In store-by-column we regard the vector x as an r-by-p matrix,

    x_{r×p} = [ x(1:r)  x(r+1:2r)  ···  x(1+(p−1)r:n) ],

and store each column in a processor, i.e., x(1+(μ−1)r:μr) ∈ Proc(μ). (In this context "∈" means "is stored in.") Note that each processor houses a contiguous portion of x.
In the store-by-row scheme we regard x as a p-by-r matrix

    x_{p×r} = [ x(1:p)  x(p+1:2p)  ···  x((r−1)p+1:n) ],

and store each row in a processor, i.e., x(μ:p:n) ∈ Proc(μ). Store-by-row is sometimes referred to as the wrap method of distributing a vector because the components of x can be thought of as cards in a deck that are "dealt" to the processors in wrap-around fashion.
If n is not an exact multiple of p, then these ideas go through with minor modification. Consider store-by-column with n = 14 and p = 4:

    x^T = [ x1 x2 x3 x4 | x5 x6 x7 x8 | x9 x10 x11 | x12 x13 x14 ]
             Proc(1)       Proc(2)       Proc(3)      Proc(4)

In general, if n = pr + q with 0 ≤ q < p, then Proc(1), ..., Proc(q) can each house r+1 components and Proc(q+1), ..., Proc(p) can house r components. In store-by-row we simply let Proc(μ) house x(μ:p:n).
Similar options apply to the layout of a matrix. There are four obvious possibilities if A ∈ R^{n×n} and (for simplicity) n = rp:

    Orientation    Style         Proc(μ) houses
    Column         Contiguous    A(:, 1+(μ−1)r:μr)
    Column         Wrap          A(:, μ:p:n)
    Row            Contiguous    A(1+(μ−1)r:μr, :)
    Row            Wrap          A(μ:p:n, :)

These strategies have block analogs. For example, if A = [ A_1, ..., A_N ] is a block column partitioning, then we could arrange to have Proc(μ) store A_i for i = μ:p:N.
6.1. BASIC CoNCEPTS 279

6.1.4 Gaxpy on a Ring


We are JlOIW 8Bt to develop a ring algorithm for the gaxpy z = y + A:e
(A E rxn, Z,l/ E R"). For clarity, 8li8UIDe that n = rp where pis the size
of the rillg. Partition the gaxpy as

[I] [J.] + D: . . Z1 [}:] (6.1.2}

where A;; E Exr and z,,fk,zt e R'". We assume that at the start of com-
putation Proc(JJ) houses Zp., 1/p.• and the J.&th block row of A. Upon com-
pletion we set aa our goal the owrwriting of 1/p. by z,.. From the Proc(J..t)
perspective, the computation of

z,.
..
.,.
p
= y,. + L A,..,.z.,.
involves local data (A 11.,., y11 , z,.} and nonlocal data (z.,., r "!- IJ). To make
the nonlocal portions of x available, we circulate its subvectors around the
ring. For example, in the 'P = 3 C&Be we rotate the z 1 , z 2 , and x 3 as follows:

step Proc(l) Proc(2) Proc(3)


1 za Zt :t2
2 %2 :ca Xt
3 %( X:z :ta
When a subvector of x "visits" , the host processor must incorporate the
appropriate term into ita running sum:

step Proc(l) Proc(2) Proc(3)


1 1/1 = 1/1 + A13X3 t12= w+ A21z1 Ys = Y3 + Aa2z2
2 Y1 = Y1 + Auz2 1h = Y2 + A23Z3 =
Y3 Y3 + AatXt
3 Yl = 111 + Aux1 Y2 = !12 + A:nX2 =
Y3 Y3 + A33:ca
In general, the "merry-go.round" of x subvectors makes p "stops." For each
received X--6Ubvector, a proce880I' performs an r-by-r gaxpy.

Algorithm 6.1.1 Suppoee A E R'xn, x E R", andy E R" are given and
that z = y + Az. H eacll processor in a ~processor ring executes the
following node program and n = rp, then upon rompletion Proc(JJ) boll8e8
z(l + (p -1 )r:JJr) in 1f1K· Assume the following local memory initialir.ationa:
p,p (the node id), left and right (the neighbor id's), n, row= l+(JJ-l)r:J.&r,
.A,QC = A(row, :), Xtoc = z(row), !/Zoe= y(row).
280 CHAPTER 6. PARALLEL MATRIX COMPUTATIONS

fort= l:p
send(:rloc. right)
recv(:rzoc:, left)
T=JJ-t
ifr:50

end
{ Zloc = :r(l + (r- l)r:rr) }
Yloc = Yloc + .A,oc(:, 1 + (r- l)r:rr):z:loc
end

The index .,.. names the currently available x subvector. Once it is com-
puted it is possible to carry out the update of the locally housed portion of
y. The send-recv pair passes the currently housed :r subvector to the right
and waits to receive the nen one from the left. Synchronization ill achieved
because the local y update cannot begin until the "new" z subvector ar-
riws. It is impossible for one processor to "race ahead" of the others or for
an :r subvector to pass another in the merry-go-round. The algorithm is
tailored to the ring topology in that only nearest neighbor communication
is involved. The computation is also perfectly load balanced meaning that
each processor has the same amount of computation and communication.
Load imbalance is discl.1886d further in §6.1.7.
The design of a parallel program involves subtleties that do not arise in
the uniprocessor setting. For example, if we inadvertently reverse the order
of tb.e send and the recv, then each processor starts its node program by
waiting for a message from its left neighbor. Since that neighbor in turn is
waiting for a message from 1ts left neighbor, a state of deadlock results.

6.1.5 The Cost of Communication


Communication overheads can be estimated if we model the cost of sending
and receiving a message. To that end we 88Sume that a send or recv
involving m 8oating point numbers requires

r(m) = Clef + /Jr1.m (6.1.3)

seconds to carry out. Here ac~ is the time required to initiate the send or
recv and /Jc~ ill the reciprocal of the rate that a message can be tranaferred.
Note that this model does not take into consideration the "distance" be-
tween the sender and receiver. Clearly, it takes longer to pBSS a message
halfway around a ring than to a neighbor. That is why it is always desirable
to arrange (if possible) a distributed computation so that communication
is just between neighbors.
During each step in Algorithm 6.1.1 an r-vector is sent and received and
2r2 Oops are performed, If the computation proceeds at R flops per 9eCOnd
6.1. BASIC CoNcEPTS 281

and there is no idle waiti.og 88IIOciated with tbe recv, then eadl ~ upda.te
requires approximately (~ / R) + 2(04 + {J4r) sec:oods.
Another IDstructiw statistic ia the annpu(Gtion-to-communic:Gtion ratio.
For Algorithm 6.1.1 this is prescribed by

TiJDe speot computing 2r2/ R


Time spent communicating ~:::~ 2(a., + 04'1") •
aver~ of communication relative to the vol-
Thla fraction quantifies the
ume of computation. Clearly, aa r = nf p grows, the fraction of time spent
computing increases. 1

6.1.6 Efficiency and Speed-Up


The efficiency of a p-procesaor parallel algorithm is given by

E = T(l)
pT(p)
where T(k) ia the time required to execute tbe program on k processors.
If computation proceeds at R Bops/~ and oommunieation is modeled by
(6.1.3), then a reasonable estimate of T (k) for AJgoritlun 6.1.1 is gi'\lell by

k 2 ~
T(k) = L:2(n/k)2/R+2(ac~+ Oc~(n/k)) = ;;. + 2ac~k + 2tJ.,n
l= l

fork > 1. This 888UJDe8 no idle waiting. U k = 1, then no communication


is required and T(l) = 2n2/ R. It follows that the efficiency

1
E = 1 +. PI (0d~ + IJ) .
imprCMlS with incre&ling n aDd degradates with increasing p or R. In
practice, benchmarking ia the only dependable way to asaees efficiency.
A coDCept related to efticieDcy is ~·up. We say that a parallel alg~
rithm for a particular problem acbiews ·speed-up S if

S = T_,/T,_.
where T .-ris the tilDe required fOI' execution of tbe parallel program and
T,~ is the time required by one prooeaaor when the best uniproce1110r pro-
ced~ is used. For aome problema, the fastest sequential algorithm does
not paralle1ize and so two distinct algorithms are involved In the speed-up
a. ment.
I We -'ioa t~Mol ~apt.--~ putleularly WnminMiac iD . , . _
whwe Uae DOdll ant able ~ cwerlap mmpuwioa 8lld CX'mmaak:Nim..
282 CHAPTER 6. PARALLEL MATRIX COMPUTATIONS

6.1. 7 The Challenge of Load Balancing


H we apply Algorithm 6.1.1 to a matrix A E ~xn that is lower triangular,
then apprmcimately half of the Oops 8880ciated with the lll«> updates are
unnecessary because half of the ~i in (6.1.2) are zero. In parlicular, in the
J..tth proce880t, .A,oe(:, 1 + (r- l)r:rr) is zero if r > 1-'· Thus, if we guard
the Yzoe update as foUows,

if'T$J.'
Yloc = Yloc + ~...,(:,1 + (-r- 1)r:-rr)x,...,
end
then the overall number of Hops is halved. This solves the superfluous Hops
problem but it creates a load imbalance problem. Proc(J..t) over90011 about
2
IJT /2 Hops, an increasing function of the processor id 1-'· Consider the
following r = p = 3 example:

Z} 0: 0 0 0 0 0 0 0 0 XI Yt
Z2 a a 0 0 0 0 0 0 0 X2 Y2
~ a a a 0 0 0 0 0 0 X3
...ML
z.t fJ {J fJ {3 0 0 0 0 0 X.c Y•
zs = tJ {3 tJ {J
fJ 0 0
0 0 X~ + Ys
%:6 (3 f3 {3 {3
{3 0 0 0
{3 ~
.., .., ..1!!...
Z7 1' 1' 1' 1' 1' 0 0 X7 Y7
zs 1' "f 1' "f 1' 1' 1' 1' 0 xs Ya
Z9 1' 1' 1' 1' 1' "f 1' 1' 1' Xg 119

Here, Proc(l) handles the a part, Proc(2) handles the (3 part, and Proc(3)
handles the 7 part.
However, if processors 1, 2, and 3 compute (zl!z.&,ZT), (z2,ZG,Ze), and
(z3, zs, z.g), respectively, then approximate load balancing results:

Zt (J 0 0 0 0 0 0 0 0 Xt Y1 .
z.c /3 f3 11 f3 0 0 0 0 0 X':I Y4
..., 1' 0 0
2!.... 1' 7 7 7 7 ~ .J!I_
Z2 a a 0 0 0 0 0 0 0 X.c Y2
4 = f3 {3 {3 fJ f3 0 0 0 0 XI) + Y&
~ 1' 7 1' 7 1' 1' 7 1' 0 X&
...!!...
Z3 (J () (J 0 0 0 0 0 0 X'7 113
Z& fJ tJ fJ {3 tJ {3 0 0 0 zs Ye
Ze 1' 7 7 7 7 7 1' 7 7 Xg Yll

The amount of arithmetic still increases with 1-'t but the effect is not no-
ticeable if n > p.
The development of the general algorithm requires some index JJUWip-
ulation. Assume that Proc(p} is initialized with ~« = A(p:pm, :) and
6.1. BASIC CONCEPTS 283

Yloc =
y(p:p:n), and 8SIIUlile that the contiguous .:r-subvectors circulate as
before. If at some stage :l:loc contains x(l + ('r -l)r:'TT'), then the update

!lloc = ll'loc +Aloe(:, 1 + (r- l)r:TT):Cioc


implements

y(J.&:p:n) = y(p:p:n) + A(J.&:p:n, 1 + (T- l)r:Tr)x(l + (T- l)r:TT).


To exploit the triangular structure of A in the Yloc computation, we express
the gaxpy as a double loop:
for a= l:r
for 11 = l:r
rnoc(a) = Yloc(a) + Aloc(a, ,13 + (T- l)r)Xloc(.8)
end
end
The A1..., reference refers to A(J.&+(a-l)p, f3+(T-l)r) which is zero unless
the column index is less than or equal to the row index. Abbreviating the
inner loop range with this in mind we obtain

Algorithm 6.1.2 Suppose A E R'x", x E R" andy E R" are given and
that z = y + Ax. Assume that n = rp and that A is lower triangular. If
each processor in a p-processor ring executes the following node program,
then upon completion Proc(J.&) hoUBeS Z(J.&:p:n) in Yloc· Assume the following
local memory initializatioos: p, J.' (the node id), left and right (the neighbor
= =
A(J.&:p:n, :), 3/101! y(J.&:p:n), and X1oc = x(l + (J.&- l)r:J.&r).
id's), n, .4,...,
r=nfp
fort= l:p
send(Xloe• right)
recv(XIoeoleft)
'T=JJ-t
ifT~O
T:=T+p
end
{:cloc = x(l + (T -l)r:rr)}
for a= l:r
for {3 = l:p +(a -l)p- (r-l)r
YZuc(a) = !lioc(a) + AEuc(a, 11 + (1'- l)r)Xioc(/3)
end
end
end
Having to map indices back and forth between "node space" and "global
space" is one aspect of distributed matrix computations that requires care
and (hopefully) compiler assistance.
284 CHAPTER 6. PARALLEL MATRIX COMPUTATIONS

6.1.8 Tradeoff's
AJJ we did in §1.1, let us ~lop a column-orient«i gaxpy aad auticipate
ita performance. With the block colWIUl partitioning
At e R"xr, r = n/p

the ga.xpy z = 'J + A:r: becomes

=
where x,. x(1 + (p.- 1)r.pr). Assume that Proc(p) oootains A,. and x,..
Its contribution to the gaxpy is the product A,.xj.l and involves local data.
However, these products must be summed. We assign this task to Proc(l )
which we asaume contains y. The strategy is thu for each procesaor to
compute A,.xj.l aad to aend the result to Proc(l).

Algorithm 6.1.3 Suppose A e ~xn, x e R" andy E R" are given and
that z = 11 +A%. If each processor in a p-prooeasor network executes the
following node program and n = rp, tben upon completion Proc(l ) houses
z. Assume the following local memory initi&lizations: p, p. {the node id),
n, Zloe = x(1 + (p - l)r:p.r), Aloe= A(:, 1 + (p - l )r:pr), and (in Proc(l)
only) !llO<! = y.
if#-'=1
!lloc = Yloc + Atoc%1oe
fort= 2:p
recv(wloc• t)
Yloe = !lloc + Wioc

else
w~oc =Az~loc
aend(wroe, 1)
end
At first glance this seems to be much leas attractive than the row-oriented
Algorithm 6.1.1. The additional responsibilities o£ Proc(l) meao that it
has more arithmetic to pezform by a factor of about

2n2 /p + np = 1 + -
-~:::-;-.....;..
r
2nlfp 2n
and more m~ to process by a factor of about p. This imbalance be-
comes leu critk.al if n > p and the communication parameters a, and Pt~
factors are small enough. Another poe~ible mitigating fa.ctor l8 that ~
rithm 6. 1.3 maoipulattw length n vecton whereas Algorithm 6.Ll worb
6.1. BABIC CONCEPTS 285

with length nfp vectors. If the nodea are capable of -m:tor arithmetic; then
the longer vac:tors m&y raise the lew! of performance.
This brief compariaon of Algorithms 6.1.1 aDd 6.1.3 reminds WI ooce
again that difterent implemeatatiou of the same computation eau ha"'!
vwy diffinnt performaoce cbaracteristies.

6.1.9 Shared Memory Systems

We now disc:uss the gaxpy problem for a shared memory multiproc:e&80r. In


this eoviroomeo.t each proceesor has accae to a commoo., global memory
as depicted in Figure 6.1.2. Communication between processors is achieved

Global Memory

FIGURE 6.1.2 A Four-Pf'OCU6M' Shared Memofll Symm

by reading and writing to gl.obGl ~ that reside in the global memory.


Each proca.or mecutea its own local prognma aod bu ita own looal memof'J#.
Dau. flowa to eod from the global memory duriug execution.
AU the coaceru that attend distributed memory computation an with
us in modified brm. The overall procedure should be load bolaftced and tbe
computaticma should be arraoged ., that the individual proc:M80r'8 have
to wait as liWe u pa&bie fur something U8llful to compute. The tralic
between tbe global and local memorles BUl8t be ~carefully, beca.e
the exteDt of such data Uaoafen ia typically a sigoi6CNlt CMIIrbead. (It
corresponds to interpiOCeiiBOI' commuaica&iou in the distributed JDfliDOirY
eetting and to data motion up aad down a memory hienrchy u diac:u.ed
in §1.4.5.) The nature of the p.hyBical CODDeCtion betweeD the~
and the abaled memory is WJCY impott&Di aod can effect algorit.hmie dewl-
opmeot. Howevv, fOr simplicity we regard this aspect of the syBtem • •
black box aa shown in Flgwe 6.1.2.
286 CHAPTER 6. PARALLEL MATRIX COMPUTATIONS

6.1.10 A Shared Memory Gaxpy


Conaider the following partitioning of the n-by-n gaxpy problem z = y+ Ax:

[:] = [~] + [1l· (6.1.4)

Here we assume that n = rp and that A,. E R"x", yl' E R", and z,. E R'".
We use the following algorithm to introduce the basic ideas and notations.

Algorithm. 6.1.4 Suppose A E E'xn, x E R", andy ERn reside in a


global memory accessible to p processors. If n = rp and each processor
executes the following algorithm, then upon completion, y is overwritten
by z = y + .A.:z:. Assume the following initializations in each local memory:
p, p (the node id), and n.

r =nfp
row= 1 + (p- l)r:pr
Xloc. =X
Yloc = y(row)
for j = l:n
aloe= A(row,j)
Yloc = 1/10<: + Gloc.XIoc(i)
end
y(row) = 1/loc

We assume that a copy of this program resides in each processor. Float-


ing point variables that are local to an individual processor have a "loc"
subscript.
Data is tr8.DSferred to and from the global memory during the execution
of Algorithm 6.1.4. There are two global memory reads before the loop
{:ctoe = :c and Yloe = y( row)), one read each time through the loop (Gfoc =
A(row,j)), and one write after the loop (y(row) = Yfoe)·
Only one processor writes to a given global memory location in y, and
so there is no need to synchronize the participating processors. Each has
a completely independent part of the overall gaxpy operation and does not
have to monitor the progress of the other processors. The computation is
statically scheduled because the partitioning of work is determined before
execution.
If A is lower triangular, then steps have to he taken to preserve the
load balancing in Algorithm 6.1.4. As we discovered in §6.1.7, the wrap
mapping is a vehicle for doing this. Assigning Proc(p) the computation of
z(p:p:n) = y(p:p:n) + A(p:p:n, :)x effectively partitions the n:l flops among
the p processors.
6.1. BASlC CONCEPTS 287

6.1.11 Memory Traffic Overhead


It is important to recognize that overall performance depends strongly on
the overhead8 8880ciated. with the reads and writes to the global memory.
If such a data transfer involves m floating point numbers, then we model
the transfer time by
-r(m) = o. + f3.m. (6.1.5)
The parameter a. represents a start-up overhead and /3. is the reciprocal
transfer rate. We modelled interprocel'IIOr communication in the distributed
environment exactly the same way. (See (6.1.3).)
Accounting for all the shared memory reads and writes in Algorithm
6.1.4 we see that each processor spends time

communicating with global memory.


We organized the computation so that one column of A(row.:) is read
at a time from shared memory. If the local memory is large enough, then
the loop in Algorithm 6.1.4 can be replnced with
~oe = A(row, :)
1/loe == Yloc + A1oeX1oc
This changes the communication overhead to
- n2
T::::: 3a• + -/3~,
p
a significant improvement if the start-up parameter o~ is large.

6.1.12 Barrier Synchronization


Let us consider the shared memory version of Algorithm 6.1.4 in which
the gaxpy is column oriented. Assume n = rp and col= 1 + (IJ- l)r:IJT.
A re880na.ble idea is to use a global array W{l:n, l:p) to house the prod-
ucts A(:, col)x(col} produced by each processor, and then have some chosen
processor (say Proc(l)) add its columns:
~oe = A(:,col); XI«:= z(col); W!oe = ~oeXloc; W(:,p) = Wjoc
if!J=l
Yloe = Y
fori= l;p
Wfoe = W(:,j)

end
Y = 1/loc
end
288 CHAPTER 6. PARALLEL MATRIX COMPUTATIONS

However, tbia strategy la aerloualy flawed because theN ia no gu.anntee that


W(l:n,l:p} is fully initialized when Proc(I) begins tbe summaaon proce88.
What 188 need is a aynchroDizatioo coastruct that can delay the Proc(1)
1mmmatJoo until all tbe ~ have oomputed and stored their contri-
butions iD. the W array. For this purpoee ma.oy shared memory systems
support aome version of tbe barrie%' OODBtruct which we introduce in the
follawing algorithm:

Algorithm 6.1.5 Suppose A e rrxn, X E R", and !IE R" reside in a


global memory acceMible to p proce11110rs. If n = rp and eadl proceseor
executes the following algorithm, then upon completion y Is overwritten by
v + A:t. Assume the following initializatioas in each local memory: p, JJ
(the node id), and n.
T = n f p; col= 1 + (JJ- l)r.JJr; Aloe"" A(:,col); Xloc = x(col)
Ul!oc = A!ocXIoc
W(:,JJ) = Ulloc
barrier
ifp = l
Yloc =Y
fori= l :p
Wtoc = W (:, j)
l/loe = Yloe + tDioe
end
y =!/roc
end
1b undent&lld thfJ barrier, it is convenient to regard a prooeaaor aa. either
blocked or free. A prOoealor is blocked and suspends execution when it
executes the barrier. A&r the pth plOOeiSIIOr is blodmd, all the proceaaors
return to the "free state" and rarume execution. Think of the barrier as
treacheroWI stream to be traversed by all p processors. For safety, they
all congregate on the bank before attempting to croa When the las\
member of the party arrives, they ford the 11tream in UDison and re8UD1e
their individual treks.
In Algorithm 6.1.5, the prooesaors are blocked afW computing their
portion oi the matrix-vector product. We eamwt predict the order in whicla
these b.loc:kiDp occur, but once tbe last p~r reacluis the barrier, they
are all released and Proc(l) can earry out the wctor summation.

6.1.13 Dynamic Scheduling


Instead of having one proc:esaor in charge of the v.!Ctor summation, it ia
tempting to have each processor add its contribution directly to the global
variable y. For Proc(JJ), this means executing the fol.lowiDg:
6.1. BASIC CONCEPTS 289

r = nfp; col = 1 + (JJ- l)r:JJr; Azoc = A(:, col); ZJoe = z{col)


Ulloe= Aioc-Zloc
Yloc = Yi :Woe = Yloc + Wloci 11 = Yroc:
However, a problem concerns the read-update-write triplet

Indeed, if more than one processor is executing thia code fragment at the
same time, then there may be a losa of information.. Coosider the following
sequence:

Proc( 1) reads y
Proc(2) reads y
Proc(l) writes y
Proc(2) writes y

The contribution of Proc(l) is lost because Proc(l) and Proc(2) obtain the
same version of y. A1J a result, the effect of the Proc(l) write is erased by
the Proc(2) write.
To prevent this kind of thing from happening most shared memory
systems support the idea of a critical section. These are special, isolated
portions of a node program that require a "key" to enter. Throughout the
system, there is only one key and so the net effect is that only one processor
can be executing in a critical section at any given time.

Algorithm 6.1.6 Suppose A E R"x .. , x E R .. , and y E R,. reside in a


global memory accessible to p processors. If n = pr and each processor
executes the following algorithm, then upon completion, y is overwritten
by y + A%. Allsume the following initializations in each local memory: p, p.
{the node id), and -n.

r = nfp; col= 1 + (p.- l)r:JJr; Azoo: =A(:, col); Xioc = :::r:(col)


w,<)C = Azoc:Zli)C
begin criticaJ section
3/loc =y
Yloe = Yzoc + W!oc
Jl = Yfoc
end critical section

This use of the ttitical. section concept controls the update of y in a way
that eDBUI1l8 correc:tnesa. The algorithm is dyMmically &ch«lukd because
the order in which the summations occur is determined as the computation
unfolds. Dynamic scheduling is -rery important in problems with irregular
structure.
290 CHAPTER 6. PARALLEL MATRIX CoMPUTATIONS

Proble..-
P4.1.1 Modify Algoritlun 6.1.1 10 thai it caD baDdJe arbitrary n.
1"0.1.2 Modify Al&vritbm 6.1.210 ~" llftldelllly blmdlell the DppS' triangular cue.
P8.1.3 (a) Modify Al(urithml6.1.3 aDd 6.1.410 thu they owrwrite 11 with z v+A"'z =
for a gn.u poaitiw imepr m that is avNlable to sdl p~. (b) Modify Algorithmll
6.1.3 aDd 6.1.4 10 that v ill tMitWlitt.IID t., I = 1f +AT A:.
PS.1.4 Modify ~hm 6.1.310 that upon completion, the loca.l array A'- in Proc(#-1)
ho'- the pth block eoi.IUDD of A + eyT.
P8.1.5 Modify Algorithm 6.1.410 thai (a} A iiCM!II'Written by the outer product update
A+ rvT, (b) z ia <M1rWritten with A 2 :r:, (c) V is a--writteo by a1lllR 2-nonn ~ in
the direction ofv+A•:r:, and (d) it •ciemly lwldlea theca- when A illlowm triangulal-.

Note. and RsferenCBB for Sec. 6.1

General referencM on pamllel computatioDII that include 8IM!ral chaptenr: on matrix


oomputatiolls iDclude
G. c. Fax., M. A. Johlllon. G. A. Lyzenga, S. W. Otto, J. K. Salmon and D. W. Walbr
(1988). Solvmg Pro/Item, on Concummt Prot:uaon, Volume 1. Prentice Hall. En-
glewood Clift's, NJ.
D. P. BerlMima and J. N. TlriWildill (1989). P.alW o:~nd DU~ ComJH4ncn:
NumcricGl Mdh.oti.a, Prentice H.a1.1, Englewood CliB'a, NJ.
s. Lakahmivaraban and S. K. Dball (1990). AnoJvm and De.rign of Parallel Algonehnu:
Arithmetic cmd Matri: PniOlt:nu, McGra111-Hill, New York.
T. L. Freeman and C. P~ (1992). PQf'Gllel Nummau A~ Pnm.tice Hall,
Ne111 York.
F.T. Leighton (1992). /ntr'odudicm to Parallel Algorielum and Arcllitec:hiTU, Morgan
Kaufmann, San Mateo, CA.
G. C. Fox, R. D. Williams, and P. C. Meeaina (1994). Pamlkl Campunng WOJ"UI,
Morgao Kaubnenn, San Fr&nci!K:o.
V. KUDUII", A. Grama., A. Gupta aod G. Karypil (1994). Introduction to Pomll.el Com-
~: Duign and An41""' of Algoriihml. Beujamiu/Cummillgll, Reading, MA.
E.F. Van de Velde (1994). ConawTent S~ Com~ Sprinpr-Verlag, New York.
M. Comard and D. 'l'ryRram (1995). Pnlel Algorithnu tl:ftd A~, Interna.-
tional ThoiDIIOn Computer ~. New York.
Her.t are 10me c-ra1 reNrencel ibal are mon~ specific to penille1 mamx computUiona:

v. ~and D. Fad.IIIIV (1977). ~Panr.llel Co~~~p~~.t&tioDI in IJnear- Algebra,~ KiJir.


neticG 6, 28--40.
D. Heller (1978). wA Surwy of Pva.l.lel Algorithu. iD Numerical LiDear Alp bra, D SIAM
Revieul RO, 740-777.
J.M. Oriep and R.G. Voigt (1985). "Solution of Pwtial Diffiii'I!Dtial Equations 011 Vectol'
and Panlll&l Compulela, ~ SIAM &view n, 149-240.
J.J. Donpna and D.C. SomiMa (1986). ~Liaear Alpbra oa Hip Ptll'fonnanoe Com-
puter.," APJ)L Math. antl Camp. 1!0, 57--88.
K.A. Gallivan. R.J. Plemmons, and A.H. Sameh (1990). •paralleJ. Algorithms for Oenae
Linear' AJpbra Compo~" SIAM Rmeat 3!, 54--135.
J.W. Delnmel, M.T. Heoih, and H.A. van der Vom {1993) "Pa.rallel Numeric.a.l Lin~
Alpbra," in Ac:t.J Nvmmtd 1993, Cambridp UllMnny P.-.
See IIIIo
6.1. BASIC CONCEPTS 291

B.N. Datta. (1989). "Parallel aud Large-Scale Matrix ComputatioiiiJ in Coutrol: Some
Jdeu,n Lin. Alg. Mid It.J Af'Plic. 111, 24.3-264.
A. Edelman (1993). "Large Denlle Numerical Linear AJgebra in 1993: Tbe Pan.llel
Computing lntluenoe,~ Int'l J. ~ AppL 7, 113-128.
Managing aud modelling oommunication in a distributed memory environment is an im-
portaDt., difficult problem. &!e

L. Adams 8Jld T. Crockett (1984). "Modeling Algorithm Execution Time on P~r


ArraY!!," CompuU:r 17, 38-43.
D. Gannon and J. Vau Roeeodale (1984). "'n tbe lmp8Ct of Communication Complexity
on· tbe Design of PI!Lrll.llel Nwnerical Algorithlllll," IEEE Thu~. Comp. 0-33, 11~
1194.
S.:J:,. Job1181l0n (1987). "Communication Efficient 88.11ic Linear Algebra Computations on
Hypercube Multiprocesao111," J. Parol!&! and lNtrih.ted Computing, No. 4, 133-172.
Y. Saad and M. Schultz (1989). KData Communication in Hypercubes,~ J. DUt. P4rollel
Comp. 6, 115--135.
Y. Saad a.nd M.H. Schultz (1989). KDa.t.a. Communicatioa in Parallel Architect.Wl!!l," J.
DiaL Parallel Comp. 11, 131-150.

fbr snap!!bots of basic linear &lgebra computation on a distributed memory system, see

0. McBryB.n aod E.F. van de Velde (1981). "Hypercube A~ritbms and Implementa-
tions," SIAM J. Sci. and Stai. Comp. 8, s227-s287.
S.L. JohD8S>o and C.T. Ho (1988). "Matrix Tl'ansposition on Boolean n<ube Configurl!d
EDSemble Architectures," SIAM J. M4triz Anal. AppL 9, 419-454.
T. Dehn, M. Eierma.nn, K. Giebermann, a.nd V. Sperling (1995). ~structured Sparse
Matrix Vector Multiplication on Massively Parallel SIMD Architectures,M Parnl~l
Computing £1, 1867-1894.
J. Choi, J.J. Dongarr&, and D.W. W&l.ker (1995). ~Parallel Matrix Tha.n.spose Algontbms
oa Distributed Memory Conc::urrent Computers,~ P4rolle! Computing 21, 1387-1406.
L. Colombet, Ph. Micballoa, and D. 'l'rystr&Jn (1996). ~Parallel Matrix-Vector Product
on Rings witb a Minimum of Communicatioa,~ Parol~ Computing 22, 289-310.
The implemeiit&tioo of a p&lll.lltd algoritbm is usually very cballenging. It is important
to ha.ve compilers and rei&~ tools tbat an! able to bandle the details. See

D.P. O'Leary ud G.W. Stewart (1986). "Aa!i8nment and Scbedulill8 in Parallel M&trix
Fll.ctorizatiou," Lin. Alg. cmd It. Applic. 17, 2~300.
J. Ooogarn and D.C. Sorell8en (1987). "'A Portable Environment for I>evelopiog Parallel
P~ms, ~ PC!Ttlllel Com~ 5, 175-186.
K. Connolly, J.J. Dongarra. D. Sorellllen, and J. Pattei1!0n (1988). "Programming
Methodology and PerfOrmanoe lllllle& l'or Advauced Computer Architectures," Par-
aU& Compuoog 5, 41-58.
P. Ja.cobeon, B. Kagstrom, and M. Rann&r (1992). ~Algorithm Development for Di&-
tributed Memory Multicomputen Using Conl&b,~ Scientific P'I'Dgl"'lmming, 1, 185-
203.
C. Anoolll't, F. Coelho, F. lrigoio, and R. Keeyell (1993). "A Linear Alpbnt. Framework
for Static HPF Coda Disui.butirm," ~of the 4th Worbhop on C~
for ParaUd C~, Delft, The Netberlan.da.
D. Bau, I. Kodukul&, V. KotlY1U', K. Pingali, and.P. Stodghill (1993). "Soo.viD8 Alignment
Using Elementary Linear Al@;ebra," in ~ of the 7th I~ Worbhop
on Langw.gou and Compilen for Poralld Computing, Lecture Notes in Computer
Science 892, Springer- Verlag, New York, 46-00.
M. Wol11!! (1996). High. Pe.r/Dnnafll% Compilef'J for Parallel C~, Addl8ou. Wmley,
Reading MA.
292 CHAPTER 6. PARALLEL MATRIX COMPUTATIONS

6.2 Matrix Multiplication


Ill thia section we develop two pacaUel algorithms Cor matrix·matrix multi-
plication. A shared memory implemeotation ill uaed to illustrate the effect
of blocking on granularity and load belanc:Jng. A torua implementation is
designed to ooovey the spirit of two-dimeosional data flow.

6.2.1 A Block Gaxpy Procedure


Suppose A, B, C E ~rxn with B upper triaDgular and oowrider the compu-
tation of the matrix multiply update

(6.2.1)

on a sbared memory computer with p proceeaora. Assume that n = rkp


aod partition tbe update

[ Dt. . ... D,.,. I = [ C~o .. .• c,.,. I + (At .... A,., I [BJ . ....
I Bk·p I (6.2.2)

where each block column has width r"" n /(kp). If

BlJ

B;;
B; = 0

then
j
Di = ci + AB; = c 1 + L ~B,..1 . (6.2.3)
r-1

The number of fiops required to compute D; iB given by


3
2n ) .
2nr2.J = ( k2p2
JJ = 1·

This is an inc:reuiDg function o( j becauae B is upper triaDgular. As we


di8cxMnd in the previoua &eetion, the wrap mapping ia the way to solve
load imbolaate problems that result from triaogulac matrix structure. This
suggests that 'I'Ve assign Proc(,.,.) the taak of computing D1 for j = ,.,.:p:kp.

Algorithm 6.2.1 Suppoee A. B , and Care n-by-n matrlcee that reside


in a global memory atteMible top J>roceliiiOrS. Aasume that B iB upper
triarlgular and n = rkp. If each prooessor executes the followiDg algorithm,
6.2. MATRIX MULTIPUCATION 293

then upon completion C is overwritten by D =


C + AB. Assume the
foUowing initialiWioos in each 1ocaJ. memory: n, r, k, p and JJ (the node
id).
for j = I'=P:kp
{ComputeD;.}
Bloc= B(l:jr, 1 + (j - l)r:jr)
Ctoc = C(:, 1 + (j - l)r:jr)
forT= l:j
col= 1 + (T -l)r:TT
Atoc: = A(:, an)
C1oc = Ctoa + ~oeBioc{col, :)
end
C(:, 1 + U- l)r:jr) = qoc
end
Let 118 examine the degree of load balancing aa a. function of the parameter
k. For Proc(JJ), the number of Bops required is given by

F(JJ) =
Sc (
bf~>+<•-llp ~ kJJ + 2
k2p) k2~·
2n3

The quotient F(p)f F(l) is a. measure of load balancing from the fl.op point
of view. Since
F(p) _ kp + k2p/2 _ 2(p- 1)
1
F(l) - k+k 2pj2 - + 2+kp
we see that arithmetic ba.la.uce improves with increasing k. A similar anal-
ysis shows that the communication overheads are weU ba.l.auced as k in-
creases.
On the other hand, the total number of global memory reads and writes
88liOCia.ted with Algorithm 6.2.1 increases with the square of k. If the start-
up parameter a. in (6.1.5) is large, then performance can degrade with
increased k.
The optimum choice for k given these two opposing forces is system
dependeot. If communication is fast, then smaller taeb can be supported
without penalty and this makes it easier to achieve load balancing. A mul-
tiprocessor with this attribute support& fine-grained parnl~li.mt.. However,
if granularity ia too fine in a system with high-performance nodes, then it
may be impossible for the DOde programa to perform at level-2 or level-3
speeds simply because there just is not enough local linear algebra. Again..
benchmarking is the only way to clarify these issues.

6.2.2 Torus
A torus is a two-dimensional proce880r enay in which each TOW' and col-
umn is a ring. See FIGURE 6.2.1. A Proce!liiOr id in this context is an
294 CHAPTER 6. PARALLEL MATRIX COMPUTATIONS

ordered pair and each proce11110r baa four neighbor& In the displ&yed. exam-

FIGURE 6.2.1 A Four-by-Four Torus

ple, Proc(1,3) has wut neighbor Proc(l,2), ea.tt neighbor Proc{l,4), south
neighbor Proc(2,3), and north neighbor Proc(4,3).
To show what it ia like to organi2e a toroidal matrix computation, we
develop an algorithm for the matrix multiplication D = C + AB where
A,B, C E R"xn. Assume that the torus is Pl-by-Pt and that n = ?1·
Regard A = (A.;), B = (B,;), and C = (C,;) as P1-by-p1 block matrices
with r-by-r bl.ocks. Aaaume that Proc(i,j) contains A.;, Bi;, and C,i and
that its mission ia to overwrite C,; with

Pl
Di; = Ci; + L ~Bt;·
lr•l

We develop the general algorithm from the Pl = 3 caae., displaying the torus
in cellular form as follows:
6.2. M.ATJUX MULTIPUCATION 295

Proc(l,l) Proc(l,2) Proc(l,3)

Proc(2,1) Proc(2,2) Proc(2,3)

Proc(3,1) Proc(3,2) Proc(3,3)

Let us focus attention on Proc(l ,l) and· the calculation of


Du = Cu + AuBu + Ax:zB:u + At3Blt .
Suppoee the six inputs that define this block dot product are positioned
within the torus as follows:

Au Bu Atl At3

B:n

B31

(Pay no attention to the "dots." They are later replaced by various A,;
and s.,).
Our plan is to "ratchet" the first block row of A and the first bloclc.
column of B through Proc(l,l ) in a coordinated fashion. The pairs Au
and Bu, Atl and Blh and Au and 831 meet, are multiplied, and added
into a running 8UDl array C.oc:

A12 B21 At3 Au

8,1

Bu

A13 831 Au A12

Bu . .
~1

Au Bu A12 A13

821 C,oc. = c,_+ AuBu


. Bu
296 CHAPTER 6. PARALLEL MATIUX COMPUTATIONS

Thus, after three steps, the local array c,.


in Proc(l,l) bowa D 11 •
We have org&Dized the ftow of data so that the A11 migrate westwards
and the Bu migrate northwards through tbe torus. It is thus apparent that
Proc(l,l) must execute a node program of the form:

fort= 1:3
send( Aloe, west)
send( Bloc, north)
rec:v(Aioet ea.tt}
recv( Bloc• •auth)
Ctoe =Ctoe + AloeBicoe
end

The send-recv-send-recv 8equence

fort= 1:3
Hnd(Aioeowut)
recv(Aioe 1 east)
send(B~.x:, north)
recv(Btoc• •outh)
C1oe = C1oc: + A1oeB1coe
end

also works. However, this induces unneceuary delays into the process be-
catl8e the B submatrix is not sent until the new A submatrix arrives.
We uext coDSider the activity in Proc(I,2), Proc(l,3), Proc(2,1), and
Proc(3,1). At this pomt in the development, these proces80rs merely help
circulate b.locks Au, Au, and Au aod Bu, B21 o and lhs , respectively. If
832, Bs2. and Bn &wed through Proc(1,2) during these step&, then

could be formed. Likewise, Proc(l,3) could comp11te

if B13, 823 , and B33 are available during t = 1:3. To tbia end we illitialize
the torus u followa

Au Bn Atz Bn Au Bss

B,l Bn Bt3

Blt B 12 B'l3
6.2. MATRIX MULTIPLICATION 297

A12 B21 At:s Ba2 Au B1a

Bll B12 B23 t=l

Bu ~ 833

Ata 831 Au Bn a12 B73


. Bu B22 B33 t=2

B-zl Bs2 Bts

Au Bn A12 Bn al3 833

B21 Ba2 Bts t= 3

Bat B12 B23

Thus, if B is mapped onto the torus in a "staggered start" fashion, we can


arrange for the first row of processors to compute the first row of C.
If we stagger the second and third rOW8 of A in a similar fashion, then
we can arrange for all nine processors to perform a multiply-add at each
step. In particu1ar, if we set

Au Bu A12 B22 Ata B33

An B·u A73 B32 A21 Bts

A33 B31 A31 B1~ As:~ B'l3

then with westward Bow of the ~i and northward flow of the B,; we obtain

An B21 Ata ~ An Bu

A22 Bst A:u B12 An EJ.n t=l

Aat Bu A32 BTJ A33 833


298 CHAPTER. 6. PARALLEL MATRl.X COMPtrrATIONS

Au 831 Au Bu At:~ il23


A:n Bu A2:1 B, A'J3 Baa t=2

A32 ~. A33 832 A31 B13

Au Bu An Bn A13 833

A2:1 Bu A:l3 Bn A21 Bu t=3

A33 831 A31 B12 A32 B'l3

From this example we are ready to specify the general algorithm. We


assume that at the start, Proc(i,j) houses A.i, Boi• and Coi· To obtain the
necessary staggering of the A data, we note that in processor row i the A,i
should be circulated westward i - 1 positions. Likewise, in the jth column
of processors, the B,j should be circulated northward j - 1 positions. This
gives the following algorithm:

Algorithm 6.2.2 Suppose A E R'x", BE R*x", and C E R"x" are given


and that D = C + AB. U ea.cb. processor in a p 1-by-p1 torus executes
the following algorithm and n = Pl r, then upon completion Proc(J', .A)
houses D~A in local variable Cu,e. Assume the following local memory
initializa.tions: Pt. (J', >..) (the node id), norlh., ec~t, south, a.nd west, (the
four neighbor id's), row = 1 + (p - 1)r:pr, col = 1 + (.\ - 1)r:..\r, A,..., :::o
A(row,col), Bloc= B(row,col), and Clo.: = C(row,rol).
{Stagger the A~i and 8,).. }
fork= 1:~ -1
send(A-!oc, west); recv(A,...,, east)
end
for k = 1:>.- 1
send(B~oc, north); recv(Bioe• south)
end
fork= l:p1
C1oc = Ctoc + AiocBioc
send( Atoc, west)
send( Bloc• north)
recv(~oc. east)
recv(Btoe. south)
end
6.2. MATRIX MULTIPLICATION 299

{Uustagger the A,.; and Bu..}


for k.,. l:p- 1
send(A,oc, eo.ri); recv(A,oc, wut)
end
fork= l:l-1
send( Bloc• south); recv(Bloc• north)
end

It is not hard to show that the computatiou·to-communication ratio for


this algorithm goes to zero as n/P1 increases.

Problema

PB.~-1 ~op a. ring implementation for Algorithm 6.2.1.

P8.~.~ An upp« triangulel' matrix Cll.ll be overwritten witb ita aquare wnbout any
additional work3pace. Write a. d)'D&IDical1y IICheduled, sbared·memory procedure for
doing thlll.

Notal and Relere- (or Sec. 8.2


Matrix <:omputationa on 2--dimeiiSional acraya ace diiiCu.ed in

H.T. Kung (1982). "Why Sys&olic Archiiect~7," Computer 15, 37-46.


D.P. 0'1.-y ADd G.W. S&enrt (1985). "Daia Flow AJ&orit.bmll for Parallel Matrix
Computations,~ Comm. A.CM fll, 841-853.
B. Hendricbon ADd D. Womble (1994). -rhe 'I'oru-Wrap Mapping for Dense Matrix
Calcn1atioDa on Ma.ively Parallel Computen.~ SIAM J. Sci. Comput. 15, 1201-
1226.

L.E. CIUUlOn (1969). A C'alular Corr~.p'!~Ur to lmplemmc the KtJbrusn Filla- A.fgoriUun,
Ph.D. ThePs, Montana State Uniwnity.
K.H. Cbenc and S. SaluU (1987). "VUll Systems for Band Matrix Mult.iplic:ation,"
ParaUd Computing 4, 239-258.
G. Fox, S.W. Ono, aDd A.J. Hey (1987). "Matrix AlgoriLhmB on a Hypmm~be 1: Matrix
Multiplicaiion," PdtGlld Compueing ,f, 17-31.
J. eenu- (1989). "<JommrmiaWon Efficient Matrix Multiplica&ion on Hypm=bs,"
PomUel Compu&ing 1.1, 335-342.
H.J. Japdiab and T. Kail&th (1989). "A Family of N- Efllci1111t Arrays for Matrix
Multiplication,,. !EBB nun.. Comp~~&. S&, 149--155.
P. Bjjmltad, F. Mamie, T.Stlnvik, aad M. Vaj\ertic (1992). "Efficient Matrix Mulllpli-
cation on SIMD Compu~'" SlAM J. M~&triz Anal. AppL 13, 386-401.
K. MathW' ood S.L. Joru- (191U). "MultiplkaiJon of Mauicel of kbitrazy Shape oo
a Data Para1lel Compmer.~ Pa.ndld Compatmg BO., 919-952.
R. M&tbiu (1995). -rho Instability of Pamllel. Prefix Matrix Mu1tiplica&ion,~ SIAM J.
Sci. Com-p. 16, 956-973.
300 CHAPTER 6. PARALLEL MATRIX COMPUTATIONS

6.3 Factorizations
In this section we present a pair of parallel Cholesky factorizations. To
illustrate what a distributed memory factorization looks l.ike, we implement
the gaxpy Cholesky algorithm on a ring. A shared memory implementation
of outer product Cbolesky is also detailed.

6.3.1 A rung Cholesky


Let us see how the Cholesky factorization procedure can be distributed on
a ring of p processors. The starting point is the equation
s<-1
G(p,p)G{p.:n,p.) = A(p.:n,p.)- L G(p.,j)G(p.:n,j) = v(p.:n).
j•l

This equation is obtained by equating the p.th column in the n-by-n equa-
tion A = G<fl'. Once the vector v(p.:n) is found then G(p.:n, p.) is a simple
scaling:
G(p.:n, p.) = v(p.:n)j.,;;[ii.).
For clarity, we first assume that n = p and that Proc(p.) initially houses
A(p.:n, p.). Upon completion, each processor overwrites its A-column with
the corresponding G-oolumn. For Proc(p.) this process involves p. -1 sa.xpy
updates of the form

A(p.:n, p.) --- A(p.:n, p.) - G{p.,j)G(p.:n, j)

foUowed by a square root and a scaling. The general structure of Proc(p.)'s


node program is therefore as follows:

for j = l:p. -1
Receive a 0-a>lumn from the left neighbor.
U necessary, send a copy of the received G-column to
the right neighbor.
Update A(.u:n, p.) .
end
Generate G(p:n, p.) and, if Decal88l'Y1 send it to the
right neighbor.

Thus Proc(l) immediately computes G(l:n, 1) = A(l:n,l)/ y'A(l, 1) and


sends it to Proc(2). AB soon as Proc(2) receives this column it can generate
G{2:n, 2) and p8B8 it to Proc(3) etc•. Wtth this pipelining arrangement we
can assert that once a proce1180r computes its G-column, it can quit. It
also follows that each processor receives G-oolUllUl8 in ascending order, i.e.,
G(l:n, 1), G(2:n, 2), etc. Based on these observations we have
6.3. FACTORIZATIONS 301

j=l
while j < p
recv(g,oc(j:n), le.ft)
ifp<n
send(moc(j:n), right)
end
Atoc~:n) = Atoc(p:n)- !noc(JJ)g~:n)
j=j+l
end
Atoc(p:n) = Atoc(p:n)/-/Atoc:(JJ)
ifp<n
send{Atoc(JJ:n), right)
end

Note that the number of received G-columns is given by j - 1. H j = p,


then it is time for Proc(p) to generate and send G{p:n,JJ).
We now extend this strategy to the general n case. There are two obvi-
ous ways to distribute the computation. We could require each processor
to compute a contiguous set of G-colUJDDS. For example, if n = 11, p = 3,
and A = (Ct. ... , a 11 ], then we could distribute A as follows

Each processor could then proceed to find the corresponding G columns.


The trouble with this approach is that (for example) Proc(l) is idle after
the fourth column of G is found even though much work remains.
Greater load balancing results if we distribute the computational tasks
using the wrap mapping, i.e.,

In this scheme Proc(p) carries out the construction of G(:,p:p:n). When


a given proce8110r finishee computing its G-colum.os, ea.cb. of the other pro-
ce880rs has at most one more G column to find. ThUB if n/p > 1, then all
of the proa11110rs are bU8Y most of the time.
Let us examine the detaila of a wrap-djstributed. Cholesky procedure.
Each proce880r maintains a pair of counters. The counter j is the in-
dex: of the next G-column to be received by Proc~). A procesaor also
needs to know the index of the next G-oolumn that it is to produce. Note
that if col ~ p:p:n, then Proc{JJ) is responsible fur G(:, col) and that
L = lengt:h{col) is the number of the G-columns that it must compute.
We use q to indicate the status of G-column production. At any instant,
302 CHAPTER 6. PARALLEL MATRIX COMPUTATIONS

col(q) is the index of the next G-column to be produced.

Algorithm 6.3.1 Suppose A e R'x" is symmetric and positive defi-


nite and that A = G(jl' is its Cholesky factorization. H each node in
a p.processor ring execute~ the following program, then upon completion
Proc(JJ) ho118e8 G(k:n,k) fork= JJ:p:n in a local array ~oe(1:n,L) where
L = length( col) and col = JJ:p:n. In particul&l', G(col(q):n, col(q)) is
houaed in A,oc(col(q):n, q) for q = l:L. Asrrume the following local memory
initiallzations: p, JJ (the node id), left and right (the neighbor id's), n, and
Atoe = A(JJ:p:n, :).
j = 1; q = 1; col = JJ:p:n; L = lengtb(col)
while q :S L
if j = col(q)
{ Form G(j:n,j) }
A1oeU :n, q) = Aloe(j :n, Q) / J ~ocG, q)
ilj<n
send(Aioc(j:n, q), right)
end
j=j+l
{ Update local colWDII8. }
fork= q+ l:L
r = col(k)
~oe(r:n, k) == Aloc(r:n, k) - Aloc(r, q)Atoc(r:n, q)
end
q=q+l
else
recv(g1oc(j:n), left)
Compute cr, the id of the processor that generated the
received G-eolumn.
Compute {3, the index of Proc(right)'s final colllDlD.
if right # a A j < /J
send(gloc{i:n), right)
end
{Update local columns. }
fork= q:L
r = col(k)
A.loc(r:n, k) == ~(r:n, k) - moc(r)g,oc(r:n)
end
j =j +1
end
end
To illustrate the logic of the pointer syBtem. we consider a sample 3-processor
situation with n = 10. ABsume that the three local values of q are 3,2, and
6.3. FACTORIZATIONS 303

2 and that tbe corresponding values of col(q} are 7, 5, and 6:

1 1 1 ]
[ o, o.. CIT aao I 02 ar; t1c au I ~3 tic Cit,
....
Proc(l) Proe(2) "-:(3)

Proe(2) now generates the 6f\h G-oolumn and incremem ita q to 3.


The decision to peaa a recejved G-column to the right neighbor needs
to be explained. Two conditions must be fu16.1Jed:
• The right neighbor must not be the pi'0Cel!80r which geDerated the G
column. This way the circulation of the received G-column is properly
tennin.ud.

• The right Deighbor must :rt.UJ how more G-oolUIDD8 to generate. Oth-
erwise, a G-column will be seDt to an inactive processor.
Tbl.s kind of reasoning is quite typical in distributed memory matrix com-
putations.
Let UB examine the behavior of Algorithm 6.3.1 under tbe 888umption
that n > p. It is not bard to show that Proc(~) performs
L 3
F(p) = L 2(n- (p + (k -l)p))(p + (k - l)p) :;, ;
*•1 'P
flops. Each processor recei~ and sends just about ewry G-oolumn. u~
ing our communication overhead model (6.1.3), we see that the time each
proceMOr spends communicating is given by
..
m11 = }:2(a.,c + .Bc~(n- j)) ::t: 2actn + Pc~n2 •
j•l

H we 888UDle that ciomputatlon proceeds a& R Oops per second. then tbe
computation/communication ratio for Algorithm 6.3.1 is apprmrimately
given by (n/p)(l/Wt~.). Thus, commuuic.ation owrbeeda diminish in iJn.
portaDce as n/p grows.

6.3.2 A Shared Memory Cbolesky


Next we coD8lder a shared memory lmpl~on of the outer product
Choleeky algorithm:
for k -- l :n
A(l:n,k) = A(k:n,k)fy'A(k,k)
fori= k+ l :n
A(j:n,j) = A(j:n,j)- A(j:n,k)A(j,k)
end
end
304 CHAPTER 6. PARALLEL MATRIX COMPUTATIONS

The j-loop oversees an outer product update. The n - k saxpy operations


that make up its body are independent and easily parallelized. The scaling
A(k:n, k) can be carried out by a single pi'0Ce880l' with no threat to load
balancing.

Algorithm 6.3.2 Suppose A E lrx" is a symmetric positive definite


matrix stored in a shared memory acce&l'lible to p pl'0C8!1110rs. If each pro-
cessor executes the following algorithm, then upon completion the lower
triangular part of A is overwritten with its Cholesky factor. Assume the
following initializationa.in each local memory: n, p and J.1 (the node id).
fork= l:n
ifl'=l
Vloc:(k:n) = A(k:n)
Vloc(k:n) = t.Proc(k:n)/ y'ttloc(k)
A(k:n, k) = t.Proc(k:n)
end
barrier
Vloc(k + l:n) = A(k + l:n, k)
for j = (k + J.J):p:n
Wtoc(j:n) = A(i:n,j)
Wtoc(i:n) = Wloc(j:n)- VtocU)ttloc(j:n)
A(j:n,j) = W!ocU:n)
end
barrier
end
The scaling before the j-loop represents very little work compared to the
outer product update and so it is reasonable to assign that portion of the
computation to a single processor. Notice that two barrier statements are
required. The first ensures that a ptoeell80l' does not begin working on the
kth outer product update until the kth column of G is made available by
Proc(l). The second barrier prevents the processing of the k+lst step to
begin until the kth step is completely finished.

Problem.

PO..S.l I~ Is ~ble to £ormulaie a block venriou. of Algorithm 6.3.1. Suwo- n = rN.


Fork= l:N- (a) have Proc:(l) geoerae.e G(:,l+(A:-l)r:A:r) aod (b) have all p~
part.iciJ*e in the rank r update ohhe trailingsubmKrix A(A:r+l:n, kr+l:n). See §4.2.6.
The COIIIliel' granularity may impi'OYII perlonnanee if tbe individual ~ lib lewl-3
opera&.ioDa.
Pe.3.2 Denllop a ahand .-mary QR £11Ctor'ization pattemed aft« A1gorithm 6.3.2.
Proc(l) should generate the HoWIIIboldet wcton aud all prooeiJIIIOn sbould share in the
easuing Ho11811holder update.
6.3. FACI'ORIZATIONS 305

Not. and R.efenluc=- for See. 6.8


Gener.l coiiiJII8IItll on d.Lmibuted memory fact.orizatjon proced~ may be foUDd in

G.A. Ceil&: NKl M.T. Headl (18116). "Mairix F8ctoriutioa. 011 a H~ • in M.T.
H.-th (ed) (1986). ~ of Fine SIAM Con/- on Hypereube M~
~ SIAM Publicatiom, Philadelphia, Pa.
I.C.F. I . - , Y. Saad., &Dd M. Sc:hulb (1986). " n - LineR SyRe~m 011 a lUng of
Pnx-n.• Lift. A.lg• ..ale. Appiic. 77, 205-2:m.
D.P. O'Leary and G.W. S~ (1986). •AMp"*" BOd Scheduling iD Pamllel Mairix
FadorisMio11,• Lin. A.lg. and lu App!ic.. TT, 275-300.
R.S. Schrl!lib. (1988). •sJock Atpnlum for Puallel Mlld!iDea," iD NWM!riaJl Algo-
rithnu for Modern Pflnllld Camputer A~hlfU. M.H. Schultli (ed), IMA Volumes
in Mub.emailce aDd Ita Application~~, NUIDber 13, Sprinpr'-Verles, Berlh1, 191-207.
S.L. Jobn.o11 and W. Lichte~~Aein (1993). "Bkx:k Cydk Dell8e Linear Alpbn,ft SIAM
J. Sci.Comp. 1-l, 1257-1286.

Papers specific:ally coocarned with LU, Choleaky ud QR illclude

R.N. Kapur and J.C. Brow~~e (1984}. "'''echn.iqUM !or SoWing Blodr. 'I'ridiaconel Sys&em~~
on Recollfigun&bla Airay Com~," SlAM J. Sci. ond SUU. Camp. 5, 701-119.
G.J. Davis (1986). "Column LU Pi...otillg on a Hypm-cube Multip~.ft SlAM J.
Alg. and DUe. Metlwd.1, .538-550.
J.M. Delosme ud I.C.F. Ipseo. (1986). ~Panllel Solution of Symmet.ric ~'M Definite
Systems with Hyperbolic Ro&Miooa," Lin. Alg. and Iu Applic. TT, ~112.
A. Potb.en, S. Jha, a.nd U. V~~~J~BPul.ati (1987). "Ortbogona.l Fac:torizatjoo on a Dis-
tributed Memory Multi~." in Hyperr:u/18 MuJI.i~or1, ed. M.T. Heath,
SIAM P~, 1981.
C.H. Billchof (1988). "QR Factorizatinn Algorithms for Coane Grain Disiributed Sya...
tema," PhD~ Depc. of Computer Scl,nce, Cornell Univemty, libca, NY.
G.A. Geillt aAd C.H. &amine (1988). "LU Factorization Aliorithma on Dillt.ributed.
Memory M~ Arclrlt.ect~~n~~~," SIAM J. Sci. and S""- Comp. 9, 639--649.
J.M. Ortega and C.H. Romil1e (1988). "The ijlc F'ormll of Factoriation Methods Il:
Pe.rallel Sywtema," Panalld Cmn,.mng 1, 149-162.
M. MliiTUI:hi SAd Y. Robst (1989). "'ptimal Algoritbml for GaUMian Eliminat.ion on
an MIMD Computer,R Parallel Computing 11!, 183-194.
Parallel triaDplar syatan .wine is ~ in

R. Momoye aad D. Laurie (1982). MA Ptw:tical Algorithm for tbe SolutioD of'I'riangulat
S}'llteml on a Panllel ~ Synem." IEEE lhlru. Comp. C-31, UJ76-1082.
D.J. EYaDI aDd R. Dunbar (1983). "'The Pa.allel Solution of 'I\"iangul8t Syatem. of
Equ.ations,R IEBB ThiN. Ctnn.p. C-3!, 201-204.
C. H. Romine and J.M. Ortega (1988). "Pwaalel SolutioD of'Ina.Dgular Systems of Equa.
tions," Parallel Cornptlting 6, 109-114.
M.T. Hmt.h IIZld C.R. Romine (1988). "ParaDe!. Solution of~ System. on Dis-
tributed Memory M~" SIAM J. Sci. and St4L Comp. 9, ~.
G. U ll.lld T. ColemaD (1988). "A PamDel niNl&ular SoMr for a Dishibuied-Memory
M~r," SIAM J. Sci. lind Stat. COfiiJI. 9, 485-502.
S.C. E-...t, M.T. Heatb, C.S. Bmbl, and C.H. Romine (1988). "Modified Cyclic
Algorithlm fot Solving 'I'ciuJpiM s,.._ 011 Di.tributed Memory Uultipmcwra,"
SIAM J. Sci. and SfGL Ccmp. 9, 589-600.
N.J. Bigh&m (1995). "Stability of Parallel niNl&ular Syatem Solwm.~ SIAM J. Scf.
Com-p. 111. 401)-.(13.
Papers oo the parallal OOU!pllt8tion of the LU aDd ChoiMky fac:torisat.ioD iDclude
306 CHAPTER 6. PARALLEL MATRIX COMPUTATIONS

R.P. Bnmt ud F .T. Luk (1982) "Computing the Choleaky F'actorisMion Ullillg & SyRolic
An:hiteciwe,~ Prvc. 6th Ati.HnlliGn Computer S~ Con/. 296-302.
D.P. 0~ utd G.W. Stewart {1985). "D&ta Flow Algoritb.I:PIJ for Parallel MMrix
Computaiiooll," Comm. of the ACM 118, Ml-853.
J.M. De1omoe lltld LC.F. 11*111 (1986). "Pu"&Uee Solu\ion of Symmetric; PDBitiw Definite
S)'ll&em8 with Hyperbolic Roce&.iou," Lin. Alg. and Ju Applic. 'fT, ~112.
R.E. Funderik ud A. Geist ( 1986). "Torus Da&& Fknr for Pamllel. Computation of
MiMised Ma&rix Problems," Lin. Alg. and lu AJ'Piic. 11, 149-164.
M. Costnatd. M. Manakchi, Ad Y. Hoben (1988). "P~ G&UIIian Elimination on
an MIMD Computer," Parulld Ctnnputing 6, 275-296.
Parallel m.ethods for banded and spane lf}"'temml include

S.L. JohnMon (1985). "Solving NarT'CIW Banded Syfiems 011 Ell86Dlhle An:hitectlln!8,"
ACM '1l-aru.. MGth. Soft. 1l, 211-288.
S.L. J~n (1986). KBand M&trix Sy.tem Solvcn on Eo.emble An:hitectlln!ll," in
Supm:omputen: Algoritllnu, Arcllitectuf'U, cmrl Scimnfic Compul4tion, edl. F.A.
M~ aDd T. -r.jima, U.llivenRy of Teua ~ Austin TX., 196-216.
S.L. J~n (1987). KSolvinK Tridiagonal Sywte~m on Euemble An:ltitect.un~B," SIAM
J. Sci. tmd Stat.. Camp. 8, 3M-392.
U. Mei.m (1985). wA P&rallel. Partition Method for Solving Ba.Dded Syaten11 of Lineal"
Equa&~u.," PG.t'Glld Compu&ers I, 33-43.
H. van del" Yom (1987). MI...arp 'l'ridie«onal and Block Tridiagonal LineR System~~ on
Vector and Perallel Compuua,~ Pflflllld Cmnput. 5, 45-M.
R. Bev&cqua., B. Codenotii, and F. Romani (1988). KPantJlel SolutioD of Bloc:k nidi-
agonal Llnea:r Syatems," L'n.Alg. Gnd Iu Appiic. 10.4, 39-57.
E. Gallopouloll ao.d Y. Saad (1989). wA Panllel Block Cyclic Rllduction Algorithm for
the Fut Solution of Elliptic Eqll&iiowl," Paralld Computing 10, 143-160.
J.M. Conroy (1989). MA Note on the PMallel Choleaky Factorizalioo of Wide BUided
Mauic:el," Pcwallel Computing 10, 239-246.
M. HecJaod (1991). "'n the Pu-allel Solution of Tridiagonal S~ by ~Around
Partitioning &nd lncompiel.e LU F'acWrizetion," Num£r. Ma:h. Sg, 453-472.
M.T. Heath, E. Ng, &nd B.W. Peyion (1Wl). "P&n.11el Alcoritimla for Sp&ne Lineer
Syatema," SIAM &tM:ul :1::1, 42().-460,
V. Mebnaann (1993). MDivide &nd Conquer Methodll for Block Tridiagona.l. Systems,"
PIJf'OUel Computing 19, 257-280.
P. Ragh&V&D (199S). wDistributed Spat811 GIWMiMl EliminAtion and Orthogon.&l f'atctor..
bation," SIAM J. Sci. Comp. 16, 1462-14T7.

Pare.l.lel QR. fiiCWI'iatio11 prooedurea a.re or intenlt in real-time signal proceMil1g. Do-
taila ma,y be found in

W.M. Gentleman aDd H.T. Kung (1981). ~Mauix 'I'riangu.l.arilon by Systolic Arraya, •
SPIE P~lnp, Vol. 298, 19-26.
D.E. Heller .ad I.C.F. IIMS! (1983). "S)"IItolic Netwwb for Ortho&onal Decomposi.tionll,"
SIAM J. Sci. and St4t. Comp . .4, 261-269.
M. Coeinard., J.M. Muller-, aiJd Y. Rot-f. (1986). "PanJJel. QR Decomposition of a.
~laz- Matrix,~ Nvm.er. MatJ&.. .f8, 23g....250_
L. Eldin aod R.. Sc:lmnbm- {1986). "An Application of Sy.tolic Anaya to Linear D~
111-P~ p~· SIAM J. Sci. tmd St4t. Comp. 7, 892:-oo.l.
F.T. Lu.k (1986). "A R~Mtion Mahod for Computing the QR FactotiaUcm," SIAM J.
Sci. and StaL Camp. 1, 452-M9.
J.J. Modi and M.R.B. Clade (1986). ~An Alternalive Give~U~ Orderi.as," Numer. Math.
.4:1, 83-90.
P. Amodio and L. BnJ&D&DO ( 1995). ~he PanJJel QR fact.miaatioG Algorithm for
'Ihdi.aCQneJ LiDee.r Systems," Panslld Com,utmg J1, L097-1110.
6.3. FAc:I'ORIZATIONS 307

S. Chea, D. Kuc:k, aod A- Semeh (1978). •Practical PamHel 8aDd 'I'riaDp1az' Syat.ema
Solwra," ACM Thlru. Math. Soft. .4, 210-217.
A. Samail aDd D. KliCk (1978). "On Stable Parallel l.m-- Syaem Solven," J. ANoe.
Comp. MadL 15, 81-91.
P. s~ (1979). ""A PacalW AJgoritbm for Solving G--.1. Tridiagonal Equa-
Uona," MI&IA. Comp. 33, 185-199.
s. Chea, J. Doqvra, aud c. lkuing {1984}. "Multi~ u - Alg8bra Algo-
ritlu:n. on the Cray X-MP-2: ~with Sm&ll Gruulariiy," J. PonUid ond
Dilirlhtelf Comptlting 1, 22-31.
J.J. Donpna aDd A.H. Sameh (191W). "'D Some Pamll4!1 Banded System Sol-."
Paralld Computmg 1, 223-235.
J.J. Donpna aad R.E. H~ (1984). "A CoHection of Parallel Linear Equation
Routinm for the Deneloor HEP,' Parallel Compuilng 1, 133-142.
J.J. Dongarra and T. Hewi.U (1986). •ImplementiJls Deuae LiDeN" Alpbnt. Algoritbm.
Using Multit.alkillr; on the Cn.y X-MP-4 (m- Approadrinr; the Gi.pllop)," SlAM J.
Sci. tmd Sttat. Cmnp. 7, 347-3:50.
J.J. Dooprra, A. Sameh, and D. ~ (1986). •rmplemea~ion of Some ConcmTI!!Ilt
Algorithml for Murix ~ion," P!&n~Ud Camyutmg 3, 25-34.
A. George, M.T. Heath, and J. Liu (1986). "P111111le.l Cbolmky Factorization on a Shared
Memory Multip~," Lin. ttlg. and It~ Applic. 77, 165-187.
J.J. Donprm and D.C.~ (1987). "Linear Algebra on High PerfDiliiBIIc& Co~
putel"ll," Appl. Math. and C111J1p. 10, 57-88.
K. Dackland, E. FJmroth, aud B. Kaptrom (191i12). "Parallel Block Factorizations on the
Shared Memory Multiproc_.- IBM 3090VF/600J," ln~ J. SuJli!I'"COf7lputer
Applicatioru, 6, 69-97.
Chapter 7

The U nsymmetric
Eigenvalue Problem

§7.1 Properties and Decompositions


§7.2 Perturbation Theory
§7.3 Power Iterations
§7 .4 The Hessenberg and Real Schur Forms
§7.5 The Practical QR Algorithm
§7.6 Invariant Subspace Computations
§7.7 The QZ Method for Ax= >.Bx

Having discussed linear equations and least squares, we now direct our
attention to the third major problem area in matrix computations, the
algebraic eigenvalue problem. The unsymmetric problem is considered in
this chapter and the more agreeable symmetric case in the next.
Our first task is to present the decompositions of Schur and Jordan
along with the basic properties of eigenvalues and invariant subspaces. The
contrasting behavior of these two decompositions sets the stage for §7.2
in which we investigate how the eigenvalues and invariant subspaces of
a matrix are affected by perturbation. Condition numbers are developed
that permit estimation of the errors that can be expected to arise because
of roundoff.
The key algorithm of the chapter is the justly famous QR algorithm.
This procedure is the most complex algorithm presented in this book and its
development is spread over three sections. We derive the basic QR iteration
in §7.3 as a natural generalization of the simple power method. The next

308
309 CHAPTER 7. THE UNSYMMETRIC EIGENVALUE PROBLEM

two sections are devoted to making this basic iteration computationally


feasible. This involves the introduction of the Hessenberg decomposition in
§7.4 and the notion of origin shifts in §7.5.
The QR algorithm computes the real Schur form of a matrix, a canonical
form that displays eigenvalues but not eigenvectors. Consequently, addi-
tional computations usually must be performed if information regarding
invariant subspaces is desired. In §7.6, which could be subtitled, "What to
Do after the Real Schur Form is Calculated," we discuss various invariant
subspace calculations that can follow the QR algorithm.
Finally, in the last section we consider the generalized eigenvalue prob-
lem Ax = >.Ex and a variant of the QR algorithm that has been devised to
solve it. This algorithm, called the QZ algorithm, underscores the impor-
tance of orthogonal matrices in the eigenproblem, a central theme of the
chapter.
It is appropriate at this time to make a remark about complex versus real
arithmetic. In this book, we focus on the development of real arithmetic
algorithms for real matrix problems. This chapter is no exception even
though a real unsymmetric matrix can have complex eigenvalues. However,
in the derivation of the practical, real arithmetic QR algorithm and in the
mathematical analysis of the eigenproblem itself, it is convenient to work
in the complex field. Thus, the reader will find that we have switched to
complex notation in §7.1, §7.2, and §7.3. In these sections, we use complex
versions of the QR factorization, the singular value decomposition, and the
CS decomposition.

Before You Begin

Chapters 1-3 and §§5.1-5.2 are assumed. Within this chapter there are
the following dependencies:

§7.1 _, §7.2 _, §7.3 _, §7.4 _, §7.5 _, §7.6 _, §7.7

Complementary references include Fox (1964), Wilkinson (1965), Gourlay


and Watson (1973), Stewart (1973), Hager (1988), Ciarlet (1989), Stewart
and Sun (1990), Watkins (1991), Saad (1992), Jennings and Me Keowen
(1992), Datta (1995), Trefethen and Bau (1997), and Demmel (1996). Some
Matlab functions important to this chapter are eig, poly, polyeig, hess,
qz, rsf2csf, cdf2rdf, schur, and balance. LAPACK connections include
310 CHAPTER 7. THE UNSYMMETRIC EIGENVALUE PROBLEM

LAPACK: Unsymmetric Eigenproblem


_GEBAL   Balance transform
_GEBAK   Undo balance transform
_GEHRD   Hessenberg reduction $U^H A U = H$
_ORMHR   U (factored form) times matrix (real case)
_ORGHR   Generates U (real case)
_UNMHR   U (factored form) times matrix (complex case)
_UNGHR   Generates U (complex case)
_HSEQR   Schur decomposition of Hessenberg matrix
_HSEIN   Eigenvectors of Hessenberg matrix by inverse iteration
_GEES    Schur decomp of general matrix with eigenvalue ordering
_GEESX   Same but with condition estimates
_GEEV    Eigenvalues and left and right eigenvectors of general matrix
_GEEVX   Same but with condition estimates
_TREVC   Selected eigenvectors of upper quasitriangular matrix
_TRSNA   Cond. estimates of selected eigenvalues of upper quasitriangular matrix
_TREXC   Unitary reordering of Schur decomposition
_TRSEN   Same but with condition estimates
_TRSYL   Solves AX + XB = C for upper quasitriangular A and B

LAPACK: Unsymmetric Generalized Eigenproblem


_GGBAL Balance transform
_GGHRD Reduction to Hessenberg-Triangular form
_HGEQZ Generalized Schur decomposition
_TGEVC Eigenvectors
_GGBAK Undo balance transform

7.1 Properties and Decompositions


In this section we survey the mathematical background necessary to develop
and analyze the eigenvalue algorithms that follow.

7.1.1 Eigenvalues and Invariant Subspaces


The eigenvalues of a matrix $A \in \mathbb{C}^{n \times n}$ are the $n$ roots of its characteristic
polynomial $p(z) = \det(zI - A)$. The set of these roots is called the spectrum
and is denoted by $\lambda(A)$. If $\lambda(A) = \{\lambda_1, \ldots, \lambda_n\}$, then it follows that

$$\det(A) = \lambda_1 \lambda_2 \cdots \lambda_n .$$

Moreover, if we define the trace of $A$ by

$$\operatorname{tr}(A) = \sum_{i=1}^{n} a_{ii},$$

then $\operatorname{tr}(A) = \lambda_1 + \cdots + \lambda_n$. This follows by looking at the coefficient of
$z^{n-1}$ in the characteristic polynomial.
If $\lambda \in \lambda(A)$, then the nonzero vectors $x \in \mathbb{C}^n$ that satisfy

$$Ax = \lambda x$$

are referred to as eigenvectors. More precisely, $x$ is a right eigenvector for $\lambda$
if $Ax = \lambda x$ and a left eigenvector if $x^H A = \lambda x^H$. Unless otherwise stated,
"eigenvector" means "right eigenvector."
An eigenvector defines a one-dimensional subspace that is invariant with
respect to premultiplication by $A$. More generally, a subspace $S \subseteq \mathbb{C}^n$ with
the property that

$$x \in S \;\Longrightarrow\; Ax \in S$$

is said to be invariant (for $A$). Note that if

$$AX = XB,$$

then $\operatorname{ran}(X)$ is invariant and $By = \lambda y \Rightarrow A(Xy) = \lambda (Xy)$. Thus, if $X$ has
full column rank, then $AX = XB$ implies that $\lambda(B) \subseteq \lambda(A)$. If $X$ is square
and nonsingular, then $\lambda(A) = \lambda(B)$ and we say that $A$ and $B = X^{-1}AX$
are similar. In this context, $X$ is called a similarity transformation.
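The preservation of the spectrum under similarity is easy to confirm numerically. The following MATLAB fragment is an added illustration (it is not part of the original presentation); the matrices are arbitrary random test data.

    % Illustration (not from the text): a similarity transformation
    % preserves the spectrum.  A and X are arbitrary random test data.
    n = 5;
    A = randn(n);
    X = randn(n);                      % almost surely nonsingular
    B = X\(A*X);                       % B = inv(X)*A*X, similar to A
    disp([sort(eig(A)) sort(eig(B))])  % the two columns agree to roundoff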

7 .1.2 Decoupling
Many eigenvalue computations involve breaking the given problem down
into a collection of smaller eigenproblems. The following result is the basis
for these reductions.
Lemma 7.1.1 If $T \in \mathbb{C}^{n \times n}$ is partitioned as follows,

$$T = \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{bmatrix}, \qquad T_{11} \in \mathbb{C}^{p \times p}, \; T_{22} \in \mathbb{C}^{q \times q},$$

then $\lambda(T) = \lambda(T_{11}) \cup \lambda(T_{22})$.

Proof. Suppose

$$T \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \lambda \begin{bmatrix} x_1 \\ x_2 \end{bmatrix},$$

where $x_1 \in \mathbb{C}^p$ and $x_2 \in \mathbb{C}^q$. If $x_2 \neq 0$, then $T_{22}x_2 = \lambda x_2$ and so $\lambda \in
\lambda(T_{22})$. If $x_2 = 0$, then $T_{11}x_1 = \lambda x_1$ and so $\lambda \in \lambda(T_{11})$. It follows that
$\lambda(T) \subseteq \lambda(T_{11}) \cup \lambda(T_{22})$. But since both $\lambda(T)$ and $\lambda(T_{11}) \cup \lambda(T_{22})$ have the
same cardinality, the two sets are equal. $\Box$

7.1.3 The Basic Unitary Decompositions


By using similarity transformations, it is possible to reduce a given matrix
to any one of several canonical forms. The canonical forms differ in how
they display the eigenvalues and in the kind of invariant subspace informa-
tion that they provide. Because of their numerical stability we begin by
discussing the reductions that can be achieved with unitary similarity.

Lemma 7.1.2 If $A \in \mathbb{C}^{n \times n}$, $B \in \mathbb{C}^{p \times p}$, and $X \in \mathbb{C}^{n \times p}$ satisfy

$$AX = XB, \qquad \operatorname{rank}(X) = p, \qquad (7.1.1)$$

then there exists a unitary $Q \in \mathbb{C}^{n \times n}$ such that

$$Q^H A Q = T = \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{bmatrix} \begin{matrix} p \\ n-p \end{matrix} \qquad (7.1.2)$$

where $\lambda(T_{11}) = \lambda(A) \cap \lambda(B)$.

Proof. Let

$$X = Q \begin{bmatrix} R_1 \\ 0 \end{bmatrix} \begin{matrix} p \\ n-p \end{matrix}$$

be a QR factorization of $X$. By substituting this into (7.1.1) and rearranging we have

$$(Q^H A Q)\begin{bmatrix} R_1 \\ 0 \end{bmatrix} = \begin{bmatrix} R_1 \\ 0 \end{bmatrix} B \qquad \mbox{where} \qquad
Q^H A Q = \begin{bmatrix} T_{11} & T_{12} \\ T_{21} & T_{22} \end{bmatrix} \begin{matrix} p \\ n-p \end{matrix} .$$

By using the nonsingularity of $R_1$ and the equations $T_{21}R_1 = 0$ and $T_{11}R_1 =
R_1 B$, we can conclude that $T_{21} = 0$ and $\lambda(T_{11}) = \lambda(B)$. The conclusion
now follows because from Lemma 7.1.1 $\lambda(A) = \lambda(T) = \lambda(T_{11}) \cup \lambda(T_{22})$. $\Box$

Example 7.1.1 If

$$A = \begin{bmatrix} 67.00 & 177.60 & -63.20 \\ -20.40 & 95.88 & -87.16 \\ 22.80 & 67.84 & 12.12 \end{bmatrix},$$

$X = [20, -9, -12]^T$ and $B = [25]$, then $AX = XB$. Moreover, if the orthogonal matrix
$Q$ is defined by

$$Q = \begin{bmatrix} -.800 & .360 & .480 \\ .360 & .928 & -.096 \\ .480 & -.096 & .872 \end{bmatrix},$$

then $Q^T X = [-25, 0, 0]^T$ and

$$Q^T A Q = T = \begin{bmatrix} 25 & -90 & 5 \\ 0 & 147 & -104 \\ 0 & 146 & 3 \end{bmatrix}.$$

A calculation shows that $\lambda(A) = \{25,\; 75 + 100i,\; 75 - 100i\}$.

Lemma 7.1.2 says that a matrix can be reduced to block triangular form
using unitary similarity transformations if we know one of its invariant
subspaces. By induction we can readily establish the decomposition of
Schur (1909).

Theorem 7.1.3 (Schur Decomposition) If $A \in \mathbb{C}^{n \times n}$, then there exists
a unitary $Q \in \mathbb{C}^{n \times n}$ such that

$$Q^H A Q = T = D + N, \qquad (7.1.3)$$

where $D = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$ and $N \in \mathbb{C}^{n \times n}$ is strictly upper triangular.
Furthermore, $Q$ can be chosen so that the eigenvalues $\lambda_i$ appear in any
order along the diagonal.

Proof. The theorem obviously holds when $n = 1$. Suppose it holds for all
matrices of order $n-1$ or less. If $Ax = \lambda x$, where $x \neq 0$, then by Lemma
7.1.2 (with $B = (\lambda)$) there exists a unitary $U$ such that

$$U^H A U = \begin{bmatrix} \lambda & w^H \\ 0 & C \end{bmatrix} \begin{matrix} 1 \\ n-1 \end{matrix} .$$

By induction there is a unitary $\tilde{U}$ such that $\tilde{U}^H C \tilde{U}$ is upper triangular.
Thus, if $Q = U \operatorname{diag}(1, \tilde{U})$, then $Q^H A Q$ is upper triangular. $\Box$

Example 7.1.2 If

$$A = \begin{bmatrix} 3 & 8 \\ -2 & 3 \end{bmatrix} \quad \mbox{and} \quad Q = \begin{bmatrix} .8944i & .4472 \\ -.4472 & -.8944i \end{bmatrix},$$

then $Q$ is unitary and

$$Q^H A Q = \begin{bmatrix} 3 + 4i & -6 \\ 0 & 3 - 4i \end{bmatrix}.$$

If $Q = [\, q_1, \ldots, q_n \,]$ is a column partitioning of the unitary matrix $Q$ in
(7.1.3), then the $q_i$ are referred to as Schur vectors. By equating columns
in the equation $AQ = QT$ we see that the Schur vectors satisfy

$$Aq_k = \lambda_k q_k + \sum_{i=1}^{k-1} n_{ik} q_i , \qquad k = 1{:}n. \qquad (7.1.4)$$

From this we conclude that the subspaces

$$S_k = \operatorname{span}\{q_1, \ldots, q_k\}, \qquad k = 1{:}n,$$

are invariant. Moreover, it is not hard to show that if $Q_k = [\, q_1, \ldots, q_k\,]$,
then $\lambda(Q_k^H A Q_k) = \{\lambda_1, \ldots, \lambda_k\}$. Since the eigenvalues in (7.1.3) can be
arbitrarily ordered, it follows that there is at least one $k$-dimensional invariant
subspace associated with each subset of $k$ eigenvalues.
Another conclusion to be drawn from (7.1.4) is that the Schur vector $q_k$
is an eigenvector if and only if the $k$-th column of $N$ is zero. This turns out
to be the case for $k = 1{:}n$ whenever $A^H A = AA^H$. Matrices that satisfy
this property are called normal.

Corollary 7.1.4 $A \in \mathbb{C}^{n \times n}$ is normal if and only if there exists a unitary
$Q \in \mathbb{C}^{n \times n}$ such that $Q^H A Q = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$.

Proof. It is easy to show that if $A$ is unitarily similar to a diagonal matrix,
then $A$ is normal. On the other hand, if $A$ is normal and $Q^H A Q = T$ is
its Schur decomposition, then $T$ is also normal. The corollary follows by
showing that a normal, upper triangular matrix is diagonal. $\Box$

Note that if $Q^H A Q = T = \operatorname{diag}(\lambda_i) + N$ is a Schur decomposition of a
general $n$-by-$n$ matrix $A$, then $\| N \|_F$ is independent of the choice of $Q$:

$$\| N \|_F^2 = \| A \|_F^2 - \sum_{i=1}^{n} |\lambda_i|^2 \;\equiv\; \Delta^2(A).$$

This quantity is referred to as $A$'s departure from normality. Thus, to
make $T$ "more diagonal," it is necessary to rely on nonunitary similarity
transformations.
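As an added illustration (not part of the original presentation), the two expressions for the departure from normality can be compared in MATLAB; the test matrix below is an arbitrary nonnormal example.

    % Illustration (not from the text): Delta(A) computed from a complex
    % Schur form and from the Frobenius-norm formula above.
    A = [1 2 3; 0 4 5; 0 0 4.001];                % arbitrary nonnormal test matrix
    [Q,T] = schur(A,'complex');                   % T = diag(lambda_i) + N
    N = T - diag(diag(T));
    d1 = norm(N,'fro');
    d2 = sqrt(norm(A,'fro')^2 - sum(abs(eig(A)).^2));
    disp([d1 d2])                                 % the two values agree to roundoff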

7.1.4 Nonunitary Reductions


To see what is involved in nonunitary similarity reduction, we examine the
block diagonalization of a 2-by-2 block triangular matrix.

Lemma 7.1.5 Let $T \in \mathbb{C}^{n \times n}$ be partitioned as follows:

$$T = \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{bmatrix}, \qquad T_{11} \in \mathbb{C}^{p \times p}, \; T_{22} \in \mathbb{C}^{q \times q}.$$

Define the linear transformation $\phi : \mathbb{C}^{p \times q} \rightarrow \mathbb{C}^{p \times q}$ by

$$\phi(X) = T_{11}X - XT_{22}$$

where $X \in \mathbb{C}^{p \times q}$. Then $\phi$ is nonsingular if and only if $\lambda(T_{11}) \cap \lambda(T_{22}) = \emptyset$.
If $\phi$ is nonsingular and $Y$ is defined by

$$Y = \begin{bmatrix} I_p & Z \\ 0 & I_q \end{bmatrix}, \qquad \phi(Z) = -T_{12},$$

then $Y^{-1}TY = \operatorname{diag}(T_{11}, T_{22})$.

Proof. Suppose $\phi(X) = 0$ for $X \neq 0$ and that

$$U^H X V = \begin{bmatrix} \Sigma_r & 0 \\ 0 & 0 \end{bmatrix} \begin{matrix} r \\ p-r \end{matrix}, \qquad r = \operatorname{rank}(X),$$

is the SVD of $X$ with $\Sigma_r = \operatorname{diag}(\sigma_i)$. Substituting this into
the equation $T_{11}X = XT_{22}$ gives

$$\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \begin{bmatrix} \Sigma_r & 0 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} \Sigma_r & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}$$

where $U^H T_{11} U = (A_{ij})$ and $V^H T_{22} V = (B_{ij})$. By comparing blocks we see
that $A_{21} = 0$, $B_{12} = 0$, and $\lambda(A_{11}) = \lambda(B_{11})$. Consequently,

$$\emptyset \neq \lambda(A_{11}) = \lambda(B_{11}) \subseteq \lambda(T_{11}) \cap \lambda(T_{22}).$$

On the other hand, if $\lambda \in \lambda(T_{11}) \cap \lambda(T_{22})$ then we have nonzero vectors $x$
and $y$ so that $T_{11}x = \lambda x$ and $y^H T_{22} = \lambda y^H$. A calculation shows that $\phi(xy^H)
= 0$. Finally, if $\phi$ is nonsingular then the matrix $Z$ above exists and

$$Y^{-1}TY = \begin{bmatrix} I_p & -Z \\ 0 & I_q \end{bmatrix} \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{bmatrix} \begin{bmatrix} I_p & Z \\ 0 & I_q \end{bmatrix} = \begin{bmatrix} T_{11} & \phi(Z) + T_{12} \\ 0 & T_{22} \end{bmatrix} = \operatorname{diag}(T_{11}, T_{22}). \;\; \Box$$

Example 7.1.3 If

1.0 0.5 -0.5 ]


T = [ g 32 83 ] and Y ~ 0.0 1.0 o.o
-2 3 [ 0.0 0.0 1.0

then

By repeatedly applying Lemma 7.1.5, we can establish the following more


general result:

Theorem 7.1.6 (Block Diagonal Decomposition) Suppose

$$Q^H A Q = T = \begin{bmatrix} T_{11} & T_{12} & \cdots & T_{1q} \\ 0 & T_{22} & \cdots & T_{2q} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & T_{qq} \end{bmatrix} \qquad (7.1.5)$$

is a Schur decomposition of $A \in \mathbb{C}^{n \times n}$ and assume that the $T_{ii}$ are square.
If $\lambda(T_{ii}) \cap \lambda(T_{jj}) = \emptyset$ whenever $i \neq j$, then there exists a nonsingular matrix
$Y \in \mathbb{C}^{n \times n}$ such that

$$(QY)^{-1} A (QY) = \operatorname{diag}(T_{11}, \ldots, T_{qq}). \qquad (7.1.6)$$



Proof. A proof can be obtained by using Lemma 7.1.5 and induction. $\Box$

If each diagonal block $T_{ii}$ is associated with a distinct eigenvalue, then we obtain

Corollary 7.1.7 If $A \in \mathbb{C}^{n \times n}$ then there exists a nonsingular $X$ such that

$$X^{-1} A X = \operatorname{diag}(\lambda_1 I_{n_1} + N_1, \ldots, \lambda_q I_{n_q} + N_q) \qquad (7.1.7)$$

where $\lambda_1, \ldots, \lambda_q$ are distinct, the integers $n_1, \ldots, n_q$ satisfy $n_1 + \cdots + n_q =
n$, and each $N_i$ is strictly upper triangular.

A number of important terms are connected with decomposition (7.1.7).
The integer $n_i$ is referred to as the algebraic multiplicity of $\lambda_i$. If $n_i = 1$,
then $\lambda_i$ is said to be simple. The geometric multiplicity of $\lambda_i$ equals the
dimension of $\operatorname{null}(N_i)$, i.e., the number of linearly independent eigenvectors
associated with $\lambda_i$. If the algebraic multiplicity of $\lambda_i$ exceeds its geometric
multiplicity, then $\lambda_i$ is said to be a defective eigenvalue. A matrix with
a defective eigenvalue is referred to as a defective matrix. Nondefective
matrices are also said to be diagonalizable in light of the following result:

Corollary 7.1.8 (Diagonal Form) $A \in \mathbb{C}^{n \times n}$ is nondefective if and only
if there exists a nonsingular $X \in \mathbb{C}^{n \times n}$ such that

$$X^{-1} A X = \operatorname{diag}(\lambda_1, \ldots, \lambda_n). \qquad (7.1.8)$$

Proof. $A$ is nondefective if and only if there exist independent vectors
$x_1, \ldots, x_n \in \mathbb{C}^n$ and scalars $\lambda_1, \ldots, \lambda_n$ such that $Ax_i = \lambda_i x_i$ for $i = 1{:}n$. This
is equivalent to the existence of a nonsingular $X = [x_1, \ldots, x_n] \in \mathbb{C}^{n \times n}$
such that $AX = XD$ where $D = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$. $\Box$

Note that if $y_i^H$ is the $i$th row of $X^{-1}$, then $y_i^H A = \lambda_i y_i^H$. Thus, the columns
of $X^{-H}$ are left eigenvectors and the columns of $X$ are right eigenvectors.

Example 7.1.4 If

~
-1 1
A = [ 5 and X= [ -2
-2 6
then x- 1 AX= diag(4, 7).

If we partition the matrix $X$ in (7.1.7),

$$X = [\, X_1, \ldots, X_q \,], \qquad X_i \in \mathbb{C}^{n \times n_i},$$

then $\mathbb{C}^n = \operatorname{ran}(X_1) \oplus \cdots \oplus \operatorname{ran}(X_q)$, a direct sum of invariant subspaces. If
the bases for these subspaces are chosen in a special way, then it is possible
to introduce even more zeroes into the upper triangular portion of $X^{-1}AX$.

Theorem 7.1.9 (Jordan Decomposition) If $A \in \mathbb{C}^{n \times n}$, then there exists
a nonsingular $X \in \mathbb{C}^{n \times n}$ such that $X^{-1}AX = \operatorname{diag}(J_1, \ldots, J_t)$ where

$$J_i = \begin{bmatrix} \lambda_i & 1 & & 0 \\ & \lambda_i & \ddots & \\ & & \ddots & 1 \\ 0 & & & \lambda_i \end{bmatrix}$$

is $m_i$-by-$m_i$ and $m_1 + \cdots + m_t = n$.

Proof. See Halmos (1958, pp. 112 ff.). $\Box$

The $J_i$ are referred to as Jordan blocks. The number and dimensions of the
Jordan blocks associated with each distinct eigenvalue is unique, although
their ordering along the diagonal is not.

7.1.5 Some Comments on Nonunitary Similarity


The Jordan block structure of a defective matrix is difficult to determine
numerically. The set of $n$-by-$n$ diagonalizable matrices is dense in $\mathbb{C}^{n \times n}$,
and thus, small changes in a defective matrix can radically alter its Jordan
form. We have more to say about this in §7.6.5.
A related difficulty that arises in the eigenvalue problem is that a nearly
defective matrix can have a poorly conditioned matrix of eigenvectors. For
example, any matrix $X$ that diagonalizes

$$A = \begin{bmatrix} 1 + \epsilon & 1 \\ 0 & 1 - \epsilon \end{bmatrix} \qquad (7.1.9)$$

has a 2-norm condition of order $1/\epsilon$.
These observations serve to highlight the difficulties associated with ill-conditioned
similarity transformations. Since

$$fl(X^{-1} A X) = X^{-1} A X + E, \qquad (7.1.10)$$

where

$$\| E \|_2 \approx \mathbf{u}\, \kappa_2(X)\, \| A \|_2, \qquad (7.1.11)$$

it is clear that large errors can be introduced into an eigenvalue calculation
when we depart from unitary similarity.

7.1.6 Singular Values and Eigenvalues


Since the singular values of $A$ and those of its Schur form $Q^H A Q =
\operatorname{diag}(\lambda_i) + N$ are the same, it follows from what we know about the condition
of triangular matrices that it may be the case that

$$\max_{i,j} \frac{|\lambda_i|}{|\lambda_j|} \;\ll\; \kappa_2(A).$$

This is a reminder that for nonnormal matrices, eigenvalues do not have the
"predictive power" of singular values when it comes to $Ax = b$ sensitivity
matters. Eigenvalues of nonnormal matrices have other shortcomings. See
§11.3.4.

Problems

P7.1.1 Show that if $T \in \mathbb{C}^{n \times n}$ is upper triangular and normal, then $T$ is diagonal.
P7.1.2 Verify that if $X$ diagonalizes the 2-by-2 matrix in (7.1.9) and $\epsilon \leq 1/2$, then $\kappa_2(X) \geq 1/\epsilon$.
P7.1.3 Suppose $A \in \mathbb{C}^{n \times n}$ has distinct eigenvalues. Show that if $Q^H A Q = T$ is its
Schur decomposition and $AB = BA$, then $Q^H B Q$ is upper triangular.
P7.1.4 Show that if $A$ and $B^H$ are in $\mathbb{C}^{m \times n}$ with $m \geq n$, then
$\lambda(AB) = \lambda(BA) \cup \{\underbrace{0, \ldots, 0}_{m-n}\}$.

P7.1.5 Given A E q;nxn, use the Schur decomposition to show that for every < > 0,
there exists a diagonalizable matrix B such that II A - B l12 S f. This shows that the set
of diagonalizable matrices is dense in q;nxn and that the Jordan canonical form is not
a continuous matrix decomposition.
P7.1.6 Suppose Ak --> A and that Q{! AkQk = Tk is a Schur decomposition of Ak.
Show that {Q.} has a converging subsequence {Q•J with the property that
lim Qk; = Q
•-oo
where QH AQ = T is upper triangular. This shows that the eigenvalues of a matrix are
continuous functions of its entries.
P7.1.7 Justify (7.1.10) and (7.1.11).
P7.1.8 . Show how to compute the eigenvalues of
k
M= j

where A, B, C, and Dare given real diagonal matrices.


P7.1.9 Use the JCF to show that if all the eigenvalues of a matrix $A$ are strictly less
than unity in absolute value, then $\lim_{k \rightarrow \infty} A^k = 0$.


P7.1.10 The initial value problem
:i:(t) y(t) x(O) = 1
y(t) -x(t) y(O) = 0
has solution x(t) = cos(t) and y(t) = sin(t). Let h > 0. Here are three reasonable
iterations that can be used to compute approximations Xk "" x(kh) and Yk "" y(kh)
assuming that xo = 1 and Yk = 0:

Method 1: Xk+l 1 +hYk


Yk+l 1- hxk

Method 2: Xk+l 1 +hYk


Yk+l 1- hxk+I

Method 3: Xk+l 1+hYHl


Yk+J 1- hxk+L
Express each method in the form

[ :~:: ] = Ah [ :: ]
where Ah is a 2-by-2 matrix. For each case, compute A(Ah) and use the previous problem
to di.scUBS limxk and timyk ask--+ oo.
P7.1.11 If J E R"xd is a Jordan block, what is l<oo(J)?
P7.1.12 Show that if
p
R q

is normal and A{Ru) n A(R22l = 0, then R12 = 0.

Notes and References for Sec. 7.1


The mathematical properties of the algebraic eigenvalue problem are elegantly covered in
Wilkinson (1965, chapter 1) and Stewart {1973, chapter 6). For those who need further
review we also recommend

R. Bellman {1970). Introduction to Matrix Analysis, 2nd ed., McGraw-Hill, New York.
I.C. Gohberg, P. Lancaster, and L. Rodman {1986). Invariant Subspaces of Matrices
With Applications, John Wiley and Sons, New York.
M. Marcus and H. Mine {1964). A Suroey of Matrix Theory and Matrix Inequalities,
Allyn and Bacon, Boston.
L. Mirsky (1963). An Introduction to Linear Algebra, Oxford University Press, Oxford.
The Schur decomposition originally appeared in

I. Schur (1909). "On the Characteristic Roots of a Linear Substitution with an Appli-
cation to the Theory of Integra.! Equations." Math. Ann. 66, 488-510 (Gennan).
A proof very similar to ours is given on page 105 of

H. W. Turnbull and A. C. Aitken {1961). An Introduction to the Theory of Canonical


Fonna, Dover, New York.

Connections between singular values, eigenvalues, and pseudoeigenvalues (see §11.3.4)


are discussed in

K-C. Toh and L.N. Trefethen (1994). "Pseudozeros of Polynomials and Pseudospectra
of Companion Matrices," Numer. Math. 68, 403-425.
F. Kittaneh (1995). "Singular Values of Companion Matrices and Bounds on Zeros of
Polynomials," SIAM J. Matrix Anal. Appl. 16, 333-340.

7.2 Perturbation Theory


The act of computing eigenvalues is the act of computing zeros of the char-
acteristic polynomial. Galois theory tells us that such a process has to be
iterative if n > 4 and so errors will arise because of finite termination. In
order to develop intelligent stopping criteria we need an informative per-
turbation theory that tells us how to think about approximate eigenvalues
and invariant subspaces.

7.2.1 Eigenvalue Sensitivity


Several eigenvalue routines produce a sequence of similarity transformations
$X_k$ with the property that the matrices $X_k^{-1} A X_k$ are progressively "more
diagonal." The question naturally arises, how well do the diagonal elements
of a matrix approximate its eigenvalues?

Theorem 7.2.1 (Gershgorin Circle Theorem) If $X^{-1} A X = D + F$
where $D = \operatorname{diag}(d_1, \ldots, d_n)$ and $F$ has zero diagonal entries, then

$$\lambda(A) \subseteq \bigcup_{i=1}^{n} D_i$$

where $D_i = \{ z \in \mathbb{C} : |z - d_i| \leq \sum_{j=1}^{n} |f_{ij}| \}$.

Proof. Suppose $\lambda \in \lambda(A)$ and assume without loss of generality that $\lambda \neq d_i$
for $i = 1{:}n$. Since $(D - \lambda I) + F$ is singular, it follows from Lemma 2.3.3
that

$$1 \;\leq\; \sum_{j=1}^{n} \frac{|f_{kj}|}{|d_k - \lambda|}$$

for some $k$, $1 \leq k \leq n$. But this implies that $\lambda \in D_k$. $\Box$

It can also be shown that if the Gershgorin disk D; is isolated from the other
disks, then it contains precisely one of $A$'s eigenvalues. See Wilkinson (1965, pp. 71 ff.).

Example 7.2.1 If

$$A = \begin{bmatrix} 10 & 2 & 3 \\ -1 & 0 & 2 \\ 1 & -2 & 1 \end{bmatrix}$$

then $\lambda(A) \approx \{10.226,\; .3870 + 2.2216i,\; .3870 - 2.2216i\}$ and the Gershgorin disks are
$D_1 = \{ z : |z - 10| \leq 5 \}$, $D_2 = \{ z : |z| \leq 3 \}$, and $D_3 = \{ z : |z - 1| \leq 3 \}$.
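The disks are easy to compute directly. The following MATLAB fragment is an added illustration (not part of the original presentation); it uses the matrix of Example 7.2.1 with $X = I$, so $D = \operatorname{diag}(A)$ and $F = A - D$.

    % Illustration (not from the text): Gershgorin disk centers and radii.
    A = [10 2 3; -1 0 2; 1 -2 1];
    d = diag(A);
    r = sum(abs(A - diag(d)), 2);              % disk radii
    lam = eig(A);
    in_disk = arrayfun(@(z) any(abs(z - d) <= r), lam);
    disp([d r])                                % centers and radii
    disp(in_disk)                              % every eigenvalue lies in some disk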

For some very important eigenvalue routines it is possible to show that the
computed eigenvalues are the exact eigenvalues of a matrix A+ E where E
is small in norm. Consequently, we must understand how the eigenvalues
of a matrix can be affected by small perturbations. A sample result that
sheds light on this issue is the following theorem.

Theorem 7.2.2 (Bauer-Fike) If $\mu$ is an eigenvalue of $A + E \in \mathbb{C}^{n \times n}$
and $X^{-1} A X = D = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$, then

$$\min_{\lambda \in \lambda(A)} |\lambda - \mu| \;\leq\; \kappa_p(X)\, \| E \|_p$$

where $\| \cdot \|_p$ denotes any of the $p$-norms.

Proof. We need only consider the case when $\mu$ is not in $\lambda(A)$. If the matrix
$X^{-1}(A + E - \mu I)X$ is singular, then so is $I + (D - \mu I)^{-1}(X^{-1} E X)$. Thus,
from Lemma 2.3.3 we obtain

$$1 \;\leq\; \| (D - \mu I)^{-1}(X^{-1} E X) \|_p \;\leq\; \| (D - \mu I)^{-1} \|_p \, \| X \|_p \, \| E \|_p \, \| X^{-1} \|_p .$$

Since $(D - \mu I)^{-1}$ is diagonal and the $p$-norm of a diagonal matrix is the
absolute value of the largest diagonal entry, it follows that

$$\| (D - \mu I)^{-1} \|_p = \frac{1}{\min_{\lambda \in \lambda(A)} |\lambda - \mu|},$$

from which the theorem follows. $\Box$

An analogous result can be obtained via the Schur decomposition:


Theorem 7.2.3 Let $Q^H A Q = D + N$ be a Schur decomposition of $A \in \mathbb{C}^{n \times n}$
as in (7.1.3). If $\mu \in \lambda(A + E)$ and $p$ is the smallest positive integer such
that $|N|^p = 0$, then

$$\min_{\lambda \in \lambda(A)} |\lambda - \mu| \;\leq\; \max(\theta, \theta^{1/p})$$

where

$$\theta = \| E \|_2 \sum_{k=0}^{p-1} \| N \|_2^k .$$

Proof. Define

$$\delta = \min_{\lambda \in \lambda(A)} |\lambda - \mu| .$$

The theorem is clearly true if $\delta = 0$. If $\delta > 0$ then $I - (\mu I - A)^{-1}E$ is
singular and by Lemma 2.3.3 we have

$$1 \;\leq\; \| (\mu I - A)^{-1} E \|_2 \;\leq\; \| (\mu I - A)^{-1} \|_2 \| E \|_2 = \| ((\mu I - D) - N)^{-1} \|_2 \| E \|_2 . \qquad (7.2.1)$$

Since $(\mu I - D)^{-1}$ is diagonal and $|N|^p = 0$ it is not hard to show that
$((\mu I - D)^{-1} N)^p = 0$. Thus,

$$((\mu I - D) - N)^{-1} = \sum_{k=0}^{p-1} \left( (\mu I - D)^{-1} N \right)^k (\mu I - D)^{-1}$$

and so

$$\| ((\mu I - D) - N)^{-1} \|_2 \;\leq\; \frac{1}{\delta} \sum_{k=0}^{p-1} \left( \frac{\| N \|_2}{\delta} \right)^k .$$

If $\delta > 1$ then

$$\| ((\mu I - D) - N)^{-1} \|_2 \;\leq\; \frac{1}{\delta} \sum_{k=0}^{p-1} \| N \|_2^k$$

and so from (7.2.1), $\delta \leq \theta$. If $\delta \leq 1$ then

$$\| ((\mu I - D) - N)^{-1} \|_2 \;\leq\; \frac{1}{\delta^p} \sum_{k=0}^{p-1} \| N \|_2^k$$

and so from (7.2.1), $\delta^p \leq \theta$. Thus, $\delta \leq \max(\theta, \theta^{1/p})$. $\Box$

Example 7.2.2 If

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 4.001 \end{bmatrix} \quad \mbox{and} \quad E = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ .001 & 0 & 0 \end{bmatrix},$$

then $\lambda(A + E) \approx \{1.0001,\; 4.0582,\; 3.9427\}$ and $A$'s matrix of eigenvectors satisfies
$\kappa_2(X) \approx 10^7$. The Bauer-Fike bound in Theorem 7.2.2 has order $10^4$, while the Schur
bound in Theorem 7.2.3 has order $10^0$.

Theorems 7.2.2 and 7.2.3 each indicate potential eigenvalue sensitivity if $A$
is nonnormal. Specifically, if $\kappa_2(X)$ or $\| N \|_2^{p-1}$ is large, then small changes
in $A$ can induce large changes in the eigenvalues.
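As an added illustration (not part of the original presentation), the 2-norm Bauer-Fike bound of Theorem 7.2.2 can be checked numerically on the data of Example 7.2.2.

    % Illustration (not from the text): checking the Bauer-Fike bound.
    A = [1 2 3; 0 4 5; 0 0 4.001];
    E = zeros(3);  E(3,1) = .001;
    [X,D] = eig(A);
    lam = diag(D);
    mu  = eig(A + E);
    worst = max(arrayfun(@(m) min(abs(lam - m)), mu));
    bound = cond(X)*norm(E);                   % kappa_2(X)*||E||_2
    disp([worst bound])                        % worst <= bound, bound is pessimistic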

Example 7.2.3 If

A = [ ~ I~ ] and E = [ 10 ~ 1o ~ J,

=
then for all .1. E .I.(A) and !J E .I.( A+ E), 1.1.- 1-'1 w-1. In this example a change of
order 10- 10 in A results in a change of order 10-t in its eigenvalues.

7.2.2 The Condition of a Simple Eigenvalue


Extreme eigenvalue sensitivity for a matrix A cannot occur if A is normal.
On the other hand, nonnormality does not necessarily imply eigenvalue sen-
sitivity. Indeed, a nonnormal matrix can have a mixture of well-conditioned
and ill-conditioned eigenvalues. For this reason, it is beneficial to refine our
perturbation theory so that it is applicable to individual eigenvalues and
not the spectrum as a whole.
To this end, suppose that $\lambda$ is a simple eigenvalue of $A \in \mathbb{C}^{n \times n}$ and
that $x$ and $y$ satisfy $Ax = \lambda x$ and $y^H A = \lambda y^H$ with $\| x \|_2 = \| y \|_2 = 1$.
If $Y^H A X = J$ is the Jordan decomposition with $Y^H = X^{-1}$, then $x$ and
$y$ are nonzero multiples of $X(:,i)$ and $Y(:,i)$ for some $i$. It follows from
$1 = Y(:,i)^H X(:,i)$ that $y^H x \neq 0$, a fact that we shall use shortly.
Using classical results from function theory, it can be shown that in a
neighborhood of the origin there exist differentiable $x(\epsilon)$ and $\lambda(\epsilon)$ such that

$$(A + \epsilon F)x(\epsilon) = \lambda(\epsilon)x(\epsilon)$$

where $\lambda(0) = \lambda$ and $x(0) = x$. By differentiating this equation with respect
to $\epsilon$ and setting $\epsilon = 0$ in the result, we obtain

$$A\dot{x}(0) + Fx = \dot{\lambda}(0)x + \lambda\dot{x}(0) .$$

Applying $y^H$ to both sides of this equation, dividing by $y^H x$, and taking
absolute values gives

$$|\dot{\lambda}(0)| = \frac{|y^H F x|}{|y^H x|} \;\leq\; \frac{1}{|y^H x|} .$$

The upper bound is attained if $F = yx^H$. For this reason we refer to the
reciprocal of

$$s(\lambda) = |y^H x|$$

as the condition of the eigenvalue $\lambda$.
Roughly speaking, the above analysis shows that if order $\epsilon$ perturbations
are made in $A$, then an eigenvalue $\lambda$ may be perturbed by an amount
$\epsilon / s(\lambda)$. Thus, if $s(\lambda)$ is small, then $\lambda$ is appropriately regarded as ill-conditioned.
Note that $s(\lambda)$ is the cosine of the angle between the left and
right eigenvectors associated with $\lambda$ and is unique only if $\lambda$ is simple.
A small $s(\lambda)$ implies that $A$ is near a matrix having a multiple eigenvalue.
In particular, if $\lambda$ is distinct and $s(\lambda) < 1$, then there exists an $E$
such that $\lambda$ is a repeated eigenvalue of $A + E$ and

$$\frac{\| E \|_2}{\| A \|_2} \;\leq\; \frac{s(\lambda)}{\sqrt{1 - s(\lambda)^2}} .$$

This result is proved in Wilkinson (1972).

Example 7.2.4 If

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 4.001 \end{bmatrix} \quad \mbox{and} \quad E = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ .001 & 0 & 0 \end{bmatrix},$$

then $\lambda(A + E) \approx \{1.0001,\; 4.0582,\; 3.9427\}$ and $s(1) \approx .8 \times 10^0$, $s(4) \approx .2 \times 10^{-3}$, and
$s(4.001) \approx .2 \times 10^{-3}$. Observe that $\| E \|_2 / s(\lambda)$ is a good estimate of the perturbation
that each eigenvalue undergoes.
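The quantities $s(\lambda_i)$ are easy to compute once right and left eigenvectors are available. The following MATLAB fragment is an added illustration (not part of the original presentation); it relies on the fact that the rows of $X^{-1}$ are left eigenvectors satisfying $y_i^H x_i = 1$, so with unit 2-norm right eigenvectors $s(\lambda_i)$ is the reciprocal of the 2-norm of the $i$th row of $X^{-1}$.

    % Illustration (not from the text): reciprocal eigenvalue conditions
    % s(lambda_i) for the matrix of Example 7.2.4.
    A = [1 2 3; 0 4 5; 0 0 4.001];
    [X,D] = eig(A);                      % unit 2-norm right eigenvectors
    Xi = inv(X);                         % rows are (unnormalized) left eigenvectors
    s = 1 ./ sqrt(sum(abs(Xi).^2, 2));   % s(lambda_i)
    disp([diag(D) s])                    % compare with s(1), s(4), s(4.001) above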

7.2.3 Sensitivity of Repeated Eigenvalues


If $\lambda$ is a repeated eigenvalue, then the eigenvalue sensitivity question is
more complicated. For example, if

$$A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \quad \mbox{and} \quad F = \begin{bmatrix} 0 & 0 \\ a & 0 \end{bmatrix},$$

then $\lambda(A + \epsilon F) = \{ 1 \pm \sqrt{\epsilon a} \}$. Note that if $a \neq 0$, then it follows that the
eigenvalues of $A + \epsilon F$ are not differentiable at zero; their rate of change at
the origin is infinite. In general, if $\lambda$ is a defective eigenvalue of $A$, then
$O(\epsilon)$ perturbations in $A$ can result in $O(\epsilon^{1/p})$ perturbations in $\lambda$ if $\lambda$ is
associated with a $p$-dimensional Jordan block. See Wilkinson (1965, pp.
77 ff.) for a more detailed discussion.

7.2.4 Invariant Subspace Sensitivity


A collection of sensitive eigenvectors can define an insensitive invariant
subspace provided the corresponding cluster of eigenvalues is isolated. To
be precise, suppose

$$Q^H A Q = T = \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{bmatrix} \begin{matrix} r \\ n-r \end{matrix} \qquad (7.2.2)$$

is a Schur decomposition of $A$ with

$$Q = [\, Q_1 \;\; Q_2 \,], \qquad Q_1 \in \mathbb{C}^{n \times r}. \qquad (7.2.3)$$

It is clear from our discussion of eigenvector perturbation that the sensitivity
of the invariant subspace $\operatorname{ran}(Q_1)$ depends on the distance between
$\lambda(T_{11})$ and $\lambda(T_{22})$. The proper measure of this distance turns out to be
the smallest singular value of the linear transformation $X \rightarrow T_{11}X - XT_{22}$.
(Recall that this transformation figures in Lemma 7.1.5.) In particular, if
we define the separation between the matrices $T_{11}$ and $T_{22}$ by

$$\operatorname{sep}(T_{11}, T_{22}) = \min_{X \neq 0} \frac{\| T_{11}X - XT_{22} \|_F}{\| X \|_F} \qquad (7.2.4)$$

then we have the following general result:


Theorem 7.2.4 Suppose that (7.2.2) and (7.2.3) hold and that for any
matrix $E \in \mathbb{C}^{n \times n}$ we partition $Q^H E Q$ as follows:

$$Q^H E Q = \begin{bmatrix} E_{11} & E_{12} \\ E_{21} & E_{22} \end{bmatrix} \begin{matrix} r \\ n-r \end{matrix} .$$

If $\operatorname{sep}(T_{11}, T_{22}) > 0$ and

$$\| E \|_2 \left( 1 + \frac{5\, \| T_{12} \|_2}{\operatorname{sep}(T_{11}, T_{22})} \right) \;\leq\; \frac{\operatorname{sep}(T_{11}, T_{22})}{5},$$
then there exists a $P \in \mathbb{C}^{(n-r) \times r}$ with

$$\| P \|_2 \;\leq\; 4\, \frac{\| E_{21} \|_2}{\operatorname{sep}(T_{11}, T_{22})}$$

such that the columns of $\hat{Q}_1 = (Q_1 + Q_2 P)(I + P^H P)^{-1/2}$ are an orthonormal
basis for a subspace invariant for $A + E$.

Proof. This result is a slight recasting of Theorem 4.11 in Stewart (1973)
which should be consulted for proof details. See also Stewart and Sun
(1990, p. 230). The matrix $(I + P^H P)^{-1/2}$ is the inverse of the square root
of the symmetric positive definite matrix $I + P^H P$. See §4.2.10. $\Box$
Corollary 7.2.5 If the assumptions in Theorem 7.2.4 hold, then

$$\operatorname{dist}(\operatorname{ran}(Q_1), \operatorname{ran}(\hat{Q}_1)) \;\leq\; 4\, \frac{\| E_{21} \|_2}{\operatorname{sep}(T_{11}, T_{22})} .$$

Proof. Using the SVD of $P$, it can be shown that

$$\| P(I + P^H P)^{-1/2} \|_2 \;\leq\; \| P \|_2 . \qquad (7.2.5)$$

The corollary follows because the required distance is the norm of $Q_2^H \hat{Q}_1 =
P(I + P^H P)^{-1/2}$. $\Box$
Thus, the reciprocal of sep(T11 , T22 ) can be thought of as a condition num-
ber that measures the sensitivity of ran( Q1) as an invariant subspace.

Example 7.2.5 Suppose


3
Tu = [ 0
10 ]
1 '
0
T22 = [ 0
-20 ]
3.01 ' and T,, = [ 1
-1
-1 ]
1

and that
A =T = [ TOu T,, ]
T., .
Observe that AQ, = Q1Tu where Q, = [e, e2] E R'X2. A calculation shows that
sep(Tu, T22l ::e .0003. If
E = 10 _6 [ 1 1 )
21 1 1
and we examine the Schur decomposition of

A +E ~ [ f~~ ~~~ ],
then we find that Q 1 gets perturbed to

-.9999 -.0003]
q, = .0003 -.9999
[ -.0005 -.0026
.0000 .0003

Thus, we have dist(ran(QI),ra.n(QI)) ::e .0027"" w- 6/sep(Tu, T22).
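The quantity $\operatorname{sep}(T_{11},T_{22})$ can be computed directly as the smallest singular value of the Kronecker-product matrix that represents the map $X \rightarrow T_{11}X - XT_{22}$. The following MATLAB fragment is an added illustration (not part of the original presentation); the two matrices are arbitrary test data whose spectra nearly intersect, which is what drives sep toward zero.

    % Illustration (not from the text): sep(T11,T22) via the Kronecker
    % representation of X -> T11*X - X*T22.  T11, T22 are arbitrary test
    % matrices with nearly common eigenvalues (2 vs. 2.01).
    T11 = [4 1; 0 2];
    T22 = [2.01 5; 0 -1];
    p = size(T11,1);  q = size(T22,1);
    K = kron(eye(q), T11) - kron(T22.', eye(p));
    disp(min(svd(K)))                % sep(T11,T22); small because 2 is close to 2.01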

7.2.5 Eigenvector Sensitivity


If we set r = 1 in the preceding subsection, then the analysis addresses the
issue of eigenvector sensitivity.

Corollary 7.2.6 Suppose A, E E <C"xn and that Q = [ q1 Q 2 ) E <C"xn is


unitary with q1 E <C". Assume

n-1
1
[~ 1
n-1
1

(Thus, q1 is an eigenveCtor.) If a= amin(T22- >.I) > 0 and

then there exists p E <C"- 1 with

IIPib :::; 4~
a

such that {iJ = (qi +Q2p)j ..j1 + plfp is a unit 2-norm eigenvector for A+E.
Moreover,

Proof. The result follows from Theorem 7.2.4, Corollary 7.2.5 and the
observation that if Tn =>.,then sep(Tn, T22) = amin(T22- >.I). D

Note that CTmin(T22- >.I) roughly measures the separation of>. from the
eigenvalues of T22· We have to say "roughly" because

sep(>., T22) = CTmin(T22- >.I) ~ min /J.'- >.j


/'EA(T22)

and the upper bound can be a gross overestimate.


That the separation of the eigenvalues should have a bearing upon eigen-
vector sensitivity should come as no surprise. Indeed, if>. is a nondefective,
repeated eigenvalue, then there are an infinite number of possible eigen-
vector bases for the associated invariant subspace. The preceding analysis
merely indicates that this indeterminacy begins to be felt as the eigen-
values coalesce. In other words, the eigenvectors associated with nearby
eigenvalues are "wobbly."

Example 7.2.6 If

$$A = \begin{bmatrix} 1.01 & 0.01 \\ 0.00 & 0.99 \end{bmatrix},$$

then the eigenvalue $\lambda = .99$ has condition $1/s(.99) \approx 1.118$ and associated eigenvector
$x = [.4472, -.8944]^T$. On the other hand, the eigenvalue $\lambda = 1.00$ of the "nearby" matrix

$$A + E = \begin{bmatrix} 1.01 & 0.01 \\ 0.00 & 1.00 \end{bmatrix}$$

has an eigenvector $\hat{x} = [.7071, -.7071]^T$.

Problems

P7.2.1 Suppose QHAQ = diag(.X1) + N ia a Schur decomposition of A E <Cnxn and


define v(A) = II AHA- AAH IIF· The upper and lower bounds in

v(A)2 < II N 112


611 A II~ - F
<
- V~
---u-v (A)
are established by Henrici (1962) and Eberlein (1965), respectively. Verify th..,e r..,u]ts
for the case n = 2.
P7.2.2 Suppose A E <Cnxn and x- 1 AX = diag(.X1, ... , .Xn) with distinct .X;. Show
that if the columns of X have unit 2-norm, then Kp(X) 2 = n L~=l (1/s(.X;)) 2
P7.2.3 Suppose QH AQ = diag(.X;) + N is a Schur decomposition of A and that x- 1 AX
= diag (.X;). Show K2(X) 2 2: 1 + Cll N IIF/11 A llp) 2 . See Loizou (1969).
P7.2.4 If x- 1 AX = diag (.X;) and I.Xd 2: ... 2: !.Xnl. then

=~ii) :-:; I.X;I :5 "2(X)a;(A).


Prove this result for then= 2 case. See Ruhe (1975).

P7.2.5 Show that if A= [ ~ ~ ] and a f. b, then s(a) = s(b) = (1+ lc/(a-b)\2)-1/2.



P7.2.6 Suppose

A= [ ~ ;: ]
and that Art »(T22). Show that if a= sep(», T22l, then
I a
s (») = < --;=~=;==;;;=
JI +II (T22- »I) 'vII~ - Ja 2 +II vII~

P7 .2. 7 Show that the condition of a simple eigenvalue is preserved under unitary
similarity transformations.
P7.2.8 With the same hypothesis as in the Bauer-Pike theorem (Theorem 7.2.2), show

that min I» -1"1 S IIIX-'IIEIIXIII,·


.\E.\(A)

P7.2.9 Verify (7.2.5).


P7.2.10 Show that if B E cc~x~ and C E <C"x", then sep(B, C) is less than or equal
to I»- I" I for all » E »(B) and I" E »(C).

Notes and References for Sec. 7.2


Many of the results presented in this section may be found in Wilkinson (1965, chapter
2), Stewart and Sun (1990) as well as in

F.L. Bauer and C.T. Fike (1960). "Norms and Exclusion Theorems," Numer. Math. 2,
123-44.
A.S. Householder (1964). The Theory of Matrices in Numerical Analysis. Blaisdell,
New York.
The following papers are concerned with the effect of perturbations on the eigenvalues
of a general matrix:

A. Rube (1970). "Perturbation Bounds for Means of Eigenvalues and Invariant Sub-
spaces," BIT 10, 343-54.
A. Rube (1970). "Properties of a Matrix with a Very Ill-Conditioned Eigenproblem,"
Numer. Math. 15, 57-60.
J .H. Wilkinson ( 1972). "Note on Matrices with a Very Ill-Conditioned Eigenproblem,"
Numer. Math. 19, 176-78.
W. Kahan, B.N. Parlett, and E. Jiang (1982). "Residual Bounds on Approximate Eigen-
systems of Nonnorma.l Matrices," SIAM J. Numer. Anal. 19, 47(}-.484.
J.H. Wilkinson (1984). "On Neighboring Matrices with Quadratic Elementary Divisors,"
Numer. Math. 44, 1-21.
J.V. Burke and M.L. Overton (1992). "Stable Perturbations ofNonsymmetric Matrices,"
Lin.Alg. and Its Application 171, 249--273.
Wilkinson's work on nearest defective matrices is typical of a growing body of literature
that is concerned with "nearness" problems. See

N.J. Higham (1985). "Nearness Problems in Numerical Linear Algebra," PhD Thesis,
University of Manchester, England.
C. Van Loan (1985). "How Near is a Stable Matrix to an Unstable Matrix?," Contem-
porary Mathematics, Vol. 47, 465--477.
J.W. Demmel (1987). "On the Distance to the Nearest lll-Posed Problem," Numer.
Math. 51, 251-289.

J.W. Demmel (1987). "A Counterexample for two Conjectures About Stability," IEEE
Tro.m. Auto. Cont. A C-32, 340-342.
A. Ruhe (1987). "Closest Normal Matrix Found!," BIT 27, 585-598.
R. Byers (1988). "A Bisection Method for Measuring the Distance of a Stable ~latrix to
the Unstable Matrices," SIAM J. Sci. and Stat. Comp . .9, 875-881.
J.W. Demmel (1988). "The Probability that a Numerical Analysis Problem is Difficult,"
Math. Comp. 50, 449-480.
N.J. Higham (1989). "Matrix Nearness Problems and Applications," in Applicatiom of
Matnx Theory, M.J.C. Gover and S. Barnett (eds), Oxford University Press, Oxford
UK, 1-27.
Aspects of eigenvalue condition are discussed in

C. Van Loan (1987). "On Estimating the Condition of Eigenvalues and Eigenvectors,"
Lin. Alg. and Its Applic. 88/89, 715-732.
C. D. Meyer and G.W. Stewart (1988). "Derivatives and Perturbations of Eigenvectors,"
SIAM J. Nv.m. Anal. 25, 679-691.
G.W. Stewart and G. Zhang (1991). "Eigenvalues of Graded Matrices and the Condition
Numbers of Multiple Eigenvalues," Nv.mer. Math. 58, 703-712.
J.-G. Sun (1992). "On Condition !\umbers of a Kondefective Multiple Eigenvalue,"
Nv.mer. Math. 61, 265-276.
The relationship between the eigenvalue condition number, the departure from normal-
ity, and the condition of the eigenvector matrix is discussed in

P. Henrici (1962). "Bounds for Iterates, Inverses, Spectral Variation and Fields of Values
of Non-normal Matrices," Numer. Math. 4, 24-40.
P. Eberlein (1965). "On Measnres of Non-=-rormality for Matrices," A mer. Math. Soc.
Monthly 72, 99.'>-96.
R.A. Smith (1967). "The Condition Numbers of the Matrix Eigenvalue Problem," Nu-
mer. Math. 10 232-40.
G. Loizou (1969). "Non normality and Jordan Condition Numbers of Matrices," J_ ACM
16, 580-40.
A. van der Slnis (1975). "Perturbations of Eigenvalues of Non-normal ).,\atrices," Comm.
ACM 18, 30-36.
The paper by Henrici also contains a result similar to Theorem 7.2.3. Penetrating treat-
ments of invariant subspace perturbation include

T. Kato (1966). Perturbation Theory for Linear Opemtors, Springer-Verlag, New York.
C. Davis and W.M. Kahan (1970). "The Rotation of Eigenvectors by a Perturbation,
Ill," SIAM J. Num. Anal. 7, 1-46.
G.W. Stewart (1971). "Error Bounds for Approximate Invariant Subspaces of Closed
Linear Operators," SIAM. J. Num. Anal. 8, 796-808.
G.W. Stewart (1973). "Error and Perturbation Bounds for Subspaces Associated with
Certain Eigenvalue Problems," SIAM Review 15, 727-64.
Detailed analyses of the function sep(.,.) and the map X~ AX+ X AT are given in

J. Varah (1979). "On the Separation of Two Matrices," SIAM J_ Num. Anal. 16,
216-22.
R. Byers and S.G. Nash (1987). "On the Singular Vectors of the Lyapunov Operator,''
SIAM J. A!g. and DiBc. Methods 8, 59-66.
Gershgorin's Theorem can be used to derive a comprehensive perturbation theory. See
Wilkinson (1965, chapter 2). The theorem itself can be generalized and extended in
various ways; see

R.S. Varga (1970). "Minimal Gershgorin Sets for Partitioned Matrices," SIAM J. Num.
Anal. 7, 493-507.
R.J. Johnston (1971). "Gershgorin Theore= for Partitioned Matrices," Lin. Alg. and
Its Applic. 4, 205-20.

7.3 Power Iterations


Suppose that we are given $A \in \mathbb{C}^{n \times n}$ and a unitary $U_0 \in \mathbb{C}^{n \times n}$. Assume
that Householder orthogonalization (Algorithm 5.2.1) can be extended to
complex matrices (it can) and consider the following iteration:

    T_0 = U_0^H A U_0
    for k = 1, 2, ...
        T_{k-1} = U_k R_k   (QR factorization)            (7.3.1)
        T_k = R_k U_k
    end

Since $T_k = R_k U_k = U_k^H (U_k R_k) U_k = U_k^H T_{k-1} U_k$ it follows by induction
that

$$T_k = (U_0 U_1 \cdots U_k)^H A \,(U_0 U_1 \cdots U_k). \qquad (7.3.2)$$
Thus, each Tk is unitarily similar to A. Not so obvious, and what is the
central theme of this section, is that the Tk almost always converge to
upper triangular form. That is, (7.3.2) almost always "converges" to a
Schur decomposition of A.
Iteration (7.3.1) is called the QR iteration, and it forms the backbone
of the most effective algorithm for computing the Schur decomposition.
In order to motivate the method and to derive its convergence properties,
two other eigenvalue iterations that are important in their own right are
presented first: the power method and the method of orthogonal iteration.
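Before turning to those iterations, the algebra of (7.3.1) can be made concrete with a few lines of MATLAB. The fragment below is an added illustration (not part of the original presentation); the test matrix is arbitrary and symmetric, so its eigenvalues are real and the iterates can approach triangular (here, diagonal) form.

    % Illustration (not from the text): the basic QR iteration (7.3.1).
    A = [2 1 0; 1 3 1; 0 1 4];      % arbitrary symmetric test matrix
    T = A;
    for k = 1:50
        [U,R] = qr(T);              % T_{k-1} = U_k R_k
        T = R*U;                    % T_k = R_k U_k, unitarily similar to A
    end
    disp(T)                         % nearly diagonal; compare diag(T) with eig(A)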

7.3.1 The Power Method


Suppose $A \in \mathbb{C}^{n \times n}$ is diagonalizable, that $X^{-1}AX = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$ with
$X = [x_1, \ldots, x_n]$, and $|\lambda_1| > |\lambda_2| \geq \cdots \geq |\lambda_n|$. Given a unit 2-norm
$q^{(0)} \in \mathbb{C}^n$, the power method produces a sequence of vectors $q^{(k)}$ as follows:

    for k = 1, 2, ...
        z^{(k)} = A q^{(k-1)}
        q^{(k)} = z^{(k)} / || z^{(k)} ||_2                 (7.3.3)
        lambda^{(k)} = [q^{(k)}]^H A q^{(k)}
    end

There is nothing special about doing a 2-norm normalization except that


it imparts a greater unity on the overall discussion in this section.
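A short MATLAB realization of (7.3.3) is given below as an added illustration (it is not part of the original presentation); the matrix and starting vector are those of Example 7.3.1.

    % Illustration (not from the text): the power method (7.3.3) with q0 = e1.
    A = [-261 209 -49; -530 422 -98; -800 631 -144];
    q = [1; 0; 0];
    for k = 1:9
        z = A*q;
        q = z/norm(z);
        lam = q'*A*q;
        fprintf('%2d   %8.4f\n', k, lam)    % reproduces the table in Example 7.3.1
    end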

Let us examine the convergence properties of the power iteration. If

$$q^{(0)} = a_1 x_1 + a_2 x_2 + \cdots + a_n x_n$$

and $a_1 \neq 0$, then it follows that

$$A^k q^{(0)} = a_1 \lambda_1^k \left( x_1 + \sum_{j=2}^{n} \frac{a_j}{a_1} \left( \frac{\lambda_j}{\lambda_1} \right)^k x_j \right).$$

Since $q^{(k)} \in \operatorname{span}\{A^k q^{(0)}\}$ we conclude that

$$\operatorname{dist}(\operatorname{span}\{q^{(k)}\}, \operatorname{span}\{x_1\}) = O\left( \left| \frac{\lambda_2}{\lambda_1} \right|^k \right)$$

and moreover,

$$|\lambda_1 - \lambda^{(k)}| = O\left( \left| \frac{\lambda_2}{\lambda_1} \right|^k \right).$$

If $|\lambda_1| > |\lambda_2| \geq \cdots \geq |\lambda_n|$ then we say that $\lambda_1$ is a dominant eigenvalue.
Thus, the power method converges if $\lambda_1$ is dominant and if $q^{(0)}$ has a
component in the direction of the corresponding dominant eigenvector $x_1$.
The behavior of the iteration without these assumptions is discussed in
Wilkinson (1965, p. 570) and Parlett and Poole (1973).

Example 7.3.1 If

$$A = \begin{bmatrix} -261 & 209 & -49 \\ -530 & 422 & -98 \\ -800 & 631 & -144 \end{bmatrix}$$

then $\lambda(A) = \{10, 4, 3\}$. Applying (7.3.3) with $q^{(0)} = [1, 0, 0]^T$ we find

    k     lambda^{(k)}
1 13.0606
2 10.7191
3 10.2073
4 10.0633
5 10.0198
6 10.0063
7 10.0020
8 10.0007
9 10.0002

In practice, the usefulness of the power method depends upon the ratio
$|\lambda_2| / |\lambda_1|$, since it dictates the rate of convergence. The danger that $q^{(0)}$ is
deficient in $x_1$ is a less worrisome matter because rounding errors sustained
during the iteration typically ensure that the subsequent $q^{(k)}$ have a component
in this direction. Moreover, it is typically the case in applications
where the dominant eigenvalue and eigenvector are desired that an a priori
estimate of $x_1$ is known. Normally, by setting $q^{(0)}$ to be this estimate, the
dangers of a small $a_1$ are minimized.
Note that the only thing required to implement the power method is a
subroutine capable of computing matrix-vector products of the form $Aq$.
It is not necessary to store $A$ in an $n$-by-$n$ array. For this reason, the
algorithm can be of interest when $A$ is large and sparse and when there is
a sufficient gap between $|\lambda_1|$ and $|\lambda_2|$.
Estimates for the error $|\lambda^{(k)} - \lambda_1|$ can be obtained by applying the
perturbation theory developed in the previous section. Define the vector
$r^{(k)} = Aq^{(k)} - \lambda^{(k)}q^{(k)}$ and observe that $(A + E^{(k)})q^{(k)} = \lambda^{(k)}q^{(k)}$ where
$E^{(k)} = -r^{(k)}[q^{(k)}]^H$. Thus $\lambda^{(k)}$ is an eigenvalue of $A + E^{(k)}$ and

$$|\lambda^{(k)} - \lambda_1| \;\approx\; \frac{\| E^{(k)} \|_2}{s(\lambda_1)} = \frac{\| r^{(k)} \|_2}{s(\lambda_1)} .$$

If we use the power method to generate approximate right and left dominant
eigenvectors, then it is possible to obtain an estimate of $s(\lambda_1)$. In particular,
if $w^{(k)}$ is a unit 2-norm vector in the direction of $(A^H)^k w^{(0)}$, then we can
use the approximation $s(\lambda_1) \approx |w^{(k)H} q^{(k)}|$.

7.3.2 Orthogonal Iteration


A straightforward generalization of the power method can be used to compute
higher-dimensional invariant subspaces. Let $r$ be a chosen integer
satisfying $1 \leq r \leq n$. Given an $n$-by-$r$ matrix $Q_0$ with orthonormal
columns, the method of orthogonal iteration generates a sequence of matrices
$\{Q_k\} \subseteq \mathbb{C}^{n \times r}$ as follows:

    for k = 1, 2, ...
        Z_k = A Q_{k-1}                                   (7.3.4)
        Q_k R_k = Z_k   (QR factorization)
    end

Note that if $r = 1$, then this is just the power method. Moreover, the
sequence $\{Q_k e_1\}$ is precisely the sequence of vectors produced by the power
iteration with starting vector $q^{(0)} = Q_0 e_1$.
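Again as an added illustration (not part of the original presentation), here is (7.3.4) in MATLAB with $r = 2$ for the matrix of Example 7.3.1; the span of $Q$ approaches the dominant invariant subspace associated with $\{10, 4\}$.

    % Illustration (not from the text): orthogonal iteration (7.3.4), r = 2.
    A = [-261 209 -49; -530 422 -98; -800 631 -144];
    Q = eye(3,2);                  % Q0 = [e1 e2]
    for k = 1:10
        [Q,R] = qr(A*Q, 0);        % economy-size QR of Z_k = A*Q_{k-1}
    end
    disp(eig(Q'*A*Q))              % approximately {10, 4}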
In order to analyze the behavior of this iteration, suppose that

$$Q^H A Q = T = \operatorname{diag}(\lambda_i) + N \qquad (7.3.5)$$

is a Schur decomposition of $A \in \mathbb{C}^{n \times n}$. Assume that $1 \leq r < n$ and partition
$Q$, $T$, and $N$ as follows:

$$Q = [\, Q_\alpha \;\; Q_\beta \,], \qquad T = \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{bmatrix} \begin{matrix} r \\ n-r \end{matrix}, \qquad N = \begin{bmatrix} N_{11} & N_{12} \\ 0 & N_{22} \end{bmatrix} \begin{matrix} r \\ n-r \end{matrix}, \qquad (7.3.6)$$

where $Q_\alpha$ has $r$ columns. If $|\lambda_r| > |\lambda_{r+1}|$, then the subspace $D_r(A) = \operatorname{ran}(Q_\alpha)$ is said to be a dominant
invariant subspace. It is the unique invariant subspace associated
with the eigenvalues $\lambda_1, \ldots, \lambda_r$. The following theorem shows that with reasonable
assumptions, the subspaces $\operatorname{ran}(Q_k)$ generated by (7.3.4) converge
to $D_r(A)$ at a rate proportional to $|\lambda_{r+1}/\lambda_r|^k$.
Theorem 7.3.1 Let the Schur decomposition of $A \in \mathbb{C}^{n \times n}$ be given by
(7.3.5) and (7.3.6) with $n \geq 2$. Assume that $|\lambda_r| > |\lambda_{r+1}|$ and that $\theta \geq 0$
satisfies

$$(1 + \theta)\,|\lambda_r| > \| N \|_F .$$

If $Q_0 \in \mathbb{C}^{n \times r}$ has orthonormal columns and

$$d = \operatorname{dist}(D_r(A^H), \operatorname{ran}(Q_0)) < 1,$$

then the matrices $Q_k$ generated by (7.3.4) satisfy

$$\operatorname{dist}(D_r(A), \operatorname{ran}(Q_k)) \;\leq\; (1 + \theta)^{n-2}\,
\frac{1 + \dfrac{\| T_{12} \|_F}{\operatorname{sep}(T_{11}, T_{22})}}{\sqrt{1 - d^2}}
\left( \frac{|\lambda_{r+1}| + \| N \|_F/(1+\theta)}{|\lambda_r| - \| N \|_F/(1+\theta)} \right)^k .$$

Proof. The proof is given in an appendix at the end of this section. $\Box$

The condition $d < 1$ in Theorem 7.3.1 ensures that the initial $Q$ matrix is
not deficient in certain eigendirections:

$$d < 1 \;\Longleftrightarrow\; D_r(A^H)^\perp \cap \operatorname{ran}(Q_0) = \{0\}.$$

The theorem essentially says that if this condition holds and if $\theta$ is chosen
large enough, then

$$\operatorname{dist}(D_r(A), \operatorname{ran}(Q_k)) \;\leq\; c \left| \frac{\lambda_{r+1}}{\lambda_r} \right|^k$$

where $c$ depends on $\operatorname{sep}(T_{11}, T_{22})$ and $A$'s departure from normality. Needless
to say, convergence can be very slow if the gap between $|\lambda_r|$ and $|\lambda_{r+1}|$
is not sufficiently wide.

Example 7.3.2 If (7.3.4) is applied to the matrix $A$ in Example 7.3.1 with $Q_0 = [e_1, e_2]$,
we find:

    k    dist(D_2(A), ran(Q_k))
    1    .0052
    2    .0047
    3    .0039
    4    .0030
    5    .0023
    6    .0017
    7    .0013

The error is tending to zero with rate $(\lambda_3/\lambda_2)^k = (3/4)^k$.


It is possible to accelerate the convergence in orthogonal iteration using
a technique described in Stewart (1976). In the accelerated scheme, the
approximate eigenvalue $\lambda_i^{(k)}$ satisfies

$$|\lambda_i^{(k)} - \lambda_i| \;\approx\; \left| \frac{\lambda_{r+1}}{\lambda_i} \right|^k , \qquad i = 1{:}r.$$

(Without the acceleration, the right-hand side is $|\lambda_{i+1}/\lambda_i|^k$.) Stewart's algorithm
involves computing the Schur decomposition of the matrices $Q_k^H A Q_k$
every so often. The method can be very useful in situations where $A$ is
large and sparse and a few of its largest eigenvalues are required.

7.3.3 The QR Iteration


We now "derive" the QR iteration (7.3.1) and examine its convergence.
Suppose $r = n$ in (7.3.4) and the eigenvalues of $A$ satisfy

$$|\lambda_1| > |\lambda_2| > \cdots > |\lambda_n| .$$

Partition the matrix $Q$ in (7.3.5) and $Q_k$ in (7.3.4) as follows:

$$Q = [\, q_1, \ldots, q_n \,], \qquad Q_k = [\, q_1^{(k)}, \ldots, q_n^{(k)} \,].$$

If

$$\operatorname{dist}(D_i(A^H), \operatorname{span}\{q_1^{(0)}, \ldots, q_i^{(0)}\}) < 1, \qquad i = 1{:}n, \qquad (7.3.7)$$

then it follows from Theorem 7.3.1 that

$$\operatorname{dist}(\operatorname{span}\{q_1^{(k)}, \ldots, q_i^{(k)}\}, \operatorname{span}\{q_1, \ldots, q_i\}) \;\rightarrow\; 0$$

for $i = 1{:}n$. This implies that the matrices $T_k$ defined by

$$T_k = Q_k^H A Q_k$$

are converging to upper triangular form. Thus, it can be said that the
method of orthogonal iteration computes a Schur decomposition provided
the original iterate $Q_0 \in \mathbb{C}^{n \times n}$ is not deficient in the sense of (7.3.7).
The QR iteration arises naturally by considering how to compute the
matrix $T_k$ directly from its predecessor $T_{k-1}$. On the one hand, we have
from (7.3.4) and the definition of $T_{k-1}$ that

$$T_{k-1} = Q_{k-1}^H A Q_{k-1} = Q_{k-1}^H (A Q_{k-1}) = (Q_{k-1}^H Q_k) R_k .$$

On the other hand,

$$T_k = Q_k^H A Q_k = (Q_k^H A Q_{k-1})(Q_{k-1}^H Q_k) = R_k (Q_{k-1}^H Q_k).$$


7.3. POWER ITERATIONS 335

Thus, $T_k$ is determined by computing the QR factorization of $T_{k-1}$ and
then multiplying the factors together in reverse order. This is precisely
what is done in (7.3.1).

Example 7.3.3 If the iteration:

    for k = 1, 2, ...
        A = QR
        A = RQ
    end

is applied to the matrix of Example 7.3.1, then the strictly lower triangular elements
diminish as follows:

    k     O(|a_21|)   O(|a_31|)   O(|a_32|)
    1     10^{-1}     10^{-1}     10^{-2}
    2     10^{-2}     10^{-2}     10^{-3}
    3     10^{-2}     10^{-3}     10^{-3}
    4     10^{-3}     10^{-3}     10^{-3}
    5     10^{-3}     10^{-4}     10^{-3}
    6     10^{-4}     10^{-5}     10^{-3}
    7     10^{-4}     10^{-5}     10^{-3}
    8     10^{-5}     10^{-6}     10^{-4}
    9     10^{-5}     10^{-7}     10^{-4}
    10    10^{-6}     10^{-8}     10^{-4}
Note that a single QR iteration is an $O(n^3)$ calculation. Moreover, since
convergence is only linear (when it exists), it is clear that the method is a
prohibitively expensive way to compute Schur decompositions. Fortunately
these practical difficulties can be overcome, as we show in §7.4 and §7.5.

7.3.4 LR Iterations
We conclude with some remarks about power iterations that rely on the LU
factorization rather than the QR factorization. Let $G_0 \in \mathbb{C}^{n \times r}$ have rank $r$.
Corresponding to (7.3.4) we have the following iteration:

    for k = 1, 2, ...
        Z_k = A G_{k-1}                                   (7.3.8)
        Z_k = G_k R_k   (LU factorization)
    end

Suppose $r = n$ and that we define the matrices $T_k$ by

$$T_k = G_k^{-1} A G_k . \qquad (7.3.9)$$

It can be shown that if we set $L_0 = G_0$, then the $T_k$ can be generated as
follows:

    T_0 = L_0^{-1} A L_0
    for k = 1, 2, ...                                     (7.3.10)
        T_{k-1} = L_k R_k   (LU factorization)
        T_k = R_k L_k
    end

Iterations (7.3.8) and (7.3.10) are known as treppeniteration and the LR
iteration, respectively. Under reasonable assumptions, the $T_k$ converge to
upper triangular form. To successfully implement either method, it is necessary
to pivot. See Wilkinson (1965, p. 602).

Appendix
In order to establish Theorem 7.3.1 we need the following lemma which is
concerned with bounding the powers of a matrix and its inverse.

Lemma 7.3.2 Let $Q^H A Q = T = D + N$ be a Schur decomposition of
$A \in \mathbb{C}^{n \times n}$ where $D$ is diagonal and $N$ strictly upper triangular. Let $\lambda$ and
$\mu$ denote the largest and smallest eigenvalues of $A$ in absolute value. If
$\theta \geq 0$ then for all $k \geq 0$ we have

$$\| A^k \|_2 \;\leq\; (1 + \theta)^{n-1} \left( |\lambda| + \frac{\| N \|_F}{1 + \theta} \right)^k . \qquad (7.3.11)$$

If $A$ is nonsingular and $\theta \geq 0$ satisfies $(1 + \theta)|\mu| > \| N \|_F$, then for all
$k \geq 0$ we also have

$$\| A^{-k} \|_2 \;\leq\; \frac{(1 + \theta)^{n-1}}{\left( |\mu| - \| N \|_F/(1 + \theta) \right)^k} . \qquad (7.3.12)$$

Proof. For $\theta \geq 0$, define the diagonal matrix $\Delta$ by

$$\Delta = \operatorname{diag}(1, (1 + \theta), (1 + \theta)^2, \ldots, (1 + \theta)^{n-1})$$

and note that $\kappa_2(\Delta) = (1 + \theta)^{n-1}$. Since $N$ is strictly upper triangular, it
is easy to verify that $\| \Delta N \Delta^{-1} \|_F \leq \| N \|_F / (1 + \theta)$. Thus,

$$\| A^k \|_2 = \| T^k \|_2 = \| \Delta^{-1}(D + \Delta N \Delta^{-1})^k \Delta \|_2
\;\leq\; \kappa_2(\Delta) \left( \| D \|_2 + \| \Delta N \Delta^{-1} \|_2 \right)^k
\;\leq\; (1 + \theta)^{n-1} \left( |\lambda| + \frac{\| N \|_F}{1 + \theta} \right)^k .$$

On the other hand, if $A$ is nonsingular and $(1 + \theta)|\mu| > \| N \|_F$, then
$\| \Delta D^{-1} N \Delta^{-1} \|_2 < 1$ and using Lemma 2.3.3 we obtain

$$\| A^{-k} \|_2 = \| T^{-k} \|_2 = \| \Delta^{-1} [ (I + \Delta D^{-1} N \Delta^{-1})^{-1} D^{-1} ]^k \Delta \|_2
\;\leq\; \kappa_2(\Delta) \left( \frac{\| D^{-1} \|_2}{1 - \| \Delta D^{-1} N \Delta^{-1} \|_2} \right)^k
\;\leq\; \frac{(1 + \theta)^{n-1}}{\left( |\mu| - \| N \|_F/(1 + \theta) \right)^k} . \;\; \Box$$
Proof of Theorem 7.3.1
It is easy to show by induction that AkQ 0 = Qk(Rk · · · R!). By substi-
tuting (7 .3.5) and 7 .3.6) into this equality we obtain

Tk [ ~ ] = [ ~ ] (Rk .. · R1 )
where vk = Q{!Qk and wk = QUQk. Using Lemma 7.1.5 we know that a
matrix X E <r:•" (n-r) exists such that

Ir X Tu T12 Ir X Tu 0
-I [ ] [ ] [ ]
[ 0 In-r ] 0 T22 0 In-r = 0 T22

and so

[ Ttl J 2
][ Vo W~Wo ] = [ Vk ~Wk ] (Rk ... R1 ) .

Below we establish that the matrix V0 - XW0 is nonsingular and this enables
us to obtain the following expression:

Wk = T:f2 Wo(Vo- XWo)- Tiik [ lr, -X] [


1
~k ] ·

Recalling the definition of distance between subspa.ces from §2.6.3,

Since
II [I. , -X] liz S 1 + II X IIF
we have
dist(D.(A), ran( Qk)) s (7.3.13)
II T~2 llzll (Vo- XWo)- ll2ll Tii.k ll2 (1 +II X II F)
1
.
To prove the theorem we must look at each of the four factors in the upper
bound.
Since sep(T11 , T22 ) is the smallest singular value of the linear transfor-
mation c/l(X)= TuX- XTzz it readily follows from c/l(X) = -T12 that

(7.3.14)

Using Lemma 7 .3.2 , it can be shown that

II Tf2ll2 ~ (1 +8)"- r- 1 (l>•r+ll + lllt:I~F) k (7.3.15)

and
(7.3.16)

Finally, we turn our attention to the \1 (V0 - XW0 ) - 1


II factor. Note
that
Vo - XWo Q~Qo - XQf/Qo

= i lr , -X ] [ ~ ] Qo
[1 Q.Q, [ [ -~H Jr Qo

where

Z = (Q.,.Q.6) [ -~H] (lr + XXH)- lf 'l


= (Qa - QpXH)(lr + X XH )- 112.

The columns of this matrix are orthonormal. They are also a basis for
Dr(AH ) because
AH(QOr - Q/J XH) = (Qo - Q/JXH)T{{ .
This last fact follows from the equation AHQ = QTH .
From Theorem 2.6.1

d = dist(Dr(AH), range(Qo)) =VI- ur(ZHQ0 ) 2

and since d < 1 by hypothesis,


Ur(ZH Qo) > 0.
This shows that

(Vo - XWo) = (lr + XXH) 112 (ZHQ0 )


is nonsingular and thus,
II (Vo - XWo)- 1 112 ::; II(Ir + XXH)-l/2 112 II (ZHQo)- 1 11,
::; 1/v'I - tP. (7.3.17)

The theorem follows by substituting (7.3.14)-(7.3.17) into (7.3.13). D

Problems

PT.3.1 (a) Show that if X E «:"X" is nonsingula.r, then II A llx = II x-I AX II• defines
a matrix norm with the property that II AB llx :5 II A llx II B llx. (h) Let A E CC"x" and
set p =max lA; I. Show that for a.ny • > 0 there exists a nonsingular X E CC"x" such that
II A llx = II x-I AX ll2 :5 p + •: C?nclude that there is a constant M_ such that II Ak I!•
:5 M(p + •l' for all non-negative mtegers k. (Hmt: Set X = Q diag(1, a, ... , a"- )
where QH AQ = D + N is A's Schur decomposition.)
PT.3.2 Verify that (7.3.10) calculates the matrices T, defined hy (7.3.9).
PT.3.3 Suppose A E C[;"x" is nonsingular and that Qo E cr;nxp has orthonormal columns.
The following iteration is referred to as inverse ortlwgonal iterotion.

for k=1,2, ...


Solve AZ, = Q•-I for z, E cr;nxp
z, = Q,R, (QR factorization)
end

Explain why this iteration ca.n usually be used to compute the p smalle~t eigenvalue~
of A in absolute value. Note that to implement this iteration it is necessary to be able
to solve linear systems that involve A. When p = 1, the method is referred to as the
inverse power method.
PT.3.4 Assume A E R'x" has eigenvalues A~o ... , A, that satisfy
A= AI = A2 = Aa = .\.4 >lAs I ;: -: ··· ;: -: jA,j
where .1. is positive. Assume that A has two Jordan blocks of the form.

Discuss the convergence properties of the power method when applied to this matrix.
Discuss how the convergence might be accelerated.

Notes and References for Sec. 7.3


A detailed, practical discussion oft he power method is given in Wilkinson (1965, chapter
10). Methods are discussed for accelerating the basic iteration, for calculating nondomi-
na.nt eigenvalues, and for handling complex conjugate eigenvalue pairs. The connections
among the various power iterations are discussed in

B.N. Parlett and W.G. Poole (1973). "A Geometric Theory for the QR, LU, and Power
Iterations," SIAM J. Num. Anal. 10, 389-412.
The QR iteration was concurrently developed in

J.G.F. Francis (1961). "The QR Tra.nsformation: A Unitary Analogue to the LR Trans-


formation," Camp. J. 4, 265-71, 332-34.
V.N. Kublanovska.ya (1961). "On Some Algorithms for the Solution of the Complete
Eigenvalue Problem," USSR Camp. Math. Phys. 3, 637-57.
As can be deduced from the title of the first paper, the LR iteration predates the QR
iteration. The former very fundamental algorithm was proposed by

H. Rutishauser (1958). "Solution of Eigenvalue Problems with the LR Transformation,"


Nat. Bur. Stand. App. Math. Ser. ~9, 47-81.
B.N. Parlett (1995). "The New qd Algorithms," ACTA Numerica 5, 45!}-491.
Numerous papers on the convergence of the QR iteration have appeared. Several of these
ace

J.H. Wilkinson (1965). "Convergence of the LR, QR, and Related Algorithms," Comp.
J. 8, 77-84.
B.N. Parlett (1965). "Convergence of the Q-R Algorithm," Numer. Math. 7, 187-93.
(Correction in Numer. Math. 10, 163-{;4.)
B.N. Parlett (1966). "Singular and Invariant Matrices Under the QR Algorithm," Math.
Comp. 20, 611-15.
B.N. Parlett (1968). "Global Convergence of the Basic QR Algorithm on Hessenberg
Matrices," Math. Comp. 22, 803-17.
Wilkinson (AEP, chapter 9) also discusses the convergence theory for this important
algorithm.
Deeper insight into the convergence of the QR algorithm and its connection to other
important algorithms can be attained by reading

D.S. Watkins (1982). "Understanding the QR Algorithm," SIAM Review~. 427-440.


T. Nanda (1985). "Differential Equations and the QR Algorithm," SIAM J. Numer.
Anal. 2B, 31 o-321.
D.S. Watkins (1993). "Some Perspectives on the Eigenvalue Problem," SIAM Review
35, 430-471.
The following papers are concerned with various practical and theoretical aspects of si-
multaneous iteration:

H. Rutishauser (1970). "Simultaneous Iteration Method for Symmetric Matrices," Nu-


mer. Math. 16, 205-23. See also (Wilkinson and Reinsch(1971, pp. 284-302.
M. Clint and A. Jennings (1971). "A Simultaneous Iteration Method for the Unsym-
metric Eigenvalue Problem," J. Imt. Math. Applic. 8, 111-21.
A. Jennings and D.R.L. Orr (1971). "Application of the Simultaneous Iteration Method
to Undamped Vibration Problems," Inst. J. Numer. Math. Eng. 3, 13-24.
A. Jennings and W.J. Stewart (1975). "Simultaneous Iteration for the Partial Eigenso-
lution of Real Matrices," J. In•t. Math. Applic. 15, 351-{;2.
G.W. Stewart (1975). "Methoda of Simultaneous Iteration for Calculating Eigenvectors
of Matrices," in Topic• in Numerical Analy•iB II , ed. John J.H. Miller, Academic
Press, New York, pp. 185-96.
G.W. Stewart (1976). "Simultaneous Iteration for Computing Invariant Subspaces of
Non-Hermitian Matrices," Numer. Math. 25, 123-36.
See also chapter 10 of
A. Jennings (1977). Matrix Computation for Engineers and ScientiBts, John Wiley and
Sons, New York.
Simultaneous iteration and the Lanczos algorithm (cf. Chapter 9) are the principal meth-
oda for finding a few eigenvalues of a general sparse matrix.

7.4 The Hessenberg and Real Schur Forms


In this and the next section we show how to make the QR iteration (7.3.1)
a fast, effective method for computing Schur decompositions. Because the
majority of eigenvalue/invariant subspace problems involve real data, we
concentrate on developing the real analog of (7.3.1) which we write as fol-
lows:
    H_0 = U_0^T A U_0
    for k = 1, 2, ...
        H_{k-1} = U_k R_k   (QR factorization)            (7.4.1)
        H_k = R_k U_k
    end

Here, $A \in \mathbb{R}^{n \times n}$, each $U_k \in \mathbb{R}^{n \times n}$ is orthogonal, and each $R_k \in \mathbb{R}^{n \times n}$ is
upper triangular. A difficulty associated with this real iteration is that the
Hk can never converge to strict, "eigenvalue revealing," triangular form
in the event that A has complex eigenvalues. For this reason, we must
lower our expectations and be content with the calculation of an alternative
decomposition known as the real Schur decomposition.
In order to compute the real Schur decomposition efficiently we must
carefully choose the initial orthogonal similarity transformation Ua in (7.4.1).
In particular, if we choose U0 so that Ho is upper Hessenberg, then the
amount of work per iteration is reduced from O(n3 ) to O(n2 ). The initial
reduction to Hessenberg form (the U0 computation) is a very important
computation in its own right and can be realized by a sequence of House-
holder matrix operations.

7.4.1 The Real Schur Decomposition


A block upper triangular matrix with either 1-by-1 or 2-by-2 diagonal blocks
is upper quasi-triangular. The real Schur decomposition amounts to a real
reduction to upper quasi-triangular form.
Theorem 7.4.1 (Real Schur Decomposition) If $A \in \mathbb{R}^{n \times n}$, then there
exists an orthogonal $Q \in \mathbb{R}^{n \times n}$ such that

$$Q^T A Q = \begin{bmatrix} R_{11} & R_{12} & \cdots & R_{1m} \\ 0 & R_{22} & \cdots & R_{2m} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & R_{mm} \end{bmatrix} \qquad (7.4.2)$$

where each $R_{ii}$ is either a 1-by-1 matrix or a 2-by-2 matrix having complex
conjugate eigenvalues.

Proof. The complex eigenvalues of $A$ must come in conjugate pairs, since
the characteristic polynomial $\det(zI - A)$ has real coefficients. Let $k$ be
the number of complex conjugate pairs in $\lambda(A)$. We prove the theorem by
induction on $k$. Observe first that Lemma 7.1.2 and Theorem 7.1.3 have
obvious real analogs. Thus, the theorem holds if $k = 0$. Now suppose that
$k \geq 1$. If $\lambda = \gamma + i\mu \in \lambda(A)$ and $\mu \neq 0$, then there exist vectors $y$ and $z$ in
$\mathbb{R}^n$ ($z \neq 0$) such that $A(y + iz) = (\gamma + i\mu)(y + iz)$, i.e.,

$$A [\, y \;\; z \,] = [\, y \;\; z \,] \begin{bmatrix} \gamma & \mu \\ -\mu & \gamma \end{bmatrix} .$$

The assumption that $\mu \neq 0$ implies that $y$ and $z$ span a two-dimensional,
real invariant subspace for $A$. It then follows from Lemma 7.1.2 that an
orthogonal $U \in \mathbb{R}^{n \times n}$ exists such that

$$U^T A U = \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{bmatrix} \begin{matrix} 2 \\ n-2 \end{matrix}$$

where $\lambda(T_{11}) = \{\lambda, \bar{\lambda}\}$. By induction, there exists an orthogonal $\tilde{U}$ so
that $\tilde{U}^T T_{22} \tilde{U}$ has the required structure. The theorem follows by setting $Q = U
\operatorname{diag}(I_2, \tilde{U})$. $\Box$

The theorem shows that any real matrix is orthogonally similar to an upper
quasi-triangular matrix. It is clear that the real and imaginary part of the
complex eigenvalues can be easily obtained from the 2-by-2 diagonal blocks.

7.4.2 A Hessenberg QR Step


We now turn our attention to the speedy calculation of a single QR step
in (7.4.1). In this regard, the most glaring shortcoming associated with
(7.4.1) is that each step requires a full QR factorization costing $O(n^3)$ flops.
Fortunately, the amount of work per iteration can be reduced by an order of
magnitude if the orthogonal matrix $U_0$ is judiciously chosen. In particular,
if $U_0^T A U_0 = H_0 = (h_{ij})$ is upper Hessenberg ($h_{ij} = 0$, $i > j+1$), then each
subsequent $H_k$ requires only $O(n^2)$ flops to calculate. To see this we look at
the computations $H = QR$ and $H^+ = RQ$ when $H$ is upper Hessenberg. As
described in §5.2.4, we can upper triangularize $H$ with a sequence of $n-1$
Givens rotations: $Q^T H = G_{n-1}^T \cdots G_1^T H = R$. Here, $G_i = G(i, i+1, \theta_i)$.
For the $n = 4$ case the zero-nonzero pattern evolves as follows under the three
Givens premultiplications:

    [ x x x x ]        [ x x x x ]        [ x x x x ]        [ x x x x ]
    [ x x x x ]  G_1^T [ 0 x x x ]  G_2^T [ 0 x x x ]  G_3^T [ 0 x x x ]
    [ 0 x x x ]  --->  [ 0 x x x ]  --->  [ 0 0 x x ]  --->  [ 0 0 x x ]
    [ 0 0 x x ]        [ 0 0 x x ]        [ 0 0 x x ]        [ 0 0 0 x ]

See Algorithm 5.2.3.
The computation $RQ = R(G_1 \cdots G_{n-1})$ is equally easy to implement.
In the $n = 4$ case the three Givens post-multiplications fill in the subdiagonal
one entry at a time:

    [ x x x x ]        [ x x x x ]        [ x x x x ]        [ x x x x ]
    [ 0 x x x ]  G_1   [ x x x x ]  G_2   [ x x x x ]  G_3   [ x x x x ]
    [ 0 0 x x ]  --->  [ 0 0 x x ]  --->  [ 0 x x x ]  --->  [ 0 x x x ]
    [ 0 0 0 x ]        [ 0 0 0 x ]        [ 0 0 0 x ]        [ 0 0 x x ]
Overall we obtain the following algorithm:

Algorithm 7.4.1 If $H$ is an $n$-by-$n$ upper Hessenberg matrix, then this
algorithm overwrites $H$ with $H^+ = RQ$ where $H = QR$ is the QR factorization
of $H$.

    for k = 1:n-1
        [c(k), s(k)] = givens(H(k,k), H(k+1,k))
        H(k:k+1, k:n) = [ c(k)  s(k) ; -s(k)  c(k) ]^T  H(k:k+1, k:n)
    end
    for k = 1:n-1
        H(1:k+1, k:k+1) = H(1:k+1, k:k+1)  [ c(k)  s(k) ; -s(k)  c(k) ]
    end

Let $G_k = G(k, k+1, \theta_k)$ be the $k$th Givens rotation. It is easy to confirm
that the matrix $Q = G_1 \cdots G_{n-1}$ is upper Hessenberg. Thus, $RQ = H^+$ is
also upper Hessenberg. The algorithm requires about $6n^2$ flops and thus is
an order-of-magnitude quicker than a full matrix QR step (7.3.1).
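For readers who want something directly executable, here is a MATLAB version of the step above that we have added; it is not part of the original presentation, the Givens pair is computed inline instead of with the book's givens(·,·) helper, and the function name hess_qr_step (to be saved in a file of that name) is ours.

    function H = hess_qr_step(H)
    % Illustration (not from the text): one Hessenberg QR step.
    % Factor H = Q*R with n-1 Givens rotations, then overwrite H with R*Q.
    n = size(H,1);
    c = zeros(n-1,1);  s = zeros(n-1,1);
    for k = 1:n-1
        r = hypot(H(k,k), H(k+1,k));
        if r == 0
            c(k) = 1;  s(k) = 0;
        else
            c(k) = H(k,k)/r;  s(k) = H(k+1,k)/r;        % rotation zeroing H(k+1,k)
        end
        H(k:k+1,k:n) = [c(k) s(k); -s(k) c(k)]*H(k:k+1,k:n);
    end
    for k = 1:n-1                                        % form R*Q by applying the
        H(1:k+1,k:k+1) = H(1:k+1,k:k+1)*[c(k) -s(k); s(k) c(k)];   % rotations on the right
    end
    end

With H = hess(A), the call H1 = hess_qr_step(H) returns an upper Hessenberg matrix whose eigenvalues agree with those of A up to roundoff.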

Example 7.4.1 If Algorithm 7.4.1 is applied to:

[~ n·
I
H= 2
.01
then

and
G, [ .8
6
0
-.8
.6
0 n.
[ 4.7600
G2

-2.5442
=
[~
0
.9996
.0249

5.4653 ]
-.024~
.9996
] '

H+ .3200 .1856 -2.1796 .


.0000 .0263 1.0540

7.4.3 The Hessenberg Reduction


It remains for us to show how the Hessenberg decomposition

$$U_0^T A U_0 = H, \qquad U_0^T U_0 = I_n, \qquad (7.4.3)$$

can be computed. The transformation $U_0$ can be computed as a product
of Householder matrices $P_1, \ldots, P_{n-2}$. The role of $P_k$ is to zero the $k$th
column below the subdiagonal. In the $n = 6$ case, we have
X X X X X X X X X X X X
X

-
X X X X X X X X X X X
X X X X X X p, 0 X X X X X
A
X X X X X X 0 X X X X X
X X X X X X 0 X X X X X
X )( X X X X 0 X X )( X X

X X X X X X X X X X X X

-
X X X X X X X X X )( X X
0 X X )( )( X p, 0 X X X X X
~
0 0 X )( X X 0 0 X )( X X
0 0 X X X X 0 0 0 )( X X
0 0 X X X X 0 0 0 X X X

X X X X X X
X X X X X X
0 X X X X X
0 0 X X X X
0 0 0 X X X
0 0 0 0 X X

In general, after k - 1 steps we have computed k - 1 Householder matrices


Pt, ... , P,._l such that

Bu k- 1
B·n 1
[ 0
n-k
k-1
is upper Hessenberg through its first k - I columns. Suppose A: is an order
n- k Householder matrix such that f>,.B32 is a multiple of e~n-k) . If Pt =
diag(I~;,i\), then
7.4. THE HESSENBERG AND REAL SCHUR FORMS 345

is upper Hessenberg through its first k columns. Repeating this for k =


l:n- 2 we obtain

Algorithm 7.4.2 (Householder Reduction to Hessenberg Form)


Given A E m.nxn, the following algorithm overwrites A with H = U;f AU0
where H is upper Hessenberg and U0 is product of Householder matrices.
fork= l:n- 2
[v, ;1] = house(A(k + l:n, k))
A(k + l:n, k:n) =(I -j1vvT)A(k + l:n, k:n)
A(l:n, k + l:n) = A(l:n, k + l:n)(I -;1vvT)
end
This algorithm requires 10n3 /3 flops. If Uo is explicitly formed, an addi-
tional 4n 3 /3 flops are required. The kth Householder matrix can be repre-
sented in A(k + 2:n, k). See Martin and Wilkinson (196Bd) for a detailed
description.
The roundoff properties of this method for reducing A to Hessenberg
form are very desirable. Wilkinson (1965, p.351) states that the computed
Hessenberg matrix H satisfies H = QT(A + E)Q, where Q is orthogonal
and II E IIF ~ cn 2 ull A IIF with c a small constant.

u~ n
Example 7.4.2 If

A=
and Ua = [~ .~ -.6.~ ]
0 .8
then
1.00 8.60 -.20 ]
UJ' AUo =H = 5.00
[ 0.00
4.96 -.72
2.28 -3.96

7.4.4 Level-3 Aspects


The Hessenberg reduction (Algorithm 7.4.2) is rich in level-2 operations:
half ga.xpys and half outer product updates. We briefly discuss two methods
for introducing level-3 computations into the process.
The first approach involves a block reduction to block Hessenberg form
and is quite straightforward. Suppose (for clarity) that n = rN and write

A= An A12 ] r
[ A21 A22 n- r
r n-r

Suppose that we have computed the QR factorization A21 = Q1 R 1 and


that Q1 is in WY form. That is, we have WI. Y1 E m.<n-r)xr such that
346 CHAPTER 7. THE UNSYMMETRIC EIGENVALUE PROBLEM

Q1 = I - W 1l7. (See §5.2.2 for details.) If Q1 = diag(InQt) then

QT1 AQ, = [ Au _A12Q1_ ]


R1 Qf A22Q1 .

Notice that the updates of the ( 1,2) and (2,2) blocks are rich in level-3
operations given that Q 1 is in WY form. This fully illustrates the overall
process as Qf AQ1 is block upper Hessenberg through its first block column.
We next repeat the computations on the first r columns of QfA22Q1. After
N - 2 such steps we obtain

H uJ' AUo = 0

0 0
where each H;i is r-by-r and Uo = Q 1 · · ·QN- 2 with with each Q, in WY
form. The overall algorithm has a level-3 fraction of the form 1- 0(1/N).
Note that the subdiagonal blocks in H are upper triangular and so the
matrix has lower bandwidth p. It is possible to reduce H to actual Hessen-
berg form by using Givens rotations to zero all but the first subdiagonal.
Dongarra, HammarUng and Sorensen (1987) have shown how to proceed
directly to Hessenberg form using a mixture of gaxpy's and level-3 updates.
Their idea involves minimal updating after each Householder transforma-
tion is generated. For example, suppose the first Householder Pt has been
computed. To generate P2 we need just the second column of PtAPt , not
the full outer product update. To generate P 3 we need just the 3rd col-
umn of P2PtAP1P2, etc. In this way, the Householder matrices can be
determined using only gaxpy operations. No outer product updates are
involved. Once a suitable number of Householder matrices are known they
can be aggregated and applied in a level-3 fashion.

7.4.5 Important Hessenberg Matrix Properties


The Hessenberg decomposition is not unique. If Z is any n-by-n orthogonal
matrix and we apply Algorithm 7.4.2 to z:r
AZ, then QT AQ = H is upper
Hessenberg where Q = ZUo. However, Qe1 = Z(Uoe.) = Ze 1 suggesting
that H is unique once the first column of Q is specified. This is essentially
the case provided H has no zero subdiagonal entries. Hessenberg matrices
with this property are said to be unreduced. Here is a very important
theorem that clarifies the uniqueness of the Hessenberg reduction.

Theorem T.4.2 ( Implicit Q Theorem ) S1Jppose Q = I q1 , • •• , fin J. and


V = I v,, ... , tin J are orthogonal matrices with the property that both Q AQ
7 .4. THE HFSSENBERG AND REAL SCHUR FORMS 347

=Hand yT AV = G are upper Hessenberg where A E Rnxn. Let k denote


the smallest positive integer for which hk+ 1,k = 0, with the convention that
k = n if H is unreduced. lf Q1 = v 1, then q, = ±v; and lhi,i-1l = l9i,i-11
fori = 2:k. Moreover, if k < n, then 9k+i,k = 0.
Proof. Define the orthogonal matrix W = [ w1, ... , Wn ] = yT Q and
observe that GW = W H. By comparing column i - 1 in this equation for
i = 2:k we see that
i-1
hi,i-1W; = Gw;-1 - L hj,i-JWj.
i=l

Since w1 = e~o it follows that [ WJ, .• . , WJ:) is upper triangular and thus w;
= ±In(:, i) = ±e, for i = 2:k. Since w; = yT Qi and hi,i-1 = w[Gw,_ 1 it
follows that v; = ±q, and

lh•,i-11 = lq[ Aq•-1! = 1~r Av•-11 = l9•,i-!l


for i = 2:k. If k < n, then
9k+l,k = ef+IGe~; = ef+ 1 GWe~; = ef+ 1 WHe~;
k 1:
= ef+ll:il;~:We, = Lh;ker+1ei = 0.0
i=l i=l

The gist of the implicit Q theorem is that if QT AQ = H and zT AZ = G


are each unreduced upper Hessenberg matrices and Q and Z have the same
first column, then G and H are "essentially equal" in the sense that G =
n- 1 H D where D = diag(±1, ... , ±1).
Our next theorem involves a new type of matrix called a Krylov ma-
trix. If A E Rnxn and vERn, then the Krylov matrix K(A,v,j) E Rnxj
is defined by
K(A, v,j) = ( v, Av, ... , Ai- 1v ].
It turns out that there is a connection between the Hessenberg reduction
QT AQ =Hand the QR factorization of the Krylov matrix K(A, Q(:, 1), n).
Theorem 7.4.3 Suppose Q E Rnx n is an orthogonal matrix and A E Rnx n.
Then QT AQ = H is an unreduced upper Hessenberg matrix if and only if
QT K(A,Q(:, 1),n) = R is nonsingular and upper triangular.
Proof. Suppose Q E Rnxn is orthogonal and setH = QT AQ. Consider
the identity
QTK(A,Q(:,1),n) = (e1, He~o ... ,Hn- 1e1] = R.
If H is an unreduced upper Hessenberg matrix, then it is clear that R is
upper triangular with r;; = h2 1h32 · · · hi,i-1 for i = 2:n. Since ru = 1 it
follows that R is nollSingular.
348 CHAPTER 7. THE UNSYMMETR!C EIGENVALUE PROBLEM

To prove the converse, suppose R is upper triangular and nonsingular.


Since R(:, k + 1) = H R(:, k) it follows that H(:, k) E span{ e 1 , .•. , ek+l }.
This implies that His upper Hessenberg. Since Tnn = h21h32 · · · hn,n-1 f 0
it follows that H is also unreduced. D

Thus, there is more or less a correspondence between nonsingular Krylov


matrices and orthogonal similarity reductions to unreduced Hessenberg
form.
Our last result concerns eigenvalues of an unreduced upper Hessenberg
matrix.
Theorem 7.4.4 If>. is an eigenvalue of an unreduced upper Hessenberg
matrix H E R"x", then its geometric multiplicity is one.
Proof. For any >. E «:: we have rank(A - AI) ~ n - 1 because the first
n - 1 columns of H - AI are independent. 0

7.4.6 Companion Matrix Form


Just as the Schur decomposition has a nonunitary analog in the Jordan
decomposition, so does the Hessenberg decomposition have a nonunitary
analog in the companion matrix decomposition. Let x E R" and suppose
that the Krylov matrix K = K(A,x,n) is nonsingular. If c = c(O:n -1)
solves the linear system Kc = -Anx, then it follows that AK =KG where
C has the form:
0 0 0 -C()
1 0 0 -Ct

c = 0 1 0 -C2 (7.4.4)

0 0 1 -Cn-1

The matrix C is said to be a companion matrix. Since

det(zJ- C) = C() + C]Z + · · · + Cn-lZn-l + z"


it follows that if K is nonsingular, then the decomposition K- 1 AK = C
displays A's characteristic polynomial. This, coupled with the sparseness
of C, bas led to "companion matrix methods" in various application areas.
These techniques typically involve:
• Computing the Hessenberg decomposition UJ' AU0 =H.
• Hoping H is unreduced and setting Y = [ e 1 , H e1 , ... , H"- 1 et].
• Solving YC = HY for C.
7.4. THE HESSENBERG AND REAL SCHUR FORMS 349

Unfortunately, this calculation can be highly unstable. A is similar to an


unreduced Hessenberg matrix only if each eigenvalue has unit geometric
multiplicity. Matrices that have this property are called nonderogatory. It
follows that the matrix Y above can be very poorly conditioned if A is close
to a derogatory matrix.
A full discussion of the dangers associated with companion matrix com-
putation can be found in Wilkinson (1965, pp. 405 ff.).

7.4.7 Hessenberg Reduction Via Gauss Transforms


While we are on the subject of nonorthogonal reduction to Hessenberg
form, we should mention that Gauss transformations can be used in lieu
of Householder matrices in Algorithm 7.4.2. In particular, suppose permu-
tations III> ... , IIk-1 and Gauss transformations M I> ••• , Mk-I have been
determined such that

where

B
B13]
B23 k -1
1
B33 n-k
n-k
is upper Hessenberg through its first k - 1 columns. A permutation ih
of order n - k is then determined such that the first element of ihB32 is
maximal in absolute value. This makes it possible to determine a stable
Gauss transformation Mk = I- Zkef also of order n- k, such that all but
the first component of Mk(frkB32) is zero. Defining Ilk = diag(Ik, frk) and
Mk = diag(Ik,Mk), we see that

is upper Hessenberg through its first k columns. Note that M;; 1 = I+ zke[
and so some very simple rank-one updates are involved in the reduction.
A careful operation count reveals that the Gauss reduction to Hessen-
berg form requires only half the number of flops of the Householder method.
However, as in the case of Gaussian elimination with partial pivoting, there
is a {fairly remote) chance of 2" growth. See Businger (1969). Another dif-
ficulty associated with the Gauss approach is that the eigenvalue condition
350 CHAPTER 7. THE UNSYMMETRIC EIGENVALUE PROBLEM

numbers- the s(.>..)- 1 -are not preserved with nonorthogonal similarity


transformations and this complicates the error estimation process.

Problems

P7.4.1 Suppooe A E R"x" and z E R". Give a detailed algorithm for computing an
orthogonal Q such that QT AQ is upper H.....,nberg and QT z is a multiple of (Hint: e,.
Reduce z first and then apply Algorithm 7.4.2.)
P7.4.2 Specify a complete reduction to Hessenberg form using Gauss transformations
and verify that it only requires 5n 3 /3 flops.
P7.4.3 In some situations, it is necessary to solve the linear system (A+ zl):z: = b for
many different values of z E R and b E R". Show how this problem can be efficiently
and stably solved using the H"""""berg decomposition.
P7.4.4 Give a detailed algorithm for explicitly computing the matrix Uo in Algorithm
7.4.2. Design your algorithm so that H is overwritten by Uo.
P7.4.5 Suppose HE wxn is an unreduced upper Hessenberg matrix. Show that there
exists a diagonal matrix D such that each subdiagona.l element of n- 1 H D is equa.l to
one. What is 1<2(D)?
P7.4.6 Suppose W, Y E wxn and define the matrices C and B by

C = W + iY, B = [ ~ -: ]

Show that if>. E >.(C) is real, then >. E >.(B). Relate the corresponding eigenvectors.

P7 .4. 7 Suppoae A = [ ~ ~ ] is a real matrix having eigenvalue~ >. ± iJJ, where JJ is


nonzero. Give an a.lgorithm that stably determines c = cos(9) and s = sin(9) such that

[ -sc "]T[w
c 11
"'][
z -sc "]
c = ["
a {J]
>.
=
where a{J -JJ 2 •
P7.4.8 Suppose(>., z) is a known eigenvalu&eigenvector pair for the upper H....enberg
matrix H E R!'x". Give an algorithm for computing an orthogona.l matrix P such that

pTHP -- [ >.
0
wT
H,
]
where H1 E ft(n-l)x(n-l) is upper Hessenberg. Compute P as a product of Givens
rotations.
P7.4.9 Suppose HE wxn has lower bandwidth p. Show how to compute Q E R!'x",
a product of Givens rotations, such that QT HQ is upper H....enberg. How many flops
are required?
P7.4.10 Show that if C ill a companion matrix with distinct
then vcv- 1 = diag(.\1, ... '>.,.) where

V= [ : :~
>.~-1
>.n-1
2

.x~-1
l eigenvalue~ >.1, ... , >.,.,

Notes and References for Sec. 7.4


The real Schor decomposition was originally pre~CDted in
7.4. THE HESSENBERG AND REAL SCHUR FoRMS 351

F.D. Murnaghan and A. Wintner (1931). 'A Canonical Fbrm for Real Mabicea Under
Orthogonal Tra.nsforma~ions," Proc. Nat. Acad. Sci. 17, 417-20.
A thorough treatment of the reduction to Hessenberg form is given in Wilkinson (1965,
chapter 6), and Algol procedures for both the Householder and Gauss methods appear in

RS. Martin and J.H. Wilkinson (1968). "Similarity Reduction of a General Matrix to
Hessenberg Form," Numer. Math. 12, 349-£8. See also Wilkinson and Reinsch
(1971,pp.339-58).
Fortran versions of the Algol procedures in the last reference are in Eispack.
Givens rotations can also be used to compute the Hessenberg decomposition. See

W. Rath (1982). "Fast Givens Rotations for Orthogonal Similarity," Nu.mer. Math. 40,
47-56.
The high performance computation of the Hessenberg reduction is discussed in

J.J. Dongarra, L. Kaufman, and S. Hammarling (1986). "Squeezing the Most Out of
Eigenvalue Solvers on High Performance Computers," Lin. Alg. and If.8 Applic. 77,
113-136.
J.J. Dongarra, S. Hammarling, and D.C. Sorensen (1989). "Block Reduction of Matrices
to Condensed Forms for Eigenvalue Computations," JACM f!.7, 215--227.
M.W. Berry, J.J. Dongarra, andY. Kim (1995). "A Parallel Algorithm for the Reduction
of a Nonsymmetric Matrix to Block Upper Hessenberg Form," Parollel Computing
f!.l, 1189-1211.
The possibility of exponential growth in tile Gauss transformation approach was first
pointed out in

P. Businger (1969). "Reducing a Matrix to H""""nberg Fbrm," Math. Comp. 29, 819-21.
However, the algorithm should be regarded in the same light as Gaussian elimination
with partial pivoting-stable for all practical purposes. See Eispack, pp. 56-58.
Aspects of the He9Benberg decomposition for sparse matric... are discussed in

I.S. Duff and J.K. Reid (1975). "On the Reduction of Sparse Matric... to Condensed
Forms by Similarity Transformations," J. Inst. Math. Applic. 15, 217-24.
Once an eigenvalue of an unreduced upper Hessenberg matrix is known, it is possible to
zero the last subdiagonal entry using Givens similarity transformations. See

P.A. Businger (1971). "Numerically Stable Deflation of Hessenberg and Symmetric Tridi-
agonal Matric...,BIT 11, 262-70.
Some interesting mathematical properties of the Hesaenberg form may be found in

B.N. Parlett (1967). "Canonical Decomposition of Hessenberg Matrices," Math. Comp.


f!.l, 223-27.
Y. Ikebe (1979). "On Inverses of Hessenberg Matrices," Lin. Alg. and Its Applic. f!.4,
93-97.
Although the Hessenberg decomposition is largely appreciated BB a "front end" dBCOm-
position for the QR iteration, it is increasingly popular as a cheap alternative to the
more expensive Schur decomposition in certain problems. For a sampling of applications
where it bas proven to be very useful, consult

W. Enright (1979). "On the Efficient and Reliable Numerical Solution of Large Linear
Systems of O.D.E. 's," IEEE 7rans. Auto. Cont. AC-24, 905--8.
352 CHAPTER 7. THE UNSYMMETRIC EIGENVALUE PROBLEM

G.H. Golub, S. Nash and C. Van Loan (1979). "A Hessenberg-Schur Method for the
Problem AX +XB = C," IEEE Trons. Auto. Cont. AC-£4, 909-13.
A. Laub (1981). "Efficient Multivariable Frequency Response Computations," IEEE
Trnm. Auto. Cont. AC-£6, 407-8.
C.C. Paige (1981). "Properties of Numerical Algorithms Related to Computing Control-
lability," IEEE Trons. Auto. Cont. AC-26, 13~38.
G. Miminis and C.C. Paige (1982). "An Algorithm for Pole Assignment of Time Invariant
Linear Systems," International J. of Control 35, 341-354.
C. Van Loan (1982). "Using the Hessenberg Decomposition in Control Theory," in
Algorithms and Theory in Filtering and Control , D.C. Sorensen and R.J. Wets
(eds), Mathematical Programming Study No. 18, North Holland, Amsterdam, pp.
102-11.
The advisability of posing polynomial root problems as companion matrix eigenvalue
problem is discussed in

K.-C. Toh and L.N. Trefethen (1994). "Pseudozeros of Polynomials and Pseudospectra
of Companion Matrices," Numer. Math. 68, 403-425.
A. Edelman and H. Murakami (1995). "Polynomial Roots from Companion Matrix
Eigenvalues," Math. Comp. 64, 763-776.

7.5 The Practical QR Algorithm


We return to the Hessenberg QR iteration which we write as follows:
H = U[ AUo (Hessenberg Reduction)
fork=1,2, ...
H = UR (QR factorization) (7.5.1)
H=RU
end
Our aim in this section is to describe how the H's converge to upper quasi-
triangular form and to show how the convergence rate can be accelerated
by incorporating shifts.

7.5.1 Deflation
Without loss of generality we may assume that each Hessenberg matrix H
in (7.5.1) is unreduced. If not, then at some stage we have

H= Hn H12 ] p
[ 0 H22 n- p
P n-p
where 1 :0::: p < n and the problem decouples into two smaller problems
involving H11 and H22· The term deflation is also used in this context,
usually when p = n - 1 or n - 2.
In practice, decoupling occurs whenever a subdiagonal entry in H is
suitably small. For example, in Eispack if
(7.5.2)
7 .5. T HE PRACTICAL QR ALGORITHM 353

for a small constant c, then hs>+ 1,p is "declared" to be zero. This is justified
since rounding errors of order ull H 11 are already present throughout the
matrix.

7.5.2 The Shifted QR Iteration


Let JJ. E IR and consider the iteration:
H = UJ' AUo (Hessenberg Reduction)
fork = 1,2, ...
Determine a scalar Jl.·
H -JJI = UR (QR factorization) (7.5.3)
H = RU + JJI
end
The scalar 1J is referred to as a shift. Each matrix H generated in (7.5.3)
is similar to A, since RU + JJI = ur (U R + JJI )U = uT HU. If we order
the eigenvalues >., of A so that

and J.l is fixed from iteration to iteration, then the theory of §7.3 says that
the pth subdiagonal entry in H converges to zero with rate

>.p+l - IJ lk
I >.p - JJ
Of course, if >.,. = >.p+ 1 , then there is no convergence at all. But if, for
example, 1J is much closer to >.n than to the other eigenvalues, then the
zeroing of the (n , n - 1) entry is rapid. In the extreme case we have the
following:
Theorem 7.5.1 Let JJ. be an eigenvalue of an n-by-n unreduced Hessenberg
matrix H . If fl = RU + JJl, where H -JJI =URi:; the QR factorization
of H - JJ.l, then hn,n-1 = 0 and hnn = JJ.
Proof. Since H is an unreduced Hessenberg matrix the first n - 1 columns
of H - JJl are independent, regardless of Jl.· T hus, if U R = (H- JJl) is the
QR factorization then r ,t # 0 fori = l:n - 1. But if H - J.!l is singular then
ru · · · r,, = 0. Thus, r,, = 0 and fl(n , :) = [0, ... , 0, JJ I· [J

The theorem says that if we shift by an exact eigenvalue, then in exact


arithmetic deflation occurs in one step.

Example 1.5.1 If

H = [ ~ -~1 =~5 ].
354 CHAPTER 7. THE UNSYMMETRIC EIGENVALUE PROBLEM

then 6 E >.(H). If U R = H- 61 is the QR factorization, then B = RU + 61 is given by


8.5384 -3.7313 -1.0090 ]
B = 0.6343 5.4615 1.3867 .
[ 0.0000 0.0000 6.0000

7.5.3 The Single Shift Strategy


Now let us consider varying J.l. from iteration to iteration incorporating new
information about .X( A) as the subdiagono.l entries converge to zero. A
good heuristic is to regard hnn as the best approximate eigenvalue along
the diagono.I. If we shift by this quantity during each iteration, we obtain
the single-shift QR itemtion:

fork=1,2, ...
= H(n,n)
J.l.
H-J.J.l=UR (QR Factorization) (7.5.4)
H=RU+J.J.l
end

If the (n, n - 1) entry converges to zero, it is likely to do so at a quadratic


rate. To see this, we borrow an example from Stewart (1973, p. 366).
Suppose H is an unreduced upper Hessenberg matrix of the form

X X X

.u
X X X
X X X
0 X X
0 0 €

and that we perform one step of the single-shift QR o.lgorithm: U R =


H- hnnl, fi = RU + hnnl. After n- 2 steps in the reduction of H- hnnl
to upper triangular form we obtain a matrix with the following structure:

~l
X X X
X X X
0 X X
0 0 a
0 0 €

It is not hard to show that the (n, n - 1) entry in fi = RU + hnnl is


given by -£2 b/(f2 + a 2 ). If we assume that f « a, then it is clear that
7.5. THE PRACTICAL QR ALGORITHM 355

the new (n,n -1} entry has order e2 , precisely what we would expect of o.
quadratically converging algorithm.

D.~1 n
Example 7.5.2 If

H =
and U R = H - 7 I is the QR factorization, then fl = RU + 7 I is given by
-0.5384 1.6908 0.8351 ]
fl "' 0.3076 6.5264 -6.6555 .
[ 0.0000 2 . 10- 5 7.0119

Near-perfect shifts a.s above almost always ensure a small hn,n-1· However, this is just
a. heuristic. There are examples in which hn,n-! is a relatively large matrix entry even
though u~;n(H - !'1) "' U.

7.5.4 The Double Shift Strategy


Unfortunately, difficulties with (7.5.4} can be expected if at some stage the
eigenvalues a 1 and a2 of

G = [ hmm hmn] m=n-1 (7.5.5}


hnm hnn
are complex for then hnn would tend to be a poor approximate eigenvalue.
A way around this difficulty is to perform two single-shift QR steps in
succession using a 1 and a 2 as shifts:

H -a,! = U,R,
Ht = R,u, +a1I (7.5.6}
H 1 -~I U2R2
H2 = R2U2 + a2l
These equations can be manipulated to show that

(7.5.7}

where M is defined by

(7.5.8}

Note that M is a real matrix even if G's eigenvalues are complex since

M = H 2 -sH+tl
where
s =a, + a2 = hmm + hnn =trace(G) E R
356 CHAPTER 7. THE UNSYMMETRIC EIGENVALUE PROBLEM

and
t= a1a2 = hmmhnn- hmnhnm = det(G) E lR.
Thus, (7.5.7) is the QR factorization of a real matrix and we may choose
U1 and U2 so that Z = U1 U2 is real orthogonal. It then follows that

is real.
Unfortunately, roundoff error almost always prevents an exact return to
the real field. A real H 2 could be guaranteed if we

• explicitly form the real matrix M = H2 - sH + tl,

• compute the real QR factorization M = ZR, and

• set H2 = zrHz.

But since the first of these steps requires O(n3 ) flops, this is not a practical
course of action.

7.5.5 The Double Implicit Shift Strategy


Fortunately, it turns out that we can implement the double shift step with
O(n2 ) flops by appealing to the Implicit Q Theorem of §7.4.5. In particular
we can effect the transition from H to H 2 in O(n 2 ) flops if we

• compute Me 1 , the first column of M;

• determine a Householder matrix Po such that P0 (Mei) is a multiple


of e1;

• compute Householder matrices Pt. ... , Pn_ 2 such that if Z 1 is the


product Z1 = PoP1 · · · Pn-2, then Z[ HZ1 is upper Hessenberg and
the first columns of Z and Z 1 are the same.

Under these circumstances, the Implicit Q theorem permits us to conclude


that if zT HZ and ZfHZ1 are both unreduced upper Hessenberg matrices,
then they are essentially equal. Note that if these Hessenberg matrices are
not unreduced, then we can effect a decoupling and proceed with smaller
unreduced subproblems.
Let us work out the details. Observe first that Po can be determined in
0(1) flops since M e 1 = [x, y, z, 0, ... , o]T where

x = h~ 1 + h12h21 - shu +t
y = h21(hu + h22- s)
z = h2lh32·
7.5. THE PRACTICAL QR ALGORITHM 357

Since a similarity transformation with Po only changes rows and coluiiUls


1, 2, and 3, we see that

X X X X X X
X X X X X X
X X X X X X
PoHPo =
X X X X X X
0 0 0 X X X
0 0 0 0 X X

Now the mission of the Householder matrices P1, ... , Pn-2 is to restore this
matrix to upper Hessenberg form. The calculation proceeds as follows:

X X X X X X X X X X X X
X X X X X X X X X X X X
X X X X X X 0 X X X X X P,
~ --+
X X X X X X 0 X X X X X
0 0 0 X X X 0 X X X X X
0 0 0 0 X X 0 0 0 0 X X

X X X X X X X X X X X X
X X X X X X X X X X X X
0 X X X X X Ps 0 X X X X X
--+ ~
0 0 X X X X 0 0 X X X X
0 0 X X X X 0 0 0 X X X
0 0 X X X X 0 0 0 X X X

X X X )(" X X
X X X X X X
0 X X X X X
0 0 X X X X
0 0 0 X X X
0 0 0 0 X X

Clearly, the general Pk has the form [\ = diag(Jk, Pk, In-k-3) where A is
a 3-by-3 Householder matrix. For example,

1 0 0 0 0 0
0 1 0 0 0 0
0 0 X X X 0
p2
0 0 X X X 0
0 0 X X X 0
0 0 0 0 0 1

Note that Pn-2 is an exception to this since Pn-2= diag(In- 2 , Pn-2)·


The applicability of Theorem 7.4.3 (the Implicit Q theorem) follows
from the observation that P,.e 1 = e1 for k = l:n - 2 and that Po and Z
358 CHAPTER 7. THE UNSYMMETRIC EIGENVALUE PROBLEM

have the same first column. Hence, Z1e1 = Ze 1, and we can assert that Z1
essentially equals Z provided that the upper Hessenberg matrices zT HZ
and Z[ HZ1 are each unreduced.
The implicit determination of H2 from H outlined above was first de-
scribed by Francis (1961) and we refer to it as a Francis QR step. The
complete Francis step is summarired as follows:

Algoritlun 7.5.1 (Francis QR Step) Given the unreduced upper Hes-


senberg matrix H E Rnxn whose trailing 2-by-2 principal submatrix has
eigenvalues a1 and a2, this algorithm overwrites H with zT HZ, where Z =
P 1 • • • Pn-2 is a product of Householder matrices and zT(H -a1I)(H -azl)
is upper triangular.
m=n-1
{Compute first column of (H- a 1 I)(H- a2I).}
s = H(m,m) + H(n,n)
t = H(m, m)H(n, n)- H(m, n)H(n, m)
x = H(1, 1)H(1, 1) + H(1, 2)H(2, 1)- sH(1, 1) + t
y = H(2, 1)(H(1, 1) + H(2, 2)- s)
z = H(2, 1)H(3, 2)
fork= O:n-3
[v, {Jj = house([x y z]T)
q = rnax{1,k}.
H(k + 1:k + 3, q:n) =(I- {JvvT)H(k + 1:k + 3, q:n)
r = min{k + 4,n}
H(1:r, k + 1:k + 3) = H(1:r, k + 1:k + 3)(I- {JvvT)
x=H(k+2,k+1)
y=H(k+3,k+1)
ifk<n-3
z = H(k +4,k+ 1)
end
end
[v, {3] =house([ x y JT)
H(n -1:n, n- 2:n) =(I- {JvvT)H(n- 1:n, n- 2:n)
H(1:n, n- 1:n) = H(1:n, n- 1:n)(I- (3vvT)
This algorithm requires 10n2 flops. If Z is accumulated into a given or-
thogonal matrix, an additional 10n2 flops are necessary.

7.5.6 The Overall Process


Reducing A to Hessenberg form using Algorithm 7.4.2 and then iterating
with Algorithm 7.5.1 to produce the real Schur form is the standard means
by which the dense unsymrnetric eigenproblem is solved. During the iter-
ation it is necessary to monitor the subdiagonal elements in H in order to
7.5. THE PRACTICAL QR ALGORITHM 359

spot any possible decoupling. How this is done is illustrated in the following
algorithm:

Algorithm 7.5.2 (QR Algorithm) Given A E lRnxn and a tolerance


tol greater than the unit roundoff, this algorithm computes the real Schur
canonical form QT AQ = T. A is overwritten with the Hessenberg decompo-
sition. If Q and Tare desired, then Tis stored in H. If only the eigenvalues
are desired, then diagonal blocks in T are stored in the corresponding po-
sitions in H.

Use Algorithm 7.4.2 to compute the Hessenberg reduction


H = U[ AUo where Uo=P1 · · · Pn-2·
If Q is desired form Q = P1 · · · Pn-2· See§5.1.6.
until q = n
Set to zero all subdiagonal elements that satisfy:
lhi,i-1l :<::: tol(jh;;l + ih;-!,i-&
Firld the largest non-negative q and the smallest
non-negative p such that

p
H n-p-q
q
p n-p-q q

where Haa is upper quasi-triangular and H22 is


unreduced. (Note: either p or q may be zero.)
if q < n
Perform a Francis QR step on H22: H2 2 = zT H22Z
if Q is desired
Q = Qdiag(Ip,Z,Iq)
H12 = H12Z
H23 = ZTH23
end
end
end
Upper triangularize all 2-by-2 diagonal blocks in H that have
real eigenvalues and accumulate the transformations
if necessary.

This algorithm requires 25n3 flops if Q and T are computed. If only the
eigenvalues are desired, then 10n3 flops are necessary. These flops counts
are very approximate and are based on the empirical observation that on
average only two Francis iterations are required before the lower 1-by-1 or
360 CHAPTER 7. THE UNSYMMETRIC EIGENVALUE PROBLEM

2-by-2 decouples.

Example 7.5.3 If Algorithm 7.5.2 is applied to

u ll
3 4 5
4 5 6
A= 3 6 7
0 2 8
0 0 1 10
then the subdiagona.l entries converge as follows

Iteration O(Jh211l O(JII,2Jl 0(1~1) O(JhMIJ


1 100 100 10o 10o
2 1o0 100 100 100
3 100 100 10-1 100
4 1o0 1o0 10-• w-3
5 100 100 w-6 10-s
6 10-1 100 10-13 10-13
7 10-1 100 10-2a w-n
8 10-' 100 converg. converg.
9 10-a 100
10 10-a 100
11 10-16 100
12 10 -32 100
13 converg. converg.

The roundoff properties of the QR algorithm are what one would expect
of any orthogonal matrix technique. The computed real Schur form T is
orthogonally similar to a matrix near to A, i.e.,
QT(A+E)Q = T
where QTQ =I and II E ll2 r::= ull A ll2· The computed Q is almost orthog-
onal in the sense that QTQ =I+ F where II F ll2 r::= u.
The order of the eigenvalues along T is somewhat arbitrary. But as we
discuss in §7.6, any ordering can be achieved by using a simple procedure
for swapping two adjacent diagonal entries.

7.5.7 Balancing
Finally, we mention that if the elements of A have widely varying magni-
tudes, then A should be balanced before applying the QR algorithm. This
is an O(n 2 ) calculation in which a diagonal matrix D is computed so that
if

v-'AD ~ [o,, ... ,o,.[ ~ [!]


then II r; lloo r::= II C; lloo fori= l:n. The diagonal matrix Dis chosen to have
the form D = diag(/3i', ... , 13;") where 13 is the floating point base. Note
7.5. THE PRACTICAL QR ALGORITHM 361

that D- 1AD can be calculated without roundoff. When A is balanced, the


computed eigenvalues are often more accurate. See Parlett and Reinsch
(1969).

Problems

P7.5.1 Show that if H = QT HQ is obtained by performing a single-shift QR step with


H = [ ~ ~ ] , then )ii21l ::; )y 2 :r)/[(w- z)2 + y 2 ].
P7 .5.2 Give a formula for the 2-by-2 diagonal matrix D that minimizes II v- 1 AD IIF
where A = [ ~ ~ ].
P7.5.3 Explain how the single-shift QR step H- JJ.I = UR, ii = RU + JJ.l can be
carried out implicitly. That is, show how the transition from ii to H can be carried out
without subtracting the shift IJ. from the diagonal of H.
P7 .5.4 Suppose H is upper Hessenberg and that we compute the factorization PH =
LU via Gaussian elimination with partial pivoting. (See Algorithm 4.3.4.) Show that
H1 = U(PTL) is upper Hessenberg and similar to H. (This is the basis of the modified
LR algorithm.)
P7.5.5 Show that if H = Ho is given and we generate the matrices Hk via Hk - IJ.ki
= UkRk, Hk+l = RkUk + JJ.ki, then

(U1 · · · U;)(R; · · · R,) = (H -JJ.ll) · · · (H -JJ.;l).

Notes and References for Sec. 7.5


The development of the practical QR algorithm began with the important paper

H. Rutishauser (1958). "Solution of Eigenvalue Problems with the LR Transformation,"


Nat. Bur. Stand. App. Math. Ser. 49, 47-81.
The algorithm described here was then "orthogonalized" in

J.G.F. Francis {1961). "The QR Transformation: A Unitary Analogue to the LR Trans-


formation, Parte I and II" Comp. J. 4, 265-72, 332-45.
Descriptions of the practical QR algorithm may be found in Wilkinson {1965) and Stew-
art {1973), and Watkins {1991). See also

D. Watkins and L. Elsner (1991). "Chasing Algorithms for the Eigenvalue Problem,"
SIAM J. Matrix Anal. Appl. 11!, 374-384.
D.S. Watkins and L. Elsner (1991). "Convergence of Algorithms of Decomposition Type
for the Eigenvalue Problem," Lin.Aig. and Ito Application 143, 19-47.
J. Erxiong {1992). "A Note on the Double-Shift QL Algorithm," Lin.Alg. and Its
Application 171, 121-132.
Algol procedures for LR and QR methods are given in

R.S. Martin and J.H. Wilkinson (1968). "The Modified LR Algorithm for Complex Hes-
senberg Matrices," Numer. Math. 11!, 369-76. See also Wilkinson and Reinsch(1971,
pp. 396-403).
362 CHAPTER 7. THE UNSYMMETRIC EIGENVALUE PROBLEM

R.S. Martin, G. Peters, and J.H. Wilkinson (1970). "The QR Algorithm for Real Hes-
senberg Matrices," Numer. Math. 1.4, 21!}--31. See also Wilkinson and Reinsch(1971,
pp. 35!}--71).
Aspects of the balancing problem are discw;sed in

E. E. Osborne (1960). "On Preconditioning of Matrices," JACM 7, 338-45.


B.N. Parlett and C. Reinsch (1969). "Balancing a Matrix for Calculation of Eigen-
values and Eigenvectors," Numer. Math.. 13, 292-304. See also Wilkinson and
Reinsch(1971, pp. 315-26).
High performance eigenvalue solver papers include

Z. Bai and J.W. Demmel (1989). "On a Block Implementation of Hessenberg Multishift
QR Iteration," Int'l J. of High Speed Comput. 1, 97-112.
G. Shroff (1991). "A Parallel Algorithm for the Eigenvalues and Eigenvectors of a
General Complex Matrix," Numer. Math. 58, 77!}--806.
R.A. Van De Geijn (1993). "Deferred Shifting Schemes for Parallel QR Methods," SIAM
J. Matrix Anal. Appl. 14, 18o-194.
A.A. Dubrulle and G.H. Golub (1994). "A Multishift QR Iteration Without Computa-
tion of the Shifts," Numerical Algorithms 7, 173-181.

7.6 Invariant Subspace Computations


Several important invariant subspace problems can be solved once the real
Schur decomposition QT AQ = T has been computed. In this section we
discuss how to

• compute the eigenvectors associated with some subset of >.(A),

• compute an orthonormal basis for a given invariant subspace,

• block-diagonalize A using well-conditioned similarity transformations,

• compute a basis of eigenvectors regardless of their condition, and

• compute an approximate Jordan canonical form of A.

Eigenvector /invariant subspace computation for sparse matrices is discussed


elsewhere. See §7.3 as well as portions of Chapters 8 and 9.

7.6.1 Selected Eigenvectors via Inverse Iteration


Let q(O) E <C" be a given unit 2-norm vector and assume that A - 11-I E Rnxn
is nonsingular. The following is referred to as inverse iterotion:

fork= 1,2, ...


Solve (A- J.ll)z(k) = q<k-1)
q(k) = z(k) /II z(k) !12 (7.6.1)
.>,(k) = q(k)T Aq(k)
end
7.6. INVARIANT SUBSPACE COMPUTATIONS 363

Inverse iteration is just the power method applied to (A- 11/)- 1 .


To analyze the behavior of (7.6.1), assume that A has a basis of eigen-
vectors {xi, ... , Xn} and that Ax; = >.;x; fori= 1:n. If
n
q(o) = L (3;x;
i=l

then q(k) is a unit vector in the direction of

(A - 111) -k q(0) ~ (3;


= ~ (),. - 11
)k X; .
i=l '

Clearly, if 11 is much closer to an eigenvalue Aj than to the other eigenvalues,


then q(k) is rich in the direction of xi provided /3j ol 0.
A sample stopping criterion for (7.6.1) might be to quit as soon as the
residual

satisfies
II T(k) lloo :::; cull A lloo (7.6.2)

where c is a constant of order unity. Since

with Ek = -r<klq(k)T, it follows that (7.6.2) forces 11 and q(k) to be an


exact eigenpair for a nearby matrix.
Inverse iteration can be used in conjunction with the QR algorithm as
follows:

• Compute the Hessenberg decomposition Ujf AUo = H.

• Apply the double implicit shift Francis iteration to H without accu-


mulating transformations.

• For each computed eigenvalue .>. whose corresponding eigenvector x


is sought, apply (7.6.1) with A= Hand 11 =.>.to produce a vector z
such that Hz::::: 11z.

• Set x = Uoz.

Inverse iteration with H is very economical because ( 1) we do not have to


accumulate transformations during the double Francis iteration; (2) we can
factor matrices of the form H- >.I in O(n 2 ) flops, and (3) only one iteration
is typically required to produce an adequate approximate eigenvector.
364 CHAPTER 7. THE UNSYMMETRIC EIGENVALUE PROBLEM

This last point is perhaps the most interesting aspect of inverse iteration
and requires some justification since >. can be comparatively inaccurate if
it is ill-conditioned. Assume for simplicity that >. is real and let

n
H->.1 L a;u;v'[ = UEVT
i=l

be the SVD of H - >.I. From what we said about the roundoff properties
of the QR algorithm in §7.5.6, there exists a matrix E E JR"x" such that
H + E- >.I is singular and II E ll2 ~ ull H ll2· It follows that a., ~ ua1 and
II (H- 5.J)v, ll2 "" ual> i.e., v, is a good approximate eigenvector. Clearly
if the starting vector q<0 > has the expansion

q(O) L" -y;u;


i=l

then

z(ll = " T
L....!.v;
a·1
i=l

is "rich" in the direction v,. Note that if s(>.) ~ lu;:v, I is small, then
z(l) is rather deficient in the direction u.,.
This explains (heuristically)
why another step of inverse iteration is not likely to produce an improved
eigenvector approximate, especially if>. is ill-conditioned. For more details,
see Peters and Wilkinson (1979).

Example 7.6.1 The matrix

1
A = [ w-w
has eigenvalm., .\1 = .99999 and .\2 = 1.00001 and corresponding eigenvectors Xi
[1, -lo- 0 f
and :r2 [1, w- 5= f.
The condition of both eigenvalues is of order 100.
Th.e approximate eigenvalue 1J. = 1 is a.n exact eigenvalue of A+ E where

Thus, the quality of p. is typical of the quality of an eigenvalue produced by the QR


algorithm when executed in 10-digit floating point.
If (7.6.1) is applied with starting vector q( 0) = [0, If, then q(l)= [l,O]T a.nd
II Aq(l) -IJ.Q(I) ll2 = 10-IO. However, one more step produces q< 2l = [0, 1]T for which
II Aq(>) - IJ.Q( 2 ) ll2 = 1. This example is discUBSed in Peters a.od Wilkinson {1979).
7.6. INVARIANT SUBSPACE COMPUTATIONS 365

7.6.2 Ordering Eigenvalues in the Real Schur Form


Recall that the real Schur decomposition provides information about in-
variant su bspaces. If

T Tu T12 ] P
[ 0 T22 q
p q

and >.(Tu) n >.(T22) = 0, then the first p columns of Q span the unique
invariant subspace associated with >.(T11 ). (See §7.1.4.) Unfortunately, the
Francis iteration supplies us with a real Schur decomposition Q~AQF = TF
in which the eigenvalues appear somewhat randomly along the diagonal of
TF. This poses a problem if we want an orthonormal basis for an invariant
subspace whose associated eigenvalues are not at the top of TF's diago-
nal. Clearly, we need a method for computing an orthogonal matrix QD
such that Q'J;TFQD is upper quasi-triangular with appropriate eigenvalue
ordering.
A look at the 2-by-2 case suggests how this can be accomplished. Sup-
pose

F F -_ T F -_ [ >.,0 h2
QTAQ ). ]
2
and that we wish to reverse the order of the eigenvalues. Note that TFx =
>.2x where
X = [ )..2 t~\1 ].

Let Qv be a Givens rotation such that the second component of Q'J;x is


zero. If Q = QFQD then

(QT AQ)e, = QbTF(Qvei) = >.2Qb(Qvei) = >.2e1


and so QT AQ must have the form

By systematically interchanging adjacent pairs of eigenvalues using this


technique, we can move any subset of >.(A) to the top ofT's diagonal as-
suming that no 2-by-2 bumps are encountered along the way.

Algorithm 7.6.1 Given an orthogonal matrix Q E IR"x", an upper tri-


angular matrix T = QTAQ, and a subset 6. = {>. 1 , .•. ,>.p} of >.(A), the
following algorithm computes a.n orthogonal matrix Qv such that Q'J;TQv
= S is upper triangular and {s II, ... , Spp} = 6.. The matrices Q and T are
overwritten by QQ D and S respectively.
366 CHAPTER 7. THE UNSYMMETRIC EIGENVALUE PROBLEM

while {t 11 , ... , tpp} i'!:;.


fork= 1:n -1
if tkk if_ !:;. and tk+I,k+! E !:;.

[ c, s] = givens(T(k, k + 1), T(k + 1, k + 1)- T(k, k))


T
8
T(k:k + 1, k:n) = [ c ] T(k:k + 1, k:n)
-s c

T(1:k+1,k:k+1)=T(1:k+1,k:k+1) [ -~ :]

8
Q(1:n,, k:k + 1) = Q(1:n, k:k + 1) [ c ]
-s c
end
end
end

This algorithm requires k(12n) flops, where k is the total number of required
swaps. The integer k is never greater than (n- p)p.
The swapping gets a little more complicated when T has 2-by-2 blocks
along its diagonal. See Ruhe (1 970) and Stewart (1 976) for details. Of
course, these interchanging techniques can be used to sort the eigenvalues,
say from maximum to minimum modulus.
Computing invariant subspaces by manipulating the real Schur decom-
position is extremely stable. If Q =[ Qb ... , Qn] denotes the computed or-
thogonal matrix Q, then I QTQ- I lb ;::; u and there exists a matrix E
satisfying II E 112 ;::; ull A ll2 such that (A+ E)q; E span{qi, ... , Qp} for
i = 1:p.

7.6.3 Block Diagonalization


Let

T
[
T~1 ~~~
0 0
~~: ~~
Tqq
l nq
(7.6.3)

n1 n2 nq

be a partitioning of some real Schur canonical form QT AQ = T E Rnxn


such that .:\(T11 ), •.. , .:\(Tqq) are disjoint. By Theorem 7.1.6 there exists a
matrix Y such that y- 1TY = diag(T11 , •.• ,Tqq). A practical procedure
for determining Y is now given together with an analysis of Y's sensitivity
as a function of the above partitioning.
Partition In = [ E 1 , ... , Eq] conformably with T and define the matrix
7.6. INVARIANT SUBSPACE COMPUTATIONS 367

Y;j E !Rn x n as follows:

Y;i = In + E;Z;iEJ. i < j, Z;j E !Rn' Xn;

In other words, Y;j looks just like the identity except that Z;j occupies the
(i,j) block position. It follows that if Y;j 1TY; 1 = T = (T; 1 ) then T and T
are identical except that

T;1 = T;;Z;1 - Z;1Tii + T;1


f';k = T;k - Z;iTik (k = j + 1:q)
Tki = Tk;Z;1 + Tki (k = 1:i- 1)
Thus, T;1 can be zeroed provided we have an algorithm for solving the
Sylvester equation
FZ-ZG = C (7.6.4)
where FE JR!'XP and G E wxr are given upper quasi-triangular matrices
and C E IR!'xr.
Bartels and Stewart (1972) have devised a method for doing this. Let C
= [ Ct, ... , Cr] and Z = [ Z1> ... , Zr] be column partitionings. If 9k+1,k = 0,
then by comparing columns in (7.6.4) we find

k
Fzk - L9ikZi = Ck.
i=l

Thus, once we know z 1 , •.• , Zk-! then we can solve the quasi-triangular
system
k-!
(F - 9kkl) Zk = Ck +L 9ikZi
i=l

for zk. If 9k+t,k =F 0, then zk and zk+l can be simultaneously found by


solving the 2p-by-2p system

(7.6.5)

where m = k + 1. By reordering the equations according to the permutation


(1,p+ 1, 2,p+2, ... ,p, 2p), a banded system is obtained that can be solved
in O(p 2 ) flops. The details may be found in Bartels and Stewart (1972).
Here is the overall process for the case when F and G are each triangular.

Algorithm 7.6.2 (Bartels-Stewart Algorithm) Given C E IR!'xr and


upper triangular matrices FE JR!'xp and G E JR'"xr that satisfy >.(F) n
>.(G) = 0, the following algorithm overwrites C with the solution to the
equation FZ- ZG =C.
368 CHAPTER 7. THE U NSYMMETRlC EIGENVALUE PROBLEM

fork=l:r
C(I:p,k) = C(l:p,k) + C(l:p, l :k - l)G(l:k - l , k)
Solve (F - G(k,k)I)z = C(l:p,k) for z.
C(L:p,k) = z
end

T his algorithm requires pr(p + r) flops.


By zeroing the super diagonal blocks in Tin the appropriate order, the
entire matrix can be reduced to block diagonal form.

Algorithm 7.6.3 Given an orthogonal matrix Q E lR.nxn, an upper quasi-


triangular matrix T = QT AQ, and the partitioning (7.6.3), the following
algorithm overwrites Q with QY where y-try = diag(T11 , •. . , T44 ).

for j = 2:q
fori = l:j -1
Solve TiiZ- ZT11 = -1iJ for Z using Algorithm 7.6.2.
fork =j + l:q
T,k = Tu,- ZTJk
end
fork= l:q
QkJ = Qk;Z + Qki
end
end
end

The number of Sops required by this algorithm is a complicated function


of the block sizes in (7.6.3).
The choice of the real Schur form T and its partitioning in (7.6.3) de-
termines the sensitivity of the Sylvester equations that must be solved in
Algorithm 7.6.3. This in turn affects the condition of the matrix Y and
the overall usefulness of the block diagonalization. The reason for these
dependencies is that the relative error of the computed solution Z to
T;;Z - ZT;; = - T,; (7.6.6)

satisfies
II z -ZIIF ~
liT lip
U --':=...;;.;.:,~
II Z liP sep(T;;, Tii)
For details, see Golub, Nash, and Van Loan (1979). Since

sep(1j;, Tii) min min I.\ - PI


X;o!O .AE.A(TH)
~JE.A(T;;)
7.6. INVARIANT SUBSPACE COMPUTATIONS 369

there can be a substantial loss of accuracy whenever the subsets .>..(T;;) are
insufficiently separated. Moreover, if Z satisfies (7.6.6) then

II z IIF :::; II T;j IIF


sep(T;;, Tii)
Thus, large-norm solutions can be expected if sep(T;;, Tii) is small. This
tends to make the matrix Y in Algorithm 7 .6.3 ill-conditioned since it is
the product of the matrices

Y;i=[~~]·
Note: KF(l'ii) = 2n +II Z II~·
Confronted with these difficulties, Bavely and Stewart ( 1979) develop
an algorithm for block diagonalizing that dynamically determines the eigen-
value ordering and partitioning in (7.6.3) so that all the Z matrices in Al-
gorithm 7.6.3 are bounded in norm by some user-supplied tolerance. They
find that the condition of Y can be controlled by controlling the condition
of the Y;j.

7.6.4 Eigenvector Bases


If the blocks in the partitioning (7.6.3) are all 1-by-1, then Algorithm 7.6.3
produces a basis of eigenvectors. As with the method of inverse iteration,
the computed eigenvalue-eigenvector pairs are exact for some "nearby" ma-
trix. A widely followed rule of thumb for deciding upon a suitable eigen-
vector method is to use inverse iteration whenever fewer than 25% of the
eigenvectors are desired.
We point out, however, that the real Schur form can be used to deter-
mine selected eigenvectors. Suppose

QTAQ = T~u u
)..
k-1
1
[ 0 n-k
k-1 1

is upper quasi-triangular and that .>.. if_ .>..(Tu) U .>..(Taa). It follows that if we
solve the linear systems (T11 - .>..I)w = -u and (Taa - .>..Il z = -v then

are the associated right and left eigenvectors, respectively. Note that the
condition of.>.. is prescribed by 1/s(.>..) = .,/(1 + wTw)(1 + zTz).
370 CHAPTER 7. THE UNSYMMETRIC EIGENVALUE PROBLEM

7.6.5 Ascertaining Jordan Block Structures

Suppose that we have cpmputed the real Schur decomposition A= QTQT,


identified clusters of "equal" eigenvalues, and calculated the corresponding
block diagonalization T = Y diag(T11 , .•. , Tqq)Y- 1 . As we have seen, this
can be a formidable task. However, even greater numerical problems con-
front us if we attempt to ascertain the Jordan block structure of each T;;. A
brief examination of these difficulties will serve to highlight the limitations
of the Jordan decomposition.
Assume for clarity that >.(T;;) is real. The reduction of T;; to Jordan
form begins by replacing it with a matrix of the form C = >.I + N, where
N is the strictly upper triangular portion of T;; and where >., say, is the
mean of its eigenvalues.
Recall that the dimension of a Jordan block J(>.) is the smallest non-
negative integer k for which [J(>.)- >.J]k = 0. Thus, if p; = dim[null(Ni)],
for i = O:n, then Pi - Pi-i equals the number of blocks in C's Jordan
form that have dimension i or greater. A concrete example helps to make
this assertion clear and to illustrate the role of the SVD in Jordan form
computations.
Assume that Cis 7-by-7. Suppose we compute the SVD Uf NV1 = E1
and "discover" that N has rank 3. If we order the singular values from
small to large then it follows that the matrix N 1 = Vt NV1 has the form

At this point, we know that the geometric multiplicity of >. is 4-i.e, C's
Jordan form has 4 blocks (p 1 -Po= 4-0 = 4).
Now suppose U[ LV2 = E 2 is the SVD of Land that we find that L has
unit rank. If we again order the singular values from small to large, then
£2 = ~T L V2 clearly has the following structure:

However >.(£2) = >.(L) = {0, 0, 0} and soc= 0. Thus, if


7.6. INVARIANT SUBSPACE COMPUTATIONS 371

then N2 = V[ N1 V2 has the following form:


0 0 0 0 X X X
0 0 0 0 X X X
0 0 0 0 X X X
N2 0 0 0 0 X X X
0 0 0 0 0 0 a
0 0 0 0 0 0 b
0 0 0 0 0 0 0

Besides allowing us to introduce more zeroes into the upper triangle, the
SVD of L also enables us to deduce the dimension of the null space of N 2 .
Since

Nf = [ ~ ~~ ] = [ ~ ~ ][ ~ ~ ]
and [ ~ ] has full column rank,

p2 = dim(null(N 2)) = dim(null(Nr)) = 4 + dim(null(L)) = p 1 + 2.


Hence, we can conclude at this stage that the Jordan form of C has at least
two blocks of dimension 2 or greater.
Finally, it is easy to see that Nf = 0, from which we conclude that there
is P3 - P2 = 7- 6 = 1 block of dimension 3 or larger. If we define V = Vj V2
then it follows that the decomposition

>. 0 0 0
b~do< ~de< 1 ~ i>uge<
X X X
0 >. 0 0 X X X
} 4 of
0 0 >. 0 X X X
vrcv 0 0 0 >. X X X
0 0 0 0 >. X a } 2 blocks of order 2 or larger
0 0 0 0 0 >. 0
0 0 0 0 0 0 >. } 1 block of order 3 or larger

"displays" C's Jordan block structure: 2 blocks of order 1, 1 block of order


2, and 1 block of order 3.
To compute the Jordan decomposition it is necessary to resort to non-
orthogonal transformations. We refer the reader to either Golub and Wilkin-
son (1976) or Kagstrom and Ruhe (1980a, 1980b) for how to proceed with
this phase of the reduction.
The above calculations with the SVD amply illustrate that difficult
rank decisions must be made at each stage and that the final computed
block structure depends critically on those decisions. Fortunately, the sta-
ble Schur decomposition can almost always be used in lieu of the Jordan
decomposition in practical applications.
372 CHAPTER 7. THE UNSYMMETRIC EIGENVALUE PROBLEM

Problems

PT.6.1 Give a complete algorithm for solving a reaJ, n-by-n, upper quasi-triangular
system Tx = b.
P7.6.2 Suppose u- 1 AU = diag(az, ... ,am) and v- 1 BV = diag(iJz, ... ,iJ,.). Show
that if </l(X) = AX + X B, then .X(</l) {a; + IJi : i = 1:m, j = 1:n }. What
are the corresponding eigenvectors? How can these decompositions be used to solve
AX+XB= C?
P7.6.3 Show that if Y = [ ~ ~ ] then 1t2(Y) = [2 + u 2 + v'4u2 + u4 ]/2 where
u = II z II2·
P7.6.4 Derive the system (7.6.5).
P7.6.5 Assume that T E Rnxn is block upper triangular and partitioned as follows:

T = [ Tz~ ~~~ ~~! ] T E E'xn


0 0 T33
Suppose that the diagonal block T22 is 2-by-2 with complex eigenvalues that are disjoint
from .X(Tu) and .X{Taa). Give an algorithm for computing the 2-dimensional real invari-
ant subspace associated with T22's eigenvalues.
P7 .6_6 Suppose H E E' x" is upper Hessenberg with a complex eigenvalue .X+ i ·I'· How
could inverse iteration be used to compute x,y E R" so that H(x+iy) = .X+i!')(x+iy)?
Hint: compare real and imaginary parts in this equation and obtain a 2n-by-2n real sys-
tem.
P7.6.6 {a) Prove that if I-'D E (C has nonzero real part, then the iteration

= ~ (~-~k + l-' k)
1
1-'k+l

converges to 1 if Re(i-<o) > 0 and to -1 if Re(i-<o) < 0. (b) Suppose A E (C"xn is


diagonali2able and that
A -X
- [ D+
0 o ] x-'
D-
where D+ E (CPXP and D- E (C(n-p)x(n-p) are diagonal with eigenvalues in the open
right half plane and open left half plane, respectively. Show that the iteration

1 ( Ak
Ak+l = 2 + Ak-1) Ao =A

converges to
1 0
sign( A) =X [ Q' -ln-p ] x-'.
(c) Suppose
M [~I M12 ]
M22
p
n-p
p n -p
with the property that .X(Mzz) is in the open right half plane and .X(M2 2 ) is in the open
left half plane. Show that

sign(M) = [
1
Q'
z
-ln-p

and that -Z/2 solves MuX- XM22 = -Ml2· Thus,

u-
- [ ~22
/p0 -Z/2 ] ~u- 1 MU= [ Mu
O
ln-p ] .
7 .6. INVARIANT SUBSPACE COMPUTATIONS 373

Notes and ReFerences for Sec. 7.6


Much of the material discUBSed in this section may be found in the survey paper

G.H. Golub and J.H. Wilkinson {1976). "Ill-Conditioned Eigensystems and the Compu-
tation of the Jordan Canonical Form," SIAM Review 18, 578--619.
Papers that specifically ana.lyze the method of inverse iteration for computing eigenvec-
tors include

J. Vamh (1968). "The Ca.lculation of the Eigenvectors of a Genera.! Complex Matrix by


Inverse Iteration," Math. Comp. 22, 785-91.
J. Vamh (1968). "Rigorous Machine Bounds for the Eigensystem of a Genera.! Complex
Matrix," Math. Comp. 22, 793-801.
J. Varah {1970). "Computing Invariant Subspaces of a General Matrix When the Eigen-
system is Poorly Determined," Math. Comp. 24, 137-49.
G. Peters and J.H. Wilkinson {1979). "Inverse Iteration, Ill-Conditioned Equations, and
Newton's Method," SIAM Review 21, 339-60.
The Algol version of the Eispack inverse iteration subroutine is given in

G. Peters and J.H. Wilkinson {1971). "The Calculation of Specified Eigenvectors by


Inverse Iteration," in Wilkinson and Reinsch {197l,pp.418-39).
The problem of ordering the eigenvalues in the rea.! Schur form is the subject of

A. Ruhe (1970). "An Algorithm for Numerical Determination of the Structure of a


General Matrix," BIT 10, 19&--216.
G.W. Stewart (1976). "Algorithm 406: HQR3 and EXCHNG: Fortran Subroutines for
Calculating and Ordering the Eigenvalues of a Real Upper Hessenberg Matrix," ACM
Trans. Math. Soft. B, 27&-80.
J.J. Dongarra, S. Hammarling, and J.H. Wilkinson (1992). "Numerica.l Considerations
in Computing Invariant Subepaces," SIAM J. Matrix Anal. Appl. 13, 14&-161.
Z. Bai and J.W. Demmel (1993). "On Swapping Diagona.l Blocks in Reel Schur Form,"
Lin. Alg. and Its Applic. 186, 73-95
Fortran programs for computing block diagonalizations and Jordan forms are described
in

C. Bavely and G.W. Stewart (1979). "An Algorithm for Computing Reducing Subspaces
by Block Diagonalization," SIAM J. Num. Anal. 16, 359-67.
B. Kilgstrom and A. Ruhe (1980a). "An Algorithm for Numerical Computation of the
Jordan Normal Form of a Complex Matrix," ACM Trans. Math. Soft. 6, 398-419.
B. Kilgstrom and A. Ruhe (1980b). "Algorithm 560 JNF: An Algorithm for Numerical
Computation of the Jordan Norma.! Form of a Complex Matrix," ACM Trans. Math.
Soft. 6, 437-43.
J.W. Demmel (1983). "A Numerical Analyst's Jordan Canonical Form," Ph.D. Thesis,
Berkeley.
PapeiS that are concerned with estimating the error in a computed eigenvalue andfor
eigenvector include

S.P. Chan and B.N. Parlett (1977). "Algorithm 517: A Program for Computing the
Condition Numbers of Matrix Eigenvalues Without Computing Eigenvectors," ACM
Trans. Math. Soft. 3, 18&--203.
H.J. Symm and J.H. Wilkinson {1980). ''Rea.listic Error Bounds for a Simple Eigenvalue
and Its ASBOCiated Eigenvector," Numer. Math. 35, 113-26.
374 CHAPTER 7. THE UNSYMMETRIC EIGENVALUE PROBLEM

C. Van Loan (1987). "On Estimating the Condition of Eigenvalues and Eigenvectors,"
Lin. Alg. and Its Applic. 88/89, 715--732.
Z. Bai, J. Demmel, and A. McKenney (1993). "On Computing Condition Numbers for
the Nonsymmetric Eigenproblem," ACM 7\uns. Math. Soft. 19, 202-223.
As we have seen, tbe sep(.,.) function is of great importance in tbe assessment of a com-
puted invariant subspace. Aspects of this quantity and the associated Sylvester equation
are discussed in

J. Vamh (1979). "On the Separation of Two Matrices," SIAM J. Num. Anal. 16,
212-22.
R. Byers (1984). "A Linpack-Style Condition Estimator for the Equation AX- XBT =
C," IEEE 7\uns. Auto. Cont. AC-29, 926-928.
K Datta (1988). "The Matrix Equation XA- BX =Rand Its Applications," Lin. Alg.
and It8 Appl. 109, 91-105.
N.J. Higham (1993). "Perturbation Theory and Backward Error for AX- XB = C,"
BIT 33, 124-136.
J. Gardiner, M.R. Wette, A.J. Laub, J.J. Amato, and C.B. Moler (1992). "Algorithm
705: A FORTRAN-77 Software Package for Solving the Sylvester Matrix Equation
AX BT + CX DT = E," ACM 'lrnns. Math. Soft. 18, 232-238.
Numerous algorithms have bffin proposed for the Sylvester equation, but those described
in

R.H. Bartels and G.W. Stewart (1972). "Solution of the Equation AX+ XB = C,"
Comm. ACM 15, 82o-26.
G.H. Golub, S. Nash, and C. Van Loan (1979). "A Hessenberg-Schur Method for the
Matrix Problem AX+ XB = C," IEEE 7\uns. Auto. Cont. AC-24, 90~13.
are among the more reliable in that they rely on orthogonal transformations. A con-
strained Sylvester equation problem is considerd in

J.B. Barlow, M.M. Monahemi, and D.P. O'Leary (1992). "Constrained Matrix Sylvester
Equations," SIAM J. Matri3: Anal. AppL 13, 1-9.
The Lyapunov problem FX + XFT = -C where Cis non-negative definite bas a
very important role to play in control theory. See

S. Barnett and C. Storey (1968). "Some Applications of the Lyapunov Matrix Equation,"
J. Inst. Math. Applic. 4, 3342.
G. Hewer and C. Kenney (1988). "Tbe Sensitivity of tbe Stable Lyapunov Equation,"
SIAM J. Control Optim 26, 321-344.
A.R. Ghavirni and A.J. Laub (1995). "Residual Bounds for Discrete-Time Lyapunov
Equations," IEEE 7\uns. Auto. Cont. 40, 1244-1249.

Several authors have considered generalizations of the Sylv..,ter equation, i.e., EF;X G; =
C. These include

P. Lancaster (1970). "Explicit Solution of Linear Matrix Equations," SIAM RIIDiew 12,
544-66.
H. Wimmer and A.D. Ziebur (1972). "Solving tbe Matrix Equations E/p(A)gp(A) = C,"
SIAM Review 14, 318--23.
W.J. Vetter (1975). "Vector Structur... and Solutions of Linear Matrix Equations," Lin.
Alg. and Its Applic. 10, 181-88.
Some Ideas about improving computed eigenvalu..,, eigenvectors, and invariant sub-
spaces may be found in
7. 7. THE QZ METHOD FOR Ax = ABX 375

J.J. Dongarra, C.B. Moler, and J.H. Wilkinson (1983). "Improving the Accuracy of
Computed Eigenvalues and Eigenvectors," SIAM J. Numer. Anal. 20, 23--46.
J.W. Demmel (1987). ''Three Methods for Refining Estimates of Invariant Subspaces,"
Computing 38, 43-57.

Hessenberg/QR iteration techniques are fast, but not very amenable to parallel computa-
tion. Because of this there is a hunger for radically new approaches to the eigenproblem.
Here are some papers that focus on the matrix sign function and related ideas that have
high performance potential:

C.S. Kenney and A.J. Laub (1991). "Rational Iterative Methods for the Matrix Sign
Function," SIAM J. Matrix Anal. Appl. 12, 273-291.
C.S. Kenney, A.J. Laub, and P.M. Papadopouos (1992). "Matrix Sign Algorithms for
Riccati Equations," IMA J. of Math. Control Inform. 9, 331-344.
C.S. Kenney and A.J. Laub (1992). "On Scaling Newton's Method for Polar Decompo-
sition and the Matrix Sign Function," SIAM J. Matrix Anal. Appl. 13, 688-706.
N.J. Higham (1994). "The Matrix Sign Decomposition and Its Relation to the Polar
Decomposition," Lin. Alg. and Its Applic 212/219, 3-20.
L. Adams and P. Arbenz (1994). "Towards a Divide and Conquer Algorithm for the Real
Nonsymmetric Eigenvalue Problem," SIAM J. Matrix Anal. Appl. 15, 1333-1353.

1.1 The QZ Method for Ax= A.Bx


Let A and B be two n-by-n matrices. The set of all matrices of the form
A - AB with A E <C is said to be a pencil. The eigenvalues of the pencil
are elements of the set A(A, B) defined by

A(A,B) = {z E <C: det(A- zB) = 0 }.

If A E A(A, B) and

Ax= ABX (7.7.1)

then xis referred to as an eigenvector of A- )..B.


In this section we briefly survey some of the mathematical properties
of the generalized eigenproblem (7.7.1) and present a stable method for its
solution. The important case when A and B are symmetric with the latter
positive definite is discussed in §8. 7.2.

7.7.1 Background
The first thing to observe about the generalized eigenvalue problem is that
there are n eigenvalues if and only if rank( B) = n. H B is rank deficient
376 CHAPTER 7. THE UNSYMMETRIC EIGENVALUE PROBLEM

then ).(A, B) may be finite, empty, or infinite:

A= [ ~ ;] B =
[~ ~ ] ~ >.(A, B)= {1}

A
[~ ~ ] B =
[~ ~ ] ~ >.(A, B)= 0

A -- [ 1
0 0
2] B -- 1 0]
[ 0 0 ~ >.(A, B)= Qj

Note that if 0 # >. E >.(A, B) then (1/>.) E >.(B, A). Moreover, if B is


nonsingular then >.(A, B) = >.(B- 1 A ,I) = >.(B- 1 A).
This last observation suggests one method for solving the A- >.B prob-
lem when B is nonsingular:

• Solve BC =A for C using (say) Gaussian elimination with pivoting.

• Use the QR algorithm to compute the eigenvalues of C.

Note that C will be affected by roundoff errors of order ull A ll2ll B- 1 ll2·
If B is ill-conditioned, then this can rule out the possibility of computing
any generalized eigenvalue accurately---€ven those eigenvalues that may be
regarded as well-conditioned.

Example 7.7.1 If

A = [ 1.746 .940 ] and B = [ .780 .563 ]


1.246 1.898 .913 .659

then ~(A, B)= {2, 1.07x 106 }. With 7-digit floating point arithmetic, we find ~(/l(AB- 1 ))
= {1.562539, 1.01 x 106}. The poor quality of the sma.ll eigenvalue is because 1<2(8)""
2 x 106 . On the other hand, we find that

The accuracy of the small eigenvalue is improved because 1<2(A),.. 4.

Example 7.7.1 suggests that we seek an alternative approach to the A- >.B


problem. One idea is to compute well-conditioned Q and Z such that the
matrices
(7.7.2)
are each in canonical form. Note that >.(A, B)= >.(A 1 , Bt) since

Ax = >.Bx *> AtY = >.BtY x =Zy

We say that the pencils A - >.B and A 1 - >.B 1 are equivalent if (7.7.2)
holds with nonsingular Q and Z.
7. 7. THE QZ METHOD FOR Ax = >.Bx 377

7.7.2 The Generalized Schur Decomposition


As in the standard eigenproblem A- >.I there is a choice between canonical
forms. Analogous to the Jordan form is a decomposition of Kronecker in
which both A1 and B 1 are block diagonal. The blocks are similar to Jordan
blocks. The Kronecker canonical form poses the same numerical difficulties
as the Jordan form. However, this decomposition does provide insight into
the mathematical properties of the pencil A- >.B. See Wilkinson (1978)
and Demmel and Kagstrom (1987) for details.
More attractive from the numerical point of view is the following de-
composition described in Moler and Stewart (1973).
Theorem 7.7.1 (Generalized Schur Decomposition) If A and Bare
in cr:nxn, then there exist unitary Q and Z such that QH AZ = T and
QH BZ = S are upper triangular. If for some k, tkk and Skk are both zero,
then >.(A, B) = «J. Otherwise

>.(A, B) = {t;;/s;;: s;; f 0}.


Proof. Let {B k} be a sequence of nonsingular matrices that converge to B.
For each k, let Qf/ (AB;; 1 )Qk = Rk be a Schur decomposition of AB; 1 • Let
Z~c be unitary such that zf! 1
=
(Bi: Q~c) Sj; is upper triangular. It follows
1

that both Qf/ AZ~c = R~cS~c and Qf/ B~cZk = Sk are also upper triangular.
Using the Bolzano-Weierstrass theorem, we know that the bounded se-
quence {(Q~c, Zk)} has a converging subsequence, Jim(Qk., Zk,) = (Q, Z).
It is easy to show that Q and Z are unitary and that QH AZ and QH BZ
are upper triangul&. The assertions about >.(A, B) follow from the identity
n
det(A- >.B) = det(QZH) 11(t;;- >.s;;). D
i=l

U A and B are real then the following decomposition, which corresponds


to the realschur decomposition (Theorem 7.4.1), is of interest.
Theorem 7.7.2 (Generalized Real Schur Decomposition) If A and
B are in 1Rnxn then there exist orthogonal matrices Q and Z such that
QT AZ is upper quasi-triangular and QT BZ is upper triangular.
Proof. See Stewart (1972). D

In the remainder of this section we are concerned with the computation of


this decomposition and the mathematical insight that it provides.

7.7.3 Sensitivity Issues


The generalized Schur decomposition sheds light on the issue of eigenvalue
sensitivity for the A- >.B problem. Clearly, small changes in A and B can
378 CHAPTER 7. THE UNSYMMETRJC EIGENVALUE PROBLEM

induce large changes in the eigenvalue~; = t;, j s,1 if s;; is small. However,
as Stewart (1978) argues, it may not be appropriate to regard such an
eigenvalue as "ill-oonditioned." 'Ibe reason is that the reciprocal p.1 =
s,ift., might be a very well behaved eigenvalue for the pencil p.A- B. In
the Stewart analysis, A and B a.re treated symmetrically and the eigenvalues
are regarded more as ordered pairs (t", Si;) than as quotients. With this
point of view it becomes appropriate to measure eigenvalue perturbations
in the chordal metric chord (a, b) defined by

chord( a, b) =
la-bl .
v'l + a 2 v'l + b2
Stewart shows that if ~ is a dL<;tinct eigenvalue of A - ~B and ~e is the
corresponding eigenvalue of the perturbed pencil A- ~iJ with II A - A 11~ ~
II B - iJ 1!2::::: t., then

where x and y have unit 2-norm and satisfy Ax = ~Bx and y 8 = ~yB B .
Note that the denominator in the upper bound is symmetric in A and B .
The "truly" ill-conditioned eigenvalues are those for which this denominator
is small.
The extreme case when tJc1c = su = 0 for some A: has been studied
by Wilkinson (1979). He me.Jces the interesting observation that when this
occurs, the remaining quotients t;;f B!i can assume arbitrary values.

7.7.4 Hessenberg-Triangular Form


The first step in computing the generalized Schur decomposition of t he pair
(A , B) is to reduce A to upper Hessenberg form and B to upper triangular
fOrm via orthogonal traDSformations. We first determine an orthogonal U
such that UT B is upper triangular. Of course, to preserve eigenvalues, we
must also update A in exactly the same way. Let's trace what happens in
the n = 5 case.
7.7. THE QZ METHOD FOR AX= >.Bx 379

Next, we reduce A to upper Hessenberg form while preserving B's upper


triangular form. First, a Givens rotation Q 45 is determined to zero as1:

~l
X X X X X X
X X X X X X
X X X 0 X X
X X X 0 0 X
X X X 0 0 X

The nonzero entry arising in the (5,4) position in B can be zeroed by


postmultiplying with an appropriate Givens rotation Z45 :

~l
X X X X X X
X X X X X X
X X X 0 X X
X X X 0 0 X
X X X 0 0 0

Zeros are similarly introduced into the (4, 1} and (3, 1} positions in A:

~l
X X X X X X
X X X X X X
X X X 0 X X
X X X 0 X X
X X X 0 0 0

X~
~l
X X X X X X
X X X X X X
A= AZs4 = X X X 0 X X

[ X X X 0 0 X
X X X 0 0 0

~l
X X X X X X
X X X X X X
X X X X X X
X X X 0 0 X
X X X 0 0 0

~l
X X X X X X
X X X X X X
X X X 0 X X
X X X 0 0 X
X X X 0 0 0
A is now upper Hessenberg through its first column. The reduction is
completed by zeroing as2. a42, and as3· As is evident above, two orthogonal
transformations are required for each .a;; that is zeroed--one to do the
380 CHAPTER 7. THE UNSYMMETRIC EIGENVALUE PROBLEM

zeroing and the other to restore B's triangularity. Either Givens rotations
or 2-by-2 modified Householder transformations can be used. Overall we
have:

Algorithm 7.7.1 (Hessenberg-Triangular Reduction) Given A and


Bin lRnxn, the following algorithm overwrites A with an upper Hessenberg
matrix QT AZ and B with an upper triangular matrix QT BZ where both
Q and Z are orthogonal.
Using Algorithm 5.2.1, overwrite B with QT B = R where
Q is orthogonal and R is upper triangular.
A=QTA
for j = l:n- 2
fori= n:- l:j + 2
[c, s] = givens(A(i -l,j),A(i,j))
A(i -l:i,j:n) = [ c
8 ]T A(i -l:i,j:n)
-8 c

B(i -l:i,i -l:n) = [ c


8 ]T B(i -l:i,i -l:n)
-s c
[c,s] = givens(-B(i,i),B(i,i -1))
B(l:i,i -l:i) = B(l:i,i -l:i) [ -~ :]
8
A(l:n,i -l:i) = A(l:n,i -l:i) [ c ]
-s c
end
end
This algorithm requires about 8n3 flops. The accumulation of Q and Z
requires about 4n3 and 3n3 flops, respectively.
The reduction of A - >.B to Hessenberg-triangular form serves as a
"front end" decomposition for a generalized QR iteration known as the QZ
iteration which we describe next.

Example 7.7.3 If

-.1231
A = [
1
~1 ~1 -i2 ]
and orthogonal matrices Q and Z are defined by
-.9917 .0378 ]
and B
[~
2
5
8

[ 1.0000
n 0.0000 0.0000 ]
Q = -.4924
[ -.8616
.0279 -.8699 and z 0.0000 -.8944 -.4472
.1257 .4917 0.0000 .4472 -.8944
then A1 = QT AZ and B, = QT BZ are given by
-2.5849 1.5413 2.4221 ] [ -8.1240 3.6332 14.2024 ]
A, = -9.7631 .0874 1.9239 a.nd B, 0.0000 0.0000 1.8739 .
[ 0.0000 2.7233 -.7612 0.0000 0.0000 .7612
7. 7. THE QZ METHOD FOR Ax = ).Bx 381

7.7.5 Deflation
In describing the QZ iteration we may assume without loss of generality that
A is an unreduced upper Hessenberg matrix and that B is a nonsingular
upper triangular matrix. The first of these assertions is obvious, for if
ak+l,k = 0 then

A->.B Au - >.Bn A12 - >.B12 ] k


[ 0 A22 - >.B22 n- k
k n-k
and we may proceed to solve the two smaller problems Au - >.B 11 and
A22 - >.B22· On the other hand, if bkk = 0 for some k, then it is possible to
introduce a zero in A's (n, n- 1) position and thereby deflate. Illustrating
by example, suppose n = 5 and k = 3:

~l
X X X X X X
X X X X X X
X X X 0 0 X
0 X X 0 0 X
0 0 X 0 0 0

The zero on B's diagonal can be "pushed down" to the (5,5) position as
follows using Givens rotations:

~l
X X X X X X
X X X X X X
X X X 0 0 X
X X X 0 0 0
0 0 X 0 0 0

~l
X X X X X X
X X X X X X
X X X 0 0 X
0 X X 0 0 0
0 0 X 0 0 0

~l
X X X X X X
X X X X X X
X X X 0 0 X
0 X X 0 0 0
0 X X 0 0 0

~l
X X X X X X
X X X X X X
X X X 0 X X
0 X X 0 0 0
0 0 X 0 0 0
382 CHAPTER 7. T HE UNSYMMETRlC EIGENVALUE PROBLEM

[~ ~ l·B ~ BZ~ ~ ~ ~l
X X X X X X
X X X X X X
A= AZ4s = X X X 0 X X
0 X X [ 0 0 X
0 0 0 0 0 0

T his zero-chasing technique is perfectly general and can ~ used to zero


an,n- 1 regardless of where the zero appears along B's diagonal.

7.7.6 The QZ Step


We a re now in a position to describe a QZ step. T he basic idea is to update
A and B as follows

(A - >.B) = (JT (A - >.B)Z,

where A is upper Hessenberg, B is upper triangular, £J and t are each


orthogonal, and .AB-1 is essentially the same matrix that would result if a
Flands QR step {Algorithm 7.5.2) were explicitly applied to AB- 1 • This
can be done wit h some clever zero-chasing and an appeal to t he implicit Q
theorem.
Let M = AB- 1 (upper Hesse~rg) and let v be the first column of the
matrix (M - al )(M - bl), where a and b are the eigenvalues of M 's lower
2-by-2 subma trix. Note t hat v can be calculated in 0(1) flops. If Po is a
Householder matrix such that P0 v is a multiple of e1, then

X X X X X X
X X X X X X
X X X X X X
A = Po A =
0 0 X X X X
0 0 0 X X X
0 0 0 0 X X

X X X X X X
X X X X X X
X X X X X X
B = PoB ::::
0 0 0 X X X
0 0 0 0 X X
0 0 0 0 0 X

The idea now is to restore these matrices to Hessenberg-triangular form by


chasing the unwanted nonzero elements down t he diagonal.
To this end , we first determine a pair of Householder matrices Z 1 a nd
7.7. THE QZ METHOD FOR Ax = .\Bx 383

Zz to zero ba., ba2, and b:n:


X X X X X X
X X X X X X
X X X X X X
A= AZ1Z2 = X X X X X X
0 0 0 X X X
0 0 0 0 X X

X X X X X X
0 X X X X X
0 0 X X X X
B = BZIZ'J = 0 0 0 X X X
0 0 0 0 X X
0 0 0 0 0 X

Then a Householder matrix Pt is used to zero as 1 and '41:

X X X X X X
X X X X X X
0 X X X X X
0 X X X X X
0 0 0 X X X
0 0 0 0 X X

X X X X X X
0 X X X X X
0 X X X X )(
B = PtB = 0 X X X X X
0 0 0 0 X X
0 0 0 0 0 X

Notice that with this step the unwanted nomero elements bave been shifted
down and to the right from their original position. This Ulustrates a typical
step in the QZ iteration. Notice that Q = QoQt · · · Qn-z has the same first
column as Qo. By the way the initial Householder matrix was detennined,
we can apply the Implicit Q theorem and assert that AB- 1 = QT (AB- 1)Q
is indeed essentially the same matrix that we would obtain by applying the
Francis iteration toM = AB- 1 directly. Overall we have:

Algorithm 7.7.2 (The QZ Step) Given an unreduced upper Hessenberg


matrix A E ~exn and a nonsingular upper triangular matrix BE R"xn,
the following algorithm overwrites A with the upper Hessenberg matrix
QT AZ and B with the upper triangular matrix QT BZ where Q and z are
orthogonal and Q has the same first column as the orthogonal similarity
transformation in Algorithm 7.5.1 when it is applied to AB- 1 •
384 CHAPTER 7. THE UNSYMMETR!C EIGENVALUE PROBLEM

Let M = AB- 1 and compute (M- ai)(M- bl)e1 = (x, y, z, 0, ... , of


where a and b axe the eigenvalues of M's lower 2-by-2.
fork= l:n- 2
Find Householder Qk so Qk[ x y z ]T = [ * 0 0 f.
A= diag(Ik-I,Qk,ln-k-2)A
B = diag(h-1,QkJn-k-2)B
Find Householder Zk1 so
[ bk+2,k bk+2,k+! bk+2,k+2 ] zkl = [ o o * ] .
A= Adiag(h-1,Zki•In-k-2)
B = Bdiag(h-h Zk1Jn-k-2l
Find Householder Zk2 so
[ bk+!,k bk+!,k+! ] Zk2 = [ o * ] .
A= Adiag(h-1,Zk2Jn-k-d
B = Bdiag(/k-I, Zk2• In-k-!)
x = ak+1,ki Y = ak+1,k
ifk<n-2
z = ak+3,k
end
end
Find Householder Qn-1 so Qn-1 [ ~ ] = [ ~]
A = diag(/n-2, Qn-J)A
B = diag(In-2, Qn-1)B
Find Householder Zn_ 1 so
[ bn,n-1 bnn ] Zn-1 = [ 0 *]
A = Adiag(In-2• Zn-I)
B = Bdiag(In-2• Zn-1)

This algorithm requires 22n 2 flops. Q and Z can be accumulated for an


additional 8n2 flops and 13n2 flops, respectively.

7.7.7 The Overall QZ Process


By applying a sequence of QZ steps to the Hessenberg-triangular pencil
A- AB, it is possible to reduce A to quasi-triangular form. In doing this it
is necessary to monitor A's subdiagonal and B's diagonal in order to bring
about decoupling whenever possible. The complete process, due to Moler
and Stewart ( 1973), is as follows:

Algorithm 7.7.3 Given A E llr'xn and BE Rnxn, the following algo-


rithm computes orthogonal Q and Z such that QT AZ = T is upper quasi-
triangular and QT BZ =Sis upper triangular. A is overwritten by T and
B by S.
7.7. THE QZ METHOD FOR Ax= >.Bx 385

Using Algoritlun 7.7.1, overwrite A with Qr AZ (upper Hessenberg)


and B with QTBZ (upper triangular).
until q = n
Set all subdiagonal elements in A to zero that satisfy
la;,i-11 ::; ~(Ja;-1,t-ll + JauJ)
Find the largest nonnegative q and the smallest nonnegative p
such that if

p
A n-p-q
q
n-p-q q

then A33 is upper quasi-triangular and A22 is unreduced


upper Hessenberg.
Partition B conformably:

p
B n-p-q
q
n-p-q

if q < n
if B22 is singular
Zero an-q,n-q-1
else
Apply A!goritlun 7.7.2 to A22 and B2 2
A= diag(lp,Q,lq)TAdiag(Ip,Z,lq)
B = diag(lp,Q,IqjTBdiag(Ip,Z,Iq)
end
end
end

This algoritlun requires 30n3 flops. If Q is desired, an additional 16n3 are


necessary. If Z is required, an additional 20n 3 are needed. These estimates
of work are based on the experience that about two QZ iterations per
eigenvalue are necessary. Thus, the convergence properties of QZ are the
same as for QR. The speed of the QZ algoritlun is not affected by rank
deficiency in B.
The computed S and T can be shown to satisfy

Q;r(A + E)Zo = T s
386 CHAPT ER 7. THE UNSYMMETRIC EICENVALUE PROBLEM

where Qo and Zo are eJUI(:tly orthogonal and II E 112 ~ unA 112 and II F 112 ~
uiiBII2·
Example 7.7.6 Ir the QZ algorithm is app lied t.o

A=
[l
3
4
3
0
0
4
5
6
2
0
5
6
7
8
1 u and B =
[j
-1
1
0
0
0
- 1
- 1
1
0
0
-1
- 1
- 1
1
0
- 1
-1
- 1
-1
1
l
then t he subdiagonal elements or A converge BB follows

Iteration O(lh~ 1 1) O(lhaz l) O (lh43D O()h54l )


1 10o ]01 U)O ]0 I
2 wo loO Io0 J0- 1
3 100 10 1 ]0-1 10-3
4 JOO 1o0 10- 1 w- s
5 Jo0 101 I0-1 10-111
6 100 100 w-a converg.
7 Jo0 J0-1 10-4
8 101 10- 1 J0-8
9 100 w-1 10-111
10 100 10-~ converg.
11 10- 1 w-4
12 to- 2 to-n
13 •o-a 10 - 21
14 converg. converg.

7.7.8 Generalized Invariant Subspace Computations


Many of the invariant subspace computations discussed in §7.6 carry over to
the generalized eigenvalue problem. For example, approximate eigenvectors
can be found via inverse iteration:
q(O) E G:"xn given.
fork = 1, 2, . . .
Solve (A- J.!B)zCic) = BqCk-l )
Normalize: q (k) = z (lt) /II z(k) lb
,>.(k) = [q<" >JHAq(k) I !q<">JH Aq(k}
end
When B is nonsingular, this is equivalent to applying (7.6.1) with the
matrix B - 1A. Typically, only a single iteration is required if I' is an ap-
proximate eigenvalue computed by the QZ algorithm. By inverse iterat-
ing with the Hessenberg-triangular pencil, costly accumulation of the Z-
transforma.tions during the QZ iteration can be avoided.
Corresponding to the notion of an invariant subspace for a single ma-
trix, we have the notion of a deflating subspace for the pencil A - .>.B. In
7.7. THE QZ METHOD FOR AX = ).Bx 387

particular, we say that a k-dimensiona.l subspace S s;; R" is "deflating" for


the pencil A - .\B if the subspace {Ax+ By : x, y E S } has dimension k or
less. Note that the columns of the matrix Z in the generalized Schur decom-
position define a. family of deflating subspaces, for if Q = [ qlo ... , q" ) and
Z = [z~o ... ,z,) then we have spa.n{Az1 , .•• ,Azk} s;; spa.n{ql>···•qk} and
span{ Bz~o ... , Bzk} s;; spa.n{q1 , •.• , qk}· Properties of deflating subspaces
and their behavior under perturbation are described in Stewart (1972).

Problems

P7.7.1 Suppose A and B are in R""n and that

lJTBV = [~ 0 ]
0 n-r
r u = [ u, u. l V=[Vl V2]
T n -T r n-r
r n-r
is the SVD of B, where D is r-by-r and r = mnk(B). Show that if >.(A, B) = cr then
Uf A V2 is singular.
P7.7.2 Define F: Rn ~ R by

F(:r) = !.I
2
lAo: - "'T BT A:r B:rll2
zTBTBz
2
=
where A and Bare in R""n. Show that if VF(:r) 0, then A:r is a multiple of B:r.
P7.7.3 Suppose A and Bare in :re'"n. Givt~ an algorithm for computing orthogonal Q
e.nd Z such that QT AZ is upper Hessenberg and zT BQ is upper trie.ngulac.
P7. 7.4 Suppose
A_ [ An Al2 ] BOd B-
- 0 A22 - [ Bn
0
with An, Bu E Rkxk and A22, B22 E fll><j. Under what circumste.nces do there exist

X = [ ~ ~;2 ] BDd y =[ ~ 1 2
]

so that y-l AX and y-l BX ace both block diagonal? This is the generulizeD. Sylvester
equation problem. Specify an algorithm for the case when An, A22, Bn, and B22 ace
upper triangulac. See K8gstriim (1994).
P7.7.5 Suppose 1.1 ~>.(A, B). Relate the eigenvalues and eigenvectors of A, =(A-
1.1B)- 1 A BDd B1 = (A- 1.1B)- 1B to the generalized eigenvalues and eigenvectors of
A- >.B.
P7.7.6 Suppose A, B,C,D E R""n. Show how to compute orthogonal matrices Q, Z,U,
and v such that QT AU is upper Hessenberg and vTcz, QT BV, and vTnz are Bll
upper triangulac. Note that this converts the pencil AC- .\BD to Hessenberg-triangulac
form. Your algorithm should not form the products AC or BD explicitly and not should
not compute any matrix inverse. See Van Loan (1975).

Notes and References for Sec. 7.7


Mathematical aspects of the generalized eigenvalue problem ace covered in
388 CHAPTER 7. THE UNSYMMETRIC EIGENVALUE PROBLEM

F. Gantmacher (1959). The Theory of Matrice• , vol. 2, Chelsea, New York.


H.W. Turnbill and A.C. Aitken {1961). An Introduction to the Theory of Canonical
Matrices, Dover, New York.
I. Erdelyi {1967). "On the Matrix Equation Ax = >..Bx," J. Math. Anal. and Applic.
17, 119--32.

A good general volume that covers many aspects of the A - J..B problem is

B. Kii.gstrom and A. Ruhe {1983). Matrix PenciL., Proc. Pite Havsbad, 1982, Lecture
Notes in Mathematics 973, Springer-Verlag, New York and Berlin.
The perturbation theory for the generalized eigenvalue problem is treated in

G.W. Stewart {1972). "On the Sensitivity of the Eigenvalue Problem Ax= J..Bx," SIAM
J. Num. Anal. 9, 669-86.
G.W. Stewart {1973). "Error and Perturbation Bounds for Subspaces Associated with
Certain Eigenvalue Problems," SIAM Review 15, 727--{;4.
G.W. Stewart (1975). "Gershgorin Theory for the Generalized Eigenvalue Problem Ax =
J..Bx," Mafh. Comp. 29, 6()(}-606.
G.W. Stewart {1978). "Perturbation Theory for the Generalized Eigenvalue Problem"',
in Recent Advance• in Numerical Analy•io , ed. C. de Boor and G.H. Golub, Aca-
demic Press, New York.
A. Pokrzywa {1986). "On Perturbations and the Equivalence Orbit of a Matrix Pencil,"
Lin. Alg. and Applic. 82, 99-121.

The QZ-related papers include

C.B. Moler and G.W. Stewart (1973). "An Algorithm for Generalized Matrix Eigenvalue
Problems," SIAM J. Num. Anal. 10, 241-56.
L. Kaufman {1974). "The LZ Algorithm to Solve the Generalized Eigenvalue Problem,"
SIAM J. Num. Anal. 11, 997-1024.
R.C. Ward {1975). "The Combination Shift QZ Algorithm," SIAM J. Num. Anal. 12,
835-853.
C. F. Van Loan {1975). "A General Matrix Eigenvalue Algorithm," SIAM J. Num. Anal.
12, 819--834.
L. Kaufman {1977). "Some Thoughts on the QZ Algorithm for Solving the Generalized
Eigenvalue Problem," ACM 'lhln•. Mafh. Soft. S, 65-75.
R.C. Ward (1981). "Balancing the Generalized Eigenvalue Problem," SIAM J. Sci and
Stat. Comp. 2, 141-152.
P. Van Dooren {1982). "Algorithm 590: DSUBSP and EXCHQZ: Fortran Routines
for Computing Deflating Subspaces with Specified Spectrum," ACM 'lhln•. Math.
Software 8, 376-382.
D. Watkins and L. Elsner {1994). "Theory of Decomposition and Bulge-Chasing Algo-
rithms for the Generalized Eigenvalue Problem," SIAM J. Matrix Anal. Appl. 15,
943-967.
Just as the Hessenberg decomposition is important in its own right, so is the Hessenberg-
triangular decomposition that serves as a QZ front end. See

W. Enright and S. Serbin (1978). "A Note on the Efficient Solution of Matrix Pencil
Systems," BIT 18, 276-81.
Other solution frameworks are proposed in

V.N. Kublanovskaja and V.N. Fadeeva (1964). "Computational Methods for the Solution
of a Generalized Eigenvalue Problem," A mer. Math. Soc. 'lhln•l. 2, 271-90.
G. Peters and J.H. Wilkinson (1970a). "Ax= J..Bx and the Generalized Eigenproblem,"
SIAM J. Num. Anal. 7, 479--92.
7.7. THE QZ METHOD FOR Ax= ABX 389

G. Rodrigue (1973). "A Gradient Method for the Matrix Eigenvalue Problem A:r =
.>.B:r," Numer. Math. 22, 1-16.
H.R. Schwartz (1974). "The Method of Coordinate Relaxation for (A - .>.B)x = 0,"
Num. Math. 23, 135-52.
A. Jennings and M.R. Osborne (1977). "Generalized Eigenvalue Problems for Certain
Unsymmetric Band Matrices," Lin. Alg. and Its Applic. !19, 139-50.
V.N. Kublanovskaya (1984). "AB Algorithm and Its Modifications for the Spectral
Problem of Linear Pencils of Matrices," Numer. Math. 43, 329-342.
C. Oara (1994). "Proper Deflating Subspaces: Properties, Algorithms, and Applica.-
tiOJJS," Numerical Algorithms 7, 355-373.
The general Ax = .>.Bx problem is central to some important control theory applications.
See

P. Van Dooren (1981). "A Generalized Eigenvalue Approach for Solving Riccati Equa-
tions," SIAM J. Sci. and Stat. Comp. 2, 121-135.
P. Van Dooren (1981). "The Generalized Eigenstructure Problem in Linear System
Theory," IEEE 1Tans. Auto. Cont. AC-26, 111-128.
W.F. Arnold and A.J. Laub (1984). "Generalized Eigenproblem Algorithms and Software
for Algebraic Riccati Equations," Proc. IEEE 72, 1746-1754.
J.W. Demmel and B. KB.gstrom (1988). "Accurate Solutions of Ill-Posed Problems in
Control Theory," SIAM J. MatTi% Anal. Appl. 126-145.
U. Flaschka, W-W. Li, and J-L. Wu (1992). "A KQZ Algorithm for Solving Linear-
Response Eigenvalue Equations," Lin. Alg. and Its Applic. 165, 93-123.
Rectangular generalized eigenvalue problems arise in certain applications. See

G.L. Thompson and R.L. Weil (1970). "Reducing the Rank of A- .>.B," Proc. Amer.
Math. Sec. 26, 548-54.
G.L. Thompson and R.L. Wei! (1972). "Roots of Matrix Pencils Ay =.>.By: Existence,
Calculations, and Relations to Game Theory," Lin. Alg. and Its Applic. 5, 207-26.
G.W. Stewart (1994). "Perturbation Theory for Rectangular Matrix Pencils," Lin. Alg.
and Applic. 208/809, 297-301.
The Kronecker Structure of the pencil A-.>.B is analogous to Jordan structure of A-.>.I:
it provides very useful information about the underlying application.

J .H. Wilkinson (1978). "Linear Differential Equations and Kronecker's Canonical Form,"
in Recent Advances in Numerical Analysis , ed. C. de Boor and G.H. Golub, Aca.-
demic Press, New York, pp. 231-65.
Interest in the Kronecker structure has led to a host of new algorithms and ana.lyses.

J.H. Wilkinson (1979). "Kronecker's Canonical Form and the QZ Algorithm," Lin. Alg.
and Its Applic. 88, 285-303.
P. Van Dooren (1979). "The Computation of Kronecker's Canonical Form of a Singular
Pencil," Lin. Alg. and Its Applic. 27, 103-40.
J.W. Demmel (1983). ''The Condition Number of Equivalence Transformations that
Block Diagonalize Matrix Pencils," SIAM J. Numer. Anal. 80, 599-610.
J.W. Demmel and B. Kagstrom (1987). "Computing Stable Eigendecompositions of
Matrix Pencils," Linear Alg. and Its Applic 88/89, 139-186.
B. Kagstrom (1985). "The Generalized Singular Value Decomposition and the General
A - AB Problem," BIT 24, 56s-583.
B. Kagstri:im (1986). "RGSVD: An Algorithm for Computing the Kronecker Structure
and Reducing Subspaces of Singular A- .>.B Pencils," SIAM J. Sci. and Stat. Comp.
7,185-211.
J. Dernmel and B. K8.gstri:im (1986). "Stably Computing the Kronecker Structure and
Reducing Subspaces of Singular Pencils A - .>.B for Uncertain Data," in Large Scale
390 CHAPTER 7. THE UNSYMMETRJC EIGENVALUE PROBLEM

Eigenvalue Probleffi8, J. Cullum and R.A. Willoughby (eds), North-Holland, Arrur


terdam.
T. Beelen and P. Van Dooren (1988). "An Improved Algorithm for the Computation of
Kronecker's Canonical Form of a Singular Pencil," Lin. Alg. and Its Applic. 105,
~5.
B. Kagstrom and L. Westin {1989). "Generalized Schur Methods with Condition Esti-
mators for Solving the Generalized Sylvester Equation," IEEE Trans. Auto. Cont.
AC-34, 745-751.
B. Kii.gstrom and P. Poromaa (1992). "Distributed and Shared Memory Block Algo-
rithms for the Triangular Sylvester Equation with aep- 1 Estimators," SIAM J. Ma-
trix Anal. Appl. 13, 90-101.
B. Kagstrom {1994). "A Perturbation Analysis of the Generalized Sylvester Equation
(AR- LB, DR- LE) = (C, F)," SIAM J. Matrix Anal. Appl. 15, 1045-1060.
E. Elmroth and B. Kii.gstrom (1996). "The Set of 2-by-3 Matrix Pencils-Kronecker
Structure and their Transitions under Perturbations," SIAM J. Matrix Anal., to
appear.
A. Edelman, E. Elmroth, and B. Kagstrom (1996). "A Geometric Approach to Pertur-
bation Theory of Matrices and Matrix Pencils," SIAM J. Matrix Anal., to appear.
Chapter 8

The Symmetric
Eigenvalue Problem

§8.1 Properties and Decompositions


§8.2 Power Iterations
§8.3 The Symmetric QR Algorithm
§8.4 Jacobi Methods
§8.5 Tridiagonal Methods
§8.6 Computing the SVD
§8.7 Some Generalized Eigenvalue Problems

The symmetric eigenvalue problem with its rich mathematical struc-


ture is one of the most aesthetically pleasing problems in numerical linear
algebra. We begin our presentation with a brief discussion of the math-
ematical properties that underlie this computation. In §8.2 and §8.3 we
develop various power iterations eventually focusing on the symmetric QR
algorithm.
In §8.4 we discuss Jacobi's method, one of the earliest matrix algorithms
to appear in the literature. This technique is of current interest because it is
amenable to parallel computation and because under certain circumstances
it has superior accuracy.
Various methods for the tridiagonal case are presented in §8.5. These
include the method of bisection and a divide and conquer technique.
The computation of the singular value decomposition is detailed in §8.6.
The central algorithm is a variant of the symmetric QR iteration that works
on bidiagonal matrices.

391
392 CHAPTER 8. THE SYMMETRIC EIGENVALUE PROBLEM

In the final section we discuss the generalized eigenvalue problem Ax =


>.Bx for the important case when A is symmetric and B is symmetric
positive definite. No suitable analog of the orthogonally- based QZ algo-
rithm (see §7. 7) exists for this specially structured, generalized eigenprob-
lem. However, there are several successful methods that can be applied
and these are presented along with a discussion of the generalized singular
value decomposition.

Before You Begin


Chapter 1, §§2.1-2.5, and §2.7, Chapter 3, §§4.1-4.3, §§5.1-5.5 and §7.1.1
are assumed. Within this chapter there are the following dependencies:

§8.4
i
§8.1 -+ §8.2 -+ §8.3 -+ §8.6 _, §8.7
!
§8.5

Many of the algorithms and theorems in this chapter have unsymmetric


counterparts in Chapter 7. However, except for a few concepts and defini-
tions, our treatment of the symmetric eigenproblem can be studied before
reading Chapter 7.
Complementary references include Wilkinson (1965), Stewart (1973),
Gourlay and Watson (1973), Hager {1988), Chatelin (1993), Parlett (1980),
Stewart and Sun (1990), Watkins (1991), Jennings and McKeowen (1992),
and Datta (1995). Some Matlab functions important to this chapter are
schur and svd. LAPACK connections include

LAPACK: Symmetric Eigenproblem


_SYEV All eigenvalues and vectom
_SYEVD Same but uses divide and conquer for eigenvectors
_SYEVX Selected eigenvalues and vectors
_SYTRD Householder tridiagonalization
_SBTRD Householder tridiagonalization (A banded)
_SPTRD Householder tridiagonalization (A in packed storage)
_STEQR All eigenvalues and vectors of tridiagonal by implicit QR
_STEDC All eigenvalues and vectors of tridiagonal by divide and conquer
_STERF All eigenvalues of tridiagonal by root-free QR
_PTEQR All eigenvalues and eigenvectors of positive definite tridiagonal
_STEBZ Selected eigenvalues of tridiagonal by bisection
_STEIN Selected eigenvectors of tridiagonal by inverse iteration

LAPACK: Symmetric-Definite Eigenproblems


- SYGST
_PBSTF
I Split
Converts A AB to C
Cholesky factorization
form
).J

_SBGST Converts banded A - AB to C - ).[ form via split Cholesky


8.1. PROPERTIES AND DECOM POSITIONS 393

LAPACI<: SVD
. CESVD A = VEVT
.BDSQR SVD of real bidiagonal matrix
.C£&1\11 bidiagonalization of general matrix
.ORGBR generates the orthogonal tr&nsformations
.CBBlUI bidiagonalization of b&nd matrix

LAPACI<: The Generalized Singular Value Problem


. CCSVP
• TCSJA
1
Converts AT A - ~J 2 BT B to triangular Af At - p. 2 B[ Bt
Comp utes GSVD of a p air of triangular matrices.

8.1 Properties and Decompositions


In this section we set down the mat hematics that is required to develop
and analyze algorithms for the symmetric eigenvalue problem.

8.1.1 Eigenvalues and Eigenvectors


Symmetry guarantees that all of A's eigenvalues are real and that there is
an orthonormal basis of eigenvectors.
Theorem 8.1.1 (Symmetric Schur Decomposition) If A E m.nxn is sym-
metric, then there exists a rod orthogonal Q su.ch that
QT AQ =A= diag(.>.t, ... ,.>.n)·
lvfon~over, fork= l:n, AQ(:, k) = .>.~cQ(: , Ic). See Theorem 7.1.3.
Proof. Suppose .>.1 E >.(A) and that x E (;" is a unit 2-norm eigenvector
with Ax = >. 1x. Since >.1 = x 8 Ax = x 8 A 8 x = x 8 Ax = >.1 it follows
that >-t E R Thus, we may assume that x E 1R". Let P1 E m.n xn be
a Householder matrix such that P'{ x = e 1 = In(:, 1). It follows from
Ax = >.tx t hat (P'[ APt)et = >.e1. This says t hat the first column of
P'{ AP1 is a multiple of e1 . But since P'{AP1 is synunet ric it must have
t he form
P[AP1 = [ ~1 1 1
]

where At E R(n-I)x(n-I) is symmetric. By induction we may assume t hat


there is an orthogonal Q I E 1R.(n-I)x(n-I) such that QT A 1Q1 = At Is diag·
onal. The theorem foUows by setting

Q= H o]
[ 01 Ql and a" = [.>. •
0
Ao
a ]
1

and comparing columns in the matrix equation AQ = QA. []


Example 8.1.1 If
A= [ 6.8
2.4
2.4 ]
8.2
and Q = [ ..86 -..68].
394 CHAPTER 8. THE SYMMETRIC EIGENVALUE PROBLEM

then Q is orthogonal and qT AQ = diag(10,5).

Fbr a symmetric matrix A we shall we the notation >.k(A) to designate the


kth largest eigenvalue. Thus,
>.,(A) ~ · · · ~ >.2(A) ~ >. 1(A).
It follows from the orthogonal invariance of the 2-norm that A has singular
values {I>., (A) I, ... , i>., (A) I} and so
II A ll2 = max{ i>.,(A)I, i>.n(A)I }.

The eigenvalues of a symmetric matrix have a "minimax" characteriza-


tion based on the values that can be assumed by the quadratic form ratio
:r.T A:J:jxTx.
Theorem 8.1.2 (Courant-Fischer Minimax Theorem} If A E lRnxn
is symmetric, then TA
>.k(A) = max min ~
dim(S)=k O;o!y€8 yTy

fork= l:n.
Proof. Let QT AQ = diag(>.i) be the Schur decomposition with>.,~: = >.,~;(A)
and Q = [ q" q2, ... , qn ]. Define
S.~; = span{q1, ... , qk},
the invariant subspace associated with >.~, ... , >.k. It is easy to show that

max min yT Ay > yTAy


min -T- = q[ Aqk = >.k(A).
dim(S)=k 01'yES yT y - O;o!yes. Y Y

Th establish the reverse inequality, let S be any k-dimensional subspace and


note that it must intersect span{qk, ... , q,}, a subspace that has dimension
n- k + 1. If Y• = okqk + · · · + Onqn is in this intersection, then
TA TA
min ~T
<
-
Y. T Y• <
-
>.k (A) •
O"'yES Y Y Y. Yo

Since this inequality holds for all k-dimensional subspaces,

TA
max min y T y ~ >..~:(A)
dim(S)=k O;o!yES Y Y

thereby completing the proof of the theorem. D

If A E Rnxn is symmetric positive definite, then >.,(A) > 0.


8.1. PROPERTIES AND DECOMPOSITIONS 395

8.1.2 Eigenvalue Sensitivity


An important solution framework for the symmetric eigenproblem involves
the production of a sequence of orthogonal transformations {Qk} with the
property that the matrices Qf AQk are progressively "more diagonal." The
question naturally arises, how well do the diagonal elements of a matrix
approximate its eigenvalues?

Theorem 8.1.3 (Gershgorin) Suppose A E Rnxn is symmetric and that


Q E Rnxn is orthogonal. If QT AQ = D + F where D = diag(dt, ... , dn)
and F has zero diagonal entries, then
n

,\(A)~ U[d;-r;,d;+r;]
i=l

n
where r; L IJ;il fori= 1:n. See Theorem 7.2.1.
j=l

Proof. Suppose,\ E ,\(A) and assume without loss of generality that,\ # d;


for i = 1:n. Since (D- ,\I)+ F is singular, it follows from Lemma 2.3.3
that
1 < II (D- ,\I)-l F lloo
-
= t
j=l ldk -
l/kil
AI
= Tk
ldk - AI
for some k, 1 ::; k ::; n. But this implies that ,\ E [dk - rk, dk + rk]· D

Example 8.1.2 The matrix

2.0000 0.1000 0.2000 ]


A = 0.2000 5.0000 0.3000
[ 0.1000 0.3000 -1.0000

has Gerschgorin intervals [1.7, 2.3], [4.5,5.5], and [-1.4, -.6] and eigenvalues 1.9984,
5.0224, and -1.0208.

The next results show that if A is perturbed by a symmetric matrix E,


then its eigenvalues do not move by more than II E II·

Theorem 8.1.4 (Wielandt-Hoffman) If A and A+ E are n-by-n sym-


metric matrices, then
n
L (A;( A+ E)- ,\;(A)) 2
::; II E II~.
i=l
396 CHAPTER 8. THE SYMMETRIC EIGENVALUE PROBLEM

Proof. A proof can be found in Wilkinson (1965, pp.104-8) or Stewart


and Sun (1991, pp.189-191). See also P8.1.5. IJ

Example 8.1.3 If

A= [ 6.8
2.4
2.4
8.2
l and E _
-
[ .002
.003
.003
.001
l '
then A(A) = {5, 10} and A(A +E)= {4.9988, 10.004} confirming that
1.95 x 10-' = 14.9988- 51 2 + 110.004- 101 2 :-::; 11 E 11~ = 2.3 x w-'.

Theorem 8.1.5 If A and A + E are n-by-n symmetric matrices, then


k = 1:n.

Proof. This follows from the minimax characterization. See Wilkinson


(1965, pp.101-2) or Stewart and Sun (1990, p.203). IJ

Example 8.1.4 If

A = [ 6.8 2.4 ] and E _ [ .002 .003 ]


2.4 8.2 - .003 .001 '
then A(A) = {5, 10}, A(E) = { -.0015, .0045}, and A( A+ E) = {4.9988, 10.0042}.
confirming that
5 - .0015 :<::: 4.9988 :<::: 5 + .0045
10- .0015 :<::: 10.0042 :<::: 10 + .0045.

Corollary 8.1.6 If A and A+ E are n-by-n symmetric matrices, then

fork= 1:n.
Proof.

Several more useful perturbation results follow from the minimax property.
Theorem 8.1.7 (Interlacing Property) If A E !Rnxn is symmetric and
A,. = A(1:r, 1:r), then
Ar+t(Ar+t) :":: Ar(Ar) :":: Ar(Ar+!) :":: · .. :":: A2(Ar+I) :":: .XI(Ar) :":: .XI(Ar+J)
forr = 1:n -1.
8.1. PROPERTIES AND DECOMPOSITIONS 397

Proof. Wilkinson (1965, pp.103-4). 0

Example 8.1.5 If

A=[~~~!]
1 3 6 10
1 4 10 20
then -\(A,) = {1}, -\(A2) = {.3820, 2.6180}, -\(A 3 ) = {.1270, 1.0000, 7.873}, and
-\(A.)= {.0380, .4538, 2.2034, 26.3047}.

Theorem 8.1.8 Suppose B = A + rccT where A E !Rnxn is symmetric,


c E !Rn has unit 2-norm and r E JR. If r ~ 0, then

>.;(B) E [>.;(A), >.;-t(A)J i = 2:n

while if T ::; 0 then

>.;(B) E [>.;+t(A),>.;(A)J, i = 1:n- 1.

In either case, there exist nonnegative m 1 , ... , mn such that

>.;(B)= >.;(A)+ m;r, i = 1:n

with m1 + · · · + mn = 1.

Proof. Wilkinson (1965, pp.94-97). See also P8.1.8. 01

8.1.3 Invariant Subspaces


Many eigenvalue computations proceed by breaking the original problem
into a collection of smaller subproblems. The following result is the basis
for this solution framework.

Theorem 8.1.9 Suppose A E !Rnxn is symmetric and that

is orthogonal. If ran( Q1 ) is an invariant subspace, then

QTAQ= D =
(8.1.1)

and >.(A)= >.(Dt) U >.(D2). See also Lemma 7.1.2.


398 CHAPTER 8. THE SYMMETRIC EIGENVALUE PROBLEM

Proof. If
QT AQ = [ Dt £1; ] ,
E21 D2
then from AQ = QD we have AQ1 - Q 1D 1 = Q 2E2 1. Since ran(Qt) is
invariant, the columns of Q2E21 are also in ran(Qt) and therefore perpen-
dicular to the columns of Q2. Thus,
o = QI(AQt- QtDt) = QfQ2E21 = E21·
and so (8.1.1) holds. It is easy to show
det(A- Mn) = det(QT AQ- Mn) = det(Dt- M.)det(D2- Mn-r)
confirming that >.(A)= >.(Dt) U >.(D2). D

The sensitivity to perturbation of an invariant subspace depends upon


the separation of the associated eigenvalues from the rest of the spectrum.
The appropriate measure of separation between the eigenvalues of two sym-
metric matrices B and C is given by

sep(B,C) = min 1>- -ttl· (8.1.2)


.l.E.l.(B)
I'E.l.(C)

With this definition we have


Theorem 8.1.10 Suppose A and A+ E are n-by-n symmetric matrices
and that
Q = [ Ql Q2 l
r n -r
is an orthogonal matrix such that ran(Q 1 ) is an invariant subspace for A.
Partition the matrices QT AQ and QT EQ as follows:
r
n-r [ ~~~r n-r
~]
r
n-r

II E lb $ sep(Dt D2),

then there exists a matrix P E R(n-r)xr with


4

such that the columns of Ot = (Q 1 + Q2P)(I + pT P)- 112 define an or-


thonormal basis for a subspace that is invariant for A+ E. See also Theorem
7.2.4.
8.1. PROPERTIES AND DECOMPOSITIONS 399

Proof. This result is a slight adaptation of of Theorem 4.11 in Stewart


(1973). The matrix (I + pT P)- 112 is the inverse of the square root of
I+ pT P. See §4.2.10. Dl

Corollary 8.1.11 If the conditions of the theorem hold, then


4
dist{ran(QI), ran(QI)) :::;
sep
(D D )
1, 2
II E21 ll2·
See also Corollary 7.2.5.
Proof. It can be shown using the SVD that

II P{I + pT P)- 112 112:::; II p 112· {8.1.3)

Since Q'fQ 1 = P{I +pH P)- 112 it follows that

dist{ran(QI), ran(Q!)) = II Q'fQI IJ2 = II P{I +pH P)- 112IJ2

Thus, the reciprocal of sep(D~, D 2 ) can be thought of as a condition number


that measures the sellSitivity of ran{ Q 1 ) as an invariant subspace.
The effect of perturbations on a single eigenvector is sufficiently impor-
tant that we specialize the above results to this important case.
Theorem 8.1.12 Suppose A and A+ E are n-by-n symmetric matrices
and that
Q = [ Ql
1
is an orthogonal matrix such that q1 is an eigenvector for A. Partition the
matrices QT AQ and QT EQ as follows:
1 1
n-1 n-1

If d = min lA- J.tl > 0 and


I'E>.(D,)

d
IIEib:::; 4'
then there exists p E Rn-I satisfying
400 CHAPTER 8. THE SYMMETRIC EIGENVALUE PROBLEM

such that ih = (qi +Q2p)/ y'1 + pTp is a unit 2-nonn eigenvector for A+E.
Moreover,

See also Corollary 7.2.6.

Proof. Apply Theorem 8.1.10 and Corollary 8.1.11 with r = 1 and observe
that if Dt =(A), then d = sep(D1,D2). D

Example 8.1.6 If A= diag{.999, 1.001, 2.), and

0.00 O.Dl O.Dl ]


E = 0.01 0.00 0.01 ,
[ 0.01 0.01 0.00

then QT(A + E)Q = diag(.9899, 1.0098, 2.0002) where

-.7418 .6706 .0101 ]


Q = .6708 .7417 .0101
[ .0007 -.0143 .9999

is orthogonal. Let tls = Qes, i = 1, 2, 3. Thus) 'is is the perturbation of A's eigenvector
q; = "-i· A calculation shows that

dist{span{qi}, span{Q,}} = dist{span{q2}, span{q2}} = .67

Thus, because they are WliOciated with nearby eigenvalues, the eigenvectors q1 and ~
cannot be computed accurately. On the other hand, since >., and >.2 are well separated
from >.3, they define a tw<>-dimensional subspace that is not particularly sensitive as
dist{span{qi,q2},apan{qt,oh}} = .01.

8.1.4 Approximate Invariant Subspaces


If the columns of Q1 E !Rnxr are independent and the residual matrix R =
AQ1- Q1S is small for someS E IR'"xr, then the columns of Q 1 define an
approximate invariant subspace. Let us discover what we can say about
the eigensystem of A when in the possession of such a matrix.

Theorem 8.1.13 Suppose A E m.nxn and S E m;xr are symmetric and


that
AQI- QIS = El
where Q 1 E llr'xr satisfies QfQl =I•. Then there exist f.lt. ... , f.lr E A( A)
such that

fork= 1:r.
8.1. PROPERTIES AND DECOMPOSITIONS 401

Proof. Let Q2 E lRnx(n-r) be any matrix such that Q = [ Qb Q2 ] is


orthogonal. It follows that

~~::
2
B+E
[: QI:Q2] + [ E[OQ ]

and so by using Corollary 8.1.6 we have I.\~~:(A) -.\~~:(B) I ~ II E 112 for


k = l:n. Since .\(S) ~ .\(B), there exist !Jt, ... , !Jr E .\(A) such that

for k = l:r. The theorem follows by noting that for any x E lRr and
y E lRn-r we have

liE [ ; ] 112 ~ II EtX 112 +II E[Q2Y 112 ~ II Et 11211 X 112 +II Et 112 II y 112

from which we readily conclude that II E 112 ~ v'2ii Et l12· []


Example 8.1.7 If

A =[ 6.8
2.4
2.4 ]
8.2 ' Q1 = [ :~:~ ] , and S = (5.1) E R
then
AQ1- Q1S = [ =:::~ ]= E1.

The theorem predicts that A has a.n eigenvalue within -./211 E1 ll2"' .1415 of 5.1. This
=
is true since >.(A) {5, 10}.

The eigenvalue bounds in Theorem 8.1.13 depend on II AQ1 - Q1S 112·


Given A and Q 11 the following theorem indicates how to chooseS so that
this quantity is minimized in the Frobenius norm.

Theorem 8.1.14 If A E lRnxn is symmetric and Ql E lRnxr has orthonor-


mal columns, then

and S = Q[ AQ 1 is the minimizer.

Proof. Let Q 2 E lRnx(n-r) be such that Q = [ Q1, Q2] is orthogonal. For


any S E m;xr we have

II AQt- QIS II~ = II QT AQt - QTQtS II~


= II Q[ AQ1 - S II~+ II QI AQ1 II~-
402 CHAPTER 8. THE SYMMETRIC EIGENVALUE PROBLEM

Clearly, the minimizing Sis given by S = Qf AQ 1 • []

This result enables us to associate any r-dimensional subspace ran( QI),


with a set of r "optimal" eigenvalue-eigenvector approximates.
Theorem 8.1.15 Suppose A E Rnxn is symmetric and that Q 1 E Rnxr
satisfies Q[Qt = Ir· If

zT(Qf AQt)Z = diag(9I>···•9r) = D

is the Schur decomposition of Q[ AQ1 and Q1Z = [ YI, ... , Yr] , then

fork= 1:r.
Proof.

The theorem follows by taking norms. D

In Theorem 8.1.15, the (!k are called Ritz values, the Yk are called Ritz
vectors, and the {Ok, Yk) are called Ritz pairs.
The usefulness of Theorem 8.1.13 is enhanced if we weaken the assump-
tion that the columns of Q 1 are orthonormal. As can be expected, the
bounds deteriorate with the loss of orthogonality.
Theorem 8.1.16 Suppose A E Rnxn is symmetric and that

AX1 - X1S = Ft,


where X1 E Rnxr and S = Xf AX1. If

II xr X] - Ir 112 =T < 1, (8.1.4)

then there exist Jl.t, ... , Jl.r E .>-(A) such that

fork= 1:r.
Proof. Let X 1 = ZP be the polar decomposition of X 1 . Recall from
§4.2.10 that this means Z E Rnxr has orthonormal columns and P E Rkxk
is a symmetric positive semidefinite matrix that satisfies P 2 = Xf X 1 .
Taking norJns in the equation

E1 = AZ-ZS (AX1 - X1S) + A(Z- Xt) - (Z- X1)S


F1 + AZ(I- P) - Z(I- P)Xf AX1
8 .1. PROPERTlES AND DECOMPOSITIONS 403

gives

II Et ll2 ~ IIFtll2 + IIAII211 I -PII2 (l+IIXt ll ~). (8.1.5)


Equation (8.1.4) implies that

II x, u~ ~ 1 + r. (8.1.6)

Since P is positive semidefinite, (I + P) is nonsingular and so

I - P = (I+ P)- 1 {1- P 2 ) = (I+ P)- 1(1- xrxt)


which implies II I - P lb ~ r . By substituting this inequality and (8.1.6)
into (8.1.5) we have II E1 liz $ II F 1 1!2 + -r(2 + -r)ll A liz. The proof is
completed by noting t hat we can use Theorem 8.1. 13 with Q 1 = Z to
relate the eigenvalues of A and S via the residual E 1 • C

8.1.5 The Law of Inertia


The inertia of a symmetric matrix A is a triplet of nonnegative integers
(m, z,p) where m, z, and pare respectively the number of negative, zero,
and positive elements of >.(A).
Theorem 8.1.17 (Sylvester Law of Ine rtia) If A E Rnxn is symmet-
ric and X E Rnxn i.~ nonsingular, then A and XT AX have the same iner-
tia.
Proof. Suppose for some r that Ar(A) > 0 and define t he subspace S0 ~
m_n by
So = span{X - 1qt, .. . , x - 1qr}, q, # 0
where Aq; = >.,(A)qi and i = l:r. From the minimax characterization of
Ar(XT AX) we have

max min min


dim(S)=r tiES 11ESo

Since

y ERn =>

y E So =>
it foUows that

min
yESo
404 CHAPTER 8. THE SYMM£TRIC EIGENVALUE P ROBLEM

An analogous argument with the roles of A and xr AX reversed shows that

~ r (A) >_ ~r (XTAX)--vn (X-1)2 -


-
~r(XT AX)
u (X)2 ·
1

T hus, ).,.(A) and ~r(XT AX) have the s8Jlle sign and so we have shown that
A and xr AX have the same number of positive eigenvalues. If we apply
this result to - A, we conclude that A and xr AX have the s8Jlle number of
negative eigenvalues. ObviollSly, the number of zero eigenvalues possessed
by each matrix is also the same. C

Example 8.1.8 If A= diag(3, 2,-l) and

X=[~!~]·
then

xT AX = [ 1~ 64~~ !!82 ]
15
and ,\(XT AX) {134.769, 3555, - .1252}.

Problems

P8.1.1 Without using any of the results i.n this section, show that the eigenvalues or a
2-by-2 symmetric matrix must be real.

P8.1.2 Compute the Schur decomposition of A= [ ~ ~ ].


P8.1.3 Show that the eigenvalues or a Hermitian matrix (AH =
A) are real. For
each theorem and corollary in this section, state and p rove the corresponding result for
Hermitian matrices. Which results have analogll when A is skew-symmetric? (Hint: If
AT= -A, then iA is Hermitian.)
P8.1.4 Show that if X E R'" r, T' 5. n, and II xr =
X- I II 'T < 1, then qmin(X) ~ 1-T.

P8.1.6 Suppose A, E E wxn are symmetric and consider the Schur decomposition
A + tE = QDQT where we GSsume that Q = Q(t) and D = D(t) are continuously differ-
entiable functions oft E R. Show t hat D(t) = diag( Q(t)T EQ(t)) where t he matrix on
the right is the diagonal part of Q(t)TEQ(t). Establi$h the Wielandt-Hoffman theorem
by integrating both sides of this equation from 0 to 1 and taking Frobenius norms to
show that

II D(1)- D(O) IIF 5. 1 1


II diag(Q(t)T EQ(t) IIFdt 5. II E !IF .

P8.1.6 Prove Theorem 8.1.5.


Ps.1:'1 Prove Theorem !l.l.'T.
P8.1.8 lf C E R'"' n then the trou junction tr(C) = cu + · · · + Cnn equals the sum of
C's eigenvalues. Use this to prove Theorem 8.1.8.
P8.1.9 Show that if BE R,mXm &~~d C E R'""' are sym metric, then sep(B,C) =min
8.2. POWER ITERATIONS 405

II BX- XC IIF where the min is taken over a.ll matrices in Rmxn.
P8.1.10 Prove the inequality (8.1.3).
PS.l.ll Suppose A E R'xn is symmetric and C E R'xr has full column rank and
assume that r « n. By using Theorem 8.1.8 relate the eigenva.lues of A + CCT to the
eigenva.lues of A.

Notes and References for Sec. 8.1

The perturbation theory for the symmetric eigenva.Iue problem is surveyed in Wilkinson
(1965, chapter 2), Parlett {1980, chapters 10 and 11), a.nd Stewart and Sun (1990, chap-
ters 4 a.nd 5). Some representative papers in this well-researched area include

G.W. Stewart (1973). "Error and Perturbation Bounds for Subspaces Associated with
Certain Eigenvalue Problems," SIAM Review 15, 727-£4.
C.C. Paige {1974). "Eigenvalues of Perturbed Hermitian Matrices," Lin. Alg. and It8
Applic . 8, 1-10.
A. Ruhe {1975). "On the Closeness of Eigenva.lues and Singular Values for Almost
Norma.I Matrices," Lin. Alg. and It8 Applic. 11, 87-94.
W. Kahan (1975). "Spectra of Nearly Hermitian Matrices," Proc. Amer. Math. Soc.
48, 11-17.
A. Schonhage (1979). "Arbitrary Perturbations of Hermitian Matrices," Lin. Alg. and
Its Applic. 24, 143-49.
P. Deift, T. Nanda, and C. Tomei (1983). "Ordinary Differentia.l Equations and the
Symmetric Eigenvalue Problem," SIAM J. Numer. Anal. 20, 1-22.
D.S. Scott {1985). "On the Accuracy of the Gershgorin Circle Theorem for Bounding
the Spread of a Rea.l Symmetric Matrix," Lin. Alg. and Its Applic. 65, 147-155
J.-G. Sun (1995). "A Note on Backward Error Perturbations for the Hermitian Eigen-
va.lue Problem," BIT 35, 385--393.
R.-C. Li (1996). "Relative Perturbation Theory(!) Eigenvalue and Singular Value Vari-
ations," Technical Report UCB/ /CSD-94-855, Department of EECS, University of
Ca.Iifornia at Berkeley.
R.-C. Li (1996). "Relative Perturbation Theory (II) Eigenspace and Singular Subspace
Variations," Technical Report UCB/ /CSD-94-856, Department of EECS, University
of Ca.lifornia at Berkeley.

8.2 Power Iterations


Assume that A E Jre'xn is symmetric and that U0 E lR.nxn is orthogonal.
Consider the following QR itemtion:

To= UJ'AUo
fork= 1,2, ...
Tk-1 = UkRk ( QR factorization) (8.2.1)
Tk = RkUk
end

Since Tk
that
(8.2.2)
406 CHAPTER 8. THE SYMMETRIC EIGENVALUE PROBLEM

Thus, eo.ch Tk is orthogonally similar to A. Moreover, the Tk almost al-


ways converge to diagonal form and so it can be said that (8.2.1) almost
always "converges" to a Schur decomposition of A. In order to establish
this remarkable result we first consider the power method and the method
of orthogonal iteration.

8.2.1 The Power Method


Given a unit 2-norm q(O) ERn, the power method produces a sequence of
vectors q(k) as follows:

fork= 1,2, ...


z(k) = Aq(k-1)
q(k) = z(k) /II z(k) ll2 (8.2.3)
>,(k) = [q(k)]T Aq(k)
end

If q(o) is not "deficient" and A's eigenvalue of maximum modulus is unique,


then the q(k) converge to an eigenvector.

Theorem 8.2.1 Suppose A E Rnxn is symmetric and that

QT AQ = diag(>.t, ... , >.n)

where Q = [q 1 , ••• ,qn] is orthogonal and [>.t[ > [>.21 ~ ··· ~ l>.nl• Let the
vectors Qk be specified by {8.2.3) and define 8k E [O,n/2] by

cos(Bk) = lqf q(k) I·


If cos( Bo) f- 0, then

lsin(Bk)l s tan(Bo) I~: r (8.2.4)

l>.(k)- >.I < 1>.1- >.nl tan(Bo)21 ~: 12k (8.2.5)

Proof. From the definition of the iteration, it follows that q(k) is a multiple
of Akq(O) and so

If q(O) has the eigenvector expansion q( 0 ) = a 1q1 + · · · + anqn, then


8.2. POWER ITERATIONS 407

a~+ · ··+a; = 1,
and

Thus,
n
~ a:l.A~k
~. '
i=2
1 - n n
~a2_A?k
LJ 1 • La~A'f"
i=l i=l

2 .A )2~<
= tan(9o) ( A:
This proves (8.2.4). Likewise,

[q<o>( A2k+l q(o) i= l


(q(O)]T A2kq(O)

and so
n

La~ .A~" (.A, - .A 1)


i-2

i= l

Example 8.2.1 The eigenvalues or

A _
- [
-1.6407
1.0814
1.2014
1.0814
4.1573
7.4035
1.1539 -1.0463
1.2014
7.4035
2.7890
-1.5737
1 1539
-1:0463
-1.5737
8.6944
l
408 CHAPTER 8. THE SYMMETRIC EIGENVALUE PROBLEM

are given by J.(A) : {12, 8, -4, -2}. If (8.2.3) i.s applied to this matrix with q(O) ==
[1 0 0 O)T, then
k ;.(k)
1 2.3156
2 8.6802
3 10.3163
4 11.0663
5 11.5259
6 11.7747
7 11.8967
8 11.9534
9 11.9792
10 11.9907
Observe the convergence to J. 1 = 12 with rate IJ. 2 jJ. 1 12 k : (8/12) 2 k : (4/9)k.

Computable error bounds for the power method can be obtained by using
Theorem 8.1.13. If
I Aq(k)- .>,(k)q(k) lb = h,
then there exists), E >.(A) such that l.>.(k) ->.I : : ; -/26.

8.2.2 Inverse Iteration


Suppose the power method is applied with A replaced by (A - U) -I. If ),
is very close to a distinct eigenvalue of A, then the next iterate vector will
be very rich in the corresponding eigendirection:

X = t
i=I
a;q; }
=> (A- >.I)- 1 x =
n a·
2: ~ ~ ),q,.
t=l
Aq; = >.;q;, i = l:n
Thus, if ), "=' ),i and aj is not too small, then this vector has a strong
component in the direction of QJ· This process is called inverse itemtion
and it requires the solution of a linear system with matrix of coefficients
A->.I.

8.2.3 Rayleigh Quotient Iteration


Suppose A E IRnxn is symmetric and that xis a given nonzero n-vector. A
simple differentiation reveals that

), = r(x) _

minimizes II (A- >.I)x ll2· (See also Theorem 8.1.14.) The scalar r(x) is
called the Rayleigh quotient of x. Clearly, if x is an approximate eigen-
vector, then r(x) is a reasonable choice for the corresponding eigenvalue.
8.2 . POWER ITERATIONS 409

Combining t his Idea with inverse iteration gives rise to the Rayleigh quotient
iteration:
xo given, II xo !1 2= 1
for k= 0, 1, . ..
J.Jk = r(x~r.) (8.2.6)
Solve (A - JJ~r. l )zk+ l = Xk for Zk+l
Xk+J = Zk+J/ 11 Zk+l 112
end

l
Exam ple 8 .l.2 If (8.2.6) is applied to
I 1 I I
2 3 4 5 6

A - [ j 3
4
5
6 10
10 20
15 35
15
35
70
21
I
56
126
6 21 56 126 252
with xo = [1, 1, 1, 1, 1, 1]T/6, t hen
k ~k
0 153.8333
1 120.0571
2 49.5011
3 13.8687
4 15.4959
5 15.5534
The iteration is converging to the eigenvalue>. = 15.5534732737.

T he Rayleigh quotient iteration almost always converges and when it


does, the rate of convergence is cubic. We demonstrate t his for the case
n = 2. Without loss of generality, we may assume that A = diag(.\1! >.2),
with .\ 1 > .\2. Denoting x~.: by

c~ + s~ = 1

it follows that J.Jk .X,c~ + >. t s~ in (8.2.6) and

Z1t.+1 ). 1 ~ ~ -~~~~ [ ]

A calculation shows that

(8.2.7)

From these equations it is clear that t he x~; converge cubically to either-


span{ed or span{e2 } provided l c~e l ::f ls~r. l ·
Details associated wit h the practical im plementation of the Rayleigh
quot ient iteration may be fo und in Parlet t (1974).
410 CHAPTER 8. THE SYMMETRIC EIGENVALUE PROBLEM

8.2.4 Orthogonal Iteration


A straightforward generalization of the power method can be used to com-
pute higher-dimensional invariant subspaces. Let r be a chosen integer
satisfying 1 ::; r ::; n. Given an n-by-r matrix Qo with orthonormal
columns, the method of orthogonal itemtion generates a sequence of matri-
ces {Qk} ~ lRnxr as follows:

fork=1,2, ...
zk = AQk-1 (8.2.8)
QkRk = zk (QR factorization)
end

Note that if r = 1, then this is just the power method. Moreover, the
sequence {Qked is precisely the sequence of vectors produced by the power
iteration with starting vector q(O) = Q 0 e 1 .
In order to analyze the behavior of (8.2.8), assume that

QT AQ = D = diag(>.;) (8.2.9)

is a Schur decomposition of A E lRnxn. Partition Q and D as follows:

r
Q = [ Q(i Q{3 l D= n-r (8.2.10)
r n-r

Dr(A) = ran(Qo)
is the dominant invariant subspace of dimension r. It is the unique invari-
ant subspace associated with the eigenvalues >. 1 , ... , >.r.
The following theorem shows that with reasonable assumptions, the
subspaces ran(Qk) generated by (8.2.8) converge to Dr(A) at a rate pro-
portional to IAr+ dAr Ik.
Theorem 8.2.2 Let the Schur decomposition of A E lRnxn be given by
{8.2.9) and (8.2.10) with n ~ 2. Assume that 1>-rl > 1>-r+II and that the
n-by-r matrices {Qk} are defined by {8.2.8). lf8 E [0,7r/2] is specified by

cos(8) = min lurvl 0


uEDr(A) I u ll21l v ll2 > '
~Emn(Qo)

then
dist(Dr(A), ran(Qk)) ::; tan(£1) I>.~:Ilk
See also Theorem 7. 9.1.
8.2. POWER ITERATIONS 411

Proof. By induction it can be shown that

AkQo = Qk (Rk · · · Rt)


and so with the partitionings (8.2.10) we have

[~~ ~~ ][~f~: ] = [ ~f~: ]


(Rk .. · Rt).

If

then

cos(8min) = O'r(Vo) == } 1 - II W o ~~~


dist(Dr(A), ran(Qk)) = II W,~r liz
D~Vo = Vk (Rk · · · Rt)
D~Wo = W.~.:(R"·"Rt)
It follows that Vo is nonsingular which in turn implies that Vk and ( Rk • · · Rt)
are also nonsingular. Thus,

and so

Example 8.2.3 If (8.2.8) Is applied to tile matrix ot Example 8 .2.1 with r = 2 and
Qo = 1,(:,1:2), then

k dist(D2(A) ,1'811(Q~o))
1 0.8806
:i! o.4091
3 0.1121
4 0.0313
5 0.0106
6 0.0044
1 0.0020
8 0.0010
9 0.0005
10 0.0002
412 CHAPTER 8. THE SYMMETRIC EIGENVALUE PROBLEM

8.2.5 The QR Iteration


Consider what happens when we apply the method of orthogonal iteration
(8.2.8) with r = n. Let QT AQ = diag(>.~, ... , >.n) be the Schur decompo-
sition and assume
l>.1l > l>.2l > · · · > l>.nl·

If Q = [ q1, ... , qn ] and Qk = [ qlk), ... , q~k) ] and

0 0
dist(D;(A),span{ql \ ••• , q} )}) < 1 (8.2.11)

fori= 1:n- 1, then it follows from Theorem 8.2.2 that

k) .
• (k) (k)
dist(span{q 1 , ... ,q; },span{q~, ... ,q;}) = 0
(I T
>.H1
l

fori= 1:n- 1. This implies that the matrices Tk defined by

are converging to diagonal form. Thus, it can be said that the method
of orthogonal iteration computes a Schur decomposition if r = n and the
original iterate Q 0 E m.nxn is not deficient in the sense of (8.2.11).
The QR iteration arises by considering how to compute the matrix Tk
directly from its predecessor Tk_ 1 , On the one hand, we have from (8.2.1)
and the definition of Tk- 1 that

On the other hand,

Thus, Tk is determined by computing the QR factorization of n-1 and


then multiplying the factors together in reverse order. This is precisely
what is done in (8.2.1).

Example 8.2.4 If the QR iteration (8.2.1) is applied to the matrix in Example 8.2.1,
then alter 10 iterations

11.9907 -0.1926 -0.0004 0.0000 ]


T = -0.1926 8.0093 -0.0029 0.0001 .
10
[ -0.0004 -0.0029 -4.0000 0.0007
0.0000 0.0001 0.0007 -2.0000

The off-diagonal entries of the T_k matrices go to zero as follows:

    k    |T_k(2,1)|  |T_k(3,1)|  |T_k(4,1)|  |T_k(3,2)|  |T_k(4,2)|  |T_k(4,3)|
    1     3.9254      1.8122      3.3892      4.2492      2.8367      1.1679
    2     2.6491      1.2841      2.1908      1.1587      3.1473      0.2294
    3     2.0147      0.6154      0.5082      0.0997      0.9859      0.0748
    4     1.6930      0.2408      0.0970      0.0723      0.2596      0.0440
    5     1.2928      0.0866      0.0173      0.0665      0.0667      0.0233
    6     0.9222      0.0299      0.0030      0.0405      0.0169      0.0118
    7     0.6346      0.0101      0.0005      0.0219      0.0043      0.0059
    8     0.4292      0.0034      0.0001      0.0113      0.0011      0.0030
    9     0.2880      0.0011      0.0000      0.0057      0.0003      0.0015
    10    0.1926      0.0004      0.0000      0.0029      0.0001      0.0007

Note that a single QR iteration involves O(n^3) flops. Moreover, since con-
vergence is only linear (when it exists), it is clear that the method is a pro-
hibitively expensive way to compute Schur decompositions. Fortunately,
these practical difficulties can be overcome as we show in the next section.
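
To make the connection with (8.2.1) concrete, here is a minimal NumPy sketch of the unshifted QR iteration (factor T = QR, then form RQ); the 3-by-3 test matrix is an arbitrary symmetric example, not the matrix of Example 8.2.1, and it exhibits the same slow linear decay of the off-diagonal entries.

    import numpy as np

    def qr_iteration(T, num_steps):
        # Unshifted QR iteration (8.2.1): factor T = QR, then form RQ = Q^T T Q.
        for _ in range(num_steps):
            Q, R = np.linalg.qr(T)
            T = R @ Q
        return T

    A = np.array([[3.0, 1.0, 0.5],
                  [1.0, 2.0, 0.3],
                  [0.5, 0.3, 1.0]])
    for k in (1, 5, 20):
        Tk = qr_iteration(A.copy(), k)
        off = np.sqrt(np.sum(Tk**2) - np.sum(np.diag(Tk)**2))
        print(k, off)   # off-diagonal mass decays only linearly per step
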

Problems

P8.2.1 Suppose A_0 ∈ ℝ^{n×n} is symmetric and positive definite and consider the following
iteration:

    for k = 1, 2, ...
        A_{k-1} = G_k G_k^T    (Cholesky)
        A_k = G_k^T G_k
    end

(a) Show that this iteration is defined. (b) Show that if A_0 = [a b; b c] with a ≥ c has
eigenvalues λ_1 ≥ λ_2 > 0, then the A_k converge to diag(λ_1, λ_2).
P8.2.2 Prove (8.2.7).
P8.2.3 Suppose A ∈ ℝ^{n×n} is symmetric and define the function f: ℝ^{n+1} → ℝ^{n+1} by

    f( [x; λ] ) = [ Ax - λx ; (x^T x - 1)/2 ]

where x ∈ ℝ^n and λ ∈ ℝ. Suppose x_+ and λ_+ are produced by applying Newton's
method to f at the "current point" defined by x_c and λ_c. Give expressions for x_+ and
λ_+ assuming that ||x_c||_2 = 1 and λ_c = x_c^T A x_c.
Notes and References for Sec. 8.2

The following references are concerned with the method of orthogonal iteration (a.k.a.
the method of simultaneous iteration):

G.W. Stewart (1969). "Accelerating The Orthogonal Iteration for the Eigenvalues of a
Hermitian Matrix," Numer. Math. 13, 362-76.
M. Clint and A. Jennings (1970). "The Evaluation of Eigenvalues and Eigenvectors of
Real Symmetric Matrices by Simultaneous Iteration," Comp. J. 13, 76-80.
H. Rutishauser (1970). "Simultaneous Iteration Method for Symmetric Matrices," Nu-
mer. Math. 16, 205-23. See also Wilkinson and Reinsch (1971, pp.284-302).

References for the Rayleigh quotient method include

J. Vandergraft (1971). "Generalized Rayleigh Methods with Applications to Finding
Eigenvalues of Large Matrices," Lin. Alg. and Its Applic. 4, 353-68.
B.N. Parlett (1974). "The Rayleigh Quotient Iteration and Some Generalizations for
Nonnormal Matrices," Math. Comp. 28, 679-93.
R.A. Tapia and D.L. Whitley (1988). "The Projected Newton Method Has Order 1 + √2
for the Symmetric Eigenvalue Problem," SIAM J. Num. Anal. 25, 1376-1382.
S. Batterson and J. Smillie (1989). "The Dynamics of Rayleigh Quotient Iteration,"
SIAM J. Num. Anal. 26, 624-636.
C. Beattie and D.W. Fox (1989). "Localization Criteria and Containment for Rayleigh
Quotient Iteration," SIAM J. Matrix Anal. Appl. 10, 80-93.
P.T.P. Tang (1994). "Dynamic Condition Estimation and Rayleigh-Ritz Approxima-
tion," SIAM J. Matrix Anal. Appl. 15, 331-346.

8.3 The Symmetric QR Algorithm


The symmetric QR iteration (8.2.1) can be made very efficient in two ways.
First, we show how to compute an orthogonal U_0 such that U_0^T A U_0 = T is
tridiagonal. With this reduction, the iterates produced by (8.2.1) are all
tridiagonal and this reduces the work per step to O(n^2). Second, the idea of
shifts is introduced and with this change the convergence to diagonal form
proceeds at a cubic rate. This is far better than having the off-diagonal
entries go to zero like |λ_{i+1}/λ_i|^k as discussed in §8.2.5.

8.3.1 Reduction to Tridiagonal Form


If A is symmetric, then it is possible to find an orthogonal Q such that

    Q^T A Q = T                                          (8.3.1)

is tridiagonal. We call this the tridiagonal decomposition and as a compres-
sion of data, it represents a very big step towards diagonalization.
We show how to compute (8.3.1) with Householder matrices. Suppose
that Householder matrices P_1, ..., P_{k-1} have been determined such that if
A_{k-1} = (P_1 ⋯ P_{k-1})^T A (P_1 ⋯ P_{k-1}), then

              [ B_11   B_12    0   ]  k-1
    A_{k-1} = [ B_21   B_22   B_23 ]  1
              [  0     B_32   B_33 ]  n-k
                k-1     1     n-k

is tridiagonal through its first k-1 columns. If P̃_k is an order n-k
Householder matrix such that P̃_k B_32 is a multiple of I_{n-k}(:,1) and if P_k =
diag(I_k, P̃_k), then the leading k-by-k principal submatrix of

    A_k = P_k A_{k-1} P_k = [ B_11      B_12          0         ]  k-1
                            [ B_21      B_22       B_23 P̃_k     ]  1
                            [  0      P̃_k B_32   P̃_k B_33 P̃_k   ]  n-k
                              k-1        1           n-k

is tridiagonal. Clearly, if U_0 = P_1 ⋯ P_{n-2}, then U_0^T A U_0 = T is tridiagonal.
In the calculation of A_k it is important to exploit symmetry during the
formation of the matrix P̃_k B_33 P̃_k. To be specific, suppose that P̃_k has the
form
    P̃_k = I - βvv^T,    β = 2/(v^T v),    0 ≠ v ∈ ℝ^{n-k}.
Note that if p = βB_33 v and w = p - (βp^T v/2)v, then

    P̃_k B_33 P̃_k = B_33 - vw^T - wv^T.

Since only the upper triangular portion of this matrix needs to be calcu-
lated, we see that the transition from A_{k-1} to A_k can be accomplished in
only 4(n - k)^2 flops.

Algorithm 8.3.1 (Householder Tridiagonalization) Given a sym-
metric A ∈ ℝ^{n×n}, the following algorithm overwrites A with T = Q^T A Q,
where T is tridiagonal and Q = H_1 ⋯ H_{n-2} is the product of Householder
transformations.

    for k = 1:n-2
        [v, β] = house(A(k+1:n, k))
        p = βA(k+1:n, k+1:n)v
        w = p - (βp^T v/2)v
        A(k+1, k) = ||A(k+1:n, k)||_2;  A(k, k+1) = A(k+1, k)
        A(k+1:n, k+1:n) = A(k+1:n, k+1:n) - vw^T - wv^T
    end

This algorithm requires 4n^3/3 flops when symmetry is exploited in calcu-
lating the rank-2 update. The matrix Q can be stored in factored form in
the subdiagonal portion of A. If Q is explicitly required, then it can be
formed with an additional 4n^3/3 flops.
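
For illustration, here is a NumPy transcription of Algorithm 8.3.1. The helper house is a hypothetical stand-in for the Householder vector computation of §5.1 (any convention that reflects A(k+1:n,k) onto a multiple of e_1 will do), and unlike the text it explicitly zeroes the eliminated entries instead of storing the Householder vectors there.

    import numpy as np

    def house(x):
        # Householder vector: (I - beta*v*v^T)x is a multiple of e_1 (one common convention).
        v = x.copy()
        sigma = np.linalg.norm(x)
        if sigma == 0.0:
            return v, 0.0
        v[0] += np.sign(x[0]) * sigma if x[0] != 0 else sigma
        beta = 2.0 / np.dot(v, v)
        return v, beta

    def householder_tridiag(A):
        # Overwrites a copy of symmetric A with T = Q^T A Q (tridiagonal), following Algorithm 8.3.1.
        A = A.copy().astype(float)
        n = A.shape[0]
        for k in range(n - 2):
            v, beta = house(A[k+1:, k])
            p = beta * (A[k+1:, k+1:] @ v)
            w = p - (beta * np.dot(p, v) / 2.0) * v
            A[k+1, k] = np.linalg.norm(A[k+1:, k]); A[k, k+1] = A[k+1, k]
            A[k+2:, k] = 0.0; A[k, k+2:] = 0.0
            A[k+1:, k+1:] -= np.outer(v, w) + np.outer(w, v)
        return A

    A = np.array([[1.0, 3.0, 4.0],
                  [3.0, 2.0, 8.0],
                  [4.0, 8.0, 3.0]])
    print(householder_tridiag(A))   # compare with Example 8.3.1
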

Example 8.3.1

    [ 1    0    0  ]^T [ 1  3  4 ] [ 1    0    0  ]     [ 1    5       0    ]
    [ 0   .6   .8  ]   [ 3  2  8 ] [ 0   .6   .8  ]  =  [ 5   10.32    1.76 ]
    [ 0   .8  -.6  ]   [ 4  8  3 ] [ 0   .8  -.6  ]     [ 0    1.76   -5.32 ]

Note that if T has a zero subdiagonal, then the eigenproblem splits into
a pair of smaller eigenproblems. In particular, if t_{k+1,k} = 0, then λ(T) =
λ(T(1:k, 1:k)) ∪ λ(T(k+1:n, k+1:n)). If T has no zero subdiagonal entries,
then it is said to be unreduced.
Let T̃ denote the computed version of T obtained by Algorithm 8.3.1.
It can be shown that T̃ = Q̃^T(A + E)Q̃ where Q̃ is exactly orthogonal and
E is a symmetric matrix satisfying ||E||_F ≤ c·u||A||_F where c is a small
constant. See Wilkinson (1965, p. 297).

8.3.2 Properties of the Tridiagonal Decomposition


We prove two theorems about the tridiagonal decomposition both of which
have key roles to play in the sequel. The first connects (8.3.1) to the QR
factorization of a certain Krylov matrix. These matrices have the form

    K(A, v, k) = [ v, Av, ..., A^{k-1}v ].

Theorem 8.3.1 If Q^T A Q = T is the tridiagonal decomposition of the sym-
metric matrix A ∈ ℝ^{n×n}, then Q^T K(A, Q(:,1), n) = R is upper triangular.
If R is nonsingular, then T is unreduced. If R is singular and k is the
smallest index so r_kk = 0, then k is also the smallest index so t_{k,k-1} is
zero. See also Theorem 7.4.3.

Proof. It is clear that if q_1 = Q(:,1), then

    Q^T K(A, Q(:,1), n) = [ Q^T q_1, (Q^T A Q)(Q^T q_1), ..., (Q^T A Q)^{n-1}(Q^T q_1) ]
                        = [ e_1, Te_1, ..., T^{n-1}e_1 ] = R

is upper triangular with the property that r_11 = 1 and r_ii = t_21 t_32 ⋯ t_{i,i-1}
for i = 2:n. Clearly, if R is nonsingular, then T is unreduced. If R is
singular and r_kk is its first zero diagonal entry, then k ≥ 2 and t_{k,k-1} is the
first zero subdiagonal entry. □
The next result shows that Q is essentially unique once Q(:, 1) is specified.

Theorem 8.3.2 (Implicit Q Theorem) Suppose Q = [ q_1, ..., q_n ] and
V = [ v_1, ..., v_n ] are orthogonal matrices with the property that both Q^T A Q
= T and V^T A V = S are tridiagonal where A ∈ ℝ^{n×n} is symmetric. Let k
denote the smallest positive integer for which t_{k+1,k} = 0, with the conven-
tion that k = n if T is unreduced. If v_1 = q_1, then v_i = ±q_i and |t_{i,i-1}| =
|s_{i,i-1}| for i = 2:k. Moreover, if k < n, then s_{k+1,k} = 0. See also Theorem
7.4.2.

Proof. Define the orthogonal matrix W = Q^T V and observe that W(:,1) =
I_n(:,1) = e_1 and W^T T W = S. By Theorem 8.3.1, W^T K(T, e_1, k) is upper
triangular with full column rank. But K(T, e_1, k) is upper triangular and
so by the essential uniqueness of the thin QR factorization,

    W(:, 1:k) = I_n(:, 1:k) diag(±1, ..., ±1).

This says that Q(:,i) = ±V(:,i) for i = 1:k. The comments about the
subdiagonal entries follow from this since t_{i+1,i} = Q(:, i+1)^T A Q(:, i) and
s_{i+1,i} = V(:, i+1)^T A V(:, i) for i = 1:n-1. □

8.3.3 The QR Iteration and Tridiagonal Matrices


We quickly state four facts that pertain to the QR iteration and tridiagonal
matrices. Complete verifications are straightforward.

1. Preservation of Form. If T = QR is the QR factorization of a sym-
   metric tridiagonal matrix T ∈ ℝ^{n×n}, then Q has lower bandwidth 1
   and R has upper bandwidth 2 and it follows that

       T_+ = RQ = Q^T T Q

   is also symmetric and tridiagonal.

2. Shifts. If s ∈ ℝ and T - sI = QR is the QR factorization, then

       T_+ = RQ + sI = Q^T T Q

   is also tridiagonal. This is called a shifted QR step.

3. Perfect Shifts. If T is unreduced, then the first n-1 columns of T - sI
   are independent regardless of s. Thus, if s ∈ λ(T) and

       QR = T - sI

   is a QR factorization, then r_nn = 0 and the last column of T_+ =
   RQ + sI equals s·I_n(:, n) = s·e_n.

4. Cost. If T ∈ ℝ^{n×n} is tridiagonal, then its QR factorization can be
   computed by applying a sequence of n-1 Givens rotations:

       for k = 1:n-1
           [c, s] = givens(t_kk, t_{k+1,k})
           m = min{k+2, n}
           T(k:k+1, k:m) = [ c  s ; -s  c ]^T T(k:k+1, k:m)
       end

   This requires O(n) flops. If the rotations are accumulated, then O(n^2)
   flops are needed.
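
The following NumPy sketch illustrates fact 4 and, at the same time, facts 1 and 2: it computes the QR factorization of a symmetric tridiagonal T with n-1 Givens rotations and returns RQ, which is again symmetric and tridiagonal. The helper givens follows the [c s; -s c]^T convention used above; all names are illustrative.

    import numpy as np

    def givens(a, b):
        # Return (c, s) with [[c, s], [-s, c]].T @ [a, b] = [r, 0].
        if b == 0.0:
            return 1.0, 0.0
        r = np.hypot(a, b)
        return a / r, -b / r

    def tridiag_qr_step(T):
        # One explicit QR step T_+ = R*Q = Q^T*T*Q for symmetric tridiagonal T.
        n = T.shape[0]
        R = T.copy().astype(float)
        Q = np.eye(n)
        for k in range(n - 1):
            c, s = givens(R[k, k], R[k+1, k])
            G = np.array([[c, s], [-s, c]])
            m = min(k + 3, n)                      # only columns k..k+2 are affected
            R[k:k+2, k:m] = G.T @ R[k:k+2, k:m]
            Q[:, k:k+2] = Q[:, k:k+2] @ G
        return R @ Q                               # symmetric and tridiagonal again

    T = np.array([[2.0, 1.0, 0.0, 0.0],
                  [1.0, 3.0, 0.5, 0.0],
                  [0.0, 0.5, 4.0, 0.2],
                  [0.0, 0.0, 0.2, 1.0]])
    print(tridiag_qr_step(T))
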

8.3.4 Explicit Single Shift QR Iteration


If s i.S a good approximate eigenvalue, then we suspect that the (n, n - 1)
will be small after a QR step with shift s. Thls is the philosophy behind
the following iteration:

T = U[ AUo (tridiagonal)
fork= 0, 1, ...
Determine real shift Jl.· (8.3.2)
T - JJ.l = U R (QR factorization)
T = RU +J.Ll
end

If

    T = [ a_1   b_1                 0     ]
        [ b_1   a_2    ⋱                  ]
        [        ⋱     ⋱     b_{n-1}      ]
        [ 0          b_{n-1}    a_n       ]

then one reasonable choice for the shift is μ = a_n. However, a more effective
choice is to shift by the eigenvalue of

    T(n-1:n, n-1:n) = [ a_{n-1}   b_{n-1} ]
                      [ b_{n-1}   a_n     ]

that is closer to a_n. This is known as the Wilkinson shift and it is given
by

    μ = a_n - b_{n-1}^2 / ( d + sign(d)·sqrt(d^2 + b_{n-1}^2) )        (8.3.3)

where d = (a_{n-1} - a_n)/2. Wilkinson (1968b) has shown that (8.3.2) is
cubically convergent with either shift strategy, but gives heuristic reasons
why (8.3.3) is preferred.

8.3.5 Implicit Shift Version


It is possible to execute the transit ion from T toT+ = RU + J.Ll = UTTU
without explicitly forming the matrix T - Jl./. This has advantages when
the shift is much larger than some of the a;. Let c =cos( B) and s = sin( B)
be computed such that
8.3. THE SYMMETRIC QR ALGORITHM 419

If we set Gt = G(1, 2, li) then G1e1 = Ue 1 and


                        [ x  x  +  0  0  0 ]
                        [ x  x  x  0  0  0 ]
    T ← G_1^T T G_1 =   [ +  x  x  x  0  0 ]
                        [ 0  0  x  x  x  0 ]
                        [ 0  0  0  x  x  x ]
                        [ 0  0  0  0  x  x ]

We are thus in a position to apply the Implicit Q theorem provided we can
compute rotations G_2, ..., G_{n-1} with the property that if Z = G_1 G_2 ⋯ G_{n-1}
then Ze_1 = G_1 e_1 = Ue_1 and Z^T T Z is tridiagonal.
Note that the first column of Z and U are identical provided we take
each G_i to be of the form G_i = G(i, i+1, θ_i), i = 2:n-1. But G_i of this
form can be used to chase the unwanted nonzero element "+" out of the
matrix G_1^T T G_1 as follows:

           [ x  x  0  0  0  0 ]               [ x  x  0  0  0  0 ]
           [ x  x  x  +  0  0 ]               [ x  x  x  0  0  0 ]
     G_2   [ 0  x  x  x  0  0 ]         G_3   [ 0  x  x  x  +  0 ]
     -->   [ 0  +  x  x  x  0 ]         -->   [ 0  0  x  x  x  0 ]
           [ 0  0  0  x  x  x ]               [ 0  0  +  x  x  x ]
           [ 0  0  0  0  x  x ]               [ 0  0  0  0  x  x ]

           [ x  x  0  0  0  0 ]               [ x  x  0  0  0  0 ]
           [ x  x  x  0  0  0 ]               [ x  x  x  0  0  0 ]
     G_4   [ 0  x  x  x  0  0 ]         G_5   [ 0  x  x  x  0  0 ]
     -->   [ 0  0  x  x  x  + ]         -->   [ 0  0  x  x  x  0 ]
           [ 0  0  0  x  x  x ]               [ 0  0  0  x  x  x ]
           [ 0  0  0  +  x  x ]               [ 0  0  0  0  x  x ]

Thus, it follows from the Implicit Q theorem that the tridiagonal matrix
Z^T T Z produced by this zero-chasing technique is essentially the same as the
tridiagonal matrix T obtained by the explicit method. (We may assume
that all tridiagonal matrices in question are unreduced for otherwise the
problem decouples.)
Note that at any stage of the zero-chasing, there is only one nonzero
entry outside the tridiagonal band. How this nonzero entry moves down
the matrix during the update T ← G_k^T T G_k is illustrated in the following:

    [ 1   0   0   0 ]^T [ a_k   b_k   z_k    0  ] [ 1   0   0   0 ]     [ a_k   b_k    0     0  ]
    [ 0   c   s   0 ]   [ b_k   a_p   b_p    0  ] [ 0   c   s   0 ]     [ b_k   a_p   b_p   z_p ]
    [ 0  -s   c   0 ]   [ z_k   b_p   a_q   b_q ] [ 0  -s   c   0 ]  =  [  0    b_p   a_q   b_q ]
    [ 0   0   0   1 ]   [  0     0    b_q   a_r ] [ 0   0   0   1 ]     [  0    z_p   b_q   a_r ]

Here (p, q, r) = (k+1, k+2, k+3). This update can be performed in about
26 flops once c and s have been determined from the equation b_k s + z_k c =
0. Overall we obtain

Algorithm 8.3.2 (Implicit Symmetric QR Step with Wilkinson
Shift) Given an unreduced symmetric tridiagonal matrix T ∈ ℝ^{n×n}, the
following algorithm overwrites T with Z^T T Z, where Z = G_1 ⋯ G_{n-1} is a
product of Givens rotations with the property that Z^T(T - μI) is upper
triangular and μ is that eigenvalue of T's trailing 2-by-2 principal submatrix
closer to t_nn.

    d = (t_{n-1,n-1} - t_nn)/2
    μ = t_nn - t_{n,n-1}^2 / ( d + sign(d)·sqrt(d^2 + t_{n,n-1}^2) )
    x = t_11 - μ
    z = t_21
    for k = 1:n-1
        [c, s] = givens(x, z)
        T = G_k^T T G_k,  where G_k = G(k, k+1, θ)
        if k < n-1
            x = t_{k+1,k}
            z = t_{k+2,k}
        end
    end

This algorithm requires about 30n flops and n square roots. If a given
orthogonal matrix Q is overwritten with Q G_1 ⋯ G_{n-1}, then an additional
6n^2 flops are needed. Of course, in any practical implementation the tridi-
agonal matrix T would be stored in a pair of n-vectors and not in an n-by-n
array.
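
Here is a dense-matrix NumPy sketch of Algorithm 8.3.2. It forms each Givens rotation as a full matrix instead of using the 26-flop window update, so it exposes the logic (Wilkinson shift, bulge chasing) rather than the O(n) cost; the names and the test matrix are illustrative.

    import numpy as np

    def implicit_qr_step(T):
        # One implicit symmetric QR step with Wilkinson shift (cf. Algorithm 8.3.2).
        T = T.copy().astype(float)
        n = T.shape[0]
        d = (T[n-2, n-2] - T[n-1, n-1]) / 2.0
        b2 = T[n-1, n-2] ** 2
        mu = T[n-1, n-1] - b2 / (d + np.sign(d) * np.hypot(d, T[n-1, n-2])) if d != 0 else T[n-1, n-1] - abs(T[n-1, n-2])
        x, z = T[0, 0] - mu, T[1, 0]
        for k in range(n - 1):
            r = np.hypot(x, z)
            c, s = (x / r, -z / r) if r != 0 else (1.0, 0.0)
            G = np.eye(n)
            G[k, k] = c;  G[k, k+1] = s
            G[k+1, k] = -s;  G[k+1, k+1] = c
            T = G.T @ T @ G                 # rotation in the (k, k+1) plane; bulge moves down
            if k < n - 2:
                x, z = T[k+1, k], T[k+2, k]
        return T

    T = np.array([[1.0, 1.0, 0.0, 0.00],
                  [1.0, 2.0, 1.0, 0.00],
                  [0.0, 1.0, 3.0, 0.01],
                  [0.0, 0.0, 0.01, 4.0]])
    print(implicit_qr_step(T))   # compare with Example 8.3.2
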

Example 8.3.2 If Algorithm 8.3.2 is applied to

    T = [ 1   1   0    0   ]
        [ 1   2   1    0   ]
        [ 0   1   3   .01  ]
        [ 0   0  .01   4   ]

then the new tridiagonal matrix T is given by

    T = [ .5000   .5916    0           0        ]
        [ .5916   1.785    .1808       0        ]
        [ 0       .1808    3.7140      .0000044 ]
        [ 0       0        .0000044    4.002497 ]

Algorithm 8.3.2 is the basis of the symmetric QR algorithm-the standard


means for computing the Schur decomposition of a dense symmetric matrix.

Algorithm 8.3.3 (Symmetric QR Algorithm) Given A ∈ ℝ^{n×n} (sym-
metric) and a tolerance tol greater than the unit roundoff, this algorithm
computes an approximate symmetric Schur decomposition Q^T A Q = D. A
is overwritten with the tridiagonal decomposition.

    Use Algorithm 8.3.1 to compute the tridiagonalization
        T = (P_1 ⋯ P_{n-2})^T A (P_1 ⋯ P_{n-2}).
    Set D = T and, if Q is desired, form Q = P_1 ⋯ P_{n-2}. See §5.1.6.
    until q = n
        For i = 1:n-1, set d_{i+1,i} and d_{i,i+1} to zero if
            |d_{i+1,i}| = |d_{i,i+1}| ≤ tol( |d_{ii}| + |d_{i+1,i+1}| )
        Find the largest q and the smallest p such that if

            D = [ D_11    0      0   ]   p
                [  0     D_22    0   ]   n-p-q
                [  0      0     D_33 ]   q
                   p     n-p-q    q

        then D_33 is diagonal and D_22 is unreduced.
        if q < n
            Apply Algorithm 8.3.2 to D_22:
                D = diag(I_p, Z, I_q)^T D diag(I_p, Z, I_q)
            If Q is desired, then Q = Q diag(I_p, Z, I_q).
        end
    end

This algorithm requires about 4n^3/3 flops if Q is not accumulated and
about 9n^3 flops if Q is accumulated.
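
The deflation logic of Algorithm 8.3.3 can be sketched as follows in NumPy for an already tridiagonal input. For brevity each step is carried out as an explicit shifted QR factorization (fact 2 of §8.3.3) rather than the implicit bulge chase, and Q is not accumulated; the names and tolerance handling are illustrative.

    import numpy as np

    def wilkinson_shift(T):
        # Eigenvalue of the trailing 2-by-2 block closest to T[-1, -1]  (cf. (8.3.3)).
        a, b, c = T[-2, -2], T[-1, -2], T[-1, -1]
        d = (a - c) / 2.0
        denom = d + np.sign(d) * np.hypot(d, b) if d != 0 else abs(b)
        return c - b * b / denom if denom != 0 else c

    def symmetric_qr(T, tol=1e-12):
        # Deflation-driven symmetric QR algorithm on a tridiagonal T (cf. Algorithm 8.3.3).
        T = T.copy().astype(float)
        n = T.shape[0]
        hi = n - 1
        while hi > 0:
            for i in range(hi):                         # set negligible subdiagonals to zero
                if abs(T[i+1, i]) <= tol * (abs(T[i, i]) + abs(T[i+1, i+1])):
                    T[i+1, i] = T[i, i+1] = 0.0
            while hi > 0 and T[hi, hi-1] == 0.0:        # deflate converged eigenvalues
                hi -= 1
            if hi == 0:
                break
            lo = hi
            while lo > 0 and T[lo, lo-1] != 0.0:        # find the unreduced block (D22)
                lo -= 1
            B = T[lo:hi+1, lo:hi+1]
            mu = wilkinson_shift(B)
            Q, R = np.linalg.qr(B - mu * np.eye(B.shape[0]))
            T[lo:hi+1, lo:hi+1] = R @ Q + mu * np.eye(B.shape[0])
        return np.sort(np.diag(T))

    T = np.array([[1.0, 2.0, 0.0, 0.0],
                  [2.0, 3.0, 4.0, 0.0],
                  [0.0, 4.0, 5.0, 6.0],
                  [0.0, 0.0, 6.0, 7.0]])
    print(symmetric_qr(T))    # compare with Example 8.3.3
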

Example 8.3.3 Suppose Algorithm 8.3.3 is applied to the tridiagonal matrix

    A = [ 1  2  0  0 ]
        [ 2  3  4  0 ]
        [ 0  4  5  6 ]
        [ 0  0  6  7 ]

The subdiagonal entries change as follows during the execution of Algorithm 8.3.3:

    Iteration     a_21      a_32      a_43
    1             1.6817    3.2344    .8649
    2             1.6142    2.5755    .0006
    3             1.6245    1.6965    10^-13
    4             1.6245    1.6965    converg.
    5             1.5117    .0150
    6             1.1195    10^-9
    7             .7071     converg.
    8             converg.

Upon completion we find λ(A) = { -2.4848, .7046, 4.9366, 12.831 }.

The computed eigenvalues λ̂_i obtained via Algorithm 8.3.3 are the exact
eigenvalues of a matrix that is near to A, i.e., Q̂^T(A + E)Q̂ = diag(λ̂_i)
where Q̂^T Q̂ = I and ||E||_2 ≈ u||A||_2. Using Corollary 8.1.6 we know that
the absolute error in each λ̂_i is small in the sense that |λ̂_i - λ_i| ≈ u||A||_2.
If Q̂ = [ q̂_1, ..., q̂_n ] is the computed matrix of orthonormal eigenvectors,
then the accuracy of q̂_i depends on the separation of λ_i from the remainder
of the spectrum. See Theorem 8.1.12.
If all of the eigenvalues and a few of the eigenvectors are desired, then
it is cheaper not to accumulate Q in Algorithm 8.3.3. Instead, the desired
eigenvectors can be found via inverse iteration with T. See §8.2.2. Usually
just one step is sufficient to get a good eigenvector, even with a random
initial vector.
If just a few eigenvalues and eigenvectors are required, then the special
techniques in §8.5 are appropriate.
It is interesting to note the connection between Rayleigh quotient it-
eration and the symmetric QR algorithm. Suppose we apply the latter
to the tridiagonal matrix T ∈ ℝ^{n×n} with shift σ = e_n^T T e_n = t_nn where
e_n = I_n(:, n). If T - σI = QR, then we obtain T̄ = RQ + σI. From the
equation (T - σI)Q = R^T it follows that

    (T - σI)q_n = r_nn e_n

where q_n is the last column of the orthogonal matrix Q. Thus, if we apply
(8.2.6) with x_0 = e_n, then x_1 = q_n.

8.3.6 Orthogonal Iteration with Ritz Acceleration


Recall from §8.2.4 that an orthogonal iteration step involves a matrix-
matrix product and a QR factorization:

    Z_k = A Q_{k-1}
    Q_k R_k = Z_k    (QR factorization)

Theorem 8.1.14 says that we can minimize ||A Q_k - Q_k S||_F by setting S =
S_k = Q_k^T A Q_k. If U_k^T S_k U_k = D_k is the Schur decomposition of S_k ∈ ℝ^{r×r}
and Q̄_k = Q_k U_k, then

    ||A Q̄_k - Q̄_k D_k||_F = min_{S ∈ ℝ^{r×r}} ||A Q_k - Q_k S||_F,

showing that the columns of Q̄_k are the best possible basis to take after k
steps from the standpoint of minimizing the residual. This defines the Ritz
acceleration idea:

    Q_0 ∈ ℝ^{n×r} given with Q_0^T Q_0 = I_r
    for k = 1, 2, ...
        Z_k = A Q_{k-1}
        Q_k R_k = Z_k    (QR factorization)
        S_k = Q_k^T A Q_k                                  (8.3.6)
        U_k^T S_k U_k = D_k    (Schur decomposition)
        Q̄_k = Q_k U_k
    end

It can be shown that if

    D_k = diag(θ_1^{(k)}, ..., θ_r^{(k)})

then

    |θ_i^{(k)} - λ_i(A)| = O( |λ_{r+1}/λ_i|^k )        i = 1:r.

Recall that Theorem 8.2.2 says the eigenvalues of Q_k^T A Q_k converge with
rate |λ_{r+1}/λ_r|^k. Thus, the Ritz values converge at a more favorable rate.
For details, see Stewart (1969).

Example 8.3.4 If we apply (8.3.6) with r = 2 to a 4-by-4 symmetric matrix A whose
second and third largest eigenvalues are in the ratio 99 to 2, then

    k     dist(D_2(A), ran(Q̄_k))
    0     .2 × 10^-1
    1     .5 × 10^-3
    2     .1 × 10^-4
    3     .3 × 10^-6
    4     .8 × 10^-8

Clearly, convergence is taking place at the rate (2/99)^k.

Problems

P8.3.1 Suppose λ is an eigenvalue of a symmetric tridiagonal matrix T. Show that if
λ has algebraic multiplicity k, then at least k-1 of T's subdiagonal elements are zero.
P8.3.2 Suppose A is symmetric and has bandwidth p. Show that if we perform the
shifted QR step A - μI = QR, Ā = RQ + μI, then Ā has bandwidth p.
P8.3.3 Suppose B ∈ ℝ^{n×n} is upper bidiagonal with diagonal entries d(1:n) and super-
diagonal entries f(1:n-1). State and prove a singular value version of Theorem 8.3.1.
P8.3.4 Let A = [ w  x ; x  z ] be real and suppose we perform the following shifted QR
step: A - zI = UR, Ā = RU + zI. Show that if Ā = [ w̄  x̄ ; x̄  z̄ ] then

    w̄ = w + x^2(w - z)/[(w - z)^2 + x^2]
    z̄ = z - x^2(w - z)/[(w - z)^2 + x^2]
    x̄ = -x^3/[(w - z)^2 + x^2].

P8.3.5 Suppose A ∈ ℂ^{n×n} is Hermitian. Show how to construct unitary Q such that
Q^H A Q = T is real, symmetric, and tridiagonal.

P8.3.6 Show that if A = B + iC is Hermitian, then M = [ B  -C ; C  B ] is symmetric.
Relate the eigenvalues and eigenvectors of A and M.
P8.3.7 Rewrite Algorithm 8.3.2 for the case when A is stored in two n-vectors. Justify
the given flop count.
P8.3.8 Suppose A = S + σuu^T where S ∈ ℝ^{n×n} is skew-symmetric (S^T = -S), u ∈ ℝ^n
has unit 2-norm, and σ ∈ ℝ. Show how to compute an orthogonal Q such that Q^T A Q
is tridiagonal and Q^T u = I_n(:, 1) = e_1.

Notes and References for Sec. 8.3

The tridiagonalization of a symmetric matrix is discussed in

R.S. Martin and J.H. Wilkinson (1968). "Householder's Tridiagonalization of a Sym-


metric Matrix," Numer. Math. 11, 181-95. See also Wilkinson and Reinsch (1971,
pp.212-26).
H.R. Schwartz (1968). "Tridiagonalization of a Symmetric Band Matrix," Numer. Math.
12, 231-41. See also Wilkinson and Reinsch (1971, pp.273-83).
N.E. Gibbs and W.G. Poole, Jr. (1974). "Tridiagonalization by Permutations," Comm.
ACM 17, 2Q-24.

The first two references contain Algol programs. Algol procedures for the explicit and
implicit tridiagonal QR algorithm are given in

H. Bowdler, R.S. Martin, C. Reinsch, and J.H. Wilkinson (1968). "The QR and QL
Algorithms for Symmetric Matrices," Numer. Math. 11, 293-306. See also Wilkinson
and Reinsch (1971, pp.227-40).
A. Dubrulle, R.S. Martin, and J.H. Wilkinson (1968). "The Implicit QL Algorithm,"
Numer. Math. 12, 377-83. See also Wilkinson and Reinsch (1971, pp.241-48).

The "QL" algorithm is identical to the QR algorithm except that at each step the matrix
T- >.I is factored into a product of an orthogonal matrix and a lower triangular matrix.
Other papers concerned with these methods include

G.W. Stewart (1970). "Incorporating Original Shifts into the QR Algorithm for Sym-
metric Tridiagonal Matrices," Comm. ACM 13, 365--67.
A. Dubrulle (1970). "A Short Note on the Implicit QL Algorithm for Symmetric Tridi-
agonal Matrices," Numer. Math. 15, 450.
Extensions to Hermitian and skew-symmetric matrices are described in

D. Mueller (1966). "Householder's Method for Complex Matrices and Hermitian Matri-
ces," Numer. Math. 8, 72-92.
R.C. Ward and L.J. Gray (1978). "Eigensystem Computation for Skew-Symmetric and
A Class of Symmetric Matrices," ACM Trans. Math. Soft. 4, 278-85.

The convergence properties of Algorithm 8.3.3 are detailed in Lawson and Hanson (1974,
Appendix B), as well as in

J.H. Wilkinson (1968b). "Global Convergence of Tridiagonal QR Algorithm With Origin
Shifts," Lin. Alg. and Its Applic. 1, 409-20.
T.J. Dekker and J.F. Traub (1971). "The Shifted QR Algorithm for Hermitian Matrices,"
Lin. Alg. and Its Applic. 4, 137-54.
W. Hoffman and B.N. Parlett (1978). "A New Proof of Global Convergence for the
Tridiagonal QL Algorithm," SIAM J. Num. Anal. 15, 929-37.
S. Batterson (1994). "Convergence of the Francis Shifted QR Algorithm on Normal
Matrices," Lin. Alg. and Its Applic. 207, 181-195.
For an analysis of the method when it is applied to normal matrices see

C.P. Huang {1981). "On the Convergence of the QR Algorithm with Origin Shifts for
Normal Matrices," IMA J. Num. Anal. 1, 127-33.
Interesting papers concerned with shifting in the tridiagonal QR algorithm include

F.L. Bauer and C. Reinsch {1968). "Rational QR Transformations with Newton Shift
for Symmetric Tridiagonal Matrices," Numer. Math. 11, 264-72. See also Wilkinson
and Reinsch {1971, pp.257-65).
G.W. Stewart {1970). "Incorporating Origin Shifts into the QR Algorithm for Symmetric
Tridiagonal Matrices," Comm. Assoc. Comp. Mach. 13, 365-67.
Some parallel computation pOSBibilities for the algorithms in this section are discussed in

S. Lo, B. Philippe, and A. Sameh {1987). "A Multiprocessor Algorithm for the Symmet-
ric Tridiagonal Eigenvalue Problem," SIAM J. Sci. and Stat. Comp. 8, s155-s165.
H.Y. Chang and M. Salama (1988). "A Parallel Householder Tridiagonalization Strategy
Using Scattered Square Decomposition," Parallel Computing 6, 297-312.
Another way to compute a specified subset of eigenvalues is via the rational QR algo-
rithm. In this method, the shift is determined using Newton's method. This makes it
possible to "steer" the iteration towards desired eigenvalues. See

C. Reinsch and F.L. Bauer {1968). "Rational QR Transformation with Newton's Shift
for Symmetric Tridiagonal Matrices," Numer. Math. 11, 264-72. See also Wilkinson
and Reinsch {1971, pp.257-65).
Papers concerned with the symmetric QR algorithm for banded matrices include

R.S. Martin and J.H. Wilkinson (1967). "Solution of Symmetric and Unsymmetric Band
Equations and the Calculation of Eigenvectors of Band Matrices," Numer. Math. 9,
279-301. See also Wilkinson and Reinsch (1971, pp.70-92).
R.S. Martin, C. Reinsch, and J.H. Wilkinson {1970). "The QR Algorithm for Band
Symmetric Matrices," Numer. Math. 16, 85--92. See also Wilkinson and Reinsch
(1971, pp.266-72).

8.4 Jacobi Methods


Jacobi methods for the symmetric eigenvalue problem attract current at-
tention because they are inherently parallel. They work by performing a
sequence of orthogonal similarity updates A <-- QT AQ with the property
that each new A, although full, is "more diagonal" than its predecessor.
Eventually, the off-diagonal entries are small enough to be declared zero.
After surveying the basic ideas behind the Jacobi approach we develop
a parallel Jacobi procedure.

8.4.1 The Jacobi Idea


The idea behind Jacobi's method is to systematically reduce the quantity

    off(A) = sqrt( Σ_{i=1}^n Σ_{j=1, j≠i}^n a_ij^2 ),

i.e., the "norm" of the off-diagonal elements. The tools for doing this are
rotations of the form

                 [ 1  ⋯  0  ⋯  0  ⋯  0 ]
                 [ ⋮     ⋱            ⋮ ]
                 [ 0  ⋯  c  ⋯  s  ⋯  0 ]   p
    J(p, q, θ) = [ ⋮         ⋱        ⋮ ]
                 [ 0  ⋯ -s  ⋯  c  ⋯  0 ]   q
                 [ ⋮             ⋱    ⋮ ]
                 [ 0  ⋯  0  ⋯  0  ⋯  1 ]
                         p      q

which we call Jacobi rotations. Jacobi rotations are no different from Givens
rotations, c.f. §5.1.8. We submit to the name change in this section to honor
the inventor.
The basic step in a Jacobi eigenvalue procedure involves (1) choosing an
index pair (p, q) that satisfies 1 ≤ p < q ≤ n, (2) computing a cosine-sine
pair (c, s) such that

    [ b_pp  b_pq ]     [  c  s ]^T [ a_pp  a_pq ] [  c  s ]
    [ b_qp  b_qq ]  =  [ -s  c ]   [ a_qp  a_qq ] [ -s  c ]        (8.4.1)

is diagonal, and (3) overwriting A with B = J^T A J where J = J(p, q, θ).
Observe that the matrix B agrees with A except in rows and columns p

and q. Moreover, since the Frobenius norm is preserved by orthogonal
transformations we find that

    a_pp^2 + a_qq^2 + 2a_pq^2 = b_pp^2 + b_qq^2 + 2b_pq^2 = b_pp^2 + b_qq^2

and so

    off(B)^2 = ||B||_F^2 - Σ_{i=1}^n b_ii^2                               (8.4.2)
             = ||A||_F^2 - Σ_{i=1}^n a_ii^2 + (a_pp^2 + a_qq^2 - b_pp^2 - b_qq^2)
             = off(A)^2 - 2a_pq^2.

It is in this sense that A moves closer to diagonal form with each Jacobi
step.
Before we discuss how the index pair (p, q) can be chosen, let us look at
the actual computations associated wit h the (p,q) subproblem.

8.4.2 The 2-by-2 Symmetric Schur Decomposition


To say that we diagonalize in (8.4.1) is to say that

    0 = b_pq = a_pq(c^2 - s^2) + (a_pp - a_qq)cs.                         (8.4.3)

If a_pq = 0, then we just set (c, s) = (1, 0). Otherwise define

    τ = (a_qq - a_pp) / (2a_pq)    and    t = s/c

and conclude from (8.4.3) that t = tan(θ) solves the quadratic

    t^2 + 2τt - 1 = 0.

It turns out to be important to select the smaller of the two roots,

    t = -τ ± sqrt(1 + τ^2),

whereupon c and s can be resolved from the formulae

    c = 1/sqrt(1 + t^2),    s = tc.

Choosing t to be the smaller of the two roots ensures that |θ| ≤ π/4 and
has the effect of minimizing the difference between B and A because

    ||B - A||_F^2 = 4(1 - c) Σ_{i=1, i≠p,q}^n (a_ip^2 + a_iq^2) + 2a_pq^2/c^2.

We summarize the 2-by-2 computations as follows:

Algorithm 8.4.1 Given an n-by-n symmetric A and integers p and q that
satisfy 1 ≤ p < q ≤ n, this algorithm computes a cosine-sine pair (c, s)
such that if B = J(p,q,θ)^T A J(p,q,θ) then b_pq = b_qp = 0.

    function: [c, s] = sym.schur2(A, p, q)
        if A(p, q) ≠ 0
            τ = (A(q,q) - A(p,p)) / (2A(p,q))
            if τ ≥ 0
                t = 1/(τ + sqrt(1 + τ^2))
            else
                t = -1/(-τ + sqrt(1 + τ^2))
            end
            c = 1/sqrt(1 + t^2)
            s = tc
        else
            c = 1
            s = 0
        end

8.4.3 The Classical Jacobi Algorithm


As we mentioned above, only rows and columns p and q are altered when
the (p, q) subproblem is solved. Once sym.schur2 determines the 2-by-2
rotation, then the update A ← J(p,q,θ)^T A J(p,q,θ) can be implemented
in 6n flops if symmetry is exploited.
How do we choose the indices p and q? From the standpoint of maxi-
mizing the reduction of off(A) in (8.4.2), it makes sense to choose (p, q) so
that a_pq^2 is maximal. This is the basis of the classical Jacobi algorithm.

Algorithm 8.4.2 (Classical Jacobi) Given a symmetric A ∈ ℝ^{n×n} and
a tolerance tol > 0, this algorithm overwrites A with V^T A V where V is
orthogonal and off(V^T A V) ≤ tol·||A||_F.

    V = I_n;  eps = tol·||A||_F
    while off(A) > eps
        Choose (p, q) so |a_pq| = max_{i≠j} |a_ij|
        (c, s) = sym.schur2(A, p, q)
        A = J(p,q,θ)^T A J(p,q,θ)
        V = V J(p,q,θ)
    end

Since |a_pq| is the largest off-diagonal entry, off(A)^2 ≤ N(a_pq^2 + a_qp^2) where
N = n(n-1)/2. From (8.4.2) it follows that

    off(B)^2 ≤ ( 1 - 1/N ) off(A)^2.

By induction, if A^{(k)} denotes the matrix A after k Jacobi updates, then

    off(A^{(k)})^2 ≤ ( 1 - 1/N )^k off(A^{(0)})^2.


This implies that the classical Jacobi procedure converges at a linear rate.
However, the asymptotic convergence rate of the method is considerably
better than linear. Schonhage (1964) and van Kempen (1966) show that
for k large enough, there is a constant c such that

    off(A^{(k+N)}) ≤ c·off(A^{(k)})^2,

i.e., quadratic convergence. An earlier paper by Henrici (1958) established
the same result for the special case when A has distinct eigenvalues. In
the convergence theory for the Jacobi iteration, it is critical that |θ| ≤ π/4.
Among other things this precludes the possibility of "interchanging" nearly
converged diagonal entries. This follows from the formulae b_pp = a_pp - t·a_pq
and b_qq = a_qq + t·a_pq, which can be derived from equations (8.4.1) and the
definition t = sin(θ)/cos(θ).
It is customary to refer to N Jacobi updates as a sweep. Thus, after
a sufficient number of iterations, quadratic convergence is observed when
examining off(A) after every sweep.

Example 8.4.1 Applying the classical Jacobi iteration to

    A = [ 1   1    1    1 ]
        [ 1   2    3    4 ]
        [ 1   3    6   10 ]
        [ 1   4   10   20 ]

we find

    sweep     O(off(A))
    0         10^2
    1         10^1
    2         10^-2
    3         10^-11
    4         10^-17

There is no rigorous theory that enables one to predict the number of


sweeps that are required to achieve a specified reduction in off(A). However,
Brent and Luk (1985) have argued heuristically that the number of sweeps
is proportional to log(n) and this seems to be the case in practice.

8.4.4 The Cyclic-by-Row Algorithm


The trouble with the classical Jacobi method is that the updates involve
O(n) flops while the search for the optimal (p, q) is O(n 2 ). One way to
address this imbalance is to fix the sequence of subproblems to be solved
in advance. A reasonable possibility is to step through all the subproblems
in row-by-row fashion. For example, if n = 4 we cycle as follows:

(p, q) = (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (1, 2), ...

This ordering scheme is referred to as cyclic-by-row and it results in the


following procedure:

Algorithm 8.4.3 (Cyclic Jacobi) Given a symmetric A ∈ ℝ^{n×n} and
a tolerance tol > 0, this algorithm overwrites A with V^T A V where V is
orthogonal and off(V^T A V) ≤ tol·||A||_F.

    V = I_n
    eps = tol·||A||_F
    while off(A) > eps
        for p = 1:n-1
            for q = p+1:n
                (c, s) = sym.schur2(A, p, q)
                A = J(p, q, θ)^T A J(p, q, θ)
                V = V J(p, q, θ)
            end
        end
    end
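
A compact NumPy sketch of the cyclic method follows, with sym.schur2 transcribed from Algorithm 8.4.1; the test matrix is the one used in Example 8.4.1, and in practice one would of course call a library routine such as numpy.linalg.eigh instead.

    import numpy as np

    def sym_schur2(A, p, q):
        # Algorithm 8.4.1: cosine-sine pair (c, s) that annihilates A[p, q].
        if A[p, q] != 0.0:
            tau = (A[q, q] - A[p, p]) / (2.0 * A[p, q])
            t = 1.0 / (tau + np.sqrt(1.0 + tau*tau)) if tau >= 0 else -1.0 / (-tau + np.sqrt(1.0 + tau*tau))
            c = 1.0 / np.sqrt(1.0 + t*t)
            return c, t * c
        return 1.0, 0.0

    def off(A):
        return np.sqrt(np.sum(A**2) - np.sum(np.diag(A)**2))

    def cyclic_jacobi(A, tol=1e-12):
        # Algorithm 8.4.3: sweep through the subproblems (p, q) in row-by-row order.
        A = A.copy().astype(float)
        n = A.shape[0]
        V = np.eye(n)
        eps = tol * np.linalg.norm(A, 'fro')
        while off(A) > eps:
            for p in range(n - 1):
                for q in range(p + 1, n):
                    c, s = sym_schur2(A, p, q)
                    J = np.eye(n)
                    J[p, p] = c;  J[p, q] = s
                    J[q, p] = -s; J[q, q] = c
                    A = J.T @ A @ J
                    V = V @ J
        return np.diag(A), V

    A = np.array([[1.0, 1.0,  1.0,  1.0],
                  [1.0, 2.0,  3.0,  4.0],
                  [1.0, 3.0,  6.0, 10.0],
                  [1.0, 4.0, 10.0, 20.0]])
    evals, V = cyclic_jacobi(A)
    print(np.sort(evals))
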

The cyclic Jacobi method also converges quadratically. (See Wilkinson (1962) and van
Kempen (1966).) However, since it does not require off-diagonal search, it
is considerably faster than Jacobi's original algorithm.

Example 8.4.2 If the cyclic Jacobi method is applied to the matrix in Example 8.4.1
we find

    Sweep     O(off(A))
    0         10^2
    1         10^1
    2         10^-1
    3         10^-6
    4         10^-16

8.4.5 Error Analysis


Using Wilkinson's error analysis it is possible to show that if r sweeps are
needed in Algorithm 8.4.3 then the computed d_i satisfy

    [ Σ_{i=1}^n (d_i - λ_i)^2 ]^{1/2} ≤ (8 + k_r) ||A||_F u

for some ordering of A's eigenvalues λ_i. The parameter k_r depends mildly
on r.
Although the cyclic Jacobi method converges quadratically, it is not
generally competitive with the symmetric QR algorithm. For example, if
we just count flops, then 2 sweeps of Jacobi is roughly equivalent to a com-
plete QR reduction to diagonal form with accumulation of transformations.
However, for small n this liability is not very dramatic. Moreover, if an ap-
proximate eigenvector matrix V is known, then vr AV is almost diagonal,
a situation that Jacobi can exploit but not QR.
Another interesting feature of the Jacobi method is that it can compute
the eigenvalues with small relative error if A is positive definite. To
appreciate this point, note that the Wilkinson analysis cited above cou-
pled with the §8.1 perturbation theory ensures that the computed eigenvalues
λ̂_1 ≥ ⋯ ≥ λ̂_n satisfy

    |λ̂_i - λ_i(A)| / λ_i(A)  ≈  u·||A||_2 / λ_i(A)  ≤  u·κ_2(A).

However, a refined, componentwise error analysis by Demmel and Veselić
(1992) shows that in the positive definite case,

    |λ̂_i - λ_i(A)| / λ_i(A)  ≈  u·κ_2(D^{-1}AD^{-1})                  (8.4.4)

where D = diag(√a_11, ..., √a_nn) and this is generally a much smaller ap-
proximating bound. The key to establishing this result is some new pertur-
bation theory and a demonstration that if A_+ is a computed Jacobi update
obtained from the current matrix A_c, then the eigenvalues of A_+ are rel-
atively close to the eigenvalues of A_c in the sense of (8.4.4). To make the
whole thing work in practice, the termination criterion is not based upon
the comparison of off(A) with u||A||_F but rather on the size of each |a_ij|
compared to u·sqrt(a_ii a_jj). This work is typical of a new genre of research con-
cerned with high-accuracy algorithms based upon careful, componentwise
error analysis. See Mathias (1995).

8.4.6 Parallel Jacobi


Perhaps the most interesting distinction between the QR and Jacobi ap-
proaches to the symmetric eigenvalue problem is the rich inherent paral-
lelism of the latter algorithm. To illustrate this, suppose n = 4 and group
the six subproblems into three rotation sets as follows:

    rot.set(1) = {(1,2), (3,4)}
    rot.set(2) = {(1,3), (2,4)}
    rot.set(3) = {(1,4), (2,3)}

Note that all the rotations within each of the three rotation sets are "non-
conflicting." That is, subproblems (1,2) and (3,4) can be carried out in
parallel. Likewise the (1,3) and (2,4) subproblems can be executed in par-
allel as can subproblems (1,4) and (2,3). In general, we say that the sequence

    (i_1, j_1), (i_2, j_2), ..., (i_N, j_N),        N = (n-1)n/2,

is a parallel ordering of the set {(i,j) : 1 ≤ i < j ≤ n} if for s = 1:n-1
the rotation set rot.set(s) = { (i_r, j_r) : r = 1 + n(s-1)/2 : ns/2 } consists
of nonconflicting rotations. This requires n to be even, which we assume
throughout this section. (The odd n case can be handled by bordering
A with a row and column of zeros and being careful when solving the
subproblems that involve these augmented zeros.)
A good way to generate a parallel ordering is to visualize a chess tourna-
ment with n players in which everybody must play everybody else exactly
once. In the n = 8 case this entails 7 "rounds." During round one we have
the following four games:

    1  3  5  7
    2  4  6  8        rot.set(1) = {(1,2), (3,4), (5,6), (7,8)}

i.e., 1 plays 2, 3 plays 4, etc. To set up rounds 2 through 7, player 1 stays
put and players 2 through 8 embark on a merry-go-round:

    1  2  3  5
    4  6  8  7        rot.set(2) = {(1,4), (2,6), (3,8), (5,7)}

    1  4  2  3
    6  8  7  5        rot.set(3) = {(1,6), (4,8), (2,7), (3,5)}

    1  6  4  2
    8  7  5  3        rot.set(4) = {(1,8), (6,7), (4,5), (2,3)}

    1  7  8  6
    7  5  3  2        rot.set(5) = {(1,7), (5,8), (3,6), (2,4)}

    1  7  8  6
    5  3  2  4        rot.set(6) = {(1,5), (3,7), (2,8), (4,6)}

    1  5  7  8
    3  2  4  6        rot.set(7) = {(1,3), (2,5), (4,7), (6,8)}

We can encode these operations in a pair of integer vectors top(1:n/2) and
bot(1:n/2). During a given round top(k) plays bot(k), k = 1:n/2. The
pairings for the next round are obtained by updating top and bot as follows:

    function: [new.top, new.bot] = music(top, bot, n)
        m = n/2
        for k = 1:m
            if k = 1
                new.top(k) = 1
            elseif k = 2
                new.top(k) = bot(1)
            elseif k > 2
                new.top(k) = top(k-1)
            end
            if k = m
                new.bot(k) = top(k)
            else
                new.bot(k) = bot(k+1)
            end
        end
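
The round-robin scheme is easy to express in executable form. The following Python transcription of music (0-based, names illustrative) regenerates the n = 8 rotation sets listed above.

    def music(top, bot):
        # One merry-go-round move: the first top player stays put, everyone else rotates.
        m = len(top)
        new_top, new_bot = [0] * m, [0] * m
        for k in range(m):
            if k == 0:
                new_top[k] = top[0]
            elif k == 1:
                new_top[k] = bot[0]
            else:
                new_top[k] = top[k - 1]
            new_bot[k] = top[k] if k == m - 1 else bot[k + 1]
        return new_top, new_bot

    n = 8
    top, bot = list(range(1, n, 2)), list(range(2, n + 1, 2))   # [1,3,5,7], [2,4,6,8]
    for s in range(1, n):
        rot_set = sorted((min(t, b), max(t, b)) for t, b in zip(top, bot))
        print("rot.set(%d) =" % s, rot_set)
        top, bot = music(top, bot)
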

Using music we obtain the following parallel order Jacobi procedure.

Algorithm 8.4.4 (Parallel Order Jacobi) Given a symmetric A ∈ ℝ^{n×n}
and a tolerance tol > 0, this algorithm overwrites A with V^T A V where V
is orthogonal and off(V^T A V) ≤ tol·||A||_F. It is assumed that n is even.

    V = I_n
    eps = tol·||A||_F
    top = 1:2:n;  bot = 2:2:n
    while off(A) > eps
        for set = 1:n-1
            for k = 1:n/2
                p = min(top(k), bot(k))
                q = max(top(k), bot(k))
                (c, s) = sym.schur2(A, p, q)
                A = J(p, q, θ)^T A J(p, q, θ)
                V = V J(p, q, θ)
            end
            [top, bot] = music(top, bot, n)
        end
    end

Notice that the k-loop steps through n/2 independent, nonconflicting sub-
problems.

8.4.7 A Ring Procedure


We now discuss how Algorithm 8.4.4 could be implemented on a ring of p
processors. We assume that p = n/2 for clarity. At any instant, Proc(μ)
houses two columns of A and the corresponding V columns. For example,
if n = 8 then here is how the column distribution of A proceeds from step
to step:

              Proc(1)   Proc(2)   Proc(3)   Proc(4)

    Step 1:   [1 2]     [3 4]     [5 6]     [7 8]
    Step 2:   [1 4]     [2 6]     [3 8]     [5 7]
    Step 3:   [1 6]     [4 8]     [2 7]     [3 5]
    etc.

The ordered pairs denote the indices of the housed columns. The first index
names the left column and the second index names the right column. Thus,
the left and right columns in Proc(3) during step 3 are 2 and 7 respectively.
Note that in between steps, the columns are shuffled according to the
permutation implicit in music and that nearest neighbor communication
prevails. At each step, each processor oversees a single subproblem. This
involves (a) computing an orthogonal V_small ∈ ℝ^{2×2} that solves a local 2-
by-2 Schur problem, (b) using the 2-by-2 V_small to update the two housed
columns of A and V, (c) sending the 2-by-2 V_small to all the other proces-
sors, and (d) receiving the V_small matrices from the other processors and
updating the local portions of A and V accordingly. Since A is stored by
column, communication is necessary to carry out the V_small updates be-
cause they affect rows of A. For example, in the second step of the n = 8
problem, Proc(2) must receive the 2-by-2 rotations associated with sub-
problems (1,4), (3,8), and (5,7). These come from Proc(1), Proc(3), and
Proc(4) respectively. In general, the sharing of the rotation matrices can
be conveniently implemented by circulating the 2-by-2 V_small matrices in
"merry go round" fashion around the ring. Each processor copies a pass-
ing 2-by-2 V_small into its local memory and then appropriately updates the
locally housed portions of A and V.
The termination criterion in Algorithm 8.4.4 poses something of a prob-
lem in a distributed memory environment in that the value of off(·) and
||A||_F require access to all of A. However, these global quantities can be
computed during the V matrix merry-go-round phase. Before the circu-
lation of the V's begins, each processor can compute its contribution to
||A||_F and off(·). These quantities can then be summed by each processor
if they are placed on the merry-go-round and read at each stop. By the
end of one revolution each processor has its own copy of ||A||_F and off(·).

8.4.8 Block Jacobi Procedures


It is usually the case when solving the symmetric eigenvalue problem on a
p-processor machine that n ≫ p. In this case a block version of the Jacobi
algorithm may be appropriate. Block versions of the above procedures are
straightforward. Suppose that n = rN and that we partition the n-by-n
matrix A as follows:

    A = [ A_11  ⋯  A_1N ]
        [  ⋮          ⋮  ]
        [ A_N1  ⋯  A_NN ]

Here, each A_ij is r-by-r. In block Jacobi the (p, q) subproblem involves
computing the 2r-by-2r Schur decomposition

    [ V_pp  V_pq ]^T [ A_pp  A_pq ] [ V_pp  V_pq ]     [ D_pp    0   ]
    [ V_qp  V_qq ]   [ A_qp  A_qq ] [ V_qp  V_qq ]  =  [   0    D_qq ]

and then applying to A the block Jacobi rotation made up of the V_ij. If
we call this block rotation V then it is easy to show that

    off(V^T A V)^2 = off(A)^2 - 2||A_pq||_F^2 - off(A_pp)^2 - off(A_qq)^2.

Block Jacobi procedures have many interesting computational aspects. For
example, there are many ways to solve the subproblems and the choice
appears to be critical. See Bischof (1987).

Problems

P8.4.1 Let the scalar γ be given along with the matrix

    A = [ w  x ]
        [ x  z ].

It is desired to compute an orthogonal matrix

    J = [  c  s ]
        [ -s  c ]

such that the (1,1) entry of J^T A J equals γ. Show that this requirement leads to the
equation
    (w - γ)τ^2 - 2xτ + (z - γ) = 0,
where τ = c/s. Verify that this quadratic has real roots if γ satisfies λ_2 ≤ γ ≤ λ_1, where
λ_1 and λ_2 are the eigenvalues of A.
P8.4.2 Let A ∈ ℝ^{n×n} be symmetric. Give an algorithm that computes the factorization
    Q^T A Q = γI + F
where Q is a product of Jacobi rotations, γ = trace(A)/n, and F has zero diagonal
entries. Discuss the uniqueness of Q.
P8.4.3 Formulate Jacobi procedures for (a) skew-symmetric matrices and (b) complex
Hermitian matrices.
P8.4.4 Partition the n-by-n real symmetric matrix A as follows:

    A = [ α   v^T ]  1
        [ v   A_1 ]  n-1
          1   n-1

Let Q be a Householder matrix such that if B = Q^T A Q, then B(3:n, 1) = 0. Let
J = J(1, 2, θ) be determined such that if C = J^T B J, then c_12 = 0 and c_11 ≥ c_22. Show
c_11 ≥ α + ||v||_2. La Budde (1964) formulated an algorithm for the symmetric eigenvalue
problem based upon repetition of this Householder-Jacobi computation.
P8.4.5 Organize function music so that it involves minimum workspace.
P8.4.6 When implementing cyclic Jacobi, it is sensible to skip the annihilation of apq
if its modulus is less than some small, sweep-dependent parameter, because the net re-
duction in off( A) is not worth the cost. This leads to what is called the threshold Jacobi
method. Details concerning this variant of Jacobi's algorithm may be found in Wilkinson
(1965, p.277). Show that appropriate thresholding can guarantee convergence.

Notes and References for Sec. 8.4

Jacobi's original paper is one of the earliest references found in the numerical analysis
literature

C.G.J. Jacobi (1846). "Über ein leichtes Verfahren, die in der Theorie der Säcularstörun-
gen vorkommenden Gleichungen numerisch aufzulösen," Crelle's J. 30, 51-94.
Prior to the QR algorithm, the Jacobi technique was the standard method for solving
dense symmetric eigenvalue problems. Early attempts to improve upon it include

M. Lotkin (1956). "Characteristic Values of Arbitrary Matrices," Quart. Appl. Math.


Ll, 267-75.
D.A. Pope and C. Tompkins (1957). "Maximizing Functions of Rotations: Experiments
Concerning Speed ofDiagonalization of Symmetric Matrices Using Jacobi's Method,"
J. ACM ,t, 459-66.
C.D. La Budde (1964). "Two Classes of Algorithms for Finding the Eigenvalues and
Eigenvectors of Real Symmetric Matrices," J. ACM 11, 53-58.
The computational aspects of Jacobi method are described in Wilkinson (1965,p.265).
See also

H. Rutishauser (1966). "The Jacobi Method for Real Symmetric Matrices," Numer.
Math. 9, 1-10. See also Wilkinson and Reinsch (1971, pp. 202-11).
N. Mackey (1995). "Hamilton and Jacobi Meet Again: Quaternions and the Eigenvalue
Problem," SIAM J. Matrix Anal. Applic. 16, 421-435.
The method is also useful when a nearly diagonal matrix must be diagonalized. See

J .H. Wilkinson (1968). "Almost Diagonal Matrices with Multiple or Close Eigenvalues,"
Lin. Alg. and Its Applic. I, 1-12.
Establishing the quadratic convergence of the classical and cyclic Jacobi iterations has
attracted much attention:

P. Henrici (1958). "On the Speed of Convergence of Cyclic and Quasicyclic Jacobi
Methods for Computing the Eigenvalues of Hermitian Matrices," SIAM J. Appl.
Math. 6, 144-62.
E.R. Hansen (1962). "On Quasicyclic Jacobi Methods," ACM J. 9, lls--35.

J.H. Wilkinson (1962). "Note on the Quadratic Convergence of the Cyclic Jacobi Pro-
cess," Numer. Math. 6, 296-300.
E.R. Hansen (1963). "On Cyclic Jacobi Methods," SIAM J. Appl. Math. 11, 448-59.
A. Schonhage (1964). "On the Quadratic Convergence of the Jacobi Process," Numer.
Math. 6, 41o-12.
H.P.M. van Kempen (1966). "On Quadratic Convergence of the Special Cyclic Jacobi
Method," Numer. Math. 9, 19-22.
P. Henrici and K. Zimmermann (1968). "An Estimate for the Nonns of Certain Cyclic
Jacobi Operators," Lin. Alg. and Its Applic. 1, 489-501.
K.W. Brodlie and M.J.D. Powell (1975). "On the Convergence of Cyclic Jacobi Meth-
ods," J. Inst. Math. Applic. 15, 279-87.
Detailed error analyses that establish important componentwise error bounds include

J. Barlow and J. Demmel (1990). "Computing Accurate Eigensystems of Scaled Diago-
nally Dominant Matrices," SIAM J. Numer. Anal. 27, 762-791.
J.W. Demmel and K. Veselić (1992). "Jacobi's Method is More Accurate than QR,"
SIAM J. Matrix Anal. Appl. 13, 1204-1245.
Z. Drmač (1994). The Generalized Singular Value Problem, Ph.D. Thesis, FernUniver-
sität, Hagen, Germany.
W.F. Mascarenhas (1994). "A Note on Jacobi Being More Accurate than QR," SIAM
J. Matrix Anal. Appl. 15, 215-218.
R. Mathias (1995). "Accurate Eigensystem Computations by Jacobi Methods," SIAM
J. Matrix Anal. Appl. 16, 977-1003.
Attempts have been made to extend the Jacobi iteration to other classes of matrices
and to push through corresponding convergence results. The case of normal matrices is
discussed in

H.H. Goldstine and L.P. Horowitz (1959). "A Procedure for the Diagonalization of
Normal Matrices," J. Assoc. Comp. Mach. 6, 176-95.
G. Loizou (1972). "On the Quadratic Convergence of the Jacobi Method for Normal
Matrices," Comp. J. 15, 274-76.
A. Ruhe (1972). "On the Quadratic Convergence of the Jacobi Method for Normal
Matrices," BIT 7, 305-13.
See also

M.H.C. Paardekooper (1971). "An Eigenvalue Algorithm for Skew Symmetric Matrices,"
Numer. Math. 17, 189-202.
D. Hacon (1993). "Jacobi's Method for Skew-Symmetric Matrices," SIAM J. Matrix
AnaL Appl. 14, 619-628.
Essentially, the analysis and algorithmic developments presented in the text carry over
to the normal case with minor modification. For non-normal matrices, the situation iB
considerably more difficult. Consult

J. Greenstadt (1955). "A Method for Finding Roots of Arbitrary Matrices," Math.
Tables and Other Aids to Comp. 9, 47-52.
C.E. Fröberg (1965). "On Triangularization of Complex Matrices by Two Dimensional
Unitary Transformations," BIT 5, 230-34.
J. Boothroyd and P.J. Eberlein (1968). "Solution to the Eigenproblem by a Norm-
Reducing Jacobi-Type Method (Handbook)," Numer. Math. 11, 1-12. See also
Wilkinson and Reinsch (1971, pp.327-38).
A. Ruhe (1968). "On the Quadratic Convergence of a Generalization of the Jacobi Method
to Arbitrary Matrices," BIT 8, 210-31.
A. Ruhe (1969). "The Norm of a Matrix After a Similarity Transformation," BIT 9,
53-58.

P.J. Eberlein (1970). "Solution to the Complex Eigenproblem by a Norm-Reducing


Jacobi-type Method," Numer. Math. 14, 232--45. See also Wilkinson and Reinsch
(1971, pp.404-17).
C.P. Huang (1975). "A Jacobi-Type Method for Triangularizing an Arbitrary Matrix,"
SIAM J. Num. Anal. 12, 56&-70.
V. Hari ( 1982). "On the Global Convergence of the Eberlein Method for Real Matrices,"
Numer. Math. 39, 361-370.
G.W. Stewart (1985). "A Jacobi-Like Algorithm for Computing the Schur Decomposi-
tion of a Nonhermitian Matrix," SIAM J. Sci. and Stat. Comp. 6, 853-862.
W-W. Lin and C.W. Chen (1991). "An Acceleration Method for Computing the Gen-
eralized Eigenvalue Problem on a Parallel Computer," Lin.Alg. and Its Applic. 146,
49--£5.

Jacobi methods for complex symmetric matrices have also been developed. See

J.J. Seaton (1969). "Diagonalization of Complex Symmetric Matrices Using a Modified


Jacobi Method," Comp. J. 12, 15&-57.
P.J. Eberlein (1971). "On the Diagonalization of Complex Symmetric Matrices," J. Inst.
Math. Applic. 7, 377-83.
P. Anderson and G. Loizou (1973). "On the Quadratic Convergence of an Algorithm
Which Diagona.lizes a Complex Symmetric Matrix," J. Inst. Math. Applic. 12,
261-71.
P. Anderson and G. Loizou (1976). "A Jacobi-Type Method for Complex Symmetric
Matrices (Handbook)," Numer. Math. 25, 347~3.

Although the symmetric QR algorithm is generally much faster than the Jacobi
method, there are special settings where the latter technique is of interest. As we illus-
trated, on a parallel computer it is possible to perform several rotations concurrently,
thereby accelerating the reduction of the off-diagonal elements. See

A. Sameh (1971). "On Jacobi and Jacobi-like Algorithms for a Parallel Computer,"
Math. Comp. 25, 579-90.
J.J. Modi and J.D. Pryce (1985). "Efficient Implementation of Jacobi's Diagona.lization
Method on the DAP," Numer. Math. 46, 443-454.
D.S. Scott, M.T. Heath, and R.C. Wacd (1986). "Parallel Block Jacobi Eigenvalue
Algorithms Using Systolic Arrays," Lin. Alg. and Its Applic. 77, 345-356.
P.J. Eberlein (1987). "On Using the Jacobi Method on a Hypercube," in Hypercube
Multiprocessors, ed. M.T. Heath, SIAM Publications, Philadelphia.
G. Shroff and R. Schreiber (1989). "On the Convergence of the Cyclic Jacobi Method
for Parallel Block Orderings," SIAM J. Matrix Anal. Appl. 10, 326-346.
M.H.C. Paardekooper (1991). "A Quadratically Convergent Parallel Jacobi Process
for Diagonally Dominant Matrices with Nondistinct Eigenvalues," Lin. Alg. and Its
Applic. 145, 71-88.

8.5 Tridiagonal Methods


In this section we develop special methods for the symmetric tridiagonal
eigenproblem. The tridiagonal form

    T = [ a_1   b_1                 0     ]
        [ b_1   a_2    ⋱                  ]
        [        ⋱     ⋱     b_{n-1}      ]        (8.5.1)
        [ 0          b_{n-1}    a_n       ]

can be obtained by Householder reduction (cf. §8.3.1) . However, symmetric


tridiagonal eigenproblems arise naturally in many settings.
We first discuss bisection methods that are of interest when selected
portions of the eigensystem are required. T his is followed by t he presen-
tation of a divide and conquer algorithm that can be used to acquire the
full symmetric Schur decomposition in a way that is amenable to parallel
processing.

8.5.1 Eigenvalues by Bisection


Let T_r denote the leading r-by-r principal submatrix of the matrix T in
(8.5.1). Define the polynomials p_r(x) = det(T_r - xI), r = 1:n. A simple
determinantal expansion shows that

    p_r(x) = (a_r - x)p_{r-1}(x) - b_{r-1}^2 p_{r-2}(x)            (8.5.2)

for r = 2:n if we set p_0(x) = 1. Because p_n(x) can be evaluated in O(n)
flops, it is feasible to find its roots using the method of bisection. For
example, if p_n(y)p_n(z) < 0 and y < z, then the iteration

    while |y - z| > ε(|y| + |z|)
        x = (y + z)/2
        if p_n(x)p_n(y) < 0
            z = x
        else
            y = x
        end
    end

is guaranteed to terminate with (y + z)/2 an approximate zero of p_n(x),
i.e., an approximate eigenvalue of T. The iteration converges linearly in
that the error is approximately halved at each step.
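
The recurrence (8.5.2) and the bisection loop translate directly into code. The following NumPy sketch evaluates p_n and refines a bracketing interval; the bracket [1, 2.5] for the matrix of Example 8.5.1 and all names are illustrative.

    import numpy as np

    def char_poly(a, b, x):
        # Evaluate p_n(x) for the tridiagonal matrix (8.5.1) via the recurrence (8.5.2).
        p_prev, p = 1.0, a[0] - x                 # p_0 and p_1
        for r in range(1, len(a)):
            p_prev, p = p, (a[r] - x) * p - b[r-1]**2 * p_prev
        return p

    def bisect_root(a, b, y, z, eps=1e-12):
        # Assumes p_n(y) * p_n(z) < 0; halves the interval until it is relatively small.
        py = char_poly(a, b, y)
        while abs(y - z) > eps * (abs(y) + abs(z)):
            x = (y + z) / 2.0
            if char_poly(a, b, x) * py < 0.0:
                z = x
            else:
                y, py = x, char_poly(a, b, x)
        return (y + z) / 2.0

    a = np.array([1.0, 2.0, 3.0, 4.0])            # diagonal of the Example 8.5.1 matrix
    b = np.array([-1.0, -1.0, -1.0])              # subdiagonal
    print(bisect_root(a, b, 1.0, 2.5))            # approximates the eigenvalue near 1.82
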

8.5.2 Sturm Sequence Methods


Sometimes it is necessary to compute the kth largest eigenvalue of T for
some prescribed value of k. This can be done efficiently by using the bisec-
tion idea and the following classical result:
Theorem 8.5.1 (Sturm Sequence Property) If the tridiagonal matrix
in (8.5.1) has no zero subdiagonal entries, then the eigenvalues of T_{r-1}
strictly separate the eigenvalues of T_r:

    λ_r(T_r) < λ_{r-1}(T_{r-1}) < λ_{r-1}(T_r) < ⋯ < λ_1(T_{r-1}) < λ_1(T_r).

Moreover, if a(λ) denotes the number of sign changes in the sequence

    {p_0(λ), p_1(λ), ..., p_n(λ)}

then a(λ) equals the number of T's eigenvalues that are less than λ. Here,
the polynomials p_r(x) are defined by (8.5.2) and we have the convention
that p_r(λ) has the opposite sign of p_{r-1}(λ) if p_r(λ) = 0.
Proof. It follows from Theorem 8.1.7 that the eigenvalues of T_{r-1} weakly
separate those of T_r. To prove that the separation must be strict, suppose
that p_r(μ) = p_{r-1}(μ) = 0 for some r and μ. It then follows from (8.5.2)
and the assumption that T is unreduced that p_0(μ) = p_1(μ) = ⋯ = p_r(μ)
= 0, a contradiction. Thus, we must have strict separation.
The assertion about a(λ) is established in Wilkinson (1965, 300-301).
We mention that if p_r(λ) = 0, then its sign is assumed to be opposite the
sign of p_{r-1}(λ). □

Example 8.5.1 If

    T = [  1  -1   0   0 ]
        [ -1   2  -1   0 ]
        [  0  -1   3  -1 ]
        [  0   0  -1   4 ]

then λ(T) ≈ {.254, 1.82, 3.18, 4.74}. The sequence

    {p_0(2), p_1(2), p_2(2), p_3(2), p_4(2)} = { 1, -1, -1, 0, 1 }

confirms that there are two eigenvalues less than λ = 2.

Suppose we wish to compute λ_k(T). From the Gershgorin theorem
(Theorem 8.1.3) it follows that λ_k(T) ∈ [y, z] where

    y = min_{1≤i≤n} ( a_i - |b_i| - |b_{i-1}| )        z = max_{1≤i≤n} ( a_i + |b_i| + |b_{i-1}| )

if we define b_0 = b_n = 0. With these starting values, it is clear from the
Sturm sequence property that the iteration

    while |z - y| > u(|y| + |z|)
        x = (y + z)/2
        if a(x) ≥ n - k                                 (8.5.3)
            z = x
        else
            y = x
        end
    end

produces a sequence of subintervals that are repeatedly halved in length
but which always contain λ_k(T).
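
A short NumPy sketch of the Sturm-count approach follows. For clarity it brackets the k-th smallest eigenvalue (the k-th largest eigenvalue targeted by (8.5.3) corresponds to replacing k by n - k + 1); the Gershgorin bounds supply the starting interval, and the names are illustrative.

    import numpy as np

    def sturm_count(a, b, lam):
        # a(lam) of Theorem 8.5.1: the number of eigenvalues of T less than lam,
        # counted as sign changes in {p_0(lam), ..., p_n(lam)}.
        n = len(a)
        seq = [1.0, a[0] - lam]
        for r in range(1, n):
            seq.append((a[r] - lam) * seq[-1] - b[r-1]**2 * seq[-2])
        changes, prev_sign = 0, 1
        for p in seq[1:]:
            sign = -prev_sign if p == 0.0 else (1 if p > 0.0 else -1)
            if sign != prev_sign:
                changes += 1
            prev_sign = sign
        return changes

    def kth_smallest_eig(a, b, k, tol=1e-12):
        # Bisection on the Gershgorin interval; keeps the k-th smallest eigenvalue in [y, z].
        b_ext = np.concatenate(([0.0], np.abs(b), [0.0]))
        y = np.min(a - b_ext[:-1] - b_ext[1:])
        z = np.max(a + b_ext[:-1] + b_ext[1:])
        while abs(z - y) > tol * (abs(y) + abs(z)):
            x = (y + z) / 2.0
            if sturm_count(a, b, x) >= k:
                z = x
            else:
                y = x
        return (y + z) / 2.0

    a = np.array([1.0, 2.0, 3.0, 4.0])       # Example 8.5.1
    b = np.array([-1.0, -1.0, -1.0])
    print([kth_smallest_eig(a, b, k) for k in range(1, 5)])   # ~ .254, 1.82, 3.18, 4.74
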

Example 8.5.2 If (8.5.3) is applied to the matrix of Example 8.5.1 with k = 3, then
the values shown in the following table are generated:

    y        z        x        a(x)
    0.0000   5.0000   2.5000   2
    0.0000   2.5000   1.2500   1
    1.2500   2.5000   1.3750   1
    1.3750   2.5000   1.9375   2
    1.3750   1.9375   1.6563   1
    1.6563   1.9375   1.7969   1

We conclude from the output that λ_3(T) ∈ [1.7969, 1.9375]. Note: λ_3(T) ≈ 1.82.

During the execution of (8.5.3), information about the location of other
eigenvalues is obtained. By systematically keeping track of this informa-
tion it is possible to devise an efficient scheme for computing "contiguous"
subsets of λ(T), e.g., λ_k(T), λ_{k+1}(T), ..., λ_{k+j}(T). See Barth, Martin, and
Wilkinson (1967).
If selected eigenvalues of a general symmetric matrix A are desired,
then it is necessary first to compute the tridiagonalization T = U_0^T A U_0
before the above bisection schemes can be applied. This can be done using
Algorithm 8.3.1 or by the Lanczos algorithm discussed in the next chapter.
In either case, the corresponding eigenvectors can be readily found via
inverse iteration since tridiagonal systems can be solved in O(n) flops. See
§4.3.6 and §8.2.2.
In those applications where the original matrix A already has tridiagonal
form, bisection computes eigenvalues with small relative error, regardless of
their magnitude. This is in contrast to the tridiagonal QR iteration, where
the computed eigenvalues λ̂_i can be guaranteed only to have small absolute
error: |λ̂_i - λ_i(T)| ≈ u||T||_2.
Finally, it is possible to compute specific eigenvalues of a symmetric ma-
trix by using the LDL^T factorization (see §4.2) and exploiting the Sylvester
inertia theorem (Theorem 8.1.17). If

    A - μI = LDL^T

is the LDL^T factorization of A - μI with D = diag(d_1, ..., d_n), then the
number of negative d_i equals the number of λ_i(A) that are less than μ. See
Parlett (1980, p.46) for details.

8.5.3 Eigensystems of Diagonal Plus R.a.nk-1 Matrices


Our next method for the symmetric tridiagonal eigenproblem requires that
we be able to compute efficiently the eigenvalues and eigenvectors of a
matrix of the form D + ρzz^T where D ∈ ℝ^{n×n} is diagonal, z ∈ ℝ^n, and
ρ ∈ ℝ. This problem is important in its own right and the key computations
rest upon the following pair of results.

Lemma 8.5.2 Suppose D = diag(d_1, ..., d_n) ∈ ℝ^{n×n} has the property that
d_1 > ⋯ > d_n. Assume that ρ ≠ 0 and that z ∈ ℝ^n has no zero compo-
nents. If
    (D + ρzz^T)v = λv,    v ≠ 0,
then z^T v ≠ 0 and D - λI is nonsingular.

Proof. If λ ∈ λ(D), then λ = d_i for some i and thus

    0 = e_i^T[ (D - λI)v + ρ(z^T v)z ] = ρ(z^T v)z_i.

Since ρ and z_i are nonzero we must have 0 = z^T v and so Dv = λv. How-
ever, D has distinct eigenvalues and therefore v ∈ span{e_i}. But then
0 = z^T v = z_i, a contradiction. Thus, D and D + ρzz^T do not have any
common eigenvalues and z^T v ≠ 0. □

Theorem 8.5.3 Suppose D = diag(d_1, ..., d_n) ∈ ℝ^{n×n} and that the diag-
onal entries satisfy d_1 > ⋯ > d_n. Assume that ρ ≠ 0 and that z ∈ ℝ^n has
no zero components. If V ∈ ℝ^{n×n} is orthogonal such that

    V^T(D + ρzz^T)V = diag(λ_1, ..., λ_n)

with λ_1 ≥ ⋯ ≥ λ_n and V = [ v_1, ..., v_n ], then

(a) The λ_i are the n zeros of f(λ) = 1 + ρz^T(D - λI)^{-1}z.

(b) If ρ > 0, then λ_1 > d_1 > λ_2 > ⋯ > d_n.
    If ρ < 0, then d_1 > λ_1 > d_2 > ⋯ > d_n > λ_n.

(c) The eigenvector v_i is a multiple of (D - λ_iI)^{-1}z.

Proof. If (D + ρzz^T)v = λv, then

    (D - λI)v + ρ(z^T v)z = 0.                                (8.5.4)



We know from Lemma 8.5.2 that D - λI is nonsingular. Thus,

    v ∈ span{ (D - λI)^{-1}z },

thereby establishing (c). Moreover, if we apply z^T(D - λI)^{-1} to both sides
of equation (8.5.4) we obtain

    z^T v ( 1 + ρz^T(D - λI)^{-1}z ) = 0.

By Lemma 8.5.2, z^T v ≠ 0 and so this shows that if λ ∈ λ(D + ρzz^T), then
f(λ) = 0. We must show that all the zeros of f are eigenvalues of D + ρzz^T
and that the interlacing relations (b) hold.
To do this we look more carefully at the equations

    f(λ)  = 1 + ρ Σ_{j=1}^n z_j^2/(d_j - λ)

    f'(λ) = ρ Σ_{j=1}^n z_j^2/(d_j - λ)^2.

Note that f is monotone in between its poles. This allows us to conclude
that if ρ > 0, then f has precisely n roots, one in each of the intervals

    (d_1, d_1 + ρz^Tz), (d_2, d_1), ..., (d_n, d_{n-1}).

If ρ < 0, then f has exactly n roots, one in each of the intervals

    (d_2, d_1), ..., (d_n, d_{n-1}), (d_n + ρz^Tz, d_n).

In either case, it follows that the zeros of f are precisely the eigenvalues of
D + ρzz^T. □

The theorem suggests that to compute V we (a) find the roots λ_1, ..., λ_n
of f using a Newton-like procedure and then (b) compute the columns of
V by normalizing the vectors (D - λ_iI)^{-1}z for i = 1:n. The same plan of
attack can be followed even if there are repeated d_i and zero z_i.
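
Under the assumptions of Theorem 8.5.3 (distinct d_i, nonzero z_i) and for ρ > 0, the following NumPy sketch carries out that plan: each root of f is isolated in one of the intervals from the proof and located by plain bisection (a production code would use the faster Newton-like iterations mentioned above), and each eigenvector is a normalized (D - λ_iI)^{-1}z. All names are illustrative.

    import numpy as np

    def secular_f(d, z, rho, lam):
        # f(lam) = 1 + rho * z^T (D - lam*I)^{-1} z   (Theorem 8.5.3(a))
        return 1.0 + rho * np.sum(z**2 / (d - lam))

    def rank_one_eig(d, z, rho, iters=100):
        # Eigen-decomposition of D + rho*z*z^T for rho > 0, d strictly decreasing, z_i != 0.
        n = len(d)
        lams = np.empty(n)
        for i in range(n):
            # root i lies in (d[i], d[i-1]) for i >= 1 and in (d[0], d[0] + rho*z'z) for i = 0
            lo = d[i]
            hi = d[i-1] if i > 0 else d[0] + rho * np.dot(z, z)
            for _ in range(iters):               # f < 0 near lo and f > 0 near hi
                mid = 0.5 * (lo + hi)
                if secular_f(d, z, rho, mid) < 0.0:
                    lo = mid
                else:
                    hi = mid
            lams[i] = 0.5 * (lo + hi)
        V = np.column_stack([z / (d - lam) for lam in lams])
        V /= np.linalg.norm(V, axis=0)
        return lams, V

    d = np.array([4.0, 2.0, 1.0, -1.0])
    z = np.array([0.5, 0.3, 0.2, 0.4])
    rho = 1.0
    lams, V = rank_one_eig(d, z, rho)
    print(np.max(np.abs((np.diag(d) + rho * np.outer(z, z)) @ V - V * lams)))   # small residual
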
Theorem 8.5.4 If D = diag(d_1, ..., d_n) and z ∈ ℝ^n, then there exists an
orthogonal matrix V_1 such that if V_1^T D V_1 = diag(μ_1, ..., μ_n) and w =
V_1^T z then

    μ_1 > μ_2 > ⋯ > μ_r ≥ μ_{r+1} ≥ ⋯ ≥ μ_n,

w_i ≠ 0 for i = 1:r, and w_i = 0 for i = r+1:n.
Proof. We give a constructive proof based upon two elementary opera-
tions. (a) Suppose d_i = d_j for some i < j. Let J(i, j, θ) be a Jacobi
rotation in the (i, j) plane with the property that the jth component of
J(i,j,θ)^T z is zero. It is not hard to show that J(i,j,θ)^T D J(i,j,θ) = D.
Thus, we can zero a component of z if there is a repeated d_i. (b) If z_i = 0,
z_j ≠ 0, and i < j, then let P be the identity with columns i and j inter-
changed. It follows that P^T D P is diagonal, (P^T z)_i ≠ 0, and (P^T z)_j = 0.
Thus, we can permute all the zero z_i to the "bottom." Clearly, repetition
of (a) and (b) eventually renders the desired canonical structure. V_1 is the
product of the rotations. □

See Barlow (1993) and the references therein for a discussion of the solution
procedures that we have outlined above.

8.5.4 A Divide and Conquer Method


We now present a divide-and-conquer method for computing the Schur
decomposition
(8.5.5)
for tridiagonal T that involves (a) ''tearing" T in half, (b) computing the the
Schur decompositions of the two parts, and (c) combining t he two half-sized
Schur decompositions into t he required full size Schur decomposition. The
overall procedure, developed by Dongarra and Sorensen (1987), is suitable
for parallel computation.
We first show how T can be "torn" in half with a rank-one modification.
For simplicity, assume n = 2m. Define v E Rn as follows

v = [ ~~) ]
8e1
. (8.5.6)

Note that for all p E R the matrix T=T - fYVVT is identical to T except
in its "middle foue' entries:

T(m:m + 1, m:m + 1)

If we set p8 = bm then

where
0

0
8.5. TRIDIAGONAL METHODS 445

a m+l bm+l 0
bm+l Clm+2

and Cim = am - p and iim+J = Clm+J - pB'l.


Now suppose that we have m-by-m orthogonal matrices Q1 and Q2 such
that QfT1 Q 1 = D1 and QIT2Q2 = D2 are each diagonal. If we set

U=

then

where

is diagonal and
T [ Q[em ]
z = U tl = BQI et ·
Comparing these equations we see that the effective synthesis of the two
half-sized Schur decompositions requires the quick and stable computation
of an orthogonal V such that

vT(D+pzzT)V = A = diag().J, ... , ).n)

which we discUBSed in §8.5.3.

8.5.5 A Parallel Implementation


Having stepped through the tearing a.nd synthesis operations, we are ready
to illustrate the overall process and how it can be implemented on a mul-
tiprocessor. For clarity, assume that n = 8N for some positive integer N
and that three levels of tearing are performed. We can depict this with a
binary t ree as shown in Fro. 8.5.1. The indices are specified in binary.
FIG. 8.5.2 depicts a single node and should be interpreted to mean that
the eigensystem for t he tridiagonal T (b) is obt ained from the eigensystems
of the tridiagonals T(bO) and T (bl ). For example, the eigensystems for the
N-by-N matrices T(UO) and T(lll) are rombined to produce the eigen-
system for the 2N-by-2N t ridiagonal matrix T(ll).
446 CHAPTER 8. THE SYMMETRIC EIGENVALUE PROBLEM

T(O) T(l)

~
T(OO) T(Ol)
~
T(lO) T(ll)

A
T(OOO) T(OOl)
A
T(OlO) T(Oll)
A
T(lOO) T(lOl)
A
T(llO) T(lll)

FIGURE 8.5.1 Computation TI-ee

T(b)

A
T(bO) T(bl)

FIGURE 8.5.2 Synthesis at a Node

With tree-structured algorithms there is always the danger that paral-


lelism is lost as the tree is "climbed" towards the root, but this is not the
case in our problem. To see this suppose we have 8 processors and that the
first task of Proc(b) is to compute the Schur decomposition of T(b) where
b = 000,001,010,011,100,101,110,111. This portion of the computation is
perfectly load balanced and does not involve interprocessor communication.
(We are ignoring the Theorem 8.5.4 deflations, which are unlikely to cause
significant load imbalance.)
At the next level there are four gluing operations to perform: T(OO),
T(01), T(lO), T(11). However, each of these computations neatly subdi-
vides and we can assign two processors to each task. For example, once
the secular equation that underlies the T(OO) synthesis is known to both
Proc(OOO) and Proc(001), then they each can go about getting half of the
eigenvalues and corresponding eigenvectors. Likewise, 4 processors can each
be assigned to the T(O) and T(1) problem. All 8 processors can participate
in computing the eigensystem ofT. Thus, at every level full parallelism
8.5. TRIDIAGONAL METHODS 447

can be maintained because the eigenvalue/eigenvector computations are


independent of one another.

Problems

P8.5.1 Suppose >. is an eigenvalue of a symmetric tridiagonal matrix T. Show that if


>. has algebraic multiplicity k, then at least k- 1 ofT's subdiagonal elements are zero.
P8.5.2 Give an algorithm for determining p and 0 in (8.5.6) with the property that
lar- pi, lar+1- PI} is maximized.
8 E { -1, 1} and min{
P8.5.3 Let p,(>.) = det(T(1:r, 1:r)- >.!,) where Tis given by (8.5.1). Derive are-
cursion for evaluating p;_(>.) and use it to develop a Newton iteration that can compute
eigenvalues ofT.
P8.5.4 What communication is necessary between the processors assigned to a partic-
ular T,? Is it possible to share the work associated with the processing of repeated d;
and zero Zi ?
P8.5.5 If Tis positive definite, does it follow that the matrices T1 and T2 in §8.5.4 are
positive definite?
P8.5.6 Suppose that

A=[$ v

where D = diag(d, ... , dn- I) has distinct diagonal entries and v E Rn- 1 has no zero
entries. (a) Show that if >. E >.(A), then D- >.In- 1 is nonsingular. (b) Show that if
>. E >.(A), then >. is a zero of

T
P8.5. Suppose A=
S+auuT where S E wxn is skew-symmetric, u ERn, and a E R.
Show how to compute an orthogonal Q such that QT AQ = T + ae1ei where Tis tridi-
agonal and skew-symmetric and e1 is the first column of In.
P8.5.8 It is known that >. E >.(T) where T E Rnxn is symmetric and tridiagonal with
no zero subdiagonal entries. Show how to compute x(1:n -1) from the equation Tx = >.x
given that Xn = 1.

Notes and References for Sec. 8.5

Bisection/ Strum sequence methods are discussed in

W. Barth, R.S. Martin, and J.H. Wilkinson (1967). "Calculation of the Eigenvalues of
a Symmetric Tridiagonal Matrix by the Method of Bisection,'' Numer. Math. 9,
386-93. See also Wilkinson and Reinsch (1971, 249-256).
K.K. Gupta (1972). "Solution of Eigenvalue Problems by Sturm Sequence Method," Int.
J. Numer. Meth. Eng. 4, 379-404.

Various aspects of the divide and conquer algorithm discussed in this section is detailed in

G.H. Golub (1973). "Some Modified Matrix Eigenvalue Problems," SIAM Review 15,
318-44.
J.R. Bunch, C.P. Nielsen, and D.C. Sorensen (1978). "Rank-One Modification of the
Symmetric Eigenproblem," Numer. Math. 31, 31--48.
J.J.M. Cuppen (1981). "A Divide and Conquer Method for the Symmetric Eigenprob-
lem," Numer. Math. 36, 177-95.
448 CHAPTER 8. THE SYMMETRIC EIGENVALUE PROBLEM

J.J. Dongana and D.C. Sorensen (1987). "A Fully Parallel Algorithm for the Symmetric
Eigenvalue Problem," SIAM J. Sci. and Stat. Comp. 8, S139-S154.
S. Crivelli and E.R. Jessup (1995). "The Cost of Eigenvalue Computation on Distributed
Memory MIMD Computers," Parallel Computing 21, 401-422.
The very delicate computations required by the method are CIU"efully analyzed in

J.L. B!U"low (1993). "Error Analysis of Update Methods for the Symmetric Eigenvalue
Problem," SIAM J. Matm Anal. Appl. 14, 598-{;18.
Various generalizations to banded symmetric eigenproblems have been explored.

P. Arbenz, W. Gander, and G.H. Golub (1988). "Restricted Rank Modification of the
Symmetric Eigenvalue Problem: Theoretical Considerations," Lin. Alg. and Its
Applic. 104, 75-95.
P. Arbenz and G.H. Golub (1988). "On the Spectral Decomposition of Hermitian Ma-
trices Subject to Indefinite Low Rank Perturbations with Applications," SIAM .1.
Matm Anal. Appl. 9, 40-58.
A related divide and conquer method based on the "ELrrowhead" matrix (see P8.5. 7) is
given in

M. Gu and S.C. EisenstELt (1995). "A Divide-and-Conquer Algorithm for the Symmetric
Tridiagonal Eigenproblem," SIAM J. Matm Anal. Appl. 16, 172-191.

8.6 Computing the SVD


There are important relationships between the singular value decomposition
of a. matrix A and the Schur decompositions of the symmetric matrices
AT A, AAT , and [ A0 AT ] . Indeed, 1f.
0

urAV = dia.g(a,, ... ,an)

is the SVD of A E lR.mxn (m?: n), then

vr(ATA)V = dia.g(a~, ... ,a;) ElR.nxn (8.6.1)

and
dia.g(a?, ... , a;, ..__...._..
0, ... , 0) E lR.mxm (8.6.2)
m-n

Moreover, if

and we define the orthogonal matrix Q E lR.(m+n)x(m+n) by


8.6. COMPUTING THE SVD 449

then

QT [ ~ A;] Q = diag(at, . .. ,un,-a1, .. . ,-a 0 ,~). (8.6.3)


m-n
These connections to the symmetric eigenproblem allow us to adapt the
mathematical and algorithmic developments of the previous sections to the
singular value problem. Good references for this section include Lawson
and Hanson (1974) and Stewart and Sun (1990).

8.6.1 Perturbation Theory and Properties


We first establish perturbation results for the SVD based on the theorems
of §8.1. Recall that a;(A) denotes the ith largest singular value of A.
Theorem 8.6.1 If A E IRmxn, then fork= 1:min{m,n}

max min = max min


II Ax ll2
dim(S)=k
dim(T)=k
:r:ES
yET dim(S)=k :r:ES II X 112

Note that in this expressionS~ IR" and T ~ IRm are subspaces.


Proof. The right-most characterization follows by applying Theorem 8.1.2
to AT A. The remainder of the proof we leave as an exercise. []

Corollary 8.6.2 If A andA+E are iniRmxn withm ~ n, then fork= 1:n

iak(A +E)- ak(A)I ::; at(E) =II E ll2·


Proof. Apply Corollary 8.1.6 to

0 (A+E?] 0
[ A+E 0 .

n
Example 8.6.1 If

A= [ ~ and A+E= [ ;
3
~]
6.01
then u(A) = {9.5080, .7729} and u(A +E) = {9.5145, .7706}. It is clear that fori= 1:2
we have lo-;(A +E) - u;(A)I $ II E ll2 = .01.

Corollary 8.6.3 Let A= [ a 1 , ••• , an] E IRmxn be a column partitioning


with m ~ n. If Ar = [a~o ... ,a. ], thenforr = l:n-1
Ut(Ar+t) ~ O"t(Ar) ~ a2(Ar+l) ~ • · · ~ o-.(Ar+t) ~ a.(A.) ~ O"r+t(Ar+l)·
450 CHAPTER 8. THE SYMMETRIC EIGENVALUE PROBLEM

Proof. Apply Corollary 8.1.7 to AT A. D

This last result says that by adding a column to a matrix, the largest
singular value increases and the smallest singular value is diminished.

Example 8.3.2

7
6
8
9
10
12
11
13
14
15
l {
a(A 1 ) = {7.4162}
a(A2) = {19.5377, 1.8095}
a(Aa) = {35.1272, 2.4654, 0.0000}

thereby confirming Corollary 8.6.3.

The next result is a Wielandt-Hoffman theorem for singular values:


Theorem 8.6.4 If A and A+ E are in lRmxn with m;::: n, then
n

L (uk(A +E)- uk(A))


2
~ II E II~.
k=I

Proof. Apply Theorem 8.1.4 to [ ~ A; ] and [ A 1 E (A +O E)T ] . D

Example 8.6.3 If

then
A= r~ n and A+E = [ ;
3
;
6.01
]

2
L (uk(A +E)- O'k(A)) 2 .472 x w-• :-: : w-•
k=l
See Example 8.6.1.

For A E lRmxn we say that the k-dimensional subspaces S £:; ne and


T <; lRm form a singular subspace pair if x E S and y E T imply Ax E T
and AT y E S. The following result is concerned with the perturbation of
singular subspace pairs.
Theorem 8.6.5 Let A,E E lRmxn with m;::: n be given and suppose that
V E IR"xn and U E lRmxm are orthogonal. Assume that

and that ran(VI) and ran(U1 ) form a singular subspace pair for A. Let
r
uH AV = [ AOu 0 ]
A22 m- r
r n-r
8.6. CoMPUTING THE SVD 451

UHEV r
m-r
r n-r
and assume that

min Ia - -rl > 0.


.:rE.:r{An)
-yE.:r(A 22 )

If

then there exist matrices P E IR(n-r)xr and Q E IR(m-r)xr satisfying

such that ran( VI + V2Q) and ran(UJ + U2P) is a singular subspace pair for
A+E.

Proof. See Stewart (1973), Theorem 6.4. D

Roughly speaking, the theorem says that O(t) changes in A can alter a
singular subspace by an amount t/6, where 6 measures the separation of
the relevant singular values.

Example 8.6.4 The matrix A= diag(2.000, 1.001, .999) E R'x 3 has singular subspace

l
pairs (span{ vi}, span{u;}) fori= I, 2, 3 where v, = ej l and u; = ej 4 ) Suppose
3

2.000
.010 .010
E = .010 1.001 .010
A + .010
[ .OIO .999
.010 .010 .010

The corresponding columns of the matrices

[ .9999 -.0144
.OIOI .7415 .0007]
.6708
U= [ iLt u2 u3] .0101 .6707 -.7616
.005I .OI38 -.0007

[ .9999 -.OI43 .0007 ]


v= [ il, ii, ii3] = .0101 .7416 .6708
.OIOI .6707 -.7416
define singular subspace pairs for A +E. Note that the pair {span{ 1];}, span{ u;}}, is close
to {span{ vi}, span{u;}} fori= 1 but not fori= 2 or 3. On the other hand, the singular
subspace pair {span{ii2,ii3}, span{u2,ua}} is close to {span {v2,v3 }, span{u2,u 3}}.
452 CHAPTER 8. THE SYMMETRIC EIGENVALUE PROBLEM

8.6.2 The SVD Algorithm


We now show how a variant of the QR algorithm can be used to com-
pute the SVD of an A E nmxn with m C; n . At first glance, this appears
straightforward. Equation (8.6.1) suggests that we

• form C =AT A,

• use the symmetric QR algorithm to compute V('CV1 = diag(O'l),


• apply QR with column pivoting to AVt obtaining lJT(AVt)fl = R.

Since R has orthogonal columns, it foUows that UT A(VtiT) is diagonal.


However, as we saw in Example 5.3.2, the format ion of ATA can lead to a
Loss of information. T he situation is not quite so bad here, since the original
A is used to compute U.
A pmerabJe method for computing tbe SVD is described io Golub and
Kahan (1965). Their technique finds U and V simu)taDeously by implicitly
applying the symmetric QR algorithm to AT A. The first step is to reduce
A to upper bidiagonal form using Algoritl1m 5.4.2:

dt ft 0

0 d'l
UiAVs
[~ ] B E Jre1Xfl.

fn- l
0 0 dn

The remaining problem is thus to compute the SVD of B. To this end, con-
sider applying an implicit-shift QRstep (Algorithm 8.3.2) to the tridiagonal
matrix T = B TB :

• Compute the eigenvalue >. of

d2 +j,'l
T(m:n,m:n) = m m- t nl=n-1
[
dmfm

that is closer to ~ + J';,.


• Compute c1 = cos(9t) and s 1 = sin(81 ) such that

[
c1 s1]T [ 4- A] = [ XQ ]
-S l Ct ddt
and set G1 = G(1,2,8t)·
8.6. COMPUTING THE SVD 453

• Compute Givens rotations G 2 , ••. , Gn-l so that if Q = G1 · · · Gn-1


then QTTQ is tridiagonal and Qe 1 = G 1 e 1 •
Note that these calculations require the explicit formation of BT B, which,
a.s we have seen, is unwise from the numerical standpoint.
Suppose instead that we apply the Givens rotation G 1 above to B di-
rectly. Illustrating with the n = 6 case this gives
X X 0 0 0 0
+ X X 0 0 0
0 0 X X 0 0
B <-- BG1 =
0 0 0 X X 0
0 0 0 0 X X
0 0 0 0 0 X

We then can determine Givens rotations U1, V2, U2 , ••• , Vn-1, and Un-l to
chase the unwanted nonzero element down the bidiagonal:

X X + 0 0 0
0 X X 0 0 0
0 0 X X 0 0
B <-- U[B =
0 0 0 X X 0
0 0 0 0 X X
0 0 0 0 0 X

X X 0 0 0 0
0 X X 0 0 0
B <-- BV2
0 + X X 0 0
0 0 0 X X 0
0 0 0 0 X X
0 0 0 0 0 X

X X 0 0 0 0
0 X X + 0 0
0 0 X X 0 0
B <-- U'{B
0 0 0 X X 0
0 0 0 0 X X
0 0 0 0 0 X

and so on. The process terminates with a new bidiagonal B that is related
to B as follows:
- T T -r -
B = (Un-1 · · · U1 )B(G1 V2 · · · Vn-1) = U BV.
454 CHAPTER 8. THE SYMMETRIC EIGENVALUE PROBLEM

Since each V; has the form V; = G(i,i + l,Oi) where i = 2:n -1, it follows
that Ve1 = Qe 1. By the implicit Q theorem we can assert that V and Q
are essentially the same. Thus, we can implicitly effect the transition from
T tot= fJT fJ by working directly on the bidiagonal matrix B.
Of course, for these claims to hold it is necessary that the underlying
tridiagonal matrices be unreduced. Since the subdiagonal entries of BTB
are of the form d;-1 , !;, it is clear that we must search the bidiagonal band
for zeros. If fk = 0 for some k, then

B= B1 0 ] k
[ 0 B2 n- k
k n -k

and the original SVD problem decouples into two smaller problems involv-
ing the matrices B 1 and B 2 . If dk = 0 for some k < n, then premultiplication
by a sequence of Givens transformations can zero fk. For example, if n =
6 and k = 3, then by rotating in row planes (3,4), (3,5), and (3,6) we can
zero the entire third row:

X X 0 0 0 0 X X 0 0 0 0
0 X X 0 0 0 0 X X 0 0 0
B =
0 0 0 X 0 0 (3,4)
--+
0 0 0 0 + 0
0 0 0 X X 0 0 0 0 X X 0
0 0 0 0 X X 0 0 0 0 X X
0 0 0 0 0 X 0 0 0 0 0 X

X X 0 0 0 0 X X 0 0 0 0
0 X X 0 0 0 0 X X 0 0 0
(3,5)
--+
0 0 0 0 0 + (3,6)
--+
0 0 0 0 0 0
0 0 0 X X 0 0 0 0 X X 0
0 0 0 0 X X 0 0 0 0 X X
0 0 0 0 0 X 0 0 0 0 0 X

If dn = 0, then the last column can be zeroed with a series of column


rotations in planes (n- 1, n), (n- 2, n), ... , (1, n). Thus, we can decouple
if !J ···fn-1 = 0 or d1···dn = 0.

Algorithm 8.6.1 (Golub-Kahan SVD Step) Given a bidiagonal matrix


BE lRmxn having no zeros on its diagonal or superdiagonal, the following
algorithm overwrites B with the bidiagonal matrix iJ = ()T BV where tJ
and V are ortlwgonal and V is essentially the orthogonal matrix that would
be obtained by applying Algorithm 8.3.2 to T = BT B.
8.6. COMPUTING THE SVD 455

Let J1. be the eigenvalue of the trailing 2-by-2 submatrix ofT = BT B


that is closer to tnn.
y = tu - J1.
z = tl2
fork= 1:n -1
Determine c = cos( 0) and s = sin( 0) such that

[ y z l [ -~ ~ ] = [ * 0 l
B = BG(k,k+ 1,0)
y = bkk; z = bk+l,k

[-~ ~ r[n [~
Determine c = cos(O) and s = sin(O) such that

J
B = G(k,k + 1,0)TB
if k < n- 1
y = bk,k+1; z = bk,k+2
end
end
An efficient implementation of this algorithm would store B's diagonal and
superdiagonal in vectors a(1:n) and f(1:n- 1) respectively and would re-
quire 30n flops and 2n square roots. Accumulating U requires 6mn flops.
Accumulating V requires 6n 2 flops. ·
Typically, after a few of the above SVD iterations, the superdiagonal
entry f n-l becomes negligible. Criteria for smallness within B's band are
usually of the form

l/;l ~ €( ld;l + ldi+tl)


ld;l ~ dB II
where f is a small multiple of the unit roundoff and II · I is some compu-
tationally convenient norm.
Combining Algorithm 5.4.2 (bidiagonalization), Algorithm 8.6.1, and
the decoupling calculations mentioned earlier gives

Algorithm 8.6.2 (The SVD Algorithm} Given A E lRmxn (m ~ n) and


f,a small multiple of the unit roundoff, the following algorithm overwrites
A with UT AV = D + E, where U E lRmxn is orthogonal, V E lR"x" is
orthogonal, D E lRmxn is diagonal, and E satisfies II E lb ~ ull A ll2·
Use Algorithm 5.4.2 to compute the bidiagonalization
456 CHAPTER 8. THE SYMMETRIC EIGENVALUE PROBLEM

until q = n
Set bi,i+l to zero if lbi,i+tl :5 E(lb;;l + lbHt,i+lD
for any i = 1:n - 1.
Find the largest q and the smallest p such that if

p
B n- p- q
q
n-p -q

then B 33 is diagonal Md B 22 has nonzero superdiagonal.


if q <n
if any diagonal entry in B 2 2 is zero, then zero
the superdiagonal entry in the same row.
else
Apply Algorit lun 8.6.1 to B22,
B = dia.g{lp, U, lq+m- n)T Bdiag(lp, V, lq)
end
end
end
The amount of work required by this algoritlun and its numerical properties
are discussed in §5.4.5 and §5.5.8.

Example 8.6.5 1f Algorit hm 8 .6.2 is a p plied to

A~u~~n
t ben the superdingonal elements converge \o zero as follows :

lteraLion O(la21l) O(la321) O (la431)


10° 100 10°
2 100 100 100
3 10() 10() 100
4 100 w- 1 w-z
5 100 10-1 w-s
6 100 10-1 10- 27
7 100 w-1 converg.
8 100 1o-•
9 w- 1 I0- 14
10 w-• converg.
11 w-•
12 10-12
13 couv..rg .

Obse rve t he cub ic-like convergence.


8.6. COMPUTING THE SVD 457

8.6.3 Jacobi SVD Procedures


It is straightforward to adapt the Jacobi procedures of §8.4 to the SVD
problem. Instead of solving a sequence of 2-by-2 symmetric eigenproblems,
we solve a sequence of 2-by-2 SVD problems. Thus, for a given index pair
(p, q) we compute a pair of rotations such that

Ct
[ -S!

See P8.6.8. The resulting algorithm is referred to as two-sided because each


update involves a pre- and post-multiplication.
A one-sided Jacobi algorithm involves a sequence of pairwise column
orthogonalizations. For a given index pair (p, q) a Jacobi rotation J(p, q, II)
is determined so that columns p and q of AJ(p.q, II) are orthogonal to each
other. See P8.6.8. Note that this corresponds to zeroing the (p, q) and (q, p)
entries in AT A. Once AV has sufficiently orthogonal columns, the rest of
the SVD (U and r:) follows from column scaling: AV = ur:.

Problems

P8.6.1 Show that if BE Rnxn is an upper bidiagonal matrix having a repeated singular
value, then B must have a zero on its diagonal or superdiagonal.

P8.6.2 Give formulae for the eigenvectors of [ ~ AOT ] in terms of the singular
vectors of A E R"'xn where m ~ n.
P8.6.3 Give an algorithm for reducing a complex matrix A to real bidiagonal form
using complex Householder transformations.
P8.6.4 Relate the singular values and vectors of A= B + iC (B, C E R"'xn) to those

of [ ~ -~].
P8.6.5 Complete the proof of Theorem 8.6.1.
P8.6.6 Assume that n = 2m and that S E Rnxn is skew-symmetric and tridiagonal.
Show that there exists a permutation P E E'xn such that pT SP has the following form:

pTsp = [ ~ -~T ] :
m m
Describe B. Show how to compute the eigenvalues and eigenvectors of S via the SVD
of B. Repeat for the case n = 2m + 1.
P8.6. 7 (a) Let

be real. Give a stable algorithm for computing c and s with c2 + s2 = I such that

B=[ -sc •]c


c
is symmetric. (b) Combine (a) with the Jacobi trigonometric calculations in the text
to obtain a stable algorithm for computing the SVD of C. (c) Part (b) can be used to
458 CHAPTER 8. THE SYMMETRIC EIGENVALUE PROBLEM

develop a Jacobi-like algorithm for computing the SVD of A E wxn. For a given (p, q)
with p < q, Jacobi transformatiollB J(p, q, IJ,) and J(p, q, 02) are determined such that if
B = J(p, q, O!)T AJ(p, q, 62),
then bpq = bqp = 0. Show
off(B) 2 = off(A) 2 - b;q- b~p·
How might p and q be determined? How could the algorithm be adapted to handle the
case when A E wxn with m > n?
P8.6.8 Let x and y be in Rm and define the orthogonal matrix Q by

Q=[ -s c "
c ]·
Give a stable algorithm for computing c and s such that the columns of [x, y]Q are or-
thogonal to each other.
P8.6.D Suppose BE Rnxn is upper bidiagonal with bnn = 0. Show how to CO!lBtruct
orthogonal u and v (product of GivellB rotations) so that uT BV is Upper bidiagonal
with a zero nth column.
P8.6.10 Suppose B E wxn is upper bidiagonal with diagonal entries d(1:n) and super-
diagonal entries f(1:n- 1). State and prove a singular value version of Theorem 8.5.1.

Notes and References for Sec. 8.6


The mathematical properties of the SVD are discussed in Stewart and Sun (1990) as
well as

A.R. Amir-Moez (1965). Extremal Properties of Linear Thlnsformations and Geometry


of Unitary Spaces, Texas Tech University Mathematics Series, no. 243, Lubbock,
Texas.
G.W. Stewart (1973). "Error and Perturbation Bounds for Subspaces Associated with
Certain Eigenvalue Problems," SIAM Review 15, 727-64.
P.A. Wedin (1972). "Perturbation Bounds in Connection with the Singular Value De-
composition," BIT 12, 99-111.
G.W. Stewart (1979). "A Note on the Perturbation of Singular Values," Lin. A!g. and
Its App!ic. 28, 213-16.
G.W. Stewart (1984). "A Second Order Perturbation Expansion for Small Singular
Values," Lin. A!g. and Its Applic. 56, 231-236.
R.J. Vaccaro (1994). "A Second-Order Perturbation Expansion for the SVD," SIAM J.
Matrix Anal. App!ic. 15, 661-671.
The idea of adapting the symmetric QR algorithm to compute the SVD first appeared in

G.H. Golub and W. Kahan (1965). "Calculating the Singular Values and Pseudo-Inverse
of a Matrix," SIAM J. Num. Ana!. Ser. B 2, 205-24.
and then came some early implementations:

P.A. Businger and G. H. Golub (1969). "Algorithm 358: Singular Value Decomposition
of a Complex Matrix," Comm. Assoc. Camp. Mach. 1£, 564-65.
G.H. Golub and C. Reinsch (1970). "Singular Value Decomposition and Least Squares
Solutions," Numer. Math. 14, 403--20. See also Wilkinson and Reinsch ( 1971, 134-
51).
Interesting algorithmic developments associated with the SVD appear in
8.6. COMPUTING THE SVD 459

J.J.M. Cuppen (1983). "The Singular Value Decomposition in Product Form," SIAM
J. Sci. and Stat. Camp. 4, 216-222.
J.J. Dongarra (1983). "Improving the Accuracy of Computed Singular Values," SIAM
J. Sci. and Stat. Camp. 4, 712-719.
S. Van Huffel, J. Vandewalle, and A. Haegemans (1987). "An Efficient and Reliable
Algorithm for Computing the Singular Subspace of a Matrix Associated with its
Smallest Singular Valu..,," J. Camp. and Appl. Math. 19, 313-330.
P. Deift, J. Demmel, L.-C. Li, and C. Thmei (1991). "The Bidiagonal Singular Value
Decomposition and Hamiltonian Mechanics," SIAM J. Num. Anal. £8, 1463-1516.
R. Mathias and G.W. Stewart (1993). "A Block QR Algorithm and the Singular Value
Decomposition," Lin. Alg. and It8 Applic. 182, 91-100.
A. Bjiirck, E. Grimme, and P. Van Dooren (1994). "An Implicit Shift Bidiagonalization
Algorithm for Ill-Posed Problems," BIT .'14, 510-534.
The Polar decomposition of a matrix can be computed immediately from its SVD. How-
ever, special algorithms have been developed just for this purpose.

N.J. Higham (1986). "Computing the PolBr Decomposition-with Applications," SIAM


J. Sci. and Stat. Comp. 1, 1160-1174.
N.J. Higham and P. Papadimitriou (1994). "A Parallel Algorithm for Computing the
Polar Decomposition," Pamllel Comp. IW, 1161-1173.
Jacobi methods for the SVD fall into two categoriffi. The two-sided Jacobi algorithms
repeatedly perform the update A ~ ur AV producing a sequence of iterates that are
increasingly diagonal.

E.G. Kogbetliantz (1955). "Solution of Linear Equations by Diagonalization of Coeffi-


cient Matrix," Quart. Appl. Math. 1.'1, 123-132.
G.E. Forsythe and P. Henrici (1960). "The Cyclic Jacobi Method for Computing the
Principal Valu"" of a Complex Matrix," Trans. Amer. Math. Soc. 9-i, 1-23.
GC. Paige and P. Van Dooren (1986). "On the Quadratic Convergence of Kogbetliantz's
Algorithm for Computing the Singular Value Decomposition," Lin. Alg. and Its
Applic. 71, 301-313.
J.P. Charlier and P. Van Dooren (1987). "On Kogbetliantz's SVD Algorithm in the
Presence of Clusters," Lin. Alg. and Its Applic. 95, 135-160.
Z. Bai {1988). "Note on the Quadratic Convergence of Kogbetliantz's Algorithm for
Computing the Singular Value Decomposition," Lin. Alg. and Its Applic. 104,
131-140.
J.P. Charlier, M. Vanbegin, P. Van Dooren (1988). "On Efficient Implementation of
Kogbetliantz's Algorithm for Computing the Singular Value Decompo.sition," Numer.
Math. 5£, 279-300.
K.V. Fernando {1989). "Linear Convergence of the Row Cyclic Jacobi and Kogbetliantz
methods," Numer. Math. 56, 73-92.
The one-sided Jacobi SVD procedures repeatedly perform the update A ~ AV produc-
ing a sequence of iterates with columns that ace increaaingly orthogonal.

J.C. Nash (1975). "A One-Sided Tranformation Method for the Singular Value Decom-
position and Algebraic Eigenproblem," Camp. J. 18, 74-76.
P.C. Hansen (1988). "Reducing the Number of Sweeps in Hestenes Method," in Singular
Value Decomposition and Signal Processing, ed. E. F. Deprettere, North Holland.
K. Veseli~ and V. Ho.ri (1989). "A Note on a One-Sided Jacobi Algorithm," Numer.
Math. 56, 627-633.
Numerous parallel implementations have been developed.

F.T. Luk (1980). "Computing the Singular Value Decomposition on the ILLIAC IV,"
ACM Trans. Math. Soft. 6, 524-39.
460 CHAPTER 8. THE SYMMETRIC EIGENVALUE PROBLEM

R.P. Brent and F.T. Luk (1985). "The Solution of Singular Value a.nd Symmetric Eigen-
value Problems on Multiprocessor Arrays," SIAM J. Sci. and Stat. Comp. 6, 6!<-84.
R.P. Brent, F .T. Luk, and C. Van Loan (1985). "Computation of the Singular Value
Decomposition Using Mesh Connected Processors," J. VLSI Computer Systems 1,
242-270.
F.T. Luk (1986). "A Triangula.r Processor Array for Computing Singular Values," Lin.
Alg. and Its Applic. 77, 25!<-274.
M. Berry and A. Sameh (1986). "Multiprocessor Jacobi Algorithms for Dense Symmetric
Eigenvalue and Singulac Value Decompositions," in Proc. International Conference
on Parallel Processing, 433--440.
R. Schreiber (1986). "Solving Eigenvalue and Singular Value Problems on an Undersized
Systolic Array," SIAM J. Sci. and Stat. Camp. 7, 441--451.
C.H. Bischof and C. Va.n Loan (1986). "Computing the SVD on a Ring of Array Proces-
sors," in Large Scale Eigenvalue Problems, eds. J. Cullum and R. Willoughby, North
Holland, 51-66.
C.H. Bischof (1987). "The Two-Sided Block Jacobi Method on Hypercube Architec-
tures," in Hwercube Multiprocessors, ed. M.T. Heath, SIAM Press, Philadelphia.
C.H. Bischof (1989). "Computing the Singular Value Decomposition on a Distributed
System of Vector Processors," Parallel Computing 11, 171-186.
S. Van Huffel and H. Park (1994). "Parallel Tri- and Bidiagona.lization of Bordered
Bidiagonal Matrices," Parallel Computing 20, 1107-1128.
B. Lang (1996). "Parallel Reduction of Banded Matrices to Bidiagonal Form," Parallel
Computing 22, 1-18.
The divide and conquer algorithms devised for for the symmetric eigenproblem have
SVD analogs:

E.R. Jessup and D.C. Sorensen (1994). "A Parallel Algorithm for Computing the Sin-
gula.r Value Decomposition of a Matrix," SIAM J. Matrix Anal. Appl. 15, 53(}-548.
M. Gu and S.C. Eisenstat (1995). "A Divide-and-Conquer Algorithm for the Bidiagonal
SVD," SIAM J. Matri:J: Anal. Appl. 16, 7!<-92.
Careful analyses of the SVD calculation include

J.W. Demmel and W. Kahan (1990). "Accurate Singular Values o[ Bidiagonal Matrices,"
SIAM J. Sci. and Stat. Comp. 11, 873-912.
K.V. Fernando and B.N. Parlett (1994). "Accurate Singular Values and Differential qd
Algorithms," Numer. Math. 67, 191-230.
S. Cha.ndrasekaren and I.C.F. Ipsen (1994). "Backward Errors ror Eigenvalue and Sin-
gular Value Decompositions," Numer. Math. 68, 215--223.
High accuracy SVD calculation and connections among the Cholesky, Schur, and singu-
lar value computations are discussed in

J.W. Derome! a.nd K. Veselic (1992). "Jacobi's Method is More Accurate than QR,"
SIAM J. Matrix Anal. Appl. 13, 1204-1245.
R. Mathias (1995). "Accurate Eigensystem Computations by Jacobi Methods," SIAM
J. Matrix Anal. Appl. 16, 977-1003.
8.7. SOME GENERALIZED EIGENVALUE PROBLEMS 461

8. 7 Some Generalized Eigenvalue Problems


Given a symmetric matrix A E IR.nxn and a symmetric positive definite
BE IR"x", we consider the problem of finding a nonzero vector x and a
scalar >. so Ax = >.Bx. This is the symmetric-definite generalized eigen-
problem. The scalar >. can be thought of as a generalized eigenvalue. As >.
varies, A - >.B defines a pencil and our job is to determine

>.(A, B) = {.XI det(A- >.B) = 0 }.


A symmetric-definite generalized eigenproblcm can be transformed to an
equivalent problem with a congruence transformation:

A- >.B is singular ¢} (XT AX)- >.(XTBX) is singular

Thus, if X is nonsingular, then >.(A,B) = >.(XT AX,XrBX).


In this section we present various structure-preserving procedures that
solve such eigenproblems through the careful selection of X. The related
generalized singular value decomposition problem is also discussed.

8.7.1 Mathematical Background


We seek is a stable, efficient algorithm that computes X such that xr AX
and xr BX are both in "canonical form." The obvious form to aim for is
diagonal form.
Theorem 8. 7.1 Suppose A and B are n-by-n symmetric matrices, and
define C(Jlo) by

(8.7.1)

If there exists a J1o E [0, 1] such that C(Jlo) is non-negative definite and

null(C(J.Io)) = null( A) n null(B)


then there exists a nonsingular X such that both xr AX and xr BX are
diagonal.
Proof. Let J1o E [0, 1] be chosen so that C(J.Io) is non-negative definite with
the property that null(C(J.Io)) = null(A) n null(B). Let

Q'[C(Jlo)Qt = [ ~ ~]
be the Schur decomposition of C(J.Io) and define X 1 = Q 1 diag(D- 112 , In-k).
If At= X'[ AXt, Bt =X{ BX1, and Ct = X'[C(Jlo)Xl, then

C1 = [ I~ ~ ] = Jl.At+ (1- J.lo)Bl.


462 CHAPTER 8. THE SYMMETRIC EIGENVALUE PROBLEM

Sincespan{ek+J, ... ,e,.} = nuii(C1) = null(A 1)nnull(B 1 ) it follows that


A1 and B 1 have the following block structure:

k k
n-k n-k

Moreover Ik = J.IAu + (1- J.I)Bu.


Suppose J.1 =f. 0. It then follows that if zrB 11 Z = diag(bJ, ... ,bk) is
the Schur decomposition of Bu and we set X = X 1diag(Z,ln-k) then

and

XT AX = ~XT (C(J.I)- (1- J.t)B) X


J.l

= ~ ([I~ ~]- (1- J.I)D 8 ) := DA.


On the other hand, if J.l = 0, then let zT A 11 Z = diag(a 1, ... , ak) be the
Schur decomposition of Au and set X = Xldiag(Z,l,.-k)· It is easy to
verify that in this case as well, both XT AX and xr BX are diagonal. 0

Frequently, the conditions in Theorem 8.7.1 are satisfied becallSe either A


or B is positive definite.

Corollary 8. 7.2 If A- >.BE lR.nxn is symmetric-definite, then there ex-


ists a nonsingular X = ( x 1, ... , x,. ] such that

Moreover, Ax; = >.;Bx; fori = l:n where >.; = a;jb;.


Proof. By setting J.l = 0 in Theorem 8.7.1 we see that symmetric-definite
pencils can be simultaneollSly diagonalized. The rest of the corollary is
easily verified. 0

Example 8. '7.1 If

A = [ 229 163 ] 81 59 ]
163 116 and B= [ 59 43

then A- >.B is sy=etric-definite and >.(A, B)= {5,-1/2}. If

X = [ 3 -5 ]
-4 7
8.7. SOME GENERALIZED EIGENVALUE PROBLEMS 463

then XT AX= diag(5,-1) and XTBX = diag(1,2).

Stewart (1979) has worked out a perturbation theory for symmetric


pencils A - )..B that satisfy

c(A, B) = min (xT Ax) 2 + (xT Bx) 2 > 0 (8.7.2)


llxll,=l

The scalar c(A, B) is called the Crawford number of the pencil A- )..B.

Theorem 8. 7.3 Suppose A - )..B is an n-by-n symmetric-definite pencil


with eigenvalues
;..., :2: )..2 :2: · · · :2: An.
Suppose EA and Es are symmetric n-by-n matrices that satisfy

~
2
= II EA II~ +II Es II~ < c(A,B).

Then (A+ EA)- )..(B + Es) is symmetric-definite with eigenvalues

P.l :2: "' :2: P.n

that satisfy

Iarctan()..;)- arctan(p.;)l ::; arctan(~/c(A, B))

fori= l:n.

Proof. See Stewart (1979). 0

8.7.2 Methods for the Symmetric-Definite Problem


Turning to algorithmic matters, we first present a method for solving the
symmetric-definite problem that utilizes both the Cholesky factorization
and the symmetric QR algorithm.

Algorithm 8.7.1 Given A= ATE IR:'xn and B = BT E lRnxn with B


positive definite, the following algorithm computes a nonsingular X such
that XTBX= In and XTAX = diag(a,, ... ,an)·

Compute the Cholesky factorization B = GGT


using Algorithm 4.2.2.
Compute C = c- 1 Ac-T.
Use the symmetric QR algorithm to compute the Schur
decomposition QTCQ = diag(a,, ... , an)·
Set X= c-TQ.
464 CHAPTER 8. THE SYMMETRIC EIGENVALUE PROBLEM

This algorithm requires about 14n3 flops. In a practical implementation,


A can be overwritten by the matrix C. See Martin and Wilkinson (1968c)
for details. Note that

Ifa; is a computed eigenvalue obtained by Algorithm 8.7.1, then it can


be shown that a; E >.(C- 1 Ac-T+ E;), where II E; 112 ::::: uiJ A 11211 B- 1 112·
Thus, if B is ill-conditioned, then a; may be severely contaminated with
roundoff error even if a; is a well-conditioned generalized eigenvalue. The
problem, of course, is that in this case, the matrix C = c- 1 Ac-T can have
some very large entries if B, and hence C, is ill-conditioned. This difficulty
can sometimes be overcome by replacing the matrix C in Algorithm 8.7.1
with V D- 112 where yT BV = Dis the Schur decomposition of B. If the
diagonal entries of D are ordered from smallest to largest, then the large
entries in C are concentrated in the upper left-hand corner. The small
eigenvalues of C can then be computed without excessive roundoff error
contamination (or so the heuristic goes). For further discussion, consult
Wilkinson (1965, pp.337-38).

n
Example 8.7.2 If

r~:
.001 0
and G = ~ .001
A= [ 1
and B = Gcfl', then the two smallest eigenvalues of A- ).8 are
a, = -0.619402940600584 a2 = 1.627440079051887.
If 17-<l.igit floating point arithmetic is UBed, then these eigenvalues are computed to full
machine precision when the symmetric QR algorithm is applied to fl(D- 1 12VT AV D- 1/2),
where B = V DVT is the Schur decomposition of B. On the other hand, if Algorithm
8.7.1 is applied, then
il1 = -0.619373517376444 il2 = 1.627516601905228.
The reason for obtaining only four correct significant digits is that "2( B) "" 1018 .

The condition of the matrix X in Algorithm 8.7.1 can sometimes be


improved by replacing B with a suitable convex combination of A and B.
The connection between the eigenvalues of the modified pencil and those
of the original are detailed in the proof of Theorem 8.7.1.
Other difficulties concerning Algorithm 8.7.1 revolve around the fact
that c- 1 Ac-T is generally full even when A and B are sparse. This is a
serious problem, since many of the symmetric-definite problems arising in
practice are large and sparse.
Crawford (1973) has shown how to implement Algorithm 8.7.1 effec-
tively when A and B are banded. Aside from this case, however, the si-
multaneous diagonalization approach is impractical for the large, sparse
symmetric-definite problem.
8.7. SOME GENERALIZED EIGENVALUE PROBLEMS 465

An alternative idea is to extend the Rayleigh quotient iteration (8.4.4)


as follows:
xo given with I xo ll2 = 1
fork= 0, 1, ...
/.lk = x[ Axkfx[ Bxk (8.7.3)
Solve (A- /.lkB)zk+l = Bxk for Zk+I·
Xk+J = Zk+J/11 Zk+l ll2
end

The mathematical basis for this iteration is that


xTAx
>. = xTBx (8.7.4)

minimizes
!(>.) = II Ax - >.Bx liB (8.7.5)
where II·JIB is defined by Jlzll1 = zT B- 1 z. The mathematical properties of

(8.7.3) are similar to those of (8.4.4). Its applicability depends on whether


or not systems of the form (A- Jl.B)z = x can be readily solved. A similar
comment pertains to the following generalized orthogonal iteration:
Q 0 E lRnxp given with Q~Qo = Ip
fork= 1,2, ...
Solve BZk = AQk-t for Zk. (8.7.6)
Zk = QkRk (QR factorization)
end

This is mathematically equivalent to (7.3.4) with A replaced by B- 1 A. Its


practicality depends on how easy it is to solve linear systems of the form
Bz = y.
Sometimes A and Bare so large that neither (8.7.3) nor (8.7.6) can be
invoked. In this situation, one can resort to any of a number of gradient
and coordinate relaxation algorithms. See Stewart (1976) for an extensive
guide to the literature.

8.7.3 The Generalized Singular Value Problem


We conclude with some remarks about symmetric pencils that have the
form AT A- >.BT B where A E lRmxn and BE lR!'xn. This pencil under-
lies the generalized singular value decomposition (GSVD), a decomposition
that is useful in several constrained least squares problems. (Cf. §12.1.)
Note that by Theorem 8.7.1 there exists a nonsingular X E lRnxn such that
XT(AT A)X and XT(BT B) X are both diagonal. The value of the GSVD
466 CHAPTER 8. THE SYMMETRIC EIGENVALUE PROBLEM

is that these diagonalizations can be achieved without forming AT A and


BTB.

Theorem 8.7.4 (Generalized Singular Value Decomposition) If we


have A E lRmxn with m ~ n and BE R"xn, then there exist orthogonal
U E lRmxm and V E R"xp and an invertible X E lRnxn such that

and
VTBX = S = diag(s~. ... ,sq)
where q = min(p,n).
Proof. The proof of this decomposition appears in Van Loan (1976). We
present a more constructive proof along the lines of Paige and Saunders
(1981). For clarity we assume that null(A) n null( B) = {0} and p ~ n. We
leave it to the reader to extend the proof so that it covers theses cases.
Let
(8.7.6)

be a QR factorization with Q 1 E lRmxn, Q2 E JR!'xn, andRE lRnxn. Paige


and Saunders show that the SVD's of Q 1 and Q 2 are related in the sense
that
(8.7.7)
Here, U,V, and Ware orthogonal, C = diag(c.;) with 0 ~ c 1 ~ ···~en, S
= diag(s;) with St ~ •.• ~ Sn, and (fTc+ sT s = In. The decomposition
(8.7.7) is a variant of the CS decomposition in §2.6 and from it we conclude
that A= Q 1 R = UC(WTR) and B = Q 2 R = VS(WTR). The theorem
follows by setting X = (wT R) -I, D A = C, and D 8 = S . The invertibility
of R follows from our assumption that null( A) n null( B) = {0}. D

The elements of the set u(A, B) ={ ctfs 1 , ••• , en/sq } are referred
to as the generalized singular values of A and B. Note that a E u(A, B)
implies that u 2 E >.(AT A, BT B). The theorem is a generalization of the
SVD in that if B =In, then u(A,B) = u(A).
Our proof of the GSVD is of practical importance since Stewart (1983)
and Van Loan (1985) have shown how to stably compute the CS decompo-
sition. The only tricky part is the inversion of WT R to get X. Note that
the columns of X = [ x1, ... , Xn] satisfy

i = 1:n

and so if s; "'0 then AT Ax; = ur BT Bx; where u; = c.;fs;. Thus, the x;


are aptly termed the generalized singular vectors of the pair (A, B).
8.7. SOME GENERALIZED EIGENVALUE PROBLEMS 467

In several applications an orthonormal basis for some designated gen-


eralized singular vector subspace space span{x; 1 , ••• ,x;.} is required. We
show how this can be accomplished without any matrix inversions or cross
products:
• Compute the QR factorization

• Compute the CS decomposition


Ql = ucwT
and order the diagonals of C and S so that
{cJ/s~. ... ,c,.;s,.} = {e;,/s;., ... ,e;./s;.}.

• Compute orthogonal Z and upper triangular T so TZ = WT R. (See


P8.7.5.) Note that if x- 1 = WTR = TZ, then X= ZTT- 1 and so
the first k rows of Z are an orthonormal basis for span{x1, ... , x~e}.

Problems
P8.7.1 Suppose A E Rnxn is symmetric and G E Rnxn is lower triangular and nonsin-
gular. Give an efficient algorithm for computing C = o-1 AG-T .
P8.7.2 Suppose A E R'xn is symmetric and BE R'x" is symmetric positive definite.
Give an algorithm for computing the eigenvalues of AB that us.., the Cholesky factor-
ization and the symmetric QR algorithm.
P8. 7.3 Show that if Cis real and diagonalizable, then there exist symmetric matrices A
and B, B nonsingular, such that C = AB- 1 • This shows that symmetric pencils A->.B
are essentially general.
=
P8. 7.4 Show how to convert an Ax >.Bx problem into a generalized singular value
problem if A and B are both symmetric and non-negative definite.
P8.7.5 Given Y E R'x" show how to compute Householder matrices H2, ... , H, so
that Y H, ·. · H2 = T is upper triangular. Hint: Hk zeros out the kth row.
P8.7.6 Suppose

where A E Rmxn, B1 E R"'xm, and B, E Jl!'X". Assume that B1 and B, are positive
definite with Cholesky triangles G1 and G, respectively. Relate the generalized eigen-
values of this problem to the singular values of G1 1 AG:IT
P8. 7. 7 Suppose A and B are both symmetric positive definite. Show how to compute
>.(A, B) and the corresponding eigenvectors using the Cholesky factorization and C S
decomposition.

Notes and References for Sec. 8.7


An excellent survey of computational methods for symmetric-definite pencils is given in
468 CHAPTER 8. THE SYMMETRIC EIGENVALUE PROBLEM

G.W. Stewart (1976). "A Bibliographical Tour of the Large Sparse Generalized Eigen-
value Problem," in Sparse Matrix Computations , ed., J.R. Bunch and D.J. Rooe,
Academic Press, New York.
Some papers of particular interest include

R.S. Martin and J.H. Wilkinson (1968c). "Reduction of a Symmetric Eigenproblem


Ax= >.Ex and Related Problems to Standard Form," Numer. Math. 11, 9\>-llO.
G. Peters and J.H. Wilkinson (1969). "Eigenvalues of Ax= >.Ex with Band Symmetric
A and B," Comp. 1. 12, 398-404.
G. Fix and R. Heiberger (1972). "An Algorithm for the Ill-Conditioned Generalized
Eigenvalue Problem," SIAM J. Num. Anal. 9, 78-88.
C.R. Crawford (1973). "Reduction of a Band Symmetric Generalized Eigenvalue Prob-
lem," Comm. ACM 16, 41-44.
A. Ruhe (1974). "SOR Methods for the Eigenvalue Problem with Large Sparse Matri-
ces," Math. Comp. 28, 695-710.
C.R. Crawford (1976). "A Stable Generalized Eigenvalue Problem," SIAM J. Num.
Anal. 13, 854--{10.
A. Bun.se-Gerstner (1984). "An Algorithm for the Symmetric Generalized Eigenvalue
Problem," Lin. Alg. and Its Applic. 58, 43--{18.
C.R. Crawford (1986). "Algorithm 646 PDFIND: A Routine to Find a Positive Definite
Linear Combination of Two Real Symmetric Matrices," ACM Trans. Math. Soft.
12, 278-282.
C.R. Crawford and Y.S. Moon (1983). "Finding a Positive Definite Linear Combination
of Two Hermitian Matrices," Lin. Alg. and Its Applic ..51, 37-48.
W. Shougen and Z. Shuqin (1991). "An Algorithm for Ax = >.Ex with Symmetric and
Positive Definite A and B," SIAM J. Matrix Anal. Appl. 12, 654--{160.
K. Li and T-Y. Li (1993). "A Homotopy Algorithm for a Symmetric Generalized Eigen-
problem," Numerical Algorithms 4, 167-195.
K. Li, T-Y. Li, and Z. Zeng (1994). "An Algorithm for the Generalized Symmetric
Tridiagonal Eigenvalue Problem," Numerical Algorithms 8, 269-291.
H. Zhang and W.F. Moss (1994). "Using Parallel Banded Linear System Solvers in
Generalized Eigenvalue Problems," Pamllel Computing 20, 1089-l106
The simultaneous reduction of two symmetric matrices to diagonal form is discussed in

A. Berman and A. Ben-Israel (1971). "A Note on Pencils of Hermitian or Symmetric


Matrices," SIAM J. Applic. Math. 21, 51-54.
F. Uhlig (1973). "Simultaneous Block Diagonalization of Two Real Symmetric Matrices,"
Lin. Alg. and Its Applic. 7, 281-89.
F. Uhlig (1976). "A Canonical Form for a Pair of Real Symmetric Matrices That Gen-
erate a Non.singule.r Pencil," Lin. Alg. and Its Applic. 14, 189-210.
K.N. Majinder (1979). "Linear Combinations of Hermitian and Real Symmetric Matri-
ces," Lin. Alg. and Its Applic. 25, 95-105.
The perturbation theory that we presented for the symmetric-definite problem was taken
from

G.W. Stewart (1979). "Perturbation Bounds for the Definite Generalized Eigenvalue
Problem," Lin. Alg. and Its Applic. 23, 69-86.
See also

L. Elsner and J. Gue.ng Sun (1982). "Perturbation Theorems for the Generalized Eigen-
value Problem,; Lin. Alg. and its Applic. 48, 341-357.
J. Guang Sun (1982). "A Note on Stewart's Theorem for Definite Matrix Pairs," Lin.
Alg. and Its Applic. 48, 331-339.
8.7. SOME GENERALIZED EIGENVALUE PROBLEMS 469

J. Guang Sun (1983). "Perturbation Analysis for the Generalized Singular Value Prob-
lem," SIAM J. Numer. Anal. 20, 611~25.
C.C. Paige (1984). "A Note on a Result of Sun J.-Guang: Sensitivity of the CS and
GSV Decompositions," SIAM J. Numer. Anal. 21, 186-191.
The generalized SVD and some of its applications are discussed in

C.F. Van Loan (1976). "Generalizing the Singular Value Decomposition," SIAM J. Num.
Anal. 13, 76-83.
C. C. Paige !llld M. Saunders (1981). ''Towsrds A Generalized Singular Value Decompo-
sition," SIAM J. Num. Anal. 18, 398--405.
B. Kagstrom (1985). "The Generalized Singular Value Decomposition !llld the General
A - :I.B Problem," BIT 24, 568-583.
Stable methods for computing the CS !llld generalized singular value decompositions are
described in

G.W. Stewart (1983). "A Method for Computing the Generalized Singular Value De-
composition," in Matri:J: Pencils , eel. B. Kilgstrom !llld A. Ruhe, Springer-Verlag,
New York, pp. 207-20.
C.F. Van Loan (1985). "Computing the CS and Generalized Singular Value Decompo-
sition," Numer. Math. 46, 479--492.
M.T. Heath, A.J. Laub, C. C. Paige, and R.C. Ward (1986). "Computing the SVD of a
Product of Two Matrices," SIAM J. Sci. and Stat. Comp. 7, 1147-1159.
C. C. Paige (1986). "Computing the Generalized Singular Value Decomposition," SIAM
J. Sci. and Stat. Comp. 7, 1126-1146.
L.M. Ewerbring and F.T. Luk (1989). "Canonical Correlations and Generalized SVD;
Applications and New Algorithms," J. Comput. Appl. Math. 27, 37-52.
J. Erxiong (1990). "An Algorithm for Finding Generalized Eigenpairs of a Symmetric
Definite Matrix Pencil," Lin.Alg. and Its Applic. 132, 65--91.
P.C. H!lllsen (1990). "Relations Between SVD and GSVD of Discrete Regularization
Problems in Standard and General Form," Lin.Alg. and Its Applic. 141, 165-176.
H. Zha (1991). "The Restricted Singular Value Decomposition of Matrix Triplets," SIAM
J. Matrix Anal. Appl. 12, 172-194.
B. De Moor and G.H. Golub (1991). "The Restricted Singular Value Decomposition:
Properties and Applications," SIAM J. Matrix Anal. Appl. 12, 401-425.
V. Hari (1991). "On Pairs of Almost Diagonal Matrices," Lin. Alg. and Its Applic.
148, 193-223.
B. De Moor and P. Van Dooren (1992). ''Generalizing the Singular Value !llld QR
Decompositions," SIAM J. Matrix Anal. Appl. 13, 993-1014.
H. Zha (1992). "A Numerical Algorithm for Computing the Restricted Singular Value
Decomposition of Matrix Triplets," Lin.Alg. and Its Applic. 168, 1-25.
R-C. Li (1993). "Bounds on Perturbations of Generalized Singular Values and of AI;oo-
ciated Subspaces," SIAM J. Matrix Anal. Appl. 14, 195--234.
K. Veseli~ (1993). "A Jacobi Eigenreduction Algorithm for Definite Matrix Pairs," Nu-
mer. Math. 64, 241-268.
Z. Bai and H. Zha (1993). "A New Preprocessing Algorithm for the Computation of the
Generalized Singular Value Decomposition," SIAM J. Sci. Comp. 14, 1007-1012.
L. Kaufm!lll (1993). "An Algorithm for the Banded Symmetric Generalized Matrix
Eigenvalue Problem," SIAM J. Matri:J: Anal. Appl. 14, 372-389.
G.E. Adams, A.W. Boja.nczyk, a.nd F.T. Luk (1994). "Computing the PSVD of Two
2x2 Tri!lllgular Matrices," SIAM J. Matrix Anal. AppL 15, 366-382.
Z. Drmai: (1994). The Genemlized Singular Value Problem, Ph.D. Thesis, FemUniver-
sitat, Hagen, Germany.
R-C. Li (1994). "On Eigenvalue Variations of Rayleigh Quotient Matrix Pencils of a
Definite Pencil," Lin. Alg. and Its Applic. 208/209, 471-483.
Chapter 9

Lanczos Methods

§9.1 Derivation and Convergence Properties


§9.2 Practical Lanczos Procedures
§9.3 Applications to Ax = b and Least Squares
§9.4 Arnoldi and Unsymmetric Lanczos

In this chapter we develop the Lanczos method, a technique th~t can be


used to solve certain large, sparse, symmetric eigenproblems Ax = >.x. The
method involves partial tridiagonalizations of the given matrix A. How-
ever, unlike the Householder approach, no intermediate, full submatrices
are generated. Equally important, information about A's extremal eigen-
values tends to emerge long before the tridiagonalization is complete. This
makes the Lanczos algorithm particularly useful in situations where a few
of A's largest or smallest eigenvalues are desired.
The derivation and exact arithmetic attributes of the method are pre-
sented in §9.1. The key Mpects of the Kaniel-Paige theory are detailed.
This theory explains the extraordinary convergence properties of the Lanc-
zos process. Unfortunately, roundoff errors make the Lanczos method some-
what difficult to use in practice. The central problem is a loss of orthog-
onality among the Lanczos vectors that the iteration produces. There are
several ways to cope with this M we discuss §9.2.
In §9.3 we show how the "Lanczos idea" can be applied to solve an as-
sortment of singular value, leMt squares, and linear equations problems. Of
particular interest is the development of the conjugate gradient method for
symmetric positive definite linear systems. The Lanczos-conjugate gradient
connection is explored further in the next chapter. In §9.4 we discuss the
Arnoldi iteration which is based on the Hessenberg decomposition and a

470
9.1. 0ER1VATION AND CONVERGENCE PROPERTIES 471

version of the Lanczos process that can (sometimes) be used to tridiago-


nalize unsymmetric matrices.

Before You Begin


Chapters 5 and 8 are required for §9.1-9.3 and Chapter 7 is needed for
§9.4. Within this chapter there are the following dependencies:

§9.1 _, §9.2 _, §9.3


t
§9.4
A wide range of Lanczos papers are collected in Brown, Chu, Ellison, and
Plemmons (1994). Other complementary references include Parlett (1980),
Saad (1992), and Chatelin (1993). The two volume work hy Cullum and
Willoughby (1985a,1985b) includes both analysis and software.

9.1 Derivation and Convergence Properties


Suppose A E Rnxn is large, sparse, and symmetric and assume that a few
of its largest and/or smallest eigenvalues are desired. This problem can be
solved by a method attributed to Lanczos (1950). The method generates
a sequence of tridiagonal matrices Tk with the property that the extremal
eigenvalues ofTk E Rkxk are progressively better estimates of A's extremal
eigenvalues. In this section, we derive the technique and investigate its
exact arithmetic properties. Throughout the section .>.;(·) designates the
ith largest eigenvalue.

9.1.1 Krylov Subspaces


The derivation of the Lanczos algorithm can proceed in several ways. So
that its remarkable convergence properties do not come as a complete sur-
prise, we prefer to lead into the technique by considering the optimization
of the Rayleigh quotient

r(x) X# 0.

Recall from Theorem 8.1.2 that the maximum and minimum values of r(x)
are .>. 1 (A) and >.n(A), respectively. Suppose {q;} ~ Rn is a sequence of
orthonormal vectors and define the scalars Mk and mk by

max
y¢0
472 CHAPTER 9. LANCZOS METHODS

min min r(Qky) ~ >.n(A)


y;<O IIYII,=l

where Qk = [ q 1 , ••• , Qk ]. The Lanczos algorithm can be derived by con-


sidering how to generate the Qk so that Mk and mk are increasingly better
estimates of >. 1 (A) and >.n(A).
Suppose Uk E span{q,, ... ,qk} is such that Mk = r(uk). Since r(x)
increases most rapidly in the direction of the gradient
2
V'r(x) = --r(Ax- r(x)x),
X X

we can ensure that Mk+I > Mk if Qk+! is determined so


(9.1.1)

(This assumes V'r(uk) # 0.) Likewise, ifvk E span{q~, ... ,qk} satisfies
r(vk) = mk, then it makes sense to require
(9.1.2)

since r(x) decreases most rapidly in the direction of- V'r(x).


At first glance, the task of finding a single Qk+ 1 that satisfies these two
requirements appears impossible. However, since V'r(x) E span{x,Ax}, it
is clear that (9.1.1) and (9.1.2) can be simultaneously satisfied if

span{q,, ... ,qk} = span{q1 , Aq 1, ... ,Ak-iq1}

and we choose qk+ 1 so

span{q,, ... ,qk+J} = span{q 1 , Aq 1 , •.. ,Ak- 1q1 , Akq 1}.

Thus, we are Jed to the problem of computing orthonormal bases for the
Krylov subspaces

IC(A,q,,k) = span{q" Aq1, ... ,Ak- 1q1}.


These are just the range spaces of the Krylov matrices

K(A,q,,n) = [q1, Aq,, A 2q,, ... , An- 1q 1 ].


presented in §8.3.2.

9.1.2 Tridiagonalization
In order to find this basis efficiently we exploit the connection between the
tridiagonalization of A and the QR factorization of K(A, q1, n). Recall that
if QT AQ = T is tridiagonal with Qe 1 = q1, then

K(A,q1,n) = Q[e 1 ,Tel!T2e 1, ... ,Tn-iel]


9.1. D ERI VATION AND C ONVERGENCE PROPERTIES 473

Is the QR factorization of K(A,qt,n) where e 1 = In(: , 1). Thus the Qk can


effectively be generated by tridiagonalizing A with an orthogonal matrix
whose first column is q1 •
Householder trldlagonalizatlon, discussed in §8.3.1, can be adapted for
this purpose. However, this approach is impractical if A is large and sparse
because Householder similarity transformations tend to destroy sparsity.
As a result, unacceptably large, dense mat:rices arise during the reduction.
Loss of sparsity can sometimes he controlled by using Givens rather
than Householder transformations. See Duff and Reid (1976). However,
any method that computes T by successively updating A is not useful in
the majority of cases when A is sparse.
This suggests that we try to compute the elements of the tridiagonal
matrix T = QT AQ direct ly. Setting Q = I q1 , .•. , qn J and

T =

0
and equat ing columns in AQ = QT, we fi nd

Aq.~; = f3k-IQk- l + a.~;qk + f3kqk+l f3otlo =0


for k = 1:n - 1. The orthonormality of the q; implies a~c = qfAq,..
Moreover, if r,. = (A- a~cl}qk- f3Jc -lQk- l is nonzero, then Qk+l = r,.j{3.~;
where {3,. = ±II r,. liz. If r,. = 0, then the iteration breaks down but (as
we shall see) not wit hout the acquisition of valuable invariant subspace
information. So by properly sequencing the above formulae we obtain the
Lanczos iteration:
ro= q1 ; Po 1; qo = 0; k = 0
;<

while ([3,. f= 0)
Qk+I = r,. j{3,.; k = k + 1; a,. = qf Aqk (9.1.3)
r,. = (A- a~ol)q,. - f3~o - tqk-ti /3~c = II rk 1!2
end
There is no loss of generality in choosing t he f3r. to be positive. The q,. are
called Lanc.zos vectors.

9.1.3 Tennination and Error Bounds


The iteration halts before complete tridiagonalization if q1 is contained in
a proper invariant subspace. This is one of several mathematical properties
of the met hod that we summarize in the following theorem.
474 CHAPTER 9. LANCZOS METHODS

Theorem 9.1.1 Let A E !Rnxn be symmetric and assume QJ E IRn has unit
2-norm. Then the Lanczos itemtion (9.1.3) runs until k = m, where m =
rank(K(A,q 1 ,n)) Moreover, fork= l:m we have

AQk = QSk + rkei (9.1.4)

where
Clj fh 0

f3J a2
Tk =

f3k-l
0 f3k-l O!k
and Qk = [ q1 , ... , Qk] has orthonormal columns that span JC(A, q1 , k ).
Proof. The proof is by induction on k. Suppose the iteration has produced
Qk = [q 1, ... ,qk] such that ran(Qk) = JC(A,qt,k) and QIQk =h. It is
easy to see from (9.1.3) that (9.1.4) holds. Thus, QI AQk = Tk +QI rkef,
Since a; = q'[ Aq; for i = l:k and

q'4, 1Aq; = q'4, 1(Aq;- a;q;- !3.-tq;_J) = q'[+ 1 ((3;qi+J) = (3;

fori= l:k -1, we have QI AQk = Tk. Consequently, Qfrk = 0.


If rk i' 0, then Qk+l = rk/11 rk ll2 is orthogonal to Q1, ... , Qk and
Qk+J E span{Aqk, Qk, Qk-d c;:: JC(A, Q1, k + 1).
Thus, QI+JQk+l = h+J and ran(Qk+ 1 ) = JC(A,qt,k + 1). On the other
hand, if rk = 0, then AQk = QkTk. This says that ran(Qk) = JC(A, q1, k)
is invariant. From this we conclude that k = m = rank(K(A, q1 , n)). []

Encountering a zero f3k in the Lanczos iteration is a welcome event in that it


signals the computation of an exact invariant subspace. However, an exact
zero or even a small f3k is a rarity in practice. Nevertheless, the extremal
eigenvalues of Tk turn out to be surprisingly good approximations to A's
extremal eigenvalues. Consequently, other explanations for the convergence
of Tk 's eigenvalues must be sought. The following rp.sult is a step in this
direction.
Theorem 9.1.2 Suppose that k steps of the Lanczos algorithm have been
performed and that S_k^T T_k S_k = diag(θ_1, ..., θ_k) is the Schur decomposition
of the tridiagonal matrix T_k. If Y_k = [ y_1, ..., y_k ] = Q_k S_k ∈ R^{n×k}, then
for i = 1:k we have || A y_i - θ_i y_i ||_2 = |β_k| |s_{ki}| where S_k = (s_{pq}).

Proof. Post-multiplying (9.1.4) by S_k gives

   A Y_k = Y_k diag(θ_1, ..., θ_k) + r_k e_k^T S_k

and so A y_i = θ_i y_i + r_k (e_k^T S_k e_i). The proof is complete by taking norms
and recalling that || r_k ||_2 = |β_k|. []

The theorem provides computable error bounds for T_k's eigenvalues:

   min_{μ ∈ λ(A)} |θ_i - μ| ≤ |β_k| |s_{ki}|,        i = 1:k.

Note that in the terminology of Theorem 8.1.15, the (θ_i, y_i) are Ritz pairs
for the subspace ran(Q_k).
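Continuing the sketch of §9.1.2, the Ritz values, Ritz vectors, and the residual bounds of Theorem 9.1.2 are cheap to obtain from the small tridiagonal matrix. The helper below assumes the Q, alpha, beta arrays returned by that sketch, with beta[k-1] playing the role of β_k.

import numpy as np

def ritz_pairs(Q, alpha, beta):
    # Diagonalize T_k and return the Ritz values theta_i, the Ritz vectors
    # y_i = Q s_i, and the residual bounds |beta_k| |s_{ki}| of Theorem 9.1.2.
    k = len(alpha)
    T = np.diag(alpha) + np.diag(beta[:k - 1], 1) + np.diag(beta[:k - 1], -1)
    theta, S = np.linalg.eigh(T)
    bounds = abs(beta[k - 1]) * np.abs(S[k - 1, :])
    return theta, Q @ S, bounds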
Another way that T_k can be used to provide estimates of A's eigenvalues
is described in Golub (1974) and involves the judicious construction of a
rank-one matrix E such that ran(Q_k) is invariant for A + E. In particular,
if we use the Lanczos method to compute A Q_k = Q_k T_k + r_k e_k^T and set
E = τ w w^T, where τ = ±1 and w = a q_k + b r_k, then it can be shown that

   (A + E) Q_k = Q_k (T_k + τ a^2 e_k e_k^T) + (1 + τ a b) r_k e_k^T.

If 0 = 1 + τ a b, then the eigenvalues of T̃_k = T_k + τ a^2 e_k e_k^T, a tridiagonal
matrix, are also eigenvalues of A + E. Using Theorem 8.1.8 it can be shown
that the interval [λ_i(T_k), λ_{i-1}(T_k)] contains an eigenvalue of A for i = 2:k.
These bracketing intervals depend on the choice of τ a^2. Suppose we
have an approximate eigenvalue λ of A. One possibility is to choose
τ a^2 so that

   det(T̃_k - λ I_k) = (α_k + τ a^2 - λ) p_{k-1}(λ) - β_{k-1}^2 p_{k-2}(λ) = 0,

where the polynomials p_i(x) = det(T_i - x I_i) can be evaluated at λ using the
three-term recurrence (8.5.2). (This assumes that p_{k-1}(λ) ≠ 0.) Eigenvalue
estimation in this spirit is discussed in Lehmann (1963) and Householder
(1968).

9.1.4 The Kaniel-Paige Convergence Theory


The preceding discussion indicates how eigenvalue estimates can be ob-
tained via the Lanczos algorithm, but it reveals nothing about rate of con-
vergence. Results of this variety constitute what is known as the Kaniel-
Paige theory, a sample of which follows.
Theorem 9.1.3 Let A be an n-by-n symmetric matrix with eigenvalues
λ_1 ≥ ··· ≥ λ_n and corresponding orthonormal eigenvectors z_1, ..., z_n. If
θ_1 ≥ ··· ≥ θ_k are the eigenvalues of the matrix T_k obtained after k steps of
the Lanczos iteration, then

   λ_1 ≥ θ_1 ≥ λ_1 - (λ_1 - λ_n) tan(φ_1)^2 / c_{k-1}(1 + 2ρ_1)^2

where cos(φ_1) = |q_1^T z_1|, ρ_1 = (λ_1 - λ_2)/(λ_2 - λ_n), and c_{k-1}(x) is the
Chebyshev polynomial of degree k - 1.

Proof. From Theorem 8.1.2, we have

   θ_1 = max_{y≠0} (y^T T_k y)/(y^T y) = max_{y≠0} ((Q_k y)^T A (Q_k y))/((Q_k y)^T (Q_k y)).

Since λ_1 is the maximum of w^T A w / w^T w over all nonzero w, it follows that
λ_1 ≥ θ_1. To obtain the lower bound for θ_1, note that

   θ_1 = max_{p ∈ P_{k-1}} (q_1^T p(A) A p(A) q_1)/(q_1^T p(A)^2 q_1)

where P_{k-1} is the set of k-1 degree polynomials. If q_1 = Σ_{i=1}^n d_i z_i, then

   (q_1^T p(A) A p(A) q_1)/(q_1^T p(A)^2 q_1)
        = ( Σ_{i=1}^n d_i^2 p(λ_i)^2 λ_i ) / ( Σ_{i=1}^n d_i^2 p(λ_i)^2 )
        ≥ λ_1 - (λ_1 - λ_n) ( Σ_{i=2}^n d_i^2 p(λ_i)^2 ) / ( d_1^2 p(λ_1)^2 + Σ_{i=2}^n d_i^2 p(λ_i)^2 ).

We can make the lower bound tight by selecting a polynomial p(x) that is
large at x = λ_1 in comparison to its value at the remaining eigenvalues.
One way of doing this is to set

   p(x) = c_{k-1}( -1 + 2 (x - λ_n)/(λ_2 - λ_n) )

where c_{k-1}(z) is the (k-1)-st Chebyshev polynomial generated via the
recursion

   c_j(z) = 2 z c_{j-1}(z) - c_{j-2}(z),        c_0 = 1, c_1 = z.

These polynomials are bounded by unity on [-1, 1], but grow very rapidly
outside this interval. By defining p(x) this way it follows that |p(λ_i)| is
bounded by unity for i = 2:n, while p(λ_1) = c_{k-1}(1 + 2ρ_1). Thus,

   θ_1 ≥ λ_1 - (λ_1 - λ_n) ((1 - d_1^2)/d_1^2) (1/c_{k-1}(1 + 2ρ_1)^2).

The desired lower bound is obtained by noting that tan(φ_1)^2 = (1 - d_1^2)/d_1^2. []

An analogous result pertaining to θ_k follows immediately from this theorem:

Corollary 9.1.4 Using the same notation as the theorem,

   λ_n ≤ θ_k ≤ λ_n + (λ_1 - λ_n) tan(φ_n)^2 / c_{k-1}(1 + 2ρ_n)^2

where ρ_n = (λ_{n-1} - λ_n)/(λ_1 - λ_{n-1}) and cos(φ_n) = |q_1^T z_n|.

Proof. Apply Theorem 9.1.3 with A replaced by -A. []

9.1.5 The Power Method Versus the Lanczos Method


It is worthwhile to compare θ_1 with the corresponding power method esti-
mate of λ_1. (See §8.2.1.) For clarity, assume λ_1 ≥ ··· ≥ λ_n ≥ 0. After k - 1
power method steps applied to q_1, a vector is obtained in the direction of

   v = A^{k-1} q_1 = Σ_{i=1}^n d_i λ_i^{k-1} z_i

along with an eigenvalue estimate

   γ_1 = (v^T A v)/(v^T v).

Using the proof and notation of Theorem 9.1.3, it is easy to show that

   λ_1 ≥ γ_1 ≥ λ_1 - (λ_1 - λ_n) tan(φ_1)^2 (λ_2/λ_1)^{2(k-1)}.        (9.1.5)

(Hint: Set p(x) = x^{k-1} in the proof.) Thus, we can compare the quality of
the lower bounds for θ_1 and γ_1 by comparing

   L_{k-1} = 1/c_{k-1}(1 + 2ρ_1)^2        and        R_{k-1} = (λ_2/λ_1)^{2(k-1)}.

This is done in the following table for representative values of k and λ_2/λ_1.
The superiority of the Lanczos estimate is self-evident. This should
be no surprise, since θ_1 is the maximum of r(x) = x^T A x / x^T x over all of
K(A, q_1, k), while γ_1 = r(v) for a particular v in K(A, q_1, k), namely v =
A^{k-1} q_1.

   λ_1/λ_2       k=5         k=10        k=15        k=20        k=25
   --------------------------------------------------------------------
    1.50      1.1x10^-4   2.0x10^-10  3.9x10^-16  7.4x10^-22  1.4x10^-27
              3.9x10^-2   6.8x10^-4   1.2x10^-5   2.0x10^-7   3.5x10^-9
    1.10      2.7x10^-2   5.5x10^-5   1.1x10^-7   2.1x10^-10  4.2x10^-13
              4.7x10^-1   1.8x10^-1   6.9x10^-2   2.7x10^-2   1.0x10^-2
    1.01      5.6x10^-1   1.0x10^-1   1.5x10^-2   2.0x10^-3   2.8x10^-4
              9.2x10^-1   8.4x10^-1   7.6x10^-1   6.9x10^-1   6.2x10^-1

   TABLE 9.1.1   L_{k-1} (upper entry) and R_{k-1} (lower entry)
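An entry of the table can be reproduced with a few lines of NumPy. The sketch below assumes λ_n = 0, so that ρ_1 = λ_1/λ_2 - 1, which is consistent with the tabulated values; the function name cheb is our own.

import numpy as np

def cheb(m, x):
    # Chebyshev polynomial c_m(x) via the three-term recurrence.
    c0, c1 = 1.0, x
    if m == 0:
        return c0
    for _ in range(m - 1):
        c0, c1 = c1, 2.0 * x * c1 - c0
    return c1

# One entry of Table 9.1.1: lambda_1/lambda_2 = 1.5, lambda_n = 0, k = 5.
ratio, k = 1.5, 5
rho1 = ratio - 1.0
L = 1.0 / cheb(k - 1, 1.0 + 2.0 * rho1) ** 2     # approx 1.1e-4
R = (1.0 / ratio) ** (2 * (k - 1))               # approx 3.9e-2
print(L, R)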

9.1.6 Convergence of Interior Eigenvalues


We conclude with some remarks about error bounds for T_k's interior eigen-
values. The key idea in the proof of Theorem 9.1.3 is the use of the trans-
lated Chebyshev polynomial. With this polynomial we amplified the com-
ponent of q_1 in the direction z_1. A similar idea can be used to obtain bounds
for an interior Ritz value θ_i. However, the bounds are not as satisfactory be-
cause the "amplifying polynomial" has the form q(x) Π_{j=1}^{i-1}(x - λ_j), where
q(x) is a degree k - i Chebyshev polynomial on the interval
[λ_{i+1}, λ_n]. For details, see Kaniel (1966), Paige (1971), or Saad (1980).

Problems

P9.1.1 Suppose A ∈ R^{n×n} is skew-symmetric. Derive a Lanczos-like algorithm for
computing a skew-symmetric tridiagonal matrix T_m such that AQ_m = Q_m T_m, where
Q_m^T Q_m = I_m.
P9.1.2 Let A ∈ R^{n×n} be symmetric and define r(z) = z^T A z / z^T z. Suppose S ⊆ R^n
is a subspace with the property that z ∈ S implies ∇r(z) ∈ S. Show that S is invariant
for A.
P9.1.3 Show that if a symmetric matrix A ∈ R^{n×n} has a multiple eigenvalue, then the
Lanczos iteration terminates prematurely.
P9.1.4 Show that the index m in Theorem 9.1.1 is the dimension of the smallest in-
variant subspace for A that contains q_1.
P9.1.5 Let A ∈ R^{n×n} be symmetric and consider the problem of determining an or-
thonormal sequence q_1, q_2, ... with the property that once Q_k = [ q_1, ..., q_k ] is known,
q_{k+1} is chosen so as to minimize μ_k = || (I - Q_{k+1} Q_{k+1}^T) A Q_k ||_F. Show that if
span{q_1, ..., q_k} = K(A, q_1, k), then it is possible to choose q_{k+1} so μ_k = 0. Explain
how this optimization problem leads to the Lanczos iteration.
P9.1.6 Suppose A ∈ R^{n×n} is symmetric and that we wish to compute its largest eigen-
value. Let η be an approximate eigenvector and set

   α = (η^T A η)/(η^T η),        z = A η - α η.

(a) Show that the interval [α - δ, α + δ] must contain an eigenvalue of A where δ =
|| z ||_2 / || η ||_2. (b) Consider the new approximation η̃ = a η + b z and show how to deter-
mine the scalars a and b so that

   α̃ = (η̃^T A η̃)/(η̃^T η̃)

is maximized. (c) Relate the above computations to the first two steps of the Lanczos
process.

Notes and References for Sec. 9.1

The classic reference for the Lanczos method is

C. Lanczos (1950). "An Iteration Method for the Solution of the Eigenvalue Problem of
Linear Differential and Integral Operators," J. Res. Nat. Bur. Stand. 45, 255-82.

Although the convergence of the Ritz values is alluded to in this paper, for more details we
refer the reader to

S. Kaniel (1966). "Estimates for Some Computational Techniques in Linear Algebra,"
Math. Comp. 20, 369-78.
C.C. Paige (1971). "The Computation of Eigenvalues and Eigenvectors of Very Large
Sparse Matrices," Ph.D. thesis, London University.
Y. Saad (1980). "On the Rates of Convergence of the Lanczos and the Block Lanczos
Methods," SIAM J. Num. Anal. 17, 687-706.

The connections between the Lanczos algorithm, orthogonal polynomials, and the theory
of moments are discussed in

N.J. Lehmann (1963). "Optimale Eigenwerteinschliessungen," Numer. Math. 5, 246-72.
A.S. Householder (1968). "Moments and Characteristic Roots II," Numer. Math. 11,
126-28.
G.H. Golub (1974). "Some Uses of the Lanczos Algorithm in Numerical Linear Algebra,"
in Topics in Numerical Analysis, ed. J.J.H. Miller, Academic Press, New York.

We motivated our discussion of the Lanczos algorithm by discussing the inevitability of
fill-in when Householder or Givens transformations are used to tridiagonalize. Actually,
fill-in can sometimes be kept to an acceptable level if care is exercised. See

I.S. Duff (1974). "Pivot Selection and Row Ordering in Givens Reduction on Sparse
Matrices," Computing 13, 239-48.
I.S. Duff and J.K. Reid (1976). "A Comparison of Some Methods for the Solution of
Sparse Over-Determined Systems of Linear Equations," J. Inst. Maths. Applic. 17,
267-80.
L. Kaufman (1979). "Application of Dense Householder Transformations to a Sparse
Matrix," ACM Trans. Math. Soft. 5, 442-50.

9.2 Practical Lanczos Procedures


Rounding errors greatly affect the behavior of the Lanczos iteration. The
basic difficulty is caused by loss of orthogonality among the Lanczos vectors,
a phenomenon that muddies the issue of termination and complicates the
relationship between A's eigenvalues and those of the tridiagonal matrices
T_k. This troublesome feature, coupled with the advent of Householder's
perfectly stable method of tridiagonalization, explains why the Lanczos
algorithm was disregarded by numerical analysts during the 1950's and
1960's. However, interest in the method was rejuvenated with the devel-
opment of the Kaniel-Paige theory and because the pressure to solve large,
sparse eigenproblems increased with increased computer power. With many
fewer than n iterations typically required to get good approximate extremal
eigenvalues, the Lanczos method became attractive as a sparse matrix tech-
nique rather than as a competitor of the Householder approach.
Successful implementations of the Lanczos iteration involve much more
than a simple encoding of (9.1.3). In this section we outline some of the
practical ideas that have been proposed to make the Lanczos procedure
viable in practice.

9.2.1 Exact Arithmetic Implementation


With careful overwriting in (9.1.3) and exploitation of the formula

   α_k = q_k^T (A q_k - β_{k-1} q_{k-1}),

the whole Lanczos process can be implemented with just two n-vectors of
storage.

Algorithm 9.2.1 (The Lanczos Algorithm) Given a symmetric
A ∈ R^{n×n} and w ∈ R^n having unit 2-norm, the following algorithm com-
putes a k-by-k symmetric tridiagonal matrix T_k with the property that
λ(T_k) ⊆ λ(A). It assumes the existence of a function A.mult(w) that
returns the matrix-vector product Aw. The diagonal and subdiagonal ele-
ments of T_k are stored in α(1:k) and β(1:k-1) respectively.

   v(1:n) = 0; β_0 = 1; k = 0
   while β_k ≠ 0
       if k ≠ 0
           for i = 1:n
               t = w_i; w_i = v_i/β_k; v_i = -β_k t
           end
       end
       v = v + A.mult(w)
       k = k + 1; α_k = w^T v; v = v - α_k w; β_k = || v ||_2
   end

Note that A is not altered during the entire process. Only a procedure
A.mult(·) for computing matrix-vector products involving A need be sup-
plied. If A has an average of about ℓ nonzeros per row, then approximately
(2ℓ + 8)n flops are involved in a single Lanczos step.
Upon termination the eigenvalues of T_k can be found using the symmet-
ric tridiagonal QR algorithm or any of the special methods of §8.5, such as
bisection.
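A direct transliteration of Algorithm 9.2.1 into NumPy is given below; the function name and the explicit maxsteps cap (in place of the exact test β_k ≠ 0) are our own additions.

import numpy as np

def lanczos_two_vector(A_mult, q1, maxsteps):
    # Algorithm 9.2.1 with just two n-vectors of workspace (w and v).
    # A_mult(w) must return the product A w.  The subdiagonal of T_k is
    # beta[:-1]; the final beta entry is the last residual norm.
    n = q1.shape[0]
    w = q1.copy()
    v = np.zeros(n)
    alpha, beta = [], []
    bk = 1.0                     # beta_0
    k = 0
    while bk != 0.0 and k < maxsteps:
        if k > 0:
            w, v = v / bk, -bk * w
        v = v + A_mult(w)
        k += 1
        a = w @ v
        v = v - a * w
        bk = np.linalg.norm(v)
        alpha.append(a)
        beta.append(bk)
    return np.array(alpha), np.array(beta)

A typical call is alpha, beta = lanczos_two_vector(lambda x: A @ x, q1, 30).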

The Lanczos vectors are generated in the n-vector w. If they are desired
for later use, then special arrangements must be made for their storage. In
the typical sparse matrix setting they could be stored on a disk or some
other secondary storage device until required.

9.2.2 Roundoff Properties


The development of a practical, easy-to-use Lanczos procedure requires
an appreciation of the fundamental error analyses of Paige (1971, 1976,
1980). An examination of his results is the best way to motivate the several
modified Lanczos procedures of this section.
After k steps of the algorithm we obtain the matrix of computed Lanczos
vectors Q̂_k = [ q̂_1, ..., q̂_k ] and the associated tridiagonal matrix

          [ α̂_1   β̂_1                 0      ]
          [ β̂_1   α̂_2    .                   ]
   T̂_k =  [         .     .     .            ]
          [               .     .   β̂_{k-1}  ]
          [ 0          β̂_{k-1}     α̂_k       ]

Paige (1971, 1976) shows that if r̂_k is the computed analog of r_k, then

   A Q̂_k = Q̂_k T̂_k + r̂_k e_k^T + E_k                                  (9.2.1)

where

   || E_k ||_2 ≈ u || A ||_2.                                         (9.2.2)

This indicates that the important equation A Q_k = Q_k T_k + r_k e_k^T is satisfied
to working precision.
Unfortunately, the picture is much less rosy with respect to the orthog-
onality among the q̂_i. (Normality is not an issue. The computed Lanczos
vectors essentially have unit length.) If β̂_k = fl(|| r̂_k ||_2) and we compute
q̂_{k+1} = fl(r̂_k/β̂_k), then a simple analysis shows that β̂_k q̂_{k+1} ≈ r̂_k + w_k
where || w_k ||_2 ≈ u || r̂_k ||_2 ≤ u || A ||_2. Thus, we may conclude that

   |q̂_{k+1}^T q̂_i| ≈ ( |r̂_k^T q̂_i| + u || A ||_2 ) / |β̂_k|

for i = 1:k. In other words, significant departures from orthogonality can
be expected when β̂_k is small, even in the ideal situation where r̂_k^T Q̂_k is
zero. A small β̂_k implies cancellation in the computation of r̂_k. We stress
that loss of orthogonality is due to this cancellation and is not the result of
the gradual accumulation of roundoff error.

Example 9.2.1 The matrix

   A = [  2.64   -.48 ]
       [  -.48   2.36 ]

has eigenvalues λ_1 = 3 and λ_2 = 2. If the Lanczos algorithm is applied to this matrix
with q_1 = [ .810, -.586 ]^T and three-digit floating point arithmetic is performed, then
q_2 = [ .707, .707 ]^T. Loss of orthogonality occurs because span{q_1} is almost invariant
for A. (The vector x = [ .8, -.6 ]^T is the eigenvector affiliated with λ_1.)

Further details of the Paige analysis are given shortly. Suffice it to
say now that loss of orthogonality always occurs in practice and with it,
an apparent deterioration in the quality of T̂_k's eigenvalues. This can be
quantified by combining (9.2.1) with Theorem 8.1.16. In particular, if in
that theorem we set F_1 = r̂_k e_k^T + E_k, X_1 = Q̂_k, S = T̂_k, and assume that
the resulting orthogonality measure τ satisfies τ < 1, then there exist
eigenvalues μ_1, ..., μ_k ∈ λ(A) that lie within a distance of T̂_k's eigenvalues
bounded in terms of τ and || F_1 ||_2, for i = 1:k.
An obvious way to control the τ factor is to orthogonalize
each newly computed Lanczos vector against its predecessors. This leads
directly to our first "practical" Lanczos procedure.

9.2.3 Lanczos with Complete Reorthogonalization


Let r_0, ..., r_{k-1} ∈ R^n be given and suppose that Householder matrices
H_0, ..., H_{k-1} have been computed such that (H_0 ··· H_{k-1})^T [ r_0, ..., r_{k-1} ]
is upper triangular. Let [ q_1, ..., q_k ] denote the first k columns of the
Householder product (H_0 ··· H_{k-1}). Now suppose that we are given a vec-
tor r_k ∈ R^n and wish to compute a unit vector q_{k+1} in the direction of

   w = r_k - Σ_{i=1}^k (q_i^T r_k) q_i  ∈  span{q_1, ..., q_k}^⊥.

If a Householder matrix H_k is determined so (H_0 ··· H_k)^T [ r_0, ..., r_k ] is
upper triangular, then it follows that column (k + 1) of H_0 ··· H_k is the
desired unit vector.
If we incorporate these Householder computations into the Lanczos pro-
cess, then we can produce Lanczos vectors that are orthogonal to machine
precision:
   r_0 = q_1 (given unit vector)
   Determine Householder H_0 so H_0 r_0 = e_1.
   α_1 = q_1^T A q_1
   for k = 1:n-1
       r_k = (A - α_k I) q_k - β_{k-1} q_{k-1}        (β_0 q_0 ≡ 0)        (9.2.3)
       w = (H_{k-1} ··· H_0) r_k
       Determine Householder H_k so H_k w = (w_1, ..., w_k, β_k, 0, ..., 0)^T.
       q_{k+1} = H_0 ··· H_k e_{k+1}; α_{k+1} = q_{k+1}^T A q_{k+1}
   end
This is an example of a complete reorthogonalization Lanczos scheme. A
thorough analysis may be found in Paige (1970). The idea of using House-
holder matrices to enforce orthogonality appears in Golub, Underwood, and
Wilkinson (1972).
That the computed q_i in (9.2.3) are orthogonal to working precision
follows from the roundoff properties of Householder matrices. Note that by
virtue of the definition of q_{k+1}, it makes no difference if β_k = 0. For this
reason, the algorithm may safely run until k = n-1. (However, in practice
one would terminate for a much smaller value of k.)
Of course, in any implementation of (9.2.3), one stores the Householder
vectors v_k and never explicitly forms the corresponding H_k. Since we have
H_k(1:k, 1:k) = I_k there is no need to compute the first k components of
w = (H_{k-1} ··· H_0) r_k, for in exact arithmetic these components would be
zero.
Unfortunately, these economies make but a small dent in the computa-
tional overhead associated with complete reorthogonalization. The House-
holder calculations increase the work in the kth Lanczos step by O(kn)
flops. Moreover, to compute q_{k+1}, the Householder vectors associated with
H_0, ..., H_k must be accessed. For large n and k, this usually implies a
prohibitive amount of data transfer.
Thus, there is a high price associated with complete reorthogonalization.
Fortunately, there are more effective courses of action to take, but these
demand that we look more closely at how orthogonality is lost.
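The sketch below illustrates complete reorthogonalization but, for brevity, re-projects r_k against the stored Lanczos vectors by modified Gram-Schmidt rather than maintaining the Householder product of (9.2.3); the O(kn) cost per step and the need to access q_1, ..., q_k are the same.

import numpy as np

def lanczos_full_reorth(A, q1, k):
    # Lanczos with complete reorthogonalization (Gram-Schmidt variant).
    n = A.shape[0]
    Q = np.zeros((n, k))
    alpha = np.zeros(k)
    beta = np.zeros(k)
    Q[:, 0] = q1 / np.linalg.norm(q1)
    for j in range(k):
        r = A @ Q[:, j]
        if j > 0:
            r -= beta[j - 1] * Q[:, j - 1]
        alpha[j] = Q[:, j] @ r
        r -= alpha[j] * Q[:, j]
        for i in range(j + 1):            # full reorthogonalization sweep
            r -= (Q[:, i] @ r) * Q[:, i]
        beta[j] = np.linalg.norm(r)
        if beta[j] == 0.0 or j == k - 1:
            break
        Q[:, j + 1] = r / beta[j]
    return Q, alpha, beta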

9.2.4 Selective Orthogonalization


A remarkable, ironic consequence of the Paige (1971) error analysis is that
loss of orthogonality goes hand in hand with convergence of a Ritz pair.
To be precise, suppose the symmetric QR algorithm is applied to T̂_k and
renders computed Ritz values θ_1, ..., θ_k and a nearly orthogonal matrix of
eigenvectors S_k = (s_{pq}). If Ŷ_k = [ ŷ_1, ..., ŷ_k ] = fl(Q̂_k S_k), then it can be
shown that for i = 1:k we have

   || A ŷ_i - θ_i ŷ_i ||_2 ≈ |β̂_k| |s_{ki}|                            (9.2.4)

and

   |q̂_{k+1}^T ŷ_i| ≈ u || A ||_2 / ( |β̂_k| |s_{ki}| ).                 (9.2.5)

That is, the most recently computed Lanczos vector q̂_{k+1} tends to have a
nontrivial and unwanted component in the direction of any converged Ritz
vector. Consequently, instead of orthogonalizing q̂_{k+1} against all of the
previously computed Lanczos vectors, we can achieve the same effect by
orthogonalizing it against the much smaller set of converged Ritz vectors.
The practical aspects of enforcing orthogonality in this way are dis-
cussed in Parlett and Scott (1979). In their scheme, known as selective
orthogonalization, a computed Ritz pair (θ, ŷ) is called "good" if it satisfies

   || A ŷ - θ ŷ ||_2 ≤ √u || A ||_2.

As soon as q̂_{k+1} is computed, it is orthogonalized against each good Ritz
vector. This is much less costly than complete reorthogonalization, since
there are usually many fewer good Ritz vectors than Lanczos vectors.
One way to implement selective orthogonalization is to diagonalize T̂_k at
each step and then examine the s_{ki} in light of (9.2.4) and (9.2.5). A much
more efficient approach is to estimate the loss-of-orthogonality measure
|| I_k - Q̂_k^T Q̂_k ||_2 using the following result:

Lemma 9.2.1 Suppose S_+ = [ S  d ] where S ∈ R^{n×k} and d ∈ R^n. If S
satisfies || I_k - S^T S ||_2 ≤ μ and |1 - d^T d| ≤ δ, then || I_{k+1} - S_+^T S_+ ||_2 ≤ μ_+
where

   μ_+ = (1/2) ( μ + δ + sqrt( (μ - δ)^2 + 4 || S^T d ||_2^2 ) ).

Proof. See Kahan and Parlett (1974) or Parlett and Scott (1979). []

Thus, if we have a bound for || I_k - Q̂_k^T Q̂_k ||_2 we can generate a bound for
|| I_{k+1} - Q̂_{k+1}^T Q̂_{k+1} ||_2 by applying the lemma with S = Q̂_k and d = q̂_{k+1}.
(In this case δ ≈ u and we assume that q̂_{k+1} has been orthogonalized against
the set of currently good Ritz vectors.) It is possible to estimate the norm
of Q̂_k^T q̂_{k+1} from a simple recurrence that spares one the need for accessing
q̂_1, ..., q̂_k. See Kahan and Parlett (1974) or Parlett and Scott (1979). The
overhead is minimal, and when the bounds signal loss of orthogonality, it is
time to contemplate the enlargement of the set of good Ritz vectors. Then
and only then is T̂_k diagonalized.
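The recurrence of Lemma 9.2.1 is cheap to apply. A one-line helper is sketched below; its arguments correspond to μ, δ, and an estimate of || S^T d ||_2 (for example, an estimate of || Q̂_k^T q̂_{k+1} ||_2), and the names are our own.

import numpy as np

def mu_update(mu, delta, STd_norm):
    # One application of Lemma 9.2.1: from ||I - S^T S||_2 <= mu,
    # |1 - d^T d| <= delta, and ||S^T d||_2, bound ||I - S_+^T S_+||_2.
    return 0.5 * (mu + delta + np.sqrt((mu - delta)**2 + 4.0 * STd_norm**2))

# e.g. update the running bound as each new Lanczos vector is appended:
# mu = mu_update(mu, unit_roundoff, est_norm_QkT_qk1)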

9.2.5 The Ghost Eigenvalue Problem


Considerable effort has been spent in trying to develop a workable Lanc-
zos procedure that does not involve any kind of orthogonality enforcement.
Research in this direction focuses on the problem of "ghost" or "spurious"
eigenvalues. These are multiple eigenvalues of T̂_k that correspond to sim-
ple eigenvalues of A. They arise because the iteration essentially restarts
itself when orthogonality to a converged Ritz vector is lost. (By way of
analogy, consider what would happen during orthogonal iteration (§8.2.8) if
we "forgot" to orthogonalize.)
The problem of identifying ghost eigenvalues and coping with their pres-
ence is discussed in Cullum and Willoughby (1979) and Parlett and Reid
(1981). It is a particularly pressing problem in those applications where all
of A's eigenvalues are desired, for then the above orthogonalization proce-
dures are too expensive to implement.
Difficulties with the Lanczos iteration can be expected even if A has a
genuinely multiple eigenvalue. This follows because the T̂_k are unreduced,
and unreduced tridiagonal matrices cannot have multiple eigenvalues. Our
next practical Lanczos procedure attempts to circumvent this difficulty.

9.2.6 Block Lanczos


Just as the simple power method has a block analog in simultaneous itera-
tion, so does the Lanczos algorithm have a block version. Suppose n = rp
and consider the decomposition

                 [ M_1    B_1^T                 0        ]
                 [ B_1    M_2     .                       ]
   Q^T A Q = T = [          .       .      .              ]        (9.2.6)
                 [                 .       .    B_{r-1}^T ]
                 [ 0            B_{r-1}        M_r        ]

where

   Q = [ X_1, ..., X_r ],        X_i ∈ R^{n×p},

is orthogonal, each M_i ∈ R^{p×p}, and each B_i ∈ R^{p×p} is upper triangular.
Comparing blocks in AQ = QT shows that

   A X_k = X_{k-1} B_{k-1}^T + X_k M_k + X_{k+1} B_k,        X_0 B_0^T ≡ 0

for k = 1:r-1. From the orthogonality of Q we have

   M_k = X_k^T A X_k

for k = 1:r. Moreover, if we define

   R_k = A X_k - X_k M_k - X_{k-1} B_{k-1}^T ∈ R^{n×p},

then X_{k+1} B_k = R_k is a QR factorization of R_k. These observations suggest
that the block tridiagonal matrix T in (9.2.6) can be generated as follows:

   X_1 ∈ R^{n×p} given with X_1^T X_1 = I_p
   M_1 = X_1^T A X_1
   for k = 1:r-1                                                      (9.2.7)
       R_k = A X_k - X_k M_k - X_{k-1} B_{k-1}^T        (X_0 B_0^T ≡ 0)
       X_{k+1} B_k = R_k        (QR factorization of R_k)
       M_{k+1} = X_{k+1}^T A X_{k+1}
   end

At the beginning of the kth pass through the loop we have

   A [ X_1, ..., X_k ] = [ X_1, ..., X_k ] T_k + R_k [ 0, ..., 0, I_p ]        (9.2.8)

where

         [ M_1    B_1^T                 0        ]
         [ B_1    M_2     .                       ]
   T_k = [          .       .      .              ]
         [                 .       .    B_{k-1}^T ]
         [ 0            B_{k-1}        M_k        ]

Using an argument similar to the one used in the proof of Theorem 9.1.1,
we can show that the X_i are mutually orthogonal provided none of the R_i
are rank-deficient. However if rank(R_k) < p for some k, then it is possible
to choose the columns of X_{k+1} such that X_{k+1}^T X_i = 0, for i = 1:k. See
Golub and Underwood (1977).
Because T_k has bandwidth p, it can be efficiently reduced to tridiago-
nal form using an algorithm of Schwartz (1968). Once tridiagonal form is
achieved, the Ritz values can be obtained via the symmetric QR algorithm.
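A bare-bones NumPy sketch of the block recurrence (9.2.7) follows. It assumes X_1 has orthonormal columns, makes no provision for rank-deficient R_k, and uses NumPy's reduced QR factorization for X_{k+1} B_k = R_k; the function name and return format are our own.

import numpy as np

def block_lanczos(A, X1, r):
    # Block Lanczos (9.2.7): returns the diagonal blocks M_k and the
    # subdiagonal blocks B_k of the block tridiagonal matrix T.
    n, p = X1.shape
    X_prev = np.zeros((n, p))
    B_prev = np.zeros((p, p))
    X = X1
    M_blocks, B_blocks = [], []
    for k in range(r):
        M = X.T @ A @ X
        M_blocks.append(M)
        if k == r - 1:
            break
        R = A @ X - X @ M - X_prev @ B_prev.T
        X_next, B = np.linalg.qr(R)       # X_{k+1} B_k = R_k
        B_blocks.append(B)
        X_prev, B_prev, X = X, B, X_next
    return M_blocks, B_blocks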
In order to intelligently decide when to use block Lanczos, it is necessary
to understand how the block dimension affects convergence of the Ritz
values. The following generalization of Theorem 9.1.3 sheds light on this
issue.

Theorem 9.2.2 Let A be an n-by-n symmetric matrix with eigenvalues
λ_1 ≥ ··· ≥ λ_n and corresponding orthonormal eigenvectors z_1, ..., z_n. Let
μ_1 ≥ ··· ≥ μ_p be the p largest eigenvalues of the matrix T_k obtained after
k steps of the block Lanczos iteration (9.2.7). If Z_1 = [ z_1, ..., z_p ] and
cos(θ_p) = σ_p(Z_1^T X_1) > 0, then for i = 1:p,   λ_i ≥ μ_i ≥ λ_i - ε_i^2,   where

   ε_i^2 = (λ_1 - λ_n) [ tan(θ_p) / c_{k-1}(1 + 2γ_i) ]^2,        γ_i = (λ_i - λ_{p+1})/(λ_{p+1} - λ_n),

and c_{k-1}(z) is the Chebyshev polynomial of degree k - 1.

Proof. See Underwood (1975). []

Analogous inequalities can be obtained for T_k's smallest eigenvalues by
applying the theorem with A replaced by -A.
Based on Theorem 9.2.2 and scrutiny of the block Lanczos iteration
(9.2.7) we may conclude that:

• the error bounds for the Ritz values improve with increased p.

• the amount of work required to compute T_k's eigenvalues is propor-
tional to p^2.

• the block dimension should be at least as large as the largest multi-
plicity of any sought-after eigenvalue.

How to determine block dimension in the face of these tradeoffs is discussed
in detail by Scott (1979).
Loss of orthogonality also plagues the block Lanczos algorithm. How-
ever, all of the orthogonality enforcement schemes described above can be
extended to the block setting.

9.2.7 s-Step Lanczos


The block Lanczos algorithm (9.2.7) can be used in an iterative fashion
to calculate selected eigenvalues of A. To fix ideas, suppose we wish to
calculate the p largest eigenvalues. If X_1 ∈ R^{n×p} is a given matrix having
orthonormal columns, we may proceed as follows:

   until || A X_1 - X_1 M_1 ||_F is small enough
       Generate X_2, ..., X_s ∈ R^{n×p} via the block Lanczos algorithm.
       Form T_s = [ X_1, ..., X_s ]^T A [ X_1, ..., X_s ], an sp-by-sp
           matrix of bandwidth p.
       Compute an orthogonal matrix U = [ u_1, ..., u_sp ] such that
           U^T T_s U = diag(θ_1, ..., θ_sp) with θ_1 ≥ ··· ≥ θ_sp.
       Set X_1 = [ X_1, ..., X_s ] [ u_1, ..., u_p ].
   end

This is the block analog of the s-step Lanczos algorithm, which has been
extensively analyzed by Cullum and Donath (1974) and Underwood (1975).
The same idea can also be used to compute several of A's smallest eigen-
values or a mixture of both large and small eigenvalues. See Cullum (1978).
The choice of the parameters s and p depends upon storage constraints as
well as upon the factors we mentioned above in our discussion of block
dimension. The block dimension p may be diminished as the good Ritz

vectors emerge. However this demands that orthogonality to the converged


vectors be enforced. See Cullum and Donath (1974).
Problems

P9.2.1 Prove Lemma 9.2.1.


P9.2.2 If rank(R_k) < p in (9.2.7), does it follow that ran([ X_1, ..., X_k ]) contains an
eigenvector of A?

Notes and References for Sec. 9.2

Of the several computational variants of the Lanczos Method, Algorithm 9.2.1 is the
most stable. For details, see

C.C. Paige (1972). "Computational Variants of the Lanczos Method for the Eigenprob-
lem," J. Inst. Math. Applic. 10, 373-81.

Other practical details associated with the implementation of the Lanczos procedure are
discussed in

D.S. Scott (1979). "How to Make the Lanczos Algorithm Converge Slowly," Math.
Comp. 33, 239-47.
B.N. Parlett, H. Simon, and L.M. Stringer (1982). "On Estimating the Largest Eigen-
value with the Lanczos Algorithm," Math. Comp. 38, 153-166.
B.N. Parlett and B. Nour-Omid (1985). "The Use of a Refined Error Bound When
Updating Eigenvalues of Tridiagonals," Lin. Alg. and Its Applic. 68, 179-220.
J. Kuczynski and H. Woźniakowski (1992). "Estimating the Largest Eigenvalue by the
Power and Lanczos Algorithms with a Random Start," SIAM J. Matrix Anal. Appl.
13, 1094-1122.

The behavior of the Lanczos method in the presence of roundoff error was originally
reported in

C.C. Paige (1971). "The Computation of Eigenvalues and Eigenvectors of Very Large
Sparse Matrices," Ph.D. thesis, University of London.

Important follow-up papers include

C.C. Paige (1976). "Error Analysis of the Lanczos Algorithm for Tridiagonalizing a Sym-
metric Matrix," J. Inst. Math. Applic. 18, 341-49.
C.C. Paige (1980). "Accuracy and Effectiveness of the Lanczos Algorithm for the Sym-
metric Eigenproblem," Lin. Alg. and Its Applic. 34, 235-58.

For a discussion about various reorthogonalization schemes, see

C.C. Paige (1970). "Practical Use of the Symmetric Lanczos Process with Reorthogo-
nalization," BIT 10, 183-95.
G.H. Golub, R. Underwood, and J.H. Wilkinson (1972). "The Lanczos Algorithm for the
Symmetric Ax = λBx Problem," Report STAN-CS-72-270, Department of Computer
Science, Stanford University, Stanford, California.
B.N. Parlett and D.S. Scott (1979). "The Lanczos Algorithm with Selective Orthogo-
nalization," Math. Comp. 33, 217-38.
H. Simon (1984). "Analysis of the Symmetric Lanczos Algorithm with Reorthogonaliza-
tion Methods," Lin. Alg. and Its Applic. 61, 101-132.
Without any reorthogonalization it is necessary either to monitor the loss of orthogonal-
ity and quit at the appropriate instant or else to devise some scheme that will aid in the

distinction between the ghost eigenvalues and the actual eigenvalues. See

W. Kahan and B.N. Parlett (1976). "How Far Should You Go with the Lanczos Process?"
in Sparse Matrix Computations, ed. J. Bunch and D. Rose, Academic Press, New
York, pp. 131-44.
J. Cullum and R.A. Willoughby (1979). "Lanczos and the Computation in Specified
Intervals of the Spectrum of Large, Sparse Real Symmetric Matrices," in Sparse Matrix
Proc., 1978, ed. I.S. Duff and G.W. Stewart, SIAM Publications, Philadelphia, PA.
B.N. Parlett and J.K. Reid (1981). "Tracking the Progress of the Lanczos Algorithm for
Large Symmetric Eigenproblems," IMA J. Num. Anal. 1, 135-55.
D. Calvetti, L. Reichel, and D.C. Sorensen ( 1994). "An Implicitly Restarted Lanczos
Method for Large Symmetric Eigenvalue Problems," ETNA 2, 1-21.
The block Lanczos algorithm is discussed in

J. Cullum and W.E. Donath (1974). "A Block Lanczos Algorithm for Computing the q
Algebraically Largest Eigenvalues and a Corresponding Eigenspace of Large Sparse
Real Symmetric Matrices," Proc. of the 1974 IEEE Conf. on Decision and Control,
Phoenix, Arizona, pp. 505-9.
R. Underwood (1975). "An Iterative Block Lanczos Method for the Solution of Large
Sparse Symmetric Eigenproblems," Report STAN-CS-75-495, Department of Com-
puter Science, Stanford University, Stanford, California.
G.H. Golub and R. Underwood (1977). "The Block Lanczos Method for Computing
Eigenvalues," in Mathematical Software III, ed. J. Rice, Academic Press, New York,
pp. 364-77.
J. Cullum (1978). "The Simultaneous Computation of a Few of the Algebraically Largest
and Smallest Eigenvalues of a Large Sparse Symmetric Matrix," BIT 18, 265-75.
A. Ruhe (1979). "Implementation Aspects of Band Lanczos Algorithms for Computation
of Eigenvalues of Large Sparse Symmetric Matrices," Math. Comp. 39, 680-87.
The block Lanczos algorithm generates a symmetric band matrix whose eigenvalues can
be computed in any of several ways. One approach is described in

H.R. Schwartz (1968). "Tridiagonalization of a Symmetric Band Matrix," Numer. Math.


12, 231-41. See also Wilkinson and Reinsch (1971, 273-83).
In some applications it is necessary to obtain estimates of interior eigenvalues. The
Lanczos algorithm, however, tends to find the extreme eigenvalues first. The following
papers deal with this issue:

A.K. Cline, G.H. Golub, and G.W. Platzman (1976). "Calculation of Normal Modes of
Oceans Using a Lanczos Method," in Sparse Matrix Computations, ed. J.R. Bunch
and D.J. Rose, Academic Press, New York, pp. 409-26.
T. Ericsson and A. Ruhe (1980). "The Spectral Transformation Lanczos Method for the
Numerical Solution of Large Sparse Generalized Symmetric Eigenvalue Problems,"
Math. Comp. 35, 1251-68.
R.G. Grimes, J.G. Lewis, and H.D. Simon (1994). "A Shifted Block Lanczos Algorithm
for Solving Sparse Symmetric Generalized Eigenproblems," SIAM J. Matrix Anal.
Appl. 15, 228-272.

9.3 Applications to Ax = b and Least Squares


In this section we briefly show how the Lanczos iteration can be embellished
to solve large sparse linear equation and least squares problems. For further
details, we recommend Saunders (1995).

9.3.1 Symmetric Positive Definite Systems


Suppose A ∈ R^{n×n} is symmetric and positive definite and consider the func-
tional φ(x) defined by

   φ(x) = (1/2) x^T A x - x^T b

where b ∈ R^n. Since ∇φ(x) = Ax - b, it follows that x = A^{-1} b is the unique
minimizer of φ. Hence, an approximate minimizer of φ can be regarded as
an approximate solution to Ax = b.
Suppose x_0 ∈ R^n is an initial guess. One way to produce a vector se-
quence {x_k} that converges to x is to generate a sequence of orthonormal
vectors {q_k} and to let x_k minimize φ over the set

   x_0 + span{q_1, ..., q_k}

for k = 1:n. If Q_k = [ q_1, ..., q_k ], then this just means choosing y ∈ R^k
such that

   φ(x_0 + Q_k y) = (1/2)(x_0 + Q_k y)^T A (x_0 + Q_k y) - (x_0 + Q_k y)^T b
                  = (1/2) y^T (Q_k^T A Q_k) y - y^T Q_k^T (b - A x_0) + φ(x_0)

is minimized. By looking at the gradient of this expression with respect to
y we see that

   x_k = x_0 + Q_k y_k                                                (9.3.1)

where

   (Q_k^T A Q_k) y_k = Q_k^T (b - A x_0).                             (9.3.2)

When k = n the minimization is over all of R^n and so A x_n = b.
For large sparse A it is necessary to overcome two hurdles in order to
make this an effective solution process:

• the linear system (9.3.2) must be "easily" solved.

• we must be able to compute x_k without having to refer to q_1, ..., q_k
explicitly as (9.3.1) suggests. Otherwise there would be an excessive
amount of data movement.

We show that both of these requirements are met if the q_k are Lanczos
vectors.
After k steps of the Lanczos algorithm we obtain the factorization

   A Q_k = Q_k T_k + r_k e_k^T                                        (9.3.3)

where

         [ α_1   β_1                 0      ]
         [ β_1   α_2    .                   ]
   T_k = [         .     .     .            ]                         (9.3.4)
         [               .     .   β_{k-1}  ]
         [ 0          β_{k-1}     α_k       ]

With this approach (9.3.2) becomes a symmetric positive definite tridiag-
onal system which may be solved via the LDL^T factorization. (See Algo-
rithm 4.3.6.) In particular, by setting

         [ 1                          0 ]
         [ μ_1   1                      ]
   L_k = [        .     .               ]        and        D_k = diag(d_1, ..., d_k),
         [              .     .         ]
         [ 0         μ_{k-1}   1        ]

we find by comparing entries in

   T_k = L_k D_k L_k^T                                                (9.3.5)

that

   d_1 = α_1
   for i = 2:k
       μ_{i-1} = β_{i-1}/d_{i-1}
       d_i = α_i - β_{i-1} μ_{i-1}
   end

Note that we need only calculate the quantities

   μ_{k-1} = β_{k-1}/d_{k-1}
   d_k = α_k - β_{k-1} μ_{k-1}                                        (9.3.6)

in order to obtain L_k and D_k from L_{k-1} and D_{k-1}.
As we mentioned, it is critical to be able to compute x_k in (9.3.1) effi-
ciently. To this end we define C_k ∈ R^{n×k} and p_k ∈ R^k by the equations

   C_k L_k^T = Q_k,        L_k D_k p_k = Q_k^T r_0,                   (9.3.7)

and observe that if r_0 = b - A x_0 then

   x_k = x_0 + Q_k T_k^{-1} Q_k^T r_0 = x_0 + C_k p_k.

Let C_k = [ c_1, ..., c_k ] be a column partitioning. It follows from (9.3.7) that

   [ c_1, μ_1 c_1 + c_2, ..., μ_{k-1} c_{k-1} + c_k ] = [ q_1, ..., q_k ]

and therefore C_k = [ C_{k-1}, c_k ] where

   c_k = q_k - μ_{k-1} c_{k-1}.

Also observe that if we set p_k = [ ρ_1, ..., ρ_k ]^T in L_k D_k p_k = Q_k^T r_0, then
that equation becomes

   [ L_{k-1} D_{k-1}                      0   ] [ ρ_1  ]     [ q_1^T r_0 ]
   [                                          ] [  ..  ]  =  [    ..     ]
   [ 0  ···  0    μ_{k-1} d_{k-1}        d_k  ] [ ρ_k  ]     [ q_k^T r_0 ]

Thus,

   p_k = [ p_{k-1} ]
         [ ρ_k     ]

where

   ρ_k = ( q_k^T r_0 - μ_{k-1} d_{k-1} ρ_{k-1} ) / d_k,

and thus,

   x_k = x_0 + C_k p_k = x_0 + C_{k-1} p_{k-1} + ρ_k c_k = x_{k-1} + ρ_k c_k.

This is precisely the kind of recursive formula for x_k that we need. To-
gether with (9.3.6) and (9.3.7) it enables us to make the transition from
(q_{k-1}, c_{k-1}, x_{k-1}) to (q_k, c_k, x_k) with a minimal amount of work and storage.
A further simplification results if we set q_1 to be a unit vector in the
direction of the initial residual r_0 = b - A x_0. With this choice for a Lanczos
starting vector, q_k^T r_0 = 0 for k ≥ 2. It follows from (9.3.3) that

   b - A x_k = b - A(x_0 + Q_k y_k) = r_0 - (Q_k T_k + r_k e_k^T) y_k
             = r_0 - Q_k Q_k^T r_0 - r_k e_k^T y_k = -r_k e_k^T y_k.

Thus, if β_k = || r_k ||_2 = 0 in the Lanczos iteration, then A x_k = b. Moreover,
|| A x_k - b ||_2 = β_k |e_k^T y_k| and so estimates of the current residual can be
obtained as a by-product of the iteration. Overall, we have the following
procedure.

Algorithm 9.3.1 If A ∈ R^{n×n} is symmetric positive definite, b ∈ R^n, and
x_0 ∈ R^n is an initial guess (A x_0 ≈ b), then this algorithm computes the
solution to A x = b.

   r_0 = b - A x_0
   β_0 = || r_0 ||_2
   q_0 = 0
   k = 0
   while β_k ≠ 0
       q_{k+1} = r_k/β_k
       k = k + 1
       α_k = q_k^T A q_k
       r_k = (A - α_k I) q_k - β_{k-1} q_{k-1}
       β_k = || r_k ||_2
       if k = 1
           d_1 = α_1
           c_1 = q_1
           ρ_1 = β_0/α_1
           x_1 = x_0 + ρ_1 q_1
       else
           μ_{k-1} = β_{k-1}/d_{k-1}
           d_k = α_k - β_{k-1} μ_{k-1}
           c_k = q_k - μ_{k-1} c_{k-1}
           ρ_k = -μ_{k-1} d_{k-1} ρ_{k-1}/d_k
           x_k = x_{k-1} + ρ_k c_k
       end
   end
   x = x_k

This algorithm requires one matrix-vector multiplication and a couple of
saxpy operations per iteration. The numerical behavior of Algorithm 9.3.1
is discussed in the next chapter, where it is rederived and identified as the
widely known method of conjugate gradients.
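The following NumPy sketch follows Algorithm 9.3.1 line by line; the tolerance and iteration cap replace the exact test β_k ≠ 0 and, together with the function name, are our own additions.

import numpy as np

def lanczos_spd_solve(A, b, x0, tol=1e-12, maxit=500):
    # Algorithm 9.3.1: Lanczos-based solver for symmetric positive
    # definite A x = b (mathematically equivalent to conjugate gradients).
    x = x0.copy()
    r = b - A @ x0
    beta0 = np.linalg.norm(r)
    beta_prev = beta0                 # beta_{k-1}
    q_prev = np.zeros_like(b)
    d_prev = rho_prev = 0.0
    c_prev = np.zeros_like(b)
    k = 0
    while beta_prev > tol and k < maxit:
        q, q_prev = r / beta_prev, q_prev
        k += 1
        alpha = q @ (A @ q)
        r = A @ q - alpha * q - beta_prev * q_prev
        if k == 1:
            d, c, rho = alpha, q, beta0 / alpha
        else:
            mu = beta_prev / d_prev
            d = alpha - beta_prev * mu
            c = q - mu * c_prev
            rho = -mu * d_prev * rho_prev / d
        x = x + rho * c
        d_prev, c_prev, rho_prev = d, c, rho
        q_prev = q
        beta_prev = np.linalg.norm(r)
    return x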

9.3.2 Symmetric Indefinite Systems


A key feature in the above development is the idea of computing the LDL^T
factorization of the tridiagonal matrices T_k. Unfortunately, this is poten-
tially unstable if A, and consequently T_k, is not positive definite. A way
around this difficulty proposed by Paige and Saunders (1975) is to develop
the recursion for x_k via an "LQ" factorization of T_k. In particular, at the
kth step of the iteration, we have Givens rotations J_1, ..., J_{k-1} such that

                                 [ d_1                                 0   ]
                                 [ e_1   d_2                               ]
   T_k J_1 ··· J_{k-1} = L_k  =  [ f_1   e_2   d_3                         ]
                                 [         .      .      .                 ]
                                 [ 0        f_{k-2}   e_{k-1}   d_k        ]

Note that with this factorization x_k is given by

   x_k = x_0 + Q_k y_k = x_0 + Q_k T_k^{-1} Q_k^T (b - A x_0) = x_0 + W_k s_k

where

   W_k = Q_k (J_1 ··· J_{k-1})

and s_k ∈ R^k solves

   L_k s_k = Q_k^T (b - A x_0).

Scrutiny of these equations enables one to develop a formula for computing
x_k from x_{k-1} and an easily computed multiple of w_k, the last column of
W_k. This defines the SYMMLQ method set forth in Paige and Saunders
(1975).
A different idea is to notice from (9.3.3) and the definition β_k q_{k+1} = r_k
that

   A Q_k = Q_{k+1} H_k

where

   H_k = [ T_k        ]
         [ β_k e_k^T  ].

This (k+1)-by-k matrix is upper Hessenberg and figures in the MINRES
method of Paige and Saunders (1975). In this technique x_k minimizes
|| A x - b ||_2 over the set x_0 + span{q_1, ..., q_k}. Note that

   || A(x_0 + Q_k y) - b ||_2 = || A Q_k y - (b - A x_0) ||_2
        = || Q_{k+1} H_k y - (b - A x_0) ||_2 = || H_k y - β_0 e_1 ||_2

where it is assumed that q_1 = (b - A x_0)/β_0 is a unit vector. As in SYMMLQ,
it is possible to develop recursions that permit the efficient computation of
x_k from its predecessor x_{k-1}. The QR factorization of H_k is involved.
The behavior of the conjugate gradient method is detailed in the next
chapter. The convergence of SYMMLQ and MINRES is more complicated
and is discussed in Paige, Parlett, and Van Der Vorst (1995).
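The MINRES idea can be illustrated naively by forming H_k explicitly and solving the small least squares problem with a dense solver. The sketch below reuses the lanczos sketch of §9.1 and is meant only to expose the mathematics; a genuine MINRES code updates x_k with short recurrences via the QR factorization of H_k and never stores Q_k.

import numpy as np

def minres_naive(A, b, x0, k):
    # Run k Lanczos steps from q_1 = r_0/beta_0, build the (k+1)-by-k
    # matrix H_k = [T_k; beta_k e_k^T], and minimize ||H_k y - beta_0 e_1||_2.
    r0 = b - A @ x0
    beta0 = np.linalg.norm(r0)
    Q, alpha, beta = lanczos(A, r0 / beta0, k)     # sketch from Sec. 9.1
    m = len(alpha)
    H = np.zeros((m + 1, m))
    H[:m, :m] = (np.diag(alpha) + np.diag(beta[:m - 1], 1)
                 + np.diag(beta[:m - 1], -1))
    H[m, m - 1] = beta[m - 1]
    rhs = np.zeros(m + 1)
    rhs[0] = beta0
    y = np.linalg.lstsq(H, rhs, rcond=None)[0]
    return x0 + Q @ y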

9.3.3 Bidiagonalization and the SVD


Suppose U^T A V = B represents the bidiagonalization of A ∈ R^{m×n} with

   U = [ u_1, ..., u_m ],        U^T U = I_m,
   V = [ v_1, ..., v_n ],        V^T V = I_n,

and

        [ α_1   β_1                 0      ]
        [ 0     α_2    .                   ]
   B =  [         .     .     .            ]                          (9.3.8)
        [               .     .   β_{n-1}  ]
        [ 0             0         α_n      ]

Recall from §5.4.3 that this factorization may be computed using House-
holder transformations and that it serves as a front end for the SVD algo-
rithm.
Unfortunately, if A is large and sparse, then we can expect large, dense
submatrices to arise during the Householder bidiagonalization. Conse-
quently, it would be nice to develop a means for computing B directly
without any orthogonal updates of the matrix A.
Proceeding just as we did in §9.1.2 we compare columns in the equations
AV = UB and A^T U = V B^T for k = 1:n and obtain

   A v_k = α_k u_k + β_{k-1} u_{k-1},        β_0 u_0 ≡ 0
                                                                      (9.3.9)
   A^T u_k = α_k v_k + β_k v_{k+1},          β_n v_{n+1} ≡ 0

Defining

   r_k = A v_k - β_{k-1} u_{k-1}
   p_k = A^T u_k - α_k v_k

we may conclude from orthonormality that α_k = ±|| r_k ||_2, u_k = r_k/α_k,
β_k = ±|| p_k ||_2, and v_{k+1} = p_k/β_k. Properly sequenced, these equations
define the Lanczos method for bidiagonalizing a rectangular matrix:

   v_1 = given unit 2-norm n-vector
   p_0 = v_1; β_0 = 1; k = 0; u_0 = 0
   while β_k ≠ 0
       v_{k+1} = p_k/β_k
       k = k + 1
       r_k = A v_k - β_{k-1} u_{k-1}                                  (9.3.10)
       α_k = || r_k ||_2
       u_k = r_k/α_k
       p_k = A^T u_k - α_k v_k
       β_k = || p_k ||_2
   end

If rank(A) = n, then we can guarantee that no zero α_k arise. Indeed, if
α_k = 0 then span{A v_1, ..., A v_k} ⊂ span{u_1, ..., u_{k-1}}, which implies rank
deficiency.
If β_k = 0, then it is not hard to verify that

   A V_k = U_k B_k        and        A^T U_k = V_k B_k^T,        V_k = [ v_1, ..., v_k ],   U_k = [ u_1, ..., u_k ],

where B_k = B(1:k, 1:k) and B is prescribed by (9.3.8). Thus, the v vectors
and the u vectors are singular vectors and σ(B_k) ⊆ σ(A). Lanczos bidiag-
onalization is discussed in Paige (1974). See also Cullum and Willoughby
(1985a, 1985b). It is essentially equivalent to applying the Lanczos tridiag-
onalization scheme to the symmetric matrix

   C = [ 0     A ]
       [ A^T   0 ].

We showed that λ_i(C) = σ_i(A) = -λ_{n+m-i+1}(C) for i = 1:n at the
beginning of §8.6. Because of this, it is not surprising that the large singular
values of the bidiagonal matrix tend to be very good approximations to the
large singular values of A. The small singular values of A correspond to the
interior eigenvalues of C and are not so well approximated. The equivalent
of the Kaniel-Paige theory for the Lanczos bidiagonalization may be found
in Luk (1978) as well as in Golub, Luk, and Overton (1981). The analytic,
algorithmic, and numerical developments of the previous two sections all
carry over naturally to the bidiagonalization.
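A NumPy sketch of the recurrence (9.3.10) is given below; it assumes rank(A) = n so that no α_k vanishes, performs no reorthogonalization, and its function name and return format are our own.

import numpy as np

def lanczos_bidiag(A, v1, k):
    # Bidiagonalization recurrence (9.3.10): produces U_k, V_k and the upper
    # bidiagonal entries alpha(1:k), beta(1:k), with A V = U B and
    # A^T U = V B^T in exact arithmetic.
    m, n = A.shape
    U = np.zeros((m, k)); V = np.zeros((n, k))
    alpha = np.zeros(k); beta = np.zeros(k)
    v = v1 / np.linalg.norm(v1)
    u_prev = np.zeros(m)
    b = 0.0                               # beta_{k-1}
    for j in range(k):
        V[:, j] = v
        r = A @ v - b * u_prev
        alpha[j] = np.linalg.norm(r)
        u = r / alpha[j]
        U[:, j] = u
        p = A.T @ u - alpha[j] * v
        b = np.linalg.norm(p)
        beta[j] = b
        if b == 0.0:                      # invariant information obtained
            break
        v = p / b
        u_prev = u
    return U, V, alpha, beta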

9.3.4 Least Squares


The full-rank LS problem min II Ax- b ll2 can be solved via the bidiago-
nalization. In particular,

2:: YiVi
i=l

where y = [YJ, ... ,yn]T solves the system By = [ufb, ... ,u~bjT. Note
that because B is upper bidiagonal, we cannot solve for y until the bidi-
agonalization is complete. Moreover, we are required to save the vectors
vi, ... , Vn, an unhappy circumstance if n is large.
The development of a sparse least squares algorithm based on the bidi-
agonalization can be accomplished more favorably if A is reduced to lower

bidiagonal form

                  [ α_1    0                0   ]
                  [ β_1    α_2                   ]
   U^T A V = B =  [        β_2    .                ]
                  [                .     α_n      ]
                  [                      β_n      ]
                  [ 0                             ]

where V = [ v_1, ..., v_n ] and U = [ u_1, ..., u_m ] are orthogonal. Comparing
columns in the equations A^T U = V B^T and A V = U B we obtain

   A^T u_k = β_{k-1} v_{k-1} + α_k v_k,        β_0 v_0 ≡ 0
   A v_k = α_k u_k + β_k u_{k+1}

It is straightforward to develop a Lanczos procedure from these equations
and the resulting algorithm is very similar to (9.3.10), only u_1 is the starting
vector.
Define the matrices V_k = [ v_1, ..., v_k ], U_k = [ u_1, ..., u_k ], and B_k =
B(1:k+1, 1:k) and observe that A V_k = U_{k+1} B_k. Our goal is to compute x_k,
the minimizer of || A x - b ||_2 over all vectors of the form x = x_0 + V_k y, where
y ∈ R^k and x_0 ∈ R^n is an initial guess. If u_1 = (b - A x_0)/|| b - A x_0 ||_2, then

   A(x_0 + V_k y) - b = U_{k+1} B_k y - β_1 U_{k+1} e_1 = U_{k+1} (B_k y - β_1 e_1)

where e_1 = I_{k+1}(:, 1). It follows that if y_k solves the (k+1)-by-k lower
bidiagonal LS problem

   min_y || B_k y - β_1 e_1 ||_2

then x_k = x_0 + V_k y_k. Since B_k is lower bidiagonal, it is easy to compute
Givens rotations J_1, ..., J_k such that

   J_k ··· J_1 B_k = [ R_k ]
                     [ 0   ]

is upper bidiagonal. If

   J_k ··· J_1 (β_1 e_1) = [ d_k ]   } k
                           [ μ   ]   } 1

then it follows that x_k = x_0 + V_k y_k = x_0 + W_k d_k where W_k = V_k R_k^{-1}. Paige
and Saunders (1982a) show how x_k can be obtained from x_{k-1} via a simple

recursion that involves the last column of Wk. The net result is a sparse LS
algorithm referred to as LSQR that requires only a few n-vectors of storage
to implement.
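To make the LSQR idea concrete, the sketch below runs the lower bidiagonalization from u_1 = b/||b||_2 (taking x_0 = 0 for simplicity) and then solves the small problem min || B_k y - β_1 e_1 ||_2 with a dense least squares solver; it assumes A has full column rank, and genuine LSQR instead obtains x_k from x_{k-1} with short recurrences and never stores V_k.

import numpy as np

def lsqr_naive(A, b, k):
    # Lower bidiagonalization from u_1 = b/beta1, then a dense LS solve.
    beta1 = np.linalg.norm(b)
    u = b / beta1
    V, alphas, betas = [], [], []
    v = A.T @ u
    for j in range(k):
        a = np.linalg.norm(v)
        v = v / a
        alphas.append(a); V.append(v)
        w = A @ v - a * u
        bb = np.linalg.norm(w)
        betas.append(bb)
        if bb == 0.0:
            break
        u = w / bb
        v = A.T @ u - bb * v
    kk = len(alphas)
    B = np.zeros((kk + 1, kk))
    for i in range(kk):
        B[i, i] = alphas[i]
        B[i + 1, i] = betas[i]
    rhs = np.zeros(kk + 1)
    rhs[0] = beta1
    y = np.linalg.lstsq(B, rhs, rcond=None)[0]
    return np.column_stack(V) @ y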

Problems

P9.3.1 Modify Algorithm 9.3.1 so that it implements the indefinite symmetric solver
outlined in §9.3.2.
P9.3.2 How many vector workspaces are required to implement efficiently (9.3.10)?
P9.3.3 Suppose A is rank deficient and ak = 0 in (9.3.10). How could Uk be obtained
so that the iteration could continue?
P9.3.4 Work out the lower bidiagonal version of (9.3.10) and detail the least squares
solver sketched in §9.3.4.

Notes and References for Sec. 9.3

Much of the material in this section has been distilled from the following papers:

C.C. Paige (1974). "Bidiagonalization of Matrices and Solution of Linear Equations,"


SIAM J. Num. Anal. 11, 197-209.
C.C. Paige and M.A. Saunders (1975). "Solution of Sparse Indefinite Systems of Linear
Equations," SIAM J. Num. Anal. 12, 617-29.
C.C. Paige and M.A. Saunders (1982a). "LSQR: An Algorithm for Sparse Linear Equa-
tions and Sparse Least Squares," ACM Trans. Math. Soft. 8, 43-71.
C.C. Paige and M.A. Saunders (1982b). "Algorithm 583 LSQR: Sparse Linear Equations
and Least Squares Problems," ACM Trans. Math. Soft. 8, 195-209.
M.A. Saunders (1995). "Solution of Sparse Rectangular Systems," BIT 35, 588-604.

See also Cullum and Willoughby (1985a, 1985b) and

O. Widlund (1978). "A Lanczos Method for a Class of Nonsymmetric Systems of Linear
Equations," SIAM J. Numer. Anal. 15, 801-12.
B.N. Parlett (1980). "A New Look at the Lanczos Algorithm for Solving Symmetric
Systems of Linear Equations," Lin. Alg. and Its Applic. 29, 323-46.
G.H. Golub, F.T. Luk, and M. Overton (1981). "A Block Lanczos Method for Computing
the Singular Values and Corresponding Singular Vectors of a Matrix," ACM Trans.
Math. Soft. 7, 149-69.
J. Cullum, R.A. Willoughby, and M. Lake (1983). "A Lanczos Algorithm for Computing
Singular Values and Vectors of Large Matrices," SIAM J. Sci. and Stat. Comp. 4,
197-215.
Y. Saad (1987). "On the Lanczos Method for Solving Symmetric Systems with Several
Right Hand Sides," Math. Comp. 48, 651-662.
M. Berry and G.H. Golub (1991). "Estimating the Largest Singular Values of Large
Sparse Matrices via Modified Moments," Numerical Algorithms 1, 353-374.
C.C. Paige, B.N. Parlett, and H.A. Van Der Vorst (1995). "Approximate Solutions and
Eigenvalue Bounds from Krylov Subspaces," Numer. Linear Algebra with Applic. 2,
115-134.

9.4 Arnoldi and Unsymmetric Lanczos


If A is not symmetric, then the orthogonal tridiagonalization QT AQ = T
does not exist in general. There are two ways to proceed. The Arnoldi
approach involves the column-by-column generation of an orthogonal Q
such that QT AQ = H is the Hessenberg reduction of §7.4. The unsym-
metric Lanczos approach computes the columns of Q = [q~, ... ,qn] and
P = [P1, ... ,Pn] so that pT AQ = Tis tridiagonal and pTQ =In. Both
methods are interesting as large sparse unsymmetric eigenvalue solvers and
both can be adapted for sparse unsymmetric Ax= b solving. (See §10.4.)

9.4.1 The Basic Arnoldi Iteration


One way to extend the Lanczos process to unsymmetric matrices is due to
Arnoldi (1951) and revolves around the Hessenberg reduction Q^T A Q = H.
In particular, if Q = [ q_1, ..., q_n ] and we compare columns in AQ = QH,
then

   A q_k = Σ_{i=1}^{k+1} h_{ik} q_i,        1 ≤ k ≤ n-1.

Isolating the last term in the summation gives

   h_{k+1,k} q_{k+1} = A q_k - Σ_{i=1}^k h_{ik} q_i ≡ r_k

where h_{ik} = q_i^T A q_k for i = 1:k. It follows that if r_k ≠ 0, then q_{k+1} is
specified by

   q_{k+1} = r_k/h_{k+1,k}

where h_{k+1,k} = || r_k ||_2. These equations define the Arnoldi process and in
strict analogy to the symmetric Lanczos process (9.1.3) we obtain:

   r_0 = q_1
   h_{10} = 1
   k = 0
   while (h_{k+1,k} ≠ 0)
       q_{k+1} = r_k/h_{k+1,k}
       k = k + 1
       r_k = A q_k                                                    (9.4.1)
       for i = 1:k
           h_{ik} = q_i^T r_k
           r_k = r_k - h_{ik} q_i
       end
       h_{k+1,k} = || r_k ||_2
   end

We assume that q_1 is a given unit 2-norm starting vector. The q_k are called
the Arnoldi vectors and they define an orthonormal basis for the Krylov
subspace K(A, q_1, k):

   span{q_1, ..., q_k} = K(A, q_1, k).                                (9.4.2)

The situation after k steps is summarized by the k-step Arnoldi factoriza-
tion

   A Q_k = Q_k H_k + r_k e_k^T                                        (9.4.3)

where Q_k = [ q_1, ..., q_k ], e_k = I_k(:, k), and

         [ h_11   h_12   ···   h_1k ]
         [ h_21   h_22   ···   h_2k ]
   H_k = [          .      .        ]
         [               .     .    ]
         [ 0       h_{k,k-1}  h_kk  ]

If r_k = 0, then the columns of Q_k define an invariant subspace and λ(H_k) ⊆
λ(A). Otherwise, the focus is on how to extract information about A's
eigensystem from the Hessenberg matrix H_k and the matrix Q_k of Arnoldi
vectors.
If y ∈ R^k is a unit 2-norm eigenvector for H_k and H_k y = λ y, then from
(9.4.3)

   (A - λ I) x = (e_k^T y) r_k

where x = Q_k y. We call λ a Ritz value and x the corresponding Ritz
vector. The size of |e_k^T y| || r_k ||_2 can be used to obtain error bounds, although
the relevant perturbation theorems are not as routine to apply as in the
symmetric case.
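A NumPy sketch of the Arnoldi process (9.4.1) follows. It stores Q_k and H_k explicitly and orthogonalizes by modified Gram-Schmidt, which is how the O(kn) per-step cost discussed below arises; the function name and fixed step count are our own.

import numpy as np

def arnoldi(A, q1, k):
    # Arnoldi process (9.4.1): returns Q_k with orthonormal columns spanning
    # K(A, q1, k), the k-by-k upper Hessenberg H_k, and the residual r_k,
    # so that A Q_k = Q_k H_k + r_k e_k^T as in (9.4.3).
    n = A.shape[0]
    Q = np.zeros((n, k))
    H = np.zeros((k, k))
    Q[:, 0] = q1 / np.linalg.norm(q1)
    r = np.zeros(n)
    for j in range(k):
        r = A @ Q[:, j]
        for i in range(j + 1):            # modified Gram-Schmidt sweep
            H[i, j] = Q[:, i] @ r
            r = r - H[i, j] * Q[:, i]
        h = np.linalg.norm(r)
        if j + 1 < k:
            if h == 0.0:                  # invariant subspace found
                return Q[:, :j + 1], H[:j + 1, :j + 1], r
            H[j + 1, j] = h
            Q[:, j + 1] = r / h
    return Q, H, r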
Some numerical properties of the Arnoldi iteration are discussed in
Wilkinson (1965, p.382). As with the symmetric Lanczos iteration, loss
of orthogonality among the q_i is an issue. But two other features of (9.4.1)
must be addressed before a practical Arnoldi eigensolver can be obtained:

• The Arnoldi vectors q_1, ..., q_k are referenced in step k and the com-
putation of H_k(1:k, k) involves O(kn) flops. Thus, there is a steep
penalty associated with the generation of long Arnoldi sequences.

• The eigenvalues of H_k do not approximate the eigenvalues of A in the
style of Kaniel and Paige. This is in contrast to the symmetric case
where information about A's extremal eigenvalues emerges quickly.
With Arnoldi, the early extraction of eigenvalue information depends
crucially on the choice of q_1.

These realities suggest a framework in which we use Arnoldi with repeated,
carefully chosen restarts and a controlled iteration maximum. (Recall the
s-step Lanczos process of §9.2.7.)

9.4.2 Arnoldi with Restarting


Consider running Arnoldi for m steps and then restarting the process with
a vector q_+ chosen from the span of the Arnoldi vectors q_1, ..., q_m. Because
of the Krylov connection (9.4.2), q_+ has the form

   q_+ = p(A) q_1

for some polynomial p of degree m-1. If A v_i = λ_i v_i for i = 1:n and q_1 has
the eigenvector expansion

   q_1 = a_1 v_1 + ··· + a_n v_n,

then

   q_+ = a_1 p(λ_1) v_1 + ··· + a_n p(λ_n) v_n.

Note that K(A, q_+, m) is rich in eigenvectors that are emphasized by p(λ).
That is, if p(λ_wanted) is large compared to p(λ_unwanted), then the Krylov
space K(A, q_+, m) will have much better approximations to the eigenvector
x_wanted than to the eigenvector x_unwanted. (It is possible to couch this
argument in terms of Schur vectors and invariant subspaces rather than in
terms of particular eigenvectors.)
Thus the act of picking a good restart vector q_+ from K(A, q_1, m) is the
act of picking a polynomial "filter" that tunes out unwanted portions of the
spectrum. Various heuristics for doing this have been developed based on
computed Ritz vectors. See Saad (1980, 1984, 1992).
We describe a method due to Sorensen (1992) that determines the
restart vector implicitly using the QR iteration with shifts. The restart
occurs after every m steps and we assume that m > j where j is the num-
ber of sought-after eigenvalues. The choice of the Arnoldi length parameter
m depends on the problem dimension n, the effect of orthogonality loss, and
system storage constraints.
After m steps we have the Arnoldi factorization

   A Q_c = Q_c H_c + r_c e_m^T

where Q_c ∈ R^{n×m} has orthonormal columns, H_c ∈ R^{m×m} is upper Hessen-
berg, and Q_c^T r_c = 0. The subscript "c" stands for "current." The QR
iteration with shifts is then applied to H_c:

   H^(1) = H_c
   for i = 1:p
       H^(i) - μ_i I = V_i R_i
       H^(i+1) = R_i V_i + μ_i I
   end

Here p = m - j and it is assumed that the implicitly shifted QR process of
§7.5.5 is applied. The selection of the shifts will be discussed shortly.
The orthogonal matrix V = V_1 ··· V_p has three crucial properties:

(1) H_+ = V^T H_c V. This is because V_i^T H^(i) V_i = H^(i+1).

(2) [V]_{mi} = 0 for i = 1:j-1. This is because each V_i is upper Hessenberg
and so V ∈ R^{m×m} has lower bandwidth p = m - j.

(3) The first column of V has the form

   V e_1 = α (H_c - μ_p I) ··· (H_c - μ_1 I) e_1                      (9.4.4)

where α is a scalar.

To be convinced of property (3), consider the p = 2 case:

   V_1 (V_2 R_2) R_1 = V_1 (H^(2) - μ_2 I) R_1
        = V_1 (V_1^T H^(1) V_1 - μ_2 I) R_1 = (H^(1) - μ_2 I) V_1 R_1
        = (H^(1) - μ_2 I)(H^(1) - μ_1 I) = (H_c - μ_2 I)(H_c - μ_1 I).

Since R_2 R_1 is upper triangular, the first column of V = V_1 V_2 is a multiple
of (H_c - μ_2 I)(H_c - μ_1 I) e_1.
We now show how to restart the Arnoldi process using the matrix V to
implicitly select the new starting vector. From (1) we obtain the following
transformation of (9.4.3):

   A Q_+ = Q_+ H_+ + r_c e_m^T V

where Q_+ = Q_c V. This is not a new length-m Arnoldi factorization because
e_m^T V is not a multiple of e_m^T. However, in view of property (2),

   A Q_+(:, 1:j) = Q_+(:, 1:j) H_+(1:j, 1:j) + r_+ e_j^T,
        r_+ = H_+(j+1, j) Q_+(:, j+1) + (e_m^T V e_j) r_c,            (9.4.5)

is a length-j Arnoldi factorization. By "jumping into" the basic Arnoldi
iteration at step j+1 and performing p steps, we can extend (9.4.5) to a new
length-m Arnoldi factorization. Moreover, using property (3) the associated
starting vector q_1^(new) = Q_+(:, 1) has the following characterization:

   Q_+(:, 1) = Q_c V e_1 = α Q_c (H_c - μ_p I) ··· (H_c - μ_1 I) e_1
             = α (A - μ_p I) ··· (A - μ_1 I) Q_c e_1.                 (9.4.6)

The last equation follows from the identity

   (A - μ I) Q_c = Q_c (H_c - μ I) + r_c e_m^T

and the fact that e_m^T f(H_c) e_1 = 0 for any polynomial f(·) of degree p-1 or
less.

Thus, q_1^(new) = p(A) q_1 where p(λ) is the polynomial

   p(λ) = (λ - μ_1)(λ - μ_2) ··· (λ - μ_p).

This shows that the shifts are the zeros of the filtering polynomial. One
interesting choice for the shifts is to compute λ(H_c) and to identify the
eigenvalues of interest λ̃_1, ..., λ̃_j:

   λ(H_c) = {λ̃_1, ..., λ̃_j} ∪ {λ̃_{j+1}, ..., λ̃_m}.

Setting μ_i = λ̃_{j+i} for i = 1:p is one way of generating a filter polynomial
that de-emphasizes the unwanted portion of the spectrum.
We have just presented the rudiments of the implicitly restarted Arnoldi
method. It has many attractive attributes. For implementation details and
further analysis, see Lehoucq and Sorensen (1996) and Morgan (1996).
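The following sketch performs one implicit restart in the spirit described above, starting from the Q_m, H_m, r_m produced by m Arnoldi steps (for instance by the arnoldi sketch of §9.4.1). It uses explicit shifted QR steps on H_c rather than bulge chasing for clarity, and the choice of the p smallest-magnitude Ritz values as shifts assumes the largest-magnitude eigenvalues are wanted; complex shifts simply push the computation into complex arithmetic, whereas production codes stay real with double implicit shifts.

import numpy as np

def implicit_restart(Q, H, r, j):
    # One implicit restart: filter with p = m - j exact-shift QR steps and
    # return a length-j Arnoldi factorization A Q_j = Q_j H_j + r_j e_j^T.
    m = H.shape[0]
    mu = np.linalg.eigvals(H)
    mu = mu[np.argsort(np.abs(mu))][:m - j]        # unwanted Ritz values
    V = np.eye(m, dtype=complex)
    H = H.astype(complex)
    for s in mu:
        Qs, Rs = np.linalg.qr(H - s * np.eye(m))
        H = Qs.conj().T @ H @ Qs
        V = V @ Qs
    Qp = Q @ V
    r_new = Qp[:, j] * H[j, j - 1] + r * V[m - 1, j - 1]
    return Qp[:, :j], H[:j, :j], r_new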

9.4.3 Unsymmetric Lanczos Tridiagonalization


Another way to extend the symmetric Lanczos process is to reduce A
to tridiagonal form using a general similarity transformation. Suppose
A ∈ R^{n×n} and that a nonsingular matrix Q exists so that

                     [ α_1   γ_1                 0      ]
                     [ β_1   α_2    .                   ]
   Q^{-1} A Q = T =  [         .     .     .            ]
                     [               .     .   γ_{n-1}  ]
                     [ 0          β_{n-1}     α_n       ]

With the column partitionings

   Q = [ q_1, ..., q_n ],        Q^{-T} = P = [ p_1, ..., p_n ],

we find upon comparing columns in AQ = QT and A^T P = P T^T that

   A q_k = γ_{k-1} q_{k-1} + α_k q_k + β_k q_{k+1}
   A^T p_k = β_{k-1} p_{k-1} + α_k p_k + γ_k p_{k+1}

for k = 1:n-1. These equations together with the biorthogonality condition
P^T Q = I_n imply

   α_k = p_k^T A q_k

and

   β_k q_{k+1} = r_k = (A - α_k I) q_k - γ_{k-1} q_{k-1}
   γ_k p_{k+1} = s_k = (A - α_k I)^T p_k - β_{k-1} p_{k-1}.

There is some flexibility in choosing the scale factors β_k and γ_k. Note that

   1 = p_{k+1}^T q_{k+1} = (s_k/γ_k)^T (r_k/β_k).

It follows that once β_k is specified, γ_k is given by

   γ_k = s_k^T r_k / β_k.

With the "canonical" choice β_k = || r_k ||_2 we obtain:

   q_1, p_1 given unit 2-norm vectors with p_1^T q_1 ≠ 0
   k = 0; q_0 = 0; r_0 = q_1; p_0 = 0; s_0 = p_1
   while (r_k ≠ 0) ∧ (s_k ≠ 0) ∧ (s_k^T r_k ≠ 0)
       β_k = || r_k ||_2
       γ_k = s_k^T r_k / β_k
       q_{k+1} = r_k/β_k
       p_{k+1} = s_k/γ_k
       k = k + 1                                                      (9.4.7)
       α_k = p_k^T A q_k
       r_k = (A - α_k I) q_k - γ_{k-1} q_{k-1}
       s_k = (A - α_k I)^T p_k - β_{k-1} p_{k-1}
   end

If

         [ α_1   γ_1                 0      ]
         [ β_1   α_2    .                   ]
   T_k = [         .     .     .            ]
         [               .     .   γ_{k-1}  ]
         [ 0          β_{k-1}     α_k       ]

then the situation at the bottom of the loop is summarized by the equations

   A [ q_1, ..., q_k ] = [ q_1, ..., q_k ] T_k + r_k e_k^T            (9.4.8)
   A^T [ p_1, ..., p_k ] = [ p_1, ..., p_k ] T_k^T + s_k e_k^T.       (9.4.9)

If r_k = 0, then the iteration terminates and span{q_1, ..., q_k} is an invari-
ant subspace for A. If s_k = 0, then the iteration also terminates and
span{p_1, ..., p_k} is an invariant subspace for A^T. However, if neither of
these conditions are true and s_k^T r_k = 0, then the tridiagonalization process
ends without any invariant subspace information. This is called serious
breakdown. See Wilkinson (1965, p.389) for an early discussion of the mat-
ter.
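A NumPy sketch of (9.4.7) is shown below. The scaling β_k = || r_k ||_2, γ_k = s_k^T r_k/β_k follows the text, the function stops when it detects termination or serious breakdown, and the function name and return format are our own.

import numpy as np

def unsymmetric_lanczos(A, q1, p1, steps):
    # Two-sided Lanczos (9.4.7): builds Q, P with P^T Q = I and the alpha,
    # beta, gamma entries of the tridiagonal T = P^T A Q.
    n = A.shape[0]
    Q, P, alpha, beta, gamma = [], [], [], [], []
    q_prev = np.zeros(n); p_prev = np.zeros(n)
    r, s = q1.astype(float), p1.astype(float)
    for k in range(steps):
        if np.linalg.norm(r) == 0 or np.linalg.norm(s) == 0 or s @ r == 0:
            break                          # termination or serious breakdown
        b = np.linalg.norm(r)
        g = (s @ r) / b
        q, p = r / b, s / g
        a = p @ (A @ q)
        Q.append(q); P.append(p); alpha.append(a)
        if k > 0:
            beta.append(b); gamma.append(g)    # sub/superdiagonal of T
        r = A @ q - a * q - g * q_prev
        s = A.T @ p - a * p - b * p_prev
        q_prev, p_prev = q, p
    return np.array(Q).T, np.array(P).T, alpha, beta, gamma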

9.4.4 The Look-Ahead Idea


It is interesting to look at the serious breakdown issue in the block version
of (9.4.7). For clarity assume that A ∈ R^{n×n} with n = rp. Consider the
factorization

                 [ M_1    C_1^T                 0        ]
                 [ B_1    M_2     .                       ]
   P^T A Q =     [          .       .      .              ]        (9.4.10)
                 [                 .       .    C_{r-1}^T ]
                 [ 0            B_{r-1}        M_r        ]

where all the blocks are p-by-p. Let Q = [ Q_1, ..., Q_r ] and P = [ P_1, ..., P_r ]
be conformable partitionings of Q and P. Comparing block columns in the
equations AQ = QT and A^T P = P T^T we obtain

   Q_{k+1} B_k = A Q_k - Q_k M_k - Q_{k-1} C_{k-1}^T ≡ R_k
   P_{k+1} C_k = A^T P_k - P_k M_k^T - P_{k-1} B_{k-1}^T ≡ S_k

Note that M_k = P_k^T A Q_k. If S_k^T R_k ∈ R^{p×p} is nonsingular and we compute
B_k, C_k ∈ R^{p×p} so that

   C_k^T B_k = S_k^T R_k,

then

   Q_{k+1} = R_k B_k^{-1}                                             (9.4.11)
   P_{k+1} = S_k C_k^{-1}                                             (9.4.12)

satisfy P_{k+1}^T Q_{k+1} = I_p. Serious breakdown in this setting is associated with
having a singular S_k^T R_k.
One way of solving the serious breakdown problem in (9.4.7) is to go
after a factorization of the form (9.4.10) in which the block sizes are dynam-
ically determined. Roughly speaking, in this approach matrices Q_{k+1} and
P_{k+1} are built up column by column with special recursions that culminate
in the production of a nonsingular P_{k+1}^T Q_{k+1}. The computations are ar-
ranged so that the biorthogonality conditions P_i^T Q_{k+1} = 0 and Q_i^T P_{k+1} = 0
hold for i = 1:k.
A method of this form belongs to the family of look-ahead Lanczos
methods. The length of a look-ahead step is the width of the Q_{k+1} and P_{k+1}
that it produces. If that width is one, a conventional block Lanczos step
may be taken. Length-2 look-ahead steps are discussed in Parlett, Taylor,
and Liu (1985). The notion of incurable breakdown is also presented by these
authors. Freund, Gutknecht, and Nachtigal (1993) cover the general case
along with a host of implementation details. Floating point considerations

require the handling of "near" serious breakdown. In practice, each Mk that


is 2-by-2 or larger corresponds to an instance of near serious breakdown.

Problems

P9.4.1 Prove that the Arnoldi vectors in (9.4.1) are mutually orthogonal.
P9.4.2 Prove (9.4.4).
P9.4.3 Prove (9.4.6).
P9.4.4 Give an example of a starting vector for which the unsymmetric Lanczos iteration
(9.4.7) breaks down without rendering any invariant subspace information. Use

~
6
0
A= [
3

P9.4.5 Suppose HE Rnxn is upper Hessenberg. Discuss the computation of a unit


upper triangular matrix U such that HU = UT where T is tridiagonal.
P9.4.6 Show that the QR algorithm for eigenvalues does not preserve tridiagonal struc-
ture in the unsymmetric case.

Notes and References for Sec. 9.4

References for the Arnoldi iteration and its practical implementation include Saad (1992)
and

W.E. Arnoldi (1951). "The Principle of Minimized Iterations in the Solution of the
Matrix Eigenvalue Problem," Quarterly of Applied Mathematics 9, 17-29.
Y. Saad (1980). "Variations of Arnoldi's Method for Computing Eigenelements of Large
Unsymmetric Matrices," Lin. Alg. and Its Applic. 34, 269-295.
Y. Saad (1984). "Chebyshev Acceleration Techniques for Solving Nonsymmetric Eigen-
value Problems," Math. Comp. 42, 567-588.
D.C. Sorensen (1992). "Implicit Application of Polynomial Filters in a k-Step Arnoldi
Method," SIAM J. Matrix Anal. Appl. 13, 357-385.
D.C. Sorensen (1995). "Implicitly Restarted Arnoldi/Lanczos Methods for Large Scale
Eigenvalue Calculations," in Proceedings of the ICASE/LaRC Workshop on Parallel
Numerical Algorithms, May 23-25, 1994, D.E. Keyes, A. Sameh, and V. Venkatakr-
ishnan (eds), Kluwer.
R.B. Lehoucq (1995). "Analysis and Implementation of an Implicitly Restarted Arnoldi
Iteration," Ph.D. thesis, Rice University, Houston, Texas.
R.B. Lehoucq (1996). "Restarting an Arnoldi Reduction," Report MCS-P591-0496, Ar-
gonne National Laboratory, Argonne, Illinois.
R.B. Lehoucq and D.C. Sorensen (1996). "Deflation Techniques for an Implicitly Restarted
Arnoldi Iteration," SIAM J. Matrix Analysis and Applic, to appear.
R.B. Morgan (1996). "On Restarting the Arnoldi Method for Large Nonsymmetric
Eigenvalue Problems," Math Comp 65, 1213-1230.
Related papers include

A. Ruhe (1984). "Rational Krylov Algorithms for Eigenvalue Computation," Lin. Alg. and Its Applic. 58, 391-405.
A. Ruhe (1994). "Rational Krylov Algorithms for Nonsymmetric Eigenvalue Problems II: Matrix Pairs," Lin. Alg. and Its Applic. 191, 283-295.
A. Ruhe (1994). "The Rational Krylov Algorithm for Nonsymmetric Eigenvalue Problems III: Complex Shifts for Real Matrices," BIT 34, 165-176.
T. Huckle (1994). "The Arnoldi Method for Normal Matrices," SIAM J. Matrix Anal. Appl. 15, 479-489.
C.C. Paige, B.N. Parlett, and H.A. Van der Vorst (1995). "Approximate Solutions and Eigenvalue Bounds from Krylov Subspaces," Numer. Linear Algebra with Applic. 2, 115-134.
K.C. Toh and L.N. Trefethen (1996). "Calculation of Pseudospectra by the Arnoldi Iteration," SIAM J. Sci. Comp. 17, 1-15.
The unsymmetric Lanczos process and related look-ahead ideas are nicely presented in

B.N. Parlett, D. Taylor, and Z. Liu (1985). "A Look-Ahead Lanczos Algorithm for Unsymmetric Matrices," Math. Comp. 44, 105-124.
R.W. Freund, M. Gutknecht, and N. Nachtigal (1993). "An Implementation of the Look-Ahead Lanczos Algorithm for Non-Hermitian Matrices," SIAM J. Sci. and Stat. Comp. 14, 137-158.

See also

Y. Saad (1982). "The Lanczos Biorthogonalization Algorithm and Other Oblique Projection Methods for Solving Large Unsymmetric Eigenproblems," SIAM J. Numer. Anal. 19, 485-506.
G.A. Geist (1991). "Reduction of a General Matrix to Tridiagonal Form," SIAM J. Matrix Anal. Appl. 12, 362-373.
C. Brezinski, M. Zaglia, and H. Sadok (1991). "Avoiding Breakdown and Near Breakdown in Lanczos Type Algorithms," Numer. Alg. 1, 261-284.
S.K. Kim and A.T. Chronopoulos (1991). "A Class of Lanczos-Like Algorithms Implemented on Parallel Computers," Parallel Comput. 17, 763-778.
B.N. Parlett (1992). "Reduction to Tridiagonal Form and Minimal Realizations," SIAM J. Matrix Anal. Appl. 13, 567-593.
M. Gutknecht (1992). "A Completed Theory of the Unsymmetric Lanczos Process and Related Algorithms, Part I," SIAM J. Matrix Anal. Appl. 13, 594-639.
M. Gutknecht (1994). "A Completed Theory of the Unsymmetric Lanczos Process and Related Algorithms, Part II," SIAM J. Matrix Anal. Appl. 15, 15-58.
Z. Bai (1994). "Error Analysis of the Lanczos Algorithm for the Nonsymmetric Eigenvalue Problem," Math. Comp. 62, 209-226.
T. Huckle (1995). "Low-Rank Modification of the Unsymmetric Lanczos Algorithm," Math. Comp. 64, 1577-1588.
Z. Jia (1995). "The Convergence of Generalized Lanczos Methods for Large Unsymmetric Eigenproblems," SIAM J. Matrix Anal. Applic. 16, 543-562.
M.T. Chu, R.E. Funderlic, and G.H. Golub (1995). "A Rank-One Reduction Formula and Its Applications to Matrix Factorizations," SIAM Review 37, 512-530.
Other papers include

H.A. Van der Vorst (1982). "A Generalized Lanczos Scheme," Math. Comp. 39, 559-562.
D. Boley and G.H. Golub (1984). "The Lanczos-Arnoldi Algorithm and Controllability," Syst. Control Lett. 4, 317-324.
Chapter 10

Iterative Methods for


Linear Systems

§10.1 The Standard Iterations


§10.2 The Conjugate Gradient Method
§10.3 Preconditioned Conjugate Gradients
§10.4 Other Krylov Subspace Methods

We concluded the previous chapter by showing how the Lanczos it-


eration could be used to solve various linear equation and least squares
problems. The methods developed were suitable for large sparse problems
because they did not require the factorization of the underlying matrix. In
this section, we continue the discussion of linear equation solvers that have
this property.
The first section is a brisk review of the classical iterations: Jacobi,
Gauss-Seidel, SOR, Chebyshev semi-iterative, and so on. Our treatment of
these methods is brief because our principal aim in this chapter is to high-
light the method of conjugate gradients. In §10.2, we carefully develop this
important technique in a natural way from the method of steepest descent.
Recall that the conjugate gradient method has already been introduced via
the Lanczos iteration in §9.3. The reason for deriving the method again is
to motivate some of its practical variants, which are the subject of §10.3.
Extensions to unsymmetric problems are treated in §10.4.
We warn the reader of an inconsistency in the notation of this chapter. In §10.1, methods are developed at the "(i, j) level," necessitating the use of superscripts: x_i^{(k)} denotes the i-th component of a vector x^{(k)}. In the other


sections, however, algorithmic developments can proceed without explicit


mention of vector/matrix entries. Hence, in §10.2-§10.4 we dispense with
superscripts and denote vector sequences by {xk}.

Before You Begin


Chapter 1, §§2.1-2.5, and §2.7, Chapter 3, and §§4.1-4.3 are assumed.
Other dependencies include:
        Chapter 9
            ↓
    §10.1 → §10.2 → §10.3 → §10.4
            ↑
          §7.4
Texts devoted to iterative solvers include Varga (1962), Young (1971),
Hageman and Young (1981), and Axelsson (1994). The software "tem-
plates" volume by Barrett et al (1993) is particularly useful. The direct
(non-iterative) solution of large sparse systems is sometimes preferred. See
George and Liu (1981) and Duff, Erisman, and Reid (1986).

10.1 The Standard Iterations


The linear equation solvers in Chapters 3 and 4 involve the factorization
of the coefficient matrix A. Methods of this type are called direct methods.
Direct methods can be impractical if A is large and sparse, because the
sought-after factors can be dense. An exception to this occurs when A is
banded (cf. §4.3). Yet in many band matrix problems even the band itself
is sparse making algorithms such as band Cholesky difficult to implement.
One reason for the great interest in sparse linear equation solvers is the
importance of being able to obtain numerical solutions to partial differ-
ential equations. Indeed, researchers in computational PDE's have been
responsible for many of the sparse matrix techniques that are presently in
general use.
Roughly speaking, there are two approaches to the sparse Ax = b prob-
lem. One is to pick an appropriate direct method and adapt it to exploit
A's sparsity. Typical adaptation strategies involve the intelligent use of
data structures and special pivoting strategies that minimize fill-in.
In contrast to the direct methods are the iterative methods. These meth-
ods generate a sequence of approximate solutions {x(k)} and essentially
involve the matrix A only in the context of matrix-vector multiplication.
The evaluation of an iterative method invariably focuses on how quickly the
iterates x(k) converge. In this section, we present some basic iterative meth-
ods, discuss their practical implementation, and prove a few representative
theorems concerned with their behavior.

10.1.1 The Jacobi and Gauss-Seidel Iterations


Perhaps the simplest iterative scheme is the Jacobi iteration. It is defined for matrices that have nonzero diagonal elements. The method can be motivated by rewriting the 3-by-3 system Ax = b as follows:

    x_1 = ( b_1 - a_{12}x_2 - a_{13}x_3 ) / a_{11}
    x_2 = ( b_2 - a_{21}x_1 - a_{23}x_3 ) / a_{22}
    x_3 = ( b_3 - a_{31}x_1 - a_{32}x_2 ) / a_{33}

Suppose x^{(k)} is an approximation to x = A^{-1}b. A natural way to generate a new approximation x^{(k+1)} is to compute

    x_1^{(k+1)} = ( b_1 - a_{12}x_2^{(k)} - a_{13}x_3^{(k)} ) / a_{11}
    x_2^{(k+1)} = ( b_2 - a_{21}x_1^{(k)} - a_{23}x_3^{(k)} ) / a_{22}        (10.1.1)
    x_3^{(k+1)} = ( b_3 - a_{31}x_1^{(k)} - a_{32}x_2^{(k)} ) / a_{33}

This defines the Jacobi iteration for the case n = 3. For general n we have

    for i = 1:n
        x_i^{(k+1)} = ( b_i - Σ_{j=1}^{i-1} a_{ij}x_j^{(k)} - Σ_{j=i+1}^{n} a_{ij}x_j^{(k)} ) / a_{ii}        (10.1.2)
    end

Note that in the Jacobi iteration one does not use the most recently available information when computing x_i^{(k+1)}. For example, x_1^{(k)} is used in the calculation of x_2^{(k+1)} even though component x_1^{(k+1)} is known. If we revise the Jacobi iteration so that we always use the most current estimate of the exact x_i then we obtain

    for i = 1:n
        x_i^{(k+1)} = ( b_i - Σ_{j=1}^{i-1} a_{ij}x_j^{(k+1)} - Σ_{j=i+1}^{n} a_{ij}x_j^{(k)} ) / a_{ii}        (10.1.3)
    end

This defines what is called the Gauss-Seidel iteration.
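As an illustration (not part of the original text), here is a minimal Python/NumPy sketch of one sweep of each iteration for a dense matrix with nonzero diagonal; the function names and the dense-matrix setting are our own choices.

    import numpy as np

    def jacobi_sweep(A, b, x):
        # One Jacobi step (10.1.2): every component uses only the old iterate x.
        D = np.diag(A)
        return (b - A @ x + D * x) / D

    def gauss_seidel_sweep(A, b, x):
        # One Gauss-Seidel step (10.1.3): components are used as soon as updated.
        x = x.copy()
        n = len(b)
        for i in range(n):
            x[i] = (b[i] - A[i, :i] @ x[:i] - A[i, i+1:] @ x[i+1:]) / A[i, i]
        return x

In practice one repeats such sweeps until the residual b - Ax is small enough.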
For both the Jacobi and Gauss-Seidel iterations, the transition from x^{(k)} to x^{(k+1)} can be succinctly described in terms of the matrices L, D, and U defined by

    L = strictly lower triangular part of A,
    D = diag(a_{11}, ..., a_{nn}),        (10.1.4)
    U = strictly upper triangular part of A.

In particular, the Jacobi step has the form M_J x^{(k+1)} = N_J x^{(k)} + b where M_J = D and N_J = -(L + U). On the other hand, Gauss-Seidel is defined by M_G x^{(k+1)} = N_G x^{(k)} + b with M_G = (D + L) and N_G = -U.

10.1.2 Splittings and Convergence


The Jacobi and Gauss-Seidel procedures are typical members of a large
family of iterations that have the form
Mx(k+ 1) = Nx(k) +b (10.1.5)

where A = M- N is a splitting of the matrix A. For the iteration (10.1.5)


to be practical, it must be "easy" to solve a linear system with M as the
matrix. Note that for Jacobi and Gauss-Seidel, M is diagonal and lower triangular, respectively.
Whether or not (10.1.5) converges to x = A^{-1}b depends upon the eigenvalues of M^{-1}N. In particular, if the spectral radius of an n-by-n matrix G is defined by

    ρ(G) = max{ |λ| : λ ∈ λ(G) },

then it is the size of ρ(M^{-1}N) that is critical to the success of (10.1.5).

Theorem 10.1.1 Suppose b ∈ R^n and A = M - N ∈ R^{n×n} is nonsingular. If M is nonsingular and the spectral radius of M^{-1}N satisfies the inequality ρ(M^{-1}N) < 1, then the iterates x^{(k)} defined by Mx^{(k+1)} = Nx^{(k)} + b converge to x = A^{-1}b for any starting vector x^{(0)}.

Proof. Let e^{(k)} = x^{(k)} - x denote the error in the kth iterate. Since Mx = Nx + b it follows that M(x^{(k+1)} - x) = N(x^{(k)} - x), and thus the error in x^{(k+1)} is given by e^{(k+1)} = M^{-1}N e^{(k)} = (M^{-1}N)^{k+1} e^{(0)}. From Lemma 7.3.2 we know that (M^{-1}N)^k → 0 iff ρ(M^{-1}N) < 1. □

This result is central to the study of iterative methods, where algorithmic development typically proceeds along the following lines:

• A splitting A = M - N is proposed where linear systems of the form Mz = d are "easy" to solve.

• Classes of matrices are identified for which the iteration matrix G = M^{-1}N satisfies ρ(G) < 1.

• Further results about ρ(G) are established to gain intuition about how the error e^{(k)} tends to zero.

For example, consider the Jacobi iteration, Dx^{(k+1)} = -(L + U)x^{(k)} + b. One condition that guarantees ρ(M_J^{-1}N_J) < 1 is strict diagonal dominance. Indeed, if A has that property (defined in §3.4.10), then

    ρ(M_J^{-1}N_J)  ≤  || D^{-1}(L + U) ||_∞  =  max_{1≤i≤n} Σ_{j≠i} |a_{ij}| / |a_{ii}|  <  1.

Usually, the "more dominant" the diagonal the more rapid the convergence, but there are counterexamples. See P10.1.7.
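Theorem 10.1.1 also suggests a quick numerical check of a proposed splitting: form G = M^{-1}N and compute ρ(G). The following small NumPy sketch (ours; dense test matrices only) does this for the Jacobi and Gauss-Seidel splittings and can be used, for instance, to examine the counterexample of P10.1.7.

    import numpy as np

    def iteration_spectral_radius(A, method="jacobi"):
        # rho(M^{-1} N) for the Jacobi or Gauss-Seidel splitting A = M - N.
        D = np.diag(np.diag(A))
        L = np.tril(A, -1)
        U = np.triu(A, 1)
        if method == "jacobi":
            M, N = D, -(L + U)
        else:                       # Gauss-Seidel
            M, N = D + L, -U
        G = np.linalg.solve(M, N)   # iteration matrix M^{-1} N
        return np.max(np.abs(np.linalg.eigvals(G)))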
A more complicated spectral radius argument is needed to show that Gauss-Seidel converges for symmetric positive definite A.

Theorem 10.1.2 If A ∈ R^{n×n} is symmetric and positive definite, then the Gauss-Seidel iteration (10.1.3) converges for any x^{(0)}.

Proof. Write A = L + D + L^T where D = diag(a_{ii}) and L is strictly lower triangular. In light of Theorem 10.1.1 our task is to show that the matrix G = -(D + L)^{-1}L^T has eigenvalues that are inside the unit circle. Since D is positive definite we have

    G_1 = D^{1/2} G D^{-1/2} = -(I + L_1)^{-1} L_1^T,

where L_1 = D^{-1/2} L D^{-1/2}. Since G and G_1 have the same eigenvalues, we must verify that ρ(G_1) < 1. If G_1 x = λx with x^H x = 1, then we have -L_1^T x = λ(I + L_1)x and thus -x^H L_1^T x = λ(1 + x^H L_1 x). Letting a + bi = x^H L_1 x we have

    |λ|^2 = | (a - bi)/(1 + a + bi) |^2 = (a^2 + b^2)/(1 + 2a + a^2 + b^2).

However, since D^{-1/2} A D^{-1/2} = I + L_1 + L_1^T is positive definite, it is not hard to show that 0 < 1 + x^H L_1 x + x^H L_1^T x = 1 + 2a, implying |λ| < 1. □

This result is frequently applicable because many of the matrices that arise
from discretized elliptic PDE's are symmetric positive definite. Numerous
other results of this flavor appear in the literature.

10.1.3 Practical Implementation of Gauss-Seidel


We now focus on some practical details associated with the Gauss-Seidel iteration. With overwriting, the Gauss-Seidel step (10.1.3) is particularly simple to implement:

    for i = 1:n
        x(i) = ( b(i) - A(i, 1:i-1) x(1:i-1) - A(i, i+1:n) x(i+1:n) ) / A(i, i)
    end

This computation requires about twice as many flops as there are nonzero entries in the matrix A. It makes no sense to be more precise about the work involved because the actual implementation depends greatly upon the structure of the problem at hand.
In order to stress this point we consider the application of (10.1.3) to the NM-by-NM block tridiagonal system

    [  T  -I              ] [ g_1 ]     [ f_1 ]
    [ -I   T  -I           ] [ g_2 ]     [ f_2 ]
    [      .    .    .     ] [  :  ]  =  [  :  ]        (10.1.6)
    [           -I    T    ] [ g_M ]     [ f_M ]

where

    T = [  4  -1            ]          [ G(1,j) ]          [ F(1,j) ]
        [ -1   4  -1        ]   g_j =  [ G(2,j) ]   f_j =  [ F(2,j) ]
        [       .   .   -1  ]          [   :    ]          [   :    ]
        [           -1   4  ]          [ G(N,j) ]          [ F(N,j) ]

This problem arises when the Poisson equation is discretized on a rectangle. It is easy to show that the matrix A is positive definite.
With the convention that G(i, j) = 0 whenever i E { 0, N + 1} or
j E {0, M + 1} we see that with overwriting the Gauss-Seidel step takes on
the form:

    for j = 1:M
        for i = 1:N
            G(i,j) = ( F(i,j) + G(i-1,j) + G(i+1,j) + G(i,j-1) + G(i,j+1) ) / 4
        end
    end

Note that in this problem no storage is required for the matrix A.
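A direct Python/NumPy transcription of this grid sweep (our own illustration; the extra border of zeros implements the convention G(i,j) = 0 outside the grid) looks as follows.

    import numpy as np

    def gauss_seidel_poisson(F, n_sweeps=1):
        # F is the N-by-M right-hand-side grid; G carries a zero "ghost" border
        # so that all four neighbors of an interior point are always defined.
        N, M = F.shape
        G = np.zeros((N + 2, M + 2))
        for _ in range(n_sweeps):
            for j in range(1, M + 1):
                for i in range(1, N + 1):
                    G[i, j] = (F[i - 1, j - 1] + G[i - 1, j] + G[i + 1, j]
                               + G[i, j - 1] + G[i, j + 1]) / 4.0
        return G[1:-1, 1:-1]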



10.1.4 Successive Over-Relaxation


The Gauss-Seidel iteration is very attractive because of its simplicity. Unfortunately, if the spectral radius of M_G^{-1}N_G is close to unity, then it may be prohibitively slow because the error tends to zero like ρ(M_G^{-1}N_G)^k. To rectify this, let ω ∈ R and consider the following modification of the Gauss-Seidel step:

    for i = 1:n
        x_i^{(k+1)} = ω ( b_i - Σ_{j=1}^{i-1} a_{ij}x_j^{(k+1)} - Σ_{j=i+1}^{n} a_{ij}x_j^{(k)} ) / a_{ii}  +  (1-ω) x_i^{(k)}        (10.1.7)
    end

This defines the method of successive over-relaxation (SOR). Using (10.1.4) we see that in matrix terms, the SOR step is given by

    M_ω x^{(k+1)} = N_ω x^{(k)} + ωb        (10.1.8)

where M_ω = D + ωL and N_ω = (1-ω)D - ωU. For a few structured (but important) problems such as (10.1.6), the value of the relaxation parameter ω that minimizes ρ(M_ω^{-1}N_ω) is known. Moreover, a significant reduction in ρ(M_ω^{-1}N_ω) compared with ρ(M_1^{-1}N_1) = ρ(M_G^{-1}N_G) can result. In more complicated problems, however, it may be necessary to perform a fairly sophisticated eigenvalue analysis in order to determine an appropriate ω.
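In code, SOR is a one-line modification of the Gauss-Seidel sweep. The sketch below (ours; dense matrices, names our own) blends the Gauss-Seidel update with the previous iterate; ω = 1 reproduces Gauss-Seidel.

    import numpy as np

    def sor_sweep(A, b, x, omega):
        # One SOR step (10.1.7).
        x = x.copy()
        n = len(b)
        for i in range(n):
            gs = (b[i] - A[i, :i] @ x[:i] - A[i, i+1:] @ x[i+1:]) / A[i, i]
            x[i] = omega * gs + (1.0 - omega) * x[i]
        return x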

10.1.5 The Chebyshev Semi-Iterative Method


Another way to accelerate the convergence of an iterative method makes use of Chebyshev polynomials. Suppose x^{(1)}, ..., x^{(k)} have been generated via the iteration M x^{(j+1)} = N x^{(j)} + b and that we wish to determine coefficients ν_j(k), j = 0:k, such that

    y^{(k)} = Σ_{j=0}^{k} ν_j(k) x^{(j)}        (10.1.9)

represents an improvement over x^{(k)}. If x^{(0)} = ··· = x^{(k)} = x, then it is reasonable to insist that y^{(k)} = x. Hence, we require

    Σ_{j=0}^{k} ν_j(k) = 1.        (10.1.10)

Subject to this constraint, we consider how to choose the ν_j(k) so that the error in y^{(k)} is minimized.

Recalling from the proof of Theorem 10.1.1 that x^{(k)} - x = (M^{-1}N)^k e^{(0)} where e^{(0)} = x^{(0)} - x, we see that

    y^{(k)} - x = Σ_{j=0}^{k} ν_j(k)(x^{(j)} - x) = Σ_{j=0}^{k} ν_j(k)(M^{-1}N)^j e^{(0)}.

Working in the 2-norm we therefore obtain

    || y^{(k)} - x ||_2  ≤  || p_k(G) ||_2 || e^{(0)} ||_2        (10.1.11)

where G = M^{-1}N and

    p_k(z) = Σ_{j=0}^{k} ν_j(k) z^j.

Note that the condition (10.1.10) implies p_k(1) = 1.
At this point we assume that G is symmetric with eigenvalues λ_i that satisfy -1 < α ≤ λ_n ≤ ··· ≤ λ_1 ≤ β < 1. It follows that

    || p_k(G) ||_2  =  max_{λ ∈ λ(G)} |p_k(λ)|  ≤  max_{α ≤ λ ≤ β} |p_k(λ)|.

Thus, to make the norm of p_k(G) small, we need a polynomial p_k(z) that is small on [α, β] subject to the constraint that p_k(1) = 1.
Consider the Chebyshev polynomials c_j(z) generated by the recursion c_j(z) = 2z c_{j-1}(z) - c_{j-2}(z) where c_0(z) = 1 and c_1(z) = z. These polynomials satisfy |c_j(z)| ≤ 1 on [-1, 1] but grow rapidly off this interval. As a consequence, the polynomial

    p_k(z) = c_k( -1 + 2(z - α)/(β - α) ) / c_k(μ),

where

    μ = -1 + 2(1 - α)/(β - α) = 1 + 2(1 - β)/(β - α),

satisfies p_k(1) = 1 and tends to be small on [α, β]. From the definition of p_k(z) and equation (10.1.11) we see

    || y^{(k)} - x ||_2  ≤  || x - x^{(0)} ||_2 / |c_k(μ)|.

Thus, the larger μ is, the greater the acceleration of convergence.
In order for the above to be a practical acceleration procedure, we need a more efficient method for calculating y^{(k)} than (10.1.9). We have been tacitly assuming that n is large and thus the retrieval of x^{(0)}, ..., x^{(k)} for large k would be inconvenient or even impossible.
Fortunately, it is possible to derive a three-term recurrence among the y^{(k)} by exploiting the three-term recurrence among the Chebyshev polynomials. In particular, it can be shown that y^{(k+1)} can be obtained from y^{(k)}, y^{(k-1)}, and the vector z^{(k)} defined by

    M z^{(k)} = b - A y^{(k)},        (10.1.12)

using scalar parameters built from γ = 2/(2 - α - β) and the Chebyshev values c_j(μ), where y^{(0)} = x^{(0)} and y^{(1)} = x^{(1)}. We refer to this scheme as the Chebyshev semi-iterative method associated with M y^{(k+1)} = N y^{(k)} + b. For the acceleration to be effective we need good lower and upper bounds α and β. As in SOR, these parameters may be difficult to ascertain except in a few structured problems.
Chebyshev semi-iterative methods are extensively analyzed in Varga (1962, chapter 5), as well as in Golub and Varga (1961).
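For concreteness, here is one standard way to implement the acceleration in Python/NumPy. It applies the classical Chebyshev iteration to the preconditioned system M^{-1}Ax = M^{-1}b, whose spectrum lies in [1-β, 1-α]; the parameterization is ours and need not coincide exactly with (10.1.12). M_solve (which applies M^{-1}), the bounds α and β, and the iteration count are assumed inputs.

    import numpy as np

    def chebyshev_accel(A, b, M_solve, x0, alpha, beta, n_iter=50):
        # Chebyshev acceleration of the splitting iteration M y = N y + b,
        # assuming the eigenvalues of G = M^{-1}N lie in [alpha, beta] with beta < 1.
        lmin, lmax = 1.0 - beta, 1.0 - alpha          # spectrum of M^{-1}A
        theta, delta = (lmax + lmin) / 2.0, (lmax - lmin) / 2.0
        sigma1 = theta / delta
        rho = 1.0 / sigma1
        x = x0.copy()
        z = M_solve(b - A @ x)                        # preconditioned residual
        d = z / theta
        for _ in range(n_iter):
            x = x + d
            z = M_solve(b - A @ x)
            rho_new = 1.0 / (2.0 * sigma1 - rho)
            d = rho_new * rho * d + (2.0 * rho_new / delta) * z
            rho = rho_new
        return x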

10.1.6 Symmetric SOR


In deriving the Chebyshev acceleration we assumed that the iteration matrix G = M^{-1}N was symmetric. Thus, our simple analysis does not apply to the unsymmetric SOR iteration matrix M_ω^{-1}N_ω. However, it is possible to symmetrize the SOR method, making it amenable to Chebyshev acceleration. The idea is to couple SOR with the backward SOR scheme

    for i = n:-1:1
        x_i^{(k+1)} = ω ( b_i - Σ_{j=1}^{i-1} a_{ij}x_j^{(k)} - Σ_{j=i+1}^{n} a_{ij}x_j^{(k+1)} ) / a_{ii}  +  (1-ω) x_i^{(k)}        (10.1.13)
    end

This iteration is obtained by updating the unknowns in reverse order in (10.1.7). Backward SOR can be described in matrix terms using (10.1.4). In particular, we have M'_ω x^{(k+1)} = N'_ω x^{(k)} + ωb where

    M'_ω = D + ωU   and   N'_ω = (1-ω)D - ωL.        (10.1.14)

If A is symmetric (U = L^T), then M'_ω = M_ω^T and N'_ω = N_ω^T, and we have the iteration

    M_ω x^{(k+1/2)} = N_ω x^{(k)} + ωb
    M_ω^T x^{(k+1)} = N_ω^T x^{(k+1/2)} + ωb.        (10.1.15)

It is clear that G = M_ω^{-T} N_ω^T M_ω^{-1} N_ω is the iteration matrix for this method. From the definitions of M_ω and N_ω it follows that

    G = (M_ω D^{-1} M_ω^T)^{-1} (N_ω^T D^{-1} N_ω).        (10.1.16)

If D has positive diagonal entries and K K^T = N_ω^T D^{-1} N_ω is the Cholesky factorization, then K^T G K^{-T} = K^T (M_ω D^{-1} M_ω^T)^{-1} K. Thus, G is similar to a symmetric matrix and has real eigenvalues.
The iteration (10.1.15) is called the symmetric successive over-relaxation (SSOR) method. It is frequently used in conjunction with the Chebyshev semi-iterative acceleration.

Problems

P10.1.1 Show that the Jacobi iteration can be written in the form x^{(k+1)} = x^{(k)} + Hr^{(k)} where r^{(k)} = b - Ax^{(k)}. Repeat for the Gauss-Seidel iteration.
P10.1.2 Show that if A is strictly diagonally dominant, then the Gauss-Seidel iteration converges.
P10.1.3 Show that the Jacobi iteration converges for 2-by-2 symmetric positive definite systems.
P10.1.4 Show that if A = M - N is singular, then we can never have ρ(M^{-1}N) < 1 even if M is nonsingular.
P10.1.5 Prove (10.1.16).
P10.1.6 Prove the converse of Theorem 10.1.1. In other words, show that if the iteration Mx^{(k+1)} = Nx^{(k)} + b always converges, then ρ(M^{-1}N) < 1.
P10.1. 7 (Supplied by R.S. Varga) Suppose that

At = [ -:/2 -~/2 ] A2 = [ -lfl2 -~/4 ] .

Let J 1 and J2 be the associated Jacobi iteration matrices. Show that p(Jt) > p(J,)
thereby refuting the claim that greater diagonal dominance implies more rapid Jacobi
convergence.
P10.1.8 The Chebyshev algorithm is defined in terms of parameters

    ω_{k+1} = 2 c_k(1/ρ) / ( ρ c_{k+1}(1/ρ) )

where c_k(λ) = cosh[k cosh^{-1}(λ)] with λ > 1. (a) Show that 1 < ω_k < 2 for k > 1 whenever 0 < ρ < 1. (b) Verify that ω_{k+1} < ω_k. (c) Determine lim ω_k as k → ∞.
P10.1.9 Consider the 2-by-2 matrix

A= [ -~ n.

(a) Under what conditions will Gauss-Seidel converge with this matrix? (b) For what range of ω will the SOR method converge? What is the optimal choice for this parameter? (c) Repeat (a) and (b) for the matrix

A- [
- I,.
-sT I~ ]

where S E R'x". Hint: Use the SVD of S.


P10.1.10 We want to investigate the solution of Au = f where A ≠ A^T. For a model problem, consider the finite difference approximation to

    -u'' + au' = 0,    0 < x < 1,

where u(0) = 10 and u(1) = 10e^a. This leads to a difference equation for i = 1:n in which R = ah/2, u_0 = 10, and u_{n+1} = 10e^a. The number R should be less than 1. What is the convergence rate for the iteration Mu^{(k+1)} = Nu^{(k)} + f where M = (A + A^T)/2 and N = (A^T - A)/2?
P10.1.11 Consider the iteration

    y^{(k+1)} = ω( By^{(k)} + d - y^{(k-1)} ) + y^{(k-1)}

where B has Schur decomposition Q^T BQ = diag(λ_1, ..., λ_n) with λ_1 ≥ ··· ≥ λ_n. Assume that x = Bx + d. (a) Derive an equation for e^{(k)} = y^{(k)} - x. (b) Assume y^{(1)} = By^{(0)} + d. Show that e^{(k)} = p_k(B)e^{(0)} where p_k is an even polynomial if k is even and an odd polynomial if k is odd. (c) Write f^{(k)} = Q^T e^{(k)}. Derive a difference equation for f_j^{(k)} for j = 1:n. Try to specify the exact solution for general f_j^{(0)} and f_j^{(1)}. (d) Show how to determine an optimal ω.

Notes and References for Sec. 10.1

As we mentioned, Young (1971) has the most comprehensive treatment of the SOR
method. The object of "SOR theory" is to guide the user in choosing the relaxation
parameter w. In this setting, the ordering of equations and unknowns is critical. See

M.J.M. Bernal and J.H. Verner (1968). "On Generalizing of the Theory of Consistent Orderings for Successive Over-Relaxation Methods," Numer. Math. 12, 215-222.
D.M. Young (1970). "Convergence Properties of the Symmetric and Unsymmetric Over-Relaxation Methods," Math. Comp. 24, 793-807.
D.M. Young (1972). "Generalization of Property A and Consistent Ordering," SIAM J. Num. Anal. 9, 454-63.
R.A. Nicolaides (1974). "On a Geometrical Aspect of SOR and the Theory of Consistent Ordering for Positive Definite Matrices," Numer. Math. 12, 99-104.
L. Adams and H. Jordan (1986). "Is SOR Color-Blind?" SIAM J. Sci. Stat. Comp. 7, 490-506.
M. Eiermann and R.S. Varga (1993). "Is the Optimal ω Best for the SOR Iteration Method," Lin. Alg. and Its Applic. 182, 257-277.

An analysis of the Chebyshev semi-iterative method appears in

G.H. Golub and R.S. Varga (1961). "Chebychev Semi-Iterative Methods, Successive Over-Relaxation Iterative Methods, and Second-Order Richardson Iterative Methods, Parts I and II," Numer. Math. 3, 147-56, 157-68.

This work is premised on the assumption that the underlying iteration matrix has real
eigenvalues. How to proceed when this is not the case is discussed in

T.A. Manteuffel (1977). "The Tchebychev Iteration for Nonsymmetric Linear Systems,"
Numer. Math. 28, 307-27.
M. Eiermann and W. Niethammer (1983). "On the Construction of Semi-iterative Meth-
ods," SIAM J. Numer. Anal. 20, 1153-1160.
W. Niethammer and R.S. Varga (1983). "The Analysis of k-step Iterative Methods for
Linear Systems from Summability Theory," Numer. Math. 41, 177-206.
G.H. Golub and M. Overton (1988). ''The Convergence of Inexact Chebychev and
Richardson Iterative Methods for Solving Linear Systems," Numer. Math. 53, 571-
594.
D. Calvetti, G.H. Golub, and L. Reichel (1994). "An Adaptive Chebyshev Iterative
Method for Nonsymmetric Linear Systems Based on Modified Moments," Numer.
Math. 67, 21-40.
Other unsymmetric methods include

M. Eiermann, W. Niethammer, and R.S. Varga (1992). "Acceleration of Relaxation


Methods for Non-Hermitian Linear Systems," SIAM J. Matrix Anal. Appl. 13,
979-991.
H. Elman and G.H. Golub (1990). "Iterative Methods for Cyclically Reduced Non-Self-Adjoint Linear Systems I," Math. Comp. 54, 671-700.
H. Elman and G.H. Golub (1990). "Iterative Methods for Cyclically Reduced Non-Self-Adjoint Linear Systems II," Math. Comp. 56, 215-242.
R. Bramley and A. Sameh (1992). "Row Projection Methods for Large Nonsymmetric Linear Systems," SIAM J. Sci. Statist. Comput. 13, 168-193.

Sometimes it is possible to "symmetrize" an iterative method, thereby simplifying the


acceleration process, since all the relevant eigenvalues are real. This is the case for the
SSOR method discussed in

J.W. Sheldon (1955). "On the Numerical Solution of Elliptic Difference Equations,"
Math. Table, Aids Comp. 9, 101-12.

The parallel implementation of the classical iterations has received some attention. See

D.J. Evans (1984). "Parallel SOR Iterative Methods," Parallel Computing 1, 3-18.
N. Patel and H. Jordan (1984). "A Parallelized Point Rowwise Successive Over-Relaxation Method on a Multiprocessor," Parallel Computing 1, 207-222.
R.J. Plemmons (1986). "A Parallel Block Iterative Scheme Applied to Computations in Structural Analysis," SIAM J. Alg. and Disc. Methods 7, 337-347.
C. Kamath and A. Sameh (1989). "A Projection Method for Solving Nonsymmetric Linear Systems on Multiprocessors," Parallel Computing 9, 291-312.

We have seen that the condition K(A) is an important issue when direct methods are
applied to Ax = b. However, the condition of the system also has a bearing on iterative
method performance. See

M. Arioli and F. Romani (1985). "Relations Between Condition Numbers and the Con-
vergence of the Jacobi Method for Real Positive Definite Matrices," Numer. Math.
46, 31-42.
M. Arioli, l.S. Duff, and D. Ruiz (1992). "Stopping Criteria for Iterative Solvers," SIAM
J. Matrix Anal. Appl. 13, 138-144.

Iterative methods for singular systems are discussed in



A. Dax (1990). "The Convergence of Linear Stationary Iterative Processes for Solving Singular Unstructured Systems of Linear Equations," SIAM Review 32, 611-635.

Finally, the effect of rounding errors on the methods of this section is treated in

H. Wozniakowski (1978). "Roundoff-Error Analysis of Iterations for Large Linear Systems," Numer. Math. 30, 301-314.
P.A. Knight (1993). "Error Analysis of Stationary Iteration and Associated Problems," Ph.D. thesis, Department of Mathematics, University of Manchester, England.

10.2 The Conjugate Gradient Method


A difficulty associated with the SOR, Chebyshev semi-iterative, and related
methods is that they depend upon parameters that are sometimes hard to
choose properly. For example, the Chebyshev acceleration scheme needs
good estimates of the largest and smallest eigenvalue of the underlying
iteration matrix M- 1 N. Unless this matrix is sufficiently structured, it
may be analytically impossible and/or computationally expensive to do
this.
In this section, we present a method without this difficulty for the sym-
metric positive definite Ax = b problem, the well-known Hestenes-Stiefel
conjugate gradient method. We derived this method in §9.3.1 from the
Lanczos algorithm. The derivation now is from a different point of view
and it will set the stage for various important generalizations in §10.3 and
§10.4.

10.2.1 Steepest Descent


The starting point in the derivation is to consider how we might go about minimizing the function

    φ(x) = (1/2) x^T A x - x^T b,

where b ∈ R^n and A ∈ R^{n×n} is assumed to be positive definite and symmetric. The minimum value of φ(x) is -b^T A^{-1}b/2, achieved by setting x = A^{-1}b. Thus, minimizing φ and solving Ax = b are equivalent problems if A is symmetric positive definite.
One of the simplest strategies for minimizing φ is the method of steepest descent. At a current point x_c the function φ decreases most rapidly in the direction of the negative gradient: -∇φ(x_c) = b - Ax_c. We call

    r_c = b - Ax_c

the residual of x_c. If the residual is nonzero, then there exists a positive α such that φ(x_c + αr_c) < φ(x_c). In the method of steepest descent (with exact line search) we set α = r_c^T r_c / r_c^T A r_c, thereby minimizing φ(x_c + αr_c). This gives

    x_0 = initial guess
    r_0 = b - Ax_0
    k = 0
    while r_k ≠ 0
        k = k + 1                                             (10.2.1)
        α_k = r_{k-1}^T r_{k-1} / r_{k-1}^T A r_{k-1}
        x_k = x_{k-1} + α_k r_{k-1}
        r_k = b - Ax_k
    end

It can be shown that the A-norm of the error is reduced at each step by a factor of at most (κ_2(A) - 1)/(κ_2(A) + 1), which implies global convergence. Unfortunately, the rate of convergence may be prohibitively slow if the condition κ_2(A) = λ_1(A)/λ_n(A) is large. Geometrically this means that the level curves of φ are very elongated hyperellipsoids and minimization corresponds to finding the lowest point in a relatively flat, steep-sided valley. In steepest descent, we are forced to traverse back and forth across the valley rather than down the valley. Stated another way, the gradient directions that arise during the iteration are not different enough.
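A direct NumPy transcription of (10.2.1), with a practical stopping test added (the tolerance and iteration cap are our own choices), is:

    import numpy as np

    def steepest_descent(A, b, x0, tol=1e-8, max_iter=1000):
        # Method of steepest descent with exact line search.
        x = x0.copy()
        r = b - A @ x
        for _ in range(max_iter):
            if np.linalg.norm(r) <= tol * np.linalg.norm(b):
                break
            Ar = A @ r
            alpha = (r @ r) / (r @ Ar)   # exact minimizer along the residual
            x = x + alpha * r
            r = r - alpha * Ar           # recurred residual; equals b - A x
        return x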

10.2.2 General Search Directions


To avoid the pitfalls of steepest descent, we consider the successive minimization of φ along a set of directions {p_1, p_2, ...} that do not necessarily correspond to the residuals {r_0, r_1, ...}. It is easy to show that φ(x_{k-1} + αp_k) is minimized by setting

    α = p_k^T r_{k-1} / p_k^T A p_k.        (10.2.2)

With this choice it can be shown that

    φ(x_{k-1} + αp_k) = φ(x_{k-1}) - (1/2) (p_k^T r_{k-1})^2 / p_k^T A p_k.        (10.2.3)

To ensure a reduction in the size of φ we insist that p_k not be orthogonal to r_{k-1}. This leads to the following framework:

    x_0 = initial guess
    r_0 = b - Ax_0
    k = 0
    while r_k ≠ 0
        k = k + 1                                             (10.2.4)
        Choose a direction p_k such that p_k^T r_{k-1} ≠ 0.
        α_k = p_k^T r_{k-1} / p_k^T A p_k
        x_k = x_{k-1} + α_k p_k
        r_k = b - Ax_k
    end

Note that, by (10.2.3) and the requirement p_k^T r_{k-1} ≠ 0, we have φ(x_k) < φ(x_{k-1}). Our goal is to choose the search directions in a way that guarantees convergence without the shortcomings of steepest descent.

10.2.3 A-Conjugate Search Directions


If the search directions are linearly independent and x_k solves the problem

    min φ(x)    subject to    x ∈ x_0 + span{p_1, ..., p_k}        (10.2.5)

for k = 1, 2, ..., then convergence is guaranteed in at most n steps. This is because x_n minimizes φ over R^n and therefore satisfies Ax_n = b.
However, for this to be a viable approach the search directions must have the property that it is "easy" to compute x_k given x_{k-1}. Let us see what this says about the determination of p_k. If

    x_k = x_0 + P_{k-1}y + αp_k,

where P_{k-1} = [p_1, ..., p_{k-1}], y ∈ R^{k-1}, and α ∈ R, then

    φ(x_k) = φ(x_0 + P_{k-1}y) + α y^T P_{k-1}^T A p_k + (α^2/2) p_k^T A p_k - α p_k^T r_0.

If p_k ∈ span{Ap_1, ..., Ap_{k-1}}^⊥, then the cross term α y^T P_{k-1}^T A p_k is zero and the search for the minimizing x_k splits into a pair of uncoupled minimizations, one for y and one for α:

    min_{x_k ∈ x_0 + span{p_1,...,p_k}} φ(x_k)
        = min_{y,α} φ(x_0 + P_{k-1}y + αp_k)
        = min_y φ(x_0 + P_{k-1}y)  +  min_α ( (α^2/2) p_k^T A p_k - α p_k^T r_0 ).

Note that if y_{k-1} solves the first min problem then x_{k-1} = x_0 + P_{k-1}y_{k-1} minimizes φ over x_0 + span{p_1, ..., p_{k-1}}. The solution to the α min problem is given by α_k = p_k^T r_0 / p_k^T A p_k. Note that because of A-conjugacy,

    p_k^T r_{k-1} = p_k^T (b - Ax_{k-1}) = p_k^T (b - A(x_0 + P_{k-1}y_{k-1})) = p_k^T r_0.

With these results it follows that x_k = x_{k-1} + α_k p_k and we obtain the following instance of (10.2.4):
    x_0 = initial guess
    k = 0
    r_0 = b - Ax_0
    while r_k ≠ 0
        k = k + 1
        Choose p_k ∈ span{Ap_1, ..., Ap_{k-1}}^⊥ so p_k^T r_{k-1} ≠ 0.        (10.2.6)
        α_k = p_k^T r_{k-1} / p_k^T A p_k
        x_k = x_{k-1} + α_k p_k
        r_k = b - Ax_k
    end

The following lemma shows that it is possible to find the search directions with the required properties.

Lemma 10.2.1 If r_{k-1} ≠ 0, then there exists a p_k ∈ span{Ap_1, ..., Ap_{k-1}}^⊥ such that p_k^T r_{k-1} ≠ 0.

Proof. For the case k = 1, set p_1 = r_0. If k > 1, then since r_{k-1} ≠ 0 it follows that

    A^{-1}b ∉ x_0 + span{p_1, ..., p_{k-1}}  ⇒  b ∉ Ax_0 + span{Ap_1, ..., Ap_{k-1}}
                                              ⇒  r_0 ∉ span{Ap_1, ..., Ap_{k-1}}.

Thus there exists a p ∈ span{Ap_1, ..., Ap_{k-1}}^⊥ such that p^T r_0 ≠ 0. But x_{k-1} ∈ x_0 + span{p_1, ..., p_{k-1}} and so r_{k-1} ∈ r_0 + span{Ap_1, ..., Ap_{k-1}}. It follows that p^T r_{k-1} = p^T r_0 ≠ 0. □

The search directions in (10.2.6) are said to be A-conjugate because p_i^T Ap_j = 0 for all i ≠ j. Note that if P_k = [p_1, ..., p_k] is the matrix of these vectors, then P_k^T A P_k = diag(p_1^T Ap_1, ..., p_k^T Ap_k) is nonsingular since A is positive definite and the search directions are nonzero. It follows that P_k has full column rank. This guarantees convergence in (10.2.6) in at most n steps because x_n (if we get that far) minimizes φ(x) over ran(P_n) = R^n.

10.2.4 Choosing a Best Search Direction


A way to combine the positive aspects of steepest descent and A-conjugate
searching is to choose Pk in (10.2.6) to be the closest vector to rk-1 that is
A-conjugate to p 1, ... , Pk-1· This defines "version zero" of the method of
conjugate gradients:

    x_0 = initial guess
    k = 0
    r_0 = b - Ax_0
    while r_k ≠ 0
        k = k + 1
        if k = 1
            p_1 = r_0
        else                                                  (10.2.7)
            Let p_k minimize || p - r_{k-1} ||_2 over all vectors
            p ∈ span{Ap_1, ..., Ap_{k-1}}^⊥.
        end
        α_k = p_k^T r_{k-1} / p_k^T A p_k
        x_k = x_{k-1} + α_k p_k
        r_k = b - Ax_k
    end
    x = x_k

To make this an effective sparse Ax = b solver, we need an efficient method


for computing Pk. A considerable amount of analysis is required to develop
the final recursions. The first step is to show that Pk is the minimum
residual of a certain least squares problem.

Lemma 10.2.2 For k ≥ 2 the vectors p_k generated by (10.2.7) satisfy

    p_k = r_{k-1} - A P_{k-1} z_{k-1},

where P_{k-1} = [p_1, ..., p_{k-1}] and z_{k-1} solves min_{z ∈ R^{k-1}} || r_{k-1} - A P_{k-1} z ||_2.

Proof. Suppose z_{k-1} solves the above LS problem and let p be the associated minimum residual:

    p = r_{k-1} - A P_{k-1} z_{k-1}.

It follows that p^T A P_{k-1} = 0. Moreover, p = [I - (AP_{k-1})(AP_{k-1})^+] r_{k-1} is the orthogonal projection of r_{k-1} into ran(AP_{k-1})^⊥ and so it is the closest vector in ran(AP_{k-1})^⊥ to r_{k-1}. Thus, p = p_k. □

With this result we can establish a number of important relationships between the residuals r_k, the search directions p_k, and the Krylov subspaces K(r_0, A, k) = span{r_0, Ar_0, ..., A^{k-1}r_0}.

Theorem 10.2.3 After k iterations in (10.2.7) we have

    r_k = r_{k-1} - α_k A p_k        (10.2.8)
    P_k^T r_k = 0        (10.2.9)
    span{p_1, ..., p_k} = span{r_0, ..., r_{k-1}} = K(r_0, A, k)        (10.2.10)

and the residuals r_0, ..., r_k are mutually orthogonal.

Proof. Equation (10.2.8) follows by applying A to both sides of x_k = x_{k-1} + α_k p_k and using the definition of the residual.
To prove (10.2.9), we recall that x_k = x_0 + P_k y_k where y_k is the minimizer of

    φ(x_0 + P_k y) = φ(x_0) + (1/2) y^T (P_k^T A P_k) y - y^T P_k^T (b - Ax_0).

But this means that y_k solves the linear system (P_k^T A P_k) y = P_k^T (b - Ax_0). Thus

    0 = P_k^T (b - Ax_0) - P_k^T A P_k y_k = P_k^T (b - A(x_0 + P_k y_k)) = P_k^T r_k.

To prove (10.2.10) we note from (10.2.8) that

    {Ap_1, ..., Ap_{k-1}} ⊆ span{r_0, ..., r_{k-1}}

and so from Lemma 10.2.2,

    p_k = r_{k-1} - [Ap_1, ..., Ap_{k-1}] z_{k-1} ∈ span{r_0, ..., r_{k-1}}.

It follows that

    [p_1, ..., p_k] = [r_0, ..., r_{k-1}] T

for some upper triangular T. Since the search directions are independent, T is nonsingular. This shows

    span{p_1, ..., p_k} = span{r_0, ..., r_{k-1}}.

Using (10.2.8) we see that

    r_k ∈ span{r_{k-1}, Ap_k} ⊆ span{r_{k-1}, Ar_0, ..., Ar_{k-1}}.

The Krylov space connection in (10.2.10) follows from this by induction.
Finally, to establish the mutual orthogonality of the residuals, we note from (10.2.9) that r_k is orthogonal to any vector in the range of P_k. But from (10.2.10) this subspace contains r_0, ..., r_{k-1}. □

Using these facts we next show that p_k is a simple linear combination of its predecessor p_{k-1} and the "current" residual r_{k-1}.

Corollary 10.2.4 The residuals and search directions in (10.2.7) have the property that p_k ∈ span{p_{k-1}, r_{k-1}} for k ≥ 2.

Proof. If k = 2, then from (10.2.10) p_2 ∈ span{r_0, r_1}. But p_1 = r_0 and so p_2 is a linear combination of p_1 and r_1.
If k > 2, then partition the vector z_{k-1} of Lemma 10.2.2 as

    z_{k-1} = [ w ; μ ],    w ∈ R^{k-2},  μ ∈ R.

Using the identity r_{k-1} = r_{k-2} - α_{k-1} A p_{k-1}, we see that

    p_k = r_{k-1} - A P_{k-1} z_{k-1} = (1 + μ/α_{k-1}) r_{k-1} + s_{k-1}

where

    s_{k-1} = -(μ/α_{k-1}) r_{k-2} - A P_{k-2} w
            ∈ span{r_{k-2}, A P_{k-2} w}
            ⊆ span{r_{k-2}, Ap_1, ..., Ap_{k-2}}
            ⊆ span{r_0, ..., r_{k-2}}.

Because the r_i are mutually orthogonal, it follows that s_{k-1} and r_{k-1} are orthogonal to each other. Thus, the least squares problem of Lemma 10.2.2 boils down to choosing μ and w such that

    || p_k ||_2^2  =  (1 + μ/α_{k-1})^2 || r_{k-1} ||_2^2  +  || s_{k-1} ||_2^2

is minimum. Since the 2-norm of r_{k-2} - A P_{k-2} z is minimized by z_{k-2}, giving residual p_{k-1}, it follows that s_{k-1} is a multiple of p_{k-1}. Consequently, p_k ∈ span{r_{k-1}, p_{k-1}}. □

We are now set to derive a very simple expression for p_k. Without loss of generality we may assume from Corollary 10.2.4 that

    p_k = r_{k-1} + β_k p_{k-1}.

Since p_{k-1}^T A p_k = 0 it follows that

    β_k = -p_{k-1}^T A r_{k-1} / p_{k-1}^T A p_{k-1}.

This leads us to "version 1" of the conjugate gradient method:


    x_0 = initial guess
    k = 0
    r_0 = b - Ax_0
    while r_k ≠ 0
        k = k + 1
        if k = 1
            p_1 = r_0
        else
            β_k = -p_{k-1}^T A r_{k-1} / p_{k-1}^T A p_{k-1}
            p_k = r_{k-1} + β_k p_{k-1}                       (10.2.11)
        end
        α_k = p_k^T r_{k-1} / p_k^T A p_k
        x_k = x_{k-1} + α_k p_k
        r_k = b - Ax_k
    end
    x = x_k
In this implementation, the method requires three separate matrix-vector multiplications per step. However, by computing residuals recursively via r_k = r_{k-1} - α_k A p_k and by substituting the identities

    p_k^T r_{k-1} = r_{k-1}^T r_{k-1}        (10.2.12)

and

    p_{k-1}^T A r_{k-1} = -r_{k-1}^T r_{k-1} / α_{k-1}        (10.2.13)

into the formulas for α_k and β_k, we obtain the following more efficient version:

Algorithm 10.2.1 (Conjugate Gradients) If A ∈ R^{n×n} is symmetric positive definite, b ∈ R^n, and x_0 ∈ R^n is an initial guess (Ax_0 ≈ b), then the following algorithm computes x ∈ R^n so Ax = b.

    k = 0
    r_0 = b - Ax_0
    while r_k ≠ 0
        k = k + 1
        if k = 1
            p_1 = r_0
        else
            β_k = r_{k-1}^T r_{k-1} / r_{k-2}^T r_{k-2}
            p_k = r_{k-1} + β_k p_{k-1}
        end
        α_k = r_{k-1}^T r_{k-1} / p_k^T A p_k
        x_k = x_{k-1} + α_k p_k
        r_k = r_{k-1} - α_k A p_k
    end
    x = x_k

This procedure is essentially the form of the conjugate gradient algorithm that appears in the original paper by Hestenes and Stiefel (1952). Note that only one matrix-vector multiplication is required per iteration.
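For reference, here is a compact NumPy version of Algorithm 10.2.1 with a residual-norm stopping test added (the test and the function name are our own):

    import numpy as np

    def conjugate_gradients(A, b, x0, tol=1e-8, max_iter=None):
        # Hestenes-Stiefel conjugate gradients (Algorithm 10.2.1).
        x = x0.copy()
        r = b - A @ x
        rho = r @ r
        p = r.copy()
        max_iter = max_iter or len(b)
        for k in range(max_iter):
            if np.sqrt(rho) <= tol * np.linalg.norm(b):
                break
            if k > 0:
                beta = rho / rho_old
                p = r + beta * p
            w = A @ p
            alpha = rho / (p @ w)
            x = x + alpha * p
            r = r - alpha * w
            rho_old, rho = rho, r @ r
        return x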

10.2.5 The Lanczos Connection


In §9.3.1 we derived the conjugate gradient method from the Lanczos al-
gorithm. Now let us look at the connections between these two algorithms
in t he reverse direction by "deriving" the Lanczos frocess from conjugate
gradients. Define the matrix of residuals R~o E Rnx by

and t he upper bidiagonal matrix Bk E JEtxk by

1 -(Jz 0 0

0 1 -/3a

From the equations Pi = r;-1 +PiPi-t. i = 2:k, and PI = ro it follows that


R~c = P~cB~~:. Since the columns of I\ = [Pt. . .. ,Pk] are A-conjugate, we
see that nr AR1c = B'[diag(pf Apt. . .. , pi Apk)B1c is tridiagonal. From
(10.2.10) it follows that if

then the columns of R~ca- 1 form an orthonormal basis for the subspace
span{ro,Aro, ... ,A"- 1r 0 }. Consequently, the columns of this mat rix are
essentially the La.oczos vectors of Algorithm 9.3.1, i.e.,

q; = ±r;-J/Pl-1 i = l:k.
Moreover, the tridiagonal matrix associated with these Lanczos vectors is
given by
(10.2.14)

The d iagonal and subdiagonal of this matrix involve quantities that are
readily available during the conjugate gradient iteration. Thus, we can
obtain good estimates of A's extremal eigenvalues (and condition number)
as we generate the Xk in Algorit hm 10.2.1.
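In code, T_k can be accumulated from the CG scalars as they are produced. The sketch below (ours) uses the standard identification of the entries of T_k in terms of the α_k and β_k of Algorithm 10.2.1; the signs of the off-diagonal entries are immaterial for eigenvalue estimates.

    import numpy as np

    def lanczos_tridiagonal_from_cg(alphas, betas):
        # alphas = [alpha_1, ..., alpha_k], betas = [beta_2, ..., beta_k].
        # Diagonal: 1/alpha_j + beta_j/alpha_{j-1} (beta term absent for j = 1);
        # off-diagonal: sqrt(beta_{j+1})/alpha_j.
        k = len(alphas)
        T = np.zeros((k, k))
        for j in range(k):
            T[j, j] = 1.0 / alphas[j]
            if j > 0:
                T[j, j] += betas[j - 1] / alphas[j - 1]
            if j < k - 1:
                T[j, j + 1] = T[j + 1, j] = np.sqrt(betas[j]) / alphas[j]
        return T

    # e.g. np.linalg.eigvalsh(lanczos_tridiagonal_from_cg(a, b)) estimates
    # the extreme eigenvalues of A and hence kappa_2(A).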

10.2.6 Some Practical Details


The termination criterion in Algorithm 10.2.1 is unrealistic. Rounding errors lead to a loss of orthogonality among the residuals and finite termination is not mathematically guaranteed. Moreover, when the conjugate gradient method is applied, n is usually so big that O(n) iterations represents an unacceptable amount of work. As a consequence of these observations, it is customary to regard the method as a genuinely iterative technique with termination based upon an iteration maximum k_max and the residual norm.
This leads to the following practical version of Algorithm 10.2.1:

    x = initial guess
    k = 0
    r = b - Ax
    ρ_0 = || r ||_2^2
    while ( √ρ_k > ε || b ||_2 ) and ( k < k_max )
        k = k + 1
        if k = 1
            p = r
        else                                                  (10.2.16)
            β_k = ρ_{k-1}/ρ_{k-2}
            p = r + β_k p
        end
        w = Ap
        α_k = ρ_{k-1}/p^T w
        x = x + α_k p
        r = r - α_k w
        ρ_k = || r ||_2^2
    end

This algorithm requires one matrix-vector multiplication and 10n flops per iteration. Notice that just four n-vectors of storage are essential: x, r, p, and w. The subscripting of the scalars is not necessary and is only done here to facilitate comparison with Algorithm 10.2.1.
It is also possible to base the termination criterion on heuristic estimates of the error A^{-1}r_k by approximating || A^{-1} ||_2 with the reciprocal of the smallest eigenvalue of the tridiagonal matrix T_k given in (10.2.14).
The idea of regarding conjugate gradients as an iterative method began with Reid (1971). The iterative point of view is useful, but then the rate of convergence is central to the method's success.

10.2.7 Convergence Properties


We conclude this section by examining the convergence of the conjugate
gradient iterates {xk}. Two results are given and they both say that the
method performs well when A is near the identity either in the sense of a
low rank perturbation or in the sense of norm.
Theorem 10.2.5 If A = I + B is an n-by-n symmetric positive definite matrix and rank(B) = r, then Algorithm 10.2.1 converges in at most r + 1 steps.

Proof. The dimension of

    span{r_0, Ar_0, ..., A^{k-1}r_0} = span{r_0, Br_0, ..., B^{k-1}r_0}

cannot exceed r + 1. Since p_1, ..., p_k span this subspace and are independent, the iteration cannot progress beyond r + 1 steps. □

An important metatheorem follows from this result:

• If A is close to a rank r correction to the identity, then Algorithm


10.2.1 almost converges after r + 1 steps.

We show how this heuristic can be exploited in the next section.


An error bound of a different flavor can be obtained in terms of the A-norm, which we define as follows:

    || w ||_A = √(w^T A w).

Theorem 10.2.6 Suppose A ∈ R^{n×n} is symmetric positive definite and b ∈ R^n. If Algorithm 10.2.1 produces iterates {x_k} and κ = κ_2(A), then

    || x - x_k ||_A  ≤  2 || x - x_0 ||_A ( (√κ - 1)/(√κ + 1) )^k.

Proof. See Luenberger (1973, p.187). □

The accuracy of the {x_k} is often much better than this theorem predicts. However, a heuristic version of Theorem 10.2.6 turns out to be very useful:

• The conjugate gradient method converges very fast in the A-norm if κ_2(A) ≈ 1.

In the next section we show how we can sometimes convert a given Ax = b problem into a related Ãx̃ = b̃ problem with à close to the identity.

Problems

P10.2.1 Verify that the residuals in (10.2.1) satisfy r_i^T r_j = 0 whenever j = i + 1.
P10.2.2 Verify (10.2.2).
P10.2.3 Verify (10.2.3).
P10.2.4 Verify (10.2.12) and (10.2.13).
P10.2.5 Give formulas for the entries of the tridiagonal matrix T_k in (10.2.14).
P10.2.6 Compare the work and storage requirements associated with the practical implementation of Algorithms 9.3.1 and 10.2.1.
P10.2.7 Show that if A ∈ R^{n×n} is symmetric positive definite and has k distinct eigenvalues, then the conjugate gradient method does not require more than k + 1 steps to converge.
P10.2.8 Use Theorem 10.2.6 to verify that

Notes and References for Sec. 10.2


The conjugate gradient method is a member of a larger class of methods that are referred to as conjugate direction algorithms. In a conjugate direction algorithm the search directions are all B-conjugate for some suitably chosen matrix B. A discussion of these methods appears in

J.E. Dennis Jr. and K. Turner (1987). "Generalized Conjugate Directions," Lin. Alg. and Its Applic. 88/89, 187-209.
G.W. Stewart (1973). "Conjugate Direction Methods for Solving Systems of Linear Equations," Numer. Math. 21, 284-97.

Some historical and unifying perspectives are offered in

G. Golub and D. O'Leary (1989). "Some History of the Conjugate Gradient and Lanczos Methods," SIAM Review 31, 50-102.
M.R. Hestenes (1990). "Conjugacy and Gradients," in A History of Scientific Computing, Addison-Wesley, Reading, MA.
S. Ashby, T.A. Manteuffel, and P.E. Saylor (1992). "A Taxonomy for Conjugate Gradient Methods," SIAM J. Numer. Anal. 27, 1542-1568.

The classic reference for the conjugate gradient method is

M.R. Hestenes and E. Stiefel (1952). "Methods of Conjugate Gradients for Solving Linear Systems," J. Res. Nat. Bur. Stand. 49, 409-36.
An exact arithmetic analysis of the method may be found in chapter 2 of

M.R. Hestenes (1980). Conjugate Direction Methods in Optimization, Springer-Verlag,


Berlin.
See also

O. Axelsson (1977). "Solution of Linear Systems of Equations: Iterative Methods," in Sparse Matrix Techniques: Copenhagen, 1976, ed. V.A. Barker, Springer-Verlag, Berlin.
Berlin.
For a discussion of conjugate gradient convergence behavior, see

D.G. Luenberger (1973). Introduction to Linear and Nonlinear Programming, Addison-Wesley, New York.
A. van der Sluis and H.A. Van der Vorst (1986). "The Rate of Convergence of Conjugate Gradients," Numer. Math. 48, 543-560.

The idea of using the conjugate gradient method as an iterative method was first dis-
cussed in

J.K. Reid (1971). "On the Method of Conjugate Gradients for the Solution of Large Sparse Systems of Linear Equations," in Large Sparse Sets of Linear Equations, ed. J.K. Reid, Academic Press, New York, pp. 231-54.

Several authors have attempted to explain the algorithm's behavior in finite precision arithmetic. See

H. Wozniakowski (1980). "Roundoff Error Analysis of a New Class of Conjugate Gradient Algorithms," Lin. Alg. and Its Applic. 29.
A. Greenbaum and Z. Strakos (1992). "Predicting the Behavior of Finite Precision Lanczos and Conjugate Gradient Computations," SIAM J. Matrix Anal. Applic. 13, 121-137.

See also the analysis in

G.W. Stewart (1975). "The Convergence of the Method of Conjugate Gradients at Isolated Extreme Points in the Spectrum," Numer. Math. 24, 85-93.
A. Jennings (1977). "Influence of the Eigenvalue Spectrum on the Convergence Rate of the Conjugate Gradient Method," J. Inst. Math. Applic. 20, 61-72.
J. Cullum and R. Willoughby (1980). "The Lanczos Phenomena: An Interpretation Based on Conjugate Gradient Optimization," Lin. Alg. and Its Applic. 29, 63-90.

Finally, we mention that the method can be used to compute an eigenvector of a large
sparse symmetric matrix:

A. Ruhe and T. Wiberg (1972). "The Method of Conjugate Gradients Used in Inverse
Iteration," BIT 12, 543-54.

10.3 Preconditioned Conjugate Gradients


We concluded the previous section by observing that the method of con-
jugate gradients works well on matrices that are either well conditioned or
have just a few distinct eigenvalues. (The latter being the case when A is a low-rank perturbation of the identity.) In this section we show how to
precondition a linear system so that the matrix of coefficients assumes one
of these nice forms. Our treatment is quite brief and informal. Golub and
Meurant (1983) and Axelsson (1985) have more comprehensive expositions.

10.3.1 Derivation
Consider the n-by-n symmetric positive definite linear system Ax = b. The idea behind preconditioned conjugate gradients is to apply the "regular" conjugate gradient method to the transformed system

    Ãx̃ = b̃,        (10.3.1)

where à = C^{-1}AC^{-1}, x̃ = Cx, b̃ = C^{-1}b, and C is symmetric positive definite. In view of our remarks in §10.2.7, we should try to choose C

so that à is well conditioned or a matrix with clustered eigenvalues. For reasons that will soon emerge, the matrix C^2 must also be "simple."
If we apply Algorithm 10.2.1 to (10.3.1), then we obtain the iteration

    k = 0
    x̃_0 = initial guess (Ãx̃_0 ≈ b̃)
    r̃_0 = b̃ - Ãx̃_0
    while r̃_k ≠ 0
        k = k + 1
        if k = 1
            p̃_1 = r̃_0
        else                                                  (10.3.2)
            β̃_k = r̃_{k-1}^T r̃_{k-1} / r̃_{k-2}^T r̃_{k-2}
            p̃_k = r̃_{k-1} + β̃_k p̃_{k-1}
        end
        α̃_k = r̃_{k-1}^T r̃_{k-1} / p̃_k^T C^{-1}AC^{-1} p̃_k
        x̃_k = x̃_{k-1} + α̃_k p̃_k
        r̃_k = r̃_{k-1} - α̃_k C^{-1}AC^{-1} p̃_k
    end

Here, x̃_k should be regarded as an approximation to x̃ and r̃_k is the residual in the transformed coordinates, i.e., r̃_k = b̃ - Ãx̃_k. Of course, once we have x̃ then we can obtain x via the equation x = C^{-1}x̃. However, it is possible to avoid explicit reference to the matrix C^{-1} by defining p̃_k = Cp_k, x̃_k = Cx_k, and r̃_k = C^{-1}r_k. Indeed, if we substitute these definitions into (10.3.2) and recall that b̃ = C^{-1}b and x̃ = Cx, then we obtain

    k = 0
    x_0 = initial guess (Ax_0 ≈ b)
    r_0 = b - Ax_0
    while C^{-1}r_k ≠ 0
        k = k + 1
        if k = 1
            Cp_1 = C^{-1}r_0
        else                                                  (10.3.3)
            β_k = (C^{-1}r_{k-1})^T (C^{-1}r_{k-1}) / (C^{-1}r_{k-2})^T (C^{-1}r_{k-2})
            Cp_k = C^{-1}r_{k-1} + β_k Cp_{k-1}
        end
        α_k = (C^{-1}r_{k-1})^T (C^{-1}r_{k-1}) / (Cp_k)^T (C^{-1}AC^{-1}) (Cp_k)
        Cx_k = Cx_{k-1} + α_k Cp_k
        C^{-1}r_k = C^{-1}r_{k-1} - α_k (C^{-1}AC^{-1}) Cp_k
    end
    Cx = Cx_k

If we define the preconditioner M by M = C^2 (also positive definite) and let z_k be the solution of the system Mz_k = r_k, then (10.3.3) simplifies to

Algorithm 10.3.1 (Preconditioned Conjugate Gradients) Given a symmetric positive definite A ∈ R^{n×n}, b ∈ R^n, a symmetric positive definite preconditioner M, and an initial guess x_0 (Ax_0 ≈ b), the following algorithm solves the linear system Ax = b.

    k = 0
    r_0 = b - Ax_0
    while r_k ≠ 0
        Solve Mz_k = r_k.
        k = k + 1
        if k = 1
            p_1 = z_0
        else
            β_k = r_{k-1}^T z_{k-1} / r_{k-2}^T z_{k-2}
            p_k = z_{k-1} + β_k p_{k-1}
        end
        α_k = r_{k-1}^T z_{k-1} / p_k^T A p_k
        x_k = x_{k-1} + α_k p_k
        r_k = r_{k-1} - α_k A p_k
    end
    x = x_k

A number of important observations should be made about this procedure:

• It can be shown that the residuals and search directions satisfy

    r_i^T M^{-1} r_j = 0,    i ≠ j        (10.3.4)
    p_i^T A p_j = 0,    i ≠ j        (10.3.5)

• The denominators r_{k-2}^T z_{k-2} = z_{k-2}^T M z_{k-2} never vanish because M is positive definite.

• Although the transformation C figured heavily in the derivation of the algorithm, its action is only felt through the preconditioner M = C^2.

• For Algorithm 10.3.1 to be an effective sparse matrix technique, linear systems of the form Mz = r must be easily solved and convergence must be rapid.

The choice of a good preconditioner can have a dramatic effect upon the
rate of convergence. Some of the possibilities are now discussed.
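A compact NumPy version of Algorithm 10.3.1 (ours; a residual-norm stopping test is added and M_solve is assumed to apply M^{-1}) is:

    import numpy as np

    def preconditioned_cg(A, b, M_solve, x0, tol=1e-8, max_iter=None):
        # Preconditioned conjugate gradients; M_solve(r) returns z with M z = r.
        x = x0.copy()
        r = b - A @ x
        max_iter = max_iter or len(b)
        for k in range(max_iter):
            if np.linalg.norm(r) <= tol * np.linalg.norm(b):
                break
            z = M_solve(r)
            rho = r @ z
            if k == 0:
                p = z.copy()
            else:
                beta = rho / rho_old
                p = z + beta * p
            w = A @ p
            alpha = rho / (p @ w)
            x = x + alpha * p
            r = r - alpha * w
            rho_old = rho
        return x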

10.3.2 Incomplete Cholesky Preconditioners


One of the most important preconditioning strategies involves computing an
incomplete Cholesky factorization of A. The idea behind this approach is
to calculate a lower triangular matrix H with the property that H has some
tractable sparsity structure and is somehow "close" to A's exact Cholesky
factor G. The preconditioner is then taken to be M = H HT. To appreciate
this choice consider the following facts:
• There exists a unique symmetric positive definite matrix C such that M = C^2.

• There exists an orthogonal Q such that C = QH^T, i.e., H^T is the upper triangular factor of a QR factorization of C.

We therefore obtain the heuristic

    Ã = C^{-1}AC^{-1} = C^{-T}AC^{-1}        (10.3.6)
      = (HQ^T)^{-1} A (QH^T)^{-1} = Q (H^{-1}G G^T H^{-T}) Q^T ≈ I.

Thus, the better H approximates G, the smaller the condition of Ã, and the better the performance of Algorithm 10.3.1.
An easy but effective way to determine a simple H that approximates G is to step through the Cholesky reduction, setting h_{ij} to zero if the corresponding a_{ij} is zero. Pursuing this with the outer product version of Cholesky we obtain

    for k = 1:n
        A(k,k) = sqrt(A(k,k))
        for i = k+1:n
            if A(i,k) ≠ 0
                A(i,k) = A(i,k)/A(k,k)
            end
        end                                                   (10.3.7)
        for j = k+1:n
            for i = j:n
                if A(i,j) ≠ 0
                    A(i,j) = A(i,j) - A(i,k)A(j,k)
                end
            end
        end
    end
In practice, the matrix A and its incomplete Cholesky factor H would
be stored in an appropriate data structure and the looping in the above
algorithm would take on a very special appearance.
Unfortunately, (10.3.7) is not always stable. Classes of positive definite
matrices for which incomplete Cholesky is stable are identified in Manteuffel
(1979). See also Elman (1986).
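For illustration only, here is a dense NumPy version of (10.3.7); real implementations store A and the factor in a sparse data structure. The preconditioner is then M = HH^T, and applying M^{-1} amounts to one forward and one backward triangular solve. The function name and dense storage are our own choices.

    import numpy as np

    def incomplete_cholesky(A):
        # No-fill incomplete Cholesky: only entries that are nonzero in
        # tril(A) are ever modified, so H keeps A's lower triangular pattern.
        H = np.tril(A).astype(float)
        n = A.shape[0]
        for k in range(n):
            H[k, k] = np.sqrt(H[k, k])
            for i in range(k + 1, n):
                if H[i, k] != 0.0:
                    H[i, k] /= H[k, k]
            for j in range(k + 1, n):
                for i in range(j, n):
                    if H[i, j] != 0.0:
                        H[i, j] -= H[i, k] * H[j, k]
        return H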

10.3.3 Incomplete Block Preconditioners


As with just about everything else in this book, the incomplete factorization ideas outlined in the previous subsection have a block analog. We illustrate this by looking at the incomplete block Cholesky factorization of the symmetric, positive definite, block tridiagonal matrix

    A = [ A_1   E_1^T         ]
        [ E_1   A_2    E_2^T  ]
        [       E_2    A_3    ]

For purposes of illustration, we assume that the A_i are tridiagonal and the E_i are diagonal. Matrices with this structure arise from the standard 5-point discretization of self-adjoint elliptic partial differential equations over a two-dimensional domain.
The 3-by-3 case is sufficiently general. Our discussion is based upon Concus, Golub, and Meurant (1985). Let

    G = [ G_1             ]
        [ F_1   G_2       ]
        [       F_2   G_3 ]

be the exact block Cholesky factor of A. Although G is sparse as a block matrix, the individual blocks are dense with the exception of G_1. This can be seen from the required computations:

    G_1 G_1^T = B_1 ≡ A_1
    F_1 = E_1 G_1^{-T}
    G_2 G_2^T = B_2 ≡ A_2 - F_1 F_1^T = A_2 - E_1 B_1^{-1} E_1^T
    F_2 = E_2 G_2^{-T}
    G_3 G_3^T = B_3 ≡ A_3 - F_2 F_2^T = A_3 - E_2 B_2^{-1} E_2^T

We therefore seek an approximate block Cholesky factor of the form

    G̃ = [ G̃_1              ]
        [ F̃_1   G̃_2        ]
        [        F̃_2   G̃_3 ]

so that we can easily solve systems that involve the preconditioner M = G̃G̃^T. This involves the imposition of sparsity on G̃'s blocks, and here is a reasonable approach given that the A_i are tridiagonal and the E_i are diagonal:

    G̃_1 G̃_1^T = B̃_1 = A_1
    F̃_1 = E_1 G̃_1^{-T}
    G̃_2 G̃_2^T = B̃_2 = A_2 - E_1 Λ_1 E_1^T,    Λ_1 (tridiagonal) ≈ B̃_1^{-1}
    F̃_2 = E_2 G̃_2^{-T}
    G̃_3 G̃_3^T = B̃_3 = A_3 - E_2 Λ_2 E_2^T,    Λ_2 (tridiagonal) ≈ B̃_2^{-1}

Note that all the B̃_i are tridiagonal. Clearly, the Λ_i must be carefully chosen to ensure that the B̃_i are also symmetric and positive definite. It then follows that the G̃_i are lower bidiagonal. The F̃_i are full, but they need not be explicitly formed. For example, in the course of solving the system Mz = r we must solve a system of the form

    [ G̃_1              ] [ w_1 ]     [ r_1 ]
    [ F̃_1   G̃_2        ] [ w_2 ]  =  [ r_2 ]
    [        F̃_2   G̃_3 ] [ w_3 ]     [ r_3 ]

Forward elimination can be used to carry out the matrix-vector products that involve the F̃_i = E_i G̃_i^{-T}:

    G̃_1 w_1 = r_1
    G̃_2 w_2 = r_2 - F̃_1 w_1 = r_2 - E_1 G̃_1^{-T} w_1
    G̃_3 w_3 = r_3 - F̃_2 w_2 = r_3 - E_2 G̃_2^{-T} w_2

The choice of Λ_i is delicate as the resulting B̃_i must be positive definite. As we have organized the computation, the central issue is how to approximate the inverse of an m-by-m symmetric, positive definite, tridiagonal matrix T = (t_{ij}) with a symmetric tridiagonal matrix Λ. There are several reasonable approaches:

• Set Λ = diag(1/t_{11}, ..., 1/t_{mm}).

• Take Λ to be the tridiagonal part of T^{-1}. This can be efficiently computed since there exist u, v ∈ R^m such that the lower triangular part of T^{-1} is the lower triangular part of uv^T. See Asplund (1959).

• Set Λ = U^T U where U is the lower bidiagonal portion of G^{-1} where T = GG^T is the Cholesky factorization. This can be found in O(m) flops.

For a discussion of these approximations and what they imply about the associated preconditioners, see Concus, Golub, and Meurant (1985).

10.3.4 Domain Decomposition Ideas


The numerical solution of elliptic partial differential equations often leads to linear systems of the form

    [ A_1                    B_1 ] [ x_1 ]     [ d_1 ]
    [      A_2               B_2 ] [ x_2 ]     [ d_2 ]
    [           .             :  ] [  :  ]  =  [  :  ]        (10.3.8)
    [                A_p     B_p ] [ x_p ]     [ d_p ]
    [ B_1^T B_2^T ... B_p^T   Q  ] [  z  ]     [  f  ]

if the unknowns are properly sequenced. See Meurant (1984). Here, the A_i are symmetric positive definite, the B_i are sparse, and the last block column is generally much narrower than the others.
An example with p = 2 serves to connect (10.3.8) and its block structure
with the underlying problem geometry and the chosen domain decomposi-
tion. Suppose we are to solve Poisson's equation on the following domain:

+++++++++
+++++++++
+++++++++
+++++++++
+++++++++
+++++++++
+++++++++
••••••• ••
X X X X X X X X X X X X X X X X X X X
X X X X X X X X X X X X X X X X X X X
X X X X X X X X X X X X X X X X X X X
X X X X X X X X X X X X X X X X X X X
X X X X X X X X X X X X X X X X X X X

With the usual discretization, an unknown at a mesh point is coupled only to its "north," "east," "south," and "west" neighbor. There are three "types" of variables: those interior to the top subdomain (aggregated in the subvector x_1 and associated with the "+" mesh points), those interior to the bottom subdomain (aggregated in the subvector x_2 and associated with the "x" mesh points), and those on the interface between the two subdomains (aggregated in the subvector z and associated with the "*" mesh points). Note that the interior unknowns of one subdomain are not coupled to the interior unknowns of another subdomain, which accounts
for the zero blocks in (10.3.8). Also observe that the number of interface
unknowns is typically small compared to the overall number of unknowns.
Now let us explore the preconditioning possibilities associated with (10.3.8). We continue with the p = 2 case for simplicity. If we set

    M = L [ M_1^{-1}                ] L^T
          [           M_2^{-1}      ]
          [                      S  ]

where

    L = [ M_1               ]
        [        M_2        ]
        [ B_1^T  B_2^T   I  ]

then

    M = [ M_1            B_1 ]
        [        M_2     B_2 ]        (10.3.9)
        [ B_1^T  B_2^T    S̃  ]

with S̃ = S + B_1^T M_1^{-1} B_1 + B_2^T M_2^{-1} B_2. Let us consider how we might choose the block parameters M_1, M_2, and S so as to produce an effective preconditioner.
If we compare (10.3.9) with the p = 2 version of (10.3.8) we see that it makes sense for M_i to approximate A_i and for S̃ to approximate Q. The latter is achieved if S ≈ Q - B_1^T M_1^{-1} B_1 - B_2^T M_2^{-1} B_2. There are several approaches to selecting S and they all address the fact that we cannot form the dense matrices B_i^T M_i^{-1} B_i. For example, as discussed in the previous subsection, tridiagonal approximations of the M_i^{-1} could be used. See Meurant (1989).
If the subdomains are sufficiently regular and it is feasible to solve linear systems that involve the A_i exactly (say, by using a fast Poisson solver), then we can set M_i = A_i. It follows that M = A + E where rank(E) = m
with m being the number of interface unknowns. Thus, the preconditioned
conjugate gradient algorithm would theoretically converge in m + 1 steps.
Regardless of the approximations that must be incorporated in the pro-
cess, we see that there are significant opportunities for parallelism because
the subdomain problems are decoupled. Indeed, the number of subdomains
p is usually a function of both the problem geometry and the number of
processors that are available for the computation.
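
To make the block structure of (10.3.9) concrete, here is a minimal sketch (not from the
text) of how the p = 2 preconditioner solve M z = r might be organized. The routines
solve_M1, solve_M2, and solve_S, which apply M_1^{-1}, M_2^{-1}, and S^{-1}, are assumed to be
supplied by the user; the sparse coupling blocks B_1 and B_2 are stored explicitly.

    import numpy as np

    def apply_block_preconditioner(r1, r2, rz, B1, B2, solve_M1, solve_M2, solve_S):
        # Solve M z = r for the p = 2 preconditioner (10.3.9), using the block
        # factorization of M implied by (10.3.9) rather than forming M itself.
        w1 = solve_M1(r1)                          # M1 w1 = r1
        w2 = solve_M2(r2)                          # M2 w2 = r2
        zz = solve_S(rz - B1.T @ w1 - B2.T @ w2)   # S zz = rz - B1^T w1 - B2^T w2
        z1 = w1 - solve_M1(B1 @ zz)                # back-substitute through the interface
        z2 = w2 - solve_M2(B2 @ zz)
        return z1, z2, zz

The two subdomain solves in the first and last steps are independent, which is the source
of the parallelism mentioned above.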

10.3.5 Polynomial Preconditioners


The vector z defined by the preconditioner system M z = r should be
thought of as an approximate solution to Az = r insofar as M is an ap-
proximation of A. One way to obtain such an approximate solution is to

apply p steps of a stationary method M_1 z^{(k+1)} = N_1 z^{(k)} + r, z^{(0)} = 0. It
follows that if G = M_1^{-1} N_1 then

    z = z^{(p)} = (I + G + ... + G^{p-1}) M_1^{-1} r.

Thus, if M^{-1} = (I + G + ... + G^{p-1}) M_1^{-1}, then Mz = r and we can think
of M as a preconditioner. Of course, it is important that M be symmetric
positive definite and this constrains the choice of M_1, N_1, and p. Because
M is defined through a polynomial in G, it is referred to as a polynomial preconditioner.
This type of preconditioner is attractive from the vector/parallel point of
view and has therefore attracted considerable attention.
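
As an illustration (a sketch, not from the text), the following applies a polynomial
preconditioner built from p steps of the Jacobi splitting A = M_1 - N_1 with M_1 = diag(A);
the vector z ≈ M^{-1} r is produced without ever forming M or G.

    import numpy as np

    def poly_precond_apply(A, r, p):
        # z = (I + G + ... + G^{p-1}) M1^{-1} r where G = M1^{-1} N1 and
        # M1 = diag(A), N1 = M1 - A (Jacobi splitting, chosen for illustration).
        d = np.diag(A)                 # diagonal of A, assumed nonzero
        z = np.zeros_like(r, dtype=float)
        for _ in range(p):             # p steps of M1 z_{k+1} = N1 z_k + r, z_0 = 0
            z = z + (r - A @ z) / d
        return z

Whether the resulting M is symmetric positive definite depends on the splitting and on p,
as noted above.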

10.3.6 Another Perspective


The polynomial preconditioner discussion points to an important connec-
tion between the classical iterations and the preconditioned conjugate gra-
dient algorithm. Many iterative methods have as their basic step

    x_k = x_{k-2} + ω_k( γ_k z_{k-1} + x_{k-1} - x_{k-2} )              (10.3.10)

where M z_{k-1} = r_{k-1} = b - A x_{k-1}. For example, if we set ω_k = 1 and
γ_k = 1, then

    x_k = M^{-1}(b - A x_{k-1}) + x_{k-1},

i.e., M x_k = N x_{k-1} + b, where A = M - N. Thus, the Jacobi, Gauss-
Seidel, SOR, and SSOR methods of §10.1 have the form (10.3.10). So also
does the Chebyshev semi-iterative method (10.1.12).
Following Concus, Golub, and O'Leary (1976), it is also possible to
organize Algorithm 10.3.1 with a central step of the form (10.3.10):

    x_{-1} = 0; x_0 = initial guess; k = 0; r_0 = b - A x_0
    while r_k ≠ 0
        k = k + 1
        Solve M z_{k-1} = r_{k-1} for z_{k-1}.
        γ_{k-1} = z_{k-1}^T M z_{k-1} / z_{k-1}^T A z_{k-1}
        if k = 1
            ω_1 = 1                                               (10.3.11)
        else
            ω_k = ( 1 - (γ_{k-1}/γ_{k-2}) (z_{k-1}^T M z_{k-1} / z_{k-2}^T M z_{k-2}) (1/ω_{k-1}) )^{-1}
        end
        x_k = x_{k-2} + ω_k( γ_{k-1} z_{k-1} + x_{k-1} - x_{k-2} )
        r_k = b - A x_k
    end
    x = x_k

Thus, we can think of the scalars ω_k and γ_k in (10.3.11) as acceleration
parameters that can be chosen to speed the convergence of the iteration
M x_k = N x_{k-1} + b. Hence, any iterative method based on the splitting
A = M - N can be accelerated by the conjugate gradient algorithm as long
as M (the preconditioner) is symmetric and positive definite.

Problems

P10.3.1 Detail an incomplete factorization procedure that is based on gaxpy Cholesky,


i.e., Algorithm 4.2.1.
P10.3.2 How many n-vectors of storage are required by a practical implementation of
Algorithm 10.3.1? Ignore workspaces that may be required when M z = r is solved.

Notes and References for Sec. 10.3

Our discussion of the preconditioned conjugate gradient is drawn from several sources
including

P. Concus, G.H. Golub, and D.P. O'Leary (1976). "A Generalized Conjugate Gradient
  Method for the Numerical Solution of Elliptic Partial Differential Equations," in
  Sparse Matrix Computations, ed. J.R. Bunch and D.J. Rose, Academic Press, New
  York.
G.H. Golub and G. Meurant (1983). Resolution Numerique des Grands Systemes
  Lineaires, Collection de la Direction des Etudes et Recherches de l'Electricite de
  France, vol. 49, Eyrolles, Paris.
O. Axelsson (1985). "A Survey of Preconditioned Iterative Methods for Linear Systems
  of Equations," BIT 25, 166-187.
P. Concus, G.H. Golub, and G. Meurant (1985). "Block Preconditioning for the Conju-
  gate Gradient Method," SIAM J. Sci. and Stat. Comp. 6, 220-252.
O. Axelsson and G. Lindskog (1986). "On the Rate of Convergence of the Preconditioned
  Conjugate Gradient Method," Numer. Math. 48, 499-523.

Incomplete factorization ideas are detailed in

J.A. Meijerink and H.A. Van der Vorst (1977). "An Iterative Solution Method for Linear
  Equation Systems of Which the Coefficient Matrix is a Symmetric M-Matrix," Math.
  Comp. 31, 148-162.
T.A. Manteuffel (1979). "Shifted Incomplete Cholesky Factorization," in Sparse Matrix
  Proceedings, 1978, ed. I.S. Duff and G.W. Stewart, SIAM Publications, Philadelphia,
  PA.
T.F. Chan, K.R. Jackson, and B. Zhu (1983). "Alternating Direction Incomplete Fac-
  torizations," SIAM J. Numer. Anal. 20, 239-257.
G. Rodrigue and D. Wolitzer (1984). "Preconditioning by Incomplete Block Cyclic
  Reduction," Math. Comp. 42, 549-566.
O. Axelsson (1985). "Incomplete Block Matrix Factorization Preconditioning Methods.
  The Ultimate Answer?", J. Comput. Appl. Math. 12&13, 3-18.
O. Axelsson (1986). "A General Incomplete Block Matrix Factorization Method," Lin.
  Alg. and Its Applic. 74, 179-190.
H. Elman (1986). "A Stability Analysis of Incomplete LU Factorization," Math. Comp.
  47, 191-218.
T. Chan (1991). "Fourier Analysis of Relaxed Incomplete Factorization Precondition-
  ers," SIAM J. Sci. Statist. Comput. 12, 668-680.

Y. Notay (1992). "On the Robustness of Modified Incomplete Factorization Methods,"
  Int. J. Computer Math. 40, 121-141.

For information on domain decomposition and other "pde driven" preconditioning ideas,
see

J.H. Bramble, J.E. Pasciak, and A.H. Schatz (1986). "The Construction of Precondition-
  ers for Elliptic Problems by Substructuring I," Math. Comp. 47, 103-134.
J.H. Bramble, J.E. Pasciak, and A.H. Schatz (1986). "The Construction of Precondition-
  ers for Elliptic Problems by Substructuring II," Math. Comp. 49, 1-17.
G. Meurant (1989). "Domain Decomposition Methods for Partial Differential Equations
  on Parallel Computers," to appear Int'l J. Supercomputing Applications.
W.D. Gropp and D.E. Keyes (1992). "Domain Decomposition with Local Mesh Refine-
  ment," SIAM J. Sci. Statist. Comput. 13, 967-993.
D.E. Keyes, T.F. Chan, G. Meurant, J.S. Scroggs, and R.G. Voigt (eds) (1992). Do-
  main Decomposition Methods for Partial Differential Equations, SIAM Publications,
  Philadelphia, PA.
M. Mu (1995). "A New Family of Preconditioners for Domain Decomposition," SIAM
  J. Sci. Comp. 16, 289-306.
Various aspects of polynomial preconditioners are discussed in

O.G. Johnson, C.A. Micchelli, and G. Paul (1983). "Polynomial Preconditioners for
  Conjugate Gradient Calculations," SIAM J. Numer. Anal. 20, 362-376.
S.C. Eisenstat (1981). "Efficient Implementation of a Class of Preconditioned Conjugate
  Gradient Methods," SIAM J. Sci. and Stat. Computing 2, 1-4.
Y. Saad (1985). "Practical Use of Polynomial Preconditionings for the Conjugate Gra-
  dient Method," SIAM J. Sci. and Stat. Comp. 6, 865-882.
L. Adams (1985). "m-step Preconditioned Conjugate Gradient Methods," SIAM J. Sci.
  and Stat. Comp. 6, 452-463.
S.F. Ashby (1987). "Polynomial Preconditioning for Conjugate Gradient Methods,"
  Ph.D. Thesis, Dept. of Computer Science, University of Illinois.
S. Ashby, T. Manteuffel, and P. Saylor (1989). "Adaptive Polynomial Preconditioning
  for Hermitian Indefinite Linear Systems," BIT 29, 583-609.
R.W. Freund (1990). "On Conjugate Gradient Type Methods and Polynomial Pre-
  conditioners for a Class of Complex Non-Hermitian Matrices," Numer. Math. 57,
  285-312.
S. Ashby, T. Manteuffel, and J. Otto (1992). "A Comparison of Adaptive Chebyshev
  and Least Squares Polynomial Preconditioning for Hermitian Positive Definite Linear
  Systems," SIAM J. Sci. Stat. Comp. 13, 1-29.
Numerous vector/parallel implementations of the cg method have been developed. See

P.F. Dubois, A. Greenbaum, and G.H. Rodrigue (1979). "Approximating the Inverse
  of a Matrix for Use in Iterative Algorithms on Vector Processors," Computing 22,
  257-268.
H.A. Van der Vorst (1982). "A Vectorizable Variant of Some ICCG Methods," SIAM J.
  Sci. and Stat. Comp. 3, 350-356.
G. Meurant (1984). "The Block Preconditioned Conjugate Gradient Method on Vector
  Computers," BIT 24, 623-633.
T. Jordan (1984). "Conjugate Gradient Preconditioners for Vector and Parallel Pro-
  cessors," in G. Birkhoff and A. Schoenstadt (eds), Proceedings of the Conference on
  Elliptic Problem Solvers, Academic Press, NY.
H.A. Van der Vorst (1986). "The Performance of Fortran Implementations for Precon-
  ditioned Conjugate Gradients on Vector Computers," Parallel Computing 3, 49-58.
M.K. Seager (1986). "Parallelizing Conjugate Gradient for the Cray X-MP," Parallel
  Computing 3, 35-47.

O. Axelsson and B. Polman (1986). "On Approximate Factorization Methods for Block
  Matrices Suitable for Vector and Parallel Processors," Lin. Alg. and Its Applic. 77,
  3-26.
D.P. O'Leary (1987). "Parallel Implementation of the Block Conjugate Gradient Algo-
  rithm," Parallel Computing 5, 127-140.
R. Melhem (1987). "Toward Efficient Implementation of Preconditioned Conjugate Gra-
  dient Methods on Vector Supercomputers," Int'l J. Supercomputing Applications 1,
  70-98.
E.L. Poole and J.M. Ortega (1987). "Multicolor ICCG Methods for Vector Computers,"
  SIAM J. Numer. Anal. 24, 1394-1418.
C.C. Ashcraft and R. Grimes (1988). "On Vectorizing Incomplete Factorization and
  SSOR Preconditioners," SIAM J. Sci. and Stat. Comp. 9, 122-151.
U. Meier and A. Sameh (1988). "The Behavior of Conjugate Gradient Algorithms on a
  Multivector Processor with a Hierarchical Memory," J. Comput. Appl. Math. 24,
  13-32.
W.D. Gropp and D.E. Keyes (1988). "Complexity of Parallel Implementation of Domain
  Decomposition Techniques for Elliptic Partial Differential Equations," SIAM J. Sci.
  and Stat. Comp. 9, 312-326.
H. Van der Vorst (1989). "High Performance Preconditioning," SIAM J. Sci. and Stat.
  Comp. 10, 1174-1185.
H. Elman (1989). "Approximate Schur Complement Preconditioners on Serial and Par-
  allel Computers," SIAM J. Sci. Stat. Comput. 10, 581-605.
O. Axelsson and V. Eijkhout (1989). "Vectorizable Preconditioners for Elliptic Difference
  Equations in Three Space Dimensions," J. Comput. Appl. Math. 27, 299-321.
S.L. Johnsson and K. Mathur (1989). "Experience with the Conjugate Gradient Method
  for Stress Analysis on a Data Parallel Supercomputer," International Journal on
  Numerical Methods in Engineering 27, 523-546.
L. Mansfield (1991). "Damped Jacobi Preconditioning and Coarse Grid Deflation for
  Conjugate Gradient Iteration on Parallel Computers," SIAM J. Sci. and Stat. Comp.
  12, 1314-1323.
V. Eijkhout (1991). "Analysis of Parallel Incomplete Point Factorizations," Lin. Alg.
  and Its Applic. 154-156, 723-740.
S. Doi (1991). "On Parallelism and Convergence of Incomplete LU Factorizations," Appl.
  Numer. Math. 7, 417-436.

Preconditioners for large Toeplitz systems are discussed in

G. Strang (1986). "A Proposal for Toeplitz Matrix Calculations," Stud. Appl. Math.
  74, 171-176.
T.F. Chan (1988). "An Optimal Circulant Preconditioner for Toeplitz Systems," SIAM
  J. Sci. Stat. Comp. 9, 766-771.
R.H. Chan (1989). "The Spectrum of a Family of Circulant Preconditioned Toeplitz
  Systems," SIAM J. Num. Anal. 26, 503-506.
R.H. Chan (1991). "Preconditioners for Toeplitz Systems with Nonnegative Generating
  Functions," IMA J. Num. Anal. 11, 333-345.
T. Huckle (1992). "Circulant and Skewcirculant Matrices for Solving Toeplitz Matrix
  Problems," SIAM J. Matrix Anal. Appl. 13, 767-777.
T. Huckle (1992). "A Note on Skew-Circulant Preconditioners for Elliptic Problems,"
  Numerical Algorithms 2, 279-286.
R.H. Chan, J.G. Nagy, and R.J. Plemmons (1993). "FFT-based Preconditioners for
  Toeplitz Block Least Squares Problems," SIAM J. Num. Anal. 30, 1740-1768.
M. Hanke and J.G. Nagy (1994). "Toeplitz Approximate Inverse Preconditioner for
  Banded Toeplitz Matrices," Numerical Algorithms 7, 183-199.
R.H. Chan, J.G. Nagy, and R.J. Plemmons (1994). "Circulant Preconditioned Toeplitz
  Least Squares Iterations," SIAM J. Matrix Anal. Appl. 15, 80-97.

T.F. Chan and J.A. Olkin (1994). "Circulant Preconditioners for Toeplitz Block Matri-
  ces," Numerical Algorithms 6, 89-101.

Finally, we offer an assortment of references concerned with the practical application of
the cg method:

J.K. Reid (1972). "The Use of Conjugate Gradients for Systems of Linear Equations
  Possessing Property A," SIAM J. Num. Anal. 9, 325-332.
D.P. O'Leary (1980). "The Block Conjugate Gradient Algorithm and Related Methods,"
  Lin. Alg. and Its Applic. 29, 293-322.
R.C. Chin, T.A. Manteuffel, and J. de Pillis (1984). "ADI as a Preconditioning for
  Solving the Convection-Diffusion Equation," SIAM J. Sci. and Stat. Comp. 5,
  281-299.
I. Duff and G. Meurant (1989). "The Effect of Ordering on Preconditioned Conjugate
  Gradients," BIT 29, 635-657.
A. Greenbaum and G. Rodrigue (1989). "Optimal Preconditioners of a Given Sparsity
  Pattern," BIT 29, 610-634.
O. Axelsson and P. Vassilevski (1989). "Algebraic Multilevel Preconditioning Methods
  I," Numer. Math. 56, 157-177.
O. Axelsson and P. Vassilevski (1990). "Algebraic Multilevel Preconditioning Methods
  II," SIAM J. Numer. Anal. 27, 1569-1590.
M. Hanke and M. Neumann (1990). "Preconditionings and Splittings for Rectangular
  Systems," Numer. Math. 57, 85-96.
A. Greenbaum (1992). "Diagonal Scalings of the Laplacian as Preconditioners for Other
  Elliptic Differential Operators," SIAM J. Matrix Anal. Appl. 13, 826-846.
P.E. Gill, W. Murray, D.B. Ponceleón, and M.A. Saunders (1992). "Preconditioners
  for Indefinite Systems Arising in Optimization," SIAM J. Matrix Anal. Appl. 13,
  292-311.
G. Meurant (1992). "A Review on the Inverse of Symmetric Tridiagonal and Block
  Tridiagonal Matrices," SIAM J. Matrix Anal. Appl. 13, 707-728.
S. Holmgren and K. Otto (1992). "Iterative Solution Methods and Preconditioners for
  Block-Tridiagonal Systems of Equations," SIAM J. Matrix Anal. Appl. 13, 863-886.
S.A. Vavasis (1992). "Preconditioning for Boundary Integral Equations," SIAM J. Ma-
  trix Anal. Appl. 13, 905-925.
P. Joly and G. Meurant (1993). "Complex Conjugate Gradient Methods," Numerical
  Algorithms 4, 379-406.
X.-C. Cai and O. Widlund (1993). "Multiplicative Schwarz Algorithms for Some Non-
  symmetric and Indefinite Problems," SIAM J. Numer. Anal. 30, 936-952.

10.4 Other Krylov Subspace Methods


The conjugate gradient method presented over the previous two sections
is applicable to symmetric positive definite systems. The MINRES and
SYMMLQ variants developed in §9.3.2 in connection with the symmetric
Lanczos process can handle symmetric indefinite systems. Now we push
the generalizations even further in pursuit of iterative methods that are
applicable to unsymmetric systems.
The discussion is patterned after the survey article by Freund, Golub,
and Nachtigal (1992) and Chapter 9 of Golub and Ortega (1993). We focus
on cg-type algorithms that involve optimization over Krylov spaces.

Bear in mind that there is a large gap between our algorithmic speci-
fications and production software. A good place to build an appreciation
for this point is the Templates book by Barrett et al (1993). The book by
Saad (1996) is also highly recommended.

10.4.1 Normal Equation Approaches


The method of normal equations for the least squares problem is appealing
because it allows us to use simple "Cholesky technology'' instead of more
complicated methods that involve orthogonalization. Likewise, in the un-
symmetric Ax = b problem it is tempting to solve the equivalent symmetric
positive definite system

    A^T A x = A^T b

using existing conjugate gradient technology. Indeed, if we make the sub-
stitution A ← A^T A in Algorithm 10.2.1 and note that a normal equation
residual A^T b - A^T A x_k is A^T times the "true" residual b - A x_k, then we
obtain the Conjugate Gradient Normal Equation Residual method:

Algorithm 10.4.1 [CGNR] If A ∈ R^{n×n} is nonsingular, b ∈ R^n, and
x_0 ∈ R^n is an initial guess (A x_0 ≈ b), then the following algorithm com-
putes x ∈ R^n so Ax = b.

    k = 0
    r_0 = b - A x_0
    while r_k ≠ 0
        k = k + 1
        if k = 1
            p_1 = A^T r_0
        else
            β_k = (A^T r_{k-1})^T (A^T r_{k-1}) / (A^T r_{k-2})^T (A^T r_{k-2})
            p_k = A^T r_{k-1} + β_k p_{k-1}
        end
        α_k = (A^T r_{k-1})^T (A^T r_{k-1}) / (A p_k)^T (A p_k)
        x_k = x_{k-1} + α_k p_k
        r_k = r_{k-1} - α_k A p_k
    end
    x = x_k
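
A NumPy transcription of Algorithm 10.4.1 might look as follows (a sketch, not
production code; the zero-residual test would be replaced by a tolerance in practice).

    import numpy as np

    def cgnr(A, b, x0, tol=1e-10, maxit=1000):
        # Conjugate Gradient Normal Equation Residual (sketch of Algorithm 10.4.1).
        x = x0.copy()
        r = b - A @ x
        s = A.T @ r                      # normal-equation residual A^T r
        p = s.copy()
        for _ in range(maxit):
            if np.linalg.norm(r) <= tol:
                break
            Ap = A @ p
            alpha = (s @ s) / (Ap @ Ap)
            x = x + alpha * p
            r = r - alpha * Ap
            s_new = A.T @ r
            beta = (s_new @ s_new) / (s @ s)
            p = s_new + beta * p
            s = s_new
        return x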

Another way to make an unsymmetric Ax = b problem "cg-friendly" is to
work with the system

    A A^T y = b,        x = A^T y.

In "y space" the cg algorithm takes on the following form:



    k = 0
    y_0 = initial guess (A A^T y_0 ≈ b)
    r_0 = b - A A^T y_0
    while r_k ≠ 0
        k = k + 1
        if k = 1
            p_1 = r_0
        else
            β_k = r_{k-1}^T r_{k-1} / r_{k-2}^T r_{k-2}
            p_k = r_{k-1} + β_k p_{k-1}
        end
        α_k = r_{k-1}^T r_{k-1} / p_k^T A A^T p_k
        y_k = y_{k-1} + α_k p_k
        r_k = r_{k-1} - α_k A A^T p_k
    end
    y = y_k
Making the substitutions x_k ← A^T y_k and p_k ← A^T p_k and simplifying we
obtain the Conjugate Gradient Normal Equation Error method:

Algorithm 10.4.2 [CGNE] If A ∈ R^{n×n} is nonsingular, b ∈ R^n, and
x_0 ∈ R^n is an initial guess (A x_0 ≈ b), then the following algorithm com-
putes x ∈ R^n so Ax = b.

    k = 0
    r_0 = b - A x_0
    while r_k ≠ 0
        k = k + 1
        if k = 1
            p_1 = A^T r_0
        else
            β_k = r_{k-1}^T r_{k-1} / r_{k-2}^T r_{k-2}
            p_k = A^T r_{k-1} + β_k p_{k-1}
        end
        α_k = r_{k-1}^T r_{k-1} / p_k^T p_k
        x_k = x_{k-1} + α_k p_k
        r_k = r_{k-1} - α_k A p_k
    end
    x = x_k

In general these two normal equation approaches are handicapped by the


squaring of the condition number. (Recall Theorem 10.2.6.) However,
there are some occasions where they are effective and we refer the reader
to Freund, Golub, and Nachtigal (1992).

10.4.2 A Note on Objective Functions


Based on what we know about the cg method, the CGNR iterate x_k mini-
mizes

    φ(x) = (1/2) x^T (A^T A) x - x^T (A^T b)

over the set

    S_k^(CGNR) = x_0 + K(A^T A, A^T r_0, k).

It is easy to show that

    2 φ(x) = || b - Ax ||_2^2 - b^T b

and so x_k minimizes the residual || b - Ax ||_2 over S_k^(CGNR). The "R" in
"CGNR" is there because of the residual-based optimization.
    On the other hand, the CGNE (implicit) iterate y_k minimizes

    ψ(y) = (1/2) y^T (A A^T) y - y^T b

over the set y_0 + K(A A^T, b - A A^T y_0, k). With the change of variable x =
A^T y it can be shown that x_k minimizes the error || x_* - x ||_2, where x_* = A^{-1} b,
over

    S_k^(CGNE) = x_0 + K(A^T A, A^T r_0, k).                        (10.4.1)

Thus CGNE minimizes the error at each step and that explains the "E" in
"CGNE".

10.4.3 The Conjugate Residual Method


Recall that if A is symmetric positive definite, then it has a symmetric
positive definite square root A^{1/2}. (See §4.2.10.) Note that in this case
Ax = b and

    A^{1/2} x = A^{-1/2} b

are equivalent and that the former is the normal equation version of the
latter. If we apply CGNR to this square root system and simplify the
results, then we obtain

Algorithm 10.4.3 [Conjugate Residuals] If A ∈ R^{n×n} is symmetric
positive definite, b ∈ R^n, and x_0 ∈ R^n is an initial guess (A x_0 ≈ b), then
the following algorithm computes x ∈ R^n so Ax = b.

    k = 0
    r_0 = b - A x_0
    while r_k ≠ 0
        k = k + 1
        if k = 1
            p_1 = r_0
        else
            β_k = r_{k-1}^T A r_{k-1} / r_{k-2}^T A r_{k-2}
            p_k = r_{k-1} + β_k p_{k-1}
            A p_k = A r_{k-1} + β_k A p_{k-1}
        end
        α_k = r_{k-1}^T A r_{k-1} / (A p_k)^T (A p_k)
        x_k = x_{k-1} + α_k p_k
        r_k = r_{k-1} - α_k A p_k
    end
    x = x_k

It follows from our comments about CGNR that || A^{-1/2}(b - Ax) ||_2 is min-
imized over the set x_0 + K(A, r_0, k) during the kth iteration.

10.4.4 GMRES
In §9.3.2 we briefly discussed the Lanczos-based MINRES method for sym-
metric, possibly indefinite, Ax = b problems. In that method the iterate
x_k minimizes || b - Ax ||_2 over the set

    S_k = x_0 + span{ r_0, A r_0, ..., A^{k-1} r_0 } = x_0 + K(A, r_0, k).      (10.4.2)

The key idea behind the algorithm is to express x_k in terms of the Lanczos
vectors q_1, q_2, ..., q_k which span K(A, r_0, k) if q_1 is a multiple of the initial
residual r_0 = b - A x_0.
In the Generalized Minimum Residual (GMRES) method of Saad and
Schultz (1986) the same approach is taken except that the iterates are
expressed in terms of Arnoldi vectors instead of Lanczos vectors in order
to handle unsymmetric A. After k steps of the Arnoldi iteration (9.4.1) we
have the factorization

    A Q_k = Q_{k+1} H̃_k                                            (10.4.3)

where the columns of Q_{k+1} = [ Q_k  q_{k+1} ] are the orthonormal Arnoldi vec-
tors and

    H̃_k = [ h_{11}   h_{12}   ...      h_{1k}     ]
           [ h_{21}   h_{22}   ...      h_{2k}     ]
           [   0        .      ...        .       ]    ∈ R^{(k+1)×k}
           [   0        0    h_{k,k-1}  h_{kk}     ]
           [   0        0       0     h_{k+1,k}   ]

is upper Hessenberg. In the kth step of GMRES, II b- Axk 11 2 is minimized


subject to the constraint that xk has the form Xk = xo + QkYk for some
Yk E JR.k. If q, = To/ Po where Po = II ro liz, then it follows that

II b- A(xo + QkYk) liz = II ro - AQkYk liz


= II ro- Qk+iihyk liz
II Poe1 - ilkYk liz.
Thus, Yk is the solution to a ( k + 1)-by-k least squares problem and the
GMRES iterate is given by xk = xo + QkYk .

Algorithm 10.4.4 [GMRES] If A ∈ R^{n×n} is nonsingular, b ∈ R^n, and
x_0 ∈ R^n is an initial guess (A x_0 ≈ b), then the following algorithm com-
putes x ∈ R^n so Ax = b.

    r_0 = b - A x_0
    h_{10} = || r_0 ||_2
    k = 0
    while (h_{k+1,k} > 0)
        q_{k+1} = r_k / h_{k+1,k}
        k = k + 1
        r_k = A q_k
        for i = 1:k
            h_{ik} = q_i^T r_k
            r_k = r_k - h_{ik} q_i
        end
        h_{k+1,k} = || r_k ||_2
        x_k = x_0 + Q_k y_k where || h_{10} e_1 - H̃_k y_k ||_2 = min
    end
    x = x_k

It is easy to verify that

    span{ q_1, ..., q_k } = K(A, r_0, k).

The upper Hessenberg least squares problem can be efficiently solved using
Givens rotations. In practice there is no need to form x_k until one is happy
with its residual.
    The main problem with "unlimited GMRES" is that the kth iteration
involves O(kn) flops. Thus, like Arnoldi, a practical GMRES implementa-
tion requires a restart strategy to avoid excessive amounts of computation
and memory traffic. For example, if at most m steps are tolerable, then x_m
can be used as the initial vector for the next GMRES sequence.
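
For concreteness, here is a minimal dense-matrix sketch (not from the text) of restarted
GMRES along the lines of Algorithm 10.4.4; it solves the small (k+1)-by-k least squares
problem with a library routine rather than with the Givens-rotation update used in
practice. The restart length m and the tolerances are illustrative parameters.

    import numpy as np

    def gmres(A, b, x0, m=30, tol=1e-10, max_restarts=50):
        # GMRES(m): Arnoldi + small least squares solve, restarted every m steps.
        x = x0.copy()
        n = b.size
        for _ in range(max_restarts):
            r0 = b - A @ x
            beta = np.linalg.norm(r0)
            if beta <= tol:
                return x
            Q = np.zeros((n, m + 1))
            H = np.zeros((m + 1, m))
            Q[:, 0] = r0 / beta
            for k in range(m):
                w = A @ Q[:, k]
                for i in range(k + 1):               # modified Gram-Schmidt
                    H[i, k] = Q[:, i] @ w
                    w = w - H[i, k] * Q[:, i]
                H[k + 1, k] = np.linalg.norm(w)
                if H[k + 1, k] > 0:
                    Q[:, k + 1] = w / H[k + 1, k]
                e1 = np.zeros(k + 2); e1[0] = beta
                # y minimizes || beta*e1 - H(1:k+1, 1:k) y ||_2
                y, *_ = np.linalg.lstsq(H[:k + 2, :k + 1], e1, rcond=None)
                if H[k + 1, k] == 0:                 # breakdown = exact solution
                    break
            x = x + Q[:, :k + 1] @ y
        return x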

10.4.5 Preconditioning
Preconditioning is the other key to making GMRES effective. Analogous
to the development of the preconditioned conjugate gradient method in
§10.3, we obtain a nonsingular matrix M = M_1 M_2 that approximates A
in some sense and then apply GMRES to the system Ã x̃ = b̃ where Ã =
M_1^{-1} A M_2^{-1}, b̃ = M_1^{-1} b, and x̃ = M_2 x. If we write down the GMRES
iteration for the tilde system and manipulate the equations to restore the
original variables, then the resulting iteration requires the solution of linear
systems that involve the preconditioner M. Thus, the act of finding a good
preconditioner M = M_1 M_2 is the act of making Ã = M_1^{-1} A M_2^{-1} look
as much as possible like the identity subject to the constraint that linear
systems with M are easy to solve.

10.4.6 The Biconjugate Gradient Method


Just as Arnoldi underwrites GMRES, the unsymmetric Lanczos process
underwrites the Biconjugate gradient (BiCG) method. The starting point
in the development of BiCG is to go back to the Lanczos derivation of the
conjugate gradient method in §9.3.1. In terms of Lanczos vectors, the kth
cg iterate is given by x_k = x_0 + Q_k y_k where Q_k is the matrix of Lanczos
vectors, T_k = Q_k^T A Q_k is tridiagonal, and y_k solves T_k y_k = Q_k^T r_0. Note that

    Q_k^T (b - A x_k) = Q_k^T (r_0 - A Q_k y_k) = 0.


Thus, we can characterize x_k by insisting that it come from x_0 + K(A, r_0, k)
and that it produce a residual that is orthogonal to a given subspace, say
K(A, r_0, k).
    In the unsymmetric case we can extend this notion by producing a se-
quence of iterates {x_k} with the property that x_k belongs to x_0 + K(A, r_0, k)
and produces a residual that is orthogonal to K(A^T, s_0, k) for some s_0 ∈ R^n.
Simplifications occur if the unsymmetric Lanczos process is used to gener-
ate bases for the two involved Krylov spaces. In particular, after k steps
of the unsymmetric Lanczos algorithm (9.4.7) we have Q_k, P_k ∈ R^{n×k} such
that P_k^T Q_k = I_k and a tridiagonal matrix T_k = P_k^T A Q_k such that

    A Q_k   = Q_k T_k   + r_k e_k^T,     P_k^T r_k = 0
                                                                    (10.4.4)
    A^T P_k = P_k T_k^T + s_k e_k^T,     Q_k^T s_k = 0
In BiCG we set x_k = x_0 + Q_k y_k where T_k y_k = Q_k^T r_0. Note that the Galerkin
condition

    P_k^T (b - A x_k) = 0

holds.
    As might be expected, it is possible to develop recursions so that x_k
can be computed as a simple combination of x_{k-1} and q_{k-1}, instead of as
a linear combination of all the previous q-vectors.

The BiCG method is subject to serious breakdown because of its de-


pendence on the unsymmetric Lanczos process. However, by relying on
a look-ahead Lanczos procedure it is possible to overcome some of these
difficulties.

10.4.7 QMR
Another iteration that runs off of the unsymmetric Lanczos process is the
quasi-minimum residual (QMR) method of Freund and Nachtigal (1991).
As in BiCG the kth iterate has the form x_k = x_0 + Q_k y_k. It is easy to show
that after k steps in (9.4.7) we have the factorization

    A Q_k = Q_{k+1} T̃_k

where T̃_k ∈ R^{(k+1)×k} is tridiagonal. It follows that if q_1 = (b - A x_0)/ρ, then

    b - A x_k = b - A(x_0 + Q_k y_k)
              = r_0 - A Q_k y_k
              = r_0 - Q_{k+1} T̃_k y_k
              = Q_{k+1}( ρ e_1 - T̃_k y_k ).

If y_k is chosen to minimize the 2-norm of this vector, then in exact arith-
metic x_0 + Q_k y_k defines the GMRES iterate. In QMR, y_k is chosen to
minimize || ρ e_1 - T̃_k y_k ||_2.

10.4.8 Summary
The methods that we have presented do not submit to a linear ranking.
The choice of a technique is complicated and depends on a host of factors.
A particularly cogent assessment of the major algorithms is given in Barrett
et al (1993).

Problems
P10.4.1 Analogous to (10.2.16), develop efficient implementations of the CGNR, CGNE,
and Conjugate Residual methods.
P10.4.2 Establish the mathematical equivalence of the CGNR and the LSQR method
outlined in §9.3.4.
P10.4.3 Prove (10.4.3).
P10.4.4 Develop an efficient preconditioned GMRES implementation, proceeding as
we did in §10.3 for the preconditioned conjugate gradient method. (See (10.3.2) and (10.3.3)
in particular.)
P10.4.5 Prove that the GMRES least squares problem has full rank.

Notes and References for Sec. 10.4


The following papers serve as excellent introductions to the world of unsymmetric iter-
ation:

S. Eisenstat, H. Elman, and M. Schultz (1983). "Variational Iterative Methods for
  Nonsymmetric Systems of Equations," SIAM J. Num. Anal. 20, 345-357.
R.W. Freund, G.H. Golub, and N. Nachtigal (1992). "Iterative Solution of Linear Sys-
  tems," Acta Numerica 1, 57-100.
N. Nachtigal, S. Reddy, and L. Trefethen (1992). "How Fast Are Nonsymmetric Matrix
  Iterations?," SIAM J. Matrix Anal. Appl. 13, 778-795.
A. Greenbaum and L.N. Trefethen (1994). "GMRES/CR and Arnoldi/Lanczos as Matrix
  Approximation Problems," SIAM J. Sci. Comp. 15, 359-368.

Krylov space methods and analysis are featured in the following papers:

W.E. Arnoldi (1951). "The Principle of Minimized Iterations in the Solution of the
  Matrix Eigenvalue Problem," Quart. Appl. Math. 9, 17-29.
Y. Saad (1981). "Krylov Subspace Methods for Solving Large Unsymmetric Linear
  Systems," Math. Comp. 37, 105-126.
Y. Saad (1984). "Practical Use of Some Krylov Subspace Methods for Solving Indefinite
  and Nonsymmetric Linear Systems," SIAM J. Sci. and Stat. Comp. 5, 203-228.
Y. Saad (1989). "Krylov Subspace Methods on Supercomputers," SIAM J. Sci. and
  Stat. Comp. 10, 1200-1232.
C.-M. Huang and D.P. O'Leary (1993). "A Krylov Multisplitting Algorithm for Solving
  Linear Systems of Equations," Lin. Alg. and Its Applic. 194, 9-29.
C.C. Paige, B.N. Parlett, and H.A. Van der Vorst (1995). "Approximate Solutions and
  Eigenvalue Bounds from Krylov Subspaces," Numer. Linear Algebra with Applic. 2,
  115-134.

References for the GMRES method include

Y. Saad and M. Schultz (1986). "GMRES: A Generalized Minimal Residual Algorithm
  for Solving Nonsymmetric Linear Systems," SIAM J. Scientific and Stat. Comp. 7,
  856-869.
H.F. Walker (1988). "Implementation of the GMRES Method Using Householder Trans-
  formations," SIAM J. Sci. Stat. Comp. 9, 152-163.
C. Vuik and H.A. van der Vorst (1992). "A Comparison of Some GMRES-like Methods,"
  Lin. Alg. and Its Applic. 160, 131-162.
N. Nachtigal, L. Reichel, and L. Trefethen (1992). "A Hybrid GMRES Algorithm for
  Nonsymmetric Linear Systems," SIAM J. Matrix Anal. Appl. 13, 796-825.
Y. Saad (1993). "A Flexible Inner-Outer Preconditioned GMRES Algorithm," SIAM J.
  Sci. Comput. 14, 461-469.
Z. Bai, D. Hu, and L. Reichel (1994). "A Newton Basis GMRES Implementation," IMA
  J. Num. Anal. 14, 563-581.
R.B. Morgan (1995). "A Restarted GMRES Method Augmented with Eigenvectors,"
  SIAM J. Matrix Anal. Applic. 16, 1154-1171.

Preconditioning ideas for unsymmetric problems are discussed in the following papers:

Y. Saad (1988). "Preconditioning Techniques for Indefinite and Nonsymmetric Linear
  Systems," J. Comput. Appl. Math. 24, 89-105.
L. Yu. Kolotilina and A. Yu. Yeremin (1993). "Factorized Sparse Approximate Inverse
  Preconditioning I: Theory," SIAM J. Matrix Anal. Applic. 14, 45-58.
I.E. Kaporin (1994). "New Convergence Results and Preconditioning Strategies for the
  Conjugate Gradient Method," Num. Lin. Alg. Applic. 1, 179-210.
L. Yu. Kolotilina and A. Yu. Yeremin (1995). "Factorized Sparse Approximate Inverse
  Preconditioning II: Solution of 3D FE Systems on Massively Parallel Computers,"
  Intern. J. High Speed Comput. 7, 191-215.
H. Elman (1996). "Fast Nonsymmetric Iterations and Preconditioning for Navier-Stokes
  Equations," SIAM J. Sci. Comput. 17, 33-46.

M. Benzi, C.D. Meyer, and M. Tuma (1996). "A Sparse Approximate Inverse Precondi-
tioner for the Conjugate Gradient Method," SIAM J. Sci. Comput. 17, to appear.

Some representative papers concerned with the development of nonsymmetric conjugate
gradient procedures include

D.M. Young and K.C. Jea (1980). "Generalized Conjugate Gradient Acceleration of
  Nonsymmetrizable Iterative Methods," Lin. Alg. and Its Applic. 34, 159-194.
O. Axelsson (1980). "Conjugate Gradient Type Methods for Unsymmetric and Incon-
  sistent Systems of Linear Equations," Lin. Alg. and Its Applic. 29, 1-16.
K.C. Jea and D.M. Young (1983). "On the Simplification of Generalized Conjugate
  Gradient Methods for Nonsymmetrizable Linear Systems," Lin. Alg. and Its Applic.
  52/53, 399-417.
V. Faber and T. Manteuffel (1984). "Necessary and Sufficient Conditions for the Exis-
  tence of a Conjugate Gradient Method," SIAM J. Numer. Anal. 21, 352-362.
Y. Saad and M. Schultz (1985). "Conjugate Gradient-Like Algorithms for Solving Non-
  symmetric Linear Systems," Math. Comp. 44, 417-424.
H.A. Van der Vorst (1986). "An Iterative Solution Method for Solving f(A)x = b Using
  Krylov Subspace Information Obtained for the Symmetric Positive Definite Matrix
  A," J. Comp. and App. Math. 18, 249-263.
M.A. Saunders, H.D. Simon, and E.L. Yip (1988). "Two Conjugate Gradient-Type
  Methods for Unsymmetric Linear Equations," SIAM J. Num. Anal. 25, 927-940.
R. Freund (1992). "Conjugate Gradient-Type Methods for Linear Systems with Complex
  Symmetric Coefficient Matrices," SIAM J. Sci. Statist. Comput. 13, 425-448.

More Lanczos-based solvers are discussed in

Y. Saad (1982). "The Lanczos Biorthogonalization Algorithm and Other Oblique Pro-
  jection Methods for Solving Large Unsymmetric Systems," SIAM J. Numer. Anal.
  19, 485-506.
Y. Saad (1987). "On the Lanczos Method for Solving Symmetric Systems with Several
  Right Hand Sides," Math. Comp. 48, 651-662.
C. Brezinski and H. Sadok (1991). "Avoiding Breakdown in the CGS Algorithm," Nu-
  mer. Alg. 1, 199-206.
C. Brezinski, M. Zaglia, and H. Sadok (1992). "A Breakdown Free Lanczos Type Algo-
  rithm for Solving Linear Systems," Numer. Math. 63, 29-38.
S.K. Kim and A.T. Chronopoulos (1991). "A Class of Lanczos-Like Algorithms Imple-
  mented on Parallel Computers," Parallel Comput. 17, 763-778.
W. Joubert (1992). "Lanczos Methods for the Solution of Nonsymmetric Systems of
  Linear Equations," SIAM J. Matrix Anal. Appl. 13, 926-943.
R.W. Freund, M. Gutknecht, and N. Nachtigal (1993). "An Implementation of the
  Look-Ahead Lanczos Algorithm for Non-Hermitian Matrices," SIAM J. Sci. and
  Stat. Comp. 14, 137-158.

The QMR method is detailed in the following papers:

R.W. Freund and N. Nachtigal (1991). "QMR: A Quasi-Minimal Residual Method for
  Non-Hermitian Linear Systems," Numer. Math. 60, 315-339.
R.W. Freund (1993). "A Transpose-Free Quasi-Minimum Residual Algorithm for Non-
  Hermitian Linear Systems," SIAM J. Sci. Comput. 14, 470-482.
R.W. Freund and N.M. Nachtigal (1994). "An Implementation of the QMR Method
  Based on Coupled Two-term Recurrences," SIAM J. Sci. Comp. 15, 313-337.

The residuals in BiCG tend to display erratic behavior prompting the development of
stabilizing techniques:

H. van der Vorst (1992). "Bi-CGSTAB: A Fast and Smoothly Converging Variant of the
  Bi-CG for the Solution of Nonsymmetric Linear Systems," SIAM J. Sci. and Stat.
  Comp. 13, 631-644.
M. Gutknecht (1993). "Variants of BiCGSTAB for Matrices with Complex Spectrum,"
  SIAM J. Sci. and Stat. Comp. 14, 1020-1033.
G.L.G. Sleijpen and D.R. Fokkema (1993). "BiCGSTAB(l) for Linear Equations In-
  volving Unsymmetric Matrices with Complex Spectrum," Electronic Transactions
  on Numerical Analysis 1, 11-32.
C. Brezinski and M. Redivo-Zaglia (1995). "Look-Ahead in BiCGSTAB and Other
  Product-Type Methods for Linear Systems," BIT 35, 169-201.
In some applications it is awkward to produce matrix-vector product code for both Ax
and A^T x. Transpose-free methods are popular in this context. See

P. Sonneveld (1989). "CGS, A Fast Lanczos-Type Solver for Nonsymmetric Linear Sys-
  tems," SIAM J. Sci. and Stat. Comp. 10, 36-52.
G. Radicati di Brozolo and Y. Robert (1989). "Parallel Conjugate Gradient-like Algo-
  rithms for Solving Sparse Nonsymmetric Linear Systems on a Vector Multiprocessor,"
  Parallel Computing 11, 233-240.
C. Brezinski and M. Redivo-Zaglia (1994). "Treatment of Near-Breakdown in the CGS
  Algorithms," Numerical Algorithms 7, 33-73.
E.M. Kasenally (1995). "GMBACK: A Generalized Minimum Backward Error Algorithm
  for Nonsymmetric Linear Systems," SIAM J. Sci. Comp. 16, 698-719.
C.C. Paige, B.N. Parlett, and H.A. van der Vorst (1995). "Approximate Solutions and
  Eigenvalue Bounds from Krylov Subspaces," Num. Lin. Alg. with Applic. 2, 115-
  133.
M. Hochbruck and Ch. Lubich (1996). "On Krylov Subspace Approximations to the
  Matrix Exponential Operator," SIAM J. Numer. Anal., to appear.
M. Hochbruck and Ch. Lubich (1996). "Error Analysis of Krylov Method in a Nutshell,"
  SIAM J. Sci. Comput., to appear.

Connections between the pseudoinverse of a rectangular matrix A and the conjugate
gradient method applied to A^T A are pointed out in the paper

M. Hestenes (1975). "Pseudoinverses and Conjugate Gradients," CACM 18, 40-43.


Chapter 11

Functions of Matrices

§11.1 Eigenvalue Methods


§11.2 Approximation Methods
§11.3 The Matrix Exponential

Computing a function f(A) of an n-by-n matrix A is a frequently oc-


curring problem in control theory and other application areas. Roughly
speaking, if the scalar function f(z) is defined on >.(A), then f(A) is de-
fined by substituting "A" for "z" in the "formula" for f(z). For example,
if f(z) = (1 + z)/(1- z) and 1 (/.>.(A), then f(A) =(I+ A)(I- A)- 1 •
The computations get particularly interesting when the function f is
transcendental. One approach in this more complicated situation is to
compute an eigenvalue decomposition A = Y B Y^{-1} and use the formula
f(A) = Y f(B) Y^{-1}. If B is sufficiently simple, then it is often possible
to calculate f(B) directly. This is illustrated in §11.1 for the Jordan and
Schur decompositions. Not surprisingly, reliance on the latter decomposi-
tion results in a more stable f(A) procedure.
Another class of methods for the matrix function problem is to approx-
imate the desired function f(A) with an easy-to-calculate function g(A).
For example, g might be a truncated Taylor series approximate to f. Error
bounds associated with the approximation of matrix functions are given in
§11.2.
In the last section we discuss the special and very important problem
of computing the matrix exponential e^A.

Before You Begin


Chapters 1, 2, 3, 7 and 8 are assumed. Within this chapter there are
the following dependencies:


§11.1 --> §11.2 --> §11.3

Complementary references include Mirsky (1955), Gantmacher (1959), Bellman (1969),
and Horn and Johnson (1991). Some Matlab functions important to this chapter are
expm, expm1, expm2, expm3, logm, sqrtm, and funm.

11.1 Eigenvalue Methods


Given an n-by-n matrix A and a scalar function f(z), there are several
ways to define the matrix function f(A). A very informal definition might
be to substitute "A" for "z" in the formula for f(z). For example, if p(z)
= 1 + z and r(z) = (1 - (z/2))^{-1}(1 + (z/2)) for z ≠ 2, then it is certainly
reasonable to define p(A) and r(A) by

    p(A) = I + A

and

    r(A) = (I - A/2)^{-1}(I + A/2),        2 ∉ λ(A).

"A-for-z" substitution also works for transcendental functions, e.g.,

    e^A = I + A + (1/2!)A^2 + (1/3!)A^3 + ... .

To make subsequent algorithmic developments precise, however, we need a
more precise definition of f(A).

11.1.1 A Definition
There are many ways to establish rigorously the notion of a matrix function.
See Rinehart (1955). Perhaps the most elegant approach is in terms of a
line integral. Suppose f(z) is analytic inside and on a closed contour Γ which
encircles λ(A). We define f(A) to be the matrix

    f(A) = (1/(2πi)) ∮_Γ f(z) (zI - A)^{-1} dz.                     (11.1.1)

This definition is immediately recognized as a matrix version of the Cauchy
integral theorem. The integral is defined on an element-by-element basis:

    F = f(A)   ⟹   f_kj = (1/(2πi)) ∮_Γ f(z) e_k^T (zI - A)^{-1} e_j dz.

Notice that the entries of (zI - A)^{-1} are analytic on Γ and that f(A) is
defined whenever f(z) is analytic in a neighborhood of λ(A).

11.1.2 The Jordan Characterization


Although fairly useless from the computational point of view, the definition
(11.1.1) can be used to derive more practical characterizations of f(A). For
example, if f(A) is defined and

    A = X B X^{-1} = X diag(B_1, ..., B_p) X^{-1},

then it is easy to verify that

    f(A) = X f(B) X^{-1} = X diag(f(B_1), ..., f(B_p)) X^{-1}.        (11.1.2)

For the case when the B_i are Jordan blocks we obtain the following:

Theorem 11.1.1 Let X^{-1} A X = diag(J_1, ..., J_p) be the Jordan canonical
form (JCF) of A ∈ C^{n×n} with

    J_i = [ λ_i   1    ...    0  ]
          [  0   λ_i   ...       ]
          [             ...   1  ]
          [  0    0    ...   λ_i ]

being an m_i-by-m_i Jordan block. If f(z) is analytic on an open set contain-
ing λ(A), then

    f(A) = X diag( f(J_1), ..., f(J_p) ) X^{-1}

where

    f(J_i) = [ f(λ_i)   f^{(1)}(λ_i)   ...   f^{(m_i-1)}(λ_i)/(m_i-1)! ]
             [   0        f(λ_i)      ...                              ]
             [                         ...      f^{(1)}(λ_i)           ]
             [   0          0         ...         f(λ_i)               ]
Proof. In view of the remarks preceding the statement of the theorem, it
suffices to examine f(G) where

    G = λI + E,        E = (δ_{i,j-1}),

is a q-by-q Jordan block. Suppose (zI - G) is nonsingular. Since

    (zI - G)^{-1} = Σ_{k=0}^{q-1} E^k / (z - λ)^{k+1},

it follows from Cauchy's integral theorem that

    f(G) = Σ_{k=0}^{q-1} [ (1/(2πi)) ∮_Γ f(z)/(z - λ)^{k+1} dz ] E^k = Σ_{k=0}^{q-1} ( f^{(k)}(λ)/k! ) E^k.

The theorem follows from the observation that E^k = (δ_{i,j-k}). □

Corollary 11.1.2 If A ∈ C^{n×n}, A = X diag(λ_1, ..., λ_n) X^{-1}, and f(A) is
defined, then

    f(A) = X diag( f(λ_1), ..., f(λ_n) ) X^{-1}.

Proof. The Jordan blocks are all 1-by-1. □

These results illustrate the close connection between f(A) and the eigen-
system of A. Unfortunately, the JCF approach to the matrix function
problem has dubious computational merit unless A is diagonalizable with
a well-conditioned matrix of eigenvectors. Indeed, rounding errors of order
u κ_2(X) can be expected to contaminate the computed result, since a lin-
ear system involving the matrix X must be solved. The following example
suggests that ill-conditioned similarity transformations should be avoided
when computing a function of a matrix.
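
As a quick illustration of Corollary 11.1.2 (and of why κ_2(X) matters), the following
is a sketch, not from the text, that computes f(A) via a computed eigendecomposition;
it is only trustworthy when the eigenvector matrix X is well conditioned.

    import numpy as np

    def fun_via_eig(A, f):
        # f(A) = X diag(f(lambda_1), ..., f(lambda_n)) X^{-1}; rounding errors of
        # order u * cond_2(X) should be expected in the result.
        lam, X = np.linalg.eig(A)
        F = X @ np.diag(f(lam)) @ np.linalg.inv(X)
        return F, np.linalg.cond(X)

    # Usage example: expA, condX = fun_via_eig(A, np.exp)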

Example 11.1.1 If

    A = [ 1 + 10^{-5}        1       ]
        [      0        1 - 10^{-5}  ]

then any matrix of eigenvectors is a column scaled version of

    X = [ 1         1       ]
        [ 0    -2·10^{-5}   ]

and has a 2-norm condition number of order 10^5. Using a computer with machine
precision u ≈ 10^{-7} we find that the computed matrix

    fl[ X diag(exp(1 + 10^{-5}), exp(1 - 10^{-5})) X^{-1} ]

has (1,2) entry 2.750000, while

    e^A = [ 2.718309   2.718282 ]
          [ 0.000000   2.718255 ].
11.1.3 A Schur Decomposition Approach
Some of the difficulties associated with the Jordan approach to the matrix
function problem can be circumvented by relying upon the Schur decom-
position. If A = Q T Q^H is the Schur decomposition of A, then

    f(A) = Q f(T) Q^H.

For this to be effective, we need an algorithm for computing functions of
upper triangular matrices. Unfortunately, an explicit expression for f(T)
is very complicated as the following theorem shows.

Theorem 11.1.3 Let T = (t_{ij}) be an n-by-n upper triangular matrix with
λ_i = t_{ii} and assume f(T) is defined. If f(T) = (f_{ij}), then f_{ij} = 0 if i > j,
f_{ij} = f(λ_i) for i = j, and for all i < j we have

    f_{ij} = Σ_{(s_0,...,s_k) ∈ S_{ij}} t_{s_0 s_1} t_{s_1 s_2} ⋯ t_{s_{k-1} s_k} f[λ_{s_0}, ..., λ_{s_k}],

where S_{ij} is the set of all strictly increasing sequences of integers that start
at i and end at j, and f[λ_{s_0}, ..., λ_{s_k}] is the kth order divided difference of
f at {λ_{s_0}, ..., λ_{s_k}}.

Proof. See Descloux (1963), Davis (1973), or Van Loan (1975). □

Computing f(T) via Theorem 11.1.3 would require O(2^n) flops. Fortu-
nately, Parlett (1974) has derived an elegant recursive method for deter-
mining the strictly upper triangular portion of the matrix F = f(T). It
requires only 2n^3/3 flops and can be derived from the following commutativity
result:

    F T = T F.                                                      (11.1.3)

Indeed, by comparing (i,j) entries in this equation, we find

    Σ_{k=i}^{j} f_{ik} t_{kj} = Σ_{k=i}^{j} t_{ik} f_{kj},     j > i,

and thus, if t_{ii} and t_{jj} are distinct,

    f_{ij}( t_{jj} - t_{ii} ) = t_{ij}( f_{jj} - f_{ii} ) + Σ_{k=i+1}^{j-1} ( t_{ik} f_{kj} - f_{ik} t_{kj} ).   (11.1.4)

From this we conclude that f_{ij} is a linear combination of its neighbors to its
left and below in the matrix F. For example, the entry f_{25} depends upon
f_{22}, f_{23}, f_{24}, f_{55}, f_{45}, and f_{35}. Because of this, the entire upper triangular
portion of F can be computed one superdiagonal at a time beginning with
the diagonal, f(t_{11}), ..., f(t_{nn}). The complete procedure is as follows:

Algorithm 11.1.1 This algorithm computes the matrix function F =
f(T) where T is upper triangular with distinct eigenvalues and f is defined
on λ(T).

    for i = 1:n
        f_{ii} = f(t_{ii})
    end

for p = 1:n -1
fori= 1:n- p
j=i+p
s= t;j (fjj - /;;)
for k = i + 1:j- 1
s = s + t;k/kj - /;ktkj
end
Iii = s/(tii - t;;)
end
end
This algorithm requires 2n^3/3 flops. Assuming that T = Q^H A Q is the
Schur form of A, f(A) = Q F Q^H where F = f(T). Clearly, most of the
work in computing f(A) by this approach is in the computation of the
Schur decomposition, unless f is extremely expensive to evaluate.
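
A NumPy sketch of Algorithm 11.1.1 (not from the text; it assumes T is upper
triangular with distinct diagonal entries) is:

    import numpy as np

    def funm_triangular(T, f):
        # Parlett recurrence: F = f(T) built one superdiagonal at a time.
        n = T.shape[0]
        F = np.zeros_like(T, dtype=complex)
        for i in range(n):
            F[i, i] = f(T[i, i])
        for p in range(1, n):
            for i in range(n - p):
                j = i + p
                s = T[i, j] * (F[j, j] - F[i, i])
                for k in range(i + 1, j):
                    s += T[i, k] * F[k, j] - F[i, k] * T[k, j]
                F[i, j] = s / (T[j, j] - T[i, i])
        return F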

Example 11.1.2 If

    T = [ 1  2  3 ]
        [ 0  3  4 ]
        [ 0  0  5 ]

and f(z) = (1 + z)/z, then F = (f_{ij}) = f(T) is defined by

    f_{11} = (1 + 1)/1 = 2
    f_{22} = (1 + 3)/3 = 4/3
    f_{33} = (1 + 5)/5 = 6/5
    f_{12} = t_{12}( f_{22} - f_{11} )/( t_{22} - t_{11} ) = -2/3
    f_{23} = t_{23}( f_{33} - f_{22} )/( t_{33} - t_{22} ) = -4/15
    f_{13} = [ t_{13}( f_{33} - f_{11} ) + ( t_{12} f_{23} - f_{12} t_{23} ) ]/( t_{33} - t_{11} ) = -1/15.

11.1.4 A Block Schur Approach


If A has close or multiple eigenvalues, then Algorithm 11.1.1 leads to poor
results. In this case, it is advisable to use a block version of Algorithm
11.1.1. We outline such a procedure due to Parlett (1974a). The first
step is to choose Q in the Schur decomposition such that close or multiple

eigenvalues are clustered in blocks T_11, ..., T_pp along the diagonal of T. In
particular, we must compute a partitioning

    T = [ T_11  T_12  ...  T_1p ]        F = [ F_11  F_12  ...  F_1p ]
        [  0    T_22  ...  T_2p ]            [  0    F_22  ...  F_2p ]
        [  .          ...   .   ]            [  .          ...   .   ]
        [  0     0    ...  T_pp ]            [  0     0    ...  F_pp ]

where λ(T_ii) ∩ λ(T_jj) = ∅, i ≠ j. The actual determination of the block


sizes can be done using the methods of §7.6.

Next, we compute the submatrices F_ii = f(T_ii) for i = 1:p. Since the
eigenvalues of T_ii are presumably close, these calculations require special
methods. (Some possibilities are discussed in the next two sections.) Once
the diagonal blocks of F are known, the blocks in the strict upper triangle
of F can be found recursively, as in the scalar case. To derive the governing
equations, we equate (i,j) blocks in FT = TF for i < j and obtain the
following generalization of (11.1.4):

    F_ij T_jj - T_ii F_ij = T_ij F_jj - F_ii T_ij + Σ_{k=i+1}^{j-1} ( T_ik F_kj - F_ik T_kj ).     (11.1.5)

This is a linear system whose unknowns are the elements of the block F_ij
and whose right-hand side is "known" if we compute the F_ij one block
super-diagonal at a time. We can solve (11.1.5) using the Bartels-Stewart
algorithm (Algorithm 7.6.2).
    The block Schur approach described here is useful when computing real
functions of real matrices. After computing the real Schur form A = Q T Q^T,
the block algorithm can be invoked in order to handle the 2-by-2 bumps
along the diagonal of T.

Problems

P11.1.1 Using the definition (11.1.1) show that (a) A f(A) = f(A) A, (b) f(A) is upper
triangular if A is upper triangular, and (c) f(A) is Hermitian if A is Hermitian.
P11.1.2 Rewrite Algorithm 11.1.1 so that f(T) is computed column by column.
P11.1.3 Suppose A = X diag(λ_i) X^{-1} where X = [x_1, ..., x_n] and X^{-1} = [y_1, ..., y_n]^H.
Show that if f(A) is defined, then

    f(A) = Σ_{k=1}^{n} f(λ_k) x_k y_k^H.

P11.1.4 Show that if

    T = [ T_11  T_12 ]  p                 f(T) = [ F_11  F_12 ]  p
        [  0    T_22 ]  q      then              [  0    F_22 ]  q
           p     q                                  p     q

where F_11 = f(T_11) and F_22 = f(T_22). Assume f(T) is defined.

Notes and References for Sec. 11.1

The contour integral representation of f(A) given in the text is useful in functional anal-
ysis because of its generality. See

N. Dunford and J. Schwartz (1958). Linear Operators, Part I, Interscience, New York.

As we discussed, other definitions of f(A) are possible. However, for the matrix functions
typically encountered in practice, all these definitions are equivalent. See

R.F. Rinehart (1955). "The Equivalence of Definitions of a Matric Function," Amer.


Math. Monthly 62, 395-414.
Various aspects of the Jordan representation are detailed in

J.S. Frame (1964). "Matrix Functions and Applications, Part II," IEEE Spectrum 1
  (April), 102-108.
J.S. Frame (1964). "Matrix Functions and Applications, Part IV," IEEE Spectrum 1
  (June), 123-131.
The following are concerned with the Schur decomposition and its relationship to the
f(A) problem:

D. Davis (1973). "Explicit Functional Calculus," Lin. Alg. and Its Applic. 6, 193-199.
J. Descloux (1963). "Bounds for the Spectral Norm of Functions of Matrices," Numer.
  Math. 5, 185-190.
C.F. Van Loan (1975). "A Study of the Matrix Exponential," Numerical Analysis Report
  No. 10, Dept. of Maths., University of Manchester, England.
Algorithm 11.1.1 and the various computational difficulties that arise when it is applied
to a matrix having close or repeated eigenvalues are discussed in

B.N. Parlett (1976). "A Recurrence Among the Elements of Functions of Triangular
  Matrices," Lin. Alg. and Its Applic. 14, 117-121.
A compromise between the Jordan and Schur approaches to the f(A) problem results if
A is reduced to block diagonal form as described in §7.6.3. See

B. Kagstrom (1977). "Numerical Computation of Matrix Functions," Department of


Information Processing Report UMINF-58.77, University of Umea, Sweden.
The sensitivity of matrix functions to perturbation is discussed in

C.S. Kenney and A.J. Laub (1989). "Condition Estimates for Matrix Functions," SIAM
J. Matrix Anal. Appl. 10, 191-209.
C.S. Kenney and A.J. Laub (1994). "Small-Sample Statistical Condition Estimates for
General Matrix Functions," SIAM J. Sci. Comp. 15, 36-61.
A theme in this chapter is that if A is nonnormal, then there is more to computing f(A)
than just computing f(z) on λ(A). The pseudo-eigenvalue concept is a way of under-
standing this phenomenon. See

L.N. Trefethen (1992). "Pseudospectra of Matrices," in Numerical Analysis 1991, D.F.


Griffiths and G.A. Watson (eds), Longman Scientific & Technical, Harlow, Essex,
UK.
More details are offered in §11.3.4.

11.2 Approximation Methods


We now consider a class of methods for computing matrix functions which at
first glance do not appear to involve eigenvalues. These techniques are based
on the idea that if g(z) approximates f(z) on λ(A), then f(A) approximates
g(A), e.g.,

    e^A ≈ I + A + A^2/2! + ... + A^q/q!.

We begin by bounding || f(A) - g(A) || using the Jordan and Schur matrix
function representations. We follow this discussion with some comments
on the evaluation of matrix polynomials.

11.2.1 A Jordan Analysis


The Jordan representation of matrix functions (Theorem 11.1.1) can be
used to bound the error in an approximant g(A) of f(A).

Theorem 11.2.1 Let X^{-1} A X = diag(J_1, ..., J_p) be the JCF of A ∈ C^{n×n}
with

    J_i = [ λ_i   1    ...    0  ]
          [  0   λ_i   ...       ]
          [             ...   1  ]
          [  0    0    ...   λ_i ]

being an m_i-by-m_i Jordan block. If f(z) and g(z) are analytic on an open
set containing λ(A), then

    || f(A) - g(A) ||_2  ≤  κ_2(X)  max_{1≤i≤p}  max_{0≤r≤m_i-1}  m_i | f^{(r)}(λ_i) - g^{(r)}(λ_i) | / r! .

Proof. Defining h(z) = f(z) - g(z) we have

    || f(A) - g(A) ||_2 = || X diag( h(J_1), ..., h(J_p) ) X^{-1} ||_2
                        ≤ κ_2(X) max_{1≤i≤p} || h(J_i) ||_2.

Using Theorem 11.1.1 and equation (2.3.8) we conclude that

    || h(J_i) ||_2  ≤  m_i  max_{0≤r≤m_i-1}  | h^{(r)}(λ_i) | / r! ,

thereby proving the theorem. □

11.2.2 A Schur Analysis


If we rely on the Schur instead of the Jordan decomposition we obtain an
alternative bound.

Theorem 11.2.2 Let Q^H A Q = T = diag(λ_i) + N be the Schur decompo-
sition of A ∈ C^{n×n}, with N being the strictly upper triangular portion of
T. If f(z) and g(z) are analytic on a closed convex set Ω whose interior
contains λ(A), then

    || f(A) - g(A) ||_F  ≤  Σ_{r=0}^{n-1} δ_r || |N|^r ||_F

where

    δ_r = sup_{z∈Ω} | f^{(r)}(z) - g^{(r)}(z) | / r! .

Proof. Let h(z) = f(z) - g(z) and set H = (h_{ij}) = h(A). Let S_{ij}^{(r)} denote
the set of strictly increasing integer sequences (s_0, ..., s_r) with the property
that s_0 = i and s_r = j. Notice that

    S_{ij} = ∪_{r=1}^{j-i} S_{ij}^{(r)}

and so from Theorem 11.1.3, we obtain the following for all i < j:

    h_{ij} = Σ_{r=1}^{j-i} Σ_{s∈S_{ij}^{(r)}} n_{s_0 s_1} n_{s_1 s_2} ⋯ n_{s_{r-1} s_r} h[λ_{s_0}, ..., λ_{s_r}].

Now since Ω is convex and h analytic, we have

    | h[λ_{s_0}, ..., λ_{s_r}] |  ≤  sup_{z∈Ω} | h^{(r)}(z) | / r! .              (11.2.1)

Furthermore, if |N|^r = (n_{ij}^{(r)}) for r ≥ 1, then it can be shown that

    n_{ij}^{(r)} = 0,                                          j < i + r
                                                               (11.2.2)
    n_{ij}^{(r)} = Σ_{s∈S_{ij}^{(r)}} n_{s_0 s_1} ⋯ n_{s_{r-1} s_r},     j ≥ i + r.

The theorem now follows by taking absolute values in the expression for
h_{ij} and then using (11.2.1) and (11.2.2). □

The bounds in the above theorems suggest that there is more to approximat-
ing f(A) than just approximating f(z) on the spectrum of A. In particular,
we see that if the eigensystem of A is ill-conditioned and/or A's departure

from normality is large, then the discrepancy between f(A) and g(A) may
be considerably larger than the maximum of |f(z) - g(z)| on λ(A). Thus,
even though approximation methods avoid eigenvalue computations, they
appear to be influenced by the structure of A's eigensystem, a point that
we pursue further in the next section.

Example 11.2.1 Suppose

    A = [ -.01   1     1   ]
        [   0    0     1   ]
        [   0    0    .01  ]

If f(z) = e^z and g(z) = 1 + z + z^2/2, then || f(A) - g(A) || ≈ 10^{-5} in either the
Frobenius norm or the 2-norm. Since κ_2(X) ≈ 10^7, the error predicted by Theorem
11.2.1 is O(1), rather pessimistic. On the other hand, the error predicted by the Schur
decomposition approach is O(10^{-2}).

11.2.3 Taylor Approximants


A popular way of approximating a matrix function such as e^A is through
the truncation of its Taylor series. The conditions under which a matrix
function f(A) has a Taylor series representation are easily established.

Theorem 11.2.3 If f(z) has a power series representation

    f(z) = Σ_{k=0}^{∞} c_k z^k

on an open disk containing λ(A), then

    f(A) = Σ_{k=0}^{∞} c_k A^k.

Proof. We prove the theorem for the case when A is diagonalizable. In
P11.2.1, we give a hint as to how to proceed without this assumption.
Suppose X^{-1} A X = D = diag(λ_1, ..., λ_n). Using Corollary 11.1.2, we
have

    f(A) = X diag( f(λ_1), ..., f(λ_n) ) X^{-1}
         = X diag( Σ_{k=0}^{∞} c_k λ_1^k, ..., Σ_{k=0}^{∞} c_k λ_n^k ) X^{-1}
         = X ( Σ_{k=0}^{∞} c_k D^k ) X^{-1} = Σ_{k=0}^{∞} c_k (X D X^{-1})^k = Σ_{k=0}^{∞} c_k A^k.  □

Several important transcendental matrix functions have particularly simple


series representations:

    log(I - A) = - Σ_{k=1}^{∞} A^k / k,          |λ| < 1, λ ∈ λ(A)

    sin(A) = Σ_{k=0}^{∞} (-1)^k A^{2k+1} / (2k+1)!

The following theorem bounds the errors that arise when matrix functions
such as these are approximated via truncated Taylor series.

Theorem 11.2.4 If f(z) has the Taylor series

    f(z) = Σ_{k=0}^{∞} α_k z^k

on an open disk containing the eigenvalues of A ∈ C^{n×n}, then

    || f(A) - Σ_{k=0}^{q} α_k A^k ||_2  ≤  (n/(q+1)!) max_{0≤s≤1} || A^{q+1} f^{(q+1)}(As) ||_2.

Proof. Define the matrix E(s) by

    f(As) = Σ_{k=0}^{q} α_k (As)^k + E(s).                            (11.2.3)

If f_{ij}(s) is the (i,j) entry of f(As), then it is necessarily analytic and so

    f_{ij}(s) = Σ_{k=0}^{q} ( f_{ij}^{(k)}(0)/k! ) s^k + ( f_{ij}^{(q+1)}(c_{ij})/(q+1)! ) s^{q+1}   (11.2.4)

where c_{ij} satisfies 0 ≤ c_{ij} ≤ s ≤ 1.
    By comparing powers of s in (11.2.3) and (11.2.4) we conclude that
e_{ij}(s), the (i,j) entry of E(s), has the form

    e_{ij}(s) = ( f_{ij}^{(q+1)}(c_{ij})/(q+1)! ) s^{q+1}.

Now f_{ij}^{(q+1)}(s) is the (i,j) entry of A^{q+1} f^{(q+1)}(As) and therefore

    | e_{ij}(s) |  ≤  max_{0≤s≤1} || A^{q+1} f^{(q+1)}(As) ||_2 / (q+1)!.

The theorem now follows by applying (2.3.8). □

Example 11.2.2 If

    A = [ -49   24 ]
        [ -64   31 ]

then

    e^A = [ -0.735759   0.551819 ]
          [ -1.471518   1.103638 ].

For q = 59, Theorem 11.2.4 predicts that

    || e^A - Σ_{k=0}^{q} A^k/k! ||_2  ≤  (n/(q+1)!) max_{0≤s≤1} || A^{q+1} e^{As} ||_2.

However, if u ≈ 10^{-7}, then we find

    fl( Σ_{k=0}^{59} A^k/k! ) = [ -22.25880   -1.4322766 ]
                                [ -61.49931   -3.474280  ].

The problem is that some of the partial sums have large elements. For example, I + ... +
A^{17}/17! has entries of order 10^7. Since the machine precision is approximately 10^{-7},
rounding errors larger than the norm of the solution are sustained.

Example 11.2.2 highlights a shortcoming of truncated Taylor series approx-
imation: it tends to be worthwhile only near the origin. The problem can
sometimes be circumvented through a change of scale. For example, by
repeated application of the double angle formulae

    cos(2A) = 2 cos(A)^2 - I        sin(2A) = 2 sin(A) cos(A)

it is possible to "build up" the sine and cosine of a matrix from suitably
truncated Taylor series approximates:

    S_0 = Taylor approximate to sin(A/2^k)
    C_0 = Taylor approximate to cos(A/2^k)
    for j = 1:k
        S_j = 2 S_{j-1} C_{j-1}
        C_j = 2 C_{j-1}^2 - I
    end

Here k is a positive integer chosen so that, say, || A ||_∞ ≈ 2^k. See Serbin
and Blalock (1979).
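
A sketch of this scaling strategy (not from the text), with a hypothetical truncation
order q for the Taylor pieces, might look as follows.

    import numpy as np
    from math import factorial

    def cos_sin(A, q=8):
        # Build cos(A), sin(A) via Taylor series on A/2^k plus k double-angle steps.
        k = max(0, int(np.ceil(np.log2(max(np.linalg.norm(A, np.inf), 1.0)))))
        B = A / 2**k
        n = A.shape[0]
        C = np.zeros_like(B); S = np.zeros_like(B)
        for j in range(q + 1):                        # truncated Taylor series
            C += (-1)**j * np.linalg.matrix_power(B, 2*j) / factorial(2*j)
            S += (-1)**j * np.linalg.matrix_power(B, 2*j + 1) / factorial(2*j + 1)
        I = np.eye(n)
        for _ in range(k):                            # double-angle recurrences
            S = 2 * S @ C                             # uses the "old" cosine
            C = 2 * C @ C - I
        return C, S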

11.2.4 Evaluating Matrix Polynomials


Since the approximation of transcendental matrix functions so often in-
volves the evaluation of polynomials, it is worthwhile to look at the details
of computing

    p(A) = b_0 I + b_1 A + ... + b_q A^q

where the scalars b_0, ..., b_q ∈ R are given. The most obvious approach is
to invoke Horner's scheme:

Algorithm 11.2.1 Given a matrix A and b(0:q), the following algorithm
computes F = b_q A^q + ... + b_1 A + b_0 I.

    F = b_q A + b_{q-1} I
    for k = q-2 : -1 : 0
        F = A F + b_k I
    end
This requires q - 1 matrix multiplications. However, unlike the scalar case,
this summation process is not optimal. To see why, suppose q = 9 and
observe that

    p(A) = A^3( A^3( b_9 A^3 + (b_8 A^2 + b_7 A + b_6 I) )
               + (b_5 A^2 + b_4 A + b_3 I) ) + b_2 A^2 + b_1 A + b_0 I.

Thus, F = p(A) can be evaluated with only four matrix multiplies:

    A_2 = A^2
    A_3 = A A_2
    F_1 = b_9 A_3 + b_8 A_2 + b_7 A + b_6 I
    F_2 = A_3 F_1 + b_5 A_2 + b_4 A + b_3 I
    F   = A_3 F_2 + b_2 A_2 + b_1 A + b_0 I.

In general, if s is any integer satisfying 1 ≤ s ≤ √q, then

    p(A) = Σ_{k=0}^{r} B_k (A^s)^k,        r = floor(q/s)             (11.2.5)

where

    B_k = b_{sk+s-1} A^{s-1} + ... + b_{sk+1} A + b_{sk} I,    k = 0:r-1
    B_r = b_q A^{q-sr} + ... + b_{sr+1} A + b_{sr} I.

Once A^2, ..., A^s are computed, Horner's rule can be applied to (11.2.5)
and the net result is that p(A) can be computed with s + r - 1 matrix

multiplies. By choosing s = floor(√q), the number of matrix multiplies
is approximately minimized. This technique is discussed in Paterson and
Stockmeyer (1973). Van Loan (1978) shows how the procedure can be
implemented without storage arrays for A^2, ..., A^s.
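
The following sketch (not from the text) implements (11.2.5) in NumPy; unlike the
storage-efficient variant of Van Loan (1978), it keeps A^2, ..., A^s explicitly.

    import numpy as np

    def matrix_poly(b, A):
        # Evaluate p(A) = b[0] I + b[1] A + ... + b[q] A^q with about s + r - 1
        # matrix multiplies (Paterson-Stockmeyer), s = floor(sqrt(q)), r = floor(q/s).
        q = len(b) - 1
        n = A.shape[0]
        s = max(1, int(np.floor(np.sqrt(q))))
        r = q // s
        P = [np.eye(n), A]                 # powers I, A, A^2, ..., A^s
        for _ in range(s - 1):
            P.append(P[-1] @ A)
        def block(k):                      # coefficient block B_k (degree < s)
            lo, hi = s * k, min(s * (k + 1), q + 1)
            return sum(b[j] * P[j - lo] for j in range(lo, hi))
        F = block(r)                       # Horner's rule in A^s applied to (11.2.5)
        for k in range(r - 1, -1, -1):
            F = F @ P[s] + block(k)
        return F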

11.2.5 Computing Powers of a Matrix


The problem of raising a matrix to a given power deserves special mention.
Suppose it is required to compute A^13. Noting that A^4 = (A^2)^2, A^8 =
(A^4)^2, and A^13 = A^8 A^4 A, we see that this can be accomplished with just 5
matrix multiplications. In general we have

Algorithm 11.2.2 (Binary Powering) Given a positive integer s and
A ∈ R^{n×n}, the following algorithm computes F = A^s.

    Let s = Σ_{k=0}^{t} β_k 2^k be the binary expansion of s with β_t ≠ 0.
    Z = A; q = 0
    while β_q = 0
        Z = Z^2; q = q + 1
    end
    F = Z
    for k = q+1 : t
        Z = Z^2
        if β_k ≠ 0
            F = F Z
        end
    end

This algorithm requires at most 2 floor[log_2(s)] matrix multiplies. If s is a
power of 2, then only log_2(s) matrix multiplies are needed.
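
A direct transcription into NumPy (a sketch; np.linalg.matrix_power provides the same
functionality) is:

    import numpy as np

    def binary_power(A, s):
        # Compute F = A^s with at most 2*floor(log2(s)) matrix multiplies.
        F = None
        Z = A.copy()
        while s > 0:
            if s % 2 == 1:                 # current binary digit beta_k is 1
                F = Z if F is None else F @ Z
            s //= 2
            if s > 0:
                Z = Z @ Z                  # square for the next digit
        return F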

11.2.6 Integrating Matrix Functions


We conclude this section with some remarks on the integration of matrix
functions. Suppose f(At) is defined for all t ∈ [a, b] and that we wish to
compute

    F = ∫_a^b f(At) dt.

As in (11.1.1) the integration is on an element-by-element basis.



Ordinary quadrature rules can be applied to F. For example, with
Simpson's rule, we have

    F ≈ F̃ = (h/3) Σ_{k=0}^{m} w_k f(A(a + kh))                      (11.2.6)

where m is even, h = (b - a)/m and

    w_k = 1    k = 0, m
    w_k = 4    k odd
    w_k = 2    k even, k ≠ 0, m.

If (d^4/dz^4) f(zt) = f^{(4)}(zt) is continuous for t ∈ [a, b] and if f^{(4)}(At) is
defined on this same interval, then it can be shown that F = F̃ + E where

    || E ||_2  ≤  ( n h^4 (b - a)/180 ) max_{a≤t≤b} || A^4 f^{(4)}(At) ||_2.      (11.2.7)

Let f̃_ij and e_ij denote the (i,j) entries of F̃ and E, respectively. Under the
above assumptions we can apply the standard error bounds for Simpson's
rule and obtain

    | e_ij |  ≤  ( h^4 (b - a)/180 ) max_{a≤t≤b} | [ A^4 f^{(4)}(At) ]_{ij} |.

The inequality (11.2.7) now follows since || E ||_2 ≤ n max |e_ij| and

    | [ A^4 f^{(4)}(At) ]_{ij} |  ≤  max_{a≤t≤b} || A^4 f^{(4)}(At) ||_2.

Of course, in the practical application of (11.2.6), the function evaluations
f(A(a + kh)) normally have to be approximated. Thus, the overall error
involves the error in approximating f(A(a + kh)) as well as the Simpson rule
error.
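
A sketch of (11.2.6) in NumPy (not from the text), with the user supplying a routine
fAt that returns f(At) for a scalar t, could look like this.

    import numpy as np

    def simpson_matrix_integral(fAt, a, b, m):
        # Approximate F = integral_a^b f(At) dt by the composite Simpson rule (11.2.6).
        assert m % 2 == 0, "m must be even"
        h = (b - a) / m
        F = np.zeros_like(fAt(a))
        for k in range(m + 1):
            if k == 0 or k == m:
                w = 1.0
            elif k % 2 == 1:
                w = 4.0
            else:
                w = 2.0
            F += w * fAt(a + k * h)
        return (h / 3.0) * F

    # Usage example (assumes scipy): simpson_matrix_integral(lambda t: expm(A*t), 0.0, 1.0, 100)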

Problems

P11.2.1 (a) Suppose G = λI + E is a p-by-p Jordan block, where E = (δ_{i,j-1}). Show
that

    (λI + E)^k = Σ_{j=0}^{min{p-1,k}} ( k choose j ) λ^{k-j} E^j .

(b) Use (a) and Theorem 11.1.1 to prove Theorem 11.2.3.
P11.2.2 Verify (11.2.2).
P11.2.3 Show that if || A ||_2 < 1, then log(I + A) exists and satisfies the bound

    || log(I + A) ||_2 ≤ || A ||_2 / (1 - || A ||_2).

P11.2.4 Let A be an n-by-n symmetric positive definite matrix. (a) Show that there
exists a unique symmetric positive definite X such that A = X^2. (b) Show that if
X_0 = I and X_{k+1} = (X_k + A X_k^{-1})/2 then X_k → √A quadratically where √A denotes
the matrix X in part (a).
P11.2.5 Specialize Algorithm 11.2.1 to the case when A is symmetric. Repeat for the
case when A is upper triangular. In both instances, give the associated flop counts.
P11.2.6 Show that X(t) = cos(t√A) C_1 + (√A)^{-1} sin(t√A) C_2 solves the initial value
problem Ẍ(t) = -A X(t), X(0) = C_1, Ẋ(0) = C_2. Assume that A is symmetric positive
definite.
P11.2.7 Using Theorem 11.2.4, bound the error in the approximation

    sin(A) ≈ Σ_{k=0}^{q} (-1)^k A^{2k+1} / (2k+1)!.

P11.2.8 Suppose A ∈ R^{n×n} is nonsingular and X_0 ∈ R^{n×n} is given. The iteration
defined by

    X_{k+1} = X_k( 2I - A X_k )

is the matrix analog of Newton's method applied to the function f(x) = a - (1/x). Use
the SVD to analyze this iteration. Do the iterates converge to A^{-1}? Discuss the choice
of X_0.

Notes and References for Sec. 11.2


The optimality of Horner's rule for polynomial evaluation is discussed in

D. Knuth (1981). The Art of Computer Progrumming , vol. 2. Seminumerical Algo-


rithms , 2nd ed., Addison-Wesley, Reading, Massachusetts.
M.S. Paterson and L.J. Stockmeyer (1973). "On the Number of Nonscalar Multiplica-
tions Necessary to Evaluate Polynomials," SIAM J. Comp. 2, 6o-£6.

The Horner evaluation of matrix polynomials is analyzed in

C. F. Van Loan (1978). "A Note on the Evaluation of Matrix Polynomials," IEEE Tmns.
Auto. Cont. AC-24, 32(}-21.
Other aspects of matrix function computation are discussed in

N.J. Higham and P.A. Knight (1995). "Matrix Powers in Finite Precision Arithmetic,"
SIAM J. Matrix Anal. Appl. 16, 343-358.
R. Mathias (1993). "Approximation of Matrix-Valued Functions," SIAM J. Matrix Anal.
Appl. 14, 1061-1063.
S. Friedland (1991). "Revisiting Matrix Squaring," Lin. Alg. and Its Applic. 154-156,
59-63.
H. Bolz and W. Niethammer (1988). "On the Evaluation of Matrix Functions Given by
Power Series," SIAM J. Matrix Anal. Appl. 9, 202-209.

The Newton and Language representations for f(A) and their relationship to other ma-
trix function definitions is discuSBed in

R.F. Rinehart (1955). "The Equivalence of Definitions of a Matric Function," Amer.


Math. Monthly 62, 395-414.
572 CHAPTER 11. FUNCTIONS OF MATRICES

The "double angle" method for computing the cosine of matrix is analyzed in

S. Serbin and S. Blalock (1979). "An Algorithm for Computing the Matrix Cosine,"
SIAM J. Sci. Stat. Comp. 1, 198-204.

The square root is a particularly important matrix function. See §4.2.10. Several ap.
proaches are possible:

A. Bjorck and S. Hammar ling (1983). "A Schur Method for the Square Root of a Matrix,"
Lin. Alg. and Its Applic. 52/53, 127-140.
N.J. Higham (1986). "Newton's Method for the Matrix Square Root," Math. Comp.
46, 537-550.
N.J. Higham (1987). "Computing Real Square Roots of a Rea.l Matrix," Lin. Alg. and
Its Applic. 88/89, 405--430.

11.3 The Matrix Exponential


One of the most frequently computed matrix functions is the exponential

At - ~ (At)k
e - LJ k! .
k=O

Numerous algorithms for computing eAt have been proposed, but most of
them are of dubious numerical quality, as is pointed out in the survey article
by Moler and Van Loan (1978). In order to illustrate what the computa-
tional difficulties are, we present a "scaling and squaring" method based
upon Pade approximation. A brief analysis of the method follows that in-
volves some eAt perturbation theory and comments about the shortcomings
of eigenanalysis in settings where non-normality prevails.

11.3.1 A Pade Approximation Method


Following the discussion in §11.2, if g(z) ;::: ez, then g(A) ::::: eA. A very
useful class of approximants for this purpose are the Pade functions defined
by

where
~ (p+q-k)!p! k
= 8,(p+q)!k!(p-k)!z

and
~ (p+q-k)!q! k
Dpq(z) = LJ(p+q)!k!(q-k)!(-z) ·
k=O
Notice that Rp 0 (z) = 1 + z + · · · + zP /p! is the pth order Taylor polynomial.
11.3. THE MATRIX EXPONENTIAL 573

Unfortunately, the Pad~ approximant& are good only near the origin, as
the following identity reveals:

eA = R,..,(A)+ / -l};,AJ1+9+l D,..,(A)- 1 {' uP(l-u)9eA(l - u)du. (11.3.1}


p+ q · lo
However, this problem can be overcome by exploiting the fact that eA =
(eA/m)m. In particular, we can scale A by m such that Fpq=: R,..,(A/m)
is a suitably accurate approximation to eA/m. We then compute F;,: using
Algorithm 11.2.2. If m is a power of two, then this amounts to repeated
squaring and so is very efficient. The success of the overall procedure de-
pends on the accuracy of the approximant

In Moler and Van Loan (1978) it is shown that if

II A lloo < ~
2i - 2'
then there exists an E E Rnxn such that

Fpq = eA+E
AE = EA
IIE IIoo ~ e(p,q)lj A lloo
e(p,q) z3-(p+~) p!ql
=
(p + q)!(p + q + 1)! .

These results form the basis of an effective eA procedure wit h error control.
Using the above formulae lt is easy to establish the inequality:

II eA- Fpq lloo < E(p q)U A II00 eE(p,q)ll A lloo


II eA lloo - ' •

Tbe parameters p and q can be determined according to some relative


error tolerance. Note that since Fpq requires about j + ma.x(p, q) matrix
multiplies it makes sense to set p = q as this choice minimizes E(p,q) for a
given amount of work. Encapsulating these ideas we obtain

Algorithm 11.3.1 Given 6 > 0 and A E R"xn, the following algorithm


computes F = eA+E where II E lloo $ 611 A lloo
j = ma.x(O, 1 + ftoor(log2(ll A lloo)))
A = Aj2i
Let q be the smallest non-negative integer such that E(q, q) $; 6.
574 CHAPTER. 11. FuNCTIONS OF MATRICES

D =I; N = I; X = I; c = 1
fork = l:q
c= c(q - k + 1)/[(2q -k+ l)k)
X=AX; N = N+cX; D=D+ (-l)"cX
end
Solve DF = N for F using Gaussian elimination.
fork= l :j
F=FJ
end
This algorithm requires about 2(q + j + 1/ 3)n3 flops. The roundoff error
properties of ba.ve essentially been analyzed by Ward (1977).
The special Horner techniques of §11.2 can be applied to quicken the
computation of D = D9q(A) and N = N 9 q(A). For example, if q = 8 we
have Nqq(A) = U + AV and Dqq(A) = U- AV where

U = eoi + ~Az + (~I+ ~A2 + eeA4 )A4

and
V = ctl + caAz + (Cf.l + 0rA2 )A4 •
Clearly, N and D can be found in 5 matrix multiplies rather than the 7
required by Algorithm 11.3.1.

11.3.2 Perturbation Theory


Is Algorithm 11.3.1 stable in the presence of roWldoff error? To answer this
question we need to understand the sensitivity of the matrix exponential to
perturbations in A. The starting point in the discussion is the initial value
problem
X(t) = AX(t) X(O) =I
where A, X(t) e Rnxn. This has the unique solution X(t) = eAt , a char-
acterization of the matrix exponential that can be used to establish the
identity
e<A+E)t _eAt = 1t eA(t-1) Ee<A+E )•ds .

From this it follows that

II e(A+E)t -
II ?t
eAt liz
1! 2
< II E
-
liz
II eAt 112
r II
lo
eA(t-s) II 2 II e(A+E)I II 2 ds

Further simplifications result if we bound the norms of the exponentials


that appear in the integrand. One way of doing this is through the Schur
decomposition. If QH AQ = diag(>.i) + N is the Schur decomposition of
A E C'x", then it can be shown that
(1 1.3.2)
11.3. THE MATRIX EXPONENTIAL 575

where
a( A) =max {Re(.\): .\ E .\(A)} (11.3.3)
and
Ms(t) = ~ II~~ II~ .
A:~o

The quantity a( A) is called the spectml abscissa and with a little manipu-
lation it can be shown that
II e<A+E)t eAt I2 2
II eAtliz :S tilE llzMs(t) exp(tMs(t)ll E ll2).
Notice that Ms(t) = 1 if and only if A is normal, suggesting that the matrix
exponential problem is "well behaved" if A is normal. This observation
is confirmed by the behavior of the matrix exponential condition number
v(A, t), defined by

v(A, t) =

This quantity, discussed in Van Loan (1977), measures the sensitivity of


the map A --+ eAt in that for a given t, there is a matrix E for which
II e<A+E)t -eAt ll2 II E 112
II eAt ll2 "" v(A, t) II A ll2 ·
Thus, if v(A, t) is large, small changes in A can induce relatively large
changes in eAt. Unfortunately, it is difficult to characterize precisely those
A for which v(A, t) is large. (This is in contrast to the linear equation
problem Ax = b, where the ill-conditioned A are neatly described in terms
of SVD.) One thing we can say, however, is that v(A, t) 2: til A 112, with
equality holding for all non-negative t if and only if A is normal.
Dwelling a little more on the effect of non-normality, we know from the
analysis of §11.2 that approximating eAt involves more than just approxi-
mating ezt on .\(A). Another clue that eigenvalues do not ''tell the whole
story" in the eAt problem has to do with the inability of the spectral ab-
scissa (11.3.3) to predict the size of II eAt 11 2 as a function of time. If A is
normal, then
(11.3.4)
Thus, there is uniform decay if the eigenvalues of A are in the open left half
plane. But if A is non-normal, then eAt can grow before decay "sets in."
The 2-by-2 example

A= [ -~ ~] # eAt_
- e-t [ 1 tM ]
0 1

plainly illustrates this point.


576 CHAPTER 11. FUNCTIONS OF MATRICES

11.3.3 Some Stability Issues


With this discUBSion we are ready to begin thinking about the stability of
Algorithm 11.3.1. A potential difficulty arises during the squaring process
if A is a matrix whose exponential grows before it decays. If

then it can be shown that rounding errors of order


'Y =nil G21i2ll G4 1i2ll G8 1i2 ···II G2;-• !12
can be expected to contaminate the computed G2;. If II eAt 1\2 has a sub-
stantial initial growth, then it may be the case that
'Y » nil G~ ll2 "=' nl\ eA 1\2
thus ruling out the possibility of small relative errors.
If A is normal, then so is the matrix G and therefore II am
112 = II G 1\:;'
for all positive integers m. Thus, 'Y ""' nil ~ !12 ~ nl\ eA ll2 and so the
initial growth problems disappear. The algorithm can essentially be guar-
anteed to produce small relative error when A is normal. On the other
hand, it is more difficult to draw conclusions about the method when A is
non-normal because the connection between v(A, t) and the initial growth
phenomena is unclear. However, numerical experiments suggest that Algo-
rithm 11.3.1 fails to produce a relatively accurate eA only when v(A, 1) is
correspondingly large.

11.3.4 Eigenvalues and Pseudo-Eigenvalues


We closed §7.1 with a comment that the eigenvalues of a matrix are gen-
erally not good "informers" when it comes to measuring nearness to sin-
gularity, unless the matrix is normal. It is the singular values that shed
light on Ax = b sensitivity. Our discussion of the matrix exponential is
another warning to the same effect. The spectrum of a non-notmal A does
not completely describe cAt behavior.
In many applications, the eigenvalues of a matrix "say something" about
an underlying phenomenon that is being modeled. If the eigenvalues are
extremely sensitive to perturbation, then what they say can be misleading.
This has prompted the development of the idea of pseudospectra. For f ~ 0,
the E-pseudospectrum of a matrix A is a subset of the complex plane defined
by
(11.3.5)

Qualitatively, z is a pseudo-eigenvalue of A if zl - A is sufficiently close to


singular. By convention we set .>.o(A) =.>.(A). Here are some pseudospectra
properties:
11.3. THE MATRIX EXPONENTIAL
577

1. If E1 :5 E2, then Ae 1 (A) ~ A•• (A).


2. Ae(A) = {z E (;: O'min(zl- A) :5 € }.
3. A.(A) = {z E (;: z E A(A +E), for some E with II E ll2 :5 € }.
Plotting the pseudospectra of a non-normal matrix A can provide insight
into behavior. Here "behavior" can mean anything from the mathematical
behavior of an iteration to solve Ax = b to the physical behavior predicted
by a model that involves A. See Higham and Trefethen (1993), Nachtigal,
Reddy, and Trefethen (1992), and Trefethen, Trefethen, Reddy, and Driscoll
(1993).

Problems

PU.S.l Show that e<A+B)t = eA•eBt for all t if and only if AB = BA. (Hint: Express
both sides as a power seri""' in t and compare the coefficient of t.)
PU.3.2 Suppose that A is skew-symmetric. Show that both eA and the (1,1) Pade
approximate Rn (A) are orthogonal. Are there any other values of p and q for which
Rpq(A) is orthogonal?
PU.S.S Show that if A is nonsingular, then there exists a matrix X such that A= eX.
Is X unique?
P11.3.4 Show that if

n n
then
TF
F 11 12 = Jnro· eAT 'PeA'dt.
Pll.3.5 Give an algorithm for computing eA when A = uvT, u, v E Rn.
P11.3.6 Suppose A E fl!'Xn and that v E Rn has unit 2-norm. Define the function
=
<f>(t) II eA•v 11~/2 and show that
¢,(t) $ i<(A)</>(t)
where J.<(A) = >.1((A + AT)/2). Conclude that II eAt ll2 $ e"(A)t where t 2: 0.
Pn.s. 7 Prove the three pseudospectra properti""' given in the text.

Notes and References for Sec. 11.3


Much of what appears in this section and an extensive bibliography may be found in the
following survey article:

C.B. Moler and C.F. Van Loan (1978). "Nineteen Dubious Ways to Compute the Expo-
nential of a Matrix," SIAM Relliew fO, 801-36.
Scaling and squaring with Pad<\ approximant• (Algorithm 11.3.1) and a careful imple-
mentation of Parlett's Schur decomposition method (Algorithm 11.1.1) were found to be
among the less dubious of the nineteen methods scrutinized. Various aspects of Pade
578 CHAPTER 11. FUNCTIONS OF MATRICES

approximation of the matrix exponential a.re discussed in

W. Fair and Y. Luke (1970). "Pac!e Approximations to the Operator Exponential,"


Numer. Math. 14, 379-82.
C.F. Van Loan (1977). "On the Limitation and Application of Pade Approximation to
the Matrix Exponential," in Pade and Rational Approximation, ed. E.B. Sal! and
R.S. Varga, Academic Press, New York.
R.C. Ward (1977). "Numerical Computation of the Matrix Exponential with Accuracy
Estimate," SIAM J. Num. Anal. 14, 600--14.
A. Wragg (1973). "Computation of the Exponential of a Matrix I: Theoretical Consid-
erations," J. Inst. Math. Applic. 11, 369-75.
A. Wragg (1975). "Computation of the Exponential of a Matrix II: Practical Consider-
ations," J. Inst. Math. Applic. 15, 273-78.
A proof of equation (11.3.1) for the scalar case appears in

R.S. Varga (1961). "On Higher-Order Stable Implicit Methods for Solving Parabolic
Partial Differential Equations," J. Math. Phys. 40, 22Q-31.
There are many applications in control theory calling for the computation of the ma-
trix exponential. In the linear optimal regular problem, for example, various integrals
involving the matrix exponential are required. See

J. Johnson and C.L. Phillips (1971). "An Algorithm for the Computation of the Integral
of the State Transition Matrix," IEEE 1rons. Auto. Cont. AC-16, 204-5.
C.F. Van Loan (1978). "Computing Integrals Involving the Matrix Exponential," IEEE
1rons. Auto. Cont. AC-23, 395-404.
An understanding of the map A --> exp(At) and its sensitivity is helpful when assessing
the performance of algorithms for computing the matrix exponential. Work in this di-
rection includes

B. Kagstrom (1977). "Bounds and Perturbation Bounds for the Matrix Exponential,"
BIT 17, 39-57.
C.F. Van Loan (1977). "The Sensitivity of the Matrix Exponential," SIAM J. Num.
Anal. 14, 971-81.
R. Mathias (1992). "Evaluating the Frechet Derivative of the Matrix Exponential,"
Numer. Math. 63, 213-226.
The computation of a logarithm of a matrix is an important area demanding much more
work. These calculations arise in various 11system identification" problems. See

B. Singer and S. Spilerman (1976). "The Representation of Social Processes by Markov


Models," Amer. J. Sociology 82, 1-54.
B.W. Helton (1968). "Logarithms of Matrices," Proc. A mer. Math. Soc. 19, 733-36.
For pointers into the paeudospectra literature we recommend

L.N. Trefethen (1992). "Pseudospecta of Matric,..," in Numerical Analysis 1991, D. F.


Griffiths and G.A. Watson (eds), Longman Scientific and Technical, Harlow, Essex,
UK, 234-262.
D.J. Highann and L.N. Trefethen (1993). "Stiffness of ODES," BIT 33, 285-303.
L.N. Trefethen, A.E. Trefethen, S.C. Reddy, and T.A. Driscoll (1993). "Hydrodynamic
Stability Without Eigenvalu... ," Science 261, 578-584.
as well as Chaitin-Chatelin and Fraysse (1996, chapter 10).
Chapter 12

Special Topics

§12.1 Constrained Least Squares


§12.2 Subset Selection Using the SVD
§12.3 Total Least Squares
§12.4 Computing Subspaces with the SVD
§12.5 Updating Matrix Factorizations
§12.6 Modified/Structured Eigenproblems

In this final chapter we discuss an assortment of problems that repre-


sent important applications of the singular value, QR, and Schur decompo-
sitions. We first consider least squares minimization with constraints. Two
types of constraints are considered in §12.1, quadratic inequality and linear
equality. The next two sections are also concerned with variations on the
standard LS problem. In §12.2 we consider how the vector of observations
b might be approximated by some subset of A's columns, a course of action
that is sometimes appropriate if A is rank-deficient. In §12.3 we consider
a variation of ordinary regression known as total least squares that has
appeal when A is contaminated with error. More applications of the SVD
are considered in §12.4, where various subspace calculations are considered.
In §12.5 we investigate the updating of orthogonal factorizations when the
matrix A undergoes a low-rank perturbation. Some variations of the basic
eigenvalue problem are discussed in §12.6.
Before You Begin
Because of the topical nature of this chapter, it doesn't make sense to
have a chapter-wide, before-you-begin advisory. Instead, each section will
begin with pointers to earlier portions of the book, and, if appropriate,
pointers to LAPACK and other texts.

579
580 CHAPTER 12. SPECIAL TOPICS

12.1 Constrained Least Squares


In the least squares setting it is sometimes natural to minimize II Ax - b liz
over a proper subset of R". For example, we may wish to predict bas best
we can with Ax subject to the constraint thatx is a unit vector. Or, perhaps
the solution defines a fitting function f (t ) which is to have prescribed values
at a finite number of points. This can lead to an equality constrained least
squares problem. In this section we show how these problems can be solved
using the QR factorization and the SVD.
Chapter 5 and §8. 7 should be understood before reading this section.
LAPACK connections include:

LAPACK: Tools for Generalized/Constrained LS Problems


_GCLSE Sotves the equality constrained LS problem
_GGQRF Computes tbe generalized QR factorization of a matrix pair
_CCR.QF Computes the generalized RQ factorization of a matrix pair
_CGSVP Converts the GSVD problem to t riangular form
_ TGSJA Computes the GSVD of a pair of triangular matrices

Complementary references include Lawson a.nd Hanson (1974) a.nd Bjorck


(1996).

12.1.1 The Problem LSQI


Least squares minimization with a quadratic inequality constraint- the
LSQI problem- is a technique that can h@ used whenever the solution to
the ordinary LS problem needs to be regularized. A simple I.SQI problem
that arises when attempting to fit a function to noisy data is

minimize II Ax - b 11 2 subject to II Bx 11 2 ~ a (12.1.1)

where A E IR"'x", bERm, BE Rnxn (nonsingular), a.nd o;::: 0. The con-


straint defiMs a hyperellipsoid in R" and is usually chosen to damp out
excessive oscillation in the fitting function. This can be done, for example,
if B is a discretized second derivative operator.
More generally, we have the problem

minimize II Ax - b 11 2 subject to II Bx - d lb ~ o (12.1.2)

where A E m.mxn (m;::: n), bERm, BE R"x", dE JRP, and o;::: 0. The
generalized singular value decomposition of §8.7.3 sheds light on the solv-
ability of (12.1.2). Indeed, if

lJT AX = diag(o., .. . ,an)


(12.1.3)
vTv =I,., q = min{p,n}
12.1. CONSTRAINED LEAST SQUARES 581

is the generalized singular value decomposition of A and B, then (12.1.2)


transforms to

II DAY- b !b subject to II Dsy- d!b ~a


minimize

where b = uTb, d = vT d, andy= x- 1x. The simple form of the objective


function

(12.1.4)
i=n+l

and the constraint equation


r P
II Dsy- dll~ = ~),6;y;- d;) 2 + L cl,2 ~ a 2 (12.1.5)
i==l i=r+l

facilitate the analysis of the LSQI problem. Here, r = rank(B) and we


assume that /3r+t = · · · = /Jq = 0.
To begin with, the problem has a solution if and only if
p

L ell~ a
2

i=r+l

If we have equality in this expression then consideration of (12.1.4) a.nd


(12.1.5) shows that the vector defined by

i = 1:r
i=r+1:n,a;f0 (12.1.6)
i = r + 1:n,a; = 0
solves the LSQI problem. Otherwise

(12.1.7)
i=r+l

and we have more alternatives to pursue. The vector y E m.n, defined by

y· _ { bj/ai a; f 0 i = 1:n
• - d;/,6; Cti =0
is a minimizer of II DAY - b lb· If this vector is also feasible, then we have
a solution to (12.1.2). (This is not necessarily the solution of minimum
2-norm, however.) We therefore assume that

t (/); !i. -J..)2 +


i=l '
(12.1.8)
adO
582 CHAPTER 12. SPECIAL TOPICS

This implies that the solution to the LSQI problem occurs on the boundary
of the feasible set. Thus, our remaining goal is to

minimize II DAY- b 11 2 subject to II DBY - d 11 2 = o.

To solve this problem, we use the method of Lagrange multipliers. Defining

we see that the equations 0 = Bh/By; , i = 1:n, lead to the linear system

Assuming that the matrix of coefficients is nonsingular, this has a solution


y(.A) where
i = 1:q
i = q + 1:n
To determine the Lagrange parameter we define,

and seek a solution to ¢(.A) = o 2 • Equations of this type are referred to as


secular equations and we encountered them earlier in §8.5.3. From (12.1.8)
we see that ¢(0) > a 2 • Now ¢(.A) is monotone decreasing for .A > 0, and
(12.1.8) therefore implies the existence of a unique positive .A* for which
¢(.A•) = o 2 • It is easy to show that this is the desired root. It can be
found through the application of any standard root-finding technique, such
as Newton's method. The solution of the original LSQI problem is then
X= Xy(.A*).

12.1.2 LS Minimization Over a Sphere


For the important case of minimization over a sphere (B = In, d = 0), we
have the following procedure:

Algorithm 12.1.1 Given A E IRmxn with m ~ n, bE IRm, and o > 0,


the following algorithm computes a vector x E IRn such that II Ax- b 11 2 is
minimum, subject to the constraint that II x lb :"': o.
Compute the SVD A = UEVT ,save V = [ v 1 , ••• , Vn ] , and
form b = urb.
r = rank(A)
12.1. CONSTRAINED LEAST SQUARES 583

else

x= Lr (b·)
~ Vt
i=l t1 t
end

The SVD is the dominant computation in this algorithm.


Example 12.1.1 The secular equation for the problem

is given by
2 2
(A!4) + (A:1) = l.
Fbr this problem we find A• = 4.57132 and o: = [.93334 .35898JT.

12.1.3 Ridge Regression


The problem solved by Algorithm 12.1.1 i.s equivalent to the Lagrange mul-
tiplier problem of determining A> 0 such that
(12.1.9)
and II x Jb = o. This equation i.s precisely the normal equation formulation
for the ridge regression problem

min J Ax - b II~ +Ail x II~ .


X

In the general ridge regression problem one has some criteria for selecting
the ridge parameter A, e.g., II x(A) Jb = a for some given o. We describe a
A-selection procedure that is discussed in Golub, Heath, and Wahba (1979).
Set Dk = I- ekef = diag(1, ... , 1, 0, 1, ... , 1) E lR.mxmand let xk(A)
solve

min II D,.(Ax -b) II~ +Ail X II~ . (12.1.10)


X
584 CHAPTER 12. SPECIAL TOPICS

Thus, Xk(A) is the solution to the ridge regression problem with the kth row
of A and kth component of b deleted, i.e., the kth experiment is ignored.
Now consider choosing A so as to minimize the cross-validation weighted
square error C(A) defined by

Here, W1, ..• ,Wm are non-negative weights and


Noting that '
ar is the kth row of A.

2
we see that (afxk(A)- bk) is the increase in the sum of squares result-
ing when the kth row is "reinstated." Minimizing C(A) is tantamount to
choosing A such that the final model is not overly dependent on any one
experiment.
A more rigorous analysis can make this statement precise and also sug-
gest a method for minimizing C(A). Assuming that A > 0, an algebraic
manipulation shows that

') ( ') a'[ x(A)- bk


Xk (A = X A + T
1 - zkak
Zk (12.1.11)

where Zk = (AT A+ AI)- 1ak and x(A) = (AT A+ AI)- 1AT b. Applying
-af to (12.1.11) and then adding bk to each side of the resulting equation
gives

T ef(I- A(AT A+ AI)- 1AT)b


bk- akxk(A) = e'f(I- A( AT A+ AJ)-!AT)ek. (12.1.12)

Noting that the residual r = {rt, ... ,rm)T = b- Ax(A) is given by the
formula r = [J- A( AT A+ AJ)- 1 AT]b, we see that

1
C(A) = m LWk
m (
8r 7ab )2
k=! k k

The quotient rk/(8rk/8bk) may be regarded as an inverse measure of the


"impact" of the kth observation bk on the model. When 8rk/8h is small,
this says that the error in the model's prediction of bk is somewhat inde-
pendent of bk. The tendency for this to be true is lessened by basing the
model on the A" that minimizes C(A).
The actual determination of A" is simplified by computing the SVD of
A. Indeed, if UT AV = diag(a 1 , .•. ,an) with a 1 ~ ••. ~an and b = UTb,
12.1. CONSTRAINED LEAST SQUARES 585

t hen it can be shown from {12.1.12) that

C(>.)
1- t n~i
i= l
( tJ+ .,)
q) ..
The minimization of this expression is discussed in Golub, Heath, and
Wahba (1979).

12.1.4 Equality Constrained Least Squares


We conclude the section by considering the least squares problem with
linear equality constraints:

min II Ax- bll 2 {12.1.13)


Bx=d

Here A E Rmxn., B E R"x", b E Rm, dE lR!', and rank( B ) = p. We refer


to {12.1.13) as the LSE problem. By setting a = 0 in (12.1.2) we see
that the LSE problem is a special case of the LSQI problem. Hov.-ever,
it is simpler to approach the LSE problem directly rather than through
Lagrange multipliers.
Assume for clarity that both A and B have full rank. Let

p
n-p
be the QR factorization of BT and set

p
AQ :;: [ A t
p n-p

It is clear that with these transformations (12.1.13) becomes

min II AtY + Azz- b 1! 2 .


Rry= d

Thus, y is determined from the constraint equation RTy :;: d and the vector
z is obtained by solving the unconstrained LS problem

min II Az z- (b - AtY) lb·


~
586 CHAPTER 12. SPECIAL TOPICS

Combining the above, we see that x = Q [ ~ ] solves (12.1.13).

Algorithm 12.1.2 Suppose A E IRmxn, BE JR!'x", bE IRm, and dE JR!'.


If rank(A) = n and rank(B) = p, then the following algorithm minimizes
II Ax- b 11 2 subject to the constraint Bx =d.

BT = QR ( QR factorization)
Solve R(1:p, 1:p)T y = d for y.
A=AQ
Find z so II A(:,p + 1:n)z- (b- A(:, 1:p)y) 11 2 is minimized.
x = Q(:, 1:p)y + Q(:,p + 1:n)z
Note that this approach to the LSE problem involves two factorizations and
a matrix multiplication.

12.1.5 The Method of Weighting


An interesting way to obtain an approximate solution to (12.1.13) is to
solve the unconstrained LS problem

(12.1.14)

for large >.. The generalized singular value decomposition of §8. 7.3 sheds
light on the quality of the approximation. Let
ur AX diag(aJ, ... ,an)= DA E IRmxn
VTBX = diag(/h, ... ,,6p)=DBEJRI'x"
be the GSVD of (A, B) and assume that both matrices have full rank for
clarity. If U = [ UJ, ... , Um ], V = [ v 1 , .•. , Vp ], and X = [ x1, ... , Xn], then
it is easy to show that
P vTd n uTb
X = L i=l
~-
,._,,
X; + L
i=p+I
~- 1
X; (12.1.15)

is the exact solution to (12.1.13), while


P Tb ).2 a2 T d " Tb
>.) = """' a;u; + ~x,·
x(
!Ji V; . """' (12.1.16)
~ a2 + ).2,6~ x, + ~ a·
i=l 1, 1, i=p+l 1,

solves (12.1.14). Since

x(>.) - x (12.1.17)
12.1. CONSTRAINED L EAST SQUARES 587

it follows that x(~} -1 x as~ -1 oo.


The appeal of this approach to the LSE problem Is that no special sub-
routines are required: an ordinary LS solver will do. However, for large
values of A numerical problems can arise and it is necessary to take precau-
tions. See Powell and Reid (1968} and Van Loan (1982a).

Example 12.1.2 The problem

hea aolution z = (.3407821 , .3407821JT. Thie can be approximated by solving

mlD.[ 5 ~ 6 ~][:1:]]
:z::a -[i]
3
1000 -1000 0 2

which baa soluilon z - [.340'7810, .340'7829)T.

Problema

P 12.1.1 {a) Show that if nuii(A) n null( B) ~ {0}, then (12.1.2) canncrt ha~ a unique
solution. (b) Give an example which shows that the converae is not true. (Hint: A+b
feasible.)
P12.1.2 Let Po(:r), ... ,p,.(:r) be given polynomlu and (:ro, flo), ... , (:r.n , y.,.) a given
set of coordinate pairs with :r,
E (a, b). It Is desired to find a polynomial p(:r) =
:L;-o Oi<PA:(:r) such that :L:.O(p{:r,)-";)
2 Is minimized subject to the constraint that

b N 2

l0
IP'' (:r)):ad:r Rj h ~ (p(zo- 1)- 2p~;') + P(Zi+l) ) ~ Q2

where zo = a+ ih and b =a+ N h. Show that thie leade to an LSQI problem of the form
(12.1.1).
P12.1.S Suppo&e Y = [111 , .. . ,1/, J E R" x• has lhe properly that
yTy = diag(4, .. . ,4) d1 ~ d:a ~ .. · ~ d~o > 0.
Show that if Y = QR is the QR factorization of Y , then R is diagonal with Jr;tJ = d;.
P12.1.4 (a) Show that if (AT A + >.I):r • ATb, ~ > 0, and ft lb = a , then z = :r
(A:r- b)/ A solves the dv.al equatiDnB (AAT + Al)z = - b with II AT z 11 2 a a. (b) Show
that if(AAT +Al)z = - b, R ATzlb =a, then :r = - A7'zaatisfiel (ATA + AT):r = ATb,
U :r Y2 = or.
P12.1.5 Suppose A Is the ~by-1 matrix of ones and let b E R"'. Show that the
cross-validation technique with unit weights preseribes at1 optimal A given by

...
where or= (bl +.. . + bm.)/m and 8 = L:<b·- '>2 / (m- 1).
·-1
588 CHAPTER 12. SPECIAL TOPICS

P12.1.6 Establish equations (12.1.15), (12.1.16), and (12.1.17).


P12.1. 7 Develop an SVD version of Algorithm 12.1.2 that can handle rank deficiency
in A and B.
P12.1.8 Suppose

A = [ ~~ ]
where A1 E E'xn is nonsingular and A2 E R(m-n)xn. Show that

D"m;n(A) ~ ..j1 + D"min(A2A 1 1 )2 D"min(AJ) ·


P12.1.9 Consider the problem

min II Ax- bll,


~TB~=/32
:r:TC:r:='l2

Asaume that B and 0 are positive definite and that Z E Rn.xn is a. nollBingular matrix
with the property that zTBz = diag(.I.Jo····.l.n) and zTcz =I•. Assume that
>.1 ~ · · · ~ >. •. (a) Show that the the set of feasible x is empty unless .l.n -:5, {P h 2 -:5, >.1.
(b) Using Z, show how the two constraint problem can be converted to a single constraint
problem of the form
min II Ax- b lb
yTWy=./32-An'l2

where W = diag(.l.1, ... , An) -Ani.


P12.1.10 Suppose p ~ m ~ n and that A E R"'xn and BE Jl."'XP Show how to
compute orthogonal Q E Rm x m and orthogonal V E R" x" so that

where R E E'xn and S E R"'xm are upper triangular.


P12.1.11 Suppose r E Rm, y E R", and~> 0. Show how to solve the problem

min II Ey- r II,


E E R"'Xn
II E llp9
Repeat with "min" replaced by "max".

Notes and References for Sec. 12.1

Roughly speaking, regularization is a technique for transforming a poorly conditioned


problem into a stable one. Quadratically constrained least squares is an important ex·
ample. See

L. Elden (1977). "Algorithms for the Regularization of Ill-Conditioned Least Squares


Problems," BIT 11, 134-45.
References for cross-validation include

G.H. Golub, M. Heath, and G. Wahba (1979). "Generalized Cross-Validation as a


Method for Choosing a Good Ridge Parameter," Technometrics 21, 215--23.
L. Elden (1985). "A Note on the Computation of the Generalized Cross-Validation
Function for Ill-Conditioned Least Squares Problems," BIT 24, 467-472.
12.1. CONSTRAINED LEAST SQUARES 589

The LSQI problem is discussed in

G.E. Forsythe and G.H. Golub (1965). "On the Stationary Values of a Second-Degree
Polynomial on the Unit Sphere," SIAM 1. App. Math. 14, 1051Hl8.
L. Elden (1980). "Perturbation Theory for the Least Squar.., Problem with Linear
Equality Constraints," SIAM 1. Num. Anal. 17, 338-50.
W. Gander (1981). "Lea.st Squares with a Quadratic Constraint," Numer. Math. 36,
291-307.
L. Elden (1983). "A Weighted Pseudoinverse, Generalized Singular Values, and Con-
strained Least Squares Problems," BIT llll, 487-502.
G.W. Stewart (1984). "On the Asymptotic Behavior of Scaled Singular Value and QR
Decompositions," Math. Comp. 43, 483-490.
G.H. Golub and U. von Matt (1991). "Quadratically Constrained Lea.st Square~ and
Quadratic Problems," Numer. Math. 59, 561-580.
T.F. Chan, J.A. Olkin, and D. Cooley (1992). "Solving Quadratically Constrained Least
Squares Using Black Box Solvers," BIT 32, 481-495.
Other computational aspects of the LSQI problem involve updating and the handling of
banded and sparse problems. See

K. Schittkowski and J. Stoer (1979). "A Factorization Method for the Solution of Con-
strained Linear Le118t Squares Problems Allowing for Subsequent Data changes,"
Numer. Math. 31, 431-463.
D.P. O'Leary and J.A. Simmons (1981). "A Bidiagona)ization-Regulacization Procedure
for Large Scale Discretizations oflli-Pnsed Problems," SIAM 1. Sci. and Stat. Camp.
2, 474-489.
A. Bjorck (1984). "A General Updating Algorithm for Constrained Linear Least SquSre~
Problems," SIAM 1. Sci. and Stat. Comp. 5, 394-402.
L. Elden (1984). "An Algorithm for the Regularization of Ill-Conditioned, Banded Least
SquSre~ Problems," SIAM 1. Sci. and Stat. Camp. 5, 237-254.

Various aspects of the LSE problem are discussed and analyzed in

M.J.D. Powell and J.K. Reid (1968). "On Applying Householder's Method to Linear
Least Squares Problems," Proc. IFIP Congre••, pp. 122-26.
C. Van Loan (1985). "On the Method of Weighting for Equality Constrained Least
Squares Problems," SIAM 1. Numer. Anal. llll, 851-864.
J.L. Barlow, N.K. Nichols, and R.J. Plemmons (1988). "'terative Methods for Equality
Constrained Least SquSre~ Problems," SIAM J. Sci. and Stat. Comp. 9, 892-906.
J.L. Barlow (1988). "Error Analysis and Implementation Aspects of Deferred Correction
for Equality Constrained Least-Squares Problems," SIAM 1. Num. Anal. 25, 134(}-
1358.
J.L. Barlow and S.L. Handy (1988). "The Direct Solution of Weighted and Equality
Constrained Least-Squares Problems," SIAM 1. Sci. Stat. Comp. 9, 704-716.
J.L. Barlow and U.B. Vemulapati (1992). "A Note on Deferred Correction for Equality
Constrained Least Square~ Problems," SIAM 1. Num. Anal. 29, 249--256.
M. Wei (1992). "Perturbation Theory for the Rank-Deficient Equality Constrained Least
Squar.., Problem," SIAM 1. Num. Anal. 29, 1462-1481.
M. Wei (1992). "Algebraic Properties of the Rank-Deficient Equality-Constrained and
Weighted Least Squac.., Problems," Lin. Alg. and Its Applic. 161, 27-44.
M. Gulli.ksson and P-A. Wedin (1992). "Modifying the QR-Decomposition to Con-
strained and Weighted Linear Least Squares," SIAM 1. Matriz Anal. Appl. 13,
1298-1313.
A. Bjorck and C.C. Paige (1994). "Solution of Augmented Linear Systems Using Or-
thogonal Factorizations," BIT 34, 1-24.
M. Gulliksson (1994). "Iterative Refinement for Constrained and Weighted Linear Least
Squares," BIT 34, 239--253.
590 CHAPTER 12. SPECIAL TOPICS

M. Gulliksson (1995). "Backward Error Analysis for the Constrained and Weighted
Linear Least Squares Problem When Using the Weighted QR Factorization," SIAM
J. Matrix. Anal. Appl. 18, 675--£87.
Generalized factorizations have a.n important bearing on generalized least squares prob-
lems.

C. C. Paige (1985). "The General Linear Model and the Generalized Singular Value
De<:omposition," Lin. Alg. and Its Applic. 70, 269-284.
C.C. Paige (1990). "Some Aspects of Generalized QR Factorization," in Reliable Nu-
merical Computations, M. Cox and S. Hammarling (eds), Clarendon Press, Oxford.
E. Anderson, Z. Bai, and J. Dongarra (1992). "Generalized QR Factorization and Its
Applications," Lin. Alg. and Its Applic. 162/163/164, 243-271.

12.2 Subset Selection Using the SVD


As described in §5.5, the rank-deficient LS problem min II Ax- b ll2 can be
approached by approximating the minimum norm solution
r Tb
X£s = L~Vi r =rank( A)
i=l O"i

with
r Tb
X; L..,
" " U;--v;
i=l O'i
where
r
A = u~vT = Ea;u;vf (12.2.1)
i=l
is the SVD of A and i' is some numerically determined estimate of r. Note
that x; minimizes II A,x - b 11 2 where
;
A, = L:a;u;vf
i=l

is the closest matrix to A that has rank f. See Theorem 2.5.3.


Replacing A by A; in the LS problem amounts to filtering the small
singular values and can make a great deal of sense in those situations where
A is derived from noisy data. In other applications, however, rank deficiency
implies redundancy among the factors that comprise the underlying model.
In this case, the model-builder may not be interested in a predictor such
as A;x, that involves all n redundant factors. Instead, a predictor Ay may
be sought where y has at most i' nonzero components. The position of the
nonzero entries determines which columns of A, i.e., which factors in the
model, are to be used in approximating the observation vector b. How to
pick these columns is the problem of subset selection and is the subject of
this section.
The contents of this section depends heavily upon §2.6 and Chapter 5.
12.2. SUBSET SELECTION USING THE SVD 591

12.2.1 QR with Column Pivoting


QR with column pivoting can be regarded as a method for selecting an
independent subset of A's columns from which b might be predicted. Sup-
pose we apply Algorithm 5.4.1 to A E lRmxn and compute an orthogonal
Q and a permutation II such that R = QT AII is upper triangular. If
R{1:f, 1:f}z = b(1:f} where ii = QTb and we set

then Ay is an approximate LS predictor of b that involves the first f columns


of Ail.

12.2.2 Using the SVD


Although QR with column pivoting is a fairly reliable way to handle near
rank deficiency, the SVD is sometimes preferable for reasons discussed in
§5.5. We therefore describe an SVD-based subset selection procedure due
to Golub, Klema, and Stewart {I976) that proceeds as follows:
• Compute the SVD A = UEVT and use it to determine a rank estimate
f.

• Calculate a permutation matrix P such that the columns of the matrix


B 1 E IRmxf in AP = [ B1 B2 J are "sufficiently independent."

• Predict b with the vector Ay where y = P [ ~ ] and z E m_r minimizes


II BIZ- b 112
The second step is key. Since

min II Ax- b ll2


X E JRn

it can be argued that the permutation P should be chosen to make the


residual {I- B 1Bt}b as small as possible. Unfortunately, such a solution
procedure can be unstable. For example, if

A=
I 1
Il+E
[ 0 0
0
I,
I
l b = [ -i l,
f = 2, and P = I, then min II B1z- b ll2 = 0, but II Bib ll2 = O{Ijf}.
On the other hand, any proper subset involving the third column of A is
strongly independent but renders a much worse residual.
592 CHAPTER 12. SPECIAL TOPICS

This example shows that there can be a trade-off between the indepen-
dence of the chosen columns and the norm of the residual that they render.
How to proceed in the face of this trade-off requires additional mathemati-
cal machinery in the form of useful bounds on ar(B 1 ), the smallest singular
value of B1.
Theorem 12.2.1 Let the SVD of A E Rmxn be given by {12.2.1}, and
define the matrix Bt E Rmxr, i' ~rank( A), by
AP = [ Bt B2 I
f n- i'
where P E Rnxn is a permutation. If

pry = [ fu ~12 ] i'


V21 V22 n-i' (12.2.2)
i' n-i'
and V11 is nonsingular, then
ar(A)
_ 1 ~ a;(Bt) ~ a;(A) .
II Vu ll2
Proof. The upper bound follows from the minimax characteri2ation of
singular values given in §8.6.1
To establish the lower bound, partition the diagonal matrix of singular
values as follows:

L:= L:t 0 ] i'


[ 0 L:2 m- i'
i' n-i'
If w E Rf is a unit vector with the property that II B 1w ll2 = ar(Bt), then

-T 2 -T 2
= IIL:tVuwll2 + IIL:2V12wlb ·
The theorem now follows because II L:1 V?;w 1 2 2: ar(A)/11 Vj[ 1 1 2. D

This result suggests that in the interest of obtaining a sufficiently indepen-


dent subset of columns, we choose the permutation P such that the result-
ing Vu submatrix is as well-conditioned as possible. A heuristic solution to
this problem can be obtained by computing the QR with column-pivoting
factori2ation of the matrix [ v;) V2') ] , where

Vu V12 ] i'
V= [ V21 V22 n-i'
i' n-i'
12.2. SuBSET SELECTION USING THE SVD 593

is a partitioning of the matrix V in (12.2.1). In particular, if we apply QR


with column pivoting (Algorithm 5.4.1) to compute

QT[V?; V2l ]P = [ Ru R12 ]


i' n- r

where Q is orthogonal, Pis a permutation matrix, and Rn is upper trian-


gular, then (12.2.2} implies:

[ ~~ J = pT [ ~: ] = [ ~E~~ J .
Note that Rn is nonsingular and that II Vjj 1 1\ 2 = II R]} lb· Heuristically,
column pivoting tends to produce a well-conditioned Rn, and so the overall
process tends to produce a well-conditoned Vn. Thus we obtain

Algorithm 12.2.1 Given A E Rmxn and b e!Rm the following algo-


rithm computes a permutation P, a rank estimate i', and a vector z E :or'
such that the first i' columns of B = AP are independent and such that
II B(:, 1:i'}z- b lb is minimized.
Compute the SVD UT AV = diag(al,, .. , an) and save V.
Determine f <::: rank(A).
Apply QR with column pivoting: QTV(:, l:r)''P = [Rn R 12 ] and set
AP = [B1 Bz] with B1 E IRmxr and Bz E IRmx(n-rJ.
Determine z E Rf such that II b- B1z lb = min.

Example 12.2.1 Let

A=
[
3
7
2
4
4
5
1.0001
-3.0002
2.9999
l ul'
,
b =
-1 4 5.0003
A is close to being rank 2 in the sense that u,(A) "".0001. Setting f = 2 in Algorithm
12.2.1 leads to x = [0 0.2360 - 0.0085]T with II Ax- b II, = .1966. (The permutation
Pis given by P = [e3 e2 e,].) Note that XLS = [828.1056 -8278569 828.0536]T
with minimum residual II AxLs - b II, = 0.0343.

12.2.3 More on Column Independence vs. Residual


We return to the discussion of the trade-off between column independence
and norm of the residual. In particular, to assess the above method of
subset selection, we need to examine the residual of the vector y that it
produces ry = b-Ay= b-B 1 z =(I -BtBi)b. Here, B 1 = B(:,1:i') with
594 CHAPTER 12. SPECIAL TOPICS

B = AP. To this end, it is appropriate to compare ry with r,, = b- Ax,


since we are regarding A as a rank-r matrix and since Xr solves the nearest
rank-r LS problem, namely, min II A,x- b 11 2 .
Theorem 12.2.2 lfrv and r,, are defined as above and ifVn is the leading
r-by-r principal submatrix of prv, then
ar+t(A) -_ 1
II r,,- ry ll2 :<::: ar(A) II Vu lbll b ll2
Proof. Note that r,, =(I- U1UJ')b and ry =(I- Q 1QT}b where

u = [ U1 u2 l
r m-r
is a partitioning of the matrix U in (12.2.1) and where Q 1 = B 1 (B'f B 1 )- 112 .
Using Theorem 2.6.1 we obtain

while Theorem 12.2.1 permits us to conclude that

<

Noting that

II r,, - ry 11 2 = IIBtY- t(ufb)u1


we see that Theorem 12.2.2 sheds light on how well B 1 y can predict the
"stable" component of b, i.e., U'fb. Any attempt to approximate U[b
can lead to a large norm solution. Moreover, the theorem says that if
af+ 1 (A) « ar(A), then any reasonably independent subset of columns
produces essentially the same-sized residual. On the other hand, if there
is no well-defined gap in the singular values, then the determination of f
becomes difficult and the entire subset selection problem more complicated.
Problems

P12.2.1 Suppose A E Rmxn and that II uTA 11 2 = " with uT u = l. Show that if
uT(Ax- b)= 0 for x ERn and bERm, then II x II,<': luTbi/CI.
P12.2.2 Show that if Bt E ff"Xk is comprised of k columns from A E Rmxn then
"k(Bt) :5 "k(A).
P12.2.3 In equation (12.2.2) we know that the matrix

pTy = ~12 ]
f
V22 n-f
r n-f
12.3. TOTAL LEAST SQUARES 595

v,-;;'
is orthogonal. Thus, II ii,"i' 112 =II cs
112 from the decomposition (Theorem 2.6.3).
Show how to compute P by applying the QR with column pivoting algorithm to [V,f. ii,~].
(For r > n/2, this procedure would be more economical than the technique discussed in
the text.) Incorporate this observation in Algorithm 12.2.1.

Notes and References for Sec. 12.2

The material in this section is derived from

G.H. Golub, V. Klema and G.W. Stewart (1976). "Rank Degeneracy and Least Squar"'
Problems," Technical Report TR-456, Department of Computer Science, University
of Maryland, College Park, MD.
A subset selection procedure based upon the total least squares fitting technique of §12.3
is given in

S. Van Huffel and J. Vandewalle (1987). "Subset Selection Using the Total Least Squares
Approach in Collinearity Problems with Errors in the Variables," IAn. Alg. and Its
Applic. 88/89, 695-714.
The literature on subset selection is vast and we refer the reader to

H. Hotelling (1957). "The Relations of the Newer Multivariate Statistical Methods to


Factor Analysis," Brit. J. Stat. Psych. 10, 69-79.

12.3 Total Least Squares


The problem of minimizing II D(Ax- b) lb where A E m.mxn, and D
diag(dJ. ... , dm) is nonsingular can be recast as follows:

min II Dr lb (12.3.1)
b+r E range(A)

In this problem, there is a tacit assumption that the errors are confined to
the "observation" b. When error is also present in the "data" A, then it
may be more natural to consider the problem

min IID[E,r]TIIF {12.3.2)


b+r E range(A+E)

where D = diag(dt, ... , dm) and T = diag(tt, ... , tn+J) are nonsingular.
This problem, discussed in Golub and Van Loan (1980), is referred to as
the total least squares (TLS) problem.
If a minimizing [Eo, ro] can be found for (12.3.2), then any x satisfying
(A+ Eo)x = b + ro is called a TLS solution. However, it should be realized

n• [! ],
that (12.3.2) may fail to have a solution altogether. For example, if

A = [ ~ b= D =Is. T = Is, and Ef = [ ~ : ]


596 CHAPTER 12. SPECIAL TOPICS

then for all f > 0, bE ran(A + Ef). However, there is no smallest value of
II [E, r]IIF for which b +r E ran(A+E).
A generalization of (12.3.2) results if we allow multiple right-hand sides.
In particular, if BE lRmxk, then we have the problem

min IID[E, R]TIIF (12.3.3)


range(B+R)!;; range(A+E)

where E E lRmxn and R E R"xk and the matrices D = diag(d~, ... ,dm)
and T = diag(it, ... , tn+k) are nonsingular. If [Eo, Ro] solves (12.3.3),
then any X E lRnxk that satisfies (A+ Eo)X = (B + Ro) is said to be a
TLS solution to (12.3.3).
In this section we discuss some of the mathematical properties of the
total least squares problem and show how it can be solved using the SVD.
Chapter 5 is the only prerequisite. A very detailed treatment of the TLS
problem is given in the monograph by Van Huffel and Vanderwalle (1991).

12.3.1 Mathematical Background


The following theorem gives conditions for the uniqueness and existence of
a TLS solution to the multiple right-hand side problem.
Theorem 12.3.1 Let A, B, D, andT be as above and assume m ~ n+k.
Let
C = D[ A, B )T = ( C1 C2]
n k
have SVD UTCV = diag(O'~, ... , O'n+k) = E where U, V, and E are parti-
tioned as follows:

u = 1 u1 U2] v = [ Vu
V21
vl2] n
V22 k
n k
n k

E= [~I 0 ]
E2
n
k
n k
If O',..(C!) > O'n+I(C), then the matrix [Eo, Ro] defined by

D[Eo, Ro]T = -U2E2[V1~, Vi;] (12.3.4)

solves (12.3.3}. lfT1 = diag(t1, ... , t,..) and T2 = diag(t,..+l, ... ,t,..+k) then
the matrix
XTLS = -T1 V12V221T2-l
exists and is the unique solution to (A+ Eo)X = B + Ro.
12.3. TOTAL LEAST SQUARES 597

Proof. We first establish two results that follow from the assumption
an(C1) > <Tn+J(C). From the equation CV = UE we have C1 V12+C2V22 =
U2E 2. We wish to show that l/2 2 is nonsingular. Suppose V22X = 0 for some
unit 2-norm X. It follows from v1~v12 + v:r:;v22 = I that II v12X 112 = 1. But
then
O"n+J(C) ~II U2E2x lb =II c1 v12x lb ~ <Tn(C1),
a contradiction. Thus, the submatix l/2 2 is nonsingular.
The other fact that follows from an(C1) > <Tn+ 1(C) concerns the strict
separation of an( C) and <Tn+ 1(C). From Corollary 8.3.3, we have an( C)~
<Tn(C1) and SO <Tn(C) ~ <Tn(C1) > <Tn+1(C).
Now we are set to prove the theorem. If ran{B + R) C ran( A+ E),
then there is an X (n-by-k) so (A+ E)X = B + R, i.e.,

{D[A,B]T+D[E,R]T}r- 1[ -~k] = 0. {12.3.5)

Thus, the matrix in curly brackets has, at most, rank n. By following the
argument in Theorem 2.5.3, it can be shown that
n+k
II D[ E, R]T IIF ~ L a;{C) 2
i=n+l

and that the lower bound is realized by setting [ E, R] = [Eo, Ro ]. The


inequality an(C) > <Tn+ 1(C) ensures that [Eo, Ro] is the unique minimizer.
The null space of
{D[A, B]T+ D[Eo, Ro]T} = U1Ei[V1~ Vi;]
is the range of [ ~~ ] . Thus, from {12.3.5)

r-1 [ -~k ] = [ ~~ ] s
for some k-by-k matrix S. From the equations Tj 1X = V12S and - T2- 1 =
V22S we see that S = -V221T2 1 . Thus, we must have
1 1
X = T1 V12S = -T1 V12 \t22 T2- = XrLS· D

If an(C) = <Tn+J(C), then the TLS problem may still have a solution,
although it may not be unique. In this case, it may be desirable to single
out a "minimal norm" solution. To this end, consider the r-norm defined
on lRnxk by I z llr = II r1- 1ZT2 112· If X is given by {12.3.5), then from the
CS decomposition {Theorem 2.6.3) we have
2 _ 1- ak(V22) 2
II X llr = II V12V22 1 ll22 = O"k {V22 ) 2 ·
This suggests choosing V in Theorem 12.3.1 so that ak{l/2 2) is maximized.
598 CHAPTER 12. SPECIAL TOPICS

12.3.2 Computations for the k=l Case


We show how to ma.ximize V22 in the important k = 1 case. Suppose
the singular values of C satisfy D"n-p > an-v+i == • • • == an+ I and let
V = [v1 , •.• , Vn+ 1 ] be a column partitioning of V. If Q is a Householder
matrix such that

V(:,n+1-p:n+1)Q [~ p
~] ~
1

then [ ~ ] has the largest ( n + 1)-st component of all the vectors in


span{vn+l-v• ... , Vn+d . If a== 0, the TLS problem has no solution. Oth-
erwise XTLS = -T!z/(tn+!a). Moreover,

In-1 0 ] UT(D[A b]T)V [ In-p ~ ] = ~


[ 0 Q ' 0 Q
and so
D[Eo, ro]T = -D[A, b]T [ ~] [zT a].
Overall, we have the following algorithm:

Algorithm 12.3.1 Given A E m;nxn (m > n), bERm, and nonsingular


D == diag(d1, ... ,dm) and T == diag(t1, ... , tn+ 1), the following algorithm
computes (if possible) a vector xns E R" such that (A+ Eo)x = (b + ro)
and 1\ D[ Eo, ro ]T \IF is minimal.
Compute the SVD UT(D[ A, b ]T)V = diag(a1, ... ,an+J). Save V.
Determine p such that a 1 ~ · • • ~ D"n-p > D"n-p+! = · · · = D"n+l·
Compute a Householder matrix P such that if V = V P, then
V(n+1,n -p+ 1:n) =0

if Vn+l,n+l of 0
fori= l:n
X; = -t;iii,n+J/(tn+!Vn+l,n+l)
end
end
This algorithm requires about 2mn 2 + 12n3 flops and most of these are
associated with the SVD computation.

Example 12.3.1 The TLS problem min II [ e, r JIIF where a= [1, 2, 3, 4]T and
(o+e)x=b+r
b = [2.01, 3.99, 5.80, 8.3o]T has solution XTLS = 2.0212, e = [-.0045, -.0209, -.1048, .0855JT,
and r = [.0022, .0103, .0519, -.0423]T. Note that for this data XLs = 2.0197.
12.3. TOTAL LEAST SQUARES 599

12.3.3 Geometric Interpretation


It can be shown that the TLS solution :r:r LS minimizes

~cE !a'!':r:-bs l2
'1/J(:r:) = LJ i r;.- 2 + t - 2
i•l :r: 1 :r: n+l

where a.T Is ith row of A and b, is the ith component of b. A geometrical


interpretation of the TLS problem is made possible by this observation.
Indeed,
laT:r: - b,l2
:r:rr-2 t 2
1 x+ n + l

is the square of the distance from [ :: ] E Rn+l to the nearest point in


the subspace

Pc = { [ : ] : a E Rn, b E R, b = :r:T a}
where distance in m,n+t is measured by the norm II z II =II Tz lb· A great
deal has been written about this kind of fitting. See Pearson {1901) and
Mad8JlSky (1959).

Problems

P12.3.1 Consider t he T LS problem (12.3.2) with nonBlngular D and T. (a) Show that
if rank( A) < n, then (12.3.2) bas a solution if and only if b E ran( A). (b) Show that if
rank( A) = n , then (12.3.2) has no solution if AT.D2b = Osnd lt,.+t lll Db lb ~ u,.(DATt)
where T1 = diag(tt, ... , tn)·
P12.3.2 Show that if 0 = D[ A , b )T = [ A1 , d] and u,.(O) > ITn+l(O), then the TLS
solution :r satisfies (Af A t - 1Tn+l(0) 2/):r = Af d.
P12.3.3 Sbow how to solw (12.3.2) with the added oonstn.int t hat the firat p columns
of the minimizing E are zero.

Notes and References for Sec. 12.3


Tbis section Is based upon

G.H. Golub and C.F. Van Loan (1980). "An Analysis of the Total Least Squares Prob-
lem," SIAM J. Num. Anal. 17, 883-93.
The bearing or Lhe SVD on the TLS problem is set forth in

G.H. Golub and C. R.elnsch (1970). "Singular Value Decomposition and Least Square~
Solutio1111," Numer. MiliA. 1.1, 403-420.
600 CHAPTER 12. SPECIAL TOPICS

G.H. Golub (1973). "Some Modified Matrix Eigenvalue Problems," SIAM Review 15,
318-334.

The most detailed study of the TLS problem is

S. Van Huffel and J. Vandewalle (1991). The Total L&J.St Squares Problem: Computa-
tional Aspects and Analysis, SIAM Publications, Philadelphia.

U some of the columns of A are known exactly then it is sensible to force the TLS per-
turbation matrix E to be zero in the same columns. Aspects of this constrained TLS
problem are discussed in

J.W. Demmel (1987). ''The Smallest Perturbation of a Submatrix which Lowers the
Rank and Constrained Total Least Squares Problems, SIAM J. Numer. Anal. 24,
19~206.
S. Van Huffel and J. Vandewalle (1988). ''The Partial Total Least Squares Algorithm,"
J. Comp. and App. Math. 21, 333-342.
S. Van Huffel and J. Vandewalle (1988). "Analysis and Solution of the Nongeneric Tbtal
Least Squares Problem," SIAM J. Matrix Anal. Appl. 9, 36G-372.
S. Van Huffel and J. Vandewalle (1989). "Analysis and Properties of the Generalized
Total Least Squo.res Problem AX "' B When Some or All Columns in A a.re Subject
to Error," SIAM J. Matrix Anal. Appl. 10, 294-315.
S. Van Huffel and H. Zha (1991). ''The Restricted Total Least Squares Problem: For-
mulation, Algorithm, and Properties," SIAM J. Matrix Anal. Appl. 1!!, 292-309.
S. Van Hutrel (1992). "On the Significance of Nongeneric Total Least Squares Problems,"
SIAM J. Matri:z: Anal. Appl. 13, 2G-35.
M. Wei (1992). "The Analysis for the Total Least Squares Problem with More than One
Solution," SIAM J. Matri:z: Anal. Appl. 13, 746-763.
S. Van Huffel and H. Zha (1993). "An Efficient Total Least Squo.res Algorithm Based
On a Rank-Revealing Two-Sided Orthogonal Decomposition," Numerical Algorithms
•• 101-133.
C.C. Paige and M. Wei (1993). "Analysis of the Generalized Total Least Squares Problem
AX = B when Some of the Columns o.re Free of Error," Numer. Math. 65, 177-202.
R.D. Fierro and J.R. Bunch (1994). "Collinearity and Total Least Squo.res," SIAM J.
Matriz Anal. Appl. 15, 1167-1181.

Other references concerned with least squares fitting when there are errors in the data
matrix include

K. Pearson (1901). "On Lines and Planes of Closest Fit to Points in Space," Phil. Mag.
1!, 55~72.
A. Wald (1940). "The Fitting of Straight Lines if Both Variables are Subject to Error,"
Annals of Mathematical Statistics 11, 284-300.
A. Madanaky (1959). "The Fitting of Straight Lines When Both Variables Are Subject
to Error," J. Amer. Stat. Assoc. 5.1, 173-205.
I. Linnik (1961). Method of L&J.St Squares and Principles of the Theory of Observations,
Pergamon Press, New York.
W.G. Cochrane (1968). "Errors of Measurement in Statistics," Technometrics 10, 637-
66.
R.F. Gunst, J.T. Webster, and R.L. Mason (1976). "A Comparison of Least Squo.res
and Latent Root Regression Estimators," Technometrics 18, 75-83.
G.W. Stewart (1977c). "Sensitivity Coefficients for the Effects of Errors in the Ind<>-
pendent Variables in a Linear Regression," Technical Report TR-571, Department of
Computer Science, University of Maryland, College Park, MD.
A. Van der Sluis and G.W. Veltkamp (1979). "Restoring Rank and Consistency by
Orthogonal Projection," Lin. Alg. and Its Applic. 28, 257-78.
12.4. COMPUTING SUBSPACES WITH THE SVD 601

12.4 Computing Subspaces with the SVD


It is sometimes necessary to investigate the relationship between two given
subspaces. How close are they? Do they intersect? Can one be "rotated"
into the other? And so on. In thls section we show how questions like
t hese can be answered using the singular value decomposition. Knowledge
of Chapter 5 and §8.6 are assumed.

12.4.1 Rotation of Subspaces


Suppose A E Rmxp is a data matrix obtained by performing a certain set
of experiments. If the same set of experiments is performed again, then a
different data matrix, B E Jir'xP, is obtained. In the orthogonal Procrustes
problem the possibility that B can be rotated into A is explored by solving
the following problem:
minimize II A - BQ IIF (12.4.1)
Recall that the trace of a mat rix is the sum of its diagonal entries and t hus,
tr(CTC) = II C II~· It follows t hat if Q E JRPx 11 is orthogonal, then
II A - BQ II~= tr(ATA )+ tr(BTB )- 2tr(QT BT A).
Thus, (12.4.1) is equivalent to t he problem of maximizing tr(QTBT A).
The maximizing Q can be found by calculating the SVD of BT A. In-
deed, if UT(BT A)V = I: = diag(u1 , •• • ,u11) is the SVD of this matrix
and we define the orthogonal matrix Z by Z = v TQTU, then
p J!
tr(QT BT A )= tr(QTur;vT) = tr(ZI:) = L:z,iu, $ L:u;.
i= l f=l

Clearly, t he upper bound is attained by setting Q = UvT for then Z = IP.


This gives the following algorithm:

Algorithm 12.4.1 Given A and B in R"'xJJ, the following algorithm finds


an orthogonal Q E R"xp such t hat II A - BQ IIF is minimum.
C = BT A
Compute t he SVD urcv =I:. Save U and V.
Q=UVT.
The solution matrix Q is t he orthogonal polar factor of BT A. See §4.2.10.

Example 12.4.1

Q = [ :~~ -:~: ] minimiz4?8


[
J1
s
7
2]
4 Q - [ 2.9
6
8
1.2 4.3
2.1
5.2
6.8
6.1
8.1
l F
602 CHAPTER 12. SPECIAL TOPICS

12.4.2 Intersection of Null Spaces


Let A E Rmxn and B E JR?X" be given, and consider the problem of finding
an orthonormal basis for null( A) n null( B). One approach is to compute
the null space of the matrix

since Cx = 0 # x E nuli(A) n null(B). However, a more economical


procedure results if we exploit the following theorem.

Theorem 12.4.1 Suppose A E Rmxn and let {z11 ••• , Zt} be an orthonor-
mal basis for null( A). Define Z = [ Zt. ... , Zt ) and let {Wt. ... , Wq} be an
orthonormal basis for null(B Z) where B E JR?x". If W = [ Wt, ... , Wq ] ,
then the columns of ZW form an orthonormal basis for null( A) n null( B).

Proof. Since AZ = 0 and (BZ)W = 0, we clearly .have ran(ZW) c


nuli(A)nnull(B). Now suppose xis in both null(A) and null(B). It follows
that x = Za for some 0 'I a E Rt. But since 0 = Bx = B Za, we must have
a= Wb for some bE Rq. Thus, x = ZWb E ran(ZW). [J

When the SVD is used to compute the orthonormal bases in this theorem
we obtain the following procedure:

Algorithm 12.4.2 Given A E Rmxn and BE lR?x", the following al-


gorithm computes and integer s and a matrix Y = [ y1 , ••• , y.] having
orthonormal columns which span null( A) n null( B). If the intersection is
trivial then s = 0.

Compute the SVD u.r


AVA= diag(u;). Save VA and set
r = rank(A).
ifr < n
C = BVA(:,r + l:n)
Compute the SVD U'f:CVc = diag('Y;). Save Vc and set
q = rank(C).
if q < n- r
s=n-r-q
Y = VA(:,r + l:n)Vc(:,q + l:n- r)
else
s=O
end
else
s=O
end
12.4. COMPUTING SUBSPACES WITH THE SVD 603

The amount of work required by this algorithm depends upon the relative
sizes of m, n, p, and r.
We mention that a practical implementation of this algorithm requires
a means for deciding when a computed singular value &; is negligible. The
use of a tolerance 6 for this purpose (e.g. &; < 6 => &; = 0) implies that
the columns of the computed Y "almost" define a common null space of A
and B in the sense that II AY 112 :::: II BY !12 :::: 6.

Example 12.4.2 If

A = [ ~ =~ ~ ] wd B = [ ~ ~ ~]
then null( A) n null( B)= span{x}, where x = [1 -2 - 3]T. Applying Algorithm 12.4.2
we find
-.8165 .0000 ] [ - 3273 ] [ .2673 ]
V 2 AV2c = -.4082 .7071 _: 9449 "" -.5345
[ .4082 .7071 -.8018

12.4.3 Angles Between Subspaces


Let F and G be subspaces in lRm whose dimensions satisfy
p = dim(F) ~ dim(G) = q ~ 1.
The principal angles 01, ... , Oq E [0, 71' /2) between F and G are defined
recursively by
cos(Ok) = max max v.Tv = ufvk
uEF vEG

subject to:
llull=llvll=l
UTU; = 0 i = 1:k -1
vTv, = 0 i = 1:k -1.
Note that the principal angles satisfy 0 ~ 01 ~ · · · ~ Oq ~ 71'/2. The vectors
{ u 1 , ••• , uq} and {v 1 , ••• , vq} are called the principal vectors between the
subspaces F and G.
Principal angles and vectors arise in many important statistical appli-
cations. The largest principal angle is related to the notion of distance
between equidimensional subspaces that we discnssed in §2.6.3 If p = q
then dist(F, G) = \/1- cos(Op) 2 = sin(Ov)·
If the columns of QF E Rmxp and Qo E Rmxq define orthonormal bases
for F and G respectively, then

max ma.x ma.x yT(QJ,Qa)z


uEF vEG 11ER" zeR•
llull•=l 11•11>=1 111111>=1 11•11>=1
604 CHAPTER 12. SPECIAL TOPICS

From the minimax characterization of singular values given in Theorem


8.6.1 it follows that if YT(Q~Qa)Z = diag(a 1, ... ,aq) is the SVD of
Q~Qa, then we may define the uk, Vk, and fh by

[ u 1, ,uP]
.•.

[ VJ, •.• ,Vq j

cos(6k)

Typically, the spaces F and G are defined as the ranges of given matrices
A E Rmxp and BE Rmxq. In this case the desired orthonormal bases can
be obtained by computing the QR factorizations of these two matrices.

Algorithm 12.4.3 Given A E Rmxp and BE Rmxq (p ~ q) each with lin-


early independent columns, the following algorithm computes the orthogo-
nal matrices U = [ ub ... , uq] and V = [VI. ... , Vq] and cos(61), ... cos(6q)
such that the (;lk are the principal angles between ran(A) and ran(B) and
Uk and Vk are the associated principal vectors.

Use Algorithm 5.2.1 to compute the QR factorizations


Q~QA = lp, RAE wxp
Q~QB = Iq, RB E RqXq

C=Q~QB
Compute the SVD yTcz = diag(cos(8k)).
QAY(:,1:q) = [uJ, ... ,uq]
QBZ = [ VJ, •.• ,Vq j

This algorithm requires about 4m(q 2 + 2p2 ) + 2pq(m + q) + 12q3 flops.


The idea of using the SVD to compute the principal angles and vectors
is due to Bjorck and Golub (1973). The problem of rank deficiency in A
and B is also treated in this paper.

12.4.4 Intersection of Subspaces


Algorithm 12.4.3 can also be used to compute an orthonormal basis for
ran( A) n ran(B) where A E Rmxp and BE Rmxq

Theorem 12.4.2 Let {cos(8k),uk,Vk}k= 1 be defined by Algorithm 12.4.3.


1 = cos(lh) = · · · = cos(6s) > cos(8s+I), then
If the index s is defined by
we have

ran(A)nran(B) = span{u 1, ... ,u,} = span{v 1, ... ,v,}.


12.4. COMPUTING SUBSPACES WITH THE SVD 605

Proof. The proof follows from the observation that if cos(Ok) 1, then
Uk = Vk. 0

With inexact arithmetic, it is necessary to compute the approximate mul-


tiplicity of the unit cosines in Algorithm 12.4.3.

Example 12.4.3 If

A [: n and B = [ ~ 3 1
]

then the cosines of the principal angles between ran( A) and ran( B) are 1.000 and .856.

Problems

P12.4.1 Show that if A and B are m-by-p matrices, with p :<:; m, then
p
min II A- BQ II} = L):r;(A) 2 - 20';(BT A)+ 0';(8) 2 ).
QTQ=lp i=l

P12.4.2 Extend Algorithm 12.4.2 so that it can compute an orthonormal be.sis for
null(A 1 ) n · · · n null( A.).
P12.4.3 Extend Algorithm 12.4.3 to handle the case when A and Bare rank deficient.

P12.4.4 Relate the principal angles and vectors between ran(A) and ran(B) to the
eigenvalues and eigenvectors of the generalized eigenvalue problem

P12.4.5 Suppose A, BE R"'xn and that A has full column rank. Show how to compute
a symmetric matrix X E Jl!'xn that minimizes II AX- B IIF· Hint: Compute the SVD
of A.

Notes and References for Sec. 12.4


The problem of minimizing II A - BQ IIF over all orthogonal matrices arises ln psych<>-
metrics. See

B. Green (1952). "The Orthogonal Approximation of an Oblique Structure in Factor


Analysis," Psychometrika 17, 429--40.
P. Schonemann (1966). • A Generalized Solution of the Orthogonal Procrustes Problem,"
Psychometrika 31, 1-10.
I.Y. Bar-Itzhack (1975). "Iterative Optima.l Orthogonalization of the Strapdown Ma-
trix," IEEE Tl'ans. Aerospace and Electronic Systems 11, 3Q-37.
R.J. Hanson and M.J. Norris (1981). "Analysis of Measurements Based on the Singular
Value Decomposition," SIAM J. Sci. and Stat. Comp. 2, 363-374.
H. Park (1991). "A Parallel Algorithm for the Unbalanced Orthogonal Procrustes Prob-
lem," Parallel Computing 17, 913--923.
606 CHAPTER 12. SPECIAL TOPICS

When B = I, this problem amounts to finding the clOBeSt orthogonal matrix to A. This
is equivalent to the polar decomposition problem of §4.2.10. See

A. Bjorck and C. Bowie (1971). "An Iterative Algorithm for Computing the Best Esti-
mate of an Orthogonal Matrix," SIAM J. Num. Anal. 8, 358--<i4.
N.J. Higham (1986). "Computing the Polar Decompoaition-with Applications," SIAM
J. Sci. and Stat. Comp. 7, 1160-1174.

If A is reasonably close to being orthogonal itself, then Bjorck and Bowie's technique is
more efficient than the SVD algorithm.
The problem of minimizing II AX - B IIF subject to the constraint that X is sym-
metric is studied in

N.J. Higham (1988). "The Symmetric Procrustes Problem," BIT 28, 133-43.

Using the SVD to solve the canonical correlation problem is discussed in

A. Bjorck and G.H. Golub (1973). "Numerical Methods for Computing Angles Between
Linear Subspaces," Math. Comp. 27, 579-94.
G.H. Golub and H. Zha (1994). "Perturbation Analysis of the Canonical Correlations of
Matrix Pairs," Lin. Alg. and Its Applic. 210, 3-28.

The SVD has other roles to play in statistical computation.

S.J. Hammarling (1985). ''The Singular Value Decomposition in Multivariate Statistics,"


ACM SIGNUM Newsletter 20, 2-25.

12.5 Updating Matrix Factorizations


In many applications it is necessary to re-factor a given matrix A E rxn
after it has been altered in some minimal sense. For example, given that
we have the QR factorization of A, we may need to calculate the QR fac-
torization of a matrix A that is obtained by (a) adding a general rank-one
matrix to A, (b) appending a row (or column) to A, or (c) deleting a row
(or column) from A. In this section we show that in situations like these, it
is much more efficient to "update" A's QR factorization than to generate it
from scratch. We also show how to update the null space of a matrix after
it has been augmented with an additional row.
Before beginning, we mention that there are also techniques for updat-
ing the factorizations PA = LU, A= GGT, and A= LDLT. Updating
these factorizations, however, can be quite delicate because of pivoting re-
quirements and because when we tamper with a positive definite matrix the
result may not be positive definite. See Gill, Golub, Murray, and Saunders
(1974) and Stewart (1979). Along these lines we briefly discuss hyperbolic
transformations and their use in the Cholesky downdating problem.
Familiarity with §3.5, §4.1, §5.1, §5.2, §5.4, and §5.5 is required. Com-
plementary reading includes Gill, Murray, and Wright (1991).
12.5. UPDATING MATRIX FACTORIZATIONS 607

12.5.1 Rank-One Changes


Suppose we have the QR factorization QR =BE m_mxn and that we need
to compute the QR factorization B + uvT = Q1R1 where u,v E m.n are
given. Observe that
(12.5.1)
where w = QT u. Suppose that we compute rotations ln-b ... , h, J1 such
that
Jf ... ff:_ 1w = ±II w ll2e1.
Here, each Jk is a rotation in planes k and k + 1. (For details, see Algorithm
5.1.3.) If these same Givens rotations are applied to R, it can be shown
that
(12.5.2)
is upper Hessenberg. Fbr example, in the n = 4 case we start with

n~[!~~~l
and then update as follows:

R ~ P,R [ !~ ~ ~l w Jfw

R ~ ~ [! ~ ~ ~
P,R l w = J.Tw
2

H ~ ~ ~~~~
Consequently,
JiR [ l
(J[ .. · 1J'_ 1)(R + wvT) = H ±II w ll2e1vT = H1 (12.5.3)

is also upper Hessenberg.


In Algorithm 5.2.3, we show how to compute the QR factorization of an
upper Hessenberg matrix in O(n 2 ) flops. In particular, we can find Givens
'
rotations o~. k = 1:n- 1 such that
(12.5.4)
608 CHAPTER 12. SPECIAL TOPlCS

is upper triangular. Combining (12.5.1) through (12.5.4) we obtain the QR


factorization B + uvT = Q 1 R 1 where

Q1 = QJn- 1 • .. J1G1 .. · Gn- 1·

A careful assessment of the work reveaJs that about 26n2 flops are required.
The vector w = QT u requires 2n2 Bops. Computing H and accumulating
the J~c into Q. involves 12n2 Bops. Finally, computing R 1 and multiplying
the G~c into Q involves 12n2 Hops.
The technique reailily extends to the case when B Is rectangular. It can
also be generalized to compute the QR factorization of B + uvr where
rank(UVT) = p > 1.

12.5.2 Appending or Deleting a Column


Assume that we have the QR factorization

a; € 1R"' (12.5.5)

and partition the upper triangular matrix R e Rmx" as foUows:

[1'
v k-1
Tkk 1
R = 0 m-k
k- 1 1

Now suppose that we want to compute the QR factorization of

A = ! a ..... ,a/c-1 , ak+l •·· ··an]eRmx (n-l).


Note that A is just A with its kth column deleted and that

Qr A = [ ~0 R33
~!}
1
l = H

is upper Hessenberg, e.g.,

X X X X X
0 X X X X
0 0 X X X
H = 0 0 X X X m = 7, n = 6, k = 3
0 0 0 X X
0 0 0 0 X
0 0 0 0 0

Clearly, the unwanted subdiagonal elements h~t+I ,k , .•. , hn,n- l can be ze-
roed by a sequence of Givens rotations: c:;_, ···
G'f H = R 1• Here, G; is
12.5. UPDATING MATRIX FACTORIZATIONS 609

a rotation in planes i and i + 1 fori = k:n - 1. Thus, if Ql = QG~c · · · Gn-1


then A= Q1R1 is the QR factorization of A.
The above update procedure can be executed in O(n 2 ) Bops and is
very useful in certain least squares problems. For example, one may wish
to examine the significan(;e of the kth factor in the underlying model by
deleting the kt h column of the corresponding data matrix and solving the
resulting LS problem.
In a. similar vein, it is useful to be able to compute efficiently the solution
to the LS problem after a column has been appended to A. Suppose we have
the QR factorization (12.5.5) and now wish to compute the QR factorization
of
A = (a., ... ,a~c,z,ak+lo···o<ln]
where z E JR.m is given. Note that if w = QT z then
QTA = [QTa., ... ,QTa~c,w,QTak+h ... ,QTan] A
is upper triangular except for the presence of a "spike" in its k + l-st column,
e.g.,
X X X X X X
0 X X X X X
0 0 X X X X
A= 0 0 0 X X X m = 7, n = 5, k = 3
0 0 0 X 0 X
0 0 0 X 0 0
0 0 0 X 0 0
It is possible to determine Givens rotations Jm-h ... , Jk+l so that

Wk+ l
0

=
with .f[+l · · · JJ;_1 A R upper triangular. We illustrate this by continuing
with the above example:
X X X X X X
0 X X X X X
0 0 X X X X
T-
H = J6A = 0 0 0 X X X
0 0 0 X 0 X
0 0 0 X 0 0
0 0 0 0 0 0
610 CHAPTER 12. SPECIAL TOPICS

X X X X X X
0 X X X X X
0 0 X X X X
H = f[H = 0 0 0 X X X
0 0 0 X 0 X
0 0 0 0 0 X
0 0 0 0 0 0

X X X X X X
0 X X X X X
0 0 X X X X
H = f[H == 0 0 0 X X X
0 0 0 0 X X
0 0 0 0 0 X
0 0 0 0 0 0

This update requires O(mn) flops.

12.5.3 Appending or Deleting a Row


Suppose we have the QR factorization QR =A E Rmxn and now wish to
obtain tbe QR factorization of

- [ wTA ]
A=

where w E R" . Note that

diag(1, QT)A = [ u: ] = H

is upper Hessenberg. Thus, Givens rotations J1, ... , Jn could be determined


so f! ··· f[ H = Rt is upper triangular. It follows that

A= QlRI

is the desired QR factorization, where Q1 = dia.g(l, Q)Jt · · · Jn.


No essential complications result if the new row is added between rows
k snd k + 1 of A. We merely apply the above with A replaced by P A and
Q replaced by PQ where

P -_ [ I,.0 Im-k]
0 '

Upon completion diag(l, pT)Q 1 is the desired orthogonal factor.


12.5. UPDATING MATRIX FACTORIZATIONS 611

Lastly, we consider how to update the QR factorization QR =A E Rmxn


when the first row of A is deleted. In particular, we wish to compute the
QR factorization of the submatrix A 1 in

A=

(The procedure is similar when an arbitrary row is deleted.) Let qT be the


first row of Q and compute Givens rotations G 1 , ••. , Gm-l such that

where a = ±1. Note that

[ ~] 1
m-1

is upper Hessenberg and that

where Q1 E R(m-l)x(m-l) is orthogonal. Thus,

from which we conclude that A 1 = Q1 R 1 is the desired QR factorization.

12.5.4 Hyperbolic Transformation Methods


Recall that the "R" in A= QR is the transposed Cholesky factor in AT A=
GGT. Thus, there is a close connection between the QR modifications just
discussed and analogous modifications of the Cholesky factorization. We
illustrate this with the Cholesky downdating problem which corresponds to
the removal of an A-row in QR. In the Cholesky downdating problem we
have the Cholesky factorization

(12.5.6)

where A E Rmxn with m > n and z E R". Our task is to find a lower
triangular G1 such that G1 Gf = AfA 1 • There are several approaches to
this interesting and important problem. Simply because it is an opportunity
to Introduce some new ideas, we present a downdating procedure that relies
on hyperbolic transformations.
612 CHAPTER 12. SPECIAL TOPICS

We start with a definition. H E lRm xm is pseudo-orthogonal with respect


to the signature matrix S = diag(±1) E IR.mxm if HTSH = S. Now from
(12.5.6) we have AT A =A[ A 1 + zzT = GGT and so

Define the signature matrix

(12.5.7)

and suppose that we can find H E JR.(n+ L) x (n+l) such that HT S H = S with
the property that
(12.5.8)

is upper triangular. It follows that

A[A 1 = [Gz]HTSH[~:] = [G10]S[~1 ] = G1Gf

is the sought after Cholesky factorization.


We now show how to construct the hyperbolic transformation H in
(12.5.8) using hwerbolic rotations. A 2-by-2 hyperbolic rotation has the
form
H = [ cosh(l:l) -sinh(!:I) ] = [ c -s ] .
- sinh(l:l) cosh( B) -s c
Note that if HE IR.2 x2 is a hyperbolic rotation then HTSH = S where S
= diag(-1,1). Paralleling our Givens rotations developments, let us see how
hyperbolic rotations can be used for zeroing. From

we obtain the equation cx2 = SXJ. Note that there is no solution to this
equation if XJ = x2 ofi 0, a clue that hyperbolic rotations are not as nu-
merically solid as their Givens rotation counterparts. If x 1 ofi x2 then it is
possible to compute the cosh-sinh pair:
if X2 = 0
s = 0; c = 1
else (12.5.9)
if lx2l < lx1l
T = X2/XIi C = 1/y1 - r2; 8 = CT
elseif lx 1 < 1 lx2l
7 = XJ/X2j S = 1/~i C = ST
end
end
12.5. UPDATING MATRIX FACTORIZATIONS 613

0 bserve that the norm of the hyperbolic rotation produced by this algo-
rithm gets large as x 1 gets close to x2.
Now any matrix H = H(p, n + 1, 1:1) E JR(n+L) x (n+l) that is the identity
everywhere except _h£,P = hn+l,n+L = cosh(!:!) and hp,n+l = hn+l,p =
-sinh( I:!) satisfies H" SH = S where S is prescribed in (12.5.7). Using
(12.5.9), we attempt to generate hyperbolic rotations Hk = H(1, k, l:lk) for
k = 2:n + 1 so that

This turns out to be possible if A has full column rank. Hyperbolic rotation
Hk zeros entry (k + 1, k). In other words, if A has full column rank, then
it can be shown that each call to (12.5.9) results in a cosh-sinh pair. See
Alexander, Pan, and Plemmons (1988).

12.5.5 Updating the ULV Decomposition


Suppose A E !Rmxn is rank deficient and that we have a basis for its null
space. If we add a row to A,

then how easy is it to compute a null basis for A? When a sequence of


such update problems are involved the issue is one of trocking the null
space. Subspace tracking arises in a number of real-time signal processing
applications.
Working with the SVD is awkward in this context because O(n3 ) flops
are required to recompute the SVD of a matrix that has undergone a unit
rank perturbation. On the other hand, Stewart (1993) has shown that the
null space updating problem becomes O(n 2 ) per step if we properly couple
the ideas of condition estimation of §3.5.4 and complete orthogonal decom-
position. Recall from §5.4.2 that a complete orthogonal decomposition is
two-sided and reveals the rank of the underlying matrix,

UTA V _ [ Tu 0 ] Tu E !R'"xr, r = rank(A).


- 0 0 •

A pair of QR factorizations (one with column pivoting) can be used to


compute this. In this case T 11 = L is lower triangular in exact arithmetic.
But with noise and roundoff we instead compute
614 CHAPTER 12. SPECIAL TOPICS

(12.5.10)

where L E JR'"xr and E E m.<n-r)x(n-r) are lower triangular and H and E


are ''small" compared to Umin(L) . In this case we refer to (12.5.10) as a
rank-revealing U LV decomposition. 1 Note that if

then the columns of V2 define an approximate null space:

II AVz lb = II UzE lb $ II E lb·

Our goal is to produce a rank-revealing U LV decomposition for the row-


appended matrix A. To be more specific, our aim is to show how to produce
updates of L, E, H, V , and (possibly) the rank in O(n 2 ) flops.
Note that

By permuting the bottom row up "underneath" H and E we see that the


challenge is to compute a rank-revealing ULV decomposition of

e 0 0 0 0 0
0
e e 0 0 0 0
0
e e e 0 0 0
0
e e e f 0 0
0
(12.5.11)
h h h h e 0
0
h h h h e e 0
h h h h e e e
w w w w y y y

in O(n 2 ) flops. Here and in t he sequel, we set r = 4 and n = 7 to illustrate


the main ideas. Bear in mind that the h and e entries are small and tha.t
1Dual to this Is the URV decomposition in which the rank-revealing form is upper
t riangular. T here are updating s ituations that sometimes favor t he manipulation of this
form instead of ULV .
12.5. UPDATING MATRIX FACTORIZATIONS 615

we have deduced that the numerical raok is four. In practice, this involves
comparisons with a small tolerance as discussed in §5.5.7.
Using zeroing techniques similar to those presented in §12.5.3, the bot-
tom row can be zeroed with a sequence of row rotations giving

X 0 0 0 0 0 0
X X 0 0 0 0 0
X X X 0 0 0 0

[~ ] X
X
X
X
X
X
X
X
X
X
X
X
0
X
X
0
0
X
0
0
0
X X X X X X X
0 0 0 0 0 0 0

Because this zeroing process intermingles the (presumably large) entries of


tbe bottom row with the entries from each of the other rows, the triangulac
form typically is not rank revealing. However, we can restore the rank-
revealing structure with a combination of condition estimation and zero-
chasing with rotations. Let us assume that witb the added row, the new
null space has dimension two.
With a reliable condition estimator we produce a unit 2-norm vector p
such that
II PT-L liz :::s O'man(L).
-

See §3.5.4. Rotations {Ua,Ht}r= • can be found such that

uf.ru[s~u[..ui;U[2P =es = ls(:,8).


The matrix
H = uJ;u~uis~uJ;Ul;L
is lower Hessenberg and can be restored to a lower triangular form L+ by
a sequence of column rotations:

It follows that

e'fL+ = (e'fH) Vt2V23V34V4sVsc;V61 = (pTL) Vt2V23V34V45Vs&V61


has approximate norm O'man(L). Thus, we obtain a lower triangular matrix
of the form
616 CHAPTER 12. SPECIAL TOPICS

X 0 0 0 0 0 0
X X 0 0 0 0 0
X X X 0 0 0 0
X X X X 0 0 0
X X X X X 0 0
X X X X X X 0
h h h h h h e

with small h's and e. We can repeat the condition estimation and zero
chasing on the leading 6-by-6 portion thereby producing (perhaps) another
row of small numbers:

X 0 0 0 0 0 0
X X 0 0 0 0 0
X X X 0 0 0 0
X X X X 0 0 0
X X X X X 0 0
h h h h h e 0
h h h h h e e

(If not, then the revealed rank is 6.) Continuing in this way, we can restore
any lower triangular matrix to rank-revealing form.
In the event that they vector in (12.5.11) is small, we can reach rank-
revealing form by a different, more efficient route. We start with a sequence
of left and right Givens rotations to zero all but the first component of y:
; f 0 0 0 0 0 0 f 0 0 0 0 0 0 -
f f 0 0 0 0 0 f f 0 0 0 0 0
f f f 0 0 0 0 f f f 0 0 0 0
f f f f 0 0 0 f f f f 0 0 0
h h h h e 0 0 h h h h e 0 0
h h h h e e e h h h h e e 0
h h h h e e e h h h h e e e
X X X X y y 0 X X X X y y 0

f 0 0 0 0 0 0 f 0 0 0 0 0 0
f f 0 0 0 0 0 f f 0 0 0 0 0
f f f 0 0 0 0 f f f 0 0 0 0
f f f f 0 0 0 f f f f 0 0 0
h h h h e e 0 h h h h e 0 0
h h h h e e 0 h h h h e e 0
h h h h e e e h h h h e e e
X X X X y. 0 0 X X X X y. 0 0
12.5. UPDATING MATRIX FACTORIZATIONS 617

Here, "U;j'' means a rotation of rows i and j and "V;/' means a rotation of
columns i and j. It is important to observe that there is no intermingling
of small and large numbers during this process. The h's and e's are still
small.

Following this, we produce a sequence of rotations that transform the


matrix to

e 0 0 0 0 0 0
e e 0 0 0 0 0
e e e 0 0 0 0
e e e e 0 0 0
(12.5.12)
h h h h e 0 0
h h h h e e 0
h h h h e e e
y y y y y 0 0

where all the y's are small:

e 0 0 0 0 0 0 - e 0 0 0 0 0 0
e e 0 0 0 0 0 e e 0 0 0 0 0
e e e 0 0 0 0 e e e 0 J.L 0 0
e e e e /1 0 0 e e e e /1- 0 0
h h h h e 0 0 h h h h e 0 0
h h h h e e e h h h h e e 0
h h h h e e e h h h h e e e
X X X 0 y 0 0 X X 0 0 y 0 0

e 0 0 0 0 0 0 e 0 0 0 /1- 0 0
e e 0 0 /1 0 0 e e 0 0 /1- 0 0
e e e 0 J.L 0 0 e e e 0 /1- 0 0
e e e e /1 0 0 e e e e /1 0 0
h h h h e e 0 h h h h e 0 0
h h h h e e 0 h h h h e e 0
h h h h e e e h h h h e e e
X 0 0 0 y 0 0 L 0 0 0 0 y •• 0 0

Note that y •• is small because of 2-norm preservation. Column rotations


618 CHAPTER 12. SPECIAL TOPICS

in planes (1,5), (2,5), (3,5), and (4,5) can remove the 11-'s:
£ 0 0 0 0 0 0 £ 0 0 0 0 0 0
£ £ 0 0 11- 0 0 £ £ 0 0 0 0 0
i i i 0 11- 0 0 £ £ £ 0 11- 0 0
i i £ £ 11- 0 0 £ £ £ £ 11- 0 0
h h h h e 0 0 h h h h e 0 0
h h h h e e 0 h h h h e e 0
h h h h e e e h h h h e e e
L y 0 0 0 y 0 0 y y 0 0 y 0 0

£ 0 0 0 0 0 0 £ 0 0 0 0 0 0
£ £ 0 0 0 0 0 £ £ 0 0 0 0 0
£ £ £ 0 0 0 0 £ £ £ 0 0 0 0
£ £ £ £ 11- 0 0 £ £ £ £ 0 0 0
h h h h e 0 0 h h h h e 0 0
h h h h e e 0 h h h h e e 0
h h h h e e e h h h h e e e
y y y 0 y 0 0 . y y y y y 0 0
thus producing the structure displayed in (12.5.12). All the y's are small
and thus a sequence of row rotations U67 , U4 7 , •.. , U17 , can be constructed
to clean out the bottom row giving the rank-revealed form
,. £ 0 0 0 0 0 0
£ £ 0 0 0 0 0
£ £ £ 0 0 0 0
i i £ £ 0 0 0
h h h h e 0 0
h h h h e e 0
h h h h e e e
0 0 0 0 0 0 0

Problems

P12.5.1 Suppose we ha.ve the QR factorization for A E E"x" and now wish to mini-
mize II (A+ uvT)x - b ll2 where u, bE Rm and v E R" are given. Give an algorithm for
solving this problem that requires O(mn) flops. Assume tha.t Q must be updated.
P12.5.2 Suppose we have the QR factorization QR =A E E"x". Give an a.lgnrithm
for computing the QR factorization of the matrix A obtained by deleting the kth row of
A. Your algorithm should require O(mn) flops.
P12.5.3 Suppose T E R"x" is tridiagonal and symmetric and that v E R". Show how
the Lanczos algorithm ca.n be used (in principle) to compute an orthogonal Q E R"x"
=
in O(n 2 ) flops such that QT(T + vvT)Q f is also tridiagonal.
P12.5.4 Suppose

A= [ ~] c E R", BE R(m-t)xn
12.5. UPDAT ING MATRIX FACTORIZ.-'TIONS 619

ha.a tun column ra.nk and m > n. Using the Sherman-Morrison-Woodbury formula show
that
1 < I (AT A) - 1c II~
.,.,.;,.(B) - <T,.;,.(A) + 1 - cT(AT A)-lc .

P12.5.5 As a function of :rt and :r2, what is the 2-nonn of tbe hyperbolic rotation
produoed by (12.!1.9)?
P12.5.6 Show that the hyperbolic reduction in §12.5.4 does not breakdown U A has
full column rank.
P12.5. 7 A•ume
A=[~~]
where R and E are square with

p =
II ENz < 1.
O'm;n(R)
Show that if
Q-
Is orthogonal and

[~ ~ ][ ~~ ] = [ ~~
then II Ht lb ~ Pll H lb·
Notes and References for Sec. 12.5
Numerous aapects of the updating problem are presented in

P.E. 0111, C .H. Golub, W . Murray, and M.A. Saunder8 (1974). •Mmods for Modifylnc
Matrix Fa.ctorlza.tions," Ma.th. Oomp. 28, 506-35.
Appllcatione in the area of opUmizatlon are covered In

R.H. Bartels (1971). "A Stabilization of the Simplex M~bod ," Numer. Mo.tlt. 16,
414-434.
P.E. CiU, W . Murray, and M.A. Saunders (1975). "Methode for Computin& a nd Modl-
fyine the LDV FactoB of a Matrix," Math. Oomp. %9, 1051-77.
D. Goldfarb (1976). "Factored Variable Metric Methods for Unconstrained Optimiu.-
tion," Math. Oomp. 30, 796-811.
J .E . Dennie and R.B. Schnabel (1983). Num<rical Methodl/or Uncofutm1ned 0ptimiz4-
tion and Nonlinear Equations, Prenti~Hall, Englewood Cliffil, NJ.
W .W . Hager (1989). "Up d ating the Inverse of a Matrix," SIAM Review 31, 221-239.
S.K. Elderaveld and M.A. Saunders (1992). "A Bloclt-LU Update for Large-Scale Linear
Programming," SIAM J . Matri:z AnaL Appl. 13, 191- 201.
Updating issues in the least equ11J'81 eetting are discussed in

J. Daniel, W .B. Cragg, L. Kaufman, and C.W. Stewart (1976). "Reortbogonaiaation


aod Stable Alsorithms Lor Updating the Crar&Schmidt QR Factorization," Moth.
Comp. 30, 772-95.
S. Qiao (1988). "Recursive Least Square~~ Algorithm for Linear Prediction Problems,"
SIAM J. Matri:z Anal. Appl. 9, 323-328.
A. BjOrdc, H.. Park, and L. Elden (1994). "Acc:urate Downdatlng of Least Square
Solutiou," SIAM J. Motri:z Anal. AppL 15, 54~568.
620 CHAPTER 12. SPECIAL TOPICS

S.J. Olszanskyj, J.M. Leba.k, and A.W. Bojanczyk (1994). "Rank-k Modification Meth-
ods for Recursive Least Squares Problems," Numerical Algorithms 7, 325-354.
L. Elden and H. Park (1994). "Block Downdating of Least Squares Solutions," SIAM J.
Matrix Anal. Appl. 15, 1018-1034.
Another important topic concerns the updating of condition estimates:

W.R. Ferng, G.H. Golub, and R.J. Plemmons (1991). "Adaptive Lanczos Methods for
Recursive Condition Estimation," Numerical Algorithms 1, 1-20.
G. Shroff and C.H. Bischof (1992). "Adaptive Condition Estimation for Rank-One Up-
dates of QR Factorizations," SIAM J. Matrix Anal. Appl. 19, 1264-1278.
D.J. Pierce and R.J. Plemmons (1992). "Fast Adaptive Condition Estimation," SIAM
J. Matrix Anal. Appl. 19, 274-291.

Hyperbolic transformations are discussed in

G.H. Golub (1969). "Matrix Decompositions and Statistical Computation," in Statistical


Computation, ed., R.C. Milton and J.A. Neider, Academic Press, New York, pp. 365-
97.
C.M. Rader and A.O. Steinhardt (1988). "Hyperbolic Householder Transforms," SIAM
J. Matrix Anal. Appl. 9, 26!l--290.
S.T. Alexander, C.T. Pan, and R.J. Plemmons (1988). "Analysis of a Recursive Least
Squares Hyperbolic Rotation Algorithm for Signal Processing," Lin. Alg. and I/.8
Applic. 98, 3-40.
G. Cybenko and M. Berry (1990). "Hyperbolic Householder Algorithms for Factoring
Structured Matrices," SIAM J. Matrix Anal. Appl. 11, 499-520.
A.W. Bojanczyk, R. Onn, and A.O. Steinhardt (1993). "Existence of the Hyperbolic
Singular Value Decomp06ition," Lin. Alg. and Its Applic. 185, 21-30.
Cholesky update issues have also attracted a lot of attention.

G.W. Stewart {1979). "The Effects of Rounding Error on an Algorithm for Downdating
a Cholesky Factorization," J. Inst. Math. Applic. 29, 203-13.
A.W. Bojanczyk, R.P. Brent, P. Van Dooren, and F.R. de Hoog (1987). "A Note on
Downdating the Cholesky Factorization," SIAM J. Sci. and Stat. Comp. 8, 21D-221.
C.S. Henkel, M.T. Heath, and R.J. Plemmons (1988). "Cholesky Downdating on a
Hypercube," in G. Fox (1988), 1592-1598.
C.-T. Pan {1993). "A Perturbation Analysis of the Problem of Downdating a Cholesky
Factorization," Lin. Alg. and I/.8 Applic. 183, 103-115.
L. Elden and H. Park (1994). "Perturbation Analysis for Block Down dating of a Cholesky
Decomposition," Numer. Math. 68, 457-468.
Updating and downdating the ULV and URV decompositions and related topics are cov-
ered in

C.H. Bischof and G.M. Shroff (1992). "On Updating Signal Subspaces," IEEE nuns.
Signal Proc. 40, 96-105.
G.W. Stewart (1992). "An Updating Algorithm for Subspace Tracking," IEEE nuns.
Signal Proc. 40, 1535-1541.
G.W. Stewart (1993). "Updating a Rank-Revealing ULV Decomposition," SIAM J.
Matrix Anal. AppL 14, 494-499.
G.W. Stewart {1994). "Updating URV Decompositions in Parallel," Pamllel Computing
!JO, 151-172.
H. Park and L. Elden (1995). "Downdating the Rank-Revealing URV Decomp06ition,"
SIAM J. Matrix Anal. Appl. 16, 138-155.
Finally, we mention the following paper concerned with SVD updating:
12.6. MODIFIED/STRUCTURED EIGENPROBLEMS 621

M. Moonen, P. Van Dooren, and J. Vandewalle (1992). "A Singular Value Decomposition
Updating Algorithm," SIAM J. Matrix Anal. Appl. 13, 1015--1038.

12.6 Modified/Structured Eigenproblems


In this section we treat an array of constrained, inverse, and structured
eigenvalue problems. Although the examples are not related, collectively
they show how certain special eigenproblems can be solved using the basic
factorization ideas presented in earlier chapters.
The dependence of this section upon earlier portions of the book is as
follows:

§§5.1, 5.2, 8.1, 8.3 --+ §12.6.1


§§8.1, 8.3, 9.1 --+ §12.6.2
§§4.7, 8.1 --+ §12.6.3
§§5.1, 5.2, 5.4, 7.4, 8.1, 8.2, 8.3, 8.6 --+ §12.6.4

12.6.1 A Constrained Eigenvalue Problem


Let A E lRnxn be symmetric. The gradient of r(x) = xT Ax/ xT x is zero if
and only if xis an eigenvector of A. Thus the stationary values of r(x) are
therefore the eigenvalues of A.
In certain applications it is necessary to find the stationary values of r(x)
subject to the constraint cT x = 0 where C E lRnxp with n 2: p. Suppose

[~ ~ ]
r p-r
T
n-r
r =rank( C)

is a complete orthogonal decomposition of C. Define B E Rnxn by

B12 ] r
QTAQ = B = [ Bu
B21 B22 n-r
T n-r

and set
y = QTx = [:] n-r
T

Since cT X = 0 transforms to sTu = 0, the original problem becomes one of


finding the stationary values of r(y) = yT ByjyT y subject to the constraint
that u = 0. But this amounts merely to finding the stationary values
(eigenvalues) of the (n- r)-by-(n- r) symmetric matrix B 22 .
622 CHAPTER 12. SPEClAL TOPICS

12.6.2 Two Inverse Eigenvalue Problems


Consider the r = 1 case in the previous subsection. Let 5: 1 ::; . .. ::; 5:n-l be
the stationary values of xT A xjxTx s ubject to the constraint cTx = 0. From
Theorem 8.1.7, it is easy to show that these stationary values interlace the
eigenvalues A; of A:
An $ 5:n - 1 $ An- 1 $ · · · $ A2 $ j; 1 $ A1·
Now suypose ~hat A has distinct eigenvalues and that we are given the
values A1 , ••• , An- I that satisfy

An < >:n-1 < An-1 < · · · < A2 < j;l < A1 ·


We seek to determine a unit vector c E Rn such that t he 5:; are the station-
ary values of x T Ax subject to xT x = 1 and cT x = 0.
In order to determine the properties that c must have, we use the method
of Lagrange multipliers. Equating the gradient of
if>(x, A, J.L) = XT Ax - A(XTX - 1) + 2J.LXT C
to zero we obtain t he important equation (A - Al )x = - J.LC. Thus, A - >.I is
nonsingular and sox = -J.L(A - AI)- 1 c. Applying cT to both sides of t his
equation and s ubstituting the eigenvalue decomposition QT AQ = diag(A,)
we obtain

where d = QT c, i.e.,
n n
p(A) =L d~ II (Aj - A)
i~t j =l
= 0.

j;f>i

Notice that 1= II c II ~ = IId 1l2= di + · · ·+ ~ is the coefficient of ( -A)n- 1•


Since p(A) is a polynomial having zeroes >:1, ... , 5.n- l we must have
n- 1
p(A) = II (5:j -A) .
j=l

It follows from these two formulas for p(A) that

n- 1 k = l :n. (12.6. 1)
n
j•l
(A; - AA:)
i#
12.6. MODIFIED/STRUCTURED EIGENP ROBLEMS 623

This determines each dk up to its sign. Thus there are 2n different solutions
c = Qd to the original problem.
A related inverse eigenvalue problem involves finding a tridiagonal ma-
trix
0

T=

0
such that T has prescribed eigenvalues {.Xt •... , An} and T(2:n, 2:n) has
prescribed eigenvalues {>.1, . . . , .Xn- l} with

At > ).1 > A2 > · · · > An-1 > ~-1 > An.
We show how to compute the tridiagonal T via the La.nczos process. Note
that the ~. are the stationary values of

!J>(y) = yTAy
yTy

subject to ~Y = 0 where A= dlag(AJt . .. ,An) and dIs specified by (12.6.1).


If we apply the Lanczos iteration (9.1.3) with A = =
A and q1 d, then it
produces an orthogonal matrix Q and a tridiagonal matrix T such that
=
QT AQ = T. With the definition x QTy , it follows that the .X, are the
stationary values of
t/J(x) = xTTx
xT:z:
subject to efx = 0. But these are precisely the eigenvalues of T(2:n, 2:n)!
12.6.3 A Toeplitz Eigenproblem
Assume that

T= [~ ~]
is symmetric, positive definite, and Toeplitz with r E m,n- l. Our goal is to
compute the smallest eigenvalue Amm(T) ofT given that

Am>n(T) < Am~n(G).


This problem is considered in Cybenko and Van Loan (1986) and hBB ap-
plications in signal processing.
Suppose

[~ ~][;] A[;].
624 CHAPTER 12. SPECIAL TOPICS

i.e.,

a +rTy >.a
ar+Gy .>.y.

If.>.¢ .A( G), then y = -a(G- .>.I)- 1r, a ol 0, and


a+ rT [-a(G- .AI)- 1 r] =>.a.

Thus, .>. is a zero of the rational function

We have dealt with similar functions in §8.5 and §12.1. In this case, f
always has a negative slope

If.>.< Amin(G), then it also has a negative second derivative:

Using these facts it can be shown that if

(12.6.2)

then the Newton iteration

(12.6.3)

converges to Amin(T) monotonically from the right. Note that

where w solves the "shifted" Yule-Walker system

(G- .>.(k)I)w = -r.

Since, ,A(k) < Amin(G), this system is positive definite and Algorithm 4.7.1
is applicable if we simply apply it to the normalized Toeplitz matrix (G -
,A(k)I)/(1- ,A(kl).
A starting value that satisfies (12.6.2) can be obtained by examining
the Durbin algorithm when it is applied to T>.. = (T- .AI)/(1 - .>.). For
this matrix the "r" vector is r/(1- .A) and so the Durbin algorithm (4.7.1)
transforms to
12.6. MODIFIED/STRUCTURED EIGENPROBLEMS 625

(12.6.4)

end

From the discussion in §4. 7.2 we know that f3I. .. • , f3k > 0 implies that
T,x(1:k + 1, 1:k + 1) is positive definite. Hence, a suitably modified (12.6.4)
can be used to compute m(>.), the largest index m such that /3~o ... , !3m are
all positive but that /3m+ I .::; 0. Note that if m(>.) = n- 2, then (12.6.2)
holds. This suggests the following bisection scheme:

Choose L and R so L :S: Amln (T) < Amln (G) :S: R.


Until m= n -2
>. = (L+ R)/2
m = m(>.)
ifm<n-2 (12.6.5)
R=>.
end
ifm=n-1
L=>.
end
end

The bracketing interval [L, R] always contains a >. such that m(>.) = n- 2
and so the current >. has this property upon termination.
There are several possible choices for a starting interval. One idea is to
set L = 0 and R = 1 - lr1l since

where the upper bound follows from Theorem 8.1.7.


Note that the iterations in (12.6.4) and (12.6.5) involve at most O(n 2 )
flops. A heuristic argument that O(log n) iterations are required is given
in Cybenko and Van Loan (1986).

12.6.4 An Orthogonal Matrix Eigenproblem


Computing the eigenvalues and eigenvectors of a real orthogonal matrix
A E R"xn is a problem that arises in signal processing, see Cybenko (1985).
626 CHAPTER 12. SPECIAL TOPICS

The eigenvalues of A are on the unit circle and moreover,

cos(8) ± i sin(8) E >.(A) ~ cos(ll) E >. (A \A-I) = >. (A~ AT) .


This suggests computing Re(>.(A)) via the Schur decomposition

QT(A+AT) Q .
= diag(cos(8J), ... , cos(lln))
2

and then computing Im(>.(A)) with the formula 8 = v'1- c2 • Unfortu-


nately, if lei : : ; 1, then this formula does not produce an accurate sine
because of floating point cancellation. We could work with the skew-
symmetric matrix (A -AT)/2 to get the "small sine" eigenvalues, but then
we are talking about a method that requires a pair of full Schur decompo-
sition problems and the approach begins to lose its appeal.
A way around these difficulties that involves an interesting SVD ap-
plication is proposed by Ammar, Gragg, and Reichel (1986). We present
just the eigenvalue portion of their algorithm. The derivation is instructive
because it involves practically every decomposition that we have studied.
The first step is to orthogonally reduce A to upper Hessenberg form,
QT AQ =H. (Frequently, A is already in Hessenberg form.) Without loss
of generality, we may assume that His unreduced with positive subdiagonal
elements.
If n is odd, then it must have a real eigenvalue because the eigenvalues
of a real matrix come in complex conjugate pairs. In this case it is possible
to deflate the problem with O(n) work to size n- 1 by carefully working
with the eigenvector equation Hx = x (or Hx = -x). See Gragg (1986).
Thus, we may assume that n is even.
For 1 :::; k :::; n - 1, define the reflection Gk E Rnxn by

where Ck = cos(4>k), sk = sin(4>k), and 0 < 4>k <


J,_,
1r.
l
It is possible to
determine Gh ... , Gn-1 such that

H = (GJ · · · Gn-1) diag(1, ... , 1, -en)

where Cn = ±1. This is just the QR decomposition of H. The sines


s1, ... , sn-lare the subdiagonal entries of H. The "R" matrix is diagonal
because it is orthogonal and triangular. Since the determinant of a reflection
is -1, det(H) = Cn· This quantity is the product of H's eigenvalues and so
if Cn = -1, then {-1, 1} ~>..(H). In this situation it is also possible to
deflate.
12.6. MoDIFIED/STRUCTURED EtG ENPROBLEMS 627

So altogether we may assume that n is even and


H = G t(,Pt) · · · Gn-l(<l>n-J)Gn(,P,.)
where G,. = Gn(,P,.) = diag(1, .. . , 1, - c..) and ~ = 1. Designate the
sought after eigenvalues by
>.(H) = { cos(8k) ± i · sin(8k) };;'.,. 1 (12.6.4)

where m = n / 2.
The cosines c1, . . . , Cn a.re called t he Schur parameters and as we men-
tioned, the oorresponding sloes are the subdiagonal entries of H. Using
these numbers it is possible to construct explicitly a pair of bidiagonal ma-
trices Bc,Bs e IR.nxn with the property that
a(Bc(I:m , l :m )) = {cos(81 /2), ... ,cos(Bm/ 2)} (12.6.5)
a(Bs(1:m, 1:m)) = {sin(OJ / 2), ... ,sin(Bm / 2)} (12.6.6)
The singular values of Bc(1:m, l:m) and Bs(l:m, l:m) can be computed
using the bidiagonal SVD algorithm. The angle 8k can be 8()CUrately com-
puted from sin(8k/2) if 0 < 8k $ 1rj 2 and accurately oomputed from
cos(O~c /2) if 'lr/ 2 $ 8k < 1r. The construction of Be and Bs is based
on three facts:
1. H is slmilar to
fi = HoHe
where H 0 and H, are t he odd and even reflection products

Ho = GtG3 '"Gn-1
Ht, = G2G4 ·· ·Gn.
These matrices are block diagonal with 2-by-2 and 1-by-1 blocks, i.e.,
Ho = diag(R(¢1), R(</>a), . . . , R(¢,._1)) (12.6.7)
He = diag(l, R(¢-l), R(,P4), . .. , R(¢n-2), - 1) (12.6.8)
where
R(¢) _ [ - cos(~/>) sin(¢)] (12.6.9)
- sin(~/>) cos{¢) ·
2. The eigenvalues of the symmetric tridiagonal matrices
C = Ho + He and S = H0 - He (12.6.10)
2 2
are given by
A(C) = {e cos(81/2), .. . , ± cos{8m/2) } (12.6.11)
A(S) { ± sin(Bt/2), . .. ,±sin(Bm/2)}. (12.6.12)
628 CHAPTER 12. SPECIAL TOPICS

3. It is possible to construct bidiagonalizations

and U§SVs = Bs

that satisfy (12.6.5) and (12.6.6). The transformations Uc, Vc, U8 ,


and Vs are products of known reflections Gk and simple permutations.
We begin the verification of these three facts by showing that H is
similar to H 0 H,. The n = 8 case is sufficient for this purpose. Define the
orthogonal matrix P by

F3 = G3G4GsG6G1Gs
where Fs = GsG6G7Gs
{
F1 = G1Gs.

Since reflections are symmetric and G;Gi = GiGi if li- il ::0: 2, we see that

F3HF[ = (G3G4GsG6G7Gs)(GIG2GaG4GsG6G7Gs)(GJG4GsG6G7Gs)r
(G3G4GsG6G1Gs)GtG2
= GtGJG2G4GsG6G1Gs,

Fs(F3HFJ)F[ (GsG6G7Gs)(GtG3G2G4GsG6G7Gs)(GsG6G7Gs)r,
= (GsG6G1Gs)G,G3G2G4
G,G3GsG2G4G5G7Gs

F1(FsF3H FJ F[)Fi
(G1Gs) (Gt G3GsG2G4G6G1Gs)( G1Gs)T
(GtG3GsG7)(G2G4G6Gs) = HaH•.
The second of the three facts that we need to establish relates the eigen-
values of if = HoHe to the eigenvalues of the C and S matrices defined
in (12.6.10). It follows from (12.6.7) and (12.6.8) that these matrices are
symmetric, tridiagonal, and unreduced, e.g.,

c 1
2
[-~
0
s,
SJ CJ- ~
s2
0
s2
C2 - C3
0
0
SJ
l
l
0 0 83 C3- C4

s =
1
2
[-~
0
s,
s, C1 + C2
-S2
0
-s2
-C2- C3
0
0
SJ .
0 0 SJ CJ +c4
By working with the definitions it is easy to verify that
if+ ifT
2
12.6. MODIFI ED/STRUCTURED EJGENPROBLEMS 629

and

This shows that Re(,\(H)) = ,\(2C2 - J) and Im(.A(H)) = ,\( - 2iCS)


thereby establishing (12.6.11) and (12.6.12).
Instead of thinking of these half-angle cosines and sines as eigenvalues
of n -by-n matrices, it is more efficient to think of them as singular values
of m-by-m matrices. This brings us to the bidiagonalization of C and S.
The orthogonal equivalence transformations that carry out this task are
based upon the Schur decompositions of H0 and He. A 2-by-2 reflection
R(4>) defined by (12.6.9) has eigenvalues 1 and - 1 and t he following Schur
decomposition:

R(4>/2)R(4>)R(4>/2) = [ ~ ~~ ] .

Thus, if

Qo diag(R(4>tf2), R(4>a/2), . . . , R(4>n- tf2))


Qe = diag(1,R(</>2/2),R(4>4/2), ... ,R(4>n-2/ 2), - 1)

then from (12.6.7) and (12.6.8) H0 and He have the following Schur decom-
positions:

Q 0 H0 Q0 = D 0 = diag(1, -1, 1, - 1, .. •, 1, - 1}
Q eHeQe De = diag(1, 1,-1,1, -1, .. · ,1 , - 1,-1).

The matrices
1 1
QoCQe = 2Qo (Ho + He)Qe = 2 (Do(QoQe) + (QoQe )De)
1 1
QoSQe = 2Qo (Ho - He) Qe = 2 (Do(QoQe) - (QoQe)De)

have the same singular values as C and S respectively. To analyze their


structure we first note that Q 0 Qe is banded:

X X X 0 0 0 0 0
X X X 0 0 0 0 0
0 X X X X 0 0 0
0 X X X X 0 0 0
QoQe
0 0 0 X X X X 0
0 0 0 X X X X 0
0 0 0 0 0 X X X
0 0 0 0 0 X X X
630 CHAPTER 12. SPECIAL TOPICS

(The main ideas from this point on are amply communicated with n =8
examples.) If D 0 (i,i) and D.(j,j) have the opposite sign, then C;~) =0
from which we conclude that c(l) has the form

ao b! 0 0 0 0 0 0
0 0 ~ 0 0 0 0 0
0 a2 0 b3 0 0 0 0
0 0 a3 0 b4 0 0 0
c(J) = QoCQ.
0 0 0 a4 0 bs 0 0
0 0 0 0 as 0 be 0
0 0 0 0 0 ae 0 0
0 0 0 0 0 0 a7 ba

Analogously, if D 0 (i, i) and D.(j,j) have the same sign, then sg) = 0 from
which we conclude that S(l) has the form

0 0 h 0 0 0 0 0
e2 d2 0 0 0 0 0 0
0 0 d3 0 Is 0 0 0
S(l) = Q 0 SQ. 0 e4 0 d4 0 0 0 0
0 0 0 0 ds 0 Is 0
0 0 0 e6 0 d6 0 0
0 0 0 0 0 0 d7 h
0 0 0 0 0 0 es 0

Row/column permutations of these matrices result in bidiagonal forms:

Be = c( 1l([ 1 3 512 4 6 8], [ 1 2 4 6 3 51 8])

ao b! 0 0 0 0 0 0
0 a2 b3 0 0 0 0 0
0 0 a4 bs 0 0 0 0
0 0 0 ae 0 0 0 0
= 0 0 0 0 b2 0 0 0
0 0 0 0 a3 b4 0 0
0 0 0 0 0 as b6 0
0 0 0 0 0 0 a7 bs
12.6. MODIFIED/STRUCTURED EJGENPROBLEMS 631

Bs = s<l)([ 2 4 6 s13 5 7J, 112 4 6 a51 s))


e2 ~ 0 0 0 0 0 0
0 e4 d4 0 0 0 0 0
0 0 ee ~ 0 0 0 0
0 0 0 ea 0 0 0 0
= 0 0 0 0 h 0 0 0
0 0 0 0 d3 h 0 0
0 0 0 0 0 ds /s 0
0 0 0 0 0 0 do, h
It is not bard to verify that a's, b's, d's, e's, and fs are all nonzero and
this implies that the singular values of Bc(1:m, 1:m) and Bs(1:m, 1:m) are
distinct. Since

a(C) = a(Bc) = { cos(OJ/2), cos(OJ/2), . .. ,cos(Om./2), cos(9m./2) }


a(S) = a(Bs) = {sin(8t/2), sin(Bt/2), ... , sin(Bm/2), sin(Om/2)}
we have verified (12.6.5) ~md (12.6.6).

Problems
Pl2.6.1 Let A E If" X" and consider the problem of finding the stationary values of
liTA,;
R(z tt) - 11 E nm,z E R"
... - lilllbll,;b
subject to the constraints
<P':r: = O CeE'x" n~p
DTy = O DER"xq m~q

Show how to solve this problem by first computing complete orthogonal decompositions
of C and D and then computing the SVD of a ceztain eubmatrix of a transformed A.
Pl2.6.2 Supp06e A E R"'x" and B E R"'><". Assume that rank(A) = nand rank( B ) =
p. Using the methods of this section, s how how to solve

min II b- A:?: II~ = min


Ba: O liz II~ + 1 B•=O

Show that this Ia a coDStralned TLS problem. Is there al-ya a solution?


Pl2.8.S Suppoae A E rxn is symmetric and that BE R"xn has rank p. Let dE m'.
Show how to solve the problem of minimizing :r:TA:r: s ubject to the conatre.ints n:r: IJ2 =
I and B :r: = d. Indk:ate when a solution fails to exist.
Pl2.8.4 Assume th at A E R'x" is symmetric, large, and sparse and that C E R''><P is
a lso large ond sparee. How can the Lanczos proooss be used to llnd t he stationary values
of
zTAx
r(:r:)= - T -
:r: %
632 CHAPTER 12. SPECIAL TOPICS

subject to the constraint cT X= 0? Assume that 8 sparse QR factorization c = QR is


available.
P12.6.5 Relate the eigenvalues and eigenvectors of

to the eigenvalues and eigenvectors of A= A1A2A3A4. Assume that the diagonal blocks
in A ace square.
P12.6.6 Prove that if {12.6.2) holds, then {12.6.3) converges to >.m;n(T) montonically
from the right.
P12.6.7 Recall from §4.7 that it is possible to compute the inverse of a. symmetric pos-
itive definite Toeplitz matrix in O{n 2) flops. Use this fact to obtain an initia.l bracketing
interval for {12.6.5) that is based on II T- 1 lloo and II a- 1 lloo·
P12.6.8 A matrix A e R''" is centrosymmetric if it is symmetric and persymmet-
ric, i.e., A = EnAEn where En = In(:,n:- 1:1). Show that if n =2m and Q is the
orthogonal matrix
1 [ lm
Q=.Ji Em -&n
Im
].
then
QTAQ = [
An+ A12Em
0
0
An- A12Em ]
where An = A{1:m,1:m) and A12 = A{1:m, m + 1:n). Show that if n = 2m, then the
Schur decnmposition of a centrosymmetric matrix can be computed with onG-fourth the
flops that it takBs to compute the Schur decomposition of a. symmetric matrix, assuming
that the QR a.lgorithm is used in both cases. Repeat the problem if n = 2m+ 1.
P12.6.9 Suppose F, G E R"x" are symmetric and that
Q = [ Q1 02 1
I' n-p
is an n-by-n orthogonal matrix. Show how to compute Q and p so that
f(Q,p) =tr(Q[FQ1) +tr{QfGQ2)
is maximized. Hint: tr(Qf FQ1) + tr(Qf GQ2) = tr(Qf (F- G)QJ) + tr{G).
P12.6.10 Suppose A E R''" is given and consider the problem of minimizing II A - S IIF
over all symmetric positive semidefinite matrices S that have rank r or less. Show that
min{k,r}
s= L >.,q,qr
i=l

solves this problem where


A+ AT . T
-2- = Qdw.g{At, ... 'An)Q

is the Schur decomposition of A's symmetric part, Q = [ q~o ... , qn ], and


>.1 ~ · · · ~ >.k > 0 ~ >.k+i ~ · · · ~ An.

P12.6.11 Verify for general n (even) that His similar to HaHe where these matrices
ace defined in §12.6.4.
P12.6.12 Verify that the bidiagonal matrices Bc{1:m,1:m) and Bs(1:m, 1:m) in §12.6.4
12.6. MODIFIED/STRUCTURED ElGENPROBLEMS 633

have nonzero entries on their diagonal and superdiagonal and specify their value.
P12.6.13 A real 2n-by-2n matrix of the form

is Hamiltonian if A E R"xn and F, G E R"x" are symmetric. Equivalently, if the or-


thogonal matrix J is defined by

I,.
0 l ,

then ME fi'"X 2n is Hamiltonian if and only if .P'MJ = -MT. (a) Show that the
eigenvalues of a Hamiltonian matrix come in plU&-minus pairs. (b) A matrix S E R 2"X 2n
is symplectic if .P' SJ =-s-T. Show that if Sis symplectic and M is Hamiltonian, then
s- 1 MS is also Hamiltonian. (c) Show that if Q E fi'"X 2n is orthogonal and symplectic,
then

Q =[ -~~ ~~ l
where QfQ 1 + Q'fQ2 = In and Q'fQ, is symmetric. Thus, a Givens rotation of the
form G(i, i + n, IJ) is orthogonal symplectic as is the direct sum of n-by-n Householders.
(d) Show how to compute a symplectic orthogonal U such that

UT MU = [ ~ -H~ l
where His upper Hessenberg and D is diagonal.

Notes and References for Sec. 12.6


The inverse eigenvalue problems discussed in this §12.6.1 and §12.6.2 appear in the fol-
lowing survey articles:

G.H. Golub (1973). "Some Modified Matrix Eigenvalue Problems," SIAM Re!liew 15,
318-44.
D. Boley a.nd G.H. Golub (1987). "A Survey of Matrix Inverse Eigenvalue Problems,"
Inverse Problems 3, 595-{;22.
References for the stationary value problem include

G.E. Fbrsythe and G.H. Golub (1965). "On the Stationary Values of a Second-Degree
Polynomial on the Unit Sphere," SIAM J. App. Math. 19, 105(H)8.
G.H. Golub and R. Underwood {1970). "Stationary Values of the Ratio of Quadratic
Fbrms Subject to Linear Constraints," Z. Angew. Math. Phys. 21, 318-26.
S. Leon {1994). ''Maximizing Bilinear Fbrms Subject to Linear Constraints," Lin. Alg.
and Its Applic. flO, 4~58.
An algorithm for minimizing xT Ax where x satisfies Bx = d and II x ll2=1 is presented in

W. Gander, G.H. Golub, and U. von Matt {1991). "A Constrained Eigenvalue Problem,"
in Numericnl Linear Algebra, Digital Signal Processing, and Parallel Algorithms,
G.H. Golub and P. Van Dooren (eds), Springer-Verlag, Berlin.
Selected papers that discuss a range of inverse eigenvalue problems include

G.H. Golub and J.H. Welsch (1969). "Calculation of Gauss Quadrature Rules," Math.
Camp. es, 221-30.
S. Friedland (1975). "On Inverse Multiplicative Eigenvalue Problems for Matrices," Lin.
Alg. and Its Applic. 12, 127-38.
634 CHAPTER 12. SPECIAL TOPICS

D.L. Boley and G.H. Golub (1978). "The Matrix Inverse Eigenvalue Problem for Peri-
odic Jacobi Matrice!," in Proc. Fourth Svmposium on Basic Problems of Numerical
Mathematics, Prague, pp. 63-76.
W.E. Ferguson (1980). ''The Construction of Jacobi and Periodic Jacobi Matrices with
Pr=ribed Spectra," Math. Camp. 35, 1203-1220.
J. Kautsky and G.H. Golub (1983). "On the Ca.iculation of Jacobi Matrices," Lin. Alg.
and It. Applic. 52/53, 439-456.
D. Boley and G.H. Golub (1984). "A Modified Method for Restructuring Periodic Jacobi
Matriee!," Math- Comp. 42, 143-150.
W.B. Gragg and W.J. Harrod (1984). "The Numerica.lly Stable Reconstruction of Jacobi
Matrices from Spectral Data," Numer. Math. 44, 317-336.
S. Friedland, J. Noceda.i, and M.L. Overton (1987). "The Formulation and Analysis of
Numerical MethodB for InveiBe Eigenvalue Problems," SIAM J. Numer. Anal. £4,
634-W7.
M.T. Chu (1992). "Numerical Methods for InveJBe Singular Value Problems," SIAM J.
Num. Anal. 29, 885-903.
G. Ammar and G. He (1995). "On an lnveiBe Eigenvalue Problem for Unitary Matrices,"
Lin. Alg. and Its Applic. 218, 263-271.
H. Zha and z. Zhang (1995). "A Note on Constructing a Symmetric Matrix with Spec-
ified Diagona.l Entries and Eigenvalues," BIT 35, 448-451.

Various Toeplitz eigenva.lue computations are presented in

G. Cybenko and C. Van Loan (1986). "Computing the Minimum Eigenvalue of a Sym-
metric Positive Definite Toeplitz Matrix," SIAM J. Sci. and Stat. Camp. 7, 123-131.
W.F. Trench (1989). "Numerical Solution of the Eigenvalue Problem for Hermitian
'Ibeplitz Matriee!," SIAM J. Matriz Anal. Appl. 10, 135-146.
L. Reichel and L.N. Trefethen (1992). "Eigenva.lues and Pseudo-eigenvalues of 'Ibeplitz
Matriee!," Lin. Alg. and Its Applic. 162/163/164, 153-186.
S.L. Handy and J.L. Barlow (1994). "Numerica.l Solution of the Eigenproblem for
Banded, Symmetric Toeplitz Matrices," SIAM J. Matriz AnaL Appl. 15, 205-214.

Unitary/orthogonal eigenvalue problems are treated in

H. Rutishauser (1966). "Bestimmung der Eigenwerte Orthogona.ler Matrizen," Numer.


Math. 9, 104-108.
P.J. Eberlein and C.P. Huang (1975). "Global Convergence of the QR Algorithm for
Unitary Matricft! with Some Results for Nonnal Matrices," SIAM J. Numer. Anal.
12, 421-453.
G. Cybenko (1985). "Computing Pisarenko Frequency Estimates," in Proceedings of
the Princeton Conference on Information Science and Systems, Dept. of Electrical
Engineering, Princeton University.
W. B. Gragg (1986). "The QR Algorithm for Unitary Hessenberg Matrices," J. Comp.
Appl. Math. 16, 1-8.
G.S. Amma:r, W.B. Gragg, and L. Reichel (1985). "On the Eigenproblem for Orthogona.l
Matrices," Proc. IEEE Conference on Deci8ion and Control, 1963-1966.
W.B. Gragg and L. Reichel (1990). "A Divide and Conquer Method for Unitary and
Orthogona.l Eigenproblems," Numer. Math. 57, 695-718.

Hamiltonian eigenproblems (see P12.6.13) occur throughout optimal control theory and
are very important.

C. C. Paige and C. Van Loan (1981). "A Schur Decomposition for Hamiltonian Matrices,"
Lin. Alg. and Its Applic. 41, 11-32.
C. Van Loan (1984). "A Symplectic Method for Approximating All the Eigenvalues of
a Hamiltonian Matrix," Lin. Alg. and Its Applic. 61, 233-252.
12.6. MODIFIED/STRUCTURED EIGENPROBLEMS 635

R. Byers (1986) "A Hamiltonian QR Algorithm," SIAM J. Sci. and Stat. Camp. 7,
212-229.
V. Mehrmann (1988). "A Symplectic Orthogonal Method for Single Input or Single
Output Discrete Time Optimal Quadratic Control Problems," SIAM J. Matriz Anal.
Appl. 9, 221-247.
G. Ammar and V. Mehrmann (1991). "On Hamiltonian and Symplectic Hessenberg
Fbrms," Lin.Alg. and Its Application 149, 55-72.
A. BuDBe-Gerstner, R. Byers, and V. Mebrmann (1992). "A Chart of Numerical Methods
for Structured Eigenvalue Problems," SIAM J. Matri:t: Anal Appl. 13, 419-453.
Other papers on modified/structured eigenvalue problems include

A. BuDBe-Gerstner and W.B. Gragg (1988). "Singular Value Decompositions of Complex


Symmetric Matrices," J. Comp. Applic. Matll. 21, 41-54.
R. Byers (1988). "A Bisection Method for Measuring the Distance of a Stable Matrix to
the Unstable Matrices," SIAM J. Sci. Stat. Comp. 9, 875-881.
J.W. Demmel and W. Gragg (1993). "On Computing Accurate Singular Values and
Eigenvalues of Matrices with Acyclic Graphs," Lin. Alg. and Its Applic. 185, 203-
217.
A. Bunse-Gerstner, R. Byers, and V. Mebrmann (1993). "Numerical Methods for Simul-
taneous Diagonalization," SIAM J. Matri:t: Anal. Appl. 14, 927-949.
Bibliography

J.O. ABSen (1971). "On the Reduction of a Symmetric Matrix to Tridiagona.l Form,"
BIT 11, 233-42.
N.N. Abdelmalek (1971). "Roundoff Error Analysis for Gram-Schmidt Method and
Solution of Linear Least Squares Problems," BIT 11, 345-{;8.
G.E. Adams, A.W. Bojanczyk, and F.T. Luk (1994). "Computing the PSVD of Two
2x2 Triangular Matrices,• SIAM J, Matri% Anal. AppL 15, 366-382.
L. Adams (1985). "m-step Preconditioned Congugate GrBdient Methods," SIAM J. Sci.
and Stat. Comp. 6, 452-463.
L. Adams and P. Arbenz (1994). "Towards a Divide and Conquer Algorithm for the Real
Nonsymmetric Eigenvalue Problem," SIAM J. Matri% Anal. Awl. 15, 1333-1353.
L. Adams and T. Crockett (1984). "Modeling Algorithm Execution Time on Processor
Arrays," Computer 17, 38-43.
L. Adams and H. Jordan (1986). "Is SOR Color-Blind?" SIAM J. Sci. Stat. Comp. 7,
49Q-506.
S.T. Alexander, C.T. Pan, and R.J. Plemmons (1988). "Analysis of a Recursive Least
Squares Hyperbolic Rotation Algorithm for Signal ProC<I!Sing," Lin. Alg. and It•
Applic. 98, 3--40.
E.L. Allgower (1973). "Exact Inverses of Certain Band Matrices," Numer. Math. Ill,
279-84.
A.R. Amir-Moez (1965 ). &tremol Properties of Linmr 'I'rowformatiow and Geometry
of Unitary Spaces, Texas Tech University Mathematics Series, no. 243, Lubbock,
Texas.
G.S. Ammar and W.B. Gragg (1988). "Superfast Solution of Real Positive Definite
Toeplitz Systems," SIAM J. Matri% Anal. Appl. 9, 61-76.
G.S. Ammar, W.B. Gragg, and L. Reichel (1985). "On the Eigenproblem for Orthogona.l
Matrices," Proc. IEEE Conference on Deci.ion and Control, 1963-1966.
G.S. Ammar and G. He (1995). "On an Invel'8e Eigenvalue Problem for Unitary Matri-
ces," Lin. Alg. and Its Applic. 1!18, 263-271.
G.S. Ammar and V. Mehrmann (1991). "On Hamiltonian and Symplectic Hessenberg
Forms," Lin.Aig. and ft. Applic. 1,49, 55-72.
P. Amodio and L. Brugnano (1995). "The Parallel QR Factorization Algorithm for
Tridiagonal Linear Systems," Pamllel Computing !H, 1097-1110.
C. Ancourt, F. Coelho, F. Irigoin, and R. Keryell (1993). "A Linear Algebra Framework
for Static HPF Code Distribution," PT'OC«iiings of the 4th Workshop on Compilers
for Pamllel Computers, Delft, The Netherlands.
A.A. Anda and H. Park (1994). "Fast Plane Rotations with Dynamic Scaling," SIAM
J. Matri% An.U. AppL 15, 162-174.
E. Anderson, z. Bai, C. Bischof, J. Demmel, J. Dongarra, J. DuCroz, A. Greenbaum,
S. Hammarling, A. McKenney, S. Ostrouchov, and D. Sorensen (1995). LAPACK
Users' Guide, Releaoe 2.0, 2nd ed., SIAM Publications, Phi!Bdelphia, PA.
E. Anderson, Z. Bal, and J. Dongarra (1992). "Generalized QR Factorization and Its
Application," Lin. Alg. and /Is Applic. 162/163/164, 243-271.
N. Anderson and I. Karasalo (1975). "On Computing Bounds for the Least Singular
Value of a Triangular Matrix," BIT 15, 1-4.

637
638 BIBLIOGRAPHY

P. Anderson a.nd G. Loizou (1973). "On the Quadratic Convergence of an Algorithm


Which Diagonalizes a Complex Symmetric Matrix," J. Inst. Math. Applic. Hl,
261-71.
P. Anderson a.nd G. Loizou (1976). "A Jacobi-Type Method for Complex Symmetric
Matrices (Handbook)," Numer. Math. B5, 347-£3.
H.C. Andrews and J. Kane ( 1970}. "Kronecker Matrices, Computer Implementation,
and Generalized Spectra," J. A ...oc. Comput. Mach. 11, 26<>-268.
P. Arbenz, W. Gander, and G.H. Golub (1988). "Restricted Rank Modification of the
Symmetric Eigenvalue Problem: Theoretical Considerations," Lin. Alg. and Its
Applic. 104, 75-95.
P. Arbenz and G.H. Golub (1988). "On the Spectral Decomposition of Hermitian Ma-
trices Subject to Indefinite Low Rank Perturbations with Applications," SIAM J.
Matrix Anal. Appl. 9, 40-58.
T.A. Arias, A. Edelman, and S. Smith (1996). "Conjugate Gradient and Newton's
Method on the Grassmann and Stiefel Manifolds," to appear in SIAM J. Matrix Anal.
Appl.
M. Arioli, J. Demmel, and I. Duff (1989). "Solving Sparse Linear Systems with Sparse
Backward Error," SIAM J. Matrix Anal. Appl. 10, 165-190.
M. Arioli, I.S. Duff, and D. Ruiz (1992). "Stopping Criteria for Iterative Solvers," SIAM
J. Matrix Anal. Appl. 13, 138-144.
M. Arioli and A. Laratta (1985). "Error Analysis of an Algorithm for Solving an Under-
determined System," Numer. Math. 46, 255-268.
M. Arioli and F. Romani (1985). "Relations Between Condition Numbers and the Con-
vergence of the Jacobi Method for Real Positive Definite Matrices," Numer. Math.
46, 31-42.
W.F. Arnold and A.J. Laub (1984). "Generalized Eigenproblem Algorithms and Software
for Algebraic Riccati Equations," Proc. IEEE 72, 1746-1754.
W.E. Arnoldi (1951). "The Principle of Minimized Iterations in the Solution of the
Matrix Eigenvalue Problem," Quarterly of Applied Mathematics 9, 17-29.
S. Ashby (1987). "Polynomial Preconditioning for Conjugate Gradient Methods," Ph.D.
Thesis, Dept. of Computer Science, University of Illinois.
S. Ashby, T. Manteuffel, and J. Otto (1992). "A Comparison of Adaptive Chebyshev
and Least Squares Polynomial Preconditioning for Hermitian Positive Definite Linear
Systems," SIAM J. Sci. Stat. Comp. 13, 1-29.
S. Ashby, T. Manteuffel, and P. Saylor (1989). "Adaptive Polynomial Preconditioning
for Hermitian Indefinite Linear Systems," BIT 29, 583-609.
S. Ashby, T. Manteuffel, and P. Saylor (1990). "A Taxonomy of Conjugate Gradient
Methods," SIAM J. Num. Anal. 27, 1542-1568.
C.C. Ashcraft and R. Grimes (1988). "On Vectorizing Incomplete Factorization and
SSOR Preconditioners," SIAM J. Sci. and Stat. Comp. 9, 122-151.
G. Auchmuty (1991). "A Posteriori Error Estimates for Linear Equations," Numer.
Math. 61, 1-6.
O. Axelsson (1977). "Solution of Linear Systems of Equations: Iterative Methods," in
Sparse Matrix Techniques: Copenhagen, 1976, ed. V.A. Barker, Springer-Verlag,
Berlin.
O. Axelsson (1980). "Conjugate Gradient Type Methods for Unsymmetric and Incon-
sistent Systems of Linear Equations," Lin. Alg. and Its Applic. 29, 1-16.
O. Axelsson (1985). "Incomplete Block Matrix Factorization Preconditioning Methods.
The Ultimate Answer?", J. Comput. Appl. Math. 12&13, 3-18.
O. Axelsson (1985). "A Survey of Preconditioned Iterative Methods for Linear Systems
of Equations," BIT 25, 166-187.
O. Axelsson (1986). "A General Incomplete Block Matrix Factorization Method," Lin.
Alg. Appl. 74, 179-190.
O. Axelsson, ed. (1989). "Preconditioned Conjugate Gradient Methods," BIT 29:4.
O. Axelsson (1994). Iterative Solution Methods, Cambridge University Press.

O. Axelsson and V. Eijkhout (1989). "Vectorizable Preconditioners for Elliptic Difference
Equations in Three Space Dimensions," J. Comput. Appl. Math. 27, 299-321.
O. Axelsson and G. Lindskog (1986). "On the Rate of Convergence of the Preconditioned
Conjugate Gradient Method," Numer. Math. 48, 499-523.
O. Axelsson and B. Polman (1986). "On Approximate Factorization Methods for Block
Matrices Suitable for Vector and Parallel Processors," Lin. Alg. and Its Applic. 77,
3-26.
O. Axelsson and P. Vassilevski (1989). "Algebraic Multilevel Preconditioning Methods
I," Numer. Math. 56, 157-177.
O. Axelsson and P. Vassilevski (1990). "Algebraic Multilevel Preconditioning Methods
II," SIAM J. Numer. Anal. 27, 1569-1590.
Z. Bai (1988). "Note on the Quadratic Convergence of Kogbetliantz's Algorithm for
Computing the Singular Value Decomposition," Lin. Alg. and Its Applic. 104,
131-140.
Z. Bai (1994). "Error Analysis of the Lanczos Algorithm for Nonsymmetric Eigenvalue
Problem," Math. Comp. 62, 209-226.
Z. Bai and J.W. Demmel (1989). "On a Block Implementation of Hessenberg Multishift
QR Iteration," Int'l J. of High Speed Comput. 1, 97-112.
Z. Bai and J.W. Demmel (1993). "On Swapping Diagonal Blocks in Real Schur Form,"
Lin. Alg. and Its Applic. 186, 73-95.
Z. Bai, J.W. Demmel, and A. McKenney (1993). "On Computing Condition Numbers
for the Nonsymmetric Eigenproblem," ACM Trans. Math. Soft. 19, 202-223.
Z. Bai, D. Hu, and L. Reichel (1994). "A Newton Basis GMRES Implementation," IMA
J. Num. Anal. 14, 563-581.
Z. Bai and H. Zha (1993). "A New Preprocessing Algorithm for the Computation of the
Generalized Singular Value Decomposition," SIAM J. Sci. Comp. 14, 1007-1012.
D.H. Bailey (1988). "Extra High Speed Matrix Multiplication on the Cray-2," SIAM J.
Sci. and Stat. Comp. 9, 603-607.
D.H. Bailey (1993). "Algorithm 719: Multiprecision Translation and Execution of FOR-
TRAN Programs," ACM Trans. Math. Soft. 19, 288-319.
D.H. Bailey, K. Lee, and H.D. Simon (1991). "Using Strassen's Algorithm to Accelerate
the Solution of Linear Systems," J. Supercomputing 4, 357-371.
D.H. Bailey, H.D. Simon, J.T. Barton, M.J. Fouts (1989). "Floating Point Arithmetic
in Future Supercomputers," Int'l J. Supercomputing Appl. 3, 86-90.
I.Y. Bar-Itzhack (1975). "Iterative Optimal Orthogonalization of the Strapdown Ma-
trix," IEEE Trans. Aerospace and Electronic Systems 11, 30-37.
J.L. Barlow (1986). "On the Smallest Positive Singular Value of an M-Matrix with
Applications to Ergodic Markov Chains," SIAM J. Alg. and Disc. Struct. 7, 414-
424.
J.L. Barlow (1988). "Error Analysis and Implementation Aspects of Deferred Correction
for Equality Constrained Least-Squares Problems," SIAM J. Num. Anal. 25, 1340-
1358.
J.L. Barlow (1993). "Error Analysis of Update Methods for the Symmetric Eigenvalue
Problem," SIAM J. Matrix Anal. Appl. 14, 598-618.
J.L. Barlow and J. Demmel (1990). "Computing Accurate Eigensystems of Scaled Di-
agonally Dominant Matrices," SIAM J. Numer. Anal. 27, 762-791.
J.L. Barlow and S.L. Handy (1988). "The Direct Solution of Weighted and Equality
Constrained Least-Squares Problems," SIAM J. Sci. Stat. Comp. 9, 704-716.
J.L. Barlow, M.M. Monahemi, and D.P. O'Leary (1992). "Constrained Matrix Sylvester
Equations," SIAM J. Matrix Anal. Appl. 13, 1-9.
J.L. Barlow, N.K. Nichols, and R.J. Plemmons (1988). "Iterative Methods for Equality
Constrained Least Squares Problems," SIAM J. Sci. and Stat. Comp. 9, 892-906.
J.L. Barlow and U.B. Vemulapati (1992). "Rank Detection Methods for Sparse Matri-
ces," SIAM J. Matrix Anal. Appl. 13, 1279-1297.
J.L. Barlow and U.B. Vemulapati (1992). "A Note on Deferred Correction for Equality
Constrained Least Squares Problems," SIAM J. Num. Anal. 29, 249-256.

S. Barnett and C. Storey (1968). "Some Applications of the Lyapunov Matrix Equation,"
J. Inst. Math. Applic. 4, 33-42.
R. Barrett, M. Berry, T.F. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout, R.
Pozo, C. Romine, H. van der Vorst (1993). Templates for the Solution of Linear
Systems: Building Blocks for Iterative Methods, SIAM Publications, Philadelphia,
PA.
A. Barrlund (1991). "Perturbation Bounds for the LDLT and LU Decompositions,"
BIT 31, 358-363.
A. Barrlund (1994). "Perturbation Bounds for the Generalized QR Factorization," Lin.
Alg. and Its Applic. 207, 251-271.
R.H. Bartels (1971). "A Stabilization of the Simplex Method," Numer. Math. 16,
414-434.
R.H. Bartels, A.R. Conn, and C. Charalambous (1978). "On Cline's Direct Method for
Solving Overdetermined Linear Systems in the L∞ Sense," SIAM J. Num. Anal. 15,
255-70.
R.H. Bartels and G.W. Stewart (1972). "Solution of the Equation AX + XB = C,"
Comm. ACM 15, 820-26.
S.G. Bartels and D.J. Higham (1992). "The Structured Sensitivity of Vandermonde-Like
Systems," Numer. Math. 62, 17-34.
W. Barth, R.S. Martin, and J.H. Wilkinson (1967). "Calculation of the Eigenvalues of
a Symmetric Tridiagonal Matrix by the Method of Bisection," Numer. Math. 9,
386-93. See also Wilkinson and Reinsch (1971, 249-256).
V. Barwell and J.A. George (1976). "A Comparison of Algorithms for Solving Symmetric
Indefinite Systems of Linear Equations," ACM Trans. Math. Soft. 2, 242-51.
K.J. Bathe and E.L. Wilson (1973). "Solution Methods for Eigenvalue Problems in
Structural Mechanics," Int. J. Numer. Meth. Eng. 6, 213-26.
S. Batterson (1994). "Convergence of the Francis Shifted QR Algorithm on Normal
Matrices," Lin. Alg. and Its Applic. 201, 181-195.
S. Batterson and J. Smillie (1989). "The Dynamics of Rayleigh Quotient Iteration,"
SIAM J. Num. Anal. 26, 621-636.
D. Bau, I. Kodukula, V. Kotlyar, K. Pingali, and P. Stodghill (1993). "Solving Alignment
Using Elementary Linear Algebra," in Proceedings of the 7th International Workshop
on Languages and Compilers for Parallel Computing, Lecture Notes in Computer
Science 892, Springer-Verlag, New York, 46-60.
F.L. Bauer (1963). "Optimally Scaled Matrices," Numer. Math. 5, 73-87.
F.L. Bauer (1965). "Elimination with Weighted Row Combinations for Solving Lin-
ear Equations and Least Squares Problems," Numer. Math. 7, 338-52. See also
Wilkinson and Reinsch (1971, 119-33).
F.L. Bauer and C.T. Fike (1960). "Norms and Exclusion Theorems," Numer. Math. 2,
123-44.
F.L. Bauer and C. Reinsch (1968). "Rational QR Transformations with Newton Shift
for Symmetric Tridiagonal Matrices," Numer. Math. 11, 264-72. See also Wilkinson
and Reinsch (1971, pp. 257-65).
F.L. Bauer and C. Reinsch (1971). "Inversion of Positive Definite Matrices by the Gauss-
Jordan Method," in Handbook for Automatic Computation Vol. 2, Linear Algebra,
J.H. Wilkinson and C. Reinsch, eds. Springer-Verlag, New York, 45-49.
C. Bavely and G.W. Stewart (1979). "An Algorithm for Computing Reducing Subspaces
by Block Diagonalization," SIAM J. Num. Anal. 16, 359-67.
C. Beattie and D.W. Fox (1989). "Localization Criteria and Containment for Rayleigh
Quotient Iteration," SIAM J. Matrix Anal. Appl. 10, 80-93.
R. Beauwens and P. de Groen, eds. (1992). Iterative Methods in Linear Algebra, Elsevier
(North-Holland), Amsterdam.
T. Beelen and P. Van Dooren (1988). "An Improved Algorithm for the Computation of
Kronecker's Canonical Form of a Singular Pencil," Lin. Alg. and Its Applic. 105,
9-65.
R. Bellman (1970). Introduction to Matrix Analysis, 2nd ed., McGraw-Hill, New York.

E. Beltrami (1873). "Sulle Funzioni Bilineari," Giornale di Matematiche 11, 98-106.


A. Berman and A. Ben-Israel (1971). "A Note on Pencils of Hermitian or Symmetric
Matrices," SIAM J. Applic. Math. 21, 51-54.
A. Berman and R.J. Plemmons (1979). Nonnegative Matrices in the Mathematical Sci-
ences, Academic Press, New York.
M.J.M. Bernal and J.H. Verner (1968). "On Generalizations of the Theory of Consistent
Orderings for Successive Over-Relaxation Methods," Numer. Math. 12, 215-22.
J. Berntsen (1989). "Communication Efficient Matrix Multiplication on Hypercubes,"
Parallel Computing 12, 335-342.
M.W. Berry, J.J. Dongarra, and Y. Kim (1995). "A Parallel Algorithm for the Reduction
of a Nonsymmetric Matrix to Block Upper Hessenberg Form," Parallel Computing
21, 1189-1211.
M.W. Berry and G.H. Golub (1991). "Estimating the Largest Singular Values of Large
Sparse Matrices via Modified Moments," Numerical Algorithms 1, 353-374.
M.W. Berry and A. Sameh (1986). "Multiprocessor Jacobi Algorithms for Dense Sym-
metric Eigenvalue and Singular Value Decompositions," in Proc. International Con-
ference on Parallel Processing, 433-440.
D.P. Bertsekas and J.N. Tsitsiklis (1989). Parallel and Distributed Computation: Nu-
merical Methods, Prentice Hall, Englewood Cliffs, NJ.
R. Bevilacqua, B. Codenotti, and F. Romani (1988). "Parallel Solution of Block Tridi-
agonal Linear Systems," Lin. Alg. and Its Applic. 104, 39-57.
C.H. Bischof (1987). "The Two-Sided Block Jacobi Method on Hypercube Architec-
tures," in Hypercube Multiprocessors, ed. M.T. Heath, SIAM Publications, Philadel-
phia, PA.
C.H. Bischof (1988). "QR Factorization Algorithms for Coarse Grain Distributed Sys-
tems," PhD Thesis, Dept. of Computer Science, Cornell University, Ithaca, NY.
C.H. Bischof (1989). "Computing the Singular Value Decomposition on a Distributed
System of Vector Processors," Parallel Computing 11, 171-186.
C.H. Bischof (1990). "Incremental Condition Estimation," SIAM J. Matrix Anal. Appl.
11, 644-659.
C.H. Bischof (1990). "Incremental Condition Estimation for Sparse Matrices," SIAM J.
Matrix Anal. Appl. 11, 312-322.
C.H. Bischof and P.C. Hansen (1992). "A Block Algorithm for Computing Rank-
Revealing QR Factorizations," Numerical Algorithms 2, 371-392.
C.H. Bischof and G.M. Shroff (1992). "On Updating Signal Subspaces," IEEE Trans.
Signal Proc. 40, 96-105.
C.H. Bischof and C. Van Loan (1986). "Computing the SVD on a Ring of Array Proces-
sors," in Large Scale Eigenvalue Problems, eds. J. Cullum and R. Willoughby, North
Holland, 51-66.
C.H. Bischof and C. Van Loan (1987). "The WY Representation for Products of House-
holder Matrices," SIAM J. Sci. and Stat. Comp. 8, s2-s13.
A. Björck (1967). "Iterative Refinement of Linear Least Squares Solutions I," BIT 7,
257-78.
A. Björck (1967). "Solving Linear Least Squares Problems by Gram-Schmidt Orthogo-
nalization," BIT 7, 1-21.
A. Björck (1968). "Iterative Refinement of Linear Least Squares Solutions II," BIT 8,
8-30.
A. Björck (1984). "A General Updating Algorithm for Constrained Linear Least Squares
Problems," SIAM J. Sci. and Stat. Comp. 5, 394-402.
A. Björck (1987). "Stability Analysis of the Method of Seminormal Equations for Linear
Least Squares Problems," Linear Alg. and Its Applic. 88/89, 31-48.
A. Björck (1991). "Component-wise Perturbation Analysis and Error Bounds for Linear
Least Squares Solutions," BIT 31, 238-244.
A. Björck (1992). "Pivoting and Stability in the Augmented System Method," Proceed-
ings of the 14th Dundee Conference, D.F. Griffiths and G.A. Watson (eds), Longman
Scientific and Technical, Essex, U.K.

A. Björck (1994). "Numerics of Gram-Schmidt Orthogonalization," Lin. Alg. and Its
Applic. 197/198, 297-316.
A. Björck (1996). Numerical Methods for Least Squares Problems, SIAM Publications,
Philadelphia, PA.
A. Björck and C. Bowie (1971). "An Iterative Algorithm for Computing the Best Esti-
mate of an Orthogonal Matrix," SIAM J. Num. Anal. 8, 358-64.
A. Björck and I.S. Duff (1980). "A Direct Method for the Solution of Sparse Linear
Least Squares Problems," Lin. Alg. and Its Applic. 34, 43-67.
A. Björck and T. Elfving (1973). "Algorithms for Confluent Vandermonde Systems,"
Numer. Math. 21, 130-37.
A. Björck and G.H. Golub (1967). "Iterative Refinement of Linear Least Squares Solu-
tions by Householder Transformation," BIT 7, 322-37.
A. Björck and G.H. Golub (1973). "Numerical Methods for Computing Angles Between
Linear Subspaces," Math. Comp. 27, 579-94.
A. Björck, E. Grimme, and P. Van Dooren (1994). "An Implicit Shift Bidiagonalization
Algorithm for Ill-Posed Problems," BIT 34, 510-534.
A. Björck and S. Hammarling (1983). "A Schur Method for the Square Root of a Matrix,"
Lin. Alg. and Its Applic. 52/53, 127-140.
P. Bjørstad, F. Manne, T. Sørevik, and M. Vajteršic (1992). "Efficient Matrix Multipli-
cation on SIMD Computers," SIAM J. Matrix Anal. Appl. 13, 386-401.
A. Björck and C.C. Paige (1992). "Loss and Recapture of Orthogonality in the Modified
Gram-Schmidt Algorithm," SIAM J. Matrix Anal. Appl. 13, 176-190.
A. Björck and C.C. Paige (1994). "Solution of Augmented Linear Systems Using Or-
thogonal Factorizations," BIT 34, 1-24.
A. Björck, H. Park, and L. Elden (1994). "Accurate Downdating of Least Squares
Solutions," SIAM J. Matrix Anal. Appl. 15, 549-568.
A. Björck and V. Pereyra (1970). "Solution of Vandermonde Systems of Equations," Math.
Comp. 24, 893-903.
A. Björck, R.J. Plemmons, and H. Schneider, eds. (1981). Large-Scale Matrix Problems,
North-Holland, New York.
J.M. Blue (1978). "A Portable FORTRAN Program to Find the Euclidean Norm of a
Vector," ACM Trans. Math. Soft. 4, 15-23.
E. Bodewig (1959). Matrix Calculus, North Holland, Amsterdam.
Z. Bohte (1975). "Bounds for Rounding Errors in the Gaussian Elimination for Band
Systems," J. Inst. Math. Applic. 16, 133-42.
A.W. Bojanczyk, R.P. Brent, and F.R. de Hoog (1986). "QR Factorization of Toeplitz
Matrices," Numer. Math. 49, 81-94.
A.W. Bojanczyk, R.P. Brent, F.R. de Hoog, and D.R. Sweet (1995). "On the Stability of
the Bareiss and Related Toeplitz Factorization Algorithms," SIAM J. Matrix Anal.
Appl. 16, 40-57.
A.W. Bojanczyk, R.P. Brent, P. Van Dooren, and F.R. de Hoog (1987). "A Note on
Downdating the Cholesky Factorization," SIAM J. Sci. and Stat. Comp. 8, 210-221.
A.W. Bojanczyk and G. Cybenko, eds. (1995). Linear Algebra for Signal Processing,
IMA Volumes in Mathematics and Its Applications, Springer-Verlag, New York.
A.W. Bojanczyk, R. Onn, and A.O. Steinhardt (1993). "Existence of the Hyperbolic
Singular Value Decomposition," Lin. Alg. and Its Applic. 185, 21-30.
D.L. Boley and G.H. Golub (1978). "The Matrix Inverse Eigenvalue Problem for Peri-
odic Jacobi Matrices," in Proc. Fourth Symposium on Basic Problems of Numerical
Mathematics, Prague, pp. 63-76.
D.L. Boley and G.H. Golub (1984). "A Modified Method for Restructuring Periodic
Jacobi Matrices," Math. Comp. 42, 143-150.
D.L. Boley and G.H. Golub (1984). "The Lanczos-Arnoldi Algorithm and Controllabil-
ity," Syst. Control Lett. 4, 317-324.
D.L. Boley and G.H. Golub (1987). "A Survey of Matrix Inverse Eigenvalue Problems,"
Inverse Problems 3, 595-622.

H. Bolz and W. Niethammer (1988). "On the Evaluation of Matrix Functions Given by
Power Series," SIAM J. Matrix Anal. Appl. 9, 202-209.
S. Bondeli and W. Gander (1994). "Cyclic Reduction for Special Tridiagonal Systems,"
SIAM J. Matrix Anal. Appl. 15, 321-330.
J. Boothroyd and P.J. Eberlein (1968). "Solution to the Eigenproblem by a Norm-
Reducing Jacobi-Type Method (Handbook)," Numer. Math. 11, 1-12. See also
Wilkinson and Reinsch (1971, pp. 327-38).
H.J. Bowdler, R.S. Martin, G. Peters, and J.H. Wilkinson (1966). "Solution of Real
and Complex Systems of Linear Equations," Numer. Math. 8, 217-234. See also
Wilkinson and Reinsch (1971, 93-110).
H.J. Bowdler, R.S. Martin, C. Reinsch, and J.H. Wilkinson (1968). "The QR and QL
Algorithms for Symmetric Matrices," Numer. Math. 11, 293-306. See also Wilkinson
and Reinsch (1971, pp. 227-240).
J.H. Bramble, J.E. Pasciak, and A.H. Schatz (1986). "The Construction of Precondition-
ers for Elliptic Problems by Substructuring I," Math. Comp. 47, 103-134.
J.H. Bramble, J.E. Pasciak, and A.H. Schatz (1986). "The Construction of Precondition-
ers for Elliptic Problems by Substructuring II," Math. Comp. 49, 1-17.
R. Bramley and A. Sameh (1992). "Row Projection Methods for Large Nonsymmetric
Linear Systems," SIAM J. Sci. Statist. Comput. 13, 168-193.
R.P. Brent (1970). "Error Analysis of Algorithms for Matrix Multiplication and Trian-
gular Decomposition Using Winograd's Identity," Numer. Math. 16, 145-156.
R.P. Brent (1978). "A Fortran Multiple Precision Arithmetic Package," ACM Trans.
Math. Soft. 4, 57-70.
R.P. Brent (1978). "Algorithm 524 MP, a Fortran Multiple Precision Arithmetic Pack-
age," ACM Trans. Math. Soft. 4, 71-81.
R.P. Brent and F.T. Luk (1982) "Computing the Cholesky Factorization Using a Systolic
Architecture," Proc. 6th Australian Computer Science Conf. 295-302.
R.P. Brent and F.T. Luk (1985). "The Solution of Singular Value and Symmetric Eigen-
value Problems on Multiprocessor Arrays," SIAM J. Sci. and Stat. Comp. 6, 69-84.
R.P. Brent, F.T. Luk, and C. Van Loan (1985). "Computation of the Singular Value
Decomposition Using Mesh Connected Processors," J. VLSI Computer Systems 1,
242-270.
C. Brezinski and M. Redivo-Zaglia (1994). "Treatment of Near-Breakdown in the CGS
Algorithms," Numer. Alg. 7, 33-73.
C. Brezinski and M. Redivo-Zaglia (1995). "Look-Ahead in BiCGSTAB and Other
Product-Type Methods for Linear Systems," BIT 35, 169-201.
C. Brezinski and H. Sadok (1991). "Avoiding Breakdown in the CGS Algorithm," Nu-
mer. Alg. 1, 199-206.
C. Brezinski, M. Zaglia, and H. Sadok (1991). "Avoiding Breakdown and Near Break-
down in Lanczos Type Algorithms," Numer. Alg. 1, 261-284.
C. Brezinski, M. Zaglia, and H. Sadok (1992). "A Breakdown Free Lanczos Type Algo-
rithm for Solving Linear Systems," Numer. Math. 63, 29-38.
K.W. Brodlie and M.J.D. Powell (1975). "On the Convergence of Cyclic Jacobi Meth-
ods," J. Inst. Math. Applic. 15, 279-87.
J.D. Brown, M.T. Chu, D.C. Ellison, and R.J. Plemmons, eds. (1994). Proceedings
of the Cornelius Lanczos International Centenary Conference, SIAM Publications,
Philadelphia, PA.
C.G. Broyden (1973). "Some Condition Number Bounds for the Gaussian Elimination
Process," J. Inst. Math. Applic. 12, 273-86.
A. Buckley (1974). "A Note on Matrices A = I+ H, H Skew-Symmetric," Z. Angew.
Math. Mech. 54, 125-26.
A. Buckley (1977). "On the Solution of Certain Skew-Symmetric Linear Systems," SIAM
J. Num. Anal. 14, 566-70.
J.R. Bunch (1971). "Analysis of the Diagonal Pivoting Method," SIAM J. Num. Anal.
8, 656-680.

J.R. Bunch (1971). "Equilibration of Symmetric Matrices in the Max-Norm," J. ACM
18, 566-72.
J.R. Bunch (1974). "Partial Pivoting Strategies for Symmetric Matrices," SIAM J. Num.
Anal. 11, 521-528.
J.R. Bunch (1976). "Block Methods for Solving Sparse Linear Systems," in Sparse
Matrix Computations, J.R. Bunch and D.J. Rose (eds), Academic Press, New York.
J.R. Bunch (1982). "A Note on the Stable Decomposition of Skew Symmetric Matrices,"
Math. Comp. 38, 475-480.
J.R. Bunch (1985). "Stability of Methods for Solving Toeplitz Systems of Equations,"
SIAM J. Sci. Stat. Comp. 6, 349-364.
J.R. Bunch, J.W. Demmel, and C.F. Van Loan (1989). "The Strong Stability of Algo-
rithms for Solving Symmetric Linear Systems," SIAM J. Matrix Anal. Appl. 10,
494-499.
J.R. Bunch and L. Kaufman (1977). "Some Stable Methods for Calculating Inertia and
Solving Symmetric Linear Systems," Math. Comp. 31, 162-79.
J.R. Bunch, L. Kaufman, and B.N. Parlett (1976). "Decomposition of a Symmetric
Matrix," Numer. Math. 27, 95-109.
J.R. Bunch, C.P. Nielsen, and D.C. Sorensen (1978). "Rank-One Modification of the
Symmetric Eigenproblem," Numer. Math. 31, 31-48.
J.R. Bunch and B.N. Parlett (1971). "Direct Methods for Solving Symmetric Indefinite
Systems of Linear Equations," SIAM J. Num. Anal. 8, 639-55.
J.R. Bunch and D.J. Rose, eds. (1976). Sparse Matrix Computations, Academic Press,
New York.
O. Buneman (1969). "A Compact Non-Iterative Poisson Solver," Report 294, Stanford
University Institute for Plasma Research, Stanford, California.
A. Bunse-Gerstner (1984). "An Algorithm for the Symmetric Generalized Eigenvalue
Problem," Lin. Alg. and Its Applic. 58, 43-68.
A. Bunse-Gerstner, R. Byers, and V. Mehrmann (1992). "A Chart of Numerical Methods
for Structured Eigenvalue Problems," SIAM J. Matrix Anal. Appl. 13, 419-453.
A. Bunse-Gerstner, R. Byers, and V. Mehrmann (1993). "Numerical Methods for Simul-
taneous Diagonalization," SIAM J. Matrix Anal. Appl. 14, 927-949.
A. Bunse-Gerstner and W.B. Gragg (1988). "Singular Value Decompositions of Complex
Symmetric Matrices," J. Comp. Applic. Math. 21, 41-54.
J.V. Burke and M.L. Overton (1992). "Stable Perturbations of Nonsymmetric Matrices,"
Lin. Alg. and Its Applic. 171, 249-273.
P.A. Businger (1968). "Matrices Which Can be Optimally Scaled," Numer. Math. 12,
346-48.
P.A. Businger (1969). "Reducing a Matrix to Hessenberg Form," Math. Comp. 23,
819-21.
P.A. Businger (1971). "Monitoring the Numerical Stability of Gaussian Elimination,"
Numer. Math. 16, 360-61.
P.A. Businger (1971). "Numerically Stable Deflation of Hessenberg and Symmetric Tridi-
agonal Matrices," BIT 11, 262-70.
P.A. Businger and G.H. Golub (1965). "Linear Least Squares Solutions by House-
holder Transformations," Numer. Math. 7, 269-76. See also Wilkinson and Reinsch
(1971, 111-18).
P.A. Businger and G.H. Golub (1969). "Algorithm 358: Singular Value Decomposition
of a Complex Matrix," Comm. Assoc. Comp. Mach. 12, 564-65.
B.L. Buzbee (1986) "A Strategy for Vectorization," Parallel Computing 3, 187-192.
B.L. Buzbee and F.W. Dorr (1974). "The Direct Solution of the Biharmonic Equation
on Rectangular Regions and the Poisson Equation on Irregular Regions," SIAM J.
Num. Anal. 11, 753-63.
B.L. Buzbee, F.W. Dorr, J.A. George, and G.H. Golub (1971). "The Direct Solution of
the Discrete Poisson Equation on Irregular Regions," SIAM J. Num. Anal. 8, 722-36.
B.L. Buzbee, G.H. Golub, and C.W. Nielson (1970). "On Direct Methods for Solving
Poisson's Equations," SIAM J. Num. Anal. 7, 627-56.

R. Byers (1984). "A Linpack-Style Condition Estimator for the Equation AX - XB^T =
C," IEEE Trans. Auto. Cont. AC-29, 926-928.
R. Byers (1986) "A Hamiltonian QR Algorithm," SIAM J. Sci. and Stat. Comp. 7,
212-229.
R. Byers (1988). "A Bisection Method for Measuring the Distance of a Stable Matrix to
the Unstable Matrices," SIAM J. Sci. Stat. Comp. 9, 875-881.
R. Byers and S.G. Nash (1987). "On the Singular Vectors of the Lyapunov Operator,"
SIAM J. Alg. and Disc. Methods 8, 59-66.
X.-C. Cai and O. Widlund (1993). "Multiplicative Schwarz Algorithms for Some Non-
symmetric and Indefinite Problems," SIAM J. Numer. Anal. 30, 936-952.
D. Calvetti, G.H. Golub, and L. Reichel (1994). "An Adaptive Chebyshev Iterative
Method for Nonsymmetric Linear Systems Based on Modified Moments," Numer.
Math. 67, 21-40.
D. Calvetti and L. Reichel (1992). "A Chebychev-Vandermonde Solver," Lin. Alg. and
Its Applic. 172, 219-229.
D. Calvetti and L. Reichel (1993). "Fast Inversion of Vandermonde-Like Matrices In-
volving Orthogonal Polynomials," BIT 33, 473-484.
D. Calvetti, L. Reichel, and D.C. Sorensen (1994). "An Implicitly Restarted Lanczos
Method for Large Symmetric Eigenvalue Problems," ETNA 2, 1-21.
L.E. Cannon (1969). A Cellular Computer to Implement the Kalman Filter Algorithm,
Ph.D. Thesis, Montana State University.
R. Carter (1991). "Y-MP Floating Point and Cholesky Factorization," Int'l J. High
Speed Computing 3, 215-222.
F. Chaitin-Chatelin and V. Frayssé (1996). Lectures on Finite Precision Computations,
SIAM Publications, Philadelphia, PA.
R.H. Chan (1989). "The Spectrum of a Family of Circulant Preconditioned Toeplitz
Systems," SIAM J. Num. Anal. 26, 503-506.
R.H. Chan (1991). "Preconditioners for Toeplitz Systems with Nonnegative Generating
Functions," IMA J. Num. Anal. 11, 333-345.
R.H. Chan, J.G. Nagy, and R.J. Plemmons (1993). "FFT-Based Preconditioners for
Toeplitz Block Least Squares Problems," SIAM J. Num. Anal. 30, 1740-1768.
R.H. Chan, J.G. Nagy, and R.J. Plemmons (1994). "Circulant Preconditioned Toeplitz
Least Squares Iterations," SIAM J. Matrix Anal. Appl. 15, 80-97.
S.P. Chan and B.N. Parlett (1977). "Algorithm 517: A Program for Computing the
Condition Numbers of Matrix Eigenvalues Without Computing Eigenvectors," ACM
Trans. Math. Soft. 3, 186-203.
T.F. Chan (1982). "An Improved Algorithm for Computing the Singular Value Decom-
position," ACM Trans. Math. Soft. 8, 72-83.
T.F. Chan (1984). "Deflated Decomposition Solutions of Nearly Singular Systems,"
SIAM J. Num. Anal. 21, 738-754.
T.F. Chan (1985). "On the Existence and Computation of LU Factorizations with small
pivots," Math. Comp. 42, 535-548.
T.F. Chan (1987). "Rank Revealing QR Factorizations," Lin. Alg. and Its Applic.
88/89, 67-82.
T.F. Chan (1988). "An Optimal Circulant Preconditioner for Toeplitz Systems," SIAM
J. Sci. Stat. Comp. 9, 766-771.
T.F. Chan (1991). "Fourier Analysis of Relaxed Incomplete Factorization Precondition-
ers," SIAM J. Sci. Statist. Comput. 12, 668-680.
T.F. Chan and P. Hansen (1992). "A Look-Ahead Levinson Algorithm for Indefinite
Toeplitz Systems," SIAM J. Matrix Anal. Appl. 13, 490-506.
T.F. Chan and P. Hansen (1992). "Some Applications of the Rank Revealing QR Fac-
torization," SIAM J. Sci. and Stat. Comp. 13, 727-741.
T.F. Chan, K.R. Jackson, and B. Zhu (1983). "Alternating Direction Incomplete Fac-
torizations," SIAM J. Numer. Anal. 20, 239-257.
T.F. Chan and J.A. Olkin (1994). "Circulant Preconditioners for Toeplitz Block Matri-
ces," Numerical Algorithms 6, 89-101.

T.F. Chan, J.A. Olkin, and D. Cooley (1992). "Solving Quadratically Constrained Least
Squares Using Black Box Solvers," BIT 32, 481-495.
S. Chandrasekaran and I.C.F. Ipsen (1994). "On Rank-Revealing Factorizations," SIAM
J. Matrix Anal. Appl. 15, 592-622.
S. Chandrasekaran and I.C.F. Ipsen (1994). "Backward Errors for Eigenvalue and Sin-
gular Value Decompositions," Numer. Math. 68, 215-223.
S. Chandrasekaran and I.C.F. Ipsen (1995). "On the Sensitivity of Solution Components
in Linear Systems of Equations," SIAM J. Matrix Anal. Appl. 16, 93-112.
H.Y. Chang and M. Salama (1988). "A Parallel Householder Tridiagonalization Strategy
Using Scattered Square Decomposition," Parallel Computing 6, 297-312.
J.P. Charlier, M. Vanbegin, P. Van Dooren (1988). "On Efficient Implementation of
Kogbetliantz's Algorithm for Computing the Singular Value Decomposition," Numer.
Math. 52, 279-300.
J.P. Charlier and P. Van Dooren (1987). "On Kogbetliantz's SVD Algorithm in the
Presence of Clusters," Lin. Alg. and Its Applic. 95, 135-160.
B.A. Chartres and J.C. Geuder (1967). "Computable Error Bounds for Direct Solution
of Linear Equations," J. ACM 14, 63-71.
F. Chatelin (1993). Eigenvalues of Matrices, John Wiley and Sons, New York.
S. Chen, J. Dongarra, and C. Hsiung (1984). "Multiprocessing Linear Algebra Algo-
rithms on the Cray X-MP-2: Experiences with Small Granularity," J. Parallel and
Distributed Computing 1, 22-31.
S. Chen, D. Kuck, and A. Sameh (1978). "Practical Parallel Band Triangular Systems
Solvers," ACM Trans. Math. Soft. 4, 270-277.
K.H. Cheng and S. Sahni (1987). "VLSI Systems for Band Matrix Multiplication,"
Parallel Computing 4, 239-258.
R.C. Chin, T.A. Manteuffel, and J. de Pillis (1984). "ADI as a Preconditioning for
Solving the Convection-Diffusion Equation," SIAM J. Sci. and Stat. Comp. 5,
281-299.
J. Choi, J.J. Dongarra, and D.W. Walker (1995). "Parallel Matrix Transpose Algorithms
on Distributed Memory Concurrent Computers," Parallel Computing 21, 1387-1406.
M.T. Chu (1992). "Numerical Methods for Inverse Singular Value Problems," SIAM J.
Num. Anal. 29, 885-903.
M.T. Chu, R.E. Funderlic, and G.H. Golub (1995). "A Rank-One Reduction Formula
and Its Applications to Matrix Factorizations," SIAM Review 37, 512--530.
P.G. Ciarlet (1989). Introduction to Numerical Linear Algebra and Optimisation, Cam-
bridge University Press.
A.K. Cline (1973). "An Elimination Method for the Solution of Linear Least Squares
Problems," SIAM J. Num. Anal. 10, 283-89.
A.K. Cline (1976). "A Descent Method for the Uniform Solution to Overdetermined
Systems of Equations," SIAM J. Num. Anal. 13, 293-309.
A.K. Cline, A.R. Conn, and C. Van Loan (1982). "Generalizing the LINPACK Condition
Estimator," in Numerical Analysis, ed., J.P. Hennart, Lecture Notes in Mathematics
no. 909, Springer-Verlag, New York.
A.K. Cline, G.H. Golub, and G.W. Platzman (1976). "Calculation of Normal Modes of
Oceans Using a Lanczos Method," in Sparse Matrix Computations, ed. J.R. Bunch
and D.J. Rose, Academic Press, New York, pp. 409-26.
A.K. Cline, C.B. Moler, G.W. Stewart, and J.H. Wilkinson (1979). "An Estimate for
the Condition Number of a Matrix," SIAM J. Num. Anal. 16, 368-75.
A.K. Cline and R.K. Rew (1983). "A Set of Counterexamples to Three Condition
Number Estimators," SIAM J. Sci. and Stat. Comp. 4, 602-611.
R.E. Cline and R.J. Plemmons (1976). "L2-Solutions to Underdetermined Linear Sys-
tems," SIAM Review 18, 92-106.
M. Clint and A. Jennings (1970). "The Evaluation of Eigenvalues and Eigenvectors of
Real Symmetric Matrices by Simultaneous Iteration," Comp. J. 13, 76-80.
M. Clint and A. Jennings (1971). "A Simultaneous Iteration Method for the Unsym-
metric Eigenvalue Problem," J. Inst. Math. Applic. 8, 111-21.

W.G. Cochrane (1968). "Errors of Measurement in Statistics," Technometrics 10,
637-66.
W.J. Cody (1988). "ALGORITHM 665 MACHAR: A Subroutine to Dynamically De-
termine Machine Parameters," ACM Trans. Math. Soft. 14, 303-311.
A.M. Cohen (1974). "A Note on Pivot Size in Gaussian Elimination," Lin. Alg. and Its
Applic. 8, 361-68.
T.F. Coleman and Y. Li (1992). "A Globally and Quadratically Convergent Affine
Scaling Method for Linear L1 Problems," Mathematical Programming 56, Series A,
189-222.
T.F. Coleman and D.C. Sorensen (1984). "A Note on the Computation of an Orthonor-
mal Basis for the Null Space of a Matrix," Mathematical Programming 29, 234-242.
T.F. Coleman and C.F. Van Loan (1988). Handbook for Matrix Computations, SIAM
Publications, Philadelphia, PA.
L. Colombet, Ph. Michallon, and D. Trystram (1996). "Parallel Matrix-Vector Product
on Rings with a Minimum of Communication," Parallel Computing 22, 289-310.
P. Concus and G.H. Golub (1973). "Use of Fast Direct Methods for the Efficient Nu-
merical Solution of Nonseparable Elliptic Equations," SIAM J. Num. Anal. 10,
1103-20.
P. Concus, G.H. Golub, and G. Meurant (1985). "Block Preconditioning for the Conju-
gate Gradient Method," SIAM J. Sci. and Stat. Comp. 6, 220-252.
P. Concus, G.H. Golub, and D.P. O'Leary (1976). "A Generalized Conjugate Gradient
Method for the Numerical Solution of Elliptic Partial Differential Equations," in
Sparse Matrix Computations, ed. J.R. Bunch and D.J. Rose, Academic Press, New
York.
K. Connolly, J.J. Dongarra, D. Sorensen, and J. Patterson (1988). "Programming
Methodology and Performance Issues for Advanced Computer Architectures," Par-
allel Computing 5, 41-58.
J.M. Conroy (1989). "A Note on the Parallel Cholesky Factorization of Wide Banded
Matrices," Parallel Computing 10, 239-246.
S.D. Conte and C. de Boor (1980). Elementary Numerical Analysis: An Algorithmic
Approach, 3rd ed., McGraw-Hill, New York.
J.E. Cope and B.W. Rust (1979). "Bounds on Solutions of Systems with Accurate Data,"
SIAM J. Num. Anal. 16, 950-63.
M. Cosnard, M. Marrakchi, and Y. Robert (1988). "Parallel Gaussian Elimination on
an MIMD Computer," Parallel Computing 6, 275-296.
M. Cosnard, J.M. Muller, and Y. Robert (1986). "Parallel QR Decomposition of a
Rectangular Matrix," Numer. Math. 48, 239-250.
M. Cosnard and D. Trystram (1995). Parallel Algorithms and Architectures, Interna-
tional Thomson Computer Press, New York.
R. W. Cottle (1974). "Manifestations of the Schur Complement," Lin. Alg. and Its
Applic. 8, 189-211.
M.G. Cox (1981). "The Least Squares Solution of Overdetermined Linear Equations
having Band or Augmented Band Structure," IMA J. Num. Anal. 1, 3-22.
C.R. Crawford (1973). "Reduction of a Band Symmetric Generalized Eigenvalue Prob-
lem," Comm. ACM 16, 41-44.
C.R. Crawford (1976). "A Stable Generalized Eigenvalue Problem," SIAM J. Num.
Anal. 13, 854-60.
C.R. Crawford (1986). "Algorithm 646 PDFIND: A Routine to Find a Positive Definite
Linear Combination of Two Real Symmetric Matrices," ACM Trans. Math. Soft.
12, 278-282.
C.R. Crawford and Y.S. Moon (1983). "Finding a Positive Definite Linear Combination
of Two Hermitian Matrices," Lin. Alg. and Its Applic. 51, 37-48.
S. Crivelli and E.R. Jessup (1995). "The Cost of Eigenvalue Computation on Distributed
Memory MIMD Computers," Parallel Computing 21, 401-422.
C.W. Cryer (1968). "Pivot Size in Gaussian Elimination," Numer. Math. 12, 335-45.

J. Cullum (1978). "The Simultaneous Computation of a Few of the Algebraically Largest
and Smallest Eigenvalues of a Large Sparse Symmetric Matrix," BIT 18, 265-75.
J. Cullum and W.E. Donath (1974). "A Block Lanczos Algorithm for Computing the q
Algebraically Largest Eigenvalues and a Corresponding Eigenspace of Large Sparse
Real Symmetric Matrices," Proc. of the 1974 IEEE Conf. on Decision and Control,
Phoenix, Arizona, pp. 505-9.
J. Cullum and R.A. Willoughby (1977). "The Equivalence of the Lanczos and the Con-
jugate Gradient Algorithms," IBM Research Report RE-6903.
J. Cullum and R.A. Willoughby (1979). "Lanczos and the Computation in Specified
Intervals of the Spectrum of Large, Sparse Real Symmetric Matrices," in Sparse Matrix
Proc., 1978, ed. I.S. Duff and G.W. Stewart, SIAM Publications, Philadelphia, PA.
J. Cullum and R. Willoughby (1980). "The Lanczos Phenomena: An Interpretation
Based on Conjugate Gradient Optimization," Lin. Alg. and Its Applic. 29, 63-90.
J. Cullum and R.A. Willoughby (1985). Lanczos Algorithms for Large Symmetric Eigen-
value Computations, Vol. I Theory, Birkhäuser, Boston.
J. Cullum and R.A. Willoughby (1985). Lanczos Algorithms for Large Symmetric Eigen-
value Computations, Vol. II Programs, Birkhäuser, Boston.
J. Cullum and R.A. Willoughby, eds. (1986). Large Scale Eigenvalue Problems, North-
Holland, Amsterdam.
J. Cullum, R.A. Willoughby, and M. Lake (1983). "A Lanczos Algorithm for Computing
Singular Values and Vectors of Large Matrices," SIAM J. Sci. and Stat. Comp. 4,
197-215.
J.J.M. Cuppen (1981). "A Divide and Conquer Method for the Symmetric Eigenprob-
lem," Numer. Math. 36, 177-95.
J.J.M. Cuppen (1983). "The Singular Value Decomposition in Product Form," SIAM
J. Sci. and Stat. Comp. 4, 216-222.
J.J.M. Cuppen (1984). "On Updating Triangular Products of Householder Matrices,"
Numer. Math. 45, 403-410.
E. Cuthill (1972). "Several Strategies for Reducing the Bandwidth of Matrices," in
Sparse Matrices and Their Applications, ed. D.J. Rose and R.A. Willoughby, Plenum
Press, New York.
G. Cybenko (1978). "Error Analysis of Some Signal Processing Algorithms," Ph.D.
thesis, Princeton University.
G. Cybenko (1980). "The Numerical Stability of the Levinson-Durbin Algorithm for
Toeplitz Systems of Equations," SIAM J. Sci. and Stat. Comp. 1, 303-19.
G. Cybenko (1984). "The Numerical Stability of the Lattice Algorithm for Least Squares
Linear Prediction Problems," BIT 24, 441-455.
G. Cybenko (1985). "Computing Pisarenko Frequency Estimates," in Proceedings of
the Princeton Conference on Information Science and Systems, Dept. of Electrical
Engineering, Princeton University.
G. Cybenko and M. Berry (1990). "Hyperbolic Householder Algorithms for Factoring
Structured Matrices," SIAM J. Matrix Anal. Appl. 11, 499-520.
G. Cybenko and C. Van Loan (1986). "Computing the Minimum Eigenvalue of a Sym-
metric Positive Definite Toeplitz Matrix," SIAM J. Sci. and Stat. Comp. 7, 123-131.
K. Dackland, E. Elmroth, and B. Kagstrom (1992). "Parallel Block Factorizations on the
Shared Memory Multiprocessor IBM 3090 VF/600J," International J. Supercomputer
Applications, 6, 69-97.
J. Daniel, W.B. Gragg, L. Kaufman, and G.W. Stewart (1976). "Reorthogonalization
and Stable Algorithms for Updating the Gram-Schmidt QR Factorization," Math.
Comp. 30, 772-795.
B. Danloy (1976). "On the Choice of Signs for Householder Matrices," J. Comp. Appl.
Math. 2, 67-69.
B.N. Datta (1989). "Parallel and Large-Scale Matrix Computations in Control: Some
Ideas," Lin. Alg. and Its Applic. 121, 243-264.
B.N. Datta (1995). Numerical Linear Algebra and Applications, Brooks/Cole Publishing
Company, Pacific Grove, California.

B.N. Datta, C.R. Johnson, M.A. Kaashoek, R. Plemmons, and E.D. Sontag, eds. (1988),
Linear Algebra in Signals, Systems, and Control, SIAM Publications, Philadelphia,
PA.
K. Datta (1988). "The Matrix Equation XA - BX = R and Its Applications," Lin. Alg.
and Its Applic. 109, 91-105.
C. Davis and W.M. Kahan (1970). "The Rotation of Eigenvectors by a Perturbation,
III," SIAM J. Num. Anal. 7, 1-46.
D. Davis (1973). "Explicit Functional Calculus," Lin. Alg. and Its Applic. 6, 193-99.
G.J. Davis (1986). "Column LU Pivoting on a Hypercube Multiprocessor," SIAM J.
Alg. and Disc. Methods 7, 538-550.
J. Day and B. Peterson (1988). "Growth in Gaussian Elimination," Amer. Math.
Monthly 95, 489-513.
A. Dax (1990). "The Convergence of Linear Stationary Iterative Processes for Solving
Singular Unstructured Systems of Linear Equations," SIAM Review 32, 611-635.
C. de Boor (1979). "Efficient Computer Manipulation of Tensor Products," ACM Trans.
Math. Soft. 5, 173-182.
C. de Boor and A. Pinkus (1977). "A Backward Error Analysis for Totally Positive
Linear Systems," Numer. Math. 27, 485-90.
T. Dehn, M. Eiermann, K. Giebermann, and V. Sperling (1995). "Structured Sparse
Matrix Vector Multiplication on Massively Parallel SIMD Architectures," Parallel
Computing 21, 1867-1894.
P. Deift, J. Demmel, L.-C. Li, and C. Tomei (1991). "The Bidiagonal Singular Value
Decomposition and Hamiltonian Mechanics," SIAM J. Num. Anal. 28, 1463-1516.
P. Deift, T. Nanda, and C. Tomei (1983). "Ordinary Differential Equations and the
Symmetric Eigenvalue Problem," SIAM J. Numer. Anal. 20, 1-22.
T. Dekker and W. Hoffman (1989). "Rehabilitation of the Gauss-Jordan Algorithm,"
Numer. Math. 54, 591-599.
T.J. Dekker and J.F. Traub (1971). "The Shifted QR Algorithm for Hermitian Matrices,"
Lin. Alg. and Its Applic. 4, 137-54.
J.M. Delosme and I.C.F. Ipsen (1986). "Parallel Solution of Symmetric Positive Definite
Systems with Hyperbolic Rotations," Lin. Alg. and Its Applic. 77, 75-112.
C.J. Demeure (1989). "Fast QR Factorization of Vandermonde Matrices," Lin. Alg.
and Its Applic. 122/123/124, 165-194.
J.W. Demmel (1983). "A Numerical Analyst's Jordan Canonical Form," Ph.D. Thesis,
Berkeley.
J.W. Demmel (1983). "The Condition Number of Equivalence Transformations that
Block Diagonalize Matrix Pencils," SIAM J. Numer. Anal. 20, 599-610.
J.W. Demmel (1984). "Underflow and the Reliability of Numerical Software," SIAM J.
Sci. and Stat. Comp. 5, 887-919.
J.W. Demmel (1987). "Three Methods for Refining Estimates of Invariant Subspaces,"
Computing 38, 43-57.
J.W. Demmel (1987). "On the Distance to the Nearest Ill-Posed Problem," Numer.
Math. 51, 251-289.
J.W. Demmel (1987). "A Counterexample for two Conjectures About Stability," IEEE
1hlns. Auto. Cont. AC-3!!, 34G-342.
J.W. Demmel (1987). "The smallest perturbation of a submatrix which lowers the rank
and constrained total least squares problems, SIAM J. Numer. Anal. 24, 199-206.
J.W. Demmel (1988). ''The Probability that a Numerical Analysis Problem is Difficult,"
Math. Comp. 50, 449-480.
J.W. Demmel (1992). "The Componentwise Distance to the Nearest Singular Matrix,"
SIAM J. Matri3: Anal. Appl. 13, 1G-19.
J.W. Demmel (1996). Numerical Linear Algebra, SIAM Publications, Philadelphia, PA.
J.W. Demmel and W. Gragg (1993). "On Computing Accurate Singular Values and
Eigenvalues of Matrices with Acyclic Graphs," Lin. Alg. and Its Applic. 185, 203-
217.

J.W. Demmel, M.T. Heath, and H.A. van der Vorst (1993). "Parallel Numerical Linear
Algebra," in Acta Numerica 1993, Cambridge University Press.
J.W. Demmel and N.J. Higham (1992). "Stability of Block Algorithms with Fast Level-3
BLAS," ACM Trans. Math. Soft. 18, 274-291.
J.W. Demmel and N.J. Higham (1993). "Improved Error Bounds for Underdetermined
System Solvers," SIAM J. Matrix Anal. Appl. 14, 1-14.
J.W. Demmel, N.J. Higham, and R.S. Schreiber (1995). "Stability of Block LU Factor-
ization," Numer. Lin. Alg. with Applic. 2, 173-190.
J.W. Demmel and B. Kågström (1986). "Stably Computing the Kronecker Structure
and Reducing Subspaces of Singular Pencils A - λB for Uncertain Data," in Large
Scale Eigenvalue Problems, J. Cullum and R.A. Willoughby (eds), North-Holland,
Amsterdam.
J.W. Demmel and B. Kagstrom (1987). "Computing Stable Eigendecompositions of
Matrix Pencils," Linear Alg. and Its Applic 88/89, 139-186.
J.W. Demmel and B. Kågström (1988). "Accurate Solutions of Ill-Posed Problems in
Control Theory," SIAM J. Matrix Anal. Appl. 9, 126-145.
J.W. Demmel and W. Kahan (1990). "Accurate Singular Values of Bidiagonal Matrices,"
SIAM J. Sci. and Stat. Comp. 11, 873-912.
J.W. Demmel and K. Veselic (1992). "Jacobi's Method is More Accurate than QR,"
SIAM J. Matrix Anal. Appl. 13, 1204-1245.
B. De Moor and G.H. Golub (1991). "The Restricted Singular Value Decomposition:
Properties and Applications," SIAM J. Matrix Anal. Appl. 12, 401-425.
B. De Moor and P. Van Dooren (1992). "Generalizing the Singular Value and QR
Decompositions," SIAM J. Matrix Anal. Appl. 13, 993-1014.
J.E. Dennis and R.B. Schnabel (1983). Numerical Methods for Unconstrained Optimiza-
tion and Nonlinear Equations, Prentice-Hall, Englewood Cliffs, NJ.
J.E. Dennis Jr. and K. Turner (1987). "Generalized Conjugate Directions," Lin. Alg.
and Its Applic. 88/89, 187-209.
E. F. Deprettere, ed. (1988). SVD and Signal Processing. Elsevier, Amsterdam.
J. Descloux (1963). "Bounds for the Spectral Norm of Functions of Matrices," Numer.
Math. 5, 185-90.
M.A. Diamond and D.L.V. Ferreira (1976). "On a Cyclic Reduction Method for the
Solution of Poisson's Equation," SIAM J. Num. Anal. 13, 54-70.
S. Doi (1991). "On Parallelism and Convergence of Incomplete LU Factorizations," Appl.
Numer. Math. 1, 417-436.
J.J. Dongarra (1983). "Improving the Accuracy of Computed Singular Values," SIAM
J. Sci. and Stat. Comp. 4, 712-719.
J.J. Dongarra, J.R. Bunch, C.B. Moler, and G.W. Stewart (1979). LINPACK Users'
Guide, SIAM Publications, Philadelphia, PA.
J.J. Dongarra, J. Du Croz, I.S. Duff, and S.J. Hammarling (1990). "A Set of Level 3
Basic Linear Algebra Subprograms," ACM Trans. Math. Soft. 16, 1-17.
J.J. Dongarra, J. Du Croz, I.S. Duff, and S.J. Hammarling (1990). "Algorithm 679. A
Set of Level 3 Basic Linear Algebra Subprograms: Model Implementation and Test
Programs," ACM Trans. Math. Soft. 16, 18-28.
J.J. Dongarra, J. Du Croz, S. Hammarling, and R.J. Hanson (1988). "An Extended Set
of Fortran Basic Linear Algebra Subprograms," ACM Trans. Math. Soft. 14, 1-17.
J.J. Dongarra, J. Du Croz, S. Hammarling, and R.J. Hanson (1988). "Algorithm 656 An
Extended Set of Fortran Basic Linear Algebra Subprograms: Model Implementation
and Test Programs," ACM Trans. Math. Soft. 14, 18-32.
J.J. Dongarra, I. Duff, P. Gaffney, and S. McKee, eds. (1989), Vector and Parallel
Computing, Ellis Horwood, Chichester, England.
J.J. Dongarra, I. Duff, D. Sorensen, and H. van der Vorst (1990). Solving Linear Systems
on Vector and Shared Memory Computers, SIAM Publications, Philadelphia, PA.
J.J. Dongarra and S. Eisenstat (1984). "Squeezing the Most Out of an Algorithm in
Cray Fortran," ACM Trans. Math. Soft. 10, 221-230.

J.J. Dongarra, F.G. Gustavson, and A. Karp (1984). "Implementing Linear Algebra
Algorithms for Dense Matrices on a Vector Pipeline Machine," SIAM Review 26,
91-112.
J.J. Dongarra, S. Hammarling, and D.C. Sorensen (1989). "Block Reduction of Matrices
to Condensed Forms for Eigenvalue Computations," J. Comput. Appl. Math. 27, 215-227.
J.J. Dongarra, S. Hammarling, and J.H. Wilkinson (1992). "Numerical Considerations
in Computing Invariant Subspaces," SIAM J. Matrix Anal. Appl. 13, 145-161.
J.J. Dongarra and T. Hewitt (1986). "Implementing Dense Linear Algebra Algorithms
Using Multitasking on the Cray X-MP-4 (or Approaching the Gigaflop)," SIAM J.
Sci. and Stat. Comp. 7, 347-350.
J.J. Dongarra and A. Hinds (1979). "Unrolling Loops in Fortran," Software Practice
and Experience 9, 219-229.
J.J. Dongarra and R.E. Hiromoto (1984). "A Collection of Parallel Linear Equation
Routines for the Denelcor HEP," Parallel Computing 1, 133-142.
J.J. Dongarra, L. Kaufman, and S. Hammarling (1986). "Squeezing the Most Out of
Eigenvalue Solvers on High Performance Computers," Lin. Alg. and Its Applic. 77,
113-136.
J.J. Dongarra, C.B. Moler, and J.H. Wilkinson (1983). "Improving the Accuracy of
Computed Eigenvalues and Eigenvectors," SIAM J. Numer. Anal. 20, 23-46.
J.J. Dongarra and A.H. Sameh (1984). "On Some Parallel Banded System Solvers,"
Parallel Computing 1, 223-235.
J.J. Dongarra, A. Sameh, and D. Sorensen (1986). "Implementation of Some Concurrent
Algorithms for Matrix Factorization," Parallel Computing 3, 25-34.
J.J. Dongarra and D.C. Sorensen (1986). "Linear Algebra on High Performance Com-
puters," Appl. Math. and Comp. 20, 57-88.
J.J. Dongarra and D.C. Sorensen (1987). "A Portable Environment for Developing
Parallel Programs," Pamllel Computing 5, 17&-186.
J.J. Dongarra and D.C. Sorensen (1987). "A Fully Parallel Algorithm for the Symmetric
Eigenvalue Problem," SIAM J. Sci. and Stat. Comp. 8, S139-S154.
J.J. Dongarra and D. Walker (1995). "Software Libraries for Linear Algebra Computa-
tions on High Performance Computers," SIAM Review 37, 151-180.
F.W. Dorr (1970). "The Direct Solution of the Discrete Poisson Equation on a Rectan-
gle," SIAM Review 12, 248-63.
F.W. Dorr (1973). "The Direct Solution of the Discrete Poisson Equation in O(n^2)
Operations," SIAM Review 15, 412-415.
C. C. Douglas, M. Heroux, G. Slishman, and R.M. Smith (1994). "GEMMW: A Portable
Level 3 BLAS Winograd Variant of Strassen's Matrix-Matrix Multiply Algorithm,"
J. Comput. Phys. 110, 1-10.
Z. Drmač (1994). The Generalized Singular Value Problem, Ph.D. Thesis, FernUniver-
sität, Hagen, Germany.
Z. Drmač, M. Omladič, and K. Veselić (1994). "On the Perturbation of the Cholesky
Factorization," SIAM J. Matrix Anal. Appl. 15, 1319-1332.
P.F. Dubois, A. Greenbaum, and G.H. Rodrigue (1979). "Approximating the Inverse
of a Matrix for Use on Iterative Algorithms on Vector Processors,'' Computing 22,
257-268.
A.A. Dubrulle (1970). "A Short Note on the Implicit QL Algorithm for Symmetric
Tridiagonal Matrices," Numer. Math. 15, 450.
A.A. Dubrulle and G.H. Golub (1994). "A Multishift QR Iteration Without Computa-
tion of the Shifts," Numerical Algorithms 7, 173-181.
A.A. Dubrulle, R.S. Martin, and J.H. Wilkinson (1968). "The Implicit QL Algorithm,"
Numer. Math. 12, 377-83. See also Wilkinson and Reinsch (1971, pp. 241-48).
J.J. Du Croz and N.J. Higham (1992). "Stability of Methods for Matrix Inversion," IMA
J. Num. Anal. 12, 1-19.
I.S. Duff (1974). "Pivot Selection and Row Ordering in Givens Reduction on Sparse
Matrices," Computing 13, 239-48.
I.S. Duff (1977). "A Survey of Sparse Matrix Research," Proc. IEEE 65, 500-535.

I.S. Duff, ed. (1981). Sparse Matrices and Their Uses, Academic Press, New York.
I.S. Duff, A.M. Erisman, and J.K. Reid (1986). Direct Methods for Sparse Matrices,
Oxford University Press.
I.S. Duff, N.I.M. Gould, J.K. Reid, J.A. Scott, and K. Turner (1991). "The Factorization
of Sparse Indefinite Matrices," IMA J. Num. Anal. 11, 181-204.
I.S. Duff and G. Meurant (1989). "The Effect of Ordering on Preconditioned Conjugate
Gradients," BIT 29, 635-657.
I.S. Duff and J.K. Reid (1975). "On the Reduction of Sparse Matrices to Condensed
Forms by Similarity Transformations," J. Inst. Math. Applic. 15, 217-24.
I.S. Duff and J.K. Reid (1976). "A Comparison of Some Methods for the Solution of
Sparse Over-Determined Systems of Linear Equations," J. Inst. Math. Applic. 17,
267-80.
I.S. Duff and G.W. Stewart, eds. (1979). Sparse Matrix Proceedings, 1978, SIAM
Publications, Philadelphia, PA.
N. Dunford and J. Schwartz (1958). Linear Operators, Part I, Interscience, New York.
J. Durbin (1960). "The Fitting of Time Series Models," Rev. Inst. Int. Stat. 28, 233-43.
P.J. Eberlein (1965). "On Measures of Non-Normality for Matrices," Amer. Math. Soc.
Monthly 72, 995-96.
P.J. Eberlein (1970). "Solution to the Complex Eigenproblem by a Norm-Reducing
Jacobi-type Method," Numer. Math. 14, 232-45. See also Wilkinson and Reinsch
(1971, pp. 404-17).
P.J. Eberlein (1971). "On the Diagonalization of Complex Symmetric Matrices," J. Inst.
Math. Applic. 7, 377-83.
P.J. Eberlein (1987). "On Using the Jacobi Method on a Hypercube," in Hypercube
Multiprocessors, ed. M.T. Heath, SIAM Publications, Philadelphia, PA.
P.J. Eberlein and C.P. Huang (1975). "Global Convergence of the QR Algorithm for
Unitary Matrices with Some Results for Normal Matrices," SIAM J. Numer. Anal.
12, 421-453.
C. Eckart and G. Young (1939). "A Principal Axis Transformation for Non-Hermitian
Matrices," Bull. Amer. Math. Soc. 45, 118-21.
A. Edelman (1992). "The Complete Pivoting Conjecture for Gaussian Elimination is
False," The Mathematica Journal 2, 58-61.
A. Edelman (1993). "Large Dense Numerical Linear Algebra in 1993: The Parallel
Computing Influence," Int'l J. Supercomputer Appl. 7, 113-128.
A. Edelman, E. Elmroth, and B. Kågström (1996). "A Geometric Approach to Pertur-
bation Theory of Matrices and Matrix Pencils," SIAM J. Matrix Anal., to appear.
A. Edelman and W. Mascarenhas (1995). "On the Complete Pivoting Conjecture for a
Hadamard Matrix of Order 12," Linear and Multilinear Algebra 38, 181-185.
A. Edelman and H. Murakami (1995). "Polynomial Roots from Companion Matrix
Eigenvalues," Math. Comp. 64, 763-776.
M. Eiermann and W. Niethammer (1983). "On the Construction of Semi-iterative Meth-
ods," SIAM J. Numer. Anal. 20, 1153-1160.
M. Eiermann, W. Niethammer, and R.S. Varga (1992). "Acceleration of Relaxation
Methods for Non-Hermitian Linear Systems," SIAM J. Matrix Anal. Appl. 13,
979-991.
M. Eiermann and R.S. Varga (1993). "Is the Optimal ω Best for the SOR Iteration
Method," Lin. Alg. and Its Applic. 182, 257-277.
V. Eijkhout (1991). "Analysis of Parallel Incomplete Point Factorizations," Lin. Alg.
and Its Applic. 154-156, 723-740.
S.C. Eisenstat (1984). "Efficient Implementation of a Class of Preconditioned Conjugate
Gradient Methods," SIAM J. Sci. and Stat. Computing 2, 1-4.
S.C. Eisenstat, H. Elman, and M. Schultz (1983). "Variational Iterative Methods for
Nonsymmetric Systems of Equations," SIAM J. Num. Anal. 20, 345-357.
S.C. Eisenstat, M.T. Heath, C.S. Henkel, and C.H. Romine (1988). "Modified Cyclic
Algorithms for Solving Triangular Systems on Distributed Memory Multiprocessors,"
SIAM J. Sci. and Stat. Comp. 9, 589-600.

L. Elden (1977). "Algorithms for the Regularization of Ill-Conditioned Least Squares
Problems," BIT 17, 134-45.
L. Elden (1980). "Perturbation Theory for the Least Squares Problem with Linear
Equality Constraints," SIAM J. Num. Anal. 17, 338-50.
L. Elden (1983). "A Weighted Pseudoinverse, Generalized Singular Values, and Con-
strained Least Squares Problems," BIT 22, 487-502.
L. Elden (1984). "An Algorithm for the Regularization of Ill-Conditioned, Banded Least
Squares Problems," SIAM J. Sci. and Stat. Comp. 5, 237-254.
L. Elden (1985). "A Note on the Computation of the Generalized Cross-Validation
Function for Ill-Conditioned Least Squares Problems," BIT 24, 467-472.
L. Elden and H. Park {1994). "Perturbation Analysis for Block Downdating of a Cholesky
Decomposition," Numer. Math. 68, 457-468.
L. Elden and H. Park {1994). "Block Downdating of Least Squares Solutions," SIAM J.
Matrix Anal. Appl. 15, 1018-1034.
L. Elden and R. Schreiber {1986). "An Application of Systolic Arrays to Linear Discrete
Ill-Posed Problems," SIAM J. Sci. and Stat. Comp. 7, 892-903.
H. Elman (1986). "A Stability Analysis of Incomplete LU Factorization," Math. Comp.
47, 191-218.
H. Elman (1989). "Approximate Schur Complement Preconditioners on Serial and Par-
allel Computers," SIAM J. Sci. Stat. Comput. 10, 581-605.
H. Elman (1996). "Fast Nonsymmetric Iterations and Preconditioning for Navier-Stokes
Equations," SIAM J. Sci. Comput. 17, 33-46.
H. Elman and G.H. Golub (1990). "Iterative Methods for Cyclically Reduced Non-Self-
Adjoint Linear Systems I," Math. Comp. 54, 671-700.
H. Elman and G.H. Golub (1990). "Iterative Methods for Cyclically Reduced Non-Self-
Adjoint Linear Systems II," Math. Comp. 56, 215-242.
E. Elmroth and B. Kågström (1996). "The Set of 2-by-3 Matrix Pencils-Kronecker
Structure and their Transitions under Perturbations," SIAM J. Matrix Anal., to
appear.
L. Elsner and J.-G. Sun (1982). "Perturbation Theorems for the Generalized Eigenvalue
Problem," Lin. Alg. and Its Applic. 48, 341-357.
W. Enright (1979). "On the Efficient and Reliable Numerical Solution of Large Linear
Systems of O.D.E.'s," IEEE Trans. Auto. Cont. AC-24, 905-8.
W. Enright and S. Serbin (1978). "A Note on the Efficient Solution of Matrix Pencil
Systems," BIT 18, 276-81.
I. Erdelyi (1967). "On the Matrix Equation Ax = λBx," J. Math. Anal. and Applic.
17, 119-32.
T. Ericsson and A. Ruhe (1980). "The Spectral Transformation Lanczos Method for the
Numerical Solution of Large Sparse Generalized Symmetric Eigenvalue Problems,"
Math. Comp. 35, 1251-68.
A.M. Erisman and J.K. Reid (1974). "Monitoring the Stability of the Triangular Fac-
torization of a Sparse Matrix," Numer. Math. 22, 183-86.
J. Erxiong (1990). "An Algorithm for Finding Generalized Eigenpairs of a Symmetric
Definite Matrix Pencil," Lin. Alg. and Its Applic. 132, 65-91.
J. Erxiong (1992). "A Note on the Double-Shift QL Algorithm," Lin. Alg. and Its Applic.
171, 121-132.
D.J. Evans (1984). "Parallel SOR Iterative Methods," Parallel Computing 1, 3-18.
D.J. Evans and R. Dunbar (1983). "The Parallel Solution of Triangular Systems of
Equations," IEEE Trans. Comp. C-32, 201-204.
L.M. Ewerbring and F.T. Luk (1989). "Canonical Correlations and Generalized SVD:
Applications and New Algorithms," J. Comput. Appl. Math. 27, 37-52.
V. Faber and T. Manteuffel (1984). "Necessary and Sufficient Conditions for the Exis-
tence of a Conjugate Gradient Method," SIAM J. Numer. Anal. 21, 352-362.
V.N. Faddeeva (1959). Computational Methods of Linear Algebra, Dover, New York.
V. Fadeeva and D. Fadeev (1977). "Parallel Computations in Linear Algebra," Kiber-
netica 6, 28-40.
W. Fair and Y. Luke (1970). "Pade Approximations to the Operator Exponential,"
Numer. Math. 14, 379-82.
R.W. Farebrother (1987). Linear Least Squares Computations, Marcel Dekker, New
York.
D.G. Feingold and R.S. Varga (1962). "Block Diagonally Dominant Matrices and Gen-
eralizations of the Gershgorin Circle Theorem," Pacific J. Math. 12, 1241-50.
T. Fenner and G. Loizou (1974). "Some New Bounds on the Condition Numbers of
Optimally Scaled Matrices," J. ACM 21, 514-24.
W.E. Ferguson (1980). "The Construction of Jacobi and Periodic Jacobi Matrices with
Prescribed Spectra," Math. Comp. 35, 1203-1220.
K.V. Fernando (1989). "Linear Convergence of the Row Cyclic Jacobi and Kogbetliantz
methods," Numer. Math. 56, 73-92.
K.V. Fernando and B.N. Parlett (1994). "Accurate Singular Values and Differential qd
Algorithms," Numer. Math. 67, 191-230.
W.R. Ferng, G.H. Golub, and R.J. Plemmons (1991). "Adaptive Lanczos Methods for
Recursive Condition Estimation," Numerical Algorithms 1, 1-20.
R.D. Fierro and J.R. Bunch (1994). "Collinearity and Total Least Squares," SIAM J.
Matrix Anal. Appl. 15, 1167-1181.
R.D. Fierro and P.C. Hansen (1995). "Accuracy of TSVD Solutions Computed from
Rank-Revealing Decompositions," Numer. Math. 70, 453--472.
C. Fischer and R.A. Usmani (1969). "Properties of Some Tridiagonal Matrices and Their
Application to Boundary Value Problems," SIAM J. Num. Anal. 6, 127-42.
G. Fix and R. Heiberger (1972). "An Algorithm for the Ill-Conditioned Generalized
Eigenvalue Problem," SIAM J. Num. Anal. 9, 78-88.
U. Flaschka, W-W. Li, and J-L. Wu (1992). "A KQZ Algorithm for Solving Linear-
Response Eigenvalue Equations," Lin. Alg. and Its Applic. 165, 93-123.
R. Fletcher (1976). "Factorizing Symmetric Indefinite Matrices," Lin. Alg. and Its
Applic. 14, 257-72.
A. Forsgren (1995). "On Linear Least-Squares Problems with Diagonally Dominant
Weight Matrices," Technical Report TRITA-MAT-1995-OS2, Department of Mathe-
matics, Royal Institute of Technology, S-100 44, Stockholm, Sweden.
G.E. Forsythe (1960). "Crout with Pivoting," Comm. ACM 3, 507-8.
G.E. Forsythe and G.H. Golub (1965). "On the Stationary Values of a Second-Degree
Polynomial on the Unit Sphere," SIAM J. App. Math. 13, 1050-68.
G.E. Forsythe and P. Henrici (1960). "The Cyclic Jacobi Method for Computing the
Principal Values of a Complex Matrix," Trans. Amer. Math. Soc. 94, 1-23.
G.E. Forsythe and C. Moler (1967). Computer Solution of Linear Algebraic Systems,
Prentice-Hall, Englewood Cliffs, NJ.
L.V. Foster (1986). "Rank and Null Space Calculations Using Matrix Decomposition
without Column Interchanges," Lin. Alg. and Its Applic. 74, 47-71.
L.V. Foster (1994). "Gaussian Elimination with Partial Pivoting Can Fail in Practice,"
SIAM J. Matrix Anal. Appl. 15, 1354-1362.
R. Fourer (1984). "Staircase Matrices and Systems," SIAM Review 26, 1-71.
L. Fox (1964). An Introduction to Numerical Linear Algebra, Oxford University Press,
Oxford, England.
G.C. Fox, ed. (1988). The Third Conference on Hypercube Concurrent Computers and
Applications, Vol. II - Applications, ACM Press, New York.
G.C. Fox, M.A. Johnson, G.A. Lyzenga, S.W. Otto, J.K. Salmon and D.W. Walker
(1988). Solving Problems on Concurrent Processors, Volume 1, Prentice Hall, En-
glewood Cliffs, NJ.
G.C. Fox, S.W. Otto, and A.J. Hey (1987). "Matrix Algorithms on a Hypercube I:
Matrix Multiplication," Parallel Computing 4, 17-31.
G.C. Fox, R.D. Williams, and P.C. Messina (1994). Parallel Computing Works!, Morgan
Kaufmann, San Francisco.
J.S. Frame (1964). "Matrix Functions and Applications, Part II," IEEE Spectrum 1
(April), 102-8.
J.S. Frame (1964). "Matrix Functions and Applications, Part IV," IEEE Spectrum 1
(June), 123-31.
J.G.F. Francis (1961). "The QR Transformation: A Unitary Analogue to the LR Trans-
formation, Parts I and II," Comp. J. 4, 265-72, 332-45.
J.N. Franklin (1968). Matrix Theory, Prentice Hall, Englewood Cliffs, NJ.
T. L. Freeman and C. Phillips (1992). Parallel Numerical Algorithms, Prentice Hall,
New York.
R.W. Freund (1990). "On Conjugate Gradient Type Methods and Polynomial Pre-
conditioners for a Class of Complex Non-Hermitian Matrices," Numer. Math. 57,
285-312.
R.W. Freund (1992). "Conjugate Gradient-Type Methods for Linear Systems with Com-
plex Symmetric Coefficient Matrices," SIAM J. Sci. Statist. Comput. 13, 425-448.
R.W. Freund (1993). "A Transpose-Free Quasi-Minimum Residual Algorithm for Non-
Hermitian Linear Systems," SIAM J. Sci. Comput. 14, 470-482.
R.W. Freund and N. Nachtigal (1991). "QMR: A Quasi-Minimal Residual Method for
Non-Hermitian Linear Systems," Numer. Math. 60, 315-339.
R.W. Freund and N.M. Nachtigal (1994). "An Implementation of the QMR Method
Based on Coupled Two-term Recurrences," SIAM J. Sci. Comp. 15, 313-337.
R.W. Freund, G.H. Golub, and N. Nachtigal (1992). "Iterative Solution of Linear Sys-
tems," Acta Numerica 1, 57-100.
R.W. Freund, M. Gutknecht, and N. Nachtigal (1993). "An Implementation of the
Look-Ahead Lanczos Algorithm for Non-Hermitian Matrices," SIAM J. Sci. and
Stat. Comp. 14, 137-158.
R.W. Freund and H. Zha (1993). "A Look-Ahead Algorithm for the Solution of General
Hankel Systems," Numer. Math. 64, 295-322.
S. Friedland (1975). "On Inverse Multiplicative Eigenvalue Problems for Matrices," Lin.
Alg. and Its Applic. 12, 127-38.
S. Friedland (1991). "Revisiting Matrix Squaring," Lin. Alg. and Its Applic. 154-156,
59-63.
S. Friedland, J. Nocedal, and M.L. Overton (1987). "The Formulation and Analysis of
Numerical Methods for Inverse Eigenvalue Problems," SIAM J. Numer. Anal. 24,
634-667.
C.E. Froberg (1965). "On Triangularization of Complex Matrices by Two Dimensional
Unitary Transformations," BIT 5, 230-34.
R.E. Funderlic and A. Geist (1986). "Torus Data Flow for Parallel Computation of
Missized Matrix Problems," Lin. Alg. and Its Applic. 77, 149-164.
G. Galimberti and V. Pereyra (1970). "Numerical Differentiation and the Solution of
Multidimensional Vandermonde Systems," Math. Comp. 24, 357-64.
G. Galimberti and V. Pereyra (1971). "Solving Confluent Vandermonde Systems of
Hermitian Type," Numer. Math. 18, 44-60.
K.A. Gallivan, M. Heath, E. Ng, J. Ortega, B. Peyton, R. Plemmons, C. Romine, A.
Sameh, and R. Voigt (1990), Parallel Algorithms for Matrix Computations, SIAM
Publications, Philadelphia, PA.
K.A. Gallivan, W. Jalby, and U. Meier (1987). "The Use of BLAS3 in Linear Algebra
on a Parallel Processor with a Hierarchical Memory," SIAM J. Sci. and Stat. Comp.
8, 1079-1084.
K.A. Gallivan, W. Jalby, U. Meier, and A.H. Sameh (1988). "Impact of Hierarchical
Memory Systems on Linear Algebra Algorithm Design," Int'l J. Supercomputer Ap-
plic. 2, 12-48.
K.A. Gallivan, R.J. Plemmons, and A.H. Sameh (1990). "Parallel Algorithms for Dense
Linear Algebra Computations," SIAM Review 32, 54-135.
E. Gallopoulos and Y. Saad (1989). "A Parallel Block Cyclic Reduction Algorithm for
the Fast Solution of Elliptic Equations," Parallel Computing 10, 143-160.
W. Gander (1981). "Least Squares with a Quadratic Constraint," Numer. Math. 36,
291-307.
W. Gander, G.H. Golub, and U. von Matt (1991). "A Constrained Eigenvalue Problem,"
in Numerical Linear Algebra, Digital Signal Processing, and Parallel Algorithms,
G.H. Golub and P. Van Dooren (eds), Springer-Verlag, Berlin.
D. Gannon and J. Van Rosendale (1984). "On the Impact of Communication Complexity
on the Design of Parallel Numerical Algorithms," IEEE Trans. Comp. C-33, 1180-
1194.
F.R. Gantmacher (1959). The Theory of Matrices, vols. 1 and 2, Chelsea, New York.
B.S. Garbow, J.M. Boyle, J.J. Dongarra, and C.B. Moler (1972). Matrix Eigensystem
Routines: EISPACK Guide Extension, Lecture Notes in Computer Science, Volume
51, Springer-Verlag, New York.
J. Gardiner, M.R. Wette, A.J. Laub, J.J. Amato, and C.B. Moler (1992). "Algorithm
705: A FORTRAN-77 Software Package for Solving the Sylvester Matrix Equation
AXB^T + CXD^T = E," ACM Trans. Math. Soft. 18, 232-238.
W. Gautschi (1975). "Norm Estimates for Inverses of Vandermonde Matrices," Numer.
Math. 23, 337-47.
W. Gautschi (1975). "Optimally Conditioned Vandermonde Matrices," Numer. Math.
24, 1-12.
G.A. Geist (1991). "Reduction of a General Matrix to Tridiagonal Form," SIAM J.
Matrix Anal. Appl. 12, 362-373.
G.A. Geist and M.T. Heath (1986). "Matrix Factorization on a Hypercube," in M.T.
Heath (ed) {1986). Proceedings of First SIAM Conference on Hypercube Multipro-
cessors, SIAM Publications, Philadelphia, PA.
G.A. Geist and C.H. Romine (1988). "LU Factorization Algorithms on Distributed
Memory Multiprocessor Architectures," SIAM J. Sci. and Stat. Comp. 9, 639--649.
W.M. Gentleman {1973). "Least Squares Computations by Givens Transformations
without Square Roots," J. Inst. Math. Appl. 12, 329-36.
W.M. Gentleman (1973). "Error Analysis of QR Decompositions by Givens Transfor-
mations," Lin. Alg. and Its Applic. 10, 189-97.
W.M. Gentleman and H.T. Kung (1981). "Matrix Triangularization by Systolic Arrays,"
SPIE Proceedings, Vol. 298, 19-26.
J.A. George (1973). "Nested Dissection of a Regular Finite Element Mesh," SIAM J.
Num. Anal. 10, 345-63.
J.A. George {1974). "On Block Elimination for Sparse Linear Systems," SIAM J. Num.
Anal. 11, 585--603.
J.A. George and M.T. Heath (1980). "Solution of Sparse Linear Least Squares Problems
Using Givens Rotations," Lin. Alg. and Its Applic. 34, 69-83.
A. George, M.T. Heath, and J. Liu (1986). "Parallel Cholesky Factorization on a Shared
Memory Multiprocessor," Lin. Alg. and Its Applic. 77, 165-187.
A. George and J. W-H. Liu (1981). Computer Solution of Large Sparse Positive Definite
Systems. Prentice-Hall Inc., Englewood Cliffs, New Jersey.
A.R. Ghavimi and A.J. Laub (1995). "Residual Bounds for Discrete-Time Lyapunov
Equations," IEEE Trans. Auto. Cont. 40, 1244-1249.
N.E. Gibbs and W.G. Poole, Jr. (1974). "Tridiagonalization by Permutations," Comm.
ACM 17, 20-24.
N.E. Gibbs, W.G. Poole, Jr., and P.K. Stockmeyer (1976). "An Algorithm for Reducing
the Bandwidth and Profile of a Sparse Matrix," SIAM J. Num. Anal. 13, 236-50.
N.E. Gibbs, W.G. Poole, Jr., and P.K. Stockmeyer (1976). "A Comparison of Several
Bandwidth and Profile Reduction Algorithms," ACM Trans. Math. Soft. 2, 322-30.
P.E. Gill, G.H. Golub, W. Murray, and M.A. Saunders (1974). "Methods for Modifying
Matrix Factorizations," Math. Comp. 28, 505-35.
P.E. Gill and W. Murray (1976). "The Orthogonal Factorization of a Large Sparse
Matrix," in Sparse Matrix Computations, ed. J.R. Bunch and D.J. Rose, Academic
Press, New York, pp. 177-200.
P.E. Gill, W. Murray, D.B. Ponceleón, and M.A. Saunders (1992). "Preconditioners
for Indefinite Systems Arising in Optimization," SIAM J. Matrix Anal. Appl. 13,
292-311.
P.E. Gill, W. Murray, and M.A. Saunders (1975). "Methods for Computing and Modi-
fying the LDV Factors of a Matrix," Math. Comp. 29, 1051-77.
P.E. Gill, W. Murray, and M.H. Wright (1991). Numerical Linear Algebra and Opti-
mization, Vol. 1, Addison-Wesley, Reading, MA.
W. Givens (1958). "Computation of Plane Unitary Rotations Transforming a General
Matrix to Triangular Form," SIAM J. App. Math. 6, 26-50.
J. Gluchowska and A. Smoktunowicz (1990). "Solving the Linear Least Squares Problem
with Very High Relative Accuracy," Computing 45, 345-354.
I.C. Gohberg and M.G. Krein (1969). Introduction to the Theory of Linear Non-Self
Adjoint Operators, Amer. Math. Soc., Providence, R.I.
I.C. Gohberg, P. Lancaster, and L. Rodman (1986). Invariant Subspaces of Matrices
With Applications, John Wiley and Sons, New York.
D. Goldberg (1991). "What Every Computer Scientist Should Know About Floating
Point Arithmetic," ACM Surveys 23, 5-48.
D. Goldfarb (1976). "Factored Variable Metric Methods for Unconstrained Optimiza-
tion," Math. Comp. 30, 796-811.
H.H. Goldstine and L.P. Horowitz (1959). "A Procedure for the Diagonalization of
Normal Matrices," J. Assoc. Comp. Mach. 6, 176-95.
G.H. Golub (1965). "Numerical Methods for Solving Linear Least Squares Problems,"
Numer. Math. 7, 206--16.
G.H. Golub (1969). "Matrix Decompositions and Statistical Computation," in Statistical
Computation, ed. R.C. Milton and J.A. Nelder, Academic Press, New York, pp.
365-97.
G.H. Golub (1973). "Some Modified Matrix Eigenvalue Problems," SIAM Review 15,
318-334.
G.H. Golub (1974). "Some Uses of the Lanczos Algorithm in Numerical Linear Algebra,"
in Topics in NumericBI Analysis, ed., J.J.H. Miller, Academic Press, New York.
G.H. Golub, M. Heath, and G. Wahba (1979). "Generalized Cross-Validation as a
Method for Choosing a Good Ridge Parameter," Technometrics 21, 215-23.
G.H. Golub, A. Hoffman, and G.W. Stewart (1988). "A Generalization of the Eckart-
Young-Mirsky Approximation Theorem," Lin. Alg. and Its Applic. 88/89, 317-328.
G.H. Golub and W. Kahan (1965). "Calculating the Singular Values and Pseudo-Inverse
of a Matrix," SIAM J. Num. Anal. 2, 205-24.
G.H. Golub, V. Klema and G.W. Stewart (1976). "Rank Degeneracy and Least Squares
Problems," Technical Report TR-456, Department of Computer Science, University
of Maryland, College Park, MD.
G.H. Golub, F.T. Luk, and M. Overton (1981). "A Block Lanczos Method for Computing
the Singular Values and Corresponding Singular Vectors of a Matrix," ACM Trans.
Math. Soft. 7, 149-69.
G.H. Golub and G. Meurant (1983). Résolution Numérique des Grands Systèmes
Linéaires, Collection de la Direction des Études et Recherches de l'Électricité de
France, vol. 49, Eyrolles, Paris.
G.H. Golub and C.D. Meyer (1986). "Using the QR Factorization and Group Inversion
to Compute, Differentiate, and Estimate the Sensitivity of Stationary Probabilities
for Markov Chains," SIAM J. Alg. and Dis. Methods 7, 273-281.
G.H. Golub, S. Nash, and C. Van Loan (1979). "A Hessenberg-Schur Method for the
Matrix Problem AX + XB = C," IEEE Trans. Auto. Cont. AC-24, 909-13.
G.H. Golub and D. O'Leary (1989). "Some History of the Conjugate Gradient and
Lanczos Methods," SIAM Review 31, 50-102.
G.H. Golub and J.M. Ortega (1993). Scientific Computing: An Introduction with Par-
allel Computing, Academic Press, Boston.
G.H. Golub and M. Overton (1988). "The Convergence of Inexact Chebychev and
Richardson Iterative Methods for Solving Linear Systems," Numer. Math. 53, 571-
594.
G.H. Golub and V. Pereyra (1973). "The Differentiation of Pseudo-Inverses and Nonlin-
ear Least Squares Problems Whose Variables Separate," SIAM J. Num. Anal. 10,
413-32.
G.H. Golub and V. Pereyra (1976). "Differentiation of Pseudo-Inverses, Separable Non-
linear Least Squares Problems and Other Tales," in Generalized Inverses and Appli-
cations, ed. M.Z. Nashed, Academic Press, New York, pp. 303-24.
G.H. Golub and C. Reinsch (1970). "Singular Value Decomposition and Least Squares
Solutions," Numer. Math. 14, 403-20. See also Wilkinson and Reinsch (1971, 134-
51).
G.H. Golub and W.P. Tang (1981). "The Block Decomposition of a Vandermonde Matrix
a.nd Its Applications," BIT 21, 505-17.
G.H. Golub and R. Underwood (1977). "The Block Lanczos Method for Computing
Eigenvalues," in Mathematical Software III, ed. J. Rice, Academic Press, New York,
pp. 364-77.
G.H. Golub, R. Underwood, and J.H. Wilkinson (1972). "The Lanczos Algorithm for the
Symmetric Ax = λBx Problem," Report STAN-CS-72-270, Department of Computer
Science, Stanford University, Stanford, California.
G.H. Golub and P. Van Dooren, eds. (1991). Numerical Linear Algebra, Digital Signal
Processing, and Parallel Algorithms, Springer-Verlag, Berlin.
G.H. Golub and C.F. Van Loan (1979). "Unsymmetric Positive Definite Linear Systems,"
Lin. Alg. and Its Applic. 28, 85-98.
G.H. Golub and C.F. Van Loan (1980). "An Ana.lysis of the Total Least Squares Prob-
lem," SIAM J. Num. Anal. 17, 883-93.
G.H. Golub and J.M. Varah (1974). "On a Characterization of the Best l2-Scaling of a
Matrix," SIAM J. Num. Anal. 11, 472-79.
G.H. Golub and R.S. Varga (1961). "Chebychev Semi-Iterative Methods, Successive
Over-Relaxation Iterative Methods, and Second-Order Richardson Iterative Methods,
Parts I and II," Numer. Math. 3, 147-56, 157-68.
G.H. Golub and J.H. Welsch (1969). "Calculation of Gauss Quadrature Rules," Math.
Comp. 23, 221-30.
G.H. Golub and J.H. Wilkinson (1966). "Note on Iterative Refinement of Least Squares
Solutions," Numer. Math. 9, 139-48.
G.H. Golub and J.H. Wilkinson (1976). "Ill-Conditioned Eigensystems and the Compu-
tation of the Jordan Canonical Form," SIAM Review 18, 578-619.
G.H. Golub and H. Zha (1994). "Perturbation Analysis of the Canonical Correlations of
Matrix Pairs," Lin. Alg. and Its Applic. 210, 3-28.
N. Gould (1991). "On Growth in Gaussian Elimination with Complete Pivoting,'' SIAM
J. Matrix Anal. Appl. 12, 354-361.
R.J. Goult, R.F. Hoskins, J.A. Milner and M.J. Pratt (1974). Computational Methods
in Linear Algebra, John Wiley and Sons, New York.
A.R. Gourlay (1970). "Generalization of Elementary Hermitian Matrices," Comp. J.
13, 411-12.
A.R. Gourlay and G.A. Watson (1973). Computational Methods for Matrix Eigenprob-
lems, John Wiley & Sons, New York.
W. Govaerts (1991). "Stable Solvers and Block Elimination for Bordered Systems,"
SIAM J. Matrix Anal. Appl. 12, 469-483.
W. Govaerts and J.D. Pryce (1990). "Block Elimination with One Iterative Refinement
Solves Bordered Linear Systems Accurately," BIT 30, 490-507.
W. Govaerts and J.D. Pryce (1993). "Mixed Block Elimination for Linear Systems with
Wider Borders," IMA J. Num. Anal. 13, 161-180.
W. B. Gragg (1986). "The QR Algorithm for Unitary Hessenberg Matrices," J. Comp.
Appl. Math. 16, 1-8.
W.B. Gragg and W.J. Harrod (1984). "The Numerically Stable Reconstruction of Jacobi
Matrices from Spectral Data," Numer. Math. 44, 317-336.
W.B. Gragg and L. Reichel (1990). "A Divide and Conquer Method for Unitary and
Orthogonal Eigenproblems," Numer. Math. 57, 695-718.
A. Graham (1981). Kronecker Products and Matrix Calculus with Applications, Ellis
Horwood Ltd., Chichester, England.
B. Green (1952). "The Orthogonal Approximation of an Oblique Structure in Factor
Analysis," Psychometrika 17, 429-40.
A. Greenbaum (1992). "Diagonal Scalings of the Laplacian as Preconditioners for Other
Elliptic Differential Operators," SIAM J. Matrix Anal. Appl. 13, 826-846.
A. Greenbaum and G. Rodrigue (1989). "Optimal Preconditioners of a Given Sparsity
Pattern," BIT 29, 610-634.
A. Greenbaum and Z. Strakos (1992). "Predicting the Behavior of Finite Precision
Lanczos and Conjugate Gradient Computations," SIAM J. Matrix Anal. Appl. 13,
121-137.
A. Greenbaum and L.N. Trefethen (1994). "GMRES/CR and Arnoldi/Lanczos as Matrix
Approximation Problems," SIAM J. Sci. Comp. 15, 359-368.
J. Greenstadt (1955). "A Method for Finding Roots of Arbitrary Matrices," Math.
Tables and Other Aids to Comp. 9, 47-52.
R.G. Grimes and J.G. Lewis (1981). "Condition Number Estimation for Sparse Matri-
ces," SIAM J. Sci. and Stat. Comp. 2, 384-88.
R.G. Grimes, J.G. Lewis, and H.D. Simon (1994). "A Shifted Block Lanczos Algorithm
for Solving Sparse Symmetric Generalized Eigenproblems," SIAM J. Matrix Anal.
Appl. 15, 228-272.
W.D. Gropp and D.E. Keyes (1988). "Complexity of Parallel Implementation of Domain
Decomposition Techniques for Elliptic Partial Differential Equations," SIAM J. Sci.
and Stat. Comp. 9, 312-326.
W.D. Gropp and D.E. Keyes (1992). "Domain Decomposition with Local Mesh Refine-
ment," SIAM J. Sci. Statist. Comput. 13, 967-993.
M. Gu and S.C. Eisenstat (1995). "A Divide-and-Conquer Algorithm for the Bidiagonal
SVD," SIAM J. Matrix Anal. Appl. 16, 79-92.
M. Gu and S.C. Eisenstat (1995). "A Divide-and-Conquer Algorithm for the Symmetric
Tridiagonal Eigenproblem," SIAM J. Matrix Anal. Appl. 16, 172-191.
M. Gulliksson (1994). "Iterative Refinement for Constrained and Weighted Linear Least
Squares," BIT 34, 239-253.
M. Gulliksson (1995). "Backward Error Analysis for the Constrained and Weighted
Linear Least Squares Problem When Using the Weighted QR Factorization," SIAM
J. Matrix Anal. Appl. 13, 675-687.
M. Gulliksson and P-A. Wedin (1992). "Modifying the QR-Decomposition to Con-
strained and Weighted Linear Least Squares," SIAM J. Matrix Anal. Appl. 13,
1298-1313.
R.F. Gunst, J.T. Webster, and R.L. Mason (1976). "A Comparison of Least Squares
and Latent Root Regression Estimators," Technometrics 18, 75-83.
K.K. Gupta (1972). "Solution of Eigenvalue Problems by Sturm Sequence Method," Int.
J. Numer. Meth. Eng. 4, 379-404.
M. Gutknecht (1992). "A Completed Theory of the Unsymmetric Lanczos Process and
Related Algorithms, Part I," SIAM J. Matrix Anal. Appl. 13, 594-639.
M. Gutknecht (1993). "Variants of BiCGSTAB for Matrices with Complex Spectrum,"
SIAM J. Sci. and Stat. Comp. 14, 1020-1033.
M. Gutknecht (1994). "A Completed Theory of the Unsymmetric Lanczos Process and
Related Algorithms, Part II," SIAM J. Matrix Anal. Appl. 15, 15-58.
W. Hackbusch (1994). Iterative Solution of Large Sparse Systems of Equations, Springer-
Verlag, New York.
D. Hacon (1993). "Jacobi's Method for Skew-Symmetric Matrices," SIAM J. Matrix
Anal. Appl. 14, 619-628.
L.A. Hageman and D.M. Young (1981). Applied Iterative Methods, Academic Press,
New York.
W.W. Hager (1984). "Condition Estimates," SIAM J. Sci. and Stat. Comp. 5, 311-316.
W.W. Hager (1988). Applied Numerical Linear Algebra, Prentice-Hall, Englewood Cliffs,
NJ.
S.J. Hammarling (1974). "A Note on Modifications to the Givens Plane Rotation," J.
Inst. Math. Appl. 13, 215-18.
S.J. Hammarling (1985). "The Singular Value Decomposition in Multivariate Statistics,"
ACM SIGNUM Newsletter 20, 2-25.
S.L. Handy and J.L. Barlow (1994). "Numerical Solution of the Eigenproblem for
Banded, Symmetric Toeplitz Matrices," SIAM J. Matrix Anal. Appl. 15, 205-214.
M. Hanke and J.G. Nagy (1994). "Toeplitz Approximate Inverse Preconditioner for
Banded Toeplitz Matrices," Numerical Algorithms 7, 183-199.
M. Hanke and M. Neumann (1990). "Preconditionings and Splittings for Rectangular
Systems," Numer. Math. 57, 85-96.
E.R. Hansen (1962). "On Quasicyclic Jacobi Methods," ACM J. 9, 118-35.
E.R. Hansen (1963). "On Cyclic Jacobi Methods," SIAM J. AppL Math. 11, 448-59.
P.C. Hansen (1987). "The Truncated SVD as a Method for Regularization," BIT 27,
534-553.
P.C. Hansen (1988). "Reducing the Number of Sweeps in Hestenes Method," in Singular
Value Decomposition and Signal Processing, ed. E.F. Deprettere, North Holland.
P.C. Hansen (1990). "Relations Between SVD and GSVD of Discrete Regularization
Problems in Standard and General Form," Lin.Alg. and Its Applic. 141, 165-176.
P.C. Hansen and H. Gesmar (1993). "Fast Orthogonal Decomposition of Rank-Deficient
Toeplitz Matrices," Numerical Algorithms 4, 151-166.
R.J. Hanson and C.L. Lawson (1969). "Extensions and Applications of the Householder
Algorithm for Solving Linear Least Squares Problems," Math. Comp. 23, 787-812.
V. Hari (1982). "On the Global Convergence of the Eberlein Method for Real Matrices,"
Numer. Math. 39, 361-370.
V. Hari (1991). "On Pairs of Almost Diagonal Matrices," Lin. Alg. and Its Applic.
148, 193-223.
M.T. Heath, ed. (1986). Proceedings of First SIAM Conference on Hypercube Multipro-
cessors, SIAM Publications, Philadelphia, PA.
M.T. Heath, ed. (1987). Hypercube Multiprocessors, SIAM Publications, Philadelphia,
PA.
M.T. Heath (1997). Scientific Computing: An Introductory Survey, McGraw-Hill, New
York.
M.T. Heath, A.J. Laub, C. C. Paige, and R.C. Ward (1986). "Computing the SVD of a
Product of Two Matrices," SIAM J. Sci. and Stat. Comp. 7, 1147-1159.
M.T. Heath, E. Ng, and B.W. Peyton (1991). "Parallel Algorithms for Sparse Linear
Systems," SIAM Review 33, 420-460.
M.T. Heath and C.H. Romine (1988). "Parallel Solution of Triangular Systems on Dis-
tributed Memory Multiprocessors," SIAM J. Sci. and Stat. Comp. 9, 558-588.
M. Hegland (1991). "On the Parallel Solution of Tridiagonal Systems by Wrap-Around
Partitioning and Incomplete LU Factorization," Numer. Math. 59, 453-472.
G. Heinig and P. Jankowski (1990). "Parallel and Superfast Algorithms for Hankel
Systems of Equations," Numer. Math. 58, 109-127.
D.E. Heller (1976). "Some Aspects of the Cyclic Reduction Algorithm for Block Tridi-
agonal Linear Systems," SIAM J. Num. Anal. 13, 484-96.
D.E. Heller (1978). "A Survey of Parallel Algorithms in Numerical Linear Algebra,"
SIAM Review 20, 740-777.
D.E. Heller and I.C.F. Ipsen (1983). "Systolic Networks for Orthogonal Decompositions,"
SIAM J. Sci. and Stat. Comp. 4, 261-269.
B.W. Helton (1968). "Logarithms of Matrices," Proc. Amer. Math. Soc. 19, 733-36.
H.V. Henderson and S.R. Searle (1981). "The Vec-Permutation Matrix, The Vec Opera-
tor, and Kronecker Products: A Review," Linear and Multilinear Algebra 9, 271-288.
B. Hendrickson and D. Womble ( 1994). "The Torus-Wrap Mapping for Dense Matrix
Calculations on Massively Parallel Computers," SIAM J. Sci. Comput. 15, 1201-
1226.
C.S. Henkel, M.T. Heath, and R.J. Plemmons (1988). "Cholesky Downdating on a
Hypercube," in G. Fox (1988), 1592-1598.
P. Henrici (1958). "On the Speed of Convergence of Cyclic and Quasicyclic Jacobi
Methods for Computing the Eigenvalues of Hermitian Matrices," SIAM J. Appl.
Math. 6, 144-62.
P. Henrici (1962). "Bounds for Iterates, Inverses, Spectral Variation and Fields of Values
of Non-normal Matrices," Numer. Math. 4, 24-40.
P. Henrici and K. Zimmermann (1968). "An Estimate for the Norms of Certain Cyclic
Jacobi Operators," Lin. Alg. and Its Applic. 1, 489-501.
M.R. Hestenes (1980). Conjugate Direction Methods in Optimization, Springer-Verlag,
Berlin.
M.R. Hestenes (1990). "Conjugacy and Gradients," in A History of Scientific Comput-
ing, Addison-Wesley, Reading, MA.
M.R. Hestenes and E. Stiefel (1952). "Methods of Conjugate Gradients for Solving
Linear Systems," J. Res. Nat. Bur. Stand. 49, 409-36.
G. Hewer and C. Kenney (1988). "The Sensitivity of the Stable Lyapunov Equation,"
SIAM J. Control Optim. 26, 321-344.
D.J. Higham (1995). "Condition Numbers and Their Condition Numbers," Lin. Alg.
and Its Applic. 214, 193-213.
D.J. Higham and N.J. Higham (1992). "Componentwise Perturbation Theory for Linear
Systems with Multiple Right-Hand Sides," Lin. Alg. and Its Applic. 174, 111-129.
D.J. Higham and N.J. Higham (1992). "Backward Error and Condition of Structured
Linear Systems," SIAM J. Matrix Anal. Appl. 13, 162-175.
D.J. Higham and L.N. Trefethen (1993). "Stiffness of ODEs," BIT 33, 285-303.
N.J. Higham (1985). "Nearness Problems in Numerical Linear Algebra," PhD Thesis,
University of Manchester, England.
N.J. Higham (1986). "Newton's Method for the Matrix Square Root," Math. Comp.
46, 537-550.
N.J. Higham (1986). "Computing the Polar Decomposition-with Applications," SIAM
J. Sci. and Stat. Comp. 7, 1160-1174.
N.J. Higham (1986). "Efficient Algorithms for Computing the Condition Number of a
Tridiagonal Matrix," SIAM J. Sci. and Stat. Comp. 7, 150-165.
N.J. Higham (1987). "A Survey of Condition Number Estimation for Triangular Matri-
ces," SIAM Review 29, 575-596.
N.J. Higham (1987). "Error Analysis of the Bjorck-Pereyra Algorithms for Solving Van-
dermonde Systems," Numer. Math. 50, 613-632.
N.J. Higham (1987). "Computing Real Square Roots of a Real Matrix," Lin. Alg. and
Its Applic. 88/89, 405-430.
N.J. Higham (1988). "Fast Solution of Vandermonde-like Systems Involving Orthogonal
Polynomials," IMA J. Num. Anal. 8, 473-486.
N.J. Higham (1988). "Computing a Nearest Symmetric Positive Semidefinite Matrix,"
Lin. Alg. and Its Applic. 103, 103-118.
N.J. Higham (1988). "The Symmetric Procrustes Problem," BIT 28, 133-43.
N.J. Higham (1988). "FORTRAN Codes for Estimating the One-Norm of a Real or
Complex Matrix with Applications to Condition Estimation (Algorithm 674)," ACM
Trans. Math. Soft. 14, 381-396.
N.J. Higham (1989). "Matrix Nearness Problems and Applications," in Applications of
Matrix Theory, M.J.C. Gover and S. Barnett (eds), Oxford University Press, Oxford
UK, 1-27.
N.J. Higham (1989). "The Accuracy of Solutions to Triangular Systems," SIAM J. Num.
Anal. 26, 1252-1265.
N.J. Higham (1990). "Bounding the Error in Gaussian Elimination for Tridiagonal
Systems," SIAM J. Matrix Anal. Appl. 11, 521-530.
N.J. Higham (1990). "Stability Analysis of Algorithms for Solving Confluent Vandermonde-
like Systems," SIAM J. Matrix Anal. Appl. 11, 23-41.
N.J. Higham (1990). "Analysis of the Cholesky Decomposition of a Semidefinite Matrix,"
in Reliable Numerical Computation, M.G. Cox and S.J. Hammarling (eds), Oxford
University Press, Oxford, UK, 161-185.
N.J. Higham (1990). "Exploiting Fast Matrix Multiplication within the Level 3 BLAS,"
ACM Trans. Math. Soft. 16, 352-368.
N.J. Higham (1991). "Iterative Refinement Enhances the Stability of QR Factorization
Methods for Solving Linear Equations," BIT 31, 447--468.
N.J. Higham (1992). "Stability of a Method for Multiplying Complex Matrices with
Three Real Matrix Multiplications," SIAM J. Matri:r: Anal. Appl. 13, 681-687.
N.J. Higham {1992). "Estimating the Matrix p-Norm," Numer. Math. 62, 539-556.
N.J. Higham (1993). "Optimization by Direct Search in Matrix Computations," SIAM
J. Matri:r: Anal. Appl. 14, 317-333.
N.J. Higham (1993). "Perturbation Theory and Backward Error for AX - XB = C,"
BIT 33, 124-136.
N.J. Higham (1994). ''The Matrix Sign Decomposition and Its Relation to the Polar
Decomposition," Lin. Alg. and It. Applic. 212/213, 3--20.
N.J. Higham (1994). "A Survey of Componentwise Perturbation Theory in Numerical
Linear Algebra," in Mathematics of Computation 1943-1993: A Half Century of
Computational Mathematics, W. Gautschi (ed.), Volume 48 of Proceedings of Sym-
posia in Applied Mathematics, American Mathematical Society, Providence, Rhode
Island.
N.J. Higham (1995). "Stability of Parallel Triangular System Solvers," SIAM J. Sci.
Comp. 16, 400-413.
N.J. Higham (1996). Accuracy and Stability of Numerical Algorithms, SIAM Publica-
tions, Philadelphia, PA.
N.J. Higham and D.J. Higham (1989). "Large Growth Factors in Gaussian Elimination
with Pivoting," SIAM J. Matrix Anal. Appl. 10, 155-164.
N.J. Higham and P.A. Knight (1995). "Matrix Powers in Finite Precision Arithmetic,"
SIAM J. Matrix Anal. Appl. 16, 343-358.
N.J. Higham and P. Papadimitriou (1994). "A Parallel Algorithm for Computing the
Polar Decomposition," Parallel Comp. 20, 1161-1173.
R.W. Hockney (1965). "A Fast Direct Solution of Poisson's Equation Using Fourier
Analysis," J. ACM 12, 95-113.
R.W. Hockney and C.R. Jesshope (1988). Parallel Computers 2, Adam Hilger, Bristol
and Philadelphia.
W. Hoffman and B.N. Parlett (1978). "A New Proof of Global Convergence for the
Tridiagonal QL Algorithm," SIAM J. Num. Anal. 15, 929-37.
S. Holmgren and K. Otto (1992). "Iterative Solution Methods and Preconditioners for
Block-Tridiagonal Systems of Equations," SIAM J. Matrix Anal. Appl. 13, 863-886.
H. Hotelling (1957). "The Relations of the Newer Multivariate Statistical Methods to
Factor Analysis," Brit. J. Stat. Psych. 10, 69-79.
P.D. Hough and S.A. Vavasis (1996). "Complete Orthogonal Decomposition for Weighted
Least Squares," SIAM J. Matrix Anal. Appl., to appear.
A.S. Householder (1958). "Unitary Triangularization of a Nonsymmetric Matrix," J.
ACM 5, 339-42.
A.S. Householder (1964). The Theory of Matrices in Numerical Analysis, Dover Pub-
lications, New York.
A.S. Householder (1968). "Moments and Characteristic Roots II," Numer. Math. 11,
126-28.
R. Horn and C. Johnson (1985). Matrix Analysis, Cambridge University Press, New
York.
R. Horn and C. Johnson (1991). Topics in Matrix Analysis, Cambridge University Press,
New York.
C.P. Huang (1975). "A Jacobi-Type Method for Triangularizing an Arbitrary Matrix,"
SIAM J. Num. Anal. 12, 566-70.
C.P. Huang (1981). "On the Convergence of the QR Algorithm with Origin Shifts for
Normal Matrices," IMA J. Num. Anal. 1, 127-33.
C.-M. Huang and D.P. O'Leary (1993). "A Krylov Multisplitting Algorithm for Solving
Linear Systems of Equations," Lin. Alg. and Its Applic. 194, 9-29.
T. Huckle (1992). "Circulant and Skewcirculant Matrices for Solving Toeplitz Matrix
Problems," SIAM J. Matrix Anal. Appl. 13, 767-777.
T. Huckle (1992). "A Note on Skew-Circulant Preconditioners for Elliptic Problems,"
Numerical Algorithms 2, 279-286.
T. Huckle (1994). "The Arnoldi Method for Normal Matrices," SIAM J. Matrix Anal.
Appl. 15, 479-489.
T. Huckle (1995). "Low-Rank Modification of the Unsymmetric Lanczos Algorithm,"
Math. Comp. 64, 1577-1588.
T.E. Hull and J.R. Swensen (1966). "Tests of Probabilistic Models for Propagation of
Roundoff Errors," Comm. ACM 9, 108-13.
T-M. Hwang, W-W. Lin, and E.K. Yang (1992). "Rank-Revealing LU Factorizations,"
Lin. Alg. and Its Applic. 175, 115-141.
Y. Ikebe (1979). "On Inverses of Hessenberg Matrices," Lin. Alg. and Its Applic. 24,
93-97.
I.C.F. Ipsen, Y. Saad, and M. Schultz (1986). "Dense Linear Systems on a Ring of
Processors," Lin. Alg. and Its Applic. 77, 205-239.
C.G.J. Jacobi (1846). "Über ein Leichtes Verfahren Die in der Theorie der Säcularstörun-
gen Vorkommenden Gleichungen Numerisch Aufzulösen," Crelle's J. 30, 51-94.
P. Jacobson, B. Kågström, and M. Rannar (1992). "Algorithm Development for Dis-
tributed Memory Multicomputers Using Conlab," Scientific Programming 1, 185-
203.
H.J. Jagadish and T. Kailath (1989). "A Family of New Efficient Arrays for Matrix
Multiplication," IEEE Trans. Comput. 38, 149-155.
W. Jalby and B. Philippe (1991). "Stability Analysis and Improvement of the Block
Gram-Schmidt Algorithm," SIAM J. Sci. Stat. Comp. 12, 1058-1073.
M. Jankowski and M. Wozniakowski (1977). "Iterative Refinement Implies Numerical
Stability," BIT 17, 303-311.
K.C. Jea and D.M. Young (1983). "On the Simplification of Generalized Conjugate
Gradient Methods for Nonsymmetrizable Linear Systems," Lin. Alg. and Its Applic.
52/53, 399-417.
A. Jennings (1977). "Influence of the Eigenvalue Spectrum on the Convergence Rate of
the Conjugate Gradient Method," J. Inst. Math. Applic. 20, 61-72.
A. Jennings (1977). Matrix Computation for Engineers and Scientists, John Wiley and
Sons, New York.
A. Jennings and J.J. McKeowen (1992). Matrix Computation (2nd ed), John Wiley and
Sons, New York.
A. Jennings and D.R.L. Orr (1971). "Application of the Simultaneous Iteration Method
to Undamped Vibration Problems," Inst. J. Numer. Math. Eng. 3, 13-24.
A. Jennings and M.R. Osborne (1977). "Generalized Eigenvalue Problems for Certain
Unsymmetric Band Matrices," Lin. Alg. and Its Applic. 29, 139-50.
A. Jennings and W.J. Stewart (1975). "Simultaneous Iteration for the Partial Eigenso-
lution of Real Matrices," J. Inst. Math. Applic. 15, 351-62.
L.S. Jennings and M.R. Osborne (1974). "A Direct Error Analysis for Least Squares,"
Numer. Math. 22, 322-32.
P.S. Jenson (1972). "The Solution of Large Symmetric Eigenproblems by Sectioning,"
SIAM J. Num. Anal. 9, 534-45.
E.R. Jessup and D.C. Sorensen (1994). "A Parallel Algorithm for Computing the Sin-
gular Value Decomposition of a Matrix," SIAM J. Matrix Anal. Appl. 15, 530-548.
Z. Jia (1995). "The Convergence of Generalized Lanczos Methods for Large Unsymmetric
Eigenproblems," SIAM J. Matrix Anal. Appl. 16, 543-562.
J. Johnson and C.L. Phillips (1971). "An Algorithm for the Computation of the Integral
of the State Transition Matrix," IEEE Trans. Auto. Cont. AC-16, 204-5.
O.G. Johnson, C.A. Micchelli, and G. Paul (1983). "Polynomial Preconditioners for
Conjugate Gradient Calculations," SIAM J. Numer. Anal. 20, 362-376.
R.J. Johnston (1971). "Gershgorin Theorems for Partitioned Matrices," Lin. Alg. and
Its Applic. 4, 205-20.
S.L. Johnsson (1985). "Solving Narrow Banded Systems on Ensemble Architectures,"
ACM Trans. Math. Soft. 11, 271-288.
S.L. Johnsson (1986). "Band Matrix System Solvers on Ensemble Architectures," in
Supercomputers: Algorithms, Architectures, and Scientific Computation, eds. F.A.
Matsen and T. Tajima, University of Texas Press, Austin TX., 196--216.
S.L. Johnsson (1987). "Communication Efficient Basic Linear Algebra Computations on
Hypercube Multiprocessors," J. Parallel and Di.otributcd Computing, No. 4, 133-172.
S.L. Johnsson (1987). "Solving Tridiagonal Systems on Ensemble Architectures," SIAM
J. Sci. and Stat. Comp. 8, 354-392.
S.L. Johnsson and C.T. Ho (1988). "Matrix Transposition on Boolean n-cube Configured
Ensemble Architectures," SIAM J. Matrix Anal. Appl. 9, 419--454.
S.L. Johnsson and W. Lichtenstein (1993). "Block Cyclic Dense Linear Algebra," SIAM
J. Sci. Comp. 14, 1257-1286.
S.L. Johnsson and K. Mathur (1989). "Experience with the Conjugate Gradient Method
for Stress Analysis on a Data Parallel Supercomputer," International Journal on
Numerical Methods in Engineering 27, 523-546.
P. Joly and G. Meurant (1993). "Complex Conjugate Gradient Methods," Numerical
Algorithms 4, 379-406.
M.T. Jones and M.L. Patrick (1993). "Bunch-Kaufman Factorization for Real Symmetric
Indefinite Banded Matrices," SIAM J. Matrix Anal. Appl. 14, 553-559.
M.T. Jones and M.L. Patrick (1994). "Factoring Symmetric Indefinite Matrices on High-
Performance Architectures," SIAM J. Matrix Anal. Appl. 15, 273-283.
T. Jordan (1984). "Conjugate Gradient Preconditioners for Vector and Parallel Pro-
cessors," in G. Birkhoff and A. Schoenstadt (eds), Proceedings of the Conference on
Elliptic Problem Solvers, Academic Press, NY.
W. Joubert (1992). "Lanczos Methods for the Solution of Nonsymmetric Systems of
Linear Equations," SIAM J. Matrix Anal. Appl. 13, 926-943.
B. Kågström (1977). "Numerical Computation of Matrix Functions," Department of
Information Processing Report UMINF-58.77, University of Umeå, Sweden.
B. Kågström (1977). "Bounds and Perturbation Bounds for the Matrix Exponential,"
BIT 17, 39-57.
B. Kågström (1985). "The Generalized Singular Value Decomposition and the General
A - λB Problem," BIT 24, 568-583.
B. Kågström (1986). "RGSVD: An Algorithm for Computing the Kronecker Structure
and Reducing Subspaces of Singular A - λB Pencils," SIAM J. Sci. and Stat. Comp.
7, 185-211.
B. Kågström (1994). "A Perturbation Analysis of the Generalized Sylvester Equation
(AR - LB, DR - LE) = (C, F)," SIAM J. Matrix Anal. Appl. 15, 1045-1060.
B. Kågström, P. Ling, and C. Van Loan (1991). "High-Performance Level-3 BLAS:
Sample Routines for Double Precision Real Data," in High Performance Computing
II, M. Durand and F. El Dabaghi (eds), North-Holland, 269-281.
B. Kågström, P. Ling, and C. Van Loan (1995). "GEMM-Based Level-3 BLAS: High-
Performance Model Implementations and Performance Evaluation Benchmark," in
Parallel Programming and Applications, P. Fritzon and L. Finmo (eds), IOS Press,
184-188.
B. Kågström and P. Poromaa (1992). "Distributed and Shared Memory Block Algo-
rithms for the Triangular Sylvester Equation with sep^-1 Estimators," SIAM J. Ma-
trix Anal. Appl. 13, 90-101.
B. Kågström and A. Ruhe (1980). "An Algorithm for Numerical Computation of the
Jordan Normal Form of a Complex Matrix," ACM Trans. Math. Soft. 6, 398-419.
B. Kågström and A. Ruhe (1980). "Algorithm 560 JNF: An Algorithm for Numerical
Computation of the Jordan Normal Form of a Complex Matrix," ACM Trans. Math.
Soft. 6, 437-43.
B. Kågström and A. Ruhe, eds. (1983). Matrix Pencils, Proc. Pite Havsbad, 1982,
Lecture Notes in Mathematics 973, Springer-Verlag, New York and Berlin.
B. Kågström and L. Westin (1989). "Generalized Schur Methods with Condition Esti-
mators for Solving the Generalized Sylvester Equation," IEEE Trans. Auto. Cont.
AC-34, 745-751.
W. Kahan (1966). "Numerical Linear Algebra," Canadian Math. Bull. 9, 757-801.
W. Kahan (1975). "Spectra of Nearly Hermitian Matrices," Proc. Amer. Math. Soc.
48, 11-17.
W. Kahan and B.N. Parlett (1976). "How Far Should You Go with the Lanczos Process?"
in Sparse Matrix Computations, ed. J. Bunch and D. Rose, Academic Press, New
York, pp. 131-44.
W. Kahan, B.N. Parlett, and E. Jiang (1982). "Residual Bounds on Approximate Eigen-
systems of Nonnormal Matrices," SIAM J. Numer. Anal. 19, 47o-484.
D. Kahaner, C.B. Moler, and S. Nash (1988). Numerical Methods and Software, Prentice-
Hall, Englewood Cliffs, NJ.
T. Kailath and J. Chun (1994). "Generalized Displacement Structure for Block-Toeplitz,
Toeplitz-Block, and Toeplitz-Derived Matrices," SIAM J. Matrix Anal. Appl. 15,
114-128.
T. Kailath and A.H. Sayed (1995). "Displacement Structure: Theory and Applications,"
SIAM Review 37, 297-386.
C. Kamath and A. Sameh (1989). "A Projection Method for Solving Nonsymmetric
Linear Systems on Multiprocessors," Parallel Computing 9, 291-312.
S. Kaniel (1966). "Estimates for Some Computational Techniques in Linear Algebra,"
Math. Comp. 20, 369-78.
I.E. Kaporin (1994). "New Convergence Results and Preconditioning Strategies for the
Conjugate Gradient Method," Num. Lin. Alg. Applic. 1, 179-210.
R.N. Kapur and J.C. Browne (1984). "Techniques for Solving Block Tridiagonal Systems
on Reconfigurable Array Computers," SIAM J. Sci. and Stat. Comp. 5, 701-719.
I. Karasalo (1974). "A Criterion for Truncation of the QR Decomposition Algorithm for
the Singular Linear Least Squares Problem," BIT 14, 156-66.
E.M. Kasenally (1995). "GMBACK: A Generalized Minimum Backward Error Algorithm
for Nonsymmetric Linear Systems," SIAM J. Sci. Comp. 16, 698-719.
T. Kato (1966). Perturbation Theory for Linear Operators, Springer-Verlag, New York.
L. Kaufman (1974). "The LZ Algorithm to Solve the Generalized Eigenvalue Problem,"
SIAM J. Num. Anal. 11, 997-1024.
L. Kaufman (1977). "Some Thoughts on the QZ Algorithm for Solving the Generalized
Eigenvalue Problem," ACM Trans. Math. Soft. 3, 65-75.
L. Kaufman (1979). "Application of Dense Householder Transformations to a Sparse
Matrix," ACM Trans. Math. Soft. 5, 442-51.
L. Kaufman (1987). "The Generalized Householder Transformation and Sparse Matri-
ces," Lin. Alg. and Its Applic. 90, 221-234.
L. Kaufman (1993). "An Algorithm for the Banded Symmetric Generalized Matrix
Eigenvalue Problem," SIAM J. Matrix Anal. Appl. 14, 372-389.
J. Kautsky and G.H. Golub (1983). "On the Calculation of Jacobi Matrices," Lin. Alg.
and Its Applic. 52/53, 439-456.
C.S. Kenney and A.J. Laub (1989). "Condition Estimates for Matrix Functions," SIAM
J. Matrix Anal. Appl. 10, 191-209.
C.S. Kenney and A.J. Laub (1991). "Rational Iterative Methods for the Matrix Sign
Function," SIAM J. Matriz Anal. Appl. 12, 273-291.
C.S. Kenney and A.J. Laub (1992). "On Scaling Newton's Method for Polar Decompo-
sition and the Matrix Sign Function," SIAM J. Matrix Anal. Appl. 13, 688-706.
C.S. Kenney and A.J. Laub (1994). "Small-Sample Statistical Condition Estimates for
General Matrix Functions," SIAM J. Sci. Comp. 15, 36-61.
D. Kershaw (1982). "Solution of Single Tridiagonal Linear Systems and Vectorization of
the ICCG Algorithm on the Cray-1," in G. Rodrigue (ed), Parallel Computation,
Academic Press, NY, 1982.
D.E. Keyes, T.F. Chan, G. Meurant, J.S. Scroggs, and R.G. Voigt (eds) (1992). Do-
main Decomposition Methods for Partial Differential Equations, SIAM Publications,
Philadelphia, PA.
A. Kielbasinski (1987). "A Note on Rounding Error Analysis of Cholesky Factorization,"
Lin. Alg. and Its Applic. 88/89, 487-494.
S.K. Kim and A.T. Chronopoulos (1991). "A Class of Lanczos-Like Algorithms Imple-
mented on Parallel Computers," Parallel Comput. 17, 763-778.
F. Kittaneh (1995). "Singular Values of Companion Matrices and Bounds on Zeros of
Polynomials," SIAM J. Matrix Anal. Appl. 16, 333-340.
P.A. Knight (1993). "Error Analysis of Stationary Iteration and Associated Problems,"
Ph.D. thesis, Department of Mathematics, University of Manchester, England.
P.A. Knight (1995). "F&Bt Rectangular Matrix Multiplication and the QR Decomposi-
tion," Lin. A/g. and I1.8 Applic. 11111, 69-81.
D. Knuth (1981). The Art of Computer Programming , vol. !. Seminumericul Algo-
rithms, 2nd ed., Addison-Wesley, Reading, Massachusetts.
E.G. Kogbetliantz (1955). "Solution of Linear Equations by Diagonalization of Coeffi-
cient Matrix," Quart. Appl. Math. 13, 123-132.
S. Kourouklis and C.C. Paige (1981). "A Constrained Least Squares Approach to the
General Galliiii-Markov Linear Model," J. A mer. Stat. Assoc. 76, 620-25.
V.N. Kublanovskaya (1961). "On Some Algorithms for the Solution of the Complete
Eigenvalue Problem," USSR Comp. Math. Phys. 3, 637-57.
V.N. Kublanovskaya (1984). "AB Algorithm and Its Modifications for the Spectral
Problem of Linear Pencils of Matrices," Numer. Math. 43, 329-342.
V.N. Kublanovskaja and V.N. Fadeeva (1964). "Computational Methods for the Solution
of a Generalized Eigenvalue Problem," Amer. Math. Soc. Transl. 2, 271-90.
J. Kuczynski and H. Wozniakowski (1992). "Estimating the Largest Eigenvalue by the
Power and Lanczos Algorithms with a Random Start," SIAM J. Matrix Anal. Appl.
13, 1094-1122.
U.W. Kulisch and W.L. Miranker (1986). "The Arithmetic of the Digital Computer,"
SIAM Review 28, 1-40.
V. Kumar, A. Grama, A. Gupta and G. Karypis (1994). Introduction to Parallel Com-
puting: Design and Analysis of Algorithms, Benjamin/Cummings, Reading, MA.
H.T. Kung (1982). "Why Systolic Architectures?," Computer 15, 37-46.
C.D. La Budde (1964). "Two Classes of Algorithms for Finding the Eigenvalues and
Eigenvectors of Real Symmetric Matrices," J. ACM 11, 53-58.
S. Lakshmivarahan and S.K. Dhall (1990). Analysis and Design of Parallel Algorithms:
Arithmetic and Matrix Problems, McGraw-Hill, New York.
J. Lambiotte and R.G. Voigt (1975). "The Solution of Tridiagonal Linear Systems on
the CDC-STAR 100 Computer," ACM Trans. Math. Soft. 1, 308-29.
P. Lsncaste< (1970). "Explicit Solution of Linear Matrix Equations," SIAM Review 12,
544-{16.
P. Lanc&Bter and M. Tismenetsky (1985). The Theory of Matrices, S=nd Edition,
Academic Press, New York.
C. Lanczos (1950). "An Iteration Method for the Solution of the Eigenvalue Problem of
Linear Differential and Integral Operators," J. Res. Nat. Bur. Stand. 45, 25&-82.
B. Lang (1996). "Parallel Reduction of Banded Matric.., to Bidiagonal Form," Parnllel
Computing 22, 1-18.
J. Larson and A. Sameb (1978). "Efficient Calculation of the EtfectsofRDundotfErrors,"
ACM Trans. Math. Soft. 4, 228--36.
A. Laub (1981). "Efficient Multivariable Frequency Response Computations," IEEE
Trans. Auto. Cont. AC-26, 407-8.
A. Laub(1985). "Numerical Linear Algebra Aspects of Control Design Computations,"
IEEE Trans. Auto. Cont. AC-30, 97-108.
C.L. LaWHOn and R.J. Hanson (1969). "Extensions and Applications of the Householder
Algorithm for Solving Linear Le&Bt Squares Problems," Math. Camp. 23, 787-812.
C.L. Lawson and R.J. Hanson (1974). Solving Least Squares Problems, Prentice-Hall,
Englewood Cliffs, NJ. Reprinted with a detailed "new developments" appendix in
1996 by SIAM Publications, Philadelphia, PA.
C.L. Lawson, R.J. Hanson, D.R. Kincaid, and F.T. Krogh (1979). "Basic Linear
Algebra Subprograms for FORTRAN Usage," ACM Trans. Math. Soft. 5, 308-323.
C.L. Lawson, R.J. Hanson, D.R. Kincaid, and F.T. Krogh (1979). "Algorithm 539,
Basic Linear Algebra Subprograms for FORTRAN Usage," ACM Trans. Math. Soft.
5, 324-325.
D. Lay (1994). Linear Algebra and Its Applications, Addison-Wesley, Reading, MA.
N.J. Lehmann (1963). "Optimale Eigenwerteinschliessungen," Numer. Math. 5, 246-72.
R.B. Lehoucq (1995). "Analysis and Implementation of an Implicitly Restarted Arnoldi
Iteration," Ph.D. thesis, Rice University, Houston, Texas.
R.B. Lehoucq (1996). "Restarting an Arnoldi Reduction," Report MCS-P591-0496, Ar-
gonne National Laboratory, Argonne, Illinois.
R.B. Lehoucq and D.C. Sorensen (1996). "Deflation Techniques for an Implicitly Restarted
Iteration," SIAM J. Matrix Analysis and Applic., to appear.
F.T. Leighton (1992). Introduction to Parallel Algorithms and Architectures, Morgan
Kaufmann, San Mateo, CA.
F. Lemeire (1973). "Bounds for Condition Numbers of Triangular Value of a Matrix,"
Lin. Alg. and Its Applic. 11, 1-2.
S.J. Leon (1980). Linear Algebra with Applications, Macmillan, New York.
S.J. Leon (1994). "Maximizing Bilinear Forms Subject to Linear Constraints," Lin. Alg.
and Its Applic. 210, 49-58.
N. Levinson (1947). "The Wiener RMS Error Criterion in Filter Design and Prediction,"
J. Math. Phys. 25, 261-78.
J. Lewis, ed. (1994). Proceedings of the Fifth SIAM Conference on Applied Linear
Algebra, SIAM Publications, Philadelphia, PA.
G. Li and T. Coleman (1988). "A Parallel Triangular Solver for a Distributed-Memory
Multiprocessor," SIAM J. Sci. and Stat. Comp. 9, 485-502.
K. Li and T-Y. Li (1993). "A Homotopy Algorithm for a Symmetric Generalized Eigen-
problem," Numerical Algorithms 4, 167-195.
K. Li, T-Y. Li, and Z. Zeng (1994). "An Algorithm for the Generalized Symmetric
Tridiagonal Eigenvalue Problem," Numerical Algorithms 8, 269-291.
R-C. Li (1993). "Bounds on Perturbations of Generalized Singular Values and of Asso-
ciated Subspaces," SIAM J. Matrix Anal. Appl. 14, 195-234.
R-C. Li (1994). "On Eigenvalue Variations of Rayleigh Quotient Matrix Pencils of a
Definite Pencil," Lin. Alg. and Its Applic. 208/209, 471-483.
R-C. Li (1995). "New Perturbation Bounds for the Unitary Polar Factor," SIAM J.
Matrix Anal. Appl. 16, 327-332.
R.-C. Li (1996). "Relative Perturbation Theory (I) Eigenvalue and Singul...- Value Vari-
ations," 'Thchnical Report UCB//CSD-94-855, Department of EECS, University of
California at Berkeley.
R.-C. Li (1996). "Relative Perturbation Theory (II) Eigenspace and Singular Subspace
Variations," Technical Report UCB/ /CSD-94-856, Department of EECS, University
of California at Berkeley.
Y. Li (1993). "A Globally Convergent Method for Lp Problems," SIAM J. Optimization
3, 609-629.
W-W. Lin and C.W. Chen (1991). "An Acceleration Method for Computing the Gen-
eralized Eigenvalue Problem on a Parallel Computer," Lin. Alg. and Its Applic. 146,
49-65.
I. Linnik (1961). Method of Least Squares and Principles of the Theory of Observations,
Pergamon Press, New York.
E. Linzer (1992). "On the Stability of Solution Methods for Band Toeplitz Systems,"
Lin. Alg. and Its Applic. 170, 1-32.
S. Lo, B. Philippe, and A. Sameh (1987). "A Multiprocessor Algorithm for the Symmet-
ric Tridiagonal Eigenvalue Problem," SIAM J. Sci. and Stat. Comp. 8, s155-s165.
G. Loizou (1969). "Nonnormality and Jordan Condition Numbers of Matrices," J. ACM
16, 580-84.
G. Loizou (1972). "On the Quadratic Convergence of the Jacobi Method for Normal
Matrices," Comp. J. 15, 274-76.
M. Lotkin (1956). "Characteristic Values of Arbitrary Matrices," Quart. Appl. Math.
14, 267-75.
H. Lu (1994). "Fast Solution of Confluent Vandermonde Linear Systems," SIAM J.
Matrix Anal. Appl. 15, 1277-1289.
H. Lu (1996). "Solution of Vandermonde-like Systems and Confluent Vandermonde-like
Systems," SIAM J. Matrix Anal. Appl. 17, 127-138.
D.G. Luenberger (1973). Introduction to Linear and Nonlinear Programming, Addison-
Wesley, New York.
F.T. Luk (1978). "Sparse and Parallel Matrix Computations," PhD Thesis, Report
STAN-CS-78-685, Department of Computer Science, Stanford University, Stanford,
CA.
F.T. Luk (1980). "Computing the Singular Value Decomposition on the ILLIAC IV,"
ACM Trans. Math. Soft. 6, 524-39.
F.T. Luk (1986). "A Rotation Method for Computing the QR Factorization," SIAM J.
Sci. and Stat. Comp. 7, 452-459.
F.T. Luk (1986). "A Triangular Processor Array for Computing Singular Values," Lin.
Alg. and Its Applic. 77, 259-274.
N. Mackey (1995). "Hamilton and Jacobi Meet Again: Quaternions and the Eigenvalue
Problem," SIAM J. Matrix Anal. Appl. 16, 421-435.
A. Madansky (1959). "The Fitting of Straight Lines When Both Variables Are Subject
to Error," J. Amer. Stat. Assoc. 54, 173-205.
N. Madsen, G. Rodrigue, and J. Karush (1976). "Matrix Multiplication by Diagonals
on a Vector Parallel Processor," Information Processing Letters 5, 41-45.
K.N. Majinder (1979). "Linear Combinations of Hermitian and Real Symmetric Matri-
ces," Lin. Alg. and Its Applic. 25, 95-105.
J. Makhoul (1975). "Linear Prediction: A Tutorial Review," Proc. IEEE 63(4), 561-80.
M.A. Malcolm and J. Palmer (1974). "A Fast Method for Solving a Class of Tridiagonal
Systems of Linear Equations," Comm. ACM 17, 14-17.
L. Mansfield (1991). "Damped Jacobi Preconditioning and Coarse Grid Deflation for
Conjugate Gradient Iteration on Parallel Computers," SIAM J. Sci. and Stat. Comp.
12, 1314-1323.
T.A. Manteuffel (1977). "The Tchebychev Iteration for Nonsymmetric Linear Systems,"
Numer. Math. 28, 307-27.
T.A. Manteuffel (1979). "Shifted Incomplete Cholesky Factorization," in Sparse Matrix
Proceedings, 1978, ed. I.S. Duff and G.W. Stewart, SIAM Publications, Philadelphia,
PA.
M. Marcus (1993). Matrices and MATLAB: A Tutorial, Prentice Hall, Upper Saddle
River, NJ.
M. Marcus and H. Minc (1964). A Survey of Matrix Theory and Matrix Inequalities,
Allyn and Bacon, Boston.
J. Markel and A. Gray (1976). Linear Prediction of Speech, Springer-Verlag, Berlin and
New York.
M. Marrakchi and Y. Robert (1989). "Optimal Algorithms for Gaussian Elimination on
an MIMD Computer," Parallel Computing 12, 183-194.
R.S. Martin, G. Peters, and J.H. Wilkinson (1970). "The QR Algorithm for Real Hes-
senberg Matrices," Numer. Math. 14, 219-31. See also Wilkinson and Reinsch (1971,
pp. 359-71).
R.S. Martin and J.H. Wilkinson (1965). "Symmetric Decomposition of Positive Definite
Band Matrices," Numer. Math. 7, 355-61.
R.S. Martin and J.H. Wilkinson (1967). "Solution of Symmetric and Unsymmetric Band
Equations and the Calculation of Eigenvectors of Band Matrices," Numer. Math. 9,
279-301. See also Wilkinson and Reinsch (1971, pp. 70-92).
R.S. Martin and J.H. Wilkinson (1968). "Similarity Reduction of a General Matrix to
Hessenberg Form," Numer. Math. 12, 349-68. See also Wilkinson and Reinsch
(1971, pp. 339-58).
R.S. Martin and J.H. Wilkinson (1968). "The Modified LR Algorithm for Complex Hes-
senberg Matrices," Numer. Math. 12, 369-76. See also Wilkinson and Reinsch (1971,
pp. 396-403).
R.S. Martin and J.H. Wilkinson (1968). "Householder's Tridiagonalization of a Sym-
metric Matrix," Numer. Math. 11, 181-95. See also Wilkinson and Reinsch (1971,
pp. 212-26).
R.S. Martin and J.H. Wilkinson (1968). "Reduction of a Symmetric Eigenproblem Ax =
λBx and Related Problems to Standard Form," Numer. Math. 11, 99-110.
R.S. Martin, G. Peters, and J.H. Wilkinson (1965). "Symmetric Decomposition of a
Positive Definite Matrix," Numer. Math. 7, 362-83.
R.S. Martin, G. Peters, and J.H. Wilkinson (1966). "Iterative Refinement of the Solution
of a Positive Definite System of Equations," Numer. Math. 8, 203-16.
R.S. Martin, C. Reinsch, and J.H. Wilkinson (1970). "The QR Algorithm for Band
Symmetric Matrices," Numer. Math. 16, 85-92. See also Wilkinson and Reinsch
(1971, pp. 266-72).
W.F. Mascarenhas (1994). "A Note on Jacobi Being More Accurate than QR," SIAM
J. Matrix Anal. Appl. 15, 215-218.
R. Mathias (1992). "Matrices with Positive Definite Hermitian Part: Inequalities and
Linear Systems," SIAM J. Matrix Anal. Appl. 13, 640-654.
R. Mathias (1992). "Evaluating the Frechet Derivative of the Matrix Exponential,"
Numer. Math. 63, 213-226.
R. Mathias (1993). "Approximation of Matrix-Valued Functions," SIAM J. Matrix Anal.
Appl. 14, 1061-1063.
R. Mathias (1993). "Perturbation Bounds for the Polar Decomposition," SIAM J. Matrix
Anal. Appl. 14, 588-597.
R. Mathias (1995). "Accurate Eigensystem Computations by Jacobi Methods," SIAM
J. Matrix Anal. Appl. 16, 977-1003.
R. Mathias (1995). "The Instability of Parallel Prefix Matrix Multiplication," SIAM J.
Sci. Comp. 16, 956-973.
R. Mathias and G.W. Stewart (1993). "A Block QR Algorithm and the Singular Value
Decomposition," Lin. Alg. and Its Applic. 182, 91-100.
K. Mathur and S.L. Johnsson (1994). "Multiplication of Matrices of Arbitrary Shape on
a Data Parallel Computer," Parallel Computing 20, 919-952.
B. Mattingly, C. Meyer, and J. Ortega (1989). "Orthogonal Reduction on Vector Com-
puters," SIAM J. Sci. and Stat. Comp. 10, 372-381.
0. McBryan and E.F. van de Velde (1987). "Hypercube Algorithms and Implementa-
tions," SIAM J. Sci. and Stat. Camp. 8, s227__,287.
C. McCarthy and G. Strang (1973). "Optimal Conditioning of Matrices," SIAM J. Num.
Anal. 10, 37(}-88.
S.F. McCormick (1972). "A General Approach to One-Step Iterative Methods with
Application to Eigenvalue Problems," J. Comput. Sys. Sci. 6, 354-72.
W.M. McKeeman (1962). "Crout with Equilibration and Iteration," Comm. ACM. 5,
553-55.
K. Meerbergen, A. Spence, and D. Roose (1994). "Shift-Invert and Cayley Transforms
for the Detection of Rightmost Eigenvalues of Nonsymmetric Matrices," BIT 34,
409-423.
V. Mehrmann (1988). "A Symplectic Orthogonal Method for Single Input or Single
Output Discrete Time Optimal Quadratic Control Problems," SIAM J. Matrix Anal.
Appl. 9, 221-247.
V. Mehrmann (1993). "Divide and Conquer Methods for Block Tridiagonal Systems,"
Parallel Computing 19, 257-280.
U. Meier (1985). "A Parallel Partition Method for Solving Banded Systems of Linear
Equations," Parallel Computing 2, 33-43.
U. Meier and A. Sameh (1988). "The Behavior of Conjugate Gradient Algorithms on a
Multivector Processor with a Hierarchical Memory," J. Comput. Appl. Math. 24,
13-32.
J.A. Meijerink and H.A. Van der Vorst (1977). "An Iterative Solution Method for Linear
Equation Systems of Which the Coefficient Matrix is a Symmetric M-Matrix," Math.
Comp. 31, 148-62.
J. Meinguet (1983). "Refined Error Analyses of Cholesky Factorization," SIAM J. Nu-
mer. Anal. 20, 1243-1250.
R. Melhem (1987). "Toward Efficient Implementation of Preconditioned Conjugate Gradient
Methods on Vector Supercomputers," Int'l J. Supercomputing Applications 1,
70-98.
M.L. Merriam (1985). "On the Factorization of Block Tridiagonals With Storage Con-
straints," SIAM J. Sci. and Stat. Comp. 6, 182-192.
G. Meurant (1984). "The Block Preconditioned Conjugate Gradient Method on Vector
Computers," BIT 24, 623-633.
G. Meurant (1989). "Domain Decomposition Methods for Partial Differential Equations
on Parallel Computers," to appear, Int'l J. Supercomputing Applications.
G. Meurant (1992). "A Review on the Inverse of Symmetric Tridiagonal and Block
Tridiagonal Matrices," SIAM J. Matrix Anal. Appl. 13, 707-728.
C.D. Meyer (1997). A Course in Applied Linear Algebra, to be published.
C.D. Meyer and G.W. Stewart (1988). "Derivatives and Perturbations of Eigenvectors,"
SIAM J. Num. Anal. 25, 679-691.
W. Miller (1975). "Computational Complexity and Numerical Stability," SIAM J. Computing
4, 97-107.
W. Miller and D. Spooner (1978). "Software for Roundoff Analysis, II," ACM Trans.
Math. Soft. 4, 369-90.
G. Miminis and C.C. Paige (1982). "An Algorithm for Pole Assignment of Time Invariant
Linear Systems," International J. of Control 35, 341-354.
L. Mirsky (1960). "Symmetric Gauge Functions and Unitarily Invariant Norms," Quart.
J. Math. 11, 50-59.
L. Mirsky (1963). An Introduction to Linear Algebra, Oxford University Press, Oxford.
J.J. Modi (1988). Parallel Algorithms and Matrix Computation, Oxford University Press,
Oxford.
J.J. Modi and M.R.B. Clarke (1986). "An Alternative Givens Ordering," Numer. Math.
43, 83-90.
J.J. Modi and J.D. Pryce (1985). "Efficient Implementation of Jacobi's Diagonalization
Method on the DAP," Numer. Math. 46, 443-454.
C.B. Moler (1967). "Iterative Refinement in Floating Point," J. ACM 14, 316-71.
C.B. Moler and D. Morrison (1983). "Singular Value Analysis of Cryptograms," Amer.
Math. Monthly 90, 78-87.
C.B. Moler and G.W. Stewart (1973). "An Algorithm for Generalized Matrix Eigenvalue
Problems," SIAM J. Num. Anal. 10, 241-56.
C.B. Moler and C.F. Van Loan (1978). "Nineteen Dubious Ways to Compute the Expo-
nential of a Matrix," SIAM Review 20, 801-36.
R. Montoye and D. Lawrie (1982). "A Practical Algorithm for the Solution of Triangular
Systems on a Parallel Processing System," IEEE Trans. Comp. C-31, 1076-1082.
M.S. Moonen and B. De Moor, eds. (1995). SVD and Signal Processing III: Algorithms,
Analysis, and Applications, Elsevier, Amsterdam.
M.S. Moonen, G.H. Golub, and B.L.R. de Moor, eds. (1993). Linear Algebra for Large
Scale and Real-Time Applications, Kluwer, Dordrecht, The Netherlands.
M.S. Moonen, P. Van Dooren, and J. Vandewalle (1992). "A Singular Value Decomposition
Updating Algorithm," SIAM J. Matrix Anal. Appl. 13, 1015-1038.
R.B. Morgan (1995). "A Restarted GMRES Method Augmented with Eigenvectors,"
SIAM J. Matrix Anal. Applic. 16, 1154-1171.
R.B. Morgan (1996). "On Restarting the Arnoldi Method for Large Scale Eigenvalue
Problems," Math Comp, to appear.
M. Mu (1995). "A New Family of Preconditioners for Domain Decomposition," SIAM
J. Sci. Comp. 16, 289-306.
D. Mueller (1966). "Householder's Method for Complex Matrices and Hermitian Matri-
ces," Numer. Math. 8, 72-92.
F.D. Murnaghan and A. Wintner (1931). "A Canonical Form for Real Matrices Under
Orthogonal Transformations," Proc. Nat. Acad. Sci. 17, 417-20.
N. Nachtigal, S. Reddy, and L. Trefethen (1992). "How Fast Are Nonsymmetric Matrix
Iterations?," SIAM J. Matrix Anal. Appl. 13, 778-795.
N. Nachtigal, L. Reichel, and L. Trefethen (1992). "A Hybrid GMRES Algorithm for
Nonsymmetric Linear Systems," SIAM J. Matrix Anal. Appl. 13, 796-825.
T. Nanda (1985). "Differential Equations and the QR Algorithm," SIAM J. Numer.
Anal. 22, 310-321.
J.C. Nash (1975). "A One-Sided Transformation Method for the Singular Value Decomposition
and Algebraic Eigenproblem," Comp. J. 18, 74-76.
M.Z. Nashed (1976). Generalized Inverses and Applications, Academic Press, New York.
R.A. Nicolaides (1974). "On a Geometrical Aspect of SOR and the Theory of Consistent
Ordering for Positive Definite Matrices," Numer. Math. 12, 99--104.
W. Niethammer and R.S. Varga (1983). "The Analysis of k-step Iterative Methods for
Linear Systems from Summability Theory," Numer. Math. 41, 177-206.
B. Noble and J.W. Daniel (1977). Applied Linear Algebra, Prentice-Hall, Englewood
Cliffs.
Y. Notay (1992). "On the Robustness of Modified Incomplete Factorization Methods,"
J. Comput. Math. 40, 121-141.
C. Oara (1994). "Proper Deflating Subspaces: Properties, Algorithms, and Applications,"
Numerical Algorithms 7, 355-373.
W. Oettli and W. Prager (1964). "Compatibility of Approximate Solutions of Linear
Equations with Given Error Bounds for Coefficients and Right Hand Sides," Numer.
Math. 6, 405-409.
D.P. O'Leary (1980). "Estimating Matrix Condition Numbers," SIAM J. Sci. Stat.
Comp. 1, 205-9.
D.P. O'Leary (1980). "The Block Conjugate Gradient Algorithm and Related Methods,"
Lin. Alg. and Its Applic. 29, 293-322.
D.P. O'Leary (1987). "Parallel Implementation of the Block Conjugate Gradient Algorithm,"
Parallel Computing 5, 127-140.
D.P. O'Leary (1990). "On Bounds for Scaled Projections and Pseudoinverses," Lin. Alg.
and Its Applic. 132, 115-117.
D.P. O'Leary and J.A. Simmons (1981). "A Bidiagonalization-Regularization Procedure
for Large Scale Discretizations of Ill-Posed Problems," SIAM J. Sci. and Stat. Comp.
2, 474-489.
D.P. O'Leary and G.W. Stewart (1985). "Data Flow Algorithms for Parallel Matrix
Computations," Comm. ACM 28, 841-853.
D.P. O'Leary and G.W. Stewart (1986). "Assignment and Scheduling in Parallel Matrix
Factorization," Lin. Alg. and Its Applic. 77, 275-300.
S.J. Olszanskyj, J.M. Lebak, and A.W. Bojanczyk (1994). "Rank-k Modification Meth-
ods for Recursive Least Squares Problems," Numerical Algorithms 7, 325-354.
A.V. Oppenheim (1978). Applications of Digital Signal Processing , Prentice-Hall, En-
glewood Cliffs.
J.M. Ortega (1987). Matrix Theory: A Second Course, Plenum Press, New York.
J.M. Ortega (1988). "The ijk Forms of Factorization Methods I: Vector Computers,"
Parallel Computing 7, 135-147.
J.M. Ortega (1988). Introduction to Parallel and Vector Solution of Linear Systems,
Plenum Press, New York.
J.M. Ortega and C.H. Romine (1988). "The ijk Forms of Factorization Methods II:
Parallel Systems," Parallel Computing 7, 149-162.
J.M. Ortega and R.G. Voigt (1985). "Solution of Partial Differential Equations on Vector
and Parallel Computers," SIAM Review 27, 149-240.
E.E. Osborne (1960). "On Preconditioning of Matrices," JACM 7, 338-45.


M.H.C. Paardekooper (1971). "An Eigenvalue Algorithm for Skew Symmetric Matrices,"
Numer. Math. 17, 189-202.
M.H.C. Paardekooper (1991). "A Quadratically Convergent Parallel Jacobi Process
for Diagonally Dominant Matrices with Nondistinct Eigenvalues," Lin. Alg. and Its
Applic. 145, 71-88.
C.C. Paige (1970). "Practical Use of the Symmetric Lanczos Process with Reorthogonalization,"
BIT 10, 183-95.
C.C. Paige (1971). ''The Computation of Eigenvalues and Eigenvectors of Very Large
Sparse Matrices," Ph.D. thesis, London University.
C.C. Paige (1972). "Computational Variants of the Lanczos Method for the Eigenprob-
lem," J. Inst. Math. Applic. 10, 373-81.
C.C. Paige (1973). "An Error Analysis of a Method for Solving Matrix Equations,"
Math. Comp. 27, 355-59.
C.C. Paige (1974). "Bidiagonalization of Matrices and Solution of Linear Equations,"
SIAM J. Num. Anal. 11, 197-209.
C.C. Paige (1974). "Eigenvalues of Perturbed Hermitian Matrices," Lin. Alg. and Its
Applic . 8, 1-10.
C.C. Paige (1976). "Error Analysis of the Lanczos Algorithm for Tridiagonalizing a Symmetric
Matrix," J. Inst. Math. Applic. 18, 341-49.
C.C. Paige (1979). "Computer Solution and Perturbation Analysis of Generalized Least
Squares Problems," Math. Comp. 33, 171-84.
C.C. Paige (1979). "Fast Numerically Stable Computations for Generalized Linear Least
Squares Problems," SIAM J. Num. Anal. 16, 165-71.
C.C. Paige (1980). "Accuracy and Effectiveness of the Lanczos Algorithm for the Symmetric
Eigenproblem," Lin. Alg. and Its Applic. 34, 235-58.
C.C. Paige (1981). "Properties of Numerical Algorithms Related to Computing Controllability,"
IEEE Trans. Auto. Cont. AC-26, 130-38.
C.C. Paige (1984). "A Note on a Result of Sun J.-Guang: Sensitivity of the CS and
GSV Decompositions," SIAM J. Numer. Anal. 21, 188-191.
C.C. Paige (1985). "The General Linear Model and the Generalized Singular Value
Decomposition," Lin. Alg. and Its Applic. 70, 269-284.
C.C. Paige (1986). "Computing the Generalized Singular Value Decomposition," SIAM
J. Sci. and Stat. Comp. 7, 1128-1146.
C.C. Paige (1990). "Some Aspects of Generalized QR Factorization," in Reliable Nu-
merical Computations, M. Cox and S. Hammarling (eds), Clarendon Press, Oxford.
C.C. Paige, B.N. Parlett, and H.A. Van der Vorst (1995). "Approximate Solutions and
Eigenvalue Bounds from Krylov Subspaces," Numer. Linear Algebra with Applic. 2,
115-134.
C.C. Paige and M.A. Saunders (1975). "Solution of Sparse Indefinite Systems of Linear
Equations," SIAM J. Num. Anal. 12, 617-29.
C. C. Paige and M. Saunders (1981). "Toward a Generalized Singular Value Decomposi-
tion," SIAM J. Num. Anal. 18, 398-405.
C.C. Paige and M.A. Saunders (1982). "LSQR: An Algorithm for Sparse Linear Equations
and Sparse Least Squares," ACM Trans. Math. Soft. 8, 43-71.
C.C. Paige and M.A. Saunders (1982). "Algorithm 583 LSQR: Sparse Linear Equations
and Least Squares Problems," ACM Trans. Math. Soft. 8, 195-209.
C.C. Paige and P. Van Dooren (1986). "On the Quadratic Convergence of Kogbetliantz's
Algorithm for Computing the Singular Value Decomposition," Lin. Alg. and Its
Applic. 77, 301-313.
C.C. Paige and C. Van Loan (1981). "A Schur Decomposition for Hamiltonian Matrices,"
Lin. Alg. and Its Applic. 41, 11-32.
C. C. Paige and M. Wei (1993). "Analysis of the Generalized Total Least Squares Problem
AX= B when Some of the Columns are Free of Error," Numer. Math. 65, 177-202.
C. C. Paige and M. Wei (1994). "History and Generality of the CS Decomposition," Lin.
Alg. and Its Applic. 208/209, 303-326.
C.-T. Pan (1993). "A Perturbation Analysis of the Problem of Downdating a Cholesky
Factorization," Lin. Alg. and Its Applic. 183, 103-115.
V. Pan (1984). "How Can We Speed Up Matrix Multiplication?," SIAM Review 26,
393--416.
H. Park (1991). "A Parallel Algorithm for the Unbalanced Orthogonal Procrustes Prob-
lem," Parallel Computing 17, 913-923.
H. Park and L. Elden (1995). "Downdating the Rank-Revealing URV Decomposition,"
SIAM J. Matrix Anal. Appl. 16, 138-155.
B.N. Parlett (1965). "Convergence of the Q-R Algorithm," Numer. Math. 7, 187-93.
(Correction in Numer. Math. 10, 163-64.)
B.N. Parlett (1966). "Singular and Invariant Matrices Under the QR Algorithm," Math.
Comp. 20, 611-15.
B.N. Parlett (1967). "Canonical Decomposition of Hessenberg Matrices," Math. Comp.
21, 223-27.
B.N. Parlett (1968). "Global Convergence of the Basic QR Algorithm on Hessenberg
Matrices," Math. Comp. 22, 803-17.
B.N. Parlett (1971). "Analysis of Algorithms for Reflections in Bisectors," SIAM Review
13, 197-208.
B.N. Parlett (1974). "The Rayleigh Quotient Iteration and Some Generalizations for
Nonnormal Matrices," Math. Comp. 28, 679-93.
B.N. Parlett (1976). "A Recurrence Among the Elements of Functions of Triangular
Matrices," Lin. Alg. and Its Applic. 14, 117-21.
B.N. Parlett (1980). The Symmetric Eigenvalue Problem, Prentice-Hall, Englewood
Cliffs, NJ.
B.N. Parlett (1980). "A New Look at the Lanczos Algorithm for Solving Symmetric
Systems of Linear Equations," Lin. Alg. and Its Applic. 29, 323--46.
B.N. Parlett (1992). "Reduction to Tridiagonal Form and Minimal Realizations," SIAM
J. Matrix Anal. Appl. 13, 567-593.
B.N. Parlett (1995). "The New qd Algorithms," ACTA Numerica 5, 459-491.
B.N. Parlett and B. Nour-Omid (1985). "The Use of a Refined Error Bound When
Updating Eigenvalues of Tridiagonals," Lin. Alg. and Its Applic. 68, 179-220.
B.N. Parlett and W.G. Poole (1973). "A Geometric Theory for the QR, LU, and Power
Iterations," SIAM J. Num. Anal. 10, 389-412.
B.N. Parlett and J.K. Reid (1970). "On the Solution of a System of Linear Equations
Whose Matrix is Symmetric but not Definite," BIT 10, 386-97.
B.N. Parlett and J.K. Reid (1981). "Tracking the Progress of the Lanczos Algorithm for
Large Symmetric Eigenproblems," IMA J. Num. Anal. 1, 135-55.
B.N. Parlett and C. Reinsch (1969). "Balancing a Matrix for Calculation of Eigen-
values and Eigenvectors," Numer. Math. 13, 292-304. See also Wilkinson and
Reinsch(1971, pp. 315-26).
B.N. Parlett and R. Schreiber (1988). "Block Reflectors: Theory and Computation,"
SIAM J. Num. Anal. 25, 189-205.
B.N. Parlett and D.S. Scott (1979). "The Lanczos Algorithm with Selective Orthogo-
nalization," Math. Comp. 33, 217-38.
B.N. Parlett, H. Simon, and L.M. Stringer (1982). "On Estimating the Largest Eigen-
value with the Lanczos Algorithm," Math. Comp. 38, 153-166.
B.N. Parlett, D. Taylor, and Z. Liu (1985). "A Look-Ahead Lanczos Algorithm for
Unsymmetric Matrices," Math. Comp. 44, 105-124.
N. Patel and H. Jordan (1984). "A Parallelized Point Rowwise Successive Over-Relaxation
Method on a Multiprocessor," Parallel Computing 1, 207-222.
R.V. Patel, A.J. Laub, and P.M. Van Dooren, eds. (1994). Numerical Linear Algebra
Techniques for Systems and Control, IEEE Press, Piscataway, New Jersey.
D.A. Patterson and J.L. Hennessy (1989). Computer Architecture: A Quantitative Approach,
Morgan Kaufmann Publishers, Inc., Palo Alto, CA.
M.S. Paterson and L.J. Stockmeyer (1973). "On the Number of Nonscalar Multiplications
Necessary to Evaluate Polynomials," SIAM J. Comp. 2, 60-66.
K. Pearson (1901). "On Lines and Planes of Closest Fit to Points in Space," Phil. Mag.
2, 559-72.
G. Peters and J.H. Wilkinson (1969). "Eigenvalues of Ax = λBx with Band Symmetric
A and B," Comp. J. 12, 398-404.
G. Peters and J.H. Wilkinson (1970). "The Least Squares Problem and Pseudo-Inverses,"
Comp. J. 13, 309-16.
G. Peters and J.H. Wilkinson (1970). "Ax = λBx and the Generalized Eigenproblem,"
SIAM J. Num. Anal. 7, 479-92.
G. Peters and J.H. Wilkinson (1971). "The Calculation of Specified Eigenvectors by
Inverse Iteration," in Wilkinson and Reinsch (1971, pp. 418-39).
G. Peters and J.H. Wilkinson (1979). "Inverse Iteration, Ill-Conditioned Equations, and
Newton's Method," SIAM Review 21, 339-60.
D.J. Pierce and R.J. Plemmons (1992). "Fast Adaptive Condition Estimation," SIAM
J. Matrix Anal. Appl. 13, 274-291.
S. Pissanetsky (1984). Sparse Matrix Technology, Academic Press, New York.
R.J. Plemmons (1974). "Linear Least Squares by Elimination and MGS," J. Assoc.
Comp. Mach. 21, 581-85.
R.J. Plemmons (1986). "A Parallel Block Iterative Scheme Applied to Computations in
Structural Analysis," SIAM J. Alg. and Disc. Methods 7, 337-347.
R.J. Plemmons and C.D. Meyer, eds. (1993). Linear Algebra, Markov Chains, and
Queuing Models, Springer-Verlag, New York.
A. Pokrzywa (1986). "On Perturbations and the Equivalence Orbit of a Matrix Pencil,"
Lin. Alg. and Applic. 82, 99--121.
E.L. Poole and J.M. Ortega (1987). "Multicolor ICCG Methods for Vector Computers,"
SIAM J. Numer. Anal. 24, 1394-1418.
D.A. Pope and C. Tompkins (1957). "Maximizing Functions of Rotations: Experiments
Concerning Speed of Diagonalization of Symmetric Matrices Using Jacobi's Method,"
J. ACM 4, 459--66.
A. Pothen, S. Jha, and U. Vemapulati (1987). "Orthogonal Factorization on a Distributed
Memory Multiprocessor," in Hypercube Multiprocessors, ed. M.T. Heath,
SIAM Publications, 1987.
M.J.D. Powell and J.K. Reid (1968). "On Applying Householder's Method to Linear
Least Squares Problems," Proc. IFIP Congress, pp. 122-26.
R. Pratap (1995). Getting Started with MATLAB, Saunders College Publishing, Fort
Worth, TX.
J.D. Pryce (1984). "A New Measure of Relative Error for Vectors,'' SIAM J. Num.
Anal. 21, 202-21.
C. Puglisi (1992). "Modification of the Householder Method Based on the Compact WY
Representation," SIAM J. Sci. and Stat. Comp. 13, 723-726.
S. Qiao (1986). "Hybrid Algorithm for Fast Toeplitz Orthogonalization," Numer. Math.
53, 351-366.
S. Qiao (1988). "Recursive Least Squares Algorithm for Linear Prediction Problems,"
SIAM J. Matrix Anal. Appl. 9, 323--328.
C.M. Rader and A.O. Steinhardt (1988). "Hyperbolic Householder Transforms," SIAM
J. Matrix Anal. Appl. 9, 269-290.
G. Radicati di Brozolo and Y. Robert (1989). "Parallel Conjugate Gradient-like Algo-
rithms for Solving Sparse Nonsymmetric Linear Systems on a Vector Multiprocessor,"
Parallel Computing 11, 233-240.
P. Raghavan (1995). "Distributed Sparse Gaussian Elimination and Orthogonal Factorization,"
SIAM J. Sci. Comp. 16, 1462-1477.
W. Rath (1982). "Fast Givens Rotations for Orthogonal Similarity," Numer. Math. 40,
47-56.
P.A. Regalia and S. Mitra (1989). "Kronecker Products, Unitary Matrices, and Signal
Processing Applications," SIAM Review 31, 586-613.
L. Reichel (1991). "Fast QR Decomposition of Vandermonde-Like Matrices and Polynomial
Least Squares Approximation," SIAM J. Matrix Anal. Appl. 12, 552-564.
L. Reichel and L.N. Trefethen (1992). "Eigenvalues and Pseudo-eigenvalues of Toeplitz
Matrices," Lin. Alg. and Its Applic. 162/163/164, 153-186.
J.K. Reid (1967). "A Note on the Least Squares Solution of a Band System of Linear
Equations by Householder Reductions," Comp. J. 10, 188-89.
J.K. Reid (1971). "A Note on the Stability of Gaussian Elimination," J. Inst. Math.
Applic. 8, 374-75.
J.K. Reid (1971). " On the Method of Conjugate Gradients for the Solution of Large
Sparse Systems of Linear Equations," in Large Sparse Sets of Linear Equations , ed.
J.K. Reid, Academic Press, New York, pp. 231-54.
J.K. Reid (1972). "The Use of Conjugate Gradients for Systems of Linear Equations
Possessing Property A," SIAM J. Nv.m. Anal. 9, 325-32.
C. Reinsch and F.L. Bauer (1968). "Rational QR Transformation with Newton's Shift
for Symmetric Tridiagonal Matrices," Numer. Math. 11, 264-72. See also Wilkinson
and Reinsch (1971, pp. 257-65).
J.R. Rice (1966). "A Theory of Condition," SIAM J. Num. Anal. 3, 287-310.
J.R. Rice (1966). "Experiments on Gram-Schmidt Orthogonalization," Math. Comp.
20, 325-28.
J.R. Rice (1981). Matrix Computations and Mathematical Software, Academic Press,
New York.
R.F. Rinehart (1955). "The Equivalence of Definitions of a Matric Function," Amer.
Math. Monthly 62, 395-414.
Y. Robert (1990). The Impact of Vector and Parallel Architectures on the Gaussian
Elimination Algorithm, Halsted Press, New York.
H.H. Robertson (1977). "The Accuracy of Error Estimates for Systems of Linear Algebraic
Equations," J. Inst. Math. Applic. 20, 409-14.
G. Rodrigue (1973). "A Gradient Method for the Matrix Eigenvalue Problem Ax =
λBx," Numer. Math. 22, 1-16.
G. Rodrigue, ed. (1982). Parallel Computation, Academic Press, New York.
G. Rodrigue and D. Wolitzer (1984). "Preconditioning by Incomplete Block Cyclic
Reduction," Math. Comp. 42, 549-566.
C.H. Romine and J.M. Ortega (1988). "Parallel Solution of Triangular Systems of Equations,"
Parallel Computing 6, 109-114.
D.J. Rose (1969). "An Algorithm for Solving a Special Class of Tridiagonal Systems of
Linear Equations," Comm. ACM 12, 234-36.
D.J. Rose and R. A. Willoughby, eds. (1972). Sparse Matrices and Their Applications,
Plenum Press, New York, 1972
A. Ruhe (1968). "On the Quadratic Convergence of a Generalization of the Jacobi Method
to Arbitrary Matrices," BIT 8, 210-31.
A. Ruhe (1969). "The Norm of a Matrix After a Similarity Transformation," BIT 9,
53-58.
A. Ruhe (1970). "An Algorithm for Numerical Determination of the Structure of a
General Matrix," BIT 10, 196-216.
A. Ruhe (1970). "Perturbation Bounds for Means of Eigenvalues and Invariant Subspaces,"
BIT 10, 343-54.
A. Ruhe (1970). "Properties of a Matrix with a Very Ill-Conditioned Eigenproblem,"
Numer. Math. 15, 57-60.
A. Ruhe (1972). "On the Quadratic Convergence of the Jacobi Method for Normal
Matrices," BIT 7, 305-13.
A. Ruhe (1974). "SOR Methods for the Eigenvalue Problem with Large Sparse Matri-
ces," Math. Comp. 28, 695-710.
A. Ruhe (1975). "On the Closeness of Eigenvalues and Singular Values for Almost
Normal Matrices," Lin. Alg. and Its Applic. 11, 87-94.
A. Ruhe (1979). "Implementation Aspects of Band Lanczos Algorithms for Computation
of Eigenvalues of Large Sparse Symmetric Matrices," Math. Comp. 33, 680-87.
A. Ruhe (1983). "Numerical Aspects of Gram-Schmidt Orthogonalization of Vectors,"
Lin. Alg. and Its Applic. 52/53, 591-601.
A. Ruhe (1984). "Rational Krylov Algorithms for Eigenvalue Computation," Lin. Alg.
and Its Applic. 58, 391-405.
A. Ruhe (1987). "Closest Normal Matrix Found!," BIT 27, 585-598.
A. Ruhe (1994). "Rational Krylov Algorithms for Nonsymmetric Eigenvalue Problems
II. Matrix Pairs," Lin. Alg. and Its Applic. 197, 283-295.
A. Ruhe (1994). "The Rational Krylov Algorithm for Nonsymmetric Eigenvalue Problems
III: Complex Shifts for Real Matrices," BIT 34, 165-176.
A. Ruhe and T. Wiberg (1972). "The Method of Conjugate Gradients Used in Inverse
Iteration," BIT 12, 543-54.
H. Rutishauser (1958). "Solution of Eigenvalue Problems with the LR Transformation,"
Nat. Bur. Stand. App. Math. Ser. 49, 47-81.
H. Rutishauser (1966). "Bestimmung der Eigenwerte Orthogonaler Matrizen," Numer.
Math. 9, 104-108.
H. Rutishauser (1966). "The Jacobi Method for Real Symmetric Matrices," Numer.
Math. 9, 1-10. See also Wilkinson and Reinsch (1971, pp. 202-11).
H. Rutishauser (1970). "Simultaneous Iteration Method for Symmetric Matrices," Nu-
mer. Math. 16, 205-23. See also Wilkinson and Reinsch (1971,pp.284-302).
Y. Saad (1980). "On the Rates of Convergence of the Lanczos and the Block Lanczos
Methods," SIAM J. Num. Anal. 17, 687-706.
Y. Saad (1980). "Variations of Arnoldi's Method for Computing Eigenelements of Large
Unsymmetric Matrices," Lin. Alg. and Its Applic. 34, 269-295.
Y. Saad (1981). "Krylov Subspace Methods for Solving Large Unsymmetric Linear
Systems," Math. Comp. 37, 105-126.
Y. Saad (1982). "The Lanczos Biorthogonalization Algorithm and Other Oblique Projection
Methods for Solving Large Unsymmetric Systems," SIAM J. Numer. Anal.
19, 485-506.
Y. Saad (1984). "Practical Use of Some Krylov Subspace Methods for Solving Indefinite
and Nonsymmetric Linear Systems," SIAM J. Sci. and Stat. Comp. 5, 203-228.
Y. Saad (1985). "Practical Use of Polynomial Preconditionings for the Conjugate Gradient
Method," SIAM J. Sci. and Stat. Comp. 6, 865-882.
Y. Saad (1986). "On the Condition Number of Some Gram Matrices Arising from Least
Squares Approximation in the Complex Plane," Numer. Math. 48, 337-348.
Y. Saad (1987). "On the Lanczos Method for Solving Symmetric Systems with Several
Right Hand Sides," Math. Comp. 48, 651-662.
Y. Saad (1988). "Preconditioning Techniques for Indefinite and Nonsymmetric Linear
Systems," J. Comput. Appl. Math. 24, 89-105.
Y. Saad (1989). "Krylov Subspace Methods on Supercomputers," SIAM J. Sci. and
Stat. Comp. 10, 1200-1232.
Y. Saad (1992). Numerical Methods for Large Eigenvalue Problems: Theory and Algorithms,
John Wiley and Sons, New York.
Y. Saad (1993). "A Flexible Inner-Outer Preconditioned GMRES Algorithm," SIAM J.
Sci. Comput. 14, 461-469.
Y. Saad (1996). Iterative Methods for Sparse Linear Systems, PWS Publishing Co.,
Boston.
Y. Saad and M.H. Schultz (1985). "Conjugate Gradient-Like Algorithms for Solving
Nonsymmetric Linear Systems," Math. Comp. 44, 417-424.
Y. Saad and M.H. Schultz (1986). "GMRES: A Generalized Minimal Residual Algorithm
for Solving Nonsymmetric Linear Systems," SIAM J. Scientific and Stat. Comp. 7,
856-869.
Y. Saad and M.H. Schultz (1989). "Data Communication in Parallel Architectures," J.
Dist. Parallel Comp. 11, 131-150.
Y. Saad and M.H. Schultz (1989). "Data Communication in Hypercubes," J. Dist.
Parallel Comp. 6, 115-135.
A. Sameh (1971). "On Jacobi and Jacobi-like Algorithms for a Parallel Computer,"
Math. Comp. 25, 579-90.
A. Sameh and D. Kuck (1978). "On Stable Parallel Linear System Solvers," J. Assoc.
Comp. Mach. 25, 81-91.
A. Sameh, J. Lermit, and K. Noh (1975). "On the Intermediate Eigenvalues of Symmetric
Sparse Matrices," BIT 12, 543-54.
M.A. Saunders (1995). "Solution of Sparse Rectangular Systems," BIT 35, 588-604.
M.A. Saunders, H.D. Simon, and E.L. Yip (1988). "Two Conjugate Gradient-Type
Methods for Unsymmetric Linear Equations," SIAM J. Num. Anal. 25, 927-940.
K. Schittkowski and J. Stoer (1979). "A Factorization Method for the Solution of Constrained
Linear Least Squares Problems Allowing for Subsequent Data Changes,"
Numer. Math. 31, 431-463.
W. Schonauer (1987). Scientific Computing on Vector Computers, North Holland, Amsterdam.
P. Schonemann (1966). "A Generalized Solution of the Orthogonal Procrustes Problem,"
Psychometrika 31, 1-10.
A. Schonhage (1964). "On the Quadratic Convergence of the Jacobi Process," Numer.
Math. 6, 410-12.
A. Schonhage (1979). "Arbitrary Perturbations of Hermitian Matrices," Lin. Alg. and
Its Applic. 24, 143-49.
R.S. Schreiber (1986). "Solving Eigenvalue and Singular Value Problems on an Undersized
Systolic Array," SIAM J. Sci. and Stat. Comp. 7, 441-451.
R.S. Schreiber (1988). "Block Algorithms for Parallel Machines," in Numerical Algorithms
for Modern Parallel Computer Architectures, M.H. Schultz (ed), IMA Volumes
in Mathematics and Its Applications, Number 13, Springer-Verlag, Berlin, 197-207.
R.S. Schreiber and B.N. Parlett (1987). "Block Reflectors: Theory and Computation,"
SIAM J. Numer. Anal. 25, 189-205.
R.S. Schreiber and C. Van Loan (1989). "A Storage-Efficient WY Representation for
Products of Householder Transformations," SIAM J. Sci. and Stat. Comp. 10,
52-57.
M.H. Schultz, ed. (1988). Numerical Algorithms for Modern Parallel Computer Architectures,
IMA Volumes in Mathematics and Its Applications, Number 13, Springer-Verlag,
Berlin.
I. Schur (1909). "On the Characteristic Roots of a Linear Substitution with an Application
to the Theory of Integral Equations," Math. Ann. 66, 488-510 (German).
H.R. Schwartz (1968). "Tridiagonalization of a Symmetric Band Matrix," Numer. Math.
12, 231-41. See also Wilkinson and Reinsch (1971, pp. 273-83).
H.R. Schwartz (1974). "The Method of Coordinate Relaxation for (A − λB)x = 0,"
Num. Math. 23, 135-52.
D. Scott (1978). "Analysis of the Symmetric Lanczos Process," Electronic Research
Laboratory Technical Report UCB/ERL M78/40, University of California, Berkeley.
D.S. Scott (1979). "Block Lanczos Software for Symmetric Eigenvalue Problems," Re-
port ORNL/CSD-48, Oak Ridge National Laboratory, Union Carbide Corporation,
Oak Ridge, Tennessee.
D.S. Scott (1979). "How to Make the Lanczos Algorithm Converge Slowly," Math.
Comp. 33, 239-47.
D.S. Scott (1984). "Computing a Few Eigenvalues and Eigenvectors of a Symmetric
Band Matrix," SIAM J. Sci. and Stat. Comp. 5, 658-666.
D.S. Scott (1985). "On the Accuracy of the Gershgorin Circle Theorem for Bounding
the Spread of a Real Symmetric Matrix," Lin. Alg. and Its Applic. 65, 147-155.
D.S. Scott, M.T. Heath, and R.C. Ward (1986). "Parallel Block Jacobi Eigenvalue
Algorithms Using Systolic Arrays," Lin. Alg. and Its Applic. 77, 345-356.
M.K. Seager (1986). "Parallelizing Conjugate Gradient for the Cray X-MP," Parallel
Computing 3, 35-47.
J.J. Seaton (1969). "Diagonalization of Complex Symmetric Matrices Using a Modified
Jacobi Method," Comp. J. 12, 156-57.
S. Serbin (1980). "On Factoring a Class of Complex Symmetric Matrices Without Piv-
oting," Math. Comp. 35, 1231-1234.
S. Serbin and S. Blalock (1979). "An Algorithm for Computing the Matrix Cosine,"
SIAM J. Sci. Stat. Comp. 1, 198-204.
J.W. Sheldon (1955). "On the Numerical Solution of Elliptic Difference Equations,"
Math. Tables Aids Comp. 9, 101-12.
W. Shougen and Z. Shuqin (1991). "An Algorithm for Ax = λBx with Symmetric and
Positive Definite A and B," SIAM J. Matrix Anal. Appl. 12, 654-660.
G. Shroff (1991). "A Parallel Algorithm for the Eigenvalues and Eigenvectors of a
General Complex Matrix," Numer. Math. 58, 779-806.
G. Shroff and C.H. Bischof (1992). "Adaptive Condition Estimation for Rank-One Updates
of QR Factorizations," SIAM J. Matrix Anal. Appl. 13, 1264-1278.
G. Shroff and R. Schreiber (1989). "On the Convergence of the Cyclic Jacobi Method
for Parallel Block Orderings," SIAM J. Matrix Anal. Appl. 10, 326-346.
H. Simon (1984). "Analysis of the Symmetric Lanczos Algorithm with Reorthogonalization
Methods," Lin. Alg. and Its Applic. 61, 101-132.
B. Singer and S. Spilerman (1976). "The Representation of Social Processes by Markov
Models," Amer. J. Sociology 82, 1-54.
R.D. Skeel (1979). "Scaling for Numerical Stability in Gaussian Elimination," J. ACM
26, 494-526.
R.D. Skeel (1980). "Iterative Refinement Implies Numerical Stability for Gaussian Elim-
ination," Math. Comp. 35, 817--832.
R.D. Skeel (1981). "Effect of Equilibration on Residual Size for Partial Pivoting," SIAM
J. Num. Anal. 18, 449-55.
G.L.G. Sleijpen and D.R. Fokkema (1993). "BiCGSTAB(ℓ) for Linear Equations Involving
Unsymmetric Matrices with Complex Spectrum," Electronic Transactions
on Numerical Analysis 1, 11-32.
B.T. Smith, J.M. Boyle, Y. Ikebe, V.C. Klema, and C.B. Moler (1970). Matrix Eigensystem
Routines: EISPACK Guide, 2nd ed., Lecture Notes in Computer Science,
Volume 6, Springer-Verlag, New York.
R.A. Smith (1967). "The Condition Numbers of the Matrix Eigenvalue Problem," Numer.
Math. 10, 232-40.
F. Smithies (1970). Integral Equations, Cambridge University Press, Cambridge.
P. Sonneveld (1989). "CGS, A Fast Lanczos-Type Solver for Nonsymmetric Linear Systems,"
SIAM J. Sci. and Stat. Comp. 10, 36-52.
D.C. Sorensen (1992). "Implicit Application of Polynomial Filters in a k-Step Arnoldi
Method," SIAM J. Matrix Anal. Appl. 13, 357-385.
D.C. Sorensen (1995). "Implicitly Restarted Arnoldi/Lanczos Methods for Large Scale
Eigenvalue Calculations," in Proceedings of the ICASE/LaRC Workshop on Parallel
Numerical Algorithms, May 23-25, 1994, D.E. Keyes, A. Sameh, and V. Venkatakrishnan
(eds), Kluwer.
G.W. Stewart (1969). "Accelerating The Orthogonal Iteration for the Eigenvalues of a
Hermitian Matrix," Numer. Math. 13, 362-76.
G.W. Stewart (1970). "Incorporating Original Shifts into the QR Algorithm for Sym-
metric Tridiagonal Matrices," Comm. ACM 13, 365--67.
G.W. Stewart (1971). "Error Bounds for Approximate Invariant Subspaces of Closed
Linear Operators," SIAM J. Num. Anal. 8, 796-808.
G.W. Stewart (1972). "On the Sensitivity of the Eigenvalue Problem Ax = λBx," SIAM
J. Num. Anal. 9, 669-86.
G.W. Stewart (1973). "Error and Perturbation Bounds for Subspaces Associated with
Certain Eigenvalue Problems," SIAM Review 15, 727-64.
G.W. Stewart (1973). Introduction to Matrix Computations, Academic Press, New York.
G.W. Stewart (1973). "Conjugate Direction Methods for Solving Systems of Linear
Equations," Numer. Math. 21, 284-97.
G.W. Stewart (1974). "The Numerical Treatment of Large Eigenvalue Problems," Proc.
IFIP Congress 74, North-Holland, pp. 666-72.
G.W. Stewart (1975). "The Convergence of the Method of Conjugate Gradients at
Isolated Extreme Points in the Spectrum," Numer. Math. 24, 85-93.
G.W. Stewart (1975). "Gershgorin Theory for the Generalized Eigenvalue Problem Ax =
λBx," Math. Comp. 29, 600-606.
G.W. Stewart (1975). "Methods of Simultaneous Iteration for Calculating Eigenvectors
of Matrices," in Topics in Numerical Analysis II, ed. John J.H. Miller, Academic
Press, New York, pp. 185-96.
G.W. Stewart (1976). "The Economical Storage of Plane Rotations," Numer. Math.
25, 137-38.
G.W. Stewart (1976). "Simultaneous Iteration for Computing Invariant Subspaces of
Non-Hermitian Matrices," Numer. Math. 25, 123-36.
G.W. Stewart (1976). "Algorithm 506: HQR3 and EXCHNG: Fortran Subroutines for
Calculating and Ordering the Eigenvalues of a Real Upper Hessenberg Matrix," ACM
Trans. Math. Soft. 2, 275-80.
G.W. Stewart (1976). "A Bibliographical Tour of the Large Sparse Generalized Eigen-
value Problem," in Sparse Matrix Computations , ed., J.R. Bunch and D.J. Rose,
Academic Press, New York.
G.W. Stewart (1977). "Perturbation Bounds for the QR Factorization of a Matrix,"
SIAM J. Num. Anal. 14, 509--18.
G.W. Stewart (1977). "On the Perturbation of Pseudo-Inverses, Projections and Linear
Least Squares Problems," SIAM Review 19, 634-662.
G.W. Stewart (1977). "Sensitivity Coefficients for the Effects of Errors in the Indepen-
dent Variables in a Linear Regression," Technical Report TR.-571, Department of
Computer Science, University of Maryland, College Park, MD.
G.W. Stewart (1978). "Perturbation Theory for the Generalized Eigenvalue Problem,"
in Recent Advances in Numerical Analysis, ed. C. de Boor and G.H. Golub, Academic
Press, New York.
G.W. Stewart (1979). "A Note on the Perturbation of Singular Values," Lin. Alg. and
Its Applic. 28, 213-16.
G.W. Stewart (1979). "Perturbation Bounds for the Definite Generalized Eigenvalue
Problem," Lin. Alg. and Its Applic. 23, 69-86.
G.W. Stewart (1979). ''The Effects of Rounding Error on an Algorithm for Downdating
a Cholesky Factorization," J. Inst. Math. Applic. 23, 203-13.
G.W. Stewart (1980). "The Efficient Generation of Random Orthogonal Matrices with
an Application to Condition Estimators," SIAM J. Num. Anal. 17, 403-9.
G.W. Stewart (1981). "On the Implicit Deflation of Nearly Singular Systems of Linear
Equations," SIAM J. Sci. and Stat. Comp. 2, 136-140.
G.W. Stewart (1983). "A Method for Computing the Generalized Singular Value Decomposition,"
in Matrix Pencils, ed. B. Kågström and A. Ruhe, Springer-Verlag,
New York, pp. 207-20.
G.W. Stewart (1984). "A Second Order Perturbation Expansion for Small Singular
Values," Lin. Alg. and Its Applic. 56, 231-236.
G.W. Stewart (1984). "Rank Degeneracy," SIAM J. Sci. and Stat. Comp. 5, 403-413.
G.W. Stewart (1984). "On the Asymptotic Behavior of Scaled Singular Value and QR
Decompositions," Math. Comp. 43, 483-490.
G.W. Stewart (1985). "A Jacobi-Like Algorithm for Computing the Schur Decomposition
of a Nonhermitian Matrix," SIAM J. Sci. and Stat. Comp. 6, 853-862.
G.W. Stewart (1987). "Collinearity and Least Squares Regression," Statistical Science
2, 68-100.
G.W. Stewart (1989). "On Scaled Projections and Pseudoinverses," Lin. Alg. and Its
Applic. 112, 189--193.
G.W. Stewart (1992). "An Updating Algorithm for Subspace Tracking," IEEE Trans.
Signal Proc. 40, 1535-1541.
G.W. Stewart (1993). "Updating a Rank-Revealing ULV Decomposition," SIAM J.
Matrix Anal. Appl. 14, 494-499.
G.W. Stewart (1993). "On the Perturbation of LU, Cholesky, and QR Factorizations,"
SIAM J. Matrix Anal. Appl. 14, 1141-1145.
G.W. Stewart (1993). "On the Early History of the Singular Value Decomposition,"
SIAM Review 35, 551-566.
G.W. Stewart (1994). "Perturbation Theory for Rectangular Matrix Pencils," Lin. Alg.
and Applic. 208/209, 297-301.
G.W. Stewart (1994). "Updating URV Decompositions in Parallel," Pamllel Computing
20, 151-172.
G.W. Stewart and J.-G. Sun (1990). Matrix Perturbation Theory, Academic Press, San
Diego.
G.W. Stewart and G. Zheng (1991). "Eigenvalues of Graded Matrices and the Condition
Numbers of Multiple Eigenvalues," Numer. Math. 58, 703-712.
M. Stewart and P. Van Dooren (1996). "Stability Issues in the Factorization of Structured
Matrices," SIAM J. Matrix Anal. Appl. 18, to appear.
H.S. Stone (1973). "An Efficient Parallel Algorithm for the Solution of a Tridiagonal
Linear System of Equations," J. ACM 20, 27-38.
H.S. Stone (1975). "Parallel Tridiagonal Equation Solvers," ACM Trans. Math. Soft. 1,
289-307.
G. Strang (1988). "A Framework for Equilibrium Equations," SIAM Review 30, 283-297.
G. Strang (1993). Introduction to Linear Algebra, Wellesley-Cambridge Press, Wellesley,
MA.
V. Strassen (1969). "Gaussian Elimination is Not Optimal," Numer. Math. 13, 354-356.
J.-G. Sun (1982). "A Note on Stewart's Theorem for Definite Matrix Pairs," Lin. Alg.
and Its Applic. 48, 331-339.
J.-G. Sun (1983). "Perturbation Analysis for the Generalized Singular Value Problem,"
SIAM J. Numer. Anal. 20, 611-625.
J.-G. Sun (1992). "On Condition Numbers of a Nondefective Multiple Eigenvalue,"
Numer. Math. 61, 265-276.
J.-G. Sun (1992). "Rounding Error and Perturbation Bounds for the Cholesky and
LDLT Factorizations," Lin. Alg. and Its Applic. 173, 77-97.
J.-G. Sun (1995). "A Note on Backward Error Perturbations for the Hermitian Eigenvalue
Problem," BIT 35, 385-393.
J.-G. Sun (1995). "On Perturbation Bounds for the QR Factorization," Lin. Alg. and
Its Applic. 215, 95-112.
X. Sun and C.H. Bischof (1995). "A Basis-Kernel Representation of Orthogonal Matrices,"
SIAM J. Matrix Anal. Appl. 16, 1184-1196.
P.N. Swarztrauber (1979). "A Parallel Algorithm for Solving General Tridiagonal Equations,"
Math. Comp. 33, 185-199.
P.N. Swarztrauber and R.A. Sweet (1973). "The Direct Solution of the Discrete Poisson
Equation on a Disk," SIAM J. Num. Anal. 10, 900-907.
P.N. Swarztrauber and R.A. Sweet (1989). "Vector and Parallel Methods for the Direct
Solution of Poisson's Equation," J. Comp. Appl. Math. 27, 241-263.
D.R. Sweet (1991). "Fast Block Toeplitz Orthogonalization," Numer. Math. 58, 613-629.
D.R. Sweet (1993). "The Use of Pivoting to Improve the Numerical Performance of
Algorithms for Toeplitz Matrices," SIAM J. Matrix Anal. Appl. 14, 468-493.
R.A. Sweet (1974). "A Generalized Cyclic Reduction Algorithm," SIAM J. Num. Anal.
11, 506-20.
R.A. Sweet (1977). "A Cyclic Reduction Algorithm for Solving Block Tridiagonal Systems
of Arbitrary Dimension," SIAM J. Num. Anal. 14, 706-20.
H.J. Symm and J.H. Wilkinson (1980). "Realistic Error Bounds for a Simple Eigenvalue
and Its Associated Eigenvector," Numer. Math. 35, 113-26.
P.T.P. Tang (1994). "Dynamic Condition Estimation and Rayleigh-Ritz Approximation,"
SIAM J. Matrix Anal. Appl. 15, 331-346.
R.A. Tapia and D.L. Whitley (1988). "The Projected Newton Method Has Order 1 + √2
for the Symmetric Eigenvalue Problem," SIAM J. Num. Anal. 25, 1376-1382.
G.L. Thompson and R.L. Weil (1970). "Reducing the Rank of A − λB," Proc. Amer.
Math. Soc. 26, 548-54.
G.L. Thompson and R.L. Weil (1972). "Roots of Matrix Pencils Ay = λBy: Existence,
Calculations, and Relations to Game Theory," Lin. Alg. and Its Applic. 5, 207-26.
M.J. Todd (1990). "A Dantzig-Wolfe-like Variant of Karmarkar's Interior-Point Linear
Programming Algorithm," Operations Research 38, 1006-1018.
K.-C. Toh and L.N. Trefethen (1994). "Pseudozeros of Polynomials and Pseudospectra
of Companion Matrices," Numer. Math. 68, 403-425.
L.N. Trefethen (1992). "Pseudospectra of Matrices," in Numerical Analysis 1991, D.F.
Griffiths and G.A. Watson (eds), Longman Scientific and Technical, Harlow, Essex,
UK, 234-262.
L.N. Trefethen and D. Bau III (1997). Numerical Linear Algebra, SIAM Publications,
Philadelphia, PA.
L.N. Trefethen and R.S. Schreiber (1990). "Average-Case Stability of Gaussian Elimination,"
SIAM J. Matrix Anal. Appl. 11, 335-360.
L.N. Trefethen, A.E. Trefethen, S.C. Reddy, and T.A. Driscoll (1993). "Hydrodynamic
Stability Without Eigenvalues," Science 261, 578-584.
W.F. Trench (1964). "An Algorithm for the Inversion of Finite Toeplitz Matrices," J.
SIAM 12, 515-22.
W.F. Trench (1989). "Numerical Solution of the Eigenvalue Problem for Hermitian
Toeplitz Matrices," SIAM J. Matrix Anal. Appl. 10, 135-146.
N.K. Tsao (1975). "A Note on Implementing the Householder Transformations." SIAM
J. Num. Anal. 12, 53-58.
H.W. Turnbull and A.C. Aitken (1961). An Introduction to the Theory of Canonical
Matrices, Dover Publications, New York, pp. 102-5.
F. Uhlig (1973). "Simultaneous Block Diagonalization of Two Real Symmetric Matrices,"
Lin. Alg. and Its Applic. 7, 281-89.
F. Uhlig (1976). "A Canonical Form for a Pair of Real Symmetric Matrices That Generate
a Nonsingular Pencil," Lin. Alg. and Its Applic. 14, 189-210.
R. Underwood (1975). "An Iterative Block Lanczos Method for the Solution of Large
Sparse Symmetric Eigenproblems," Report STAN-CS-75-495, Department of Computer
Science, Stanford University, Stanford, California.
R.J. Vaccaro, ed. (1991). SVD and Signal Proce8sing II: Algorithms, Analysis, and
Applications. Elsevier, Amsterdam.
R.J. Vaccaro (1994). "A Second-Order Perturbation Expansion for the SVD," SIAM J.
Matrix Anal. Applic. 15, 661-671.
R.A. Van De Geijn (1993). "Deferred Shifting Schemes for Parallel QR Methods," SIAM
J. Matrix Anal. Appl. 14, 180-194.
J. Vandergraft (1971). "Generalized Rayleigh Methods with Applications to Finding
Eigenvalues of Large Matrices," Lin. Alg. and Its Applic. 4, 353-68.
A. Van der Sluis (1969). "Condition Numbers and Equilibration of Matrices," Numer.
Math. 14, 14-23.
A. Van der Sluis (1970). "Condition, Equilibration, and Pivoting in Linear Algebraic
Systems," Numer. Math. 15, 74-86.
A. Van der Sluis (1975). "Stability of the Solutions of Linear Least Squares Problem,"
Numer. Math. 23, 241-54.
A. Van der Sluis (1975). "Perturbations of Eigenvalues of Non-normal Matrices," Comm.
ACM 18, 30-36.
A. Van der Sluis and H.A. Van der Vorst (1986). "The Rate of Convergence of Conjugate
Gradients," Numer. Math. 48, 543-560.
A. Van der Sluis and G.W. Veltkamp (1979). "Restoring Rank and Consistency by
Orthogonal Projection," Lin. Alg. and Its Applic. 28, 257-78.
H. Van de Vel (1977). "Numerical Treatment of a Generalized Vandermonde System of
Equations," Lin. Alg. and Its Applic. 17, 149-74.
E.F. Van de Velde (1994). Concurrent Scientific Computing, Springer-Verlag, New York.
H.A. Van der Vorst (1982). "A Vectorizable Variant of Some ICCG Methods," SIAM J.
Sci. and Stat. Comp. 3, 350-356.
H.A. Van der Vorst (1982). "A Generalized Lanczos Scheme," Math. Comp. 39, 559-562.
H.A. Van der Vorst (1986). "The Performance of Fortran Implementations for Preconditioned
Conjugate Gradients on Vector Computers," Parallel Computing 3, 49-58.
H.A. Van der Vorst (1986). "An Iterative Solution Method for Solving f(A)x = b Using
Krylov Subspace Information Obtained for the Symmetric Positive Definite Matrix
A," J. Comp. and App. Math. 18, 249-263.
H. Van der Vorst (1987). "Large Tridiagonal and Block Tridiagonal Linear Systems on
Vector and Parallel Computers," Parallel Comput. 5, 45-54.
H. Van der Vorst (1989). "High Performance Preconditioning," SIAM J. Sci. and Stat.
Comp. 10, 1174-1185.
H.A. Van der Vorst (1992). "BiCGSTAB: A Fast and Smoothly Converging Variant of
the Bi-CG for the Solution of Nonsymmetric Linear Systems," SIAM J. Sci. and
Stat. Comp. 13, 631-644.
P. Van Dooren (1979). "The Computation of Kronecker's Canonical Form of a Singular
Pencil," Lin. Alg. and Its Applic. 27, 103-40.
P. Van Dooren (1981). "A Generalized Eigenvalue Approach for Solving Riccati Equations,"
SIAM J. Sci. and Stat. Comp. 2, 121-135.
P. Van Dooren (1981). "The Generalized Eigenstructure Problem in Linear System
Theory," IEEE Trans. Auto. Cont. AC-26, 111-128.
P. Van Dooren (1982). "Algorithm 590: DSUBSP and EXCHQZ: Fortran Routines
for Computing Deflating Subspaces with Specified Spectrum," ACM Trans. Math.
Software 8, 376-382.
S. Van Huffel (1992). "On the Significance of Nongeneric Total Least Squares Problems,"
SIAM J. Matrix Anal. Appl. 13, 20-35.
S. Van Huffel and H. Park (1994). "Parallel Tri- and Bidiagonalization of Bordered
Bidiagonal Matrices," Parallel Computing 20, 1107-1128.
S. Van Huffel and J. Vandewalle (1987). "Subset Selection Using the Total Least Squares
Approach in Collinearity Problems with Errors in the Variables," Lin. Alg. and Its
Applic. 88/89, 695-714.
S. Van Huffel and J. Vandewalle (1988). "The Partial Total Least Squares Algorithm,"
J. Comp. and App. Math. 21, 333-342.
S. Van Huffel and J. Vandewalle (1988). "Analysis and Solution of the Nongeneric Total
Least Squares Problem," SIAM J. Matrix Anal. Appl. 9, 360-372.
S. Van Huffel and J. Vandewalle (1989). "Analysis and Properties of the Generalized
Total Least Squares Problem AX ≈ B When Some or All Columns in A are Subject
to Error," SIAM J. Matrix Anal. Appl. 10, 294-315.
S. Van Huffel and J. Vandewalle (1991). The Total Least Squares Problem: Computa-
tional Aspects and Analysis, SIAM Publications, Philadelphia, PA.
S. Van Huffel, J. Vandewalle, and A. Haegemans (1987). "An Efficient and Reliable
Algorithm for Computing the Singular Subspace of a Matrix Associated with its
Smallest Singular Values," J. Comp. and Appl. Math. 19, 313-330.
S. Van Huffel and H. Zha (1991). "The Restricted Total Least Squares Problem: Formulation,
Algorithm, and Properties," SIAM J. Matrix Anal. Appl. 12, 292-309.
S. Van Huffel and H. Zha (1993). "An Efficient Total Least Squares Algorithm Based
On a Rank-Revealing Two-Sided Orthogonal Decomposition," Numerical Algorithms
4, 101-133.
H.P.M. van Kempen (1966). "On Quadratic Convergence of the Special Cyclic Jacobi
Method," Numer. Math. 9, 19--22.
C.F. Van Loan (1973). "Generalized Singular Values With Algorithms and Applica-
tions," Ph.D. thesis, University of Michigan, Ann Arbor.
C.F. Van Loan (1975). "A General Matrix Eigenvalue Algorithm," SIAM J. Num. Anal.
12, 819-834.
C. F. Van Loan (1975). "A Study of the Matrix Exponential," Numerical Analysis Report
No. 10, Dept. of Maths., University of Manchester, England.
C. F. Van Loan (1976). "Generalizing the Singular Value Decomposition," SIAM J. Num.
Anal. 13, 76-83.
C.F. Van Loan (1977). "On the Limitation and Application of Pade Approximation to
the Matrix Exponential," in Pade and Rational Approximation, ed. E.B. Saff and
R.S. Varga, Academic Press, New York.
C.F. Van Loan (1977). ''The Sensitivity of the Matrix Exponential," SIAM J. Num.
Anal. 14, 971-81.
C.F. Van Loan (1978). "Computing Integrals Involving the Matrix Exponential," IEEE
Trans. Auto. Cont. AC-23, 395-404.
C.F. Van Loan (1978). "A Note on the Evaluation of Matrix Polynomials," IEEE Trans.
Auto. Cont. AC-24, 320-21.
C.F. Van Loan (1982). "Using the Hessenberg Decomposition in Control Theory," in
Algorithms and Theory in Filtering and Control, D.C. Sorensen and R.J. Wets (eds),
Mathematical Programming Study No. 18, North Holland, Amsterdam, pp. 102-11.
C.F. Van Loan (1984). "A Symplectic Method for Approximating All the Eigenvalues
of a Hamiltonian Matrix," Lin. Alg. and Its Applic. 61, 233-252.
C.F. Van Loan (1985). "How Near is a Stable Matrix to an Unstable Matrix?," Contemporary
Mathematics, Vol. 47, 465-477.
C.F. Van Loan (1985). "On the Method of Weighting for Equality Constrained Least
Squares Problems," SIAM J. Numer. Anal. 22, 851-864.
C.F. Van Loan (1985). "Computing the CS and Generalized Singular Value Decomposition,"
Numer. Math. 46, 479-492.
C.F. Van Loan (1987). "On Estimating the Condition of Eigenvalues and Eigenvectors,"
Lin. Alg. and Its Applic. 88/89, 715-732.
C.F. Van Loan (1992). Computational Frameworks for the Fast Fourier Transform,
SIAM Publications, Philadelphia, PA.
C.F. Van Loan (1997). Introduction to Scientific Computing: A Matrix- Vector Approach
Using Matlab, Prentice Hall, Upper Saddle River, NJ.
J.M. Varah (1968). "The Calculation of the Eigenvectors of a General Complex Matrix
by Inverse Iteration," Math. Comp. 22, 785-91.
J.M. Varah (1968). "Rigorous Machine Bounds for the Eigensystem of a General Complex
Matrix," Math. Comp. 22, 793-801.
J.M. Varah (1970). "Computing Invariant Subspaces of a General Matrix When the
Eigensystem is Poorly Determined," Math. Comp. 24, 137-49.
J.M. Varah (1972). "On the Solution of Block-Tridiagonal Systems Arising from Certain
Finite-Difference Equations," Math. Comp. 26, 859-68.
J.M. Varah (1973). "On the Numerical Solution of Ill-Conditioned Linear Systems with
Applications to Ill-Posed Problems," SIAM J. Num. Anal. 10, 257-67.
J.M. Varah (1979). "On the Separation of Two Matrices," SIAM J. Num. Anal. 16,
212-22.
J.M. Varah (1993). "Errors and Perturbations in Vandermonde Systems," IMA J. Num.
Anal. 13, 1-12.
J.M. Varah (1994). "Backward Error Estimates for Toeplitz Systems," SIAM J. Matrix
Anal. Appl. 15, 408-417.
R.S. Varga (1961). "On Higher-Order Stable Implicit Methods for Solving Parabolic
Partial Differential Equations," J. Math. Phys. 40, 220-31.
R.S. Varga (1962). Matrix Iterative Analysis, Prentice-Hall, Englewood Cliffs, NJ.
R.S. Varga (1970). "Minimal Gershgorin Sets for Partitioned Matrices," SIAM J. Num.
Anal. 7, 493-507.
R.S. Varga (1976). "On Diagonal Dominance Arguments for Bounding ‖A^{-1}‖_∞," Lin.
Alg. and Its Applic. 14, 211-17.
S.A. Vavasis (1994). "Stable Numerical Algorithms for Equilibrium Systems," SIAM J.
Matrix Anal. Appl. 15, 1108-1131.
S.A. Vavasis (1992). "Preconditioning for Boundary Integral Equations," SIAM J. Matrix
Anal. Appl. 13, 905-925.
K. Veselić (1993). "A Jacobi Eigenreduction Algorithm for Definite Matrix Pairs," Numer.
Math. 64, 241-268.
K. Veselić and V. Hari (1989). "A Note on a One-Sided Jacobi Algorithm," Numer.
Math. 56, 627-633.
W.J. Vetter (1975). "Vector Structures and Solutions of Linear Matrix Equations," Lin.
Alg. and Its Applic. 10, 181-88.
C. Vuik and H.A. van der Vorst (1992). "A Comparison of Some GMRES-like Methods,"
Lin. Alg. and Its Applic. 160, 131-162.
A. Wald (1940). "The Fitting of Straight Lines if Both Variables are Subject to Error,"
Annals of Mathematical Statistics 11, 284-300.
B. Walden, R. Karlson, J. Sun (1995). "Optimal Backward Perturbation Bounds for the
Linear Least Squares Problem," Numerical Lin. Alg. with Applic. 2, 271-286.
H.F. Walker (1988). "Implementation of the GMRES Method Using Householder Trans-
formations," SIAM J. Sci. Stat. Camp. 9, 152-163.
R.C. Ward (1975). "The Combination Shift QZ Algorithm," SIAM J. Num. Anal. 12,
835-853.
R.C. Ward (1977). "Numerical Computation of the Matrix Exponential with Accuracy
Estimate," SIAM J. Num. Anal. 14, 600-14.
R.C. Ward (1981). "Balancing the Generalized Eigenvalue Problem," SIAM J. Sci. and
Stat. Comp. 2, 141-152.
R.C. Ward and L.J. Gray (1978). "Eigensystem Computation for Skew-Symmetric and
a Class of Symmetric Matrices," ACM Trans. Math. Soft. 4, 278-85.
D.S. Watkins (1982). "Understanding the QR Algorithm," SIAM Review 24, 427-440.
D.S. Watkins (1991). Fundamentals of Matrix Computations, John Wiley and Sons,
New York.
D.S. Watkins (1993). "Some Perspectives on the Eigenvalue Problem," SIAM Review
35, 430-471.
D.S. Watkins and L. Elsner (1991). "Chasing Algorithms for the Eigenvalue Problem,"
SIAM J. Matrix Anal. Appl. 12, 374-384.
D.S. Watkins and L. Elsner (1991). "Convergence of Algorithms of Decomposition Type
for the Eigenvalue Problem," Lin. Alg. and Its Applic. 143, 19-47.
D.S. Watkins and L. Elsner (1994). "Theory of Decomposition and Bulge-Chasing Algorithms
for the Generalized Eigenvalue Problem," SIAM J. Matrix Anal. Appl. 15,
943-967.
G.A. Watson (1988). "The Smallest Perturbation of a Submatrix which Lowers the Rank
of the Matrix," IMA J. Numer. Anal. 8, 295-304.
P.A. Wedin (1972). "Perturbation Bounds in Connection with the Singular Value Decomposition,"
BIT 12, 99-111.
P.A. Wedin (1973). "Perturbation Theory for Pseudo-Inverses," BIT 13, 217-32.
P.A. Wedin (1973). "On the Almost Rank-Deficient Case of the Least Squares Problem,"
BIT 13, 344-54.
M. Wei (1992). "Perturbation Theory for the Rank-Deficient Equality Constrained Least
Squares Problem," SIAM J. Num. Anal. 29, 1462-1481.
M. Wei (1992). "Algebraic Properties of the Rank-Deficient Equality-Constrained and
Weighted Least Squares Problems," Lin. Alg. and Its Applic. 161, 27-44.
M. Wei (1992). "The Analysis for the Total Least Squares Problem with More than One
Solution," SIAM J. Matrix Anal. Appl. 13, 746-763.
O. Widlund (1978). "A Lanczos Method for a Class of Nonsymmetric Systems of Linear
Equations," SIAM J. Numer. Anal. 15, 801-12.
J.H. Wilkinson (1961). "Error Analysis of Direct Methods of Matrix Inversion," J. ACM
8, 281-330.
J.H. Wilkinson (1962). "Note on the Quadratic Convergence of the Cyclic Jacobi Pro-
cess," Numer. Math. 6, 296--300.
J.H. Wilkinson (1963). Rounding Errors in Algebraic Processes, Prentice-Hall, Englewood
Cliffs, NJ.
J.H. Wilkinson (1965). The Algebraic Eigenvalue Problem, Clarendon Press, Oxford,
England.
J.H. Wilkinson (1965). "Convergence of the LR, QR, and Related Algorithms," Comp.
J. 8, 77-84.
J.H. Wilkinson (1968). "Global Convergence of Tridiagonal QR Algorithm With Origin
Shifts," Lin. Alg. and Its Applic. 1, 409-20.
J.H. Wilkinson (1968). "Almost Diagonal Matrices with Multiple or Close Eigenvalues,"
Lin. Alg. and Its Applic. 1, 1-12.
J.H. Wilkinson (1968). "A Priori Error Analysis of Algebraic Processes," Proc. International
Congress Math. (Moscow: Izdat. Mir, 1968), pp. 629-39.
J.H. Wilkinson (1971). "Modern Error Analysis," SIAM Review 13, 548-68.
J.H. Wilkinson (1972). "Note on Matrices with a Very Ill-Conditioned Eigenproblem,"
Numer. Math. 19, 176-78.
J.H. Wilkinson (1977). "Some Recent Advances in Numerical Linear Algebra," in The
State of the Art in Numerical Analysis, ed. D.A.H. Jacobs, Academic Press, New
York, pp. 1-53.
J.H. Wilkinson (1978). "Linear Differential Equations and Kronecker's Canonical Form,"
in Recent Advances in Numerical Analysis, ed. C. de Boor and G.H. Golub, Academic
Press, New York, pp. 231-65.
J.H. Wilkinson (1979). "Kronecker's Canonical Form and the QZ Algorithm," Lin. Alg.
and Its Applic. 28, 285-303.
J.H. Wilkinson (1984). "On Neighboring Matrices with Quadratic Elementary Divisors,"
Numer. Math. 44, 1-21.
J.H. Wilkinson and C. Reinsch, eds. (1971). Handbook for Automatic Computation,
Vol. 2, Linear Algebra, Springer-Verlag, New York.
H. Wimmer and A.D. Ziebur (1972). "Solving the Matrix Equations Σ fp(A)Xgp(B) = C,"
SIAM Review 14, 318-23.
S. Winograd (1968). "A New Algorithm for Inner Product," IEEE Trans. Comp. C-17,
693-694.
M. Wolfe (1996). High Performance Compilers for Parallel Computing, Addison-Wesley,
Reading, MA.
A. Wouk, ed. (1986). New Computing Environments: Parallel, Vector, and Systolic,
SIAM Publications, Philadelphia, PA.
H. Wozniakowski (1978). "Roundoff-Error Analysis of Iterations for Large Linear Systems,"
Numer. Math. 30, 301-314.
H. Wozniakowski (1980). "Roundoff Error Analysis of a New Class of Conjugate Gradient
Algorithms," Lin. Alg. and Its Applic. 29, 507-29.
A. Wragg (1973). "Computation of the Exponential of a Matrix I: Theoretical Considerations,"
J. Inst. Math. Applic. 11, 369-75.
A. Wragg (1975). "Computation of the Exponential of a Matrix II: Practical Consider-
ations," J. Inst. Math. Applic. 15, 273-78.
S.J. Wright (1993). "A Collection of Problems for Which Gaussian Elimination with
Partial Pivoting is Unstable," SIAM J. Sci. and Stat. Comp. 14, 231-238.
J.M. Yohe (1979). "Software for Interval Arithmetic: A Reasonably Portable Package,"
ACM Trans. Math. Soft. 5, 50-63.
D.M. Young (1970). "Convergence Properties of the Symmetric and Unsymmetric Over-Relaxation
Methods," Math. Comp. 24, 793-807.
D.M. Young (1971). Iterative Solution of Large Linear Systems, Academic Press, New
York.
D.M. Young (1972). "Generalization of Property A and Consistent Ordering," SIAM J.
Num. Anal. 9, 454-63.
D.M. Young and K.C. Jea (1980). "Generalized Conjugate Gradient Acceleration of
Nonsymmetrizable Iterative Methods," Lin. Alg. and Its Applic. 34, 159-94.
L. Yu. Kolotilina and A. Yu. Yeremin (1993). "Factorized Sparse Approximate Inverse
Preconditioning I: Theory," SIAM J. Matrix Anal. Applic. 14, 45-58.
L. Yu. Kolotilina and A. Yu. Yeremin {1995). "Factorized Sparse Approximate Inverse
Preconditioning II: Solution of 3D FE Systems on Massively Parallel Computers,"
Intern. J. High Speed Comput. 7, 191-215.
H. Zha (1991). "The Restricted Singular Value Decomposition of Matrix Triplets," SIAM
J. Matrix Anal. Appl. 12, 172-194.
H. Zha (1992). "A Numerical Algorithm for Computing the Restricted Singular Value
Decomposition of Matrix Triplets," Lin. Alg. and Its Applic. 168, 1-25.
H. Zha (1993). "A Componentwise Perturbation Analysis of the QR Decomposition,"
SIAM J. Matrix Anal. Appl. 14, 1124-1131.
H. Zha and Z. Zhang (1995). "A Note on Constructing a Symmetric Matrix with Spec-
ified Diagonal Entries and Eigenvalues," BIT 35, 448-451.
H. Zhang and W.F. Moss (1994). "Using Parallel Banded Linear System Solvers in
Generalized Eigenvalue Problems," Parallel Computing 20, 1089-1106.
Y. Zhang (1993). "A Primal-Dual Interior Point Approach for Computing the L1 and
L∞ Solutions of Overdetermined Linear Systems," J. Optimization Theory and Ap-
plications 77, 323-341.
S. Zohar (1969). "Toeplitz Matrix Inversion: The Algorithm of W.F. Trench," J. ACM
16, 592-601.
Index

A-conjugate, 522-3 Jacobi, 435


A-norm, 530 Lanczos, 485, 505
Aasen's method, 163-70 Cholesky, 145-6
Absolute Value notation, 62 Gaussian elimination, 116-7
Accumulated inner product, 64 LU with pivoting, 116-7
Algebraic multiplicity, 316 LU, 101-2
Angles between subspaces, 603 matrix functions and, 560-1
Approximation QR factorization, 213-4
of a matrix function, 562-70 Tridiagonal, 17 4
Arnoldi factorization, 500 unsymmetric Lanczos, 505
Arnoldi method, 499-503 Block Householder, 225
Block matrices, 24ff
Back-substitution, 89-90, 153 data re-use and, 43-45
Backward error analysis, 65-67 diagonal dominance of, 175
Backward successive over-relaxation, 516 Block Schur and matrix functions, 560
Balancing, 360 Block vs. band, 176
Band algorithms Bunch-Kaufman algorithm, 169
triangular systems, 153
Cholesky, 155-6 Cache, 41
Gaussian elimination, 152-3 Cancellation, 61
Hessenberg LU, 154-5 Cauchy-Schwartz inequality, 53
Bandedness, 16-7 Cayley transform, 73
data structures and, 19-20, 158-9 CGNE, 546
lower and upper, 16 CGNR, 545
LU factorization and, 152-3 Characteristic polynomial, 310
pivoting and, 154 generalized eigenproblem and, 375-6
profile, 159 Chebyshev polynomials, 475
Bandwidth, 16 Chebyshev semi-iterative method, 514-6
Bartels-Stewart algorithm, 367
barrier, 287 band, 155-6
Basic solution in least squares, 258-9 block, 145-6
Basis, 49 downdating and, 611
eigenvector, 316 gaxpy, 143-4
orthonormal, 69 outer product, 144-5
Bauer-Fike theorem, 321 ring, 300-3
Biconjugate gradient method, 550--1 shared memory, 303-4
Bidiagonalization stability, 146
Householder, 251-3 Cholesky reduction of A - >.B, 463-4
Lanczos, 495--6 Chordal metric, 378
upper triangularizing first, 252-3 Circulant systems, 201-2
Bidiagonal matrix, 17 Classical Gram-Schmidt, 230-1
Big-Oh notation, 13 Classical Jacobi iteration
Binary powering, 569 for eigenvalues, 428-9
Bisection, 439 Colon notation, 7, 19
Bit reversal, 190 Column
Block algorithms deletion or addition in QR, 608-10
cyclic reduction, 177-80 partitioning, 6
data re-use and, 43 pivoting, 248-50
diagonalization, 366 weighting in LS, 264-5

Communication costs, 277, 280--1, 287 Cyclic Jacobi method, 430


Companion matrix, 348 Cyclic reduction, 177-80
Complete
orthogonal decomposition, 250-1 Data Re-use, 34, 41
rank deficiency and, 256 Data Structures
reorthogonalization, 482-3 block, 45
Complex diagonal, 21-22
matrices, 14 distributed, 278
QR factorization, 233 symmetric, 20-2
Computation/communication ratio, 281 Deadlock, 280
Computation tree, 446 Decomposition
Condition number Arnoldi, 500
estimation, 128-30 bidiagonal, 251
Condition of block diagonal, 315,
eigenvalues, 323-4 Cholesky, 143
invariant subspaces, 325 companion matrix, 348
least squares problem, 242-5 complete orthogonal, 250-1
linear systems, 80--2 CS (general), 78
multiple eigenvalues, 324 CS (thin), 78
rectangular matrix, 230 generalized real Schur, 377
similarity transformation, 317 generalized Schur, 377
Confluent Vandermonde matrix, 188 Hessenberg, 344
Conformal partition, 25 Hessenberg-Triangular, 378-80
Conjugate Jordan, 317
directions, 522-3 LDLT, 138
residual method, 547-8 LDMT, 136
transpose, 14 LQ, 494
Conjugate gradient method LU, 97-98
derivation and properties, 490-3, 520--8 PA=LU, 113
Lanczos and, 528 QR, 223
Consistent norms, 55 real Schur, 341-2
Constrained least squares, 580ff Schur, 313
Contour integral and f(A), 556 singular value, 70
Convergence of singular value (thin), 72
bisection method, 439 Symmetric Schur, 393
Chebyshev semi-iterative method, 515 tridiagonal, 414
conjugate gradient algorithm, 530 Defective eigenvalue, 316
cyclic Jacobi algorithm, 430 Defective eigenvalue, 316
Gauss-Seidel iteration, 511-2 Deflating subspace, 381, 386
inverse iteration, 408 Deflation and,
iterative methods, 511 Hessenberg-triangular form, 381-2
Jacobi iteration, 511-2 QR algorithm, 352
Jacobi's method for the symmetric Departure from normality, 314
eigenproblem, 429 Derogatory matrix, 349
Lanczos method, 475-7 Determinant, 50-1, 310
orthogonal iteration and singularity, 82
symmetric case, 411 Gaussian elimination and, 97
unsymmetric case, 333, 336-9 Vandermonde matrix, 191
power method (symmetric), 406-7 Diagonal dominance, 120
QR algorithm, 360 block, 175-6
QZ algorithm, 386 Diagonal form, 316
Rayleigh Quotient iteration, 408-9 Diagonal pivoting method, 168-9
steepest descent, 520-1 Differentiation of factorizations,
SVD algorithm, 456 51, 103, 243, 273, 323
symmetric QR iteration, 421 Dimension, 49
Cosine of a matrix, 567 Distance between subspaces, 76-7
Courant-Fischer minimax theorem, 394 Distributed memory model, 276-7
Crawford number, 463 Divide and Conquer Algorithms
Critical section, 289 cyclic reduction, 177-80
Cross-validation, 584 Strassen, 31-3
Crout-Doolittle, 104 tridiagonal eigenvalue, 444-7
CS decomposition, 77-9 Domain decomposition, 538-9
Dominant Exponential of matrix, 572ff


eigenvalue, 331
eigenvector, 331 Factorization. See Decomposition.
invariant subspace, 333 Fast Fourier transform, 188-91
Doolittle reduction, 104 Fast Givens QR, 218, 228, 241
Dot product, 5 fl, 61
Dot product roundoff, 62 Floating point numbers, 59
Double precision, 64 Flop, 18-9
Doubling formulae, 567 F-norm, 55
Durbin's algorithm, 195 Forward substitution, 88, 90, 153
Dynamically scheduled algorithms, 288 Forward error analysis, 65-6
Francis QR Step, 356-8
Efficiency, 281 Frechet derivative, 81
Eigenproblem Frobenius matrix norm, 55
constrained, 621 Function of triangular matrix, 558-61
diagonal plus rank-1, 442
generalized, 375ff, 461ff Gauss-Jordan transformations, 103
inverse, 622-3 Gauss-Seidel, 510, 512-3
orthogonal matrix, 625-31 Gauss-Seidel iteration
symmetric, 391ff Solving Poisson equation and, 512-3
Toeplitz, 623-5 use as preconditioner, 540
unsymmetric, 308ff Gaussian elimination, 94ff
Eigenvalues accuracy and, 123ff
characteristic polynomial and, 310 block version, 101
computing selected, 440-1 complete pivoting and, 118
defective, 316 gaxpy version, 100
determinant and, 310 outer product version, 98
dominant, 331 partial pivoting and, 110-13
generalized, 375 roundoff error and, 104ff
interior, 478 Gauss transformations, 95-6
ordering in Schur form, 365-6 Hessenberg form and, 349
sensitivity of (unsymmetric), 320-4 Gaxpy,
sensitivity of (symmetric), 395-7 in distributed memory, 279
simple, 316 in shared memory, 286
singular values and, 318 Gaxpy algorithms
Sturm sequence and, 440-2 band Cholesky, 156
trace, 310 Gaussian elimination, 114-5
Eigenvector Cholesky, 144
dominant, 331 Gaxpy vs. Outer Product, 42
left, 311 Generalized eigenproblem, 375ff, 461ff
matrix and condition, 323-4 Generalized least squares, 266-7
perturbation, 326-7 Generalized Schur decomposition, 377
right, 311 Generalized singular value
Eispack, xiv decomposition, 465-7
Elementary Hermitian matrices and constrained least squares, 580-2
See Householder matrix, proof of, 466
Elementary transformations. See Geometric multiplicity, 316
Gauss transformations, Gershgorin circle theorem, 320, 395
Equality constrained least squares, 585-7 givens, 216
Equilibration, 125 Givens QR, 226-7
Equilibrium systems, 170-1 Givens rotations, 215-8
Equivalence of norms, 53 Ghost eigenvalues, 484-5
Error Global variables, 286
absolute, 53 GMRES, 548-50
matrix function, 563-4, 566--7 Golub-Kahan SVD step, 454-5
relative, 53 Gram-Schmidt
roundoff, 61 classical, 230-1
Error estimation in power method, 332 modified, 231-2
Euclidean matrix norm. See Granularity, 284
Frobenius matrix norm, Growth and
Exchange matrix, 193 Fast Givens transformations, 220-1, 229
Exponent range, 60 Gaussian elimination, 111, 116
Gauss reduction to Hessenberg Iterative improvement for


form, 349-50 least squares, 267-8
linear systems, 126-8
Hessenberg form, 344-50 Iterative methods, 508ff
Arnoldi process and, 499-500
Gauss reduction to, 349 Jacobi iteration for the SVD, 457
Householder reduction to, 344-6 Jacobi iteration for symmetric
inverse iteration and, 363-4 eigenproblem, 426
properties, 346-8 cyclic, 430
QR factorization and, 227 parallel version, 431-4
QR iteration and, 342 Jacobi method for linear systems,
unreduced, 346 preconditioning with, 540
Hessenberg systems Jacobi rotations, 426. See also
LU and, 154-5 Givens rotations,
Hessenberg-Triangular form Jordan blocks, 317
reduction to, 378-80 Jordan decomposition, 317
Hierarchical memory, 41 computation, 370-1
Holder inequality, 53 matrix functions and, 557, 563
Horner algorithm, 568-9
house, 210 Kaniel-Paige theory, 475-7
Householder bidiagonalization, 251-3 Krylov
Householder matrix, 209-15 matrix, 347-8, 416, 472
Hyperbolic transformations, 611-3 subspaces, 472, 525, 544ff
Hypercube, 276 Krylov subspace methods
biconjugate gradients, 550-1
Identity matrix, 50 CGNE, 546
Ill-conditioned matrix, 82 CGNR, 545
Im, 14 conjugate gradients, 490ff, 520ff
Implicit Q theorem, 346-7, 416-7 GMRES, 548-50
Implicit symmetric QR step with MINRES, 494
Wilkinson Shift, 420 QMR, 551
Implicitly restarted Arnoldi SYMMLQ, 494
method, 501-3
Incomplete Cholesky, 535 Lagrange multipliers, 582
Incomplete block preconditioners, 536-7 Lanczos methods for
Incurable breakdown, 505 bidiagonalizing, 495--6
Indefinite systems, 161ff least squares, 496-8
Independence, 49 singular values, 495--6
Inertia of symmetric matrix, 403 symmetric indefinite problems, 493-4
Inner product symmetric positive definite
accumulation of, 64 linear systems, 490-3
roundoff error and, 62-4 unsymmetric eigenproblem, 503--6
Integrating f(A), 569-70
Interchanges. See Pivoting, block version, 485-7, 505
Interlacing property, 396 complete reorthogonalization and, 482
Intersection of subspaces, 604-5 conjugate gradients and, 528
Invariant subspace, 372, 397-403 interior eigenvalues and, 478
approximate, 400-3 inverse eigenvalue problem and, 623
dominant, 333 power method and, 477
perturbation of, 324--6, 397-400 practical, 480
Schur vectors and, 313 Ritz pairs and, 475
Inverse eigenvalue problems, 622-3 roundoff and, 481-2
Inverse error analysis. See selective orthogonalization and, 483-4
Backward error analysis. s-step, 487
Inverse iteration, 362-4, 408 Lanczos vectors, 473
generalized eigenproblem, 386 LAPACK, xiii, 2, 4, 88, 134-5,
Inverse of matrix, 50 207-8, 310, 392-3, 580
computation of, 121 LDLT, 138
perturbation of, 58-9 conjugate gradients and, 491-3
Toeplitz case, 197 LDMT, 135--8
Inverse orthogonal iteration, 339
Least squares problem
Iteration matrix, 512
basic solution to, 258-9
equality constraints and, 585-6 inverse, 50


full rank, 236ff null space of, 49
minimum norm solution to, 256 operations with, 3
quadratic inequality constraint, 580-2 pencils, 375
rank deficient, 256ff powers, 569
residual of, 237 range of, 49
sensitivity of, 242-4 rank of, 49
solution set of, 256 sign function, 372
SVD and, 257 transpose, 3
Least squares solution via Matrix functions, 555ff
fast Givens, 241 integrating, 569-70
Lanczos, 486 Jordan decomposition and, 557-8
modified Gram-Schmidt, 241 polynomial evaluation, 568~9
Householder QR factorization, 239 Matrix norms, 54ff
SVD, 257 consistency, 55
length, 210 Frobenius, 55
Level of Operation, 13 relations between, 56
Level-3 fraction, 92, 146 subordinate, 56
Levinson algorithm, 196 Matrix times matrix
Linear equation sensitivity, 80ff block, 25--7, 29--30
Linear systems dot version, 11
banded systems, 152ff dot version, 11
block tridiagonal systems, parallel, 292tf
174-5, 177-80 saxpy version, 12
general systems, 87ff shared memory, 292-3
Hessenberg, 154--5 torus, 293-9
Kronecker product, 180-1 Matrix times vector, 5-6
positive definite systems, 142ff block version, 28
symmetric indefinite systems, 161ff Message-passing, 276-7
Toeplitz systems, 193ff Minimax theorem for
triangular systems, 88ff symmetric eigenvalues, 394
tridiagonal, 156-7 singular values, 449
Vandermonde systems, 183ff MINRES, 494
Linpack, xiv Mixed precision, 127
Load balancing, 280, 282-3 Modified eigenproblems, 621-3
Local program, 285 Modified Gram-Schmidt, 231-2, 241
Log of a matrix, 566 Modified LR algorithm, 361
Look-Ahead, 505 Moore-Penrose conditions, 257-8
Loop reordering, 9-13 Multiple eigenvalues,
Loss of orthogonality and Lanczos tridiagonalization, 485
Gram-Schmidt, 232 and matrix functions, 560-1
Lanczos, 481-2 Multiple right hand sides, 91, 121
LR iteration, 335, 361 Multiplicity of eigenvalues, 316
LU factorization Multipliers, 96
band, 152-3
block, 101 Neighbor, 276
determinant and, 97-8 Netlib, xiv
differentiation of, 103 Network topology, 276
existence of, 97-8 Node program, 286
diagonal dominance and, 119-20 Nonderogatory matrices, 349
rectangular matrices and, 102 Nonsingular, 50
Normal equations, 237--9, 545-7
Machine precision. See Unit roundoff Normal matrix, 313-4
Mantissa, 60 Normality and eigenvalue condition, 323
Matlab, xiv, 88, 134, 207, 309, Norms
392, 556 matrix, 54ff
Matrix vector, 52ff
block, 24ff Notation
differentiation, 51 block matrices, 24-5
equations, 13 colon, 7, 19, 27
exponential, 572ff matrix, 3
functions, 555ff submatrix, 27
vector, 4 linear equation problem, 80ff


x-o, 16 pseudo-inverse, 258
Null, 49 singular subspace pair, 450-1
Null space, 49 singular values, 449-50
intersection of, 602-3 underdetermined systems, 272-3
Numerical rank and SVD, 260-2 Pipelining, 35-6
Pivoting, 109
Off, 426 Aasen, 166
Operation count. See Work or column, 248-50
particular algorithm, complete, 117
Orthogonal partial, 110
basis, 69 symmetric matrices and, 148
complement, 69 Pivots, 97
matrix, 208 condition and, 107
Procrustes problem, 601 zero, 103
projection, 75 Plane rotations. See Givens rotations,
Orthogonal iteration p-norms, 52
Ritz acceleration and, 422 minimization in, 236
symmetric, 410-1 Polar decomposition, 149
unsymmetric, 332-4 Polynomial preconditioner, 539-40
Orthogonal matrix representations Positive definite systems, 140-1
WY block form, 213-5 Gauss-Seidel and, 512
factored form, 212-3 LDLT and, 142
Givens rotations, 217-8 properties of, 141
Orthonormal basis computation, 229-32 unsymmetric, 142
Orthonormality, 69 Power method, 330-2
Outer product, 8 symmetric case, 405-6
Overdetermined system, 236 Power series of matrix, 565
Overflow, 61 Powers of a matrix, 569
Overwriting, 23 Preconditioned conjugate
gradient method, 532ff
Pade approximation, 572-4 Pre-conditioners
Parallel computation incomplete block,536-7
gaxpy incomplete Cholesky, 535
message passing ring, 279 polynomial, 539-40
shared memory (dynamic), 289-90 unsymmetric case, 550
shared memory (static), 287 Principal angles and vectors, 603-4
Cholesky Processor id, 276
message passing ring, 300 Procrustes problem, 601
divide and conquer, 445--6 Projections, 75
Jacobi, 431-4 Pseudo-eigenvalues, 576--7
matrix multiplication Pseudo-inverse, 257
shared memory, 292-3
torus, 293-9 QMR, 551
Parlett-Reid method, 162-3 QR algorithm for eigenvalues
Partitioned matrix, 6 symmetric version, 414ff
Pencils, 375 unsymmetric version, 352ff
diagonalization of, 461-2 QR factorization, 223ff
equivalence of, 376 Block Householder
symmetric-definite, 461 computation, 225--6
Permutation matrices, 109-10 Classical Gram-Schmidt and, 230-1
Persymmetric matrix, 193 column pivoting and, 248-50, 591
Perturbation theory for Fast Givens computation of, 228-9
eigenvalues, 320-4 Givens computation of, 226-7
eigenvalues (symmetric case), 395-7 Hessenberg matrices and, 227-8
eigenvectors, 326-7 Householder computation of, 224-5
eigenvectors (symmetric case), 399-400 least square problem and, 239-42
generalized eigenvalue, 377-8 Modified Gram-Schmidt and, 231-2
invariant subspaces properties of, 229-30
symmetric case, 397-99 rank of matrix and, 248
unsymmetric case, 324-5 square systems and, 270-1
least squares problem, 242-4 tridiagonal matrix and, 417
underdetermined systems and, 271-2 two-by-two symmetric, 427-8


updating, 607-13 Schur vectors, 313
Quadratic form, 394 Search directions, 521ff
QZ algorithm, 384ff Secular equations, 443, 582
Selective orthogonalization, 483-4
Range, 49 Semidefinite systems, 147-9
Rank of matrix, 49 send, 277
determination of, 259 Sensitivity. See Perturbation
QR factorization and, 248 theory for.
subset selection and, 591--4 Sep, 325
SVD and, 72-3 Serious breakdown, 505
Rank deficient LS problem, 256ff Shared memory traffic, 287
Rank-one modification Shared memory systems, 285-9
of diagonal matrix, 442-4 Sherman-Morrison formula, 50
eigenvalues and, 397 Shifts in
QR factorization and, 607-13 QR algorithm, 353, 356
Rayleigh quotient iteration, 408-9 QZ algorithm, 382-3
QR algorithm and, 422 SVD algorithm, 452
symmetric-definite pencils and, 465 symmetric QR algorithm, 418-20
R-bidiagonalization, 252-3 Sign function, 372
Re, 14 Similarity transformation, 311
Real Schur decomposition, 341 condition of, 317
generalized, 377 nonunitary, 314, 317
recv, 277 Simpson's rule, 570
Rectangular LU, 102 Simultaneous diagonalization, 461-3
Relaxation parameter, 514 Simultaneous iteration. See
Residuals vs. accuracy, 124 LR iteration, orthogonal iteration
Restarting Treppeniteration,
Arnoldi method and, 501-3 Sine of matrix, 566
GMRES and, 549 Single shift QR iteration, 354-5
Lanczos and, 584 Singular matrix, 50
Ridge regression, 583-5 Singular value decomposition (SVD), 70-3
Ring, 276 algorithm for, 253---4, 448, 452
Ring algorithms constrained least squares and, 582-3
Cholesky, 300-3 generalized, 465-7
Jacobi eigensolver, 434 Lanczos method for, 495-6
Ritz, linear systems and, 80
acceleration, 334 numerical rank and, 260-2
pairs and Arnoldi method, 500 null space and, 71, 602-3
pairs and Lanczos method, 475 projections and, 75
Rotation of subspaces, 601 proof of, 70
Rounding errors. See pseudo-inverse and, 257
particular algorithm. rank of matrix and, 71
Roundoff error analysis, 62-7 ridge regression and, 583-5
Row addition or deletion, 610-1 subset .selection and, 591-4
Row partition, 6 subspace intersection and, 604-5
Row scaling, 125 subspace rotation and, 601
Row weighting in LS problem, 265 total least squares and, 596-8
Singular values
Saxpy, 4, 5 eigenvalues and, 318
Scaling interlacing properties, 449-50
linear systems and, 125 minimax characterization, 449
Scaling and squaring for exp(A), 573-4 perturbation of, 450---1
Schmidt orthogonalization. See Singular vectors, 70-1
Gram-Schmidt, Span, 49
Schur complement, 103 Spectral radius, 511
Schur decomposition, 313 Spectrum, 310
generalized, 377 Speed-up, 281
matrix functions and, 558---61 Splitting, 511
normal matrices and, 313-4 Square root of a matrix, 149
real matrices and, 341-2 S-step Lanczos, 487
symmetric matrices and, 393 Static Scheduling, 286
Stationary values, 621 Krylov subspaces and, 416


Steepest descent and conjugate Lanczos, 472ff
gradients, 520ff Tridiagonal matrices, 416
Store by inverse of, 537
band, 19-20 QR algorithm and, 417ff
block, 45 Tridiagonal systems, 156-7
diagonal, 21-3
Stride, 38-40 ULV updating, 613-8
Strassen method, 31-3, 66 Underdetennined systems, 271-3
Structure exploitation, 16-24 Underftow, 61
Sturm sequences, 440 Unit roundoff, 61
Submatrix, 27 Unit roundoff, 61
Subordinate norm, 56 Unitary matrix, 73
Subset selection, 590 Unreduced Hessenberg matrices, 346
Subspace, 49 Unsymmetric eigenproblem, 308ff
angles between, 603-4 Unsymmetric Lanczos method, 503-6
basis for, 49 Unsymmetric positive definite systems, 142
deflating, 381, 386 Updating the QR factorization, 606-13
dimension, 49
distance between, 76-7 Vandermonde systems, 183-8
intersection, 604-5 Variance-covariance matrix, 245-6
invariant, 372, 397-403 Vector length issue, 37-8
null space intersection, 602-3 Vector notation, 4
orthogonal projections onto, Vector norms, 52ff
rotation of, 601 Vector operations, 4, 36
Successive over-relaxation (SOR), 514 Vector touch, 41-2
Symmetric eigenproblem, 391ff Vector computing
Symmetric indefinite systems, 161ff models, 37
Symmetric positive definite systems, operations, 4, 36
Lanczos and, ff pipelining, 35-6
Symmetric storage, 20-2 Vectorization, 34ff, 157-8
Symmetric successive over-relaxation,
(SSOR), 516-7 Weighting
sym.schur, 427 column, 264-5
SYMMLQ, 494 row, 586
Sweep, 429 See also Scaling,
Sylvester equation, 366-9 Wielandt-Hoffman theorem for
Sylvester law of inertia, 403 eigenvalues, 395
singular values, 450
Taylor approximation of e^A, 565-7 Wilkinson shift, 418
Threshold Jacobi, 436 Work
Toeplitz matrix methods, 193ff least squares methods, 263
Torus, 276 linear system methods, 270
Total least squares, 595ff SVD and, 254
Trace, 310 Workspace, 23
Transformation matrices Wrap mapping, 278
Fast Givens, 218-21 WY representation, 213-5
Gauss, 94--5
Givens, 215 Yule-Walker problem, 194
Householder, 209
Hyperbolic, 611-2
Trench algorithm, 199
Treppeniteration, 335-6
Triangular matrices, 93
multiplication between, 17
unit, 92
Triangular systems, 88ff
band, 153-4
multiple, 91
non-square, 92
Tridiagonalization,
Householder, 414
