Numerical Linear Algebra and Applications
Biswa Nath Datta
DeKalb, IL 60115
e-mail: [email protected]
The book is dedicated to my parents, father-in-law, and mother-in-law,
whose endless blessings have made the writing of this book possible.
PREFACE
Numerical Linear Algebra is no longer just a subtopic of Numerical Analysis; it has grown
into an independent topic of research over the past few years. Because of its crucial role in
scientific computing, which is a major component of modern applied and engineering research,
numerical linear algebra has become an integral component of undergraduate and graduate curricula
in mathematics and computer science, and is increasingly becoming so in other curricula as well,
especially in engineering.
The currently available books completely devoted to the subject of numerical linear algebra are
Introduction to Matrix Computations by G. W. Stewart, Matrix Computations by G.
H. Golub and Charles Van Loan, Fundamentals of Matrix Computations by David Watkins,
and Applied Numerical Linear Algebra by William Hager. These books, along with the
most celebrated book, The Algebraic Eigenvalue Problem by J. H. Wilkinson, are sources of
knowledge in the subject. I personally salute the books by Stewart and Golub and Van Loan because
I have learned "my numerical linear algebra" from them. Wilkinson's book is a major reference,
and the books by Stewart and Golub and Van Loan are considered mostly to be "graduate texts"
and reference books for researchers in scientific computing.
I have taught numerical linear algebra and numerical analysis at Northern Illinois University,
the University of Illinois, Pennsylvania State University, the University of California–San Diego,
and the State University of Campinas, Brazil. I have used with great success the books by Golub
and Van Loan and by Stewart in teaching courses at the graduate level.
As for introductory undergraduate numerical linear algebra courses, I, like many other instruc-
tors, have taught topics of numerical linear algebra from the popular "numerical analysis" books.
These texts typically treat numerical linear algebra merely as a subtopic, so I have found they do
not adequately cover all that needs to be taught in a numerical linear algebra course. In some under-
graduate books on numerical analysis, numerical linear algebra is barely touched upon. Therefore,
in frustration I have occasionally prescribed the books by Stewart and by Golub and Van Loan as
texts at the introductory level, although only selected portions of these books have been used in
the classroom, and frequently, supplementary class notes had to be provided. When I have used
these two books as "texts" in introductory courses, a major criticism (or compliment, in the view
of some) coming from students on these campuses has been that they are "too rich" and "too vast"
for students new to the subject.
As an instructor, I have always felt the need for a book that is geared toward the undergrad-
uate, and which can be used as an independent text for an undergraduate course in Numerical
Linear Algebra. In writing this book, I hope to fulfill such a need. The more recent books Fundamentals
of Matrix Computations, by David Watkins, and Applied Numerical Linear
Algebra, by William Hager, address this need to some extent.
This book, Numerical Linear Algebra and Applications, is more elementary than most
existing books on the subject. It is an outgrowth of the lecture notes I have compiled over the years
for use in undergraduate courses in numerical linear algebra, and which have been "class-tested"
at Northern Illinois University and at the University of California–San Diego. I have deliberately
chosen only those topics which I consider essential to a study of numerical linear algebra. The
book is intended for use as a textbook at the undergraduate and beginning graduate levels in
mathematics, computer science and engineering. It can also serve as a reference book for scientists
and engineers. However, it is primarily written for use in a first course in numerical linear algebra,
and my hope is that it will bring numerical linear algebra to the undergraduate classroom.
Here the principal topics of numerical linear algebra, such as Linear Systems, the Matrix Eigen-
value Problem, Singular Value Decomposition, Least Squares Methods, etc., have been covered
at a basic level. The book focuses on development of the basic tools and concepts of numerical
linear algebra and their effective use in algorithm development. The algorithms are explained in
"step-by-step" fashion. Wherever necessary, I have referred the reader to the exact locations of
advanced treatment of the more difficult concepts, relying primarily on the aforementioned books
by Stewart, Golub and Van Loan, and occasionally on that of Wilkinson.
I have also drawn heavily on applications from different areas of science and engineering, such
as: electrical, mechanical and chemical engineering; physics and chemistry; statistics; control theory
and signal and image processing. At the beginning of each chapter, some illustrative case studies
from applications of practical interest are provided to motivate the student. The algorithms are
then outlined, followed by implementational details. MATLAB codes are provided in the appendix
for some selected algorithms. A MATLAB toolkit, called MATCOM, implementing the major
algorithms in Chapters 4 through 8 of the book, is included with the book.
I will consider myself successful and my efforts rewarded if the students taking a first course in
numerical linear algebra and applications, using this book as a text, develop a firm grasp of the
basic concepts of round-off errors, stability, conditioning and accuracy, and leave with a knowledge
and appreciation of the core numerical linear algebra algorithms, their basic properties and
implementations. I truly believe that the book will serve as the right text for most of the existing
undergraduate and first-year graduate courses in numerical linear algebra. Furthermore, it will
provide enough incentive for educators to introduce numerical linear algebra courses in their
curricula, if such courses do not already exist. Prerequisites are a first course in linear
algebra and a good knowledge of scientific programming.
Following is a suggested format for instruction using Numerical Linear Algebra and Ap-
plications as a text. These guidelines are based on my own teaching experience and on that of
several other colleagues with whom I have had an opportunity to discuss them.
1. A First Course in Numerical Linear Algebra (Undergraduate – one semester)

Chapter 1
Chapter 2
Chapter 3 (except possibly Section 3.8)
Chapter 4
Chapter 5: 5.1, 5.2, 5.3, 5.4
Chapter 6: 6.2, 6.3, 6.4, 6.5.1, 6.5.3, 6.6.3 (only the statement and implication of Theorem 6.6.3), 6.7.1, 6.7.2, 6.7.3, 6.7.8, 6.8, 6.9, 6.10.1, 6.10.2, 6.10.3, 6.10.4, 6.10.5
Chapter 7: 7.3, 7.5, 7.8.1, 7.8.2
Chapter 8: 8.2, 8.3, 8.4, 8.5, 8.6.1, 8.6.2, 8.7.1, 8.9.1, 8.9.2, 8.9.3, 8.9.4, 8.9.6, 8.12
Chapter 9 and Chapter 10: 10.2, 10.3, 10.4, 10.5, 10.6.1, 10.6.3, 10.6.4, 10.8.1, 10.9.1, 10.9.2 (possibly only some very selected portions of these two chapters, depending upon the availability of time and the interests of the students and instructors)

2. A Second Course in Numerical Linear Algebra (Advanced Undergraduate, First Year Graduate – one semester)

Chapter 1: 1.3.5, 1.3.6
Chapter 5: 5.5, 5.6, 5.7, 5.8
Chapter 6: 6.3, 6.5.2, 6.5.5, 6.8, 6.10.5, 6.10.6
Chapter 7: 7.7, 7.8, 7.9, 7.10, 7.11
Chapter 8: 8.3, 8.8, 8.9, 8.10, 8.11, 8.12
Chapter 9: 9.2, 9.3, 9.4, 9.5, 9.6.1, 9.8, 9.9, 9.10
Chapter 10
Chapter 11
CHAPTER-WISE BREAKDOWN
Chapter 1, Some Required Concepts from Core Linear Algebra, describes some important results
from theoretical linear algebra. Of special importance here are vector and matrix norms, special
matrices and convergence of the sequence of matrix powers, etc., which are essential to the under-
standing of numerical linear algebra, and which are not usually covered in an introductory linear
algebra course.
Chapter 2 is on Floating Point Numbers and Errors in Computations. Here the concepts of
floating point number systems and rounding errors have been introduced, and it has been shown
through examples how round-off errors due to cancellation and recursive computations can "pop
up", even in simple calculations, and how these errors can be reduced in certain cases. The IEEE
floating point standard has been discussed.
Chapter 3 deals with Stability of Algorithms and Conditioning in Problems. The basic concepts
of conditioning and stability, including strong and weak stability, have been introduced and exam-
ples have been given on unstable and stable algorithms and ill-conditioned and well-conditioned
problems. It has been my experience, as an instructor, that many students, even after taking a
few courses on numerical analysis, do not clearly understand that "conditioning" is a property of
the problem, that stability is a property of the algorithm, and that both affect the accuracy of the
solution. Attempts have been made to make this as clear as possible.
It is important to understand the distinction between a "bad" algorithm and a numerically
effective algorithm, and the fact that popular mathematical software is based only on numerically
effective algorithms. This is done in Chapter 4, Numerically Effective Algorithms and Mathematical
Software. The important properties such as efficiency, numerical stability, storage economy, etc.,
that make an algorithm and the associated software "numerically effective" are explained with
examples. In addition, a brief statement regarding important matrix software such as LINPACK,
EISPACK, IMSL, MATLAB, NAG, LAPACK, etc., is given in this chapter.
Chapter 5 is on Some Useful Transformations in Numerical Linear Algebra and Their Appli-
cations. Transformations such as elementary transformations, Householder reflections, and
Givens rotations form the principal tools of most algorithms of numerical linear algebra. These
important tools are introduced in this chapter, and it is shown how they are applied to achieve
important decompositions such as LU and QR, and reduction to Hessenberg forms. This chapter
is a sort of "preparatory" chapter for the rest of the topics treated in the book.
Chapter 6 deals with the most important topic of numerical linear algebra, Numerical Solutions
of Linear Systems. The direct methods such as Gaussian elimination with and without
pivoting, QR factorization methods, the method based on the Cholesky decomposition, and the methods
that take advantage of special structures of matrices, the standard iterative methods such as
the Jacobi, Gauss-Seidel, successive overrelaxation, and iterative refinement methods, and the perturbation
analysis of linear systems and computations of determinants, inverses, and leading principal minors
are discussed in this chapter. Some motivating examples from application areas are given
before the techniques are discussed.
The Least Squares Solutions to Linear Systems, discussed in Chapter 7, are so important in
applications that the techniques for finding them should be discussed as much as possible, even
in an introductory course in numerical linear algebra. There are users who still routinely use the
normal equations method for computing least squares solutions; the numerical difficulties associated
with this approach are described in some detail, and then a better method, based on the QR
decomposition, is discussed for the least squares problem. The most reliable general-purpose
method, based on the singular value decomposition, is mentioned in this chapter and treated in full in
Chapter 10. The QR methods for the rank-deficient least squares problem and for the underdetermined
problem, and iterative refinement procedures, are also discussed in this chapter. Some discussion
of perturbation analysis is also included.
Chapter 8 picks up another important topic, probably the second most important topic, Numerical
Matrix Eigenvalue Problems. There are users who still believe that eigenvalues should be
computed by finding the zeros of the characteristic polynomial. It is clearly explained why this is
not a good general rule. The standard and most widely used techniques for eigenvalue computations,
the QR iteration with and without shifts, are then discussed in some detail. The popular
techniques for eigenvector computations, such as the inverse power method and the Rayleigh Quotient
Iteration, are described, along with techniques for eigenvalue location. The most common
methods for the symmetric eigenvalue problem and the symmetric Lanczos method are described
very briefly at the end. Discussions of the stability of differential and difference equations, and
engineering applications to the vibration of structures and a stock market example from statistics,
are included, which will serve as motivating examples for the students.
Chapter 9 deals with The Generalized Eigenvalue Problem (GEP). The GEP arises in many
practical applications, such as mechanical vibrations, design of structures, etc. In fact, in these
applications, almost all eigenvalue problems are generalized eigenvalue problems, and most of them
are symmetric definite problems. We first present a generalized QR iteration for the pair (A, B),
commonly known as the QZ iteration, for the GEP. Then we discuss in detail techniques of simultaneous
diagonalization for generalized symmetric definite problems. Some applications of simultaneous
diagonalization techniques, such as decoupling of a system of second-order differential equations,
are described in some detail. Since several practical applications, e.g., the design of large sparse
structures, give rise to very large-scale generalized definite eigenvalue problems, a brief discussion
of Lanczos-based algorithms for such problems is also included. In addition, several case studies
from vibration and structural engineering are presented. A brief mention is made of how to reduce
a quadratic eigenvalue problem to a standard eigenvalue problem or to a generalized eigenvalue
problem.
The Singular Value Decomposition (SVD) and singular values play important roles in a wide
variety of applications. In Chapter 10, we first show how the SVD can be used effectively to solve
computational linear algebra problems arising in applications, such as finding the structure of a
matrix (rank, nearness to rank-deficiency, orthonormal bases for the range and the null space of a
matrix, etc.), finding least squares solutions to linear systems, computing the pseudoinverse, etc.
We then describe the most widely used method, the Golub-Kahan-Reinsch method, for computing
the SVD, and its modification by Chan. The chapter concludes with the description of a very recent
method by Demmel and Kahan for computing the smallest singular values of a bidiagonal matrix with
high accuracy. A practical life example on separating the fetal ECG from the maternal ECG
is provided in this chapter as a motivating example.
The stability (or instability) of an algorithm is usually established by means of backward round-off
error analysis, introduced and made popular by James Wilkinson. Working out the details of
the round-off error analysis of an algorithm can be quite tedious, and presenting such an analysis for
every algorithm is certainly beyond the scope of this book. At the same time, I feel that every
student of numerical linear algebra should have some familiarity with the way rounding analysis
of an algorithm is performed. We have given the readers A Taste of Round-off Error Analysis
in Chapter 11 of the book by presenting such analyses of two popular algorithms: solution of a
triangular system and Gaussian elimination for triangularization. For other algorithms, we just
present the results (without proof) in the appropriate places in the book, and refer the readers to
the classic text The Algebraic Eigenvalue Problem by James H. Wilkinson, and occasionally
to the book of Golub and Van Loan, for more details and proofs.
The appendix contains MATLAB codes for a selected number of basic algorithms. Students
will be able to use these codes as a template for writing codes for more advanced algorithms. A
MATLAB toolkit containing implementation of some of the most important algorithms has been
included in the book, as well. Students can use this toolkit to compare different algorithms for the
same problem with respect to efficiency, accuracy, and stability. Finally, some discussion of how to
write MATLAB programs will be included.
Some Basic Features of the Book
Clear explanation of the basic concepts. The two most fundamental concepts of Numerical
Linear Algebra, namely, the conditioning of the problem and the stability of an algorithm
via backward round-off error analysis, are introduced at a very early stage of the book with
simple motivating examples.
Specific results on these concepts are then stated with respect to each algorithm and problem
in the appropriate places, and their influence on the accuracy of the computed results is
clearly demonstrated. The concepts of weak and strong stability, recently introduced by
James Bunch, appear for the first time in this book.
Most undergraduate numerical analysis textbooks are somewhat vague in explaining these
concepts which, I believe, are fundamental to numerical linear algebra.
Discussion of fundamental tools in a separate chapter. Elementary, Householder, and
Givens matrices are the three most basic tools in numerical linear algebra. Most computationally
effective numerical linear algebra algorithms have been developed using these basic tools
as principal ingredients. A separate chapter (Chapter 5) has been devoted to the introduction
and discussion of these basic tools. It has been clearly demonstrated how a simple but
very powerful property of these matrices, namely, the ability to introduce zeros in specific
positions of a vector or of a matrix, can be exploited to develop algorithms for useful matrix
factorizations such as LU and QR and for the reduction of a matrix to a simple form such as
Hessenberg form.
In my experience as a teacher, I have seen that once students have been made familiar with
these basic tools and have learned some of their most immediate applications, the remainder
of the course goes very smoothly and quite fast.
Throughout the text, soon after describing a basic algorithm, it is shown how the
algorithm can be made cost-effective and storage-efficient using the rich structures of these
matrices.
Step-by-step explanation of the algorithms. The following approach has been adopted
in the book for describing an algorithm: the first few steps of the algorithm are described in
detail and in an elementary way, and then it is shown how the general kth step can be written
following the pattern of these first few steps. This is particularly helpful to the understanding
of an algorithm at an undergraduate level.
Before presenting an algorithm, the basic ideas, the underlying principles and a clear goal
of the algorithm have been discussed. This approach appeals to the student's creativity and
stimulates his interest. I have seen from my own experience that once the basic ideas, the
mechanics of the development and goals of the algorithm have been laid out for the student,
he may then be able to reproduce some of the well-known algorithms himself, even before
learning them in the class.
Clear discussion of numerically effective algorithms and high-quality mathematical
software. Along with mathematical software, a clear and concise definition of a "numerically
effective" algorithm has been introduced in Chapter 4, and the important properties,
such as efficiency, numerical stability, storage economy, etc., that make an algorithm and the
associated software numerically effective have been explained with ample simple examples. This
will help students not only to understand the distinction between a "bad" algorithm and a
numerically effective one, but also to learn how to transform a bad algorithm into a good
one, whenever possible. These ideas are not clearly spelled out in undergraduate texts and,
as a result, I have seen students who, despite having taken a few basic courses in numerical
analysis, remain confused about these issues.
For example, an algorithm which is only efficient is often mistaken by students for a "good"
algorithm, without understanding the fact that an efficient algorithm can be highly unstable
(e.g., Gaussian elimination without pivoting).
Applications. A major strength of the book is applications. As a teacher, I have often
been faced with questions such as: "Why is it important to study such-and-such problems?",
"Why do such-and-such problems need to be solved numerically?", or "What is the physical
significance of the computed quantities?" Therefore, I felt it important to include practical
life examples as often as possible for each computational problem discussed in the book.
I have done so at the outset of each chapter where numerical solutions of a computational
problem have been discussed. The motivating examples have been drawn from application
areas, mainly from engineering; however, some examples from statistics, business, bioscience,
and control theory have also been given. I believe these examples will provide sufficient
motivation to the curious student to study numerical linear algebra.
After a physical problem has been posed, the physical and engineering significance of its
solution has been explained to some extent. The currently available numerical linear algebra
and numerical analysis books do not provide sufficiently motivating examples.
MATLAB codes and the MATLAB toolkit. The use of MATLAB is becoming increasingly
popular in all areas of scientific and engineering computing. I feel that numerical linear
algebra courses should be taught using MATLAB wherever possible. Of course, this does
not mean that the students should not learn to write FORTRAN codes for their favorite
algorithms; knowledge of FORTRAN is a great asset to a numerical linear algebra student.
MATLAB codes for some selected basic algorithms have therefore been provided to help the
students use these codes as templates for writing codes for more advanced algorithms. Also,
a MATLAB toolkit implementing the major algorithms presented in the book has been provided.
The students will be able to compare different algorithms for the same problem with
regard to efficiency, stability, and accuracy. For example, the students will be able to see instantly,
through numerical examples, why Gaussian elimination is more efficient than the QR
factorization method for linear system problems, and why the computed Q in the QR factorization
may be more accurate with the Householder or Givens method than with the Gram-Schmidt
methods, etc.
Thorough discussions and the most up-to-date information. Each topic has been
very thoroughly discussed, and the most current information on the state of the problem has
been provided. The most frequently asked questions by the students have also been answered.
Solutions and answers to selected problems. Partial solutions for selected important
problems and, in some cases, complete answers, have been provided. I feel this is important
for our undergraduate students. In selecting the problems, emphasis has been placed on those
problems that need proofs.
Above all, I have imparted to the book my enthusiasm and my unique style of presenting
material in an undergraduate course at the level of the majority of students in the class, which
have made me a popular teacher. My teaching evaluations at every school at which I have taught
(e.g., State University of Campinas, Brazil; Pennsylvania State University; the University of Illinois
at Urbana-Champaign; University of California, San Diego; Northern Illinois University, etc.) have
been consistently "excellent" or "very good". As a matter of fact, the consistently excellent feedback
that I receive from my students provided me with enough incentive to write this book.
0. LINEAR ALGEBRA PROBLEMS, THEIR IMPORTANCE AND COMPUTATIONAL DIFFICULTIES
0.1 Introduction
0.2 Fundamental Linear Algebra Problems and Their Importance
0.3 Computational Difficulties of Solving Linear Algebra Problems Using Obvious Approaches
CHAPTER 0
LINEAR ALGEBRA PROBLEMS, THEIR IMPORTANCE AND
COMPUTATIONAL DIFFICULTIES
0.1 Introduction
The main objectives of this chapter are to state the fundamental linear algebra problems at the
outset, make a brief mention of their importance, and point out the difficulties one faces in a
computational setting when trying to solve these problems using obvious approaches.
0.2 Fundamental Linear Algebra Problems and Their Importance

The most basic problem is the linear system problem: given an n × n matrix A and an n-vector b, find a vector x such that Ax = b.

A practical variation of the problem requires solutions of several linear systems with the same
matrix A on the left-hand side. That is, the problem there is to find a matrix X = [x_1, x_2, ..., x_m]
such that

AX = B,

where B = [b_1, b_2, ..., b_m] is an n × m matrix.
Associated with linear system problems are the problems of finding the inverse of a matrix,
finding the rank, the determinant, the leading principal minors, an orthonormal basis for the range
and the null space of A, and various projection matrices associated with A. Solutions of some of
these latter problems require matrix factorizations, and the problems of matrix factorization and
solving linear systems are intimately related.
It is perhaps not an exaggeration to say that the linear system problem arises in almost all
branches of science and engineering: applied mathematics, biology, chemistry, physics, electrical,
mechanical, civil, and vibration engineering, etc.
The most common source is the numerical solution of differential equations. Many mathematical
models of physical and engineering systems are systems of differential equations, ordinary
and partial. A system of differential equations is normally solved numerically by discretizing the
system by means of finite difference or finite element methods. The process of discretization, in
general, leads to a linear system, the solution of which is an approximate solution to the differential
equations. (See Chapter 6 for more details.)
Least squares problems arise in statistical and geometric applications that require fitting a
polynomial or curve to experimental data, and in engineering applications such as signal and image
processing. See Chapter 7 for some specific applications of least squares problems. It is worth
mentioning here that methods for numerically solving least squares problems invariably lead to
solutions of linear systems problems (see again Chapter 7 for details).
The eigenvalue problem typically arises in the explicit solution and stability analysis of a homogeneous
system of first-order differential equations. The stability analysis requires only implicit
knowledge of the eigenvalues, whereas the explicit solution requires the eigenvalues and eigenvectors
explicitly.
Applications such as buckling problems, stock market analysis, study of behavior of dynamical
systems, etc. require computations of only a few eigenvalues and eigenvectors, usually the few
largest or smallest ones.
In many practical instances, the matrix A is symmetric, and thus the eigenvalue problem
becomes a symmetric eigenvalue problem. For details of some specific applications see Chapter 8.
A great number of eigenvalue problems arising in engineering applications are, however, generalized
eigenvalue problems, as stated below.
D. The Generalized and Quadratic Eigenvalue Problems: Given the n × n
matrices A, B, and C, the problem is to find λ_i and x_i such that

(λ_i² A + λ_i C + B) x_i = 0,   i = 1, ..., n.

This is known as the quadratic eigenvalue problem. In the special case when C is the zero
matrix, the problem reduces to a generalized eigenvalue problem. That is, given n × n
matrices A and B, we must find λ and x such that

Ax = λBx.
The leading equations of vibration engineering (a branch of engineering dealing with vibrations of
structures, etc.) are systems of homogeneous or nonhomogeneous second-order differential equations.
A homogeneous second-order system has the form

Az̈ + Cż + Bz = 0,

the solution and stability analysis of which lead to a quadratic eigenvalue problem.
Vibration problems are usually solved by setting C = 0. Moreover, in many
practical instances, the matrices A and B are symmetric and positive definite. This leads to a
symmetric definite generalized eigenvalue problem.
See Chapter 9 for details of some specific applications of these problems.
Every m × n matrix A can be decomposed as A = UΣV^T, where U and V are orthogonal and Σ is
an m × n "diagonal" matrix. This decomposition is known as the Singular Value Decomposition of A. The diagonal entries
of Σ are the singular values, and the column vectors of U and V are called the singular vectors.
Many areas of engineering, such as control and systems theory, biomedical engineering, signal
and image processing, and statistical applications, give rise to the singular value decomposition
problem. These applications typically require the rank of A, an orthonormal basis, projections, the
distance of a matrix from another matrix of lower rank, etc., in the presence of certain impurities
(known as noise) in the data. The singular values and singular vectors are the most numerically
reliable tools for finding these entities. The singular value decomposition is also the most numerically
effective approach to solving the least squares problem, especially in the rank-deficient case.
0.3 Computational Difficulties of Solving Linear Algebra Problems Using Obvious Approaches

Solving the least squares problem by the normal equations: A least squares solution of Ax = b
satisfies the normal equations A^T A x = A^T b. Unfortunately, this procedure
has some severe numerical limitations. First, in finite precision arithmetic, during an explicit
formation of A^T A, some vital information might be lost. Second, the normal equations are
more sensitive to perturbations than the ordinary linear system Ax = b, and this sensitivity,
in certain instances, corrupts the accuracy of the computed least squares solution to an extent
not warranted by the data. (See Chapter 7 for more details.)
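A minimal MATLAB sketch (not from the book or its MATCOM toolkit) illustrating this loss of information; the matrix, the parameter delta, and the right-hand side are chosen only for illustration, with delta small enough that 1 + delta² rounds to 1 in double precision:

    % Least squares via the normal equations vs. a QR-based solver.
    delta = 1e-8;
    A = [1 1; delta 0; 0 delta];
    b = [2; delta; delta];          % the exact least squares solution is x = [1; 1]

    x_qr     = A \ b;               % backslash uses a QR factorization of A
    x_normal = (A'*A) \ (A'*b);     % A'*A rounds to the singular matrix [1 1; 1 1]

    disp([x_qr x_normal])           % x_qr is accurate; the normal-equations solve
                                    % breaks down (MATLAB warns that the matrix is
                                    % singular to working precision)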
Computing the eigenvalues of a matrix by finding the roots of its characteristic
polynomial: The eigenvalues of a matrix A are the zeros of its characteristic polynomial.
Thus an "obvious" procedure for finding the eigenvalues would be to compute the characteristic
polynomial of A and then find its zeros by a standard, well-established root-finding
procedure. Unfortunately, this is not a numerically viable approach. The round-off errors produced
during the process of computing the characteristic polynomial will very likely produce
small perturbations in the computed coefficients. These small errors in the coefficients
can affect the computed zeros very drastically in certain cases. The zeros of certain polynomials
are known to be extremely sensitive to small perturbations in the coefficients. A
classic example of this is the Wilkinson polynomial (see Chapter 3). Wilkinson took a polynomial
of degree 20 with the distinct roots 1 through 20, and perturbed the coefficient of x^19
by a very small amount. The zeros of this slightly perturbed polynomial were then
computed by a well-established root-finding procedure, only to find that some zeros had become
totally different. Some even became complex.
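A minimal MATLAB sketch (not from the book) of Wilkinson's experiment; the perturbation size 2^(-23) is the one used in the classic account, but any comparably small value shows the effect:

    % Sensitivity of polynomial roots: the Wilkinson polynomial.
    p = poly(1:20);            % coefficients of (x-1)(x-2)...(x-20), descending powers
    pp = p;
    pp(2) = pp(2) - 2^(-23);   % perturb the coefficient of x^19 slightly

    r = sort(roots(pp));       % roots of the perturbed polynomial
    disp([(1:20)' r])          % several computed roots differ greatly from 1,...,20,
                               % and some come out complex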
Solving the Generalized Eigenvalue Problem and the Quadratic Eigenvalue Problem
by Matrix Inversion: The generalized eigenvalue problem Ax = λBx, in the case where B is
nonsingular, is theoretically equivalent to the ordinary eigenvalue problem

B^{-1} A x = λx.

However, if the nonsingular matrix B is sensitive to perturbations, then forming the matrix
on the left-hand side by explicitly computing the inverse of B will lead to inaccuracies, which
in turn will lead to the computation of inaccurate generalized eigenvalues.

Similar remarks hold for the quadratic eigenvalue problem. In major engineering applications,
such as vibration engineering, the matrix A is symmetric positive definite, and is thus
nonsingular. In that case the quadratic eigenvalue problem is equivalent to the eigenvalue
problem

Eu = λu,   where   E = (      0           I
                          −A^{-1}B    −A^{-1}C ).

But numerically it is not advisable to solve the quadratic eigenvalue problem by actually
computing the matrix E explicitly. If A is sensitive to small perturbations, the matrix E
cannot be formed accurately, and the computed eigenvalues will then be inaccurate.
Finding the Singular Values by computing the eigenvalues of A^T A: Theoretically,
the singular values of A are the nonnegative square roots of the eigenvalues of A^T A. However,
finding the singular values this way is not advisable. Again, explicit formation of the matrix
might lead to the loss of significant relevant information. Consider a rather trivial example:

A = ( 1  1
      ε  0
      0  0 ),

where ε is such that, in finite precision computation, 1 + ε² = 1. Then computationally we
have

A^T A = ( 1  1
          1  1 ).

The eigenvalues now are 2 and 0. So the computed singular values
will be given by √2 and 0. The exact singular values, however, are approximately √2 and ε/√2. (See
Chapter 10 for details.)
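A small MATLAB sketch (not from the book) of this phenomenon, using the matrix of the example as reconstructed above; ε = 1e-8 is chosen so that 1 + ε² rounds to 1 in double precision:

    % Singular values computed directly vs. via eigenvalues of A'*A.
    epsil = 1e-8;
    A = [1 1; epsil 0; 0 0];

    s_direct  = svd(A);                      % reliable: close to [sqrt(2); epsil/sqrt(2)]
    s_via_ata = sqrt(max(eig(A'*A), 0));     % A'*A is formed as [1 1; 1 1]; the small
                                             % singular value is lost (max clamps tiny
                                             % negative rounding errors before sqrt)

    disp([sort(s_direct,'descend') sort(s_via_ata,'descend')])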
Conclusion: Above we have merely pointed out how certain obvious theoretical approaches to
linear algebra problems might lead to computational difficulties and inaccuracies in computed results.
Numerical linear algebra deals with in-depth analysis of such difficulties, investigations of how
these difficulties can be overcome in certain instances, and with the formulation and implementation
of viable numerical algorithms for scientific and engineering use.
1. A REVIEW OF SOME REQUIRED CONCEPTS FROM CORE LINEAR ALGEBRA
1.1 Introduction
1.2 Vectors
1.2.1 Subspace and Basis
1.3 Matrices
1.3.1 Range and Null Spaces
1.3.2 Rank of a Matrix
1.3.3 The Inverse of a Matrix
1.3.4 Similar Matrices
1.3.5 Orthogonality and Projections
1.3.6 Projection of a Vector onto the Range and the Null Space of a Matrix
1.4 Some Special Matrices
1.5 The Cayley-Hamilton Theorem
1.6 Singular Values
1.7 Vector and Matrix Norms
1.7.1 Vector Norms
1.7.2 Matrix Norms
1.7.3 Convergence of a Matrix Sequence and Convergent Matrices
1.7.4 Norms and Inverses
1.8 Norm Invariant Properties of Orthogonal Matrices
1.9 Review and Summary
1.10 Suggestions for Further Reading
CHAPTER 1
A REVIEW OF SOME REQUIRED CONCEPTS
FROM CORE LINEAR ALGEBRA
1.1 Introduction
Although a first course in linear algebra is a prerequisite for this book, for the sake of completeness
we establish some notation and quickly review the basic definitions and concepts on matrices and
vectors in this chapter, and then discuss in somewhat greater detail the concepts and fundamental
results on vector and matrix norms and their applications to the study of convergent
matrices. These results will be used frequently in the later chapters of the book.
1.2 Vectors
An ordered set of numbers is called a vector; the numbers themselves are called the components
of the vector. A lower case italic letter is usually used to denote a vector. A vector v having n
components has the form

v = ( v_1
      v_2
       ⋮
      v_n ).

A vector in this form is referred to as a column vector, and its transpose is a row vector. The
set of all n-vectors (that is, vectors having n components each) will be denoted by R^{n×1} or simply
by R^n. The transpose of a vector v will be denoted by v^T. Unless otherwise stated, a column vector
will simply be called a vector.
If u and v are two vectors in R^n, then their sum u + v is defined by

u + v = (u_1 + v_1, u_2 + v_2, ..., u_n + v_n)^T.

If c is a scalar, then cu = (cu_1, cu_2, ..., cu_n)^T. The inner product of two vectors u and v is
the scalar given by

u^T v = u_1 v_1 + u_2 v_2 + ... + u_n v_n.

The length of a vector v, denoted by ||v||, is √(v^T v); that is, the length of v (or the Euclidean length of
v) is √(v_1² + v_2² + ... + v_n²).
A set of vectors {m_1, ..., m_k} in R^n is said to be linearly dependent if there exist scalars
c_1, ..., c_k, not all zero, such that

c_1 m_1 + ... + c_k m_k = 0 (the zero vector).

Otherwise, the set is called linearly independent.
Example 1.2.1

The set of vectors

e_i = (0, 0, ..., 0, 1, 0, ..., 0)^T   (with the 1 in the ith component),   i = 1, ..., n,

is linearly independent, because

c_1 e_1 + c_2 e_2 + ... + c_n e_n = (c_1, c_2, ..., c_n)^T = 0

is true if and only if

c_1 = c_2 = ... = c_n = 0.
Example 1.2.2

The vectors (1, −2)^T and (−3, 6)^T are linearly dependent, because

3 (1, −2)^T + (−3, 6)^T = (0, 0)^T.

Thus, c_1 = 3, c_2 = 1.
Two vectors u and v are orthogonal if the angle θ between them is 90°; that is, u^T v = 0. The symbol ⊥ is used to denote
orthogonality.
1.2.1 Subspace and Basis

Let S be a set of vectors in R^n. Then S is called a subspace of R^n if s_1, s_2 ∈ S implies
c_1 s_1 + c_2 s_2 ∈ S, where c_1 and c_2 are any scalars. That is, S is a subspace if any linear combination
of two vectors in S is also in S. Note that the space R^n itself is a subspace of R^n. For every
subspace there is a unique smallest positive integer r such that every vector in the subspace can be
expressed as a linear combination of at most r vectors in the subspace; r is called the dimension
of the subspace and is denoted by dim[S]. Any set of r linearly independent vectors from S, where
dim[S] = r, forms a basis of the subspace.
Orthogonality of Two Subspaces. Two subspaces S_1 and S_2 of R^n are said to be orthogonal
if s_1^T s_2 = 0 for every s_1 ∈ S_1 and every s_2 ∈ S_2. Two orthogonal subspaces S_1 and S_2 will be
denoted by S_1 ⊥ S_2.
1.3 Matrices

A collection of n vectors in R^m arranged in a rectangular array of m rows and n columns is called
a matrix. A matrix A, therefore, has the form

A = ( a_11  a_12  ...  a_1n
      a_21  a_22  ...  a_2n
        ⋮     ⋮           ⋮
      a_m1  a_m2  ...  a_mn ).

The outer product of a vector a in R^n and a vector b in R^m is the n × m matrix ab^T whose (i, j) entry is a_i b_j:

ab^T = ( a_1 b_1   a_1 b_2   ...   a_1 b_m
         a_2 b_1   a_2 b_2   ...   a_2 b_m
           ⋮                          ⋮
         a_n b_1   a_n b_2   ...   a_n b_m ).
Example 1.3.1

Let a = (1, 2, 3)^T and b = (2, 3, 4)^T. Then the outer product is

ab^T = ( 2  3   4
         4  6   8
         6  9  12 )   (a matrix),

and the inner product is

a^T b = 1·2 + 2·3 + 3·4 = 20   (a scalar).
The transpose of a matrix A of order m × n, denoted by A^T, is a matrix of order n × m with
rows and columns interchanged:

A^T = (a_ji),   i = 1, ..., n;  j = 1, ..., m.

Note that the matrix product is not commutative; that is, in general,

AB ≠ BA.

Also, (AB)^T = B^T A^T.

If a_1, a_2, ..., a_m denote the rows of A, then the product AB can be computed row by row:

AB = ( a_1 B
       a_2 B
        ⋮
       a_m B ).
Block Matrices

If two matrices A and B are partitioned as

A = ( A_11  A_12          B = ( B_11  B_12
      A_21  A_22 ),             B_21  B_22 ),

then, considering each block as an element of the matrix, we can perform addition, scalar multiplication,
and matrix multiplication in the usual way. Thus,

A + B = ( A_11 + B_11   A_12 + B_12
          A_21 + B_21   A_22 + B_22 )

and

AB = ( A_11 B_11 + A_12 B_21   A_11 B_12 + A_12 B_22
       A_21 B_11 + A_22 B_21   A_21 B_12 + A_22 B_22 ),

assuming that the partitioning has been done conformably, so that the corresponding matrix multiplications
are possible. The concept of two-by-two block partitioning can be easily generalized.
Thus, if A = (A_ij) and B = (B_ij) are two block matrices, then C = AB is given by

C = (C_ij),   C_ij = Σ_{k=1}^{n} A_ik B_kj,

where each A_ik, B_kj, and C_ij is a block matrix.

A block diagonal matrix is a block matrix whose off-diagonal blocks are zero and whose diagonal blocks are square. That is,

A = diag(A_11, ..., A_nn),

where the A_ii are square matrices.
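A short MATLAB sketch (not from the book) verifying that the blockwise product above agrees with the ordinary product; the particular blocks are arbitrary but conformably partitioned:

    % Block (partitioned) matrix multiplication agrees with ordinary multiplication.
    A11 = [1 2; 3 4]; A12 = [5; 6]; A21 = [7 8]; A22 = 9;
    B11 = [1 0; 0 1]; B12 = [2; 3]; B21 = [4 5]; B22 = 6;

    A = [A11 A12; A21 A22];
    B = [B11 B12; B21 B22];

    C_block = [A11*B11 + A12*B21,  A11*B12 + A12*B22;
               A21*B11 + A22*B21,  A21*B12 + A22*B22];

    disp(norm(A*B - C_block))      % zero: the two products coincide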
The Determinant of a Matrix

For every square matrix A, there is a unique number associated with the matrix, called the
determinant of A and denoted by det(A). For a 2 × 2 matrix A, det(A) = a_11 a_22 − a_12 a_21;
for a 3 × 3 matrix A = (a_ij), det(A) = a_11 det(A_11) − a_12 det(A_12) + a_13 det(A_13), where A_1i
is the 2 × 2 submatrix obtained by deleting the first row and the ith column. This can be easily
generalized. For an n × n matrix A = (a_ij) we have

det(A) = (−1)^{i+1} a_i1 det(A_i1) + (−1)^{i+2} a_i2 det(A_i2) + ... + (−1)^{i+n} a_in det(A_in),

where A_ij is the submatrix of A of order n − 1 obtained by deleting the ith row and the jth column.
Example 1.3.2

A = ( 1  2  3
      4  5  6
      7  8  9 ).

Set i = 1. Then

det(A) = 1 · det[5 6; 8 9] − 2 · det[4 6; 7 9] + 3 · det[4 5; 7 8]
       = 1(−3) − 2(−6) + 3(−3) = 0.
Theorem 1.3.1 The following simple properties of det(A) hold:
1. det(A) = det(AT )
2. det(αA) = α^n det(A), where α is a scalar and A is n × n.
3. det(AB ) = det(A) det(B ).
4. If two rows or two columns of A are identical, then det(A) = 0.
5. If B is a matrix obtained from A by interchanging two rows or two columns, then det(B ) =
; det(A).
6. The determinant of a triangular matrix is the product of its diagonal entries.
(A square matrix A is triangular if its elements below or above the diagonal are all zero.)
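A quick MATLAB sketch (not from the book) checking some of these properties numerically; the test matrices are arbitrary, and each displayed difference is zero up to rounding:

    % Numerically checking properties 1, 2, 3, and 6 of Theorem 1.3.1.
    A = [1 2 3; 4 5 6; 7 8 10];
    B = magic(3);
    alpha = 2.5;

    disp(det(A) - det(A'))                  % property 1
    disp(det(alpha*A) - alpha^3*det(A))     % property 2 (here n = 3)
    disp(det(A*B) - det(A)*det(B))          % property 3
    disp(det(triu(A)) - prod(diag(A)))      % property 6 (triangular case)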
The Characteristic Polynomial, the Eigenvalues and Eigenvectors of a Matrix

Let A be an n × n matrix. Then the polynomial p_n(λ) = det(λI − A) is called the characteristic
polynomial. The zeros of the characteristic polynomial are called the eigenvalues of A. Note
that this is equivalent to the following: λ is an eigenvalue of A if and only if there exists a nonzero vector x
such that Ax = λx. The vector x is called a right eigenvector (or just an eigenvector), and a
nonzero vector y satisfying y^T A = λ y^T is called a left eigenvector associated with λ.

Definition 1.3.1 An n × n matrix A having fewer than n linearly independent eigenvectors is
called a defective matrix.
Example 1.3.3

The matrix A = ( 1  2
                 0  1 )

is defective: the two eigenvectors (1, 0)^T and (−1, 0)^T are linearly dependent, so A has only one
linearly independent eigenvector (corresponding to the repeated eigenvalue 1).
The Determinant of a Block Matrix

Let

A = ( A_11  A_12
       0    A_22 ),

where A_11 and A_22 are square. Then det(A) = det(A_11) det(A_22).
An important property of similar matrices. Two similar matrices have the same
eigenvalues. (For a proof, see Chapter 8, Section 8.2.)
1.3.5 Orthogonality and Projections
A set of vectors {v_1, ..., v_m} in R^n is orthogonal if

v_i^T v_j = 0,   i ≠ j.

If, in addition, v_i^T v_i = 1 for each i, then the vectors are called orthonormal.
A basis for a subspace that is also orthonormal is called an orthonormal basis for the subspace.
Example 1.3.6

A = ( 0    0
      1/2  1
      1    2 ).

The vector (0, −1/√5, −2/√5)^T forms an orthonormal basis for R(A). (See Section 5.6.1.)
Example 1.3.7

A = ( 1  2
      0  1
      1  0 ).

The columns of the matrix

V = ( −1/√2   −1/√3
        0     −1/√3
      −1/√2    1/√3 )

form an orthonormal basis for R(A). (See Section 5.6.1.)
Orthogonal Projection

Let S be a subspace of R^n. Then an n × n matrix P having the properties

(i) R(P) = S,
(ii) P^T = P (P is symmetric),
(iii) P² = P (P is idempotent),

is called the orthogonal projection onto S, or simply the projection matrix. We denote the
orthogonal projection P onto S by P_S. The orthogonal projection onto a subspace is unique.

It can be shown (Exercise 14(b)) that if A is m × n (m ≥ n) and has full rank, then the orthogonal
projections onto R(A) and onto N(A^T) are, respectively,

P_A = A(A^T A)^{-1} A^T,
P_N = I − A(A^T A)^{-1} A^T.
Example 1.3.8

A = ( 1  2
      0  1
      1  0 ),

A^T A = ( 2  2            (A^T A)^{-1} = (  5/6   −1/3
          2  5 ),                          −1/3    1/3 ),

P_A = A(A^T A)^{-1} A^T = ( 5/6    1/3    1/6
                            1/3    1/3   −1/3
                            1/6   −1/3    5/6 ).
1.3.6 Projection of a Vector onto the Range and the Null Space of a Matrix

Any vector b can be written as

b = b_S + b_{S⊥},

where b_S ∈ S and b_{S⊥} ∈ S⊥. Let S be R(A), the range of a matrix A. Then b_S ∈ R(A) and b_{S⊥} ∈ N(A^T).
We will therefore denote b_S by b_R and b_{S⊥} by b_N, meaning that b_R is in the range of A and b_N is
in the null space of A^T.

b_R and b_N are called the orthogonal projection of b onto R(A) and the orthogonal projection
of b onto N(A^T), respectively. From the above, we easily see that

b_R^T b_N = 0.
Example 1.3.9

A = ( 0    0
      1/2  1
      1/2  1 ),     b = ( 1
                          1
                          1 ).

An orthonormal basis for R(A) is

V = (   0
      −1/√2
      −1/√2 ),

so

P_A = V V^T = ( 0    0    0
                0   1/2  1/2
                0   1/2  1/2 ),

b_R = P_A b = ( 0
                1
                1 ),

P_N = I − P_A = ( 1    0     0
                  0   1/2  −1/2
                  0  −1/2   1/2 ),

b_N = P_N b = ( 1
                0
                0 ).

Note that b = b_R + b_N.
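A small MATLAB sketch (not from the book) reproducing this kind of computation; the matrix below is any matrix whose range is span{(0, 1, 1)^T}, and orth is used to obtain an orthonormal basis:

    % Orthogonal projections onto R(A) and N(A').
    A = [0 0; 1/2 1; 1/2 1];
    b = [1; 1; 1];

    V  = orth(A);                  % orthonormal basis for the range of A
    PA = V*V';                     % projection onto R(A)
    PN = eye(3) - PA;              % projection onto N(A')

    bR = PA*b;                     % component of b in R(A)
    bN = PN*b;                     % component of b in N(A')
    disp([bR bN bR+bN])            % the last column equals b, and bR'*bN is ~0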
1.4 Some Special Matrices

2. Triangular Matrix – A square matrix A = (a_ij) is an upper triangular matrix if a_ij = 0 for
i > j.

The transpose of an upper triangular matrix is lower triangular; that is, A = (a_ij) is lower
triangular if a_ij = 0 for i < j. (Schematically, a lower triangular matrix has zeros above the
diagonal, and an upper triangular matrix has zeros below the diagonal.)
The following two important properties of orthogonal matrices make them so attractive
for numerical computation:

1. The inverse of an orthogonal matrix is just its transpose: O^{-1} = O^T.
2. The product of two orthogonal matrices is an orthogonal matrix.

A matrix whose rows are e_1, e_2, ..., e_n in some order, where e_i is the ith row of the n × n identity
matrix I, is a permutation matrix. Similarly, a matrix whose columns are the columns of I in some
order is a permutation matrix. For example,

( 1  0  0
  0  0  1
  0  1  0 )

is a permutation matrix.
Effects of Pre-multiplication and Post-multiplication by a Permutation Matrix

If

P_1 = ( e_{i_1}
        e_{i_2}
          ⋮
        e_{i_n} ),   where e_i is the ith row of I, then

P_1 A = ( i_1th row of A
          i_2th row of A
            ⋮
          i_nth row of A ).

Similarly, if P_2 = (e_{j_1}, e_{j_2}, ..., e_{j_n}), where e_j is the jth column of I, then

A P_2 = ( j_1th column of A,  j_2th column of A,  ...,  j_nth column of A ).
Example 1.4.2

1. Let

A = ( a_11  a_12  a_13
      a_21  a_22  a_23
      a_31  a_32  a_33 ),     P_1 = ( 0  1  0
                                      0  0  1
                                      1  0  0 ) = ( e_2
                                                    e_3
                                                    e_1 ).

Then

P_1 A = ( a_21  a_22  a_23        ( 2nd row of A
          a_31  a_32  a_33    =     3rd row of A
          a_11  a_12  a_13 )        1st row of A ).

2. Let

P_2 = ( 0  1  0
        0  0  1
        1  0  0 ) = (e_3, e_1, e_2).

Then

A P_2 = ( a_13  a_11  a_12
          a_23  a_21  a_22    =  (3rd column of A, 1st column of A, 2nd column of A).
          a_33  a_31  a_32 )
An upper Hessenberg matrix A = (a_ij) is unreduced if

a_{i,i−1} ≠ 0   for i = 2, 3, ..., n.

Example 1.4.3

A = ( 1  2  0
      2  3  4
      1  1  1 )   is an unreduced lower Hessenberg matrix.

A = ( 1  1  1
      1  1  1
      0  2  3 )   is an unreduced upper Hessenberg matrix.
Some Useful Properties

1. Every square matrix A can be transformed to an upper (or lower) Hessenberg matrix by
means of a unitary similarity; that is, given a complex matrix A, there exists a unitary
matrix U such that

U*AU = H,

where H is a Hessenberg matrix.

Proof. (A constructive proof in the case where A is real is given in Chapter 5.)

2. If A is symmetric (or complex Hermitian), then the transformed Hessenberg matrix obtained
in 1 is tridiagonal.

3. An arbitrary Hessenberg matrix can always be partitioned into diagonal blocks such that each
diagonal block is an unreduced Hessenberg matrix.
Example 1.4.4

A = ( 1  2  3  4
      2  1  1  1
      0  0  1  1
      0  0  1  1 ).

Partition

A = ( A_1  A_2
       0   A_3 ).

Note that

A_1 = ( 1  2        and    A_3 = ( 1  1
        2  1 )                     1  1 )

are unreduced Hessenberg matrices.
Companion Matrix – A normalized upper Hessenberg matrix of the form

C = ( 0  0  ...  0  a_1
      1  0  ...  0  a_2
      0  1  ...  0  a_3
      ⋮       ⋱      ⋮
      0  0  ...  1  a_n )

is called an upper companion matrix. The transpose of an upper companion matrix is a lower
companion matrix.

The characteristic polynomial of a companion matrix can easily be written down:

det(C − λI) = det(C^T − λI)
            = (−1)^n (λ^n − a_n λ^{n−1} − a_{n−1} λ^{n−2} − ... − a_2 λ − a_1).
A = ( 1  2
      0  3 )

is an upper Hessenberg matrix with (2, 1) entry equal to zero, but A is nonderogatory.

Pick b = ( 1
           2 ).

Then (b, Ab) = ( 1  5
                 2  6 )   is nonsingular.
A matrix that is not nonderogatory is called derogatory. A derogatory matrix is similar to a
direct sum of companion matrices,

( C_1
        C_2
              ⋱
                   C_k ),

where each C_i is a companion matrix, k > 1, and the characteristic polynomial of each C_i divides
the characteristic polynomials of all the preceding C_i's. The above form is also known as the Frobenius
Canonical Form.
7. Diagonally Dominant Matrix – A matrix A = (a_ij) is row diagonally dominant if

|a_ii| > Σ_{j≠i} |a_ij|   for all i.

A column diagonally dominant matrix is similarly defined. The matrix

A = ( 10   1   1
       1  10   1
       1   1  10 )

is both row and column diagonally dominant.

Note: Sometimes in the linear algebra literature, a matrix A having the above property
is called a strictly diagonally dominant matrix.
8. Positive Definite Matrix – A symmetric matrix A is positive definite if, for every nonzero
vector x,

x^T A x > 0.

Let x = (x_1, x_2, ..., x_n)^T. Then x^T A x = Σ_{i,j=1}^{n} a_ij x_i x_j is called the quadratic form associated
with A.
Example 1.4.6

A = ( 2  1
      1  5 ),     x = ( x_1
                        x_2 ).

x^T A x = (x_1, x_2) A (x_1, x_2)^T
        = 2x_1² + 2x_1x_2 + 5x_2²
        = 2(x_1² + x_1x_2 + (1/4)x_2²) + (9/2)x_2²
        = 2(x_1 + (1/2)x_2)² + (9/2)x_2² > 0   for x ≠ 0.

A positive semidefinite matrix is similarly defined: a symmetric matrix A is positive semidefinite
if x^T A x ≥ 0 for every x.

A commonly used notation for a symmetric positive definite (positive semidefinite)
matrix A is A > 0 (A ≥ 0).
Some Characterizations and Properties of Positive Definite Matrices

Here are some useful characterizations of positive definite matrices:

1. A matrix A is positive definite if and only if all its eigenvalues are positive. Note that in the
above example the eigenvalues are 1.6972 and 5.3028.

2. A matrix A is positive definite if and only if all its leading principal minors are positive.
There are n leading principal minors of an n × n matrix A. The ith leading principal minor,
denoted by

det A( 1 2 ... i
       1 2 ... i ),

is the determinant of the submatrix of A formed out of the first i rows and i columns.

Example: A = ( 10   1   1
                1  10   1
                1   1  10 ).

The first leading principal minor = 10;
the second leading principal minor = det( 10  1
                                           1 10 ) = 99;
the third leading principal minor = det A = 972.
3. A symmetric diagonally dominant matrix with positive diagonal entries is positive definite. Note that the matrix A in the
example above is diagonally dominant.

4. If A = (a_ij) is positive definite, then a_ii > 0 for all i.

5. If A = (a_ij) is positive definite, then the largest element (in magnitude) of the whole matrix
must lie on the diagonal.

6. The sum of two positive definite matrices is positive definite.

Remarks: Note that (4) and (5) are only necessary conditions for a symmetric matrix to be
positive definite. They can serve only as initial tests for positive definiteness. For example, the
matrices

A = ( 4  1  1  1
      1  0  1  2
      1  1  2  3
      1  2  3  4 ),     B = ( 20  12  25
                              12  15   2
                              25   2   5 )

cannot be positive definite, since in the matrix A there is a zero entry on the diagonal, and in B
the largest entry, 25, is not on the diagonal. A small MATLAB sketch checking these conditions is
given below.
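The following sketch (not from the book) checks positive definiteness in three equivalent ways: eigenvalues, leading principal minors, and the existence of a Cholesky factorization; the matrices are those of the examples above:

    % Checking positive definiteness.
    A = [10 1 1; 1 10 1; 1 1 10];
    B = [20 12 25; 12 15 2; 25 2 5];

    disp(eig(A)')                                    % all eigenvalues positive
    lpm = arrayfun(@(i) det(A(1:i,1:i)), 1:3);
    disp(lpm)                                        % leading principal minors: 10, 99, 972

    [~, pA] = chol(A);  [~, pB] = chol(B);
    disp([pA pB])      % pA = 0: A is positive definite; pB > 0: B is not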
1.5 The Cayley-Hamilton Theorem

The Cayley-Hamilton Theorem states that every square matrix satisfies its own characteristic
polynomial: if P(λ) = det(λI − A), then P(A) = 0 (the zero matrix).

Example 1.5.1

Let A = ( 0  1
          1  2 ).

P(λ) = λ² − 2λ − 1.

P(A) = A² − 2A − I = ( 1  2       − 2 ( 0  1       − ( 1  0       = ( 0  0
                       2  5 )           1  2 )         0  1 )         0  0 ).
1.6 Singular Values

Let A be m × n. Then the eigenvalues of the n × n Hermitian matrix A*A are real and nonnegative.
Let these eigenvalues be denoted by σ_i², where σ_1² ≥ σ_2² ≥ ... ≥ σ_n². Then σ_1, σ_2, ..., σ_n are called
the singular values of A. Every m × n matrix A can be decomposed into

A = U Σ V^T,

where U (m × m) and V (n × n) are unitary and Σ is an m × n "diagonal" matrix. This decomposition is
called the Singular Value Decomposition or SVD. The singular values σ_i, i = 1, ..., n, are the
diagonal entries of Σ. The number of nonzero singular values is equal to the rank of the
matrix A. The singular values of A are the nonnegative square roots of the eigenvalues
of A^T A (see Chapter 10, Section 10.3).

Example 1.6.1

Let A = ( 1  2
          0  2 ).

A^T A = ( 1  2
          2  8 ).

The eigenvalues of A^T A are (9 ± √65)/2. Thus

σ_1 = √((9 + √65)/2),
σ_2 = √((9 − √65)/2).
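A quick MATLAB check (not from the book) of the singular values of Example 1.6.1, using the matrix as reconstructed above:

    % Verifying the singular values of Example 1.6.1.
    A = [1 2; 0 2];

    s = svd(A);                                     % computed singular values
    exact = sqrt([(9+sqrt(65))/2; (9-sqrt(65))/2]);

    disp([s exact])                                 % the columns agree to machine precision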
1.7 Vector and Matrix Norms

1.7.1 Vector Norms

Let x = (x_1, x_2, ..., x_n)^T be an n-vector and let V be a vector space. A vector norm, denoted by
the symbol ||x||, is a real-valued continuous function of the components x_1, x_2, ..., x_n of x, defined
on V, that has the following properties:

1. ||x|| > 0 for every nonzero x; ||x|| = 0 if and only if x is the zero vector.
2. ||αx|| = |α| ||x|| for all x in V and for all scalars α.
3. ||x + y|| ≤ ||x|| + ||y|| for all x and y in V.

Property (3) is known as the Triangle Inequality.

Note:

||−x|| = ||x||,
| ||x|| − ||y|| | ≤ ||x − y||.

It is simple to verify that the following are vector norms:

||x||_1 = |x_1| + |x_2| + ... + |x_n|   (the 1-norm),
||x||_2 = (|x_1|² + |x_2|² + ... + |x_n|²)^{1/2}   (the 2-norm, or Euclidean norm),
||x||_∞ = max_i |x_i|   (the infinity norm).

In general, if p is a real number greater than or equal to 1, the p-norm or Hölder norm is
defined by

||x||_p = (|x_1|^p + ... + |x_n|^p)^{1/p}.
Example 1.7.1

Let x = (1, 1, −2)^T. Then

||x||_1 = 4,
||x||_2 = √(1² + 1² + (−2)²) = √6,
||x||_∞ = 2.
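These values can be reproduced directly with MATLAB's built-in norm function (a one-line check, not from the book):

    % Vector norms of Example 1.7.1.
    x = [1; 1; -2];
    disp([norm(x,1) norm(x,2) norm(x,inf)])    % 4, sqrt(6) ~ 2.4495, 2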
An important property of the Hölder norm is the Hölder inequality:

|x^T y| ≤ ||x||_p ||y||_q,   where   1/p + 1/q = 1.

A special case of the Hölder inequality is the Cauchy-Schwarz inequality:

|x^T y| ≤ ||x||_2 ||y||_2;

that is,

| Σ_{j=1}^{n} x_j y_j | ≤ ( Σ_{j=1}^{n} x_j² )^{1/2} ( Σ_{j=1}^{n} y_j² )^{1/2}.
Equivalence Property of Vector Norms

All vector norms are equivalent in the sense that, for any two vector norms ||·|| and ||·||', there exist
positive constants α and β such that

α ||x||' ≤ ||x|| ≤ β ||x||'

for all x. For the 2-, 1-, and ∞-norms, the constants α and β can be computed easily; for example,

||x||_2 ≤ ||x||_1 ≤ √n ||x||_2.
1.7.2 Matrix Norms

Recall that the subordinate matrix norm ||A||_1 equals the maximum absolute column sum of A, and
||A||_∞ equals the maximum absolute row sum of A.

Example 1.7.2

A = (  1  −2
       3   4
      −5   6 ),

||A||_1 = 12,    ||A||_∞ = 11.

Another useful p-norm is the spectral norm:

Definition 1.7.1

||A||_2 = √(maximum eigenvalue of A^T A).
The Frobenius Norm

An important matrix norm compatible with the vector norm ||x||_2 is the Frobenius norm:

||A||_F = [ Σ_{j=1}^{n} Σ_{i=1}^{m} |a_ij|² ]^{1/2}.

A matrix norm ||·||_M and a vector norm ||·||_v are compatible if

||Ax||_v ≤ ||A||_M ||x||_v.

Example 1.7.4

A = ( 1  2
      3  4 ),     ||A||_F = √30.

Notes:

1. For the identity matrix I,

||I||_F = √n,

whereas ||I||_1 = ||I||_2 = ||I||_∞ = 1.

2. ||A||_F² = trace(A^T A), where trace(A) is defined as the sum of the diagonal
entries of A; that is, if A = (a_ij), then trace(A) = a_11 + a_22 + ... + a_nn.

Some useful relations among the norms of an m × n matrix A are:

(1) (1/√n) ||A||_∞ ≤ ||A||_2 ≤ √m ||A||_∞.
(2) ||A||_2 ≤ ||A||_F ≤ √n ||A||_2.
(3) (1/√m) ||A||_1 ≤ ||A||_2 ≤ √n ||A||_1.
(4) ||A||_2 ≤ √(||A||_1 ||A||_∞).
We prove here inequalities (1) and (2) and leave the rest as exercises.

Proof of (1). By definition,

||A||_∞ = max_{x≠0} ||Ax||_∞ / ||x||_∞.

From the equivalence property of the vector norms, we have

||Ax||_∞ ≤ ||Ax||_2   and   ||x||_2 ≤ √n ||x||_∞,

so that 1/||x||_∞ ≤ √n / ||x||_2. Therefore

max_{x≠0} ||Ax||_∞ / ||x||_∞ ≤ √n max_{x≠0} ||Ax||_2 / ||x||_2 = √n ||A||_2,

i.e.,

(1/√n) ||A||_∞ ≤ ||A||_2.

The first part is proved. To prove the second part, we again use the definition of ||A||_2 and the
appropriate equivalence properties of the vector norms:

||A||_2 = max_{x≠0} ||Ax||_2 / ||x||_2,
||Ax||_2 ≤ √m ||Ax||_∞,
||x||_∞ ≤ ||x||_2.

Thus

||Ax||_2 / ||x||_2 ≤ √m ||Ax||_∞ / ||x||_∞,

and taking the maximum over x ≠ 0 gives ||A||_2 ≤ √m ||A||_∞.

We prove (2) using a different technique. Recall that

||A||_F² = trace(A^T A).

Thus, if d_1, ..., d_n are the eigenvalues of A^T A and d_k is the largest,

||A||_F² = trace(A^T A) = d_1 + ... + d_n ≥ d_k = ||A||_2².

That is, ||A||_2 ≤ ||A||_F. Also, ||A||_F² = d_1 + ... + d_n ≤ n d_k = n ||A||_2², so ||A||_F ≤ √n ||A||_2.
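A short MATLAB check (not from the book) of the relations (1)-(4) listed above, using the matrix of Example 1.7.2; each displayed comparison evaluates to 1 (true):

    % Checking the norm relations (1)-(4) on a sample matrix.
    A = [1 -2; 3 4; -5 6];
    [m, n] = size(A);

    n1 = norm(A,1); n2 = norm(A,2); ninf = norm(A,inf); nF = norm(A,'fro');

    disp([ninf/sqrt(n) <= n2,  n2 <= sqrt(m)*ninf])    % relation (1)
    disp([n2 <= nF,            nF <= sqrt(n)*n2])      % relation (2)
    disp([n1/sqrt(m) <= n2,    n2 <= sqrt(n)*n1])      % relation (3)
    disp(n2 <= sqrt(n1*ninf))                          % relation (4)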
We now state, without proof, necessary and sufficient conditions for the convergence of vector
and matrix sequences. The proofs can be easily worked out.
Theorem 1.7.2 The sequence v^(1), v^(2), ... converges to v if and only if, for any vector norm,

lim_{k→∞} ||v^(k) − v|| = 0.
We now state and prove a result on the convergence of the sequence of powers of a matrix to
the zero matrix.

Theorem 1.7.4 The sequence A, A², A³, ... of the powers of the matrix A converges to the zero
matrix if and only if |λ_i| < 1 for each eigenvalue λ_i of A.

Proof (sketch). Let A = X J X^{-1} be the Jordan canonical form of A, where J = diag(J_1, ..., J_p)
and J_i is the Jordan block associated with the eigenvalue λ_i. Then A^k = X J^k X^{-1} and
J^k = diag(J_1^k, ..., J_p^k), where

J_i^k = ( λ_i^k   k λ_i^{k−1}    ⋯
            0        λ_i^k       ⋱
            ⋮                    ⋱   k λ_i^{k−1}
            0         0          ⋯      λ_i^k  ),

from which we see that J_i^k → 0 if and only if |λ_i| < 1. This means that lim_{k→∞} A^k = 0 if and
only if |λ_i| < 1 for each i.

Definition 1.7.2 A matrix A is called a convergent matrix if A^k → 0 as k → ∞.
We now prove a sufficient condition, in terms of a norm of A, for a matrix A to be a convergent
matrix. We first prove the following result.

A Relationship Between Norms and Eigenvalues

Theorem 1.7.5 Let λ be an eigenvalue of a matrix A. Then for any subordinate matrix norm,

|λ| ≤ ||A||.

Proof. By definition, there exists a nonzero vector x such that

Ax = λx.

Taking the norm of each side, we have

||Ax|| = ||λx|| = |λ| ||x||.

However, ||Ax|| ≤ ||A|| ||x||, so |λ| ||x|| = ||Ax|| ≤ ||A|| ||x||, giving |λ| ≤ ||A||.

Definition 1.7.3 The quantity ρ(A) defined by

ρ(A) = max_i |λ_i|

is called the spectral radius of A. In particular,

ρ(A) ≤ ||A||.
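A small MATLAB sketch (not from the book) illustrating Theorem 1.7.5 and the convergence of powers; the test matrix is arbitrary with spectral radius less than 1:

    % Spectral radius, norms, and convergence of A^k.
    A = [0.5 0.4; 0.1 0.3];

    rho = max(abs(eig(A)));                        % spectral radius
    disp([rho norm(A,1) norm(A,2) norm(A,inf)])    % rho(A) does not exceed any of the norms

    disp(norm(A^50))                               % rho(A) < 1, so the powers of A approach zero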
Convergence of an Infinite Matrix Series

Theorem 1.7.6 The matrix series

I + A + A² + A³ + ⋯

converges if and only if A is a convergent matrix, and in that case its sum is (I − A)^{-1}.
Theorem 1.7.7 Let ||·|| be a subordinate matrix norm and let ||E|| < 1. Then I − E is nonsingular
and

||(I − E)^{-1}|| ≤ 1 / (1 − ||E||).

Proof. Since ||E|| < 1, we have |λ_i| < 1 for each eigenvalue λ_i of E. Thus, none of the quantities
1 − λ_1, 1 − λ_2, ..., 1 − λ_n is zero. This proves that I − E is nonsingular. (Note that a matrix is
nonsingular if and only if all its eigenvalues are nonzero.)

To prove the second part, we write

(I − E)^{-1} = I + E + E² + ⋯.

Since ||E|| < 1, lim_{k→∞} E^k = 0. Thus, the series on the right-hand side is convergent. Taking
norms on both sides, we have

||(I − E)^{-1}|| ≤ ||I|| + ||E|| + ||E||² + ⋯ = (1 − ||E||)^{-1}   (since ||I|| = 1).
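A quick MATLAB check (not from the book) of the bound in Theorem 1.7.7; E is an arbitrary matrix with ||E|| < 1 in the infinity norm:

    % Neumann series bound: ||inv(I-E)|| <= 1/(1 - ||E||) when ||E|| < 1.
    E = [0.2 -0.1; 0.3 0.4];
    nE = norm(E, inf);                       % a subordinate norm with ||E|| < 1

    lhs = norm(inv(eye(2) - E), inf);
    rhs = 1/(1 - nE);
    disp([nE lhs rhs])                       % lhs does not exceed rhs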
Theorem 1.7.9  Let A be nonsingular and let ||A^{-1}E|| < 1. Then A - E is nonsingular, and

||A^{-1} - (A - E)^{-1}|| / ||A^{-1}||  <=  ||A^{-1}E|| / (1 - ||A^{-1}E||).

Proof. Since ||A^{-1}E|| < 1, from Theorem 1.7.7 we have that I - A^{-1}E is nonsingular. Thus A - E = A(I - A^{-1}E), being the product of two nonsingular matrices, is also nonsingular.

Now, for any nonsingular matrix B, since

B = A - (A - B) = A[ I - A^{-1}(A - B) ],

we have

B^{-1} = [ I - A^{-1}(A - B) ]^{-1} A^{-1}.

(Note that (XY)^{-1} = Y^{-1} X^{-1}.) If we now substitute B = A - E, we then have

(A - E)^{-1} = [ I - A^{-1}E ]^{-1} A^{-1}.

Taking norms, we get

||(A - E)^{-1}|| <= ||A^{-1}|| || (I - A^{-1}E)^{-1} ||.

But from Theorem 1.7.7, we know that

|| (I - A^{-1}E)^{-1} || <= (1 - ||A^{-1}E||)^{-1}.

So we have

||(A - E)^{-1}|| <= ||A^{-1}|| / (1 - ||A^{-1}E||),

or ||(A - E)^{-1}|| / ||A^{-1}|| <= 1 / (1 - ||A^{-1}E||).
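Theorem 1.7.9 is also easy to test numerically. The sketch below uses the 2-norm and randomly generated A and E (the matrices and the scaling 1e-3 are arbitrary choices for illustration):

import numpy as np

rng = np.random.default_rng(0)
A = np.eye(4) + 0.1 * rng.standard_normal((4, 4))
E = 1e-3 * rng.standard_normal((4, 4))

Ainv = np.linalg.inv(A)
r = np.linalg.norm(Ainv @ E, 2)          # ||A^{-1}E||, well below 1 here
lhs = np.linalg.norm(Ainv - np.linalg.inv(A - E), 2) / np.linalg.norm(Ainv, 2)
print(lhs <= r / (1.0 - r))              # the bound of Theorem 1.7.9 holds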
Proof. By definition,

||O||_2 = sqrt( rho(O^T O) ) = sqrt( rho(I) ) = 1.
Theorem 1.8.2  Let O be an orthogonal matrix. Then

||AO||_2 = ||A||_2.

Proof.

||AO||_2 = sqrt( rho(O^T A^T A O) ) = sqrt( rho(A^T A) ) = ||A||_2.
(Note that the spectral radius is invariant under similarity transformation.) (See Chap-
ter 8.)
Theorem 1.8.3

||AO||_F = ||A||_F.

Proof. ||AO||_F^2 = trace(O^T A^T A O) = trace(A^T A) = ||A||_F^2.
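These invariance properties can be verified numerically. The sketch below builds an orthogonal matrix Q from the QR factorization of a random matrix (purely an illustration):

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))     # Q is orthogonal

print(np.linalg.norm(A @ Q, 2), np.linalg.norm(A, 2))        # equal to roundoff
print(np.linalg.norm(A @ Q, 'fro'), np.linalg.norm(A, 'fro'))# equal to roundoff
print(np.linalg.norm(Q, 2))                                  # 1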
(i) The sequence {A^k} converges to the zero matrix if and only if |lambda_i| < 1 for each eigenvalue lambda_i of A (Theorem 1.7.4).

(ii) The sequence {A^k} converges to the zero matrix if ||A|| < 1 (Corollary to Theorem 1.7.5).
4. Norms and Inverses. If a nonsingular matrix A is perturbed by a matrix E , it is sometimes
of interest to know if the perturbed matrix A + E remains nonsingular and how to estimate
the error in the inverse of A + E .
Three theorems (Theorems 1.7.7, 1.7.8, and 1.7.9) are proved in this context in Section 1.7.4.
These results will play an important role in the perturbation analysis of linear systems (Chap-
ter 6).
Exercises on Chapter 1
PROBLEMS ON SECTIONS 1.2 AND 1.3
1. Prove that
(a) a set of n linearly independent vectors in Rn is a basis for Rn.
(b) the set {e_1, e_2, ..., e_n} is a basis of R^n.
(c) a set of m vectors in R^n, where m > n, is linearly dependent.
(d) any two bases in a vector space V have the same number of vectors.
(e) dim(R^n) = n.
(f) span{v_1, ..., v_n} is a subspace of V, where span{v_1, ..., v_n} is the set of linear combinations of the n vectors v_1, ..., v_n from a vector space V.
(g) span{v_1, ..., v_n} is the smallest subspace of V containing v_1, ..., v_n.

2. Prove that if S = {s_1, ..., s_k} is an orthogonal set of nonzero vectors, then S is linearly independent.
3. Let S be an m-dimensional subspace of Rn. Then prove that S has an orthonormal basis.
(Hint: Let S = span{v_1, ..., v_m}.) Define a set of vectors {u_k} by:

u_1 = v_1 / ||v_1||,
u_{k+1} = v'_{k+1} / ||v'_{k+1}||,

where

v'_{k+1} = v_{k+1} - (v_{k+1}^T u_1) u_1 - (v_{k+1}^T u_2) u_2 - ... - (v_{k+1}^T u_k) u_k,   k = 1, 2, ..., m-1.

Then show that {u_1, ..., u_m} is an orthonormal basis of S. This is the classical Gram-Schmidt process.
4. Using the Gram-Schmidt process construct an orthonormal basis of R3 .
5. Construct an orthonormal basis of R(A), where

A = ( 1  2
      2  3
      4  5 ).
6. Let S1 and S2 be two subspaces of Rn . Then prove that
dim(S1 + S2 ) = dim(S1) + dim(S2) ; dim(S1 \ S2):
12. Let

A = ( A_1  0
      A_2  A_3 ),

where A_1 and A_3 are square. Prove that det(A) = det(A_1) det(A_3).

13. Let A be a triangular (or diagonal) matrix with diagonal entries d_11, d_22, ..., d_nn. Prove that the leading principal minors (determinants) of A are d_11, d_11 d_22, ..., d_11 d_22 ... d_nn.
14. (a) Show that if P_S is an orthogonal projection onto S, then I - P_S is the orthogonal projection onto S-perp.
(b) Prove that
   i. P_A = A(A^T A)^{-1} A^T.
   ii. P_N = I - A(A^T A)^{-1} A^T.
   iii. ||P_A||_2 = 1.
(c) Prove that
   i. b_R = P_A b
   ii. b_N = P_N b
15. (a) Find P_A and P_N for the matrices

A = ( 1  2          A = ( 1      1
      2  3                10^-4  0
      0  0 ),             0      10^-4 ).

(b) For the vector b = (1, 0, 1)^T, find b_R and b_N for each of the above matrices.

(c) Find an orthonormal basis for each of the above matrices using the Gram-Schmidt process and then find P_A, P_N, b_R, and b_N. For a description of the Gram-Schmidt algorithm, see Chapter 7 or problem #3 of this chapter.
16. Let A be an m x n matrix with rank r. Consider the singular value decomposition of A:

A = U Sigma V^T = (U_r, U_r-hat) Sigma (V_r, V_r-hat)^T.

Then prove that
(a) V_r V_r^T is the orthogonal projection onto N(A)-perp = R(A^T).
(b) U_r U_r^T is the orthogonal projection onto R(A).
(c) U_r-hat (U_r-hat)^T is the orthogonal projection onto R(A)-perp = N(A^T).
(d) V_r-hat (V_r-hat)^T is the orthogonal projection onto N(A).
17. (Distance between two subspaces.) Let S_1 and S_2 be two subspaces of R^n such that dim(S_1) = dim(S_2). Let P_1 and P_2 be the orthogonal projections onto S_1 and S_2, respectively. Then ||P_1 - P_2||_2 is defined to be the distance between S_1 and S_2. Prove that the distance between S_1 and S_2 is dist(S_1, S_2) = sin(theta), where theta is the angle between S_1 and S_2.

18. Prove that if P_S is an orthogonal projection onto S, then I - 2P_S is an orthogonal matrix.
22. A square matrix A = (a_ij) is a band matrix of bandwidth 2k + 1 if |i - j| > k implies that a_ij = 0. What are the bandwidths of tridiagonal and pentadiagonal matrices? Is the product of two banded matrices having the same bandwidth a banded matrix of the same bandwidth? Give reasons for your answer.
23. (a) Show that the matrix

H = I - 2 (u u^T) / (u^T u),

where u is a nonzero vector, is orthogonal and symmetric.
Let C be the companion matrix

C = ( a_1  a_2  ...  a_{n-1}  a_n
      1    0    ...  0        0
      0    1    ...  0        0
      ...            ...
      0    0    ...  1        0 ),

with distinct eigenvalues lambda_1, ..., lambda_n. Then show that

(a) the matrix V defined by

V = ( lambda_1^{n-1}  lambda_2^{n-1}  ...  lambda_n^{n-1}
      lambda_1^{n-2}  lambda_2^{n-2}  ...  lambda_n^{n-2}
      ...                                  ...
      lambda_1        lambda_2        ...  lambda_n
      1               1               ...  1 ),        where lambda_i != lambda_j for i != j,

is such that V^{-1} C V = diag(lambda_i), i = 1, ..., n;

(b) the eigenvector x_i corresponding to the eigenvalue lambda_i of C is given by x_i^T = (lambda_i^{n-1}, lambda_i^{n-2}, ..., lambda_i, 1).
33. Let H be an unreduced upper Hessenberg matrix. Let X = (x_1, ..., x_n) be defined by

x_1 = e_1 = (1, 0, ..., 0)^T,   x_{i+1} = H x_i,   i = 1, 2, ..., n-1.

Then prove that X is nonsingular and X^{-1} H X is a companion matrix (an upper Hessenberg companion matrix).
34. What are the singular values of a symmetric matrix? What are the singular values of a
symmetric positive denite matrix? Prove that a square matrix A is nonsingular i it has no
zero singular value.
35. Prove that
(a) trace(AB) = trace(BA).
(b) trace(AA^T) = sum_{i=1}^{m} sum_{j=1}^{n} |a_ij|^2, where A = (a_ij) is m x n.
(c) trace(A + B) = trace(A) + trace(B).
(d) trace(TAT^{-1}) = trace(A).
40. Prove that for any vector x, we have

||x||_inf <= ||x||_2 <= ||x||_1.
52. Let A = (a_1, ..., a_n), where a_j is the jth column of A. Then prove that

||A||_F^2 = sum_{i=1}^{n} ||a_i||_2^2.
Show that the matrices

A = ( 1  2  3          A = ( 0  1  0
      0  5  4                0  0  1
      0  0  1 ),             1  2  3 )

are not convergent matrices.
56. Construct a simple example where the norm test for convergent matrices fails, but still the
matrix is convergent.
57. Prove that the series I + A + A^2 + ... converges if ||B|| < 1, where B = PAP^{-1}. What is the implication of this result? Construct a simple example to see the usefulness of the result in practical computations. (For details, see Wilkinson AEP, p. 60.)
2. FLOATING POINT NUMBERS AND ERRORS IN COMPUTATIONS
2.1 Floating Point Number Systems : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 53
2.2 Rounding Errors : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 56
2.3 Laws of Floating Point Arithmetic : : : : : : : : : : : : : : : : : : : : : : : : : : : : 59
2.4 Addition of n Floating Point Numbers : : : : : : : : : : : : : : : : : : : : : : : : : : 63
2.5 Multiplication of n Floating Point Numbers : : : : : : : : : : : : : : : : : : : : : : : 65
2.6 Inner Product Computation : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 66
2.7 Error Bounds for Floating Point Matrix Computations : : : : : : : : : : : : : : : : : 69
2.8 Round-o Errors Due to Cancellation and Recursive Computations : : : : : : : : : : 71
2.9 Review and Summary : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 75
2.10 Suggestions for Further Reading : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 77
CHAPTER 2
FLOATING POINT NUMBERS
AND ERRORS IN COMPUTATIONS
2. FLOATING POINT NUMBERS AND ERRORS IN COMPU-
TATIONS
2.1 Floating Point Number Systems
Because of limited storage capacity, a real number may or may not be represented exactly on a computer. Thus, while using a computer, we have to deal with approximations of the real number system using finite computer representations. This chapter will be confined to the study of the arithmetic of such approximate numbers. In particular, we will examine the widely accepted IEEE standard for binary floating-point arithmetic (IEEE 1985).

A nonzero normalized floating point number in base 2 has the form

x = (-1)^s (d_1 . d_2 d_3 ... d_t) 2^e,   that is,   x = +- r 2^e,

where e is the exponent, r is the significand, d_2 d_3 ... d_t is called the fraction, t is the precision, and (-1)^s is the sign. (Note that t is finite.) Here

d_1 = 1,   d_i = 0 or 1 for 2 <= i <= t.
Three parameters specify all numerical values that can be represented. These are: the precision,
t, and L and U , the minimum and maximum exponents. The numbers L and U vary among
computers, even those that adhere to the IEEE standard, since the standard recommends only
minimums. As an example, the standard recommends, for single precision, that t = 24; L = ;126,
and U = 127. The recommendation for double precision is t = 53; L = ;1022, and U = 1023.
Consider the example of a 32-bit word:

| s (1 bit) | e (8 bits) | f (23 bits) |

Here s is the sign of the number, e is the field for the exponent, and f is the fraction. Note that for normalized floating point numbers in base 2, it is known that d_1 = 1 and can thus be stored implicitly.
The actual storage of the exponent is accomplished by storing the true exponent plus an offset, or bias. The bias is chosen so that the stored exponent is always nonnegative. The IEEE standard also requires that the exponent have two reserved values, corresponding to L - 1 and U + 1. L - 1 is used to encode 0 and denormalized numbers (i.e., those for which d_1 != 1). U + 1 is used to encode infinity and nonnumbers, such as (+inf) + (-inf), which are denoted by NaN.

Note that for the single precision example given above, the bias is 127. Thus, if the biased exponent is 255, then infinity or a NaN is inferred. Likewise, if the biased exponent is 0, then 0 or a denormalized number is inferred. The standard specifies how to determine the different cases for these special situations. It is not important here to go into such detail. Curious readers should consult the reference (IEEE 1985).

From the discussion above, one sees that the IEEE standard for single precision provides approximately 7 decimal digits of accuracy, since 2^-23 is about 1.2 x 10^-7. Similarly, double precision provides approximately 16 decimal digits of accuracy (2^-52 is about 2.2 x 10^-16).
There is also an IEEE standard for floating point numbers which are not necessarily of base 2. By allowing one to choose the base beta, we see that the set of all floating point numbers, called the floating point number system, is characterized by four parameters:

beta  - the number base,
t     - the precision,
L, U  - lower and upper limits of the exponent.

This set has exactly

2(beta - 1) beta^{t-1} (U - L + 1) + 1

numbers in it.

We denote the set of normalized floating point numbers of precision t by F_t. The set F_t is NOT closed under arithmetic operations; that is, the sum, difference, product, or quotient of two floating point numbers in F_t is not necessarily a number in F_t. To see this, consider a simple example in the floating point system with beta = 10, t = 3, L = -1, U = 2:

a = 11.2 = .112 x 10^2,   b = 1.13 = .113 x 10^1.

Then a + b = 12.33 and a b = 12.656; neither can be represented exactly with t = 3 digits. Moreover, the result of a floating point operation can overflow (exponent larger than U) or underflow (exponent smaller than L).
Computing the Length of a Vector

Overflow and underflow can sometimes be avoided just by organizing the computations differently. Consider, for example, the task of computing the length of an n-vector x with components x_1, ..., x_n:

||x||_2^2 = x_1^2 + x_2^2 + ... + x_n^2.

If some x_i is too big or too small, then we can get overflow or underflow with the usual way of computing ||x||_2. However, if we normalize each component of the vector by dividing it by m = max(|x_1|, ..., |x_n|), and then form the squares and the sum, then overflow problems can be avoided. Thus, a better way to compute ||x||_2 is:

1. m = max(|x_1|, ..., |x_n|)
2. y_i = x_i / m,   i = 1, ..., n
3. ||x||_2 = m sqrt( y_1^2 + y_2^2 + ... + y_n^2 )
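A minimal sketch of this scaling idea in plain Python (in practice one would simply call a library routine such as numpy.linalg.norm, which applies similar safeguards):

def safe_norm2(x):
    # Compute ||x||_2 while avoiding overflow/underflow by scaling with
    # m = max |x_i|, exactly as in steps 1-3 above.
    m = max(abs(xi) for xi in x)
    if m == 0.0:
        return 0.0
    s = sum((xi / m) ** 2 for xi in x)
    return m * s ** 0.5

print(safe_norm2([3e200, 4e200]))   # 5e200; the naive sum of squares overflows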
2.2 Rounding Errors
If a given real number is not machine representable, there are two ways it can be approximated in the machine. Write the number as

x = +- (.d_1 d_2 ... d_t d_{t+1} ...) beta^e.

The first method, chopping, simply discards the digits from d_{t+1} on. The second method is rounding, in which the digits from d_{t+1} on are discarded and the digit d_t is rounded up or left unchanged according to whether d_{t+1} >= beta/2 or d_{t+1} < beta/2.

Let fl(x) denote the floating point representation of a real number x.

Example 2.2.1 (Rounding)

Consider base 10 and let x = 3.141596. Then

t = 2:  fl(x) = 3.1
t = 3:  fl(x) = 3.14
t = 4:  fl(x) = 3.142
We now give an expression to measure the error made in representing a real number x on the computer, and then show how this measure can be used to give bounds for errors in other floating point computations.

Definition 2.2.1  Let x-hat denote an approximation of x. Then there are two ways we can measure the error:

Absolute Error = |x-hat - x|,
Relative Error = |x-hat - x| / |x|,   x != 0.

Note that the relative error makes more sense than the absolute error. The following simple example shows this:
Example 2.2.2

Consider x_1 = 1.31, x_1-hat = 1.30 and x_2 = 0.12, x_2-hat = 0.11. In both cases the absolute error is 0.01, but the relative errors are about 0.0076 and 0.083, respectively. Thus, the relative errors show that x_1-hat is closer to x_1 than x_2-hat is to x_2, whereas the absolute errors give no indication of this at all.
The relative error gives an indication of the number of significant digits in an approximate answer. If the relative error is about 10^{-s}, then x and x-hat agree to about s significant digits. More specifically:

Definition 2.2.2  x-hat is said to approximate x to s significant digits if s is the largest non-negative integer for which the relative error satisfies

|x - x-hat| / |x| < 5 x 10^{-s}.

Thus, in the above examples, x_1-hat and x_1 agree to two significant digits, while x_2-hat and x_2 agree to about only one significant digit.
We now give an expression for the relative error in representing a real number x by its floating point representation fl(x).

Theorem 2.2.1  Let fl(x) denote the floating point representation of a real number x. Then

|fl(x) - x| / |x| <= mu,   where   mu = (1/2) beta^{1-t} for rounding,   mu = beta^{1-t} for chopping.     (2.2.1)

Proof. We establish the bound for rounding and leave the other part as an exercise. Let x be written as

x = +- (.d_1 d_2 ... d_t d_{t+1} ...) beta^e,

where d_1 != 0 and 0 <= d_i < beta. When we round x we obtain one of the two neighboring floating point numbers

x' = +- (.d_1 d_2 ... d_t) beta^e   and   x'' = x' +- beta^{e-t}.

Obviously x lies between x' and x''. Assume, without any loss of generality, that x is closer to x'. We then have

|x - x'| <= (1/2) |x' - x''| = (1/2) beta^{e-t}.

Since d_1 >= 1, we have |x| >= beta^{e-1}, and therefore the relative error satisfies

|x - x'| / |x| <= (1/2) beta^{e-t} / beta^{e-1} = (1/2) beta^{1-t}.
Example 2.2.3

Consider the three digit representation of the decimal number x = 0.2346 (beta = 10, t = 3). If rounding is used, we have

fl(x) = 0.235,   Relative Error = 0.001705 < (1/2) x 10^{-2}.

Similarly, if chopping is used, we have

fl(x) = 0.234,   Relative Error = 0.0025575 < 10^{-2}.
Definition: The number mu in (2.2.1) is called the machine precision or unit roundoff error. It is the smallest positive floating point number such that

fl(1 + mu) > 1.

mu is usually between 10^{-16} and 10^{-6} (on most machines), for double and single precision, respectively. For the IBM 360 and 370, beta = 16, t = 6, mu = 4.77 x 10^{-7}.

The above FORTRAN program computes an approximation of mu which differs from mu by at most a factor of 2. This approximation is quite acceptable, since an exact value of mu is not that important and is seldom needed.
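The FORTRAN program referred to above is not reproduced here; the following short Python sketch computes the same kind of approximation to mu by repeated halving (on IEEE double precision hardware it prints a value near 2.2 x 10^{-16}):

eps = 1.0
# Halve eps until 1 + eps can no longer be distinguished from 1 in floating
# point arithmetic; the last distinguishable value approximates the unit
# roundoff to within a factor of 2.
while 1.0 + eps / 2.0 > 1.0:
    eps = eps / 2.0
print(eps)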
The book by Forsythe, Malcolm and Moler (CMMC 1977) also contains an extensive list of L
and U for various computers.
can be written as

fl(x) = x(1 + delta),   where |delta| <= mu.

Assuming that the IEEE standard holds, we can easily derive the following simple laws of floating point arithmetic.

Theorem 2.3.1  Let x and y be two floating point numbers, and let fl(x + y), fl(x - y), fl(xy), and fl(x/y) denote the computed sum, difference, product and quotient. Then

1. fl(x +- y) = (x +- y)(1 + delta), where |delta| <= mu.
2. fl(xy) = (xy)(1 + delta), where |delta| <= mu.
3. If y != 0, then fl(x/y) = (x/y)(1 + delta), where |delta| <= mu.

On computers that do not use the IEEE standard, the following floating point law of addition might hold:

4. fl(x + y) = x(1 + delta_1) + y(1 + delta_2), where |delta_1| <= mu and |delta_2| <= mu.
fl(x + y) = .100 x 10^3.

Thus, fl(xy) = xy(1 + delta), where |delta| <= (1/2) x 10^{1-3}; in this particular case, in fact, delta = 0.
4. Let beta = 10, t = 4,

x = 0.1112,   y = .2245 x 10^5,   xy = .249644 x 10^4,   fl(xy) = .2496 x 10^4.

Thus |fl(xy) - xy| = .44 and

|delta| = 1.7625 x 10^{-4} < (1/2) x 10^{-3}.
Example 2.3.2  Addition with a guard digit (beta = 10, t = 3)

x = 0.101 x 10^2,   y = -0.994 x 10^1.

Aligning the exponents (keeping one guard digit), y becomes -0.0994 x 10^2, and

fl(x + y) = (0.101 - 0.0994) x 10^2 = 0.0016 x 10^2.

Step 3. Normalize:

fl(x + y) = 0.160 x 10^0.

Result: fl(x + y) = (x + y)(1 + delta) with delta = 0.

Example 2.3.3  Addition without a guard digit

x = 0.101 x 10^2,   y = -0.994 x 10^1.

Aligning the exponents without a guard digit, y becomes -0.099[4] x 10^2 (the digit 4 is lost), so fl(x + y) = 0.002 x 10^2 = 0.200 x 10^0, whereas the exact sum is 0.16.
Thus, we repeat that for computers with a guard digit,

fl(x +- y) = (x +- y)(1 + delta),   |delta| <= mu.

However, for those without a guard digit,

fl(x +- y) = x(1 + delta_1) +- y(1 + delta_2),   |delta_1| <= mu,  |delta_2| <= mu.
A FINAL REMARK: Throughout this book, we will assume that the computations have been performed with a guard digit, as they are on almost all available machines.

We shall call results 1 through 3 of Theorem 2.3.1, along with (2.2.1), the fundamental laws of floating point arithmetic. These fundamental laws form the basis for establishing bounds for floating point computations.

For example, consider the floating point computation of x(y + z):

fl(x(y + z)) = [x fl(y + z)](1 + delta_1)
             = x(y + z)(1 + delta_2)(1 + delta_1)
             = x(y + z)(1 + delta_1 delta_2 + delta_1 + delta_2)
             is approximately x(y + z)(1 + delta_3),

where delta_3 = delta_1 + delta_2; since delta_1 and delta_2 are small, their product is neglected.

We can now easily establish a bound for delta_3. Suppose beta = 10, and that rounding is used. Then

|delta_3| = |delta_1 + delta_2| <= |delta_1| + |delta_2| <= (1/2) 10^{1-t} + (1/2) 10^{1-t} = 10^{1-t}.

Thus, the relative error due to round-off in computing fl(x(y + z)) is about 10^{1-t} in the worst case.
2.4 Addition of n Floating Point Numbers

Consider adding n floating point numbers x_1, x_2, ..., x_n with rounding. Define s_2 = fl(x_1 + x_2). Then

s_2 = fl(x_1 + x_2) = (x_1 + x_2)(1 + delta_2),

where |delta_2| <= (1/2) 10^{1-t}; that is, s_2 - (x_1 + x_2) = delta_2 (x_1 + x_2). Define s_3, s_4, ..., s_n recursively by

s_{i+1} = fl(s_i + x_{i+1}),   i = 2, 3, ..., n-1.

Then

s_3 - (x_1 + x_2 + x_3) is approximately (x_1 + x_2) delta_2 + (x_1 + x_2 + x_3) delta_3

(neglecting the term delta_2 delta_3, which is small, and so on). Thus, by induction we can show that

s_n - (x_1 + x_2 + ... + x_n)
   is approximately (x_1 + x_2) delta_2 + (x_1 + x_2 + x_3) delta_3 + ... + (x_1 + x_2 + ... + x_n) delta_n
   = x_1(delta_2 + ... + delta_n) + x_2(delta_2 + ... + delta_n) + x_3(delta_3 + ... + delta_n) + ... + x_n delta_n,

where each |delta_i| <= (1/2) 10^{1-t} = mu. Defining delta_1 = 0, we can write the above bound compactly.

Remark: From the above formula we see that we should expect a smaller error, in general, when adding n floating point numbers in ascending order of magnitude:

|x_1| <= |x_2| <= |x_3| <= ... <= |x_n|.

If the numbers are arranged in ascending order of magnitude, then the larger error factors will be associated with the smaller numbers.
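The effect is easy to observe in simulated single precision. The sketch below (NumPy's float32 is used purely for illustration) sums the same numbers in ascending and descending order and compares against an accurate reference:

import numpy as np

x = np.array([10.0 ** (-k) for k in range(8)], dtype=np.float32)

def running_sum(v):
    s = np.float32(0.0)
    for xi in v:
        s = np.float32(s + xi)    # accumulate in single precision
    return s

exact = float(np.sum(x.astype(np.float64)))
print(abs(running_sum(np.sort(x)) - exact))         # ascending: typically smaller error
print(abs(running_sum(np.sort(x)[::-1]) - exact))   # descending: typically larger error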
2.5 Multiplication of n Floating Point Numbers

Proceeding as in the case of addition of n floating point numbers in the last section, it can be shown that:

Theorem 2.5.1

fl(x_1 x_2 ... x_n) = (1 + eta) prod_{i=1}^{n} x_i,

where 1 + eta = (1 + delta_2)(1 + delta_3) ... (1 + delta_n) and |delta_i| <= mu, i = 2, ..., n.

A bound for eta

Assuming that (n - 1) mu < .01, we will prove that |eta| < 1.06 (n - 1) mu. (This assumption is quite realistic; on most machines it holds for fairly large values of n.) Since |delta_i| <= mu, we have |eta| <= (1 + mu)^{n-1} - 1. Again, since (1 + mu)^{n-1} - 1 <= e^{(n-1)mu} - 1 and e^z - 1 <= 1.06 z for 0 <= z <= .01, the bound |eta| <= 1.06 (n - 1) mu follows.

Theorem 2.5.2  The relative error in computing the product of n floating point numbers is at most 1.06 (n - 1) mu, assuming that (n - 1) mu < .01.
2.6 Inner Product Computation

A frequently arising computational task in numerical linear algebra is the computation of the inner product of two n-vectors x and y:

x^T y = x_1 y_1 + x_2 y_2 + ... + x_n y_n,                                 (2.6.1)

where x_i and y_i, i = 1, ..., n, are the components of x and y.

Let x_i and y_i, i = 1, ..., n, be floating point numbers. Define

S_1 = fl(x_1 y_1),                                                          (2.6.2)
S_2 = fl(S_1 + fl(x_2 y_2)),                                                (2.6.3)
...
S_k = fl(S_{k-1} + fl(x_k y_k)),   k = 3, 4, ..., n.                        (2.6.4)

We then have, using Theorem 2.3.1,

S_1 = x_1 y_1 (1 + eps_1),                                                  (2.6.5)
S_2 = [S_1 + x_2 y_2 (1 + eps_2)](1 + delta_2),                             (2.6.6)
...
S_n = [S_{n-1} + x_n y_n (1 + eps_n)](1 + delta_n),                         (2.6.7)

where each |eps_i| <= mu and |delta_i| <= mu. Substituting the values of S_1 through S_{n-1} in S_n and making some rearrangements, we can write

S_n = sum_{i=1}^{n} x_i y_i (1 + gamma_i),                                  (2.6.8)

where

1 + gamma_i = (1 + eps_i)(1 + delta_i)(1 + delta_{i+1}) ... (1 + delta_n)
            is approximately 1 + eps_i + delta_i + delta_{i+1} + ... + delta_n   (delta_1 = 0)   (2.6.9)

(ignoring the products eps_i delta_j and delta_j delta_k, which are small).
For example, when n = 2, it is easy to check that

S_2 = x_1 y_1 (1 + gamma_1) + x_2 y_2 (1 + gamma_2),                        (2.6.10)

where 1 + gamma_1 is approximately 1 + eps_1 + delta_2 and 1 + gamma_2 is approximately 1 + eps_2 + delta_2 (neglecting the products eps_1 delta_2 and eps_2 delta_2, which are small).

As in the last section, it can be shown (see Forsythe and Moler CSLAS, pp. 92-93) that if n mu < 0.01, then

|gamma_i| <= 1.01 (n + 1 - i) mu,   i = 1, 2, ..., n.                       (2.6.11)

From (2.6.8) and (2.6.11), we have

|fl(x^T y) - x^T y| <= sum_{i=1}^{n} |x_i| |y_i| |gamma_i| <= 1.01 n mu |x|^T |y| <= 1.01 n mu ||x||_2 ||y||_2

(using the Cauchy-Schwarz inequality (Chapter 1, Section 1.7)), where |x| = (|x_1|, |x_2|, ..., |x_n|)^T and similarly for |y|.

Theorem 2.6.1

|fl(x^T y) - x^T y| <= 1.01 n mu |x|^T |y| <= 1.01 n mu ||x||_2 ||y||_2.
Theorem 2.6.2  If the inner product is accumulated in double precision, then

|fl(x^T y) - x^T y| <= c mu |x^T y|,

where c is a constant of order unity, provided no severe cancellation takes place in forming x^T y.
Remark: The last sentence in Theorem 2.6.2 is important. One can construct a very simple
example (Exercise #6(b)) to see that if cancellation takes place, the conclusion of Theorem 2.6.2
does not hold. The phenomenon of catastrophic cancellation is discussed in the next
section.
2.7 Error Bounds for Floating Point Matrix Computations

Theorem 2.7.1  Let |M| denote the matrix (|m_ij|). Let A and B be two floating point matrices and c a floating point number. Then

1. fl(cA) = cA + E,   |E| <= mu |cA|;
2. fl(A + B) = (A + B) + E,   |E| <= mu |A + B|;

and, if A and B are compatible for matrix multiplication,

3. fl(AB) = AB + E,   |E| <= n mu |A| |B| + O(mu^2).

Proof. See Wilkinson AEP, pp. 114-115, and Golub and Van Loan MC (1989, p. 66).

Meaning of O(mu^2)

In the above expression the notation O(mu^2) stands for a complicated expression that is bounded by c mu^2, where c is a constant depending upon the problem. The expression O(mu^2) will be used frequently in this book.

Remark: The last result shows that matrix multiplication in floating point arithmetic can be very inaccurate, since |A| |B| may be much larger than |AB| itself (Exercise #9). For this reason, whenever possible, while computing a matrix-matrix or matrix-vector product, accumulation of inner products in double precision should be used, because in this case the entries of the error matrix can be shown to be bounded predominantly by the entries of the matrix |AB|, rather than those of |A| |B|; see Wilkinson AEP, p. 118.
Error Bounds in Terms of Norms

Traditionally, for matrix computations the bounds for error matrices are given in terms of the norms of the matrices, rather than in terms of absolute values of the matrices as given above. Here we rewrite the bound for the error matrix for matrix multiplication using norms, for easy reference later in the book. We must note, however, that entry-wise error bounds are more meaningful than norm-wise error bounds (see the remarks in Section 3.2).

Consider again the equation

fl(AB) = AB + E,   |E| <= n mu |A| |B| + O(mu^2).

Since ||E|| <= || |E| ||, we may rewrite the equation as

fl(AB) = AB + E,   where   ||E|| <= || |E| || <= n mu || |A| || || |B| || + O(mu^2).
Implication of the above result

The result says that, although matrix multiplication can be inaccurate in general, if one of the matrices is orthogonal then floating point matrix multiplication gives only a small and acceptable error. As we will see in later chapters, this result forms the basis of many numerically viable algorithms discussed in this book.

For example, the following result, to be used very often in this book, forms the basis of the QR factorization of a matrix A (see Chapter 5) and is a consequence of the above result.
Suppose now we use four digit arithmetic with rounding. Then we have

x-hat = .5462   (correct to four significant digits)
y-hat = .5460   (correct to five significant digits)
d-hat = x-hat - y-hat = .0002.

How good an approximation is d-hat to d? The relative error is

|d - d-hat| / |d| = .25   (quite large!).

What happened above is the following. In four digit arithmetic, the numbers .5462 and .5460 are of almost the same size. So, when one was subtracted from the other, the most significant digits canceled and only the very least significant digit was left in the answer. This phenomenon, known as catastrophic cancellation, occurs when two numbers of approximately the same size are subtracted. Fortunately, in many cases catastrophic cancellation can be avoided. For example, consider the case of solving the quadratic equation

a x^2 + b x + c = 0,   a != 0.

The usual way the two roots x_1 and x_2 are computed is

x_1 = ( -b + sqrt(b^2 - 4ac) ) / (2a),
x_2 = ( -b - sqrt(b^2 - 4ac) ) / (2a).

It is clear from the above that if a, b, and c are numbers such that -b is about the same size as sqrt(b^2 - 4ac) (with respect to the arithmetic used), then catastrophic cancellation will occur in computing x_2 and, as a result, the computed value of x_2 can be completely erroneous.
Example 2.8.1

As an illustration, take a = 1, b = -10^5, c = 1 (Forsythe, Malcolm and Moler CMMC, pp. 20-22). Then using beta = 10, t = 8, L = -U = -50, we see that

x_1 = ( 10^5 + sqrt(10^{10} - 4) ) / 2 = 10^5   (true answer),
x_2 = ( 10^5 - sqrt(10^{10} - 4) ) / 2 = 0      (completely wrong).

The true x_2 = 0.000010000000001 (correctly rounded to 11 significant digits). The catastrophic cancellation took place in computing x_2, since -b and sqrt(b^2 - 4ac) are of the same order. Note that in 8-digit arithmetic, sqrt(10^{10} - 4) = 10^5.
How Cancellation Can Be Avoided

Cancellation can be avoided if an equivalent pair of formulas is used:

x_1 = -( b + sign(b) sqrt(b^2 - 4ac) ) / (2a),
x_2 = c / (a x_1),

where sign(b) is the sign of b. Using these formulas, we easily see that

x_1 = 100000.00,
x_2 = 1.0000000 / 100000.00 = 0.000010000.
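A small sketch of this rearrangement in Python (math.copysign plays the role of sign(b) here; the function name is illustrative only and the sketch assumes real roots):

import math

def stable_quadratic_roots(a, b, c):
    # Avoid subtractive cancellation: compute the root of larger magnitude
    # first, then obtain the other root from the relation x1 * x2 = c / a.
    d = math.sqrt(b * b - 4.0 * a * c)
    x1 = -(b + math.copysign(d, b)) / (2.0 * a)
    x2 = c / (a * x1)
    return x1, x2

print(stable_quadratic_roots(1.0, -1.0e5, 1.0))
# approximately (1.0e5, 1.0e-5); the small root is now accurate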
Example 2.8.2
For yet another example to see how cancellation can lead to inaccuracy, consider the problem
of evaluating
f (x) = ex ; x ; 1 at x = :01.
Using ve digit arithmetic, the correct answer is .000050167. If f (x) is evaluated directly from the
expression, we have
f (:01) = 1:0101 ; (:01) ; 1 = :0001
; :000050167
Relative Error = :0001:00005016
= :99 100 ;
indicating that we cannot trust even the rst signicant digit.
Fortunately, cancellation can again be avoided using the convergent series
ex = 1 + x + x2 + x3! +
2 3
= x + x + x
2 3 4
2 3! 4!
For x = :01, this formula gives
(:01)2 + (:01)3 + (:01)4 +
2 3! 4!
= :00005 + :000000166666 + :00000000004166 +
= :000050167 (Correct gure up to ve signicant gures)
Remark: Note that if x were negative, then use of the convergent series for e^x would not have helped. For example, to compute e^{-x} for a positive value of x, cancellation can be avoided by using

e^{-x} = 1 / e^x = 1 / ( 1 + x + x^2/2! + x^3/3! + ... ).
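To see the effect numerically, one can compare the direct formula with a few terms of the series. The sketch below works in double precision, so the discrepancy shows up for much smaller x than .01; the argument 1e-8 and the number of terms are arbitrary choices for illustration:

import math

def f_direct(x):
    return math.exp(x) - x - 1.0      # suffers from cancellation for small |x|

def f_series(x, nterms=10):
    # f(x) = x^2/2! + x^3/3! + ...    (no cancellation for small positive x)
    s = 0.0
    term = x * x / 2.0
    for k in range(3, nterms + 3):
        s += term
        term *= x / k
    return s

x = 1e-8
print(f_direct(x))   # dominated by rounding error
print(f_series(x))   # close to the true value x**2/2, about 5e-17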
Recursive Computations

In the above examples, we saw how subtractive cancellation can give inaccurate answers. There are, however, other common sources of round-off errors, e.g., recursive computations, which are computations performed recursively so that the computation of one step depends upon the results of previous steps. In such cases, even if the error made in the first step is negligible, due to the accumulation and magnification of error at every step, the final error can be quite large, giving a completely erroneous answer.

Certain recursions propagate errors in very unhealthy fashions. Consider a very nice example involving recursive computations, again from the book by Forsythe, Malcolm, and Moler CMMC (pp. 16-17).
Example 2.8.3

Suppose we need to compute the integral

E_n = integral from 0 to 1 of x^n e^{x-1} dx,   n = 1, 2, ...

Integration by parts gives the recursion

E_n = 1 - n E_{n-1},   n = 2, 3, ...

Thus, if E_1 is known, then E_n can be computed for different values of n using the above recursive formula.

Indeed, with beta = 10 and t = 6, and starting with E_1 = 0.367879 as a six-digit approximation to E_1 = 1/e, we have from the recursion:

E_1 = 0.367879
E_2 = 0.264242
E_3 = 0.207274
E_4 = 0.170904
...
E_9 = -0.068480
Although the integrand is positive throughout the interval [0, 1], the computed value of E_9 is negative. This phenomenon can be explained as follows.

The error in computing E_2 was -2 times the error in computing E_1; the error in computing E_3 was -3 times the error in E_2 (therefore, the error at this step was exactly six times the error in E_1). Thus, the error in computing E_9 was (-2)(-3)(-4) ... (-9) = 9! times the error in E_1. The error in E_1 was due to the rounding of 1/e to six significant digits, which is about 4.412 x 10^{-7}. However, this small error multiplied by 9! gives 9! x 4.412 x 10^{-7} = .1601, which is quite large.

The recursion can, however, be used safely in the backward direction, in the form E_{n-1} = (1 - E_n)/n. With n = 20, E_20 is approximately 1/21. Let us take E_20 = 0. Then, starting with E_20 = 0, it can be shown (Forsythe, Malcolm, and Moler CMMC, p. 17) that E_9 = 0.0916123, which is correct to full six-digit precision. The reason for obtaining this accuracy is that the error in E_20 was at most 1/21; this error was multiplied by 1/20 in computing E_19, giving an error of at most (1/20)(1/21) = 0.0024 in the computation of E_19, and so on.
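The two directions of the recursion are easy to compare in code (a short sketch; the starting guess E_20 = 0 follows the discussion above):

# Forward recursion E_n = 1 - n*E_{n-1}, starting from a rounded E_1 = 1/e:
E = 0.367879                  # 1/e rounded to six digits
for n in range(2, 10):
    E = 1.0 - n * E
print(E)                      # about -0.0685: the initial error magnified by 9!

# Backward recursion E_{n-1} = (1 - E_n)/n, starting from the crude guess E_20 = 0:
E = 0.0
for n in range(20, 9, -1):
    E = (1.0 - E) / n
print(E)                      # about 0.0916123: the initial error has been damped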
beta  - the base,
t     - the precision,
L, U  - the lower and upper limits of the exponent.

2. Errors: The error in a computation is measured either by the absolute error or the relative error.

The relative error makes more sense than the absolute error.

The relative error gives an indication of the number of significant digits in an approximate answer.

The relative error in representing a real number x by its floating point representation fl(x) is bounded by a number mu, called the machine precision (Theorem 2.2.1).
3. Laws of Floating Point Arithmetic:

fl(x * y) = (x * y)(1 + delta),

where * indicates any of the four basic arithmetic operations and |delta| <= mu.

4. Addition, Multiplication, and Inner Product Computations. The results on addition and multiplication of n floating point numbers are given in Theorems 2.4.1 and 2.5.1, respectively.

While adding n floating point numbers, it is advisable that they be added in ascending order of magnitude.

While computing the inner product of two vectors, accumulation of the inner product in double precision, whenever possible, is suggested.
5. Floating Point Matrix Multiplication. The entry-wise and norm-wise error bounds for the multiplication of two floating point matrices are given in Theorems 2.7.1 and 2.7.2, respectively.

Matrix multiplication in floating point arithmetic can be very inaccurate, unless one of the matrices is orthogonal (or unitary, if complex). Accumulation of inner products is suggested, whenever possible, in computing a matrix-matrix or matrix-vector product. The high accuracy of a matrix product computation involving an orthogonal matrix makes the use of orthogonal matrices in matrix computations quite attractive.
6. Round-off Errors Due to Cancellation and Recursive Computation. Two major sources of round-off errors are subtractive cancellation and recursive computations. They have been discussed in some detail in Section 2.8. Examples have been given to show how these errors come up in many basic computations. An encouraging message here is that in many instances, computations can be reorganized so that cancellation is avoided, and the error in recursive computations can be diminished at each step of the computation.
Exercises on Chapter 2
1. (a) Show that 81
j
(x) ; xj = >
< ;t for rounding
2
1
jxj >
: ;t for chopping
1
where
jekj 2k + O( ): 2
3. Construct examples to show that the distributive law for floating point addition and multiplication does not hold. What can you say about the commutativity and associativity of these operations? Give reasons for your answers.
4. Let x_1, x_2, ..., x_n be n floating point numbers. Define

s_2 = fl(x_1 + x_2),   s_k = fl(s_{k-1} + x_k),   k = 3, ..., n.
11. Let b be a column vector and x = Ab. Let x-hat = fl(x). Then show that

||x-hat - x|| / ||x|| <= p(n) mu ||A^{-1}|| ||A||,

where p(n) is a polynomial in n of low degree.

(The number ||A^{-1}|| ||A|| is called the condition number of A. There are matrices for which this number can be very large. For those matrices we may then conclude that the relative error in a matrix-vector product can be quite large.)
12. Using Theorem 2.7.1, prove that, if B is nonsingular,

||fl(AB) - AB||_F / ||AB||_F <= n mu ||B||_F ||B^{-1}||_F + O(mu^2).

Let y_i-hat = fl(y_i). Find a bound for the relative error in computing each y_i, i = 1, ..., n.

14. Let beta = 10, t = 4. Compute fl(A^T A), where

A = ( 1      1
      10^-4  0
      0      10^-4 ).

Repeat your computation with t = 9. Compare the results.
15. Show how to arrange the computation in each of the following so that the loss of significant digits can be avoided. Do one numerical example in each case to support your answer.

(a) e^x - x - 1, for negative values of x.
(b) sqrt(x^4 + 1) - x^2, for large values of x.
(c) 1/x - 1/(x + 1), for large values of x.
(d) x - sin x, for values of x near zero.
(e) 1 - cos x, for values of x near zero.

16. What are the relative and absolute errors in approximating
(a) pi by 22/7?
(b) 1/3 by .333?
(c) 1/6 by .166?
How many significant digits are there in each approximation?

17. Let beta = 10, t = 4. Consider computing

a = (1/6 - .1666) / .1666.

How many correct digits of the exact answer will you get?
18. Consider evaluating

e = sqrt( a^2 + b^2 ).

How can the computation be organized so that overflow in computing a^2 + b^2 for large values of a or b can be avoided?

19. What answers will you get if you compute the following numbers on your calculator or computer?

(a) sqrt(10^8 - 1),
(b) sqrt(10^{-20} - 1),
(c) 10^{16} - 50.

What remedy do you suggest? Now solve the equations using your suggested remedy, using t = 4.

21. Show that the integral

y_i = integral from 0 to 1 of x^i / (x + 5) dx

can be computed by using the recursion formula

y_i = 1/i - 5 y_{i-1}.

Compute y_1, y_2, ..., y_10 using this formula, taking

y_0 = integral from 0 to 1 of dx/(x + 5) = ln(x + 5) evaluated from 0 to 1 = ln 6 - ln 5 = ln(1.2).
3. STABILITY OF ALGORITHMS AND CONDITIONING OF PROBLEMS
3.1 Some Basic Algorithms : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 82
3.1.1 Computing the Norm of a Vector : : : : : : : : : : : : : : : : : : : : : : : : : 82
3.1.2 Computing the Inner Product of Two Vectors : : : : : : : : : : : : : : : : : : 83
3.1.3 Solution of an Upper Triangular System : : : : : : : : : : : : : : : : : : : : : 83
3.1.4 Computing the Inverse of an Upper Triangular Matrix : : : : : : : : : : : : : 84
3.1.5 Gaussian Elimination for Solving Ax = b : : : : : : : : : : : : : : : : : : : : 86
3.2 Denitions and Concepts of Stability : : : : : : : : : : : : : : : : : : : : : : : : : : : 91
3.3 Conditioning of the Problem and Perturbation Analysis : : : : : : : : : : : : : : : : 95
3.4 Conditioning of the Problem, Stability of the Algorithm, and Accuracy of the Solution 96
3.5 The Wilkinson Polynomial : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 98
3.6 An Ill-conditioned Linear System Problem : : : : : : : : : : : : : : : : : : : : : : : : 100
3.7 Examples of Ill-conditioned Eigenvalue Problems : : : : : : : : : : : : : : : : : : : : 100
3.8 Strong, Weak and Mild Stability : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 103
3.9 Review and Summary : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 105
3.10 Suggestions for Further Reading : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 106
CHAPTER 3
STABILITY OF ALGORITHMS
AND CONDITIONING OF PROBLEMS
3. STABILITY OF ALGORITHMS AND CONDITIONING OF
PROBLEMS
3.1 Some Basic Algorithms
Definition 3.1.1  An algorithm is an ordered set of operations, logical and arithmetic, which when applied to a computational problem defined by a given set of data, called the input data, produces a solution to the problem. A solution comprises a set of data called the output data.

In this book, for the sake of convenience and simplicity, we will very often describe algorithms by means of pseudocode, which can be translated into computer code easily. Describing algorithms by pseudocode has been made popular by Stewart through his book IMC (1973).
Output Data: s = ||x||_2.

Pseudocode:

r = max(|x_1|, ..., |x_n|)
s = 0
For i = 1 to n do
    y_i = x_i / r
    s = s + y_i^2
s = r s^{1/2}
G. W. Stewart, a former student of the celebrated numerical analyst Alston Householder, is a professor of computer
science at the University of Maryland. He is well known for his many outstanding contributions in numerical linear
algebra and statistical computations. He is the author of the book Introduction to Matrix Computations.
An Algorithmic Note

In order to avoid overflow, each entry of x has been normalized before using the formula

||x||_2 = sqrt( x_1^2 + ... + x_n^2 ).
t_11 y_1 + t_12 y_2 + ... + t_1n y_n = b_1
          t_22 y_2 + ... + t_2n y_n = b_2
                     ...
          t_{n-1,n-1} y_{n-1} + t_{n-1,n} y_n = b_{n-1}
                                t_nn y_n = b_n,

where each t_ii != 0 for i = 1, 2, ..., n.

The last equation is solved first to obtain y_n; this value is then inserted into the next-to-last equation to obtain y_{n-1}, and so on. This process is known as back substitution. The algorithm can easily be written down.
Algorithm 3.1.3 Back Substitution

Input Data: T = (t_ij), an n x n upper triangular matrix, and b, an n-vector.

Step 1: Compute y_n = b_n / t_nn.

Step 2: Compute y_{n-1} through y_1 successively:

y_i = (1 / t_ii) ( b_i - sum_{j=i+1}^{n} t_ij y_j ),   i = n-1, ..., 2, 1.

Pseudocode:

For i = n, n-1, ..., 2, 1 do
    y_i = (1 / t_ii) ( b_i - sum_{j=i+1}^{n} t_ij y_j )
(the sum is empty when i = n)
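A direct NumPy transcription of Algorithm 3.1.3 is shown below as a sketch (in practice one would call a library routine such as scipy.linalg.solve_triangular); the test data is the upper triangular system produced in Example 3.1.2 later in this chapter:

import numpy as np

def back_substitution(T, b):
    # Solve T y = b for an n x n upper triangular T with nonzero diagonal.
    n = len(b)
    y = np.zeros(n)
    for i in range(n - 1, -1, -1):
        y[i] = (b[i] - T[i, i+1:] @ y[i+1:]) / T[i, i]
    return y

T = np.array([[5.0, 1.0, 1.0], [0.0, 0.8, 0.8], [0.0, 0.0, 2.0]])
b = np.array([7.0, 1.6, 2.0])
print(back_substitution(T, b))   # [1. 1. 1.]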
s_11 = 1 / t_11,
s_22 = 1 / t_22,   s_12 = -(1 / t_11)(t_12 s_22).

A Convention: From now on, we shall use the following format for algorithm descriptions.

For example, for the upper triangular matrix

T = ( 5  2  3
      0  2  1
      0  0  4 ),

the algorithm computes the inverse S = T^{-1} column by column as follows.

k = 3:
    s_33 = 1/t_33 = 1/4
    s_23 = -(1/t_22)(t_23 s_33) = -(1/2)(1 x 1/4) = -1/8
    s_13 = -(1/t_11)(t_12 s_23 + t_13 s_33) = -(1/5)(2 x (-1/8) + 3 x (1/4)) = -1/10

k = 2:
    s_22 = 1/t_22 = 1/2
    s_12 = -(1/t_11)(t_12 s_22) = -(1/5)(2 x 1/2) = -1/5

k = 1:
    s_11 = 1/t_11 = 1/5

T^{-1} = S = ( 1/5  -1/5  -1/10
               0     1/2  -1/8
               0     0     1/4 )
a_11 x_1 + a_12 x_2 + ... + a_1n x_n = b_1
a_21 x_1 + a_22 x_2 + ... + a_2n x_n = b_2
                     ...
a_n1 x_1 + a_n2 x_2 + ... + a_nn x_n = b_n,

or, in matrix notation,

Ax = b,

where A = (a_ij) and b = (b_1, ..., b_n)^T.
A well-known approach for solving the problem is the classical elimination scheme known as
Gaussian elimination. A detailed description and mechanism of development of this historical
algorithm and its important practical variations will appear in Chapters 5 and 6. However, for
a better understanding of some of the material presented in this chapter, we just give a brief
description of the basic Gaussian elimination scheme.
Basic idea. The basic idea is to reduce the system to an equivalent upper triangular system so
that the reduced upper triangular system can be solved easily using the back substitution algorithm
(Algorithm 3.1.3).
The quantities

m_i1 = -a_i1 / a_11,   i = 2, ..., n,

are called multipliers. At the end of Step 1, the system Ax = b becomes A^(1) x = b^(1), where the entries of A^(1) = (a_ij^(1)) and those of b^(1) are related to the entries of A and b as follows:

a_ij^(1) = a_ij + m_i1 a_1j   (i = 2, ..., n; j = 2, ..., n),
b_i^(1)  = b_i + m_i1 b_1     (i = 2, ..., n).

Step 2: At Step 2, x_2 is eliminated from the 3rd through the nth equations of A^(1) x = b^(1) by forming the multipliers m_i2 = -a_i2^(1) / a_22^(1), i = 3, ..., n, multiplying the second equation by m_i2 and adding it, respectively, to the 3rd through nth equations. The system now becomes A^(2) x = b^(2), whose entries are given as follows:

a_ij^(2) = a_ij^(1) + m_i2 a_2j^(1)   (i = 3, ..., n; j = 3, ..., n),
b_i^(2)  = b_i^(1) + m_i2 b_2^(1)     (i = 3, ..., n),

and so on.
Step k: At Step k, the (n - k) multipliers

m_ik = -a_ik^(k-1) / a_kk^(k-1),   i = k+1, ..., n,

are formed, and using them, x_k is eliminated from the (k+1)th through the nth equations of A^(k-1) x = b^(k-1). The entries of A^(k) and those of b^(k) are given by

a_ij^(k) = a_ij^(k-1) + m_ik a_kj^(k-1)   (i = k+1, ..., n; j = k+1, ..., n),
b_i^(k)  = b_i^(k-1) + m_ik b_k^(k-1)     (i = k+1, ..., n).

Step n-1: At the end of the (n-1)th step, the reduced matrix A^(n-1) is upper triangular, and the reduced system A^(n-1) x = b^(n-1) can be solved by back substitution. In pseudocode, the update at Step k reads:

For i = k+1, ..., n do
    m_ik = -a_ik^(k-1) / a_kk^(k-1)
    For j = k+1, ..., n do
        a_ij^(k) = a_ij^(k-1) + m_ik a_kj^(k-1)
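The following compact NumPy sketch stays close to the update formulas above (it is an illustration only: there is no pivoting, so it simply fails if some a_kk^(k-1) is zero; the practical pivoting variants appear in Chapters 5 and 6). The test data is the system of Example 3.1.2 below.

import numpy as np

def gaussian_elimination_no_pivoting(A, b):
    # Reduce Ax = b to an upper triangular system, then back substitute.
    A = A.astype(float).copy()
    b = b.astype(float).copy()
    n = len(b)
    for k in range(n - 1):
        for i in range(k + 1, n):
            m = -A[i, k] / A[k, k]            # multiplier m_ik
            A[i, k+1:] += m * A[k, k+1:]      # a_ij^(k) = a_ij^(k-1) + m_ik a_kj^(k-1)
            A[i, k] = 0.0
            b[i] += m * b[k]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):            # back substitution
        x[i] = (b[i] - A[i, i+1:] @ x[i+1:]) / A[i, i]
    return x

A = np.array([[5.0, 1.0, 1.0], [1.0, 1.0, 1.0], [2.0, 1.0, 3.0]])
b = np.array([7.0, 3.0, 6.0])
print(gaussian_elimination_no_pivoting(A, b))   # [1. 1. 1.]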
Remarks:
1. The above basic Gaussian elimination algorithm is commonly known as the Gaussian elim-
ination algorithm without row interchanges or the Gaussian elimination algorithm
without pivoting. The reason for having such a name will be clear from the discussion of
this algorithm again in Chapter 5.
2. The basic Gaussian algorithm as presented above is not commonly used in practice. Two prac-
tical variations of this algorithm, known as Gaussian elimination with partial and complete
pivoting, will be described in Chapters 5 and 6.
3. We have assumed that the quantities a_11, a_22^(1), ..., a_nn^(n-1) are different from zero. If any of them is computationally zero, the algorithm will stop.
Example 3.1.2

5x_1 + x_2 +  x_3 = 7
 x_1 + x_2 +  x_3 = 3
2x_1 + x_2 + 3x_3 = 6,

or

( 5  1  1 ) ( x_1 )   ( 7 )
( 1  1  1 ) ( x_2 ) = ( 3 ),
( 2  1  3 ) ( x_3 )   ( 6 )

that is, Ax = b.

Step 1: k = 1, i = 2, 3:

m_21 = -a_21/a_11 = -1/5,   m_31 = -a_31/a_11 = -2/5.

j = 2, 3:

i = 2, j = 2:  a_22^(1) = a_22 + m_21 a_12 = 4/5
i = 2, j = 3:  a_23^(1) = a_23 + m_21 a_13 = 4/5
i = 3, j = 2:  a_32^(1) = a_32 + m_31 a_12 = 3/5
i = 3, j = 3:  a_33^(1) = a_33 + m_31 a_13 = 13/5

b_2^(1) = b_2 + m_21 b_1 = 8/5
b_3^(1) = b_3 + m_31 b_1 = 16/5

(Note: b_1^(1) = b_1; a_21^(1) = a_31^(1) = 0; a_11^(1) = a_11, a_12^(1) = a_12, a_13^(1) = a_13.)
The system now is

( 5  1    1    ) ( x_1 )   ( 7    )
( 0  4/5  4/5  ) ( x_2 ) = ( 8/5  ),
( 0  3/5  13/5 ) ( x_3 )   ( 16/5 )

that is, A^(1) x = b^(1).

Step 2: k = 2, i = 3:

m_32 = -a_32^(1) / a_22^(1) = -3/4.

i = 3, j = 3:  a_33^(2) = a_33^(1) + m_32 a_23^(1) = 2
b_3^(2) = b_3^(1) + m_32 b_2^(1) = 2

The reduced upper triangular system is

( 5  1    1   ) ( x_1 )   ( 7   )
( 0  4/5  4/5 ) ( x_2 ) = ( 8/5 ),
( 0  0    2   ) ( x_3 )   ( 2   )

that is, A^(2) x = b^(2).

Back substitution now gives x_3 = 2/2 = 1; then (4/5)x_2 + (4/5)x_3 = 8/5 implies x_2 = 1; and finally 5x_1 + x_2 + x_3 = 7 implies x_1 = 1.
As a simple example of backward stability, consider the case of computing the sum of two floating point numbers x and y. We have seen before that

fl(x + y) = (x + y)(1 + delta)
          = x(1 + delta) + y(1 + delta)
          = x' + y'.

Thus, the computed sum of two floating point numbers x and y is the exact sum of two other floating point numbers x' and y'. Since |delta| <= mu, both x' and y' are close to x and y, respectively. Thus we conclude that the operation of adding two floating point numbers is backward stable. Similar statements, of course, hold for the other floating point arithmetic operations.
For yet another type of example, consider the problem of solving the linear system Ax = b.

Definition 3.2.3  An algorithm for solving Ax = b will be called stable if the computed solution x-hat is such that

(A + E) x-hat = b + delta-b,

with E and delta-b small.

How Do We Measure Smallness?

The "smallness" of a matrix or a vector is measured either by looking into its entries or by computing its norm.
Examples of Stable and Unstable Algorithms by Backward Error Analysis
Example 3.2.1 A Stable Algorithm | Solution of an Upper Triangular System by
Back Substitution
Consider Algorithm 3.1.3 (the back substitution method). Suppose the algorithm is imple-
mented using accumulation of inner product in double precision. Then it can be shown (see Chapter
11) that the computed solution x^ satises
(T + E )^x = b;
where the entries of the error matrix E are quite small. In fact, if E = (e_ij) and T = (t_ij), then

|e_ij| <= |t_ij| 10^{-t},   i, j = 1, ..., n,

showing that the error can be even smaller than the error made in rounding the entries of T. Thus, the back substitution process for solving an upper triangular system is stable.
Example 3.2.2 An Unstable Algorithm | Gaussian Elimination Without Pivoting
Consider the problem of solving the nonsingular linear system Ax = b using Gaussian elimina-
tion (Algorithm 3.1.5).
It has been shown by Wilkinson (see Chapter 11 of this book) that, when the process does
not break down, the computed solution x-hat satisfies

(A + E) x-hat = b,
with

||E||_inf <= c n^3 rho mu ||A||_inf + O(mu^2),

where c is a small constant, the A^(k) = (a_ij^(k)) are the reduced matrices in the elimination process, and rho, known as the growth factor, is given by

rho = ( max_k max_{i,j} |a_ij^(k)| ) / ( max_{i,j} |a_ij| ),

that is, rho = max(alpha, alpha_1, ..., alpha_{n-1}) / alpha, where alpha is the largest entry of A in magnitude and alpha_k is the largest entry of A^(k) in magnitude.
Now, for an arbitrary matrix A, rho can be quite large, because the entries of the reduced matrices A^(k) can grow arbitrarily. To see this, consider the simple matrix

A = ( 10^{-10}  1
      1         2 ).

One step of Gaussian elimination using 9-digit decimal floating point arithmetic yields the reduced matrix

A^(1) = ( 10^{-10}  1            )   ( 10^{-10}  1        )
        ( 0         2 - 10^{10} ) = ( 0         -10^{10} ).

The growth factor for this problem is then

rho = max(alpha, alpha_1) / alpha = max(2, 10^{10}) / 2 = 10^{10} / 2,

which is quite large. Thus, if we now proceed to solve a linear system with this reduced upper triangular matrix, we cannot expect a small error matrix E. Indeed, if we wish to solve

10^{-10} x_1 + x_2 = 1
         x_1 + 2x_2 = 3

using the above A^(1), then the computed solution will be x_1 = 0, x_2 = 1, whereas the exact solution is x_1 = x_2 = 1 (to working accuracy). This shows that Gaussian elimination without pivoting is unstable for arbitrary linear systems.
If an algorithm is stable for a given matrix A, then one would like to see that the algorithm
is stable for every matrix A in a given class. Thus, we may give a formal denition of stability as
follows:
Denition 3.2.4 An algorithm is stable for a class of matrices C if for every matrix A in C ,
the computed solution by the algorithm is the exact solution of a nearby problem.
Thus, for the linear system problem
Ax = b;
an algorithm is stable for a class of matrices C if for every A in C and for each b, it produces a computed solution x-hat that satisfies

(A + E) x-hat = b + delta-b

for some E and delta-b, where A + E is close to A and b + delta-b is close to b.
The ill-conditioning of a problem contaminates the computed solution, even with the use of
a stable algorithm, therefore yielding an unacceptable solution. When a computed solution is
unsatisfactory, some users (who are not usually concerned with conditioning) tend to put the
blame on the algorithm for the inaccuracy. To be fair, we should test an algorithm for stability
only on well-conditioned matrices. If the algorithm passes the test of stability on well-conditioned
matrices, then it should be declared a stable algorithm. However, if a \stable" algorithm is applied
to an ill-conditioned problem, it should not introduce more error than what the data warrants.
From the previous discussion, it is quite clear now that investigating the conditioning of a
problem is very important.
The Condition Number of a Problem
Numerical analysts usually try to associate a number called the con-
dition number with a problem. The condition number indicates
whether the problem is ill- or well-conditioned. More specically,
the condition number gives a bound for the relative error in the so-
lution when a small perturbation is applied to the input data.
In numerical linear algebra condition numbers for many (but not all) problems have been
identied. Unfortunately, computing the condition number is often more involved and
time consuming than solving the problem itself. For example (as we shall see in Chapter
6), for the linear system problem Ax = b, the condition number is

Cond(A) = ||A|| ||A^{-1}||.

Thus, computing the condition number in this case involves computing the inverse of A; it is more expensive to compute the inverse than to solve the system Ax = b itself. In Chapter 6 we shall discuss methods for estimating Cond(A) without explicitly computing A^{-1}.
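For a small matrix, Cond(A) can be formed directly from this definition or obtained from a library routine; the sketch below uses NumPy and an arbitrary nearly singular 2 x 2 example:

import numpy as np

A = np.array([[1.0, 1.0], [1.0, 1.0001]])
print(np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2))  # ||A|| ||A^-1||
print(np.linalg.cond(A, 2))                                        # same value, about 4e4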
We shall discuss conditioning of each problem in detail in the relevant chapter. Before closing
this section, however, let's mention several well-known examples of ill-conditioned problems.
An Ill-Conditioned Subtraction
Consider the subtraction: c = a ; b:
a = 12354101
b = 12345678
c = a ; b = 8423:
Now perturb a in the sixth place:
a^ = 12354001
c^ = c + c = a^ ; b = 8323:
Thus, a perturbation in the sixth digit in the input value caused a change in the second digit in
the answer. Note that the relative error in the data is
a ; a^ = :000008;
a
97
while the relative error in the computed result is
c ; c^ = :01187722:
c
An Ill-Conditioned Root-Finding Problem

Consider solving the simple quadratic equation

f(x) = x^2 - 2x + 1 = 0.

The roots are x = 1, 1. Now perturb the coefficient 2 by 0.00001. The computed roots of the perturbed polynomial f-hat(x) = x^2 - 2.00001x + 1 are x_1 = 1.0032 and x_2 = .9968. The relative errors in x_1 and x_2 are .0032, while the relative error in the data is 5 x 10^{-6}.
The zeros of P(x) are 1, 2, ..., 20 and are distinct. Now perturb the coefficient of x^19 from -210 to -210 + 2^{-23}, leaving the other coefficients unchanged. Wilkinson used a binary computer with t = 30; this change therefore signified a change in the 30th significant base-2 digit. The roots of the perturbed polynomial, carefully computed by Wilkinson, were found to be (reproduced from CMMC, p. 18):

 1.00000 0000        10.09526 6145 +- 0.64350 0904i
 2.00000 0000        11.79363 3881 +- 1.65232 9728i
 3.00000 0000        13.99235 8137 +- 2.51883 0070i
 4.00000 0000        16.73073 7466 +- 2.81262 4894i
 4.99999 9928        19.50243 9400 +- 1.94033 0347i
 6.00000 6944
 6.99969 7234
 8.00726 7603
 8.91725 0249
20.84690 8101
The table shows that certain zeros are more sensitive to the perturbation than are others.
The following analysis, due to Wilkinson (see also Forsythe, Malcolm and Moler CMMC, p. 19),
attempts to explain this phenomenon.
Let the perturbed polynomial be

P(x, eps) = P(x) + eps x^{19}.

The quantity dx_i/d-eps measures the sensitivity of the root x_i = i, i = 1, 2, ..., 20. To compute this number, differentiate the equation P(x, eps) = 0 with respect to eps:

dx_i/d-eps = -( dP/d-eps ) / ( dP/dx ) = -x_i^{19} / prod_{j=1, j != i}^{20} (x_i - x_j).

The values of dx_i/d-eps at x_i = i, i = 1, ..., 20, are listed below (see CMMC, p. 19).
 i    dx_i/d-eps        i    dx_i/d-eps
 1   -8.2 x 10^{-18}   11   -4.6 x 10^{7}
 2    8.2 x 10^{-11}   12    2.0 x 10^{8}
 3   -1.6 x 10^{-6}    13   -6.1 x 10^{8}
 4    2.2 x 10^{-3}    14    1.3 x 10^{9}
 5   -6.1 x 10^{-1}    15   -2.1 x 10^{9}
 6    5.8 x 10^{1}     16    2.4 x 10^{9}
 7   -2.5 x 10^{3}     17   -1.9 x 10^{9}
 8    6.0 x 10^{4}     18    1.0 x 10^{9}
 9   -8.3 x 10^{5}     19   -3.1 x 10^{8}
10    7.6 x 10^{6}     20    4.3 x 10^{7}
Root-Finding and Eigenvalue Computation

The above examples teach us a very useful lesson: it is not a good idea to compute the eigenvalues of a matrix by explicitly finding the coefficients of the characteristic polynomial and evaluating its zeros, since the round-off errors in the computations will invariably put small perturbations in the computed coefficients of the characteristic polynomial, and these small perturbations in the coefficients may cause large changes in the zeros. The eigenvalues will then be computed inaccurately.
The n x n matrix

H = ( 1      1/2      1/3    ...  1/n
      1/2    1/3      1/4    ...  1/(n+1)
      ...                         ...
      1/n    1/(n+1)  ...    ...  1/(2n-1) )

is called the Hilbert matrix, after the celebrated mathematician David Hilbert. The linear system problem, even with a Hilbert matrix of moderate order, is extremely ill-conditioned. For example, take n = 5 and consider solving

Hx = b,

where b = (2.2833, 1.4500, 1.0929, 0.8845, 0.7456)^T. The exact solution is x = (1, 1, 1, 1, 1)^T. Now perturb the (5,1) entry of H in the fifth place, from .2 to .20001. The computed solution with this very slightly perturbed matrix is (0.9937, 1.2857, -0.2855, 2.9997, 0.0001)^T. Note that Cond(H) = O(10^5).

For more examples of ill-conditioned linear system problems see Chapter 6.
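This experiment is easy to repeat. The sketch below uses scipy.linalg.hilbert to build the matrix and mirrors the perturbation described in the text (here b is formed exactly from the all-ones solution rather than rounded to four decimals):

import numpy as np
from scipy.linalg import hilbert

H = hilbert(5)
b = H @ np.ones(5)                 # so the exact solution is (1, 1, 1, 1, 1)

print(np.linalg.cond(H))           # roughly 5e5

Hp = H.copy()
Hp[4, 0] = 0.20001                 # perturb the (5,1) entry in the fifth place
print(np.linalg.solve(Hp, b))      # far from the vector of ones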
Consider the 10 x 10 upper bidiagonal matrix

A = ( 1  1
         1  1         0
            ...  ...
        0        1  1
                    1 ),

that is, a_ii = 1, a_{i,i+1} = 1, and all other entries zero. The eigenvalues of A are all 1. Now perturb the (10,1) entry of A by a small quantity eps = 10^{-10}. The eigenvalues of the perturbed matrix, computed using the software MATLAB to be described in the next chapter (which uses a numerically effective eigenvalue-computation algorithm), were found to be

1.0184 +- 0.0980i
0.9506 +- 0.0876i
1.0764 +- 0.0632i
0.9051 +- 0.0350i
1.0999

(Note the change in the eigenvalues.)
Example 3.7.2  The Wilkinson Bidiagonal Matrix

Again, it should not be thought that an eigenvalue problem can be ill-conditioned only when the eigenvalues are multiple or are close to each other. An eigenvalue problem with well-separated eigenvalues can be very ill-conditioned too. Consider the 20 x 20 triangular matrix (known as the Wilkinson bidiagonal matrix)

A = ( 20  20
          19  20        0
              ...  ...
         0         2  20
                      1 ).

The eigenvalues of A are 1, 2, ..., 20. Now perturb the (20,1) entry of A by eps = 10^{-10}. If the eigenvalues of this slightly perturbed matrix are computed using a stable algorithm (such as the QR iteration method to be described in Chapter 8), it will be seen that some of them change drastically; some even become complex.
In this case also, certain eigenvalues are more ill-conditioned than others. To explain this, Wilkinson computed the condition number of each of the eigenvalues. The condition number of the eigenvalue lambda_i of A is defined to be (see Chapter 8)

Cond(lambda_i) = 1 / |y_i^T x_i|,

where y_i and x_i are, respectively, the normalized left and right eigenvectors of A corresponding to the eigenvalue lambda_i. (Recall that x is a right eigenvector of A associated with an eigenvalue lambda if Ax = lambda x, x != 0; similarly, y is a left eigenvector associated with lambda if y^T A = lambda y^T.)

In our case, the right eigenvector x_r corresponding to lambda_r = r has components (see Wilkinson AEP, p. 91)

( 1,  (20-r)/(-20),  (20-r)(19-r)/(-20)^2,  ...,  (20-r)!/(-20)^{20-r},  0, ..., 0 ),

while the components of y_r are

( 0, 0, ..., 0,  (r-1)!/20^{r-1},  ...,  (r-1)(r-2)/20^2,  (r-1)/20,  1 ).

These vectors are not quite normalized, but still, the reciprocals of their inner products give us an estimate of the condition numbers. In fact, K_r, the condition number of the eigenvalue lambda = r, is

K_r = 1 / |y_r^T x_r| = 20^{19} / ( (20 - r)! (r - 1)! ).

The number K_r is large for all values of r. The smallest K_r for the Wilkinson matrix are K_1 = K_20, about 4.31 x 10^7, and the largest ones are K_10 = K_11, about 3.98 x 10^{12}.
Example 3.7.3 (Wilkinson AEP, p. 92)

Consider the n x n matrix

A = ( n    n-1  n-2  ...  3  2  1
      n-1  n-1  n-2  ...  3  2  1
      0    n-2  n-2  ...  3  2  1
      ...       ...  ...        ...
      0    ...  0    2    2  1
      0    ...       0    1  1 ).

As n increases, the smallest eigenvalues become progressively ill-conditioned. For example, when n = 12, the condition numbers of the first few eigenvalues are of order unity, while those of the last three are of order 10^7.
Examples (Bunch).

1. Gaussian elimination with pivoting is strongly stable on the class of nonsingular matrices.

2. The Cholesky algorithm for computing the factorization A = HH^T of a symmetric positive definite matrix is strongly stable on the class of symmetric positive definite matrices. (See Chapter 6 for a description of the Cholesky algorithm.)
James R. Bunch is a professor of mathematics at the University of California at San Diego. He is well known for his work on efficient factorization of symmetric matrices (popularly known as the Bunch-Kaufman and the Bunch-Parlett factorization procedures), and for his work on stability and conditioning.
3. Gaussian elimination without pivoting is strongly stable on the class of nonsingular diagonally dominant matrices.

By analogy with this definition of strong stability, Bunch also introduced the concept of weak stability, which depends upon the conditioning of the problem.

Definition 3.8.2  An algorithm is weakly stable for a class of matrices C if for each well-conditioned matrix in C the algorithm produces an acceptably accurate solution.
Thus, an algorithm for solving the linear system Ax = b is weakly stable for a class of matrices C if for each well-conditioned matrix A in C and for each b, the computed solution x-hat of Ax = b is such that

||x - x-hat|| / ||x||  is small.

Bunch was motivated to introduce this definition to point out that the well-known (and frequently used by engineers) Levinson algorithm for solving linear systems involving Toeplitz matrices (a matrix T = (t_ij) is Toeplitz if the entries along each diagonal are the same) is weakly stable on the class of symmetric positive definite Toeplitz matrices. This very important and remarkable result was proved by Cybenko (1980). The result was important because the signal processing community had been using the Levinson algorithm routinely for years, without fully investigating the stability behavior of this important algorithm.
Mild Stability: We have dened an algorithm to be backward stable if the algorithm produces
a solution that is an exact solution of a nearby problem. But it might very well happen that an
algorithm produces a solution that is only close to the exact solution of a nearby problem.
George Cybenko is a professor of electrical engineering and computer science at Dartmouth College. He has made
substantial contributions in numerical linear algebra and signal processing.
What should we then call such an algorithm?

Van Dooren, following de Jong (1977), has called such an algorithm a mixed stable algorithm, and Stewart (IMC, 1973) has defined such an algorithm as simply a stable algorithm, under the additional restriction that the data of the nearby problem and the original data belong to the same class. We believe that it is more appropriate to call such stability mild stability; after all, such an algorithm is stable in a mild sense. We thus define:

Definition 3.8.3  An algorithm is mildly stable if it produces a solution that is close to the exact solution of a nearby problem.
Example 3.8.1

1. The QR algorithm for the rank-deficient least squares problem is mildly stable (see Lawson and Hanson SLP, p. 95, and Chapter 7 of this book).

2. The QR algorithm for the full-rank underdetermined least squares problem is mildly stable (see Lawson and Hanson SLP, p. 93, and Chapter 7 of this book).
105
2. Stability of an Algorithm: An algorithm is said to be backward stable if it computes the exact solution of a nearby problem. Some examples of stable algorithms are: back substitution and forward elimination for triangular systems, Gaussian elimination with pivoting for linear systems, QR factorization using Householder and Givens transformations, the QR iteration algorithm for eigenvalue computations, etc.

The Gaussian elimination algorithm without row interchanges is unstable for arbitrary matrices.

3. Effects of conditioning and stability on the accuracy of the solution: The conditioning of the problem and the stability of the algorithm both affect the accuracy of the solution computed by the algorithm.

If a stable algorithm is applied to a well-conditioned problem, it should compute an accurate solution. On the other hand, if a stable algorithm is applied to an ill-conditioned problem, there is no guarantee that the computed solution will be accurate; the definition of backward stability does not imply that. However, if a stable algorithm is applied to an ill-conditioned problem, it should not introduce more error than the data warrants.
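The point of item 3 can be seen in a few lines of MATLAB (a minimal sketch, not from the text; the Hilbert matrix is used only as a convenient ill-conditioned example): a backward stable solver produces a tiny residual, yet the forward error may still be large because the problem is ill-conditioned.

n = 12;
A = hilb(n);                   % notoriously ill-conditioned Hilbert matrix
x_true = ones(n,1);
b = A*x_true;
x_hat = A\b;                   % Gaussian elimination with partial pivoting
rel_residual  = norm(b - A*x_hat)/(norm(A)*norm(x_hat))   % tiny: stability
forward_error = norm(x_hat - x_true)/norm(x_true)         % may be large
kappa = cond(A)                                            % conditioning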
106
Exercises on Chapter 3

Note: Use MATLAB (see Chapter 4 and the Appendix) whenever appropriate.

1. (a) Show that the floating point computations of the sum, product and division of two numbers are backward stable.

(b) Are the floating point computations of the inner and outer products of two vectors backward stable? Give reasons for your answer.
2. Find the growth factor of Gaussian elimination without pivoting for the following matrices:
$$\begin{pmatrix} 0.00001 & 1 \\ 1 & 1 \end{pmatrix}, \qquad \begin{pmatrix} 1 & 1 \\ 0.00001 & 1 \end{pmatrix}, \qquad \begin{pmatrix} 1 & -1 & \cdots & \cdots & -1 \\ 0 & 1 & -1 & \cdots & -1 \\ \vdots & & \ddots & \ddots & \vdots \\ \vdots & & & 1 & -1 \\ 0 & \cdots & \cdots & 0 & 1 \end{pmatrix}_{10\times 10},$$
$$\begin{pmatrix} 1 & 1 \\ 1 & 0.9 \end{pmatrix}, \qquad \begin{pmatrix} 1 & 1 \\ 1 & 0.99 \end{pmatrix}, \qquad \begin{pmatrix} 1 & 1 \\ 1 & 0.999 \end{pmatrix}, \qquad \begin{pmatrix} 1.0001 & 1 \\ 1 & 1 \end{pmatrix},$$
$$\begin{pmatrix} 1 & 1 & 1 \\ 1 & 0.9 & 0.81 \\ 1 & 1.9 & 3.61 \end{pmatrix}, \qquad \begin{pmatrix} 1 & 1 & 1 \\ 1 & 0.9 & 0.9 \\ 1 & 1.9 & 3.61 \end{pmatrix}, \qquad \begin{pmatrix} 1 & \tfrac{1}{2} & \tfrac{1}{3} \\ \tfrac{1}{2} & \tfrac{1}{3} & \tfrac{1}{4} \\ \tfrac{1}{3} & \tfrac{1}{4} & \tfrac{1}{5} \end{pmatrix}.$$
6. Are the following floating point computations backward stable? Give reasons for your answer in each case.

(a) fl(x(y + z))

(b) fl(x_1 + x_2 + ... + x_n)

(c) fl(x_1 x_2 ... x_n)

(d) fl(x^T y / c), where x and y are vectors and c is a scalar

(e) fl(sqrt(x_1^2 + x_2^2 + ... + x_n^2))
7. Find the growth factor of Gaussian elimination for each of the following matrices and hence conclude that Gaussian elimination for linear systems with these matrices is backward stable.

(a) $\begin{pmatrix} 10 & 1 & 1 \\ 1 & 10 & 1 \\ 1 & 1 & 10 \end{pmatrix}$

(b) $\begin{pmatrix} 4 & 0 & 2 \\ 0 & 4 & 0 \\ 2 & 0 & 5 \end{pmatrix}$

(c) $\begin{pmatrix} 10 & 1 & 1 \\ 1 & 15 & 5 \\ 1 & 5 & 14 \end{pmatrix}$

8. Show that Gaussian elimination without pivoting for the matrix
$$\begin{pmatrix} 10 & 1 & 1 \\ 1 & 10 & 1 \\ 1 & 1 & 10 \end{pmatrix}$$
is strongly stable.
9. Let H be an unreduced upper Hessenberg matrix. Find a diagonal matrix D such that D^{-1}HD is a normalized upper Hessenberg matrix (that is, all subdiagonal entries are 1). Show that the transforming matrix D must be ill-conditioned if one or several subdiagonal entries of H are very small. Do a numerical example of order 5 to verify this.

10. Show that the roots of the following polynomials are ill-conditioned:

(a) x^3 - 3x^2 + 3x + 1

(b) (x - 1)^3 (x - 2)

(c) (x - 1)(x - 0.99)(x - 2)

11. Using the result of problem #5, show that matrix-vector multiplication with an ill-conditioned matrix may give rise to a large relative error in the computed result. Construct your own 2 x 2 example to see this.
12. Write the following small SUBROUTINES for future use:

(1) MPRINT(A,n) to print a square matrix A of order n.

(2) TRANS(A,TRANS,n) to compute the transpose of a matrix.

(3) TRANS(A,n) to compute the transpose of a matrix, where the transpose overwrites A.

(4) MMULT(A,B,C,m,n,p) to multiply C = A_{m x n} B_{n x p}.

(5) SDOT(n,x,y,answer) to compute the inner product of two n-vectors x and y in single precision.

(6) SAXPY(n,a,x,y) to compute y <- ax + y in single precision, where a is a scalar and x and y are vectors. (The symbol y <- ax + y means that the computed result of a times x plus y will be stored in y.)

(7) IMAX(n,x,MAX) to find i such that |x_i| = max{|x_j| : j = 1, ..., n}.

(8) SWAP(n,x,y) to swap two vectors x and y.

(9) COPY(n,x,y) to copy a vector x to y.

(10) NORM2(n,x,norm) to find the Euclidean length of a vector x.

(11) SASUM(n,x,sum) to find sum = |x_1| + |x_2| + ... + |x_n|.

(12) NRMI(x,n) to compute the infinity norm of an n-vector x.

(13) Rewrite the above routines in double precision.

(14) SNORM(m,n,A,LDA) to compute the 1-norm of a matrix A_{m x n}. LDA is the leading dimension of the array A.

(15) Write subroutines to compute the infinity and Frobenius norms of a matrix.

(16) Write a subroutine to find the largest element in magnitude in a column vector.

(17) Write a subroutine to find the largest element in magnitude in a matrix.

(Note: Some of these subroutines are a part of the BLAS (LINPACK). See also the book Handbook for Matrix Computations by T. Coleman and Charles Van Loan, SIAM, 1988.)
109
4. NUMERICALLY EFFECTIVE ALGORITHMS AND MATHEMATICAL SOFTWARE
4.1 Definitions and Examples ........ 110
4.2 Flop-Count and Storage Considerations for Some Basic Algorithms ........ 113
4.3 Some Existing High-Quality Mathematical Software for Linear Algebra Problems ........ 122
4.3.1 LINPACK ........ 122
4.3.2 EISPACK ........ 122
4.3.3 LAPACK ........ 123
4.3.4 NETLIB ........ 124
4.3.5 NAG ........ 125
4.3.6 IMSL ........ 125
4.3.7 MATLAB ........ 125
4.3.8 MATLAB Codes and MATLAB Toolkit ........ 126
4.3.9 The ACM Library ........ 126
4.3.10 ITPACK (Iterative Software Package) ........ 126
4.4 Review and Summary ........ 126
4.5 Suggestions for Further Reading ........ 127
CHAPTER 4
NUMERICALLY EFFECTIVE ALGORITHMS
AND MATHEMATICAL SOFTWARE
4. NUMERICALLY EFFECTIVE ALGORITHMS AND MATHEMATICAL SOFTWARE
4.1 Definitions and Examples

Solving a problem on a computer involves the following major steps, performed in sequence:

1. Making a mathematical model of the problem, that is, translating the problem into the language of mathematics. For example, the mathematical models of many engineering problems are sets of ordinary and partial differential equations.

2. Finding or developing constructive methods (theoretical numerical algorithms) for solving the mathematical model. This step usually consists of a literature search to find what methods are available for the problem.

3. Identifying the best method from a numerical point of view (the best one may be a combination of several others). We call it the numerically effective method.

4. Finally, implementing on the computer the numerically effective method identified in step 3. This amounts to writing and executing a reliable and efficient computer program based on the identified numerically effective method, and may also require exploitation of the target computer architecture.
The purpose of creating mathematical software is to provide a scientist or engineer with a piece of computer program that can be used with confidence to solve the problem for which the software was designed. Thus, mathematical software should be of high quality.

Let's be specific about what we mean when we call a piece of software high quality mathematical software. High quality mathematical software should have the following features. It should be

1. Powerful and flexible — it can be used to solve several different variations of the original problem and the closely associated problems. For example, closely associated with the linear system problem Ax = b are:

(a) Computing the inverse of A, i.e., finding an X such that AX = I. Though finding the inverse of A and solving Ax = b are equivalent problems, solving a linear system using the inverse of the system matrix is not advisable. Computing the inverse explicitly should be avoided, unless a specific application really calls for it.

(b) Finding the determinant and rank of A.

(c) Solving AX = B, where B is a matrix, etc.

Also, a matrix problem may have some special structure. It may be positive definite, banded, Toeplitz, dense, sparse, etc. The software should state clearly what variations of the problem it can handle and whether it is special-structure oriented.
2. Easy to read and modify — The software should be well documented. The documentation should be clear and easy to read, even for a non-technical user, so that if some modifications are needed, they can be made easily. To quote from the cover page of Forsythe, Malcolm and Moler (CMMC):

"... it is an order of magnitude easier to write two good subroutines than to decide which one is best. In choosing among the various subroutines available for a particular problem, we placed considerable emphasis on the clarity and style of programming. If several subroutines have comparable accuracy, reliability, and efficiency, we have chosen the one that is the least difficult to read and use."

3. Portable — It should be able to run on different computers with few or no changes.

4. Robust — It should be able to deal with unexpected situations during execution.

5. Based on a numerically effective algorithm — It should be based on an algorithm that has attractive numerical properties.
We have used the expression "numerically effective" several times without qualification. This is the most important component of high quality mathematical software. We shall call a matrix algorithm numerically effective if it is:

(a) General purpose — The algorithm should work for a wide class of matrices.

(b) Reliable — The algorithm should give a warning whenever it is on the verge of breakdown due to excessive round-off errors or to not being able to meet some specified criterion of convergence. There are algorithms which produce completely wrong answers without giving any warning at all. Gaussian elimination without pivoting (for the linear system or an equivalent problem) is one such algorithm. It is not reliable.

(c) Stable — The total rounding errors of the algorithm should not exceed the errors that are inherent in the original problem (see the earlier section on stability).

(d) Efficient — The efficiency of an algorithm is measured by the amount of computer time consumed in its implementation. Theoretically, the number of floating-point operations needed to implement the algorithm indicates its efficiency.
111
Definition 4.1.1 A floating-point operation, or flop, is the amount of computer time required to execute a Fortran statement of the form C(I,J) = C(I,J) + A(I,K)*B(K,J); that is, roughly the time for one floating-point multiplication and one floating-point addition, together with the associated subscripting.

Definition 4.1.2 A matrix algorithm involving computation with matrices of order n will be called an efficient algorithm if it takes no more than O(n^3) flops. (The historical Cramer's rule for solving a linear system is therefore not efficient, since O(n!) flops are required for its execution.) (See Chapter 6.)

One point is well worth mentioning here. An algorithm may be efficient, but still unstable. For example, Gaussian elimination without pivoting requires about n^3/3 flops for an n x n matrix. Therefore, while it is efficient, it is unreliable and unstable for an arbitrary matrix.

(e) Economic in the use of storage — Usually, about n^2 storage locations are required to store a dense matrix of order n. Therefore, if an algorithm requires storage of several matrices during its execution, a large number of storage locations will be needed even when n is moderate. Thus, it is important to give special attention to economy of storage while designing an algorithm.
112
By carefully rearranging an algorithm, one can greatly reduce its storage requirement (examples of this will be presented later). In general, if a matrix generated during execution of the algorithm is not needed for future use, it should be overwritten by another computed quantity.

Let x = (x_1, ..., x_n)^T and y = (y_1, ..., y_n)^T be two n-vectors. Then the inner product
$$z = x^T y = \sum_{i=1}^{n} x_i y_i$$
can be computed as (Algorithm 3.1.2):

For i = 1, 2, ..., n do
    z = z + x_i y_i

Just one flop is needed for each i, so a total of n flops is needed to execute the algorithm.
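In MATLAB, the same loop might be written as follows (a minimal sketch; the function name inner_product is a hypothetical illustration):

function z = inner_product(x, y)
n = length(x);
z = 0;
for i = 1:n
    z = z + x(i)*y(i);        % one multiply-add ("flop") per i; n flops in all
end
end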
Example 4.2.2 Outer Product Computation

The outer product xy^T is the n x n matrix Z shown below. The (i,j)th component of the matrix is x_i y_j. Since there are n^2 components and each component requires one multiplication, the outer-product computation requires n^2 flops and n^2 storage locations. However, very often one does not require the matrix from the outer product explicitly.
$$Z = xy^T = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}(y_1 \;\; y_2 \;\; \cdots \;\; y_n) = \begin{pmatrix} x_1y_1 & x_1y_2 & \cdots & x_1y_n \\ x_2y_1 & x_2y_2 & \cdots & x_2y_n \\ \vdots & & & \vdots \\ x_ny_1 & x_ny_2 & \cdots & x_ny_n \end{pmatrix}.$$

For an n x n matrix A = (a_ij) and an n-vector b = (b_1, ..., b_n)^T, the matrix-vector product is
$$Ab = \begin{pmatrix} a_{11}b_1 + a_{12}b_2 + \cdots + a_{1n}b_n \\ a_{21}b_1 + a_{22}b_2 + \cdots + a_{2n}b_n \\ \vdots \\ a_{n1}b_1 + a_{n2}b_2 + \cdots + a_{nn}b_n \end{pmatrix}.$$

Flop-count. Each component of the vector Ab requires n multiplications and additions. Since there are n components, the computation of Ab requires n^2 flops.
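The count is easy to see from an explicit double loop, as in this minimal MATLAB sketch (the function name matvec is a hypothetical illustration):

function c = matvec(A, b)
n = size(A,1);
c = zeros(n,1);
for i = 1:n
    for j = 1:n
        c(i) = c(i) + A(i,j)*b(j);   % one flop per (i,j) pair: n^2 flops
    end
end
end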
114
An Explanation of the Above Pseudocode

Note that in the above pseudocode j is the index of the inner loop and i the index of the outer loop. For each value of i from 1 to n, j takes the values i through n.

Flop-Count.

1. Computing c_ij requires j - i + 1 multiplications.

2. Since j runs from i to n and i runs from 1 to n, the total number of multiplications is
$$\sum_{i=1}^{n}\sum_{j=i}^{n}(j-i+1) = \sum_{i=1}^{n}\bigl(1 + 2 + \cdots + (n-i+1)\bigr) = \sum_{i=1}^{n}\frac{(n-i+1)(n-i+2)}{2} \approx \frac{n^3}{6} \quad (\text{for large } n).$$
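The following minimal MATLAB sketch illustrates this count, assuming the pseudocode above refers to the product of two upper triangular matrices (the function name triu_mult is a hypothetical illustration): only the upper triangle of C is computed, and c_ij = a_{i,i} b_{i,j} + ... + a_{i,j} b_{j,j} needs j - i + 1 multiplications.

function C = triu_mult(A, B)
n = size(A,1);
C = zeros(n);
for i = 1:n
    for j = i:n                       % only the upper triangle is nonzero
        for k = i:j                   % j - i + 1 multiplications
            C(i,j) = C(i,j) + A(i,k)*B(k,j);
        end
    end
end
end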
115
Flop-count. There are n multiplications in computing each c_ij. Since j runs from 1 to p and i runs from 1 to m, the total number of multiplications is mnp. Thus, for two square matrices A and B, each of order n, this count is n^3.

The algorithm above for matrix-matrix multiplication of two n x n matrices will obviously require n^2 storage locations for each matrix. However, it can be rewritten in such a way that there is a substantial saving in storage, as illustrated below.

The following algorithm overwrites B with the product AB, assuming that an additional column has been annexed to B (alternatively, one can have a work vector to hold values temporarily).

Example 4.2.6 Matrix-Matrix Product with Economy in Storage

For j = 1, 2, ..., n do

1. h_i = sum_{k=1}^{n} a_ik b_kj   (i = 1, 2, ..., n)

2. b_ij <- h_i   (i = 1, 2, ..., n)

(h is a temporary work vector.)
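In MATLAB, Example 4.2.6 might look as follows (a minimal sketch, using a separate work vector h rather than an annexed column; mmult_inplace is a hypothetical name):

function B = mmult_inplace(A, B)
n = size(A,1);
h = zeros(n,1);                   % temporary work vector
for j = 1:n
    for i = 1:n
        h(i) = A(i,:)*B(:,j);     % h_i = sum_k a_ik * b_kj
    end
    B(:,j) = h;                   % overwrite the j-th column of B
end
end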
Example 4.2.7 Computation of (I - (2/(u^T u)) uu^T) A

We show below that this matrix product can implicitly be performed with O(n^2) flops. The key observation here is that we do not need to form the matrix I - (2/(u^T u)) uu^T explicitly.

The following algorithm computes the product; it overwrites A with the product. Let
$$\beta = \frac{2}{u^T u}.$$
Then
$$\Bigl(I - \frac{2uu^T}{u^T u}\Bigr)A \quad\text{becomes}\quad A - \beta\, uu^T A.$$
Let u = (u_1, u_2, ..., u_n)^T. Then the (i,j)th entry of (A - \beta uu^T A) is equal to a_ij - \beta(u_1 a_{1j} + u_2 a_{2j} + \cdots + u_n a_{nj}) u_i. Thus, for an m x n matrix A and an m-vector u, we proceed as follows.

1. Compute \beta = 2/(u^T u).

2. For j = 1, 2, ..., n do
       \alpha = \beta(u_1 a_{1j} + u_2 a_{2j} + \cdots + u_m a_{mj})
       For i = 1, 2, ..., m do
           a_ij <- a_ij - \alpha u_i

Flop-Count.

1. It takes (m + 1) flops to compute \beta (m flops to compute the inner product and 1 flop to divide 2 by the inner product).

2. There are n \alpha's, and each costs (m + 1) flops. Thus, we need n(m + 1) flops to compute the \alpha's.

3. There are mn entries a_ij to compute. Each a_ij costs just one flop, once the \alpha's are computed.
117
We now summarize the above very important result (which will be used repeatedly in this book) in the following:

Flop-count for the Product (I - (2/(u^T u)) uu^T) A: forming the product implicitly as above costs (m + 1) + n(m + 1) + mn flops, that is, about 2mn flops for an m x n matrix A.
A Numerical Example

Let
$$u = (1, 1, 1)^T, \qquad A = \begin{pmatrix} 1 & 1 \\ 2 & 1 \\ 0 & 0 \end{pmatrix}.$$
Then \beta = 2/3.

j = 1 (compute the first column of the product):
$$\alpha = \beta(u_1a_{11} + u_2a_{21} + u_3a_{31}) = \tfrac{2}{3}(1 + 2 + 0) = 2,$$
$$a_{11} \leftarrow a_{11} - \alpha u_1 = 1 - 2 = -1, \quad a_{21} \leftarrow a_{21} - \alpha u_2 = 2 - 2 = 0, \quad a_{31} \leftarrow a_{31} - \alpha u_3 = 0 - 2 = -2.$$

j = 2 (compute the second column of the product):
$$\alpha = \beta(u_1a_{12} + u_2a_{22} + u_3a_{32}) = \tfrac{2}{3}(1 + 1 + 0) = \tfrac{4}{3},$$
$$a_{12} \leftarrow 1 - \tfrac{4}{3} = -\tfrac{1}{3}, \quad a_{22} \leftarrow 1 - \tfrac{4}{3} = -\tfrac{1}{3}, \quad a_{32} \leftarrow 0 - \tfrac{4}{3} = -\tfrac{4}{3}.$$

Thus,
$$A \leftarrow \Bigl(I - \frac{2uu^T}{u^Tu}\Bigr)A = \begin{pmatrix} -1 & -\tfrac{1}{3} \\ 0 & -\tfrac{1}{3} \\ -2 & -\tfrac{4}{3} \end{pmatrix}.$$
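The same computation in MATLAB, as a minimal sketch (not the book's code), using the example data above:

u = [1; 1; 1];
A = [1 1; 2 1; 0 0];
beta = 2/(u'*u);                      % beta = 2/3
for j = 1:size(A,2)
    alpha = beta*(u'*A(:,j));         % alpha = beta*(u1*a1j + ... + um*amj)
    A(:,j) = A(:,j) - alpha*u;        % a_ij <- a_ij - alpha*u_i
end
A                                     % gives [-1 -1/3; 0 -1/3; -2 -4/3]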
119
Example 4.2.8 Flop-count for Algorithm 3.1.3 (Back Substitution Process)

From the pseudocode of this algorithm, we see that it takes one flop to compute y_n, two flops to compute y_{n-1}, and so on. Thus, to compute y_1 through y_n, we need (1 + 2 + 3 + ... + n) = n(n+1)/2, or about n^2/2, flops.
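A minimal MATLAB sketch of back substitution for an upper triangular system Ty = b (the function name back_substitution is a hypothetical illustration); computing y_k costs about n - k + 1 flops, giving roughly n^2/2 flops in all:

function y = back_substitution(T, b)
n = length(b);
y = zeros(n,1);
for k = n:-1:1
    y(k) = (b(k) - T(k,k+1:n)*y(k+1:n))/T(k,k);   % empty product is 0 when k = n
end
end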
Example 4.2.9 Flop-count and Storage Considerations for Algorithm 3.1.4 (The Inverse of an Upper Triangular Matrix)

Let's state the algorithm once more here:

For k = n, n-1, ..., 2, 1 do
    s_kk = 1/t_kk
    For i = k-1, k-2, ..., 1 do
        s_ik = -(1/t_ii) * sum_{j=i+1}^{n} t_ij s_jk

Flop-count.

k = 1:  1 flop
k = 2:  3 flops
k = 3:  6 flops
  ...
k = n:  n(n+1)/2 flops

Total flops = 1 + 3 + 6 + ... + n(n+1)/2 =
$$\sum_{r=1}^{n}\frac{r(r+1)}{2} = \sum_{r=1}^{n}\frac{r^2}{2} + \sum_{r=1}^{n}\frac{r}{2} \approx \frac{n^3}{6} \quad \text{(approximately)}.$$
120
Flop-count for Computing the Inverse of an Upper Triangular Matrix: it requires about n^3/6 flops to compute the inverse of an upper triangular matrix.

Storage Considerations. Since the inverse of an upper triangular matrix is an upper triangular matrix, and it is clear from the algorithm that we can overwrite t_ik by s_ik, the algorithm can be rewritten so that it overwrites T with the inverse S, as sketched below.
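One possible in-place version, as a minimal MATLAB sketch (not the book's code; triu_inverse_inplace is a hypothetical name): the columns are processed from the last to the first, so the entries of T read on the right-hand side are still the original t_ij when they are needed.

function T = triu_inverse_inplace(T)
n = size(T,1);
for k = n:-1:1
    T(k,k) = 1/T(k,k);                             % s_kk = 1/t_kk
    for i = k-1:-1:1
        % s_ik = -(1/t_ii)*sum_{j=i+1}^{k} t_ij*s_jk (terms with j > k are zero)
        T(i,k) = -(T(i,i+1:k)*T(i+1:k,k))/T(i,i);
    end
end
end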
Example 4.2.10 Flop-count and Storage Considerations for Algorithm 3.1.5 (Gaussian Elimination)

It will be shown in Chapter 5 that the Gaussian elimination algorithm takes about n^3/3 flops. The algorithm can overwrite A with the upper triangular matrix A^(n-1); in fact, it can overwrite A with each A^(k). The multipliers m_ik can be stored in the lower half of A as they are computed. Each b^(k) can overwrite the vector b.

Gaussian Elimination Algorithm with Economy in Storage

For k = 1, 2, ..., n-1 do
    For i = k+1, ..., n do
        a_ik <- m_ik = -a_ik/a_kk
        For j = k+1, ..., n do
            a_ij <- a_ij + m_ik a_kj
        b_i <- b_i + m_ik b_k
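As a minimal MATLAB sketch (not the book's code; gauss_elim_inplace is a hypothetical name), the storage-economic algorithm above can be written so that A holds U in its upper part and the multipliers in its lower part, while b is overwritten with the transformed right-hand side:

function [A, b] = gauss_elim_inplace(A, b)
n = size(A,1);
for k = 1:n-1
    for i = k+1:n
        A(i,k) = -A(i,k)/A(k,k);                     % multiplier m_ik, stored in A
        A(i,k+1:n) = A(i,k+1:n) + A(i,k)*A(k,k+1:n); % update row i
        b(i) = b(i) + A(i,k)*b(k);                   % update right-hand side
    end
end
end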
4.3.1 LINPACK
LINPACK is \a collection of Fortran subroutines which analyze and solve various systems of si-
multaneous linear algebraic equations. The subroutines are designed to be completely machine
independent, fully portable, and to run at near optimum eciency in most operating environ-
ments." (Quotation from LINPACK Users' Guide.)
Though primarily intended for linear systems, the package also contains routines for the singu-
lar value decomposition (SVD) and problems associated with linear systems such as computing
the inverse, the determinant, and the linear least square problem. Most of the routines are
for square matrices, but some handle rectangular coecient matrices associated with overdeter-
mined or underdetermined problems.
The routines are meant for small and dense problems of order less than a few hundred and
band matrices of order less than several thousand. There are no routines for iterative methods.
4.3.2 EISPACK
EISPACK is an eigensystem package. The package is primarily designed to compute the eigenvalues
and eigenvectors of a matrix; however, it contains routines for the generalized eigenvalue problem
of the form Ax = Bx and for the singular value decomposition.
122
The eigenvalues of an arbitrary matrix A are computed in several sequential phases. First,
the matrix A is balanced. If it is nonsymmetric the balanced matrix is then reduced to an upper
Hessenberg by matrix similarities (if it is symmetric, it is reduced to symmetric tridiagonal). Fi-
nally the eigenvalues of the transformed upper Hessenberg or the symmetric tridiagonal matrix are
computed using the implicit QR iterations or the Sturm-sequence method. There are EISPACK
routines to perform all these tasks.
4.3.3 LAPACK
The building blocks of numerical linear algebra algorithms are the three levels of BLAS (Basic Linear Algebra Subroutines):

Level 1 BLAS: These are vector-vector operations. A typical Level 1 BLAS operation is of the form y <- \alpha x + y, where x and y are vectors and \alpha is a scalar.

Level 2 BLAS: These are matrix-vector operations. A typical Level 2 BLAS operation is of the form y <- \alpha Ax + y.

Level 3 BLAS: These are matrix-matrix operations. A typical Level 3 BLAS operation is of the form C <- \alpha AB + C.
Level 1 BLAS are used in LINPACK. Unfortunately, algorithms composed of Level 1 BLAS are not suitable for achieving high efficiency on most of today's supercomputers, such as CRAY computers.

While Level 2 BLAS can give good speed (sometimes almost peak speed) on many vector computers, such as the CRAY X-MP or CRAY Y-MP, they are not suitable for high efficiency on other modern supercomputers (e.g., the CRAY 2).

The Level 3 BLAS are ideal for most of today's supercomputers. They can perform O(n^3) floating-point operations on O(n^2) data.
Therefore, during the last several years, an intensive attempt was made by numerical linear
algebraists to restructure the traditional BLAS-1 based algorithms into algorithms rich in BLAS-2
and BLAS-3 operations. As a result, there now exist algorithms of these types, called blocked
algorithms, for most matrix computations. These algorithms have been implemented in a software
package called LAPACK.
"LAPACK is a transportable library of Fortran 77 subroutines for solving the most common problems in numerical linear algebra: systems of linear equations, linear least squares problems, eigenvalue problems, and singular value problems. It has been designed to be efficient on a wide range of modern high-performance computers.

LAPACK is designed to supersede LINPACK and EISPACK, principally by restructuring the software to achieve much greater efficiency on vector processors, high-performance "super-scalar" workstations, and shared memory multiprocessors. LAPACK also adds extra functionality, uses some new or improved algorithms, and integrates the two sets of algorithms into a unified package.
The LAPACK Users' Guide gives an informal introduction to the design of the algorithms and
software, summarizes the contents of the package, describes conventions used in the software and
its documentation, and includes complete specications for calling the routines." (Quotations from
the cover page of LAPACK Users' Guide.)
4.3.4 NETLIB
Netlib stands for network library. LINPACK, EISPACK, and LAPACK subroutines are available electronically from this library, along with many other types of software for matrix computations.
A user can obtain software from these packages by sending electronic mail to
[email protected]
Information on the subroutines available in a given package can be obtained by sending e-mail:
send index from {library}
Thus, to get a subroutine called SGECO from LINPACK, send the message:
send SGECO from LINPACK
124
(This message will send you SGECO and other routines that call SGECO.) Xnetlib, which uses an
X Window interface for direct downloading, is also available (and more convenient) once installed.
Further information about Netlib can be obtained by anonymous FTP to either of the following
sites:
netlib2.cs.utk.edu
research.att.com
4.3.5 NAG
NAG stands for the Numerical Algorithms Group. This group has developed a large software library (also called NAG) containing routines for most computational problems, including numerical linear algebra problems, numerical differential equation problems (both ordinary and partial), optimization problems, integral equation problems, statistical problems, etc.
4.3.6 IMSL
IMSL stands for International Mathematical and Statistical Libraries. As the title suggests, this
library contains routines for almost all mathematical and statistical computations.
4.3.7 MATLAB
The name MATLAB stands for MATrix LABoratory. It is an interactive computing system designed for easy computation of various matrix-based scientific and engineering problems. MATLAB provides easy access to matrix software developed by the LINPACK and EISPACK software projects.

MATLAB can be used to solve a linear system and associated problems (such as inverting a matrix or computing the rank and determinant of a matrix), to compute the eigenvalues and eigenvectors of a matrix, to find the singular value decomposition of a matrix, to compute the zeros of a polynomial, to compute generalized eigenvalues and eigenvectors, etc. MATLAB is an extremely useful and valuable package for testing algorithms on small problems and for use in the classroom. It has indeed become an indispensable tool for teaching applied and numerical linear algebra in the classroom. A remarkable feature of MATLAB is its graphics capability (see more about MATLAB in the Appendix).

There is a student edition of MATLAB, published by Prentice Hall, 1992. It is designed for easy use in the classroom.
125
4.3.8 MATLAB Codes and MATLAB Toolkit

MATLAB codes for selected algorithms described in this book are provided for beginning students in the APPENDIX.

Furthermore, an interactive MATLAB toolkit called MATCOM, implementing all the major algorithms (to be taught in the first course), will be provided along with the book, so that students can compare different algorithms for the same problem with respect to numerical efficiency, stability, accuracy, etc.

... naively, ignoring the structure of H. This computation forms a basis for many other matrix computations described later in the book.

5. A statement about each of several high quality mathematical software packages, such as LINPACK, EISPACK, LAPACK, MATLAB, IMSL, NAG, etc., has been provided in the final section.
127
Stewart's book (IMC) is a valuable source for learning how to organize and develop algorithms for basic matrix operations in time-efficient and storage-economic ways.

Each software package has its own users' manual that describes in detail the functions of the subroutines, how to use them, etc. We now list the most important ones.
1. The LINPACK Users' Guide by J. Dongarra, J. Bunch, C. Moler and G. W. Stewart,
SIAM, Philadelphia, PA, 1979.
2. The EISPACK Users' Guide can be obtained from either NESC or IMSL. Matrix Eigen-
system Routines EISPACK Guide by B. T. Smith, J. M. Boyle, J. J. Dongarra, B. S.
Garbow, Y. Ikebe, V. C. Klema, and C. B. Moler, has been published by Springer-Verlag,
Berlin, 1976, as volume 6 of Lecture Notes in Computer Science. A later version, prepared by
Garbow, Boyle, Dongarra and Moler in 1977, is also available as a Springer-Verlag publication.
3. LAPACK Users' Guide, prepared by Chris Bischof, James Demmel, Jack Dongarra, Jeremy Du Croz, Anne Greenbaum, Sven Hammarling, and Danny Sorensen, is available from SIAM. SIAM's address and telephone number are: SIAM (Society for Industrial and Applied Mathematics), 3600 University City Science Center, Philadelphia, PA 19104-2688; Tel: (215) 382-9800.
4. The NAG Library and the associated users' manual can be obtained from:
Numerical Algorithms Group, Inc.
1400 Opus Place, Suite 200
Downers Grove, IL 60151-5702
5. IMSL: The IMSL software library and documentation are available from:
IMSL, Houston, Texas
6. MATLAB: The MATLAB software and Users' Guide are available from:
The MathWorks, Inc.
Cochituate Place
24 Prime Park Way
Natick, MA 01760-1520
TEL: (508) 653-1415; FAX: (508) 653-2997
e-mail: [email protected]
The student edition of MATLAB has been published by Prentice Hall, Englewood Cliffs, NJ 07632. A 5 1/4-inch disk for MS-DOS personal computers is included with the book.
128
For more information on accessing mathematical software electronically, the paper \Distribution
of Mathematical Software via Electronic Mail" by J. Dongarra and E. Grosse, Communications of
the ACM, 30(5) (1987), 403-407, is worth reading.
Finally, a nice survey of the blocked algorithms has been given by James Demmel (1993).
129
Exercises on Chapter 4

1. Develop an algorithm to compute the product C = AB in each of the following cases. Your algorithm should take advantage of the special structure of the matrices in each case. Give the flop-count and storage requirement in each case.

(a) A and B are both lower triangular matrices.
(b) A is arbitrary and B is lower triangular.
(c) A and B are both tridiagonal.
(d) A is arbitrary and B is upper Hessenberg.
(e) A is upper Hessenberg and B is tridiagonal.
(f) A is upper Hessenberg and B is upper triangular.

2. A square matrix A = (a_ij) is said to be a band matrix of bandwidth 2k + 1 if
a_ij = 0 whenever |i - j| > k.
Develop an algorithm to compute the product C = AB, where A is arbitrary and B is a band matrix of bandwidth 2, taking advantage of the structure of the matrix B. Overwrite A with AB and give the flop-count.
3. Using the ideas in Algorithm 4.2.1, develop an algorithm to compute the product A(I + xy^T), where A is an n x n matrix and x and y are n-vectors. Your algorithm should require roughly 2n^2 flops.

4. Rewrite the algorithm of problem #3 in the special cases when the matrix I + xy^T is

(a) an elementary matrix: I + m e_k^T, m = (0, 0, ..., 0, m_{k+1,k}, ..., m_{n,k})^T, where e_k^T is the kth row of I.

(b) a Householder matrix: I - (2/(u^T u)) uu^T, where u is an n-vector.

5. Let A and B be two symmetric matrices of the same order. Develop an algorithm to compute C = A + B, taking advantage of symmetry for each matrix. Your algorithm should overwrite B with C. What is the flop-count?
6. Let A = (a_ij) be an unreduced lower Hessenberg matrix of order n. Then, given the first row r_1, it can be shown (Datta and Datta [1976]) that the successive rows r_2 through r_n of A^k (k a positive integer) can be computed recursively as follows:
$$r_{i+1} = \frac{r_i B_i - \sum_{j=1}^{i-1} a_{ij} r_j}{a_{i,i+1}}, \qquad i = 1, 2, \ldots, n-1, \quad \text{where } B_i = A - a_{ii}I.$$
Develop an algorithm to implement this. Give the flop-count for the algorithm.

7. Let a_r and b_r denote, respectively, the rth columns of the matrices A and B. Then develop an algorithm to compute the product AB from the formula
$$AB = \sum_{i=1}^{n} a_i b_i^T.$$
Give the flop-count and storage requirement of the algorithm.
8. Consider the 12 x 12 matrix
$$A = \begin{pmatrix}
12 & 11 & 10 & \cdots & 3 & 2 & 1 \\
11 & 11 & 10 & \cdots & 3 & 2 & 1 \\
0 & 10 & 10 & \ddots & & & \vdots \\
\vdots & \ddots & \ddots & \ddots & \ddots & & \vdots \\
\vdots & & \ddots & \ddots & \ddots & 2 & \vdots \\
\vdots & & & \ddots & 2 & 2 & 1 \\
0 & \cdots & & & 0 & 1 & 1
\end{pmatrix}.$$
Find the eigenvalues of this matrix. Use MATLAB.

Now perturb the (1,12) element to 10^{-9} and compute the eigenvalues of this perturbed matrix. What conclusion do you make about the conditioning of the eigenvalues?
131
MATLAB AND MATCOM PROGRAMS AND
PROBLEMS ON CHAPTER 4
1. Using the MATLAB function `rand', create a 5 x 5 random matrix A and then print out the following outputs:

A(2,:), A(:,1), A(:,5),
A(1, 1:2:5), A([1, 5]), A(4:-1:1, 5:-1:1).

2. Using the function `for', write a MATLAB program to find the inner product and outer product of two n-vectors u and v:

[s] = inpro(u,v)
[A] = outpro(u,v)

Test your program by creating two different vectors u and v using rand(4,1).
3. Learn how to use the following MATLAB commands to create special matrices:

compan    companion matrix
diag      diagonal matrix
ones      matrix of all ones
zeros     zero matrix
rand      random matrix
tril      lower triangular part
hankel    Hankel matrix
toeplitz  Toeplitz matrix
hilb      Hilbert matrix
triu      upper triangular part
vander    Vandermonde matrix
4. Write MATLAB programs to create the following well-known matrices:

(a) [A] = wilk(n) to create the Wilkinson bidiagonal matrix A = (a_ij) of order n:
a_ii = n - i + 1,  i = 1, 2, ..., n;
a_{i-1,i} = n,  i = 2, 3, ..., n;
a_ij = 0 otherwise.

(b) [A] = pie(n) to create the Pie matrix A = (a_ij) of order n:
a_ii = \alpha, where \alpha is a parameter near 1 or n - 1;
a_ij = 1 for i \ne j.
5. Using "help" commands for "flops", "clock", "etime", etc., learn how to measure the flop-count and timing for an algorithm.

6. Using the MATLAB functions `for', `size', `zeros', write a MATLAB program to find the product of two upper triangular matrices A and B of order m x n and n x p, respectively. Test your program using

A = triu(rand(4,3)),
B = triu(rand(3,3)).

7. Run the MATLAB program housmul(A, u) from MATCOM by creating a random matrix A of order 6 x 3 and a random vector u with six elements. Print the output and the number of flops and the elapsed time.

8. Modify the MATLAB program housmul to housxy(A, x, y) to compute the product (I + xy^T)A. Test your program by creating a 15 x 15 random matrix A and vectors x and y of appropriate dimensions. Print the product and the number of flops and the elapsed time.
133
5. SOME USEFUL TRANSFORMATIONS IN NUMERICAL LINEAR ALGEBRA AND THEIR APPLICATIONS
5.1 A Computational Methodology in Numerical Linear Algebra ........ 135
5.2 Elementary Matrices and LU Factorization ........ 135
5.2.1 Gaussian Elimination without Pivoting ........ 136
5.2.2 Gaussian Elimination with Partial Pivoting ........ 147
5.2.3 Gaussian Elimination with Complete Pivoting ........ 155
5.3 Stability of Gaussian Elimination ........ 160
5.4 Householder Transformations ........ 163
5.4.1 Householder Matrices and QR Factorization ........ 167
5.4.2 Householder QR Factorization of a Non-Square Matrix ........ 173
5.4.3 Householder Matrices and Reduction to Hessenberg Form ........ 174
5.5 Givens Matrices ........ 180
5.5.1 Givens Rotations and QR Factorization ........ 186
5.5.2 Uniqueness in QR Factorization ........ 188
5.5.3 Givens Rotations and Reduction to Hessenberg Form ........ 191
5.5.4 Uniqueness in Hessenberg Reduction ........ 193
5.6 Orthonormal Bases and Orthogonal Projections ........ 194
5.7 QR Factorization with Column Pivoting ........ 198
5.8 Modifying a QR Factorization ........ 203
5.9 Summary and Table of Comparisons ........ 205
5.10 Suggestions for Further Reading ........ 209
CHAPTER 5
SOME USEFUL TRANSFORMATIONS IN
NUMERICAL LINEAR ALGEBRA
AND THEIR APPLICATIONS
5. SOME USEFUL TRANSFORMATIONS IN NUMERICAL LINEAR ALGEBRA AND THEIR APPLICATIONS
Objectives
The major objective of this chapter is to introduce fundamental tools such as elementary, Householder, and Givens matrices and their applications. Here are some of the highlights of the chapter.

Various LU-type matrix factorizations: LU factorization using Gaussian elimination without pivoting (Section 5.2.1), MA = U factorization using Gaussian elimination with partial pivoting (Section 5.2.2), and MAQ = U factorization using Gaussian elimination with complete pivoting (Section 5.2.3).

QR factorization using Householder and Givens matrices (Sections 5.4.1 and 5.5.1).

Reduction to Hessenberg form by orthogonal similarity using Householder and Givens matrices (Sections 5.4.3 and 5.5.3).

Computation of orthonormal bases and orthogonal projections using QR factorizations (Section 5.6).

Background Material Needed for this Chapter

The following background material and tools developed in earlier chapters will be needed for comprehension of this chapter.

1. Subspaces and bases (Section 1.2.1)
2. Rank properties (Section 1.3.2)
3. Orthogonality and projections: orthonormal bases and orthogonal projections (Sections 1.3.5 and 1.3.6)
4. Special matrices: triangular, permutation, Hessenberg, orthogonal (Section 1.4)
5. Basic Gaussian elimination (Algorithm 3.1.5)
6. Stability concepts for algorithms (Section 3.2)
134
5.1 A Computational Methodology in Numerical Linear Algebra

Most computational algorithms to be presented in this book have a common basic structure that can be described as follows:

1. The problem is first transformed to a "reduced" problem.

2. The reduced problem is then solved by exploiting the special structure exhibited by the problem.

3. Finally, the solution of the original problem is recovered from the solution of the reduced problem.

The reduced problem typically involves a "condensed" form or forms of the matrix A, such as triangular, Hessenberg (almost triangular), tridiagonal, Real Schur Form (quasi-triangular), or bidiagonal. It is the structure of these condensed forms which is exploited in the solution of the reduced problem. For example, the solution of the linear system Ax = b is usually obtained by first triangularizing the matrix A and then solving an equivalent triangular system. In eigenvalue computations, the matrix A is transformed to Hessenberg form before applying the QR iterations. To compute the singular value decomposition, the matrix A is first transformed to a bidiagonal matrix, and then the singular values of the bidiagonal matrix are computed. These condensed forms are normally achieved through a series of transformations known as elementary, Householder or Givens transformations. We will study these transformations here and show how they can be applied to achieve various condensed forms. A small sketch of the methodology follows.
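The following minimal MATLAB sketch (not from the text; A and b are arbitrary illustrations) mirrors the three-step methodology for Ax = b, using MATLAB's built-in lu function for the reduction step:

A = [4 2 1; 2 5 3; 1 3 6];
b = [1; 2; 3];
[L, U, P] = lu(A);     % step 1: reduce -- PA = LU (triangularization)
y = L\(P*b);           % step 2: solve the reduced (lower triangular) system
x = U\y;               % step 2: solve the reduced (upper triangular) system
norm(A*x - b)          % step 3: x solves the original problem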
... a_n)^T with a_1 \ne 0, there is an elementary matrix E such that Ea is a multiple of e_1.

Proof. Define
$$E = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ -\frac{a_2}{a_1} & 1 & & 0 \\ \vdots & & \ddots & \vdots \\ -\frac{a_n}{a_1} & 0 & \cdots & 1 \end{pmatrix}.$$
Then E is an elementary lower triangular matrix and is such that
$$Ea = \begin{pmatrix} a_1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}.$$
triangular matrix.
Set A = A(0).
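To make the construction concrete, here is a minimal MATLAB sketch (not from the text; the vector a is an arbitrary illustration) that builds E for a vector a with a(1) nonzero and verifies that E*a is a multiple of e_1:

a = [2; 4; 1];                 % any vector with a(1) ~= 0
n = length(a);
E = eye(n);
E(2:n,1) = -a(2:n)/a(1);       % the multipliers -a_i/a_1 in the first column
E*a                            % gives (2, 0, 0)^T, a multiple of e_1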
136
Step 1. Find an elementary matrix E_1 such that A^(1) = E_1 A has zeros below the (1,1) entry in the first column:
$$A^{(1)} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ 0 & a_{22}^{(1)} & \cdots & a_{2n}^{(1)} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & a_{n2}^{(1)} & \cdots & a_{nn}^{(1)} \end{pmatrix}.$$
Then A^(1) = E_1 A will have the above form and is the same as the matrix A^(1) obtained at the end of step 1 of Algorithm 3.1.5.

Record the multipliers: m_{21}, m_{31}, ..., m_{n1}; m_{i1} = -a_{i1}/a_{11}, i = 2, ..., n.
Step 2. Find an elementary matrix E such that A = E A has zeros below the (2,2)
2
(2)
2
(1)
bE BBB ...
32
CC = BB 0 CC :
2
BB .. CC BB .. CC
@ . A @ . A
an
(1)
2 0
Record the multipliers: m ; : : :; mn ; mi = ; ai ; i = 3; : : :; n. Then dene
(1)
2
32
a 2 2 (1)
01 0 01 0 1
22
B
B0 C
CC BB 1 0 CC
E =BB
B . C = B Eb CA :
@ ..
2
E^ C A @0 2
2
0
A(2) = E2A(1) will then have zeros below the (2,2) entry in the second column.
0a a a n 1
BB 0 a 11 12
a n C CC
1
BB
(1) (1)
A =B
22
a n C
C2
BB 0. 0. a C:
(2) (2) (2)
. . . ... C
33 3
B@ .. .. CA
0 0 an (2)
3 ann (2)
137
Note that premultiplication of A(1) by E2 does not destroy zeros already created in A(1). This
matrix A(2) is the same as the matrix A(2) of Algorithm 3.1.5.
Step k. In general, at the kth step, an elementary matrix Ek is found such that A k = ( )
Ek A k; has zeros below the (k; k) entry in the kth column. Ek is computed in two successive
( 1)
BB ak;kk; CC BB 0 CC
B C
Ebk BBB k .. ;k CCC = BB 0 CC ;
( 1)
+1
@ . A BB ... CC
a k; @( A 1)
nk 0
and then Ek is dened as 0I 1
0
BB k; 1
CC
Ek = BBB CC :
@ Ebk CA
0
Here Ik;1 is the matrix of the rst (k ; 1) rows and columns of the n n identity matrix I . The
matrix A(k) = Ek A(k;1) is the same as the matrix A(k) of Algorithm 3.1.5.
Record the multipliers:
aik ( k;1)
mk ;k ; : : :; mik; mi;k = ; k; ; i = k + 1; : : :; n:
+1
akk ( 1)
Step n-1. At the end of the (n ; 1)th step, the matrix A n; is upper triangular and the ( 1)
12
a n
1
CC
B CC
(1) (1)
B
B0 0 a
22
an
2
CC
=B
(2) (2)
A n;1)
(
B
B
33
... ..
3
CC :
B 0 . CC
B
B . ... .. CA
@ .. .
0 0 0 0 annn; ( 1)
Obtaining L and U.
$$A^{(n-1)} = E_{n-1}A^{(n-2)} = E_{n-1}E_{n-2}A^{(n-3)} = \cdots = E_{n-1}E_{n-2}\cdots E_2E_1 A.$$
Set
$$U = A^{(n-1)}, \qquad L_1 = E_{n-1}E_{n-2}\cdots E_2E_1.$$
Then from the above we have
$$U = L_1 A.$$
Since each E_k is a unit lower triangular matrix (a lower triangular matrix having 1's along the diagonal), so is the matrix L_1 and, therefore, L_1^{-1} exists. (Note that the product of two triangular matrices of one type is a triangular matrix of the same type.)

Set L = L_1^{-1}. Then the equation U = L_1 A becomes
$$A = LU.$$
This factorization of A is known as the LU factorization.
Definition 5.2.3 The entries a_{11}, a_{22}^{(1)}, ..., a_{nn}^{(n-1)} are called pivots, and the above process of obtaining the factorization A = LU is known as Gaussian elimination without pivoting.

Note that L can be written down explicitly in terms of the multipliers:
$$L = \begin{pmatrix} 1 & & & & \\ -m_{21} & 1 & & & \\ \vdots & \vdots & \ddots & & \\ -m_{n1} & -m_{n2} & \cdots & -m_{n,n-1} & 1 \end{pmatrix}.$$

Thus, if det(A_r), r = 1, 2, ..., n, is nonzero, then an LU factorization always exists. Indeed, the LU factorization in this case is unique, for if
$$A = L_1U_1 = L_2U_2,$$
then L_2^{-1}L_1 = U_2U_1^{-1}. The left-hand side, being the product of two unit lower triangular matrices, is unit lower triangular; the right-hand side, being the product of two upper triangular matrices, is upper triangular. Since a unit lower triangular matrix can equal an upper triangular matrix only if both are the identity, we have
$$L_1 = L_2, \qquad U_1 = U_2.$$
Theorem 5.2.1 (LU Theorem) Let A be an n x n matrix with all nonzero leading principal minors. Then A can be decomposed uniquely in the form
A = LU,
where L is unit lower triangular and U is upper triangular.

Remark: Note that in the above theorem, if the diagonal entries of L are not specified, then the factorization is not unique.
Algorithm 5.2.1 LU Factorization Using Elementary Matrices

Given an n x n matrix A, the following algorithm computes, whenever possible, elementary matrices E_1, E_2, ..., E_{n-1} and an upper triangular matrix U such that, with L = E_1^{-1}E_2^{-1}\cdots E_{n-1}^{-1}, we have A = LU.

For k = 1, 2, ..., n-1 do

1. If a_kk = 0, stop.

2. Find an elementary matrix \hat{E}_k = I + m e_1^T of order (n - k + 1), where m = (0, m_{k+1,k}, ..., m_{n,k})^T, such that
$$\hat{E}_k \begin{pmatrix} a_{kk} \\ \vdots \\ a_{nk} \end{pmatrix} = \begin{pmatrix} a_{kk} \\ 0 \\ \vdots \\ 0 \end{pmatrix}.$$

3. Define
$$E_k = \begin{pmatrix} I_{k-1} & 0 \\ 0 & \hat{E}_k \end{pmatrix},$$
where I_{k-1} is the matrix of the first (k - 1) rows and columns of the n x n identity matrix I.

4. Save the multipliers m_{k+1,k}, ..., m_{n,k}.
Step 1. Compute E_1. The multipliers are m_{21} = -2 and m_{31} = -1/2:
$$E_1 = \begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ -\tfrac{1}{2} & 0 & 1 \end{pmatrix}, \qquad A \leftarrow A^{(1)} = E_1A = \begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ -\tfrac{1}{2} & 0 & 1 \end{pmatrix}\begin{pmatrix} 2 & 2 & 3 \\ 4 & 5 & 6 \\ 1 & 2 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 2 & 3 \\ 0 & 1 & 0 \\ 0 & 1 & -\tfrac{1}{2} \end{pmatrix}.$$

Step 2. The multiplier is m_{32} = -1:
$$\hat{E}_2 = \begin{pmatrix} 1 & 0 \\ -1 & 1 \end{pmatrix}, \qquad E_2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -1 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & \hat{E}_2 \end{pmatrix},$$
$$A \leftarrow A^{(2)} = E_2A^{(1)} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -1 & 1 \end{pmatrix}\begin{pmatrix} 2 & 2 & 3 \\ 0 & 1 & 0 \\ 0 & 1 & -\tfrac{1}{2} \end{pmatrix} = \begin{pmatrix} 2 & 2 & 3 \\ 0 & 1 & 0 \\ 0 & 0 & -\tfrac{1}{2} \end{pmatrix}.$$
Thus
$$U = \begin{pmatrix} 2 & 2 & 3 \\ 0 & 1 & 0 \\ 0 & 0 & -\tfrac{1}{2} \end{pmatrix}.$$

Compute L:
$$L_1 = E_2E_1 = \begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ \tfrac{3}{2} & -1 & 1 \end{pmatrix}, \qquad L = L_1^{-1} = \begin{pmatrix} 1 & 0 & 0 \\ -m_{21} & 1 & 0 \\ -m_{31} & -m_{32} & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ \tfrac{1}{2} & 1 & 1 \end{pmatrix}.$$
$$E_1 = \begin{pmatrix} 1 & 0 & 0 \\ m_{21} & 1 & 0 \\ m_{31} & 0 & 1 \end{pmatrix}, \qquad E_1A = \begin{pmatrix} 1 & 0 & 0 \\ m_{21} & 1 & 0 \\ m_{31} & 0 & 1 \end{pmatrix}\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22}^{(1)} & a_{23}^{(1)} \\ 0 & a_{32}^{(1)} & a_{33}^{(1)} \end{pmatrix},$$
where
$$a_{22}^{(1)} = a_{22} + m_{21}a_{12}, \qquad a_{23}^{(1)} = a_{23} + m_{21}a_{13},$$
and so on.
3. As soon as A(k) is formed, it can overwrite A.
4. The vector (mk+1;k; : : :; mnk) has (n ; k) elements and at the kth step exactly (n ; k) zeros
are produced, an obvious scheme of storage will then be to store these (n ; k) elements in
the positions (k + 1; k); (k + 2; k); : : :; (n; k) of A below the diagonal.
5. The entries of the upper triangular matrix U then can be stored in the upper half part of
A including the diagonal. With this storage scheme, the matrix A(n;1) at the end of the
(n ; 1)th step will be
0 a a an 1
BB ;m a
11 12
an
1
CC
BB . . . CC
(1) (1)
21
..
21 2
A A n;
( 1)
=B BB ... . . . . . CC
B@ .. . .. . .. .. CC
. A
;mn ;mn;n; annn;
1 1
( 1)
143
Thus a typical step of Gaussian elimination for LU factorization consists of
(1) forming the multipliers and storing them in the appropriate places below diagonal,
(2) updating the entries of the rows (k + 1) through n and saving them in the upper half of A.
Based on the above discussion, we now present the following algorithm.
Algorithm 5.2.2 Triangularization Using Gaussian Elimination without Pivoting

Let A be an n x n matrix. The following algorithm computes the triangularization of A, whenever it exists. The algorithm overwrites the upper triangular part of A, including the diagonal, with U, and the entries of A below the diagonal are overwritten with the multipliers needed to compute L.

For k = 1, 2, ..., (n-1) do

1. (Form the multipliers)
   a_ik <- m_ik = -a_ik/a_kk   (i = k+1, k+2, ..., n)

2. (Update the entries)
   a_ij <- a_ij + m_ik a_kj   (i = k+1, ..., n; j = k+1, ..., n).

Remark: The algorithm does not give the matrix L explicitly; however, L can be formed from the multipliers saved at each step, as shown earlier (see the explicit expression for L). A sketch follows.
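The following is a minimal MATLAB sketch (not the book's code) of Algorithm 5.2.2 together with the recovery of L and U from the overwritten array; the 3 x 3 test matrix is an arbitrary illustration. Note that, since the stored quantities are the multipliers m_ik, the strictly lower part of L is the negative of the stored lower part.

A = [2 2 3; 4 5 6; 1 2 1];
A0 = A;                                    % keep a copy for checking
n = size(A,1);
for k = 1:n-1
    for i = k+1:n
        A(i,k) = -A(i,k)/A(k,k);           % multiplier m_ik overwrites a_ik
        A(i,k+1:n) = A(i,k+1:n) + A(i,k)*A(k,k+1:n);
    end
end
U = triu(A);                               % upper part of A holds U
L = eye(n) - tril(A,-1);                   % l_ik = -m_ik below the diagonal
norm(A0 - L*U)                             % should be (close to) zero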
Flop-Count. The algorithm requires roughly n^3/3 flops. This can be seen as follows. At step 1, forming each multiplier requires one flop and updating each entry also requires one flop; thus step 1 requires (n-1)^2 + (n-1) flops. Step 2, similarly, requires (n-2)^2 + (n-2) flops, and so on. In general, step k requires (n-k)^2 + (n-k) flops. Since there are (n-1) steps, we have
$$\text{Total flops} = \sum_{k=1}^{n-1}(n-k)^2 + \sum_{k=1}^{n-1}(n-k) = \frac{n(n-1)(2n-1)}{6} + \frac{n(n-1)}{2} \approx \frac{n^3}{3} + O(n^2).$$

Recall

1. 1^2 + 2^2 + ... + r^2 = r(r+1)(2r+1)/6

2. 1 + 2 + ... + r = r(r+1)/2
With m_{21} = -3 and m_{31} = -5, the updated entries are
$$a_{22}^{(1)} = a_{22} + m_{21}a_{12} = -2, \qquad a_{32}^{(1)} = a_{32} + m_{31}a_{12} = -4,$$
so
$$A \leftarrow A^{(1)} = \begin{pmatrix} 1 & 2 \\ 0 & -2 \\ 0 & -4 \end{pmatrix}.$$

Step 2. The multiplier is m_{32} = -2; a_{32}^{(2)} = a_{32}^{(1)} + m_{32}a_{22}^{(1)} = 0.
$$A \leftarrow A^{(2)} = \begin{pmatrix} 1 & 2 \\ 0 & -2 \\ 0 & 0 \end{pmatrix}, \qquad U = \begin{pmatrix} 1 & 2 \\ 0 & -2 \\ 0 & 0 \end{pmatrix}.$$
Note that U in this case is upper trapezoidal rather than an upper triangular matrix.
$$L = \begin{pmatrix} 1 & 0 & 0 \\ -m_{21} & 1 & 0 \\ -m_{31} & -m_{32} & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 3 & 1 & 0 \\ 5 & 2 & 1 \end{pmatrix}.$$

Verify that
$$LU = \begin{pmatrix} 1 & 0 & 0 \\ 3 & 1 & 0 \\ 5 & 2 & 1 \end{pmatrix}\begin{pmatrix} 1 & 2 \\ 0 & -2 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix} = A.$$
Difficulties with Gaussian Elimination without Pivoting

As we have seen before, Gaussian elimination without pivoting fails if any of the pivots is zero. It is worse yet if any pivot becomes close to zero: in this case the method can be carried to completion, but the computed results may be totally wrong.

Consider the following celebrated example from Forsythe and Moler (CSLAS, p. 34). Let Gaussian elimination without pivoting be applied to
$$A = \begin{pmatrix} 0.0001 & 1 \\ 1 & 1 \end{pmatrix},$$
using three-digit arithmetic. There is only one step. The multiplier is m_{21} = -1/0.0001 = -10^4, and
$$U = A^{(1)} = \begin{pmatrix} 0.0001 & 1 \\ 0 & -10^4 \end{pmatrix}, \qquad L = \begin{pmatrix} 1 & 0 \\ 10^4 & 1 \end{pmatrix}.$$
The product of the computed L and U gives
$$LU = \begin{pmatrix} 0.0001 & 1 \\ 1 & 0 \end{pmatrix},$$
which is different from A. Who is to blame?

Note that the pivot a_{11} = 0.0001 is very close to zero (in three-digit arithmetic). This small pivot gave a large multiplier. The large multiplier, when used to update the entries, wiped out the small entries (e.g., fl(1 - 10^4) gave -10^4). Fortunately, we can avoid this small pivot just by a row interchange. Consider the matrix with the first row written second and the second row written first:
$$A' = \begin{pmatrix} 1 & 1 \\ 0.0001 & 1 \end{pmatrix}.$$
Gaussian elimination now gives
$$U = A^{(1)} = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \qquad L = \begin{pmatrix} 1 & 0 \\ 0.0001 & 1 \end{pmatrix}.$$
Note that the pivot in this case is a_{11} = 1. The product
$$LU = \begin{pmatrix} 1 & 1 \\ 0.0001 & 1.0001 \end{pmatrix},$$
which reproduces A' up to a tiny perturbation in the (2,2) entry.
5.2.2 Gaussian Elimination with Partial Pivoting

In the example above, we found a factorization of the matrix A', which is a permuted version of A in the sense that its rows have been interchanged. A primary purpose of factoring a matrix A into LU is to use this factorization to solve a linear system. It is easy to see that the solution of the system Ax = b and that of the system A'x = b', where b' has been obtained from b in the same manner in which A' was obtained from A, are the same. Thus, if row interchanges can help avoid a small pivot, it is certainly desirable to perform them.

As the above example suggests, disaster in Gaussian elimination without pivoting can perhaps be avoided by identifying a "good pivot" (a pivot as large as possible) at each step, before the process of elimination is applied. The good pivot may be located among the entries in a column or among all the entries in a submatrix of the current matrix. In the former case, since the search is only partial, the method is called partial pivoting; in the latter case, the method is called complete pivoting. It is important to note that the purpose of pivoting is to prevent large growth in the reduced matrices, which can wipe out the original data. One way to do this is to keep the multipliers less than one in magnitude, and this is exactly what is accomplished by pivoting. However, large multipliers do not necessarily mean instability (see our discussion of Gaussian elimination without pivoting for symmetric positive definite matrices in Chapter 6). We first describe Gaussian elimination with partial pivoting.

The process consists of (n - 1) steps.

Step 1. Scan the first column of A to identify the largest element in magnitude in that column. Let it be a_{r_1,1}.
Form a permutation matrix P by interchanging the rows 1 and r of the identity matrix and
1 1
Find an elementary lower triangular matrix M such that A = M P A has zeros below the
1
(1)
1 1
an 1 0
147
Note that 0 1 0 01
BB m C
1 0 0C
BB 21 CC
M =B BB m. 0 1 0CC
... C
1 31
B@ .. C A
mn 0 0 11
where m = ; aa , m = ; aa ; : : :; mn = ; aan . Note that aij refers to the (i; j )th entry of the
21
21
31
31
1
1
permuted matrix P A. Save the multipliers mi ; i = 2; : : :; n and record the row interchanges.
11 11 11
1 1
0 1
BB 0 C CC
BB C
A =B BB 0. C
.. C
(1)
B@ .. .. C
. .C A
0
Step 2. Scan the second column of A below the rst row to identify the largest element
(1)
in magnitude in that column. Let the element be a(1) r2 ;2 . Form the permutation matrix P2 by
interchanging the rows 2 and r2 of the identity matrix and leaving the other rows unchanged. Form
P2 A(1).
Next, nd an elementary lower triangular matrix M2 such that A(2) = M2P2 A(1) has zeros below
the (2,2) entry. M2 is constructed as follows. First, construct an elementary matrix M c2 of order
(n ; 1) such that 0 1 0 1 a
BB a 22
CC BB 0 CC
BB . 32 CC BB CC
c
M =B BB ... CC = BB 0 CC ;
2
B@ .. CC BB .. CC
A @.A
an 2 0
then dene 01 0 01
BB 0 CC
M =BB CC :
B@ ...
2
c
M C A 2
0
Note that aij refers to the (i; j )th entry of the current matrix P A . At the end of Step 2,
2
(1)
148
we will have
0 1
BB 0 C CC
BB C
A =M P A =B
(2)
BB 0.
2 2
(1)
0 C CC ;
B@ .. .. . . . .C .
.
. A
0 0
01 0 0 01
BB 0 1 0 0C CC
BB C
B0 m 1 0C
M =B BB .. .. 32
... .. CC
2
BB .. .. .C C
B@ .. .. ... 0 C CA
0 mn2 0 1
where mi2 = ; aai2 , i = 3; 4; : : :; n.
Save the multipliers mi2 and record the row interchange.
22
Step k. In general, at the kth step, scan the entries of the kth column of the matrix A(k;1)
below the row (k ; 1) to identify the pivot ar , form the permutation matrix Pk , and nd an
k;k
elementary lower triangular matrix Mk such that A(k) = Mk Pk A(k;1) has zeros below the (k; k)
entry. Then Mk is constructed rst by constructing M ck of order (n ; k + 1) such that
0a 1 01
BB kk... CC BB 0 CC
ck BB .. CC = BB .. CC ;
M B@ . CA B@ . CA
ank 0
and then dening !
Ik; 0
Mk = 1
ck ;
0 M
where `0' is a matrix of zeros. The elements ai;k refer to the (i; k)th entries of the matrix
Pk A k; .
( 1)
Step n-1. At the end of the (n-1)th step, the matrix A^(n-1) will be an upper triangular matrix.

Form U: Set
$$A^{(n-1)} = U. \qquad (5.2.1)$$
Then
$$U = A^{(n-1)} = M_{n-1}P_{n-1}A^{(n-2)} = \cdots$$
Set
$$M_{n-1}P_{n-1}M_{n-2}P_{n-2}\cdots M_2P_2M_1P_1 = M. \qquad (5.2.2)$$
Then we have from the above the following factorization of A:
$$U = MA.$$
From (5.2.2) it is easy to see that there exists a permutation matrix P such that PA = LU. Define
$$P = P_{n-1}\cdots P_2P_1, \qquad (5.2.3)$$
$$L = P(M_{n-1}P_{n-1}\cdots M_1P_1)^{-1}. \qquad (5.2.4)$$
Then PA = LU.
150
Example 5.2.2

$$A = \begin{pmatrix} 0.0001 & 1 \\ 1 & 1 \end{pmatrix}.$$
Only one step. The pivot entry is 1, and r_1 = 2:
$$P_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad P_1A = \begin{pmatrix} 1 & 1 \\ 0.0001 & 1 \end{pmatrix},$$
$$m_{21} = -\frac{0.0001}{1} = -10^{-4}, \qquad M_1 = \begin{pmatrix} 1 & 0 \\ m_{21} & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ -10^{-4} & 1 \end{pmatrix},$$
$$M_1P_1A = \begin{pmatrix} 1 & 0 \\ -10^{-4} & 1 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 0.0001 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 0 & 0.9999 \end{pmatrix} = U,$$
$$M = M_1P_1 = \begin{pmatrix} 1 & 0 \\ -10^{-4} & 1 \end{pmatrix}\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & -10^{-4} \end{pmatrix},$$
$$MA = \begin{pmatrix} 0 & 1 \\ 1 & -10^{-4} \end{pmatrix}\begin{pmatrix} 10^{-4} & 1 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 0 & 0.9999 \end{pmatrix} = U.$$
Example 5.2.3

Triangularize
$$A = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 2 & 3 \\ 1 & 1 & 1 \end{pmatrix}$$
using partial pivoting. Express A = MU. Find also P and L such that PA = LU.

Step 1. The pivot entry is 1 and r_1 = 2:
$$P_1 = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad P_1A = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix},$$
$$M_1 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & 0 & 1 \end{pmatrix}, \qquad A^{(1)} = M_1P_1A = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & 1 \\ 0 & -1 & -2 \end{pmatrix}.$$

Step 2. The pivot entry is a_{22}^{(1)} = 1:
$$P_2 = I \text{ (no interchange is necessary)}, \qquad P_2A^{(1)} = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & 1 \\ 0 & -1 & -2 \end{pmatrix},$$
$$\hat{M}_2 = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}, \qquad M_2 = \begin{pmatrix} I & 0 \\ 0 & \hat{M}_2 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix},$$
$$U = A^{(2)} = M_2P_2A^{(1)} = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & 1 \\ 0 & 0 & -1 \end{pmatrix},$$
$$M = M_2P_2M_1P_1 = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 1 & -1 & 1 \end{pmatrix}.$$
It is easily verified that A = MU.

Form L and P:
$$P = P_2P_1 = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad L = P(M_2P_2M_1P_1)^{-1} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & -1 & 1 \end{pmatrix}.$$
It is easy to verify that PA = LU.
152
Forming the Matrix M and Other Computational Details
1. Each permutation matrix Pk can be formed just by recording the index rk ; since Pk is the
permuted identity matrix in which rows k and rk have been interchanged. However, neither
the permutation matrix Pk nor the product Pk A(k;1) needs to be formed explicitly. This is
because the matrix Pk A(k;1) is just the permuted version of A(k;1) in which the
rows rk and k have been interchanged.
2. Each elementary matrix Mk can be formed just by saving the (n ; k + 1) multipliers. The
matrices MkPk A(k;1) = Mk B also do not need to be computed explicitly. Note that
the elements in the rst k rows of the matrix Mk B are the same as the elements of the rst
k rows of the matrix B, and the elements in the remaining (n ; k) rows are given by:
bij + mik bkj (i = k + 1; : : :; n; j = k + 1; : : :; n):
3. The multipliers can be stored in the appropriate places of lower triangular part of A (below
the diagonal) as they are computed.
4. The nal upper triangular matrix U = A(n;1) is stored in the upper triangular part.
5. The pivot indices rk are stored in a separate single subscripted integer array.
6. A can be overwritten with each A(k) as soon as the latter is formed.
Again, the major programming requirement is a subroutine that computes an elemen-
tary matrix M such that, given a vector a, Ma is a multiple of the rst column of the
identity matrix.
In view of our above discussion, we can now formulate the following practical algorithm for
LU factorization with partial pivoting.
Algorithm 5.2.3 Triangularization Using Gaussian Elimination with Partial Pivoting
Let A be an n n nonsingular matrix. Then the following algorithm computes the triangu-
larization of A with rows permuted, using Gaussian elimination with partial pivoting. The upper
triangular matrix U is stored in the upper triangular part of A, including the diagonal. The multi-
pliers needed to compute the permuted triangular matrix M such that MA = U are stored in the
lower triangular part of A. The permutation indices rk are stored in a separate array.
For k = 1; 2; : : :; n ; 1 do
153
1. Find rk so that jar ;k j = kmax
k
ja j. Save rk.
in ik
If ar ;k = 0, then stop. Otherwise, continue.
k
(Note that the search for the pivot at step k requires (n ; k) comparisons.)
Note: Algorithm 5.2.3 does not give the matrices M and P explicitly. However, they
can be constructed easily as explained above, from the multipliers and the permutation
indices, respectively.
Remark: The above algorithm accesses the rows of A in the innermost loop and that is why
it is known as the row-oriented Gaussian elimination (with partial pivoting) algorithm.
It is also known as the kij algorithm; note that i and j appear in the inner loops. The column-
oriented algorithm can be similarly developed. Such a column-oriented algorithm has been used in
LINPACK (LINPACK routine SGEFA).
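A minimal MATLAB sketch of the row-oriented elimination with partial pivoting just described (not LINPACK's SGEFA and not the book's code; gauss_partial_pivot is a hypothetical name). U is stored in the upper part of A, the multipliers in the lower part, and the pivot indices in the vector r:

function [A, r] = gauss_partial_pivot(A)
n = size(A,1);
r = zeros(n-1,1);
for k = 1:n-1
    [~, p] = max(abs(A(k:n,k)));            % search the k-th column for the pivot
    r(k) = p + k - 1;                       % pivot row index r_k
    A([k r(k)],:) = A([r(k) k],:);          % interchange rows k and r_k
    for i = k+1:n
        A(i,k) = -A(i,k)/A(k,k);            % multiplier, at most 1 in magnitude
        A(i,k+1:n) = A(i,k+1:n) + A(i,k)*A(k,k+1:n);
    end
end
end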
Example 5.2.4

$$A = \begin{pmatrix} 1 & 2 & 4 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}.$$

Step 1. k = 1.

1. The pivot entry is 7: r_1 = 3.

2. Interchange rows 3 and 1:
$$A \leftarrow \begin{pmatrix} 7 & 8 & 9 \\ 4 & 5 & 6 \\ 1 & 2 & 4 \end{pmatrix}.$$

3. Form the multipliers:
$$a_{21} \leftarrow m_{21} = -\tfrac{4}{7}, \qquad a_{31} \leftarrow m_{31} = -\tfrac{1}{7}.$$

4. Update:
$$A \leftarrow \begin{pmatrix} 7 & 8 & 9 \\ 0 & \tfrac{3}{7} & \tfrac{6}{7} \\ 0 & \tfrac{6}{7} & \tfrac{19}{7} \end{pmatrix}.$$

Step 2. k = 2.

1. The pivot entry is 6/7: r_2 = 3.

2. Interchange rows 2 and 3:
$$A \leftarrow \begin{pmatrix} 7 & 8 & 9 \\ 0 & \tfrac{6}{7} & \tfrac{19}{7} \\ 0 & \tfrac{3}{7} & \tfrac{6}{7} \end{pmatrix}.$$

3. Form the multiplier: a_{32} \leftarrow m_{32} = -\tfrac{1}{2}.

4. Update:
$$A \leftarrow \begin{pmatrix} 7 & 8 & 9 \\ 0 & \tfrac{6}{7} & \tfrac{19}{7} \\ 0 & 0 & -\tfrac{1}{2} \end{pmatrix} = U.$$

Form M:
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & -\tfrac{1}{7} \\ -\tfrac{1}{2} & 1 & -\tfrac{1}{2} \end{pmatrix}.$$
has zeros on the kth column below the (k; k) entry. The matrix Mk can of course be computed in
two smaller steps as before.
155
At the end of the (n ; 1)th step, the matrix A(n;1) is an upper triangular matrix. Set
A n; = U:
( 1)
(5.2.5)
Then
U = A n; = Mn; Pn; A n; Qn;
( 1)
1 1
( 2)
1
Set
Mn; Pn; M P = M;
1 1 1 1 (5.2.6)
Q Qn; = Q:
1 1 (5.2.7)
Then we have
U = MAQ:
As in the case of partial pivoting, it is easy to see from (5.2.4) and (5.2.7) that the factorization
MAQ = U can be expressed in the form:
PAQ = MU:
156
Corollary 5.2.2 (Complete Pivoting LU Factorization Theorem). Gaussian
elimination with complete pivoting yields the factorization PAQ = LU , where P
and Q are permutation matrices given by
P = Pn; P ; 1 1
Q = Q Qn; ;1 1
Example 5.2.5
Triangularize 00 1 1 1
B1 2 3
A=B
CC
@ A
1 1 1
using complete pivoting.
00 1 01
B C
P =B
1 @ 1 0 0 CA ;
0 0 1
00 0 11 03 2 11
B C B C
Q =B
1 @ 0 1 0 CA ; P AQ = B
1 @ 1 1 0 CA ;
1
1 0 0 1 1 1
0 1 0 01
M =B
B; C
@
1 1 0C
A; 1
3
; 0 1 1
03 2 1 3
1
B CC
A = M P AQ = B
(1)
1 1@0 ;1
1
3
1
3 A:
0 1
3
2
3
157
Step 2. The pivot entry is a = 23 .
(1)
33
01 0 01 01 0 01
B C B C
P =B @ 0 0 1 CA ;
2 Q =B
2 @ 0 0 1 CA :
0 1 0 0 1 0
03 2 11
!
B C c2 = 1 1 0
@ 0 23 13 CA ; M
P2A(1) Q2 = B :
1
0 ;3 31 1 2
01 0 01
M2 = B
B 0 1 0 CC ;
@ A
0 21 1
03 2 11
B 2 1 CC :
U = A(2) = M2P2A(1)Q2 = M2P2(M1P1AQ1 )Q2 = B @0 3 3 A
0 0 12
(Using Corollary 5.2.2, nd for yourself P , Q, L, and U such that PAQ = LU .)
Forming the Matrix M and other Computational Details
Remarks similar to those as in the case of partial pivoting hold. The matrices Pk ; Qk Pk A(k;1)Qk ,
Mk and Mk PkA(k;1)Qk do not have to be formed explicitly wasting storage unnecessarily. It is
enough to save the indices and the multipliers.
In view of our discussion on forming the matrices Mk and the permutation matrix Pk , we now
present a practical Gaussian elimination algorithm with complete pivoting, which does not show
the explicit formation of the matrices Pk ; Qk ; Mk ; MkA and Pk AQk . Note that partial pivoting
is just a special case of complete pivoting.
Algorithm 5.2.4 Triangularization Using Gaussian Elimination with Complete Pivot-
ing
Given an n n matrix, the following algorithm computes triangularization of A with rows and
columns permuted, using Gaussian elimination with complete pivoting. The algorithm overwrites
A with U . U is stored in the upper triangular part of A (including the diagonal) and the multipliers
mik are stored in the lower triangular part. The permutation indices rk and sk are saved separately.
For k = 1, 2, ..., n-1 do

1. Find r_k and s_k such that |a_{r_k,s_k}| = max{ |a_ij| : i, j >= k }, and save r_k and s_k.

2. (Interchange the rows r_k and k)  a_{kj} <-> a_{r_k,j}  (j = k, k+1, ..., n).

3. (Interchange the columns s_k and k)  a_{ik} <-> a_{i,s_k}  (i = 1, 2, ..., n).

4. Form the multipliers:  m_{ik} = -a_{ik}/a_{kk}, storing m_{ik} in place of a_{ik}  (i = k+1, ..., n).

5. Update:  a_{ij} = a_{ij} + m_{ik} a_{kj}  (i = k+1, ..., n;  j = k+1, ..., n).

Note: Algorithm 5.2.4 does not give the matrices M, P, and Q explicitly; they have
to be formed, respectively, from the multipliers m_{ik} and the permutation indices r_k
and s_k, as explained above.

Flop-count: The algorithm requires about n^3/3 flops and O(n^3/3) comparisons.
Example 5.2.6

A = [ 1  2 ]
    [ 3  4 ].

Just one step is needed:  r_1 = 2, s_1 = 2.

First, the second and first rows are switched, and this is then followed by the switch of the second
and first columns, to obtain the pivot entry 4 in the (1,1) position:

A = [ 3  4 ]     (after the interchange of the first and second rows),
    [ 1  2 ]

A = [ 4  3 ]     (after the interchange of the first and second columns).
    [ 2  1 ]

The multiplier is:  m_21 = -a_21/a_11 = -2/4 = -1/2.

A = [ 4   3   ]     (after updating the entries of A).
    [ 0  -1/2 ]

P_1 = [ 0  1 ],      Q = Q_1 = [ 0  1 ],
      [ 1  0 ]                 [ 1  0 ]

M = M_1 P_1 = [  1    0 ] [ 0  1 ]  =  [ 0   1   ]
              [ -1/2  1 ] [ 1  0 ]     [ 1  -1/2 ].
5.3 Stability of Gaussian Elimination
The stability of Gaussian elimination algorithms is better understood by measuring the growth of
the elements in the reduced matrices A(k). (Note that although pivoting keeps the multipli-
ers bounded by unity, the elements in the reduced matrices still can grow arbitrarily).
We remind the readers of the denition of the growth factor in this context, given in Chapter 3.
Definition 5.3.1 The growth factor rho is the ratio of the largest element (in magnitude) of
A, A^(1), ..., A^(n-1) to the largest element (in magnitude) of A:

rho = max(alpha, alpha_1, ..., alpha_{n-1}) / alpha,

where alpha = max_{i,j} |a_ij| and alpha_k = max_{i,j} |a^(k)_ij|.

Example 5.3.1

A = [ 0.0001  1 ]
    [ 1       1 ].

1. Gaussian elimination without pivoting gives

A^(1) = U = [ 0.0001   1    ]
            [ 0       -10^4 ],

max |a^(1)_ij| = 10^4,   max |a_ij| = 1,

rho = the growth factor = 10^4.

2. Gaussian elimination with partial pivoting gives

A^(1) = U = [ 1  1      ]
            [ 0  0.9999 ],

max |a^(1)_ij| = 1,   max |a_ij| = 1,

rho = the growth factor = 1.
160
The question naturally arises: how large can the growth factor be for an arbitrary
matrix? We answer this question in the following.

1. Gaussian elimination with complete pivoting. For complete pivoting, Wilkinson has shown that

rho <= [ n * 2 * 3^{1/2} * 4^{1/3} ... n^{1/(n-1)} ]^{1/2}.

This is a slowly growing function of n. Furthermore, in practice this bound is never attained.
Indeed, there was an unproven conjecture by Wilkinson (AEP, p. 213) that the growth
factor for complete pivoting was bounded by n for real n x n matrices. Unfortunately,
this conjecture has recently been settled by Gould (1991) negatively. Gould (1991)
exhibited a 13 x 13 matrix for which Gaussian elimination with complete pivoting gave the growth
factor rho = 13.0205. In spite of Gould's result, Gaussian elimination with complete
pivoting is a stable algorithm.

2. Gaussian elimination with partial pivoting. For partial pivoting,

rho <= 2^{n-1}.

Unfortunately, one can construct matrices for which this bound is attained.
Consider the following example:

A = [  1   0   0  ...  0  1 ]
    [ -1   1   0  ...  0  1 ]
    [ -1  -1   1  ...  0  1 ]
    [  .   .   .        .  . ]
    [ -1  -1  -1  ... -1  1 ].

That is,

a_ij = {  1   for j = i or j = n,
         -1   for j < i,
          0   otherwise.

Wilkinson (AEP, p. 212) has shown that the growth factor for this matrix with partial pivoting
is 2^{n-1}. To see this, take the special case with n = 4.
A = [  1   0   0  1 ]
    [ -1   1   0  1 ]
    [ -1  -1   1  1 ]
    [ -1  -1  -1  1 ],

A^(1) = [ 1   0   0  1 ]
        [ 0   1   0  2 ]
        [ 0  -1   1  2 ]
        [ 0  -1  -1  2 ],

A^(2) = [ 1  0   0  1 ]
        [ 0  1   0  2 ]
        [ 0  0   1  4 ]
        [ 0  0  -1  4 ],

A^(3) = [ 1  0  0  1 ]
        [ 0  1  0  2 ]
        [ 0  0  1  4 ]
        [ 0  0  0  8 ].

Thus the growth factor

rho = 8/1 = 2^3 = 2^{4-1}.
Remarks: Note that this is not the only matrix for which rho = 2^{n-1}. Higham and Higham (1987)
have identified other classes of matrices for which the growth factor with partial pivoting can be this large.
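The growth factor of the 4 x 4 matrix above is easy to check numerically; the following MATLAB fragment is a minimal sketch (the variable names are illustrative). Since every pivot of this matrix already has maximal magnitude in its column, partial pivoting performs no interchanges, so plain elimination gives the same reduced matrices A^(k).

    n = 4;
    A = eye(n) - tril(ones(n),-1);   % 1 on the diagonal, -1 below it
    A(:,n) = 1;                      % last column of ones
    alpha = max(abs(A(:)));          % largest element of A
    rho = alpha;
    for k = 1:n-1
        A(k+1:n,:) = A(k+1:n,:) - (A(k+1:n,k)/A(k,k))*A(k,:);
        rho = max(rho, max(abs(A(:))));   % track the largest element of A^(k)
    end
    rho = rho/alpha                  % growth factor: returns 8 = 2^(n-1)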
A Householder matrix is also known as an Elementary Reflector or a Householder transformation.
We now give a geometric interpretation of a Householder transformation.

[Figure: reflection of a vector x about the hyperplane P = { v in R^n : v^T u = 0 }.]

Hx = (I - 2uu^T) x = x - 2u(u^T x).

With this geometric interpretation the following results become clear:

   ||Hx||_2 = ||x||_2 for every x in R^n.

   H^2 = I.

   Hy = y for every y in P. (Vectors in P cannot be reflected away.)

   H has a simple eigenvalue -1 and an (n-1)-fold eigenvalue 1.
   (P = { v in R^n : v^T u = 0 } has n - 1 linearly independent vectors y_1, ..., y_{n-1}, each satisfying Hy_i = y_i, while Hu = -u.)
Proof. Define

H = I - 2 uu^T / (u^T u)

with u = x + sign(x_1) ||x||_2 e_1; then it is easy to see that Hx is a multiple of e_1.

Note: If x_1 is zero, its sign can be chosen either + or -. Any possibility of overflow or underflow
in the computation of ||x||_2 can be avoided by scaling the vector x. Thus the vector u should be
determined from the vector x / max_i{|x_i|} rather than from the vector x itself.
Algorithm 5.4.1 Creating zeros in a vector with a Householder matrix

Given an n-vector x, the following algorithm replaces x by Hx = (*, 0, ..., 0)^T, where H is a
Householder matrix.

1. Scale the vector:  x = x / max_i{|x_i|}.

2. Compute u = x + sign(x_1) ||x||_2 e_1.

3. Form Hx, where H = I - 2uu^T / (u^T u).

Remark on Step 3: Hx in step 3 should be formed by exploiting the structure of H, as shown
in Example 7 in Chapter 4.
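A minimal MATLAB sketch of Algorithm 5.4.1 is given below (the function name housevec_sketch is illustrative, and the input x is assumed to be a nonzero column vector). The returned Hx is H applied to the scaled vector; since H(cx) = cHx, the same H also annihilates entries 2 through n of the original x.

    function [u, Hx] = housevec_sketch(x)
    % Build u so that H = I - 2*u*u'/(u'*u) maps x to a multiple of e1,
    % following Algorithm 5.4.1.
    x = x/max(abs(x));                 % step 1: scale to avoid over/underflow
    s = sign(x(1));
    if s == 0, s = 1; end              % the sign of a zero entry may be taken as +
    u = x;
    u(1) = u(1) + s*norm(x);           % step 2: u = x + sign(x1)*||x||_2*e1
    Hx = x - (2*(u'*x)/(u'*u))*u;      % step 3: apply H without forming it
    end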
Example 5.4.1

x = (0, 4, 1)^T.

Scaling,

x / max_i{|x_i|} = (0, 1, 1/4)^T,

u = (0, 1, 1/4)^T + (sqrt(17)/4)(1, 0, 0)^T = (sqrt(17)/4, 1, 1/4)^T,

H = I - 2uu^T/(u^T u) = [  0       -0.9701  -0.2425 ]
                        [ -0.9701   0.0588  -0.2353 ]
                        [ -0.2425  -0.2353   0.9412 ],

and, for the original vector x = (0, 4, 1)^T,

Hx = [ -4.1231 ]
     [  0      ]
     [  0      ].
Flop-Count and Round-off Property. Creating zeros in a vector by a Householder matrix
is a cheap and numerically stable procedure. It takes only about 2(n-2) flops to create zeros in
positions 2 through n of a vector, and it can be shown (Wilkinson AEP, pp. 152-162) that if H-hat is
the computed Householder matrix, then

||H - H-hat|| <= 10 mu,

where mu is the machine precision. Moreover,

fl(H-hat x) = H(x + e),

where

||e||_2 <= c n mu ||x||_2.
5.4.1 Householder Matrices and QR Factorization
As we will see later, the QR factorization plays a very significant role in the numerical solution
of linear systems, least-squares problems, and eigenvalue and singular value computations.
We now show how the QR factorization of A can be obtained using Householder matrices, which
will provide a constructive proof of Theorem 5.4.1.
As in the process of LU factorization, this can be achieved in (n ; 1) steps; however, unlike
the Gaussian elimination process, the Householder process can always be carried out to
completion.
Step 1. Construct a Householder matrix H_1 such that H_1 A has zeros below the (1,1) entry
in the first column:

H_1 A = [ *  *  ...  * ]
        [ 0  *  ...  * ]
        [ .  .       . ]
        [ 0  *  ...  * ].

Note that it is sufficient to construct H_1 = I - 2 u_1 u_1^T / (u_1^T u_1) such that

H_1 (a_11, a_21, ..., a_n1)^T = (*, 0, ..., 0)^T,

for then H_1 A will have the above form.

Overwrite A with A^(1) = H_1 A for use in the next step.
Since A^(1) overwrites A, A^(1) can be written as:

A = A^(1) = [ a_11  a_12  ...  a_1n ]
            [ 0     a_22  ...  a_2n ]
            [ .     .          .    ]
            [ 0     a_n2  ...  a_nn ].
Step 2. Construct a Householder matrix H_2 such that H_2 A^(1) has zeros below the (2,2) entry
in the 2nd column, while the zeros already created in the first column of A^(1) in step 1 are not
destroyed:

A^(2) = H_2 A^(1) = [ *  *  *  ...  * ]
                    [ 0  *  *  ...  * ]
                    [ 0  0  *  ...  * ]
                    [ .  .  .       . ]
                    [ 0  0  *  ...  * ].

H_2 can be constructed as follows. First construct a Householder matrix H-hat_2 of order n - 1 such that

H-hat_2 (a_22, a_32, ..., a_n2)^T = (*, 0, ..., 0)^T,

and then define

H_2 = [ 1  0  ...  0     ]
      [ 0                ]
      [ .     H-hat_2    ]
      [ 0                ].

A^(2) = H_2 A^(1) will then have the form above.

Overwrite A with A^(2).

Note: Since H_2 has zeros below the diagonal in the first column, premultiplication of A^(1) by
H_2 does not destroy the zeros already created in the first column.
In general, at the kth step, construct a Householder matrix H-hat_k of order n - k + 1 such that

H-hat_k (a_kk, a_{k+1,k}, ..., a_nk)^T = (*, 0, ..., 0)^T,

and then, defining

H_k = [ I_{k-1}  0       ]
      [ 0        H-hat_k ],

compute A^(k) = H_k A^(k-1). Overwrite A with A^(k).

The matrix A^(k) will have zeros in the kth column below the (k,k) entry, and the zeros already
created in the previous columns will not be destroyed. At the end of the (n-1)th step, the resulting
matrix A^(n-1) will be an upper triangular matrix R.

Now, since

A^(k) = H_k A^(k-1),   k = n-1, ..., 2, 1,

we have

R = A^(n-1) = H_{n-1} A^(n-2) = H_{n-1} H_{n-2} A^(n-3)
            = ... = H_{n-1} H_{n-2} ... H_2 H_1 A.     (5.4.1)

Set

Q^T = H_{n-1} H_{n-2} ... H_2 H_1.     (5.4.2)
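The construction just described can be summarized in the following minimal MATLAB sketch (the name housqr_sketch is illustrative and the code is not the MATCOM routine housqr). The matrix Q is accumulated explicitly here only for clarity; as noted later, in practice Q is usually kept in factored form.

    function [Q, R] = housqr_sketch(A)
    % Householder QR of a square matrix A: R = H_{n-1}...H_1*A and
    % Q = H_1...H_{n-1}, so that A = Q*R.
    n = size(A,1);
    Q = eye(n);
    for k = 1:n-1
        x = A(k:n,k);
        if norm(x) == 0, continue, end
        s = sign(x(1)); if s == 0, s = 1; end
        u = x; u(1) = u(1) + s*norm(x);
        beta = 2/(u'*u);
        A(k:n,k:n) = A(k:n,k:n) - beta*u*(u'*A(k:n,k:n));   % apply H_k on the left
        Q(:,k:n)   = Q(:,k:n) - (Q(:,k:n)*u)*(beta*u');     % accumulate Q = Q*H_k
    end
    R = A;
    end

Applied to the 3 x 3 matrix of Example 5.4.2 below, this sketch should reproduce (up to rounding) the R computed there.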
Example 5.4.2

Find the QR factorization of

A = [ 0  1  1 ]
    [ 1  2  3 ]
    [ 1  1  1 ]

using Householder matrices.

Step 1. Construct H_1:

H_1 (0, 1, 1)^T = (*, 0, 0)^T,

u_1 = (0, 1, 1)^T + sqrt(2) (1, 0, 0)^T = (sqrt(2), 1, 1)^T,

H_1 = I - 2 u_1 u_1^T / (u_1^T u_1) = [  0          -1/sqrt(2)  -1/sqrt(2) ]
                                      [ -1/sqrt(2)   1/2        -1/2       ]
                                      [ -1/sqrt(2)  -1/2         1/2       ].

Form A^(1):

A^(1) = H_1 A = [ -sqrt(2)   -3 sqrt(2)/2      -2 sqrt(2)       ]
                [  0          (1-sqrt(2))/2     (2-sqrt(2))/2   ]
                [  0         -(1+sqrt(2))/2    -(2+sqrt(2))/2   ].

Step 2. Construct H_2:

u_2 = [ -0.2071 ] + 1.2247 [ 1 ]  =  [ -1.4318 ]
      [ -1.2071 ]          [ 0 ]     [ -1.2071 ],

H-hat_2 = [ -0.1691  -0.9856 ]
          [ -0.9856   0.1691 ],

H_2 = [ 1   0        0       ]
      [ 0  -0.1691  -0.9856  ]
      [ 0  -0.9856   0.1691  ].
Form A^(2):

R = A^(2) = H_2 A^(1) = [ -1.4142  -2.1213  -2.8284 ]
                        [  0        1.2247   1.6330 ]
                        [  0        0       -0.5774 ].

Flop-Count. The algorithm requires about 2n^3/3 flops to compute R.

Note: The above count does not take into account the explicit construction of Q. Q is available
only in factored form. It should be noted that in a majority of practical applications, it is sufficient
to have Q in this factored form and, in many applications, Q is not needed at all. If Q is needed
explicitly, another 2n^3/3 flops will be required. (Exercise #22)
Round-off Property. In the presence of round-off errors the algorithm computes the QR de-
composition of a slightly perturbed matrix. Specifically, it can be shown (Wilkinson AEP, p. 236)
that if R-hat denotes the computed R, then there exists an orthogonal Q-hat such that

A + E = Q-hat R-hat.

The error matrix E satisfies

||E||_F <= phi(n) mu ||A||_F,

where phi(n) is a slowly growing function of n and mu is the machine precision. If the inner
products are accumulated in double precision, then it can be shown (Golub and Wilkinson (1966))
that phi(n) = 12.5 n. The algorithm is thus stable.
172
5.4.2 Householder QR Factorization of a Non-Square Matrix
In many applications (such as in least squares problems, etc.), one requires the QR factorization
of an m n matrix A. The above Householder method can be applied to obtain QR factorization
of such an A as well. The process consists of s = minfn; m ; 1g steps; the Householder matrices
H ; H ; : : :; Hs are constructed successively so that
1 2
8 R!
>
< ; if m n;
HsHs; H H A = Q A = > 0
1 2
T
1
: (R; S ); if m n.
Flop-Count and Round-o Property
The Householder method in this case requires
1. n2 (m ; n3 )
ops if m n.
2. m2(n ; m3 )
ops if m n.
The round-o property is the same as in the previous case. The QR factorization of a rectan-
gular matrix using Householder transformations is stable.
Example 5.4.3

A = [ 1       1      ]
    [ 0.0001  0      ]
    [ 0       0.0001 ],

s = min{n, m-1} = 2.

Step 1. Form H_1:

u_1 = [ 1      ]                           [ 1 ]     [ 2      ]
      [ 0.0001 ]  +  sqrt(1 + (0.0001)^2)  [ 0 ]  =  [ 0.0001 ]
      [ 0      ]                           [ 0 ]     [ 0      ],

H_1 = I - 2 u_1 u_1^T / (u_1^T u_1) = [ -1       -0.0001  0 ]
                                      [ -0.0001   1       0 ]
                                      [  0        0       1 ],

A^(1) = H_1 A = [ -1  -1      ]
                [  0  -0.0001 ]
                [  0   0.0001 ].

Step 2. Form H_2:

u_2 = [ -0.0001 ] + sqrt((-0.0001)^2 + (0.0001)^2) [ 1 ]  =  10^{-4} [ -2.4142 ]
      [  0.0001 ]                                  [ 0 ]             [  1.0000 ],

H-hat_2 = I - 2 u_2 u_2^T / (u_2^T u_2) = [ -0.7071  0.7071 ]
                                          [  0.7071  0.7071 ],

H_2 = [ 1   0        0      ]
      [ 0  -0.7071   0.7071 ]
      [ 0   0.7071   0.7071 ].

Form R:

H_2 A^(1) = [ -1  -1      ]     [ R ]
            [  0   0.0001 ]  =  [ 0 ],
            [  0   0      ]

R = [ -1  -1      ]
    [  0   0.0001 ].

Form

Q = H_1 H_2 = [ -1        0.0001   -0.0001 ]
              [ -0.0001  -0.7071    0.7071 ]
              [  0        0.7071    0.7071 ].
5.4.3 Householder Matrices and Reduction to Hessenberg Form
174
The idea of orthogonal factorization using Householder matrices described in the previous sec-
tion can be easily extended to obtain P and H_u.

The matrix P is constructed as the product of (n-2) Householder matrices P_1 through P_{n-2}:
P_1 is constructed to create zeros in the first column of A below the entry (2,1), P_2 is determined
to create zeros below the entry (3,2) in the second column of the matrix P_1 A P_1^T, and so on.

The process consists of (n-2) steps. (Note that an n x n upper Hessenberg matrix contains at
least (n-2)(n-1)/2 zeros.)
Step 1. Construct a Householder matrix P-hat_1 of order n - 1 such that

P-hat_1 (a_21, a_31, ..., a_n1)^T = (*, 0, ..., 0)^T.

Define

P_1 = [ 1  0       ]
      [ 0  P-hat_1 ]

and compute

A^(1) = P_1 A P_1^T.

Step 2. Construct a Householder matrix P-hat_2 of order n - 2 such that

P-hat_2 (a_32, a_42, ..., a_n2)^T = (*, 0, ..., 0)^T.

Define

P_2 = [ I_2  0       ]
      [ 0    P-hat_2 ]

and compute A^(2) = P_2 A^(1) P_2^T.

Overwrite A with A^(2). Then

A = A^(2) = [ *  *  *  ...  * ]
            [ *  *  *  ...  * ]
            [ 0  *  *  ...  * ]
            [ 0  0  *  ...  * ]
            [ .  .  .       . ]
            [ 0  0  *  ...  * ].
The general step k can now easily be written down.

At the end of (n-2) steps, the matrix A^(n-2) is an upper Hessenberg matrix H_u. Now,

H_u = A^(n-2) = P_{n-2} A^(n-3) P_{n-2}^T
             = P_{n-2} (P_{n-3} A^(n-4) P_{n-3}^T) P_{n-2}^T
             .
             .
             = (P_{n-2} P_{n-3} ... P_1) A (P_1^T P_2^T ... P_{n-3}^T P_{n-2}^T).     (5.4.4)

Set

P = P_{n-2} P_{n-3} ... P_1.     (5.4.5)

We then have H_u = P A P^T. Since each Householder matrix P_i is orthogonal, the matrix P, which
is the product of (n-2) Householder matrices, is also orthogonal.
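A minimal MATLAB sketch of the reduction just described is given below (the name houshess_sketch is illustrative). Each P_k is applied implicitly through its Householder vector, and P = P_{n-2} ... P_1 is accumulated so that H_u = P*A*P'.

    function [Hu, P] = houshess_sketch(A)
    % Reduction of A to upper Hessenberg form by Householder similarity.
    n = size(A,1);
    P = eye(n);
    for k = 1:n-2
        x = A(k+1:n,k);
        if norm(x) == 0, continue, end
        s = sign(x(1)); if s == 0, s = 1; end
        u = x; u(1) = u(1) + s*norm(x);
        beta = 2/(u'*u);
        A(k+1:n,:) = A(k+1:n,:) - beta*u*(u'*A(k+1:n,:));    % left multiplication by P_k
        A(:,k+1:n) = A(:,k+1:n) - (A(:,k+1:n)*u)*(beta*u');  % right multiplication by P_k'
        P(k+1:n,:) = P(k+1:n,:) - beta*u*(u'*P(k+1:n,:));    % accumulate P = P_k*...*P_1
    end
    Hu = A;
    end

For the matrix of Example 5.4.4 below, this sketch should reproduce H_u up to rounding.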
For example, after the first step we have

P_1 A P_1^T = [ *  *  ...  * ]
              [ *  *  ...  * ]
              [ 0  *  ...  * ]
              [ .  .       . ]
              [ 0  *  ...  * ].

Flop-Count. The algorithm requires about 5n^3/3 flops to compute H_u. This count does not
include the explicit construction of the transforming matrix P.

Example 5.4.4

Let

A = [ 0  1  2 ]
    [ 1  2  3 ]
    [ 1  1  1 ].

Form P-hat_1:

P-hat_1 (1, 1)^T = (*, 0)^T,

u_2 = (1, 1)^T + sqrt(2) (1, 0)^T = (1 + sqrt(2), 1)^T,

P-hat_1 = I_2 - 2 u_2 u_2^T / (u_2^T u_2) = [ -0.7071  -0.7071 ]
                                            [ -0.7071   0.7071 ].

Form P_1:

P_1 = [ 1  0       ]     [ 1   0        0       ]
      [ 0  P-hat_1 ]  =  [ 0  -0.7071  -0.7071  ]
                         [ 0  -0.7071   0.7071  ],

A = A^(1) = P_1 A P_1^T = [  0       -2.1213   0.7071 ]
                          [ -1.4142   3.5000  -0.5000 ]  =  H_u.
                          [  0        1.5000  -0.5000 ]
All computations are done using 4-digit arithmetic.
Tridiagonal Reduction
If the matrix A is symmetric, then from
PAP T = Hu
it follows immediately that the upper Hessenberg matrix Hu is also symmetric and, therefore, is
tridiagonal. Thus, if the algorithm is applied to a symmetric matrix A, the resulting matrix Hu
will be a symmetric tridiagonal matrix T . Furthermore, one obviously can take advantage of the
symmetry of A to modify the algorithm. For example, a signicant savings can be made in storage
by taking advantage of the symmetry of each A(k):
The symmetric algorithm requires only 2n^3/3 flops to compute T, compared to the 5n^3/3 flops needed to
compute H_u. The round-off property is essentially the same as for the nonsymmetric algorithm. The
algorithm is stable.
Example 5.4.5

Let

A = [ 0  1  1 ]
    [ 1  2  1 ]
    [ 1  1  1 ].

Since n = 3, we have just one step to perform.

Form P-hat_1:

P-hat_1 (1, 1)^T = (*, 0)^T,

u_2 = (1, 1)^T + sqrt(2) e_1 = (1 + sqrt(2), 1)^T,

P-hat_1 = I_2 - 2 u_2 u_2^T / (u_2^T u_2)
        = I_2 - 0.2929 [ 5.8284  2.4142 ]
                       [ 2.4142  1.0000 ]
        = [ -0.7071  -0.7071 ]
          [ -0.7071   0.7071 ].

Form P_1:

P_1 = [ 1  0       ]     [ 1   0        0       ]
      [ 0  P-hat_1 ]  =  [ 0  -0.7071  -0.7071  ]
                         [ 0  -0.7071   0.7071  ].

Thus

H_u = P_1 A P_1^T = [  0       -1.4142  0      ]
                    [ -1.4142   2.5000  0.5000 ]
                    [  0        0.5000  0.5000 ].

(Note that H_u is symmetric tridiagonal.)
W. Givens was director of the Applied Mathematics Division at Argonne National Laboratory. His pioneering
work done in 1950 on computing the eigenvalues of a symmetric matrix by reducing it to a symmetric tridiagonal
form in a numerically stable way forms the basis of many numerically backward stable algorithms developed later.
Givens held appointments at many prestigious institutes and research institutions (for a complete biography, see the
July 1993 SIAM Newsletter). He died in March, 1993.
180
[Figure: rotation of the vector u = (cos a, sin a)^T through the angle theta in the (e_1, e_2)-plane.]

v = [ cos(theta + a) ]  =  [ cos theta  -sin theta ] [ cos a ],       u = [ cos a ]
    [ sin(theta + a) ]     [ sin theta   cos theta ] [ sin a ]            [ sin a ].
Thus, when an n-vector

x = (x_1, x_2, ..., x_n)^T

is premultiplied by the Givens rotation J(i, j, theta), only the ith and jth components of x are affected;
the other components remain unchanged.

Note that since c^2 + s^2 = 1, J(i, j, theta) J(i, j, theta)^T = I; thus the rotation J(i, j, theta) is orthogonal.

If x = (x_1, x_2)^T is a 2-vector, then it is a matter of simple verification that, with

c = x_1 / sqrt(x_1^2 + x_2^2),   s = x_2 / sqrt(x_1^2 + x_2^2),

the Givens rotation

J(1, 2, theta) = [ c  s ]
                 [ -s c ]

is such that J(1, 2, theta) x = (*, 0)^T.

The above formula for computing c and s might cause some underflow or overflow. However,
the following simple rearrangement of the formula prevents that possibility: if |x_2| >= |x_1|, compute
t = x_1/x_2, s = 1/sqrt(1 + t^2), c = st; otherwise compute t = x_2/x_1, c = 1/sqrt(1 + t^2), s = ct.
(Note that the computations of s and t do not involve the angle theta.)
Example 5.5.1

x = (1, 2)^T.

Then c = 1/sqrt(5), s = 2/sqrt(5), and

J(1, 2, theta) x = [  1/sqrt(5)  2/sqrt(5) ] [ 1 ]  =  [ sqrt(5) ]
                   [ -2/sqrt(5)  1/sqrt(5) ] [ 2 ]     [ 0       ].

More generally, if

x = (x_1, ..., x_i, ..., x_k, ..., x_n)^T
and if we desire to zero x_k only, we can construct the rotation J(i, k, theta) (i < k) such that J(i, k, theta) x
will have a zero in the kth position.

To construct J(i, k, theta), first construct a 2 x 2 Givens rotation [ c  s; -s  c ] such that

[ c  s ] [ x_i ]  =  [ * ]
[ -s c ] [ x_k ]     [ 0 ],

and then form the matrix J(i, k, theta) by inserting c in the positions (i, i) and (k, k), s and -s,
respectively, in the positions (i, k) and (k, i), and filling the rest of the matrix with entries of the
identity matrix.
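A minimal MATLAB sketch of the computation of c and s, using the overflow/underflow-avoiding rearrangement mentioned above, is as follows (the function name givens_cs_sketch is illustrative).

    function [c, s] = givens_cs_sketch(a, b)
    % Compute c and s so that [c s; -s c]*[a; b] = [r; 0].
    if b == 0
        c = 1; s = 0;
    elseif abs(b) > abs(a)
        t = a/b; s = 1/sqrt(1 + t^2); c = s*t;
    else
        t = b/a; c = 1/sqrt(1 + t^2); s = c*t;
    end
    end

For example, givens_cs_sketch(-1, 3) returns c = -1/sqrt(10) and s = 3/sqrt(10), the values used in Example 5.5.2 below.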
Example 5.5.2

x = (1, -1, 3)^T.

Suppose we want to create a zero in the third position, that is, k = 3. Choose i = 2.

1. Form a 2 x 2 rotation such that

[ c  s ] [ -1 ]  =  [ * ]:       c = -1/sqrt(10),   s = 3/sqrt(10).
[ -s c ] [  3 ]     [ 0 ]

2. Form

J(2, 3, theta) = [ 1   0             0           ]
                 [ 0  -1/sqrt(10)    3/sqrt(10)  ]
                 [ 0  -3/sqrt(10)   -1/sqrt(10)  ].

Then

J(2, 3, theta) x = [ 1   0             0           ] [  1 ]     [ 1        ]
                   [ 0  -1/sqrt(10)    3/sqrt(10)  ] [ -1 ]  =  [ sqrt(10) ]
                   [ 0  -3/sqrt(10)   -1/sqrt(10)  ] [  3 ]     [ 0        ].
Example 5.5.3

x = (1, -1, 2)^T.

To create zeros in the second and third positions:

J(1, 2, theta) = [ 1/sqrt(2)  -1/sqrt(2)  0 ]
                 [ 1/sqrt(2)   1/sqrt(2)  0 ]
                 [ 0           0          1 ],

x^(1) = J(1, 2, theta) x = ( sqrt(2), 0, 2 )^T,

J(1, 3, theta) = [  sqrt(2)/sqrt(6)  0  2/sqrt(6)        ]
                 [  0                1  0                ]
                 [ -2/sqrt(6)        0  sqrt(2)/sqrt(6)  ].

Then

J(1, 3, theta) x^(1) = ( sqrt(6), 0, 0 )^T,

P = J(1, 3, theta) J(1, 2, theta) = [  1/sqrt(6)  -1/sqrt(6)  2/sqrt(6) ]
                                    [  1/sqrt(2)   1/sqrt(2)  0         ]
                                    [ -1/sqrt(3)   1/sqrt(3)  1/sqrt(3) ],

P x = ( sqrt(6), 0, 0 )^T.
Flop-Count and Round-off Property. Creating zeros in a vector using Givens rotations
is about twice as expensive as using Householder matrices. To be precise, the process requires about
1.5 times as many flops as the Householder method, but it requires O(n^2/2) square roots, whereas the
Householder method requires O(n) square roots.

The process is as stable as the Householder method.
184
Remark: Note that there are other ways to do this as well. For example, we can form J(k, j, theta),
affecting the kth and jth rows, such that J(k, j, theta) A will have a zero in the (j, i) position.
Example 5.5.4

Let

A = [ 1  2  3 ]
    [ 3  3  4 ]
    [ 4  5  6 ].

Create a zero in the (2,1) position of A using J(1, 2, theta):

1. Find c and s such that

[ c  s ] [ 1 ]  =  [ * ]:       c = 1/sqrt(10),   s = 3/sqrt(10).
[ -s c ] [ 3 ]     [ 0 ]

2. Form

J(2, 3, theta) = [ 1   0             0           ]
                 [ 0   2/sqrt(20)    4/sqrt(20)  ]
                 [ 0  -4/sqrt(20)    2/sqrt(20)  ],

A^(1) = J(2, 3, theta) [ 1  2  3 ]     [ 1         2             3           ]
                       [ 2  3  4 ]  =  [ sqrt(20)  26/sqrt(20)   32/sqrt(20) ]
                       [ 4  5  6 ]     [ 0         *             *           ],

and so on.
Let s = min(n, m-1). Then

R = A^(s-1) = Q_{s-1} A^(s-2) = Q_{s-1} Q_{s-2} A^(s-3) = ... = Q_{s-1} Q_{s-2} ... Q_2 Q_1 A = Q^T A.
186
Theorem 5.5.1 (Givens QR Factorization Theorem) Given an m x
n matrix A, m >= n, there exist s = min(n, m-1) orthogo-
nal matrices Q_1, Q_2, ..., Q_s, each a product of Givens rotations
(Q_i = J(i, m, theta) J(i, m-1, theta) ... J(i, i+1, theta)), such that with
Q^T = Q_s ... Q_2 Q_1 we have

A = QR,

where

R = [ R_1 ]
    [ 0   ]

and R_1 is upper triangular.

Flop-Count. The algorithm requires about 2n^2 (m - n/3) flops. This count, of course, does not
include computation of Q. Thus, the algorithm is almost twice as expensive as the House-
holder algorithm for QR factorization.
Round-off Property. The algorithm is quite stable. It can be shown that the computed Q-hat
and R-hat satisfy

R-hat = Q-hat^T (A + E),

where ||E||_F <= c mu ||A||_F, and c is a constant of order unity (Wilkinson AEP, p. 240).
This means that R_2 and R_1 are the same except for the signs of their rows. Similarly, from

Q_2^T Q_1 = V

we see that Q_1 and Q_2 are the same except for the signs of their columns. (For a proof, see Stewart
IMC, p. 214.)

If the diagonal entries of R_1 and R_2 are positive, then V is the identity matrix, so that

R_2 = R_1   and   Q_2 = Q_1.
The above result can be easily generalized to the case where A is m n and has linearly
independent columns.
Example 5.5.6

We find the QR factorization of

A = [ 0  1  1 ]
    [ 1  2  3 ]
    [ 1  1  1 ]

using Givens rotations and verify the uniqueness of this factorization.

Step 1. Find c and s such that

[ c  s ] [ a_11 ]  =  [ * ]:       a_11 = 0,  a_21 = 1,  so  c = 0,  s = 1.
[ -s c ] [ a_21 ]     [ 0 ]

J(1, 2, theta) = [  0  1  0 ]
                 [ -1  0  0 ]
                 [  0  0  1 ],

A = J(1, 2, theta) A = [  0  1  0 ] [ 0  1  1 ]     [ 1   2   3 ]
                       [ -1  0  0 ] [ 1  2  3 ]  =  [ 0  -1  -1 ]
                       [  0  0  1 ] [ 1  1  1 ]     [ 1   1   1 ].

Find c and s such that

[ c  s ] [ a_11 ]  =  [ * ]:       a_11 = 1,  a_31 = 1,  so  c = 1/sqrt(2),  s = 1/sqrt(2).
[ -s c ] [ a_31 ]     [ 0 ]

J(1, 3, theta) = [  1/sqrt(2)  0  1/sqrt(2) ]
                 [  0          1  0         ]
                 [ -1/sqrt(2)  0  1/sqrt(2) ],

A = J(1, 3, theta) A = [ sqrt(2)   3/sqrt(2)    2 sqrt(2) ]
                       [ 0        -1           -1         ]
                       [ 0        -1/sqrt(2)   -sqrt(2)   ].

Step 2. Find c and s such that

[ c  s ] [ a_22 ]  =  [ * ]:
[ -s c ] [ a_32 ]     [ 0 ]

a_22 = -1,   a_32 = -1/sqrt(2),   c = -sqrt(2/3),   s = -1/sqrt(3).

J(2, 3, theta) A = [ 1   0            0           ] [ sqrt(2)   3/sqrt(2)    2 sqrt(2) ]
                   [ 0  -sqrt(2/3)   -1/sqrt(3)   ] [ 0        -1           -1         ]
                   [ 0   1/sqrt(3)   -sqrt(2/3)   ] [ 0        -1/sqrt(2)   -sqrt(2)   ]

                 = [ sqrt(2)  3/sqrt(2)          2 sqrt(2)           ]     [ 1.4142  2.1213  2.8284 ]
                   [ 0        sqrt(3)/sqrt(2)    2 sqrt(2)/sqrt(3)   ]  =  [ 0       1.2247  1.6330 ]  =  R
                   [ 0        0                  1/sqrt(3)           ]     [ 0       0       0.5774 ]

(using four-digit computations).
190
Remark: Note that the upper triangular matrix obtained here is essentially the same as the
one given by the Householder method earlier (Example 5.4.2), differing from it only in the signs of
the first and third rows.
P_2 A^(1) P_2^T = A^(2) = [ *  *  *  ...  * ]
                          [ *  *  *  ...  * ]
                          [ 0  *  *  ...  * ]
                          [ 0  0  *  ...  * ]
                          [ .  .  .       . ]
                          [ 0  0  *  ...  * ],

and so on. At the end of the (n-2)th step, the matrix A^(n-2) is the upper Hessenberg matrix H_u. The
process can be summarized as follows.
191
For p = 1, 2, ..., n-2 do
   For q = p+2, ..., n do

   1. Find c = cos(theta) and s = sin(theta) such that

      [ c  s ] [ a_{p+1,p} ]  =  [ * ]
      [ -s c ] [ a_{q,p}   ]     [ 0 ].

   2. Save c and s and the indices p and q.

   3. Overwrite A with J(p+1, q, theta) A J(p+1, q, theta)^T.
Forming the Matrix P and Other Computational Details

There is no need to form J(p+1, q, theta) and J(p+1, q, theta) A explicitly, since they are completely
determined by p, q, c and s (see the section at the end of the QR factorization algorithm using
Givens rotations). If P is needed explicitly, it can be formed from

P = P_{n-2} ... P_2 P_1,

where each P_k is the product of the Givens rotations used at step k.

Flop-Count. The algorithm requires about 10n^3/3 flops, compared to the 5n^3/3 flops re-
quired by the Householder method. Thus, the Givens reduction to Hessenberg form is
about twice as expensive as the Householder reduction. If the matrix A is symmetric, then
the algorithm requires about 4n^3/3 flops to transform A to a symmetric tridiagonal matrix T; again,
this is twice as much as required by the Householder method to do the same job.
Round-o Property. The round-o property is essentially the same as the Householder
method. The method is numerically stable.
Example 5.5.7

A = [ 0  1  2 ]
    [ 1  2  3 ]
    [ 1  1  1 ].

Step 1. Find c and s such that

[ c  s ] [ a_21 ]  =  [ * ]:       a_21 = a_31 = 1,   c = 1/sqrt(2),   s = 1/sqrt(2).
[ -s c ] [ a_31 ]     [ 0 ]

J(2, 3, theta) = [ 1   0           0         ]
                 [ 0   1/sqrt(2)   1/sqrt(2) ]
                 [ 0  -1/sqrt(2)   1/sqrt(2) ],

A = J(2, 3, theta) A J(2, 3, theta)^T = [ 0        2.1213   0.7071 ]
                                        [ 1.4142   3.5000   0.5000 ]
                                        [ 0       -1.5000  -0.5000 ]

  = Upper Hessenberg.

Observation: Note that the upper Hessenberg matrix obtained here is essentially the same as
that obtained by Householder's method (Example 5.4.4); the subdiagonal entries differ only in sign.
matrices. Suppose that P and Q have the same first columns. Then H_1 and H_2 are
essentially the same in the sense that H_2 = D^{-1} H_1 D, where

D = diag(+-1, +-1, ..., +-1).
193
Example 5.5.8

Consider the matrix

A = [ 0  1  2 ]
    [ 1  2  3 ]
    [ 1  1  1 ]

once more. The Householder method (Example 5.4.4) gave

H_1 = P_1 A P_1^T = [  0       -2.1213   0.7071 ]
                    [ -1.4142   3.5000  -0.5000 ]
                    [  0        1.5000  -0.5000 ].

The Givens method (Example 5.5.7) gave

H_2 = J(2, 3, theta) A J(2, 3, theta)^T = [ 0        2.1213   0.7071 ]
                                          [ 1.4142   3.5000   0.5000 ]
                                          [ 0       -1.5000  -0.5000 ].

In the notation of Theorem 5.5.3 we have

P = P_1,   Q^T = J(2, 3, theta).

Both P and Q have the same first columns, namely, the first column of the identity. We verify that

H_2 = D^{-1} H_1 D

with

D = diag(1, -1, 1).
The matrix P_A = Q_1 Q_1^T is the orthogonal projection onto R(A), and the matrix P_A-perp = Q_2 Q_2^T is the projection onto
the orthogonal complement of R(A). Since the orthogonal complement of R(A) is denoted by
R(A)-perp = N(A^T), we shall denote P_A-perp by P_N.
194
Example 5.6.1

A = [ 1       1      ]
    [ 0.0001  0      ]
    [ 0       0.0001 ].

Using the results of Example 5.4.3 we have

Q = [ -1        0.0001   -0.0001 ]
    [ -0.0001  -0.7071    0.7071 ]
    [  0        0.7071    0.7071 ],

Q_1 = [ -1        0.0001  ]          Q_2 = [ -0.0001 ]
      [ -0.0001  -0.7071  ],               [  0.7071 ]
      [  0        0.7071  ]                [  0.7071 ],

P_A = Q_1 Q_1^T = [ 1.0000    0.00003   0.00007 ]
                  [ 0.00003   0.5000   -0.5000  ]
                  [ 0.00007  -0.5000    0.5000  ],

P_N = P_A-perp = Q_2 Q_2^T = [ -0.0001 ] ( -0.0001  0.7071  0.7071 )
                             [  0.7071 ]
                             [  0.7071 ]

                           = [  0.00000001  -0.00007  -0.00007 ]
                             [ -0.00007      0.5000    0.5000  ]
                             [ -0.00007      0.5000    0.5000  ].
Projection of a vector
Given an m-vector b, the vector bR , the projection of b onto R(A), is given by
bR = PA b:
Similarly b-perp, the projection of b onto the orthogonal complement of R(A), is given by

b-perp = P_A-perp b = P_N b,

where we denote P_A-perp by P_N.

Note that b = b_R + b-perp. Again, since the orthogonal complement of R(A) is N(A^T), the null
space of A^T, we denote b-perp by b_N for notational convenience.
195
Note: Since b_N = b - b_R, it is tempting to compute b_N just by subtracting b_R from b once
b_R has been computed. This is not advisable, since in the computation of b_N from b_N = b - b_R,
cancellation can take place when b_R is close to b.
Example 5.6.2

A = [ 1  2 ],    b = [ 1 ],
    [ 0  1 ]         [ 1 ]
    [ 1  0 ]         [ 1 ]

Q = [ -0.7071  -0.5774  -0.4082 ]
    [  0       -0.5774   0.8165 ]
    [ -0.7071   0.5774   0.4082 ],

Q_1 = [ -0.7071  -0.5774 ]
      [  0       -0.5774 ]
      [ -0.7071   0.5774 ],

P_A = Q_1 Q_1^T = [ 0.8334   0.3334   0.1666 ]
                  [ 0.3334   0.3334  -0.3334 ]
                  [ 0.1666  -0.3334   0.8334 ],

b_R = P_A b = [ 1.3334 ]
              [ 0.3334 ]
              [ 0.6666 ],

P_N = [  0.1667  -0.3333  -0.1667 ]
      [ -0.3333   0.6667   0.3333 ]
      [ -0.1667   0.3333   0.1667 ],

b_N = P_N b = [ -0.3334 ]
              [  0.6666 ]
              [ -0.3334 ].
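The computations of Example 5.6.2 can be reproduced with a few lines of MATLAB, using the built-in qr function; this is only a sketch, and the variable names are illustrative.

    A = [1 2; 0 1; 1 0];     % matrix of Example 5.6.2
    b = [1; 1; 1];
    [Q, R] = qr(A);          % full QR factorization: Q is 3-by-3
    Q1 = Q(:,1:2);           % orthonormal basis for R(A)
    Q2 = Q(:,3);             % orthonormal basis for N(A')
    PA = Q1*Q1';             % orthogonal projection onto R(A)
    PN = Q2*Q2';             % projection onto the orthogonal complement of R(A)
    bR = PA*b;               % projection of b onto R(A)
    bN = PN*b;               % projection of b onto N(A')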
Example 5.6.3

A = [ 1       1      ],    b = [ 2      ],
    [ 0.0001  0      ]         [ 0.0001 ]
    [ 0       0.0001 ]         [ 0.0001 ]

Q = [ -1        0.0001   -0.0001 ]
    [ -0.0001  -0.7071    0.7071 ]
    [  0        0.7071    0.7071 ],

P_A = Q_1 Q_1^T = [ 1        0.0001   0.0001 ]
                  [ 0.0001   0.5000  -0.5000 ]
                  [ 0.0001  -0.5000   0.5000 ],

b_R = P_A b = [ 2      ],      b_N = [ 0 ].
              [ 0.0001 ]             [ 0 ]
              [ 0.0001 ]             [ 0 ]
Projection of a Matrix Onto the Range of Another Matrix
Let B = (b_1, ..., b_n) be an m x n matrix. We can then think of projecting each column of B
onto R(A) and onto its orthogonal complement. Thus, the matrix

B_R = P_A (b_1, ..., b_n) = P_A B

is the projection of B onto R(A), and B_N = P_N B is the projection of B onto the orthogonal
complement of R(A). For example, let

A = [ 1  2 ],      B = [ 1  2  3 ].
    [ 2  3 ]           [ 2  3  4 ]
    [ 4  5 ]           [ 3  4  5 ]

A = QR gives

Q = [ -0.2182  -0.8165  -0.5345 ]
    [ -0.4364  -0.4082   0.8018 ]
    [ -0.8729   0.4082  -0.2673 ].

Orthonormal Basis for R(A):

Q_1 = [ -0.2182  -0.8165 ]
      [ -0.4364  -0.4082 ]
      [ -0.8729   0.4082 ],

P_A = Q_1 Q_1^T = [  0.7143  0.4286  -0.1429 ]
                  [  0.4286  0.3571   0.2143 ]
                  [ -0.1429  0.2143   0.9286 ].
198
Theorem 5.7.1 (QR Column Pivoting Theorem) Let A be an m x n matrix
with rank(A) = r <= min(m, n). Then there exist an n x n permutation matrix P
and an m x m orthogonal matrix Q such that

Q^T A P = [ R_11  R_12 ],
          [ 0     0    ]

where R_11 is an r x r upper triangular matrix with positive diagonal entries.

Proof. Choose the permutation matrix P so that AP = (A_1, A_2),
where A_1 is m x r and has linearly independent columns. Consider now the QR factorization
of A_1:

Q^T A_1 = [ R_11 ],
          [ 0    ]

where by the uniqueness theorem (Theorem 5.5.2), Q and R_11 are uniquely determined and R_11
has positive diagonal entries. Then

Q^T A P = (Q^T A_1, Q^T A_2) = [ R_11  R_12 ].
                               [ 0     R_22 ]

Since rank(Q^T A P) = rank(A) = r and rank(R_11) = r, we must have R_22 = 0.
Step 1. Find the column of A having the maximum norm. Permute now the columns of A
so that the column of maximum norm becomes the first column.

This is equivalent to forming a permutation matrix P_1 such that the first column of AP_1
has the maximum norm. Form now a Householder matrix H_1 so that

A_1 = H_1 A P_1

has zeros below the (1,1) entry in its first column.

Step 2. Let A-hat_1 be the submatrix obtained from A_1
by deleting the first row and the first column. Permute the columns of this submatrix so that
the column of maximum norm becomes the first column. This is equivalent to constructing a
permutation matrix P-hat_2 such that the first column of A-hat_1 P-hat_2 has the maximum norm. Form
P_2 from P-hat_2 in the usual way, that is:

P_2 = [ 1  0  ...  0       ]
      [ 0                  ]
      [ .     P-hat_2      ]
      [ 0                  ].
Now construct a Householder matrix H_2 so that

A_2 = H_2 A_1 P_2 = H_2 H_1 A P_1 P_2

has zeros in the second column of A_2 below the (2,2) entry. As before, H_2 can be constructed in
two steps as in Section 5.3.1.
The kth step can now easily be written down.
The process is continued until the entries below the diagonal of the current matrix all become
zero.
Suppose r steps are needed. Then at the end of the rth step, we have

A = A^(r) = H_r ... H_1 A P_1 ... P_r = Q^T A P = R = [ R_11  R_12 ].
                                                      [ 0     0    ]
Flop-Count and Storage Considerations. The above method requires 2mnr - r^2(m + n) + 2r^3/3
flops. The matrix Q, as in the Householder factorization, is stored in factored form. The
matrix Q can be stored in factored form in the subdiagonal part of A, and A can overwrite R.
Example 5.7.1

A = [ 0    0 ]
    [ 1/2  1 ]  =  (a_1, a_2).
    [ 1/2  1 ]

Step 1. a_2 has the largest norm. Then

P_1 = [ 0  1 ],
      [ 1  0 ]

A P_1 = [ 0  0   ]
        [ 1  1/2 ]
        [ 1  1/2 ],

H_1 = [  0          -1/sqrt(2)  -1/sqrt(2) ]
      [ -1/sqrt(2)   1/2        -1/2       ]
      [ -1/sqrt(2)  -1/2         1/2       ],

A^(1) = H_1 A P_1 = [ -sqrt(2)  -sqrt(2)/2 ]
                    [  0         0         ]  =  R  =  [ R_11  R_12 ].
                    [  0         0         ]           [ 0     0    ]

Thus, for this example,

Q = H_1^T = H_1,      P = P_1 = [ 0  1 ].
                                [ 1  0 ]

The matrix A has rank 1, since R_11 = -sqrt(2) is 1 x 1 (and nonzero). The column vector

[  0          ]
[ -1/sqrt(2)  ]
[ -1/sqrt(2)  ]

forms an orthonormal basis of R(A).
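QR factorization with column pivoting is available directly in MATLAB through the three-output form of qr, which returns a permutation matrix P such that A*P = Q*R with the diagonal entries of R nonincreasing in magnitude. The following fragment, applied to the matrix of Example 5.7.1, is a minimal sketch; the rank test shown is a crude illustrative choice, not a robust rank-determination procedure.

    A = [0 0; 1/2 1; 1/2 1];              % matrix of Example 5.7.1
    [Q, R, P] = qr(A);                    % column-pivoted QR: A*P = Q*R
    r = nnz(abs(diag(R)) > eps*norm(A))   % crude rank estimate (should return 1 here)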
201
Theorem 5.7.2 (Complete Orthogonalization Theorem) Given an m x n matrix A with
rank(A) = r, there exist orthogonal matrices Q (m x m) and W (n x n) such that

Q^T A W = [ T  0 ],
          [ 0  0 ]

where T is an r x r upper triangular matrix with positive diagonal entries.
and if R_22 is "small" in some measure (say, ||R_22|| is of O(mu), where mu is the machine precision),
then the reduction will be terminated. Thus, from the above discussion, we note that, given an
m x n matrix A (m >= n), if there exists a permutation matrix P such that

Q^T A P = R = [ R_11  R_12 ],
              [ 0     R_22 ]

where R_11 is r x r and R_22 is small in some measure, then we will say that A has numerical rank r.
(For more on numerical rank, see Chapter 10, Section 10.5.5.)
Unfortunately, the converse is not true.
A celebrated counterexample due to Kahan (1966) shows that a matrix can be nearly rank-
deficient without having ||R_22|| small at all.
Gene H. Golub, an American mathematician and computer scientist, is well known for his outstanding contribu-
tion in numerical linear algebra, especially in the area of the singular value decomposition (SVD), least squares, and
their applications in statistical computations. Golub is a professor of computer science at Stanford University and
is the co-author of the celebrated numerical linear algebra book \Matrix Computations". Golub is a member of the
National Academy of Sciences and a past president of SIAM (Society for Industrial and Applied Mathematics).
Consider

A = diag(1, s, ..., s^{n-1}) [ 1  -c  -c  ...  -c ]
                             [ 0   1  -c  ...  -c ]
                             [ .       .        . ]
                             [ .           .   -c ]
                             [ 0   0  ...   0   1 ]  =  R,

with c^2 + s^2 = 1, c, s > 0. For n = 100, c = 0.2, it can be shown that A is nearly singular (the
smallest singular value is O(10^{-8})). On the other hand, r_nn = s^{n-1} = 0.133, which is not small;
thus the triangular factor gives no hint of the near rank deficiency.
It is natural to wonder how the QR factorization of A' can be obtained from the given QR factor-
ization of A, without finding it from scratch.

The problem is called the updating QR factorization problem. The downdating QR fac-
torization problem is similarly defined. Updating and downdating QR factorizations arise in a variety of
practical applications, such as signal and image processing.

We present below a simple algorithm using Householder matrices to solve the updating problem.

Algorithm 5.8.1 Updating QR Factorization Using Householder Matrices

The following algorithm computes the QR factorization of A' = (a_1, ..., a_k, a_{k+1}), given the
Householder QR factorization of A = (a_1, ..., a_k).

Step 1. Compute b_{k+1} = H_k ... H_1 a_{k+1}, where H_1 through H_k are the Householder matrices such that

Q^T A = H_k ... H_1 A = [ R ].
                        [ 0 ]

Step 2. Compute a Householder matrix H_{k+1} so that r_{k+1} = H_{k+1} b_{k+1} has zeros in entries
k+2, ..., m.

Step 3. Form R' = [ [ R ],  r_{k+1} ].
                  [ [ 0 ]          ]

Step 4. Form Q' = H_{k+1} ... H_1.

Theorem 5.8.1 The matrices R' and Q' defined above are such that Q' A' = R' (equivalently, A' = (Q')^T R').
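A minimal MATLAB sketch of Algorithm 5.8.1 follows; it assumes the full factorization A = Q*R (with Q m x m and R m x k, as returned by MATLAB's qr) and appends one column. The function name qrupdate_sketch is an illustrative choice, not MATLAB's own qrupdate, and the returned Q is the orthogonal factor in A' = Q*R', that is, the transpose of the matrix Q' in the algorithm above.

    function [Q, R] = qrupdate_sketch(Q, R, a_new)
    % Given A = Q*R (Q m-by-m orthogonal, R m-by-k with zeros below row k),
    % return the QR factorization of A' = [A, a_new].
    [m, k] = size(R);
    b = Q'*a_new;                           % Step 1: b = Q^T * a_new
    if k+1 < m && norm(b(k+2:m)) > 0
        x = b(k+1:m);                       % entries k+1, ..., m of b
        s = sign(x(1)); if s == 0, s = 1; end
        u = x; u(1) = u(1) + s*norm(x);
        beta = 2/(u'*u);
        b(k+1:m) = x - (beta*(u'*x))*u;                       % Step 2: zero entries k+2, ..., m
        Q(:,k+1:m) = Q(:,k+1:m) - (Q(:,k+1:m)*u)*(beta*u');   % Step 4: new Q = Q*H_{k+1}
    end
    R = [R, b];                             % Step 3: R' = (R, r_{k+1})
    end

For instance, [Q, R] = qr([1; 2; 3]); [Q, R] = qrupdate_sketch(Q, R, [1; 4; 5]) gives a QR factorization of the matrix A' of Example 5.8.1 below (the signs of individual rows of R may differ from those shown there, depending on qr's sign convention).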
Example 5.8.1

A = [ 1 ],
    [ 2 ]
    [ 3 ]

and the column to be appended is a_2 = (1, 4, 5)^T, so that A' = [ 1  1 ].
                                                                 [ 2  4 ]
                                                                 [ 3  5 ]

For A we have

Q^T = H_1 = [ -0.2673  -0.5345  -0.8018 ],     R = [ -3.7417 ].
            [ -0.5345   0.7745  -0.3382 ]          [  0      ]
            [ -0.8018  -0.3382   0.4927 ]          [  0      ]

Step 1.

b_2 = H_1 a_2 = [ -6.4143 ].
                [  0.8725 ]
                [  0.3091 ]

Step 2.

H_2 = [ 1   0        0       ],     r_2 = H_2 b_2 = [ -6.4143 ].
      [ 0  -0.9426  -0.3339  ]                      [ -0.9258 ]
      [ 0  -0.3339   0.9426  ]                      [  0      ]

Step 3.

R' = (R, r_2) = [ -3.7417  -6.4143 ].
                [  0       -0.9258 ]
                [  0        0      ]

Step 4.

Q' = H_2 H_1 = [ -0.2673  -0.5345  -0.8018 ].
               [  0.7715  -0.6172   0.1543 ]
               [ -0.5773  -0.5774   0.5774 ]

Verification:

(Q')^T R' = [ 1  1 ]  =  A'.
            [ 2  4 ]
            [ 3  5 ]
5.9 Summary and Table of Comparisons
For easy reference we now review the most important aspects of this chapter.
3. Hessenberg Reduction.
The Hessenberg form of a matrix is a very useful condensed form. We will see its use throughout
the whole book.
An arbitrary matrix A can always be transformed to an upper Hessenberg matrix by orthogonal
similarity: Given an n n matrix A, there always exists an orthogonal matrix P such that PAP T =
Hu, an upper Hessenberg matrix (Theorem 5.4.2).
This reduction can be achieved using elementary, Householder or Givens matrices. We have
described here methods based on Householder and Givens matrices (Algorithms 5.4.3 and 5.5.3).
Both the methods have guaranteed stability, but again, the Householder method is more ecient
than the Givens method.
For the aspect of uniqueness in Hessenberg reduction, see the statement of the Implicit Q
Theorem (Theorem 5.5.3). This theorem basically says that if a matrix A is transformed by
orthogonal similarity to two dierent unreduced upper Hessenberg matrices H1 and H2 using two
transforming matrices P and Q, then H1 and H2 are essentially the same, provided that P and Q
have the same rst columns.
206
4. Orthogonal Bases and Orthogonal Projections.
If Q^T A = [ R ]
           [ 0 ]

is the QR factorization of an m x n matrix A (m >= n), then the columns of Q_1 form
an orthonormal basis for R(A) and the columns of Q_2 form an orthonormal basis for the orthogonal
complement of R(A), where Q = (Q_1, Q_2) and Q_1 has n columns.

If we let B = (b_1, ..., b_n) be an m x n matrix, then the matrix B_R = P_A (b_1, ..., b_n) = P_A B,
where P_A = Q_1 Q_1^T, is the orthogonal projection of B onto R(A).

Similarly, the matrix B_N = P_N B is the projection of B onto the orthogonal complement of
R(A), where P_N = P_A-perp = Q_2 Q_2^T.
6. Table of Comparisons.
We now summarize in the following table eciency and stability properties of some of these major
computations. We assume that A is m n (m n).
207
TABLE 5.1
TABLE OF COMPARISONS

PROBLEM                                  METHOD        FLOP-COUNT (APPROXIMATE)   STABILITY
QR factorization of an m x n matrix      Householder   n^2 (m - n/3)              Stable
208
used to transform an arbitrary matrix to an upper Hessenberg matrix by similarity.
For details, see Wilkinson AEP, pp. 353-355.
5.10 Suggestions for Further Reading
The topics covered in this chapter are standard and can be found in any numerical linear algebra
text. The books by Golub and Van Loan (MC) and that by G. W. Stewart (IMC) are rich sources
of further knowledge in this area.
The book MC in particular contains a thorough discussion on QR factorization with column
pivoting using Householder transformations. (Golub and Van Loan MC, 1984, pp. 162-167.)
The book SLP by Lawson and Hanson contains in-depth discussion of triangularization using
Householder and Givens transformations, and QR factorization with column pivoting (Chapters 10
and 15).
The details of error analysis of the Householder and the Givens methods for QR factorization
and reduction to Hessenberg forms are contained in AEP by Wilkinson.
For error analyses of QR factorization using Givens transformations and variants of Givens
transformations, see Gentleman (1975).
A nice discussion on orthogonal projection is given in the book Numerical Linear Algebra
and Optimization by Philip E. Gill, Walter Murray, and Margaret H. Wright, Addison Wesley,
1991.
209
Exercises on Chapter 5
(Use MATLAB, whenever appropriate and necessary)
PROBLEMS ON SECTIONS 5.2 and 5.3
1. (a) Show that an elementary lower triangular matrix has the form

       E = I + m e_k^T,

   where m = (0, 0, ..., 0, m_{k+1,k}, ..., m_{n,k})^T.

   (b) Show that the inverse of E in (a) is given by

       E^{-1} = I - m e_k^T.
2. (a) Given !
0:00001
a= ;
1
using 3-digit arithmetic, nd an elementary matrix E such that Ea is a multiple of e1 .
(b) Using your computations in (a), nd the LU factorization of
0:00001 1
!
A=
1 2
(c) Let L^ and U^ be the computed L and U in part (b). Find
kA ; L^ U^ kF ;
kAkF
where k kF is the Frobenius norm.
3. Show that the pivots a_11, a^(1)_22, ..., a^(n-1)_nn are nonzero if and only if the leading principal minors of A are
   nonzero.

   Hint: Show that

       det A_r = a_11 a^(1)_22 ... a^(r-1)_rr.
4. Let A be a symmetric positive definite matrix. At the end of the first step of the LU factor-
   ization of A, we have

   A^(1) = [ a_11  a_12  ...  a_1n ]
           [ 0                     ]
           [ .          A'         ]
           [ 0                     ].
210
Prove that A0 is also symmetric and positive denite. Hence show that LU factorization of a
symmetric positive denite matrix using Gaussian elimination without pivoting always exists
and is unique.
5. (a) Repeat the exercise #4 when A is a diagonally dominant matrix; that is, show that LU
factorization of a diagonally dominant matrix always exists and is unique.
(b) Using (a), show that a diagonally dominant matrix is nonsingular.
6. Assuming that LU factorization of A exists, prove that
(a) A can be written in the form
A = LDU ; 1
where D is diagonal and L and U1 are unit lower and upper triangular matrices, respec-
tively.
(b) If A is symmetric, then
A = LDLT :
(c) If A is symmetric and positive denite, then
A = HH T ;
where H is a lower triangular matrix with positive diagonal entries. (This is known
as the Cholesky decomposition.)
7. Assuming that LU factorization of A exists, develop an algorithm to compute U by rows and
L by columns directly from the equation:
A = LU:
This is known as Doolittle reduction.
8. Develop an algorithm to compute the factorization
A = LU;
where U is unit upper triangular and L is lower triangular. This is known as Crout reduc-
tion.
Hint: Derive the algorithm from the equation A = LU .
211
9. Compare the Doolittle and Crout reductions with Gaussian elimination with respect to
op-
count, storage requirements and possibility of accumulating inner products in double preci-
sion.
10. A matrix G of the form
G = I ; geTk ;
is called a Gauss-Jordan matrix. Show that, given a vector x with the property that
eTk x 6= 0; there exists a Gauss-Jordan matrix G such that
Gx is a multiple of ek.
Develop an algorithm to construct Gauss-Jordan matrices G1; G2; : : :; Gn successively such
that
212
(b) Show that the algorithm requires about r3
ops.
3
B 98 55 11 CC ;
ii. A = B
@ A
0 10 01 111
B ;1 1 1 CC ;
iii. A = B
@ A
; 1 ; 1 1
0 0:00003 1:566 1:234 1
B C
iv. A = B@ 1:5660 2:000 1:018 CA ;
1 ;3:000
0 11:2340;1 10:018
B C
v. A = B@ ;1 2 0 C A:
0 ;1 2
(c) Express each factorization in the form PAQ = LU (note that for Gaussian elimination
without and with partial pivoting, Q = I).
(d) Compute the growth factor in each case.
214
21. Let

    H = [ 10   1   1   1 ]
        [  2  10   1   1 ]
        [  0   1  10   1 ]
        [  0   0   1  10 ].
Triangularize H using
(a) Gaussian elimination,
(b) the Householder method,
(c) the Givens method.
22. Let H-hat_k = I - 2uu^T/(u^T u), where u is an (n - k + 1)-vector. Define

    H_k = [ I_{k-1}  0       ].
          [ 0        H-hat_k ]
How many flops will be required to compute H_k A, where A is arbitrary and n x n? Your
count should take into account the special structure of the matrix H-hat_k.

Using this result, show that the Householder method requires about 2n^3/3 flops to obtain R
and another 2n^3/3 flops to obtain Q in the QR factorization of A.
215
24. Given 011
B 2 CC
b=B
@ A
3
and A as in Problem #21, nd bR and bN .
25. Given 01 31
B CC
B=B
@2 4A
3 5
and A as in Problem #21, nd BR and BN .
26. Let H be an n n upper Hessenberg matrix and let
H = QR;
where Q is orthogonal and R is upper triangular obtained by Givens rotations.
Prove that Q is also upper Hessenberg.
27. Develop an algorithm to compute AH, where A is m x n arbitrary and H is a Householder
    matrix. How many flops will the algorithm require? Your algorithm should exploit the
    structure of H.

28. Develop algorithms to compute AJ and JA, where A is m x n and J is a Givens rotation.
    (Your algorithms should exploit the structure of the matrix J.)
    How many flops are required in each case?

29. Show that the flop-count to compute R in the QR factorization of an m x n matrix A (m >= n)
    using Givens rotations is about 2n^2 (m - n/3).
30. Give an algorithm to compute
Q = H H Hn
1 2
where each Qi is the product of (m ; i) Givens rotations, can be computed with 2n2 ( m ; n )
3
ops.
216
32. Let 01 2 31
B 4 5 6 CC :
A=B
@ A
7 8 9
Find QR factorization of A using
(a) the Householder method
(b) the Givens method
Compare the results.
33. Apply both the Householder and the Givens methods of reduction to the matrix
00 1 0 0 01
BB 0 0 1 0 0 CC
BB CC
A = B0 0 0 1 0C
B CC
BB
@ 0 0 0 0 1 CA
1 2 3 4 5
to reduce it to a Hessenberg matrix by similarity. Compare the results.
34.

(a) Show that it requires 5n^3/3 flops to compute the upper Hessenberg matrix H_u using the
    Householder method of reduction.
    (Hint: sum_{k=1}^{n-2} 2(n-k)^2 + sum_{k=1}^{n-2} 2n(n-k) is approximately 2n^3/3 + n^3 = 5n^3/3.)

(b) Show that if the transforming matrix P is required explicitly, another 2n^3/3 flops will be
    needed.

(c) Work out the corresponding flop-count for reduction to Hessenberg form using Givens
    rotations.

(d) If A is symmetric, then show that the corresponding count in (a) is 2n^3/3.
35. Given an unreduced upper Hessenberg matrix H , show that the matrix X dened by X =
(e1; He1; : : :; H n;1e1 ) is nonsingular and is such that X ;1HX is a companion matrix in upper
Hessenberg form.
(a) What are the possible numerical diculties with the above computations?
217
(b) Transform 0 1 1
2 3 4
BB 2 10; 4 4 4C
C
H=BB 5
CC
B@ 0 1 10 1 2 C
;
A3
0 0 1 1
to a companion form.
36.
(a) Given the pair (A; b) where A is n n and b is a column vector, develop an algorithm
to compute an orthogonal matrix P such that PAP T = upper Hessenberg Hu and Pb
is a multiple of the vector e1 .
(b) Show that Hu is unreduced and b is a nonzero multiple of e1 i rank (b; Ab; : : :; An;1b) = n:
(c) Apply your algorithm in (a) to
01 1 1 11 011
BB C BB CC
B 1 2 3 4C C BB 2 CC :
A=B C
B@ 2 1 1 1 CA ; b = B@ 3 CA
1 1 1 1 4
218
MATLAB AND MATCOM PROGRAMS AND
PROBLEMS ON CHAPTER 5
You will need the programs housmul, compiv, givqr and givhs from MATCOM.
1.
(a) Write a MATLAB function called elm(v ) that creates an elementary lower triangular
matrix E so that Ev is a multiple of e1 , where v is an n-vector.
(b) Write a MATLAB function elmul(A; E ) that computes the product EA, where E is an
elementary lower triangular matrix and A is an arbitrary matrix.
Your program should be able to take advantage of the structure of the matrix
E.
(c) Using elm and elmul, write a MATLAB program elmlu that nds the LU factorization
of a matrix, when it exists:
[L; U ] = elmlu(A):
(This program should implement the algorithm 5.2.1 of the book).
Test your program with the following matrices:

A = [ 0.00001  1 ],      A = [ 1        1 ],
    [ 1        1 ]           [ 0.00001  1 ]

A = [ 10  1   1 ],       A = diag(1, 2, 3),
    [  1  10  1 ]
    [  1  1  10 ]

A = the 5 x 5 Hilbert matrix.
Now compute (i) the product LU and (ii) jjA ; LU jjF in each case and print your results.
219
2. Modify your program elmlu to incorporate partial pivoting:
[M; U ] = elmlupp(A):
Test your program with each of the matrices of problem #1.
Compare your results with those obtained by MATLAB built-in function:
[L; U ] = lu(A):
3. Write a MATLAB program, called parpiv to compute M and U such that MA = U; using
partial pivoting:
[M; U ] = parpiv(A):
(This program should implement algorithm 5.2.3 of the book).
Print M; U; jjMA ; U jjF and jjMA ; U jj2 for each of the matrices A of problem #1.
4. Using the program compiv from MATCOM, print M; Q; and U; and
jjMAQ ; U jj ; and jjMAQ ; U jjF
2
Now compute
220
i. jjI ; QT QjjF ,
ii. jjA ; QRjjF ,
and compare the results with those obtained using the MATLAB built-in function
[Q; R] = qr(A):
(d) Repeat (c) with the program givqr from MATCOM in place of housqr, that computes
QR factorization using Givens rotations.
6. Run the program givqr(A) from MATCOM with each of the matrices in problem #1. Then
using [Q; R] = qr(A) from MATLAB on those matrices again, verify the uniqueness of QR
factorization for each A.
7. Using givhs(A) from MATCOM and the MATLAB function hess(A) on each of the matrices
from problem #1, verify the implicit QR theorems: Theorem 5.5.3 (Uniqueness of
Hessenberg reduction).
8. Using the results of problems #5, nd an orthonormal basis for R(A), an orthonormal basis for
the orthogonal complement of R(A), the orthogonal projection onto R(A), and the projection
onto the orthogonal complement of R(A) for each of the matrices of problem #1.
9. Incorporate \maximum norm column pivoting" in housqr to write a MATLAB program
housqrp that computes the QR factorization with column pivoting of a matrix A. Test your
program with each of the matrices of problem #1:
[Q; R; P ] = housqrp(A):
Compare your results with those obtained by using the MATLAB function
[Q; R; P ] = qr(A):
Note : Some of the programs you have been asked to write such as parpiv, housmat, housqr,
etc. are in MATCOM or in the Appendix. But it is a good idea to write your own programs.
221
6. NUMERICAL SOLUTIONS OF LINEAR SYSTEMS
6.1 Introduction : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 223
6.2 Basic Results on Existence and Uniqueness : : : : : : : : : : : : : : : : : : : : : : : 224
6.3 Some Applications Giving Rise to Linear Systems Problems : : : : : : : : : : : : : : 226
6.3.1 An Electric Circuit Problem : : : : : : : : : : : : : : : : : : : : : : : : : : : : 226
6.3.2 Analysis of a Processing Plant Consisting of Interconnected Reactors : : : : : 228
6.3.3 Linear Systems Arising from Ordinary Dierential Equations (Finite Dier-
ence Scheme) : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 231
6.3.4 Linear Systems Arising from Partial Dierential Equations: A Case Study
on Temperature Distribution : : : : : : : : : : : : : : : : : : : : : : : : : : : 233
6.3.5 Special Linear Systems Arising in Applications : : : : : : : : : : : : : : : : : 238
6.3.6 Linear System Arising From Finite Element Methods : : : : : : : : : : : : : 243
6.3.7 Approximation of a Function by a Polynomial: Hilbert System : : : : : : : : 247
6.4 Direct Methods : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 248
6.4.1 Solution of a Lower Triangular System : : : : : : : : : : : : : : : : : : : : : : 249
6.4.2 Solution of the System Ax = b Using Gaussian Elimination without Pivoting 249
6.4.3 Solution of Ax = b Using Pivoting Triangularization : : : : : : : : : : : : : : 250
6.4.4 Solution of Ax = b without Explicit Factorization : : : : : : : : : : : : : : : : 256
6.4.5 Solution of Ax = b Using QR Factorization : : : : : : : : : : : : : : : : : : : 258
6.4.6 Solving Linear Systems with Multiple Right-Hand Sides : : : : : : : : : : : : 260
6.4.7 Special Systems : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 262
6.4.8 Scaling : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 284
6.4.9 LU Versus QR and Table of Comparisons : : : : : : : : : : : : : : : : : : : : 286
6.5 Inverses, Determinant and Leading Principal Minors : : : : : : : : : : : : : : : : : : 288
6.5.1 Avoiding Explicit Computation of the Inverses : : : : : : : : : : : : : : : : : 288
6.5.2 The Sherman-Morrison and Woodbury Formulas : : : : : : : : : : : : : : : : 290
6.5.3 Computing the Inverse of a Matrix : : : : : : : : : : : : : : : : : : : : : : : : 292
6.5.4 Computing the Determinant of a Matrix : : : : : : : : : : : : : : : : : : : : : 295
6.5.5 Computing The Leading Principal Minors of a Matrix : : : : : : : : : : : : : 297
6.6 Perturbation Analysis of the Linear System Problem : : : : : : : : : : : : : : : : : : 299
6.6.1 Eect of Perturbation in the Right-Hand Side Vector b : : : : : : : : : : : : 300
6.6.2 Eect of Perturbation in the matrix A : : : : : : : : : : : : : : : : : : : : : 304
6.6.3 Eect of Perturbations in both the matrix A and the vector b : : : : : : : : : 306
6.7 The Condition Number and Accuracy of Solution : : : : : : : : : : : : : : : : : : : : 308
6.7.1 Some Well-known Ill-conditioned Matrices : : : : : : : : : : : : : : : : : : : : 309
6.7.2 Eect of The Condition Number on Accuracy of the Computed Solution : : : 310
6.7.3 How Large Must the Condition Number be for Ill-Conditioning? : : : : : : : 311
6.7.4 The Condition Number and Nearness to Singularity : : : : : : : : : : : : : : 312
6.7.5 Conditioning and Pivoting : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 313
6.7.6 Conditioning and the Eigenvalue Problem : : : : : : : : : : : : : : : : : : : : 313
6.7.7 Conditioning and Scaling : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 314
6.7.8 Computing and Estimating the Condition Number : : : : : : : : : : : : : : : 315
6.8 Component-wise Perturbations and the Errors : : : : : : : : : : : : : : : : : : : : : : 320
6.9 Iterative Renement : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 321
6.10 Iterative Methods : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 326
6.10.1 The Jacobi Method : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 328
6.10.2 The Gauss-Seidel Method : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 332
6.10.3 Convergence of Iterative Methods : : : : : : : : : : : : : : : : : : : : : : : : : 334
6.10.4 The Successive Overrelaxation (SOR) Method : : : : : : : : : : : : : : : : : 342
6.10.5 The Conjugate Gradient Method : : : : : : : : : : : : : : : : : : : : : : : : : 349
6.10.6 The Arnoldi Process and GMRES : : : : : : : : : : : : : : : : : : : : : : : : 356
6.11 Review and Summary : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 359
6.12 Some Suggestions for Further Reading : : : : : : : : : : : : : : : : : : : : : : : : : : 366
CHAPTER 6
NUMERICAL SOLUTIONS
OF LINEAR SYSTEMS
6. NUMERICAL SOLUTIONS OF LINEAR SYSTEMS
Objectives
The major objectives of this chapter are to study numerical methods for solving linear systems
and associated problems. Some of the highlights of this chapter are:
Theoretical results on existence and uniqueness of the solution (Section 6.2).
Some important engineering applications giving rise to linear systems problems (Section
6.3).
Direct methods (Gaussian elimination with and without pivoting) for solving linear systems
(Section 6.4).
Special systems: Positive denite, Hessenberg, diagonally dominant, tridiagonal and block
tridiagonal (Section 6.4.7).
Methods for computing the determinant and the inverse of a matrix (Section 6.5).
Sensitivity analysis of linear systems problems (Section 6.6).
Iterative renement procedure (Section 6.9).
Iterative methods (Jacobi, Gauss-Seidel, Successive Overrelaxation, Conjugate Gradient)
for linear systems (Section 6.10).
Required Background
The following major tools and concepts developed in earlier chapters will be needed for smooth
learning of material of this chapter.
1. Special Matrices (Section 1.4), and concepts and results on matrix and vector norms
(Section 1.7). Convergence of a matrix sequence and convergent matrices (Section
1.7.3)
2. LU factorization using Gaussian elimination without pivoting (Section 5.2.1, Algo-
rithms 5.5.1 and 5.5.2).
3. MA = U factorization with partial pivoting (Section 5.2.2, Algorithm 5.2.3).
4. MAQ = U factorization with complete pivoting (Section 5.2.3, Algorithm 5.2.4).
5. The concept of the growth factor (Section 5.3).
222
6. QR factorization of a matrix (Section 5.4.1, Section 5.5.1).
7. Concepts of conditioning and stability (Section 3.3 and Section 3.4).
8. Basic knowledge of dierential equations.
6.1 Introduction
In this chapter we will discuss methods for numerically solving the linear system
Ax = b;
where A is an n n matrix and x and b are n-vectors. A and b are given and x is unknown.
The problem arises in a very wide variety of applications. As a matter of fact, it might be
said that numerical solutions of almost all practical engineering and applied science
problems routinely require solution of a linear system problem. (See Section 6.3.)
We shall discuss methods for nonsingular linear systems only in this chapter. The case where
the matrix A is not square or the system has more than one solution will be treated in Chapter 7.
A method called Cramer's Rule, taught in an elementary undergraduate linear algebra course,
is of high significance from a theoretical point of view.
CRAMER'S RULE
Let A be a nonsingular matrix of order n and b be an n-vector. The solution x to
the system Ax = b is given by
xi = det Ai ; i = 1; : : :; n;
det A
where Ai is a matrix obtained by replacing the ith column of A by the vector b and
x = (x1; x2; : : :; xn)T .
Remarks on Cramer's Rule: Cramer's Rule is, however, not at all practical from
a computational viewpoint. For example, solving a linear system with 20 equations and 20
unknowns by Cramer's rule, using the usual definition of the determinant, would require more than a
million years even on a fast computer (Forsythe, Malcolm and Moler CMMC, p. 30). For an n x n
system, it will require about O(n!) flops.
Two types of methods are normally used for numerical computations:
223
(1) Direct methods
(2) Iterative methods
The direct methods consist of a nite number of steps and one needs to perform all the steps in a
given method before the solution is obtained. On the other hand, iterative methods are based on
computing a sequence of approximations to the solution x and a user can stop whenever a certain
desired accuracy is obtained or a certain number of iterations are completed. The iterative
methods are used primarily for large and sparse systems.
The organization of this chapter is as follows:
In Section 6.2 we state the basic theoretical results (without proofs) on the existence and
uniqueness of solutions for linear systems.
In Section 6.3 we discuss several engineering applications giving rise to linear systems problems.
In Section 6.4 we describe direct methods for solving linear systems.
In Section 6.5 we show how the LU and QR factorization methods can be used to compute the
determinant, the inverse and the leading principal minors of a matrix.
In Section 6.6, we study the sensitivity issues of the linear systems problems and their eects
on the solutions.
In Section 6.9 we briefly describe an iterative refinement procedure for improving the accuracy
of a computed solution.
In Section 6.10 we discuss iterative methods: the Jacobi, Gauss-Seidel, SOR, and conjugate
gradient and GMRES methods.
224
where

A = [ a_11  a_12  ...  a_1n ],      x = [ x_1 ],      b = [ b_1 ].
    [ a_21  a_22  ...  a_2n ]           [ x_2 ]           [ b_2 ]
    [ .     .          .    ]           [ .   ]           [ .   ]
    [ a_m1  a_m2  ...  a_mn ]           [ x_n ]           [ b_m ]
Given an m n matrix A and an m-vector b, if there exists a vector x satisfying Ax = b, then we
say that the system is consistent. Otherwise, it is inconsistent. It is natural to ask if a given
system Ax = b is consistent and, if it is consistent, how many solutions are there? when is the
solution unique? etc. To this end, we state the following theorem.
Homogeneous Systems
If the vector b = 0, then the system Ax = 0 is called a homogeneous system. A homogeneous
system always has a solution, namely x = 0. This is the trivial solution.
225
Theorem 6.2.3 (Solution Invariance Theorem) A solution of a consistent sys-
tem
Ax = b
remains unchanged under any of the following operations:
(i) Any two equations are interchanged.
(ii) An equation is multiplied by a nonzero constant.
(iii) A nonzero multiple of one equation is added to another equation.
Two systems obtained from one another by applying any of the above operations are called
equivalent systems. Theorem 6.2.3 then says that two equivalent systems have the same
solution.
6.3 Some Applications Giving Rise to Linear Systems Problems
It is probably not an overstatement that linear systems problems arise in almost all practical
applications. We will give examples here from electrical, mechanical, chemical and civil engineering.
We start with a simple problem|an electric circuit.
226
[Figure 6-1: An electric circuit with nodes A1, ..., A6; resistances R12 = 1, R23 = 2, R25 = 10,
R34 = 3, R45 = 4, R56 = 5; voltages V1 = 100, V6 = 0; and branch currents I1, I2, I3, I4.]
Figure 6-1
We would like to determine the amount of current between the nodes A1, A2, A3, A4, A5,
and A6. The famous Kirchhoff Current Law tells us that the algebraic sum of all currents
entering a node must be zero. Applying this law at node A2, we have
I1 ; I2 + I4 = 0 (6.3.1)
At node A5,
I2 ; I3 ; I4 = 0 (6.3.2)
At node A3,
I2 ; I2 = 0 (6.3.3)
At node A4,
I2 ; I2 = 0 (6.3.4)
Now consider the voltage drop around each closed loop of the circuit, A1 A2 A3 A4A5 A6 A1,
A1A2A5A6A1, and A2A3A4A5A2. Kirchhoff's Voltage Law tells us that the net voltage drop
around each closed loop is zero. Thus at the loop A1A2A3A4A5A6A1, substituting the values
of resistances and voltages, we have
I1 + 9I2 + 5I3 = 100 (6.3.5)
Similarly, at A1 A2 A5 A6A1 and A2 A3A4 A5 A2 we have, respectively
I1 ; 10I4 + 5I3 = 100 (6.3.6)
9I2 + 10I4 = 0 (6.3.7)
227
Note that (6.3.6) + (6.3.7) = (6.3.5). Thus we have four equations in four unknowns:
I1 ; I2 + I4 = 0 (6.3.8)
I2 ; I3 ; I4 = 0 (6.3.9)
I1 ; 10I4 + 5I3 = 100 (6.3.10)
9I2 + 10I4 = 0 (6.3.11)
The equations (6.3.8)-(6.3.11) can be written as

[ 1  -1   0    1 ] [ I1 ]     [ 0   ]
[ 0   1  -1   -1 ] [ I2 ]     [ 0   ]
[ 1   0   5  -10 ] [ I3 ]  =  [ 100 ],
[ 0   9   0   10 ] [ I4 ]     [ 0   ]

the solution of which yields the current between the nodes.
228
Q55 = 2
Q15 = 3
C5
Q54 = 2
Q25 = 1
Q01 = 6 Q44 = 11
C1 C2 C4
C01 = 12 Q12 = 3 Q24 = 1
Q23 = 1 Q34 = 8
Q31 = 1 Q66 = 2
C3 C6
Q03 = 8 Q36 = 10
C03 = 20
Figure 6-2
Application of conservation of mass to all these reactors results in a linear system of equations as
shown below, consisting of ve equations in ve unknowns. The solution of the system will tell us
the concentration of the mixture at each of these reactors.
Steady-state, completely mixed reactor. Consider rst a reactor with two
ows coming
in and one
ow going out, as shown in Figure 6-3.
m1 ; Q1; C1-
m2 ; Q2; C2- C3 -
m3 ; Q3; C3
Figure 6-3
Application of the steady state conservation of mass to the above reactor gives
m_ 1 + m_ 2 = m_ 3 : (6.3.12)
Noting that
m_ i = Qi Ci
229
where
mi = mass
ow rate of the mixture at the inlet and outlet sections i; i = 1; 2; 3
Qi = volumetric
ow rate at the section i; i = 1; 2; 3
Ci = density or concentration at the section i; i = 1; 2; 3
we get from (6.3.12)
Q1C1 + Q2C2 = Q3C3 (6.3.13)
For given inlet
ow rates and concentrations, the outlet concentration C3 can be found from equa-
tion (6.3.13). Under steady state operation, this outlet concentration also represents the spatially
uniform or homogeneous concentration inside the reactor. Such information is necessary for design-
ing the reactor to yield mixtures of a specied concentration. For details, see Chapra and Canale
(1988).
Referring now to Figure 6-2, where we consider the plant consisting of six reactors, we have
the following equations (a derivation of each of these equations is similar to that of (6.3.13)). The
derivation of each of these equations is based on the fact that the net mass
ow rate into the reactor
is equal to the net mass
ow out of the reactor.
For reactor 1:
6C1 ; C3 = 72 (6.3.14)
(Note that for this reactor,
ow at the inlet is 72 + C3 and
ow at the outlet is 6C1.)
For reactor 2:
3C1 ; 3C2 = 0 (6.3.15)
For reactor 3:
; C2 + 11C3 = 200 (6.3.16)
For reactor 4:
C2 ; 11C4 + 2C5 + 8C6 = 0 (6.3.17)
For reactor 5:
3C1 + C2 ; 4C5 = 0 (6.3.18)
For reactor 6:
10C3 ; 10C6 = 0 (6.3.19)
230
Equations (6.3.14){(6.3.19) can be rewritten in matrix form as
0 6 0 ;1 0 0 0 1 0 C 1 0 72 1
BB 3 ;3 0 0 0 0 CC BB C1 CC BB 0 CC
BB CB 2C B C
BB 0 ;1 9 0 0 0 CCC BBB C3 CCC BBB 200 CCC
BB CB C = B C (6.3.20)
BB 0 1 8 ;11 2 8 CCC BBB C4 CCC BBB 0 CCC
B@ 3 1 0 0 ;4 0 CA B@ C5 CA B@ 0 CA
0 0 10 0 0 ;10 C6 0
or
AC = D:
The ith coordinate of the unknown vector C represents the mixture concentration at reactor i of
the plant.
6.3.3 Linear Systems Arising from Ordinary Dierential Equations (Finite Dierence
Scheme)
A Case Study on a Spring-Mass Problem
Consider a system of three masses suspended vertically by a series of springs, as shown below,
where k1 ; k2 , and k3 are the spring constants, and x1; x2 , and x3 are the displacements of each
spring from its equilibrium position.
k1
m1
x1
k2
m2
x2
k3
m3
x3
? m?1g m2 g
? ? m3 g
?
k2(x2 ; x1) k3(x3 ; x2)
Referring to the above diagram, the equations of motion, by Newton's second law, are:
m1 ddtx21 = k2(x2 ; x1) + m1g ; k1 x1
2
Suppose we are interested in knowing the displacement of these springs when the system eventually
returns to the steady state, that is, when the system comes to rest. Then, by setting the second-
order derivatives to zero, we obtain
k1x1 + k2(x1 ; x2) = m1 g
k2(x2 ; x1) + k3(x2 ; x3) = m2 g
k3(x3 ; x2) = m3 g
This system of equations in three unknowns, x1 ; x2, and x3 , can be rewritten in matrix form as
0 k + k ;k 0
10x 1 0m g1
1 2 2 1 1
B
B CC BB CC BB CC
B
B CC BB CC BB CC
B
B ;k2 k2 + k3 ;k3 C CC BBB x2 CCC = BBB m2g CCC
B
B CA B@ CA B@ CA
@
0 ;k3 k3 x3 m3 g
or
Kx = w:
The matrix 0 k + k ;k 0
1
1 2 2
BB CC
BB CC
K=B BB 2 2 3 3 CCC
; k k + k ; k
B@ CA
0 ;k3 k3
232
is called the stiness matrix.
6.3.4 Linear Systems Arising from Partial Dierential Equations: A Case Study on
Temperature Distribution
Many engineering problems are modeled by partial dierential equations. Numerical approaches
to these equations typically require discretization by means of dierence equations, that is, partial
derivatives in the equations are replaced by approximate dierences. This process of discretization
in turn gives rise to linear systems of many interesting types. We shall illustrate this with a problem
in heat transfer theory.
A major objective in a heat transfer problem is to determine the temperature distribution
T (x; y; z; t) in a medium resulting from imposed boundary conditions on the surface of the medium.
Once this temperature distribution is known, the heat transfer rate at any point in the medium or
on its surface may be computed from Fourier's law, which is expressed as
qx = ;K @T @x
qy = ;K @y @T
qz = ;K @T @z
where qx is the heat transfer rate in the x direction, @T@x is the temperature gradient in the x
direction, and the positive constant K is called the thermal conductivity of the material. Similarly
for the y and z directions.
Consider a homogeneous medium in which temperature gradients exist and the temperature
distribution T (x; y; z; t) is expressed in Cartesian coordinates. The heat diusion equation which
governs this temperature distribution is obtained by applying conservation of energy over an in-
nitesimally small dierential element, from which we obtain the relation
@ (K @T ) + @ (K @T ) + @ (K @T ) + q_ = C @T ; (6.3.21)
@x @x @y @y @z @z p @z
where is the density, Cp is the specic heat, and q_ is the energy generated per unit volume.
This equation, usually known as the heat equation, provides the basic tool for solving heat
conduction problems.
It is often possible to work with a simplied form of equation (6.3.21). For example, if the
thermal conduction is a constant, the heat equation is
@ 2T + @ 2T + @ 2T + q_ = 1 @T (6.3.22)
@x2 @y 2 @z2 K @t
233
where = K=(Cp) is a thermophysical property known as the thermal diusivity.
Under steady state conditions, there can be no changes of energy storage, i.e., the unsteady
state term @T
@t can be dropped, and equation (6.3.21) reduces to the 3-D Poisson's Equation
@ 2T + @ 2T + @ 2 T + q_ = 0 (6.3.23)
@x2 @y2 @z2 K
If the heat transfer is two-dimensional (e.g., in the x and y directions) and there is no energy
generation, then the heat equation reduces to the famous Laplace's equation
@ 2 T + @ 2T = 0 (6.3.24)
@x2 @y2
If the heat transfer is unsteady and one-dimensional without energy generation, then the heat
equation reduces to
@ 2T = 1 @T (6.3.25)
@x2 @t
Analytical solutions to the heat equation can be obtained for simple geometry and boundary con-
ditions. Very often there are practical situations where the geometry or boundary conditions are
such that an analytical solution has not been obtained or if it is obtained, it involves complex series
solutions that require tedious numerical evaluation. In such cases, the best alternatives are nite
dierence or nite element methods which are well suited for computers.
Finite Dierence Scheme
A well-known scheme for solving a partial dierential equation is to use nite dierences. The
idea is to discretize the partial dierential equation by replacing the partial derivatives with their
approximations, i.e., nite dierences. We will illustrate the scheme with Laplace's equation in the
following.
Let us divide a two-dimensional region into small regions with increments in the x and y
directions given as x and y , as shown in the gure below.
234
Nodal Points
Dy
Dx
Each nodal point is designated by a numbering scheme i and j , where i indicates x increment and
j indicates y increment:
(i,j + 1)
(i - 1,j) (i,j)
(i + 1, j)
(i,j - 1)
The temperature distribution in the medium is assumed to be represented by the nodal points
temperature. The temperature at each nodal point (xi ; yj ) (which is symbolically denoted by (i,j)
as in the diagram above) is the average temperature of the surrounding hatched region. As the
number of nodal points increases, greater accuracy in representation of the temperature distribution
is obtained.
A nite dierence equation suitable for the interior nodes of a steady two-dimensional system
can be obtained by considering Laplace's equation at the nodal point i; j as
@ 2T + @ 2 T = 0 (6.3.26)
@x2 i;j @y 2 i;j
235
The second derivatives at the nodal point (i; j ) can be expressed as
@T ; @T
@ 2T
@x i+ 2 ;jx@x i; 2 ;j
1 1
@x2 i;j (6.3.27)
@T ; @T
@ 2T @y i;j+ 2x@y i;j; 2
1 1
(6.3.28)
@y 2 i;j
As shown in the gure, the temperature gradients can be approximated (as derived from the
Taylor series) as a linear function of the nodal temperatures as
@T Ti+1;j;x Ti;j (6.3.29)
@x i+ 12 ;j
@T Ti;j ;xTi;1;j (6.3.30)
@x i; 12 ;j
@T Ti;j+1;y Ti;j (6.3.31)
@y i;j+ 12
@T Ti;j ;Ty i;j;1 (6.3.32)
@y i;j ; 12
where, Ti;j = T (xi ; yj ). Substituting (6.3.29){(6.3.32) into (6.3.27){(6.3.28), we get
@ 2T
=
Ti+1;j ; 2Ti;j + Ti;1;j (6.3.33)
@x i;j
2 (x)2
@ 2T
=
Ti;j+1 ; 2Ti;j + Ti;j ;1 (6.3.34)
@y i;j
2 (y )2
The equation (6.3.26) then gives
Ti+1;j ; 2Ti;j + Ti;1;j + Ti;j +1 ; 2Ti;j + Ti;j ;1 = 0
(x)2 (y )2
Assume x = y . Then the nite dierence approximation of Laplace's equation for
interior regions can be expressed as
Ti;j+1 + Ti;j ;1 + Ti+1;j + Ti;1;j ; 4Ti;j = 0 (6.3.35)
More accurate higher order approximations for interior nodes and boundary nodes are also obtained
in a similar manner.
Example 6.3.1
A two-dimensional rectangular plate (0 x 1; 0 y 1) is subjected to the uniform
temperature boundary conditions (with top surface maintained at 1000C and all other surfaces at
236
00C ) shown in the gure below, that is T (0; y ) = 0, T (1; y ) = 0, T (x; 0) = 0, and T (x; 1) = 1000C ,
Suppose we are interested only in the values of the temperature at the nine interior nodal points
(xi; yj ); where xi = ix and yj = j y , i; j = 1::3 with x = y = 14 .
o
100 C
o
O oC (0,2) (1,2) (2,2) (3,2) (4,2) O C
o
O C
However, we assume symmetry for simplifying the problem. That is, we assume that T33 = T13,
T32 = T12, and T31 = T11. We thus have only six unknowns: (T11; T12; T13) and (T21; T22; T23).
4T1;1 ; 0 ; 100 ; T2;1 ; T1;2 = 0
4T1;2 ; 0 ; T1;1 ; T2;2 ; T1;3 = 0
4T1;3 ; 0 ; T1;2 ; T2;3 ; 0 = 0
4T2;1 ; T1;1 ; 100 ; T1;1 ; T2;2 = 0
4T2;2 ; T1;2 ; T2;1 ; T1;2 ; T2;3 = 0
4T2;3 ; T1;3 ; T2;2 ; T1;3 ; 0 = 0
237
After suitable rearrangement, these equations can be written in the following form:
0 4 ;1 ;1 0 0 0 1 0 T1;1 1 0 100 1
BB CC BB CC BB CC
BB CC BB CC BB CC
BB ;2 4 0 ;1 0 0C CC BBB T2;1 CCC BBB 100 CCC
BB CC BB CC BB CC
BB C BB T CC BB 0 CC
BB 1 0 4 ;1 ;1 0C CC BB 1;2 CC BB CC
BB CC BB CC = BB CC
BB C BB CC BB CC
BB 0 ;1 ;2 4 0 ;1 C CC BB T2;2 CC BB 0 CC
BB CC BB CC BB CC
BB C BB CC BB CC
BB 0 0 ;1 0 4 ;1 C CC BB T1;3 CC BB 0 CC
BB CA B@ CA B@ CA
@
0 0 0 ;1 ;2 4 T2;3 0
The solution of this system will give us temperatures at the nodal points.
T0 = 1 2 3 T4 =
x
238
Using a similar numbering scheme as before, the temperature at any point is given by
Ti+1 ; 2 Ti + Ti;1 = 0;
that is, the temperature at any point is just the average of the temperatures of the two nearest
neighboring points.
Suppose the domain of the problem is 0 x 1. Divide now the domain into four segments of
equal length, say x. Thus x = :25. Then T at x = ix will be denoted by Ti . Suppose that
we know the temperature at the end points x = 0 and x = 1, that is,
T0 =
T4 =
These are then the boundary conditions of the problem.
>From the equation, the temperature at each node, x = 0; x = x, x = 2x; x = 3x; x = 1
is calculated as follows:
At x = 0, T0 = (given)
At x = x, T0 ; 2T1 + T2 = 0
At x = 2x, T1 ; 2T2 + T3 = 0
At x = 3x, T2 ; 2T3 + T4 = 0
At x = 1, T4 = (given)
In matrix form these equations can be written as:
01 0 0 0 010T 1 01
0
BB CC BB CC BB CC
BB CB C B C
BB 1 ;2 1 0 0 CCC BBB T1 CCC BBB 0 CCC
BB CC BB CC BB CC
BB CB C B C
BB 0 1 ;2 1 0 CCC BBB T2 CCC = BBB 0 CCC
BB CC BB CC BB CC
BB CB C B C
BB 0 0 1 ;2 1 CCC BBB T3 CCC BBB 0 CCC
BB CC BB CC BB CC
@ A@ A @ A
0 0 0 0 1 T4
The matrix of this system is tridiagonal.
239
Symmetric Tridiagonal and Diagonally Dominant Systems
In order to see how such systems arise, consider now the unsteady conduction of heat. This
condition implies that the temperature T varies with the time t. The heat equation in this case is
1 @T = @ 2 T :
@t @x2
Let us divide the grid in the (x; t) plane with spacing x in the x-direction and t in the t-direction.
ti+1 ; ti = t xi+1 ; xi = x
t2
t1
0 x1 x2 x3 xn 1
Let the temperature at the nodal point xi = ix and tj = j t, as before, be denoted by Tij .
Approximating @T
@t and @ 2T by the nite dierences
@x2
@T 1 (T
@t t i;j +1 ; Ti;j )
@ 2T 1 (T
@x2 (x)2 i+1;j +1 ; 2Ti;j +1 + Ti;1;j +1);
we obtain the following dierence analog of the heat equation:
(1 + 2C )Ti;j +1 ; C (Ti+1;j +1 + Ti;1;j +1) = Ti;j
i = 1; 2; : : :; n
where C = (xt)2 .
These equations enable us to determine the temperature at a time step j = k + 1, knowing the
temperature at the previous time step j = k.
For i = 1; j = k: (1 + 2C )T1;k+1 ; CT2;k+1 = CT0;k+1 + T1;k
For i = 2; j = k: (1 + 2C )T2;k+1 ; CT3;k+1 ; CT1;k+1 = T2;k
...
240
For i = n; j = k: (1 + 2C )Tn;k+1 ; CTn;1;k+1 = Tn;k + Tn+1;k+1
Suppose now the temperatures at the two vertical sides are known, that is,
T0;t = TW1
Tn+1;t = TW2
Then the above equations can be written in matrix notation as
0 (1 + 2C ) ;C 0 0 1 0 T1;k+1 1 0 T1;k + CTW1 1
BB CC BB CC BB CC
BB CC BB CC BB CC
BB ;C (1 + 2C ) ;C 0 0 C B T2;k+1 C
C B CC BBB T2;k CC
BB CC BB C B CC
BB . ... ... ... .. C CC BBB .. CCC BBB .. CC
BB .. . CB . C = B . CC
BB CC BB C B CC
BB . ... ...
CC BB . CCC BBB .. CC
BB .. ;C C CC BBB .. CCC BBB . CC
BB CA B@ CA B@ CC
@ A
0 ;C (1 + 2C ) Tn;k+1 Tn;k + CTW2
The matrix of the above system is clearly symmetric, tridiagonal and diagonally dominant
(note that C > 0).
For example, when C = 1, and we have
0 3 ;1 0 0 1 0 T 1 0 T + T 1
1;k+1 1;k W1
B
B ;1 3 ;1 0 C C B
BB T2;k+1 CC BB T2;k CCC
C B
B
B .. . . . . . . . . . .. CC BB ... CC = BB ... CC
B
B . . C
C BB . CC BB . CC
B
B .. . . . . . . ;1 C
C
@ . A B@ .. CA B@ .. CA
0 ;1 3 Tn;k+1 Tn;k + TW2
or
Ax = b:
The matrix A is symmetric, tridiagonal, diagonally dominant and positive denite.
Block Tridiagonal Systems
To see how block tridiagonal systems arise in applications, consider the two-dimensional
Poisson's equation:
@ 2T + @ 2T = f (x; y ); 0 x 1; 0 y 1:
@x2 @y 2
241
A discrete analog to this equation, similar to Laplace's equation derived earlier, is
Ti+1;j + Ti;1;j + Ti;j +1 + Ti;j;1 ; 4Tij = (x)2 fij ;
i = 1; 2; : : :; n j = 1; 2; : : :; n
This will give rise to a linear system of (n + 2)2 variables.
Assume now the values of T at the four sides of the unit square are known and we are interested
in the values of T at the interior grid points, that is,
T0;j ; Tn+1;j and Ti;0 ; Ti;n+1
(j = 0; 1; : : :; n + 1; i = 0; 1; : : :; n + 1)
are given and T11; : : :; Tn1; T12; : : :; Tn2; T1n; : : :; Tnn are to be found. Then we have a (n2 n2 )
system with n2 unknowns which can be written after suitable rearrangement as
04 ;1 0 0 ;1 0 1
BB ;1 4 ;1 0 0 ;1 0 CC
BB CC
BB CC
BB CC
BB CC
BB CC
BB CC
BB 0 0 ;1 4 ;1 0 0 ;1 CC
BB 0 0 ;1 4 0 0 0 ;1 0 CC
BB C
BB ;1 0 0 0 4 ;1 0 0 ;1 0 C CC
B@ 0 ;1 0 ;1 4 ;1 0 0 ;1 0 C A
..
.
242
0 T01 + T10 ; (x)2f11 1
0 T11 1 BB T20 ; (x)2f21 CC
B C B
B CC
B
B C
T21 C B B CC
B
B . C
.. C B B .
.. CC
B
B C
C B
B CC
B
B C
Tn1 C B B CC
B C B .. CC
B
B T12 CC =B
B . CC
B
B .. C
C BB CC
B . C B
B
B C B
BB Tn;1;0 ; (x)2fn;1;1 CCC
B Tn2 CC
B
B . C B C
@ .. CA BBB Tn+1;1 + Tn;0 ; (x)2fn;1 CCC
Tnn B@ T02 ; (x)2f12 CA
..
.
or in matrix form,
0 An ;In 1 0 T11 1
0 T + T ; (x)2f 1
BB CC BB T CC 01 10 11
BB C B 21 C B CC
BB ;In . . . . . . CC BB ... CC BB T20 ; (x) f21
2
CC
BB CC BB CC BB ..
. CC
BB CC BB Tn1 CC BB . CC
BB ... ... ... CC BB CC BB .. CC
C B T12 C = B (6.3.36)
BB CC BB .. CC B Tn;1;0 ; (x) fn;1;1 CC
B 2
BB CC BB . CC BB Tn+1;1 + Tn;0 ; (x)2fn;1 CC
BB . .
. . . . ;In C C BB Tn2 CC BB CC
BB CC BB .. CC @ B T02 ; (x) f12
2 CA
@ A@ . A ..
.
;In An Tnn
where
04;1 0 1
BB. . . . . . ... C
CC
;1
An = BBB ..
. . . . . . ;1 C CA (6.3.37)
@ .
0 ;1 4
The system matrix above is block tridiagonal and each block diagonal of matrix An is symmetric,
tridiagonal, and positive denite. For details, see Ortega and Poole (INMDE, pp. 268{272).
243
Just to give a taste to the readers, we illustrate this below by means of a simple dierential equa-
tion. The interested readers are referred to some of the well-known books on the subject: Ciarlet
(1981), Strang and Fix (1973), Becker, Carey and Oden (1981),Reddy(1993).
1. Variational formulation of a two-point boundary value problem.
Let us consider the following two-point boundary value problem
;u00 + u = f (x) 0<x<1 (6.3.38)
u(0) = u(1) 0 (6.3.39)
du and f is a continuous function on [0,1]. We further assume that f is such that
where u0 = dx
Problem (6.3.38)-(6.3.39) has a unique solution.
We introduce the space
V = fv : v is a continuous function on [0,1] - and v 0 is piece-wise continuous and -
bounded on [0,1], and v (0) = v (1) = 0g
Now, if we multiply the equation ;u00 + u = f (x) by an arbitrary function v 2 V (v is called a
test function) and integrate the left hand side by parts, we get
Z1 Z1
(;u00(x) + u(x))v (x)dx = f (x)v (x)dx
0 0
that is,
Z1 Z1
(u0 v 0 + uv )dx = f (x)v (x)dx (6.3.40)
0 0
Since v 2 V; and v (0) = v (1) = 0. We write (6.3.40) as:
a(u; v) = (f; v) for every u 2 V
where Z1
a(u; v ) = (u0 v 0 + uv )dx
0
and Z1
(f; v ) = f (x)v (x)dx
0
(Notice that the form a(; ) is symmetric (i.e. a(u; v ) = a(v; u)) and bilinear.) These two prop-
erties, will be used later. It can be shown that u is a solution of (6.3.40) if and only
if u is a solution to (6.3.38)-(6.3.39).
2. The Discrete Problem
244
We now discretize problem (6.3.40). We start by constructing a nite dimensional subspace Vn
of the space V .
Here, we will only consider the simple case where Vn consists of continuous piecewise linear
functions. For this purpose, we let 0 = x0 < x1 < x2 ::: < xn < xn+1 = 1 be a partition of the
interval [0,1] into subintervals Ij = [xj ;1; xj ] of length hj = xj ; xj ;1; j = 1; 2; :::; n + 1. With
this partition, we associate the set Vn of all functions v (x) that are continuous on the interval [0,1],
linear in each subinterval Ij , j = 1; :::; n + 1, and satisfy the boundary conditions v (0) = v (1) = 0.
We now introduce the basis functions f1; 2; :::; ng, of Vn.
We dene j8(x) by
< 1 if i = j
(i) j (xi ) = :
0 if i 6= j
(ii) j (x) is a continuous piecewise linear function.
j (x) can be computed explicitly to yield:
ϕ ( λ)
1 j
0
xj 1
8 x ; xj;1
>
< hj ; when xj;1 x xj
j (x) = > xj+1 ; x
: hj+1 ; when xj x xj+1:
Since 1; :::; n are the basis functions, any function v 2 Vn can be written uniquely as:
X
n
v (x) = vii(x); where vi = v (xi ):
i=1
We easily see that Vn V .
The discrete analogous of Problem (6.3.40) then reads:
nd un 2 Vn such that
a(un; v ) = (f; v) 8v 2 Vn (6.3.41)
X
n
Now, if we let un = ui i(x) and notice that equation (6.3.41) is particularly true for every
i=1
245
function j (x); j = 1; :::; n, we get n equations, namely.
X
a( ui i; j ) = (f; j ) 8j = 1; 2; :::; n
Now using the linearity of a (; j ) leads to n linear equations in n unknowns:
X
n
uia(i; j ) = (f; j ) 8j = 1; 2; :::; n:
i=1
which can be written in the matrix form as
Aun = (fn)i (6.3.42)
where (fn )i = (f; i) and A = (aij ) is a symmetric matrix given by
aij = aji = a(i ; j ); and un = (u1; :::; un)T :
The entries of the matrix A can be computed explicitly: We rst notice that
aij = aji = a(i ; j) = 0 if ji ; j j 2
(This is due to the local support of the function i(x)). A direct computation now leads to
Z x " 1 (x ; xj;1)2 # Z x +1 " 1 (xj+1 ; x) #2
aj;j = a(j ; j ) = dx + dx
j j
xj;1h2 + h2 j j h2 + h2 xj j +1 j +1
1 1 1
= h +h + 3 [hj + hj +1] :
j j +1
Z x " 1 (xj ; x) (x ; xj;1 # ;1 + hj
aj;j;1 = ; h2 + h dx
j
h = h 6
xj ;1 j j j j
Hence, the system (6.3.42) can be written as:
2 3 2 3 2 3
66 a1 b1 77 66 u1 77 66 (fn )1 77
66 b1 a2 0 77 66 u2 77 66 (fn )2 77
66 77 66 ..
.
77 66 ..
.
77
66 77 66 77 = 66 77
66 ... ... ... 77 66 .. 77 66 .. 77
66 77 66 . 77 66 . 77
64 ... ... bn;1 75 64 .. 75 64 .. 75
. .
0 bn;1 an un (fn )n
246
where aj = h1 + h 1 + 13 [hj + hj +1] and bj = ; h1 + h6j . In the special case of uniform grid
j j +1 j
1
hj = h = n + 1 , the matrix A then takes the form
2 3
66 2 ;1 0 7 2 3
66 ;1 2 ... ...
7 77 66 4 1
... ... 7
07
7
1
A = h 66
6 ... ... ... 77 + h 66 1 7
66 77 6 66 ... . . . 1 777 (6.3.43)
6
64 ... ... ;1 775 4 0 1 4
5
0 ;1 2
6.3.7 Approximation of a Function by a Polynomial: Hilbert System
In Chapter 3 (Section 3.6) we cited an ill-conditioned linear system with the Hilbert matrix. In
this section we show how such a system arises. The discussion here has been taken from Forsythe
and Moler (CSLAS, pp. 80{81).
Suppose a continuous function f (x) dened on the interval 0 x 1 is to be approximated by
a polynomial of degree n ; 1: n X
pi xi;1;
i=1
such that the error Z 1 "X #2
n
E= pixi;1 ; f (x) dx
0 i=1
is minimized. The coecients pi of the polynomial are easily determined by setting
E = 0; i = 1; : : :; n:
pi
(Note that the error is a dierentiable function of the unknowns pi and that a minimum occurs
when all the partial derivatives are zero.) Thus we have
2 3
E = 2 Z 1 4X
n
pj xj;1 ; f (x)5 xi;1 dx = 0; i = 1; : : :; n
pi 0 j =1
or n Z 1 Z1
X
xi+j ;2 dx pj = f (x)xi;1 dx; i = 1; : : :; n:
j =1 0 0
(To obtain the latter form we have interchanged the summation and integration.)
Letting Z1
hij = xi+j;2 dx
0
247
and Z1
bi = f (x)xi;1 dx (i = 1; 2; : : :; n);
0
we have
X
n
hij pj = bi; i = 1; : : :; n:
j =1
That is, we obtain the linear system
Hp = b;
0b 1
BB b1 C C
where H = (hij ); b = BBB ..2 C
C. The matrix H is easily identied as the Hilbert matrix, (see
@.C A
bn
Chapter 3, Section 3.6), since
Z1
hij = xi+j ;2 dx = i + j1 ; 1 :
0
ops to compute y2, 3
ops to compute y3 , and so on). The algorithm is as stable as the back
substitution process (Algorithm 3.1.3).
249
Solving Ax = b using Gaussian Elimination Without Pivoting
The solution of the system Ax = b using Gaussian elimination (without pivoting)
can be achieved in two stages:
First, nd a LU factorization of A.
Second, solve two triangular systems: the lower triangular system Ly = b rst,
followed by the upper triangular system Ux = y .
Flop-count. Since a LU factorization requires about n33
ops and the solution of each trian-
gular system needs only n22
ops, the total
ops count for solving the system Ax = b using Gaussian
elimination is about n33 + n2 .
250
Implementation of Step 2
The vector
b0 = Mb = Mn;1Pn;1Mn;2 Pn;2 M1 P1b
can be computed as follows:
(1) s1 = b
(2) For k = 1; 2; : : :; n ; 1 do
sk+1 = Mk Pk sk
sn = b0 .
251
Computational Remarks
The practical Gaussian elimination (with partial pivoting) algorithm does not give Mk and Pk
explicitly. But, we really do not need them. The vector sk+1 can be computed immediately from
sk once the index rk of row interchange and the multipliers mik have been saved at the kth step.
This is illustrated with the 3 3 example as follows:
Example 6.4.1
Let 01 0 01 0 1 0 01
B C B C
n = 3; P1 = B
@ 0 0 1 CA ; M1 = B
@ m21 1 0 CA
0 1 0 m31 0 1
and let 0 s(2) 1
BB 1(2) CC
s2 = M1 P1s1 = @ s2 A :
s(2)
3
Then we have 0s 1
BB 1 CC
P1s1 = @ s3 A
s4
and the entries of s2 are then given by
s(2)
1 = s1 ;
s(2)
2 = m21s1 + s3 ;
s(2)
3 = m31s1 + s2 :
Implementation of Step 3
Since x = Qy = Q1 Q2 Qn;1y , the following scheme can be adopted to compute x from y in
Step 3.
Set wn = y .
For k = n ; 1; : : : 2; 1 do
wk = Qk wk+1
Then x = w1.
Note: Since Qk is a permutation matrix, the entries of wk are simply those of wk+1
reordered according to the permutation index.
Example 6.4.2
Solve
Ax = b
with 00 1 11 021
B CC B CC
A=B
@ 1 2 3 A; b=B
@6A
1 1 1 3
(a) using partial pivoting,
253
(b) using complete pivoting.
(a) Partial Pivoting
With the results obtained earlier (Section 5.2.2, Example 5.2.3), we compute
061 00 1 01
B 2 CC ; P = BB 1 0 0 CC
P1 b = B
@ A 1 @ A
3 0 0 1
061 0 1 0 01
M1P1b = B
B 2 CC ; M = BB 0 1 0 CC
@ A 1 @ A
;3 ;1 0 1
061 01 0 01
B CC B C
P2M1P1b = B @ 2 A ; P2 = B@ 0 1 0 CA
;3 0 0 1
061 01 0 01
B CC B C
b0 = M2P2M1P1b = B @ 2 A ; M2 = B@ 0 1 0 CA :
;1 0 1 1
The solution of the system
Ux = b0
01 2 3 10x 1 0 6 1
BB CC BB 1 CC BB CC
@ 0 1 1 A @ x2 A = @ 2 A
0 0 ;1 x3 ;1
is x1 = x2 = x3 = 1.
254
The solution of the system
Uy = b0
03 2 110x 1 061
BB 0 2 1 CC BB 1 CC BB CC
@ 3 3 A @ x2 A = @ 1 A
0 0 12 x3 1
2
is y1 = y2 = y3 = 1. Since fxk g; k = 1; 2; 3 is simply the rearrangement of fyk g, we have
x1 = x2 = x3 = 1.
Some Details of Implementation
Note that it is not necessary to store the vectors si and wi separately, because all we need is
the vector sn for partial pivoting and w1 for complete pivoting. So starting with x = b, each new
vector can be stored in x as it is computed.
Also note that if we use the practical algorithms, the matrices Pk ; Qk and Mk are not available
explicitly; they have to be formed respectively out of indices rk ; sk and the multipliers mik . In this
case, the statements for computing the sk 's and wk 's are to be modied accordingly.
Flop-count. We have seen in Chapter 5 (Section 5.2.1) that the triangularization process
using elementary matrices requires n33
ops. The triangular system Ux = b0 or Uy = b0 can be
solved using back substitution with n
ops and, the vector b0 can be computed with n2
ops,
2
2
taking into account the special structures of the matrices Mk and Pk . Recovering x from y in Step
3 of the Complete Pivoting process does not need any
ops. Note that x is obtained from y just by
reshuing the entries of y . Thus, the solution of the linear3 system Ax = b using Gaussian
elimination with complete or partial pivoting requires n3 +O(n2)
ops. However, Gaussian
elimination with complete pivoting requires about n3 comparisons to identify (n ; 1) pivots,
3
255
Round-o Error Result for Linear System Problem with Gaussian
Elimination
It can be shown that the computed solution x^ of the linear system Ax = b, using
Gaussian elimination, satises
(A + E )^x = b
where kE k1 c(n3 + 3n2 ) kAk1 , and c is a small constant. For a proof see
Chapter 11, Section 11.4.
Remark: The size of the above bound is really determined by , since when n is not too large,
n3 can be considerably small compared to and can therefore be neglected. Thus the growth
factor is again the deciding factor.
6.4.4 Solution of Ax = b without Explicit Factorization
As we have just seen, the Gaussian elimination method for solving Ax = b comes in two stages.
First, the matrix A is explicitly factorized:
A = LU (without pivoting)
MA = U (with partial pivoting)
MAQ = U (with complete pivoting).
Second, the factorization of A is used to solve Ax = b. However, it is easy to see that these two
stages can be combined so that the solution can be obtained by solving an upper triangular
system by processing the matrix A and the vector b simultaneously. In this case, the
augmented matrix (A; b) is triangularized and the solution is then obtained by back substitution.
We illustrate this implicit process for Gaussian elimination with partial pivoting.
Algorithm 6.4.2 Solving Ax = b With Partial Pivoting Without Explicit Factorization
Given an n n matrix A and a vector b, the following algorithm computes the triangular
factorization of the augmented matrix (A; b) using Gaussian elimination with partial pivoting. A
is overwriten by the transformed triangular matrix and b is overwritten by the transformed vector.
The multipliers are stored in the lower-half part of A.
256
For k = 1; 2; : : :; n ; 1 do
(1) Choose the largest element in magnitude in the column k below the (k; k) entry; call it ar :k;k
ar = max fjaik j : i kg ;
k;k
If ar = 0, Stop.
k;k
(2) Otherwise, interchange the rows k and rk of A and the kth and rkth entries of b:
ar $ akj ; (j = k; k + 1; : : :; n)
k;j
br $ bk :
k
Example 6.4.3
00 1 11 021
B 2 2 3 CC ;
A=B
B 6 CC :
b=B
@ A @ A
4 1 1 3
Step 1. The pivot entry is a31 = 4, r1 = 3
Interchange the rows 3 and 1 of A and the 3rd and 1st entries of b.
04 1 11 031
B C B C
AB @ 2 2 3 CA ; b B@ 6 CA
0 1 1 2
m21 = ; aa21 = ; 12
11
04 1 11 031
A A(1) = B
B 0 3 5 CC ; b b(1) = BB 9 CC
@ 2 2A @2A
0 1 1 2
257
Step 2. The pivot entry is a22 = 32
m32 = ; aa3222 = ; 23
04 1 1 1 031
B 3 5 CC B 9 CC
A A(2) = B
@0 2 2 A; b b(2) = B
@2A
0 0 ; 23 ;1
The reduced triangular system A(2) x = b(2) is:
4x1 + x2 + x3 = 3
3x + 5x = 9
2 2 2 3 2
2
; 3 x3 = ;1
The solution is:
x3 = 32 ; x2 = 12 ; x1 = 14
Flop-Count. If the Householder method is used to factor A into QR, the solution of
Ax = b requires 23 n3 + O(n2)
ops (Chapter 5, Section 5.4.1); on the other hand, the Givens
rotations technique requires about twice as many (Chapter 5, Section 5.5.1).
Round-o Property
We know from Chapter 5 that both the Householder and the Givens methods for QR factor-
ization of A are numerically stable. The back-substitution process for solving an upper triangular
system is also numerically stable (Section 3.1.3). Thus, the method for solving Ax = b using
QR factorization is likely to be stable. Indeed, this is so.
Round-o Error Result for Solving Ax = b using QR
It can be shown (Lawson and Hanson (SLP p. 92)) that the computed solution x^ is
the exact solution of
(A + E )^x = b + b;
where kE kF (3n2 + 41n)kAkF + O(2); and kbk (3n2 + 40n)kbk + O(2).
260
where B = (b1; :::; bm) is an n m matrix (m n) and X = (x1; x2; :::; xm). Here bi , and
xi ; i = 1; :::; m are n-vectors.
The problem of this type arises in many practical applications (one such application has been
considered in Section 6.4.7: Computing the Frequency Response Matrix).
Once the matrix A has been factorized using any of the methods described in Chapter 5, the
factorization can be used to solve the m linear systems above. We state the procedure only with
partial pivoting.
261
Solve: 0 6 1 0 ;0:6667 1
B 1:1429 CC ;
Ux2 = Mb2 = B
B 1:3333 CC :
x2 = B
@ A @ A
0 0
6.4.7 Special Systems
In this subsection we will study some special systems. They are
(1) Symmetric positive denite systems.
(2) Hessenberg systems.
(3) Diagonally Dominant Systems.
(4) Tridiagonal and block tridiagonal systems.
We have seen in Section 6.3 that these systems occur very often in practical applications such
as in numerical solution of partial dierential equations, etc. Indeed, it is very often said by
practicing engineers that there are hardly any systems in practical applications which
are not one of the above types. These systems therefore deserve a special treatment.
Symmetric Positive Denite Systems
First, we show that for a symmetric positive denite matrix A there exists a unique
factorization
A = HH T
where H is a lower triangular matrix with positive diagonal entries. The factorization is
called the Cholesky Factorization, after the French engineer Cholesky.
The existence of the Cholesky factorization for a symmetric positive denite matrix A can be
seen either via LU factorization of A or by computing the matrix H directly from the above relation.
To see this via LU factorization, we note that A being positive denite and, therefore having
positive leading principal minors, the factorization
A = LU
is unique. The upper triangular matrix U can be written as
U = DU1
Andre-Louis Cholesky (1875-1918) served as an ocer in the French military. His work there involved geodesy
and surveying.
262
where
D = diag(u11; u22; : : :unn)
= diag(a11; a(1) (n;1)
22 : : :ann );
263
Theorem 6.4.1 (The Cholesky Factorization Theorem) Let A be a symmetric
positive denite matrix. Then A can be written uniquely in the form
A = HH T ;
where H is a lower triangular matrix with positive diagonal entries. An explicit
expression for H is given by
H = LD1=2;
where L is the unit lower triangular matrix in the LU factorization of A obtained
by Gaussian elimination without pivoting and
D1=2 = diag u111=2; : : :; u1nn=2 :
The above constructive procedure suggests the following algorithm to compute the Cholesky
factorization of a symmetric positive denite matrix A:
Algorithm 6.4.3 Gaussian Elimination for the Cholesky Factorization
Step 1. Compute the LU factorization of A using Gaussian elimination without pivoting.
Step 2. Form the diagonal matrix D from the diagonal entries of U :
D = diag(u11; u22; : : :; unn):
264
2 3
!
U = 1 ; D = diag(2; 12 )
0 2
1 0
!
L = 3
2 1
1 0
! p2 0 ! p2 0 !
H = 3 1 1 = 3 1
2 0 p 2
p2 p2
Verify p ! p2 ! !
2 0 p32 2 3
HH T = p32 p12 p12
= = A:
0 3 5
Stability of Gaussian Elimination for the Cholesky Factorization
We now show that Gaussian elimination without pivoting is stable for symmetric positive def-
inite matrices by exhibiting some remarkable invariant properties of symmetric positive denite
matrices. The following example illustrates that even when there is a small pivot, Gaussian
elimination without pivoting does not give rise to the growth in the entries of the
matrices A(k). Let !
0:00003 0:00500
A= :
0:00500 1:0000
There is only one step. The pivot entry is 0.00003. It is small. The multiplier m21 is large:
m21 = ; aa21 = ; 00::00003
00500 = ; 500 :
3
11
But
0 : 00003 0 :00500
!
A(1) = :
0 0:166667
The entries of A(1) did not grow. In fact, max(a(1)ij ) = 0:166667 < max(aij ) = 1. This interesting
phenomenon of Gaussian elimination without pivoting applied to the 2 2 simple example (positive
denite) above can be explained through the following result.
265
Proof. We prove the results just for the rst step of elimination, because the results for the other
steps then follow inductively.
After the rst step of elimination, we have
0a a a1n 1 0 a11 a12 a1n 1
BB 011 a(1)
12
C BB 0 CC
2n C
a(1)
A(1) = BBB .. 22.. . . . ... C
C B
CA = BB@ ...
CC :
CA
@ . . B
0 a(1)
n2 a(1)
nn 0
To prove that A(1) is positive denite, consider the quadratic form:
X
n X
n X n
n X
xT Bx = a(1)
ij xi xj = aij ; aia1aj 1 xixj
i=2 j =2 i=2 j =2 11
Xn X n X
n !2
= aij xi xj ; a11 x1 + aai1 xi
i=1 j =1 i=2 11
If A(1) is not positive denite, then there will exist x2 ; : : :; xn such that
X
n X
n
ij xi xj 0:
a(1)
i=1 j =1
With these values of x2 through xn, if we dene
X
n a
i1
x1 = ; xi ;
i=2 a11
then the quadratic form
X
n X
n
aij xixj 0
i=1 j =1
which contradicts the fact that A is positive denite. Thus A(1) is positive denite.
Also, we note
ii aii ; i = 1; 2; : : :; n;
a(1)
for
0 a(1) ai1 a (because a > 0).
2
ii = aii ; ii
a11 11
Example 6.4.6
05 1 11
B C
A = B@ 1 1 1 CA
0 15 11 51 1
B 4 4 CC
A(1) = B
@0 5 5 A:
0 4 24
5 5
The leading principal minors of A(1) are 5, 4, 16. Thus A(1) is symmetric positive denite. Also,
max(a(1) 24
ij ) = 5 < max(aij ) = 5:
05 1 11
B 4 4 CC
A(2) = B@0 5 5 A
0 0 4
The leading principal minors of A(2) are 5, 4, 16. Thus A(2) is also positive denite. Furthermore,
max(a(2) (1) 5
ij ) = 5 = max(aij ). The growth factor = 5 = 1.
267
In several circumstances one prefers to solve the symmetric positive denite system Ax = b directly
from the factorization A = LDLT , without computing the Cholesky factorization. The advantage
is that the process will not require computations of any square roots. The process then
is as follows:
Gaussian Elimination for the Symmetric Positive Denite System Ax = b
Step 1. Compute the LDLT factorization of A:
A = LDLT :
Step 2. Solve
Lz = b
Step 3. Solve
Dy = z
Step 4. Solve
LT x = y
Example 6.4.7
2 3
! 5
!
A= ; b= :
3 5 8
Step 1. ! !
1 0 2 0
L= 3 ; D= :
2 1 0 21
Step 2. Solve Lz = b
1 0
! z! 5
!
1
= :
3 1
2 z2 8
z1 = 5; z2 = 1 21
Step 3. Solve Dy = z
2 0
! y! 5
!
1
=
0 21 y2 13
2
268
y1 = 52
y2 = 1
Step 4. Solve LT x = y
1 32
! x! 5 !
1 2
=
0 1 x2 1
x2 = 1
x2 = 1
269
For i = 1; 2; : : :; k ; 1 do
aki hki = h1 aki ; Pij;=11 hij hkj
q ii
Remark:
X
0
1. In the above pseudocode, ( ) 0. Also when k = 1, the inner loop is skipped.
j =1
2. The Cholesky factor H is computed row by row.
3. The positive deniteness of A will make the quantities under the square-root sign positive.
Round-o property. Let the computed Cholesky factor be denoted by H^ . Then, it can be
shown (Demmel 1989) that
A + E = H^ (H^ )T ;
where jeij j (n + 1) (aii ajj )1=2, and
1 ; (n + 1)
E = (eij ):
Thus, the Cholesky Factorization Algorithm is Stable
270
Step 3. Solve the upper triangular system for x:
H T x = y:
Example 6.4.8
Let 01 1 1 1 031
B 1 5 5 CC ;
A=B
B 11 CC :
b=B
@ A @ A
1 5 14 20
A. The Cholesky Factorization
1st row: (k = 1)
h11 = 1:
2nd row: (k=2)
h21 = ha21 = 1
q11 p
h22 = a22 ; h221 = 5 ; 1 = 2
(Since the diagonal entries of H have to be positive, we take the + sign.)
3rd row: (k=3)
h31 = ah31 = 1
n
h32 = h (a32 ; h21h31) = 12 (5 ; 1) = 2
1
q22 p p
h33 = a33 ; (h231 + h232) = 14 ; 5 = 9
(we take the + sign)
h033 = +3 1
1 0 0
B C
H=B
@ 1 2 0 CA
1 2 3
271
B. Solution of the Linear System Ax = b
(1) Solution of Hy = b
01 0 01 0y 1 0 3 1
B
B 1 2 0
CC BB y1 CC = BB 11 CC
@ A @ 2A @ A
1 2 3 y3 20
y1 = 3; y2 = 4; y3 = 3
(2) Solution of H T x = y
01 1 11 0x 1 031
BB 0 2 2 CC BB x1 CC = BB 4 CC
@ A @ 2A @ A
0 0 3 x3 3
x3 = 1; x2 = 1; x3 = 1;
Flop-Count. The Cholesky algorithm requires n6
ops to compute H; one half of the number
3
of
ops required to do the same job using LU factorization. Note that the process will also 2require
n square roots. The solution of each triangular system Hy = b and H T x = y requires n2
ops.
Thus the solution of the positive denite system Ax = b using the Cholesky algorithm
n 3
requires 6 + n
ops and n square roots.
2
Round-o property. If x^ is the computed solution of the system Ax = b using the Cholesky
algorithm, then it can be shown that x^ satises
(A + E )^x = b
where kE k2 ckAk2; and c is a small constant depending upon n. Thus the Cholesky algo-
rithm for solving a symmetric positive denite system is quite stable.
Relative Error in the Solution by the Cholesky Algorithm
Let x^ be the computed solution the symmetric positive denite system of Ax = b
using the Cholesky algorithm followed by triangular systems solutions as described
above, then it can be shown that
kx ; x^k2 Cond(A):
kx^k2
(Recall that Cond(A) = kAk kA;1k.)
272
Remark: Demmel (1989) has shown that the above bound can be replaced by O()Cond(A~),
where A~ = D;1AD;1 ; D = diag(pa11; : : :; pann). The latter may be much better than the previous
one, since Cond(A~) may be much smaller than Cond(A). (See the discussions on conditioning and
scaling in Section 6.5.)
Hessenberg System
Consider the linear system
Ax = b;
where A is an upper Hessenberg matrix of order n. If Gaussian elimination with partial pivoting
is used to triangularize A; and if jaij j 1; then ja(ijk)j k + 1; (Wilkinson AEP p. 218). Thus we
can state the following:
Flop-count: It requires only n2
ops to solve a Hessenberg system, signicantly less than
n3
3
ops required to solve a system with an arbitrary matrix. This is because at each step of
elimination during triangularization process, only one element needs to be eliminated and since
n 2
there are (n ; 1) steps, the triangularization process requires about
ops. Once we have the
2
factorization
MA = U;
the upper triangular system
Ux = Mb = b0
can be solved in n
ops. Thus a Hessenberg system can be solved with only n2
ops in
2
2
a stable way using Gaussian elimination with partial pivoting.
Example 6.4.9
273
Triangularize 0 1 2 31
B C
A=B
@ 2 3 4 CA
0 5 6
using partial pivoting.
Step 1.
00 1 01
B C
P1 = B
@ 1 0 0 CA
0 0 1
02 3 41
B 1 2 3 CC
P1 A = B
@ A
0 5 6
1 0
!
c1 =
M
; 21 1
0 1 0 01
B 1 C
M1 = B
@ ; 2 1 0 CA
0 0 1
0 1 0 0102 3 41 02 3 41
B 1 CB C B C
@ ; 2 1 0 CA B@ 1 2 3 CA = B@ 0 21 1 CA :
M1P1A = A(1) = B
0 0 1 0 5 6 0 5 6
Step 2.
01 0 01
B C
P2 = B
@ 0 0 1 CA
0 1 0
02 3 41
P2A(1)
B0 5 6C
= B C
@ A
0 1 1
2
1 0
!
c2 =
M
; 101 1
01 0 0
1
B0
M2 = B 1 0C
C
@ A
0 ; 101 1
274
01 0 0
10 2 3 4
1 0 2 3 4
1
B CC BB CC BB C
U = A(2) = M2P2A(1) = B
@0 1 0A@0 5 6A = @0 5 6C A
0 ; 101 1 1
0 2 1 2
0 0 5
00 1 0
1
B0
M = M2P2M1P1 = B 0 1 C
C
@ A
1 ; 21 ; 101
275
form. Note that the matrix A is transformed to a Hessenberg matrix once and the same
Hessenberg matrix is used to compute G(j!) for each !.
Thus, the computation that uses an initial reduction of A to a Hessenberg matrix
is much more ecient. Moreover, as we have seen before, reduction to Hessenberg form and
solutions of Hessenberg systems using partial pivoting are both stable computations. This approach
was suggested by Laub (1981).
276
and, it can be shown [exercise] that A0 is again column diagonally dominant and therefore a(1)22
is the pivot for the second step. This process can obviously be continued, showing that pivoting
is not needed for column diagonally dominant matrices. Furthermore, the following can be easily
proved.
Example 6.4.10
0 5 1 11
B 1 5 1 CC :
A=B
@ A
1 1 5
Step 1. 05 1 1 1
B 0 24 4 CC
A(1) = B
@ 5 5A
0 4 25
5 5
Step 2. 05 1 1 1
B 24 4 CC
A(2) = B
@0 5 5 A:
0 0 14
3
The growth factor = 1.
(Note that for this example, the matrix A is column diagonally dominant and positive denite;
thus = 1.)
The next example shows that the growth factor of Gaussian elimination for a col-
umn diagonally dominant matrix can be greater than 1, but is always less than 2.
Example 6.4.11
277
5 ;8
!
A= :
1 10
5 ;8
!
A =
(1) :
0 11:6
The growth factor = max(10; 11:6) = 11:6 = 1:16.
10 10
Tridiagonal Systems
The LU factorization of a tridiagonal matrix T , when it exists, may yield L and U having
very special simple structures: both bidiagonal, L having 1's along the main diagonal and the
superdiagonal entries of U the same as those of T . Specically, if we write
0a b1 0
1
BB c1 ... ... .. C
. C
T =BBB ..2 ... ... b C
C
@. n;1 C
A
0 cn an
01 10u b1 1
B
B `2 ... 0 CC BB 1 ... ... 0 CC
B
B ... ...
CC BB ... ...
CC
B
=B CC BB CC :
B CC BB C
B
@ 0 ... ... A@ 0 ... bn;1 C A
`n 1 un
By equating the corresponding elements of the matrices on both sides, we see that
a1 = u1
ci = `i ui;1; i = 2; : : :; n
ai = ui + `i bi;1; i = 2; : : :; n;
from which f`ig and fui g can be easily computed:
Computing LU Factorization of a Tridiagonal Matrix
u1 = a1
For i = 2; : : :; n do
`i = ci
ui;1
ui = ai ; `ibi;1.
278
Flop-count: The above procedure only takes (2n-2)
ops.
Solving a Tridiagonal System
Once we have the above simple factorization of T; the solution of the tridiagonal
system Tx = b can be found by solving the two special bidiagonal systems:
Ly = b
and
Ux = y
.
Flop-count: The solutions of these two bidiagonal systems also require (2n-2)
ops. Thus,
a tridiagonal system can be solved by the above procedure in only 4n-4
ops, a very
cheap procedure indeed.
Stability of the Process: Unfortunately, the above factorization procedures breaks down if
any ui is zero. Even if all ui are theoretically nonzero, the stability of the process in general
cannot be guaranteed. However in many practical situations, such as in discretizing Poisson's
equation etc., the tridiagonal matrices are symmetric positive denite, in which cases, as we
have seen before, the above procedure is quite stable.
In fact, in the symmetric positive denite case, this procedure should be preferred
over the Cholesky-factorization technique, as it does not involve computations of any square
roots. It is true that the Cholesky factorization of a symmetric positive denite tridiagonal matrix
can also be computed in O(n)
ops however, an additional n square roots have to be computed
(see, Golub and Van Loan MC 1984, p. 97).
In the general case, to maintain the stability, Gaussian elimination with partial
pivoting should be used.
If jaij; jbij, jcij 1 (i ; 1; : : :; n), then it can be shown (Wilkinson AEP p. 219) that the entires
of A(k) at each step will be bounded by 2.
279
Growth Factor and Stability of Gaussian Elimination for a Tridiagonal
System
The growth factor for Gaussian elimination with partial pivoting for a tridiagonal
matrix is bounded by 2:
2:
Thus, Gaussian elimination with partial pivoting for a tridiagonal system
is very stable.
The
op-count in this case is little higher; it takes about 7n
ops to solve the system Tx = b (3n
ops for decomposition and 4n for solving two triangular systems), but still an O(n) procedure.
If T is symmetric, one naturally wants to take advantage of the symmetry; however, Gaussian
elimination with partial pivoting does not preserve symmetry. Bunch (1971,1974) and
Bunch and Kaufman (1977) have proposed symmetry-preserving algorithms. These algorithms can
be arranged to have
op-count comparable to that of Gaussian elimination with partial pivoting
and require less storage than the latter. For details see the papers by Bunch and Bunch and
Kaufman.
Example 6.4.12 Triangularize
0 0:9 0:1 0 1
B C
A=B
@ 0:8 0:5 0:1 CA ;
0 0:1 0:5
using (i) the formula A = LU and (ii) Gaussian elimination.
i. From A = LU
u1 = 0:9
i=2:
`2 = uc2 = 00::89 = 98 = 0:8889
1
u2 = a2 ; `2b1 = 0:5 ; 89 0:1 = 0:4111
280
i=3:
`3 = uc3 = 00::41
1 = 0:2432
2
u3 = a03 ; `3b2 = 0:5 ; 0:24 10:1 = 0:4757
1 0 0
B C
L = B
@ 0:8889 1 0C A;
0 0:90 0:01:2432 00:1 1
B C
U = B
@ 0 0:4111 0:1 CA
0 0 0:4757
0 4 ;1 1 0 1 041
BB C BB CC
B ;1 4 0 1 C C BB 4 CC
A=B C
B@ 1 0 2 ;1 CA ; b = B@ 2 CA
0 1 ;1 2 2
4 ;1
! 2 ;1
! 1 0
!
A1 = ; A2 = ; B1 =
;1 4 ;1 2 0 1
4
! 2
!
b1 = ; b2 =
4 2
Block LU Factorization
!
4 ;1
Set U1 = A1 = :
;1 4
i=2:
(1) Solve for L2:
1 0
!
U1L2 = I2 =
0 1
0:2667 0:0667
!
L2 = U1;1 =
0:667 0:2667
283
(2) Compute U2 from
U2 = A2 ; L2 B1! ! !
2 ;1 0:2667 0:0667 1 0
= ;
;1 2 0:0667
! 0:2667 0 1
1:7333 ;1:0667
=
;1:0667 1:7333
Block Forward Elimination
4
!
y1 = b1 ; L1 y0 = b1 =
4
0:6667
!
y2 = b2 ; L2 y1 =
0:6667
Block Back Substitution
0:6667
!
U2x2 = y2 ; B2x3 = y2 = (B2x3 = 0)
0:6667
0:6667
! 0:9286 0:5714 ! 0:6666 ! 1 !
x2 = U2; 1 = =
0:6667 0:5714 0:9286 0:6667 1
4
! 1
! 3
!
U1x1 = y1 ; B1x2 = ; =
4 1 3
3
! !
0:2667 0:0667 3
! 1!
x1 = U1;1 = = :
3 0:0667 0:2667 3 1
285
and then computing
x = D2y;
where
A~ = D1;1AD2
~b = D1;1 b:
The above process is known as equilibration (Forsythe and Moler, CSLAS, pp. 44{45).
In conclusion, we note that scaling or equilibration is recommended in general, and
should be used only on an ad-hoc basis depending upon the data of the problem. \The
round-o error analysis for Gaussian elimination gives the most eective results when a matrix is
equilibrated." (Forsythe and Moler CSLAS, p. )
286
TABLE 6.1
(COMPARISON OF DIFFERENT METHODS
FOR LINEAR SYSTEM PROBLEM WITH ARBITRARY MATRICES)
287
TABLE 6.2
(COMPARISON OF DIFFERENT METHODS FOR LINEAR
SYSTEM PROBLEM WITH SPECIAL MATRICES)
FLOP-COUNT GROWTH
MATRIX TYPE METHOD (APPROX.) FACTOR STABILITY
3
Symmetric Positive 1) Gaussian Elimination 1) n3 1) = 1
Denite without Pivoting Stable
3
1) Cholesky 1) n6 + (n 1) None
square roots)
Diagonally Gaussian Elimination
Dominant with partial n3 2 Stable
3
pivoting
Hessenberg Gaussian Elimination
with Partial n2 n Stable
Pivoting
Tridiagonal Gaussian Elimination O(n) 2 Stable
with Partial Pivoting
288
The rst problem, the computation of A;1 b; is equivalent to solving the linear system:
Ax = b:
Similarly, the second problem can be formulated in terms of solving sets of linear equations. Thus,
if A is of order n n and B is of order n m, then writing C = A;1 B = (c1; c2; : : :; cm). We see
that the columns c1 through cm of C can be found by solving the systems
Aci = bi; i = 1; : : :; m;
where bi ; i = 1; : : :; m are the successive columns of the matrix B .
The computation of bT A;1c can be done in two steps:
1. Find A;1c; that is, solve the linear system: Ax = c
2. Compute bT x.
As we will see later in this section, computing A;1 is three times as expensive as solving
the linear system Ax = b. Thus, all such problems mentioned above can be solved much more
eciently by formulating them in terms of linear systems rather than naively solving them using
matrix inversion.
The explicit computation of the inverse should be avoided whenever pos-
sible. A linear system should never be solved by explicit computation of
the inverse of the system matrix.
290
The Sherman-Morrison Formula
If u and v are two n-vectors and A is a nonsingular matrix, then
(A ; uv T );1 = A;1 + (A;1uv T A;1 )
where
= (1 ; v T1A;1u) ; if v T A;1u 6= 1.
; 52 32 ; 12
291
Thus 0 ;3 1 1 1
B 47 ;43 14 CC
(A ; uv T );1 = B
@ ;2 2 2 A :
; 25 32 ; 12
292
B. If complete pivoting is used,
MAQ = U:
Then
A = M ;1 UQT :
(Note that Q;1 = QT .) Thus,
A;1 = R;1 QT :
For reasons stated earlier, Gaussian elimination with partial pivoting should be used
in practice.
Remark: In practical computations, the structure of the elementary lower triangular ma-
trices Mi and the fact that Pi are permutation matrices should be taken into consideration
in forming the product M .
D. If A is a symmetric positive denite matrix, then
A = HH T (the Cholesky Factorization)
A;1 = (H T );1H ;1 = (H ;1)T H ;1
293
Computing the inverse of a symmetric positive denite matrix A:
Step 1. Compute the Cholesky factorization A = HH T .
Step 2. Compute the inverse of the lower triangular matrix H : H ;1.
Step 3. Compute (H ;1)T H ;1.
E. A is a tridiagonal matrix
0a b 0
1 ja1j > jb1j
1 1
B
B c1 a2 . . .
CC ja2j > jb2j + jc1j
B
A=B ... ... b C
C;
B ..
@ n;1 CA .
0 cn;1 an janj > jcn;1j:
Then A has the bidiagonal factorization:
A = LU;
where L is lower bidiagonal and U is upper bidiagonal.
A;1 = U ;1 L;1
Example 6.5.3
Let
0 2 ;1 0 1
B C
A = B
@ ;1 2 ;1 CA
0 ;1 1
0 1 0 01
B ; 1 1 0 CC ;
L = B
@ 2 A
0 ; 23 1
0 2 ;1 0 1
B 0 3 ;1 CC
U = B
@ 2 A
0 0 1
3
01 1 1
101 0 0
1 0
1 1 1
1
B2 3
C B C B C
A ; 1= U ;1 L;1 = B
@0 2
3
C B 1 C B
2A@ 2 1 0A = @1 2 2C A:
0 0 3 1 2 1 1 2 3
3 3
A is tridiagonal and L and U are bidiagonals.
294
Example 6.5.4
Compute A;1 using partial pivoting when
00 1 11
A = B
B 1 2 3 CC
@ A
1 1 1
A;1 = U ;1 M = U ;1M2 P2 M1P1 :
Using the results of Example 5.2.4, we have
00 1 01
B C
M = M2P2 M1 P1 = B
@ 1 0 0 CA
1 ;1 1
0 1 ;2 1 1
B C
U ;1 = B
@ 0 1 1 CA :
0 0 ;1
So, 0 1 ;2 1 1 0 0 1 0 1 0 ;1 0 1 1
B CB C B C
A;1 = B
@ 0 1 1 CA B@ 1 0 0 CA = B@ 2 ;1 1 CA :
0 0 ; 1 1 ;1 1 ;1 1 ;1
295
Computing det(A) from LU Factorization
(n;1)
22 ann = the product of the pivots.
det(A) = a11a(1)
296
Computing det(A) from MAQ = U factorization
Example 6.5.5
00 1 11
B 1 2 3 CC
A=B
@ A
1 1 1
A. Gaussian elimination with partial pivoting.
01 2 3 1
B C
U =B @ 0 1 1 CA
0 0 ;1
only one interchange; therefore r = 1. det(A) = (;1) det(U ) = (;1)(;1) = 1.
B. Gaussian elimination with complete pivoting.
03 2 11
B 2 1 CC
U =B @0 3 3 A
0 0 12
In the rst step, there were one row interchange and one column interchange.
In the second step, there are one row interchange and one column interchange. Thus r =
2; s = 2
det(A) = (;1)r+s det(U ) = (;1)43 23 12 = 1:
298
Flop-count and stability. The algorithm requires 4n3
ops, four times the expense of the
3
Gaussian elimination method. The algorithm has guaranteed stability. (Wilkinson, AEP, pp.
246)
Example 6.5.6
01 0 01
B 1 1 0 CC
A=B
@ A
0 0 1
only one step needed
c = p1 ; s = p1
2 2
0 1 0 1
p12 p12
! 1 ! B 0:7071 0:7071 C 1 ! B 1:4142 C
=B CA = =B
@ 0 CA
; p12 p12 1 @ 1
;0:7071 0:7071 0
0 0:7071 0:7071 0 1 0 1 0 0 1 0 1:4142 0:7071 0 1
B CB C B C
A J (1; 2; )A = B
@ ;0:7071 0:7071 0 CA B@ 1 1 0 CA = B@ 0 0:7071 0 CA
0 0 1 0 0 1 0 0 1
1st leading principal minor: p1 = a11 = 1
2nd leading principal minor: p2 = a11 a22 = 1:4142 0:7071 = 1:0000
3rd leading principal minor: p3 = a11a22a33 = 1.
x1 = 3; x2 = 0:
Thus, a very small change in the right hand side changed the solution altogether.
In this section we study the eect of small perturbations of the input data A and b on the
computed solution x of the system Ax = b. This study is very useful. Not only will this help
us in assessing an amount of error in the computed solution of the perturbed system,
regardless of the algorithm used, but also, when the result of a perturbation analysis
is combined with that of backward error analysis of a particular algorithm, an error
bound in the computed solution by the algorithm can be obtained.
Since in the linear system problem Ax = b, the input data are A and b, there could be impurities
either in b or in A or in both. We will therefore consider the eect of perturbations on the solution
x in each of these cases separately.
300
Proof. We have
Ax = b;
and
A(x + x) = b + b:
The last equation can be written as
Ax + Ax = b + b;
or
Ax = b; sinceAx = b;
that is,
x = A;1b:
Taking a subordinate matrix-vector norm we get
kxk kA;1k kbk: (6.6.1)
Again, taking the same norm on both sides of Ax = b; we get
kAxk = kbk
or
kbk = kAxk kAk kxk (6.6.2)
Combining (6.6.1) and (6.6.2), we have
kx kbk :
kxk kAkkA;1kkbk
301
Recall from Chapter 3 that kAk kA;1k is the condition number of A and is denoted by Cond(A).
Theorem is therefore proved.
.
302
Interpretation of Theorem 6.6.1
It is important to understand the implication of Theorem 6.6.1 quite well. Theorem 6.6.1
says that a relative change in the solution can be as large as Cond(A) multiplied by
the relative change in the vector b. Thus, if the condition number is not too large, then a
small perturbation in the vector b will have very little eect on the solution. On the other hand,
if the condition number is large, then even a small perturbation in b might change the solution
drastically.
Example 6.6.1 An ill-conditioned problem
01 2 1
1 0 4 1
A=B
B 2 4:0001 2:002 CC ; B 8:0021 CC
b=B
@ A @ A
1 2:002 2:004 5:006
011 0 4 1
B C B C
The exact solution x = B
@1C
A. Change b to b0 = B@ 8:0020 CA.
1 5:0061
Then the relative change in b:
kb0 ; bk = kbk = 1:879 10;5 (small):
kbk kbk
If we solve the system Ax0 = b0, we get
0 3:0850 1
B C
x0 = x + x = B
@ ;0:0436 CA :
1:0022
(x0 is completely dierent from x)
Note: kkx k
xk = 1:3461:
It is easily veried that the inequality in Theorem 6.6.1 is satised:
Cond(A) kkbbkk = 4:4434:
However, the predicated change is overly estimated.
Example 6.6.2 A well-conditioned problem
1 2
! 3
!
A= ; b=
3 4 7
303
1
! 3:0001
!
The exact solution x = . Let b0 = b + b = .
1 7:0001
The relative change in b:
kb0 ; bk = 1:875 10;5 (small)
kbk
Cond(A) = 14:9330 (small)
Thus a drastic change in the solution x is not expected. In fact x0 satisfying
Ax0 = b0
is
0:9999
! 1
!
x0 = x= :
1:0001 1
Note:
kxk = 10;5:
kxk
6.6.2 Eect of Perturbation in the matrix A
Here we assume that there are impurities in A only and as a result we have A + A in hand, but
b is exact.
Proof. We have
(A + A)(x + x) = b;
or
(A + A)x + (A + A)x = b: (6.6.6)
304
Since
Ax = b;
we have from (6.6.6)
(A + A)x = ;Ax (6.6.7)
or
x = ;A;1 A(x + x): (6.6.8)
Taking the norm on both sides, we have
kxk kA;1k kAk (kxk + kxk) (6.6.9)
= kA k kkAAkk kAk (kxk + kxk)
;1
Remarks: Because of the assumption that kAk < kA1;1k (which is quite reasonable to assume),
the denominator on the right hand side of the inequality in Theorem 6.6.2 is less than one. Thus
even if kkAAkk is small, then there could be a drastic change in the solution if Cond(A)
is large.
Example 6.6.3
Consider the previous example once more. Change a2;3 to 2.0001; keep b xed. Thus
00 0 01
B C
A = ;10;4 B@ 0 0 1 CA (small):
0 0 0
305
Now solve the system: (A + A)x0 = b :
0 ;1:0002 1
B 2:0002 CC
x0 = B
@ A
0:9998
0 ;2:0002 1
x = x0 ; x = B
B 1:0002 CC
@ A
;0:0003
Relative Error = kkx
xk
k = 1:2911 (quite large).
Proof. Subtracting
Ax = b
from
(A + A)(x + x) = b + b
we have
(A + A)(x + x) ; Ax = b
or
(A + A)(x + x) ; (A + A)x + (A + A)x ; Ax = b
or
(A + A)(x) ; Ax = b
306
or
A(I ; A;1(;A))x = b + Ax: (6.6.12)
Let A;1 (;A) = F . Then
kF k = kA;1(;A)k kA;1k kAk < 1 (by assumption):
Since kF k < 1, I ; F is invertible, and
k(I ; F );1k 1 ;1kF k (see Chapter 1, Section 1.7, Theorem 1.7.7): (6.6.13)
or
kxk kA;1k kbk + kAk (6.6.14)
kxk (1 ; kF k) kxk
;1
(1k;A kFkk) kbkkbkkAk + kAk (Note that kx1k kkAbkk ).
That is,
kxk kA;1k kAk kbk + kAk : (6.6.15)
kxk (1 ; kF k) kbk kAk
Again
kF k = kA;1(;A)k kA;1k kAk = kA kAkkkAk kAk:
;1
(6.6.16)
Since kF k 1, we can write from (6.6.15) and (6.6.16)
0 1 0 1
kxk B
BB kA;1k kAk C
C kbk kAk BB Cond(A) CC kbk kAk
C +
kxk @ (1 ; ( kA;1k kAk ) kAk) A kbk kAk = B
@ (1 ; Cond(A) kAk) CA kbk + kAk :
kAk kAk
(6.6.17)
Remarks: We again see from (6.6.17) that even if the perturbations kkbbkk and kkAAkk are small,
there might be a drastic change in the solution, if Cond(A) is large. Thus, Cond(A) plays the
crucial role in the sensitivity of the solution.
307
Denition 6.6.1 Let A be a nonsingular matrix. Then
Cond(A) = kAk kA;1k:
A Convention
Unless otherwise stated, when we write Cond(A), we will mean Cond2 (A),
that is, the condition number with respect to 2-norm. The condition number
of a matrix A with respect to a subordinate p norm (p = 1; 2; 1) will be denoted by
Condp (A), that is, Cond1 (A) will stand for the condition number of A with respect
to 1-norm, etc.
Remark: For the above example, it turned out that the condition number with respect to any
norm is the same. This is, however, not always the case. In general, however, they are closely
related. (See below the condition number of the Hilbert matrix with respect to dierent
norms.)
6.7.1 Some Well-known Ill-conditioned Matrices
1. The Hilbert Matrix 0 1 1 1 1 1
n
BB 1 21 13 1 C
n+1 C
A=B BB 2.. 3 4 C
.. C
@. . C
A
n n+1
1 1 1
2n;1
309
For n = 10; Cond2(A) = 1:6025 1013:
Cond1 (A) = 3:5353 1013:
Cond1(A) = 3:5353 1013:
2. The Pie matrix A with aii = ; aij = 1 for i 6= j . The matrix becomes ill-conditioned
when is close to 1 or n ; 1. For example, when = 0:9999 and n = 5, Cond(A) = 5 104.
3. The Wilkinson bidiagonal matrix of order 20 (see Chapter 3):
Let
0
!
x^ =
2
Then
0:0001
!
r = b ; Ax^ =
0
310
Note that
! r is small. However, the vector x^ is nowhere close to the exact solution
1
x= .
1
The above phenomenon can be explained mathematically from the following theorem. The
proof can be easily worked out.
311
Then, kxk is approximately less than or equal to 2 Cond(A) 10;d .
kxk
This says that if the data has a relative error of 10;d and if the relative error in the solution
has to be guaranteed to be less than or equal to 10;t; then Cond(A) has to be less than or equal
to 12 10d;t. Thus, whether a system is ill-conditioned or well-conditioned depends on
the accuracy of the data and how much error in the solution can be tolerated.
For example, suppose that the data have a relative error of about 10;5 and an accuracy of about
10;3 is sought, then Cond(A) 21 102 = 50. On the other hand, if the accuracy of about 10;2 is
sought, then Cond(A) 12 103 = 500. Thus, in the rst case the system will be well-conditioned
if Cond(A) is less than or equal to 50, while in the second case, the system will be well-conditioned
if Cond(A) is less than or equal to 500.
Estimating Accuracy from the Condition Number
In general, if the data are approximately accurate and if Cond(A) = 10s, then there
will be only about t ; s signicant digit accuracy in the computed solution when
the solution is computed in t-digit arithmetic.
For better understanding of conditioning, stability and accuracy, we again refer the readers to
the paper of Bunch (1987).
313
Example 6.7.3
Consider the linear system
Ax = b
with
01 0 0
1
B 0 0:00001
A = B 0 C
C
@ A
0 0 0:00001
0 0:1 1
B CC
b = B
@ 0:1 A
0:1
0 0:00001 1
B C
x = 104 B
@ 1 CA :
1
which is quite large.
The eigenvalues of A are 1, 0.00001 and 0.00001. Thus, A has a small eigenvalue.
0 0:00001 1 0 1
B C
A;1 = 105 B@ 0 1 0C A;
0 0 1
which is large. Thus, for this example (i) the computed solution is large, (ii) A has a small
eigenvalue, and (iii) A;1 is large. A is, therefore, likely to be ill-conditioned. It is indeed true:
Cond(A) = 105:
Remark: Demmel (1989) has shown that scaling to improve the condition number is not
necessary when solving a symmetric positive denite system using the Cholesky algo-
rithm. The error bound obtained for the solution by the algorithm for the unscaled system Ax = b
is almost the same as that of the scaled system with A~ = D;1AD;1 , D = diag(pa11; : : :; pann).
315
Thus, if we choose y such that kz k is quite large, we could have a reasonably good estimate of
ky k
kA;1k. Rice (MCMS, p. 93) remarks: There is a heuristic argument which says that if y is
picked at random, then the expected value of kkyzkk is about 12 kA;1k.
A systematic way to choose y has been given by the Linpack Condition Number Estimator
(LINPACK (1979)). It is based on an algorithm by Cline, Moler, Stewart and Wilkinson (1979).
The process involves solving two systems of equations
AT y = e
and
Az = y;
where e is a scalar multiple of a vector with components 1 chosen in such a way that the possible
growth is maximum.
To avoid over
ow, the LINPACK condition estimator SGECO routine actually computes an
estimate of Cond(1 , called
A)
RCOND = kAkky kkz k :
The procedure for nding RCOND, therefore, can be stated as follows:
1. Compute kAk.
2. Solve AT y = e and Az = y , choosing e such that the growth is maximum (see LINPACK
(1979)) for the details of how to choose e).
3. RCOND = kkAy kk kz k.
Flop-count. Once A has been triangularized to solve a linear system involving A, the actual
cost of estimating the Cond(A) of A using the above procedure is quite cheap. The same triangu-
larization can be used to solve both the systems in step 2. Also, `1 vector norm can be used so that
the subordinate matrix norm can be computed from the columns of the matrix A. The process
of estimating Cond(A) in this way requires only O(n2)
ops.
Round-o error. According to LINPACK (1979), ignoring the eects of round-o error, it
can be proved that
1
RCOND Cond(A):
In the presence of round-o error, if the computed RCOND is not zero, 1 is almost always
RCOND
a lower bound for the true condition number.
316
An Optimization Technique for Estimating kA;1k1
Hager (1984) has proposed a method for estimating kA;1 k based on an optimization technique.
This technique seems to be quite suitable for randomly generated matrices. Let A;1 = B = (bij ).
Dene a function f (x):
f (x) = kBx k1
Xn X n
= bij xj :
i=1 j =1
Then
kBk1 = kA;1k1 = maxff (x) : kxk1 = 1g:
Thus, the problem is to nd maximum of the convex function f over the convex set
S = fx 2 Rn : kxk1 1g:
It is well known that the maximum of a convex function is obtained at an extreme point. Hager's
method consists in nding this maximum systematically. We present the algorithm below (for
details see Hager (1984)). Hager remarks that the algorithm usually stops after two
iterations. An excellent survey of dierent condition number estimators including Hager's, and
their performances have been given by Higham (1987).
Set = kA;1k1 = 0:
011
BB n1 CC
Set b = BBB n.. CCC :
@.A
1
n
1. Solve Ax = b.
2. Test if kxk . If so, go to step 6. Otherwise set = kxk1 and go to step 3.
3. Solve AT z = y; where
yi = 1 if xi 0;
yi = ;1 if xi < 0:
William Hager is a professor of mathematics at University of Florida. He is the author of the book Applied
Numerical Linear Algebra.
317
4. Set j = arg maxfjzij; i = 1 to ng.
5. If jzj j > z T b, update 001
BB ... CC
BB CC
B 1 CC j th entry
bB BB CC
BB 0. CC
B@ .. CA
0
and return to step 1. Else go to step 6.
6. Set kA;1k1 . Then Cond1 (A) = kAk1.
Example 6.7.4
We illustrate Hager's method by means of a very ill-conditioned matrix.
01 2 31
A=B
B 3 4 5 CC ; Cond(A) = 3:3819 1016:
@ A
6 7 8
Iteration 1:
011
B
B
3
C:
b = @ 13 C
A
1
3
0 1:0895 1
B C
x = B
@ ;2:5123 CA
1:4228
= 5:0245
011
BB CC
y = @ ;1 A
1
0 2:0271 1
z =
B ;3:3785 CC
1016 B
@ A
1:3514
j = 2
jz2j = 1016(3:3785) > zT b = ;1:3340:
318
Update 001
B CC
bB
@1A:
0
Iteration 2:
001
B 1 CC ;
b = B
@ A
0
0 ;1:3564 1
B CC
x = 1017 B
@ 2: 7128 A;
;1:3564
kxk1 = 5:4255 1017:
Since kxk1 > , we set = 5:4255 1017.
Comment. It turns out that this current value is an excellent estimate of kA;1k1.
Condition Estimation from Triangularization
If one uses the QR factorization to solve a linear system or the Cholesky factorization to solve
a symmetric positive denite system, then as a by-product of the triangularization, one can obtain
an upper bound of the condition number with just a little additional cost.
If QR factorization with column pivoting is used, then from
R
!
QT AP = ;
0
we have Cond2(A) = Cond2(R).
If the Cholesky factorization is used then from
A = HH T ;
we have Cond2 (A) = (Cond2(H ))2. Thus, the Cond2 (A) can be determined if Cond2(R) or
Cond2 (H ) is known. Since kRk2 or kH k2 is easily computed, all that is needed is an algorithm
to estimate kR;1k2 or kH ;1k2. There are several algorithms for estimation of the norms of the
inverses of triangular matrices. We just state one from a paper of Higham (1987). For details, see
Higham (1987).
Algorithm 6.7.2 Condition Estimation of an Upper Triangular Matrix
319
Given a nonsingular upper triangular matrix T = (tij ) of order n, the following algorithm
computes CE such that kT ;1k1 CE .
1. Set z = 1 .
n tnn
For i = n ; 1 to 1 do
s1
2.
s s + jtij jzj (j = i + 1; : : :; n).
zi = jts j :
ii
3. Compute CE = kz k1 , where z = (z1 ; z2; :::; zn)T .
Remark: Once kT ;1k1 is estimated by the above algorithm, kT ;1k2 can be estimated from the
relation:
1
kT ;1k2 (kM (T );1k1CE2 ;
where M (T ) = (mij ) are dened by:
8
< jt j; i = j
mij = : ii
;jtij j; i =6 j:
kM (T );1k1 can be estimated by using Hager's algorithm described in the last section.
320
Denition 6.8.1 We shall call the number Cond(A; x) = kjA kjjxAk jjxjk the Skeel's condition
;1
number and Conds(A) = kjA;1jjAjk the upper bound of the Skeel's condition number.
An important property of Cond(A; x): Skeel's condition number is invariant un-
der row-scaling. It can, therefore, be much smaller than the usual condition number Cond(A).
Cond(A; x) is useful when the column norms of A;1 vary widely.
Chandrasekaran and Ipsen (1994) have recently given an analysis of how do the individual
components of the solution vector x get aected when the data is perturbed. Their analysis can
eectively be combined with the Skeel's result above, when the component-wise perturbations of
the data are known.. We give an example.
Thus, the component-wise perturbations in the error expressions have led to the componenet-
wise version of Skeel's condition number. Similar results also hold for right hand side perturbation.
For details see Chandrasekaran and Ipsen (1994).
r(k) = b ; Ax(k) :
2. Calculate the correction vector c(k) by solving the system:
Ac(k) = r(k);
using the triangularization of A obtained to get the computed solution x^.
3. Form x(k+1) = x(k) + c(k).
4. If kx kx(;
(k+1) x(k)k
2
k)k2 is less than a prescribed tolerance , stop.
Remark: If the system is well-conditioned, then the iterative renement using Gaussian elim-
ination with pivoting will ultimately produce a very accurate solution.
Example 6.9.1
322
01 1 01 0 0:0001 1
A=B
B 0 2 1 CC ; b = BB 0:0001 CC :
@ A @ A
0 0 3 ;1:666
0 ;0:2777 1
B 0:2778 C
The exact solution x = B C
@ A (correct up to four gures).
;0:5555
011
B CC
x(0) = B
@1A:
1
k=0: 0 ;1:9999 1
B C
r(0) = b ; Ax(0) = B
@ ;2:9999 CA :
;4:6666
The solution of Ac(0) = r(0) is
0 ;1:2777 1
B C
c(0) = B
@ ;0:7222 CA
;1:5555
0 ;0:2777 1
x(1)
B 0:2778 CC :
= x(0) + c(0) = B
@ A
;0:5555
Note that Cond(A) = 3:8078. A is well-conditioned.
Accuracy Obtained by Iterative Renement
Suppose that the iteration converges. Then the error at (k +1)th step will be less than the error
at the kth step.
Relative Accuracy from Iterative Renement
Let
kx^ ; x(k+1)k c kx^ ; x(k)k :
kx^k kx^k
Then if c 10;s, there will be a gain of approximately s gures per iteration.
Flop-count. The procedure is quite cheap. Since A has already been triangularized to solve
the original system Ax = b; each iteration requires only O(n2)
ops.
323
Remarks: Iterative renement is a very useful technique. Gaussian elimination with partial
pivoting followed by iterative renement is the most practical approach for solving a
linear system accurately. Skeel (1979) has shown that in most cases even one step of
iterative renement is sucient.
Example 6.9.2 (Stewart IMC, p. 205)
7 6:990
! 34:97
!
A= ; b=
4 4 20:00
Cond2 (A) = 3:2465 103:
2
!
The exact solution is x = .
3
Let x(0) be
1:667
!
= x(0)
3:333
(obtained by Gaussian elimination without pivoting.)
k=0: !
0:333 10;2
r(0) = b ; Ax(0) =
0
The solution of Ac(0) = r(0) is
0:3167
!
c(0) =
;0:3167
1:9837
!
x(1) = x(0) + c(0) = :
3:0163
k=1:
;0:0292 !
r(1) = b ; Ax(1) =
;0:0168
The solution of Ac(1) = r(1) is
0:0108
!
c(1) =
;0:0150
1:9992
!
x =x +c =
(2) (1) (1)
3:0008
324
Iterative Renement of the Computed Inverse
As in the procedure of rening the computed solution of the system Ax = b; a computed inverse
of A can also be rened iteratively.
Let X (0) be an approximation to a computed inverse of A. Then the matrices X (k) dened by
the following iterative procedure:
X (k+1) = X (k) + X (k)(I ; AX (k)); k = 0; 1; 2; : : : (6.9.1)
converge to a limit (under certain conditions) and the limit, when it exists, is a better inverse of A.
Note the resemblance of the above iteration to the Newton-Raphson method for nding a zero
of f (x):
xk+1 = xk ; ff0((xxk )) :
k
Like the Newton-Raphson method, the iteration (6.9.1) has the convergence of order 2. This can
be seen as follows:
I ; AX (k+1) = I ; A(X (k) + X (k)(I ; AX (k)))
= I ; AX (k) ; AX (k) + (AX (k))2
= (I ; AX (k))2 :
Continuing k times, we get
I ; AXk+1 = (I ; AX0)2k ;
from where we conclude that if kI ; AX0 k = < 1; then the iteration converges to a limit, because
in this case kI ; AXk+1k 2k . A necessary and sucient condition is that (I ; AX0 ) < 1.
We summarize the above discussion as follows:
325
Example 6.9.3
01 2 31
B C
A = B
@ 2 3 4 CA
7 6 8
0 0 ;0:6667 0:3333 1
B C
A;1 = B
@ ;4 4:3333 ;0:6667 CA (in four-digit arithmetic).
3 ;2:6667 0:3333
Let us take 0 0 ;0:6660 0:3333 1
B ;3:996 4:3290 ;0:6660 CC
X0 = B
@ A
2:9970 ;2:6640 0:3330
then
(I ; AX0) = 0:001 < 1:
(Note that the eigenvalues of I ; AX0 are 0.001, 0.001, 0.001.)
0 0 ;0:6667 0:3333 1
B ;4 4:3333 ;0:6667 CC
X1 = X0 + X0(I ; AX0) = B
@ A
3 ;2:6667 0:3333
(exact up to four signicant gures).
Estimating Cond(A) For Iterative Renement
A very crude estimate of Cond(A) may be obtained from the Iterative Renement
procedure (Algorithm 6.9.1). Let k be the number of iterations required for the
renement procedure to converge, t is the number of digits used in the arithmetic,
then (Rice MCMS, p. 98):
a rough estimate of Cond(A) is 10t(1; 1 ) .
k
326
1. The Jacobi method (Section 6.10.1).
2. The Gauss-Seidel method (Section 6.10.2).
3. The Successive Overrelaxation method (Section 6.10.4).
4. The Conjugate Gradient method with and without preconditioner (Section 6.10.5).
5. The GMRES method (Section 6.10.6).
The Gauss-Seidel Method is a modication of the Jacobi method, and is a special case of the
Successive Overrelaxation Method (SOR). The Conjugate Gardient Method is primarily used to
solve a symmetric positive denite system. The Jacobi and the Gauss-Seidel methods converge for
diagonally dominant matrices and in addition, the Gauss-Seidel method converges if A is symmetric
positive denite. Note that the diagonally dominant and the symmetric positive denite matrices
are among the most important classes of matrices arising in practical applications.
The direct methods based on triangularization of the matrix A becomes prohibitive in terms of
computer time and storage if the matrix A is quite large. On the other hand, there are practical
situations such as the discretization of partial dierential equations, where the matrix size
can be as large as several hundred thousand. For such problems, the direct methods become
impractical. For example, if A is of order 10; 000 10; 000, it may take as long as 2-3 days for an
IBM 370 to solve the system Ax = b using Gaussian elimination or orthogonalization techniques
of Householder and Givens. Furthermore, most large problems are sparse and the sparsity gets
lost to a considerable extent during the triangularization procedure, so that at the end
we have to deal with a very large matrix with too many nonzero entries, and the storage becomes
a crucial issue. For such problems, it is advisable to use a class of methods called ITERATIVE
METHODS that never alter the matrix A and require the storage of only a few vectors of length
n at a time.
Basic Idea
The basic idea behind an iterative method is to rst write the system Ax = b in an equivalent form:
x = Bx + d (6.10.1)
and then starting with an initial approximation x(1) of the solution-vector x to generate a sequence
of approximations fx(k)g iteratively dened by
x(k+1) = Bx(k) + d; k = 1; 2; : : : (6.10.2)
327
with a hope that under certain mild conditions, the sequence fx(k)g converges to the solution as
k ! 1.
To solve the linear system Ax = b iteratively using the idea, we therefore need to know
(a) how to write the system Ax = b in the form (6.10.1), and
(b) how should x(1) be chosen so that the iteration (6.10.2) converges to the limit
or under what sort of assumptions, the iteration converges to the limit with any
arbitrary choice of x(1).
Stopping Criteria for the Iteration (6.10.2)
It is natural to wonder when the iteration (6.10.2) can be terminated. Since when convergence
occurs, x(k+1) is a better approximation than x(k); a natural stopping criterion will be:
Stopping Criterion 1
I. Stop the iteration (6.10.2) if
kx(k+1) ; x(k)k <
kx(k)k
for a prescribed small positive number . ( should be chosen according to the
accuracy desired).
In cases where the iteration does not seem to converge or the convergence is too slow, one might
wish to terminate the iteration after a number of steps. In that case the stopping criterion will be:
Stopping Criterion 2
II. Stop the iteration (6.10.2) as soon as the number of iteration exceeds a prescribed
number, say N .
6.10.1 The Jacobi Method
The System Ax = b
or
a11x1 + a12x2 + + a1nxn = b1
a21x1 + a22x2 + + a2nxn = b2
..
.
an1x1 + an2x2 + + annxn = bn
328
can be rewritten (under the assumption that aii 6= 0; i = 1; : : :; n) as:
x = 1 (b ; a x ; ; a x )
1
a11 1 12 2 1n n
With the Jacobi iteration matrix and the Jacobi vector as dened above, the iteration (6.9.2)
becomes:
We thus have the following iterative procedure, called the The Jacobi Method.
Algorithm 6.10.1 The Jacobi Method
0 x(1) 1
BB 1(1) CC
x
(1) Choose x(1) = BBB 2.. CCC.
@ . A
x(1)
n
(2) For k = 1; 2; : : :; do until a stopping criterion is satised
330
Example 6.10.1
05 1 11 071
B C B CC
A=B
@ 1 5 1 CA ; b=B
@7A
1 1 5 7
001
B CC
x(1) = B
@0A
0
0 0 ;0:2000 ;0:2000 1 0 1:4000 1
B C B C
BJ = B
@ ;0:2000 0 ;0:2000 C
A; bJ = B
@ 1:4000 CA
;0:2000 ;0:2000 0 1:4000
k=1: 0 1:4000 1
B C
x(2) = BJ x(1) + bJ = B
@ 1:4000 CA
1:4000
k=2: 0 0:8400 1
B C
x(3) = BJ x(2) + bJ = B
@ 0:8400 CA
0:8400
k=3: 0 1:0640 1
B 1:0640 CC
x(4) = BJ x(3) + bJ = B
@ A
1:0640
k=4: 0 0:9744 1
B C
x(5) = BJ x(4) + bJ = B
@ 0:9744 CA
0:9744
k=5: 0 1:0102 1
B CC
x(6) = BJ x(5) + bJ = B
@ 1: 0102 A
1:0102
k=6: 0 0:0099 1
B 0:0099 CC
x(7) = BJ x(6) + bJ = B
@ A
0:0099
331
k=7: 0 1:0016 1
B 1:0016 CC
x(8) = BJ x(7) + bJ = B
@ A
1:0016
The Gauss-Seidel Method*
In the Jacobi method, to compute the components of the vector x(k+1) ; the components of the
vector x(k) are only used; however, note that to compute x(ik+1), we could have used x(1k+1) through
x(i;k+1)
1 which were already available to us. Thus, a natural modication of the Jacobi method will
be to rewrite the Jacobi iteration (6.10.4) in the following form:
The Gauss-Seidel Iteration
X
i;1 X
n
x(ik+1) = a1 (bi ; aij x(jk+1) ; aij x(jk) ): (6.10.5)
ii j =1 j =i+1
The idea is to use each new component, as soon as it is available, in the computation
of the next component.
The iteration (6.10.5) is known as the Gauss-Seidel iteration and the iterative method based
on this iteration is called the Gauss-Seidel Method.
In the notation used earlier, the Gauss-Seidel iteration is:
x(k+1) = ;(D + L);1Ux + (D + L);1b:
(Note that the matrix D + L is a lower triangular matrix with a11; : : :; ann on the diagonal; and,
since we have assumed that these entries are nonzero, the matrix (D + L) is nonsingular).
We will call the matrix
B = ;(D + L);1U
the Gauss-Seidel matrix and denote it by the symbol BGS . Similarly the Gauss-Seidel vector
(D + L);1b will be denoted by bGS . That is,
* Association of Seidel's name with Gauss for this method does not seem to be well-documented in history.
332
The Gauss-Seidel Matrix and the Gauss-Seidel Vector
Let A = L + D + U . Then
BGS = ;(D + L);1U;
bGS = (D + L);1b:
Example 6.10.2
05 1 11 071
B C B CC
A=B
@ 1 5 1 CA ; b=B
@7A
1 1 5 7
0 0 ;0:2 ;0:2 1
BGS = B
B 0 0:04 ;0:16 CC
@ A
0 0:032 0:072
0 1:4000 1
B C
bGS = B
@ 1:1200 CA
0:8960
k = 1: 0 1:4000 1
B 1:1200 CC
x(2) = BGS x(1) + bGS = B
@ A
0:8960
333
k = 2: 0 0:9968 1
B 1:0214 CC
x(3) = BGS x(2) + bGS = B
@ A
0:9964
k = 3: 0 0:9964 1
B 1:0014 CC
x(4) = BGS x(3) + bGS = B
@ A
1:0004
k = 4: 0 0:9996 1
B 1:0000 CC
x(5) = BGS x(4) + bGS = B
@ A
1:0001
k = 5: 011
B 1 CC :
x(6) = BGS x(5) + bGS = B
@ A
1
Computer Implementations
On actual computer implementations it is certainly economical to use the equation (6.10.4) and
(6.10.7). The use of (6.10.3) and (6.10.6) will necessitate the storage of D, L and U , which will be
a waste of storage.
Proof. From
x = Bx + c
334
and
x(k+1) = Bx(k) + c;
we have
x ; x(k+1) = B(x ; x(k)): (6.10.8)
Since this is true for any value of k, we can write
x ; x(k) = B (x ; x(k;1)): (6.10.9)
Substituting (6.10.9) in (6.10.8), we have
x ; x(k+1) = B2 (x ; x(k;1)): (6.10.10)
Continuing this process k times we can write
x ; x(k+1) = Bk (x ; x(1)):
This shows that fx(k)g converges to the solution x for any arbitrary choice of x(1) if and only if
B k ! 0 as k ! 1.
Recall now from Chapter 1 (Section 1.7, Theorem 1.7.4) that B is a convergent matrix
i the spectral radius of B, (B), is less than 1. Now (B) = maxfjij; i = 1; : : :; ng, where
1 through n are the eigenvalues of B. Since jij kB k for each i and for any matrix norm (see
Chapter 8); in particular, (B ) kB k. Thus, a good way to see if B is convergent is to compute
kBk with row-sum or column-sum norm and see if this is less than one. (Note that the converse
is not true.)
We combine the result of Theorem 6.10.1 with the observation just made in the following:
We now apply the above result to identify classes of matrices for which the Jacobi and/or
Gauss-Seidel methods converge for any choice of initial approximation x(1).
335
The Jacobi and Gauss-Seidel Methods for Diagonally Dominant Matrices
Theorem 6.10.2 Let A be a symmetric positive denite matrix. Then the Gauss-
Seidel method converges for any arbitrary choice of the initial approximation x(1) .
338
>From (6.10.15) and (6.10.16), we obtain
1
+ 1 u Au = u (L + D)u + u (LT + DT )u (6.10.17)
(1 + ) (1 + )
= u (L + D + LT + DT )u (6.10.18)
= u (A + DT )u (6.10.19)
= u (A + D)u > uAu (6.10.20)
(Note that since A is positive denite, so is D and, therefore, uDu > 0).
Dividing both sides of (6.10.20) by u Au(> 0) we have
1 1 >1
+
(1 + ) (1 + )
or
(2 + + ) > 1: (6.10.21)
(1 + )(1 + )
Let = + i . Then = ; i . >From (6.10.21) we then have
2(1 + ) > 1;
(1 + )2 + 2
p
from where it follows that 2 + 2 < 1. That is (BGS ) < 1; since jj = 2 + 2 .
Rates of Convergence and a Comparison Between
the Gauss-Seidel and the Jacobi Methods
We have just seen that for the row diagonally dominant matrices both the Jacobi and the
Gauss-Seidel methods converge for an arbitrary x(1) . The question naturally arises if this is true for
some other matrices, as well. Also, when both methods converge, another question arises:
which one converges faster?
>From our discussion in the last section we know that it is the iteration matrix B that plays
a crucial role in the convergence of an iterative method. More specically, recall from proof of
Theorem 6.9.2 that ek+1 = error at the (k + 1)th step = x ; xk+1 and e1 = initial error = x ; x(1)
are related by
kek+1k kBk kke1k; k = 1; 2; 3; : : ::
Thus, kB k k gives us an upper bound of the ratio of the error between the (k + 1)th step and the
initial error.
Denition 6.10.1 If kBk k < 1; then the quantity
; ln kB k
k
k
339
is called the Average Rate of Convergence for k iterations, and, the quantity
; ln (B)
is called the Asymptotic Rate of Convergence.
If the asymptotic rate of convergence of one iterative method is greater than that
of the other and both the methods are known to converge, then the one with the
larger asymptotic rate of convergence, converges asymptotically faster than the other.
The following theorem, known as the Stein-Rosenberg Theorem, identies a class of matrices
for which the Jacobi and the Gauss-Seidel are either both convergent or both divergent. We shall
state the theorem below without proof. The proof involves the Perron-Frobenius Theorem
from matrix theory and is beyond the scope of this book. The proof of the theorem and related
discussions can be found in an excellent reference book on the subject, (Varga MIR, Chapter 3):
Theorem 6.10.3 (Stein-Rosenberg) If the matrix A is such that its diagonal entries
are all positive and the o-diagonal entries are nonnegative, then one and only one
of the following statements holds:
(a) (BJ ) = (BGS ) = 0;
(b) 0 < (BGS ) < (BJ ) < 1;
(c) (BJ ) = (BGS ) = 1;
(d) 1 < (BJ ) < (BGS ).
Corollary 6.10.3 If 0 < (BJ ) < 1, then the asymptotic rate of convergence of
the Gauss-Seidel method is larger than that of the Jacobi method.
340
If the matrix A satises the hypothesis of Theorem 6.10.3, then
(i) The Jacobi and the Gauss-Seidel methods either both converge or both
diverge.
(ii) When both the methods converge, the asymptotic rate of convergence of
the Gauss-Seidel method is larger than that of the Jacobi method.
Remarks: Note that in (ii) we are talking about the asymptotic rate of convergence, not the
average rate of convergence.
Unfortunately, in the general case no such statements about the convergence and the asymptotic
rates of convergence of two iterative methods can be made. In fact, there are examples where one
method converges but the other diverges (see the example below). However, when both the
Gauss-Seidel and the Jacobi converge, because of the lower storage requirement and
the asymptotic rates of convergence, the Gauss-Seidel method should be preferred
over the Jacobi.
Example 6.10.3
The following example shows that the Jacobi method can converge even when the Gauss-Seidel
method does not.
0 1 2 ;2 1
B C
A = B @ 1 1 1 CA
2 2 1
011
b = B
B 2 CC
@ A
5
0 0 ;2 2 1
BJ = B
B C
@ ;1 0 ;1 CA
;2 ;2 0
0 0 ;2 2 1
B C
BGS = B @ 0 2 ;3 CA
0 0 2
(BGS ) = 2:
(BJ ) = 6:7815 10;6
341
A Few Iterations with Gauss-Seidel
001 011 011
x(1)
B 0 CC ; x(2) = BB 1 CC ; x(3) = BB 0 CC ;
= B
@ A @ A @ A
0 07 1 0 131 1 3
B CC (5) B C
x(4) = B
@ ;8 A ; x = B@ ;36 CA ;
7 15
etc. This shows that the Gauss-Seidel method is clearly diverging. The exact solution is
071
B CC
x=B@ ;4 A :
;1
On the other hand, with the Jacobi method we have convergence with only two iterations:
001
B CC
x(1) = B @0A
0 01 1
B CC
x(2) = B @2A
0 57 1
x(3) = B
B ;4 CC :
@ A
;1
6.10.4 The Successive Overrelaxation (SOR) Method
The Gauss-Seidel Method is frustratingly slow when (BGS ) is close to unity. However, the rate
of convergence of the Gauss-Seidel iteration can, in certain cases, be improved by introducing a
parameter w, known as the relaxation parameter. The following modied Gauss-Seidel iteration,
The SOR Iteration
X
i;1 X
n
x(ik+1) = aw (bi ; aij x(jk+1) ; aij x(jk)) + (1 ; w)xki; i = 1; 2; : : :; (6.10.22)
ii j =1 j =i+1
is known as the successive overrelaxation iteration or in short, the SOR iteration, if w > 1.
From (6.10.22) we note the following:
342
(1) when ! = 1, the SOR iteration reduces to the Gauss-Seidel iteration.
(2) when ! > 1, in computing the (k + 1)th iteration, more weight is placed on
the most current value than when ! < 1, with a hope that the convergence
will be faster.
In matrix notation the SOR iteration is
x(k+1) = (D + !L);1[(1 ; !)D ; !U ]x(k) + !(D + !L);1b; k = 1; 2; : : : (6.10.23)
(Note that since aii 6= 0, i = 1; : : :; n the matrix (D + !L) is nonsingular.)
The matrix (D + !L);1[(1 ; ! )D ; !U ] will be called the SOR matrix and will be denoted by
BSOR . Similarly, the vector (D + !L);1b will be denoted by bSOR , that is,
BSOR = (D + !L);1[(1 ; ! )D ; !U ]
bSOR = !(D + !L);1b:
Then
0 ;0:2000 ;0:2400 ;0:2400 1 0 1:6800 1
B C B C
BSOR = B
@ 0:0480 ;0:1424 ;0:1824 CA ; bSOR = B
@ 1:2768 CA
0:0365 0:0918 ;0:0986 0:9704
343
k = 1: 0 x(2) 1 0 1:6800 1
1
B x(2) CC = B x(1) + b = BB 1:2768 CC
x(2) = B
@ 2 A SOR SOR @ A
x(2)
3 0:9704
k = 2: 0 x(3) 1 0 0:8047 1
1
B x(3) CC = B x(2) + b = BB 0:9986 CC
x(3) = B
@ 2 A SOR SOR @ A
x(3)
3 1:0531
k = 3: 0 x(4) 1 0 1:0266 1
1
B (4) CC B C
x(4) = B
@ x2 A = BSOR x(3) + bSOR = B@ 0:9811 CA :
x(4)
3 0:9875
Choice of ! in the Convergent SOR Iteration
It is natural to wonder what is the range of ! for which the SOR iteration converges and what
is the optimal choice of ! ? To this end, we rst prove the following important result due to William
Kahan (1958).
Theorem 6.10.4 (Kahan) For the SOR iteration to converge for every initial ap-
proximation x(1) , w must lie in the interval (0,2).
344
Since the determinant of a matrix is equal to the product of its eigenvalues, we conclude that
(BSOR ) j1 ; ! j;
where (BSOR is the special radius of the matrix BSOR .
Since by Theorem 6.10.1, for the convergence of any iterative method the spectral radius of the
iteration matrix has to be less than 1, we conclude that ! must lie in the interval (0,2).
The next theorem, known as the Ostrowski-Reich Theorem, shows that the above condition
is also sucient in case the matrix A is symmetric and positive denite.
The theorem was proved by Reich for the Gauss-Seidel iteration (! = 1) in 1949 and subse-
quently extended by Ostrowski in 1954.
We state the theorem without proof. For proof, see Varga MIA, Section 3.4 or Ortega, Nu-
merical Analysis-Second Course, p. 123. The Ostrowski-Reich Theorem is a generalization
of Theorem 6.10.4 for symmetric positive denite matrices.
345
Denition 6.10.3 The matrix A is 2-cyclic if there is a permutation matrix P such that
A A ! 11 12
PAP T = ;
A21 A22
where A11 and A22 are diagonal.
David Young has dened such a matrix as the matrix having \Property (A)". This denition
can be generalized to block matrices where the diagonal matrices A11 and A22 are block diagonal
matrices; we call such matrices block 2-cyclic matrices. A well-known example of a consistently
ordered block 2-cyclic matrix is the block tridiagonal matrix
0 T ;I 0 1
BB ;I . . . . . . ... CC
A=B BB .. . . . . . . CC
CA
@ . ; I
0 ;I T
where 0 4 ;1 0 1
BB ;1 . . . . . . ... CC
T =B BB . . . . . . . . . CC :
@ ;1 CA
0 ;1 4
Recall that this matrix arises in the discretization of the Poisson's equations:
2T + 2T = f
x2 y 2
on the unit square. In fact, it can be shown (Exercise) that every block tridiagonal matrix
with nonsingular diagonal blocks is consistently ordered and 2-cyclic.
The following important and very well-known theorem on the optimal chocie of ! for consistently
ordered matrices is due to David Young (Young (1971)).
David Young, Jr., is a professor of mathematics and Director of the Numerical Analysis Center at the University
of Texas at Austin. He is widely-known for his pioneering contributions in the area of iterative methods for linear
systems. He is also one of the developers of the software package \ITPACK".
346
Theorem 6.10.6 (Young) Let A be consistently ordered and 2-cyclic with nonzero
diagonal elements. Then
(BGS ) = ((BJ ))2:
Furthermore, if the eigenvalues of BJ are real and (BJ ) < 1, then the optimal
choice for ! in terms of producing the smallest spectral radius in SOR, denoted by
!opt, is given by
!opt = p 2 ;
1 + 1 ; (BJ )2
and (BSOR ) = !opt ; 1.
Corollary 6.10.4 For consistently ordered 2-cyclic matrices, if the Jacobi method
converges, so does the Gauss-Seidel method, and the Gauss-Seidel method converges
twice as fast as the Jacobi method.
Example 6.10.5
04 ;1 0 ;1 0 0 1 011
BB ;1 4 ;1 0 ; 1 0 C
C BB 0 CC
BB CC BB CC
B0 ;1 4 0 0 ;1 C C B0C
A=B BB CC ; b = BBB CCC :
BB ;1 0 0 4 ;1 0 C
CC BB 0 CC
B@ 0 ;1 0 ;1 4 ;1 A B@ 0 CA
0 0 ;1 0 ; 1 4 0
The eigenvalues of BJ are: 0:1036; 0:2500; ;0:1036; ;0:2500; 0:6036; ;0:6036.
(BJ ) = 0:6036
!opt = p 2 = 1:1128
1 + 1 ; (0:6036)2
(BGS ) = 0:3643
347
It took ve iterations for the SOR method0with 1 !opt to converge to the exact solution (up to
0
BB 0 CC
SOR = B
four signicant gures), starting with x(1) B . CC :
B@ .. CA
0
0 0:2939 1
BB 0:0901 CC
BB CC
B
B 0:0184 C CC :
x(5) =
SOR B B CC
BB 0 : 0855
B@ 0:0480 CCA
0:0166
With the same starting vector x(1), the Gauss-Seidel method required 12 iterations (Try it!).
Also nd out how many iterations will be required by Jacobi
Remarks: The above theorem basically states that the SOR method with the
optimum relaxation factor converges much faster (which is expected) than the Gauss-
Seidel method, when the asymptotic rate of convergence of the Gauss-Seidel method
is small.
348
Example 6.10.6
Consider again matrix arising in the process of discretization of the Poisson's equation with the
mesh-size h (the matrix 6.3.36).
For this matrix it is easily veried that
(BJ ) = cos(h):
We also know that (BGS ) = 2(BJ ). Thus
RGS = 2RJ = 2(; log cos h)
= 2 h + 0(h4 )
2 2
2
= 2h2 + 0(h4):
For small h, RGS is small, and by the second inequality of Theorem 6.10.7 we have
RSOR 2R1GS=2 2h
and
RSOR 2=(h):
RGS
Thus, when h is small, the asymptotic rate of convergence of the SOR method with the optimum
1 ; then
relaxation is much greater than the Gauss-Seidel method. For example, when h = 50
RSOR 100 = 31:8:
RGS
Thus, in this case the SOR method converges about 31 times faster than the Gauss-Seidel
method. And, furthermore, the rate of convergence becomes greater as h decreases; the improve-
ment is really remarkable when h is very very small.
Example 6.10.7
05 1 11 071
B C B CC
A=B
@ 1 5 1 CA ; b=B
@7A
1 1 5 7
071
x(0) = (0; 0; 0)T ;
B 7 CC
p0 = r0 = b ; Ax(0) = B
@ A
7
i=0: 0 49 1
B CC
! = Ap0 = B
@ 49 A
49
0 = kprT0k!2 = 0:1429
2
0
0 1:0003 1
B C
x1 = x0 + 0p0 = B
@ 1:0003 CA
1:0003
0 ;0:0021 1
B ;0:0021 CC
r1 = r0 ; 0 ! = B
@ A
;0:0021
0 = 9 10;8
0 ;0:0021 1
B ;0:0021 CC :
p1 = r1 + 0 p0 = B
@ A
;0:0021
351
i=1:
0 ;0:0147 1
B ;0:0147 CC
! = Ap1 = B
@ A
;0:0147
0 1:0000 1
1 = 0:1429; x2 = x1 + 1 p1 = B
B 1:0000 CC :
@ A
1:0000
Convergence
In the absence of round-o errors the conjugate gradient method should converge
in no more than n iterations. Thus in theory, the conjugate gradient method requires about
n iterations. In fact, it can be shown that the error at every step decreases. Specically, it can be
2
proved (see Ortega IPVL, p. 277) that:
Theorem 6.10.9
kx ; xk k2 < kx ; xk;1k2;
where x is the exact solution, unless xk;1 = x:
However, the convergence is usually extremely slow due to the ill-conditioning of A. This can
be seen from the following result.
352
Rate of Convergence of the Conjugate Gradient Method
p
Dene kskA = sT As. Then an estimate of the rate of convergence is:
kxk ; xkA 2kkx0 ; xkA
where
p p
= ( ; 1)=( + 1) and = Cond(A) = kAk2kA;1k2 = n=1;
here n and 1 are the largest and smallest eigenvalues of the symmetric positive
denite matrix A (note that the eigenvalues of A are all positive).
Preconditioning
Since a large condition number of A slows down the convergence of the conjugate gradient
method, it is natural to see if the condition number of A can be improved before the method is
applied; in this case we will be able to apply the basic conjugate gradient method to a preconditioned
system. Indeed, the use of a good preconditioner accelerates the rate of convergence of the method
substantially.
A basic idea of preconditioning is to nd a nonsingular S such that Cond(A~) < Cond(A) where
A~ = SAS T . Once such a S is found, then we can solve A~x~ = ~b where x~ = (S ;1)T x; ~b = Sb and
then recover x from x~ from
x = S T x~:
The matrix S is usually dened for simplicity by
(S T S );1 = M:
Note that M is symmetric positive denite and is called a preconditioner.
Algorithm 6.10.5 The Preconditioned Conjugate Gradient Method (PCG)
353
Find a preconditioner M .
Choose x0 and .
Set r0 = b ; Ax0; p0 = y0 = M ;1 r0.
For i = 0; 1; 2; 3 do
(a) w = Api
(b) i = yiT ri =pTi w
(c) xi+1 = xi + i pi
(d) ri+1 = ri ; iw
(e) Test for convergence: If kri+1k22 , continue.
(f) yi+1 = M ;1ri+1
(g) i = yiT+1 ri+1=yiT ri
(h) pi+1 = yi+1 + ipi
Remarks: The above algorithm requires computations of square roots. It may, therefore, not
be carried out to completion. However, we can obtain a no-ll, incomplete LDLT factorization of
A which avoids square root computations.
Algorithm 6.10.7 No-Fill Incomplete LDLT
Set d11 = a11.
For i = 2:::n do
For j = 1; 2; :::; i ; 1 do
if aij = 0; `ij = 0,
else,
X
j ;1
`ij = (aij ; `ik dkk`jk )=djj.
k=1
X
i;1
dii = aii ; `ikdkk
k=1
Use of Incomplete Cholesky Factorization in Preconditioning
Note that Incomplete Cholesky Factorization algorithm mathematically gives the factor-
ization of A in the form
A = LLT + R
355
where R 6= 0. Since the best choice for a preconditioner M is the matrix A itself, after the
matrix L is obtained through incomplete Cholesky Factorization, the preconditioner M is taken as
M = LLT :
In the Preconditioned Conjugate Gradient algorithm (Algorithm 6.9.5) (PCG), a symmetric positive
denite system of the form:
My = r
needs to be solved at each iteration with M as the coecient matrix (Step f). Since M = LLT ,
this is equivalent to solving
Lx = r; LT y = x:
Since the coecient matrix at each iteration is the same, the incomplete Cholesky Fac-
torization will be done only once. If the no-ll incomplete LDLT is used, then mathematically
we have
A = LDLT + R
In this case we take the preconditioner M as:
M = LDLT :
Then at each iteration of the PCG, one needs to solve a system of the form
My = r;
which is equivalent to Lx = r; Dz = x and LT y = z . Again, L and D have to be computed
once for all.
6.10.6 The Arnoldi Process and GMRES
In the last few years, a method called GMRES has received a considerable amount of attention by
numerical linear algebraists in the context of solving large and sparse linear systems. The method
is based on a classical scheme due to Arnoldi, which constructs an orthonormal basis of a space,
called Krylov subspace fv1 ; Av1; :::An;1v1g, where A is n n and v1 is a vector of unit length. The
Arnoldi method can be implemented just by using matrix-vector multiplications, and is, therefore,
suitable for sparse matrices, because the zero entries are preserved.
The basic idea behind solving a large and sparse problem using the Arnoldi method
is to project the problem onto the Krylov subspace of dimension m < n using the
orthonormal basis, constructed by the method, solve the m-dimensional problem using
356
a standard approach, and then recover the solution of the original problem from the
solution of the projected problem.
We now summarize the essentials of the Arnoldi method followed by an algorithmic description
of GMRES.
Example 01 2 31
B 1 2 3 CC
A=B
@ A; m = 2
1 1 1
v1 = (1; 0; 0)T :
j =1:
i = 1; h11 = 1
357
001
B CC
v^2 = Av1 ; h11v1 = B
@1A
1
h210 = 1:41411
0
B CC
v2 = v^2 =h21 = B
@ 0: 7071 A
0:7071
j=2:
i = 1 h12 = 3:5355; h22 = 3:5000
Form
01 0 1
B C 1 3:5335
!
V2 = B
@ 0 0:7071 CA ; H2 = :
1:4142 3:5000
0 0:7071
VERIFY: 00 0
1
B 0 1:0607 CC :
AV2 ; H2 V2 = B
@ A
0 ;1:0607
GMRES (Generalized Minimal Residual) Method
The GMRES method is designed to minimize the norm of the residual vector b ; Ax over all
vectors of the ane subspace x0 + Km , where x0 is the initial vector and Km is the Krylov subspace
of dimension m.
Algorithm 6.10.9 : fGeneralized Minimal Residual Method (GMRES) (Saad1 and Schultz2
(1986))
(1) Start:
Choose x0 and a dimension m of the Krylov subspace. Compute r0 = b ; Ax0 .
(2) Arnoldi process:
Perform m steps of the Arnoldi algorithm starting with v1 = r0=kr0k, to generate the Hessenberg
matrix H~ m and the orthogonal matrix Vm .
(3) Form the approximate solution:
1
Youcef Saad is a professor of computer science at the University of Minnesota. He is well-known for his contri-
butions to large-scale matrix computations based on Arnolid Method.
2
Martin Schultz is a professor of computer science at Yale University.
358
Find the vector ym that minimizes the function J (y) = ke1 ; H~ myk where e1 = [1; 0; :::; 0]T ;
among all vectors y of Rm . = jr0j
Compute xm = x0 + Vmym .
Remark: Although not clear from the above description, the number m steps needed for the
above algorithm to converge is not xed beforehand but is determined as the Arnoldi algorithm
is run. A formula that gives the residual norm without computing explicitly the residual vector
makes this possible. For details see the paper by Saad and Schultz [1986].
Solving a Shifted System: An important observation is that the Arnoldi basis Vm is
invariant under a diagonal shift of A: if we were to use A ; I instead of A in Arnoldi, we would
obtain the same sequence fv1 ; :::; vmg. This is because the Krylov subspace Km is the same for A
and A ; I , provided the initial vector v1 is the same. Note that from (6.9.24) we have:
(A ; I )Vm = Vm (Hm ; I ) + hm+1;m vm+1 eTm ;
which means that if we run the Arnoldi process with the matrix A ; I , we would obtain the same
matrix Vm but the matrix Hm will have its diagonal shifted by I .
This idea has been exploited in the context of ordinary dierential equations methods by Gear
and Saad (1983).
Solving shifted systems arises in many applications, such as the computation of the Frequency
Response matrix of a control system (see Section 6.4.7).
1. Numerical Methods for Arbitrary Linear System Problems. Two types of methods|
direct and iterative|have been discussed.
359
The method requires n3
ops. It is unstable for arbitrary matrices, and is not rec-
3
ommended for practical use unless the matrix A is symmetric positive denite.
The growth factor can be arbitrarily large for an arbitrary matrix.
Gaussian elimination with partial pivoting gives a factorization of A:
MA = U:
Once having this factorization, Ax = b can be solved by solving the upper triangular system
Ux = b0, where b0 = Mb.
The process requires n33
ops and O(n2) comparisons. In theory, there are some risks involved,
but in practice, this is a stable algorithm. It is the most widely used practical algorithm
for solving a dense linear system.
Gaussian elimination with complete pivoting gives
MAQ = U:
Once having this factorization, Ax = b can be solved by solving rst
Uy = b0; where y = QT x; b0 = Mb
and then recovering x from
x = Qy:
The process requires n33
ops and O(n3 ) comparisons. Thus it is more expensive than Gaussian
elimination with partial pivoting, but it is more stable (the growth factor in this case is
bounded by a slowly growing function of n, whereas the growth factor with Gaussian
elimination using partial pivoting can be as big as 2n;1).
The orthogonal triangularization methods are based on the QR factorization of A:
A = QR:
Once having this factorization, Ax = b can be solved by solving the upper triangular system
Rx = b0, where b0 = QT b.
One can use either the Householder method or the Givens method to achieve this factorization.
The Householder method is more economical than the Givens method ( 2n3 3
ops versus 4n3 3
ops). Both the methods have the guaranteed stability.
360
(2) Iterative Methods: The Jacobi, Gauss-Seidel, and SOR methods have been discussed.
A generic formulation of an iterative method is:
x(k+1) = Bx(k) + d:
Dierent methods dier in the way B and d are chosen. Writing A = L + D + U; we have:
For the Jacobi method:
B = BJ = ;D;1 (L + U );
d = bJ = D;1 b:
2. Special Systems: Symmetric positive denite, diagonally dominant, Hessenberg and tridi-
agonal systems have been discussed.
(b) Diagonally Dominant system. Gaussian elimination with partial pivoting is stable
( 2).
(c) Hessenberg system. Gaussian elimination with partial pivoting requires only O(n2)
ops to solve an n n Hessenberg system. It is stable ( n).
362
(d) Tridiagonal system. Gaussian elimination with partial pivoting requires only O(n)
ops.
It is stable ( 2).
3. Inverse, Determinant and Leading Principal Minors. The inverse and the determinant
of a matrix A can be readily computed once a factorization of A is available.
(a) Inverses.
If Gaussian elimination without pivoting is used, we have
A = LU:
Then
A;1 = U ;1 L;1:
If Gaussian elimination with partial pivoting is used, we have
MA = U:
Then
A;1 = (M ;1U );1 = U ;1M:
If Gaussian elimination with complete pivoting is used, we have
MAQ = U:
Then
A;1 = (M ;1UQT );1 = QU ;1M:
If an orthogonal factorization is used, we have
A = QR:
Then
A;1 = (QR);1 = R;1QT :
Note that most problems involving inverses can be recast so that the inverse does
not have to be computed explicitly.
Furthermore, there are matrices (such as triangular, etc.) whose inverses are easily computed.
The inverse of a matrix B which diers from a matrix A by a rank-one perturbation only can
be readily computed, once the inverse of A has been found, by using the Sherman-Morrison
Formula: Let B = A ; uvT . Then B;1 = A;1 + (A;1uvT A;1), where = (1;v 1A;1 u) . T
363
(b) Determinant. The determinant is rarely required in practical applications.
However, the determinant of A can be computed immediately, once a factorization of A has
been obtained.
If we use Gaussian elimination with partial pivoting, we have
MA = U;
then
det(A) = (;1)r a11 a(1) (n;1)
22 ann ;
where r is the number of row interchanges made during the elimination process. a11; a(1)
22 ; : : :;
( n ;
ann are the pivot entries, which appear as the diagonal entries of U . Similarly, other
1)
(c) Leading Principal Minors. The Givens triangularization method has been described.
4. The Condition Number and Accuracy of Solution. In the linear system problem Ax = b,
the input data are A and b. There may exist impurities either in A or in b, or in both.
We have presented perturbation analyses in all the three cases. The results are contained in
Theorems 6.6.1, 6.6.2, and 6.6.3. Theorem 6.6.3 is the most general theorem.
In all these three cases, it turns out that
Cond(A) = kAk kA;1k
is the deciding factor. If this number is large, then a small perturbation in the input data may
cause a large relative error in the computed solution. In this case, the system is called an ill-
conditioned system, otherwise it is well-conditioned. The matrix A having a large condition
number is called an ill-conditioned matrix.
Some important properties of the condition number of a matrix have been listed. Some well-
known ill-conditioned matrices are the Hilbert matrix, Pie matrix, Wilkinson bidiagonal
matrix, etc.
The condition number, of course, has a noticeable eect on the accuracy of the solution.
A computed solution can be considered accurate only when the product of both
Cond(A) and the relative residual is small (Theorem 6.7.1). Thus, a small relative
residual alone does not guarantee the accuracy of the solution.
A frequently asked question is: How large does Cond(A) have to be before the system
Ax = b is considered to be ill-conditioned?
364
The answer depends upon the accuracy of the input data and the level of tolerance of the error
in the solution.
In general, if the data are approximately accurate and if Cond(A) = 10s, then there
are about (t ; s) signicant digits of accuracy in the solution, if it is computed in t-digit
arithmetic.
Computing the condition number from the denition is clearly expensive; it involves nding
the norm of the inverse of A, and nding the inverse of A is about three times the expense of
solving the linear system itself.
Two condition number estimators: The LINPACK condition number estimator and the
Hager's norm-1 condition number estimator, have been described.
There are symptoms exhibited during the Gaussian elimination with pivoting such as a
small pivot, a large computed solution, a large residual, etc. that merely indicate if
a system is ill-conditioned, but these are not sure tests.
When component-wise perturbations are known, Skeel's condition number can be useful ,
especially when the norms of the columns of the inverse matrix vary widely.
5. Iterative Renement. Once a solution has been computed, an inexpensive way to rene the
solution iteratively, known as the iterative renement procedure has been described (Section 6.8).
The iterative renement technique is a very useful technique.
6. The Conjugate Gradient and GMRES Methods. The conjugate gradient method, when
used with an appropriate preconditioner, is one of the most widely used methods for solving a large
and sparse symmetric positive denite linear system.
Only one type of preconditioner, namely the Incomplete Cholesky Factorization (ICF), has been
described in this book. The basic idea of ICF is to compute the Cholesky factorization of A for the
nonzero entries of A only, leaving the zero entries as zeros.
We have described only basic Arnoldi and Arnoldi-based GMRES methods. In practice, mod-
ied Gram-Schmidt process needs to be used in the implementation of the Arnoldi method, and
GMRES needs to be used with a proper preconditioner. The conjugate gradient method is direct in
theory, but iterative in practice. It is extremely slow when A is ill-conditioned and a preconditioner
is needed to accelerate the convergence.
365
6.12 Some Suggestions for Further Reading
The books on numerical methods in engineering literature routinely discuss how various engineering
applications give rise to linear systems problems. We have used the following two in our discussions
and found them useful:
1. Numerical Methods for Engineers, by Steven C. Chapra and Raymond P. Canale
(second edition), McGraw-Hill, Inc., New York, 1988.
2. Advanced Engineering Mathematics, by Peter V. O'Neil (third edition) Wadsworth
Publishing Company, Belmont, California, 1991.
Direct methods (such as Gaussian elimination, QR factorization, etc.) for linear systems and
related problems, discussions on perturbation analysis and conditioning of the linear systems prob-
lems, iterative renement, etc. can be found in any standard numerical linear algebra text (in
particular the books by Golub and Van Loan, MC and by Stewart, IMC are highly recom-
mended). Most numerical analysis texts contain some discussion, but none of the existing books
provides a through and in-depth treatment. For discussion on solutions of linear systems with spe-
cial matrices such as diagonally dominant, Hessenberg, positive denite, etc., see Wilkinson (AEP,
pp. 218{220).
For proofs and analyses of backward error analyses of various algorithms, AEP by Wilkinson is
the most authentic book. Several recent papers by Nick Higham (1986,1987) on condition-number
estimators are interesting to read.
For iterative methods, two (by now) almost classical books on the subject:
1. Matrix Iterative Analysis, by Richard Varga, Prentice Hall, Engelwood Clis, New
Jersey, 1962, and
2. Iterative Solution of Large Linear Systems, by David Young, Academic Press, New
York, 1971,
are a must.
Another important book in the area is: Applied Iterative Methods, by L. A. Hageman and
D. M. Young, Academic Press, New York, 1981. The most recent book in the area is Templates
for the Solution of Linear Systems: Building Blocks For Iterative Methods, by
Richard Barrett, Mike Berry, Tony Chan, James Demmel, June Donato, Jack Dongarra, Viector
Eijkhoat, Roland Pozo, Chuck Romine and Henk van der Vorst, SIAM, 1994. The book incoporates
state-of-the-art computational methods for solving large and sparse non-symmetric systems.
366
The books: Introduction to Parallel and Vector Solutions of Linear Systems and
Numerical Analysis { A Second Course, by James Ortega, also contain very clear expositions
on the convergence of iterative methods.
The Conjugate Gradient Method, originally developed by M. R. Hestenes and E. Stiefel (1952),
has received considerable attention in the context of solving large and sparse positive denite
linear systems. A considerable amount of work has been done by numerical linear algebraists and
researchers in applications areas (such as in optimization).
The following books contain some in-depth discussion:
1. Introduction to Linear and Nonlinear Programming, by David G. Luenberger,
Addison-Wesley, New York, 1973.
2. Introduction to Parallel and Vector Solutions of Linear Systems, by James
Ortega.
An excellent survey paper on the subject by Golub and O'Leary (1989) is also recommended
for further reading. See also another survey paper by Axelsson (1985).
The interesting survey paper by Young, Jea, and Mai (1988), in the book Linear Algebra in
Signals, Systems, and Control, edited by B. N. Datta, et al., SIAM, 1988, is worth reading.
The development of conjugate gradient type methods for nonsymmetric and sym-
metric indenite linear systems is an active area of research.
A nice discussion on \scaling" appears in Forsythe and Moler (CSLAS, Chapter 11). See also
a paper by Skeel(1979) for relationship between scaling and stability of Gaussian elimination.
A recent paper of Chandrasekaran and Ipsen( 1994)describes sensitivity of individual compo-
nents of the solution vector when the data is subject to perturbations.
To learn more about the Arnoldi method and the Arnoldi-based GMRES method, see the papers
of Saad and his coauthors (Saad (1981), Saad (1982), Saad and Schultz (1986)). Walker (1988) has
proposed an implementation of GMRES method using Householder matrices.
367
Exercises on Chapter 6
Use MATLAB Wherever Needed and Appropriate
PROBLEMS ON SECTION 6.3
1. An engineer requires 5000, 5500, and 6000 yd3 of sand, cement and gravel for a building
project. He buys his material from three stores. A distribution of each material in these
stores is given as follows:
Store Sand Cement Gravel
% % %
1 60 20 20
2 40 40 20
3 20 30 50
How many cubic yards of each material must the engineer take from each store to meet his
needs?
2. If the input to reactor 1 in the \reactor" problem of Section 6.3.2 is decreased 10%, what is
the percent change in the concentration of the other reactors?
3. Consider the following circuit diagram:
10Ω 5Ω 1
3 2
V1 = 200V
5Ω 10Ω
V6 = 50V
4 5 6
15Ω 20Ω
o o
100 C (1,2) (2,2) (3,2) 75 C
o
0 C
5. Derive the linear system for the nite dierence approximation of the elliptic equation
2T + 2T = f (x; y):
x2 y 2
The domain is in the unit square, x = 0:25, and the boundary conditions are given by
T (x; 0) = 1;x
T (1; y ) = y
T (0; y ) = 1
T (x; 1) = 1:
6. For the previous problem, if
f (x; y ) = ; 2 sin(x) sin(y);
then the analytic solution to the elliptic equation
2T + 2T = f (x; y);
x2 y 2
with the same boundary conditions as in problem #5, is given by
T (x; y) = 1 ; x + xy + 12 sin(x) sin(y )
(Celia and Gray, Numerical Methods for Differential Equations, Prentice Hall, Inc.,
NJ, 1992, pp. 105{106).
(a) Use the nite dierence scheme of section 6.3.4 to approximate the values of T at the
interior points with x = y = n1 ; n = 4; 8; 16:
(b) Compare the values obtained in (a) with the exact solution.
(c) Write down the linear system arising from nite element method of the solution of the
two-point boundary value problem: ;2u00 + 3u = x2 , 0 x 1; y (0) = y (1) = 0, using
the same basic functions j (x) as in the book and uniform grid.
369
PROBLEMS ON SECTION 6.4
7. Solve the linear system Ax = b, where b is a vector with each component equal to 1 and with
each A from problem #18 of Chapter 5, using
(a) Gaussian elimination wihtout pivoting,
(b) Gaussian elimination with partial and complete pivoting,
(c) the QR factorization.
8. (a) Solve each of the systems of Problem #7 using parital pivoting but without explicit
factorization (Section 6.4.4).
(b) Compute the residual vector in each case.
9. Solve 0 0:00001 1 1 1 0 x 1 0 2:0001 1
1
B
B C B C = BB 3 CC
1 1 A @ x2 C
C B
@ 3 A @ A
1 2 3 x3 3
using Gaussian elimination without and with partial pivoting and compare the answers.
10. Consider m linear systems
Axi = bi; i = 1; 2; : : :; m:
(a) Develop an algorithm to solve the above systems using Gaussian elimination with com-
plete pivoting. Your algorithm should take advantage of the fact that all the m systems
have the same system matrix A.
(b) Determine the
op-count of your algorithm.
(c) Apply your algorithm in the special case where bi is the ith column of the identity matrix,
i = 1; : : :; n.
(d) Use the algorithm in (c) to show that the inverse of an n n matrix A can be computed
using Gaussian elimination with complete pivoting in about 43 n3
ops.
(e) Apply the algorithm in (c) to compute the inverse of
01 1 11
B 2 3
A = @ 2 3 41 C
B C
1 1
A:
1 1 1
3 4 5
11. Consider the system Ax = b, where both A and b are complex. Show how the system can be
solved using real arithmetic only. Compare the
op-count in this case with that needed to
solve the system with Gaussian elimination using complex arithmetic.
370
12. (i) Compute the Cholesky factorization of
01 1 1
1
A=B
B 1 1:001 1:001 CC
@ A
1 1 2
using
(a) Gaussian elimination without pivoting,
(b) the Cholesky algorithm.
ii) In part (a) verify that
16. (a) Show that Gaussian elimination applied to a column diagonally dominant matrix pre-
serves diagonal dominance at each step of reduction; that is, if A = (aij ) is column
diagonally dominant, then so is A(k) = (a(ijk)); k = 1; 2; : : :; n ; 1.
371
(b) Show that the growth using Gaussian elimination with partial pivoting for such a matrix
is bounded by 2. (Hint: max max ja(k)j 2 max jaiij).
k i;j ij
(c) Verify the statement of part (a) with the matrix of problem #14.
(d) Construct a 2 2 column diagonally dominant matrix whose growth factor with Gaussian
elimination without pivoting is larger than 1 but less than or equal to 2.
17. Solve the tridiagonal system
0 2 ;1 0 0 1 0 x 1 0 1 1
BB CC BB 1 CC BB CC
BB ;1 2 ;1 0 CC BB x2 CC = BB 1 CC
B@ 0 ;1 2 ;1 CA B@ x3 CA B@ 1 CA
0 0 ;1 2 x4 1
(a) using Gaussian elimination,
(b) computing the LU factorization of A directly from A = LU .
18. Solve the diagonally dominant system
0 10 1 1 1 1 0 x 1 0 13 1
1
B
B 1 10 1 1 C
C B
BB x2 CCC BBB 13 CCC
B
B C
CC BB CC = BB CC
B
@ ; 1 0 10 1 A @ x3 A @ 10 A
;1 ;1 ;1 10 x4 7
using Gaussian elimination without pivoting. Compute the growth factor.
19. (a) Develop efficient algorithms for triangularizing (i) an upper Hessenberg matrix and (ii) a
lower Hessenberg matrix, using Gaussian elimination with partial pivoting.
(b) Show that when A is upper Hessenberg, Gaussian elimination with partial pivoting gives
        |a_ij^(k)| ≤ k + 1,   if |a_ij| ≤ 1;
hence deduce that the growth factor in this case is bounded by n.
(c) Apply your algorithms to solve the systems:
    i.  [ 1   2    2    3 ] [x1]   [ 6 ]
        [ 3   4    5    7 ] [x2] = [ 7 ]
        [ 0   0.1  2    3 ] [x3]   [ 8 ]
        [ 0   0    0.1  1 ] [x4]   [ 9 ]
    ii. [ 1  0.0001  0      ] [x1]   [ 1.0001 ]
        [ 0  2       0.0001 ] [x2] = [ 2.0001 ]
        [ 0  0       3      ] [x3]   [ 3.0000 ]
    iii. [ 0  0  0  1 ] [x1]   [ 0 ]
         [ 1  0  0  2 ] [x2] = [ 0 ]
         [ 0  1  0  3 ] [x3]   [ 1 ]
         [ 0  0  1  4 ] [x4]   [ 1 ]
(d) Compute the growth factor in each case.
(e) Suppose the data in the above problems are accurate to 4 digits and you seek an accuracy
of three digits in your solution. Identify which problems are ill-conditioned.
20. (a) Find the QR factorization of
        A = [ 1      1     ]
            [ 10^-5  0     ]
            [ 0      10^-5 ].
(b) Using the results of (a), solve Ax = b, where
        b = [ 2     ]
            [ 10^-5 ]
            [ 10^-5 ].
21. Find the LU factorization of
        A = [ T  I  0 ]
            [ I  T  I ]
            [ 0  I  T ]
where
        T = [  2  -1   0 ]
            [ -1   2  -1 ]
            [  0  -1   2 ].
Use this factorization to solve Ax = b, where each element of b is 2.
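A quick MATLAB check of this problem can be set up with kron, as in the sketch below; the lu call uses MATLAB's partial pivoting rather than the structured block LU factorization the problem asks you to work out.

    % Form the 9-by-9 block tridiagonal matrix A = [T I 0; I T I; 0 I T] and
    % solve Ax = b with every component of b equal to 2.
    T  = [2 -1 0; -1 2 -1; 0 -1 2];
    S  = [0 1 0; 1 0 1; 0 1 0];          % pattern of the off-diagonal I blocks
    A  = kron(eye(3), T) + kron(S, eye(3));
    b  = 2 * ones(9, 1);
    [L, U, P] = lu(A);
    x = U \ (L \ (P * b));
    norm(A*x - b)                        % residual check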
        B_21 = -B_22 A_21 A_11^{-1},   and   B_11 = A_11^{-1} - B_12 A_21 A_11^{-1}.
(b) How many flops are needed to compute A^{-1} using the results of (a) if A_11 and A_22 are,
respectively, m × m and p × p?
(c) Use your results above to compute A^{-1}, where
        A = [  4   0  -1  -1 ]
            [  0   4  -1  -1 ]
            [ -1  -1   4   0 ]
            [ -1  -1   0   4 ].
24. Let
        A = [ 1  2       1      ]           B = [ 0  2       1      ]
            [ 2  4.0001  2.0002 ]    and        [ 2  4.0001  2.0002 ].
            [ 1  2.0002  2.0004 ]               [ 1  2.0002  2.0004 ]
Write B in the form B = A - uv^T, then compute B^{-1} using the Sherman-Morrison formula,
knowing
        A^{-1} = 10^4 [  4.0010  -2.0006   0.0003 ]
                      [ -2.0006   1.0004  -0.0002 ]
                      [  0.0003  -0.0002   0.0001 ].
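A MATLAB sketch of the computation, taking u = v = e_1 (a natural choice, since B differs from A only in its (1,1) entry) and using the A^{-1} given above:

    % Sherman-Morrison: with B = A - u*v',
    %   inv(B) = inv(A) + (inv(A)*u*v'*inv(A)) / (1 - v'*inv(A)*u).
    A    = [1 2 1; 2 4.0001 2.0002; 1 2.0002 2.0004];
    Ainv = 1e4 * [ 4.0010 -2.0006  0.0003
                  -2.0006  1.0004 -0.0002
                   0.0003 -0.0002  0.0001];     % as given in the problem
    u = [1; 0; 0];   v = [1; 0; 0];
    B = A - u*v';
    Binv = Ainv + (Ainv*u) * (v'*Ainv) / (1 - v'*Ainv*u);
    norm(B*Binv - eye(3))     % small if the given Ainv is accurate enough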
25. Suppose you have solved a linear system with A as the system matrix. Then show how you
can solve the augmented system
        [ A    a ] [ x     ]   [ b     ]
        [ c^T  α ] [ x_n+1 ] = [ b_n+1 ],
where A is nonsingular and n × n, a, b, and c are vectors, and α is a scalar, using the solution
you have already obtained. Apply your result to the solution of
        [ 1  2  3  1 ]       [  6 ]
        [ 4  5  6  1 ] y  =  [ 15 ]
        [ 1  1  1  1 ]       [  3 ]
        [ 0  0  1  2 ]       [  1 ].
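One common way to answer the first part is block elimination, which reuses the factorization of A and needs only one extra solve. The MATLAB sketch below uses illustrative data rather than the 4 × 4 example of the problem (whose leading block should be checked for nonsingularity by the reader), and the variable names are hypothetical.

    % Solve the bordered system [A a; c' alpha] [x; x_{n+1}] = [b; b_{n+1}]
    % by block elimination, reusing one factorization of A.
    A     = [4 1 0; 1 4 1; 0 1 4];      % illustrative data only
    a     = [1; 1; 1];
    c     = [0; 0; 1];
    alpha = 2;
    b     = [6; 6; 5];
    bnp1  = 3;
    [L, U, P] = lu(A);           % stands in for the factorization already computed
    w = U \ (L \ (P * b));       % w = A\b, the solution already obtained
    z = U \ (L \ (P * a));       % one extra solve with the same factors
    xnp1 = (bnp1 - c'*w) / (alpha - c'*z);
    x    = w - xnp1 * z;
    norm([A a; c' alpha] * [x; xnp1] - [b; bnp1])   % residual check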
PROBLEMS ON SECTIONS 6.6 and 6.7
26. Consider the symmetric system Ax = b, where
        A = [  0.4445   0.4444  -0.2222 ]         [  0.6667 ]
            [  0.4444   0.4445  -0.2222 ]     b = [  0.6667 ].
            [ -0.2222  -0.2222   0.1112 ]         [ -0.3332 ]
The exact solution of the system is x = (1, 1, 1)^T.
(a) Make a small perturbation Δb in b, keeping A unchanged. Solve the system Ax' = b + Δb.
Compare x' with x. Compute Cond(A) and verify the appropriate inequality in the text.
(b) Make a small perturbation ΔA in A such that ||ΔA|| < 1/||A^{-1}||. Solve the system
(A + ΔA)x' = b. Compare x' with x and verify the appropriate inequality in the text.
(Hint: ||A^{-1}||_2 = 10^4.)
27. Prove the inequality
        ||Δx|| / ||x + Δx||  ≤  Cond(A) ||ΔA|| / ||A||,
where Ax = b and (A + ΔA)(x + Δx) = b.
Verify the inequality for the system
        [ 1    1/2  1/3 ] [x1]   [ 1 ]
        [ 1/2  1/3  1/4 ] [x2] = [ 1 ],
        [ 1/3  1/4  1/5 ] [x3]   [ 1 ]
using
        ΔA = [ 0  0  0.00003 ]
             [ 0  0  0       ]
             [ 0  0  0       ].
28. (a) How are Cond(A) and Cond(A^{-1}) related?
(b) Show that
    i. Cond(A) ≥ 1,
    ii. Cond(A^T A) = (Cond(A))^2.
29. (a) Let O be an orthogonal matrix. Then show that Cond(O) with respect to the 2-norm is
one.
(b) Show that Cond(A) with respect to the 2-norm is one if and only if A is a scalar
multiple of an orthogonal matrix.
30. Let U = (u_ij) be a nonsingular upper triangular matrix. Then show that, with respect to the
infinity norm,
        Cond(U) ≥ max(u_ii) / min(u_ii).
Hence construct a simple example of an ill-conditioned non-diagonal symmetric positive
definite matrix.
31. Let A = LDL^T be a symmetric positive definite matrix. Let D = diag(d_ii). Then show that,
with respect to the 2-norm,
        Cond(A) ≥ max(d_ii) / min(d_ii).
Hence construct an example of an ill-conditioned non-diagonal symmetric positive definite
matrix.
32. (a) Show that for any matrix A, Cond(A) with respect to the 2-norm is given by
        Cond(A) = σ_max / σ_min,
where σ_max and σ_min are, respectively, the largest and the smallest singular values of A.
(b) Use the above expression for Cond(A) to construct an example of an ill-conditioned
matrix as follows: choose two non-diagonal orthogonal matrices U and V (in particular,
they can be chosen as Householder matrices) and a diagonal matrix Σ with one or several
small diagonal entries. Then the matrix
        A = U Σ V^T
has the same condition number as Σ and is ill-conditioned.
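A MATLAB sketch of the construction in part (b), with two Householder matrices built from arbitrarily chosen vectors and one tiny singular value:

    % Build an ill-conditioned matrix A = U*Sigma*V' from two Householder
    % (orthogonal, non-diagonal) matrices and a diagonal matrix with one
    % very small entry.
    n = 4;
    u = ones(n, 1);   v = (1:n)';
    U = eye(n) - 2*(u*u')/(u'*u);        % Householder matrix from u
    V = eye(n) - 2*(v*v')/(v'*v);        % Householder matrix from v
    Sigma = diag([1, 1, 1, 1e-8]);       % one very small singular value
    A = U * Sigma * V';
    cond(A)                              % approximately 1e8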
33. (a) Construct your own example to show that a small residual does not necessarily guarantee
that the solution is accurate.
(b) Give a proof of Theorem 6.7.1 (Residual Theorem).
(c) Using the Residual Theorem prove that if an algorithm produces a small residual for
every well-conditioned matrix, it is weakly stable.
34. (a) For what values of a is the matrix
        A = [ 1  a ]
            [ a  1 ]
ill-conditioned?
(b) Let a = 0.999. Solve the system Ax = (1, 1)^T using Gaussian elimination without pivot-
ing.
(c) What is the condition number of A?
(b) Carry out five iterations of both methods with the same initial approximation
        x^(1) = (0, 0, 0, 0)^T
and compare the rates of convergence.
37. Construct an example to show that the convergence of the Jacobi method does not necessarily
imply that the Gauss-Seidel method will converge.
38. Let the n × n matrix A be partitioned into the form
        A = [ A_11  A_12  ...  A_1N ]
            [ A_21  A_22  ...  A_2N ]
            [  ...   ...   ...  ... ]
            [ A_N1  A_N2  ...  A_NN ]
where each diagonal block A_ii is square and nonsingular. Consider the linear system
        Ax = b
with A as above and x and b partitioned commensurately.
(a) Write down the Block Jacobi, Block Gauss-Seidel, and Block SOR iterations for the
linear system Ax = b. (Hint: Write A = L + D + U, where D = diag(A_11, ..., A_NN),
and L and U are strictly block lower and upper triangular matrices.)
(b) If A is symmetric positive definite, then show that U = L^T and D is positive definite. In
this case, from the corresponding results in the scalar case, prove that, with an arbitrary
choice of the initial approximation, Block Gauss-Seidel always converges and Block SOR
converges if and only if 0 < ω < 2.
39. Consider the block system arising in the solution of the discrete Poisson equation u_xx +
u_yy = f:
        A = [  T  -I                ]
            [ -I   T  -I            ]
            [      ..   ..   ..     ]
            [           -I   T  -I  ]
            [                -I   T ]
where
        T = [  4  -1                ]
            [ -1   4  -1            ]
            [      ..   ..   ..     ]
            [           -1   4  -1  ]
            [                -1   4 ].
Show that the Block Jacobi iteration in this case is
        T x_i^(k+1) = x_{i-1}^(k) + x_{i+1}^(k) + b_i,   i = 1, ..., N.
Write down the Block Gauss-Seidel and Block SOR iterations for this system.
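A MATLAB sketch of the Block Jacobi iteration just displayed; the block size, the number of sweeps, and the right-hand side are illustrative choices.

    % Block Jacobi for the block tridiagonal Poisson matrix:
    %   T * x_i^(k+1) = x_{i-1}^(k) + x_{i+1}^(k) + b_i,   i = 1, ..., N.
    N = 5;                                          % number of blocks (and block size)
    T = spdiags(ones(N,1)*[-1 4 -1], -1:1, N, N);   % the diagonal block
    b = ones(N*N, 1);                               % illustrative right-hand side
    X = zeros(N, N);                                % column i holds the block x_i
    for k = 1:50                                    % a fixed number of sweeps
        Xold = X;
        for i = 1:N
            r = b((i-1)*N+1 : i*N);
            if i > 1, r = r + Xold(:, i-1); end
            if i < N, r = r + Xold(:, i+1); end
            X(:, i) = T \ r;                        % one solve with the block T
        end
    end
    x = X(:);                                       % the assembled iterate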
40. (a) Prove that √λ_1 ||x||_2 ≤ ||x||_A ≤ √λ_n ||x||_2, where A is a symmetric positive definite matrix
with the eigenvalues 0 < λ_1 ≤ λ_2 ≤ ... ≤ λ_n.
(b) Using the result in (a), prove that, for the conjugate gradient method,
        ||x - x_k||_2 ≤ 2√κ ((√κ - 1)/(√κ + 1))^k ||x - x_0||_2,   where κ = λ_n/λ_1.
41. Show that the Jacobi method converges for a 2 × 2 symmetric positive definite system.
42. For the system of problem #39, compute ρ(B_J), ρ(B_GS), and ω_opt with N = 50, 100, 1000.
Compare the rate of convergence of the SOR iteration using the optimal value ω_opt in
each case with that of Gauss-Seidel, without actually performing the iterations.
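A MATLAB sketch of the computation, assuming the standard model-problem formulas ρ(B_J) = cos(π/(N+1)), ρ(B_GS) = ρ(B_J)^2, and ω_opt = 2/(1 + sqrt(1 - ρ(B_J)^2)), which should match the expressions given in the text:

    % Spectral radii and optimal relaxation factor for the model problem.
    for N = [50 100 1000]
        rhoJ   = cos(pi/(N+1));
        rhoGS  = rhoJ^2;
        wopt   = 2 / (1 + sqrt(1 - rhoJ^2));
        rhoSOR = wopt - 1;               % spectral radius of SOR at w = wopt
        fprintf('N = %4d: rho_J = %.6f, rho_GS = %.6f, w_opt = %.4f, rho_SOR = %.6f\n', ...
                N, rhoJ, rhoGS, wopt, rhoSOR);
    end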
43. Consider the block diagonal system of order 25
        [ A            ]       [ 1 ]
        [    A         ]       [ 1 ]
        [      ...     ] x  =  [...]
        [            A ]       [ 1 ]
where A is the 5 × 5 tridiagonal matrix
        A = [  2  -1              ]
            [ -1   2  -1          ]
            [     -1   2  -1      ]
            [         -1   2  -1  ]
            [             -1   2  ].
Compute ρ(B_J) and ρ(B_GS) and find how they are related. Solve the system using 5 iterations
of Gauss-Seidel and SOR with the optimal value of ω. Compare the rates of convergence.
44. Prove that the choice of
        α = p^T(Ax - b) / (p^T A p)
minimizes the quadratic function
        φ(α) = (1/2)(x - αp)^T A (x - αp) - b^T (x - αp).
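For reference, the calculation behind this problem is a single differentiation: φ'(α) = -p^T A(x - αp) + p^T b = α p^T A p - p^T(Ax - b), so φ'(α) = 0 gives α = p^T(Ax - b)/(p^T A p); since φ''(α) = p^T A p > 0 when A is symmetric positive definite, this stationary point is indeed the minimizer.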
Develop the Jacobi, the Gauss-Seidel, and the SOR methods based on the multisplitting of
A. This type of multisplitting has been considered by O'Leary and White (1985),
and Neumann and Plemmons (1987).
49. Apply the Jacobi, the Gauss-Seidel, and the SOR (with optimal relaxation factor) methods to
the system in the example following Theorem 6.10.6 (Example 6.10.5) and verify the statement
made there about the number of iterations required by the different methods.
50. Give a proof of Theorem 6.10.8.
51. Read the proof of Theorem 6.10.9 in Ortega (IPVL, p. 277), and then reproduce the proof
in your own words.
MATLAB AND MATCOM PROGRAMS AND PROBLEMS ON CHAPTER 6
You will need the programs lugsel, inlu, inparpiv, incompiv, givqr, compiv,
invuptr, iterref, jacobi, gaused, sucov, and nichol from MATCOM.
Use randomly generated test matrices, as well as test matrices created from L and U factors
with one or more small diagonal entries.
(Note: forelm and backsub are also in MATCOM or in the Appendix.)
Test Matrices for Problems 2 Through 8
For problems #2 through 8, use the following matrices as test matrices. When the
problem is a linear system problem Ax = b, create a vector b such that the solution
vector x is a vector with all components equal to 1.
1. Hilbert matrix of order 10
2. Pie matrix of order 10
3. Hankel matrix of order 10
4. Randomly generated matrix of order 10
5. A = [ 0.00001  1        1       ]
       [ 0        0.00001  1       ]
       [ 0        0        0.00001 ]
6. Vandermonde matrix of order 10.
2. (a) Using lugsel from MATCOM, backsub, and forelm, write the MATLAB program
        [x] = linsyswp(A, b)
to solve Ax = b using Gaussian elimination without pivoting.
Compute the growth factor, elapsed time, and flop-count for each system.
(b) Run the program inlu from MATCOM and multiply the result by the vector b to obtain
the solution vector x = A^{-1}b. Compute the flop-count.
(c) Compare the computed solutions and flop-counts of (a) and (b).
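If the MATCOM routines are not at hand, the following self-contained MATLAB sketch performs the computation asked for in (a), Gaussian elimination without pivoting together with the growth factor; the function name matches the problem statement, but the body is only an illustrative stand-in for the lugsel/forelm/backsub combination.

    function [x, growth] = linsyswp(A, b)
    % Gaussian elimination without pivoting; growth is
    %   max_k max_ij |a_ij^(k)| / max_ij |a_ij|.
    n = size(A, 1);
    maxA  = max(abs(A(:)));
    maxAk = maxA;
    for k = 1:n-1
        if A(k,k) == 0
            error('Zero pivot encountered; pivoting would be required.');
        end
        rows = k+1:n;
        A(rows, k) = A(rows, k) / A(k, k);                       % multipliers
        A(rows, rows) = A(rows, rows) - A(rows, k) * A(k, rows); % elimination step
        maxAk = max(maxAk, max(max(abs(A(rows, rows)))));
        b(rows) = b(rows) - A(rows, k) * b(k);                   % update right-hand side
    end
    growth = maxAk / maxA;
    x = zeros(n, 1);                                             % back substitution
    for i = n:-1:1
        x(i) = (b(i) - A(i, i+1:n) * x(i+1:n)) / A(i, i);
    end
    end

For example, [x, g] = linsyswp(hilb(10), hilb(10)*ones(10,1)) solves a Hilbert system whose exact solution is the vector of all ones.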
3. (a) Using parpiv and elmul from Chapter 5, and backsub, write a MATLAB program
        [x] = linsyspp(A, b)
to solve Ax = b using Gaussian elimination with partial pivoting.
Compute the growth factor, elapsed time, and flop-count for each system.
(b) Run the program inparpiv from MATCOM and multiply the result by b to compute
the solution vector x = A^{-1}b.
Compute the flop-count for each system.
(c) Compare the computed solutions, flop-counts, and elapsed times of (a) and (b).
4. (a) Using compiv from MATCOM, elmul from Chapter 5, and backsub, write the MATLAB
program
        [x] = linsyscp(A, b)
to solve Ax = b using Gaussian elimination with complete pivoting.
Compute the flop-count, elapsed time, and the growth factor for each system.
(b) Run the program incompiv from MATCOM and multiply the result by b to compute
the solution vector x = A^{-1}b.
Compute the flop-count for each system.
(c) Compare the computed solutions, flop-counts, and elapsed times of (a) and (b).
5. (a) Implement the algorithm in Section 6.4.4 to solve Ax = b without explicit factorization,
using partial pivoting:
        [x] = linsyswf(A, b).
(b) Compute A^{-1} using this program.
6. (a) Using housqr from Chapter 5 (or the MATLAB function qr) and backsub, write the
MATLAB program
        [x] = linsysqrh(A, b)
to solve Ax = b using QR factorization with Householder matrices. Compute the flop-count
for each system.
(b) Repeat (a) with givqr in place of housqr; that is, write a MATLAB program called
linsysqrg to solve Ax = b using the Givens method for QR factorization.
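A minimal sketch of the QR-based solve in (a), with MATLAB's built-in qr standing in for housqr:

    % Solve Ax = b from the QR factorization A = QR.
    A = hilb(10);
    x_true = ones(10, 1);
    b = A * x_true;
    [Q, R] = qr(A);
    x = R \ (Q' * b);        % one triangular solve after forming Q'*b
    norm(x - x_true)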
7. (The purpose of this exercise is to make a comparative study with respect to
accuracy, elapsed time, flop-count, and growth factor of the different methods for
solving Ax = b.)
Tabulate the results of problems 2 through 6 in the following form.
Make one table for each matrix. x̂ stands for the computed solution.
TABLE 6.1
(Comparison of Different Methods for the Linear System Problem)

Method       Computed solution x̂   Rel. error ||x - x̂||/||x||   Residual ||b - Ax̂||   Growth factor   Elapsed time
linsyswp
linsyspp
linsyscp
linsyswf
linsysqrh
linsysqrg
A^{-1}b
8. (a) Write a MATLAB program to find the inverse of A using housqr (or the MATLAB function
qr) and invuptr (from MATCOM):
        [A] = invqrh(A).
Compute the flop-count for each matrix.
(b) Repeat (a) using givqr and invuptr:
        [A] = invqrg(A).
Compute the flop-count for each matrix.
(c) Run inlu, inparpiv, and incompiv from MATCOM with each of the data matrices. Make
a table for each matrix A to compare the different methods for finding the inverse with
respect to accuracy and flop-count. Denote the computed inverse by Â. Get A^{-1} by
using the MATLAB command inv(A).

TABLE 6.2
(Comparison of Different Methods for Computing the Inverse)

Method      Relative error ||A^{-1} - Â||/||A^{-1}||   Flop-count
inlu
inparpiv
incompiv
invqrh
invqrg
9. (a) Modify the program elmlu to find the Cholesky factorization of a symmetric positive
definite matrix A using Gaussian elimination without pivoting:
        [H] = cholgauss(A).
Create a 15 × 15 lower triangular matrix L with positive diagonal entries, taking some of
the diagonal entries small enough to be very close to zero; multiply it by L^T and take
A = LL^T as your test matrix. Compute H.
Compute the flop-count.
(b) Run the MATLAB program chol on the same matrix as in (a), and denote the transpose of
the result by Ĥ. Compute the flop-count.
(c) Compare the results of (a) and (b). (Note that chol(A) gives an upper triangular matrix
H such that A = H^T H.)
10. Run the program linsyswp with the diagonally dominant, symmetric tridiagonal, and block
tridiagonal matrices encountered in Section 6.3.5, choosing the right-hand side vector b so
that the solution vector x is known a priori. Compare the exact solution x with the computed
solution x̂.
(The purpose of this exercise is to verify that, to solve a symmetric positive
definite system, no pivoting is needed to ensure stability in Gaussian elimination.)
11. (a) Write a MATLAB program to implement Algorithm 6.7.2, which finds an upper bound for
the 2-norm of the inverse of an upper triangular matrix:
        [CEBOUND] = norminvtr(U).
Test your result by randomly creating a 10 × 10 upper triangular matrix with several
small diagonal entries, and then compare your result with that obtained by running the
MATLAB command
        norm(inv(U)).
(b) Now compute the condition number of U as follows:
        norm(U) * norminvtr(U).
Compare your result with that obtained by running the MATLAB command cond(U).
Verify that
        Cond(U) ≥ max(u_ii) / min(u_ii).
(Use the same test matrix U as in part (a).)
12. (a) (The purpose of this exercise is to compare different approaches for estimat-
ing the condition number of a matrix.) Compute and/or estimate the condition
number of each of the following matrices A of order 10: Hilbert, Pie, randomly
generated, Vandermonde, and Hankel, using the following approaches:
    i. Find the QR factorization of A with column pivoting: Q^T A P = R.
Estimate the 2-norm of the inverse of R by running norminvtr on R.
Now compute norm(R) * norminvtr(R). Compute the flop-count.
    ii. Compute norm(A) * norm(inv(A)). Compute the flop-count.
    iii. Compute cond(A). Compute the flop-count.
(b) Now compare the results and flop-counts.
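A sketch of the comparison, with norm(inv(R)) standing in for the reader's norminvtr program from problem 11:

    % Three estimates of the condition number of a test matrix.
    A = hilb(10);                         % one of the prescribed test matrices
    [Q, R, P] = qr(A);                    % QR with column pivoting: A*P = Q*R
    est_i   = norm(R) * norm(inv(R));     % approach (i), based on R
    est_ii  = norm(A) * norm(inv(A));     % approach (ii), the definition
    est_iii = cond(A);                    % approach (iii), MATLAB's cond
    [est_i, est_ii, est_iii]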
13. (a) Run the iterative refinement program iterref from MATCOM on each of the 15 × 15
systems Hilbert, Pie, Vandermonde, randomly generated, and Hankel, using the
solution obtained from the program linsyspp (problem #3) as the initial approximation
x^(0).
(b) Estimate the condition number of each of the matrices above using the iterative refine-
ment procedure.
(c) Compare your results on condition number estimation with those obtained in problem
#12.
14. Run the programs jacobi, gaused, and sucov from MATCOM on the 6 × 6 matrix A of
Example 6.10.5 with the same starting vector x^(0) = (0, 0, ..., 0)^T. Find how many iterations
each method takes to converge. Verify the statement of the example that SOR with ω_opt
takes five iterations to converge, compared with twelve iterations for Jacobi.
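If the MATCOM programs are not available, the following generic stationary-iteration sketch can stand in for jacobi, gaused, and sucov; pass w = [] for Jacobi, w = 1 for Gauss-Seidel, and 0 < w < 2 for SOR. For example, [x, it] = stationary(A, b, 1, zeros(size(b)), 1e-6, 500) runs Gauss-Seidel.

    function [x, iter] = stationary(A, b, w, x, tol, maxit)
    % Stationary iteration x_{k+1} = M \ (N*x_k + b) for the splitting A = M - N.
    D = diag(diag(A));   L = tril(A, -1);   U = triu(A, 1);
    if isempty(w)
        M = D;           N = -(L + U);          % Jacobi splitting
    else
        M = D/w + L;     N = (1/w - 1)*D - U;   % SOR splitting (w = 1: Gauss-Seidel)
    end
    for iter = 1:maxit
        xnew = M \ (N*x + b);
        if norm(xnew - x, inf) < tol
            x = xnew;  return;
        end
        x = xnew;
    end
    end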
15. Run the programs jacobi, gaused, and sucov from MATCOM on Example 6.10.3 and verify
the statement of the example.
16. Run the program nichol from MATCOM, implementing the "No-Fill Incomplete Cholesky
Factorization", on the tridiagonal symmetric positive definite matrix T of order 20 arising
in the discretization of Poisson's equation. Compare your result with that obtained by running
chol(T) on T.
17. Write a MATLAB program called arnoldi based on the Arnoldi method (Algorithm 6.10.8),
using the modified Gram-Schmidt algorithm (modified Gram-Schmidt has been implemented
in the MATCOM program mdgrsch; see Chapter 7).
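A sketch of such an arnoldi program, using modified Gram-Schmidt as the problem requires; after m steps it returns V with orthonormal columns and an (m+1)-by-m upper Hessenberg H with A*V(:,1:m) = V*H.

    function [V, H] = arnoldi(A, v, m)
    % Arnoldi method with modified Gram-Schmidt orthogonalization.
    n = length(v);
    V = zeros(n, m+1);   H = zeros(m+1, m);
    V(:,1) = v / norm(v);
    for j = 1:m
        w = A * V(:,j);
        for i = 1:j                          % modified Gram-Schmidt step
            H(i,j) = V(:,i)' * w;
            w = w - H(i,j) * V(:,i);
        end
        H(j+1,j) = norm(w);
        if H(j+1,j) == 0                     % breakdown: invariant subspace found
            V = V(:,1:j);  H = H(1:j,1:j);  return;
        end
        V(:,j+1) = w / H(j+1,j);
    end
    end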
18. Using arnoldi and a suitable least-squares routine from Chapter 7, write a MATLAB program
called gmres to implement the GMRES algorithm (Algorithm 6.10.9).