Numerical Linear Algebra
W. Kahan
Table of Contents
The Solution of Linear Equations
0. Introduction.
1. The Time Needed to Solve Linear Equations.
2. The Time Needed to Solve a Linear System with a
Band Matrix.
3. Iterative Methods for Solving Linear Systems.
4. Errors in the Solution of Linear Systems.
5. Pivoting and Equilibration.
The Solution of the Symmetric Eigenproblem *
6. The General Eigenproblem and the QR Method.
7. Iterative Methods for Symmetric Eigenproblems.
8. The Reduction to Tri-diagonal form.
9. Eigenvalues of a Tri-diagonal Matrix.
10. Eigenvectors of a Tri-diagonal Matrix.
11. Errors in the Solution of an Eigenproblem.
0. Introduction. The primordial problems of
linear algebra are the solution of a system of linear
equations Ax = b and the solution of an eigenproblem.
What is the current state of the art of
solving numerical problems in linear algebra with the
aid of electronic computers? That question is the
theme of part of this paper. The rest of the paper
touches upon two or three of the collateral
mathematical problems which have captured my attention
for the past several years. These problems spring
from the widespread desire to give the computer all
its instructions in advance. When the computer is
engrossed in its computation at the rate of perhaps a
million arithmetic operations per second, human
supervision is at best superficial. One dare not
interrupt the machine too frequently with requests
"WHAT ARE YOU DOING NOW?" and with afterthoughts and
countermands, lest the machine be dragged back to the
pace at which a human can plod through a morass of
numerical data. Instead, it is more profitable to
launch the computer on its phrenetic way while we
calmly embrace mathematical (not necessarily
computational) techniques like error-analysis to
predict and appraise the machine's work. Besides,
the mathematical problems of prediction and appraisal
are interesting in their own right.
1. The Time Needed to Solve Linear Equations.
On our computer (an IBM 7094-II at the University of
Toronto) the solution of 100 linear equations in 100
unknowns can be calculated in about 7 seconds;
during this time the computer executes about 5000
divisions, 330000 multiplications and additions, and
a comparable amount of extra arithmetic with small
integers which ensures that the aforementioned
operations are performed in their correct order.
This calculation costs about a dollar. To calculate
the inverse of the coefficient matrix costs about
three times as much. If the coefficients are complex
instead of real, the cost is roughly quadrupled. If
the same problem were taken to any other appropriate
electronic computer on this continent, the time
taken could differ by a factor between 1/10 and 1000
(i.e. 1 second to 2 hours for 100 equations)
depending upon the speed of the particular machine
used. These quotations do not include the time
required to produce the equations' coefficients in
storage, nor the time required (a few seconds) to
print the answers on paper.
about enough for 10 or 20 rows of the matrix. The
rest of the matrix, 980 rows, has to be kept in bulk
storage units, like magnetic tapes or disks, to which
access takes at least as long as several multiplica-
tions. Now, most of the time spent in solving a
linear system is spent thus:
set aside, one has just (N-1) linear equations in
(N-1) unknowns left. In all, about
    (1/3) N^3 (1 + 3N^(-1) - N^(-2))
multiplications or divisions are required, together
with a comparable number of additions and subtractions.
In the 140 years since this method appeared in print,
many other methods, some of them very different, have
been proposed. All methods satisfy the following
new theorem of Klyuyev and Kokovkin-Shcherbak (1965):
Any algorithm which uses only rational
arithmetic operations to solve a general
system of N linear equations requires at
least as many additions and subtractions,
and at least as many multiplications and
divisions as does Gaussian elimination.
One consequence of this theorem is obtained
by setting N=10000; to solve 10000 linear equations
would take more than two months for the arithmetic
alone on our machine. The time might come down to a
day or so when machines 100 times faster than ours
are produced, but such machines are just now being
developed, and are most unlikely to be in widespread
use within the next five years. The main impediment
seems to be storage access time. (For more details,
see IFIP (1965).)
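The arithmetic behind that estimate is easy to reproduce. The sketch below is an illustrative modern calculation, not part of the original text; it simply scales the multiplication count quoted above by the 7-second timing reported for N = 100.

```python
# Rough time estimate for Gaussian elimination, scaled from the
# N = 100 timing quoted in the text (about 7 seconds on the 7094).
def mult_count(n):
    # multiplications or divisions needed to solve n equations
    return n**3 / 3 * (1 + 3.0 / n - 1.0 / n**2)

seconds_for_100 = 7.0
rate = mult_count(100) / seconds_for_100      # operations per second

for n in (100, 1000, 10000):
    t = mult_count(n) / rate
    print(f"N = {n:6d}: about {t / 86400:8.2f} days of arithmetic")
# N = 10000 comes out near 70 to 80 days, i.e. "more than two months".
```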
In the meantime, there are several good
reasons to want to solve systems of as many as
10000 equations. For example, the study of many
physical processes (radiation, diffusion,
elasticity,...) revolves about the solution of
partial differential equations. A powerful technique
for solving these differential equations is to
approximate them by difference equations over a
lattice erected to approximate the continuum. The
finer the lattice (i.e. the more points there are in
the lattice), the better the approximation. In a
20x20x20 cubic lattice there are 8000 points. To
each point corresponds an unknown and a difference
equation. Fortunately, these equations have special
properties which free us from the limitation given
by Klyuyev and Kokovkin-Shcherbak. (For details
about partial difference equations, see Smith (1965)
or Forsythe and Wasow (1960).)
2. The Time Needed to Solve a Linear System with a
Band Matrix. The matrices arising from these difference
equations are band matrices:
    a_ij = 0 if |i - j| > M ,
and the quotient M / N^(1 - 1/d) , d being the number of
dimensions of the lattice,
is frequently between 1 and 3 . With care, the
matrix corresponding to k coupled boundary value
problems over the same lattice can often be put in a
band form for which the quotient above lies between
k and 3k .
The advantage of a band structure derives from
the fact that it is preserved by the row-operations
involved in Gaussian elimination. This is obvious
when we select, for I = 1, 2, ..., N in turn, the
I-th equation to eliminate the I-th unknown from all
subsequent equations. It is true also when any other
row-selection rule is used, provided the width of
that part of the band above the main diagonal is
allowed to increase by M . Consequently, far less
time and space are needed to solve band-systems than
to solve general systems. The following table
summarizes the dependence of time and space
requirements upon the parameters M and N . For
the sake of simplicity, constants of proportionality
have been omitted, and terms of the order of 1/M
and 1/N or smaller are ignored.
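A minimal sketch (mine, in modern notation, not from the original) of elimination specialized to such a band matrix shows where the saving comes from: without interchanges the band is preserved, and the inner loops run over at most M rows and M columns, so the time grows roughly like N·M² and the storage like N·M.

```python
import numpy as np

def solve_banded_gauss(A, b, M):
    """Solve A x = b where A is N x N with a_ij = 0 whenever |i-j| > M.
    No pivoting is done, so this sketch assumes the natural pivots are safe
    (e.g. A is diagonally dominant, as many difference matrices are)."""
    A = A.astype(float).copy()
    b = b.astype(float).copy()
    N = len(b)
    # Forward elimination: only rows and columns within the band are touched.
    for k in range(N - 1):
        for i in range(k + 1, min(k + M + 1, N)):
            m = A[i, k] / A[k, k]
            for j in range(k + 1, min(k + M + 1, N)):
                A[i, j] -= m * A[k, j]
            A[i, k] = 0.0
            b[i] -= m * b[k]
    # Back substitution, again only within the band.
    x = np.zeros(N)
    for i in range(N - 1, -1, -1):
        s = sum(A[i, j] * x[j] for j in range(i + 1, min(i + M + 1, N)))
        x[i] = (b[i] - s) / A[i, i]
    return x
```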
Direct methods apply to the equation Ax = b
a finite sequence of transformations at whose
termination the equations have a new form, say
Ux = c , which can be solved by an obvious and
finite calculation. For example, in Gaussian
elimination U is an upper triangular matrix which,
with c , can be shown to satisfy
    {U, c} = L^(-1) P {A, b}
where P is a permutation of the identity and L
is a lower triangular matrix with diagonal elements
all equal to 1 ; and the obvious calculation that
solves Ux = c is back substitution. In the
absence of rounding errors, the computed solution is
exact. (For more details see Faddeev and Faddeeva
(1964) ch. II, Householder (1964) ch. 5, or Fox
(1964) ch. 3-5.)
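In modern library terms the factored form can be exhibited directly. The sketch below is illustrative only; scipy's convention writes A = PLU, so its permutation appears transposed relative to the formula above, and the final step is the back substitution just mentioned.

```python
import numpy as np
from scipy.linalg import lu, solve_triangular

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
b = rng.standard_normal(5)

# scipy returns A = P @ L @ U with L unit lower triangular,
# so U = inv(L) @ P.T @ A and c = inv(L) @ P.T @ b.
P, L, U = lu(A)
c = solve_triangular(L, P.T @ b, lower=True, unit_diagonal=True)

# Back substitution on the triangular system U x = c.
x = solve_triangular(U, c, lower=False)
print(np.allclose(A @ x, b))   # True, up to roundoff
```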
On the other hand, an iterative method for
solving Ax = b begins with a first approximation
z_0 , to which a sequence of transformations is
applied to produce a sequence of iterates
z_1 , z_2 , z_3 , ... which are supposed to converge
toward the desired solution x . In practice the
sequence is terminated as soon as some member z_k
of the sequence is judged to be close enough to x .
An example of an iterative method is the
Liouville-Neumann series which is produced by what
numerical analysts call "Jacobi's Method":
Suppose Ax = b can be transformed
conveniently into the form
    x = Bx + c
where the matrix B is small in some sense. To be
more precise, we shall assume that ||B|| = β < 1 .
(The symbol ||...|| represents a matrix norm about
which more will be said later.) We begin with a
first approximation z_0 , for which 0 will do if
nothing better is known, and iterate thus:
    z_{k+1} = B z_k + c   for k = 0, 1, 2, ... ,
            = (I + B + B^2 + ... + B^k) c + B^{k+1} z_0
            = x + B^{k+1} (z_0 - x) ,
so that
    ||z_k - x|| <= β^k ||z_0 - x|| .
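In modern notation the iteration is a few lines. The sketch below is illustrative and not from the text; it builds B and c from the usual diagonal splitting, one convenient way to obtain the form x = Bx + c, and converges when ||B|| < 1, for instance when A is strictly diagonally dominant.

```python
import numpy as np

def jacobi(A, b, iterations=50):
    """Iterate z_{k+1} = B z_k + c with B = I - inv(D) A and c = inv(D) b,
    where D is the diagonal of A."""
    D = np.diag(A)
    B = np.eye(len(b)) - A / D[:, None]
    c = b / D
    z = np.zeros_like(b, dtype=float)
    for _ in range(iterations):
        z = B @ z + c
    return z

A = np.array([[4.0, -1.0, 0.0],
              [-1.0, 4.0, -1.0],
              [0.0, -1.0, 4.0]])
b = np.array([2.0, 4.0, 10.0])
print(jacobi(A, b), np.linalg.solve(A, b))   # the two agree closely
```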
generalization and improvement of iterative methods
in a vast diversity of ways. One broad family of such
methods has the form
    z_{k+1} = z_k + γ_k C r_k
where
    r_k = b - A z_k .
be chosen, is that P_k(λ) → 0 for every
eigenvalue λ of CA .
The simplest application of this theorem is
to the circle |1 - w| <= β , on which
    |L_k(w)| = |(1 - w)^k| <= β^k ,
some numbers quite near zero, which may be very small
in cases of slow convergence. But then the term
γ_k C r_k , when its turn comes in the iteration, can be
so large and so much magnify the effects of rounding
errors that the convergence of the iteration is
jeopardized (Young (1956)).
Fortunately, the Tchebycheff polynomials
satisfy a three-term recurrence
    T_{k+1}(L(z)) = 2 L(z) T_k(L(z)) - T_{k-1}(L(z)) ,
which corresponds to an iteration of the form
    z_{k+1} = z_k + γ_k C r_k + δ_k (z_k - z_{k-1}) ,
and an appropriate choice of γ_k , δ_k and C makes
the error shrink far faster than in the simple
iteration, for which
    z_{k+1} - x = B(z_k - x) ,
where β = ||B|| < 1 . In other words, each iteration
reduces the norm of the error by a factor β < 1 .
The iteration
    z_{k+1} = z_k + γ_k C r_k + δ_k (z_k - z_{k-1})
looks formally just like the iteration that was used
above to construct the Tchebycheff polynomials, but
now the constants γ_k and δ_k must be chosen to
minimize ||r_{k+1}|| in that step. This choice of γ_k
and δ_k has the stronger property that no other
choice of γ_0 , δ_0 , γ_1 , δ_1 , ... , γ_k , δ_k could yield
a smaller value for ||r_{k+1}|| . In particular
r_N = 0 , so the iteration converges in a finite number
of steps. An excellent exposition of this powerful
technique is given by Stiefel (1953) and (1958).
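The modern statement of this minimizing iteration is the conjugate-gradient method. The sketch below is a standard textbook formulation for symmetric positive definite A, offered for illustration; it is not Stiefel's notation.

```python
import numpy as np

def conjugate_gradients(A, b, tol=1e-10):
    """Conjugate-gradient iteration for symmetric positive definite A;
    in exact arithmetic it terminates in at most N steps."""
    x = np.zeros_like(b, dtype=float)
    r = b - A @ x          # residual
    p = r.copy()           # search direction
    for _ in range(len(b)):
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)
        x += alpha * p
        r_new = r - alpha * Ap
        if np.linalg.norm(r_new) < tol:
            break
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p
        r = r_new
    return x
```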
here is to adjust some unknown x_J to satisfy
("relax") the I-th equation Σ_j a_Ij x_j = b_I ,
even though in doing so some other equation may be
dissatisfied. The next step is to relax some other
equation, and so on. Gauss (1823) claimed that this
iteration could be performed successfully
"... while half asleep, or while thinking
about other things".
Since his time the method has been systematized and
generalized and improved by orders of magnitude,
especially where its applications to discretized
boundary value problems are concerned. The best
survey of this development is currently to be found
in Varga's book (1962). Nowadays some of the most
active research into iterative methods is being
conducted upon those variants of relaxation known
as Alternating Direction Methods; see for example
Douglas and Gunn (1965), Gunn (1965), Murray and
Lynn (1965), and Kellog and Spanier (1965).
The result of the past fifteen years of
intense mathematical analysis concentrated upon
iterative methods has been to speed them up by
factors of ten and a hundred. Some idea of the times
involved can be gained from surveys by Engeli et al
(1959), Martin and Tee (1961), and Birkhoff, Varga
and Young (1962). For example, the difference
analogue of Dirichlet's problem (Laplace's equation
with specified boundary values) in a two-dimensional
region with about 3000 lattice points can be solved
to within about 6 significant decimals in about 300
iterations of successive over-relaxation, requiring
about 30 seconds on our machine. This is one third
as long as would be needed to apply Gaussian
elimination to the corresponding band matrix. A
three-dimensional problem with 10000 equations and
unknowns could be solved on our machine in less than
5 minutes by iteration; here Gaussian elimination
takes hundreds of times as long, so the value of
iteration is well established. But the time required
for iterative methods generally cannot easily be
predicted in advance except in special cases (which
are fortunately not uncommon). Furthermore, the
choice of one out of several possible iterative
methods is frequently a matter of trial and error.
Even if the iterative method is chosen on rational
grounds, there will be parameters (like the constants
γ_k and δ_k above) which must be chosen carefully
for maximum efficiency; but to choose their values
will frequently require information that is harder
to obtain than the solution being sought in the
first place. (A welcome exception is the method of
conjugate gradients.) Evidently there is plenty of
room for more research, and especially for the
consolidation of available knowledge.
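For concreteness, here is a minimal sketch (modern notation, illustrative only) of the successive over-relaxation just mentioned, applied to the 5-point difference analogue of Laplace's equation. The relaxation factor ω = 1.8 is an arbitrary illustrative choice; choosing it well is exactly the kind of parameter problem discussed in this section.

```python
import numpy as np

def sor_laplace(boundary, omega=1.8, sweeps=300):
    """Successive over-relaxation for the 5-point difference analogue of
    Laplace's equation on a rectangular lattice.  `boundary` holds the
    prescribed boundary values; interior points are relaxed in place."""
    u = boundary.astype(float).copy()
    n, m = u.shape
    for _ in range(sweeps):
        for i in range(1, n - 1):
            for j in range(1, m - 1):
                gauss_seidel = 0.25 * (u[i-1, j] + u[i+1, j] +
                                       u[i, j-1] + u[i, j+1])
                u[i, j] += omega * (gauss_seidel - u[i, j])
    return u

# Example: a 30 x 30 lattice with one hot edge.
g = np.zeros((30, 30))
g[0, :] = 1.0
u = sor_laplace(g)
```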
This example and other problems of error
analysis are easier to discuss using the language of
matrix norms, which I digress to introduce here.
Until further notice, the matrix norm used below
will be assumed to be one of the norms ||A||
Finally, the notion of a dual linear
functional should be mentioned. If the row-vector
y^T is regarded as a linear operator from a normed
vector space to the space of complex numbers normed
as usual, then y^T is dual to the vector x when
    y^T x = ||y^T|| ||x|| = 1 .
(For more details, see Householder (1964) or any text
on normed linear spaces; e.g. Day (1962), Kantorovich
and Akilov (1964).)
Now it is possible to discuss the meaning of
"ill-condition". To each matrix A , regarded as a
linear operator from one normed space to another, can
be assigned its condition number K(A) associated
with the norms and defined thus:
    K(A) = ||A|| ||A^(-1)|| .
    A^(-1) = ( -86480000.     14410000. )
             ( 129690000.    -21610000. )      and   K(A) ≈ 2 × 10⁸ .
If we apply the inequality, associating r with Δb
and z - x with Δx , we verify that
    ||z - x|| / ||x||  <=  K(A) ||r|| / ||b|| .
How near is A to a singular matrix? The nearest
singular matrix A + ΔA lies at distance
||ΔA|| = ||A|| / K(A) .
Proof. Of course, if (A + ΔA) is singular,
then there is some x ≠ 0 for which (A + ΔA) x = 0 .
Therefore
    ||ΔA|| >= ||ΔA x|| / ||x||
            = ||A x|| / ||x||
            = ||A x|| / ||A^(-1) A x||
           >= 1 / ||A^(-1)|| = ||A|| / K(A) .
To find a ΔA which achieves the inequality we
consider that vector ℓ for which
    ||A^(-1) ℓ|| = ||A^(-1)|| ||ℓ|| ≠ 0 .
Then let w^T be dual to A^(-1) ℓ ,
i.e.    w^T A^(-1) ℓ = ||w^T|| ||A^(-1) ℓ|| = 1 ,
and set ΔA = - ℓ w^T .
We have (A + ΔA) A^(-1) ℓ = 0 , so A + ΔA is singular.
And
    ||ΔA|| = max ||ℓ w^T x|| / ||x||   over x ≠ 0
           = ||ℓ|| max |w^T x| / ||x||
           = ||ℓ|| ||w^T|| = ||ℓ|| / ||A^(-1) ℓ|| = 1 / ||A^(-1)||
           = ||A|| / K(A) .
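In the 2-norm this theorem is easy to check numerically. The sketch below (a modern illustration, not from the text) builds the rank-one ΔA from the singular value decomposition, for which 1/||A^(-1)|| is the smallest singular value.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))

U, s, Vt = np.linalg.svd(A)
sigma_min = s[-1]

# No dA with ||dA|| smaller than ||A||/K(A) = 1/||inv(A)|| can make
# A + dA singular, and this rank-one dA achieves that distance.
dA = -sigma_min * np.outer(U[:, -1], Vt[-1, :])

print(np.linalg.norm(dA, 2), np.linalg.norm(A, 2) / np.linalg.cond(A, 2))
print(np.linalg.det(A + dA))   # essentially zero: A + dA is singular
```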
Gaussian elimination carried out in floating-point
arithmetic produces a computed solution z which
satisfies exactly a perturbed system
    (A + ΔA) z = b
where, although ΔA is not independent of b and
z , ΔA satisfies an inequality of the form
||ΔA|| <= ε ||A|| , whence
    ||z - x|| / ||x||  <=  K(A) ε / (1 - K(A) ε) ,
where K(A) is A's condition number,
ε = c g_N N^p B^(-s) and is very small (B and s
being the base and number of digits of the arithmetic),
and we assume that K(A) ε < 1 .
In other words, although the residual r is always
small, the error z - x may be very large if A is
ill-conditioned; but this error will be no worse than
if A were in error by about c g_N N^p units in its
last place to begin with.
The constant g_N has an interesting history.
It is connected with the rate of growth of the
"pivots" in the Gaussian elimination scheme. The
pivot is the coefficient a_II by which the I-th
equation is divided during the elimination of x_I .
This term "pivot" will be easier to explain after
the following example has been considered:
Example 2. A is a 3×3 matrix whose leading
coefficient is 10⁻¹⁰ while its other elements are of
order 1 ; its exact inverse is
    A^(-1) = ½ (  0    -2               2            )
               ( -2     0.9999999998    1.0000000002 )
               (  2     1.0000000002    0.9999999998 ) .
But if we have a calculator whose capacity is limited
to 8 decimal digits then the best we could do, after
eliminating x_1 with the pivot 10⁻¹⁰ , would be to
approximate the reduced matrix by one whose elements
are about
    ( -5 × 10⁹      5 × 10⁹ )
    (  5 × 10⁹     -5 × 10⁹ ) ;
and this is the exact reduced matrix of a perturbed
matrix A + ΔA some of whose elements differ
enormously, in a relative sense, from those of A .
Had a larger element been used as the first pivot
instead, the 8-digit reduced equations would look like
    ( -1.0000000    1.0000000 | b_2' ) ,
which is exact for a perturbed matrix A + ΔA whose
elements differ from A's only in or beyond their
eighth significant figures.
When interchanges are confined to rows alone (or to
columns alone), the pivots' growth satisfies
    g_N <= 2^(N-1) .
This bound is achieved for X = 1 by the matrix of
Example 3:
         (  1    0    0   . . .   0    1 )
         ( -1    1    0   . . .   0    1 )        a_ij = -1 if i > j ,
    A =  ( -1   -1    1   . . .   0    1 )        a_ii =  1 if i < N ,
         (  .    .    .           .    . )        a_ij =  0 if i < j < N ,
         ( -1   -1   -1   . . .   1    1 )        a_iN =  1 if i < N , and
         ( -1   -1   -1   . . .  -1    X )        a_NN =  X ,
                                     (N×N)
if each pivot is chosen on the diagonal as one of
the largest elements in its column, and the columns
are chosen in their natural order 1,2,3, ..., N
during the elimination. But when we repeat the
computation with X = 2 and sufficiently large N ,
an apparent disaster occurs because the value of
X = 2 gets lost off the right-hand side of our
computing register. On a binary machine like our
7094 (using truncated 27-bit arithmetic), X = 2
is replaced by X = 1 if N > 28 . An example like
this was used by Wilkinson (1961, p. 327) as part of
the justification for his recommendation that one
use both row and column interchanges when selecting
pivot a_II to ensure that it is one of the largest
elements in the reduced matrix. This pivot-selection
strategy is called "complete pivoting" to distinguish
it from "partial pivoting" in which either row
exchanges or column exchanges, but not both, are
used.
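The growth just described is easy to observe. The check below is mine, in modern notation: with X = 1 the last column doubles at every stage and the final pivot is 2^(N-1); a comment notes why X = 2 becomes indistinguishable from X = 1 once N exceeds the number of digits carried.

```python
import numpy as np

def example3(N, X=1.0):
    A = -np.tril(np.ones((N, N)), -1) + np.eye(N)
    A[:, -1] = 1.0
    A[-1, -1] = X
    return A

def pivot_growth(A):
    """Gaussian elimination with partial pivoting; return the ratio of the
    largest pivot to the largest element of the original matrix."""
    A = A.copy()
    N = A.shape[0]
    a0 = np.abs(A).max()
    for k in range(N - 1):
        p = k + np.argmax(np.abs(A[k:, k]))     # partial pivoting
        A[[k, p]] = A[[p, k]]
        m = A[k+1:, k] / A[k, k]
        A[k+1:, k:] -= np.outer(m, A[k, k:])
    return np.abs(np.diag(A)).max() / a0

print(pivot_growth(example3(10)))    # 2**9 = 512
# With X = 2 the exact final pivot is 2**(N-1) + 1; in arithmetic carrying
# t binary digits it is indistinguishable from the X = 1 pivot once N-1 > t.
```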
(The bound
    g_N <= N
has been conjectured for complete pivoting when A
is real. The conjectured bound is achieved whenever
A is a Hadamard matrix, and L. Tornheim has shown
that the conjecture is valid when N <= 3 . He has
also shown that when A is complex the larger bound
    g_3 = 16/3^(3/2)
can be achieved.)
Despite the theoretical advantages of complete
pivoting over partial pivoting, the former is used
much less often than the latter, mainly because
interchanging both rows and columns is far more of a
nuisance than interchanging, say, rows alone.
Moreover, it is easy to monitor the size of the
pivots used during a partial pivoting computation,
and stop the calculation if the pivots grow too
large; then another program can be called in to
recompute a more accurate solution with the aid of
complete pivoting. Such is the strategy in use on
our computer at Toronto, and the results of using
this strategy support the conviction that intolerable
pivot-growth is a phenomenon that happens only to
numerical analysts who are looking for that
phenomenon.
    (A + ΔA) z = b + Δb
in which ||ΔA|| and ||Δb|| are bounded in some given
way, and hence so is v = ΔA z - Δb . A precise
answer to the question is
Each column z_j of Z = {z_1 , z_2 , ... , z_N}
can be regarded as an approximation to the
corresponding column x_j of A^(-1) . The residual
R = I - AZ satisfies ρ = ||R|| <= n ||A|| ||Z|| ,
where n/ε ≈ N^(1/2) .
Since ε can be predicted in advance, so can n ;
and it is possible to check whether
    n ||A|| ||Z|| < 1 ,
in which case ρ < 1 and the following argument is
valid:
    ||A^(-1)|| <= ||Z|| + ||A^(-1) - Z|| = ||Z|| + ||A^(-1)(I - AZ)|| = ||Z|| + ||A^(-1) R||
               <= ||Z|| + ||A^(-1)|| ρ ;
    ||A^(-1)|| <= ||Z|| / (1 - ρ) .
Then    ||A^(-1) - Z|| <= ||A^(-1)|| ρ <= ||Z|| ρ / (1 - ρ) .
The last inequality says that, neglecting
modest factors which depend upon N , the relative
error ||A^(-1) - Z|| / ||Z|| is at worst about K(A) times
as large as the relative errors committed during
each arithmetic operation of Z's computation. In
other words, if A is known precisely then one must
carry at least log_10 K(A) more decimal guard
digits during the computation of A^(-1) than one
wants to have correct in the approximation Z , and
one can verify the accuracy of Z at the end of
the computation by computing n ||A|| ||Z|| .
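A sketch of that verification in modern notation (illustrative only; here Z is simply numpy's computed inverse, playing the role of the approximation to be checked):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((50, 50))
Z = np.linalg.inv(A)                 # stands in for the approximate inverse

R = np.eye(50) - A @ Z               # residual
rho = np.linalg.norm(R, np.inf)
if rho < 1:
    bound = np.linalg.norm(Z, np.inf) * rho / (1 - rho)
    print("||inv(A) - Z|| is at most", bound)
```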
If the method by which an approximation Z
to A^(-1) was computed is not known, there is no way
to check the accuracy of Z better than to compute
R and ρ directly, and this calculation is not
very attractive for two reasons. First, the
computation of either residual
    R = I - AZ   or   I - ZA
costs almost as much time as the computation of an
approximate A^(-1) ; both computations cost about N^3
multiplications and additions. Second, if K(A) is
large then ||I - AZ|| and ||I - ZA|| can be very
different, and there is no way to tell in advance
which residual will give the least pessimistic
over-estimate of the error in Z .
    (ρ = ||I - AZ|| = ||A(I - ZA)A^(-1)|| <= K(A) ||I - ZA|| , etc.)
Both residuals can be pessimistic by a factor like
K(A) . Finally, although a better approximation to
A^(-1) than Z is the matrix
    Z_1 = Z + Z(I - AZ) = Z + (I - ZA)Z
the computation of Z_1 is in most cases more costly
and less accurate than a direct computation of an
approximate A^(-1) using Gaussian elimination with
double precision arithmetic. For example, on our
7094 it takes less than twice as long to invert A
to double precision (carrying 16 dec.) than to do
the same job in single precision (8 dec.), and the
double precision computation has almost 8 more
correct digits in its answer. But Z_1 has at most
twice as many correct digits as Z . Therefore, if
Z comes from a single precision Gaussian elimination
program, it will have about 8 - log_10 K(A) correct
digits. Z_1 will have 16 - 2 log_10 K(A) digits at
best. The double precision elimination will produce
about 16 - log_10 K(A) correct digits. Thus does
engineering technique overtake mathematical
ingenuity!
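For what it is worth, the improvement Z_1 = Z + Z(I - AZ) takes only a few lines. The sketch below is illustrative; a deliberately perturbed Z stands in for a single-precision inverse, and the residual of Z_1 is (exactly) the square of the residual of Z.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((30, 30))
I = np.eye(30)

Z = np.linalg.inv(A) * (1 + 1e-6 * rng.standard_normal((30, 30)))  # crude Z
Z1 = Z + Z @ (I - A @ Z)        # one step of the improvement

print(np.linalg.norm(I - A @ Z), np.linalg.norm(I - A @ Z1))
```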
The solution of Ax = b for a single vector
x is not normally performed by first computing
A^(-1) and then x = A^(-1) b , for four reasons. First,
the vector Zb , where Z is an approximation to
A^(-1) , is frequently much less accurate than the
approximation z given directly by Gaussian
elimination. Second, the direct computation of the
vector z by elimination costs about 1/3 as much
time as the computation of the matrix Z . Third,
if one wants only to compute a vector z which
makes r = b - Az negligible compared with the
uncertainties in b and A , then Gaussian
elimination is a satisfactory way to do the job
despite the possible ill-condition of A , whereas
b - A(Zb) = Rb can be appreciably larger than
negligible. Fourth, Gaussian elimination can be
applied when A is a band matrix without requiring
the vast storage that would otherwise be needed for
A^(-1) . The only disadvantage that can be occasioned
by the lack of an estimate Z of A^(-1) is that
there is no other way to get a rigorous error-bound
for z - x . This disadvantage can be partially
overcome by an iterative method known as
re-substitution.
This vector z will be in error by e = x - z , and
    A e = A(x - z) = b - Az = r ,
which can be computed. (It is necessary to compute
r carefully lest it consist of nothing but the
rounding errors left when b and Az nearly cancel.
Double precision accumulation of products is
appropriate here.) Clearly, the error e satisfies
an equation similar to x's except that r replaces
b . Therefore, we can approximate e by f , say,
obtained by repeating part of the previous calcula-
tion. If enough intermediate results have been
saved during the computation of z , one obtains f
by repeating upon r the operations that transformed
b into z . The cost of f in time and storage is
usually negligible. Writing z' = z + f , we find
    r' = b - Az' = r - Af = A(e - f) ,   so
    ||r'|| <= ε K(A) ||r|| / (1 - ε K(A))   if ε K(A) < 1 .
And if ε K(A) << 1 then ||r'|| can be expected to
be much smaller than ||r|| . If z' is renamed z ,
the process can be continued. We have left out
several details here; the point is that the process
of re-substitution generally converges to an
approximation z which is correct to nearly full
single precision, provided the matrix A is farther
from a singular matrix than a few hundred units in
its last place. The problem is to know when to stop.
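A sketch of re-substitution in this spirit (modern notation, illustrative only): the saved factors of A play the part of the stored intermediate results, and an extended-precision residual plays the part of the double-precision accumulation mentioned above.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def refine(A, b, steps=5):
    """Solve Ax = b, then repeatedly solve A f = r for the correction f,
    reusing the factors of A; r is formed in extended precision."""
    factors = lu_factor(A)
    z = lu_solve(factors, b)
    for _ in range(steps):
        r = b.astype(np.longdouble) - A.astype(np.longdouble) @ z
        f = lu_solve(factors, r.astype(np.float64))
        z = z + f
        if np.linalg.norm(f) <= 1e-15 * np.linalg.norm(z):
            break      # "the problem is to know when to stop"
    return z
```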
The word "convergence" is well-defined
mathematically in several contexts. But the
empirical meaning of "convergence" is more subtle.
For example, suppose we consider the sequence
z_1 , z_2 , ... , z_m , ... of successive approximations to
x produced by the re-substitution iteration, and
suppose that z_m = z_{m+1} = z_{m+2} = ... . We should
conclude that the sequence has converged. And if
||z_{m-1} - z_m|| is a good deal smaller than
||z_{m-2} - z_{m-1}|| , we should incline to the belief that
the convergence of the sequence is not accidental;
there is every reason to expect z_m to be the
correct answer x except for roundoff in the last
place. But a surprise is possible if A is
exceptionally ill-conditioned:
Example 4. Here is an example of a 2×2
system with
    A = ( .8647   .5766 )    and    b = ( .2885 )
        ( .4322   .2882 )               ( .1442 ) .
We shall use Gaussian elimination to compute a first
approximation z to the solution x of Ax = b .
Then r = b - Az is computed exactly, and the
solution e of Ae = r is approximated by f ,
obtained again by Gaussian elimination. z' = z + f ,
and r' = b - Az' .
We shall try to calculate x correctly to
3 sig. fig's. It seems reasonable to carry one
guard digit at first, since we can repeat the
calculation with more figures later if that is not
enough. We shall truncate all calculations to
4 sig. fig's., just like our 7094 (except that it
truncates to about 8 sig. fig's.). Intermediate
results enclosed in parentheses are obtained by
definition rather than by means of an arithmetic
operation.
Comment                    Equ'n    Coef.      Coef.       Right hand
                           no.      of x_1     of x_2      side b
1st pivotal row is ...     E1        .8647      .5766        .2885
.4322/.8647 = .4998        E2        .4322      .2882        .1442
.4998 × E1 is ...          E1'      (.4322)     .2881        .1441
E2 - E1' ...               E3       (  0  )     .1×10⁻³      .1×10⁻³
E3/.1×10⁻³                 Z2       (  0  )    (  1  )      1.000
.5766 × Z2                 E3'      (  0  )    (.5766)       .5766
E1 - E3'                   E4       (.8647)    (  0  )      -.2881
E4/.8647                   Z1       (  1  )    (  0  )      -.3331
Residual of E1 ...         R1                               -.6843×10⁻⁴
Residual of E2 ...         R2                               -.3418×10⁻⁴

    z = ( -.3331 )      f = ( -.2124×10⁻³ )      z' = z + f = ( -.3333124 )
        (  1.000 ) ,        (  .2000×10⁻³ ) ,                 (  1.0002000 ) .
which is reassuringly smaller than r .
Is x = ( -.333 , 1.00 )^T correct to 3 sig. fig's?
No; in fact x = ( -1 , 2 )^T .
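The little system of Example 4 can be checked exactly. The snippet below is an illustrative modern check, not part of the original: it recomputes the true solution in rational arithmetic and the residuals of the 4-figure approximation, confirming that a tiny residual can accompany a grossly wrong answer.

```python
from fractions import Fraction as F

A = [[F('0.8647'), F('0.5766')], [F('0.4322'), F('0.2882')]]
b = [F('0.2885'), F('0.1442')]

det = A[0][0] * A[1][1] - A[0][1] * A[1][0]          # exactly 2e-8
x = [(b[0] * A[1][1] - A[0][1] * b[1]) / det,        # exactly -1
     (A[0][0] * b[1] - b[0] * A[1][0]) / det]        # exactly  2
print([float(v) for v in x])                         # [-1.0, 2.0]

z = [F('-0.3331'), F('1.000')]                       # the 4-figure result
r = [b[i] - A[i][0] * z[0] - A[i][1] * z[1] for i in range(2)]
print([float(v) for v in r])    # about -6.8e-5 and -3.4e-5: tiny residuals
```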
problem becomes large (say > 30), much of our
experience and intuition with small dimensionality
(say <= 5) becomes misleading. The following
examples are designed to correct misleading
impressions.
Example 5.
         ( 1  -1  -1  . . .  -1 )
         ( 0   1  -1  . . .  -1 )        a_ij =  0 if i > j ,
    A =  ( 0   0   1  . . .  -1 )        a_ij = -1 if i < j ,
         ( .   .   .          . )        a_ii =  1 .
         ( 0   0   0  . . .   1 )
                            (N×N)
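A quick numerical look at this matrix (my check, in modern notation, anticipating the point of the next paragraph): every pivot of Gaussian elimination equals 1, yet by the theorem above the distance 1/||A^(-1)|| from A to the nearest singular matrix is only of order 2^(1-N).

```python
import numpy as np

N = 40
A = np.eye(N) - np.triu(np.ones((N, N)), 1)   # 1 on diagonal, -1 above

# A is already upper triangular, so its pivots are its diagonal: all 1.
print(np.diag(A))                     # [1. 1. ... 1.]

# Yet A is violently close to singular:
print(1.0 / np.linalg.norm(np.linalg.inv(A), np.inf))   # about 2**(1-N)
```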
The foregoing examples indicate that Gaussian
elimination is a poor way to determine the rank of
a matrix because a few rounding errors may suffice to
cause none of the pivots to be small despite a
theorem which says that, in the absence of rounding
errors, the rank of a matrix is the same as the
number of non-zero pivots generated during Gaussian
elimination with both row and column pivotal
interchanges to select maximal pivots.
Most other methods for determining rank fare
no better in the face of roundoff. For example, the
Schmidt orthogonalization procedure can be described
in terms of an orthogonal projection of A's n-th
column a_n upon the space spanned by the previous
n-1 columns a_1 , ... , a_{n-1} (see Householder (1964)
pp. 6-8 and 134-7). The columns can be interchanged,
if necessary, in order at each stage to maximize the
distance from a_n to the space of a_1 , ... , a_{n-1} .
If this is done, the rank r of A will become
evident when a_{r+1} , a_{r+2} , ... , and a_N all have
distance zero from the space spanned by a_1 , a_2 , ... ,
and a_r . However, if A is merely nearly singular,
there is no guarantee that any of the distances
mentioned above will be anywhere near as small as
the distance between A and the nearest singular
matrix. Difficulties arise whenever ||Av|| << ||A|| ||v||
for some vector v whose components v_i can be
ordered in such a way that they steadily decrease in
magnitude to a point where the smallest component is
negligible compared with the largest. The following
example illustrates the phenomenon:
    a_ij = 0 if i > j ,
    a_ii = s^(i-1) for i = 1, 2, ... , N ,
    a_ij = -c s^(i-1) if i < j .
Here N is large (N > 30) and s² + c² = 1 .
Since A is upper triangular, the n-th column of A
is distant a_nn from the space spanned by the
previous columns. Also, this example has been so
chosen that no column interchanges are needed to
maximize the distance, since a_n , a_{n+1} , ... , and
a_N are all equally distant from the space spanned
by a_1 , a_2 , ... , a_{n-1} . The smallest distance is
a_NN = s^(N-1) . How much smaller than a_NN
is the distance ||ΔA|| between A and the nearest
singular matrix A + ΔA ?
    ||ΔA|| / a_NN = O( 1/(1 + c)^(N-1) )   as N → ∞
                  = O( 1/(1 + √(1 - a_NN^(2/N)))^N ) .
For fixed N , it is like 2^(-N) as a_NN → 0 . In
other words, A can be closer to singular than
a_NN by orders of magnitude if N is large.
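The gap is easy to observe numerically. In the sketch below (mine, in modern notation) the smallest singular value stands in for the 2-norm distance ||ΔA|| to the nearest singular matrix; it comes out far smaller than the smallest diagonal element a_NN.

```python
import numpy as np

def kahan_like(N, c):
    """Upper triangular: a_ii = s**(i-1), a_ij = -c*s**(i-1) for j > i."""
    s = np.sqrt(1.0 - c * c)
    A = np.triu(-c * np.ones((N, N)), 1) + np.eye(N)
    return (s ** np.arange(N))[:, None] * A

A = kahan_like(40, c=0.3)
a_NN = A[-1, -1]
sigma_min = np.linalg.svd(A, compute_uv=False)[-1]
print(a_NN, sigma_min, sigma_min / a_NN)   # the ratio is far below 1
```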
our computation were to calculate pivots, but that
is not our object. We wish to satisfy a set of
linear equations, and the computed solution £ can
come very near to satisfying Ax_ = b_ even though
almost all pivots are entirely different from what
they would have been in the absence of roundoff. In
example 4, R1 and R2 are as small as one could
reasonably expect from 4-figure working, and remain
so even when the pivot .0001 is replaced by, say,
.0002 . There are occasions when a small residual
is all that is wanted (see the discussion of
eigenvector calculations below). In such a case,
we must conclude that small pivots have not prevented
a correct answer from being produced. Besides,
pivotal interchanges do not prevent small pivots.
where ε is comparable with the relative error
associated with one rounding error (ε is about
10⁻⁸ on our machine).
convenience, and such norms can be terribly
inappropriate.
Example 6. Consider a system Ax = b known
with a relative uncertainty of 10⁻⁸ in each
element. In other words, A + ΔA is acceptable in
place of A provided |Δa_ij| <= 10⁻⁸ |a_ij| . If any
of the aforementioned Hölder norms are used, the
condition number of A is at least 10¹⁰ because
there exists a ΔA with ||ΔA|| <= 10⁻¹⁰ ||A|| such
that A + ΔA is singular. Therefore, when Gaussian
elimination carried out with eight sig. fig.
arithmetic gives no useful answer, one is not
surprised. "The system is ill-conditioned."
However, the true solution x is determined
to within a relative error smaller than 10⁻⁷ in each
component no matter how A and b are perturbed,
provided only that no element of A or b is
changed by more than 10⁻⁸ of itself. This system
is well conditioned! But not in the usual Hölder
norm.
Example 6 can be obtained from example 2 by
a diagonal transformation;
in example 2 the element 10⁻¹⁰ is the natural first
pivot, whereas the most natural pivot in example 6
is the corresponding element 2 . Any other pivot
would be far better.
This is where equilibration comes in.
Equilibration consists of diagonal transformations
intended to scale each row and column of A in such
a way that, when Gaussian elimination is applied to
the equilibrated system of equations, the results
are nearly as accurate as possible. In other words,
the system
Ax = b
is replaced by
    (RAC) y = (Rb)
where R and C are diagonal matrices. Then
Gaussian elimination (or any other method) is applied
to the array {RAC, Rb} to produce an approximation
to y and hence to x = Cy .
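In practice R and C are simply applied before the elimination and undone afterwards. The sketch below is illustrative only: its R and C merely scale each row and then each column so that the largest element is 1, one common heuristic, and not the optimal scaling discussed next.

```python
import numpy as np

def equilibrated_solve(A, b):
    """Solve Ax = b as (R A C) y = R b, x = C y, with simple diagonal
    scalings chosen so each row, then each column, has max element 1."""
    R = np.diag(1.0 / np.abs(A).max(axis=1))
    C = np.diag(1.0 / np.abs(R @ A).max(axis=0))
    y = np.linalg.solve(R @ A @ C, R @ b)
    return C @ y

A = np.array([[1e-10, 1.0], [1.0, 1.0]])
b = np.array([1.0, 2.0])
print(equilibrated_solve(A, b), np.linalg.solve(A, b))
```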
How should R and C be chosen? No one has
published a foolproof method. The closest anyone has
come is in a paper by F.L. Bauer (1963) in which the
R and C which minimize K(RAC) are described in
terms of an eigenvalue and eigenvectors of certain
matrices constructed from A and A^(-1) . But no way
is known to construct R and C without first
knowing A^(-1) .
There is some doubt whether R and C
should be chosen to minimize K(RAC) . The next
example illustrates the problem; the reader should
write out the matrices involved in extenso for N = 6
to follow the argument.
Let A_1 be the N×N matrix defined in
example 3 (with X = 1), and let
    R = diag( 2^(-1) , 2^(-2) , ... , 2^(2-N) , 2^(1-N) , 2^(1-N) )   and
    C = diag( 1 , 2 , ... , 2^(N-3) , 2^(N-2) , 1 ) .
We observe that A_1^(-T) = R A_1 C , so A_1 and A_1^(-1)
are both well-conditioned matrices with elements no
larger than 1.
Let u = (0, 0, 0, ... , 0, 1)^T and note
that
    A_2 = A_1 + u u^T ,
so A_2 is the matrix of example 3 with X = 2 , and by
the usual rank-one updating formula
    A_2^(-1) = A_1^(-1) - (1 + 2^(1-N))^(-1) A_1^(-1) u u^T A_1^(-1) .
Consequently
    ||(RA_1C)^(-1) - (RA_2C)^(-1)|| / ||(RA_2C)^(-1)|| = O(2^(-N)) .
REFERENCES
D.W. Barron and H.P.F. Swinnerton-Dyer, Solution
of Simultaneous Linear Equations Using a Magnetic-
Tape Store. The Computer Journal, Vol. 3, (1960)
pp. 28-33.
F.L. Bauer, Optimally Scaled Matrices. Numerische
Math., Vol. 5, (1963) pp. 73-87.
G. Birkhoff, R.S. Varga, and D. Young, Alternating
Direction Implicit Methods. Advances in Computers,
Vol. 3, Academic Press (1962).
E. Bodewig, Matrix Calculus, 2nd ed. North Holland
(1959). (A catalogue of methods and tricks, with
historical asides.)
M.A. Cayless, Solution of Systems of Ordinary
and Partial Differential Equations by Quasi-
Diagonal Matrices. The Computer Journal, Vol. 4,
(1961) pp. 54-61.
M.M. Day, Normed Linear Spaces. Springer (1962).
J. Douglas and J.E. Gunn, A General Formulation
of Alternating Direction Methods, part I.
Numerische Math. Vol. 6, (1965) pp. 428-453.
Dunford and Schwartz, Linear Operators, part I:
General Theory. Interscience (1958).
M. Engeli et al., Refined Iterative Methods for
Computation of the Solution and the Eigenvalues
of Self-Adjoint Boundary Value Problems.
Mitteilung Nr. 8 aus dem Inst. für angew. Math.
an der E.T.H., Zürich. Birkhäuser (1959).
D.K. Faddeev and V.N. Faddeeva, Computational
Methods of Linear Algebra, translated from the
Russian by R.C. Williams. W.H. Freeman (1964).
(This text is a useful catalogue, but weak on
error-analysis. A new augmented Russian edition
has appeared.)
G.E. Forsythe and W.R. Wasow, Finite Difference
Methods for Partial Differential Equations.
Wiley (1960). (A detailed text.)
L. Fox, The Numerical Solution of Two-Point
Boundary Problems in Ordinary Differential
Equations. Oxford Univ. Press (1957).
L. Fox, Numerical Solution of Ordinary and
Partial Differential Equations. Pergamon
Press (1962) (Ed.). (Based on a Summer School
held in Oxford, Aug.-Sept. 1961.)
V.V. Klyuyev and N.I. Kokovkin-Shcherbak, On
the Minimization of the Number of Arithmetic
Operations for the Solution of Linear Algebraic
Systems of Equations. Journal of Computational
Math, and Math. Phys., Vol. 5, (1965) pp. 21-33
(Russian). A translation, by G.J. Tee, is
available as Tech. Rep't CS24 from the Computer
Sci. Dep't of Stanford University. (My copy has
mistakes in it which I have not yet sorted out.)
J. Liouville, Sur le développement des fonctions
en séries ..., II. J. Math. pures appl. (1),
Vol. 2, (1837) pp. 16-37.
D.W. Martin and G.J. Tee, Iterative Methods for
Linear Equations with Symmetric Positive
Definite Matrix. The Computer Journal Vol. 4,
(1961) pp. 242-254. (An excellent survey.)
A.M. Turing, Rounding-off Errors in Matrix
Processes. Quart. J. Mech. Appl. Math. 1,
(1948) pp. 287-308.
R.S. Varga, Matrix Iterative Analysis. Prentice
Hall (1962). (An important treatise on those
iterative methods most widely used to solve
large boundary-value problems.)
University of Toronto