
DOVER BOOKS ON MATHEMATICS

HANDBOOK OF MATHEMATICAL FUNCTIONS, Milton Abramowitz and Irene A.


Stegun. (0-486-61272-4)
TENSOR ANALYSIS ON MANIFOLDS, Richard L. Bishop and Samuel I. Goldberg.
(0-486-64039-6)
VECTOR AND TENSOR ANALYSIS WITH APPLICATIONS, A. I. Borisenko and I. E.
Tarapov. (0-486-63833-2)
THE HISTORY OF THE CALCULUS AND ITS CONCEPTUAL DEVELOPMENT, Carl B.
Boyer. (0-486-60509-4)
THE QUALITATIVE THEORY OF ORDINARY DIFFERENTIAL EQUATIONS: AN
INTRODUCTION, Fred Brauer and John A. Nohel. (0-486-65846-5)
PRINCIPLES OF STATISTICS, M. G. Bulmer. (0-486-63760-3)
THE THEORY OF SPINORS, Elie Cartan. (0-486-64070-1)
ADVANCED NUMBER THEORY, Harvey Cohn. (0-486-64023-X)
STATISTICS MANUAL, Edwin L. Crow, Francis Davis, and Margaret Maxfield.
(0-486-60599-X)
FOURIER SERIES AND ORTHOGONAL FUNCTIONS, Harry F. Davis. (0-486-65973-9)
COMPUTABILITY AND UNSOLVABILITY, Martin Davis. (0-486-61471-9)
ASYMPTOTIC METHODS IN ANALYSIS, N. G. de Bruijn. (0-486-64221-6)
THE MATHEMATICS OF GAMES OF STRATEGY, Melvin Dresher. (0-486-64216-X)
APPLIED PARTIAL DIFFERENTIAL EQUATIONS, Paul DuChateau and David
Zachmann. (0-486-41976-2)
ASYMPTOTIC EXPANSIONS, A. Erdélyi. (0-486-60318-0)
COMPLEX VARIABLES: HARMONIC AND ANALYTIC FUNCTIONS, Francis J. Flanigan.
(0-486-61388-7)
DIFFERENTIAL TOPOLOGY, David B. Gauld. (0-486-45021-X)
ON FORMALLY UNDECIDABLE PROPOSITIONS OF PRINCIPIA MATHEMATICA AND
RELATED SYSTEMS, Kurt Gödel. (0-486-66980-7)
A HISTORY OF GREEK MATHEMATICS, Sir Thomas Heath. (0-486-24073-8,
0-486-24074-6) Two-volume set
PROBABILITY: ELEMENTS OF THE MATHEMATICAL THEORY, C. R. Heathcote.
(0-486-41149-4)
INTRODUCTION TO NUMERICAL ANALYSIS, Francis B. Hildebrand. (0-486-65363-3)
METHODS OF APPLIED MATHEMATICS, Francis B. Hildebrand. (0-486-67002-3)
TOPOLOGY, John G. Hocking and Gail S. Young. (0-486-65676-4)
MATHEMATICS AND LOGIC, Mark Kac and Stanislaw M. Ulam. (0-486-67085-6)
MATHEMATICAL FOUNDATIONS OF INFORMATION THEORY, A. I. Khinchin.
(0-486-60434-9)
ARITHMETIC REFRESHER, A. Albert Klaf. (0-486-21241-6)
CALCULUS REFRESHER, A. Albert Klaf. (0-486-20370-0)
PROBLEM BOOK IN THE THEORY OF FUNCTIONS, Konrad Knopp. (0-486-41451-5)
(continued on back flap)
PRINCIPLES OF
NUMERICAL ANALYSIS
ALSTON S. HOUSEHOLDER

DOVER PUBLICATIONS, INC. - MINEOLA, NEW YORK


Copyright
Copyright © 1953, 1981 by Alston S. Householder
All rights reserved.

Bibliographical Note
This Dover edition, first published in 1974 and reissued in 2006, is an
unabridged, slightly corrected republication of the work originally published
by the McGraw-Hill Book Company, Inc., New York, in 1953.

International Standard Book Number: 0-486-45312-X

Manufactured in the United States of America


Dover Publications, Inc., 31 East 2nd Street, Mineola, N.Y. 11501
TO

B, J, and J
PREFACE
This is a mathematical textbook rather than a compendium of computa-
tional rules. It is hoped that the material included will provide a useful
background for those seeking to devise and evaluate routines for numerical
computation.
The general topics considered are the solution of finite systems of
equations, linear and nonlinear, and the approximate representation of
functions. Conspicuously omitted are functional equations of all types.
The justification for this omission lies first in the background presupposed
on the part of the reader. Second, there are good books, in print and in
preparation, on differential and on integral equations. But ultimately,
the numerical “‘solution”’ of a functional equation consists of a finite table
of numbers, whether these be a set of functional values, or the first n coeffi-
cients of an expansion in terms of known functions. Hence, eventually
the problem must be reduced to that of determining a finite set of numbers
and of representing functions thereby, and at this stage the topics in this
book become relevant.
The endeavor has been to keep the discussion within the reach of
one who has had a course in calculus, though some elementary notions
of the probability theory are utilized in the allusions to statistical assess-
ments of errors, and in the brief outline of the Monte Carlo method. The
book is an expansion of lecture notes for a course given in Oak Ridge for
the University of Tennessee during the spring and summer quarters of
1950.
The material was assembled with high-speed digital computation
always in mind, though many techniques appropriate only to “hand”
computation are discussed. By a curious and amusing paradox, the
advent of high-speed machinery has lent popularity to the two innovations
from the field of statistics referred to above. How otherwise the con-
tinued use of these machines will transform the computer’s art remains to
be seen. But this much can surely be said, that their effective use
demands a more profound understanding of the mathematics of the
problem, and a more detailed acquaintance with the potential sources of
error, than is ever required by a computation whose development can be
watched, step by step, as it proceeds. It is for this reason that a text-
book on the mathematics of computation seems in order.
Help and encouragement have come from too many to permit listing
all by name. But it is a pleasure to thank, in particular, J. A. Cooley,
C. C. Hurd, D. A. Flanders, J. W. Givens, A. de la Garza, and members of
the Mathematics Panel of Oak Ridge National Laboratory. And for the
painstaking preparation of the copy, thanks go to Iris Tropp, Gwen
Wicker, and above all, to Mae Gill.
A. S. Householder
CONTENTS

Preface

1. The Art of Computation
   1.1 Errors and Blunders
   1.2 Composition of Error
   1.3 Propagated Error and Significant Figures
   1.4 Generated Error
   1.5 Complete Error Analyses
   1.6 Statistical Estimates of Error
   1.7 Bibliographic Notes

2. Matrices and Linear Equations
   2.1 Iterative Methods
   2.2 Direct Methods
   2.3 Some Comparative Evaluations
   2.4 Bibliographic Notes

3. Nonlinear Equations and Systems
   3.1 The Graeffe Process
   3.2 Bernoulli's Method
   3.3 Functional Iteration
   3.4 Systems of Equations
   3.5 Complex Roots and Methods of Factorization
   3.6 Bibliographic Notes

4. The Proper Values and Vectors of a Matrix
   4.1 Iterative Methods
   4.2 Direct Methods
   4.3 Bibliographic Notes

5. Interpolation
   5.1 Polynomial Interpolation
   5.2 Trigonometric and Exponential Interpolation
   5.3 Bibliographic Notes

6. More General Methods of Approximation
   6.1 Finite Linear Methods
   6.2 Chebyshev Expansions
   6.3 Bibliographic Notes

7. Numerical Integration and Differentiation
   7.1 The Quadrature Problem in General
   7.2 Numerical Differentiation
   7.3 Operational Methods
   7.4 Bibliographic Notes

8. The Monte Carlo Method
   8.1 Numerical Integration
   8.2 Random Sequences
   8.3 Bibliographic Notes

Bibliography
Problems
Index
CHAPTER 1

THE ART OF COMPUTATION

1. The Art of Computation


We are concerned here with mathematical principles that are some-
times of assistance in the design of computational routines. It is hardly
necessary to remark that the advent of high-speed sequenced computing
machinery is revolutionizing the art and that it is much more difficult to
explain to a machine how a problem is to be done than to explain to most
human beings. Or that the process that is easiest for the human being to
carry out is not necessarily the one that is easiest or quickest for the
machine. Not only that, but a process may be admirably well adapted
to one machine and very poorly adapted to another. Consequently, the
robot master has very few tried and true rules at his disposal, and is
forced to go back to first principles to construct such rules as seem to
conform best to the idiosyncrasy of his particular robot.
If a computation requires more than a very few operations, there are
usually many different possible routines for achieving the same end
result. Even so simple a computation as ab/c can be done (ab)/c, (a/c)b,
or a(b/c), not to mention the possibility of reversing the order of the
factors in the multiplication. Mathematically these are all equivalent;
computationally they are not (cf. §1.2 and §1.4). Various, and some-
times conflicting, criteria must be applied in the final selection of a par-
ticular routine. If the routine must be given to someone else, or to a
computing machine, it is desirable to have a routine in which the steps
are easily laid out, and this is a serious and important consideration
in the use of sequenced computing machines. Naturally one would
like the routine to be as short as possible, to be self-checking as far as
possible, to give results that are at least as accurate as may be required.
And with reference to the last point, one would like the routine to be
such that it is possible to assert with confidence (better yet, with cer-
tainty) and in advance that the results will be as accurate as may be
desired, or if an advance assessment is out of the question, as it often is,
one would hope that it can be made at least upon completion of the
computation.
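The point may be illustrated with a short computation in modern double-precision floating-point arithmetic (the language, the data, and the word length are assumptions made only for this illustration); even for ab/c the three routines need not deliver identical rounded results:

    import random

    random.seed(1)
    disagreements = 0
    for _ in range(10000):
        a, b, c = random.random(), random.random(), random.random()
        r1 = (a * b) / c          # routine (ab)/c
        r2 = (a / c) * b          # routine (a/c)b
        r3 = a * (b / c)          # routine a(b/c)
        if not (r1 == r2 == r3):
            disagreements += 1
    print("triples for which the three routines disagree:", disagreements)

When the three results differ, they differ only in the last place or two, but the routines are computationally distinct.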
1.1. Errors and Blunders. The number 0.33, when expressing the
result of the division 1 ÷ 3, is correctly obtained even though it deviates
by 1 per cent from the true quotient. The number 0.334, when expressing
the result of the same division, deviates by only 0.2 per cent from the
true quotient, and yet is incorrectly obtained. The deviation of 0.33
from the true quotient will be called an error. If the division is to be
carried out to three places but not more, then 0.333 is the best representation
possible, and the replacement of the final "3" by a final "4" will be
called a blunder.
Blunders result from fallibility, errors from finitude. Blunders will
not be considered here to any extent. There are fairly obvious ways to
guard against them, and their effect, when they occur, can be gross,
insignificant, or anywhere in between. Generally the sources of error
other than blunders will leave a limited range of uncertainty, and generally
this can be reduced, if necessary, by additional labor. It is important
to be able to estimate the extent of the range of uncertainty.
Four sources of error are distinguished by von Neumann and Goldstine,
and while occasionally the errors of one type or another may be negligible
or absent, generally they are present. These sources are the following:
1. Mathematical formulations are seldom exactly descriptive of any
real situation, but only of more or less idealized models. Perfect gases
and material points do not exist.
2. Most mathematical formulations contain parameters, such as
lengths, times, masses, temperatures, etc., whose values can be had only
from measurement. Such measurements may be accurate to within 1,
0.1, or 0.01 per cent, or better, but however small the limit of error, it is
not zero.
3. Many mathematical equations have solutions that can be con-
structed only in the sense that an infinite process can be described whose
limit is the solution in question. By definition the infinite process can-
not be completed, so one must stop with some term in the sequence,
accepting this as the adequate approximation to the required solution.
This results in a type of error called the truncation error.
4. The decimal representation of a number is made by writing a
sequence of digits to the left, and one to the right, of an origin which is
marked by the decimal point. The digits to the left of the decimal are
finite in number and are understood to represent coefficients of increasing
powers of 10 beginning with the zeroth; those to the right are possibly
infinite in number, and represent coefficients of decreasing powers of 10.
In digital computation only a finite number of these digits can be taken
account of. The error due to dropping the others is called the round-off
error.
In decimal representation 10 is called the base of the representation.
Many modern computing machines operate in the binary system, using
the base 2 instead of the base 10. Every digit in the two sequences is
either 0 or 1, and the point which marks the origin is called the binary
point, rather than the decimal point. Desk computing machines which
use the base 8 are on the market, since conversion between the bases 2
and 8 is very simple. Colloquial languages carry the vestiges of the use
of other bases, e.g., 12, 20, 60, and in principle, any base could be used.
Clearly one does not evaluate the error arising from any one of these
sources, for if he did, it would no longer be a source of error. Generally
it cannot be evaluated. In particular cases it can be evaluated but not
represented (e.g., in the division 1 ÷ 3 carried out to a preassigned
number of places). But one does hope to set bounds for the errors and
to ascertain that the errors will not exceed these bounds.
The computer is not responsible for sources 1 and 2. He is not
concerned with formulating or assessing a physical law nor with making
physical measurements. Nevertheless, the range of uncertainty to
which they give rise will, on the one hand, create a limit below which the
range of uncertainty of the results of a computation cannot come, and
on the other hand, provide a range of tolerance below which it does not
need to come.
With the above classification of sources, we present a classification
of errors as such. This is to some extent artificial, since errors arising
from the various sources interact in a complex fashion and result in a
single error which is no simple sum of elementary errors. Nevertheless,
thanks to a most fortunate circumstance, it is generally possible to
estimate an over-all range of uncertainty as though it were such a simple
sum (§1.2). Hence we will distinguish propagated error, generated
error, and residual error.
At the outset of any computation the data may contain errors of
measurement, round-off errors due to the finite representation in some base
of numbers like 1/3, or even numbers requiring a finite but large number
of places for exact representation. These initial errors carry through the
computation and lead to an uncertainty at every step. It is important to
know how these initial errors are propagated through the computation
and to what extent they render the results uncertain.
In addition to this, at every step, or nearly every step, new errors
may arise as a result of round-off; these combine with the errors already
propagated, and the total is propagated through what computations
remain. Finally, when the computation is terminated, a truncation
error may remain and further enlarge the region of uncertainty. Roughly
the extent to which errors are propagated and the uncertainty due to
residual error depend upon the mathematical formulation of the compu-
tational procedure, while the generation is more dependent upon the
detailed ordering of the computational steps.
Any computation, however elaborate, consists of a finite number of

elementary operations carried out in some sequence. The elementary


operations are usually additions and subtractions, multiplications and
divisions, comparisons, possibly table look-ups, and the like. An unam-
biguous description of the sequence in which the operations are performed
or to be performed, with a specification of the data upon which each
is to operate, constitutes a routine. If multiplication and division are
elementary operations, there are six possible routines for computing
ab/c. Hence a routine is by no means defined when a mathematical
formula, or sequence of them, is written down.
A routine of any complexity breaks up naturally into parts or sub-
routines. A subroutine may have for its purpose the computation of an
intermediate quantity, of no interest in itself but serving as a datum or
operand for one or more subsequent subroutines. Thus, in order to
calculate ab/c, one must first calculate ab, or a/c, or b/c. Or a subroutine
may operate upon intermediate results to produce a final result.
Suppose a subroutine is intended to compute a function f(x, y, . . .)
for given values of its arguments. If f is a rational function or a polynomial,
there need be no residual error in the computation, but only
propagated and generated errors. If f is not a rational function, some
rational approximation must be devised. One type of rational approximation
is a Taylor series. If a Taylor series is used, only a finite number
of terms will be computed. The residual error in the computation is the
sum of all neglected terms. Hence the residual error is fixed by the
mathematical formulation of the problem, together with the specification
of the number of terms to be used in the computation. But an error
may be generated and propagated in each computed term.
Another type of approximation, e.g., in solving an equation by Newton's
method, is the following: From an initial approximation f_0, one
defines a sequence of approximations by a relation f_{i+1} = φ(f_i), where φ
is some function which can be evaluated. If the mathematical sequence
converges, one may take as a criterion for terminating the sequence the
condition that successive terms shall differ by less than some assigned
quantity. Clearly the assigned quantity must not be smaller than the
smallest quantity representable by the machine, and normally it will be
some integral multiple of this. It will have to depend upon the error
generated by the routine for evaluating φ. The residual error is the
difference between the f_i which one accepts and the true value of f, and
it is limited by the quantity used in the criterion, which is, in turn, limited
by the precision of the machine operations. For illustration, a routine
for computing square roots will be considered in §1.5.
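A sketch of such a terminating routine in modern notation (the function φ, the tolerance, and the use of double-precision arithmetic are assumptions of the illustration, not of the text) is the following:

    def iterate(phi, f0, tol, max_steps=100):
        # repeat f <- phi(f) until successive approximations differ by less than tol
        f = f0
        for _ in range(max_steps):
            f_next = phi(f)
            if abs(f_next - f) < tol:
                return f_next
            f = f_next
        return f                      # stop in any case; a residual error may remain

    # Newton's iteration for the square root of a, phi(f) = f - (f - a/f)/2:
    a = 0.3
    root = iterate(lambda f: f - (f - a / f) / 2, 1.0, 1e-12)

The assigned quantity tol plays the role described above: it cannot usefully be smaller than the precision of the arithmetic, and the residual error of the accepted f is limited by it.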
1.2. Composition of Error. Let x*, y*, . . . designate numbers which
might occur as data or as results of a particular computation. That is
to say, x* has the form

(1.2.1)    x* = ±β^σ(x_1β^{-1} + x_2β^{-2} + ⋯ + x_λβ^{-λ}),

where β is the base, usually 2 or 10, λ is a positive integer, and σ is any
integer, possibly zero. It may be that λ is fixed throughout the course
of the computation, or it may vary, but in any case it is limited by practical
considerations. Such a number will be called a representation.
It may be that x* is obtained by "rounding off" a number whose true
value is x (for example, x = 1/3, x* = 0.33), or that x* is the result of
measuring physically a quantity whose true value is x, or that x* is the
result of a computation intended to give an approximation to the quantity x.
Suppose one is interested in performing an operation ω upon a pair
of numbers x and y. That is to say, xωy may represent a product of x
and y, a quotient of x by y, the yth power of x, . . . . In the numerical
computation, however, one has only x* and y* upon which to operate,
not x and y (or at least these are the quantities upon which one does, in
fact, operate). Not only this, but often one does not even perform the
strict operation ω, but rather a pseudo operation ω*, which yields a
rounded-off product, quotient, power, etc. Hence, instead of obtaining
the desired result xωy, one obtains a result x*ω*y*.
The error in the result is therefore

(1.2.2)    xωy − x*ω*y* = (xωy − x*ωy*) + (x*ωy* − x*ω*y*),

the first parenthesis on the right being the propagated error, the second the
generated error. Since x* and y* are numbers, the operation ω can be applied to them, and
x*ωy* is well defined except for special cases, as when ω represents
division and y* vanishes. The generated error represents the round-off. Hence the total error in the
result is the sum of the error propagated by the operation and that generated
by the operation.
It may happen that the two errors are opposite in sign, though often
the sign is not known, but only the magnitude. In any case,

(1.2.3)    |xωy − x*ω*y*| ≤ |xωy − x*ωy*| + |x*ωy* − x*ω*y*|,

and one can say at least that the total error does not exceed the sum of
the two errors.
That propagated and generated errors depend upon the details of the
routine, such as the value of λ for any representation, and even the order
in which certain operations are carried out, is easily seen. Thus, to consider
only round-off, if φ is some operation, it may be that mathematically

    (x*ωy*)φz* = x*ω(y*φz*).

That is, the two operations may be associative mathematically. Nevertheless,

    (x*ωy*)φz* − (x*ω*y*)φ*z* = [(x*ωy*)φz* − (x*ω*y*)φz*] + [(x*ω*y*)φz* − (x*ω*y*)φ*z*],

whereas

    x*ω(y*φz*) − x*ω*(y*φ*z*) = [x*ω(y*φz*) − x*ω(y*φ*z*)] + [x*ω(y*φ*z*) − x*ω*(y*φ*z*)].

Thus the errors generated by performing the operations in the two possible
ways have different expressions, and cannot be assumed equal
without proof.
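The dependence of the generated error on the order of operations can be exhibited by simulating pseudo operations directly (a sketch; the word length λ = 12 and the sample operands are arbitrary assumptions):

    LAMBDA = 12                      # binary places retained
    UNIT = 2.0 ** -LAMBDA            # one unit in the last place

    def rnd(v):
        # round v to the nearest multiple of UNIT: the pseudo result
        return round(v / UNIT) * UNIT

    def pmul(a, b):                  # pseudo product a x b
        return rnd(a * b)

    a, b, c = rnd(0.3), rnd(0.7), rnd(0.9)
    left = pmul(pmul(a, b), c)       # (a x b) x c
    right = pmul(a, pmul(b, c))      # a x (b x c)
    print(left, right, (left - right) / UNIT)   # 0 or +-1; cf. (1.4.7) of §1.4

Mathematically the two groupings are identical; with pseudo products they agree only to within a unit in the last place.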
1.3. Propagated Error and Significant Figures. Let

(1.3.1)    x = x* + ξ,    y = y* + η,    . . . ,

and consider the problem of evaluating the function f(x, y, . . .). If the
function can be expanded in Taylor's series, then

(1.3.2)    f(x, y, . . .) − f(x*, y*, . . .) = ξf_x + ηf_y + ⋯ + ½(ξ^2 f_{xx} + 2ξη f_{xy} + ⋯),

where the partial derivatives are to be evaluated at x*, y*, . . . . This
represents the error in f arising from errors in the arguments, i.e., the
propagated error. Generally one expects the errors ξ, η, . . . to be
"small," so that the terms of second and higher power can be neglected.
If so, then the propagated error Δf satisfies, approximately,

(1.3.3)    |Δf| ≤ |ξf_x| + |ηf_y| + ⋯ .

This is strictly true when f is a simple sum:

    f = ±x ± y ± ⋯ .
Hence the error in a sum does not exceed the sum of the errors in the
terms.
One can, by direct differentiation, write down any number of special
relations (1.3.3) based upon the assumption that the errors in the argu-
ments are small. Nevertheless, for the detailed analysis one must go to
the individual elementary operations.
Consider the case of the product and quotient. For the first we have

    xy − x*y* = x*η + y*ξ + ξη.

It is sometimes convenient to consider the relative error, which is the ratio
of the error to the magnitude. Hence

(1.3.4)    (xy − x*y*)/(x*y*) = η/y* + ξ/x* + ξη/(x*y*).

Usually one says that the relative error in a product is the sum of the
relative errors in the factors, and this is approximately true if these rela-
tive errors are both small, but only then.
For the quotient

    x/y − x*/y* = (ξy* − ηx*)/[y*(y* + η)],

and the relative error is

(1.3.5)    (x/y − x*/y*)/(x*/y*) = (ξ/x* − η/y*)/(1 + η/y*).

If the relative error η/y* is negligible, then the relative error in the
quotient does not exceed in magnitude the sum of the magnitudes of the
relative errors in the terms. Nevertheless, if η/y* < 0 and not numerically
small, the conclusion does not follow.
If x*, given by (1.2.1), represents a number whose true value is x,
and if

(1.3.6)    |x* − x| ≤ β^{σ−λ}/2,

the digits x_1, . . . , x_λ may be said to be reliable or significant. If x_1 ≠ 0,
then x* is said to contain λ significant figures. However, if the inequality
(1.3.6) is not known to hold, the last digit, x_λ, is "in doubt." When a
number such as x* is written in its usual form as a sequence of digits with
a decimal point, it is usually understood that all digits are significant,
from the first non-null digit to the last digit written, unless limits of error
are specified. Some textbooks give various rules of thumb for deter-
mining the number of significant digits in computed quantities, and the
implication is sometimes left that nonsignificant digits should be dropped
before making subsequent computations with these numbers, or at least
that nothing is to be gained by retaining the doubtful digits. But this is
clearly not the case, since dropping these digits generally enlarges the
region of uncertainty.
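For the product, relation (1.3.4) is easily checked numerically (a sketch with arbitrarily assumed values; double precision stands in for exact arithmetic):

    x_true, y_true = 3.2, 0.47       # true values
    xs, ys = 3.19, 0.475             # representations x*, y*
    xi, eta = x_true - xs, y_true - ys

    exact_rel = (x_true * y_true - xs * ys) / (xs * ys)
    approx_rel = xi / xs + eta / ys  # sum of the relative errors in the factors
    print(exact_rel, approx_rel)     # they differ by the small term xi*eta/(x*y*)

The agreement is close because both relative errors are small; as remarked above, the rule fails when they are not.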
1.4. Generated Error. In many of the newer automatic computing
machines, the “built-in” arithmetic operations are designed to yield
correct results only when the operands as well as the results are numbers
of the form (1.2.1), where λ is fixed, and σ = 0. Such a number von Neumann
and Goldstine call a "digital number." Any digital number is
therefore less than unity.
Given two digital numbers a* and b*, the machine will correctly form
the sum only if
    |a* + b*| < 1,
and will correctly form the difference if
    |a* − b*| < 1.

But if the one condition or the other holds, the machine will correctly
form the sum or difference, as the case may be, and no round-off is
generated. Hence if a* + b* or a* — b* is digital, a* + b* or a* — b*
can be formed, and formed correctly, without generating any new error.
If a* and b* are digital, then necessarily
    |a*b*| < 1.
However, the true product a*b* is a number of 2λ places. If the machine
holds only λ places, it will not form the true product a*b*, but a pseudo
product. It can be represented a* × b*. It may be that the machine
merely drops off the last λ places from a*b*. Or it may be that the
machine first forms a*b* + β^{-λ}/2 and then drops the last λ places. In
the latter event the pseudo product satisfies
(1.4.1)    |a*b* − a* × b*| ≤ ε = β^{-λ}/2,
where ε is introduced to simplify notation. Let us assume that (1.4.1)
holds.
For division, the quotient a*/b* will usually require infinitely many
places for its true representation, even though a* and b* are both digital.
The machine, however, can retain only the first λ, and it may compute the
first λ places of a*/b* and drop the rest, or it may compute the first λ
places of a*/b* + β^{-λ}/2. In the latter event the retained λ places
represent the pseudo quotient a* ÷ b*, which satisfies
(1.4.2)    |a*/b* − a* ÷ b*| ≤ ε.
Given a series of n products a*b* to be added together, we have
(1.4.3)    |Σa* × b* − Σa*b*| ≤ nε.
However, instead of recording each product as a digital number and adding
the results, it may be possible for the machine to retain and accumulate
the true products of a* and b*, rounding off the sum as a digital
number. This pseudo operation may be designated Σ*a*b*, and for this
we have
(1.4.4)    |Σ*a*b* − Σa*b*| ≤ ε.
While

(1.4.5)    a* × b* = b* × a*,

equalities in terms of arithmetic operations do not always hold strictly
when these are replaced by pseudo operations, as was already shown in
general. In particular,

    |(a* + b*) × c* − (a* × c* + b* × c*)| ≤ 3ε,

since each pseudo multiplication could give rise to an error ε. However,
the two quantities being compared can differ only by a digit in the last
place, which is to say by an integral multiple of 2ε. Consequently we
can improve this by saying that
(1.4.6)    |(a* + b*) × c* − (a* × c* + b* × c*)| ≤ 2ε.
In order to examine the effect of grouping in a continued product, we
note that
    a* × (b* × c*) − a*b*c* = [a* × (b* × c*) − a*(b* × c*)] + [a*(b* × c*) − a*b*c*],
so that
    |a* × (b* × c*) − a*b*c*| ≤ (1 + |a*|)ε < 2ε.
If we now interchange a and c and add results, we have
    |a* × (b* × c*) − (a* × b*) × c*| ≤ (2 + |a*| + |c*|)ε.
But if a* = c* = 1, the left-hand side is zero, and otherwise the left-hand
side is less than 4ε. But it must be an integral multiple of 2ε, so that
(1.4.7)    |a* × (b* × c*) − (a* × b*) × c*| ≤ 2ε.
The two pseudo products either agree or differ by one in the last place.
Finally consider
    (a* ÷ b*) × b* − a* = [(a* ÷ b*) × b* − (a* ÷ b*)b*] + (a* ÷ b* − a*/b*)b*.
Since this is less than (1 + |b*|)ε, it is actually less than 2ε, and since it is
an integral multiple of 2ε, it vanishes. Hence
(1.4.8)    (a* ÷ b*) × b* = a*.
However,
    (a* × b*) ÷ b* − a* = [(a* × b*) ÷ b* − (a* × b*)/b*] + [(a* × b*) − a*b*]/b*,
so that
(1.4.9)    |(a* × b*) ÷ b* − a*| ≤ (1 + |b*|^{-1})ε.
If |b*| is small, |b*|^{-1} will be large, and the error can be large. However,
bear in mind that this comparison is made for pseudo operations in which
the rounded-off product is used. If the machine retains the complete
product to use as the dividend, the error will not arise.
More generally, and in the same way, we have
(1.4.10)    |(a* ÷ b*) × c* − (a*/b*)c*| ≤ (1 + |c*|)ε,
while
(1.4.11)    |(a* × b*) ÷ c* − a*b*/c*| ≤ (1 + |c*|^{-1})ε.
Interchange of a* and c* in (1.4.10) gives
    |(c* ÷ b*) × a* − (c*/b*)a*| ≤ (1 + |a*|)ε,
so that by addition
    |(a* ÷ b*) × c* − (c* ÷ b*) × a*| ≤ (2 + |a*| + |c*|)ε.
But the left member vanishes when |a*| = |c*| = 1, and otherwise the
right member is less than 4ε. Hence in any event
(1.4.12)    |(a* ÷ b*) × c* − (c* ÷ b*) × a*| ≤ 2ε.
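The identities of this section can be checked by carrying out the pseudo operations exactly, for example with rational arithmetic (a sketch; the word length and the operands are assumptions, and round-to-nearest is used for both pseudo product and pseudo quotient, as in (1.4.1) and (1.4.2)):

    from fractions import Fraction

    LAM = 10
    EPS = Fraction(1, 2 ** (LAM + 1))          # epsilon = (1/2) * 2**-LAM

    def rnd(v):
        unit = 2 * EPS                          # one unit in the last place
        return Fraction(round(v / unit)) * unit

    def pmul(a, b):                             # a x b, |ab - a x b| <= EPS
        return rnd(a * b)

    def pdiv(a, b):                             # a ÷ b, |a/b - a ÷ b| <= EPS
        return rnd(a / b)

    a, b = rnd(Fraction(3, 11)), rnd(Fraction(7, 9))    # digital numbers, |a| < |b| < 1
    print(pmul(pdiv(a, b), b) == a)                     # (1.4.8): (a ÷ b) x b = a
    lhs, rhs = pmul(a, pmul(b, a)), pmul(pmul(a, b), a)
    print(abs(lhs - rhs) <= 2 * EPS)                    # (1.4.7): the groupings differ by
                                                        # at most one unit in the last place

Both statements print True here; (1.4.8) presupposes that the division is possible, i.e., that a ÷ b is itself a digital number.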
1.5. Complete Error Analyses. It was shown above that the magni-
tude of the error in the result of any subroutine cannot exceed the sum
of the magnitudes of the propagated, generated, and residual errors.
In particular instances a routine can be devised such that the error due
to one of two sources necessarily is of one sign and that due to the other
is of the opposite sign. In such a situation the resultant error cannot
exceed the larger of the two individual errors. But to devise a routine
having this property, or even to discover that a given routine has it, may
take a disproportionate amount of time and not be worth the effort.
Generally one must assume that the errors can build up, and attempt to
devise a routine that will keep all errors as small as possible. And in any
event one must somehow balance the time to be spent on an analysis
against the time required for the computation and the allowed tolerances.
Some computations are of such frequent occurrence that it may be
worth while devoting considerable time and effort to the design of an
optimal routine and a precise error analysis for that routine. We give an
example or two.
Consider a binary computing machine with λ magnitude digits and one
sign digit, representing only numbers of magnitude less than unity. This
machine will compute a (2λ)-digit product and accept a (2λ)-digit dividend,
but the digital pseudo products and pseudo quotients will be
supposed to satisfy, in general,
    0 ≤ ab − a × b < 2^{-λ},
    0 ≤ a/b − a ÷ b < 2^{-λ},
the latter relation presupposing the division to be possible.
We require an optimal routine and precise error analysis for √a, using
Newton's method. This means that the number a whose square root is
required is a digital number, 0 ≤ a < 1, and the routine is to yield a
digital number x for which the maximal |x − √a| is to be as small as we
can make it. Indeed if the routine is properly constructed, then there
should be some half-open interval of length 2^{-λ} which contains both
√a and x.
By Newton’s method one takes some positive x, < 1 and forms
CSL). vii = wv, — (x — a/x)/2,
and the sequence can be shown to approach +/a as a limit. Moreover, if
one takes x} = 1, then surely
(1.5.2) zi, > Va
when 7 = 0, and one can show inductively that this relation holds for
every 2. In fact, one verifies directly that
Uy — @ = (vw, — a/x,)?/4.
Nevertheless, the sequence one actually forms in the machine is not
strictly defined by (1.5.1), but instead by
(1.5.3) tin = Ui — (4 — a + 4) + 2.
(All numbers will be digital numbers, and the asterisk can be omitted.)
There is no a priori assurance that the numbers 2; will satisfy (1.5.2), and
this point must be investigated.
We first show that, if y and z are any two digital numbers satisfying
(1.5.4) (@-a+z2)+2>(y-—a+y)+2>0,
then

(1.5.5) z>y> va.


The second inequality in (1.5.4) is equivalent to
    (y − a ÷ y) ÷ 2 ≥ 2^{-λ},
whence
    y − a ÷ y ≥ 2^{-λ+1},
    y − a/y > 2^{-λ+1} − 2^{-λ} > 0,    y > a/y,
whence y^2 > a, which proves the second inequality in (1.5.5). By the
first inequality in (1.5.4),
    (z − a ÷ z) ÷ 2 ≥ (y − a ÷ y) ÷ 2 + 2^{-λ}.
But
    (z − a ÷ z)/2 ≥ (z − a ÷ z) ÷ 2,
    (y − a ÷ y) ÷ 2 ≥ (y − a ÷ y)/2 − 2^{-λ-1},
since in forming the pseudo quotient by 2 (which is a single shift to the
right) the error is either 0 or 2^{-λ-1}. Hence
    (z − a ÷ z)/2 ≥ (y − a ÷ y)/2 + 2^{-λ-1},
    z − a ÷ z ≥ y − a ÷ y + 2^{-λ}.
Again,
    a ÷ z − a/z > −2^{-λ},
    a ÷ y − a/y ≤ 0,
whence
    z − a/z > y − a/y.
But the function f(z) = z − a/z is properly monotonically increasing.
Hence f(z) > f(y) implies the first inequality in (1.5.5).
This implies, in particular, that, if
    (x_i − a ÷ x_i) ÷ 2 = x_i − x_{i+1} > 0,
then x_i > √a. If it should happen that
(1.5.6)    (x_i − a ÷ x_i) ÷ 2 ≤ 0,
then clearly we should take x_i and not x_{i+1} as x. On the other hand, if
(1.5.7)    (x_{i-1} − a ÷ x_{i-1}) ÷ 2 ≥ 2^{-λ},
we shall take at least one more step in the iteration.


Suppose the equality holds in (1.5.7). Then
    x_{i-1} − x_i = 2^{-λ},
    (x_{i-1} − a ÷ x_{i-1})/2 ≥ (x_{i-1} − a ÷ x_{i-1}) ÷ 2 = 2^{-λ},
whence
    x_{i-1} − a ÷ x_{i-1} ≥ 2^{-λ+1}.
Hence
    a/x_{i-1} < a ÷ x_{i-1} + 2^{-λ} ≤ x_{i-1} − 2^{-λ} = x_i,
    a < x_{i-1}x_i = x_i(x_i + 2^{-λ}),
    x_i − a/x_i > −2^{-λ}.
This holds a fortiori if the inequality holds in (1.5.7). Hence in all cases
    x − a/x > −2^{-λ},
    x^2 + 2^{-λ}x > a,
(1.5.8)    x > (a + 2^{-2λ-2})^{1/2} − 2^{-λ-1}.

This gives a lower bound for the computed value.


Next suppose
    (x_i − a ÷ x_i) ÷ 2 ≤ 0.
Then
    x_i − a ÷ x_i ≤ 2^{-λ},
    x_i ≤ a ÷ x_i + 2^{-λ} ≤ a/x_i + 2^{-λ}.
Hence in all cases
    x ≤ a/x + 2^{-λ},
    (x − 2^{-λ-1})^2 ≤ a + 2^{-2λ-2},
(1.5.9)    x ≤ (a + 2^{-2λ-2})^{1/2} + 2^{-λ-1}.

Inequalities (1.5.8) and (1.5.9) define a half-open interval of length 2^{-λ}
and center (a + 2^{-2λ-2})^{1/2} upon which x must lie. These inequalities can
be written

(1.5.10)    (x^2 − 2^{-λ}x)^{1/2} ≤ √a < (x^2 + 2^{-λ}x)^{1/2}.

In the worst case, when a = 0, x = 2^{-λ}, and the error is 2^{-λ}. In all other
cases the error is less. The case a = 0, x = 2^{-λ} could arise in machine
computation when a is an intermediate result, obtained from previous
computation.
If a itself can be in error by an amount α, then by §1.3 the propagated
error is approximately α/(2√a), when a ≠ 0. Hence the total error
cannot exceed 2^{-λ} + α/(2√a), when a ≠ 0. If α represents the maximum
possible difference between a + 2^{-2λ-2} and the true value for a, then
the maximum error can be written as 2^{-λ-1} + α/(2x).
This case is especially favorable because of the fact that round-off
errors do not build up in the course of the computation. At each step
the best available approximation is used as the basis for obtaining a
better one, and one continues to get improvement until a stage is reached
at which the error inherent in a single step is as great as the truncation
error.
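The routine just analyzed is easily simulated exactly, chopping to λ binary places as the machine described above does (a sketch; λ = 12 and the value of a are assumptions of the illustration):

    from fractions import Fraction

    LAM = 12
    UNIT = Fraction(1, 2 ** LAM)

    def chop(v):                                # drop everything beyond LAM binary places
        return Fraction(int(v / UNIT)) * UNIT   # int() truncates toward zero

    def pdiv(a, b):                             # pseudo quotient, 0 <= a/b - a ÷ b < UNIT
        return chop(a / b)

    def sqrt_routine(a):
        # the iteration (1.5.3), x_{i+1} = x_i - (x_i - a ÷ x_i) ÷ 2, with x_0 = 1,
        # stopped as soon as the computed step is no longer positive, cf. (1.5.6)
        x = Fraction(1)
        while True:
            step = pdiv(x - pdiv(a, x), 2)
            if step <= 0:
                return x
            x = x - step

    a = chop(Fraction(3, 10))
    x = sqrt_routine(a)
    print(float(x), x * x - UNIT * x <= a < x * x + UNIT * x)   # the bound (1.5.10)

For a = 0 the routine terminates at x = 2^{-λ}, the worst case noted above.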
By way of comparison, consider a computation based upon Taylor's
series, where the round-off can accumulate. We require the evaluation
of both sin x and cos x, for |x| ≤ π/4 < 1, to be followed by a check
based upon the identity sin^2 x + cos^2 x = 1. Clearly the computed values
of sin x and cos x will not necessarily satisfy the identity strictly. How
close can we get to the true values of sin x and cos x, and how closely can
we expect our approximations to satisfy the identity?
We shall, in fact, describe a routine for computing s, an approximation
to sin x, and c, an approximation to 1 − cos x. Let w_1 = x, a digital
number, and
    w_n = (x/n)w_{n-1},
    w*_n = (x ÷ n) × w*_{n-1}.
For these operations it is assumed that
    |a*b* − a* × b*| ≤ 2^{-40}.
The terms w_n are the terms which appear in the expansions, while the w*_n are
the terms we actually obtain in the computation. For this machine
λ = 39, and the machine accepts a (2λ)-digit dividend, so that divisions
x ÷ n are performed by dividing 2x by 2n. Let
    e_n = 2^{40}|w_n − w*_n|.
Then
    w*_n − w_n = [(x ÷ n) × w*_{n-1} − (x ÷ n)w*_{n-1}] + [(x ÷ n) − (x/n)]w*_{n-1}
                 + (x/n)(w*_{n-1} − w_{n-1}).
The division steps satisfy
    0 ≤ x/n − x ÷ n < 2^{-39}(n − 1)/n.
Also
    |w_n| ≤ 1/n!.
Hence
    e_n < 1 + 2(n − 1)/n! + e_{n-1}/n.
The residual error is less than the first neglected term, and for n ≥ 15,
|w_n| < 2^{-40} = 2^{-λ-1}. Hence on solving recursively and adding the
generated errors (the e's) and the residual errors, we have
    |c − (1 − cos x)| < 1.197 × 2^{-37},
    |s − sin x| < 1.140 × 2^{-37}.
For the check let
    cos x = 1 − c + e,    sin x = s + e′,
where e and e′ are bounded by the right members of the above inequalities.
Then
    2c − c^2 − s^2 = 2e cos x + 2e′ sin x − e^2 − e′^2.
Hence
    |2c − c^2 − s^2| ≤ 2e′(cos x + sin |x|) + 2(e − e′) cos x
                    < 2e′ · 2^{1/2} + 2(e − e′)
                    < 1.669 × 2^{-36}.
Hence
    |2c − c × c − s × s| ≤ |2c − c^2 − s^2| + |c × c − c^2| + |s × s − s^2|
                         < 1.669 × 2^{-36} + 2^{-39}.

Thus in applying the check, we compute a quantity which should vanish
if there were no errors due to truncation or round-off, but on the basis of
this analysis we can say only that the computed value must be less than
2^{-35}. If a larger value occurs, it must be due to blunders.
This analysis shows only that the quantity computed for the check
cannot be so great as 2^{-35} in magnitude. It does not show that quantities
as great as 15 · 2^{-40} could, in fact, occur, nor even quantities as great as
2^{-40}. Hence a more detailed analysis, which pays attention to the possible
signs of the errors at each step, might yield a somewhat smaller
bound.
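In modern double-precision arithmetic the routine and its check read as follows (a sketch; the particular value of x is an assumption, and the error bounds of the 39-digit machine do not carry over):

    import math

    x = 0.6                                   # some |x| <= pi/4
    w = x                                     # w_1 = x
    s, c = x, 0.0                             # partial sums for sin x and 1 - cos x
    for n in range(2, 16):                    # terms w_2, ..., w_15
        w = (x / n) * w                       # w_n = (x/n) w_{n-1} = x**n / n!
        term = w if ((n - 1) // 2) % 2 == 0 else -w
        if n % 2 == 0:
            c += term                         # 1 - cos x = w_2 - w_4 + w_6 - ...
        else:
            s += term                         # sin x = w_1 - w_3 + w_5 - ...

    check = 2 * c - c * c - s * s             # vanishes apart from truncation and round-off
    print(s - math.sin(x), (1 - c) - math.cos(x), check)

The printed quantities are small but not zero, exactly as the analysis leads one to expect.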
1.6. Statistical Estimates of Error. The discussion in §1.3 shows what
is intuitively obvious to begin with: that given n numbers, each of which
can be in error by as much as ε, their sum can be in error by as much as
nε. However, the occurrence of this greatest possible error will be
extremely infrequent in practice. Even assuming that the error in each
term is maximal, which is improbable in itself, the probability is only 2^{-n}
that the maximal error of nε would occur in the sum, since that would
require that the individual errors all be of the same sign. If one can
assert of each term x in the sum that its error can have any value between
0 and ε, say with uniform probability, then the probability of occurrence
of the maximal error nε becomes much smaller than 2^{-n}.
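A small sampling experiment makes the point concrete (a sketch; the number of terms, the error distribution, and the count of trials are assumptions):

    import random

    random.seed(0)
    n, eps = 100, 1e-6                 # n terms, each in error by something between 0 and eps
    bound = n * eps                    # the worst case: every error maximal
    largest = max(sum(random.uniform(0.0, eps) for _ in range(n))
                  for _ in range(10000))
    print("worst-case bound:", bound, "largest accumulated error observed:", largest)

The largest accumulated error observed in many trials falls well short of the worst-case bound nε.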
As with sums, so with any other computation or sequence of computa-
tions, the probability of occurrence of a maximal error may be extremely
small. It is reasonable to inquire, therefore, as to the probability that
the accumulated error in a given computation will exceed some assigned
limit. A probabilistic approach is the more clearly indicated if one con-
siders the fact that limits of error in measured quantities can seldom be
assigned with certainty. At best one can assign a probability to the
assertion that the error of measurement does not exceed some given
amount.
In principle a statistical estimate of errors can be made by going
through the same steps as in a strict estimate, except that at each step one
requires a distribution of errors in the data and seeks a distribution,
rather than strict limits, for the errors in the result. Unfortunately,
besides the fact that the computation of these distributions is intrinsically
difficult, questions of statistical independence are especially troublesome.
Consequently we mention this approach only in passing and point out
that there is a growing literature on the subject, a few titles of which are
listed among the references.
1.7. Bibliographic Notes. The subject of errors is given at least
casual discussion in most standard texts, either as a separate topic or in
connection with particular computations. Most papers in the periodicals
deal with errors in particular computations. The four sources of error
are distinguished, digital numbers defined, and the error formulas of §1.4
are given in von Neumann and Goldstine (1947), where the major topic
is errors in matrix computations. Turing (1948) also discusses matrix
operations in some detail. Rademacher (1947) and Harrison (1951)
discuss errors in the numerical solution of differential equations, but these
papers are also of more general interest. Dwyer (1951) discusses errors
at length.
On statistical assessments Inman (1950) discusses the problem in
general, Rademacher (1947) makes applications to differential equations,
while Huskey (1949) finds a failure which is explained by Hartree.
Goldstine and von Neumann (1951) give statistical estimates in matrix
computations.
Papers yet to be published by Goldstine, Murray and von Neumann,
and J. W. Givens, each concerned with the problem of finding proper
values of matrices, will contain elaborate error analyses. It seems safe
to predict that an increasing number of detailed analyses of specific
routines, such as the ones given here for the square root and the circular
functions, will be issued for limited distribution by groups operating high-
speed computing machines.
On the operation of automatic digital computers see Berkeley (1949),
Wilkes, Wheeler and Gill (1951), Engineering Research Associates (1950),
and the (Harvard) Computation Laboratory (1946, 1949). In addition
to these references, which are listed in the bibliography, there are numer-
ous reports and memoranda issued by organizations which build or
operate particular machines: IBM Corporation, MIT, Harvard Com-
putation Laboratory, University of Illinois, NBS, Institute for Advanced
Study, and a number of others. A section of the periodical MTAC is
devoted to electronic computers. And finally, abstracts or complete
papers presented at meetings of the Association for Computing Machinery
are obtainable.
CHAPTER 2

MATRICES AND LINEAR EQUATIONS

2. Matrices and Linear Equations


The numerical solution of an integral equation, of a partial differential
equation, or of an ordinary differential equation with two-point boundary
conditions is generally obtained by solving an approximating linear
algebraic system. Moreover in order to solve a nonlinear problem, one
may replace it by a sequence of linear systems providing progressively
improved approximations. For these reasons, and because of the theo-
retical simplicity, we start with linear systems of equations. For study-
ing linear systems of equations a geometric terminology, with the compact
symbolism of vectors and matrices, is extremely useful. A résumé of the
basic principles is therefore included.
2.01. Vectors and Coordinate Systems. Any n vectors e_1, . . . , e_n are
said to be linearly dependent in case any of them can be expressed as a
linear combination of the others. A more symmetric statement of the
same property is that there are scalars α_1, . . . , α_n, not all zero, satisfying

    α_1e_1 + α_2e_2 + ⋯ + α_ne_n = 0.

The equivalence of the two statements is made clear by considering that,
if α_i ≠ 0, then we could solve for e_i in terms of the other vectors. As an
example, two vectors are linearly dependent in case they are parallel, or
if one is the null vector.
A vector space is n-dimensional in case there are n linearly independent
vectors in the space, but any n + 1 vectors are linearly dependent. Let
e_1, . . . , e_n be linearly independent, and let x be any vector in the space.
These n + 1 vectors are linearly dependent. Hence we can find scalars
ξ′, ξ′_1, . . . , ξ′_n, not all zero, such that

    ξ′x + ξ′_1e_1 + ⋯ + ξ′_ne_n = 0.

But certainly, then, ξ′ ≠ 0, since if we had ξ′ = 0, the relation would
express the linear dependence of the set of e's, whereas they are taken to
be linearly independent. We can therefore solve for x and write

(2.01.1)    x = Σ_i ξ_ie_i.

Hence every vector of the space is expressible as a linear combination of


any n linearly independent vectors of the space; these vectors constitute
a basis or coordinate system for the space; the numbers &; are the coordi-
nates of x in that system, and the vectors é,e; its components. The set
of n coordinates £; constitutes a ‘‘numerical vector” «. The numerical
vector x specifies x completely in a given coordinate system. The indi-
vidual £; will be called the elements of x.
Suppose f_1, . . . , f_n are also linearly independent vectors in the space.
Each f_j is expressible as a linear combination of e_1, . . . , e_n:

(2.01.2)    f_j = Σ_i e_iε′_{ij}.

But also each e_j is expressible as a linear combination of f_1, . . . , f_n:

(2.01.3)    e_j = Σ_i f_iε_{ij},

and either set of relations must be obtainable from the other by treating
it as a set of n equations in n unknowns and solving in the usual manner.
If, in Eq. (2.01.1), we replace each e_i by its expression (2.01.3), we
obtain

(2.01.4)    x = Σ_{i,j} f_iε_{ij}ξ_j.

Hence if

(2.01.5)    ξ′_i = Σ_j ε_{ij}ξ_j,

the ξ′_i are the coordinates of the geometric vector x in the f_i coordinate
system, and the set of coordinates ξ′_i constitutes the numerical vector x′
which represents x in that system of coordinates.
It is convenient to arrange the coefficients ε_{ij} in the rectangular array

(2.01.6)    ε_{11}  ε_{12}  ⋯  ε_{1n}
            ε_{21}  ε_{22}  ⋯  ε_{2n}
            ⋯⋯⋯⋯⋯⋯⋯⋯⋯⋯
            ε_{n1}  ε_{n2}  ⋯  ε_{nn}

and the coordinates ξ_i in the column

(2.01.7)    ξ_1
            ξ_2
            ⋯
            ξ_n

with the coordinates ξ′_i similarly arranged. These arrangements permit
a simple rule for obtaining each ξ′_i from the ε's and ξ's. The rule is more
easily observed, by referring to (2.01.5), than stated. The array (2.01.6)
is called a matrix, and the column (2.01.7) a (numerical) vector, and we
designate these E and x, respectively. The rule, then, is the rule for
multiplying a matrix by a vector,
(2.01.8)    x′ = Ex,
where x′ is the column of the ξ′_i.
Besides the vectors e_i and f_i, let g_1, . . . , g_n represent also a set of
linearly independent vectors in the same space, and let

    f_j = Σ_i g_iφ_{ij}.

Then

    e_j = Σ_i f_iε_{ij} = Σ_{k,i} g_kφ_{ki}ε_{ij}.

Hence if F represents the matrix of the φ_{ki}, and if

(2.01.9)    π_{kj} = Σ_i φ_{ki}ε_{ij},

we shall say that P, the matrix of the π_{kj}, is the product of the matrices F and E:

(2.01.10)    P = FE,
with (2.01.9) giving the rule for multiplication. This is consistent with
the rule for multiplying a matrix by a vector. Note that the product
EF is, in general, not the same as the product FE.
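The rule (2.01.9) and the failure of commutativity are exhibited by a short computation (a sketch in modern notation; the two matrices are arbitrary assumptions):

    def matmul(F, E):
        # the product P = FE by the rule (2.01.9): P[k][j] = sum over i of F[k][i]*E[i][j]
        n = len(F)
        return [[sum(F[k][i] * E[i][j] for i in range(n)) for j in range(n)]
                for k in range(n)]

    E = [[1.0, 2.0], [0.0, 1.0]]
    F = [[1.0, 0.0], [3.0, 1.0]]
    print(matmul(F, E))                # FE
    print(matmul(E, F))                # EF, which differs from FE

Here FE and EF are distinct, although each is formed from the same two matrices.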
In the particular case when
    g_i = e_i,
then
    π_{ki} = δ_{ki},
and it must then be true that

    Σ_i φ_{ki}ε_{ij} = δ_{kj},

where δ_{ij} is the Kronecker delta, defined by

(2.01.11)    δ_{ki} = 0 when k ≠ i,
             δ_{ki} = 1 when k = i.

In this event the matrix P has the simple form

(2.01.12)    1  0  ⋯  0
             0  1  ⋯  0
             ⋯⋯⋯⋯⋯
             0  0  ⋯  1

and is called the identity matrix I. Since, in that case,
    I = FE,
one says that the matrices F and E are reciprocals:
    F = E^{-1}.
This is one case in which the order of multiplication is immaterial:
    I = E^{-1}E = EE^{-1}.
The matrices introduced so far have been square matrices, but matrices
may also be rectangular, and in particular the vectors x and x′ discussed
above may be regarded as matrices of n rows and one column each. We
may also have a matrix of one row and n columns. Such a matrix would
be called a row vector, while the vectors x and x′ are column vectors.
We can extend the notational scheme by writing

(2.01.13)    e = (e_1  e_2  ⋯  e_n),

and treating e formally as though it were a (numerical) row vector,
which enables us to write (2.01.3) in abbreviated form

(2.01.14)    e = fE.

Then, also,
    f = gF,
whence, by formal substitution,
    e = gFE.
This is consistent with previous results, where we found that
    e = gP
with
    P = FE.
2.02. Linear Transformations. A transformation of vectors (as dis-
tinguished from a change of coordinates) is a natural generalization of the
notion of a function. A transformation of vectors is a rule which, to
every vector of the space under consideration, associates a unique vector
in the space, and this vector is called its transform. The transformation
is linear if to the sum of two vectors corresponds the sum of the trans-
forms and to a scalar multiple of a vector corresponds the same scalar
multiple of the transform. In many contexts, especially in discussions of
quantum mechanics, it is customary to speak of the transformation
as being an operation performed by an operator upon the vector and
yielding the transformed vector (the result of the transformation).
An equivalent definition, and the one which will be used here, is the
following: If T(x) designates the transform of x, then the transformation
is linear in case
    T(x) = Σ ξ_iT(e_i),
when x is given by (2.01.1). In accordance with the abbreviated notation
this can be written
(2.02.1)    T(x) = T(e)x.
Since each T(e_j) is in the space of the e_i, it can be expressed
    T(e_j) = Σ_i e_iτ_{ij},
or in abbreviated form,
(2.02.2)    T(e) = eT,
where T is the matrix
(2.02.3)    T = (τ_{ij}).
But from (2.02.1)
(2.02.4)    T(x) = eTx.
Hence Tx is the numerical vector representing T(x) in the coordinate
system of the e_i, and the matrix T represents the transformation (more
strictly, the operator) in that same coordinate system.
In a different coordinate system, however, x, T(x), and the transformation
itself are otherwise represented. In fact, if
    x = ex,
and
    e = fE,
then
    x = fEx = fx′,
so that, as we have already seen, x′ = Ex represents x in the coordinate
system f. Now
    T(x) = eTx = fETx,
and
    x = E^{-1}x′.
Hence
    T(x) = fETE^{-1}x′.
Hence

(2.02.5)    T′ = ETE^{-1}

is the matrix which represents the transformation in the coordinate
system f, since this is the matrix which, when applied to the numerical
vector x′ (which represents x in that system), will yield the numerical
vector representing T(x) in that system.
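Relation (2.02.5) may be verified numerically (a sketch; the matrices E and T and the vector are assumptions, and the numpy library is used for the inverse):

    import numpy as np

    E = np.array([[2.0, 1.0], [1.0, 1.0]])    # change of coordinates, e = fE
    T = np.array([[0.0, -1.0], [1.0, 0.0]])   # the operator in the coordinate system e
    T_prime = E @ T @ np.linalg.inv(E)        # its representation in the system f, (2.02.5)

    x = np.array([3.0, 4.0])                  # a vector given by its coordinates in e
    x_prime = E @ x                           # the same vector in f, x' = Ex
    print(np.allclose(E @ (T @ x), T_prime @ x_prime))

Transforming in the e system and then changing coordinates gives the same numerical vector as changing coordinates first and then applying T′.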
2.03. Determinants and Outer Products. An outer product of two
vectors a and b is a new type of geometric entity, defined to have the
following properties:
(1_2)    [a, b] = −[b, a];
(2_2)    [αa, b] = α[a, b];
(3_2)    [a, b] + [a, c] = [a, b + c].
It can be pictured as a two-dimensional vector, whose magnitude, taken
positively or negatively, is the area of the parallelogram determined by
the vectors in the product. It follows immediately from (1_2) that
    [a, a] = 0.
Hence
    [a, b] = [a, b] + α[a, a] = [a, b] + [a, αa] = [a, b + αa].
Hence for any scalars α and β,
    [a, b] = [a, b + αa] = [a + βb, b].
If e_1, e_2 are any linearly independent vectors in the space of a and b,
then

(2.03.1)    [a, b] = |a  b| [e_1, e_2],

where

(2.03.2)    |a  b| = α_1β_2 − α_2β_1

is called the determinant of the numerical vectors a and b. The evaluation
is immediate:
    [a, b] = [a, β_1e_1 + β_2e_2] = β_1[a, e_1] + β_2[a, e_2]
           = β_1[α_1e_1 + α_2e_2, e_1] + β_2[α_1e_1 + α_2e_2, e_2]
           = β_1α_2[e_2, e_1] + β_2α_1[e_1, e_2]
           = (α_1β_2 − α_2β_1)[e_1, e_2].
The determinant is a number, and its relation to the outer product is
similar to that of the coordinates to a vector.
It is a simple geometric exercise to show that (3_2) holds in the parallelogram
interpretation when a, b, and c are all in the same 2-space.
When they are not, the relation serves to specify the rule of composition.
For outer products of n vectors the defining relations are sufficiently
typified by the case n = 3:
(1_3)    [a, b, c] = −[a, c, b] = −[b, a, c] = ⋯ ;
(2_3)    [αa, b, c] = α[a, b, c];
(3_3)    [a, b, c] + [a, b, d] = [a, b, c + d].
From these we deduce that
    [a, a, c] = ⋯ = 0;
    [a, b, c] = [a + βb, b, c] = [a, b, c + αa] = ⋯ ;
and if e_1, e_2, e_3 are linearly independent vectors in the space of a, b, and
c, then

(2.03.3)    [a, b, c] = |a  b  c| [e_1, e_2, e_3],

where |a  b  c| is called the determinant of the numerical vectors a, b,
and c, and its value will now be obtained. Note first that, if

    a′ = α_1e_1 + α_2e_2,    b′ = β_1e_1 + β_2e_2,

then
    [a, b, e_3] = [a′, b′, e_3] = (α_1β_2 − α_2β_1)[e_1, e_2, e_3].

In fact, the identical steps that led to (2.03.1) and (2.03.2) will, if applied
to the second member of this last equality, yield the third member.
Now by an obvious modification we obtain

    [a, b, e_2] = (α_3β_1 − α_1β_3)[e_1, e_2, e_3],

and again
    [a, b, e_1] = (α_2β_3 − α_3β_2)[e_1, e_2, e_3].

By putting these together we obtain

(2.03.4)    |a  b  c| = γ_1(α_2β_3 − α_3β_2) + γ_2(α_3β_1 − α_1β_3) + γ_3(α_1β_2 − α_2β_1).

When we write the determinant explicitly in the form

    |a  b  c| = | α_1  β_1  γ_1 |
                | α_2  β_2  γ_2 |
                | α_3  β_3  γ_3 |

we see that in the expansion of the determinant in terms of the γ's, the
coefficient of each γ_i is, except for sign, that second-order determinant
that remains after deleting the row and column containing γ_i. The sign
is that power of −1 whose exponent is obtained by adding together the
number of the row and column. Thus γ_2 is in the second row, third
column, and the sign is (−1)^{2+3}. The coefficient of γ_i with its proper
sign is called the cofactor of γ_i. By interchanging rows and columns and
going through the same process, we find

    |a  b  c| = α_1A_1 + α_2A_2 + α_3A_3
              = β_1B_1 + β_2B_2 + β_3B_3
              = γ_1Γ_1 + γ_2Γ_2 + γ_3Γ_3,

where the capital letters signify cofactors formed by the same rule. It is
also true that, for example,
    |a  b  a| = α_1Γ_1 + α_2Γ_2 + α_3Γ_3 = 0,
and, in general, when the elements of any column are multiplied by the
cofactors of some other column and the products summed, the sum
vanishes.
Finally, we have the expansions
    |a  b  c| = α_1A_1 + β_1B_1 + γ_1Γ_1
              = α_2A_2 + β_2B_2 + γ_2Γ_2
              = α_3A_3 + β_3B_3 + γ_3Γ_3.

These are the expansions we should get if we were to rewrite the deter-
minant, writing the rows of the original as columns of the new one, and
these equations say in effect that this exchange of rows for columns leaves
the value of the determinant unaltered. It is quite clear that such an
exchange leaves the value of a second-order determinant unaltered.
Hence it is clear that in the expansion of either third-order determinant,
the original or the transposed, the coefficient of any element is the same.
The theorem follows because, when the determinant is expressed as an
explicit function of its elements, each term contains as a factor one and
only one element from each row and one and only one element from each
column.
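The expansion (2.03.4) and the invariance under the exchange of rows and columns can be checked directly (a sketch; the numerical columns are assumptions):

    def det3(a, b, c):
        # |a b c| by (2.03.4); a, b, c are the columns (alpha, beta, gamma)
        (a1, a2, a3), (b1, b2, b3), (c1, c2, c3) = a, b, c
        return (c1 * (a2 * b3 - a3 * b2)
                + c2 * (a3 * b1 - a1 * b3)
                + c3 * (a1 * b2 - a2 * b1))

    a, b, c = (1.0, 4.0, 2.0), (0.0, 3.0, 5.0), (2.0, 1.0, 1.0)
    rows = list(zip(a, b, c))          # rows of the array, taken as columns of the transpose
    print(det3(a, b, c), det3(*rows))  # the two values are equal

Both calls return the same number, as the argument above requires.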
The recursive extension to successively higher dimensions can be made
by following the same pattern, and the formulas need not be written
explicitly. For each extension the expansion is made first along a par-
ticular column, and one observes that it is equally possible to expand
along any row.
For determinants of order 4 or greater another type of expansion is
possible, called the Laplace expansion. To describe this it is convenient
to introduce the symbolism

a; Bil
|ox; B;| i
a; B;
If, now,
& = a1€1 + ar€e + aze3 + ass,
Ore e, Te ee en eine Memes: om euvied el he os: we) Late

d = 6,€; + d2€2 + 53@3 + Sues,

then we can see that

[a, b, €3, e.| ae lou B2\(e1, G2, Cs, e4],


and
[e:, G2, C, d] = lvs 5,|[e1, C2, €3, e,).
Hence we shall find that in the expansion of |a  b  c  d| there will
appear terms which make up the product |α_1  β_2||γ_3  δ_4|. But if we
interchange e_2 and e_3, say, we shall find that there are also terms making
up the product −|α_1  β_3||γ_2  δ_4|, and there is no term common to the
two products. When all possible interchanges are made that yield
distinct products, we find

    |a  b  c  d| = |α_1  β_2||γ_3  δ_4| − |α_1  β_3||γ_2  δ_4| + |α_1  β_4||γ_2  δ_3|
                   + |α_2  β_3||γ_1  δ_4| − |α_2  β_4||γ_1  δ_3| + |α_3  β_4||γ_1  δ_2|.

To determine the sign in each case we note, for example, that in the
expansion |α_3  β_4||γ_1  δ_2| would appear as the coefficient of [e_3, e_4, e_1, e_2],
but that this is equal to [e_1, e_2, e_3, e_4], whence the sign is plus.
Note finally that the determinant of the product of two matrices is the
product of the determinants. In fact, if
    e = fE,    f = gF,
then
    e = gFE.
But on the one hand,
    [e_1, . . . , e_n] = |E| [f_1, . . . , f_n] = |E||F| [g_1, . . . , g_n],
and on the other hand,
    [e_1, . . . , e_n] = |FE| [g_1, . . . , g_n].
Hence
    |FE| = |F||E|.
2.04. Length and Orthogonality. Geometrically the scalar product
xy of the vectors x and y is equal to the product of their lengths into
the cosine of the angle between them, or the product of the length of one
vector by the projection of the other upon it. It is clear geometrically
that the projection of a broken line upon a given line is the sum of the
projections of the separate segments and is equal also to the projection
of the single segment which joins the two ends of the broken line. Hence,
if x is given by (2.01.1), its projection upon y is the sum of the projections
of the segments £e; upon y. Hence

xy = (Σ ξ_i e_i) y.

But by the same rule, if

y = Σ η_j e_j,

then

xy = Σ η_j (x e_j),

and therefore

(2.04.1)    xy = ΣΣ ξ_i η_j e_i e_j.
Now the scalar products

(2.04.2)    γ_ij = e_i e_j = γ_ji

are known once the vectors e_i are themselves known.  Hence the scalar
product of any two vectors can be calculated from (2.04.1).
If each column of a matrix M is written as a row, the order remaining
the same, the resulting matrix is known as the transpose of the original
and is designated M^T.  In particular, if x is the column vector of the ξ_i,
x^T is the row vector of the ξ_i.  With this understanding, if G is the matrix

(2.04.3)    G = (γ_ij) = e^T e,

this is said to define the metric in the space, and Eq. (2.04.1) becomes

(2.04.4)    xy = x^T G y = y^T G x.

The matrix G is equal to its own transpose and is said to be symmetric.
When the metric G is known and fixed throughout the discussion, one
often uses the notation

(2.04.5)    (x, y) = (y, x) = x^T G y.

An often-used type of coordinate system is one in which each reference
vector e_i is of unit length and orthogonal to all the others.  In this case

G = I,        xy = x^T y = Σ ξ_i η_i.
With reference to the coordinate system f given by (2.01.14), the
metric is defined by
H = f^T f.

Now

G = e^T e = E^T f^T f E,

(2.04.6)    G = E^T H E.

This relates the metrics for the two coordinate systems.
If both e and f are unit-orthogonal systems, then

G = H = I,

and hence

I = E^T E.

In this case

E^T = E^{-1},

and the matrix E is said to be orthogonal.


When G is not the identity matrix, then in the process of evaluating a
scalar product of two vectors x and y, given x and y, one must form either
Gx or Gy.  Since if one knows the vector

(2.04.7)    x′ = Gx,

one can always find x by solving the equations, knowing x′ is (in principle)
equivalent to knowing x.  Hence x′ is also a representation of x, but of
a different kind from x.  It is customary to speak of x′ as giving the
covariant representation or as being a covariant vector, and of x as giving
the contravariant representation or as being a contravariant vector.  If
y′ is the covariant representation of y, then

xy = x^T y′ = x′^T y = y^T x′ = y′^T x.
In case x and y are the same vector, the scalar product is the square of
the length:
xx = |x|^2 = x^T G x.

Since any numerical vector x represents some geometric vector x, it
follows that always

x^T G x ≥ 0,

and the equality can hold only when x is the null vector, x = 0.  By
virtue of this property of the matrix G, it is said to be positive definite.
The vectors e_i can always be referred to a unit-orthogonal system f, and
in this case (2.04.6) becomes

G = E^T E,
since H is then the identity. It will be shown in §2.201 that any positive
definite matrix can be expressed as a product of a matrix by its transpose.
Hence any positive definite matrix represents a metric in some coordinate
system.
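
As an aside (not part of the original text), the factorization just described can be illustrated numerically: any positive definite G can be written as E^T E, for instance with E taken from a Cholesky factor.  A minimal NumPy sketch, with G an arbitrary example matrix:

```python
import numpy as np

# Sketch (assumption: G is an arbitrary positive definite example).
G = np.array([[4.0, 2.0, 0.0],
              [2.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

L = np.linalg.cholesky(G)       # G = L L^T, L lower triangular
E = L.T                         # so G = E^T E
print(np.allclose(E.T @ E, G))  # True: G is the metric of the system E
```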
There are always several geometric settings, any one of which could
give rise to a given set of linear algebraic equations. Hence given the
equations, we are at liberty to associate any geometric picture that seems
convenient. This will be done from time to time in the presentation
of the various methods for solving linear systems. The geometric
vectors e, f, etc., seldom if ever need to be introduced explicitly. Given
a numerical vector x, it is sufficient to know that, given a coordinate
system e, the numerical vector x defines a geometric vector x by the
relation
x = ez.
Hence we shall often speak of x as though it were itself a geometric vector,
and we shall refer to it simply as a vector, without qualification.
2.05. Rank and Nullity; Adjoint and Reciprocal Matrices. The outer
product of two vectors has been interpreted geometrically as representing
an oriented parallelogram; the outer product of r vectors represents an
oriented parallelepiped of r dimensions. We shall take it as geometrically
evident that the outer product of r vectors can vanish if and only if the
vectors lie in a space of dimension less than r, or in other words, if they
are linearly dependent.
A matrix is said to have rank r in case it has r linearly independent
columns, but every r + 1 columns are linearly dependent. Hence every
column is expressible as a linear combination of these r columns. The
outer product of the vectors represented by these r columns must be
non-null, so that at least one submatrix formed from these r columns must
have a nonvanishing determinant. On the other hand, no submatrix of
order r + 1 can have a nonvanishing determinant. Hence a matrix is
of rank r if and only if the largest submatrix with nonvanishing deter-
minant is of order r. By applying the above argument to the transpose
of the matrix, everything that has been said of the columns applies
equally to the rows.
A square matrix of order n and rank r is said to have nullity n — r.
If we suppose the coordinate vectors e to be a unitary orthogonal system,
the homogeneous equations
A^T x = 0

are satisfied by any vector x orthogonal to all the columns of A.  But if A
has rank r, its columns determine an r-dimensional subspace of the
n-dimensional vector space, and there is an (n — r)-dimensional subspace
orthogonal to it. Hence the equations have n — r linearly independent
solutions.
A matrix is nonsingular in case its determinant is nonzero. If
(2.05.1)    x = Σ ξ_i a_i = ax

and the vectors a_i are linearly independent, then

ξ_1[a_1, a_2, . . . , a_n] = [ξ_1 a_1, a_2, . . . , a_n]
    = [ξ_1 a_1 + ξ_2 a_2 + · · · + ξ_n a_n, a_2, . . . , a_n]
    = [x, a_2, . . . , a_n].

Hence if

a = eA,

these equations give

(2.05.2)    ξ_1 |a_1, a_2, . . . , a_n| = |x, a_2, . . . , a_n|

when we drop the outer product of the e_i.  Since the vectors a_i are linearly
independent, the matrix A is nonsingular, and hence

(2.05.3)    ξ_1 = |x, a_2, . . . , a_n| / |a_1, a_2, . . . , a_n|,

and, in general, each ξ_i is the quotient by |A| of the determinant obtained
from |A| when x replaces a_i.  This is Cramer's rule.
If we write

(2.05.4)    A = (a_ij),

and if A_ij is the cofactor of a_ij in |A|, the matrix

(2.05.5)    adj (A) = (A_ji)

of the cofactors is called the adjoint of A.  The expansion rules illustrated
in §2.03 for a determinant of order 3 can be expressed

(2.05.6)    A adj (A) = adj (A) A = |A| I;

the product of a matrix by its adjoint in either order is a matrix with 0
everywhere except along the principal diagonal, and there every element
has the value |A|.
Now let

(2.05.7)    a^{ij} = A_ji / |A|.

Then

(a^{ij}) = |A|^{-1} adj (A),
(a_ij)(a^{ij}) = I,

so that

(2.05.8)    A^{-1} = |A|^{-1} adj (A).

This gives the explicit representation of the elements of the inverse


matrix.
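
As an illustration (not from the original text), the relation A^{-1} = adj(A)/|A| can be checked numerically.  The helper below is a hypothetical sketch written for this purpose; the example matrix is arbitrary, and the construction is practical only for small orders:

```python
import numpy as np

def adjoint(A):
    """Matrix of cofactors, transposed (the adjoint of A)."""
    n = A.shape[0]
    adj = np.empty_like(A, dtype=float)
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            adj[j, i] = (-1) ** (i + j) * np.linalg.det(minor)  # A_ij goes to (j, i)
    return adj

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
A_inv = adjoint(A) / np.linalg.det(A)     # (2.05.8)
print(np.allclose(A_inv, np.linalg.inv(A)))  # True
```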
2.051. Projection operators. Let a represent a set of m <n linearly
independent vectors, and
(2.051.1) a= eA.
These vectors form a basis for a subspace of m dimensions.  If e^T e = I,
then

a^T a = A^T A

defines the metric for that subspace.  The matrix is nonsingular, since
otherwise a non-null vector x would exist satisfying A^T A x = 0, and hence
x^T A^T A x = 0, and the non-null geometric vector ax would have length
zero, which is impossible.
Hence the symmetric matrix

(2.051.2)    P = A(A^T A)^{-1} A^T

exists, and is said to be idempotent since

P^2 = A(A^T A)^{-1} A^T A (A^T A)^{-1} A^T = A(A^T A)^{-1} A^T = P.
If x = ex is any vector, then

ePx = eA(A^T A)^{-1} A^T x = a(A^T A)^{-1} A^T x

is a vector in the space of a.  If y = ay = eAy is any vector in the space
of a, then

eP(Ay) = eA(A^T A)^{-1} A^T (Ay) = eAy = y.

Hence if x represents any vector, Px represents a vector in the space of A,
and if x = Ay represents a vector in the space of A, then Px represents
the same vector.  Thus the matrix P projects any vector into a particular
subspace and leaves unchanged any vector already in that subspace.  It
is therefore called a projection operator.
The projection is, indeed, an orthogonal projection.  For given any
x, the projection is Px, and the residual is x − Px.  But since P is
symmetric, the projection and the residual have the scalar product

(Px)^T (x − Px) = x^T (P − P^2) x = 0,

since P is idempotent.
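
A minimal NumPy sketch of these facts (not from the original text; the matrix A below is an arbitrary example with independent columns):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])
P = A @ np.linalg.inv(A.T @ A) @ A.T     # (2.051.2)

x = np.array([3.0, -1.0, 2.0])
print(np.allclose(P, P.T))               # symmetric
print(np.allclose(P @ P, P))             # idempotent
print(np.allclose(P @ (A @ [1, 2]), A @ [1, 2]))  # fixes vectors in the space
print(np.dot(P @ x, x - P @ x))          # projection orthogonal to residual (about 0)
```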
Any symmetric idempotent matrix P is a projection operator. For
if P has rank m, we can find a matrix A of m linearly independent columns
such that every column of P is a linear combination of columns of A.
Hence, for some matrix B we can write
P = AB^T.

We have only to show that

B = A(A^T A)^{-1}.

Since P is idempotent,

P = P^2 = A B^T A B^T = A B^T.

Since the columns of A are linearly independent, B^T A B^T = B^T.  The
rank of P = AB^T cannot exceed the rank of B^T; hence B^T has rank at least
m, and having only m rows, the rank is exactly m.  Hence

B^T A = I = A^T B.

Since P is symmetric,

A B^T = B A^T,
A B^T A = B A^T A,
A = B A^T A.

This is the desired result.
For an arbitrary metric, e^T e = G, the orthogonal projection is repre-
sented by a matrix

(2.051.3)    P = A(A^T G A)^{-1} A^T G,
where the columns of A are contravariant vectors.
2.06. Cayley-Hamilton Theorem; Canonical Form of a Matrix. If with
any vector x we associate its successive transforms

(2.06.1) eG = Ty (¢"= O88 2,) shoe),

at most n vectors of the sequence 2%, 1, %2, . . . will be linearly inde-


pendent. Hence for some r < n there exist scalars yo, 71, . . . , Yr not
all null such that

(2.06.2) (vl +ymT+-:++ ++4,T*)x = 0.


MATRICES AND LINEAR EQUATIONS 31

The matrix
(2.06.3). V(T) = Vol oe mT + ey ae a vert

is a polynomial in 7, and since


W(T)x = 0,
zx is said to lie in the null space of ¥(T). If both x and y are in the null
space of ¥(T), then ax + y is also in this null space for any scalars a and
8. The null space consists of the null vector only, unless y¥(T) is a
singular matrix. If it is nonsingular, we shall say it has no null space,
disregarding the trivial case of the null vector alone.
If φ(T) and ψ(T) are polynomials in T, and if there is a non-null vector
x which lies in the null space of each, then the scalar polynomials φ(λ)
and ψ(λ) have a common divisor which is not constant.  For if they do
not, then by a classical theorem in algebra (proved below in §3.06), there
exist polynomials f(λ) and g(λ) such that

(2.06.4)    f(λ)φ(λ) + g(λ)ψ(λ) = 1

identically.  But then

(2.06.5)    f(T)φ(T) + g(T)ψ(T) = I,

and hence

(2.06.6)    f(T)φ(T)x + g(T)ψ(T)x = x.

But since x lies in the null spaces of both φ and ψ, the left member of this
identity vanishes, and hence x is a null vector, contrary to supposition.
Next to a constant, whose null space is the null vector alone, the
simplest type of polynomial is linear. Hence consider a polynomial
T − λI.  It has a null space if and only if its determinant vanishes:

(2.06.7)    |T − λI| = 0.

This determinant, when expanded, is a polynomial in λ of degree n,

(2.06.8)    φ(λ) = |T − λI| = (−λ)^n + γ_1(−λ)^{n−1} + · · · + γ_n,

called the characteristic function of T.  Equation (2.06.7) is called the
characteristic equation of T.  One verifies easily that

(2.06.9)    γ_1 = Σ λ_i = tr (T),        γ_n = |T|,

where tr (T) is called the trace of T and is equal to the sum of the diagonal
elements.
Any λ satisfying (2.06.7) is called a proper value of T.  If λ is any
proper value, there is at least one vector in the null space of T − λI,
and any vector in the null space of T − λI is called a proper vector
associated with the proper value λ.  If λ is not a proper value, T − λI
has no null space.

For any proper value λ, any vector in the null space of T − λI is also
in the null space of (T − λI)^r for any positive integer r, but the converse
is not true.  A vector in the null space of (T − λI)^r for some positive
integer r is a principal vector.  If x is in the null space of (T − λI)^r
but not in the null space of (T − λI)^{r−1}, it is called a principal vector of
grade r.
The characteristic function φ(λ) has the remarkable property that

(2.06.10)    φ(T) = 0.

This is the Cayley-Hamilton theorem, which can be stated otherwise by
saying that the null space of φ(T) is the entire space.  This might be
expected from the fact, shown above, that the null spaces of two poly-
nomials in T have a non-null vector in common only if they have a
common divisor.  A proof of the Cayley-Hamilton theorem is as follows:
Since

φ(T) − φ(λ)I = [(−T)^n − (−λ)^n I] + γ_1[(−T)^{n−1} − (−λ)^{n−1} I] + · · ·
               + γ_{n−1}[(−T) − (−λ)I],

therefore this difference is equal to a polynomial in T multiplied by
T − λI, and hence is said to be divisible by T − λI.  Also, by Eq.
(2.05.6),

(T − λI) adj (T − λI) = φ(λ)I.

Hence φ(λ)I is also divisible by T − λI.  It follows, then, that φ(T) is
divisible by T − λI.  However, φ(T) is independent of λ and must
therefore vanish.
If F is any nonsingular matrix, then the matrices T and

(2.06.11)    T′ = F^{-1} T F

are said to be similar.  They represent the same transformation but in
different coordinate systems.  Since

F^{-1}(T − λI)F = T′ − λI,

they have the same characteristic function and hence the same proper
values.  One proves inductively that for any positive integer r

T′^r = F^{-1} T^r F,

and hence that

(2.06.12)    ψ(T′) = F^{-1} ψ(T) F

for any polynomial ψ.
It is reasonable to expect that a given transformation might be more
simply represented in some coordinate systems than in others, and this
will now be shown. Note first that the theorem expressed by (2.06.4)
can be generalized as follows: If there is no nonconstant factor common to
all polynomials φ_1(λ), φ_2(λ), . . . , φ_m(λ), then there exist polynomials
f_1(λ), f_2(λ), . . . , f_m(λ) such that

(2.06.13)    f_1(λ)φ_1(λ) + · · · + f_m(λ)φ_m(λ) = 1

identically.  The proof is inductive and sufficiently indicated by taking
m = 3.  It may be that φ_1 and φ_2 have a common factor d_12(λ), but if so,
it is prime to φ_3.  By applying (2.06.4) to φ = φ_1/d_12 and ψ = φ_2/d_12,
and then multiplying, we have for some g_1 and g_2

g_1 φ_1 + g_2 φ_2 = d_12.

Also for some g and h,

g d_12 + h φ_3 = 1.

Hence

g g_1 φ_1 + g g_2 φ_2 + h φ_3 = 1.

Now let the characteristic function φ(λ) be factored completely:

φ(λ) = (λ − λ_1)^{n_1}(λ − λ_2)^{n_2} · · · (λ − λ_m)^{n_m},

where

n_1 + n_2 + · · · + n_m = n,

and the λ_i are all different.  Clearly the polynomials

φ_i(λ) = (λ − λ_i)^{−n_i} φ(λ)


satisfy the conditions of the theorem.  Hence with suitable polynomials
f_i, (2.06.13) can be satisfied, and hence

f_1(T)φ_1(T) + · · · + f_m(T)φ_m(T) = I.

Hence for any vector x

f_1(T)φ_1(T)x + · · · + f_m(T)φ_m(T)x = x.

But φ_i(T)x, and hence also f_i(T)φ_i(T)x, is a principal vector since

(T − λ_i I)^{n_i} φ_i(T) = φ(T) = 0.

Hence any vector is expressible as a sum of principal vectors.
Moreover, the representation of x as a sum of principal vectors is
unique.  For if not, the difference of distinct representations would
express the null vector as a sum of principal vectors in the form

y_1 + y_2 + · · · + y_m = 0,

where y_i is a principal vector associated with λ_i, and at least one y_i is
non-null.  Suppose y_1 ≠ 0.  Then

φ_1(T)(y_1 + y_2 + · · · + y_m) = 0.
But since φ_1(T) contains every factor (T − λ_i I)^{n_i} except (T − λ_1 I)^{n_1}, it
follows that

φ_1(T)(y_2 + · · · + y_m) = 0.

Hence

φ_1(T) y_1 = 0.

But

(T − λ_1 I)^{n_1} y_1 = 0,

whereas (λ − λ_1)^{n_1} and φ_1(λ) have no common factor.  Hence y_1 = 0,
contrary to supposition.
Now for a new coordinate system choose a matrix

(2.06.14)    F = (F_1, F_2, . . . , F_m)

in which the columns of F_1 form a coordinate system for the null space of
(T − λ_1 I)^{n_1}, the columns of F_2 form a coordinate system for the null space
of (T − λ_2 I)^{n_2}, . . . .  If x is any vector in the null space of (T − λ_i I)^{n_i},
so also is Tx since

(T − λ_i I)^{n_i} Tx = T(T − λ_i I)^{n_i} x = 0.

Hence any column of TF_i is expressible as a linear combination of columns
of F_i, and therefore

(2.06.15)    TF = FT′,

where T′ has the form

                | T_1  0   . . .  0  |
(2.06.16)  T′ = |  0  T_2  . . .  0  |
                | . . . . . . . . .  |
                |  0   0   . . . T_m |.

But F must be nonsingular, whence

(2.06.17)    T′ = F^{-1} T F,

and except for the trivial case m = 1, a partial simplification has been
effected.  We proceed next to specialize the choice of the columns within
any F_i.
These are all principal vectors, and there is some ν_i ≤ n_i such that
every vector in the space of F_i is of grade ν_i or less.  First select a maximal
linearly independent set of proper vectors in the space of F_i (correspond-
ing to the proper value λ_i).  Adjoin to these a maximal linearly independ-
ent set of (principal) vectors of grade 2, . . . , and finally complete
F_i by adjoining vectors of maximal grade ν_i.  Thus F_i has the form

F_i = (F_{i1}, F_{i2}, . . . , F_{iν_i}),

where all columns of F_i are linearly independent and where every column
of F_{ij} is a principal vector of grade j.
Now the columns of F_{iν_i} are of grade ν_i, while those of

(T − λ_i I)F_{iν_i}

are of grade ν_i − 1.  Furthermore, the columns of

((T − λ_i I)F_{iν_i}, F_{iν_i})

are linearly independent.  If this were not the case, there would exist
equal linear combinations (T − λ_i I)x of the columns of the first sub-
matrix and y of the columns of the second:

(2.06.18)    (T − λ_i I)x = y.

But then

(T − λ_i I)^{ν_i} x = (T − λ_i I)^{ν_i − 1} y = 0

since all columns of F_i are of grade ≤ ν_i.  But then y is of grade ν_i − 1.
Hence y, which is by definition a linear combination of columns of
F_{iν_i}, is also a linear combination of the remaining columns of F_i, and this
is another way of saying that the columns of F_i are linearly dependent.
Since this is not the case, y = 0.  Hence by (2.06.18) x is a proper
vector, and hence both a linear combination of columns of F_{i1} and of F_{iν_i}.
Hence x = 0.
The argument may be continued to show that the columns of

(2.06.19)    ((T − λ_i I)^{ν_i − 1}F_{iν_i}, . . . , (T − λ_i I)F_{iν_i}, F_{iν_i})

are linearly independent.  If there are more columns in F_{i1} than in
(T − λ_i I)^{ν_i − 1}F_{iν_i}, we adjoin additional ones to form a new matrix F′_{i1}
and continue to form a new matrix F′_i of which the matrix (2.06.19) is a
submatrix.  We then show that the columns of the enlarged matrix so
formed are again linearly independent.  Proceeding as before, we obtain finally a
matrix F_i such that (T − λ_i I)F_{iν_i} is a submatrix of F_{i,ν_i−1}, . . . , and
(T − λ_i I)F_{i2} is a submatrix of F_{i1}.
Finally, the columns of F_i are rearranged as follows: For f_{i1} take any
column of F_{iν_i}.  Let

f_{i2} = (T − λ_i I)f_{i1},
f_{i3} = (T − λ_i I)f_{i2},
. . . . . . . . . . . .
f_{iν_i} = (T − λ_i I)f_{i,ν_i−1}.

Then

0 = (T − λ_i I)f_{iν_i},

and every vector of the sequence is a vector in F_i.  If there are other
columns in F_{iν_i}, take one of them as f_{i,ν_i+1} and proceed as before.  When
all columns of F_{iν_i} are exhausted, pass next to a column (if any) from
F_{i,ν_i−1} which does not appear in one of the above chains; then pass to
F_{i,ν_i−2}, . . . .
By so forming and rearranging the matrices F_i which make up F, we
obtain a matrix F whose columns are grouped into sequences such that,
when the double subscripts of the f's are replaced by single subscripts
from 1 to n, we have either

Tf_i = λ_i f_i + f_{i+1}

or else

Tf_i = λ_i f_i

for some λ_i.  Hence

TF = FT′′,

where now T′′ has the form

                 | T_1  0   . . .  0  |
(2.06.20)  T′′ = |  0  T_2  . . .  0  |
                 | . . . . . . . . .  |
                 |  0   0   . . . T_k |,

and each T_i is either a scalar λ_i or else has the form

(2.06.21)    T_i = λ_i I + I_1,

where

             | 0  0  . . .  0  0 |
(2.06.22) I_1 = | 1  0  . . .  0  0 |
             | 0  1  . . .  0  0 |
             | . . . . . . . . . |
             | 0  0  . . .  1  0 |.

The matrix I_1 is called the auxiliary unit matrix, and has units along the
first subdiagonal and elsewhere has zeros.  Note that

I_2 = I_1^2

has units along the second subdiagonal, and if I_1 is of order ν, then I_1^ν = 0.
Every column of F is a principal vector of T.  We could apply the
above theorem to T^T and in the process obtain a matrix G every column of
which is a principal vector of T^T.  If f is a principal vector of T cor-
responding to the proper value λ, and if g is a principal vector of T^T
corresponding to the proper value μ ≠ λ, then g and f satisfy

(2.06.23)    g^T T f = g^T f = 0.

The proof can be made inductively.  Suppose first that g and f are proper
vectors.  Then

g^T T = μ g^T,        Tf = λf,

so that

g^T T f = μ g^T f = λ g^T f.

Since λ ≠ μ, this proves the relation for that case.  Next suppose g is
a proper vector but f of grade 2.  Hence

(T − λI)f = f_1 ≠ 0,

but

(T − λI)f_1 = 0.

Hence f_1 is a proper vector.  Now

g^T f_1 = 0,

as was just shown.  Hence

g^T T f = λ g^T f,

whereas

g^T T f = μ g^T f,

and again, since λ ≠ μ, the relation is proved.  By continuing one proves
the relation for principal vectors g and f of any grade.
If T is symmetric, T^T = T, then any principal vector of T^T is also
a principal vector of T.  But for a symmetric matrix we now show
that all principal vectors are proper vectors, and in the normalized form
(2.06.20) of T all matrices T_i are scalars.
This is clearly the case when the proper values are all distinct.  In
that case, in fact, every proper vector is orthogonal to every other proper
vector, whence F^T F is a diagonal matrix, and by choosing every vector f
to be of unit length, one has even

(2.06.24)    F^T F = I,

so that F is an orthogonal matrix.  Suppose the proper values of T are
not all distinct.  One can, nevertheless, vary the elements of T slightly
so that the matrix T + δT is still symmetric and has all proper values
distinct.  Then F + δF is an orthogonal matrix.  As the elements of
T + δT vary continuously while the matrix remains symmetric, the
columns f + δf of F + δF also vary continuously but remain mutually
orthogonal and can be held at unit length.  Hence these properties
remain while δT vanishes.  Hence for any symmetric matrix T there
exists an orthogonal matrix F such that

(2.06.25)    F^T T F = Λ,

where Λ is a diagonal matrix whose elements are the proper values of T.
2.07. Analytic Functions of a Matrix; Convergence.  The relation
(2.06.12), valid for any polynomial, is easily extended.  Consider first
any of the matrices T_i of (2.06.20), neglecting the trivial case when T_i is a
scalar.  Any power can be written

(2.07.1)    T_i^r = λ_i^r I + r λ_i^{r−1} I_1 + C(r, 2) λ_i^{r−2} I_2 + · · · ,

and if T_i is of order ν, there are at most ν terms for any integer r.  If
|λ_i| < 1, then for any fixed s

lim_{r→∞} C(r, s) λ_i^{r−s} = 0.

Hence in this case as r becomes infinite, T_i^r approaches the null matrix, in
the sense that every one of its elements approaches zero.  If for every
proper value λ_i of T it is true that |λ_i| < 1, then also T′′^r approaches the
null matrix as r becomes infinite.  Since F and F^{-1} are fixed, this is true
also of F T′′^r F^{-1}, and hence of T^r.  Hence if every proper value of T has
modulus less than unity, then T^r → 0 as r becomes infinite.  This con-
dition is necessary as well as sufficient.
Now consider any function ψ(λ) analytic at the origin:

ψ(λ) = ψ_0 + ψ_1 λ + ψ_2 λ^2 + · · · .

If λ_i lies within the circle of convergence of ψ(λ), then since any derivative
of ψ is analytic within the same circle, we may write formally

(2.07.2)    ψ(T_i) = ψ_0 I + ψ_1(λ_i I + I_1) + ψ_2(λ_i I + I_1)^2 + · · ·
                   = ψ(λ_i) I + ψ′(λ_i) I_1 + ½ ψ′′(λ_i) I_2 + · · · ,

where at most ν terms are non-null.  Hence for a matrix of the form of T_i
the analytic function ψ(T_i) is defined by this relation.  But then if every
λ_i lies within the circle of convergence, we may take

                     | ψ(T_1)    0     . . . |
(2.07.3)    ψ(T′′) = |   0     ψ(T_2)  . . . |
                     | . . . . . . . . . . . |

and then we may further take


(2.07.4)    ψ(T) = F ψ(T′′) F^{-1}.

Hence if ψ(λ) is any function that is analytic in a circle about the origin
which contains all proper values of T in its interior, then ψ(T) is defined
and, in fact, is given by the convergent power series

(2.07.5)    ψ(T) = ψ_0 I + ψ_1 T + ψ_2 T^2 + · · · .
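
As a numerical aside (not part of the original text), the two facts above can be illustrated with an arbitrary example matrix: powers tend to zero when every proper value has modulus less than one, and an everywhere-analytic function such as the exponential can be summed by its power series:

```python
import numpy as np

# Sketch; T is an arbitrary example whose proper values are both 0.5.
T = np.array([[0.5, 1.0],
              [0.0, 0.5]])
print(max(abs(np.linalg.eigvals(T))))      # spectral radius 0.5 < 1
print(np.max(np.abs(np.linalg.matrix_power(T, 50))))  # essentially zero

def expm_series(T, terms=30):
    """Sum the power series of exp(T) term by term."""
    S, term = np.eye(T.shape[0]), np.eye(T.shape[0])
    for k in range(1, terms):
        term = term @ T / k
        S = S + term
    return S

print(expm_series(T))
```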


2.08. Measures of Magnitude. Most iterative processes with matrices
are equivalent to successive multiplications of a matrix by another matrix
or by a vector, and hence involve the formation of successively higher
powers of the first matrix. The success of the process depends upon the
successive powers approaching the null matrix. An adequate criterion
for this is given in terms of the proper values of the matrix, but the calcu-
lation of proper values is generally a most laborious procedure. Hence
other criteria that can be more readily applied are much to be desired.
For this purpose certain measures of magnitude of a vector or matrix
are now introduced, and some relations among them developed. These
measures are of use also as an aid in assessing the several types of error
that enter any numerical computation.
For any matrix A we define the bound b(A), the norm N(A), and the
maximum M(A), and for a vector x we define the bound b(x) and the
norm N(x).  It turns out that a natural definition of a maximum M(x)
for a vector x is identical with N(x).
Taking first the vector x whose elements are ξ_i, we define

(2.08.1)    b(x) = max_i |ξ_i|,
            N(x) = [Σ ξ_i^2]^{1/2} = [x^T x]^{1/2}.

Thus b(x) is the magnitude of the numerically largest element, while N(x)
is the ordinary geometric length defined by the metric I.
We define b(A) and N(A) analogously:

(2.08.2)    b(A) = max_{i,j} |a_ij|,
            N(A) = [Σ a_ij^2]^{1/2}.

Clearly

(2.08.3)    b(A^T) = b(A),        N(A^T) = N(A).

If we use the notion of a trace of a matrix

(2.08.4)    tr (A) = Σ a_ii,

then an equivalent expression for N(A) is

(2.08.5)    N(A) = [tr (A^T A)]^{1/2} = [tr (A A^T)]^{1/2}.

If a_i are the column vectors of A, and a_i^T the row vectors, then

(2.08.6)    N^2(A) = Σ N^2(a_i) = Σ N^2(a_i^T),

where the exponent applies to the functional value

N^2(A) = [N(A)]^2.

Hence if a_1 = x and all other a_i = 0,

N(A) = N(x).
A useful inequality is the Schwartz inequality which states that for
any vectors x and y
(2.08.7)    |x^T y| = |y^T x| ≤ N(x)N(y).

Geometrically this means that a scalar product of two vectors does not
exceed the product of the lengths of the vectors (in fact it is this product
multiplied by the cosine of the included angle).  This generalizes immedi-
ately to matrices

(2.08.8)    N(AB) ≤ N(A)N(B).
Another useful inequality is the triangular inequality
(2.08.9)    N(x + y) ≤ N(x) + N(y),

which says that one side of a triangle does not exceed the sum of the other
two, and which also generalizes immediately

(2.08.10)    N(A + B) ≤ N(A) + N(B).

Also we have

(2.08.11)    b(A + B) ≤ b(A) + b(B).

But

(2.08.12)    |x^T y| ≤ n b(x) b(y),

since in x^T y there are n terms each of which could have the maximum value
b(x)b(y).  Hence for matrices

(2.08.13)    b(AB) ≤ n b(A) b(B).

If in (2.08.8) we take

b_1 = x,        b_2 = b_3 = · · · = b_n = 0,

then we have

(2.08.14)    N(Ax) ≤ N(A)N(x).
We now introduce the third measure:
(2.08.15)    M(A) = max_x N(Ax)/N(x) = max_{x,y} |x^T A y|/[N(x)N(y)],

or equivalently

(2.08.16)    M(A) = max_{N(x)=1} N(Ax) = max_{N(x)=N(y)=1} |x^T A y|.

It is obvious that (2.08.15) and (2.08.16) are equivalent. We show now


that the two parts of (2.08.15) or of (2.08.16) are equivalent. Take M(A)
as defined by the first equality, and designate, for the moment, the last
member by M’(A). We wish to show that
M(A) = M’(A).
First we have
|x^T A y| ≤ N(x)N(Ay)

by the Schwartz inequality (2.08.7).  But

N(Ay) ≤ M(A)N(y)

by definition of M(A).  Hence for any x and y

|x^T A y| ≤ M(A)N(x)N(y),

|x^T A y|/[N(x)N(y)] ≤ M(A).
Hence the maximum of the left member cannot exceed the right member,
so that
M′(A) ≤ M(A).

Now to prove that

M(A) ≤ M′(A),

we take

x = Ay

and obtain

|x^T A y| = |y^T A^T A y| = N^2(Ay).

Hence when x is so defined

|x^T A y| / [N(x)N(y)] = N^2(Ay) / [N(Ay)N(y)] = N(Ay) / N(y),

and this quantity has M(A) for its maximum.  Hence M′(A) cannot be
less than M(A), and the theorem is proved.
Since

x^T A y = y^T A^T x,

it follows from the second definition of M(A) that

(2.08.17)    M(A^T) = M(A).

Also

(2.08.18)    M(A + B) ≤ M(A) + M(B),
(2.08.19)    M(AB) ≤ M(A)M(B).
To prove the first of these, we have

N(Ax + Bx) ≤ N(Ax) + N(Bx) ≤ [M(A) + M(B)]N(x),

the first inequality being a consequence of the triangular inequality
applied to the vectors Ax and Bx, and the second following from the
definition of the maximum.  Since, therefore,

N[(A + B)x]/N(x) ≤ M(A) + M(B)

for all vectors x ≠ 0, the maximum value of the left member cannot
exceed the right.
For the second inequality we have

N[A(Bx)] ≤ M(A)N(Bx) ≤ M(A)M(B)N(x),

both inequalities being consequences of the definition of the maximum.
If we divide again by N(x), the conclusion follows as before.
We now establish the following relations among the functions N, b, and
M of the same matrix:

(2.08.20)    b(A) ≤ M(A) ≤ N(A) ≤ n b(A),
(2.08.21)    N(A) ≤ n^{1/2} M(A).
To prove the first, we use the second definition of M in (2.08.16).  If we
take

x = e_i,        y = e_j,

then

x^T A y = a_ij.

Any choice of unit vectors for x and y will give a number x^T A y which
cannot exceed the maximum.  Hence

|a_ij| ≤ M(A),

and since this holds for any i and j, it follows that

b(A) ≤ M(A).

Next, since by (2.08.14)

N(Ax)/N(x) ≤ N(A),

the maximum value of the left member cannot exceed N(A), and this
maximum is M(A) by definition.  Hence

M(A) ≤ N(A).

To show that

N(A) ≤ n b(A),

we go to the definition and write

N^2(A) = Σ a_ij^2 ≤ n^2 b^2(A),

since b^2(A) is the greatest of the n^2 terms in the sum.  When we take the
square root, we have the desired result.
Before proving (2.08.21), we prove first a more general result:

(2.08.22)    N(AB) ≤ N(A)M(B),
             N(AB) ≤ M(A)N(B).

If the columns of B are b_i, then the columns of AB are Ab_i.  By (2.08.6)
applied to the matrices B and AB,

N^2(AB) = Σ N^2(Ab_i) ≤ M^2(A) Σ N^2(b_i) = M^2(A)N^2(B).

We get the second relation on taking square roots.  The first follows after
taking transposes from (2.08.3) and (2.08.17).  Now (2.08.21) is an
immediate consequence of the second of (2.08.22) when we take B = I,
since

N(I) = n^{1/2}.

Of the three functions b, N, and M, the first is obtainable for any given
matrix by inspection, and the second by direct computation. The third,
however, is only obtainable in general from rather elaborate computa-
tions, though it generally yields the best estimates of error.
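As a numerical aside (not part of the original text), the three measures and the relations (2.08.20)–(2.08.21) can be checked directly; the matrix below is an arbitrary example, and the maximum M(A) is obtained here as the largest singular value:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [0.5, -1.0]])

b = np.max(np.abs(A))                      # bound b(A), by inspection
N = np.sqrt(np.sum(A**2))                  # norm N(A), by direct computation
M = np.linalg.svd(A, compute_uv=False)[0]  # maximum M(A) = sqrt(lambda_1 of A^T A)

n = A.shape[0]
print(b <= M <= N <= n * b)                # (2.08.20)
print(N <= np.sqrt(n) * M)                 # (2.08.21)
```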

A few additional properties of these functions will be useful.  First
we have:

(2.08.23)    M(A^T A) = M^2(A).

In view of (2.08.17) and (2.08.19), we know that the left member cannot
exceed the right.  On the other hand

N^2(Ax) = x^T A^T A x ≤ N(x)N(A^T A x) ≤ M(A^T A)N^2(x),

the first inequality being a consequence of the Schwartz inequality
(2.08.7), while the second follows from the definition of M.  Hence the
right member of (2.08.23) cannot exceed the left, and the equality there-
fore follows.
An orthogonal matrix X is one whose transpose is its inverse:

X^T X = X X^T = I.

Multiplication by an orthogonal matrix does not affect the norm of a
vector:

N^2(Xx) = x^T X^T X x = x^T x = N^2(x).

Hence for any matrix A if X is orthogonal,

(2.08.24)    N(AX) = N(A),        M(AX) = M(A).
By (2.06.25), for any symmetric matrix B there exists an orthogonal
matrix X such that

(2.08.25)    X^T B X = Λ

is a diagonal matrix.  The columns of X are the proper vectors of the
matrix, and the diagonal elements of Λ the proper values.  If

B = A^T A,

then B is non-negative, and these proper values are all non-negative.
We may suppose that they are arranged in order of decreasing magnitude,

(2.08.26)    λ_1 ≥ λ_2 ≥ · · · ≥ λ_n ≥ 0,

the last equality holding only when A is singular.  Clearly

M(Λ) = λ_1,

and furthermore

M(B) = M(Λ).

Hence

(2.08.27)    M(A) = λ_1^{1/2}.

Also

(2.08.28)    N^2(A) = Σ λ_i.
To see this, we observe that by definition the proper values λ_i of a
matrix B satisfy the algebraic equation

|B − λI| = 0

and that the trace tr (B) is the sum of the proper values, while by
definition

N^2(A) = tr (A^T A) = tr (B).

These relations provide an alternative proof for (2.08.21) and the second
inequality in (2.08.20).
By analogy with M, we define

(2.08.29)    m(A) = min_x N(Ax)/N(x).

Then

(2.08.30)    m(A) = λ_n^{1/2}.

Also if A is nonsingular,

(2.08.31)    M(A^{-1}) = λ_n^{-1/2},        m(A^{-1}) = λ_1^{-1/2}.

Hence for a nonsingular matrix

(2.08.32)    M(A^{-1}) m(A) = 1.

These relations arise from the fact that, if B is nonsingular,

X^T B^{-1} X = Λ^{-1},

which is a special case of the relation

(2.08.33)    X^T B^r X = Λ^r,

where r is any integer, positive or negative.
We conclude this discussion by noting that, if x′ is the vector whose
elements are |ξ_i| and if 1_n is the vector each of whose elements is unity,
then from (2.08.7) it follows that

(2.08.34)    Σ |ξ_i| ≤ n^{1/2} N(x),

where x has the elements ξ_i.  This follows from the fact that

N(1_n) = n^{1/2}.
2.1. Iterative Methods. Generally speaking, an iterative method for
solving an equation or set of equations is a rule for operating upon an
approximate solution x_p in order to obtain an improved solution x_{p+1},
and such that the sequence {x_p} so defined has the solution x as its limit.
This is to be contrasted with a direct method which prescribes only a
finite sequence of operations whose completion yields an exact solution.
Since the exact operations must generally be replaced by pseudo opera-
tions, in which round-off errors enter, the exact solution is seldom attain-
able in practice, and one may wish to improve the result actually obtained
by one or more iterations.  Also since the "approximation" x_0 with
which one may start an iteration does not necessarily need to be close,
it is sometimes advantageous to omit the direct method altogether, start
with an arbitrary x_0, perhaps x_0 = 0, and iterate until the approach is
sufficiently close.
2.11. Some Geometric Considerations. A large class of iterative meth-
ods are based upon the following simple geometric notion: Take any
vector b and a sequence of vectors {u_p}, and define the sequence {b_p} by

(2.11.1)    b = b_0,
            b_{p−1} = b_p + λ_p u_p,

where the scalar λ_p is chosen so that b_p is orthogonal to u_p.  Then if the
vectors u_p "fill out" some n space, the vectors b_p approach as a limit a
vector that is orthogonal to this n space.  Without attempting a more
precise definition of what is meant by "filling out," we can see that it
must imply the following: If the vectors e_i represent any set of reference
vectors for this n space, then however far out we may go in the sequence
{u_p}, it must always be possible to find vectors u_p = e u_p with a non-
vanishing projection on any e_i and, in fact, with components that have
some fixed positive lower bound.  A possible choice for the vectors u_p
would be the reference vectors e_i taken in order and then repeated.
If e_i is the arithmetic vector associated with the geometric vector e_i,
we have then

u_{νn+i} = e_i,

where ν is any integer.  It is easily verified that the vector e_i has 1 in
the ith place and 0 in every other position.
The vectors e_i are entirely subject to our choice, and we may choose
any convenient positive definite matrix H to represent the metric.  Then
the orthogonal projection upon u_p is represented by u_p(u_p^T H u_p)^{-1} u_p^T H
(cf. §2.051), whence

(2.11.2)    λ_p = u_p^T H b_{p−1} / u_p^T H u_p.

Now we write

(2.11.3)    Ax = y

as the equations to be solved.  Let x_p represent any approximation to the
solution x.  If

(2.11.4)    A s_p = r_p = y − A x_p,

then either s_p or r_p can be taken as representing the deviation of the
approximation from the true solution.  Hence either s_p or r_p may be

taken as b_p.  We consider separately the case when A is itself positive


definite and that when A is not itself positive definite.
2.111. The matrix A is positive definite.  Let

r_p = b_p.

Then

y − A x_p = b_p = b_{p−1} − λ_p u_p = y − A x_{p−1} − λ_p u_p.

Hence by comparing first and last members in the equality,

A x_p = A x_{p−1} + λ_p u_p,
x_p = x_{p−1} + λ_p A^{-1} u_p.

Therefore we take

u_p = A v_p

and obtain

(2.111.1)    x_p = x_{p−1} + λ_p v_p.

Now

λ_p = u_p^T H b_{p−1} / u_p^T H u_p
    = v_p^T A H b_{p−1} / v_p^T A H A v_p.

Consequently it is natural to take

H = A^{-1},

which gives

(2.111.2)    λ_p = v_p^T r_{p−1} / v_p^T A v_p.

Alternatively we may take

s_p = b_p.

Then

y − A x_p = A b_p = A(b_{p−1} − λ_p u_p) = y − A x_{p−1} − λ_p A u_p.

Hence

A x_p = A x_{p−1} + λ_p A u_p,

or

(2.111.3)    x_p = x_{p−1} + λ_p u_p.

But

λ_p = u_p^T H b_{p−1} / u_p^T H u_p.

If we take

H = A,

we have

(2.111.4)    λ_p = u_p^T r_{p−1} / u_p^T A u_p.

Equations (2.111.3) and (2.111.4) are identical with (2.111.1) and
(2.111.2) except for the replacement of v_p by u_p.

In geometric terms, since A is taken to represent the metric, our
equation (2.11.3) requires the contravariant representation x of a vector
whose covariant representation y is known.  We begin with an approxi-
mate representation x_0.  Then

r_0 = y − A x_0 = A(x − x_0)

is the covariant representation of the error x − x_0, and

λ_1 u_1 = {u_1^T r_0 / u_1^T A u_1} u_1

is the contravariant representation of the orthogonal component of r_0
in the direction of u_1.  When this vector is added to x_0, the new residual
s_1 = x − x_1 is thus a leg of a right triangle of which s_0 = x − x_0 is the
hypotenuse.  Hence we have a better approximation provided only u_1
and r_0 were not orthogonal.
Any rule for selecting the u_p (or equivalently the v_p) at each step
defines a particular iterative process.  There are three in common use:
1. The method of steepest descent prescribes that we take

u_p = r_{p−1}.

To understand the reason for this selection, we note first that, since A is
positive definite, by hypothesis, there exists therefore a matrix C for which

A = C^T C.

The equations

Ax = y

are therefore equivalent to the equations

Cx = z,

where

C^T z = y.

Define the function

g(x) = (Cx − z)^T (Cx − z)
     = x^T C^T C x − 2 x^T C^T z + z^T z
     = x^T A x − 2 x^T y + z^T z.

This function is a sum of squares which has a minimum of zero for

x = C^{-1} z = A^{-1} y.

The function g(x) is a function of the n variables ξ_1, . . . , ξ_n.  Its
partial derivatives with respect to these variables are the coordinates
of the gradient vector

g_x = 2(Ax − y),

and the gradient at the point x_{p−1} is −2r_{p−1}.  Hence the function g(x)
evaluated at the point x_{p−1} is undergoing its most rapid variation in the
direction r_{p−1}, and if we think of our problem as one of minimizing g(x) by
proceeding in successive steps, it is natural to take each step in the direc-
tion of most rapid decrease.  If we define

φ(λ) = g(x_{p−1} + λ r_{p−1}),

we shall find that φ, as a function of the single variable λ, takes on its
minimum value at λ = λ_p given by (2.111.4).
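A minimal NumPy sketch of this method (not from the original text; the positive definite system below is an arbitrary example):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
y = np.array([1.0, 2.0])

x = np.zeros(2)
for _ in range(50):
    r = y - A @ x                    # residual r_{p-1}
    lam = (r @ r) / (r @ (A @ r))    # step length (2.111.4) with u_p = r_{p-1}
    x = x + lam * r
print(x, np.linalg.solve(A, y))      # the two agree closely
```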
2. The method of Seidel takes

u_p = u_{νn+i} = e_i.

From (2.111.3) we see that x_p differs from x_{p−1} only in the ith element.
Furthermore, since b_p and e_i are to be orthogonal, this means that the ith
element of r_p = A s_p must equal zero, which means that the ith equation is
satisfied exactly.  Hence the ith element of x_p is chosen so that the ith
equation will be satisfied when all other elements are the same as for
x_{p−1}.  While we may expect that this process will require more steps
than does the method of steepest descent, the simplicity of each step is a
great advantage, especially in using automatic machinery.
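A minimal sketch of the Seidel step in NumPy (not from the original text; same arbitrary example system as above), adjusting each element so its equation is satisfied exactly:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
y = np.array([1.0, 2.0])

x = np.zeros(2)
for _ in range(25):                          # 25 full sweeps
    for i in range(len(y)):
        # choose x[i] so that the i-th equation holds with the others fixed
        x[i] = (y[i] - A[i] @ x + A[i, i] * x[i]) / A[i, i]
print(x, np.linalg.solve(A, y))
```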
3. The method of relaxation always takes u_p to be some e_i, but the
selection is made only at the time.  Since the choice u_p = e_i has the effect
of eliminating the ith component of r_p, one chooses to eliminate the largest
residual.  However this is not necessarily the best choice.  The effective-
ness of the correction is measured by the magnitude of the correcting
vector λ_p u_p, and this magnitude is

u_p^T A s_{p−1} / {u_p^T A u_p}^{1/2}.

Now when u_p = e_i, then u_p^T A s_{p−1} is the ith component of the residual
r_{p−1}, but this is divided by the length of e_i, which has the value √a_ii.
Hence one should examine the quotients of the residual components
divided by the corresponding √a_ii, and eliminate the largest quotient.
This method clearly converges more rapidly than the Seidel method,
which projects upon the same vectors but in a fixed sequence. Therefore
for ‘‘hand”’ calculations it is to be preferred. For automatic machinery,
however, the fixed sequence is almost certainly to be preferred.
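A short sketch of the relaxation rule (not from the original text; arbitrary example data), selecting at each step the largest scaled residual:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
y = np.array([1.0, 2.0])
scale = np.sqrt(np.diag(A))            # the square roots, computed in advance

x = np.zeros(2)
for _ in range(50):
    r = y - A @ x
    i = np.argmax(np.abs(r) / scale)   # largest residual component / sqrt(a_ii)
    x[i] += r[i] / A[i, i]             # satisfy the i-th equation exactly
print(x, np.linalg.solve(A, y))
```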
2.112. The matrix A is not necessarily positive definite. This case
can always be reduced to the preceding if we multiply throughout by AT.
However, this extra matrix multiplication is to be avoided if possible.
With regard to the equations
Az = y,
we may adopt either of two obvious geometric interpretations.
The simplest interpretation is that we wish to change the vector
coordinates, as in Eq. (2.01.8), where y, taking the place of x′, is known,
and A, taking the place of E, is known.  In the symbols used here,
therefore, the columns of A are the numerical vectors which represent the
e_i in the system f, and the column vector y represents x in the same system.
The vector y is to be expressed as a linear combination of the columns
of A, which is another way of saying that the vector x, whose representa-
tion is known in the f system, is to be resolved along the vectors e_i.
The other interpretation comes from regarding each of the n equations
as the equation of a hyperplane in n space.  If a_i represents the ith row
vector in A, then the ith equation is

a_i x = η_i,

and this equation is satisfied by any vector x leading from the origin of the
point-coordinate system to a point in the hyperplane.  If we divide
through this equation by N(a_i), the length of a_i, we obtain

[a_i/N(a_i)] x = η_i/N(a_i),

and since the vector multiplying x is a unit vector, the equation says that
the projection of x upon the direction of a_i is of length η_i/N(a_i), and hence
the same for all points in the plane.  Consequently the vector a_i is
orthogonal to the plane; and the distance of the plane from the origin is
|η_i|/N(a_i).
In case we think of the underlying coordinate system e as nonorthogo-
nal, the vectors a; are taken to be covariant representations of the
normals, and the vector x as the contravariant representation of the
vector x drawn to the common intersection of the n planes.
These two geometric interpretations suggest different iterative schemes.
We begin with the hyperplane interpretation.
2.1121. The Equations Represent a System of Hyperplanes.  If v is
any column vector, then

(2.1121.1)    v^T A x = v^T y

is also the equation of a hyperplane passing through the point x.  The
normal, written as a column vector, is A^T v.  If x_p is any approximation
to x, and s_p and r_p are defined as before,

(2.1121.2)    A s_p = r_p = y − A x_p = A(x − x_p),

then we may take

(2.1121.3)    b_p = s_p = x − x_p = A^{-1} r_p,

and project upon the vector u_{p+1} = A^T v_{p+1}.  This amounts to writing

(2.1121.4)    A x_p = y − A b_p,

so that as b_p vanishes, x_p approaches x.  The basic sequence as defined
by (2.11.1) and (2.11.4) takes the form

(2.1121.5)    λ_p = v_p^T r_{p−1} / v_p^T A A^T v_p,

if the identity matrix is taken to define the metric.  But then (2.1121.4)
gives

(2.1121.6)    x_p = x_{p−1} + λ_p A^T v_p.
By analogy with the method of steepest descent as described for the
positive definite case, we may define the non-negative function

g(x) = (Ax − y)^T (Ax − y)

and find that at x_{p−1} its gradient lies in the direction A^T r_{p−1}, whence the
choice

v_p = r_{p−1}

is optimal from the point of view of most rapid minimization of g(x).

If we take

v_p = v_{νn+i} = e_i,

then, as appears from (2.1121.4), we are projecting the vectors b_p in
rotation upon the normals to the basic hyperplanes of the system.  In
this event x_p is caused to lie in the ith hyperplane so that the ith equation
is satisfied, though in general all components of x_p will differ from those
of x_{p−1}.  The numerator in λ_p is simply the ith residual, i.e., the ith
element of r_{p−1}, while the denominator is the sum of squares of the ith
row of A.
Hence if we follow the lead of the method of relaxation, we would
choose v_p to be that e_i that would provide the maximal correction

λ_p A^T e_i.

Hence to make the optimal choice, we should divide each residual by the
square root of the sum of the squares of the corresponding row of A, and
select the largest quotient. Presumably all these square roots will be
used repeatedly and should be calculated in advance.
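A short NumPy sketch of the hyperplane scheme with the v_p = e_i taken in rotation (not from the original text; arbitrary example data, and A need not be symmetric or definite):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, -2.0]])
y = np.array([5.0, -4.0])
row_norm_sq = np.sum(A**2, axis=1)     # sums of squares of the rows, in advance

x = np.zeros(2)
for p in range(100):
    i = p % len(y)                     # rows taken in rotation
    lam = (y[i] - A[i] @ x) / row_norm_sq[i]   # (2.1121.5) with v_p = e_i
    x = x + lam * A[i]                 # step along the normal A^T e_i
print(x, np.linalg.solve(A, y))
```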
2.1122. The Equations Represent a Resolution of the Vector y along
the Column Vectors of A.  If x_p is any set of trial multipliers,

r_p = y − A x_p

represents the deviation of the vector A x_p from the required vector y.
Take

b_p = r_p,

and let

u_p = A v_p

represent any linear combination of the columns of A.  Then Eqs.
(2.11.1) and (2.11.4) give

y − A x_p = y − A x_{p−1} − λ_p A v_p,

or

(2.1122.1)    x_p = x_{p−1} + λ_p v_p,

with

(2.1122.2)    λ_p = v_p^T A^T r_{p−1} / v_p^T A^T A v_p.
The method of steepest descent would take

v_p = A^T r_{p−1},

a choice that complicates the denominator in λ_p excessively.  In taking
v_p = e_i for some i, we alter only one element of x_{p−1} in obtaining x_p, but
no element of r_p is made to vanish, so no one of the equations is necessar-
ily satisfied exactly.  To find the optimal e_i according to the principle
of the method of relaxation, we observe that we wish to maximize the
vector

λ_p u_p = λ_p A v_p

or

λ_p A e_i.

However, the numerators of the possible λ_p have the form

e_i^T A^T r_{p−1},

which is a complete scalar product of the ith column of A with the residual
vector r_{p−1}.  Taking this scalar product represents the greater portion
of the labor involved in the complete projection, so that one would
probably always take the vectors e_i in strict rotation.
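A short NumPy sketch of this column-resolution scheme with v_p = e_i in strict rotation (not from the original text; the same arbitrary example system as before):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, -2.0]])
y = np.array([5.0, -4.0])
col_norm_sq = np.sum(A**2, axis=0)     # denominators e_i^T A^T A e_i

x = np.zeros(2)
for p in range(200):
    i = p % len(x)                     # columns taken in rotation
    r = y - A @ x
    lam = (A[:, i] @ r) / col_norm_sq[i]   # (2.1122.2) with v_p = e_i
    x[i] += lam                        # only the i-th multiplier changes
print(x, np.linalg.solve(A, y))
```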
2.113. Some generalizations.  The methods described in §2.111 con-
sisted in taking each residual s_{p−1} = x − x_{p−1} from the true solution x,
projecting orthogonally upon a vector u_p, and adding the projection to
x_{p−1} to obtain an improved approximation x_p.  The new residual s_p
was orthogonal to the projection on u_p.  Clearly if the projection is
made on a linear space of two or more dimensions, the projection, i.e., the
correction, will be at least as large as the projection on any single direction
in this space.  Hence it is to be expected that the rate of convergence
of the process would be more rapid if, instead of projecting each time
upon a single vector u_p, we were to project upon a linear space of two or
more dimensions.  Such a space may be represented by a matrix U_p
such that any vector in the space is a linear combination of columns of
U_p.

The problem is now the following: Given any matrix U_p, we wish to
project the residual s_{p−1} orthogonally upon the space U_p (that is, the
space of linear combinations of its columns).  The projection will be
taken as a correction to be added to x_{p−1} to yield the improved approxi-
mation x_p.  The orthogonal projection is represented by the matrix
U_p(U_p^T A U_p)^{-1} U_p^T A, and we find now that

(2.113.1)    x_p = x_{p−1} + U_p(U_p^T A U_p)^{-1} U_p^T r_{p−1}.

The scalar λ_p is for present purposes to be replaced by the vector

(U_p^T A U_p)^{-1} U_p^T r_{p−1}.

Its expression is the same as for λ_p except that the matrix U_p replaces
the vector u_p, and it is to be noted that the reciprocal matrix enters as a
premultiplier.  While the method does provide a larger correction, in
general this advantage is offset by the necessity for calculating an inverse
matrix whose order is equal to the dimensionality of the subspace upon
which the projection is being made.  Nevertheless in special cases this
inversion may prove to be fairly simple.
If the columns of the matrix U_p are unit vectors e_i, then the matrix
U_p^T A U_p is a principal submatrix of the matrix A.  If, say, we take U_p
to be the two-column matrix (e_i, e_j), then

U_p^T A U_p = | a_ii  a_ij |
              | a_ji  a_jj |.

The correction is made so as to eliminate both the ith and the jth elements
of r_p by appropriate selection of the ith and jth elements of x_p.  Hence
the ith and jth equations are solved simultaneously for these elements in
terms of the current approximations to the other elements. This is to
be contrasted with the method of §2.111 where the ith and jth equations
are solved at different times and not simultaneously.
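A small sketch of the block form (2.113.1) in NumPy (not from the original text; arbitrary positive definite example data), projecting on two unit vectors at a time so that two equations are solved simultaneously:

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
y = np.array([1.0, 2.0, 0.0])

x = np.zeros(3)
pairs = [(0, 1), (1, 2), (0, 2)]
for _ in range(20):
    for idx in pairs:
        U = np.eye(3)[:, list(idx)]                  # U_p = (e_i, e_j)
        r = y - A @ x
        x = x + U @ np.linalg.solve(U.T @ A @ U, U.T @ r)   # (2.113.1)
print(x, np.linalg.solve(A, y))
```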
The iterations for the case when A is not a symmetric matrix can be
generalized in like manner.  When we interpret the equations as repre-
senting a system of hyperplanes and project upon a space determined by
normals to these hyperplanes, we obtain

(2.113.2)    x_p = x_{p−1} + A^T V_p(V_p^T A A^T V_p)^{-1} V_p^T r_{p−1},

where V_p is an arbitrary matrix of linearly independent columns.  When
we interpret the equations as requiring the resolution of the vector y
along the column vectors of A, we get

(2.113.3)    x_p = x_{p−1} + V_p(V_p^T A^T A V_p)^{-1} V_p^T A^T r_{p−1}.
2.12. Some Analytic Considerations. The iterative methods so far
discussed have been suggested by geometry. Analytical considerations
suggest several others.

2.121. Cesari's method.  In the Seidel iteration x_1 differs from x_0 only
in the first element, which is so chosen that the first equation is satisfied
exactly.  Next x_2 differs from x_1 only in the second element, which is so
chosen that the second equation is satisfied exactly.  In the nth step,
x_n differs from x_{n−1} only in the nth element, which is chosen to satisfy
the nth equation exactly.  Next x_{n+1} is obtained from x_n by readjusting
the first element to satisfy the first equation, and this begins a new cycle.
Let us write A = A_1 + A_2, where in A_1 all diagonal and subdiagonal
elements are the same as those in A, while all supradiagonal elements are
zero:

(2.121.1)    A_1 = (a′_ij),    a′_ij = a_ij when i ≥ j,
                               a′_ij = 0    when i < j;
             A_2 = (a′′_ij),   a′′_ij = 0    when i ≥ j,
                               a′′_ij = a_ij when i < j.

Then we verify easily that

A_1 x_{(p+1)n} = y − A_2 x_{pn}.

Since the vectors x_{pn+i} for 0 < i < n need not enter explicitly, we may
modify the notation by writing simply x_p for what had been designated
x_{pn}, and the iteration is written

(2.121.2)    A_1 x_{p+1} = y − A_2 x_p,        A = A_1 + A_2.

We know that this iteration converges when A is positive definite
and A_1 and A_2 are given by (2.121.1).  However (2.121.2) formally
defines an iterative process whether or not A is positive definite, and
whether or not A_1 and A_2 satisfy (2.121.1).  It is required only that A_1
be nonsingular.  Since (2.121.2) is equivalent to

x_p = A_1^{-1}(y − A_2 x_{p−1}),

and x satisfies

x = A_1^{-1}(y − A_2 x),

it follows that the residuals s_p = x − x_p satisfy

s_p = −A_1^{-1} A_2 s_{p−1} = (−A_1^{-1} A_2)^p s_0.

Hence for the sequence (2.121.2) to converge for an arbitrary y, it is
necessary and sufficient that the proper values of the matrix −A_1^{-1} A_2
should all be of modulus < 1.  The characteristic equation for this
matrix can be written

(2.121.3)    |λ A_1 + A_2| = 0.

A sufficient condition for convergence is that

(2.121.4)    M(A_1^{-1} A_2) < 1,

and hence, a fortiori, it is sufficient that

(2.121.5)    N(A_1^{-1} A_2) < 1,
the latter criterion being one that is fairly readily applied.
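A short NumPy sketch of the splitting iteration and its convergence test (not from the original text; arbitrary example data, with A_1 taken as the lower triangle including the diagonal, as in (2.121.1)):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
y = np.array([1.0, 2.0])
A1 = np.tril(A)                          # diagonal and subdiagonal part
A2 = A - A1                              # supradiagonal part

H = -np.linalg.solve(A1, A2)             # -A1^{-1} A2
print(max(abs(np.linalg.eigvals(H))))    # all moduli < 1, so the process converges

x = np.zeros(2)
for _ in range(30):
    x = np.linalg.solve(A1, y - A2 @ x)  # (2.121.2)
print(x, np.linalg.solve(A, y))
```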
Cesari's derivation of the same sequence is interesting.  Let

(2.121.6)    A_1 + A_2 = γA,

where γ is any scalar and A_1 is nonsingular.  Let the vector x(μ) as a
function of the scalar μ be defined by

(2.121.7)    (A_1 + μA_2)x(μ) = γy + (μ − 1)A_2 v,

with v an arbitrary vector.  Then

x(1) = x.

Let

x(μ) = x_0 + μ x′_0 + μ^2 x′′_0/2! + · · · .

Then on substituting into (2.121.7) and grouping terms, one obtains

0 = (A_1 x_0 − γy + A_2 v) + μ(A_1 x′_0 + A_2 x_0 − A_2 v) + μ^2(A_1 x′′_0/2! + A_2 x′_0)
    + μ^3(A_1 x′′′_0/3! + A_2 x′′_0/2!) + · · · .

If

x_p = x_0 + x′_0 + · · · + x_0^{(p)}/p!,

then

A_1 x_0 = γy − A_2 v,
A_1 x_p = γy − A_2 x_{p−1}        (p ≥ 1),

and we have again the recursion (2.121.2).
When A_1 is a diagonal matrix whose diagonal is the same as that of
γA, we have a method discussed by von Mises and Geiringer.  Since for
such a choice of A_1 the diagonal elements of A_2 are all zeros, and since A_1
is a diagonal matrix, the diagonal elements of A_1^{-1}A_2 are also all zeros.
It is no restriction to suppose that A_1 = I and that γ = 1, for we may
replace the original system by the equivalent one,

γ A_1^{-1} A x = γ A_1^{-1} y.

Then (2.121.5) yields one of the criteria given by von Mises and Geiringer,
which they state in the form

Σ_{i≠j} a_ij^2 < 1.

We obtain another of their criteria,

Σ_{i≠j} |a_ij| < 1,

by noting that, if σ_i^{(p)} are the elements of s_p, then |σ_i^{(p+1)}| ≤ Σ_{j≠i} |a_ij| |σ_j^{(p)}|,
whence the criterion implies that Σ_i |σ_i^{(p+1)}| < Σ_i |σ_i^{(p)}|.
Now suppose A is positive definite and write

(2.121.8)    γ(I + B) = A.

Thus we take A_1 = I, A_2 = B for (2.121.2).  Since

B = γ^{-1} A − I,

it follows that for each proper value λ_i of A there is a proper value
(λ_i − γ)/γ of B, and convergence of the process requires therefore
that every λ_i < 2γ.  If γ is so chosen, and if the proper values of A are
arranged in order of magnitude,

λ_1 ≥ λ_2 ≥ · · · ≥ λ_n,

then the proper values of B, in the same order, are

λ_1/γ − 1,  λ_2/γ − 1,  . . . ,  λ_n/γ − 1,

and at least λ_n/γ − 1 is negative.  If γ is taken too near to λ_1/2, then
λ_1/γ − 1 will be close to 1, and convergence will be slow; if γ is taken
too large, λ_n/γ − 1 will be close to −1, and convergence again slow.
The optimal choice is

(2.121.9)    γ = (λ_1 + λ_n)/2,

for then

(2.121.10)    λ_1/γ − 1 = −(λ_n/γ − 1) = (λ_1 − λ_n)/(λ_1 + λ_n),


the extreme proper values being equal in magnitude.
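A small NumPy sketch of this choice (not from the original text; arbitrary positive definite example, and in practice the extreme proper values would only be estimated):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
y = np.array([1.0, 2.0])

lams = np.linalg.eigvalsh(A)             # proper values, ascending
gamma = (lams[0] + lams[-1]) / 2.0       # (2.121.9)

x = np.zeros(2)
for _ in range(60):
    x = x + (y - A @ x) / gamma          # A_1 = gamma*I, A_2 = A - gamma*I
print(x, np.linalg.solve(A, y))
```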
We may now ask, with Cesari, whether by any choice of a polynomial
f(λ), with F(λ) = λf(λ), the system

F(A)x = f(A)y,

equivalent to the original, yields a more rapidly convergent sequence.
The proper values of F(A) are μ_i = F(λ_i).  If μ′ and μ′′ are the largest
and smallest of the μ_i, we wish to choose F(λ) so that (μ′ − μ′′)/(μ′ + μ′′)
is as small as possible, as we see by (2.121.10).  Hence we wish to choose
F(λ) to be positive over the range (λ_1, λ_n), and with the least possible
variation.
The simplest case is that of a quadratic function F, and the optimal
choice is then

F(λ) = λ(2a − λ),        a = (λ_1 + λ_n)/2.

Ordinarily one does not know the proper values in advance, though one
might wish to estimate the two extreme ones required (e.g., see Bargmann,
Montgomery, and von Neumann), or these might be required for other
purposes. —
2.122. The method of Hotelling and Bodewig.  The iterations so far
considered have begun with an arbitrary initial approximation x_0 (which
might be x_0 = 0).  Suppose, now, that by some process of operating
upon y in the system

(2.122.1)    Ax = y,

perhaps by means of one of the direct methods of solution to be described
below, one obtains a "solution" x_0 which, however, is inexact because
it is infected by round-off.  The operations performed upon the vector y
are equivalent to the multiplication by an approximate inverse C:

(2.122.2)    x_0 = Cy.

Then by (2.11.4) the unknown residual s_0 satisfies the same system as
does x, except that r_0 replaces y, and hence we might expect that Cr_0 is
also an approximation to s_0.  Hence we might suppose that

x_1 = x_0 + Cr_0 = C(2I − AC)y

is a closer approximation to x than is x_0.  Otherwise stated, it would
appear that, if C_0 is an approximation to A^{-1}, then

C_1 = C_0(2I − AC_0)

is a better one.
If C_0 is an approximation to A^{-1}, then AC_0 is an approximation to I.
Let

(2.122.3)    B_p = I − AC_p,        C_{p+1} = C_p(I + B_p).

Then

B_{p+1} = I − AC_{p+1} = I − AC_p(I + B_p) = B_p^2,

and therefore

(2.122.4)    B_p = B_0^{2^p}.

Hence if M(B_0) < 1, then M(B_{p+1}) < M(B_p), and if N(B_0) < 1, then
N(B_{p+1}) < N(B_p), and in fact,

(2.122.5)    M(B_p) ≤ [M(B_0)]^{2^p},        N(B_p) ≤ [N(B_0)]^{2^p}.

If A is positive definite, then we can always transform the equations
if necessary and secure that M(A) < 1.  Then all proper values μ of
I − A satisfy 0 ≤ μ < 1 so that M(I − A) < 1.  In this case we may
take

C_0 = I,        B_0 = I − A,

whence

B_p = (I − A)^{2^p},
and convergence is assured.

To return to the general case, if C_0 is any approximate inverse, so that
M(B_0) is small, we have

A^{-1} = C_0(I − B_0)^{-1} = C_0(I + B_0 + B_0^2 + · · ·),

and the series converges.  It is easily verified that

(I + B_0)(I + B_0^2) = I + B_0 + B_0^2 + B_0^3,
(I + B_0)(I + B_0^2)(I + B_0^4) = I + B_0 + · · · + B_0^7,
. . . . . . . . . . . . . . . . . . . . . . . . . . . .

Hence A^{-1} is expressible as the infinite product

(2.122.6)    A^{-1} = C_0(I + B_0)(I + B_0^2)(I + B_0^4) · · · .

In applying this scheme, one retains the successively squared powers of
B_0, adding I after a squaring, and multiplying by the previously obtained
approximation.  The identical argument carries through if we take

B′_0 = I − C_0 A,

and obtain

(2.122.7)    A^{-1} = · · · (I + B′_0^4)(I + B′_0^2)(I + B′_0) C_0.


This scheme is given by Hotelling and by Bodewig.
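A minimal NumPy sketch of the improvement step C_{p+1} = C_p(2I − AC_p) (not from the original text; arbitrary example data, with C_0 a deliberately crude approximate inverse), showing how the error B_p is roughly squared at each step:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
C = np.linalg.inv(A) + 0.05 * np.ones((2, 2))   # crude approximate inverse C_0

I = np.eye(2)
for p in range(6):
    B = I - A @ C
    print(p, np.linalg.norm(B))                 # N(B_p) shrinks like N(B_0)^(2^p)
    C = C @ (2 * I - A @ C)                     # the Hotelling-Bodewig step
print(np.allclose(C, np.linalg.inv(A)))
```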
In solving a system of equations, it may be preferable to operate
directly on the vector y and the successive remainders, as was originally
suggested.  In this case the matrix C may not be given explicitly.  Let

v_0 = x_0 = Cy,        r_0 = y − Ax_0 = B_0 y,
(2.122.8)    v_{p+1} = Cr_p,        r_{p+1} = r_p − Av_{p+1} = B_0 r_p.

Then

x_p = v_0 + v_1 + · · · + v_p,
(2.122.9)    r_p = y − Ax_p = B_0^{p+1} y.

The procedure is to compute v_{p+1} from the last remainder r_p, and then
compute the next remainder r_{p+1}.
2.13. Some Estimates of Error. The fact that an iterative process con-
verges to a given limit does not of itself imply that the sequence obtained
by a particular digital computation will approach this limit. If the
machine operates with σ significant figures in the base β, we are by no
means sure of σ significant figures in the result.  At some stage the
round-off errors introduced in the process being used will be of such
magnitude that continuation of the process is unprofitable. However
another, perhaps more slowly convergent, process might permit further
improvement. In any case it is important to be able to estimate both
the residual errors and the generated errors. In presenting these
estimates, it will be supposed that the equations to be solved are them-
selves exact. The extent to which an error in the original coefficients
affects the solution will be discussed in a later section.
2.131. Residual errors.  We first consider residual or truncation errors,
neglecting any effects of round-off.  If M(H) < 1, then from the identity

(I − H)^{-1} = I + H(I − H)^{-1}
            = I + H(I + H + H^2 + · · ·)

it follows that

M[(I − H)^{-1}] ≤ 1 + M(H)M[(I − H)^{-1}].

Hence

(2.131.1)    M[(I − H)^{-1}] ≤ 1/[1 − M(H)].

Thus in certain cases a bound for the maximum value of a reciprocal can
be obtained from the matrix itself.  From the same identity, since
N(I) = n^{1/2}, if N(H) < 1, then

(2.131.2)    N[(I − H)^{-1}] ≤ n^{1/2}/[1 − N(H)],

and if nb(H) < 1, then

(2.131.3)    b[(I − H)^{-1}] ≤ 1/[1 − nb(H)].
The last two inequalities are generally less sharp but more easily applied.
Now consider the sequences C_p and B_p defined by (2.122.3).  Since

A^{-1} = C_0(I − B_0)^{-1},

if we set H = B_0, then

(2.131.4)    M(A^{-1}) ≤ M(C_0)/[1 − M(B_0)],

provided the denominator is positive.  To establish the analogous
inequalities using N and b, we note that

A^{-1} = C_0 + C_0 B_0 + C_0 B_0^2 + · · · ,

whence

(2.131.5)    N(A^{-1}) ≤ N(C_0) + N(C_0)N(B_0) + N(C_0)N^2(B_0) + · · ·
                       = N(C_0)/[1 − N(B_0)],

and

(2.131.6)    b(A^{-1}) ≤ b(C_0) + nb(C_0)b(B_0) + nb(C_0)b^2(B_0) + · · ·
                       = b(C_0)/[1 − nb(B_0)],

again provided the denominators are positive.
Now from (2.122.4) and (2.122.3) we can deduce that

C_p = A^{-1}(I − B_0^{2^p}),

or

(2.131.7)    A^{-1} − C_p = A^{-1} B_0^{2^p}.
MATRICES AND LINEAR EQUATIONS 59

Hence given the SApienses on M(Bo), N(Bo), or nb(Bo), as the case may
be, we have
M(A™ — C,) < M(A)M” (Bp),
whence by (2.131.4)

(2.131.8) .M(A+— Cy) < M(Co)M”(Bo)/[1 — M(Bo)],


and eee orouels,
(2.131.9) N(A-! — Cz) < N(Co)N?(Bo)/[1 — N(Bo)],
(2.181.10) b(A-1 — C,) < n?*b(Co)b?(Bo)/[1 — nb(Bo)].
We can write (2.131.7) in the form
—t— C, = Co — Bo)—1B?’.
If
Tp = Upy, 8p) = % — Xp,
then
8p = Co(I — Bo)
BP y.
Hence we have, for example,

N(8p) < M(Co)M*”(Bo)N(y)/[1 — M(Bo)].


If e; is the ith unit vector, eJs, is the 7th element in s,. Hence we may
write, with de la Garza,

(2.131.11) lelsp| < N(ejCo)M*(Bo)N(y)/{1 — M(Bo)],


(2.131.12) lels,| < N(elCo)N??(Bo)N(y)/[1 — N(Bo)].
For the Seidel iteration we have

A = Ai + Aa,

A-! = (I — H) Ap, - H = —AzAs.


Let
ee JG@+H+H?+ omen te + H?—)Az1

Then
A — C, = HI — H)Azt
and

(2.131.138) M(A-! — C,) < M*(H)M(A;?)/(1 — M(A)I,


(2.131.14) N(A-! — C,) < N*(H)N(A7z)/(1 — M(A)],
(2.1381.15) b(A-? — C,) < n?tb?(H)b(Az?)/[1 — nb(A)).
If we take zo = 0 in the Seidel iteration, then x, = Cy, and the devia-
tion is x — t» = (A-!— C,)y. In general, however, for an arbitrary Xo
if rp = y — Azp, then
tp = A\H?Az"r,
60 PRINCIPLES OF NUMERICAL ANALYSIS

and
N(r,) < M(A1)M(Az1)M?(A)N
(10)-
Since v4
Tpp1 — Zp = HPA;z'y,
« — a, = (I — H)H?A;ry,
then ;
Sp = 2 — ty = (I — A)“ (hpy1 — 2p).
Hence
(2.131.16) N(8) < N(@p11 — %)/(1 — M(H]
< N(Sp11 — 2) /{l — NCD).
Also
Sy = (= A) AG, Ty—1),
(2,131.17) N(8p) < M(H)N (ap — %p-1)/[1 — M(H)]
< N(H)N (ap — %p-1)/[1 — N(A)).
These inequalities provide estimates of the error in terms of the magnitude
of a particular correction.
2.132. Generated errors. If A is symmetric, let » and u be the numeri-
cally smallest proper value and an associated proper vector, respectively.
If xo is any approximation to z = A-y, then Axo = y — fro while
A(to tu) =y —Tot wu.
Hence if » is very small, a large component in 2» along u would appear in
ro a8 only a small component in the same direction. Another way of
saying this is to say that a putative solution x» might yield a residual ro
that would be regarded as negligibly small even when 2 has a large
erroneous component along wu.
In general, for any matrix if x and x, are two putative solutions, then
%1—- Xo = A“ (re == 11),
(2.132.1) N(a
— 2) < N(r2 — 1)M(A~*) = N(r2 — 11)/m(A),
and if m(A) is small, then a large difference x, — x2 could result in only a
small rz — ri, possibly less than the maximum round-off error. In fact,
if «is the limit of the round-off error, then e/m(A) represents the limit of
detectable accuracy in the solution.
There is no a priori assurance, however, that any particular method of
solution will give a result that is even that close. We therefore consider
this question for some of the iterative methods described above. It will
be assumed, for definiteness, that the operations are fixed-point with
maximal round-off ¢, all numbers being scaled to magnitude less than
unity, and that in the multiplication of vectors and matrices it is possible
to accumulate complete products and round off thesum. If each product
is rounded, then generally in the estimates given below the factor « must
be multiplied by n, the order of the matrix.
MATRICES AND LINEAR EQUATIONS 61
In any iterative process which utilizes one approximation to obtain one
that is theoretically closer, the given approximation actually utilized in
the computation, however it may have been obtained, is digital. To the
digital approximation one applies certain pseudo operations to obtain
another digital approximation. Two partially distinct questions arise:
Given a digital approximation and a particular method of iteration, can
we be sure that the next iteration will give improvement? Given two
digital approximations, however obtained, when can we be sure that one
is better than the other? These are questions relating to both the
generated and the residual errors, since for iterative methods they merge
together.
Basic to the discussion is the fact that, when a product Azo, say, of a
digital matrix by a digital vector, is rounded off by rounding only the
accumulated sums and not the separate products of the element, then the
resulting digital vector, which will be designated (Axo)*, satisfies

(2.132.2) b[Ax — (Az)*] < «,


N[Aao — (Axo)*] < ne.
An additional factor m would appear if each product of elements were
rounded. Likewise, for the multiplication of two digital matrices the
digital product satisfies
(2.132.3) b[AC — (AC)*] < «,
N[AC — (AC)*] < ne.
2.1321. The Seidel Process. We consider only a positive definite
matrix A. The process is based upon the fact that the vector x which
satisfies the equations is the vector x which minimizes the function
x'(Ax — 2y) = —a2"(y+7r).
Hence it maximizes z"(y + 7). Hence, given two approximate solutions
xo and 2, we shall say that x; is a better approximation than x» provided

(2.1321.1) aivy +11) > zhy + 10),


and if the two quantities are equal, the two approximations are equally
good. However for making the test in a particular instance, there will be
available only the vectors

| 1, Ax eee pie Qyrk.


By (2.132.2),
N(ry — “3) = N[(Az,y)* — Azz] < ne,
and therefore
lat(rp — r3)| < N(&p)N (rp — 75) S n%*eN (a5).
Also
al(y + rp) = ah(y +15) + xh(t, — 15).
62 PRINCIPLES OF NUMERICAL ANALYSIS

Hence

ely + ro) — aly tr) = hy +r) — ally + rf) + aro — 99) a xi(r1 we rt),
and (2.1321.1) will certainly be satisfied if
(2.1321.2) 2T(y + rf) — xh(y + 79) > n4*{N@o) + N@)].
Since we can also say that
\2X(rp — rz)| < nb(xp)b(rp — rp) < neb(xp),
therefore (2.1321.1) is also implied by 7
(2.1821.3) a}(y + rt) — ai(y + 19) > nelb(xo) + b(@1)).
This requirement is somewhat more stringent.
Now consider a particular approximation 2» and the digital approxima-
tion that would be obtained from 2» following a single projection. Can
we be assured that the digital result of making the projection will be a
better approximation than x9? If the projection is made on e;, we wish
to know whether
(to + A*e,){Ly + [y — A(wo + A*e,)]} > zO(y + 70),
where
r* = elré mon OER

We suppose every a; = 1. This does not violate the requirement that all
stored quantities be in magnitude less than unity since the a,; need not
be stored explicitly in this case. Hence
A* = elry, A = elro.
The above inequality reduces to
2r* hak.

For A* > 0, this is equivalent to


h* > 2—* — 2),
and for \* < 0 it is equivalent to
A* < 2(A* — 2).
Since |A\* — d| < ¢, either condition is assured by
(2.1321.4) In*| = lelr*| > 2c.
If |\*| < 2c, then d* = ejro = 0, and no change is madein 9; if |\*| = 2c,
then at least the modified vector is not worse than a. Hence in spite of
the round-off, no step in the process can yield a poorer approximation,
gad in general any step will yield a better one until ultimately some
ro = 0.
MATRICES AND LINEAR EQUATIONS 63

2.1322. Iteration with an Approximate Inverse. Next consider an


arbitrary nonsingular matrix A with a given approximate inverse C and a
given approximation 2) tox = A~y. Elements of the inverse will be out
of range if the elements of A are digital. Hence C must be stored in the
form @-7C, where y is some positive integer, and the elements of this
matrix will be assumed digital. Also y will be supposed large enough so
that all prescribed operations yield numbers in range. We suppose y
scaled so that + and any approximation are in range. Hence 2» is sup-
posed digital.
As a measure of magnitude of a vector 7, we use b(r), and associate with
it a measure of magnitude of a matrix A, denoted c(A), and defined by

c(A) = max ) |v;

One verifies easily that for any two matrices A and B

c(AB) < ¢(A)c(B),


C(As eb) as (A) + c(B).
If we form a matrix having any vector r in the first column and zero else-
where, and apply the first of these inequalities, we conclude that
b(Ar) < c(A)b(7).
Moreover
c(A) = max b(Ar)/d(r),
r#0

though this property will not be required.


If x) and 2, are any two digital approximations to x, we ask first under
what conditions we can be assured that b(r1) < b(ro). Since we calculate

ae ee (Azp)*, (i ip 2,

we can be assured of the relation only when


b(r*) + b(ri — rt) < b(ri‘) — b(ro — rf).

But each element of r¥ can be in error by an amount e (if individual


products are rounded, it is ne), whence the condition is

b(r*) < b(r#) — 2c.


When the equality holds, then at worst b(r1) = b(ro).
Now suppose we have the approximation 2» and wish to decide whether
to attempt to improve the approximation by forming 2) + Cro. Are we
assured of obtaining a better approximation? Actually we form a
digital vector
Ly = Lo + B(B-7Crs)*,
64 PRINCIPLES OF NUMERICAL ANALYSIS

and the question is whether this is a better approximation than %. We


have identically —
(2.1322.1) m1 = ro — rx + B*r$ + (B — B*)re
+ BrA[s-*Cr¢ — (6-7Crg)*],
where
Bra] — AC,” “B= pip — {Ae 7Cyl.
Hence an improvement is assured if b(ro) exceeds the b function of the
right member of (2.1322.1), and this is certainly true when b(ry) —
b(ro — r*) exceeds the same quantity. Hence the condition is
(2.1822.2) b(r*) > {2b(ro — xX) + Brc(A)b[B-7Crs — (8-7Crf)*]}/
[1 — c(B*) — e(B — B*)).
In this relation c(A), c(B*), and b(r#) can be evaluated directly while the
other quantities are limited by the computational routine.
By the contemplated routine of rounding off the accumulation of
products, each element of (@-7Crf)* can differ from B-7Cr¢ by as much
as e, whence
b[B-*Cr¢ — (8B -7Crs)*] < «.
As for B*, each element in [A(8~7C)]* can be in error by e, and n terms
contribute to the C function. Hence
c(B — B*) < ne6’.
The condition required is therefore
(2.1322.3) b(rg) > [2 + BrYc(A)]/[1 — neBy — c(B*)].
The dominant term in the numerator is probably BYc(A).
A slight modification of the routine can improve the situation by
“damping out” the term 6%c(A). Since rf is presumably small, it should
be possible to scale it up by a factor 6°, forming (6°-yCr*)*. If this is
done, one has
bla1Cr* — (B1Cr8)*] < 6,
and in place of (2.1322.3),
(2.1322.4) b(ry) > 2 + B%*c(A)]/[1 — neBy — c(B*)].
On consecutive iterations as the residual diminishes, 6 can be increased,
possibly even until the term 67*c(A) becomes negligible. Inthe denomi-
nator, c(B*) can be reduced, if desired, by iterating to improve the
approximate inverse. But the term ne’ is not at our disposal.
By a further modification of the routine, the factor 6 can be brought
before the entire numerator. If the products required in forming Azo
can be accumulated before rounding, the accumulation can also be sub-
MATRICES AND LINEAR EQUATIONS ; 65
tracted from y and the result multiplied by ge before penne? This
gives
b(ro — rf) < Be.
If this is done, the condition
(2.1822.5) (rr) > eB [2 + Brc(A)]/[1 — neB? — c(B*)]

is sufficient to ensure improvement. Since the stored quantities are


elements of 6*rf, the elements of r* can be made actually less than the
maximal round-off.
2.2. Direct Methods. A direct method for solving an equation or
system of equations is any finite sequence of arithmetic operations that
will result in an exact solution. Since the operations one actually per-
forms are generally pseudo operations, the direct methods do not gen-
erally in practice yield exact results. Nevertheless, the results may be as
accurate as one requires, or it may be advantageous to apply first a
direct method after which the solution may be improved by the applica
tion of one of the iterative methods.
Certainly all (correct) direct methods are equivalent in the sense that
they all yield in principle the same exact solution (when the matrix is
nonsingular and the solution unique). Nevertheless the methods differ
in the total number of operations (additions and subtractions, multipli-
cations, divisions, recordings) that they require and in the order in which
these take place. As a consequence they differ also as to the opportuni-
ties for making blunders and as to the magnitude of the generated error.
Most direct methods involve obtaining, at one stage or another, a
system of equations of such a type that one equation contains only one
unknown, a second equation contains only this unknown and one other, a
third only these and one other, etc. The procedure for solving such a
system is quite obvious. The matrix of such a system is said to be
triangular (or semidiagonal), since either every element below the prin-
cipal diagonal, or else every element above the principal diagonal, has
the value zero. A matrix of the first of these types is upper triangular;
one of the second is lower triangular. If it happens, in addition, that
every diagonal element is equal to 1, then the matrix is unit upper tri-
angular or unit lower triangular, as the case may be. We shall say that
a matrix M is of triangular type, upper or lower, if it can be partitioned
into one of the two forms

ean we (Mt Me), yea (M2)


_ (Mu Mu _ (Mu a)

where M;, and M2 are both square matrices, as is M itself. We shall


consider such matrices briefly.
66 PRINCIPLES OF NUMERICAL ANALYSIS

2.201. Matrices of triangular type. If M is of upper triangular type,


then M™ is of lower triangular type. Hence it is sufficient to consider
just one of these types. One verifies directly that, if M and.N are both
of upper triangular type when similarly partitioned (7.e., corresponding
submatrices have the same dimensions), then the product MN is of upper
triangular type when similarly partitioned. If, further, M1: and M22 are
nonsingular, then M~—! exists, and is of upper triangular type when simi-
larly partitioned. In fact,

(2.201.1) Mo = (ee aa vata


) 2
Hence if My; is upper triangular and M2. a scalar, then MM itself is upper
triangular and not merely of upper triangular type. In this case (2.201.1)
provides a stepwise procedure for inverting a triangular matrix, if M2.
is a scalar while Mj; is a matrix which has been inverted.
If a matrix A is partitioned as
Au Ai
2202 A
( ) (a as)

with Ai: nonsingular, then A can be expressed in many ways as the


product of matrices of triangular type in the form
‘ Au ra les 0 Miu My
eehl2)
2.201.3
G. Au) =
SANG mene 0 a) :
In fact, for an arbitrary nonsingular Ni (of the dimensions of A1:),
Mii, Mi, and N2: are uniquely determined independertly of Ao: and of
the selection of N22 and Mz. For one verifies directly that
(2.201.4) Mi = NPAu, Mie = NWA, Nox oe AnAZN nu,

while N22 and Me: are restricted only by the relation


N2x2M2 = Ao. — AAPA 12.

This being the case, we can give an inductive algorithm for a factoriza-
tion of A into the product of a unit lower triangular matrix L and an
upper triangular matrix W. That such a factorization exists and is
unique when A is of second order and Ai; # 0 follows from the above by
taking Nir =N2,2 = 1. For purposes of the induction suppose that Nu
above was unit lower triangular and M,, upper triangular. Then M1,
and N» are uniquely determined by (2.201.4). We change the notation
and partition further, writing
Ai Ai Ais Li 0 0 Wi Wie Wis
(2.201.5) Aa Ago Agog = Lox Lie 0 0 Wo Wes ’
As: Azo Az 31 Lee Ls 0 0 W;
MATRICES AND LINEAR EQUATIONS 67

where Iu. = Nu, Wir = Mi, Aus is the same as above, but the sub-
matrices. previously designated A22, Asi, and Ais are now further par-
titioned, as are the matrices Nei and Mie. When the necessary inverses
exist, these last matrices or their own submatrices are determined
uniquely by Eqs. (2.201.4) which now have the form

Wx = LAr, Wis = Ly Ais,


Le = AaWz, Ls = As Wz.

Four conditions remain to be satisfied. Of these, three give


Woe = Ly} (Ace — LaWw),
(2.201.6) Wes = Ly} (Aos — LeiWis),
Ls2 = (Age — Lsi1Wi2) Woe.

Hence W22, Wes, and L32 can be determined uniquely from A and from the
portions of L and W already determined, provided only that As. — LeaWi
is nonsingular, and independently of the choice of the matrices L33 and
Ws3. The last condition merely specifies the product L3;3W33s. Hence
for the inductive algorithm take I2. = 1 and determine the scalar Wee
and the vectors W23 and Lz32. Now the matrices
a 0 ) de )
Dey De 0 Wx

are unit lower and upper triangular matrices, respectively, to be desig-


nated Li; and W,; for the next steps. The process fails if at some stage
We. = 0. If A is nonsingular, it is always possible to rearrange rows and
columns and continue.
By applying the process to A‘, we note that we could equally well make
W a unit upper triangular matrix and L lower triangular, and again the
factorization is unique.
When A is symmetric and L is unit lower triangular, let D? represent
the diagonal matrix of elements of the principal diagonal of W. Then
Dy =.

and the factorization can be written

A = LDL".
If we write
LD = K,
then
A = KK".

If A is not positive definite, then D* will have negative elements, and


hence D will have pure imaginary elements. However this presents no
real computational difficulty.
68 PRINCIPLES OF NUMERICAL ANALYSIS

We conclude this section by noting that, if B is an arbitrary matrix


of m <n linearly independent columns, there exists a unique unit upper
triangular matrix V of order m such that the columns of
(2.201.7) Rk. = BV
are mutually orthogonal with respect to the matrix G. This is to say
that the matrix ;
(2.201.8) R'GR = D
is a diagonal matrix. To make the proof inductive, and exhibit the
algorithm, suppose this has been accomplished for the matrix B with
m <n, and suppose that the vector b is independent of the columns of
B. We wish then to select vectors r and v so that

(2.201.9) Gy aR (% 2
and ¢
GR = 0.

These conditions are satisfied by taking


(2.201.10) v = D"R'Gd, r=b— Rv.
Geometrically, this process amounts to resolving b into a component in
the space of the columns of B (or of R) and a component orthogonal to
this space, the latter component becoming the vector r.
Most of the classical direct methods for solving systems of linear
equations can now be deduced almost immediately.
2.21. Methods of Elimination. In elementary algebra one learns to
solve a system of equations by “‘the method of elimination by addition
and subtraction.” In this method an equation is selected in which the
coefficient of the first unknown &, is non-null, and one adds an appropri-
ate multiple of this equation to each of the others in turn so that £
is eliminated from these equations. The resulting n — 1 equations,
together with the one used for the elimination, constitute a new system
equivalent to the first system, and the n — 1 equations contain only the
nm —1 unknowns £,..., &. The same process applied to these
yields n — 2 equations containing only the n — 2 unknowns £3, . . . , &n.
Eventually one obtains a single equation in £, alone. The final solution
is now obtained by “‘back substitution,” in which the value of £, obtained
from the last equation is substituted into the preceding which can then
be solved for £1, these are then substituted into the one before, ete.
Thus the elimination phase followed by the back-substitution phase
yields the final solution.
In the elimination phase, the operation of. eliminating each of the
MATRICES AND LINEAR EQUATIONS 69

unknowns is equivalent to the operation of multiplying the system by a


particular unit lower triangular matrix—a matrix, in fact, whose off-
diagonal non-null elements are all in the same column. The product of
all these unit lower triangular matrices is again a unit lower triangular
matrix, and hence the entire process of elimination (as opposed to that
of back substitution) is equivalent to that of multiplying the system by a
suitably chosen unit lower triangular matrix. Since the matrix of the
resulting system is clearly upper triangular, these considerations consti-
tute another proof of the possibility of factorizing A into a unit lower
triangular matrix and an upper triangular matrix.
For the system
Ax =y,
after eliminating any one of the variables, the effect to that point is that

ee
of having selected a unit lower triangular matrix of the form

Lie I 22

where Ji; is itself unit lower triangular, in such a way that A is factored

@my A= (4m 4)- (2 7)" ae)


with W11 upper triangular but M22. not. Hence
(2.21.2) Mr. = Aoe — ArrAzPAr.
The original system has at this stage been replaced by the system

(2.21.3) (4i‘ ys) (®:) . (2)


where
Inu 0 2)ES (2) re
(2.21.4) e Ba)(@ mt y.

The matrices Li; and Ls; are not themselves written down. The partial
system
M o2%2 = 22

represents those equations from which further elimination remains to be


done, and this can be treated independently of the other equations of the
system, which fact explains why it is unnecessary to obtain the L matrices
explicitly.
If the upper left-hand element of M22 vanishes, this cannot be used
in the next step of the elimination, and it is not advantageous to use it
when it is small. Hence rows or columns, or both, in M22 must be
70 PRINCIPLES OF NUMERICAL ANALYSIS

rearranged to bring to this position an element that is sufficiently large.


Corresponding changes must be made in the notation.
Crout’s method differs in that the L matrices are written down explic-
itly at each stage while M22 is not. It utilizes the inductive algorithm
given in (2.201.6) where, as we have seen, each successive column of L
and row of W can be obtained from those previously computed together
with the corresponding row and column of A. In order to compute the
vector z as one goes along, one writes the augmented matrix (A, y).
Then the partitioning of (2.201.5) is extended to the following:
Au Ax Ais Y1
(2.21.5) Ao. Ao. Ass Ye
Asi Ase Ass Ys
Tae O 0 Wu Wr Wis a
=| Lei Lee 0 0 Wee Woes 22
Dizi Le Ds, 0 0 Wss 2%

and supposing Lu, La, Ls1, Wu, Wie, Wis, 21 already determined, Lo»
is prescribed (in practice Lez = 1), and Ls2, Wee, Wes, 22 are to be deter-
mined at this step. Equations (2.201.6) give Ls2, W22, and Wes, while 2»
is given by
(2.21.6) 22 = Le (y2 = L321).

While in practice one takes Lo. = 1, this equation and Eqs. (2.201.6)
are perfectly general. Since neither L33, W33, nor 23 occurs in any of
these relations, one can, with Crout, write the two matrices L — I and
(W, z) in the same rectangular array, filling out in sequence the first
row, the first column, the second row, the second column, etc. When
this array is filled out, the elements along and to the right of the principal
diagonal are the coefficients and the constants in the triangular equations
Wx =z.
In case one has two or more sets of equations with the same matrix A,
then the vectors y and z may be replaced by the matrices Y and Z in
(2.21.5) and (2.21.6). Alternatively one may solve one of these systems,
after which, with L and W already known, the elements of any other
column z in Z are obtained sequentially from the corresponding column y
in Y by using (2.21.6), remembering that at the start there is no partial
vector 2; so that one has simply ¢1 = m. In particular, if a single system
is solved by this method, and a result x is obtained which is only approxi-
mate because of round-off errors, we have seen that the error vector
x — Xo satisfies a system with the same matrix A, so that (2.21.6) can be
applied with y — Ax» replacing y.
Another modification of the method of elimination is that of Jordan.
It is clear that, after £ has been eliminated from equations 2, . . . , n,
MATRICES AND LINEAR EQUATIONS 71
and while the new second equation is being used to eliminate £ from
equations3, . . . , n, this can also be used to eliminate £ from the first
equation. Next the third equation can be used to eliminate £3 from what
are now equations 1 and 2, as well as from equations 4,...,n. By
proceeding thus, one obtains an equivalent system of the form Dz = w,
where D is diagonal. This amounts to multiplying the original system
Ax = y sequentially by matrices each of which differs from the identity
only in a single column. However this column will have non-null ele-
ments both above and below the diugonal.
Crout’s method provides a routine for triangular factorization which
minimizes the number of recordings and also the space required for the
recordings. This is very desirable, whether the computations are by
automatic machinery or not. For machine computation it has the dis-
advantage of requiring products such as L3i1W12 involving elements from a
column of Z and from a row of W. Jordan’s method permits a similar
economy of recording without requiring operations upon columns.
To see this we note first that, if J is a matrix satisfying

J(A, I) m ae J),

then certainly J = A-!. In words, if it is possible to find a sequence of


row operations which, when performed upon the matrix (A, J), reduces
it to a matrix (J, J), then J is the required inverse of A. Jordan’s method
forms (J, J) stepwise from (A, J) by multiplying on the left by matrices
of the form J + J;, where J; differs from the null matrix in the 7th column
only. .The process will be complicated somewhat in “positioning for
size,’”’ but we neglect this here and assume that the process can be carried
out in natural order from the first column to the last. Then the sequence
starts with the operation

(I + Ji)(A, I) = (At, Ki)


and continues with
(I + J2)(A1, Ki) = (As, Ke),
and generally
(I + Ji)(Ain, Ki-s) = (Ai, Ki).
We now observe that in A; the first 7 columns are the same as those of J,
and in K; the last n — 7 columns are the same as those of J. Thus one
needs to record only the first 7 columns of K; and the last n — 7columns
of A;. The ith column of J; is to be selected so that the 7th column
of the product (J + J;)Ai-1 is equal to e;, and need not be recorded.
Instead one records the ith column of J + Ji, which becomes the 7th
column of K;. If the ith column of A;_: is a1, ... , on, and the 7th
column of I + J;is ¢1, -.. - , $n, then ¢; = a;1, and ¢; = —a;/ai.
Hence in forming the composite matrix of nontrivial columns of A;
72 PRINCIPLES. OF NUMERICAL ANALYSIS

and of K,, one first forms the 7th row from the 7th row of the previous
composite. For this one divides every element but the 7th (which is
a;) by o;, recording the quotient, and in the ith place records oj. To
obtain the jth row (j ¥ 7), one increases each element except the ith by
—a; times the corresponding element in the new ith row. In the 7th
place one records ¢; = —a;/a; = 0 — aj/ai.
Clearly if one operates in this fashion upon the matrix (A, J, y), then
one comes out with (J, A-!, x). Thus in using automatic machinery if
n(n + 1) places are reserved in the memory for (A-}, x), then these places
are to be filled first by the matrix (A, y) arranged by rows. Each multi-
plication by an J + J; requires first an operation upon the elements of
the ith row, followed by an operation upon the elements of the old jth
and the new 7th row.
2.22. Methods of Orthogonalization. Let;
(2.22.1) A=RV, RR
= D?
where V is unit upper triangular and D? is diagonal. We have seen in
§2.201 that such matrices exist. The general metric G of §2.201 is here
taken to be J. The matrices V and R can be computed sequentially by
applying Eqs. (2.201.9) and (2.201.10) with appropriate modification of
notation. Then the equations Ax = y can be written RVx = y so that
D?Vxz = Ry, and
(2.22.2) gee Vie
Since D is diagonal and V unit upper triangular, their inversion is straight-
forward. This is Schmidt’s method.
In the least-squares problem one has a matrix B, with m < n rows,
and a vector y, and one seeks a vector x of m elements such that
Be =y+d, a'B
= 0.

Geometrically the vector y is to be projected orthogonally upon the space


of the columns of B, and the components z of the projection are required.
Since
B'Ba = By,
these equations yield the required x. If, however,

B= RV,... Bik =D,


then
aR = 0,
whence
D*Va = Ry,
and
¢ =, VD-Rly.
MATRICES AND LINEAR EQUATIONS 73
The orthogonalization process can therefore be applied directly to the
matrix B, and B'B is not required explicitly.
To return to the system Az = y, we may equally well write
(2.22.3) A = US, SST = D?,
where U is unit lower triangular. Let w satisfy
xz = Stw.
Then the equations can be written
USS'w = UD*w = y
so that
w = D°U-y,
and therefore
(2.22.4) 2 = S*D-2U-ly,
In this method the rows of S are orthogonal combinations of the rows
of A, and since x’ = w'S, the vector x" is expressed as a linear combina-
tion of these orthogonal row vectors.
If A is positive definite, we could attempt to use A, or possibly A}, as
the metric to obtain an orthogonalized set of vectors along which z
might be resolved easily, and in fact from §2.201 we can form matrices
R and V such that
l= RV, RtAR =D,
so that if
c= Rw,
then
R'ARw = Ry,
w= D hy,
zs = RD“R'y.

Indeed, R = V-', and it is therefore unit upper triangular. Hence the


relation R'AR = D? is equivalent to A = V'D?V, which is the triangular
resolution already obtained but arrived at in a different way.
An orthogonalization process of somewhat different type has been
devised by Stiefel and Hestenes, independently. The process leads to a
fairly simple iteration, which, however, terminates in n steps to yield
the exact solution apart from round-off. Since the n steps yield progres-
sively better approximations to the true solution, the process can be
continued beyond n steps for reduction of the round-off error.
The first step, as applied to a positive definite matrix A, is the same
as in the method of steepest descent, in that one starts with an arbitrary
initial approximation x» and improves it by adding a multiple of the
residual ro. Thereafter, however, instead of adding to each z; a multiple
74 PRINCIPLES OF NUMERICAL ANALYSIS

of r;, one adds a vector so chosen that r;,1 is orthogonal, with respect to
the metric I, to all preceding r;. If this can be accomplished, then for
some m <n, fm = 0, and hence Az, = y. For if all the vectors 7o, 71,
: , fr—1 are non-null, then being mutually orthogonal they are linearly
independent, and only the null vector is orthogonal to all of them.
Geometrically the method has other points of interest. We have
already noted that the solution x of the equations Ax = y minimizes the
function
(2.22.5) 2f(a) = wtAx — Qaly.
In fact it represents the common center of the hyperdimensional ellipsoids
(2.22.6) f(x) = const.
This fact provides the usual approach to the method of steepest descent.
Also at x» the function f(x) is varying most rapidly in the direction of
ro, which is the gradient at 2 of the function —f(x). Hence one takes
L1 = Lo + Af,

where a minimizes the function f(x + aro) of the single variable a.


It can be shown that the point x, is the mid-point of the chord through
Zo in the direction ro of that particular ellipsoid f(z) = f(xo) which passes
through x. It is easy to write the equation of the diametral plane which
bisects all chords in the direction ro. This is a diametral plane of the
ellipsoid f(x) = f(x1) through 21, as well as a diametral plane of
F(a) = fo),
and it intersects the original ellipsoids in hyperdimensional ellipsoids
whose dimensionality is one less than that of the original ellipsoids.
Stiefel and Hestenes now improve the approximation 2; by adjoining
a vector in the direction of the gradient to the lower dimensional ellipsoid,
which is the orthogonal projection upon the diametral plane of the
gradient r; to the n-dimensional ellipsoid. One proceeds then to get
ellipsoids of progressively lower dimensionality until one finally reaches
the center itself.
The success of the method depends upon a theorem (Lanczos) which
will have application also in other connections. Beginning with a
vector ro, suppose one seeks to orthogonalize the vectors ro, Aro, A2ro,
. by selecting a set of vectors ro, 71, 72, . . . in such a way that 7:41 is
a linear combination of the vectors ro, Aro, . . . , Aro, and 7,1 is orthog-
onal to fo, 71, ..., 7: At most n such ss will be non-null, and
we have already shown how any set of linearly independent vectors can
be orthogonalized: We can then express r;,1 as a linear combination of
the vectors ro, 71, . . . , 7: and of Ar;:
Tita = pitro%o + pipziti + ** * + pipes + piAry.
MATRICES AND LINEAR. EQUATIONS 75

For 7 = 0 the statement is trivial, for it merely says that r, is.a linear :
combination of r> and Aro. Suppose that all vectors ro, 71, . . . , 7% are
non-null and that the statement holds for them. Then p; ¥ 0, since
otherwise the mutually orthogonal vectors ro, ..., 1:41 would be
linearly dependent. Since 7; is a linear combination of ro, Aro, . . . ,
A*ro, therefore Ar; is a linear combination of Aro, A’m, ..., A’ro.
Hence r;,1 is expressed as a linear combination of ro, Aro, . . . , A‘fo.
The theorem in question states that
Pi+1,0 = Pitt = + * = prise = 0.

For suppose the resolution made. Then since p; ~ 0, each Ar; is a


linear combination of ro, 71, ..., Ti+1- Hence Ar, is orthogonal to
every r;forj7 >7+1. Hence
riAr, = 0, fed
Hence Ar; is a linear combination of only r;-1, 7;, and 7:41, which proves
the theorem.
After simplifying the notation, we can therefore set
(2.22.7) Tea =yilrs — Batis — Ary).
We would like to arrange it so that these vectors 7; are residuals y — Aa;.
Then we require that
Of lam Ati === vy eH Ax; sia Bialy oa Axi-1) re ae

If we impose the condition that


(2.22.8) yl = 6:1) = 1,
then we can achieve this by taking

(2.22.9) Vir = ¥i(ts — Beavti-a F aes).


For 7 = 0 we have
(2.22.10) ae Bese
Let
(2.22.11) p=rir, 0; = rIAry.
When we apply the orthogonality criterion, we find first
(2.22.12) a; = pi/oi.

Also
par = ayy Ars,

so that by reducing subscripts


(2.22.13) — fTArea = —pi/(ou-1¥e-1)-
76 PRINCIPLES OF NUMERICAL ANALYSIS
‘ Hence, since
a O = Biipi-1 + ar Ari,
therefore

(2.22.14) Bia = acspi/(e—1ps-1Y:-1)-


Hence, beginning with ao, and yo = 1, we can find i, then a1, Bo, 71, and
hence re, and so on sequentially.
From (2.22.7) we can write

Tar — Ts = VdBiailrs — M1) — aAri],


Tint — Be = YAB-(@s — Ui) + ari).
Therefore

(222.10) Lip. = 1% + DZ,


and hence

(2.22.16) Ti = 1; — Az,
(2:22.17) Zina = Tint + we,

where

(2.22.18) Nis = Qi, Bi = Vip 1Bidi/Acp1 = pirr/pi-

It can be shown inductively that

(2.22.19) 2iAz; = 0, EE Ie

To begin with, from


20 = 10,
T1 = To — AnAzZo,

4=m+ H0Z0,
we have
zhAz. = 2bAri + poziAzo
and
riry = —dorlAzo.

Hence, elimination of r}Azo gives with (2.22.11)

2jAz = —p1/Xo + Hoo,

and this is seen to vanish from (2.22.18), (2.22.14), and (2.22.12). Now
suppose (2.22.19) verified for allj <i<k. From (2.22.16) we have

riAz; = 0, jgst-lj>tt+1,
(2.22.20) pitt = —Asr], Az,
i = ArT Ag.
MATRICES AND LINEAR EQUATIONS 77
Hence, from (2.22.17) with i = k we have the required relation verified
when? = k+1andj<k-—1.
Again, from (2.22.17) with i= k

Aces: = 2fArig: + mezpAzr.


But from (2.22.17) with 7= k — 1 and from (2.22.20)
(2.22.21) ziAz = ziAr, = pr/d,
whence, again with (2.22.20),

2 Azer: = —pryi/e + pepr/d, = O.


This completes the proof of (2.22.19).
If we set
(2.22.22) 1 = Az,
then from (2.22.16) and (2.22.17) we can calculate sequentially zo = 70,
Xo, 71, Mo, 21, Ar, T2, M1, 22, . . . from the formulas

(2.22.23) Ai = pi/Ti, Ms = ig /pi


These equations are obtained by making use of (2.22.20) and (2.22.21).
From these it is clear that
4; > 0, pi > 0.

Hence, from (2.22.18) y; > 0 and 6B; > 0. Hence


lk S> eh S> OL

We can now relate the method to the minimizing problem. The


ellipsoid
F(z) = f(xo)
passes through the point 2; as varies, the points x» + du lie on the
secant through 2» in the direction u, and this secant intersects the ellipsoid
for \ = 0 and again for
A=M = 2u!ro/utAu.

To see this we have only to solve the equation f(a» + Au) = f(xo) for X.
If we take wu = 7o, then A’ = 2X9. Hence 2; is the mid-point of the chord
in the direction 79.
Now the plane rj}A(a — x1) = 0 passes through 2; and also through
the point = A-y, for by direct substitution the left member of this
equation becomes rjr: which vanishes because of orthogonality. This
plane is a diametral plane of the ellipsoid f(x) = f(x); it intersects this
latter ellipsoid in an ellipsoid of lower dimensionality. Instead of choos-
ing 22 to lie on the gradient to f(x) = f(x1), as is done by the method of
73 PRINCIPLES OF NUMERICAL ANALYSIS

steepest descent, the method of Hestenes and Stiefel now takes x2 to lie
on the orthogonal projection of the gradient in this hyperplane, or, what —
amounts to the same, along the gradient to the section of the ellipsoid
which lies in the hyperplane. At the next step a diametral space of
dimension n — 2 is formed, and 2; is taken in the gradient to the section
of the ellipsoid f(x) = f(a2) by this (n — 2) space. Ultimately a diame-
tral line is obtained, and z, is the center itself. With the formulas
already given these statements can be proved in detail, but the proof will
be omitted here.
2.23. Escalator Methods. Various schemes have been proposed for
utilizing a known solution of a subsystem as a step in solving the complete
system. Let A be partitioned into submatrices,

Cee) (42 Ass


A 11 A 2)
2.23.1 A= ,

and suppose the inverse A;} is given or has been previously obtained. If

Cae) ‘@ Cu
| Cu Eu
22502 es (0) ce ’

sc -(Oo.
then
I 11 Za

Is2/’
where the J;; and the 0;; are the identity and null matrices of dimensions
that correspond to the partitioning. Hence if we multiply out, we obtain

AuCi + Ai2Coa = Tu,


AunCr. ata A12C 2% = O12,
(2.23.3)
AaCy + AC = 021,
AoC. + AoC22 = Tee.

The following solution of the system can be verified:

Coz = (Aoze — AnAzA)-},

(2.23.4) Cry = —AA Cx,


Cor = —C2AnAz},
Cu Alu ae Ay2Co1).

If Ave is a scalai, Ais a column vector, and As: a row vector, then C2»
is a scalar, Ci, a column vector, and Cz: a row vector, and the inverse
required for C22 is trivial. The matrices are to be obtained in the order
given, and it is to be noted that the product A;z!Ai2 occurs three times,
and can be calculated at the outset. If A is symmetric, then

Cox = Cle
MATRICES AND LINEAR EQUATIONS | 79

In any event the matrix C2» is of lower dimension than C, and the required
inversion: more easily performed. It is therefore feasible to invert in
sequence the matrices
pages Q@i1 GQi2 13
12
(a1), >| ei Geo Gog, - ++ 4
O21 22
O31 a32 Q33

each matrix in the sequence taking the place of the Aj; in the inversion of
the next.
In the following section it will be shown how from a known inverse
A one can find the inverse of a matrix A’ which differs from A in only
a single element or in one or more rows and columns. It is clearly pos-
sible to start from any matrix whose inverse is known, say the identity I,
and by modifying a row or a column at a time, obtain finally the inverse
required. However, these formulas have importance for their own sake,
and will be considered independently.
2.24. Inverting Modified Matrices. The following formulas can be
verified directly: |
(2.24.1) (A + USV")-! = A-! — A“1US(S + SV™A—'US)SV™A—1,
(2.24.2) (A+ US“*Y!)* = At — AUS + VA) VIA
provided the indicated inverses exist and the dimensions are properly
matched. Thus A and S are square matrices, U and V rectangular. In
particular, if U and V are column vectors wu and », and if the scalar S = 1,
then
(2.24.3) (A + w')-! = A! — (Au) (TA-1)/(1 + TAM).
If u = e;, then the 7th row of w is v', and every other row is null;
if v = e;, then the 7th column of wo" is wu, and every other column is null;
if u = ce;, where o is some scalar, and v = e;, then the element in the
ith row and jth column of wo’ is ¢, and every other element is zero. In
the last instance, v'A-u is o(a1),;, where (a); is the indicated element
of A-!. We have then the interesting corollary that the matrix A + w
becomes singular when ¢ = —1/(a~),;.
2.25. Matrices with Complex Elements. If the coefficients of a system
of linear equations are complex, then the matrix can be written in the
form A + 2B, where A and B have only real elements. In general we
may expect the solution to have complex elements. Hence the equations
can be written in the form

(A + 1B)(z + ty) = ¢ + ad,


where the vectors x, y,c, anddareallreal. However, this is equivalent to
(At — By) + (Ay + Br) = e+ 22,
80 PRINCIPLES OF NUMERICAL ANALYSIS

and since the real parts and the pure imaginary parts must be separately
equal, this is equivalent to the real system of order 2n:
Az — By =¢,
Bx + Ay =d,
or

(Senay) =)
Thus the complex system of order 7 is equivalent to a real system of order
2n, since these steps can be reversed. The complex matrix A + 7B is
singular if and only if the system with c+ id = 0 has a nontrivial
solution, and this occurs if and only if the real matrix of order 2n is
singular.
A complex matrix is called Hermitian in case A is symmetric and B
skew-symmetric, 1.e., in case
Al aA, pie
But then the real matrix of order 27 can be written

(5BoA4)je
and it is symmetric. Hence the complex matrix is Hermitian if and only
if the corresponding real matrix is symmetric. A Hermitian matrix is
positive definite if and only if for every non-null complex vector x + ty
it is true that
(at — ty")(A + 1B)(z + ty) > 0.
This implies that the quantity is real. But if we evaluate the quantity on
the left, we obtain

a'(Ax — By) + y'(Ba + Ay) + i[z™(Bc + Ay) — y™(Ax — By).

Since A is symmetric and B skew-symmetric, the quantity within brackets


vanishes, and the quantity in question is certainly real whenever the
matrix is Hermitian. As for the rest, we have
o'(Ax — By) + y"(Ba + Ay) = (at, y") c
a (*).
If this is positive for every choice of x and y, then the real matrix of order
2n is positive definite. Hence a Hermitian matrix of order n is positive
definite if and only if the corresponding real matrix of order 2n is positive
definite.
Throughout the discussion of methods of inverting matrices and solving
systems of equations, we have tacitly assumed that all quantities were
MATRICES AND LINEAR EQUATIONS 81
real, though in fact many of the processes were equally applicable to the
complex case when appropriate changes are made in the wording. How-
ever, rather than complicate the exposition, we have preferred to treat
only the real case and reduce the complex case to the real.
2.3. Some Comparative Evaluations. For either inverting a matrix
or for solving a system of equations, there is no single method that is
clearly best for all matrices or all systems. Some matrices may have
many null elements, especially those which result from the finite difference
approximation to a differential equation. Analysis of the system may
show that a method that would be highly inefficient in general would work
admirably well for the particular case. On the other hand, if one has
many systems to solve, all differing among themselves, it may be more
efficient to use the same scheme for all of them than it would be to analyze
each system as it arises before deciding upon how to proceed. This is
especially true if one is using automatic computing machinery for which
the arrangement of the program is a major task.
When computing machinery is used, the method must be adapted to
the machine. Generally speaking, the number of multiplications and
divisions and the number of recordings of intermediate results together
provide a rough over-all estimate of the efficiency of a computational
scheme. It is possible to estimate these numbers as functions of n, the
order of the system, but the functions may be discontinuous. That is to
say, if n is small enough so that all quantities, initial, intermediate, and
final, can be retained in the internal memory of the machine, the func-
tions will be of one form. But if 7 is so large that the auxiliary storage
must be utilized, then transfers must be made between the internal
memory and the external, and additional operations may be required.
In fact, it may be necessary to use an entirely different computational
scheme.
2.31. Operational Counts. The possible occurrence of null elements
will be ignored. With this understanding we consider the number of
operations required in the application of some of the methods discussed
above.
2.311. The method of Seidel and the method of relaxation. The equa-
tions can be written in scalar form

ants = 1 — Y oiib;,
ji
&; = n/a — p: (cxsj/ orgs)
E3-
jt

The n(n + 1)/2 divisions ,/o;; and aij/ox (for a symmetric matrix)
can always be done in advance. Thereafter each correction of a single
¢; requires n — 1 multiplications and a single recording provided the
82 PRINCIPLES OF NUMERICAL ANALYSIS

products can be accumulated. For a complete Seidel cycle this is


n(n — 1) products and n recordings. The number of cycles required,
however, depends upon the system, the starting values, and the required
accuracy. i
2.312. The method of steepest descent. One requires at each step the
product Azp-1, or n? products, the n? products Ar,, the n products
r'(Ar,), the n products rfrp, the quotient rjr,/rj}Arp = hp, and the n
products \,r,, as well as the various sums and differences. Counting
multiplications and divisions as equivalent, this is 2n? + 2n + 1 product ~
operations at each step. This is somewhat more than two complete
Seidel cycles performed on a prepared system in which the indicated
quotients have been taken in advance. As for recordings, one requires
at least the n elements r,, the n elements Ary, the products rir, and
r'Ar,, the quotient A», and the n elements 2x», or altogether 3n + 3
quantities, as compared with n for a complete Seidel cycle.
2.313. The Stiefel-Hestenes method. The general step requires n?
products for Az; and n? for Ar;, n for rir; and n more for rtAr;, and indi-
vidual products and quotients for a; and @;. In principle the iteration
terminates in » steps with something over 2n’ product operations. Each
additional projection for reducing round-off requires something over
2n? product operations. In recordings, each step requires at least the
n elements of 2;, the n elements of 7;, the n elements of Ar;, and a few
scalars. This is something over 3n recordings per step or 3n? for a com-
plete cycle. In addition r,, must be carried over from the previous
step.
2.314. The Crout factorization. One is to write

(A, y) = L(W, 2),


where L is unit lower triangular and W upper triangular, then solve the
triangular system
Wx =z.

Consider Eqs. (2.201.6) supposing W1; and Li: to be of order 7, Wo. a


scalar, and Le, = 1. Then We» requires 7 products, as does each element
in We3, making 7(n — 7) products. Each of the n — i — 1 elements of
Lz2 requires 7 products in L31W12 and the quotient of the result by Wes,
making (t+ 1)(n-—%—1) product operations. Finally, from Eq.
(2.21.6) one requires 7 products for 22. This is n(2i + 1) — 222-¢-1
product operations in all. Summing from i = 0 toi = n — 1, we get a
total of n(n? — 1)/3 products to be formed. Solving the triangular
system requires a quotient for £,, a product and a quotient for £1, . . . ,
or altogether n(n + 1)/2 product operations. Altogether we have a
total of n(2n + 1)(n + 1)/6 product operations, or something over n3/3.
MATRICES AND LINEAR EQUATIONS 83

The recordings required are the triangular matrices L and W and the
vectors z and z, or n? + 2n quantities altogether.
To iterate the process in order to reduce round-off, formation of
ro = y — Axo requires n? products; L7'7y requires n(n — 1)/2 (all
multiplications, no divisions since L is unit lower triangular); and
W'L~'ro requires n(n + 1)/2, or altogether 2n? products and at least
3n recordings. These give the corrections to 2, so that an additional n
recordings of 2, itself are required. If the matrix is symmetric, the
operations are reduced by nearly one-half.
2.315. Orthogonalization. In forming RV = A, R'™R = D* as in §2.22,
suppose 7 columns of A have been orthogonalized. As in (2.201.9)
and (2.201.10), 2 elements of the next column of v are to be found, each
requiring n products and a division. Then the next column of FR requires
n(n + 7) products, and n more are required for the next element of
D?. Hence to orthogonalize the columns of A requires a total of
n(4n? + n — 1)/2,
or approximately 2n’ products. Beyond this one requires R'y with n?
products; n more products in multiplying this by D-*; and n(n — 1)/2
more in solving the triangular system Vr = D-*R'y. Altogether it
amounts to 2n?(n + 1) products. For recordings we require at least the
nm? elements of R; n(n — 1)/2 elements of V; n elements of D?; andn
elements each of R'y, of D-*R'y, and of x. This makes n(8n + 7)/2
recordings.
2.316. Inverting a modified matrix. In Eq. (2.24.3), if wu= e;, and A-!
is given, then the inversion of A + w™ requires n? multiplications for
v'A-!; n quotients of 1 + v'A—1u into the vector v'A—! (or into A-u);
n products for multiplying the column vector by the row vector. Hence
there are 2n? + n product operations for modifying the inverse when a
single column of the matrix A is modified. If one builds up the inverse
by modifying a column at a time, then in the worst case n?(2n + 1)
products are required. However, if one starts with the identity, then
in the first step, since
(I + wt)? = I — w'/(1 + Tw),
only n quotients are needed and no other products. The new inverse
differs from J in only the 7th row, so that many zeros remain if 7 is large.
If the programing takes advantage of the presence of the zeros, the num-
ber of products is reduced considerably.
Once the inverse is taken, if a set of equations are to be solved, an
additional n* products are needed.
2.4. Bibliographic Notes. Most of the methods described here are
old, and have been independently discovered several times. A series of
84 PRINCIPLES OF NUMERICAL ANALYSIS

papers by Bodewig (1947, 1947-1948) compares various methods, includ-


ing some not described here, with operational counts and other points of
comparisons, and it contains a list of sources. Forsythe (1952) has
compiled an extensive bibliography with a classification of methods. A
set of mimeographed notes (anonymous) was distributed by the Institute
for Numerical Analysis, and interpreted several of the standard iterative
methods as methods of successive projection, much asis done here. This
includes the iterative resolution of y along the columns of A, which is
attributed to C. B. Tompkins. The same method had been given by
A. de la Garza in a personal communication to the author in 1949.
Large systems of linear inequalities have important military and
economic applications. Agmon (1951) has developed a method of
relaxation for such systems, which reduces to the ordinary method of
relaxation when the inequalities become equations.
Conditions for convergence of iterations are given by von Mises and
Pollaczek-Geiringer (1929), Stein (1951b, 1952), Collatz (1950), Reich
(1949), Ivanov (1939), and Plunkett (1950).
On the orthogonalizations of residuals see Lanczos (1950, 1951). For
other discussions of the method of Lanczos, Stiefel, and Hestenes see
Hestenes and Stein (1951), Stiefel (1952), Hestenes and Stiefel (1952).
Crout (1941) pointed out the possibility of economizing on the record-
ings in the triangular factorization. The author is indebted to James
Alexander and Jean Hall for pointing out the possibility of a similar
economy in recording in the use of Jordan’s method. Turing (1948)
discusses round-off primarily with reference to these two methods and
refers to the former as the ‘‘unsymmetric Choleski method.’ The
formulas apply to the assessment of an inverse already obtained. On
the other hand, von Neumann and Goldstine (1947) obtain a priori
estimates, but in terms of largest and smallest proper values. They
assume the method of elimination to be applied to the system with posi-
tive definite matrix A or to the system which has been premultiplied by
A' to make the matrix positive definite. See also Bargmann, Mont-
gomery, and von Neumann (1946).
Dwyer (1951) devotes some little space to a discussion of errors and
gives detailed computational layouts. Hotelling (1943) deals with a
variety of topics, including errors and techniques. Lonseth (1947)
gives the essential formulas for propagated error.
Sherman and Morrison (1949, 1950), Woodbury (1950), and Bartlett
(1951) give formulas for the inverse of a modified matrix, and Sherman
applies these to inversion in general.
A number of detailed techniques appear in current publications by the
International Business Machines Corporation, especially in the reports
of the several Endicott symposia,
MATRICES AND LINEAR EQUATIONS 85
In the older literature, reference should be made espécially to Aitken
(1932b, 1936-1937a).
An interesting and valuable discussion of measures of magnitude is
given by Fadeeva (1950, in Benster’s translation). In particular Fa-
deeva suggests the association of the measures designated here as c(A)
and b(x).
The use of Chebyshev polynomials for accelerating convergence is
described by Grossman (1950) and Gavurin (1950).
On general theory the literature is abundant. Muir (1906, 1911, 1920,
1923) is almost inexhaustible on special identities and special forms, and
many of the results are summarized in Muir and Metzler (1930). Frazer,
Duncan, and Collar (1946) emphasize computational methods. Mac-
Duffee (1943) is especially good on the normal forms and the character-
istic equation.
CHAPTER 3

NONLINEAR EQUATIONS AND SYSTEMS

3. Nonlinear Equations and Systems


In the present chapter matrices andvectors will occur only incidentally.
Consequently the convention followed in the last chapter of representing
scalars only by Greek letters will be dropped here. The objective of this
chapter is to develop methods for the numerical approximation to the
solutions of nonlinear equations and systems of equations. With systems
of nonlinear equations, the procedure is generally to obtain a sequence
of systems of linear equations whose solutions converge to the required
values, or else a sequence of equations in a single unknown.
A major objective in the classical theory of equations is the expression
in closed form of the solutions of an equation and the determination of
conditions under which such expressions exist. Aside from the fact
that only a limited class of equations satisfy these conditions, the closed
expressions themselves are generally quite unmanageable computa-
tionally. Thus it is easy to write the formula for the real solution of
x* = 9, but if one needs the solution numerically to very many decimals,
it is most easily obtained by solving the equation numerically by one
of the methods which will be described. Nevertheless, certain principles
from the theory of equations will be required, and we begin by develop-
ing them.
To begin with, we shall be concerned with an algebraic equation, which
is one that can be written
(3.0.1) P(g) = 0,
where
(3.0.2) P(x) = aox™ + aya™2 + + + + + anit + dn,
a polynomial of degree n. Ordinarily we shall suppose that a) + 0, since
otherwise the polynomial or the equation would be of some lower degree.
This being the case, we can always, if we wish, write
(3.0.3) P(t) = ay'P(e) =a" + aw + +--+ - +a,
and the equation

is equivalent to the criginal one.


86
NONLINEAR EQUATIONS AND SYSTEMS 87

3.01. The Remainder Theorem; the Symmetric Functions. A basic


theorem is the remainder theorem, which states that, if the polynomial
P(x) isdivided by z — r, where r is any number, then the remainder is
the number P(r). Thus suppose

(3.01.1) P(x) = @ — r)Q@) + RB,


where Q(z) is the quotient of degree n — 1, and R is the constant remain-
der. This is an identity, valid for all values of x. Hence in particular
it is valid for x = r, which leads to

(3.01.2) P(r) = RB.


A corollary is the factor theorem, which states that, if r is a zero of the
polynomial P(x), that is, a root of Eq. (3.0.1), then x — r divides P(z)
exactly. Conversely, if x — r divides P(x), then r is a zero of P(z).
For by (8.01.2) if r is a zero of P(x), then R = 0, and by (8.01.1) the
division by x — ris exact. The converse is obvious.
The fundamental theorem of algebra states that every algebraic equa-
tion hasaroot. The proof is a bit long and will not be given here. But
it follows from that and the factor theorem that an algebraic equation
of degree n has exactly roots (which, however, are not necessarily
distinct). For by the fundamental theorem (3.0.1) has a root, which
we may callz:. By the factor theorem we can write

P(x) = (@ — 21)Q:(2).

But Q, = 0 is an algebraic equation of degree n — 1; it has a root, say


22, and hence
Qi(x) = (@ — %2)Q2(z).
Eventually we get
Q,-1(2) = (x ae In) Qn;

-where Qn-1 is linear and Q, a constant. But then

(3.01.3) P(2) = (x = £1) (x a 2) slre S (x ae tn)Qn,

and not only x; but also 7, . . . , 7, are roots of P = 0. But there can
be no others. For if 2n41 were different from zi, ... , %n, and alsoa
root of P = 0, then it would be true that

O = P(tnt1) = (nti — Bale ise (Tati — Zn) Qn.

But if tn41 — 2% XO fori =1,..., 7, then Q, = 0 and P = 0 iden-


tically. Hence there are exactly n roots, and the theorem is proved. As
a partial restatement we can say that, if a polynomial of degree not
88 PRINCIPLES OF NUMERICAL ANALYSIS
greater than n is known to vanish for n + 1 distinct values of x, then this
polynomial vanishes identically.
Now consider the factorization (3.01.3). If we multiply out on the
right, the polynomial we obtain must be precisely the polynomial P.
But the coefficient of x* on the right is Q,, so therefore Q, = ao. By
examining the other coefficients in turn, we find

ai = —Q0 »;Ti,
+

a2 = ao UXj,
(3.01.4) i<j
a3 = —Qo y CU r,
t<j<k

Gn = (—1)"aor1%2 * + *. En.

Thus (—1)*a;/ao is the sum of the e products of the r’s taken h at a


time. These sums are called the elementary symmetric functions of the
roots. They are symmetric because interchanging any pair of the roots
leaves the value of the function unchanged. It is a theorem that any
rational symmetric function is expressible as a rational function of the
elementary symmetric functions. The general theorem will not be
required here, but special cases will appear. Consequently we introduce
the notation o, for the elementary symmetric function of degree h:
(3.01.5) a, = (—1)*aoon = Aogp.

Of particular importance will be the sums of powers:

(3.01.6) s&s, =
) x,

where h is any integer, positive, negative, or zero. For h = 0 we have


S) =n. Expressions for these in terms of the elementary symmetric
functions will be given later.
We conclude this section by noting that

(3.01.7) a*P(1/r) =anz* + aw +--+ tay


@o(1 — wx)(1 — wae) - +> + (1 — oan).
Hence the equation

(3.01.8) * . eee aw)


has the n roots 271, . . . , #1, provided every 1; 0. We call it the
reciprocal equation. Then
NONLINEAR EQUATIONS AND SYSTEMS 89

(3.01.9)
S76, 'o. le me: 6 76,78; te 0. te Te

3.02. The Derivative Equations. The derivative equations are those


formed from the derivatives of P:

(3.02.1) P(x) =

If P is a real polynomial, 7.e., if all its coefficients are real, then the real
roots of the derivative equations have important relations to the real
roots of the original.
If we set x = 2+ 7, where r is any real number, forming P(z + r),
expand each power of z + 1, and collect like powers of z, the result is a
polynomial in z of degree n. If in this polynomial we now replace z by
x — r but without expanding powers of x — r, we obtain an expression
of the form
P(x) = Cn + Cn—1(@ — 7) + Cree — 7)? + +> > + e0o(e — 1)",
where the c’s are the constant coefficients of the several powers of z in
P(z +r), and in particular, c) = do. This is an identity which therefore
holds when we differentiate on both sides and continues to hold when we
give to x any fixed value. In particular if we set x = r, we find (as in the
proof of the remainder theorem) that
P(r) = Cn,

and if we first differentiate 7 times and then set + = 1,


P(r) = tle,
Hence

(3.02.2) P(x) = P(r) + (@ —n)P'(r) + @ — 7)?P'(r)/2!+ °°:


+ (2 — r)*P™ (r)/n!.
This is Taylor’s series for polynomials.
If P(r) = 0 but P’(r) ¥ 0, then x — ris a factor of P(x), but (x — r)?
is not. Hence r is a simple root of P=0. But if P(r) = P(r) = 0,
then r is at least a double root. Hence a root of P = 0 is a multiple root
if and only if it is also a root of P’ = 0. In fact, ris a root of multiplicity
‘m if and only if

0 =P(r) = P(r) = +++ = P-V(r) 4 P(r).


90 PRINCIPLES OF NUMERICAL ANALYSIS
In that case (3.02. 2) becomes

P(x)= (@ — r)mP™(r)/m! + -
Moreover, r is a root of multiplicity m — 1 of P’ = 0, of multiplicity
m—2ofP” =0,....
If P is a real polynomial, then between consecutive real roots of P = 0
there is an odd number of roots of P’ = 0. In particular, there is at least
one. This is Rolle’s theorem. For suppose 2 is a root of multiplicity
m,, 2 of multiplicity m2. Then we can write
P(x) = (w@ — 21)™(e — r2)™Q(z),
where Q does not vanish at x1 or 2 or anywhere between. Since Q is a
polynomial, it must retain the same sign throughout the interval. Now
P'(x) = (@ — 41)™""(@ — x2)"*19(z),
where
q(x) = m(x — 22)Q + mo(x — 21)Q + (a — 21)(4 — 22)Q’.
Hence
q(#1) = m(%1 — %2)Q(x1),
Q(X2) = me(xe — £1)Q(z2).

But m:Q(x1) and m,Q(x2) have the same sign, whereas

%1—- %e = — (xe = 21).

Hence q(x) and q(x2) have opposite signs, and g(x) must vanish an odd
number of times between x; and x2. Hence the same is true of P’(z).
If we differentiate the factored form (3.01.3) of P(x), we obtain for P’
a sum of products of n — 1 factors each. In fact, each product can be
written as P(x)/(x — x;) for some 7. Hence

(3.02.3) P'(z) = P(e) ) (2 — a),


or

(8.02.4) P'(2)/P(e) =) (@ — 2) = p'@)/P(@).


But for x sufficiently large

@-—a2)=Ssat+ra + sett -:-


and
Z(e — m4)? = na + sya? + sat
Since
p(x)= 2 — a"! + ot? — -
p(x) = na — (n — 1)oye*? + eC- ee +:
NONLINBAR EQUATIONS AND SYSPEMS 91
Hence if we multiply p(x) by =(x — z,)-! and equate the coefficients to
those of p’(x), we get the relations
: $1 — 6, = 0,
S2 — S101 + 2c, = 0,
83 — Sed, + $102 — 303 = 0,
(3.02.5) WERE RE OES © Ae) 9 lepton repay ets sp cays

Spiaesn—1010-|ne cee + (—1)"no,, = 0,


Sntp — Sn4+p—-191 a orca + (—1)"s8,0n == (9).

These are Newton’s identities, expressing recursively the sums of powers


of the roots as polynomials in the elementary symmetric functions, and
hence as rational functions of the coefficients. If one applies the same
relations to the reciprocal equation (3.01.8), one obtains the sums of the
powers with negative exponents.
If we set
(3.02.6) Q(z) =U(1 — xz) = 1 — oe + 022? — © + + + one",
and expand
(B02 ad/0 Giese NG aw) ee See OE
the coefficients S, of this expansion are symmetric functions of the roots:
Sot tie e TS,
Se = vi + 21te + Ne te
(3.02.8)

Si = att ates + rae +


the so-called ‘‘complete’”’ symmetric functions. Since

(1 — oz + oo? — +: -)\(1 + Siz + Soe? +--+) = 1,

on comparing coefficients, one obtains


Sy Oia 0,

Se — 018; + a2 ad 0,
(3.02.9)
S3 — 0182 + 0281 7 Os> 0,
we rel Nia) en mis. ie Wiese e, We.. ial 6. fer. ee) 9,8) xe))ca

Of the three sets of symmetric functions, each set can be expressed in


terms of the others by means of these equations.
3.03. Vandermonde Determinants. It is easy to write in determinantal
form an equation having specified roots. To illustrate for the casen = 3,
if x1, 22, and 2; are all distinct, the equation

(3.03.1) =
92 PRINCIPLES OF NUMERICAL ANALYSIS

has these and only these roots. The determinant, therefore, whose
expansion is a cubic polynomial in z is equal to some constant times
(x — a1)(x — 22)(x — x3) by application of the factor theorem. If we
were to regard 2; as the variable instead of x, and apply the factor theorem
again, it appears that (73 — x2) and (x3 — 21) are also factors of the
expansion of the determinant. Likewise, regarding x2 as the variable,
(2 — 21) appears as an additional factor. Hence the determinant is
equal to the product (a3 — x2) (a3 — %1)("%2 — %1)(% — %3)(% — %e)(% — 21),
possibly multiplied by some factor as yet undetermined. However, the
determinant is a cubic polynomial in z, and so has no other factors con-
taining 2; it is also a cubic in x3, and so can have no other factors contain-
ing x3; nor by the same rule can it have other factors containing 22 or 71.
Hence any factor not yet found is a constant, independent of x or any
of the x; But the expansion of the determinant contains the term z2r72z*
once from the principal diagonal, and the expansion of the product
contains this product also. Hence there is no other factor, and

A eesbead
ge |
= (ts — Le)(%3 — 21) (%2 — 21)(4 — Xs)(% — 42)(e — 2).

The coefficient of x? is

12 Ay at
Z1 Xe X3| = (xs — e)(X3 — 21)(X2 — 4).
ui xy x5
Such a determinant is called an alternant, or an elementary Vandermonde
determinant. The negative of the coefficient of x? is |

Lp el ake ji ak
%1 V2 L3| = 01|/|X1 Lo ZI.
3 3 3 2 2
TT, 4 ti ry 23
Again, the coefficient of z is

L casheoat ly gdoand
ag? 22 22| =
no Ay ce O2;|%1 Xe XI,
Del
3 3
aes3 Hi 3. a
and the negative of the constant term is

ty3 aS3 ae3 oa 4 ee


NONLINEAR EQUATIONS AND SYSTEMS 93
The equation whose roots are 21, 2 = x1 and 23 ¥ 21 is

For if we call the determinant P(x), then P(a1) = P’(x;) = 0, and


P(zs) = 0. Likewise the equation whose roots are 21, #2 = 2 and
%3 = “11s
1 0 Oat
1 1 0 xv =r):
Leer he x
TOT Or es

These representations are easily generalized, but the notation becomes


cumbersome.
3.04. Synthetic Division. We consider now some further useful conse-
quences of the remainder theorem. First we observe that P(r), which
is the remainder after dividing P(x) by.x — r,is most readily evaluated by
evaluating sequentially
aor + a1,
- (aor + ai)r + az,
[(aor + ai)r + as|r + as,

with P(r) obtained as the final step. This process can be systematized
by writing the system
ao ai a2 Chan an [r
borate Pe Landy oat

bo bi be St ad R

where bo = do, and in general every number along the bottom row is the
sum of the two above it. The r is written in the upper right-hand box
merely as a convenient reminder.
Having written this, we now observe that the b’s are the coefficients
of the quotient
Q(x) = boc? + bya? + > + + a1.
One way to see this is to note that, when in ordinary long division we
divide P(x) by x — 1, the b; is exactly the remainder we get after dividing
aot + a by x — 1, the bz is the remainder after dividing aox? + air + az,
94 PRINCIPLES OF NUMERICAL ANALYSIS
and so on sequentially. We have written above merely a scheme for
evaluating these remainders in sequence.
If the coefficients of the equation P(x) =' 0 are all integers, it is possible
to obtain all its rational roots by inspection and a few synthetic divisions.
This is a help even when the rational roots are of no interest for them-
selves, since for every known rational root the degree of the equation
can be lowered by one. If we examine the scheme for synthetic division,
we can see that, if r is an integer and the a’s are all integers, then the b’s
are allintegers. Ifrisa root, then R = 0,sothata, = —brr. Hence
risa factor of dn. Thus if the equation has any integral root, the root is
a divisor of the constant term. More generally, if P(x) = 0 is a poly-
nomial equation with integral coefficients, and if p/q is a rational root in
lowest terms, then p is a divisor of the constant term, and q is a divisor
of the leading coefficient.
For suppose r is a fraction p/g in lowest terms. If bor is a fraction, say
with denominator s, then b; is a mixed number whose fractional term
has the denominator s. But s cannot divide p since p/gq is in lowest
terms. Hence bir is certainly fractional. By continuing to the end, we
conclude that p/q cannot be a root if q does not divide ao. If we apply
the argument to the reciprocal equation (3.01.8), we conclude that p
must divide adn. Since there are only a finite number of possible choices
for p and q, these can be examined one by one.
In some of the numerical methods of evaluating roots of polynomial
equations, and for other purposes too, one often starts with a polynomial
P(x), replaces x by z + 7, and wishes to evaluate the coefficients of the
polynomial P(z+ 7) as in §3.02. For example, r might be a close
approximation to a desired root of P(x) = 0, and we wish to replace the
equation by one in z for which the desired root z is as small as possible.
This is done in both Horner’s method and in Newton’s method, which will
be described later.
As in deriving (3.02.2), we write

P(x) = eo(@ — 7)” + ex@ — 7)" 4+ ++ > +, 42 — 7) + em,

where c, = P(r), so that c, is the remainder after dividing P(x) by x — r.


Again if we write

P(x) = (2 — r)Q(x) + en
= (2 — r)[co@s—ir)™ 1 + * >> + ene) + e,,
it is clear further that c,_; is the remainder after dividing the quotient
by «—7r,.... Hence we extend our synthetic division scheme as
follows:
NONLINHAR EQUATIONS AND SYSTEMS 95

gh Gye FOROS oe Ok. «=Gn ir


bor Do resOler + bos}
bo bi cee 8Ope Ona Ten
or Dns? Dye?
bp oy spelt m2 Cra
phy Hewat tes?
Ae oFae Farr Oh

At each division we cut off the final remainder and repeat the syn-
thetic division with the preceding coefficients. This is sometimes
called reducing the roots of the equation, since every root 2; of the equa-
tion P(z + r) = 0 is7r less than a corresponding root x; of P(x) = 0.
In solving equations by Newton’s or Horner’s method, it is first
necessary to localize the roots roughly, and the first step in this is to
obtain upper and lower bounds for all the real roots. If in the process
of reducing the roots by a positive r the b’s and the c of any line are all
positive, as well as all the c’s previously calculated, then necessarily
all succeeding b’s and c’s will be positive. Hence the transformed equa-
tion will have only positive coefficients and hence can have no positive
real roots. Hence the original equation can have no real roots exceeding
r. Hence any positive number r is an upper bound to the real roots of an
algebraic equation if in any line of the scheme for reducing the roots by r
all numbers are positive along with the c’s already calculated.
3.05. Sturm Functions; Isolation of Roots. The condition just given is
sufficient for assuring us that r is an upper bound to the roots of P = 0,
but it is not necessary. -In particular if all coefficients of P are positive,
the equation can have no positive roots. This again is a sufficient but
not a necessary condition. A condition that is both necessary and
sufficient will be derived in this section. In fact, we shall be able to
tell exactly how many real roots lie in any interval. However, since it
is somewhat laborious, some other weaker, but simpler, criteria will be
given first.
Suppose r is an m-fold root so that
P(a) = («& — r)™P™(r)/m! + + °°
Since P™(r) #0, there is some interval (r — «, r+ e) sufficiently
small so that P™(z) is non-null throughout the interval, and P‘™-(z),
, P’(x), P(x) are non-null except at r. Suppose P(r) > 0. Then
P-) (x) is increasing throughout the interval, and so it must be nega-
tive at r — «, positiveatr +.«. Hence P~*(z) is decreasing, and hence
positive, at r — ¢; increasing, and hence again positive, at r+e«. By
extending the argument, it appears that the signs at r — « and at r + €
can be tabulated as follows:
96 PRINCIPLES OF NUMERICAL ANALYSIS
Aes Pim-3) Pi(m-2) Pim-1) P(m)
(PIC o 0 < — + = GP
hie OX ECR S + + + 42

If P™(r) <0, then every sign is reversed. If we count the variations


in sign in the two sequences, we find that at r — ¢ there are m variations,
for P and P’ have opposite signs and present one variation, P’ and P”
have opposite signs and present another, .... On the other hand, at
r + eall signs are alike, and there are no variations. Hence the sequence
P, P’, P’, ..., P™ loses m variations in sign as one passes over an
m-fold root of P = 0 in the direction of increasing ~.
Next, suppose r is an m-fold root of P™ = 0 but not a root of P°-» = 0
(and it may or may not be a root of P = 0). Then P“+™ remains non-
null and of fixed sign, say > 0, throughout some interval (r — ¢, r + €),
and from P™ to P“+™), m variations in sign are lost. However, we must
consider the possible variations P“-», P™, for the sign of P™ may
change, whereas that of P“-» does not. But if m is even, the sign of
P® does not change, so from P“-» to P“™ there is still a loss of just m
variations. If m is odd, P™ does change, so that from P@-» to P@+t™
there is a loss of either m+ 1 or of only m — 1 variations. In either
event it is a non-negative even number.
By considering every point r on an interval (a, b), at which P or any
of its derivatives may vanish, we conclude that, if V. and V,z are the
numbers of variations in sign at a and b, respectively, displayed by the
sequence P, P’, P”, . .. , then Vz — Vz exceeds by a non-negative even
integer the number of roots of P = 0 on the interval.
This is Budan’s theorem. In particular, V.. = 0, while
P® (0) = m!anm.
Hence the number of variations in sign in the coefficients is Vo, and this
exceeds by a non-negative even integer the number of positive real roots.
This is Descartes’s rule of signs. In counting the variations, vanishing
derivatives are ignored.
This is sometimes sufficient to give all the necessary information.
Thus if there is a single variation or none, there will be one root or none;
if an odd number, there is at least one.
Exact information, however, is always given by Sturm’s theorem. In
the following sequence set

Po =P, Pi, =P

for uniformity. Divide Py by P: and denote the remainder with its


sign changed by P2. Divide Pi by P2 and denote that remainder with
its sign changed by P3,.... The polynomials Po, Pi, Pa, . . . are of
progressively lower degree, and the sequence must therefore terminate:
NONLINEAR EQUATIONS AND SYSTEMS 97

(2°05se Bink) Tie Rude beer sar Be tes

(It is understood that any constant ~ 0 divides any polynomial exactly.)


Now P,, is the highest common factor of Py) and P;. For by the last
equation, P,, divides P,,1; by the one before, since P,, divides both itself
and Pm, it divides Pn_2; by the one before that, it divides also Pm_s,
e< Conversely, if p is any polynomial which divides both Py and Pi,
it therefore divides P2 by the first equation; by the second, it divides
P;, . . . , and by the next to last, it divides Pn. Thus the statement is
established.
It follows that, if P = 0 has any multiple roots, they are roots of
P,, = 0, and all roots of P,, = 0 are multiple roots of P = 0. Hence
we can find all multiple roots by solving an equation of degree lower than
the original, remove them from P, and continue with an equation of
degree < n whose roots are all simple.
We now suppose this to have been done in advance, that P = 0 has
only simple roots, and that therefore P,, is a constant ~0. Then con-
sider the variations in sign presented by the sequence P;. Suppose
P.(r) = 0, whereO0 <i<m. Then Pii(r) #0. For if

Pir) = Pizrlr) = 0,
then P; and P;,; have a common divisor x — r. Since

(3.05.2) Pes = QP: — Pius,

it follows that P;; has also x — r as a divisor, and by continuing, we


conclude that also Pi and Po have the divisor x —r. But then r is
a multiple root of P = 0, whereas there are no multiple roots. Hence if
P,(r) = 0, then Pi41(r) ¥ 0 and also P;_1(r) # 0.
Consider again (3.05.2). At «w=7, P;1= —Pi41. Hence P;1 and
P;,1 have opposite signs at r and also throughout some small interval
(r —e,r+.). Hence whatever the signs of P; at r — e and r + ¢, these
three polynomials present the same number of variations at r — € as at
rte.
Now suppose P(r) = 0. Then Pi(r) 40, and P(x) keeps a fixed
sign in some interval (r — ¢, r+). If P1 > 0 on this interval, then
Por — €-) <0 < Po(r + ©), while if P: < 0, Po(r — «) > O > Po(r + ©).
In either case Py and P; present one variation at r — e and none at r + «.
Hence if V, and V; are now the numbers of variations in sign at a and b
of the sequence Po, P1,P2, . . . , Pm, then Va = Vsifa < b, and Va — Vo
98 PRINCIPLES OF NUMERICAL ANALYSIS

is exactly equal to the number of roots of P = 0’on the interval from


ato b. - |
We have proved the theorem only for the case that all roots are simple
‘roots. By modifying the argument slightly, it can be shown that it is
true also when there are multiple roots, provided each multiple root is
counted only once and not with its multiplicity. This is in contrast
to Budan’s theorem where a root of multiplicity m was to be counted _
m times.
In the practical application of Sturm’s theorem, if for any 7 < m, P;
can be recognized to have no real zeros, then it is unnecessary to continue.
Moreover, the sequence can be modified to
Coo = QP: — P2,
CP, = Q2P2 — Ps,
o © ee; @: ele *; je) 6) 6

where Co, ¢i, . . . are positive constants. Thus if the coefficients of P


are integers, one can keep all coefficients of all the P; integers and,
moreover, remove any common numerical factors that may appear. A
convenient algorithm is the following: Let bo, bi, bs, . . . represent the
coefficients of P; after removal of any common factor, and write the table
Qo GA. Ae as
bo bi be bs

Co Cy Cy c;
where

og EN en = Oy a
Obtain, next, the sequence
Co Cy Co C3

from the sequences b and c’ by


bo Dati |
Cp=
Cosy
Then these are the coefficients of P».
3.06. The Highest Common Factor. It is now easy to prove a theorem
utilized in §2.06. Let Po and P; be any two polynomials with highest
common factor D. There exist polynomials go and q; such that
(3.06.1) Pogo + Piqi => D.

Suppose the degree of P; is not greater than that of Po. In deriving


relations (3.05.1), we were supposing that P; = P}, but this supposition
is not at all necessary for those relations. The assumption was used
only in the proof of Sturm’s theorem. Hence for our arbitrary poly-
NONLINEAR EQUATIONS AND SYSTEMS 99

nomials we can form the successive quotients and remainders (this is


called the Euclidean algorithm) and obtain finally their highest common
factor P,, = D. Write these in the form
P, = Q:P: — Po,
—Q.P.+P; = —P,,
— Q:P;
+ P, = 0,

— Qn—1Pm—1
+ Px =
and regard them as equations in the unknowns P2, P3, . ..., Pm, the
coefficients Q being supposed known. The matrix of coefficients is unit
lower triangular and has determinant 1. Hence P,, is itself expressible
as a determinant, in which P; and P» occur linearly, in the last column
only. Hence the expansion of the determinant has indeed the form of the
left number of (3.06.1), where go and gq: are polynomials, expressiblein
terms of the Q’s.
3.07. Power Series and Analytic Functions. A few basic theorems on
series, and in particular on power series, will be stated here for future
reference. Proofs, when not given, can be found in most calculus texts.
Consider first a series of any type
(3.07.1) bo t+ bitbe+--:
where the 6; are real or complex numbers. Let
(3.07.2) Sx. = bo t+bit->: +d
represent the sum of the firstn + 1terms. The series (3.07.1) converges
to the limit s, provided lim s, = s, that is, provided for any positive
there exists an N such that |s, — s| < « whenevern > N. A theorem
of Cauchy states that the series (3.07.1) converges if and only if
lim |Snrp — Sn| = 0
nn 2

for every positive integer p. In particular sn41— Sn = bn41, so that the


theorem implies, with p = 1, that the individual terms in the series
approach zero in absolute ae
If a new series is formed by dropping any finite number of terms from
the original, or by introducing any finite number of terms into the original,
the two series converge or diverge together.
The series is said to converge absolutely in case the series

[bol + [bal + |b] + -


of moduliconverges. If the series converges absolutely, then it converges
since Ry
[Deap ok Paget te Cate) S lone Pb [Dntpl-
100 PRINCIPLES OF NUMERICAL ANALYSIS

If a series is absolutely convergent, its terms can be rearranged or asso-


ciated in any fashion without affecting the fact of convergence or the limit
to which it converges. This is not true of series which do not converge
absolutely. Also if for every n it is true that |b;| < |b,|, then the series
b+
bo toe+ °°:
converges absolutely if (3.07.1) converges absolutely. We shall say then
that the series (3.07.1) dominates the other.
Since
Ttatat+ +--+ +a = (1 — 2t)/(1
— 2)
identically, it follows that the geometric series, obtained by setting
; = yx‘, converges absolutely for any y and for any x satisfying |x| < 1.
In fact, for this series
(1 —2z)1— 4s =21(1 — 2)-4,

and when |z| < 1, this has the limit zero.


If for a real positive 6 < 1 it is true that lim |b,4:|/|b,| = 6, then
(8.07.1) converges absolutely. For select any positive « <1 — 8.
Then there is an N such that for n > N
[bntil/|bn| < 6 + €
Hence for any p
lbntal/|bnl < (8 + ¢)?.
Hence the series

[onl + [Deer] + ° - > = [Bal(L + [Bngal/lbn| + [bntel/lbnl + - + -)


converges since it is dominated by the terms of the geometric series.
Hence (3.07.1) converges absolutely.
On the other hand, if for a real positive 8 > 1 it is true that

mn lbn+al/ldn| = B,
then the series diverges.
If for some positive 6 < 1itis true that lim |b,|”" = 8, then the series
converges absolutely, but if 8 > 1, it diverges.
Any convergent series has a term of maximum modulus. For since
the sequence of terms b, has the limit zero, for any e there is a term by
such that all subsequent terms are less than ein modulus. Choose « less
than the modulus of some term in the series. Among the N + 1 terms
bo, . . - , by, there is one whose modulus is not exceeded by that of any
other of these terms, nor is it exceeded by the modulus of any b, for
n>N. Hence this is a term of maximum modulus,
NONLINEAR EQUATIONS AND SYSTEMS 101
When the terms }; of the series (3.07.1) are functions of x, the limit,
when it exists, is also a function of x, and we may write

(3.07.3) f(x) = bo(w) + di(x) + bo(x) + + +>


The series converges at x to the limit f(x) in case for any e there exists an
N such that |f(x) — s,(x)| < « whenever n > N. Clearly N depends
upon ¢, and the smaller one requires « to be, the larger N must be made.
In general N will depend also on x. But if for every x in some region the
series converges, and if, moreover, for every ¢ there is an N independent
of x in that region, then the series is said to be uniformly convergent in
the region.
If every b;(x) is continuous, and the series is uniformly convergent in
some region, then f(x) is continuous in that region. To show that f(z) is
continuous at x in that region, one must show that for any positive «
there is a 6 such that |f(x) — f(xo)| < e« whenever |x — 2| <6. Any
finite sum s,(x) is continuous, whence there exists a 5 such that |s,(z) —
Sn(Z0)| < €/8 whenever |x — xo| < 6. Let N be chosen so that |f(r) —
8,(x)| < ¢/3 for all x in the region whenever n > N. Hence
[f(z) — f(%o)|S |sn(w) — sn(ao)| + |f(@) — sa(x)| + |f(@0) — 8n(%0)| < e.
In case

(3.07.4) f@) = aot aw tant. -,


and the series converges for any %o, then the series converges absolutely
and uniformly for all x satisfying |z| < |zo|. For the series f(7o) has a
maximal term. Let this be y. Hence for every 7
(3.07.5) laxi| < y.
But if |z| < |zo|, then since
lax] < y|x/zol?,
the geometric series yZ|x/zxo|* converges and dominates the series for f(z).
Also any N that is effective for the series (3.07.4) when x = 20 is a fortiori
effective when |z| < |xo|, whence the series is uniformly convergent for
|z| < |x|. Since every term is a continuous function of «, therefore f(z)
is a continuous function of z.
If the series (3.07.4) diverges for any x = 2o, then it diverges for all x
of greater moduli. For if |z,| > [zol, and the series converged at x1, then
by the theorem just proved the series would converge at %o, contrary to
hypothesis. Hence if the series converges for any x ¥ 0, it either con-
verges for all x, or else there is some circle about the origin such that the
series converges throughout the interior and diverges at every point out-
side the circle. The behavior at points on the circle can only be deter-
mined by further study. This circle is the circle of convergence of the
power series,
102 PRINCIPLES OF NUMERICAL ANALYSIS

From (3.07.5) it follows that, if the series (3.07.4) converges for x = Zo,
then for every 7
(3.07.6) la;| < y/|zl,
where + is the modulus of the term of maximum modulus.
We have seen that f(x) defined by (3.07.4) is continuous throughout its
circle of convergence. It is also differentiable throughout the same
circle, and

(3.07.7) f'@) = a, + 2aow + 8am? + ---.


To prove this, let r = |z| < R, where R is the radius of the circle of con-
vergence; let a; = |a;|; and let }
x.
S
+

F(r) = aot awr+tar?+--°.


If 0 < 6 < R — 7, then the series F(r) and also the series F(r + 6) both
converge. Hence the series
SR (r+ 8) — FQ] = a1 + a(2r + 8) + as(Br? + 8rd +H) + °°
converges. If |h| < 4,
h-f(a + h) — f(x)] = ai + ae(2e + h) + a3(82? + 82h +h?) + ---,
and this last series is dominated by the previous one. Hence as a series
in functions of h for a fixed x the latter series converges uniformly and
hence defines a continuous function of h. But for h = 0 we have the
series (3.07.7).
Thus a function defined by a power series (3.07.4) is continuous and
differentiable throughout its circle of convergence, and the series (3.07.7)
for its derivative converges in the same circle. But the same can there-
fore be said of f’, so that f” exists, as does f’”’, . . . , and for each the
series converges in the same circle. The function f is said to be analytic
in the circle within which its power series converges. Since
f™(0) = nla,
the series can be written in the form
(3.07.8) f(x) = f(O) + af’) +2°f(0)/2!++ +++,
and this is its Maclaurin expansion. By a change of origin one can also
write the more general Taylor expansion
(3.07.9): f(@) SFO) er i ir) Ge 7h)2) aa
already given for the polynomials.
The series
(3.07.10) F(x) = aot + ayx?/2 + age?/3 +--+ -
formed by integrating each term from 0 to z has a radius of convergence
at least as large as that of f. For ifz # 0. the series x—'F (x) is dominated
NONLINEAR EQUATIONS AND SYSTEMS 103
by f(z). However, since f = F’, the radius of convergence of F can be
no greater than that of f, as we have seen.
If f(z) is an analytic function, the equation
(3.07.11) f(z) = 0
will be said to have a root r of multiplicity m in case (x — r)-*f(z) is
analytic at r but (x — r)—"-¥f(x) isnot. But from the expansion (3.07.9)
it appears that this will be so if and only if
0 = flr) =f) = + = fo) Kf).
Hence, in that case
fle) = (@ — FN) m+ >
In particular, the equation
. f(x)/f'(z) = 0
will have only simple roots, if any.
Budan’s theorem holds in the case of an analytic function provided,
for some k, f(x) keeps the same sign throughout the interval from a to b.
3.08. Kénig’s Theorem. Consider any function

(3.08.1) f(z) = ao + az + az? + ---


for which the expansion converges in some circle about the origin. Sup-
pose that within this circle f(z) has one and only one zero a, which is
simple. Let g(z) be analytic throughout the circle and g(a) #0. Then
the expansion
(3.08.2) g/f = hz) = ho + Ine + hoe? + °°
converges for all |z| < |a|, while the ee nalidion
(8.08.3) (a — z)h(z) = F(z) =ko + hizg+ kez? + -++ ther tee
converges throughout the circle. Then for |z| < |a|
(3.08.4) (a= 2)(ho + hz +--+) =k thizt+.° 3°

so that
aho = ko,

(3.08.5) ae ie
—h,-1 + ah, = k,.
On multiplying these equations by 1, a, «, . . . and adding, one obtains
ath =k that::: + ka".
Let
(3.08.6) F(z) =ko thet: +> +hz’ = F(z) — Ry4r(2).
104 PRINCIPLES OF NUMERICAL ANALYSIS
Then
(3.08.7) hy = oF, (a) = a YF(a) — R,41(0)],
and
(3.08.8) hy/hoya = oF (a) /Fy41(a).
However, F is analytic at a, and the series (3.08.3) converges for z = a.
Let p’ be the radius of convergence of this series and let p satisfy |a| <
p <p’. Then the series converges forz = p. If y is the modulus of the
term of maximum modulus of the series F'(p), then
\k,| < y/p’.
By (3.08.8)
an hi/hy4 im kyy10°t?/F y41(a).

Hence
(3.08.9) la rae hy/hy+:| = plart/p7*}|,

where yp is some positive quantity depending upon the value of 7, a, and


F(a). Since |a/p| < 1, this proves that
lim hy/Avys =
and shows, moreover, that the convergence is geometric with a ratio
|a/pl.
K6nig’s theorem has an important extension to the case in which f(z)
has exactly n simple zeros within some circle about the origin. The
extended theorem and its proof are sufficiently well illustrated by the case
n = 2, and this will now be given. Let the zeros be a: and az, and sup-
pose that within some small circle about the origin f(z) has these and no
other zeros. Take g and h as before but with g(a1) ¥ 0 and g(ae) ¥ 0.
Let
(3.08.10) P(z) = bo(l — 2/a1)(1 — 2/c2) = bo + diz + boz?,
and now take
(3.08.11) P(z)h(z) = F(z) = ko + kz + koe? + ---.,
Then F(z) is analytic throughout the circle. We set
(bo + biz + bae?)(ho thet >> -)=hkothe+---
so that
boho =| > >
biho + bohi = ka,
boho -- bihi + bohe = ke,
(3.08.12) behi + bihe + bohs a ks,
NONLINEAR EQUATIONS AND SYSTEMS 105
If we multiply these equations by 1, a, a?,4) . . . , a%, and add, we obtain

(a2—1bo + arby)hy—1 + arbohy = F,(04),

where F,(z) is defined as in (3.08.6). Likewise

(azbo + afttbi)hy + att bohva1 = Fy41(ax),


and
(azttbo + aft?bi his + oft Bolvgs = Fy42(a1).
These equations hold for 7 =lori=2. Multiply the first of these
equations through by a?, and the second by a;. The equations then show
that the determinant

hy-1 hy a2F ,(a) |

hy hy41 OF y4.1(a) = 0,
ys hy+e Fy42(a%)

since they express the linear dependence of the three columns.


Now if p’ is the radius of convergence of (3.08.11), p satisfies |a,;| < p <
p’ fort = 1 and 2, and 7 is the modulus of the term of maximum modulus
of the series F(p), then

[Ry4s(as)| < yloztt/p|(1 + |ox/p| + - - -)


= ylaztt/pt|/(1 — |ai/pl).
If |ee| Be lal, then

[Ry4s(ai)| < ylogtt/pr|/(1 — |a2/pl),


and, a fortiori,
|R>+2(a,)| Syl pe (l= |a2/pl),
[Rs(ou)| < ylayt4/p7*|/(1 — |a2/p)).
Let a = a in case |F(a1)| > |F(ae)|, otherwise a = a1, and set
M = y/(|F(a)|(1 — |a2/pl)],
u = |a2/p.
Then if the equation
hy loses 2

(3.08.13) Ret Aso) 2 | me O


Avge Avg 1
is expanded to the form

where the coefficients. represent the cofactors of the powers of z in the


determinant (3.08.13), then each of these coefficients differs by a factor
of less than My’* from the coefficients of a quadratic
Coz? + 12 + ce. = 0
106 PRINCIPLES OF NUMERICAL ANALYSIS
satisfied by a, and az. Hence in the limit for large v the quadratic
(3.08.13) is satisfied by the two roots a: and az of smallest modulus of
f = 0, if such exist within the circle of convergence of the expansion of f.
In ihe general case of n roots the equations of the sequence are
h, Rey alee hy haa ee

G00) ee ae 2eee
n—1

hen Rist eons) hyvyr i

3.1. The Graeffe Process. We turn now to methods for solving a sin-
gle equation in asingle unknown. We have seen that one can express the
sum s, of the pth powers of the roots of an algebraic equation as a rational
function of the coefficients of the equation by relations (3.02.5). But
we can write
Sp = 2(1 + 23/af + 28/28 + + * *).
Hence if there should be one root that is larger than all the others, say x1,
then for a sufficiently large p all fractions within the parentheses should
become negligible, and we would have approximately

Sp = 24,
and in particular
lim sl/? = 2}.
Day ee ‘

Hence if a feasible method could be found for computing s, for sufficiently


large p, we could take the pth root of this and obtain thereby an evalua-
tion of the largest root (in case there is such) of the equation.
The Graeffe process does this, and somewhat more. If we write the
equation in the form
(3.1.1) Aor” + aon"? + age™—4 + ao 0 = —ayr"-} a age"—3

es: asx7—5 =) ie

and square both sides, we obtain


aan?” + Qaaex?™—2 + (a? + 2Qaoa,)x?*—-4 + B96 pte Cf ho

+ 2a,a302"—4 + she

or
apc?" + (Qaod2 — af)x*-? + (2aras, — 2aia3 + a2)a2—4 pt heal)
Since only even powers of x occur here, this can be written

(3.1.2) ay” + (2aca2 — a2)y*-1 + (2aoa, — 2aras + a3)y"-?


+--+ =0Q,
where
yimras:
NONLINBAR EQUATIONS AND SYSTEMS 107
Hence we obtain a new equation whose roots are the squares of the roots
of the original equation. If we repeat, we obtain an equation whose roots
are the fourth powers, another repetition gives one with the eighth
powers, etc. After p such operations we obtain an equation whose roots
are the 27th powersof the roots of the original:
(3.1.3) avzt, + aPar, WY eC):

At any stage if we write the coefficients in sequence


a? a? ay ay ay a9

then to get the new sequence a+ we take the product of a by the


coefficient symmetrically placed with respect to a” and double, subtract
the double product of af by its symmetric mate,:. . . , ending with
ta?”*. Now if the roots are z;, then
(8.1.4) @:/ao = —Zx;, afP/aP = —Za,..., a /a = — 2”.
If the roots are all distinct, and x; has a modulus larger than that of any
other root, then eventually
(3.1.5) —a?/aP = x?.
Now it is also true that
ap/aP = ) (xia),
(i)
so that by a similar argument we can take eventually

(3.1.6) ap fay = (aes),


ap /ap = —a¥,
if the modulus of xz exceeds that of every other root except for 7. Again
ap /ay = —(xrters)”,
(3.1.7) ee ene
a? /qg? = —x??:

aP/ay, = —2x2?,
If the equation has only simple real roots, all relations (3.1.7) are valid
for sufficiently large p. The signs of the roots are undetermined, but
these can be obtained by substitution or in other ways. But if P(x) is
real, and the equation has complex roots, these occur in complex conjugate
pairs with equal moduli, and there may be any number of unequal roots,
all having the same modulus. For example, all n roots of

a” —1=0

have unit modulus, and the method fails.


108 PRINCIPLES OF NUMERICAL ANALYSIS

If there is one pair of complex roots whose moduli exceed the moduli of
all others,
lz:| = lao] =p > |e] (> 2),
then in polar form
%1 = p exp 20,
t= p exp (—76),
s0 that for m = 2?
xt = p™ exp mis,
xy p™ exp (—mié),
and
ae + 2e = 2p™ cos mo.
Hence for larger values of p, cos m6 will fluctuate in value and even in
sign, causing a/a?) to do likewise. However, in a¥/a® the dominant
term will be
NE 8 fo
If we can be sure that we stop where cos m@ is not too small, then x” + xz
will dominate the other terms in a/a, and we can obtain both p and
6, but with the quadrant of @ undecided.
This indeterminacy can be resolved if we apply the root-squaring
process to the equation
Py +h) =0
as well as to the original equation, where h is a small fixed number (using
the method of §3.04 to obtain the coefficients of y). Each root y; of this
equation is related to a root of the original equation by

YrP=f—oh,
and if h is small enough, the moduli of y; and yz will also exceed the
moduli of all the other roots. If

o = |ys| = yal,
then our roots 7; and 2; lie in the complex plane where the circle of radius
p about the origin intersects the circle of radius o about the point h units
to the right of the origin (or —A units to the left if h is negative). This
determines #0 uniquely. In case there are other roots x; with the same
modulus, the corresponding roots y; will have different moduli, and this
difficulty is thereby removed.
3.11. Lehmer’s Algorithm. The technique of investigating the roots
y: of Ply + h) = 0, along with the roots x; of P(x) = 0, has the dis-
advantage of requiring two applications of the Graeffe process in addition
to the special computations involved in the determination of the coeffi-
cients of y. Moreover in selecting h, one should be careful to make it
NONLINEAR EQUATIONS AND SYSTEMS 109
small enough so that, if roots x; and a; are such that |z;,| > |z,|, then also
la; — h| > |; — Al.
Brodetsky and Smeal therefore make the natural proposal that h should
be “infinitesimal,”’ and Lehmer has developed an effective algorithm. _
The original Graeffe process can be described in slightly different terms
by saying that we start with a polynomial P(x) and obtain from it a
polynomial P(x) whose zeros are the squares of those of P; from P: we
obtain P2(x) whose zeros are the squares of those of P; and hence the
fourth powers of those of P,.... On setting Po = P for uniformity,
one verifies that

(3.11.1) Poii(z) = Py(V/a)P,(— Vz), p=0,1,2,....


In fact
P(x) = adll(a — x),
Py(x) = al(/z — xi)(— V2 — 2) = Po(a/2)Po(— V2)
agll(—zx =e x3),

and the general statement follows by a simple induction.


If we write
Qo(x) = all(x — x; — h), il
Qi(x) = a(x — a — h)(— Va — a — h) = Q(V2)Q0(— V2)
agl[—z + (x; + h)?I, :
and continue the same procedure with

Qoi1(2) aa Q(z) Q2(— /2),


then we find inductively that Q,(x) is a polynomial whose zeros are the n
quantities
(a, + h)® = a? + mhap tt, om = 22,
where the terms omitted contain h? and higher powers of h.
Lehmer’s algorithm is obtained by setting

= do + a(a —h) + a(x —h)?+:--->


and defining recursively

(3.11.3) Ppis(t, h) = bp(v/2, h)bo(— V2, h).


Then
(3.11.4) Q,(4) = (h™ — x)"¢,(a, h).
Also

do(Z, h) aa go(2, 0) ray hdo(x, 0) ae


=ataw!+tanr? terre + h(bea~? + baw t+-- -)
110 PRINCIPLES OF NUMBRICAL ANALYSIS
where
$) = 9¢/dx = —d¢/dh,
and.
ba= (= Dena 6 =3)2 73,1 2
By direct calculation from (3.11.3) if
(3.11.5) dp+1(2, h) = afvetb + aetDy-1 af afer Dy? Eis (og's
oh 27+1A(betYy-1 + byt hy? a ole *)

ods :
we obtain the recursion
r-1
art) = (—1)a?* + 2 > (—1)"aPalp,, r=0,1,..., 2

(3.11.6) ae me
bet) = (—1)’a? dy _,, fuels 2, fos
v=0

If a = 1, as we may suppose, then af = 1 for every p. From


(3.11.4), Q, and (—2)"¢, differ only by terms containing the factor h”.
Hence the coefficient of x1 in ¢, is the sum of the zeros. But the zeros
are of the form
(x; +h)” = 2% + mar) + Feats
Hence
ap = — 22",
(3.11.7) be iy

But if there is a root of largest modulus 2, then for p sufficiently large it


follows that
(3.11.8) ay = a? /b®,
approximately. Thus we obtain the solution directly without the need
for a root extraction. Again if

E21 eal Er fapclic


then
ay?’ = xTry,
ae bY = ar—Ier-1(n, + 22),
whence

(3.11.10) ty = 1/(b/a? — db? /a”),

In like manner if
jai| > |x] > lars] > °°,
then

(3.11.11) ts = 1/(bY/aY — b&/aY’.


NONLINEAR EQUATIONS AND SYSTEMS lll
There are many special cases that can occur, and effort will be made,
not to enumerate them all, but only to suggest how they can be treated.
If x, is a root of multiplicity k and

eile Spee
then
ay oo —kar,
by = —kat,

so that (3.11.8) still holds. Moreover,

(—1)'ap = op,
(—1)*b = kaket,
(—1)*a, = atae
(=o = cet kaa
Hence
Lin = 1/(b ,/a@, — b?/a”).

Analogous relations can be worked out for a multiple root of intermediate


modulus.
The most important case of unequal roots of equal modulus occurs for
a real polynomial with complex roots. Suppose, for example,

|| = |r| > |x| Bn AY OME

Then if P(x) is real, and x1 # +22, we can write

V1 pexp (10), 22 = p exp (—78),


xt ll p™ exp (m6), xe = p™ exp (— m6).
Since
exp (+70) = cos 6 + 7sin 8,

a”) will contain the term —2p” cos mé which will oscillate in value with
increasing p and m but will dominate the other terms whenever m0 is not
too far from an integral multiple of r. When this is the case, we may say

a?) = —2p™ cos mé,


bY = —2p"-! cos (m — 1)8.
Also
a?) = p?
bY = 2p?"-! cos 8.
112 ' PRINCIPLES OF NUMERICAL. ANALYSIS

Since p is real and positive, this is obtainable from a by a root extraction,


and then 6 can be found from bY or, in fact, from bY or a. In any
event there is no ambiguity since the two possible quadrants for @ corre-
spond to the two roots x; and 22.
3.12. Transcendental Equations. If one sets z = 21, an equation
equivalent to P = 0 is the equation
(3.12.1) fle) =a + arz+az?+---: =0
in z. Each formula referring to the application of the Graeffe process in
either the original form or that given by Lehmer remains valid if each
x root x; is replaced by z;1, the reciprocal of a z root of (3.12.1). But
when this is done, they are applicable also to the case when f is analytic
and not necessarily a polynomial. That is to say, if among the roots of
(3.12.1) which lie within the circle of convergence there is a root 2; whose
modulus is less than that of all the others, then
(3.12.2) lim af/a? = —2?’,
Pp 2

which corresponds to (3.1.5), and


(3.12.3) lim 69 /a? = 2,
y eo

which corresponds to (8.11.8). Polya’s demonstration of the formulas


such as (3.12.2) will now be sketched briefly.
Suppose that the series f(z) converges inside a circle of radius p’ at least
and that the equation has exactly n roots, 21, zo, . . . , 2n, of moduli less
than p’. Choose p and designate the roots so that

2i| S|22)eS eS le, <p <p


It is no restriction to suppose that a) = 1. For any positive integer m,
let
w = exp (2ri/m) = cos (24/m) + 7 sin (2r/m).
Thus is a complex mth root of unity, and one can verify that the other
complex roots are w*, . . . , w”! and that
lTtotowt--+++y"1=0,

This done, one can verify that the product

(3.12.4) f(z)f(wz) + + + fw"2) = 1 + dimez™ + deme? ++


contains only powers of z". The Graeffe process takes advantage of this
in the special case when m is some power of 2. The theorem states that

(3.12.5) lim 22 ++ + g™dam = (—1)".


NONLINEAR EQUATIONS AND SYSTEMS 113
If we write mt |
> f = (@ — 21)(@ — 22) «> (2 — en) GC),
(3.12.6) f/f=(-ayt+:-+++e-a)it+btbet-:-
where
¢/o¢=bitbet---,
then the series ¢’/¢ has a radius of convergence at least p’. Hence for
some
(3.12.7) lbnl < yle*\-
On integrating (3.12.6) from 0 to z and taking the antilogarithm,
(3.12.8) f(z) = (1 — 2/ai) + + + (1 — 2/en) exp (biz + boz?/2
+ b2/3++--),
(3.12.9) f(@)fwz) +> > fw 2) = (lL — 2r/ap) + + -
(1 = ar/en\(L + Bim + ++ 9),
where
1 + Bime™ + Bam2?™ + °° + = exp (Om2™ + Dome?™/2 + Damz?™/3
pee fi

To express the coefficients B,,, in terms of the bp, consider the related
problem
1+ Ay+ Ay?+ :-- =exp (ay + ay?/2+ --: >).
By differentiating both sides with respect to y, we get
Age CAs te seat. me Moat cay +) (1 Ary te Asay? Fs. ° 2),
Hence on multiplying and comparing coefficients, we obtain the recursion
Ai = a,
2A, = aA, + a2,

3A3 = a1A2 + adi + as,


ay oun Oke). OO 5.0m 6.0) (6 18, @ 88) 6

Hence
Ste oS at

2B2,m = il te Dom,
3B3m = bmnB2,m ar DomB 1,m +- Dam;
Opn tes ear) wer ose) ‘ow ee se 16 On 6) Senne: 6

It follows immediately from the first of these relations and (3.12.7) that
[Biml < viel,
and if y > 1, as we may require, one can show inductively that
|Bom| < y?|p-?™|.
114 _ PRINCIPLES OF NUMERICAL ANALYSIS
Now from (3.12.4) and (3.12.9)
(3.12.10) ~1 +b Gime”! Gamz** + °°
a {1 aa Z™Dee™ + Suen + (—1)"2"™z7* OS zn™)a + Bim2Z™ + ap ‘).

Hence, Be comparing coefficients of 2,


Porn OS ir px: PON A (Cool W Yoda oiCs 1)Bin 2 = (— 1)
Lee Bam 2Ag

+: Los Bani « ie a.
But
|BimZ2"| < nylzn/pl™,
Bom y apap < (7)¥*|2n/p|?,

@)_ tela Oe Se ce) 50) 160- (0) Ou Oare. el seukel ae (s:) 6

Brit: ee lescuy*lee/ ol
For fixed n, as m increases, the first term vanishes as |z,/p|", the second
as |z,/p|?", .... Hence
(3.12.11) gem + + + 2a, = (—1)* + O(l2n/p|”).
This is the required theorem.
3.2. Bernoulli’s Method. The Graeffe process has the decided advan-
tage that the exponents m = 2? themselves build up exponentially.
Hence if one is fortunate enough to have the roots reasonably well
separated in the original equation, he may hope that only a relatively
small number of root squarings will be required. It has the further
advantage that, in principle at least, it yields simultaneously all the
roots of an algebraic equation and all roots within a circle of analyticity
when the equation is transcendental. Hence, once the squaring has been
carried sufficiently far, all solutions are obtainable by simple division or,
at worst, by root extraction.
The methods now to be described do not converge nearly so fast; they
give only one root, or at most a few roots, at a time; and in some cases -
they require some previous knowledge of the approximate locations of the
roots to be determined. Nevertheless, they all have one striking advan-
tage. Errors, once made, do not propagate but tend to die out. If
there were no round-off error, they would die out completely. A gross
error might cause the process to converge to some root other than the
one intended, however. But the self-correcting tendency suggests that a
method of this type might be useful at least for improving approximate
solutions obtained by, say, Graeffe’s method, but with insufficient
accuracy.
Let the equation
(3.2.1) f(z) = do oh Q;z2 + Az? + ooo = 0.
NONLINEAR EQUATIONS AND SYSTEMS 115
have a single root a interior to some circle about the origin throughout
which f(z) is analytic. Then if g(z) is analytic throughout the same
circle, and g(a) # 0, it follows from Kénig’s theorem that h,/hp1— a,
where
(3.2.2) g/f = ho + Iiz + hoz? + - - >
Now if

(3.2.3) gZ@) =gotgzt+tg2?+---,


we can set

Go + gitte = (Go + axe + + > +)(hot


haz Fs > +)
and compare coefficients to obtain the recursion

Aho = Jo,
Ahi + aiho = gi
3.2.4 us
( ). Ache + aihi + acho = go,
B40. OmyP ge O, 10. 8, KO) 10) ..0. OO aheuue

so that, if ao ~ 0, the h, can be obtained in sequence.


In the case of an algebraic equation
P(z) = aor” + aye™'+ +--+ +a =0,
f(2) = 2*P(z) is also a polynomial, and the root a is the reciprocal of
some root x; of P= 0. If g is taken to be some polynomial of degree
less than n, then for » > n

(3.2.5) Aohy + Qihy1 + +++: tanh, = 0.

However, instead of first making a nearly arbitrary selection of g(z), one


can just as well select ho, hi, . .., hn arbitrarily and apply only
Eq. (8.2.5). One never needs to know the function g explicitly. It
might happen, by chance, that the selection of ho, . . . , hn—1 defines by
(3.2.4) a polynomial g which vanishes at a, but if so, the sequence h,/h,+1
may converge to some other root.
Bernoulli’s method, properly speaking, is the method just described
for an algebraic equation, though the usual derivation is somewhat differ-
ent. If the roots x; of the algebraic equation P = 0 are all distinct, then
for any choice of ho, . . . , An—1, the m equations
Lue? = hy, p=0,..6.,n-1
can be solved for the w,, since the determinant |x?| is a Vandermonde
which vanishes only when two or more of the 2; are equal. If now
ha, hai, . . . are determined by (3.2.5), then
Zuay =h,
116 PRINCIPLES OF NUMERICAL ANALYSIS
for every ». But if x1, say, is the largest root, and uw. ~ 0, then

ny = 02 (Ur + Uar2/2i + > °°),


and for » sufficiently large, h, = wizz, and hence h,/h,11 = a7! = a.
To return to the general case where f may or may not be a polynomial,
there may not be a circle about the origin containing only a single root.
It may be, instead, that there are two conjugate complex roots a; and a,
for which therefore |a:| = |as|, and which, however, lie within some
circle which contains no other root. If so, we can apply the extension
of Kénig’s theorem, computing the h, as before, but forming a quadratic
equation (3.08.13), for v sufficiently large, whose roots will be a; and
a2 approximately. More generally, if there are n distinct roots with
equal moduli, (3.08.14) can be applied. However, it may be preferable to
set 2 = y + u for some fixed u, and apply the method to the resulting
equation in y.
If there is some circle about the origin which contains only the root au,
a somewhat larger one which contains only a; and age, a still larger one
containing only a1, a2, and a3, . .. , then in principle one can obtain
all roots a1, a2, a3, . . . without a change of origin. Thus having found
ai, we can apply the extension of K6nig’s theorem, and on setting

(3.2.6) Ho =|» ta
p+1 y

obtain
H®/H®, — oar.
Roaincr
gain
Rea [psp
Hore
h,
|Fee pe ee
(3.2.7)
hye hy h,
h
then
AY /HS, — arora.
Aitken has given a convenient recursion for calculating the determi-
nants H® of successively higher order. The formula is

(3.2.8) H?-PH?+D = He? — H® H®,, p=1,2,...

where for uniformity we set

H® = 1, H® = h,.

For p = 1 the equation merely gives the expansion of the determinant


(3.2.6). The proof is sufficiently well illustrated for p = 2 and is based
upon a classical determinantal identity. The sixth-order determinants
NONLINEAR EQUATIONS AND SYSTEMS 117
ote Mie eeD lok hee Wh) Prehewhp0. 10-0
Aya hy. 0 hy-1 0 0 0 h, —h, 0 0 0
hy+2 Iya 0 h, 1 0 x 0 Ayzs Iya 0 0 0

oe meOUrOh,=. A 0c 2 IR (’O aT ee
Beet? 9 Ay hs (0-0 how 0 h, ha 0 0
PaseOik Ghat hyias le0 hoe 0 herman ko
are equal, since the second is obtainable from the first by subtracting the
fourth, fifth, and sixth rows, respectively, from the first, second, and third.
But the second one vanishes, since in the Laplace expansion by third-order
minors every term contains a determinant with a column of zeros. The
Laplace expansion of the first along the first three rows has six nonvanish-
ing terms, but these are equal in pairs. When the three distinct terms
are written out, one obtains

h, hy hy-2 hy1 0 1 hy h,_2 0 h, hy—1 1


phe euh, A hye A, eho Lh, Ooh Colin plage 0
hyi2 Noy h, hygt 1 0 Aya h, 1 hy+2 Ay4i 0

hr he 1| |h, bya O
-- h, hy-1 0 Avs h, 0 == ib),

Ay h, 0 hy+e Nya 1

On simplifying and rearranging, we obtain (3.2.8) for p = 2. The gen-


eral case requires the expansion of a vanishing determinant of order
2p + 2 formed in a similar manner.
Aitken first proposed his 6? process, described below, as a device for
accelerating the convergence of the sequence H/H®,. If the sequence
Uo, U1, U2, .

converges geometrically to the limit u, that is to say, if for some k,


|k| < 1, it is true that
uy, — u = k?(uo — 4),
then for any v
U,y-1 Uy,
Uy Ten A Uy—1
vy—1 — 2u y =f U +1) abs

The proof can be based upon the further property that, if


u=uta, u, = u, + a,

the same identity holds when the quantities are primed. In fact, by
direct substitution one finds that, when each term in the sequence is
increased by w, the entire quantity on the left is increased by w. We can
therefore take w = —wu and consider the sequence whose limit is 0.
But by direct substitution, then, the determinant is seen to vanish,
which proves the assertion, In §3.08 it was shown that each sequence
118 PRINCIPLES OF NUMERICAL ANALYSIS
uy = HY /H®, for fixed p converges geometrically (in the limit). Hence
we may expect that the derived sequence
Uy-1
U Or= e us re 2Uy = Uy+1)
v Uy
Uy+1

would converge more rapidly than the original one. This is Aitken’s 6?
process. A second derived sequence, u‘?, can be formed from the u{? —
just as the u was formed from the u,. It is to be noted that in forming —
a term in a derived sequence one can neglect all digits on the left that
are common to the three terms being used. This is because of the prop-
erty that in increasing each term by w the result is increased by w.
We conclude this section with a brief mention of an expansion due to —
Whittaker. In (3.2.4) we are free to take go = do, 91 = g2 = °°-* =O.
If the first vy + 1 of these equations are regarded as vy + 1 linear equations
in the vy + 1 unknowns fo, di, . . . , h,, the solution for h, can be written
down in determinantal form (cf. §3.32 below). Hence the ratio h,/h,41
can be expressed as the ratio of two determinants. Moreover, one can
write
a= ho/hi + (hi/he a ho/hi) ae (he/hs a hi/he) none ’

and therefore a can be written as the limit of an infinite series involving


quotients of determinants. A slight transformation yields
a = ho/hy + (h? — hoh2)/hihe + (h2 — hihs)/hohs + + >: ,

and the numerators in these fractions are second-order determinants


whose elements are themselves determinants. One can now apply an
identity of the same type as the one used to demonstrate (3.2.8) and
obtain Whittaker’s expansion:
3 |2203
Qo aza2 ae A102
B29) @s- =
ay Q\Q2 A102 Q10203
Apa1 Apa, A0B102
Oana

3.3. Functional Iteration. If ¥(x) has no pole which coincides with a


root of
(3.3.1) f@) = 0,
and if

(3.3.2) g(x) = x — ¥(a)f(a),


then any root of (3.3.1) satisfies also

(3.3.3) . x = (2).
NONLINEAR EQUATIONS AND SYSTEMS 119
In particular if ¥(z) is analytic and non-null throughout some neighbor-
hood of a root « of (3.3.1), then a is the only root of (3.3.3) in that
neighborhood of a. This suggests the possibility of so choosing y that
the sequence

(3.3.4) tiz1 = (x5)


converges to a, provided the initial point xo is sufficiently close to a.
In fact if the sequence (3.3.4) converges at all, it must converge to a solu-
tion of (3.3.3), since clearly the sequences x; and ¢(z;) have the same
limit. Consider first the conditions upon ¢ that will ensure the conver-
gence of (3.3.4).
Define the p neighborhood N(x», p) of xo by
(3.3.5) N(2o, p):|% — 20] < p.
That is, the p neighborhood of 2» is the set of all points z within.a distance
of p from x. Now if for some positive k < 1 and some p it is true that
(3.3.6) |o(x’) — o(2”)| <k for xz’ and x’ in N(a, p),
and if 2 itself is in N(a, p), then every x; in the sequence (3.3.4) is in
N(a, p), and the sequence converges toa. For

yan 9 Bh $(zi) — ¢(a),


so that
|ti41 — a| = |O(2s) — $(a)| < kai — | < [a — al,

and inductively every z; lies in N(a, p). Also

(3-o0L) |x. a a| < k*|xo == a|

by an induction that can be carried out once we know that every 2; lies
in N(a, p). Since k <1, therefore, the distance |x; — a| decreases
geometrically at least. °
Now suppose that for some 2 and p and a positive k < 1 we have

(3.3.8) \o(x’) — o(2”)| < k for x’, 7’ in N(x», p),

the condition holding in a p neighborhood of xo, while at x») we have

(3.3.9) |zo — (%0)|< (1 — k)p.


We do not now presuppose the existence of a solution. Instead we show
that the sequence defined by (3.3.4) has a limit a which lies in N (ao, p)
and which satisfies our equation. Hence the conditions (3.3.8) and
(3.3.9) are together sufficient to assure us of the existence of a solution
and that it is obtainable as the limit of our sequence.
We show first that every term in the sequence lies in N(%o, p). Weare
120 PRINCIPLES OF NUMERICAL ANALYSIS
assured by (3.3.9) that 21 = (xo) does, since this is equivalent to saying
that
a0 = x1| < (1 a k)p < p.

If also a2, x3, . . . , % all lie in N(ao, p), then since

[tena — el = [O(@s) — O(@-r)| S Ales — til,


and by induction

tiga = x;| S Klay — tol < ki — k)p,


therefore
lteza — Gol < |regs — vl + [es — teal + + + + |e — 20
<b + po ae ST — bp = (1 kp <p

Hence the series


lol + |a1 — ol + [v2 — a1| + Tee

converges, and hence the series


LPMae Ct eet) RG i Cz yea Hr it OC

converges absolutely. But the partial sums of the last series are the 2;.
Hence the sequence (3.3.4) converges, and the limit therefore satisfies
(3.3.3).
If ¢ is analytic in some neighborhood of a root a, as will be assumed
throughout, and if
(3.3.10) Id’(a)| <1,
then for any k which satisfies

Ip’(a)| <k <1


there exists a p sufficiently small so that (3.3.6) will be satisfied. Hence
(3.3.10) is sufficient to ensure the existence of some neighborhood of a,
though possibly small, within which (3.3.4) converges. However, it
appears from (3.3.7) that the convergence is more rapid for smaller k,
and hence for smaller |¢’(a)|. Hence it would be advantageous, if pos-
sible, to make ¢’(«) = 0. When this is the case, we shall say that the
sequence (3.3.4) has second-order convergence or, more briefly, that the
iteration (defined by) ¢(x) is a second-order iteration. More generally,
if
(3.3.11) ¢'(a) = g(a) = + + + = pHV(qQ) = 0,

the iteration will be said to be of order m at least, and of order m exactly


if ¢™(a) #0. While one can write in general the expansion

(3.3.12) $(z) — a = (@ — a)g'(a) + (@ — @)*h!"(a)/21-+ +++,


NONLINEAR EQUATIONS AND SYSTEMS 121
(sinee ¢ is supposed analytic), when the iteration is of order m, one can
write
(3.3.13) $(z) — a = (& = a)™g™(a)/m!+ +++.
In this case
ini — a = (2; — a)™G™(a)/mI + -°--.
3.31. Some Special Cases. Some classical methods of successive
approximation are methods of functional iteration as here understood.
Horner’s method is not, and since it has little to recommend it in any
case, it will not be described here.
3.311. First-order iterations. Best known of these is the regula falst.
This applies to real roots of real equations, algebraic or not. If f(z’) and
f(z’) have opposite signs,
ae = [x'f(0") — 2"f(@ VU") — F2’))
lies between x’ and x’. The chord from the point 2’, f(z’) to x”, f(x’)
intersects the z axis at x2 as one verifies easily. It is no restriction to
suppose that f(x2)f(x’”) > 0, since otherwise we can reverse the designa-
tions of z’ and x’. We now let 22 play the role of x’’ and repeat, or we
let x’’ = x, and regard this as the first step in the iteration. Hence we
are taking
(3.311.1) o(x) = [z’f(x) — af(z’)I/[f(e) — f')I-
The derivative ¢’(a) is seen to be
¢'(a) = [f(z’) + (a — 2’)f'(a)I/f(o').
In case f has continuous first and second derivatives near a, then the
Taylor expansion gives
f(z’) = fla) + (@' — a)f'(a) + W@’ — a)*f"(@"),
where now x” is some point on the interval (a, x’). Hence

fe’) + (a — #)f'(a) = Wa!’ — @)?f"(@"),


and therefore
¢'(a) = M(x’ — a)*f"
2")/f(z').
Hence for x’ sufficiently close to a, ¢’(a) will be small, and there will exist
an interval about a over which
lo’@)| Sk <1,
Hence once we find an initial x’ close enough to a, the subsequent itera-
tion will converge to the solution. The choice (3.311.1) of ¢ is equivalent
to the choice
v = (x — 2’)/[f(z) — fv’)
in (3.3.2), as we verify directly.
122 PRINCIPLES OF NUMERICAL ANALYSIS

Geometrically this method ‘amounts to drawing a series of chords


all passing through the point 2’, f(z’). The ith chord passes also through
zi-1, f(a;1). Instead we might take a series of parallel chords. For
this
y =m,
where m is the constant slope. Then
¢(z) = 2 — mf(a),
and
¢'(z) = 1 — mf'(z).
If f’(z) has fixed sign throughout some neighborhood of the solution, we
choose m to have the same sign and, in fact, such that throughout the
interval
2 > mf'(x) > 0.

3.312. Second-order iterations. If we take

(3.312.1) d(x) = x — f(x)/f'(a),


which is to say
(3.312.2) v(x) = 1/f’(a),
we obtain the well-known Newton’s method. The derivative is
(3.312.3) ¢'(z) = 1 — [f"%@a) — f@)f" @I/F"@),
whence
¢’(a) = 0.
If a is not a multiple root,
f(a) #0,
and for any positive k there is a neighborhood of a throughout which ©

ld’(x)| < k.
The requirement often made that at the initial approximation x» we
should have
F(%0)f’" (xo) > 0
is not strictly necessary.
Newton’s method applies to transcendental as well as to algebraic
equations, and to complex roots as well as to real. However if the equa-
tion is real, then the complex roots occur in conjugate pairs, and the
iteration cannot converge to a complex root unless 2p is itself complex.
But if x is complex, and sufficiently close to a complex root, the iteration
will converge to that root.
For algebraic equations, as each of the first two or three 2; is obtained, it
is customary to diminish the roots by 2; by the process described in §3.04.
NONLINEAR EQUATIONS AND SYSTEMS 123
Or rather, one first diminishes by 2; then one obtains 7, — a) and
diminishes the roots of the last equation by this amount; then obtains
— x, and diminishes by this amount, etc. Since
Fes + u) = fei) + uf'(e) + -
one has always that 2;,1 — x; is the negative quotient of the constant
term by the coefficient of the linear term. Hence one calculates f(x;) and
f’(x;) in the process of diminishing the roots at each stage.
However, f(z;) decreases as one proceeds. When f(z;)/f’(x;) becomes
sufficiently small, one can write
u = —([f@s) + uf"(7)/21+ - - -]/f'@,),
which is exact. When wis small, the terms in u?, u?, . . . become small
correction terms, and subsequent improvements in the value of u can be
made quite rapidly by resubstituting corrected values.
When Newton’s method is applied to the equation
x? —N =0,
the result is a standard method for extracting roots in which one uses

o(x) = («& + N/a)/2.


3.32. Iterations of Higher Order: Kénig’s Theorem. If one applies
Newton’s method to any product q(x)f(x) to obtain a particular zero a of
f(z), one always obtains an iteration of second order at least, provided
only g(a) is neither zero nor infinite. Hence one might expect that by
proper choice of g it should be possible to obtain an iteration of third
order or even higher. This is true; one can in fact obtain an iteration of
any desired order, and various schemes have been devised for the purpose.
Some of these will now be described. However, one must expect that
an iteration of higher order is apt to require more laborious computations.
The optimal compromise between simplicity of algorithm and rapidity
of convergence will depend in large measure upon the nature of the avail-
able computing facilities.
Consider first Kénig’s theorem. In the expansion

h(z) = g/f =hothiethe+---,


we have Taylor’s expansion about the origin in which
he = h©(0)/r1.
If we move the origin to some point x, we can restate the theorem in an
apparently more general form, as follows:
If in some circle about x the equation

(3.32.1) f@® =0
124 . PRINCIPLES OF NUMERICAL ANALYSIS

has only a single root a, which is simple; if f(z) and g(z) are analytic
throughout this circle, and g(a) ~ 0; and if we define

(3.32.2) P,(x) = r{g(x)/f(x))?/lg@) /F@)1,


then

(3.32.3) lim P,(z) = a@ — @.

For P,(x) is simply the ratio of the coefficients of the Taylor expansion of
h(z) about the point z.
This being true, then at least for r sufficiently large it is to be expected
that
la — + — P,(z)| < la — al,
and hence x + P, should then define a convergent iteration of some order.
It turns out that the iteration is convergent for any r and in fact is of
order r + 1.
In proof, we write the expansions
> -
h(z) = ho(x) + (2 — z)hi(a) + (2 — 2)*ho(~) +
and
F(z) = ko(x) + (2 — z)ki(zv) + > + > + (2 — x)ke-(x) + Rryilz, 2).
Then
Px) = h,-a(x)/h-(x)
= (a — 2)[F(a) — R,(a, 2)]/{F(@) — Rey1(e, x)],
and
a—2—P, = (a — x)k,(x)/[F(a) — Rr4i(a, 2).
Since a — x — P,(z) has the factor (a — x)’, all derivatives of
a—z-—P,,

and hence of x + P,(x), from the first to the rth will vanish at z = a.
By definition, therefore, x + P,(x) defines an iteration of order r + 1
at least. Note that with g =1, Pi = —f/f’, which yields Newton’s
method.
For the functions h,(x) required in forming any P,(x), one can obtain
them by differentiation, as indicated in the theorem, or by solving a
recursion like that of (3.2.4). However, now the a, and g, are functions
of x, coefficients in the expansions
f(z) = aox) + (2 —az)ai(z) +---,
g(z) = go(x) + (2 — z)gi(z) + ---.
In the statement of Whittaker’s expansion (§3.2), reference was made
to the fact that the h, could be expressed by means of determinants.
NONLINEAR EQUATIONS AND SYSTEMS 125
This is equally true for the h,(x), and even for the P,(x). For this it is
convenient to suppose that any desired function g has been divided into f
in advance, and f now designates the quotient. The expansion is now
(3.32.4) 1/f(z) = ho(x) + (2 — z)hi(x) + (@ — 2)*al(z) + ---,
so that
h(x) = [1/f(x)]/r},
and
P,(a) = hy1(x)/h,(a).
Let
Ao = L Ai = a,(z),

ai ao 0

(3.32.5) Be SM hers
a, Gr, Ay_2
Then
(3.32.6) Ie = (—)*A,ag*—1,
and
(3.32.7) P, = —apA,_1/A,.
This is equivalent to saying that an iteration
(3.32.8) ¢,=x+P,
of order r + 1 is given by 4, satisfying
or —='Z. «Ge 0

—l ai a PEN?
(3.32.9) 0 Cates oct Ses ce Ob

0 a, Gr-1

As an example, let
f=wa-—N
Then
a = u™ — N, a = ma), a2 = m(m — 1)z"-?/2.

We have for third-order iteration


ai ao
& = £ — Aa,/ ?
Qo ay

which becomes after simplification

o = x[(m — 1)x™ + (m + 1)N]/[(m + l)a™ + (m — INI.


This defines Bailey’s iteration for root extraction. For square roots
@ = a(x? + 3N)/(32? + N).
126 PRINCIPLES OF NUMERICAL ANALYSIS
3.33. Iterations of Higher Order: Aitken’s 5? Process. Consider any
sequence 2, satisfying the iteration ¢(z), with the limit a. The sequence
(3.33.1) Uu= 2; —B
for any fixed 6 has the limit a — 6 and satisfies the iteration
(3.33.2) ¥(u) = ¢(u+ B) — B,
since ;
Viti = Unit B= g(x) = o(u; + 8B).

In particular if 6 = a, the sequence & = «; — a represents the deviations


of the approximations z; from the limit a, and this sequence is defined by
the initial deviation &, together with the iteration

(3.33.3) o() = o(f +a) — @


If the iteration ¢ is of order r, then

(3.33.4) w(t) = ark + anyigtt + +:


Now let ¢c1(a) and ¢.2)(%) represent any two iterations of the same
order and not necessarily distinct. With these can be associated the
iterations w1)(£) and w,2)(&) satisfied by the deviations. The function. —

(3.33.5) @(z) = {xdcyldca(x)] — by (2) bc (x) }


{x — day(%) — $(z) + bcylo~m(x)]}
also defines an iteration. This is invariant in the sense that, if one makes
the substitution (3.33.1), forms the ~qa) and yi) as in (3.33.2), defines
W(u) as in (3.33.5) with Way, We), and u replacing $1), ¢:2, and x, then
V(u) = &(u+ 8) — B. This can be verified by direct substitution.
Now it can be shown that, if the iterations $<) and ¢ 2) are of order r,
then the iteration & is of order greater than r, provided only
(3.33.6) [bay(e) om 1] b(a) = 1} 0.

In fact if r = 1, and this condition (3.33.6) is satisfied, then @ is of order


2 at least, and if r > 1, then @ is of order 2r — 1, and condition (3.33.6)
is necessarily satisfied.
Because of the invariance, it is sufficient to consider

{Ewcayloay
(£)] — wa (é) wc (£) }
A) =
E — wa(€) — wa (é) + wala (]}?
where the sequences of deviations satisfy

wy (€) - Cn ko ewe
wa(~) = air tr ss,

wor (£)] = aoP(aPe + ++ -)rpess


NONLINEAR EQUATIONS AND SYSTEMS 127
Hence if r > 1, the term of lowest degree in the numerator is the second
term which is of degree 2r, while the term of lowest degree in the denomi-
nator is the term £, itself of degree 1. Hence an expansion of the fraction
in powers of £ will begin with a term of degree 2r — 1, and this exceeds
r when r > 1.
If r = 1, then to form the numerator we have

Ew nwa) ($)] = aPaPe +--+,


wy (€)wca($) = aMaPe+ ---,
whence on subtraction the terms in ¢ drop out, and the term of lowest
degree in é is of degree 3. For the denominator we have

E — wy(E) — w/t) + walocy(€)]


= (1 = asv ms a +- aDa) & + canioune

= (1 —a®)(1 —a®)E+---.
But
a = gale), a? = $in(a),
so that, if (3.33.6) is satisfied, the expansion of the denominator begins
with a term in the first power of & Hence the expansion of the fraction
begins with a term of degree 2 at least, and the iteration is therefore of
order 2 at least.
Thus given two iterations of the same order, one can always form an
iteration of higher order. But it was nowhere required that ¢q) and ¢(2)
be distinct, so that from any single iteration one can form an iteration of
higher order. More than this, an iteration ® of order r > 1 always
converges in some neighborhood about a, whereas an iteration ¢ of order 1
need not converge and will not unless |¢’(a)| <1, as we have seen.
Hence from any function ¢, analytic in the neighborhood of a and satisfy-
ing ¢(@) = a, one can form an iteration which converges to a whether
that defined by ¢ converges or not.
In ordinary practical application it is not desirable to form @ explicitly.
Instead, one can proceed as follows: One forms

r1 = $(Zo), t= (21)
in the usual manner. However, for x3 one takes not ¢(x2) but ®(xo) by
23 = (“ote — 2?)/(o — 2x1 + 22).
In terms of the difference operator A, defined by
Ax; = Vii — LH,

A’x; = AXi+1 = Ax; = Li42 — 205441 + HAS

this can be written ;


Ls = Lo — (Ax )?/A*xo,
128 PRINCIPLES OF NUMERICAL ANALYSIS
and more generally one can take
} 3¢+4-1) = V3y — (Axs3,)?/A2x3,.

This form brings out the fact that in practical computation any sequence
of digits on the left that is identical for x3,, 73,41, and 3,42 can be ignored,
since it drops out in forming the differences and is restored in subtracting
from %3,.
Just as the iteration & was of order higher than ¢, so one can form from
® an iteration of still higher order. Thus having computed
x3 = B(x), Le = ¥(23),
instead of computing ¢(z.), one could now form
X27 = (Xox— — 12)/(to — 2x3 + Ze).
In principle, iterations of arbitrarily high order can be built up by pro-
ceeding in this manner.
3.34. Iterations of Higher Order: Schréder’s Method. The oldest method
of obtaining iterations of arbitrary order seems to be that given by
Schréder. Consider any simple root a of
(3.34.1) f(x) = 0.
In the neighborhood of a we can set
(3.34.2) y =f)
and let
(3.34.3) a = hly),
where

(3.34.4) c=Hf@)), y=flry)l,


identically. Hence
(3.34.5) a = h(0).
If we expand
| X = K(Y)
in powers of (Y — y), regarding x and y as fixed, we have
X= 2+ YY —yiy) +a aghast =;
where
(3.34.6) hey) = h®(y)/ri.
Hence

(3.34.7) a =a — yhiy) + +++ + (—y)th(y) + (—y)'Ryr(y).


NONLINEAR EQUATIONS AND SYSTEMS 129
Now the h, are functions of y, but y is a function of x by (3.34.2), so that
we may write
(3.34.8) (2) = ALf(@)],
with
(3.34.9) b=1/f', b= ,/(rf?.
Then (3.34.7) can be written
(3.84.10) a=a-—fbit+-++ + (—f), + (-f) ORAS)
identically. If we define ¢, by
(3.34.11) a = ¢,(z) + (-f)
UR a(S),
then we see that ¢, provides an iteration of order r + 1, since its rth
derivative must vanish with f. Again ¢; yields Newton’s method.
_ The quantities b,(z) required for the iteration
(3.34.12) $-(x) = 2 —foit --- +(—f)-
can be formed by successive differentiation as in (3.34.9), where the prime
denotes differentiation with respect to x. It is also possible to write a
system of recursion relations which can be used in case the successive
derivatives of f(x) are known or easily evaluated. If these are known,
we can write the expansion of Y = f(X) in powers of (X — 2):
Y—y=ai\(X
— 2) + a(X — x)? + 0,(X —z)*?*+°--,
where a,(x) = f(x)/z!. If this is substituted into the expansion of
X — x given above, we obtain
X — x = byfas(X — x) + a(X — x)? + 4;(X — x)? + -- J
+ befa2(X — 2)? + Qaya(X — 2)? + ++] 4 bsla(X — 2) +++ 7]
after replacing the h,(y) by b,(x), as in (3.34.8). Now the two sides must
be equal identically so that
ab; = if

(3.34.13) a%bs; + 20100b2


arb, + ab
+ Q3b1 =
= 0,
0,
Oe be ome 6 oy 6) 6) o- .8 le .6) Fe! ho

This is the required recursion for expressing the b,(x) in terms of the a;(z),
and hence of the derivatives of f.
3.35. Iterations of Higher Order: Polynomials. Three distinct methods
have been given for forming iterations of higher order. The 6? process
presupposes that some iteration is known and deduces from it an iteration
of higher order. The methods of Schréder and of Richmond start with
the equation f = 0 and form directly an iteration of any prescribed order,
130 PRINCIPLES OF NUMERICAL ANALYSIS

provided only that the root a is a simple root and that f is analytic, or at
least possesses derivatives of sufficiently high order, in the neighborhood ~
of a. Clearly if g(x) ¥ 0 and is analytic in the same neighborhood, one
could apply either method to the equation F = 0, where F = fg or where
F =f/g. Thus there are a great many ways of forming an iteration of
any order, and doubtless many different iterations. In special cases it
might be desirable to impose upon the iterations special conditions other
than the order of convergence. In particular if f is a polynomial, one
might ask that ¢ be a polynomial. This would be desirable in case
one is using a computing machine for which division is inconvenient; or
in operations with matrices, where direct inversion is to be avoided; or,
as Rademacher and Schoenberg have shown, in operations with Laurent’s
series, where direct inversion is impossible.
We ask now whether, when f is a polynomial, one can find a function g
such that, when Schréder’s method is applied to the equation
(3:So4m) F = f/g = 0,
an iteration ¢, will be a polynomial. Taking first ¢1, we wish to choose g
so that
(f/9)/(F/9)' = Pf,
where p is a polynomial. This requires that

pf’ — plg'/g)f = 1.
But if f = 0 has only simple roots, one can always find polynomials p and
q such that
(3.35.2) of’ —qf =1
by the theorem of §3.06. Hence if g satisfies
(3.35.3) g'/g = 4/P,
the requirement is satisfied for 4.
Now suppose that the process yields

(8.35.4) ¢ = 2 — fort fpo/2l— +--+ (—f)'p-/rl, pi=p


and that all p;, and hence all ¢;, are polynomials for i < r. To obtain
$41 we must add to this a term

(AY pear + 1)! = (—FY


HBaaa,
where the B’s are obtained from F as were the b’s from f in (3.34.9).
Hence ;
B, = g'p,/r',
and
Brit = (g"pr)'/(r + 1) IF’.
NONLINEAR EQUATIONS AND SYSTEMS 131
After minor manipulations and applications of (3.35.2) and (3.35.3), this

Bri = opp, + repr)/(r + 1)!


and therefore
FOB =F (pp, + rap) /(@¢ + 11.
Hence

(3.35.5) Pro = pp. + rap,


is a polynomial, and this recursion, together with (3.35.4), defines poly-
nomial iterations of allorders. Note that g itself is not required explicitly,
but only p and gq.
For illustration let
f=1-— 2", f' = — nz™"a.
Then
(—a/n)(—nx*—"1a) + (1 — x%a) Il
Hence
|b) ae —2/n, Gui =,
g/g = n/z,
and a possible choice is g = x”.
We can now evaluate the p; recursively and obtain a polynomial
iteration of any order for the nth root. But in this case it is easy to
verify that
a =x2(1 — f)-™.

On referring back to the argument used in §3.34 to derive Schréder’s


iterations in general, we may conclude that the first r terms in the
expansion in powers of f must provide an iteration of order r, and this
is certainly a polynomial. Direct verification shows it to be the same as
is given by (8.35.4) and (3.35.5).
For the case n = 1, the iteration of order r is
(3.35.6) @=a2l+f+P+-++ +f7%,
and for r = 2 this becomes on expanding f

@ = «(2 — az).

For the case when a and 2 are matrices, this defines the Hotelling-
Bodewig iteration for inverting a matrix.
On introducing the subject of functional iteration, it was shown
that the iteration would converge to a when x is any point in the com-
plex plane within a circle about a throughout which a certain Lipschitz
condition is satisfied. This condition is sufficient but by no means neces-
sary. In particular, suppose a is real and ¢ is a polynomial. The curve
132 PRINCIPLES OF NUMERICAL ANALYSIS

y = ¢ intersects the line y = x at x = y = a, and if the iteration is of


order 2:or greater, then at this point the curve has a horizontal tangent.
Suppose the curve intersects the line also at some point a’ <a but
nowhere between a’ anda. Thenifa’ < 2 < a, the sequence converges,
whereas the Lipschitz condition holds only in the vicinity of a. Like-
wise if the curve intersects the line at a’’ > a but nowhere between a and
a’’, then the sequence converges when a < % <a’. For % <a’ and |
for to > @”’, the sequence either diverges or converges to some other
limit. Thus for the iteration x(1 + f + f?) to a1, the curve and line
intersect at x = 0, a1, and 2a~!, and the sequence converges to a“ if
and only if 0 < 4» < 2a~._ For the iteration x(1 + f), the curve inter- _
sects the line only at x = 0 and a“, while ¢(2a—') = 0. This sequence
converges under the same conditions, as does any iteration (3.35.6).
3.4. Systems of Equations. All methods so far described have been
methods for dealing with a single equation in a single unknown. For
solving a system of equations one would like to follow the Gaussian pro-
cedure in systems of linear equations, eliminating one variable from each
of n — 1 of the equations, another from each of n — 2 of these, and so on
until there is left a single equation in a single unknown. This of course
is quite impossible ordinarily. Nevertheless, it is possible to reduce the
problem to that of solving an infinite sequence of single equations. For
each equation of the sequence any of the methods described above can be
applied. The reduction to an infinite sequence can be accomplished by
the method of steepest descent.
On the other hand, one can attempt to reduce the problem to that
of solving an infinite sequence of sets of linear equations. This type of
reduction is accomplished by an appropriate generalization of the method
of functional iteration. These two methods will now be described. Only
the case of real functions of real variables will be considered in this
section.
3.41. The Method of Steepest Descent. The method of steepest descent
applies specifically to the location of the maximum or minimum of a
function of n real variables. However if we have to solve a set of n
equations in n variables
(3.41.1) ¥; = 0,
the function

(3.41.2) d = Zy?
takes on the minimum value ¢ = 0 at all points satisfying (3.41.1).
More generally if a,; are elements of a positive definite matrix, the
function
(3.41.3) pt = LIYpiash;
NONLINEAR EQUATIONS AND SYSTEMS 133
also takes on the minimum value ¢* = 0 at the same points. There are
thus many ways in which the problem of solving a set of equations can be
replaced by a problem in minimization. We therefore consider the prob-
lem of minimizing the function $(&, &, . . . , &n), or briefly ¢(z), where
xz is the vector whose components are the ¢. The partial derivatives

be: = 06/08;
are components of a vector which we shall designate as ¢, and take to
be a column vector. This is known as the gradient of ¢, often denoted by
grad ¢ or V4, and its direction at any point z is normal to that surface
(3.41.4) ¢ = const

which passes through the point z.


In the neighborhood of the point + which minimizes ¢, the surfaces
(3.41.4) are closed if, as we suppose, ¢ is continuous, since if we take the
constant sufficiently close to the minimum value, then along any ray
through the point x the function ¢ can only increase, and at some point
along the ray the function will first take on the value of the assigned
constant.
Now suppose that x represents some initial estimate to the required z,
close enough to it so that the surface

(3.41.5) : $(z) = $(20)


is closed, and that the point x lies in the region enclosed by the surface.
Then choose an arbitrary vector u, subject only to the requirement that
at Lo

(3.41.6) u'dz(to) ¥ 0.

This means that the direction u is not tangent to the surface (3.41.5) at
Zo. It therefore cuts through the surface and so intersects surfaces at
which ¢ has smaller (as well as larger) values than at 1%. Determine

(3.41.7) t1 = Xo — AU

as that point on the line through 2» in the direction of u at which ¢ takes


on its least value. Thus we minimize the function

(3.41.8) $1(A) = $(%o — Au)


of the single variable \. If the line in question should happen to pass
through the required point x, then we shall have solved our problem, but
in general this will not happen. However, 2: will be on a surface

(3.41.9) o(z) = $(x1)


134 PRINCIPLES OF NUMERICAL ANALYSIS

at which ¢(z) has a value smaller than it has at 2. The equation for
locating:x1 is obtained by equating to zero the derivative of ¢1(\):
(3.41.10) u'dz(%o — Au) = 0.

This is not satisfied by \ = 0 because of condition (3.41.6). Equation


(3.41.10) states that at the point 2: the line through 2 in the direction u
is tangent to the surface (3.41.9), and this is geometrically evident. At
z1, choose a new direction u, not tangent to the surface, and proceed
sequentially.
In this way is obtained a monotonically decreasing sequence $(z;)
that is bounded below by the minimum value ¢(r). The sequence there-
fore has a limit at some point z~. If ¢.(z.) = 0, then x, minimizes ¢,
and necessarily x. = x if 2 is close enough to x so that ¢ has a minimum
at one point only within the closed surface (3.41.5). But if ¢.(7.) ¥ 0,
then we can find a direction u not tangent to the surface through z.,
and so proceed farther. But then no limit would have been reached, and
hence the limit 7. = zx.
In the method of steepest descent, strictly so-called, one chooses
always u = ¢z. Since ¢, is the direction of most rapid variation of ¢,
this choice may be expected to lead to convergence after the fewest steps.
However, each step is fairly cumbersome. As with linear systems, it is
much simpler to take for each u one of the reference vectors e;, either in
rotation as in the Seidel process, or selected by some other criterion. It
is in keeping with the method of relaxation to examine the components of
¢z and select the largest. If this is the ith, then one takes u = e,, and
Eq. (8.41.10) reduces to
(3.41.11) $:(%0 — drei) = 0,
which amounts to solving the 7th equation for the 7th variable as a func-
tion of current estimates of the other variables.
Equation (3.41.11) in \ will ordinarily be nonlinear, and one of the
methods of successive approximation described above for an equation
in one variable will generally have to be applied. Newton’s method, for
example, gives

(3.41.12) v= :,(%0)/é,2;(2o)

as the first approximation to \. This will not necessarily reduce ¢:, to


zero, but it should reduce the magnitude of the vector ¢,. If so, it is
sufficient to take x = 1» — d’e; and look for the largest component of ¢,
at this point.
One should note carefully that the equations ¢;, = 0 being solved
explicitly by this method are not in general the same as Eqs. (8.41.1),
but they are satisfied by any solution x of (8.41.1). The new equations
NONLINEAR EQUATIONS AND SYSTEMS 135
may be more complicated than the original. If Eqs. (3.41.1) themselves
arise from a minimizing (or a maximizing) problem, then they can be
used as they are, without forming the function ¢ by (3.41.2).
3.42. Functional Iteration. Newton’s method generalizes directly to
systems of equations. Consider first the general functional iteration
inn variables. Let g(x) stand for the vector whose elements are +;(£1, £2,
..., &). Thus g is a function of the vector x. Suppose for some
constant vector a it is true that
(3.42.1) g(a) =a,
and consider the sequence defined by some x and
(3.42.2) C1 = g(a).

Under what circumstances will this sequence of vectors converge to the


vector a?
A sufficient condition for this can be given by means of a Lipschitz
condition. If, as in Chap. 2, we let b(v) represent the magnitude of the
numerically greatest element in the vector v, then the sequence (3.42.2)
converges to the vector a, provided that for some k < 1 and for some p
it is true that
(3.42.3) blg(z")e=— 9@’’)] < kb@’ — 2")
for every x’ and x” satisfying

b(t’ —a)<p, bi” —a) <p,


if also b(t — a) <p. The proof is made exactly as in the one-dimen-
sional case if one uses the maximum absolute value whenever the absolute
value was used before.
Again, if for some k < 1 the same condition (3.42.3) holds whenever
x’ and x”’ satisfy
b(x’ — ao) < p, On — “20)) < 9;
and if also
blg(xo) — to] < (1 — kp,
then we can conclude that the sequence (3.42.2) has a limit, which we may
call a, that
b(a ie Zo) < Pp,

and that a satisfies (3.42.1). Hence under these circumstances we are


assured that a solution exists.
Now consider the system of n equations in n variables

(3.42.4) d(x) = 0,
where x represents the vector of elements &; If each function ¢; is
analytic in the neighborhood of some point 2», then
136 PRINCIPLES OF NUMERICAL ANALYSIS

(3.42.5) $:(z) = os(to) + 2(E — £0,s)8G+(%0)/9E;


+ Wrrd(E; — £o.s)(Ee — £0,n)O°Gs(%0)/OE; OEL+ °
If Eqs. (9.424) have a solution x sufficiently close to xo, that is, for which
b(z — x) is sufficiently small, we might expect that by solving the
equations
O = $:(20) + Z(E15 — &o,3)0Gs(t0)/OE;
for the quantities £1; we would obtain an approximation to the & that
is better than the &,. If f is the vector whose elements are the ¢,, and
if we introduce the matrix
(3.42.6) fa(ao) = [8b+(%0)/0E:],
which is the Jacobian matrix of the functions ¢; evaluated at xo, these
equations have the form

(3.42.7) fe(xo)a1 = fe(0)%0 — f(%o),


and if f.(zo) is nonsingular, then they have the solution

(3.42.8) t= Xo — fz*(Xo)f(Xo),
which is the direct generalization of the iteration given by Newton’s
method.
The iteration does converge, and it is not necessary that derivatives
of all orders of the ¢; should exist. However if n is at all large, the
repeated evaluation of the inverse matrix fz! or the repeated solution of
linear systems of the type (8.42.7) is certainly undesirable. Conse-
quently a somewhat more general theorem will be proved.
Suppose all functions ¢; have continuous first partial derivatives
in the region of n space being considered, and suppose moreover that this
region is convex. If ¢ is any one of these functions, and ¢, is the row
vector of its first partial derivatives, then for any x’ and x’ in the region

d¢[x’ + O(2"’ — x’)\/dd = ¢,[x’ + A(x” — 2’)\(ax"’ — x’).


Hence

Ih zl’ + O(a” — 2')\(a"’ — x!)dd = o(2"’) — $(z’).


Written for all the functions, this identity becomes

(3.42.9) fle”) = fe’) + [Ifala’ + 02” — 2a" — aya,


For brevity write

(3.42.10) F(2!, 2”) = f* fa! + (x! — 2')\d0,


NONLINEAR EQUATIONS AND SYSTEMS 137
Then |
F(2’, x’) = faz’).
If F(&o, £0) is nonsingular, then F(z’, x’’) will remain nonsingular for 2’
and x” in a sufficiently small neighborhood of x. Define the vector
(3.42.11) g(x) = x% — fz (xo) f(z).
Then
g(%0) — oe = —fz1(x0)f(xo),
and
g(x”) — g(a’) = (a — 2) — fom) [f(e") — f@’)I
= (a — 2!) — fa (eo) F(a’, 2") (2"" — 2’)
= [I — F*(o, to) FQ’, 2'")\(a" — 2’).
For x’ = x’ = 2o, the matrix within the brackets vanishes. For p not
too large it will be true that for some k < 1
b[I — F- (xo, 20) F(x’, x’’)] < k/n,
when b(x’ — x) < p and b(x”’ — 2) < p. Then

blg(x’’) — g(x’)] < kb(z” — x’).


Also
blg(xo) — xo] < nb[fz1(xo)
Jolf(ao)].
Hence the iteration will converge if x» is close enough to the solution x
so that
nb[fz*(xo)]b[f(xo)] < (1 — k)p.
This shows that, if the functions ¢,(x) have continuous first partial
derivatives in the neighborhood of a solution z, and if the Jacobian
fz is nonsingular in the neighborhood of this solution, then the iteration
(3.42.11) converges whenever 2» is chosen sufficiently close to x. It is
true a fortiori that the Newtonian iteration defined by

(3.42.12) g(x) = x — fz1(x) f(a)


also converges.
Unfortunately, the practical question as to the convergence of the
computations modeled on this method is left undecided in general, and
round-off errors may make any degree of accuracy impossible. Even in
the case n = 1, the error in the calculation of 2,4: from x; depends upon
the errors present in the computation of f and of f,, and the accuracy
with which these can be calculated depends entirely upon the nature of
the functions. If the errors in the calculation of f and f, can be estimated,
then one can estimate the error in the quotient f/f, or in thes olution
-1f, This is the error in the correction x41 — %, and when it becomes
i188 PRINCIPLES OF NUMERICAL ANALYSIS

large by comparison with the computed correction, the further applica-


tions of the iteration are of no value.
3.5. Complex Roots and Methods of Factorization. Fora single vari-
able, Newton’s method and its generalizations apply equally to the
determination of real and of complex roots. Also by appropriate applica-
tion Graeffe’s method and Bernoulli’s method will yield complex roots.
However, they are somewhat inconvenient, and a number of special
methods have been devised for finding the real and imaginary parts of
complex solutions.
Any analytic function f(z) of a complex variable z = x + ty can always
be written in the form
(3.5.1) fle + ty) =u, y) + we, y),
where u and » are real functions of the real variables x and y._ In par-
ticular if f is a polynomial, one can write Taylor’s series

(3.5.2) f(x + ty) = f(x) + tyf'(z) — yf" (a) /2! — f(a) /BI+ ++: ,
whence
ule, y) = f(t) — yh @) 20 ys (s/s)
v(x, y) = ff’ (a) — yf" @)/31+ °° 4,
so that u and v/y are functions of x and y?.
Any solution z of

must be in the form z = x + iy, where x and y are real and satisfy

(3.5.4) u(z, y) = o(z, y) = 0.


These are real equations in the real variables x and y, and can be treated
by either of the methods described in §3.4.
In the case of a polynomial f, it is possible to eliminate y, obtaining a
single equation in x to be solved for real roots only. By substitution one
can obtain the equations in the associated y’s. For brevity write
U Qo + aoy? + ecm ony ==),

yo Ml
il @1 + Gay" + - * * Gen ay = 0:

Multiply the first equation by ai, the second by ao, subtract, and divide
through by y?. The resulting equation is of degree m — 1 in y?. If
Qom41 #0, multiply the first equation by den41, the second by dom, and
subtract. This gives a second equation of degree m — 1 in y?, and these
two can be treated as were the original two. Eventually there results an
equation of degree 0 in y®. If demy1 = 0, continue with the equation
resulting from the first elimination along with y-v = 0.
For applying Newton’s method or one of its generalizations to the
NONLINEAR EQUATIONS AND SYSTEMS 139
original equation, one can also separate ¢ into its real and imaginary
parts, writing

(3.5.5) o(@ + ty) = ¥(z, y) + tw(z, y).


By a slight modification of the sequence 2;,: = $(z;), one can write

Tin = W(%i, Ys), Yara = (Tir, Y:).


When f(z) is.a real polynomial, all complex roots occur in conjugate
pairs, and the roots of a conjugate pair satisfy a real quadratic equation.
Hence a real polynomial f(z) can be completely factored into real quad-
ratic and linear factors. If the coefficients of a real quadratic factor of
f(z) are known, it is then a simple matter to find its zeros, whether they
may be real or complex.
Let
(3.5.6) d(z) = 22+ az+ 0.
If z; satisfies
2? = —az; — b,
2.e., if z; is a zero of d(z), then z; satisfies also
z3 = —a(—az; — b) — bz;
(a? — b)z; + ab,
and in general any positive integral power of 2; is expressible as a linear
function of z; whose coefficients are polynomials in a and b. Hence also
f(z) is so expressible:
F(zi) = rila, b)z; + ro(a, 6).
If d(z) is not a perfect square, it has a zero z, ~ 2; and
f(z.) = ri(a, b)e + ro(a, 6).
If z; and z are zeros also of f(z) (whether complex or not), then

F(@) = fe) = 90,


and for z; and z, distinct this implies that

(3.5.7) ri(a, b) = ro(a, b) = 0.

Here are two equations in the two unknowns a and b which must be
satisfied by the coefficients of a quadratic factor d(z) of f(z).
The polynomials ro and 7; in a and b could be determined equally
well in a slightly different manner. Dropping the subscript on the z,
if z is any zero of d(z), we can say that

zg = —ag*-! — bg.
140 PRINCIPLES OF NUMERICAL ANALYSIS
Hence if f(z) is of degree n, this substitution reduces it to a polynomial
of degree n — 1 in the zeros of d(z), and the coefficients of the poly-
nomial are themselves polynomials in a and b. Likewise
gr-l —— azg™-2 = (1a

and by continuing sequentially in this way, f(z) is again reduced to the


form
(3.5.8) f(z) = ri(a, b)z + ro(a, b),
valid for any z satisfying d(z) = 0. But now one can see easily that
this is precisely the process for forming the remainder after division of
f(z) by d(z). Hence for all z it is true that
(3.5.9) f(z) = (2 + az + b)Qi(z) + ari(a, b) + ro(a, b),
where Q;(z) is the quotient. Hence if a and 6 satisfy (3.5.7), the division
is exact. Hence the conditions (3.5.7) are sufficient, as well as neces-
sary, for d(z) to divide f(z).
For the solution of Eqs. (3.5.7) Hitchcock gives an iteration which is,
in fact, an application of Newton’s method. If one now divides Q;(z)
by d(z), then f(z) can be written
(3.5.10) f(z) = (2? + az + b)?Q(z) + (2? + az + b)q(z;
4, b) + r(z; a, b),
where

(3.5.11) q(z; a, b) = zqi(a, b) + qo(a, b),


r(z; a, b) = eri(a, b) + ro(a, b),
and q(z; a, b) is the remainder after the second division.
Let z represent either zero of d(z). Then from differentiating (3.5.10)
with respect to a and to }, since f is itself independent of a and b, it
follows that
= zq + dr/da,
0 = q+ dr/ab.
In detail these equations are
2°qi + 2g + z20r:/da + Oro/da = 0,
241 + go + 20r1/0b + dro/db = 0.
These equations must hold for any zero z of d(z). Hence the first can
be written
2(dri/da + go — agq:) + (dro/da — bqi) = 0,
and the second
2(6r,/db + gx) + (8ro/db + qo) = 0.
If d(z) is not a perfect square, then the coefficients of z and the terms free
of z must vanish separately. Hence
NONLINEAR EQUATIONS AND SYSTEMS 141
12° ; 6r1/da = agi — Qo dro/da = bg
3.5.12 2 7,
( ) Or;/db — 91, Oro/db sion Os

These are the partial derivatives required for the application of Newton’s
method to the solution of (3.5.7), and they are obtained by two divisions
of f(z) by (2? + az + b).
Hence if a, and b. represent approximations to the coefficients a and b
of an exact division d(z), then improved approximations Ge+1, bay1 Can
be obtained by solving

(QeQ1 — Qo) (Gari — Qa) — qi(bar1 — be) = —r1


(3.5.13 ) bagi(Ga+1 — Ga) — qo(ber1+ — ba)
a
= —To,

where Qo, 41, To, 71 are the quantities obtained after division of f(z) twice
by 2? + az + be. When the process is carried out in this way, the
general forms of the polynomials ry and r; in a and b are not obtained.
Instead their numerical values and those of their partial derivatives are
obtained by the divisions which use the current numerical approximations
aq and be.
This method can be generalized to yield a factor of arbitrary degree.
If one writes down formally a factorization of f(z) into factors with
unknown coefficients, then by expressing that f(z) is to equal identically
the product of these factors one obtains a set of equations relating the
unknown coefficients. Let the unknown coefficients be represented as
Qi, G2, . . . , @y taken in any order, and let the conditions be written

W=w=--- =w=9,

where each y is a polynomial in the a’s. If with each y; one can associ-
ate an a; in such a way that y; = 0 is easily solved for a; as a function
of the other a’s, one can use this fact to define formally an iterative
scheme for evaluating the coefficients in the factorization, and many
different such schemes have been proposed. Generally, however, the
question of convergence is left open.
3.6. Bibliographic Notes. The author is indebted to Professor
Schwerdtfeger for numerous references on iterative methods for both
linear and nonlinear equations, as well as for a copy of some lecture notes.
And at this point reference may be made to Blaskett and Schwerdtfeger
(1945) on the Schréder iterations.
Konig (1884) published the theorem that is now classic in i the theory
of functions. Hadamard (1892) elaborated this and related notions, with
reference to the location and characterization of singular points of analytic
functions, and made application to the evaluation of zeros. A more
recent discussion is that of Golomb (1943). Aitken (1926, 1931, 1936-
1937b) discusses extensively the use of Bernoulli’s method and the &
142 PRINCIPLES OF NUMERICAL ANALYSIS

process. Whittaker and Robinson (1940, and other editions) discuss


Bernoulli’s method and Whittaker’s method.
The general convergence theorem for functional iteration is given by
Hildebrandt and Graves (1927; see also Graves, 1946). Numerous
special methods for solving equations of all types, like Picard’s method
for differential equations, are methods of functional iteration. Hamilton
(1946) gives a number of special convergence theorems and (1950) gives
an analytic derivation of the algorithm derived geometrically by Rich-
mond (1944) and previously obtained by Critchfield and Beck (1935).
Schréder (1870), Runge (1885), and others had given similar algorithms
for algebraic equations.
If one defines the functions ¢; by 7; = oes = ¢;(x;_;), these func-
tions are of a class called ‘‘permutable” or “‘commutative,’’ on which
there is an extensive literature. If x is aeal or complex number, or
a point in a general space, and satisfies c = ¢(x), then x is said to be a
“fixed point’ of the transformation ¢, and comes into consideration in
the topological literature.
Collatz (1950), Wenzl (1952), and others have described iterations
converging to a power of a root. On questions relating to errors and
rates of convergence, see Ostrowski (1936, 1937, 1938) and Bodewig
(1949).
The iteration for quadratic factors of a polynomial was given by
Hitchcock (1938, 1939, 1944); generalizations are given by Lin (1941,
1943), Luke and Ufford (1951), Friedman (1949), where the factors are of
arbitrary degree. There are many ways by which to define an iteration,
but the treatment of convergence becomes more difficult when both
factors are higher than quadratic.
As it applies to algebraic equations, Graeffe’s method is frequently
treated both in the textbooks and in the periodicals, and several references
are given in the bibliography. The differential technique was given by
Brodetsky and Smeal (1924) and was applied to transcendental equations
by Lehmer (1945). For an elegant treatment of convergence in the
general case see Polya (1915), and for a treatment of error see Ostrowski
(1940).
Most of the basic principles in the theory of equations cited here
are to be found in standard textbooks, but for properties of the Vander-
monde determinants and identities relating the several types of sym-
metric functions see Muir’s “Theory.”
CHAPTER 4

THE PROPER VALUES AND VECTORS OF A MATRIX

4. The Proper Values and Vectors of a Matrix


The characteristic function of a matrix A was defined in Chap. 2 as the
polynomial

(40:1) $A) = [4 — All = (—1)"(* + adr + + + + a)


obtained by expanding the determinant of the matrix A — J. The
characteristic equation is the equation ¢ = 0, and its roots are the proper
values of the matrix. For any proper value i, the matrix A — XI is
singular, whence the equation
(4.0.2) Ax =
has at least one nontrivial solution x, and any solution is a proper vector
associated with the proper value A. Of fundamental importance to the
study of proper values and vectors is the Cayley-Hamilton theorem,
which states that
(4.0.3) ¢(A) =0
identically. In words, any matrix satisfies its own characteristic equa-
tion. In special cases a matrix A may satisfy an equation of lower degree,
say
(4.0.4) Wr)
= 0.
Equation (4.0.4) of the lowest degree satisfied by A is called the minimal
equation, and the polynomial y(A), the minimal function. The minimal
function (A) divides ¢(A), and on the other hand every proper value
is a root of the minimal equation.
To show that ¥(A) divides (A), let

(A) = gH) + 7),


where g(d) and r(A) are polynomials, and r(\) is of lower degree than
v(A). Since ¥(A) = 0, and ¢(A) = 0, it follows that r(A) = 0, and this
can be true only if r(A) = 0. The same argument can be used to show
that (A) divides any polynomial w(d) for which w(A) = 0.
143
144 _ PRINCIPLES OF NUMERICAL ANALYSIS

Before showing that all proper values satisfy (4.0.4), we show that, if
h(A) is the highest common divisor of the elements of adj (A — AJ), then _

(4.0.5) o(A) = h(A)¥A).


We know that
(4.0.6) (A — XI) adj (A — AJ) = GDL.
If P(A) is defined by
adj (A — AI) = ACA)PA),
then P(A) is a matrix whose elements are polynomials in A, and these
polynomials in \ have no nonconstant common divisor. Now (4.0.6)
becomes
hQA)(A — ANP) = g(A)L.
But A — XJ and P(A) are matrices whose elements are polynomials in X.
Hence
(4.0.7) (A — DP) = mI,
where m(A) = $(A)/hA(A) is a polynomial.
When A replaces > in (4.0.7), the left member vanishes. Hence
m(A) = 0, and therefore ¥(A) divides m(A). It remains to prove that
m(A) divides ¥(A); We can find a polynomial matrix Q(A) and a constant
matrix Qo such that
WAT = (A — AINQA) + Q
identically by expanding and equating coefficients of like powers of X.
But since ¥(A) = 0, it follows, on replacing \ by A, that Qo = 0. Hence

VAL = (A — ATQQA).
Since (A) divides m(A), we can write

mr) = kA)v),
whence by (4.0.7)

m(A)I = kA)PA)I = kA)(A — ANQA).


By comparing this with (4.0.7) we conclude that

KA)Q(A) = PQ).
Hence every element of P(A) is divisible by the polynomial k(X). Hence
k(A) is a constant, and therefore m(A) and ¥(A) differ at most by a con-
stant multiplier. If we now take the determinants of the two sides of
(4.0.7), we have ss
G(A)|PA)| = ma},
whence every linear factor of (A) must be also a factor of ¥(A).
THE PROPER VALUES AND VECTORS OF A MATRIX 145
If x is any vector, then since ¥(A) = 0, it is certainly true that
W(A)z = 0. ;
For a given vector z ~ 0 let h(\) be a polynomial for which it is also
true that h(A)x = 0. Then if d(d) is the highest common divisor of
h(A) and (A), it is true that d(A)z = 0. In fact, one can find poly-
nomials p(A) and g(A) such that

DAWA) + gA)h() = da),


whence
p(AWA) + q(A)h(A) = d(A),
and the conclusion follows on multiplying this identity into x. There is
therefore a polynomial of lowest degree h(A) for which h(A)z = 0. This
must divide (A), since otherwise the highest common divisor d(A) is of
still lower degree than h(A), contrary to the hypothesis.
If \; is any proper value, then

Wild) = ¥A)/( — Ai)


is a polynomial, since, as was shown above, every proper value satisfies
y =0. Since y is the polynomial of lowest degree for which ¥(A) = 0,
it follows that y,(A) ~ 0, whence for some z it is true that y,;(A)x # 0.
But
¥(A) = (A — A)y¥i(A),
(A — Al)¥i(A)e = 0,
and therefore y,(A)z is a proper vector corresponding to the proper value
»; That is to say, any non-null linear combination of the columns of
y,(A) is a proper vector.
Let \; be a v-fold root of y = 0, and now let

(4.0.8) Wild) = WA)/(A — A)”.


Then (A — d;)”% and y;(A) are relatively prime, whence for some poly-
nomials p(A) and g(a)

P(A)(A — As)”* + GAYA) = 1,


identically. Hence

p(A)(A — rl) + (AWA) =I


Let y ¥ 0 be a proper vector corresponding to \;. Then

(4.0.9) Vi(A)g(A)y = y.
Hence y is a linear combination of the columns of ¥;(A). If z is any
146 PRINCIPLES OF NUMERICAL ANALYSIS
linear combination of these columns, then the last nonvanishing vector
in the sequence
20 = 2,
a= (A “7 rL)zZ0,
(4.0.10)
22 >= (A = rD)a:,
«0 Beet Rey, piel Esite “e) Fes) sea

is a proper vector associated with ),, and all vectors of the sequence are
principal vectors.
Among the schemes for finding the proper values of a matrix, some lead
directly to the characteristic function ¢, to the minimal function y, or to
some divisor w(A) of the minimal function. When this function is
equated to zero, the resulting equation is then to be solved by any con-
venient method. The scheme for finding the polynomial ¢, ¥, or w, as
the case may be, may or may not have associated with it a scheme for
finding the proper vectors. If the scheme provides only some w, and
not necessarily y or ¢, it may be necessary to reapply the scheme in order
to obtain the remaining proper values.
Other schemes are iterative in character, depending upon the repeated
multiplication of a matrix by a vector. A scheme of this type ordinarily
leads to a sequence of vectors having a proper vector as its limit and to a
sequence of scalars whose limit is the associated proper value. Before
describing these methods in detail, we shall introduce a few further
preliminaries.
4.01. Bounds for the Proper Values of a Matrix. Since a nonsymmetric
matrix may have complex proper values, and hence complex proper
vectors, it is necessary to give further consideration to complex matrices. -
The natural generalization of a symmetric real matrix is a Hermitian
complex matrix. The matrix A is Hermitian in case it is equal to its own
conjugate transpose, 7.e., to the matrix obtained when every element is
replaced by its complex conjugate, and the resulting matrix is then trans-
posed. Let a bar represent the conjugate (as is customary), and an
asterisk represent the conjugate transpose. Then the matrix A is
Hermitian in case

(4.01.1) A* = AT= A,
If. A is Hermitian, and x is any vector, real or. complex, then x*Az is a
real number. For if we take its complex conjugate, we have £*Az; but
this is a scalar and is equal to its own transpose Z7A'Z*T = *A *z**,
However x** = x, and the theorem is proved. Hence we can define a
positive definite Hermitian matrix as a Hermitian matrix for which
az*Az > 0 whenever x + 0, and a non-negative semidefinite matrix as
one for which z*Az > 0 for every x. Only a singular matrix can be
THE PROPER VALUES AND VECTORS OF A MATRIX 147
' semidefinite without being definite. Clearly a Hermitian matrix all of
whose elements are real is a symmetric matrix. ?
Analogous to a real orthogonal matrix, 7.e., a matrix V such that
V'V =I = VV", is a unitary matrix U, which is one such that
(4.01.2) UFUral
= UUs.
A unitary matrix with real elements is orthogonal.
The proper values of a Hermitian matrix are all real, since if Ax = Xa,
then x*Az = dx*z, and both x*Az and «*z are real numbers. Also, if
complex vectors x and y are said to be orthogonal when z*y = y*xz = 0,
then proper vectors associated with distinct proper values of a Hermitian
matrix are orthogonal. For if

Az = dz, Ay = py,
then
y*Ax = hy*x, etAy = party.
But
y*Az = x*Ay, y*z = x*y,
whence if \ ¥ y, this implies that x*y = 0.
If A is Hermitian, there exists a unitary matrix U such that

(4.01.3) eA= N
where A is a diagonal matrix whose elements are the proper values of A,
and where the columns of U are the proper vectors of A. This corre-
sponds to the case of the real symmetric matrix, and the argument can
be made by paraphrasing that given in the real case.
If A is any matrix, Hermitian or not, any scalar of the form «*Aa/x*x
for z ~ 0 is said to lie in the field of values of A. Any proper value of A
lies in its field of values. For if Ax = da, then x*Ax = dAx*az.
If A is any matrix, then A*A is Hermitian and semidefinite; it is
also positive definite if A is nonsingular, for then Az ~ 0 whenever x ¥ 0,
and hence z*A*Axz = (Ax)*(Az) > 0. If the proper values of A*A are
p? > p? > + * + > p2 > 0, and d is any proper value of A, then
(4.01.4) pi = Wj pi}.
For if Ax = Az, then z*A* = dXx*, and hence
gtA* Ax = \\r*z.
Hence XA is in the field of values of A*A. But if a is any number in
the field of values of A*A, then for some a with a*a = 1 it is true that
a*A*Aa =a. If
U*A*AU = P?,
148 PRINCIPLES OF NUMERICAL ANALYSIS

where P? is the diagonal matrix whose elements are the p?, and if a = Ub,
then
a = b*U*A*AUb = b*P% = ZB6ip?,
all products ,8; are real and positive, and hence a is a weighted mean of
the p?. Hence a cannot exceed the greatest nor be exceeded by the least
of the p?:
pi 2 a ] pi.
Since XA is such an a, the relation (4.01.4) now follows.
If \ is a proper value of A with multiplicity », then \ + pis a proper
value of A + pl with multiplicity ». For this reason, the following
classical theorem can provide information as to the limits of the proper
values of a matrix: If for every 7

(4.01.5) lal > ) leu,


7

then the matrix A is nonsingular.


If the matrix were singular, then the equation Az = 0 would have a
nontrivial solution. Among the elements of x, let &; be an element of
greatest modulus, |é| > |é| for allj. Then

| ati |S » las lE] < LEd » lass] < |Ellossl.


pri iHi jAt
But
> asti = 0, ast = — > aijk;,
iwi
whence

laslléd =| Y)ati.
jt

Hence we have a contradiction, and the theorem is proved.


In some cases the theorem remains valid even when some, but not all,
of the inequalities in (4.01.5) become equalities. Suppose, for example,
that

(4.01.6) lanl >) lawl, Zlad > ),lessl 2S ay


j=2 ‘)

Then if & has the same significance as before, there is at least one & for
which |Ex| < \é|. For

lanllél < ) lensll&t


THE PROPER VALUES AND VECTORS OF A MATRIX 149
whence on applying the hypothesis it is clear that, for some j, || > |£|,
and the #’s are therefore not all equal in modulus. Now

lacdléel < Y lasllél


Ai
but this is inconsistent with (4.01.6) unless a, = 0 for every k such that
|&| < |&|. If this is so, and if there are v values of j such that |£;| = |&|,
then there are n — v values of k for which aj = 0. But also by the
same argument a, = 0 for each such k and every j for which |é,| = |&.
By performing a suitable permutation on the rows of A and the same
permutation on the columns, we can assume that

pein
— 2 +2) 6 a gt,

The matrix is then in the form

(4.01.7) A= st fp
(PO¢),
where P and R are square matrices. Hence if the matrix is not one which
can be given the form (4.01.7) by any permutation of rows, accompanied
by the same permutation of columns, then the conditions

(4.01.8) 2lais| > >:|avss|


Z

with a proper inequality for at least one value of 7 are sufficient to ensure
the nonsingularity of the matrix.
Obviously the above argument can be applied to A’.
Now let

(4.01.9) P; = » levssl, Q: = > |axje|.


iwi iri

If we apply the above results to the matrix A — XJ, it is clear that, if \ is


a proper value, A — AI becoming singular, then either

(4.01.10) [IA — au| = P;

for every 7, or else for some 7 it is true that |A — o| <P; In either


event it is true that the proper value lies within or on the boundary of at
least one of the n circles in the complex plane defined by (4.01.10). On
applying the same argument to A‘, we conclude that every proper value
lies within or on the boundary of at least one of the n circles defined by
(4.01.11) [A — ai] = Qi.
150 PRINCIPLES OF NUMERICAL ANALYSIS
4.1. Iterative Methods. These methods provide for the direct manipu-
lation of the matrix itself or of some matrix simply related to it, without
necessitating the explicit development of the characteristic or other
polynomial. We begin with the relatively simple case of a Hermitian
matrix.
4.11. The Matrix A Is Hermitian. If the proper vectors u; of a
Hermitian matrix were all known, these could be normalized to unit
length u*u; = 1, and they would form the columns of the unitary matrix
U such that
(4.11.1) U*AU =A, U*U =I,
where A is the diagonal matrix of proper values. We may assume the
u; to be so ordered that

(4.11.2) MA SP SA

If x is any vector and y satisfies

x = Uy, y = U*x,
then
ete = y*U*Uy = y*y:

Hence the field of values of A and that of A are identical, and if a is in


this field of values, then
MS aS Aa.

If p(r) is any polynomial in ), the matrix p(A) has the same proper
vectors as A itself, and its proper values are p(\;). In particular, the
matrix A? is necessarily non-negative, definite or semidefinite. More-
over, for » sufficiently large, A + ul is positive definite. It is therefore —
no essential restriction to assume, when convenient, that A is positive
definite or is at least semidefinite.
‘In §2.06 the trace tr (A), which is the sum of the diagonal elements,
was shown to be equal to the coefficient of \”-! in the characteristic equa-
an except possibly for the sign, and to be equal to the sum of the proper
values,
tr (A) = 2d;

Make generally, if p(A) is any polynomial, then from (4.11.1) it follows


that ,
(4.11.3) tr [p(A)] = Zp(r).
Moreover, if v is any integer, and A is nonsingular,
(4.11.4) tr (A-’) = Ddz”.
THE PROPER VALUES AND VECTORS OF A MATRIX 151
The norm N(A) of a real matrix A, symmetric or not, was defined to be
the square root of the sum of the squares of the elements. Equivalently,
this is the square root of the trace of ATA. For a complex matrix,
Hermitian or not, it can be defined by
(4.11.5) N?(A) = tr (A*A).
As so defined, tr (A*A) is a positive real number, and for N(A) the posi-
tive root is to be taken. Hence if A is Hermitian,
N2(A) = tr (A2) = tr (A2) = 2D?
Now N?(A) is the sum of the squares of the moduli of all elements of A.
Obviously this is equal to the sum of the squares of the diagonal elements
of A if and only if A is a diagonal matrix. Hence

Zaz, < N*(A),


and the equality holds only when A is diagonal. Now the unitary trans-
form V*AV of A, where V is any unitary matrix, has the same norm as
does A, whereas in general the sum of the squares of the diagonal ele-
ments is not the same. Hence among all unitary transforms of a Her-
mitian matrix A, the transform (4.11.1) maximizes the sum of the squares
of the diagonal elements or minimizes the sum of the squares of the
moduli of the off-diagonal elements.
4.111. The largest and smallest proper values. If A is non-negative
semidefinite, then for vy> 0
NS tr (A’) = DAES nay,
(4.111.1) Ares [tr (A”) >)” S Aun”.
But as v increases, n”” > 1. Hence
(4.111.2) (n-) tA") < Avs [tr (A),
and
(4.111.3) lim [n-1 tr (A”)]”" = lim [tr (A)]}¥" = A.
The two sequences obtained from the successive powers A’ of A approach
X, from above and from below. The most effective application of this
algorithm is made by successively squaring the matrix A and forming
mA BAS TAS, oc aay
with v taking on the values 2”. The degree of convergence is measured
by the ratio n””:1 or by n?””.
If « = Uy is any vector, then
(4.111.4) Ave = UAPU*x = Udy = Lud2n:.
152 PRINCIPLES OF NUMERICAL ANALYSIS
If 41 >; > O fori = 2, 3, ..., ”, or if 41 > Az and A; > |A,|, then
since. ;
Ave = Mum + (A2/A1)"Uene Foc + (An/A1) "Untnl,

as v increases all terms but the first within the brackets approach zero,
and in the limit,
(4.111.5) A’r > No,
provided only 7:0. That is to say, as v increases, the vector A’
approaches a vector in the direction of the first proper vector uw. It is
necessary only to normalize to obtain w; itself.
If 71 =0 4m, the same argument shows that A’x approaches a
vector in the direction of u2, provided Az exceeds \3, . . . , An nUMerically.
To square a matrix of order n requires n* multiplications, whereas to
multiply a matrix by a vector requires only n*. If
(4.111.6) oe = A’g = Age,

then for large v it appears from (4.111.5) that 2,41 = diz, Hence,
although if » = 2?, » increases rapidly with p, it may be more advan-
tageous to form the sequence z, directly as in (4.111.6) than to square
the matrix several times and then multiply by z. Moreover, a blunder
made in computing z, will be corrected in the course of subsequent
multiplications by A. The two methods are related essentially as are
Graeffe’s and Bernoulli’s methods for solving ordinary equations. It
might be pointed out further that, if by a rare chance it should happen
that 7: = 0, nevertheless round-off will introduce a component along
uy in the x,, and this component will build up eventually, though perhaps
slowly.
Let
(4,111.7) Gy = try, = YY y->
Then a, is independent of v, and in particular

Ap = c*APy = y*APy = UDPnii.


Hence, if m ¥ 0,
Oppr1 = BP gi S ADA nis = rap,
(4.111.8) Op+1/Ap <M,
and
(4.111.9) On41/Op — V1.
Thus ap4:1/a, approaches ), from below.
Since in the limit z,,1 = dux,, the ratio of any element of 2,41 to the
corresponding element of x, provides also an estimate of 1, when »y is
THE PROPER VALUES AND VECTORS OF A MATRIX 153
sufficiently large. The agreement among these n ratios is an indication
of the nearness to the limit.
As v increases, the elements of x, and of A” become large if \; > 1, or
small if \1 < 1. Hence in actual computation it is convenient to modify
the sequence of x, or the sequence of A” by introducing factors x, to hold
the quantities within range. Thus one could compute the series
ee KAx,;

where each x, may be selected according to convenience. For succes-


sively squaring the matrix, Bargmann, Montgomery, and von Neumann
propose the sequence
Bo pA/tr (A),
Boy pB;/tr (Bo),
where p is a scalar slightly less than unity. For p = 1, all quantities
would be not greater than unity in principle, though this might fail as a
result of round-off. By a suitable choice of p, one can be sure of staying
within the range from —1 to +1 even with round-off.
If w > 1, then A’ = pl — A is Hermitian, positive definite, and has
the greatest proper value \, = u — Ax». Hence no special discussion is
needed for finding the smallest proper value.
4.112. Accelerating convergence. Of the schemes for accelerating con-
vergence, the simplest is Aitken’s 5? process, described in Chap. 3. This
can be applied to the sequence a,p41/a, for finding \1, to the sequence of
ratios of corresponding elements of z,,1 and x,, and to the sequence x, for
the proper vector.
In the sequence z,, the rapidity of the approach to the limit depends
upon the smallness of all ratios \;/Ai fori > 1. Any matrix A’ which
reduces the greatest of these ratios will provide more rapid convergence.
One of the simplest choices might be A? or A*. This would mean taking
one or two matrix products, requiring n’ multiplications each, and follow-
ing this with a sequence of products of a matrix by a vector, requiring n?
multiplications each. Having obtained, say, the vector x2, one could
then obtain Az2, = 22,4; to find A; without a root extraction.
Another possibility is to replace A by
de
a Oa i
and hence the proper values ); by

N; — MG aim By,

where p is judiciously selected.


Assuming, as always, the ordering (4.11.2), and for present purposes
that the strict inequality 41 > 2 holds, the best choice of » will be that
154 PRINCIPLES OF NUMERICAL ANALYSIS

for which the greatest of the ratios |j/dj|, 7 > 1, is least. But the
greatest, of these is either |\3/)j| or |A,/4| or both. The optimal p is
therefore that for which these ratios are equal, since a different selection
of » would increase one of these ratios even though it decreases the other.
Hence the optimal yu is
w= —(Az + dx) /2.
To make the strictly optimal choice, one must know 2 and X,, exactly, but
enough information may be at hand to permit a good choice.
The iteration of the linear polynomial A + yl and of the very special
quadratic polynomial A? in place of A is sometimes advantageous. The
question arises then whether A*, or even an A?-+ ul, for some yu is
necessarily the best quadratic polynomial, and more generally what is
the best polynomial of any given degree. It turns out that the best
polynomial is given by the Chebyshev polynomial of the prescribed
degree (cf. §5.12).
If a’ and a” are known such that \1 > ’ > Ae > + +s Sa SN”,
then it is no restriction to suppose that \’ = —\” = 1. Forif this is not
the case, one can replace A by A’ = (X’ — N”’)-2A — (X’ + N’’) I].
Hence assume this to have been done, and assume further that a 4 is
known such that
hi SoS Pak, Ss. eee SE
(4.112.1)

Then let
(4,112.2) T'n(X) = cos [m arc cos jl],
Sm(X) = T'n(d)/Tn(8).
Then S,,(5) = 1, and S,,(A1) => 1. On the other hand, for any 2 satisfy-
ing 1 >> —1 and in particular for \ = yx, 7 > 1, it is true that
|Sm(X)| <1. Indeed, the argument developed in §5.12 can be modified
to show that, of all polynomials q(A) of degree m satisfying 9(6) = 1,
Sm(A) is that polynomial whose maximal absolute value on the interval
from —1 to +1 is least. Hence among all polynomials of degree m
that might be used for the iteration, S,,(A) is the best choice that can
be made on the basis of the information contained in the hypothesis. In
other words, in the sequence

uy > f, T41 = Sn(A)z,,


the components along w2, . . . , Un damp out as rapidly as one can make
them by applying a polynomial matrix in A of degreem. Asa final check,
and to nullify the effect of any accumulated round-off, one may take any
x“, a8 Zo and apply the matrix A itself one or more times.
_ The notion of using a Chebyshev polynomial may be carried a step
further. The vector 2’, is the result of » applications of the polynomial
THE PROPER VALUES AND VECTORS OF A MATRIX 155
S,(A) of degree m and hence is the result of applying a polynomial
[Sm(A)]” of degree mv. Clearly the direct application of the polynomial
Sm, would give better results.
Hence, let us return to the original sequence

Vo, V1, V2, . . . y UM.

This is the sequence


Aes Ake) ere ied ve)

Instead of accepting 2, itself, for some », as giving the best approximation


for the direction of wi, one could ask for a linear combination of all »y + 1
vectors that will give as good an approximation as possible. With the
same hypothesis (4.112.1), the best available linear combination is
,
x = ooo + o1%1 + i fae + OXy,
where
SA) = oo foA+ +++ +4,)’.

Again all elements of Az’ should be in the approximate ratio \, to cor-


responding elements of x’, which fact serves both as a check and as a
means of obtaining \; directly without the extraction of a root.
4.113. Intermediate proper values. Already attention has been drawn
to the resemblance between the iterative methods for finding proper
values and the methods of Graeffe and of Bernoulli for solving equations.
Indeed, Aitken bases his discussion of the method of Bernoulli for alge-
braic equations upon the iteration of a matrix for which the given equa-
tion is the characteristic equation. In principle both methods, Graeffe’s
and Bernoulli’s, yield all the roots of an algebraic equation.
In §4.21 a particular direct method will be described for finding the
coefficients of the characteristic equation. That method does not itself
yield the roots of this equation. However, if the method is applied to
A” for any v, then one obtains the equation whose roots are thé J.
Hence if » is sufficiently large, the coefficients of the equation are approxi-
mately equal to dz, AZAZ, AZAZAZ, . . . , if the \’s are numbered in order
of decreasing magnitude. If the matrix is Hermitian, then all roots are
real. For the treatment of multiple roots, reference is to be made to the
discussion of Graeffe’s method in Chap. 3. Unfortunately this method
does not yield the associated proper vectors.
For obtaining proper vectors as well as proper values, suppose A; > i441.
Take any uw such that 0 <A; —»<u— Aus. Then the numerically
smallest proper value of A — ul isd; — uw. Also (A; — pw)? is the smallest
proper value of the positive definite matrix (A — uJ)’. Hence the
problem of evaluating the proper value ); and its associated proper vector
is reduced to that of evaluating the smallest proper value of the positive
156 PRINCIPLES OF NUMERICAL ANALYSIS

definite matrix (A — »J)?, and this in turn can be reduced to that of


finding the largest proper value of a related matrix, as described in §4.112.
If \; — » > uw— Ni > O, then (Az — w)? is oe smallest proper value
of (A — pl)?.
If » is any number on the interval from A to A, but uncientie far
from either A; or An, then (A — uJ)? will have (A; — »)? as its smallest
proper value for some7 = 2, ...,n-— 1. Hence any proper value and
vector can be obtained independently of all others and so that no errors
present in one will affect the others.
A more common procedure for obtaining the intermediate proper values
is to obtain them in sequence, making each depend upon the larger values
already found and the vectors associated with them. The vectors u;
are mutually orthogonal and of unit length, ie means in the complex
case that
ur Uj = 653.

The case of a real symmetric matrix A with real proper vectors u; is a


special case. If the trial vector x has no component in the direction of
uy, and if X_ > |A,| for? = 3, 4, . . . , n, then x, approaches we in direc-
tion, and ap41/ay, as well as the ratio of any element of 2,41 to the cor-
responding element of x,, approaches A». But if wu; is known, and 2’ is
any vector, then
(4.113.1) x= a! — uyxe!
is orthogonal to u:. In fact, this is merely the vector which remains
when the orthogonal projection of x’ upon the unit vector wu; is subtracted
from x’. Hence when wu; has been found, a new sequence 2, can be
determined beginning with a vector x orthogonal to u;. All vectors z,
will be orthogonal to u1, except when round-off introduces spurious com-
ponents along u, but these can be removed from time to time by apply-
ing (4.113.1). Hence Az and we can be obtained from this sequence of
vectors, just as were )\; and 1; before.
In general, for evaluating any ); and u,, where d; exceeds in absolute
value every proper value not already found, one begins with a vector x
from which has been subtracted the orthogonal projection upon the sub-
space determined by those proper vectors already found.
Another method for finding the intermediate proper values and their
associated vectors is to replace the matrix A by another having z, ds,
. « « y An, but not Ax, as proper values. The matrix
Ai =A— Awmyut

is Hermitian, and if 7 ¥ 1, then


Ayu; = Au; — Aut u; = Au; = AU,
THE PROPER VALUES AND VECTORS OF A MATRIX 157

whereas
Ay = A — Niu = 0.

' Hence A; has the proper values 0, associated with the proper vector w1,
and ),, 7 > 1, associated with the proper vector u;. Thus A; satisfies the
requirements. It is useful to note that
Aj = A? — deuwt,
and hence inductively
(4.113.2) AY = A” — Nuzut.
Thus if the powers of A have been formed in the process of arriving at \1
and wu, to form the same powers of A; it is necessary only to subtract a
scalar multiple of the-singular matrix wiu*.
When dz and we are found, one can form
Ao => Ai = Noes

and proceed to find \; and ws.


In this method the matrices A, As, . . . are all of order n, with 0
as a multiple proper value replacing each of the proper values already
found. It is possible, however, to reduce the order of the matrices.
Thus when ); and w; are found, let uw, be any unit vector orthogonal to 1;
uz any unit vector orthogonal to both uw: and uj; ... ; and finally uj,
one of the two unit vectors orthogonal to ui, uy, ... , Wr}. Then the
matrix
Ui - (uw, Ur, OSE » Un)

is unitary, and one verifies that


* hi Ai 0 )
(4.113.3) . UiIAU? = a Ai)’

where now A is of order nm — 1 and Hermitian. The matrix UfAU; has


the same proper values as A; hence the proper values of A are 2, As,
. . An. Let v be any proper vector of Ai:
Ay = mW.

urans(?)=(6' 4)@)-G)-»@)
Then

av.(7) =»)
Hence

and U, i) is a proper vector of A. Hence if the largest proper value of


v
A, and its associated proper vector are found, the corresponding proper
158 PRINCIPLES OF NUMERICAL ANALYSIS

vector of A can be found directly. The next step is to replace Ai by a


matrix of the form
re 0 )
(? As}
where A>» is of order n — 2.
A simple construction of the matrix U; is given by Feller and Forsythe.
Write wu in the form

(4,113.4) Ww = (2)
where w is a vector of n — 1 elements andwascalar. Then it is possible
to choose u so that

(4,113.5) n=(°w IRites


— pww*
)
is unitary. For this we must have

_ ed hp ee o w*
pe Tg ( I- pee) Eh I - oa)
and hence by direct calculation
(4.113.6) w= (1 — o)/(1 — a).
Hence when A, and wu, are found, the transformation (4.113.3), where the
unitary matrix U, is defined by (4.113.5), (4.113.4), and (4.113.6),
replaces A by a Hermitian matrix A, of lower order, whose proper values
are the same as those of A which are not yet known, and whose proper
vectors v; yield those of A by the simple relation

(4.113.7) eran i (°°)


4.114. Equal and nearly equal roots. Suppose \1 = A2x > Az. Then
associated with A, is a two-dimensional set of proper vectors, and any two
mutually orthogonal unit vectors in this set can be taken as wu; and up.
Given any starting vector x, let wi be the direction of its projection on
this set. Then x can be written in the form

% = iti + atts Fo ot NnUn,


while
ty = Are = mdyur + ysdgus + ++ > + dP,
and for v sufficiently large
Ly = niAjU1.

If zis any other vector, then


Z = wily
+ wotle + + + + + Gln,
THE PROPER VALUES AND VECTORS OF A MATRIX 159
where in general w,2 ~ 0. Hence for large »
Zy = Az = (wyuy + Wole) At.

Both vectors, z and z, will be effective in yielding 4, but in general


distinct vectors will lead to distinct proper vectors associated with the
same proper value. However, a third vector w will yield a w, for large v
such that z,, z,, and w, are linearly dependent.
On the other hand, if \; is a triple root, three starting vectors will
lead to the same A, but to three independent proper vectors, while four
starting vectors would approach a set of four linearly dependent vectors.
If \, and 2 are nearly, but not exactly, equal, then with each will be
associated a unique proper vector. Consider two starting vectors x and
z. For »v sufficiently large, we shall have approximately
Ly = mAU1 + n2dZU2,
2 = widyur + W2AZU2.

Let
Qy = Bry, By = 262).
Then

-114.1
ay =54 HimAl + Homers,
2
Saray By = @11AJ + Wowedr}.
Then the two matrices

ays wp agedy Ba petMine Ay


Oy41 pana aoe ) Bv41 ue Aart

Ora AGT? Bt? reat) Neue


are both singular, and hence ); and dz must satisfy the equation
i @k B,
(4.114.2) NK aya1 Bra = 0.
” Qri2 Brye

The method can be extended to the case when three or more of the proper
values are nearly equal.
When the powers A” are formed explicitly, there are already at hand
the vectors x, of n distinct sequences since the 7th column of A” is A’e;.
The 7th diagonal element of A” is the a, for the starting vector ¢;.
If the matrices A” of the sequence approach rank 1, then all columns
approach the proper vector associated with the single largest proper value.
If they approach rank 2, then the two largest proper values are equal or
nearly so. Pick out any two diagonal elements and consider their values
in consecutive matrices of the series. Let these be a, and B, in A’, ay41
and 6,41in A’+1, From (4.114.1), the matrix

a By \_frA¥ AB Nee oa
Oyvi1 Brot AG get Hone W2W2
160 PRINCIPLES OF NUMERICAL ANALYSIS
Hence if the determinant of this matrix vanishes for large v, then \1 = Az.
If it does not vanish, the roots are nearly equal, by comparison with the
other proper values, but not exactly equal, and they can be found by
solving Eqs. (4.114.2).
4.115. Rotational reduction to diagonal form. A second-order Hermitian
matrix has the form

(4.115.1) Basmy (Eis


Aer aBe'* ) 6 >0,
and its characteristic equation is
(4,115.2) uw? — (Bi + B2)u + BiB2 — B? = O.
The discriminant of this quadratic is
(61582)? 4-46",
and since all the 6’s are real, this can vanish only in case B: — B2 = B = 0.
Hence the only second-order Hermitian matrices with coincident proper
values are scalar matrices, which are necessarily diagonal.
In complex 2 space, a unitary vector can be written in the form

he 2 Gs cos ‘\
e2 sin 6

Assume 6 > 0, and let ui and uw. < pi be the two roots of (4.115.2). Then
if v is a proper vector associated with m1,
(B81 — pwi)e cos 6 + Bets) sin 6 = 0.
This can be satisfied by taking
o. =w+ ¢/2, wo. =w— ¢/2,
with w arbitrary, and

(4.115.3) tan 6 = (wu: — B1)/B = B/(u1 — Be).


The second expression is obtained by using the second row of B, and the
two expressions are equivalent since (4.115.2) can be written

(u — B:)(u — Be) = 6.
Because of this relation, either root yw; must exceed both #; and B. or be
exceeded by both. But since the sum of the roots is

Hi + we = Bi + Ba,

it follows that the larger root 1; must exceed both, and the smaller root ps
must be exceeded by both. Hence

tan @ > 0,
THE PROPER VALUES AND VECTORS OF A MATRIX 161
and 6 can be taken to lie in the first quadrant. Hence 6 is uniquely
determined. By applying a standard trigonometric identity, one obtains
(4.115.4) tan 26 = 26/(B: — B:2),
an expression which does not involve p.
The proper vector v; associated with yi can now be written

e#/2 cos 6
(4.115.5) vy = (n am \

where ¢ is defined by (4.115.1) and @ by (4.115.4) with the additional


convention that 0 < 6 < 7m.
If one uses (4.115.4) and then nea trigonometric identities to find
the functions of 6, then from (4.115.3) and the expression for the sum
of the roots it follows that
(4.115.6) Hi = 61 + B tan 8, Be = Bo — B tan 6.
Since vz is orthogonal to 1, it is easy to obtain

—e'#/2 sin 6
(4.115.7) Vo = ( 26? cos )

so that the unitary matrix V which diagonalizes B to M is


e*/2 008 9 = — e*4/2 sin 0
(4.115.8) V= ee sin 6 e¢-‘4/2 cos 6 )
where
(4.115.9) VABV =i:

One verifies directly that the sums of squares of the diagonal elements of
B and M are related by
wi + wf = Bi + BZ + 267.
Now suppose B represents any principal minor of A. For simplicity
let this be the minor taken from the first two rows and columns of A.
The matrix
V0
base & 2
is easily seen to be unitary. If we write A in the form

Bue Bo )
ois B ingBiss,
then ~
= Ut = M V 1).
Ai aoe) UtAU,i ey By )
162 PRINCIPLES OF. NUMERICAL. ANALYSIS

Hence in the transform A; of A by the unitary matrix U, the sum of the


squares of the diagonal elements is increased by 26%. The same would
hold if B were any principal minor with non-null off-diagonal elements
oj and a;:, except that the matrix U must be formed by placing the
elements of V in the 7th and jth rows and columns.
The sum of the squares of the diagonal elements of A is maximal among
all the unitary transforms of A for the diagonal matrix A. Hence an
infinite sequence of transforms A, of A, each produced from the preceding
by a plane rotation as just described, and chosen to nullify a pair of
off-diagonal elements of greatest magnitude, will approach A as a limit,
and the infinite product of unitary matrices U, will approach the matrix
U of proper vectors. The ordering of the proper values down the main
diagonal of A is taken care of by the convention that ui must be the larger
of the two proper values of each B, provided the rows and columns in B
are ordered as they are in A.
4.12. The Matrix A Is Arbitrary. If the proper values of the non-
symmetric real matrix or the non-Hermitian complex matrix A are all
distinct, then there exists a nonsingular matrix W such that

(4.12.1) WAW =A,


and A is again a diagonal matrix of the proper values of A. In this case,
the matrix W is not orthogonal or unitary. Nevertheless, it is still true
that
(4.12.2) > = WAW-,
where v is any integer if A is nonsingular or any non-negative integer when
A is singular. Moreover,
(4.12.3) p(A) = Wp(A)W-},
where p(A) is any polynomial. Hence let r = Wy be any vector. Then
Ave = WA’y = Dwar.
If there is a single proper value of largest modulus, let this be \;. Then
just as in the Hermitian case for large v
(4.12.4) a, = Ae = Nin1W,

provided only m + 0.
For a non-Hermitian matrix, there are two sets of proper vectors, a set
of row vectors and a set of column vectors. Corresponding to any proper
value ); there is a column vector w; and a row vector w‘ such that
(4.12.5) Aw; = wi, wid = rw,
The w* are in fact the rows of W-1, as the w; are the columns of W, or may
THE PROPER VALUES AND VECTORS OF A MATRIX 163
be taken so when properly normalized. Let u = vW- be any row vector.
Then
UA’ = vA"W- = Trg,
if the ¢; are the elements of v. Hence, again, for sufficiently large v
(4.12.6) u, = uA” = dy},
provided ¢; ~ 0. Itis not necessary that w; and w' be of unit length, but
for these to be a column of W and a row of W-', respectively, it is neces-
sary that
ww, = 1.

For sufficiently large v, consecutive vectors in the sequence zx, and


also consecutive vectors in the sequence wu, are approximately linearly
dependent. Hence the ratio of any two corresponding components can
be used to provide an approximation to x, and the agreement among
these ratios provides evidence as to the nearness to the limit. The 6?
process can be applied to accelerate the convergence to all limits, \1, wi,
and w},
Suppose now that A; and d2 are equal in modulus, but exceed in modulus
all other proper values. Then in the limit

oy weNo!
+None
Consider the sequence of scalars
(4.12.8) ays, = U,a:
For large v this becomes

ay = Aihim + AZGan2.
Then in the limit the matrix

Oy Nt nN
yg. AZHE YBtt
Gyp2 AZH® Yet

is singular. If either u or x is replaced by a different vector, another


sequence of scalars 8, can be defined, and by a familiar argument 1 and de
must satisfy the quadratic equation
Le 03 Se
(4.12.9) NX Qr42 Brzr} a0:
2 Ay+2 Byte

The a’s and the ’s can indeed be individual components in the 2’s or in
the w’s. The extension to a larger number of proper values of equal
164 PRINCIPLES OF NUMERICAL ANALYSIS
modulusis direct, and applies, moreover, also to the case of moduli that
are nearly equal.
In case \1 = de, the coefficients in the quadratic (4.12.9) all vanish.
This case will be considered later.
Suppose the proper value )1 of greatest modulus and its associated
vectors w; and w! have been found. Suppose dA, exceeds all other proper
values in modulus. Its proper vector wz is orthogonal to w!, and w? is
orthogonal to w:. Hence one can proceed as above but with starting
vectors x, orthogonal to w!, and wu, orthogonal to 1:
wig = uw; = 0.

Or, one can replace the matrix A by


Ai =A— Aww,

whose proper values are 0, As, As, . . . , An


Turning now to the case of multiple values, it may or may not be
possible to diagonalize the matrix when such occur. If \i = \2 ¥ d; for
i > 2, and if two independent proper vectors exist associated with Ax,
then in general a starting vector x will have some component which is a
linear combination of these vectors. The iteration can proceed and will
yield \; and some linear combination of wi and w2. Likewise a starting
vector u will yield \; and some linear combination of w' and w?. A differ-
ent starting vector x will yield a different linear combination of w; and
We, and a different starting vector wu a different linear combination of w!
and w?. When powers A” of the matrix are computed explicitly, then
one has the effect of iterating simultaneously upon n distinct column
vectors, the columns of A, and upon n distinct row vectors, the rows of A.
In the limit if corresponding columns (or rows) in consecutive powers A’
are linearly independent, then the largest proper values are equal, or
nearly equal, in modulus. But if these are linearly dependent, whereas
the matrix A” has rank > 1, then the largest proper value is a multiple
root.
However, in the case of a multiple proper value, it can happen that
the number of linearly independent proper vectors associated with this
proper value is less than the multiplicity. If so, then the matrix cannot
be diagonalized. The general form was discussed in §2.06, and is shown
in (2.06.20), (2.06.21), and (2.06.22).
It is clear that among the iterates A’x of any vector x only a finite
number can be linearly independent. There will be some m < n such
that for » > m, A’z is expressible as a linear combination of x, Az, ... ,
A™—'z, Hence all iterates lie in some subspace of dimension m < n.
In this subspace there is at most one (in general there will be exactly one)
proper vector associated with each distinct proper value. The fact can
THE PROPER VALUES AND VECTORS OF A MATRIX 165
be deduced from the discussion in §2.06, but will be brought out more
clearly in §4.23. Furthermore, associated with each }; there is a principal
vector of highest grade in the subspace, say u{) of grade n,, the vectors
Seta = (A = ADU,

ue = (A — At)?2u% = (A — dD) ul»,

are of progressively lower grade, and lie also in the subspace, and in par-
ticular u{? is the unique proper vector in the subspace.
Consider the effect of the iteration upon these principal vectors. If
we write
Us = (uPu® - - - uf),
then one verifies that
Oe Uae
where
Xe alt FO, ae.
AREA; Lo. p= Ad hy,
fo) 0; (ef fe fe) 16) 0 ve! Ve

the matrix A; being of order n;. Hence


A?U; = AU,A; = UA? = UA + 11)?,
and in genera]
A’U; cad UAL + Ty)’.

The auxiliary unit matrix J, vanishes in the n,th and higher powers.
Associated with each distinct \; will be a particular matrix U; of
principal vectors, and x can be expressed as a sum

i=l ge,

where x is a vector of as many elements as there are columns in Uj.


Hence
Alt = LU (AL + Ty)’,

If there is a proper value 1 of modulus exceeding all other |A,|, then


in the limit
x, = Ave = UAT + 11)’.
Since

(Ar + I,)™ = nyri(Ail + I,)™7 + fs MAL + J i\a-* +: 7

+ PT = [Ql + 1) — Ml = Ip = 0,
it follows that in the limit any 1 + 71 consecutive vectors in the sequence
Ax will be linearly dependent, and in fact the coefficients expressing the
166 PRINCIPLES OF NUMERICAL ANALYSIS
dependence relation are the coefficients of the powers of ) in the expan-
sion of (A — A1)". ‘This fact provides a means for computing Ai.
Given Ax, it is possible to form the combinations
A),2y = By41 — Ait),
AZ ty = Lrz2 — WZrXy41 + Af,

We find
A), 2 = Uf + Iy- AW Ow + I) "x
= UWA oh I,) "x,

and more generally


AX,%» = ORV EXON 4 + Ty)*x.

But since [7*= 0, therefore A*tz, = 0. Moreover, J7'—! differs from the
null matrix only in the last element of the first row. Hence in the product
UiJ*— only one column is non-null, and this is the proper vector u,
Hence A*!—!z, is equal to uw“ except for a nonessential scalar multiplier.
Thus even in this case, it is possible to obtain from the iteration both the
largest proper value and an associated proper vector.
4.2. Direct Methods. By a direct method will be meant a method for
obtaining explicitly the characteristic function, the minimal function, or
some divisor, possibly coupled with a method for obtaining any proper
vector in a finite number of steps, given the associated proper value. The
method to be used in evaluating the zeros of the function is left open.
Naturally one such method would be direct expansion of the determi-
nant |A — Al| to obtain the characteristic function. This done, and the
equation solved, one could proceed to solve the several sets of homogene-
ous equations (A — AI)x = 0, where d takes on each of the proper values.
Such a naive method might be satisfactory for simple matrices of order 2
or 3, but for larger matrices the labor would quickly become astronomical.
In discussing iterative methods, it was convenient to consider sepa-
rately Hermitian and non-Hermitian matrices. This was primarily
because of the fact that for Hermitian matrices the proper values are
known to be real, though a further point in favor of the Hermitian matrix
is the fact that it can always be diagonalized. For the application of
direct methods, however, the occurrence of complex proper values intro-
duces no difficulty in principle, though naturally they complicate the
task of solving the equation once it is obtained. Intrinsic difficulties
arise in the use of direct methods only with the occurrence of multiple
proper values. Fortunately, however, one begins in the same way
regardless of whether multiplicities are present or not. If present, the
fact reveals itself as one proceeds. If a given direct method applies at
all to non-Hermitian matrices, then its application to any diagonalizable
matrix can be discussed about as easily as can the application to the
THE PROPER VALUES AND VECTORS OF A MATRIX 167

special case of a Hermitian matrix. Consequently, each method or


class of methods will be described for whatever cases it may cover,
Hermitian or not, before passing on to another.
One ingenious and simple method that has been used to find the
characteristic equation may be mentioned at the outset. This is to
evaluate the determinant ¢(X) = |A — AJl| for each of n+ 1 selected
values of A, and then write the interpolation. polynomial of degree n
which they determine. This might be advantageous when the matrix is
small, and values of \ could be found for which the evaluation is especially
simple. However, it provides no assistance for the computation of the
proper vectors.
4.21. Symmetric Functions of the Roots. If we write

(4.21.1) f(d) = (—)"@(A) = A — yu yd? +o + (-) 10,


then the coefficients y, are the “‘elementary”’ symmetric polynomials in
the proper values. That is to say, 7, is the sum of the products h at a
time of the \;._ In particular

(4.21.2) ea = tes
Newton’s identities (3.02.5) express the sums of the powers s; as poly-
nomials in these elementary polynomials, where the y; here take the
places of the o; in (3.02.5). But

Ss = 7 = tr (A),

a3 = 2h, = tr (A®),
and in general

(4.21.3) 8h =o ir GA"):

Hence by taking powers of A up to and including the nth, one can com-
pute the sums of powers of the \;, and thence by applying Newton’s
identities find the y,. Hence to find the s, by this method requires
(n — 1) matrix products; each matrix product requires n’ multiplications;
hence altogether to find the coefficients in (4.21.1) requires approxi-
mately n‘ multiplications. .
To improve the algorithm and obtain further information, consider
(4.21.4) CX) = ad) QP A) = Con — Car CAF ee
ete Cena

Then
(4.21.5) C(A)QI — A) = (—1)"eQ)I.

On expanding and comparing coefficients of \ on the two sides of this


equation, one finds
168 PRINCIPLES OF NUMERICAL ANALYSIS
Co= I,
, Cy + CoA i vil,

(4.21.6) Crea
©) 8G SOO 1 Oldie Ke 3e0 ce

Cr-1 + Cr—2A = Vaealy

CriA = Yall.
Now 71 is given by (4.21.2). Hence C; can be found from the second of
(4.21.6). On multiplying this by A and taking the trace of both sides,
one finds in view of (4.21.3)
tr (CiA) + 82. = 7151.

Comparison with the second of (3.02.5) shows that


2y2 = tr (CA).
Hence yz can be found and therefore, by the third equation, C2 In
general
(4.21.7) Ch = Yal — ya-rA + yn-2A? — + + + + At

On multiplying by A, taking the trace, and comparing with (3.02.5), one


finds that
(4.21.8) hyn = tr (Cy,-1A).

Hence the coefficients yy, and the matrices C can be obtained in the
sequence Cy = I, y1, C1, v2, Co, . . . » Yn. The final equation in (4.21.6)
serves as a check. Note that, since each C; is a polynomial in A, it is
commutative with A:
ACy-= CyA.
As a byproduct of this computation, one obtains the determinant
|A| = Yn;
the adjoint
adj A = C23;

and the inverse


A = Cht/ Vas

If CQ.) #0, then any non-null column of C(d,) is a proper column


vector, and any non-null row of C(d,) is a proper row vector associated
with );, since by (4.21.5) and the commutativity of the C;, and A we have
CA)AI — A) = Ad — ACO) = fAdI = 0.
But if A; is a simple root, then necessarily C(\;) ¥ 0, and there is at
least one non-null column and at least one non-null row. For suppose
C(:) = 0. By differentiating (4.21.5) and setting \ = 4, one obtains

QT — AC’) = (—1)"¢'(A)I,
- THE PROPER VALUES AND VECTORS OF A MATRIX 169
and on taking determinants
of both sides,

(—1)"9Qi)|C’As)| = [((—1)"9’Q)]*.
But ¢(\;) = 0, whereas ¢’(\;) #0, and we have therefore reached a
contradiction. Hence the method yields at least those proper row and
column vectors that are associated with simple proper values.
In forming the matrices C2, C3, . . . , Cn1, each requires n* multipli-
cations, making n°(n — 2) in all for forming the characteristic function.
Given a \;, to form C(A,) one can form

Codi, (Cods + Ci)rs, (Cod? + Cri + C2)m, .


which requires n?(n — 2) multiplications for each »;, and hence again
n®(n — 2) multiplications when the ); are all distinct. Altogether this is
2n*(n — 2) multiplications. However, if one forms only a single row
and a single column, and these are non-null, this can be reduced to a total
of n?(n? — 4).
Suppose next that d; is a double root, but not a triple root. It may
still happen that C(\;) ~ 0. If so, then again any non-null column is a
proper column vector, and any non-null row a proper row vector. Since
by differentiation of (4.21.5)

(4.21.9) Ca) + QI — ANC’A) = (—1)*¢’A)J,


and since 2; is a double root, therefore

(AJ — A)C’(,) = —C(u) € 0,


and certainly therefore C’(\;) # 0. However,

QL — A)C(A) + AL — A)?C"(A) = (—1)re’A)QL — A),


and therefore
(4.21.10) (x — A)?C’(d) = 0.
Hence there is at least one non-null column z and at least one non-null
row u of C’(A;), and

U(rad ge A)? = 0, ews as Aca = 0,


whereas
u(rAd — A) #0, (AJ — A)x ¥ 0.

Thus u and # are principal vectors of grade 2 associated with ),.


Suppose, on the other hand, that C(A,) = 0. Then by (4.21.9) it
follows that
(4.21.11) (AJ — A)C’(As) = 0,
170 PRINCIPLES. OF NUMERICAL ANALYSIS

since ); is a double root. Then any non-null column of C’(,,), if there is


such, is a proper column vector, and any non-null row a proper row vector.
We can show that C’(A,) has rank 2.
We know from Chap. 2 that a double root has associated at most two .
proper vectors. Hence (AJ — A) has rank n—2 at least. But
0 = C(A,) = adj (J — A) so that (AJ — A) has rank n — 2 at most.
Hence in this case (iJ — A) hasrankn — 2exactly. Hence by (4.21.11)
C’(\;) has rank 2 at most. Let B be a constant matrix of maximal rank
such that
C’(A)B = 0.

Since C’(A,) has rank 2 at most, B has rank n — 2 at least. If we differ-


entiate (4.21.9), set \ = ,, and multiply by B, we obtain

Ad — A)C’A)B = (—1)"6"(u)B.
The rank of the right member of this equation is the same as the rank of
B, since ¢’’(\;) # 0, and this cannot exceed the rank of any matrix factor
on the left. But (A.J — A) has rank n — 2, whence the rank of B cannot
exceed n — 2. Hence B has rank n — 2 exactly, and therefore C’(A,;) has
rank 2 exactly.
Thus when \; is a double root (and not a triple root), either C(A;:) # 0,
in which case there exists a non-null column of C(A;) and a non-null
column of C’(\,), the first being a proper, and the second a principal,
column vector associated with ),; or else C(A;) = 0, in which case C’(X;)
has rank 2 and any two linearly independent columns are proper vectors.
Corresponding statements can be made for the rows.
The argument can be extended to the case of a root of arbitrary
multiplicity.
4,22. Methods of Enlargement. Suppose A,_1 is a principal minor of
A, say that taken from the first n — 1 rows and columns of A, and let

(4.22.1) Aaa lee fa)


Gn—-1 An—1
Then

(4.22.2) hice (ie — Ant — ant


~Gp—1 Die On—1
Let

(4.22.3) daQOse Tn Aa,


and

(4224) BX) = adj 00, See ae ZO)


J n—] Pn—-1
THE PROPER VALUES AND VECTORS OF A MATRIX 171
Here the prime does not denote differentiation, but is merely a dis-
tinguishing mark. Note in particular that

dn—1(A) = AI n—1 — A,-1|.

Now ¢,-1(A) is of degree m—1 in >. However, each element of


fn—1(A) and each element of f}_,(\) is of degree n — 2 at most. Hence if
we note that |
OL Riss An) ae rz bn(A)en;
i ' $n—1(A)
or in more detail
(In—1 ; — An—1)fn—1(A) — Qn—1¢n—i(A) = 0é
ce)
4.22.5
Sel FAN) PIO teas
eee NEO Yan)
it follows that, when ¢,_1(A) is known, all coefficient veetors of fr—1(A)
can be obtained by comparing coefficients, and hence ¢,(A) can be formed
from the last equation. In fact

Nfn—1) = An—ifn—i(A) + Gn—1bn—1(A),


so that beginning with the coefficient of \”~? in f,_1(A), which is simply
a,-1, the vector coefficients can be obtained in sequence.
If one first forms ¢1(A) = A — au, one can then by this scheme form
fi(A) and hence ¢2(A); then fo(A), and hence ¢3(A), . . . , eventually
obtaining ¢,(A).
Given ¢n-1(A), the product An_isfn1(A) requires (n — 1)* multiplica-
tions; Gn—i¢n—i(A) requires (n — 1)?; a),_sfn—s(A) requires (n — 1)?; and
On—1n—1(A) requires n — 1. Altogether this is n?(n — 1). When they
are summed over all values from 2 to n, we have a total of
n(n? — 1)(8n + 2)/12
multiplications, or approximately n*/4. The advantage over the other
method lies in the fact that not the entire adjoint but only one column
of it has been computed.
However, this also accounts for a major disadvantage, when proper
vectors as well as proper values are needed. If for any \; the vector
fn—1(x) and the scalar ¢n-1(\;) do not both vanish, then a proper vector
associated with i, is
be
n—1(Ai) \

bn—1 (Aa)

Undoubtedly this covers the majority of cases that arise in actual practice.
But this column alone is insufficient for obtaining all the proper and
principal vectors when ),; is a multiple root, and moreover no general
method has been provided for the theoretically possible case of a simple
root A; for which this column (but not the entire adjoint) vanishes.
172 PRINCIPLES OF NUMERICAL ANALYSIS

The proper row vectors can be obtained in the “usual” case by using
equations corresponding to (4.22.5) to compute f;_1(A), and hence
1 (4). ;
The “escalator method” also proceeds to matrices of progressively
higher order, but it requires the actual solution of the characteristic equa-
tion at each stage. Consider only the case of a symmetric matrix, for
which moreover all proper values are distinct. Thus suppose for the
symmetric matrix A all proper values and all proper vectors are known:
(4.22.6) AU = UA.
If A is bordered by a column vector and its transpose to form a symmetric
matrix of next higher order, we wish to solve the system

(4.22.7) le *)& Z
Hence
Ay ae an = ry,
(4.22.8) Biel gw

Let

(4.22.9) y = Uw.

Then
AUw + an = Uw,
UAw + an = Uv,
(4.22.10) Aw + UTan = du,
w = (Al — A)~U"an.
Since 7 can be taken as an arbitrary scale factor, (4.22.9) and (4.22.10)
determine the proper vector once the proper value \ is known. How-
ever, from the second equation (4.22.8) it follows that
a'Uw = (A — a)n,
and, on substituting (4.22.10),
(4.22.11) aU(Al — A)“UTa = X —
This equation in scalar form can be written ‘
(4.22.12) Z(a™u;)?/(A — i) =A — a,
and its roots are the proper values of the bordered matrix. If the d; are
arranged on the \ axis in the order \1 > \2 > - +: > Aq, then as X
varies from —© to dn, the left member decreases from 0 to — ©; as
\ varies from An to Ani, the left member decreases from + to —o;
. ; a8) varies from ); to + ©, the left member decreases from + © to
0. The right member increases linearly throughout. Hence (4.22.12)
THE PROPER VALUES AND VECTORS OF A MATRIX 173
has exactly one root \ < \,; exactly one root \ between each pair of
consecutive ;; and exactly one root \ > x. With the roots thus iso-
lated, Newton’s method is readily applied for evaluating them.
4.23. Finite Iterations. If bo is an arbitrary non-null vector, then in
the sequence ;
Do, bi = Abo, be = Abi, ai Ae Ct

at most n of the vectors are linearly independent. Suppose the first


m <n of the vectors are linearly independent, but the first m + 1 are
linearly dependent. Then b,, is expressible as a linear combination of the
other m, and hence for some scalars 8; it is true that
(4.23.1) Oe — PiU tit pert — 6-8 “et B09 = 0;
Hence

(223-2) (A= — BA") BsAn > - EB) by = 0,


which is to say that
(4.23.3) p(A)bo = 0,
where
(4.23.4) BAN) ee NBN ae Be
If
d(A) =’ + 6,47 12+ aimee + 6,

is the highest common divisor of p(A) and (A), then d(A)bo = 0. That
is to say
(Avs Ar 2 te ot 8 by =O)
or ‘
b, ae 5yb,-1 + eae ae 5,bo = 0.

But the vectors bo, bi, . . . , bm—1 are linearly independent, whence v = m
and d = p. Hence p(d) divides the minimal function ¥(A) and hence
also the characteristic function ¢(A). In particular if m =n, then
p(dA) = (—1)"¢(A). When this is true, therefore, one can form the
characteristic equation by first performing the n iterations Ab; and then
by solving a system of n linear equations. When this is not so, one can
at least obtain a divisor of the characteristic function by performing a
smaller number of iterations and by solving a system of lower order.
However, in addition one must test at each step the independence of the
vectors already found, as long as the number is below n + 1.
A great improvement is provided by Lanczos’s “‘method of minimized
iterations.” Let bo and co be arbitrary nonorthogonal non-null vectors.
In case A is symmetric, take by = ¢o; if A is Hermitian, take by = co and
in the following discussion replace the transpose by the conjugate trans-
174 PRINCIPLES OF NUMERICAL ANALYSIS

pose. In other cases by and co may or may not be the same. Form b,
as a linear combination of by and Abo, orthogon#] to co; and c; as a linear
combination of co and Ae, orthogonal to bo. Thus
b; = Abo ae abo,

where
0 = chbi = chAbo rae aochbo;
and
cl = clA — doc},
where
0 = clby = chAbo — Sochbo.
But then
ag = 50 = chAbo/cibo.

Next, choose bz as a linear combination of bo, bi, and Abi, orthogonal to


both co and ¢;; and c2 as a linear combination of ¢o, 1, and A'c:, orthogonal
to both bo and bi. Then
be => Abi =a aby = Bobo

where
0 chbe chAby aa Bochbo,
0 I : clbe I clAby = otyc1by.

Hence
a, = clAbi/clbi, Bo = chAb,/chbo.
But from the relations already derived,
clAb, = clbi = clAbo.
Hence
Bo = clbi/chbo,
and if
ch = clA — acl — Boch,
then it follows that
0= chbo = chbi.

The step breaks down in case cib; = 0, since this is the denominator in
a. But this means that
OS clAbi = clA(A = aol )bo,

and when a is replaced by its value, this is, apart from the factor (clbo)—},
equal to the determinant of the matrix product

(co
chA
(by. Ab 0)-
Hence this can vanish only if the pair bo, Abo or else the pair co, A'cp is
linearly dependent.* Suppose for the present that this is not the case.

*This conclusion, and the more general ones, do not follow. It can be shown, however,
that if neither pair (more generally neither set) is linearly dependent, then after a slight
perturbation of either starting vector, bo or co, the determinant will not vanish. One could
say, in pseudotechnical language, that failure has zero probability.
THE PROPER VALUES AND VECTORS OF A MATRIX 175
We now show that a, and B; can be chosen so that

bs = Abs — ade — Bibi

is orthogonal to all three vectors co, c1, and cz and so that

C3 = Atlee — a2C2 — BiC1

is orthogonal to bo, bi, and be, provided only that the vectors bo, Abs,
A’bo, and also the vectors ¢o, A™co, A™co, are linearly independent triples.
First we note that
chAbe (el = aro} Da = 0,
chAbo ch(by + ado) => 0,

so that cz is orthogonal to bo, and b3 to co, independently of a2 and Ai.


Next
clAbe = (ch + acl + Bocl)bs = chlbo = chAbi.
-Hence
clbs = chbe es Bicibi,

whence c; and 6; are orthogonal if


Bx => chbe/clbi.
But in this event
chby = chbe — Biclby = (0.
Finally,
chbs = chAbe = achbe,
chbe = clAbe — azcybe,
and both vanish provided
ag= chAbe/chbe.

This is always possible if c, and be are not orthogonal. But the matrix
product
Cy chbo chA bo chA 2bo
¢ (bo Abo Abo) => clAby chA 2bo chA 3bo
cA clA 2bo clA 3b chA 4D

has a determinant which by successive reductions of rows and columns


yields
chbo chAbo chA*bo chbo clbo chbo

clAbo chA2bo chA*bo chb 1 clby chby = (chbo) (c{b1) (ibe).


chA%by chA%bo chA4bo chbe clbe chbe

Since we have assumed (clbo)(cibi) # 0, it follows that cibs = Of and


only if either the triple co, A'co, A'’co or else the triple bo, Abo, Abo is
linearly dependent.
176 PRINCIPLES OF NUMERICAL ANALYSIS

We can proceed inductively, forming


bipa = (A —'04)b; — Be-16--1,
(4.23.5) dea tie AE an
where
(4.23.6) a= cl Ab;/clb;, Bs-1 = clb;/cl_,bi-1,

until possibly at some stage c; and b; are orthogonal. When ¢; and }b; are »
not orthogonal, then b;41 is orthogonal to every vector Co, C1, ... , Gi,
and c41 is orthogonal to every vector bo, bi, . . . , by.
Necessarily, there is a smallest m <n for which clb,, = 0, since if
the vectors Co, C1, ... , Cn—1 are all linearly independent, then only -
bn = 0 is orthogonal to them all, and this vector is orthogonal to ¢n,
whatever c, may be. Suppose the relation holds forsomem <n. Then
either the set bo, Abo, . . . , Abo or else the set co, ATC, . . . , (AT)co
is linearly dependent. For definiteness suppose it to be theformer. The
set from which Abo is omitted is linearly independent, for if it were not,
the m selected would not be the smallest possible. Hence Abo is
expressible as a linear combination of the m vectors bo, . . . , A™~1bo.
Hence Ab,,_1 is some linear combination of the vectors bo, bi, . . . , Om—1!
FM ps = Bm—10m—1 ar Hm—20m—2 <r Eh aospodo.
Hence

Dm = Abm—1 ae Om—10m—1 as Ba 20s

=o (Mm—1 =, Om—1) Om—1 eS (Hm—2 4 Bm—2) Om—2 ai Mm—30m—s ie ee HoDo.

But then
0 = ChOm = Mochbo,
= ClO => prclby,

0= clo ibs = (Mm—1 = ie iycl_ SO

By hypothesis all the vector products on the right are non-null, and
therefore
O = wo = wi = °° * = m1
— Om,
whence b,, = 0 and

(4.23.7) 0 = (A — an_WD)bm—1 — Bm—2bm—s.

Thus if the vectors bo, Abo, . . . , A%bo are linearly dependent, then
(4.23.7) holds; correspondingly if the iterates of co by A™ are linearly
dependent, then it follows that
(4.23.8) 0= (At = Om—11)Cm—1 — Bram.

Hence the recursion (4.23.5) and (4.23.6) can be continued until for some
« = m either (4.23.7) or else (4.23.8) holds.
THE PROPER VALUES AND VECTORS OF A MATRIX 177
Consider now the sequence of polynomials

pod) = 1,
pila) = (A — ao)pod),
(4.23.9) p2(A) = (A — ai)pi(d) — Bopo(d),
ERO, (8, Cae) (05 elec ey (6: .6 ver ele! ie! 6. ese

© AO Ob HAL IOME Oy sey Oe) 0 Ore, CO wine, 10) dO) WO) Ooms ope

where the a’s and #’s are defined by (4.23.6). One verifies inductively
that

(4.23.10) pi(A)bo = bi, pi(A')eo = Cj.

Hence either pn(A)bo = 0, or else pn(A‘)co = 0. In either case pn(d) is


a divisor of the minimal function ¥(A), its coefficients are provided with-
out the necessity for solving a system of equations, and moreover the test
for dependence of the successive sets of iterates of bo and of co is performed
automatically in the course of the computation.
Suppose now that the proper values are all distinct. Then if W is the
. Matrix of proper vectors, _
(4.23.11) WAW =A,
and A is diagonal. Also

If only one of 6b, and cm vanishes while the other does not, one can by
a different choice of bo (if b» = 0) or of co (if cn = 0) obtain a longer
sequence. Hence suppose that bn = Cm = 0. Moreover pm(d) has only
simple zeros:
(4.23.18) p(d) = pu(A) = (A — x)(A — Ae) + + + (A = An)
Then

(4.23.14) gi(A) = p(rd)/(A — &)


is a polynomial, and there is no nonconstant factor common to all the
q: Hence by a theorem in §2.06 there exist polynomials f;(\) such that

Zfi(A)gi(r) = 1.
Hence

(4.23.15) Dfi(A)g(A) = I = Zfi(A™)gi(A"),


and therefore
(4.23.16) Dfi(A)gi(A)bo = bo, Dfi(AT)q:i(A")co = Co.
178 PRINCIPLES OF NUMERICAL ANALYSIS

Also, since polynomials in A (or those in A‘) are commutative with one
another, .
Zfi(A)gi(A)d; = by, Zf(A )ai(A eg = cy.
But -
(A — Ail) fi(A)ai(A)db; = fi(A)p(A)d; = 0.
Hence f;(A)q:(A)b; is a proper vector of A associated with the proper value
:, 80 that each 6; is expressed as a linear combination of proper vectors.
Again, if the first of (4.23.16) is solved for any proper vector

f(A) agi(A)bo,
it is expressed as a linear combination of vectors bo, Abo, Abo, . . . , and
these in turn are expressible as linear combinations of the b;. Hence
each proper vector appearing in the first of (4.23.16) is expressible as a
linear combination of the b;. Likewise each proper vector appearing
in’the second of (4.23.16) is expressible as a linear combination of the
c;. Let u; represent the proper vectors of A, »; the proper vectors of A’,
which appear in (4.23.16). Then
(4.23.17) bo = ZU, Co = 20;.

Since the wu; and »v; are proper vectors, therefore

bj = pi(A)bo= JHA yn 2,Pil us


(4.23.18)
cj; = D;((A )e = Sp(AT)y, = »pi(di1)03.

Hence if we let
po(A1) donc Dm—1(A1)
(4.23.19) Pic ee er tae aeons cee ;

Do(Am) » «+ Pm—1(Am)
BOS (by ae roan) Com (Cpe
cues)
4.23.20 } m2
( ) OO, Pec Yom) V = (y EP

then (4.23.18) can be written

(4.23.21) B, = UP, C = VP.

But
VU tee -D,

where D is a diagonal matrix. Hence

VIB = DP; CU = Pp:


But we already know that the w’s can be expressed in terms of the b’s, and
THE PROPER VALUES AND VECTORS OF A MATRIX 179
the v’s in terms of the c’s:

U = BH, Vi = ICT.
Moreover,
CB =A
is also a diagonal matrix. Hence
CU 2 AH" V'B = KIA,
H=A°C'U =A"P'D, K™ = V'BA“! = DPA“,
(4.23.22) U = BA"P'D, V" = DPA-Ct.
The diagonal matrix D is determined only by scale factors in the vectors
u; and v; and these can be chosen as convenient. Thus the m proper
vectors of A which appear in (4.23.17) can be expressed as linear combina-
tions of the b;, and the m proper vectors of A™ (or proper row vectors of A)
can be expressed as linear combinations of the c; (or of the c}).
When m < n, in general this will be because ¢(A) has multiple zeros.
Suppose still that p,,(A) has only simple zeros and take a bj, orthogonal to
all the c;. Then Abj is also orthogonal to all the c;, since

JAB, = (ATe;)"B4,
and A'c; is itself a linear combination of the c’s. Likewise if ci is orthog-
onal to all the b’s of the original sequence, so also will A'c) be orthogonal
to all the b’s of this sequence. Hence one can develop sequences bj, and Cc,
and these vectors will be independent of those hitherto found. They
will yield new proper vectors associated with the multiple roots. More-
over, the new sequences will in general have fewer than m members each.
If they have fewer than n — m members, a third pair of sequences must
be started with bj orthogonal to all previous c’s, and cj orthogonal to all
previous b’s. Since each such sequence will contain at least one member,
the initial vector of the sequence, the process will eventually terminate
and yield all proper vectors.
If, instead of imposing the orthogonality requirement upon bj and c},
one required only that b> ¥ bo and cy ¥ ¢o, then in general new sequences
of m terms each will result, proper vectors associated with simple roots
will be found over again, but a new proper vector for A and one for A' will
be found associated with each multiple root. This course might be
preferred in order to avoid the computation of the orthogonal starting
vectors.
When the zeros of pn(A) are not all distinct, it is still possible to obtain
the proper vectors in much the same way. A resolution such as (4.23.16)
can be employed to show that bp and ¢» are expressible as sums of principal
vectors. Here q:(A) is the quotient of p(A) by the highest power of
(A — 4) it contains as a factor. Each principal vector which appears in
180 PRINCIPLES OF NUMERICAL ANALYSIS

(4.23.16) is expressible as a linear combination of the b; (or of the ¢).


Also, if 2; is the principal vector of A associated with i, then (A — AJ) xi,
(A — dI)%x,, . . . can each be expressed as a linear combination of the 6;.
But every vector of this sequence is a principal vector or vanishes, and
one is a proper vector. Hence associated with each zero of P(A) there
is a proper vector of A which is a linear combination of the b’s and a
proper vector of A’ which is a linear combination of the c’s. Let u; be |
the proper vector of A associated with \;. Then

U= , eds

j
Hence
Chu; = cig}by.
But
c; = pj(A")eo,
whence

(4.23.23) chu; = cpp;(A)u: = chusp;(\),


since u; is a proper vector. Hence

(4.23.24) wig = Chuusp;(rz) /cpb;,


and therefore
Us = Chui > 1; (da) b;/c}b;.
J

But clu; is a scale factor which can be chosen arbitrarily. Thus if in


(4.23.19) the matrix P is taken to be rectangular, one row for each X;, and
the matrices U and V contain only the proper vectors, then (4.23.22)
holds also in this more general case. Again if m <n, one can select
vectors bf orthogonal to every c;, and cy orthogonal to every b;, and form
new sequences to provide any proper vectors not already found.
4.24. The Triple-diagonal Form for a Symmetric Matrix. In the
method outlined in the last section, consider again for the moment the
casem =n. Let

B= (bo by ST tas. G4); C = (co Cy ager: Criln):

Then from the defining relations we have C™B as a diagonal matrix D


whose diagonal elements are

(4.24.1) Oo chbo ad 0, 6 655 On—1 = CoeeDnee ~ 0.

Hence
CTB = D,
Boh = DCs C-1 = D-1BT,
THE PROPER VALUES AND VECTORS OF A MATRIX 181
Also we found that
cl Ab; = auch; = a;6;,
clAby_1 = cl_,Ab; =. clb; = 6;,
clAb; = 0, ji —gl > 1.
Hence the product CTAB is a matrix for which the only non-null elements
lie along, just above or just below, the main diagonal:
ado ba 0 O
61 a6, be 0
(4.24.2) CUAB=| 9. %: 252 63
Oy Of oa ara) Ve: pelriel (eel ev ey emia, a,

Hence

(4.243) BAB=D-CB=(4 |

since by (4.23.6) we know that B;1 = 4,/5;1. One should expect that
after reducing the matrix to this form a considerable step has been taken
toward complete diagonalization.
In case A is symmetric, if co = bo, then C = B, and every 6; > 0.
Hence D* is a real diagonal matrix. If we set
U = BD-»,
then U is an orthogonal matrix, and
ao Bo”% 0 eee

(4.24.4) S=UTAU =[6% om BX...)

For the case of a symmetric matrix, we now consider another method of


obtaining the triple-diagonal form (4.24.4).
Before doing so, however, consider the characteristic function. Let

(4.24.5) po(d) = = (A — a1)pi(A) — Bopo(A),

Nag ba 86 0
= fo’? | 5h — oh. > By) = (A — a2) p2(d) — Brpila).
0 —B,% r — 2
Sule obese a Or ele Les all ale See gue el Womew ae) 0 Sane 10. 46 210, 0) 6) 6 16-4) se) © +e 0 4B Oe)

Thus the polynomials p,(\) are the expansions of the determinants of the
first principal minors of the matrix AJ — S. Note that pji+1(d) and pi()
182 PRINCIPLES OF NUMERICAL ANALYSIS

cannot have a common factor, for if they did, this factor would be con-
tained also in p;_1(A), hence also in p;_2(A), . . . , hence also in po(A),
which is absurd. Also at any p for which ae = 0, pi+1(p) and Pi-r(0)
have opposite signs, since each 6; > 0.
We can show that between any consecutive zeros of p_i(λ) there is a
zero of p_{i+1}(λ), and further that p_{i+1}(λ) has a zero to the left, and one
to the right, of all those of p_i(λ). Hence between any two consecutive
zeros of p_{i+1}(λ) there is exactly one zero of p_i(λ). Since p_1(a_0) = 0, and
since p_2(a_0) < 0, while p_2(+∞) = +∞, the statement is certainly true
for i = 1. Suppose it demonstrated for p_2, p_3, . . . , p_{i+1}, and consider
p_{i+2}. Let ρ_1 and ρ_2 be consecutive zeros of p_{i+1}(λ). Then

p_{i+2}(ρ_1) = −β_ip_i(ρ_1),   p_{i+2}(ρ_2) = −β_ip_i(ρ_2).

The hypothesis implies that p_2, p_3, . . . , p_{i+1} can have only simple zeros
and that p_i has one and only one zero between the consecutive zeros ρ_1 and
ρ_2 of p_{i+1}; hence p_i(ρ_1) and p_i(ρ_2) have opposite signs, and hence p_{i+2}(ρ_1)
and p_{i+2}(ρ_2) have opposite signs. Therefore p_{i+2} has an odd number of
zeros between ρ_1 and ρ_2.
Next suppose ρ is the greatest zero of p_{i+1}. Then p_{i+2}(ρ) = −β_ip_i(ρ).
But p_i has no zero exceeding ρ, and p_i(+∞) = +∞. Hence p_i(ρ) > 0,
and therefore p_{i+2}(ρ) < 0. Hence p_{i+2} has an odd number of zeros
exceeding ρ. But p_{i+2} is of degree i + 2. The hypothesis of the induc-
tion implies that p_{i+1} has i + 1 real and distinct zeros; these divide the
real λ axis into i segments and two rays extending to +∞ and to −∞,
respectively. We have shown that each segment and one of the rays
each has on it an odd number of zeros of p_{i+2}. Hence each can contain
only one, and the remaining zero lies on the other ray.
Thus the polynomials p_i(λ) have all the properties required of a Sturm
sequence, as in §3.05, though they are not formed in the same way.
Hence by counting the number of variations in sign exhibited by the
sequence p_i(λ) at each of two values of λ and by taking the difference, one
has the exact number of proper values of the matrix A contained on the
interval between these values.
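This counting procedure is easy to mechanize. The following Python sketch (an illustration only; all names are ours, not from the text) evaluates the sequence p_i(λ) by the recursion (4.24.5) for a symmetric triple-diagonal matrix with diagonal elements a_i and off-diagonal elements b_i (so that β_i = b_i²), counts the sign variations, and differences two counts to obtain the number of proper values on an interval. A zero value of some p_i is given the sign opposite to its predecessor, in keeping with the remark above.

def sign_variations(a, b, lam):
    """Sign variations in p_0(lam), ..., p_n(lam); this count equals the
    number of proper values exceeding lam (for this orientation of the p_i)."""
    variations = 0
    p_prev, p = 1.0, lam - a[0]
    sign_prev, sign = 1, (1 if p > 0 else -1)   # p == 0 counts opposite to p_0 = 1
    if sign != sign_prev:
        variations += 1
    for i in range(1, len(a)):
        p_next = (lam - a[i]) * p - b[i - 1] ** 2 * p_prev
        sign_next = (1 if p_next > 0 else -1) if p_next != 0 else -sign
        if sign_next != sign:
            variations += 1
        p_prev, p, sign = p, p_next, sign_next
    return variations

def count_in_interval(a, b, lo, hi):
    """Exact number of proper values on (lo, hi], as the difference of counts."""
    return sign_variations(a, b, lo) - sign_variations(a, b, hi)

For large matrices one would normally carry ratios p_{i+1}/p_i rather than the p_i themselves to avoid overflow; the sketch ignores this refinement.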
Now return to the symmetric matrix A and consider, ab initio, the
problem of reducing it to a triple-diagonal form. Suppose A is 3 × 3.
Then if a_{12} ≠ 0, one can find an orthogonal matrix of the form

(4.24.6) U = | 1  0   0 |
             | 0  c  −s |
             | 0  s   c |

where c and s are the cosine and sine of some angle, such that in the trans-
formed matrix

A' = U^TAU

the elements a'_{13} = a'_{31} = 0. In fact, one finds

(4.24.7) a'_{13} = a'_{31} = ca_{13} − sa_{12},

and hence one has only to choose

(4.24.8) κ = a_{13}/a_{12},   c = (1 + κ²)^{−1/2},   s = cκ,

which can always be done if a_{12} ≠ 0. It turns out then that

a'_{12} = a'_{21} = ca_{12} + sa_{13} = a_{12}/c,

so that as a result of the transformation

|a'_{12}| > |a_{12}| > 0.


For an arbitrary symmetric matrix A, let it be partitioned

(4.24.9) A = | A_{11}  A_{12} |
             | A_{21}  A_{22} |

where A_{11} is of order 3. Suppose a_{12} ≠ 0. Let

(4.24.10) V = | U  0 |
              | 0  I |

where U is the orthogonal matrix of order 3 defined by (4.24.6) and
(4.24.8). Then V is orthogonal, and

(4.24.11) V^TAV = | U^TA_{11}U   U^TA_{12} |
                  | A_{21}U      A_{22}    |

Hence the matrix V transforms the symmetric matrix A, arbitrary except
that a_{12} ≠ 0, into a matrix A' in which a'_{13} = 0. Moreover, the trans-
formation leaves unaffected all elements in the first row of A_{12}, and in the
first column of A_{21}, as well as the entire submatrix A_{22}.
Now if in A' the element a'_{14} ≠ 0, one can interchange the third and
fourth columns and the third and fourth rows, apply a similar trans-
formation, interchange again, if desired, and obtain a matrix A'' in which

a''_{13} = a''_{31} = a''_{14} = a''_{41} = 0.


By continuing in this fashion, all elements but the first two in the first
row and all elements but the first two in the first column can be caused to
vanish.
Having achieved this, one can now operate with the submatrix of order
n — 1 obtained after leaving out the first row and the first column of the
matrix. Eventually, therefore, one obtains the required triple-diagonal
form. When the characteristic equation is solved, the proper vectors
associated with each proper value are obtained by a direct solution of the
homogeneous equations.
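The reduction just described lends itself to a brief computational sketch. The following Python fragment is illustrative only (it assumes NumPy, dense storage, and no attention to the economies or the error analysis discussed in the literature); it annihilates the elements outside the triple diagonal, row by row, with plane rotations.

import numpy as np

def tridiagonalize(A):
    """Reduce a symmetric matrix to triple-diagonal form by plane rotations.
    Returns (S, Q) with S = Q.T @ A @ Q and Q orthogonal."""
    S = np.array(A, dtype=float)
    n = S.shape[0]
    Q = np.eye(n)
    for k in range(n - 2):            # row whose trailing elements are annihilated
        for j in range(k + 2, n):     # rotate in the plane (k+1, j) to kill S[k, j]
            if S[k, j] == 0.0:
                continue
            r = np.hypot(S[k, k + 1], S[k, j])
            c, s = S[k, k + 1] / r, S[k, j] / r
            G = np.eye(n)
            G[k + 1, k + 1] = c; G[k + 1, j] = -s
            G[j, k + 1] = s;     G[j, j] = c
            S = G.T @ S @ G       # leaves row k unchanged except S[k, k+1] -> r, S[k, j] -> 0
            Q = Q @ G
    return S, Q

The growth of the surviving element |a'_{12}| noted above appears here as the replacement of S[k, k+1] by r ≥ |S[k, k+1]|.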
If it should happen that any β_i = 0 on the off diagonals, then it is
sufficient to consider separately the characteristic equation and proper
vectors of the submatrix above and to the left of the vanishing β_i and of
that below and to the right.
4.3. Bibliographic Notes. A good development of the topics outlined
in §4.0 can be found in MacDuffee (1943). Of the many papers on bounds
see especially Taussky (1949), Brauer (1946, 1947, 1948), Parker (1951),
Ostrowski (1952), and Price (1951).
Papers by Aitken (1931, 1936-1937) are classic. A more recent gen-
eral discussion of iterative methods is by Semendiaev (1950). Barg-
mann, Montgomery, and von Neumann (1946) discuss the use of the
trace of powers of A and consider in particular the prevention of “‘over-
flow” in spite of round-off. The use of polynomials to accelerate
convergence was proposed by Flanders and Shortley (1950). On the
use of transformations and orthogonalization for successive proper values
see Hotelling (1943) and Feller and Forsythe (1951). Kohn (1949)
describes without proof an iteration which converges to an arbitrary
(but random) proper value.
Aitken uses the A operator for obtaining successive proper values.
On equal and nearly equal roots see Rosser, Lanczos, Hestenes, and Karush
(1951). Rotational diagonalization, for which equality or near equality
is no difficulty, was used by Kelley (1935), but is in fact much older.
The method was discussed by Goldstine in August, 1951, at a symposium
at the Institute for Numerical Analysis and is to be published with
detailed error analysis in a paper by Goldstine, Murray, and von Neumann.
The method of §4.21 is due to Frame (1949, and an unpublished paper).
It was also published by Fettis (1950), but without detail or considera-
tion of multiple roots. The recursion defined by (4.22.5) was given by
Bryan (1950). On the escalator method, see Morris and Head (1942),
and for a more general (brief) treatment, Vinograde (1951). Lanczos
(1951a and 1951b) gave the method of minimized iteration. The triple-
diagonal form with a valuable treatment of error is discussed by Givens
(1951 and a forthcoming memorandum).
That the polynomials p;(x) form a Sturm sequence is a classical result
(see Browne, 1930).
A method for the simultaneous improvement of approximation to all
proper values is given by Jahn (1948) and Collar (1948).
CHAPTER 5

INTERPOLATION

5. Interpolation

This book falls naturally into two parts: one part dealing with the
solution of equations and systems of equations, the other part dealing
with the approximate representation of functions. We come now to the
second part.
It may be that a function or its integral or its derivative is not easily
evaluated; or that one knows nothing but a limited number of its func-
tional values, and perhaps these only approximately. In either case one
may require an approximate representation in some form that is readily
evaluated or integrated or otherwise manipulated. If one knows only
certain functional values, which, however, are presumed exact, the
approximate representation may be required to assume the same func-
tional values corresponding to the given values of the argument. The
problem is then one of interpolation. If the given functional values
cannot be taken as exact, then a somewhat simpler representation will be
accepted, and one which is not required to take the same, but only
approximately the same, functional values. This is smoothing or curve
fitting. Even a function that is easy to evaluate may not be easy to
integrate. For approximate quadrature, therefore, the usual method
is to obtain an approximate representation in terms of functions that
are easily integrated.
In general, the method is to select from some class of simple functions
φ(x) a limited number, φ_0(x), φ_1(x), . . . , φ_n(x), and attempt to approxi-
mate the required function f(x) by a linear combination of these functions.
Thus we wish to find constants γ_i such that the function

(5.0.1) Φ(x) = γ_0φ_0(x) + γ_1φ_1(x) + · · · + γ_nφ_n(x)

is in some sense a reasonable approximation to f(x). If we agree to use
n + 1 functions φ, then we have n + 1 constants γ at our disposal, and
we can impose n + 1 conditions for their determination. In interpola-
tion the conditions imposed are that f(x) and Φ(x) shall be equal at each
of n + 1 distinct values x_j of the abscissas. Thus we require that

(5.0.2) f(x_j) = Φ(x_j)   (j = 0, 1, . . . , n).

The points x_j on the axis of abscissas will be called the fundamental
points. A particularly simple choice of functions φ_i(x) for most compu-
tational purposes is

φ_i = x^i,

so that Φ(x) is a polynomial of degree n.


Another possibility is to require that (5.0.2) shall hold for j = 0, . . . , r,
while for j = r + 1, . . . , n we require that

(5.0.3) f'(x_j) = Φ'(x_j).

In this event all the x_j must be distinct, but they may coincide with some
of the x_i. We could equally well require that higher derivatives of f and
Φ shall be equal for certain values of x, provided that altogether we have
exactly n + 1 independent and consistent conditions imposed on the γ's.
Still other types of conditions may be used, and some will be discussed
later.
To return to (5.0.2), if we write

(5.0.4) y_j = f(x_j)

for brevity, then Eqs. (5.0.2) can be written

(5.0.5) y_j = Σ_i γ_iφ_i(x_j).

If the determinant

(5.0.6) Δ = |φ_i(x_j)| ≠ 0,

then these equations have a unique solution which can be written down
on applying Cramer's rule. It is clear that (5.0.6) will not be satisfied
if any two of the x_j are the same, since then two rows of the determinant
would be identical.
Equations (5.0.1) and (5.0.5) can be regarded as n + 2 homogeneous
equations in the n + 2 quantities −1, γ_0, γ_1, . . . , γ_n. Hence their
determinant must vanish:

(5.0.7)  | φ_0(x_0)  φ_1(x_0)  . . .  φ_n(x_0)  y_0  |
         | φ_0(x_1)  φ_1(x_1)  . . .  φ_n(x_1)  y_1  |
         | . . . . . . . . . . . . . . . . . . . . . | = 0.
         | φ_0(x_n)  φ_1(x_n)  . . .  φ_n(x_n)  y_n  |
         | φ_0(x)    φ_1(x)    . . .  φ_n(x)    Φ(x) |


This can be regarded as an equation in Φ(x). If we expand this determi-
nant by elements of the last row, the last term will be ΔΦ(x), and every
other term will be equal to some φ_i(x) multiplied by its cofactor, which is
a constant. Hence when we solve for Φ(x), we shall have Φ(x) expressed
as a linear combination of the functions φ_i(x) in just the form (5.0.1), as
required. Also if we set x = x_j in (5.0.7) and subtract row j + 1 from
the last row, we get

Δ(Φ(x_j) − y_j) = 0,

which shows that Φ(x_j) = y_j, and the values of Φ at the points x_j agree
with those of f(x). We can write the solution of (5.0.7) in the form

(5.0.8)  ΔΦ(x) = −| φ_0(x_0)  φ_1(x_0)  . . .  φ_n(x_0)  y_0 |
                  | . . . . . . . . . . . . . . . . . . . .  |
                  | φ_0(x_n)  φ_1(x_n)  . . .  φ_n(x_n)  y_n |
                  | φ_0(x)    φ_1(x)    . . .  φ_n(x)    0   |

If we expand the determinant on the right of (5.0.8) by elements of the
last row, and divide by Δ, we obtain the form (5.0.1). Also we note that,
if we expand along the last column and divide by Δ, we obtain a form

(5.0.9) Φ(x) = Σ y_jA_j(x),

where each A_j(x) is itself a particular linear combination of the φ_i(x) with
coefficients depending only upon the x_j. Hence for a particular set of
fundamental points the A_j can be calculated once and used for any f(x).
This would be useful, for example, if interpolations are to be made for
each of several different functions, all of which are tabulated for the same
values x_j.
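In practice the coefficients γ_i are usually obtained by solving the linear system (5.0.5) directly, rather than by evaluating the determinants. A minimal Python sketch, assuming NumPy and basis functions supplied by the caller (the function name is ours):

import numpy as np

def interpolate(phis, xs, ys):
    """Return Phi(x) = sum_i gamma_i * phi_i(x) with Phi(x_j) = y_j,
    by solving the system (5.0.5).  `phis` is a list of n + 1 basis functions."""
    A = np.array([[phi(xj) for phi in phis] for xj in xs])  # A[j, i] = phi_i(x_j)
    gamma = np.linalg.solve(A, np.array(ys, dtype=float))   # fails if determinant (5.0.6) vanishes
    return lambda x: sum(g * phi(x) for g, phi in zip(gamma, phis))

# Example: the quadratic through (0, 1), (1, 2), (2, 5).
P = interpolate([lambda x: 1.0, lambda x: x, lambda x: x * x],
                [0.0, 1.0, 2.0], [1.0, 2.0, 5.0])
# P(1.0) reproduces 2.0 up to roundoff.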
Equation (5.0.9) exhibits an important property of interpolating func-
tions: their linearity. Thus, to make explicit the fact that Φ is the
interpolating function for f(x), let us designate it Φ(f; x) and write
(5.0.9) in the form

(5.0.10) Φ(f; x) = Σ f(x_i)A_i(x).

By the same rule, if g is any other function, its interpolating function is

Φ(g; x) = Σ g(x_i)A_i(x).

But then if λ and μ are any constants, and h(x) = λf(x) + μg(x), it
follows that

(5.0.11) Φ(h; x) = Σ [λf(x_i) + μg(x_i)]A_i(x)
                 = λΦ(f; x) + μΦ(g; x).

It is understood that the basic functions φ_i and the fundamental points x_i
are fixed throughout.
We may note further that, since by (5.0.7) Φ(φ_j; x) = φ_j, therefore

(5.0.12) φ_j(x) = Σ_{i=0}^{n} φ_j(x_i)A_i(x)   (j = 0, 1, . . . , n).
Suppose we require now that at x_n the derivatives of Φ and f shall be
equal:

Φ'(x_n) = y'_n = f'(x_n).

Then x_n may or may not coincide with one of the other x_j. We modify
(5.0.7) in the next to the last line and write

(5.0.13)  | φ_0(x_0)   φ_1(x_0)   . . .  φ_n(x_0)   y_0  |
          | . . . . . . . . . . . . . . . . . . . . . .  | = 0.
          | φ'_0(x_n)  φ'_1(x_n)  . . .  φ'_n(x_n)  y'_n |
          | φ_0(x)     φ_1(x)     . . .  φ_n(x)     Φ(x) |

Again the function Φ(x) which satisfies this equation is a linear combina-
tion of the functions φ_i(x) which takes on the prescribed values y_j for
j = 0, 1, . . . , n − 1. Since (5.0.13) is an identity, we can differenti-
ate, and only the last line is affected, all other elements being constant.
Hence we see that Φ' takes on the prescribed value at x_n. If we require
the derivatives of Φ and f to be equal for any other x_j, we replace also
that row of (5.0.13) by a row of derivatives, and again the equation
defines the required function Φ. This procedure can be used for deriva-
tives of any order where the solution (5.0.8) is modified only by replacing
appropriate rows in the two determinants by rows of derivatives of the
order required. The form (5.0.1) still holds, and the form (5.0.9) is
modified only by the replacement of certain y_j by the value of the deriva-
tive. The linearity property is unaffected.
5.01. Some Expressions for the Remainder. The determinant

(5.01.1) W(x) = | φ_0(x)         . . .  φ_n(x)         |
                | . . . . . . . . . . . . . . . . . .  |
                | φ_0^{(n)}(x)   . . .  φ_n^{(n)}(x)   |

is known as the Wronskian. If this remains different from zero every-
where on the interval of interpolation (a, b), one can define the linear
operator L_{n+1} by the relation

(5.01.2) L_{n+1}[φ] = W^{−1}(x) | φ_0(x)           . . .  φ_n(x)           φ(x)           |
                                | . . . . . . . . . . . . . . . . . . . . . . . . . . .  |
                                | φ_0^{(n+1)}(x)   . . .  φ_n^{(n+1)}(x)   φ^{(n+1)}(x)   |
                     = φ^{(n+1)} + a_1φ^{(n)} + · · · + a_{n+1}φ,

and the linear differential equation of order n + 1

(5.01.3) L_{n+1}[φ] = 0

is satisfied by each φ_i(x). Moreover every solution of (5.01.3) is expres-
sible as a linear combination of the φ_i.
In general, for ν ≤ n one can similarly define the linear operator L_{ν+1},
and the differential equation of order ν + 1

(5.01.4) L_{ν+1}[φ] = 0

is satisfied by φ_0, φ_1, . . . , φ_ν, and every solution of (5.01.4) is expressible
as a linear combination with constant coefficients of these φ's.
An equivalent definition of the operators L_{ν+1} can be obtained as
follows: Define the differential operator

(5.01.5) D = d/dx,

and select b_0 so that

(5.01.6) (D − b_0)φ_0(x) = 0,   b_0(x) = φ'_0(x)/φ_0(x).

Then

(5.01.7) L_1[φ] = (D − b_0)φ,

since the two differential equations L_1[φ] = 0 and (D − b_0)φ = 0 are
both satisfied by φ_0. Again let b_1 satisfy

(5.01.8) (D − b_1)(D − b_0)φ_1 = 0,   b_1 = (φ'_1 − b_0φ_1)'/(φ'_1 − b_0φ_1).

Hence

(5.01.9) L_2[φ] = (D − b_1)(D − b_0)φ = (D − b_1)L_1[φ].

Proceeding sequentially, we define b_2, b_3, . . . , b_n so that

(5.01.10) L_{n+1}[φ] = (D − b_n) · · · (D − b_0)φ = (D − b_n)L_n[φ].
A generalization of Rolle's theorem can now be stated: If the functions
b_i(x) are all analytic on the interval (a, b), and if φ(x) is analytic, and
vanishes n + 2 times, counting multiplicities, then L_{n+1}[φ] vanishes at
least once on the interval.
First consider any two consecutive zeros of φ, and define

ψ(x) = φ(x) exp [−∫^x b_0(t)dt].

Then

ψ'(x) = L_1[φ] exp [−∫^x b_0(t)dt].

Then ψ and φ vanish together, as do ψ' and L_1[φ]. By Rolle's theorem
ψ' vanishes at least once between consecutive zeros of φ. By a simple
extension of the argument, L_2[φ] vanishes at least once between consecu-
tive zeros of L_1[φ]. Eventually we conclude that L_{n+1}[φ] vanishes at
least once between consecutive zeros of L_n[φ], and hence at least once
on the interval.
Define the function

(5.01.11) g(x, s) = W^{−1}(s) | φ_0(s)           . . .  φ_n(s)           |
                              | . . . . . . . . . . . . . . . . . . . .  |
                              | φ_0^{(n−1)}(s)   . . .  φ_n^{(n−1)}(s)   |
                              | φ_0(x)           . . .  φ_n(x)           |
                   = Σ g_i(s)φ_i(x).

Then as a function of x, g(x, s) satisfies (5.01.3), and

(5.01.12) ∂^ig(x, s)/∂x^i |_{x=s} = 0,   i = 0, 1, . . . , n − 1,
                                  = 1,   i = n.

Hence one can verify directly that

(5.01.13) y(x) = Σ a_iφ_i(x) + ∫_a^x g(x, s)ψ(s)ds

satisfies the nonhomogeneous equation

(5.01.14) L_{n+1}[y] = ψ

for any constants a_i.
Any solutions y_i(x) of (5.01.3) with nonvanishing Wronskian could
replace the φ_i in (5.01.11), and the same g(x, s) would result. This can
be verified directly by writing

φ_i = Σ α_{ij}y_j

and substituting into (5.01.11), in which case the determinant |α_{ij}|
appears as a factor which cancels out. Otherwise one can observe that
the initial conditions (5.01.12) define the solution g(x, s) of (5.01.3)
uniquely. In particular the A_i(x) are linear combinations of the φ_i and
hence satisfy (5.01.3), together with the conditions

(5.01.15) A_i(x_j) = δ_{ij}.

If the A_i replace the φ_i in (5.01.11), one can write

(5.01.16) g(x, s) = Σ Γ_i(s)A_i(x).

Note that

(5.01.17) g(x_j, s) = Σ Γ_i(s)A_i(x_j) = Γ_j(s),

whence one can write

(5.01.18) g(x, s) = Σ A_i(x)g(x_i, s).

From (5.01.12) and (5.01.16) we verify that

(5.01.19) y(x) = Σ β_iA_i(x) + Σ A_i(x) ∫_{x_i}^x Γ_i(s)ψ(s)ds


satisfies (5.01.14) with y(x_i) = β_i. In particular

(5.01.20) h(x) = Σ A_i(x) ∫_{x_i}^x Γ_i(s)ds

satisfies

(5.01.21) L_{n+1}[h] = 1,   h(x_i) = 0.

With h(x) defined by (5.01.20) or (5.01.21), we can obtain Petersson's
form for the remainder

(5.01.22) R(x) = f(x) − Φ(x) = f(x) − Σ y_iA_i(x),

i.e., the error made in representing f(x) by Φ(x). The function R(x)
vanishes at the n + 1 points x_i. For any x' ≠ x_i, we can choose C so that

f(x') − Φ(x') − Ch(x') = 0.

It is clear that h(x') ≠ 0, since otherwise h(x), which vanishes at every
x_i, would have at least n + 2 zeros, whence L_{n+1}[h] would vanish at
least once, contrary to (5.01.21). When C is chosen so that

f(x) − Φ(x) − Ch(x)

vanishes at x', that function has n + 2 zeros, whence

L_{n+1}[f(x) − Φ(x) − Ch(x)] = L_{n+1}[f(x)] − C

vanishes at least once. Hence for some ξ on (a, b), C = L_{n+1}[f(ξ)].
Hence

R(x') = L_{n+1}[f(ξ)]h(x').

Hence if we drop the prime, we have

(5.01.23) f(x) = Σ y_iA_i(x) + L_{n+1}[f(ξ)]h(x),

where ξ is some point on the interval, and h(x) satisfies (5.01.20) and
(5.01.21).
Since certainly f(x) satisfies

(5.01.24) L_{n+1}[y] = L_{n+1}[f],

we can apply (5.01.19) and assert that

(5.01.25) f(x) = Σ A_i(x){y_i + ∫_{x_i}^x Γ_i(s)L_{n+1}[f(s)]ds}
              = Φ(x) + Σ A_i(x) ∫_{x_i}^x Γ_i(s)L_{n+1}[f(s)]ds.

This can also be written

(5.01.26) f(x) = Φ(x) + Σ A_i(x) ∫_{x_i}^x g(x_i, s)L_{n+1}[f(s)]ds,

because of (5.01.17).
Since Φ(x) = Σ y_iA_i(x), the remainder R can be written

(5.01.27) R(x) = ∫_a^x g(x, s)L_{n+1}[f(s)]ds − Σ A_i(x) ∫_a^{x_i} Γ_i(s)L_{n+1}[f(s)]ds

after applying (5.01.16) to the first integral. The formula remains
equally valid, however, when the end point b of the interval of interpola-
tion replaces the end point a in the limits of integration. When this
replacement is made and the two equivalent expressions for R(x) are
combined, one obtains the more symmetric form,

(5.01.28) R(x) = ∫_a^b K(x, s)L_{n+1}[f(s)]ds,
          2K(x, s) = g(x, s) sgn (x − s) − Σ A_i(x)Γ_i(s) sgn (x_i − s),

where sgn u is the signum function whose value is +1 when the argument
is positive and −1 when the argument is negative. Although this func-
tion is discontinuous where the argument vanishes, nevertheless the
kernel K(x, s) remains continuous, since g(x, s) vanishes at s = x, and
Γ_i(s) = g(x_i, s) vanishes at s = x_i.
It is possible to generalize this development to cases where certain con-
ditions y_i = Φ(x_i) are replaced by conditions of the form f^{(ν)}(x_i) = Φ^{(ν)}(x_i),
which require the equality of derivatives of f and the approximating
function Φ, rather than equality of their functional values. As one
special case, consider the requirements that at some point a

f^{(ν)}(a) = Φ^{(ν)}(a),   ν = 0, 1, . . . , n.

This gives the Taylor expansion when the functions are polynomials.
Let the functions ψ_ν(x) be chosen so that

L_{ν+1}[ψ_ν(x)] = 0,   ψ_ν(a) = ψ'_ν(a) = · · · = ψ_ν^{(ν−1)}(a) = 0,   ψ_ν^{(ν)}(a) = 1.

In (5.01.13) take the lower limit of integration at a, replace the φ_i by the
equivalent set ψ_i, and let

ψ(s) = L_{n+1}[f(s)].

Hence f(x) can be represented in the form

f(x) = β_0ψ_0(x) + β_1ψ_1(x) + · · · + β_nψ_n(x) + ∫_a^x g(x, s)L_{n+1}[f(s)]ds,

provided the β's are properly selected. On setting x = a, we find

β_0 = f(a).

Next, apply the operator L_1 and again set x = a to obtain

β_1 = L_1[f(a)].

Proceeding thus we finally arrive at Petersson's generalized Taylor
expansion

(5.01.29) f(x) = f(a)ψ_0(x) + L_1[f(a)]ψ_1(x) + · · · + L_n[f(a)]ψ_n(x)
              + ∫_a^x g(x, s)L_{n+1}[f(s)]ds.
5.1. Polynomial Interpolation. Consider now the case of polynomial
interpolation. Equation (5.0.1) becomes

(5.1.1) P(x) = c_0 + c_1x + c_2x² + · · · + c_nx^n.

The determinant Δ is the Vandermonde determinant,

(5.1.2) Δ = Π_{i>j} (x_i − x_j),

which vanishes if and only if any two of the x_i coincide. Equation (5.0.8)
takes the form

(5.1.3) ΔP(x) = −| 1  x_0  . . .  x_0^n  y_0 |
                 | 1  x_1  . . .  x_1^n  y_1 |
                 | . . . . . . . . . . . . . |
                 | 1  x_n  . . .  x_n^n  y_n |
                 | 1  x    . . .  x^n    0   |

which coincides with (5.1.1) if we expand along the last row, but has the
form

(5.1.4) P(x) = Σ y_iL_i(x)

when we expand along the last column. The L_i are themselves poly-
nomials with coefficients which depend only upon the x_i. These poly-
nomials are

(5.1.5) L_i(x) = Π_{j≠i} [(x − x_j)/(x_i − x_j)].

They can be obtained by direct expansion of the determinant, or we can


verify that they satisfy the necessary conditions if we note that

(5.1.6) L_i(x_j) = δ_{ij},

with δ_{ij} the Kronecker δ. From this it follows that with L_i(x) defined by
(5.1.5) and P(x) by (5.1.4) we have P(x_j) = y_j.
We can write L_i(x) in another form if we define

(5.1.7) ω(x) = Π_i (x − x_i),

for then

(5.1.8) ω'(x) = Σ_j Π_{i≠j} (x − x_i),

(5.1.9) ω'(x_j) = Π_{i≠j} (x_j − x_i).
Hence

(5.1.10) L_i(x) = ω(x)/[(x − x_i)ω'(x_i)],

and therefore

(5.1.11) P(x) = ω(x) Σ_j y_j/[(x − x_j)ω'(x_j)],

or

(5.1.12) P(x) = ω(x) Σ_j f(x_j)/[(x − x_j)ω'(x_j)].

The form (5.1.4) with any of the equivalent representations of the
L_i(x) is the Lagrange interpolation formula.
If f(x) is itself a polynomial of degree not greater than n, then f(x)
and P(x) are identical. Hence the L_i(x) satisfy the n + 1 identities

(5.1.13) x^j = Σ_i x_i^jL_i(x)   (j = 0, 1, . . . , n).

This is the identity (5.0.12) for the case φ_j = x^j.
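For computation, (5.1.4) and (5.1.5) translate directly into a short routine. A Python sketch (illustrative only; the function name is ours):

def lagrange(xs, ys, x):
    """Evaluate the Lagrange interpolation polynomial (5.1.4)-(5.1.5) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        Li = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                Li *= (x - xj) / (xi - xj)   # the factors of (5.1.5)
        total += yi * Li
    return total

# lagrange([0, 1, 2], [1, 2, 5], 1.5) gives 3.25, the value of x**2 + 1 at 1.5.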
The explicit forms for the L_i(x) become rather complicated when
derivatives of P and f are to be equated at some of the points x_i, but in
any particular case they can be formed from the determinantal expression.
However, an important special case arises when both conditions P = f
and P' = f' are to be imposed at every x_i. For this case we have Her-
mite's interpolation formula for the polynomial H(x) of degree 2n + 1
satisfying

(5.1.14) H(x_i) = f(x_i),   H'(x_i) = f'(x_i)   (i = 0, 1, . . . , n).

We can surely express this in the form

(5.1.15) H(x) = Σ y_ih_i(x) + Σ y'_iH_i(x)

with suitable polynomials h_i(x) and H_i(x), each of degree 2n + 1 or less.
Instead of writing down the appropriate determinant (5.0.13) and expand-
ing, it is easier to proceed indirectly. If in (5.1.15) we set x = x_j and
apply (5.1.14), it follows that

f(x_j) = Σ f(x_i)h_i(x_j) + Σ f'(x_i)H_i(x_j).

This relation must hold whatever may be the values of f(x_i) and of f'(x_i).
Thus, we may have f(x_j) = 1 while f(x_i) = 0 for every i ≠ j, and while
f'(x_i) = 0 for all i. This implies that h_j(x_j) = 1. On the other hand,
if, for some particular k ≠ j, f(x_k) = 1 while f(x_i) = 0 for i ≠ k, and
f'(x_i) = 0 for all i, then we find that h_k(x_j) = 0 for k ≠ j. Setting some
f'(x_k) = 1 while all other f'(x_i) = 0 and all f(x_i) = 0 shows that every
H_k(x_j) = 0.
This argument, with an analogous one applied to

(5.1.16) H'(x_j) = Σ y_ih'_i(x_j) + Σ y'_iH'_i(x_j),

shows that the polynomials h_i(x) and H_i(x) must satisfy

(5.1.17) h_i(x_j) = δ_{ij},   H_i(x_j) = 0,
         h'_i(x_j) = 0,       H'_i(x_j) = δ_{ij}.

We ask now whether with appropriately chosen linear polynomials
v_i(x) and w_i(x) the polynomials h_i and H_i may have the form

(5.1.18) h_i(x) = v_i(x)L_i²(x),   H_i(x) = w_i(x)L_i²(x).

These are, indeed, of the necessary degree 2n + 1. From (5.1.6) all
conditions are satisfied for h_i(x_j) and H_i(x_j) with j ≠ i. Before examin-
ing the case for j = i, note that, since by (5.1.10)

ω(x) = ω'(x_i)(x − x_i)L_i(x),

therefore

ω'(x) = ω'(x_i)[L_i(x) + (x − x_i)L'_i(x)],
ω''(x) = ω'(x_i)[2L'_i(x) + (x − x_i)L''_i(x)].

Hence ω''(x_i) = 2ω'(x_i)L'_i(x_i), or 2L'_i(x_i) = ω''(x_i)/ω'(x_i). Now

h_i(x_i) = v_i(x_i)L_i²(x_i) = v_i(x_i)

by (5.1.6). Hence, by (5.1.17), v_i(x_i) = 1. Also, by differentiating
(5.1.18),

h'_i(x) = v'_i(x)L_i²(x) + 2v_i(x)L_i(x)L'_i(x),

so that, at x_i, 0 = v'_i(x_i) + ω''(x_i)/ω'(x_i), and since v_i is linear,

(5.1.19) v_i(x) = 1 − (x − x_i)ω''(x_i)/ω'(x_i).

Next 0 = H_i(x_i) = w_i(x_i)L_i²(x_i), so that w_i(x_i) = 0. Also

H'_i(x) = w'_i(x)L_i²(x) + 2w_i(x)L_i(x)L'_i(x),

so that 1 = w'_i(x_i). Hence

(5.1.20) w_i(x) = x − x_i.

Hermite's formula is therefore

(5.1.21) H(x) = Σ [y_iv_i(x) + y'_iw_i(x)]L_i²(x),

with v_i and w_i defined by (5.1.19) and (5.1.20).
Identities analogous to (5.1.13) hold for the h_i and H_i. In particular
for H(x) = 1 we have

(5.1.22) 1 = Σ v_i(x)L_i²(x).
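Hermite's formula (5.1.21) is likewise easy to evaluate directly, using 2L'_i(x_i) = ω''(x_i)/ω'(x_i) = 2Σ_{j≠i} 1/(x_i − x_j). A Python sketch of that evaluation (names ours, not from the text):

def hermite(xs, ys, dys, x):
    """Evaluate Hermite's formula (5.1.21): the polynomial of degree 2n + 1
    taking the values ys and the slopes dys at the points xs."""
    total = 0.0
    for i, xi in enumerate(xs):
        Li = 1.0
        dLi = 0.0                       # L_i'(x_i) = sum over j != i of 1/(x_i - x_j)
        for j, xj in enumerate(xs):
            if j != i:
                Li *= (x - xj) / (xi - xj)
                dLi += 1.0 / (xi - xj)
        vi = 1.0 - 2.0 * (x - xi) * dLi     # (5.1.19)
        wi = x - xi                          # (5.1.20)
        total += (ys[i] * vi + dys[i] * wi) * Li * Li
    return total

# Example: hermite([0, 1], [0, 1], [0, 0], 0.5) gives 0.5, the cubic 3x**2 - 2x**3 at 0.5.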

5.11. The Remainder Term. While the polynomial P(x) is determined
to be equal to f(x) at each of n + 1 distinct values of x, in general P ≠ f
at all other values. Some expressions for the remainder

R(x) = f(x) − Φ(x)

were derived in §5.01. A simpler derivation of one of these for the case
of polynomial interpolation will be given now. Practically this provides
only an upper bound for the error in terms of an upper bound for the
derivative of order n + 1 of f(x) on an interval which contains all the x_i.
Even though we cannot, or do not wish to, evaluate f or any of its deriva-
tives exactly, we may be able to set limits to the possible values of any
of these derivatives, and in this case the error estimates will be helpful.
Let x_{n+1} be the point at which the error is to be evaluated. Define
the function g(x) by

g(x) = | 1  x_0      . . .  x_0^{n+1}      f(x_0)     |
       | . . . . . . . . . . . . . . . . . . . . . .  |
       | 1  x_{n+1}  . . .  x_{n+1}^{n+1}  f(x_{n+1}) |
       | 1  x        . . .  x^{n+1}        f(x)       |

Then

g(x_0) = g(x_1) = · · · = g(x_{n+1}) = 0.

By Rolle's theorem, therefore, g'(x) must vanish at least once in each
of the n + 1 intervals between consecutive values of the x_i. If we let x'_i
designate the points at which g' vanishes, then again by Rolle's theorem
g''(x) must vanish at least once in each interval between consecutive
values x'_i. Continuing thus, we conclude finally that g^{(n+1)}(x) vanishes
at least once at a point ξ which lies somewhere on the interval between
the greatest and the least of the x_i. Hence for this ξ we have

| 1  x_0      . . .  x_0^{n+1}      f(x_0)         |
| . . . . . . . . . . . . . . . . . . . . . . . .  |
| 1  x_{n+1}  . . .  x_{n+1}^{n+1}  f(x_{n+1})     | = 0.
| 0  0        . . .  (n + 1)!       f^{(n+1)}(ξ)   |

This is exact, though we know nothing about ξ except the fact that it lies
somewhere on the interval named. If we expand along the last row, we
get two terms, one multiplied by (n + 1)! and the other by f^{(n+1)}(ξ);
and if we solve this equation for f(x_{n+1}) and drop the subscript, we get

(5.11.1) f(x) = P(x) + f^{(n+1)}(ξ)ω(x)/(n + 1)!,
where ω is the polynomial defined in (5.1.7). Hence the second term on
the right represents the amount by which P(x) deviates from f(x). This
corresponds to (5.01.23).
Now although we would not know the value of ξ, which is in any case a
function of x, nevertheless we may know an upper bound for f^{(n+1)} on the
entire interval. Let us call this M_{n+1}. Then for any x on the interval
containing all the x_i we have

(5.11.2) |R(x)| ≤ M_{n+1}|ω(x)|/(n + 1)!.

The right member of this inequality vanishes for x = x_i, as it should.
Between successive x_i, |ω(x)| rises to a relative maximum. With uniform
spacing of the x_i the maxima are highest near the ends of the interval, for
if x_0 < x_1 < · · · < x_n, then at one end the factor |x − x_n| is large, and
at the other |x − x_0| is large. Hence in this case the approximation is
best for values of x near the middle of the range. For values of x out-
side the range, the inequality (5.11.2) is still valid, provided we under-
stand M_{n+1} to represent a bound for f^{(n+1)} in the entire interval including
also x. But outside the range, ω(x) itself becomes increasingly large, and
this accounts for the high uncertainty of extrapolation.
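As a crude numerical illustration of (5.11.2), the following sketch estimates max |ω(x)| on a grid and returns the corresponding bound; the bound M for |f^{(n+1)}| must be supplied by the user (the helper name and the grid estimate are ours, not from the text):

from math import factorial

def error_bound(xs, M, lo, hi, samples=1000):
    """Evaluate the bound (5.11.2): M is an upper bound for |f^(n+1)| on (lo, hi),
    and max|w(x)| is estimated on an evenly spaced grid."""
    n = len(xs) - 1
    def w(x):
        p = 1.0
        for xi in xs:
            p *= (x - xi)
        return p
    max_w = max(abs(w(lo + (hi - lo) * k / samples)) for k in range(samples + 1))
    return M * max_w / factorial(n + 1)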
5.12. Chebyshev Polynomials and Optimum-interval Interpolation. In
the inequality (5.11.2) the factor M_{n+1} depends upon the particular func-
tion f(x) but not upon the distribution within the interval of interpolation
of the values x_i which determine the P(x). On the other hand, ω(x)
depends only upon the distribution of the x_i and not at all upon the
function. Ordinarily, in short calculations one has available a set of
tabulated values of f(x) and must accept them as they are given. But
when a table is being prepared, the location of the x_i can be chosen at
will, and some choices might be better than others.
All that we know about the variation with x of the error in the interpo-
lation is contained in the polynomial ω(x). The bounds of the error are
least exact at those points x where |ω(x)| is greatest. Hence it is natural
to prefer a selection of points x_i on the interval which reduces as far as
possible the greatest maximum of |ω(x)|. It is plausible to suppose that,
since the relative maxima ordinarily vary in height from one to the next,
a choice of the x_i that reduces the highest maximum will probably raise
some of the others. Hence we might anticipate that the minimal maxi-
mum will be had, if at all, only in a case where all maxima are equal.
And a succession of equal maxima suggests a trigonometric sine or
cosine.
Introduce a change of scale and origin so that the interval over which
the interpolations are to be made is the interval from −1 to +1. By a
well-known trigonometric identity, cos nθ is expressible as a polynomial
in cos θ of degree n. This is trivial for n = 0 and n = 1, while for n = 2,
cos 2θ = 2 cos² θ − 1. Suppose it verified that

(5.12.1) cos rθ = P_r(cos θ)

is a polynomial of degree r for r = 0, 1, . . . , n. Then

cos (n + 1)θ + cos (n − 1)θ = 2 cos nθ cos θ

by a formula from elementary trigonometry, whence

(5.12.2) P_{n+1} = 2P_nP_1 − P_{n−1}

is a polynomial of degree n + 1 in cos θ. Hence if

x = cos θ,   θ = cos^{−1} x,   0 ≤ θ ≤ π,

then P_n(x) is a polynomial in x of degree n, and since P_0 = 1, P_1 = x, it
follows from (5.12.2) that the coefficient of x^n in P_n is 2^{n−1}. Hence each
polynomial

(5.12.3) T_n = 2^{1−n} cos (n cos^{−1} x)   (n ≥ 1)

has leading coefficient 1, and since all its zeros lie on the interval, it is a
possible ω(x). We can prove that no polynomial R_n(x) exists with degree
n and leading coefficient 1 whose maximum absolute values are all
numerically less than those of T_n(x).
Since T_n(x) = 2^{1−n} cos nθ, therefore T_n(x) has the relative maxima and
minima of ±2^{1−n} for

θ_j = jπ/n   (j = 0, 1, . . . , n),

and hence for

x_j = cos (jπ/n).

Now if R_n is of degree n with leading coefficient 1, and has no maximum
or minimum numerically greater than those of T_n, then

T_n(x_0) − R_n(x_0) ≥ 0,   T_n(x_1) − R_n(x_1) ≤ 0,   . . . ,

since the maximum of T_n at x_0 cannot be less than the value of R_n at x_0,
the minimum of T_n at x_1 cannot exceed the value of R_n at x_1, . . . .
Hence the polynomial T_n − R_n must vanish at least n times on the
interval. But T_n − R_n is of degree only n − 1, and therefore

T_n − R_n ≡ 0.

The application of this theorem is that, if one chooses the n + 1 values
x_i to be the zeros of the polynomial T_{n+1}, making this the ω of (5.11.2),
then the greatest possible error of interpolation anywhere on the interval
is 2^{−n}M_{n+1}/(n + 1)! for any function whatever whose derivative of
order n + 1 does not exceed M_{n+1} on the interval. Any other choice of
fundamental points would replace the factor 2^{−n} by a larger one.
The polynomials defined by (5.12.3) are known as the Chebyshev
polynomials. Their zeros are easily found, for they vanish when

nθ = (2i + 1)π/2   (i = 0, 1, . . . , n − 1),

and hence when

(5.12.4) x_i = cos [(2i + 1)π/(2n)].

In these formulas it is understood that the range of interpolation has been
transformed to the interval from −1 to +1.
Function tables may contain hundreds or thousands of entries, and
for any particular interpolation one would expect to use only a few con-
secutive ones. When the table is to be printed in a book, ordinarily the
abscissas x_i are uniformly spaced. When tabular entries are required
for automatic computation, it is important to reduce to a minimum the
number of entries to be recorded. The use of the Chebyshev points
x_i may then be appropriate.
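A small sketch for generating the Chebyshev points: the zeros of T_{n+1}, from (5.12.4) with n + 1 in place of n, mapped to an arbitrary interval by the change of variable (5.12.5). Python, names ours:

from math import cos, pi

def chebyshev_points(n, a=-1.0, b=1.0):
    """The n + 1 zeros of T_{n+1}, mapped from (-1, 1) to (a, b)."""
    us = [cos((2 * i + 1) * pi / (2 * (n + 1))) for i in range(n + 1)]
    return [((b - a) * u + (b + a)) / 2.0 for u in us]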
Suppose that we are willing to use an interpolation polynomial of
degree n at most and that an error of magnitude ε can be tolerated. The
entire range of the variable is to be broken up into subintervals within
each of which at most n + 1 Chebyshev points x_i are to be selected at
which to evaluate the entries f(x_i) for the tabulation. The entries f(x_i)
on one of these subintervals will be used to determine the interpolation
polynomial for that interval. We would like to use as few of these sub-
intervals as possible, and hence we would like to make each subinterval
as long as it can be made without allowing the interpolation error to
exceed ε, or the degree of the polynomial to exceed n. In some circum-
stances an optimal solution is possible.
We consider here the problem of making a particular interval as
long as possible. When the end points a and b are known, then the
transformation
(5.12.5)   x = [(b − a)u + (b + a)]/2,   u = (2x − b − a)/(b − a),

transforms the interval (a, b) in the variable x to the interval (−1, 1)
in the variable u. If

φ(u) = Π_{i=0}^{n} (u − u_i),

where u_i is given by substituting x_i in (5.12.5), then

ω(x) = 2^{−n−1}(b − a)^{n+1}φ(u),

as is verified directly.
In view of (5.11.1), the condition that the error shall nowhere exceed ε
is that

|f^{(n+1)}(ξ)ω(x)| ≤ (n + 1)!ε.

Suppose |f^{(n+1)}(x)| is monotonically decreasing. Then this inequality is
surely satisfied if

|f^{(n+1)}(a)|M ≤ (n + 1)!ε,

where M represents the maximum of |ω(x)| on the interval (a, b). If
N is the maximum of |φ(u)| on the interval (−1, 1), this is equivalent to

(b − a)^{n+1}N|f^{(n+1)}(a)| ≤ (n + 1)!2^{n+1}ε.

Hence for a fixed a the longest admissible interval b − a would be that
for which the equality holds:

(b − a)^{n+1} = (n + 1)!2^{n+1}εN^{−1}|f^{(n+1)}(a)|^{−1}.

If the Chebyshev points are used, φ(u) = T_{n+1}(u), then N = 2^{−n}, and
therefore

(5.12.6) b = a + [(n + 1)!2^{2n+1}ε|f^{(n+1)}(a)|^{−1}]^{1/(n+1)}.

When |f^{(n+1)}(x)| is monotonically increasing, a and b can be interchanged.
If |f^{(n+1)}(x)| remains monotonic over the entire range of the tabulation,
the range can be divided into optimal intervals by starting at one end
and working toward the other; if it has a single maximum, one can start
with this and work toward the two ends. Other cases will require
special treatment.
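Under the assumption that |f^{(n+1)}| is monotonically decreasing, (5.12.6) can be applied repeatedly to divide a range into subintervals. A hedged Python sketch (the helper name and calling convention are ours; deriv_bound(a) must return |f^{(n+1)}(a)|):

from math import factorial

def optimal_intervals(a_end, b_end, n, eps, deriv_bound):
    """Divide (a_end, b_end) into subintervals by repeated use of (5.12.6),
    working from left to right; illustrative only."""
    breaks = [a_end]
    a = a_end
    while a < b_end:
        step = (factorial(n + 1) * 2 ** (2 * n + 1) * eps / deriv_bound(a)) ** (1.0 / (n + 1))
        a = min(a + step, b_end)
        breaks.append(a)
    return breaks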
5.13. Aitken's Method of Interpolation. We turn now to computational
procedures. Calculation of the L_i(x) directly from (5.1.5) or (5.1.10)
involves considerable labor if the degree is higher than one or two.
Tables of the L_i are available for equally spaced x_i. If these are not at
hand, or if the x_i are not equally spaced, then Aitken's method of compu-
tation is almost ideally simple.
We first obtain a generalization of the formula (5.1.3). Let P_i stand
for P(x_i); let P_{ij} stand for the linear interpolation polynomial determined
by (x_i, y_i) and (x_j, y_j); let P_{ijk} stand for the quadratic interpolation
polynomial determined by (x_i, y_i), (x_j, y_j), and (x_k, y_k); . . . ; and let
P_{01···n} stand for P itself. As for P_i, we can regard it as the interpolation
polynomial of zero degree determined by (x_i, y_i).
Note that

P_{ij} = P_{ji},

and that in general for any P_{ij···} permuting the subscripts leaves the
polynomial unchanged. Note also that

(5.13.1) P_{ij···k}(x_i) = y_i.
The generalization of (5.1.3) is that for any m, if M stands for the set of
subscripts m + 1, m + 2, . . . , n, then

(5.13.2)  | 1  x_0  . . .  x_0^m  P_{0M} |
          | 1  x_1  . . .  x_1^m  P_{1M} |
          | . . . . . . . . . . . . . .  | = 0.
          | 1  x_m  . . .  x_m^m  P_{mM} |
          | 1  x    . . .  x^m    P      |

If in place of the P_{iM} we were to write P_i, this equation would define the
interpolation polynomial of degree m determined by (x_0, y_0), . . . ,
(x_m, y_m).
In proof we observe first that the polynomial P defined by (5.13.2)
is of degree n at most, since in the expansion of the determinant x^m will
multiply each of the interpolation polynomials P_{iM}, which are of degree
n − m, and there are no terms of higher degree. Next we observe that,
if in the determinant we set x = x_i for i ≤ m, then P must take the
value assumed by P_{iM}, and this by (5.13.1) is y_i. This is true because
all other elements of the last row are then identical with corresponding
elements of the row i + 1. Finally, if in the determinant we set x = x_j
for j > m, then every P_{iM} becomes equal to y_j, making all elements but
the last in the last column equal to y_j times the corresponding elements
in the first column. Hence the determinant can vanish only if also P
has the value y_j. Hence the P defined by (5.13.2) is in fact the polynomial
P of (5.1.3), and the theorem is proved.
Aitken applies this principle in the following way: In application of
(5.1.3) with n = 1, we have

| 1  x_0  y_0    |
| 1  x_1  y_1    | = 0,
| 1  x    P_{01} |

and hence

(5.13.3) P_{01} = | x − x_0  y_0 | / (x_1 − x_0).
                  | x − x_1  y_1 |

Likewise

(5.13.4) P_{02} = | x − x_0  y_0 | / (x_2 − x_0).
                  | x − x_2  y_2 |

From the theorem then we can say that

(5.13.5) P_{012} = | x − x_1  P_{01} | / (x_2 − x_1).
                   | x − x_2  P_{02} |

In like manner we can form P_{023}, and then

(5.13.6) P_{0123} = | x − x_1  P_{012} | / (x_3 − x_1).
                    | x − x_3  P_{023} |
In a specific calculation x is a specific number, and the sequential com-
putations yield the numerical values assumed by the various polynomials
at that particular x. When different polynomials agree to a sufficient
number of significant figures, one terminates the process.
In applying these formulas, one can choose at will the particular poly-
nomials P_{ij···} to be evaluated, bearing in mind only that, when a pair
P_{ij···} and P_{kj···} are used to evaluate P_{ikj···}, they must agree in all
subscripts but one, while the unlike subscripts i and k determine the
x_i and x_k which are to appear explicitly. Aitken proposes a sequence and
tabulation as follows:

P_0
P_1   P_{01}                                      x_1 − x
P_2   P_{02}   P_{012}                            x_2 − x
P_3   P_{03}   P_{013}   P_{0123}                 x_3 − x
P_4   P_{04}   P_{014}   P_{0124}   P_{01234}     x_4 − x

However, the best approximation can be expected at any stage when the
abscissa x lies roughly in the middle of the interval containing the par-
ticular fundamental abscissas being utilized. Consequently it is advan-
tageous in using this scheme to order the abscissas so that either
· · · < x_4 < x_2 < x_0 < x < x_1 < x_3 < · · · , or else the reverse order holds.
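The scheme is easily programmed. A Python sketch of the successive evaluation at a single point x (names ours; the termination test mimics the remark above about agreement of successive polynomials):

def aitken(xs, ys, x, tol=1e-12):
    """Aitken's scheme at the point x.  After stage k, p[k] holds P_{01...k}(x);
    the process stops when two successive stages agree to within tol."""
    p = list(ys)                               # p[i] starts as P_i = y_i
    for k in range(1, len(xs)):
        for i in range(k, len(xs)):
            # the 2-by-2 determinant scheme of (5.13.3)-(5.13.6)
            p[i] = ((x - xs[k - 1]) * p[i] - (x - xs[i]) * p[k - 1]) / (xs[i] - xs[k - 1])
        if abs(p[k] - p[k - 1]) < tol:
            return p[k]
    return p[-1]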
5.14. Divided Differences. Aitken's method is disadvantageous when
a number of interpolations must be carried out over the same range.
An alternative to the computation of the Lagrange polynomials L_i(x) is
the use of divided differences in the construction of Newton's interpolation
formula.
The polynomial

(5.14.1) P(x) = a_0 + (x − x_0)a_1 + (x − x_0)(x − x_1)a_2 + · · ·
             + (x − x_0)(x − x_1) · · · (x − x_{n−1})a_n

is of degree n and assumes the values f(x_i) at x_i, provided

(5.14.2) f(x_0) = a_0,
         f(x_1) = a_0 + (x_1 − x_0)a_1,
         f(x_2) = a_0 + (x_2 − x_0)a_1 + (x_2 − x_0)(x_2 − x_1)a_2,
         . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

and the coefficients a_i can be determined recursively from these relations.
From these relations it is apparent that for any function f(x) the poly-
nomial P_{01···m}(x) as defined in the last section is

P_{01···m}(x) = a_0 + (x − x_0)a_1 + · · · + (x − x_0) · · · (x − x_{m−1})a_m,

where the coefficients are the same as the first m + 1 coefficients in P(x).
Finally, since a_n is the coefficient of x^n in (5.14.1), this must be equal
to the coefficient c_n of x^n in (5.1.1). Hence a_n = c_n is expressible as the
quotient of two determinants, where the denominator is the Vandermonde
of order n + 1, and the numerator is the same except that the elements
x_i^n are replaced by f(x_i). Hence a_m is expressible as the quotient of two
similar determinants of order m + 1. Thus, for the given function f
each coefficient a_m in (5.14.1) is a function of the m + 1 variables x_0,
x_1, . . . , x_m. This function is called a divided difference of order m
and is written

(5.14.3) a_m = f(x_0, x_1, . . . , x_m)
             = | 1  x_0  . . .  x_0^{m−1}  f(x_0) |   | 1  x_0  . . .  x_0^m |
               | . . . . . . . . . . . . . . . .  | / | . . . . . . . . . .  |
               | 1  x_m  . . .  x_m^{m−1}  f(x_m) |   | 1  x_m  . . .  x_m^m |

For the particular case when f = x^r for r < m this vanishes for any set of
fundamental points, while for r = m

f(x_0, x_1, . . . , x_m) = 1.

Hence the divided difference of order m for any polynomial of degree m
is a constant, and for any polynomial of degree less than m it vanishes.
The notation [x_0, x_1, . . . , x_m] is often found in the literature in place
of f(x_0, x_1, . . . , x_m), but this fails to place in evidence the function
whose divided difference is being written.
Now consider the expansions

P_{01} = f(x_0) + (x − x_0)f(x_0, x_1),
P_{02} = f(x_0) + (x − x_0)f(x_0, x_2),
P_{012} = f(x_0) + (x − x_0)f(x_0, x_1) + (x − x_0)(x − x_1)f(x_0, x_1, x_2).

By a formula analogous to (5.13.5), however, it is also true that

P_{012} = | x − x_1  P_{01}(x) | / (x_2 − x_1).
          | x − x_2  P_{02}(x) |

By equating the coefficients of x² in the two expressions for P_{012}(x), one
obtains the identity

f(x_0, x_1, x_2) = [f(x_0, x_2) − f(x_0, x_1)]/(x_2 − x_1).

Since the divided difference is symmetric in all variables, it follows also
that

f(x_0, x_1, x_2) = [f(x_1, x_2) − f(x_0, x_1)]/(x_2 − x_0)
               = [f(x_0, x_2) − f(x_1, x_2)]/(x_0 − x_1).

By (5.13.6) one finds likewise that

f(x_0, x_1, x_2, x_3) = [f(x_0, x_1, x_2) − f(x_1, x_2, x_3)]/(x_0 − x_3),

and in general

(5.14.4) f(x_0, x_1, x_2, . . .) = [f(x_0, x_2, . . .) − f(x_1, x_2, . . .)]/(x_0 − x_1),
where the omitted variables are the same in all three places. This being
the case, one can form divided differences of progressively higher order
according to the scheme:

           x_0   f(x_0)
                          f(x_0, x_1)
(5.14.5)   x_1   f(x_1)                 f(x_0, x_1, x_2)
                          f(x_1, x_2)                      f(x_0, x_1, x_2, x_3)
           x_2   f(x_2)                 f(x_1, x_2, x_3)
                          f(x_2, x_3)
           x_3   f(x_3)

where each f is equal to the difference of the two on its left, divided by
the difference of the x's on the diagonals with it.
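The table (5.14.5) can be generated in place; its top diagonal furnishes the coefficients a_0, a_1, . . . of (5.14.1). A Python sketch (names ours):

def divided_differences(xs, ys):
    """Build the table (5.14.5) and return its top diagonal
    f(x_0), f(x_0, x_1), f(x_0, x_1, x_2), ... — the coefficients of (5.14.1)."""
    coef = list(ys)
    for order in range(1, len(xs)):
        # work from the bottom up so lower-order entries are still available
        for i in range(len(xs) - 1, order - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - order])
    return coef

# divided_differences([0, 1, 2], [1, 2, 5]) gives [1, 1, 1], i.e. P(x) = 1 + x + x(x - 1).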
Another expression for the divided difference which can be obtained
directly from (5.14.3) is of some theoretical interest. The coefficient of
f(x_i) on the right of this identity is the quotient of two Vandermonde
determinants. When the common factors are canceled out, we are left
with

f(x_0, x_1, . . . , x_m) = Σ_{i=0}^{m} [ f(x_i) / Π_{j≠i} (x_i − x_j) ].

This brings out again the fact that the divided difference of any order is
symmetric in all the arguments x_i which appear in it.
5.141. Integral form of the remainder. Consider the function

φ(t) = f[(1 − t)x_0 + tx_1]

for fixed x_0 and x_1. This satisfies

φ'(t) = (x_1 − x_0)f'[(1 − t)x_0 + tx_1],
φ(0) = f(x_0),   φ(1) = f(x_1).

Hence integration of φ'(t) and division by (x_1 − x_0) gives

(5.141.1) f(x_0, x_1) = ∫_0^1 f'[(1 − t)x_0 + tx_1]dt.

Now consider

φ(t_1, t_2) = f[(1 − t_1)x_0 + (t_1 − t_2)x_1 + t_2x_2],
φ_{t_1}(t_1, t_2) = (x_1 − x_0)f'[(1 − t_1)x_0 + (t_1 − t_2)x_1 + t_2x_2],
φ_{t_1t_2}(t_1, t_2) = (x_1 − x_0)(x_2 − x_1)f''[(1 − t_1)x_0 + (t_1 − t_2)x_1 + t_2x_2].
Then

∫_0^{t_1} f''[(1 − t_1)x_0 + (t_1 − t_2)x_1 + t_2x_2]dt_2
  = (x_1 − x_0)^{−1}(x_2 − x_1)^{−1} ∫_0^{t_1} φ_{t_1t_2}(t_1, t_2)dt_2
  = (x_1 − x_0)^{−1}(x_2 − x_1)^{−1}[φ_{t_1}(t_1, t_1) − φ_{t_1}(t_1, 0)]
  = (x_2 − x_1)^{−1}{f'[(1 − t_1)x_0 + t_1x_2] − f'[(1 − t_1)x_0 + t_1x_1]}.

But if we now apply the original result (5.141.1), we find

∫_0^1 ∫_0^{t_1} f''[(1 − t_1)x_0 + (t_1 − t_2)x_1 + t_2x_2]dt_2 dt_1
  = (x_2 − x_1)^{−1}[f(x_0, x_2) − f(x_0, x_1)],

whereas the right member of this is by (5.14.4) equal to f(x_0, x_1, x_2).
Hence

(5.141.2) f(x_0, x_1, x_2) = ∫_0^1 ∫_0^{t_1} f''[(1 − t_1)x_0 + (t_1 − t_2)x_1 + t_2x_2]dt_2 dt_1,

and by a simple induction we have in general

(5.141.3) f(x_0, x_1, . . . , x_n) = ∫_0^1 ∫_0^{t_1} · · · ∫_0^{t_{n−1}} f^{(n)}[(1 − t_1)x_0
              + (t_1 − t_2)x_1 + · · · + t_nx_n]dt_n · · · dt_1.

In the special case when x_0 = x_1, the relation (5.141.1) gives

f(x_0, x_0) = f'(x_0).

Again when x_0 = x_1 = x_2, it follows from (5.141.2) that

f(x_0, x_0, x_0) = f''(x_0)/2,

and generally for m + 1 equal arguments

(5.141.4) f(x_0, x_0, . . . , x_0) = f^{(m)}(x_0)/m!.

Hence if the function and its derivatives up to and including the mth are
known at some point x_0, and the derivatives of the interpolation poly-
nomial are required to equal those of the function, the table of divided
differences is formed by writing f(x_0) m + 1 times, f'(x_0) m times, f''(x_0)/2!
m − 1 times, . . . , and constructing the rest of the table as before.
In general, whether or not there are repeated arguments, the identity

f(x_0, x) = [f(x) − f(x_0)]/(x − x_0)

is valid for any x, including in the limit x = x_0, and it can be written

(5.141.5) f(x) = f(x_0) + (x − x_0)f(x_0, x).

Again

f(x_0, x_1, x) = [f(x_0, x) − f(x_0, x_1)]/(x − x_1),
and the identity can be written

f(x_0, x) = f(x_0, x_1) + (x − x_1)f(x_0, x_1, x).

When this is substituted into (5.141.5), the result is

(5.141.6) f(x) = f(x_0) + (x − x_0)f(x_0, x_1) + (x − x_0)(x − x_1)f(x_0, x_1, x).

In general,

(5.141.7) f(x) = f(x_0) + (x − x_0)f(x_0, x_1) + · · ·
              + (x − x_0)(x − x_1) · · · (x − x_{n−1})f(x_0, x_1, . . . , x_n)
              + (x − x_0)(x − x_1) · · · (x − x_n)f(x_0, x_1, . . . , x_n, x).

If f were a polynomial of degree n or less, the last divided difference would
vanish identically. For an arbitrary f, the last term represents the error
made in replacing f by its interpolation polynomial of degree n as given
by the preceding terms. On introducing (5.141.3), we can write

f(x) = P(x) + R(x),

(5.141.8) R(x) = ω(x) ∫_0^1 ∫_0^{t_1} · · · ∫_0^{t_n} f^{(n+1)}[x_0 + t_1(x_1 − x_0) + · · ·
              + t_n(x_n − x_{n−1}) + t_{n+1}(x − x_n)]dt_{n+1} · · · dt_1,

where P(x) is the interpolation polynomial of degree n, and R(x) is the
remainder. The expression previously obtained for the remainder R(x)
involved the indeterminate quantity ξ known only to lie somewhere on the
interval containing x_0, x_1, . . . , x_n, and x. Note that in case all the
fundamental points x_0, . . . , x_n coincide the expression (5.141.7)
becomes a Taylor expansion, and (5.141.8) is a well-known formula for
the remainder.
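The leading terms of (5.141.7), which form the Newton interpolation polynomial, are evaluated most economically by nested multiplication. A Python sketch using the coefficients produced by the divided-difference table above (names ours):

def newton_eval(xs, coef, x):
    """Evaluate the Newton form (5.14.1)/(5.141.7) by nested multiplication,
    using the divided-difference coefficients a_0, ..., a_n."""
    result = coef[-1]
    for i in range(len(coef) - 2, -1, -1):
        result = result * (x - xs[i]) + coef[i]
    return result

# With the table above:  newton_eval([0, 1, 2], [1, 1, 1], 1.5) gives 3.25.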
5.15. Operational Derivation of Equal-interval Formulas. Naturally
the Lagrangean formulas for interpolation can be specialized to the case
of equally spaced abscissas x_i. However, a great many special forms
exist, and most of these are readily derived directly by the use of an
operational scheme.
Suppose that the entire tabulation is made at points . . . , x_{−2}, x_{−1},
x_0, x_1, x_2, . . . , and that

x_{i+1} = x_i + h

for every integer i. Then

x_i = x_0 + ih

for every i. The interval width h is assumed fixed throughout. On
making the change of variable

(5.15.1) x = x_0 + uh,   u = (x − x_0)/h,

the function f(x) becomes a function of u:

(5.15.2) f(x_0 + uh) = g(u),

for which the interval width is unity.
We now define a set of operators with respect to the interval h. For
any function f(x), the displacement operator E, the forward-difference
operator Δ, the backward-difference operator ∇, and the central-difference
operator δ are defined as follows:

(5.15.3) Ef(x) = f(x + h),
         Δf(x) = f(x + h) − f(x),
         ∇f(x) = f(x) − f(x − h),
         δf(x) = f(x + h/2) − f(x − h/2).
Since it is natural to require that

E²f(x) = E[Ef(x)] = Ef(x + h) = f(x + 2h),

and in general

(5.15.4) E^uf(x) = f(x + uh)

for any integer u, we may indeed go a step further and accept (5.15.4)
for all real u, integral or not. With this understanding we can write the
following formal relations between pairs of operators:

(5.15.5) Δ = E − 1 = E∇,   ∇ = 1 − E^{−1} = E^{−1}Δ,   δ = E^{1/2} − E^{−1/2}.

In principle any of the four operators can be expressed in terms of any
other, but the expressions are not all simple. Thus Δ is the "negative"
root of the quadratic

Δ² − δ²Δ − δ² = 0,

and ∇ is the "positive" root of

∇² + δ²∇ − δ² = 0.

In addition to the operators, we define also the factorials

(5.15.6) u^{(r)} = u(u − 1) · · · (u − r + 1)
               = (x − x_0)(x − x_1) · · · (x − x_{r−1})/h^r

and the generalized binomial coefficient

(5.15.7) u_{(r)} = u^{(r)}/r!.

Since Δu = 1, we find for these quantities

(5.15.8) Δu^{(r)} = ru^{(r−1)},   Δu_{(r)} = u_{(r−1)}.
Now if u is a positive integer, since E = 1 + Δ, it follows that

(5.15.9) E^u = 1 + u_{(1)}Δ + u_{(2)}Δ² + u_{(3)}Δ³ + · · · ,

and the series terminates after u + 1 terms. If these equivalent oper-
ators are applied to f_0 = f(x_0), we have

(5.15.10) f(x) = f_0 + u_{(1)}Δf_0 + u_{(2)}Δ²f_0 + u_{(3)}Δ³f_0 + · · · ,

where x is given by (5.15.1). If u is not a positive integer, the series
does not terminate. But the polynomial in x which results from replac-
ing u by its expression in terms of x on the right of

(5.15.11) P(x) = f_0 + u_{(1)}Δf_0 + · · · + u_{(n)}Δ^nf_0

is a polynomial of degree n in x, and for i = 0, 1, . . . , n it is true that
P(x_i) = f(x_i). Hence P(x) defined by (5.15.11) is the interpolation poly-
nomial determined at the points x_0, . . . , x_n. This is Newton's formula
for forward interpolation. In terms of x it is

(5.15.12) P(x) = f_0 + (x − x_0)Δf_0/h + · · ·
              + (x − x_0)(x − x_1) · · · (x − x_{n−1})Δ^nf_0/(n!h^n).

Note that the fundamental points which determine this polynomial are
the points whose abscissas are x_0, x_1, x_2, . . . , x_n.
Newton's formula for backward interpolation is obtained by expand-
ing E^u in powers of ∇:

(5.15.13) E^u = (1 − ∇)^{−u} = 1 + u_{(1)}∇ + (u + 1)_{(2)}∇² + (u + 2)_{(3)}∇³
              + · · · .

On dropping terms of degree higher than n, one has

(5.15.14) P(x) = f_0 + u_{(1)}∇f_0 + (u + 1)_{(2)}∇²f_0 + · · ·
              + (u + n − 1)_{(n)}∇^nf_0
              = f_0 + (x − x_0)∇f_0/h + · · ·
              + (x − x_0)(x − x_{−1}) · · · (x − x_{−n+1})∇^nf_0/(n!h^n).

For this the fundamental points are the points whose abscissas are x_{−n},
x_{−n+1}, . . . , x_{−1}, x_0. If these were the same as the points entering in
(5.15.12), but with different designations, then the two polynomials
(5.15.12) and (5.15.14) would be identical except in form. They would
also be identical if f were itself a polynomial of degree n, whether or not
the points were the same. In that case Δ^{n+1}f and ∇^{n+1}f would vanish,
and both polynomials P(x) would be identical with f.
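A short Python sketch of Newton's forward formula (5.15.11): the differences Δ^rf_0 are built first, and the coefficients u_{(r)} are accumulated as the sum is formed (names ours):

def forward_difference_table(fs):
    """Successive forward differences of equally spaced values: [f_0, Df_0, D2f_0, ...]."""
    row, table = list(fs), [fs[0]]
    while len(row) > 1:
        row = [row[i + 1] - row[i] for i in range(len(row) - 1)]
        table.append(row[0])
    return table

def newton_forward(x0, h, fs, x):
    """Newton's forward formula (5.15.11)-(5.15.12) evaluated at x."""
    u = (x - x0) / h
    diffs = forward_difference_table(fs)
    term, total = 1.0, diffs[0]
    for r in range(1, len(diffs)):
        term *= (u - (r - 1)) / r          # builds the coefficient u_(r) = u^(r)/r!
        total += term * diffs[r]
    return total

# newton_forward(0.0, 1.0, [1, 2, 5], 1.5) gives 3.25, the value of x**2 + 1 at 1.5.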
There are many different forms in which the interpolation polynomial
of degree n can be written, and these can be obtained, one from the others,
by neglecting differences of order n + 1 and higher, and then by renaming
the points. One simple scheme is based upon the "lozenge diagram."
In the array

u_rΔ^pf_q        (u + 1)_{r+1}Δ^{p+1}f_q
u_rΔ^pf_{q+1}    u_{r+1}Δ^{p+1}f_q

where parentheses are omitted from subscripts of u, the sum of the terms
in the upper row is equal to the sum of the terms in the lower row:

u_rΔ^pf_q + (u + 1)_{r+1}Δ^{p+1}f_q = u_rΔ^pf_{q+1} + u_{r+1}Δ^{p+1}f_q.

The identity follows directly from the relation

u_r(Δ^pf_{q+1} − Δ^pf_q) = [(u + 1)_{r+1} − u_{r+1}]Δ^{p+1}f_q,

which is easily verified. Consequently in the array

[Lozenge diagram: the tabulated values f_{−2}, f_{−1}, f_0, f_1, f_2, . . . stand in the
first column; the rth succeeding column contains the differences Δ^rf_j multiplied by
coefficients u_r, (u + 1)_r, (u + 2)_r, . . . ; neighboring entries are joined by dashes
sloping down to the right, the path through f_0, u_1Δf_0, u_2Δ²f_0, . . . being the one
summed in (5.15.11).]

the sum of any two terms connected by a dash pointing down to the right
is equal to the sum of the two terms connected by the dash just below.
Now if we start with f_0 and proceed diagonally downward, summing the
first n + 1 terms, we obtain the right member of (5.15.11). But the sum
of any other sequence of terms obtained by proceeding to the right and
ending with Δ^nf_j will have, according to the theorem, identically the
same value. Hence we obtain different expressions for the same interpo-
lation polynomial. By ending on Δ^nf_j for j ≠ 0, we obtain an interpola-
tion polynomial based upon a different set of fundamental points.
It has been remarked already that with uniform spacing the interpola-
tion is most accurate for points near the middle of the range. For compu-
tational purposes it is convenient if the coefficients are small. Both
conditions are satisfied if one designates as x_0 the fundamental point
closest to the point x to be interpolated and if the series contains only
terms as close as possible to the horizontal line through f_0. The two
Newton-Gauss formulas result:

(5.15.15) P(x) = f_0 + u_1Δf_0 + u_2Δ²f_{−1} + (u + 1)_3Δ³f_{−1}
              + (u + 1)_4Δ⁴f_{−2} + (u + 2)_5Δ⁵f_{−2} + · · ·

and

(5.15.16) P(x) = f_0 + u_1Δf_{−1} + (u + 1)_2Δ²f_{−1} + (u + 1)_3Δ³f_{−2}
              + (u + 2)_4Δ⁴f_{−2} + (u + 2)_5Δ⁵f_{−3} + · · · ,

the first following the lower, the second the upper broken line through f_0.
These last formulas are more neatly expressed by central differences.
In this notation the same lozenge diagram appears as follows:

[Central-difference lozenge diagram: the same array written with the central
differences δ^rf_j and coefficients u_r, (u ± 1)_r, (u ± 2)_r, . . . in place of the
forward differences.]
The two formulas (5.15.15) and (5.15.16) appear in these notations:

(5.15.17) P(x) = f_0 + u_1δf_{1/2} + u_2δ²f_0 + (u + 1)_3δ³f_{1/2} + (u + 1)_4δ⁴f_0
              + · · ·

and

(5.15.18) P(x) = f_0 + u_1δf_{−1/2} + (u + 1)_2δ²f_0 + (u + 1)_3δ³f_{−1/2}
              + (u + 2)_4δ⁴f_0 + · · · .

These two formulas can be combined to give a single more symmetric
formula, if we introduce the "central mean" operator μ defined by

(5.15.19) 2μ = E^{1/2} + E^{−1/2}.

If we add the two formulas and divide by 2, the result is

(5.15.20) P(x) = [1 + uμδ + u²δ²/2! + u(u² − 1²)μδ³/3!
              + u²(u² − 1²)δ⁴/4! + u(u² − 1²)(u² − 2²)μδ⁵/5! + · · · ]f_0.

If between two differences of odd order in the table one writes their mean,
the coefficients for the above formula all lie on the same horizontal line.
A great many other special formulas can be derived, but for these
reference must be made to the copious literature on interpolation.
5.151. The remainder in equal-interval interpolation. All interpolation
formulas that can be obtained from the lozenge diagram by following any
broken-line path to a particular difference δ^nf_0 are identical except in the
arrangement of the terms. Hence they all have the same remainder term.
If n = 2p is an even number, the remainder is

(5.151.1) R_{n+1} = h^{n+1}u(u² − 1²) · · · (u² − p²)f^{(n+1)}(ξ)/(n + 1)!,

where ξ is a point on the smallest interval containing x, x_{−p}, x_{−p+1}, . . . ,
x_0, . . . , x_p. If possible, for |u| < 1 a formula which ends in a difference
δ^nf_0, δ^nf_{−1}, or δ^nf_1 should be used, rather than one ending in some other
δ^nf_j, though this will not be possible when, for example, the tabulation
begins with u = 0. The same expression (5.151.1) for the remainder
holds for (5.15.20) if the series is terminated on a difference δ^nf_0 of even
order.
If 0 < u < 1, a formula (5.15.17) which ends on a difference of odd
order is most symmetric with respect to that point. If n = 2p − 1, then
in this case

(5.151.2) R_{n+1} = h^{n+1}u(u² − 1²) · · · [u² − (p − 1)²](u − p)f^{(n+1)}(ξ)/(n + 1)!.

To obtain an upper limit for the truncation error from R_{n+1} directly,
one must have an upper limit for f^{(n+1)} over the interval containing the x_i,
and this is not necessarily easy to obtain. However, it may be known
that each of certain consecutive derivatives retains a fixed sign over the
interval, and these signs may be known. In this event, Steffenson's
"error test" can be applied. This rests upon the simple observation
that in any series

S = u_0 + u_1 + · · · + u_n + R_{n+1},

where R_{n+1} is the error due to dropping terms beginning with u_{n+1}, if it is
known that R_{n+1} and R_{n+2} have opposite signs, then since

R_{n+1} = u_{n+1} + R_{n+2},

it follows that

|R_{n+1}| < |u_{n+1}|,

and the error is less than the first neglected term in absolute value.
5.2. Trigonometric and Exponential Interpolation. Given any set of
constants α_i and β_i, among the possible choices of basic functions φ_i(x)
for interpolation would be

φ_i(x) = sin (α_ix + β_i)

or

φ_i(x) = exp (α_ix + β_i).

The general methods described at the beginning of this chapter apply, but
not much more can be said when the constants α_i and β_i are completely
arbitrary and unrelated. Most often, however, the α_i will be in arithmetic
progression, in which case certain further simplifications are possible.
If, possibly after changing scale, one can set

α_i = i,

then for exponential interpolation one has

(5.2.1) Φ(x) = Σ γ_ie^{ix}.

However, if

z = e^x,

then

Φ(x) = Σ γ_iz^i = Ψ(z),

and the exponential interpolation is the same as polynomial interpolation
with a transformation on the independent variable.
Another exponential form is

(5.2.2) Φ(x) = γ_0 + γ_1e^x + γ_{−1}e^{−x} + · · · + γ_ne^{nx} + γ_{−n}e^{−nx}.

Given now the 2n + 1 points x_0, x_1, . . . , x_{2n}, let

(5.2.3) E_i(x) = Π_{j=0, j≠i}^{2n} {exp [(x − x_j)/2] − exp [−(x − x_j)/2]},

(5.2.4) e_i(x) = E_i(x)/E_i(x_i).

Then

e_i(x_j) = δ_{ij},

whence

(5.2.5) Φ(x) = Σ_{i=0}^{2n} y_ie_i(x)

satisfies

Φ(x_i) = y_i

and expands into a polynomial of the form (5.2.2).
In the form

(5.2.6) Φ(x) = γ_0 + γ_1e^{ix} + γ_{−1}e^{−ix} + · · · + γ_ne^{nix} + γ_{−n}e^{−nix},

if γ_j and γ_{−j} are conjugate complex quantities, then (5.2.6) can be written
in the form
(5.2.7) Φ(x) = α_0 + α_1 cos x + α_2 cos 2x + · · · + α_n cos nx
            + β_1 sin x + β_2 sin 2x + · · · + β_n sin nx.

Analogous to the functions E_i(x) and e_i(x) are the functions

(5.2.8) S_i(x) = Π_{j=0, j≠i}^{2n} sin [(x − x_j)/2],

(5.2.9) s_i(x) = S_i(x)/S_i(x_i).

Then

(5.2.10) Φ(x) = Σ_{i=0}^{2n} y_is_i(x).
5.3. Bibliographic Notes. The literature on interpolation, methods


of approximate representation, and numerical quadrature is vast, and
only a few elementary general principles are given here. On operational
methods see Whittaker and Robinson, Steffenson (1927), Aitken (1929),
McClintock (1895), Herget (1948), Jordan (1947), and the encyclopedias,
both French and German.
The usual expression for the remainder in polynomial interpolation
is (5.11.2); its generalization (5.01.23) is given by Petersson (1949).
Steffenson (1927) gives (5.141.8). The form (5.01.25) for polynomials
was given by Peano (1914; see also Mansion, 1914); Kowalewski (1932)
gave this and (5.01.28), still for polynomials only. The more general
form (5.01.25) for arbitrary interpolating functions was given by Sard
(1948a) and Milne (1948). See also Curry (1951a). These will
reappear in the following two chapters. Attention is called to subsequent
papers by Sard and collaborators.*
The discussion of optimal-interval interpolation is based on Harrison
(1949).
The designation ‘‘Chebyshev polynomials” is sometimes applied to
any set of orthogonal polynomials. The term ‘‘Chebyshev polynomial”
is sometimes applied to that polynomial of degree n which among all
polynomials of that degree most closely approximates the function under
consideration, where the measure of departure is the maximum absolute
difference between the polynomial and the function. This is not neces-
sarily the interpolating polynomial which agrees with the function at the
Chebyshev points (5.12.4), since in (5.11.1) it is not shown that ξ neces-
sarily maximizes f^{(n+1)} at the same time that x maximizes ω = T_n. No
*In the Russian translation of this book it is pointed out that remainder formulas similar
to those of Sard had been previously obtained by E. Ja. Remez: On some classes of linear
functionals in spaces cp and on remainder terms in formulas of approximate analysis, Trudy
Inst. Mat. AN U.S.S.R. 3(1939) :21-62, 4(1940) :47-82; On remainder terms for some
formulas of approximate analysis, Doklady Akad. Nauk, SSSR 26(1940) :130—134.
214 PRINCIPLES OF NUMERICAL ANALYSIS

simple algorithm is available in general for constructing the closest poly-


nomial approximation to a given function.
One might suppose that by increasing the degree of the interpolating
polynomial one would necessarily improve the approximation to any
given continuous function, at least when the fundamental abscissas are
always uniformly spaced. ‘However, this is not so. For discussion of
these and related problems see Bernstein (1926), Ahiezer (1948), de la
Vallée Poussin (1952), Fejér (1934), Jackson (1930), and Feldheim (1939).
Most treatments of interpolation discuss also the subject of ‘‘inverse
interpolation,” which is the evaluation of the independent variable at the
point where the function takes on a prescribed value, not tabulated. An —
obvious possibility is to interchange the roles of the dependent and
independent variables and interpolate in the ordinary manner. Other-
wise one can equate the interpolating polynomial to the prescribed
value and solve the resulting algebraic equation. In this connection see
Kincaid (1948b).
Most standard texts discuss divided differences. See Aitken (1932)
for the method called by his name and Neville (1934) for a similar method.
Tables of differences, ordinary or divided, are useful for detecting errors
in computed tables of functions. See Miller (1950).
CHAPTER 6

MORE GENERAL METHODS OF APPROXIMATION

6. More General Methods of Approximation


In selecting a particular linear combination

(6.0.1) B(x) = Lyi¢d;(z)


of functions ¢,(xz) to perform an interpolation, the aim is naturally to
determine a combination @ which approximates the given function f at
whatever point or points the function fis to be evaluated. In the method
of interpolation, so-called, certain values y; = f(x;) of f, and possibly
certain values y/” = f(x;), must be known already, or directly obtain-
able. Then the coefficients y; are determined by the requirement that
each y; = ®(z;) and that each y = 6(2z;). However, this is only one
- of many possible schemes for determining the ¥;.
An expansion in powers of — 2» by Taylor’s series up to and including
the term in (% — 2)” differs in appearance from interpolation, but can
be made a special case in which the conditions for determining the
coefficients are that
(6.0.2) (x) = f(x), (ML Re
Petersson’s generalized expansion which satisfies conditions (6.0.2) but
replaces polynomials by a more general set of functions ¢;(x) was given
in §5.01. The well-known expansions in Fourier or other orthogonal
series, with only a finite number of terms retained, illustrate other meth-
ods of obtaining approximate representations that are useful for some
purposes.
In this chapter will be treated only certain fairly immediate extensions
of the methods of interpolation.
6.1. Finite Linear Methods. In the expansion of a function as a
series of orthogonal functions, the coefficients are determined by integra-
tion. The methods to be described here require summation rather than
integration, and hence are called finite.
When the given quantities y; are not exactly, but only approximately,
equal to the f(z,), it is not necessarily worth while to require that the ®(z,)
be exactly equal tothey,;. Iftherearen + 1 quantities y;, approximately
equal to f at n + 1 distinct points x;, it may be saving in time, and quite
215
216 PRINCIPLES OF NUMERICAL ANALYSIS

adequate, to find some linear combination (6.0.1) of m+1<n-+1


functions ¢,, in such a way that the resulting (z;) will be as nearly as
possible equal to the y:;, without however expecting strict equality at
least at all of the points. This is especially the case when the y; result
from experimental measurements or result from computations in which
the round-off is not negligible. But if strict equality can no longer be
asked for, some other criterion must be found that will yield a system of
only m + 1 equations in the m + 1 unknowns but that will in some sense
treat all n + 1 of the points alike.
Let the index r run from 0 to m, to distinguish from 7, j, . . . which
run from 0 ton > m. We seek
(6.1.1) @(x) = 2r-¢-(x) = f(x) — R(e),
where R does not necessarily vanish even at the z;. The criterion will
be given in the following form: Choose m + 1 functions y,(x) so that
the matrix of the y,(z;) has rank m + 1, and then determine the 7, so
that the m + 1 conditions

(6.1.2) >,ve) Re) =0


are satisfied. If the matrix of the ¢,(z;) also has rank m+ 1, then
Eqs. (6.1.2) define the y; uniquely. Moreover, in the special case
m = n, ® becomes the ordinary interpolating function.
Let d, c, y, f,, and p,, respectively, represent the vectors whose elements
are R(x), Yr, Yi, or(%;), and y,(2;); let F and P represent the matrices
whose columns are the vectors f, and p,, respectively. It is required to
satisfy the equations
(6.1.3) Fe =y+d,
(6.1.4) P'd = 0.
Thus y is to be resolved into two components, one in the space of F and
the other orthogonal to the space of P. On combining these two equa-
tions one obtains
(6.1.5) PF = Ply,
If each matrix P and F has rank m+ 1 < n+ 1, then the matrix P'F
is nonsingular, and there is a unique solution y of (6.1.5). Given a
system of functions y,(x), the problem reduces to the multiplication of
the matrices and solution of the linear system (6.1.5), and this is a
problem already discussed at length in Chap. 2.
However, it may happen that at the outset one does not fee how
many Demet ¢, one should use to obtain an adequate representation in
the sense that the vector d has been made sufficiently small. This vector
MORE GENERAL METHODS OF APPROXIMATION 217
represents the amount by which the function © fails to agree even with
the given values y; of f. It is geometrically evident that, if the system
(6.1.5) of order m + 1 has been solved, and then also a new system of
order m + 2 obtained by adjoining a new function ¢,,1, the new residual
vector d cannot be greater than the old one, and in general will actually
be less. Moreover, if n + 1 functions are included, d will vanish since
we are back to the case of interpolation. One would like, if possible, to
start with a single function ¢o and adjoin sequentially ¢1, ¢o, ... ,
stopping only after the vector d has been made sufficiently small. It is
possible to do this without having to solve a new system each time by
forming a pair of biorthogonal systems of vectors.
The method can be developed by showing that unit triangular matrices
U and V can be found such that for the two matrices
(6.1.6) T = FV-, S = PU-,

the matrix product


(6.1.7) St = D
is a diagonal matrix. This is a generalization of the theorem shown at
the end of §2.201.
The theorem is trivial when P and F have but a single column each.
An inductive proof can be given, which also exhibits the algorithm by
assuming relations (6.1.6) and (6.1.7) and showing that, given vectors f,
independent of the columns of F, and p, independent of the columns of
P, it is possible to find vectors u, v, ¢, s, and a scalar 6, satisfying

r,o(¥ t)=@n,
(6.1.8) (S, 8) ee 3)= (P, p),
(S, )'(T, ) = (2 °)
The last relation gives 6 as

(6.1.9) st = 6
and requires that
(6.1.10) T = 0, S't = 0.
The first two require that
(6.1.11) Tv+t=f, Su+s = p.
Hence when the first of (6.1.11) is multiplied on the left by S™ and the
second by 7", s and ¢ are eliminated, and equations in v alone and in u
218 PRINCIPLES OF NUMERICAL ANALYSIS
alone result. Because of (6.1.7), and because D is diagonal and hence
symmetric, v and ware given by
(6.1.12) y = DS, u = D-I"p,
provided only that no diagonal element in D is zero. After v and u are
found from (6.1.12), ¢ and s are obtained from (6.1.11), and 6 from (6.1.9).
This establishes the induction and provides an algorithm for finding the
matrices S, T, U, V, and D.
Now observe that the columns of T are linear combinations of those of
F, and the columns of S are linear combinations of those of P. Equiva-
lent to (6.1.3) and (6.1.4) are the equations
TVe=y+d, Std I ad

Instead of solving for c, one solves for


(6.1.13) b= Ve
by the simple relation
(6.1.14) b = D-'STy.

To return to the induction, suppose that b is given and that the effect of
adjoining a new function ¢, and hence a new vector f, is to be investi-
gated. Vectors s and é are found by the method already described.
The vector b is then unaffected except by the adjunction of an additional
element
(6.1.15) B = o-tely.
The equation

(6.1.16) Tb=yt+d
becomes

7,9 (?)=y+a,
B
where d’ is the new residual, or on expanding and applying (6.1.16),
d+ip=d'.
The vector d is orthogonal to all columns of 7; d’ is orthogonal to all
columns of the enlarged matrix (7, ¢); in particular d’ and £¢ are orthog-
onal. Hence

(6.1.17) d'd = ptt + dd’.


Thus the adjunction of the new function ¢ reduces the squared ene
of the residual d by the amount f7t"t,
MORE GENERAL METHODS OF APPROXIMATION 219

Once it is decided how many functions ¢ are to be included in 4, one


can go to (6.1.13) and solve force. Alternatively, one can form the func-
tions 7,(x) corresponding to the columns t, of T. Thus let 7(x) represent
the row vector whose elements are the first m + 1 functions ro(zx), . . 2
Tm(x) ;let (x) represent the row vector whose elements are ¢o(zx), . . “aie J
om(xz). Then
(6.1.18) r(x) = o(x) V1.
Then &(x) is given in either of the two ways:
(6.1.19) @(r) = 7(x)b = O(x)e.
The matrix V and the functions 7(x) are entirely independent of the
function to be approximated, and depend only upon the functions ¢, and
y, and upon the points z;. Hence if the same basic functions and basic
points are to be used in the representation of more than one function
f(x), it is advantageous to find the 7(x) and the matrices T, S, V, U once
and for all.
6.101. An expression for the remainder. The solution c of Eqs. (6.1.5)
can be written
C= (Pi) Phy:
and hence the function © as

(6.101.1) O(2) = $(x)(P'F)“P'y.


If for the elements y; of the vector y one substitutes the values ¢,(z;)
of any ¢,, the result is
(6.101.2) é-(z) = $(x)(P'F) PY,
since the quantities ¢,(z;) are the elements of the column f, of the mat-
rixF. Thus (6.101.1) expresses ®(x) as a linear combination of the values
y; of f(x) the coefficients of which are certain functions ®,(z):

(6.101.3) ®(x) = Y Bilas,


and (6.101:2) implies that :

(6.101.4) $(z) = Y¥ 8) r(xj),


identically. ;
The Petersson expansion for a = a can be written

(6.1015) fa) = Yarda(z) + f°g(x, 8)Lmaalf(s)lds,


when the special functions denoted in §5.01 as y, are replaced by linear
combinations of the ¢,. Moreover,
220 PRINCIPLES OF NUMERICAL ANALYSIS

us =), entelas) +f." olen )Dmisl(olde


Now multiply this equation by %,(x), sum over j, and subtract from
(6.101.5). Because of (6.101.4), the result is

(6.101.6) R(x) = [”gz, 8)Lmslf(e)lds


— LBe) [Poles mel sol
*
If, as in §5.01, we write ib a
if ‘ we
‘h obtain on applying
(6.101.4)

(6.101.7) R(x) = ) (x) [ o(@s, 2)Lmesl


f(s)ds,
similar to (5.01.26). For the symmetric form given in (5.01.28), we
can again write (6.101.6) with the limit b instead of a and obtain

(6.101.8)
R@) = f° KG, 8)Lmealf(s)lds,
2K (a,s) = g(z, 8) sgn (w — 8) — BBi(x)g(a, 8) sgn (a; — 8).
6.11. Least-square Curve Fitting. The residual vector d always vanishes
when m = n, provided at each step the columns of F,, as well as those of
P, are kept linearly independent. Moreover, the vector c is then inde-
pendent of the choice of the functions y, given only the linear independ-
ence of the sets of functional values. But for any given m <n, the
vectors c and d and the length of d will depend upon the selection of the
functions y. Thus, for example, one could choose functions y each of
which vanishes at all but m + 1 of the points z;. This would give the
interpolating function (a) which passes through these m cas 1 points
and would take no account of the other points.
It is natural to ask that for whatever m < n one might fix upon the
vector c be chosen so that the vector d is as small as possible. In all cases
the vector y is resolved into two components one of which lies in the
space of F and d being the other component. Then the shortest length
possible for d is the length of the perpendicular from the point y to the
space of F. Hence the component Fc of y in the space of F is to be chosen
as the orthogonal projection of y upon this space. This is effected by
making y, = ¢,, and hence by taking the matrix P to be the same as F.
All equations in §6.1 can be specialized immediately to this case. It
may be remarked in passing that in many cases there are good statistical
grounds for minimizing the length (or the squared length) of d, but these
will not be developed here.
MORE GENERAL METHODS OF APPROXIMATION 221
Often experimental conditions may be such that certain values of
the y; may be less reliable than others. If so, greatest “weight” should
be given to the measurements of highest reliability. This can be effected
by associating a diagonal matrix W of order n + 1, with positive non-
null diagonal elements, in which the magnitude of each diagonal element,
say the jth, measures the degree of reliability attached to the value of the
measurement y;, The matrix W is then used as a metric for the space,
and the orthogonality and lengths of the vectors are taken with reference
to this metric. This can be achieved by setting

P= WF
in §6.1.
6.111. Least-square fitting of polynomials. A special simplification is
possible when the function © is to be a polynomial, and

(6.111.1) (2) = 2".


Let X be the diagonal matrix whose diagonal elements are the abscissas
x; Then

(6.111.2) fr = X"fo.
The argument used in proving Lanczos’s theorem (§2.22) can be applied
here to show that each column ¢,,1 of the matrix T is expressible as a
linear combination of Xt,, t,, and é,1. In fact,

to = fo,
(6.111.3) i = (Xx = 9591) to,

bet — (Xx mi a,6;1)t, ste OpOre bis r = ily

where

(6.111.4) a, = WwW Xt,.

From to = fo one calculates ao and 50, thence f:, ai, and 4;, and so on,
sequentially. The functions 7,(r%) are orthogonal on the set of points
x; and are given by

To(2) = it,
(6.111.5) 9 r1(z) = (&@ — aodp")ro(z),
Troi(%) = (% — a,d>1)7-(x) — 6,671,77-1(2), POPE

6.12. Finite Fourier Expansions. Let the abscissas %o, 41, . . + » Ln—1
be uniformly spaced. After making a linear change of variable, we can
assume that

(6.12.1) x, = k/n,
222 PRINCIPLES OF NUMERICAL ANALYSIS

and the function f(z) is to be considered only on the range 0 < a< 1,
or is supposed periodic of period 1. Make a further change of variable by
(6.12.2) =exp (2riz), t=+W/-l.
Then
(6.12.3) w, = exp (2rik/n),
and
(6.12.4) wi, = exp (2rijk/n) = of.
The function f(z) becomes a function of w, F(w), and the interpolating .
polynomial in w gives a representation of f(x) of the form

f(x) = Bo + Bi exp (2rix) + Bo exp (4mix) + + - :


+ Bn1 exp [2(n — 1)riz] + R(x).
If f(z) is real, then either each term is real or else its conjugate complex
also occurs, if it is understood that

(6.12.5) exp (2kix) = cos (2krx) + 7 sin (2krz).


Suppose n = pm is some integral multiple of m, and consider a repre-
sentation

(6.12.6) f(z) = Bo + Bi exp (2ripr) + + > > + Bn—i[2(m — 1)ripz]


+ R(x)
or

(6.12.7) Fw) = Bot Biw? + + + + + Bmw?) + Pw).

If (6.12.3) is taken as valid for all integral values of k, positive and


negative, which amounts to so defining the w, for k negative, or exceeding
n — 1, then (6.12.4) is valid for all integral values of j and k. Hence
n—1
me a PG—h aS
k=0
k=0

Any w, satisfies w = 1, but if k ~ 0, then w #1. But since


ot —LS — Dw
+ or? +--+ +4),
it follows that for k ~ 0

op+tot? + +++ +o,+1=0.


Hence

6.12.8 ae Soph — n if Ie
j ate
( ) > ENE 0. ia oh
MORE GENERAL METHODS OF APPROXIMATION 223
Hence the functions w?/ and w-?* are biorthogonal on the set «, in the
sense of §6.1. Then if the functions w?/ are taken as the 7;, and w-?* as
the o; of §6.1, the coefficients B are given by

(6.12.9) B= 1 >;yrcor?i,
P
If 27 ¥ m, then

or
}
Bm—j = ) Yray?™ = ) yr,
p
B= mY mleos (2rpjk/n) — i sin (2npjk/n)],
k

Bm» = 17} > yrleos (Qrpjk/n) + 7 sin (Qrpjk/n)].


k
Hence, if
Aj = n7 y yx cos (2npjk/n),
(6.12.10) e
B; = 7r2 > yy sin (Qnpjk/n),
k
then
Baek,
Bae ae Ax + tBi,

and
Bi2?* + Bmw ?* = 2A, cos (Qrpkx) + 2B, sin (Qrpkz).

Hence if m is odd, the representation (6.12.6) can be written


(m—1)/2
(6.12.11) f(x) = 40+ 2 Ax cos (2rpkx)
k=1
(m—1)/2 ;

+2 y B, sin (2rpkx) + R(z),


k=1

with the A’s and B’s given by (6.2.10). In case m is even, Bn/2 = 0,
while Amz = Oif nis even, and An, = 1/n if n is odd.
6.2. Chebyshev Expansions. It was shown in the last chapter that if
T,, is the Chebyshev polynomial of degree n, then 7’, is that polynomial
of degree n and leading coefficient unity whose maximum departure from 0
on the interval from —1 to +1 is least. This property was used there to
obtain a particular set of points of interpolation that was optimal in the
sense there described. The property may, however, be utilized in a
different way to obtain an approximate representation of f(x). If f(x)
is expanded in a series of polynomials T,,, and if the coefficients of these
224 PRINCIPLES OF NUMERICAL ANALYSIS
polynomials in the expansion do not themselves increase too rapidly,
then the fact that each polynomial is small over the entire interval
suggests that a small number of terms in the expansion might provide
an approximation that is uniformly good over the entire interval. This
is in contrast to a Taylor series expansion which requires more terms the
farther one goes from the center of the expansion.
The analysis is simplified somewhat by introducing the functions
(6.2.1) C,(x) = 2 cos 8, x = 2 cos #.
Then
Coa) = 1 Cie) =a, Cie) = 277 — 2, .°%
and in general C,(x) is a polynomial in x of leading coefficient 1. We
consider the representation of the function f(x) on the range from —2 to
+2 and seek an expansion

(6.2.2) f(z) = ao + aiCi(az) + a2C2(z) + -°-,


or equivalently

(6.2.3) f(2 cos 6) = a +2 » On COS 6,


1
with @ ranging from 0 tow. Since
2 cos n6 cos mé = cos (n + m)6 + cos (n — m)6,
it follows that

(6.2.4) 2 ee cos n6 cos m6 dé = . for n A m,


0 T forn = m,
and hence formally

(6.2.5) On = eo} re” (2 cos 6) cos n6 dé


= (1/2m) [*, fle)(Cn(x)/VE = Plas.
Equations (6.2.4) express the orthogonality of the functions cos n@
and cos mé, n ~ m, on the interval from 0 to x. Equivalently, C,(z)
and C,,(z), n # m, are orthogonal with respect to the weight function
(4 — x?)-*% on the interval from —2 to +2.
Any power x” of x is expressible as a linear combination of C, and
polynomials of lower order:

a” = C(x) + nCn_2(x) + (3)Cr—s(a) + eee,


If we expand
f@) =fO + 2f'0) + Way#"O0) + >
MORE GENERAL METHODS OF APPROXIMATION 225
and substitute into the integrals (6.2.5), we obtain
_ f(0) f+2)(0) Niven)
(6.2.6) a, = 4 il
+ 1)! ' 2m +2)!
The first term alone on the right is the coefficient of x” in the Taylor
expansion. If this coefficient and a, are comparable, the expansion in
Chebyshev polynomials, carried to a given number of terms, may be
expected to give a representation of the function that is uniformly good
for all |x| < 2; for 2 > |x| > 1 itis better than the same number of terms
of the Taylor series, though the representation will be less good than the
Taylor series for |z| < 1.
6.3. Bibliographic Notes. The remainder formulas given here are
additional special cases of the general remainder formulas of Sard and
Milne. .
There is an extensive literature on methods of curve fitting, and only
a few titles are given in the bibliography. Deming (1938) extends the
method of least squares to cases where the parameters occur nonlinearly
by employing a method of successive linearization. Remainder formulas
in such a situation are not available. A ‘‘method of moments,” common
in the literature, leads to functions y, which are different from ¢,. For
the infinite case, reference is to be made to the literature on Fourier
series, orthogonal series in general, and the problem of moments.
On the representation by Chebyshev polynomials see Lanczos (1938),
Miller (1945), and Olds (1950). Nonlinear methods include the use of
continued fractions (and reciprocal differences, described in the books on
interpolation).
On methods of smoothing data subject to error see Schoenberg (1946,
1952), papers by Sard, and Lanczos (1952).
CHAPTER 7

NUMERICAL INTEGRATION AND DIFFERENTIATION

7. Numerical Integration and Differentiation


A function (x) obtained by any of the methods outlined in the last
two chapters, to the extent that it represents the function f(x) it is
intended to approximate, may replace f in any numerical operations
required. Thus where a derivative or an integral of f(x) is required, one
may differentiate or integrate (x). Unfortunately for finding the
derivative, though #(x) might agree closely with f(x) in value, even over
the entire interval, it can nevertheless happen that the slopes might be
radically different. With integration, however, the situation is more
fortunate, for clearly if on the interval from a to b the deviation R(x)
of from f nowhere exceeds « in absolute value, then the error in the
integral cannot exceed «(a — b); and since ordinarily R(x) will change
signs at least once, one can expect the error to be much less than this.
7.1. The Quadrature Problem in General. With the m + 1 functions
¢,-(x), consider in addition a fixed function w(x) and the integrals

Ca) f” be(x)ww(a)dar = pp.


Suppose that values
(7.1.2) Yi = f(a)

are known at n+ 1 points x;. We seek coefficients \;, independent


of the values of y;, so that in the expression

(7.1.3) [ s@u@ae = yu + B
the remainder R vanishes whenever f is equal to any of the ¢,. Hence

(7.1.4) mes ),dse(a).

Here are m + 1 equations for determining the n + 1 required coeffi-


cients \;. If the equations are consistent, they can be satisfied whenever
n > m, and ifn > m, it is possible to impose n — m further conditions on
ther;. Ifin < m, the equations can be satisfied if and only if the points z;
226
NUMERICAL INTEGRATION AND DIFFERENTIATION 227
are so placed that the matrices (u,, ¢-(2;)) and (¢,(2;)) have the same
rank.
Consider again Petersson’s expansion

(7.1.5)
He) = Yandel) + [7oe, )Lmrslf(s)lds,
v= DVande(e) + [* g(t, 8)Dmeslf(s)lds.
Assuming the multipliers \; to have been found, multiply the first equa-
tion here by w(x) and integrate from a to b; the second by 2, and sum,
subtracting the result from the integral just obtained. By (7.1.3),
(7.1.1), and (7.1.4), this gives

R= ‘i” w(z) if
* g(x, 8)Lmsalf(s)|ds dx — yx i,* g(a, 8)Lmsslf(s)]ds.
A similar expansion written for x = b gives in the same way

R= — [Pw(a) fPo(z, \Dmsslf(s)lds dx + Yr [?gle, 8).


Zmeslf(s)lds.
Hence by addition of the two expressions

R= [" M(o)Lmuslf(s)lds,
(7.1.6)
2M(s) = ” w(2)g(a, 8) sgn (« — s)dx — Y a(x, 8) sen (as — 8).
The function M(s) is continuous since the discontinuity in the signum
function occurs only where g vanishes.
As examples, let m=n=1, a=2=0, and b=2,=1. Let
w(x) = 1, ¢o(z) =1, and ¢i(z) =z. Then g(x, s) =2—s, and
No =A. = %. Then M(s) = s(s —1)/2. This gives the common
trapezoidal rule with remainder:

(7.1.7) ie* #(x)dx = (yo + y)/2 —- if* s(1 — 8)f"(s)ds/2.


Next, let w(x) = e%, do(x) = 1, and ¢i(~) = x. Again g(z,s) =x —s.
However,
No = (e* — 1 — a)/e?, At = (ae* — e* + 1)/a?.

Then M(s) = [e** — (e* — 1)s — 1]/a’, and

(7.1.8) f’flayes da = [yoet — 1 — @) + yilact — o* + 1)I/at*


+ ff le — (et— 1)s— if’"(s)ds/o?.
If w(x) = 1, do(x) = 1, and ¢1(x) = e%, then g(x, s) = (e** — 1)/a,
No = (e* — 1)!a“"(ae* — e* + 1), and Ay = (e* — 1)~1a“"(e* — 1 — a).
228 PRINCIPLES OF NUMERICAL ANALYSIS

Then M(s) = (e* — 1)—!a-[s(e* — 1) — e*(1 — e-**)]. _Hence

(7.1.9) flfla)de = (e* — 1)Yatlyolaet — e* + 1) + ws(er — 1 — 2]


+(e = Ita? fi [a(e* — 1) — ex(d — e**)1LF(8) — of (Ids.
Finally, let w(x) = 1, ¢o(x) = e%, and ¢:(x) = xe. Then
No= (e*
— a — 1)/a?, A = (a —-14+ &*)/e?,

~ and g(z, s) = (x — sje**-), Then


M(s) = {1+ [(1 — e*)s — le} /a?.
Hence

(7.1.10) f* ¢(a)dx = [yole* — a — 1) + sla — 1 + &*)]/0?


+ [1+ (C= *)s — Mes*}I7"(s) — 2af"(8) + aXf(s)]ds/ar.
When these formulas are applied repeatedly to intervals from 7 — 1 to 2,
fori = 1, 2, ...,-, then formulas (7.1.7), (7.1.9), and (7.1.10) give,
neglecting remainders,

[Pfede = (yo/2 + y us + yo/2)


t=1
n—1

(e* — 1)“a“Mlyolaet — e* + 1) + ynlew—1—a)] + ) x


t=1
a [yo(e* —a — 1) +yi(a —-I + €-*)]
n—-1
ob ane exs pam, Cade! » Yi.

i=l
Equations (7.1.8) and (7.1.10) are identical if in the former f(x)e%* is
replaced by f(x).
In (6.101.3) and (6.101.4), the %,(x) are linear combinations with
constant coefficients of the ¢,(x). On multiplying (6.101.4) by w(x) and
integrating, one finds

pe =) dela) f° wi) ®(a)ae,


and the integral on the right is expressible as a linear combination of the
ur. Thus the choice

(7.1.11) ee [” w (a) B(x)dex


satisfies (7.1.4) and determines the d; uniquely. Hence in case n = m,
_ to use (7.1.4) to determine the ); for the expression (7.1.3) is equivalent
NUMERICAL INTEGRATION AND DIFFERENTIATION 229
to approximating the integral of fw by the integral of éw, where ® is the
interpolating function; when n > m, one can always determine the ); by
(7.1.4), together with suitable further conditions so that the sum on the
right of (7.1.3), ignoring R, is the approximation given by integrating -
w with a function @ determined by one of the methods of §6.1.
In case n < m, it may be possible to choose the points x; at which the
evaluations f(z;) are to be made so that the matrices (¢,(x;), u,) and
(¢,-(x:)) have the same rank. In this event Eq. (7.1.4) is consistent and
has solutions \;; moreover, if the rank is n + 1, the solutions ); are
unique.
Consider the polynomial case with
(7.1.12) OAS) =a":
If the points x; are all distinct, then the matrix (¢,(z;)) has rank n + 1
for m>n. Hence Eq. (7.1.4) will be consistent if every square sub-
matrix of (zf, u,) of ordern + 2issingular. This implies that any matrix
of the form
1 1 1 Mr
Xo V1 Ln Mr4t

gti oett eel at

1 1 treed: Ho M1 i)
Xo L1 Ln Bi be M3

eet anet wee Mnt1 Mn4+2 Hn+3

must have rank n+ 1. In particular if m = 2n + 1, and the z; are


taken to satisfy the equation

1 Ko My Se Sheets

(7.1.18) ipa
x 1
ees!
1.debt
be
segs)
=", = 0,
nyt

et net ne + +e Ont

then if these z; gre all distinct, the system (7.1.4) is consistent, and defines
a set of coefficients uniquely.
With the x; as determined by (7.1.13), let
(7.1.14) wale) = Ww — &,) = at! + wat" + °° + Onn.

Then w(x) is equal to the determinant in (7.1.13) except for a constant


factor. Hence
(7.1.15) Mneizi t On,obngi $ °° * + Onnpi = 0.
230 PRINCIPLES OF NUMERICAL ANALYSIS

But by (7.1.1), (7.1.18), and (7.1.14), this implies that


(7.1.16) ih” hwn(a)w(a\dx = 0, ¢=0,1,... 5%
Hence if p(x) is any polynomial of degree < n,

| ik,p(x) wn(x)w(x)dx = 0.

In particular if we consider the sequence of polynomials wo, w1, . . « , n,


it follows from this equation that

(7.1.17) i i” com(2t)ion(x)yw(x)dx =0, nm


The polynomials w,(x) are therefore said to be orthogonal on the interval
(a, b) with respect to the weight function w(x). The derivation of
Lanczos’s theorem can be paraphrased to show that by defining formally
w1 = 1,
w_1(x) = 1
(7.1.18) wo(x) = & — andy},
w(x) = (2 — a,6-")o,1(4) — 5,551,@,_2(2), res
where
ar = [?203_,(2)w(2)de,
(7.1.19)
6, = i)
: w2_,(x)w(x)da.

Hence to obtain the polynomial w,(2), one can calculate recursively the
w(x), r <n, instead of expanding the determinant (7.1.13).
In this situation both the x; and the i; were left completely arbitrary
and determined so as to take account of a maximal set of the ¢,. Another
possible procedure would be to restrict the x; and \; by a limited number
of conditions, and then impose as many of the conditions (7.1.4) as
possible. As an example of this, it is sometimes desirable to select the
x; so that the 2; are all equal:
M=Ma
tc =D
There are then to be determined the n + 2 quantities xm, . . . ,%, and ,
and therefore in general n + 2 conditions (7.1.4) can be imposed. For
polynomials ¢, these have the form

A = po/(n + 1),
= (n + 1)p1/no,
@) OPS: iO) ROA L610) ‘6, 6s ela ae)

Zaptt ss (n + 1) pnis/Mo.
Hence the sums of the powers up to the (n + 1)st are known. From
these the elementary symmetric functions of the az; can be formed by
NUMERICAL INTEGRATION AND DIFFERENTIATION 231
means of Eqs. (3.02.5), and hence the equation of degree n + 1 of which
the z; are the roots. Unfortunately the roots 2; do not necessarily
always fall within the range of the integration, and when they do not, the
method is unusable. -
7.2, Numerical Differentiation. Equation (6.101.7) can be written

(7.2.1) fe) = ®@) +) a@) [* oe, )LF()las.


_ Hence ; : .
Se) = #@) + YH) [” gas, NLIs()lds + Y (aoa, aL)
But
),ala, 8) = YY e502) dada)
j
| = Ja(a)9-(8) = 9, 8),
and
Ge; @) —0.
Hence
(7.2.2) #@) =8@) +) H%o) [” oa, )LIf(@)lds.
Again
>, H@ as, 8) = ) d(e)g(s) = age, 8)/ae,
and
z=0
dg(x, s)/dx| = 0,
so that
(728) f(z) = 8a) +) ay@) [”oe, dLIf(o)las.
Similar relations hold for f™, » < n.
For polynomial interpolation in terms of divided differences the
equation

(7.2.4) f(x) = f(wo) + (x — 20)f(xo, x1) + (@ — 20)(% — a1)


f(Xo, 41, 22)
ie ee a (x oo Xo) 5 Pky (x nn Ln—1)f (Xo, U1, + + , @n)
as (x i Xo) Sate: (x aa Ln) f(xo, Ui, + + + > Xn, x)

is exact, the last term representing an expression for the remainder. Let
Lin) = (x a Lo) ene (x — Li)
. Then
f
(1.2.5 )» f(x) == fo, 0) x 21) + Uyf(xo,
oD) 1, 22) ip+ ++ lke> (abd opeuale.
232 PRINCIPLES OF NUMERICAL ANALYSIS
where
(7.2.8) Fe = 14 (0, Pipes, 4 Lan) ebayLey © iia aes ae
When x = 2, Xm) = 0, and the second term drops out of the ee
Continuing, one finds
(7.2.7) f'(@) = 2f(eo, v1, %2) + Ua f(Xo, £1,002, 2s) + + *-
a Dee atl Ce, aces) , £n) = HRC
(7.2.8) R"” = Lind (Lo, U1, - + + yn, 2) or 22 inf (Co, Ti, - + + Un, V, 2)
+ Xn) f(Xo, V1, - «+ « » Un, L, LX, %);

All divided differences which occur in these remainders can be expressed


in integral form as in §5.141.
7.3. Operational Methods. For polynomial interpolation with equally
spaced fundamental points, formulas for numerical differentiation and
integration can be derived by operational methods. The Taylor expan-
sion of an analytic function can be written in the form

F(@o + uh) = f(eo) + uhf' (xo) + wh? f(a) /2!+ ++ *


Hence by introducing the differential operators

(7.3.1) D = d/dz, 6=hD


and recalling the definitions of the displacement and difference operators,
we can write formally

Ee = 1+ whD + wh?D?2/2!+ +--+ = ev,


and therefore set
(7.3.2) BE=1+A=(1-V)7=e,
whence
(7.3.3) 6 = log FE = log (1 + A) = — log (1 — V).
Hence
(7.3.4) 6 = A — A?/2 + A#/3 — At/4 +++:
ViV7/2+V37/34+V4/4+---
These expansions can be used to obtain the derivative at any point a in
terms of forward or backward differences. To obtain f’(x) = f’(ao + uh),
write

(7.3.5) H“@ = (1 + A)“ log (1 + A) = A + (2u — 1)A2/2


+ (8u? — 6u + 2)A2/6 +
or the corresponding expression in terms of V.
NUMERICAL INTEGRATION AND DIFFERENTIATION 233
For the derivative in terms of central differences, we have
6 = EX — E-% = 6% — ¢-2 = 2 sinh (6/2),
p = (BY + E-¥)/2 = (e% + e~/)/2 = cosh (6/2).
From this it follows that formally
dé/dé = p = (1 + 82/4)%,
and therefore

(7.3.6) 6 fi a+ 2/AyHar
12- §3 12-32. §5
| |
31-22 Bls2t
mo

This formula gives the value of f’(xo) in terms of the values of f at


points %in/2. To obtain f’(xo) in terms of values of f at points t,, proceed
as follows:
It can be verified directly that 6/» satisfies the differential equation

(7.3.7) (1 + 5?/4)d(0/u)/d6 + (5/4)(6/u) =


Also 6/y is an odd function of 6. Hence assume the expansion
6/u = 6 + 236% + a5d® + a7d7 + ---,

substitute into the differential equation, and equate coefficients of like


powers of 6. By this means one finds that

@DY get aT |
(7.3.8) 6=p |e-3
To continue to higher derivatives, we have next that
d(6?)/d3 = 26 d0/dé = 20/p.
Hence
1)2
6? = 2 feta @ pos |ar,

or

(7.3.9) ed ee ey et: |
It can now be shown inductively that

(1 + 5?/4)d(6?"*1/u)/d6 + (8/4)(6?"#1/u) = (2 + 1)6”


and
d(6?”) /dd = 2v0?"1/u.
From this one can proceed sequentially to find 6/y, 64, 05/u,....
Some estimate of the error can be had by noting the magnitude of
the first neglected term. It is possible to obtain an exact expression for
234 PRINCIPLES OF NUMERICAL ANALYSIS
the error in any particular case by a method that will be used for numer-
ical integration, now to be described.
Ordinarily in numerical integrations the integral is to be evaluated
between two of the interpolation points, say z; and z;. At any rate it is
no restriction to assume that the lower limit is such a point. Let F(x)
be any function satisfying .
(7.3.10) DF(z) = f(z).
One wishes to evaluate (E* — E*)F(xo) where 7 is an integer. If 7 = 0,
we have

(7.3.11) [aes f(x)dx = (E* — 1)F(2s)


= (ud + w?62/2 + w363/3! + - + -)F(x0)
= uh(1 + ud/2 + u?62/3! + + - -)f(2o).

If the powers of 6 are replaced by their expansions in terms of any of the


difference operators up to whatever power may be desired, the result is a
formula for numerical integration. If one so desires, he can replace all
difference operators retained in the formula by their expressions in terms
of the displacement operator E, so obtaining a formula directly in terms
of the f(z;).
7.31. The Trapezoidal Rule. Next to a Riemann sum this is the
simplest rule of all. It is given in (7.1.7), but will now be derived
operationally. In (7.3.11) set u = 1; we carry the expansion to the first
power only of the difference operator, and to this degree of approximation
6=A. Hence (7.3.11) gives

(731.1) f@)de = hQ + 4/2)f(@) = A + E)f(ao)/2


= h(yo + y1)/2.
To evaluate the remainder one can proceed as follows: The formulas

(7.31.2) e° Veta ais 2 i1 rel) (1—1)0 dz


L+ 0+ 346? + 3408 [100-08dr
0
1

are exact and can be verified directly by integrating by parts. If e® is


replaced by E, and e? — 1 by A, one has by the second of these

AF (20) = h ( + 40 + Ke :* 2B dr)f(ao)
and by the first therefore

AP(zs) = h|1 + 34a — 346" ['o(1 — 2)B* dr |f(a).


NUMERICAL INTEGRATION AND DIFFERENTIATION 235
Thus the integral operator provides the desired remainder. This can be
transformed

hor [* o(1 — 1)E** defo) = hi fr = nf "leo + (1 — ahldr


= [Ge =o) — ao)f’"(o)dv
by introducing the new variable of integration
= 209 + (1 —7r)h = a — th.
Hence we obtain finally the trapezoidal rule with remainder

(7.31.3) ifzf(x)de = Whi ty) -% [* (ai — )(0 — a0) f'"(0)


dv.
By the law of the mean, since ;

[EG - 0 — aed = (er — 24/6 = 14/6,


we can write ;

(7.31.4) [/"f(@)de = Mh(yo + ys) — Mohef"(®, a SES m.


Upper and lower limits for the true value of the integral can be had by
introducing minimum and maximum values of f”. If f’’(x) does not
change sign on the interval, the law of the mean can be invoked again to
give
[F @ — @ — 2)do = (mr — DE 20) [Fao
= (t1 — &)(E — t0)(yi — ¥).
Since the quadratic factor cannot exceed h?/4, it follows that, when f”
does not change sign, then for some positive e < 1 it is true that

(7.31.5) [ f@)de = Vh(yo + ys) — Web, — m0).


7.32. The Maclaurin Quadrature Formula. The identities (7.31.2) are
_ special cases of the more general form
(7.82.1) e®=1+ u0+--+ + uO"/m!
4 (ymtlgmtt /m!) i pme(i—r)ud diy,
On setting u = 1 and u = —1, subtracting, and applying to F(x), one
obtains

(7.32.2) /wore
2h ses) Se EI" (eo)5 OS 2 apm
h |+ R,
236 PRINCIPLES OF NUMERICAL ANALYSIS

where
jones iE

qemtl{ fOem+Di eg + (1 — r)h] — fO™*Y[xo — (1 — r)h]}dr


— 1
iS ery [: eS
9) 2m+-1f
aco | f(2m+1) (totv) —
—ff(2m+1) (to =f
— v)]dv

This is the Maclaurin quadrature formula. For m = 0,

(7.32.4) [”f(e)de = 2hf(ao) + i!* (h — v)Ef"(ao + ») — f’(eo — ») Jv.


The integral on the right expresses the difference between the area under
the curve and that under the tangent to the curve at the point (Xo, yo).
7.33. Simpson’s Rule. When m=1, the formulas (7.32.2) and
(7.32.3) in operational form are

E — Et = 26 + 146% + (04/6) [7B — B-O)dr,


with operators applied to F(a). One factor @ on the right applied to
F(o) replaces it by hf(ao). Consider the term 67. Since

2 + 5? = 2 cosh 6 = e? + e~?

by (7.3.5), we can apply (7.32.1) to obtain

62 II a + go [Ei — EO)dr,
Hence
(7.33.1) EH — EE" 20 + 1406? — Leos Hh72(1 — 7)(E** — E-O—)dr

M40(E + 4+ E>)
— 608 [2 — 1)(B — BO),
This gives Simpson’s rule with remainder when applied in the usual way
to F(a). After changing the variable of integration on the right, one has

(7.38.2) ["f@)dx = Vh(ya + Ayo + ys)


— 36 [Pe — 0) lf"(wo+ 0) — $0 — v)Ido.
It is customary to point out that, whereas Simpson’s rule utilizes only
second differences, 6?, and should therefore have a vanishing remainder
when f is a quadratic polynomial, the remainder vanishes in fact even
when f is a cubic polynomial. This follows from the fact that for a
cubic polynomial f the third derivative is constant, and the integrand
therefore vanishes in the remainder term.
NUMERICAL INTEGRATION AND DIFFERENTIATION 237
In case f’” is monotonic (but not necessarily constant), the integrands
which occur on the right in (7.33.2) and in (7.32.3) for m = 1 are both
positive or both negative. However, one integral appears as an addi-
tive, and one as a subtractive, term. Hence in case f’”” is monotonic, of
the two approximations h(y_1 + 4yo + y1)/3 and 2h(yo + h*y///6), one
is an overestimate and one an underestimate of the integral which appears
on the left.
Another “enclosure” theorem can be obtained when f’” is monotonic,
a theorem, that is, which provides an upper and a lower bound to the true
value of the integral. When /’” is monotonic, then the difference between
the derivatives which appears in the integral on the right of (7.33.2) is
bounded by 0 and f’”’(x1) — f’”"(a_1). Hence for some positive e < 1,

fi (h — v)*[f""
(eo + v) — f’" (ao — v)Jdv = e(yf’ — yl) hs (h — v)%v dv
ey! — yt )h4/12.
Hence
(7.33.3) | f(a)de = Yh(y-1 + 4yo + yi) — At(yf’— yl) /72.
The eet are obtained by setting « equal to 0 and 1.
7.34. Newton’s Three-eighths Rule. To obtain a formula using third
differences requires the expansion of e® to the fourth power of 6. In
order to gain the advantages of symmetry in the expressions, consider the
evaluation of (H* — E-*)F (a2) in terms of H+t*f(xo) and E+f(x»),
assuming the fundamental abscissas to be 743, and iy. Write
€35/2 = il + 360 + 6 6? + % 608 + Rs

1 + 340 + 6? + %66? + 2749864 + 36Ry/


and the corresponding expansions of e~*/? with remainders R_x and
R_3,.. We find then that
(7.34.1) 63/2 — ¢— 34/2 — He — f-*# = 30(1 + 3 6? + Rs. + R_s.').

The expression 1 + 36?/8 is required in terms of yu, »*, and remainders.


For this we have
E% + B-% = 8y® — 6u = 2 + 9607/4 + Bye + B_se
and
EY + IP = Qu yy + 62/4 + Ry + R_y.
Hence

(7.34.2) 1+ 3¢6? = uw? — Le(Ry + Rx) — 34(Ry + R_y).


The required formula is obtained by combining (7.34.1) and (7.34.2).
Write this result:

(7.34.3) [% fa)dn = 36h(y-s6 + 3y-4 + Buy + yy) +B,


238 PRINCIPLES OF NUMERICAL, ANALYSIS.
where R is obtained by combining the various remainder terms which
appear above. Consider the terms in 6Ry and 6Ry’. When either of
these operators is applied to F(20), they yield integrals with respect to 7
between the limits 0 and 1 of a polynomial in 7 multiplied by
fi” [zo + 3h(1 — 7)/2].
By introducing a new variable of integration
v = 2 + 8h(1 — 7)/2,
the integrals are taken from 2 to x3. After a little algebraic manipula-
tion, when the coefficient is included, they combine to give the integral

Ya iiss (t3g — v)?(ao — v)f”(v)dv.


A similar manipulation of 0R_s, and 6R_3’ gives

oa J (ae — 0) aro — vhf" (o)de,


a result that can be written down immediately from symmetry.
There remain the terms in 0Ry and 6R_y. For the first of these
introduce the variable of integration
v=2+ A(1 — 7)/2,
the integral being taken then from x» to zy. The result is

—34eh [™ (ay — v)4f™(v)do.


The same integral from z_y, to 2» results from the term in 0R_y. These
three integrals can be rearranged to give finally

(7.84.4) R=, {"rk CaP Re IONS

ae [P* Meo — v)4 — Kel f™(v)dv + Ne (x34 — v)3(ty — v)f77(v)do}.

Equations (7.34.3) and (7.34.4) together give Newton’s three-eighths rule


with remainder.
The remainder just given is identical in form with that which one
would obtain by using (7.1.6), and the polynomials in » which multiply
f’”(v) in the several integrands define the function M(v) which appears
in (7.1.6). More generally, the operational method is merely a device
that may be applied for calculating a quadrature formula and remainder -
in the special case when the intervals are equal, the base functions are
polynomials, and w(x) = 1, whereas the method of §7.1 can be applied
in general.
NUMERICAL INTEGRATION AND DIFFERENTIATION 239
It may be noted that M(v) is of fixed sign throughout the interval of
integration ini (7.34.4), and hence the law of the mean can be applied as
in the previous cases.
7.35. Open Formulas. The trapezoidal rule, Simpson’s rule, and the
three-eighths rule are said to be of closed type, since the integration
extends from the first to the last of the fundamental abscissas. A formula
of open type is one which extends beyond. Such a formula is less exact
but is often required, in particular in most methods of solving ordinary
differential equations. In the most common methods for doing this, the
dependent variable y is evaluated sequentially at successive points
Xi, Litt, Lizz, .... When y; has been evaluated, then one proceeds
to evaluate y;+1, first approximately by an open integration of y’, there-
after by successive approximations by closed integrations, each closed
integration employing the currently available approximation to y;,;.
Under quite general conditions this process converges.
There are many possible formulas of open type, just as there are
many of closed type. The three of the latter type which have already
been given by no means exhaust the list. Of open formulas we shall give
only one, which Milne uses in conjunction with Simpson’s rule as the
closed formula, for solving a differential equation. The formula in ques-
tion gives the integral of y = f(x) from x_2 to 7;2 in terms of the three
middle ordinates y_1, yo, and yi. Hence it is a quadratic formula.
In (7.32.1) for m = 3, set uw= 2 and u = —2 and subtract:

BE? — E-* = $66[3 + 267+ 26% [* (20-9 —B20)


dr|
Again in (7.32.1) for m = 1, set uw= 2 and u = —2 and add:

E+E =2+ 6+ 0! i* 72(EI* — B-0-)dr,


The left member of this equation is 6? + 2; on eliminating the 6? from
- the first and applying the operator to F(x) as usual, we have

ihe
" fa)dex = 46h(3 + 26%) + R,
where
R= $6he{— [lf es— th) — f°" ea t+ th) |dr
| 9 i* Ff" (tg — Qh) — fae + 2rh)|dr}.
After a change of the variables of integration this can be written

(7.35.1) R=\% ieaa — 0) f""(v)do


+16 [" @ — a)[@ — 24)? — BRIG"
(0)do
+ [7 @ — 2) ¥"@o)do + 36 [* (wo — mle = 0) — SIS 0)do,
240 PRINCIPLES OF NUMERICAL ANALYSIS

This is the remainder FR in the required formula

(7.35.2) [™fla)dx = $6h(2y1 — yo + 2y:) + R.


7.36. The Euler-Maclaurin Summation Formula. If the polynomials
¥(r) satisfy

(7.36.1) Yor) = 1, dy,+1(r)/dr = y,(r),

then repeated integration by parts yields the formula


n

(7.36.2) e? —1= ) ofys(1) — e,(0)] + ort Ya(r)edr.


y=]

If e® is replaced by EH, and the operators applied to f(xo), one has an


“expansion”’ of f(xo + h) in powers of h, in which however the coefficients
involve derivatives evaluated at both z) and x) + A. Thus the generali-
zation of Taylor’s expansion of Hummel and Seebeck comes from choosing

Vaim = 7™(1 — 7)*/(n + m)!.

In general, it is easily verified that for any set of polynomials satisfying


(7.36.1) if we write

(7.36.3) = ¥,(r) = aor’/v! + ay Y/(v»— DI +--+ +4,


then each a; is the same for any y, with »y> 7. Let
(7.36.4) a; = B;/i!.
Then

vly,(r) = Bor? + () By + (;)Boast hee dR

This can be written symbolically in the form

(7.36.5) V(r) = (B+ 7)’/r!

if we understand that in the expansion B‘ is to be replaced by B;. If we


now require that

(7.36.6) By = 1, (B + 1)” = B’, Mio de


then we have a recursion that defines the B;, and hence the y;(r), uniquely,
and (7.36.1) takes a particularly simple form:

(7.36.7) e? — 1 = ga(1 + EB) — » B,A@”/y! + grt tiv(r)e'—


"dr,
yp=2
NUMERICAL INTEGRATION AND DIFFERENTIATION 241
When this is applied to F (20), the result expresses the integral of f(x) from
Xa to x; in terms of f and its derivatives evaluated at x» and 21:

(7.36.8) fle)dz = M4h(yo + m) — DY Bdr(yg-? — yf-)/rl + R,


y=2

and F can be evaluated by applying the integral operator in (7.36.7).


If corresponding expansions are written for the integral from 2, to 22,
- + » Zm—1 tO Tm, and the results added, then the derivatives drop out
everywhere except at xo and 2m:

(7.36.9) i” f@de = h(yo/2 + yi t - ++ + ym-1 + Ym/2)

— y B,he(ye-P — ye-P)/v! + RY.


v=2

This is the Euler-Maclaurin summation formula, and it is often used for


approximating the value of a sum Zy; = Zf(x,) in case the function f(z)
is readily integrated.
The constants B, are the Bernoulli numbers, and the polynomials
(B + 7)” the Bernoulli polynomials. It turns out that

Bo,41 = 0, vy> 0.
For if n is allowed to approach infinity in (7.36.7), the expansion becomes

eel tl ef) — (eh — 1) » B,o"/v!.


v=2
Considering 6 a real number, we can write

1 +
.
2, Be/>! => oo
oe? +1) =
oe?
sar
+e) 8 8
ean = 5 coth 5°

Since the right member of this identity is an even function of @, only


even powers can appear on the left, and this proves the assertion.
7.4. Bibliographic Notes. The developments in Eqs. (7.2.4) and
following and in (7.3.6) and following are based on Steffenson (1927).
An interesting and suggestive development of quadrature formulas based
on polynomial interpolation is to be found in Kowalewski (1932).
The selection of points so as to minimize the number of points leads
to the Gaussian quadrature formulas, and that for equalizing the coeffi-
cients leads to the Chebyshev formulas. See Sard (1948b, 1949a, and
1949b) for other criteria for a ‘“‘good” formula and (1951) for extension
to more than one variable. The introductory treatment here follows
Kneschke (1949a and 1949b) in the main.
CHAPTER 8

THE MONTE CARLO METHOD

8. The Monte Carlo Method


In §1.6, it was pointed out that in many, if not most, computations
the occurrence of the maximum possible error may be extremely improb-
able and that it may be sufficient for practical purposes to be able to say
that the probability is p that the error in the result will exceed some
quantity 6, where p is small, perhaps one-tenth or one-hundredth of
1 per cent, and 6 is within the limits of tolerance. It may well happen
that computational labor of astronomical proportions would be required
for a result which is certain to be in error less than 6, if indeed such a
result is attainable at all, whereas with only moderate labor the probabil-
ity of an error greater than 5 can be made extremely small. Thus
though the computation is strictly deterministic, it may be both possible
and advantageous to employ nondeterministic, 7.e., statistical, methods
to appraise the result.
The result of the computation is therefore treated as an estimate
rather than a true approximation. Actually physical measurements are
often of this sort. A physical measurement may be the average of many
measurements, all differing from one another though taken under condi-
tions as nearly identical as it is possible to make them. Along with the
mean, the experimenter will then compute the probable error, which is
not the maximum error that could have been made, since that is poorly
defined or entirely undefined. Instead, the probable error is the error
which one expects will be exceeded half the time. In more precise
terms this means that, if one follows the practice of asserting with respect
to any given measured quantity that the mean of the measurements does
not differ from the true value by more than the probable error, then he
may expect to be wrong in the case of about half the quantities under
consideration. But the point of primary interest here is the fact that,
if the maximal error of measurement is not defined, then neither can
the maximal propagated error be specified for any computation which
makes use of these measurements as data.
This being granted, it is reasonable to consider the feasibility of using
nondeterministic methods for the computations themselves. This would
242
THE MONTE CARLO METHOD 243
mean obtaining an estimate of the desired quantity by means of some
random sampling process, rather than obtaining an approximation by a
rigorous computation. This is known.as the Monte Carlo method. Ina
few situations it is the only feasible method known for “‘solving’’ the
problem, though it is by no means a general method. A few examples
will be given to explain and illustrate it, but any reasonably complete
treatment would have to make rather extensive use of the theory of
probability.
8.1. Numerical Integration. Consider first the problem of evaluating
a multiple integral,
(8.1.1) @ = f¢d(x)dv.
The variable x is taken to represent a vector in the space of the coordi-
nates £, . .. , & (where we could have n = 1, in particular); dv repre-
sents an element of volume in the n-dimensional space; and the integral
is to be taken between fixed and finite limits. The assumption of finite-
ness for the range of integration is no restriction in itself, since this can
always be achieved by a transformation of variables if necessary. How-
ever, it is assumed that ¢ is everywhere finite and bounded in this region.
Hence the integral represents a hypervolume in the space of n+ 1
dimensions with coordinates &, . .. , &:, 7; therefore we can introduce
scale factors and translations and assume that in the region
(8.1.2) 0< ¢(z) <1
and that the integration extends over the region
(8.1.3) 0<é&<1.
It follows that, if points were drawn at random from the entire unit
(n + 1)-dimensional hypercube with uniform probability, then the
probability is ® that any particular one of the randomly selected points
has a coordinate 7 satisfying

(8.1.4) 1S ¢(2),
if s = (£1, ..., €) represents the other n coordinates of this point.
Hence if one made random drawings of a large number of points N, testing
inequality (8.1.4) each time a drawing is made, and if the inequality is
satisfied for N’ of these points, then N’/N provides an estimate of ©.
This is the essential idea underlying the Monte Carlo method of
numerical integration. However, in any digital computation one cannot
draw arbitrary points from the cube, but only points whose coordinates
have a digital representation.
Suppose it has been determined that for representing the coordinate
£; in the base 8 it is sufficient to use o; places and that the computation of
244 PRINCIPLES OF NUMERICAL ANALYSIS

¢(z) will then be accurate to7 places. Suppose, further, one has available
some process for drawing digits 0,1, . . . , 8 — 1 at random with equal
probability. One therefore makes o1 +o2+ °° * ton +T drawings.
The first o, digits in order provide the representation of the coordinate £1;
the next o2 provide the representation of the coordinate f, ... ; the
last 7 provide the representation of the coordinate 7. With these repre-
sentations one tests the inequality (8.1.4) to decide whether the selected
point in (n+ 1) space lies inside or outside the volume. There is a
question whether the equality sign should be allowed in (8.1.4), or only
the strict inequality. If the equalities do not arise in sufficient numbers
to make a significant difference, then it is immaterial whether these points
are counted as inside or outside or are neglected altogether. If they make
a significant contribution, the decision must be based upon a considera-
tion of the routine for computing ¢.
If one could really make random selections from all points (£1, . . . ,
£n, n) in the unit hypercube, rather than from those points only whose
coordinates are digital numbers, and could obtain the strict mathematical
value of ¢(x) for any point, one would be repeating the occurrences of an
event with two possible outcomes: “success”? and ‘‘failure’’ with prob-
abilities 6 and 1 — &. By standard statistical formulas one can deter-
mine the probability that in N trials the number of actual successes Ny
will differ from N® by more than any given amount.
But since only points x are drawn for which the £; are digital, one is
at best not estimating the volume ©, but a slightly different volume ®’,
and the statistical formula gives the probability of deviations from N®’,
rather than from N&. The nature of the volume ®’ is best illustrated
for the case n = 2.
Each of the o; digits of £; can have any one of 8 possible values. Hence
there are 8+” possible points z. Associated with each z is a computed
¢*(x), defined by the computational routine. This is a digital number
with 7 places. For each x let 7’ represent a quantity differing from $*(z)
by not more than 6-7/2, and whose exact value depends upon the error
o* (a) — $(2) and upon the rule for including or excluding the equality
in (8.1.4). Then ®’ is equal to 6-*-” times the sum of the quantities 7’
for all possible x. Hence the quantity &’ being estimated is essentially
that approximation to $ that would be obtained by employing a Riemann
sum of 8+: terms for the integral.
The total error in the entire computation is therefore
(8.1.5) & — Ni/N = (®@ — ’) + (@ — N,/N).
The second parentheses is the so-called sampling error, which is the
deviation of the estimate from the quantity being estimated. In the
assertion that the probability is p that this error does not exceed 4, if 6
THE MONTE CARLO METHOD 245

is regarded as a function of N for fixed p, then 4 is inversely proportional


to the square root of N. Thus to cut 6 in one-half one must quadruple
the size N of the sample.
The first parentheses, — ©’, represents the usual computational
error, generated and residual. There is no initial error, since each $*(z)
is to be computed for exactly that x that has been drawn by the random
process. For a given ¢, the error ¢ — ¢* generated in computing any
¢*(x) depends upon the values of the o;. Increasing these, which is to
say, employing more places in the computation, will decrease the gener-
ated error, though at the expense of the additional labor involved in
carrying along the extra places. But increasing the o; will also decrease
the residual error, which is the deviation of the Riemann sum from the
true value ® of the integral. And the decrease in residual error comes
about without the need for actually computing additional terms in the
sum. Thusif the function ¢ is quite irregular, so that many subdivisions
of the range of integration would be required to make the residual error
sufficiently small, the direct computation of the Riemann sum might
require a prohibitively large number of terms, hence ¢ computed for a
prohibitively large number of x’s, whereas in Monte Carlo computations
the fineness of the subdivision has no effect whatever upon the number
of values of x for which ¢ must be computed. This may be extremely
important when the space is of high dimensionality. Thus, if n = 6,
and each o; = 10, the Riemann sum contains 10° terms; if o; = 20, it
contains 20° terms.
There are known techniques for making Monte Carlo estimates of an
element of an inverse matrix, and solutions of certain functional equa-
tions, but it is not clear that the method is useful unless variables occur
which are to be integrated out in the solution that is required.
It should be mentioned in passing that (8.1.4) can be replaced by any
relation equivalent toit. Thus if ¢ is a square root, the relation n? < ¢?
is equivalent and more easily examined.
8.2. Random Sequences. To employ the Monte Carlo method one
must be able somehow to obtain random sequences of digits, or at least
sequences which resemble random sequences in all essential aspects.
What constitutes an adequate ‘‘resemblance”’ is not altogether clear, but
at least the digits used must be in roughly equal proportions, and no
digit may show a marked tendency to follow any other particular digit.
Printed tables of randomly selected decimal digits are available, and the
Rand Corporation has prepared punched-card tables of random decimal
digits.
For high-speed machines neither printed tables nor punched cards
are suitable sources. If one selects a 2ν-digit number for ν sufficiently
large (say, 5 or more), squares it, extracts the middle 2ν digits, and repeats,
one is bound eventually to repeat a number already obtained. Thus the
process is cyclic. If the cycle is sufficiently long, then the extracted
digits form a sequence that resembles a random sequence, and the opera-
tion is easily programed for a computing machine. Similar processes are
multiplying by a fixed multiplier and extracting middle digits; and
squaring and reducing modulo some prime.
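A minimal Python sketch of the squaring process just described, with ν = 5; the function name, the seed, and the count of values produced are illustrative choices only:

def middle_square(seed, nu=5, count=10):
    """von Neumann's middle-square process: square a 2*nu-digit number, extract
    the middle 2*nu digits of the square, and repeat."""
    width = 2 * nu
    x = seed
    values = []
    for _ in range(count):
        square = str(x * x).zfill(2 * width)   # pad the square to 4*nu digits
        mid = len(square) // 2
        x = int(square[mid - nu: mid + nu])    # keep the middle 2*nu digits
        values.append(x)
    return values

# The operation is easily programmed, but the sequence eventually cycles (and may
# degenerate quickly), so the cycle length must be examined before the digits are used.
print(middle_square(1234567890))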
8.3. Bibliographic Notes. The Monte Carlo method achieved its
first popularity among the atomic-energy laboratories, following some
successes in its use by von Neumann and Ulam. For general discus-
sions see Metropolis and Ulam (1949) and the proceedings of the Monte
Carlo Symposium edited by Householder, Forsythe, and Germond (1951).
More recently a series of papers and reports has come out from the
Institute for Numerical Analysis: Kac and Donsker (1950), Kac (1951),
Wasow (1950, 1951a, and 1951b), Fortet (1952a and 1952b), Curtiss
(1952), Forsythe and Leibler (1950, 1951), and Cutkosky (1951). See
also the proceedings of the several Endicott symposia, and the Quarterly
Progress Reports of the National Applied Mathematics Laboratories of
the National Bureau of Standards.
For problems other than numerical quadrature, the method has been
used for matrix inversion, for solving functional equations of various
types, but more especially for problems associated with physical processes
that are essentially stochastic in character. In this last connection see
the proceedings of the Monte Carlo Symposium, and also Nelson (1949)
and Kahn (1949, 1950). On integral equations, which specialize directly
to linear algebraic systems, see Albert (1951-1952) and Nygaard (1952),
in addition to references already mentioned. It is this author’s opinion
that the method has proved and will prove most useful for the intrinsically
stochastic physical problems.
BIBLIOGRAPHY

Shmuel Agmon (1951): “The Relaxation Method for Linear Inequalities, ” National
Bureau of Standards, NAML Report 52-27.
N. I. Ahiezer (1947): "Lectures on the Theory of Approximation" (Russian), Moscow
and Leningrad, 323 pp.
Franz Aigner and Ludwig Flamm (1912): Analyse von Abklingungskurven, Physzk. Z.,
13 :1151-1155.
A. C. Aitken (1926): On Bernoulli's Numerical Solution of Algebraic Equations,
Proc. Roy. Soc. Edinburgh, 46 :289-305.
(1929): A General Formula of Polynomial Interpolation, Proc. Edinburgh
Math. Soc., 1:199-203.
(1931): Further Numerical Studies in Algebraic Equations and Matrices,
Proc. Roy. Soc. Edinburgh, 61:80-90.
(1932a): On Interpolation by Iteration of Proportional Parts, without the
Use of Differences, Proc. Edinburgh Math. Soc. (2), 3:56-76.
(1932b): On the Evaluation of Determinants, the Formation of Their Adju-
gates, and the Practical Solution of Simultaneous Linear Equations, Proc.
Edinburgh Math. Soc. (2), 3:207—219.
(1932c): On the Graduation of Data by the Orthogonal Polynomials of Least
Squares, Proc. Roy. Soc. Edinburgh, 53 :54-78.
(1933): On Fitting Polynomials to Data with Weighted and Correlated Errors,
Proc. Roy. Soc. Edinburgh, A64:12-16.
(1934): On Least Squares and Linear Combination of Observations, Proc. Roy.
Soc. Edinburgh, A565 :42-48.
(1936-1937a): Studies in Practical Mathematics. I. The Evaluation with
Application of a Certain Triple Product Matrix, Proc. Roy. Soc. Edinburgh,
67 :172-181.
(1936-1937b): Studies in Practical Mathematics. II. The Evaluation of the
Latent Roots and Latent Vectors of a Matrix, Proc. Roy. Soc. Edinburgh, 57 :269-
304.
(1937-1938): Studies in Practical Mathematics. III. The Application of
Quadratic Extrapolation to the Evaluation of Derivatives, and to Inverse
Interpolation, Proc. Roy. Soc. Edinburgh, 68:161-175.
(1945): Studies in Practical Mathematics. IV. On Linear Approximation ‘
Least Squares, Proc. Roy. Soc. Edinburgh, A62:138-146.
G. E. Albert (1951-1952): ‘‘A General Approach to the Monte Carlo Estimation of
the Solutions of Certain Fredholm Integral Equations,” I-III, Oak Ridge
National Laboratory Internal Memorandum.
Franz L. Alt (1952): pres tiengular Matrices, Proc. Intern. Congr. Math., 1950,
1:657.
H. Andoyer (1906): Calcul des différences et interpolation, Encyclopédie sct. math., I,
21 :47-160.
R. V. Andree (1951): Computation of the Inverse of a Matrix, Am. Math. Monthly,
58:87-92.
V. A. Bailey (1941): Prodigious Calculations, Australian J. Sci., 3:78-80.
L. Bairstow (1914): “Investigations Relating to the Stability of the Aeroplane,”
Reports and Memoranda No. 154 of Advisory Committee for Aeronautics.
T. Banachiewicz (1937): Zur Berechnung der Determinanten, wie auch der Inversen,
und zur durauf basierten Auflésung der Systeme linearen Gleichungen, Acta
Astron., 3:41-72,
V. Bargmann, D. Montgomery, and J. von Neumann (1946): “Solution of Linear
Systems of High Order,” Princeton, N.J., Institute for Advanced Study Report,
BuOrd, Navy Dept.
M. S. Bartlett (1951): An Inverse Matrix Adjustment Arising in Discriminant Analy-
sis, Ann. Math. Stat., 22:107-111.
Julius Bauschinger (1904): Interpolation, Enc. Math. Wiss., I D, 3:799-820.
Edmund C. Berkeley (1949): “‘Giant Brains, or Machines That Think,” John Wiley &
Sons, Inc., New York, xvi + 270 pp.
Serge Bernstein (1926): ‘‘Lecons sur les propriétés extrémales et la meilleure approxi-
mation des fonctions analytiques d’une variable réelle,” Gauthier-Villars & Cie,
Paris, x + 207 pp.
Raymond T. Birge and J. W. Weinberg (1947): Least Squares Fitting of Data by
Means of Polynomials, Revs. Mod. Phys., 19:298-360.
M. S. Birman (1950): Some Estimates for the Method of Steepest Descent (Russian),
Uspekhi Mat. Nauk 5, 3(37):152-155.
D. R. Blaskett and H. Schwerdtfeger (1945): A Formula for the Solution of an Arbi-.
trary Analytic Equation, Quart. Appl. Math., 3:266-268.
E. Bodewig (1935): Uber das Euler’sche Verfahren zur Auflésung numerischer
Gleichungen, Comment. Math. Helv., 8:1-4.
(1946a): On Graeffe’s Method of Solving Algebraic Equations, Quart. Appl.
Math., 4:177-190.
(1946): Sur la méthode de Laguerre pour l’approximation des racines de
certaines équations algébriques et sur la critique d’Hermite, Koninkl. Ned. Akad.
Wetenschap. Proc., 49 :910-921.
(1947): Comparison of Some Direct Methods for Computing Determinants
and Inverse Matrices, Koninkl. Ned. Akad. Wetenschap. Proc., 50 :49-57.
(1947-1948): Bericht tiber die verschiedenen Methoden zur Lésung eines
Systems linearer Gleichungen mit reellen Koeffizienten, Koninkl. Ned. Akad.
Wetenschap. Proc., 60:930-941, 1104-1166, 1285-1295; 51:53-64, 211-219.
(1949): On Types of Convergence and on the Behavior of Approximations
in the Neighborhood of a Multiple Root of an Equation, Quart. Appl. Math.,
7 :325-333.
O. Bottema (1950): A Geometrical Interpretation of the Relaxation Method, Quart.
Appl. Math., 7:422-423.
O. L. Bowie (1951): Practical Solution of Simultaneous Linear Equations, Quart.
Appl. Math., 8:369-373.
Alfred Brauer (1946): Limits for the Characteristic Roots of a Matrix, Duke Math. J.,
13 387-395.
(1947): Limits for the Characteristic Roots of a Matrix, II, Duke Math. J.,
14 :21-26.
(1948): Limits for the Characteristic Roots of a Matrix, III, Duke Math. J.,
15 :871-877.
Otto Braunschmidt (1943): Uber Interpolation, J. reine angew. M ath., 185 314-55.
P. Brock and F. J. Murray (1952): ‘‘The Use of Exponential Sums in Step by Step
Integration,” unpublished manuscript.
S. Brodetsky and G. Smeal (1924): On Graeffe’s Method for Complex Roots of Alge-
braic Equations, Proc. Cambridge Phil. Soc:; 22 :83-87.
E. T. Browne (1930): On the Separation Property of the Roots of the Secular Equa-
tion, Am. J. Math., 52 :843-850.
E. M. Bruins (1951): ‘““Numerieke Wiskunde,” Servire, Den Haag, 127 pp.
Joseph G. Bryan (1950): ‘““A Method for the Exact Determination of the Character-
istic Equation and Latent Vectors of a Matrix with Applications to the Dis-
criminant Function for More Than Two Groups,” thesis, Harvard University.
Hans Buckner (1948): A Special Method of Successive Approximations for Fredholm
Integral Equations, Duke Math. J., 15 :197—206.
H. Burkhardt (1904): Trigonometrische Interpolation, Enc. Math. Wiss., II A,
9a 3642-693.
Gino Cassinis (1944): I metodi di H. Boltz per la risoluzione dei sistemi di equazioni
lineari e il loro impiego nella compenzazione delle triangolazione, Riv. catasto e
servict tecnict erartalt, No. 1.
L. Cesari (1931): Sulla risoluzione dei sistemi di equazioni lineari per approssimazioni
successive, Rass. poste, telegrafi e telefoni, Anno 9.
(1937): Sulla risoluzione dei sistemi di equazioni lineari per approssimazioni
successive, Atti accad. nazl. Lincei Rend., Classe sct. fis., mat. e nat. (6a), 25 :422-
428.
F, Cohn (1894): Ueber die in recurrirender Weise gebildeten Gréssen und ihren
Zusammenhang mit den algebraischen Gleichungen, Math. Ann., 44:473-538.
A. R. Collar (1948): Some Notes on Jahn’s Method for the Improvement of Approxi-
mate Latent Roots and Vectors of a Square Matrix, Quart. J. Mech. Appl. Math.,
1:145-148.
L. Collatz (1950a): Iterationsverfahren fiir komplexe Nullstellen algebraischer
Gleichungen, Z. angew. Math. u. Mech., 30:97-101.
(1950b): Uber die Konvergenzkriterien bei Iterationsverfahren fiir lineare
Gleichungssysteme, Math. Z., 53:149-161.
Computation Laboratory (1946): ‘‘A Manual of Operation for the Automatic Sequence
Controlled Calculator,’’ Harvard University Press, Cambridge, 561 pp.
(1949): ‘“‘Description of a Relay Calculator,” Harvard University Press,
Cambridge, 366 pp.
J. L. B. Cooper (1948): The Solution of Natural Frequency Equations by Relaxation
Methods, Quart. Appl. Math., 6:179-182.
A. F. Cornock and J. M. Hughes (1943): The Evaluation of the Complex Roots of
Algebraic Equations, Phil. Mag. (7), 34:314-320.
J. G. van der Corput (1946): Sur approximation de Laguerre des racines d’une
équation qui a toutes ses racines réelles, Koninkl. Ned. Akad. Wetenschap. Proc.,
49 :922-929.
Charles L. Critchfield and John Beck, Jr. (1935): A Method for Finding the Roots of
the Equation f(z) = 0 Where f Is Analytic, J. Research Nat. Bur. Standards,
14 :595-600.
L. L. Cronvich (1939): On the Graeffe Method of Solution of Equations, Am. Math.
Monthly, 46 :185-190.
Prescott D. Crout (1941): A Short Method for Evaluating Determinants and Solving
Systems of Linear Equations with Real or Complex Coefficients, Trans. AIEE,
60 :1235—1240.
Haskell B. Curry (1944): The Method of Steepest Descent for Non-linear Minimiza-
tion Problems, Quart. Appl. Math., 2:258-261.
(1951a): Abstract Differential Operators and Interpolation Formulas, Portu-
galiae Math., 10:135-162.
(1951b): Note on Iterations with Convergence of Higher Degree, Quart. Appl.
Math., 9:204-205.
J. H. Curtiss (1952): ““A Unified Approach to the Monte Carlo Method,” report
presented at meeting of Association for Computing Machinery, Pittsburgh,
May 2-3, 1952.
R. E. Cutkosky (1951): A Monte Carlo Method for Solving a Class of Integral Equa-
tions, J. Research Nat. Bur. Standards, 47 :113-115.
W. Edwards Deming (1938): “Statistical Adjustment of Data,” John Wiley & Sons,
Inc., New York, x + 261 pp.
Bernard Dimsdale (1948): On Bernoulli’s Method for Solving Algebraic Equations,
Quart. Appl. Math., 6:77-81.
C. Domb (1949): On Iterative Solutions of Algebraic Equations, Proc. Cambridge
Phil. Soc., 45 :237-240.
Paul S. Dwyer (1951): ‘Linear Computations,”’ John Wiley & Sons, Inc., New York,
xi + 344 pp.
Engineering Research Associates, Inc. (1950): ‘High-speed Computing Devices,”
McGraw-Hill Book Company, Inc., New York, xiii + 440 pp..
I. M. H. Etherington (1932): On Errors in Determinants, Proc. Edinburgh Math. Soc.,
$:107-117.
G. Faber (1910): Uber die Newton’sche Naherungsformel, J. reine angew. Maith.,
138 :1-21.
V. N. Faddeeva (1950): ‘‘Computational Methods of Linear Algebra” (Russian),
Moscow and Leningrad (Chap. 1, “Basic Material from Linear Algebra,” trans-
lated by Curtis D. Benster), National Bureau of Standards Report 1644.
Leopold Fejér (1934): On the Characterization of Some Remarkable Systems of
Points of Interpolation by Means of Conjugate Points, Am. Math. Monthly,
41 :1-14.
Ervin Feldheim (1939): Théorie de la convergence des procédés d’interpolation et de
quadrature mécanique, Mém. sci. math. acad. sci. Paris, No. 95.
William Feller and George E. Forsythe (1951): New Matrix Transformations for
Obtaining Characteristic Vectors, Quart. Appl. Math., 8:325-331.
Henry E. Fettis (1950): A Method for Obtaining the Characteristic Equation of a
Matrix and Computing the Associated Modal Columns, Quart. Appl. Math.,
8 :206-212.
Donald A. Flanders and George Shortley (1950): Numerical Determination of Funda-
mental Modes, J. Appl. Phys., 21:1326-1332.
A. Fletcher, J. C. P. Miller, and L. Rosenhead (1946): ‘Index of Mathematical
Tables,” McGraw-Hill Book Company, Inc., New York, 450 pp.
L. R. Ford (1925): The Solution of Equations by the Method of Successive Approxima-
tions, Am. Math. Monthly, 32:272-287.
George E. Forsythe (1951): ‘Tentative Classification of Methods and Bibliography
on Solving Systems of Linear Equations,’ National Bureau of Standards, INA
52-7 (internal memorandum).
(1952): ‘‘Bibliographic Survey of Russian Mathematical Monographs, 1930—
1951,’’ National Bureau of Standards Report 1628.
and Richard A. Leibler (1950): Matrix Inversion by a Monte Carlo Method,
MTAC, 4:127-129.
———— and ———— (1951): Correction to the article, "Matrix Inversion by a Monte
Carlo Process," MTAC, 6:55.
————— and Theodore S. Motzkin (1952): An Extension of Gauss’ Transformation for
Improving the Condition of Systems of Linear Equations, MTAC, 6:9-17.
Tomlinson Fort (1948): “Finite Differences and Difference Equations in the Real
Domain,’’ Oxford University Press, New York.
R. Fortet (1952): On the Estimation of an Eigenvalue by an Additive Functional of a
Stochastic Process, with Special Reference to the Kac-Donsker Method, J.
Research Nat. Bur: Standards, 48 :68-75.
L. Fox (1950): Practical Methods for the Solution of Linear Equations and the
Inversion of Matrices, J. Roy. Stat. Soc., B12 :120-136.
and J. C. Hayes (1951): More Practical Methods for the Inversion of Matrices,
J. Roy. Stat. Soc., B13 :83-91.
, H. D. Huskey, and J. H. Wilkinson (1948): Notes on the Solution of Algebraic
Linear Simultaneous Equations, Quart. J. Mech. Appl. Math., 1:149-173.
J. S. Frame (1944): A Variation of Newton’s Method, Am. Math. Monthly, 61:36-38.
(1949): A Simple Recursion Formula for Inverting a Matrix (Abstract), Bull.
Am. Math. Soc., 65 :1045.
R. A. Frazer (1947): Note on the Morris Escalator Process for the Solution of Linear
Simultaneous Equations, Phil. Mag., 38 :287-289.
and W. J. Duncan (1929): On the Numerical Solution of Equations with
Complex Roots, Proc. Roy. Soc. (London), A125 :68-82.
: , and A. R. Collar (1946): ‘“‘Elementary Matrices and Some Applica-
tions to Dynamics and Differential Equations,” The Macmillan Company, New
York, xvi + 416 pp.
G. F. Freeman (1943): On the Iterative Solution of Linear Simultaneous Equations,
Phil. Mag. (7), 34:409-416.
B. Friedman (1949): Note on Approximating Complex Zeros of a Polynomial, Com-
muns. Pure Appl. Math., 2:195-208.
Thornton C. Fry (1945): Some Numerical Methods for Locating Roots of Polynomials,
Quart. Appl. Math., 3:89-105.
Eduard Firstenau (1860): Neue Methode zur Darstellung und Berechnung der
imagindren Wurzeln algebraischer Gleichungen durch Determinanten der Coeffi-
zienten, Ges. Bef. ges. Naturw., Marburg, 9:19-48.
A. de la Garza (1951): “An Iterative Method for Solving Systems of Linear Equa-
tions,” Oak Ridge, K-25 Plant, Report K-731.
N. K. Gavurin (1950): Application of Polynomials of Best Approximation to Optimal
Convergence of Iterative Processes (Russian), Uspekhi Mat. Nauk 5, 3(37):156-
160.
Wallace Givens (1951): ‘‘Computation of Eigenvalues,’’ Oak Ridge National Labora-
tory (internal memorandum).
(1952): Fields of Values of a Matrix, Proc. Am. Math. Soc., 3:206-209.
James W. Glover (1924): Quadrature Formulae When Ordinates Are Not Equidistant,
Proc. Intern. Math. Congr., Toronto, 831-835.
Herman H. Goldstine and John von Neumann (1951): Numerical Inverting of
Matrices of High Order, II, Proc. Am. Math. Soc., 2:188-202.
Michael Golomb (1943): Zeros and Poles of Functions Defined by Taylor Series,
Bull. Am. Math. Soc., 49 :581—592.
E. T. Goodwin (1950): Note on the Evaluation of Complex Determinants, Proc.
Cambridge Phil. Soc., 46 :450-452.
Lawrence M. Graves (1946): ““The Theory of Functions of Real Variables,’’ McGraw-
Hill Book Company, Inc., New York, x + 300 pp.
Robert E. Greenwood (1949): Numerical Integration for Linear Sums of Exponential
Functions, Ann. Math. Stat., 20:608-611.
and Masil B. Danford (1949): Numerical Integration with a Weight Function
z, J. Math. Phys., 28:99-106.
D. P. Grossman (1950): On the Problem of the Numerical Solution of Systems of
Simultaneous Linear Algebraic Equations (Russian), Uspekhi Mat. Nauk 5,
3(37) :87-103.
P. G. Guest (1950): Orthogonal Polynomials in the Least Squares Fitting of Observa-
tions, Phil. Mag., 41(7):124-137.
(1951): The Fitting of Polynomials by the oe of Weighted Grouping,
Ann. Math. Stat., 22 :537-548.
Jules Haag (1949): Sie la stabilité des points invariant d’une transformation, Bull.
sci. math., 73 123-134.
J. Hadamard (1892): Essai sur l’étude des fonctions données par leur développement
de Taylor, J. Math. (4), 8:101—186.
Hugh J. Hamilton (1946): Roots of Equations by Functional Iteration, Duke Math. J.,
13 :113-121.
(1950): A Type of Variation on Newton’s Method, Am. Math. Monthly,
57 :517-522.
J. M. Hammersley (1949): The Numerical Reduction of Nonsingular Matrix Pencils,
Phil. Mag., 40(7) :783-807.
Joseph O. Harrison, Jr. (1949): Piecewise Polynomial Approximation for Large-scale
Digital Calculators, MTAC, 3:400—407.
(1951): ‘‘On the Growth of Error in the Numerical Integration of Differential
Equations,’”’ thesis, Columbia University.
Philip Hartman (1949): Newtonian Approximations to a Zero of a Function, Comment.
Math, Helv., 21:321-326.
D. R. Hartree (1949): Notes on Iterative Processes, Proc. Cambridge Phil. Soc.,
46 :230-236.
Paul Herget (1948): ‘“The Computation of Orbits,’ Edwards Bros., Inc., Ann Arbor,
Mich., ix + 177 pp.
M. Herzberger (1949): The Normal Equations of the Method of Least Squares and
Their Solution, Quart. Appl. Math., 7:217-223.
and R. H. Morris (1947): A Contribution to the Method of Least Squares,
Quart. Appl. Math., 6 :354-357.
Magnus R. Hestenes and William Karush (1951a): A Method of Gradients for the
Calculation of the Characteristic Roots and Vectors of a Real Symmetric Matrix,
J. Research Nat. Bur. Standards, 47 :45-61.
and (19516): Solutions of Ax = Bz, J. Research Nat. Bur. Standards,
47 :471-478.
and Marvin L. Stein (1951): ‘The Solution of Linear Equations by Minimiza-
tion,” National Bureau of Standards, NAML Report 52-45.
and Eduard Stiefel (1952): ‘‘Method of Conjugate Gradients for Solving
Linear Systems,’’ National Bureau of Standards Report 1659.
T. J. Higgins, R. P. Agnew, J. Barkley Rosser, and R. J. Walker (1942): Note on
Whittaker’s Method for the Roots of a Power Series, Am. Math. Monthly,
49 :462-465.
T. H. Hildebrandt and L. M. Graves (1927): Implicit Functions and Their Differ-
entials in General Analysis, Trans. Am. Math. Soc., 29:127-153.
Jerome Hines (1951): On Approximating the Roots of an Equation by Iteration,
Math. Mag., 24:123-127.
Frank L. Hitchcock (1938): Finding Complex Roots of Algebraic Equations, J. Math.
Phys., 17 :55-58.
(1939): Algebraic Equations with Complex Coefficients, J. Math. Phys.,
18 :202-210.
(1944): An Improvement on the G. C. D. Method for Complex Roots, J. Math.
Phys., 23 :69-74.
P. G. Hoel (1941): On Methods of Solving Normal Equations, Ann. Math. Stat.,
12 3354-359,
—— and D. D. Wall (1947): The Accuracy of the Root-squaring Method for Solv-
ing Equations, J. Math. Phys., 26 :156-164.
Hans Hornich (1950): Zur Auflésung von Gleichungssystemen, Monatsch. Math.,
64 :130-134.
Paul Horst (1935): A Method of Determining the Coefficients of a Characteristic
Equation, Ann. Math. Stat., 6 :83-84.
Harold Hotelling (1936): Simplified Calculation of Principal Components, Psycho-
metrika, 1:27-35.
(1943a): Some New Methods in Matrix Calculation, Ann. Math. Stat., 14:1-34.
(1943): Further Points on Matrix Calculation and Simultaneous Equations,
Ann. Math. Stat., 14:440—-441.
(1949): Practical Problems of Matrix Calculation, Proc. Symposium Math.
Stat. Prob. Berkeley, 275-294.
A. S. Householder (1950): Some Numerical Methods for Solving Systems of Linear
Equations, Am. Math. Monthly, 57 :453-459.
- (1951): Polynomial Iterations to Roots of Algebraic Equations, Proc. Am.
Math. Soc., 2:718-719.
, G. E. Forsythe, and H. H. Germond (eds.) (1951): ‘Monte Carlo Method,”
National Bureau of Standards, Applied Mathematics Series 12, vii + 42 pp.
and Gale Young (1938): Matrix Approximation and Latent Roots, Am. Math.
Monthly, 45:165-171.
P. M. Hummel (1946): The Accuracy of Linear Interpolation, Am. Math. Monthly,
53 :364-366.
and C, L. Seebeck (1949): A Generalization of Taylor’s Expansion, Am. Math.
Monthly, 56 :243-247.
and (1951): A New Interpolation Formula, Am. Math. Monthly,
58 :383-389.
Harry D. Huskey and Douglas R. Hartree (1949): On the Precision of a Certain
Procedure of Numerical Integration, J. Research Nat. Bur. Standards, 42 357-62.
C. A. Hutchinson (1935): On Graeffe’s Method for the Numerical Solution of Alge-
braic Equations, Am. Math. Monthly, 42:149-161.
S. Inman (1950): The Probability of a Given Error Being Exceeded in Approximate
Computation, Math. Gaz., 34:99-113.
C. Isenkrahe (1888): Ueber die Anwendung iterirter Funktionen zur Darstellung der
Wurzeln algebraischer und transcendenter Gleichungen, Math. Ann., 31 :309-317.
V. K. Ivanov (1939): On the Convergence of Iterative Processes for the Solution
of Systems of Linear Algebraic Equations (Russian), Izvest. Akad. Nauk SSSR
(1939) :477—483.
Dunham Jackson (1921): The General Theory of Approximation by Polynomials and
Trigonometric Sums, Bull. Am. Math. Soc., 27 :415-431.
(1930): The Theory of Approximation, Am. Math. Soc. Colloquium Publs.,
Vol. 11.
C. G. J. Jacobi (1835): Observatiunculae. ad theoriam equationum pertinentes,
J. reine angew. Math., 13 :340-352.
(1846): Uber ein leithtes Verfahren die in der Theorie der Sacularstérungen
vorkommenden Gleichungen numerisch Aufzulésen, J. reine angew. Math., 30 :51-
94.
H. A. Jahn (1948): Improvement of an Approximate Set of Latent Roots and Modal
Columns of a Matrix by Methods Akin to Those of Classica] Perturbation Theory,
Quart. J. Mech. Appl. Math., 1:131-144.
H. Jensen (1944): Attempt at a Systematic Classification of Some Methods for the
Solution of Normal Equations, Geodaet, Insts, Meddelelsa, No. 18,
C. W. Jones, J. C. P. Miller, J. F. C. Conn, and R. C. Pankhurst (1945): Tables of
Chebyshev Polynomials, Proc. Roy. Soc. Edinburgh, A62:187-203.
Charles Jordan (1947): “Calculus of Finite Differences,” 2d ed., Chelsea Publishing
Company, New York, xxi + 652 pp.
W. B. Jordan (1951): An Iterative Process, MTAC, 5:183.
F. Jossa (1940): Risoluzione progressiva di un sistema di equazioni lineari, Rend.
accad. sci. fis. mat. e nat. soc. reale Napoli (4), 10:346-352.
Mark Kac (1951): On Some Connections between Probability Theory and Differential
and Integral Equations, Proc. Symposium Math. Stat. Prob., 2d Symposium
Berkeley, 1950:189-215.
and Michael Cohen (1952): "A Statistical Method for Determining the Lowest
Eigenvalue of Schrödinger's Equation," National Bureau of Standards Report
1558.
and M. D. Donsker (1950): A Sampling Method for Determining the Lowest
Eigenvalue and the Principal Eigenfunction of Schrödinger's Equation, J.
Research Nat. Bur. Standards, 44:551-557.
Herman Kahn (1949): “Modification of the Monte Carlo Method,’ The RAND
Corporation, Report P-132.
(1950): Random Sampling (Monte Carlo) Techniques in Neutron Attenuation
Problems, Nucleonics, 6: May, 27-33; June, 60—65.
L. V. Kantorovich (1939): The Method of Successive Approximations for Functional
Equations, Acta Math., 71:62-97.
(1947): On the Method of Steepest Descent (Russian), Doklady Akad. Nauk
SSSR, 66 :233-236.
(1948a): On a General Theory of Methods of Analytical Approximations
(Russian), Doklady Akad. Nauk SSSR, 60 :957-960.
(1948b): On the Method of Newton for Functional Equations (Russian),
Doklady Akad. Nauk SSSR, 69 :1237-1240.
Truman L, Kelley (1935): ‘Essential Traits of Mental Life,” Harvard University
Press, Cambridge, Mass., 145 pp.
W. M. Kincaid (1947): Numerical Methods for Finding Characteristic Roots and
Vectors of Matrices, Quart. Appl. Math., 5 :320-345. .
(1948a): Note on the Error in Interpolation of a Function of Two Independent
Variables, Ann. Math. Stat., 19 :85-88.
(1948b) : Solution of Equations by Interpolation, Ann. Math. Stat., 19 :207-219.
Anna Klingst (1941): Eine Verallgemeinerung der Euler-Maclaurin’schen Reihe und
der Bernoulli’schen Zahlen, Sitzber. Akad. Wiss. Wien. Ila, 150 :221-256.
A. Kneschke (1949a): Theorie der geniherten Quadratur, J. reine angew. Math.,
187 :115-128.
(19496): Zur Theorie der Interpolation, Math. Z., 62:137-149.
Julius Kénig (1876): Ein allgemeiner Ausdruck fiir die ihren absoluten Betrage
nach kleinste Wurzel der Gleichung nten Grades, Math. Ann., 9:530-540.
(1884): Ueber eine Kigenschaft der Potenzreihen, Math. Ann., 23 :447-449.
Walter Kohn (1949): A Variational Iteration Method for Solving Secular Equations,
J. Chem. Phys., 17:670.
Zdenek Kopal, Pierre Carrus, and Katherine E. Kavanagh (1951): A New Formula
for Repeated Mechanical Quadratures, J. Math. Phys., 30:44—48.
Gerhard Kowalewski (1932): “Interpolation und genaherte Quadratur,” B. G. Teub-
ner, Leipzig, v + 146 pp.
(1938): Entwicklung einer Funktion nach Lagrangeschen Polynomen und
ibren Integralen, Deut. Mathematik, 3:275-280.
(1948): “Einfiihrung in die Determinantentheorie,” 3d ed., Chelsea Publish-
ing Company, New York, 320 pp.
A. N. Krylov (1950): “Lectures on Approximate Computations” (Russian), 5th ed.,
Moscow, Leningrad, 400 pp.
A. G. Kuroš, A. I. Markuševič, and P. K. Raševskii (eds.) (1948): "Mathematics in the
USSR in the 30 Years 1917-1947" (Russian), Gosudarstvennoe Izdatel'stvo
Tehniko-Teoretičeskoi Literatury, Moscow-Leningrad, 1044 pp.
Jack Laderman (1948): The Square Root Method for Solving Simultaneous Linear
Equations, MTAC, 3:13-16.
Cornelius Lanczos (1938): Trigonometric Interpolation of Empirical and Analytic
Functions, J. Math. Phys., 17 :123-199.
(1950): An Iteration Method for the Solution of the Eigenvalue Problem of
Linear Differential and Integral Operators, J. Research Nat. Bur. Standards,
45 :255-282.
(1951): “Solution of Systems of Linear Equations by Minimized Iterations,”’
National Bureau of Standards, NAML Report 52-13.
(1952): ‘Analytical and Practical Curve Fitting of Equidistant Data,”
National Bureau of Standards Report 1591.
D. H. Lehmer (1945): The Graeffe Process As Applied to Power Series, MTAC,
1:377-383.
D. C. Lewis (1947): Polynomial Least Square Approximations, Am. Jour. Math.,
69 :273-278.
V. B. Lidskii (1950): On Proper Values of Sums and Products of Symmetric Matrices
(Russian), Doklady Akad. Nauk SSSR, 16 :769-772.
H. Liebmann (1918): Die angenadherte Ermittlung harmonischer Funktionen und
konformer Abbildung, Siizber. math.-naturw. Kl. bayer. Akad. Wiss. Miinchen,
1918:385-416.
Shih-nge Lin (1941): A Method of Successive Approximation of Evaluating the
Real and Complex Roots of Cubic and Higher-order Equations, J. Math. Phys.,
20 :231-242.
(1943): A Method for Finding Roots of Algebraic Equations, J. Math. Phys.,
22 :60-77.
A. T. Lonseth (1947): The Propagation of Error in Linear Problems, Trans. Am. Math.
Soc., 62 :193-212.
Y. L. Luke and Dolores Ufford (1951): On the Roots of Algebraic Equations, J. Math.
Phys., 30 :94-101.
Cyrus Colton MacDuffee (1943): ‘Vectors and Matrices,” The Mathematical
Association of America, Menasha, Wis., xi + 192 pp.
(1946): ‘‘The Theory of Matrices,” Chelsea Publishing Company, New York,
v +110 pp.
P. Mansion (1914): Théoréme général de Peano sur le reste dans les formules de
quadrature, Mathesis, 34:169-174.
Emory McClintock (1895): Theorems in the Calculus of Enlargement, Am. Jour.
Math., 17:69-80.
James McMahon (1894): On the General Term in the Reversion of Series, Bull. NY
Math. Soc., 3:170-172.
N. N. Meiman (1949): Some Questions Relating to the Location of the Zeros of Poly-
nomials (Russian), Uspekhi Mat. Nauk, 4:154-188.
Nicholas Metropolis and S. Ulam (1949): The Monte Carlo Method, J. Am. Stat.
Assoc., 44 :335-341.
Franz Meyer (1889): Zur Auflésung der Gleichungen, Math. Ann., 33:511-524,
Leroy F. Meyers and Arthur Sard (1950a): Best Approximate Integration Formulas,
J. Math. Phys., 29 7118-123.
(19500): Best Interpolation Formulas, J. Math. Phys., 29:198-206.
J. C. P. Miller (1945): Two Numerical Applications of Chebyshev Polynomials,
Proc. Roy. Soc. Edinburgh, A62:204-210.
(1950): Checking by Differences, MT'AC, 4:3-11.
William Edmund Milne (1949a): ‘Numerical Calculus. Approximations, Interpo-
lation, Finite Differences, Numerical Integration, and Curve Fitting,” Princeton
University Press, Princeton, N.J., x + 393 pp.
(1949b): The Remainder in Linear Methods of Approximation, J. Research
Nat. Bur. Standards, 43:501-511.
L. M. Milne-Thomson (1933): "Calculus of Finite Differences," Macmillan & Co.
Ltd., London, xxiii + 558 pp.
R. von Mises (1936): Uber allgemeine Quadraturformeln, J. reine angew. Math.,
174 :56-67.
and Hilda Pollaczek-Geiringer (1929): Praktische Verfahren der Gleichungs-
auflésung, Z. angew. Math. u. Mechan., 9:58-77, 152-164.
Edward F. Moore (1949): A New General Method for Finding Roots of Polynomial
Equations, MTAC, 3:486-488.
Oskar Morgenstern and Max A. Woodbury (1950): The Stability of Inverses of
Input-Output Matrices, Econometrica, 18:190-192.
Joseph Morris (1946): An Escalator Process for the Solution of Linear Simultaneous
Equations, Phil. Mag. (7), 37:106—120.
(1947): ‘The Escalator Method in Engineering Vibration Problems,” John
Wiley & Sons, Inc., New York, xv + 270 pp.
and J. W. Head (1942): ‘‘Lagrangian Frequency Equations. An ‘Escalator’
Method for Numerical Solution” (appendix by G. Temple), Aircraft Eng.,
14: 312-316.
T. S. Motzkin and J. L. Walsh (1951): ‘On the Derivative of a Polynomial and
‘Chebyshev Approximation,”’ National Bureau of Standards Report 1444.
M. Miiller (1948): Uber ein Eulersches Verfahren zur Wurzelberechnung, Math. Z.,
51 :474-496.
Thomas Muir (1906, 1911, 1920, 1923): ““The Theory of Determinants in the Histor-
ical Order of Development,” Macmillan & Co., Ltd., London, 1:xi + 491 pp.,
2:xvi + 475 pp., 3:xxvi + 503 pp., 4:xxvi + 508 pp.
(1930): ‘‘Contributions to the History of Determinants, 1900-1920,” Blackie
& Son, Ltd., Glasgow, xxiii + 408 pp.
and W. H. Metzler (1930): ‘Theory of Determinants,”’ Albany, 606 pp.
Francis J. Murray (1947): ‘‘The Theory of Mathematical Machines,” King’s Crown
Press, New York, vii + 116 pp.
(1949): Linear Equation Solvers, Quart. Appl. Math., 7 :263-274.
(1951): Error Analysis for Mathematical Machines, Trans. NY Acad. Sci.,
13(1]) :168-174.
H. Nagelsbach (1876, 1877): Studien zur Fiirstenau’s neuer Methode, Arch. Math.
Phys., 69 :147-192; 61:19-85.
M. Lewis Nelson (1949): “‘A Monte Carlo Computation Being Made for Neutron
Attenuation in Water,” Oak Ridge National Laboratory, ORNL-439.
Eugen Netto (1887): Ueber einen Algorithmus zur Auflésung numerischer alge-
braischer Gleichungen, Math. Ann., 29:141-147.
(1887): Zur Theorie der Horeten Funktionen, Math. Ann., 29:148-153.
John von Neumann and H. H. Goldstine (1947): Numerical Inverting of Matrices of
High Order, Bull. Am. Math. Soc., 53 :1021—-1099.
E. H. Neville (1934): Iterative Interpolation, Ind. Math. Soc. J., 20:87-120.
N. E. Nérlund (1926): ‘“‘Legons sur les séries d’interpolation,”’ Gauthier-Villars & Cie,
Paris, vii + 236 pp.
Kristen Nygaard (1952): “On the Solution of Integral Equations by Monte-Carlo
Methods,” Norwegian Defence Research Establishment, Rapport Nr. F-R94.
C. D. Olds (1950): The Best Polynomial Approximation of Functions, Am. Math.
Monthly, 57 :617-621.
Ascher Opler (1951): Monte Carlo Matrix Calculation with Punched Card Machines,
MTAC, 5:115-120.
Alexander Ostrowski (1936): Konvergenzdiskussion und Fehlerabschatzung fiir die
Newton’sche Methode bei-Gleichungssystemen, Comment. Math. Helv., 9:79-
103.
(1937a): Sur la détermination des bornes inférieures pour une classe des
déterminants, Bull. sct. math. (2), 61:19-32.
(19376): Uber die Determinanten mit iiberwiegender Hauptdiagonale, Com-
ment. Math. Helv., 10 :69-96.
(1937c): Uber die Konvergenz und die Abrundungsfestigkeit des Newtonschen
Verfahrens, Rec. Math., 2:1073-1095. :
(1938): Uber einen Fall der Konvergenz des Newtonschen Naherungsver-
fahrens, Rec. Math., 3:254-258.
(1940): Recherches sur la méthode de-Graeffe et les zéros des polynomes et des
series de Laurent, Acta Math., 72:99-155.
(1952): Note on Bounds for Determinants with Dominant Principal Diagonal,
Proc. Am. Math. Soc., 3:26-30.
and Olga Taussky (1951): On the Variation of the Determinant of a Positive
Definite Matrix, Koninkl. Ned. Akad. Wetenschap. Proc., 54:383-385.
W. V. Parker (1948): Characteristic Roots and the Field of Values of a Matrix, Duke
Math. J., 15 :439-442.
(1948): Sets of Complex Numbers Associated with a Matrix, Duke Math. J.,
16 :711-715.
(1951): Characteristic Roots and Field of Values of a Matrix, Bull. Am. Math.
Soc., 57 :103-108.
Maurice Parodi (1949): Sur les limites des modules des racines des équations algé-
briques, Bull. sct. math., 73 :135-144.
(1951): Sur les familles-de matrices auxquelles est applicable une méthode
ditération, Compt. rend., 232 :1053-1054.
(1952): Sur quelques propriétés des valeurs caractéristiques des matrices
carrées, Mém. sci. math. acad. sci. Paris, No. 118.
G. Peano (1914): Residuo in formulas de quadratura, Mathesis, 34:1—-10.
Karl Pearson (1920): ‘‘On the Construction of Tables and on Interpolation, Tracts
for Computers II,’’ Cambridge University Press, London, 70 pp.
Hans Petersson (1949): Uber Interpolation durch Lésungen linearer Differential
Gleichungen, Abhandl. Math. Sem. Hamburg, 16 :40-55.
S. Pincherle (1915): Funktional Operationen und Gleichungen, Enc. Math. Wiss.,
II A, 11:761-817.
R. Plunkett (1950): On the Convergence of Matrix Iteration Processes, Quart. Appl.
Math., 7:419-421.
G. Polya (1915): Uber das Graeffesche Verfahren, Z. Math. u. Phys., 68 :275-290.
A. Porter and C. Mack (1949): New Methods for the Numerical Solution of Algebraic
Equations, Phil. Mag. (7), 40:578-585.
G. Baley Price (1951): Bounds for Determinants with Dominant Principal Diagonal,
Proc. Am. Math. Soc., 2:497—502.
"Proceedings of a Second Symposium on Large-scale Digital Calculating Machinery"
(1949), Harvard University Press, Cambridge, Mass., 393 pp.
R. Prony (An IV): Essai expérimentale et analytique, J. polytechnique Cah., 2:24-76.
Hans A. Rademacher (1947): On the Accumulation of Errors in Processes of Integra-
tion on High-speed Calculating Machines, Proc. Symposium Large-scale Digital
Calculating Machinery, 176-185, Harvard University Press, Cambridge, Mass.,
1948.
and I. J. Schoenberg (1947): An Iteration Method for Calculation with Laurent
Series, Quart. Appl. Math., 4:142-159.
Johann Radon (1935): Restausdriicke bei Interpolations- und Quadraturformeln durch
bestimmte Integrale, Monatsh. Math. u. Phys., 42:389-396.
Raymond Redheffer (1948): Errors in Simultaneous Linear Equations, Quart. Appl.
Math., 6 :342-343.
Edgar Reich (1949): On the Convergence of the Classical Iterative Method of Solving
Linear Simultaneous Equations, Ann. Math. Stat., 20:448-451.
Lewis F. Richardson and J. Arthur Gaunt (1927): The Deferred Approach to the
Limit, Phil. Trans. Roy. Soc. (London), A226 :299-361.
H. W. Richmond (1944): On Certain Formulae for Numerical Approximation, J.
London Math. Soc., 19:31-38.
R. D. Richtmyer (1952): ‘“‘The Evaluation of Definite Integrals, and a Quasi-Monte-
Carlo Method Based on the Properties of Algebraic Numbers,’’ Los Alamos
Scientific Laboratory Report, LA-1342.
Mari Sofia Roma (1947): Il metodo dell’ortogonalizzazione per la risoluzione numerica
dei sistemi di equazioni algebriche, Riv. Catasto e serviti tecnict Erariali, No. 1.
Arnold E. Ross (1950): ‘‘A General Theory of the Iterative Methods of Solution of
Linear Systems,’”’ University of Notre Dame, Office of Air Research Contract,
Technical Report No. 1.
J. Barkley Rosser (1951): Transformations to Speed the Convergence of Series,
J. Research Nat. Bur. Standards, 46 :56-64.
, C. Lanczos, M. R. Hestenes, and W. Karush (1951): The Separation of Close
Eigenvalues of a Real Symmetric Matrix, J. Research Nat. Bur. Standards,
47 :291—297.
Carl Runge (1885): Entwicklung der Wurzeln einer algebraischen Gleichung in
Summen von rationalen Funktionen der Coefficienten, Acta Math., 6 :305-318.
.and H. Kénig (1924): “‘Vorlesungen iiber numerisches Rechnen,’’ Springer-
Verlag, Berlin.
and F. A. Willers (1915): Numerische und graphische Quadratur und Inte-
gration gewohnlicher und partieller Differentialgleichungen, Hnc. Math. Wiss.,
II C, 2:47-176.
S. Rushton (1951): On Least Squares Fitting by Orthonormal Polynomials Using
the Choleski Method, J. Roy. Stat. Soc., B13 :92-99.
D. H. Sadler (1950): Maximum-interval Tables, MTAC, 4:129-132.
Herbert E. Salzer (1951): Checking and Interpolation of Functions Tabulated at
Certain Irregular Logarithmic Intervals, J. Research Nat. Bur. Standards, 46 :74-
77.
Paul A. Samuelson (1942): A Method of Determining Explicitly the Coefficients of
the Characteristic Equation, Ann. Math. Stat., 13 :424—429.
(1945): A Convergent Iterative Process, J. Math. Phys., 24:131-134.
(1949): Iterative Computation of Complex Roots, J. Math. Phys., 28 :259-267.
L. Sancery (1862): De la méthode des substitutions successives pour le calcul des
racines des équations, Nouvelles ann. math. (2), 1:305-315,
Arthur Sard (1948a): Integral Representations of Remainders, Duke Math. J.,
15:333-345.
(1948b): The Remainder in Approximations by Moving Averages, Bull. Am.
Math. Soc., 54:788-792.
(1949a): Best Approximate Integration Formulas; Best Approximation
Formulas, Am. J. Math., 71:80-91.
(1949b): Smoothest Approximation Formulas, Ann. Math. Stat., 20:612-615.
(1951): Remainders: Functions of Several Variables, Acta Math., 84:319-346.
James B. Scarborough (1950): “Numerical Mathematical Analysis,” 2d ed., Johns
Hopkins Press, Baltimore, xviii + 511 pp.
Erhard Schmidt (1908): Uber die Auflésung linearer Gleichungen mit unendlich
vielen Unbekannten, Rend. circ. mat. Palermo, 26 :53-77.
Hermann Schmidt (1950): Uber Wurzelapproximation nach Euler und Fixgebilde
linearer Transformationen, Math. Z., 62:547-556.
Robert J. Schmidt (1941): On the Numerical Solution of Linear Simultaneous Equa-
tions by an Iterative Method, Phil. Mag. (7), 32:369-383.
(1935): Die Allgemeine Newtonsche Quadraturformel und Quadraturformeln
fiir Stieltjesintegrale, J. reine angew. Math., 178 :52-59.
I. J. Schoenberg (1946): Contributions to the Problem of Approximation of Equi:
distant Data by Analytic Functions, Quart. Appl. Math., 4:45-99, 112-141.
(1952): ‘On Smoothing Operations and Their Generating Functions,”
National Bureau of Standards Report 1734.
Ernst Schroder (1870): Uber unendliche viele Algorithmen zur Auflésung der Gleich-
ungen, Math. Ann., 2:317-365.
(1871): Ueber iterirte Funktionen, Math. Ann., 3:296-322.
Gunther Schulz (1933): Iterative Berechnung der reziproken Matrix, Z. angew. Math.
u. Mech., 13 :57-59.
(1942): Uber die Lésung von Gleichungssystemen durch Iteration, Z. angew.
Math. u. Mech., 22 :234-235.
Hans Schwerdtfeger (1950): ‘Introduction to Linear Algebra and the Theory of
Matrices,” P. Noordhoff N. V., Groningen, 280 pp.
L. Seidel (1874): Uber ein Verfahren die Gleichungen, auf Welche die Methode der
kleinsten Quadrate fihrt, sowie lineare Gleichungen tiberhaupt, durch successive
Anndaherung aufzulésen, Abhandl. bayer. Akad. Wiss., Math.-physik. Kl., 11:81-
108.
K. A. Semendiaev (1943): “The Determination of Latent Roots and Invariant Mani-
folds of Matrices by Means of Iterations” (translated by Curtis D. Benster),
National Bureau of Standards Report 1402.
W. F. Sheppard (1924): Interpolation with Least Mean Square of Error, Proc. Intern.
Math. Congr. Toronto, 821-830.
J. Sherman and W. J. Morrison (1949): Adjustment of an Inverse Matrix Correspond-
ing to Changes in the Elements of a Given Column or a Given Row of the Original
Matrix, Ann. Math. Stat., 20 3621.
(1950): Adjustment of an Inverse Matrix Corresponding to a Change in One
Element of a Given Matrix, Ann. Math. Stat., 21:124.
J. A. Shohat (1929): On a Certain Formula of Mechanical Quadratures with Non-
equidistant Ordinates, Trans. Am. Math. Soc., 31:448-463.
and A. V. Bushkovitch (1942): On Seine Applications of the Tchebycheff
Inequality for Definite Integrals, J. Math. Phys., 21:211—-217.
and C. Winston (1934): On Mechanical Quadratures, Rend. circ. math. Palermo,
68 :153-165.
Y. A. Srefder (1951): The Solution of Systems of Linear Consistent Algebraic Equa-
tions (Russian), Doklady Akad. Nauk SSSR, 76 :651-654.
J. F. Steffenson (1927): “Interpolation,” The Williams & Wilkins Company, Baltimore.
Marvin L. Stein (1952): Gradient Methods in the Solution of Systems of Linear
Equations, J. Research Nat. Bur. Standards, 48 :407—413.
P. Stein (1951a): A Note on Inequalities for the Norm of a Matrix, Am. Math. Monthly,
58 :558-559.
(1951b): The Convergence of Seidel Iterants of Nearly Symmetric Matrices,
MTAC, 5:237-240.
(1952a): A Note on Bounds of Multiple Characteristic Roots of a Matrix,
J. Research Nat. Bur. Standards, 48 :59-60.
(1952b): Some General Theorems on Iterants, J. Research Nat. Bur. Standards,
48 :82-83.
J. K. Stewart (1951): Another Variation of Newton’s Method, Am. Math. Monthly,
58 :331-334.
Eduard Stiefel (1952): Uber einige Methoden der Relaxationsrechnung, Z. angew.
Math. u. Phys., 3:1-33.
J. L. Synge (1944): A Geometrical Interpretation of the Relaxation Method, Quart.
Appl. Math., 2:87-89.
Olga Taussky (1949): A Recurring Theorem on Determinants, Am. Math. Monthly,
56 :672-676.
(1950): Notes on Numerical Analysis—2. Note on the Condition of Matrices,
MTAC, 4:111-112.
F. Theremin (1855): Recherches sur la résolution des équations de tous les dégrés,
J. reine angew. Math., 49:187-243.
T. N. Thiele (1909): “Interpolationsrechnung,” B. G. Teubner, Leipzig, xii + 175 pp.
John Todd (1949a): The Condition of a Certain Matrix, Proc. Cambridge Phil. Soc.,
46 :116-118.
(19496): The Condition of Certain Matrices, I, Quart. J. Mech. Appl. Math.,
2 :469-472.
L. B. Tuckerman (1941): On the Mathematically Significant Figures in the Solution
of Simultaneous Linear Equations, Ann. Math. Stat., 12:307-316.
A. M. Turing (1948): Rounding-off Errorsin Matrix Proce Quart. J. M ech. Appl.
Math., 1:287-308.
H. W. Turnbull (1929): “The Theory of Determinants, Matrices, and Invariants,”’
Blackie & Son, Ltd., Glasgow, xvi + 338 pp.
and A. C. Aitken (1930): “An Introduction to the Theory of Canonical
Matrices,”’ Blackie & Son, Ltd., Glasgow, xiii + 192 pp.
C. de la Vallée Poussin (1952): ere sur l’approximation des fonctions wane
variable réelle,”” Gauthier-Villars & Cie, Paris, vi + 151 pp.
C. C. Van Orstrand (1910): Reversion of Power Series, Phil. Mag. (6), 19 :366-376.
Frank M. Verzuh (1949): The Solution of Simultaneous Linear eee with the
Aid of the 602 Calculating Punch, MTAC, 3:453-462.
Bernard Vinograde (1950): Note on the Escalator Method, Proc. Am. Math. Soc.,
1:162-164.
H. S. Wall (1948): A Modification of Newton’s Method, Am. Math. Monthly, 55 :90-94.
Wolfgang Wasow (1950): “‘On the Duration of Random Walks,” National Bureau of
Standards prepublication copy.
(1951a): “A Note on the Inversion of Matrices by Random Walks,” National
Bureau of Standards, NAML Report 52-15.
(1951b): Random Walks and the Eigenvalues of Elliptic Difference Equations,
J. Research Nat. Bur. Standards, 46 :65-73.
Harold Wayland (1945): Expansion of Determinantal Equations into Polynomial
Form, Quart. Appl. Math., 2:277-306.
F. Wenzl (1952): Iterationsverfahren zur Berechnung komplexer Nullstellen von
Gleichungen, Z. angew. Math. u. Mech., 32:85-87.
E. T. Whittaker and G. Robinson (1940): ‘The Calculus of Observations. A Treatise
on Numerical Mathematics,” 3d ed., Blackie & Son, Ltd., Glasgow, xvi + 395 pp.
Helmut Wielandt (1944): Das Iterationsverfahren bei nicht selbstadjungierten
linearen Eigenwertaufgaben, Math. Z., 50 :93-143.
Albert Wilansky (1951): The Row-sums of the Inverse Matrix, Am. Math. Monthly,
58 :614-615.
Maurice V. Wilkes, David J. Wheeler, and Stanley Gill (1951): ‘‘The Preparation of
Programs for an Electronic Digital Computer,” Addison-Wesley Press, Inc.,
Cambridge, 170 pp.
Fr. A. Willers (1950): “‘Methoden der praktischen Analysis,”’ 2d ed., Walter De Gruy-
ter & Company, Berlin, 410 pp.
C. Winston (1934): On Mechanical Quadratures Formulae Involving the Classical
Orthogonal Polynomials, Ann. Math., 35 :658-677.
H. Wittmeyer (1936a): Einfluss der Anderung einer Matrix auf die Lésung des
zugehérigen Gleichungssystems, sowie auf die characteristischen Zahlen und die
Eigenvektoren, Z. angew. Math. u. Mech., 16 :287-300.
(19360): Uber die Lésung von linearen Gleichungssysteme durch Iteration,
Z. angew. Math. u. Mech., 16:301-310.
J. R. Womersley (1946): Scientific Computing in Great Britain, MTAC, 2:110-117.
Y. K. Wong (1935): An Application of Orthogonalization Process to the Theory of
Least Squares, Ann. Math. Stat., 6:53-75.
Max Woodbury (1950): Inverting Modified Matrices, Memorandum Report 42,
Statistical Research Group, Princeton.
F. L. Wren (1937): Neo-Sylvester Contractions and the Solution of Seene of Linear
Equations, Bull. Am. Math. Soc., 43 :823-834.
Rudolf Zurmihl (1950): iM Matrizen Eine Darstellung fiir Ingenieure,’”’ Springer-
Verlag, Berlin, xv ++ 427 pp.
PROBLEMS

CHAPTER 1
1. Suppose that a, b, c, and x are digital numbers and that |ax + b| < |c|. Assume
a machine forming pseudoproducts and pseudoquotients with maximum error e.
For calculating
y = (ax + b)/c
(a) If |a| < |c| and |b| < |c|, what routine is optimal, and what is the error?
(b) If |a| > |c| and |x| > |c|, what error may occur?
2. For
y = Σᵢ aᵢxⁱ,     |aᵢ| ≤ 1,     |x| < 1,
aᵢ and x digital, describe a routine producing only digital intermediate quantities and
obtain limits for the final error.
3. A table of values of f(x) is to be prepared at equally spaced values of x with
values of Δf given to facilitate interpolation. Should one give
Δfᵢ = f*(xᵢ₊₁) − f*(xᵢ)
or
Δ*fᵢ = [f(xᵢ₊₁) − f(xᵢ)]*?
That is, should one give the rounded difference of the f's or the difference of the
rounded f's?
4. Find the error in the evaluation of
2⁻¹(2⁻¹ + 2⁻½)^½
if the computation makes use of the routine described above for square roots.
5. Obtain formulas for the errors Δy and relative errors Δy/y due to errors Δx for
(a) y = sin x,
(b) y = tan x,
(c) y = sec x,
(d) y = exp (ax),
(e) y = log x.
6. Obtain a formula for the error Δx in the solution of the quadratic equation
ax² + bx + c = 0 if the coefficients may be in error by amounts Δa, Δb, Δc.
7. If f(x) = 1 − x²/a, the iteration
xᵢ₊₁ = xᵢ[1 + f(xᵢ)/2]
converges to the square root of a. Devise a routine based upon this iteration, assum-
ing a machine with the same characteristics as described in §1.5, and analyze the
errors.
CHAPTER 2
1. Solve in four distinct ways, using at least one direct and at least one iterative
method:
3.2x − 2.0y + 3.9z = 13.0,
2.1x + 5.1y − 2.9z = 8.6,
5.9x + 3.0y + 2.2z = 6.9.
2. If a, b, c, and d are arbitrary vectors in a plane, show that
[b, c][a, d] + [c, a][b, d] + [a, b][c, d] = 0,
and write the corresponding determinantal identity.
3. If a, . . . , e are vectors in 3-space, show that
[b, c, e][a, d, e] + [c, a, e][b, d, e] + [a, b, e][c, d, e] = 0.
4. If a₁, . . . , aₙ are linearly independent, then the vectors aⁱ satisfying
aⁱ · aⱼ = δᵢⱼ
are said to form a set reciprocal to the initial one. Show that
[ai1, @s, as, aal:[€1, 2, @€s, @4] = [e1, €2, es, eaj:[at, a, a3, aé]
ll [ai, 2, As, aaj:[€1, €2, es, a4]

and write the corresponding determinantal identities.


5. If the eᵢ form an arbitrary set of (linearly independent) reference vectors, let
eʲ represent the reciprocal set. Show that, if G is the matrix of eᵢ · eⱼ, then G⁻¹ is the
matrix of eⁱ · eʲ. Also if a′ is the covariant representation of a in the system eᵢ, then
it is the contravariant representation in the system eʲ, and conversely.
6. With the system of equations Ax = y of Prob. 1, form the equivalent system
A′Ax = A′y. Solve using (i) Seidel iterations, (ii) relaxations, (iii) triangular
factorization, and (iv) the method of Stiefel and Hestenes.
7. Suppose it is required to evaluate a′x, where a is a known vector and x satisfies
Ax = y. Show that in the factorization of the bordered matrix
| A     y |
|         |  =  L′W′,
| −a′   0 |
where L′ is unit lower triangular and W′ upper triangular, a′x is the element in the
lower right-hand corner of W′.
8. In the process of making a triangular factorization of a matrix A, certain quanti-
ties may vanish and necessitate a reordering of rows or columns, or both, in order to
proceed. Show that the process will go through without such rearrangements if and
only if all the following determinants are non-null:

a ae @11 12 13
11 12
O11, 9 |@2t Gee eg], « « « «
Qo Ane
31 A32 233

CHAPTER 3

1. If f(x) is of degree r < n, and
g(x) = (x − x₀)(x − x₁) · · · (x − xₙ)
has no repeated factors, then
f(x)/g(x) =
| 1               . . .   1               |     | 1      . . .   1     |
| x₀              . . .   xₙ              |     | x₀     . . .   xₙ    |
| . . . . . . . . . . . . . . . . . .     |  ÷  | . . . . . . . . . .  |
| x₀ⁿ⁻¹           . . .   xₙⁿ⁻¹           |     | x₀ⁿ    . . .   xₙⁿ   |
| f(x₀)/(x − x₀)  . . .   f(xₙ)/(x − xₙ)  |

2. From Eqs. (3.02.9) express Sⱼ as a determinant in the σᵢ and σⱼ as a determinant
in the Sᵢ.
3. Obtain equations similar to (3.02.5) and (3.02.9) relating the Sᵢ and the sᵢ.
From these obtain determinantal expressions for members of each set in terms of those
of the others.
4. If the xᵢ are distinct, show that
| 1    1    1   |     | 1    1    1   |
| x₁   x₂   x₃  |  ÷  | x₁   x₂   x₃  |  =  x₁ + x₂ + x₃,
| x₁³  x₂³  x₃³ |     | x₁²  x₂²  x₃² |

| 1    1    1   |     | 1    1    1   |
| x₁²  x₂²  x₃² |  ÷  | x₁   x₂   x₃  |  =  x₂x₃ + x₃x₁ + x₁x₂.
| x₁³  x₂³  x₃³ |     | x₁²  x₂²  x₃² |

Generalize for x₁, x₂, . . . , xₙ.
5. Evaluate π as the smallest root of x⁻¹ sin x = 0 by applying (i) Bernoulli's
method and (ii) Graeffe's method, either to the equation itself or to a suitable trans-
form.
6. Accelerate the convergence of the Bernoulli sequence for the preceding problem
by applying the δ² process.
7. Form a third-order iteration for x³ + x − 1 = 0.
8. Form a second-order polynomial iteration for the same equation.

CHAPTER 4
1. Find the characteristic equation of the matrix

Bol 2) 4
(EE SOR) t
7 3
412 2

2. Evaluate the largest proper value (s) of the above matrix by iteration.
3. Apply Lanczos’s method (§4.23) to obtain the proper values and proper vectors
of the same matrix.
4. Diagonalize by the method of §4.115

Bit Onde 0)
eo) ek
i 23.024h
Onin S
5. Obtain the triple-diagonal form (§4.24) of this matrix.
6. The largest proper value of a certain matrix A of order n is to be evaluated by
iteration. Though the sequence (A²)ⁱx will give more rapid convergence than the
sequence Aⁱx, the formation of A² requires n³ preliminary multiplications. Assuming
p iterations of A would be required, for what minimal value of p would it be more
efficient first to obtain A²?

CHAPTER 5
1. If g(x) is of degree n − 1 or less, show that
Σ g(xᵢ)/ω′(xᵢ) = 0.
2. Show that Σ xᵢⁿ/ω′(xᵢ) = 1.
3. Show that Σ xᵢω″(xᵢ)/ω′(xᵢ) = n(n + 1).
4. For any integer ν > 0 show that
(ν + 1)Lᵢ^(ν)(xᵢ) = ω^(ν+1)(xᵢ)/ω′(xᵢ).


5. For x = cos φ the functions
Uₙ₋₁(x) = 2^(1−n) sin nφ csc φ
are called Chebyshev polynomials of the second kind. Show that they are poly-
nomials, obtain a recursion, and determine their zeros.
6. The values of f(xᵢ) and their differences are to be tabulated, with the xᵢ equally
spaced, but an erroneous value f(x₀) + ε is entered in place of f(x₀). Show the effect
of this error on the values of the successive differences.
7. The trigonometric functions are known exactly for certain values of the argu-
ment: 0°, ±30°, ±45°, . . . . Other values of the sine are to be obtained from these
by interpolation. Use an error formula to ascertain how many figures are reliable
if the interpolating polynomial is quadratic; if it is cubic.
8. Using the Chebyshev points, form the cubic interpolation polynomial for interpo-
lating values of the sine over one quadrant.

CHAPTER 6
1. Taking xᵢ = i, use the method of §6.111 to construct the polynomials pᵣ(x),
r = 0, 1, . . . , 6, orthogonal on the points −3, −2, . . . , +3, with W = I.
2. Experimentally measured values yᵢ of f(xᵢ) are given at points xᵢ = i. Values
f′(xᵢ) of the derivative are desired. A standard method is based upon finding the
polynomial of some degree giving the best least-squares fit and differentiating. The
result depends upon the number of points used and upon the degree of the polynomial.
As an example, obtain formulas for f′(0) in terms of y₋₃, y₋₂, . . . , y₃.
3. An approximation of the form (6.0.1) to f(x) is required giving the best least-
squares fit to the data, subject to the restriction that the vector c of the coefficients γᵢ
is constrained to satisfy
Bc = z
exactly (neglecting rounding errors). Show that, with the auxiliary vector w, c is
determined by the system
F′Fc + B′w = F′y,
Bc = z.
4. Obtain the expansion (6.2.2) for
f(x) = cos (2 + x)π/8.


5. Find the best linear polynomial and the best quadratic polynomial, in the sense
of least squares, for fitting the data
x      0      2      6     10     15     18     21
y     14.0   13.0   10.7    8.0    5.0    2.9    1.0
and find the residuals for each.


6. If f(x) is odd with the period 2π, find the approximating trigonometric function
given
x      0     π/6     π/3     π/2     2π/3     5π/6     π
ape i) BL ch 4.5 4 2.5 0.

7. If u = log (1 + x), then eᵘ = 1 + x. Consider this as an equation in u for
fixed x. Then the equation
f(u) = (1 + x − eᵘ)/x = 0
can be solved for u by the method of §3.2, yielding a sequence of rational fractions in
x approximating log (1 + x). Obtain the fifth term in this sequence. For what
values of x does the sequence converge?

CHAPTER 7

1. Apply the Euler-Maclaurin formula (§7.36) to show that, if p is a positive integer,
1^p + 2^p + · · · + n^p = n^(p+1)/(p + 1) + n^p/2
        + Σ_(ν≥2) B_ν p(p − 1) · · · (p − ν + 2) n^(p−ν+1)/ν!.

2. Give a direct derivation of the recursion defined by Eqs. (7.1.18) and (7.1.19).
3. For b = −a = 1, w(x) = 1, calculate ω₀, ω₁, ω₂, ω₃, and the x's associated with
each.
4. Do likewise with a = 0, b = ∞, and w(x) = e⁻ˣ.
5. If xᵢ = x₀ + ih, i = −3, −2, . . . , +3, obtain a formula of the form
∫ f(x)dx = λ₀f₀ + λ₂(f₋₂ + f₂) + λ₃(f₋₃ + f₃) + R,
with R vanishing for polynomials up to a degree as high as possible, and find R in
general.
6. By the method outlined in §7.3, obtain explicit expansions in terms of central
differences for derivatives up to the fifth of f(x) at x = x₀.
7. Let
I(h)
represent the result of applying a numerical quadrature formula based on equally
spaced abscissas with spacing h (e.g., Simpson's rule) to the evaluation of the integral
of a particular function f(x) between fixed limits. Show that I is an even function of
h, expressible in the form
I(h) = I₀ + I₁h² + I₂h⁴ + · · · ,
where I₀ is the exact value of the integral. Hence derive a formula for an improved
approximation to I₀, given I(h) and I(h/2).
8. Obtain ∫₀¹ (1 + x)^½ dx numerically using Gauss's method with a polynomial
of third degree, and compare the result with the true value.
9. Use Simpson's rule with four subintervals to evaluate
∫₂³ dx/log x.

CHAPTER 8
1. Obtain a Monte Carlo estimate of the value of the integral in Prob. 8, Chap. 7.
(Note that y ≥ (1 + x)^½ is equivalent to y² ≥ 1 + x.)
INDEX

Agmon, Shmuel, 84 Bound of matrix, 39-42


Ahiezer, N. I., 214 Bounds for proper values of matrix,
Aitken, A. C., 85, 155, 184, 214 146-149, 184
on Bernoulli’s method, 116-118, 141 Brauer, Alfred, 184
Aitken's δ² process, 117-118, 126-128, Brodetsky, S., 109, 142
141, 153 Browne, E. T., 184
Aitken’s method of interpolation, 202 Bryan, Joseph G., 184
Albert, G. E., 246 Budan’s theorem, 96, 103
Alexander, James, 84
Alternant, 92
Analytic function, 99-103 Canonical form of a matrix, 30-37
of matrix, 37-38 Cayley-Hamilton theorem, 30-32, 143
Approximation, finite linear methods of, Central differences, 207
215-223 Central mean operator, 210
general methods of, 215-225 Cesari’s method, 53-55
iterative, 4, 10-13 Characteristic equation of matrix, 31, 53,
series, 4, 13-14 143
Association for Computing Machinery, Characteristic function of matrix, 31,
16 143, 166, 181
Auxiliary unit matrix, 36 Chebyshev expansions, 223-225
Chebyshev points, 199-200, 213
Chebyshev polynomials, 197-200, 213,
Backward differences, 207, 232 223-225
Backward interpolation, 208 applied to matrix inversion, 85, 154
Bailey’s iteration for root extraction, 125 Chebyshev quadrature formulas, 241
Bargmann, V., 55, 84, 153, 184 Choleski’s method, 84
Bartlett, M. S., 84 Cofactor, 23
Base of representation, 2-3, 5 Collar, A. R., 85, 184
Beck, John, Jr., 142 Collatz, L., 84, 142
Benster, Curtis D., 85 Commutative functions, 142
Berkeley, Edmund C., 16 Complex roots, 108-114, 116-117, 138-
Bernoulli numbers, 241 141
Bernoulli polynomials, 241 Computation Laboratory, Harvard Uni-
Bernoulli’s method, 114-117, 141, 152, versity, 16
155 Computers, automatic, 1, 2, 16
for complex roots, 138 Coordinate systems, 17-20
Bernstein, Serge, 214 Coordinates of vector, 18
Binomial coefficient, generalized, 207 Cramer’s rule, 28
Blaskett, D. R., 141 Critchfield, Charles L., 142
Blunders, 1-2, 152 Crout, Prescott D., 70, 84
Bodewig, E., 84, 142 Crout’s method, 70-71, 82-83
Bodewig’s iteration for matrix inversion, Curry, Haskell B., 213
56-57, 131 Curtiss, J. H., 246
Curve fitting, 185, 225 Escalator method, for linear equations,
least-square, 72, 220-221, 225 78-79
Cutkosky, R. E., 246 for proper values and vectors, 172, 184
Euler-Maclaurin summation formula,
240-241
Deming, W. Edwards, 225 Exponential interpolation, 211-213
Derivative equations, 89-91
Descartes’s rule of signs, 96
Determinants, 22-25 Factor theorem, 87
Differentiation, numerical, 226, 231-233 Factorials, 207
Digital numbers, 7-10, 15, 60-65 Factorization methods for nonlinear
Direct methods, for linear equations, equations, 138
65-79, 82-83 Faddeeva, V. H., 85
for proper values and vectors, 166-184 Fejér, Leopold, 214
Displacement operator, 207, 232 Feldheim, Ervin, 214
Divided differences, 202-206, 214, 231- Feller, William, 158, 184
232 Fettis, Henry E., 184
Donsker, M. D., 246 Field of values of matrix, 147, 150
Duncan, W. J., 85 Finite iterations for proper values and
Dwyer, Paul §., 15, 84 vectors, 173-180
Fixed point of transformation, 142
Flanders, D. A., 184
Elements of vector, 18 Forsythe, George, 84, 158, 184, 246
Elimination, methods of, 68-72 Fortet, R., 246
Enclosure theorems, 237 Forward-difference operator, 207, 232
Engineering Research Associates, 16 Forward interpolation, 208
Enlargement methods for proper values, Fourier expansions, 215
170-173 finite, 221-223
Equations, linear, 17-85 Frame, J. 8., 184
direct methods for, 65-79, 82-83 Frazer, R. A., 85
escalator method for, 78-79 Friedman, Bernard, 142
iterative methods for, 38, 44-65, Functional iteration, systems of equa-
81-82 tions, 135-138
nonlinear, 86-143 Functions, analytic, 99-103, 141
factorization methods for, 138 commutative, 142
multiple roots of, 89-90, 97, 103 of matrix, 37-38
rational roots of, 94 minimization of, 132-135
systems of, 132-138 permutable, 142
transcendental, 112-118 symmetric, 87-89, 91
Errors, composition of, 4-6 of proper values, 167-170
of formulation, 2
generated, 3-10, 60-65, 245
initial, 3 Garza, A. de la, 59, 84
of measurement, 2 Gaussian quadrature formulas, 241
propagated, 3, 5-7, 13, 242 Gavurin, N. K., 85
relative, 6-7 Geiringer, Hilda, 54, 84
residual, 3, 4, 14, 58-60, 245 Generated errors, 3-10, 60-65, 245
round-off, 2, 3, 5-6, 8-15, 184 Germond, H. H., 246
sampling, 244 Gill, Stanley, 16
sources of, 2-3, 15 Givens, J. W., 15, 184
statistical estimates of, 14-15, 242-243 Goldstine, H. H., 2, 7, 15, 84, 184
truncation, 2-4, 13, 15, 58-60 Golomb, Michael, 141 ;
Gradient, 47, 78, 133 Iteration, higher-order, 123-132
Graeffe’s method, 106-114, 138, 142, polynomial, 129-132
152, 155 second-order, 122-123
Graves, L. M., 142 Iterative methods, for linear systems,
Grossman, D. P., 85 38, 44-65, 81-82
for proper values and vectors, 146,
150-166
Hadamard, J., 141 Ivanov, V. K., 84
Hall, Jean, 84
Hamilton, Hugh J., 142
Harrison, Joseph O., Jr., 15, 213 Jackson, Dunham, 214
Hartree, D. R., 15 Jahn, H. A., 184
Harvard University Computation Labo- Jordan, Charles, 213
ratory, 16 Jordan’s method, 70-72, 84
Head, J. W., 184
Herget, Paul, 213
Hermite’s interpolation formula, 194-195 Kac, Mark, 246
Hestenes, Magnus R.., 84, 184 ‘Kahn, Herman, 246
Highest common factor, 97-99 Karush, Wm., 184
Hildebrandt, T. H., 142 Kelley, T. L., 184
Hitchcock, Frank L., 140-142 Kincaid, W. M., 214
Horner’s method, 94, 95, 121 Kneschke, A., 241
Hotelling, Harold, 56, 84, 184 Kohn, Walter, 184
Hotelling and Bodewig, method of, 56-57 König's theorem, 103-106, 115, 116,
Hotelling’s iteration for matrix inversion, 123-125, 141
56-57, 131 Kowalewski, Gerhard, 213
Householder, A. S., 246
Hummel, P. M., 240
Huskey, Harry D., 15 Lagrange interpolation formula, 194, 206
Lagrange polynomials, 202
Lanczos, Cornelius, 84, 184, 225
Inman, S., 15 Lanczos’s method of minimized itera-
Institute for Advanced Study, 16 tions, 173-180
Institute for Numerical Analysis, 84, Lanczos’s theorem, 74, 230
184, 246 Laplace expansion, 24, 117
Integration, numerical, 226-241, 243-245 Least-square curve fitting, 72, 220-221,
International Business Machines Cor- 225
poration, 16, 84 Lehmer, D. H., 109
Interpolation, 185-213, 215, 217 Lehmer’s algorithm, 108-112, 142
Aitken’s method of, 202 Length, 25-27
backward, 208 Liebler, Richard A., 246
equal-interval formulas, 206-211 Lin, Shih-nge, 142
exponential, 211-213 Linear dependence, 17, 27
forward, 208 Linear equations (see Equations, linear)
inverse, 214 Linear independence, 17
polynomial, 193-211 Lonseth, A. T., 84
trigonometric, 211-213 Lozenge diagram, 209-210
Isolation of roots of equation, 95-98 Luke, Y. L., 142
Iteration, accelerating convergence of,
55, 117-118, 153-155, 184
first-order, 121-122 McClintock, Emory, 213
functional, 118-132, 135-138 MacDuffee, C. C., 85, 184

Maclaurin quadrature formula, 235-237 Minimized iterations, method of, 173-180, 184
Magnitude of matrix, measures of, 38-44
Mansion, P., 213 Mises, Richard von, 54, 84
Massachusetts Institute of Technology, Moments, method of, 225
16 Monte Carlo method, 242-246
Matrix, adjoint, 27-29, 167-171 Montgomery, D., 56, 84, 153, 184
auxiliary unit, 36 Morris, Joseph, 184
bound of, 39-42 Morrison, W. J., 84
bounds of proper values, 146-149, 184 Muir, Sir Thomas, 85, 142
canonical form of, 30-37 Multiple roots, 89-90, 97, 103
characteristic equation of, 31, 53, 143 Murray, Francis J., 15, 184
characteristic function of, 31, 143, 166,
181
Chebyshev polynomial applied to in- National Applied Mathematics Labora-
version of, 85, 154 tories, 246
complex, 79-81, 146-149 National Bureau of Standards, 16
field of values of, 147, 150 Nelson, M. L., 246
functions of, 37-38 Neumann, John von, 2, 7, 15, 56, 84,
Hermitian, 80, 146 153, 184, 246
Hotelling’s iteration for inversion of, Neville, E. H., 214
56-57, 131 Newton-Gauss formulas, 210
idempotent, 29-30 Newton’s identities, 91, 167
maximum of, 39-44 Newton’s interpolation formulas, 202, 208
minimal equation of, 143 Newton’s method, 4, 94, 95, 124, 129
minimal function of, 143, 166 for complex roots, 138, 140
modified, 79, 83 for square roots, 10-13, 123
norm of, 39-44, 151 for systems of equations, 242
null space of, 31 for transcendental equations, 122
nullity of, 27-28 Newton’s three-eighths rule, 237-239
orthogonal, 26, 37, 43-44, 147 Nonlinear equations (see Equations,
positive, 27, 43, 45-48, 61-62 nonlinear)
proper values of, 150-162, 166 Norm of matrix, 39-44, 151
rank of, 27-28 Null space of matrix, 31
reciprocal of, 27-29 Nullity of matrix, 27-28
semidefinite, 146 Numerical differentiation, 226, 231-233
semidiagonal, 65-67 Numerical integration, 226-241, 243-245
symmetric, 26, 60, 146 Nygaard, Kristen, 246
diagonal form for, 37
triple-diagonal form for, 180-184
Olds, C. D., 225
transpose of, 26
Open quadrature formulas, 239
triangular, 65-67
Operational methods, 206-211, 213, 232-
unitary, 147, 151
241
Matrix product, 19, 25
Optimum-interval interpolation, 197-
Maximum of matrix, 39-44
200, 213
Metric, 26
Orthogonality, 25-27
Metropolis, Nicholas, 246
Orthogonalization, methods of, 72-78, 83
Metzler, W. H., 85
Ostrowski, Alexander, 142, 184
Miller, J. C. P., 214, 225
Outer products, 22-25, 28
Milne, W. E., 213, 225
Minimal equation of matrix, 143
Minimal function of matrix, 143, 166 Parker, W. V., 184
Minimization of function, 132-135 Peano, G., 213
Permutable functions, 142 Residual errors, 3, 4, 14, 58-60, 245
Petersson, Hans, 213 Richmond, H. W., 142
Petersson’s expansion, 192-193, 215, 219, Robinson, G., 142, 213
227 Rolle’s theorem, 90, 196
Petersson’s form of remainder, 191 Petersson’s generalization of, 189
Picard's method, 142 Rosser, J. B., 184
Plunkett, R., 84 Rotational reduction to diagonal form,
Polya, G., 112, 142 160-162, 184
Polynomial interpolation, 193-211 Round-off errors, 2, 3, 5-6, 8-15, 184
Polynomial iterations, 129-132 Routines, 1, 4, 5, 10
Power series, 99 Runge, Carl, 142
Price, G. B., 184
Principal vector, 32-37, 146, 166
orthogonality relations, 36-37 Sard, Arthur, 213, 225, 241
Projections, 29-30, 45-52, 72-78 Scalar product, 25-27
Propagated errors, 3, 5-7, 13, 242 Schmidt’s method, 72
Proper value, 31-37 Schoenberg, I. J., 180, 225
bounds for, 146-149, 184 Schréder’s method, 128-131, 141
enlargement methods for, 170-173 Schwartz inequality, 39
escalator method for, 172, 184 Schwerdtfeger, Hans, 141
finite iterations for, 173-180 Seebeck, C. L., 240
iterative methods for, 146, 150-166 Seidel, method of, 48, 53, 59, 134
methods of computation, 143-184 errors in, 61-62
Proper vector, 31 operational count for, 81-82
escalator method for, 172, 184 Semendiaev, K. A., 184
finite iterations for, 173-180 Sherman, Jack, 84
iterative methods for, 146, 150-166 Shortley, George, 184
methods of computation, 143-184 Significant figures, 7
Pseudo operations, 5, 8-10 Simpson’s rule, 236-237, 239
Singular point of analytic function, 141
Smeal, G., 109, 142
Quadrature, 185, 226-231 Smoothing, 185, 225
Steepest descent, method of, 47-48, 50,
51, 82
Rademacher, Hans A., 15, 130 for nonlinear systems, 132-135
RAND Corporation, 245 Steffenson, J. F., 213, 241
Random sequence, 245-246 Steffenson’s error test, 211
Rank of matrix, 27-28 Stiefel, Eduard, 84
Reciprocal of matrix, 27-29 Stiefel and Hestenes, method of, 73-78
Regula falsi, 121-122 Stein, Marvin, 84
Reich, Edgar, 84 Sturm functions, 95-98, 182, 184
Relative errors, 6-7 Sturm's theorem, 96-98
Relaxation, method of, 48, 50, 51, 81 Subroutines, 4
Remainder, 213, 219, 225 Symmetric functions, 87-89, 91
in interpolation, 188-193 of proper values, 167-170
in numerical differentiation, 231-232 Synthetic division, 93-95
in numerical quadrature, 226-228, 235-
241
Petersson’s form of, 191 Taussky, Olga, 184
in polynomial interpolation, 196-197, Taylor’s expansion, Petersson’s generali-
204-206, 211, 213 zation of, 192-193
Remainder theorem, 87 Tompkins, C. B., 84
Trace, of matrix, 31, 39, 44, 150. Vallée Poussin, Ch. de la, 214
of powers of matrix, 151, 184 Vandermonde determinants, 91-93, 142,
Transcendental equations, 112-118 198, 204
Transformation, 20-22, 32-37 Vector, covariant and contravariant, 27.
fixed point of, 142 geometric, 17-20
Transpose of matrix, 26 numerical, 18
Trapezoidal rule, 227, 234-235, 239 Vinograde, Bernard, 184
Triangular inequality, 40
Trigonometric interpolation, 211-213
Triple-diagonal form, 180-184 Wasow, Wolfgang, 246
Truncation errors, 2-4, 13, 15, 58-60 Wenzl, F., 142
Turing, A. M., 15, 84 Wheeler, David J., 16
Whittaker, E. T., 118, 142, 213
Whittaker’s expansion, 118, 142
Ufford, Dolores, 142 Wilks, Maurice V., 16
Ulam, S., 246 Woodbury, Max, 84
University of Illinois, 16 Wronskian, 188
A CATALOG OF SELECTED
DOVER BOOKS
IN SCIENCE AND MATHEMATICS

Astronomy
BURNHAM’S CELESTIAL HANDBOOK, Robert Burnham, Jr. Thorough guide
to the stars beyond our solar system. Exhaustive treatment. Alphabetical by constel-
lation: Andromeda to Cetus in Vol. 1; Chamaeleon to Orion in Vol. 2; and Pavo to
Vulpecula in Vol. 3. Hundreds of illustrations. Index in Vol. 3. 2,000pp. 6% x 94.
Vol. I: 0-486-23567-X
Vol. II: 0-486-23568-8
Vol. III: 0-486-23673-0

EXPLORING THE MOON THROUGH BINOCULARS AND SMALL TELE-


SCOPES, Ernest H. Cherrington,
Jr. Informative, profusely illustrated guide to locat-
ing and identifying craters, rills, seas, mountains, other lunar features. Newly revised
and updated with special section of new photos. Over 100 photos and diagrams.
240pp. 8% x 11. 0-486-24491-1

THE EXTRATERRESTRIAL LIFE DEBATE, 1750-1900, MichaelJ. Crowe. First


detailed, scholarly study in English of the many ideas that developed from 1750 to
1900 regarding the existence of intelligent extraterrestrial life. Examines ideas of
Kant, Herschel, Voltaire, Percival Lowell, many other scientists and thinkers. 16 illus-
trations. 704pp. 5% x 814. 0-486-40675-X

THEORIES OF THE WORLD FROM ANTIQUITY TO THE COPERNICAN


REVOLUTION, MichaelJ.Crowe. Newly revised edition of an accessible, enlight-
ening book recreates the change from an earth-centered to a sun-centered concep-
tion of the solar system. 242pp. 5% x 8'4. 0-486-41444-2

A HISTORY OF ASTRONOMY, A. Pannekoek. Well-balanced, carefully reasoned


study covers such topics as Ptolemaic theory, work of Copernicus, Kepler, Newton,
Eddington’s work on stars, much more. Illustrated. References. 521pp. 5% x 84.
0-486-65994-1

A COMPLETE MANUAL OF AMATEUR ASTRONOMY: TOOLS AND


TECHNIQUES FOR ASTRONOMICAL OBSERVATIONS, P. Clay Sherrod
with Thomas L. Koed. Concise, highly readable book discusses: selecting, setting up
and maintaining a telescope; amateur studies of the sun; lunar topography and occul-
tations; observations of Mars, Jupiter, Saturn, the minor planets and the stars; an
introduction to photoelectric photometry; more. 1981 ed. 124 figures. 25 halftones.
37 tables. 335pp. 614 x 914. 0-486-40675-X

AMATEUR ASTRONOMER’S HANDBOOK,J. B. Sidgwick. Timeless, compre-


hensive coverage of telescopes, mirrors, lenses, mountings, telescope drives, microm-
eters, spectroscopes, more. 189 illustrations. 576pp. 5% x 8'4. (Available in U.S. only.)
0-486-24034-7

STARS AND RELATIVITY, Ya. B. Zel’dovich and I. D. Novikov. Vol. 1 of Relativistic


Astrophysics by famed Russian scientists. General relativity, properties of matter under
astrophysical conditions, stars, and stellar systems. Deep physical insights, clear pre-
sentation. 1971 edition. References. 544pp. 5% x 844. 0-486-69424-0

Chemistry
THE SCEPTICAL CHYMIST: THE CLASSIC 1661 TEXT, Robert Boyle. Boyle
defines the term “element,” asserting that all natural phenomena can be explained by
the motion and organization of primary particles. 1911 ed. viiit+232pp. 5% x 8.
0-486-42825-7

RADIOACTIVE SUBSTANCES, Marie Curie. Here is the celebrated scientist’s


doctoral thesis, the prelude to her receipt of the 1903 Nobel Prize. Curie discusses
establishing atomic character of radioactivity found in compounds of uranium and
thorium; extraction from pitchblende of polonium and radium; isolation of pure radi-
um chloride; determination of atomic weight of radium; plus electric, photographic,
luminous, heat, color effects of radioactivity. iit+94pp. 5% x 84. 0-486-42550-9

CHEMICAL MAGIC, Leonard A. Ford. Second Edition, Revised by E. Winston


Grundmeier. Over 100 unusual stunts demonstrating cold fire, dust explosions,
much more. Text explains scientific principles and stresses safety precautions.
128pp. 5% x 84. 0-486-67628-5

THE DEVELOPMENT OF MODERN CHEMISTRY, AaronJ. Ihde. Authorita-


tive history of chemistry from ancient Greek theory to 20th-century innovation.
Covers major chemists and their discoveries. 209 illustrations. 14 tables.
Bibliographies. Indices. Appendices. 85lpp. 5% x 87. 0-486-64235-6

CATALYSIS IN CHEMISTRY AND ENZYMOLOGY, William P. Jencks.


Exceptionally clear coverage of mechanisms for catalysis, forces in aqueous solution,
carbonyl- and acyl-group reactions, practical kinetics, more. 864pp. 5% x 8'4.
0-486-65460-5

ELEMENTS OF CHEMISTRY, Antoine Lavoisier. Monumental classic by founder


of modern chemistry in remarkable reprint of rare 1790 Kerr translation. A must for
every student of chemistry or the history of science. 539pp. 5% x 8%. 0-486-64624-6

THE HISTORICAL BACKGROUND OF CHEMISTRY, Henry M. Leicester.


Evolution of ideas, not individual biography. Concentrates on formulation of a coher-
ent set of chemical laws. 260pp. 5% x 8's. 0-486-61053-5

A SHORT HISTORY OF CHEMISTRY, J. R. Partington. Classic exposition


explores origins of chemistry, alchemy, early medical chemistry, nature of atmos-
phere, theory of valency, laws and structure of atomic theory, much more. 428pp.
5% x 8%. (Available in U.S. only.) 0-486-65977-1

GENERAL CHEMISTRY, Linus Pauling. Revised 3rd edition of classic first-year


text by Nobel laureate. Atomic and molecular structure, quantum mechanics, statis-
tical mechanics, thermodynamics correlated with descriptive chemistry. Problems.
992pp. 5% x 84. 0-486-65622-5

FROM ALCHEMY TO CHEMISTRY, John Read. Broad, humanistic treatment


focuses on great figures of chemistry and ideas that revolutionized the science. 50
illustrations. 240pp. 5% x 8's. 0-486-28690-8

Engineering
DE RE METALLICA, Georgius Agricola. The famous Hoover translation of great-
est treatise on technological chemistry, engineering, geology, mining of early mod-
ern times (1556). All 289 original woodcuts. 638pp. 6% x 11. 0-486-60006-8

FUNDAMENTALS OF ASTRODYNAMICS, Roger Bate et al. Modern approach


developed by U.S. Air Force Academy. Designed as a first course. Problems, exer-
cises. Numerous illustrations. 455pp. 5% x 8/4. 0-486-60061-0

DYNAMICS OF FLUIDS IN POROUS MEDIA, Jacob Bear. For advanced stu-


dents of ground water hydrology, soil mechanics and physics, drainage and irrigation
engineering and more. 335 illustrations. Exercises, with answers. 784pp. 6’ x 914.
0-486-65675-6

THEORY OF VISCOELASTICITY (Second Edition), Richard M. Christensen.


Complete consistent description of the linear theory of the viscoelastic behavior of
materials. Problem-solving techniques discussed. 1982 edition. 29 figures.
xiv+364pp. 616 x 9%. 0-486-42880-X

MECHANICS,J. P. Den Hartog. A classic introductory text or refresher. Hundreds


of applications and design problems illuminate fundamentals of trusses, loaded
beams and cables, etc. 334 answered problems. 462pp. 5% x 8’. 0-486-60754-2

MECHANICAL VIBRATIONS, J. P. Den Hartog. Classic textbook offers lucid


explanations and illustrative models, applying theories of vibrations to a variety of
practical industrial engineering problems. Numerous figures. 233 problems, solu-
tions. Appendix. Index. Preface. 436pp. 5% x 8’. 0-486-64785-4

STRENGTH OF MATERIALS, J. P. Den Hartog. Full, clear treatment of basic


material (tension, torsion, bending, etc.) plus advanced material on engineering
methods, applications. 350 answered problems. 323pp. 5% x 844. 0-486-60755-0

A HISTORY OF MECHANICS, René Dugas. Monumental study of mechanical


principles from antiquity to quantum mechanics. Contributions of ancient Greeks,
Galileo, Leonardo, Kepler, Lagrange, many others. 671pp. 5% x 8%. 0-486-65632-2

STABILITY THEORY AND ITS APPLICATIONS TO STRUCTURAL


MECHANICS, Clive L. Dym. Self-contained text focuses on Koiter postbuckling
analyses, with mathematical notions of stability of motion. Basing minimum energy
principles for static stability upon dynamic concepts of stability of motion, it devel-
ops asymptotic buckling and postbuckling analyses from potential energy considera-
tions, with applications to columns, plates, and arches. 1974 ed. 208pp. 5% x 8's.
0-486-42541-X

METAL FATIGUE, N. E. Frost, K. J. Marsh, and L. P. Pook. Definitive, clearly writ-


ten, and well-illustrated volume addresses all aspects of the subject, from the histori-
cal development of understanding metal fatigue to vital concepts of the cyclic stress
that causes a crack to grow. Includes 7 appendixes. 544pp. 5% x 84%. 0-486-40927-9

ROCKETS, Robert Goddard. Two of the most significant publications in the history
of rocketry and jet propulsion: “A Method of Reaching Extreme Altitudes” (1919) and
“Liquid Propellant Rocket Development” (1936). 128pp. 5% x 844. 0-486-42537-1
STATISTICAL MECHANICS: PRINCIPLES AND APPLICATIONS, Terrell L.
Hill. Standard text covers fundamentals of statistical mechanics, applications to fluc-
tuation theory, imperfect gases, distribution functions, more. 448pp. 5% x 8%.
0-486-65390-0
ENGINEERING AND TECHNOLOGY 1650-1750: ILLUSTRATIONS AND
TEXTS FROM ORIGINAL SOURCES, Martin Jensen. Highly readable text with
more than 200 contemporary drawings and detailed engravings of engineering pro-
jects dealing with surveying, leveling, materials, hand tools, lifting equipment, trans-
port and erection, piling, bailing, water supply, hydraulic engineering, and more.
Among the specific projects outlined-transporting a 50-ton stone to the Louvre, erect-
ing an obelisk, building timber locks, and dredging canals. 207pp. 8% x 11'.
0-486-42232-1
THE VARIATIONAL PRINCIPLES OF MECHANICS, Cornelius Lanczos.
Graduate level coverage of calculus of variations, equations of motion, relativistic
mechanics, more. First inexpensive paperbound edition of classic treatise. Index.
Bibliography. 418pp. 5% x 844. 0-486-65067-7

PROTECTION OF ELECTRONIC CIRCUITS FROM OVERVOLTAGES,


Ronald B. Standler. Five-part treatment presents practical rules and strategies for cir-
cuits designed to protect electronic systems from damage by transient overvoltages.
1989 ed. xxiv+434pp. 644 x 944. 0-486-42552-5
ROTARY WING AERODYNAMICS, W. Z. Stepniewski. Clear, concise text cov-
ers aerodynamic phenomena of the rotor and offers guidelines for helicopter perfor-
mance evaluation. Originally prepared for NASA. 537 figures. 640pp. 6’ x 914.
0-486-64647-5
INTRODUCTION TO SPACE DYNAMICS, William Tyrrell Thomson. Com-
prehensive, classic introduction to space-flight engineering for advanced undergrad-
uate and graduate students. Includes vector algebra, kinematics, transformation of
coordinates. Bibliography. Index. 352pp. 5% x 8%. 0-486-65113-4
HISTORY OF STRENGTH OF MATERIALS, Stephen P. Timoshenko. Excellent
historical survey of the strength of materials with many references to the theories of
elasticity and structure. 245 figures. 452pp. 5% x 8's. 0-486-61187-6
ANALYTICAL FRACTURE MECHANICS, David J. Unger. Self-contained text
supplements standard fracture mechanics texts by focusing on analytical methods for
determining crack-tip stress and strain fields. 336pp. 6% x 944. 0-486-41737-9
STATISTICAL MECHANICS OF ELASTICITY, J. H. Weiner. Advanced, self-
contained treatment illustrates general principles and elastic behavior of solids. Part
1, based on classical mechanics, studies thermoelastic behavior of crystalline and
polymeric solids. Part 2, based on quantum mechanics, focuses on interatomic force
laws, behavior of solids, and thermally activated processes. For students of physics
and chemistry and for polymer physicists. 1983 ed. 96 figures. 496pp. 5% x 844.
0-486-42260-7

Mathematics
FUNCTIONAL ANALYSIS (Second Corrected Edition), George Bachman and
Lawrence Narici. Excellent treatment of subject geared toward students with back-
ground in linear algebra, advanced calculus, physics and engineering. Text covers
introduction to inner-product spaces, normed, metric spaces, and topological spaces;
complete orthonormal sets, the Hahn-Banach Theorem and its consequences, and
many other related subjects. 1966 ed. 544pp. 61 x 914. 0-486-40251-7

ASYMPTOTIC EXPANSIONS OF INTEGRALS, Norman Bleistein & Richard A.


Handelsman. Best introduction to important field with applications in a variety of sci-
entific disciplines. New preface. Problems. Diagrams. Tables. Bibliography. Index.
448pp. 5% x 84. 0-486-65082-0
VECTOR AND TENSOR ANALYSIS WITH APPLICATIONS, A. I. Borisenko
and I. E. Tarapov. Concise introduction. Worked-out problems, solutions, exercises.
257pp. 5% x 8h. 0-486-63833-2
AN INTRODUCTION TO ORDINARY DIFFERENTIAL EQUATIONS, Earl
A. Coddington. A thorough and systematic first course in elementary differential
equations for undergraduates in mathematics and science, with many exercises and
problems (with answers). Index. 304pp. 5% x 84. 0-486-65942-9

FOURIER SERIES AND ORTHOGONAL FUNCTIONS, Harry F. Davis. An


incisive text combining theory and practical example to introduce Fourier series,
orthogonal functions and applications of the Fourier method to boundary-value
problems. 570 exercises. Answers and notes. 416pp. 5% x 8%. 0-486-65973-9

COMPUTABILITY AND UNSOLVABILITY, Martin Davis. Classic graduate-


level introduction to theory of computability, usually referred to as theory of recur-
rent functions. New preface and appendix. 288pp. 5% x 814. 0-486-61471-9

ASYMPTOTIC METHODS IN ANALYSIS, N. G. de Bruijn. An inexpensive, com-


prehensive guide to asymptotic methods—the pioneering work that teaches by
explaining worked examples in detail. Index. 224pp. 5% x 8% 0-486-64221-6

APPLIED COMPLEX VARIABLES, John W. Dettman. Step-by-step coverage of


fundamentals of analytic function theory—plus lucid exposition of five important
applications: Potential Theory; Ordinary Differential Equations; Fourier Transforms;
Laplace Transforms; Asymptotic Expansions. 66 figures. Exercises at chapter ends.
512pp. 5% x 84. 0-486-64670-X

INTRODUCTION TO LINEAR ALGEBRA AND DIFFERENTIAL EQUA-


TIONS, John W. Dettman. Excellent text covers complex numbers, determinants,
orthonormal bases, Laplace transforms, much more. Exercises with solutions.
Undergraduate level. 416pp. 5% x 844. 0-486-65191-6
RIEMANN’S ZETA FUNCTION, H. M. Edwards. Superb, high-level study of
landmark 1859 publication entitled “On the Number of Primes Less Than a Given
Magnitude” traces developments in mathematical theory that it inspired. xiv+315pp.
5'h x 8h. 0-486-41740-9

CALCULUS OF VARIATIONS WITH APPLICATIONS, George M. Ewing.


Applications-oriented introduction to variational theory develops insight and pro-
motes understanding of specialized books, research papers. Suitable for advanced
undergraduate/graduate students as primary, supplementary text. 352pp. 5% x 814.
0-486-64856-7
COMPLEX VARIABLES, FrancisJ.Flanigan. Unusual approach, delaying complex
algebra till harmonic functions have been analyzed from real variable viewpoint.
Includes problems with answers. 364pp. 5% x 84. 0-486-61388-7

AN INTRODUCTION TO THE CALCULUS OF VARIATIONS, Charles Fox.


Graduate-level text covers variations of an integral, isoperimetrical problems, least
action, special relativity, approximations, more. References. 279pp. 5% x 844.
0-486-65499-0
COUNTEREXAMPLES IN ANALYSIS, Bernard R. Gelbaum and John M. H.
Olmsted. These counterexamples deal mostly with the part of analysis known as
“real variables.” The first half covers the real number system, and the second half
encompasses higher dimensions. 1962 edition. xxiv+198pp. 5% x 84. 0-486-42875-3

CATASTROPHE THEORY FOR SCIENTISTS AND ENGINEERS, Robert


Gilmore. Advanced-level treatment describes mathematics of theory grounded in the
work of Poincaré, R. Thom, other mathematicians. Also important applications to
problems in mathematics, physics, chemistry and engineering. 1981 edition.
References. 28 tables. 397 black-and-white illustrations. xvii + 666pp. 6% x 944.
0-486-67539-4

INTRODUCTION TO DIFFERENCE EQUATIONS, Samuel Goldberg. Excep-


tionally clear exposition of important discipline with applications to sociology, psy-
chology, economics. Many illustrative examples; over 250 problems. 260pp. 5% x 844.
0-486-65084-7

NUMERICAL METHODS FOR SCIENTISTS AND ENGINEERS, Richard


Hamming. Classic text stresses frequency approach in coverage of algorithms, poly-
nomial approximation, Fourier approximation, exponential approximation, other
topics. Revised and enlarged 2nd edition. 721 pp. 5% x 8's. 0-486-65241-6

INTRODUCTION TO NUMERICAL ANALYSIS (2nd Edition), F. B. Hilde-


brand. Classic, fundamental treatment covers computation, approximation, inter-
polation, numerical differentiation and integration, other topics. 150 new problems.
669pp. 5% x 84. 0-486-65363-3

THREE PEARLS OF NUMBER THEORY, A. Y. Khinchin. Three compelling


puzzles require proof of a basic law governing the world of numbers. Challenges con-
cern van der Waerden’s theorem, the Landau-Schnirelmann hypothesis and Mann’s
theorem, and a solution to Waring’s problem. Solutions included. 64pp. 5% x 8%.
0-486-40026-3

THE PHILOSOPHY OF MATHEMATICS: AN INTRODUCTORY ESSAY,


Stephan Korner. Surveys the views of Plato, Aristotle, Leibniz & Kant concerning
propositions and theories of applied and pure mathematics. Introduction. Two
appendices. Index. 198pp. 5% x Sih. 0-486-25048-2

INTRODUCTORY REAL ANALYSIS, A.N. Kolmogorov, S. V. Fomin. Translated


by Richard A. Silverman. Self-contained, evenly paced introduction to real and func-
tional analysis. Some 350 problems. 403pp. 5% x 8/4. 0-486-61226-0

APPLIED ANALYSIS, Cornelius Lanczos. Classic work on analysis and design of


finite processes for approximating solution of analytical problems. Algebraic equa-
tions, matrices, harmonic analysis, quadrature methods, much more. 559pp. 5% x 8h.
0-486-65656-X

AN INTRODUCTION TO ALGEBRAIC STRUCTURES, Joseph Landin. Superb


self-contained text covers “abstract algebra”: sets and numbers, theory of groups, the-
ory of rings, much more. Numerous well-chosen examples, exercises. 247pp. 5% x 84.
0-486-65940-2

QUALITATIVE THEORY OF DIFFERENTIAL EQUATIONS, V. V. Nemytskii


and V.V. Stepanov. Classic graduate-level text by two prominent Soviet mathemati-
cians covers classical differential equations as well as topological dynamics and
ergodic theory. Bibliographies. 523pp. 5% x 8'4. 0-486-65954-2

THEORY OF MATRICES, Sam Perlis. Outstanding text covering rank, nonsingu-


larity and inverses in connection with the development of canonical matrices under
the relation of equivalence, and without the intervention of determinants. Includes
exercises. 237pp. 5% x 8's. 0-486-66810-X

INTRODUCTION TO ANALYSIS, Maxwell Rosenlicht. Unusually clear, accessi-


ble coverage of set theory, real number system, metric spaces, continuous functions,
Riemann integration, multiple integrals, more. Wide range of problems. Under-
graduate level. Bibliography. 254pp. 5% x 8'4. 0-486-65038-3

MODERN NONLINEAR EQUATIONS, Thomas L. Saaty. Emphasizes practical


solution of problems; covers seven types of equations. “. . . a welcome contribution
to the existing literature...."—Math Reviews. 490pp. 5% x 84. 0-486-64232-1

MATRICES AND LINEAR ALGEBRA, Hans Schneider and George Phillip


Barker. Basic textbook covers theory of matrices and its applications to systems of lin-
ear equations and related topics such as determinants, eigenvalues and differential
equations. Numerous exercises. 432pp. 5% x 8's. 0-486-66014-1

LINEAR ALGEBRA, Georgi E. Shilov. Determinants, linear spaces, matrix alge-


bras, similar topics. For advanced undergraduates, graduates. Silverman translation.
387pp. 5% x 844. 0-486-63518-X

ELEMENTS OF REAL ANALYSIS, David A. Sprecher. Classic text covers funda-


mental concepts, real number system, point sets, functions of a real variable, Fourier
series, much more. Over 500 exercises. 352pp. 5% x 844. 0-486-65385-4

SET THEORY AND LOGIC, Robert R. Stoll. Lucid introduction to unified theory
of mathematical concepts. Set theory and logic seen as tools for conceptual under-
standing of real number system. 496pp. 5% x 8%. 0-486-63829-4

TENSOR CALCULUS, J.L. Synge and A. Schild. Widely used introductory text
covers spaces and tensors, basic operations in Riemannian space, non-Riemannian
spaces, etc. 324pp. 5% x 84. 0-486-63612-7
ORDINARY DIFFERENTIAL EQUATIONS, Morris Tenenbaum and Harry
Pollard. Exhaustive survey of ordinary differential equations for undergraduates in
mathematics, engineering, science. Thorough analysis of theorems. Diagrams.
Bibliography. Index. 818pp. 5% x 84. 0-486-64940-7
INTEGRAL EQUATIONS, F. G. Tricomi. Authoritative, well-written treatment of
extremely useful mathematical tool with wide applications. Volterra Equations,
Fredholm Equations, much more. Advanced undergraduate to graduate level.
Exercises. Bibliography. 238pp. 5% x 84. 0-486-64828-1
FOURIER SERIES, Georgi P. Tolstov. Translated by Richard A. Silverman. A valu-
able addition to the literature on the subject, moving clearly from subject to subject
and theorem to theorem. 107 problems, answers. 336pp. 5% x 84. 0-486-63317-9
INTRODUCTION TO MATHEMATICAL THINKING, Friedrich Waismann.
Examinations of arithmetic, geometry, and theory of integers; rational and natural
numbers; complete induction; limit and point of accumulation; remarkable curves;
complex and hypercomplex numbers, more. 1959 ed. 27 figures. xii+260pp. 5% x 8/4.
0-486-63317-9
POPULAR LECTURES ON MATHEMATICAL LOGIC, Hao Wang. Noted logi-
cian’s lucid treatment of historical developments, set theory, model theory, recursion
theory and constructivism, proof theory, more. 3 appendixes. Bibliography. 1981 edi-
tion. ix + 283pp. 5% x 8. 0-486-67632-3

CALCULUS OF VARIATIONS, Robert Weinstock. Basic introduction covering


isoperimetric problems, theory of elasticity, quantum mechanics, electrostatics, etc.
Exercises throughout. 326pp. 5% x 8's. 0-486-63069-2
THE CONTINUUM: A CRITICAL EXAMINATION OF THE FOUNDATION
OF ANALYSIS, Hermann Weyl. Classic of 20th-century foundational research deals
with the conceptual problem posed by the continuum. 156pp. 5% x 84.
0-486-67982-9
CHALLENGING MATHEMATICAL PROBLEMS WITH ELEMENTARY
SOLUTIONS, A. M. Yaglom and I. M. Yaglom. Over 170 challenging problems on
probability theory, combinatorial analysis, points and lines, topology, convex poly-
gons, many other topics. Solutions. Total of 445pp. 5% x 8%. Two-vol. set.
Vol. I: 0-486-65536-9 Vol. II: 0-486-655377
INTRODUCTION TO PARTIAL DIFFERENTIAL EQUATIONS WITH
APPLICATIONS, E. C. Zachmanoglou and Dale W. Thoe. Essentials of partial dif-
ferential equations applied to common problems in engineering and the physical sci-
ences. Problems and answers. 416pp. 5% x 8's. 0-486-65251-3
THE THEORY OF GROUPS, HansJ. Zassenhaus. Well-written graduate-level text
acquaints reader with group-theoretic methods and demonstrates their usefulness in
mathematics. Axioms, the calculus of complexes, homomorphic mapping, #-group
theory, more. 276pp. 5% x 84. 0-486-40922-8

Math—Decision Theory, Statistics, Probability


ELEMENTARY DECISION THEORY, Herman Chernoff and Lincoln E.
Moses. Clear introduction to statistics and statistical theory covers data process-
ing, probability and random variables, testing hypotheses, much more. Exercises.
364pp. 5% x 8h. 0-486-65218-1

STATISTICS MANUAL, Edwin L. Crow et al. Comprehensive, practical collection


of classical and modern methods prepared by U.S. Naval Ordnance Test Station.
Stress on use. Basics of statistics assumed. 288pp. 5% x 8/4. 0-486-60599-X

SOME THEORY OF SAMPLING, William Edwards Deming. Analysis of the


problems, theory and design of sampling techniques for social scientists, industrial
managers and others who find statistics important at work. 61 tables. 90 figures. xvii
+602pp. 5% x 84. 0-486-64684-X

LINEAR PROGRAMMING AND ECONOMIC ANALYSIS, Robert Dorfman,


Paul A. Samuelson and Robert M. Solow. First comprehensive treatment of linear
programming in standard economic analysis. Game theory, modern welfare eco-
nomics, Leontief input-output, more. 525pp. 5% x 8's. 0-486-65491-5

PROBABILITY: AN INTRODUCTION, Samuel Goldberg. Excellent basic text


covers set theory, probability theory for finite sample spaces, binomial theorem,
much more. 360 problems. Bibliographies. 322pp. 5% x 8’. 0-486-65252-1

GAMES AND DECISIONS: INTRODUCTION AND CRITICAL SURVEY,


R. Duncan Luce and Howard Raiffa. Superb nontechnical introduction to game the-
ory, primarily applied to social sciences. Utility theory, zero-sum games, n-person
games, decision-making, much more. Bibliography. 509pp. 5% x 84. 0-486-65943-7

INTRODUCTION TO THE THEORY OF GAMES,J.C. C. McKinsey. This com-


prehensive overview of the mathematical theory of games illustrates applications to
situations involving conflicts of interest, including economic, social, political, and
military contexts. Appropriate for advanced undergraduate and graduate courses;
advanced calculus a prerequisite. 1952 ed. x+372pp. 5% x 84. 0-486-42811-7

FIFTY CHALLENGING PROBLEMS IN PROBABILITY WITH SOLUTIONS,


Frederick Mosteller. Remarkable puzzlers, graded in difficulty, illustrate elementary
and advanced aspects of probability. Detailed solutions. 88pp. 5% x 84. 65355-2

PROBABILITY THEORY: A CONCISE COURSE, Y. A. Rozanov. Highly read-


able, self-contained introduction covers combination of events, dependent events,
Bernoulli trials, etc. 148pp. 5% x 81. 0-486-63544-9

STATISTICAL METHOD FROM THE VIEWPOINT OF QUALITY CON-


TROL, Walter A. Shewhart. Important text explains regulation of variables, uses of
statistical control to achieve quality control in industry, agriculture, other areas.
192pp. 5% x 8%. 0-486-65232-7

Math—Geometry and Topology


ELEMENTARY CONCEPTS OF TOPOLOGY, Paul Alexandroff. Elegant, intuitive
approach to topology from set-theoretic topology to Betti groups; how concepts of
topology are useful in math and physics. 25 figures. 57pp. 5% x 84. 0-486-60747-X

COMBINATORIAL TOPOLOGY, P. S. Alexandrov. Clearly written, well-orga-


nized, three-part text begins by dealing with certain classic problems without using
the formal techniques of homology theory and advances to the central concept, the
Betti groups. Numerous detailed examples. 654pp. 5% x 8%. 0-486-40179-0

EXPERIMENTS IN TOPOLOGY, Stephen Barr. Classic, lively explanation of one


of the byways of mathematics. Klein bottles, Moebius strips, projective planes, map
coloring, problem of the Koenigsberg bridges, much more, described with clarity and
wit. 43 figures. 210pp. 5% x 8'4. 0-486-25933-1

THE GEOMETRY OF RENE DESCARTES, René Descartes. The great work


founded analytical geometry. Original French text, Descartes’s own diagrams, togeth-
er with definitive Smith-Latham translation. 244pp. 5% x 844. 0-486-60068-8

EUCLIDEAN GEOMETRY AND TRANSFORMATIONS, Clayton W. Dodge.


This introduction to Euclidean geometry emphasizes transformations, particularly
isometries and similarities. Suitable for undergraduate courses, it includes numerous
examples, many with detailed answers. 1972 ed. viiit296pp. 6% x 944. 0-486-43476-1

PRACTICAL CONIC SECTIONS: THE GEOMETRIC PROPERTIES OF


ELLIPSES, PARABOLAS AND HYPERBOLAS,J. W. Downs. This text shows how
to create ellipses, parabolas, and hyperbolas. It also presents historical background on
their ancient origins and describes the reflective properties and roles of curves in
design applications. 1993 ed. 98 figures. xii+ 100pp. 6% x 94. 0-486-42876-1

THE THIRTEEN BOOKS OF EUCLID’S ELEMENTS, translated with introduc-


tion and commentary by Sir Thomas L. Heath. Definitive edition. Textual and lin-
guistic notes, mathematical analysis. 2,500 years of critical commentary. Unabridged.
1,414pp. 5% x 8%4.. Three-vol. set.
Vol. I: 0-486-60088-2 Vol. II: 0-486-60089-0 Vol. III: 0-486-60090-4

SPACE AND GEOMETRY: IN THE LIGHT OF PHYSIOLOGICAL,


PSYCHOLOGICAL AND PHYSICAL INQUIRY, Ernst Mach. Three essays by
an eminent philosopher and scientist explore the nature, origin, and development of
our concepts of space, with a distinctness and precision suitable for undergraduate
students and other readers. 1906 ed. vit 148pp. 5% x 81. 0-486-43909-7

GEOMETRY OF COMPLEX NUMBERS, Hans Schwerdtfeger. Illuminating,


widely praised book on analytic geometry of circles, the Moebius transformation,
and two-dimensional non-Euclidean geometries. 200pp. 5% x 8%. — 0-486-63830-8

DIFFERENTIAL GEOMETRY, Heinrich W. Guggenheimer. Local differential geom-


etry as an application of advanced calculus and linear algebra. Curvature, transforma-
tion groups, surfaces, more. Exercises. 62 figures. 378pp. 5% x 84. 0-486-63433-7

History of Math
THE WORKS OF ARCHIMEDES, Archimedes (T. L. Heath, ed.). Topics include
the famous problems of the ratio of the areas of a cylinder and an inscribed sphere;
the measurement of a circle; the properties of conoids, spheroids, and spirals; and the
quadrature of the parabola. Informative introduction. clxxxvi+326pp. 5% x 84.
0-486-42084-1

A SHORT ACCOUNT OF THE HISTORY OF MATHEMATICS, W. W. Rouse


Ball. One of clearest, most authoritative surveys from the Egyptians and Phoenicians
through 19th-century figures such as Grassman, Galois, Riemann. Fourth edition.
522pp. 5% x 8h. 0-486-20630-0

THE HISTORY OF THE CALCULUS AND ITS CONCEPTUAL DEVELOP-


MENT, Carl B. Boyer. Origins in antiquity, medieval contributions, work of Newton,
Leibniz, rigorous formulation. Treatment is verbal. 346pp. 5% x 84. 0-486-60509-4

THE HISTORICAL ROOTS OF ELEMENTARY MATHEMATICS, Lucas N. H.


Bunt, Phillip S. Jones, and Jack D. Bedient. Fundamental underpinnings of modern
arithmetic, algebra, geometry and number systems derived from ancient civiliza-
tions. 320pp. 5% x 84. 0-486-25563-8

A HISTORY OF MATHEMATICAL NOTATIONS, Florian Cajori. This classic


study notes the first appearance of a mathematical symbol and its origin, the com-
petition it encountered, its spread among writers in different countries, its rise to pop-
ularity, its eventual decline or ultimate survival. Original 1929 two-volume edition
presented here in one volume. xxviii+820pp. 5% x 8's. 0-486-67766-4

GAMES, GODS & GAMBLING: A HISTORY OF PROBABILITY AND


STATISTICAL IDEAS, F. N. David. Episodes from the lives of Galileo, Fermat,
Pascal, and others illustrate this fascinating account of the roots of mathematics.
Features thought-provoking references to classics, archaeology, biography, poetry.
1962 edition. 304pp. 5% x 8’. (Available in U.S. only.) 0-486-40023-9

OF MEN AND NUMBERS: THE STORY OF THE GREAT


MATHEMATICIANS, Jane Muir. Fascinating accounts of the lives and accom-
plishments of history’s greatest mathematical minds—Pythagoras, Descartes, Euler,
Pascal, Cantor, many more. Anecdotal, illuminating. 30 diagrams. Bibliography.
256pp. 5% x 81h. 0-486-28973-7

HISTORY OF MATHEMATICS, David E. Smith. Nontechnical survey from


ancient Greece and Orient to late 19th century; evolution of arithmetic, geometry,
trigonometry, calculating devices, algebra, the calculus. 362 illustrations. 1,355pp.
5% x 8'b. Two-vol. set. Vol. I: 0-486-20429-4 Vol. II: 0-486-20430-8

A CONCISE HISTORY OF MATHEMATICS, DirkJ. Struik. The best brief his-


tory of mathematics. Stresses origins and covers every major figure from ancient
Near East to 19th century. 41 illustrations. 195pp. 5% x 84. 0-486-60255-9

Physics
OPTICAL RESONANCE AND TWO-LEVEL ATOMS, L. Allen andJ.H. Eberly.
Clear, comprehensive introduction to basic principles behind all quantum optical res-
onance phenomena. 53 illustrations. Preface. Index. 256pp. 5% x 8%. 0-486-65533-4

QUANTUM THEORY, David Bohm. This advanced undergraduate-level text pre-


sents the quantum theory in terms of qualitative and imaginative concepts, followed
by specific applications worked out in mathematical detail. Preface. Index. 655pp.
5% x 8A. 0-486-65969-0

ATOMIC PHYSICS (8th EDITION), Max Born. Nobel laureate’s lucid treatment of
kinetic theory of gases, elementary particles, nuclear atom, wave-corpuscles, atomic
structure and spectral lines, much more. Over 40 appendices, bibliography. 495pp.
5% x 8h. 0-486-65984-4

A SOPHISTICATE’S PRIMER OF RELATIVITY, P. W. Bridgman. Geared


toward readers already acquainted with special relativity, this book transcends the
view of theory as a working tool to answer natural questions: What is a frame of ref-
erence? What is a “law of nature”? What is the role of the “observer”? Extensive
treatment, written in terms accessible to those without a scientific background. 1983
ed. xlviiit 172pp. 5% x 814. 0-486-42549-5

AN INTRODUCTION TO HAMILTONIAN OPTICS, H. A. Buchdahl. Detailed


account of the Hamiltonian treatment of aberration theory in geometrical optics.
Many classes of optical systems defined in terms of the symmetries they possess.
Problems with detailed solutions. 1970 edition. xv + 360pp. 5% x 8%. 0-486-67597-1

PRIMER OF QUANTUM MECHANICS, Marvin Chester. Introductory text


examines the classical quantum bead on a track: its state and representations; opera-
tor eigenvalues; harmonic oscillator and bound bead in a symmetric force field; and
bead in a spherical shell. Other topics include spin, matrices, and the structure of
quantum mechanics; the simplest atom, indistinguishable particles; and stationary-
state perturbation theory. 1992 ed. xiv+3l4pp. 61% x 944. 0-486-42878-8

LECTURES ON QUANTUM MECHANICS, Paul A. M. Dirac. Four concise, bril-


liant lectures on mathematical methods in quantum mechanics from Nobel Prize-
winning quantum pioneer build on idea of visualizing quantum theory through the
use of classical mechanics. 96pp. 5% x 8's. 0-486-41713-1

THIRTY YEARS THAT SHOOK PHYSICS: THE STORY OF QUANTUM


THEORY, George Gamow. Lucid, accessible introduction to influential theory of
energy and matter. Careful explanations of Dirac’s anti-particles, Bohr’s model of the
atom, much more. 12 plates. Numerous drawings. 240pp. 5% x 8%. 0-486-24895-X

ELECTRONIC STRUCTURE AND THE PROPERTIES OF SOLIDS: THE


PHYSICS OF THE CHEMICAL BOND, Walter A. Harrison. Innovative text
offers basic understanding of the electronic structure of covalent and ionic solids,
simple metals, transition metals and their compounds. Problems. 1980 edition.
582pp. 64 x 4h. 0-486-66021-4

HYDRODYNAMIC AND HYDROMAGNETIC STABILITY, S. Chandrasekhar.


Lucid examination of the Rayleigh-Benard problem; clear coverage of the theory of
instabilities causing convection. 704pp. 5% x 8%. 0-486-64071-X

INVESTIGATIONS ON THE THEORY OF THE BROWNIAN MOVEMENT,


Albert Einstein. Five papers (1905-8) investigating dynamics of Brownian motion
and evolving elementary theory. Notes by R. Fiirth. 122pp. 5% x 84. 0-486-60304-0
THE PHYSICS OF WAVES, William C. Elmore and Mark A. Heald. Unique
overview of classical wave theory. Acoustics, optics, electromagnetic radiation, more.
Ideal as classroom text or for self-study. Problems. 477pp. 5% x 8%. 0-486-64926-1
GRAVITY, George Gamow. Distinguished physicist and teacher takes reader-
friendly look at three scientists whose work unlocked many of the mysteries behind
the laws of physics: Galileo, Newton, and Einstein. Most of the book focuses on
Newton’s ideas, with a concluding chapter on post-Einsteinian speculations concern-
ing the relationship between gravity and other physical phenomena. 160pp. 5% x 84.
0-486-42563-0
PHYSICAL PRINCIPLES OF THE QUANTUM THEORY, Werner Heisenberg.
Nobel Laureate discusses quantum theory, uncertainty, wave mechanics, work of
Dirac, Schroedinger, Compton, Wilson, Einstein, etc. 184pp. 5% x 8%. 0-486-60113-7
ATOMIC SPECTRA AND ATOMIC STRUCTURE, Gerhard Herzberg. One of
best introductions; especially for specialist in other fields. Treatment is physical
rather than mathematical. 80 illustrations. 257pp. 5% x 8’. 0-486-60115-3
AN INTRODUCTION TO STATISTICAL THERMODYNAMICS, Terrell L.
Hill. Excellent basic text offers wide-ranging coverage of quantum statistical mechan-
ics, systems of interacting molecules, quantum statistics, more. 523pp. 5% x 8/4.
0-486-65242-4
THEORETICAL PHYSICS, Georg Joos, with Ira M. Freeman. Classic overview
covers essential math, mechanics, electromagnetic theory, thermodynamics, quan-
tum mechanics, nuclear physics, other topics. First paperback edition. xxiii + 885pp.
5% x 8h. 0-486-65227-0
PROBLEMS AND SOLUTIONS IN QUANTUM CHEMISTRY AND
PHYSICS, Charles S. Johnson,Jr.and Lee G. Pedersen. Unusually varied problems,
detailed solutions in coverage of quantum mechanics, wave mechanics, angular
momentum, molecular spectroscopy, more. 280 problems plus 139 supplementary
exercises. 430pp. 6% x 944. 0-486-65236-X
THEORETICAL SOLID STATE PHYSICS, Vol. 1: Perfect Lattices in Equilibrium;
Vol. I: Non-Equilibrium and Disorder, William Jones and Norman H. March.
Monumental reference work covers fundamental theory of equilibrium properties of
perfect crystalline solids, non-equilibrium properties, defects and disordered systems.
Appendices. Problems. Preface. Diagrams. Index. Bibliography. Total of 1,301pp. 5%
x 8%. Two volumes. Vol. I: 0-486-65015-4 Vol. II: 0-486-65016-2
WHAT IS RELATIVITY? L. D. Landau and G. B. Rumer. Written by a Nobel Prize
physicist and his distinguished colleague, this compelling book explains the special
theory of relativity to readers with no scientific background, using such familiar
objects as trains, rulers, and clocks. 1960 ed. vit72pp. 5% x 8%. 0-486-42806-0

A TREATISE ON ELECTRICITY AND MAGNETISM, James Clerk Maxwell.


Important foundation work of modern physics. Brings to final form Maxwell’s theo-
ty of electromagnetism and rigorously derives his general equations of field theory.
1,084pp. 5% x 8%. Two-vol. set. Vol. I: 0-486-60636-8 Vol. II: 0-486-60637-6

QUANTUM MECHANICS: PRINCIPLES AND FORMALISM, Roy McWeeny.


Graduate student-oriented volume develops subject as fundamental discipline, open-
ing with review of origins of Schrédinger’s equations and vector spaces. Focusing on
main principles of quantum mechanics and their immediate consequences, it con-
cludes with final generalizations covering alternative “languages” or representations.
1972 ed. 15 figures. xi+155pp. 5% x 84. 0-486-42829-X

INTRODUCTION TO QUANTUM MECHANICS With Applications to


Chemistry, Linus Pauling & E. Bright Wilson,
Jr.Classic undergraduate text by Nobel
Prize winner applies quantum mechanics to chemical and physical problems.
Numerous tables and figures enhance the text. Chapter bibliographies. Appendices.
Index. 468pp. 5% x 8. 0-486-64871-0

METHODS OF THERMODYNAMICS, Howard Reiss. Outstanding text focuses


on physical technique of thermodynamics, typical problem areas of understanding,
and significance and use of thermodynamic potential. 1965 edition. 238pp. 5% x 84.
0-486-69445-3
THE ELECTROMAGNETIC FIELD, Albert Shadowitz. Comprehensive under-
graduate text covers basics of electric and magnetic fields, builds up to electromag-
netic theory. Also related topics, including relativity. Over 900 problems. 768pp.
5% x 8h. 0-486-65660-8

GREAT EXPERIMENTS IN PHYSICS: FIRSTHAND ACCOUNTS FROM


GALILEO TO EINSTEIN, Morris H. Shamos (ed.). 25 crucial discoveries: Newton’s
laws of motion, Chadwick’s study of the neutron, Hertz on electromagnetic waves,
more. Original accounts clearly annotated. 370pp. 5% x 8's. 0-486-25346-5

EINSTEIN’S LEGACY, Julian Schwinger. A Nobel Laureate relates fascinating


story of Einstein and development of relativity theory in well-illustrated, nontechni-
cal volume. Subjects include meaning of time, paradoxes of space travel, gravity and
its effect on light, non-Euclidean geometry and curving of space-time, impact of radio
astronomy and space-age discoveries, and more. 189 b/w illustrations. xiv+250pp.
8% x 9h. 0-486-41974-6

STATISTICAL PHYSICS, Gregory H. Wannier. Classic text combines thermody-


namics, statistical mechanics and kinetic theory in one unified presentation of thermal
physics. Problems with solutions. Bibliography. 532pp. 5% x 84. 0-486-65401-X

Paperbound unless otherwise indicated. Available at your book dealer, online at


www.doverpublications.com, or by writing to Dept. GI, Dover Publications, Inc., 31 East
2nd Street, Mineola, NY 11501. For current price information or for free catalogues (please indi-
cate field of interest), write to Dover Publications or log on to www.doverpublications.com
and see every Dover book in print. Dover publishes more than 500 books each year on science,
elementary and advanced mathematics, biology, music, art, literary history, social sciences, and
other areas.
(continued from front flap)

INTRODUCTORY REAL ANALysis, A. N. Kolmogorov and S. V. Fomin.


(0-486-61226-0)
SPECIAL FUNCTIONS AND THEIR APPLICATIONS, N. N. Lebedev. (0-486-60624-4)
CHANCE, LUCK AND Statistics, Horace C. Levinson. (0-486-41997-5)
TENSORS, DIFFERENTIAL FORMS, AND VARIATIONAL PRINCIPLES, David Lovelock
and Hanno Rund. (0-486-65840-6)
SURVEY OF MATRIX THEORY AND MATRIX INEQUALITIES, Marvin Marcus and
Henryk Minc. (0-486-67102-X)
ABSTRACT ALGEBRA AND SOLUTION BY RADICALS, John E. and Margaret W.
Maxfield. (0-486-67121-6)
FUNDAMENTAL CONCEPTS OF ALGEBRA, Bruce E. Meserve. (0-486-61470-0)
FUNDAMENTAL CONCEPTS OF GEOMETRY, Bruce E. Meserve. (0-486-63415-9)
FIFTY CHALLENGING PROBLEMS IN PROBABILITY WITH SOLUTIONS, Frederick
Mosteller. (0-486-65355-2)
NUMBER THEORY AND ITs History, Oystein Ore. (0-486-65620-9)
MATRICES AND TRANSFORMATIONS, Anthony J. Pettofrezzo. (0-486-63634-8)
THE UMBRAL CALCULUS, Steven Roman. (0-486-44139-3)
PROBABILITY THEORY: A CONCISE Course, Y. A. Rozanov. (0-486-63544-9)
LINEAR ALGEBRA, Georgi E. Shilov. (0-486-63518-X)
ESSENTIAL CALCULUS WITH APPLICATIONS, Richard A. Silverman. (0-486-66097-4)
INTERPOLATION, J.F. Steffensen. (0-486-45009-0)
A CONCISE HisToRY OF MATHEMATICS, Dirk J. Struik. (0-486-60255-9)
PROBLEMS IN PROBABILITY THEORY, MATHEMATICAL STATISTICS AND THEORY OF
RANDOM Functions, A. A. Sveshnikov. (0-486-63717-4)
TENSOR CALCULUS, J. L. Synge and A. Schild. (0-486-63612-7)
MODERN ALGEBRA: Two VOLUMES BounD As ONE, B.L. Van der Waerden.
(0-486-44281-0)
CALCULUS OF VARIATIONS WITH APPLICATIONS TO PHYSICS AND ENGINEERING, Robert
Weinstock. (0-486-63069-2)
INTRODUCTION TO VECTOR AND TENSOR ANALYSIS, Robert C. Wrede.
(0-486-61879-X)
DISTRIBUTION THEORY AND TRANSFORM ANALYSIS, A. H. .Zemanian.
(0-486-65479-6)

Paperbound unless otherwise indicated. Available at your book dealer,


online at www.doverpublications.com, or by writing to Dept. 23, Dover
Publications, Inc., 31 East 2nd Street, Mineola, NY 11501. For current
price information or for free catalogs (please indicate field of interest),
write to Dover Publications or log on to www.doverpublications.com
and see every Dover book in print. Each year Dover publishes over 500
books on fine art, music, crafts and needlework, antiques, languages, lit-
erature, children’s books, chess, cookery, nature, anthropology, science,
mathematics, and other areas.
Manufactured in the U.S.A.
PRINCIPLES OF NUMERICAL
ANALYSIS
ALSTON S. HOUSEHOLDER
Both routines of numerical computation and those of high-speed digital com-
putation rely on basic principles of numerical analysis. This text offers a clear
and concise presentation by one of the authorities in the field. An excellent

reference and study work, it covers a significant amount of information on


essential topics for intermediate to advanced mathematicians and computer
scientists.

Based on a lecture course given in Oak Ridge for the University of Tennessee,
this volume concerns general topics of the solution of finite systems of linear
and nonlinear equations and the approximate representation of functions.
Specific chapters cover the art of computation, matrices and linear equations,
nonlinear equations and systems, the proper values and vectors of a matrix,
interpolation, more general methods of approximation, numerical integration
and differentiation, and the Monte Carlo method. The Graeffe process,
Bernoulli's method, polynomial interpolation, and the quadrature problem
receive special attention.
Each chapter contains bibliographic notes, and an extensive bibliography
appears at the end. A final section provides 54 problems, subdivided accord-
ing to chapter, for additional reinforcement.
Dover (2006) unabridged republication of the edition originally published by
McGraw-Hill Book Company, Inc., New York, 1953. 288pp. 5% x 8%.
Paperbound.
See every Dover book in print at www.doverpublications.com
