Principles of Numerical Analysis
Principles of Numerical Analysis
7 ME FS re ~ i alte Aen
PA 4 : cep aaa pa ihn Sa of %)fay oe
Th sin = cS
ye ee et
bgp —
PRLINE wRFoe. tnt a ; nee
Ainaihag
PRINCIPLES OF
NUMERICAL ANALYSIS
| (ee
eVIAMA JAI,
PRINCIPLES OF
NUMERICAL ANALYSIS
ALSTON S. HOUSEHOLDER
Bibliographical Note
This Dover edition, first published in 1974 and reissued in 2006, is an
unabridged, slightly corrected republication of the work originally published
by the McGraw-Hill Book Company, Inc., New York, in 1953.
B, J, ano J
Digitized by the Internet Archive
in 2024
https://archive.org/details/principlesofnume0000alst
PREFACE
This is a mathematical textbook rather than a compendium of computa-
tional rules. It is hoped that the material included will provide a useful
background for those seeking to devise and evaluate routines for numerical
computation.
The general topics considered are the solution of finite systems of
equations, linear and nonlinear, and the approximate representation of
functions. Conspicuously omitted are functional equations of all types.
The justification for this omission lies first in the background presupposed
on the part of the reader. Second, there are good books, in print and in
preparation, on differential and on integral equations. But ultimately,
the numerical “‘solution”’ of a functional equation consists of a finite table
of numbers, whether these be a set of functional values, or the first n coeffi-
cients of an expansion in terms of known functions. Hence, eventually
the problem must be reduced to that of determining a finite set of numbers
and of representing functions thereby, and at this stage the topics in this
book become relevant.
The endeavor has been to keep the discussion within the reach of
one who has had a course in calculus, though some elementary notions
of the probability theory are utilized in the allusions to statistical assess-
ments of errors, and in the brief outline of the Monte Carlo method. The
book is an expansion of lecture notes for a course given in Oak Ridge for
the University of Tennessee during the spring and summer quarters of
1950.
The material was assembled with high-speed digital computation
always in mind, though many techniques appropriate only to “hand”
computation are discussed. By a curious and amusing paradox, the
advent of high-speed machinery has lent popularity to the two innovations
from the field of statistics referred to above. How otherwise the con-
tinued use of these machines will transform the computer’s art remains to
be seen. But this much can surely be said, that their effective use
demands a more profound understanding of the mathematics of the
problem, and a more detailed acquaintance with the potential sources of
error, than is ever required by a computation whose development can be
watched, step by step, as it proceeds. It is for this reason that a text-
book on the mathematics of computation seems in order.
Help and encouragement have come from too many to permit listing
vil
Vill - PREFACE
all by name. But it is a pleasure to thank, in particular, J. A. Cooley,
C. C. Hurd, D. A. Flanders, J. W. Givens, A. de la Garza, and members of
the Mathematics Panel of Oak Ridge National Laboratory. And for the
painstaking preparation of the copy, thanks go to Iris Tropp, Gwen
Wicker, and above all, to Mae Gill.
A. S. Householder
CONTENTS
Preface. Vii
rth fe
nterpolation 185
Bibliography 247
Problems 263
Index 269
CHAPTER 1
by 1 per cent from the true quotient. (The number 0.334, when express-
ing the result of the same division, deviates by only 0.2 per cent from the
true quotient, and yet is incorrectly obtained. The deviation of 0.33
from the true quotient will be called an error. If the division is to be
carried out to three places but not more, then 0.333 is the best representa-
tion possible and the replacement of the final ‘3”’ by a final “4” will be
called a blunder.
Blunders result from fallibility, errors from finitude. Blunders will
not be considered here to any extent. There are fairly obvious ways to
guard against them, and their effect, when they occur, can be gross,
insignificant, or anywhere in between. Generally the sources of error
other than blunders will leave a limited range of uncertainty, and gen-.
erally this can be reduced, if necessary, by additional labor. It is impor-
tant to be able to estimate the extent of the range of uncertainty.
Four sources of error are distinguished by von Neumann and Goldstine,
and while occasionally the errors of one type or another may be negligible
or absent, generally they are present. These sources are the following:
1. Mathematical formulations are seldom exactly descriptive of any
real situation, but only of more or less idealized models. Perfect gases
and material points do not exist.
2. Most mathematical formulations contain parameters, such as
lengths, times, masses, temperatures, etc., whose values can be had only
from measurement. Such measurements may be accurate to within 1,
0.1, or 0.01 per cent, or better, but however small the limit of error, it is
not zero.
3. Many mathematical equations have solutions that can be con-
structed only in the sense that an infinite process can be described whose
limit is the solution in question. By definition the infinite process can-
not be completed, so one must stop with some term in the sequence,
accepting this as the adequate approximation to the required solution.
This results in a type of error called the truncation error.
4. The decimal representation of a number is made by writing a
sequence of digits to the left, and one to the right, of an origin which is
marked by the decimal point. The digits to the left of the decimal are
finite in number and are understood to represent coefficients of increasing
powers of 10 beginning with the zeroth; those to the right are possibly
infinite in number, and represent coefficients of decreasing powers of 10.
In digital computation only a finite number of these digits can be taken
account of. The error due to dropping the others is called the round-off
error.
In decimal representation 10 is called the base of the representation.
Many modern computing machines operate in the binary system, using
the base 2 instead of the base 10. Every digit in the two sequences is
THE ART OF COMPUTATION 3
either 0 or 1, and the point which marks the origin is called the binary
point, rather than the decimal point. Desk computing machines which
use the base 8 are on the market, since conversion between the bases 2
and 8 is very simple. Colloquial languages carry the vestiges of the use
of other bases, e.g., 12, 20, 60, and in principle, any base could be used.
Clearly one does not evaluate the error arising from any one of these
sources, for if he did, it would no longer be a source of error. Generally
it cannot be evaluated. In particular cases it can be evaluated but not
represented (e.g., in the division 1 + 3 carried out to a preassigned
number of places). But one does hope to set bounds for the errors and
to ascertain that the errors will not exceed these bounds.
The computor is not responsible for sources 1 and 2. He is not
concerned with formulating or assessing a physical law nor with making
physical measurements. Nevertheless, the range of uncertainty to
which they give rise will, on the one hand, create a limit below which the
range of uncertainty of the results of a computation cannot come, and
on the other hand, provide a range of tolerance below which it does not
need to come.
With the above classification of sources, we present a classification
of errors as such. This is to some extent artificial, since errors arising
from the various sources interact in a complex fashion and result in a
single error which is no simple sum of elementary errors. Nevertheless,
thanks to a most fortunate circumstance, it is generally possible to
estimate an over-all se ofsapoentaunty as though it were such a simple
sum (§1. 2). fence we will distinguish propagated error, generated
and one can say at least that the total error does not exceed the sum of
the two errors.
That propagated and generated errors depend upon the details of the
routine, such as the value of \ for any representation, and even the order
in which certain operations are carried out, is easily seen. Thus, to con-
sider only round-off, if ¢ is some operation, it may be that mathematically
nto(y*de*)—atoX(y*o*2*) = [ota(y*de*)—vtoly*ote)]
+[a*o(y*o*e*)—a*a*(y*o*2)],
Thus the errors generated by performing the operations in the two pos-
sible ways have different expressions, and cannot be assumed equal
without proof.
1.3. Propagated Error and Significant Figures. Let
(1.3.1) e=ax*+ é, fp Pea
ys ae
and consider the problem of evaluating the function f(z, y, ...). Ifthe
function can be expanded in Taylor’s series, then
(1.3.2) f(a, 1 .) = CSs y*, ome .) = feck ah yite eh 2
=} V6 (fee + enjoy ee )
where the partial derivatives are to be evaluated at 2*, y*, .... This
represents the error in f arising from errors in the arguments, 7.e., the
propagated error. Generally one expects the errors é, 7, ... to be
“‘small” so that the terms of second and higher power can be neglected.
If so, then the propagated error Af satisfies, approximately,
f=tetyt--,
Hence the error in a sum does not exceed the sum of the errors in the
terms.
One can, by direct differentiation, write down any number of special
relations (1.3.3) based upon the assumption that the errors in the argu-
ments are small. Nevertheless, for the detailed analysis one must go to
the individual elementary operations.
Consider the case of the product and quotient. For the first we have
Usually one says that the relative error in a product is the sum of the
relative errors in the factors, and this is approximately true if these rela-
tive errors are both small, but only then.
For the quotient
we 8 Mapai Ect *y
¥y <
*
y*(y* + n)
and the relative error is
(1.3.5) Be mintedVouMake ee of
GUT be Soh Seay,
If the relative error 7/y* is negligible, then the relative error in the
quotient does not exceed in magnitude the sum of the magnitudes of the
relative errors in the terms. Nevertheless, if »/y* < 0 and not numeri-
cally small, the conclusion does not follow. .
If x*, given by (1.2.1), represents a number whose true value is z,
and if
But if the one condition or the other holds, the machine will correctly
form the sum or difference, as the case may be, and no round-off is
generated. Hence if a* + b* or a* — b* is digital, a* + b* or a* — b*
can be formed, and formed correctly, without generating any new error.
If a* and b* are digital, then necessarily
larbe\r<ole
However, the true product a*b* is a number of 2\ places. If the machine
holds only ) places, it will not form the true product a*b*, but a pseudo
product. It can be represented a* X b*. It may be that the machine
merely drops off the last \ places from a*b*. Or it may be that the
machine first forms a*b* + 8—/2 and then drops the last » places. In
the latter event the pseudo product satisfies
(1.4.1) la*b* — a* x b*| Se = 6/2.
where « is introduced to simplify notation. Let us assume that (1.4.1)
holds. ;
For division, the quotient. a*/b* will usually require infinitely many
places for its true representation, even though a* and b* are both digital.
The machine, however, can retain only the first \, and it may compute the
first \ places of a*/b* and drop the rest, or it may compute the first \
places of a*/b* + 6/2. In the latter event, the retained » places
represent the pseudo quotient a* + b*, which satisfies
(1.4.2) |a*/b* — a* + b*| <e.
Given a series of n products a*b* to be added together, we have
(1.4.3) |Za*¥ X b¥ — La*b*| < ne.
However, instead of recording each product as a digital number and add-
ing the results, it may be possible for the machine to retain and accumu-
late the true products of a* and b*, rounding off the sum as a digital
number. This pseudo operation may be designated 2*a*b*, and for this
we have
(1.4.4) |=*a*b* — Latb*| < «.
While
(Yy¥-a+y) +222,
y-a+y> 2,
Y 22 POY > eo] y > Gy,
whence y? > a, which proves the second inequality in (1.5.5). By the
first inequality in (1.5.4),
(¢-—a+2)+2>(y-—a+y)+24+2%.
But
(g-—a+z)/2> (e—a+az) +2,
ee aO ee YO Ue ae,
since in forming the pseudo quotient by 2 (which is a single shift to the
right) the error is either 0 or 21. Hence
C=442)/22 ya 4)/2+2 4,
2-Q@+e>y-aryt2%
Again,
a+2-—a/z> —2>,
a+y—a/y
<0,
12 PRINCIPLES OF NUMERICAL ANALYSIS
whence
2—-a/z>y-—a/y.
But the function f(z) = 2 — a/z is properly monotonically increasing.
Hence f(z) > f(y) implies the first inequality in (1.5.5).
This implies, in particular, that, if
(t;-—Q+2%,)
+2 = 4%; — %41>0,
then x; > +/a. If it should happen that
(1.5.6) (x; Oy as 2;) ~2< 0,
then clearly we should take x; and not x41 a8 x. On the other hand, if
(1.5.7) (@_1 ae UR re Gia) +22> VEINS
Gi Dio aa
(t24 a ee Gea )y/2 = (ti_1 Oh ae Xi-1) A} pies
whence
U1 — @ + U1 > 2-*1,
Hence
Ue = (0i a es Demtarll Ss a/xs-1 + agg
ine Ss a/xi—1,
Lint Ls,
a;+ TX S a/x:,
mee a/x; > —2>,
This holds a fortiori if the inequality holds in (1.5.7). Hence in all cases
z—a/e> —2>,
oS ae oe
(1.5.8) t > (a + 2--2)% — 2-1,
and center (a + 2-*~?)% upon which z must lie. These inequalities can
be written
The terms w, are the terms which appear in the expansions, while w* are
the terms we actually obtain in the computation. For this machine
\ = 39, and the machine accepts a (2d)-digit dividend so that divisions
xz + n are performed by dividing (2x) by (2~n). Let
én = 2-“lw, — wel.
Then
w* — w, = [(x +n) X wh, — (2 + noe] + [@ +0) - (x/n) wr
hi + (x/n)(wa_, — Wn-1)-
14 PRINCIPLES OF NUMERICAL ANALYSIS
The division steps satisfy
0<2/n—a +n < 2-%(n — 1)/n.
Also
|wr| < 1/n}.
Hence
én < 1 + 2(n — 1)/n! 4+ eni/n.
The residual error is less than the first neglected term, and for n > 15,
\wn| < 2-49 = 2-1, Hence on solving recursively and adding the
generated errors (the ¢’s) and the residual errors, we have
lc — (1 — cos x)| < 1.197 XK 2-*%,
|s — sin z| < 1.140 X 2-*.
For the check let
2 fy
cosz =1—c+e, sinz=s+e,
where ¢ and e’ are bounded by the right members of the above inequalities.
Then
2c — c? — gs? = Qe cos x + 2e’ sin & — &* — &?.
Hence
|2c — c? — s*| < 2e’(cos x + sin |x|) + 2(e — e’) cos x
< 2c! 1/2 + We — 2)
< 1.669: X 2-*.
Hence
[Ze —cXe—sXs| < |2c — 2? — 8314+ |eXec—c|+|sX8
— 83
Kaidisnnts
(2.01.1) x= y Ee.
17
18 PRINCIPLES OF NUMERICAL ANALYSIS
(2.01.2) fu » eves.
i
But also each e; is expressible as a linear combination of fi, . . . , fn:
»
obtain
(2.01.4) x = f;€;;&;.
Hence if
(2.01.5) = Y,eiki
7
the & are the coordinates of the geometric vector x in the f; coordinate »
system, and the set of coordinates £, constitute the numerical vector 2’
which represents x in that system of coordinates.
It is convenient to arrange the coefficients ¢;; in the rectangular array
€11 €12 €1n
(2.01.6) ae te, a
En] 9 €no> se 3) Cha
f; = Y Beday
i
Then
ej = Y fe = . » Si PijEsi-
j kj
Hence if F represents the matrix of the ¢x;,
Y vase = 9ni;
7
where 6;; is the Kronecker delta, defined by
= 0 when
k #1,
(2.01.11) oh 1 when k = 4.
20 PRINCIPLES OF NUMERICAL ANALYSIS
In this event the matrix P has the simple form
(2.01.12)
Selle) ah ven Kot hemes ne
Hence
(2.02.5) T = ETE
22 PRINCIPLES OF NUMERICAL ANALYSIS
is the matrix which represents the transformation in the coordinate
system f, since this is the matrix which, when applied to the numerical
vectorx’ (which represents x in that system), will yield the numerical
vector representing 7'(x) in that system.
2.03. Determinants and Outer Products. An outer product of two
vectors a and b is a new type of geometric entity, defined to have the
following properties:
(12) [a, b] = 1b; a];
(22) [aa, b] = ofa, bj;
(32) [a, b] + [a, c] = [a, b + ¢].
It can be pictured as a two-dimensional vector, whose magnitude, taken
positively or negatively, is the area of the parallelogram determined by
the vectors in the product. It follows immediately from (12) that
[a, a] = 0.
Hence
[a, b] = [a, b] + afa, a] = [a, b] + [a, aa] = [a, b + aa].
Hence for any scalars a and B,
. [a, b] = [a, b + aa] = [a + fb, bl.
If e1, €2 are any linearly independent vectors in the space of a and b,
then
(2.03.1) [a, b] = |a_ bl[ex, eal,
where
(2.03.2) la bl] = arb, — afi
is called the determinant of the numerical vectors a and b. The evalua-
tion is immediate:
[a, b] = [a, Bei + Bree] = Bila, er] + Bela, eo]
= Bilare: + ar€e, 1] + Bolare1 + are, eo]
= Ba2[€2, e1] + Boos[@1, €2]
= (a182 — a81)[e1, e2].
The determinant is a number, and its relation to the outer product is
similar to that of the coordinates to a vector.
_ It is a simple geometric exercise to show that (32) holds in the paral-
lelogram interpretation when a, b, and c are all in the same 2 space.
When they are not, the relation serves to specify the rule of composition.
For outer products of n vectors the defining relations are sufficiently
typified by the case n = 3:
(13) la, b, c] a =(a, Cc, b] = 16 b, a] ae ee
(23) [aa, b, c] = ofa, b, c];
(3,) [a, b, c] + [a, b, d] = [a, b,c + d].
MATRICES AND LINEAR EQUATIONS 23
From these we deduce that
? [a,a,c) = +--+ =0;
[a,
b, c] = [a + Bb, b, c] = [a,b,c
+ aa] = - - - ;
and if e:, 2, es; are linearly independent vectors in the space of a, b, and
c, then
(2.03.3) [a,b,c] = |a Db elfey, es, esl,
where |a b c| is called the determinant of the numerical vectors a, b >
and c, and its value will now be obtained. Note first that, if
In fact, the identical steps that led to (2.03.1) and (2.03.2) will, if applied
to the second member of this last equality, yield the third member.
Now by an obvious modification we obtain
Gy. Bi tvs
la b c| = | a2 Bo Y2 1,
a Bs Ys
we see that in the expansion of the determinant in terms of the y’s, the
coefficient of each y; is, except for sign, that second-order determinant
that remains after deleting the row and column containing y;. The sign
is that power of —1 whose exponent is obtained by adding together the
number of the row and column. Thus 72 is in the second row, third
column, and the sign is (—1)***. The coefficient of y; with its proper
sign is called the cofactor of y;. By interchanging rows and columns and
going through the same process, we find
la b c| = 0A, + aA + a3A3
and, in general, when the elements of any column are multiplied by the
cofactors of some other column and the products summed, the sum
vanishes.
Finally, we have the expansions
la b ¢| oA, + 61Bi + Wilt
arAe + BoBe + v2T2
ll a3Ag + B3Bzg + YsI's.
These are the expansions we should get if we were to rewrite the deter-
minant, writing the rows of the original as columns of the new one, and
these equations say in effect that this exchange of rows for columns leaves
the value of the determinant unaltered. It is quite clear that such an
exchange leaves the value of a second-order determinant unaltered.
Hence it is clear that in the expansion of either third-order determinant,
the original or the transposed, the coefficient of any element is the same.
The theorem follows because, when the determinant is expressed as an
explicit function of its elements, each term contains as a factor one and
only one element from each row and one and only one element from each
column.
The recursive extension to successively higher dimensions can be made
by following the same pattern, and the formulas need not be written
explicitly. For each extension the expansion is made first along a par-
ticular column, and one observes that it is equally possible to expand
along any row.
For determinants of order 4 or greater another type of expansion is
possible, called the Laplace expansion. To describe this it is convenient
to introduce the symbolism
a; Bil
|ox; B;| i
a; B;
If, now,
& = a1€1 + ar€e + aze3 + ass,
Ore e, Te ee en eine Memes: om euvied el he os: we) Late
= |EI|Filgs, . . . , Bal,
and on the other hand,
= (2ée,)y.
But by the same rule, if
y = 27;€;,
then
xy = X27;€;,
and therefore
(2.04.1) xy = DLEmee;.
26 PRINCIPLES OF NUMERICAL ANALYSIS
Now the scalar products
(2.04.2) Ys = C0; = VK
are known once the vectors e; are themselves known. Hence the scalar
product of any two vectors can be calculated from (2.04.1).
If each column of a matrix M is written as a row, the order remaining
the same, the resulting matrix is known as the transpose of the original
and is designated M'™. In particular, if 7 is the column vector of the &,
x" is the row vector of the é;. With this understanding, if G is the matrix
(2.04.3) G = (vi) = e'e,
this is said to define the metric in the space, and Eq. (2.04.1) becomes
(2.04.4) xy = w'Gy = y'Gze.
The matrix G is equal to its own transpose and is said to be symmetric.
When the metric G is known and fixed throughout the discussion, one
often uses the notation
Then
(ai) = |A|-1 adj (A),
(a;)(a*) = I,
so that
(2.05.8) A-} = |A|“! adj (A).
since P is idempotent.
Any symmetric idempotent matrix P is a projection operator. For
if P has rank m, we can find a matrix A of m linearly independent columns
such that every column of P is a linear combination of columns of A.
Hence, for some matrix B we can write
P = AB’
We have only to show that
B= A(AtTA)=*.
Since P is idempotent,
Pu =P? =A BAB = AB
Since the columns of A are linearly independent, B'AB™ = B™. The
rank of P = AB" cannot exceed the rank of B"; hence B™ has rank at least
m, and having only m rows, the rank is exactly m. Hence
BIA ==, A'B.
Since P is symmetric,
ABT = BAT,
ABTA = BATA,
A = BATA.
This is the desired result.
For an arbitrary metric, e'e = G, the orthogonal projection is repre-
sented by a matrix
(2.051.3) P = A(A'GA)-1A1G,
where the columns of A are contravariant vectors.
2.06. Cayley-Hamilton Theorem; Canonical Form of a Matrix. If with
any vector x we associate its successive transforms
The matrix
(2.06.3). V(T) = Vol oe mT + ey ae a vert
For any proper value A, any vector in the null space of T’ — XJ is also
in the null space of (7 — AJ)’ for any positive integer r, but the converse
is not true. A vector in the null space of (T— ADF for any positive
integer r is a principal vector. If x is in the null space of (T — AI)’
but not in the null space of (7 — dJ)*“}, it is called a principal vector of
grade r.
The characteristic function 400) has the remarkable property that
(2.06.10) ¢(T) = 0.
This is the Cayley-Hamilton theorem, which can be stated otherwise ie
saying that the null space of ¢(7’) is the entire space. This might be
expected from the fact, shown above, that the null spaces of two poly-
nomials in 7 have a non-null vector in common only if they have a
common divisor. A proof of the Cayley-Hamilton theorem is as follows
Since
o(T) — $Q)I = (=—1* = (-Y*] + nl(— 7) - (YT
spss eh a(t)
therefore this difference is equal to a polynomial in 7 multiplied by
T — XI, and hence is said to be divisible by T — AJ. Also, by Eq.
(2.05.6),
(T — AI) adj (T — AI) = oQ)I.
Hence ¢(A)J is also divisible by T — AJ. It follows, then, that ¢(T) is
divisible by 7 — J. However, ¢(T) issuidepee dent of \ and must
therefore vanish.
If F is any nonsingular matrix, then the matrices T and
(2.06.11) fA aON Firs HS
are said to be similar. They represent the same transformation but in
different coordinate systems. Since
F-\(T —I)F =T’ — XI,
they have the same characteristic function and hence the same proper
values. One proves inductively that for any positive integer r
T= FAThs
and hence that
(2.06.12) W(T") = F-Y(T)F
for any polynomial y.
It is reasonable to expect that a given transformation might be more
simply represented in some coordinate systems than in others, and this
will now be shown. Note first that the theorem expressed by (2.06.4)
MATRICES AND LINEAR EQUATIONS 33
can be generalized as follows: If there is no nonconstant factor common to
all polynomials $1(A), ¢2(A), . . . , dm(A), then there exist polynomials
Fil), fold), . . » , fm(A) such that —
+ y2
(yr )
o(T t+ + Ym) = 90.
34 PRINCIPLES OF NUMERICAL ANALYSIS
But since ¢:(7) contains every factor (A — A)" except (A — Ai), it
follows that
. d(T) (y2 + > °° + Ym) = 0.
Hence
oi(T)yi = 0.
But
(T — Al)™yi = 0,
whereas (A — A1)™ and ¢i(A) have no common factor. Hence y: = 0,
contrary to supposition. —
Now for a new coordinate system choose a matrix
in which the columns of F, form a coordinate system for the null space of
(T — r11)™, the columns of F, form a coordinate system for the null space
of (7 — Ael)™, .. . . Ifxis any vector in the null space of (T’ — AJ)”,
so also is Tx since
(T —dA2)"Tx = T(T — dAd)™x = 0.
Hence any column of TF; is expressible as a linear combination of columns
of F;, and therefore
(2.06.15) TE = RT
where 7” has the form
Eo Or eO
2.06.1 T’ =
/ 0 Ts 0
where all columns of F; are linearly independent and where every column
of F;; is a principal vector of grade 7.
MATRICES AND LINEAR EQUATIONS 35
Now the columns of F;,, are of grade v;, while those of
(T — ADF,
are of grade », — 1. Furthermore, the columns of
are linearly independent. If this were not the case, there would exist
equal linear combinations (7 — i,J)z of the columns of the first sub-
matrix and y of the columns of the second:
(2.06.18) (T —rADz = y.
But then
(T — r.1)% = (T — dAJ)*"y = 0
since all columns of F; are of grade < v;. But then y is of grade »; — 1.
Hence y, which is by definition a linear combination of columns of
F;,,, 1s also a linear combination of the remaining columns of F,, and this
is another way of saying that the columns of F; are linearly dependent.
Since this is not the case, y = 0. Hence by (2.06.18) x is a proper
vector, and hence both a linear combination of columns of F;; and of Fi,,.
Hence x = 0.
The argument may be continued to show that the columns of
(2.06.19) CE) Dane AT — Ad) Foy Fa)
all columns of F;,, are exhausted, pass next to a column (if any) from
_ F;,,,-1 which does not appear in one of the above chains; then pass to
Fsy—2)
’ By so forming and rearranging the matrices F; which make up F, we
obtain a matrix F whose columns are grouped into sequences such that,
when the double subscripts of the f’s are replaced by single subscripts
from 1 to n, we have either
Thy = Mf + Fits
or else
Tf; = Mf;
for some d;. Hence
Th hele,
where now T”’ has the form
Pi ORO
0. sFES0
2.06.20 TY’ = &
( ) 0.6 Om ek;
a) ete (6 seme: wel ce "a ens,
(2.06.21) T =r +h,
000
(2.06.22 ) Tage
1
1802-0
Go
ee) eo <o ‘ee We. 6 'e
The matrix J; is called the auxiliary unit matrix, and has units along the
first subdiagonal and elsewhere has zeros. Note that
I, = 12
has units along the second subdiagonal, and if I, is of order v, then Jz = 0.
Every column of F is a principal vector of T. We could apply the
above theorem to 77 and in the process obtain a matrix G every column of
which is a principal vector of T7. If f is a principal vector of T cor-
responding to the proper value \, and if g is a principal vector of 7™
corresponding to the proper value » ~ i, then g and f satisfy
(2.06.23) g Tf = gf =0.
The proof can be made inductively. Suppose first that g and f are proper
vectors: Then
gT = ug, Tf =,
so that
g'Tf = ug'f = dg'f.
MATRICES AND LINEAR EQUATIONS 37
Since \ ~ p, this proves the relation for that case. Next suppose g is
a proper vector but f of grade 2. Hence
(T — Af = fi #0,
but
(T — ANYfi = 0.
Hence f: is a proper vector. Now
gfi = 0
as was just shown. Hence
g'Tf = do's,
whereas ;
. g'Tf = ng’,
and again, since \ ~ uy, the relation is proved. By continuing one proves
the relation for proper vectors g and f of any grade.
If T is symmetric, T™’ = T, then any principal vector of 77 is also
a principal vector of 7. But for a symmetric matrix we now show
that all principal vectors are proper vectors, and in the normalized form
(2.06.20) of 7 all matrices T’ are scalars.
This is clearly the case when the proper values are all distinct. In
that case, in fact, every proper vector is orthogonal to every other proper
vector, whence F'F is a diagonal matrix, and by choosing every vector f
to be of unit length, one has even
(2.06.24) 7 FF =I
so that F is an orthogonal matrix. Suppose the proper values of 7’ are
not all distinct. One can, nevertheless, vary the elements of T slightly
so that the matrix T + 67 is still symmetric and has all proper values
distinct. ‘Then F + 6F is an orthogonal matrix. As the elements of
[T + 6T vary continuously while the matrix remains symmetric, the
columns f + éf of F + 6F also vary continuously but remain mutually
orthogonal and can be held at unit length. Hence these properties
remain while 57’ vanishes. Hence for any symmetric matrix 7 there
exists an orthogonal matrix F' such that
(2.06.25) F'TF = A,
where A is a diagonal matrix whose elements are the proper values of T’.
2.07. Analytic Functions of a Matrix; Convergence. The relation
(2.06.12), valid for any polynomial, is easily extended. Consider first
any of the matrices 7’; of (2.06.20), neglecting the trivial case when T’, is a
scalar. Any power can be written
(2.07.1) Ty = MI + hs + C) Metre,
38 PRINCIPLES OF NUMERICAL ANALYSIS
Hence in this case as r becomes infinite, 7,’ approaches the null matrix, in
the sense that every one of its elements approaches zero. If for every
proper value 2; of T it is true that |\;| < 1, then also 7” approaches the
null vector as r becomes infinite. Since F and F~ are fixed, this is true
also of F-1T’"F, and hence of 77. Hence if every proper value of 7 has
modulus less than unity, then 7*—> 0 as r becomes infinite. This con-
dition is necessary as well as sufficient.
Now consider any function (A) analytic at the origin:
N(A) = [Za?]%.
Clearly
(2.08.3) b(A') = B(A), N(A‘) = N(A).
If we use the notion of a trace of a matrix
(2.08.4) tr (A) = Za,
then an equivalent expression for N(A) is
(2.08.5) N(A) = [tr (ATA)]* = [tr (AA]*.
If a; are the column vectors of A, and a; the row vectors, then
(2.08.6) N?(A) = ZN%(a;) = ZN*(a}),
where the exponent applies to the functional value
N?(A) = [N(A)]’.
Hence if a; = x and all other a; = 0,
N(A) = N(z).
A useful inequality is the Schwartz inequality which states that for
any vectors x and y
(2.08.7) |zty| = ly™z| < N(x)N(y).
Geometrically this means that a scalar product of two vectors does not
exceed the product of the lengths of the vectors (in fact it is this product
multiplied by the cosine of the included angle). This generalizes immedi-
ately to matrices
(2.08.8) , N(AB) < N(A)N(B).
40 PRINCIPLES OF NUMERICAL ANALYSIS
Another useful inequality is the triangular inequality
(2.08.9) N(e@+y) < N(x) + NQ),
which says that one side of a triangle does not exceed the sum of the other
two, and which also generalizes immediately
(2.08.10) N(A + B) < N(A) + NB).
Also we have
(2.08.11) b(A + B) < B(A) + D(B).
But
(2.08.12) |aty| < nb(x)b(y),
since in zTy there are n terms each of which could have the maximum value
b(x)b(y). Hence for matrices
(2.08.13) b(AB) < nb(A)b(B).
If in (2.08.8) we take
bi = &, b=bs
= --: = b, = 0,
then we have
(2.08.14) N(Az) < N(A)N(z).
We now introduce the third measure:
(2.08.15) M(A) = max N(Az)/N(z) = ess |ztAy|/[N (x) N(y)],
or equivalently
(2.08.16) M(A) = max N(Az) = max |xTAy|.
N(z) =1 N(z)
= Ny) =1
Any choice of unit vectors for x and y will give a number z’Ay which
cannot exceed the maximum. Hence
Of the three functions b, N, and M, the first is obtainable for any given
matrix by inspection, and the second by direct computation. The third,
however, is only obtainable in general from rather elaborate computa-
tions, though it generally yields the best estimates of error.
MATRICES AND LINEAR EQUATIONS 43
: M (A) = ej Ae,
and furthermore
M(B) = M(A).
Hence
(2.08.27) M(A) = 1%.
Also
(2.08.28) N?(A) = 2X.
44 PRINCIPLES OF NUMERICAL ANALYSIS
To see this, we observe that by definition the proper values ); of a
matrix B satisfy the algebraic equation
|B — rxI| = 0
and that the trace tr (B) is the sum of the proper values, while by
definition
N*(A) = tr (ATA) = tr (B).
These relations provide an alternative proof for (2.08.21) and the second
inequality in (2.08.20).
By analogy with M, we define
(2.08.29) m(A) = an N(Az)/N (2).
Then
(2.08.30) m(A) = d2%.
Also if A is nonsingular,
(2.08.31) M(A-) = d;*, m(A-1) = Az,
Hence for a nonsingular matrix
(2.08.32) M(A-)m(A) =
These relations arise from the fact that, if B is nonsingular,
X' BUX = A},
which is a special case of the relation
(2.08.33) X'B*X = A’,
where r is any integer, positive or negative.
We conclude this discussion by noting that, if x’ is the vector whose
elements are |£;| and if 1, is the vector each on whose elements i
is unity,
then from (2.08.7) it follows that
(2.08.34) Dé] < n¥N (a),
where x has the elements £ This follows from the fact that
N(1,) = n¥.
2.1. Iterative Methods. Generally speaking, an iterative method for
solving an equation or set of equations is a rule for operating upon an
approximate solution z, in order to obtain an improved solution 241,
and such that the sequence {z,} so defined has the solution x as its limit.
This is to be contrasted with a direct method which prescribes only a
finite sequence of operations whose completion yields an exact solution.
Since the exact operations must generally be replaced by pseudo opera-
MATRICES AND LINEAR EQUATIONS 45
tions, in which round-off errors enter, the exact solution is seldom attain-
able in practice, and one may wish to improve the result actually obtained
by one or more iterations. Also since the “approximation” x) with
which one may start an iteration does not necessarily need to be close,
itis sometimes advantageous to omit the direct method altogether, start
with an arbitrary %, perhaps zo = 0, and iterate until the approach is
sufficiently close.
2.11. Some Geometric Considerations. A large class of iterative meth-
ods are based upon the following simple geometric notion: Take any
vector b and a-sequence of vectors {u,}, and define the sequence {b,} by
(2.11.1) b = bo,
bp-1 = bp + ApUlp,
where the scalar ), is chosen so that b, is orthogonal to u,. Then if the
vectors u, ‘‘fill out’’ some n space, the vectors b, approach as a limit a
vector that is orthogonal to this n space. Without attempting a more
precise definition of what is meant by “filling out,” we can see that it
must imply the following: If the vectors e; represent any set of reference.
vectors for this n space, then however far out we may go in the sequence
{up}, it must always be possible to find vectors u, = eu, with a non-
vanishing projection on any e; and, in fact, with components that have
some fixed positive lower bound. A possible choice for the vectors u,
- would be the reference vectors e; taken in order and then repeated.
If e; is the arithmetic vector associated with the geometric vector e;,
we have then
Urynti = i,
Now
\p = ulHb,1/ulHuy
vlAHb, 1/viAH Ay.
Consequently it is natural to take
H = A“,
which gives
(2,111.2) Ap = VT p1/VT Ady.
Alternatively we may take
Sp = by.
Then
y — Ax, = Aby = A(bp-1 — Nytly) = y — Atp-1 — ApAUy.
Hence
Ax, = AXp1 + rA,AUp,
or
(2.111.3) Lp = Lp-1 + ApUp.
But
Ap = URHb, 1/ulHuy.
If we take
H=A,
we have
(2.111.4) Ap = Ulrp-1/UTAUy.
To understand the reason for this selection, we note first that, since A is
positive definite, by hypothesis, there exists therefore a matrix C for which
A = CTC.
The equations
As = y
are therefore equivalent to the equations
Cz = 2,
where
Clz. = 4:
Define the function
g(x) (Caz — 2)"(Cx — 2)
21C1Cx — 2a7™Clz + alz
= a'Ax — Qaly + zlz.
From (2.111.8) we see that x, differs from rp_; only in the 7th element.
Furthermore, since b, and e; are to be orthogonal, this means that the ith
element of r, = As, must equal zero, which means that the 7th equation is
satisfied exactly. Hence the ith element of x, is chosen so that the 7th
equation will be satisfied when all other elements are the same as for
XLp-1. While we may expect that this process will require more steps
than does the method of steepest descent, the simplicity of each step is a
great advantage, especially in using automatic machinery.
3. The method of relaxation always takes u, to be some e;, but the
selection is made only at the time. Since the choice u, = e; has the effect
of eliminating the 7th component of ry, one chooses to eliminate the largest
residual. However this is not necessarily the best choice. The effective-
ness of the correction is measured by the magnitude of the correcting
vector A,U,, and this magnitude is
UtAs, s{utAup}%.
Now when wu, = ¢;, then ulAs,_; is the 7th component of the residual
T,-1, but this is divided by the length of e; which has the value of ~/ai.
Hence one should examine the quotients of the residual components
divided by the corresponding ~/a, and eliminate the largest quotient.
This method clearly converges more rapidly than the Seidel method,
which projects upon the same vectors but in a fixed sequence. Therefore
for ‘‘hand”’ calculations it is to be preferred. For automatic machinery,
however, the fixed sequence is almost certainly to be preferred.
2.112. The matrix A is not necessarily positive definite. This case
can always be reduced to the preceding if we multiply throughout by AT.
However, this extra matrix multiplication is to be avoided if possible.
With regard to the equations
Az = y,
we may adopt either of two obvious geometric interpretations.
The simplest interpretation is that we wish to change the vector
coordinates, as in Eq. (2.01.8), where y, taking the place of 2’, is known,
MATRICES AND LINEAR EQUATIONS 49
and A, taking the place of EZ, is known. In the symbols used here,
therefore, the columns of A are the numerical vectors which represent the
e; in the system f, and the column vector y represents x in the same system.
The vector y is to be expressed as a linear combination of the columns
of A, which is another way of saying that the vector x, whose representa-
tion is known in the f system, is to be resolved along the vectors e;.
The other interpretation comes from regarding each of the n equations
as the equation of a hyperplane in n space. If a; represents the ith row
vector in A, then the 7th equation is
j at = Niy
and this equation is satisfied by any vector x leading from the origin of the
point-coordinate system to a point in the hyperplane. If we divide
through this equation by N(a,), the length of a;, we obtain
[a;/N (a,)]z = 4:/N (ai),
and since the vector multiplying x is a unit vector, the equation says that
the projection of x upon the direction of a; is of length 7;,/N(a;), and hence
the same for all points in the plane. Consequently the vector a; is
orthogonal to the plane; and the distance of the plane from the origin is
\n:|/N (a).
In case we think of the underlying coordinate system e as nonorthogo-
nal, the vectors a; are taken to be covariant representations of the
normals, and the vector x as the contravariant representation of the
vector x drawn to the common intersection of the n planes.
These two geometric interpretations suggest different iterative schemes.
We begin with the hyperplane interpretation.
2.1121. The Equations Represent a System of Hyperplanes. If v is
any column vector, then
(2.1121.1) viAx = vly
is also the equation of a hyperplane passing through the point x. The
normal, written as a column vector, is A'v. If x, is any approximation
to x, and s, and r, are defined as before,
(2.1121.2) A8» = Tp = y — Aty = A(z — 2p),
and project upon the vector up41 = A'p41. This amounts to writing
Az, = y — Ab,
so that as b, vanishes, x, approaches z. The basic sequence as defined
by (2.11.1) and (2.11.4) takes the form
50 PRINCIPLES OF NUMERICAL ANALYSIS ©
(2.1121.5) ‘ Ap = Virp-1/V,AA'd,,
if the identity matrix is taken to define the metric. But then (2.1121.4)
gives
(2.1121.6) Lp = Lp + ApAldy.
By analogy with the method of steepest descent as described for the
positive definite case, we may define the non-negative function
A pATe;.
Hence to make the optimal choice, we should divide each residual by the
square root of the sum of the squares of the corresponding row of A, and
select the largest quotient. Presumably all these square roots will be
used repeatedly and should be calculated in advance.
2.1122. The Equations Represent a Resolution of the Vector y along
the Column Vectors of A. If x, is any set of trial multipliers,
T? =Yy — Ax,
represents the deviation of the vector Az, from the required vector y-
Take
bp = Tp,
and let
Up = Ady
MATRICES AND LINEAR EQUATIONS 51
represent any linear combination of the columns of A. Then Eqs.
(2.11.1) and (2.11.4) give
| Vp = Aly,
a choice that complicates the denominator in \, excessively. In taking
v, = @; for some 2, we alter only one element of x,_1 in obtaining zy, but
no element of r, is made to vanish, so no one of the equations is necessar-
ily satisfied exactly. To find the optimal e; according to the principle
of the method of relaxation, we observe that we wish to maximize the
vector
ApUp = ApAD_
or
ApAe;.
efATr,4
which is a complete scalar product of the 7th column of A with the residual
vector rp1. Taking this scalar product represents the greater portion
of the labor involved in the complete projection, so that one would
probably always take the vectors e; in strict rotation.
2.113. Some generalizations. The methods described in §2.111 con-
sisted in taking each residual s,_1 = « — %p_1 from the true solution z,
projecting orthogonally upon a vector u,, and adding the projection to
2%p,-1 to obtain an improved approximation x, The new residual sp,
was orthogonal to the projection on uy. Clearly if the projection is
made on a linear space of two or more dimensions, the projection, 7.e., the
correction, will be at least as large as the projection on any single direction
in this space. Hence it is to be expected that the rate of convergence
of the process would be more rapid if, instead of projecting each time
upon a single vector uy, we were to project upon a linear space of two or
more dimensions. Such a space may be represented by a matrix U,
such that any vector in the space is a linear combination of columns of
Us
52 PRINCIPLES OF NUMERICAL ANALYSIS
The problem is now the following: Given any matrix U,, we wish to
project the residual s,-1 orthogonally upon the space U, (that is, the
space of linear combinations of its columns). The projection “gill be
taken as a correction to be added to x»_1 to yield the improved approxi-
mation x». The orthogonal projection is represented by the matrix
U,(USAU,)UTA, and we find now that
(2.113.1) Lp = Lp + U,(UIAU,) “Uhr.
The scalar \, is for present purposes to be replaced by the vector
(UTAU,)“Ur y-1.
Its expression is the same as for \, except that the matrix U, replaces
the vector u,, and it is to be noted that the reciprocal matrix enters as a
premultiplier. While the method does provide a larger correction, in
general this advantage is offset by the necessity for calculating an inverse
matrix whose order is equal to the dimensionality of the subspace upon
which the projection is being made. Nevertheless in special cases this
inversion may prove to be fairly simple.
If the columns of the matrix U, are unit vectors ¢;, then the matrix
UTAU, is a principal submatrix of the matrix A. If, say, we take Up
to be the two-column matrix (e;, e;), then
Arkeppyn = Y — ArXpn.
Since the vectors Zpnii for 0 < 7 < m need not enter explicitly, we may
modify the notation by writing simply zx, for what had been designated
Xpn, and the iteration is written
(21212) A1fp41 Py Axx, At Ai + A>.
Lp = Az '(y — Ac®p-1),
and x satisfies
“6= Azlty = A22),
Then (2.121.5) yields one of the criteria given by von Mises and Geiringer
which they state in the form
» a} <1.
tA
a,j
y les! Se <1,
tAy
MATRICES AND LINEAR EQUATIONS 55
by noting that, if o are the elements of s,, then |o%+?| < » laus| > lo,
iwi
whence the criterion implies that Z|o{7?| < ad\o”|.
Now suppose A is positive definite and write
(2.121.8) yI+ B) =A.
Thus we take Ai = I, Az = B for (2.121.2). Since
Ree YIN Yseh) Lea
it follows that for each proper value \; of A there is a proper value
(4; — v)/y of B, and convergence of the process requires therefore
that every \; < 27. If 7 is so chosen, and if the proper values of ) are
arranged in order of magnitude,
Mie aA ot ay ee Xn;
F(A)a = f(A)y,
equivalent to the original, yields a more rapidly convergent sequence.
The proper values of F(A) are n; = F(\;). If uw’ and w”’ are the largest
and smallest of the u;, we wish to choose F(A) sothat (u’ — p’’)/(u’ + yw”)
is as small as possible, as we see by (2.121.10). Hence we wish to choose
F(A) to be positive over the range (A1, An), and with the least possible
variation.
The simplest case is that of a quadratic function F, and the optimal
choice is then
F = Xa — d), a = (Ay + X,)/2.
Ordinarily one does not know the proper values in advance, though one
might wish to estimate the two extreme ones required (¢.g., see Bargmann,
56 PRINCIPLES OF NUMERICAL ANALYSIS
Montgomery, and von Neumann), or these might be required for other
purposes. —
2.122. The method of Hotelling and Bodewig. The iterations so far
considered have begun with an arbitrary initial approximation 2» (which .
might be z = 0). Suppose, now, that by some process of operating
upon y in the system
(2.122.1) Az=y,
perhaps by means of one of the direct methods of solution to be described
below, one obtains a ‘“‘solution” x) which, however, is inexact because
it is infected by round-off. The operations performed upon the vector y
are equivalent to the multiplication by an approximate inverse C:
(2.122.2) Lo = Cy.
Then by (2.11.4) the unknown residual so satisfies the same system as
- does xz, except that r» replaces y, and hence we might expect that Cro is
also an approximation to s. Hence we might suppose that
1 = 2% + Cro = C(2I — AC)y
Vo = Xo = Cy, ro = y — Ax = By,
Pisa pe, Crs Tp = Tp _ Avp+1 = Br>.
Then
Ly=v tnt *** +p,
(2.122.9) tp = y — At, = Brty,
The procedure is to compute vp;1 from the last remainder r,, and then
compute the next remainder rp+1.
2.13. Some Estimates of Error. The fact that an iterative process con-
verges to a given limit does not of itself imply that the sequence obtained
by a particular digital computation will approach this limit. If the
machine operates with o significant figures in the base 8, we are by no
means sure of o significant figures in the result. At some stage the
round-off errors introduced in the process being used will be of such
magnitude that continuation of the process is unprofitable. However
another, perhaps more slowly convergent, process might permit further
improvement. In any case it is important to be able to estimate both
the residual errors and the generated errors. In presenting these
estimates, it will be supposed that the equations to be solved are them-
58 PRINCIPLES OF NUMERICAL ANALYSIS
selves exact. The extent to which an error in the original coefficients
affects the solution will be discussed in a later section. |.
2.131. Residual errors. We first consider residual or truncation errors
neglecting any effects of round-off. If M(H) < 1, then from the identity
I —-H)3=I1+ AH -4)"
=I+HI+H+H'?+--:>)
it follows that ;
M(ZI — H)] S<1+ M(A)M(CZ — H)*).
Hence
(2.131.1) M((I — H)-] < 1/1 — M(A)].
Thus in certain cases a bound for the maximum value of a reciprocal can
be obtained from the matrix itself. From the same identity, since
N(I) < n4,if N(H) < 1, then
(2,131.2) N(Z — H)-'] < n4/(1 — NCA),
and if nb(H) < 1, then
(2.131.3) b[( — H)-] < 1/[1 — nb(A)).
The last two inequalities are generally less sharp but more easily applied.
Now consider the sequences C’, and B, defined by (2.122.3). Since
AT = Co Boa’
if we set H = Bo, then
(2.131.4) M(A™) < M(C,)/{1 — M(B,)I,
provided the denominator is positive. To establish the analogous
inequalities using N and b, we note that
or
Cy = AMI — BB),
(2,131.7) A“ — C, = A BY.
MATRICES AND LINEAR EQUATIONS 59
Hence given the SApienses on M(Bo), N(Bo), or nb(Bo), as the case may
be, we have
M(A™ — C,) < M(A)M” (Bp),
whence by (2.131.4)
A = Ai + Aa,
Then
A — C, = HI — H)Azt
and
and
N(r,) < M(A1)M(Az1)M?(A)N
(10)-
Since v4
Tpp1 — Zp = HPA;z'y,
« — a, = (I — H)H?A;ry,
then ;
Sp = 2 — ty = (I — A)“ (hpy1 — 2p).
Hence
(2.131.16) N(8) < N(@p11 — %)/(1 — M(H]
< N(Sp11 — 2) /{l — NCD).
Also
Sy = (= A) AG, Ty—1),
(2,131.17) N(8p) < M(H)N (ap — %p-1)/[1 — M(H)]
< N(H)N (ap — %p-1)/[1 — N(A)).
These inequalities provide estimates of the error in terms of the magnitude
of a particular correction.
2.132. Generated errors. If A is symmetric, let » and u be the numeri-
cally smallest proper value and an associated proper vector, respectively.
If xo is any approximation to z = A-y, then Axo = y — fro while
A(to tu) =y —Tot wu.
Hence if » is very small, a large component in 2» along u would appear in
ro a8 only a small component in the same direction. Another way of
saying this is to say that a putative solution x» might yield a residual ro
that would be regarded as negligibly small even when 2 has a large
erroneous component along wu.
In general, for any matrix if x and x, are two putative solutions, then
%1—- Xo = A“ (re == 11),
(2.132.1) N(a
— 2) < N(r2 — 1)M(A~*) = N(r2 — 11)/m(A),
and if m(A) is small, then a large difference x, — x2 could result in only a
small rz — ri, possibly less than the maximum round-off error. In fact,
if «is the limit of the round-off error, then e/m(A) represents the limit of
detectable accuracy in the solution.
There is no a priori assurance, however, that any particular method of
solution will give a result that is even that close. We therefore consider
this question for some of the iterative methods described above. It will
be assumed, for definiteness, that the operations are fixed-point with
maximal round-off ¢, all numbers being scaled to magnitude less than
unity, and that in the multiplication of vectors and matrices it is possible
to accumulate complete products and round off thesum. If each product
is rounded, then generally in the estimates given below the factor « must
be multiplied by n, the order of the matrix.
MATRICES AND LINEAR EQUATIONS 61
In any iterative process which utilizes one approximation to obtain one
that is theoretically closer, the given approximation actually utilized in
the computation, however it may have been obtained, is digital. To the
digital approximation one applies certain pseudo operations to obtain
another digital approximation. Two partially distinct questions arise:
Given a digital approximation and a particular method of iteration, can
we be sure that the next iteration will give improvement? Given two
digital approximations, however obtained, when can we be sure that one
is better than the other? These are questions relating to both the
generated and the residual errors, since for iterative methods they merge
together.
Basic to the discussion is the fact that, when a product Azo, say, of a
digital matrix by a digital vector, is rounded off by rounding only the
accumulated sums and not the separate products of the element, then the
resulting digital vector, which will be designated (Axo)*, satisfies
Hence
ely + ro) — aly tr) = hy +r) — ally + rf) + aro — 99) a xi(r1 we rt),
and (2.1321.1) will certainly be satisfied if
(2.1321.2) 2T(y + rf) — xh(y + 79) > n4*{N@o) + N@)].
Since we can also say that
\2X(rp — rz)| < nb(xp)b(rp — rp) < neb(xp),
therefore (2.1321.1) is also implied by 7
(2.1821.3) a}(y + rt) — ai(y + 19) > nelb(xo) + b(@1)).
This requirement is somewhat more stringent.
Now consider a particular approximation 2» and the digital approxima-
tion that would be obtained from 2» following a single projection. Can
we be assured that the digital result of making the projection will be a
better approximation than x9? If the projection is made on e;, we wish
to know whether
(to + A*e,){Ly + [y — A(wo + A*e,)]} > zO(y + 70),
where
r* = elré mon OER
We suppose every a; = 1. This does not violate the requirement that all
stored quantities be in magnitude less than unity since the a,; need not
be stored explicitly in this case. Hence
A* = elry, A = elro.
The above inequality reduces to
2r* hak.
ae ee (Azp)*, (i ip 2,
This being the case, we can give an inductive algorithm for a factoriza-
tion of A into the product of a unit lower triangular matrix L and an
upper triangular matrix W. That such a factorization exists and is
unique when A is of second order and Ai; # 0 follows from the above by
taking Nir =N2,2 = 1. For purposes of the induction suppose that Nu
above was unit lower triangular and M,, upper triangular. Then M1,
and N» are uniquely determined by (2.201.4). We change the notation
and partition further, writing
Ai Ai Ais Li 0 0 Wi Wie Wis
(2.201.5) Aa Ago Agog = Lox Lie 0 0 Wo Wes ’
As: Azo Az 31 Lee Ls 0 0 W;
MATRICES AND LINEAR EQUATIONS 67
where Iu. = Nu, Wir = Mi, Aus is the same as above, but the sub-
matrices. previously designated A22, Asi, and Ais are now further par-
titioned, as are the matrices Nei and Mie. When the necessary inverses
exist, these last matrices or their own submatrices are determined
uniquely by Eqs. (2.201.4) which now have the form
Hence W22, Wes, and L32 can be determined uniquely from A and from the
portions of L and W already determined, provided only that As. — LeaWi
is nonsingular, and independently of the choice of the matrices L33 and
Ws3. The last condition merely specifies the product L3;3W33s. Hence
for the inductive algorithm take I2. = 1 and determine the scalar Wee
and the vectors W23 and Lz32. Now the matrices
a 0 ) de )
Dey De 0 Wx
A = LDL".
If we write
LD = K,
then
A = KK".
(2.201.9) Gy aR (% 2
and ¢
GR = 0.
ee
of having selected a unit lower triangular matrix of the form
Lie I 22
where Ji; is itself unit lower triangular, in such a way that A is factored
The matrices Li; and Ls; are not themselves written down. The partial
system
M o2%2 = 22
and supposing Lu, La, Ls1, Wu, Wie, Wis, 21 already determined, Lo»
is prescribed (in practice Lez = 1), and Ls2, Wee, Wes, 22 are to be deter-
mined at this step. Equations (2.201.6) give Ls2, W22, and Wes, while 2»
is given by
(2.21.6) 22 = Le (y2 = L321).
While in practice one takes Lo. = 1, this equation and Eqs. (2.201.6)
are perfectly general. Since neither L33, W33, nor 23 occurs in any of
these relations, one can, with Crout, write the two matrices L — I and
(W, z) in the same rectangular array, filling out in sequence the first
row, the first column, the second row, the second column, etc. When
this array is filled out, the elements along and to the right of the principal
diagonal are the coefficients and the constants in the triangular equations
Wx =z.
In case one has two or more sets of equations with the same matrix A,
then the vectors y and z may be replaced by the matrices Y and Z in
(2.21.5) and (2.21.6). Alternatively one may solve one of these systems,
after which, with L and W already known, the elements of any other
column z in Z are obtained sequentially from the corresponding column y
in Y by using (2.21.6), remembering that at the start there is no partial
vector 2; so that one has simply ¢1 = m. In particular, if a single system
is solved by this method, and a result x is obtained which is only approxi-
mate because of round-off errors, we have seen that the error vector
x — Xo satisfies a system with the same matrix A, so that (2.21.6) can be
applied with y — Ax» replacing y.
Another modification of the method of elimination is that of Jordan.
It is clear that, after £ has been eliminated from equations 2, . . . , n,
MATRICES AND LINEAR EQUATIONS 71
and while the new second equation is being used to eliminate £ from
equations3, . . . , n, this can also be used to eliminate £ from the first
equation. Next the third equation can be used to eliminate £3 from what
are now equations 1 and 2, as well as from equations 4,...,n. By
proceeding thus, one obtains an equivalent system of the form Dz = w,
where D is diagonal. This amounts to multiplying the original system
Ax = y sequentially by matrices each of which differs from the identity
only in a single column. However this column will have non-null ele-
ments both above and below the diugonal.
Crout’s method provides a routine for triangular factorization which
minimizes the number of recordings and also the space required for the
recordings. This is very desirable, whether the computations are by
automatic machinery or not. For machine computation it has the dis-
advantage of requiring products such as L3i1W12 involving elements from a
column of Z and from a row of W. Jordan’s method permits a similar
economy of recording without requiring operations upon columns.
To see this we note first that, if J is a matrix satisfying
J(A, I) m ae J),
and of K,, one first forms the 7th row from the 7th row of the previous
composite. For this one divides every element but the 7th (which is
a;) by o;, recording the quotient, and in the ith place records oj. To
obtain the jth row (j ¥ 7), one increases each element except the ith by
—a; times the corresponding element in the new ith row. In the 7th
place one records ¢; = —a;/a; = 0 — aj/ai.
Clearly if one operates in this fashion upon the matrix (A, J, y), then
one comes out with (J, A-!, x). Thus in using automatic machinery if
n(n + 1) places are reserved in the memory for (A-}, x), then these places
are to be filled first by the matrix (A, y) arranged by rows. Each multi-
plication by an J + J; requires first an operation upon the elements of
the ith row, followed by an operation upon the elements of the old jth
and the new 7th row.
2.22. Methods of Orthogonalization. Let;
(2.22.1) A=RV, RR
= D?
where V is unit upper triangular and D? is diagonal. We have seen in
§2.201 that such matrices exist. The general metric G of §2.201 is here
taken to be J. The matrices V and R can be computed sequentially by
applying Eqs. (2.201.9) and (2.201.10) with appropriate modification of
notation. Then the equations Ax = y can be written RVx = y so that
D?Vxz = Ry, and
(2.22.2) gee Vie
Since D is diagonal and V unit upper triangular, their inversion is straight-
forward. This is Schmidt’s method.
In the least-squares problem one has a matrix B, with m < n rows,
and a vector y, and one seeks a vector x of m elements such that
Be =y+d, a'B
= 0.
of r;, one adds a vector so chosen that r;,1 is orthogonal, with respect to
the metric I, to all preceding r;. If this can be accomplished, then for
some m <n, fm = 0, and hence Az, = y. For if all the vectors 7o, 71,
: , fr—1 are non-null, then being mutually orthogonal they are linearly
independent, and only the null vector is orthogonal to all of them.
Geometrically the method has other points of interest. We have
already noted that the solution x of the equations Ax = y minimizes the
function
(2.22.5) 2f(a) = wtAx — Qaly.
In fact it represents the common center of the hyperdimensional ellipsoids
(2.22.6) f(x) = const.
This fact provides the usual approach to the method of steepest descent.
Also at x» the function f(x) is varying most rapidly in the direction of
ro, which is the gradient at 2 of the function —f(x). Hence one takes
L1 = Lo + Af,
For 7 = 0 the statement is trivial, for it merely says that r, is.a linear :
combination of r> and Aro. Suppose that all vectors ro, 71, . . . , 7% are
non-null and that the statement holds for them. Then p; ¥ 0, since
otherwise the mutually orthogonal vectors ro, ..., 1:41 would be
linearly dependent. Since 7; is a linear combination of ro, Aro, . . . ,
A*ro, therefore Ar; is a linear combination of Aro, A’m, ..., A’ro.
Hence r;,1 is expressed as a linear combination of ro, Aro, . . . , A‘fo.
The theorem in question states that
Pi+1,0 = Pitt = + * = prise = 0.
Also
par = ayy Ars,
(2.22.16) Ti = 1; — Az,
(2:22.17) Zina = Tint + we,
where
(2.22.19) 2iAz; = 0, EE Ie
4=m+ H0Z0,
we have
zhAz. = 2bAri + poziAzo
and
riry = —dorlAzo.
and this is seen to vanish from (2.22.18), (2.22.14), and (2.22.12). Now
suppose (2.22.19) verified for allj <i<k. From (2.22.16) we have
riAz; = 0, jgst-lj>tt+1,
(2.22.20) pitt = —Asr], Az,
i = ArT Ag.
MATRICES AND LINEAR EQUATIONS 77
Hence, from (2.22.17) with i = k we have the required relation verified
when? = k+1andj<k-—1.
Again, from (2.22.17) with i= k
To see this we have only to solve the equation f(a» + Au) = f(xo) for X.
If we take wu = 7o, then A’ = 2X9. Hence 2; is the mid-point of the chord
in the direction 79.
Now the plane rj}A(a — x1) = 0 passes through 2; and also through
the point = A-y, for by direct substitution the left member of this
equation becomes rjr: which vanishes because of orthogonality. This
plane is a diametral plane of the ellipsoid f(x) = f(x); it intersects this
latter ellipsoid in an ellipsoid of lower dimensionality. Instead of choos-
ing 22 to lie on the gradient to f(x) = f(x1), as is done by the method of
73 PRINCIPLES OF NUMERICAL ANALYSIS
steepest descent, the method of Hestenes and Stiefel now takes x2 to lie
on the orthogonal projection of the gradient in this hyperplane, or, what —
amounts to the same, along the gradient to the section of the ellipsoid
which lies in the hyperplane. At the next step a diametral space of
dimension n — 2 is formed, and 2; is taken in the gradient to the section
of the ellipsoid f(x) = f(a2) by this (n — 2) space. Ultimately a diame-
tral line is obtained, and z, is the center itself. With the formulas
already given these statements can be proved in detail, but the proof will
be omitted here.
2.23. Escalator Methods. Various schemes have been proposed for
utilizing a known solution of a subsystem as a step in solving the complete
system. Let A be partitioned into submatrices,
and suppose the inverse A;} is given or has been previously obtained. If
Cae) ‘@ Cu
| Cu Eu
22502 es (0) ce ’
sc -(Oo.
then
I 11 Za
Is2/’
where the J;; and the 0;; are the identity and null matrices of dimensions
that correspond to the partitioning. Hence if we multiply out, we obtain
If Ave is a scalai, Ais a column vector, and As: a row vector, then C2»
is a scalar, Ci, a column vector, and Cz: a row vector, and the inverse
required for C22 is trivial. The matrices are to be obtained in the order
given, and it is to be noted that the product A;z!Ai2 occurs three times,
and can be calculated at the outset. If A is symmetric, then
Cox = Cle
MATRICES AND LINEAR EQUATIONS | 79
In any event the matrix C2» is of lower dimension than C, and the required
inversion: more easily performed. It is therefore feasible to invert in
sequence the matrices
pages Q@i1 GQi2 13
12
(a1), >| ei Geo Gog, - ++ 4
O21 22
O31 a32 Q33
each matrix in the sequence taking the place of the Aj; in the inversion of
the next.
In the following section it will be shown how from a known inverse
A one can find the inverse of a matrix A’ which differs from A in only
a single element or in one or more rows and columns. It is clearly pos-
sible to start from any matrix whose inverse is known, say the identity I,
and by modifying a row or a column at a time, obtain finally the inverse
required. However, these formulas have importance for their own sake,
and will be considered independently.
2.24. Inverting Modified Matrices. The following formulas can be
verified directly: |
(2.24.1) (A + USV")-! = A-! — A“1US(S + SV™A—'US)SV™A—1,
(2.24.2) (A+ US“*Y!)* = At — AUS + VA) VIA
provided the indicated inverses exist and the dimensions are properly
matched. Thus A and S are square matrices, U and V rectangular. In
particular, if U and V are column vectors wu and », and if the scalar S = 1,
then
(2.24.3) (A + w')-! = A! — (Au) (TA-1)/(1 + TAM).
If u = e;, then the 7th row of w is v', and every other row is null;
if v = e;, then the 7th column of wo" is wu, and every other column is null;
if u = ce;, where o is some scalar, and v = e;, then the element in the
ith row and jth column of wo’ is ¢, and every other element is zero. In
the last instance, v'A-u is o(a1),;, where (a); is the indicated element
of A-!. We have then the interesting corollary that the matrix A + w
becomes singular when ¢ = —1/(a~),;.
2.25. Matrices with Complex Elements. If the coefficients of a system
of linear equations are complex, then the matrix can be written in the
form A + 2B, where A and B have only real elements. In general we
may expect the solution to have complex elements. Hence the equations
can be written in the form
and since the real parts and the pure imaginary parts must be separately
equal, this is equivalent to the real system of order 2n:
Az — By =¢,
Bx + Ay =d,
or
(Senay) =)
Thus the complex system of order 7 is equivalent to a real system of order
2n, since these steps can be reversed. The complex matrix A + 7B is
singular if and only if the system with c+ id = 0 has a nontrivial
solution, and this occurs if and only if the real matrix of order 2n is
singular.
A complex matrix is called Hermitian in case A is symmetric and B
skew-symmetric, 1.e., in case
Al aA, pie
But then the real matrix of order 27 can be written
(5BoA4)je
and it is symmetric. Hence the complex matrix is Hermitian if and only
if the corresponding real matrix is symmetric. A Hermitian matrix is
positive definite if and only if for every non-null complex vector x + ty
it is true that
(at — ty")(A + 1B)(z + ty) > 0.
This implies that the quantity is real. But if we evaluate the quantity on
the left, we obtain
ants = 1 — Y oiib;,
ji
&; = n/a — p: (cxsj/ orgs)
E3-
jt
The n(n + 1)/2 divisions ,/o;; and aij/ox (for a symmetric matrix)
can always be done in advance. Thereafter each correction of a single
¢; requires n — 1 multiplications and a single recording provided the
82 PRINCIPLES OF NUMERICAL ANALYSIS
The recordings required are the triangular matrices L and W and the
vectors z and z, or n? + 2n quantities altogether.
To iterate the process in order to reduce round-off, formation of
ro = y — Axo requires n? products; L7'7y requires n(n — 1)/2 (all
multiplications, no divisions since L is unit lower triangular); and
W'L~'ro requires n(n + 1)/2, or altogether 2n? products and at least
3n recordings. These give the corrections to 2, so that an additional n
recordings of 2, itself are required. If the matrix is symmetric, the
operations are reduced by nearly one-half.
2.315. Orthogonalization. In forming RV = A, R'™R = D* as in §2.22,
suppose 7 columns of A have been orthogonalized. As in (2.201.9)
and (2.201.10), 2 elements of the next column of v are to be found, each
requiring n products and a division. Then the next column of FR requires
n(n + 7) products, and n more are required for the next element of
D?. Hence to orthogonalize the columns of A requires a total of
n(4n? + n — 1)/2,
or approximately 2n’ products. Beyond this one requires R'y with n?
products; n more products in multiplying this by D-*; and n(n — 1)/2
more in solving the triangular system Vr = D-*R'y. Altogether it
amounts to 2n?(n + 1) products. For recordings we require at least the
nm? elements of R; n(n — 1)/2 elements of V; n elements of D?; andn
elements each of R'y, of D-*R'y, and of x. This makes n(8n + 7)/2
recordings.
2.316. Inverting a modified matrix. In Eq. (2.24.3), if wu= e;, and A-!
is given, then the inversion of A + w™ requires n? multiplications for
v'A-!; n quotients of 1 + v'A—1u into the vector v'A—! (or into A-u);
n products for multiplying the column vector by the row vector. Hence
there are 2n? + n product operations for modifying the inverse when a
single column of the matrix A is modified. If one builds up the inverse
by modifying a column at a time, then in the worst case n?(2n + 1)
products are required. However, if one starts with the identity, then
in the first step, since
(I + wt)? = I — w'/(1 + Tw),
only n quotients are needed and no other products. The new inverse
differs from J in only the 7th row, so that many zeros remain if 7 is large.
If the programing takes advantage of the presence of the zeros, the num-
ber of products is reduced considerably.
Once the inverse is taken, if a set of equations are to be solved, an
additional n* products are needed.
2.4. Bibliographic Notes. Most of the methods described here are
old, and have been independently discovered several times. A series of
84 PRINCIPLES OF NUMERICAL ANALYSIS
P(x) = (@ — 21)Q:(2).
and not only x; but also 7, . . . , 7, are roots of P = 0. But there can
be no others. For if 2n41 were different from zi, ... , %n, and alsoa
root of P = 0, then it would be true that
ai = —Q0 »;Ti,
+
a2 = ao UXj,
(3.01.4) i<j
a3 = —Qo y CU r,
t<j<k
Gn = (—1)"aor1%2 * + *. En.
(3.01.6) s&s, =
) x,
(3.01.9)
S76, 'o. le me: 6 76,78; te 0. te Te
(3.02.1) P(x) =
If P is a real polynomial, 7.e., if all its coefficients are real, then the real
roots of the derivative equations have important relations to the real
roots of the original.
If we set x = 2+ 7, where r is any real number, forming P(z + r),
expand each power of z + 1, and collect like powers of z, the result is a
polynomial in z of degree n. If in this polynomial we now replace z by
x — r but without expanding powers of x — r, we obtain an expression
of the form
P(x) = Cn + Cn—1(@ — 7) + Cree — 7)? + +> > + e0o(e — 1)",
where the c’s are the constant coefficients of the several powers of z in
P(z +r), and in particular, c) = do. This is an identity which therefore
holds when we differentiate on both sides and continues to hold when we
give to x any fixed value. In particular if we set x = r, we find (as in the
proof of the remainder theorem) that
P(r) = Cn,
P(x)= (@ — r)mP™(r)/m! + -
Moreover, r is a root of multiplicity m — 1 of P’ = 0, of multiplicity
m—2ofP” =0,....
If P is a real polynomial, then between consecutive real roots of P = 0
there is an odd number of roots of P’ = 0. In particular, there is at least
one. This is Rolle’s theorem. For suppose 2 is a root of multiplicity
m,, 2 of multiplicity m2. Then we can write
P(x) = (w@ — 21)™(e — r2)™Q(z),
where Q does not vanish at x1 or 2 or anywhere between. Since Q is a
polynomial, it must retain the same sign throughout the interval. Now
P'(x) = (@ — 41)™""(@ — x2)"*19(z),
where
q(x) = m(x — 22)Q + mo(x — 21)Q + (a — 21)(4 — 22)Q’.
Hence
q(#1) = m(%1 — %2)Q(x1),
Q(X2) = me(xe — £1)Q(z2).
Hence q(x) and q(x2) have opposite signs, and g(x) must vanish an odd
number of times between x; and x2. Hence the same is true of P’(z).
If we differentiate the factored form (3.01.3) of P(x), we obtain for P’
a sum of products of n — 1 factors each. In fact, each product can be
written as P(x)/(x — x;) for some 7. Hence
Se — 018; + a2 ad 0,
(3.02.9)
S3 — 0182 + 0281 7 Os> 0,
we rel Nia) en mis. ie Wiese e, We.. ial 6. fer. ee) 9,8) xe))ca
(3.03.1) =
92 PRINCIPLES OF NUMERICAL ANALYSIS
has these and only these roots. The determinant, therefore, whose
expansion is a cubic polynomial in z is equal to some constant times
(x — a1)(x — 22)(x — x3) by application of the factor theorem. If we
were to regard 2; as the variable instead of x, and apply the factor theorem
again, it appears that (73 — x2) and (x3 — 21) are also factors of the
expansion of the determinant. Likewise, regarding x2 as the variable,
(2 — 21) appears as an additional factor. Hence the determinant is
equal to the product (a3 — x2) (a3 — %1)("%2 — %1)(% — %3)(% — %e)(% — 21),
possibly multiplied by some factor as yet undetermined. However, the
determinant is a cubic polynomial in z, and so has no other factors con-
taining 2; it is also a cubic in x3, and so can have no other factors contain-
ing x3; nor by the same rule can it have other factors containing 22 or 71.
Hence any factor not yet found is a constant, independent of x or any
of the x; But the expansion of the determinant contains the term z2r72z*
once from the principal diagonal, and the expansion of the product
contains this product also. Hence there is no other factor, and
A eesbead
ge |
= (ts — Le)(%3 — 21) (%2 — 21)(4 — Xs)(% — 42)(e — 2).
The coefficient of x? is
12 Ay at
Z1 Xe X3| = (xs — e)(X3 — 21)(X2 — 4).
ui xy x5
Such a determinant is called an alternant, or an elementary Vandermonde
determinant. The negative of the coefficient of x? is |
Lp el ake ji ak
%1 V2 L3| = 01|/|X1 Lo ZI.
3 3 3 2 2
TT, 4 ti ry 23
Again, the coefficient of z is
L casheoat ly gdoand
ag? 22 22| =
no Ay ce O2;|%1 Xe XI,
Del
3 3
aes3 Hi 3. a
and the negative of the constant term is
with P(r) obtained as the final step. This process can be systematized
by writing the system
ao ai a2 Chan an [r
borate Pe Landy oat
bo bi be St ad R
where bo = do, and in general every number along the bottom row is the
sum of the two above it. The r is written in the upper right-hand box
merely as a convenient reminder.
Having written this, we now observe that the b’s are the coefficients
of the quotient
Q(x) = boc? + bya? + > + + a1.
One way to see this is to note that, when in ordinary long division we
divide P(x) by x — 1, the b; is exactly the remainder we get after dividing
aot + a by x — 1, the bz is the remainder after dividing aox? + air + az,
94 PRINCIPLES OF NUMERICAL ANALYSIS
and so on sequentially. We have written above merely a scheme for
evaluating these remainders in sequence.
If the coefficients of the equation P(x) =' 0 are all integers, it is possible
to obtain all its rational roots by inspection and a few synthetic divisions.
This is a help even when the rational roots are of no interest for them-
selves, since for every known rational root the degree of the equation
can be lowered by one. If we examine the scheme for synthetic division,
we can see that, if r is an integer and the a’s are all integers, then the b’s
are allintegers. Ifrisa root, then R = 0,sothata, = —brr. Hence
risa factor of dn. Thus if the equation has any integral root, the root is
a divisor of the constant term. More generally, if P(x) = 0 is a poly-
nomial equation with integral coefficients, and if p/q is a rational root in
lowest terms, then p is a divisor of the constant term, and q is a divisor
of the leading coefficient.
For suppose r is a fraction p/g in lowest terms. If bor is a fraction, say
with denominator s, then b; is a mixed number whose fractional term
has the denominator s. But s cannot divide p since p/gq is in lowest
terms. Hence bir is certainly fractional. By continuing to the end, we
conclude that p/q cannot be a root if q does not divide ao. If we apply
the argument to the reciprocal equation (3.01.8), we conclude that p
must divide adn. Since there are only a finite number of possible choices
for p and q, these can be examined one by one.
In some of the numerical methods of evaluating roots of polynomial
equations, and for other purposes too, one often starts with a polynomial
P(x), replaces x by z + 7, and wishes to evaluate the coefficients of the
polynomial P(z+ 7) as in §3.02. For example, r might be a close
approximation to a desired root of P(x) = 0, and we wish to replace the
equation by one in z for which the desired root z is as small as possible.
This is done in both Horner’s method and in Newton’s method, which will
be described later.
As in deriving (3.02.2), we write
P(x) = (2 — r)Q(x) + en
= (2 — r)[co@s—ir)™ 1 + * >> + ene) + e,,
it is clear further that c,_; is the remainder after dividing the quotient
by «—7r,.... Hence we extend our synthetic division scheme as
follows:
NONLINHAR EQUATIONS AND SYSTEMS 95
At each division we cut off the final remainder and repeat the syn-
thetic division with the preceding coefficients. This is sometimes
called reducing the roots of the equation, since every root 2; of the equa-
tion P(z + r) = 0 is7r less than a corresponding root x; of P(x) = 0.
In solving equations by Newton’s or Horner’s method, it is first
necessary to localize the roots roughly, and the first step in this is to
obtain upper and lower bounds for all the real roots. If in the process
of reducing the roots by a positive r the b’s and the c of any line are all
positive, as well as all the c’s previously calculated, then necessarily
all succeeding b’s and c’s will be positive. Hence the transformed equa-
tion will have only positive coefficients and hence can have no positive
real roots. Hence the original equation can have no real roots exceeding
r. Hence any positive number r is an upper bound to the real roots of an
algebraic equation if in any line of the scheme for reducing the roots by r
all numbers are positive along with the c’s already calculated.
3.05. Sturm Functions; Isolation of Roots. The condition just given is
sufficient for assuring us that r is an upper bound to the roots of P = 0,
but it is not necessary. -In particular if all coefficients of P are positive,
the equation can have no positive roots. This again is a sufficient but
not a necessary condition. A condition that is both necessary and
sufficient will be derived in this section. In fact, we shall be able to
tell exactly how many real roots lie in any interval. However, since it
is somewhat laborious, some other weaker, but simpler, criteria will be
given first.
Suppose r is an m-fold root so that
P(a) = («& — r)™P™(r)/m! + + °°
Since P™(r) #0, there is some interval (r — «, r+ e) sufficiently
small so that P™(z) is non-null throughout the interval, and P‘™-(z),
, P’(x), P(x) are non-null except at r. Suppose P(r) > 0. Then
P-) (x) is increasing throughout the interval, and so it must be nega-
tive at r — «, positiveatr +.«. Hence P~*(z) is decreasing, and hence
positive, at r — ¢; increasing, and hence again positive, at r+e«. By
extending the argument, it appears that the signs at r — « and at r + €
can be tabulated as follows:
96 PRINCIPLES OF NUMERICAL ANALYSIS
Aes Pim-3) Pi(m-2) Pim-1) P(m)
(PIC o 0 < — + = GP
hie OX ECR S + + + 42
Po =P, Pi, =P
Pir) = Pizrlr) = 0,
then P; and P;,; have a common divisor x — r. Since
Co Cy Cy c;
where
og EN en = Oy a
Obtain, next, the sequence
Co Cy Co C3
— Qn—1Pm—1
+ Px =
and regard them as equations in the unknowns P2, P3, . ..., Pm, the
coefficients Q being supposed known. The matrix of coefficients is unit
lower triangular and has determinant 1. Hence P,, is itself expressible
as a determinant, in which P; and P» occur linearly, in the last column
only. Hence the expansion of the determinant has indeed the form of the
left number of (3.06.1), where go and gq: are polynomials, expressiblein
terms of the Q’s.
3.07. Power Series and Analytic Functions. A few basic theorems on
series, and in particular on power series, will be stated here for future
reference. Proofs, when not given, can be found in most calculus texts.
Consider first a series of any type
(3.07.1) bo t+ bitbe+--:
where the 6; are real or complex numbers. Let
(3.07.2) Sx. = bo t+bit->: +d
represent the sum of the firstn + 1terms. The series (3.07.1) converges
to the limit s, provided lim s, = s, that is, provided for any positive
there exists an N such that |s, — s| < « whenevern > N. A theorem
of Cauchy states that the series (3.07.1) converges if and only if
lim |Snrp — Sn| = 0
nn 2
mn lbn+al/ldn| = B,
then the series diverges.
If for some positive 6 < 1itis true that lim |b,|”" = 8, then the series
converges absolutely, but if 8 > 1, it diverges.
Any convergent series has a term of maximum modulus. For since
the sequence of terms b, has the limit zero, for any e there is a term by
such that all subsequent terms are less than ein modulus. Choose « less
than the modulus of some term in the series. Among the N + 1 terms
bo, . . - , by, there is one whose modulus is not exceeded by that of any
other of these terms, nor is it exceeded by the modulus of any b, for
n>N. Hence this is a term of maximum modulus,
NONLINEAR EQUATIONS AND SYSTEMS 101
When the terms }; of the series (3.07.1) are functions of x, the limit,
when it exists, is also a function of x, and we may write
From (3.07.5) it follows that, if the series (3.07.4) converges for x = Zo,
then for every 7
(3.07.6) la;| < y/|zl,
where + is the modulus of the term of maximum modulus.
We have seen that f(x) defined by (3.07.4) is continuous throughout its
circle of convergence. It is also differentiable throughout the same
circle, and
so that
aho = ko,
(3.08.5) ae ie
—h,-1 + ah, = k,.
On multiplying these equations by 1, a, «, . . . and adding, one obtains
ath =k that::: + ka".
Let
(3.08.6) F(z) =ko thet: +> +hz’ = F(z) — Ry4r(2).
104 PRINCIPLES OF NUMERICAL ANALYSIS
Then
(3.08.7) hy = oF, (a) = a YF(a) — R,41(0)],
and
(3.08.8) hy/hoya = oF (a) /Fy41(a).
However, F is analytic at a, and the series (3.08.3) converges for z = a.
Let p’ be the radius of convergence of this series and let p satisfy |a| <
p <p’. Then the series converges forz = p. If y is the modulus of the
term of maximum modulus of the series F'(p), then
\k,| < y/p’.
By (3.08.8)
an hi/hy4 im kyy10°t?/F y41(a).
Hence
(3.08.9) la rae hy/hy+:| = plart/p7*}|,
hy hy41 OF y4.1(a) = 0,
ys hy+e Fy42(a%)
G00) ee ae 2eee
n—1
3.1. The Graeffe Process. We turn now to methods for solving a sin-
gle equation in asingle unknown. We have seen that one can express the
sum s, of the pth powers of the roots of an algebraic equation as a rational
function of the coefficients of the equation by relations (3.02.5). But
we can write
Sp = 2(1 + 23/af + 28/28 + + * *).
Hence if there should be one root that is larger than all the others, say x1,
then for a sufficiently large p all fractions within the parentheses should
become negligible, and we would have approximately
Sp = 24,
and in particular
lim sl/? = 2}.
Day ee ‘
es: asx7—5 =) ie
+ 2a,a302"—4 + she
or
apc?" + (Qaod2 — af)x*-? + (2aras, — 2aia3 + a2)a2—4 pt heal)
Since only even powers of x occur here, this can be written
aP/ay, = —2x2?,
If the equation has only simple real roots, all relations (3.1.7) are valid
for sufficiently large p. The signs of the roots are undetermined, but
these can be obtained by substitution or in other ways. But if P(x) is
real, and the equation has complex roots, these occur in complex conjugate
pairs with equal moduli, and there may be any number of unequal roots,
all having the same modulus. For example, all n roots of
a” —1=0
If there is one pair of complex roots whose moduli exceed the moduli of
all others,
lz:| = lao] =p > |e] (> 2),
then in polar form
%1 = p exp 20,
t= p exp (—76),
s0 that for m = 2?
xt = p™ exp mis,
xy p™ exp (—mié),
and
ae + 2e = 2p™ cos mo.
Hence for larger values of p, cos m6 will fluctuate in value and even in
sign, causing a/a?) to do likewise. However, in a¥/a® the dominant
term will be
NE 8 fo
If we can be sure that we stop where cos m@ is not too small, then x” + xz
will dominate the other terms in a/a, and we can obtain both p and
6, but with the quadrant of @ undecided.
This indeterminacy can be resolved if we apply the root-squaring
process to the equation
Py +h) =0
as well as to the original equation, where h is a small fixed number (using
the method of §3.04 to obtain the coefficients of y). Each root y; of this
equation is related to a root of the original equation by
YrP=f—oh,
and if h is small enough, the moduli of y; and yz will also exceed the
moduli of all the other roots. If
o = |ys| = yal,
then our roots 7; and 2; lie in the complex plane where the circle of radius
p about the origin intersects the circle of radius o about the point h units
to the right of the origin (or —A units to the left if h is negative). This
determines #0 uniquely. In case there are other roots x; with the same
modulus, the corresponding roots y; will have different moduli, and this
difficulty is thereby removed.
3.11. Lehmer’s Algorithm. The technique of investigating the roots
y: of Ply + h) = 0, along with the roots x; of P(x) = 0, has the dis-
advantage of requiring two applications of the Graeffe process in addition
to the special computations involved in the determination of the coeffi-
cients of y. Moreover in selecting h, one should be careful to make it
NONLINEAR EQUATIONS AND SYSTEMS 109
small enough so that, if roots x; and a; are such that |z;,| > |z,|, then also
la; — h| > |; — Al.
Brodetsky and Smeal therefore make the natural proposal that h should
be “infinitesimal,”’ and Lehmer has developed an effective algorithm. _
The original Graeffe process can be described in slightly different terms
by saying that we start with a polynomial P(x) and obtain from it a
polynomial P(x) whose zeros are the squares of those of P; from P: we
obtain P2(x) whose zeros are the squares of those of P; and hence the
fourth powers of those of P,.... On setting Po = P for uniformity,
one verifies that
ods :
we obtain the recursion
r-1
art) = (—1)a?* + 2 > (—1)"aPalp,, r=0,1,..., 2
(3.11.6) ae me
bet) = (—1)’a? dy _,, fuels 2, fos
v=0
In like manner if
jai| > |x] > lars] > °°,
then
eile Spee
then
ay oo —kar,
by = —kat,
(—1)'ap = op,
(—1)*b = kaket,
(—1)*a, = atae
(=o = cet kaa
Hence
Lin = 1/(b ,/a@, — b?/a”).
a”) will contain the term —2p” cos mé which will oscillate in value with
increasing p and m but will dominate the other terms whenever m0 is not
too far from an integral multiple of r. When this is the case, we may say
To express the coefficients B,,, in terms of the bp, consider the related
problem
1+ Ay+ Ay?+ :-- =exp (ay + ay?/2+ --: >).
By differentiating both sides with respect to y, we get
Age CAs te seat. me Moat cay +) (1 Ary te Asay? Fs. ° 2),
Hence on multiplying and comparing coefficients, we obtain the recursion
Ai = a,
2A, = aA, + a2,
Hence
Ste oS at
2B2,m = il te Dom,
3B3m = bmnB2,m ar DomB 1,m +- Dam;
Opn tes ear) wer ose) ‘ow ee se 16 On 6) Senne: 6
It follows immediately from the first of these relations and (3.12.7) that
[Biml < viel,
and if y > 1, as we may require, one can show inductively that
|Bom| < y?|p-?™|.
114 _ PRINCIPLES OF NUMERICAL ANALYSIS
Now from (3.12.4) and (3.12.9)
(3.12.10) ~1 +b Gime”! Gamz** + °°
a {1 aa Z™Dee™ + Suen + (—1)"2"™z7* OS zn™)a + Bim2Z™ + ap ‘).
+: Los Bani « ie a.
But
|BimZ2"| < nylzn/pl™,
Bom y apap < (7)¥*|2n/p|?,
Brit: ee lescuy*lee/ ol
For fixed n, as m increases, the first term vanishes as |z,/p|", the second
as |z,/p|?", .... Hence
(3.12.11) gem + + + 2a, = (—1)* + O(l2n/p|”).
This is the required theorem.
3.2. Bernoulli’s Method. The Graeffe process has the decided advan-
tage that the exponents m = 2? themselves build up exponentially.
Hence if one is fortunate enough to have the roots reasonably well
separated in the original equation, he may hope that only a relatively
small number of root squarings will be required. It has the further
advantage that, in principle at least, it yields simultaneously all the
roots of an algebraic equation and all roots within a circle of analyticity
when the equation is transcendental. Hence, once the squaring has been
carried sufficiently far, all solutions are obtainable by simple division or,
at worst, by root extraction.
The methods now to be described do not converge nearly so fast; they
give only one root, or at most a few roots, at a time; and in some cases -
they require some previous knowledge of the approximate locations of the
roots to be determined. Nevertheless, they all have one striking advan-
tage. Errors, once made, do not propagate but tend to die out. If
there were no round-off error, they would die out completely. A gross
error might cause the process to converge to some root other than the
one intended, however. But the self-correcting tendency suggests that a
method of this type might be useful at least for improving approximate
solutions obtained by, say, Graeffe’s method, but with insufficient
accuracy.
Let the equation
(3.2.1) f(z) = do oh Q;z2 + Az? + ooo = 0.
NONLINEAR EQUATIONS AND SYSTEMS 115
have a single root a interior to some circle about the origin throughout
which f(z) is analytic. Then if g(z) is analytic throughout the same
circle, and g(a) # 0, it follows from Kénig’s theorem that h,/hp1— a,
where
(3.2.2) g/f = ho + Iiz + hoz? + - - >
Now if
Aho = Jo,
Ahi + aiho = gi
3.2.4 us
( ). Ache + aihi + acho = go,
B40. OmyP ge O, 10. 8, KO) 10) ..0. OO aheuue
(3.2.6) Ho =|» ta
p+1 y
obtain
H®/H®, — oar.
Roaincr
gain
Rea [psp
Hore
h,
|Fee pe ee
(3.2.7)
hye hy h,
h
then
AY /HS, — arora.
Aitken has given a convenient recursion for calculating the determi-
nants H® of successively higher order. The formula is
H® = 1, H® = h,.
oe meOUrOh,=. A 0c 2 IR (’O aT ee
Beet? 9 Ay hs (0-0 how 0 h, ha 0 0
PaseOik Ghat hyias le0 hoe 0 herman ko
are equal, since the second is obtainable from the first by subtracting the
fourth, fifth, and sixth rows, respectively, from the first, second, and third.
But the second one vanishes, since in the Laplace expansion by third-order
minors every term contains a determinant with a column of zeros. The
Laplace expansion of the first along the first three rows has six nonvanish-
ing terms, but these are equal in pairs. When the three distinct terms
are written out, one obtains
hr he 1| |h, bya O
-- h, hy-1 0 Avs h, 0 == ib),
Ay h, 0 hy+e Nya 1
the same identity holds when the quantities are primed. In fact, by
direct substitution one finds that, when each term in the sequence is
increased by w, the entire quantity on the left is increased by w. We can
therefore take w = —wu and consider the sequence whose limit is 0.
But by direct substitution, then, the determinant is seen to vanish,
which proves the assertion, In §3.08 it was shown that each sequence
118 PRINCIPLES OF NUMERICAL ANALYSIS
uy = HY /H®, for fixed p converges geometrically (in the limit). Hence
we may expect that the derived sequence
Uy-1
U Or= e us re 2Uy = Uy+1)
v Uy
Uy+1
would converge more rapidly than the original one. This is Aitken’s 6?
process. A second derived sequence, u‘?, can be formed from the u{? —
just as the u was formed from the u,. It is to be noted that in forming —
a term in a derived sequence one can neglect all digits on the left that
are common to the three terms being used. This is because of the prop-
erty that in increasing each term by w the result is increased by w.
We conclude this section with a brief mention of an expansion due to —
Whittaker. In (3.2.4) we are free to take go = do, 91 = g2 = °°-* =O.
If the first vy + 1 of these equations are regarded as vy + 1 linear equations
in the vy + 1 unknowns fo, di, . . . , h,, the solution for h, can be written
down in determinantal form (cf. §3.32 below). Hence the ratio h,/h,41
can be expressed as the ratio of two determinants. Moreover, one can
write
a= ho/hi + (hi/he a ho/hi) ae (he/hs a hi/he) none ’
(3.3.3) . x = (2).
NONLINEAR EQUATIONS AND SYSTEMS 119
In particular if ¥(z) is analytic and non-null throughout some neighbor-
hood of a root « of (3.3.1), then a is the only root of (3.3.3) in that
neighborhood of a. This suggests the possibility of so choosing y that
the sequence
by an induction that can be carried out once we know that every 2; lies
in N(a, p). Since k <1, therefore, the distance |x; — a| decreases
geometrically at least. °
Now suppose that for some 2 and p and a positive k < 1 we have
converges absolutely. But the partial sums of the last series are the 2;.
Hence the sequence (3.3.4) converges, and the limit therefore satisfies
(3.3.3).
If ¢ is analytic in some neighborhood of a root a, as will be assumed
throughout, and if
(3.3.10) Id’(a)| <1,
then for any k which satisfies
ld’(x)| < k.
The requirement often made that at the initial approximation x» we
should have
F(%0)f’" (xo) > 0
is not strictly necessary.
Newton’s method applies to transcendental as well as to algebraic
equations, and to complex roots as well as to real. However if the equa-
tion is real, then the complex roots occur in conjugate pairs, and the
iteration cannot converge to a complex root unless 2p is itself complex.
But if x is complex, and sufficiently close to a complex root, the iteration
will converge to that root.
For algebraic equations, as each of the first two or three 2; is obtained, it
is customary to diminish the roots by 2; by the process described in §3.04.
NONLINEAR EQUATIONS AND SYSTEMS 123
Or rather, one first diminishes by 2; then one obtains 7, — a) and
diminishes the roots of the last equation by this amount; then obtains
— x, and diminishes by this amount, etc. Since
Fes + u) = fei) + uf'(e) + -
one has always that 2;,1 — x; is the negative quotient of the constant
term by the coefficient of the linear term. Hence one calculates f(x;) and
f’(x;) in the process of diminishing the roots at each stage.
However, f(z;) decreases as one proceeds. When f(z;)/f’(x;) becomes
sufficiently small, one can write
u = —([f@s) + uf"(7)/21+ - - -]/f'@,),
which is exact. When wis small, the terms in u?, u?, . . . become small
correction terms, and subsequent improvements in the value of u can be
made quite rapidly by resubstituting corrected values.
When Newton’s method is applied to the equation
x? —N =0,
the result is a standard method for extracting roots in which one uses
(3.32.1) f@® =0
124 . PRINCIPLES OF NUMERICAL ANALYSIS
has only a single root a, which is simple; if f(z) and g(z) are analytic
throughout this circle, and g(a) ~ 0; and if we define
For P,(x) is simply the ratio of the coefficients of the Taylor expansion of
h(z) about the point z.
This being true, then at least for r sufficiently large it is to be expected
that
la — + — P,(z)| < la — al,
and hence x + P, should then define a convergent iteration of some order.
It turns out that the iteration is convergent for any r and in fact is of
order r + 1.
In proof, we write the expansions
> -
h(z) = ho(x) + (2 — z)hi(a) + (2 — 2)*ho(~) +
and
F(z) = ko(x) + (2 — z)ki(zv) + > + > + (2 — x)ke-(x) + Rryilz, 2).
Then
Px) = h,-a(x)/h-(x)
= (a — 2)[F(a) — R,(a, 2)]/{F(@) — Rey1(e, x)],
and
a—2—P, = (a — x)k,(x)/[F(a) — Rr4i(a, 2).
Since a — x — P,(z) has the factor (a — x)’, all derivatives of
a—z-—P,,
and hence of x + P,(x), from the first to the rth will vanish at z = a.
By definition, therefore, x + P,(x) defines an iteration of order r + 1
at least. Note that with g =1, Pi = —f/f’, which yields Newton’s
method.
For the functions h,(x) required in forming any P,(x), one can obtain
them by differentiation, as indicated in the theorem, or by solving a
recursion like that of (3.2.4). However, now the a, and g, are functions
of x, coefficients in the expansions
f(z) = aox) + (2 —az)ai(z) +---,
g(z) = go(x) + (2 — z)gi(z) + ---.
In the statement of Whittaker’s expansion (§3.2), reference was made
to the fact that the h, could be expressed by means of determinants.
NONLINEAR EQUATIONS AND SYSTEMS 125
This is equally true for the h,(x), and even for the P,(x). For this it is
convenient to suppose that any desired function g has been divided into f
in advance, and f now designates the quotient. The expansion is now
(3.32.4) 1/f(z) = ho(x) + (2 — z)hi(x) + (@ — 2)*al(z) + ---,
so that
h(x) = [1/f(x)]/r},
and
P,(a) = hy1(x)/h,(a).
Let
Ao = L Ai = a,(z),
ai ao 0
(3.32.5) Be SM hers
a, Gr, Ay_2
Then
(3.32.6) Ie = (—)*A,ag*—1,
and
(3.32.7) P, = —apA,_1/A,.
This is equivalent to saying that an iteration
(3.32.8) ¢,=x+P,
of order r + 1 is given by 4, satisfying
or —='Z. «Ge 0
—l ai a PEN?
(3.32.9) 0 Cates oct Ses ce Ob
0 a, Gr-1
As an example, let
f=wa-—N
Then
a = u™ — N, a = ma), a2 = m(m — 1)z"-?/2.
{Ewcayloay
(£)] — wa (é) wc (£) }
A) =
E — wa(€) — wa (é) + wala (]}?
where the sequences of deviations satisfy
wy (€) - Cn ko ewe
wa(~) = air tr ss,
= (1 —a®)(1 —a®)E+---.
But
a = gale), a? = $in(a),
so that, if (3.33.6) is satisfied, the expansion of the denominator begins
with a term in the first power of & Hence the expansion of the fraction
begins with a term of degree 2 at least, and the iteration is therefore of
order 2 at least.
Thus given two iterations of the same order, one can always form an
iteration of higher order. But it was nowhere required that ¢q) and ¢(2)
be distinct, so that from any single iteration one can form an iteration of
higher order. More than this, an iteration ® of order r > 1 always
converges in some neighborhood about a, whereas an iteration ¢ of order 1
need not converge and will not unless |¢’(a)| <1, as we have seen.
Hence from any function ¢, analytic in the neighborhood of a and satisfy-
ing ¢(@) = a, one can form an iteration which converges to a whether
that defined by ¢ converges or not.
In ordinary practical application it is not desirable to form @ explicitly.
Instead, one can proceed as follows: One forms
r1 = $(Zo), t= (21)
in the usual manner. However, for x3 one takes not ¢(x2) but ®(xo) by
23 = (“ote — 2?)/(o — 2x1 + 22).
In terms of the difference operator A, defined by
Ax; = Vii — LH,
This form brings out the fact that in practical computation any sequence
of digits on the left that is identical for x3,, 73,41, and 3,42 can be ignored,
since it drops out in forming the differences and is restored in subtracting
from %3,.
Just as the iteration & was of order higher than ¢, so one can form from
® an iteration of still higher order. Thus having computed
x3 = B(x), Le = ¥(23),
instead of computing ¢(z.), one could now form
X27 = (Xox— — 12)/(to — 2x3 + Ze).
In principle, iterations of arbitrarily high order can be built up by pro-
ceeding in this manner.
3.34. Iterations of Higher Order: Schréder’s Method. The oldest method
of obtaining iterations of arbitrary order seems to be that given by
Schréder. Consider any simple root a of
(3.34.1) f(x) = 0.
In the neighborhood of a we can set
(3.34.2) y =f)
and let
(3.34.3) a = hly),
where
This is the required recursion for expressing the b,(x) in terms of the a;(z),
and hence of the derivatives of f.
3.35. Iterations of Higher Order: Polynomials. Three distinct methods
have been given for forming iterations of higher order. The 6? process
presupposes that some iteration is known and deduces from it an iteration
of higher order. The methods of Schréder and of Richmond start with
the equation f = 0 and form directly an iteration of any prescribed order,
130 PRINCIPLES OF NUMERICAL ANALYSIS
provided only that the root a is a simple root and that f is analytic, or at
least possesses derivatives of sufficiently high order, in the neighborhood ~
of a. Clearly if g(x) ¥ 0 and is analytic in the same neighborhood, one
could apply either method to the equation F = 0, where F = fg or where
F =f/g. Thus there are a great many ways of forming an iteration of
any order, and doubtless many different iterations. In special cases it
might be desirable to impose upon the iterations special conditions other
than the order of convergence. In particular if f is a polynomial, one
might ask that ¢ be a polynomial. This would be desirable in case
one is using a computing machine for which division is inconvenient; or
in operations with matrices, where direct inversion is to be avoided; or,
as Rademacher and Schoenberg have shown, in operations with Laurent’s
series, where direct inversion is impossible.
We ask now whether, when f is a polynomial, one can find a function g
such that, when Schréder’s method is applied to the equation
(3:So4m) F = f/g = 0,
an iteration ¢, will be a polynomial. Taking first ¢1, we wish to choose g
so that
(f/9)/(F/9)' = Pf,
where p is a polynomial. This requires that
pf’ — plg'/g)f = 1.
But if f = 0 has only simple roots, one can always find polynomials p and
q such that
(3.35.2) of’ —qf =1
by the theorem of §3.06. Hence if g satisfies
(3.35.3) g'/g = 4/P,
the requirement is satisfied for 4.
Now suppose that the process yields
@ = «(2 — az).
For the case when a and 2 are matrices, this defines the Hotelling-
Bodewig iteration for inverting a matrix.
On introducing the subject of functional iteration, it was shown
that the iteration would converge to a when x is any point in the com-
plex plane within a circle about a throughout which a certain Lipschitz
condition is satisfied. This condition is sufficient but by no means neces-
sary. In particular, suppose a is real and ¢ is a polynomial. The curve
132 PRINCIPLES OF NUMERICAL ANALYSIS
(3.41.2) d = Zy?
takes on the minimum value ¢ = 0 at all points satisfying (3.41.1).
More generally if a,; are elements of a positive definite matrix, the
function
(3.41.3) pt = LIYpiash;
NONLINEAR EQUATIONS AND SYSTEMS 133
also takes on the minimum value ¢* = 0 at the same points. There are
thus many ways in which the problem of solving a set of equations can be
replaced by a problem in minimization. We therefore consider the prob-
lem of minimizing the function $(&, &, . . . , &n), or briefly ¢(z), where
xz is the vector whose components are the ¢. The partial derivatives
be: = 06/08;
are components of a vector which we shall designate as ¢, and take to
be a column vector. This is known as the gradient of ¢, often denoted by
grad ¢ or V4, and its direction at any point z is normal to that surface
(3.41.4) ¢ = const
(3.41.6) u'dz(to) ¥ 0.
This means that the direction u is not tangent to the surface (3.41.5) at
Zo. It therefore cuts through the surface and so intersects surfaces at
which ¢ has smaller (as well as larger) values than at 1%. Determine
(3.41.7) t1 = Xo — AU
at which ¢(z) has a value smaller than it has at 2. The equation for
locating:x1 is obtained by equating to zero the derivative of ¢1(\):
(3.41.10) u'dz(%o — Au) = 0.
(3.41.12) v= :,(%0)/é,2;(2o)
(3.42.4) d(x) = 0,
where x represents the vector of elements &; If each function ¢; is
analytic in the neighborhood of some point 2», then
136 PRINCIPLES OF NUMERICAL ANALYSIS
(3.42.8) t= Xo — fz*(Xo)f(Xo),
which is the direct generalization of the iteration given by Newton’s
method.
The iteration does converge, and it is not necessary that derivatives
of all orders of the ¢; should exist. However if n is at all large, the
repeated evaluation of the inverse matrix fz! or the repeated solution of
linear systems of the type (8.42.7) is certainly undesirable. Conse-
quently a somewhat more general theorem will be proved.
Suppose all functions ¢; have continuous first partial derivatives
in the region of n space being considered, and suppose moreover that this
region is convex. If ¢ is any one of these functions, and ¢, is the row
vector of its first partial derivatives, then for any x’ and x’ in the region
(3.5.2) f(x + ty) = f(x) + tyf'(z) — yf" (a) /2! — f(a) /BI+ ++: ,
whence
ule, y) = f(t) — yh @) 20 ys (s/s)
v(x, y) = ff’ (a) — yf" @)/31+ °° 4,
so that u and v/y are functions of x and y?.
Any solution z of
must be in the form z = x + iy, where x and y are real and satisfy
yo Ml
il @1 + Gay" + - * * Gen ay = 0:
Multiply the first equation by ai, the second by ao, subtract, and divide
through by y?. The resulting equation is of degree m — 1 in y?. If
Qom41 #0, multiply the first equation by den41, the second by dom, and
subtract. This gives a second equation of degree m — 1 in y?, and these
two can be treated as were the original two. Eventually there results an
equation of degree 0 in y®. If demy1 = 0, continue with the equation
resulting from the first elimination along with y-v = 0.
For applying Newton’s method or one of its generalizations to the
NONLINEAR EQUATIONS AND SYSTEMS 139
original equation, one can also separate ¢ into its real and imaginary
parts, writing
Here are two equations in the two unknowns a and b which must be
satisfied by the coefficients of a quadratic factor d(z) of f(z).
The polynomials ro and 7; in a and b could be determined equally
well in a slightly different manner. Dropping the subscript on the z,
if z is any zero of d(z), we can say that
zg = —ag*-! — bg.
140 PRINCIPLES OF NUMERICAL ANALYSIS
Hence if f(z) is of degree n, this substitution reduces it to a polynomial
of degree n — 1 in the zeros of d(z), and the coefficients of the poly-
nomial are themselves polynomials in a and b. Likewise
gr-l —— azg™-2 = (1a
These are the partial derivatives required for the application of Newton’s
method to the solution of (3.5.7), and they are obtained by two divisions
of f(z) by (2? + az + b).
Hence if a, and b. represent approximations to the coefficients a and b
of an exact division d(z), then improved approximations Ge+1, bay1 Can
be obtained by solving
where Qo, 41, To, 71 are the quantities obtained after division of f(z) twice
by 2? + az + be. When the process is carried out in this way, the
general forms of the polynomials ry and r; in a and b are not obtained.
Instead their numerical values and those of their partial derivatives are
obtained by the divisions which use the current numerical approximations
aq and be.
This method can be generalized to yield a factor of arbitrary degree.
If one writes down formally a factorization of f(z) into factors with
unknown coefficients, then by expressing that f(z) is to equal identically
the product of these factors one obtains a set of equations relating the
unknown coefficients. Let the unknown coefficients be represented as
Qi, G2, . . . , @y taken in any order, and let the conditions be written
W=w=--- =w=9,
where each y is a polynomial in the a’s. If with each y; one can associ-
ate an a; in such a way that y; = 0 is easily solved for a; as a function
of the other a’s, one can use this fact to define formally an iterative
scheme for evaluating the coefficients in the factorization, and many
different such schemes have been proposed. Generally, however, the
question of convergence is left open.
3.6. Bibliographic Notes. The author is indebted to Professor
Schwerdtfeger for numerous references on iterative methods for both
linear and nonlinear equations, as well as for a copy of some lecture notes.
And at this point reference may be made to Blaskett and Schwerdtfeger
(1945) on the Schréder iterations.
Konig (1884) published the theorem that is now classic in i the theory
of functions. Hadamard (1892) elaborated this and related notions, with
reference to the location and characterization of singular points of analytic
functions, and made application to the evaluation of zeros. A more
recent discussion is that of Golomb (1943). Aitken (1926, 1931, 1936-
1937b) discusses extensively the use of Bernoulli’s method and the &
142 PRINCIPLES OF NUMERICAL ANALYSIS
Before showing that all proper values satisfy (4.0.4), we show that, if
h(A) is the highest common divisor of the elements of adj (A — AJ), then _
VAL = (A — ATQQA).
Since (A) divides m(A), we can write
mr) = kA)v),
whence by (4.0.7)
KA)Q(A) = PQ).
Hence every element of P(A) is divisible by the polynomial k(X). Hence
k(A) is a constant, and therefore m(A) and ¥(A) differ at most by a con-
stant multiplier. If we now take the determinants of the two sides of
(4.0.7), we have ss
G(A)|PA)| = ma},
whence every linear factor of (A) must be also a factor of ¥(A).
THE PROPER VALUES AND VECTORS OF A MATRIX 145
If x is any vector, then since ¥(A) = 0, it is certainly true that
W(A)z = 0. ;
For a given vector z ~ 0 let h(\) be a polynomial for which it is also
true that h(A)x = 0. Then if d(d) is the highest common divisor of
h(A) and (A), it is true that d(A)z = 0. In fact, one can find poly-
nomials p(A) and g(A) such that
(4.0.9) Vi(A)g(A)y = y.
Hence y is a linear combination of the columns of ¥;(A). If z is any
146 PRINCIPLES OF NUMERICAL ANALYSIS
linear combination of these columns, then the last nonvanishing vector
in the sequence
20 = 2,
a= (A “7 rL)zZ0,
(4.0.10)
22 >= (A = rD)a:,
«0 Beet Rey, piel Esite “e) Fes) sea
is a proper vector associated with ),, and all vectors of the sequence are
principal vectors.
Among the schemes for finding the proper values of a matrix, some lead
directly to the characteristic function ¢, to the minimal function y, or to
some divisor w(A) of the minimal function. When this function is
equated to zero, the resulting equation is then to be solved by any con-
venient method. The scheme for finding the polynomial ¢, ¥, or w, as
the case may be, may or may not have associated with it a scheme for
finding the proper vectors. If the scheme provides only some w, and
not necessarily y or ¢, it may be necessary to reapply the scheme in order
to obtain the remaining proper values.
Other schemes are iterative in character, depending upon the repeated
multiplication of a matrix by a vector. A scheme of this type ordinarily
leads to a sequence of vectors having a proper vector as its limit and to a
sequence of scalars whose limit is the associated proper value. Before
describing these methods in detail, we shall introduce a few further
preliminaries.
4.01. Bounds for the Proper Values of a Matrix. Since a nonsymmetric
matrix may have complex proper values, and hence complex proper
vectors, it is necessary to give further consideration to complex matrices. -
The natural generalization of a symmetric real matrix is a Hermitian
complex matrix. The matrix A is Hermitian in case it is equal to its own
conjugate transpose, 7.e., to the matrix obtained when every element is
replaced by its complex conjugate, and the resulting matrix is then trans-
posed. Let a bar represent the conjugate (as is customary), and an
asterisk represent the conjugate transpose. Then the matrix A is
Hermitian in case
(4.01.1) A* = AT= A,
If. A is Hermitian, and x is any vector, real or. complex, then x*Az is a
real number. For if we take its complex conjugate, we have £*Az; but
this is a scalar and is equal to its own transpose Z7A'Z*T = *A *z**,
However x** = x, and the theorem is proved. Hence we can define a
positive definite Hermitian matrix as a Hermitian matrix for which
az*Az > 0 whenever x + 0, and a non-negative semidefinite matrix as
one for which z*Az > 0 for every x. Only a singular matrix can be
THE PROPER VALUES AND VECTORS OF A MATRIX 147
' semidefinite without being definite. Clearly a Hermitian matrix all of
whose elements are real is a symmetric matrix. ?
Analogous to a real orthogonal matrix, 7.e., a matrix V such that
V'V =I = VV", is a unitary matrix U, which is one such that
(4.01.2) UFUral
= UUs.
A unitary matrix with real elements is orthogonal.
The proper values of a Hermitian matrix are all real, since if Ax = Xa,
then x*Az = dx*z, and both x*Az and «*z are real numbers. Also, if
complex vectors x and y are said to be orthogonal when z*y = y*xz = 0,
then proper vectors associated with distinct proper values of a Hermitian
matrix are orthogonal. For if
Az = dz, Ay = py,
then
y*Ax = hy*x, etAy = party.
But
y*Az = x*Ay, y*z = x*y,
whence if \ ¥ y, this implies that x*y = 0.
If A is Hermitian, there exists a unitary matrix U such that
(4.01.3) eA= N
where A is a diagonal matrix whose elements are the proper values of A,
and where the columns of U are the proper vectors of A. This corre-
sponds to the case of the real symmetric matrix, and the argument can
be made by paraphrasing that given in the real case.
If A is any matrix, Hermitian or not, any scalar of the form «*Aa/x*x
for z ~ 0 is said to lie in the field of values of A. Any proper value of A
lies in its field of values. For if Ax = da, then x*Ax = dAx*az.
If A is any matrix, then A*A is Hermitian and semidefinite; it is
also positive definite if A is nonsingular, for then Az ~ 0 whenever x ¥ 0,
and hence z*A*Axz = (Ax)*(Az) > 0. If the proper values of A*A are
p? > p? > + * + > p2 > 0, and d is any proper value of A, then
(4.01.4) pi = Wj pi}.
For if Ax = Az, then z*A* = dXx*, and hence
gtA* Ax = \\r*z.
Hence XA is in the field of values of A*A. But if a is any number in
the field of values of A*A, then for some a with a*a = 1 it is true that
a*A*Aa =a. If
U*A*AU = P?,
148 PRINCIPLES OF NUMERICAL ANALYSIS
where P? is the diagonal matrix whose elements are the p?, and if a = Ub,
then
a = b*U*A*AUb = b*P% = ZB6ip?,
all products ,8; are real and positive, and hence a is a weighted mean of
the p?. Hence a cannot exceed the greatest nor be exceeded by the least
of the p?:
pi 2 a ] pi.
Since XA is such an a, the relation (4.01.4) now follows.
If \ is a proper value of A with multiplicity », then \ + pis a proper
value of A + pl with multiplicity ». For this reason, the following
classical theorem can provide information as to the limits of the proper
values of a matrix: If for every 7
laslléd =| Y)ati.
jt
Then if & has the same significance as before, there is at least one & for
which |Ex| < \é|. For
pein
— 2 +2) 6 a gt,
(4.01.7) A= st fp
(PO¢),
where P and R are square matrices. Hence if the matrix is not one which
can be given the form (4.01.7) by any permutation of rows, accompanied
by the same permutation of columns, then the conditions
with a proper inequality for at least one value of 7 are sufficient to ensure
the nonsingularity of the matrix.
Obviously the above argument can be applied to A’.
Now let
(4.11.2) MA SP SA
x = Uy, y = U*x,
then
ete = y*U*Uy = y*y:
If p(r) is any polynomial in ), the matrix p(A) has the same proper
vectors as A itself, and its proper values are p(\;). In particular, the
matrix A? is necessarily non-negative, definite or semidefinite. More-
over, for » sufficiently large, A + ul is positive definite. It is therefore —
no essential restriction to assume, when convenient, that A is positive
definite or is at least semidefinite.
‘In §2.06 the trace tr (A), which is the sum of the diagonal elements,
was shown to be equal to the coefficient of \”-! in the characteristic equa-
an except possibly for the sign, and to be equal to the sum of the proper
values,
tr (A) = 2d;
as v increases all terms but the first within the brackets approach zero,
and in the limit,
(4.111.5) A’r > No,
provided only 7:0. That is to say, as v increases, the vector A’
approaches a vector in the direction of the first proper vector uw. It is
necessary only to normalize to obtain w; itself.
If 71 =0 4m, the same argument shows that A’x approaches a
vector in the direction of u2, provided Az exceeds \3, . . . , An nUMerically.
To square a matrix of order n requires n* multiplications, whereas to
multiply a matrix by a vector requires only n*. If
(4.111.6) oe = A’g = Age,
then for large v it appears from (4.111.5) that 2,41 = diz, Hence,
although if » = 2?, » increases rapidly with p, it may be more advan-
tageous to form the sequence z, directly as in (4.111.6) than to square
the matrix several times and then multiply by z. Moreover, a blunder
made in computing z, will be corrected in the course of subsequent
multiplications by A. The two methods are related essentially as are
Graeffe’s and Bernoulli’s methods for solving ordinary equations. It
might be pointed out further that, if by a rare chance it should happen
that 7: = 0, nevertheless round-off will introduce a component along
uy in the x,, and this component will build up eventually, though perhaps
slowly.
Let
(4,111.7) Gy = try, = YY y->
Then a, is independent of v, and in particular
N; — MG aim By,
for which the greatest of the ratios |j/dj|, 7 > 1, is least. But the
greatest, of these is either |\3/)j| or |A,/4| or both. The optimal p is
therefore that for which these ratios are equal, since a different selection
of » would increase one of these ratios even though it decreases the other.
Hence the optimal yu is
w= —(Az + dx) /2.
To make the strictly optimal choice, one must know 2 and X,, exactly, but
enough information may be at hand to permit a good choice.
The iteration of the linear polynomial A + yl and of the very special
quadratic polynomial A? in place of A is sometimes advantageous. The
question arises then whether A*, or even an A?-+ ul, for some yu is
necessarily the best quadratic polynomial, and more generally what is
the best polynomial of any given degree. It turns out that the best
polynomial is given by the Chebyshev polynomial of the prescribed
degree (cf. §5.12).
If a’ and a” are known such that \1 > ’ > Ae > + +s Sa SN”,
then it is no restriction to suppose that \’ = —\” = 1. Forif this is not
the case, one can replace A by A’ = (X’ — N”’)-2A — (X’ + N’’) I].
Hence assume this to have been done, and assume further that a 4 is
known such that
hi SoS Pak, Ss. eee SE
(4.112.1)
Then let
(4,112.2) T'n(X) = cos [m arc cos jl],
Sm(X) = T'n(d)/Tn(8).
Then S,,(5) = 1, and S,,(A1) => 1. On the other hand, for any 2 satisfy-
ing 1 >> —1 and in particular for \ = yx, 7 > 1, it is true that
|Sm(X)| <1. Indeed, the argument developed in §5.12 can be modified
to show that, of all polynomials q(A) of degree m satisfying 9(6) = 1,
Sm(A) is that polynomial whose maximal absolute value on the interval
from —1 to +1 is least. Hence among all polynomials of degree m
that might be used for the iteration, S,,(A) is the best choice that can
be made on the basis of the information contained in the hypothesis. In
other words, in the sequence
whereas
Ay = A — Niu = 0.
' Hence A; has the proper values 0, associated with the proper vector w1,
and ),, 7 > 1, associated with the proper vector u;. Thus A; satisfies the
requirements. It is useful to note that
Aj = A? — deuwt,
and hence inductively
(4.113.2) AY = A” — Nuzut.
Thus if the powers of A have been formed in the process of arriving at \1
and wu, to form the same powers of A; it is necessary only to subtract a
scalar multiple of the-singular matrix wiu*.
When dz and we are found, one can form
Ao => Ai = Noes
urans(?)=(6' 4)@)-G)-»@)
Then
av.(7) =»)
Hence
(4,113.4) Ww = (2)
where w is a vector of n — 1 elements andwascalar. Then it is possible
to choose u so that
_ ed hp ee o w*
pe Tg ( I- pee) Eh I - oa)
and hence by direct calculation
(4.113.6) w= (1 — o)/(1 — a).
Hence when A, and wu, are found, the transformation (4.113.3), where the
unitary matrix U, is defined by (4.113.5), (4.113.4), and (4.113.6),
replaces A by a Hermitian matrix A, of lower order, whose proper values
are the same as those of A which are not yet known, and whose proper
vectors v; yield those of A by the simple relation
Let
Qy = Bry, By = 262).
Then
-114.1
ay =54 HimAl + Homers,
2
Saray By = @11AJ + Wowedr}.
Then the two matrices
The method can be extended to the case when three or more of the proper
values are nearly equal.
When the powers A” are formed explicitly, there are already at hand
the vectors x, of n distinct sequences since the 7th column of A” is A’e;.
The 7th diagonal element of A” is the a, for the starting vector ¢;.
If the matrices A” of the sequence approach rank 1, then all columns
approach the proper vector associated with the single largest proper value.
If they approach rank 2, then the two largest proper values are equal or
nearly so. Pick out any two diagonal elements and consider their values
in consecutive matrices of the series. Let these be a, and B, in A’, ay41
and 6,41in A’+1, From (4.114.1), the matrix
a By \_frA¥ AB Nee oa
Oyvi1 Brot AG get Hone W2W2
160 PRINCIPLES OF NUMERICAL ANALYSIS
Hence if the determinant of this matrix vanishes for large v, then \1 = Az.
If it does not vanish, the roots are nearly equal, by comparison with the
other proper values, but not exactly equal, and they can be found by
solving Eqs. (4.114.2).
4.115. Rotational reduction to diagonal form. A second-order Hermitian
matrix has the form
he 2 Gs cos ‘\
e2 sin 6
Assume 6 > 0, and let ui and uw. < pi be the two roots of (4.115.2). Then
if v is a proper vector associated with m1,
(B81 — pwi)e cos 6 + Bets) sin 6 = 0.
This can be satisfied by taking
o. =w+ ¢/2, wo. =w— ¢/2,
with w arbitrary, and
(u — B:)(u — Be) = 6.
Because of this relation, either root yw; must exceed both #; and B. or be
exceeded by both. But since the sum of the roots is
Hi + we = Bi + Ba,
it follows that the larger root 1; must exceed both, and the smaller root ps
must be exceeded by both. Hence
tan @ > 0,
THE PROPER VALUES AND VECTORS OF A MATRIX 161
and 6 can be taken to lie in the first quadrant. Hence 6 is uniquely
determined. By applying a standard trigonometric identity, one obtains
(4.115.4) tan 26 = 26/(B: — B:2),
an expression which does not involve p.
The proper vector v; associated with yi can now be written
e#/2 cos 6
(4.115.5) vy = (n am \
—e'#/2 sin 6
(4.115.7) Vo = ( 26? cos )
One verifies directly that the sums of squares of the diagonal elements of
B and M are related by
wi + wf = Bi + BZ + 267.
Now suppose B represents any principal minor of A. For simplicity
let this be the minor taken from the first two rows and columns of A.
The matrix
V0
base & 2
is easily seen to be unitary. If we write A in the form
Bue Bo )
ois B ingBiss,
then ~
= Ut = M V 1).
Ai aoe) UtAU,i ey By )
162 PRINCIPLES OF. NUMERICAL. ANALYSIS
provided only m + 0.
For a non-Hermitian matrix, there are two sets of proper vectors, a set
of row vectors and a set of column vectors. Corresponding to any proper
value ); there is a column vector w; and a row vector w‘ such that
(4.12.5) Aw; = wi, wid = rw,
The w* are in fact the rows of W-1, as the w; are the columns of W, or may
THE PROPER VALUES AND VECTORS OF A MATRIX 163
be taken so when properly normalized. Let u = vW- be any row vector.
Then
UA’ = vA"W- = Trg,
if the ¢; are the elements of v. Hence, again, for sufficiently large v
(4.12.6) u, = uA” = dy},
provided ¢; ~ 0. Itis not necessary that w; and w' be of unit length, but
for these to be a column of W and a row of W-', respectively, it is neces-
sary that
ww, = 1.
oy weNo!
+None
Consider the sequence of scalars
(4.12.8) ays, = U,a:
For large v this becomes
ay = Aihim + AZGan2.
Then in the limit the matrix
Oy Nt nN
yg. AZHE YBtt
Gyp2 AZH® Yet
The a’s and the ’s can indeed be individual components in the 2’s or in
the w’s. The extension to a larger number of proper values of equal
164 PRINCIPLES OF NUMERICAL ANALYSIS
modulusis direct, and applies, moreover, also to the case of moduli that
are nearly equal.
In case \1 = de, the coefficients in the quadratic (4.12.9) all vanish.
This case will be considered later.
Suppose the proper value )1 of greatest modulus and its associated
vectors w; and w! have been found. Suppose dA, exceeds all other proper
values in modulus. Its proper vector wz is orthogonal to w!, and w? is
orthogonal to w:. Hence one can proceed as above but with starting
vectors x, orthogonal to w!, and wu, orthogonal to 1:
wig = uw; = 0.
are of progressively lower grade, and lie also in the subspace, and in par-
ticular u{? is the unique proper vector in the subspace.
Consider the effect of the iteration upon these principal vectors. If
we write
Us = (uPu® - - - uf),
then one verifies that
Oe Uae
where
Xe alt FO, ae.
AREA; Lo. p= Ad hy,
fo) 0; (ef fe fe) 16) 0 ve! Ve
The auxiliary unit matrix J, vanishes in the n,th and higher powers.
Associated with each distinct \; will be a particular matrix U; of
principal vectors, and x can be expressed as a sum
i=l ge,
+ PT = [Ql + 1) — Ml = Ip = 0,
it follows that in the limit any 1 + 71 consecutive vectors in the sequence
Ax will be linearly dependent, and in fact the coefficients expressing the
166 PRINCIPLES OF NUMERICAL ANALYSIS
dependence relation are the coefficients of the powers of ) in the expan-
sion of (A — A1)". ‘This fact provides a means for computing Ai.
Given Ax, it is possible to form the combinations
A),2y = By41 — Ait),
AZ ty = Lrz2 — WZrXy41 + Af,
We find
A), 2 = Uf + Iy- AW Ow + I) "x
= UWA oh I,) "x,
But since [7*= 0, therefore A*tz, = 0. Moreover, J7'—! differs from the
null matrix only in the last element of the first row. Hence in the product
UiJ*— only one column is non-null, and this is the proper vector u,
Hence A*!—!z, is equal to uw“ except for a nonessential scalar multiplier.
Thus even in this case, it is possible to obtain from the iteration both the
largest proper value and an associated proper vector.
4.2. Direct Methods. By a direct method will be meant a method for
obtaining explicitly the characteristic function, the minimal function, or
some divisor, possibly coupled with a method for obtaining any proper
vector in a finite number of steps, given the associated proper value. The
method to be used in evaluating the zeros of the function is left open.
Naturally one such method would be direct expansion of the determi-
nant |A — Al| to obtain the characteristic function. This done, and the
equation solved, one could proceed to solve the several sets of homogene-
ous equations (A — AI)x = 0, where d takes on each of the proper values.
Such a naive method might be satisfactory for simple matrices of order 2
or 3, but for larger matrices the labor would quickly become astronomical.
In discussing iterative methods, it was convenient to consider sepa-
rately Hermitian and non-Hermitian matrices. This was primarily
because of the fact that for Hermitian matrices the proper values are
known to be real, though a further point in favor of the Hermitian matrix
is the fact that it can always be diagonalized. For the application of
direct methods, however, the occurrence of complex proper values intro-
duces no difficulty in principle, though naturally they complicate the
task of solving the equation once it is obtained. Intrinsic difficulties
arise in the use of direct methods only with the occurrence of multiple
proper values. Fortunately, however, one begins in the same way
regardless of whether multiplicities are present or not. If present, the
fact reveals itself as one proceeds. If a given direct method applies at
all to non-Hermitian matrices, then its application to any diagonalizable
matrix can be discussed about as easily as can the application to the
THE PROPER VALUES AND VECTORS OF A MATRIX 167
(4.21.2) ea = tes
Newton’s identities (3.02.5) express the sums of the powers s; as poly-
nomials in these elementary polynomials, where the y; here take the
places of the o; in (3.02.5). But
Ss = 7 = tr (A),
a3 = 2h, = tr (A®),
and in general
(4.21.3) 8h =o ir GA"):
Hence by taking powers of A up to and including the nth, one can com-
pute the sums of powers of the \;, and thence by applying Newton’s
identities find the y,. Hence to find the s, by this method requires
(n — 1) matrix products; each matrix product requires n’ multiplications;
hence altogether to find the coefficients in (4.21.1) requires approxi-
mately n‘ multiplications. .
To improve the algorithm and obtain further information, consider
(4.21.4) CX) = ad) QP A) = Con — Car CAF ee
ete Cena
Then
(4.21.5) C(A)QI — A) = (—1)"eQ)I.
(4.21.6) Crea
©) 8G SOO 1 Oldie Ke 3e0 ce
CriA = Yall.
Now 71 is given by (4.21.2). Hence C; can be found from the second of
(4.21.6). On multiplying this by A and taking the trace of both sides,
one finds in view of (4.21.3)
tr (CiA) + 82. = 7151.
Hence the coefficients yy, and the matrices C can be obtained in the
sequence Cy = I, y1, C1, v2, Co, . . . » Yn. The final equation in (4.21.6)
serves as a check. Note that, since each C; is a polynomial in A, it is
commutative with A:
ACy-= CyA.
As a byproduct of this computation, one obtains the determinant
|A| = Yn;
the adjoint
adj A = C23;
QT — AC’) = (—1)"¢'(A)I,
- THE PROPER VALUES AND VECTORS OF A MATRIX 169
and on taking determinants
of both sides,
(—1)"9Qi)|C’As)| = [((—1)"9’Q)]*.
But ¢(\;) = 0, whereas ¢’(\;) #0, and we have therefore reached a
contradiction. Hence the method yields at least those proper row and
column vectors that are associated with simple proper values.
In forming the matrices C2, C3, . . . , Cn1, each requires n* multipli-
cations, making n°(n — 2) in all for forming the characteristic function.
Given a \;, to form C(A,) one can form
Ad — A)C’A)B = (—1)"6"(u)B.
The rank of the right member of this equation is the same as the rank of
B, since ¢’’(\;) # 0, and this cannot exceed the rank of any matrix factor
on the left. But (A.J — A) has rank n — 2, whence the rank of B cannot
exceed n — 2. Hence B has rank n — 2 exactly, and therefore C’(A,;) has
rank 2 exactly.
Thus when \; is a double root (and not a triple root), either C(A;:) # 0,
in which case there exists a non-null column of C(A;) and a non-null
column of C’(\,), the first being a proper, and the second a principal,
column vector associated with ),; or else C(A;) = 0, in which case C’(X;)
has rank 2 and any two linearly independent columns are proper vectors.
Corresponding statements can be made for the rows.
The argument can be extended to the case of a root of arbitrary
multiplicity.
4,22. Methods of Enlargement. Suppose A,_1 is a principal minor of
A, say that taken from the first n — 1 rows and columns of A, and let
bn—1 (Aa)
Undoubtedly this covers the majority of cases that arise in actual practice.
But this column alone is insufficient for obtaining all the proper and
principal vectors when ),; is a multiple root, and moreover no general
method has been provided for the theoretically possible case of a simple
root A; for which this column (but not the entire adjoint) vanishes.
172 PRINCIPLES OF NUMERICAL ANALYSIS
The proper row vectors can be obtained in the “usual” case by using
equations corresponding to (4.22.5) to compute f;_1(A), and hence
1 (4). ;
The “escalator method” also proceeds to matrices of progressively
higher order, but it requires the actual solution of the characteristic equa-
tion at each stage. Consider only the case of a symmetric matrix, for
which moreover all proper values are distinct. Thus suppose for the
symmetric matrix A all proper values and all proper vectors are known:
(4.22.6) AU = UA.
If A is bordered by a column vector and its transpose to form a symmetric
matrix of next higher order, we wish to solve the system
(4.22.7) le *)& Z
Hence
Ay ae an = ry,
(4.22.8) Biel gw
Let
(4.22.9) y = Uw.
Then
AUw + an = Uw,
UAw + an = Uv,
(4.22.10) Aw + UTan = du,
w = (Al — A)~U"an.
Since 7 can be taken as an arbitrary scale factor, (4.22.9) and (4.22.10)
determine the proper vector once the proper value \ is known. How-
ever, from the second equation (4.22.8) it follows that
a'Uw = (A — a)n,
and, on substituting (4.22.10),
(4.22.11) aU(Al — A)“UTa = X —
This equation in scalar form can be written ‘
(4.22.12) Z(a™u;)?/(A — i) =A — a,
and its roots are the proper values of the bordered matrix. If the d; are
arranged on the \ axis in the order \1 > \2 > - +: > Aq, then as X
varies from —© to dn, the left member decreases from 0 to — ©; as
\ varies from An to Ani, the left member decreases from + to —o;
. ; a8) varies from ); to + ©, the left member decreases from + © to
0. The right member increases linearly throughout. Hence (4.22.12)
THE PROPER VALUES AND VECTORS OF A MATRIX 173
has exactly one root \ < \,; exactly one root \ between each pair of
consecutive ;; and exactly one root \ > x. With the roots thus iso-
lated, Newton’s method is readily applied for evaluating them.
4.23. Finite Iterations. If bo is an arbitrary non-null vector, then in
the sequence ;
Do, bi = Abo, be = Abi, ai Ae Ct
is the highest common divisor of p(A) and (A), then d(A)bo = 0. That
is to say
(Avs Ar 2 te ot 8 by =O)
or ‘
b, ae 5yb,-1 + eae ae 5,bo = 0.
But the vectors bo, bi, . . . , bm—1 are linearly independent, whence v = m
and d = p. Hence p(d) divides the minimal function ¥(A) and hence
also the characteristic function ¢(A). In particular if m =n, then
p(dA) = (—1)"¢(A). When this is true, therefore, one can form the
characteristic equation by first performing the n iterations Ab; and then
by solving a system of n linear equations. When this is not so, one can
at least obtain a divisor of the characteristic function by performing a
smaller number of iterations and by solving a system of lower order.
However, in addition one must test at each step the independence of the
vectors already found, as long as the number is below n + 1.
A great improvement is provided by Lanczos’s “‘method of minimized
iterations.” Let bo and co be arbitrary nonorthogonal non-null vectors.
In case A is symmetric, take by = ¢o; if A is Hermitian, take by = co and
in the following discussion replace the transpose by the conjugate trans-
174 PRINCIPLES OF NUMERICAL ANALYSIS
pose. In other cases by and co may or may not be the same. Form b,
as a linear combination of by and Abo, orthogon#] to co; and c; as a linear
combination of co and Ae, orthogonal to bo. Thus
b; = Abo ae abo,
where
0 = chbi = chAbo rae aochbo;
and
cl = clA — doc},
where
0 = clby = chAbo — Sochbo.
But then
ag = 50 = chAbo/cibo.
where
0 chbe chAby aa Bochbo,
0 I : clbe I clAby = otyc1by.
Hence
a, = clAbi/clbi, Bo = chAb,/chbo.
But from the relations already derived,
clAb, = clbi = clAbo.
Hence
Bo = clbi/chbo,
and if
ch = clA — acl — Boch,
then it follows that
0= chbo = chbi.
The step breaks down in case cib; = 0, since this is the denominator in
a. But this means that
OS clAbi = clA(A = aol )bo,
and when a is replaced by its value, this is, apart from the factor (clbo)—},
equal to the determinant of the matrix product
(co
chA
(by. Ab 0)-
Hence this can vanish only if the pair bo, Abo or else the pair co, A'cp is
linearly dependent.* Suppose for the present that this is not the case.
*This conclusion, and the more general ones, do not follow. It can be shown, however,
that if neither pair (more generally neither set) is linearly dependent, then after a slight
perturbation of either starting vector, bo or co, the determinant will not vanish. One could
say, in pseudotechnical language, that failure has zero probability.
THE PROPER VALUES AND VECTORS OF A MATRIX 175
We now show that a, and B; can be chosen so that
is orthogonal to bo, bi, and be, provided only that the vectors bo, Abs,
A’bo, and also the vectors ¢o, A™co, A™co, are linearly independent triples.
First we note that
chAbe (el = aro} Da = 0,
chAbo ch(by + ado) => 0,
This is always possible if c, and be are not orthogonal. But the matrix
product
Cy chbo chA bo chA 2bo
¢ (bo Abo Abo) => clAby chA 2bo chA 3bo
cA clA 2bo clA 3b chA 4D
until possibly at some stage c; and b; are orthogonal. When ¢; and }b; are »
not orthogonal, then b;41 is orthogonal to every vector Co, C1, ... , Gi,
and c41 is orthogonal to every vector bo, bi, . . . , by.
Necessarily, there is a smallest m <n for which clb,, = 0, since if
the vectors Co, C1, ... , Cn—1 are all linearly independent, then only -
bn = 0 is orthogonal to them all, and this vector is orthogonal to ¢n,
whatever c, may be. Suppose the relation holds forsomem <n. Then
either the set bo, Abo, . . . , Abo or else the set co, ATC, . . . , (AT)co
is linearly dependent. For definiteness suppose it to be theformer. The
set from which Abo is omitted is linearly independent, for if it were not,
the m selected would not be the smallest possible. Hence Abo is
expressible as a linear combination of the m vectors bo, . . . , A™~1bo.
Hence Ab,,_1 is some linear combination of the vectors bo, bi, . . . , Om—1!
FM ps = Bm—10m—1 ar Hm—20m—2 <r Eh aospodo.
Hence
But then
0 = ChOm = Mochbo,
= ClO => prclby,
By hypothesis all the vector products on the right are non-null, and
therefore
O = wo = wi = °° * = m1
— Om,
whence b,, = 0 and
Thus if the vectors bo, Abo, . . . , A%bo are linearly dependent, then
(4.23.7) holds; correspondingly if the iterates of co by A™ are linearly
dependent, then it follows that
(4.23.8) 0= (At = Om—11)Cm—1 — Bram.
Hence the recursion (4.23.5) and (4.23.6) can be continued until for some
« = m either (4.23.7) or else (4.23.8) holds.
THE PROPER VALUES AND VECTORS OF A MATRIX 177
Consider now the sequence of polynomials
pod) = 1,
pila) = (A — ao)pod),
(4.23.9) p2(A) = (A — ai)pi(d) — Bopo(d),
ERO, (8, Cae) (05 elec ey (6: .6 ver ele! ie! 6. ese
© AO Ob HAL IOME Oy sey Oe) 0 Ore, CO wine, 10) dO) WO) Ooms ope
where the a’s and #’s are defined by (4.23.6). One verifies inductively
that
If only one of 6b, and cm vanishes while the other does not, one can by
a different choice of bo (if b» = 0) or of co (if cn = 0) obtain a longer
sequence. Hence suppose that bn = Cm = 0. Moreover pm(d) has only
simple zeros:
(4.23.18) p(d) = pu(A) = (A — x)(A — Ae) + + + (A = An)
Then
Zfi(A)gi(r) = 1.
Hence
Also, since polynomials in A (or those in A‘) are commutative with one
another, .
Zfi(A)gi(A)d; = by, Zf(A )ai(A eg = cy.
But -
(A — Ail) fi(A)ai(A)db; = fi(A)p(A)d; = 0.
Hence f;(A)q:(A)b; is a proper vector of A associated with the proper value
:, 80 that each 6; is expressed as a linear combination of proper vectors.
Again, if the first of (4.23.16) is solved for any proper vector
f(A) agi(A)bo,
it is expressed as a linear combination of vectors bo, Abo, Abo, . . . , and
these in turn are expressible as linear combinations of the b;. Hence
each proper vector appearing in the first of (4.23.16) is expressible as a
linear combination of the b;. Likewise each proper vector appearing
in’the second of (4.23.16) is expressible as a linear combination of the
c;. Let u; represent the proper vectors of A, »; the proper vectors of A’,
which appear in (4.23.16). Then
(4.23.17) bo = ZU, Co = 20;.
Hence if we let
po(A1) donc Dm—1(A1)
(4.23.19) Pic ee er tae aeons cee ;
Do(Am) » «+ Pm—1(Am)
BOS (by ae roan) Com (Cpe
cues)
4.23.20 } m2
( ) OO, Pec Yom) V = (y EP
But
VU tee -D,
U = BH, Vi = ICT.
Moreover,
CB =A
is also a diagonal matrix. Hence
CU 2 AH" V'B = KIA,
H=A°C'U =A"P'D, K™ = V'BA“! = DPA“,
(4.23.22) U = BA"P'D, V" = DPA-Ct.
The diagonal matrix D is determined only by scale factors in the vectors
u; and v; and these can be chosen as convenient. Thus the m proper
vectors of A which appear in (4.23.17) can be expressed as linear combina-
tions of the b;, and the m proper vectors of A™ (or proper row vectors of A)
can be expressed as linear combinations of the c; (or of the c}).
When m < n, in general this will be because ¢(A) has multiple zeros.
Suppose still that p,,(A) has only simple zeros and take a bj, orthogonal to
all the c;. Then Abj is also orthogonal to all the c;, since
JAB, = (ATe;)"B4,
and A'c; is itself a linear combination of the c’s. Likewise if ci is orthog-
onal to all the b’s of the original sequence, so also will A'c) be orthogonal
to all the b’s of this sequence. Hence one can develop sequences bj, and Cc,
and these vectors will be independent of those hitherto found. They
will yield new proper vectors associated with the multiple roots. More-
over, the new sequences will in general have fewer than m members each.
If they have fewer than n — m members, a third pair of sequences must
be started with bj orthogonal to all previous c’s, and cj orthogonal to all
previous b’s. Since each such sequence will contain at least one member,
the initial vector of the sequence, the process will eventually terminate
and yield all proper vectors.
If, instead of imposing the orthogonality requirement upon bj and c},
one required only that b> ¥ bo and cy ¥ ¢o, then in general new sequences
of m terms each will result, proper vectors associated with simple roots
will be found over again, but a new proper vector for A and one for A' will
be found associated with each multiple root. This course might be
preferred in order to avoid the computation of the orthogonal starting
vectors.
When the zeros of pn(A) are not all distinct, it is still possible to obtain
the proper vectors in much the same way. A resolution such as (4.23.16)
can be employed to show that bp and ¢» are expressible as sums of principal
vectors. Here q:(A) is the quotient of p(A) by the highest power of
(A — 4) it contains as a factor. Each principal vector which appears in
180 PRINCIPLES OF NUMERICAL ANALYSIS
U= , eds
j
Hence
Chu; = cig}by.
But
c; = pj(A")eo,
whence
Hence
CTB = D,
Boh = DCs C-1 = D-1BT,
THE PROPER VALUES AND VECTORS OF A MATRIX 181
Also we found that
cl Ab; = auch; = a;6;,
clAby_1 = cl_,Ab; =. clb; = 6;,
clAb; = 0, ji —gl > 1.
Hence the product CTAB is a matrix for which the only non-null elements
lie along, just above or just below, the main diagonal:
ado ba 0 O
61 a6, be 0
(4.24.2) CUAB=| 9. %: 252 63
Oy Of oa ara) Ve: pelriel (eel ev ey emia, a,
Hence
(4.243) BAB=D-CB=(4 |
since by (4.23.6) we know that B;1 = 4,/5;1. One should expect that
after reducing the matrix to this form a considerable step has been taken
toward complete diagonalization.
In case A is symmetric, if co = bo, then C = B, and every 6; > 0.
Hence D* is a real diagonal matrix. If we set
U = BD-»,
then U is an orthogonal matrix, and
ao Bo”% 0 eee
Nag ba 86 0
= fo’? | 5h — oh. > By) = (A — a2) p2(d) — Brpila).
0 —B,% r — 2
Sule obese a Or ele Les all ale See gue el Womew ae) 0 Sane 10. 46 210, 0) 6) 6 16-4) se) © +e 0 4B Oe)
Thus the polynomials p,(\) are the expansions of the determinants of the
first principal minors of the matrix AJ — S. Note that pji+1(d) and pi()
182 PRINCIPLES OF NUMERICAL ANALYSIS
cannot have a common factor, for if they did, this factor would be con-
tained also in p;_1(A), hence also in p;_2(A), . . . , hence also in po(A),
which is absurd. Also at any p for which ae = 0, pi+1(p) and Pi-r(0)
have opposite signs, since each 6; > 0.
We can show that between any consecutive zeros of p;(A) there is a
zero of pi+i(A), and further that pii1(A) has a zero to the left, and one
to the right, of all those of p,(A). Hence between any two consecutive
zeros of pii1(d) there is exactly one zero of p,(A). Since Pr(ao) = 0, and
since p2(ao) < 0, while po(+ ©) = +, the statement is certainly true
for? = 1. subeoce it demonstrated oe 3, D4, - . + , Pi+1 and consider
pire. Let pi and pz be consecutive zeros of pi4i(A). Then
The hypothesis implies that po, ps, . . . , Pit1 can have only simple zeros
and that p; has one and only one zero between the consecutive zeros p; and
po Of pii1; hence p;(p1) and p.(p2) have opposite signs, and hence p;+2(p1)
and p:+2(p2) have opposite signs. Therefore p:+2 has an odd number of
zeros between pi and pe.
Next suppose p is the greatest zero of piz1. Then pii2(p) = —PBepi(p).
But p; has no zero exceeding p, and pi(~) = +0. Hence p,(p) > 0,
and therefore pis+2(p) <0. Hence p;;2(p) has an odd number of zeros
exceeding p. But pi;2 is of degreei + 2. The hypothesis of the induc-
tion implies that p,i1 has 7 + 1 real and distinct zeros; these divide the
real \ axis into 7 segments and two rays extending to +o and to —,
respectively. We have shown that each segment and one of the rays
each has on it an odd number of zeros of pii2, Hence each can contain
only one, and the remaining zero lies on the other ray.
Thus the polynomials p;(\) have all the properties required of a Sturm
sequence, as in §3.05, though they are not formed in the same way.
Hence by counting the number of variations in sign exhibited by the
sequence p,() at each of two values of \ and by taking the difference, one
has the exact number of proper values of the matrix A contained on the
interval between these values.
Now return to the symmetric matrix A and consider, ab initio, the
problem of reducing it to a triple-diagonal form. Suppose A is 3 X 3.
Then if a12 # 0, one can find an orthogonal matrix of the form
130 0
(4.24.6) U=|0 ¢ —s},
0s Cc
where c and s are the sine and cosine of some angle, such that in the trans-
formed matrix
Al = UTAU.
THE PROPER VALUES AND VECTORS OF A MATRIX 183
the elements a, = a, = 0. In fact, one finds
(4.24.7) wi as = a, = Caz — Say,
An A
4.24.9 Salsrarong i
( ) a G2 ra)
INTERPOLATION
5. Interpolation
This book falls naturally into two parts: one part dealing with the
solution of equations and systems of equations, the other part dealing
with the approximate representation of functions. We come now to the
second part.
It may be that a function or its integral or its derivative is not easily
evaluated; or that one knows nothing but a limited number of its func-
tional values, and perhaps these only approximately. In either case one
may require an approximate representation in some form that is readily
evaluated or integrated or otherwise manipulated. If one knows only
certain functional values, which, however, are presumed exact, the
approximate representation may be required to assume the same func-
tional values corresponding to the given values of the argument. The
problem is then one of interpolation. If the given functional values
cannot be taken as exact, then a somewhat simpler representation will be
accepted, and one which is not required to take the same, but only
approximately the same, functional values. This is smoothing or curve
fitting. Even a function that is easy to evaluate may not be easy to
integrate. For approximate quadrature, therefore, the usual method
is to obtain an approximate representation in terms of functions that
are easily integrated.
In general, the method is to select from some class of simple functions
¢(x) a limited number, ¢o0(z), ¢:(z), . . . , n(x), and attempt to approxi-
mate the required function f(x) by a linear combination of these functions.
Thus we wish to find constants y; such that the function
(5.0.4) yi = f(a)
for brevity, then Eqs. (5.0.2) can be written
(5.0.5) Y3 = Zyidi(z).
If the determinant
(5.0.6) A = |¢:(x;)| 4 0,
then these equations have a unique solution which can be written down
on applying Cramer’s rule. It is clear that (5.0.6) will not be satisfied
if any two of the z; are the same, since then two rows of the determinant
would be identical.
Equations (5.0.1) and (5.0.5) can be regarded as n + 2 homogeneous
equations in the n + 2 quantities —1, yo, v1, ..., Yn. Hence their
determinant must vanish:
®(g; x) = Zg(2xi)A(2).
But then if \ and uw are any constants, and A(x) = Af(x) + ug(a), it
follows that
(5.0.11) - &(h3 2) = Z[f(as) + wg(a)]Ai(z)
AMl AB(f; 2) + wb(G; 2).
It is understood that the basic functions ¢; and the fundamental points 2;
are fixed throughout.
We may note further that, since by (5.0.7) &(¢;; x) = 9;, therefore
$o(x) $n(2)
(5.01.1) Wa) =| eect eee
P(%) 6. OP)
is known as the Wronskian. If this remains different from zero every-
where on the interval of interpolation (a, b), one can define the linear
operator Z,41 by the relation
ole)... ala) 42)
(6.01.2) Eel) WD) |soon Pl ce eG) ynle
Gety (x) ae gaty (x) path (x)
= petD + aio™ + oo % + An+19,
(5.01.3) Lntile] = 0
is satisfied by each ¢,(z). Moreover every solution of (5.01.3) is expres-
sible as a linear combination of the ¢;.
INTERPOLATION : 189
In general, for y < 7 one can similarly define the linear operator L,41,
and the. differential equation of order »y + 1
(5.01.4) L,+:1¢] = 0
is satisfied by $0, $1, . . . , ¢», and every solution of (5.01.4) is expressible
as a linear combination with constant coefficients of these ¢’s.
An equivalent definition of the operators L,,, can be obtained as
follows: Define the differential operator
(5.01.5) D = d/dz,
and select bo so that
(5.01.6) (D — bo)¢o(z) =0, bola) = 44(x)/do(z).
Then .
(5.01.7) L,[¢] = (D — bo)4,
since the two differential equations L[¢] = 0 and (D — bo)? = O are
both satisfied by ¢o. Again let b; satisfy
(5.01.14) Ly] = ¥
for any constants a;.
Any solutions y;(x) of (5.01.3) with the nonvanishing Wronskian could
replace the ¢; in (5.01.11), and the same g(z,s) would result. This can
be verified directly by writing
di = Lars;
and substituting into (5.01.11), in which case the determinant |a;,;j
appears as a factor which cancels out. Otherwise one can observe that
the initial conditions (5.01.12) define the solution g(z,s) of (5.01.3)
uniquely. In particular the A,(x) are linear combinations of the ¢; and
hence satisfy (5.01.3), together with the conditions
(5.01.15) Ax(xj) = 4.
If the A; replace the ¢; in (5.01.11), one can write
Note that
(5.01.28)
R(x) = [?K@,s)Lnnslf(e)ds,
2K(zx,s) = g(x,s) sgn ( — s) — ZA,(x)T,(s) sgn (a; — 8),
where sgn u is the signum function whose value is +1 when the argument
is positive and —1 when the argument is negative. Although this func-
tion is discontinuous where the argument vanishes, nevertheless the
kernel K(zx,s) remains continuous, since g(z,s) vanishes at s = x, and
T.(s) = g(a, 8) vanishes at s = 2;.
It is possible to generalize this development to cases where certain con-
ditions y; = &(z;) are replaced by conditions of the form f(#;) = 6™(a,),
which require the equality of derivatives of f and the approximating
function $, rather than equality of their functional values. As one
special case, consider the requirements that at some point a
f(a) = &™(a), ave. Os 1 eee eas
This gives the Taylor expansion when the functions are polynomials.
Let the functions y,(x) be chosen so that
¥(s) = Lrzilf(s)].
Hence f(x) can be represented in the form
Bo = f(a).
Next, apply the operator L; and again set x = a to obtain
| Bi = Li{f(a)].
Proceeding thus we finally arrive at Petersson’s generalized Taylor’s
expansion
INTERPOLATION | 193
(5.01.29) F(@) = f(a) Po(x) + Lilf(a)Wi(z) + - - - + LL f(a)Wa(x)
+ [° 9(x,s)Lnvalf(s)lds.
5.1. Polynomial Interpolation. Consider now the case of polynomial
interpolation. Equation (5.0.1) becomes
(5.1.1) P(t) =cot ewe t+ ecu? +--+ - + ene.
The determinant A is the Vandermonde determinant,
which vanishes if and only if any two of the x; coincide. Equation (5.0.8)
takes the form ae
i Xo ee Xo Yo
ee
1 « 2 pe AY
which coincides with (5.1.1) if we expand along the last row, but has the
form
(5.1.4) P(x) = DyL(zx)
when we expand along the last column. The L; are themselves poly-
nomials with coefficients which depend only upon the z;. These poly-
nomials are
(5.1.5) L(x) = | [(@ — 2) /@: — x)).
IAI
(5.1.6) L(xj) = 45
with 6,; the Kronecker 6. From this it follows that with L,(x) defined by
(5.1.5) and P(x) by (5.1.4) we have P(z;) = y.
We can write L,(w) in another form if we define
(5.1.7) w(x)
= Ie — x),
for then
(5.1.9) w'(x;)
= [] (@% — %).
Ag
194 PRINCIPLES OF NUMERICAL ANALYSIS
Hence .
or j
ager: digesthaiG2)
1 2o aytt (xo)
1 Ue ets F(2n41) 0.
00 ... @+tD! fore
This is exact, though we know nothing about £ except the fact that it lies
somewhere on the interval named. If we expand along the last row, we
get
ORLON Orel mes cor 0) 74) ue) 64/0 ane, NAP OPN A so Cale cms. p ;
Lee ieee ae OR tt 1 ntl Fa ts sdSoe)
and if we solve this equation for f(x,41) and drop the subscript, we get
¢(u) = J] (u — w),
0
If the Chebyshev points are used, ¢(u) = Tn41(u), then N = 2-", and
therefore
(5.12.6) b=atA4(n + 1) !2-| for (a)[vorn,
and that in general for any P,;... permuting the subscripts leaves the
polynomial unchanged. Note also that
(5.18.1) P3 See: (z:) = Yi.
INTERPOLATION 201
The generalization of (5.1.3) is that for any m, if M stands for the set of
subscripts m + 1,m+2,...., 7, then
1 Xo xy Pou
eats 2). i ge
taupe cosa ca = 0.
ik > % Ge IP
If in place of the P;x we were to write P;, this equation would define the
interpolation polynomial of degree m determined by (2, yo), . . oie t
(Lm) Ym).
In proof we observe first that the polynomial P defined by (5.13.2)
is of degree n at most, since in the expansion of the determinant x” will
multiply each of the interpolation polynomials P;y, which are of degree
nm — m, and there are no terms of higher degree. Next we observe that,
if in the determinant we set x = x; for 71< m, then P must take the
value assumed by Pix, and this by (5.18.1) is y;. This is true because
all other elements of the last row are then identical with corresponding
elements of the rowz+ 1. Finally, if in the determinant we set x = 2;
for 7 > m, then every Piy becomes equal to y;, making all elements but
the last in the last column equal to y; times the corresponding elements
in the first column. Hence the determinant can vanish only if also P
has the value y;. Hence the P defined by (5.13.2) is in fact the polynomial
P of (5.1.3), and the theorem is proved.
Aitken applies this principle in the following way: In application of
(5.1.3) with n = 1, we have
1 Xo Yo
1
i mM Pu=—jl am yi,
i i270
and hence
— P
Likewise
x—2, P
(5.13.4), Pu=|i a P, iy(22 — 21)
From the theorem then we can say that
(5.13.5) Pic =
%—2
t—2%,
Py Po %1 —
P» Pos Poe te — &
However, the best approximation can be expected at any stage when the
abscissa x lies roughly in the middle of the interval containing the par-
ticular fundamental abscissas being utilized. Consequently it is advan-
tageous in using this scheme to order the abscissas so that either .
<4 < 42 < 4%) <4 < 41 < 43 < + - , or else the reverse order holds.
5.14. Divided Differences. Aitken’s method is disadvantageous when
a number of interpolations must be carried out over the same range.
An alternative to the computation of the Lagrange polynomials L,(zx) is
the use of divided differences in the construction of Newton’s interpolation
formula.
The polynomial ’
(5.14.1) P(x) = ao + (@ = Xo)ar + (@ — mo)(%@ — m1)ag + >?
+ (4 — ao)(@ —"2)) = * @— a, 4)an
is of degree n and assumes the values f(x;) at z:, provided
a(x0) = do,
(5.14.2) — f(x1) = ao + (a1 — 20) an,
f(x2) = A+ (Xe — Lo) a1 Sf (x2 = Lo) (Xe = 21)Q2,
SE ORNS ON RF aieee 08 Ue eK Is. a de vouren Tece tel, et ole) opie afemwael cer tate Mine (6)
=~ 2, Po (x)
ic — 2).
Por isi vo — F9 Po2(x)
where the omitted variables are the same in all three places. This being
the case, one can form divided differences of progressively higher order
according to the scheme:
xo (xo)
f(xo, £1)
v1 (x1) f(x0, %1, Z2)
f(x1, £2) f(xo, 41, £2, Xs)
(5.14.5) we f(xe) f(&1, 2, Xs) s
f(x2, 3) ;
tz f (xs)
where each f is equal to the difference of the two on its left, divided by
the difference of the x’s on the diagonals with it.
Another expression for the divided difference which can be obtained
directly from (5.14.3) is of some theoretical interest. The coefficient of
f(z;) on the right of this identity is the quotient of two Vandermonde
determinants. When the common factors are canceled out, we are left
with
» [fe) /T] @ - ) |.
™m
This brings out again the fact that the divided difference of any order is
symmetric in all the arguments x; which appear in it.
5.141. Integral form of the remainder. Consider the function
whereas the right member of this is by (5.14.4) equal to f(xo, x1, 22).
Hence
(5.141.2) f (Xo, V1, X2) = iA Jie(a ron t1) Xo + (t, = te) X1 + toteldts di,
flo, 0) = f(x).
Again when zp = 21 = 22, it follows from (5.141.2) that
Hence if the function and its derivatives up to and including the mth are
known at some point xo, and the derivatives of the interpolation poly-
nomial are required to equal those of the function, the table of divided
differences is formed by writing f(xo) m + 1 times, f’(xo) m times, f’(x0) /2!
m — 1 times, . . . , and constructing the rest of the table as before.
In general, whether or not there are repeated arguments, the identity
(5.141.6) f(a) = f(wo) + (& — 2o)f(xo, 41) + (% — 20) (% — 41)f(Xo, %1, 2).
In general,
1.3)
aoe
Af(x) Oe
= f(x + h) — f(x),
Vf(z) = fle) — fle — hi,
a(n) = fle + h/2) — fle — h/2).
Since it is natural to require that
for any integer u, we may indeed go a step further and accept (5.15.4),
for all real u, integral or not. With this understanding we can write the’
following formal relations between pairs of operators:
A=E—1
= EV.
(5.15.5) SN a VEY = HY — E-*,
and the series terminates after u-+ 1 terms. If these equivalent oper-
ators are applied to fo.= f(xo), we have
(5.15.10) f(x) = fo + uamAfo + UaA*fo + usyA*fo +t ---,
Note that the fundamental points which determine this polynomial are
the points whose abscissas are Xo, 41, 2, . . . , Zn.
Newton’s formula for backward interpolation is obtained by expand-
ing E” in powers of V:
(5.15.18) EY = (1+ V9)" =1L+tumv t+ (ut Dov? + (u + 2) V3
+ “ipcre eae
For this the fundamental points are the points whose abscissas are x_n,
Tnti,.-+.+, U1, Xo. If these were the same as the points entering in
(5.15.12), but with different designations, then the two polynomial
s
(5.15.12) and (5.15.14) would be identical except in form. They would
also be identical if f were itself a polynomial of degree n, whether or not
the points were the same. In that case A"+1f and V*t}f would vanish,
and both polynomials P(x) would be identical with f.
There are many different forms in which the interpolation polynomial
INTERPOLATION 209
of Moire n can be written, and these can be obtained, one from the others,
by neglecting differences biorder n + 1 and higher, ana then by renaming
the points. One simple scheme is based upon the “lozenge diagram.”
In the array
UrAPf
(u+
Ur+ 1
1) p41
lew.
UA?fott
where parentheses are omitted from subscripts of u, the sum of the terms
in the upper row is equal to the sum of the terms in the lower row:
UrAPfa + (u + 1):
APtf, = UA? hapa + UrysAP tf.
The identity follows directly from the relation
Eailiag cre
Se! + 2); A
4
eeeSs
ee ats
™ (u ppDs)aspsok
u 3 aeUn ane a thay:
e
fi ee (u faa 1)e 2 s
ee ce
fs ae ;
the sum of any two terms connected by a dash pointing down to the right
is equal to the sum of the two terms connected by the dash just: below.
Now if we start with fo and proceed diagonally downward, summing the
first n + 1 terms, we obtain the right member of (5.15.11). But the sum
of any other sequence of terms obtained by proceeding to the right and
ending with A*f) will have, according to the theorem, identically the
same value. Hence we obtain different expressions for the same interpo-
lation polynomial. By ending on A*f; for 7 ¥ 0, we obtain an interpola-
tion polynomial based upon a different set of findamental points.
It has been remarked already that with uniform spacing the interpola-
tion is most accurate for points near the middle of the range. For compu-
tational purposes it is convenient if the coefficients are small. Both
conditions are satisfied if one designates as x» the fundamental point
210 PRINCIPLES OF NUMERICAL ANALYSIS
closest to the point z to be interpolated and if the series contains only
terms as close as possible to the horizontal line through fo. The two
Newton-Gauss formulas result: é
(5.15.15) P(x) = fo + urAfo + U2h*f_y + (u + 1) sA*f_y
+ (u + 1)sA4f_2 + (u + 2)sA°f_2+ +:
and
(5.15.16) P(x) = fo + wAf_1 + (u + 1) 2A*f_1 + (u + 1) 3A*f_2
ihewea’ +1): a (y a 1 a
ea re ame a> (9). ce
fa acai < i: :
and
Eb 240g ee
(5.151.2)
hetu(u® — 19)
Rags = Dre
+ + fu? = (p = 1)Peas
ar > nea
Mu — pferrr(p
we
To obtain an upper limit for the truncation error from #,4,: directly,
one must have an upper limit for f“*+» over the interval containing the z;,
and this is not necessarily easy to obtain. However, it may be known
that each of certain consecutive derivatives retains a fixed sign over the
interval, and these signs may be known. In this event, Steffenson’s
“error test’? can be applied. This rests upon the simple observation
that in any series
S=Utuwut::: + Un + Rati,
where R,,,1 is the error due to dropping terms beginning with wn41, if it is
known that R,,; and 2,2 have opposite signs, then since
or
$:(t) = exp (a + B:).
The general methods described at the beginning of this chapter apply, but
not much more can be said when the constants o; and 8; arecompletely
arbitrary and unrelated. Most often, however, the a; will be in arithmetic
progression, in which case certain rere simplifications are possible.
If, possibly after changing scale, one can set
a,
= k,
(5.2.10) oa)
eo = th 48:(2).
YiSi(x
r,o(¥ t)=@n,
(6.1.8) (S, 8) ee 3)= (P, p),
(S, )'(T, ) = (2 °)
The last relation gives 6 as
(6.1.9) st = 6
and requires that
(6.1.10) T = 0, S't = 0.
The first two require that
(6.1.11) Tv+t=f, Su+s = p.
Hence when the first of (6.1.11) is multiplied on the left by S™ and the
second by 7", s and ¢ are eliminated, and equations in v alone and in u
218 PRINCIPLES OF NUMERICAL ANALYSIS
alone result. Because of (6.1.7), and because D is diagonal and hence
symmetric, v and ware given by
(6.1.12) y = DS, u = D-I"p,
provided only that no diagonal element in D is zero. After v and u are
found from (6.1.12), ¢ and s are obtained from (6.1.11), and 6 from (6.1.9).
This establishes the induction and provides an algorithm for finding the
matrices S, T, U, V, and D.
Now observe that the columns of T are linear combinations of those of
F, and the columns of S are linear combinations of those of P. Equiva-
lent to (6.1.3) and (6.1.4) are the equations
TVe=y+d, Std I ad
To return to the induction, suppose that b is given and that the effect of
adjoining a new function ¢, and hence a new vector f, is to be investi-
gated. Vectors s and é are found by the method already described.
The vector b is then unaffected except by the adjunction of an additional
element
(6.1.15) B = o-tely.
The equation
(6.1.16) Tb=yt+d
becomes
7,9 (?)=y+a,
B
where d’ is the new residual, or on expanding and applying (6.1.16),
d+ip=d'.
The vector d is orthogonal to all columns of 7; d’ is orthogonal to all
columns of the enlarged matrix (7, ¢); in particular d’ and £¢ are orthog-
onal. Hence
(6.101.8)
R@) = f° KG, 8)Lmealf(s)lds,
2K (a,s) = g(z, 8) sgn (w — 8) — BBi(x)g(a, 8) sgn (a; — 8).
6.11. Least-square Curve Fitting. The residual vector d always vanishes
when m = n, provided at each step the columns of F,, as well as those of
P, are kept linearly independent. Moreover, the vector c is then inde-
pendent of the choice of the functions y, given only the linear independ-
ence of the sets of functional values. But for any given m <n, the
vectors c and d and the length of d will depend upon the selection of the
functions y. Thus, for example, one could choose functions y each of
which vanishes at all but m + 1 of the points z;. This would give the
interpolating function (a) which passes through these m cas 1 points
and would take no account of the other points.
It is natural to ask that for whatever m < n one might fix upon the
vector c be chosen so that the vector d is as small as possible. In all cases
the vector y is resolved into two components one of which lies in the
space of F and d being the other component. Then the shortest length
possible for d is the length of the perpendicular from the point y to the
space of F. Hence the component Fc of y in the space of F is to be chosen
as the orthogonal projection of y upon this space. This is effected by
making y, = ¢,, and hence by taking the matrix P to be the same as F.
All equations in §6.1 can be specialized immediately to this case. It
may be remarked in passing that in many cases there are good statistical
grounds for minimizing the length (or the squared length) of d, but these
will not be developed here.
MORE GENERAL METHODS OF APPROXIMATION 221
Often experimental conditions may be such that certain values of
the y; may be less reliable than others. If so, greatest “weight” should
be given to the measurements of highest reliability. This can be effected
by associating a diagonal matrix W of order n + 1, with positive non-
null diagonal elements, in which the magnitude of each diagonal element,
say the jth, measures the degree of reliability attached to the value of the
measurement y;, The matrix W is then used as a metric for the space,
and the orthogonality and lengths of the vectors are taken with reference
to this metric. This can be achieved by setting
P= WF
in §6.1.
6.111. Least-square fitting of polynomials. A special simplification is
possible when the function © is to be a polynomial, and
(6.111.2) fr = X"fo.
The argument used in proving Lanczos’s theorem (§2.22) can be applied
here to show that each column ¢,,1 of the matrix T is expressible as a
linear combination of Xt,, t,, and é,1. In fact,
to = fo,
(6.111.3) i = (Xx = 9591) to,
where
From to = fo one calculates ao and 50, thence f:, ai, and 4;, and so on,
sequentially. The functions 7,(r%) are orthogonal on the set of points
x; and are given by
To(2) = it,
(6.111.5) 9 r1(z) = (&@ — aodp")ro(z),
Troi(%) = (% — a,d>1)7-(x) — 6,671,77-1(2), POPE
6.12. Finite Fourier Expansions. Let the abscissas %o, 41, . . + » Ln—1
be uniformly spaced. After making a linear change of variable, we can
assume that
(6.12.1) x, = k/n,
222 PRINCIPLES OF NUMERICAL ANALYSIS
and the function f(z) is to be considered only on the range 0 < a< 1,
or is supposed periodic of period 1. Make a further change of variable by
(6.12.2) =exp (2riz), t=+W/-l.
Then
(6.12.3) w, = exp (2rik/n),
and
(6.12.4) wi, = exp (2rijk/n) = of.
The function f(z) becomes a function of w, F(w), and the interpolating .
polynomial in w gives a representation of f(x) of the form
6.12.8 ae Soph — n if Ie
j ate
( ) > ENE 0. ia oh
MORE GENERAL METHODS OF APPROXIMATION 223
Hence the functions w?/ and w-?* are biorthogonal on the set «, in the
sense of §6.1. Then if the functions w?/ are taken as the 7;, and w-?* as
the o; of §6.1, the coefficients B are given by
(6.12.9) B= 1 >;yrcor?i,
P
If 27 ¥ m, then
or
}
Bm—j = ) Yray?™ = ) yr,
p
B= mY mleos (2rpjk/n) — i sin (2npjk/n)],
k
and
Bi2?* + Bmw ?* = 2A, cos (Qrpkx) + 2B, sin (Qrpkz).
with the A’s and B’s given by (6.2.10). In case m is even, Bn/2 = 0,
while Amz = Oif nis even, and An, = 1/n if n is odd.
6.2. Chebyshev Expansions. It was shown in the last chapter that if
T,, is the Chebyshev polynomial of degree n, then 7’, is that polynomial
of degree n and leading coefficient unity whose maximum departure from 0
on the interval from —1 to +1 is least. This property was used there to
obtain a particular set of points of interpolation that was optimal in the
sense there described. The property may, however, be utilized in a
different way to obtain an approximate representation of f(x). If f(x)
is expanded in a series of polynomials T,,, and if the coefficients of these
224 PRINCIPLES OF NUMERICAL ANALYSIS
polynomials in the expansion do not themselves increase too rapidly,
then the fact that each polynomial is small over the entire interval
suggests that a small number of terms in the expansion might provide
an approximation that is uniformly good over the entire interval. This
is in contrast to a Taylor series expansion which requires more terms the
farther one goes from the center of the expansion.
The analysis is simplified somewhat by introducing the functions
(6.2.1) C,(x) = 2 cos 8, x = 2 cos #.
Then
Coa) = 1 Cie) =a, Cie) = 277 — 2, .°%
and in general C,(x) is a polynomial in x of leading coefficient 1. We
consider the representation of the function f(x) on the range from —2 to
+2 and seek an expansion
(7.1.3) [ s@u@ae = yu + B
the remainder R vanishes whenever f is equal to any of the ¢,. Hence
(7.1.5)
He) = Yandel) + [7oe, )Lmrslf(s)lds,
v= DVande(e) + [* g(t, 8)Dmeslf(s)lds.
Assuming the multipliers \; to have been found, multiply the first equa-
tion here by w(x) and integrate from a to b; the second by 2, and sum,
subtracting the result from the integral just obtained. By (7.1.3),
(7.1.1), and (7.1.4), this gives
R= ‘i” w(z) if
* g(x, 8)Lmsalf(s)|ds dx — yx i,* g(a, 8)Lmsslf(s)]ds.
A similar expansion written for x = b gives in the same way
R= [" M(o)Lmuslf(s)lds,
(7.1.6)
2M(s) = ” w(2)g(a, 8) sgn (« — s)dx — Y a(x, 8) sen (as — 8).
The function M(s) is continuous since the discontinuity in the signum
function occurs only where g vanishes.
As examples, let m=n=1, a=2=0, and b=2,=1. Let
w(x) = 1, ¢o(z) =1, and ¢i(z) =z. Then g(x, s) =2—s, and
No =A. = %. Then M(s) = s(s —1)/2. This gives the common
trapezoidal rule with remainder:
i=l
Equations (7.1.8) and (7.1.10) are identical if in the former f(x)e%* is
replaced by f(x).
In (6.101.3) and (6.101.4), the %,(x) are linear combinations with
constant coefficients of the ¢,(x). On multiplying (6.101.4) by w(x) and
integrating, one finds
1 1 treed: Ho M1 i)
Xo L1 Ln Bi be M3
1 Ko My Se Sheets
(7.1.18) ipa
x 1
ees!
1.debt
be
segs)
=", = 0,
nyt
et net ne + +e Ont
then if these z; gre all distinct, the system (7.1.4) is consistent, and defines
a set of coefficients uniquely.
With the x; as determined by (7.1.13), let
(7.1.14) wale) = Ww — &,) = at! + wat" + °° + Onn.
| ik,p(x) wn(x)w(x)dx = 0.
Hence to obtain the polynomial w,(2), one can calculate recursively the
w(x), r <n, instead of expanding the determinant (7.1.13).
In this situation both the x; and the i; were left completely arbitrary
and determined so as to take account of a maximal set of the ¢,. Another
possible procedure would be to restrict the x; and \; by a limited number
of conditions, and then impose as many of the conditions (7.1.4) as
possible. As an example of this, it is sometimes desirable to select the
x; so that the 2; are all equal:
M=Ma
tc =D
There are then to be determined the n + 2 quantities xm, . . . ,%, and ,
and therefore in general n + 2 conditions (7.1.4) can be imposed. For
polynomials ¢, these have the form
A = po/(n + 1),
= (n + 1)p1/no,
@) OPS: iO) ROA L610) ‘6, 6s ela ae)
Zaptt ss (n + 1) pnis/Mo.
Hence the sums of the powers up to the (n + 1)st are known. From
these the elementary symmetric functions of the az; can be formed by
NUMERICAL INTEGRATION AND DIFFERENTIATION 231
means of Eqs. (3.02.5), and hence the equation of degree n + 1 of which
the z; are the roots. Unfortunately the roots 2; do not necessarily
always fall within the range of the integration, and when they do not, the
method is unusable. -
7.2, Numerical Differentiation. Equation (6.101.7) can be written
is exact, the last term representing an expression for the remainder. Let
Lin) = (x a Lo) ene (x — Li)
. Then
f
(1.2.5 )» f(x) == fo, 0) x 21) + Uyf(xo,
oD) 1, 22) ip+ ++ lke> (abd opeuale.
232 PRINCIPLES OF NUMERICAL ANALYSIS
where
(7.2.8) Fe = 14 (0, Pipes, 4 Lan) ebayLey © iia aes ae
When x = 2, Xm) = 0, and the second term drops out of the ee
Continuing, one finds
(7.2.7) f'(@) = 2f(eo, v1, %2) + Ua f(Xo, £1,002, 2s) + + *-
a Dee atl Ce, aces) , £n) = HRC
(7.2.8) R"” = Lind (Lo, U1, - + + yn, 2) or 22 inf (Co, Ti, - + + Un, V, 2)
+ Xn) f(Xo, V1, - «+ « » Un, L, LX, %);
(7.3.6) 6 fi a+ 2/AyHar
12- §3 12-32. §5
| |
31-22 Bls2t
mo
@DY get aT |
(7.3.8) 6=p |e-3
To continue to higher derivatives, we have next that
d(6?)/d3 = 26 d0/dé = 20/p.
Hence
1)2
6? = 2 feta @ pos |ar,
or
(7.3.9) ed ee ey et: |
It can now be shown inductively that
AF (20) = h ( + 40 + Ke :* 2B dr)f(ao)
and by the first therefore
(7.32.2) /wore
2h ses) Se EI" (eo)5 OS 2 apm
h |+ R,
236 PRINCIPLES OF NUMERICAL ANALYSIS
where
jones iE
2 + 5? = 2 cosh 6 = e? + e~?
62 II a + go [Ei — EO)dr,
Hence
(7.33.1) EH — EE" 20 + 1406? — Leos Hh72(1 — 7)(E** — E-O—)dr
M40(E + 4+ E>)
— 608 [2 — 1)(B — BO),
This gives Simpson’s rule with remainder when applied in the usual way
to F(a). After changing the variable of integration on the right, one has
fi (h — v)*[f""
(eo + v) — f’" (ao — v)Jdv = e(yf’ — yl) hs (h — v)%v dv
ey! — yt )h4/12.
Hence
(7.33.3) | f(a)de = Yh(y-1 + 4yo + yi) — At(yf’— yl) /72.
The eet are obtained by setting « equal to 0 and 1.
7.34. Newton’s Three-eighths Rule. To obtain a formula using third
differences requires the expansion of e® to the fourth power of 6. In
order to gain the advantages of symmetry in the expressions, consider the
evaluation of (H* — E-*)F (a2) in terms of H+t*f(xo) and E+f(x»),
assuming the fundamental abscissas to be 743, and iy. Write
€35/2 = il + 360 + 6 6? + % 608 + Rs
ihe
" fa)dex = 46h(3 + 26%) + R,
where
R= $6he{— [lf es— th) — f°" ea t+ th) |dr
| 9 i* Ff" (tg — Qh) — fae + 2rh)|dr}.
After a change of the variables of integration this can be written
Bo,41 = 0, vy> 0.
For if n is allowed to approach infinity in (7.36.7), the expansion becomes
1 +
.
2, Be/>! => oo
oe? +1) =
oe?
sar
+e) 8 8
ean = 5 coth 5°
(8.1.4) 1S ¢(2),
if s = (£1, ..., €) represents the other n coordinates of this point.
Hence if one made random drawings of a large number of points N, testing
inequality (8.1.4) each time a drawing is made, and if the inequality is
satisfied for N’ of these points, then N’/N provides an estimate of ©.
This is the essential idea underlying the Monte Carlo method of
numerical integration. However, in any digital computation one cannot
draw arbitrary points from the cube, but only points whose coordinates
have a digital representation.
Suppose it has been determined that for representing the coordinate
ξ_i in the base β it is sufficient to use σ_i places and that the computation of
φ(x) will then be accurate to τ places. Suppose, further, one has available
some process for drawing digits 0, 1, . . . , β - 1 at random with equal
probability. One therefore makes σ₁ + σ₂ + · · · + σ_n + τ drawings.
The first σ₁ digits in order provide the representation of the coordinate ξ₁;
the next σ₂ provide the representation of the coordinate ξ₂, . . . ; the
last τ provide the representation of the coordinate η. With these repre-
sentations one tests the inequality (8.1.4) to decide whether the selected
point in (n+ 1) space lies inside or outside the volume. There is a
question whether the equality sign should be allowed in (8.1.4), or only
the strict inequality. If the equalities do not arise in sufficient numbers
to make a significant difference, then it is immaterial whether these points
are counted as inside or outside or are neglected altogether. If they make
a significant contribution, the decision must be based upon a considera-
tion of the routine for computing ¢.
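A sketch of the whole procedure for a single coordinate ξ₁ (n = 1) follows; it is purely illustrative, and the base β = 10, the numbers of places, the number of drawings, and the counting of ties as inside are all arbitrary choices of the example:

    import random

    def monte_carlo_volume(phi, n, sigma, tau, N, beta=10, seed=12345):
        """Hit-or-miss estimate of the volume under eta = phi(x)."""
        rng = random.Random(seed)

        def draw(places):                  # a digital number built digit by digit
            digits = [rng.randrange(beta) for _ in range(places)]
            return sum(d * beta ** -(k + 1) for k, d in enumerate(digits))

        hits = 0
        for _ in range(N):
            x = [draw(sigma[i]) for i in range(n)]
            eta = draw(tau)
            if eta <= phi(x):              # inequality (8.1.4), ties counted inside
                hits += 1
        return hits / N

    # phi(x) = x1^2 over the unit square; the exact volume is 1/3
    print(monte_carlo_volume(lambda x: x[0] ** 2, n=1, sigma=[6], tau=6, N=20000))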
If one could really make random selections from all points (ξ₁, . . . ,
ξ_n, η) in the unit hypercube, rather than from those points only whose
coordinates are digital numbers, and could obtain the strict mathematical
value of φ(x) for any point, one would be repeating the occurrences of an
event with two possible outcomes, "success" and "failure," with prob-
abilities Φ and 1 - Φ. By standard statistical formulas one can deter-
mine the probability that in N trials the number of actual successes N'
will differ from NΦ by more than any given amount.
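In its simplest (normal-approximation) form the statistical formula in question is easily evaluated; the following lines are an added illustration giving the approximate probability that the observed fraction N'/N lies within a prescribed distance d of the success probability p:

    from math import erf, sqrt

    def prob_within(p, N, d):
        """Normal approximation to P(|N'/N - p| <= d) for N Bernoulli trials."""
        sd = sqrt(p * (1 - p) / N)
        return erf(d / (sd * sqrt(2)))

    print(prob_within(p=0.3, N=10000, d=0.01))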
But since only points x are drawn for which the ξ_i are digital, one is
at best not estimating the volume Φ, but a slightly different volume Φ',
and the statistical formula gives the probability of deviations from NΦ',
rather than from NΦ. The nature of the volume Φ' is best illustrated
for the case n = 2.
Each of the σ_i digits of ξ_i can have any one of β possible values. Hence
there are β^(σ₁+ · · · +σ_n) possible points x. Associated with each x is a computed
φ*(x), defined by the computational routine. This is a digital number
with τ places. For each x let η' represent a quantity differing from φ*(x)
by not more than β^(-τ)/2, and whose exact value depends upon the error
φ*(x) - φ(x) and upon the rule for including or excluding the equality
in (8.1.4). Then Φ' is equal to β^(-σ₁- · · · -σ_n) times the sum of the quantities η'
for all possible x. Hence the quantity Φ' being estimated is essentially
that approximation to Φ that would be obtained by employing a Riemann
sum of β^(σ₁+ · · · +σ_n) terms for the integral.
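For small β and σ the quantity Φ' can be computed exactly by enumerating the digital points, which makes its Riemann-sum character evident. The sketch below (illustrative only, and simplified in that φ is evaluated exactly and the equality is counted as inside) does this for n = 1:

    def digital_phi_prime(phi, beta, sigma, tau):
        """The Riemann-sum quantity Phi' for one coordinate (n = 1)."""
        total = 0
        for j in range(beta ** sigma):
            x = j / beta ** sigma
            # digital eta values in [0, 1) satisfying eta <= phi(x)
            total += sum(1 for k in range(beta ** tau) if k / beta ** tau <= phi(x))
        return total / (beta ** sigma * beta ** tau)

    print(digital_phi_prime(lambda x: x * x, beta=10, sigma=2, tau=2), 1 / 3)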
The total error in the entire computation is therefore

(8.1.5)  Φ - N'/N = (Φ - Φ') + (Φ' - N'/N).

The second parenthesis is the so-called sampling error, which is the
deviation of the estimate from the quantity being estimated. In the
assertion that the probability is p that this error does not exceed δ, if δ
Shmuel Agmon (1951): “The Relaxation Method for Linear Inequalities, ” National
Bureau of Standards, NAML Report 52-27.
N. I. Ahiezer (1947): "Lectures on the Theory of Approximation" (Russian), Moscow
and Leningrad, 323 pp.
Franz Aigner and Ludwig Flamm (1912): Analyse von Abklingungskurven, Physik. Z.,
13 :1151-1155.
A. C. Aitken (1926): On Bernoulli's Numerical Solution of Algebraic Equations,
Proc. Roy. Soc. Edinburgh, 46 :289-305.
(1929): A General Formula of Polynomial Interpolation, Proc. Edinburgh
Math. Soc., 1:199-203.
(1931): Further Numerical Studies in Algebraic Equations and Matrices,
Proc. Roy. Soc. Edinburgh, 61:80-90.
(1932a): On Interpolation by Iteration of Proportional Parts, without the
Use of Differences, Proc. Edinburgh Math. Soc. (2), 3:56-76.
(1932b): On the Evaluation of Determinants, the Formation of Their Adju-
gates, and the Practical Solution of Simultaneous Linear Equations, Proc.
Edinburgh Math. Soc. (2), 3:207—219.
(1932c): On the Graduation of Data by the Orthogonal Polynomials of Least
Squares, Proc. Roy. Soc. Edinburgh, 53 :54-78.
(1933): On Fitting Polynomials to Data with Weighted and Correlated Errors,
Proc. Roy. Soc. Edinburgh, A64:12-16.
(1934): On Least Squares and Linear Combination of Observations, Proc. Roy.
Soc. Edinburgh, A565 :42-48.
(1936-1937a): Studies in Practical Mathematics. I. The Evaluation with
Application of a Certain Triple Product Matrix, Proc. Roy. Soc. Edinburgh,
67 :172-181.
(1936-1937b): Studies in Practical Mathematics. II. The Evaluation of the
Latent Roots and Latent Vectors of a Matrix, Proc. Roy. Soc. Edinburgh, 57 :269-
304.
(1937-1938): Studies in Practical Mathematics. III. The Application of
Quadratic Extrapolation to the Evaluation of Derivatives, and to Inverse
Interpolation, Proc. Roy. Soc. Edinburgh, 68:161-175.
(1945): Studies in Practical Mathematics. IV. On Linear Approximation ‘
Least Squares, Proc. Roy. Soc. Edinburgh, A62:138-146.
G. E. Albert (1951-1952): ‘‘A General Approach to the Monte Carlo Estimation of
the Solutions of Certain Fredholm Integral Equations,” I-III, Oak Ridge
National Laboratory Internal Memorandum.
Franz L. Alt (1952): pres tiengular Matrices, Proc. Intern. Congr. Math., 1950,
1:657.
H. Andoyer (1906): Calcul des différences et interpolation, Encyclopédie sct. math., I,
21 :47-160.
R. V. Andree (1951): Computation of the Inverse of a Matrix, Am. Math. Monthly,
58:87-92.
V. A. Bailey (1941): Prodigious Calculations, Australian J. Sci., 3:78-80.
L. Bairstow (1914): “Investigations Relating to the Stability of the Aeroplane,”
Reports and Memoranda No. 154 of Advisory Committee for Aeronautics.
T. Banachiewicz (1937): Zur Berechnung der Determinanten, wie auch der Inversen,
und zur darauf basierten Auflösung der Systeme linearer Gleichungen, Acta
Astron., 3:41-72.
V. Bargmann, D. Montgomery, and J. von Neumann (1946): “Solution of Linear
Systems of High Order,” Princeton, N.J., Institute for Advanced Study Report,
BuOrd, Navy Dept.
M. S. Bartlett (1951): An Inverse Matrix Adjustment Arising in Discriminant Analy-
sis, Ann. Math. Stat., 22:107-111.
Julius Bauschinger (1904): Interpolation, Enc. Math. Wiss., I D, 3:799-820.
Edmund C. Berkeley (1949): “‘Giant Brains, or Machines That Think,” John Wiley &
Sons, Inc., New York, xvi + 270 pp.
Serge Bernstein (1926): "Leçons sur les propriétés extrémales et la meilleure approxi-
mation des fonctions analytiques d'une variable réelle," Gauthier-Villars & Cie,
Paris, x + 207 pp.
Raymond T. Birge and J. W. Weinberg (1947): Least Squares Fitting of Data by
Means of Polynomials, Revs. Mod. Phys., 19:298-360.
M. S. Birman (1950): Some Estimates for the Method of Steepest Descent (Russian),
Uspekhi Mat. Nauk 5, 3(37):152-155.
D. R. Blaskett and H. Schwerdtfeger (1945): A Formula for the Solution of an Arbi-
trary Analytic Equation, Quart. Appl. Math., 3:266-268.
E. Bodewig (1935): Über das Euler'sche Verfahren zur Auflösung numerischer
Gleichungen, Comment. Math. Helv., 8:1-4.
(1946a): On Graeffe’s Method of Solving Algebraic Equations, Quart. Appl.
Math., 4:177-190.
(1946b): Sur la méthode de Laguerre pour l'approximation des racines de
certaines équations algébriques et sur la critique d’Hermite, Koninkl. Ned. Akad.
Wetenschap. Proc., 49 :910-921.
(1947): Comparison of Some Direct Methods for Computing Determinants
and Inverse Matrices, Koninkl. Ned. Akad. Wetenschap. Proc., 50 :49-57.
(1947-1948): Bericht tiber die verschiedenen Methoden zur Lésung eines
Systems linearer Gleichungen mit reellen Koeffizienten, Koninkl. Ned. Akad.
Wetenschap. Proc., 60:930-941, 1104-1166, 1285-1295; 51:53-64, 211-219.
(1949): On Types of Convergence and on the Behavior of Approximations
in the Neighborhood of a Multiple Root of an Equation, Quart. Appl. Math.,
7 :325-333.
O. Bottema (1950): A Geometrical Interpretation of the Relaxation Method, Quart.
Appl. Math., 7:422-423.
O. L. Bowie (1951): Practical Solution of Simultaneous Linear Equations, Quart.
Appl. Math., 8:369-373.
Alfred Brauer (1946): Limits for the Characteristic Roots of a Matrix, Duke Math. J.,
13:387-395.
(1947): Limits for the Characteristic Roots of a Matrix, II, Duke Math. J.,
14 :21-26.
(1948): Limits for the Characteristic Roots of a Matrix, III, Duke Math. J.,
15 :871-877.
Otto Braunschmidt (1943): Über Interpolation, J. reine angew. Math., 185:14-55.
P. Brock and F. J. Murray (1952): ‘‘The Use of Exponential Sums in Step by Step
Integration,” unpublished manuscript.
S. Brodetsky and G. Smeal (1924): On Graeffe’s Method for Complex Roots of Alge-
braic Equations, Proc. Cambridge Phil. Soc., 22:83-87.
E. T. Browne (1930): On the Separation Property of the Roots of the Secular Equa-
tion, Am. J. Math., 52 :843-850.
E. M. Bruins (1951): "Numerieke Wiskunde," Servire, Den Haag, 127 pp.
Joseph G. Bryan (1950): ‘““A Method for the Exact Determination of the Character-
istic Equation and Latent Vectors of a Matrix with Applications to the Dis-
criminant Function for More Than Two Groups,” thesis, Harvard University.
Hans Buckner (1948): A Special Method of Successive Approximations for Fredholm
Integral Equations, Duke Math. J., 15 :197—206.
H. Burkhardt (1904): Trigonometrische Interpolation, Enc. Math. Wiss., II A,
9a:642-693.
Gino Cassinis (1944): I metodi di H. Boltz per la risoluzione dei sistemi di equazioni
lineari e il loro impiego nella compenzazione delle triangolazione, Riv. catasto e
servict tecnict erartalt, No. 1.
L. Cesari (1931): Sulla risoluzione dei sistemi di equazioni lineari per approssimazioni
successive, Rass. poste, telegrafi e telefoni, Anno 9.
(1937): Sulla risoluzione dei sistemi di equazioni lineari per approssimazioni
successive, Atti accad. nazl. Lincei Rend., Classe sct. fis., mat. e nat. (6a), 25 :422-
428.
F. Cohn (1894): Ueber die in recurrirender Weise gebildeten Grössen und ihren
Zusammenhang mit den algebraischen Gleichungen, Math. Ann., 44:473-538.
A. R. Collar (1948): Some Notes on Jahn’s Method for the Improvement of Approxi-
mate Latent Roots and Vectors of a Square Matrix, Quart. J. Mech. Appl. Math.,
1:145-148.
L. Collatz (1950a): Iterationsverfahren fiir komplexe Nullstellen algebraischer
Gleichungen, Z. angew. Math. u. Mech., 30:97-101.
(1950b): Uber die Konvergenzkriterien bei Iterationsverfahren fiir lineare
Gleichungssysteme, Math. Z., 53:149-161.
Computation Laboratory (1946): ‘‘A Manual of Operation for the Automatic Sequence
Controlled Calculator,’’ Harvard University Press, Cambridge, 561 pp.
(1949): ‘“‘Description of a Relay Calculator,” Harvard University Press,
Cambridge, 366 pp.
J. L. B. Cooper (1948): The Solution of Natural Frequency Equations by Relaxation
Methods, Quart. Appl. Math., 6:179-182.
A. F. Cornock and J. M. Hughes (1943): The Evaluation of the Complex Roots of
Algebraic Equations, Phil. Mag. (7), 34:314-320.
J. G. van der Corput (1946): Sur l'approximation de Laguerre des racines d'une
équation qui a toutes ses racines réelles, Koninkl. Ned. Akad. Wetenschap. Proc.,
49 :922-929.
Charles L. Critchfield and John Beck, Jr. (1935): A Method for Finding the Roots of
the Equation f(z) = 0 Where f Is Analytic, J. Research Nat. Bur. Standards,
14 :595-600.
L. L. Cronvich (1939): On the Graeffe Method of Solution of Equations, Am. Math.
Monthly, 46 :185-190.
Prescott D. Crout (1941): A Short Method for Evaluating Determinants and Solving
Systems of Linear Equations with Real or Complex Coefficients, Trans. AIEE,
60 :1235—1240.
Haskell B. Curry (1944): The Method of Steepest Descent for Non-linear Minimiza-
tion Problems, Quart. Appl. Math., 2:258-261.
(1951a): Abstract Differential Operators and Interpolation Formulas, Portu-
galiae Math., 10:135-162.
(1951b): Note on Iterations with Convergence of Higher Degree, Quart. Appl.
Math., 9:204-205.
J. H. Curtiss (1952): ““A Unified Approach to the Monte Carlo Method,” report
presented at meeting of Association for Computing Machinery, Pittsburgh,
May 2-3, 1952.
R. E. Cutkosky (1951): A Monte Carlo Method for Solving a Class of Integral Equa-
tions, J. Research Nat. Bur. Standards, 47 :113-115.
W. Edwards Deming (1938): “Statistical Adjustment of Data,” John Wiley & Sons,
Inc., New York, x + 261 pp.
Bernard Dimsdale (1948): On Bernoulli’s Method for Solving Algebraic Equations,
Quart. Appl. Math., 6:77-81.
C. Domb (1949): On Iterative Solutions of Algebraic Equations, Proc. Cambridge
Phil. Soc., 45 :237-240.
Paul S. Dwyer (1951): ‘Linear Computations,”’ John Wiley & Sons, Inc., New York,
xi + 344 pp.
Engineering Research Associates, Inc. (1950): ‘High-speed Computing Devices,”
McGraw-Hill Book Company, Inc., New York, xiii + 440 pp..
I. M. H. Etherington (1932): On Errors in Determinants, Proc. Edinburgh Math. Soc.,
$:107-117.
G. Faber (1910): Über die Newton'sche Näherungsformel, J. reine angew. Math.,
138:1-21.
V. N. Faddeeva (1950): ‘‘Computational Methods of Linear Algebra” (Russian),
Moscow and Leningrad (Chap. 1, “Basic Material from Linear Algebra,” trans-
lated by Curtis D. Benster), National Bureau of Standards Report 1644.
Leopold Fejér (1934): On the Characterization of Some Remarkable Systems of
Points of Interpolation by Means of Conjugate Points, Am. Math. Monthly,
41 :1-14.
Ervin Feldheim (1939): Théorie de la convergence des procédés d’interpolation et de
quadrature mécanique, Mém. sci. math. acad. sci. Paris, No. 95.
William Feller and George E. Forsythe (1951): New Matrix Transformations for
Obtaining Characteristic Vectors, Quart. Appl. Math., 8:325-331.
Henry E. Fettis (1950): A Method for Obtaining the Characteristic Equation of a
Matrix and Computing the Associated Modal Columns, Quart. Appl. Math.,
8 :206-212.
Donald A. Flanders and George Shortley (1950): Numerical Determination of Funda-
mental Modes, J. Appl. Phys., 21:1326-1332.
A. Fletcher, J. C. P. Miller, and L. Rosenhead (1946): ‘Index of Mathematical
Tables,” McGraw-Hill Book Company, Inc., New York, 450 pp.
L. R. Ford (1925): The Solution of Equations by the Method of Successive Approxima-
tions, Am. Math. Monthly, 32 :272-287.
George E. Forsythe (1951): ‘Tentative Classification of Methods and Bibliography
on Solving Systems of Linear Equations,’ National Bureau of Standards, INA
52-7 (internal memorandum).
(1952): ‘‘Bibliographic Survey of Russian Mathematical Monographs, 1930—
1951,’’ National Bureau of Standards Report 1628.
and Richard A. Leibler (1950): Matrix Inversion by a Monte Carlo Method,
MTAC, 4:127-129.
———- and (1951): Correction to the article, ‘‘ Matrix Inversion by a Monte
Carlo Process," MTAC, 6:55.
————— and Theodore S. Motzkin (1952): An Extension of Gauss’ Transformation for
Improving the Condition of Systems of Linear Equations, MTAC, 6:9-17.
Tomlinson Fort (1948): “Finite Differences and Difference Equations in the Real
Domain,’’ Oxford University Press, New York.
R. Fortet (1952): On the Estimation of an Eigenvalue by an Additive Functional of a
Stochastic Process, with Special Reference to the Kac-Donsker Method, J.
Research Nat. Bur: Standards, 48 :68-75.
L. Fox (1950): Practical Methods for the Solution of Linear Equations and the
Inversion of Matrices, J. Roy. Stat. Soc., B12 :120-136.
and J. C. Hayes (1951): More Practical Methods for the Inversion of Matrices,
J. Roy. Stat. Soc., B13 :83-91.
, H. D. Huskey, and J. H. Wilkinson (1948): Notes on the Solution of Algebraic
Linear Simultaneous Equations, Quart. J. Mech. Appl. Math., 1:149-173.
J. S. Frame (1944): A Variation of Newton’s Method, Am. Math. Monthly, 61:36-38.
(1949): A Simple Recursion Formula for Inverting a Matrix (Abstract), Bull.
Am. Math. Soc., 65 :1045.
R. A. Frazer (1947): Note on the Morris Escalator Process for the Solution of Linear
Simultaneous Equations, Phil. Mag., 38 :287-289.
and W. J. Duncan (1929): On the Numerical Solution of Equations with
Complex Roots, Proc. Roy. Soc. (London), A125 :68-82.
: , and A. R. Collar (1946): ‘“‘Elementary Matrices and Some Applica-
tions to Dynamics and Differential Equations,” The Macmillan Company, New
York, xvi + 416 pp.
G. F. Freeman (1943): On the Iterative Solution of Linear Simultaneous Equations,
Phil. Mag. (7), 34:409-416.
B. Friedman (1949): Note on Approximating Complex Zeros of a Polynomial, Com-
muns. Pure Appl. Math., 2:195-208.
Thornton C. Fry (1945): Some Numerical Methods for Locating Roots of Polynomials,
Quart. Appl. Math., 3:89-105.
Eduard Fürstenau (1860): Neue Methode zur Darstellung und Berechnung der
imaginären Wurzeln algebraischer Gleichungen durch Determinanten der Coeffi-
zienten, Ges. Bef. ges. Naturw., Marburg, 9:19-48.
A. de la Garza (1951): “An Iterative Method for Solving Systems of Linear Equa-
tions,” Oak Ridge, K-25 Plant, Report K-731.
N. K. Gavurin (1950): Application of Polynomials of Best Approximation to Optimal
Convergence of Iterative Processes (Russian), Uspekhi Mat. Nauk 5, 3(37):156-
160.
Wallace Givens (1951): ‘‘Computation of Eigenvalues,’’ Oak Ridge National Labora-
tory (internal memorandum).
(1952): Fields of Values of a Matrix, Proc. Am. Math. Soc., 3:206-209.
James W. Glover (1924): Quadrature Formulae When Ordinates Are Not Equidistant,
Proc. Intern. Math. Congr., Toronto, 831-835.
Herman H. Goldstine and John von Neumann (1951): Numerical Inverting of
Matrices of High Order, II, Proc. Am. Math. Soc., 2:188-202.
Michael Golomb (1943): Zeros and Poles of Functions Defined by Taylor Series,
Bull. Am. Math. Soc., 49 :581—592.
E. T. Goodwin (1950): Note on the Evaluation of Complex Determinants, Proc.
Cambridge Phil. Soc., 46 :450-452.
Lawrence M. Graves (1946): ““The Theory of Functions of Real Variables,’’ McGraw-
Hill Book Company, Inc., New York, x + 300 pp.
Robert E. Greenwood (1949): Numerical Integration for Linear Sums of Exponential
Functions, Ann. Math. Stat., 20:608-611.
and Masil B. Danford (1949): Numerical Integration with a Weight Function
z, J. Math. Phys., 28:99-106.
D. P. Grossman (1950): On the Problem of the Numerical Solution of Systems of
Simultaneous Linear Algebraic Equations (Russian), Uspekhi Mat. Nauk 5,
3(37) :87-103.
CHAPTER 1
1. Suppose that a, b, c, and x are digital numbers and that |ax + b| < |c|. Assume
a machine forming pseudoproducts and pseudoquotients with maximum error e.
For calculating

y = (ax + b)/c

(a) If |a| < |c| and |b| < |c|, what routine is optimal, and what is the error?
(b) If |a| > |c| and |x| > |c|, what error may occur?
if the computation makes use of the routine described above for square roots.
5. Obtain formulas for the errors Δy and relative errors Δy/y due to errors Δx for
(a) y = sin x,
(b) y = tan x,
(c) y = sec x,
(d) y = exp (ax),
(e) y = log x.
6. Obtain a formula for the error Δx in the solution of the quadratic equation
ax² + bx + c = 0 if the coefficients may be in error by amounts Δa, Δb, Δc.
7. If f = 1 — x?/a, the iteration
CHAPTER 2
1. Solve in four distinct ways, using at least one direct and at least one iterative
method:

3.2x - 2.0y + 3.9z = 13.0,
2.1x + 5.1y - 2.9z = 8.6,
5.9x + 3.0y + 2.2z = 6.9.
2. If a, b, c, and d are arbitrary vectors in a plane, show that
[  A    y ]
[ -a'   0 ] = L'W',

where L' is unit lower triangular and W' upper triangular, and a'x is the element in the
lower right-hand corner of W'.
8. In the process of making a triangular factorization of a matrix A, certain quanti-
ties may vanish and necessitate a reordering of rows or columns, or both, in order to
proceed. Show that the process will go through without such rearrangements if and
only if all the following determinants are non-null:
a₁₁,    | a₁₁  a₁₂ |     | a₁₁  a₁₂  a₁₃ |
        | a₂₁  a₂₂ | ,   | a₂₁  a₂₂  a₂₃ | ,   . . . .
                         | a₃₁  a₃₂  a₃₃ |
CHAPTER 3
g(x) = (x - x₀)(x - x₁) · · · (x - x_n)
has no repeated factors, then

f(x)/g(x) = f(x₀)/[g'(x₀)(x - x₀)] + · · · + f(x_n)/[g'(x_n)(x - x_n)].
CHAPTER 4
1. Find the characteristic equation of the matrix
Bol 2) 4
(EE SOR) t
7 3
412 2
2. Evaluate the largest proper value (s) of the above matrix by iteration.
3. Apply Lanczos’s method (§4.23) to obtain the proper values and proper vectors
of the same matrix.
4. Diagonalize by the method of §4.115
Bit Onde 0)
eo) ek
i 23.024h
Onin S
5. Obtain the triple-diagonal form (§4.24) of this matrix.
6. The largest proper value of a certain matrix A of order n is to be evaluated by
iteration. Though the sequence (A²)^ν x will give more rapid convergence than the
CHAPTER 5
1. If g(x) is of degree n — 1 or less, show that
Σ g(x_i)/ω'(x_i) = 0.

2. Show that Σ x_iⁿ/ω'(x_i) = 1.
3. Show that Σ x_i ω''(x_i)/ω'(x_i) = n(n + 1).
4. For any integer ν > 0 show that
are called Chebyshev polynomials of the second kind. Show that they are poly-
nomials, obtain a recursion, and determine their zeros.
6. The values of f(x_i) and their differences are to be tabulated, with the x_i equally
spaced, but an erroneous value f(x₀) + ε is entered in place of f(x₀). Show the effect
of this error on the values of the successive differences.
7. The trigonometric functions are known exactly for certain values of the argu-
ment: 0°, ±30°, ±45°, . . . . Other values of the sine are to be obtained from these
by interpolation. Use an error formula to ascertain how many figures are reliable
if the interpolating polynomial is quadratic; if it is cubic.
8. Using the Chebyshev points, form the cubic interpolation polynomial for interpo-
lating values of the sine over one quadrant.
CHAPTER 6
1. Taking x_i = i, use the method of §6.111 to construct the polynomials p_r(x),
r = 0, 1, . . . , 6, orthogonal on the points -3, -2, . . . , +3, with W = I.
2. Experimentally measured values y_i of f(x_i) are given at points x_i = i. Values
f'(x:) of the derivative are desired. A standard method is based upon finding the
polynomial of some degree giving the best least-squares fit and differentiating. The
result depends upon the number of points used and upon the degree of the polynomial.
As an example, obtain formulas for f'(0) in terms of y₋₃, y₋₂, . . . , y₃.
3. An approximation of the form (6.0.1) to f(x) is required giving the best least-
squares fit to the data, subject to the restriction that the vector c of the coefficients γ_i
is constrained to satisfy
Bc = z
exactly (neglecting rounding errors). Show that, with the auxiliary vector w, c is
determined by the system
F'Fc + B'w = F'y,
Bc = z.
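A sketch of how such a system may be assembled and solved numerically (with arbitrary illustrative data, assuming Python with numpy) is

    import numpy as np

    def constrained_least_squares(F, y, B, z):
        """Minimize ||Fc - y||^2 subject to Bc = z via the bordered system."""
        n, m = F.shape[1], B.shape[0]
        K = np.block([[F.T @ F, B.T], [B, np.zeros((m, m))]])
        rhs = np.concatenate([F.T @ y, z])
        sol = np.linalg.solve(K, rhs)
        return sol[:n], sol[n:]            # c and the auxiliary vector w

    rng = np.random.default_rng(0)
    F = rng.standard_normal((20, 4))
    y = rng.standard_normal(20)
    B = np.array([[1.0, 1.0, 1.0, 1.0]])   # constrain the coefficients to sum to 1
    z = np.array([1.0])
    c, w = constrained_least_squares(F, y, B, z)
    print(c, B @ c)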
4. Obtain the expansion (6.2.2) for
x:   0     2     6     10    15    18    21
y:  14.0  13.0  10.7   8.0   5.0   2.9   1.0
1 + x - e^u = 0
can be solved for u by the method of §3.2, yielding a sequence of rational fractions in
x approximating log (1 + x). Obtain the fifth term in this sequence. For what
values of x does the sequence converge?
CHAPTER 7
2. Give a direct derivation of the recursion defined by Eqs. (7.1.18) and (7.1.19).
3. For b = -a = 1, w(x) = 1, calculate ω₀, ω₁, ω₂, ω₃, and the x's associated with
each.
4. Do likewise with a = 0, b = ∞, and w(x) = e^{-x}.
5. If x_i = x₀ + ih, i = -3, -2, . . . , +3, obtain a formula of the form
with R vanishing for polynomials up to a degree as high as possible, and find R in
general.
6. By the method outlined in §7.3, obtain explicit expansions in terms of central
differences for derivatives up to the fifth of f(x) at x = x₀.
7. Let
I = I(h)
represent the result of applying a numerical quadrature formula based on equally
spaced abscissas (e.g., Simpson’s rule) to the evaluation of the integral of a particular
function f(x) between fixed limits. Show that I is an even function of h, expressible
in the form
I(h) = I₀ + I₁h² + I₂h⁴ + · · · ,
where I₀ is the exact value of the integral. Hence derive a formula for an improved
approximation to I₀, given I(h) and I(h/2).
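For instance (an illustrative sketch, with the trapezoidal rule standing in for the quadrature formula of the problem), the combination (4I(h/2) - I(h))/3 removes the h² term of the expansion:

    import numpy as np

    def trapezoid(f, a, b, nsub):
        x = np.linspace(a, b, nsub + 1)
        y = f(x)
        return (b - a) / nsub * (y[0] / 2 + y[1:-1].sum() + y[-1] / 2)

    def improved(f, a, b, nsub):
        """(4 I(h/2) - I(h)) / 3, cancelling the h^2 term of I(h) = I0 + I1 h^2 + ..."""
        return (4 * trapezoid(f, a, b, 2 * nsub) - trapezoid(f, a, b, nsub)) / 3

    print(trapezoid(np.sin, 0, np.pi, 8), improved(np.sin, 0, np.pi, 8), 2.0)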
8. Obtain ∫₀¹ (1 + x)^{1/2} dx numerically using Gauss's method with a polynomial
of third degree, and compare the result with the true value.
9. Use Simpson's rule with four subintervals to evaluate

∫₂³ dx/log x.
CHAPTER 8
1. Obtain a Monte Carlo estimate of the value of the integral in Prob. 8, Chap. 7.
(Note that y > (1 + x)^{1/2} is equivalent to y² > 1 + x.)
INDEX
Based on a lecture course given in Oak Ridge for the University of Tennessee,
this volume concerns general topics of the solution of finite systems of linear
and nonlinear equations and the approximate representation of functions.
Specific chapters cover the art of computation, matrices and linear equations,
nonlinear equations and systems, the proper values and vectors of a matrix,
interpolation, more general methods of approximation, numerical integration
and differentiation, and the Monte Carlo method. The Graeffe process,
Bernoulli's method, polynomial interpolation, and the quadrature problem
receive special attention.
Each chapter contains bibliographic notes, and an extensive bibliography
appears at the end. A final section provides 54 problems, subdivided accord-
ing to chapter, for additional reinforcement.
Dover (2006) unabridged republication of the edition originally published by
McGraw-Hill Book Company, Inc., New York, 1953. 288pp. 5⅜ x 8½.
Paperbound.