Replacing Square Roots by Pythagorean Sums

Cleve Moler and Donald Morrison
An algorithm is presented for computing a “Pythagorean sum” a ⊕ b = √(a² + b²) directly from a and b without computing their squares or taking a square root. No destructive floating point overflows or underflows are possible. The algorithm can be extended to compute the Euclidean norm of a vector. The resulting subroutine is short, portable, robust, and accurate, but not as efficient as some other possibilities. The algorithm is particularly attractive for computers where space and reliability are more important than speed.
1. Introduction
It is generally accepted that “square root” is a fundamental operation in scientific computing. However, we suspect that square root is actually used most frequently as part of an even more fundamental operation which we call Pythagorean addition:

a ⊕ b = √(a² + b²).

The algebraic properties of Pythagorean addition are very similar to those of ordinary addition of positive numbers. Pythagorean addition is also the basis for many different computations:

Polar conversion:
r = x ⊕ y;

Complex modulus:
|z| = real(z) ⊕ imag(z);

Euclidean vector norm:
‖u‖ = u₁ ⊕ u₂ ⊕ ... ⊕ uₙ;

Givens rotations:
( c  s) (x)   (x ⊕ y)
(-s  c) (y) = (  0  ),   with c = x/(x ⊕ y) and s = y/(x ⊕ y).

The conventional Fortran construction

R = SQRT(X**2+Y**2)

may produce damaging underflows and overflows even though the data and the result are well within the range of the machine's floating point number system. Similar constructions in other programming languages may cause the same difficulties.
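As a concrete modern illustration, not taken from the paper: in IEEE double precision the naive formula overflows for operands near 10²⁰⁰ even though the true Pythagorean sum is easily representable. A minimal Python sketch:

import math

x = y = 1.0e200                      # well within the range of double precision
naive = math.sqrt(x * x + y * y)     # x*x overflows to infinity, so the result is inf
m = max(abs(x), abs(y))              # scaling by the larger operand avoids the overflow
safe = m * math.sqrt((x / m) ** 2 + (y / m) ** 2)

print(naive)                         # inf
print(safe)                          # approximately 1.4142135623730951e+200

Explicit scaling of this kind is the usual remedy; the pythag algorithm introduced below achieves the same robustness without any scaling and without a square root.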
The remedies currently employed in robust mathematical software lead to code which is clever, but unnatural, lengthy, possibly slow, and sometimes not portable. This is even true of the recently published approaches to the calculation of the Euclidean vector norm by Blue [1] and by the Basic Linear Algebra Subprograms group, Lawson et al. [2].

In this paper we present an algorithm pythag(a,b) which computes a ⊕ b directly from a and b, without squaring them and without taking any square roots. The result is robust, portable, short, and, we think, elegant. It is also potentially faster than a square root. We recommend that the algorithm be considered for implementation in machine language or microcode on future systems.
One of our first uses of pythag and the resulting Euclidean norm involved a graphics minicomputer which has a sophisticated Fortran-based operating system, but only about 32K bytes of memory available to the user. We implemented MATLAB [3], an interactive matrix calculator based on LINPACK and EISPACK. In this setting, the space occupied by both source and object code was crucial. MATLAB does matrix computations in complex arithmetic, so pythag is particularly useful. We are able to produce robust, portable software that uses the full range of the floating point exponent.

2. Algorithm pythag
The algorithm for computing pythag(a,b) = a ⊕ b is

real function pythag(a,b)
real a,b,p,q,r,s
p := max(|a|,|b|)
q := min(|a|,|b|)
while (q is numerically significant)
do
  r := (q/p)²
  s := r/(4+r)
  p := p+2*s*p
  q := s*q
od
pythag := p

The two variables p and q are initialized so that

p ⊕ q = a ⊕ b and 0 ≤ q ≤ p.

The main part of the algorithm is an iteration that leaves p ⊕ q invariant while increasing p and decreasing q. Thus when q becomes negligible, p holds the desired result. We show in Section 4 that the algorithm is cubically convergent and that it will never require more than three iterations on any computer with 20 or fewer significant digits. It is thus potentially faster than the classical quadratically convergent iteration for square root.

There are no square roots involved and, despite the title of this paper, the algorithm cannot be used to compute a square root. If either argument is zero, the result is the absolute value of the other argument.

Typical behavior of the algorithm is illustrated by pythag(4,3). The values of p and q after each iteration are
iteration    p                 q

0            4.000000000000    3.000000000000
1            4.986301369863    0.369863013698
2            4.999999974188    0.000508052633
3            5.000000000000    0.000000000001
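For readers who want to experiment, the following Python transcription is our sketch, not the authors' code; it uses the stopping test 4 + r ≐ 4 that is motivated in Section 5, and the names follow the pseudocode above.

def pythag(a, b):
    # Python sketch of the listing above (an illustration, not the original code).
    # The test "4.0 + r == 4.0" asks whether r is negligible relative to 4.
    p = max(abs(a), abs(b))
    q = min(abs(a), abs(b))
    if p == 0.0:
        return 0.0                   # pythag(0,0); avoids dividing by zero below
    while True:
        r = (q / p) ** 2
        if 4.0 + r == 4.0:           # q is no longer numerically significant
            return p
        s = r / (4.0 + r)
        p = p + 2.0 * s * p
        q = s * q

print(pythag(4.0, 3.0))              # approximately 5.0, after three passes through the loop
print(pythag(1.0e200, 1.0e200))      # approximately 1.414e200; no overflow occurs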
The most important feature of the algorithm is its robustness. There will be no overflows unless the final result overflows. In fact, no intermediate results larger than a ⊕ b are involved. There may be underflows if |b| is much smaller than |a|, but as long as such underflows are quietly set to zero, no harm will result in most cases.

There can be some deterioration in accuracy if both |a| and |b| are very near μ, the smallest positive floating point number. As an extreme example, suppose a = 4μ and b = 3μ. Then the iterates shown above should simply be scaled by μ. But the value of q after the first iteration would be less than μ and so would be set to zero. The process would terminate early with the corresponding value of p, which is an inaccurate, but not totally incorrect, result.

3. Euclidean vector norm
A primary motivation for our development of pythag is its use in computing the Euclidean norm or 2-norm of a vector. The conventional approach, which simply takes the square root of the sum of the squares of the components, disregards the possibility of underflow and overflow, thereby effectively halving the floating point exponent range. The approaches of Blue [1] and Lawson et al. [2] provide for the possibility of accumulating three sums, one of small numbers whose squares underflow, one of large numbers whose squares overflow, and one of “ordinary-sized” numbers. Environmental inquiries or machine- and accuracy-dependent constants are needed to separate the three classes.

With pythag available, computation of the 2-norm is easy:

real function norm2(x)
real vector x
real s
s := 0
for i := 1 to (number of elements in x)
  s := pythag(s,x(i))
norm2 := s
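A corresponding Python sketch (again ours, not the paper's, and relying on the pythag sketch shown after the table in Section 2) accumulates the norm one element at a time:

def norm2(x):
    # The partial result s never exceeds the final norm, so nothing
    # can overflow before the true result itself overflows.
    s = 0.0
    for xi in x:
        s = pythag(s, xi)
    return s

print(norm2([3.0e200, 4.0e200]))     # approximately 5e200; summing the squares would overflow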
This algorithm has all the characteristics that might be desired of it, except one. It is robust: there are no destructive underflows and no overflows unless the result must overflow. It is accurate: the round-off error corresponds to a few units in the last digit of each component of the vector. It is portable: there are no machine-dependent constants or environmental inquiries. It is short: both the source code and the object code require very little memory. It accesses each element of the vector only once, which is of some importance in virtual memory and other modern operating systems.

The only possible drawback is its speed. For a vector of length n, it requires n calls to pythag. Even if pythag were implemented efficiently, this is roughly the same as n square roots. The approaches of [1] and [2] require only n multiplications for the most frequent case where the squares of the vector elements do not underflow or overflow. However, in most of the applications we are aware of, speed is not a major consideration. In matrix calculations, for example, the Euclidean norm is usually required only in an outer loop. The time-determining calculations do not involve pythag. Thus, in our opinion, all the advantages outweigh this one disadvantage.

4. Convergence analysis
When the iteration in pythag is terminated and the final value of p accepted as the result, the relative error is

e = (p ⊕ q - p)/(p ⊕ q) = (√(1+r) - 1)/√(1+r),

where r = (q/p)². (We assume throughout this section that initially p and q are positive.)

The values of e and r are closely related, and the values of their reciprocals are even more closely related. In fact,

1/e = √(1+r)/r + 1 + 1/r.

Since 1 < √(1+r) < 1 + r/2, it follows that

2/r + 1 < 1/e < 2/r + 3/2.

Thus 1/e exceeds 2/r by at least 1 and at most 1.5.

It is convenient to let u = 4/r. The values of u taken in successive iterations are given by

u := u(u+3)².

The fact that u is roughly cubed by each iteration implies the cubic convergence of the algorithm. Since initially we have 0 < q ≤ p, it follows that

0 < r ≤ 1 and 4 ≤ u,

and u increases rapidly from the very beginning. If the initial value of p/q happens to be an integer, then u takes on integer values.

The most slowly convergent case has initial values p = q and r = 1. The iterated values of u are

iteration    0    1      2          3          4
u            4    196    7761796    >4×10²⁰    >10⁶²

It follows that after three iterations

e < r/2 = 2/u < 0.5×10⁻²⁰.

If the arithmetic were done exactly, after three iterations the value of p would agree with the true value of p ⊕ q to 20 decimal digits. If there were further iterations, each one would at least triple the number of correct digits. Initial values with q < p produce even more rapid convergence.
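The growth of u, and hence of the accuracy, is easy to check numerically. The short Python fragment below (ours, not part of the paper) iterates u := u(u+3)² from the slowest case u = 4 and prints the bound e < 2/u after each step; after three steps the bound is already below 0.5×10⁻²⁰.

u = 4                                # slowest case: p = q gives r = 1 and u = 4/r = 4
for k in range(5):
    print(k, float(u), 2.0 / u)      # u after k iterations and the error bound e < 2/u
    u = u * (u + 3) ** 2             # one pythag iteration roughly cubes u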
With quadratically convergent iterations such as the classical square root algorithm, it is often desirable to use special starting procedures to produce good initial approximations. Our choice of initial values with q ≤ p can be regarded as such a starting procedure since the algorithm will converge even without this condition. However, since the convergence is so rapid, it seems unlikely that any more elaborate starting mechanism would offer any advantage.

5. Round-off error and stopping criterion
In addition to being robust with respect to underflow and overflow, the performance of pythag in the presence of round-off error is quite satisfactory. It is possible to show that after each iteration the computed value of the variable p is the same as the value that would be obtained with exact computation on slightly perturbed starting values. The rapid convergence guarantees that there is no chance for excessive accumulation of rounding errors.

There are several possibilities for the stopping criterion, that is, for deciding when q is no longer “numerically significant.”

1. Take a fixed number of iterations.
The appropriate number depends upon the desired accuracy: two iterations for 6 or fewer significant digits, three for 20 or fewer. The choice therefore involves machine and precision dependence. Moreover, fewer iterations are necessary for pythag(a,b) with b much smaller than a.

2. Iterate until there is no change.
This can be implemented in a machine-independent manner with something like

ps := p
p := p+2*s*p
if p = ps then exit

This is probably the most foolproof criterion, but it always uses one extra iteration, just to confirm that the final iteration was not necessary.

3. Predict that there will be no change.
The idea is to do a simple calculation early in the step that will indicate whether or not the remainder of the step is necessary. If we use f(x) ≐ y to mean that the computed value of f(x) equals y, then the condition we wish to predict is

p + 2sp ≐ p.

When r is small, then s = r/(4+r) is less than and almost equal to r/4. Consequently, a sufficient and almost equivalent condition is

2 + r ≐ 2.

However, this is not quite true. Let β be the base of the floating point arithmetic. For a floating point number p in the range 1 ≤ p < β, the set of floating point numbers d for which

p + dp ≐ p

is not necessarily the same as the set of d for which

1 + d ≐ 1.

In other words, the conditions p + dp ≐ p and 1 + d ≐ 1 are precisely equivalent only when p is a power of β.

We have chosen to stop when

4 + r ≐ 4.

There are three reasons for this choice. The quantity 4 + r is available early in the step and is needed in computing s. The condition is almost equivalent to predicting no change in p. The variables p and q have already been somewhat contaminated by round-off error from previous steps.

The satisfactory error properties of pythag are inherited by norm2. It is possible to show that the computed value of norm2(x) is the exact Euclidean norm of some vector whose individual elements are within the round-off error of the corresponding elements of x.

6. Some related algorithms
It is possible to compute √(a² - b²) by replacing the statement

r := (q/p)²

in pythag with

r := -(q/p)².

The convergence analysis in Section 4 still applies, except that r and u take on negative values. In particular, when a = b, the initial value of u is -4 and this value does not change. The iteration becomes simply

p := p/3,
q := -q/3.

The variable p approaches zero as it should, but the convergence is only linear. If a ≠ b, the convergence is eventually cubic, but many iterations may be required to enter the cubic regime.
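As an illustration (ours, not from the paper), a Python sketch of this sign-flipped variant, assuming |b| ≤ |a| and using the same stopping test as before:

def pythag_minus(a, b):
    # Sketch of sqrt(a*a - b*b); assumes abs(b) <= abs(a), so that
    # p**2 - q**2 stays nonnegative and is left invariant by the update.
    p, q = abs(a), abs(b)
    while p != 0.0 and q != 0.0:
        r = -(q / p) ** 2            # the only change relative to pythag
        if 4.0 + r == 4.0:           # q is no longer numerically significant
            break
        s = r / (4.0 + r)
        p = p + 2.0 * s * p
        q = s * q
    return p

print(pythag_minus(5.0, 3.0))        # approximately 4.0
print(pythag_minus(13.0, 5.0))       # approximately 12.0

When a = b the update degenerates to p := p/3, q := -q/3, the slow linear case described above.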
A closely related iteration computes a square root directly:

function sqrt(z)
real z,p,r,s
p := 1
r := z-1
while (r is numerically significant)
do
  s := r/(4+r)
  p := p+2*s*p
  r := r*(s/(1+2*s))²
od
sqrt := p

Although this algorithm will converge for any positive z, it is most effective for values of z near 1. The algorithm can be derived from the approximation

√(1+r) ≈ (4+3r)/(4+r),

which is accurate to second order for small values of r. The classical quadratically convergent iteration for square root can be derived from the approximation

√(1+r) ≈ 1 + r/2,

which is accurate only to first order. The cubically convergent algorithm requires fewer iterations, but more operations per iteration. Consequently, its relative efficiency depends upon the details of the implementation.
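A Python transcription of this square root iteration (our sketch, assuming z > 0 and using the same negligibility test as before):

def cubic_sqrt(z):
    # Sketch of the listing above for z > 0.  The product p*sqrt(1+r)
    # equals sqrt(z) throughout, and r is roughly cubed once it is small.
    p, r = 1.0, z - 1.0
    while 4.0 + r != 4.0:
        s = r / (4.0 + r)
        p = p + 2.0 * s * p
        r = r * (s / (1.0 + 2.0 * s)) ** 2
    return p

print(cubic_sqrt(1.21))              # approximately 1.1, after very few passes
print(cubic_sqrt(100.0))             # approximately 10.0, after several more passes

One can check from the listing that each pass replaces r by r³/(4+3r)², which is only about r/9 when r is large; this is why the method is most effective for z near 1.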
The Euclidean norm of a vector can also be computed by a generalization of pythag(a,b) which allows, in place of the two-component argument (a,b), a vector argument with any number of components:

vector_pythag(x)
real vector x,q
real p,r,s,t
p := (any nonzero component of x, preferably the largest)
q := (x with p deleted)
while (q is numerically significant)
do
  r := (dot product of q/p with itself)
  s := r/(4+r)
  p := p+2*s*p
  q := s*q
od
vector_pythag := p
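A Python sketch of this vector form (again ours, choosing the largest component as p and using the same stopping test):

def vector_pythag(x):
    # Sketch of the generalization above; assumes x is non-empty.
    # p carries the growing estimate, q the remaining components, and
    # p**2 + (sum of squares of q) stays invariant, as p**2 + q**2 does in pythag.
    vals = [float(v) for v in x]
    p = max(vals, key=abs)               # preferably the largest component
    if p == 0.0:
        return 0.0
    q = list(vals)
    q.remove(p)                          # "x with p deleted"
    while True:
        r = sum((v / p) ** 2 for v in q) # dot product of q/p with itself
        if 4.0 + r == 4.0:
            return abs(p)
        s = r / (4.0 + r)
        p = p + 2.0 * s * p
        q = [s * v for v in q]

print(vector_pythag([3.0e200, 4.0e200, 12.0e200]))   # approximately 1.3e201, with no overflow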
References
1. J. L. Blue, “A Portable Fortran Program to Find the Euclidean Norm of a Vector,” ACM Trans. Math. Software 4, 15-23 (1978).
2. C. L. Lawson, R. J. Hanson, D. R. Kincaid, and F. T. Krogh, “Basic Linear Algebra Subprograms for Fortran Usage,” ACM Trans. Math. Software 5, 308-323 (1979).
3. Cleve Moler, “MATLAB Users' Guide,” Technical Report CS81-1, Department of Computer Science, University of New Mexico, Albuquerque.

Received June 6, 1983; revised July 15, 1983

Cleve B. Moler  Department of Computer Science, University of New Mexico, Albuquerque, New Mexico 87131. Professor Moler has been with the University of New Mexico since 1972. He is currently chairman of the Department of Computer Science. His research interests include numerical analysis, mathematical software, and scientific computing. He received his Ph.D. in mathematics from Stanford University, California, in 1965 and taught at the University of Michigan from 1966 to 1972. Professor Moler is a member of the Association for Computing Machinery and the Society for Industrial and Applied Mathematics.