
A Unified Algorithm For Elementary Functions

This paper describes a single unified algorithm for calculating elementary functions using coordinate rotation. The algorithm is based on rotating vectors in linear, circular, or hyperbolic coordinate systems depending on the function. Only basic operations like shifting, adding, subtracting and using prestored constants are required. The algorithm's limited domain of convergence is analyzed, along with modifications to extend it for floating point calculations. The paper also describes an implemented floating point processor using this algorithm, including its block diagram, microprogram control, and performance measures.

A unified algorithm for elementary functions

by J. S. WALTHER
Hewlett-Packard Company
Palo Alto, California

SUMMARY

This paper describes a single unified algorithm for the calculation of elementary functions including multiplication, division, sin, cos, tan, arctan, sinh, cosh, tanh, arctanh, ln, exp and square-root. The basis for the algorithm is coordinate rotation in a linear, circular, or hyperbolic coordinate system depending on which function is to be calculated. The only operations required are shifting, adding, subtracting and the recall of prestored constants. The limited domain of convergence of the algorithm is calculated, leading to a discussion of the modifications required to extend the domain for floating point calculations.

A hardware floating point processor using the algorithm was built at Hewlett-Packard Laboratories. The block diagram of the processor, the microprogram control used for the algorithm, and measures of actual performance are shown.

INTRODUCTION

The use of coordinate rotation to calculate elementary functions is not new. In 1956 Volder developed a class of algorithms for the calculation of trigonometric and hyperbolic functions, including exponential and logarithm. In 1959 he described a COordinate Rotation DIgital Computer (CORDIC) for the calculation of trigonometric functions, multiplication, division, and conversion between binary and mixed radix number systems. Daggett in 1959 discussed the use of the CORDIC for decimal-binary conversions. In 1968 Liccardo did a master's thesis on the class of CORDIC algorithms.

It is not generally realized that many of these algorithms can be merged into one unified algorithm.

COORDINATE SYSTEMS

Let us consider coordinate systems parameterized by m in which the radius R and angle A of the vector P = (x, y) shown in Figure 1 are defined as

    $R = [x^2 + m y^2]^{1/2}$    (1)
    $A = m^{-1/2} \tan^{-1}[m^{1/2} y/x]$    (2)

It can be shown that R is the distance from the origin to the intersection of the curve of constant radius with the x axis, while A is twice the area enclosed by the vector, the x axis, and the curve of constant radius, divided by the radius squared. The curves of constant radius for the circular (m = 1), linear (m = 0), and hyperbolic (m = -1) coordinate systems are shown in Figure 1.

ITERATION EQUATIONS

Let a new vector $P_{i+1} = (x_{i+1}, y_{i+1})$ be obtained from $P_i = (x_i, y_i)$ according to

    $x_{i+1} = x_i + m y_i \delta_i$    (3)
    $y_{i+1} = y_i - x_i \delta_i$    (4)

where m is the parameter for the coordinate system, and $\delta_i$ is an arbitrary value. The angle and radius of the new vector in terms of the old are given by

    $A_{i+1} = A_i - \alpha_i$    (5)
    $R_{i+1} = R_i K_i$    (6)

where

    $\alpha_i = m^{-1/2} \tan^{-1}[m^{1/2} \delta_i]$    (7)
    $K_i = [1 + m \delta_i^2]^{1/2}$    (8)

The angle and radius are modified by quantities which are independent of the coordinate values. Table I gives the equations for $\alpha_i$ and $K_i$ after applying identities A2 and A5 from the appendix.

For n iterations we find

    $A_n = A_0 - \alpha$    (9)
    $R_n = R_0 K$    (10)
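As a numerical illustration of the iteration equations, the following minimal Python sketch applies one step of the form x + m*y*delta, y - x*delta (the form implied by the closed-form solutions (14) and (15) and by the normalized equations (38) and (39)) and measures how far the result departs from relations (5) and (6). The function names and the sample values are illustrative assumptions, not part of the paper.

```python
import math

def step(x, y, m, delta):
    """One iteration in the form of equations (3) and (4)."""
    return x + m * y * delta, y - x * delta

def radius(x, y, m):
    """R as in equation (1)."""
    return math.sqrt(x * x + m * y * y)

def angle(x, y, m):
    """A as in equation (2), written out separately for m = 1, 0, -1."""
    t = y / x
    return math.atan(t) if m == 1 else t if m == 0 else math.atanh(t)

def increments(m, delta):
    """(alpha_i, K_i) as in equations (7) and (8)."""
    a = math.atan(delta) if m == 1 else delta if m == 0 else math.atanh(delta)
    return a, math.sqrt(1 + m * delta * delta)

# Sample vector and rotation size (arbitrary illustrative values).
x0, y0, delta = 1.0, 0.3, 0.25
checks = []
for m in (1, 0, -1):
    x1, y1 = step(x0, y0, m, delta)
    a, k = increments(m, delta)
    angle_error = angle(x1, y1, m) - (angle(x0, y0, m) - a)      # relation (5)
    radius_error = radius(x1, y1, m) - radius(x0, y0, m) * k     # relation (6)
    checks.append((angle_error, radius_error))
```

Each pair in `checks` should be zero up to floating point rounding, for the circular, linear and hyperbolic cases alike.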


From the collection of the Computer History Museum (www.computerhistory.org)


380 Spring Joint Computer Conference, 1971

Figure 1-Angle A and Radius R of the vector P=(x, y)

In equations (9) and (10),

    $\alpha = \sum_{i=0}^{n-1} \alpha_i$    (11)
    $K = \prod_{i=0}^{n-1} K_i$    (12)

The total change in angle is just the sum of the incremental changes while the total change in radius is the product of the incremental changes.

If a third variable z is provided for the accumulation of the angle variations

    $z_{i+1} = z_i + \alpha_i$    (13)

and the set of difference equations (3), (4), and (13) is solved for n iterations, we find

    $x_n = K \{ x_0 \cos(\alpha m^{1/2}) + y_0 m^{1/2} \sin(\alpha m^{1/2}) \}$    (14)
    $y_n = K \{ y_0 \cos(\alpha m^{1/2}) - x_0 m^{-1/2} \sin(\alpha m^{1/2}) \}$    (15)
    $z_n = z_0 + \alpha$    (16)

where $\alpha$ and K are as in equations (11) and (12).

TABLE I-Angles and Radius Factors

    Coordinate System m    Angle $\alpha_i$          Radius Factor $K_i$
     1                     $\tan^{-1} \delta_i$      $(1 + \delta_i^2)^{1/2}$
     0                     $\delta_i$                $1$
    -1                     $\tanh^{-1} \delta_i$     $(1 - \delta_i^2)^{1/2}$

These relations are summarized in Figure 2 for m = 1, m = 0 and m = -1 for the following special cases.

1. A is forced to zero: $y_n = 0$.
2. z is forced to zero: $z_n = 0$.

The initial values $x_0$, $y_0$, $z_0$ are shown on the left of each block in the figure while the final values $x_n$, $y_n$, $z_n$ are shown on the right. The identities given in the appendix were used to simplify these results. By the proper choice of the initial values the functions $xz$, $y/x$, $\sin z$, $\cos z$, $\tan^{-1} y$, $\sinh z$, $\cosh z$, and $\tanh^{-1} y$ may be obtained. In addition the following functions may be generated.

    $\tan z = \sin z / \cos z$    (17)
    $\tanh z = \sinh z / \cosh z$    (18)
    $\exp z = \sinh z + \cosh z$    (19)
    $\ln w = 2 \tanh^{-1}[y/x]$ where $x = w + 1$ and $y = w - 1$    (20)
    $(w)^{1/2} = (x^2 - y^2)^{1/2}$ where $x = w + 1/4$ and $y = w - 1/4$    (21)

CONVERGENCE SCHEME

The angle A of the vector P may be forced to zero by a converging sequence of rotations $\alpha_i$ which at each step brings the vector closer to the positive x axis.

Figure 2-Input-output functions for CORDIC modes

    CIRCULAR (m = 1), z -> 0:      $x_n = K_1 (x \cos z - y \sin z)$,  $y_n = K_1 (y \cos z + x \sin z)$,  $z_n = 0$
    CIRCULAR (m = 1), A -> 0:      $x_n = K_1 (x^2 + y^2)^{1/2}$,  $y_n = 0$,  $z_n = z + \tan^{-1}(y/x)$
    LINEAR (m = 0), z -> 0:        $x_n = x$,  $y_n = y + xz$,  $z_n = 0$
    LINEAR (m = 0), A -> 0:        $x_n = x$,  $y_n = 0$,  $z_n = z + y/x$
    HYPERBOLIC (m = -1), z -> 0:   $x_n = K_{-1} (x \cosh z + y \sinh z)$,  $y_n = K_{-1} (y \cosh z + x \sinh z)$,  $z_n = 0$
    HYPERBOLIC (m = -1), A -> 0:   $x_n = K_{-1} (x^2 - y^2)^{1/2}$,  $y_n = 0$,  $z_n = z + \tanh^{-1}(y/x)$

    $K_1 = \prod_{j=0}^{n-1} (1 + \delta_j^2)^{1/2}$ and $K_{-1} = \prod_{j=0}^{n-1} (1 - \delta_j^2)^{1/2}$ for n iterations
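The derived functions (19), (20) and (21) can be checked numerically. In the short Python sketch below the library calls `math.sinh`, `math.cosh`, `math.atanh` and `math.sqrt` merely stand in for the hyperbolic CORDIC outputs; the function names are illustrative assumptions.

```python
import math

def exp_from_hyperbolic(z):
    """exp z = sinh z + cosh z, equation (19)."""
    return math.sinh(z) + math.cosh(z)

def ln_from_atanh(w):
    """ln w = 2 atanh(y/x) with x = w + 1, y = w - 1, equation (20); needs w > 0."""
    x, y = w + 1.0, w - 1.0
    return 2.0 * math.atanh(y / x)

def sqrt_from_hyperbolic_radius(w):
    """sqrt(w) = (x^2 - y^2)^(1/2) with x = w + 1/4, y = w - 1/4, equation (21)."""
    x, y = w + 0.25, w - 0.25
    return math.sqrt(x * x - y * y)

# Differences from the library functions for a few sample arguments.
samples = (0.3, 0.75, 2.0)
errors = [
    (exp_from_hyperbolic(w) - math.exp(w),
     ln_from_atanh(w) - math.log(w),
     sqrt_from_hyperbolic_radius(w) - math.sqrt(w))
    for w in samples
]
```

The choice x = w + 1/4, y = w - 1/4 works because x^2 - y^2 = (x + y)(x - y) = 2w * 1/2 = w.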


The magnitude of each element of the sequence may be predetermined, but the direction of rotation must be determined at each step such that

    $|A_{i+1}| = \bigl| |A_i| - \alpha_i \bigr|$    (22)

The sum of the remaining rotations must at each step be sufficient to bring the angle to at least within $\alpha_{n-1}$ of zero, even in the extreme case where $A_i = 0$, $|A_{i+1}| = \alpha_i$. Thus,

    $\alpha_i - \sum_{j=i+1}^{n-1} \alpha_j < \alpha_{n-1}$    (23)

The domain of convergence is limited by the sum of the rotations.

    $|A_0| - \sum_{j=0}^{n-1} \alpha_j < \alpha_{n-1}$    (24)
    $\max |A_0| = \alpha_{n-1} + \sum_{j=0}^{n-1} \alpha_j$    (25)

To show that A converges to within $\alpha_{n-1}$ of zero within n steps we first prove the following theorem.

Theorem

    $|A_i| < \alpha_{n-1} + \sum_{j=i}^{n-1} \alpha_j$    (26)

holds for $i \ge 0$.

Proof

We proceed by induction on i. The hypothesis (26) holds for i = 0 by (24). We now show that if the hypothesis is true for i then it is also true for i+1. Subtracting $\alpha_i$ from (26) and applying (23) at the left side yields

    $-\Bigl[\alpha_{n-1} + \sum_{j=i+1}^{n-1} \alpha_j\Bigr] < |A_i| - \alpha_i < \Bigl[\alpha_{n-1} + \sum_{j=i+1}^{n-1} \alpha_j\Bigr]$    (27)

Application of (22) then yields

    $|A_{i+1}| < \alpha_{n-1} + \sum_{j=i+1}^{n-1} \alpha_j$    (28)

as was to be shown. Therefore, by induction, the hypothesis holds for all $i \ge 0$.

In particular, the theorem is true for i = n so that

    $|A_n| < \alpha_{n-1}$    (29)

The same scheme may be used to force the angle in z to zero. The proof of convergence proceeds exactly as before except that A is replaced by z in equations (22) through (29). By equation (25) z has the same domain of convergence as A.

    $\max |z_0| = \max |A_0|$    (30)

Note that since K is a function of $\delta_i^2$, where $\delta_i = m^{-1/2} \tan[m^{1/2} \alpha_i]$, K is independent of the sequence of signs chosen for the $\alpha_i$. Thus, for a fixed sequence of $\alpha_i$ magnitudes the constant 1/K may be used as an initial value to counteract the factor K present in the final values.

USE OF SHIFTERS

The practical use of the algorithm is based on the use of shifters to effect the multiplication by $\delta_i$. If $\rho$ is the radix of the number system and $F_i$ is an array of integers, where $i \ge 0$, then a multiplication of x by

    $\delta_i = \rho^{-F_i}$    (31)

is simply a shift of x by $F_i$ places to the right. The integers $F_i$ must be chosen such that the angles

    $\alpha_i = m^{-1/2} \tan^{-1}[m^{1/2} \rho^{-F_i}]$    (32)

satisfy the convergence criterion (23). The domain of convergence is then given by (25).

Table II shows some F sequences, convergence domains, and radius factors for a binary code.

TABLE II-Shift Sequences for a binary code

    radix    coordinate    shift sequence                  domain of convergence    radius factor
    $\rho$   system m      $F_{m,i}$; $i \ge 0$            $\max |A_0|$             K
    2         1            0, 1, 2, 3, 4, i, ...           ≈1.74                    ≈1.65
    2         0            1, 2, 3, 4, 5, i+1, ...         1.0                      1.0
    2        -1            1, 2, 3, 4, 4, 5, ... *         ≈1.13                    ≈0.80

    * for m = -1 the following integers are repeated: {4, 13, 40, 121, ..., k, 3k+1, ...}

The hyperbolic mode (m = -1) is somewhat complicated by the fact that for $\alpha_i = \tanh^{-1}(2^{-i})$ the convergence criterion (23) is not satisfied. However, it can be shown that

    $\alpha_i - \Bigl(\sum_{j=i+1}^{n-1} \alpha_j\Bigr) - \alpha_{3i+1} < \alpha_{n-1}$    (33)

and that therefore if the integers {4, 13, 40, 121, ..., k, 3k+1, ...} in the $F_i$ sequence are repeated then (23) becomes true.
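The scheme described so far can be sketched end to end in Python. The sketch below is a floating point model under stated assumptions, not the processor's implementation: it uses the binary shift sequences of Table II, chooses the sign of each rotation so that either z or y is driven toward zero, and replaces each hardware shift by a multiplication with 2**-F. The names `shift_sequence`, `angle`, `radius_factor`, `cordic` and the `mode` strings are hypothetical.

```python
import math

def shift_sequence(m, n):
    """First n shifts F_i for a binary code, following Table II."""
    if m == 1:
        return list(range(n))                 # 0, 1, 2, 3, ...
    if m == 0:
        return list(range(1, n + 1))          # 1, 2, 3, 4, ...
    seq, k, repeat = [], 1, 4                 # m == -1: 1, 2, 3, 4, 4, 5, ...
    while len(seq) < n:
        seq.append(k)
        if k == repeat:                       # repeat 4, 13, 40, ..., k, 3k+1, ...
            seq.append(k)
            repeat = 3 * repeat + 1
        k += 1
    return seq[:n]

def angle(m, F):
    """Rotation angle for delta = 2**-F (atan, identity or atanh, as in Table I)."""
    d = 2.0 ** -F
    return math.atan(d) if m == 1 else d if m == 0 else math.atanh(d)

def radius_factor(m, n):
    """K: product of (1 + m * delta_i**2) ** 0.5 over the first n shifts."""
    return math.prod(math.sqrt(1.0 + m * 4.0 ** -F) for F in shift_sequence(m, n))

def cordic(x, y, z, m, mode, n=40):
    """Run n iterations; mode "rotate" forces z to zero, "vector" forces y to zero."""
    for F in shift_sequence(m, n):
        if mode == "rotate":
            u = -1.0 if z >= 0 else 1.0       # sign that shrinks |z|
        else:
            u = 1.0 if x * y >= 0 else -1.0   # sign that shrinks |y|
        d = u * 2.0 ** -F                     # a multiplication here, a shift in hardware
        x, y, z = x + m * y * d, y - x * d, z + u * angle(m, F)
    return x, y, z

N = 40
c, s, _ = cordic(1.0 / radius_factor(1, N), 0.0, 0.5, 1, "rotate", N)      # cos 0.5, sin 0.5
ch, sh, _ = cordic(1.0 / radius_factor(-1, N), 0.0, 0.5, -1, "rotate", N)  # cosh 0.5, sinh 0.5
_, p, _ = cordic(0.75, 0.0, 0.5, 0, "rotate", N)                           # 0.75 * 0.5
_, _, q = cordic(1.0, 0.25, 0.0, 0, "vector", N)                           # 0.25 / 1.0
_, _, t = cordic(1.0, 0.5, 0.0, 1, "vector", N)                            # atan 0.5
_, _, h = cordic(1.0, 0.5, 0.0, -1, "vector", N)                           # atanh 0.5
```

Starting x at 1/K cancels the radius factor, as the text notes. All sample arguments lie inside the convergence domains listed in Table II; arguments outside those domains are not handled by this sketch.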


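TABLE III-Prescaling Identities

Each entry gives the identity, the domain of the reduced argument, and the domain of convergence of the algorithm.

    Identity:  $\sin(Q \tfrac{\pi}{2} + D) = \sin D$ if Q mod 4 = 0;  $\cos D$ if Q mod 4 = 1;  $-\sin D$ if Q mod 4 = 2;  $-\cos D$ if Q mod 4 = 3
    Domain:  $|D| < \tfrac{\pi}{2} = 1.57$        Domain of convergence:  1.74

    Identity:  $\cos(Q \tfrac{\pi}{2} + D) = \cos D$ if Q mod 4 = 0;  $-\sin D$ if Q mod 4 = 1;  $-\cos D$ if Q mod 4 = 2;  $\sin D$ if Q mod 4 = 3
    Domain:  $|D| < \tfrac{\pi}{2} = 1.57$        Domain of convergence:  1.74

    Identity:  $\tan(Q \tfrac{\pi}{2} + D) = \sin(Q \tfrac{\pi}{2} + D) / \cos(Q \tfrac{\pi}{2} + D)$
    Domain:  $|D| < \tfrac{\pi}{2} = 1.57$        Domain of convergence:  1.74

    Identity:  $\tan^{-1}(1/y) = \tfrac{\pi}{2} - \tan^{-1}(y)$
    Domain:  $|y| < 1.0$        Domain of convergence:  $\infty$

    Identity:  $\sinh(Q \log_e 2 + D) = \tfrac{2^Q}{2} [\cosh D + \sinh D - 2^{-2Q} (\cosh D - \sinh D)]$
    Domain:  $|D| < \log_e 2 = 0.69$        Domain of convergence:  1.13

    Identity:  $\cosh(Q \log_e 2 + D) = \tfrac{2^Q}{2} [\cosh D + \sinh D + 2^{-2Q} (\cosh D - \sinh D)]$
    Domain:  $|D| < \log_e 2 = 0.69$        Domain of convergence:  1.13

    Identity:  $\tanh(Q \log_e 2 + D) = \sinh(Q \log_e 2 + D) / \cosh(Q \log_e 2 + D)$
    Domain:  $|D| < \log_e 2 = 0.69$        Domain of convergence:  1.13

    Identity:  $\tanh^{-1}(1 - M 2^{-E}) = \tanh^{-1}(T) + (E/2) \log_e 2$, where $T = (2 - M - M 2^{-E}) / (2 + M - M 2^{-E})$ for $0.5 \le M < 1$, $E \ge 1$
    Domain:  $0.17 < T < 0.75$        Domain of convergence:  (-0.81, 0.81)

    Identity:  $\exp(Q \log_e 2 + D) = 2^Q (\cosh D + \sinh D)$
    Domain:  $|D| < \log_e 2 = 0.69$        Domain of convergence:  1.13

    Identity:  $\log_e(M 2^E) = \log_e M + E \log_e 2$
    Domain:  $0.5 \le M < 1.0$        Domain of convergence:  (0.10, 9.58)

    Identity:  $\mathrm{sqrt}(M 2^E) = 2^{E/2} \, \mathrm{sqrt}(M)$ if E mod 2 = 0;  $2^{(E+1)/2} \, \mathrm{sqrt}(M/2)$ if E mod 2 = 1
    Domain:  $0.5 \le M < 1.0$ if E mod 2 = 0;  $0.25 \le M/2 < 0.5$ if E mod 2 = 1        Domain of convergence:  (0.03, 2.42)

    Identity:  [illegible]
    Domain:  $0.5 \le |M| < 1.0$ (subscript illegible)        Domain of convergence:  (-1.0, 1.0)

    Identity:  [illegible]
    Domain:  $0.25 \le |M_y / 2M_z| < 1.0$        Domain of convergence:  (-1.0, 1.0)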
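A few of the prescaling identities in Table III can be exercised with a short Python sketch. Here the library functions `math.sin`, `math.cos`, `math.sinh`, `math.cosh`, `math.log` and `math.sqrt` stand in for the CORDIC kernel evaluated on the reduced argument, and `math.frexp` is assumed as a convenient way to split a number into a fraction M with 0.5 <= M < 1 and a binary exponent E. The function names are hypothetical.

```python
import math

HALF_PI = math.pi / 2
LN2 = math.log(2.0)

def sin_prescaled(a):
    """sin(Q*pi/2 + D): pick +/- sin D or +/- cos D according to Q mod 4."""
    q = math.floor(a / HALF_PI)
    d = a - q * HALF_PI                       # remainder, 0 <= d < pi/2 up to rounding
    return (math.sin(d), math.cos(d), -math.sin(d), -math.cos(d))[q % 4]

def exp_prescaled(a):
    """exp(Q*ln2 + D) = 2**Q * (cosh D + sinh D)."""
    q = math.floor(a / LN2)
    d = a - q * LN2                           # remainder, 0 <= d < ln 2 up to rounding
    return math.ldexp(math.cosh(d) + math.sinh(d), q)

def ln_prescaled(w):
    """ln(M * 2**E) = ln M + E * ln 2, with 0.5 <= M < 1; needs w > 0."""
    m, e = math.frexp(w)
    return math.log(m) + e * LN2

def sqrt_prescaled(w):
    """sqrt(M * 2**E): use M for even E and M/2 for odd E; needs w > 0."""
    m, e = math.frexp(w)
    if e % 2 == 0:
        return math.ldexp(math.sqrt(m), e // 2)
    return math.ldexp(math.sqrt(m / 2.0), (e + 1) // 2)
```

For example, `sin_prescaled(100.0)` reduces the argument with Q = 63, so the result is taken as minus the cosine of the remainder.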

EXTENDING THE DOMAIN

The limited domain imposed by the convergence criterion (25) may be extended by means of the prescaling identities shown in Table III. For example, to calculate the sine of a large argument, we first divide the argument by $\pi/2$ obtaining a quotient Q and a remainder D where $|D| < \pi/2$. The table shows that only sin D or cos D need be calculated and that $\pi/2$ is within the domain of convergence. Note that the sine and cosine can be generated simultaneously by the CORDIC algorithm and that the answer may then be chosen as plus or minus one of these according to Q mod 4. As a second example, to calculate the logarithm of a large argument we first shift the argument's binary point E places until it is just to the left of the most significant non-zero bit. The fraction M then satisfies $0.5 \le M < 1.0$ and as shown in the table therefore falls within the domain of convergence. The answer is calculated as $\log_e M + E \log_e 2$.

ACCURACY

The accuracy at the nth step is determined in theory by the size of the last of the converging sequence of rotations $\alpha_i$, and for large n is approximately equal in digits to $F_{n-1}$. The accuracy in digits may conveniently


be made equal to L, the length of storage used for each variable, by choosing n such that $F_{n-1} = L$.

In practice the accuracy is limited by the finite length of storage. The truncation of input arguments performed to make them fit within the storage length gives rise to unavoidable error, the size of which depends on the sensitivity of the calculated function to small changes in the input argument. In a binary code, the truncation of intermediate results after each of L iterations gives rise to a total of at most $\log_2 L$ bits of error. This latter error can be rendered harmless by using $L + \log_2 L$ bits for the storage of intermediate results.

In a normalized floating point number system it is desirable that all L bits of the result be accurate, independent of the absolute size of the argument. To accomplish this for very small arguments it is necessary to keep each storage register in a normalized form; i.e., in a form where there are no leading zeros. It is possible to do this by transforming the iteration equations (3), (4), (13) to a normalized form according to the following substitutions.

    $x$ becomes $x'$    (34)
    $y$ becomes $y' 2^{-E}$    (35)
    $z$ becomes $z' 2^{-E}$    (36)
    $\alpha_F$ becomes $\alpha'_F 2^{-F}$    (37)

where E, a positive integer, is chosen such that the initial argument, placed into either the y or z register, is normalized.

The result of the substitutions is

    $x' \leftarrow x' + m y' 2^{-(F+E)}$    (38)
    $y' \leftarrow y' - x' 2^{-(F-E)}$    (39)
    $z' \leftarrow z' + \alpha'_F 2^{-(F-E)}$    (40)

For simplicity the subscripts i and i+1 have been dropped. Instead, $\alpha$ has been expressed as a function of F as in equation (32), and the replacement operator ($\leftarrow$) has been used. i may be initialized to a value such that $F_i = E$:

    $F_{i_{initial}} = E$    (41)

and n may be chosen such that L significant bits are obtained:

    $F_n - E = L$    (42)

Note that $n - i_{initial} = L$ and that therefore providing $L + \log_2 L$ bits for the storage of intermediate results is still adequate.

The radius factor K is now a function of $i = i_{initial}$ as well as m.

    $K_{m,i} = \prod_{j=i}^{n-1} (1 + m 2^{-2F_j})^{1/2}$    (43)

Figure 3-Hardware block diagram (shifter control and adder control; three adder/subtracters forming the +mu, -u and +u terms; decision signals: sign of y, sign of z; read-only memory holding the constants $\alpha_{m,F}$)

Fortunately, not all the reciprocal constants $1/K_{m,i}$ need to be stored since for large values of i

    $\frac{1}{K_{m,i}} = 1 - m (\tfrac{2}{3}) 2^{-2F_i}$    (44)

and therefore all the constants having i > L/2 are identical to within L significant bits. Therefore, only L/2 constants need to be stored for m = +1 and also for m = -1. For m = 0 no constants need to be stored since $K_{0,i} = 1$ for $i \ge 1$.

A similar savings in storage can be made for the angle constants $\alpha_{m,F}$ since for large values of F

    $\alpha'_{m,F} = \alpha_{m,F} 2^F = 1 - m (\tfrac{1}{3}) 2^{-2F}$    (45)

and thus, as for the K constants, only L/2 constants need to be stored for m = +1 and also for m = -1. For m = 0 no constants need to be stored since $\alpha'_{0,F} = 1$ for $F \ge 1$.
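The large-F behaviour of the scaled angle constants can be checked numerically. The Python sketch below compares alpha * 2**F, computed with library arctangent functions, against the two-term form 1 - m*(1/3)*2**(-2F) that follows from the series arctan(d) = d - d**3/3 + ... The word length L = 40 and the function names are illustrative assumptions.

```python
import math

def scaled_angle(m, F):
    """alpha * 2**F for delta = 2**-F, with alpha taken from atan, identity or atanh."""
    d = 2.0 ** -F
    a = math.atan(d) if m == 1 else d if m == 0 else math.atanh(d)
    return a * 2.0 ** F

def two_term_form(m, F):
    """1 - m * (1/3) * 2**(-2F), the approximation for large F."""
    return 1.0 - m * (1.0 / 3.0) * 2.0 ** (-2 * F)

L = 40  # assumed number of significant bits
worst = max(
    abs(scaled_angle(m, F) - two_term_form(m, F))
    for m in (1, 0, -1)
    for F in range(L // 2 + 1, L + 1)
)
```

For every F above L/2 the difference stays far below 2**-L, which is why only about L/2 distinct constants need to be stored per coordinate system.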


Figure 4-Flowchart of the microprogram control (m = {+1, 0, -1}; branches u = 1 and u = -1; end test)

HARDWARE IMPLEMENTATION

A hardware floating point processor based on the CORDIC algorithm has been built at Hewlett-Packard Laboratories. Figure 3 shows a block diagram of the processor which consists of three identical arithmetic units operated in parallel. Each arithmetic unit contains a 64-bit register, an 8-bit parallel adder/subtracter, and an 8-out-of-48 multiplex shifter. The assembly of arithmetic units is controlled by a microprogram stored in a read-only memory (ROM), which also contains the angle and radius-correction constants. The ROM contains 512 words of 48 bits each and operates on a cycle time of 200 nanoseconds.

The processor accepts three data types: 48-bit floating point, 32-bit floating point, and 32-bit integer. All the functions are calculated to 40 bits of precision (approximately 12 decimal digits), and the accuracy is limited only by the truncation of input arguments.

The essential aspects of the microprogram used to execute the CORDIC algorithm are shown in Figure 4. The initial argument and correction constants are loaded into the three registers and m is set to one of the three values 1, 0, -1. If the initial argument is small, it is normalized and E is set to minus the binary exponent of the result; otherwise, E is set to zero. Next, i is initialized to a value such that $F_{m,i} = E$. A loop is then entered and is repeated until $F_{m,i} - E = L$. In this loop the direction of rotation necessary to force either of the angles A or z to zero is chosen; the binary variable u, used to control the three adder/subtracters, is set to either +1 or -1; and the iteration equations are executed.

Table IV gives a breakdown of the maximum execution times for the most important functions. The figures in the column marked "data transfers from computer" are the times for operand and operation code transfers between the processor and an HP-2116 computer.

The processor retains the result of each executed function. Thus, add, subtract, multiply and divide require only one additional operand to be supplied, and the one operand functions do not require any operand transfers. The first operand is loaded via the LOAD instruction, and the final result is retrieved via the STORE instruction.

TABLE IV-Maximum Execution Times

                    CORDIC       PRESCALE,      DATA TRANSFERS
                    EXECUTION    NORMALIZE,     FROM COMPUTER     TOTAL
    ROUTINE         μsec         MISC. μsec     μsec              μsec

    LOAD              0             5             25                30
    STORE             0             0             15                15

    ADD               0            15             25                40
    SUBTRACT          0            25             25                50
    MULTIPLY         60            15             25               100
    DIVIDE           60            15             25               100

    SIN              70            85              5               160
    COS              70            85              5               160
    TAN             130            85              5               220
    ATAN             70            15              5                90
    SINH             70            55              5               130
    COSH             70            55              5               130
    TANH            130            55              5               190
    ATANH            70            45              5               120
    EXPONENTIAL      70            55              5               130
    LOGARITHM        70            45              5               120
    SQUARE-ROOT      70            25              5               100
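The shift-and-add character of the loop can be sketched in integer arithmetic. The Python sketch below covers only the circular case (m = 1) with z forced to zero, uses an assumed 40-bit fixed point fraction, and omits the normalization by E and the prescaling described earlier. It is not the processor's microprogram; `FRAC`, `ATAN_ROM`, `K_INV` and `sin_cos_fixed` are hypothetical names.

```python
import math

FRAC = 40                      # assumed number of fraction bits
ONE = 1 << FRAC

def to_fixed(v):
    """Round a float to the assumed fixed point format."""
    return int(round(v * ONE))

# Prestored constants, playing the role of the ROM contents.
ATAN_ROM = [to_fixed(math.atan(2.0 ** -F)) for F in range(FRAC)]
K_INV = to_fixed(1.0 / math.prod(math.sqrt(1.0 + 4.0 ** -F) for F in range(FRAC)))

def sin_cos_fixed(angle):
    """Return (sin, cos) of an angle inside the circular convergence domain."""
    x, y, z = K_INV, 0, to_fixed(angle)      # x starts at 1/K to cancel the radius factor
    for F in range(FRAC):
        u = -1 if z >= 0 else 1              # direction that drives z toward zero
        # Three add/subtract operations; >> is an arithmetic right shift in Python.
        x, y, z = x + u * (y >> F), y - u * (x >> F), z + u * ATAN_ROM[F]
    return y / ONE, x / ONE

s_fixed, c_fixed = sin_cos_fixed(0.5)
```

Inside the loop the only operations are a sign test, two shifts, three additions or subtractions, and one constant lookup, which mirrors the three parallel adder/subtracters controlled by u.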


CONCLUSION

The unified CORDIC algorithm is attractive for the calculation of elementary functions because of its simplicity, its accuracy, and its capability for high speed execution via parallel processing. Its applications include desktop calculators, as in the HP-9100 series; air navigation computers, as described in Volder's original work; and floating point processors, as illustrated in this paper.

ACKNOWLEDGMENTS

The author wishes to thank the many people at Hewlett-Packard Laboratories and Cupertino Division for their contributions and support.

REFERENCES

1 D H DAGGETT
  Decimal-binary conversion in Cordic
  IRE Transactions on Electronic Computers Vol EC-8 No 3 pp 335-339 September 1959
2 M A LICCARDO
  An interconnect processor with emphasis on Cordic mode operation
  Masters Thesis EE Dept University of California at Berkeley September 1968
3 J E VOLDER
  Binary computation algorithms for coordinate rotation and function generation
  Convair Report IAR-1148 Aeroelectronics Group June 1956
4 J E VOLDER
  The Cordic trigonometric computing technique
  IRE Transactions on Electronic Computers Vol EC-8 No 3 pp 330-334 September 1959

APPENDIX

Mathematical identities

Let $i = (-1)^{1/2}$

    $z \equiv \lim_{m \to 0} m^{-1/2} \sin(z m^{1/2})$    (A1)
    $z \equiv \lim_{m \to 0} m^{-1/2} \tan^{-1}(z m^{1/2})$    (A2)
    $\sinh z \equiv -i \sin(iz)$    (A3)
    $\cosh z \equiv \cos(iz)$    (A4)
    $\tanh^{-1} z \equiv -i \tan^{-1}(iz)$    (A5)
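The appendix identities can be spot-checked with Python's `cmath` module. In this small sketch the limits (A1) and (A2) are approximated by choosing a small positive m, and the sample value of z is an arbitrary assumption with |z| < 1 so that the inverse hyperbolic tangent is defined.

```python
import cmath
import math

z = 0.3            # arbitrary sample value with |z| < 1
m = 1e-8           # small positive m standing in for the limit m -> 0
root_m = math.sqrt(m)

a1 = math.sin(z * root_m) / root_m          # approaches z, identity (A1)
a2 = math.atan(z * root_m) / root_m         # approaches z, identity (A2)
a3 = (-1j * cmath.sin(1j * z)).real         # sinh z, identity (A3)
a4 = cmath.cos(1j * z).real                 # cosh z, identity (A4)
a5 = (-1j * cmath.atan(1j * z)).real        # atanh z, identity (A5)
```

These identities are what allow the single pair of formulas (7) and (8) to cover the circular, linear and hyperbolic cases of Table I.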
