
Radix 8

The document describes a proposed radix-8 CORDIC algorithm for implementing high-performance trigonometric and other complex mathematical functions with reduced latency compared to conventional radix-2 CORDIC. The proposed algorithm reduces the number of iterations needed to n/3 + 3 by using a radix-8 approach, while maintaining a critical path delay of one carry-propagate adder. The algorithm, its convergence properties, and implementation using redundant arithmetic are discussed and compared to other reported high-performance CORDIC techniques.


ASIC Implementation of High Performance Radix-8 CORDIC Algorithm

Ankur Changela, PhD student
Dr. Mazad Zaveri, Assistant Professor
Prof. Anurag Lakhlani, Senior Lecturer
School of Engineering and Applied Science, Ahmedabad University, Ahmedabad, Gujarat
Emails: [email protected], [email protected], [email protected]

Abstract—The COordinate Rotation DIgital Computer (CORDIC) is a well-known special-purpose algorithm for computing various complex mathematical functions. Because CORDIC has low hardware complexity, it has attracted the attention of many researchers. CORDIC is part of many real-time applications, such as the FFT and DCT, in communication systems and in signal and image processing. The main drawback of the CORDIC algorithm is its high latency, i.e., the number of iterations required to compute the total rotation. Many researchers have reported CORDIC algorithms with reduced latency, but at the cost of increased hardware complexity and/or increased delay of each micro-rotation (iteration). In this paper, a radix-8 CORDIC algorithm is presented. The proposed radix-8 CORDIC algorithm takes n/3 iterations to compute the total rotation and three additional iterations to compensate the scale factor. Further, the proposed radix-8 CORDIC algorithm is compared in detail with recently reported CORDIC algorithms on three parameters: 1) latency, 2) hardware complexity, and 3) critical delay.

978-1-5386-5314-2/18/$31.00 ©2018 IEEE 699

I. INTRODUCTION
COordinate Rotation DIgital Computer (CORDIC) is a well-known special-purpose algorithm that was described by J. E. Volder in 1959 [1]. CORDIC is a rotational algorithm. It rotates a vector in a two-dimensional plane to compute complex mathematical functions, including trigonometric functions (sine, cosine, etc.), transcendental functions (square root, exponential, etc.), and linear functions (multiplication, division, etc.) [1] [2]. To compute these functions, it rotates the vector in a circular, hyperbolic, or linear coordinate system. The required hardware is simple, consisting only of adders and shifters. This makes CORDIC suitable for real-time applications in image processing [3], communication systems [4], and digital signal processing [5]. CORDIC is an iterative algorithm. It divides a rotation into a sequence of micro-rotations (iterations), each computed by shift and add/subtract operations. The drawback of this process is that each iteration scales up the rotating vector. This scale factor must be compensated to obtain the correct value of the rotated vector, and additional iteration(s) may be required for the compensation. The conventional radix-2 CORDIC algorithm has a constant scale factor. However, the CORDIC algorithms proposed in [6] [7] have a scale factor that depends on the amount of rotation performed, so extra iterations are required to calculate it. The main drawback of the conventional radix-2 CORDIC algorithm is that it performs n iterations to produce n-bit precision, which increases the latency [1].

In real-time applications, computational delay is the main concern for designers. For the CORDIC algorithm, computational delay depends on the number of iterations needed to reach the desired rotation. Extensive research has been carried out to reduce this delay by reducing the number of iterations, at the cost of increased hardware complexity and/or increased critical path delay of a single iteration.

The conventional radix-2 CORDIC algorithm presented in [1] takes 5n/4 iterations for rotation, including scale factor compensation, and has a critical delay of one carry propagation adder (CPA). The high-performance radix-4 CORDIC algorithm reported in [7] reduces the number of iterations to n/2 without scale factor compensation and to (n/2) + 3 with scale factor calculation and compensation. The double step branching CORDIC algorithm reported in [8] reduces the iterations to (n + 2)/3 without scale factor compensation, but it has two CPAs in the critical path. The recently reported low-latency hybrid CORDIC algorithm [6] combines the advantages of double step branching and the radix-4 CORDIC algorithm, reducing the number of iterations to (3n/8) + 1. However, its hardware is complex because it uses a large comparator, and it has two CPAs in the critical path.

In this paper, we propose a radix-8 CORDIC algorithm, which is an extension of the radix-2 CORDIC algorithm, and present its VLSI implementation. The proposed algorithm takes (n/3) + 3 iterations for rotation in the circular coordinate system, including scale factor compensation, and its critical path delay is limited to one CPA. The selection function of the proposed algorithm is complex because $\sigma_i$ can take values that are not integer powers of two. To overcome this problem, the VLSI implementation uses redundant (carry save) arithmetic [9]. An ASIC offers better power and delay performance than an FPGA [2] [5], so the proposed algorithm is implemented as an ASIC.

The rest of the paper is structured as follows. Section II gives a basic overview of the conventional radix-2 CORDIC algorithm [1]. Section III presents the proposed radix-8 CORDIC algorithm and its convergence. It also covers scale factor calculation and compensation, the derivation of the selection function, and the VLSI implementation of the proposed algorithm. Section IV describes the simulation results and compares the proposed radix-8 CORDIC algorithm with the CORDIC algorithms reported in [1] [7] [8] [6]. Section V concludes the work.

II. CORDIC ALGORITHM
The CORDIC algorithm is an iterative process that rotates a vector by arbitrary angles using only add and shift operations. CORDIC can rotate the vector in three coordinate systems: circular, hyperbolic, and linear. It can operate in two modes: vectoring and rotation. Different mathematical functions are calculated by choosing the appropriate mode and coordinate system. The CORDIC iteration for the circular rotation mode, which computes sine and cosine, can be written as:

$$
\begin{aligned}
x_{i+1} &= x_i - d_i\, y_i\, 2^{-i} \\
y_{i+1} &= y_i + d_i\, x_i\, 2^{-i} \\
z_{i+1} &= z_i - d_i \tan^{-1}(2^{-i})
\end{aligned}
\tag{1}
$$

where $d_i = -1$ if $z_i < 0$ and $d_i = +1$ otherwise.

Multiplication by $2^{-i}$ is a binary shift, so each iteration needs only a binary adder/subtractor and a shifter. After n iterations of (1), for n-bit precision, the result converges to:

$$
\begin{aligned}
x_n &= k_n\,[x_0 \cos z_0 - y_0 \sin z_0] \\
y_n &= k_n\,[y_0 \cos z_0 + x_0 \sin z_0] \\
z_n &\approx 0 \\
k_n &= \prod_{i=0}^{n-1} \sqrt{1 + 2^{-2i}}
\end{aligned}
\tag{2}
$$

$k_n$ is the scale factor. It tends to approximately 1.647 as the number of iterations tends to infinity. In (2), choose the initial values $x_0 = 1/1.647$ and $y_0 = 0$. Then $x_n$ and $y_n$ give the cosine and sine of the input angle $z_0$, respectively. Setting $x_0 = 1/1.647$ compensates the constant scale factor $k_n$.

III. RADIX-8 CORDIC ALGORITHM
The radix-2 CORDIC algorithm uses powers of two and resolves one bit per iteration. To process more than one bit per iteration, a radix higher than 2 can be employed. With radix $r = 2^m$, m bits of the result are calculated in each iteration. This reduces the number of iterations needed to compute the total rotation, and hence the latency. The radix-2 CORDIC algorithm decomposes the rotation angle into a series of elementary angles $\tan^{-1}(2^{-i})$. For radix r, the elementary angle is $\tan^{-1}(\sigma_i r^{-i})$. The coefficient $\sigma_i$ is the selection function and can take any value in the set $\{-r/2, \ldots, 0, \ldots, r/2\}$. For radices higher than four, the selection function $\sigma_i$ is complex because it is not always an integer power of two, which increases the complexity of each iteration [10]. Redundant (carry save) arithmetic overcomes this problem, as discussed in the implementation section (Section III-D). The equations of the proposed radix-8 CORDIC algorithm for the circular rotation mode are:

$$
\begin{aligned}
x_{i+1} &= x_i - \sigma_i\, 8^{-i}\, y_i \\
y_{i+1} &= y_i + \sigma_i\, 8^{-i}\, x_i \\
z_{i+1} &= z_i - \tan^{-1}(\sigma_i\, 8^{-i}) \\
K &= \prod_{i \ge 0} \sqrt{1 + \sigma_i^2\, 8^{-2i}}
\end{aligned}
\tag{3}
$$

Here $\sigma_i \in \{\pm 4, \pm 3, \pm 2, \pm 1, 0\}$. $x_0$ and $y_0$ are the initial coordinates of the vector, and $z_0$ is the input rotation angle. Similarly, $x_{i+1}$ and $y_{i+1}$ are the coordinates of the rotated vector, and $z_{i+1}$ is the angle that remains to be rotated. As (3) shows, the scale factor K is not constant because it depends on the values of $\sigma_i$. For the radix-8 CORDIC algorithm, K ranges from 1 to 4.41.

A. Convergence of Radix-8 CORDIC Algorithm
To prove that the radix-8 CORDIC algorithm converges, we must show that the residue of the variable z in (3) stays bounded after each iteration. We use the basic concept of SRT division presented in [11]. It determines both the selection function $\sigma_i$ and the convergence of the algorithm for an input angle $z_0 \in [-\pi/2, \pi/2]$. As in [6] [7], we define $w_i = 8^i z_i$. Multiplying $z_{i+1}$ in (3) by $8^{i+1}$ and rearranging expresses the recurrence in terms of w:

$$
\begin{aligned}
8^{i+1} z_{i+1} &= 8^{i+1} z_i - 8^{i+1} \tan^{-1}(\sigma_i\, 8^{-i}) \\
w_{i+1} &= 8\,(w_i - P_i[\sigma_i]) \\
P_i[\sigma_i] &= 8^{i} \tan^{-1}(\sigma_i\, 8^{-i})
\end{aligned}
\tag{4}
$$

To prove that w converges, we must show that the residue $w_i$ is bounded after each iteration. The following radix-8 method is derived from the radix-4 SRT division presented in [11]. We define two variables M and N as follows:

$$
\begin{aligned}
M_i[q] &= P_i[q] - \tfrac{4}{7} P_i[1] \\
N_i[q] &= P_i[q] + \tfrac{4}{7} P_i[1]
\end{aligned}
\tag{5}
$$

where $q \in \{\pm 4, \pm 3, \pm 2, \pm 1, 0\}$. The selection function must be chosen so that, for $\sigma_i = q$, $w_i$ lies in the interval

$$ M_i[q] \le w_i \le N_i[q] \tag{6} $$

For convergence, there must be at least one $\sigma_i = q$ for which (6) holds. The equations in (5) were simulated in MATLAB, and the continuity of the intervals for the different values of $\sigma_i$ was verified. The resulting intervals are listed in Table I.

TABLE I
INTERVALS FOR CORRESPONDING VALUE OF σi

σi = q    M0[q]    N0[q]    Mi≥1[q]    Ni≥1[q]
  -4      -1.77    -0.87    -4.57      -3.42
  -3      -1.69    -0.80    -3.57      -2.42
  -2      -1.55    -0.65    -2.57      -1.42
  -1      -1.23    -0.33    -1.57      -0.42
   0      -0.44     0.44    -0.57       0.57
   1       0.33     1.23     0.42       1.57
   2       0.65     1.55     1.42       2.57
   3       0.80     1.69     2.42       3.57
   4       0.87     1.77     3.42       4.57

The value of $\sigma_i$ depends on i, the iteration being evaluated. The set of values that $\sigma_i$ can take is redundant, and Table I shows the overlap between the intervals. To prove that the selection criterion (6) makes the radix-8 CORDIC algorithm converge, $\sigma_i$ must be selected so that the following condition holds:

$$ |w_i| \le P_i[4] + \tfrac{4}{7} P_i[1] \tag{7} $$

The proof uses induction in two parts. First, we prove that $w_i$ is bounded for i = 0. Then we prove that it is bounded for i ≥ 1. For i = 0, we assume $|w_0| \le \pi/2$. From (7), the bound for i = 0 is

$$ P_0[4] + \tfrac{4}{7} P_0[1] \approx 1.77 $$

Since $|w_0| = |z_0| \le \pi/2 \approx 1.57 < 1.77$, (7) holds for i = 0. Now assume that (7) holds for i = b − 1. We must prove that it also holds for i = b. For i = b − 1, there must be some value of q such that

$$ M_{b-1}[q] \le w_{b-1} \le N_{b-1}[q] $$

Substituting $M_{b-1}[q]$ and $N_{b-1}[q]$ from (5) gives

$$ P_{b-1}[q] - \tfrac{4}{7} P_{b-1}[1] \le w_{b-1} \le P_{b-1}[q] + \tfrac{4}{7} P_{b-1}[1] $$

Subtracting $P_{b-1}[q]$ and multiplying each side by eight, we obtain

$$ -\tfrac{32}{7} P_{b-1}[1] \le 8\,(w_{b-1} - P_{b-1}[q]) \le \tfrac{32}{7} P_{b-1}[1] $$

Using the recurrence for w in (4) with $\sigma_{b-1} = q$, this becomes

$$ |w_b| \le \tfrac{32}{7} P_{b-1}[1] $$

It can be verified that

$$ \tfrac{32}{7}\, 8^{b-1} \tan^{-1}\!\left(8^{-(b-1)}\right) \le 8^{b} \tan^{-1}\!\left(4 \cdot 8^{-b}\right) + \tfrac{4}{7}\, 8^{b} \tan^{-1}\!\left(8^{-b}\right) $$

that is, $\tfrac{32}{7} P_{b-1}[1] \le P_b[4] + \tfrac{4}{7} P_b[1]$. Therefore

$$ |w_b| \le P_b[4] + \tfrac{4}{7} P_b[1] $$

and the theorem holds for i = b.

B. Selection Function
The value of the selection function is decided by the criterion in (6). The selection function must be independent of the iteration being executed. In other words, $\sigma_i$ must be chosen so that the upper and lower bounds of $w_i$ do not depend on the iteration. To decide the range of $w_i$ for each $\sigma_i$, we must find the overlapping area of $w_i$. The selection criterion (6) can be rewritten as

$$ \sigma_i = q \quad \text{if} \quad M[q] \le w_i \le N[q], \qquad M[q] = \max_i M_i[q], \quad N[q] = \min_i N_i[q] $$

As i → ∞, $M_1[-q] \le N_\infty[-q-1]$ and $M_\infty[q+1] \le N_1[q]$, so a common overlapping area always exists. The overlapping areas for i = 0 and i > 0 are obtained separately. Only the case i > 0, $\sigma_i = 2$ is discussed here; the other cases can be derived in the same way. For i > 0, the maximum value of $M_i[q]$ occurs as i → ∞, and the minimum value of $N_i[q]$ occurs at i = 1. To find the interval of $w_i$ for $\sigma_i = 2$, we need four values: $M_1[3]$, $M_\infty[3]$, $N_1[2]$, and $N_\infty[2]$. Fig. 1 shows the overlapping area between these intervals. It shows that the boundary between $\sigma_i = 2$ and $\sigma_i = 3$ must lie in the interval $M_\infty[3] \le w_i \le N_1[2]$ (about 2.43 to 2.53). The same procedure for $\sigma_i = 1$ gives $1.42 \le w_i \le 1.56$. Choosing the boundaries inside these overlaps, $\sigma_i = 2$ is selected for $1.5 \le w_i < 2.5$. The other selection functions for i = 0 and i > 0 are derived in the same way and listed in Table II. All threshold values are chosen so that $w_0$ can be represented in binary using five bits and $w_i$ using four bits.

Fig. 1. Overlapping area between the intervals for σi = 2

TABLE II
CRITERIA FOR SELECTION FUNCTION

σi = q    i = 0                   i > 0
   4      w0 ≥ 1                  wi ≥ 3.5
   3      1 > w0 ≥ 0.875          3.5 > wi ≥ 2.5
   2      0.875 > w0 ≥ 0.75       2.5 > wi ≥ 1.5
   1      0.75 > w0 ≥ 0.375       1.5 > wi ≥ 0.5
   0      0.375 > w0 ≥ −0.375     0.5 > wi ≥ −0.5
  -1      −0.375 > w0 ≥ −0.75     −0.5 > wi ≥ −1.5
  -2      −0.75 > w0 ≥ −0.875     −1.5 > wi ≥ −2.5
  -3      −0.875 > w0 ≥ −1        −2.5 > wi ≥ −3.5
  -4      −1 > w0                 −3.5 > wi
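The radix-2 iteration (1) and its scale factor (2) from Section II can be checked with a short floating-point model. This is a minimal sketch, not the authors' MATLAB code. The function names `radix2_gain` and `cordic_radix2` are hypothetical, and Python floats stand in for the fixed-point shift-and-add datapath. Pre-scaling with $x_0 = 1/k_n$ follows the compensation described after (2).

```python
import math


def radix2_gain(n: int) -> float:
    """Scale factor k_n of (2): product of sqrt(1 + 2^-2i) for i = 0 .. n-1."""
    k = 1.0
    for i in range(n):
        k *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return k


def cordic_radix2(z0: float, n: int = 24) -> tuple[float, float]:
    """Circular rotation mode of (1); returns (cos z0, sin z0) for |z0| <= pi/2.

    Floats stand in for the fixed-point datapath (assumption); multiplying by
    2^-i models a right shift by i bits.
    """
    x, y, z = 1.0 / radix2_gain(n), 0.0, z0  # pre-compensate with x0 = 1/k_n
    for i in range(n):
        d = -1.0 if z < 0 else 1.0           # d_i from the sign of z_i
        t = d * 2.0 ** -i
        x, y = x - t * y, y + t * x          # simultaneous x/y update
        z -= d * math.atan(2.0 ** -i)        # elementary angle tan^-1(2^-i)
    return x, y


if __name__ == "__main__":
    print(f"k_24 = {radix2_gain(24):.6f}")
    c, s = cordic_radix2(math.pi / 6)
    print(f"cos = {c:.6f}, sin = {s:.6f}")
```

With n = 24, the gain is about 1.6468. This is the constant the text rounds to 1.64. Every iteration costs one shift and one add per coordinate, which is why the n-iteration latency is the bottleneck that the radix-8 scheme targets.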
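The intervals of Table I and the induction step of the convergence proof in Section III-A can be reproduced numerically from (4), (5) and (7). The sketch below uses our own naming: `P`, `M` and `N` mirror the paper's symbols, and `bound` is a hypothetical helper for the right-hand side of (7). It prints the i = 0 and i = 1 intervals. Table I's i ≥ 1 columns summarize all i ≥ 1, so the printed i = 1 values need not match them exactly. The sketch also asserts the inequality used in the step from i = b − 1 to i = b for a few values of b.

```python
import math


def P(i: int, q: int) -> float:
    """P_i[q] = 8^i * atan(q * 8^-i), as defined in (4)."""
    return 8.0 ** i * math.atan(q * 8.0 ** -i)


def M(i: int, q: int) -> float:
    """Lower end of the interval for sigma_i = q, from (5)."""
    return P(i, q) - 4.0 / 7.0 * P(i, 1)


def N(i: int, q: int) -> float:
    """Upper end of the interval for sigma_i = q, from (5)."""
    return P(i, q) + 4.0 / 7.0 * P(i, 1)


def bound(i: int) -> float:
    """Right-hand side of (7): P_i[4] + (4/7) P_i[1], which equals N_i[4]."""
    return N(i, 4)


if __name__ == "__main__":
    for q in range(-4, 5):
        print(f"q = {q:+d}:  i = 0 [{M(0, q):+.3f}, {N(0, q):+.3f}]"
              f"   i = 1 [{M(1, q):+.3f}, {N(1, q):+.3f}]")
    print(f"bound for i = 0: {bound(0):.4f}")  # about 1.77, above pi/2
    # Induction step: (32/7) P_{b-1}[1] <= P_b[4] + (4/7) P_b[1]
    for b in range(1, 8):
        assert 32.0 / 7.0 * P(b - 1, 1) <= bound(b), b
```

Both sides of the induction inequality approach 32/7 as b grows, so the check stops at b = 7. Beyond that point, the gap between the two sides shrinks toward the limits of double precision.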
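Combining (3), the w recurrence (4) and the selection rules of Table II gives a behavioral model of the proposed rotation. This is a floating-point sketch under stated assumptions, not the authors' fixed-point design:
- It compares the exact value of $w_i$ against the Table II thresholds, whereas the hardware compares only a few bits of $\hat{w}$.
- It runs $\lceil n/3 \rceil$ micro-rotations.
- It divides by the exact gain K instead of applying the approximated compensation of Section III-C.

The names `select_sigma` and `cordic_radix8` are hypothetical.

```python
import math

# Table II as (lower bound, sigma) pairs, scanned from the largest sigma down;
# any w below the last bound selects sigma = -4.
TABLE_II_I0 = [(1.0, 4), (0.875, 3), (0.75, 2), (0.375, 1),
               (-0.375, 0), (-0.75, -1), (-0.875, -2), (-1.0, -3)]
TABLE_II_IGT0 = [(3.5, 4), (2.5, 3), (1.5, 2), (0.5, 1),
                 (-0.5, 0), (-1.5, -1), (-2.5, -2), (-3.5, -3)]


def select_sigma(w: float, i: int) -> int:
    """Selection function sigma_i from Table II (exact comparison on w)."""
    table = TABLE_II_I0 if i == 0 else TABLE_II_IGT0
    for lower, q in table:
        if w >= lower:
            return q
    return -4


def cordic_radix8(z0: float, n: int = 24) -> tuple[float, float, list[int]]:
    """Radix-8 rotation of (3) for |z0| <= pi/2; returns (cos, sin, sigmas)."""
    x, y, w, gain = 1.0, 0.0, z0, 1.0              # w_0 = 8^0 * z_0
    sigmas = []
    for i in range(math.ceil(n / 3)):
        s = select_sigma(w, i)
        t = s * 8.0 ** -i                          # sigma_i * 8^-i
        x, y = x - t * y, y + t * x                # x and y rotators of (3)
        w = 8.0 * (w - 8.0 ** i * math.atan(t))    # w_{i+1} = 8(w_i - P_i[sigma_i])
        gain *= math.sqrt(1.0 + t * t)             # running product K of (3)
        sigmas.append(s)
    return x / gain, y / gain, sigmas              # exact compensation (assumption)


if __name__ == "__main__":
    c, s, sigmas = cordic_radix8(math.pi / 2)
    print(sigmas)
    print(c, s)
```

For $z_0 = \pi/2$, the selected digits start with 4, 2, 0. This reflects the identity $\tan^{-1}4 + \tan^{-1}(1/4) = \pi/2$. For n = 24 the loop runs eight micro-rotations, versus 24 for the radix-2 sketch.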
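Section III-C approximates each per-iteration factor $K_{i\sigma_i}^{-1}$ of (8) with five bits and realizes it as five shifted partial products. The paper does not spell out the rounding rule. This sketch assumes one: truncate to 12 fractional bits, then keep the five most significant 1-bits. That rule reproduces the worked example in Section III-C, where 0.24253 becomes 0.001111100000 with shifts 3 to 7 and an error of about $3.48 \times 10^{-4}$. The names `k_inv`, `five_bit_approx` and `approx_total_k_inv`, and the constants `FRAC_BITS` and `KEEP_ONES`, are hypothetical.

```python
import math

FRAC_BITS = 12  # assumed width, matching the 12-bit example 0.001111100000
KEEP_ONES = 5   # five partial products -> five 1-bits kept (assumed rule)


def k_inv(i: int, sigma: int) -> float:
    """Exact factor K_{i,sigma}^{-1} = (1 + sigma^2 * 2^-6i)^(-1/2), eq. (8)."""
    return (1.0 + sigma * sigma * 2.0 ** (-6 * i)) ** -0.5


def five_bit_approx(value: float) -> tuple[float, list[int]]:
    """Truncate to FRAC_BITS fractional bits, keep the KEEP_ONES leading 1-bits.

    Returns (approximation, shifts) with approximation == sum(2**-s for s in
    shifts); each shift is one partial product 2^-s * x.
    """
    code = math.floor(value * (1 << FRAC_BITS))
    shifts = []
    for pos in range(FRAC_BITS, -1, -1):   # bit pos has weight 2^(pos - FRAC_BITS)
        if (code >> pos) & 1:
            shifts.append(FRAC_BITS - pos)
            if len(shifts) == KEEP_ONES:
                break
    return sum(2.0 ** -s for s in shifts), shifts


def approx_total_k_inv(sigmas: list[int]) -> float:
    """Approximate K^-1 as a product over i = 0, 1, 2 only (i > 2 taken as 1)."""
    total = 1.0
    for i, s in enumerate(sigmas[:3]):
        total *= five_bit_approx(k_inv(i, s))[0]
    return total


if __name__ == "__main__":
    exact = k_inv(0, 4)
    approx, shifts = five_bit_approx(exact)
    print(exact, approx, shifts, exact - approx)
```

For $\sigma_0 = 4$, the shifts come out as [3, 4, 5, 6, 7]. These are the five partial products $2^{-3}x$ to $2^{-7}x$ listed in Section III-C.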
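The redundant datapath of Section III-D adds three vectors with a 3-to-2 carry save adder. It handles $|\sigma_i| = 3$ by having the shifter emit both $x$ and $2x$. The integer sketch below illustrates both ideas behaviorally; it is not the authors' Verilog. Python's unbounded integers stand in for fixed-width two's-complement words, and plain negation stands in for the hardware's sign handling. The names `csa_3to2`, `shifter_vectors` and `y_rotator` are hypothetical.

```python
def csa_3to2(a: int, b: int, c: int) -> tuple[int, int]:
    """A row of full adders: returns (sum, carry) with sum + carry == a + b + c."""
    s = a ^ b ^ c                                   # per-bit sum, no carry propagation
    carry = ((a & b) | (a & c) | (b & c)) << 1      # per-bit majority, weighted by 2
    return s, carry


def shifter_vectors(x: int, sigma: int, i: int) -> tuple[int, int]:
    """Two vectors whose sum is sigma * (x >> 3i), for sigma in {-4, ..., 4}.

    |sigma| = 3 is split as x + 2x; |sigma| in {1, 2, 4} is a single shift.
    """
    base = x >> (3 * i)                             # 8^-i as an arithmetic right shift
    mag = abs(sigma)
    if mag == 0:
        p1, p2 = 0, 0
    elif mag == 3:
        p1, p2 = base, base << 1                    # the x0p1 / x0p2 pair
    else:
        p1, p2 = base << (mag.bit_length() - 1), 0  # 1, 2, 4 -> shift by 0, 1, 2
    return (-p1, -p2) if sigma < 0 else (p1, p2)


def y_rotator(x: int, y: int, sigma: int, i: int) -> tuple[int, int]:
    """Redundant update y + sigma * 8^-i * x of (3), kept as a (sum, carry) pair."""
    return csa_3to2(y, *shifter_vectors(x, sigma, i))


if __name__ == "__main__":
    s, c = y_rotator(x=1000, y=-250, sigma=3, i=0)
    print(s, c, s + c)  # s + c is the value a CPA would produce
```

The 3-to-2 step never propagates a carry, so its delay is one full adder regardless of word length. A CPA is needed only when the pair must be collapsed, as in the area-efficient variant of Fig. 6. For i > 0, the operands are themselves carry-save pairs, which is why Fig. 4 needs a 6-to-2 CSA. This sketch covers only the first-iteration case.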
C. Scale Factor Compensation
The scale factor of the radix-8 CORDIC algorithm is not constant. It depends on the value of $\sigma_i$ and on the iteration being executed. Because the scale factor depends on the input angle through $\sigma_i$, it must be compensated. The methods presented in [7] and [6] store the scale factor in a look-up table. For n-bit precision, the scale factor tends to one after n/4 iterations, so the look-up table in [7] has $3^{(n/12)+1} \times n$ bits. A similar method is used in [6], with a look-up table of $9 \times 8^{(n/16)-1} \times n$ bits. The stored scale factor is then multiplied with the rotated vector, using shift and add operations, to obtain the correct value of the vector.

For the radix-8 CORDIC algorithm, the scale factor for the $i$th iteration and selection function $\sigma_i$ is given by

$$ K_{i\sigma_i}^{-1} = \left(1 + \sigma_i^2\, 8^{-2i}\right)^{-1/2} = \left(1 + \sigma_i^2\, 2^{-6i}\right)^{-1/2} \tag{8} $$

As (8) shows, for i = 2 the value $\sigma_i^2$ is multiplied by $2^{-12}$, i.e., shifted right by 12 bits. With 16-bit precision for the scale factor, the scale factor can therefore be taken as one for i > 2. For higher accuracy, one or two additional iterations may be included in the compensation. This work uses an approximated value of the scale factor. All possible values of the scale factor are calculated for each $\sigma_i$ and iteration i. The total scale factor is then $K^{-1} = K_{0\sigma_0}^{-1} \cdot K_{1\sigma_1}^{-1} \cdot K_{2\sigma_2}^{-1}$. Each possible value of $K_{i\sigma_i}^{-1}$ is approximated using five bits. For example, for i = 0 and $\sigma_0 = 4$, the actual value of $K_{04}^{-1}$ is 0.24253. With five bits it is approximated as 0.001111100000 (binary), with an error of $3.48 \times 10^{-4}$. The scale factor can thus be expressed as

$$ K^{-1} = \left(K_{0\sigma_0}^{-1} + \Delta_0\right)\left(K_{1\sigma_1}^{-1} + \Delta_1\right)\left(K_{2\sigma_2}^{-1} + \Delta_2\right) $$

where $\Delta_0$, $\Delta_1$ and $\Delta_2$ are the errors introduced by the approximation. This equation simplifies to

$$ K^{-1} = K_{0\sigma_0}^{-1} K_{1\sigma_1}^{-1} K_{2\sigma_2}^{-1} + K_{0\sigma_0}^{-1} K_{1\sigma_1}^{-1} \Delta_2 + K_{0\sigma_0}^{-1} K_{2\sigma_2}^{-1} \Delta_1 + K_{1\sigma_1}^{-1} K_{2\sigma_2}^{-1} \Delta_0 $$

All other terms contain products of two or more $\Delta$ terms, so they can be taken as zero because $\Delta$ is very small. The total error $E_t$ introduced by the approximation is

$$ E_t = K_{0\sigma_0}^{-1} K_{1\sigma_1}^{-1} \Delta_2 + K_{0\sigma_0}^{-1} K_{2\sigma_2}^{-1} \Delta_1 + K_{1\sigma_1}^{-1} K_{2\sigma_2}^{-1} \Delta_0 \tag{9} $$

From (9), $E_t$ is small because $\Delta$ is very small and $K_{i\sigma_i}^{-1} \le 1$. For compensation, five partial products of the variables x and y are generated based on the value of $K_{i\sigma_i}^{-1}$. For example, let x be represented in two's complement, with sign bit $x_3$, as

$$ x = x_3\,x_2\,x_1\,x_0\,.\,x_{-1}\,x_{-2}\,x_{-3}\,x_{-4}\,x_{-5}\,x_{-6}\,x_{-7}\,x_{-8} $$

Then the partial products for $K_{04}^{-1} = 0.001111100000$ are

$$
\begin{aligned}
2^{-3}x &= x_3\,x_3\,x_3\,x_3\,.\,x_2\,x_1\,x_0\,x_{-1}\,x_{-2}\,x_{-3}\,x_{-4}\,x_{-5} \\
2^{-4}x &= x_3\,x_3\,x_3\,x_3\,.\,x_3\,x_2\,x_1\,x_0\,x_{-1}\,x_{-2}\,x_{-3}\,x_{-4} \\
2^{-5}x &= x_3\,x_3\,x_3\,x_3\,.\,x_3\,x_3\,x_2\,x_1\,x_0\,x_{-1}\,x_{-2}\,x_{-3} \\
2^{-6}x &= x_3\,x_3\,x_3\,x_3\,.\,x_3\,x_3\,x_3\,x_2\,x_1\,x_0\,x_{-1}\,x_{-2} \\
2^{-7}x &= x_3\,x_3\,x_3\,x_3\,.\,x_3\,x_3\,x_3\,x_3\,x_2\,x_1\,x_0\,x_{-1}
\end{aligned}
$$

These five partial products are added using redundant (carry save) arithmetic to obtain the compensated values of x and y. The proposed radix-8 CORDIC algorithm requires three additional iterations to compensate the scaled vector. Hence, it takes (n/3) + 3 iterations to converge, including scale factor compensation.

D. Architecture of Proposed Radix-8 CORDIC Algorithm
This section presents the unfolded pipelined architecture of the proposed radix-8 CORDIC algorithm. An unfolded pipelined architecture has separate hardware for each iteration, so all stages can compute in parallel, which increases the throughput of the system [12]. The proposed architecture is implemented using redundant (carry save) arithmetic. The x and y rotator modules are implemented separately using (3). Fig. 2 shows the architecture of the first iteration. Fig. 4 shows the architecture of all other iterations. These are identical except for the angle table stored in ROM, because the stored values $\tan^{-1}(\sigma_i 8^{-i})$ depend on the iteration being executed.

Fig. 2. Architecture of first micro-rotation of proposed radix-8 CORDIC algorithm

As Fig. 2 shows, a five-bit comparator compares the value of $w_0$. The threshold values of $w_0$ for each selection function were chosen so that they can be expressed with five binary bits (two's complement) at precision FXP(2,3), i.e., two bits for the integer part and three bits for the fraction. The output of the comparator is the selection function $\sigma_0$. Based on $\sigma_0$, the value of $\tan^{-1}(\sigma_0)$ is read from the angle table stored in the ROM. The two vectors $\tan^{-1}(\sigma_0)$ and $w_0$ are then multiplied by eight using a binary shift and stored in pipeline registers.

In the x rotator, $x_0$ follows two datapaths. In the first, it goes directly to a 3-to-2 CSA (carry save adder). In the second, it passes through a shifter that computes $8^{-i} \cdot \sigma_0 \cdot x_0$. Fig. 3 shows the detailed architecture of the shifter. The selection function of the radix-8 CORDIC algorithm is not always an integer power of two, because it can also be three. To handle this case, the shifter produces two vectors $x_{0p1} = x_0$ and $x_{0p2} = 2 \cdot x_0$ (using a shift operation), as shown in Fig. 2. A 2-to-1 multiplexer added to the datapath provides this, as shown in Fig. 3. The two vectors $x_{0p1}$ and $x_{0p2}$ then go to the 3-to-2 CSA of the y rotator. Likewise, the 3-to-2 CSA of the x rotator receives the corresponding two vectors from the y rotator. The 3-to-2 CSA adds the three vectors $x_0$, $y_{0p1}$ and $y_{0p2}$ to produce $x_{s1}$ and $x_{c1}$, which are stored in a pipeline register. The total delay of the first iteration is the sum of the delays of the five-bit comparator, the shifter and the 3-to-2 CSA. A 3-to-2 CSA has the delay of only one full adder [9].

Fig. 3. Architecture of Shifter

Fig. 4. Architecture of ith micro-rotation of proposed radix-8 CORDIC algorithm where i ≥ 1

Fig. 4 shows the architecture of the $i$th iteration, where i > 0. Its $w_i$ rotator differs substantially from the $w_0$ rotator of the first iteration. As Fig. 4 shows, a CLA (carry look-ahead adder) adds six bits of the input vectors $ws_i$ and $wc_i$ to obtain $\hat{w}$. Only four bits (FXP(2,2)) of $\hat{w}$ are then compared with the values in Table II using a 4-bit comparator. The output of the comparator is the selection function $\sigma_i$, which drives the shifter and the ROM. A 3-to-2 CSA adds the three vectors $ws_i$, $wc_i$ and $\tan^{-1}(\sigma_i 8^{-i})$ (the ROM output) to obtain $ws_{i+1}$ and $wc_{i+1}$.

The x and y rotators are the same as in the first iteration, except that the shifter has two input vectors. The shifter produces four vectors to handle the case in which the selection function is three. A 6-to-2 CSA then adds the six vectors. Fig. 5 shows the architectures of the 6-to-2 and 3-to-2 CSAs. The 6-to-2 CSA has a delay of three full adders [9] and a hardware complexity of 4n full adders, where n is the number of bits (precision). The 3-to-2 CSA has a delay of one full adder and a hardware complexity of n full adders. All six generated vectors are stored in pipeline registers and processed in the next micro-rotation, as shown in Fig. 4.

Fig. 5. Architecture of 6 to 2 and 3 to 2 CSA

Fig. 6 shows the architecture of an area-efficient implementation of the proposed radix-8 CORDIC algorithm. Its $w_i$ rotator is the same as in Fig. 4. In its x and y rotators, a CPA (carry propagation adder) adds the two vectors generated by the 3-to-2 CSA to produce $x_{i+1}$ and $y_{i+1}$. This reduces the area, and hence the power. This architecture has a hardware complexity of 2n full adders, compared with 4n for the architecture in Fig. 4. Its critical path delay is $t_{CPA} + t_{FA}$, compared with $3t_{FA}$ for the architecture in Fig. 4.

Fig. 6. Architecture of Area Efficient Implementation of Proposed Algorithm

IV. SIMULATION RESULT AND COMPARISON
The micro-rotation equations (3) for the circular rotation mode of the proposed radix-8 CORDIC algorithm were implemented in MATLAB. The sine and cosine of input angles from −π/2 to π/2, with a step size of π/180, were computed in MATLAB simulation. MATLAB's built-in functions generated reference sine and cosine values over the same input range for comparison. The RMS errors for sine and cosine are $3.46 \times 10^{-4}$ and $4.17 \times 10^{-4}$, respectively.

The proposed architecture was implemented in Verilog HDL and simulated using the ModelSim HDL simulator. It uses 24-bit fixed-point precision (FXP(8,16): 8 bits for the integer part and 16 bits for the fraction). Fig. 7 shows the simulation results, with sine and cosine displayed in analog form for an input angle $z_{in}$ from −π/2 to π/2 with a step size of π/180. The simulation results were also compared with the sine and cosine generated by the built-in MATLAB functions over the same input range. The RMS errors for sine and cosine are $4.3 \times 10^{-3}$ and $3.9 \times 10^{-3}$, respectively.

Fig. 7. Verification waveform of proposed synthesized radix-8 CORDIC algorithm

The proposed radix-8 CORDIC algorithm is also compared with other recently proposed CORDIC algorithms on three parameters: latency in number of iterations, hardware complexity (as the order of full adders), and critical timing of a micro-rotation. Table III shows the comparison, which is summarized here.
- The radix-2 algorithm presented in [1] has a latency of n iterations without scale factor compensation and 5n/4 with it, for n-bit precision. However, it has simple hardware and one CPA in the critical path.
- The radix-4 CORDIC algorithm reported in [7] reduces the number of iterations by 50% compared with the radix-2 algorithm, with the same hardware complexity, but without scale factor compensation.
- The high-performance radix-4 algorithm presented in [13] has a hardware complexity of f(1.5n) and two CPAs in the critical path.
- The double step branching CORDIC algorithm presented in [8] reduces the latency by approximately 40% compared with the radix-2 algorithm, without scale factor calculation and compensation. It has a hardware complexity of f(3n) and two CPAs in the critical path.
- The hybrid CORDIC algorithm presented in [6] reduces the latency by approximately 75% compared with the radix-2 algorithm, at the cost of an increased hardware complexity of f(3n) and a delay of two CPAs in the critical path.
- The proposed radix-8 CORDIC algorithm reduces the latency by 65% compared with the radix-2 algorithm. The architecture in Fig. 4 has a hardware complexity of f(4n) and a delay of only three full adders in the critical path. The area-efficient implementation in Fig. 6 has a hardware complexity of f(2n) and a delay of one CPA and one full adder in the critical path.

TABLE III
COMPARISON OF PROPOSED RADIX-8 CORDIC ALGORITHM

CORDIC algorithm                        Number of iterations   Hardware complexity   Critical path delay
Radix-2 algorithm [1]                   5n/4                   f(n)                  $t_{CPA}$
Radix-4 algorithm ∗1 [7]                n/2                    f(n)                  $t_{CPA}$
Radix-4 algorithm [13]                  n/2 + 3                f(1.5n)               $2t_{CPA}$
Double step branching CORDIC ∗1 [8]     (n + 3)/2              f(3n)                 $2t_{CPA}$
Hybrid CORDIC algorithm [6]             3n/8 + 1               f(3n)                 $2t_{CPA}$
Proposed radix-8 CORDIC ∗2              n/3 + 3                f(4n)                 $3t_{FA}$
Proposed radix-8 CORDIC ∗3              n/3 + 3                f(2n)                 $t_{CPA} + t_{FA}$

∗1: Number of iterations does not include scale factor calculation and compensation; ∗2: Implementation using redundant arithmetic; ∗3: Area-efficient implementation; f(∗): order in terms of full adders; $t_{CPA}$: delay of carry propagation adder; $t_{FA}$: delay of full adder.

Both of our architectures, the redundant arithmetic architecture and the area-efficient architecture, were implemented in Verilog HDL with 24-bit (FXP(8,16)) precision. The ASIC of the proposed area-efficient architecture was implemented using the Cadence SoC Encounter CAD tool; Fig. 8 shows its layout. Both architectures were synthesized using Cadence RC Compiler for a 45 nm low-Vt standard cell technology. Table IV compares the two architectures. The clock period in Table IV is derived from critical path delay analysis. It is the minimum clock period that avoids setup and hold time violations. The area-efficient architecture uses 45% less hardware than the redundant arithmetic architecture, but it is 23% slower. It also consumes less power than the redundant arithmetic architecture.

Fig. 8. Layout of Proposed architecture

TABLE IV
COMPARISON OF PROPOSED REDUNDANT ARITHMETIC AND AREA EFFICIENT ARCHITECTURE

Parameters           Area Efficient   Redundant Arithmetic
Area (µm²)           8732             16114
Clock period (ns)    1.3              1
Power (mW)           2.2              2.4

V. CONCLUSION
This work presents an implementation of the radix-8 CORDIC algorithm that reduces the number of iterations needed to compute the total rotation. A proof of convergence and a mathematical validation of the proposed algorithm have been presented. To verify the algorithm, MATLAB simulation was performed to compute the sine and cosine functions. The proposed algorithm was also implemented using redundant (carry save) arithmetic in Verilog HDL and simulated with the ModelSim HDL simulator. An area-efficient architecture of the proposed algorithm is also presented. Synthesis results for both architectures show that the redundant arithmetic architecture is faster than the area-efficient architecture, at the cost of more hardware. The proposed radix-8 CORDIC algorithm reduces the number of iterations for convergence by approximately 65% compared with the radix-2 CORDIC algorithm.

REFERENCES
[1] J. E. Volder, "The CORDIC trigonometric computing technique," IRE Transactions on Electronic Computers, vol. EC-8, no. 3, pp. 330–334, Sept. 1959.
[2] J. Villalba and T. Lang, "Low latency word serial CORDIC," in Proceedings IEEE International Conference on Application-Specific Systems, Architectures and Processors, Jul. 1997, pp. 124–131.
[3] H. Huang and L. Xiao, "CORDIC based fast radix-2 DCT algorithm," IEEE Signal Processing Letters, vol. 20, no. 5, pp. 483–486, May 2013.
[4] P. K. Meher, J. Valls, T. B. Juang, K. Sridharan, and K. Maharatna, "50 years of CORDIC: Algorithms, architectures, and applications," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 56, no. 9, pp. 1893–1907, Sept. 2009.
[5] D. Timmermann, H. Hahn, B. J. Hosticka, and G. Schmidt, "A programmable CORDIC chip for digital signal processing applications," IEEE Journal of Solid-State Circuits, vol. 26, no. 9, pp. 1317–1321, Sep. 1991.
[6] R. Shukla and K. C. Ray, "Low latency hybrid CORDIC algorithm," IEEE Transactions on Computers, vol. 63, no. 12, pp. 3066–3078, Dec. 2014.
[7] E. Antelo, J. Villalba, J. D. Bruguera, and E. L. Zapata, "High performance rotation architectures based on the radix-4 CORDIC algorithm," IEEE Transactions on Computers, vol. 46, no. 8, pp. 855–870, Aug. 1997.
[8] D. S. Phatak, "Double step branching CORDIC: a new algorithm for fast sine and cosine generation," IEEE Transactions on Computers, vol. 47, no. 5, pp. 587–602, May 1998.
[9] G. D. Sutter, J. Deschamps, and J. L. Imana, "Modular multiplication and exponentiation architectures for fast RSA cryptosystem based on digit serial computation," IEEE Transactions on Industrial Electronics, vol. 58, no. 7, pp. 3101–3109, July 2011.
[10] J. Bruguera, E. Antelo, and E. Zapata, "Design of a pipelined radix 4 CORDIC processor," Parallel Computing, vol. 19, no. 7, pp. 729–744, 1993.
[11] B. Parhami, Computer Arithmetic: Algorithms and Hardware Designs. Oxford, UK: Oxford University Press, 2000.
[12] A. Changela, M. Zaveri, and A. Lakhlani, "FPGA implementation of asynchronous mousetrap pipelined radix-2 CORDIC algorithm," in IEEE International Conference on Current Trends towards Converging Technologies, March 2018, pp. 45–50.
[13] P. R. Rao and I. Chakrabarti, "High-performance compensation technique for the radix-4 CORDIC algorithm," IEE Proceedings - Computers and Digital Techniques, vol. 149, no. 5, pp. 219–228, Sep. 2002.
