FPGA Implementation of Modified Radix 2 SRT Division Algorithm
Abstract - The flexibility of field programmable gate arrays (FPGAs)
can provide arithmetic-intensive applications with the benefits of
custom hardware without the high cost of custom silicon
implementations. In this paper, we present the adaptation of the
modified radix 2 division algorithm [1] for lookup table based FPGA
implementation. In this modified scheme, the result digits and the
residuals are computed concurrently, and the computations in adjacent
rows are overlapped. The implementation has been done with Xilinx
technology and FPGA-Advantage CAD tools.

Keywords: Field programmable gate arrays (FPGAs), division, SRT
division, two-digit quotient selection.

The radix $r$ recurrence for computing successive residuals is:

$R_j = r R_{j-1} - q_j d, \quad j = 1, 2, 3, \ldots$    (1)

where $R_j$ is the residual, $d \in [1/2, 1)$ is the normalized
divisor, and $q_j \in \{-a, \ldots, a\}$ is the $j$th quotient digit.
The residual at the $j$th step must satisfy $|R_j| < a d/(r-1)$. The
quotient is accumulated by appending successive quotient digits to the
partial quotient $Q_j$.
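The recurrence (1) and the digit-by-digit accumulation of the quotient can be illustrated with a minimal Python sketch for $r = 2$ and $a = 1$. It is not the modified algorithm of [1]: the selection thresholds at $\pm 1/2$ are the classical radix 2 SRT choice, and the function name `srt_radix2_divide`, the initial residual $R_0 = x$, and the update $Q_j = Q_{j-1} + q_j 2^{-j}$ are assumptions made for illustration. Exact rational arithmetic is used so that the residual bound can be checked at every step.

```python
from fractions import Fraction

def srt_radix2_divide(x, d, n):
    """Return (digits, Q, R) after n steps of R_j = 2*R_(j-1) - q_j*d."""
    x, d = Fraction(x), Fraction(d)
    # Assumed preconditions: normalized divisor and |x| < d.
    assert Fraction(1, 2) <= d < 1 and abs(x) < d
    R = x              # R_0 is taken to be the dividend (assumption)
    Q = Fraction(0)    # partial quotient Q_0
    digits = []
    for j in range(1, n + 1):
        shifted = 2 * R
        # Assumed selection rule: classical thresholds at +1/2 and -1/2.
        if shifted >= Fraction(1, 2):
            q = 1
        elif shifted <= Fraction(-1, 2):
            q = -1
        else:
            q = 0
        R = shifted - q * d               # recurrence (1) with r = 2
        assert abs(R) < d                 # bound a*d/(r-1) with a = 1, r = 2
        Q += q * Fraction(1, 2 ** j)      # append digit q_j (assumed form)
        digits.append(q)
    return digits, Q, R

digits, Q, R = srt_radix2_divide(Fraction(1, 4), Fraction(1, 2), 6)
print(digits, Q, R)   # [1, 0, 0, 0, 0, 0] 1/2 0
```

Under these assumptions the identity $x = d\,Q_n + R_n 2^{-n}$ holds after every step, which is a convenient sanity check for any chosen operands.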
4. Division and FPGAs

Fig. 2 Radix 2 division array
blocks is the one with the largest radix whose steps take a maximum of
k input bits. For example, SRT (radix 2) division is the best choice
for the XC4010 under this criterion. SRT requires only four bits for
the quotient digit selection function, whereas higher radices need at
least six bits. SRT also uses three bits for the generation of each bit
of the divisor multiple, while higher radices use at least five. The
computation of the shifted estimated residual $rR_{j-1}$ requires, for
all radices, more input bits than the general 5-input lookup table of
an XC4010 CLB provides, but this computation has the fewest input bits
with radix 2.

Fig. 3 Description of the basic cells

5. Simulation results

The proposed radix 2 division algorithm was described in VHDL, then
compiled and simulated using ModelSim. A simulation result of the
proposed divider is shown in Fig. 4 for the input operands
$x = (0.0100000)_2 = 0.25$ (in binary signed digit, BSD, form) and
$d = (0.100000)_2 = 0.5$ (in regular binary form), which results in
$q = (0.10000)_2 = 0.5$ (in BSD form).
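The operand values quoted above can be checked with a small Python sketch that evaluates binary signed digit (BSD) fractions. The helper names `bsd_value` and `bsd_to_scaled_int` are hypothetical, and the conversion shown (positive digits minus negative digits) is one common way to turn a BSD string into ordinary binary; the source does not say how the divider converts its result.

```python
from fractions import Fraction

def bsd_value(digits):
    """Value of 0.d1 d2 ... dn with each digit in {-1, 0, 1}."""
    assert all(dg in (-1, 0, 1) for dg in digits)
    return sum(dg * Fraction(1, 2 ** (i + 1)) for i, dg in enumerate(digits))

def bsd_to_scaled_int(digits):
    """Ordinary integer equal to the BSD fraction times 2**len(digits)."""
    n = len(digits)
    # Weight of digit i (0-based) is 2**(n-1-i) after scaling by 2**n.
    positive = sum(1 << (n - 1 - i) for i, dg in enumerate(digits) if dg == 1)
    negative = sum(1 << (n - 1 - i) for i, dg in enumerate(digits) if dg == -1)
    return positive - negative

x = bsd_value([0, 1, 0, 0, 0, 0, 0])   # (0.0100000)_2 in BSD form
d = Fraction(1, 2)                     # (0.100000)_2 in regular binary
q = bsd_value([1, 0, 0, 0, 0])         # (0.10000)_2 in BSD form
print(x, d, q, x / d == q)             # 1/4 1/2 1/2 True
```

Because BSD is redundant, the same value can have several digit strings; for instance `bsd_value([1, -1])` and `bsd_value([0, 1])` both give 1/4.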
Fig. 4 Functional simulation results

6. Implementation results

The VHDL code for the proposed divider was processed by the Leonardo
synthesis tool for the Xilinx XC4010 FPGA (4010ePQ160). The result of
the synthesis process is shown in Table 3.

Table 3. Synthesis results for the Xilinx XC4010 FPGA

Speed grade   Max. estimated clock   FG   HFG   CLB   IOs
-4            4.75 MHz               90   24    51    36
-3            7.0 MHz                90   24    51    36

7. Conclusions

Generating an efficient lookup table based FPGA implementation for
arithmetic requires (1) the selection of an algorithm suited to the
target technology, (2) the creation of a suitable variation of that
algorithm for the target characteristics, and (3) an efficient mapping
approach. A well-matched algorithm is recognized by the simple
decomposition of its intermediate steps into expressions of at most k
variables, where k is the number of inputs in the lookup tables. This
paper has also briefly described the modified radix 2 SRT division
algorithm. The premise of the approach has been that concurrently
computing the residual and the result digit at each step leads to an
increase in the performance of the circuits compared to the SRT
methods. The penalty of the modified method appears in the reduced
range of the operands.

8. References

[1] S.E. McQuillan, J.V. McCanny, and R. Hamill, "New Algorithms and
    VLSI Architectures for SRT Division and Square Root," Proc. 11th
    Symp. Computer Arithmetic, pp. 80-86, Windsor, Ontario, Canada,
    June 29-July 2, 1993.
[2] M.E. Louie and M.D. Ercegovac, "On Digit-Recurrence Division
    Implementations for Field Programmable Gate Arrays," Proc. 11th
    Symp. Computer Arithmetic, pp. 202-209, Canada, June 29-July 2,
    1993.
[3] J.E. Robertson, "A New Class of Digital Division Methods," IRE
    Trans. Electronic Computers, Vol. 7, pp. 218-222, Sept. 1958.
[4] D.E. Atkins, "Introduction to the Role of Redundancy in Computer
    Arithmetic," IEEE Computer, 1975, pp. 74-77.
[5] Xilinx, "XC4000 Logic Cell Array Family - Technical Data," San
    Jose, 1990.