
Fpga Implementation of Modified Radix 2 SRT Division Algorithm

Uploaded by

Asif Khan

FPGA Implementation Of Modified Radix 2 SRT Division Algorithm

Attif A. Ibrahem¹, Hamed Elsimary¹, Aly E. Salama²

¹ Electronics Research Institute, Cairo, Egypt
² Cairo University, Cairo, Egypt

0-7803-8294-3/04/$20.00 ©2004 IEEE

Abstract - The flexibility of field programmable gate arrays (FPGAs) can provide arithmetic-intensive applications with the benefits of custom hardware but without the high cost of custom silicon implementations. In this paper, we present the adaptation of the modified radix 2 division algorithm [1] for lookup table based FPGA implementation. In this modified scheme, the result digits and the residuals are computed concurrently and the computations in adjacent rows are overlapped. The implementation has been done with Xilinx technology and FPGA-Advantage CAD tools.

Keywords: Field programmable gate arrays (FPGAs), division, SRT division, two-digit quotient selection.

1. Introduction

The flexibility of field programmable gate arrays (FPGAs) allows the rapid development of high performance custom hardware. By selecting arithmetic algorithms suited to the FPGA technology and subsequently applying optimal mapping strategies, high performance FPGA implementations can be developed [2]. In designing fast, hardware-oriented arithmetic algorithms, VLSI architectural design issues such as regularity, modularity and locality of interconnections must be addressed. This facilitates the mapping of algorithms onto an architecture which is amenable to a VLSI implementation and assists in the test and complexity management of designs [1]. The purpose of this paper is to briefly describe the modified division scheme and its adaptation for lookup table based FPGA implementation.

The paper is structured as follows. Section 2 gives a brief description of the standard SRT division method and identifies the factors which inhibit performance. In Section 3, we give a brief description of the modified radix 2 SRT division algorithm [1] and its array architecture. Section 4 presents the critical factors in efficiently matching division to a given set of FPGA characteristics [2]. Section 5 presents simulation results. Section 6 presents implementation results. We provide our conclusions in Section 7.

2. Standard SRT Division Algorithm

SRT division [3] is a digit-recurrence algorithm which utilises arithmetic redundancy [4] to reduce the required precision of comparisons between the divisor and the residual. The radix r recurrence for computing successive residuals is:

$$R_j = r R_{j-1} - q_j d, \quad j = 1, 2, 3, \ldots \tag{1}$$

where $R_j$ is the residual, $d \in [1/2, 1)$ is the normalized divisor and $q_j \in \{-a, \ldots, a\}$ is the jth quotient digit. The residual at the jth step must satisfy $|R_j| < a d/(r-1)$. The quotient is accumulated by appending successive quotient digits to the partial quotient $Q_j$, i.e.,

$$Q_j = Q_{j-1} + q_j r^{-j}.$$

There are two main factors which limit the performance of the SRT method. Firstly, a serial dependency exists among the iterations. This is fundamental to the successive approximation method of computing the quotient. Secondly, the computations of the residual and of the quotient digit are performed sequentially.

3. Modified Radix 2 SRT Division

The key to this new radix 2 method of division [1] is the reformulation of the recurrence in (1) as

$$R_j^* = 2R_{j-1} + \phi, \qquad \phi = \begin{cases} +d & \text{if } q_j \in \{-1, 0\} \\ -d & \text{if } q_j \in \{0, 1\} \end{cases}$$

$$R_j = \begin{cases} R_j^* & \text{if } q_j = -1 \text{ or } 1 \\ 2R_{j-1} & \text{if } q_j = 0 \end{cases} \tag{2}$$

where $d \in [d_{min}, d_{max})$ and $q_j$ is chosen from the signed binary number representation (SBNR) digit set. Clearly, the computation of the tentative residual, $R_j^*$, can proceed before the full quotient digit has been computed. All that is required is that the quotient digit be located in either the {0,1} or the {-1,0} subset. The quotient digit can then be computed concurrently, in a separate path from that of the tentative residual. To facilitate this, it must be possible to compute $\phi$ as quickly as possible. Therefore, $\phi$ should be dependent upon the MSD only of the residual
and redundancy overflow in the residual should be avoided. This can be achieved by introducing a more stringent bound on the residual, namely $|R_j| < d_{min}$. Fig. 1 shows the selection regions which are used to design the quotient digit selection function for the division algorithm [1].

[Fig. 1 Selection regions for division]

Table 1 details the necessary control signals required by the algorithm for each digit-pair comprising the residual estimate. The restore signal is required when a zero is selected as a quotient digit in order to select the previous residual, rather than the tentative residual, as the new residual. The compress signal is required when the quotient digit is a zero and the MSD of the residual is non-zero, i.e. when $\hat{R} = 0.1\bar{1}$ (where $\hat{R}$ is the estimated residual of $2R_{j-1}$) or $\hat{R} = 0.\bar{1}1$. The signal which determines whether the divisor multiple is $d$ or $-d$ is the add/subtract signal, a/s for brevity, and can be determined from the MSD of the residual as required. It is important to note that the first digit of the term $\hat{R} - q d_2$, where $d_2$ is the two MSDs of the divisor, is always zero such that, when the new residual is scaled, no redundancy overflow occurs.

Table 1 Quotient digit and signal generation for modified SRT division

$\hat{R}$          $q$         a/s   restore   compress   $\hat{R} - q d_2$
00                 0           x     1         0          00
01                 0           x     1         0          01
0$\bar{1}$         0           x     1         0          0$\bar{1}$
10                 1           0     0         x          00
11                 1           0     0         x          01
1$\bar{1}$         0           x     1         1          01
$\bar{1}$0         $\bar{1}$   1     0         x          00
$\bar{1}$1         0           x     1         1          0$\bar{1}$
$\bar{1}\bar{1}$   $\bar{1}$   1     0         x          0$\bar{1}$

x - Don't care

3.1 Division Array Architecture

An architecture to implement the modified SRT division algorithm is illustrated in Fig. 2. The circuit comprises a regular array of type 1 and type 2 cells, with quotient digits and control signals being determined by the S cells on the periphery of the array. The functional and gate level descriptions of the basic cells are given in Fig. 3. The divisor digits $d_3 d_4 d_5 \ldots$ and the dividend $x = 0.0x_2x_3\ldots$ enter the array in a bit-parallel manner as shown. Since the two MSDs of the divisor are known implicitly [1], they are not input to the array. Each signed binary digit, $R$, comprising the residual is composed of two digits, namely $R^+ \in \{0, 1\}$ and $R^- \in \{-1, 0\}$, which are encoded as shown in Table 2. This coding enables conventional full-adders to be employed in adding/subtracting a signed binary operand and a binary operand.

Table 2 Coding of $R^+$ and $R^-$
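As a rough illustration of recurrences (1) and (2), the following Python sketch runs the radix 2 iteration on exact fractions. It is not the array of [1]: the function name `srt_radix2_divide`, the use of the residual's sign in place of its signed-digit MSD, and the selection thresholds of ±1/2 (the textbook radix 2 SRT constants) are assumptions made for this example, and the tighter bound $|R_j| < d_{min}$ of the modified scheme is not modelled.

```python
from fractions import Fraction


def srt_radix2_divide(x, d, n_digits):
    """Sketch of radix 2 SRT division in the tentative-residual form of (2).

    Returns (digits, quotient, residual), where digits are in {-1, 0, 1}.
    Assumes a normalized divisor 1/2 <= d < 1 and a dividend with |x| <= d.
    """
    x, d = Fraction(x), Fraction(d)
    half = Fraction(1, 2)
    if not (half <= d < 1):
        raise ValueError("divisor must be normalized to [1/2, 1)")
    if abs(x) > d:
        raise ValueError("dividend must satisfy |x| <= d")

    residual = x            # R_0
    quotient = Fraction(0)  # Q_0
    digits = []
    for j in range(1, n_digits + 1):
        shifted = 2 * residual  # 2 * R_{j-1}

        # phi is chosen from the sign of the residual only (assumption:
        # the sign stands in for the MSD of the signed-digit residual).
        phi = -d if shifted >= 0 else d
        tentative = shifted + phi  # R*_j, available before q_j is known

        # Quotient digit selection with the usual radix 2 thresholds.
        if shifted >= half:
            q = 1
        elif shifted < -half:
            q = -1
        else:
            q = 0

        # restore: a zero digit keeps the shifted previous residual.
        restore = (q == 0)
        residual = shifted if restore else tentative

        quotient += Fraction(q, 2 ** j)  # Q_j = Q_{j-1} + q_j * 2^(-j)
        digits.append(q)
    return digits, quotient, residual
```

For the operands used in the Section 5 simulation, `srt_radix2_divide(Fraction(1, 4), Fraction(1, 2), 6)` yields the digits `[1, 0, 0, 0, 0, 0]`, i.e. a quotient of 0.5. In general the sketch keeps the identity $x = d\,Q_n + 2^{-n} R_n$, so the quotient error after n digits is at most $2^{-n}$.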
4. Division and FPGAs

The proper selection of radix and algorithm [2] are critical factors in efficiently matching division to a given set of FPGA characteristics. A higher radix will have greater combinational logic depth but fewer required iterations. Each of the digit-recurrence division algorithms differs primarily in the number of bits required to compute the following basic steps:

(1) Select a quotient digit, $q_j$,
(2) Form a multiple of the divisor, $q_j d$, and
(3) Compute the next residual, $R_j$.

[Fig. 2 Radix 2 division array]

A choice of the best radix for a lookup table based FPGA [5] is determined by comparing the maximum function size of the division algorithm (based on number of input bits) with the function size of a single logic block. A composition of switching functions maps most efficiently into k-input lookup tables when each function has no more than k inputs. With more than k inputs a function requires at least two lookup tables, and a composition of n such functions may require more than 2n logic blocks. By this rationale, the best division algorithm for k-input logic blocks is the one with the largest radix having steps of a maximum of k input bits. For example, SRT (radix 2) division is the best choice for the XC4010 under this criterion. SRT requires only four bits for the quotient digit selection function whereas higher radices need at least six bits. SRT also uses three bits for the generation of each bit of the divisor multiple while higher radices use at least five. The computation of $rR_{j-1}$ (the shifted estimated residual $R_{j-1}$) requires, for all radices, more input bits than the general 5-input lookup table of an XC4010 CLB, but this computation has the fewest input bits with radix 2.

[Fig. 3 Description of the basic cells]

5. Simulation Results

The proposed radix 2 division algorithm was described in VHDL, compiled and simulated using ModelSim. A simulation result of the proposed divider is shown in Fig. 4 for the input operands $x = (0.0100000)_2 = 0.25$ (in binary signed digit, BSD, form) and $d = (0.100000)_2 = 0.5$ (in regular
binary form), which results in $q = (0.10000)_2 = 0.5$ (in BSD form).

[Fig. 4 Functional simulation results]

6. Implementation Results

The VHDL code for the proposed divider was processed by the Leonardo synthesis tool for the Xilinx XC4010 FPGA (4010EPQ160). The result of the synthesis process is shown in Table 3.

Table 3 Synthesis results for the Xilinx XC4010 FPGA

Speed grade   Max. estimated clock   FG   HFG   CLB   IOs
-4            4.75 MHz               90   24    51    36
-3            7.0 MHz                90   24    51    36

7. Conclusions

Generating an efficient lookup table based FPGA implementation for arithmetic requires (1) the selection of an algorithm suited to the target technology, (2) the creation of a suitable variation of that algorithm for the target characteristics, and (3) an efficient mapping approach. A well-matched algorithm is recognized by the simple decomposition of its intermediate steps into expressions of k variables or fewer, where k is the number of inputs in the lookup tables. This paper has also briefly described the modified radix 2 SRT division algorithm. The premise of the approach has been that concurrently computing the residual and the result digit at each step leads to an increase in the performance of the circuits compared to the SRT methods. The penalty of the modified method appears in the reduced range of the operands.

8. References

[1] S. E. McQuillan, J. V. McCanny, and R. Hamill, "New Algorithms and VLSI Architectures for SRT Division and Square Root," Proc. 11th Symp. Computer Arithmetic, pp. 80-86, Windsor, Ontario, Canada, June 29-July 2, 1993.
[2] M. E. Louie and M. D. Ercegovac, "On Digit-Recurrence Division Implementations for Field Programmable Gate Arrays," Proc. 11th Symp. Computer Arithmetic, pp. 202-209, Canada, June 29-July 2, 1993.
[3] J. E. Robertson, "A New Class of Digital Division Methods," IRE Trans. Electronic Computers, vol. 7, pp. 218-222, Sept. 1958.
[4] D. E. Atkins, "Introduction to the Role of Redundancy in Computer Arithmetic," IEEE Computer, 1975, pp. 74-77.
[5] Xilinx, "XC4000 Logic Cell Array Family - Technical Data," San Jose, 1990.
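The lookup table sizing argument of Section 4 can be illustrated with a small counting sketch. It uses only a generic lower bound: a k-input lookup table merges at most k signals into one, so a single-output function that depends on n inputs needs at least ceil((n - 1)/(k - 1)) tables. The function name `min_luts` is hypothetical, the bound says nothing about how many tables a particular function really needs, and k = 5 follows the paper's description of the XC4010 CLB.

```python
import math


def min_luts(n_inputs, k):
    """Lower bound on the k-input LUTs needed by one function of n_inputs.

    Each LUT reduces the number of live signals by at most k - 1, so going
    from n_inputs signals down to a single output takes at least
    ceil((n_inputs - 1) / (k - 1)) LUTs. At least one LUT is always used.
    """
    if n_inputs < 1 or k < 2:
        raise ValueError("need n_inputs >= 1 and k >= 2")
    return max(1, math.ceil((n_inputs - 1) / (k - 1)))


# Input-bit counts quoted in Section 4, with k = 5 as stated for the XC4010.
K = 5
steps = {
    "radix 2 quotient digit selection (4 bits)": 4,
    "radix 2 divisor multiple bit (3 bits)": 3,
    "higher radix quotient digit selection (6 bits)": 6,
    "higher radix divisor multiple bit (5 bits)": 5,
}
for name, bits in steps.items():
    print(f"{name}: at least {min_luts(bits, K)} LUT(s)")
```

Under this bound both radix 2 steps fit in a single table, while a six-input selection function already needs at least two, which is the comparison Section 4 relies on.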