A Fast Hybrid Multiplier Combining Booth and Wallace/Dadda Algorithms

A FAST HYBRID MULTIPLIER COMBINING BOOTH AND WALLACE/DADDA ALGORITHMS

Brian Millar and Philip E. Madrid
Motorola Inc.
6501 William Cannon Drive West
Austin, TX 78741

Earl E. Swartzlander, Jr.
Dept. of Electrical and Computer Engineering
University of Texas at Austin
Austin, TX 78712

0-7803-0510-8/92 $03.00 © 1992 IEEE

Abstract

Using radix 4 recoding, a very fast multiplier is designed which takes advantage of the most desirable characteristics of the Booth and Wallace/Dadda multiplier schemes. This hybrid multiplier is shown to be superior from a performance standpoint to the traditional Wallace/Dadda multiplier and, consequently, superior to the traditional Booth multiplier as well. Different methods of further increasing the speed are also suggested. Radix 4 is chosen because it is optimal for such a multiplier. This will be explained by showing the limitation of increasing the radix to radix 8.

1. Introduction

There are many methods to perform binary multiplication of two numbers. Some of the algorithms strive for minimization of gate count, layout area, or interconnect. Others are designed to be extremely fast, but increase the size of the multiplier significantly. Two well known multiplication algorithms, Booth higher radix multiplication [1,2,3] and Wallace/Dadda multipliers [4,5], when combined in a hybrid fashion, can yield a multiplier that is faster and not more than 30% larger than the traditional Wallace/Dadda multiplier. A similar idea has been proposed previously [6], but this paper shows the implementation. The higher radix Booth algorithm has the advantage of reducing the number of partial products which are added [2], while the Wallace/Dadda approach allows the partial products to be added very quickly. A hybrid multiplier combining both of these strengths is presented. It is shown to be very fast, and realized with a reasonable amount of logic gates. Section 2 develops a 32x32 bit hybrid multiplier and analyzes the size and gate delay compared with a straightforward Wallace/Dadda multiplier. Section 3 briefly discusses why radices higher than 4 are not advantageous to this multiplier.

2. Radix 4 Booth and Wallace/Dadda Hybrid

A fast 32x32 bit multiplier is designed combining the Booth radix 4 and Wallace/Dadda techniques. The Booth approach generates fewer partial products than the Wallace/Dadda approach. The Wallace/Dadda method, however, adds the partial products significantly faster than a traditional Booth multiplier, which accumulates a running sum of the partial products. It might seem therefore that the higher the radix of Booth implemented, the fewer partial products added by the Wallace/Dadda structure and hence the faster the multiplier. This is not the case, however, as will be understood by the end of this paper.

This hybrid multiplier is readily divided into four major blocks for discussion: a partial product generation block, a partial product selection block, a reduction tree, and a high speed adder.

2.1 Partial Product Generation

The two non-zero numbers being multiplied are A and B, where A is the multiplier word in the form:

    A = a_s . a_31 a_30 ... a_0

and B is the multiplicand in the form:

    B = b_s . b_31 b_30 ... b_0

The numbers have 1 sign bit and 32 binary digits. Table 1 shows the required steps in the Booth algorithm to generate a partial product [2,3]. The appropriate bits of the multiplier word are analyzed at each step, and the described action, if any, is performed on the multiplicand before it is added to the partial product and right shifted by two bit positions.

Table 1: Booth Radix 4 Partial Product Rules.

    Bits a_(i+1), a_i, a_(i-1)    Booth step before shifting
    000                           Nothing
    001                           +B
    010                           +B
    011                           +2B
    100                           -2B
    101                           -B
    110                           -B
    111                           Nothing

In hardware, the partial products B and 2B can be generated immediately. B is an input to the multiplier, and 2B is formed by shifting the bits of B to the left one place. -2B can also be generated immediately once -B is computed. On the surface, it appears that -B will require a high-speed adder to compute, and therefore require a large overhead to generate. However, a trick will be used here: only B̄ will be formed (the 1's complement). This is easily done with an inverter for each bit. Then, the carry in of one is added in the reduction structure, to complete the 2's complement. A special mux for the LSB enables this carry in to be set when a "-B" partial product is needed.

The important realization here is that adding the carry in to the reduction section does not in any way slow down the reduction, but it does greatly reduce the time needed to generate the partial products. Were it not for this, a high-speed adder would be necessary to generate -B from B. This will be clear from the Dadda reduction structure shown later.

The partial product generation is shown by Figure 1, where a colon is used to denote concatenation of bits. The partial products are extended to 33 bits. For B and B̄, the sign bit is
extended. For 2B, B is shifted left by one unit and a '0' is wired in for a positive operation. For 2B̄, B̄ is shifted left one unit, and a '1' is wired in for a negative operation. Some examples will demonstrate this later.

The partial products are available in 1 gate delay for radix 4. All logic gates, whether they are inverters or NAND gates, will be assumed to have 1 gate delay, indicated as "1Δ". This 1Δ, needed to form all possible partial products, is absorbed in the next block, the partial product selection.

Figure 1. Partial Product Generation. [Four 33-bit outputs are formed from B: 2B = B:'0', B = MSB(B):B, B̄ = MSB(B̄):B̄, and 2B̄ = B̄:'1'.]

2.2 Partial Product Selection

With the implementation of radix 4 for a 32x32 bit multiplication, there are 17 partial products for the reduction structure (see Figure 2). Also, there are 33 bits per partial product: B and B̄ are sign extended, 2B shifts in a '0' as the LSB, and 2B̄ shifts in a '1' as the LSB. The LSB mux contains several additional gates to control the carry in signal. The regular muxes and the LSB mux are depicted in Figure 3. As can be seen, the regular muxes and the LSB muxes contain 10 and 13 gates respectively. A fan-in of 6 may not be very desirable here, but with appropriate sizing at the transistor level, this OR gate may not be a problem. The muxes select the appropriate partial product (or 0 for just shift, which results if none of the other partial products are selected) based on the bits of the multiplier word being analyzed, according to the Booth radix 4 rules already presented.

The general mux structure is shown by Figure 2, where each successive partial product is shifted left by 2 units in the reduction structure. The last partial product requires only 32 muxes instead of 33. This is because the sign bit of the multiplier word is extended (see Figure 2) and, according to Table 1, only B or B̄ can result for the last partial product. The sign extension to 33 bits is not needed here.

With the muxes as shown, the total delay from the partial products is 3Δ. This absorbs the 1Δ delay for the inverters needed to form B̄ and 2B̄ in 2.1. This is because B̄ and 2B̄ can be formed in parallel with the inverters forming the complements of the mux select lines. The total gate count for the muxes needed to perform a 32x32 bit multiplication in radix 4, where PP denotes Partial Product, is:

    16 PPs * 32 regular muxes * 10 gates/mux = 5120 gates
     1 PP  * 31 regular muxes * 10 gates/mux =   31 gates
    17 PPs *  1 LSB mux       * 13 gates/mux =  221 gates

    Total = 5372 gates

Figure 2. Mux Tree Structure. For each group of three bits of the multiplier word, the appropriate partial product is muxed according to Booth Radix 4 rules. This builds a mux tree, where each partial product is shifted by 2 bit positions before the reduction structure.

Figure 3. Regular and LSB Mux shown in Mux Tree Structure.

2.3 Reduction Structure Followed by a High-Speed Adder

After a delay of only 3Δ, the partial products are available to the reduction structure. There are at most 18 bits per column, including the carry in's. There are only 17 partial products, but one column in the reduction structure has 18 "dots" due to the potential carry-in. Using Dadda's sequence [5] with 18 dots as the limiting factor, 6 reduction steps are required to compress the partial products down to 2 for the high-speed adder. The 6 steps compress the structure down in the following manner:

    18->13->9->6->4->3->2.

With a conventional 32x32 bit Wallace/Dadda multiplier, the partial products are available in 1Δ, but there are 32 partial products, which requires 8 reduction steps:

    32->28->19->13->9->6->4->3->2.

The reduction structure following the mux section is shown in Figure 4. From the figure, several observations can be made. First, there are 17 rows, one for each partial product. Each partial product is sign-extended except for the last one, which needs only 32 bits to form a complete 64 bit result. The carry muxes select whether or not B̄ or 2B̄ was formed and needs the carry in to form the true 2's complement. The final sign bit is the XOR of the sign bits of A & B. This reduction structure then gets reduced as shown in Figure 5, to a 2 row 64 bit result. Finally, the reduction structure is fed into a 64 bit high-speed adder which produces the final result. In Figure 5, H's and F's are used to represent half and full adders respectively. Typically Dadda "dots" are used to show the reduction steps.
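The two reduction chains quoted above follow Dadda's height sequence 2, 3, 4, 6, 9, 13, 19, 28, ..., in which each term is the floor of 1.5 times the previous one. As a rough sketch (the function names are hypothetical and this is a software model, not the authors' design flow), the chain and the step count for a given maximum column height can be computed as follows:

```python
def dadda_heights(rows):
    """Terms of Dadda's sequence (2, 3, 4, 6, 9, 13, ...) that are below `rows`."""
    heights = []
    h = 2
    while h < rows:
        heights.append(h)
        h = h * 3 // 2          # next term is floor(1.5 * h)
    return heights


def reduction_stages(rows):
    """Column-height targets visited while compressing `rows` rows down to 2."""
    return [rows] + list(reversed(dadda_heights(rows)))


if __name__ == "__main__":
    for rows in (18, 32):
        stages = reduction_stages(rows)
        print("->".join(map(str, stages)), "steps:", len(stages) - 1)
```

Under these assumptions, 18 rows need 6 steps and 32 rows need 8, which matches the chains given in the text.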
Figure 4. Booth Generated Partial Products. [Dot diagram of the 17 partial product rows, each shifted left by two bit positions relative to the previous one. Legend: mux outputs; carry from the mux to complete the two's complement; sign extended bit; sign bit, A(s) XOR B(s).]
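As a minimal software sketch of the recoding behind these partial products (not the authors' hardware), the radix 4 Booth rules can be modeled as a lookup on overlapping three-bit groups. The names below are hypothetical, and the multiplier is assumed to be a two's complement integer already sign-extended to an even width `nbits` (34 bits for a sign bit plus 32 digits, which yields 17 digits).

```python
# Radix 4 Booth rules: three-bit group -> multiple of B to add.
RADIX4_RULES = {
    0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
    0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0,
}


def booth_radix4_digits(a, nbits):
    """Recode `a` (two's complement, `nbits` wide, nbits even) into digits, LSB first."""
    bits = (a & ((1 << nbits) - 1)) << 1      # append a 0 below the LSB
    return [RADIX4_RULES[(bits >> i) & 0b111] for i in range(0, nbits, 2)]


def booth_multiply(a, b, nbits):
    """Sum the selected multiples of b, each weighted by 4**i."""
    return sum(d * b * 4 ** i for i, d in enumerate(booth_radix4_digits(a, nbits)))


if __name__ == "__main__":
    print(booth_radix4_digits(22, 8))          # groups 100, 011, 010, 000
    print(booth_multiply(22, 20, 8))           # 22 * 20
    print(len(booth_radix4_digits(-5, 34)))    # digits for a 34-bit word
```

Each digit selects one of 0, ±B, ±2B, so only shifts and inversions of B are needed, which is the property the hybrid multiplier relies on.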

Figure 5. Dadda "Dot" Diagram. F = Full Adder, H = Half Adder. [Column heights and the full and half adders used at each reduction stage, 18->13->9->6->4->3->2, followed by the 64 bit high-speed adder.]
2.4 Examples

Several examples follow to illustrate different cases in the multiplier. In the interest of brevity, the examples are smaller than a 32x32 bit multiplication, but still serve to illustrate how the multiplier works. The examples show how the partial products would appear to the reduction block, and the final result.

Example 1

    A = 11/32 = 0.010110
    B =  5/16 = 0.010100
    B̄ =         1.101011
    Answer = 55/512 = 0.000110111000

    A = 000101100 (append 0 to LSB in Booth and sign extend)

    Partial product   Booth bits   Action
               1
    111111010111      100          2B̄+1
    0000101000        011          2B
    00010100          010          B
    000000            000          None

    0.000110111000    (sign bit 0 is the XOR of the sign bits of A & B)

Example 2

    A = 11/32 = 0.010110
    B = -5/16 = 1.101100
    B̄ =         0.010011
    Answer = -55/512 = 1.111001001000

    A = 000101100 (append 0 to LSB in Booth and sign extend)

    Partial product   Booth bits   Action
               1
    000000100111      100          2B̄+1
    1111011000        011          2B
    11101100          010          B
    000000            000          None

    1.111001001000    (sign bit 1 is the XOR of the sign bits of A & B)

Example 3

    A = -25/32 = 1.001110
    B =   5/16 = 0.010100
    B̄ =          1.101011
    Answer = -125/512 = 1.110000011000

    A = 110011100 (append 0 to LSB in Booth and sign extend)

    Partial product   Booth bits   Action
               1
    111111010111      100          2B̄+1
    0000000000        111          None
    00010100          001          B
         1
    101011            110          B̄+1

    1.110000011000    (sign bit 1 is the XOR of the sign bits of A & B)

Example 4

    A = -25/32 = 1.001110
    B =  -5/16 = 1.101100
    B̄ =          0.010011
    Answer = 125/512 = 0.001111101000

    A = 110011100 (append 0 to LSB in Booth and sign extend)

    Partial product   Booth bits   Action
               1
    000000100111      100          2B̄+1
    0000000000        111          None
    11101100          001          B
         1
    010011            110          B̄+1

    0.001111101000    (sign bit 0 is the XOR of the sign bits of A & B)

Analyzing Example 1, there are four partial products. The one '1' all by itself is the carry in which is needed to form the 2's complement of 2B. The partial products are sign extended to form the correct number of bits. Notice the last partial product is not sign extended (it is only six bits, not seven) since the number of fractional bits needed in the result is already met. Also, the carry out of the most significant fractional digit is discarded. The sign bit is independently calculated as the XOR of the sign bits of A & B.

Example 2 shows the case when B is the negative of Example 1. The same characteristics can be seen here.

Examples 3 & 4 serve to illustrate a couple of points. They both contain a 2B̄+1 step (where 2B̄+1 is equivalent to 2 * the 2's complement of B). In Example 3, B is a positive value and in Example 4, B is a negative value. But, as can be seen, when the 2B̄+1 operation is needed, a '1' is shifted in regardless of the sign of B. Likewise, if a 2B operation were needed, a '0' would be shifted in regardless of the sign. Notice again that the last partial product is not sign extended (it is only six bits, not seven) since the number of fractional bits needed in the result is already met. Again the carry out of the MSB is discarded, and the sign is calculated from the XOR of the sign bits of A & B.
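The rows of the small examples can be regenerated with a short software model. This is only a sketch under stated assumptions: operands are integers scaled by 2^6 (so 11/32 is 22 and 5/16 is 20), the names `FRAC`, `fit`, `partial_products` and `product_fraction` are hypothetical, and the sign bit, which the text computes separately as an XOR, is not modeled. Negative multiples use the 1's complement plus a separate carry in, as described in Section 2.1.

```python
RULES = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
         0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}

FRAC = 6            # fraction bits of each operand in the small examples
WORD = FRAC + 1     # sign bit plus fraction bits
OUT = 2 * FRAC      # fraction bits of the product


def fit(pattern, width_in, width_out):
    """Sign-extend (or truncate) a width_in-bit pattern to width_out bits."""
    pattern &= (1 << width_in) - 1
    if pattern >> (width_in - 1):             # sign bit set: extend with ones
        pattern -= 1 << width_in
    return pattern & ((1 << width_out) - 1)


def partial_products(a, b):
    """Return (row bits, carry in, left shift) for each Booth group, LSB group first."""
    a_bits = (a & ((1 << (WORD + 1)) - 1)) << 1   # sign extend A, append a 0
    b_bits = b & ((1 << WORD) - 1)
    not_b = ~b & ((1 << WORD) - 1)                # 1's complement of B
    rows = []
    for shift in range(0, WORD + 1, 2):
        digit = RULES[(a_bits >> shift) & 0b111]
        width = OUT - shift                       # rows get narrower as they shift left
        if digit == 0:
            pattern, width_in, carry = 0, 1, 0
        elif digit == 1:
            pattern, width_in, carry = b_bits, WORD, 0
        elif digit == 2:
            pattern, width_in, carry = b_bits << 1, WORD + 1, 0          # shift in a 0
        elif digit == -1:
            pattern, width_in, carry = not_b, WORD, 1
        else:
            pattern, width_in, carry = (not_b << 1) | 1, WORD + 1, 1     # shift in a 1
        rows.append((format(fit(pattern, width_in, width), f"0{width}b"), carry, shift))
    return rows


def product_fraction(a, b):
    """Add rows and carry ins; the carry out of the top fraction bit is discarded."""
    total = sum((int(bits, 2) + carry) << shift
                for bits, carry, shift in partial_products(a, b))
    return format(total % (1 << OUT), f"0{OUT}b")


if __name__ == "__main__":
    for bits, carry, shift in partial_products(22, 20):   # Example 1
        print(bits, "carry in:", carry)
    print(product_fraction(22, 20))
```

Calling `partial_products(-50, 20)` and `partial_products(-50, -20)` in the same way corresponds to Examples 3 and 4.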
2.5 Analysis

In the following analysis, a standard 9 gate full adder is used, which requires 6Δ for the sum to be computed once its operands are present. Also, a standard 4 bit block size / 3-level carry look-ahead adder [7] is used for the 64 bit fast adder. Using this radix 4 approach, the total delay to compute the product is:

     3Δ for partial product generation/selection (muxes)
    36Δ for reduction (18->13->9->6->4->3->2):
        6 reduction steps * 6 gate delays for full adder
    14Δ for final high-speed carry look-ahead adder
    ---
    53Δ total

This compares to a delay of 63Δ for a traditional Wallace/Dadda approach:

     1Δ for partial product generation
    48Δ for reduction (32->28->19->13->9->6->4->3->2):
        8 reduction steps * 6 gate delays for full adder
    14Δ for final 64 bit high-speed carry look-ahead adder
    ---
    63Δ total

The total gate count for this hybrid approach is:

       32 for inverters to form B̄
     5372 for muxes (Figure 3)
     6210 for 690 full adders (Figure 5)
      248 for 62 half adders (Figure 5)
        4 for sign-bit XOR made from AND, OR, INV
    11866 total gates + a 64 bit high speed adder

This compares to a gate count for a traditional Wallace/Dadda of:

     1024 AND gates for partial products (32^2)
     8091 for 899 full adders
      124 for 31 half adders
        4 for sign-bit XOR made from AND, OR, INV
     9243 total gates + a 64 bit high speed adder

Thus a 28% increase in complexity results in a 16% decrease in delay for a 32x32 bit multiplier. The percentage increase in complexity would be lower if the high-speed carry look-ahead adder was figured into the calculation.

Consider now 64 bits. Using Booth radix 4 there are 33 partial products. For the hybrid approach, the delay is:

     3Δ for muxes and partial product generation
    48Δ for reduction (33->28->19->13->9->6->4->3->2)
    18Δ for 128 bit high-speed carry look-ahead adder
    ---
    69Δ total

The regular Wallace/Dadda approach incurs a delay of:

     1Δ for partial product generation
    60Δ for reduction (64->63->42->28->19->13->9->6->4->3->2)
    18Δ for 128 bit high-speed carry look-ahead adder
    ---
    79Δ total

This again is a savings of 10Δ. The hybrid multiplier saves reduction steps at the expense of increased area for the multiplexers. The number of steps saved depends on where in Dadda's sequence the number of partial products for both methods fall. For 32 and 64 bits, Booth reduces the number of partial product rows by a factor of 2, thereby saving 2 reduction steps. Any savings in reduction steps is very beneficial since each step requires 6Δ with this full-adder implementation.

1) To speed up both the Wallace/Dadda multiplier and the radix 4 hybrid presented, an analysis of a full-adder circuit at the transistor level can be made. With proper sizing, a straightforward "and-or" approach to the sum and carry [6] could yield smaller delay times than the modular full adder built from half adders. This could be a great savings since 6 gate delays could be cut to three for each reduction step. This is the most time critical part of the design.

2) Another possibility is to consider a faster high-speed adder such as a carry select adder.

3) Finally, the gate count could be significantly decreased by using dynamic logic in the multiplexer blocks.

3. Radices Higher than 4

On the surface, it might appear that using a higher radix Booth algorithm would make an even faster multiplier. Higher radices would produce fewer partial products, which in turn would require fewer reduction steps to compress to a two partial product result for the high-speed adder. However, the speed to generate, plus the size to select, the partial products is very limiting, as will be seen.

Table 2 shows the required steps in the Booth algorithm for radix 8 to generate a partial product. The bits of the multiplier word are analyzed at each step and the appropriate action, if any, is performed on the multiplicand before adding to the current partial product and shifting by three bit positions to the left.

From the table it can be seen that the following partial products are needed: B, 2B, 3B, 4B, -4B, -3B, -2B, -B. Limiting in terms of speed are the partial products 3B and -3B, which cannot be formed by simple shifting of B or -B. To perform this addition, a high speed adder must be used. For a 32x32 multiplication, it takes 14Δ to generate a 33 bit 3B, and an additional 1Δ for the inversion needed to generate the 1's complement of 3B. This incurs an accumulated delay of 15Δ just to create the partial products. This turns out to be a rather huge overhead.

For a radix 8 Booth multiplier, 4 bits of the multiplier word, A, are being examined. Based on these four bits, a selection of the appropriate partial product must be made for that step according to Table 2. This requires many multiplexers, as shown in Figure 6. The mux must select which of B, 2B, 3B, 4B, -4B, -3B, -2B, -B (or 0 for do nothing, just shift, but this will be the result if none of the other partial products are selected) should be the partial product for that step.
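For orientation, the radix 8 rule set can be generated from the usual textbook weighting of a four-bit group, digit = -4*b3 + 2*b2 + b1 + b0. The sketch below uses that standard formula rather than a transcription of the paper's Table 2, and the function names are hypothetical. It also lists the groups that select the "hard" multiples ±3B.

```python
def radix8_digit(group):
    """Booth radix 8 digit for a four-bit group (b3 b2 b1 b0), b0 being the overlap bit."""
    b3, b2, b1, b0 = (group >> 3) & 1, (group >> 2) & 1, (group >> 1) & 1, group & 1
    return -4 * b3 + 2 * b2 + b1 + b0


def booth_radix8_digits(a, nbits):
    """Recode `a` (two's complement, `nbits` wide, nbits a multiple of 3), LSB digit first."""
    bits = (a & ((1 << nbits) - 1)) << 1      # append a 0 below the LSB
    return [radix8_digit((bits >> i) & 0b1111) for i in range(0, nbits, 3)]


if __name__ == "__main__":
    for group in range(16):
        print(format(group, "04b"), radix8_digit(group))
    hard = [format(g, "04b") for g in range(16) if abs(radix8_digit(g)) == 3]
    print("groups needing 3B or -3B:", hard)
```

Four of the sixteen groups select 3B or -3B, and those are the multiples that cannot be produced by shifting and inverting B alone.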
Table 2: Booth Radix 8 Partial Product Rules.

    ...
    1110    -B
    1111    Nothing

Figure 6. Mux Needed for Booth Radix 8. [Mux inputs: 0, -B, -2B, -3B, -4B, +4B, +3B, +2B, +B, 0; select lines: bit3, bit2, bit1, bit0.]

From section 2.5, the delay of a straightforward Wallace/Dadda multiplier and of a radix 4 Booth and Wallace/Dadda
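The Section 2.5 totals referred to here can be re-derived with a few lines of arithmetic. This is a sketch using only figures quoted in the paper; the 4 gates per half adder is inferred from "248 for 62 half adders", and the variable names are hypothetical. One caveat: evaluating the second mux line literally (1 x 31 x 10) gives 310 rather than the printed 31, so the literal mux sum differs from the printed 5372. The sketch keeps the printed 5372 so that it reproduces the paper's totals, and reports the literal sum separately.

```python
FULL_ADDER_DELAY = 6        # gate delays per reduction step (9 gate full adder)
GATES_PER_FULL_ADDER = 9
GATES_PER_HALF_ADDER = 4    # inferred from 248 gates for 62 half adders


def total_delay(pp_delay, reduction_steps, final_adder_delay):
    """Partial product delay + reduction delay + final adder delay, in gate delays."""
    return pp_delay + reduction_steps * FULL_ADDER_DELAY + final_adder_delay


hybrid_32 = total_delay(3, 6, 14)
dadda_32 = total_delay(1, 8, 14)
hybrid_64 = total_delay(3, 8, 18)
dadda_64 = total_delay(1, 10, 18)

PRINTED_MUX_GATES = 5372    # mux total as printed in Section 2.2
hybrid_gates = (32 + PRINTED_MUX_GATES + 690 * GATES_PER_FULL_ADDER
                + 62 * GATES_PER_HALF_ADDER + 4)
dadda_gates = 32 * 32 + 899 * GATES_PER_FULL_ADDER + 31 * GATES_PER_HALF_ADDER + 4

# The three mux lines of Section 2.2 evaluated literally.
literal_mux_gates = 16 * 32 * 10 + 1 * 31 * 10 + 17 * 1 * 13

if __name__ == "__main__":
    print("32 bit delay:", hybrid_32, "vs", dadda_32)
    print("64 bit delay:", hybrid_64, "vs", dadda_64)
    print("gates:", hybrid_gates, "vs", dadda_gates)
    print("complexity increase: %.1f%%" % (100 * (hybrid_gates / dadda_gates - 1)))
    print("delay decrease: %.1f%%" % (100 * (1 - hybrid_32 / dadda_32)))
    print("literal mux gate sum:", literal_mux_gates)
```

With the printed mux total, the ratios round to the 28% and 16% quoted in Section 2.5.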
References

[1] A. D. Booth, "A Signed Binary Multiplication Technique," Quarterly Journal of Mechanics and Applied Mathematics, vol. 4, pp. 236-240, 1951. Reprinted in [8] pp. 100-104.
[2] H. Sam and A. Gupta, "A Generalized Multibit Recoding of Two's Complement Binary Numbers and its Proof with Applications in Multiplier Implementations," IEEE Transactions on Computers, vol. 39, pp. 1006-1015, 1990.
[3] O. L. MacSorley, "High Speed Arithmetic in Binary Computers," Proc. IRE, vol. 49, pp. 67-91, 1961. Reprinted in [8] pp. 14-38.
[4] C. S. Wallace, "A Suggestion for a Fast Multiplier," IEEE Transactions on Electronic Computers, vol. EC-13, pp. 14-17, 1964. Reprinted in [8] pp. 114-117.
[5] L. Dadda, "Some Schemes for Parallel Multipliers," Alta Freq., vol. 34, pp. 349-356, 1965. Reprinted in [8] pp. 118-125.
[6] S. Vassiliadis, E. M. Schwartz, and D. J. Hanrahan, "A General Proof for Overlapped Multiple-Bit Scanning Multiplication," IEEE Transactions on Computers, vol. 38, pp. 172-183, 1989.
[7] A. Weinberger and J. L. Smith, "A Logic for High-Speed Addition," National Bureau of Standards Circular 591, 1958, pp. 3-12. Reprinted in [8] pp. 47-56.
[8] E. E. Swartzlander, Jr., ed., Computer Arithmetic, vol. 1, Los Alamitos, CA: IEEE Computer Society Press, 1990.
