Array Vs Tree2
Spring 2008
Outline
Survey Objectives
Basic Multiplication Schemes
Shift/Add Multiplication Algorithm
Basic Hardware Multiplier
High-Radix Multipliers
Using Carry-Save Adders
Full Tree Multipliers
Alternative Reduction Trees
Tree Multipliers for Signed Numbers
Divide and Conquer Design
Array Multipliers
Additive Multiply Modules
Pipelined Tree and Array Multipliers
Bit-Serial Multipliers
Modular Multipliers
Squaring
Variation in Multipliers
Conclusion
Each row corresponds to the product of the multiplicand a and a single bit of the multiplier, so each row is either 0 or a.
Binary multiplication therefore reduces to adding a set of numbers, each of which is 0 or a shifted version of the multiplicand a.
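The row-by-row view above can be sketched in a few lines of Python. This is a minimal illustration, assuming unsigned operands and a k-bit multiplier; the function name `shift_add_multiply` and its parameters are hypothetical and not taken from the slides.

```python
def shift_add_multiply(a, x, k):
    """Multiply unsigned a by the unsigned k-bit multiplier x, one bit at a time."""
    product = 0
    for i in range(k):
        bit = (x >> i) & 1                 # i-th multiplier bit
        row = (a << i) if bit else 0       # row i: 0 or the multiplicand shifted by i
        product += row                     # accumulate the partial product
    return product


print(shift_add_multiply(10, 11, 4))
```

Each loop iteration corresponds to one row of the partial-product array, so a k-bit multiplier takes k additions.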
In practice, the required subtraction is performed by adding the 2's-complement of the multiplicand, or by adding its 1's-complement and inserting a carry-in of 1 into the adder.
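The second option can be checked with a small sketch. It assumes a fixed adder width and treats all values as bit patterns of that width; the name `subtract_via_complement` and the `width` parameter are illustrative assumptions.

```python
def subtract_via_complement(p, a, width):
    """Compute p - a (modulo 2**width) as p + (1's-complement of a) + carry-in of 1."""
    mask = (1 << width) - 1
    ones_complement = ~a & mask            # invert every bit of a within the word
    carry_in = 1                           # the inserted carry-in completes the 2's-complement
    return (p + ones_complement + carry_in) & mask


print(subtract_via_complement(9, 5, 8))
```

A negative result appears as its 2's-complement bit pattern, which is what the adder hardware would produce.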
A higher representation radix leads to fewer digits. Thus, a digit-at-a-time multiplication algorithm requires fewer cycles as we move to higher radices, which means fewer partial products. The reduction in the number of cycles, along with the use of recoding and carry-save adders, leads to significant gains in speed over basic multipliers.
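The effect of the radix on the cycle count can be sketched as follows. This is an illustration only: `multiply_by_digits` and `digit_bits` are hypothetical names, operands are assumed unsigned, and the product `digit * a` stands in for whatever multiple-selection hardware a real design would use.

```python
def multiply_by_digits(a, x, k, digit_bits):
    """Multiply a by the k-bit multiplier x, consuming digit_bits bits per cycle.

    Returns (product, cycles). digit_bits=1 is radix 2, digit_bits=2 is radix 4.
    """
    digit_mask = (1 << digit_bits) - 1
    product = 0
    cycles = 0
    for shift in range(0, k, digit_bits):
        digit = (x >> shift) & digit_mask      # next multiplier digit
        product += (digit * a) << shift        # one partial product per digit
        cycles += 1
    return product, cycles


print(multiply_by_digits(13, 11, 4, 1))
print(multiply_by_digits(13, 11, 4, 2))
```

Moving from radix 2 to radix 4 halves the number of partial products for the same operands.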
x_{i+1}  x_i  x_{i-1}  y_{i+1}  y_i  Explanation
   0      0      0        0       0   No string of 1s in sight
   0      0      1        0       1   End of a string of 1s
   0      1      0        1      -1   Isolated 1 in x
   0      1      1        1       0   End of a string of 1s
   1      0      0       -1       0   Beginning of a string of 1s
   1      0      1       -1       1   End one string, begin new string
   1      1      0        0      -1   Beginning of a string of 1s
   1      1      1        0       0   Continuation of a string of 1s
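The recoding in the table can be sketched in Python. This assumes the usual Booth relation y_i = x_{i-1} - x_i with x_{-1} = 0, so that each digit pair combines into one radix-4 digit 2*y_{i+1} + y_i = -2*x_{i+1} + x_i + x_{i-1}. The function name and the requirement that k be even are assumptions of this sketch.

```python
def booth_radix4_digits(x, k):
    """Recode the k-bit 2's-complement pattern x (k even) into radix-4 digits in [-2, 2].

    Digits are returned least significant first.
    """
    digits = []
    prev = 0                                   # x_{-1} is taken as 0
    for i in range(0, k, 2):
        x_i = (x >> i) & 1
        x_ip1 = (x >> (i + 1)) & 1
        digits.append(-2 * x_ip1 + x_i + prev) # 2*y_{i+1} + y_i
        prev = x_ip1                           # becomes x_{i-1} for the next pair
    return digits


print(booth_radix4_digits(0b0110, 4))
print(booth_radix4_digits(0b1010, 4))
```

Weighting the digits by powers of 4 and summing gives back the signed value of x, which is a quick way to check a recoder.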
non0: 1 bit to distinguish 0 from nonzero digits
neg: 1 bit to show the sign of a nonzero digit
two: 1 bit to show the magnitude of a nonzero digit
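One possible reading of this three-bit encoding is sketched below for digits in [-2, 2]. The helper names `encode_digit` and `select_multiple` are hypothetical, and the choice that two = 1 means magnitude 2 is an assumption of this sketch.

```python
def encode_digit(z):
    """Encode a radix-4 digit z in [-2, 2] as the bits (non0, neg, two)."""
    non0 = int(z != 0)          # 1 for any nonzero digit
    neg = int(z < 0)            # sign of a nonzero digit
    two = int(abs(z) == 2)      # magnitude: 1 means 2, 0 means 1
    return non0, neg, two


def select_multiple(a, non0, neg, two):
    """Form the multiple of a selected by the three control bits."""
    if not non0:
        return 0
    magnitude = (a << 1) if two else a     # 2a is just a shifted left by one
    return -magnitude if neg else magnitude


for z in (-2, -1, 0, 1, 2):
    print(z, encode_digit(z), select_multiple(7, *encode_digit(z)))
```

With this encoding the multiple-selection logic needs only a shift, a conditional complement, and a gate to force zero.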
Reduction Tree
A logarithmic-depth reduction tree based on CSAs has an irregular structure that makes its design and layout quite difficult. Additionally, connections and signal paths of varying lengths lead to logic hazards and signal skew that have implications for both performance and power consumption. Compared to a generic CSA tree, the only modification required is the relative shifting of the operands to be added.
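A word-level sketch of such a reduction tree follows. It models each CSA as a 3-to-2 reduction on Python integers and groups operands three at a time per level; the grouping policy and the names `carry_save_add` and `reduce_with_csa_tree` are assumptions of this sketch, not the layout used in any particular design.

```python
def carry_save_add(x, y, z):
    """3-to-2 reduction: returns (sum_word, carry_word) with sum + carry == x + y + z."""
    sum_word = x ^ y ^ z
    carry_word = ((x & y) | (x & z) | (y & z)) << 1
    return sum_word, carry_word


def reduce_with_csa_tree(operands):
    """Reduce non-negative operands to two words; returns (sum, carry, levels)."""
    ops = list(operands)
    levels = 0
    while len(ops) > 2:
        full = len(ops) - len(ops) % 3         # operands that fit into whole CSAs
        next_ops = []
        for i in range(0, full, 3):
            next_ops.extend(carry_save_add(ops[i], ops[i + 1], ops[i + 2]))
        next_ops.extend(ops[full:])            # leftovers pass to the next level
        ops = next_ops
        levels += 1
    while len(ops) < 2:
        ops.append(0)
    return ops[0], ops[1], levels


a, x, k = 201, 173, 8
# Partial products: the multiplicand shifted to the position of each 1 bit of x.
rows = [(a << i) if (x >> i) & 1 else 0 for i in range(k)]
s, c, levels = reduce_with_csa_tree(rows)
print(s + c == a * x, levels)
```

The final addition `s + c` stands for the fast carry-propagate adder that follows the tree in a full tree multiplier.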
Array Multipliers
A tree multiplier with a one-sided reduction tree and a ripple-carry final adder is called an array multiplier. An array multiplier is very regular in its structure and uses only short wires that go from one FA to an adjacent FA. It has a very simple and efficient layout in VLSI and can be easily and efficiently pipelined.
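The definition above can be mirrored in a short behavioral sketch: partial products are absorbed one row at a time by a chain of carry-save steps (the one-sided tree), and the final two words are added by a bit-by-bit ripple-carry adder built from full adders. Operands are assumed unsigned, and all names here are illustrative.

```python
def full_adder(a, b, cin):
    """One-bit full adder (FA): returns (sum, carry_out)."""
    return a ^ b ^ cin, (a & b) | (a & cin) | (b & cin)


def ripple_carry_add(x, y, width):
    """Add two words bit by bit, chaining the carry from right to left."""
    result, carry = 0, 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result


def array_multiply(a, x, k):
    """Unsigned k x k multiplication with a one-sided carry-save chain."""
    sum_word, carry_word = 0, 0
    for i in range(k):
        row = (a << i) if (x >> i) & 1 else 0
        # One row of the array: a 3-to-2 carry-save step.
        new_sum = sum_word ^ carry_word ^ row
        new_carry = ((sum_word & carry_word) | (sum_word & row) | (carry_word & row)) << 1
        sum_word, carry_word = new_sum, new_carry
    return ripple_carry_add(sum_word, carry_word, 2 * k)


print(array_multiply(13, 11, 4))
```

Because each row depends only on the row directly above it, the structure is regular, at the price of a depth that grows linearly with k rather than logarithmically.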
Sum outputs are connected diagonally, while the carry outputs are linked vertically, except in the last row, where they are chained from right to left. The Baugh-Wooley method can be easily applied to an array multiplier for 2's-complement multiplication.
31
32
33
34
36
37
Modular Multipliers
A modular multiplier is one that produces the product of two (unsigned) integers modulo some fixed constant m. The two special cases of m = 2^b and m = 2^b - 1 are simpler to deal with. If the partial products are accumulated through carry-save addition:
for m = 2^b, the output carry in position b-1 is ignored
for m = 2^b - 1, the carry out of position b-1 is combined with bits in column 0
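The two special cases can be sketched at the word level. For brevity this uses ordinary integer addition rather than carry-save addition, and assumes both operands are b-bit unsigned values; the function names are hypothetical.

```python
def mul_mod_pow2(a, x, b):
    """Product modulo 2**b: anything carried out of position b-1 is dropped."""
    mask = (1 << b) - 1
    acc = 0
    for i in range(b):
        if (x >> i) & 1:
            acc = (acc + (a << i)) & mask
    return acc


def mul_mod_pow2_minus1(a, x, b):
    """Product modulo 2**b - 1: bits leaving position b-1 re-enter at column 0."""
    m = (1 << b) - 1
    acc = 0
    for i in range(b):
        if (x >> i) & 1:
            # Since 2**b = 1 mod m, shifting left by i is a b-bit rotation.
            rotated = ((a << i) | (a >> (b - i))) & m
            acc += rotated
            acc = (acc & m) + (acc >> b)       # end-around carry into column 0
    return 0 if acc == m else acc              # the all-1s pattern also represents 0


print(mul_mod_pow2(13, 11, 4), mul_mod_pow2_minus1(13, 11, 4))
```

The wraparound in the second function relies on 2^b being congruent to 1 modulo 2^b - 1, which is why no separate reduction step is needed.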
Similar techniques can be used to handle modular multiplication in the general case. As an example, a modulo-13 multiplier can be designed by using the identities:
16 = 3 mod 13 (3 = 2 + 1)
32 = 6 mod 13 (6 = 4 + 2)
64 = 12 mod 13 (12 = 8 + 4)
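A sketch of how these identities fold the high-order columns of a 4 x 4 partial-product array back onto columns 0 to 3 is given below. The table `FOLD` and the function names are assumptions for illustration, and the final reduction of the small folded sum is done with Python's `%` operator rather than modeled in hardware.

```python
# Column position -> lower columns that replace it, from
# 16 = 2 + 1, 32 = 4 + 2, 64 = 8 + 4 (all modulo 13).
FOLD = {4: (1, 0), 5: (2, 1), 6: (3, 2)}


def folded_sum_mod13(a, x):
    """Sum the partial-product bits of two 4-bit operands after folding."""
    total = 0
    for i in range(4):
        for j in range(4):
            if (a >> i) & 1 and (x >> j) & 1:
                pos = i + j
                if pos in FOLD:
                    for p in FOLD[pos]:        # one bit becomes two low-order bits
                        total += 1 << p
                else:
                    total += 1 << pos
    return total


def mul_mod13(a, x):
    """Modulo-13 product; the folded sum is congruent to a * x modulo 13."""
    return folded_sum_mod13(a, x) % 13


print(mul_mod13(7, 9))
```

After folding, every bit sits in columns 0 to 3, so the remaining work is a small multi-operand addition followed by a final correction.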
Squaring
Any standard or modular multiplier can be used for computing p = x^2 if both inputs are connected to x. A special-purpose k-bit squarer, if built in hardware, will be significantly lower in cost and delay than a k×k multiplier.
x_i x_i = x_i
x_i x_j + x_j x_i = 2 x_i x_j
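The two identities shrink the partial-product array, as the following sketch shows. It assumes an unsigned k-bit input; `square_reduced` is a hypothetical name, and the term count it returns is just a rough measure of array size.

```python
def square_reduced(x, k):
    """Square the unsigned k-bit value x using the simplified bit matrix.

    Returns (square, terms), where terms counts the partial-product bits used.
    """
    total = 0
    terms = 0
    for i in range(k):
        x_i = (x >> i) & 1
        total += x_i << (2 * i)                    # x_i * x_i = x_i, weight 2^(2i)
        terms += 1
        for j in range(i + 1, k):
            x_j = (x >> j) & 1
            total += (x_i & x_j) << (i + j + 1)    # 2 * x_i * x_j moves up one column
            terms += 1
    return total, terms


print(square_reduced(13, 4))
```

For k = 4 this uses 10 partial-product bits instead of the 16 in a general 4 x 4 multiplier, and in general k(k+1)/2 instead of k^2.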
Conclusion
The classic shift/add multiplication schemes and their implementation have been examined. There are two ways to speed up the underlying multi-operand addition: reducing the number of operands leads to high-radix multipliers, and devising hardware multi-operand adders that minimize the latency and/or maximize the throughput leads to tree and array multipliers. Cost, VLSI area, and pin limitations favor bit-serial designs, while the desire to use available building blocks leads to designs based on Additive Multiply Modules (AMMs). Finally, the special case of squaring was of interest, as it leads to considerable simplification.