Array Vs Tree2

The document discusses various algorithms and hardware designs for multiplication. It begins with basic multiplication schemes like shift-add and discusses optimizations like high-radix multiplication using Booth's algorithm and modified Booth's recoding. It also describes how tree and array multipliers can be implemented using carry-save adders to efficiently reduce partial products. The goal is to discuss approaches to speed up multiplication, which is a fundamental arithmetic operation.

Uploaded by

amulya_mallesh
Copyright
© Attribution Non-Commercial (BY-NC)

RESEARCH CENTRE FOR INTEGRATED MICROSYSTEMS - UNIVERSITY OF WINDSOR

Multipliers, Algorithms, and Hardware Designs

Mahzad Azarmehr
Supervisor: Dr. M. Ahmadi

Spring 2008


Outline
Survey Objectives
Basic Multiplication Schemes
  Shift/Add Multiplication Algorithm
  Basic Hardware Multiplier
High-Radix Multipliers
  Multiplication of Signed Numbers
  Radix-4 Multiplication
  Modified Booth's Recoding
  Using Carry-Save Adders
  High-Radix Multipliers
Tree and Array Multipliers
  Full-Tree Multipliers
  Alternative Reduction Trees
  Tree Multipliers for Signed Numbers
  Array Multipliers
  Pipelined Tree and Array Multipliers
Variation in Multipliers
  Divide and Conquer Design
  Additive Multiply Modules
  Bit-Serial Multipliers
  Modular Multipliers
  Squaring
Conclusion


Survey Objectives

Multiplication is a heavily used arithmetic operation that figures prominently in signal processing and scientific applications
Multiplication is hardware intensive, and the main criteria of interest are higher speed, lower cost, and less VLSI area
The main concern in classic multiplication, often realized by k cycles of shifting and adding, is to speed up the underlying multi-operand addition of partial products
In this survey, a variety of multiplication algorithms and hardware designs are discussed

Shift/Add Multiplication Algorithm


With the following notation:
a   Multiplicand   $a_{k-1} a_{k-2} \cdots a_1 a_0$
x   Multiplier     $x_{k-1} x_{k-2} \cdots x_1 x_0$
p   Product        $p_{2k-1} p_{2k-2} \cdots p_1 p_0$

Each row corresponds to the product of the multiplicand and a single bit of the multiplier. Each term is either 0 or a

Binary multiplication reduces to adding a set of numbers, each of which is 0 or a shifted version of the multiplicand a


Shift/Add Multiplication Algorithm


Sequential multiplication can be done by keeping a cumulative partial product (initialized to 0) and successively adding to it the properly shifted terms $x_j a$:
$p^{(j+1)} = (p^{(j)} + x_j a 2^k) 2^{-1}$, with $p^{(0)} = 0$ and $p^{(k)} = p$
Instead of shifting successive terms to the left for alignment, the cumulative partial product is shifted by one bit to the right
The product will have a total shift of k bits to the right, so we pre-multiply a by $2^k$ to offset this effect
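The right-shift recurrence above can be sketched in a few lines of Python. This is only an illustrative software model for unsigned operands; the function name `shift_add_multiply` and its parameters are assumptions made for this example, not part of the original slides.

```python
def shift_add_multiply(a: int, x: int, k: int) -> int:
    """Multiply two unsigned k-bit integers with the right-shift recurrence."""
    p = 0  # cumulative partial product p^(0)
    for j in range(k):
        x_j = (x >> j) & 1  # next multiplier bit, LSB first
        # p^(j+1) = (p^(j) + x_j * a * 2^k) * 2^-1
        p = (p + x_j * (a << k)) >> 1
    return p  # p^(k) is the 2k-bit product


print(shift_add_multiply(10, 11, 4))
```

Because a is pre-multiplied by $2^k$, the value being halved in each step is always even, so the right shift never discards a nonzero bit.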

Basic Hardware Multiplier


x and p are stored in shift registers
The next bit of x is used to select 0 or a for addition
Shifting can be performed by connecting the i-th sum output to the (k+i-1)-th bit of the partial product register and the adder's carry-out to bit 2k-1
x and the lower half of p can share the same register

Multiplication of Signed Numbers


In signed-magnitude numbers, the product's sign should be computed separately by XORing the operand signs
In 2's-complement representation:
  Negative multiplicand: the same routine is used, with sign-extended values
  Negative multiplier: the term $x_{k-1} a$ should be subtracted rather than added in the last cycle

In practice, the required subtraction is performed by adding the 2's-complement of the multiplicand, or by adding its 1's-complement and inserting a carry-in of 1 into the adder
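As a rough illustration of the 2's-complement rule, the sketch below treats the inputs as k-bit patterns and subtracts the last term instead of adding it. Python's unbounded integers stand in for sign extension; the names `twos_complement_multiply` and `to_signed` are hypothetical helpers introduced for this example.

```python
def to_signed(v: int, bits: int) -> int:
    """Interpret a bits-wide pattern as a 2's-complement value."""
    return v - (1 << bits) if v & (1 << (bits - 1)) else v


def twos_complement_multiply(a: int, x: int, k: int) -> int:
    """a and x are k-bit patterns (0 <= a, x < 2**k); returns the signed product."""
    a_val = to_signed(a, k)  # models the sign-extended multiplicand
    p = 0
    for j in range(k):
        x_j = (x >> j) & 1
        term = x_j * a_val * (1 << j)
        if j == k - 1:
            p -= term  # the sign bit of x has weight -2^(k-1): subtract
        else:
            p += term
    return p


# 13 is the 4-bit pattern for -3
print(twos_complement_multiply(13, 5, 4), twos_complement_multiply(13, 13, 4))
```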

Multiplication of Signed Numbers


Examples of 2's-complement multiplications:


Multiplication Using Booth's Recoding

The more 1s there are in x, the slower the multiplication
In Booth's recoding, every sequence of 1s is replaced with a sequence of 0s, a -1 at the least significant end, and an addition of 1 in the next higher position:
$2^j + 2^{j-1} + \cdots + 2^{i+1} + 2^i = 2^{j+1} - 2^i$

$x_i$   $x_{i-1}$   $y_i$   Explanation
0       0           0       No string of 1s in sight
0       1           1       End of string of 1s
1       0           -1      Beginning of string of 1s
1       1           0       Continuation of string of 1s
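The recoding table is equivalent to $y_i = x_{i-1} - x_i$ with $x_{-1} = 0$. A minimal sketch follows, assuming an unsigned k-bit multiplier, so that one extra digit $y_k$ is produced to hold the "addition of 1 in the next higher position"; the function name `booth_recode` is an assumption for this example.

```python
def booth_recode(x: int, k: int) -> list[int]:
    """Radix-2 Booth digits y_0..y_k (LSB first) for an unsigned k-bit x."""
    digits = []
    prev = 0  # x_{-1} = 0
    for i in range(k + 1):  # x_k is taken as 0 for an unsigned operand
        x_i = (x >> i) & 1
        digits.append(prev - x_i)  # y_i = x_{i-1} - x_i, per the table
        prev = x_i
    return digits


# 0111 (seven) becomes +8 - 1: a single string of 1s costs two nonzero digits
print(booth_recode(0b0111, 4))
```

For a 2's-complement multiplier the extra digit $y_k$ would simply be dropped, since the sign bit already carries negative weight.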


High-Radix Multipliers

These multiplication schemes handle more than one bit of the multiplier in each cycle

A higher representation radix leads to fewer digits. Thus, a digit-at-a-time multiplication algorithm requires fewer cycles as we move to higher radices, which means fewer partial products
The reduction in the number of cycles, along with the use of recoding and carry-save adders, leads to significant gains in speed over basic multipliers


Radix-4 Multipliers

Based on the two least significant bits of the multiplier, a pre-computed multiple of a (0, a, 2a, or 3a) is added
Alternatively, rather than adding 3a, add -a and send a carry of 1 into the next radix-4 digit of the multiplier
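The carry-based alternative can be modeled as follows. This is a sketch for unsigned operands with an even bit count k; the function name `radix4_multiply` is made up for illustration, and the accumulation uses ordinary integer addition rather than a hardware adder.

```python
def radix4_multiply(a: int, x: int, k: int) -> int:
    """Unsigned radix-4 multiplication that never forms 3a (k must be even)."""
    p = 0
    carry = 0
    for j in range(0, k, 2):
        digit = ((x >> j) & 0b11) + carry  # radix-4 digit plus incoming carry: 0..4
        if digit == 3:
            multiple, carry = -a, 1  # 3a = 4a - a: add -a, carry 1 to next digit
        elif digit == 4:
            multiple, carry = 0, 1  # 4a is handled entirely by the next digit
        else:
            multiple, carry = digit * a, 0  # 0, a, or 2a (a shift)
        p += multiple * (1 << j)
    p += (carry * a) << k  # a final carry acts as one extra digit
    return p


print(radix4_multiply(13, 15, 4))
```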


Modified Booth's Recoding

If radix-4 multiplication is performed with the recoded multiplier, only the multiples ±a and ±2a will be required, all of which are easily obtained by shifting and/or complementation

$x_{i+1}$   $x_i$   $x_{i-1}$   $y_{i+1}$   $y_i$   Explanation
0           0       0           0           0       No string of 1s in sight
0           0       1           0           1       End of a string of 1s
0           1       0           1           -1      Isolated 1 in x
0           1       1           1           0       End of a string of 1s
1           0       0           -1          0       Beginning of a string of 1s
1           0       1           -1          1       End one string, begin new string
1           1       0           0           -1      Beginning of a string of 1s
1           1       1           0           0       Continuation of string of 1s
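Combining each pair of radix-2 digits as $2 y_{i+1} + y_i$ gives a radix-4 digit in {-2, -1, 0, 1, 2}, which works out to $x_i + x_{i-1} - 2 x_{i+1}$. The sketch below assumes a k-bit 2's-complement multiplier with k even; the function name `booth_radix4_digits` is an assumption introduced for this example.

```python
def booth_radix4_digits(x: int, k: int) -> list[int]:
    """Radix-4 Booth digits (LSB first) of a k-bit 2's-complement pattern x."""
    digits = []
    for i in range(0, k, 2):
        x_im1 = (x >> (i - 1)) & 1 if i > 0 else 0  # x_{-1} = 0
        x_i = (x >> i) & 1
        x_ip1 = (x >> (i + 1)) & 1
        # 2*y_{i+1} + y_i with y_i = x_{i-1} - x_i and y_{i+1} = x_i - x_{i+1}
        digits.append(x_i + x_im1 - 2 * x_ip1)
    return digits


# 011011 (27) recodes to 2*16 - 1*4 - 1*1
print(booth_radix4_digits(0b011011, 6))
```

Each digit depends only on three adjacent multiplier bits, so all digits can be produced at once.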


Radix-4 Multipliers

Booth's recoding is fully parallel and carry-free

non0: 1 bit to distinguish 0 from nonzero digits
neg: 1 bit to show the sign of a nonzero digit
two: 1 bit to show the magnitude of a nonzero digit


Using Carry-Save Adders

Carry-save adders (CSAs) can be used to reduce the number of addition cycles as well as to make each cycle faster
A row of binary full adders (FAs) is used as a mechanism to reduce three numbers to two numbers, rather than finding a single sum
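A bitwise software model makes the three-to-two reduction concrete. In this sketch each bit position acts as an independent full adder; the function name `carry_save_add` is an assumption for illustration.

```python
def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    """Reduce three numbers to a (sum, carry) pair with the same total."""
    s = x ^ y ^ z  # per-bit sum output of each full adder
    c = ((x & y) | (x & z) | (y & z)) << 1  # per-bit carry, weighted one position up
    return s, c


s, c = carry_save_add(9, 14, 7)
print(s, c, s + c)
```

No carry travels between bit positions, which is why the delay of one CSA level is a single full-adder delay regardless of operand width.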


Wallace and Dadda trees

Wallace's strategy is to combine the partial product bits at the earliest opportunity, which leads to the fastest possible design
With Dadda's method, combining takes place as late as possible and usually leads to a simpler CSA tree and a wider CPA


Using Carry-Save Adders

A carry-save adder tree can reduce n binary numbers to two numbers having the same sum in O(log n) levels
As an example, this CSA tree reduces seven k-bit operands to two (k+2)-bit operands
Not all the operands necessarily have the same alignment
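The level count can be checked with a small simulation that groups operands in threes at each level and passes leftovers through. This greedy grouping is one possible arrangement, not necessarily the tree drawn on the slide; `csa` and `reduce_to_two` are names assumed for this sketch.

```python
def csa(x: int, y: int, z: int) -> tuple[int, int]:
    """One carry-save adder level: three numbers in, two out, same total."""
    return x ^ y ^ z, ((x & y) | (x & z) | (y & z)) << 1


def reduce_to_two(operands: list[int]) -> tuple[int, int, int]:
    """Reduce operands to two numbers; also return the number of CSA levels used."""
    ops = list(operands)
    levels = 0
    while len(ops) > 2:
        full = len(ops) // 3 * 3  # operands that fit into complete groups of three
        nxt = []
        for i in range(0, full, 3):
            nxt.extend(csa(ops[i], ops[i + 1], ops[i + 2]))
        nxt.extend(ops[full:])  # one or two leftovers wait for the next level
        ops = nxt
        levels += 1
    ops += [0] * (2 - len(ops))  # pad if fewer than two operands were given
    return ops[0], ops[1], levels


a, b, levels = reduce_to_two([3, 5, 7, 9, 11, 13, 15])
print(a + b, levels)
```

For seven operands the operand count shrinks 7, 5, 4, 3, 2, so four CSA levels are needed before the final carry-propagate addition.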


Using Carry-Save Adders

Radix-4 multiplication without Booth's recoding can be implemented by using a CSA to handle the 3a multiple
The drawback is that the add time is slightly increased, since the CSA overhead is paid in every cycle, regardless of whether 3a is actually needed


Using Carry-Save Adders

CSAs can be put to better use for reducing the addition time by keeping the cumulative partial product in stored-carry form
As the three values that form the next cumulative partial product are added, one bit of the final product is obtained and shifted into the lower half of the register
This eliminates the need for carry propagation in all but the final addition


Using Carry-Save Adders

The previous CSA-based design can be combined with radix-4 Booth's recoding to reduce the number of cycles by 50%, while also making each cycle considerably faster


Using Carry-Save Adders

In the Booth recoding logic and multiple selection circuit, the sign of each multiple must be incorporated in the multiple itself, rather than as a signal that controls addition/subtraction
This configuration can be used for high-radix and parallel multipliers


Using Carry-Save Adders

This is another way to accommodate the required 3a multiple
Four numbers (the sum and carry components of the cumulative partial product, $x_i a$, and $2 x_{i+1} a$) need to be combined, thus necessitating a two-level CSA tree


High-Radix Multipliers

Now, it is an easy step to visualize a higher-radix multiplier:
In radix-$2^b$ multiplication with Booth's recoding, we have to reduce b/2 multiples to 2 using a (b/2 + 2)-input CSA tree whose other two inputs are taken by the carry-save partial product
Without Booth's recoding, a (b + 2)-input CSA tree would be needed


Tree and Array Multipliers


Tree, or fully parallel, multipliers constitute limiting cases of high-radix multipliers (radix-$2^k$)
With a high-performance CSA tree followed by a fast adder, logarithmic-time multiplication becomes possible
The resulting multipliers are expensive, but justifiable, for applications in which multiplication speed is critical
One-sided CSA trees lead to much slower, but highly regular, structures known as array multipliers that offer higher pipelined throughput than tree multipliers and significantly lower chip area


Full-Tree Multipliers

In full-tree multipliers, all k multiples of the multiplicand are produced at once and a k-input CSA tree is used
All the multiples are combined in one pass; the tree does not require feedback links, making pipelining quite feasible


Reduction Tree
A logarithmic-depth reduction tree based on CSAs has an irregular structure that makes its design and layout quite difficult
Additionally, connections and signal paths of varying lengths lead to logic hazards and signal skew that have implications for both performance and power consumption
Compared to a generic CSA tree, the only modification required is relative shifting of the operands to be added


Reduction Tree


Alternative Reduction Trees


A slice of an (n; 2) counter, when suitably replicated, can perform the function of the reduction tree
Using counters assures us that all outputs are produced after the same number of full-adder delays
The structure can be replicated to form an n-input reduction tree of the desired width. Such balanced-delay trees are quite suitable for VLSI implementation of parallel multipliers

Alternative Reduction Trees


Another alternative is to use a module that reduces four numbers to two as the basic building block
Partial-product reduction trees can then be structured as binary trees that possess a recursive structure, making them more regular and easier to lay out


Tree multipliers for signed numbers


In multiplying 2's-complement numbers directly, partial products are signed numbers
To avoid having to deal with negatively weighted bits, an efficient method is offered by Baugh and Wooley
[Figure: Baugh-Wooley rearrangement of the negatively weighted partial-product bits]


Array Multipliers
A tree multiplier with a one-sided reduction tree and a ripple-carry final adder is called an array multiplier
An array multiplier is very regular in its structure and uses only short wires that go from one FA to an adjacent FA
It has a very simple and efficient layout in VLSI and can be easily and efficiently pipelined

Array Multipliers
Sum outputs are connected diagonally, while the carry outputs are linked vertically, except in the last row, where they are chained from right to left
The Baugh and Wooley method can be easily applied to an array multiplier for 2's-complement multiplication


Pipelined Tree and Array Multipliers


The $x_i$ inputs are delayed through the insertion of latches in their paths, and the product emerges with a latency of 2k-1 cycles
The FA blocks used are assumed to have output latches for both sum and carry
The final ripple-carry adder has been pipelined as well


Divide and Conquer Design


A 2b×2b multiplier can be synthesized using b×b multipliers
Although there are four partial products, only three values need to be added
The 2b×2b multiplication has been reduced to four b×b multiplications and a three-operand addition
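The reason only three values need to be added is that the high-by-high and low-by-low products occupy disjoint bit ranges and can simply be concatenated. The sketch below models this for unsigned operands; `multiply_2b` and the `mul_bxb` parameter (standing in for a b×b hardware multiplier) are hypothetical names for this example.

```python
def multiply_2b(a: int, x: int, b: int, mul_bxb=lambda u, v: u * v) -> int:
    """Build a 2b x 2b unsigned multiplication from four b x b multiplications."""
    mask = (1 << b) - 1
    a_h, a_l = a >> b, a & mask
    x_h, x_l = x >> b, x & mask
    hh = mul_bxb(a_h, x_h)
    hl = mul_bxb(a_h, x_l)
    lh = mul_bxb(a_l, x_h)
    ll = mul_bxb(a_l, x_l)
    # hh * 2^(2b) and ll never overlap (ll < 2^(2b)), so they form one operand
    outer = (hh << (2 * b)) | ll
    # three-operand addition: outer + two middle products aligned at bit b
    return outer + (hl << b) + (lh << b)


print(multiply_2b(0xAB, 0xCD, 4))
```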


Divide and Conquer Design


For 2b×2b multiplication, one can use b-bit adders exclusively to accumulate the partial products


Additive Multiply Modules (AMMs)


In certain computations, multiplications are commonly followed by additions. In such cases, implementing a multiply-add unit to compute p = ax + y might be cost-effective
Furthermore, AMMs can be used as building blocks for multipliers
In a b×c AMM: $(2^b - 1)(2^c - 1) + (2^b - 1) + (2^c - 1) = 2^{b+c} - 1$
The cost of a 4×2 AMM is less than the combined costs of a 4×2 multiplier and a 4-bit adder
Inputs marked with an asterisk carry 0s
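The identity says that the largest b×c product plus a b-bit and a c-bit additive input still fits in b+c bits. The sketch below checks this numerically; treating the AMM as having two additive inputs `y` and `z` is a reading of the identity rather than something the slide states explicitly, and the function name `amm` is an assumption.

```python
def amm(a: int, x: int, y: int, z: int, b: int, c: int) -> int:
    """Behavioral model of a b x c additive multiply module: a*x + y + z."""
    assert 0 <= a < 2 ** b and 0 <= y < 2 ** b  # b-bit inputs
    assert 0 <= x < 2 ** c and 0 <= z < 2 ** c  # c-bit inputs
    p = a * x + y + z
    assert p < 2 ** (b + c)  # the result never overflows b+c bits
    return p


# worst case for a 4 x 2 module: all inputs at their maximum
print(amm(15, 3, 15, 3, 4, 2))
```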


Bit-Serial Multipliers

Bit-serial arithmetic is attractive in view of its smaller pin count, reduced wire length, and lower floor space requirements in VLSI
The compactness of the design may allow a bit-serial multiplier to run at a high enough clock rate to make it competitive with much more complex designs with regard to speed


Bit-Serial Multipliers

For a latency-free multiplier, the relationship between the output and the inputs is written in the form of a recurrence:
$a^{(0)} = a_0$, $a^{(1)} = (a_1 a_0)_{two}$, ..., $a^{(i)} = 2^i a_i + a^{(i-1)}$
$p^{(i)} = 2^{-(i+1)} a^{(i)} x^{(i)}$
$2 p^{(i)} = p^{(i-1)} + a_i x^{(i-1)} + x_i a^{(i-1)} + 2^i a_i x_i$

A (5; 3) counter can be used as an adder if $p^{(i-1)}$ is stored in double-carry-save form
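The recurrence can be checked numerically with exact fractions. This is a value-level sketch only: it does not model the (5; 3) counter or the double-carry-save storage, and the function name `bit_serial_product` is an assumption for this example.

```python
from fractions import Fraction


def bit_serial_product(a_bits: list[int], x_bits: list[int]) -> Fraction:
    """Feed bits LSB first; returns p^(k-1) = 2^-k * a * x for k-bit inputs."""
    a_prev = x_prev = 0  # a^(i-1) and x^(i-1), both 0 before the first bit
    p_prev = Fraction(0)  # p^(-1) = 0
    for i, (a_i, x_i) in enumerate(zip(a_bits, x_bits)):
        # 2 p^(i) = p^(i-1) + a_i x^(i-1) + x_i a^(i-1) + 2^i a_i x_i
        two_p = p_prev + a_i * x_prev + x_i * a_prev + (1 << i) * a_i * x_i
        p_prev = two_p / 2
        a_prev += a_i << i  # a^(i) = 2^i a_i + a^(i-1)
        x_prev += x_i << i
    return p_prev


# a = 5 (101), x = 6 (110), k = 3: expect 5 * 6 / 2^3
print(bit_serial_product([1, 0, 1], [0, 1, 1]))
```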


Modular Multipliers
A modular multiplier is one that produces the product of two (unsigned) integers modulo some fixed constant m
The two special cases $m = 2^b$ and $m = 2^b - 1$ are simpler to deal with
If the partial products are accumulated through carry-save addition:
  for $m = 2^b$, the output carry in position b-1 is ignored
  for $m = 2^b - 1$, the carry out of position b-1 is combined with bits in column 0
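Both special cases can be modeled at the word level: dropping everything above bit b-1 for $m = 2^b$, and wrapping it around to column 0 for $m = 2^b - 1$. The sketch uses ordinary addition instead of carry-save accumulation, and the function names are assumptions made for illustration.

```python
def mul_mod_pow2(a: int, x: int, b: int) -> int:
    """Product modulo 2^b: carries out of position b-1 are simply ignored."""
    mask = (1 << b) - 1
    p = 0
    for j in range(b):
        if (x >> j) & 1:
            p = (p + (a << j)) & mask
    return p


def mul_mod_pow2_minus1(a: int, x: int, b: int) -> int:
    """Product modulo 2^b - 1: carries out of position b-1 re-enter at column 0."""
    m = (1 << b) - 1
    p = 0
    for j in range(b):
        if (x >> j) & 1:
            # a * 2^j mod (2^b - 1) is a left rotation of a by j within b bits
            term = ((a << j) & m) | (a >> (b - j))
            p += term
            p = (p & m) + (p >> b)  # end-around carry
    return 0 if p == m else p  # all ones is the second representation of 0


print(mul_mod_pow2(11, 13, 4), mul_mod_pow2_minus1(11, 13, 4))
```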


Modular Multipliers
Similar techniques can be used to handle modular multiplication in the general case
As an example, a modulo-13 multiplier can be designed by using the identities:
16 = 3 mod 13, 32 = 6 mod 13, 64 = 12 mod 13
with 3 = 2 + 1, 6 = 4 + 2, 12 = 8 + 4
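The identities let every partial-product bit of weight 16, 32, or 64 be replaced by two bits of lower weight. The sketch below applies that folding to a 4×4 partial-product array; the closing `% 13` merely stands in for the final addition and correction stage, whose structure the slide does not spell out, and `FOLD` and `mul_mod13` are names assumed for this example.

```python
# weight -> equivalent lower weights, from 16 = 3, 32 = 6, 64 = 12 (mod 13)
FOLD = {16: (2, 1), 32: (4, 2), 64: (8, 4)}


def mul_mod13(a: int, x: int) -> int:
    """Modulo-13 product of two 4-bit operands via partial-product bit folding."""
    total = 0
    for i in range(4):
        for j in range(4):
            if (a >> i) & 1 and (x >> j) & 1:
                w = 1 << (i + j)  # weight of partial-product bit a_i x_j
                total += sum(FOLD[w]) if w in FOLD else w
    return total % 13  # placeholder for the final adder and correction logic


print(mul_mod13(11, 12))
```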


Squaring
Any standard or modular multiplier can be used for computing $p = x^2$ if both inputs are connected to x
A special-purpose k-bit squarer, if built in hardware, will be significantly lower in cost and delay than a k×k multiplier
$x_i x_i = x_i$ and $x_i x_j + x_j x_i = 2 x_i x_j$
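The two simplifications roughly halve the number of partial-product bits: diagonal terms need no AND gate, and each symmetric pair collapses into one bit moved one column to the left. A minimal sketch, with the function name `square` assumed for illustration:

```python
def square(x: int, k: int) -> int:
    """Square an unsigned k-bit number using the reduced partial-product set."""
    p = 0
    for i in range(k):
        x_i = (x >> i) & 1
        p += x_i << (2 * i)  # x_i x_i = x_i, at weight 2^(2i)
        for j in range(i + 1, k):
            x_j = (x >> j) & 1
            # x_i x_j + x_j x_i = 2 x_i x_j: one bit at weight 2^(i+j+1)
            p += (x_i & x_j) << (i + j + 1)
    return p


print(square(13, 4))
```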


Conclusion
The classic shift/add multiplication schemes and their implementation have been examined
There are two ways to speed up the underlying multi-operand addition: reducing the number of operands leads to high-radix multipliers, and devising hardware multi-operand adders that minimize the latency and/or maximize the throughput leads to tree and array multipliers
Cost, VLSI area, and pin limitations favor bit-serial designs, while the desire to use available building blocks leads to designs based on Additive Multiply Modules (AMMs)
Finally, the special case of squaring was of interest, as it leads to considerable simplification

Questions and Comments
