Multiplier
Spring 2018
Digital Design and
Integrated Circuits
Instructors:
John Wawrzynek and Nick Weaver
Lecture 21:
Multiplier Circuits
EE141
Multiplication
                    a3   a2   a1   a0    Multiplicand
              ×     b3   b2   b1   b0    Multiplier
    -----------------------------------
                   a3b0 a2b0 a1b0 a0b0
              a3b1 a2b1 a1b1 a0b1        Partial
         a3b2 a2b2 a1b2 a0b2             products
    a3b3 a2b3 a1b3 a0b3
Control Algorithm:
1. P ← 0, A ← multiplicand, B ← multiplier
2. If LSB of B==1 then add A to P, else add 0
3. Shift [P][B] right 1
4. Repeat steps 2 and 3 n-1 times.
5. [P][B] has product.
• Cost ∝ n, T = n clock cycles.
• What is the critical path for determining the min clock period?
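The control algorithm above can be sketched in software. This is a minimal model, assuming unsigned n-bit operands and an adder whose carry-out is kept as an extra bit of P before the shift; the function name `shift_add_multiply` is made up for illustration and is not from the lecture.

```python
def shift_add_multiply(a: int, b: int, n: int) -> int:
    """Multiply two unsigned n-bit integers with the shift-and-add control algorithm."""
    mask = (1 << n) - 1
    p = 0          # step 1: P <- 0
    a &= mask      #         A <- multiplicand
    b &= mask      #         B <- multiplier
    for _ in range(n):                    # steps 2 and 3, n times in total
        addend = a if (b & 1) else 0      # step 2: test the LSB of B
        p += addend                       # P may briefly need n+1 bits (carry-out)
        # step 3: shift the combined register [P][B] right by one;
        # the LSB of P moves into the MSB of B
        b = ((p & 1) << (n - 1)) | (b >> 1)
        p >>= 1
    return (p << n) | b                   # step 5: [P][B] holds the 2n-bit product


print(shift_add_multiply(3, 5, 4))   # 15
```

Each loop iteration corresponds to one clock cycle, which is where the T = n clock cycles figure comes from.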
“Shift and Add” Multiplier
Signed Multiplication:
Remember: in a 2’s complement number the MSB has negative weight (–2^(N-1) for an N-bit number).
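A small sketch of the negative-weight rule, assuming the number is given as a bit string with the MSB first. The helper name `twos_complement_value` is hypothetical.

```python
def twos_complement_value(bits: str) -> int:
    """Value of an N-bit 2's complement bit string (MSB first)."""
    n = len(bits)
    msb = int(bits[0])
    rest = int(bits[1:], 2) if n > 1 else 0
    # The MSB contributes -2^(N-1); every other bit has its usual positive weight.
    return -msb * (1 << (n - 1)) + rest


print(twos_complement_value("101"))   # -4 + 1 = -3
print(twos_complement_value("110"))   # -4 + 2 = -2
```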
Unsigned Combinational Multiplier
Array Multiplier
Single cycle multiply: Generates all n partial products simultaneously.
Each row: n-bit adder with AND gates
[Figure: array multiplier built from rows of FA and HA cells; inputs x3..x0 and one y bit per row, outputs up to z7.]
Propagation delay ~2N
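To make the row structure concrete, here is a bit-level sketch of an unsigned array multiplier, assuming each row is a ripple adder that adds one AND-ed partial product to the running sum. It models a half adder as a full adder with carry-in 0, and the names `full_adder` and `array_multiply` are illustrative only.

```python
def full_adder(a: int, b: int, cin: int):
    """One FA cell: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


def array_multiply(x: int, y: int, n: int = 4) -> int:
    """Unsigned n x n array multiplier modelled one adder row at a time."""
    xbits = [(x >> j) & 1 for j in range(n)]
    ybits = [(y >> i) & 1 for i in range(n)]
    # Row 0 needs no adder: it is just the first partial product (AND gates).
    acc = [xbits[j] & ybits[0] for j in range(n)] + [0]   # n bits plus carry slot
    z = []
    for i in range(1, n):
        z.append(acc[0])          # the lowest bit of the running sum is final
        upper = acc[1:]           # remaining n bits line up with the next partial product
        pp = [xbits[j] & ybits[i] for j in range(n)]
        carry = 0                 # first cell of a row has no carry-in (acts as an HA)
        row = []
        for j in range(n):
            s, carry = full_adder(upper[j], pp[j], carry)
            row.append(s)
        acc = row + [carry]
    z.extend(acc)                 # last row supplies the top n+1 product bits
    return sum(bit << k for k, bit in enumerate(z))


print(array_multiply(3, 5))   # 15
```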
Carry-Save Addition
• Speeding up multiplication is a matter of speeding up the summing of the partial products.
• “Carry-save” addition can help.
• Carry-save addition passes (saves) the carries to the output, rather than propagating them.
• Example: sum three numbers, 3₁₀ = 0011, 2₁₀ = 0010, 3₁₀ = 0011
        3₁₀    0011
      + 2₁₀    0010
      ------------------- carry-save add
        c      0100 = 4₁₀
        s      0001 = 1₁₀
      + 3₁₀    0011
      ------------------- carry-save add
        c      0010 = 2₁₀
        s      0110 = 6₁₀
      ------------------- carry-propagate add
               1000 = 8₁₀
• In general, carry-save addition takes in 3 numbers and produces 2.
• Whereas, carry-propagate takes 2 and produces 1.
• With this technique, we can avoid carry propagation until final addition.
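The example can be replayed with a bitwise model of a carry-save adder. This is a sketch: `carry_save_add` is a made-up name, and the first step feeds 0 as the third input because the slide only adds two numbers there.

```python
def carry_save_add(x: int, y: int, z: int):
    """3-to-2 reduction: returns (carry, sum) with carry + sum == x + y + z."""
    s = x ^ y ^ z                              # per-bit sum, no carry propagation
    c = ((x & y) | (x & z) | (y & z)) << 1     # per-bit carry, moved up one position
    return c, s


c1, s1 = carry_save_add(0b0011, 0b0010, 0)     # 3 + 2
c2, s2 = carry_save_add(c1, s1, 0b0011)        # ... + 3
total = c2 + s2                                # single carry-propagate add at the end

print(f"{c1:04b} {s1:04b}")   # 0100 0001
print(f"{c2:04b} {s2:04b}")   # 0010 0110
print(f"{total:04b}")         # 1000
```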
Carry-save Circuits
Array Multiplier using Carry-save Addition
[Figure: array of carry-save adder rows; the final row is a fast carry-propagate adder.]
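A word-level sketch of this arrangement, assuming unsigned operands: every partial product is folded into a (carry, sum) pair with a carry-save step, and only the last addition propagates carries. The names `csa` and `csa_multiply` are hypothetical.

```python
def csa(x: int, y: int, z: int):
    """Carry-save step: returns (carry, sum) with carry + sum == x + y + z."""
    return ((x & y) | (x & z) | (y & z)) << 1, x ^ y ^ z


def csa_multiply(a: int, b: int, n: int) -> int:
    """Unsigned n-bit multiply: carry-save rows, then one carry-propagate add."""
    carry, total = 0, 0
    for i in range(n):
        pp = (a << i) if (b >> i) & 1 else 0    # partial product for multiplier bit i
        carry, total = csa(carry, total, pp)    # no carry ripples inside this row
    return carry + total                        # final (fast) carry-propagate adder


print(csa_multiply(13, 11, 4))   # 143
```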
Carry-save Addition
CSA is associative and commutative. For example:
(((X0 + X1) + X2 ) + X3 ) = ((X0 + X1) +( X2 + X3 ))
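Because the grouping does not matter, the same operands can be reduced as a chain or as a tree. The sketch below, with made-up names `csa` and `reduce_with_csa`, repeatedly replaces three operands by a (carry, sum) pair until two remain; the `fifo` flag only changes which operands are grouped first.

```python
from collections import deque


def csa(x: int, y: int, z: int):
    """Carry-save step: returns (carry, sum) with carry + sum == x + y + z."""
    return ((x & y) | (x & z) | (y & z)) << 1, x ^ y ^ z


def reduce_with_csa(operands, fifo: bool) -> int:
    """Sum operands using 3-to-2 carry-save steps and one final ordinary add."""
    work = deque(operands)
    take = work.popleft if fifo else work.pop
    while len(work) > 2:
        x, y, z = take(), take(), take()
        work.extend(csa(x, y, z))     # results go to the back of the queue
    return sum(work)                  # final carry-propagate add


values = [3, 2, 3, 7, 12]
print(reduce_with_csa(values, fifo=False))   # chain-like grouping: 27
print(reduce_with_csa(values, fifo=True))    # tree-like grouping: 27
```

With `fifo=False` each new (carry, sum) pair is consumed immediately together with one more operand, which resembles the linear array. With `fifo=True` the oldest operands are grouped first, which resembles a tree of adders.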
Increasing Throughput: Pipelining
Idea: split processing across several clock cycles by dividing circuit into pipeline stages separated by registers that hold values passing from one stage to the next.
[Figure: pipelined circuit, with a legend marking the registers.]
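A toy cycle-by-cycle model of the idea, assuming a 4-bit unsigned multiplier split into two stages with one pipeline register between them. The stage split and the names `stage1`, `stage2` and `run_pipeline` are choices made for this sketch, not taken from the lecture.

```python
def stage1(a: int, b: int):
    """Stage 1: sum the partial products for multiplier bits 0 and 1."""
    partial = sum((a << i) for i in range(0, 2) if (b >> i) & 1)
    return a, b, partial              # everything stage 2 needs goes in the register


def stage2(a: int, b: int, partial: int) -> int:
    """Stage 2: add the partial products for multiplier bits 2 and 3."""
    return partial + sum((a << i) for i in range(2, 4) if (b >> i) & 1)


def run_pipeline(inputs):
    """Feed one operand pair per clock cycle; return the results in order."""
    reg = None                        # pipeline register between the two stages
    outputs = []
    for item in list(inputs) + [None]:          # one extra cycle drains the pipeline
        # Within a cycle, stage 2 works on the registered value while
        # stage 1 works on the newly arrived input.
        out = stage2(*reg) if reg is not None else None
        reg = stage1(*item) if item is not None else None
        if out is not None:
            outputs.append(out)
    return outputs


print(run_pipeline([(3, 5), (15, 15), (7, 9)]))   # [15, 225, 63]
```

Three multiplications finish in four cycles here: each result takes two cycles, but a new one completes every cycle once the pipeline is full.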
Booth Recoding: Higher-radix mult.
Idea: If we could use, say, 2 bits of the multiplier in generating each
partial product we would halve the number of columns and halve the
latency of the multiplier!
        A[N-1] A[N-2] … A4 A3 A2 A1 A0
    x   B[M-1] B[M-2] … B3 B2 B1 B0
    [Figure labels: M/2 partial products, 2 multiplier bits per partial product]
    ...
B[K+1,K]*A = 0*A → 0
           = 1*A → A
           = 2*A → 4A – 2A
           = 3*A → 4A – A
Booth’s insight: rewrite 2*A and 3*A cases, leave 4A for next partial product to do!
Booth recoding
(On-the-fly canonical signed digit encoding!)
[Table: Booth recoding table, indexed by the current bit pair and the bit from the previous bit pair.]
A “1” in the bit from the previous bit pair means the previous stage needed to add 4*A. Since this stage is shifted by 2 bits with respect to the previous stage, adding 4*A in the previous stage is like adding A in this stage!
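A sketch of radix-4 Booth recoding along these lines. It assumes the multiplier B is an M-bit 2's complement number with M even, and computes each stage's digit as -2*B[K+1] + B[K] + (bit from the previous pair), which is the usual way to write the rule; the function names are hypothetical.

```python
def booth_radix4_digits(b: int, m: int):
    """Recode an m-bit 2's complement multiplier (m even) into digits in -2..2.

    Digits are returned least significant first; digit i has weight 4**i.
    """
    digits = []
    prev = 0                                   # no previous pair below bit 0
    for k in range(0, m, 2):
        b_k = (b >> k) & 1
        b_k1 = (b >> (k + 1)) & 1
        # A pair worth 2 or 3 is rewritten as 4 - 2 or 4 - 1: this stage takes
        # the negative part, and the "4" shows up as +1 in the next stage.
        digits.append(-2 * b_k1 + b_k + prev)
        prev = b_k1                            # tells the next stage to add A
    return digits


def booth_multiply(a: int, b: int, m: int) -> int:
    """Multiply a by the m-bit 2's complement pattern b using m/2 partial products."""
    return sum(d * a * 4 ** i for i, d in enumerate(booth_radix4_digits(b, m)))


print(booth_radix4_digits(0b0110, 4))   # [-2, 2]  ->  -2 + 2*4 = 6
print(booth_multiply(7, 0b0110, 4))     # 42
```

Every digit is one of 0, ±A, ±2A, so each partial product needs only a shift and an optional negation.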
Bit-serial Multiplier
• Bit-serial multiplier (n² cycles, one bit of result per n cycles):
• Control Algorithm:
Signed Multipliers
Combinational Multiplier (signed!)
(-3) * (-2)
                 1 0 1    (X) = -3
             *   1 1 0    (Y) = -2
           -----------
           0 0 0 0 0 0    Y0*X  =   0
         + 1 1 1 0 1      2Y1*X =  -6
         - 1 1 0 1        4Y2*X = -12
           -----------
           0 0 0 1 1 0    (+6)
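The worked example can be checked with a small model that sign-extends each partial product and subtracts the one belonging to the MSB of Y. This is a sketch: operands and result are passed as raw bit patterns, and `signed_multiply` is a made-up name.

```python
def signed_multiply(x_bits: int, y_bits: int, n: int) -> int:
    """Multiply two n-bit 2's complement patterns; returns the 2n-bit product pattern."""
    width = 2 * n
    mask = (1 << width) - 1
    # Sign-extend X to the full product width.
    x_ext = x_bits
    if (x_bits >> (n - 1)) & 1:
        x_ext |= mask & ~((1 << n) - 1)
    acc = 0
    for i in range(n):
        if (y_bits >> i) & 1:
            pp = (x_ext << i) & mask            # sign-extended, shifted partial product
            if i == n - 1:
                acc = (acc - pp) & mask         # MSB of Y has negative weight: subtract
            else:
                acc = (acc + pp) & mask
    return acc


print(f"{signed_multiply(0b101, 0b110, 3):06b}")   # 000110, i.e. (-3) * (-2) = +6
```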
Combinational Multiplier (signed)
                            X3    X2    X1    X0
                      *     Y3    Y2    Y1    Y0
  --------------------------------------------------
    X3Y0  X3Y0  X3Y0  X3Y0  X3Y0  X2Y0  X1Y0  X0Y0
  + X3Y1  X3Y1  X3Y1  X3Y1  X2Y1  X1Y1  X0Y1
  + X3Y2  X3Y2  X3Y2  X2Y2  X1Y2  X0Y2
  - X3Y3  X3Y3  X2Y3  X1Y3  X0Y3
  --------------------------------------------------
    Z7    Z6    Z5    Z4    Z3    Z2    Z1    Z0
[Figure: array of FA and HA cells summing the sign-extended partial products, with a constant 1 input in the last row; inputs x3..x0 and y0..y3, outputs z7..z0.]
There are tricks we can use to eliminate the extra circuitry we added…
2’s Complement Multiplication
(Baugh-Wooley)
Step 1: two’s complement operands so high order bit is –2^(N-1). Must sign extend partial products and subtract the last one.
                            X3    X2    X1    X0
                      *     Y3    Y2    Y1    Y0
  --------------------------------------------------
    X3Y0  X3Y0  X3Y0  X3Y0  X3Y0  X2Y0  X1Y0  X0Y0
  + X3Y1  X3Y1  X3Y1  X3Y1  X2Y1  X1Y1  X0Y1
  + X3Y2  X3Y2  X3Y2  X2Y2  X1Y2  X0Y2
  - X3Y3  X3Y3  X2Y3  X1Y3  X0Y3
  --------------------------------------------------
    Z7    Z6    Z5    Z4    Z3    Z2    Z1    Z0
Step 2: don’t want all those extra additions, so add a carefully chosen constant, remembering to subtract it at the end. Convert subtraction into add of (complement + 1), using –B = ~B + 1. Below, ~ in front of a product bit means that bit is complemented.
    X3Y0  X3Y0  X3Y0  X3Y0  X3Y0  X2Y0  X1Y0  X0Y0
  +                         1
  + X3Y1  X3Y1  X3Y1  X3Y1  X2Y1  X1Y1  X0Y1
  +                   1
  + X3Y2  X3Y2  X3Y2  X2Y2  X1Y2  X0Y2
  +             1
  + ~X3Y3 ~X3Y3 ~X2Y3 ~X1Y3 ~X0Y3
  +                         1
  +       1
  -       1     1     1     1
Step 3: add the ones to the partial products and propagate the carries. All the sign extension bits go away!
                            ~X3Y0 X2Y0  X1Y0  X0Y0
  +                   ~X3Y1 X2Y1  X1Y1  X0Y1
  +             ~X3Y2 X2Y2  X1Y2  X0Y2
  +       X3Y3  ~X2Y3 ~X1Y3 ~X0Y3
  +                         1
  -       1     1     1     1
Step 4: finish computing the constants…
                            ~X3Y0 X2Y0  X1Y0  X0Y0
  +                   ~X3Y1 X2Y1  X1Y1  X0Y1
  +             ~X3Y2 X2Y2  X1Y2  X0Y2
  +       X3Y3  ~X2Y3 ~X1Y3 ~X0Y3
  +                         1
  + 1                       1
(Together the two constant rows equal a 1 in bit 7 and a 1 in bit 4.)
Result: multiplying 2’s complement operands takes just about same amount of hardware as multiplying unsigned operands!
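The end result of these steps can be checked exhaustively in software. The sketch below assumes the usual Baugh-Wooley form for n-bit operands: complement every product bit that involves exactly one of the two sign bits, then add a constant 1 at bit n and at bit 2n-1. The function name is hypothetical.

```python
def baugh_wooley_multiply(x: int, y: int, n: int = 4) -> int:
    """Multiply two n-bit 2's complement patterns; returns the 2n-bit product pattern."""
    width = 2 * n
    acc = 0
    for i in range(n):                 # row i uses Y bit i
        for j in range(n):             # column j uses X bit j
            bit = ((x >> j) & 1) & ((y >> i) & 1)
            # Complement the terms that involve exactly one sign bit
            # (X3*Yi for i < 3 and Xj*Y3 for j < 3 in the 4-bit case).
            if (i == n - 1) != (j == n - 1):
                bit ^= 1
            acc += bit << (i + j)
    acc += 1 << n                      # constant 1 at bit n (bit 4 for n = 4)
    acc += 1 << (width - 1)            # constant 1 at bit 2n-1 (bit 7 for n = 4)
    return acc & ((1 << width) - 1)    # keep 2n bits; carries out of the top are dropped


# (-3) * (-2) with 4-bit operands: 1101 * 1110
print(f"{baugh_wooley_multiply(0b1101, 0b1110):08b}")   # 00000110
```

Only plain additions of single bits remain, which is why the hardware cost ends up close to the unsigned array.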
2’s Complement Multiplication
[Figure: 4x4 2’s complement array multiplier built from FA and HA cells, with two constant 1 inputs; inputs x3..x0 and y0..y3, outputs z7..z0.]
Multiplication in Verilog
You can use the “*” operator to multiply two numbers: