Basic Arithmetic and The ALU
© 2000 by Mark D. Hill, CS/ECE 552 Lecture Notes: Chapter 4
Background
Recall:
n bits give rise to 2^n combinations
Let us call a string of 32 bits “b31 b30 . . . b3 b2 b1 b0”
No inherent meaning:
• one interpretation: f(b31 . . . b4 b3 b2 b1 b0) -> value
• another: f(b31 . . . b4 b3 b2 b1 b0) -> control signals

Background
32-bit types include
• unsigned integers
• signed integers
• single-precision floating point
• MIPS instructions (A.10)
Unsigned integers
f(b31 . . . b0) = b31 x 2^31 + . . . + b1 x 2^1 + b0 x 2^0

Signed Integers
2’s Complement
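The two interpretations can be compared on the same bit string. This is a minimal sketch: the function names are made up for illustration, and the 2’s complement weighting used here (the most significant bit weighs -2^(n-1), all other bits as in the unsigned case) is the standard definition rather than something spelled out in the surviving slide text.

```python
def as_unsigned(bits):
    """Unsigned value: the bit at position i (from the right) weighs 2^i."""
    n = len(bits)
    return sum(int(b) << (n - 1 - k) for k, b in enumerate(bits))


def as_twos_complement(bits):
    """Same weights, except the leftmost bit weighs -2^(n-1)."""
    n = len(bits)
    # Subtracting 2^n when the top bit is set turns its weight
    # from +2^(n-1) into -2^(n-1).
    return as_unsigned(bits) - (int(bits[0]) << n)


if __name__ == "__main__":
    for s in ("011", "101", "111"):
        print(s, as_unsigned(s), as_twos_complement(s))
```

The same string "101" reads as 5 unsigned and as -3 in 2’s complement, which is the sense in which the bits have no inherent meaning.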
Subtraction
A - B = A + 2’s complement of B
E.g., 3 - 2

Full Adder
full adder (a, b, cin) --> (cout, s)
cout = two or more of (a, b, cin)
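A one-bit full adder is small enough to check exhaustively. The sketch below assumes the usual sum equation (s is 1 when an odd number of inputs are 1), which is not stated in the surviving text; the carry equation is the "two or more" rule from the slide.

```python
def full_adder(a, b, cin):
    """Return (cout, s) for one-bit inputs a, b, cin."""
    s = a ^ b ^ cin                          # odd number of inputs are 1
    cout = (a & b) | (a & cin) | (b & cin)   # two or more inputs are 1
    return cout, s


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            for cin in (0, 1):
                print(a, b, cin, "->", full_adder(a, b, cin))
```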
[Figures: adder diagrams with inputs a0 b0, a1 b1, a2 b2, . . . a31 b31 and a0 b0, a1 b1, a2 b2, a3 b3]
Combined Ripple-carry Adder/Subtractor
control = 1 => subtract
XOR B with control and set Cin0 to control
[Figure: chain of full adders with inputs a0 b0, a1 b1, a2 b2, . . . a31 b31, an operation control input, and Cout]

Carry Lookahead
The above ALU is too slow:
• gate delays for add = 32 x FA + XOR ~= 64
Theoretically, in parallel:
• sum0 = f(cin, a0, b0)
• sumi = f(cin, ai . . . a0, bi . . . b0)
• sum31 = f(cin, a31 . . . a0, b31 . . . b0)
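The combined adder/subtractor can be modeled bit by bit. This is a sketch under stated assumptions: add_sub and its parameter n are illustrative names, operands are passed as non-negative n-bit patterns, and the loop mirrors the ripple of the carry rather than any particular gate-level design.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (cout, s)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return cout, s


def add_sub(a, b, control, n=32):
    """n-bit ripple-carry add (control=0) or subtract (control=1).

    Returns (result, cout), with result as an n-bit pattern.
    """
    carry = control                        # Cin0 is set to control
    result = 0
    for i in range(n):
        ai = (a >> i) & 1
        bi = ((b >> i) & 1) ^ control      # XOR each bit of B with control
        carry, s = full_adder(ai, bi, carry)
        result |= s << i
    return result, carry


if __name__ == "__main__":
    print(add_sub(3, 2, control=1, n=4))   # 3 - 2
    print(add_sub(3, 2, control=0, n=4))   # 3 + 2
```

With control = 1 the XOR inverts B and the carry-in adds 1, which together form the 2’s complement of B, so 3 - 2 is computed as 0011 + 1101 + 1.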
0011 0110
build tree so delay is O(log2 n) for n bits
E.g., 2 x 5 gate delays for 32-bits
We will give the basic idea with (a) 4-bit then (b) 16-bit adder

Need both to generate and at least one to propagate
Define: gi = ai * bi   ## carry generate
        pi = ai + bi   ## carry propagate
Recall: ci+1 = ai * bi + ai * ci + bi * ci
             = gi + pi * ci
Carry Lookahead
Therefore
c1 = g0 + p0 * c0
c2 = g1 + p1 * c1 = g1 + p1 * (g0 + p0 * c0)
   = g1 + p1 * g0 + p1 * p0 * c0
c3 = g2 + p2 * g1 + p2 * p1 * g0 + p2 * p1 * p0 * c0
c4 = g3 + p3*g2 + p3*p2*g1 + p3*p2*p1*g0 + p3*p2*p1*p0*c0
Uses one level to form pi and gi, two levels for carry
But, this needs n+1 fanin at the OR and the rightmost AND

4-bit Carry Lookahead Adder
[Figure: Carry Lookahead Block with inputs c0 and g3 p3, g2 p2, g1 p1, g0 p0 (formed from a3 b3 . . . a0 b0), producing c4 and the carries c3 c2 c1 c0 used for sum bits s3 s2 s1 s0]
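The flattened carry equations can be checked against ordinary addition. The sketch below writes c1 through c4 exactly as two-level sums of products; the function name cla4 and the integer encoding of the operands are illustrative assumptions.

```python
def cla4(a, b, c0):
    """4-bit carry lookahead add. Returns (sum, c4)."""
    abits = [(a >> i) & 1 for i in range(4)]
    bbits = [(b >> i) & 1 for i in range(4)]
    g = [x & y for x, y in zip(abits, bbits)]   # carry generate
    p = [x | y for x, y in zip(abits, bbits)]   # carry propagate

    # Every carry depends only on g, p, and c0, never on another carry.
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c0))

    carries = [c0, c1, c2, c3]
    total = 0
    for i in range(4):
        total |= (abits[i] ^ bbits[i] ^ carries[i]) << i
    return total, c4


if __name__ == "__main__":
    print(cla4(0b0011, 0b0110, 0))
```

The widest term for c4 has five inputs, which is the n+1 fan-in that motivates a hierarchical design for wider adders.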
Hierarchical Carry Lookahead for 16 bits
Build 16-bit adder from four 4-bit adders
[Figure: four 4-bit adders producing s12-15, s8-11, s4-7, s0-3]
G12,15 = g15 + p15 * g14 + p15 * p14 * g13 + p15 * p14 * p13 * g12
P12,15 = p15 * p14 * p13 * p12
Carry Lookahead Basics
Fill in the holes in G’s and P’s:
Gi,k = Gj+1,k + Pj+1,k * Gi,j   (assume i < j+1 < k)
Pi,k = Pi,j * Pj+1,k

Carry Lookahead: Compute G’s and P’s
[Figure: tree combining G12,15 P12,15, G8,11 P8,11, G4,7 P4,7, and G0,3 P0,3 into G0,15 P0,15]
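The combining rule for group generate and propagate signals can be applied recursively. This is a sketch, not the slide's circuit: it splits every range in half down to single bits (using gi and pi as the base case), whereas the 16-bit design in the notes groups bits four at a time. All function names are hypothetical.

```python
def bit_gp(a, b, i):
    """(gi, pi) for bit position i."""
    ai, bi = (a >> i) & 1, (b >> i) & 1
    return ai & bi, ai | bi


def combine(low, high):
    """Merge (G, P) of bits i..j (low) with (G, P) of bits j+1..k (high)."""
    g_lo, p_lo = low
    g_hi, p_hi = high
    # Gi,k = Gj+1,k + Pj+1,k * Gi,j   and   Pi,k = Pi,j * Pj+1,k
    return g_hi | (p_hi & g_lo), p_lo & p_hi


def group_gp(a, b, lo, hi):
    """(G, P) for bits lo..hi, built as a balanced tree."""
    if lo == hi:
        return bit_gp(a, b, lo)
    mid = (lo + hi) // 2
    return combine(group_gp(a, b, lo, mid), group_gp(a, b, mid + 1, hi))


def carry_out16(a, b, c0):
    """c16 = G0,15 + P0,15 * c0."""
    big_g, big_p = group_gp(a, b, 0, 15)
    return big_g | (big_p & c0)


if __name__ == "__main__":
    print(carry_out16(0xFFFF, 0x0000, 1))
```

Because each level halves the range, the depth of the tree grows with log2 of the width, which is where the O(log2 n) delay comes from.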
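[Figure: carries c12, c8, c4 computed from c0 using G0,3 P0,3, G8,11 P8,11, and G0,7 P0,7]
[Figure: a Full Adder with carry-in c0, followed by Full Adder 0 and Full Adder 1 feeding a 2-1 Mux controlled by select, producing next select]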
Other Adders: Carry Save
A + B -> S
Save carries: A + B -> S, Cout
Use Cin: A + B + C -> S1, S2 (3# to 2# in parallel)

Wallace Tree
[Figure: tree of CSAs reducing inputs f e d c b a to two outputs c and s]
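A carry-save adder (CSA) turns three numbers into two without propagating any carry, and a tree of them reduces many operands to two. The sketch below assumes non-negative Python integers of unbounded width; csa and add_many are illustrative names, and the layer-by-layer grouping is one plausible reading of the tree in the figure.

```python
def csa(a, b, c):
    """3-to-2 compression: a + b + c == s + carry, all bit positions in parallel."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1   # saved carries, shifted up one place
    return s, carry


def add_many(values):
    """Reduce operands with layers of CSAs until two remain, then add once."""
    vals = list(values)
    while len(vals) > 2:
        nxt = []
        for k in range(0, len(vals) - 2, 3):
            nxt.extend(csa(*vals[k:k + 3]))
        nxt.extend(vals[len(vals) - len(vals) % 3:])   # leftovers pass through
        vals = nxt
    return sum(vals)   # the single carry-propagate addition


if __name__ == "__main__":
    print(csa(5, 6, 7))
    print(add_many([1, 2, 3, 4, 5, 6]))
```

Six operands shrink to four, then three, then two, so only the final addition needs a carry-propagate adder.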
• sra -> shift right arithmetic (old MSB --> new MSB)
Shifter
[Figure: 8-bit shifter built from three stages of Muxes, input d7 . . . d0, output dout7 . . . dout0]
• stage0: shift based on 0th bit by 0 or 1 (shamt0), outputs s07 . . . s00
• stage1: shift based on 1st bit by 0 or 2 (shamt1), outputs s17 . . . s10
• stage2: shift based on 2nd bit by 0 or 4 (shamt2), outputs dout

All Together
[Figure: ALU with inputs a and b, control signals operation, invert, and carryin, an Add unit, Muxes, and output result]
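The staged shifter can be modeled as three conditional shifts, one per bit of the shift amount. This sketch assumes a left shift that fills with zeros, which is how the Mux inputs in the figure residue appear to be wired (bit 0 chooses between d0 and 0); shift_left8 is an illustrative name.

```python
def shift_left8(d, shamt):
    """8-bit logical left shift by shamt (0..7) using three Mux stages."""
    value = d & 0xFF
    for stage in range(3):
        if (shamt >> stage) & 1:                     # Mux select is bit `stage` of shamt
            value = (value << (1 << stage)) & 0xFF   # shift by 1, 2, or 4
        # otherwise the stage passes its input through unchanged
    return value


if __name__ == "__main__":
    print(format(shift_left8(0b00010111, 3), "08b"))
```

Three stages cover every shift amount from 0 to 7, so the hardware cost grows with log2 of the width instead of needing one Mux input per possible shift.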
Overflow
with n bits only 2^n combinations
Unsigned [0, 2^n - 1], 2’s Complement [-2^(n-1), 2^(n-1) - 1]
Unsigned Add
5 + 6 > 7
    101
  + 110
   1011

Overflow
More involved for 2’s Complement
-1 + -1 = -2
    111
  + 111
   1110
110 = -2 is correct => can’t just use carry-out
Addition Overflow
When is overflow NOT possible?
(p1, p2 > 0 and n1, n2 < 0)
p1 + p2
p1 + n1 not possible
n1 + p2 not possible
n1 + n2
overflow = X * ~a(2) * ~b(2) + Y * a(2) * b(2)   (~x is the complement of x)
What are X and Y?

Addition Overflow
2 + 3 = 5 > 3
    010
  + 011
    101 = -3 < 0!
In general, X = f(2)
-1 + -4
    111
  + 100
 (1)011 which is 011 > 0
In general, Y = ~f(2)
Overflow = f(2) * ~a(2) * ~b(2) + ~f(2) * a(2) * b(2)
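The addition overflow rule can be verified for every pair of 3-bit operands. A minimal sketch, assuming the sign bits are a(2), b(2), f(2) and that overflow occurs when two operands of the same sign give a result of the opposite sign; add3 is an illustrative name.

```python
def add3(a, b):
    """Add 3-bit 2's complement values in [-4, 3]. Returns (f, overflow)."""
    raw = (a + b) & 0b111                    # keep only the three result bits
    a2, b2, f2 = (a >> 2) & 1, (b >> 2) & 1, (raw >> 2) & 1
    # pos + pos gave a negative result, or neg + neg gave a positive one
    overflow = (f2 & (1 - a2) & (1 - b2)) | ((1 - f2) & a2 & b2)
    f = raw - 8 if f2 else raw               # reinterpret the bits as signed
    return f, overflow


if __name__ == "__main__":
    print(add3(2, 3))     # result wraps negative
    print(add3(-1, -4))   # result wraps positive
    print(add3(-1, -1))   # carry-out occurs but no overflow
```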
neg - pos ==> neg ;; overflow otherwise
pos - neg ==> pos ;; overflow otherwise
overflow = f(2) * ~a(2) * b(2) + ~f(2) * a(2) * ~b(2)   (~x is the complement of x)

Flag - condition code that may be tested by software
sticky flag - e.g., for floating point
trap - possibly with mask
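The subtraction overflow rule differs from the addition rule only in which sign of b matters. The sketch below checks it for f = a - b over all 3-bit operands; sub3 is an illustrative name, and the returned overflow bit stands in for whatever flag or trap mechanism a machine would use.

```python
def sub3(a, b):
    """Compute a - b on 3-bit 2's complement values. Returns (f, overflow)."""
    raw = (a - b) & 0b111
    a2, b2, f2 = (a >> 2) & 1, (b >> 2) & 1, (raw >> 2) & 1
    # pos - neg must be pos, neg - pos must be neg; anything else is overflow
    overflow = (f2 & (1 - a2) & b2) | ((1 - f2) & a2 & (1 - b2))
    f = raw - 8 if f2 else raw
    return f, overflow


if __name__ == "__main__":
    print(sub3(-4, 2))   # true answer -6 does not fit in 3 bits
    print(sub3(3, 1))
```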
Zero and Negative
zero = ~(f(2) + f(1) + f(0))   (~x is the complement of x)
can’t also look at f(3) because
    001   +1
  + 111   -1
   1000    0

Zero and Negative
May also want correct answer even on overflow
negative = (a < b) = (a - b < 0) even if overflow
E.g., is -4 < 2?
    100   -4
  - 010    2
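One common way to keep the comparison right under overflow is to flip the result's sign bit whenever the subtraction overflows. That correction (negative = f(2) XOR overflow) is an assumption brought in here, not something the surviving slide text states; compare3 is an illustrative name.

```python
def compare3(a, b):
    """Compare 3-bit 2's complement values via a - b. Returns (negative, zero)."""
    raw = (a - b) & 0b111
    a2, b2, f2 = (a >> 2) & 1, (b >> 2) & 1, (raw >> 2) & 1
    overflow = (f2 & (1 - a2) & b2) | ((1 - f2) & a2 & (1 - b2))
    zero = 1 - (((raw >> 2) | (raw >> 1) | raw) & 1)   # NOR of the three result bits
    negative = f2 ^ overflow                            # sign bit, corrected on overflow
    return negative, zero


if __name__ == "__main__":
    # -4 - 2 overflows to 010, whose sign bit alone would wrongly say "not less".
    print(compare3(-4, 2))
```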
MMX, cont.
Others: MOV, (UN)PACK, & MASK (e.g., next)
 15  15 100 120 101  76  15  15
 15  15  15  15  15  15  15  15
 --------------------------------
 FF  FF  00  00  00  00  FF  FF
Why? Weatherperson at 00’s & weathermap at FF’s
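The mask example can be imitated in plain Python. This is only a sketch of the idea: compare_equal and select are hypothetical helpers operating on lists of byte values, not MMX instructions, and the weathermap pixel value 200 is invented for the demonstration.

```python
def compare_equal(xs, ys):
    """Element-wise compare: 0xFF where the elements match, 0x00 elsewhere."""
    return [0xFF if x == y else 0x00 for x, y in zip(xs, ys)]


def select(mask, at_ff, at_00):
    """Take at_ff where the mask is FF and at_00 where the mask is 00."""
    return [(m & a) | (~m & 0xFF & b) for m, a, b in zip(mask, at_ff, at_00)]


if __name__ == "__main__":
    camera = [15, 15, 100, 120, 101, 76, 15, 15]   # 15 is the background value
    key = [15] * 8
    weathermap = [200] * 8                         # made-up map pixels

    mask = compare_equal(camera, key)
    print([format(m, "02X") for m in mask])
    print(select(mask, weathermap, camera))
```

The mask lets AND and OR do the selection for all eight elements at once, with no per-element branch.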
Comments
• Backward compatible & no OS changes (overload FP regs)
• Others have similar: Sun, HP, and now Intel SSE
• ISVs (e.g., for games) have not (yet) embraced it