CODch 4 Slides
Architecture (AT70.01)
Comp. Sc. and Inf. Mgmt.
Asian Institute of Technology
Instructor: Dr. Sumanta Guha
Slide Sources: Patterson &
Hennessy COD book website
(copyright Morgan Kaufmann)
adapted and supplemented
COD Ch. 4
Arithmetic for Computers
Arithmetic
Where we've been:
performance
abstractions
instruction set architecture
assembly language and machine language
What's up ahead:
implementing the architecture
[Figure: ALU symbol with 32-bit inputs a and b, an operation select input, and a 32-bit result]
Numbers
Bits are just bits (no inherent meaning)
conventions define relationship between bits and numbers
Binary integers (base 2)
0000 0001 0010 0011 0100 0101 0110 0111 1000 1001...
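Three candidate representations of signed numbers (3-bit example):
Sign Magnitude    One's Complement    Two's Complement
000 = +0          000 = +0            000 = +0
001 = +1          001 = +1            001 = +1
010 = +2          010 = +2            010 = +2
011 = +3          011 = +3            011 = +3
100 = -0          100 = -3            100 = -4
101 = -1          101 = -2            101 = -3
110 = -2          110 = -1            110 = -2
111 = -3          111 = -0            111 = -1
(sign magnitude and one's complement each have an ambiguous zero: two bit patterns for 0)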
Issues:
balance – equal number of negatives and positives
ambiguous zero – whether more than one zero representation
ease of arithmetic operations
Which representation is best? Can we get both balance and non-ambiguous
zero?
Representation Formulae
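For an (n+1)-bit pattern x_n X’, where x_n is the most significant bit and X’ is the value of the remaining n bits read as an unsigned number:
Two’s complement:
value(x_n X’) = X’, if x_n = 0
value(x_n X’) = -2^n + X’, if x_n = 1
One’s complement:
value(x_n X’) = X’, if x_n = 0
value(x_n X’) = -2^n + 1 + X’, if x_n = 1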
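The formulae above can be checked against the 3-bit examples with a short sketch. The function names are illustrative only, and bit patterns are passed as strings for readability.

```python
def sign_magnitude(bits: str) -> int:
    # Leading bit is the sign, remaining bits are the magnitude.
    magnitude = int(bits[1:], 2)
    return -magnitude if bits[0] == "1" else magnitude


def ones_complement(bits: str) -> int:
    n = len(bits) - 1          # position of the most significant bit x_n
    rest = int(bits[1:], 2)    # X': remaining n bits read as unsigned
    return rest if bits[0] == "0" else -(2 ** n) + 1 + rest


def twos_complement(bits: str) -> int:
    n = len(bits) - 1
    rest = int(bits[1:], 2)
    return rest if bits[0] == "0" else -(2 ** n) + rest


for value in range(8):
    pattern = format(value, "03b")
    print(pattern, sign_magnitude(pattern),
          ones_complement(pattern), twos_complement(pattern))
```

Only two's complement gives a single zero; it pays for that with one more negative value than positive values.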
MIPS – 2’s complement
32 bit signed numbers:
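Overflow conditions for signed addition and subtraction:
Operation    Operand A    Operand B    Result indicating overflow
A + B        >= 0         >= 0         < 0
A + B        < 0          < 0          >= 0
A – B        >= 0         < 0          < 0
A – B        < 0          >= 0         >= 0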
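To make 32-bit two's complement arithmetic concrete, here is a minimal sketch that wraps results to 32 bits and flags overflow from the signs of the operands and the result. The helper names (to_signed32, add32, sub32) are hypothetical, and Python integers stand in for registers.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(word: int) -> int:
    # Interpret the low 32 bits of word as a two's complement number.
    word &= MASK32
    return word - (1 << 32) if word & 0x80000000 else word


def add32(a: int, b: int):
    result = to_signed32(a + b)
    # Overflow: operands share a sign but the result has the other sign.
    overflow = (a >= 0 and b >= 0 and result < 0) or \
               (a < 0 and b < 0 and result >= 0)
    return result, overflow


def sub32(a: int, b: int):
    result = to_signed32(a - b)
    # Overflow: operands differ in sign and the result's sign differs from a's.
    overflow = (a >= 0 and b < 0 and result < 0) or \
               (a < 0 and b >= 0 and result >= 0)
    return result, overflow
```

For example, add32(2**31 - 1, 1) wraps around to the most negative value and reports overflow.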
2. OR gate (c = a + b)
a b | c = a + b
0 0 | 0
0 1 | 1
1 0 | 1
1 1 | 1
3. Inverter (c = NOT a)
a | c = NOT a
0 | 1
1 | 0
4. Multiplexor (if d == 0, c = a; else c = b)
d | c
0 | a
1 | b
Review: Boolean Algebra &
Gates
Problem: Consider logic functions with three inputs: A, B, C.
output D is true if at least one input is true
output E is true if exactly two inputs are true
output F is true only if all three inputs are true
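One way to approach the problem is to enumerate the truth table. The sketch below builds D, E, and F from AND/OR operations; the function name logic_outputs is illustrative, and the expression for E is one of several equivalent forms.

```python
from itertools import product


def logic_outputs(a: int, b: int, c: int):
    d = a | b | c                                  # at least one input true
    f = a & b & c                                  # all three inputs true
    # At least two inputs true, but not all three.
    e = ((a & b) | (a & c) | (b & c)) & (1 - f)
    return d, e, f


print("A B C | D E F")
for a, b, c in product((0, 1), repeat=3):
    d, e, f = logic_outputs(a, b, c)
    print(a, b, c, "|", d, e, f)
```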
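[Figure: a gate with inputs a and b and one output; and a 1-bit logic unit in which a multiplexor controlled by Operation selects Result from its inputs 0 and 1]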
How could we build a 1-bit ALU for add, and, and or?
How could we build a 32-bit ALU?
1-bit Adder Logic
xor
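[Figure: 32-bit ripple-carry ALU built from 1-bit ALUs ALU0 to ALU31; each ALUi takes ai, bi, CarryIn, and Operation and produces Resulti and CarryOut, and each CarryOut feeds the CarryIn of the next ALU]
[Figure: 1-bit ALU with inputs a, b, and CarryIn; a multiplexor with inputs 0, 1, 2 selects Result; the adder also produces CarryOut]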
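The ripple-carry idea can be sketched in software: a 1-bit full adder (sum by xor, carry by majority) is chained so that each CarryOut becomes the next CarryIn. The names full_adder and ripple_add are illustrative, and the width parameter is an assumption added for testing with small words.

```python
def full_adder(a: int, b: int, carry_in: int):
    # Sum bit is the xor of the three inputs; carry is their majority.
    total = a ^ b ^ carry_in
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)
    return total, carry_out


def ripple_add(a: int, b: int, width: int = 32, carry_in: int = 0):
    result = 0
    carry = carry_in
    for i in range(width):
        # Bit i of each operand plus the carry rippling in from bit i - 1.
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result, carry
```

Subtraction reuses the same adder: invert b and set the initial carry to 1, for example ripple_add(5, ~6 & 0xF, width=4, carry_in=1) computes 5 - 6 in 4 bits.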
Tailoring the ALU to MIPS:
Test for Less-than and Equality
Need to support the set-on-less-than instruction
e.g., slt $t0, $t3, $t4
remember: slt is an R-type instruction that produces 1 if rs < rt
and 0 otherwise
idea is to use subtraction: rs < rt ⇔ rs – rt < 0. Recall msb of
negative number is 1
two cases after subtraction rs – rt:
if no overflow then rs < rt ⇔ most significant bit of rs – rt = 1
if overflow then rs < rt ⇔ most significant bit of rs – rt = 0
why?
e.g., 5ten – 6ten = 0101 – 0110 = 0101 + 1010 = 1111 (ok!)
-7ten – 6ten = 1001 – 0110 = 1001 + 1010 = 0011 (overflow!)
therefore
set bit = (msb of rs – rt) ⊕ (overflow bit)
where set bit, which is output from ALU31, gives the result of slt
Fig. 4.17(lower) indicates set bit is the adder output – not correct !!
set bit is sent from ALU31 to ALU0 as the Less bit at ALU0; all other
Less bits are hardwired 0; so Less is the 32-bit result of slt
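The set-bit rule can be checked exhaustively on 4-bit words, matching the worked examples above. This is a sketch under the assumption that overflow is detected from operand and result signs; WIDTH, to_signed, and slt are illustrative names.

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1


def to_signed(word: int) -> int:
    # Interpret a WIDTH-bit pattern as a two's complement number.
    return word - (1 << WIDTH) if word >> (WIDTH - 1) else word


def slt(rs: int, rt: int) -> int:
    # rs - rt computed as rs + (inverted rt) + 1, truncated to WIDTH bits.
    diff = (rs + (~rt & MASK) + 1) & MASK
    msb = diff >> (WIDTH - 1)
    sign_rs = rs >> (WIDTH - 1)
    sign_rt = rt >> (WIDTH - 1)
    # Subtraction overflows when the operand signs differ and the
    # result's sign differs from the sign of rs.
    overflow = int(sign_rs != sign_rt and msb != sign_rs)
    return msb ^ overflow


print(slt(0b0101, 0b0110))  # 5 < 6, no overflow
print(slt(0b1001, 0b0110))  # -7 < 6, overflow case
```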
Supporting slt
[Figure a: 1-bit ALU for the 31 least significant bits, with inputs a, b, Binvert, CarryIn, and Less; a multiplexor with inputs 0 to 3 is selected by Operation; outputs are Result and CarryOut]
[Figure b: 1-bit ALU for the most significant bit, which adds overflow detection and the outputs Set and Overflow]
Extra set bit, to be routed to the Less input of the least significant 1-bit ALU, is computed from the most significant Result bit and the Overflow bit (it is not the output of the adder as the figure seems to indicate)
[Figure: 32-bit ALU from 31 copies of the ALU in figure a and 1 copy of the ALU in figure b in the most significant position; Set from ALU31 is routed to the Less input of ALU0]
Less input of the 31 most significant ALUs is always 0
Tailoring the ALU to MIPS:
Test for Less-than and Equality
What about logic for the overflow bit ?
overflow bit = (carry in to msb) ⊕ (carry out of msb)
verify!
logic for overflow detection therefore can be put in to ALU31
Need to support test for equality
e.g., beq $t5, $t6, L (branch to label L if $t5 = $t6)
use subtraction: rs - rt = 0 ⇔ rs = rt
do we need to consider overflow?
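The "verify!" for the overflow bit can be done by brute force on small words. The sketch below adds two 4-bit patterns bit by bit, records the carry into and out of the msb, and compares their xor with a range check on the true sum. All names here are illustrative.

```python
WIDTH = 4


def to_signed(word: int) -> int:
    return word - (1 << WIDTH) if word >> (WIDTH - 1) else word


def add_with_carries(a: int, b: int, carry_in: int = 0):
    carry = carry_in
    carry_into_msb = 0
    result = 0
    for i in range(WIDTH):
        if i == WIDTH - 1:
            carry_into_msb = carry      # carry entering the msb position
        x, y = (a >> i) & 1, (b >> i) & 1
        result |= (x ^ y ^ carry) << i
        carry = (x & y) | (x & carry) | (y & carry)
    return result, carry_into_msb, carry  # final carry leaves the msb


def overflow_by_carries(a: int, b: int) -> int:
    _, carry_in_msb, carry_out_msb = add_with_carries(a, b)
    return carry_in_msb ^ carry_out_msb


def overflow_by_range(a: int, b: int) -> int:
    true_sum = to_signed(a) + to_signed(b)
    low, high = -(1 << (WIDTH - 1)), (1 << (WIDTH - 1)) - 1
    return int(not (low <= true_sum <= high))


mismatches = [(a, b) for a in range(16) for b in range(16)
              if overflow_by_carries(a, b) != overflow_by_range(a, b)]
print(mismatches)
```

An empty mismatch list means the two definitions of overflow agree for every pair of 4-bit operands.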
Supporting ALU
control
[Figure: 32-bit ALU with the ALU operation control lines; the Zero output is 1 only if all Result bits are 0; outputs are Result, Zero, Overflow, and CarryOut. Shown alongside the symbol representing the ALU]
C1 = G0 + P0.c0
C2 = G1 + P1.G0 + P1.P0.c0
C3 = G2 + P2.G1 + P2.P1.G0 + P2.P1.P0.c0
C4 = G3 + P3.G2 + P3.P2.G1 + P3.P2.P1.G0 + P3.P2.P1.P0.c0
Carry-lookahead
[Figure: 16-bit carry-lookahead adder from four 4-bit adders (4bAdder0 to 4bAdder3) and one carry-lookahead unit. Each 4-bit adder takes four bits of a and b and a CarryIn and produces four Result bits plus Pi and Gi; the carry-lookahead unit computes C1, C2, C3, and C4 (CarryOut)]
[Figure: blow-up of 4-bit adder: (conceptually) consisting of four 1-bit ALUs plus logic to compute all CarryOut bits and one super generate and one super propagate bit. Each 1-bit ALU is exactly as for ripple-carry except that c1, c2, c3 for ALUs 1, 2, 3 come from the extra logic]
Two-level Carry-lookahead
Adder: Second Level
for a 16-bit adder
Two-level carry-lookahead logic steps:
1. compute pi’s and gi’s at each 1-bit ALU
2. compute Pi’s and Gi’s at each 4-bit adder unit
3. compute Ci’s in carry-lookahead unit
4. compute ci’s at each 4-bit adder unit
5. compute results (sum bits) at each 1-bit ALU
E.g., add using carry-lookahead logic:
0001 1010 0011 0011
1110 0101 1110 1011
Compare times for ripple-carry vs. carry-lookahead for a 16-bit
adder assuming unit delay at each gate
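The five steps can be traced in software. The sketch below assumes pi = ai OR bi and gi = ai AND bi at each bit, and follows the C1 to C4 equations given earlier; cla16 and bit are illustrative names, and the code models the logic only, not gate delays.

```python
def bit(x: int, i: int) -> int:
    return (x >> i) & 1


def cla16(a: int, b: int, c0: int = 0):
    # Step 1: propagate and generate at each 1-bit ALU.
    p = [bit(a, i) | bit(b, i) for i in range(16)]
    g = [bit(a, i) & bit(b, i) for i in range(16)]
    # Step 2: super propagate and generate at each 4-bit adder.
    P, G = [], []
    for k in range(4):
        p0, p1, p2, p3 = p[4 * k:4 * k + 4]
        g0, g1, g2, g3 = g[4 * k:4 * k + 4]
        P.append(p0 & p1 & p2 & p3)
        G.append(g3 | (p3 & g2) | (p3 & p2 & g1) | (p3 & p2 & p1 & g0))
    # Step 3: block carries C1..C4 in the carry-lookahead unit.
    C = [c0,
         G[0] | (P[0] & c0),
         G[1] | (P[1] & G[0]) | (P[1] & P[0] & c0),
         G[2] | (P[2] & G[1]) | (P[2] & P[1] & G[0])
              | (P[2] & P[1] & P[0] & c0),
         G[3] | (P[3] & G[2]) | (P[3] & P[2] & G[1])
              | (P[3] & P[2] & P[1] & G[0])
              | (P[3] & P[2] & P[1] & P[0] & c0)]
    result = 0
    for k in range(4):
        p0, p1, p2, _ = p[4 * k:4 * k + 4]
        g0, g1, g2, _ = g[4 * k:4 * k + 4]
        cin = C[k]
        # Step 4: carries inside the 4-bit adder.
        c = [cin,
             g0 | (p0 & cin),
             g1 | (p1 & g0) | (p1 & p0 & cin),
             g2 | (p2 & g1) | (p2 & p1 & g0) | (p2 & p1 & p0 & cin)]
        # Step 5: sum bits at each 1-bit ALU.
        for j in range(4):
            i = 4 * k + j
            result |= (bit(a, i) ^ bit(b, i) ^ c[j]) << i
    return result, C[4]


total, carry_out = cla16(0b0001101000110011, 0b1110010111101011)
print(format(total, "016b"), carry_out)
```

For the example operands on the slide, the 16-bit sum is 0000 0000 0001 1110 with a carry out of 1.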
Multiply
Grade school shift-add method:
Multiplicand 1000
Multiplier x 1001
1000
0000
0000
1000
Product 01001000
m bits x n bits = m+n bit product
Binary makes it easy:
multiplier bit 1 => copy multiplicand (1 x multiplicand)
multiplier bit 0 => place 0 (0 x multiplicand)
3 versions of multiply hardware & algorithm:
Shift-add Multiplier Version 1
[Figure: Version 1 hardware: 32-bit Multiplier register (shift right), 64-bit ALU, 64-bit Product register (write), control test]
Algorithm (flowchart, surviving steps):
2. Shift the Multiplicand register left 1 bit
3. Shift the Multiplier register right 1 bit
32nd repetition? Yes (32 repetitions): Done
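A software sketch of Version 1 follows. Steps 2 and 3 come from the flowchart; step 1 (test the low multiplier bit and add the multiplicand to the product if it is 1) does not survive in the text above and is assumed here from the grade school method. The name multiply_v1 and the bits parameter are illustrative.

```python
def multiply_v1(multiplicand: int, multiplier: int, bits: int = 32) -> int:
    mask = (1 << (2 * bits)) - 1       # registers are 2*bits wide
    multiplicand_reg = multiplicand    # starts in the right half
    product = 0
    for _ in range(bits):
        # Step 1 (assumed): test the low multiplier bit, add if it is 1.
        if multiplier & 1:
            product = (product + multiplicand_reg) & mask
        # Step 2: shift the Multiplicand register left 1 bit.
        multiplicand_reg = (multiplicand_reg << 1) & mask
        # Step 3: shift the Multiplier register right 1 bit.
        multiplier >>= 1
    return product


print(format(multiply_v1(0b1000, 0b1001, bits=4), "08b"))
```

With bits=4 this reproduces the 1000 x 1001 example from the grade school slide.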
Observations on Multiply
Version 1
1 step per clock cycle ⇒ nearly 100 clock cycles to multiply two
32-bit numbers
Half the bits in the multiplicand register always 0
⇒ 64-bit adder is wasted
0’s inserted to right as multiplicand is shifted left
⇒ least significant bits of product never change once formed
[Figure: Version 2 hardware: 32-bit Multiplier register (shift right), 32-bit ALU, 64-bit Product register (shift right, write), control test]
Algorithm (flowchart, surviving steps):
1. Test Multiplier0 (branches: Multiplier0 = 1, Multiplier0 = 0)
2. Shift the Product register right 1 bit
3. Shift the Multiplier register right 1 bit
32nd repetition? Yes (32 repetitions): Done
Example: 0010 * 0011
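Version 2 can be sketched the same way. The assumption here is that when Multiplier0 = 1 the multiplicand is added to the left half of the Product register before the shift; that step is not legible in the flowchart text. multiply_v2 is an illustrative name.

```python
def multiply_v2(multiplicand: int, multiplier: int, bits: int = 32) -> int:
    product = 0
    for _ in range(bits):
        # Step 1: test Multiplier0; if 1, add the multiplicand to the
        # left half of the product (assumed step). A Python int keeps
        # the carry out of this add, which hardware shifts back in.
        if multiplier & 1:
            product += multiplicand << bits
        # Step 2: shift the Product register right 1 bit.
        product >>= 1
        # Step 3: shift the Multiplier register right 1 bit.
        multiplier >>= 1
    return product


print(format(multiply_v2(0b0010, 0b0011, bits=4), "08b"))
```

Running the slide's example with bits=4 gives 0000 0110.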
Observations on Multiply
Version 2
Each step the product register wastes space that exactly matches
the current size of the multiplier
[Figure: Version 3 hardware and algorithm: Multiplicand register and 32-bit ALU; the flowchart ends after 32 repetitions]
Observations on Multiply
Version 3
2 steps per bit because multiplier & product combined
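A sketch of Version 3, assuming the multiplier starts in the right half of the Product register so that one right shift per step serves both. multiply_v3 is an illustrative name, and a Python int stands in for the register.

```python
def multiply_v3(multiplicand: int, multiplier: int, bits: int = 32) -> int:
    # Left half of the product register starts at 0; the right half
    # holds the multiplier (assumed layout).
    product = multiplier
    for _ in range(bits):
        # Step 1: test the low bit (the current multiplier bit); if 1,
        # add the multiplicand to the left half.
        if product & 1:
            product += multiplicand << bits
        # Step 2: shift the combined register right 1 bit, discarding
        # the used multiplier bit and making room for the product.
        product >>= 1
    return product
```

After the final shift every multiplier bit has been shifted out, so the whole register holds the product.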
What about signed multiplication?
easiest solution is to make both positive and remember whether to
negate product when done, i.e., leave out the sign bit, run for 31 steps,
then negate if multiplier and multiplicand have opposite signs
Junior school method: see how big a multiple of the divisor can be
subtracted, creating quotient digit at each step
Binary makes it easy ⇒ first, try 1 * divisor; if too big, 0 * divisor
Dividend = (Quotient * Divisor) + Remainder
3 versions of divide hardware & algorithm:
[Figure: Divide Version 1 hardware: 32-bit Quotient register (shift left), 64-bit ALU, 64-bit Remainder register (write), control test]
Algorithm (flowchart, surviving steps):
2a. Shift the Quotient register to the left, setting the new rightmost bit to 1
2b. Restore the original value by adding the Divisor register to the Remainder register and place the sum in the Remainder register. Also shift the Quotient register to the left, setting the new least significant bit to 0
33rd repetition? Yes (33 repetitions): Done
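A sketch of Divide Version 1 follows. Steps 2a and 2b come from the flowchart. The subtract step before them, the right shift of the Divisor register after them, and the initial placement of the divisor in the left half of its register are assumed, since they do not survive in the text. divide_v1 is an illustrative name.

```python
def divide_v1(dividend: int, divisor: int, bits: int = 32):
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    mask = (1 << bits) - 1
    remainder = dividend
    divisor_reg = divisor << bits      # divisor in the left half (assumed)
    quotient = 0
    for _ in range(bits + 1):          # 33 repetitions for 32-bit operands
        # Step 1 (assumed): subtract the Divisor from the Remainder.
        remainder -= divisor_reg
        if remainder >= 0:
            # Step 2a: shift Quotient left, new rightmost bit is 1.
            quotient = ((quotient << 1) | 1) & mask
        else:
            # Step 2b: restore the Remainder, shift Quotient left with 0.
            remainder += divisor_reg
            quotient = (quotient << 1) & mask
        # Step 3 (assumed): shift the Divisor register right 1 bit.
        divisor_reg >>= 1
    return quotient, remainder


print(divide_v1(7, 2, bits=4))
```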
Observations on Divide Version 1
Half the bits in divisor always 0
⇒ 1/2 of 64-bit adder is wasted
[Figure: Divide Version 2 hardware: 32-bit Divisor register, 32-bit ALU, 32-bit Quotient register (shift left), Remainder register; the Remainder register is initialized with the dividend at right]
Algorithm (flowchart, surviving steps):
2. Subtract the Divisor register from the left half of the Remainder register and place the result in the left half of the Remainder register
Test Remainder (branches: Remainder ≥ 0, Remainder < 0)
Stop after 32 repetitions
Each step the remainder register wastes space that exactly matches
the current size of the quotient
[Figure: Divide Version 3 hardware: 32-bit Divisor register, 32-bit ALU, 64-bit Remainder register (shift left, shift right, write), control test]
Algorithm (flowchart, surviving steps):
Test Remainder (branches: Remainder ≥ 0, Remainder < 0)
3a. Shift the Remainder register to the left, setting the new rightmost bit to 1
3b. Restore the original value by adding the Divisor register to the left half of the Remainder register and place the sum in the left half of the Remainder register. Also shift the Remainder register to the left, setting the new rightmost bit to 0
Stop after 32 repetitions
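A sketch of Divide Version 3, where the quotient is built in the right half of the Remainder register. Steps 3a and 3b come from the flowchart; the initial left shift of the register, the subtract step, and the final right shift of the left half are assumed. divide_v3 is an illustrative name.

```python
def divide_v3(dividend: int, divisor: int, bits: int = 32):
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    low_mask = (1 << bits) - 1
    # Dividend starts in the right half; shift left once (assumed step 1).
    remainder_reg = dividend << 1
    for _ in range(bits):
        # Step 2 (assumed): subtract the Divisor from the left half.
        left = (remainder_reg >> bits) - divisor
        if left >= 0:
            # Step 3a: keep the difference, shift left, rightmost bit = 1.
            remainder_reg = (left << bits) | (remainder_reg & low_mask)
            remainder_reg = (remainder_reg << 1) | 1
        else:
            # Step 3b: left half is unchanged (restored), shift left,
            # rightmost bit = 0.
            remainder_reg <<= 1
    quotient = remainder_reg & low_mask
    # Final step (assumed): undo the extra left shift of the left half.
    remainder = remainder_reg >> (bits + 1)
    return quotient, remainder


print(divide_v3(7, 2, bits=4))
```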
Signed divide:
make both divisor and dividend positive and perform division
negate the quotient if divisor and dividend were of opposite signs
make the sign of the remainder match that of the dividend
this ensures that both of the following always hold:
dividend = (quotient * divisor) + remainder
–quotient(x/y) = quotient(–x/y) (e.g., 7 = 3*2 + 1 and –7 = –3*2 – 1)
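The three rules can be written down directly. In this sketch Python's divmod on the absolute values stands in for the unsigned divide hardware, and signed_divide is an illustrative name.

```python
def signed_divide(dividend: int, divisor: int):
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    # Divide the magnitudes, as the unsigned hardware would.
    quotient, remainder = divmod(abs(dividend), abs(divisor))
    # Negate the quotient if the operands have opposite signs.
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient
    # The remainder takes the sign of the dividend.
    if dividend < 0:
        remainder = -remainder
    return quotient, remainder


for x, y in [(7, 2), (-7, 2), (7, -2), (-7, -2)]:
    q, r = signed_divide(x, y)
    print(x, y, q, r, x == q * y + r)
```

Note that this differs from Python's own // and % operators, which round the quotient toward negative infinity (-7 // 2 is -4).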
MIPS Notes
div (signed), divu (unsigned), with two 32-bit register
operands, divide the contents of the operands and put
remainder in Hi register and quotient in Lo; overflow is ignored
in both cases
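[Figure: floating point format across two 32-bit words: sign in bit 31, exponent in bits 30 to 20, significand in bits 19 to 0 of the first word and bits 31 to 0 of the second word]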
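Assuming the layout above is the IEEE 754 double-precision format (1 sign bit, 11 exponent bits, 52 significand bits split across two 32-bit words), the fields of a Python float can be pulled apart with the standard struct module. double_fields is an illustrative name.

```python
import struct


def double_fields(x: float):
    # Reinterpret the 8 bytes of a double as one 64-bit unsigned integer.
    (raw,) = struct.unpack(">Q", struct.pack(">d", x))
    high = raw >> 32                # first word
    low = raw & 0xFFFFFFFF          # second word: significand bits 31 to 0
    sign = high >> 31               # bit 31 of the first word
    exponent = (high >> 20) & 0x7FF # bits 30 to 20 (biased by 1023)
    fraction = ((high & 0xFFFFF) << 32) | low   # bits 19 to 0, then low word
    return sign, exponent, fraction


print(double_fields(1.0))
print(double_fields(-2.0))
print(double_fields(0.75))
```

The exponent returned is the stored (biased) value, so 1.0 shows 1023 rather than 0.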
Addition
Algorithm (flowchart, surviving steps):
Shift the smaller number to the right until its exponent would match the larger exponent
Overflow or underflow? (Yes: Exception; No: continue)
Still normalized? (No: repeat; Yes: Done)
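The align, add, normalize sequence can be illustrated with a toy format. This sketch is not IEEE 754: it assumes positive numbers only, a significand stored as an integer of precision bits (value = significand * 2^exponent), truncation instead of rounding, and no overflow or underflow check. fp_add is an illustrative name.

```python
def fp_add(sig_a: int, exp_a: int, sig_b: int, exp_b: int,
           precision: int = 4):
    # Make (sig_a, exp_a) the number with the larger exponent.
    if exp_a < exp_b:
        sig_a, exp_a, sig_b, exp_b = sig_b, exp_b, sig_a, exp_a
    # Shift the smaller number right until the exponents match;
    # bits shifted out are lost (truncation).
    sig_b >>= exp_a - exp_b
    # Add the significands.
    total = sig_a + sig_b
    exponent = exp_a
    # Normalize: shift right and bump the exponent while the sum
    # does not fit in the significand field.
    while total >= (1 << precision):
        total >>= 1
        exponent += 1
    return total, exponent


print(fp_add(0b1100, 0, 0b1000, -2))   # 12 + 2
print(fp_add(0b1000, 0, 0b1000, 0))    # 8 + 8, needs normalizing
print(fp_add(0b1111, 0, 0b1111, -4))   # small operand is shifted away
```

The last call shows the precision limit: the smaller operand contributes nothing once all of its bits are shifted out.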
Floating Point
Addition
Hardware:
[Figure: floating point addition hardware. A small ALU compares the exponents and produces the exponent difference; under Control, multiplexors select the smaller number, which is shifted right; the big ALU adds the significands; the result is normalized by shifting left or right while the exponent is incremented or decremented]
Multiplication
Algorithm (flowchart, surviving steps):
... to get the new biased exponent
Overflow or underflow? (Yes: Exception; No: continue)
Still normalized? (No: repeat; Yes: Done)
Floating Point Complexities
In addition to overflow we can have underflow (number too
small)
Accuracy is the problem with both overflow and underflow
because we have only a finite number of bits to represent
numbers that may actually require arbitrarily many bits
limited precision ⇒ rounding ⇒ rounding error
IEEE 754 keeps two extra bits, guard and round
four rounding modes
positive divided by zero yields infinity
zero divided by zero yields not a number (NaN)
other complexities
Implementing the standard can be tricky
Not implementing the standard can be even worse
see text for discussion of Pentium bug!
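Several of these complexities are visible from Python, whose float type is an IEEE 754 double on common platforms. This is a small illustration, not a survey of the standard. Python raises ZeroDivisionError for float division by zero instead of returning infinity or NaN, so the NaN below is produced another way.

```python
import math

# Overflow: the result is too large for a double and becomes infinity.
overflowed = 1e308 * 10

# Underflow: half of the smallest positive double rounds to zero.
underflowed = 5e-324 / 2

# Rounding error: 0.1, 0.2, and 0.3 are not exactly representable.
rounding_gap = (0.1 + 0.2) - 0.3

# Not a number: infinity minus infinity has no meaningful value.
not_a_number = math.inf - math.inf

print(overflowed, underflowed, rounding_gap, not_a_number)
```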
MIPS Floating Point
MIPS has a floating point coprocessor (numbered 1, SPIM) with
thirty-two 32-bit registers $f0 - $f31. Two of these are required
to hold doubles. Floating point instructions must use only even-
numbered registers (including those operating on single floats).
SPIM simulates MIPS floating point.
Other instructions…
Summary
Computer arithmetic is constrained by limited precision
Bit patterns have no inherent meaning but standards do exist:
two’s complement
IEEE 754 floating point
Computer instructions determine meaning of the bit patterns.
Performance and accuracy are important so there are many
complexities in real machines (i.e., algorithms and
implementation)