Untitled Presentation
Computer Architecture
Fall 2004
0111 1111 1111 1111 1111 1111 1111 1110 (two) = +2,147,483,646 (ten)
0111 1111 1111 1111 1111 1111 1111 1111 (two) = +2,147,483,647 (ten)
1000 0000 0000 0000 0000 0000 0000 0000 (two) = –2,147,483,648 (ten)   minint
1000 0000 0000 0000 0000 0000 0000 0001 (two) = –2,147,483,647 (ten)
...
1111 1111 1111 1111 1111 1111 1111 1110 (two) = –2 (ten)
1111 1111 1111 1111 1111 1111 1111 1111 (two) = –1 (ten)
...
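A minimal Python sketch of how the 32-bit patterns in the table map to signed values, assuming the usual two's complement rule that a set bit 31 subtracts 2^32 from the unsigned value. The helper name to_signed32 and the variable name maxint are illustrative and do not appear in the slides.

```python
def to_signed32(bits: str) -> int:
    """Interpret a 32-bit pattern (spaces allowed) as a two's complement integer."""
    raw = int(bits.replace(" ", ""), 2)   # unsigned value, 0 .. 2**32 - 1
    # If bit 31 (the sign bit) is set, the pattern represents raw - 2**32.
    return raw - (1 << 32) if raw & (1 << 31) else raw

maxint = to_signed32("0111 1111 1111 1111 1111 1111 1111 1111")
minint = to_signed32("1000 0000 0000 0000 0000 0000 0000 0000")
minus_one = to_signed32("1111 1111 1111 1111 1111 1111 1111 1111")
print(maxint, minint, minus_one)
```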
A      0111  →    0111
B    – 0110  →  + 1010
[Figure: most significant 1-bit FA cell, with inputs A31, B31, and c31, sum output S31, and c32 = carry_out]
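The 4-bit example above (0111 – 0110 becoming 0111 + 1010) can be sketched in Python as follows. This is a sketch under the assumption that subtraction is done by inverting every bit of B and supplying 1 as the low-order carry_in. The function name subtract and its width parameter are illustrative only.

```python
def subtract(a: int, b: int, width: int = 4) -> tuple[int, int]:
    """Return (result, carry_out) of a - b computed as a + ~b + 1 on `width` bits."""
    mask = (1 << width) - 1
    b_inverted = ~b & mask          # invert every bit of B
    total = a + b_inverted + 1      # the +1 plays the role of the low-order carry_in
    return total & mask, total >> width

result, carry_out = subtract(0b0111, 0b0110)
print(f"{result:04b}", carry_out)   # 7 - 6
```

Inverting 0110 and adding 1 gives 1010, the second operand shown in the example.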
❑ Modify the adder to handle logic operations
[Figure: 1-bit ALU cell with inputs A, B, add/sub, carry_in, and op; a 1-bit FA; outputs result and carry_out]
CSE431 L03 MIPS ALU Design Review. Irwin, PSU, 2004
Modifying the ALU Cell for slt
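[Figure: 1-bit ALU cell with an added less input; inputs A, B, less, add/sub, carry_in, and op; a 1-bit FA; outputs result and carry_out]
❑ Make the result 1 if the subtraction yields a negative result
● if $s0<$s1 set $t0=1
[Figure: multi-bit ALU built from these cells, showing A1, B1, less, and result1, constant 0 on less inputs, and a zero output]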
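A minimal Python sketch of the slt rule stated above: subtract, then use the sign bit of the difference as the result. It assumes 32-bit operands and, like the bullet above, ignores the case where the subtraction itself overflows. The function name slt mirrors the MIPS mnemonic, while the width parameter is an illustrative addition.

```python
def slt(s0: int, s1: int, width: int = 32) -> int:
    """Return 1 if s0 - s1 has its sign bit set, else 0 (overflow is ignored)."""
    mask = (1 << width) - 1
    a = s0 & mask                            # view s0 as a width-bit pattern
    diff = (a + (~s1 & mask) + 1) & mask     # s0 - s1 as s0 + ~s1 + 1
    return diff >> (width - 1)               # most significant (sign) bit

print(slt(3, 5), slt(5, 3), slt(4, 4), slt(-1, 1))
```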
  0 1 1 1                     1 0
    0 1 1 1    7                1 1 0 0   –4
  + 0 0 1 1    3              + 1 0 1 1   –5
    1 0 1 0   –6                0 1 1 1    7
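The two 4-bit sums above both overflow. A common detection rule, which the slide text does not spell out and is assumed here, is that overflow occurs when the carry into the most significant bit differs from the carry out of it. The sketch below ripples through the bits and applies that rule; the function name and the shape of its return tuple are illustrative.

```python
def add_with_overflow(a: int, b: int, width: int = 4):
    """Return (sum_bits, carry_into_msb, carry_out, overflow) for a width-bit add."""
    carry = 0
    total = 0
    carry_into_msb = 0
    for i in range(width):
        if i == width - 1:
            carry_into_msb = carry           # carry arriving at the sign bit
        ai, bi = (a >> i) & 1, (b >> i) & 1
        total |= (ai ^ bi ^ carry) << i      # sum bit i
        carry = (ai & bi) | (ai & carry) | (bi & carry)
    overflow = carry_into_msb ^ carry        # assumed rule: the two carries differ
    return total, carry_into_msb, carry, overflow

print(add_with_overflow(0b0111, 0b0011))     # 7 + 3
print(add_with_overflow(0b1100, 0b1011))     # -4 + -5
```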
Modifying the ALU for Overflow
❑ Modify the most significant cell to determine overflow output setting
[Figure: ALU with op and add/sub control inputs; cells A0/B0 → result0, A1/B1 → result1, ...; less inputs; and zero, set, and overflow outputs]
But What about Performance?
❑ The critical path (the logic path that determines the speed of the circuit) of a ripple-carry adder is down the carry chain
❑ For an n-bit adder where the carry logic per 1-bit ALU takes CP time units, that's n*CP
● For CP = 0.010ns (10ps) that's
- 0.16ns for 16-bits
...
● A 5GHz machine has a clock cycle time of 200ps, so by 32 bits (32*CP = 0.32ns) you've way missed your timing target!
[Figure: n-bit ripple-carry adder built from 1-bit ALUs 0 through n-1; each cell takes Ai, Bi, and carry_in and produces resulti and carry_out, and each carry_out feeds the next cell's carry_in]
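The n*CP estimate above can be checked with a few lines of Python. This is a sketch that takes CP = 10ps and the 5GHz clock from the slide; the function and constant names are illustrative, and integer picoseconds are used to avoid floating-point rounding.

```python
CP_PS = 10                      # carry delay per 1-bit ALU, in picoseconds
CYCLE_PS = 1_000_000 // 5_000   # 5 GHz clock: 1e6 ps per microsecond / 5000 cycles

def ripple_delay_ps(n_bits: int, cp_ps: int = CP_PS) -> int:
    """Worst-case carry-chain delay of an n-bit ripple-carry adder: n * CP."""
    return n_bits * cp_ps

for n in (16, 32, 64):
    delay = ripple_delay_ps(n)
    print(n, delay, "ps", "misses" if delay > CYCLE_PS else "fits", "a 200 ps cycle")
```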
Review: 1-bit Binary Adder
[Figure: 1-bit full adder with inputs A, B, and carry_in and outputs S and carry_out]
A  B  carry_in  carry_out  S  carry status
0  0  0         0          0  kill
0  0  1         0          1  kill
0  1  0         0          1  propagate
0  1  1         1          0  propagate
1  0  0         0          1  propagate
1  0  1         1          0  propagate
1  1  0         1          0  generate
1  1  1         1          1  generate
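The truth table above can be reproduced with the standard full-adder equations. The sketch below assumes S = A xor B xor carry_in and carry_out as the majority of the three inputs, and classifies each (A, B) pair as kill, propagate, or generate. The function names are illustrative.

```python
def full_adder(a: int, b: int, carry_in: int):
    """Return (carry_out, s) for one bit position."""
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)
    return carry_out, s

def carry_status(a: int, b: int) -> str:
    """Classify what this bit position does to an incoming carry."""
    if a & b:
        return "generate"    # carry_out is 1 regardless of carry_in
    if a ^ b:
        return "propagate"   # carry_out equals carry_in
    return "kill"            # carry_out is 0 regardless of carry_in

for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            print(a, b, c, *full_adder(a, b, c), carry_status(a, b))
```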
❑ Note that all the p’s and g’s can be formed in parallel,
then all the carries can be formed in parallel,
then all the sums can be formed in parallel.
● approximately 3*CP, so 0.030ns for any size adder
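The three phases named above (all p's and g's, then all carries, then all sums) can be sketched in Python as follows. This is an illustration, assuming g = A and B, p = A xor B, and a flat sum-of-products for each carry that depends only on the p's, g's, and c0. The loops run sequentially here, whereas the hardware evaluates each phase in parallel. The name lookahead_add is illustrative.

```python
def lookahead_add(a: int, b: int, c0: int = 0, width: int = 4):
    """Return (sum_bits, carry_out) using p/g, then carries, then sums."""
    # Phase 1: every g and p depends only on its own A and B bits.
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]

    # Phase 2: each carry is built from g, p, and c0 only, never from another carry.
    carries = [c0]
    for i in range(width):
        term, prod = g[i], p[i]
        for j in range(i - 1, -1, -1):
            term |= prod & g[j]      # p_i & ... & p_(j+1) & g_j
            prod &= p[j]
        term |= prod & c0            # p_i & ... & p_0 & c0
        carries.append(term)

    # Phase 3: every sum bit is p_i xor c_i.
    s = sum((p[i] ^ carries[i]) << i for i in range(width))
    return s, carries[width]

print(lookahead_add(0b0111, 0b0011))
```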
[Figure: parallel prefix adder structure; a row of p, g logic (1 unit delay) feeds a parallel prefix logic tree (1 unit delay per level) that produces c4, c3, c2, c1]
❑ Measures to consider
● tree cell height (time)
● tree cell area; number of prefix cells
● cell fan-in and fan-out
● max wiring length
● wiring congestion
● delay path variation (glitching)
A 4-bit Example
❑ The unrolled carry equations
c0 = carry_in
c1 = g0 | p0&c0
c2 = g1 | p1&c1 = g1 | p1&g0 | p1&p0&c0
c3 = g2 | p2&c2 = g2 | p2&g1 | p2&p1&g0 | p2&p1&p0&c0
c4 = g3 | p3&c3 = g3 | p3&g2 | p3&p2&g1 | p3&p2&p1&g0 | p3&p2&p1&p0&c0
reminder: g = g'' | p''&g',  p = p''&p'
Inputs: g3 g2 g1 g0, p3 p2 p1 p0, c0
Level 1 (four cells)
g = g3 | p3&g2              p = p3&p2
g = g2 | p2&g1              p = p2&p1
g = g1 | p1&g0              p = p1&p0
g = g0 | p0&c0 = c1         p = p0&c0
Level 2 (three cells)
g = g3 | p3&g2 | p3&p2&(g1 | p1&g0)             p = p3&p2&p1&p0
g = g2 | p2&g1 | p2&p1&(g0 | p0&c0) = c3        p = p2&p1&p0&c0
g = g1 | p1&g0 | p1&p0&c0 = c2                  p = p1&p0&c0
Level 3 (one cell)
g = g3 | p3&g2 | p3&p2&g1 | p3&p2&p1&g0 | p3&p2&p1&p0&c0 = c4      p = p3&p2&p1&p0&c0
Outputs: c4 c3 c2 c1
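The 4-bit example can be checked mechanically. The sketch below applies the reminder operator (g = g'' | p''&g', p = p''&p') in the cell grouping that the example suggests and compares the resulting carries with the recurrence c(i+1) = gi | pi&ci for every input combination. Treating c0 as the pair (c0, c0) follows the p = p0&c0 entries in the example. All function and variable names are illustrative.

```python
from itertools import product

def combine(upper, lower):
    """Prefix cell: merge an upper (g'', p'') pair with a lower (g', p') pair."""
    g_up, p_up = upper
    g_lo, p_lo = lower
    return (g_up | (p_up & g_lo), p_up & p_lo)

def prefix_carries(g, p, c0):
    """g[i], p[i] are the bit-i generate/propagate signals. Returns [c1, c2, c3, c4]."""
    cin = (c0, c0)
    bit = list(zip(g, p))
    # First group of cells: adjacent pairs
    x32 = combine(bit[3], bit[2])
    x21 = combine(bit[2], bit[1])
    x10 = combine(bit[1], bit[0])
    x0c = combine(bit[0], cin)      # g output is c1
    # Second group: spans of up to four positions
    x30 = combine(x32, x10)
    x2c = combine(x21, x0c)         # g output is c3
    x1c = combine(x10, cin)         # g output is c2
    # Final cell
    x3c = combine(x30, cin)         # g output is c4
    return [x0c[0], x1c[0], x2c[0], x3c[0]]

def ripple_carries(g, p, c0):
    """Reference: c(i+1) = g_i | p_i & c_i, one bit at a time."""
    carries, c = [], c0
    for gi, pi in zip(g, p):
        c = gi | (pi & c)
        carries.append(c)
    return carries

mismatches = sum(
    prefix_carries(v[:4], v[4:8], v[8]) != ripple_carries(v[:4], v[4:8], v[8])
    for v in product((0, 1), repeat=9)
)
print("mismatches:", mismatches)
```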
A 16-bit Kogge-Stone Parallel Prefix Adder
[Figure: 16-bit Kogge-Stone prefix network. Inputs: g15..g0, p15..p0, and cin. Rows of prefix cells: 16, 15, 13, 9, and 1. Outputs: c16..c1.]
Parallel Prefix Computation
T = log2 n
Tadd = Tsetup + (log2 n) t_cell + Tsum = 0.060ns for 32-bits, where t_cell is the delay of one prefix cell
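A minimal Python sketch of a Kogge-Stone style prefix adder, assuming the carry-in is handled as an extra lowest prefix position with generate = cin and propagate = 0. That is one common way to fold cin into the tree, not necessarily the exact wiring of the figure. Each level combines every position with the one 1, 2, 4, 8, ... places below it. The function name and the returned cells_per_level list are illustrative.

```python
def kogge_stone_add(a: int, b: int, cin: int = 0, width: int = 16):
    """Return (sum_bits, carry_out, cells_per_level) for a width-bit add."""
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]

    # Position 0 carries cin; positions 1..width carry bits 0..width-1.
    G = [cin] + g
    P = [0] + p
    cells_per_level = []
    dist = 1
    while dist < len(G):
        G_new, P_new = G[:], P[:]
        for i in range(dist, len(G)):
            # Prefix cell: g = g'' | p'' & g',  p = p'' & p'
            G_new[i] = G[i] | (P[i] & G[i - dist])
            P_new[i] = P[i] & P[i - dist]
        cells_per_level.append(len(G) - dist)
        G, P = G_new, P_new
        dist *= 2

    # After the tree, G[i] is the carry into bit i (G[0] = cin, G[width] = carry out).
    s = sum((p[i] ^ G[i]) << i for i in range(width))
    return s, G[width], cells_per_level

print(kogge_stone_add(0x1234, 0x0FCD))
```

With width = 16 the sketch uses four levels to produce c1 through c15, in line with T = log2 n, plus a single extra cell for the final carry out.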
Shift Operations
❑ Also need operations to pack and unpack 8-bit
characters into 32-bit words
❑ Shifts move all the bits in a word left or right
sll $t2, $s0, 8 #$t2 = $s0 << 8 bits
srl $t2, $s0, 8 #$t2 = $s0 >> 8 bits
op rs rt rd shamt funct
❑ Notice that a 5-bit shamt field is enough to shift a 32-bit value 2^5 – 1 or 31 bit positions
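A minimal Python sketch of using sll and srl to pack and unpack 8-bit characters in a 32-bit word. It models the 5-bit shamt limit with a range check and uses OR and AND masks to combine and isolate bytes. Placing the first character in the most significant byte is an assumption made for illustration, and the helper names pack and unpack are not from the slides.

```python
MASK32 = 0xFFFFFFFF

def sll(value: int, shamt: int) -> int:
    """Shift left logical on a 32-bit word; shamt must fit in 5 bits."""
    if not 0 <= shamt <= 31:
        raise ValueError("shamt must be in 0..31 (5-bit field)")
    return (value << shamt) & MASK32

def srl(value: int, shamt: int) -> int:
    """Shift right logical on a 32-bit word; zeros enter from the left."""
    if not 0 <= shamt <= 31:
        raise ValueError("shamt must be in 0..31 (5-bit field)")
    return (value & MASK32) >> shamt

def pack(chars: str) -> int:
    """Pack up to four 8-bit characters into one word, first character highest."""
    word = 0
    for ch in chars:
        word = sll(word, 8) | (ord(ch) & 0xFF)
    return word

def unpack(word: int) -> str:
    """Extract the four bytes of a word, most significant first."""
    return "".join(chr(srl(word, s) & 0xFF) for s in (24, 16, 8, 0))

word = pack("MIPS")
print(hex(word), unpack(word))
```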
❑ Reminders
● HW1 due September 21st
● Evening midterm exam scheduled
- Tuesday, October 19th, 20:15 to 22:15, Location 112 Kern
- Please let me know ASAP (via email) if you have a conflict