Week 6 - Lecture 6 - Arithmetic Processing Unit Implementation
[Figure: 32-bit ripple-carry adder. A chain of 1-bit full adders in which each stage's Cout feeds the next stage's Cin (carries C0 ... C30), producing sum bits S0 ... S31.]
❑ Principles
➢ Generate a carry out if 𝑎 and 𝑏 are both 1; 𝐶𝑖𝑛 does not matter.
➢ Propagate the carry in to the carry out if 𝑎 or 𝑏 is 1.
➢ Local decisions based on 𝑎 and 𝑏 only: generate 𝐺 = 𝑎𝑏, propagate 𝑃 = 𝑎 + 𝑏 (logical OR).
➢ Carry out: 𝐶𝑜𝑢𝑡 = 𝐺 + 𝑃 · 𝐶𝑖𝑛.
Carry Look Ahead Adder
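[Figure: 4-bit carry look-ahead adder. Inputs A3 ... A0, B3 ... B0 and C0; each bit position produces p and g signals (p3 g3 ... p0 g0), from which the carries C1 ... C3 are derived; outputs S3 ... S0.]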
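The generate/propagate principle can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `cla_add4` is hypothetical, the width is fixed at 4 bits as in the figure, and propagate is taken as 𝑃 = 𝑎 + 𝑏 (OR) as defined on the slide. Every carry is written out as a flat expression of g, p and C0, so no carry waits on the previous one.

```python
def cla_add4(a, b, c0=0):
    """Add two 4-bit values with carry look-ahead; returns (sum, carry_out)."""
    a_bits = [(a >> i) & 1 for i in range(4)]
    b_bits = [(b >> i) & 1 for i in range(4)]
    g = [x & y for x, y in zip(a_bits, b_bits)]  # generate:  G = ab
    p = [x | y for x, y in zip(a_bits, b_bits)]  # propagate: P = a + b

    # Each carry depends only on g, p and c0 (no rippling between stages).
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))

    carries = [c0, c1, c2, c3]
    sum_bits = [a_bits[i] ^ b_bits[i] ^ carries[i] for i in range(4)]
    total = sum(bit << i for i, bit in enumerate(sum_bits))
    return total, c4
```

For example, `cla_add4(0b1011, 0b0110)` adds 11 and 6: the 4-bit sum is 0001 with a carry out of 1.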
❑ Observations
➢ m-bit multiplicand x n-bit multiplier = (m+n)-bit product
➢ Accomplished via shifting and addition.
➢ Consumes more time and more chip area than addition
Sequential Unsigned Multiplication
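❑ Example: 1000₂ × 1001₂ = 1001000₂

Iteration   Multiplicand   Multiplier   Product
Initially                               0
1           1000           1001         0 + 1000 = 1000
2           10000          100          1000 (multiplier bit is 0, no add)
3           100000         10           1000 (multiplier bit is 0, no add)
4           1000000        1            1000 + 1000000 = 1001000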
❖ If each step took a clock cycle, this algorithm would require almost 100 clock
cycles to multiply two 32-bit numbers.
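A minimal Python sketch of this shift-and-add loop, for illustration only. The function name `shift_add_multiply` and the `n_bits` parameter are assumptions, and Python integers stand in for the hardware registers.

```python
def shift_add_multiply(multiplicand, multiplier, n_bits=32):
    """Unsigned multiply by testing one multiplier bit per iteration."""
    product = 0
    for _ in range(n_bits):
        if multiplier & 1:           # least-significant multiplier bit is 1
            product += multiplicand  # add the (shifted) multiplicand
        multiplicand <<= 1           # shift multiplicand left
        multiplier >>= 1             # shift multiplier right
    return product
```

With the operands from the worked example, `shift_add_multiply(0b1000, 0b1001, n_bits=4)` returns `0b1001000` (8 × 9 = 72).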
Refined Algorithm & Hardware
❑ Perform shifts in parallel
➢ Initialize HI = 0, LO = Multiplier
➢ 32-bit ALU produces a 32-bit result + Carry bit
➢ Final Product = HI and LO registers
❑ Algorithm (flowchart)
1. Start: HI = 0, LO = Multiplier
2. Test LO[0]: if LO[0] = 1, HI = HI + Multiplicand; if LO[0] = 0, skip the addition
3. Shift (Carry, HI, LO) right 1 bit
4. 32nd repetition? No: go back to step 2. Yes: done
Refined algorithm: Example
❑ Consider: 1100₂ × 1101₂ (12 × 13)
➢ 4-bit multiplicand and multiplier → 4-bit adder produces a 5-bit sum (with carry)
➢ Product = 10011100₂ (156₁₀)
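The refined algorithm can be traced with a short Python sketch. It is an illustration under assumptions: the name `refined_multiply`, the `n_bits` parameter and the returned trace are hypothetical, and the registers HI and LO are modelled as plain integers masked to `n_bits`.

```python
def refined_multiply(multiplicand, multiplier, n_bits=4):
    """Unsigned multiply using HI/LO and a right shift of (carry, HI, LO)."""
    mask = (1 << n_bits) - 1
    hi, lo = 0, multiplier & mask
    trace = []
    for _ in range(n_bits):
        carry = 0
        if lo & 1:
            total = hi + (multiplicand & mask)  # n-bit add gives n+1 bits
            carry = total >> n_bits
            hi = total & mask
        # Shift (carry, HI, LO) right by one bit.
        lo = (lo >> 1) | ((hi & 1) << (n_bits - 1))
        hi = (hi >> 1) | (carry << (n_bits - 1))
        trace.append((hi, lo))
    return (hi << n_bits) | lo, trace
```

For 1100₂ × 1101₂ the (HI, LO) pairs after each iteration are (0110, 0110), (0011, 0011), (0111, 1001) and (1001, 1100), giving the product 10011100₂.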
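Signed Integer Multiplication (Paper & Pencil)
❑ Case 1: Positive Multiplier

    Multiplicand         1100 = -4
    Multiplier         × 0101 = +5
                     --------
                     11111100    (sign-extended multiplicand, multiplier bit 0)
                     111100      (sign-extended multiplicand shifted left 2, multiplier bit 2)
                     --------
    Product          11101100 = -20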
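[Figure: signed multiplication hardware. Same loop as the unsigned multiplier, but the sign bit (instead of the carry) is shifted into HI when (HI, LO) shifts right; the loop ends after the 32nd repetition.]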
Signed Multiplication: Example
❑ Consider: 1010₂ × 1111₂ (−6 × −1)
➢ Iteration 1: No overflow → Extend sign bit (1 → 1)
➢ Iterations 2, 3: Overflow → Invert sign bit (0 → 1)
➢ Last iteration: No overflow (why?), add 2's complement of Multiplicand
➢ Product = 00000110₂ (+6)
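The sign handling in this example can be sketched in Python as follows. This is a hedged illustration, not the slide's hardware: the name `signed_multiply` and the `n_bits` parameter are assumptions, operands are passed as two's-complement bit patterns, and the last iteration subtracts the multiplicand, which is equivalent to adding its 2's complement.

```python
def signed_multiply(multiplicand, multiplier, n_bits=4):
    """Multiply two n-bit two's-complement patterns; returns a 2n-bit pattern."""
    mask = (1 << n_bits) - 1
    top = n_bits - 1                      # position of the sign bit
    m = multiplicand & mask
    hi, lo = 0, multiplier & mask
    for i in range(n_bits):
        if lo & 1:
            subtract = (i == n_bits - 1)  # multiplier sign bit has negative weight
            total = (hi - m if subtract else hi + m) & mask
            sa, sb, sr = hi >> top, m >> top, total >> top
            if subtract:
                overflow = (sa != sb) and (sr != sa)
            else:
                overflow = (sa == sb) and (sr != sa)
            sign = sr ^ 1 if overflow else sr  # on overflow, invert the sign bit
            hi = total
        else:
            sign = hi >> top                   # no add: just extend the sign
        # Shift (sign, HI, LO) right by one bit.
        lo = (lo >> 1) | ((hi & 1) << top)
        hi = (hi >> 1) | (sign << top)
    return (hi << n_bits) | lo
```

`signed_multiply(0b1010, 0b1111)` returns `0b00000110`, and the Case 1 operands `signed_multiply(0b1100, 0b0101)` return `0b11101100` (−20).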
Faster Integer Multiplier
❑ Can be pipelined
➢ Several multiplications performed in parallel
MIPS multiplication
❑ Two 32-bit registers for product
➢ HI: most-significant 32 bits
➢ LO: least-significant 32 bits
❑ Instructions
➢ mult rs, rt / multu rs, rt
▪ 64-bit product in HI/LO
➢ mfhi rd / mflo rd
▪ Move from HI/LO to rd
▪ Can test HI value to see if product overflows 32 bits
➢ mul rd, rs, rt
▪ Least-significant 32 bits of product → rd
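To make the HI/LO split and the overflow test concrete, here is a small Python model. It is a sketch only: the Python functions `mult`, `multu` and `fits_in_32_bits_signed` are hypothetical helpers that mimic the register contents, not the instructions themselves, and register values are passed as 32-bit patterns.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def multu(rs, rt):
    """Unsigned multiply: returns (HI, LO) of the 64-bit product."""
    product = (rs & MASK32) * (rt & MASK32)
    return (product >> 32) & MASK32, product & MASK32

def mult(rs, rt):
    """Signed multiply: returns (HI, LO) of the 64-bit product."""
    product = (to_signed32(rs) * to_signed32(rt)) & ((1 << 64) - 1)
    return (product >> 32) & MASK32, product & MASK32

def fits_in_32_bits_signed(hi, lo):
    """Signed product fits in 32 bits when HI is the sign extension of LO."""
    return hi == (MASK32 if lo & 0x80000000 else 0)
```

For instance, `mult(0xFFFFFFFF, 2)` (that is, −1 × 2) gives HI = 0xFFFFFFFF and LO = 0xFFFFFFFE, which fits in 32 bits, while `mult(0x40000000, 4)` gives HI = 1 and LO = 0, which does not. For `multu`, the product fits when HI is 0.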
Unsigned Division (Paper & Pencil)
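❑ Looks a lot like a multiplier!
➢ Same hardware can be used for both
❑ Algorithm (flowchart)
➢ The (HI, LO) register pair initially holds the dividend
1. Shift (HI, LO) left 1 bit
2. Difference = HI − Divisor (sub)
3. Test the sign of Difference: if ≥ 0, Remainder (HI) = Difference and set lsb of Quotient (LO); if < 0, leave HI unchanged
4. 32nd repetition? No: go back to step 1. Yes: done, with HI = remainder and LO = quotient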
Unsigned Integer Division Example
❑ Consider: 1110₂ / 0011₂ (4-bit dividend & divisor)
➢ Requires 4-bit ALU as well as 4-bit registers for remainder & quotient
➢ Result: Quotient = 0100₂, Remainder = 0010₂
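The division example can be reproduced with a short Python sketch of a shift-and-subtract loop. It assumes HI starts at 0 and LO starts as the dividend; the name `shift_subtract_divide` and the `n_bits` parameter are hypothetical, and the divide-by-zero check is an addition that the hardware loop itself does not perform.

```python
def shift_subtract_divide(dividend, divisor, n_bits=4):
    """Unsigned division; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    mask = (1 << n_bits) - 1
    hi, lo = 0, dividend & mask          # HI: remainder, LO: dividend/quotient
    for _ in range(n_bits):
        # Shift (HI, LO) left by one bit.
        hi = (hi << 1) | (lo >> (n_bits - 1))
        lo = (lo << 1) & mask
        difference = hi - divisor
        if difference >= 0:
            hi = difference              # keep the difference as the remainder
            lo |= 1                      # set lsb of the quotient
    return lo, hi
```

`shift_subtract_divide(0b1110, 0b0011)` returns `(0b0100, 0b0010)`: 14 / 3 is 4 with remainder 2.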
MIPS division
❑ Use HI/LO registers for result
➢ HI: 32-bit remainder
➢ LO: 32-bit quotient
❑ Instructions
➢ div rs, rt / divu rs, rt
▪ No overflow or divide-by-0 checking; software must perform checks if required
➢ Use mfhi / mflo to access result
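The checks that software would have to perform can be sketched in Python. This is an illustration under assumptions: `div_checked` is a hypothetical helper working on signed Python integers, it models signed `div` as truncating the quotient toward zero with the remainder taking the sign of the dividend, and the only overflow case considered is −2³¹ / −1.

```python
def div_checked(rs, rt):
    """Signed 32-bit division with explicit checks; returns (HI, LO)."""
    if rt == 0:
        raise ZeroDivisionError("divide by zero: hardware would not trap")
    if rs == -2**31 and rt == -1:
        raise OverflowError("quotient +2**31 does not fit in 32 bits")
    quotient = abs(rs) // abs(rt)        # magnitude, truncated toward zero
    if (rs < 0) != (rt < 0):
        quotient = -quotient
    remainder = rs - quotient * rt       # takes the sign of the dividend
    return remainder, quotient           # HI = remainder, LO = quotient
```

For example, `div_checked(-7, 2)` returns `(-1, -3)`, whereas Python's own `divmod(-7, 2)` floors and gives a quotient of −4.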
Floating-Point Addition
❑ Consider 1.000₂ × 2⁻¹ + (−1.110₂ × 2⁻²) (i.e. 0.5₁₀ + (−0.4375₁₀))
❑ Algorithm
1. Align binary points
▪ Shift number with smaller exponent: 1.000₂ × 2⁻¹ + (−0.111₂ × 2⁻¹)
2. Add significands
▪ 1.000₂ × 2⁻¹ + (−0.111₂ × 2⁻¹) = 0.001₂ × 2⁻¹
3. Normalize result & check for over/underflow
▪ 1.000₂ × 2⁻⁴
4. Round and renormalize if necessary
▪ 1.000₂ × 2⁻⁴ (no change) = 0.0625₁₀
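The four steps can be followed in a small Python sketch. It is a simplification under stated assumptions: the name `fp_add` is hypothetical, significands are signed integers scaled by 2³ (so 1.000₂ is 8 and −1.110₂ is −14), exact `Fraction` arithmetic stands in for the guard and round bits of real hardware, and exponent overflow/underflow is not modelled.

```python
from fractions import Fraction

def fp_add(sig_a, exp_a, sig_b, exp_b, frac_bits=3):
    """Add two binary FP values; returns (significand, exponent)."""
    one = 1 << frac_bits                  # integer encoding of 1.000
    # Step 1: align binary points by shifting the smaller-exponent operand.
    if exp_a < exp_b:
        sig_a, exp_a, sig_b, exp_b = sig_b, exp_b, sig_a, exp_a
    aligned_b = Fraction(sig_b, 1 << (exp_a - exp_b))
    # Step 2: add significands.
    total, exp = sig_a + aligned_b, exp_a
    if total == 0:
        return 0, 0
    # Step 3: normalize so that 1.0 <= |significand| < 2.0.
    while abs(total) >= 2 * one:
        total, exp = total / 2, exp + 1
    while abs(total) < one:
        total, exp = total * 2, exp - 1
    # Step 4: round to frac_bits fraction bits, renormalize if needed.
    rounded = round(total)                # ties round to even
    if abs(rounded) == 2 * one:
        rounded, exp = rounded // 2, exp + 1
    return rounded, exp
```

`fp_add(8, -1, -14, -2)` returns `(8, -4)`, i.e. 1.000₂ × 2⁻⁴ = 0.0625₁₀, matching the worked example.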
FP Arithmetic Hardware
❑ FP multiplier is of similar complexity to FP adder
➢ But uses a multiplier for significands instead of an adder
❑ FP hardware is coprocessor 1
➢ Adjunct processor that extends the ISA
➢ Separate FP registers
➢ FP instructions operate only on FP registers; programs generally don't do integer ops on FP data, or vice versa
FP Instructions in MIPS
❑ FP load and store instructions
▪ lwc1, ldc1, swc1, sdc1
❑ Single-precision arithmetic
▪ add.s, sub.s, mul.s, div.s
❑ Double-precision arithmetic
▪ add.d, sub.d, mul.d, div.d
❑ MIPS ISA
➢ Core instructions: 54 most frequently used
▪ 100% of SPECINT, 97% of SPECFP
➢ Other instructions: less frequent