Week 6 - Lecture 6 - Arithmetic Processing Unit Implementation

This document discusses the implementation of arithmetic processing units in computer architecture. It begins with a review of integer addition, subtraction, and handling overflow. It then covers integer multiplication and division. The document discusses different approaches to implementing adders like ripple-carry adders and carry-lookahead adders. It also covers signed and unsigned integer multiplication algorithms and hardware implementations. The goal is to provide an overview of how arithmetic operations are performed at the hardware level in computer processors.

Uploaded by

Việt Hưng
Copyright
© All Rights Reserved

ELT3047 Computer Architecture

Lecture 6: Arithmetic processing unit implementation

Hoang Gia Hung


Faculty of Electronics and Telecommunications
University of Engineering and Technology, VNU Hanoi
Last lecture review
❑ Demonstration of design principles for MIPS ISA
➢ Storage model
➢ Addressing mode
➢ Instruction encoding
➢ Compiler support* (self-study)
❑ MIPS instruction set
➢ Arithmetic & logical instructions
➢ Data transfer instructions
➢ Control instructions
➢ if-then, loops
➢ Procedure call

❑ Brief introduction to compiler technology


❑ Today’s lecture: implementation of MIPS ALU
➢ First part of the MIPS datapath in incremental featurism approach.
Arithmetic for Computers
❑ Operations on integers
➢ Addition and subtraction
➢ Multiplication and division
➢ Dealing with overflow

❑ Operations on real numbers


➢ Floating point operations
➢ Rounding error

❑ Arithmetic for multimedia (not covered in this course)


➢ Performing simultaneous operations on short vectors, e.g. a 128-bit adder
= four 32-bit adds. Also called data-level parallelism (DLP), vector
parallelism, or Single Instruction, Multiple Data (SIMD)
➢ Saturating operations (e.g. clipping in audio, saturation in video)
Integer Addition and Subtraction
❑ Digits are added bit by bit from right to left, with carries
passed to the next digit to the left
➢ Example: 7 + 6

❑ Subtraction: the operand to be subtracted is negated (two's complement) and then added; any carry-out is discarded.
➢ Example: 7 – 6 = 7 + (–6)
+7: 0111
–6: 1010
+1: 1 0001 (discard carry-out)
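
The 4-bit example above can be reproduced with a small Python sketch. The helper names (`add_nbit`, `negate`, `sub_nbit`) and the default width of 4 bits are illustrative assumptions, not part of the lecture.

```python
def add_nbit(a, b, n=4):
    """Add two n-bit patterns; return (n-bit sum, carry-out)."""
    mask = (1 << n) - 1
    total = (a & mask) + (b & mask)
    return total & mask, total >> n


def negate(x, n=4):
    """Two's complement negation: invert all bits, then add 1."""
    mask = (1 << n) - 1
    return (~x + 1) & mask


def sub_nbit(a, b, n=4):
    """Subtract by adding the negated operand; the carry-out is discarded."""
    result, _carry_out = add_nbit(a, negate(b, n), n)
    return result
```

Here `negate(0b0110)` gives the pattern 1010 used for -6 on the slide, and `add_nbit(0b0111, 0b1010)` returns the sum 0001 together with the carry-out that is thrown away.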
Overflow
❑ Occurs if the result is outside the signed integer range
➢ Example: 7 - (-6) = 7 + 6
+7: 0111
+6: 0110
+13: 1101 = -8 + 4 + 1 = -3 (two’s complement)
➢ Can occur only when adding operands with the same sign, or subtracting
operands with different signs.
➢ Detection: compare the MSBs (sign bits) of the two operands and the result (3-bit
comparison), or compare the carry into and the carry out of the MSB (2-bit comparison).

❑ Dealing with overflow
➢ Some languages (e.g., C) ignore overflow
➢ Other languages (e.g., Ada, Fortran) require raising an exception → the
computer jumps to a predefined address to invoke the appropriate routine.
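
Both detection rules can be checked against each other in a short Python sketch. The function name `add_with_overflow` and the 4-bit default width are assumptions made for illustration.

```python
def add_with_overflow(a, b, n=4):
    """Add two n-bit two's complement patterns.

    Returns (sum, overflow_by_sign_rule, overflow_by_carry_rule).
    """
    mask = (1 << n) - 1
    a &= mask
    b &= mask
    s = (a + b) & mask

    # Rule 1: operands have the same sign but the result's sign differs.
    sa, sb, ss = a >> (n - 1), b >> (n - 1), s >> (n - 1)
    by_sign = (sa == sb) and (ss != sa)

    # Rule 2: carry into the MSB differs from carry out of the MSB.
    low_mask = mask >> 1
    carry_in = ((a & low_mask) + (b & low_mask)) >> (n - 1)
    carry_out = (a + b) >> n
    by_carry = carry_in != carry_out

    return s, by_sign, by_carry
```

For the slide's example, `add_with_overflow(0b0111, 0b0110)` yields the pattern 1101 with both rules reporting overflow.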
1-Bit Adders
[Figure: half adder (inputs A, B; outputs S, Cout) and full adder (inputs A, B, Cin; outputs S, Cout), annotated with delays 𝜏 and 2𝜏]

Half adder truth table
A B | Cout S
0 0 |  0   0
0 1 |  0   1
1 0 |  0   1
1 1 |  1   0

S = A ⊕ B
Cout = AB

Full adder truth table
Cin A B | Cout S
 0  0 0 |  0   0
 0  0 1 |  0   1
 0  1 0 |  0   1
 0  1 1 |  1   0
 1  0 0 |  0   1
 1  0 1 |  1   0
 1  1 0 |  1   0
 1  1 1 |  1   1

S = A ⊕ B ⊕ Cin
Cout = AB + ACin + BCin

❖ Half adder has 2 inputs; in principle HA is the same as FA with Cin set to 0.
❖ Each FA has a finite delay = 2 gate delays (2𝜏)
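
The two truth tables follow directly from the Boolean equations. A minimal Python sketch (the function names are illustrative) that models each adder on single bits:

```python
def half_adder(a, b):
    """Return (S, Cout) for two input bits."""
    return a ^ b, a & b


def full_adder(a, b, cin):
    """Return (S, Cout) for two input bits plus a carry-in."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout
```

Setting `cin` to 0 in `full_adder` reproduces `half_adder`, which is the equivalence noted on the slide.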
Multibit Adders
[Figure: N-bit carry propagate adder with N-bit inputs A and B, carry-in Cin, N-bit sum S and carry-out Cout]

❑ Types of carry propagate adders (CPAs):


➢ Ripple-carry (slow)
➢ Carry-select (fast)
➢ Carry-lookahead (faster)

❑ Trade-off: faster adders require more hardware


Ripple-Carry Adder

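[Figure: 32-bit ripple-carry adder. Full adders for bits 0 to 31 are chained right to left: bit 0 takes A0, B0 and Cin and produces S0 and C0; bit i takes Ai, Bi and C(i-1); bit 31 produces S31 and Cout.]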

❑ Chain 1-bit adders together
➢ All input bits available at the same time
➢ Size/complexity: O(n) (= n · FA_size)

❑ Carry ripples through entire chain
➢ Carry ripples through all FAs from right to left
➢ Critical path delay: O(n) (= n · t_FA = 2n𝜏)
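
A behavioural Python sketch of the chain, assuming a bit-level `full_adder` helper as defined by the full-adder equations (names and the default width of 32 are illustrative). The loop visits the bits from right to left, which mirrors why the critical path grows linearly with n.

```python
def full_adder(a, b, cin):
    """1-bit full adder: return (S, Cout)."""
    return a ^ b ^ cin, (a & b) | (a & cin) | (b & cin)


def ripple_carry_add(a, b, cin=0, n=32):
    """Add two n-bit numbers by chaining n full adders; return (sum, Cout)."""
    total = 0
    carry = cin
    for i in range(n):  # bit 0 first: each stage waits for the previous carry
        bit, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= bit << i
    return total, carry
```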
Carry-Select Adder

❑ Principle: speculative Cin – compute both, select one


➢ 3 adders operate in parallel
➢ Size/complexity: O(n) (example: 1.5n · FA_size + mux_size)
➢ Critical path delay: O(√n) (example: 0.5n · t_FA + mux_delay → n = 16: ≈ 16𝜏)
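
The "compute both, select one" principle can be sketched in Python for the two-block case. This is a behavioural model under the assumption that the adder is split into two halves; `add_block` and `carry_select_add` are hypothetical names, and each call to `add_block` stands in for one of the three adders.

```python
def add_block(a, b, cin, width):
    """One adder block: return (width-bit sum, carry-out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a, b, cin=0, n=16):
    """Two-block carry-select adder; return (n-bit sum, carry-out)."""
    half = n // 2
    hi_width = n - half
    lo_mask = (1 << half) - 1
    hi_mask = (1 << hi_width) - 1
    a_lo, b_lo = a & lo_mask, b & lo_mask
    a_hi, b_hi = (a >> half) & hi_mask, (b >> half) & hi_mask

    # The three adders below are independent, so hardware runs them in parallel.
    s_lo, c_lo = add_block(a_lo, b_lo, cin, half)
    s_hi0, c_hi0 = add_block(a_hi, b_hi, 0, hi_width)  # speculate carry-in = 0
    s_hi1, c_hi1 = add_block(a_hi, b_hi, 1, hi_width)  # speculate carry-in = 1

    # Multiplexer: the real carry from the lower half selects one result.
    s_hi, cout = (s_hi1, c_hi1) if c_lo else (s_hi0, c_hi0)
    return (s_hi << half) | s_lo, cout
```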
Carry Generate and Propagate

❑ Principles
➢ Generate a carry out if 𝑎 and 𝑏 are both 1, 𝐶𝑖𝑛 does not matter.
➢ Propagate a carry in to the carry out if 𝑎 or 𝑏 is 1.
➢ Local decisions based on 𝑎 and 𝑏 only: generate 𝐺 = 𝑎𝑏, propagate 𝑃 = 𝑎 + 𝑏 (+ denotes OR).
Carry Look Ahead Adder
[Figure: 4-bit carry-lookahead adder. Each bit position i takes Ai and Bi, outputs Si, and sends pi and gi to the CLL (carry look-ahead logic); the CLL takes C0 and produces C1, C2, C3 and C4.]

❑ Compute carry out (Cout) for 𝑘-bit blocks in parallel using generate and propagate signals:
𝑐1 = 𝐺0 + 𝑐0 𝑃0
𝑐2 = 𝐺1 + 𝐺0 𝑃1 + 𝑐0 𝑃0 𝑃1
𝑐3 = 𝐺2 + 𝐺1 𝑃2 + 𝐺0 𝑃1 𝑃2 + 𝑐0 𝑃0 𝑃1 𝑃2
𝑐4 = 𝐺3 + 𝐺2 𝑃3 + 𝐺1 𝑃2 𝑃3 + 𝐺0 𝑃1 𝑃2 𝑃3 + 𝑐0 𝑃0 𝑃1 𝑃2 𝑃3
➢ We can compute 𝑐𝑛 in O(log n) gate delays and O(n²) size → only
manageable for small n
➢ Given 𝑐𝑛 we can compute 𝑠𝑛 with a constant additional delay.
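
A Python sketch of a 4-bit block, assuming 𝐺 = 𝑎𝑏 and 𝑃 = 𝑎 OR 𝑏 per bit as defined earlier. Each carry is written as a flat sum of products of the inputs, so none of them depends on another carry; the function name `cla4` is illustrative.

```python
def cla4(a, b, c0=0):
    """4-bit carry-lookahead adder; return (4-bit sum, c4)."""
    abit = [(a >> i) & 1 for i in range(4)]
    bbit = [(b >> i) & 1 for i in range(4)]
    g = [abit[i] & bbit[i] for i in range(4)]  # generate
    p = [abit[i] | bbit[i] for i in range(4)]  # propagate

    # Every carry depends only on g, p and c0, so all can be formed in parallel.
    c1 = g[0] | (c0 & p[0])
    c2 = g[1] | (g[0] & p[1]) | (c0 & p[0] & p[1])
    c3 = g[2] | (g[1] & p[2]) | (g[0] & p[1] & p[2]) | (c0 & p[0] & p[1] & p[2])
    c4 = (g[3] | (g[2] & p[3]) | (g[1] & p[2] & p[3])
          | (g[0] & p[1] & p[2] & p[3])
          | (c0 & p[0] & p[1] & p[2] & p[3]))

    carries = [c0, c1, c2, c3]
    total = 0
    for i in range(4):
        total |= (abit[i] ^ bbit[i] ^ carries[i]) << i  # constant extra delay
    return total, c4
```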
Unsigned integer multiplication
❑ Paper and pencil example
Multiplicand        1000 = 8
Multiplier        × 1001 = 9
                    1000      (1 × multiplicand = multiplicand)
                   0000       (0 × multiplicand = 0)
                  0000
                 1000
Product          1001000 = 72

Binary multiplication is easy: each partial product is either 0 or the multiplicand.

❑ Observations
➢ m-bit multiplicand x n-bit multiplier = (m+n)-bit product
➢ Accomplished via shifting and addition.
➢ Consumes more time and more chip area than addition
Sequential Unsigned Multiplication
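Iteration              1         2         3         4
Multiplicand           1000      10000     100000    1000000
Multiplier           × 1001    × 100     × 10      × 1
Added to product       1000      0         0         1000000
Product                1000      1000      1000      1001000

(The product is initially 0; the multiplicand shifts left and the multiplier shifts right after each iteration.)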

❖ If each step took a clock cycle, this algorithm would require almost 100 clock
cycles to multiply two 32-bit numbers.
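
The iteration table corresponds to the following Python sketch of the first-version algorithm (test the multiplier's low bit, add, shift multiplicand left, shift multiplier right). The function name `multiply_v1` is an assumption for illustration; with three steps per iteration and 32 iterations, the cycle count approaches the figure quoted above.

```python
def multiply_v1(multiplicand, multiplier, n=32):
    """Sequential unsigned multiply of two n-bit values; returns the 2n-bit product."""
    product = 0
    for _ in range(n):
        if multiplier & 1:        # step 1: test the multiplier's LSB, add if set
            product += multiplicand
        multiplicand <<= 1        # step 2: shift the multiplicand left
        multiplier >>= 1          # step 3: shift the multiplier right
    return product
```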
Refined Algorithm & Hardware
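❑ Perform shifts in parallel
➢ Initialize HI = 0, LO = Multiplier
➢ 32-bit ALU produces a 32-bit result + carry bit
➢ Final product = HI and LO registers (64 bits)

Algorithm (flowchart):
1. Start: HI = 0, LO = Multiplier
2. If LO[0] = 1: HI = HI + Multiplicand (if LO[0] = 0, skip the addition)
3. Shift right (Carry, HI, LO) 1 bit
4. 32nd repetition? No → go to step 2; Yes → done

[Figure: refined multiplier datapath. The ALU adds the multiplicand to HI under control of LO[0]; the carry, HI and LO are shifted right together, with add, shift right and write control signals.]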
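
A Python sketch of the refined algorithm, modelling HI, LO and the carry bit explicitly. The function name `multiply_refined` and its parameters are illustrative assumptions; `n` is the operand width.

```python
def multiply_refined(multiplicand, multiplier, n=32):
    """Refined unsigned multiply; returns (HI, LO) holding the 2n-bit product."""
    mask = (1 << n) - 1
    hi, lo = 0, multiplier & mask
    for _ in range(n):
        carry = 0
        if lo & 1:                                  # LO[0] = 1: add multiplicand to HI
            total = hi + (multiplicand & mask)
            hi, carry = total & mask, total >> n    # n-bit sum plus carry bit
        # Shift (carry, HI, LO) right by one bit as a single register.
        lo = (lo >> 1) | ((hi & 1) << (n - 1))
        hi = (hi >> 1) | (carry << (n - 1))
    return hi, lo
```

With 4-bit operands, `multiply_refined(0b1100, 0b1101, n=4)` leaves 1001 in HI and 1100 in LO, i.e. the pattern 10011100 (12 × 13 = 156).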
Refined algorithm: Example
❑ Consider: 1100₂ × 1101₂
➢ 4-bit multiplicand and multiplier → 4-bit adder produces a 5-bit sum (with
carry)

➢ Product = 10011100₂
Signed Integer Multiplication (p & p)
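❑ Case 1: Positive Multiplier
Multiplicand         1100 = -4
Multiplier         × 0101 = +5
                 11111100     (sign-extended partial product for bit 0)
                 111100       (sign-extended partial product for bit 2)
Product          11101100 = -20

❑ Case 2: Negative Multiplier
Multiplicand         1100 = -4
Multiplier         × 1101 = -3
                 11111100     (sign-extended partial product for bit 0)
                 111100       (sign-extended partial product for bit 2)
                 00100        (bit 3 is the sign bit: add the 2's complement of 1100)
Product          00001100 = +12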
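
Both cases can be reproduced with a Python sketch that sign-extends each partial product to 2n bits and subtracts (adds the 2's complement of) the partial product for the multiplier's sign bit. The names `to_signed` and `signed_multiply_pp` are illustrative assumptions.

```python
def to_signed(x, bits):
    """Interpret the low `bits` bits of x as a two's complement number."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x


def signed_multiply_pp(multiplicand, multiplier, n=4):
    """Paper-and-pencil signed multiply; returns the 2n-bit product pattern."""
    mask2n = (1 << (2 * n)) - 1
    m = to_signed(multiplicand, n) & mask2n       # sign-extend to 2n bits
    product = 0
    for i in range(n):
        if (multiplier >> i) & 1:
            partial = (m << i) & mask2n
            if i == n - 1:                        # sign bit has weight -2^(n-1)
                partial = (-partial) & mask2n     # so its partial product is subtracted
            product = (product + partial) & mask2n
    return product
```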
Signed Integer Multiplier Hardware
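❑ Perform shifts in parallel
➢ Initialize HI = 0, LO = Multiplier
➢ 32-bit ALU produces a 32-bit result + sign bit
➢ Sign bit set as follows
▪ No overflow → Extend sign bit of result
▪ Overflow → Invert sign bit of result

Algorithm (flowchart):
1. Start: HI = 0, LO = Multiplier
2. If LO[0] = 1:
▪ First 31 iterations: HI = HI + Multiplicand
▪ Last iteration: HI = HI - Multiplicand
(if LO[0] = 0, skip the addition/subtraction)
3. Shift right (sign, HI, LO) 1 bit
4. 32nd repetition? No → go to step 2; Yes → done

[Figure: signed multiplier datapath. The ALU adds or subtracts the multiplicand under control of LO[0]; the sign bit, HI and LO are shifted right together, with add/sub, shift right and write control signals.]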
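
A Python sketch of the signed algorithm, under the assumption that overflow is detected from the operand and result sign bits. The names `signed_multiply_hw` and `to_signed` are hypothetical; `n` is the operand width.

```python
def to_signed(x, bits):
    """Interpret the low `bits` bits of x as a two's complement number."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x


def signed_multiply_hw(multiplicand, multiplier, n=32):
    """Signed sequential multiply; returns (HI, LO) bit patterns."""
    mask, msb = (1 << n) - 1, n - 1
    m = multiplicand & mask
    hi, lo = 0, multiplier & mask
    for i in range(n):
        sign = hi >> msb                     # no add/sub: just extend HI's sign
        if lo & 1:
            subtract = (i == n - 1)          # last iteration subtracts
            result = (hi - m if subtract else hi + m) & mask
            hi_s, m_s, r_s = hi >> msb, m >> msb, result >> msb
            if subtract:
                overflow = (hi_s != m_s) and (r_s != hi_s)
            else:
                overflow = (hi_s == m_s) and (r_s != hi_s)
            sign = r_s ^ int(overflow)       # overflow → invert the result's sign bit
            hi = result
        # Shift (sign, HI, LO) right by one bit as a single register.
        lo = (lo >> 1) | ((hi & 1) << msb)
        hi = (hi >> 1) | (sign << msb)
    return hi, lo
```

For example, `signed_multiply_hw(0b1100, 0b1101, n=4)` returns HI = 0000 and LO = 1100, matching the paper-and-pencil result -4 × -3 = +12.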
Signed Multiplication: Example
❑ Consider: 1010₂ × 1111₂ (i.e. -6 × -1)
➢ Iteration 1: No overflow → Extend sign bit (1 → 1)
➢ Iterations 2, 3: Overflow → Invert sign bit (0 → 1)
➢ Last iteration: No overflow (why?), add 2's complement of Multiplicand

➢ Product = 00000110₂ (= +6)
Faster Integer Multiplier

❑ Uses multiple adders
➢ Cost/performance tradeoff

❑ Can be pipelined
➢ Several multiplications can be performed in parallel
MIPS multiplication
❑ Two 32-bit registers for product
➢ HI: most-significant 32 bits
➢ LO: least-significant 32 bits

❑ Instructions
➢ mult rs, rt / multu rs, rt
▪ 64-bit product in HI/LO
➢ mfhi rd / mflo rd
▪ Move from HI/LO to rd
▪ Can test HI value to see if product overflows 32 bits
➢ mul rd, rs, rt
▪ Least-significant 32 bits of product → rd
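
A Python sketch of how the 64-bit product is split across HI and LO, and of the "test HI" overflow check. These functions only model the arithmetic; they are not an emulator, and the helper names are assumptions. For a signed product, the check assumes that "fits in 32 bits" means HI is exactly the sign extension of LO.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Interpret a 32-bit pattern as a two's complement number."""
    x &= MASK32
    return x - (1 << 32) if x >> 31 else x


def mult(rs, rt):
    """Model of signed mult: return (HI, LO) of the 64-bit product."""
    product = (to_signed32(rs) * to_signed32(rt)) & 0xFFFFFFFFFFFFFFFF
    return product >> 32, product & MASK32


def multu(rs, rt):
    """Model of unsigned multu: return (HI, LO) of the 64-bit product."""
    product = (rs & MASK32) * (rt & MASK32)
    return product >> 32, product & MASK32


def signed_product_fits_32(hi, lo):
    """True when HI only repeats LO's sign bit, so LO alone holds the product."""
    return hi == (MASK32 if lo >> 31 else 0)
```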
Unsigned Division (Paper & Pencil)

❑ Check for 0 divisor

❑ Long division approach
➢ If divisor ≤ dividend bits: 1 bit in quotient, subtract
➢ Otherwise: 0 bit in quotient, bring down next dividend bit

                  1001     quotient
divisor 1000 | 1001010     dividend
              -1000
                  10
                  101
                  1010
                 -1000
                    10     remainder

❑ Signed division
➢ Divide using absolute values
➢ Adjust sign of quotient and remainder as required

❑ 𝑛-bit operands yield 𝑛-bit quotient and remainder

❑ Binary division is accomplished via shifting and subtraction.
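
The long-division rule and the signed-division recipe can be sketched in Python. The function names are illustrative, and the sign convention is an assumption: the quotient is negative when the operand signs differ and the remainder takes the sign of the dividend, so that dividend = quotient × divisor + remainder.

```python
def unsigned_long_divide(dividend, divisor, n=32):
    """Bit-by-bit long division of an n-bit dividend; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor is 0")
    quotient = remainder = 0
    for i in range(n - 1, -1, -1):
        remainder = (remainder << 1) | ((dividend >> i) & 1)  # bring down next bit
        quotient <<= 1
        if divisor <= remainder:      # divisor fits: quotient bit 1, subtract
            remainder -= divisor
            quotient |= 1
    return quotient, remainder


def signed_divide(dividend, divisor, n=32):
    """Divide using absolute values, then adjust the signs (assumed convention)."""
    q, r = unsigned_long_divide(abs(dividend), abs(divisor), n)
    if (dividend < 0) != (divisor < 0):
        q = -q
    if dividend < 0:
        r = -r
    return q, r
```

Here `unsigned_long_divide(0b1001010, 0b1000, n=7)` reproduces the worked example: quotient 1001, remainder 10.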
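Sequential Division Algorithm and Hardware
[Figure: first-version divider datapath. The divisor register initially holds the divisor in its left half; the remainder register initially holds the dividend.]

❖ Unsigned integer division
❖ Requires 64-bit ALU as well as 64-bit registers for divisor & remainder
❖ Can be optimized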
Optimized Divider
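❑ Initialize: HI = 0, LO = Dividend
➢ Results: HI = Remainder, LO = Quotient

❑ Looks a lot like a multiplier!
➢ Same hardware can be used for both

Algorithm (flowchart):
1. Start: HI (Remainder) = 0, LO (Quotient) = Dividend
2. Shift (Remainder, Quotient) left 1 bit
3. Difference = Remainder - Divisor
4. If Difference ≥ 0: Remainder = Difference, set lsb of Quotient (if Difference < 0, leave both unchanged)
5. 32nd repetition? No → go to step 2; Yes → done

[Figure: optimized divider datapath. The ALU subtracts the divisor from HI (remainder); the sign of the difference controls the write to HI and the set lsb signal; HI and LO (quotient) are shifted left together.]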
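
A Python sketch of the optimized divider, with HI and LO modelled as n-bit registers. The function name `divide_optimized` is an assumption; the explicit zero-divisor check stands in for the software check mentioned earlier.

```python
def divide_optimized(dividend, divisor, n=32):
    """Optimized unsigned divide; returns (HI, LO) = (remainder, quotient)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor is 0")
    mask = (1 << n) - 1
    hi, lo = 0, dividend & mask
    for _ in range(n):
        # Shift (HI, LO) left by one bit as a single register.
        hi = (hi << 1) | (lo >> (n - 1))
        lo = (lo << 1) & mask
        difference = hi - divisor
        if difference >= 0:           # divisor fits into the partial remainder
            hi = difference
            lo |= 1                   # set lsb of quotient
    return hi, lo
```

With 4-bit operands, `divide_optimized(0b1110, 0b0011, n=4)` returns remainder 0010 and quotient 0100.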
Unsigned Integer Division Example
❑ Consider: 1110₂ / 0011₂ (4-bit dividend & divisor)
➢ Requires 4-bit ALU as well as 4-bit registers for remainder & quotient
➢ Result: Quotient = 0100₂, Remainder = 0010₂
MIPS division
❑ Use HI/LO registers for result
➢ HI: 32-bit remainder
➢ LO: 32-bit quotient

❑ Instructions
➢ div rs, rt / divu rs, rt
▪ No overflow or divide-by-0 checking, software must perform checks if
required
➢ Use mfhi / mflo to access result
Floating-Point Addition
❑ Consider 1.000₂ × 2⁻¹ + (–1.110₂ × 2⁻²) (i.e. 0.5₁₀ + (–0.4375₁₀))
❑ Algorithm
1. Align binary points
▪ Shift number with smaller exponent: 1.000₂ × 2⁻¹ + (–0.111₂ × 2⁻¹)
2. Add significands
▪ 1.000₂ × 2⁻¹ + (–0.111₂ × 2⁻¹) = 0.001₂ × 2⁻¹
3. Normalize result & check for over/underflow
▪ 1.000₂ × 2⁻⁴
4. Round and renormalize if necessary
▪ 1.000₂ × 2⁻⁴ (no change) = 0.0625₁₀

❑ FP adder hardware is much more complex than an integer adder
➢ FP adder usually takes several cycles.
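
The four steps can be traced with a toy Python model. This is a simplified sketch, not IEEE 754: numbers are (sign, significand, exponent) triples with a 4-bit significand (1 integer bit, 3 fraction bits), bits shifted out during alignment are simply truncated, and the over/underflow check is omitted. All names are illustrative assumptions.

```python
def fp_add(a, b, frac_bits=3):
    """Add two toy FP numbers given as (sign, significand, exponent)."""
    (sa, ma, ea), (sb, mb, eb) = a, b
    # Step 1: align binary points by shifting the smaller-exponent significand right.
    if ea < eb:
        ma >>= eb - ea
        exp = eb
    else:
        mb >>= ea - eb
        exp = ea
    # Step 2: add the signed significands.
    total = (-ma if sa else ma) + (-mb if sb else mb)
    sign, mag = (1, -total) if total < 0 else (0, total)
    if mag == 0:
        return (0, 0, 0)
    # Step 3: normalize so the significand has the form 1.xxx.
    one = 1 << frac_bits
    while mag >= 2 * one:
        mag >>= 1
        exp += 1
    while mag < one:
        mag <<= 1
        exp -= 1
    # Step 4: rounding is a no-op here; real hardware keeps extra bits for it.
    return (sign, mag, exp)


def fp_value(x, frac_bits=3):
    """Convert a toy FP triple to a Python float."""
    sign, mag, exp = x
    return (-1) ** sign * mag * 2.0 ** (exp - frac_bits)
```

For the example above, `fp_add((0, 0b1000, -1), (1, 0b1110, -2))` returns `(0, 0b1000, -4)`, i.e. 1.000₂ × 2⁻⁴.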
FP Adder Hardware

[Figure: FP adder datapath, annotated with Steps 1 to 4 of the addition algorithm]
FP Arithmetic Hardware
❑ FP multiplier is of similar complexity to FP adder
➢ But uses a multiplier for significands instead of an adder

❑ FP arithmetic hardware usually does
➢ Addition, subtraction, multiplication, division, reciprocal, square-root
➢ FP ↔ integer conversion
➢ Operations usually take several cycles

❑ FP hardware is coprocessor 1
➢ Adjunct processor that extends the ISA
➢ Separate FP registers
➢ FP instructions operate only on FP registers; programs generally don’t do
integer ops on FP data, or vice versa
FP Instructions in MIPS
❑ FP load and store instructions
▪ lwc1, ldc1, swc1, sdc1

❑ Single-precision arithmetic
▪ add.s, sub.s, mul.s, div.s

❑ Double-precision arithmetic
▪ add.d, sub.d, mul.d, div.d

❑ Single- and double-precision comparison


▪ c.xx.s, c.xx.d (xx is eq, lt, le, …)
▪ Sets or clears FP condition-code bit

❑ Branch on FP condition code true or false


▪ bc1t, bc1f
FP Example: °F to °C
❑ C code:
float f2c (float fahr) {
  return ((5.0/9.0)*(fahr - 32.0));
}

❑ Compiled MIPS code:
▪ fahr in $f12, result in $f0, literals in global memory space
f2c: lwc1  $f16, const5($gp)     # $f16 = 5.0
     lwc1  $f18, const9($gp)     # $f18 = 9.0
     div.s $f16, $f16, $f18      # $f16 = 5.0 / 9.0
     lwc1  $f18, const32($gp)    # $f18 = 32.0
     sub.s $f18, $f12, $f18      # $f18 = fahr - 32.0
     mul.s $f0, $f16, $f18       # $f0 = (5.0/9.0) * (fahr - 32.0)
     jr    $ra                   # return
FP Accuracy
❑ IEEE Std 754 specifies additional rounding control
➢ Extra bits of precision (guard, round, sticky)
➢ Choice of rounding modes
➢ Allows programmer to fine-tune numerical behavior of a computation

❑ Not all FP units implement all options


➢ Most programming languages and FP libraries just use defaults

❑ Trade-off between hardware complexity, performance, and market requirements
➢ FP accuracy is important for scientific code
➢ But for everyday consumer use? “My bank balance is out by 0.0002¢!”
Concluding Remarks
❑ ISAs support arithmetic
➢ Signed and unsigned integers
➢ Floating-point approximation to reals

❑ Arithmetic operations have finite range and precision


➢ Operations can overflow and underflow
➢ Need to account for this in programs

❑ MIPS ISA
➢ Core instructions: 54 most frequently used
▪ 100% of SPECINT, 97% of SPECFP
➢ Other instructions: less frequent

❑ Next lecture: Building a single-cycle processor


➢ Additional logic for datapath & control
