
Computer Arithmetic

The document outlines the design and functionality of an Arithmetic Logic Unit (ALU) used in computer systems, detailing operations such as addition, subtraction, multiplication, and division. It discusses the construction of the ALU, including fast adders and overflow detection mechanisms, as well as the specifications for MIPS ALU operations. Additionally, it covers the representation of floating-point numbers and the handling of overflow and zero detection in arithmetic operations.

Uploaded by

羅凱騰

Computer Arithmetic
Department of Computer Science, National Tsing Hua University (國立清華大學資訊工程學系)
Prof. Ting-Ting Hwang (黃婷婷教授)
Outline
 Addition and subtraction (Sec. 3.2)
 Constructing an arithmetic logic unit
(Appendix C)
 Building ALU
 Add, sub, and, or, nor
 Set-on-less-than, overflow detection, zero
detection
 Fast adders
 Cascaded carry look-ahead adder
 Multiple level carry look-ahead adder
 Multiplication (Sec. 3.3, Appendix C)
 Unsigned multiply
 Signed multiply
 Division (Sec. 3.4)
 Floating point (Sec. 3.5)
 Representations
 Addition and multiplication

2
Problem: Designing MIPS
ALU
 Requirements: must support the following
arithmetic and logic operations
 add, sub: two’s complement adder/subtractor
with overflow detection
 and, or, nor : logical AND, logical OR, logical
NOR
 slt (set on less than): two’s complement adder
with inverter, check sign bit of result

3
Functional Specification
Inputs: A (32 bits), B (32 bits), ALUop (4 bits)
Outputs: Result (32 bits), Zero, Overflow, CarryOut

ALU Control (ALUop)   Function
0000                  and
0001                  or
0010                  add
0110                  subtract
0111                  set-on-less-than
1100                  nor
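The specification above can be modelled behaviourally in a few lines of Python. This is only a sketch, not the gate-level design the slides go on to build: the function name `alu`, the order of the returned tuple, and the choice to compute set-on-less-than as (sign of A - B) XOR overflow are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF

def alu(aluop, a, b):
    """Behavioural model of the ALUop table; returns (result, zero, overflow, carry_out)."""
    a &= MASK32
    b &= MASK32
    overflow = carry_out = 0
    if aluop == 0b0000:
        result = a & b
    elif aluop == 0b0001:
        result = a | b
    elif aluop == 0b1100:
        result = ~(a | b) & MASK32
    elif aluop in (0b0010, 0b0110, 0b0111):
        subtract = aluop != 0b0010
        operand = (~b & MASK32) if subtract else b    # Bnegate inverts B
        total = a + operand + (1 if subtract else 0)  # carry-in of 1 on subtraction
        result = total & MASK32
        carry_out = total >> 32
        sign_a, sign_b, sign_r = a >> 31, operand >> 31, result >> 31
        # Overflow: both adder inputs share a sign that the result does not have.
        overflow = int(sign_a == sign_b and sign_r != sign_a)
        if aluop == 0b0111:
            # Assumption: slt = sign bit of (a - b), corrected by the overflow flag.
            result = sign_r ^ overflow
    else:
        raise ValueError("ALUop not in the specification table")
    zero = int(result == 0)
    return result, zero, overflow, carry_out
```

For example, `alu(0b0110, 5, 5)` returns a zero result with the Zero flag set, which is what a branch-on-equal comparison relies on.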
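A Bit-slice ALU
 Design trick 1: divide and conquer
 Break the problem into simpler problems, solve them, and glue the solutions together
 Design trick 2: solve part of the problem and extend
[Figure: 32-bit ALU built from 32 one-bit slices, ALU0 (inputs a0, b0, output s0) through ALU31 (inputs a31, b31, output s31). Every slice receives the 4-bit ALUop; the carry-out (co) of each slice feeds the carry-in (cin) of the next, ending in c31. Together the slices produce the 32-bit Result plus Overflow and Zero.]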
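A 1-bit ALU
 Design trick 3: take pieces you know (or can imagine) and try to put them together
[Figure: 1-bit ALU. Inputs A and B feed an AND gate, an OR gate, and a 1-bit full adder (with CarryIn and CarryOut). A mux controlled by Operation selects the Result: 0 = and, 1 = or, 2 = add.]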
A 4-bit ALU
[Figure: left, the 1-bit ALU (inputs A, B, CarryIn, Operation; a mux selects Result; the 1-bit full adder produces CarryOut). Right, a 4-bit ALU made of four 1-bit ALUs sharing the Operation lines: slice i takes Ai, Bi, and CarryIni and produces Resulti and CarryOuti, with CarryOut0 -> CarryIn1, CarryOut1 -> CarryIn2, CarryOut2 -> CarryIn3.]
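How about Subtraction?
 2's complement: take the inverse of every bit and add 1 (at cin of the first stage)
A + B' + 1 = A + (B' + 1) = A + (-B) = A - B
 B' is the bit-wise inverse of B
[Figure: the ALU with a Subtract (Bnegate) control. A 2-to-1 mux selects B (Sel = 0) or B' (Sel = 1) as the second ALU input; the ALU also has CarryIn and Operation inputs and Result and CarryOut outputs.]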
Revised Diagram
 LSB and MSB need to do a little extra
[Figure: 32-bit bit-slice ALU (ALU0 ... ALU31) with A, B, 4-bit ALUop, Result, Overflow, and Zero. The cin of ALU0 must be supplied with a 1 on subtraction, which is done by combining CarryIn and Bnegate; the extra logic needed at the MSB is marked "?".]
R-Format Instructions (1/2)
 Define the following “fields”:
6 5 5 5 5 6
opcode rs rt rd shamt funct
 opcode: partially specifies what instruction it is
(Note: 0 for all R-Format instructions)
 funct: combined with opcode to specify the
instruction
Question: Why aren't opcode and funct a single 12-bit field?
 rs (Source Register): generally used to specify
register containing first operand
 rt (Target Register): generally used to specify
register containing second operand
 rd (Destination Register): generally used to specify
register which will receive result of computation
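
The field layout above (6, 5, 5, 5, 5, 6 bits from the most significant end) can be illustrated by pulling the fields out of a 32-bit instruction word with shifts and masks. The helper name `decode_r_format` and the dictionary return type are assumptions for this sketch; the sample word is the standard encoding of `add $t0, $t1, $t2`.

```python
def decode_r_format(word):
    """Split a 32-bit R-format instruction word into its six fields."""
    return {
        "opcode": (word >> 26) & 0x3F,  # bits 31..26 (0 for R-format)
        "rs":     (word >> 21) & 0x1F,  # bits 25..21, first source register
        "rt":     (word >> 16) & 0x1F,  # bits 20..16, second source register
        "rd":     (word >> 11) & 0x1F,  # bits 15..11, destination register
        "shamt":  (word >> 6) & 0x1F,   # bits 10..6, shift amount
        "funct":  word & 0x3F,          # bits 5..0, selects the operation
    }

fields = decode_r_format(0x012A4020)  # add $t0, $t1, $t2
```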

13
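Nor Operation
 A nor B = (not A) and (not B)
[Figure: 1-bit ALU extended with an Ainvert control. Muxes select a or a' (Ainvert) and b or b' (Bnegate) ahead of the AND gate (0), OR gate (1), and adder (2); Operation selects the Result. With Ainvert = 1, Bnegate = 1, and Operation = and, the Result is a nor b.]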
Set on Less Than (I)
 1-bit ALU for bits 1-30
[Figure: the 1-bit ALU with a fourth mux input, Less (3), alongside and (0), or (1), and add (2). For bits 1-30 the Less input is tied to 0.]
Set on Less Than (II)
 Sign bit in ALU
[Figure: the 1-bit ALU for the sign bit. Besides Result, the adder output is brought out as Set, and overflow-detection logic produces Overflow.]
Set on Less Than (III)
 Bit 0 in ALU
[Figure: the 1-bit ALU for bit 0. Its mux input 3 is driven by the Set output of the sign-bit ALU.]
A Ripple Carry Adder and Set on Less Than

ALUop   Function
0000    and
0001    or
0010    add
0110    subtract
0111    set-less-than
1100    nor
Overflow
Decimal   Binary     Decimal   2's complement
0         0000       0         0000
1         0001       -1        1111
2         0010       -2        1110
3         0011       -3        1101
4         0100       -4        1100
5         0101       -5        1011
6         0110       -6        1010
7         0111       -7        1001
                     -8        1000
Ex: 7 + 3 = 10, but the 4-bit result is -6; -4 - 5 = -9, but the 4-bit result is 7

  0 1 1 1          (carries)     1 0 0 0          (carries)
    0 1 1 1    7                   1 1 0 0   -4
  + 0 0 1 1    3                 + 1 0 1 1   -5
    1 0 1 0   -6                   0 1 1 1    7
Overflow Detection
 Overflow: result too big/small to represent
 -8 ≤ 4-bit binary number ≤ 7
 When adding operands with different signs, overflow cannot occur!
 Overflow occurs when adding:
 2 positive numbers and the sum is negative
 2 negative numbers and the sum is positive
=> the sign bit is set with a value bit of the result instead of the proper sign
 Overflow if: carry into MSB ≠ carry out of MSB
 In the two examples (7 + 3 and -4 - 5): 7 + 3 has carry into MSB = 1 and carry out = 0; -4 - 5 has carry into MSB = 0 and carry out = 1
Overflow Detection Logic
 Overflow = CarryIn[N-1] XOR CarryOut[N-1]

X   Y   X XOR Y
0   0   0
0   1   1
1   0   1
1   1   0

[Figure: 4-bit ripple-carry ALU (four 1-bit ALUs, the CarryOut of each slice feeding the CarryIn of the next); an XOR gate on CarryIn3 and CarryOut3 produces Overflow.]
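The rule Overflow = CarryIn[N-1] XOR CarryOut[N-1] can be checked against the 4-bit examples with a short Python sketch. The function name `add_with_overflow` and its `bits` parameter are illustrative assumptions; operands are passed as unsigned bit patterns.

```python
def add_with_overflow(a, b, bits=4):
    """Add two bit patterns; return (result pattern, overflow flag)."""
    mask = (1 << bits) - 1
    a &= mask
    b &= mask
    low_mask = mask >> 1  # every bit below the MSB
    # Carry into the MSB: the carry produced by adding the lower bits only.
    carry_in_msb = ((a & low_mask) + (b & low_mask)) >> (bits - 1)
    total = a + b
    carry_out_msb = total >> bits
    return total & mask, carry_in_msb ^ carry_out_msb

seven_plus_three = add_with_overflow(0b0111, 0b0011)   # 7 + 3
minus4_minus5 = add_with_overflow(0b1100, 0b1011)      # -4 + -5
three_minus_four = add_with_overflow(0b0011, 0b1100)   # 3 + -4
```

The first two calls reproduce the overflowing examples (results 1010 and 0111 with the flag set); the third mixes signs and cannot overflow.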
Dealing with Overflow
 Some languages (e.g., C) ignore overflow
 Use MIPS addu, addiu, subu instructions
 Other languages (e.g., Ada, Fortran) require
raising an exception
 Use MIPS add, addi, sub instructions
 On overflow, invoke exception handler
 Save PC in exception program counter (EPC)
register
 Jump to predefined handler address
 mfc0 (move from coprocessor reg)
instruction can retrieve (copy) EPC value (to
a general purpose register), to return after
corrective action (by jump register
instruction)

26
Zero Detection Logic
 Zero detection logic is just one big NOR gate over all Result bits (supports conditional branches)
[Figure: 4-bit ripple-carry ALU; Result0-Result3 feed a single NOR gate whose output is Zero.]
Problems with Ripple Carry Adder
 The carry bit may have to propagate from LSB to MSB => worst-case delay: N stage delays
[Figure: 4-bit ripple-carry ALU with the carry chain CarryIn0 -> CarryOut0 -> ... -> CarryOut3, next to a single 1-bit slice showing the path from CarryIn to CarryOut.]
Design trick: look for parallelism and throw hardware at it
Carry Look-ahead: Theory (I) (Appendix C)
[Figure: two 1-bit ALUs in series. Bit 0 has inputs A0, B0, Cin0 and output Cout0 = Cin1; bit 1 has inputs A1, B1, Cin1 and output Cout1 = Cin2.]
 CarryOut = (B*CarryIn) + (A*CarryIn) + (A*B)
 Cin1 = Cout0 = (B0*Cin0) + (A0*Cin0) + (A0*B0)
 Cin2 = Cout1 = (B1*Cin1) + (A1*Cin1) + (A1*B1)
 Substituting Cin1 into Cin2:
 Cin2 = (A1*A0*B0) + (A1*A0*Cin0) + (A1*B0*Cin0) + (B1*A0*B0) + (B1*A0*Cin0) + (B1*B0*Cin0) + (A1*B1)
Carry Look-ahead: Theory (II)
 Now define two new terms:
 Generate carry at bit i: gi = Ai * Bi
 Propagate carry via bit i: pi = Ai xor Bi
 We can rewrite:
 Cin1 = g0 + (p0*Cin0)
 Cin2 = g1 + (p1*g0) + (p1*p0*Cin0)
 Cin3 = g2 + (p2*g1) + (p2*p1*g0) + (p2*p1*p0*Cin0)
 Carry going into bit 3 is 1 if
 We generate a carry at bit 2 (g2)
 Or we generate a carry at bit 1 (g1) and bit 2 allows it to propagate (p2*g1)
 Or we generate a carry at bit 0 (g0) and bits 1 and 2 both allow it to propagate (p2*p1*g0)
 Or the carry-in Cin0 is 1 and bits 0, 1, and 2 all allow it to propagate (p2*p1*p0*Cin0)
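To see that the generate/propagate equations give the same carries as a ripple adder, the sum-of-products form can be evaluated directly. This is a software sketch only: the name `lookahead_carries` and the list-based return value are assumptions, and the nested loops merely enumerate the product terms that hardware would compute in parallel.

```python
def lookahead_carries(a, b, cin, bits=4):
    """Return [Cin0, Cin1, ..., Cin<bits>] computed from gi and pi only."""
    g = [(a >> i) & (b >> i) & 1 for i in range(bits)]      # gi = Ai * Bi
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(bits)]    # pi = Ai xor Bi
    carries = [cin]
    for i in range(bits):
        terms = []
        # g_j propagated through bits j+1 .. i, for every j <= i
        for j in range(i + 1):
            prod = g[j]
            for k in range(j + 1, i + 1):
                prod &= p[k]
            terms.append(prod)
        # Cin0 propagated through bits 0 .. i
        prod = cin
        for k in range(i + 1):
            prod &= p[k]
        terms.append(prod)
        carries.append(int(any(terms)))
    return carries
```

Each entry depends only on the g and p signals and Cin0, never on a previously computed carry, which is the point of the look-ahead formulation.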
A Plumbing Analogy for Carry Look-ahead (1, 2, 4 bits)
[Figure: plumbing analogy; the drawing is not recoverable from the extracted text.]
Common Carry Look-ahead
Adder
 Expensive to build a “full” carry look-ahead
adder
 Ex: Cin3=g2+(p2*g1)+(p2*p1*g0)+
(p2*p1*p0*Cin0)
 Just imagine length of the equation for
Cin31

 Common practices:
 Cascaded carry look-ahead adder
 Multiple level carry look-ahead adder

33
Cascaded Carry Look-ahead
 Connects several N-bit look-ahead adders to form a big one
[Figure: four 8-bit carry look-ahead adders in series. From right to left they add A[7:0] + B[7:0], A[15:8] + B[15:8], A[23:16] + B[23:16], and A[31:24] + B[31:24], with carry-ins C0, C8, C16, and C24, producing Result[7:0], Result[15:8], Result[23:16], and Result[31:24]. Each adder's carry-out is the next adder's carry-in.]
Example: Carry Look-ahead Unit
[Figure: a carry look-ahead unit block with inputs cin, 4 generate signals gi, and 4 propagate signals pi, and outputs 4 carries and cout.]
Example: Cascaded Carry Look-ahead
 Connects several N-bit look-ahead adders to form a big one
[Figure: four 4-bit carry look-ahead units in series above sixteen 1-bit adders (+). From right to left the units take p[3:0], g[3:0] and c0; p[7:4], g[7:4] and c4; p[11:8], g[11:8] and c8; p[15:12], g[15:12] and c12; they produce c[4:1], c[8:5], c[12:9], and c[16:13].]
Multiple Level Carry Look-ahead Adder
 View an N-bit look-ahead adder as a block
 Where to get Cin of the block?
[Figure: four 8-bit carry look-ahead adders for A[31:24] + B[31:24], A[23:16] + B[23:16], A[15:8] + B[15:8], and A[7:0] + B[7:0], producing Result[31:24] ... Result[7:0]; the block carry-ins C24, C16, C8, and C0 are the signals to be generated.]
 Generate "super" Pi and Gi of the block
 Use next-level carry look-ahead structure to generate block Cin
A Plumbing Analogy for Carry
Look-ahead (Next Level P0 and
G0)

39
A Carry Look-ahead Adder
[Figure: 16-bit adder built from four 4-bit ALUs and one carry-lookahead unit. ALU0 adds a0-a3 and b0-b3 (Result0-3), ALU1 adds a4-a7 and b4-b7 (Result4-7), ALU2 adds a8-a11 and b8-b11 (Result8-11), ALU3 adds a12-a15 and b12-b15 (Result12-15). Each ALUi sends its block signals Pi and Gi to the carry-lookahead unit (inputs pi, gi ... pi+3, gi+3), which takes CarryIn and returns the block carries C1, C2, C3 (outputs ci+1, ci+2, ci+3) and C4 = CarryOut (ci+4).]

A   B   Cout   Action
0   0   0      kill
0   1   Cin    propagate
1   0   Cin    propagate
1   1   1      generate

G = A * B
P = A + B
Example: Carry Look-ahead Unit
[Figure: the carry look-ahead unit block (inputs cin, 4 gi, 4 pi; outputs 4 carries and cout), now also producing the block-level signals P and G.]
Example: Multiple Level Carry Look-ahead
[Figure: 16-bit two-level carry look-ahead. Four first-level 4-bit carry look-ahead units take p[3:0], g[3:0]; p[7:4], g[7:4]; p[11:8], g[11:8]; p[15:12], g[15:12] and produce c[4:1], c[8:5], c[12:9], c[16:13] for sixteen 1-bit adders (+). Each unit also sends its block signals (P0, G0) ... (P3, G3) to a second-level 4-bit carry look-ahead unit, which produces the block carries C[4:0] used as c0, c4, c8, c12 and the carry-out.]
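A two-level structure of this kind can be sketched in Python for 16 bits. The names `block_pg` and `add16_two_level` are assumptions for illustration; the block propagate uses P = A + B (OR), and each 4-bit block sum is computed with ordinary integer addition rather than modelling the first-level units gate by gate.

```python
def block_pg(a, b, bits=4):
    """Return the "super" (P, G) signals of one block."""
    P, G = 1, 0
    for i in range(bits):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        g, p = ai & bi, ai | bi
        G = g | (p & G)   # carry generated at bit i, or generated lower and propagated
        P &= p            # the block propagates only if every bit propagates
    return P, G

def add16_two_level(a, b, c0=0):
    """16-bit add whose block carry-ins come from a second-level look-ahead."""
    blocks = [((a >> (4 * k)) & 0xF, (b >> (4 * k)) & 0xF) for k in range(4)]
    (P0, G0), (P1, G1), (P2, G2), (P3, G3) = [block_pg(x, y) for x, y in blocks]
    # Second-level equations: same shape as the bit-level ones, on block signals.
    C1 = G0 | (P0 & c0)
    C2 = G1 | (P1 & G0) | (P1 & P0 & c0)
    C3 = G2 | (P2 & G1) | (P2 & P1 & G0) | (P2 & P1 & P0 & c0)
    C4 = (G3 | (P3 & G2) | (P3 & P2 & G1) | (P3 & P2 & P1 & G0)
          | (P3 & P2 & P1 & P0 & c0))
    total = 0
    for k, ((x, y), cin) in enumerate(zip(blocks, (c0, C1, C2, C3))):
        total |= ((x + y + cin) & 0xF) << (4 * k)
    return total | (C4 << 16)
```

None of the block carries depends on another block's carry, so all four blocks can start their sums as soon as the second-level unit settles.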
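Carry-select Adder
 Two n-bit adders connected in series: CP(2n) = 2*CP(n)
 Carry-select: CP(2n) = CP(n) + CP(mux)
[Figure: the upper n bits are added twice in parallel, once by an n-bit adder with carry-in 1 and once by an n-bit adder with carry-in 0; the carry-out of the lower n-bit adder drives a mux that selects the correct upper sum and Cout.]
Design trick: guess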
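The "guess" trick can be sketched as follows: compute the upper half for both possible carry-ins, then let the lower half's carry-out pick one. The function name `carry_select_add` and the parameter `n` (width of each half) are assumptions made for this illustration.

```python
def carry_select_add(a, b, n=8):
    """Add two 2n-bit numbers using a carry-select upper half."""
    mask = (1 << n) - 1
    low = (a & mask) + (b & mask)        # lower n-bit adder
    carry = low >> n
    high_if_0 = (a >> n) + (b >> n)      # upper adder, guessing carry-in = 0
    high_if_1 = (a >> n) + (b >> n) + 1  # upper adder, guessing carry-in = 1
    high = high_if_1 if carry else high_if_0   # the mux
    return (high << n) | (low & mask)
```

In hardware the three adders work at the same time, so the delay is that of one n-bit adder plus the mux, at the cost of a duplicated upper adder.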
Arithmetic for
Multimedia
 Graphics and media processing operates on
vectors of 8-bit and 16-bit data
 Use 64-bit adder, with partitioned carry chain
 Operate on 8×8-bit, 4×16-bit, or 2×32-bit
vectors
 SIMD (single-instruction, multiple-data)
 Saturating operations
 On overflow, result is largest representable
value
 c.f. 2s-complement modulo arithmetic
 E.g., clipping in audio, saturation in video

44
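MIPS R2000 Organization
[Figure: block diagram of the MIPS R2000. Memory connects to the CPU and two coprocessors.
CPU: registers $0-$31, arithmetic unit, multiply/divide unit with the Lo and Hi registers.
Coprocessor 1 (FPU): registers $0-$31, arithmetic unit.
Coprocessor 0 (traps and memory): registers BadVAddr, Cause, Status, EPC.]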
Multiplication in MIPS
mult $t1, $t2 # t1 * t2
 No destination register: product could be ~2^64; need two special registers to hold it
 3-step process:

$t1       01111111111111111111111111111111
x $t2     01000000000000000000000000000000
Hi, Lo    00011111111111111111111111111111 11000000000000000000000000000000

mfhi $t3  # $t3 = 00011111111111111111111111111111
mflo $t4  # $t4 = 11000000000000000000000000000000
MIPS Multiplication
 Two 32-bit registers for product
 HI: most-significant 32 bits
 LO: least-significant 32-bits
 Instructions
 mult rs, rt / multu rs, rt
 64-bit product in HI/LO
 mfhi rd / mflo rd
 Move from HI/LO to rd
 Can test HI value to see if product
overflows 32 bits
 mul rd, rs, rt
 Least-significant 32 bits of product –>
rd

48
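The HI/LO split can be reproduced with plain integer arithmetic. The Python names `mult` and `multu` mirror the instruction mnemonics, but the functions themselves (taking 32-bit patterns and returning a `(hi, lo)` pair) are assumptions made for this sketch.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(x):
    """Interpret a 32-bit pattern as a two's complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def mult(rs, rt):
    """Signed 32 x 32 multiply; returns the (HI, LO) register contents."""
    product = (to_signed32(rs) * to_signed32(rt)) & 0xFFFFFFFFFFFFFFFF
    return product >> 32, product & MASK32

def multu(rs, rt):
    """Unsigned 32 x 32 multiply; returns the (HI, LO) register contents."""
    product = (rs & MASK32) * (rt & MASK32)
    return product >> 32, product & MASK32

hi, lo = mult(0x7FFFFFFF, 0x40000000)  # the $t1 * $t2 example
```

For unsigned operands, a nonzero HI after `multu` means the product does not fit in 32 bits.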
Unsigned Multiply
 Paper and pencil example (unsigned):
Multiplicand          1000ten
Multiplier          x 1001ten
                      1000
                     0000
                    0000
                   1000
Product           01001000ten
 m bits x n bits = (m+n)-bit product
 Binary makes it easy:
 0 => place 0 (0 x multiplicand)
 1 => place a copy (1 x multiplicand)
 2 versions of multiply hardware and algorithm
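Unsigned Multiplier (Ver. 1)
 64-bit Multiplicand register (with the 32-bit multiplicand in the right half), 64-bit ALU, 64-bit Product register, 32-bit Multiplier register
[Figure: the 64-bit Multiplicand register (shift left) and the 64-bit Product register feed the 64-bit ALU, whose output is written back to Product; the 32-bit Multiplier register shifts right; Control tests the low bit of the multiplier and drives the shifts and the Product write.]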
Multiply Algorithm (Ver. 1)
Start
1. Test Multiplier0
   Multiplier0 = 1: 1a. Add multiplicand to product and place the result in Product register
   Multiplier0 = 0: skip the add
2. Shift Multiplicand register left 1 bit
3. Shift Multiplier register right 1 bit
32nd repetition? No (< 32 repetitions): go to step 1. Yes (32 repetitions): Done

Example: 0010 x 0011 (4-bit operands, so 4 repetitions)
Product     Multiplier   Multiplicand
0000 0000   0011         0000 0010
0000 0010   0001         0000 0100
0000 0110   0000         0000 1000
0000 0110   0000         0001 0000
0000 0110   0000         0010 0000
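The Version 1 steps translate almost line for line into Python. The function name `multiply_v1` and the `trace` list (one `(product, multiplier, multiplicand)` tuple per repetition) are assumptions added so the register contents can be compared with the 0010 x 0011 example.

```python
def multiply_v1(multiplicand, multiplier, bits=32):
    """Shift-and-add multiply, Version 1; returns (product, trace)."""
    product = 0
    trace = []
    for _ in range(bits):
        if multiplier & 1:            # step 1: test Multiplier0
            product += multiplicand   # step 1a: add multiplicand to product
        multiplicand <<= 1            # step 2: shift Multiplicand left
        multiplier >>= 1              # step 3: shift Multiplier right
        trace.append((product, multiplier, multiplicand))
    return product, trace

product, trace = multiply_v1(0b0010, 0b0011, bits=4)
```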
Observations: Multiply Ver.
1
 Delay ratio of multiply to add 5:1 to 100:1
 Half of the bits in multiplicand always 0
=> 64-bit adder is wasted
 0’s inserted in right of multiplicand as
shifted
=> least significant bits of product never
changed once formed
 Instead of shifting multiplicand to left, shift
product to right?
 Product register wastes space => combine
Multiplier and Product register

52
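Unsigned Multiplier (Ver. 2)
 32-bit Multiplicand register, 32-bit ALU, 64-bit Product register (HI & LO in MIPS), (0-bit Multiplier register)
[Figure: the 32-bit Multiplicand register and the left half of the 64-bit Product register feed the 32-bit ALU, whose output is written to the left half of Product; Product shifts right; Control tests the low bit of Product and drives the shift and the write.]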
Multiply Algorithm (Ver. 2)
Start
1. Test Product0
   Product0 = 1: 1a. Add multiplicand to left half of product and place the result in left half of Product register
   Product0 = 0: skip the add
2. Shift Product register right 1 bit
32nd repetition? No (< 32 repetitions): go to step 1. Yes (32 repetitions): Done

Example: 0010 x 0011 (4-bit operands, so 4 repetitions)
Multiplicand   Product
0010           0000 0011   (initial: multiplier in right half)
0010           0010 0011   (add)
0010           0001 0001   (shift right)
0010           0011 0001   (add)
0010           0001 1000   (shift right)
0010           0000 1100   (shift right)
0010           0000 0110   (shift right)
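Version 2 keeps the multiplier in the right half of the Product register and shifts the product right instead of shifting the multiplicand left. A minimal Python sketch, assuming the hypothetical name `multiply_v2`; Python integers are unbounded, so the carry out of the left-half addition is simply kept and shifted back in.

```python
def multiply_v2(multiplicand, multiplier, bits=32):
    """Shift-and-add multiply, Version 2, with a combined Product/Multiplier register."""
    product = multiplier                      # multiplier starts in the right half
    for _ in range(bits):
        if product & 1:                       # step 1: test Product0
            product += multiplicand << bits   # step 1a: add to the left half
        product >>= 1                         # step 2: shift Product right
    return product
```

Calling `multiply_v2(0b0010, 0b0011, bits=4)` passes through the register values 0010 0011, 0001 0001, 0011 0001, 0001 1000, 0000 1100, and ends at 0000 0110.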
Signed Multiply
 What about signed multiplication?
 The easiest solution is to make both
positive and remember whether to
complement product when done (leave out
sign bit, run for 31 steps)
 Apply definition of 2’s complement
 sign-extend partial products and
subtract at end

57
Using Definition of 2's Complement
 Paper and pencil example (signed):
Multiplicand          1001 (-7)
Multiplier          x 1001 (-7)
                  11111001
                + 0000000
                + 000000
                - 11001
Product           00110001 (49)
 Rule 1: Multiplicand sign extended
 Rule 2: Sign bit (s) of Multiplier
 0 => 0 x multiplicand
 1 => -1 x multiplicand
 Why rule 2?
 X = s x_(n-2) x_(n-3) ... x_1 x_0 (2's complement)
 Value(X) = -s * 2^(n-1) + x_(n-2) * 2^(n-2) + ... + x_1 * 2^1 + x_0 * 2^0
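The two rules can be exercised with a small Python sketch: partial products use the sign-extended multiplicand, and the partial product selected by the multiplier's sign bit is subtracted instead of added. The function name `signed_multiply` and its pattern-in, pattern-out interface are assumptions made for illustration.

```python
def signed_multiply(x, y, bits=4):
    """Multiply two two's complement bit patterns; return the 2*bits-bit product pattern."""
    def to_signed(v):
        return v - (1 << bits) if (v >> (bits - 1)) & 1 else v

    multiplicand = to_signed(x)            # rule 1: sign-extended multiplicand
    product = 0
    for i in range(bits):
        if (y >> i) & 1:
            partial = multiplicand << i
            if i == bits - 1:
                product -= partial         # rule 2: the sign bit has weight -2^(n-1)
            else:
                product += partial
    return product & ((1 << (2 * bits)) - 1)

result = signed_multiply(0b1001, 0b1001)   # (-7) x (-7)
```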
Faster Multiplier
 A combinational multiplier
 Use multiple adders
 Cost/performance tradeoff

 Can be pipelined
 Several multiplication performed in parallel

59
Wallace Tree Multiplier
 Use carry save adders: three inputs and two outputs

  10101110
  00100011
  10000111
  --------
  00001010 (sum)
  10100111 (carry)
 8 full adders
 One full adder delay (no carry propagation)
 The last stage is performed by a regular adder
 What is the minimum delay for a 16 x 16 multiplier?
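A carry save adder is just a row of independent full adders, which bitwise operators express directly. The function name `carry_save_add` is an assumption; note that the carry word has to be shifted left by one position when it is finally combined with the sum word.

```python
def carry_save_add(a, b, c):
    """Reduce three operands to a (sum, carry) pair with no carry propagation."""
    s = a ^ b ^ c                            # per-bit sum of the full adders
    carry = (a & b) | (a & c) | (b & c)      # per-bit carry-out (majority)
    return s, carry

s, carry = carry_save_add(0b10101110, 0b00100011, 0b10000111)
total = s + (carry << 1)                     # final regular (carry-propagate) add
```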
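8-bit Wallace Tree Multiplier
[Figure: the eight partial products M0-M7 are reduced by a tree of carry save adders (CSA): two CSAs in the first level, two in the second, one in the third, and one in the fourth, followed by a regular adder.]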
Division in MIPS
div $t1, $t2 # t1 / t2
 Quotient stored in Lo, remainder in Hi
mflo $t3 # copy quotient to t3
mfhi $t4 # copy remainder to t4
 3-step process
 Unsigned division:
divu $t1, $t2 # t1 / t2
 Just like div, except now interpret t1, t2 as unsigned integers instead of signed
 Answers are also unsigned; use mfhi, mflo to access
 No overflow or divide-by-0 checking
 Software must perform checks if required
Divide: Paper & Pencil
                     1001ten    Quotient
Divisor 1000ten | 1001010ten    Dividend
                 -1000
                   0010
                    0101
                     1010
                    -1000
                       10ten    Remainder
 See how big a number can be subtracted, creating a quotient bit on each step
Binary => 1 * divisor or 0 * divisor
 Two versions of divide, successive refinement
 Both dividend and divisor are 32-bit positive integers
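Divide Hardware (Version 1)
 64-bit Divisor register (initialized with 32-bit divisor in left half), 64-bit ALU, 64-bit Remainder register (initialized with 64-bit dividend), 32-bit Quotient register
[Figure: the 64-bit Divisor register (shift right) and the 64-bit Remainder register feed the 64-bit ALU, whose output is written back to Remainder; the 32-bit Quotient register shifts left; Control tests the sign of the remainder and drives the shifts and the write.]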
Divide Algorithm (Version 1)
Start: Place Dividend in Remainder
1. Subtract Divisor register from Remainder register, and place the result in Remainder register
   Test Remainder
   Remainder ≥ 0: 2a. Shift Quotient register to left, setting new rightmost bit to 1
   Remainder < 0: 2b. Restore original value by adding Divisor to Remainder, place sum in Remainder, shift Quotient to the left, setting new least significant bit to 0
3. Shift Divisor register right 1 bit
33rd repetition? No (< 33 repetitions): go to step 1. Yes (33 repetitions): Done

Example: 0111 / 0010 (4-bit operands, so 5 repetitions)
Quot.   Divisor     Rem.
0000    00100000    00000111   (initial)
                    11100111   (subtract)
                    00000111   (restore)
0000    00010000    00000111
                    11110111
                    00000111
0000    00001000    00000111
                    11111111
                    00000111
0000    00000100    00000111
                    00000011
0001                00000011
0001    00000010    00000011
                    00000001
0011                00000001
0011    00000001    00000001
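The restoring division of Version 1 can be sketched in Python as below. The function name `divide_v1` is an assumption, a nonzero divisor is assumed (the hardware performs no divide-by-0 check), and a negative Python integer stands in for a Remainder register whose sign bit is set.

```python
def divide_v1(dividend, divisor, bits=32):
    """Restoring division, Version 1; returns (quotient, remainder)."""
    remainder = dividend              # Remainder register starts with the dividend
    divisor_reg = divisor << bits     # divisor starts in the left half
    quotient = 0
    for _ in range(bits + 1):
        remainder -= divisor_reg      # step 1: subtract
        if remainder >= 0:
            quotient = (quotient << 1) | 1    # step 2a: quotient bit 1
        else:
            remainder += divisor_reg          # step 2b: restore
            quotient <<= 1                    #          quotient bit 0
        divisor_reg >>= 1             # step 3: shift Divisor right
    return quotient, remainder
```

For 0111 / 0010 with `bits=4` the loop runs five times and ends with quotient 0011 and remainder 0001, matching the trace.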
Observations: Divide
Version 1
 Half of the bits in divisor register always 0
=> 1/2 of 64-bit adder is wasted
=> 1/2 of divisor is wasted
 Instead of shifting divisor to right,
shift remainder to left?
 1st step cannot produce a 1 in quotient bit
=> switch order to shift first and then
subtract
=> save 1 iteration
 Eliminate Quotient register by combining
with Remainder register as shifted left

68
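Divide Hardware (Version 2)
 32-bit Divisor register, 32-bit ALU, 64-bit Remainder register, (0-bit Quotient register)
[Figure: the 32-bit Divisor register and the left half of the 64-bit Remainder (Quotient) register feed the 32-bit ALU, whose output is written to the left half of Remainder; Remainder shifts left; Control tests the sign of the result and drives the shift and the write.]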
Divide Algorithm (Version 2)
Start: Place Dividend in Remainder
1. Shift Remainder register left 1 bit
2. Subtract Divisor register from the left half of Remainder register, and place the result in the left half of Remainder register
   Test Remainder
   Remainder ≥ 0: 3a. Shift Remainder to left, setting new rightmost bit to 1
   Remainder < 0: 3b. Restore original value by adding Divisor to left half of Remainder, and place sum in left half of Remainder. Also shift Remainder to left, setting the new least significant bit to 0
32nd repetition? No (< 32 repetitions): go to step 2. Yes (32 repetitions): Done. Shift left half of Remainder right 1 bit

Example: 0111 / 0010 (4-bit operands, so 4 repetitions)
Step   Remainder   Div.
0      0000 0111   0010
1.1    0000 1110
1.2    1110 1110
1.3b   0001 1100
2.2    1111 1100
2.3b   0011 1000
3.2    0001 1000
3.3a   0011 0001
4.2    0001 0001
4.3a   0010 0011
Done   0001 0011
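Version 2 can be sketched the same way, with the quotient bits shifted into the right half of the Remainder register. The name `divide_v2` is hypothetical and a nonzero divisor is assumed; instead of subtracting and then restoring, the sketch only commits the subtraction when the result is non-negative, which leaves the same register contents as steps 3a/3b.

```python
def divide_v2(dividend, divisor, bits=32):
    """Restoring division, Version 2, with a combined Remainder/Quotient register."""
    mask = (1 << bits) - 1
    rem = dividend << 1                          # step 1: initial shift left
    for _ in range(bits):
        left = (rem >> bits) - divisor           # step 2: subtract from the left half
        if left >= 0:
            rem = (left << bits) | (rem & mask)  # keep the difference
            rem = (rem << 1) | 1                 # step 3a: shift left, new bit = 1
        else:
            rem = rem << 1                       # step 3b: (restored) shift left, new bit = 0
    quotient = rem & mask                        # right half holds the quotient
    remainder = (rem >> bits) >> 1               # undo the extra shift of the left half
    return quotient, remainder
```

With `divide_v2(0b0111, 0b0010, bits=4)` the register passes through 0001 1100, 0011 1000, 0011 0001, and 0010 0011 before the final correction gives quotient 0011 and remainder 0001.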
Divide
 Signed divides:
 Remember signs, make positive, complement quotient and remainder if necessary
 Let Dividend and Remainder have the same sign, and negate Quotient if Divisor sign and Dividend sign disagree
 e.g., -7 ÷ 2 = -3, remainder = -1
-7 ÷ -2 = 3, remainder = -1
 Satisfy Dividend = Quotient x Divisor + Remainder
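The sign rules can be written out as a short Python sketch. The function name `signed_divide` is an assumption; note that Python's own `//` and `%` round toward negative infinity, so they are applied only to the magnitudes here.

```python
def signed_divide(dividend, divisor):
    """Signed division with the remainder taking the sign of the dividend."""
    quotient = abs(dividend) // abs(divisor)    # divide the magnitudes
    remainder = abs(dividend) % abs(divisor)
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient                    # signs disagree: negate the quotient
    if dividend < 0:
        remainder = -remainder                  # remainder follows the dividend's sign
    return quotient, remainder
```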
Observations: Multiply and
Divide
 Same hardware as multiply: just need ALU to
add or subtract, and 64-bit register to shift
left (multiply: shift right)
 Hi and Lo registers in MIPS combine to act as
64-bit register for multiply and divide

72
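Multiply/Divide Hardware
 32-bit Multiplicand/Divisor register, 32-bit ALU, 64-bit Product/Remainder register, (0-bit Multiplier/Quotient register)
[Figure: the 32-bit Multiplicand/Divisor register and the left half of the 64-bit Product/Remainder register feed the 32-bit ALU, whose output is written to the left half; the 64-bit register, which also holds the Multiplier/Quotient, can shift right (multiply) or left (divide) under Control.]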
Floating-Point: Motivation
 What can be represented in n bits?
Unsigned          0 to 2^n - 1
2's Complement    -2^(n-1) to 2^(n-1) - 1
1's Complement    -2^(n-1) + 1 to 2^(n-1) - 1
Excess M          -M to 2^n - M - 1
 But, what about ...
 very large numbers? 9,349,398,989,787,762,244,859,087,678
 very small numbers? 0.0000000000000000000000045691
 rationals: 2/3
 irrationals: √2
 transcendentals: e, π
Scientific Notation: Binary
1.0_two × 2^-1
 Significand (mantissa): 1.0_two, containing the "binary point"; radix (base): 2; exponent: -1
 Computer arithmetic that supports it is called floating point, because the binary point is not fixed, as it is for integers
 Normalized form: no leading 0s (exactly one nonzero digit to the left of the point)
 Alternatives to represent 1/1,000,000,000
 Normalized: 1.0 × 10^-9
 Not normalized: 0.1 × 10^-8, 10.0 × 10^-10
FP Representation
 Normal format: 1.xxxxxxxxxx_two × 2^(yyyy_two)
 Want to put it into multiple words: 32 bits for
single-precision and 64 bits for double-
precision
 A simple single-precision representation:
31 30 23 22 0
S Exponent Significand
1 bit 8 bits 23 bits
S represents sign
Exponent represents y’s
Significand represents x’s

77
Double Precision Representation
 Next multiple of word size (64 bits)
31 30         20 19            0
S  Exponent      Significand
1 bit  11 bits   20 bits
Significand (cont'd)
32 bits
 Double precision (vs. single precision): the primary advantage is greater accuracy due to the larger significand
IEEE 754 Standard (1/4)
 Regarding single precision, DP similar
 Sign bit:
1 means negative
0 means positive
 Significand:
 To pack more bits, leading 1 implicit for normalized numbers
 1 + 23 bits single, 1 + 52 bits double
 always true: 0 ≤ stored Significand field < 1 (for normalized numbers), so the represented value 1.Significand lies in [1, 2)
 Note: 0 has no leading 1, so reserve exponent value 0 just for number 0
IEEE 754 Standard (2/4)
 Exponent:
 Need to represent positive and negative exponents
 Also want to compare FP numbers as if they were integers, to help in value comparisons
 What if we use 2's complement to represent the exponent?
e.g., 1.0 × 2^-1 versus 1.0 × 2^+1 (1/2 versus 2)
1/2   0 1111 1111 000 0000 0000 0000 0000 0000
2     0 0000 0001 000 0000 0000 0000 0000 0000
If we use integer comparison for these two words, we will conclude that 1/2 > 2!
Biased (Excess)
Notation
 Biased 7
0000 -7
0001 -6
0010 -5
0011 -4
0100 -3
0101 -2
0110 -1
0111 0
1000 1
1001 2
1010 3
1011 4
1100 5
1101 6
1110 7
1111 8

81
IEEE 754 Standard (3/4)
 Instead, let notation 0000 0000 be most
negative, and 1111 1111 most positive
 Called biased notation, where bias is the
number subtracted to get the real number
 IEEE 754 uses bias of 127 for single precision:
Subtract 127 from Exponent field to get actual
value for exponent
 1023 is bias for double precision

1/2 0 0111 1110 000 0000 0000 0000 0000 0000


2 0 1000 0000 000 0000 0000 0000 0000 0000

82
IEEE 754 Standard (4/4)
 Summary (single precision):

31 30       23 22              0
S  Exponent    Significand
1 bit  8 bits  23 bits
(-1)^S × (1.Significand) × 2^(Exponent - 127)

 Double precision identical, except with exponent bias of 1023
Example: FP to Decimal
0 0110 1000 101 0101 0100 0011 0100 0010
 Sign: 0 => positive
 Exponent:
 0110 1000_two = 104_ten
 Bias adjustment: 104 - 127 = -23
 Significand:
 1 + 2^-1 + 2^-3 + 2^-5 + 2^-7 + 2^-9 + 2^-14 + 2^-15 + 2^-17 + 2^-22 = 1.0 + 0.666115
 Represents: 1.666115_ten × 2^-23 ≈ 1.986 × 10^-7
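The decoding steps can be checked in Python by applying the single-precision formula to the bit pattern and comparing with the standard library's own interpretation of the same 32 bits. The helper name `decode_single` is an assumption, and it handles normalized numbers only (Exponent field 1 to 254).

```python
import struct

def decode_single(bits):
    """Decode a normalized single-precision pattern into (sign, exponent, fraction, value)."""
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF            # biased exponent field
    fraction = bits & 0x7FFFFF                # 23 stored significand bits
    value = (-1) ** sign * (1 + fraction / 2 ** 23) * 2.0 ** (exponent - 127)
    return sign, exponent, fraction, value

# 0 0110 1000 101 0101 0100 0011 0100 0010 written in hexadecimal
sign, exponent, fraction, value = decode_single(0x34554342)
reference = struct.unpack(">f", struct.pack(">I", 0x34554342))[0]
```

Here `exponent` is 104 (so the actual exponent is -23) and `value` agrees with `reference`, about 1.986 × 10^-7.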
Example 1: Decimal to FP
 Number = -0.75
= -0.11_two × 2^0 (base 2 scientific notation)
= -1.1_two × 2^-1 (normalized scientific notation)
 Sign: negative => 1
 Exponent:
 Bias adjustment: -1 + 127 = 126
 126_ten = 0111 1110_two
1 0111 1110 100 0000 0000 0000 0000 0000
Example 2: Decimal to FP
 A more difficult case: representing 1/3?
= 0.33333…_ten = 0.0101010101…_two × 2^0 (base 2)
= 1.0101010101…_two × 2^-2 (normalization)
 Sign: 0
 Exponent = -2 + 127 = 125_ten = 01111101_two
 Significand = 0101010101…
0 0111 1101 0101 0101 0101 0101 0101 010 (significand truncated to 23 bits)
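Both conversions can be cross-checked with Python's `struct` module, which packs a value into IEEE 754 single precision. The helper name `float_to_fields` is an assumption. One caveat: the packing rounds to nearest, so for 1/3 the last stored bit comes out as 1, whereas the pattern written out by hand above simply truncates the repeating significand.

```python
import struct

def float_to_fields(x):
    """Return (sign, exponent field, significand field) of x as a single-precision number."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

minus_three_quarters = float_to_fields(-0.75)
one_third = float_to_fields(1 / 3)
```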
Single-Precision Range
 Exponents 00000000 and 11111111 reserved
 Smallest value
 Exponent: 00000001 => actual exponent = 1 - 127 = -126
 Fraction: 000…00 => significand = 1.0
 ±1.0 × 2^-126 ≈ ±1.2 × 10^-38
 Largest value
 Exponent: 11111110 => actual exponent = 254 - 127 = +127
 Fraction: 111…11 => significand ≈ 2.0
 ±2.0 × 2^+127 ≈ ±3.4 × 10^+38
Double-Precision Range
 Exponents 0000…00 and 1111…11 reserved
 Smallest value
 Exponent: 00000000001 => actual exponent = 1 - 1023 = -1022
 Fraction: 000…00 => significand = 1.0
 ±1.0 × 2^-1022 ≈ ±2.2 × 10^-308
 Largest value
 Exponent: 11111111110 => actual exponent = 2046 - 1023 = +1023
 Fraction: 111…11 => significand ≈ 2.0
 ±2.0 × 2^+1023 ≈ ±1.8 × 10^+308
Floating-Point Precision
 Relative precision
 all fraction bits are significant
 Single: approx 2^-23
 Equivalent to 23 × log10(2) ≈ 23 × 0.3 ≈ 6 decimal digits of precision
 Double: approx 2^-52
 Equivalent to 52 × log10(2) ≈ 52 × 0.3 ≈ 16 decimal digits of precision
Zero and Special Numbers
 What have we defined so far? (single precision)

Exponent   Significand   Object
0          0             ???
0          nonzero       ???
1-254      anything      +/- floating-point
255        0             ???
255        nonzero       ???
Representation for 0
 Represent 0?
 exponent all zeroes
 significand all zeroes too
 What about sign?
 +0: 0 00000000 00000000000000000000000
 -0: 1 00000000 00000000000000000000000

92
Special Numbers
 What have we defined so far? (single precision)

Exponent   Significand   Object
0          0             0
0          nonzero       ???
1-254      anything      +/- floating-point
255        0             ???
255        nonzero       ???

 Range:
 Smallest normalized: 1.0 × 2^-126 ≈ 1.2 × 10^-38
 What if result too small? (> 0, < 1.2 × 10^-38 => Underflow!)
 Largest: (2 - 2^-23) × 2^127 ≈ 3.4 × 10^38
 What if result too large? (> 3.4 × 10^38 => Overflow!)
Gradual Underflow
 Represent denormalized numbers (denorms)
 Exponent: all zeroes
 Significand: nonzero
0 0000 0000 0100 0000 0000 0000 0000 000 = 0.01_two × 2^-126
 Allow a number to degrade in significance until it becomes 0 (gradual underflow)
Smallest Number
 The smallest normalized number
 1.000 0000 0000 0000 0000 0000 × 2^-126
 The smallest denormalized number
 0.000 0000 0000 0000 0000 0001 × 2^-126 = 2^-149
Special Numbers
 What have we defined so far? (single
precision)

Exponent Significand Object


0 0 0
0 nonzero denorm
1-254 anything +/- floating-point
255 0 ???
255 nonzero ???

96
Representation for +/- Infinity
 In FP, divide by zero should produce +/-
infinity, not overflow
 Why?
 OK to do further computations with infinity,
e.g., X/0 > Y may be a valid comparison
 IEEE 754 represents +/- infinity
 Most positive exponent reserved for infinity
 Significands all zeroes

S 1111 1111 0000 0000 0000 0000 0000 000

97
Special Numbers
(cont’d)
 What have we defined so far? (single-
precision)

Exponent Significand Object


0 0 0
0 nonzero denorm
1-254 anything +/- fl. pt. #
255 0 +/- infinity
255 nonzero ???

98
Representation for Not a Number
 What do I get if I calculate sqrt(-4.0) or 0/0?
 If infinity is not an error, these should not be either
 They are called Not a Number (NaN)
 Exponent = 255, Significand nonzero
 Why is this useful?
 Hope NaNs help with debugging?
 They contaminate: op(NaN, X) = NaN
 OK if you calculate a NaN but don’t use it
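The contamination rule can be observed directly. This is a small sketch under one assumption worth stating: Python itself raises an exception for `math.sqrt(-4.0)` and for `0.0 / 0.0`, so the NaN here is produced by `inf - inf`, another invalid operation that IEEE 754 maps to NaN.

```python
import math

inf = float("inf")
nan = inf - inf   # invalid operation: the result is NaN, not an error

# Infinity still takes part in ordinary comparisons.
print(inf > 1e308)            # True

# NaN contaminates every arithmetic result it touches.
print(math.isnan(nan + 1.0))  # True
print(math.isnan(nan * 0.0))  # True

# NaN compares unequal to everything, including itself.
print(nan == nan)             # False
```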

99
Special Numbers (cont’d)
 What have we defined so far? (single precision)

Exponent   Significand   Object
0          0             0
0          nonzero       denorm
1-254      anything      +/- floating-point number
255        0             +/- infinity
255        nonzero       NaN
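The completed table maps directly onto a decoder for the exponent and significand fields. The sketch below assumes a 32-bit pattern given as an unsigned integer; the function names `classify_float32` and `float_to_bits` are hypothetical helpers introduced only for illustration.

```python
import struct

def classify_float32(bits: int) -> str:
    """Classify a 32-bit pattern using the single-precision table."""
    sign = "-" if (bits >> 31) & 1 else "+"
    exponent = (bits >> 23) & 0xFF   # 8-bit biased exponent field
    fraction = bits & 0x7FFFFF       # 23-bit significand field
    if exponent == 0:
        return sign + ("0" if fraction == 0 else "denorm")
    if exponent == 255:
        return (sign + "infinity") if fraction == 0 else "NaN"
    return sign + "normalized"

def float_to_bits(x: float) -> int:
    """Return the single-precision bit pattern of x (x must fit in single precision)."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

print(classify_float32(0x00000000))          # +0
print(classify_float32(0x80000000))          # -0
print(classify_float32(0x00200000))          # +denorm (0.01 in binary times 2^-126)
print(classify_float32(0x7F800000))          # +infinity
print(classify_float32(0x7FC00000))          # NaN
print(classify_float32(float_to_bits(0.5)))  # +normalized
```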

100
Outline
 Addition and subtraction (Sec. 3.2)
 Constructing an arithmetic logic unit
(Appendix C)
 Building ALU
 Add, sub, and, or, nor
 Set-on-less-than, overflow detection, zero
detection
 Fast adders
 Cascaded carry lookahead adder
 Multiple level carry lookahead adder
 Multiplication (Sec. 3.3, Appendix C)
 Unsigned multiply
 Signed multiply
 Division (Sec. 3.4)
 Floating point (Sec. 3.5)
 Representations
 Addition and multiplication

101
Floating-Point Addition
Basic addition algorithm:
(1) Align binary point: compute $Y_e - X_e$
 right shift the smaller number, say $X_m$, that many positions to form $X_m \times 2^{X_e - Y_e}$
(2) Add mantissa: compute $X_m \times 2^{X_e - Y_e} + Y_m$
(3) Normalization & check for over/underflow if necessary:
 left shift result, decrement result exponent
 right shift result, increment result exponent
 check overflow or underflow during the shift
(4) Round the mantissa and renormalize if necessary

102
Floating-Point Addition Example
 Now consider a 4-digit binary example
 $1.000_2 \times 2^{-1} + (-1.110_2 \times 2^{-2})$  (0.5 + (-0.4375))
 1. Align binary points
 Shift number with smaller exponent
 $1.000_2 \times 2^{-1} + (-0.111_2 \times 2^{-1})$
 2. Add mantissa
 $1.000_2 \times 2^{-1} + (-0.111_2 \times 2^{-1}) = 0.001_2 \times 2^{-1}$
 3. Normalize result & check for over/underflow
 $1.000_2 \times 2^{-4}$, with no over/underflow
 4. Round and renormalize if necessary
 $1.000_2 \times 2^{-4}$ (no change) $= 0.0625$
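The four steps of the example can be traced in code. This is a simplified sketch, not the hardware algorithm: it assumes signed significands held as exact `Fraction` values (so the alignment shift loses no bits), a 4-bit significand as in the example, round-to-nearest-even, and no exponent range check. The names `fp_add`, `normalize`, and `PRECISION` are hypothetical.

```python
from fractions import Fraction

PRECISION = 4  # significand bits: 1 integer bit + 3 fraction bits, as in the example

def normalize(sig, exp):
    """Shift a nonzero significand until 1 <= |sig| < 2, adjusting the exponent."""
    while abs(sig) >= 2:
        sig, exp = sig / 2, exp + 1   # right shift, increment exponent
    while abs(sig) < 1:
        sig, exp = sig * 2, exp - 1   # left shift, decrement exponent
    return sig, exp

def fp_add(x_sig, x_exp, y_sig, y_exp):
    """Return (significand, exponent) of x_sig * 2**x_exp + y_sig * 2**y_exp."""
    x_sig, y_sig = Fraction(x_sig), Fraction(y_sig)
    # Step 1: align binary points by right-shifting the operand with the smaller exponent.
    if x_exp < y_exp:
        x_sig, x_exp = x_sig / 2 ** (y_exp - x_exp), y_exp
    else:
        y_sig, y_exp = y_sig / 2 ** (x_exp - y_exp), x_exp
    # Step 2: add the aligned significands.
    total, exp = x_sig + y_sig, x_exp
    if total == 0:
        return Fraction(0), 0
    # Step 3: normalize (range checks for overflow/underflow are omitted here).
    total, exp = normalize(total, exp)
    # Step 4: round to PRECISION bits (ties to even), then renormalize if needed.
    scale = 2 ** (PRECISION - 1)
    total = Fraction(round(total * scale), scale)
    return normalize(total, exp)

# 1.000 x 2^-1 + (-1.110 x 2^-2), with 1.110 in binary equal to 7/4
sig, exp = fp_add(Fraction(1), -1, Fraction(-7, 4), -2)
print(sig, exp)                        # 1 -4
print(float(sig * Fraction(2) ** exp)) # 0.0625
```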

103
[Figure: floating-point adder datapath. Inputs: sign, exponent, and significand of each operand. Step 1: a small ALU compares the exponents, and the exponent difference controls a right shift of the smaller number. Step 2: the big ALU adds the significands. Step 3: normalize by shifting left or right while incrementing or decrementing the exponent. Step 4: rounding hardware rounds the result. Output: sign, exponent, significand.]


104
FP Adder Hardware
 Much more complex than integer adder
 Doing it in one clock cycle would take too long
 Much longer than integer operations
 Slower clock would penalize all instructions
 FP adder usually takes several cycles
 Can be pipelined

105
Floating-Point Multiplication
Basic multiplication algorithm
(1) Add exponents of operands to get exponent of product; the doubly biased exponent must be corrected:
 Example in excess-8 notation, with Xe = 7 and Ye = -3:
 Xe = 1111 = 15 = 7 + 8
 Ye = 0101 = 5 = -3 + 8
 Sum = 10100 = 20 = 4 + 8 + 8
 Need an extra subtraction step of the bias amount: 20 - 8 = 12 = 4 + 8 = 01100
(2) Multiplication of operand mantissa
(3) Normalize the product & check overflow or underflow during the shift
(4) Round the mantissa and renormalize if necessary
(5) Set the sign of product
106
Floating-Point Multiplication Example
 Now consider a 4-digit binary example
 $1.000_2 \times 2^{-1} \times (-1.110_2 \times 2^{-2})$  (0.5 × (-0.4375))
 1. Add exponents
 Unbiased: -1 + (-2) = -3
 Biased: (-1 + 127) + (-2 + 127) - 127 = -3 + 254 - 127 = -3 + 127
 2. Multiply operand mantissa
 $1.000_2 \times 1.110_2 = 1.110_2$  =>  $1.110_2 \times 2^{-3}$
 3. Normalize result & check for over/underflow
 $1.110_2 \times 2^{-3}$ (no change) with no over/underflow
 4. Round and renormalize if necessary
 $1.110_2 \times 2^{-3}$ (no change)
 5. Determine sign: the operands have opposite signs, so the product is negative
 $-1.110_2 \times 2^{-3} = -0.21875$
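The five steps can be written out as a short routine. This is an illustrative sketch under stated assumptions: operands are given as a sign bit, a normalized significand magnitude in [1, 2) held as an exact `Fraction`, and an exponent biased by 127; the significand is rounded to 4 bits as in the example; exponent overflow and underflow checks are left out. The names `fp_mul`, `BIAS`, and `PRECISION` are hypothetical.

```python
from fractions import Fraction

BIAS = 127      # single-precision exponent bias
PRECISION = 4   # significand bits, as in the 4-digit example

def fp_mul(x_sign, x_sig, x_biased_exp, y_sign, y_sig, y_biased_exp):
    """Multiply two operands given as (sign bit, significand in [1, 2), biased exponent)."""
    # Step 1: add the biased exponents and subtract the bias once,
    # because the raw sum carries the bias twice.
    exp = x_biased_exp + y_biased_exp - BIAS
    # Step 2: multiply the significand magnitudes.
    sig = Fraction(x_sig) * Fraction(y_sig)
    # Step 3: normalize. The product lies in [1, 4), so one right shift is enough.
    if sig >= 2:
        sig, exp = sig / 2, exp + 1
    # Step 4: round to PRECISION bits (ties to even) and renormalize if rounding carried out.
    scale = 2 ** (PRECISION - 1)
    sig = Fraction(round(sig * scale), scale)
    if sig >= 2:
        sig, exp = sig / 2, exp + 1
    # Step 5: the product is negative exactly when the operand signs differ.
    sign = x_sign ^ y_sign
    return sign, sig, exp

# (1.000 x 2^-1) x (-1.110 x 2^-2), with 1.110 in binary equal to 7/4
sign, sig, exp = fp_mul(0, Fraction(1), -1 + BIAS, 1, Fraction(7, 4), -2 + BIAS)
value = (-1) ** sign * sig * Fraction(2) ** (exp - BIAS)
print(sign, sig, exp - BIAS)  # 1 7/4 -3
print(float(value))           # -0.21875
```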
107
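MIPS R2000 Organization
[Figure: MIPS R2000 organization. Memory connects to the CPU (registers $0 to $31, arithmetic unit, multiply/divide unit with Lo and Hi registers), to Coprocessor 1 (FPU: registers $0 to $31, arithmetic unit), and to Coprocessor 0 (traps and memory: BadVAddr, Cause, Status, and EPC registers).]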

108
MIPS Floating Point
 Separate floating-point instructions:
 Single precision: add.s, sub.s, mul.s, div.s
 Double precision: add.d, sub.d, mul.d, div.d
 FP part of the processor:
 contains 32 32-bit registers: $f0, $f1, …
 most registers specified in .s and .d instructions refer to this set
 Double precision: by convention, an even/odd pair contains one DP FP number: $f0/$f1, $f2/$f3
 separate load and store: lwc1 and swc1
 Instructions to move data between main processor and coprocessors:
 mfc0, mtc0, mfc1, mtc1, etc.

109
Interpretation of Data
The BIG Picture
 Bits have no inherent meaning
 Interpretation depends on the instructions applied
 Computer representations of numbers
 Finite range and precision
 Need to account for this in programs
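A single 32-bit pattern illustrates the point that meaning comes from interpretation. The sketch below is an assumption-laden illustration: it uses Python's `struct` module in place of machine instructions, and the pattern `0xBF800000` is chosen arbitrarily.

```python
import struct

PATTERN = 0xBF800000                 # one fixed 32-bit pattern
raw = struct.pack(">I", PATTERN)     # the four bytes themselves

as_unsigned = struct.unpack(">I", raw)[0]  # read as an unsigned integer
as_signed = struct.unpack(">i", raw)[0]    # read as a two's complement integer
as_float = struct.unpack(">f", raw)[0]     # read as IEEE 754 single precision

print(as_unsigned)  # 3212836864
print(as_signed)    # -1082130432
print(as_float)     # -1.0
```

The same bits give three different values, depending only on which decoding is applied.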

110
§3.6 Parallelism and Computer Arithmetic: Associativity
Associativity
 Are floating-point add and subtract associative?

Operand / step   (x+y)+z            x+(y+z)
x                -1.50E+38          -1.50E+38
y                1.50E+38           1.50E+38
z                1.0                1.0
Inner sum        x+y = 0.00E+00     y+z = 1.50E+38
Result           1.00E+00           0.00E+00

 Therefore, floating-point add and subtract are not associative!
 Why? FP result approximates real result!
 This example: $1.5 \times 10^{38}$ is so much larger than 1.0 that $1.5 \times 10^{38} + 1.0$ in floating-point representation is still $1.5 \times 10^{38}$
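The two groupings can be evaluated directly. One assumption to note: Python floats are double precision rather than single precision, but $1.5 \times 10^{38}$ is still far too large for an added 1.0 to survive rounding, so the same effect appears.

```python
x, y, z = -1.5e38, 1.5e38, 1.0

left = (x + y) + z    # x + y is exactly 0.0, so z survives
right = x + (y + z)   # y + z rounds back to 1.5e38, so z is lost

print(left)           # 1.0
print(right)          # 0.0
print(left == right)  # False
```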
111
§3.9 Concluding Remarks
Concluding Remarks
 ISAs support arithmetic
 Signed and unsigned integers
 Floating-point approximation to reals
 Bounded range and precision
 Operations can overflow and underflow
 MIPS ISA
 Core instructions: 54 most frequently used
 100% of SPECINT, 97% of SPECFP
 Other instructions: less frequent

112
