Computer Arithmetic
Department of Computer Science, National Tsing Hua University
Prof. Ting-Ting Hwang (黃婷婷)
Outline
Addition and subtraction (Sec. 3.2)
Constructing an arithmetic logic unit (Appendix C)
Building ALU
Add, sub, and, or, nor
Set-on-less-than, overflow detection, zero detection
Fast adders
Cascaded carry look-ahead adder
Multiple level carry look-ahead adder
Multiplication (Sec. 3.3, Appendix C)
Unsigned multiply
Signed multiply
Division (Sec. 3.4)
Floating point (Sec. 3.5)
Representations
Addition and multiplication
Problem: Designing MIPS ALU
Requirements: must support the following arithmetic and logic operations
add, sub: two’s complement adder/subtractor with overflow detection
and, or, nor: logical AND, logical OR, logical NOR
slt (set on less than): two’s complement adder with inverter, check sign bit of result
Functional Specification
[Figure: ALU block. Inputs: A (32 bits), B (32 bits), ALUop (4 bits). Outputs: Result (32 bits), Zero, Overflow, CarryOut]
ALU Control (ALUop)   Function
0000                  and
0001                  or
0010                  add
0110                  subtract
0111                  set-on-less-than
1100                  nor
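The specification above can be sketched as a behavioral model. This is a minimal sketch, not the gate-level design developed in the following slides: the function name `alu`, the helper `to_signed`, and the choice to report Overflow and CarryOut as 0 for the logic operations and for slt are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF

def to_signed(x):
    """Interpret a 32-bit pattern as a two's complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x

def alu(aluop, a, b):
    """Return (result, zero, overflow, carry_out) for 32-bit patterns a and b."""
    a &= MASK32
    b &= MASK32
    overflow = carry_out = 0
    if aluop == 0b0000:                      # and
        result = a & b
    elif aluop == 0b0001:                    # or
        result = a | b
    elif aluop == 0b1100:                    # nor
        result = ~(a | b) & MASK32
    elif aluop in (0b0010, 0b0110):          # add, subtract
        subtract = aluop == 0b0110
        operand = (~b & MASK32) if subtract else b
        total = a + operand + (1 if subtract else 0)
        result = total & MASK32
        carry_out = total >> 32
        # Overflow: both adder inputs share a sign that the result does not have.
        sign_a, sign_op, sign_r = a >> 31, operand >> 31, result >> 31
        overflow = int(sign_a == sign_op and sign_r != sign_a)
    elif aluop == 0b0111:                    # set-on-less-than (flags left at 0: assumption)
        result = int(to_signed(a) < to_signed(b))
    else:
        raise ValueError("unsupported ALUop")
    zero = int(result == 0)
    return result, zero, overflow, carry_out
```

For example, `alu(0b0110, 5, 5)` yields a zero result with Zero set, which is the condition a branch-on-equal would test.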
A Bit-slice ALU
Design trick 1: divide and conquer
Break the problem into simpler problems, solve them and glue together the solution
Design trick 2: solve part of the problem and extend
[Figure: 32-bit ALU built from 32 one-bit slices, ALU0 to ALU31. Slice i takes ai, bi and cin and produces si and co; the 4-bit ALUop drives every slice; outputs are the 32-bit Result, Overflow and Zero]
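A 1-bit ALU
Design trick 3: take pieces you know (or can imagine) and try to put them together
[Figure: 1-bit ALU. A and B feed an AND gate, an OR gate and a 1-bit full adder (which also takes CarryIn and produces CarryOut); a Mux selected by Operation picks and (0), or (1) or add (2) as Result]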
A 4-bit ALU
[Figure: a 1-bit ALU (AND, OR, 1-bit full adder, result Mux) next to a 4-bit ALU made of four 1-bit ALUs. Slice i takes Ai, Bi and CarryIni and produces Resulti and CarryOuti; CarryOuti feeds CarryIn of slice i+1; Operation is shared by all slices]
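The chaining of 1-bit slices can be sketched in a few lines. This is an illustrative model only: the names `full_adder`, `one_bit_alu` and `ripple_alu` are hypothetical, and the Operation encoding (0 = and, 1 = or, 2 = add) follows the Mux inputs of the 1-bit ALU slide.

```python
def full_adder(a, b, cin):
    """1-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout

def one_bit_alu(a, b, cin, operation):
    """1-bit ALU slice: the Mux picks and (0), or (1) or add (2)."""
    s, cout = full_adder(a, b, cin)
    result = (a & b, a | b, s)[operation]
    return result, cout

def ripple_alu(a, b, operation, width=4, carry_in=0):
    """Chain `width` slices; CarryOut of slice i feeds CarryIn of slice i+1."""
    result, carry = 0, carry_in
    for i in range(width):
        bit, carry = one_bit_alu((a >> i) & 1, (b >> i) & 1, carry, operation)
        result |= bit << i
    return result, carry
```

Calling `ripple_alu(0b0101, 0b0011, 2)` adds 5 and 3 through four slices and returns the 4-bit sum together with the final carry out.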
How about Subtraction?
2’s complement: take inverse of every bit and add 1 (at cin of first stage)
A + B’ + 1 = A + (B’ + 1) = A + (-B) = A - B
Bit-wise inverse of B is B’
[Figure: a 2-to-1 Mux controlled by Sel passes either B (0) or B’ (1) to the ALU, which produces Result and CarryOut]
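The identity A + B’ + 1 = A - B can be checked numerically. A minimal sketch, assuming a hypothetical helper `subtract` that models a `width`-bit adder with the B input inverted and the first-stage carry-in set to 1:

```python
def subtract(a, b, width=32):
    """Compute a - b as a + (bitwise inverse of b) + 1, truncated to `width` bits."""
    mask = (1 << width) - 1
    b_inverted = ~b & mask          # B' : every bit of B flipped
    return (a + b_inverted + 1) & mask
```

With `width=4`, `subtract(3, 7)` returns the pattern 1100, which is -4 in 4-bit two's complement.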
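Revised Diagram
LSB and MSB need to do a little extra
[Figure: 32 bit slices ALU31 to ALU0 driven by the 4-bit ALUop. The carry-in of ALU0 is supplied with a 1 on subtraction, combining the CarryIn and Bnegate signals; the MSB slice needs extra logic (marked "?") to produce Overflow; outputs are the 32-bit Result, Overflow and Zero]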
R-Format Instructions (1/2)
Define the following “fields”:
Field:  opcode  rs  rt  rd  shamt  funct
Bits:   6       5   5   5   5      6
opcode: partially specifies what instruction it is (Note: 0 for all R-Format instructions)
funct: combined with opcode to specify the instruction
Question: Why aren’t opcode and funct a single 12-bit field?
rs (Source Register): generally used to specify register containing first operand
rt (Target Register): generally used to specify register containing second operand
rd (Destination Register): generally used to specify register which will receive result of computation
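Nor Operation
A nor B = (not A) and (not B)
[Figure: 1-bit ALU with an Ainvert Mux on input a and a Bnegate Mux on input b; the result Mux selected by Operation picks and (0), or (1) or add (2); the adder takes CarryIn and produces CarryOut. ALUop combines Ainvert, Bnegate and the 2-bit Operation]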
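The De Morgan identity above means NOR needs no new gate, only the two inverting Muxes in front of the existing AND. A small sketch of one slice's logic path, with the hypothetical name `one_bit_logic` and the assumption that Operation 0 selects AND and 1 selects OR:

```python
def one_bit_logic(a, b, ainvert, bnegate, operation):
    """Logic part of a 1-bit slice with optional inversion of each input."""
    a_in = a ^ ainvert              # Ainvert = 1 passes (not a)
    b_in = b ^ bnegate              # Bnegate = 1 passes (not b)
    return (a_in & b_in, a_in | b_in)[operation]

# ALUop 1100: Ainvert = 1, Bnegate = 1, Operation = 00 (and) gives nor.
nor_table = {(a, b): one_bit_logic(a, b, 1, 1, 0) for a in (0, 1) for b in (0, 1)}
```

The resulting table is 1 only for the input pair (0, 0), which is the NOR truth table.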
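Set on Less Than (I)
1-bit ALU (for bits 1-30)
[Figure: 1-bit ALU with Ainvert and Bnegate input Muxes, CarryIn, and a result Mux selected by Operation: and (0), or (1), add (2), Less (3)]
Set on Less Than (II)
1-bit ALU for the most significant bit
[Figure: the same 1-bit ALU with two extra outputs: Set, taken from the adder output, and Overflow, from the overflow detection logic]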
Set on Less Than (III)
Bit 0 in ALU
[Figure: 1-bit ALU for bit 0 with Ainvert and Bnegate input Muxes and CarryIn; input 3 of the result Mux is driven by Set; the slice also produces CarryOut]
A Ripple Carry Adder and Set on Less Than
ALUop   Function
0000    and
0001    or
0010    add
0110    subtract
0111    set-less-than
1100    nor
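Set-on-less-than reuses the subtractor: compute a - b and route the sign bit of the difference to bit 0 of the result. The sketch below follows that description literally; the name `slt` is hypothetical, and the sketch assumes a - b does not overflow (when it does, the raw sign bit alone gives the wrong answer).

```python
def slt(a, b, width=32):
    """Return 1 if a < b (two's complement), using the sign bit of a - b.

    Assumption: a - b does not overflow the `width`-bit range.
    """
    mask = (1 << width) - 1
    diff = (a + (~b & mask) + 1) & mask   # a - b through the adder
    return diff >> (width - 1)            # Set = sign bit of the difference
```

For instance, `slt(0xFFFFFFFF, 1)` compares -1 with 1 and returns 1.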
Overflow
Decimal  Binary     Decimal  2’s complement
0        0000        0       0000
1        0001       -1       1111
2        0010       -2       1110
3        0011       -3       1101
4        0100       -4       1100
5        0101       -5       1011
6        0110       -6       1010
7        0111       -7       1001
                    -8       1000
Ex: 7 + 3 = 10 but ...
carries  0 1 1 1
           0 1 1 1     7
         + 0 0 1 1     3
           1 0 1 0    -6
Ex: -4 - 5 = -9 but ...
carries  1 0 0 0
           1 1 0 0    -4
         + 1 0 1 1    -5
           0 1 1 1     7
Overflow Detection
Overflow: result too big/small to represent
-8 ≤ 4-bit binary number ≤ 7
When adding operands with different signs, overflow cannot occur!
Overflow occurs when adding:
2 positive numbers and the sum is negative
2 negative numbers and the sum is positive
=> sign bit is set with the value of the result
Overflow if: Carry into MSB ≠ Carry out of MSB
carries  0 1 1 1
           0 1 1 1     7
         + 0 0 1 1     3
           1 0 1 0    -6
carries  1 0 0 0
           1 1 0 0    -4
         + 1 0 1 1    -5
           0 1 1 1     7
In each carry row the leftmost digit is the carry out of the MSB and the next digit is the carry into the MSB; they differ in both examples.
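Overflow Detection Logic
Overflow = CarryIn[N-1] XOR CarryOut[N-1]
[Figure: 4-bit ripple adder from CarryIn0 to CarryOut3; an XOR gate on the carry into and the carry out of the last slice produces Overflow]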
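The XOR rule can be tried on the two 4-bit examples from the overflow slides. A minimal sketch with the hypothetical name `add_with_overflow`, modeling a ripple adder and recording the carry into the MSB:

```python
def add_with_overflow(a, b, width=4):
    """Ripple-add two `width`-bit patterns; return (sum, overflow)."""
    result, carry, carry_into_msb = 0, 0, 0
    for i in range(width):
        if i == width - 1:
            carry_into_msb = carry            # CarryIn[N-1]
        a_bit, b_bit = (a >> i) & 1, (b >> i) & 1
        result |= (a_bit ^ b_bit ^ carry) << i
        carry = (a_bit & b_bit) | (carry & (a_bit ^ b_bit))
    # After the loop `carry` is CarryOut[N-1].
    return result, carry_into_msb ^ carry
```

`add_with_overflow(0b0111, 0b0011)` (7 + 3) and `add_with_overflow(0b1100, 0b1011)` (-4 - 5) both report overflow, while 3 + (-4) does not.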
Dealing with Overflow
Some languages (e.g., C) ignore overflow
Use MIPS addu, addiu, subu instructions
Other languages (e.g., Ada, Fortran) require raising an exception
Use MIPS add, addi, sub instructions
On overflow, invoke exception handler
Save PC in exception program counter (EPC) register
Jump to predefined handler address
mfc0 (move from coprocessor reg) instruction can retrieve (copy) EPC value (to a general purpose register), to return after corrective action (by jump register instruction)
Zero Detection Logic
Zero detection logic is just one big NOR gate over all Result bits (supports conditional jump)
[Figure: 4-bit ripple ALU; Result0 to Result3 of the four 1-bit ALUs feed a single NOR gate whose output is Zero]
Problems with Ripple Carry Adder
Carry bit may have to propagate from LSB to MSB => worst case delay: N-stage delay
[Figure: 4-bit ripple carry chain of 1-bit ALUs, from CarryIn0 at bit 0 through CarryOut3 at bit 3, next to the 1-bit adder with inputs A, B, CarryIn and output CarryOut]
Design Trick: look for parallelism and throw hardware at it
Carry Look-ahead: Theory (I) (Appendix C)
[Figure: two 1-bit ALUs. Bit 0 takes A0, B0 and Cin0 and produces Cout0, which is Cin1; bit 1 takes A1, B1 and Cin1 and produces Cout1, which is Cin2]
CarryOut=(B*CarryIn)+(A*CarryIn)+(A*B)
Cin1=Cout0= (B0 * Cin0)+(A0 * Cin0)+ (A0 * B0)
Cin2=Cout1= (B1 * Cin1)+(A1 * Cin1)+ (A1 * B1)
Carry Look-ahead: Theory (II)
Now define two new terms:
Generate Carry at Bit i: gi = Ai * Bi
Propagate Carry via Bit i: pi = Ai xor Bi
We can rewrite:
Cin1=g0+(p0*Cin0)
Cin2=g1+(p1*g0)+(p1*p0*Cin0)
Cin3=g2+(p2*g1)+(p2*p1*g0)+(p2*p1*p0*Cin0)
Carry going into bit 3 is 1 if
We generate a carry at bit 2 (g2)
Or we generate a carry at bit 1 (g1) and bit 2 allows it to propagate (p2 * g1)
Or we generate a carry at bit 0 (g0) and bit 1 as well as bit 2 allow it to propagate (p2 * p1 * g0) ...
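The expanded carry equations can be evaluated directly from the gi and pi signals, without waiting for a carry to ripple. The sketch below is an illustration under assumed names (`lookahead_carries`); it builds every carry as the sum of products shown above and then forms the sum bits as pi xor Cini.

```python
def lookahead_carries(a, b, cin0=0, width=4):
    """Return ([Cin0, Cin1, ..., Cin_width], sum) using generate/propagate terms."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(width)]   # gi = Ai * Bi
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(width)]   # pi = Ai xor Bi
    carries = [cin0]
    for i in range(width):
        carry = 0
        # Terms g[j] * p[j+1] * ... * p[i] for j = 0 .. i
        for j in range(i + 1):
            term = g[j]
            for k in range(j + 1, i + 1):
                term &= p[k]
            carry |= term
        # Last term: p[0] * ... * p[i] * Cin0
        term = cin0
        for k in range(i + 1):
            term &= p[k]
        carries.append(carry | term)
    total = 0
    for i in range(width):
        total |= (p[i] ^ carries[i]) << i
    return carries, total
```

For 0111 + 0011 this gives Cin1 = Cin2 = Cin3 = 1 and a carry out of 0, matching the carry row of the 7 + 3 example.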
A Plumbing Analogy for Carry Look-ahead (1, 2, 4 bits)
Common Carry Look-ahead Adder
Expensive to build a “full” carry look-ahead adder
Ex: Cin3=g2+(p2*g1)+(p2*p1*g0)+(p2*p1*p0*Cin0)
Just imagine length of the equation for Cin31
Common practices:
Cascaded carry look-ahead adder
Multiple level carry look-ahead adder
Cascaded Carry Look-ahead
Connects several N-bit look-ahead adders to form a big one
[Figure: four 8-bit carry look-ahead adders in a chain; carries C0, C8, C16 and C24 enter successive blocks, and each block produces 8 result bits]
Example: Carry Look-ahead Unit
[Figure: carry look-ahead unit with 4 gi inputs and 4 pi inputs]
Example: Cascaded Carry Look-ahead
Connects several N-bit look-ahead adders to form a big one
[Figure: a row of 1-bit adders grouped into cascaded look-ahead blocks]
Multiple Level Carry Look-ahead Adder
View an N-bit look-ahead adder as a block
Where to get Cin of the block?
[Figure: four 8-bit carry look-ahead adders for A[31:24]/B[31:24], A[23:16]/B[23:16], A[15:8]/B[15:8] and A[7:0]/B[7:0], with block carry-ins C24, C16, C8 and C0; each block produces 8 result bits]
A Carry Look-ahead Adder
[Figure: 16-bit adder built from four 4-bit ALUs, ALU0 to ALU3, producing Result0--3, Result4--7, Result8--11 and Result12--15. ALUi sends Pi and Gi to a carry-lookahead unit, which returns C1, C2 and C3 as the CarryIn of ALU1, ALU2 and ALU3 and C4 as CarryOut]
A  B  Cout
0  0  0     kill
0  1  Cin   propagate
1  0  Cin   propagate
1  1  1     generate
G=A*B
P=A+B
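Example: Carry Look-ahead Unit
[Figure: carry look-ahead unit with 4 gi inputs and 4 pi inputs, producing block-level P and G]
Example: Multiple Level Carry Look-ahead
[Figure: a row of 1-bit adders in groups, with a 4-bit carry look-ahead unit supplying the carries C[4:0]]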
Carry-select Adder
Arithmetic for Multimedia
Graphics and media processing operates on vectors of 8-bit and 16-bit data
Use 64-bit adder, with partitioned carry chain
Operate on 8×8-bit, 4×16-bit, or 2×32-bit vectors
SIMD (single-instruction, multiple-data)
Saturating operations
On overflow, result is largest representable value
c.f. 2s-complement modulo arithmetic
E.g., clipping in audio, saturation in video
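A partitioned carry chain and saturation can both be illustrated with eight unsigned 8-bit lanes packed into one 64-bit word. This is a sketch under assumptions: the name `packed_add_u8` is hypothetical, the lanes are treated as unsigned, and the loop stands in for what hardware does in parallel.

```python
def packed_add_u8(x, y, saturate=False):
    """Add eight unsigned 8-bit lanes packed in 64-bit words x and y."""
    result = 0
    for lane in range(8):
        shift = 8 * lane
        s = ((x >> shift) & 0xFF) + ((y >> shift) & 0xFF)
        if saturate:
            s = min(s, 0xFF)        # clamp to the largest representable value
        else:
            s &= 0xFF               # modulo arithmetic; the carry is dropped
        result |= s << shift        # nothing ever crosses into the next lane
    return result
```

With inputs 0x01FF and 0x0101, the low lane wraps to 0x00 in modulo mode but clamps to 0xFF in saturating mode, and in both modes the neighbouring lane is unaffected.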
MIPS R2000 Organization
[Figure: Memory; CPU with registers $0 to $31, an arithmetic unit, and a multiply/divide unit with Lo and Hi; Coprocessor 1 (FPU) with registers $0 to $31 and an arithmetic unit; registers BadVAddr, Cause, Status and EPC]
Multiplication in MIPS
mult $t1, $t2 # t1 * t2
No destination register: product could be ~\(2^{64}\); need two special registers to hold it
3-step process:
$t1        01111111111111111111111111111111
X $t2      01000000000000000000000000000000
Hi         00011111111111111111111111111111
Lo         11000000000000000000000000000000
mfhi $t3 # $t3 = 00011111111111111111111111111111
MIPS Multiplication
Two 32-bit registers for product
HI: most-significant 32 bits
LO: least-significant 32 bits
Instructions
mult rs, rt / multu rs, rt
64-bit product in HI/LO
mfhi rd / mflo rd
Move from HI/LO to rd
Can test HI value to see if product overflows 32 bits
mul rd, rs, rt
Least-significant 32 bits of product -> rd
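The split of a 64-bit product into HI and LO can be checked against the bit patterns on the previous slide. A minimal sketch of the unsigned case; the Python function name `multu` is borrowed from the instruction for illustration and only models the arithmetic, not the processor.

```python
MASK32 = 0xFFFFFFFF

def multu(rs, rt):
    """Unsigned 32 x 32 multiply; returns (hi, lo) halves of the 64-bit product."""
    product = (rs & MASK32) * (rt & MASK32)
    return product >> 32, product & MASK32

hi, lo = multu(0x7FFFFFFF, 0x40000000)   # the $t1 and $t2 values shown earlier
```

Here `hi` is 0x1FFFFFFF and `lo` is 0xC0000000; a nonzero `hi` is the sign that the product does not fit in 32 bits.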
Unsigned Multiply
Paper and pencil example (unsigned):
Multiplicand          1000 (ten)
Multiplier        x   1001 (ten)
                      1000
                     0000
                    0000
                   1000
Product           01001000 (ten)
m bits x n bits = m+n bit product
Binary makes it easy:
0 => place 0 ( 0 x multiplicand)
1 => place a copy ( 1 x multiplicand)
2 versions of multiply hardware and algorithm
Unsigned Multiplier (Ver. 1)
64-bit multiplicand register (with 32-bit multiplicand at right half), 64-bit ALU, 64-bit product register, 32-bit multiplier register
[Figure: 64-bit Multiplicand register (shift left) and 64-bit Product register feed the 64-bit ALU, whose output is written back to Product; 32-bit Multiplier register (shift right); Control test drives the shifts and the Product write]
Multiply Algorithm (Ver. 1)
Start
1. Test Multiplier0
Multiplier0 = 1: 1a. Add multiplicand to product and place the result in Product register
Multiplier0 = 0: go straight to step 2
2. Shift Multiplicand register left 1 bit
3. Shift Multiplier register right 1 bit
32nd repetition? No (fewer than 32 repetitions): go back to step 1. Yes (32 repetitions): Done
Example: 0010 x 0011 (4-bit operands, so 4 repetitions)
Product     Multiplier  Multiplicand
0000 0000   0011        0000 0010
0000 0010   0001        0000 0100
0000 0110   0000        0000 1000
0000 0110   0000        0001 0000
0000 0110   0000        0010 0000
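The version 1 loop translates almost line for line into code. A minimal sketch, assuming the hypothetical name `multiply_v1` and plain Python integers standing in for the Multiplicand, Multiplier and Product registers:

```python
def multiply_v1(multiplicand, multiplier, width=32):
    """Shift-and-add multiply: multiplicand shifts left, multiplier shifts right."""
    product = 0
    for _ in range(width):
        if multiplier & 1:              # 1. test Multiplier0
            product += multiplicand     # 1a. add multiplicand to product
        multiplicand <<= 1              # 2. shift Multiplicand left 1 bit
        multiplier >>= 1                # 3. shift Multiplier right 1 bit
    return product & ((1 << (2 * width)) - 1)
```

`multiply_v1(0b0010, 0b0011, width=4)` walks through the four repetitions of the example and ends with product 0000 0110.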
Observations: Multiply Ver. 1
Delay ratio of multiply to add 5:1 to 100:1
Half of the bits in multiplicand always 0
=> 64-bit adder is wasted
0’s inserted in right of multiplicand as shifted
=> least significant bits of product never changed once formed
Instead of shifting multiplicand to left, shift product to right?
Product register wastes space => combine Multiplier and Product register
Unsigned Multiplier (Ver. 2)
32-bit Multiplicand register, 32-bit ALU, 64-bit Product register (HI & LO in MIPS), (0-bit Multiplier register)
[Figure: 32-bit Multiplicand register feeding a 32-bit ALU]
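Multiply Algorithm (Ver. 2)
Start
1. Test Product0
Product0 = 1: 1a. Add multiplicand to the left half of product and place the result in the left half of Product register
Product0 = 0: go straight to step 2
2. Shift Product register right 1 bit
32nd repetition? No (fewer than 32 repetitions): go back to step 1. Yes (32 repetitions): Done
Example: 0010 x 0011 (4-bit operands, so 4 repetitions)
Multiplicand  Product
0010          0000 0011
              0010 0011
0010          0001 0001
              0011 0001
0010          0001 1000
0010          0000 1100
0010          0000 0110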
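In version 2 the multiplier starts in the right half of the Product register and is shifted out as the product is shifted in. A minimal sketch with the hypothetical name `multiply_v2`; Python's unbounded integers hold the extra carry bit that the left-half addition can produce before the shift.

```python
def multiply_v2(multiplicand, multiplier, width=32):
    """Shift-and-add multiply with the multiplier held in the right half of product."""
    product = multiplier                        # left half 0, right half = multiplier
    for _ in range(width):
        if product & 1:                         # 1. test Product0
            product += multiplicand << width    # 1a. add to the left half
        product >>= 1                           # 2. shift Product right 1 bit
    return product
```

With `width=4`, `multiply_v2(0b0010, 0b0011)` passes through the Product values listed in the example and finishes at 0000 0110.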
Signed Multiply
What about signed multiplication?
The easiest solution is to make both positive and remember whether to complement product when done (leave out sign bit, run for 31 steps)
Apply definition of 2’s complement
sign-extend partial products and subtract at end
Using Definition of 2’s Complement
Paper and pencil example (signed):
Multiplicand           1001 (-7)
Multiplier        x    1001 (-7)
                   11111001
                 + 0000000
                 + 000000
                 - 11001
Product            00110001 (49)
Rule 1: Multiplicand sign extended
Rule 2: Sign bit (s) of Multiplier
0 => 0 x multiplicand
1 => -1 x multiplicand
Why rule 2 ?
\( X = s\,x_{n-2}\,x_{n-3} \dots x_1\,x_0 \) (2’s complement)
\( \text{Value}(X) = -1 \times s \times 2^{n-1} + x_{n-2} \times 2^{n-2} + \dots + x_1 \times 2^{1} + x_0 \times 2^{0} \)
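The two rules can be applied mechanically: sign-extend the multiplicand, add a shifted copy for every 1 in the low multiplier bits, and subtract the copy selected by the sign bit. The sketch below uses the hypothetical name `signed_multiply` and works on `width`-bit two's complement patterns.

```python
def signed_multiply(multiplicand, multiplier, width=4):
    """Multiply two `width`-bit two's complement patterns; return a 2*width-bit pattern."""
    mask = (1 << (2 * width)) - 1
    extended = multiplicand
    if (multiplicand >> (width - 1)) & 1:
        # Rule 1: sign-extend the multiplicand to 2*width bits
        extended |= mask & ~((1 << width) - 1)
    product = 0
    for i in range(width):
        if (multiplier >> i) & 1:
            partial = (extended << i) & mask
            if i == width - 1:
                product = (product - partial) & mask   # Rule 2: sign bit has weight -2^(n-1)
            else:
                product = (product + partial) & mask
    return product
```

`signed_multiply(0b1001, 0b1001)` reproduces the paper-and-pencil result 00110001 (49).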
Faster Multiplier
A combinational multiplier
Use multiple adders
Cost/performance tradeoff
Can be pipelined
Several multiplications performed in parallel
Wallace Tree Multiplier
Use carry save adders: three inputs and two outputs
  10101110
  00100011
  10000111
----------------
  00001010 (sum)
  10100111 (carry)
8 full adders
One full adder delay (no carry propagation)
The last stage is performed by regular adder
What is the minimum delay for 16 x 16 multiplier ?
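A carry save adder is just a row of independent full adders, so it can be written with bitwise operations. A minimal sketch with the hypothetical name `carry_save_add`; the carry word is returned unshifted, as on the slide, so the true total is sum + 2 × carry.

```python
def carry_save_add(x, y, z):
    """Reduce three operands to a (sum, carry) pair with no carry propagation."""
    s = x ^ y ^ z                          # per-bit sum of the three inputs
    c = (x & y) | (x & z) | (y & z)        # per-bit carry (majority function)
    return s, c

s, c = carry_save_add(0b10101110, 0b00100011, 0b10000111)
total = s + (c << 1)                       # the final regular adder
```

For the three operands above this yields the sum word 00001010 and the carry word 10100111.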
8-bit Wallace Tree Multiplier
[Figure: partial products M0 to M7 reduced by a tree of six carry save adders (CSA) arranged in four levels (2, 2, 1, 1), followed by a regular adder]
Division in MIPS
div $t1, $t2 # t1 / t2
Quotient stored in Lo, remainder in Hi
mflo $t3 #copy quotient to t3
mfhi $t4 #copy remainder to t4
3-step process
Unsigned division:
divu $t1, $t2 # t1 / t2
Just like div, except now interpret t1, t2 as unsigned integers instead of signed
Answers are also unsigned, use mfhi, mflo to access
No overflow or divide-by-0 checking
Software must perform checks if required
Divide: Paper & Pencil
                     1001ten     Quotient
Divisor 1000ten | 1001010ten     Dividend
                 -1000
                   0010
                    0101
                     1010
                    -1000
                       10ten     Remainder
See how big a number can be subtracted, creating quotient bit on each step
Binary => 1 * divisor or 0 * divisor
Two versions of divide, successive refinement
Both dividend and divisor are 32-bit positive integers
Divide Hardware (Version 1)
64-bit Divisor register (initialized with 32-bit divisor in left half), 64-bit ALU, 64-bit Remainder register (initialized with 64-bit dividend), 32-bit Quotient register
[Figure: 64-bit Divisor register (shift right); 64-bit Remainder register with Write signal from Control]
Divide Algorithm (Version 1)
Start: Place Dividend in Remainder
1. Subtract Divisor register from Remainder register, and place the result in Remainder register
Test Remainder
2a. Remainder ≥ 0: Shift Quotient register to left, setting new rightmost bit to 1
2b. Remainder < 0: Restore original value by adding Divisor to Remainder, place sum in Remainder, shift Quotient to the left, setting new least significant bit to 0
3. Shift Divisor register right 1 bit
33rd repetition? No (fewer than 33 repetitions): go back to step 1. Yes (33 repetitions): Done
Example: 0111 / 0010 (4-bit operands, so 5 repetitions)
Quot.  Divisor    Rem.
0000   00100000   00000111
                  11100111
                  00000111
0000   00010000   00000111
                  11110111
                  00000111
0000   00001000   00000111
                  11111111
                  00000111
0000   00000100   00000111
                  00000011
0001              00000011
0001   00000010   00000011
                  00000001
0011              00000001
0011   00000001   00000001
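The restoring division loop can be sketched as follows. The name `divide_v1` is hypothetical, Python integers stand in for the registers, and the explicit zero-divisor check is an addition for the sketch (the MIPS instructions themselves perform no such check).

```python
def divide_v1(dividend, divisor, width=32):
    """Restoring division of unsigned `width`-bit values; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor is zero")   # guard added for this sketch
    remainder = dividend
    div = divisor << width                 # divisor starts in the left half
    quotient = 0
    for _ in range(width + 1):
        remainder -= div                   # 1. subtract Divisor from Remainder
        if remainder >= 0:
            quotient = (quotient << 1) | 1 # 2a. shift in a 1
        else:
            remainder += div               # 2b. restore, shift in a 0
            quotient <<= 1
        div >>= 1                          # 3. shift Divisor right 1 bit
    return quotient, remainder
```

`divide_v1(0b0111, 0b0010, width=4)` runs the five repetitions of the example and returns quotient 0011 and remainder 0001.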
Observations: Divide Version 1
Half of the bits in divisor register always 0
=> 1/2 of 64-bit adder is wasted
=> 1/2 of divisor is wasted
Instead of shifting divisor to right, shift remainder to left?
1st step cannot produce a 1 in quotient bit
=> switch order to shift first and then subtract
=> save 1 iteration
Eliminate Quotient register by combining with Remainder register as shifted left
Divide Hardware (Version 2)
32-bit Divisor register, 32-bit ALU, 64-bit Remainder register, (0-bit Quotient register)
[Figure: 32-bit Divisor register feeding a 32-bit ALU]
Divide Algorithm (Version 2)
Start: Place Dividend in Remainder
1. Shift Remainder register left 1 bit
Observations: Multiply and Divide
Same hardware as multiply: just need ALU to add or subtract, and 64-bit register to shift left (multiply: shift right)
Hi and Lo registers in MIPS combine to act as 64-bit register for multiply and divide
Multiply/Divide Hardware
32-bit Multiplicand/Divisor register, 32-bit ALU, 64-bit Product/Remainder register, (0-bit Multiplier/Quotient register)
[Figure: 32-bit Multiplicand/Divisor register feeding a 32-bit ALU; 64-bit Product/Remainder register, which also holds the Multiplier/Quotient, with Shift Right, Shift Left and Write signals from Control]
Floating-Point: Motivation
What can be represented in N bits?
Unsigned \(0\) to \(2^{N} - 1\)
2’s Complement \(-2^{N-1}\) to \(2^{N-1} - 1\)
1’s Complement \(-2^{N-1} + 1\) to \(2^{N-1} - 1\)
Excess M \(-M\) to \(2^{N} - M - 1\)
But, what about ...
very large numbers? 9,349,398,989,787,762,244,859,087,678
very small number? 0.0000000000000000000000045691
rationals 2/3
irrationals \(\sqrt{2}\)
transcendentals \(e\), \(\pi\)
Scientific Notation: Binary
\(1.0_{\text{two}} \times 2^{-1}\)
[Figure labels: 1.0 is the Significand (Mantissa), its point is the “binary point”, 2 is the radix (base), -1 is the exponent]
FP Representation
Normal format: \(1.xxxxxxxxxx_{\text{two}} \times 2^{yyyy_{\text{two}}}\)
Want to put it into multiple words: 32 bits for single-precision and 64 bits for double-precision
A simple single-precision representation:
Bits:   31      30-23      22-0
Field:  S       Exponent   Significand
Width:  1 bit   8 bits     23 bits
S represents sign
Exponent represents y’s
Significand represents x’s
Double Precision Representation
Next multiple of word size (64 bits)
First word:   bit 31: S (1 bit), bits 30-20: Exponent (11 bits), bits 19-0: Significand (20 bits)
Second word:  Significand (cont’d) (32 bits)
Double precision (vs. single precision)
But primary advantage is greater accuracy due to larger significand
IEEE 754 Standard (1/4)
Regarding single precision, DP similar
Sign bit:
1 means negative
0 means positive
Significand:
To pack more bits, leading 1 implicit for normalized numbers
1 + 23 bits single, 1 + 52 bits double
always true: 0 ≤ Significand < 1 (for normalized numbers; this is the stored field, without the implicit leading 1)
Note: 0 has no leading 1, so reserve exponent value 0 just for number 0
IEEE 754 Standard (2/4)
Exponent:
Need to represent positive and negative exponents
Also want to compare FP numbers as if they were integers, to help in value comparisons
If use 2’s complement to represent?
e.g., \(1.0 \times 2^{-1}\) versus \(1.0 \times 2^{+1}\) (1/2 versus 2)
1/2   0 1111 1111 000 0000 0000 0000 0000 0000
2     0 0000 0001 000 0000 0000 0000 0000 0000
Compared as integers, the pattern for 1/2 would look larger than the pattern for 2.
Biased (Excess) Notation
Bias 7
Pattern  Value
0000     -7
0001     -6
0010     -5
0011     -4
0100     -3
0101     -2
0110     -1
0111      0
1000      1
1001      2
1010      3
1011      4
1100      5
1101      6
1110      7
1111      8
IEEE 754 Standard (3/4)
Instead, let notation 0000 0000 be most negative, and 1111 1111 most positive
Called biased notation, where bias is the number subtracted to get the real number
IEEE 754 uses bias of 127 for single precision:
Subtract 127 from Exponent field to get actual value for exponent
1023 is bias for double precision
IEEE 754 Standard (4/4)
Summary (single precision):
Bits:   31      30-23      22-0
Field:  S       Exponent   Significand
Width:  1 bit   8 bits     23 bits
\((-1)^{S} \times (1.\text{Significand}) \times 2^{(\text{Exponent} - 127)}\)
Example: FP to Decimal
0 0110 1000 101 0101 0100 0011 0100 0010
Sign: 0 => positive
Exponent:
\(0110\,1000_{\text{two}} = 104_{\text{ten}}\)
Bias adjustment: 104 - 127 = -23
Significand:
\(1 + 2^{-1} + 2^{-3} + 2^{-5} + 2^{-7} + 2^{-9} + 2^{-14} + 2^{-15} + 2^{-17} + 2^{-22}\)
= 1.0 + 0.666115
Represents: \(1.666115_{\text{ten}} \times 2^{-23} \approx 1.986 \times 10^{-7}\)
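The decoding steps can be reproduced in code. This is a minimal sketch for normalized single-precision patterns only; the name `decode_single` is hypothetical, and the constant 0x34554342 is the bit pattern from the example written in hexadecimal.

```python
import struct

def decode_single(bits):
    """Split a 32-bit pattern into fields and evaluate it as a normalized number."""
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    value = (-1) ** sign * (1 + fraction / 2 ** 23) * 2.0 ** (exponent - 127)
    return sign, exponent, fraction, value

EXAMPLE_BITS = 0x34554342   # 0 0110 1000 101 0101 0100 0011 0100 0010
sign, exponent, fraction, value = decode_single(EXAMPLE_BITS)
# Cross-check against the machine's own interpretation of the same bytes.
reference = struct.unpack(">f", struct.pack(">I", EXAMPLE_BITS))[0]
```

The exponent field decodes to 104 and the value agrees with `reference`, which is roughly 1.986e-7.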
Example 1: Decimal to FP
Number = - 0.75
= \(-0.11_{\text{two}} \times 2^{0}\) (base 2 scientific notation)
= \(-1.1_{\text{two}} \times 2^{-1}\) (normalized scientific notation)
Example 2: Decimal to FP
A more difficult case: representing 1/3?
= \(0.33333\ldots_{\text{ten}} = 0.0101010101\ldots_{\text{two}} \times 2^{0}\) (base 2)
= \(1.0101010101\ldots_{\text{two}} \times 2^{-2}\) (normalization)
Sign: 0
Exponent = -2 + 127 = \(125_{\text{ten}} = 01111101_{\text{two}}\)
Significand = 0101010101…
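Both decimal-to-FP examples can be checked by asking Python's `struct` module for the single-precision bit pattern and splitting it into fields. The helper name `float_to_fields` is hypothetical. Note that packing rounds to the nearest representable value, so the last significand bit of 1/3 is rounded up rather than continuing the 0101… pattern.

```python
import struct

def float_to_fields(x):
    """Return (sign, exponent field, significand field) of x as a single-precision float."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

minus_three_quarters = float_to_fields(-0.75)
one_third = float_to_fields(1 / 3)
```

For -0.75 the fields are sign 1, exponent 126 (that is, -1 + 127) and significand 100…0; for 1/3 they are sign 0, exponent 125 and significand 0101…011.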
Single-Precision Range
Exponents 00000000 and 11111111 reserved
Smallest value
Exponent: 00000001 => actual exponent = 1 – 127 = –126
Fraction: 000…00 => significand = 1.0
\(\pm 1.0 \times 2^{-126} \approx \pm 1.2 \times 10^{-38}\)
Largest value
Exponent: 11111110 => actual exponent = 254 – 127 = +127
Fraction: 111…11 => significand ≈ 2.0
\(\pm 2.0 \times 2^{+127} \approx \pm 3.4 \times 10^{+38}\)
Double-Precision Range
Exponents 0000…00 and 1111…11 reserved
Smallest value
Exponent: 00000000001 => actual exponent = 1 – 1023 = –1022
Fraction: 000…00 => significand = 1.0
\(\pm 1.0 \times 2^{-1022} \approx \pm 2.2 \times 10^{-308}\)
Largest value
Exponent: 11111111110 => actual exponent = 2046 – 1023 = +1023
Fraction: 111…11 => significand ≈ 2.0
\(\pm 2.0 \times 2^{+1023} \approx \pm 1.8 \times 10^{+308}\)
Floating-Point Precision
Relative precision
all fraction bits are significant
Single: approx \(2^{-23}\)
Equivalent to \(23 \times \log_{10} 2 \approx 23 \times 0.3 \approx 6\) decimal digits of precision
Double: approx \(2^{-52}\)
Equivalent to \(52 \times \log_{10} 2 \approx 52 \times 0.3 \approx 16\) decimal digits of precision
Zero and Special Numbers
What have we defined so far? (single precision)
Representation for 0
Represent 0?
exponent all zeroes
significand all zeroes too
What about sign?
+0: 0 00000000 00000000000000000000000
-0: 1 00000000 00000000000000000000000
Special Numbers
What have we defined so far? (single precision)
Range:
\(1.0 \times 2^{-126} \approx 1.2 \times 10^{-38}\)
What if result too small? (\(> 0\), \(< 1.2 \times 10^{-38}\) => Underflow!)
\((2 - 2^{-23}) \times 2^{127} \approx 3.4 \times 10^{38}\)
What if result too large? (\(> 3.4 \times 10^{38}\) => Overflow!)
Gradual Underflow
Represent denormalized numbers (denorms)
Exponent : all zeroes
Significand : non-zeroes
Smallest Number
The smallest normalized number
\(1.0000\,0000\,0000\,0000\,0000\,0000_{\text{two}} \times 2^{-126}\)
Representation for +/- Infinity
In FP, divide by zero should produce +/- infinity, not overflow
Why?
OK to do further computations with infinity, e.g., X/0 > Y may be a valid comparison
IEEE 754 represents +/- infinity
Most positive exponent reserved for infinity
Significands all zeroes
Representation for Not a Number
What do I get if I calculate sqrt(-4.0) or 0/0?
If infinity is not an error, these should not be either
They are called Not a Number (NaN)
Exponent = 255, Significand nonzero
Why is this useful?
Hope NaNs help with debugging?
They contaminate: op(NaN,X) = NaN
OK if calculate but don’t use it
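The encodings for zero, denormalized numbers, infinity and NaN described on the preceding slides can be summarized as a classifier over the exponent and significand fields. A minimal sketch for single precision, with the hypothetical name `classify_single`:

```python
import math

def classify_single(bits):
    """Classify a 32-bit single-precision pattern by its exponent and significand fields."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0:
        return "zero" if fraction == 0 else "denormalized"
    if exponent == 255:
        return "infinity" if fraction == 0 else "NaN"
    return "normalized"

# NaNs contaminate: any arithmetic operation involving a NaN yields a NaN.
contaminated = float("nan") + 1.0
```

Both 0x00000000 and 0x80000000 classify as zero (+0 and -0), and `math.isnan(contaminated)` is true.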
Floating-Point Addition
Basic addition algorithm:
(1) Align binary point: compute Ye – Xe
right shift the smaller number, say Xm, that many positions to form \(X_m \times 2^{X_e - Y_e}\)
(2) Add mantissa: compute \(X_m \times 2^{X_e - Y_e} + Y_m\)
Floating-Point Addition Example
Now consider a 4-digit binary example
\(1.000_{\text{two}} \times 2^{-1} + -1.110_{\text{two}} \times 2^{-2}\) (0.5 + –0.4375)
1. Align binary points
Shift number with smaller exponent
\(1.000_{\text{two}} \times 2^{-1} + -0.111_{\text{two}} \times 2^{-1}\)
2. Add mantissa
\(1.000_{\text{two}} \times 2^{-1} + -0.111_{\text{two}} \times 2^{-1} = 0.001_{\text{two}} \times 2^{-1}\)
3. Normalize result & check for over/underflow
\(1.000_{\text{two}} \times 2^{-4}\), with no over/underflow
4. Round and renormalize if necessary
\(1.000_{\text{two}} \times 2^{-4}\) (no change) = 0.0625
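The align, add and normalize steps of the example can be traced with small integers. This sketch makes several assumptions: significands are signed integers scaled by 2^3 (so 1.000 is 8 and -1.110 is -14), the alignment shift truncates, and step 4 (rounding) and the over/underflow checks are left out. The name `fp_add` is hypothetical.

```python
def fp_add(xm, xe, ym, ye, frac_bits=3):
    """Add xm * 2**xe and ym * 2**ye; significands are integers scaled by 2**frac_bits."""
    # Step 1: align, shifting the operand with the smaller exponent right.
    if xe < ye:
        xm, xe, ym, ye = ym, ye, xm, xe
    shift = xe - ye
    aligned = (abs(ym) >> shift) * (-1 if ym < 0 else 1)
    # Step 2: add the significands.
    m, e = xm + aligned, xe
    if m == 0:
        return 0, 0
    # Step 3: normalize so the magnitude lies in [1.0, 2.0).
    sign, mag = (-1 if m < 0 else 1), abs(m)
    one = 1 << frac_bits
    while mag >= 2 * one:
        mag >>= 1
        e += 1
    while mag < one:
        mag <<= 1
        e -= 1
    return sign * mag, e

significand, exponent = fp_add(8, -1, -14, -2)
```

The call returns significand 8 (1.000) with exponent -4, that is 0.0625, as in the worked example.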
[Figure: floating-point adder hardware, operating on the Sign, Exponent and Significand of both operands. Step 1: a small ALU compares the exponents, and the exponent difference controls a right shift of the smaller number. Step 2: the big ALU adds the significands. Step 3: normalize by shifting left or right and incrementing or decrementing the exponent. Step 4: rounding hardware rounds the result]
Floating-Point Multiplication
MIPS Floating Point
Separate floating point instructions:
Single precision: add.s, sub.s, mul.s, div.s
Double precision: add.d, sub.d, mul.d, div.d
FP part of the processor:
contains 32 32-bit registers: $f0, $f1, …
most registers specified in .s and .d instructions refer to this set
Double precision: by convention, even/odd pair contain one DP FP number: $f0/$f1, $f2/$f3
separate load and store: lwc1 and swc1
Instructions to move data between main processor and coprocessors:
mfc0, mtc0, mfc1, mtc1, etc.
Interpretation of Data
The BIG Picture
Associativity (Sec. 3.6, Parallelism and Computer Arithmetic)
Floating Point add, subtract associative ?
x = -1.50E+38, y = 1.50E+38, z = 1.0
(x+y)+z: x+y = 0.00E+00, then 0.00E+00 + 1.0 = 1.00E+00
x+(y+z): y+z = 1.50E+38, then -1.50E+38 + 1.50E+38 = 0.00E+00
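The table can be reproduced directly. Python floats are double precision rather than the single precision implied by the slide, but 1.0 is still far below the precision of 1.5E+38, so the same effect appears. The function name `both_groupings` is hypothetical.

```python
def both_groupings(x, y, z):
    """Return ((x + y) + z, x + (y + z)) to compare the two association orders."""
    return (x + y) + z, x + (y + z)

left, right = both_groupings(-1.5e38, 1.5e38, 1.0)
```

Here `left` is 1.0 and `right` is 0.0: in the second grouping the 1.0 is absorbed when added to 1.5e38, so floating-point addition is not associative.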