Floating Point Arithmetic Class
Floating Point Arithmetic Class
Outline
Floating-Point Numbers
Floating-Point Multiplication
The World is Not Just Integers
Programming languages support numbers with fraction
Called floating-point numbers
Examples:
3.14159265… (π)
2.71828… (e)
0.000000001 or 1.0 × 10–9 (seconds in a nanosecond)
86,400,000,000,000 or 8.64 × 1013 (nanoseconds in a day)
last number is a large integer that cannot fit in a 32-bit integer
S Exponent Fraction
Floating-Point Numbers
Floating-Point Multiplication
IEEE 754 Floating-Point Standard
Found in virtually every computer invented since 1980
Simplified porting of floating-point numbers
Unified the development of floating-point algorithms
Increased the accuracy of floating-point numbers
Single Precision Floating Point Numbers (32 bits)
1-bit sign + 8-bit exponent + 23-bit fraction
S Exponent8 Fraction23
S E F = f1 f 2 f3 f 4 …
Floating-Point Numbers
Floating-Point Multiplication
Floating Point Addition Example
Consider adding: (1.111)2 × 2–1 + (1.011)2 × 2–3
For simplicity, we assume 4 bits of precision (or 3 bits of fraction)
Cannot add significands … Why?
Because exponents are not equal
How to make exponents equal?
Shift the significand of the lesser exponent right
until its exponent matches the larger number
(1.011)2 × 2–3 = (0.1011)2 × 2–2 = (0.01011)2 × 2–1
Difference between the two exponents = –1 – (–3) = 2
So, shift right by 2 bits 1.111
+
0.01011
Now, add the significands:
Carry 10.00111
Addition Example – cont’d
So, (1.111)2 × 2–1 + (1.011)2 × 2–3 = (10.00111)2 × 2–1
However, result (10.00111)2 × 2–1 is NOT normalized
Normalize result: (10.00111)2 × 2–1 = (1.000111)2 × 20
In this example, we have a carry
So, shift right by 1 bit and increment the exponent
Round the significand to fit in appropriate number of bits
We assumed 4 bits of precision or 3 bits of fraction
Round to nearest: (1.000111)2 ≈ (1.001)2 1.000 111
Renormalize if rounding generates a carry + 1
1.001
Detect overflow / underflow
If exponent becomes too large (overflow) or too small (underflow)
Floating Point Subtraction Example
Consider: (1.000)2 × 2–3 – (1.000)2 × 22
We assume again: 4 bits of precision (or 3 bits of fraction)
Shift significand of the lesser exponent right
Difference between the two exponents = 2 – (–3) = 5
Shift right by 5 bits: (1.000)2 × 2–3 = (0.00001000)2 × 22
Convert subtraction into addition to 2's complement
Sign
2’s Complement
Overflow or yes
Exception Rounding either truncates
underflow?
fraction, or adds a 1 to least
no significant fraction bit
Done
Next . . .
Floating-Point Numbers
Floating-Point Multiplication
Floating Point Multiplication Example
Consider multiplying: 1.0102 × 2–1 by –1.1102 × 2–2
As before, we assume 4 bits of precision (or 3 bits of fraction)
Unlike addition, we add the exponents of the operands
Result exponent value = (–1) + (–2) = –3
yes
Rounding either truncates
Overflow or
Exception fraction, or adds a 1 to least
underflow?
significant fraction bit
no
Done