COA - Unit2 Floating Point Arithmetic 2
Outline
Floating-Point Numbers
Floating-Point Multiplication
S Exponent Fraction
S | E | F = f1 f2 f3 f4 …
Solution:
Sign = 1, so the number is negative
Exponent = (01111100)_2 = 124, E – bias = 124 – 127 = –3
Significand = (1.0100 … 0)_2 = 1 + 2^–2 = 1.25 (the leading 1. is implicit)
Value in decimal = –1.25 × 2^–3 = –0.15625
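What is the decimal value of this single-precision float?
01000001001001100000000000000000
Solution:
Sign = 0 (positive), Exponent = (10000010)_2 = 130
Value in decimal = +(1.01001100 … 0)_2 × 2^(130–127) =
(1.01001100 … 0)_2 × 2^3 = (1010.01100 … 0)_2 = 10.375 (the leading 1. is implicit)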
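The decoding steps in the two solutions above can be checked with a short Python sketch. The function names `decode_single` and `decode_with_struct` are illustrative choices, not part of the slides, and the manual decoder assumes a normalized number (exponent field neither all 0s nor all 1s).

```python
import struct

def decode_single(bits: str) -> float:
    """Decode a 32-character string of 0s and 1s as a normalized single."""
    assert len(bits) == 32
    sign = int(bits[0], 2)            # 1 sign bit
    exponent = int(bits[1:9], 2)      # 8-bit biased exponent
    fraction = int(bits[9:], 2)       # 23 fraction bits
    significand = 1 + fraction / 2**23        # implicit leading 1
    return (-1) ** sign * significand * 2.0 ** (exponent - 127)

def decode_with_struct(bits: str) -> float:
    """Cross-check by letting Python reinterpret the same 4 bytes."""
    return struct.unpack(">f", int(bits, 2).to_bytes(4, "big"))[0]

pattern = "01000001001001100000000000000000"
print(decode_single(pattern))       # 10.375
print(decode_with_struct(pattern))  # 10.375
```

Both functions agree on the pattern from the slide; the first follows the sign, exponent, significand steps by hand, the second relies on the `struct` module.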
Floating Point ICS 233 – KFUPM
© Muhamed Mudawar slide 11
Examples of Double Precision Float
What is the decimal value of this double-precision float?
01000000010100101010000000000000
00000000000000000000000000000000
Solution:
Value of exponent = (10000000101)_2 – Bias = 1029 – 1023 = 6
Value of double float = (1.00101010 … 0)_2 × 2^6 (1. is implicit) =
(1001010.10 … 0)_2 = 74.5
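The same calculation for the double-precision example can be sketched in Python. The name `decode_double` is a hypothetical helper, and the sketch again assumes a normalized number (11-bit exponent, bias 1023, 52 fraction bits).

```python
import struct

def decode_double(bits: str) -> float:
    """Decode a 64-character string of 0s and 1s as a normalized double."""
    assert len(bits) == 64
    sign = int(bits[0], 2)            # 1 sign bit
    exponent = int(bits[1:12], 2)     # 11-bit biased exponent
    fraction = int(bits[12:], 2)      # 52 fraction bits
    significand = 1 + fraction / 2**52        # implicit leading 1
    return (-1) ** sign * significand * 2.0 ** (exponent - 1023)

# The two 32-bit halves shown on the slide, joined into one 64-bit pattern.
bits = "01000000010100101010000000000000" + "0" * 32
print(decode_double(bits))                                        # 74.5
print(struct.unpack(">d", int(bits, 2).to_bytes(8, "big"))[0])    # 74.5
```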
Operation            Result
n / ±∞               ±0
±∞ × ±∞              ±∞
nonzero / 0          ±∞
+∞ + +∞              +∞ (similar for –∞)
±0 / ±0              NaN
+∞ – +∞              NaN (similar for –∞)
±∞ / ±∞              NaN
±∞ × ±0              NaN
NaN op anything      NaN
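Most rows of this table can be observed directly in Python, whose floats are IEEE 754 doubles on common platforms. This is only an illustration: Python raises `ZeroDivisionError` for float division by zero instead of returning ±∞ or NaN, so the "nonzero / 0" and "0 / 0" rows are left out of the sketch.

```python
import math

inf, nan = math.inf, math.nan

results = {
    "n / inf":     5.0 / inf,     # 0.0
    "inf * inf":   inf * inf,     # inf
    "inf + inf":   inf + inf,     # inf
    "inf - inf":   inf - inf,     # nan
    "inf / inf":   inf / inf,     # nan
    "inf * 0":     inf * 0.0,     # nan
    "nan op any":  nan + 1.0,     # nan
}

for op, value in results.items():
    print(f"{op:12} -> {value}")

# NaN compares unequal to everything, including itself.
print(nan == nan)  # False
```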
Exp.  exp  E    2^E
0     000  –2   ¼    Denormalized
1     001  –2   ¼    Normalized
2     010  –1   ½    Normalized
3     011  0    1    Normalized
4     100  1    2    Normalized
5     101  2    4    Normalized
6     110  3    8    Normalized
7     111  n/a       Inf or NaN
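The table can be regenerated from the usual rules for a small format. The sketch below assumes a 3-bit exponent field with bias 2^(3–1) – 1 = 3, which matches the E column above; `exponent_table` is an illustrative name.

```python
def exponent_table(exp_bits: int = 3):
    """Return (Exp, binary exp, E, kind) rows for a small exponent field."""
    bias = 2 ** (exp_bits - 1) - 1            # 3 for a 3-bit field
    rows = []
    for exp in range(2 ** exp_bits):
        if exp == 0:
            e, kind = 1 - bias, "denormalized"    # same E as exp = 1
        elif exp == 2 ** exp_bits - 1:
            e, kind = None, "inf or NaN"          # all 1s is reserved
        else:
            e, kind = exp - bias, "normalized"
        rows.append((exp, format(exp, f"0{exp_bits}b"), e, kind))
    return rows

for exp, binary, e, kind in exponent_table():
    scale = "n/a" if e is None else 2.0 ** e
    print(exp, binary, e, scale, kind)
```

Note that the denormalized row uses E = 1 – bias rather than 0 – bias, so it shares the scale ¼ with the smallest normalized exponent.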
+ 0.00001 × 2^2
2’s Complement
Overflow or underflow? If yes: Exception. If no: Done.
Rounding either truncates the fraction, or adds a 1 to the least significant fraction bit.
[Figure: floating-point adder/subtractor datapath. Inputs: E_X, E_Y, 1.F_X, 1.F_Y, S_X, S_Y, add/sub. Blocks: Exponent Subtractor (d = | E_X – E_Y |), Swap, Shift Right, Sign Computation, Significand Adder/Subtractor, max(E_X, E_Y), Detect carry or Count leading 0's (z), Shift Right / Left, Inc / Dec, Rounding Logic. Outputs: S_Z, E_Z, F_Z.]
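The first stages of this datapath (exponent subtractor, swap, shift right, significand adder, carry detection) can be sketched in software. The sketch is a simplification under stated assumptions: both operands have the same sign and are normalized, significands are integers holding 1.F scaled by 2^23, and bits shifted out during alignment are simply dropped. Subtraction, leading-zero counting, and the rounding logic are not modeled. `fp_add_same_sign` and `to_float` are hypothetical names.

```python
FRAC_BITS = 23  # single-precision fraction width

def fp_add_same_sign(ex: int, mx: int, ey: int, my: int):
    """Add two normalized magnitudes given as (exponent, significand).

    Significands are integers in [2**23, 2**24), i.e. 1.F scaled by 2**23.
    """
    # Swap so that X holds the larger exponent.
    if ex < ey:
        ex, mx, ey, my = ey, my, ex, mx
    d = ex - ey                 # d = |EX - EY|
    my >>= d                    # align Y; shifted-out bits are truncated
    m = mx + my                 # significand adder
    e = ex                      # max(EX, EY)
    if m >= 1 << (FRAC_BITS + 1):   # carry out: result is 1x.xxx
        m >>= 1                     # shift right by one
        e += 1                      # and increment the exponent
    return e, m

def to_float(e: int, m: int) -> float:
    """Convert (exponent, scaled significand) back to a Python float."""
    return m / 2**FRAC_BITS * 2.0**e

# 1.5 * 2^1 (= 3.0) plus 1.0 * 2^0 (= 1.0)
e, m = fp_add_same_sign(1, 3 << 22, 0, 1 << 23)
print(e, to_float(e, m))  # 2 4.0
```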
0 0.11111110101100010001011 0 1 1 × 2^5 (sum)
+ 1.11111101011000100010110 1 1 1 × 2^4 (normalized)
Round bit Sticky bit
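As a rough sketch of how the extra bits drive rounding, the code below applies round-to-nearest-even to the normalized sum above. It assumes the three trailing bits are read as guard, round, and sticky in that order, which the slide does not state explicitly; `round_nearest_even` is an illustrative name.

```python
def round_nearest_even(sig: int, exp: int, guard: int, rnd: int,
                       sticky: int, width: int = 24):
    """Round a `width`-bit significand using guard, round and sticky bits."""
    lsb = sig & 1
    # Round up when the discarded part is more than half an ulp,
    # or exactly half an ulp and the kept significand is odd.
    if guard and (rnd or sticky or lsb):
        sig += 1
        if sig == 1 << width:   # rounding overflowed to 10.000...
            sig >>= 1           # renormalize
            exp += 1
    return sig, exp

# Normalized sum from the slide: 1.11111101011000100010110 with extra bits 1 1 1
sig = int("1" + "11111101011000100010110", 2)
rounded, exp = round_nearest_even(sig, 4, guard=1, rnd=1, sticky=1)
print(format(rounded, "024b"), exp)
```

With all three extra bits set, the discarded part exceeds half an ulp, so the last fraction bit changes from 0 to 1 and the exponent stays at 4.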
Notes
Round down: rounded result is close to but no greater than true result.
Round up: rounded result is close to but no less than true result.
Examples
Round to nearest 1/4 (2 bits right of binary point)
Value    Binary      Rounded   Action         Rounded Value
2 3/32   10.00011_2  10.00_2   (<1/2: down)   2
2 3/16   10.00110_2  10.01_2   (>1/2: up)     2 1/4
2 7/8    10.11100_2  11.00_2   (1/2: up)      3
2 5/8    10.10100_2  10.10_2   (1/2: down)    2 1/2
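The four rows can be reproduced with exact rational arithmetic. The sketch assumes the rule being illustrated is round-to-nearest with ties going to the even multiple of 1/4, which is consistent with the two halfway cases in the table; `round_to_quarter` is a hypothetical helper.

```python
from fractions import Fraction

def round_to_quarter(x: Fraction) -> Fraction:
    """Round x to the nearest multiple of 1/4, ties to the even multiple."""
    scaled = x * 4                                 # count of quarters
    whole = scaled.numerator // scaled.denominator # floor
    remainder = scaled - whole
    half = Fraction(1, 2)
    if remainder > half or (remainder == half and whole % 2 == 1):
        whole += 1                                 # round up
    return Fraction(whole, 4)

for value in (Fraction(67, 32), Fraction(35, 16), Fraction(23, 8), Fraction(21, 8)):
    print(value, "->", round_to_quarter(value))
```

The inputs are 2 3/32, 2 3/16, 2 7/8 and 2 5/8 written as improper fractions.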
.DOUBLE Directive
Stores the listed values as double-precision floating point
Examples
var1: .FLOAT 12.3, -0.1
var2: .DOUBLE 1.5e-10
pi: .DOUBLE 3.1415926535897932
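To get a feel for what these directives store, the sketch below packs the same literals with Python's `struct` module. This only illustrates the 4-byte and 8-byte encodings; big-endian byte order is chosen here for readability and is an assumption, since the byte order actually used depends on the assembler and target.

```python
import struct

single = struct.pack(">f", 12.3)       # 4 bytes, like one .FLOAT value
double = struct.pack(">d", 1.5e-10)    # 8 bytes, like one .DOUBLE value
print(len(single), len(double))        # 4 8
print(single.hex(), double.hex())

# -0.1 has no exact binary representation; single precision keeps fewer
# fraction bits than double, so the stored value differs from the double.
stored = struct.unpack(">f", struct.pack(">f", -0.1))[0]
print(stored == -0.1)                  # False
```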