The World Is Not Just Integers: Programming Languages Support Numbers with Fractions
$3.14159265\ldots$ ($\pi$)
$2.71828\ldots$ ($e$)
$0.000000001$ or $1.0 \times 10^{-9}$ (seconds in a nanosecond)
$86{,}400{,}000{,}000{,}000$ or $8.64 \times 10^{13}$ (nanoseconds in a day)
The last number is a large integer that cannot fit in a 32-bit integer
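As a quick check of the last point, the sketch below compares the nanosecond count against the largest signed 32-bit integer. The names `INT32_MAX` and `ns_per_day` are illustrative and do not come from the slides.

```python
INT32_MAX = 2**31 - 1              # largest signed 32-bit integer
ns_per_day = 86_400 * 10**9        # seconds per day times nanoseconds per second

print(ns_per_day)                  # 86400000000000
print(ns_per_day > INT32_MAX)      # True: the value does not fit in 32 bits
print(ns_per_day.bit_length())     # 47 bits are needed for the magnitude
```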
Floating-Point Numbers
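Examples of floating-point numbers in base 10:
$5.341 \times 10^{3}$, $0.05341 \times 10^{5}$, $2.013 \times 10^{-1}$, $201.3 \times 10^{-3}$
The point in a base-10 number is the decimal point; in a base-2 number it is the binary point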
Floating-Point Representation
A floating-point number is represented by the triple (S, Exponent, Fraction)
S is the Sign bit (0 is positive and 1 is negative)
This representation is called sign and magnitude
Exponent (E) is the exponent field and Fraction (F) is the fraction field
Next . . .
Floating-Point Numbers
IEEE 754 Floating-Point Standard
Floating-Point Addition and Subtraction
Floating-Point Multiplication
MIPS Floating-Point Instructions
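Single precision (32 bits): S (1 bit) | Exponent (8 bits) | Fraction (23 bits)
Double precision (64 bits): S (1 bit) | Exponent (11 bits) | Fraction (52 bits)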
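Normalized numbers: the fraction bits are $F = f_1 f_2 f_3 f_4 \ldots$ and the significand is $(1.F)_2 = (1.f_1 f_2 f_3 f_4 \ldots)_2$, where the leading 1 is implicit (not stored)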
Solution:
Sign = 1, so the number is negative
Exponent = $(01111100)_2 = 124$, $E - \text{bias} = 124 - 127 = -3$
Significand = $(1.0100 \ldots 0)_2 = 1 + 2^{-2} = 1.25$ (the leading 1. is implicit)
Value in decimal = $-1.25 \times 2^{-3} = -0.15625$
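The decoding steps in this solution can be sketched in Python. The bit pattern is rebuilt from the fields stated in the solution (sign 1, exponent 01111100, fraction 0100...0). The helper name `decode_single` is hypothetical, and the sketch handles normalized numbers only.

```python
import struct

def decode_single(bits):
    """Split a 32-bit pattern into IEEE 754 fields (normalized case only)."""
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF          # 8-bit biased exponent
    fraction = bits & 0x7FFFFF              # 23 fraction bits
    significand = 1 + fraction / 2**23      # implicit leading 1
    value = (-1)**sign * significand * 2.0**(exponent - 127)
    return sign, exponent, significand, value

# sign = 1, exponent = 01111100, fraction = 0100...0 (only f2 is set)
bits = (1 << 31) | (0b01111100 << 23) | (1 << 21)
sign, exponent, significand, value = decode_single(bits)

# Cross-check by letting the struct module reinterpret the same 32 bits
hardware_value = struct.unpack(">f", struct.pack(">I", bits))[0]
print(sign, exponent, significand, value)   # 1 124 1.25 -0.15625
```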
Solution:
Value of exponent = $(10000000101)_2 - \text{Bias} = 1029 - 1023 = 6$
Value of double float = $(1.00101010 \ldots 0)_2 \times 2^{6}$ (the leading 1. is implicit) = $(1001010.10 \ldots 0)_2 = 74.5$
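The same steps apply to double precision with an 11-bit exponent, a 52-bit fraction, and a bias of 1023. This is a minimal sketch that assumes the sign bit is 0, since the solution above does not state the sign. `decode_double` is a hypothetical helper name.

```python
import struct

def decode_double(bits):
    """Split a 64-bit pattern into IEEE 754 fields (normalized case only)."""
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF             # 11-bit biased exponent
    fraction = bits & ((1 << 52) - 1)           # 52 fraction bits
    significand = 1 + fraction / 2**52          # implicit leading 1
    value = (-1)**sign * significand * 2.0**(exponent - 1023)
    return sign, exponent, significand, value

# exponent = 10000000101, fraction = 0010101 followed by zeros (sign assumed 0)
bits = (0b10000000101 << 52) | (0b0010101 << 45)
fields = decode_double(bits)

# Cross-check by reinterpreting the same 64 bits as a double
hardware_value = struct.unpack(">d", struct.pack(">Q", bits))[0]
print(fields)                                   # (0, 1029, 1.1640625, 74.5)
```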
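Converting the fraction 0.8125 to binary by repeated multiplication by 2 (the integer part of each product is the next fraction bit):
$0.8125 \times 2 = 1.625$ (bit 1)
$0.625 \times 2 = 1.25$ (bit 1)
$0.25 \times 2 = 0.5$ (bit 0)
$0.5 \times 2 = 1.0$ (bit 1)
Stop when the fractional part is 0
$0.8125 = (0.1101)_2 = 1/2 + 1/4 + 1/16 = 13/16$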
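The repeated-multiplication procedure can be written as a short loop. This is a sketch only: `fraction_to_binary` and its `max_bits` cutoff are illustrative names, and the cutoff is there because many decimal fractions (0.1, for example) never reach a fractional part of 0.

```python
def fraction_to_binary(x, max_bits=32):
    """Convert a fraction 0 <= x < 1 to a binary string by repeated doubling."""
    bits = []
    while x != 0 and len(bits) < max_bits:
        x *= 2
        bit = int(x)          # the integer part is the next fraction bit
        bits.append(str(bit))
        x -= bit              # keep only the fractional part
    return "0." + "".join(bits)

print(fraction_to_binary(0.8125))    # 0.1101
print(fraction_to_binary(0.1, 8))    # 0.00011001 (truncated, never terminates)
```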
Infinity
Infinity is a special value represented with maximum E and F = 0
For single precision with 8-bit exponent: maximum E = 255
For double precision with 11-bit exponent: maximum E = 2047
Denormalized Numbers
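The IEEE standard uses denormalized numbers to fill the gap between 0 and the smallest normalized float
A denormalized number has exponent field E = 0 and a nonzero fraction; its leading bit is 0 instead of an implicit 1
Single precision: value = $\pm (0.F)_2 \times 2^{-126}$
Double precision: value = $\pm (0.F)_2 \times 2^{-1022}$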
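To see how denormalized numbers fill the gap below the smallest normalized float, the following sketch reinterprets three single-precision bit patterns. The helper `single_from_bits` and the variable names are illustrative, not part of the slides.

```python
import struct

def single_from_bits(bits):
    """Reinterpret a 32-bit integer as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

smallest_normalized = single_from_bits(0x00800000)   # E = 1, F = 0
largest_denormal = single_from_bits(0x007FFFFF)      # E = 0, F = all ones
smallest_denormal = single_from_bits(0x00000001)     # E = 0, F = 0...01

print(smallest_normalized == 2.0**-126)                    # True
print(largest_denormal == (1 - 2.0**-23) * 2.0**-126)      # True: (0.11...1)_2 x 2^-126
print(smallest_denormal == 2.0**-149)                      # True: 2^-23 x 2^-126
```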
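[Figure: single-precision number line from $-\infty$ to $+\infty$. Negative overflow lies below $-2^{128}$; normalized negative values lie between $-2^{128}$ and $-2^{-126}$; denormalized values lie between $-2^{-126}$ and 0 and between 0 and $2^{-126}$ (the negative and positive underflow regions); normalized positive values lie between $2^{-126}$ and $2^{128}$; positive overflow lies above $2^{128}$.]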
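| Single-Precision | Exponent (8 bits) | Fraction (23 bits) | Value |
|---|---|---|---|
| Normalized Number | 1 to 254 | Anything | $\pm (1.F)_2 \times 2^{E-127}$ |
| Denormalized Number | 0 | nonzero | $\pm (0.F)_2 \times 2^{-126}$ |
| Zero | 0 | 0 | $\pm 0$ |
| Infinity | 255 | 0 | $\pm \infty$ |
| NaN | 255 | nonzero | NaN |

| Double-Precision | Exponent (11 bits) | Fraction (52 bits) | Value |
|---|---|---|---|
| Normalized Number | 1 to 2046 | Anything | $\pm (1.F)_2 \times 2^{E-1023}$ |
| Denormalized Number | 0 | nonzero | $\pm (0.F)_2 \times 2^{-1022}$ |
| Zero | 0 | 0 | $\pm 0$ |
| Infinity | 2047 | 0 | $\pm \infty$ |
| NaN | 2047 | nonzero | NaN |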
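The single-precision encoding rules (exponent 1 to 254 for normalized numbers, 0 for zero and denormalized numbers, 255 for infinity and NaN) translate directly into a classifier. This is a sketch; `classify_single` is a hypothetical name.

```python
def classify_single(bits):
    """Classify a 32-bit pattern using only its exponent and fraction fields."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 255:
        return "Infinity" if fraction == 0 else "NaN"
    if exponent == 0:
        return "Zero" if fraction == 0 else "Denormalized"
    return "Normalized"          # exponent is 1 to 254

print(classify_single(0x3F800000))   # Normalized (this pattern is 1.0)
print(classify_single(0x00000001))   # Denormalized
print(classify_single(0x80000000))   # Zero (negative zero)
print(classify_single(0x7F800000))   # Infinity
print(classify_single(0x7FC00000))   # NaN
```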
Floating-Point Comparison
IEEE 754 floating point numbers are ordered
Because exponent uses a biased representation
Exponent value and its binary representation have same ordering
[Figure: an integer magnitude comparator takes the exponent and fraction bits of X and Y and outputs X<Y, X=Y, or X>Y.]
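Because of the biased exponent, magnitudes can be compared as plain integers. The sketch below illustrates this for single precision; `float_bits` and `compare_magnitude` are hypothetical helpers, the inputs are assumed to be exactly representable in single precision, and NaN is not handled.

```python
import struct

def float_bits(x):
    """Return the 32-bit IEEE 754 single-precision pattern of x as an integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def compare_magnitude(x, y):
    """Compare |x| and |y| using only an integer comparison (-1, 0, or 1)."""
    mx = float_bits(x) & 0x7FFFFFFF      # drop the sign bit, keep E and F
    my = float_bits(y) & 0x7FFFFFFF
    return (mx > my) - (mx < my)

print(compare_magnitude(1.5, 2.0))        # -1
print(compare_magnitude(-74.5, 0.15625))  # 1, since |-74.5| > |0.15625|
```

For signed comparison, the sign bits are examined first, and the magnitude result is reversed when both numbers are negative.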
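Consider adding two single-precision numbers:
$+\,1.11100100000000000000010_2 \times 2^{4}$
$+\,1.10000000000000110000101_2 \times 2^{2}$
The exponents differ by $4 - 2 = 2$, so shift the significand with the smaller exponent right by 2 bits:
$1.10000000000000110000101_2 \times 2^{2} = 0.01100000000000001100001\ 01_2 \times 2^{4}$
Add the aligned significands:
$1.11100100000000000000010_2 \times 2^{4} + 0.01100000000000001100001\ 01_2 \times 2^{4} = 10.01000100000000001100011\ 01_2 \times 2^{4}$
Normalize by shifting right 1 bit and incrementing the exponent:
$1.00100010000000000110001\ 101_2 \times 2^{5}$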
Rounding
Single precision requires only 23 fraction bits
However, the normalized result can contain additional bits
$1.00100010000000000110001 \mid 1\ 01_2 \times 2^{5}$
Round bit: R = 1 (the first bit after the 23 fraction bits)
Sticky bit: S = 1 (appears after the round bit and is the OR of all additional bits)
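The alignment, addition, normalization, and the round and sticky bits can be traced with integer arithmetic. This sketch departs from the hardware description in one respect, stated plainly: instead of shifting the smaller operand right (which would need extra bits to hold what falls off), it shifts the larger operand left so that no bits are lost. All variable names are illustrative.

```python
# Significands as 24-bit integers (implicit 1 followed by the 23 fraction bits)
x_sig, x_exp = int("1" + "11100100000000000000010", 2), 4
y_sig, y_exp = int("1" + "10000000000000110000101", 2), 2

d = x_exp - y_exp                       # exponent difference = 2
total = (x_sig << d) + y_sig            # exact sum, in units of 2^(y_exp - 23)

# Normalize: the leading 1 of the sum fixes the result exponent
result_exp = (y_exp - 23) + (total.bit_length() - 1)
extra = total.bit_length() - 24         # bits beyond the 24-bit significand
kept = total >> extra                   # 1.F of the normalized result
dropped = total & ((1 << extra) - 1)    # the additional bits
round_bit = dropped >> (extra - 1)
sticky_bit = int((dropped & ((1 << (extra - 1)) - 1)) != 0)

print(format(kept, "b"), result_exp)    # 100100010000000000110001 5
print(round_bit, sticky_bit)            # 1 1
```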
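Consider adding:
$+\,1.00000000101100010001101_2 \times 2^{-6}$
$-\,1.00000000000000010011010_2 \times 2^{-1}$
The exponents differ by 5, so the first significand is shifted right by 5 bits before the significands are subtracted. After normalizing and rounding to nearest, the result is:
$-\,1.11101111111101110101011_2 \times 2^{-2}$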
Example on Rounding
Round the following result using the IEEE 754 rounding modes:
$1.11111111111111111111111 \mid 1\ 0_2 \times 2^{-7}$ (Round Bit R = 1, Sticky Bit S = 0)
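The four IEEE 754 rounding modes can be sketched as a decision on whether to increment the 24-bit significand, based on the sign, the round bit R, and the sticky bit S. The function name, the mode strings, and the choice of a positive sign for the example are assumptions made for illustration; the sign of the value above is not given.

```python
def round_significand(sign, sig, exp, r, s, mode):
    """Round a 24-bit significand given round bit r and sticky bit s."""
    inexact = bool(r or s)
    if mode == "nearest_even":
        # Round up above the halfway point, or at halfway if sig is odd
        increment = bool(r and (s or (sig & 1)))
    elif mode == "toward_zero":
        increment = False                       # always truncate
    elif mode == "toward_plus_inf":
        increment = inexact and sign == 0       # only positive values grow
    elif mode == "toward_minus_inf":
        increment = inexact and sign == 1       # only negative values grow
    else:
        raise ValueError("unknown rounding mode: " + mode)
    if increment:
        sig += 1
        if sig == 1 << 24:                      # carry out: renormalize
            sig >>= 1
            exp += 1
    return sig, exp

all_ones = (1 << 24) - 1                        # 1.11...1 (23 fraction bits set)
for mode in ("nearest_even", "toward_zero", "toward_plus_inf", "toward_minus_inf"):
    print(mode, round_significand(0, all_ones, -7, 1, 0, mode))
```

With R = 1 and S = 0 the value is exactly halfway, so round-to-nearest-even increments the odd significand, and the carry renormalizes the result to $1.0 \times 2^{-6}$.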
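[Figure: floating-point addition/subtraction flowchart, ending with the check "Overflow or underflow?": if no, done; if yes, raise an exception.]
[Figure: floating-point adder/subtractor block diagram. Inputs are $(S_X, E_X, F_X)$ and $(S_Y, E_Y, F_Y)$. An exponent subtractor computes $d = |E_X - E_Y|$ and its sign controls a swap of $F_X$ and $F_Y$; the smaller operand is shifted right by $d$; a sign computation unit and the add/sub control drive the significand adder/subtractor; the result is normalized by detecting a carry or counting leading 0s, with the exponent $\max(E_X, E_Y)$ incremented or decremented accordingly; rounding logic produces the outputs $S_Z$, $E_Z$, $F_Z$.]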
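Consider multiplying two 24-bit significands; the product has 48 bits:
$1.11010000100000010100001_2 \times 1.10000000001000000000000_2$
The multiplier has three 1 bits (weights $2^{0}$, $2^{-1}$, and $2^{-11}$), so the product is the sum of three shifted copies of the multiplicand $1.11010000100000010100001_2$:
$= 10.1011100011111011111100110010100001000000000000_2$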
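The significand product above can be checked with integer multiplication, followed by normalization and round-to-nearest-even on the extra bits. This is a sketch with illustrative variable names; the operand exponents and signs are not shown in the text, so only the significand path is traced.

```python
x = int("1" + "11010000100000010100001", 2)   # 24-bit significand 1.F
y = int("1" + "10000000001000000000000", 2)

product = x * y                               # up to 48 bits, 46 fraction bits
exponent_adjust = product.bit_length() - 47   # 1 if product >= 2.0, else 0

extra = product.bit_length() - 24             # bits beyond the 24-bit result
kept = product >> extra
dropped = product & ((1 << extra) - 1)
round_bit = dropped >> (extra - 1)
sticky_bit = int((dropped & ((1 << (extra - 1)) - 1)) != 0)

# Round to nearest even: increment if above halfway, or at halfway with odd kept
rounded = kept + (1 if round_bit and (sticky_bit or kept & 1) else 0)

print(format(product, "b"))                   # the 48-bit product, integer part "10"
print(exponent_adjust, round_bit, sticky_bit) # 1 1 1
print(format(rounded, "b"))                   # 101011100011111011111101
```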
[Figure: floating-point multiplication flowchart, ending with the check "Overflow or underflow?": if no, done; if yes, raise an exception.]
Only a few extra bits are needed: guard, round, and sticky bits
They minimize hardware without compromising accuracy
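Floating-point addition is not associative: adding the same four values from left to right in different orders gives different sums

| Value1 | Value2 | Value3 | Value4 | Sum |
|---|---|---|---|---|
| 1.0E+30 | -1.0E+30 | 9.5 | -2.3 | 7.2 |
| 1.0E+30 | 9.5 | -1.0E+30 | -2.3 | -2.3 |
| 1.0E+30 | 9.5 | -2.3 | -1.0E+30 | 0 |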
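The three orderings can be reproduced directly. This sketch uses Python floats, which are IEEE 754 double precision; the effect is the same because 9.5 and -2.3 are far smaller than the spacing between representable values near 1.0E+30. `sum_left_to_right` is an illustrative helper.

```python
def sum_left_to_right(values):
    """Add the values strictly in the given order, rounding after each step."""
    total = 0.0
    for v in values:
        total += v
    return total

print(sum_left_to_right([1.0e30, -1.0e30, 9.5, -2.3]))   # 7.2
print(sum_left_to_right([1.0e30, 9.5, -1.0e30, -2.3]))   # -2.3
print(sum_left_to_right([1.0e30, 9.5, -2.3, -1.0e30]))   # 0.0
```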
MIPS provides separate floating-point instructions for single precision (.s extension) and double precision (.d extension)
FP Arithmetic Instructions
| Instruction | Meaning | Op | fmt | ft | fs | fd | funct |
|---|---|---|---|---|---|---|---|
| add.s fd, fs, ft | (fd) = (fs) + (ft) | 0x11 | 0 | ft5 | fs5 | fd5 | 0 |
| add.d fd, fs, ft | (fd) = (fs) + (ft) | 0x11 | 1 | ft5 | fs5 | fd5 | 0 |
| sub.s fd, fs, ft | (fd) = (fs) - (ft) | 0x11 | 0 | ft5 | fs5 | fd5 | 1 |
| sub.d fd, fs, ft | (fd) = (fs) - (ft) | 0x11 | 1 | ft5 | fs5 | fd5 | 1 |
| mul.s fd, fs, ft | (fd) = (fs) * (ft) | 0x11 | 0 | ft5 | fs5 | fd5 | 2 |
| mul.d fd, fs, ft | (fd) = (fs) * (ft) | 0x11 | 1 | ft5 | fs5 | fd5 | 2 |
| div.s fd, fs, ft | (fd) = (fs) / (ft) | 0x11 | 0 | ft5 | fs5 | fd5 | 3 |
| div.d fd, fs, ft | (fd) = (fs) / (ft) | 0x11 | 1 | ft5 | fs5 | fd5 | 3 |
| sqrt.s fd, fs | (fd) = sqrt (fs) | 0x11 | 0 | 0 | fs5 | fd5 | 4 |
| sqrt.d fd, fs | (fd) = sqrt (fs) | 0x11 | 1 | 0 | fs5 | fd5 | 4 |
| abs.s fd, fs | (fd) = abs (fs) | 0x11 | 0 | 0 | fs5 | fd5 | 5 |
| abs.d fd, fs | (fd) = abs (fs) | 0x11 | 1 | 0 | fs5 | fd5 | 5 |
| neg.s fd, fs | (fd) = -(fs) | 0x11 | 0 | 0 | fs5 | fd5 | 7 |
| neg.d fd, fs | (fd) = -(fs) | 0x11 | 1 | 0 | fs5 | fd5 | 7 |
FP Load/Store Instructions
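Separate floating-point load/store instructions
lwc1: load word coprocessor 1
ldc1: load double coprocessor 1
swc1: store word coprocessor 1
sdc1: store double coprocessor 1
A general-purpose register is used as the base register

| Instruction | Meaning | Op | base | ft | Immediate |
|---|---|---|---|---|---|
| lwc1 $f2, 40($t0) | ($f2) = Mem[($t0)+40] | 0x31 | $t0 | $f2 | im16 = 40 |
| ldc1 $f2, 40($t0) | ($f2) = Mem[($t0)+40] | 0x35 | $t0 | $f2 | im16 = 40 |
| swc1 $f2, 40($t0) | Mem[($t0)+40] = ($f2) | 0x39 | $t0 | $f2 | im16 = 40 |
| sdc1 $f2, 40($t0) | Mem[($t0)+40] = ($f2) | 0x3d | $t0 | $f2 | im16 = 40 |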
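Data movement between general-purpose and floating-point registers

| Instruction | Meaning | Op | Register fields |
|---|---|---|---|
| mfc1 $t0, $f2 | ($t0) = ($f2) | 0x11 | $t0, $f2 |
| mtc1 $t0, $f2 | ($f2) = ($t0) | 0x11 | $t0, $f2 |
| mov.s $f4, $f2 | ($f4) = ($f2) | 0x11 | $f2, $f4 |
| mov.d $f4, $f2 | ($f4) = ($f2) | 0x11 | $f2, $f4 |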
FP Convert Instructions
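Convert instruction: cvt.x.y
Convert to destination format x from source format y
Supported formats
Single-precision float = .s
Double-precision float = .d
Signed integer word = .w

| Instruction | Meaning | Op | fmt | ft | fs | fd | funct |
|---|---|---|---|---|---|---|---|
| cvt.s.w fd, fs | to single from integer | 0x11 | 0 | 0 | fs5 | fd5 | 0x20 |
| cvt.s.d fd, fs | to single from double | 0x11 | 1 | 0 | fs5 | fd5 | 0x20 |
| cvt.d.w fd, fs | to double from integer | 0x11 | 0 | 0 | fs5 | fd5 | 0x21 |
| cvt.d.s fd, fs | to double from single | 0x11 | 1 | 0 | fs5 | fd5 | 0x21 |
| cvt.w.s fd, fs | to integer from single | 0x11 | 0 | 0 | fs5 | fd5 | 0x24 |
| cvt.w.d fd, fs | to integer from double | 0x11 | 1 | 0 | fs5 | fd5 | 0x24 |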
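FP compare and branch instructions: a compare sets the condition flag (cflag), and a branch tests it

| Instruction | Meaning | Op | fmt | ft | fs | fd | funct |
|---|---|---|---|---|---|---|---|
| c.eq.s fs, ft | cflag = ((fs) == (ft)) | 0x11 | 0 | ft5 | fs5 | 0 | 0x32 |
| c.eq.d fs, ft | cflag = ((fs) == (ft)) | 0x11 | 1 | ft5 | fs5 | 0 | 0x32 |
| c.lt.s fs, ft | cflag = ((fs) < (ft)) | 0x11 | 0 | ft5 | fs5 | 0 | 0x3c |
| c.lt.d fs, ft | cflag = ((fs) < (ft)) | 0x11 | 1 | ft5 | fs5 | 0 | 0x3c |
| c.le.s fs, ft | cflag = ((fs) <= (ft)) | 0x11 | 0 | ft5 | fs5 | 0 | 0x3e |
| c.le.d fs, ft | cflag = ((fs) <= (ft)) | 0x11 | 1 | ft5 | fs5 | 0 | 0x3e |

| Instruction | Meaning | Op | Field | Condition | Immediate |
|---|---|---|---|---|---|
| bc1f Label | branch if (cflag == 0) | 0x11 | 8 | 0 | im16 |
| bc1t Label | branch if (cflag == 1) | 0x11 | 8 | 1 | im16 |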
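Example: Area of a Circle

    .data
    pi:     .double   3.141592653589793
    msg:    .asciiz   "Circle Area = "
    .text
    main:
            ldc1    $f2, pi             # $f2,3 = pi
            li      $v0, 7              # read double (radius)
            syscall                     # $f0,1 = radius
            mul.d   $f12, $f0, $f0      # $f12,13 = radius*radius
            mul.d   $f12, $f2, $f12     # $f12,13 = area
            la      $a0, msg            # $a0 = address of msg
            li      $v0, 4              # print string (msg)
            syscall
            li      $v0, 3              # print double (area in $f12,13)
            syscall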
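Matrix multiplication of n × n matrices of doubles stored in row-major order (8 bytes per element): row i starts after $i \times n$ elements
Address of X[i][j] = Address of X + $(i \times n + j) \times 8$
Address of Y[i][k] = Address of Y + $(i \times n + k) \times 8$
Address of Z[k][j] = Address of Z + $(k \times n + j) \times 8$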
    # Arguments: $a0 = n, $a1 = address of x, $a2 = address of y, $a3 = address of z
            addu    $t1, $0, $0         # $t1 = i = 0
    L1:     addu    $t2, $0, $0         # $t2 = j = 0
    L2:     addu    $t3, $0, $0         # $t3 = k = 0
            sub.d   $f0, $f0, $f0       # $f0 = sum = 0.0
    L3:     mul     $t4, $t1, $a0       # $t4 = i*size(row) = i*n
            addu    $t4, $t4, $t3       # $t4 = i*n + k
            sll     $t4, $t4, 3         # $t4 = (i*n + k)*8
            addu    $t4, $a2, $t4       # $t4 = address of y[i][k]
            l.d     $f2, 0($t4)         # $f2 = y[i][k]
            mul     $t5, $t3, $a0       # $t5 = k*size(row) = k*n
            addu    $t5, $t5, $t2       # $t5 = k*n + j
            sll     $t5, $t5, 3         # $t5 = (k*n + j)*8
            addu    $t5, $a3, $t5       # $t5 = address of z[k][j]
            l.d     $f4, 0($t5)         # $f4 = z[k][j]
            mul.d   $f6, $f2, $f4       # $f6 = y[i][k]*z[k][j]
            add.d   $f0, $f0, $f6       # $f0 = sum
            addiu   $t3, $t3, 1         # k = k + 1
            bne     $t3, $a0, L3        # loop back if (k != n)
            mul     $t6, $t1, $a0       # $t6 = i*size(row) = i*n
            addu    $t6, $t6, $t2       # $t6 = i*n + j
            sll     $t6, $t6, 3         # $t6 = (i*n + j)*8
            addu    $t6, $a1, $t6       # $t6 = address of x[i][j]
            s.d     $f0, 0($t6)         # x[i][j] = sum
            addiu   $t2, $t2, 1         # j = j + 1
            bne     $t2, $a0, L2        # loop L2 if (j != n)
            addiu   $t1, $t1, 1         # i = i + 1
            bne     $t1, $a0, L1        # loop L1 if (i != n)
    Return: jr      $ra                 # return
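For comparison with the assembly listing, the same triple loop can be sketched in Python over flat lists, using the row-major index $i \times n + j$ that the address calculations implement (the factor of 8 is the element size in bytes and does not appear when indexing a list). The function name `mm` and the argument order are assumptions chosen to mirror the registers $a0 to $a3.

```python
def mm(n, x, y, z):
    """Compute x = y * z for n x n matrices stored as flat row-major lists."""
    for i in range(n):
        for j in range(n):
            total = 0.0
            for k in range(n):
                # y[i][k] is element i*n + k, z[k][j] is element k*n + j
                total += y[i * n + k] * z[k * n + j]
            x[i * n + j] = total          # x[i][j] is element i*n + j

n = 2
y = [1.0, 2.0,
     3.0, 4.0]
z = [5.0, 6.0,
     7.0, 8.0]
x = [0.0] * (n * n)
mm(n, x, y, z)
print(x)                                  # [19.0, 22.0, 43.0, 50.0]
```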