Information Representation Floating Point

This document discusses floating point number representation in computers. It begins by explaining fractional numbers in decimal and binary number systems, including examples of converting between the two systems. It then describes the normalized floating point number representation used in most computers, which stores numbers as a sign, mantissa, and exponent. The rest of the document provides examples of this representation using binary numbers, discusses sources of error, and describes the IEEE 754 standard for single and double precision floating point numbers.


Department of Computer and Information Science,

School of Science, IUPUI

CSCI 230

Information Representation:
Floating Point Representation

Dale Roberts, Lecturer


IUPUI

Fractional Numbers
Examples: (456.78)_10 = 4 x 10^2 + 5 x 10^1 + 6 x 10^0 + 7 x 10^-1 + 8 x 10^-2
(1011.11)_2 = 1 x 2^3 + 0 x 2^2 + 1 x 2^1 + 1 x 2^0 + 1 x 2^-1 + 1 x 2^-2
= 8 + 0 + 2 + 1 + 1/2 + 1/4
= 11 + 0.5 + 0.25 = (11.75)_10

Conversion from binary number system to decimal system
Examples: (111.11)_2 = 1 x 2^2 + 1 x 2^1 + 1 x 2^0 + 1 x 2^-1 + 1 x 2^-2
= 4 + 2 + 1 + 1/2 + 1/4 = (7.75)_10
Examples: (11.011)_2

Position:   2     1     0     -1    -2    -3
Weight:     2^2   2^1   2^0   2^-1  2^-2  2^-3
Value:      4     2     1     1/2   1/4   1/8
Bit set:          x     x           x     x

(11.011)_2 = 2 + 1 + 1/4 + 1/8 = (3.375)_10
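
The positional weights above can be sketched in a few lines of Python. This is only an illustration, not part of the original slides; the function name `binary_to_decimal` is made up for this example, and the input is assumed to be a non-negative binary string with at most one point.

```python
def binary_to_decimal(bits: str) -> float:
    """Convert a binary string such as '111.11' to its decimal value."""
    integer_part, _, fraction_part = bits.partition(".")
    value = 0.0
    # Integer digits carry weights 2^0, 2^1, ... counted from the right.
    for position, digit in enumerate(reversed(integer_part)):
        value += int(digit) * 2 ** position
    # Fraction digits carry weights 2^-1, 2^-2, ... counted from the left.
    for position, digit in enumerate(fraction_part, start=1):
        value += int(digit) * 2 ** -position
    return value


print(binary_to_decimal("1011.11"))  # 11.75
print(binary_to_decimal("111.11"))   # 7.75
print(binary_to_decimal("11.011"))   # 3.375
```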

Conversion from decimal number system to binary system
Examples: (7.75)_10 = (?)_2

1. Conversion of the integer part: same as before, repeated division by 2
7 / 2 = 3 (Q), 1 (R) -> 3 / 2 = 1 (Q), 1 (R) -> 1 / 2 = 0 (Q), 1 (R), so (7)_10 = (111)_2
2. Conversion of the fractional part: perform a repeated multiplication by 2 and extract the integer part of the result
0.75 x 2 = 1.50 -> extract 1
0.5 x 2 = 1.0 -> extract 1
0.0 -> stop
(0.75)_10 = (0.11)_2, writing the extracted bits in the same order
3. Combine the results from the integer and fractional parts: (7.75)_10 = (111.11)_2

Alternatively, choose some of the weights
4, 2, 1, 1/2 (= 0.5), 1/4 (= 0.25), 1/8 (= 0.125)
that add up to the number.
Examples: try (5.625)_10
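
The two procedures (repeated division for the integer part, repeated multiplication for the fractional part) can be sketched as follows. This is a minimal illustration rather than anything from the slides: the name `decimal_to_binary` and the `max_fraction_bits` cutoff are assumptions, the input is assumed to be a non-negative decimal string, and `fractions.Fraction` is used so the arithmetic is exact.

```python
from fractions import Fraction


def decimal_to_binary(text: str, max_fraction_bits: int = 16) -> str:
    """Convert a non-negative decimal string such as '7.75' to binary."""
    number = Fraction(text)
    integer_part = int(number)
    fraction = number - integer_part

    # Integer part: repeated division by 2; remainders are read last to first.
    integer_bits = ""
    while integer_part > 0:
        integer_part, remainder = divmod(integer_part, 2)
        integer_bits = str(remainder) + integer_bits
    integer_bits = integer_bits or "0"

    # Fractional part: repeated multiplication by 2; the extracted integer
    # parts are written in the same order. Stop at 0 or at the bit limit.
    fraction_bits = ""
    while fraction != 0 and len(fraction_bits) < max_fraction_bits:
        fraction *= 2
        bit = int(fraction)
        fraction_bits += str(bit)
        fraction -= bit

    return integer_bits + ("." + fraction_bits if fraction_bits else "")


print(decimal_to_binary("7.75"))  # 111.11
```

The bit limit matters because some decimal fractions never reach 0 under repeated doubling, so the loop would otherwise not terminate.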

Fractional Numbers (cont.)
Exercise 1: Convert (0.625)_10 to its binary form
Solution: 0.625 x 2 = 1.25 -> extract 1
0.25 x 2 = 0.5 -> extract 0
0.5 x 2 = 1.0 -> extract 1
0.0 -> stop

Therefore (0.625)_10 = (0.101)_2

Exercise 2: Convert (0.6)_10 to its binary form
Solution: 0.6 x 2 = 1.2 -> extract 1
0.2 x 2 = 0.4 -> extract 0
0.4 x 2 = 0.8 -> extract 0
0.8 x 2 = 1.6 -> extract 1
0.6 x 2 = ... (the fraction 0.6 has reappeared, so the digits repeat)

Therefore (0.6)_10 = (0.1001 1001 1001 ...)_2

Fractional Numbers (cont.)
Exercise 3: Convert (0.8125)_10 to its binary form
Solution: 0.8125 x 2 = 1.625 -> extract 1
0.625 x 2 = 1.25 -> extract 1
0.25 x 2 = 0.5 -> extract 0
0.5 x 2 = 1.0 -> extract 1
0.0 -> stop

Therefore (0.8125)_10 = (0.1101)_2

Fractional Numbers (cont.)
Errors
One source of error in computations is the back-and-forth conversion between decimal and binary formats.
Example: (0.6)_10 + (0.6)_10 = (1.2)_10

Since (0.6)_10 = (0.1001 1001 1001 ...)_2, the binary form must be cut off somewhere.
Let's assume an 8-bit representation: (0.6)_10 = (0.1001 1001)_2, therefore

  0.6        0.10011001
+ 0.6  ->  + 0.10011001
           ------------
             1.00110010

Let's reconvert to the decimal system:
(1.00110010)_2
= 1 x 2^0 + 0 x 2^-1 + 0 x 2^-2 + 1 x 2^-3 + 1 x 2^-4 + 0 x 2^-5 + 0 x 2^-6 + 1 x 2^-7 + 0 x 2^-8
= 1 + 1/8 + 1/16 + 1/128 = 1.1953125
Error = 1.2 - 1.1953125
= 0.0046875
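
The same error can be reproduced numerically. The sketch below is an illustration under the slide's assumption that 0.6 is cut off (not rounded) after 8 fraction bits; `truncate_to_bits` is a hypothetical helper, and exact rational arithmetic is used so that Python's own floating point rounding does not blur the result.

```python
from fractions import Fraction


def truncate_to_bits(value: Fraction, bits: int) -> Fraction:
    """Keep only the first `bits` binary digits after the point (no rounding)."""
    scale = 2 ** bits
    return Fraction(int(value * scale), scale)


stored = truncate_to_bits(Fraction("0.6"), 8)  # 0.10011001 in binary
total = stored + stored                        # 1.00110010 in binary
error = Fraction("1.2") - total

print(float(stored))  # 0.59765625
print(float(total))   # 1.1953125
print(float(error))   # 0.0046875
```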

Floating Point Number Representation
If x is a real number then its normal form representation is:

x = f • Base^E
where f: mantissa
E: exponent

Example: (125.32)_10 = 0.12532 • 10^3 (mantissa 0.12532, exponent 3)
(-125.32)_10 = -0.12532 • 10^3
(0.0546)_10 = 0.546 • 10^-1

The mantissa is normalized, so the first digit after the fractional point is non-zero.
Note that in binary, that leading non-zero digit is always 1, so it is normally hidden (not stored).
If needed, the mantissa is shifted to make the first digit after the fractional point non-zero, and the exponent is adjusted to compensate.

Normalizing Numbers
Example:

(134.15)_10 = 0.13415 x 10^3

(0.0021)_10 = 0.21 x 10^-2

101.11B = .10111 x 2^3 or 1.0111 x 2^2 (hidden 1)

0.011B = .11 x 2^-1 or 1.1 x 2^-2 (hidden 1)

AB.CDH = .ABCD x 16^2

0.00ACH = .AC x 16^-2

Note that the concept of a hidden 1 applies only to binary.
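
For the binary case, normalization to the hidden-1 form amounts to finding the first 1 and counting how far the point has to move. The following sketch assumes a non-zero, non-negative binary string; the name `normalize_binary` is invented for this example.

```python
def normalize_binary(bits: str) -> tuple[str, int]:
    """Return (fraction_bits, exponent) so that the value is 1.fraction_bits x 2^exponent."""
    integer_part, _, fraction_part = bits.partition(".")
    digits = integer_part + fraction_part
    first_one = digits.index("1")  # raises ValueError if the number is zero
    # Number of places the point moves to sit just after the first 1.
    exponent = len(integer_part) - 1 - first_one
    fraction_bits = digits[first_one + 1:].rstrip("0")
    return fraction_bits, exponent


print(normalize_binary("101.11"))  # ('0111', 2)  i.e. 1.0111 x 2^2
print(normalize_binary("0.011"))   # ('1', -2)    i.e. 1.1 x 2^-2
```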

Floating Point Number Representation
Assume we use a 16-bit binary pattern for normalized binary form, based on the following convention (MSB to LSB):
Sign of mantissa (±) = leftmost bit (0: +, 1: -)
Mantissa (f) = next 11 bits, leading 1 is assumed, m = 1.f
Exponent (E) = next 4 bits, bias 7 (excess-7)
2^0 is stored as 7 (0111), 2^1 as 8 (1000), 2^-1 as 6 (0110)

x = ± m • 2^E, where m = 1.f1 f2 f3 f4 ... f11
E + 7 is converted to binary: b1 b2 b3 b4

MSB                                                  LSB
| sign | f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 | b1 b2 b3 b4 |
  +: 0   mantissa (11 bits)                   exponent (excess-7)
  -: 1

Floating Point Number Representation
Question:
How does the computer express the 16-bit approximation of 1110.111010111111B in normalized binary form using the following convention?
Sign of mantissa = leftmost bit (0: +, 1: -)
Mantissa = next 11 bits, leading 1 is hidden, so it really represents 12 bits
Exponent = next four bits, bias 7

| sign (1 bit) | mantissa (11 bits) | exponent (4 bits) |

Answer:
Step 1: Normalization
1110.111010111111B = + 1.110111010111111B x 2^+3

Step 2: "Plant" the 16 bits
Sign = 0 (positive)
Mantissa = 11011101011 (the first 11 bits after the hidden 1; the remaining bits 1111 are dropped)
Exponent = 3 + 7 = 10 = 1010B

The 16-bit floating point representation is 0 11011101011 1010
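
The two steps can be sketched in code. This is an illustration of the slide's classroom 16-bit convention, not of any real hardware format; `encode_16bit` is a hypothetical name, the input is assumed to be a non-zero binary string, and extra mantissa bits are truncated rather than rounded, as in the worked answer.

```python
def encode_16bit(bits: str, negative: bool = False) -> str:
    """Encode a binary string in the 1 + 11 + 4 bit excess-7 classroom format."""
    integer_part, _, fraction_part = bits.partition(".")
    digits = integer_part + fraction_part

    # Step 1: normalize to 1.f x 2^exponent.
    first_one = digits.index("1")  # raises ValueError if the number is zero
    exponent = len(integer_part) - 1 - first_one

    # Step 2: plant the fields. The leading 1 is hidden; keep the next 11 bits.
    mantissa = digits[first_one + 1:][:11].ljust(11, "0")
    biased = exponent + 7
    if not 0 <= biased <= 15:
        raise ValueError("exponent does not fit in the 4-bit excess-7 field")
    sign = "1" if negative else "0"
    return f"{sign} {mantissa} {biased:04b}"


print(encode_16bit("1110.111010111111"))  # 0 11011101011 1010
```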

Floating Point Number Representation
Question:
Interpret the normalized binary number
0110 0000 0000 0100 B
using the convention mentioned:
Sign of mantissa = leftmost bit (0: +, 1: -)
Mantissa = next 11 bits, leading 1 is hidden, so it really represents 12 bits
Exponent = next four bits, bias 7
and find its decimal equivalent.

Answer:
0 11000000000 0100 B = 1.11B x 2^(4-7) = 1.11B x 2^-3 = 0.00111B
= 7/32 = 0.21875D
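
Decoding reverses the process: split the 16 bits into the three fields, restore the hidden 1, and remove the bias. The sketch below follows the slide's classroom convention only; `decode_16bit` is a hypothetical name, and spaces in the input are ignored.

```python
def decode_16bit(pattern: str) -> float:
    """Decode a 1 + 11 + 4 bit excess-7 pattern into its decimal value."""
    bits = pattern.replace(" ", "")
    if len(bits) != 16 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly 16 binary digits")
    sign = -1.0 if bits[0] == "1" else 1.0
    # Restore the hidden leading 1 in front of the 11 stored fraction bits.
    mantissa = 1 + int(bits[1:12], 2) / 2 ** 11
    # Remove the bias of 7 from the stored exponent.
    exponent = int(bits[12:], 2) - 7
    return sign * mantissa * 2.0 ** exponent


print(decode_16bit("0110 0000 0000 0100"))  # 0.21875
```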

Real Life Example: IEEE 754
IEEE Standard 754 is the representation of floating point used on most computers.
Single precision (float) is 32 bits or 4 bytes with the following configuration.

| 1 sign bit | 8 exponent bits | 23 fraction bits |

•The sign field for the mantissa is 0 for positive or 1 for negative.
•In the mantissa, the binary point is assumed to follow the first ‘1’. Since the first digit is always a ‘1’, it is not stored: a hidden bit represents it. The fraction is the 23 bits following the first ‘1’, so the fraction really represents a 24-bit mantissa.
•The exponent field has a bias of 127, meaning that 127 is added to the exponent before it’s stored. 2^0 becomes 127, 2^1 becomes 128, 2^-3 becomes 124, 2^-1 becomes 126, etc. When the exponent field is all zeroes (nominally an exponent of -127), the hidden bit is not used, which allows gradual underflow.
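
The three fields can be inspected directly with Python's standard `struct` module, which packs a value as a big-endian IEEE 754 single (`">f"`) and lets the same four bytes be reread as an unsigned integer (`">I"`). The helper name `float32_fields` is an assumption made for this sketch.

```python
import struct


def float32_fields(value: float) -> tuple[int, int, int]:
    """Return (sign, stored_exponent, fraction) of a value as an IEEE 754 single."""
    (raw,) = struct.unpack(">I", struct.pack(">f", value))
    sign = raw >> 31                      # 1 bit
    stored_exponent = (raw >> 23) & 0xFF  # 8 bits, bias 127 already added
    fraction = raw & 0x7FFFFF             # 23 bits after the hidden 1
    return sign, stored_exponent, fraction


print(float32_fields(1.0))    # (0, 127, 0): 2^0 is stored as 127
print(float32_fields(2.0))    # (0, 128, 0): 2^1 is stored as 128
print(float32_fields(0.125))  # (0, 124, 0): 2^-3 is stored as 124
print(float32_fields(0.5))    # (0, 126, 0): 2^-1 is stored as 126
```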

Real Life Example: IEEE 754
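IEEE 754 Examples: Normalized Numbers

0 1000 0011 0000 0000 0000 0000 0000 000
= 1 x 2^4 = 16 (exponent field 131 - 127 = 4)

0 0011 0001 0000 0000 0000 0000 0000 000
= 1 x 2^-78 = 3.3087e-24 (exponent field 49 - 127 = -78)

0 1000 0001 0100 0000 0000 0000 0000 000
= 1.25 x 2^2 = 5 (exponent field 129 - 127 = 2, fraction .01B = 0.25)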
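
These examples can be checked by going the other way: build the 32-bit integer from the bit string and let `struct` reinterpret it as a single precision float. `bits_to_float32` is a hypothetical helper name for this sketch.

```python
import struct


def bits_to_float32(pattern: str) -> float:
    """Interpret 32 binary digits (spaces ignored) as an IEEE 754 single."""
    bits = pattern.replace(" ", "")
    if len(bits) != 32:
        raise ValueError("expected exactly 32 binary digits")
    raw = int(bits, 2)
    return struct.unpack(">f", struct.pack(">I", raw))[0]


print(bits_to_float32("0 1000 0011 0000 0000 0000 0000 0000 000"))  # 16.0
print(bits_to_float32("0 1000 0001 0100 0000 0000 0000 0000 000"))  # 5.0
print(bits_to_float32("0 0011 0001 0000 0000 0000 0000 0000 000") == 2.0 ** -78)  # True
```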

Real Life Example: IEEE 754
Double precision (double) is 64 bits or 8 bytes with the following configuration.

| 1 sign bit | 11 exponent bits | 52 fraction bits |

•The definition of the fields matches single precision.
•The double precision bias is 1023.
•What value can you not represent because of the hidden bit?
•Certain bit patterns are reserved to represent special values. Of particular importance is the representation for zero (all bits zero). There are also patterns to represent positive and negative infinity (the result of numeric overflow), denormalized numbers (gradual underflow), and not-a-number (abbreviated NaN).
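
The reserved patterns are easy to look at from Python, whose `float` is an IEEE 754 double on common platforms. The sketch below uses the standard `struct` and `math` modules; `float64_bits` is a hypothetical helper that prints the sign, exponent, and fraction fields separated by spaces.

```python
import math
import struct


def float64_bits(value: float) -> str:
    """Return the 64 bits of a double as 'sign exponent fraction'."""
    (raw,) = struct.unpack(">Q", struct.pack(">d", value))
    bits = f"{raw:064b}"
    return f"{bits[0]} {bits[1:12]} {bits[12:]}"


print(float64_bits(1.0))       # exponent field 01111111111 (1023, the bias)
print(float64_bits(0.0))       # all bits zero
print(float64_bits(math.inf))  # exponent all ones, fraction all zeros
print(float64_bits(math.nan))  # exponent all ones, fraction non-zero
```

Zero answers the question above: with a hidden leading 1 there is no way to write 0 as 1.f x 2^E, so it needs its own reserved pattern.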

IEEE 754 Converter
The Java applet IEEE 754 Converter is an interactive demonstration that lets you enter a floating point number and see its IEEE 754 representation.

Appendix. IBM 370
The 32-Bit Single Precision Floating Point Format for IBM 370
Base = 16
Exponent = Excess-64 notation (i.e., read the stored field as a binary integer, then subtract 64)
Sign = sign of number (0: positive, 1: negative)
Mantissa = normalized fraction (i.e., first digit after ‘.’ is non-zero)

| sign (1 bit) | exponent (7 bits) | mantissa (24 bits) |

Example: What is the value of the following floating point number?

1 100 0010 1001 0011 1101 0111 1100 0010

Sign = 1 -> the number is negative
Exponent (E) = (100 0010)_2 = (66)_10, so E = 66 - 64 = 2 (subtract 64, because of Excess-64)
Mantissa (f) = 1001 0011 1101 0111 1100 0010 = 93D7C2H
The above floating point number is: x = (sign) f • 16^E = - 0.93D7C2H • 16^2
x = - (9 x 16^-1 + 3 x 16^-2 + D x 16^-3 + 7 x 16^-4 + C x 16^-5 + 2 x 16^-6) • 16^2
= - (9 x 16^1 + 3 x 16^0 + 13 x 16^-1 + 7 x 16^-2 + 12 x 16^-3 + 2 x 16^-4)
= - (144 + 3 + 0.8125 + 0.02734375 + 0.0029296875 + 0.000030517578125)
= - 147.842803955078125
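
The worked example can be cross-checked with a short decoder. This is a sketch of the format exactly as described on this slide (1 sign bit, 7-bit excess-64 exponent, 24-bit base-16 fraction), not a full treatment of IBM hexadecimal floating point; `decode_ibm370_single` is a hypothetical name, and `fractions.Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction


def decode_ibm370_single(pattern: str) -> Fraction:
    """Decode 32 binary digits (spaces ignored) in the slide's IBM 370 format."""
    bits = pattern.replace(" ", "")
    if len(bits) != 32:
        raise ValueError("expected exactly 32 binary digits")
    sign = -1 if bits[0] == "1" else 1
    exponent = int(bits[1:8], 2) - 64                # excess-64
    mantissa = Fraction(int(bits[8:], 2), 16 ** 6)   # 0.hhhhhh in base 16
    return sign * mantissa * Fraction(16) ** exponent


value = decode_ibm370_single("1 100 0010 1001 0011 1101 0111 1100 0010")
print(float(value))  # -147.84280395507812 (the exact value is -147.842803955078125)
```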

Acknowledgements
These slides were originally prepared by Dr. Jeffrey Huang and updated by Dale Roberts.
IEEE 754 information was obtained from Steve Hollasch, http://stevehollasch.com/cgindex/coding/ieeefloat.html.
IEEE 754 examples were obtained from Tony Cassandra at St. Edward’s University.
IEEE 754 Converter: http://www.h-schmidt.net/FloatApplet/IEEE754.html.
