0% found this document useful (0 votes)
69 views11 pages

Ieee 754 F P R: Loating Oint Epresentation

The document discusses the IEEE 754 standard for floating point representation. It defines single and double precision floating point formats that represent numbers in binary format using three components: a sign bit, exponent field, and significand (fraction) field. The standard allows for representation of both small and large numeric values consistently across systems. Single precision uses 32 bits total with an 8-bit exponent and 23-bit fraction, representing about 6 digits of precision. Double precision uses 64 bits with an 11-bit exponent and 52-bit fraction, representing about 16 digits of precision.

Uploaded by

Hoang Anh
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
69 views11 pages

Ieee 754 F P R: Loating Oint Epresentation

The document discusses the IEEE 754 standard for floating point representation. It defines single and double precision floating point formats that represent numbers in binary format using three components: a sign bit, exponent field, and significand (fraction) field. The standard allows for representation of both small and large numeric values consistently across systems. Single precision uses 32 bits total with an 8-bit exponent and 23-bit fraction, representing about 6 digits of precision. Double precision uses 64 bits with an 11-bit exponent and 52-bit fraction, representing about 16 digits of precision.

Uploaded by

Hoang Anh
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 11

IEEE 754 FLOATING POINT

REPRESENTATION
Alark Joshi

Slides courtesy of Computer Organization and Design, 4th edition


FLOATING POINT
 Representation for non-integral numbers
 Including very small and very large numbers
 Like scientific notation
 –2.34 × 1056 normalized
 +0.002 × 10–4
 +987.02 × 109 not normalized

 In binary
 ±1.xxxxxxx2 × 2yyyy
 Types float and double in C
FLOATING POINT STANDARD
 Defined by IEEE Std 754-1985
 Developed in response to divergence of
representations
 Portability issues for scientific code
 Now almost universally adopted
 Two representations
 Single precision (32-bit)
 Double precision (64-bit)
IEEE FLOATING-POINT FORMAT
single: 8 bits single: 23 bits
double: 11 bits double: 52 bits
S Exponent Fraction

x  (1)S  (1 Fraction) 2(Exponent Bias)

 S: sign bit (0  non-negative, 1  negative)

 Normalize significand: 1.0 ≤ |significand| < 2.0


 Significand is Fraction with the “1.” restored
 Always has a leading pre-binary-point 1 bit, so no need to
represent it explicitly (hidden bit)
IEEE FLOATING-POINT FORMAT
single: 8 bits single: 23 bits
double: 11 bits double: 52 bits
S Exponent Fraction

x  (1)S  (1 Fraction) 2(Exponent Bias)

 Exponent: excess representation: actual exponent +


Bias
 Ensures exponent is unsigned
 Single precision: Bias = 127;
 Double precision: Bias = 1203
SINGLE-PRECISION RANGE
 Exponents 00000000 and 11111111 are reserved
 Smallest value
 Exponent: 00000001
 actual exponent = 1 – 127 = –126
 Fraction: 000…00  significand = 1.0
 ±1.0 × 2–126 ≈ ±1.2 × 10–38
 Largest value
 exponent: 11111110
 actual exponent = 254 – 127 = +127
 Fraction: 111…11  significand ≈ 2.0
 ±2.0 × 2+127 ≈ ±3.4 × 10+38
DOUBLE-PRECISION RANGE
 Exponents 0000…00 and 1111…11 are reserved
 Smallest value
 Exponent: 00000000001
 actual exponent = 1 – 1023 = –1022
 Fraction: 000…00  significand = 1.0
 ±1.0 × 2–1022 ≈ ±2.2 × 10–308
 Largest value
 Exponent: 11111111110
 actual exponent = 2046 – 1023 = +1023
 Fraction: 111…11  significand ≈ 2.0
 ±2.0 × 2+1023 ≈ ±1.8 × 10+308
FLOATING-POINT PRECISION
 Relative precision
 all fraction bits are significant
 Single: approx 2–23
 Equivalent to 23 × log102 ≈ 23 × 0.3 ≈ 6
decimal digits of precision
 Double: approx 2–52
 Equivalent to 52 × log102 ≈ 52 × 0.3 ≈ 16
decimal digits of precision
FLOATING-POINT EXAMPLE
 Represent –0.75
 –0.75 = (–1)1 × 1.12 × 2–1
= -1 × 1. ½ × ½
= -1.5 * .5 = -0.75
 S=1
 Fraction = 1000…002
 Exponent = –1 + Bias
 Single: –1 + 127 = 126 = 011111102
 Double: –1 + 1023 = 1022 = 011111111102

 Single: 1011111101000…00
 Double: 1011111111101000…00
FLOATING-POINT EXAMPLE
 What number is represented by the single-
precision float
11000000101000…00
 S=1
 Fraction = 01000…002
 Exponent = 100000012 = 129

 x = (–1)1 × (1 + 012) × 2(129 – 127)


= (–1) × 1.25 × 22
= –5.0
EXAMPLE
 Number to IEEE 754 conversion

 http://www.h-schmidt.net/FloatConverter/IEEE754.html

 127.0 – 0 10000101 11111100000000000000000


 128.0 – 0 10000110 00000000000000000000000

 Check IEEE 754 representation for


 2.0, -2.0
 127.99
 127.99999 (five 9’s)
 What happens with 127.999999 (six 9’s) and 3.999999 (six 9’s)

You might also like