Tutorial: Floating-Point Binary [Link]

htm

Tutorial: Floating-Point Binary

The two most common floating-point binary storage formats used by Intel processors were created for Intel and later standardized by
the IEEE organization:

IEEE Short Real: 32 bits. 1 bit for the sign, 8 bits for the exponent, and 23 bits for the mantissa. Also called single precision.
IEEE Long Real: 64 bits. 1 bit for the sign, 11 bits for the exponent, and 52 bits for the mantissa. Also called double precision.

Both formats use essentially the same method for storing floating-point binary numbers, so we will use
the Short Real as an example in this tutorial. The bits in an IEEE Short Real are arranged as follows, with
the most significant bit (MSB) on the left:

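Fig. 1. IEEE Short Real layout, MSB (bit 31) on the left: sign (1 bit) | exponent (8 bits) | mantissa (23 bits)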
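To inspect the three fields of an actual value, the following minimal Python sketch uses the standard `struct` module to reinterpret a float's bytes as an unsigned integer and slices the resulting bit string. The helper names `float32_fields` and `float64_fields` are illustrative, not part of the tutorial. Note that `struct.pack('>f', x)` rounds `x` to single precision first, so values that are not exactly representable come back rounded.

```python
import struct

def float32_fields(x):
    """Return (sign, exponent, mantissa) bit strings of x stored as an IEEE Short Real."""
    # '>f' packs x as a big-endian 32-bit float (rounding it to single precision);
    # '>I' reads the same four bytes back as an unsigned 32-bit integer.
    (bits,) = struct.unpack('>I', struct.pack('>f', x))
    pattern = format(bits, '032b')          # MSB first, as in Fig. 1
    return pattern[0], pattern[1:9], pattern[9:]

def float64_fields(x):
    """Same split for an IEEE Long Real: 1 sign, 11 exponent, 52 mantissa bits."""
    (bits,) = struct.unpack('>Q', struct.pack('>d', x))
    pattern = format(bits, '064b')
    return pattern[0], pattern[1:12], pattern[12:]

print(float32_fields(-1.75))   # ('1', '01111111', '11000000000000000000000')
print(float64_fields(0.5)[1])  # 01111111110
```

The same slicing approach works for both formats; only the field widths change.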

The Sign
The sign of a binary floating-point number is represented by a single bit. A 1 bit indicates a negative
number, and a 0 bit indicates a positive number.

The Mantissa
It is useful to consider the way decimal floating-point numbers represent their mantissa. Using -3.154 x
10^5 as an example, the sign is negative, the mantissa is 3.154, and the exponent is 5. The fractional
portion of the mantissa is the sum of each digit multiplied by a negative power of 10:

.154 = 1/10 + 5/100 + 4/1000

A binary floating-point number is similar. For example, in the number +11.1011 x 2^3, the sign is positive,
the mantissa is 11.1011, and the exponent is 3. The fractional portion of the mantissa is the sum of
each digit multiplied by a successive negative power of 2. In our example, it is expressed as:

.1011 = 1/2 + 0/4 + 1/8 + 1/16

Or, you can calculate this value as 1011 divided by 2^4. In decimal terms, this is eleven divided by
sixteen, or 0.6875. Adding the integer part of 11.1011 (binary 11, decimal 3) gives the decimal value of
the mantissa, 3.6875, so the full number 3.6875 x 2^3 equals 29.5. Here are additional examples:

Binary Floating-Point        Base 10 Fraction   Base 10 Decimal
11.11                        3 3/4              3.75
0.00000000000000000000001    1/8388608          0.00000011920928955078125

The last entry in this table shows the smallest fraction that can be stored in a 23-bit mantissa. The
following table shows a few simple examples of binary floating-point numbers alongside their equivalent
decimal fractions and decimal values:

Binary    Decimal Fraction   Decimal Value
.1        1/2                .5
.01       1/4                .25
.001      1/8                .125
.0001     1/16               .0625
.00001    1/32               .03125

1 of 4 2/19/2013 12:07 AM
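The conversions in the tables above can be checked mechanically. This minimal Python sketch (the function name `binary_to_fraction` is illustrative) uses the standard `fractions.Fraction` type so the result is exact, and it evaluates the fractional digits the same way the text does: as an integer divided by 2 raised to the number of digits. It assumes an unsigned input written with an optional binary point.

```python
from fractions import Fraction

def binary_to_fraction(text):
    """Exact value of an unsigned binary real such as '11.1011' or '.00001'."""
    whole, _, frac = text.partition('.')
    value = Fraction(int(whole, 2)) if whole else Fraction(0)
    if frac:
        # The fraction digits read as an integer, divided by 2 to the number of digits:
        # '.1011' -> 0b1011 / 2**4 = 11/16, the same as 1/2 + 0/4 + 1/8 + 1/16.
        value += Fraction(int(frac, 2), 2 ** len(frac))
    return value

v = binary_to_fraction('11.1011')
print(v, float(v))   # 59/16 3.6875
```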

The Exponent
IEEE Short Real exponents are stored as 8-bit unsigned integers with a bias of 127. Let's use the number
1.101 x 2^5 as an example. The exponent (5) is added to 127, and the sum (132) is binary 10000100. Here
are some examples of exponents, first shown in decimal, then adjusted, and finally in unsigned binary:

Exponent (E)   Adjusted (E + 127)   Binary
+5             132                  10000100
0              127                  01111111
-10            117                  01110101
+128           255                  11111111
-127           0                    00000000
-1             126                  01111110

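The stored exponent is unsigned and therefore cannot be negative. Two field values are reserved, however:
11111111 (255) encodes infinity and NaN (not a number), and 00000000 (0) encodes zero and the
denormalized (subnormal) numbers. The +128 and -127 rows in the table above therefore do not produce
ordinary normalized values; the true exponent of a normalized Short Real ranges from -126 to +127. The
approximate range of normalized values is from 1.0 x 2^-126 to just under 2.0 x 2^+127 (about
1.18 x 10^-38 to 3.40 x 10^38).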
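A minimal Python sketch of the bias arithmetic, assuming the standard `struct` module for reinterpreting bit patterns; the helper names `stored_exponent` and `from_bits` are illustrative. It rejects true exponents outside -126..+127 because IEEE 754 reserves the all-zeros and all-ones exponent fields, and it decodes the bit patterns at the edges of the normalized range.

```python
import struct

BIAS = 127  # Short Real exponent bias

def stored_exponent(e):
    """8-bit biased exponent field for a normalized value with true exponent e."""
    # Fields 00000000 and 11111111 are reserved, so normalized exponents run from -126 to +127.
    if not -126 <= e <= 127:
        raise ValueError(f"exponent {e} is outside the normalized range")
    return format(e + BIAS, '08b')

def from_bits(bits):
    """Reinterpret a 32-bit pattern as an IEEE Short Real."""
    return struct.unpack('>f', struct.pack('>I', bits))[0]

print(stored_exponent(5))      # 10000100
print(stored_exponent(-10))    # 01110101
print(from_bits(0x00800000))   # exponent field 00000001: smallest normalized value, 2**-126
print(from_bits(0x7F7FFFFF))   # exponent field 11111110, all-ones mantissa: largest finite value
print(from_bits(0x7F800000))   # exponent field 11111111, zero mantissa: inf
```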

Normalizing the Mantissa

Before a floating-point binary number can be stored correctly, its mantissa must be normalized. The
process is basically the same as when normalizing a floating-point decimal number. For example, decimal
1234.567 is normalized as 1.234567 x 10^3 by moving the decimal point so that only one digit appears
before it. The exponent expresses the number of positions the decimal point was moved left
(positive exponent) or moved right (negative exponent).

Similarly, the floating-point binary value 1101.101 is normalized as 1.101101 x 2^3 by moving the binary
point 3 positions to the left and multiplying by 2^3. Here are some examples of normalizations:

Binary Value   Normalized As   Exponent
1101.101       1.101101         3
.00101         1.01            -3
1.0001         1.0001           0
10000011.0     1.0000011        7

You may have noticed that in a normalized mantissa, the digit 1 always appears to the left of the binary
point. Because it is always present, that leading 1 is redundant, and the IEEE storage format omits it.
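The normalization step can be sketched in a few lines of Python. This is an illustration under simple assumptions (an unsigned, nonzero binary string; the name `normalize` is hypothetical): it finds the leading 1, and the distance the binary point must move to sit just after that 1 becomes the exponent.

```python
def normalize(text):
    """Normalize an unsigned binary real: '1101.101' -> ('1.101101', 3)."""
    whole, _, frac = text.partition('.')
    digits = whole + frac
    first_one = digits.find('1')
    if first_one < 0:
        raise ValueError("zero has no normalized form")
    # Moving the binary point to just after the leading 1 shifts it by this many places
    # (positive = moved left, negative = moved right), which becomes the exponent.
    exponent = len(whole) - first_one - 1
    tail = digits[first_one + 1:].rstrip('0') or '0'
    return '1.' + tail, exponent

for value in ['1101.101', '.00101', '1.0001', '10000011.0']:
    print(value, *normalize(value))
```

Only `tail` (the digits after the leading 1) would go into the stored mantissa field.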

Creating the IEEE Bit Representation

We can now combine the sign, exponent, and normalized mantissa into the binary IEEE Short Real
representation. Using Figure 1 as a reference, the value 1.101 x 2^0 is stored as sign = 0 (positive),
mantissa = 101 (padded on the right with zeros to 23 bits), and exponent = 01111111 (the exponent value
is added to 127). The 1 to the left of the binary point is dropped from the mantissa. Here are more examples:

Binary Value      Biased Exponent   Sign, Exponent, Mantissa
-1.11             127               1 01111111 11000000000000000000000
+1101.101         130               0 10000010 10110100000000000000000
-.00101           124               1 01111100 01000000000000000000000
+100111.0         132               0 10000100 00111000000000000000000
+.0000001101011   120               0 01111000 10101100000000000000000
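The table above can be reproduced and cross-checked against the bits the hardware actually stores. This Python sketch (function names `short_real_bits` and `matches_hardware` are illustrative) assumes a nonzero value whose normalized exponent lies in -126..+127 and whose mantissa fits in 23 bits after the leading 1 is dropped, so no rounding is needed; it raises `ValueError` otherwise. The comparison uses the standard `struct` module.

```python
import struct

def short_real_bits(text):
    """'sign exponent mantissa' fields for a signed binary real such as '-.00101'."""
    sign = '1' if text.startswith('-') else '0'
    whole, _, frac = text.lstrip('+-').partition('.')
    digits = whole + frac
    first_one = digits.index('1')                  # raises ValueError for zero
    exponent = len(whole) - first_one - 1          # binary-point shift from normalizing
    mantissa = digits[first_one + 1:].rstrip('0')  # implied leading 1 is dropped
    if len(mantissa) > 23 or not -126 <= exponent <= 127:
        raise ValueError("value needs rounding or is out of range")
    return f"{sign} {exponent + 127:08b} {mantissa.ljust(23, '0')}"

def matches_hardware(text, value):
    """True if the hand-built fields equal struct's encoding of value."""
    (bits,) = struct.unpack('>I', struct.pack('>f', value))
    return short_real_bits(text).replace(' ', '') == f"{bits:032b}"

print(short_real_bits('+1101.101'))           # 0 10000010 10110100000000000000000
print(matches_hardware('+1101.101', 13.625))  # True
```

Here 13.625 is the decimal value of binary 1101.101; the other rows can be checked the same way.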

Converting Decimal Fractions to Binary Reals

If a decimal fraction can easily be represented as a sum of fractions of the form (1/2 + 1/4 + 1/8 + ...),
it is fairly easy to discover the corresponding binary real. Here are a few simple examples:

Decimal Fraction   Factored As...       Binary Real
1/2                1/2                  .1
1/4                1/4                  .01
3/4                1/2 + 1/4            .11
1/8                1/8                  .001
7/8                1/2 + 1/4 + 1/8      .111
3/8                1/4 + 1/8            .011
1/16               1/16                 .0001
3/16               1/8 + 1/16           .0011
5/16               1/4 + 1/16           .0101
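The factoring in the table can be automated. The sketch below uses repeated doubling, a standard technique that is equivalent to the table's "sum of powers of 1/2" factoring but is not the method described in the tutorial: each doubling shifts the fraction one binary place left, and the integer part that appears is the next bit. The name `fraction_to_binary` and the 23-bit cutoff are assumptions chosen to match the Short Real mantissa width; exact `Fraction` arithmetic avoids rounding while generating the bits.

```python
from fractions import Fraction

def fraction_to_binary(value, max_bits=23):
    """Binary digits of 0 <= value < 1 by repeated doubling (truncated after max_bits)."""
    value = Fraction(value)
    if not 0 <= value < 1:
        raise ValueError("expected a fraction in [0, 1)")
    digits = []
    while value and len(digits) < max_bits:
        value *= 2                  # shift one binary place to the left
        bit = 1 if value >= 1 else 0
        digits.append(str(bit))
        value -= bit                # keep only the fractional part
    return '.' + (''.join(digits) or '0')

for text in ['1/2', '3/4', '7/8', '3/16', '5/16']:
    print(text, fraction_to_binary(Fraction(text)))
```

Fractions whose denominators are powers of 2 terminate quickly; a fraction such as 1/5 runs until the `max_bits` cutoff.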

Of course, the real world is never so simple. A fraction such as 1/5 (0.2) must be represented by a sum
of fractions whose denominators are powers of 2. Here is the output from a program that subtracts each
successive fraction from 0.2 and shows each remainder. No exact value is reached, even after all 23
mantissa bits have been generated; the result, however, is accurate to about 7 decimal digits. Lines that
show only a bit number are for fractions that were too large to be subtracted from the remaining value of
the number. Bit 1, for example, was equal to .5 (1/2), which could not be subtracted from 0.2.

starting: 0.200000000000

1
2
3 subtracting 0.125000000000
remainder = 0.075000000000
4 subtracting 0.062500000000
remainder = 0.012500000000
5
6
7 subtracting 0.007812500000
remainder = 0.004687500000
8 subtracting 0.003906250000
remainder = 0.000781250000
9
10
11 subtracting 0.000488281250
remainder = 0.000292968750
12 subtracting 0.000244140625
remainder = 0.000048828125
13
14
15 subtracting 0.000030517578
remainder = 0.000018310547


16 subtracting 0.000015258789
remainder = 0.000003051758
17
18
19 subtracting 0.000001907349
remainder = 0.000001144409
20 subtracting 0.000000953674
remainder = 0.000000190735
21
22
23 subtracting 0.000000119209
remainder = 0.000000071526

Mantissa: .00110011001100110011001
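The listing above can be regenerated with a short Python sketch. The original program's language and code are not given, so this is an assumed reconstruction: it uses exact `Fraction` arithmetic and prints in a similar format, with a hypothetical function name `subtract_powers`. The last lines compare against what the hardware stores for 0.2 via the standard `struct` module. The listing's 23 bits are the raw, truncated fraction digits; an IEEE Short Real instead normalizes 0.2 to 1.1001100... x 2^-3 (exponent field 01111100) and rounds to nearest, so its stored mantissa field ends in ...1101.

```python
import struct
from fractions import Fraction

def subtract_powers(value, nbits=23):
    """Greedy subtraction of 1/2, 1/4, 1/8, ... from value, printing each step."""
    print(f"starting: {float(value):.12f}")
    bits = []
    for k in range(1, nbits + 1):
        power = Fraction(1, 2 ** k)
        if power <= value:                   # this power of 2 fits: bit k is 1
            value -= power
            bits.append('1')
            print(f"{k} subtracting {float(power):.12f}")
            print(f"remainder = {float(value):.12f}")
        else:                                # too large to subtract: bit k is 0
            bits.append('0')
            print(k)
    return '.' + ''.join(bits), value

mantissa, remainder = subtract_powers(Fraction(1, 5))
print("Mantissa:", mantissa)   # Mantissa: .00110011001100110011001

# For comparison, the fields a CPU actually stores for 0.2 as a Short Real.
(bits,) = struct.unpack('>I', struct.pack('>f', 0.2))
pattern = f"{bits:032b}"
print(pattern[1:9], pattern[9:])   # 01111100 10011001100110011001101
```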
