8.1.4 Data Representation - Floating Point Numbers

The document explains floating point numbers, including their representation in both decimal and binary formats using scientific notation. It details the IEEE 754 standard for single and double precision, including the structure of the sign, exponent, and mantissa fields, as well as the concepts of overflow and underflow. Additionally, it covers floating point arithmetic operations such as addition and multiplication, providing examples for both decimal and binary calculations.

Floating Point Numbers

Real Numbers: pi = 3.14159265... e = 2.71828...

Scientific Notation: has a single digit to the left of the decimal point.
A number in scientific notation with no leading 0s is called a Normalised Number: 1.0 × 10^-8
Not in normalised form: 0.1 × 10^-7 or 10.0 × 10^-9
Binary numbers can also be represented in scientific notation: 1.0 × 2^-3
Computer arithmetic that supports such numbers is called Floating Point.
The form is 1.xxxx… × 2^yy…
Using normalised scientific notation:
1. Simplifies the exchange of data that includes floating-point numbers
2. Simplifies the arithmetic algorithms to know that the numbers will always be in this form
3. Increases the accuracy of the numbers that can be stored in a word, since each unnecessary
leading 0 is replaced by another significant digit to the right of the decimal point
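
As a rough illustration of the normalised binary form 1.xxxx × 2^yy, the Python sketch below rewrites a value so that a single 1 sits to the left of the binary point. The helper name `normalise` is an assumption made for this example; `math.frexp` is the standard-library call that splits a float into a fraction and a power of two.

```python
import math

def normalise(x):
    """Return (mantissa, exponent) with x == mantissa * 2**exponent and 1 <= |mantissa| < 2."""
    if x == 0:
        raise ValueError("zero has no normalised form")
    m, e = math.frexp(x)   # x == m * 2**e with 0.5 <= |m| < 1
    return m * 2, e - 1    # shift one place so a single 1 is left of the point

print(normalise(0.125))    # (1.0, -3)   i.e. 1.0 × 2^-3
print(normalise(-0.4375))  # (-1.75, -2) i.e. -1.110 (binary) × 2^-2
```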
Representation of Floating-Point numbers
(-1)^S × M × 2^E
Bit No   Size     Field Name
31       1 bit    Sign (S)
23-30    8 bits   Exponent (E)
0-22     23 bits  Mantissa (M)
A Single-Precision floating-point number occupies 32 bits, so there is a compromise between the
size of the mantissa and the size of the exponent.
These chosen sizes provide a range of approximately:
± (10^-38 ... 10^38)
• Overflow
The exponent is too large to be represented in the Exponent field
• Underflow
The number is so close to zero that its negative exponent is too large in magnitude to be
represented in the Exponent field
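
To make overflow and underflow concrete, the sketch below rounds a Python float (normally a 64-bit double) to single precision using the standard `struct` module with the `">f"` format. The helper name `to_single` is an assumption made for this example, and the behaviour shown is that of CPython's `struct` module rather than of the hardware.

```python
import struct

def to_single(x):
    """Round a Python float to single precision and convert it back."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

print(to_single(1.0e38))    # within range: a value close to 1e38
print(to_single(1.0e-50))   # underflow: too small for the exponent field, becomes 0.0

try:
    to_single(1.0e39)       # overflow: the exponent does not fit in 8 bits
except OverflowError as err:
    print("overflow:", err)
```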
To reduce the chances of underflow/overflow, 64-bit Double-Precision arithmetic can be used:
Bit No   Size     Field Name
63       1 bit    Sign (S)
52-62    11 bits  Exponent (E)
0-51     52 bits  Mantissa (M)
providing a range of approximately
± (10^-308 ... 10^308)
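
The double-precision range can be checked from Python, assuming (as is the case on common platforms) that Python's `float` is a 64-bit double. The sketch prints the limits reported by `sys.float_info` and then steps just outside them.

```python
import sys

info = sys.float_info
print(info.max)   # largest finite double, about 1.8 × 10^308
print(info.min)   # smallest normalised positive double, about 2.2 × 10^-308

print(info.max * 10)      # overflow: the result is inf
print(info.min / 2**60)   # underflow: the result is 0.0
```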
These formats are called the IEEE 754 Floating-Point Standard.
Since the mantissa is always 1.xxxxxxxxx in the normalised form, no need to represent the leading 1.
So, effectively:
Single Precision: mantissa ===> 1 bit + 23 bits
Double Precision: mantissa ===> 1 bit + 52 bits
Since zero (0.0) has no leading 1, it is given a reserved bit pattern, with all 0s in the exponent
field, so that the hardware won't attach a leading 1 to it. Thus:
• Zero (0.0) = 0000...0000
• Other numbers = (-1)^S × (1 + Mantissa) × 2^E
If we number the mantissa bits from left to right m1, m2, m3, ...
mantissa = m1 × 2^-1 + m2 × 2^-2 + m3 × 2^-3 + ...
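
The weighted sum of mantissa bits can be sketched in a few lines of Python. The function name `mantissa_value` and the use of a string of 0s and 1s as input are assumptions made for illustration.

```python
def mantissa_value(bits):
    """Value of the stored fraction bits m1 m2 m3 ..., given as a string of 0s and 1s."""
    return sum(int(b) * 2.0 ** -i for i, b in enumerate(bits, start=1))

print(mantissa_value("110"))       # 0.75  = 1 × 2^-1 + 1 × 2^-2 + 0 × 2^-3
print(1 + mantissa_value("110"))   # 1.75, with the leading 1 restored
```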
Negative exponents could pose a problem in comparisons.
For example (with two's complement exponents):
Value        Sign  Exponent  Mantissa
1.0 × 2^-1   0     11111111  0000000 00000000 00000000
1.0 × 2^+1   0     00000001  0000000 00000000 00000000
With this representation, the first exponent shows a "larger" binary number even though the value
is smaller, making direct comparison more difficult.
To avoid this, Biased Notation is used for exponents.
If the real exponent of a number is X then it is represented as (X + bias)
IEEE single-precision uses a bias of 127. Therefore, an exponent of
-1 is represented as -1 + 127 = 126 = 01111110₂
0 is represented as 0 + 127 = 127 = 01111111₂
+1 is represented as +1 + 127 = 128 = 10000000₂
+5 is represented as +5 + 127 = 132 = 10000100₂
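
The same table can be reproduced with a short Python sketch. The helper name `biased_field` is an assumption for this example; the final check illustrates why the bias helps, since the stored patterns sort in the same order as the real exponents.

```python
BIAS = 127  # single-precision bias

def biased_field(exponent):
    """8-bit pattern stored in the Exponent field for a real exponent."""
    return format(exponent + BIAS, "08b")

for exponent in (-1, 0, 1, 5):
    print(exponent, biased_field(exponent))

# Unlike two's complement, biased patterns compare in the same order as the exponents.
fields = [biased_field(e) for e in (-1, 0, 1, 5)]
print(fields == sorted(fields))   # True
```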
So the actual exponent is found by subtracting the bias from the stored exponent. Therefore, given
S, E, and M fields, an IEEE floating-point number has the value:
(-1)^S × (1.0 + 0.M) × 2^(E - bias)
(Remember: it is (1.0 + 0.M) because, with normalised form, only the fractional part of the mantissa
needs to be stored)
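
A minimal Python sketch of this formula is given below. It uses the standard `struct` module to obtain the 32 bits of a single-precision number, splits them into the S, E and M fields, and rebuilds the value. The function names `decode_single` and `value_of` are assumptions for this example, and only normalised numbers (stored exponent 1 to 254) are handled.

```python
import struct

BIAS = 127

def decode_single(x):
    """Return the (sign, stored exponent, mantissa) fields of x as a 32-bit float."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                 # bit 31
    exponent = (bits >> 23) & 0xFF    # bits 23-30
    mantissa = bits & 0x7FFFFF        # bits 0-22
    return sign, exponent, mantissa

def value_of(sign, exponent, mantissa):
    """(-1)^S × (1.0 + 0.M) × 2^(E - bias), for normalised numbers only."""
    fraction = mantissa / 2**23       # the stored bits read as 0.M
    return (-1) ** sign * (1.0 + fraction) * 2.0 ** (exponent - BIAS)

s, e, m = decode_single(-0.4375)
print(s, format(e, "08b"), format(m, "023b"))
# 1 01111101 11000000000000000000000
print(value_of(s, e, m))              # -0.4375
```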

Floating Point Addition

Add the following two decimal numbers in scientific notation:
8.70 × 10^-1 with 9.95 × 10^1
1. Rewrite the smaller number so that its exponent matches the exponent of the larger
number.
8.70 × 10^-1 = 0.087 × 10^1
2. Add the mantissas
9.95 + 0.087 = 10.037, so the sum is 10.037 × 10^1
3. Put the result in Normalised Form
10.037 × 10^1 = 1.0037 × 10^2 (shift mantissa, adjust exponent)
Check for overflow/underflow of the exponent after normalisation
4. Round the result
If the mantissa does not fit in the space reserved for it, it has to be rounded off.
For example, if only 4 digits are allowed for the mantissa:
1.0037 × 10^2 ===> 1.004 × 10^2
(A hidden bit exists only in binary floating-point numbers)
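
The four steps above can be sketched with Python's standard `decimal` module. The function name `add_decimal`, the (mantissa, exponent) argument layout and the round-half-up rule are assumptions made for this illustration; the exponent range check is omitted.

```python
from decimal import Decimal, ROUND_HALF_UP

def add_decimal(m1, e1, m2, e2, digits=4):
    """Add m1 × 10^e1 and m2 × 10^e2, keeping `digits` significant digits."""
    # Step 1: rewrite the number with the smaller exponent to match the larger one
    if e1 < e2:
        m1, e1, m2, e2 = m2, e2, m1, e1
    m2 = m2.scaleb(e2 - e1)
    # Step 2: add the mantissas
    m, e = m1 + m2, e1
    # Step 3: normalise (shift mantissa, adjust exponent)
    while abs(m) >= 10:
        m, e = m.scaleb(-1), e + 1
    while m != 0 and abs(m) < 1:
        m, e = m.scaleb(1), e - 1
    # Step 4: round to the available digits
    step = Decimal(1).scaleb(-(digits - 1))
    m = m.quantize(step, rounding=ROUND_HALF_UP)
    if abs(m) >= 10:   # rounding carried into a new digit, e.g. 9.9996 -> 10.000
        m, e = m.scaleb(-1).quantize(step), e + 1
    return m, e

print(add_decimal(Decimal("8.70"), -1, Decimal("9.95"), 1))
# (Decimal('1.004'), 2)
```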

Example addition in binary:

Perform 0.5 + (-0.4375)
0.5 = 0.1 × 2^0 = 1.000 × 2^-1 (normalised)
-0.4375 = -0.0111 × 2^0 = -1.110 × 2^-2 (normalised)
1. Rewrite the smaller number so that its exponent matches the exponent of the larger
number.
-1.110 × 2^-2 = -0.1110 × 2^-1
2. Add the mantissas:
1.000 × 2^-1 + (-0.1110 × 2^-1) = 0.001 × 2^-1
3. Normalise the sum, checking for overflow/underflow:
0.001 × 2^-1 = 1.000 × 2^-4
-126 <= -4 <= 127 ===> No overflow or underflow
4. Round the sum:
The sum fits in 4 bits so rounding is not required
Check: 1.000 × 2^-4 = 0.0625, which is equal to 0.5 - 0.4375
Correct!
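
The binary example can be followed step by step in Python. This is only a sketch: mantissas are held as exact `fractions.Fraction` values rather than bit fields, the function name `add_binary` is hypothetical, and rounding uses Python's built-in `round`.

```python
from fractions import Fraction

def add_binary(m1, e1, m2, e2, frac_bits=3):
    """Add m1 × 2^e1 and m2 × 2^e2; mantissas are signed Fractions with 1 <= |m| < 2."""
    # Step 1: rewrite the number with the smaller exponent to match the larger one
    if e1 < e2:
        m1, e1, m2, e2 = m2, e2, m1, e1
    m2 = m2 / 2 ** (e1 - e2)
    # Step 2: add the mantissas
    m, e = m1 + m2, e1
    if m == 0:
        return Fraction(0), 0
    # Step 3: normalise the sum
    while abs(m) >= 2:
        m, e = m / 2, e + 1
    while abs(m) < 1:
        m, e = m * 2, e - 1
    # Step 4: round to frac_bits bits after the binary point
    scale = 2 ** frac_bits
    m = Fraction(round(m * scale), scale)
    if abs(m) >= 2:    # rounding carried out of the mantissa
        m, e = m / 2, e + 1
    # Single-precision exponent range check
    if not -126 <= e <= 127:
        raise OverflowError("exponent out of range")
    return m, e

# 0.5 + (-0.4375): 1.000 × 2^-1 plus -1.110 × 2^-2 (1.110 in binary is 7/4)
m, e = add_binary(Fraction(1), -1, Fraction(-7, 4), -2)
print(m, e)                  # 1 -4
print(float(m) * 2.0 ** e)   # 0.0625
```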

Floating Point Multiplication

Multiply the following two numbers in scientific notation by hand:
(1.110 × 10^10) × (9.200 × 10^-5)
1. Add the exponents to find
New Exponent = 10 + (-5) = 5
If we add biased exponents, the bias will be added twice. Therefore we need to subtract it once
to compensate:
(10 + 127) + (-5 + 127) = 259
259 - 127 = 132 which is (5 + 127) = biased new exponent
2. Multiply the mantissas
1.110 × 9.200 = 10.212000
Can only keep three digits to the right of the decimal point, so the result is
10.212 × 10^5
3. Normalise the result
1.0212 × 10^6
4. Round it
1.021 × 10^6
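
A possible Python sketch of these steps, again using the standard `decimal` module, is shown below. The function name `multiply_decimal` and the round-half-up rule are assumptions for this example; the exponents are added in biased form to show the bias being subtracted once.

```python
from decimal import Decimal, ROUND_HALF_UP

BIAS = 127

def multiply_decimal(m1, e1, m2, e2, digits=4):
    """Multiply m1 × 10^e1 by m2 × 10^e2, keeping `digits` significant digits."""
    # Step 1: add the biased exponents, subtracting the bias once
    biased = (e1 + BIAS) + (e2 + BIAS) - BIAS
    # Step 2: multiply the mantissas
    m = m1 * m2
    # Step 3: normalise (the product of two mantissas in [1, 10) lies in [1, 100))
    if abs(m) >= 10:
        m, biased = m.scaleb(-1), biased + 1
    # Step 4: round to the available digits
    step = Decimal(1).scaleb(-(digits - 1))
    m = m.quantize(step, rounding=ROUND_HALF_UP)
    if abs(m) >= 10:   # rounding carried into a new digit
        m, biased = m.scaleb(-1).quantize(step), biased + 1
    return m, biased - BIAS

print(multiply_decimal(Decimal("1.110"), 10, Decimal("9.200"), -5))
# (Decimal('1.021'), 6)
```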

Example multiplication in binary:

(1.000 × 2^-1) × (-1.110 × 2^-2)
1. Add the biased exponents
(-1 + 127) + (-2 + 127) - 127 = 124 ===> (-3 + 127)
2. Multiply the mantissas
      1.000
    × 1.110
    -------
       0000
      1000
     1000
  + 1000
    -------
    1110000 ===> 1.110000

The product is 1.110000 × 2^-3

Need to keep it to 4 bits: 1.110 × 2^-3
3. Normalise (already normalised)
At this step check for overflow/underflow by making sure that
-126 <= Exponent <= 127
1 <= Biased Exponent <= 254
4. Round the result (no change)
5. Adjust the sign.
Since the original signs are different, the result will be negative
-1.110 × 2^-3
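
The five steps can be sketched in Python using integer significands, so that 1.110 is held as the integer 0b1110 with three fraction bits. The function name `multiply_binary` and the round-half-up rule are assumptions for this illustration; IEEE 754 hardware uses its own rounding modes.

```python
BIAS = 127
FRAC_BITS = 3  # bits to the right of the binary point, as in the example

def multiply_binary(s1, sig1, e1, s2, sig2, e2):
    """Multiply two numbers given as (sign bit, integer significand, real exponent)."""
    # Step 1: add the biased exponents, subtracting the bias once
    biased = (e1 + BIAS) + (e2 + BIAS) - BIAS
    # Step 2: multiply the significands (the product has 2 * FRAC_BITS fraction bits)
    product = sig1 * sig2
    # Step 3: normalise; the product of two values in [1, 2) lies in [1, 4)
    shift = FRAC_BITS
    if product >> (2 * FRAC_BITS + 1):
        shift, biased = shift + 1, biased + 1
    # Step 4: round back to FRAC_BITS fraction bits (round half up)
    sig = (product + (1 << (shift - 1))) >> shift
    if sig >> (FRAC_BITS + 1):   # rounding carried out of the significand
        sig, biased = sig >> 1, biased + 1
    if not 1 <= biased <= 254:
        raise OverflowError("exponent out of range")
    # Step 5: the result is negative when the signs differ
    sign = s1 ^ s2
    return sign, sig, biased - BIAS

sign, sig, exp = multiply_binary(0, 0b1000, -1, 1, 0b1110, -2)
print(sign, format(sig, "04b"), exp)   # 1 1110 -3, i.e. -1.110 × 2^-3
```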
