L1 FloatingPointNumbers Intro

The document discusses floating point number representation and formats. Floating point numbers use a sign bit, exponent field, and significand to represent both very large and small numbers. The IEEE 754 standard defines common floating point formats like single and double precision that use 32 and 64 bits respectively. Special values like infinity, NaN, and denormal numbers are also discussed.

Uploaded by

gauravsri_542631997

Floating-Point Numbers

Slides adapted by: Sparsh Mittal

Floating Point
• Representation for non-integral numbers
  ◦ Including very small and very large numbers
• Like scientific notation
  ◦ $-2.34 \times 10^{56}$ (normalized)
  ◦ $+0.002 \times 10^{-4}$ (not normalized)
  ◦ $+987.02 \times 10^{9}$ (not normalized)
• In binary
  ◦ $\pm 1.xxxxxxx_2 \times 2^{yyyy}$
• Types float and double in C

Floating Point Standard
• Defined by IEEE Standard 754-1985
• Developed in response to divergence of representations
  ◦ Portability issues for scientific code
• Now almost universally adopted
• Most common representations
  ◦ Half precision (16-bit)
  ◦ Single precision (32-bit): float in C
  ◦ Double precision (64-bit): double in C

IEEE Floating-Point Format
Field layout (most significant bit first):
  S (1 bit) | Exponent (single: 8 bits, double: 11 bits) | Fraction (single: 23 bits, double: 52 bits)

$x = (-1)^{S} \times (1 + \text{Fraction}) \times 2^{(\text{Exponent} - \text{Bias})}$

• S: sign bit (0 ⇒ non-negative, 1 ⇒ negative)
• Normalize significand: 1.0 ≤ |significand| < 2.0
  ◦ Always has a leading pre-binary-point 1 bit, so no need to represent it explicitly (hidden bit)
  ◦ Significand is Fraction with the "1." restored
• Exponent: excess representation: actual exponent + Bias
  ◦ Ensures exponent is unsigned
  ◦ Single: Bias = 127; Double: Bias = 1023
Formula for Bias

$\text{Bias} = 2^{(\text{NumberOfExpBits} - 1)} - 1$
Why Is It Called Single/Double Precision
• The precision indicates the number of decimal digits that are correct, that is, free of representation error or approximation. In other words, it indicates how many decimal digits one can safely use.
• The number of decimal digits that can be safely used:
  ◦ Single precision: $\log_{10}(2^{24}) \approx 7.22$, i.e. about 7 to 8 decimal digits (24 = 23 fraction bits plus the hidden bit)
  ◦ Double precision: $\log_{10}(2^{53}) \approx 15.95$, i.e. about 15 to 16 decimal digits (53 = 52 fraction bits plus the hidden bit)

Source: https://stackoverflow.com/a/42444685/984260
Various formats and their correct digits
Precision Type   Total Bits   Sign   Exponent   Significand   Decimal Digits
Half             16           1      5          10            ~3.31
Single           32           1      8          23            ~7.22
Double           64           1      11         52            ~15.95
Quadruple        128          1      15         112           ~34.02
Octuple          256          1      19         236           ~71.34

Infinities and NaNs
• Exponent = 111...1, Fraction = 000...0
  ◦ ±Infinity
• Exponent = 111...1, Fraction ≠ 000...0
  ◦ Not-a-Number (NaN)
  ◦ Indicates illegal or undefined result
  ◦ For example, 0.0 / 0.0

Special FP Numbers
E     M     Value
255   0     +∞ if S = 0
255   0     -∞ if S = 1
255   ≠ 0   NaN (Not a Number)
0     0     0
0     ≠ 0   Denormal number

E: Exponent, M: Mantissa. This table is for FP32 numbers.

• NaN + x = NaN
• 0/0 = NaN
• 1/0 = ∞
• -1/0 = -∞
• $\sin^{-1}(5)$ = NaN
Denormal Numbers on Number Line

[Figure: number line with regions labeled "Denormal numbers" and "Normal numbers"]

• In decimal, $7 \times 10^{5}$ is considered a normalized representation, but $0.7 \times 10^{6}$ is not normalized.
• Similarly, in binary, $1.1_2 \times 2^{5}$ is considered normalized, but $0.11_2 \times 2^{6}$ is not normalized; it is said to be "denormal".
Denormal Numbers
• Exponent = 000...0 ⇒ hidden bit is 0

$x = (-1)^{S} \times (0 + \text{Fraction}) \times 2^{-126}$ (for FP32)

• Smaller than normal numbers
  ◦ Allow for gradual underflow, with diminishing precision

NOTE: For denormal numbers, the exponent is NOT 0 - Bias, but 1 - Bias. The bias is 127, so we get 1 - 127 = -126.

Denormal Numbers

• Smallest positive normal number: $2^{-126}$
• Largest denormal number: $0.11\ldots11_2 \times 2^{-126} = (1 - 2^{-23}) \times 2^{-126} = 2^{-126} - 2^{-149}$

Summary representation

Note: First check whether a number belongs to one of the special cases (0, infinity, NaN, denormal). If it does not, it is treated as a normal number.
Fixed-point format
• FxP (fixed point) has a specific number of bits (or digits) reserved for the integer and fractional parts, regardless of how large or small the number is. For example:
  ◦ With the IIIII.FFFFF format, we can represent numbers in the range [00000.00000, 11111.11111] (binary)
• FP (floating point): the number of bits for the integer and fractional parts is not fixed. Instead, a certain number of bits is reserved for the significand and for the exponent.
• Int is similar to FxP, except that Int has no fractional part.
• Sometimes, Int and FxP are used synonymously.
Further Study
• https://blog.demofox.org/2017/11/21/floating-point-precision/
• https://www.h-schmidt.net/FloatConverter/IEEE754.html
• https://stackoverflow.com/questions/4220417/print-binary-representation-of-a-float-number-in-c
• https://moocaholic.medium.com/fp64-fp32-fp16-bfloat16-tf32-and-other-members-of-the-zoo-a1ca7897d407
• https://www.ibm.com/support/pages/single-precision-floating-point-accuracy
• http://www.mimirgames.com/articles/programming/digits-of-pi-needed-for-floating-point-numbers/
