Fixed and Floating Point Representation

Fixed-point representation reserves a specific number of bits for the integer and fractional parts. Floating-point representation reserves bits for the significand (mantissa) and the exponent. The exponent specifies where the radix point lies. Floating point allows a wider range of values to be represented and is preferred over fixed point for numerical analysis. Special values such as 0, infinity, and NaN are represented by specific bit patterns in the exponent and mantissa fields, as defined by the IEEE 754 standard.

Uploaded by

jyotiranjan

Fixed and Floating Point Representation

Fixed-Point Representation −

This representation reserves a fixed number of bits for the integer part and a fixed number for the fractional part. For example, in a decimal fixed-point format IIII.FFFF, the smallest non-zero value that can be stored is 0000.0001 and the largest is 9999.9999. A fixed-point number representation has three parts: the sign field, the integer field, and the fractional field.

We can represent these numbers using:

• Sign-magnitude representation: range from $-(2^{k-1}-1)$ to $2^{k-1}-1$, for k bits.
• 1's complement representation: range from $-(2^{k-1}-1)$ to $2^{k-1}-1$, for k bits.
• 2's complement representation: range from $-2^{k-1}$ to $2^{k-1}-1$, for k bits.

2's complement representation is preferred in computer systems because it is unambiguous (zero has a single representation) and makes arithmetic operations easier.
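The three ranges above can be checked with a short Python sketch. The function name `representable_range` and the scheme labels are illustrative choices, not part of the original notes.

```python
def representable_range(k: int, scheme: str) -> tuple:
    """Return (min, max) of a k-bit signed integer under the given scheme."""
    if k < 2:
        raise ValueError("need at least 2 bits (a sign bit plus one magnitude bit)")
    top = 2 ** (k - 1)
    if scheme in ("sign-magnitude", "ones-complement"):
        # Both schemes have +0 and -0, so one bit pattern is "lost" to the second zero.
        return -(top - 1), top - 1
    if scheme == "twos-complement":
        # A single zero leaves room for one extra negative value.
        return -top, top - 1
    raise ValueError(f"unknown scheme: {scheme}")


for scheme in ("sign-magnitude", "ones-complement", "twos-complement"):
    print(scheme, representable_range(8, scheme))
```

For k = 8 this prints (-127, 127) for the first two schemes and (-128, 127) for 2's complement.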

Example − Assume a number uses a 32-bit format that reserves 1 bit for the sign, 15 bits for the integer part, and 16 bits for the fractional part.

Then −43.625 is represented as follows:

1 | 000000000101011 | 1010000000000000

Here the sign bit uses 0 for + and 1 for −; 000000000101011 is the 15-bit binary value of decimal 43, and 1010000000000000 is the 16-bit binary value of the fraction 0.625.
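As a minimal sketch of this encoding, the Python function below splits a value into the sign, 15-bit integer, and 16-bit fractional fields of the example format. The name `encode_fixed` and the choice to reject values that are not exactly representable are assumptions made for illustration.

```python
from fractions import Fraction

INT_BITS = 15   # integer field width in the example format
FRAC_BITS = 16  # fractional field width in the example format


def encode_fixed(value) -> str:
    """Return 'sign integer-bits fraction-bits' for a sign-magnitude fixed-point value."""
    x = Fraction(value)                  # exact rational value of the input
    sign = "1" if x < 0 else "0"
    scaled = abs(x) * 2 ** FRAC_BITS     # magnitude as a count of 2**-16 steps
    if scaled.denominator != 1:
        raise ValueError("value needs more than 16 fractional bits")
    if scaled.numerator >= 2 ** (INT_BITS + FRAC_BITS):
        raise OverflowError("integer part does not fit in 15 bits")
    bits = format(scaled.numerator, f"0{INT_BITS + FRAC_BITS}b")
    return f"{sign} {bits[:INT_BITS]} {bits[INT_BITS:]}"


print(encode_fixed(-43.625))  # 1 000000000101011 1010000000000000
```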

The advantage of a fixed-point representation is performance; the disadvantage is the relatively limited range of values it can represent. It is therefore usually inadequate for numerical analysis, because it does not offer enough range and accuracy. A number whose representation exceeds 32 bits has to be stored inexactly.

In the 32-bit format given above, the smallest positive number that can be stored is $2^{-16} \approx 0.000015$ and the largest positive number is $(2^{15}-1)+(1-2^{-16}) = 2^{15}-2^{-16} \approx 32768$. The gap between adjacent representable numbers is $2^{-16}$ everywhere.
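These limits can be reproduced with exact rational arithmetic. The sketch below assumes the 1/15/16-bit layout of the example; the variable names are illustrative.

```python
from fractions import Fraction

INT_BITS = 15   # integer field width from the example format
FRAC_BITS = 16  # fractional field width from the example format

# Every magnitude is a whole number of steps of size 2**-FRAC_BITS,
# so the step is both the smallest positive value and the uniform gap.
step = Fraction(1, 2 ** FRAC_BITS)
smallest_positive = step
largest_positive = (2 ** (INT_BITS + FRAC_BITS) - 1) * step  # all 31 magnitude bits set

print(float(smallest_positive))
print(float(largest_positive))
print(largest_positive == 2 ** INT_BITS - step)
```

The last line confirms that the largest value is exactly one step below 32768.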

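The position of the radix point is only a convention: the stored bits can be treated as a single integer field, and moving the implied radix point left or right changes only the scale factor applied to that integer.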

Floating-Point Representation −

This representation does not reserve a specific number of bits for the integer part or the fractional part. Instead, it reserves a certain number of bits for the digits of the number (called the mantissa or significand) and a certain number of bits to say where within those digits the radix point sits (called the exponent).

A floating-point number has two parts. The first part is a signed fixed-point number called the mantissa. The second part designates the position of the decimal (or binary) point and is called the exponent. The fixed-point mantissa may be a fraction or an integer. A floating-point number is always interpreted as representing a number of the form $M \times r^{e}$, where M is the mantissa, r is the radix (base), and e is the exponent.

Only the mantissa M and the exponent e are physically represented in the register (including their signs). A floating-point binary number is represented in the same manner, except that it uses base 2 for the exponent. A floating-point number is said to be normalized if the most significant digit of the mantissa is nonzero, which in binary means it is 1.

So the actual number is $(-1)^{s}\,(1+m)\times 2^{\,e-\mathrm{Bias}}$, where s is the sign bit, m is the mantissa (the stored fraction), e is the stored exponent value, and Bias is the bias number.
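To make the formula concrete, the sketch below pulls s, e, and m out of a 32-bit float with Python's standard `struct` module and rebuilds the value. It assumes the IEEE 754 single-precision layout with a bias of 127, and it applies only to normalized numbers; the helper names and the sample value 6.5 are illustrative.

```python
import struct


def decode_float32(x: float):
    """Return (s, e, m): sign bit, stored exponent, and fraction m in [0, 1)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))  # raw 32-bit pattern
    s = bits >> 31
    e = (bits >> 23) & 0xFF
    m = (bits & 0x7FFFFF) / 2 ** 23
    return s, e, m


def rebuild(s: int, e: int, m: float, bias: int = 127) -> float:
    """Evaluate (-1)^s * (1 + m) * 2^(e - bias) for a normalized number."""
    return (-1) ** s * (1 + m) * 2.0 ** (e - bias)


s, e, m = decode_float32(6.5)      # 6.5 = (1.101)_2 x 2^2
print(s, e, m, rebuild(s, e, m))
```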

Note that signed integers and exponents can be represented in sign-magnitude, one's complement, or two's complement representation.

The floating-point representation is more flexible. Any non-zero number x can be written in the normalized form $x = \pm(1.b_1b_2b_3\ldots)_2 \times 2^{n}$.

Example − Suppose a number uses a 32-bit format: 1 sign bit, 8 bits for a signed exponent, and 23 bits for the fractional part. The leading bit 1 is not stored (as it is always 1 for a normalized number) and is referred to as a "hidden bit".

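Then −53.5 is normalized as $-53.5 = (-110101.1)_2 = (-1.101011)_2 \times 2^{5}$, which is represented as follows:

1 | 00000101 | 10101100000000000000000

Here 1 is the sign bit, 00000101 is the 8-bit binary value of the exponent +5, and 10101100000000000000000 is the 23-bit fraction with the hidden bit dropped.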
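The normalization step can be sketched in Python with `math.frexp`. The function name `normalize` is illustrative, and the fraction is truncated rather than rounded, so the result is exact only when 23 fraction bits suffice. The last two lines contrast the signed exponent field used in this example with the biased field (exponent + 127) that IEEE 754 single precision stores.

```python
import math


def normalize(x: float):
    """Return (sign, 23-bit fraction string, exponent) with |x| = (1.fraction)_2 * 2**exponent."""
    if x == 0 or math.isinf(x) or math.isnan(x):
        raise ValueError("only finite non-zero values can be normalized")
    sign = 1 if x < 0 else 0
    frac, exp = math.frexp(abs(x))     # abs(x) = frac * 2**exp with 0.5 <= frac < 1
    significand = frac * 2             # shift into the range 1 <= significand < 2
    exponent = exp - 1
    fraction = int((significand - 1) * 2 ** 23)  # drop the hidden bit, keep 23 bits
    return sign, format(fraction, "023b"), exponent


sign, fraction, exponent = normalize(-53.5)
print(sign, fraction, exponent)
print(format(exponent & 0xFF, "08b"))  # signed (two's complement) exponent field
print(format(exponent + 127, "08b"))   # biased exponent field as in IEEE 754
```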

Note that the 8-bit exponent field is used to store integer exponents $-126 \le n \le 127$.

The smallest normalized positive number that fits into 32 bits is $(1.00000000000000000000000)_2 \times 2^{-126} = 2^{-126} \approx 1.18 \times 10^{-38}$, and the largest normalized positive number that fits into 32 bits is $(1.11111111111111111111111)_2 \times 2^{127} = (2^{24}-1) \times 2^{104} \approx 3.40 \times 10^{38}$.

[Figure: bit patterns of the smallest and largest normalized numbers]
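Both limits can be verified exactly in Python. The sketch below assumes the IEEE 754 single-precision encoding, where these two values have the bit patterns 0x00800000 and 0x7F7FFFFF; the helper name `float32_from_bits` is illustrative.

```python
import struct
from fractions import Fraction

smallest = Fraction(1, 2 ** 126)                 # (1.000...0)_2 x 2^-126
largest = (2 - Fraction(1, 2 ** 23)) * 2 ** 127  # (1.111...1)_2 x 2^127


def float32_from_bits(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


print(f"{float(smallest):.3e}", f"{float(largest):.3e}")
print(largest == (2 ** 24 - 1) * 2 ** 104)               # the closed form in the text
print(float32_from_bits(0x00800000) == float(smallest))  # exponent field 1, fraction 0
print(float32_from_bits(0x7F7FFFFF) == float(largest))   # exponent field 254, fraction all ones
```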

The precision of a floating-point format is the number of positions reserved for binary digits plus one (for the hidden bit). In the examples considered here the precision is 23 + 1 = 24.

The gap between 1 and the next normalized floating-point number is known as machine epsilon. For the example above the gap is $(1+2^{-23})-1 = 2^{-23}$. This is not the same as the smallest positive floating-point number, because floating-point numbers are non-uniformly spaced, unlike in the fixed-point scenario.
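The non-uniform spacing can be observed directly by stepping to the next representable 32-bit value. This sketch assumes the IEEE 754 single-precision encoding, in which adding 1 to the bit pattern of a positive finite number gives its successor; the helper name `next_float32_after` is illustrative.

```python
import struct


def next_float32_after(x: float) -> float:
    """Return the next larger float32 after a positive finite float32 value x."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return struct.unpack(">f", struct.pack(">I", bits + 1))[0]


epsilon = next_float32_after(1.0) - 1.0                       # gap just above 1
tiny_gap = next_float32_after(2.0 ** -126) - 2.0 ** -126      # gap just above 2^-126

print(epsilon == 2.0 ** -23)
print(tiny_gap == 2.0 ** -149)
```

The gap near 1 is $2^{-23}$, while the gap near the smallest normalized number is $2^{-149}$, so spacing scales with magnitude.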

Note that numbers with a non-terminating binary expansion cannot be represented exactly in floating-point representation; e.g., $1/3 = (0.010101\ldots)_2$ cannot be a floating-point number because its binary representation is non-terminating.
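A short Python sketch shows what is actually stored when 1/3 is rounded to 32 bits, assuming the IEEE 754 single-precision format. `Fraction` recovers the exact rational value of the stored number; the variable names are illustrative.

```python
import struct
from fractions import Fraction

# Round-trip 1/3 through a 32-bit float to get the value that is actually stored.
stored = struct.unpack(">f", struct.pack(">f", 1 / 3))[0]

exact_stored = Fraction(stored)           # exact rational value of the stored bits
error = exact_stored - Fraction(1, 3)     # rounding error, as an exact fraction

print(exact_stored)
print(error)
```

The stored value is a fraction with a power-of-two denominator, so it can only approximate 1/3.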

The IEEE 754 standard specifies two commonly used floating-point formats:

• Single precision (32-bit)
• Double precision (64-bit)

Single Precision Floating Point Representation

The single-precision floating-point representation (also known as FP32 or float32) is a computer number format that uses a floating radix point to express a wide dynamic range of numeric values. The IEEE 754 standard defines binary32 as having the following characteristics:

• 1 bit for the sign
• 8 bits for the exponent
• 24 bits of significand precision (23 explicitly stored)

The structure of the single-precision floating-point representation is as follows:

sign (1 bit, bit 31) | exponent (8 bits, bits 30-23) | fraction (23 bits, bits 22-0)

Exponent calculation

In the IEEE 754 single-precision format, the exponent is encoded using an offset-binary encoding with a zero offset of 127; this offset is known as the exponent bias.

Emin = 01H − 7FH = −126

Emax = FEH − 7FH = 127

Exponent bias = 7FH = 127

Thus, the offset of 127 must be subtracted from the stored exponent to obtain the true exponent.
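The bias arithmetic above amounts to a single subtraction. In the sketch below, the function name `true_exponent` is illustrative, and stored values 00H and FFH are rejected because IEEE 754 reserves them for special values.

```python
BIAS = 0x7F  # exponent bias for single precision (127)


def true_exponent(stored: int) -> int:
    """Convert a stored 8-bit exponent field (01H..FEH) to the true exponent."""
    if not 0x01 <= stored <= 0xFE:
        raise ValueError("00H and FFH are reserved for special values")
    return stored - BIAS


print(true_exponent(0x01))  # Emin
print(true_exponent(0xFE))  # Emax
print(true_exponent(0x7F))  # stored bias itself maps to exponent 0
```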

Double Precision Floating Point Representation

The double-precision floating-point representation (also known as FP64 or float64) is a computer number format that uses a floating radix point to express a wide dynamic range of numeric values. The IEEE 754 standard defines binary64 as having the following characteristics:

• 1 bit for the sign
• 11 bits for the exponent
• 53 bits of significand precision (52 explicitly stored)

The structure of the double-precision floating-point representation is as follows:

sign (1 bit, bit 63) | exponent (11 bits, bits 62-52) | fraction (52 bits, bits 51-0)
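The three double-precision fields can be extracted with shifts and masks. This sketch relies on Python's `float` being an IEEE 754 binary64 value, which is the case on common platforms; the function name `float64_fields` is illustrative, and the exponent bias of 1023 mentioned in the comments comes from the standard rather than from the text above.

```python
import struct


def float64_fields(x: float):
    """Return (sign, stored exponent, fraction) of a 64-bit float as integers."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))  # raw 64-bit pattern
    sign = bits >> 63                      # bit 63
    exponent = (bits >> 52) & 0x7FF        # bits 62-52, biased by 1023
    fraction = bits & ((1 << 52) - 1)      # bits 51-0, hidden bit not stored
    return sign, exponent, fraction


sign, exponent, fraction = float64_fields(-53.5)   # -53.5 = -(1.101011)_2 x 2^5
print(sign, exponent - 1023, format(fraction, "052b"))
```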

Need for Floating Point Representation

A fixed-point representation is not sufficient for representing extremely small or extremely large numbers, because precision is lost. For such values, floating-point representations are needed, in which the binary point is considered to float.

Consider the decimal value $12.34 \times 10^{7}$, which may alternatively be written as $0.1234 \times 10^{9}$, where 0.1234 is the fixed-point mantissa. The other portion is the exponent value, and it shows that the actual position of the decimal point is 9 places to the right of the point written in the mantissa (a negative exponent would move it to the left).
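That the two forms denote the same value can be checked with Python's `decimal` module, where `scaleb(n)` multiplies by $10^{n}$ exactly. The variable names are illustrative.

```python
from decimal import Decimal

a = Decimal("12.34").scaleb(7)    # 12.34 x 10^7
b = Decimal("0.1234").scaleb(9)   # 0.1234 x 10^9

print(a == b)
print(int(a))
```

Moving the point two places to the left in the mantissa is compensated by raising the exponent by 2.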

A floating-point representation is so named because the binary point can be shifted to any place and the exponent value adjusted accordingly. By convention, a normalized form is used, with the point placed to the right of the first nonzero (significant) digit.

Special Value Representation −

The IEEE 754 standard defines some special values through particular combinations of exponent and mantissa bits.

• All exponent bits 0 and all mantissa bits 0 represents 0. If the sign bit is 0 it is +0, otherwise −0.
• All exponent bits 1 and all mantissa bits 0 represents infinity. If the sign bit is 0 it is +∞, otherwise −∞.
• All exponent bits 0 and a non-zero mantissa represents a denormalized number.
• All exponent bits 1 and a non-zero mantissa represents NaN (not a number), which signals an error such as an invalid operation.
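The four rules above translate directly into a classifier over the raw 32-bit pattern. This is a minimal sketch for the single-precision layout; the function name `classify_float32` and the returned labels are illustrative.

```python
def classify_float32(bits: int) -> str:
    """Classify a 32-bit single-precision pattern by its exponent and mantissa fields."""
    sign = "-" if bits >> 31 else "+"
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if exponent == 0x00:
        # All exponent bits 0: zero or a denormalized number.
        return f"{sign}0" if mantissa == 0 else "denormalized"
    if exponent == 0xFF:
        # All exponent bits 1: infinity or NaN.
        return f"{sign}inf" if mantissa == 0 else "NaN"
    return "normalized"


for pattern in (0x00000000, 0x80000000, 0x7F800000, 0xFF800000,
                0x00000001, 0x7FC00000, 0x3F800000):
    print(f"{pattern:08X}", classify_float32(pattern))
```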
