Fixed and Floating Point Representation
Fixed-Point Representation −
This representation reserves a fixed number of digits for the integer part and a fixed number
for the fractional part. For example, if the fixed-point format is IIII.FFFF with decimal
digits, the smallest nonzero value that can be stored is 0000.0001 and the largest is
9999.9999. A fixed-point number representation has three parts: the sign field, the integer
field, and the fractional field.
Example − Assume a number uses a 32-bit format that reserves 1 bit for the sign, 15 bits for
the integer part, and 16 bits for the fractional part.
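The radix point always sits between the integer field and the fractional field, so the
smallest positive value and the gap between adjacent values are both $2^{-16}$. Moving the
radix point left or right means choosing different field widths, which trades range against
resolution.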
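The 1 + 15 + 16 bit layout in the example can be sketched in Python by storing a value as an integer count of $2^{-16}$ units. This is a minimal illustration, not a reference implementation: the names `to_fixed` and `from_fixed` are hypothetical, and a Python signed integer stands in for the separate sign field.

```python
FRAC_BITS = 16            # width of the fractional field in the example
SCALE = 1 << FRAC_BITS    # 2**16 units per 1.0
MAX_RAW = (1 << 31) - 1   # 15 integer bits + 16 fractional bits, all ones

def to_fixed(x: float) -> int:
    """Encode x as a signed count of 2**-16 units (assumed sign + magnitude)."""
    raw = round(x * SCALE)
    if not -MAX_RAW <= raw <= MAX_RAW:
        raise OverflowError("value does not fit in 1 + 15 + 16 bits")
    return raw

def from_fixed(raw: int) -> float:
    """Decode a raw fixed-point integer back to a float."""
    return raw / SCALE

smallest = from_fixed(1)        # smallest positive value, 2**-16
largest = from_fixed(MAX_RAW)   # largest value, 2**15 - 2**-16
```

Adjacent raw integers always differ by one unit, so the spacing between representable values is the same everywhere in the range.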
Floating-Point Representation −
This representation does not reserve a specific number of bits for the integer part or the
fractional part. Instead, it reserves a certain number of bits for the digits of the number
(called the mantissa or significand) and a certain number of bits to say where within those
digits the radix point sits (called the exponent).
A floating-point number has two parts. The first part is a signed fixed-point number called
the mantissa. The second part designates the position of the decimal (or binary) point and is
called the exponent. The fixed-point mantissa may be a fraction or an integer. A
floating-point number is always interpreted as representing a number of the form
$m \times r^e$, where $r$ is the radix (base).
Only the mantissa $m$ and the exponent $e$ are physically represented in the register
(including their signs); the radix is implied. A floating-point binary number is represented
in the same manner, except that it uses base 2 for the exponent. A binary floating-point
number is said to be normalized if the most significant digit of the mantissa is 1.
So, the actual number is $(-1)^s \times (1 + m) \times 2^{e - \text{Bias}}$, where $s$ is the
sign bit, $m$ is the mantissa, $e$ is the exponent value, and Bias is the bias number.
Note that signed integers and exponents are represented in sign-magnitude representation,
one’s complement representation, or two’s complement representation.
The floating-point representation is more flexible. Any nonzero number $x$ can be written in
the normalized form $\pm(1.b_1 b_2 b_3 \ldots)_2 \times 2^n$.
Example − Suppose a number uses a 32-bit format: 1 sign bit, 8 bits for the signed
exponent, and 23 bits for the fractional part. The leading bit 1 is not stored (as it is always 1
for a normalized number) and is referred to as a “hidden bit”.
Note that the 8-bit exponent field is used to store integer exponents -126 ≤ n ≤ 127.
The precision of a floating-point format is the number of positions reserved for binary digits
plus one (for the hidden bit). In the examples considered here the precision is 23+1=24.
The gap between 1 and the next normalized floating-point number is known as machine
epsilon. For the example above, the gap is $(1 + 2^{-23}) - 1 = 2^{-23}$. This is not the same
as the smallest positive floating-point number (the smallest positive normalized value in this
format is $2^{-126}$), because floating-point numbers are non-uniformly spaced, unlike in the
fixed-point scenario.
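The gap just above 1 can be checked by stepping to the next 32-bit bit pattern. The sketch below assumes the 1 + 8 + 23 bit layout from the example and uses Python's `struct` module to reinterpret bits; `next_float32_up` is a hypothetical helper that is only valid for positive, finite inputs.

```python
import struct

def next_float32_up(x: float) -> float:
    """Return the next larger 32-bit float after a positive, finite x."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits + 1))[0]

eps32 = next_float32_up(1.0) - 1.0              # gap just above 1: 2**-23
gap_at_1024 = next_float32_up(1024.0) - 1024.0  # gap just above 1024: 2**-13
```

The gap at 1024 is $2^{10}$ times larger than the gap at 1, which illustrates the non-uniform spacing.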
Note that numbers with non-terminating binary expansions cannot be represented exactly in
floating-point representation; e.g., $1/3 = (0.010101\ldots)_2$ cannot be a floating-point
number because its binary representation is non-terminating.
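The rounding of 1/3 can be observed with exact rational arithmetic. This sketch relies on Python's `float` being an IEEE 754 double (true on common platforms, but an assumption here) and on `fractions.Fraction` converting a float to its exact stored value.

```python
from fractions import Fraction

stored = Fraction(1 / 3)          # exact value of the float closest to 1/3
error = Fraction(1, 3) - stored   # the part that rounding discarded

# The stored value is a ratio with a power-of-two denominator,
# so it can never equal 1/3 exactly.
denominator_is_power_of_two = stored.denominator & (stored.denominator - 1) == 0

# A value with a terminating binary expansion, such as 1/2, is stored exactly.
half_is_exact = Fraction(0.5) == Fraction(1, 2)
```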
The IEEE 754 standard specifies two widely used floating-point formats:
Single precision (32-bit)
Double precision (64-bit)
In the IEEE 754 single-precision floating-point representation, the exponent is encoded
using an offset-binary encoding, with the zero offset being 127; this is known as the
exponent bias.
Thus, the offset of 127 must be removed from the recorded exponent to obtain the real
exponent as described by the offset-binary representation.
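As a rough illustration of removing the bias, the following sketch splits a single-precision value into its sign, stored exponent, and fraction, then rebuilds it with the formula $(-1)^s \times (1 + m) \times 2^{e - 127}$. The helper names `decode_float32` and `rebuild` are hypothetical, and the reconstruction only holds for normalized numbers (stored exponent between 1 and 254).

```python
import struct

def decode_float32(x: float):
    """Split x, rounded to single precision, into (sign, stored exponent, fraction)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    s = bits >> 31                   # 1 sign bit
    e = (bits >> 23) & 0xFF          # 8-bit stored (biased) exponent
    m = (bits & 0x7FFFFF) / 2**23    # 23-bit fraction as a value in [0, 1)
    return s, e, m

def rebuild(s: int, e: int, m: float, bias: int = 127) -> float:
    """Apply (-1)**s * (1 + m) * 2**(e - bias); normalized numbers only."""
    return (-1) ** s * (1 + m) * 2.0 ** (e - bias)

s, e, m = decode_float32(-6.25)   # -6.25 = -1.5625 * 2**2
real_exponent = e - 127           # stored 129, real exponent 2
value = rebuild(s, e, m)
```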
The double-precision floating-point representation (also known as FP64 or float64) is a
computer number format that uses a floating radix point to express a wide dynamic range of
numeric values. The IEEE 754 standard defines binary64 as having the following
characteristics:
Sign bit: 1 bit
Exponent: 11 bits (bias 1023)
Significand precision: 53 bits (52 explicitly stored)
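For illustration, the three binary64 fields (1 sign bit, 11 exponent bits, 52 stored fraction bits) can be pulled apart in the same way as in the single-precision case. This is a sketch in which `fields64` is a hypothetical helper and Python's `float` is assumed to be an IEEE 754 double.

```python
import struct

def fields64(x: float):
    """Return (sign, stored exponent, fraction bits) of a binary64 value."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63                   # 1 sign bit
    exponent = (bits >> 52) & 0x7FF     # 11-bit stored exponent, bias 1023
    fraction = bits & ((1 << 52) - 1)   # 52 stored fraction bits
    return sign, exponent, fraction

one = fields64(1.0)   # stored exponent equals the bias, fraction is zero

# With 53 bits of precision, 2**53 + 1 is not representable and rounds to 2**53.
rounds_away = float(2**53) + 1.0 == float(2**53)
```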
Consider the decimal value $12.34 \times 10^7$, which may alternatively be written as
$0.1234 \times 10^9$, where 0.1234 is the fixed-point mantissa. The other portion is the
exponent value, and it shows that the actual position of the decimal point is 9 places to the
right of the point shown in the mantissa.
A floating-point representation is so named because the radix point can be shifted to any
place and the exponent value modified accordingly. By convention, you should use a
normalized form, with the radix point to the right of the first nonzero (significant) digit.
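Shifting the point and adjusting the exponent can be tried out in base 2 with Python's `math.frexp`, which returns a fractional mantissa in [0.5, 1), the binary analogue of writing 0.1234 with a decimal exponent. The sketch below converts that to the convention with one nonzero digit before the point; `normalize` is a hypothetical helper and assumes a nonzero, finite input.

```python
import math

def normalize(x: float):
    """Return (significand, exponent) with 1 <= |significand| < 2."""
    m, e = math.frexp(x)   # x == m * 2**e with 0.5 <= |m| < 1
    return m * 2, e - 1    # shift the point one place, adjust the exponent

sig, exp = normalize(12.0)        # 12 = 0.75 * 2**4 = 1.5 * 2**3
restored = math.ldexp(sig, exp)   # sig * 2**exp gives back 12.0
```

Both forms denote the same value; only the position of the point and the exponent change.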
Special Value Representation −
Some special values are defined in the IEEE 754 standard, depending on the values of the
exponent and mantissa.
All exponent bits 0 with all mantissa bits 0 represents 0. If the sign bit is 0, then +0,
else -0.
All exponent bits 1 with all mantissa bits 0 represents infinity. If the sign bit is 0, then
+∞, else -∞.
All exponent bits 0 with non-zero mantissa bits represents a denormalized number.
All exponent bits 1 with non-zero mantissa bits represents NaN (Not a Number), the result of
an invalid operation.
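The cases above can be sketched as a small classifier over single-precision bit patterns. The function names and returned labels are hypothetical, chosen for illustration; the all-ones exponent with a non-zero mantissa is labeled NaN (Not a Number), which is the IEEE 754 term for that case.

```python
import struct

def bits_of(x: float) -> int:
    """Return the 32-bit pattern of x rounded to single precision."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def classify32(bits: int) -> str:
    """Label a 32-bit pattern according to its exponent and mantissa fields."""
    sign = "-" if bits >> 31 else "+"
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0:
        return sign + "0" if fraction == 0 else "denormalized"
    if exponent == 0xFF:
        return sign + "inf" if fraction == 0 else "NaN"
    return "normalized"

labels = [classify32(b) for b in (0x00000000, 0x80000000, 0x7F800000,
                                  0x00000001, 0x7FC00000, 0x3F800000)]
```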