Fixed and Floating Point Representations

The document explains fixed and floating point representations of fractions in binary. Fixed point uses a set number of bits for integer and fractional parts, leading to limited range and precision, while floating point allows for representation of very large numbers using a mantissa and exponent, similar to scientific notation. The precision of floating point representation is determined by the number of digits in the mantissa, with normalization ensuring the most significant bit is utilized effectively.

Uploaded by

aub.tho

Fractions can be represented in decimal by using a decimal point with numbers to the
right of the point which represent decreasing powers of 10.

For example, the number 0.125 = 1 tenth + 2 hundredths + 5 thousandths:

0.125 = 1/10 + 2/100 + 5/1000

Fractions can be represented in binary using either fixed point or floating point representations.

Fixed Point

In this system numbers are represented using a fixed number of bits for the integer part and a
fixed number of bits for the fractional part. For example, in a 16-bit number 12 bits can be used
for the integer part and 4 bits for the fractional part. The integer part of a decimal number can
be converted to binary using the process previously discussed. The fractional part of the number
can be found by multiplying the fraction by 2, taking the whole number part of the result as the
next bit, and continuing to multiply the remaining fractional part until it becomes 0 or the
available fractional bits are used up.

For example, the number 87.375₁₀:

Integer Part (87)

Divide by 2    Quotient    Remainder
87 / 2         43          1
43 / 2         21          1
21 / 2         10          1
10 / 2         5           0
5 / 2          2           1
2 / 2          1           0
1 / 2          0           1

87₁₀ = 1010111₂ (reading the remainders from bottom to top)

Fractional Part (.375)


Multiply by 2    Whole    Fraction
.375 x 2         0        .75
.75 x 2          1        .5
.5 x 2           1        0

0.375₁₀ = 0.011₂ (reading the whole number column from top to bottom)

The final number is 0000010101110110

The first twelve bits (000001010111) are the integer part, padded with five leading zeros, while
the last four bits (0110) are the fractional part, padded with one trailing zero.
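
The two conversion procedures above can be sketched in Python as follows. This is a minimal illustration rather than a definitive implementation: the function name to_fixed_point and its parameters are assumptions made for this example, it handles only non-negative numbers, and it simply truncates any fraction that does not fit in the available bits.

```python
def to_fixed_point(value, int_bits=12, frac_bits=4):
    """Return a non-negative number as a fixed point bit string (hypothetical helper)."""
    whole = int(value)
    fraction = value - whole

    # Integer part: divide by 2 repeatedly and collect the remainders.
    int_digits = []
    while whole > 0:
        whole, remainder = divmod(whole, 2)
        int_digits.append(str(remainder))
    # Remainders appear least significant first, so reverse them.
    int_str = "".join(reversed(int_digits)).rjust(int_bits, "0")
    if len(int_str) > int_bits:
        raise ValueError("integer part does not fit in the available bits")

    # Fractional part: multiply by 2 repeatedly and collect the whole parts.
    frac_digits = []
    for _ in range(frac_bits):
        fraction *= 2
        bit = int(fraction)
        frac_digits.append(str(bit))
        fraction -= bit  # any remainder left after the loop is truncated

    return int_str + "".join(frac_digits)


print(to_fixed_point(87.375))  # 0000010101110110
```

The default split of 12 and 4 bits matches the 16-bit example in the text; other splits can be tried by passing different values for int_bits and frac_bits.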

The problem with this system is that only a limited set of numbers can be represented
accurately, depending on the number of bits used for the integer and fractional parts.
The range of numbers that can be represented is determined by the number of bits in the
integer part, while the precision of the number is determined by the number of bits used to
represent the fraction.
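
The trade-off between range and precision can be seen by computing both for a few ways of splitting 16 bits. The sketch below assumes an unsigned format (no sign bit), and the function name fixed_point_limits is made up for this illustration.

```python
def fixed_point_limits(int_bits, frac_bits):
    """Largest value and smallest step of an unsigned fixed point format."""
    step = 2 ** -frac_bits           # value of the least significant bit
    largest = 2 ** int_bits - step   # all bits set to 1
    return largest, step


# Three ways of dividing 16 bits between integer and fractional parts.
for int_bits, frac_bits in [(12, 4), (8, 8), (4, 12)]:
    largest, step = fixed_point_limits(int_bits, frac_bits)
    print(f"{int_bits} + {frac_bits} bits: largest = {largest}, step = {step}")
```

Moving bits from the integer part to the fractional part makes the step smaller (better precision) but lowers the largest value that can be held.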

This becomes even more complicated considering that binary numbers need a lot more
digits to represent fractions than decimal. Each bit in the fractional part represents decreasing
powers of two, from left to right. These are added up to get the number in decimal. For example
the binary fraction 0.011 = 0 x ½ + 1 x ¼ + 1 x ⅛ = ⅜ or 0.375 (as shown above).
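
As a short check of this sum, the following sketch adds up the powers of two for the bits after the binary point. The function name binary_fraction_value is an assumption for this example; Python's standard fractions module is used so that the result stays exact.

```python
from fractions import Fraction


def binary_fraction_value(bits):
    """Value of the bits after the binary point, e.g. '011' for 0.011."""
    total = Fraction(0)
    for position, bit in enumerate(bits, start=1):
        if bit == "1":
            # The bit in position n after the point is worth 1 / 2^n.
            total += Fraction(1, 2 ** position)
    return total


value = binary_fraction_value("011")
print(value)         # 3/8
print(float(value))  # 0.375
```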

Fixed point notation assumes a binary point in a set position as there is no third symbol
available to store it explicitly.

The following table shows some decimal fractions and their binary equivalent:

Binary Fraction    Fraction    Decimal Fraction
0.1                1/2         0.5
0.01               1/4         0.25
0.001              1/8         0.125
0.0001             1/16        0.0625
0.00001            1/32        0.03125
0.000001           1/64        0.015625
0.0000001          1/128       0.0078125
0.00000001         1/256       0.00390625
0.000000001        1/512       0.001953125
0.0000000001       1/1024      0.0009765625

The advantage of fixed point notation is simple arithmetic (the same as integer arithmetic) and
therefore faster processing. The disadvantage is the limited range: increasing the number of
bits after the binary point for precision decreases the range, and vice versa.

Floating Point

Fixed point representations allow the computer to hold fractions, but the range of numbers is still
limited. Even using 4 bytes (32 bits) to hold each number, with 8 bits for the fractional part after
the point and one bit for the sign, the largest number that can be held is just over 8 million
(2²³ = 8,388,608). Another format is needed for holding very large numbers.

In decimal, we can show very large numbers in scientific notation. For example:

1,200,000,000,000 can be written as 0.12 x 10¹³

Here, 0.12 is called the mantissa (or coefficient) and 13 is called the exponent. The mantissa
holds the digits and the exponent defines where to place the decimal point. In the example
above, the point is moved 13 places to the right.

The same technique can be used for binary numbers. For example, two bytes (16 bits) might be
divided into 10 bits for the mantissa and 6 for the exponent.

sign    mantissa     exponent
0       110100000    000011      = 0.1101 x 2³

The sign bit (0) tells us that the number is positive. The mantissa represents 0.1101 and the
exponent tells us to move the point 3 places right, so the number becomes 110.1, which when
converted to decimal is 6.5.

If the exponent is negative (indicated by a 1 in the leftmost bit of the exponent, which is held
in two's complement), the binary point is moved left instead of right. So, for example:

sign    mantissa     exponent
0       100000000    111110

represents a mantissa of 0.1 and an exponent of 111110 (-2), so the whole number represents
0.001, that is, one eighth or 0.125.
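
The two examples above can be decoded with a short sketch like the one below. It assumes the 16-bit layout described in the text (1 sign bit, 9 mantissa bits read as a fraction after the binary point, and a 6-bit two's complement exponent). The function names are made up for this illustration, and treating the sign bit as a simple "negate the value" flag is an assumption, since the text only shows positive numbers.

```python
def twos_complement(bits):
    """Interpret a bit string as a signed two's complement integer."""
    value = int(bits, 2)
    if bits[0] == "1":
        value -= 1 << len(bits)  # leftmost bit set means the value is negative
    return value


def decode(sign, mantissa, exponent):
    """Decode sign, mantissa and exponent bit strings into a Python float."""
    # The mantissa bits are read as 0.<mantissa>.
    fraction = int(mantissa, 2) / (1 << len(mantissa))
    # The exponent says how many places to move the binary point.
    value = fraction * 2.0 ** twos_complement(exponent)
    # Assumption: a sign bit of 1 simply negates the value.
    return -value if sign == "1" else value


print(decode("0", "110100000", "000011"))  # 6.5
print(decode("0", "100000000", "111110"))  # 0.125
```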

The precision of the floating point representation described above depends on the
number of digits stored in the mantissa. Looking once again at the more familiar decimal
system:

the number 34,568,000 can be expressed as 0.34568 x 10⁸, allowing 5 digits for the mantissa,
or as 0.3457 x 10⁸, allowing only 4 digits for the mantissa.

Some accuracy has been sacrificed in the second form.

The number could also be written as 0.034568 x 10⁹, but then we need 6 places in the
mantissa to achieve the same accuracy. In order to achieve the most accurate representation
possible for a given size of mantissa, the number should be written with no leading zeros
between the point and the most significant digit.

In binary, the same principle is used. Thus, using a mantissa of 9 bits plus a sign bit, the number
0.000001001 would be represented in the mantissa as 0.100100000, with an exponent of
111011 (-5).

This is known as the normalised form, and in the case of a positive number, is the form in which
the first bit of the mantissa, not counting the sign bit, is 1.
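
Normalisation of a positive binary fraction can be sketched as below. This is an illustration under stated assumptions, not a definitive implementation: the function name normalise is hypothetical, the input is the string of bits after the binary point of a positive number less than 1, the 9-bit mantissa and 6-bit two's complement exponent follow the example format in the text, and any bits that do not fit in the mantissa are simply dropped.

```python
def normalise(fraction_bits, mantissa_bits=9, exponent_bits=6):
    """Normalise 0.<fraction_bits> and return (mantissa, exponent field, exponent)."""
    first_one = fraction_bits.find("1")
    if first_one == -1:
        raise ValueError("zero has no normalised form")

    # Each leading zero removed moves the point one place right,
    # so the exponent decreases by one for each of them.
    exponent = -first_one
    if exponent < -(1 << (exponent_bits - 1)):
        raise ValueError("exponent does not fit in the exponent field")

    # Keep the bits from the first 1 onwards, padded or cut to the mantissa size.
    mantissa = fraction_bits[first_one:].ljust(mantissa_bits, "0")[:mantissa_bits]

    # Encode the negative exponent in two's complement.
    exponent_field = format(exponent % (1 << exponent_bits), f"0{exponent_bits}b")
    return mantissa, exponent_field, exponent


print(normalise("000001001"))  # ('100100000', '111011', -5)
```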
