3 Fixed and Floating Point DSP

Problems in storing and recalling numbers

Digital computers are very proficient at storing and recalling numbers, but the process is not error free.
For example:
1. Suppose we want to store the number 1.41421356. The computer does its best and stores the closest number it can represent: 1.41421354. In some cases this error is quite insignificant, while in other cases it is disastrous.

2. A classic computational error results from the addition of
two numbers with very different values, for example, 1
and 0.00000001. We would like the answer to be
1.00000001, but the computer replies with 1.
To avoid these errors, we must understand how computers store and manipulate numbers.
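The second error can be reproduced directly in a language such as C, which the text mentions; the fragment below is a minimal sketch (the variable names are purely illustrative) that adds 1 and 0.00000001 as 32-bit floats and prints the result.

#include <stdio.h>

int main(void)
{
    float big   = 1.0f;           /* stored exactly                            */
    float small = 0.00000001f;    /* about 1e-8, far below float precision     */
    float sum   = big + small;    /* the small value is lost in the addition   */

    printf("%.8f\n", sum);        /* prints 1.00000000 rather than 1.00000001  */
    return 0;
}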

These problems arise because a fixed number of bits are allocated to store each number, usually 8, 16, 32 or 64.
For example, consider the case where eight bits are used to store the value of a variable. Since there are 2^8 = 256 possible bit patterns, the variable can only take on 256 different values.
This is a fundamental limitation, and there is nothing we can do about it.

Remedies and constraints
We are free to declare what value each bit pattern represents.
In the simplest cases, the 256 bit patterns might represent the integers from 0 to 255, 1 to 256, -127 to 128, etc.
In a more unusual scheme, the 256 bit patterns might represent 256 exponentially related numbers: 1, 10, 100, 1000, …, 10^254, 10^255.
Everyone accessing the data must understand what value each bit pattern represents. This is usually provided by an algorithm or formula for converting between the represented value and the corresponding bit pattern, and back again.
Many encoding schemes are possible, but two general formats have become common: fixed point (also called integer numbers) and floating point (also called real numbers). They differ in two key properties:
Range: the largest and smallest numbers they can represent.
Precision: the size of the gaps between adjacent numbers.
Fixed Point Representation: Unsigned integer

Fixed point representation is used to store integers, the positive and negative whole numbers: …, -3, -2, -1, 0, 1, 2, 3, …. High level languages, such as C and BASIC, usually allocate 16 bits to store each integer.
In the simplest case, the 2^16 = 65,536 possible bit patterns are assigned to the numbers 0 through 65,535. This is called unsigned integer format.
Conversion between the bit pattern and the number being represented is nothing more than changing between base 2 (binary) and base 10 (decimal). The disadvantage of unsigned integer is that negative numbers cannot be represented.
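As a sketch of this base 2 to base 10 conversion (the bit pattern chosen is only an example), the value of a 16-bit unsigned integer can be rebuilt by summing the weight 2^n of every bit that is set:

#include <stdio.h>

int main(void)
{
    unsigned short pattern = 0xFFFF;     /* all sixteen bits set            */
    unsigned long  value   = 0;

    /* bit n contributes 2^n to the represented number */
    for (int n = 0; n < 16; n++)
        if (pattern & (1u << n))
            value += 1ul << n;

    printf("%lu\n", value);              /* prints 65535, the largest value */
    return 0;
}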
[Figure: fixed point representation, unsigned integer]
Fixed Point Representation: Offset Binary

Offset binary is similar to unsigned integer, but the decimal values are shifted to allow for negative numbers.
In the 4 bit example, the decimal numbers are offset by seven, resulting in the 16 bit patterns corresponding to the integer numbers -7 through 8.
In this same manner, a 16 bit representation would use 32,767 as an offset, resulting in a range between -32,767 and 32,768.
Offset binary is not a standardized format.
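A minimal sketch of offset binary decoding, assuming the 4-bit, offset-of-seven scheme described above: the represented value is simply the unsigned value of the bit pattern minus the offset.

#include <stdio.h>

int main(void)
{
    const int offset = 7;                       /* 4-bit example from the text */

    /* pattern 0000 represents -7, pattern 1111 represents 8 */
    for (unsigned int pattern = 0; pattern <= 15; pattern++)
        printf("pattern %2u represents %3d\n", pattern, (int)pattern - offset);

    return 0;
}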
[Figure: fixed point representation, offset binary]
Fixed Point Representation: Sign and Magnitude
Sign and magnitude is another simple way of
representing negative integers.
The far left bit is called the sign bit, and is made
a zero for positive numbers, and a one for
negative numbers.
The other bits are a standard binary
representation of the absolute value of the
number.
This results in one wasted bit pattern, since there are two representations for zero, 0000 (positive zero) and 1000 (negative zero). This encoding scheme results in 16 bit numbers having a range of -32,767 to 32,767.
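A sketch of sign and magnitude decoding for the 4-bit case (the pattern is only an illustration): bit 3 is the sign bit and bits 2 to 0 hold the magnitude, so both 0000 and 1000 decode to zero.

#include <stdio.h>

int main(void)
{
    unsigned int pattern = 0xD;              /* binary 1101                     */

    int sign      = (pattern >> 3) & 0x1;    /* far left bit: 1 means negative  */
    int magnitude = pattern & 0x7;           /* remaining three bits: 101 = 5   */
    int value     = sign ? -magnitude : magnitude;

    printf("%d\n", value);                   /* prints -5 */
    return 0;
}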
[Figure: fixed point representation, sign and magnitude]
Fixed Point Representation: Two’s Complement
Two's complement is the format most commonly used by
hardware engineers, and is how integers are usually
represented in computers.
The decimal number zero corresponds to the binary pattern
0000. As we count upward, the decimal number is
simply the binary equivalent (0 = 0000, 1 = 0001, 2 = 0010, 3
= 0011, etc.). Now, remember that these four bits are stored
in a register consisting of 4 flip-flops. If we again start at
0000 and begin subtracting, the digital hardware
automatically counts in two's complement: 0 = 0000, -1 =
1111, -2 = 1110, -3 = 1101, etc.
[Figure: fixed point representation, two's complement]
This counting behavior is analogous to a decimal counter: in the forward direction it reads 00000, 00001, 00002, 00003, and so on, while in the backward direction it rolls over from 00000 to 99999, 99998, 99997, etc.
Using 16 bits, two's complement can represent numbers
from -32,768 to 32,767.
The leftmost bit is a 0 if the number is positive or zero, and a 1 if the number is negative. Consequently, the leftmost bit is called the sign bit, just as in sign and magnitude representation.

Converting between decimal and two's complement is
straightforward for positive numbers, a simple decimal to
binary conversion.
For negative numbers, the following algorithm is often
used:
(1) take the absolute value of the decimal number,
(2) convert it to binary,
(3) complement all of the bits (ones become zeros and zeros become ones),
(4) add 1 to the binary number.
For example: -5 → 5 → 0101 → 1010 → 1011.
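The same four steps can be carried out with bit operations; the sketch below (assuming a 16-bit word, hence the 0xFFFF mask) produces the two's complement pattern for -5.

#include <stdio.h>

int main(void)
{
    int decimal = -5;

    unsigned int magnitude = (unsigned int)(-decimal);     /* (1) absolute value        */
                                                            /* (2) it is already binary in the register */
    unsigned int pattern   = (~magnitude + 1u) & 0xFFFFu;  /* (3) complement, (4) add 1 */

    printf("0x%04X\n", pattern);    /* prints 0xFFFB, the 16-bit pattern for -5 */
    return 0;
}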
Floating Point: Real Numbers

The encoding scheme for floating point numbers is more complicated than for fixed point. It is the same scheme used in scientific notation, where a mantissa is multiplied by ten raised to some exponent. For example, 5.4321 × 10^6, where 5.4321 is the mantissa and 6 is the exponent.
The numbers represented in scientific notation are normalized so that there is only a single nonzero digit left of the decimal point. This is achieved by adjusting the exponent as needed.
Floating Point: Single Precision

Floating point representation is similar to scientific notation, except everything is carried out in base two, rather than base ten.
The most common standard is ANSI/IEEE Std. 754-1985, which defines the format for 32-bit numbers called single precision, as well as 64-bit numbers called double precision.
The 32 bits used in single precision are divided into three
separate groups: bits 0 through 22 form the mantissa, bits 23
through 30 form the exponent, and bit 31 is the sign bit.

Bit 31        Bits 30-23      Bits 22-0
Sign bit      Exponent        Mantissa
(1 bit)       (8 bits)        (23 bits)
The equation for converting a bit pattern into a floating point number is given below, where the number is represented by V, S is the value of the sign bit, M is the value of the mantissa, and E is the value of the exponent:

V = (-1)^S × M × 2^(E-127)

The term (-1)^S means that the sign bit, S, is 0 for a positive number and 1 for a negative number.
The variable E is the number between 0 and 255 represented by the eight exponent bits. Subtracting 127 from this number allows the exponent term, 2^(E-127), to run from 2^-127 to 2^128. In other words, the exponent is stored in offset binary with an offset of 127.
The mantissa, M, is formed from the 23 bits as a binary fraction.
The decimal fraction 2.783 is interpreted as 2 + 7/10 + 8/100 + 3/1000. Likewise, the binary fraction 1.0101 means 1 + 0/2 + 1/4 + 0/8 + 1/16.
Floating point numbers have only one nonzero digit left of the decimal point (called a binary point in base 2).
Since the only nonzero digit that exists in base two is 1, the leading digit in the mantissa will always be a 1, and therefore does not need to be stored.
Removing this redundancy allows the number to have an additional one bit of precision. The 23 stored bits, referred to by the notation m22, m21, m20, …, m0, form the mantissa according to:

M = 1.m22 m21 m20 … m0
In other words, M = 1 + m22·2^-1 + m21·2^-2 + m20·2^-3 + …. If bits 0 through 22 are all zeros, M takes on the value of one.
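Putting the pieces together, the sketch below pulls the sign, exponent, and mantissa fields out of a 32-bit pattern and evaluates V = (-1)^S × M × 2^(E-127). The sample pattern is only an illustration, and the special classes described below are ignored.

#include <stdio.h>
#include <math.h>

int main(void)
{
    unsigned long bits = 0x40490FDBul;          /* sample single precision pattern  */

    unsigned int  S = (bits >> 31) & 0x1;       /* bit 31: sign                     */
    unsigned int  E = (bits >> 23) & 0xFF;      /* bits 30-23: exponent, 0 to 255   */
    unsigned long m = bits & 0x7FFFFFul;        /* bits 22-0: mantissa fraction     */

    double M = 1.0 + (double)m / 8388608.0;     /* 1 + m/2^23, with the hidden 1    */
    double V = (S ? -1.0 : 1.0) * M * pow(2.0, (double)E - 127.0);

    printf("%f\n", V);                          /* prints 3.141593 for this pattern */
    return 0;
}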
The largest and smallest numbers allowed in the standard are ±3.4 × 10^38 and ±1.2 × 10^-38, respectively.
The freed bit patterns allow three special classes of numbers:
(1) ±0 is defined as all of the mantissa and exponent bits being
zero.
(2) ±∞ is defined as all of the mantissa bits being zero, and all of
the exponent bits being one.

(3) Denormalized numbers, very small values between approximately ±1.2 × 10^-38 and ±1.4 × 10^-45, obtained by dropping the requirement that the leading digit of the mantissa be a 1.
Besides these special classes, there are bit patterns that are not assigned a meaning, commonly referred to as NANs (Not A Number).

Floating Point: Double Precision
The IEEE standard for double precision simply adds more bits to
the single precision format. Of the 64 bits used to store a double
precision number, bits 0 through 51 are the mantissa, bits 52
through 62 are the exponent, and bit 63 is the sign bit.
As before, the mantissa is between one and just under two, i.e., M = 1 + m51·2^-1 + m50·2^-2 + m49·2^-3 + ….
The 11 exponent bits form a number between 0 and 2047, with an offset of 1023, allowing exponents from 2^-1023 to 2^1024.
The largest and smallest numbers allowed are ±1.8 × 10^308 and ±2.2 × 10^-308, respectively. These are extremely large and small numbers.
Single precision is adequate for most applications, while double precision is adequate for virtually all applications.
