Lesson5.0 Fixed Float (CH 03)
Storage
Objectives
After studying this chapter, the student should be able
to:
List five different data types used in a computer.
Describe how different data is stored inside a computer.
Describe how integers are stored in a computer.
Describe how reals are stored in a computer.
Describe how text is stored in a computer using one of the
various encoding systems.
Describe how audio is stored in a computer using sampling,
quantization and encoding.
Describe how images are stored in a computer using raster
and vector graphics schemes.
Describe how video is stored in a computer as a
representation of images changing in time.
3-1 INTRODUCTION
Data compression
To occupy less memory space, data is normally compressed
before being stored in the computer.
3-2 STORING NUMBERS
Figure 3.4 Fixed point representation of integers
An integer is normally stored in memory using
fixed-point representation.
Unsigned representation
An unsigned integer is an integer that can never be negative
and can take only 0 or positive values. In principle its range
is 0 to positive infinity, but an n-bit memory location can
hold only the values from 0 to 2^n − 1.
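An input device stores an unsigned integer using the
following steps:
1. The integer is changed to binary.
2. If the number of bits is less than n, 0s are added to the left
of the binary integer so that there is a total of n bits.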
Example 3.1
Store 7 in an 8-bit memory location using unsigned
representation.
Solution
First change the integer to binary, (111)2. Add five 0s to make a
total of eight bits, (00000111)2. The integer is stored in the
memory location. Note that the subscript 2 is used to emphasize
that the integer is binary, but the subscript is not stored in the
computer.
Example 3.2
Store 258 in a 16-bit memory location.
Solution
First change the integer to binary (100000010)2. Add seven 0s to
make a total of sixteen bits, (0000000100000010)2. The integer is
stored in the memory location.
Example 3.3
What is returned from an output device when it retrieves the bit
string 00101011 stored in memory as an unsigned integer?
Solution
Using the procedure shown, the binary integer is converted to the
unsigned integer 43.
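The storing and retrieving procedures for unsigned integers can be sketched in Python. This is a minimal illustration, not part of the chapter: the function names store_unsigned and retrieve_unsigned are assumptions, and bit patterns are modelled as strings of 0s and 1s.

```python
def store_unsigned(value, n_bits):
    """Return the n-bit pattern for a non-negative integer (hypothetical helper)."""
    if not 0 <= value <= 2 ** n_bits - 1:
        raise OverflowError(f"{value} does not fit in {n_bits} unsigned bits")
    # Change the integer to binary, then add 0s on the left up to n_bits.
    return format(value, "b").zfill(n_bits)


def retrieve_unsigned(bits):
    """Convert a stored bit string back to the unsigned integer it represents."""
    return int(bits, 2)


print(store_unsigned(7, 8))           # 00000111
print(store_unsigned(258, 16))        # 0000000100000010
print(retrieve_unsigned("00101011"))  # 43
```

The range check mirrors the limit of an n-bit location: a value above 2^n − 1 cannot be stored and is reported as an overflow.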
Sign-and-magnitude representation
In this method, the available range for unsigned integers (0
to 2^n − 1) is divided into two equal sub-ranges. The first half
represents positive integers, the second half, negative
integers.
Example 3.4
Store +28 in an 8-bit memory location using sign-and-magnitude
representation.
Solution
The integer is changed to 7-bit binary. The leftmost bit is set to 0.
The 8-bit number is stored.
Example 3.5
Store −28 in an 8-bit memory location using sign-and-magnitude
representation.
Solution
The integer is changed to 7-bit binary. The leftmost bit is set to 1.
The 8-bit number is stored.
Example 3.6
Retrieve the integer that is stored as 01001101 in sign-and-
magnitude representation.
Solution
Since the leftmost bit is 0, the sign is positive. The rest of the bits
(1001101) are changed to decimal as 77. After adding the sign,
the integer is +77.
Example 3.7
Retrieve the integer that is stored as 10100001 in sign-and-
magnitude representation.
Solution
Since the leftmost bit is 1, the sign is negative. The rest of the
bits (0100001) are changed to decimal as 33. After adding the
sign, the integer is −33.
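The sign-and-magnitude examples can be reproduced with a short Python sketch. The helper names are illustrative assumptions; the leftmost character of the string plays the role of the sign bit and the remaining n − 1 characters hold the magnitude.

```python
def store_sign_magnitude(value, n_bits):
    """Return the n-bit sign-and-magnitude pattern of an integer (hypothetical helper)."""
    magnitude = abs(value)
    if magnitude > 2 ** (n_bits - 1) - 1:
        raise OverflowError(f"{value} does not fit in {n_bits} bits")
    sign_bit = "1" if value < 0 else "0"
    # The magnitude occupies the remaining n_bits - 1 positions.
    return sign_bit + format(magnitude, "b").zfill(n_bits - 1)


def retrieve_sign_magnitude(bits):
    """Read the sign from the leftmost bit and the magnitude from the rest."""
    magnitude = int(bits[1:], 2)
    return -magnitude if bits[0] == "1" else magnitude


print(store_sign_magnitude(28, 8))          # 00011100
print(store_sign_magnitude(-28, 8))         # 10011100
print(retrieve_sign_magnitude("01001101"))  # 77
print(retrieve_sign_magnitude("10100001"))  # -33
```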
Two’s complement representation
Almost all computers use two’s complement representation
to store a signed integer in an n-bit memory location. In this
method, the available range for an unsigned integer of (0 to
2^n − 1) is divided into two equal sub-ranges. The first
sub-range is used to represent nonnegative integers, the second
half to represent negative integers. The bit patterns are then
assigned to negative and nonnegative (zero and positive)
integers as shown in Figure 3.8.
Figure 3.8 Two’s complement representation
In two’s complement representation, the leftmost bit
defines the sign of the integer. If it is 0, the integer is
nonnegative (zero or positive). If it is 1, the integer is negative.
One’s Complementing
Before we discuss this representation further, we need to
introduce two operations. The first is called one’s
complementing or taking the one’s complement of an
integer. The operation can be applied to any integer, positive
or negative. This operation simply reverses (flips) each bit.
A 0-bit is changed to a 1-bit, a 1-bit is changed to a 0-bit.
Example 3.8
The one’s complement of the integer 00110110 is obtained by
flipping every bit:
Original pattern: 00110110
After one’s complement: 11001001
Example 3.9
Applying the one’s complement operation twice returns the
original integer:
Original pattern: 00110110
One’s complement applied once: 11001001
One’s complement applied twice: 00110110
Two’s Complementing
The second operation is called two’s complementing or
taking the two’s complement of an integer in binary. This
operation is done in two steps. First, we copy bits from the
right until a 1 is copied; then, we flip the rest of the bits.
Example 3.10
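To take the two’s complement of the integer 00110100, we copy
the three rightmost bits (100) unchanged and flip the remaining
five bits:
Original pattern: 00110100
After two’s complement: 11001100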
Example 3.11
Applying the two’s complement operation twice always returns
the original integer:
Original pattern: 00110100
Two’s complement applied once: 11001100
Two’s complement applied twice: 00110100
An alternative way to take the two’s complement of
an integer is to first take the one’s complement and
then add 1 to the result.
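Both complement operations, and the two equivalent ways of taking the two’s complement, can be sketched in Python as follows. The function names are assumptions made for illustration, and bit patterns are again modelled as strings.

```python
def ones_complement(bits):
    """Flip every bit: 0 becomes 1 and 1 becomes 0."""
    return "".join("1" if b == "0" else "0" for b in bits)


def twos_complement(bits):
    """Copy bits from the right through the first 1, then flip the rest."""
    last_one = bits.rfind("1")
    if last_one == -1:
        return bits  # all zeros: no 1 is ever copied, nothing is flipped
    return ones_complement(bits[:last_one]) + bits[last_one:]


def twos_complement_by_adding(bits):
    """Alternative method: one's complement plus 1, keeping only n bits."""
    n = len(bits)
    value = (int(ones_complement(bits), 2) + 1) % (2 ** n)
    return format(value, "b").zfill(n)


print(ones_complement("00110110"))            # 11001001
print(twos_complement("00110100"))            # 11001100
print(twos_complement_by_adding("00110100"))  # 11001100
```

Applying either operation twice gives back the starting pattern, which is a quick way to check an implementation.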
Storing an integer in two’s complement format:
• The integer is changed to an n-bit binary.
• If it is positive or zero, it is stored as it is. If it is negative,
the computer takes the two’s complement and then stores it.
Example 3.12
Store the integer 28 in an 8-bit memory location using two’s
complement representation.
Solution
The integer is positive (no sign means positive), so after decimal
to binary transformation no more action is needed. Because 28 is
(11100)2, three extra 0s are added to the left of the integer to
make it eight bits, and (00011100)2 is stored.
Example 3.13
Store −28 in an 8-bit memory location using two’s complement
representation.
Solution
The integer is negative, so after changing to binary, the computer
applies the two’s complement operation on the integer: 28 is
(00011100)2, and its two’s complement, (11100100)2, is stored.
Example 3.14
Retrieve the integer that is stored as 00001101 in memory in
two’s complement format.
Solution
The leftmost bit is 0, so the sign is positive. The integer is
changed to decimal and the sign is added: (00001101)2 is +13.
Example 3.15
Retrieve the integer that is stored as 11100110 in memory using
two’s complement format.
Solution
The leftmost bit is 1, so the integer is negative. The integer needs
to be two’s complemented before changing to decimal: the two’s
complement of 11100110 is 00011010, which is 26, so the integer
is −26.
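Storing and retrieving in two’s complement format can be sketched in Python. The helper names are assumptions. As a shortcut, the sketch uses modular arithmetic instead of flipping bits: for a negative value, adding 2^n gives the same pattern as two’s complementing its magnitude.

```python
def store_twos_complement(value, n_bits):
    """Return the n-bit two's complement pattern of an integer (hypothetical helper)."""
    low, high = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1
    if not low <= value <= high:
        raise OverflowError(f"{value} does not fit in {n_bits} bits")
    # value % 2**n leaves non-negative values unchanged and maps a
    # negative value v to 2**n + v, the two's complement of |v|.
    return format(value % (2 ** n_bits), "b").zfill(n_bits)


def retrieve_twos_complement(bits):
    """Interpret a bit string as a two's complement integer."""
    value = int(bits, 2)
    if bits[0] == "1":           # leftmost bit 1 means negative
        value -= 2 ** len(bits)
    return value


print(store_twos_complement(28, 8))          # 00011100
print(store_twos_complement(-28, 8))         # 11100100
print(retrieve_twos_complement("00001101"))  # 13
print(retrieve_twos_complement("11100110"))  # -26
```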
Comparison
Storing reals
A real is a number with an integral part and a fractional part.
For example, 23.7 is a real number—the integral part is 23
and the fractional part is 7/10. Although a fixed-point
representation can be used to represent a real number, the
result may not be accurate or it may not have the required
precision. The next two examples explain why.
Real numbers with very large integral parts or very
small fractional parts should not be stored in fixed-
point representation.
Example 3.16
In the decimal system, assume that we use a fixed-point
representation with two digits at the right of the decimal point
and fourteen digits at the left of the decimal point, for a total of
sixteen digits. The precision of a real number in this system is
lost if we try to represent a decimal number such as 1.00234: the
system stores the number as 1.00.
Example 3.17
In the decimal system, assume that we use a fixed-point
representation with six digits to the right of the decimal point and
ten digits to the left of the decimal point, for a total of sixteen
digits. The accuracy of a real number in this system is lost if we
try to represent a decimal number such as 236154302345.00. The
system stores the number as 6154302345.00: the integral part is
much smaller than it should be.
Floating-point representation
The solution for maintaining accuracy or precision is to use
floating-point representation.
Example 3.18
The following shows the decimal number
7,425,000,000,000,000,000,000.00
in scientific notation (floating-point representation):
+7.425 × 10^21
The three sections are the sign (+), the shifter (21) and the
fixed-point part (7.425). Note that the shifter is the exponent.
Some programming languages and calculators show the number as
+7.425E21
Example 3.19
Show the number
−0.0000000000000232
in scientific notation (floating-point representation).
Solution
We use the same approach as in the previous example: we move
the decimal point after the digit 2, as shown below:
−2.32 × 10^−14
The three sections are the sign (−), the shifter (−14) and the
fixed-point part (2.32). Note that the shifter is the exponent.
Example 3.20
Show the number
(101001000000000000000000000000000.00)2
in floating-point representation.
Solution
We use the same idea, keeping only one digit to the left of the
decimal point:
+(1.01001)2 × 2^32
Example 3.21
Show the number
−(0.00000000000000000000000101)2
in floating-point representation.
Solution
We use the same idea, keeping only one digit to the left of the
decimal point:
−(1.01)2 × 2^−24
Normalization
To make the fixed part of the representation uniform, both
the scientific method (for the decimal system) and the
floating-point method (for the binary system) use only one
non-zero digit on the left of the decimal point. This is called
normalization. In the decimal system this digit can be 1 to
9, while in the binary system it can only be 1. In the
following, d is a non-zero digit, x is a digit, and y is either 0
or 1.
Decimal: ± d.xxxxxxxxxxxxxx
Binary: ± 1.yyyyyyyyyyyyyy
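Normalization of a binary number can be sketched in Python. The function name normalize_binary is an assumption, the input is an unsigned binary string with an optional point, and the sign is left to the caller.

```python
def normalize_binary(text):
    """Normalize a positive binary string such as '101.11'.

    Returns the normalized form '1.yyyy' and the exponent (shifter).
    """
    whole, _, frac = text.partition(".")
    digits = whole + frac
    first_one = digits.find("1")
    if first_one == -1:
        raise ValueError("zero has no normalized form")
    # How far the point must move to sit just after the first 1:
    # positive means a move to the left, negative a move to the right.
    exponent = len(whole) - 1 - first_one
    mantissa = digits[first_one + 1:].rstrip("0")
    return "1." + (mantissa or "0"), exponent


print(normalize_binary("101.11"))     # ('1.0111', 2)
print(normalize_binary("0.0000011"))  # ('1.1', -6)
```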
Note that the point and the bit 1 to the left of the
fixed-point section are not stored—they are implicit.
The mantissa is a fractional part that, together with
the sign, is treated like an integer stored in sign-and-
magnitude representation.
We need to remember that it is not an integer: it is a fractional
part that is stored like an integer. If we insert extra 0s to the right
of the number, the value will not change, whereas in a real
integer if we insert extra 0s to the left of the number, the value
will not change.
Excess System
The exponent, the power that shows how many bits the decimal
point should be moved to the left or right, is a signed number.
Although this could have been stored using two’s complement
representation, a new representation, called the Excess system, is
used instead. In the Excess system, both positive and negative
integers are stored as unsigned integers. To represent a positive
or negative integer, a positive integer (called a bias) is added to
each number to shift them uniformly to the non-negative side.
The value of this bias is 2^(m−1) − 1, where m is the size of the
memory location to store the exponent.
Example 3.22
We can express sixteen integers in a number system with 4-bit
allocation. By adding seven units to each integer in this range, we
can uniformly translate all integers to the right and make all of
them non-negative without changing the relative position of the
integers with respect to each other, as shown in the figure. The
new system is referred to as Excess-7, or biased representation
with biasing value of 7.
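The Excess system can be sketched in Python. The helper names are assumptions, and the bias is computed as 2^(m−1) − 1 from the number of bits m, which gives 7 for a 4-bit allocation and 127 for an 8-bit one.

```python
def excess_encode(value, m_bits):
    """Store a signed integer as an unsigned one by adding the bias."""
    bias = 2 ** (m_bits - 1) - 1
    stored = value + bias
    if not 0 <= stored <= 2 ** m_bits - 1:
        raise OverflowError(f"{value} is outside the Excess-{bias} range")
    return format(stored, "b").zfill(m_bits)


def excess_decode(bits):
    """Recover the signed integer by subtracting the bias."""
    bias = 2 ** (len(bits) - 1) - 1
    return int(bits, 2) - bias


print(excess_encode(-7, 4))  # 0000
print(excess_encode(0, 4))   # 0111
print(excess_encode(8, 4))   # 1111
print(excess_encode(2, 8))   # 10000001
```

Because the same bias is added to every value, the stored patterns keep the ordering of the original integers.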
Example 3.23
Show the Excess_127 (single precision) representation of the
decimal number 5.75.
Solution
a. The sign is positive, so S = 0.
b. Decimal to binary transformation: 5.75 = (101.11)2.
c. Normalization: (101.11)2 = (1.0111)2 × 2^2.
d. E = 2 + 127 = 129 = (10000001)2, M = 0111. We need to add
nineteen zeros at the right of M to make it 23 bits.
e. The representation is shown below:
S = 0, E = 10000001, M = 01110000000000000000000
Example 3.24
Show the Excess_127 (single precision) representation of the
decimal number –161.875.
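Solution
a. The sign is negative, so S = 1.
b. Decimal to binary transformation: 161.875 = (10100001.111)2.
c. Normalization: (10100001.111)2 = (1.0100001111)2 × 2^7.
d. E = 7 + 127 = 134 = (10000110)2, M = 0100001111. We need to
add thirteen zeros at the right of M to make it 23 bits.
e. Representation:
S = 1, E = 10000110, M = 01000011110000000000000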
Example 3.25
Show the Excess_127 (single precision) representation of the
decimal number –0.0234375.
Solution
a. S = 1 (the number is negative).
b. Decimal to binary transformation: 0.0234375 = (0.0000011)2.
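c. Normalization: (0.0000011)2 = (1.1)2 × 2^−6.
d. E = –6 + 127 = 121 = (01111001)2 and M = (1)2. We need to add
twenty-two zeros at the right of M to make it 23 bits.
e. Representation:
S = 1, E = 01111001, M = 10000000000000000000000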
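The S, E and M fields of the three examples above can be checked with Python's standard struct module, which packs a number into the 32-bit IEEE single-precision layout. The helper name float32_fields is an assumption. Note that struct rounds to the nearest representable value rather than truncating; that makes no difference here because all three numbers are exactly representable.

```python
import struct


def float32_fields(x):
    """Return (S, E, M) bit strings of x in IEEE single precision."""
    # Pack as a big-endian 32-bit float, then reread the same bytes as an integer.
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    bits = format(raw, "032b")
    return bits[0], bits[1:9], bits[9:]


for number in (5.75, -161.875, -0.0234375):
    sign, exponent, mantissa = float32_fields(number)
    print(number, sign, exponent, mantissa)
```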
Retrieving numbers stored in IEEE standard floating point
format:
1. Find the value of S, E, and M.
2. If S = 0, set the sign to positive, otherwise set the sign to
negative.
3. Find the shifter (E − 127).
4. Denormalize the mantissa: restore the implicit 1 and the point,
then move the point by the number of places given by the shifter.
5. Change the denormalized binary number to decimal to find the
absolute value.
6. Add the sign.
Example 3.26
The bit pattern (11001010000000000111000100001111)2 is
stored in Excess_127 format. Show the value in decimal.
Solution
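a. The first bit represents S, the next eight bits, E and the
remaining 23 bits, M: S = 1, E = (10010100)2 and
M = (00000000111000100001111)2.
b. The sign is negative.
c. The shifter is E − 127 = 148 − 127 = 21.
d. Denormalization gives (1.00000000111000100001111)2 × 2^21 =
(1000000001110001000011.11)2.
e. The binary number is 2,104,387.75 in decimal.
f. The number is −2,104,387.75.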
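The retrieval steps can be sketched in Python for the bit pattern of this example. The function name decode_excess_127 is an assumption, and the sketch deliberately ignores special patterns such as the all-zero pattern used for 0.0.

```python
def decode_excess_127(bits):
    """Decode a 32-bit Excess_127 pattern given as a string of 0s and 1s.

    Special cases (for example, all bits 0) are not handled in this sketch.
    """
    sign = -1 if bits[0] == "1" else 1
    shifter = int(bits[1:9], 2) - 127
    # Restore the implicit leading 1 in front of the 23 mantissa bits.
    significand = 1 + int(bits[9:], 2) / 2 ** 23
    return sign * significand * 2 ** shifter


print(decode_excess_127("11001010000000000111000100001111"))  # -2104387.75
```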
Storing Zero
A real number with an integral part and the fractional part set to
zero, that is, 0.0, cannot be stored using the steps discussed
above. To handle this special case, it is agreed that in this case
the sign, exponent and the mantissa are set to 0s.
Truncation errors
The value of the number stored using floating-point
representation may not be exactly as we expect it to be.
For example, assume that we need to store the number
(1111111111111111.11111111111)2
in memory using Excess_127 representation. After normalization,
we have:
(1.11111111111111111111111111)2 × 2^15
The normalized number has 27 1s, so the mantissa (the part to the
right of the point) has 26 1s. This mantissa needs to be truncated
to 23 1s, which changes the original number to
(1111111111111111.11111111)2
The difference between the original number and what is retrieved
is called the truncation error.
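The size of the truncation error in this example can be computed exactly with Python's fractions module. The helper names are assumptions, and the sketch models truncation as described in the text, not the round-to-nearest behaviour that real hardware normally applies.

```python
from fractions import Fraction


def parse_binary(text):
    """Convert a binary string such as '101.11' to an exact Fraction."""
    whole, _, frac = text.partition(".")
    value = Fraction(int(whole or "0", 2))
    if frac:
        value += Fraction(int(frac, 2), 2 ** len(frac))
    return value


def truncate_to_mantissa(value, mantissa_bits=23):
    """Keep the leading 1 plus mantissa_bits bits of a positive value."""
    # Find the exponent e with 2**e <= value < 2**(e + 1).
    exponent = value.numerator.bit_length() - value.denominator.bit_length()
    if Fraction(2) ** exponent > value:
        exponent -= 1
    unit = Fraction(2) ** (exponent - mantissa_bits)  # weight of the last kept bit
    return (value // unit) * unit                     # drop everything below it


original = parse_binary("1111111111111111.11111111111")
stored = truncate_to_mantissa(original)
print(stored == parse_binary("1111111111111111.11111111"))  # True
print(original - stored)                                     # 7/2048
```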
3-3 STORING TEXT
We can represent each symbol with a bit pattern. In other words,
text such as “CATS”, which is made up from four symbols, can
be represented as four n-bit patterns, each pattern defining a
single symbol (Figure 3.14).
The length of the bit pattern that represents a symbol in a
language depends on the number of symbols used in that
language.
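The relation between the number of symbols and the pattern length, and the encoding of a word as bit patterns, can be sketched in Python. The helper names are assumptions, and the patterns shown are 8-bit ASCII codes chosen for illustration; they are not necessarily the patterns of Figure 3.14.

```python
def bits_per_symbol(symbol_count):
    """Smallest n such that 2**n patterns cover symbol_count symbols."""
    return max(1, (symbol_count - 1).bit_length())


def encode_ascii(text):
    """Represent each character as an 8-bit pattern of its ASCII code."""
    return [format(ord(ch), "08b") for ch in text]


print(bits_per_symbol(26))   # 5
print(bits_per_symbol(128))  # 7
print(encode_ascii("CATS"))  # ['01000011', '01000001', '01010100', '01010011']
```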
Codes
ASCII
Unicode
Other Codes
See Appendix A
3-4 STORING AUDIO
Sampling
If we cannot record all the values of an audio signal over
an interval, we can record some of them. Sampling means
that we select only a finite number of points on the analog
signal, measure their values, and record them.
Encoding
The quantized sample values need to be encoded as bit
patterns. Some systems assign positive and negative values
to samples, some just shift the curve to the positive part and
assign only positive values.
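Sampling, quantization and encoding can be sketched together in Python. Everything concrete here is an assumption made for illustration: a 1 Hz sine wave stands in for the analog signal, values lie between −1 and 1, and the curve is shifted to the non-negative side before levels are assigned.

```python
import math


def sample(signal, duration, rate):
    """Measure the signal at `rate` evenly spaced points per second."""
    return [signal(i / rate) for i in range(int(duration * rate))]


def quantize(value, bits):
    """Map a value in [-1.0, 1.0] to an integer level in [0, 2**bits - 1]."""
    levels = 2 ** bits
    clipped = max(-1.0, min(1.0, value))
    # Shift the curve to the non-negative side, then scale to the levels.
    return min(levels - 1, int((clipped + 1.0) / 2.0 * levels))


def encode(level, bits):
    """Write a quantized level as a fixed-length bit pattern."""
    return format(level, "b").zfill(bits)


def tone(t):
    """Hypothetical analog signal: a 1 Hz sine wave."""
    return math.sin(2 * math.pi * t)


samples = sample(tone, duration=1.0, rate=8)
patterns = [encode(quantize(s, 3), 3) for s in samples]
print(patterns)
```

With 8 samples per second and 3 bits per sample, this toy signal needs 24 bits per second; the same multiplication gives the bit rate of any sampled signal.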
Standards for sound encoding
Today the dominant standard for storing audio is MP3 (short
for MPEG Layer 3). This standard is a modification of the
MPEG (Moving Picture Experts Group) compression
method used for video. It uses 44100 samples per second and
16 bits per sample. The result is a signal with a bit rate of
705,600 bits per second, which is compressed using a
compression method that discards information that cannot be
detected by the human ear. This is called lossy compression,
as opposed to lossless compression: see Chapter 15.
3-5 STORING IMAGES
Color depth
The number of bits used to represent a pixel, its color depth,
depends on how the pixel’s color is handled by different
encoding techniques. The perception of color is how our
eyes respond to a beam of light. Our eyes have different
types of photoreceptor cells: some respond to the three
primary colors red, green and blue (often called RGB), while
others merely respond to the intensity of light.
True-Color
One of the techniques used to encode a pixel is called True-
Color, which uses 24 bits to encode a pixel.
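A 24-bit True-Color pixel can be sketched in Python. The layout is an assumption made for illustration: 8 bits for each of the red, green and blue components, with red in the highest bits. The helper names are also hypothetical.

```python
def pack_rgb(red, green, blue):
    """Combine three 8-bit components into one 24-bit pixel value."""
    for component in (red, green, blue):
        if not 0 <= component <= 255:
            raise ValueError("each component must be in the range 0 to 255")
    return (red << 16) | (green << 8) | blue


def unpack_rgb(pixel):
    """Split a 24-bit pixel value back into (red, green, blue)."""
    return (pixel >> 16) & 0xFF, (pixel >> 8) & 0xFF, pixel & 0xFF


pixel = pack_rgb(255, 0, 0)   # pure red
print(format(pixel, "024b"))  # 111111110000000000000000
print(unpack_rgb(pixel))      # (255, 0, 0)
```

With 8 bits per component, the scheme can encode 2^24 (16,777,216) different colors.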
Indexed color
The indexed color—or palette color—scheme uses only a
portion of these colors.
Standards for image encoding
Several de facto standards for image encoding are in use.
JPEG (Joint Photographic Experts Group) uses the True-
Color scheme, but compresses the image to reduce the
number of bits (see Chapter 15). GIF (Graphics Interchange
Format), on the other hand, uses the indexed color scheme.
Vector graphics
Raster graphics has two disadvantages: the file size is big
and rescaling is troublesome. To enlarge a raster graphics
image means enlarging the pixels, so the image looks ragged
when it is enlarged. The vector graphic image encoding
method, however, does not store the bit patterns for each
pixel. An image is decomposed into a combination of
geometrical shapes such as lines, squares, or circles.
For example, consider a circle of radius r. The main pieces of
information a program needs to draw this circle are:
1. The radius r and equation of a circle.
2. The location of the center point of the circle.
3. The stroke line style and color.
4. The fill style and color.
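The information listed above can be held in a small Python data structure. This is only a sketch: the class name Circle and its fields are assumptions, and SVG is used as one possible output format, not as the format the chapter prescribes.

```python
from dataclasses import dataclass


@dataclass
class Circle:
    """A circle stored as geometry and style, not as pixels."""
    cx: float               # center point, x coordinate
    cy: float               # center point, y coordinate
    r: float                # radius
    stroke: str = "black"   # stroke line color
    fill: str = "none"      # fill color

    def scaled(self, factor):
        """Rescale by changing the numbers; no pixels are enlarged."""
        return Circle(self.cx * factor, self.cy * factor,
                      self.r * factor, self.stroke, self.fill)

    def to_svg(self):
        """Describe the circle as an SVG element for a renderer to draw."""
        return (f'<circle cx="{self.cx}" cy="{self.cy}" r="{self.r}" '
                f'stroke="{self.stroke}" fill="{self.fill}"/>')


small = Circle(10, 10, 5)
print(small.to_svg())
print(small.scaled(4).to_svg())
```

Rescaling changes only three numbers, which is why a vector image stays smooth at any size and its file stays small.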
3-6 STORING VIDEO