1.1.3 Data Storage
1) Storage is frequently used to mean the devices and data connected to the computer through
input/output operations - that is, hard disk and tape systems and other forms of storage that don't include
computer memory and other in-computer storage. For the enterprise, the options for this kind of storage
are of much greater variety and expense than those related to memory. This meaning is probably more
common in the IT industry than the second meaning.
2) In a more formal usage, storage has been divided into: (1) primary storage, which holds data
in memory (sometimes called random access memory or RAM) and in other "built-in" devices such as the
processor's L1 cache; and (2) secondary storage, which holds data on hard disks, tapes, and other devices
requiring input/output operations.
Primary storage is much faster to access than secondary storage because of the proximity of the storage
to the processor or because of the nature of the storage devices. On the other hand, secondary storage
can hold much more data than primary storage.
In addition to RAM, primary storage includes read-only memory (ROM) and L1 and L2 cache memory. In
addition to hard disks, secondary storage includes a range of device types and technologies, including
diskettes, Zip drives, redundant array of independent disks (RAID) systems, and holographic storage.
Devices that hold storage are collectively known as storage devices.
A somewhat antiquated term for primary storage is main storage and a somewhat antiquated term for
secondary storage is auxiliary storage. Note that, to add to the confusion, there is an additional meaning
for primary storage that distinguishes actively used storage from backup storage.
Safety in Redundancy:
More redundancy detects more errors, at the cost of more data transmitted. We could simply send each
message 3 times and discard any copy that disagrees with the other two. This is a simple example of a
"perfect" code - not because it is flawless, but because it adds exactly enough redundancy to detect or
correct some number of errors; in this case, not that many. Note that although it is unlikely, it IS possible
that the two identical copies both happened to have the exact same error, and the third copy is the correct
one. If each copy arrived different in some way, we would have detected two errors but could not correct
either of them. Error detection and correction systems are rated by how much redundancy they cost and
how many errors they can detect or correct. In this example three "symbols" are used, each the length of
the original message; it can detect two errors and correct one. We can do much better.
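As a rough illustration, here is a minimal Python sketch of the three-copy majority vote just described; the message text and the corrupted copy are invented for the example.

# A minimal sketch of the "send the message three times" idea above.
# The message text and the corrupted copy are invented for illustration.

def encode_repetition(message, copies=3):
    """Encode by transmitting the same message several times."""
    return [message] * copies

def decode_repetition(received):
    """Majority vote: return any copy that agrees with at least one other.
    Returns None if all copies disagree (errors detected, none corrected)."""
    for candidate in received:
        if received.count(candidate) >= 2:
            return candidate
    return None

sent = encode_repetition("HELLO")
received = ["HELLO", "HELXO", "HELLO"]   # the middle copy arrived corrupted
print(decode_repetition(received))       # -> HELLO (the single error is corrected)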
Checksum schemes include parity bits, check digits, and longitudinal redundancy checks. Some
checksum schemes, such as the Damm algorithm, the Luhn algorithm, and the Verhoeff algorithm,
are specifically designed to detect errors commonly introduced by humans in writing down or
remembering identification numbers.
A checksum is determined in one of two ways. Let's say the checksum of a packet is 1 byte long. A byte is
made up of 8 bits, and each bit can be in one of two states, leading to a total of 256 possible
combinations. Since the first combination equals zero, a byte can have a maximum value of 255.
If the sum of the other bytes in the packet is 255 or less, then the checksum contains that
exact value.
If the sum of the other bytes is more than 255, then the checksum is the remainder of the total
value after it has been divided by 256.
Let's look at a checksum example:
Bytes total 1,151
1,151 / 256 = 4.496 (take the whole number: 4)
4 x 256 = 1,024
1,151 - 1,024 = 127 checksum
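A short Python sketch of this one-byte checksum follows; the packet bytes are invented so that they total 1,151, matching the worked example above.

# The checksum is the sum of the packet's bytes, reduced modulo 256 when it overflows.

def checksum_byte(packet):
    return sum(packet) % 256   # remainder after dividing the total by 256

packet = bytes([255, 255, 255, 255, 131])   # these five bytes total 1,151
print(checksum_byte(packet))                # -> 127, as in the worked example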
PARITY:
For example, if we send some specific sequence of ones and zeros, and then count the number of ones
that we sent and send an extra 1 if that count is odd or an extra 0 if that count is even, then we have
introduced a small amount of redundancy into the transmission. The extra bit is called the parity bit; this is
even parity because it makes the total number of 1s in the transmission an even number, and it is a
simple example of what is called an "extended" code. The receiver can then count up the number of 1 bits
they received, perform the same calculation, and if the result does not match the extra bit we sent them,
they will know that an error occurred. If by chance two errors occur in the byte being transmitted - for
example 00001111 changes to 00000011 - the count of 1s is still even, so the parity check still passes
even though the data is different; this error won't be detected by even parity.
The sender, while creating a frame, counts the number of 1s in it. For example, if even parity is used and
the number of 1s is even, then one bit with value 0 is added; this way the number of 1s remains even. If
the number of 1s is odd, a bit with value 1 is added to make it even.
The receiver simply counts the number of 1s in a frame. If the count of 1s is even and even parity is used,
the frame is considered not corrupted and is accepted. If the count of 1s is odd and odd parity is used, the
frame is likewise considered not corrupted and is accepted.
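The following short Python sketch illustrates even parity as described above; the 8-bit data values are invented examples, including the double-error case that slips through.

def even_parity_bit(bits):
    """Return '1' if the count of 1s is odd (making the total even), else '0'."""
    return '1' if bits.count('1') % 2 == 1 else '0'

def accept_even_parity(frame):
    """A frame (data plus parity bit) is accepted if its total count of 1s is even."""
    return frame.count('1') % 2 == 0

data = "00001111"                        # four 1s, so the parity bit is 0
frame = data + even_parity_bit(data)     # -> "000011110"
print(accept_even_parity(frame))         # -> True

# Two errors cancel out, as noted above: "00000011" still has an even count of 1s.
corrupted = "00000011" + "0"
print(accept_even_parity(corrupted))     # -> True, so the double error goes undetected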
CHECK DIGIT:
A check digit is a number added to a code (such as a bar code or account number) to derive a further
number as a means of verifying the accuracy or validity of the code as it is printed or transmitted. A code
consisting of three digits such as 135, for example, may include 9 (the sum of 1, 3, and 5) as the last digit
and be communicated as 1359.
Now you will learn how check digits are calculated. The ISBN-10 (used on books) has been chosen as the
example; this uses a modulo 11 system which includes the letter X to represent the number 10.
Example 1
To calculate the check digit for the ISBN 0 - 2 0 1 - 5 3 0 8 2 - ?
(i)
the position of each digit is considered:
digit position:  10  9  8  7  6  5  4  3  2  1
number:           0  2  0  1  5  3  0  8  2  ?
(ii)
Each digit is then multiplied by its digit position and the totals added together
(0x10) + (2x9) + (0x8) + (1x7) + (5x6) + (3x5) + (0x4) + (8x3) + (2x2)
= 0 + 18 + 0 + 7 + 30 + 15 + 0 + 24 + 4
= 98
(iii)
The total is then divided by 11 (modulo 11) and the remainder, if any, is subtracted from 11 to
give the check digit.
98 ÷ 11 = 8 remainder 10
11 - 10 = 1
This gives a check digit of 1
Final ISBN becomes 0 - 2 0 1 - 5 3 0 8 2 - 1
Example 2
To check that the check digit of the ISBN 0 - 1 3 1 - 5 2 4 4 7 - X is correct:
(i)
the position of each digit is considered:
digit position:  10  9  8  7  6  5  4  3  2  1
number:           0  1  3  1  5  2  4  4  7  X
(ii)
Each digit is then multiplied by its digit position and the totals added together
(0x10) + (1x9) + (3x8) + (1x7) + (5x6) + (2x5) + (4x4) + (4x3) + (7x2) + (Xx1)
= 0 + 9 + 24 + 7 + 30 + 10 + 16 + 12 + 14 + 10 (recall that X = 10)
= 132
(iii)
The total is then divided by 11; if there is no remainder then the check digit is correct:
132 ÷ 11 = 12 remainder 0
Hence the check digit is correct
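A small Python sketch of this modulo 11 calculation reproduces both examples above; the helper names are our own.

def isbn10_check_digit(first_nine):
    """Weight the nine digits by positions 10 down to 2, sum them, and subtract
    the remainder (after dividing by 11) from 11 to get the check digit."""
    total = sum(int(d) * (10 - i) for i, d in enumerate(first_nine))
    remainder = total % 11
    check = (11 - remainder) % 11            # a remainder of 0 gives a check digit of 0
    return 'X' if check == 10 else str(check)

def isbn10_is_valid(isbn):
    """Weight all ten characters by positions 10 down to 1; valid when the sum
    divides by 11 with no remainder."""
    values = [10 if c == 'X' else int(c) for c in isbn]
    total = sum(v * (10 - i) for i, v in enumerate(values))
    return total % 11 == 0

print(isbn10_check_digit("020153082"))    # -> 1, as in Example 1
print(isbn10_is_valid("013152447X"))      # -> True, as in Example 2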
ERROR CORRECTION:
Error correction may generally be realized in two different ways:
ARQ
ARQ, or Automatic Repeat reQuest, is an error control (error correction) method that uses
error-detection codes and positive and negative acknowledgments. When the transmitter either receives a
negative acknowledgment or a timeout happens before an acknowledgment is received, ARQ makes the
transmitter resend the message.
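A toy Python sketch of a stop-and-wait flavour of this idea follows; the noisy channel, frame layout and error rate are invented stand-ins rather than a real protocol implementation.

import random

def checksum_byte(data):
    return sum(data) % 256

def make_frame(data):
    return data + bytes([checksum_byte(data)])     # data followed by its checksum

def frame_ok(frame):
    data, check = frame[:-1], frame[-1]
    return checksum_byte(data) == check

def noisy_channel(frame):
    """Corrupt the first byte 30% of the time to simulate transmission errors."""
    if random.random() < 0.3:
        damaged = bytearray(frame)
        damaged[0] ^= 0xFF
        return bytes(damaged)
    return frame

def send_with_arq(data, max_attempts=10):
    frame = make_frame(data)
    for attempt in range(1, max_attempts + 1):
        received = noisy_channel(frame)
        if frame_ok(received):       # the receiver would reply with an ACK
            return attempt
        # a NAK (or a timeout waiting for the ACK) triggers a retransmission
    raise RuntimeError("gave up after too many retransmissions")

print("delivered after", send_with_arq(b"hello"), "attempt(s)")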
When Internet file-sharing boomed into popularity with Napster and the iPod, the MP3 cornered the
market for one reason: it had a small footprint. Without broadband connections, it was impractical at the
time to share files larger than the MP3 standard's 2-3 megabytes.
And that preference has stuck for some time now, even though MP3 does not have nearly the same
quality as WAV or AIFF files. But despite the growing base of people using higher quality formats, there
are still those who prefer the MP3; whether out of nostalgia or something else, who knows.
The MP4 is a container format, allowing a combination of audio, video, subtitles and still images to be held
in a single file. It also allows for advanced content such as 3D graphics, menus and user
interactivity.
Because MP4 is a reliable format that requires a relatively low amount of bandwidth, just about
everyone could take advantage of it. This was especially true as technology made it possible
to create more powerful desktop and laptop systems with larger hard drives and more processing
power.
The increasing speed of various types of Internet connections also helped to make MP4 accessible
to a greater audience. MP4 works in a similar, although much more complex, way to MP3, by
compressing files without a noticeable loss of quality. MP3 technology revolutionized the way in which
music and audio files are used, and it looks like the MP4 format will do the same for the video market.
Data compression is also widely used in backup utilities, spreadsheets, and database management
systems. Certain types of data, such as bit-mapped graphics, can be compressed to a small fraction of
their normal size.
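One simple illustration of why bit-mapped graphics compress so well is run-length encoding; the Python sketch below uses an invented scan line and is not any particular utility's algorithm.

# Long runs of identical pixels collapse into (count, value) pairs.

def rle_encode(data):
    runs = []
    for value in data:
        if runs and runs[-1][1] == value:
            runs[-1] = (runs[-1][0] + 1, value)    # extend the current run
        else:
            runs.append((1, value))                # start a new run
    return runs

def rle_decode(runs):
    return bytes(value for count, value in runs for _ in range(count))

row = bytes([255] * 60 + [0] * 4)    # a mostly white scan line, 64 bytes long
runs = rle_encode(row)
print(runs)                          # -> [(60, 255), (4, 0)]: 64 bytes described by 2 pairs
assert rle_decode(runs) == row       # the original row is recovered exactly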
LOSSY COMPRESSION
Lossy compression refers to discarding irrelevant information. Generally this means compressing images,
video, or audio by discarding data that the human perceptual system cannot see or hear.
Lossy compression is a hard AI problem. To illustrate, speech could theoretically be compressed by
transcribing it into text and compressing it with standard techniques to about 10 bits per second. We are
nowhere near that!
Even worse, we could imagine a lossy video compressor translating a movie into a script, and the
decompressor reading the script and creating a new movie with different details but close enough so that
the average person watching both movies one after the other would not notice any differences. We may
use a result by Landauer (1986) to estimate just how tiny this script could be. He tested people's memory
(over a period of days) over a wide range of formats such as words, numbers, pictures and music, and
concluded that the human brain writes to long term memory at a fairly constant rate of about 2 bits per
second. Currently we need about 10^7 bits per second to store DVD-quality MPEG-2 video.
The state of the art is to apply lossy compression only at a very low level of human sensory modeling,
where the model is well understood.
IMAGE COMPRESSION
All image formats, even BMP, may be regarded as a form of lossy image compression. An uncompressed
image is normally a 2-dimensional array of pixels, where each pixel has 3 color components (red, green,
blue), each represented as an integer with a fixed range and resolution. A pixel array is an approximation
of a 2-dimensional continuous field where the light intensity at any point would be properly described as a
continuous spectrum. Note how lossy compression is applied:
The eye can't see detail much smaller than 0.1 mm, so there is no need for an image to have more than a
few thousand pixels in each dimension.
The eye can't detect differences in brightness of less than about 1%, so there is no need to quantize
brightness to more than a few hundred levels.
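As a rough sketch of the second point, the Python snippet below maps brightness values onto a coarser set of levels; the pixel values and the choice of 200 levels are illustrative assumptions.

def quantize(pixels, levels=200):
    """Map 0-255 brightness values onto a smaller number of evenly spaced levels."""
    step = 256 / levels
    return [round(int(p / step) * step) for p in pixels]

print(quantize([0, 100, 101, 254, 255]))   # 100 and 101 collapse to the same level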
LOSSLESS COMPRESSION
Lossless data compression is a class of data compression algorithms that allows the original data to be
perfectly reconstructed from the compressed data. By contrast, lossy data compression permits
reconstruction only of an approximation of the original data, though this usually allows for
improved compression rates (and therefore smaller file sizes).
Lossless data compression is used in many applications. For example, it is used in the ZIP file format and
in the GNU tool gzip. It is also often used as a component within lossy data compression technologies
(e.g. lossless mid/side joint stereo preprocessing by the LAME MP3 encoder and other lossy audio
encoders).
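The defining round-trip property is easy to demonstrate with Python's built-in zlib module (the same DEFLATE method behind ZIP and gzip); the sample text below is invented for illustration.

import zlib

original = b"the quick brown fox jumps over the lazy dog " * 100
compressed = zlib.compress(original)

print(len(original), "->", len(compressed), "bytes")
assert zlib.decompress(compressed) == original   # perfectly reconstructed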
Lossless compression is used in cases where it is important that the original and the decompressed data
be identical, or where deviations from the original data could be deleterious. Typical examples are
executable programs, text documents, and source code. Some image file formats, like PNG or GIF, use
only lossless compression, while others like TIFF and MNG may use either lossless or lossy
methods. Lossless audio formats are most often used for archiving or production purposes, while
smaller lossy audio files are typically used on portable players and in other cases where storage space is
limited or exact replication of the audio is unnecessary.