1.1.3 Data Storage
Show understanding that sound (music), pictures, video, text and numbers are stored in
different formats
Representing data
All data inside a computer is transmitted as a series of electrical signals that are either on or off. Therefore, in order for a computer to be able to process any kind of data, including text, images and sound, it must be converted into binary form. If the data is not converted into binary – a series of 1s and 0s – the computer will simply not understand it or be able to process it.
Representing Text
When any key on a keyboard is pressed, it needs to be converted into a binary number so that it
can be processed by the computer and the typed character can appear on the screen.
A code where each number represents a character can be used to convert text into binary. One
code we can use for this is called ASCII. The ASCII code takes each character on the keyboard and
assigns it a binary number. For example:
the letter ‘a’ has the binary number 0110 0001 (this is the denary number 97)
the letter ‘b’ has the binary number 0110 0010 (this is the denary number 98)
the letter ‘c’ has the binary number 0110 0011 (this is the denary number 99)
ASCII codes start at denary number 0. The code covers special characters, including punctuation, the return key and control characters, as well as the number keys, capital letters and lower case letters.
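A minimal Python sketch of this idea, using the built-in ord() function, reproduces the 'a', 'b', 'c' values above (the 8-bit grouping is just for readability):

# Convert each character of a string to its ASCII denary and binary values.
def to_ascii_binary(text):
    for ch in text:
        code = ord(ch)                      # denary ASCII code, e.g. 'a' -> 97
        bits = format(code, "08b")          # 8-bit binary string, e.g. '01100001'
        print(f"{ch!r}: denary {code}, binary {bits[:4]} {bits[4:]}")

to_ascii_binary("abc")
# 'a': denary 97, binary 0110 0001
# 'b': denary 98, binary 0110 0010
# 'c': denary 99, binary 0110 0011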
The ASCII code can only store 128 characters, which is enough for English text but not enough for other languages. If you want to use accents in European languages or larger alphabets such as Cyrillic (the Russian alphabet) and Chinese, then more characters are needed. Therefore another code, called Unicode, was created, which meant that computers could be used by people working in many different languages.
Representing images
Images also need to be converted into binary in order for a computer to process them so that they
can be seen on our screen. Digital images are made up of pixels. Each pixel in an image is made
up of binary numbers.
If we say that 1 is black (or on) and 0 is white (or off), then a simple black and white picture can be
created using binary.
To create the picture, a grid can be set out and the squares coloured in (1 = black and 0 = white). But before the grid can be created, the size of the grid needs to be known. This data is called metadata, and computers need metadata to know the size of an image. If the metadata for the image to be created is 10x10, this means the picture will be 10 pixels across and 10 pixels down. An image can then be created in this way, colouring each square according to its bit.
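A minimal Python sketch of a tiny 1-bit image; the 4x4 pattern below is made up for illustration (the original example image is not reproduced):

# A tiny 1-bit "image": 1 = black pixel, 0 = white pixel.
grid = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]

# Metadata: the receiver must know the grid size to rebuild the picture.
width, height = len(grid[0]), len(grid)
print(f"metadata: {width}x{height}")

for row in grid:
    print("".join("#" if bit else "." for bit in row))  # '#' = black, '.' = white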
Adding colour
The system described so far is fine for black and white images, but most images need to use colours as well. Instead of using just 0 and 1, four possible values allow an image to use four colours. In binary this can be represented using two bits per pixel:
00 – white
01 – blue
10 – green
11 – red
While this is still not a very large range of colours, adding another binary digit will double the number
of colours that are available:
1 bit per pixel (0 or 1): two possible colours
2 bits per pixel (00 to 11): four possible colours
3 bits per pixel (000 to 111): eight possible colours
The number of bits used to store each pixel is called the colour depth. Images with more colours need more bits to store each pixel. This means that images that use lots of colours are stored in larger files.
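Since each extra bit doubles the number of colours, the number of possible colours is 2 to the power of the colour depth; a one-line Python check of the pattern above:

# Number of possible colours for a given colour depth (bits per pixel) is 2 ** bits.
for bits in range(1, 9):
    print(f"{bits} bit(s) per pixel -> {2 ** bits} possible colours")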
Image quality
Image quality is affected by the resolution of the image. The resolution of an image is a way of
describing how tightly packed the pixels are.
In a low-resolution image, the pixels are larger so fewer are needed to fill the space. This results in
images that look blocky or pixelated. An image with a high resolution has more pixels, so it looks a
lot better when you zoom in or stretch it. The downside of having more pixels is that the file size will
be bigger.
Representing sound
Sound needs to be converted into binary for computers to be able to process it. To do this, sound is
captured - usually by a microphone - and then converted into a digital signal.
An analogue to digital converter will sample a sound wave at regular time intervals, taking a reading at each time sample point.
The samples can then be converted to binary. They will be recorded to the nearest whole number.

Time sample:  1     2     3     4     5     6     7     8     9     10
Denary:       8     3     7     6     9     7     2     6     6     6
Binary:       1000  0011  0111  0110  1001  0111  0010  0110  0110  0110
If the time samples are then plotted back onto the same graph, it can be seen that the sound wave
now looks different. This is because sampling does not take into account what the sound wave is
doing in between each time sample.
This means that the sound loses quality, as data has been lost between the time samples. The way to increase the quality and store the sound at a quality closer to the original is to take more time samples that are closer together. This way, more detail about the sound can be collected, so when it is converted to digital and back to analogue again it does not lose as much quality.
The frequency at which samples are taken is called the sample rate, and is measured in hertz (Hz). 1 Hz is one sample per second. Most CD-quality audio is sampled at 44,100 Hz or 48,000 Hz (44.1 kHz or 48 kHz).
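To see how the sample rate affects file size, here is a minimal Python sketch; the 16 bits per sample and two channels are typical CD values, and the one-minute duration is illustrative:

# Uncompressed audio size = sample rate * bits per sample * channels * duration.
sample_rate = 44_100      # samples per second (44.1 kHz)
bits_per_sample = 16      # assumed CD-style quantisation
channels = 2              # stereo
seconds = 60

size_bits = sample_rate * bits_per_sample * channels * seconds
print(f"{size_bits / 8 / 1_000_000:.1f} MB")   # roughly 10.6 MB per minute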
Methods of error detection and correction, such as parity checks, check digits, checksums
and Automatic Repeat reQuests (ARQ)
Error detection and correction has great practical importance in maintaining data (information) integrity across noisy communication channels and less-than-reliable storage media.
Error Correction: Send additional information so incorrect data can be corrected and accepted.
Error correction is the additional ability to reconstruct the original, error-free data.
There are two basic ways to design the channel code and protocol for an error correcting system:
Automatic Repeat-Request (ARQ): The transmitter sends the data and also an error detection code, which the receiver uses to check for errors and to request retransmission of erroneous data. In many cases, the request is implicit; the receiver sends an acknowledgement (ACK) of correctly received data, and the transmitter re-sends anything not acknowledged within a reasonable period of time. Common ARQ protocols are:
Stop-and-wait ARQ
Go-Back-N ARQ
Selective Repeat ARQ
All three protocols usually use some form of sliding window protocol to let the transmitter determine which (if any) packets need to be retransmitted. These protocols reside in the Data Link or Transport Layers of the OSI model.
Forward Error Correction (FEC): The transmitter encodes the data with an error-correcting code
(ECC) and sends the coded message. The receiver never sends any messages back to the
transmitter. The receiver decodes what it receives into the "most likely" data. The codes are
designed so that it would take an "unreasonable" amount of noise to trick the receiver into
misinterpreting the data.
Error Detection: Send additional information so incorrect data can be detected and rejected. Error
detection is the ability to detect the presence of errors caused by noise or other impairments during
transmission from the transmitter to the receiver.
Error detection is most commonly achieved by adding redundant check bits to the message for the purposes of error detection. Several schemes exist to achieve error detection, and they are generally quite simple. All error detection codes transmit more bits than were in the original data. Most codes are "systematic": the transmitter sends a fixed number of original data bits, followed by a fixed number of check bits (usually referred to as redundancy) which are derived from the data bits by some deterministic algorithm. The receiver applies the same algorithm to the received data bits and compares its output to the received check bits; if the values do not match, an error has occurred at some point during the transmission. In a system that uses a "non-systematic" code, such as some raptor codes, the data bits are transformed into at least as many code bits, and the transmitter sends only the code bits.
Repetition Schemes: Variations on this theme exist. Given a stream of data that is to be sent, the data is broken up into blocks of bits, and each block is sent some predetermined number of times. For example, if we want to send "1011", we may repeat the block three times. Suppose we send "1011 1011 1011", and this is received as "1010 1011 1011". As one group is not the same as the other two, we can determine that an error has occurred. This scheme is not very efficient, and is susceptible to problems if the error occurs in exactly the same place in each group; for example, "1010 1010 1010" would be accepted as correct in this scheme. The scheme is, however, extremely simple, and is in fact used in some transmissions of numbers stations.
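A minimal Python sketch of this check, using the received blocks from the example above; comparing the copies detects the error, and a per-bit majority vote can also correct it:

# Repetition scheme: each block is sent three times; disagreement signals an error.
def check_repetition(received):
    blocks = received.split()
    if len(set(blocks)) == 1:
        return blocks[0], "no error detected"
    # Majority vote per bit position (assumes all blocks have the same length).
    corrected = "".join(max("01", key=pos.count) for pos in zip(*blocks))
    return corrected, "error detected"

print(check_repetition("1010 1011 1011"))   # ('1011', 'error detected')
print(check_repetition("1011 1011 1011"))   # ('1011', 'no error detected')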
Parity Schemes: A parity bit is an error detection mechanism. A parity bit is an extra bit transmitted with a data item, chosen to give the resulting bits even or odd parity. Parity refers to the number of bits set to 1 in the data item. There are two types of parity:
Even parity - an even number of bits are 1 (e.g. data 10010001, parity bit 1)
Odd parity - an odd number of bits are 1 (e.g. data 10010111, parity bit 0)
The stream of data is broken up into blocks of bits, and the number of 1 bits is counted. Then, a "parity bit" is set (or cleared) if the number of one bits is odd (or even); this scheme is called even parity, and odd parity can also be used. There is a limitation to parity schemes: a parity bit is only guaranteed to detect an odd number of bit errors (one, three, five, and so on). If an even number of bits (two, four, six and so on) are flipped, the parity bit appears to be correct, even though the data is corrupt.
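A minimal Python sketch that computes the parity bit for a byte of data, reproducing the two examples above:

# Even parity: the parity bit makes the total number of 1s (data + parity) even.
# Odd parity:  the parity bit makes the total number of 1s odd.
def parity_bit(data: str, odd: bool = False) -> int:
    ones = data.count("1")
    bit = ones % 2               # 1 if the data already has an odd number of 1s
    return bit if not odd else 1 - bit

print(parity_bit("10010001"))            # 1  (even parity, as in the example above)
print(parity_bit("10010111", odd=True))  # 0  (odd parity, as in the example above)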
Checksum Schemes: Checksum schemes include parity bits, check digits, and longitudinal redundancy checks. Suppose we have a fairly long message, which can reasonably be divided into shorter words (a 128 byte message, for instance). We can introduce an accumulator with the same width as a word (one byte, for instance), and as each word comes in, add it to the accumulator. When the last word has been added, the contents of the accumulator are appended to the message (as a 129th byte, in this case). The added word is called a checksum. Now, the receiver performs the same operation, and checks the checksum. If the checksums agree, we assume the message was sent without error. (In some variants the checksum is chosen so that the whole message, including the checksum, sums to 0; if the receiver's new checksum is not 0, an error is detected.)
Calculating Checksum
A checksum is determined in one of two ways. Let's say the checksum of a packet is 1 byte long. A byte is made up of 8 bits, and each bit can be in one of two states, leading to a total of 256 (2^8) possible combinations. Since the first combination equals zero, a byte can have a maximum value of 255.
If the sum of the other bytes in the packet is 255 or less, then the checksum contains that exact value.
If the sum of the other bytes is more than 255, then the checksum is the remainder of the total value after it has been divided by 256.
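A minimal Python sketch of this 1-byte checksum; the data bytes used are illustrative:

# 1-byte checksum: sum the data bytes; if the total exceeds 255, keep the
# remainder after dividing by 256 (i.e. the sum modulo 256), as described above.
def checksum(data: bytes) -> int:
    return sum(data) % 256

message = bytes([120, 45, 200, 33])       # illustrative data bytes
print(checksum(message))                  # 398 % 256 = 142

# Receiver side: recompute the checksum and compare it with the one received.
received, received_checksum = message, 142
print(checksum(received) == received_checksum)   # True -> assume no error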
A cyclic redundancy check (CRC) is an error-detecting code commonly used in digital networks and storage devices. CRCs are so called because the check (data verification) value is a redundancy (it expands the message without adding information) and the algorithm is based on cyclic codes. CRCs are popular because they are simple to implement in binary hardware, easy to analyse mathematically, and particularly good at detecting common errors caused by noise in transmission channels. Because the check value has a fixed length, the function that generates it is occasionally used as a hash function.
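To illustrate the idea, here is a minimal CRC-8 sketch in Python using the polynomial 0x07; real systems typically use longer standardised polynomials such as CRC-32:

# Bitwise CRC-8 (polynomial 0x07, initial value 0): the sender appends the CRC
# value to the message, and the receiver recomputes it to check for errors.
def crc8(data: bytes, poly: int = 0x07) -> int:
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc

print(hex(crc8(b"123456789")))   # check value appended to the message by the sender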
Understanding of the concept of Musical Instrument Digital Interface (MIDI) files, JPEG files,
MP3 and MP4 files
ASCII (American Standard Code for Information Interchange) is a 7-bit character code that was introduced by the American National Standards Institute (ANSI) and is used by most U.S. personal and workstation computers.
It is a coding scheme using 7 or 8 bits that assigns numeric values to up to 256 characters, including letters, numerals, punctuation marks, control characters, and other symbols. ASCII was developed in 1968 to standardize data transmission among disparate hardware and software systems and is built into most minicomputers and all PCs. ASCII is divided into two sets: 128 characters (standard ASCII) and an additional 128 (extended ASCII).
Char  Ctrl  Dec  Hex  |  Char     Dec  Hex  |  Char  Dec  Hex  |  Char      Dec  Hex
NUL   ^@    0    00   |  <space>  32   20   |  @     64   40   |  `         96   60
SOH   ^A    1    01   |  !        33   21   |  A     65   41   |  a         97   61
STX   ^B    2    02   |  "        34   22   |  B     66   42   |  b         98   62
ETX   ^C    3    03   |  #        35   23   |  C     67   43   |  c         99   63
EOT   ^D    4    04   |  $        36   24   |  D     68   44   |  d         100  64
ENQ   ^E    5    05   |  %        37   25   |  E     69   45   |  e         101  65
BS    ^H    8    08   |  (        40   28   |  H     72   48   |  h         104  68
HT    ^I    9    09   |  )        41   29   |  I     73   49   |  i         105  69
LF    ^J    10   0A   |  *        42   2A   |  J     74   4A   |  j         106  6A
VT    ^K    11   0B   |  +        43   2B   |  K     75   4B   |  k         107  6B
FF    ^L    12   0C   |  ,        44   2C   |  L     76   4C   |  l         108  6C
CR    ^M    13   0D   |  -        45   2D   |  M     77   4D   |  m         109  6D
SO    ^N    14   0E   |  .        46   2E   |  N     78   4E   |  n         110  6E
SI    ^O    15   0F   |  /        47   2F   |  O     79   4F   |  o         111  6F
DLE   ^P    16   10   |  0        48   30   |  P     80   50   |  p         112  70
DC1   ^Q    17   11   |  1        49   31   |  Q     81   51   |  q         113  71
DC2   ^R    18   12   |  2        50   32   |  R     82   52   |  r         114  72
DC3   ^S    19   13   |  3        51   33   |  S     83   53   |  s         115  73
DC4   ^T    20   14   |  4        52   34   |  T     84   54   |  t         116  74
NAK   ^U    21   15   |  5        53   35   |  U     85   55   |  u         117  75
SYN   ^V    22   16   |  6        54   36   |  V     86   56   |  v         118  76
ETB   ^W    23   17   |  7        55   37   |  W     87   57   |  w         119  77
CAN   ^X    24   18   |  8        56   38   |  X     88   58   |  x         120  78
EM    ^Y    25   19   |  9        57   39   |  Y     89   59   |  y         121  79
SUB   ^Z    26   1A   |  :        58   3A   |  Z     90   5A   |  z         122  7A
ESC   ^[    27   1B   |  ;        59   3B   |  [     91   5B   |  {         123  7B
FS    ^\    28   1C   |  <        60   3C   |  \     92   5C   |  |         124  7C
GS    ^]    29   1D   |  =        61   3D   |  ]     93   5D   |  }         125  7D
RS    ^^    30   1E   |  >        62   3E   |  ^     94   5E   |  ~         126  7E
US    ^_    31   1F   |  ?        63   3F   |  _     95   5F   |  <delete>  127  7F
EBCDIC (Extended Binary Coded Decimal Interchange Code) was developed by IBM for use on their
mainframe computers.
Unicode is a character coding system designed to support the worldwide interchange and display
of written texts of diverse languages by providing a unique number for every character.
It is a 16-bit character encoding standard developed by the Unicode Consortium between 1988 and 1991. By using two bytes to represent each character, Unicode enables almost all of the written languages of the world to be represented using a single character set. (By contrast, 8-bit ASCII is not capable of representing all of the combinations of letters and diacritical marks that are used just with the Roman alphabet.) Approximately 39,000 of the 65,536 possible Unicode character codes have been assigned to date, 21,000 of them being used for Chinese ideographs. The remaining combinations are open for expansion.
Expression of denary in BCD and vice versa
http://www.miniwebtool.com/bcd-to-decimal-converter/
http://www.miniwebtool.com/decimal-to-bcd-converter/
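The linked converters perform Binary Coded Decimal (BCD) conversion: each denary digit is stored as its own 4-bit binary group. A minimal Python sketch of both directions:

# Binary Coded Decimal (BCD): every denary digit becomes its own 4-bit group.
def denary_to_bcd(number: int) -> str:
    return " ".join(format(int(digit), "04b") for digit in str(number))

def bcd_to_denary(bcd: str) -> int:
    return int("".join(str(int(group, 2)) for group in bcd.split()))

print(denary_to_bcd(295))                # 0010 1001 0101
print(bcd_to_denary("0010 1001 0101"))   # 295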
Data type     Storage size                       Range
Boolean       Depends on implementing platform   True or False
Decimal       16 bytes                           0 through +/-79,228,162,514,264,337,593,543,950,335 (+/-7.9...E+28) with no decimal point; 0 through +/-7.9228162514264337593543950335 with 28 places to the right of the decimal
Double        8 bytes                            -1.79769313486231570E+308 through -4.94065645841246544E-324 for negative values; 4.94065645841246544E-324 through 1.79769313486231570E+308 for positive values
Long          8 bytes                            -9,223,372,036,854,775,808 through 9,223,372,036,854,775,807 (signed)
Short         2 bytes                            -32,768 through 32,767 (signed)
Single        4 bytes                            -3.4028235E+38 through -1.401298E-45 for negative values; 1.401298E-45 through 3.4028235E+38 for positive values
UInteger      4 bytes                            0 through 4,294,967,295 (unsigned)
ULong         8 bytes                            0 through 18,446,744,073,709,551,615 (unsigned)
User-Defined  Depends on implementing platform   Each member of the structure has a range determined by its data type and independent of the ranges of the other members
.DOC — Microsoft Word document
Microsoft Word is the most popular word processing software in the world. You probably won't come across loads of doc files, but if you do, it can be annoying if you haven't got a program that can open them.
.TXT — Plain text file
The most basic of files, it's just some text. You no doubt already have something that can open it: NotePad, SimpleText, or your browser.
Bitmaps
A large part of using modern computers involves sending pictures and films to each other, along
with using a graphical user interface. All of this involves computers saving and processing images.
This section will cover the two main image types: vector and bitmap, along with some compression
techniques.
Bitmap Graphics - a collection of pixels from an image mapped to specific memory locations holding
their binary colour value.
Pixel - the smallest possible addressable area defined by a solid colour, represented as binary, in an
image.
Resolution
Image Resolution - how many pixels an image contains per inch/cm
Screen Resolution - the number of pixels per row by the number of pixels per column
Colour depth - The number of bits used to represent the colour of a single pixel
Colour depth   Description                                                                                Number of colours per pixel
1 bit          Monochrome, only stores black and white                                                    2
2 bit          Stores 4 colours, e.g. RGB(70,61,55), RGB(79,146,85), RGB(129,111,134), RGB(149,146,166)   4
4 bit          Stores a limited range of colours                                                          16
8 bit          –                                                                                          256
24 bit         –                                                                                          16,777,216
It seems pretty obvious that the higher the colour depth, the closer the picture will look to reality. Why then don't we just ramp up the colour depth on every image that we make? The answer should be obvious: for a fixed resolution, the higher the colour depth, the larger the file size.
If the first image uses 1 bit to store the colour for each pixel, then the image size would be:
Number of Pixels * Colour Depth = Image Size
67500 * 1 bit = 67500 bits
If the second image uses 2 bits to store the colour for each pixel, then the image size would be:
Number of Pixels * Colour Depth = Image Size
67500 * 2 bits = 135000 bits
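The same calculation as a minimal Python sketch, reproducing the two figures above:

# Image size in bits = number of pixels * colour depth (bits per pixel).
def image_size_bits(pixels: int, colour_depth: int) -> int:
    return pixels * colour_depth

print(image_size_bits(67_500, 1))   # 67500 bits, as in the 1-bit example above
print(image_size_bits(67_500, 2))   # 135000 bits, as in the 2-bit example above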
Vector graphics
Vector graphics are graphics in which the image is represented in a mathematical fashion. This allows an image to be zoomed in on to any level of detail without loss of precision. They are ideal for situations in which an image might be used at various resolutions and dimensions.
Raster graphics
Raster graphics are of a fixed dimension, somewhat like a grid pattern with specified values at each point. These graphics are the default for things captured from the real world (i.e. scanned images, photographs, etc.). They are ideal for use when an image will only be used once and will never need to be enlarged, or if portions are coming from a photograph or other real-world image.
Differences between Raster(bitmaps) and vector graphics
In summary, vector graphics are stored as mathematical descriptions of shapes and can be scaled to any size without loss of precision, whereas raster (bitmap) graphics are stored as a fixed grid of pixel values, are suited to real-world images such as photographs, and lose quality when enlarged.
The resolution of the photographs is reduced from A to E. Photographs A and B are very sharp whilst
photograph D is very fuzzy and E is almost unrecognisable. This is the result of changing the number
of PIXELS per centimetre used to store the image (that is, reducing the PICTURE RESOLUTION).
When a photographic file undergoes file compression, the size of the file is reduced. The trade-off for this reduced file size is reduced quality of the image. One of the file formats used to reduce photographic file sizes is known as JPEG. This is another example of lossy file compression. As with the MP3 format, once the image is subjected to the JPEG compression algorithm, a new file is formed and the original file can no longer be reconstructed. JPEG will reduce the RAW BITMAP image by a factor of between 5 and 15, depending on the quality of the original.
An image that is 2048 pixels wide and 1536 pixels high is equal to 2048 × 1536 pixels; in other words,
3 145 728 pixels. This is often referred to as a 3-megapixel image (although it is obviously slightly larger).
A raw bitmap can often be referred to as a TIFF or BMP image (file extension .TIF or .BMP). The file size
of this image is determined by the number of pixels. In the previous example, a 3-megapixel image
would be 3 megapixels × 3 colours. In other words, 9 megabytes (each pixel occupies 3 bytes
because it is made up of the three main colours: red, green and blue). TIFF and BMP are the highest
image quality because, unlike jpeg, they are not in a compressed format. The same image stored in
jpeg format would probably occupy between 0.6 megabytes and 1.8 megabytes.
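A minimal Python sketch of the size estimate in this example, using the figures given above (3 bytes per pixel for the raw bitmap, and a JPEG compression factor of roughly 5 to 15):

# Rough file-size estimate for the 2048 x 1536 example above.
width, height = 2048, 1536
pixels = width * height                       # 3,145,728 pixels (a "3-megapixel" image)
raw_bytes = pixels * 3                        # 3 bytes per pixel (red, green and blue)
print(raw_bytes / 2**20, "MB raw")            # 9.0 MB, as in the text

for factor in (5, 15):
    print(round(raw_bytes / factor / 2**20, 1), f"MB at JPEG compression factor {factor}")
# 1.8 MB and 0.6 MB, matching the range given above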
Jpeg relies on certain properties of the human eye and, up to a point, a certain amount of file
compression can take place without any real loss of quality. The human eye is limited in its ability to
detect very slight differences in brightness and in colour hues. For example, some computer imaging
software boasts that it can produce over 40 million different colours – the human eye is only able to
differentiate about 10 million colours.
PNGs are a file format designed to be used in place of GIFs. They are usually slightly smaller, and
sport advanced features like alpha-channel transparency and 24-bit colour support. Read more
on our image formats page. Your browser can view them.
MUSICAL INSTRUMENT DIGITAL INTERFACE (MIDI) is always associated with the storage of music files. However, MIDI files are not music and don't contain any sounds; they are very different to, for example, MP3 files. MIDI is essentially a communications protocol that allows electronic musical instruments to interact with each other. The MIDI protocol uses 8-bit serial transmission with one start bit and one stop bit, and is therefore asynchronous.
A MIDI file consists of a list of commands that instruct a device (for example, an electronic organ,
sound card in a computer or in a mobile phone) how to produce a particular sound or musical note.
Each MIDI command has a specific sequence of bytes. The first byte is the status byte – this informs
the MIDI device what function to perform. Encoded in the status byte is the MIDI channel. MIDI
operates on 16 different channels, which are numbered 0 to 15.
Examples of MIDI commands include:
Note on/off: this indicates that a key (on an electronic keyboard) has been pressed/released to
produce/stop producing a musical note
Key pressure: this indicates how hard the key has been pressed (this could indicate loudness of
the music note or whether any vibrato has been used, and so on).
Two additional bytes are required, a PITCH BYTE, which tells the MIDI device which note to play, and
a VELOCITY BYTE, which tells the device how loud to play the note. When music or sound is recorded
on a computer system, these MIDI messages are saved in a file which is recognised by the file
extension .mid.
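A minimal Python sketch of the byte structure just described, building a Note On command from a status byte (command plus channel), a pitch byte and a velocity byte; the specific note and loudness values are illustrative:

# Build the three bytes of a MIDI "Note On" command.
# Status byte = command (0x90 for Note On) combined with the channel (0-15),
# followed by a pitch byte (which note) and a velocity byte (how loud).
def note_on(channel: int, pitch: int, velocity: int) -> bytes:
    status = 0x90 | (channel & 0x0F)
    return bytes([status, pitch & 0x7F, velocity & 0x7F])

msg = note_on(channel=0, pitch=60, velocity=100)   # middle C, fairly loud
print(" ".join(f"{b:02x}" for b in msg))           # 90 3c 64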
If this .mid file is played back through a musical instrument, such as an electronic keyboard, the music
will be played back in an identical way to the original. The whole piece of music will have been
stored as a series of commands but no actual musical notes. This makes it a very versatile file
structure, since the same file could be fed back through a different electronic instrument, such as
an electric guitar, with different effects to the original. However, to play back through an instrument
such as a guitar would need the use of SEQUENCER SOFTWARE, since the MIDI files wouldn’t be
recognised in their ‘raw’ form.
Both the electronic instruments and the computer need a MIDI interface to allow them to 'talk' to each other. It was mentioned earlier that MIDI operates on 16 channels. In fact the computer can send data out on all 16 MIDI channels at the same time. For example, 16 MIDI devices, each set up for a different MIDI channel, could be connected to the computer. Each device could be playing a separate line in a song from the sequencer software, effectively creating an electronic orchestra. This implementation is being used more and more today in the recording studio, by major orchestras and in musical scores used in films.
Because MIDI files don’t contain any audio tracks, their size, compared with an MP3 file, is
considerably smaller. For example, a 10 megabyte MP3 file only requires about 10 kilobyte file size
when using the MIDI format. This makes them ideal for devices where memory is an issue; for example,
storing ring tones on a mobile phone.
MP3 (MPEG Audio Layer III) uses technology known as AUDIO COMPRESSION to convert music and other sounds into an MP3 file format. Essentially, this compression technology will reduce the size of a normal music file by about 90 per cent. For example, an 80 megabyte music CD can be reduced to 8 megabytes using MP3 technology.
MP3 files are used in MP3 players, computers or mobile phones. Files can be downloaded from the
internet, or CDs can be converted to MP3 format. The CD files are converted using FILE
COMPRESSION software. Whilst the music quality can never match the ‘full’ version found on a CD,
the quality is satisfactory for most general purposes.
But how can the original music file be reduced by 90 per cent whilst still retaining most of the music
quality? This is done using file compression algorithms which use PERCEPTUAL MUSIC SHAPING; this
essentially removes sounds that the human ear can’t hear properly. For example, if two sounds are
played at the same time, only the louder one can be heard by the ear, so the softer sound is
eliminated. This means that certain parts of the music can be removed without affecting the quality
too much. MP3 files use what is known as a LOSSY FORMAT since part of the original file is lost following
the compression algorithm. This means that the original file can’t be put back together again.
However, even the quality of MP3 files can be different since it depends on the BIT RATE – this is the
number of bits per second used when creating the file. Bit rates are roughly between 80 and 320
kilobits per second; usually 200 or higher gives a sound quality close to a normal CD.
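Since the bit rate is the number of bits per second, the size of an MP3 file follows directly from the bit rate and the playing time; a minimal Python sketch with illustrative values (a 4-minute track at 128 kilobits per second):

# MP3 file size = bit rate (bits per second) * duration in seconds.
bit_rate_kbps = 128        # illustrative; the text gives roughly 80 to 320 kbps
seconds = 4 * 60

size_bytes = bit_rate_kbps * 1000 * seconds / 8
print(f"{size_bytes / 1_000_000:.1f} MB")   # about 3.8 MB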
MPEG-4 Part 14 (MP4) is a digital multimedia container format. Like most modern container formats, it allows streaming over the Internet. The only official filename extension for MPEG-4 Part 14 files is .mp4, but many have other extensions, most commonly .m4a and .m4p. M4A (audio only) is often compressed using AAC encoding (lossy), but can also be in Apple Lossless format. Some devices advertised as "MP4 Players" are simply MP3 Players that also play AMV video or some other video format, and do not necessarily play the MPEG-4 Part 14 format.
MPEG-4 (MP4) files are slightly different to MP3 files. This format allows the storage of multimedia files
rather than just sound. Music, videos, photos and animation can all be stored in the MP4 format.
Videos, for example, could be streamed over the internet using the MP4 format without losing any
real discernable quality.
Real Networks created formats for streaming audio and video, and gave away free players for the formats, before allowing themselves to become so smothered in advertising that everyone with sense decided to stop using their programs. You might still come across Real audio files around the net.
Program: Real One Player
.HTML/ .HTM — HyperText Markup Language file
Most pages you create for a website will be HTML files
.CSS — Cascading Style Sheet
CSS files are a tool in the repertoire of webmasters that take care of how their websites look. To read more about them, see our CSS tutorials. CSS files can be created or edited in any text editor, like Notepad. Try TopStyle Lite too; it includes loads of selectors for easy editing.
.RAR — RAR archive
This is a compressed file format similar to the popular .zip format. It sports advanced functions like
special multimedia compression and has many benefits over zip files.
Program: WinRAR to take care of your RAR archives, and it can handle other archive types too.
.ZIP — ZIPped file
Zipped files are really groups of other types of files kept together and compressed a bit. Many
downloads will consist of zip collections, so be sure to have something to open them with.
Compression
Large files take up more storage space, and they also take a lot longer to download or upload, which leads to web pages, songs and videos that take longer to load and play when using the internet.
Any kind of data can be compressed. There are two main types of compression: lossy and lossless.
Lossy compression
Lossy compression removes some of a file’s original data in order to reduce the file size. This might
mean reducing the numbers of colours in an image or reducing the number of samples in a sound
file. This can result in a small loss of quality of an image or sound file.
A popular lossy compression method for images is the JPEG, which is why most images on the internet
are JPEG images. A popular lossy compression method for sounds is MP3. Once a file has been
compressed using lossy compression, the discarded data cannot be retrieved again.
Lossless compression
Lossless compression doesn’t reduce the quality of the file at all. No data is lost, so lossless
compression allows a file to be recreated exactly as it was when originally created.
There are various algorithms for doing this, usually by looking for patterns in the data that are
repeated. Zip files are an example of lossless compression.
The space savings of lossless compression are not as good as they are with lossy compression.
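Zip files use more sophisticated algorithms, but run-length encoding is a simple way to see the lossless idea: repeated data is described more compactly and can be rebuilt exactly. A minimal Python sketch (single-character runs with counts up to 9):

# Run-length encoding: replace runs of the same character with character + count.
def rle_encode(data: str) -> str:
    out, i = [], 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i]:
            j += 1
        out.append(f"{data[i]}{j - i}")
        i = j
    return "".join(out)

def rle_decode(encoded: str) -> str:
    # Assumes single-digit counts; longer runs would need a different format.
    return "".join(ch * int(count) for ch, count in zip(encoded[::2], encoded[1::2]))

text = "AAAABBBCCD"
packed = rle_encode(text)                    # 'A4B3C2D1'
print(packed, rle_decode(packed) == text)    # A4B3C2D1 True -> data rebuilt exactly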
Data Compression
In digital signal processing, data compression, source coding, or bit-rate reduction involves
encoding information using fewer bits than the original representation. Compression can be either
lossy or lossless. Lossless compression reduces bits by identifying and eliminating statistical
redundancy. No information is lost in lossless compression. Lossy compression reduces bits by
identifying unnecessary information and removing it. The process of reducing the size of a data file
is referred to as data compression.
Image Compression
Image compression may be lossy or lossless. Lossless compression is preferred for archival purposes
and often for medical imaging, technical drawings, clip art, or comics. Lossy compression methods,
especially when used at low bit rates, introduce compression artifacts. Lossy methods are especially
suitable for natural images such as photographs in applications where minor (sometimes
imperceptible) loss of fidelity is acceptable to achieve a substantial reduction in bit rate. The lossy
compression that produces imperceptible differences may be called visually lossless.
Audio Compression
Audio data compression, as distinguished from dynamic range compression, has the potential to
reduce the transmission bandwidth and storage requirements of audio data. Audio compression
algorithms are implemented in software as audio codecs. Lossy audio compression algorithms
provide higher compression at the cost of fidelity and are used in numerous audio applications.
These algorithms almost all rely on psychoacoustics to eliminate less audible or meaningful sounds,
thereby reducing the space required to store or transmit them.
In both lossy and lossless compression, information redundancy is reduced, using methods such as
coding, pattern recognition, and linear prediction to reduce the amount of information used to
represent the uncompressed data.
Lossless audio compression produces a representation of digital data that decompress to an
exact digital duplicate of the original audio stream, unlike playback from lossy compression
techniques such as Vorbis and MP3. Compression ratios are around 50–60% of original size,
which is similar to those for generic lossless data compression. Lossless compression is unable to
attain high compression ratios due to the complexity of waveforms and the rapid changes in
sound forms.
Lossy audio compression is used in a wide range of applications. In addition to the direct applications (MP3 players or computers), digitally compressed audio streams are used in most video DVDs, digital television, streaming media on the internet, satellite and cable radio, and increasingly in terrestrial radio broadcasts. Lossy compression typically achieves far greater compression than lossless compression (reducing the data to 5 percent to 20 percent of the original stream).
Video Compression
Video compression uses modern coding techniques to reduce redundancy in video data. Most video compression algorithms and codecs combine spatial image compression and temporal motion compensation.
The majority of video compression algorithms use lossy compression. Uncompressed video requires a very high data rate. Although lossless video compression codecs perform an average compression of over factor 3, a typical MPEG-4 lossy compression video has a compression factor between 20 and 200. As in all lossy compression, there is a trade-off between video quality, cost of processing the compression and decompression, and system requirements. Highly compressed video may present visible or distracting artifacts.
Codec
A codec is a device or computer program capable of encoding or decoding a digital data stream
or signal. Codec is a portmanteau of coder-decoder or, less commonly, compressor-decompressor.
A codec encodes a data stream or signal for transmission, storage or encryption, or decodes it for
playback or editing. Codecs are used in videoconferencing, streaming media and video editing
applications. A video camera's analog-to-digital converter (ADC) converts its analog signals into
digital signals, which are then passed through a video compressor for digital transmission or storage.
A receiving device then runs the signal through a video decompressor, then a digital-to-analog
converter (DAC) for analog display.
Audio Codec
An audio codec is a device or computer program capable of coding or decoding a digital data
stream of audio.
In software, an audio codec implements an algorithm that compresses and decompresses digital audio data, with the aim of reducing the storage space and the bandwidth required for transmission of the stored audio file. Most codecs are implemented as libraries which interface to one or more multimedia players.
In hardware, audio codec refers to a single device that encodes analog audio as digital signals
and decodes digital back into analog. In other words, it contains both an Analog-to-digital
converter (ADC) and Digital-to-analog converter (DAC) running off the same clock. This is used
in sound cards that support both audio in and out, for instance.
Video Codec
A video codec is an electronic circuit or software that compresses or decompresses digital video,
thus converting raw (uncompressed) digital video to a compressed format or vice-versa. In the
context of video compression, "codec" is a concatenation of "encoder" and "decoder"; a device
that can only compress is typically called an encoder, and one that can only decompress is
known as a decoder.
The format of the compressed data usually conforms to a standard video compression
specification. The compression is typically lossy, meaning that the compressed video lacks some
of the information present in the original video. A consequence of this is that decompressed
video has lower quality than the original, uncompressed video because there is insufficient
information to accurately reconstruct the original video.
There are complex relationships between the video quality, the amount of data used to
represent the video (determined by the bit rate), the complexity of the encoding and decoding
algorithms, sensitivity to data losses and errors, ease of editing, random access, and end-to-end
delay (latency).