Chapter 1
Revision Objectives
• the effects of changing sampling rate and resolution on sound quality
• the need for file compression methods (such as lossy and lossless formats)
• how to compress common file formats (such as text files, bit-map images, vector
graphics, sound files and video files).
Data Representation
Number systems
Designers of computer systems adopted the binary (base 2) number system since this
allows only two values, 0 and 1. No matter how complex the system, the basic building
block in all computers is the binary number system. Since computers contain millions and
millions of tiny ‘switches’, which must be in the ON or OFF position, this lends itself
logically to the binary system. A switch in the ON position can be represented by 1; a
switch in the OFF position can be represented by 0. Each binary digit is known as a bit.
Binary number system
The binary system uses 1s and 0s only, which gives these corresponding weightings:
128   64    32    16    8     4     2     1
(2^7) (2^6) (2^5) (2^4) (2^3) (2^2) (2^1) (2^0)
Each time a 1 appears in a column, the column value is added to the total. For example, the
binary number 0 1 1 0 1 0 1 1 is 64 + 32 + 8 + 2 + 1 = 107 in denary.
The reverse operation, converting from denary to binary, is slightly more complex. There
are two basic ways of doing this. One of them is successive division by 2, shown here for
the denary number 107:
2 | 107
2 |  53   remainder: 1
2 |  26   remainder: 1
2 |  13   remainder: 0
2 |   6   remainder: 1
2 |   3   remainder: 0
2 |   1   remainder: 1
2 |   0   remainder: 1
  |   0   remainder: 0
Write the remainders from bottom to top to get the binary number: 0 1 1 0 1 0 1 1
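The successive division method can be sketched in a few lines of Python. This is only an illustration: the function name `denary_to_binary` and the default width of 8 bits are assumptions made for the example, not part of the original notes.

```python
def denary_to_binary(value: int, width: int = 8) -> str:
    """Convert a non-negative denary integer to binary by repeated division by 2."""
    if value < 0:
        raise ValueError("this sketch handles non-negative values only")
    remainders = []
    while value > 0:
        value, remainder = divmod(value, 2)
        remainders.append(str(remainder))
    # Remainders appear least significant bit first, so read them in reverse.
    bits = "".join(reversed(remainders))
    # Pad with leading zeros up to the requested width.
    return bits.rjust(width, "0")


print(denary_to_binary(107))  # 01101011
```

Reversing the list of remainders is the code equivalent of reading the remainders from bottom to top.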
In one’s complement, each digit in the binary number is inverted (in other words, 0
becomes 1 and 1 becomes 0). For example, 0 1 0 1 1 0 1 0 (denary value 90) becomes
1 0 1 0 0 1 0 1.
In two’s complement, each digit in the binary number is inverted and a ‘1’ is added to the
right-most bit. For example, 0 1 0 1 1 0 1 0 (denary value 90) becomes:
  1 0 1 0 0 1 0 1
+               1
= 1 0 1 0 0 1 1 0 (since 1 + 1 = 0 with a carry of 1) = denary value -90
Throughout the remainder of this chapter, we will use the two’s complement method to
avoid confusion. Also, two’s complement makes binary addition and subtraction more
straightforward.
Now that we are introducing negative numbers, we need a way to represent these in
binary. The two’s complement uses these weightings for an 8-bit number representation:
-128 64 32 16 8 4 2 1
This means:
-128 64 32 16 8 4 2 1
1 1 0 1 1 0 1 0
0 0 1 0 0 1 1 0
The first example: -128 + 64 + 16 + 8 + 2 = -38
The second example is: 32 + 4 + 2 = 38
The easiest way to convert a number into its negative equivalent is to use two’s
complement. For example, 104 in binary is 0 1 1 0 1 0 0 0 (+104 in denary).
To find the binary value for -104 using two’s complement:
invert the digits:   1 0 0 1 0 1 1 1
add 1:                             1
which gives:         1 0 0 1 1 0 0 0 (= -104)
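The invert-and-add-1 rule can be checked with a short Python sketch. The helper names `twos_complement_negate` and `twos_complement_value` are hypothetical, chosen for this illustration only, and bit patterns are passed as strings to keep the steps visible.

```python
def twos_complement_negate(bits: str) -> str:
    """Negate a two's complement bit string: invert every bit, then add 1."""
    width = len(bits)
    inverted = "".join("1" if b == "0" else "0" for b in bits)
    # Add 1 and discard any carry out of the left-most bit.
    total = (int(inverted, 2) + 1) % (1 << width)
    return format(total, f"0{width}b")


def twos_complement_value(bits: str) -> int:
    """Denary value of a bit string whose left-most weighting is negative."""
    value = int(bits, 2)
    if bits[0] == "1":
        value -= 1 << len(bits)
    return value


print(twos_complement_negate("01101000"))  # 10011000
print(twos_complement_value("10011000"))   # -104
```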
Binary addition
Consider the following examples.
Example 1
Add 0 0 1 0 0 1 0 1 (37 in denary) and 0 0 1 1 1 0 1 0 (58 in denary).
Solution
  -128  64  32  16   8   4   2   1
     0   0   1   0   0   1   0   1
+    0   0   1   1   1   0   1   0
=    0   1   0   1   1   1   1   1
This gives us 0 1 0 1 1 1 1 1, which is 95 in denary; the correct answer.
Example 2
Add 0 1 0 1 0 0 1 0 (82 in denary) and 0 1 0 0 0 1 0 1 (69 in denary).
Solution
  -128  64  32  16   8   4   2   1
     0   1   0   1   0   0   1   0
+    0   1   0   0   0   1   0   1
=    1   0   0   1   0   1   1   1
This gives us 1 0 0 1 0 1 1 1, which is -105 in denary (which is clearly nonsense). When
adding two positive numbers, the result should always be positive (likewise, when adding
two negative numbers, the result should always be negative). Here the addition of two
positive numbers has resulted in a negative answer. This is because the addition produces
a number which is outside the range of values which can be represented by the 8 bits
being used (in this case +127 is the largest value which can be represented, and the
calculation produces the value 151, which is larger than 127 and, therefore, out of
range). This causes overflow.
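The overflow check described above (two operands with the same sign giving a result with the opposite sign) can be sketched in Python. The function names `to_signed` and `add_8bit` are assumptions made for this illustration, and the 8-bit width is fixed to match the examples.

```python
def to_signed(value: int, width: int = 8) -> int:
    """Interpret the low `width` bits of value as a two's complement number."""
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value


def add_8bit(a: int, b: int):
    """Add two signed 8-bit values; return (bit pattern, signed result, overflow flag)."""
    raw = (a + b) & 0xFF          # keep only 8 bits
    result = to_signed(raw)
    # Overflow: both operands share a sign but the result has the other sign.
    overflow = (a < 0) == (b < 0) and (result < 0) != (a < 0)
    return format(raw, "08b"), result, overflow


print(add_8bit(37, 58))  # ('01011111', 95, False)
print(add_8bit(82, 69))  # ('10010111', -105, True)
```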
Binary subtraction
To carry out subtraction in binary, we convert the number being subtracted into its nega-
tive equivalent using two’s complement, and then add the two numbers.
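Example 1
Carry out the subtraction 95 - 68 in binary.
Solution
Convert 68 into its negative equivalent using two’s complement (1 0 1 1 1 1 0 0), then add:
       -128  64  32  16   8   4   2   1
          0   1   0   1   1   1   1   1   (95)
+         1   0   1   1   1   1   0   0   (-68)
=    1    0   0   0   1   1   0   1   1
The additional ninth bit is simply ignored, leaving the binary number 0 0 0 1 1 0 1 1
(denary equivalent of 27, which is the correct result of the subtraction).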
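As a rough sketch of the same procedure in Python (the name `subtract_8bit` is hypothetical and the width is fixed at 8 bits for the illustration), the number being subtracted is converted with two's complement, the values are added, and any ninth bit is masked off:

```python
def subtract_8bit(a: int, b: int) -> str:
    """Compute a - b by adding the two's complement of b, keeping 8 bits."""
    mask = 0xFF
    neg_b = ((b ^ mask) + 1) & mask      # invert the bits of b, then add 1
    total = (a & mask) + neg_b           # may produce a ninth (carry) bit
    return format(total & mask, "08b")   # the ninth bit is ignored


print(subtract_8bit(95, 68))  # 00011011
```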
Example 2
Solution
1 1 1 0 0 0 0 1
This gives us 1 1 1 0 0 0 0 1, which is -31 in denary; the correct answer.
Measurement of the size of computer memories
The byte is the smallest unit of memory in a computer. Some computers use larger bytes,
such as 16-bit systems and 32-bit systems, but they are always multiples of 8. 1 byte of
memory wouldn’t allow you to store very much information; so memory size is measured
in these multiples.
Name of memory size     Equivalent denary value (bytes)
1 kilobyte (1 KB)       1 000
1 megabyte (1 MB)       1 000 000
1 gigabyte (1 GB)       1 000 000 000
1 terabyte (1 TB)       1 000 000 000 000
1 petabyte (1 PB)       1 000 000 000 000 000
The system of numbering shown in the table above only refers to some storage devices, but
is technically inaccurate. It is based on the SI (base 10) system of units where 1 kilo is
equal to 1000. A 1 TB hard disk would allow the storage of 1 x 10^12 bytes according to this
system. However, since memory size is actually measured in terms of powers of 2, another
system has been proposed by the International Electrotechnical Commission (IEC); it is
based on the binary system.
Table: IEC memory size system
Name of memory size     Number of bytes     Equivalent denary value (bytes)
1 kibibyte (1 KiB)      2^10                1 024
1 mebibyte (1 MiB)      2^20                1 048 576
1 gibibyte (1 GiB)      2^30                1 073 741 824
1 tebibyte (1 TiB)      2^40                1 099 511 627 776
1 pebibyte (1 PiB)      2^50                1 125 899 906 842 624
This system is more accurate. Internal memories (such as RAM) should be measured using
the IEC system. A 64 GiB RAM could, therefore, store 64 x 2^30 bytes of data
(68 719 476 736 bytes).
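The difference between the two systems can be tabulated with a short Python sketch. The list names below are assumptions made for the illustration; the prefixes themselves (kilo/kibi, mega/mebi and so on) are the standard SI and IEC ones.

```python
SI_PREFIXES = ["kilo", "mega", "giga", "tera", "peta"]
IEC_PREFIXES = ["kibi", "mebi", "gibi", "tebi", "pebi"]

for power, (si, iec) in enumerate(zip(SI_PREFIXES, IEC_PREFIXES), start=1):
    si_bytes = 10 ** (3 * power)    # SI: powers of 1000
    iec_bytes = 2 ** (10 * power)   # IEC: powers of 1024
    print(f"1 {si}byte = {si_bytes:,} bytes; 1 {iec}byte = {iec_bytes:,} bytes")

# The RAM example from the text: 64 x 2^30 bytes.
ram_bytes = 64 * 2 ** 30
print(ram_bytes)  # 68719476736
```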
Hexadecimal number system
The hexadecimal system is very closely related to the binary system. Hexadecimal
(sometimes referred to as simply hex) is a base 16 system, so its column weightings are
powers of 16.
Because it is a system based on 16 different digits, the numbers 0 to 9 and the letters A to
F are used to represent hexadecimal digits.
Binary value    Hexadecimal value    Denary value
1001            9                    9
1010            A                    10
1011            B                    11
1100            C                    12
1101            D                    13
1110            E                    14
1111            F                    15
Converting from binary to hexadecimal and from hexadecimal to binary
Converting from binary to hexadecimal is a fairly easy process. Starting from the right and
moving left, split the binary number into groups of 4 bits. If the last group has fewer
than 4 bits, then simply fill in with 0s from the left. Take each group of 4 bits and
convert it into the equivalent hexadecimal digit.
Example 1
Convert 1 0 1 1 1 1 1 0 0 0 0 1 from binary to hexadecimal
Solution
First split it into groups of 4 bits:
1011 1110 0001
Then find the equivalent hexadecimal digits:
B E 1
Example 2
Convert 1 0 0 0 0 1 1 1 1 1 1 1 0 1 from binary to hexadecimal
Solution
First split it into groups of 4 bits:
10 0001 1111 1101
The left group only contains 2 bits, so add in two 0s to the left:
0010 0001 1111 1101
Now find the equivalent hexadecimal digits:
2 1 F D
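The grouping method used in these two examples can be sketched in Python. The function name `binary_to_hex` is an assumption for illustration; spaces in the input are ignored so the bit groups can be typed as they appear in the text.

```python
def binary_to_hex(bits: str) -> str:
    """Convert a binary string to hexadecimal by splitting it into 4-bit groups."""
    bits = bits.replace(" ", "")
    # Pad on the left with 0s so the length is a multiple of 4.
    padded_length = -(-len(bits) // 4) * 4
    bits = bits.zfill(padded_length)
    groups = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    # Each 4-bit group maps to exactly one hexadecimal digit.
    return "".join("0123456789ABCDEF"[int(group, 2)] for group in groups)


print(binary_to_hex("1011 1110 0001"))   # BE1
print(binary_to_hex("10000111111101"))   # 21FD
```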
Converting from hexadecimal to binary is also straightforward: simply take each
hexadecimal digit and write down the 4-bit code which corresponds to the digit. For example:
B F 0 8
Solution
1011 1111 0000 1000
Hexadecimal is also easier for people to read. It is much easier to work with:
B5A41AFC
than it is to work with:
10110101101001000001101011111100
So, hexadecimal is often used when developing new software or when trying to trace
errors in programs. When the memory contents are output to a printer or monitor, this is
known as a memory dump.
A program developer can look at each of the hexadecimal codes (as shown in the table
below) and determine where the error lies. The value on the far left shows the memory
location, so it is possible to find out exactly where in memory the fault occurs. Using
hexadecimal is more manageable than binary. It is a powerful fault-tracing tool, but
requires considerable knowledge of computer architecture to be able to interpret the
results.
Binary-coded decimal (BCD) system
The binary-coded decimal (BCD) system uses a 4-bit code to represent each denary digit:
0000=0 0101=5
0001=1 0110=6
0010=2 0111=7
0011=3 1000=8
0100=4 1001=9
For example, 0 1 1 0 0 1 0 1 represents the denary digits 6 5.
Uses of BCD
The most obvious use of BCD is in the representation of digits on a calculator or clock
display (for example, a display showing 180.3).
Each denary digit will have a BCD equivalent value, which makes it easy to convert from
computer output to denary display.
Many decimal fractions cannot be represented exactly in computer memories which use the
binary number system. Normally this doesn’t cause a major issue since the differences can
be dealt with. However, when it comes to accounting and representing monetary value in
computers, exact values need to be stored to prevent significant errors from
accumulating. Monetary values use a fixed point notation, for example $1.31, so one
solution is to represent each denary digit as a BCD value.
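Consider adding $0.37 and $0.94 together using fixed point decimals.
$0.37   0 0 0 0 0 0 0 0 . 0 0 1 1 0 1 1 1
+
$0.94   0 0 0 0 0 0 0 0 . 1 0 0 1 0 1 0 0     Expected result = $1.31
Using binary addition, we start with the right-most digits:
  0 1 1 1
+ 0 1 0 0
= 1 0 1 1
This produces 1 0 1 1, which isn’t a denary digit; this will flag an error and the computer
needs to add 0 1 1 0:
    1 0 1 1
+   0 1 1 0
= 1 0 0 0 1
This produces a fifth bit, which is carried to the next decimal digit position. Adding the
next digits together with the carry:
  0 0 1 1
+ 1 0 0 1
+       1
= 1 1 0 1
This produces 1 1 0 1, which isn’t a denary digit; this will flag an error and the computer
again needs to add 0 1 1 0:
    1 1 0 1
+   0 1 1 0
= 1 0 0 1 1
This again produces a fifth bit, which is carried to the next decimal digit position.
Adding 1 to 0 0 0 0 0 0 0 0 produces:
0 0 0 0 0 0 0 1
Final answer
0 0 0 0 0 0 0 1 . 0 0 1 1 0 0 0 1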
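The digit-by-digit BCD addition, including the "add 0 1 1 0" correction step, can be sketched in Python. The function names and the use of strings such as '00.37' for the fixed point values are assumptions made for this illustration; a carry out of the left-most digit is simply dropped in this sketch.

```python
def to_bcd_digits(text: str):
    """Return one 4-bit value per denary digit (the decimal point is skipped)."""
    return [int(ch) for ch in text if ch.isdigit()]


def bcd_add(a: str, b: str) -> str:
    """Add two equal-length fixed point strings such as '00.37' digit by digit."""
    digits_a, digits_b = to_bcd_digits(a), to_bcd_digits(b)
    result, carry = [], 0
    # Work from the right-most digit to the left-most digit.
    for x, y in zip(reversed(digits_a), reversed(digits_b)):
        nibble = x + y + carry
        if nibble > 9:
            nibble += 6              # not a denary digit: add 0110
        carry = nibble >> 4          # a fifth bit carries to the next digit
        result.append(nibble & 0b1111)
    return " ".join(format(d, "04b") for d in reversed(result))


print(bcd_add("00.37", "00.94"))  # 0000 0001 0011 0001
```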
48 30 0 80 50 P 112 70 p
49 31 1 81 51 Q 113 71 q
50 32 2 82 52 R 114 72 r
51 33 3 83 53 S 115 73 s
52 34 4 84 54 T 116 74 t
53 35 5 85 55 U 117 75 u
54 36 6 86 56 V 118 76 v
55 37 7 87 57 W 119 77 w
56 38 8 88 58 X 120 78 x
57 39 9 89 59 Y 121 79 y
58 3A : 90 5A Z 122 7A z
59 3B ; 91 5B [ 123 7B {
60 3C < 92 5C \ 124 7C |
61 3D = 93 5D ] 125 7D }
62 3E > 94 5E ^ 126 7E ~
63 3F ? 95 5F _ 127 7F DEL
Notice the storage of characters with upper case and lower case. For example
a 1 1 0 0 0 0 1 hex 61 (lower case)
A 1 0 0 0 0 0 1 hex 41 (upper case)
y 1 1 1 1 0 0 1 hex 79 (lower case)
Y 1 0 1 1 0 0 1 hex 59 (upper case)
Notice the sixth bit changes from 1 to 0 when comparing lower and upper case characters.
This makes the conversion between the two an easy operation. It is also noticeable that
the character sets (such as a to z, 0 to 9, and so on) are grouped together in sequence,
which makes them easier to sort and process.
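Because only the sixth bit (weighting 32) differs between the two cases, the conversion can be sketched as a single bit operation in Python. The names `CASE_BIT`, `to_upper` and `to_lower` are hypothetical, and the sketch deliberately handles only the letters a to z and A to Z.

```python
CASE_BIT = 0b0100000  # the sixth bit from the right, weighting 32


def to_upper(ch: str) -> str:
    """Clear the sixth bit of a lower case letter; leave other characters alone."""
    return chr(ord(ch) & ~CASE_BIT) if "a" <= ch <= "z" else ch


def to_lower(ch: str) -> str:
    """Set the sixth bit of an upper case letter; leave other characters alone."""
    return chr(ord(ch) | CASE_BIT) if "A" <= ch <= "Z" else ch


print(format(ord("a"), "07b"), format(ord("A"), "07b"))  # 1100001 1000001
print(to_upper("y"), to_lower("Y"))                      # Y y
```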
Extended ASCII uses 8-bit codes (128 to 255 in denary or 80 to FF in hex). This allows for
non-English characters and for drawing characters to be included.
Since ASCII code has a number of disadvantages and is unsuitable for some purposes,
different methods of coding have been developed over the years. One coding system is
called Unicode. Unicode allows characters in a code form to represent all languages of
the world, thus supporting many operating systems, search engines and internet browsers
used globally. There is overlap with standard ASCII code, since the first 128 (English)
characters are the same, but Unicode can support far more characters in total (over a
million code points are available). As can be seen in the tables, ASCII uses one byte to
represent a character, whereas Unicode will support up to four bytes per character.
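The "one byte versus up to four bytes" point can be illustrated in Python. The sketch below assumes the UTF-8 encoding, which is one common way of storing Unicode and is not named in the notes; the sample characters are arbitrary choices.

```python
# Sample characters: Latin A, e with acute accent, euro sign, an emoji.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

byte_lengths = []
for ch in samples:
    encoded = ch.encode("utf-8")     # assumption: UTF-8 is the storage encoding
    byte_lengths.append(len(encoded))
    print(f"U+{ord(ch):04X}: {len(encoded)} byte(s)")

print(byte_lengths)  # [1, 2, 3, 4]
```

The first character has the same code (65) as in standard ASCII, which shows the overlap between the two systems.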
Table: Extended ASCII code table
Hex Dec Char     Hex Dec Char     Hex Dec Char
80               A0               C0
81               A1               C1
82               A2               C2
83               A3               C3
84               A4               C4
85               A5               C5
86               A6               C6
87               A7               C7
88               A8               C8
89               A9               C9
8A               AA               CA
8B               AB               CB
8C               AC               CC
8D               AD               CD
8E               AE               CE
8F               AF               CF
90               B0               D0
91               B1               D1
92               B2               D2
93               B3               D3
94               B4               D4
95               B5               D5
96               B6               D6
97               B7               D7
98               B8               D8
99               B9               D9
9A               BA               DA
9B               BB               DB
9C               BC               DC
9D               BD               DD
9E               BE               DE
9F               BF               DF
Hex Dec Char     Hex Dec Char
E0               F0
E1               F1
E2               F2
E3               F3
E4               F4
E5               F5
E6               F6
E7               F7
E8               F8
E9               F9
EA               FA
EB               FB
EC               FC
ED               FD
EE               FE
EF               FF
The Unicode Consortium was set up in 1991. Version 1.0 was published with five goals;
these were to:
• create a universal standard that covered all languages and all writing systems
• produce a more efficient coding system than ASCII
• adopt uniform encoding where each character is encoded as 16-bit or 32-bit code
• create unambiguous encoding where each 16-bit or 32-bit value always represents the
same character (it is worth pointing out here that the ASCII code tables are not
standardised and versions other than the ones shown in tables above exist)
• reserve part of the code for private use to enable a user to assign codes for their own
characters and symbols (useful for Chinese and Japanese character sets).
A sample of Unicode characters is shown in the table below. As can be seen from the table,
characters used in languages such as Russian, Greek, Romanian and Croatian can now be
represented in a computer.
Table: Sample of Unicode characters
0 1 2 3 4 5 6 7 8 9 A B C D E F
01A0 Ơ ơ Ƣ ƣ Ƥ ƥ Ʀ Ƨ ƨ Ʃ ƪ ƫ Ƭ ƭ Ʈ Ư
01B0 ư Ʊ Ʋ Ƴ ƴ Ƶ ƶ Ʒ Ƹ ƹ ƺ ƻ Ƽ ƽ ƾ ƿ
01C0 ǀ ǁ ǂ ǃ DŽ Dž dž LJ Lj lj NJ Nj nj Ǎ ǎ Ǐ
01D0 ǐ Ǒ ǒ Ǔ ǔ Ǖ ǖ Ǘ ǘ Ǚ ǚ Ǜ ǜ ǝ Ǟ ǟ
01E0 Ǡ ǡ Ǣ ǣ Ǥ ǥ Ǧ ǧ Ǩ ǩ Ǫ ǫ Ǭ ǭ Ǯ ǯ
01F0 ǰ DZ Dz dz Ǵ ǵ Ƕ Ƿ Ǹ ǹ Ǻ ǻ Ǽ ǽ Ǿ ǿ
0200 Ȁ ȁ Ȃ ȃ Ȅ ȅ Ȇ ȇ Ȉ ȉ Ȋ ȋ Ȍ ȍ Ȏ ȏ
0210 Ȑ ȑ Ȓ ȓ Ȕ ȕ Ȗ ȗ Ș ș Ț ț Ȝ ȝ Ȟ ȟ
0220 Ƞ ȡ Ȣ ȣ Ȥ ȥ Ȧ ȧ Ȩ ȩ Ȫ ȫ Ȭ ȭ Ȯ ȯ
0230 Ȱ ȱ Ȳ ȳ ȴ ȵ ȶ ȷ ȸ ȹ Ⱥ Ȼ ȼ Ƚ Ⱦ ȿ
0240 ɀ Ɂ ɂ Ƀ Ʉ Ʌ Ɇ ɇ Ɉ ɉ Ɋ ɋ Ɍ ɍ Ɏ ɏ
0250 ɐ ɑ ɒ ɓ ɔ ɕ ɖ ɗ ɘ ə ɚ ɛ ɜ ɝ ɞ ɟ
0260 ɠ ɡ ɢ ɣ ɤ ɥ ɦ ɧ ɨ ɩ ɪ ɫ ɬ ɭ ɮ ɯ
0270 ɰ ɱ ɲ ɳ ɴ ɵ ɶ ɷ ɸ ɹ ɺ ɻ ɼ ɽ ɾ ɿ
0280 ʀ ʁ ʂ ʃ ʄ ʅ ʆ ʇ ʈ ʉ ʊ ʋ ʌ ʍ ʎ ʏ
0290 ʐ ʑ ʒ ʓ ʔ ʕ ʖ ʗ ʘ ʙ ʚ ʛ ʜ ʝ ʞ ʟ
02A0 ʠ ʡ ʢ ʣ ʤ ʥ ʦ ʧ ʨ ʩ ʪ ʫ ʬ ʭ ʮ ʯ
02B0 ʰ ʱ ʲ ʳ ʴ ʵ ʶ ʷ ʸ ʹ ʺ ʻ ʼ ʽ ʾ ʿ
Multimedia
Images can be stored in a computer in two common formats: bit-map image and vector
graphic.
Bit-map images
Bit-map images are made up of pixels (picture elements); the image is stored in a two
dimensional matrix of pixels.
When storing images as pixels, we have to consider:
• at least 8 bits (1 byte) per pixel are needed to code a coloured image (this gives 256
possible colours by varying the intensity of the blue, green and red elements)
• true colour requires 3 bytes per pixel (24 bits), which gives more than 16 million
colours
• the number of bits used to represent a pixel is called the colour depth.
In terms of images, we need to distinguish between bit depth and colour depth; for
example, the number of bits that are used to represent a single pixel (bit depth) will
determine the colour depth of that pixel. As the bit depth increases, the number of
possible colours which can be represented also increases. For example, a bit depth of 8
bits per pixel allows 256 (2^8) different colours (the colour depth) to be represented,
whereas using a bit depth of 32 bits per pixel results in 4 294 967 296 (2^32) different
colours. The impact of bit depth and colour depth is considered later.
We will now consider the actual image itself and how it can be displayed on a screen.
There are two important definitions here:
• Image resolution refers to the number of pixels that make up an image; for example,
an image could contain 4096 x 3192 pixels (13 074 432 pixels in total)
• Screen resolution refers to the number of horizontal pixels and the number of vertical
pixels that make up a screen display (for example, if the screen resolution is smaller
than the image resolution then the whole image cannot be shown on the screen or
the original image will now be a lower quality).
A pixel-generated image can be scaled up or scaled down; it is important to understand
that this can be done when deciding on the resolution. The resolution can be varied on
many cameras before taking, for example, a digital photograph. When magnifying an
image, the number of pixels that makes up the image remains the same but the area they
cover is now increased. This means some of the sharpness could be lost. The number of
pixels in a given area is known as the pixel density and is key when scaling up
photographs.
The main drawback of using high resolution images is the increase in file size. As the
number of pixels used to represent the image is increased, the size of the file will also
increase. This impacts on how many images can be stored on, for example, a hard drive. It
also impacts on the time to download an image from the internet or the time to transfer
images from device to device. Bit-map images rely on certain properties of the human eye
and, up to a point, the amount of file compression used. The eye can tolerate a certain
amount of resolution reduction before the loss of quality becomes significant.
If the image uses 2 bits to store the colour for each pixel, then the image size would be:
Number of pixels x colour depth = image size
67 500 x 2 bits = 135 000 bits
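The two calculations in this section (number of colours from bit depth, and image size from pixel count and colour depth) can be sketched in Python. The function names are assumptions for illustration, and the size ignores any file header or compression.

```python
def colours_for_bit_depth(bit_depth: int) -> int:
    """Number of different colours that a given bit depth can represent."""
    return 2 ** bit_depth


def image_size_bits(pixel_count: int, colour_depth_bits: int) -> int:
    """Uncompressed image size in bits: number of pixels x colour depth."""
    return pixel_count * colour_depth_bits


print(colours_for_bit_depth(8))    # 256
print(colours_for_bit_depth(24))   # 16777216
size = image_size_bits(67500, 2)
print(size, size / 8)              # 135000 16875.0
```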
Vector Graphics
Vector graphics are images that use 2D points to describe lines and curves and their
properties that are grouped to form geometric shapes. Vector graphics can be designed
using computer aided design (CAD) software or using an application which uses a drawing
canvas on the screen.
Comparison between vector graphics and bit-map images
Table: Comparison between vector graphics and bit-map images.
Vector graphic images | Bit-map images
made up of geometric shapes which require definition/attributes | made up of tiny pixels of different colours
to alter/edit the design, it is necessary to change each of the geometric shapes | possible to alter/edit each of the pixels to change the design of the image
they do not require large file size since they are made up of simple geometric shapes | because of the use of pixels (which give very accurate designs), the file size is very large
because the number of geometric shapes is limited, vector graphics are not usually very realistic | since images are built up from pixels, the final image is usually very realistic
file formats are usually .svg, .cgm, .odg | file formats are usually .jpeg, .bmp, .png
It is now worth considering whether a vector graphic or a bit-map image would be the
best choice for a given application. When deciding which is the better method, we should
consider the following.
• Does the image need to be resized? If so, a vector graphic could be the best option.
• Does the image need to be drawn to scale? Again, a vector graphic is probably the best
option.
• Does the image need to look real? Usually bit-map images look more realistic than
vector graphics.
• Are there file restrictions? If so, it is important to consider whether vector graphic
images can be used; if not, it would be necessary to consider the image resolution of a
bit-map image to ensure the file size is not too large.
For example, when designing a logo for a company or composing an ‘exploded diagram’
of a car engine, vector graphics are the best choice.
However, when modifying photographs using photo software, the best method is to use
bit-map images.
Sound files
Sound requires a medium in which to travel (it cannot travel in a vacuum). This
is because it is transmitted by causing oscillation of particles within the medium. The
human ear picks up these oscillations (changes in air pressure) and interprets them as
sound. Each sound wave has a frequency and wavelength; the amplitude specifies the
loudness of the sound.
Sound is an analogue value; this needs to be digitised in order to store sound in a
computer. This is done using an analogue to digital converter (ADC). If the sound is to be
used as a music file, it is often filtered first to remove higher frequencies and lower
frequencies which are outside the range of human hearing. To convert the analogue data to
digital, the sound waves are sampled at a given time rate. The amplitude of the sound
cannot be measured precisely, so approximate values are stored.
Figure: A sound wave
The figure above shows a sound wave. The x-axis shows the time intervals when the sound
was sampled (0 to 20), and the y-axis shows the amplitude of the sampled sound (the
amplitudes above 10 and below 0 are filtered out in this example).
At time interval 1, the approximate amplitude is 9; at time interval 2, the approximate
amplitude is 4. Binary bits can be used to represent each amplitude value (for example, 9
would be represented by the binary value 1001). Increasing the number of possible values
used to represent sound amplitude also increases the accuracy of the sampled sound
(for example, using a range of 0 to 127 gives a much more accurate representation of the
sound sample than using a range of, for example, 0 to 10). The number of bits used for
each amplitude value is known as the sampling resolution (also known as the bit depth).
So, how is sampling used to record a sound clip?
• The amplitude of the sound wave is first determined at set time intervals (the sampling
rate).
• This gives an approximate representation of the sound wave.
• The sound wave is then encoded as a series of binary digits.
Using a higher sampling rate or larger resolution will result in a more faithful
representation of the original sound source.
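The three sampling steps can be sketched in Python. Everything specific in this example is an assumption made for illustration: the sound is modelled as a pure 440 Hz sine wave, the function name `sample_wave` is hypothetical, and each amplitude is rounded to the nearest of 2^bits stored levels.

```python
import math


def sample_wave(num_samples: int, sample_rate_hz: int, bits: int, freq_hz: float = 440.0):
    """Sample a sine wave and store each amplitude as an integer level."""
    levels = 2 ** bits
    samples = []
    for n in range(num_samples):
        t = n / sample_rate_hz                               # time of this sample
        amplitude = math.sin(2 * math.pi * freq_hz * t)      # analogue value in [-1, 1]
        level = round((amplitude + 1) / 2 * (levels - 1))    # nearest stored level
        samples.append(level)
    return samples


coarse = sample_wave(8, 8000, bits=4)   # 16 possible amplitude values
fine = sample_wave(8, 8000, bits=7)     # 128 possible amplitude values
print(coarse)
print([format(v, "04b") for v in coarse])  # each sample encoded as binary digits
print(fine)
```

With more bits per sample, neighbouring amplitudes that collapse onto the same level in the coarse version are stored as distinct values in the fine version.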
Table: The pros and cons of using a larger sampling resolution when recording sound
Pros | Cons
larger dynamic range | produces larger file size
better sound quality | takes longer to transmit/download sound files
less sound distortion | requires greater processing power
Recorded sound is often edited using software. Common features of such software
include the ability to:
• edit the start/stop times and duration of a sample
• extract and save (or delete) part of a sample
• alter the frequency and amplitude of a sample
• fade in and fade out
• mix and/or merge multiple sound tracks or sources
• combine various sound sources together and alter their properties
• remove ‘noise’ to enhance one sound wave in a multiple of waves (for example, to
identify and extract one person’s voice out of a group of people)
• convert between different audio formats.
File Size
File size = sampling rate x sampling resolution x length of sound
If you wanted to record a 30 second voice message on your mobile phone you would use
the following:
Sampling rate = 8 000 Hz
Sampling resolution = 16 bits
Length of sound = 30 seconds
8 000 x 16 x 30 = 3 840 000 bits = 480 000 bytes
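The file size formula can be written as a small Python function. The name `sound_file_size_bits` is an assumption for illustration, and the sketch assumes a single (mono) channel with no compression or file header.

```python
def sound_file_size_bits(sample_rate_hz: int, resolution_bits: int, seconds: int) -> int:
    """File size in bits = sampling rate x sampling resolution x length of sound."""
    return sample_rate_hz * resolution_bits * seconds


bits = sound_file_size_bits(8000, 16, 30)
print(bits, bits // 8)  # 3840000 480000
```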
File Compression
It is often necessary to reduce the size of a file to either save storage space or to
reduce the time taken to stream or transmit data from one device to another. The two most
common forms of file compression are ‘lossless file compression’ and ‘lossy file
compression’.
Lossless File Compression
With this technique, all the data from the original file can be reconstructed when the
file is uncompressed again. This is particularly important for files where loss of any data
would be disastrous (such as a spreadsheet file of important results).
Lossy File Compression
With this technique, the file compression algorithm eliminates unnecessary data (as with
MP3 and JPEG formats, for example).
Lossless file compression, such as run-length encoding (RLE), is designed to lose none of
the original detail from the file. Lossy file compression usually results in some loss of
detail when compared to the original; it is usually impossible to reconstruct the original
file. The algorithms used in the lossy technique have to decide which parts of the file are
important (and need to be kept) and which parts can be discarded.
File Compression Applications
MP3 files are used in MP3 players, computers or mobile phones. Music files can be
downloaded or streamed from the internet in a compressed format, or CD files can be
converted to MP3 format. While streamed or MP3 music quality can never match the ‘full’
version found on a CD, the quality is satisfactory for most purposes.
But how can the original music file be reduced by 90% while still retaining most of the
music quality? This is done using file compression algorithms that use perceptual music
shaping.
Perceptual music shaping removes certain sounds. For example:
• frequencies that are outside the human hearing range
• if two sounds are played at the same time, only the louder one can be heard by the ear,
so the softer sound is eliminated.
This means that certain parts of the music can be removed without affecting the quality
too much. MP3 files use what is known as a lossy format, since part of the original file is
lost following the compression algorithm. This means that the original file cannot be put
back together again. However, even the quality of MP3 files can be different, since it
depends on the bit rate; this refers to the number of bits per second used when creating
the file. Bit rates are between 80 and 320 kilobits per second; usually 200 kilobits per
second or higher gives a sound quality close to a normal CD.
MPEG-4 (MP4) files are slightly different to MP3 files. This format allows the storage of
multimedia files rather than just sound. Music, videos, photos and animation can all be
stored in the MP4 format. Videos, for example, could be streamed over the internet using
the MP4 format without losing any real discernible quality.
Photographic (bit-map) images
When a photographic file is compressed, both the file size and quality of images are
reduced. A common file format for images is JPEG, which uses lossy file compression. Once
the image is subjected to the JPEG compression algorithm, a new file is formed and the
original file can no longer be reconstructed. A JPEG will reduce the raw bit-map image by a
factor of between 5 and 15, depending on the quality of the original.
Vector graphics can also undergo some form of file compression. Scalable vector
graphics (.svg) are defined in XML text files which, therefore, allows them to be
compressed.
Run-length encoding (RLE)
Run-length encoding (RLE) can be used to compress a number of different file formats.
It is a form of lossless/reversible file compression that reduces the size of a string of adja-
cent identical data (such as repeated colours in an image).
bytes. This is half the original file size.
One issue occurs with a string such as ‘cdcdcdcdcd’, where compression is not very
effective. To cope with this we use a flag. A flag preceding data indicates that what
follows is the number of repeating units (for example, 255 05 97 where 255 is the flag and
the other two numbers indicate that there are five items with ASCII code 97). When a flag
is not used, the next byte(s) are taken with their face value and a run of 1 (for example,
01 99 means one character with ASCII code 99 follows).
Consider this example:
String   aaaaaaaa bbbbbbbbbb c d c d c d eeeeeeee
Code     08 97 10 98 01 99 01 100 01 99 01 100 01 99 01 100 08 101
The original string contains 32 characters and would occupy 32 bytes of storage.
The coded version contains 18 values and would require 18 bytes of storage.
Introducing a flag (255 in this case) produces:
255 08 97 255 10 98 99 100 99 100 99 100 255 08 101
This has 15 values and would, therefore, require 15 bytes of storage. This is a reduction in
file size of about 53%.
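Both codings can be reproduced with a short Python sketch. The function names `rle_plain` and `rle_with_flag` are hypothetical, and the rule that every run longer than one character gets a flag is an assumption made for this illustration (a real scheme might only flag runs long enough to save space).

```python
from itertools import groupby

FLAG = 255  # marker value, as used in the text


def rle_plain(text: str):
    """Encode every run as a (count, ASCII code) pair."""
    out = []
    for ch, run in groupby(text):
        out += [len(list(run)), ord(ch)]
    return out


def rle_with_flag(text: str):
    """Flag repeated runs; single characters are stored at face value."""
    out = []
    for ch, run in groupby(text):
        count = len(list(run))
        if count > 1:
            out += [FLAG, count, ord(ch)]
        else:
            out.append(ord(ch))
    return out


text = "a" * 8 + "b" * 10 + "cdcdcd" + "e" * 8
print(len(text), len(rle_plain(text)), len(rle_with_flag(text)))  # 32 18 15
```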
Using RLE with images
A black and white 8 x 8 image can be stored as a grid of 1s and 0s (using W = 1 and B = 0):
1 1 1 1 1 1 1 1
1 0 0 0 0 0 0 1
1 0 1 1 1 1 1 1
1 0 1 1 1 1 1 1
1 0 0 0 0 0 1 1
1 0 1 1 1 1 1 1
1 0 1 1 1 1 1 1
1 0 1 1 1 1 1 1
Reading the grid row by row, this becomes:
9W 6B 2W 1B 7W 1B 7W 5B 3W 1B 7W 1B 7W 1B 6W
Using W = 1 and B = 0 we get:
91 60 21 10 71 10 71 50 31 10 71 10 71 10 61
The 8 x 8 grid would need 64 bytes; the compressed RLE format has 30 values, and
therefore needs only 30 bytes to store the image.
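The run lengths for the black and white image can be generated with a Python sketch. The grid below is an assumption: it is the 8 x 8 pattern implied by the run lengths 9W 6B 2W 1B ... 6W, written with 1 for white and 0 for black, and the function name `rle_image` is hypothetical.

```python
from itertools import groupby

# Assumed grid, derived from the run lengths in the text (1 = white, 0 = black).
GRID = [
    "11111111",
    "10000001",
    "10111111",
    "10111111",
    "10000011",
    "10111111",
    "10111111",
    "10111111",
]


def rle_image(rows):
    """Run-length encode the pixels, reading the grid row by row."""
    pixels = "".join(rows)
    return [(len(list(run)), "W" if value == "1" else "B")
            for value, run in groupby(pixels)]


runs = rle_image(GRID)
print(" ".join(f"{n}{c}" for n, c in runs))
print(len(runs) * 2)  # 30 values: one count and one colour per run
```

Runs continue across row boundaries, which is why the first run is 9W rather than 8W.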
Coloured images
The figure below shows an object in four colours. Each colour is made up of red, green and
blue (RGB) according to the code on the right.
Figure: Using RLE with a coloured image
Square colour    Red    Green    Blue
black            0      0        0
green            0      255      0
red              255    0        0
white            255    255      255
This produces the following data:
2 0 0 0 4 0 255 0 3 0 0 0 6 255 255 255 1 0 0 0 2 0 255 0 4 255 0 0 4 0 255 0 1 255 255 255 2 255
0 0 1 255 255 255 4 0 255 0 4 255 0 0 4 0 255 0 4 255 255 255 2 0 255 0 1 0 0 0 2 255 255 255 2
255 0 0 2 255 255 255 3 0 0 0 4 0 255 0 2 0 0 0
The original image (8 x 8 square) would need 3 bytes per square (to include all three RGB
values). Therefore, the uncompressed file for this image is 8 x 8 x 3 = 192 bytes.
The RLE code has 92 values, which means the compressed file will be 92 bytes in size. This
gives a file reduction of about 52%. It should be noted that the file reductions in reality
will not be as large as this due to other data which needs to be stored with the compressed
file (such as a file header).
General Methods of compressing files
All the above file compression techniques are excellent for very specific types of file.
However, it is also worth considering some general methods to reduce the size of a file
without the need to use lossy or lossless file compression:
Movie files
• reduce the sampling resolution
• reduce the frame rate
Binary - base two number system based on the values 0 and 1 only.
Bit - abbreviation for binary digit.
One’s complement - each binary digit in a number is reversed to allow both negative and
positive numbers to be represented.
Two’s complement - each binary digit is reversed and 1 is added in the right-most position
to produce another method of representing positive and negative numbers.
Sign and magnitude - binary number system where the left-most bit is used to represent the
sign (0 = + and 1 = -); the remaining bits represent the binary value.
Hexadecimal - a number system based on the value 16 (uses the denary digits 0 to 9 and
the letters A to F).
Memory dump - contents of a computer memory output to screen or printer.
Binary coded decimal (BCD) - number system that uses 4 bits to represent each denary
digit.
ASCII code - coding system for all the characters on a keyboard and control codes.
Character set - a list of characters that have been defined by computer hardware and
software. It is necessary to have a method of coding so that the computer can understand
human characters.
Unicode - coding system which represents all the languages of the world (the first 128
characters are the same as ASCII code).
Bit-map image - system that uses pixels to make up an image.
Pixel - smallest picture element that makes up an image.
Colour depth - number of bits used to represent the colours in a pixel, e.g. 8-bit colour
depth can represent 2^8 = 256 colours.
Bit depth - number of bits used to represent the smallest unit in, for example, a sound or
image file; the larger the bit depth, the better the quality of the sound or colour image.
Image resolution - number of pixels that make up an image; for example, an image could
contain 4096 x 3192 pixels (13 074 432 pixels in total).
Screen resolution - number of horizontal and vertical pixels that make up a screen display.
If the screen resolution is smaller than the image resolution, the whole image cannot be
shown on the screen, or the original image will become lower quality.
Resolution - number of pixels per column and per row on a monitor or television screen.
Pixel density - number of pixels per square centimetre.
Vector graphics - images that use 2D points to describe lines and curves and their
properties that are grouped to form geometric shapes.
Sampling resolution - number of bits used to represent sound amplitude (also known as
bit depth).
Sampling rate - number of sound samples taken per second.
Frame rate - number of video frames that make up a video per second.
Lossless file compression - file compression method where the original file can be restored
following decompression.
Lossy file compression - file compression method where parts of the original file cannot
be recovered during decompression, so some of the original detail is lost.
JPEG - Joint Photographic Expert Group - a form of lossy file compression based on the
inability of the eye to spot certain colour changes and hues.
MP3/MP4 files - file compression method used for music and multimedia files.
Audio compression - method used to reduce the size of a sound file using perceptual
music shaping.
Perceptual music shaping - method where sounds outside the normal range of hearing of
humans, for example, are eliminated from the music file during compression.
Bit rate - number of bits per second that can be transmitted over a network. It is a
measure of the data transfer rate over a digital telecoms network.
Run length encoding (RLE) - a lossless file compression technique used to reduce text and
photo files in particular.