1 - Data Representation

Uploaded by

Tajamul Khawar


Data Representation

Number systems

Eng. Tajammul Khawar

Representation of numbers

Numbers in everyday life are usually represented using the digits 0 to 9, but this is not the
only way in which a number can be represented. There are multiple number base systems,
which determine which digits are used to represent a number. The number system that we
are most familiar with is called denary or decimal (base-10), but binary (base-2)
and hexadecimal (hex or base-16) are also used by computers. You can perform
arithmetic calculations on numbers written in other base notations, and even convert
numbers between bases.

Binary and denary
Many ancient cultures developed the counting system that we use today, known as
the decimal system. It uses ten digit values, and it is likely that this common
approach was developed because humans have ten fingers (digits) to count
with. You may have also heard this system referred to as denary or base-10.

Computers obviously don't have fingers, but instead use tiny switches called transistors
that allow electricity to be on or off in a circuit. These circuits are combined to represent
data and the two states of on or off are represented as 1 or 0. This is known as
binary or base-2, as only two values can be used. Combinations of 1s and 0s can be
used by a computer to represent any type of information (e.g. numbers, text, images,
sound, program instructions).

Base - 10 (Denary)
The denary system is a method of assigning a place value to each digit in a number.

A place value is the numerical value that a digit has because of its position within a
number. For example, take the number 1892₁₀.

The place value of the digit 1 in 1892₁₀ is one thousand.

1000 100 10 1

1 8 9 2

In this number, there are:

• 1 thousand

• 8 hundreds

• 9 tens

• 2 ones

To work out what the place values are, you start from the first column on the right where
the 1 place value is and multiply by 10 as you move from right to left.

Base - 2 (Binary)
Binary is a base-2 number system. It only uses the digits 0 and 1. To understand how a
binary value translates to denary, you need to understand the place values for a base-2
system.

To work out what the place values are, you use the same process as you did with the
denary system and start from the first column on the right where the 1 place value is. But
when using base-2, you multiply by 2 each time as you move from right to left.

Take the following binary number: 0100₂

The place value of the digit 1 is 4.

8 4 2 1

0 1 0 0

In this number, there are:

• 0 eights

• 1 four

• 0 twos

• 0 ones

Converting from binary to denary
To convert from binary to denary, you need to know the place value of each digit in the
number.

When you are just starting to learn how to do this, it is a good idea to always use
a table to help you with the conversion.

For example, if working with a 4-bit binary number, use the following table:

8 4 2 1

For an 8-bit binary number, use the following table:

128 64 32 16 8 4 2 1

Example 1
Take the 4-bit binary number 1011₂ and place the digits from right to left in the table:

8 4 2 1

1 0 1 1

By doing this, you can see the value of each bit. Look at the place value for each bit
that is a 1 and add these place values together:

8 + 2 + 1 = 11

Therefore, the binary number 1011₂ is 11₁₀ in denary.

Example 2
A byte is equal to 8 bits. To convert the following byte, follow the same process but this
time use the table with eight columns.

Convert the following binary value into denary: 10001101₂

128 64 32 16 8 4 2 1

1 0 0 0 1 1 0 1

128 + 8 + 4 + 1 = 141₁₀

Therefore, the binary value 10001101₂ is equal to the denary value 141₁₀.
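
The table method can also be written as a short program. The Python sketch below is only an
illustration (the function name binary_to_denary is made up for this example): it walks
through the bits from right to left and adds up the place value of every column that holds a 1.

```python
def binary_to_denary(bits: str) -> int:
    """Add up the place value of every column that holds a 1."""
    total = 0
    place_value = 1  # the rightmost column has the place value 1
    for bit in reversed(bits):
        if bit == "1":
            total += place_value
        place_value *= 2  # each column to the left is worth twice as much
    return total


print(binary_to_denary("1011"))      # 11
print(binary_to_denary("10001101"))  # 141
```

Python's built-in int("10001101", 2) gives the same result, so it can be used to check answers.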

Converting from denary to binary
Converting from denary to binary is very similar to the process you used to convert
binary to denary.

The two main differences are as follows:

• You now need to work from left to right

• You need to subtract, not add

Again, it is a good idea to always use a table like this one.

128 64 32 16 8 4 2 1

Example 1
To describe the process, the denary number 5 will be converted into its binary equivalent.

Using your table, you start by looking for the highest place value that is less than or
equal to 5 and place a 1 in that column. In this case, it is the column with the place value
of 4, as all the place values to the left of this (8, 16, 32, etc.) are greater than 5.

128 64 32 16 8 4 2 1

Now that you have placed the 1 in that column, fill the empty spaces to the left with zeros:

128 64 32 16 8 4 2 1

0 0 0 0 0 1

Next, take the place value away from your current number (5).

5−4=1

The next step is to look for the highest place value that fits into the remaining number.
As in this instance the remaining number is 1, this neatly fits into the 1 column. Fill in the
gaps with 0s.

128 64 32 16 8 4 2 1

0 0 0 0 0 1 0 1

Now you have your conversion:

5₁₀ in denary is 101₂ in binary.

Example 2
In this example, follow the same process, but this time convert the larger denary value 76₁₀.

Step 1

Look for the highest place value that fits into the number 76 and place a 1 in that column.
In this case, it is the column with the place value of 64, as the place value to the left of
this (128) is greater than 76.

128 64 32 16 8 4 2 1

0 1

Now that you have placed the 1 in that column, you then take the value of the place value
away from your current number (76).

76−64=12

Step 2

Repeat the same process again and find the highest value that fits into the number you
have remaining (12). In this case, it is the column with the place value 8.

128 64 32 16 8 4 2 1

0 1 0 0 1

Complete the same calculation as last time to see what you have remaining:

12−8=4

Step 3

The remainder is now 4 and there is a place value for 4. Place a 1 in that column.

128 64 32 16 8 4 2 1

0 1 0 0 1 1

4−4=0

Now that there is no remainder, fill in the remaining columns with zeros.

128 64 32 16 8 4 2 1

0 1 0 0 1 1 0 0

Answer: The denary value 76₁₀ is equal to the binary value 1001100₂ (01001100₂ as an 8-bit value).
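
The subtraction method can be sketched in Python as follows. This is a minimal illustration,
not the only way to do the conversion; the function name denary_to_binary and the default
width of 8 bits are assumptions made for this example.

```python
def denary_to_binary(number: int, width: int = 8) -> str:
    """Work from the largest place value to the smallest, subtracting as you go."""
    if not 0 <= number < 2 ** width:
        raise ValueError("number does not fit in the chosen number of bits")
    bits = []
    remaining = number
    for power in range(width - 1, -1, -1):
        place_value = 2 ** power  # 128, 64, 32, ... , 1 for width 8
        if place_value <= remaining:
            bits.append("1")          # this place value fits
            remaining -= place_value  # take it away from what is left
        else:
            bits.append("0")          # this place value does not fit
    return "".join(bits)


print(denary_to_binary(5))   # 00000101
print(denary_to_binary(76))  # 01001100
```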

Base - 16 (Hexadecimal)
Hexadecimal (base-16, hex) is often used in computer science. This system uses a base
of 16 digits, i.e. 16 unique symbols are combined to make up all other numbers. There are
only ten symbols in the denary number system (0–9), and so in hexadecimal, a further six
symbols (the characters A–F) are used to represent the remaining six digits.

The 16 digits that form the base of the hexadecimal system correspond to the denary
values 0–15. Also, each hex digit is equivalent to four binary digits. Here are the sixteen
digits that form the base of the hex system:

Denary Hexadecimal Binary

0 0 0000

1 1 0001

2 2 0010

3 3 0011

4 4 0100

5 5 0101

6 6 0110

7 7 0111

8 8 1000

9 9 1001

10 A 1010

11 B 1011

12 C 1100

13 D 1101

14 E 1110

15 F 1111

Hexadecimal is used as a shorthand for a binary value. For example, look at how the denary
number 161₁₀ is represented as a binary number and in hexadecimal:

Binary 10100001

Hexadecimal A1

Why is hexadecimal used as shorthand for binary?

Hexadecimal is often used by people instead of binary because:

• It is easier to read and interpret

• It uses fewer digits to represent the same value

• Compared to binary, it is less likely that a digit will be written down incorrectly

Below are some areas of computing where you might come across the use of
hexadecimals.

Programming with colors

Often, when you pick a color in a program, a hexadecimal value is assigned to that
color.

Many programming languages and software applications allow programmers, designers,
and digital artists to enter their choice of color as a hexadecimal value. This is because,
compared to binary, the values are much easier to remember and to write out when they
need to be used.

Most electronic screens use RGB to display color. Each color combines 8 bits for a shade
of red (R), 8 bits for a shade of green (G), and 8 bits for a shade of blue (B).
Therefore, to represent any RGB color, 24 bits are needed.

It's very hard for anyone to remember a combination of 24 bits; it's much easier to
remember a hexadecimal value of just 6 digits.

For example, the color orange can be depicted as FFA500₁₆ instead of
111111111010010100000000₂.
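
To see how the six hex digits map onto the 24 bits, here is a small Python sketch. The
function names hex_color_to_bits and split_channels are made up for this illustration; the
code only uses Python's built-in int and format functions.

```python
def hex_color_to_bits(hex_color: str) -> str:
    """Return the 24-bit pattern for a 6-digit hexadecimal color."""
    value = int(hex_color, 16)
    return format(value, "024b")  # pad with leading zeros to 24 bits


def split_channels(hex_color: str) -> tuple:
    """Split the color into its 8-bit red, green and blue values."""
    value = int(hex_color, 16)
    red = (value >> 16) & 0xFF   # leftmost 8 bits
    green = (value >> 8) & 0xFF  # middle 8 bits
    blue = value & 0xFF          # rightmost 8 bits
    return red, green, blue


print(hex_color_to_bits("FFA500"))  # 111111111010010100000000
print(split_channels("FFA500"))     # (255, 165, 0)
```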

Media Access Control (MAC addresses)

A media access control (MAC) address is a number that relates to a network interface
controller. MAC addresses are usually displayed as a set of hexadecimal digits separated
by colons.

Memory dumps

A memory dump typically appears on a screen when the computer has crashed. It's called
a memory dump as it outputs the current state of the computer's working memory
to help the user in debugging the error.

By representing the memory dump as hexadecimals instead of binary numbers, the length
of the memory dump when it is displayed is reduced by 75%.

Converting binary to hexadecimal
To convert from binary to hexadecimal, you will use a process that involves:

1. Splitting your binary number into nibbles (sets of 4 bits)

2. Calculating the value of each nibble in denary using the place values for 4-bit
numbers

3. Converting each denary value into the corresponding hexadecimal digit

4. Reading the hexadecimal number from left to right

Reminder:

The denary values 0 to 9 are written the same way in hexadecimal. For example, 8₁₀ = 8₁₆

The denary numbers 10₁₀ to 15₁₀ are represented as follows:

Hex A B C D E F

Denary 10 11 12 13 14 15

For example: 12₁₀ = C₁₆

This is how to convert the following binary number into hexadecimal: 10011011₂

Step 1

Split the number into two nibbles:

1001 and 1011

Step 2

To calculate the value of each nibble in denary, use the place value table below to help
you:

8 4 2 1 8 4 2 1

1 0 0 1 1 0 1 1

• Nibble 1 (1001): 8 + 1 = 9

• Nibble 2 (1011): 8 + 2 + 1 = 11

Step 3

Convert the denary value into the corresponding hexadecimal digit:

• Nibble 1 (1001): 9₁₀ = 9₁₆

• Nibble 2 (1011): 11₁₀ = B₁₆

Step 4

Read the hexadecimal number from left to right:

10011011₂ = 9B₁₆
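
The four steps can be sketched in Python as shown below. This is an illustrative sketch
only; the names HEX_DIGITS and binary_to_hex are assumptions made for this example. It pads
the binary string on the left so that it splits cleanly into nibbles.

```python
HEX_DIGITS = "0123456789ABCDEF"  # position in the string = denary value


def binary_to_hex(bits: str) -> str:
    # Pad on the left so the number of bits is a multiple of 4
    padded_length = -(-len(bits) // 4) * 4
    bits = bits.zfill(padded_length)

    hex_digits = []
    for start in range(0, len(bits), 4):
        nibble = bits[start:start + 4]        # step 1: split into nibbles
        value = int(nibble, 2)                # step 2: nibble value in denary
        hex_digits.append(HEX_DIGITS[value])  # step 3: matching hex digit
    return "".join(hex_digits)                # step 4: read left to right


print(binary_to_hex("10011011"))  # 9B
```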

Converting hexadecimal to binary
To convert a hexadecimal number into binary:

1. Take each hex digit separately and find its equivalent denary value

2. Convert each denary value to a nibble (4-bit binary number) using appropriate
place values for each of the digits; each value has to be expressed using four
digits

3. Combine the nibbles and read the binary number from left to right

Example:

This is how to convert the hexadecimal number 6B₁₆ into binary.

Step 1

Find the equivalent denary number for each of the hex digits:

• 6₁₆ = 6₁₀

• B₁₆ = 11₁₀

Step 2

Convert each denary number into a 4-bit binary number; the place values for each set of
four binary digits are:

8 4 2 1 8 4 2 1

0 1 1 0 1 0 1 1

4+2=6 8 + 2 + 1 = 11

Step 3

Combine the nibbles and read the binary number from left to right:

6B₁₆ = 01101011₂

Step 4

If you were required to convert 6B₁₆ to a denary number, now that you have the binary
number, you can use the same method as converting binary to denary as follows:

128 64 32 16 8 4 2 1

0 1 1 0 1 0 1 1

64 + 32 + 8 + 2 + 1 = 107

Therefore you can say that:

6B₁₆ = 107₁₀
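
As a rough sketch, the same hexadecimal to binary to denary route can be followed in
Python. The function name hex_to_binary is made up for this example.

```python
def hex_to_binary(hex_string: str) -> str:
    """Turn each hex digit into its own 4-bit nibble and join the nibbles."""
    nibbles = []
    for digit in hex_string.upper():
        value = "0123456789ABCDEF".index(digit)  # step 1: denary value of the digit
        nibbles.append(format(value, "04b"))     # step 2: always use four bits
    return "".join(nibbles)                      # step 3: combine the nibbles


binary = hex_to_binary("6B")
print(binary)          # 01101011
print(int(binary, 2))  # 107
```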

Binary Addition
Rule 1: 0 + 0 = 0

Rule 2: 0 + 1 = 1 or 1 + 0 = 1

Rule 3: 1 + 1 = 0 carry 1

Rule 4: 1 + 1 + 1 = 1 carry 1

When performing an addition, you may be given two or more binary numbers to add
together. Put the numbers above each other, aligned to the right, then work through the
columns one at a time, starting from the rightmost column (the least significant bit) and
moving left. For each column, find which rule applies to it. A carried digit is put in the
next column to the left, and it counts when applying the rules to that column.

Worked example
Add the binary numbers 01111011 and 01101000.

Step 1: put the numbers together (these are in a table to help to get you started).

0 1 1 1 1 0 1 1

+ 0 1 1 0 1 0 0 0

Step 2: look at the rightmost column, 1 + 0.

Which rule does this follow? Rule 2. So the answer is 1.

0 1 1 1 1 0 1 1

+ 0 1 1 0 1 0 0 0

Step 3: look at the next rightmost column, 1 + 0.

Rule 2 again. Fill in the answer.

0 1 1 1 1 0 1 1

+ 0 1 1 0 1 0 0 0

1 1

Step 4: next column, 0 + 0.

Rule 1. The answer is 0.

0 1 1 1 1 0 1 1

+ 0 1 1 0 1 0 0 0

0 1 1

Step 5: next column, 1 + 1.

Rule 3. The answer is 0, carry 1. The carry is written in the next column to the left.

1

0 1 1 1 1 0 1 1

+ 0 1 1 0 1 0 0 0

0 0 1 1

Step 6: next column (there is now a bit in the carry that needs to be taken into account),
1 + 0 + 1.

Ignore the 0, there are two 1s, so this follows Rule 3. The answer is 0 carry 1.

1 1

0 1 1 1 1 0 1 1

+ 0 1 1 0 1 0 0 0

0 0 0 1 1

Step 7: next column (including the carry), 1 + 1 + 1.

There are three 1s, so this follows Rule 4. The answer is 1 carry 1.

1 1 1

0 1 1 1 1 0 1 1

+ 0 1 1 0 1 0 0 0

1 0 0 0 1 1

Step 8: next column (including the carry), 1 + 1 + 1.

Rule 4. 1 carry 1.

1 1 1 1

0 1 1 1 1 0 1 1

+ 0 1 1 0 1 0 0 0

1 1 0 0 0 1 1

Step 9: next column (including the carry), 0 + 0 + 1.

There is one 1, so this follows Rule 2. The answer is 1.

1 1 1 1

0 1 1 1 1 0 1 1

+ 0 1 1 0 1 0 0 0

1 1 1 0 0 0 1 1

Once you have completed an addition, convert the binary numbers to denary to check you
have done it correctly.

1 1 1 1

0 1 1 1 1 0 1 1 = 123

+ 0 1 1 0 1 0 0 0 = 104

1 1 1 0 0 0 1 1 = 227

Worked example
Add the binary numbers 10110111 and 11000111.

Step 1: put the numbers together and complete the rightmost column (two 1s, Rule 3).

1 0 1 1 0 1 1 1

+ 1 1 0 0 0 1 1 1

Step 2: check the second column (three 1s, Rule 4).

And continue until the end.

1 1 1 1

1 0 1 1 0 1 1 1

+ 1 1 0 0 0 1 1 1

0 1 1 1 1 1 1 0

There is an extra carry left over on this one. This is called overflow. It means that the two
(in this case) 8-bit numbers added together need more than 8 bits. They need 9. Show
this in the examination to make it clear you know what has happened. You may also be
asked what it is and why it is there.
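
The column-by-column rules, including the overflow case, can be sketched in Python. This
is an illustration under the assumption of a fixed 8-bit width; the function name add_binary
is made up for this example. It returns the 8-bit result and a flag that is True when a
carry is left over after the leftmost column.

```python
def add_binary(a: str, b: str, width: int = 8):
    """Add two binary strings column by column, from right to left."""
    a, b = a.zfill(width), b.zfill(width)
    carry = 0
    result = []
    for bit_a, bit_b in zip(reversed(a), reversed(b)):
        column_total = int(bit_a) + int(bit_b) + carry
        result.append(str(column_total % 2))  # the bit written under the column
        carry = column_total // 2             # 1 when the column holds two or three 1s
    overflow = carry == 1                     # a carry out of the leftmost column
    return "".join(reversed(result)), overflow


print(add_binary("01111011", "01101000"))  # ('11100011', False)
print(add_binary("10110111", "11000111"))  # ('01111110', True)
```

The second call uses the values shown in the table of the second worked example: the
result needs a ninth bit, so the overflow flag is True.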

Logical Shifts
Left shift
A logical left shift shifts all the bits in a binary string to the left by a specified number of
places.

For example, a left shift by one place would involve:

• Moving all of the bits in the string one place to the left

• Discarding the most significant (leftmost) bit

• Putting a 0 into the empty place on the right

If the string represents a number, this operation is equivalent to multiplying the number
by 2, provided that no 1 bit is discarded. Each shift to the left will multiply the number
by 2, so performing a shift three places to the left on a binary number is the same as
multiplying the number by 2³ = 8.

Consider this example of multiplying 14 by 8. The binary value is shifted left three times to
obtain the result 112:

128 64 32 16 8 4 2 1

14 0 0 0 0 1 1 1 0

28 0 0 0 1 1 1 0 0

56 0 0 1 1 1 0 0 0

112 0 1 1 1 0 0 0 0

Right shift
A logical right shift shifts all of the bits in a binary string to the right by a specified
number of places.

For example, a right shift by one place would involve:

• Moving all of the bits in the string one place to the right

• Discarding the least significant (rightmost) bit

• Putting a 0 into the empty place on the left

If the string represents a number, this operation is equivalent to dividing the number by 2
(any remainder is discarded). In general terms, you can say that shifting a binary number
right by n places has the effect of dividing the number by 2ⁿ.
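
Both kinds of logical shift can be sketched on fixed-width bit strings in Python. The
function names below are made up for this illustration, and the bit string is assumed to
be non-empty.

```python
def logical_left_shift(bits: str, places: int) -> str:
    """Add 0s on the right, then keep only the rightmost len(bits) bits."""
    width = len(bits)
    return (bits + "0" * places)[-width:]


def logical_right_shift(bits: str, places: int) -> str:
    """Add 0s on the left, then keep only the leftmost len(bits) bits."""
    width = len(bits)
    return ("0" * places + bits)[:width]


print(logical_left_shift("00001110", 3))   # 01110000 (14 becomes 112)
print(logical_right_shift("00001110", 1))  # 00000111 (14 becomes 7)
print(logical_left_shift("11000000", 1))   # 10000000 (a 1 bit has been discarded)
```

The last line shows why a left shift only multiplies by 2 while no 1 bit falls off the
left-hand end.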

Signed integers in binary

Whole numbers such as 7, 12 and 3988 are called integers. Unsigned integers are zero or
positive by definition, while signed integers can be positive or negative; the
numbers that are larger than zero are called positive, and the ones smaller than zero are
called negative. In denary, negative integers are represented using a minus symbol before
the value of the number, e.g. −19. In binary, there are several ways to represent signed
integers, the most common being two's complement.

Denary to two’s complement

If the denary number is positive, then the conversion is the same as converting to binary.
However, because it is positive, the first bit (the most significant bit) must be a 0 to
show it is positive.

For example, 23 in binary is 10111. However, this starts with a 1, which would indicate that
the number is negative. Add a 0 at the front to indicate that it is positive: 010111.

If the denary number is negative, then convert it to two’s complement form. There are
a number of ways of doing this, but we’ll stick with one method here. You convert the
denary number to binary (as normal), then flip every bit (if it’s a 1, replace it with a 0, if it’s
a 0 replace it with a 1), then add binary 1 to it.

Worked example
Convert the denary number –35 into two’s complement.

Step 1: write +35 in binary (add 0 at the left to show it is positive).

00100011

Step 2: is the number you are converting negative? Yes, so flip every bit.

11011100

Step 3: add 1.

1 1 0 1 1 1 0 0

+ 0 0 0 0 0 0 0 1

1 1 0 1 1 1 0 1

Step 4: write the answer.

11011101
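
The flip-and-add-1 method can be sketched in Python. This is a minimal illustration that
assumes a fixed width of 8 bits by default; the function name to_twos_complement is made up
for this example.

```python
def to_twos_complement(number: int, width: int = 8) -> str:
    """Convert a denary integer to a two's complement bit string."""
    if not -(2 ** (width - 1)) <= number < 2 ** (width - 1):
        raise ValueError("number does not fit in the chosen number of bits")

    # Step 1: write the positive value in binary, padded with leading 0s
    magnitude = format(abs(number), f"0{width}b")
    if number >= 0:
        return magnitude

    # Step 2: flip every bit
    flipped = "".join("1" if bit == "0" else "0" for bit in magnitude)

    # Step 3: add 1
    return format(int(flipped, 2) + 1, f"0{width}b")


print(to_twos_complement(-35))  # 11011101
print(to_twos_complement(23))   # 00010111
```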

Data Representation
Text, sound and images

Sir Tajammul Khawar

ASCII and Unicode
To represent text digitally, each character needs to have its own unique bit-pattern.
Bit-patterns are combinations of 1s and 0s used to represent data inside a computer. The
bit-pattern used for each character becomes a numeric character code.

A character can be any of the following:

• Letters (upper and lower case letters have separate codes)

• Punctuation (e.g. ?/|\£$)

• Numbers (0–9)

• Non-printing commands (e.g. Enter, Delete, F1)

For computers to be able to communicate and exchange text between each other
efficiently, they must have an agreed standard that defines which character code is used
for which character. A standardized collection of characters and the bit-patterns used to
represent them is called a character set.

ASCII
ASCII stands for 'American Standard Code for Information Interchange'. It was defined in
1963 and was one of the most common character sets used. It started by using 7 bits to
represent characters, which allowed for a maximum of 128 (2⁷) characters to be
represented.

These days, 8 bits (1 byte) are used to store each character in the ASCII character set. The
original coding system remains, but each code now has a preceding 0, so there are still
128 bit-patterns in the set. The eighth bit was sometimes used as a parity bit for checking
for errors during the transmission of data.

When text is encoded and stored using ASCII, each of the characters is assigned a denary
(decimal) character code, which is represented and stored in the computer as binary.

If you look carefully at the ASCII representation of each character, you might notice some
patterns. For example:

Character Character code in denary Character code in binary

a 97 0110 0001

b 98 0110 0010

c 99 0110 0011

As you can see, a is represented by 97, b is represented by 98, and c is represented by 99.
This means that if you know the denary value of a character, you can easily work out the
denary values of the previous and subsequent characters.
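
You can explore these character codes in Python with the built-in functions ord (character
to code) and chr (code to character). The helper name character_codes below is made up for
this illustration.

```python
def character_codes(text: str) -> list:
    """Return (character, denary code, 8-bit binary code) for each character."""
    return [(ch, ord(ch), format(ord(ch), "08b")) for ch in text]


for character, denary, binary in character_codes("abc"):
    print(character, denary, binary)

# Knowing that 'a' is 97, the next letter is simply 97 + 1
print(chr(ord("a") + 1))  # b
```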

Extended ASCII
There are also extensions to this standard, such as extended ASCII, which uses 8 bits to
represent characters, which raises the possible number of characters to 2⁸ = 256.

Unicode
The problem with ASCII is that it only allows you to represent a small number of characters (128
for standard ASCII). This might be enough to represent the characters in the English alphabet,
but it is not sufficient to represent all of the languages and scripts in the world, and all of the
possible numbers and symbols. For example, ASCII can't possibly store the hundreds of
thousands of characters in the scripts below in just 8 bits.

• Chinese characters 汉字

• Japanese characters 漢字

• Cyrillic Кири́ллица

• Gujarati ગુજરાતી

• Urdu اردو

• Greek ελληνικά

Moreover, the widespread use of the World Wide Web made it more important to have a universal
international coding system, as the range of platforms and programs has increased dramatically,
with more developers from around the world using a much wider range of characters.

The character set that is most commonly used instead is Unicode. Each Unicode character can be
encoded on a computer with one of three encoding standards, which differ in the
minimum number of bits used per character: UTF-8 (8 bits), UTF-16 (16 bits) and UTF-32 (32 bits).

With over a million possible characters, we are able to store every character from every
alphabet, and include a range of special symbols (e.g. mathematical operators, geometric
shapes, arrows, emojis, ideograms, etc.). The first 128 codes in Unicode and ASCII are
used to represent the same characters.

Goals of Unicode

• create a universal standard that covers all languages and all writing systems

• produce a more efficient coding system than ASCII

• adopt uniform encoding where each character is encoded as a 16-bit or 32-bit code

• create unambiguous encoding where each 16-bit and 32-bit value always represents
the same character

Character codes for numeric digits
A number can be represented as a set of characters. For example, the number 35 can be
represented as the characters '3' and '5'. When a denary digit (from 0 to 9) is processed as
a character, the computer uses the binary pattern of its character code, instead of the
binary representation of that digit. For example, the binary representation of the number
35 using 8 bits is 00100011₂, but the binary pattern for '35' is 0011001100110101₂. This is
because the character code for '3' using 8-bit ASCII is 51₁₀ = 00110011₂ and the character
code for '5' is 53₁₀ = 00110101₂. Therefore, it is important that you can tell the difference
between the binary representation of a denary number, and the (different) binary pattern
for that number when it is stored as a set of characters.
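
The difference is easy to see in a short Python sketch. The two function names are made up
for this illustration; the character codes come from Python's built-in ord function.

```python
def number_to_bits(number: int) -> str:
    """Binary representation of the number itself, using 8 bits."""
    return format(number, "08b")


def characters_to_bits(text: str) -> str:
    """Binary pattern of the text: one 8-bit character code per character."""
    return "".join(format(ord(character), "08b") for character in text)


print(number_to_bits(35))        # 00100011
print(characters_to_bits("35"))  # 0011001100110101
```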

Representation of images

All data on a computer system is represented using binary patterns, which are sequences
of 1s and 0s. In order to represent an image, one method is to store it as if it were a grid of
colored squares, with each color represented by a unique binary pattern. The image
dimensions and the number of colors used are factors that affect the size of the image
file.

At a more advanced level, you will learn that images can also be stored as mathematical
equations describing shapes, which are then rendered back into an image when viewed by
the user. It is useful to know the benefits and drawbacks of each image representation
method in order to decide the correct format in which to save a particular image.

Bitmap images
Bitmap images are made up of pixels (picture elements); an image is made up of a
two-dimensional matrix of pixels. Pixels can take different shapes.

Each pixel can be represented as a binary number, and so a bitmap image is
stored in a computer as a series of binary numbers, so that:

• a black and white image only requires 1 bit per pixel – this means that each pixel can
be one of two colors, corresponding to either 1 or 0

• if each pixel is represented by 2 bits, then each pixel can be one of four colors (2² =
4), corresponding to 00, 01, 10, or 11

• if each pixel is represented by 3 bits, then each pixel can be one of eight colors (2³ =
8), corresponding to 000, 001, 010, 011, 100, 101, 110, 111.

The number of bits used to represent the color of each pixel is called the color depth. An
8-bit color depth means that each pixel can be one of 256 colors (because 2⁸ = 256).
Modern computers commonly use a 24-bit color depth, which means over
16 million different colors can be represented. As a generalization, with x bits per
pixel, 2ˣ colors can be represented. Increasing color depth also increases the size of
the file when storing an image.
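
The relationship between color depth and the number of available colors is a single power
of 2, as this small Python sketch shows. The function name number_of_colors is made up for
this illustration.

```python
def number_of_colors(color_depth_bits: int) -> int:
    """Each extra bit per pixel doubles the number of possible colors."""
    return 2 ** color_depth_bits


for depth in (1, 2, 3, 8, 24):
    print(depth, "bits per pixel:", number_of_colors(depth), "colors")
```

For a 24-bit color depth this gives 16 777 216 colors, which is the "over 16 million"
figure quoted for modern computers.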

Image resolution refers to the number of pixels that make up an image; for example, an
image could contain 4096 × 3072 pixels (12 582 912 pixels in total).

The resolution can be varied on many cameras before taking, for example, a digital
photograph. Photographs with a lower resolution have less detail than those with a
higher resolution.

Image ‘A’ has the highest resolution and ‘E’ has the lowest resolution. ‘E’ has become
pixelated (‘fuzzy’). This is because there are fewer pixels in ‘E’ to represent the image.

The main drawback of using high resolution images is the increase in file size. As the
number of pixels used to represent the image is increased, the size of the file will also
increase. This impacts on how many images can be stored on, for example, a hard drive.
It also impacts on the time to download an image from the internet or the time to transfer
images from device to device. A certain amount of reduction in resolution of an image is
possible before the loss of quality becomes noticeable.

Representation of sound

Sound waves are vibrations in the air. The human ear senses these vibrations and
interprets them as sound.

Each sound wave has a frequency, wavelength and amplitude. The amplitude specifies the
loudness of the sound.

Sound waves vary continuously. This means that sound is analogue. Computers cannot
work with analogue data, so sound waves need to be sampled in order to be stored in a
computer. Sampling means measuring the amplitude of the sound wave. This is done
using an analogue to digital converter (ADC).
To convert the analogue data to digital, the sound waves are sampled at regular time
intervals. The amplitude of the sound cannot be measured precisely, so approximate
values are stored.

The x-axis shows the time intervals when the sound was sampled (1 to 21), and the y-axis
shows the amplitude of the sampled sound (0 to 10).

At time interval 1, the approximate amplitude is 10; at time interval 2, the approximate
amplitude is 4, and so on for all 20 time intervals. Because the amplitude range in Figure
1.9 is 0 to 10, 4 binary bits can be used to represent each amplitude value (for
example, 9 would be represented by the binary value 1001). Increasing the number of
possible values used to represent sound amplitude also increases the accuracy of the
sampled sound (for example, using a range of 0 to 127 gives a much more accurate
representation of the sound sample than using a range of 0 to 10). The
number of bits per sample is known as the sampling resolution (also known as the bit
depth). So, in our example, the sampling resolution is 4 bits.
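
As a rough sketch, the number of bits needed for a given amplitude range, and the binary
value stored for each sample, can be worked out in Python. The function names and the
short list of sample values are assumptions made for this illustration; they are not read
from the figure.

```python
def bits_needed(max_amplitude: int) -> int:
    """Smallest number of bits that can hold every value from 0 to max_amplitude."""
    return max(1, max_amplitude.bit_length())


def encode_samples(samples: list, max_amplitude: int) -> list:
    """Store each sampled amplitude as a fixed-width binary value."""
    width = bits_needed(max_amplitude)
    return [format(sample, f"0{width}b") for sample in samples]


print(bits_needed(10))                 # 4
print(bits_needed(127))                # 7
print(encode_samples([10, 4, 9], 10))  # ['1010', '0100', '1001']
```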

Sampling rate is the number of sound samples taken per second. This is measured in
hertz (Hz), where 1 Hz means ‘one sample per second’.

So how is sampling used to record a sound clip?

• the amplitude of the sound wave is first determined at set time intervals (the
sampling rate)

• this gives an approximate representation of the sound wave

• each sample of the sound wave is then encoded as a series of binary digits.

Using a higher sampling rate or larger sampling resolution will result in a more faithful
representation of the original sound source. However, the higher the sampling rate
and/or sampling resolution, the greater the file size.

CDs have a 16-bit sampling resolution and a 44.1 kHz sample rate – that is 44 100 samples
every second. This gives high-quality sound reproduction.

Benefits of a higher sampling rate or sampling resolution:

• larger dynamic range

• better sound quality

• less sound distortion

Drawbacks:

• produces larger file size

• takes longer to transmit/download music files

• requires greater processing power

Data Representation
Data Storage and
compression

Sir Tajammul Khawar

Units of data storage
A binary digit (or bit) is the fundamental unit of data storage, and will have a value of 0
or 1. A group of eight bits is called a byte. A group of four bits is called a nibble.

Historically, storage capacity was expressed using the metric prefixes
of kilo (1,000), mega (1,000,000), etc. Since 1998 there has been a move towards using the
special binary prefixes defined by the International Electrotechnical Commission (IEC) to
represent sizes based on powers of 2 more accurately, while the International System of
Units (SI) prefixes keep their power-of-10 meaning. For example, a kibibyte is equal to 1,024
bytes, whereas a kilobyte is equal to 1,000 bytes.

The differences between the two systems are shown below; pay close attention to which
letters are capitalized:

Name Notation Power of 10 Value

kilobyte kB 10³ 1,000 bytes

megabyte MB 10⁶ 1,000,000 bytes

gigabyte GB 10⁹ 1,000,000,000 bytes

terabyte TB 10¹² 1,000,000,000,000 bytes

Name Notation Power of 2 Value

kibibyte KiB 2¹⁰ 1024¹ = 1,024 bytes

mebibyte MiB 2²⁰ 1024² = 1,048,576 bytes

gibibyte GiB 2³⁰ 1024³ = 1,073,741,824 bytes

tebibyte TiB 2⁴⁰ 1024⁴ = 1,099,511,627,776 bytes
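
The two families of units can be compared with a short Python sketch. The dictionary and
function names below are made up for this illustration.

```python
# Bytes per unit: powers of 10 for the metric prefixes, powers of 2 for the binary ones
DECIMAL_UNITS = {"kB": 10 ** 3, "MB": 10 ** 6, "GB": 10 ** 9, "TB": 10 ** 12}
BINARY_UNITS = {"KiB": 2 ** 10, "MiB": 2 ** 20, "GiB": 2 ** 30, "TiB": 2 ** 40}


def convert_bytes(number_of_bytes: int, unit: str) -> float:
    """Express a size given in bytes in the chosen unit."""
    units = {**DECIMAL_UNITS, **BINARY_UNITS}
    return number_of_bytes / units[unit]


print(BINARY_UNITS["TiB"])              # 1099511627776
print(convert_bytes(8_388_608, "MiB"))  # 8.0
print(convert_bytes(8_388_608, "MB"))   # 8.388608
```

The same number of bytes gives a smaller figure in MiB than in MB, because a mebibyte is
slightly larger than a megabyte.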

Data storage and file compression

Calculation of file size

The file size of an image is calculated as:

image resolution (in pixels) × color depth (in bits)

The size of a mono sound file is calculated as:

sample rate (in Hz) × sample resolution (in bits) × length of sample (in seconds)

For a stereo sound file, you would then multiply the result by two.

Worked example
A photograph is 1024 × 1080 pixels and uses a color depth of 32 bits. How many
photographs of this size would fit onto a memory stick of 64GiB?

1. Multiply number of pixels in vertical and horizontal directions to find total number
of pixels = (1024 × 1080) = 1 105 920 pixels

2. Now multiply number of pixels by color depth to give the number of bits, then divide
by 8 to give the number of bytes: 1 105 920 × 32 = 35 389 440 bits; 35 389 440 / 8 = 4 423 680 bytes

3. 64 GiB = 64 × 1024 × 1024 × 1024 = 68 719 476 736 bytes

4. Finally divide the memory stick size by the file size = 68 719 476 736 / 4 423 680
= 15 534 photos (rounded down to a whole number).
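
The same calculation can be checked with a few lines of Python. This is a sketch only; the
function name image_size_bytes is made up for this example, and it assumes the total number
of bits is a multiple of 8.

```python
def image_size_bytes(width_px: int, height_px: int, color_depth_bits: int) -> int:
    """File size = image resolution (in pixels) x color depth (in bits), in bytes."""
    total_bits = width_px * height_px * color_depth_bits
    return total_bits // 8  # 8 bits in a byte


photo_bytes = image_size_bytes(1024, 1080, 32)
stick_bytes = 64 * 1024 ** 3  # 64 GiB in bytes

print(photo_bytes)                 # 4423680
print(stick_bytes // photo_bytes)  # 15534 whole photos
```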

Worked example
A camera detector has an array of 2048 by 2048 pixels and uses a color depth of 16 bits. Find
the size of an image taken by this camera in MiB.

1. Multiply number of pixels in vertical and horizontal directions to find total number
of pixels = (2048 × 2048) = 4 194 304 pixels

2. Now multiply number of pixels by color depth = 4 194 304 × 16 = 67 108 864 bits

3. Now divide number of bits by 8 to find the number of bytes in the file = (67 108 864)/8
= 8 388 608 bytes

4. Now divide by 1024 × 1024 to convert to MiB = (8 388 608)/(1 048 576) = 8 MiB.

Worked example
An audio CD has a sample rate of 44 100 Hz and a sample resolution of 16 bits. The music
being sampled uses two channels to allow for stereo recording. Calculate the file size for
a 60-minute recording.

1. Size of file = sample rate (in Hz) × sample resolution (in bits) × length of sample (in seconds)

2. Size of sample = (44 100 × 16 × (60 × 60)) = 2 540 160 000 bits

3. Multiply by 2 since there are two channels being used = 5 080 320 000 bits

4. Divide by 8 to find number of bytes = (5 080 320 000)/8 = 635 040 000 bytes

5. Divide by 1024 × 1024 to convert to MiB = 635 040 000 / 1 048 576 ≈ 605.6 MiB.
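
The sound file formula can be sketched in Python in the same way. The function name
sound_size_bytes and its channels parameter are assumptions made for this illustration.

```python
def sound_size_bytes(sample_rate_hz: int, resolution_bits: int,
                     seconds: int, channels: int = 1) -> int:
    """Size = sample rate x sample resolution x length, once per channel, in bytes."""
    total_bits = sample_rate_hz * resolution_bits * seconds * channels
    return total_bits // 8  # 8 bits in a byte


size_bytes = sound_size_bytes(44_100, 16, 60 * 60, channels=2)
print(size_bytes)                          # 635040000
print(round(size_bytes / 1024 ** 2, 1))    # 605.6 (MiB)
```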

Data compression
The previous calculations show that sound and image files can be very large. It is
therefore often necessary to reduce (or compress) the size of a file for the following reasons:

• to save storage space on devices such as the hard disk drive/solid state drive

• to reduce the time taken to stream a music or video file

• to reduce the time taken to upload, download or transfer a file across a network

• to use less network bandwidth – bandwidth is the maximum rate of transfer of data
across a network, measured in bits per second, and it is used whenever a file is
downloaded, for example, from a server. Compressed files contain fewer bits of data than
uncompressed files and therefore use less bandwidth, which results in a faster transfer.

• to reduce costs. For example, when using cloud storage, the cost
is based on the size of the files stored. Also, an internet service provider (ISP)
may charge a user based on the amount of data downloaded.

Lossy file compression
With this technique, the file compression algorithm eliminates unnecessary data from the
file. This means the original file cannot be reconstructed once it has been compressed.
Lossy file compression results in some loss of detail when compared to the original file.
The algorithms used in the lossy technique have to decide which parts of the file need to
be retained and which parts can be discarded.
For example, when applying a lossy file compression algorithm to:

• an image, it may reduce the resolution and/or the bit/color depth

• a sound file, it may reduce the sampling rate and/or the sample resolution.

Lossy file compression algorithms

• MP3 (MPEG-1 Audio Layer III) and MP4 (MPEG-4)

• JPEG.

MP3
MP3 files are used for playing music on computers or mobile phones. This
compression technology will reduce the size of a normal music file by about 90%.
While MP3 music files can never match the sound quality found on a DVD or CD, the
quality is satisfactory for most general purposes.

But how can the original music file be reduced by 90% while still retaining most of the
music quality? Essentially the algorithm removes sounds that the human ear can’t hear
properly. For example:

• removal of sounds outside the human ear range

• if two sounds are played at the same time, only the louder one can be heard
by the ear, so the softer sound is eliminated. This is called perceptual music shaping.

JPEG
When a camera takes a photograph, it produces a raw bitmap file which can be very large
in size. These files are temporary in nature. JPEG is a lossy file compression algorithm
used for bitmap images. As with MP3, once the image is subjected to the JPEG
compression algorithm, a new file is formed and the original file can no longer be
reconstructed.
The JPEG file reduction process is based on two key concepts:

• human eyes don’t detect differences in color shades quite as well as they detect
differences in image brightness (the eye is less sensitive to color variations than it is to
variations in brightness)

• by separating pixel color from brightness, images can be split into 8 × 8 pixel blocks,
for example, which then allows certain ‘information’ to be discarded from the image
without causing any real noticeable deterioration in quality.
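As a rough illustration of these two ideas, the sketch below separates a pixel into one brightness value (Y) and two color values (Cb, Cr), and lists the 8 × 8 blocks that cover an image. It uses the full-range conversion formulas defined for JPEG files in the JFIF specification; the function names are illustrative, and the later JPEG stages that actually discard information are not shown.

```python
def rgb_to_ycbcr(r, g, b):
    """Split an 8-bit RGB pixel into brightness (Y) and color (Cb, Cr) parts."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return round(y), round(cb), round(cr)


def block_origins(width, height, block=8):
    """Top-left corner of each block x block tile covering the image."""
    return [(x, y) for y in range(0, height, block)
                   for x in range(0, width, block)]


print(rgb_to_ycbcr(255, 255, 255))    # (255, 128, 128): full brightness, neutral color
print(rgb_to_ycbcr(0, 0, 0))          # (0, 128, 128): no brightness, neutral color
print(len(block_origins(16, 8)))      # 2 blocks of 8 x 8 pixels
```

Black, white and grey pixels differ only in Y, which is why the color values can be stored less precisely than the brightness without the eye noticing much.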

Lossless file compression
With this technique, all the data from the original uncompressed file can be
reconstructed. This is particularly important for files where any loss of data would be
disastrous (e.g. when transferring a large and complex spreadsheet or when downloading
a large computer application).

Lossless file compression is designed so that none of the original detail from the file is
lost.

Run-length encoding (RLE)

• it is a form of lossless/reversible file compression

• it reduces the size of a string of adjacent, identical data (e.g. repeated colors in an
image)

• a repeating string is encoded into two values:

    • the first value represents the number of identical data items (e.g. characters) in
    the run

    • the second value represents the code of the data item (such as ASCII code if it
    is a keyboard character)

• RLE is only effective where there is a long run of repeated units/bits.

Using RLE on text data

Consider the following text string: ‘aaaaabbbbccddddd’. Assuming each character
requires 1 byte then this string needs 16 bytes. If we assume ASCII code is being used, then
the string can be coded as follows:

aaaaa    bbbb     cc       ddddd
05 97    04 98    02 99    05 100

This means we have five characters with ASCII code 97, four characters with ASCII code
98, two characters with ASCII code 99 and five characters with ASCII code 100. Assuming
each number in the second row requires 1 byte of memory, the RLE code will need 8 bytes.
This is half the original file size.
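A minimal Python sketch of this count-then-code scheme follows. The function names `rle_encode` and `rle_decode` are illustrative, and the sketch assumes every run is at most 255 items long so that each count fits in 1 byte.

```python
from itertools import groupby


def rle_encode(text):
    """Encode text as a flat list of (run length, ASCII code) pairs."""
    encoded = []
    for char, run in groupby(text):          # groupby yields each run of equal characters
        encoded += [len(list(run)), ord(char)]
    return encoded


def rle_decode(encoded):
    """Rebuild the original text from (run length, ASCII code) pairs."""
    pairs = zip(encoded[0::2], encoded[1::2])
    return "".join(chr(code) * count for count, code in pairs)


codes = rle_encode("aaaaabbbbccddddd")
print(codes)                # [5, 97, 4, 98, 2, 99, 5, 100]
print(len(codes))           # 8 values, so 8 bytes instead of 16
print(rle_decode(codes))    # aaaaabbbbccddddd
```

Decoding returns exactly the original string, which is what makes RLE lossless.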

One issue occurs with a string such as ‘cdcdcdcdcd’ where RLE compression isn’t very
effective, because every run has a length of 1. To cope with this, we use a flag. A flag
preceding data indicates that what follows is the number of repeating units and their code
(for example, 255 05 97 where 255 is the flag and the other two numbers indicate that there
are five items with ASCII code 97). When a flag is not used, the next byte(s) are taken with
their face value and a run of 1 (for example, 01 99 means one character with ASCII code 99
follows).
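One possible reading of the flag scheme is sketched below in Python. The threshold `MIN_RUN`, the function names, and the choice to store unflagged characters as single bytes are assumptions for illustration. The sketch also assumes the data never contains the flag value 255 itself.

```python
from itertools import groupby

FLAG = 255      # flag value taken from the example in the text
MIN_RUN = 4     # assumption: a flagged run costs 3 bytes, so flag only longer runs


def flag_rle_encode(text):
    """Flag long runs as FLAG, count, code; store short runs at face value."""
    out = []
    for char, run in groupby(text):
        count = len(list(run))
        if count >= MIN_RUN:
            out += [FLAG, count, ord(char)]
        else:
            out += [ord(char)] * count
    return out


def flag_rle_decode(data):
    """Expand flagged runs; treat every other byte as one character."""
    chars = []
    i = 0
    while i < len(data):
        if data[i] == FLAG:
            count, code = data[i + 1], data[i + 2]
            chars.append(chr(code) * count)
            i += 3
        else:
            chars.append(chr(data[i]))
            i += 1
    return "".join(chars)


print(flag_rle_encode("aaaaacdcd"))           # [255, 5, 97, 99, 100, 99, 100]
print(len(flag_rle_encode("cdcdcdcdcd")))     # 10 bytes, no larger than the original
```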

Consider this example:

The original string contains 32 characters and would occupy 32 bytes of storage. The
coded version contains 18 values and would require 18 bytes of storage. Introducing a flag
(255 in this case) produces:

Using RLE with images

Worked example

The figure shows the letter ‘F’ in a grid where each square requires 1 byte of storage. A white
square has a value of 1 and a black square a value of 0:

The 8 × 8 grid would need 64 bytes; the compressed RLE format has 30 values, and
therefore needs only 30 bytes to store the image.

Using RLE with images

Worked example

The figure shows an object in four colors. Each color is made up of red, green and blue
(RGB) according to the code on the right.

This produces the following data: 2 0 0 0 4 0 255 0 3 0 0 0 6 255 255 255 1 0 0 0 2 0 255 0 4
255 0 0 4 0 255 0 1 255 255 255 2 255 0 0 1 255 255 255 4 0 255 0
4 255 0 0 4 0 255 0 4 255 255 255 2 0 255 0 1 0 0 0 2 255 255 255 2 255 0 0
2 255 255 255 3 0 0 0 4 0 255 0 2 0 0 0.

The original image (8 × 8 square) would need 3 bytes per square (to include all three RGB
values). Therefore, the uncompressed file for this image is
8 × 8 × 3 = 192 bytes.

The RLE code has 92 values, which means the compressed file will be 92 bytes in size. This
gives a file reduction of about 52%. It should be noted that the file reductions in reality
will not be as large as this due to other data which needs to be stored with the
compressed file (e.g. a file header).
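These figures can be checked against the RLE data listed above with a short Python sketch. It assumes each run is stored as four values (a count followed by the red, green and blue components); the variable names are illustrative.

```python
rle_data = [int(v) for v in """
2 0 0 0 4 0 255 0 3 0 0 0 6 255 255 255 1 0 0 0 2 0 255 0 4
255 0 0 4 0 255 0 1 255 255 255 2 255 0 0 1 255 255 255 4 0 255 0
4 255 0 0 4 0 255 0 4 255 255 255 2 0 255 0 1 0 0 0 2 255 255 255 2 255 0 0
2 255 255 255 3 0 0 0 4 0 255 0 2 0 0 0
""".split()]

# Each run is: count, red, green, blue
runs = [rle_data[i:i + 4] for i in range(0, len(rle_data), 4)]

pixels = sum(run[0] for run in runs)      # total squares described by the runs
uncompressed = pixels * 3                 # 3 bytes (R, G, B) per square
compressed = len(rle_data)                # 1 byte per stored value
reduction = 100 * (1 - compressed / uncompressed)

print(len(runs), pixels)                  # 23 64
print(uncompressed, compressed)           # 192 92
print(round(reduction))                   # 52
```

The counts add up to 64, which matches the 8 × 8 grid.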
