06 Data Representation New
2. ASCII: It is an acronym for American Standard Code for Information Interchange.
Converting numeric data from the decimal (or denary) number system (the number system
that we use for calculations) is easy because there is a direct conversion method
(discussed in a later section of this chapter). For example, the denary number 65 is
equivalent to the binary number 01000001. But there is no mathematical way to
convert non-numeric data (like the letters of your name) into binary form.
Hence, in the 1960s the body now known as ANSI (American National Standards Institute)
allotted a unique numeric code to every character (letters, digits and symbols) of our
language. These codes are called ASCII. Now it is easy to convert “A” to binary form by
applying the mathematical method of conversion to its ASCII code, 65.
Another coding system is called Unicode. Unicode can represent the characters of all
the languages of the world, and it is supported by most operating systems, search
engines and internet browsers used globally.
Activity 1: Google “ASCII Table” and find out the ASCII codes for the English letters
and digits.
3. Binary Number System: It is the number system that is used to represent human
language data in a computer-understandable form. It has two digits:
a zero and a one. Each binary digit is called a bit. Combinations of these bits are used to
represent every character of human language data in digital form.
Conventionally a group of 8 bits is used to represent a character of our language. For
example, the bit code for the letter A is 01000001.
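Example: converting the denary number 38716 to binary by repeated division by 2
(divisor | number | remainder; each quotient is written on the next line):
2 | 38716 | remainder 0
2 | 19358 | remainder 0
2 |  9679 | remainder 1
2 |  4839 | remainder 1
2 |  2419 | remainder 1
2 |  1209 | remainder 1
2 |   604 | remainder 0
2 |   302 | remainder 0
2 |   151 | remainder 1
2 |    75 | remainder 1
2 |    37 | remainder 1
2 |    18 | remainder 0
2 |     9 | remainder 1
2 |     4 | remainder 0
2 |     2 | remainder 0
  |     1 |
Reading the final quotient and then the remainders from bottom to top gives
1001 0111 0011 1100.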
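The repeated-division working above can be sketched in Python. This is only an illustrative sketch; the function name denary_to_binary is our own and does not come from the syllabus.

```python
def denary_to_binary(n: int) -> str:
    """Convert a non-negative denary integer to binary by repeated division by 2."""
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        n, remainder = divmod(n, 2)   # quotient carries on, remainder is the next bit
        remainders.append(str(remainder))
    # The remainders are produced lowest bit first, so read them in reverse.
    return "".join(reversed(remainders))


print(denary_to_binary(38716))         # 1001011100111100
print(denary_to_binary(65).zfill(8))   # 01000001 (padded to 8 bits)
```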
4. Hexadecimal Number System: An 8-bit binary code was more than sufficient when
computers were used in a handful of countries using the English language only. But when
the use of computers spread to other countries, especially non-English-speaking
countries, the character sets needed exceeded the maximum of 256 (2⁸) combinations of
an 8-bit code. Moreover, the capacity of computer memories also increased from mere KB
to TB, which requires longer codes to address memory. Because of these two
developments, long binary codes became impractical for people to read and write; hence
the hexadecimal number system was adopted as a shorter way of writing the binary data
held in computers.
“Hex” means 6 and “decimal” means 10, so the hexadecimal system has 16 digits: the ten
digits 0 to 9 and then the six letters A, B, C, D, E and F. It works in powers of 16. 2⁸
gives 256 combinations (0 to 255), whereas 16⁸ gives 4,294,967,296
combinations, a big number by any standard. This helps in representing a large number
of characters and enables us to address large memories.
16 is a power of 2 (2⁴), hence one hexadecimal digit stands for 4 binary digits. For
example, 65₁₀ is equivalent to 01000001₂, which can be written in hexadecimal form as
41₁₆. This lets programmers write values quickly and with fewer mistakes; it
saves typing time and makes debugging easier.
Please note that some people use the “0x” prefix to denote a hexadecimal number while
others use a subscript 16 as a suffix. Hence hexadecimal 493 can be
written as 0x493 or 493₁₆. We in IGCSE use the subscript 16 as a suffix.
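As a quick check of the notation, Python itself uses the “0x” prefix for hexadecimal literals. The short sketch below (variable names are our own) shows the same value, 65, in binary and hexadecimal, and the denary value of 493₁₆.

```python
value = 65

as_binary = format(value, "08b")   # 8-bit binary string
as_hex = format(value, "X")        # hexadecimal digits without a prefix
with_prefix = hex(value)           # Python's own 0x-prefixed form

print(as_binary)      # 01000001
print(as_hex)         # 41
print(with_prefix)    # 0x41

# 0x493 and int("493", 16) are two ways of writing the same hexadecimal number.
hex_493 = int("493", 16)
print(hex_493)            # 1171
print(hex_493 == 0x493)   # True
```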
3. Then multiply the next digit on the left (or the number that you have substituted for
a letter) by 16¹.
4. Keep on multiplying the digits (or the numbers that you have substituted for letters)
on the left side by the next power of 16 (16¹, 16², 16³, etc.).
5. Add all these products.
6. The sum of the products is the answer.
Example: Convert 973C₁₆ to decimal.
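Applying the place-value method to this example, a minimal Python sketch could look like the following. The names HEX_DIGITS and hex_to_denary are our own; each digit (with A to F replaced by 10 to 15) is multiplied by the matching power of 16 and the products are added.

```python
HEX_DIGITS = "0123456789ABCDEF"


def hex_to_denary(hex_string: str) -> int:
    """Multiply each hex digit by its power of 16 and add the products."""
    total = 0
    # Start from the right-most digit, which has place value 16 to the power 0.
    for power, digit in enumerate(reversed(hex_string.upper())):
        digit_value = HEX_DIGITS.index(digit)   # A..F become 10..15
        total += digit_value * 16 ** power
    return total


# 12 x 1 + 3 x 16 + 7 x 256 + 9 x 4096
print(hex_to_denary("973C"))   # 38716
```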
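Converting 38716₁₀ to hexadecimal by repeated division by 16
(divisor | number | remainder; each quotient is written on the next line):
16 | 38716 | remainder 12
16 |  2419 | remainder 3
16 |   151 | remainder 7
   |     9 |
Ans: 38716₁₀ = 973C₁₆ [because 12₁₀ is C₁₆]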
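The repeated division by 16 shown above can be sketched in Python as follows. This is an illustration only, and the function name denary_to_hex is our own.

```python
def denary_to_hex(n: int) -> str:
    """Convert a non-negative denary integer to hexadecimal by repeated division by 16."""
    hex_digits = "0123456789ABCDEF"
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, remainder = divmod(n, 16)
        digits.append(hex_digits[remainder])   # a remainder of 12 becomes "C", etc.
    # Remainders come out right-most digit first, so reverse them.
    return "".join(reversed(digits))


print(denary_to_hex(38716))   # 973C
```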
4.3. Hexadecimal to Binary Conversion:
1. Convert each digit of the hexadecimal number to its binary equivalent by using the
following grid
Place value:         2³ 2² 2¹ 2⁰
                     8  4  2  1

Hexadecimal digit:   9         7         3         C (12)
Place value:         8 4 2 1   8 4 2 1   8 4 2 1   8 4 2 1
Binary digit:        1 0 0 1   0 1 1 1   0 0 1 1   1 1 0 0
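The grid method can be sketched in Python by turning every hexadecimal digit into its own group of 4 bits. The function name hex_to_binary is our own choice for this illustration.

```python
def hex_to_binary(hex_string: str) -> str:
    """Replace each hexadecimal digit with its 4-bit binary equivalent."""
    groups = []
    for digit in hex_string:
        digit_value = int(digit, 16)               # e.g. "C" becomes 12
        groups.append(format(digit_value, "04b"))  # 12 becomes "1100"
    return " ".join(groups)


print(hex_to_binary("973C"))   # 1001 0111 0011 1100
```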
Place value of binary:            8 4 2 1   8 4 2 1   8 4 2 1   8 4 2 1
Binary digit:                     1 0 0 1   0 1 1 1   0 0 1 1   1 1 0 0
Product of place value & digit:   8 0 0 1   0 4 2 1   0 0 2 1   8 4 0 0
Sum:                              9         7         3         12
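The table above works in the opposite direction: the bits are taken in groups of four, each bit is multiplied by its place value (8, 4, 2, 1) and the sum of each group gives one hexadecimal digit. A minimal sketch, with a function name of our own choosing:

```python
def binary_to_hex(bits: str) -> str:
    """Convert a binary string to hexadecimal by summing place values in groups of 4."""
    bits = bits.replace(" ", "")
    # Pad with leading zeros so the length is a multiple of 4.
    bits = bits.zfill((len(bits) + 3) // 4 * 4)
    place_values = (8, 4, 2, 1)
    hex_digits = "0123456789ABCDEF"
    result = ""
    for start in range(0, len(bits), 4):
        group = bits[start:start + 4]
        group_sum = sum(p for p, bit in zip(place_values, group) if bit == "1")
        result += hex_digits[group_sum]   # a sum of 12 becomes "C"
    return result


print(binary_to_hex("1001 0111 0011 1100"))   # 973C
```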
5.1. Memory Dumps: When the contents of memory locations are output to a printer or a
monitor, it is called a memory dump. This is particularly helpful for a programmer who is
developing system software or trying to trace errors in programs. By studying the
contents of the memory locations, the programmer can rectify errors more easily than by
using the “trial and error” method. It also makes it easier to test programs that are
being developed. To see the dump, the programmer must use the memory location (the
address of the part of the memory where the data/information is being stored). Since it
is much easier to work with B 5 A 4 1 A F C than with
1011 0101 1010 0100 0001 1010 1111 1100, hexadecimal is
often used when developing new software or when trying to trace errors in programs. It
is a very powerful fault-tracing tool, but it requires considerable knowledge of computer
architecture to interpret the memory contents.
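To give a feel for what a memory dump looks like, here is a small Python sketch. The layout (a 4-digit hexadecimal address followed by the bytes in hexadecimal) and the function name hex_dump are our own assumptions; real debugging tools differ in their exact format.

```python
def hex_dump(data: bytes, start_address: int = 0, width: int = 8) -> str:
    """Format bytes as lines of 'address  byte byte ...' in hexadecimal."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{byte:02X}" for byte in chunk)
        lines.append(f"{start_address + offset:04X}  {hex_part}")
    return "\n".join(lines)


# The four bytes from the example above: B5 A4 1A FC
sample = bytes([0xB5, 0xA4, 0x1A, 0xFC])
print(hex_dump(sample))   # 0000  B5 A4 1A FC
```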
5.2. Uses of Hexadecimal in HTML colour codes: HyperText Markup Language (HTML) is used
to develop websites. In HTML a colour is specified according to the intensity of its Red,
Green and Blue (RGB) components, each represented by eight bits. Thus, 24
bits are used to specify a web colour, and 16,777,216 colours can be specified. It is
easier for a programmer to write a 24-bit integer (which also forms the colour part of
many 32-bit colour values) as #FF0099 than as 1111 1111 0000 0000 1001 1001.
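A colour code can be split back into its three 8-bit components. The sketch below is illustrative only; the function name parse_colour is our own.

```python
def parse_colour(code: str) -> tuple:
    """Split a colour code such as '#FF0099' into (red, green, blue) denary values."""
    code = code.lstrip("#")
    red = int(code[0:2], 16)     # first two hex digits
    green = int(code[2:4], 16)   # middle two hex digits
    blue = int(code[4:6], 16)    # last two hex digits
    return red, green, blue


print(parse_colour("#FF0099"))   # (255, 0, 153)
print(2 ** 24)                   # 16777216 possible colours
```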
HTML tag                                      Colour name
<font color="#FF0000"> RED </font>            Red
<font color="#00FF00"> GREEN </font>          Green (HTML name: Lime)
<font color="#0000FF"> BLUE </font>           Blue
<font color="#FFFF00"> YELLOW </font>         Yellow
<font color="#FF00FF"> MAGENTA </font>        Magenta (HTML name: Fuchsia)
<font color="#00FFFF"> CYAN </font>           Cyan (HTML name: Aqua)
5.3. Uses of Hexadecimal in MAC Address: A Media Access Control (MAC) address is a
number that uniquely identifies a device on a network. The MAC address refers to the
Network Interface Card (NIC) which is part of the device. The MAC address is rarely
changed, hence a particular device can always be identified no matter where it is. A
MAC address is 48 bits long, so there are 2⁴⁸ (about 281 trillion) possible MAC addresses.
The MAC address is commonly written as a sequence of 12 hexadecimal digits rather
than as 48 bits; hence a typical MAC address will be written as 48-3F-0A-91-00-BC
instead of the 48 bits 0100 1000 0011 1111 0000 1010 1001 0001 0000 0000
1011 1100.
The first half of a MAC address (48-3F-0A) is the identity number of the manufacturer
and the second half (91-00-BC) is the serial number of the device.
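The example address can be taken apart with a few lines of Python. This is a sketch under the assumption that the address is written with “-” separators as above; the function name split_mac is our own.

```python
def split_mac(mac: str) -> tuple:
    """Return (manufacturer part, serial part, 48-bit binary form) of a MAC address."""
    pairs = mac.split("-")                 # six pairs of hex digits
    manufacturer = "-".join(pairs[:3])     # first half
    serial = "-".join(pairs[3:])           # second half
    bits = " ".join(format(int(pair, 16), "08b") for pair in pairs)
    return manufacturer, serial, bits


manufacturer, serial, bits = split_mac("48-3F-0A-91-00-BC")
print(manufacturer)   # 48-3F-0A
print(serial)         # 91-00-BC
print(bits)           # 01001000 00111111 00001010 10010001 00000000 10111100
print(2 ** 48)        # 281474976710656 possible addresses
```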
number refers to the memory location of the error. This helps programmers to find and
then fix problems.
Note: The generation of a 9th bit is a clear indication that the sum has exceeded the
largest value that 8 bits can hold (255). This is known as an overflow error, and in this
case it indicates that the number is too big to be stored in the computer using 8-bit
registers.
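The overflow condition can be demonstrated with a short Python sketch. Python integers do not overflow by themselves, so the 8-bit register is imitated here by keeping only the lowest 8 bits; the function name add_8bit and the sample values are our own.

```python
def add_8bit(a: int, b: int) -> tuple:
    """Add two 8-bit values; return (result kept in 8 bits, overflow flag)."""
    total = a + b
    overflow = total > 0xFF      # a 9th bit was generated
    stored = total & 0xFF        # only the lowest 8 bits fit in the register
    return stored, overflow


print(add_8bit(100, 55))    # (155, False)
print(add_8bit(200, 100))   # (44, True): 300 does not fit in 8 bits
```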
7. Logical binary shifts: Computers can carry out a logical shift on a sequence of binary
digits. A logical shift means moving the bits of a binary number to the left or to the right.
Each shift left is equivalent to multiplying the binary number by 2 and each shift right
is equivalent to dividing the binary number by 2.
The denary number 21 is 00010101 in binary. If we put this into an 8-bit register:
128 64 32 16 8 4 2 1
0 0 0 1 0 1 0 1
If we now shift the bits in this register one place to the left
128 64 32 16 8 4 2 1
0 0 1 0 1 0 1 0
The value of the binary bits is now 21 × 2¹ i.e. 42. We can see this is correct if we
calculate the denary value of the new binary number 101010 (i.e. 32 + 8 + 2).
And now suppose we shift the original number four places left:
128 64 32 16 8 4 2 1
0 1 0 1 0 0 0 0
The left-most 1-bit has been lost. In our 8-bit register the result of 21 × 2⁴ is 80,
which is clearly incorrect (the true answer is 336). This error occurs because we have
exceeded the maximum number of left shifts possible using this register.
The denary number 200 is 11001000 in binary. Putting this into an 8-bit register gives:
128 64 32 16 8 4 2 1
1 1 0 0 1 0 0 0
If we now shift the bits in this register one place to the right:
128 64 32 16 8 4 2 1
0 1 1 0 0 1 0 0
The value of the binary bits is now 200 ÷ 2¹ i.e. 100. We can see this is correct by
converting the new binary number 01100100 to denary (64 + 32 + 4).
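The shifts above can be reproduced in Python. Because Python integers are not limited to 8 bits, the sketch below masks the result of a left shift to imitate an 8-bit register; the function names are our own.

```python
def shift_left(value: int, places: int, width: int = 8) -> int:
    """Logical shift left inside a register of the given width (extra bits are lost)."""
    mask = (1 << width) - 1          # 11111111 for an 8-bit register
    return (value << places) & mask


def shift_right(value: int, places: int) -> int:
    """Logical shift right for a non-negative value (bits falling off the right are lost)."""
    return value >> places


print(shift_left(21, 1))     # 42
print(shift_left(21, 4))     # 80, not 336: the left-most 1-bit was lost
print(shift_right(200, 1))   # 100
```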
To represent negative integers, we make use of two’s complement. The left-most bit
always determines the sign of the binary number. In two’s complement the place value of
the left-most bit is changed to a negative value (−128 in an 8-bit register). This means
the new range of possible numbers is:
−128 (10000000) to +127 (01111111).
Convert −79 into an 8-bit binary number using two’s complement format.
Method 1
−128 64 32 16 8 4 2 1
1 0 1 1 0 0 0 1
Method 2
−128 64 32 16 8 4 2 1
1 0 1 1 0 0 0 1
It is a good idea to practice both methods.
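The answer can be checked with a short Python sketch. Note that this uses bit masking, which is a convenient programming shortcut and not necessarily either of the two methods above; the function names are our own.

```python
def to_twos_complement(n: int, width: int = 8) -> str:
    """Return the two's complement bit pattern of n in the given width."""
    lowest = -(1 << (width - 1))      # -128 for 8 bits
    highest = (1 << (width - 1)) - 1  # +127 for 8 bits
    if not lowest <= n <= highest:
        raise ValueError(f"{n} does not fit in {width} bits")
    return format(n & ((1 << width) - 1), f"0{width}b")


def from_twos_complement(bits: str) -> int:
    """Read a bit pattern whose left-most place value is negative."""
    value = int(bits, 2)
    if bits[0] == "1":
        value -= 1 << len(bits)       # the left-most bit counts as -128, not +128
    return value


print(to_twos_complement(-79))            # 10110001
print(from_twos_complement("10110001"))   # -79  (-128 + 32 + 16 + 1)
```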
9. Data storage: By now we are aware of how text and numeric data are stored in
computers. Now let’s find out how images and sound are stored.
9.1 Text File: Estimating the size of a text file is relatively straightforward. ASCII code is
used to represent each character, and each character typed on the keyboard takes up 1
byte. Suppose we typed in the following message:
If we count the number of characters in the text typed in, we get the number 48.
Since each character equals 1 byte, the text needs about 48 bytes. We can therefore use
this method to estimate the size of a text file. Obviously, there are other codes stored
with the file which make its real size slightly different, but we aren't concerned about
that in this section of the book.
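The same estimate can be made in Python by counting the bytes of the ASCII-encoded text. The message used below is a hypothetical sample of our own, not the message from the example above.

```python
def estimate_text_size_bytes(text: str) -> int:
    """Estimate the size of a text as 1 byte per ASCII character."""
    # encode("ascii") raises an error if a character is outside the ASCII set.
    return len(text.encode("ascii"))


message = "Data is stored as bits."   # hypothetical sample message
print(estimate_text_size_bytes(message))   # 23
```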
9.2. Bitmap Images: A bitmap image is composed of many tiny parts, called pixels, which
are often many different colours. It is possible to edit each individual pixel. Each pixel
can be represented as a binary number.
The number of bits used to represent the colour of each pixel is called the colour depth.
An 8-bit colour depth means that each pixel can be one of 256 colours (because 2⁸ = 256).
Modern computers typically use a 24-bit colour depth, which gives over 16 million different
colours. Since the computer has to store information about every single pixel in the
image, the file size of a bitmap graphic is often quite large.
Image resolution refers to the number of pixels that make up an image; for example, an
image could contain 800 × 600 pixels (480 000 pixels in total).
When you resize a bitmap graphic, it tends to lose quality i.e. it becomes blurred.
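For an uncompressed bitmap, the size can be estimated as number of pixels × colour depth. The sketch below applies this to the 800 × 600 example; the function name is our own, and real image files also contain header information that is ignored here.

```python
def bitmap_size_bytes(width: int, height: int, colour_depth_bits: int) -> int:
    """Estimate the size of an uncompressed bitmap in bytes (header ignored)."""
    total_bits = width * height * colour_depth_bits
    return (total_bits + 7) // 8   # round up to a whole number of bytes


print(800 * 600)                         # 480000 pixels
print(bitmap_size_bytes(800, 600, 8))    # 480000 bytes at 8-bit colour depth
print(bitmap_size_bytes(800, 600, 24))   # 1440000 bytes at 24-bit colour depth
print(2 ** 24)                           # 16777216 colours at 24-bit colour depth
```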
The number of bits per sample is known as the sampling resolution (also known as the
bit depth). Sampling rate is the number of sound samples taken per second. This is
measured in hertz (Hz), where 1Hz means ‘one sample per second’.
File size (in bits) = sample rate (in Hz) × sample resolution (in bits) × length of sample (in seconds)
For a stereo sound file, you would then multiply the result by two.
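The calculation can be sketched in Python as follows. The figures used (44,100 Hz, 16 bits, 60 seconds) are illustrative values of our own, and the function name is hypothetical.

```python
def sound_size_bits(sample_rate_hz: int, resolution_bits: int,
                    length_seconds: int, stereo: bool = False) -> int:
    """Estimate the size of an uncompressed sound file in bits."""
    size = sample_rate_hz * resolution_bits * length_seconds
    if stereo:
        size *= 2   # two channels need twice the data
    return size


mono = sound_size_bits(44100, 16, 60)
stereo = sound_size_bits(44100, 16, 60, stereo=True)
print(mono)          # 42336000 bits
print(stereo)        # 84672000 bits
print(stereo // 8)   # 10584000 bytes
```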
10. File Compression: It is a method of reducing the size of a file that is to be
transmitted over a computer network (LAN or Internet) or stored for archiving. The
reduction is done by removing redundancy in the data (or graphics) and replacing it with
some predefined codes. While text files must be extracted (uncompressed) back to their
original form before they can be used, audio/video/graphics files can be used in
compressed form by appropriate software. A file is compressed to save storage space
and to use less time to transmit over a network.
10.1. Importance of compressing Files: File compression is important for files transmitted
over the Internet because if they are not compressed then there would be considerably
more data to transmit. This would result in more network traffic, slower download times
and delays in viewing web pages, particularly those with multimedia content. Streaming
audio and video would be impractical without file compression.
However, compressed data may be of lower quality (if using lossy compression) and
must be decompressed to be used. This extra processing may slow some applications
and in the case of video decompression, require dedicated hardware such as graphics
cards for the video to be viewed as it is actually being decompressed.
10.2. Types of File Compressions: There are two types of compression techniques;
Lossless and Lossy compression.
10.2.1 Lossless compression: This allows the original file to be re-created exactly from the
compressed file. It works by searching for patterns in the file so that, instead of repeatedly
storing a block of identical data, the data is stored once and then indexed. Further
occurrences are simply stored as the index number so the decompression software can
simply look up the data and place it back in the correct position.
Text files compress well because certain letters and words will often appear together in
the same pattern. Software files also compress well for similar reasons: they are made
up of a relatively small number of different instructions, often arranged in a set pattern.
In both cases, the larger the original file, the better the compression ratio as there are
more likely to be repeating patterns and each pattern will be repeated more frequently.
This means we have five characters with ASCII code 97, four characters with ASCII
code 98, two characters with ASCII code 99 and five characters with ASCII code 100.
Assuming each number in the second row requires 1 byte of memory, the RLE code
will need 8 bytes. This is half the original file size.
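Run-length encoding (RLE) of this kind can be sketched in Python. The input string "aaaaabbbbccddddd" is our own reconstruction from the counts given above (its exact order is an assumption), and the function name rle_encode is hypothetical.

```python
def rle_encode(text: str) -> list:
    """Encode text as (run length, ASCII code) pairs."""
    runs = []
    for character in text:
        code = ord(character)
        if runs and runs[-1][1] == code:
            runs[-1][0] += 1          # same character again: extend the current run
        else:
            runs.append([1, code])    # new character: start a new run
    return [(count, code) for count, code in runs]


original = "aaaaabbbbccddddd"   # assumed input: 16 characters, 16 bytes
encoded = rle_encode(original)
print(encoded)            # [(5, 97), (4, 98), (2, 99), (5, 100)]
print(len(original))      # 16 bytes before compression
print(len(encoded) * 2)   # 8 bytes after compression (1 byte per number)
```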
Figure shows the letter ‘F’ in a grid where each square requires 1 byte of storage. A
white square has a value 1 and a black square a value of 0:
The 8 × 8 grid would need 64 bytes; the compressed RLE format has 30 values, and
therefore needs only 30 bytes to store the image.
10.2.2 Lossy compression: Files that include a lot of unique information, such as bitmap
graphics, sound or video files, cannot be compressed much with lossless compression
because there is so little repeated data. Lossy compression works differently: it
removes data that is not needed, either because a drop in quality is acceptable or the
difference cannot be detected by the human eye or ear. Streaming audio and video is
possible with lossy compression.
MP3 (Moving Picture Experts Group Audio Layer 3): This has become the standard
for distributing digital music files on the internet. It uses lossy compression to reduce
file sizes to about a tenth of the original. The compression algorithm is intended to
remove sounds that are generally beyond the limits of most people’s hearing but some
claim that the loss in quality is noticeable.
MP4 (MPEG-4 Part 14): The MP4 is a container format, allowing a combination of
audio, video, subtitles and still images to be held in the one single file. It also allows for
advanced content such as 3D graphics, menus and user interactivity. MP4 is a reliable
format that requires a relatively low amount of bandwidth.
Exercise
1. When a key is pressed on the keyboard, the computer stores the ASCII representation
of the character typed into main memory. The ASCII representation for A is 65
(denary), for B is 66 (denary), etc.
There are two letters stored in the following memory locations:
Location 1 A
Location 2 C
Location 1:___________________________________________________________
Location 2:___________________________________________________________
1 1 1 1 1 0 1 0 1 0 0 1 0 1 1 1
c. Explain why a programmer would prefer to see the contents of the locations displayed
as hexadecimal rather than binary, when debugging his program that reads the key
presses.
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
3. Given below are the contents of three memory locations with addresses shown in
denary.
Address Memory contents
150 0100 0111
151 1100 1101
152 1001 1100
a.i.What is the binary value for address 150?
_____________________________________________________________________
ii. What is the hexadecimal value for the contents of address 152?
_____________________________________________________________________
b. The numbers in location 151 and 152 are the height and width (in pixels) of a bitmap
graphic currently in main memory. What are the dimensions of the bitmap in denary?
Height: __________________ pixels Width: __________________ pixels
i. How many bits are required to store each pixel for a black and white bitmap?
_____________________________________________________________________
_____________________________________________________________________
ii. For a 256-colour bitmap, each pixel requires a byte of memory. Explain this statement.
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
4. Files are often compressed before they are transmitted over the internet.
_____________________________________________________________________
b. State one advantage of compressing files before sending them over the internet.
_____________________________________________________________________
_____________________________________________________________________
c. Two types of file compression are lossy and lossless. State which compression type is
most appropriate for each of the following and explain why it is appropriate.
Explanation: ___________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
Explanation: ___________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
5. Some students decide to do a survey to find out how good the general public is at
Mathematics. These surveys produced a lot of data. The students decided to run a file
compression utility.
_____________________________________________________________________
_____________________________________________________________________
ii. The students frequently send each other emails with file attachments. Describe two
different file types where compression can be used.
1. __________________________________________________________________
___________________________________________________________________
___________________________________________________________________
2. __________________________________________________________________
___________________________________________________________________
___________________________________________________________________
6. A website is made up of different types of files. State what each of the file types in the
table below is used for.
HTML
JPG
MP4
MP3