
1.1.3 Data Storage

1. All data must be converted to binary digits of 1s and 0s to be processed by computers. Text is converted using character encoding schemes like ASCII that assign a binary number to each character. Images represent pixels as patterns of binary numbers to represent color and resolution. Sound is converted by sampling the amplitude of the sound wave at regular intervals and encoding the samples as binary numbers. 2. Error detection and correction methods like parity checks, check digits and checksums allow computers to detect errors in transmitted data and request retransmissions to ensure integrity. Automatic repeat request protocols have receivers acknowledge correct data and request resending of anything received incorrectly.


1.1 DATA REPRESENTATION AHMED THAKUR


1.1.3 DATA STORAGE

• Show understanding that sound (music), pictures, video, text and numbers are stored in different formats

Representing data

All data inside a computer is transmitted as a series of electrical signals that are either on or off.
Therefore, in order for a computer to be able to process any kind of data, including text, images and
sound, the data must be converted into binary form. If the data is not converted into binary (a series of
1s and 0s), the computer will simply not understand it or be able to process it.

Representing Text

When any key on a keyboard is pressed, it needs to be converted into a binary number so that it
can be processed by the computer and the typed character can appear on the screen.

A code where each number represents a character can be used to convert text into binary. One
code we can use for this is called ASCII. The ASCII code takes each character on the keyboard and
assigns it a binary number. For example:

COMPUTER SCIENCE https://www.facebook.com/groups/OAComputers/


2210 [email protected], 0300-8268885

• the letter ‘a’ has the binary number 0110 0001 (this is the denary number 97)
• the letter ‘b’ has the binary number 0110 0010 (this is the denary number 98)
• the letter ‘c’ has the binary number 0110 0011 (this is the denary number 99)

ASCII codes start at denary number 0. The code covers control characters (such as the return key)
and punctuation as well as the number keys, capital letters and lower case letters.
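The mapping from characters to binary numbers can be checked with a short Python sketch. The helper name `ascii_binary` is made up for this illustration; `ord` and `format` are Python built-ins.

```python
def ascii_binary(ch: str) -> str:
    """Return the 8-bit binary string for a single standard ASCII character."""
    code = ord(ch)                 # denary code, e.g. 97 for 'a'
    if code > 127:
        raise ValueError("not a standard ASCII character")
    return format(code, "08b")     # pad with leading zeros to 8 binary digits

for ch in "abc":
    print(ch, ord(ch), ascii_binary(ch))
```

Running it lists each letter with its denary code and its binary pattern, matching the three examples above.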

Standard ASCII can only represent 128 characters, which is enough for most words in English but not
enough for other languages. If you want to use accents in European languages or larger alphabets such as
Cyrillic (the Russian alphabet) and Chinese Mandarin then more characters are needed. Therefore
another code, called Unicode, was created. This meant that computers could be used by people
using different languages.

Representing images
Images also need to be converted into binary in order for a computer to process them so that they
can be seen on our screen. Digital images are made up of pixels. Each pixel in an image is made
up of binary numbers.

If we say that 1 is black (or on) and 0 is white (or off), then a simple black and white picture can be
created using binary.

To create the picture, a grid can be set out and the squares coloured (1 for black and 0 for white). But
before the grid can be created, the size of the grid needs to be known. This data is
called metadata, and computers need metadata to know the size of an image. If the metadata for
the image to be created is 10x10, this means the picture will be 10 pixels across and 10 pixels down.
This example shows an image created in this way:
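The same idea can be sketched in Python. The 5x5 cross below is a hypothetical image chosen for this illustration, and `render` is an illustrative helper name; the width and height play the role of the metadata.

```python
def render(bits: str, width: int, height: int) -> list[str]:
    """Split a flat string of 1s and 0s into rows using the metadata."""
    if len(bits) != width * height:
        raise ValueError("pixel data does not match the metadata")
    rows = [bits[r * width:(r + 1) * width] for r in range(height)]
    # 1 = black (drawn as '#'), 0 = white (drawn as '.')
    return [row.replace("1", "#").replace("0", ".") for row in rows]

# Hypothetical image: a cross, 5 pixels across and 5 pixels down.
pixels = "00100" "00100" "11111" "00100" "00100"
for line in render(pixels, 5, 5):
    print(line)
```

Without the width and height, the same 25 bits could not be laid out correctly, which is why the metadata has to be stored with the image.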

Adding colour
The system described so far is fine for black and white images, but most images need to use colours
as well. Instead of using just 0 and 1, using four possible numbers will allow an image to use four
colours. In binary this can be represented using two bits per pixel:

 00 – white
 01 – blue
 10 – green
 11 – red

While this is still not a very large range of colours, adding another binary digit will double the number
of colours that are available:
• 1 bit per pixel (0 or 1): two possible colours
• 2 bits per pixel (00 to 11): four possible colours
• 3 bits per pixel (000 to 111): eight possible colours
• 4 bits per pixel (0000 to 1111): 16 possible colours
• …
• 16 bits per pixel (0000 0000 0000 0000 to 1111 1111 1111 1111): over 65 000 possible colours (65 536)

The number of bits used to store each pixel is called the colour depth. Images with more colours
need more bits per pixel to store each available colour. This means that images that use lots of colours are
stored in larger files.
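The doubling pattern in the list above is the rule "number of colours = 2 to the power of the colour depth". A minimal Python sketch (the function name is illustrative):

```python
def colours_available(bits_per_pixel: int) -> int:
    """Each extra bit doubles the number of distinct values a pixel can hold."""
    return 2 ** bits_per_pixel

for depth in (1, 2, 3, 4, 16):
    print(depth, "bits per pixel ->", colours_available(depth), "colours")
```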

Image quality

Image quality is affected by the resolution of the image. The resolution of an image is a way of
describing how tightly packed the pixels are.

In a low-resolution image, the pixels are larger so fewer are needed to fill the space. This results in
images that look blocky or pixelated. An image with a high resolution has more pixels, so it looks a
lot better when you zoom in or stretch it. The downside of having more pixels is that the file size will
be bigger.

Representing sound
Sound needs to be converted into binary for computers to be able to process it. To do this, sound is
captured - usually by a microphone - and then converted into a digital signal.

An analogue to digital converter will sample a sound wave at regular time intervals. For example, a
sound wave like this can be sampled at each time sample point:

The samples can then be converted to binary. They will be recorded to the nearest whole number.

Time sample   1     2     3     4     5     6     7     8     9     10
Denary        8     3     7     6     9     7     2     6     6     6
Binary        1000  0011  0111  0110  1001  0111  0010  0110  0110  0110

If the time samples are then plotted back onto the same graph, it can be seen that the sound wave
now looks different. This is because sampling does not take into account what the sound wave is
doing in between each time sample.
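The sampling step can be sketched in Python. The wave used here (a sine curve centred on 8) is a made-up stand-in for the sound wave in the figure, and the function names are illustrative; only the idea of reading the wave at regular intervals and rounding to the nearest whole number comes from the text.

```python
import math

def sample_wave(wave, duration: float, sample_count: int, bits: int = 4):
    """Read wave(t) at regular time intervals and round to a whole number."""
    top = 2 ** bits - 1                      # largest value that fits in `bits` bits
    samples = []
    for n in range(sample_count):
        t = n * duration / sample_count      # regular time sample points
        level = round(wave(t))               # nearest whole number
        samples.append(max(0, min(top, level)))  # keep within the binary range
    return samples

def to_binary(samples, bits: int = 4):
    """Encode each sample as a fixed-width binary string."""
    return [format(s, f"0{bits}b") for s in samples]

# Hypothetical sound wave: one cycle of a sine curve centred on 8.
wave = lambda t: 8 + 6 * math.sin(2 * math.pi * t)
samples = sample_wave(wave, duration=1.0, sample_count=10)
print(samples)
print(to_binary(samples))
```

Whatever the wave does between two sample points never reaches the list of samples, which is the loss of detail described above.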

This means that the sound loses quality as data has been lost between the time samples. The way to
increase the quality and store the sound at a quality closer to the original is to have more time
samples that are closer together. This way, more detail about the sound can be collected, so when
it’s converted to digital and back to analogue again it does not lose as much quality.

The frequency at which samples are taken is called the sample rate, and is measured in hertz (Hz).
1 Hz is one sample per second. CD-quality audio is sampled at 44 100 Hz (44.1 kHz); 48 000 Hz (48 kHz) is also widely used.
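A higher sample rate means more samples to store. The sketch below estimates the size of uncompressed audio; the 16 bits per sample and two channels are assumptions added for the illustration (they match CD audio) and are not stated in these notes.

```python
def audio_size_bytes(sample_rate_hz: int, seconds: float,
                     bits_per_sample: int = 16, channels: int = 2) -> int:
    """Estimate uncompressed audio size: samples x channels x bytes per sample."""
    total_samples = int(sample_rate_hz * seconds) * channels
    return total_samples * bits_per_sample // 8

# One minute at the CD sample rate, assuming 16-bit stereo.
print(audio_size_bytes(44_100, 60))
```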

• Methods of error detection and correction, such as parity checks, check digits, checksums and Automatic Repeat reQuests (ARQ)

Error detection and correction has great practical importance in maintaining data (information)
integrity across noisy communication channels and less-than-reliable storage media.

Error Correction: Send additional information so incorrect data can be corrected and accepted.
Error correction is the additional ability to reconstruct the original, error-free data.

There are two basic ways to design the channel code and protocol for an error correcting system:

• Automatic Repeat-Request (ARQ): The transmitter sends the data and also an error detection
code, which the receiver uses to check for errors and to request retransmission of erroneous data.
In many cases, the request is implicit; the receiver sends an acknowledgement (ACK) of correctly
received data, and the transmitter re-sends anything not acknowledged within a reasonable
period of time.

The types of ARQ protocols include:

• Stop-and-wait ARQ
• Go-Back-N ARQ
• Selective Repeat ARQ

All three protocols usually use some form of sliding window protocol to let the transmitter
determine which (if any) packets need to be retransmitted.

These protocols reside in the Data Link or Transport Layers of the OSI model.
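A stop-and-wait exchange can be sketched as a small Python simulation. This is a simplified illustration, not a real protocol implementation: the channel is simulated by a set of (frame, attempt) pairs on which one bit is flipped, even parity stands in for the error detection code, and a missing ACK is modelled directly instead of with a timeout. All names are made up for the example.

```python
def parity_ok(frame: str) -> bool:
    """Even parity check, used here as the error detection code."""
    return frame.count("1") % 2 == 0

def stop_and_wait(frames, corrupt_on_attempt):
    """Send frames one at a time, resending any frame that is not acknowledged.

    corrupt_on_attempt: set of (frame_index, attempt) pairs on which the
    simulated channel flips the first bit of the frame.
    """
    received, log = [], []
    for i, frame in enumerate(frames):
        attempt = 0
        while True:
            attempt += 1
            sent = frame
            if (i, attempt) in corrupt_on_attempt:
                sent = ("1" if frame[0] == "0" else "0") + frame[1:]
            if parity_ok(sent):                  # receiver accepts and sends ACK
                received.append(sent)
                log.append((i, attempt, "ACK"))
                break
            log.append((i, attempt, "resend"))   # no ACK, so the transmitter resends
    return received, log

frames = ["100100011", "000000000"]   # each frame already carries an even parity bit
received, log = stop_and_wait(frames, {(0, 1)})
print(received == frames)
print(log)
```

The first frame is damaged on its first attempt, is not acknowledged, and is sent again before the transmitter moves on to the second frame.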

• Forward Error Correction (FEC): The transmitter encodes the data with an error-correcting code
(ECC) and sends the coded message. The receiver never sends any messages back to the
transmitter. The receiver decodes what it receives into the "most likely" data. The codes are
designed so that it would take an "unreasonable" amount of noise to trick the receiver into
misinterpreting the data.

Error Detection: Send additional information so incorrect data can be detected and rejected. Error
detection is the ability to detect the presence of errors caused by noise or other impairments during
transmission from the transmitter to the receiver.

Error Detection Schemes: In telecommunication, a redundancy check is extra data added to a
message for the purposes of error detection. Several schemes exist to achieve error detection, and
they are generally quite simple. All error detection codes transmit more bits than were in the original data.
Most codes are "systematic": the transmitter sends a fixed number of original data bits, followed by
a fixed number of check bits (usually referred to as redundancy), which are derived from the data bits
by some deterministic algorithm.

The receiver applies the same algorithm to the received data bits and compares its output to the
received check bits; if the values do not match, an error has occurred at some point during the
transmission. In a system that uses a "non-systematic" code, such as some raptor codes, data bits are
transformed into at least as many code bits, and the transmitter sends only the code bits.

Repetition Schemes: Variations on this theme exist. Given a stream of data that is to be sent, the
data is broken up into blocks of bits, and in sending, each block is sent some predetermined number
of times. For example, if we want to send "1011", we may repeat this block three times. Suppose
we send "1011 1011 1011", and this is received as "1010 1011 1011".

As one group is not the same as the other two, we can determine that an error has occurred. This
scheme is not very efficient, and can be susceptible to problems if the error occurs in exactly the
same place for each group e.g. "1010 1010 1010" in the example above will be detected as correct
in this scheme. The scheme however is extremely simple, and is in fact used in some transmissions of
numbers stations.
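The comparison of repeated blocks can be sketched in Python. The function name is illustrative, and returning the majority block is a small extension beyond the text, which only describes detecting the mismatch.

```python
from collections import Counter

def check_repetition(received: str):
    """Return (error_detected, majority_block) for space-separated copies of a block."""
    blocks = received.split()
    counts = Counter(blocks)
    majority, _ = counts.most_common(1)[0]   # the block seen most often
    return len(counts) > 1, majority

print(check_repetition("1010 1011 1011"))   # one copy differs from the other two
print(check_repetition("1010 1010 1010"))   # same error in every copy
```

The second call reports no error even though every copy is wrong, which is the weakness described above.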

Parity Schemes: A parity bit is an error detection mechanism. It is an extra bit transmitted
with a data item, chosen to give the resulting bits even or odd parity. Parity refers to whether the
number of bits set to 1 is even or odd. There are two types of parity:

• Even parity - an even number of bits are 1. Example: data 10010001, parity bit 1
• Odd parity - an odd number of bits are 1. Example: data 10010111, parity bit 0

The stream of data is broken up into blocks of bits, and the number of 1 bits is counted. Then, a "parity
bit" is set (or cleared) if the number of 1 bits is odd (or even). This scheme is called even parity; odd
parity can also be used. There is a limitation to parity schemes. A parity bit is only guaranteed to
detect an odd number of bit errors (one, three, five, and so on). If an even number of bits (two, four,
six and so on) are flipped, the parity bit appears to be correct, even though the data is corrupt. For
example:

• Original data and parity: 10010001+1 (even parity)
• Incorrect data: 10110011+1 (still even parity, so the two flipped bits go undetected)

Parity is usually used to catch single-bit errors.
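A short Python sketch of generating and checking a parity bit, using the data values from the examples above (the function names are illustrative):

```python
def parity_bit(data: str, even: bool = True) -> str:
    """Choose the extra bit so the total number of 1s is even (or odd)."""
    ones_are_odd = data.count("1") % 2 == 1
    if even:
        return "1" if ones_are_odd else "0"
    return "0" if ones_are_odd else "1"

def check(data: str, parity: str, even: bool = True) -> bool:
    """Receiver side: recompute the parity bit and compare with the one received."""
    return parity_bit(data, even) == parity

print(parity_bit("10010001"))               # even parity example
print(parity_bit("10010111", even=False))   # odd parity example
print(check("10010011", "1"))               # one flipped bit: check fails
print(check("10110011", "1"))               # two flipped bits: check still passes
```

The last call shows the limitation: with two bits flipped the data is corrupt but the parity check does not notice.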

Checksum: A checksum of a message is an arithmetic sum of message code words of a certain
word length, for example byte values, and their carry value. The sum is negated by means of ones'
complement, and stored or transferred as an extra code word extending the message. On the
receiver side, a new checksum may be calculated from the extended message.

If the new checksum is not 0, an error is detected. Checksum schemes include parity bits, check digits,
and longitudinal redundancy checks. Suppose we have a fairly long message, which can reasonably
be divided into shorter words (a 128 byte message, for instance). We can introduce an accumulator
with the same width as a word (one byte, for instance), and as each word comes in, add it to the
accumulator.

When the last word has been added, the contents of the accumulator are appended to the
message (as a 129th byte, in this case). The added word is called a checksum. Now, the receiver
performs the same operation, and checks the checksum. If the checksums agree, we assume the
message was sent without error.
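The ones' complement variant described above can be sketched in Python with one-byte words. The function names and the four-byte message are made up for the illustration; the "end-around carry" step is how the carry value is added back into the sum.

```python
def ones_complement_sum(words, bits: int = 8) -> int:
    """Add the words, folding any carry out of the top bit back into the sum."""
    mask = (1 << bits) - 1
    total = 0
    for w in words:
        total += w
        total = (total & mask) + (total >> bits)   # end-around carry
    return total

def make_checksum(words, bits: int = 8) -> int:
    """Negate the sum by ones' complement (invert every bit)."""
    return ~ones_complement_sum(words, bits) & ((1 << bits) - 1)

def verify(words_with_checksum, bits: int = 8) -> bool:
    """A new checksum over the extended message should come out as 0."""
    return make_checksum(words_with_checksum, bits) == 0

message = [0x12, 0xF0, 0x3C, 0x81]   # hypothetical message bytes
checksum = make_checksum(message)
print(hex(checksum), verify(message + [checksum]))
print(verify([0x13, 0xF0, 0x3C, 0x81, checksum]))   # first byte corrupted
```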

Calculating Checksum

A checksum is determined in one of two ways. Let's say the checksum of a packet is 1 byte long. A
byte is made up of 8 bits, and each bit can be in one of two states, leading to a total of 256 (2^8)
possible combinations. Since the first combination equals zero, a byte can have a maximum value
of 255.

• If the sum of the other bytes in the packet is 255 or less, then the checksum contains that exact
value.

• If the sum of the other bytes is more than 255, then the checksum is the remainder of the total
value after it has been divided by 256.

Let's look at a checksum example:

• Bytes total 1,151
• 1,151 / 256 = 4.496 (round down to 4)
• 4 x 256 = 1,024
• 1,151 - 1,024 = 127 checksum

Cyclic Redundancy Check (CRC): A cyclic redundancy check (CRC) is an error-detecting
code commonly used in digital networks and storage devices to detect accidental changes to raw
data. Blocks of data entering these systems get a short check value attached, based on the
remainder of a polynomial division of their contents. On retrieval the calculation is repeated, and
corrective action can be taken against presumed data corruption if the check values do not match.

CRCs are so called because the check (data verification) value is a redundancy (it expands the
message without adding information) and the algorithm is based on cyclic codes. CRCs are popular
because they are simple to implement in binary hardware, easy to analyze mathematically, and
particularly good at detecting common errors caused by noise in transmission channels. Because
the check value has a fixed length, the function that generates it is occasionally used as a hash
function.
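The polynomial division can be sketched as a bit-by-bit loop. The example below is a minimal 8-bit CRC using the generator polynomial x^8 + x^2 + x + 1 (written 0x07); that polynomial, the zero starting value and the function name are choices made for this illustration, and real systems use a variety of CRC widths and parameters.

```python
def crc8(data: bytes, poly: int = 0x07) -> int:
    """Bitwise CRC-8: the remainder of dividing the message by the polynomial."""
    crc = 0
    for byte in data:
        crc ^= byte                              # bring the next byte into the remainder
        for _ in range(8):
            if crc & 0x80:                       # top bit set: subtract (XOR) the polynomial
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc

message = b"DATA"
check_value = crc8(message)
stored = message + bytes([check_value])          # data with its check value attached

print(crc8(stored) == 0)                               # intact block: remainder is 0
print(crc8(b"DATB" + bytes([check_value])) == 0)       # changed block: remainder is not 0
```

With these parameters, repeating the calculation over the data plus its check value gives 0 when nothing has changed, so a non-zero result signals corruption.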

• Understanding of the concept of Musical Instrument Digital Interface (MIDI) files, JPEG files, MP3 and MP4 files

TEXT FILE FORMATS

ASCII and Unicode


ASCII (American Standard Code for Information Interchange) is a 7-bit character code that was
introduced by the American National Standards Institute (ANSI) and is used by most U.S. personal and
workstation computers.

It is a coding scheme using 7 or 8 bits that assigns numeric values to up to 256 characters, including
letters, numerals, punctuation marks, control characters, and other symbols. ASCII was developed in
1968 to standardize data transmission among disparate hardware and software systems and is built
into most minicomputers and all PCs. ASCII is divided into two sets: 128 characters (standard ASCII)
and an additional 128 (extended ASCII).


Char  Ctrl  Dec  Hex    Char      Dec  Hex    Char  Dec  Hex    Char      Dec  Hex
NUL   ^@    0    00     <space>   32   20     @     64   40     `         96   60
SOH   ^A    1    01     !         33   21     A     65   41     a         97   61
STX   ^B    2    02     "         34   22     B     66   42     b         98   62
ETX   ^C    3    03     #         35   23     C     67   43     c         99   63
EOT   ^D    4    04     $         36   24     D     68   44     d         100  64
ENQ   ^E    5    05     %         37   25     E     69   45     e         101  65
ACK   ^F    6    06     &         38   26     F     70   46     f         102  66
BEL   ^G    7    07     '         39   27     G     71   47     g         103  67
BS    ^H    8    08     (         40   28     H     72   48     h         104  68
HT    ^I    9    09     )         41   29     I     73   49     i         105  69
LF    ^J    10   0A     *         42   2A     J     74   4A     j         106  6A
VT    ^K    11   0B     +         43   2B     K     75   4B     k         107  6B
FF    ^L    12   0C     ,         44   2C     L     76   4C     l         108  6C
CR    ^M    13   0D     -         45   2D     M     77   4D     m         109  6D
SO    ^N    14   0E     .         46   2E     N     78   4E     n         110  6E
SI    ^O    15   0F     /         47   2F     O     79   4F     o         111  6F
DLE   ^P    16   10     0         48   30     P     80   50     p         112  70
DC1   ^Q    17   11     1         49   31     Q     81   51     q         113  71
DC2   ^R    18   12     2         50   32     R     82   52     r         114  72
DC3   ^S    19   13     3         51   33     S     83   53     s         115  73
DC4   ^T    20   14     4         52   34     T     84   54     t         116  74
NAK   ^U    21   15     5         53   35     U     85   55     u         117  75
SYN   ^V    22   16     6         54   36     V     86   56     v         118  76
ETB   ^W    23   17     7         55   37     W     87   57     w         119  77
CAN   ^X    24   18     8         56   38     X     88   58     x         120  78
EM    ^Y    25   19     9         57   39     Y     89   59     y         121  79
SUB   ^Z    26   1A     :         58   3A     Z     90   5A     z         122  7A
ESC   ^[    27   1B     ;         59   3B     [     91   5B     {         123  7B
FS    ^\    28   1C     <         60   3C     \     92   5C     |         124  7C
GS    ^]    29   1D     =         61   3D     ]     93   5D     }         125  7D
RS    ^^    30   1E     >         62   3E     ^     94   5E     ~         126  7E
US    ^_    31   1F     ?         63   3F     _     95   5F     <delete>  127  7F

EBCDIC (Extended Binary Coded Decimal Interchange Code) was developed by IBM for use on their
mainframe computers.

Unicode is a character coding system designed to support the worldwide interchange and display
of written texts of diverse languages by providing a unique number for every character.

Unicode was developed by the Unicode Consortium between 1988 and 1991, originally as a 16-bit
character encoding standard. By using two bytes to represent each character, this design enables almost
all of the written languages of the world to be represented using a single character set. (By contrast,
8-bit ASCII is not capable of representing all of the combinations of letters and diacritical marks that
are used just with the Roman alphabet.) At the time these figures were compiled, approximately 39,000
of the 65,536 possible 16-bit character codes had been assigned, 21,000 of them being used for Chinese
ideographs, with the remaining combinations open for expansion. Later versions of Unicode extend the
code space beyond 16 bits.
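A short Python sketch of the "unique number for every character" idea. The four sample characters are arbitrary choices for the illustration, and UTF-16 is used here only because it stores these characters in two bytes each, matching the 16-bit description; `describe` is a made-up helper name.

```python
def describe(ch: str):
    """Return the Unicode number, its 16-bit binary form and its size in UTF-16."""
    code_point = ord(ch)
    two_byte_form = ch.encode("utf-16-be")       # big-endian, no byte order mark
    return code_point, format(code_point, "016b"), len(two_byte_form)

for ch in ("a", "é", "Я", "中"):
    print(ch, describe(ch))
```

The letter 'a' keeps the same number it has in ASCII (97), while the accented, Cyrillic and Chinese characters get numbers that a 7-bit code cannot hold.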
Expression of denary in BCD and vice versa
http://www.miniwebtool.com/bcd-to-decimal-converter/
http://www.miniwebtool.com/decimal-to-bcd-converter/
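In binary coded decimal (BCD), each denary digit is stored as its own 4-bit group. The conversion in both directions can be sketched in Python as follows; the function names are illustrative and the sketch handles non-negative whole numbers only.

```python
def denary_to_bcd(number: int) -> str:
    """Encode each denary digit as its own 4-bit binary group."""
    if number < 0:
        raise ValueError("this sketch handles non-negative numbers only")
    return " ".join(format(int(digit), "04b") for digit in str(number))

def bcd_to_denary(bcd: str) -> int:
    """Decode space-separated 4-bit groups back into a denary number."""
    digits = []
    for nibble in bcd.split():
        value = int(nibble, 2)
        if value > 9:
            raise ValueError(f"{nibble} is not a valid BCD digit")
        digits.append(str(value))
    return int("".join(digits))

print(denary_to_bcd(97))
print(bcd_to_denary("1001 0111"))
```

Note that this differs from plain binary: 97 in BCD is 1001 0111 (a 9 and a 7), whereas 97 as a single binary number is 0110 0001.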

Data Representation Types


DATA TYPE | STORAGE ALLOCATION | VALUE RANGE
Boolean | Depends on implementing platform | True or False
Byte | 1 byte | 0 through 255 (unsigned)
Char | 2 bytes | 0 through 65535 (unsigned)
Date | 8 bytes | 0:00:00 (midnight) on January 1, 0001 through 11:59:59 PM on December 31, 9999
Decimal | 16 bytes | 0 through +/-79,228,162,514,264,337,593,543,950,335 (+/-7.9...E+28) with no decimal point; 0 through +/-7.9228162514264337593543950335 with 28 places to the right of the decimal
Double | 8 bytes | -1.79769313486231570E+308 through -4.94065645841246544E-324 for negative values; 4.94065645841246544E-324 through 1.79769313486231570E+308 for positive values
Integer | 4 bytes | -2,147,483,648 through 2,147,483,647 (signed)
Long | 8 bytes | -9,223,372,036,854,775,808 through 9,223,372,036,854,775,807 (signed)
Object | 4 bytes on 32-bit platform; 8 bytes on 64-bit platform | Any type can be stored in a variable of type Object
SByte | 1 byte | -128 through 127 (signed)
Short | 2 bytes | -32,768 through 32,767 (signed)
Single | 4 bytes | -3.4028235E+38 through -1.401298E-45 for negative values; 1.401298E-45 through 3.4028235E+38 for positive values
String | Depends on implementing platform | 0 to approximately 2 billion Unicode characters
UInteger | 4 bytes | 0 through 4,294,967,295 (unsigned)
ULong | 8 bytes | 0 through 18,446,744,073,709,551,615 (unsigned)
User-Defined | Depends on implementing platform | Each member of the structure has a range determined by its data type and independent of the ranges of the other members
UShort | 2 bytes | 0 through 65,535 (unsigned)
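The whole-number ranges in the table follow directly from the storage size. A minimal Python sketch, assuming the usual two's complement layout for signed types (the function name is illustrative):

```python
def integer_range(size_bytes: int, signed: bool):
    """Smallest and largest whole number that fit in the given number of bytes."""
    bits = size_bytes * 8
    if signed:
        # One bit pattern in two's complement is used for the extra negative value.
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1

print("SByte   ", integer_range(1, signed=True))
print("Short   ", integer_range(2, signed=True))
print("Integer ", integer_range(4, signed=True))
print("UInteger", integer_range(4, signed=False))
```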

.DOC — Microsoft Word DOCument



Microsoft Word is the most popular word processing software in the world. You probably won’t come
across loads of doc files, but if you do it can be annoying if you haven’t got a program that can
open them.

.TXT — TeXT file



The most basic of files, it’s just some text. You no doubt already have either NotePad, SimpleText, or
browser

.PDF — Portable Document Format


Adobe Acrobat files were invented so that documents could be transferred between computers
and indeed platforms, and still look the exact same, something which can’t be said about HTML
files...
Program: Acrobat Reader

IMAGE FILE FORMATS

Bitmaps

A large part of using modern computers involves sending pictures and films to each other, along
with using a graphical user interface. All of this involves computers saving and processing images.

This section will cover the two main image types: vector and bitmap, along with some compression
techniques.

Bitmap Graphics - a collection of pixels from an image mapped to specific memory locations holding
their binary colour value.

Pixel - the smallest possible addressable area defined by a solid colour, represented as binary, in an
image.

Resolution
• Image Resolution - how many pixels an image contains per inch/cm
• Screen Resolution - the number of pixels per row by the number of pixels per column

Video Display Formats


Calculating screen resolutions


Using the diagram above we are going to work out how many pixels are required to display a single
frame on a VGA screen.

Checking the resolution:


Height = 480
Width = 640
Area = Width * Height = Total Pixels
Area = 640 * 480 = 307200

Colour depth - The number of bits used to represent the colour of a single pixel

Colour depth | 1 bit | 2 bit | 4 bit | 8 bit | 24 bit
Description | Monochrome, only stores black and white | Stores 4 colours: RGB(70,61,55), RGB(79,146,85), RGB(129,111,134), RGB(149,146,166) | Stores limited colours | Close to reality | Hard to see any difference from reality
Number of colours per pixel | 2 | 4 | 16 | 256 | 16,777,216

It seems pretty obvious that the higher the colour depth, the closer the picture will look to reality.
Why then don't we just ramp up the colour depth on every image that we make? The answer is that,
for a fixed resolution, the higher the colour depth, the larger the file size.


Calculating file size for different colour depths


All the images above are of the same resolution:
300*225 = 67500 pixels

If the first image uses 1 bit to store the colour for each pixel, then the image size would be:
Number of Pixels * Colour Depth = Image Size
67500 * 1 bit = 67500 bits

If the second image uses 2 bits to store the colour for each pixel, then the image size would be:
Number of Pixels * Colour Depth = Image Size
67500 * 2 bits = 135000 bits
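The same calculation can be written as a small Python function and applied to any resolution and colour depth (the function name is illustrative):

```python
def image_size_bits(width: int, height: int, colour_depth: int) -> int:
    """Number of pixels multiplied by the bits stored for each pixel."""
    return width * height * colour_depth

print(640 * 480)                       # total pixels in one VGA frame
print(image_size_bits(300, 225, 1))    # 1 bit per pixel
print(image_size_bits(300, 225, 2))    # 2 bits per pixel
print(image_size_bits(300, 225, 24))   # 24 bits per pixel
```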

Vector graphics
Vector graphics are graphics in which the image is represented in a mathematical fashion. What
this allows one to do is to zoom in an image to infinite precision. They are ideal for situations in which
an image might be used at various resolutions and dimensions.

Raster graphics
Raster graphics are of a fixed dimension, somewhat like a grid pattern with specified values at each
point. These graphics are the default for things from the real world (e.g. scanned images, photographs,
etc). They are ideal for use when an image will only be used once, and will never need to be
enlarged, or if portions are coming from a photograph or other real-world image.

Differences between Raster(bitmaps) and vector graphics
• Vector images scale without file size increase / decrease
• Bitmap images scale resulting in file size increase / decrease
• Vector images scale without distortion to the image
• Bitmap images distort (pixellate) when scaling
• Bitmaps are better for photo editing
• Bitmaps require less processing power to display

.GIF — Graphics Interchange Format


The most common image format on the Internet. Good for simple images. Your browser can display
them, or any image editor.

.JPG/ .JPEG — Joint Photographic Experts Group file


Another very common image file format, mainly used for photos. Your browser can show them, or an
image editor.

Look at the following five photographs of the same car wheel:

The resolution of the photographs is reduced from A to E. Photographs A and B are very sharp whilst
photograph D is very fuzzy and E is almost unrecognisable. This is the result of changing the number
of PIXELS per centimetre used to store the image (that is, reducing the PICTURE RESOLUTION).

When a photographic file undergoes file compression, the size of the file is reduced. The trade-off for
this reduced file size is reduced quality of the image. One of the file formats used to reduce
photographic file sizes is known as JPEG. This is another example of lossy file compression. As with
the MP3 format, once the image is subjected to the JPEG compression algorithm, a new file is formed
and the original file can no longer be reconstructed. JPEG will reduce the RAW BITMAP image by a
factor of between 5 and 15 depending on the quality of the original.

An image that is 2048 pixels wide and 1536 pixels high is equal to 2048 × 1536 pixels; in other words,
3 145 728 pixels. This is often referred to as a 3-megapixel image (although it is obviously slightly larger).
A raw bitmap can often be referred to as a TIFF or BMP image (file extension .TIF or .BMP). The file size
of this image is determined by the number of pixels. In the previous example, a 3-megapixel image
would be 3 megapixels × 3 colours. In other words, 9 megabytes (each pixel occupies 3 bytes
because it is made up of the three main colours: red, green and blue). TIFF and BMP are the highest
image quality because, unlike jpeg, they are not in a compressed format. The same image stored in
jpeg format would probably occupy between 0.6 megabytes and 1.8 megabytes.
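The figures in this example can be reproduced with a few lines of Python. The sketch takes a megabyte to be 1024 x 1024 bytes, which is an assumption made so that the result comes out at exactly 9; the function name is illustrative.

```python
def raw_bitmap_bytes(width: int, height: int, bytes_per_pixel: int = 3) -> int:
    """Uncompressed size: one byte each for red, green and blue per pixel."""
    return width * height * bytes_per_pixel

MEGABYTE = 1024 * 1024   # assumption: binary megabyte

raw = raw_bitmap_bytes(2048, 1536)
print(2048 * 1536)                       # number of pixels
print(raw / MEGABYTE)                    # raw bitmap size in megabytes
print(round(raw / 15 / MEGABYTE, 1))     # JPEG estimate, reduced by a factor of 15
print(round(raw / 5 / MEGABYTE, 1))      # JPEG estimate, reduced by a factor of 5
```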

JPEG relies on certain properties of the human eye and, up to a point, a certain amount of file
compression can take place without any real loss of quality. The human eye is limited in its ability to
detect very slight differences in brightness and in colour hues. For example, some computer imaging
software boasts that it can produce over 40 million different colours, but the human eye is only able to
differentiate about 10 million colours.

.PNG — Portable Network Graphics


PNGs are a file format designed to be used in place of GIFs. They are usually slightly smaller, and
sport advanced features like alpha-channel transparency and 24-bit colour support. Your browser
can view them.

.TIFF — Tagged Image File Format


For really high quality images, TIFFs are used, but cannot be viewed through a browser.
Program: image editor.

SOUND FILE FORMATS



.MIDI/ .MID — Musical Instrument Digital Interface


Midis are sequenced music files made on keyboards. They’re usually really small and often sound
great, although it largely depends on your soundcard. Midi collections are one of the few places in
the world where you can find classic game and movie music, and for that I salute them.
Program: WinAmp.

MUSICAL INSTRUMENT DIGITAL INTERFACE (MIDI) is usually associated with the storage of music files.
However, MIDI files are not music and don’t contain any sounds; they are very different to, for
example, MP3 files. MIDI is essentially a communications protocol that allows electronic musical
instruments to interact with each other. (The MIDI protocol uses 8-bit serial transmission with one start
bit and one stop bit, and is therefore asynchronous.)

A MIDI file consists of a list of commands that instruct a device (for example, an electronic organ,
sound card in a computer or in a mobile phone) how to produce a particular sound or musical note.
Each MIDI command has a specific sequence of bytes. The first byte is the status byte – this informs
the MIDI device what function to perform. Encoded in the status byte is the MIDI channel. MIDI
operates on 16 different channels, which are numbered 0 to 15.

Examples of MIDI commands include:
• Note on/off: this indicates that a key (on an electronic keyboard) has been pressed/released to
produce/stop producing a musical note
• Key pressure: this indicates how hard the key has been pressed (this could indicate loudness of
the music note or whether any vibrato has been used, and so on).

Two additional bytes are required, a PITCH BYTE, which tells the MIDI device which note to play, and
a VELOCITY BYTE, which tells the device how loud to play the note. When music or sound is recorded
on a computer system, these MIDI messages are saved in a file which is recognised by the file
extension .mid.
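The three-byte structure of a note message can be sketched in Python. In the MIDI standard the upper four bits of the status byte carry the command (0x9 for note on, 0x8 for note off) and the lower four bits carry the channel; the function name, the choice of note 60 (middle C) and the velocity of 100 are made up for the illustration.

```python
NOTE_ON = 0x90    # command in the upper four bits of the status byte
NOTE_OFF = 0x80

def midi_message(command: int, channel: int, pitch: int, velocity: int) -> bytes:
    """Build a three-byte message: status byte, pitch byte, velocity byte."""
    if not 0 <= channel <= 15:
        raise ValueError("MIDI channels are numbered 0 to 15")
    if not (0 <= pitch <= 127 and 0 <= velocity <= 127):
        raise ValueError("pitch and velocity use 7 bits (0 to 127)")
    status = command | channel        # the channel is encoded in the status byte
    return bytes([status, pitch, velocity])

press = midi_message(NOTE_ON, channel=0, pitch=60, velocity=100)
release = midi_message(NOTE_OFF, channel=0, pitch=60, velocity=0)
print(press.hex(" "), "|", release.hex(" "))
```

A whole note is described by these few bytes, with no recorded sound, which is why MIDI files stay so small.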

If this .mid file is played back through a musical instrument, such as an electronic keyboard, the music
will be played back in an identical way to the original. The whole piece of music will have been
stored as a series of commands but no actual musical notes. This makes it a very versatile file
structure, since the same file could be fed back through a different electronic instrument, such as
an electric guitar, with different effects to the original. However, to play back through an instrument
such as a guitar would need the use of SEQUENCER SOFTWARE, since the MIDI files wouldn’t be
recognised in their ‘raw’ form.

Both the electronic instruments and the computer need a MIDI interface to allow them to ‘talk’ to
each other. It was mentioned earlier that MIDI operates on 16 channels. In fact, the computer
can send data out on all 16 MIDI channels at the same time. For example, 16 MIDI devices, each set
up for a different MIDI channel, could be connected to the computer. Each device could be playing
a separate line in a song from the sequencer software, effectively creating an electronic orchestra.
This implementation is being used more and more today in the recording studio, by major orchestras
and in musical scores used in films.

Because MIDI files don’t contain any audio tracks, their size, compared with an MP3 file, is
considerably smaller. For example, music stored as a 10 megabyte MP3 file requires only about
10 kilobytes when using the MIDI format. This makes MIDI files ideal for devices where memory is an
issue; for example, storing ring tones on a mobile phone.

.MP3 — MPEG Layer 3 sound file


MP3 single-handedly caused a revolution. It is a highly compressed sound file format, which gives
download-friendly file sizes with very good sound quality. It has caused much grief for the music
industry, as songs are now small enough to be traded online.
Program: WinAmp

MP3 (MPEG-1 Audio Layer 3) uses technology known as AUDIO COMPRESSION to convert music and other
sounds into an MP3 file format. Essentially, this compression technology will reduce the size of a
normal music file by about 90 per cent. For example, 80 megabytes of music from a CD can be reduced
to 8 megabytes using MP3 technology.

MP3 files are used in MP3 players, computers or mobile phones. Files can be downloaded from the
internet, or CDs can be converted to MP3 format. The CD files are converted using FILE
COMPRESSION software. Whilst the music quality can never match the ‘full’ version found on a CD,
the quality is satisfactory for most general purposes.

But how can the original music file be reduced by 90 per cent whilst still retaining most of the music
quality? This is done using file compression algorithms which use PERCEPTUAL MUSIC SHAPING; this
essentially removes sounds that the human ear can’t hear properly. For example, if two sounds are
played at the same time, only the louder one can be heard by the ear, so the softer sound is
eliminated. This means that certain parts of the music can be removed without affecting the quality
too much. MP3 files use what is known as a LOSSY FORMAT since part of the original file is lost following
the compression algorithm. This means that the original file can’t be put back together again.
However, even the quality of MP3 files can be different since it depends on the BIT RATE – this is the
number of bits per second used when creating the file. Bit rates are roughly between 80 and 320
kilobits per second; usually 200 or higher gives a sound quality close to a normal CD.
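
The link between bit rate and file size can be worked through with a short calculation. This is a rough sketch: the CD figures (44 100 samples per second, 16 bits per sample, 2 channels) are the standard audio CD values rather than numbers from this section, the four-minute duration is an arbitrary example, and the function name audio_size_bytes is hypothetical.

```python
def audio_size_bytes(bits_per_second, seconds):
    """File size in bytes for audio stored at a constant bit rate."""
    return bits_per_second * seconds // 8


# Assumed CD audio parameters: sample rate x bits per sample x channels.
CD_BITS_PER_SECOND = 44_100 * 16 * 2

duration = 4 * 60  # an example four-minute song
cd_size = audio_size_bytes(CD_BITS_PER_SECOND, duration)
print(f"CD audio: {cd_size / 1_000_000:.1f} MB")

for kbps in (80, 200, 320):
    mp3_size = audio_size_bytes(kbps * 1000, duration)
    share = 100 * mp3_size / cd_size
    print(f"{kbps} kbit/s MP3: {mp3_size / 1_000_000:.1f} MB ({share:.0f}% of CD size)")
```

Under these assumptions, a reduction of about 90 per cent corresponds to a bit rate of roughly 141 kilobits per second, which sits inside the 80 to 320 range quoted above.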

.MP4 – MPEG-4 Part 14


MPEG-4 Part 14 or MP4 is a digital multimedia format most commonly used to store video and audio,
but it can also be used to store other data such as subtitles and still images. Like most modern
container formats, it allows streaming over the Internet. The only official filename extension for
MPEG-4 Part 14 files is .mp4, but many files have other extensions, most commonly .m4a and .m4p.
M4A (audio only) is often compressed using AAC encoding (lossy), but can also be in Apple Lossless
format.

Some devices advertised as "MP4 Players" are simply MP3 Players that also play AMV video or some
other video format, and do not necessarily play the MPEG-4 Part 14 format.

MPEG-4 (MP4) files are slightly different to MP3 files. This format allows the storage of multimedia files
rather than just sound. Music, videos, photos and animation can all be stored in the MP4 format.
Videos, for example, could be streamed over the internet using the MP4 format without losing any
real discernible quality.

.RAM — Real Audio Movie


RealNetworks created formats for streaming audio and video, and gave away free players for the
formats, before allowing themselves to become so smothered in advertising that everyone with
sense decided to stop using their programs. You might still come across RealAudio files around the
net.
Program: Real One Player

.WAV — WAVe sound file


A basic sound file with little or no compression, usually used for short sound samples.
Your computer will be able to play these anyway (the sound it plays when it turns on is a .wav file).
Program: WinAmp for more power.

VIDEO FILE FORMATS

.AVI — Audio/Video Interleaved


Standard video format supported on the Windows platform. AVI files do not stream, however, so you
have to download the entire file before you can watch any of it.
Program: Windows Media Player, WinAmp or QuickTime.

.MPEG/ .MPG — Moving Picture Experts Group file


One of the standards for streaming movies.
Program: WinAmp can play movies. Or you could use Windows Media Player, by Microsoft.

.MOV/ .QT — QuickTime MOVie


The QuickTime format was designed by Apple and originated on the Mac, but has made the
transition to the PC and is hugely popular.
Program: QuickTime Player

APPLICATION/PROGRAM FILE FORMATS

.EXE — EXEcutable file


If you download a program that you need to install, it will likely come as an .exe file.
Just double-click it to install the program on your PC. Be careful of viruses: only run executable
files from sources you trust.

WEB FILE FORMATS

.HTML/ .HTM — HyperText Markup Language file
Most pages you create for a website will be HTML files.

.CSS — Cascading Style Sheet
CSS files are a tool in the repertoire of webmasters that take care of how their websites look.
CSS files can be created or edited in any text editor, like Notepad. Try TopStyle Lite too; it
includes loads of selectors for easy editing.

COMPRESSION FILE FORMATS

.RAR — RAR archive
This is a compressed file format similar to the popular .zip format. It sports advanced functions like
special multimedia compression and has many benefits over zip files.
Program: WinRAR to take care of your RAR archives, and it can handle other archive types too.

.ZIP — ZIPped file
Zipped files are really groups of other types of files kept together in one archive and compressed.
Many downloads will consist of zip collections, so be sure to have something to open them with.
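
A small sketch of grouping several files into one compressed archive, using the zipfile module from Python's standard library. The file names and contents are made up for the example, and the archive is built in memory rather than on disk.

```python
import io
import zipfile

# Hypothetical files to bundle together (name -> contents).
files = {
    "notes.txt": b"lossless compression " * 200,
    "style.css": b"p { color: black; }\n" * 100,
}

# Write both files into a single compressed archive held in memory.
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    for name, data in files.items():
        zf.writestr(name, data)

# Open the archive again and extract every member.
with zipfile.ZipFile(archive) as zf:
    restored = {name: zf.read(name) for name in zf.namelist()}

original_size = sum(len(data) for data in files.values())
archive_size = len(archive.getvalue())
print(original_size, "bytes ->", archive_size, "bytes; identical:", restored == files)
```

The extracted files match the originals byte for byte, so nothing is lost by zipping and unzipping.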

• Understanding of the principles of data compression (lossless and lossy compression
algorithms) applied to music/video, photos and text files

Compression

Why compress files?


Processing power and storage space are very valuable on a computer. To get the best out of both,
we may need to reduce the file size of text, image and audio data so that it can be transferred
more quickly and takes up less storage space.

In addition, large files take a lot longer to download or upload which leads to web pages, songs and
videos that take longer to load and play when using the internet.
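
The effect of file size on transfer time is simple arithmetic. The sketch below assumes a hypothetical 8 megabit per second connection and ignores protocol overhead; the function name transfer_seconds is also just for illustration.

```python
def transfer_seconds(size_bytes, link_bits_per_second):
    """Ideal transfer time: total bits divided by the link speed."""
    return size_bytes * 8 / link_bits_per_second


link = 8_000_000  # hypothetical 8 Mbit/s connection

for label, size in (("10 MB original", 10_000_000), ("1 MB compressed", 1_000_000)):
    print(f"{label}: {transfer_seconds(size, link):.1f} s")
```

Reducing the file to a tenth of its size cuts the ideal transfer time by the same factor.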

Any kind of data can be compressed. There are two main types of compression: lossy and lossless.

Lossy compression
Lossy compression removes some of a file’s original data in order to reduce the file size. This might
mean reducing the number of colours in an image or reducing the number of samples in a sound
file. This can result in a small loss of quality of an image or sound file.

A popular lossy compression method for images is JPEG, which is why most images on the internet
are JPEG images. A popular lossy compression method for sounds is MP3. Once a file has been
compressed using lossy compression, the discarded data cannot be retrieved again.
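
The two simple ideas mentioned above (fewer colour levels, fewer samples) can be sketched in a few lines. This is a toy illustration only; it is not how JPEG or MP3 work internally, and the function names are hypothetical.

```python
def reduce_levels(values, bits):
    """Keep only the top `bits` bits of each 8-bit value (fewer distinct levels)."""
    shift = 8 - bits
    return [(v >> shift) << shift for v in values]


def halve_samples(samples):
    """Keep every second sample, discarding the rest."""
    return samples[::2]


pixels = [12, 13, 14, 200, 201, 203]
print(reduce_levels(pixels, 4))   # [0, 0, 0, 192, 192, 192]

sound = [1, 2, 3, 4, 5]
print(halve_samples(sound))       # [1, 3, 5]
```

After reduce_levels, the values 12, 13 and 14 have all become 0, so there is no way to tell which was which. That is why the discarded data cannot be retrieved.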

Lossless compression
Lossless compression doesn’t reduce the quality of the file at all. No data is lost, so lossless
compression allows a file to be recreated exactly as it was when originally created.

There are various algorithms for doing this, usually by looking for patterns in the data that are
repeated. Zip files are an example of lossless compression.

The space savings of lossless compression are not as good as they are with lossy compression.
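
One of the simplest pattern-based lossless methods is run-length encoding (RLE), which replaces a run of repeated symbols with the symbol and a count. The sketch below is an illustration of the general idea only; zip files use more sophisticated algorithms, and the function names here are hypothetical.

```python
from itertools import groupby


def rle_encode(text):
    """Turn runs of repeated characters into (character, count) pairs."""
    return [(ch, len(list(run))) for ch, run in groupby(text)]


def rle_decode(pairs):
    """Rebuild the original text from (character, count) pairs."""
    return "".join(ch * count for ch, count in pairs)


data = "A" * 6 + "B" * 3 + "C" * 8 + "D"
encoded = rle_encode(data)
print(encoded)  # [('A', 6), ('B', 3), ('C', 8), ('D', 1)]
print(rle_decode(encoded) == data)  # True
```

The decoded text is identical to the original, which is what makes the method lossless. It only saves space when the data actually contains long runs.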

[Figure: two panels labelled "Lossy" and "Lossless"]

Data Compression
In digital signal processing, data compression, source coding, or bit-rate reduction involves
encoding information using fewer bits than the original representation. Compression can be either
lossy or lossless. Lossless compression reduces bits by identifying and eliminating statistical
redundancy. No information is lost in lossless compression. Lossy compression reduces bits by
identifying unnecessary information and removing it. The process of reducing the size of a data file
is referred to as data compression.
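
The role of statistical redundancy can be seen by compressing two inputs of the same length with zlib from Python's standard library: one highly repetitive, one made of random bytes. The inputs are invented for this sketch.

```python
import os
import zlib

repetitive = b"ABCD" * 2500        # 10,000 bytes with obvious redundancy
random_like = os.urandom(10_000)   # 10,000 bytes with almost no redundancy

for label, data in (("repetitive", repetitive), ("random", random_like)):
    packed = zlib.compress(data)
    # Lossless: decompressing gives back exactly the original bytes.
    assert zlib.decompress(packed) == data
    print(f"{label}: {len(data)} bytes -> {len(packed)} bytes")
```

The repetitive input shrinks dramatically, while the random input barely changes in size (it may even grow slightly), because there is no redundancy to remove.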

Image Compression
Image compression may be lossy or lossless. Lossless compression is preferred for archival purposes
and often for medical imaging, technical drawings, clip art, or comics. Lossy compression methods,
especially when used at low bit rates, introduce compression artifacts. Lossy methods are especially
suitable for natural images such as photographs in applications where minor (sometimes
imperceptible) loss of fidelity is acceptable to achieve a substantial reduction in bit rate. Lossy
compression that produces imperceptible differences may be called visually lossless.

Audio Compression

Audio data compression, as distinguished from dynamic range compression, has the potential to
reduce the transmission bandwidth and storage requirements of audio data. Audio compression
algorithms are implemented in software as audio codecs. Lossy audio compression algorithms
provide higher compression at the cost of fidelity and are used in numerous audio applications.
These algorithms almost all rely on psychoacoustics to eliminate less audible or meaningful sounds,
thereby reducing the space required to store or transmit them.
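
The idea of discarding less audible sounds can be illustrated with a deliberately crude sketch: among components sounding at the same time, drop any that are much quieter than the loudest one. Real psychoacoustic models are far more detailed (they take frequency and timing into account); the 10 per cent threshold and the function name drop_masked are assumptions made for this example.

```python
def drop_masked(components, threshold=0.1):
    """Keep only components at least `threshold` times as loud as the loudest.

    components: list of (frequency_hz, amplitude) pairs sounding together.
    """
    if not components:
        return []
    loudest = max(amplitude for _, amplitude in components)
    return [(f, a) for f, a in components if a >= threshold * loudest]


frame = [(440, 1.0), (450, 0.05), (880, 0.4), (3000, 0.02)]
print(drop_masked(frame))  # [(440, 1.0), (880, 0.4)]
```

Two of the four components are removed, so less data has to be stored, but the removed components cannot be recovered afterwards.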

In both lossy and lossless compression, information redundancy is reduced, using methods such as
coding, pattern recognition, and linear prediction to reduce the amount of information used to
represent the uncompressed data.

• Lossless audio compression produces a representation of digital data that decompresses to an
exact digital duplicate of the original audio stream, unlike playback from lossy compression
techniques such as Vorbis and MP3. Compressed files are around 50–60% of the original size,
which is similar to the figure for generic lossless data compression. Lossless compression is unable to
attain high compression ratios due to the complexity of waveforms and the rapid changes in
sound forms.

• Lossy audio compression is used in a wide range of applications. In addition to the direct
applications (MP3 players or computers), digitally compressed audio streams are used in most
video DVDs, digital television, streaming media on the internet, satellite and cable radio, and
increasingly in terrestrial radio broadcasts. Lossy compression typically achieves far greater
compression than lossless compression (data of 5 percent to 20 percent of the original stream,
rather than 50 percent to 60 percent), by discarding less-critical data.

Video Compression
Video compression uses modern coding techniques to reduce redundancy in video data. Most
video compression algorithms and codecs combine spatial image compression and temporal
motion compensation. Video compression is a practical implementation of source coding in
information theory. In practice, most video codecs also use audio compression techniques in parallel
to compress the separate, but combined data streams as one package.
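
Temporal redundancy means that consecutive frames are usually very similar. The sketch below uses plain frame differencing, storing only the pixels that changed since the previous frame. This is a much simpler stand-in for the motion compensation used by real codecs, and the frames and function names are invented for the example.

```python
def frame_delta(previous, current):
    """Return {pixel_index: new_value} for pixels that differ between frames."""
    return {i: new for i, (old, new) in enumerate(zip(previous, current)) if old != new}


def apply_delta(previous, delta):
    """Rebuild the current frame from the previous frame plus the changes."""
    frame = list(previous)
    for index, value in delta.items():
        frame[index] = value
    return frame


frame1 = [10, 10, 10, 10, 50, 50, 10, 10]
frame2 = [10, 10, 10, 10, 10, 50, 50, 10]  # the bright patch moved one pixel right

delta = frame_delta(frame1, frame2)
print(delta)  # {4: 10, 6: 50}
print(apply_delta(frame1, delta) == frame2)  # True
```

Only two of the eight pixel values need to be stored for the second frame.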

The majority of video compression algorithms use lossy compression. Uncompressed video requires a
very high data rate. Although lossless video compression codecs achieve an average compression
factor of over 3, a typical MPEG-4 lossy compression video has a compression factor between 20
and 200. As in all lossy compression, there is a trade-off between video quality, cost of processing
the compression and decompression, and system requirements. Highly compressed video may
present visible or distracting artifacts.
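
To see why uncompressed video needs such a high data rate, the calculation below works out the raw rate for one example format and applies the compression factors quoted above. The resolution, colour depth and frame rate (1920 × 1080, 24 bits per pixel, 30 frames per second) are assumptions chosen for the example.

```python
def raw_video_bits_per_second(width, height, bits_per_pixel, frames_per_second):
    """Data rate of uncompressed video."""
    return width * height * bits_per_pixel * frames_per_second


raw = raw_video_bits_per_second(1920, 1080, 24, 30)
print(f"uncompressed: {raw / 1_000_000:.0f} Mbit/s")

for label, factor in (("lossless, factor 3", 3),
                      ("lossy, factor 20", 20),
                      ("lossy, factor 200", 200)):
    print(f"{label}: {raw / factor / 1_000_000:.1f} Mbit/s")
```

Under these assumptions, the raw stream is close to 1.5 gigabits per second, which is why almost all video is stored and transmitted in compressed form.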

Codec
A codec is a device or computer program capable of encoding or decoding a digital data stream
or signal. Codec is a portmanteau of coder-decoder or, less commonly, compressor-decompressor.

A codec encodes a data stream or signal for transmission, storage or encryption, or decodes it for
playback or editing. Codecs are used in videoconferencing, streaming media and video editing
applications. A video camera's analog-to-digital converter (ADC) converts its analog signals into
digital signals, which are then passed through a video compressor for digital transmission or storage.
A receiving device then runs the signal through a video decompressor, then a digital-to-analog
converter (DAC) for analog display.

• Audio Codec
An audio codec is a device or computer program capable of coding or decoding a digital data
stream of audio.

In software, an audio codec is a computer program implementing an algorithm that compresses
and decompresses digital audio data according to a given audio file or streaming media audio
coding format. The objective of the algorithm is to represent the high-fidelity audio signal with
the minimum number of bits while retaining the quality. This can effectively reduce the storage space
and the bandwidth required for transmission of the stored audio file. Most codecs are
implemented as libraries which interface to one or more multimedia players.

In hardware, audio codec refers to a single device that encodes analog audio as digital signals
and decodes digital back into analog. In other words, it contains both an analog-to-digital
converter (ADC) and a digital-to-analog converter (DAC) running off the same clock. This is used
in sound cards that support both audio in and out, for instance.
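
The ADC and DAC steps can be imitated in software. In this sketch an "analog" value between -1.0 and 1.0 is quantised to an 8-bit number and then converted back; the 8-bit depth, the test signal and the function names adc and dac are all assumptions made for the illustration.

```python
import math

LEVELS = 256  # assumed 8-bit samples


def adc(sample):
    """Quantise a value in the range -1.0 to 1.0 to an integer 0..255."""
    clipped = max(-1.0, min(1.0, sample))
    return round((clipped + 1.0) / 2.0 * (LEVELS - 1))


def dac(code):
    """Convert an integer 0..255 back to a value in the range -1.0 to 1.0."""
    return code / (LEVELS - 1) * 2.0 - 1.0


# One cycle of a sine wave, sampled 16 times.
analogue = [math.sin(2 * math.pi * n / 16) for n in range(16)]
digital = [adc(s) for s in analogue]
restored = [dac(c) for c in digital]

worst_error = max(abs(a - r) for a, r in zip(analogue, restored))
print(digital)
print(f"largest rounding error: {worst_error:.4f}")
```

The restored values are close to the originals but not identical, because each sample has been rounded to one of 256 levels. Using more bits per sample reduces this rounding error.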

• Video Codec
A video codec is an electronic circuit or software that compresses or decompresses digital video,
thus converting raw (uncompressed) digital video to a compressed format or vice-versa. In the
context of video compression, "codec" is a concatenation of "encoder" and "decoder"; a device
that can only compress is typically called an encoder, and one that can only decompress is
known as a decoder.

The format of the compressed data usually conforms to a standard video compression
specification. The compression is typically lossy, meaning that the compressed video lacks some
of the information present in the original video. A consequence of this is that decompressed
video has lower quality than the original, uncompressed video because there is insufficient
information to accurately reconstruct the original video.

There are complex relationships between the video quality, the amount of data used to
represent the video (determined by the bit rate), the complexity of the encoding and decoding
algorithms, sensitivity to data losses and errors, ease of editing, random access, and end-to-end
delay (latency).
