Data Representation B


Why should the characters "x" and "z" take up
the same number of bits as "e" or " "?

Huffman codes use variable-length bit strings
to represent each character

More frequently used letters have shorter
strings to represent them

"ballboard" would be encoded as
1010001001001010110001111011

Encode "roadbed":
1111100010111010011011
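
The code table behind these two bit strings is not reproduced in this text. The strings are consistent with the table A=00, E=01, L=100, O=110, R=111, B=1010, D=1011, so the sketch below assumes that table. It is an illustration of the encoding step only, and the name `encode` is hypothetical.

```python
# Assumed code table (not given in the text); chosen because it reproduces
# the bit strings shown for "ballboard" and "roadbed".
CODES = {"A": "00", "E": "01", "L": "100", "O": "110",
         "R": "111", "B": "1010", "D": "1011"}

def encode(word):
    """Concatenate the variable-length bit string of each character."""
    return "".join(CODES[ch] for ch in word.upper())

print(encode("ballboard"))
print(encode("roadbed"))
```

Frequent letters such as A and E cost two bits each under this table, while the rarer B and D cost four.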

In Huffman encoding no character's bit string
is the prefix of any other character's bit
string – this is called the prefix property

To decode
◦ look for match left to right, bit by bit
◦ record letter when a match is found
◦ begin where you left off, going left to right
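
The decoding steps above can be sketched in Python. The code table is an assumption (the original table is not reproduced in this text), picked because it matches the encoded examples for "ballboard" and "roadbed". The name `decode` is hypothetical.

```python
# Assumed code table (not given in the text); consistent with the
# encoded examples for "ballboard" and "roadbed".
CODES = {"A": "00", "E": "01", "L": "100", "O": "110",
         "R": "111", "B": "1010", "D": "1011"}
DECODE = {bits: ch for ch, bits in CODES.items()}

def decode(bitstring):
    """Scan left to right; emit a letter whenever the buffer matches a code."""
    out, buffer = [], ""
    for bit in bitstring:
        buffer += bit
        if buffer in DECODE:        # unambiguous thanks to the prefix property
            out.append(DECODE[buffer])
            buffer = ""             # begin again where we left off
    if buffer:
        raise ValueError("trailing bits do not form a complete code")
    return "".join(out)

print(decode("1111100010111010011011"))
```

Because no code is a prefix of another, the first match found is always the right one and no separators are needed between letters.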

Try it!

Decode 1011111001010

Answer: DRAB

The technique for determining the codes guarantees
their prefix property

Two types of codes
◦ general, based on use of letters in English (Spanish,
etc.)
◦ specialized, based on text itself or specific types of
text

• Consider a data stream where there are only
4 characters: A, B, C, D

• They occur with frequency 0.40, 0.35, 0.20, and
0.05 respectively.

• A simple two-bit coding could be used:
00, 01, 10, 11; but then we use two bits all
the time.
 Instead we design a binary graph:
root (1.00)
  0: A (0.40)              code 0
  1: node (0.60)
     0: B (0.35)           code 10
     1: node (0.25)
        0: C (0.20)        code 110
        1: D (0.05)        code 111
• Note that the coding is now:
A 0
B 10
C 110
D 111
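
One common way to arrive at such a tree is Huffman's greedy construction: repeatedly merge the two least frequent nodes until one remains. The Python sketch below is an illustration under that assumption, and the function name `huffman_codes` is hypothetical. Its 0/1 labelling may differ from the table above, but the code lengths come out the same.

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Build a prefix code by repeatedly merging the two rarest nodes."""
    tiebreak = count()   # unique counter so the heap never compares dicts
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p_lo, _, lo = heapq.heappop(heap)    # least frequent node
        p_hi, _, hi = heapq.heappop(heap)    # next least frequent node
        merged = {s: "1" + c for s, c in lo.items()}
        merged.update({s: "0" + c for s, c in hi.items()})
        heapq.heappush(heap, (p_lo + p_hi, next(tiebreak), merged))
    return heap[0][2]

codes = huffman_codes({"A": 0.40, "B": 0.35, "C": 0.20, "D": 0.05})
for symbol in sorted(codes):
    print(symbol, codes[symbol])
```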
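• Sometimes we use 1 bit, sometimes 3.

• Note that ON AVERAGE we use 1.85 bits per character:
• (1)(0.40) + (2)(0.35) + (3)(0.20 + 0.05) = 1.85
• Before we used 2, so the compression ratio is CR = 1.85/2 = 0.925.
• [Information theory sets a theoretical limit, the entropy of the
source, of about 1.74 bits per character (about 1.21 in nats)]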
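
The arithmetic above can be checked with a short Python sketch. The entropy formula H = -sum(p * log2(p)) is the standard Shannon bound and is not spelled out in the original. For these frequencies it comes to about 1.74 bits per character, or about 1.21 when natural logarithms (nats) are used instead.

```python
import math

freqs = {"A": 0.40, "B": 0.35, "C": 0.20, "D": 0.05}
codes = {"A": "0", "B": "10", "C": "110", "D": "111"}

# Expected code length: each length weighted by how often it is used.
avg_bits = sum(freqs[s] * len(codes[s]) for s in freqs)
compression_ratio = avg_bits / 2      # versus the fixed two-bit code

# Shannon entropy: the lower bound on average bits per character.
entropy_bits = -sum(p * math.log2(p) for p in freqs.values())
entropy_nats = -sum(p * math.log(p) for p in freqs.values())

print(round(avg_bits, 3), round(compression_ratio, 3))
print(round(entropy_bits, 2), round(entropy_nats, 2))
```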
We perceive sound when a series of air compressions vibrates a
membrane in our ear, which sends signals to our brain

A stereo sends an electrical signal to a speaker to
produce sound
This signal is an analog representation of the sound
wave
The voltage in the signal varies in direct proportion
to the sound wave

Digitize the signal by sampling
◦ periodically measure the voltage
◦ record the numeric value

How often should we sample?

A sampling rate of about 40,000 times per
second is enough to create a reasonable
sound reproduction
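
Sampling can be sketched in Python as measuring a signal at evenly spaced instants and recording each value as an integer. Everything specific here is an assumption for illustration: the 440 Hz test tone, the voltage range of -1.0 to 1.0, the 16-bit sample width, and the names `sample` and `quantize`.

```python
import math

def sample(signal, rate_hz, duration_s):
    """Measure signal(t) at evenly spaced times, rate_hz times per second."""
    n = round(rate_hz * duration_s)
    return [signal(i / rate_hz) for i in range(n)]

def quantize(voltage, bits=16):
    """Map a voltage in [-1.0, 1.0] to a signed integer of the given width."""
    max_level = 2 ** (bits - 1) - 1
    return round(voltage * max_level)

def tone(t):
    """Hypothetical 440 Hz test tone standing in for the analog signal."""
    return math.sin(2 * math.pi * 440 * t)

voltages = sample(tone, rate_hz=40_000, duration_s=0.001)
samples = [quantize(v) for v in voltages]
print(len(samples), samples[:4])
```

One millisecond of sound at 40,000 samples per second yields 40 numbers; whatever the signal did between those instants is not recorded.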

Some data is lost, but a reasonable sound is reproduced

Figure 3.8 Sampling an audio signal

CDs store audio information digitally
On the surface of the CD are microscopic pits
that represent binary digits
A low-intensity laser is pointed at the disc
The laser light reflects
◦ strongly if the surface is smooth and
◦ poorly if the surface is pitted

Figure 3.9 A CD player reading binary information

Audio Formats
◦ WAV, AU, AIFF, VQF, and MP3

MP3 (MPEG-1/MPEG-2 Audio Layer III) is dominant
◦ analyzes the frequency spread and discards
information that can’t be heard by humans
◦ bit stream is compressed using a form of
Huffman encoding to achieve additional
compression

Is this a lossy or lossless compression (or both)?

Colour
Perception of the frequencies of light that
reach the retinas of our eyes

Retinas have three types of colour
photoreceptor cone cells that correspond to
the colours of red, green, and blue

Colour is expressed as an RGB (red-green-blue)
value: three numbers that indicate the
relative contribution of each of these three
basic colours

An RGB value of (255, 255, 0) maximizes the
contribution of red and green, and minimizes
the contribution of blue, which results in a
bright yellow
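
As a small illustration, an RGB triple can be packed into the familiar `#RRGGBB` hexadecimal notation. The sketch assumes 8 bits per component, which the maximum of 255 implies, and the function name `rgb_to_hex` is hypothetical.

```python
def rgb_to_hex(red, green, blue):
    """Pack three 0-255 colour components into a #RRGGBB string."""
    for value in (red, green, blue):
        if not 0 <= value <= 255:
            raise ValueError("each component must be in 0..255")
    return f"#{red:02X}{green:02X}{blue:02X}"

print(rgb_to_hex(255, 255, 0))   # #FFFF00, the bright yellow above
print(2 ** 24)                   # distinct colours with 3 x 8 bits
```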

Figure 3.10 Three-dimensional colour space

A few TrueColor RGB values and the colours they represent

A browser may support only a certain number
of specific colours, creating a palette from
which to choose

Figure 3.11
The Netscape color palette

Digitizing a picture
Representing it as a collection of individual
dots called pixels
Resolution
The number of pixels used to represent a
picture
Raster Graphics
Storage of data on a pixel-by-pixel basis
Bitmap (BMP), GIF, JPEG, and PNG are raster-graphics formats

Bitmap format
Contains the pixel colour values of the image from
left to right and from top to bottom
GIF format (indexed colour)
Each image is made up of at most 256 colours
JPEG format
Find out about it.
PNG format
Like GIF but achieves greater compression with a wider
range of colour depths
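
The bitmap layout described above (pixel values left to right, top to bottom) can be sketched in Python. This is a simplified illustration: the 2x2 image and the name `to_bitmap_bytes` are hypothetical, and a real BMP file also carries a header and pads each row, which the sketch leaves out.

```python
def to_bitmap_bytes(pixels):
    """Flatten rows of (r, g, b) tuples left to right, top to bottom."""
    data = bytearray()
    for row in pixels:             # top to bottom
        for r, g, b in row:        # left to right
            data.extend((r, g, b))
    return bytes(data)

# Hypothetical 2x2 image: red and green on top, blue and yellow below.
image = [[(255, 0, 0), (0, 255, 0)],
         [(0, 0, 255), (255, 255, 0)]]
raw = to_bitmap_bytes(image)
print(len(raw))   # 2 * 2 pixels * 3 bytes each
```

With one byte per colour component, storage grows as width x height x 3, which is why uncompressed raster images get large quickly.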

Figure: Digitization of a continuous image into N rows (0...15)
and M columns (0...15). The pixel at
coordinates [n=3, m=10] has the integer
brightness value 110.
This is the ATTRIBUTE vector.

Vector graphics
A format that describes an image in terms of
lines and geometric shapes

A vector graphic is a series of commands that
describe a line’s direction, thickness, and
colour

The file sizes tend to be smaller because not
every pixel is described

The good side and the bad side…

Vector graphics can be resized mathematically
and changes can be calculated dynamically as
needed

Vector graphics are not good for representing
real-world images
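
Resizing "mathematically" can be sketched in Python by treating a vector graphic as drawing commands whose numbers are simply multiplied. The `Line` class and `scale` function are hypothetical names for illustration, not part of any particular vector format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Line:
    """One drawing command: a straight line with thickness and colour."""
    x0: float
    y0: float
    x1: float
    y1: float
    thickness: float
    colour: str

def scale(line, factor):
    """Resize by multiplying the geometry; no pixels are involved."""
    return Line(line.x0 * factor, line.y0 * factor,
                line.x1 * factor, line.y1 * factor,
                line.thickness * factor, line.colour)

diagonal = Line(0, 0, 10, 5, 1, "red")
print(scale(diagonal, 3))
```

The description stays the same size however large the line is drawn, whereas a raster image of the same line would need more pixels at every larger size.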

Video codec: COmpressor/DECompressor
Methods used to shrink the size of a movie to
allow it to be played on a computer or over a
network

Almost all video codecs use lossy compression
to minimize the huge amounts of data
associated with video

Temporal compression
A technique based on differences between
consecutive frames: If most of an image in
two frames hasn’t changed, why should we
waste space to duplicate all of the similar
information?
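
The idea can be sketched in Python by storing only the pixels that differ from the previous frame. This is a toy, lossless illustration with hypothetical 8-pixel frames and function names. Real codecs use far more elaborate techniques, such as motion estimation.

```python
def frame_delta(previous, current):
    """Record only the pixels that changed, as (index, new_value) pairs."""
    return [(i, new) for i, (old, new) in enumerate(zip(previous, current))
            if old != new]

def apply_delta(previous, delta):
    """Rebuild the current frame from the previous frame plus the delta."""
    frame = list(previous)
    for i, value in delta:
        frame[i] = value
    return frame

frame1 = [10, 10, 10, 10, 200, 200, 10, 10]   # hypothetical frames
frame2 = [10, 10, 10, 10, 10, 200, 200, 10]
delta = frame_delta(frame1, frame2)
print(delta)                                   # [(4, 10), (6, 200)]
print(apply_delta(frame1, delta) == frame2)    # True
```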

Spatial compression
A technique based on removing redundant
information within a frame: This problem is
essentially the same as that faced when
compressing still images
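
Run-length encoding is one simple way to remove redundancy within a single frame. It is used here only as an illustration in Python, with hypothetical function names, and is not a claim about what any particular video codec does.

```python
def run_length_encode(row):
    """Collapse runs of identical pixel values into [value, count] pairs."""
    runs = []
    for value in row:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return runs

def run_length_decode(runs):
    """Expand [value, count] pairs back into the original row."""
    return [value for value, count in runs for _ in range(count)]

row = [0, 0, 0, 0, 255, 255, 0, 0]
encoded = run_length_encode(row)
print(encoded)   # [[0, 4], [255, 2], [0, 2]]
```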
