
Elective: Data Compression and Encryption V Extc ECCDLO 5014

This document provides an overview of data compression techniques. It begins by explaining why data compression is useful for reducing the size of files during transmission and storage. It then describes different types of compression including lossless compression for retaining all original data and lossy compression which can tolerate some loss of data. Key aspects covered include exploiting redundancies in data and properties of human perception. Specific examples of compressing text, images, audio and video are given to illustrate compression techniques.

Uploaded by

anitasjadhav

Elective: Data Compression and Encryption
V EXTC
ECCDLO 5014

Anita Jadhav
Course Outcome
C308.1 (CO1): Analyze techniques of text compression in order to solve numerical problems related to statistical/dictionary-based text compression techniques.
C308.2 (CO2): Explain audio/image/video compression standards.
C308.3 (CO3): Describe goals of cryptography and standards in private-key cryptosystems.
C308.4 (CO4): Analyze number-theoretic techniques and solve numerical problems related to public-key cryptography.
C308.5 (CO5): Comprehend societal issues related to network security and describe their solutions.
What is data compression?
• Compression is an “art and science” of reducing the size of data without compromising its utility.
• Why ‘art’?
• It takes judgment to identify what can be retained and what can be discarded.
• Why ‘science’?
• There are well-designed mathematical methods for deciding what to retain and what to discard.
Why Compress?

• Downloading a digital color photograph over a 33.6 kbps modem:
• Uncompressed image (TIFF file) = 600 kbytes, 142 seconds
• Lossless compression (GIF file) = 300 kbytes, 71 seconds
• Lossy compression (JPEG file) = 50 kbytes, 12 seconds
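The download times above follow from dividing the file size in bits by the modem rate. A minimal sketch of that arithmetic, assuming 1 kbyte = 8 kbit and no protocol overhead (the function name `transfer_seconds` is made up for illustration):

```python
def transfer_seconds(size_kbytes, rate_kbps=33.6):
    """Ideal transfer time: size in kilobits divided by the link rate in kbit/s."""
    return size_kbytes * 8 / rate_kbps


for label, size in [("TIFF", 600), ("GIF", 300), ("JPEG", 50)]:
    print(f"{label}: {transfer_seconds(size):.1f} s")
```

Under these assumptions the 600 kbyte figure comes out just under 143 s, so the slide's 142 seconds appears to be the truncated value.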
Why Compress?
• To reduce the volume of data to be transmitted (text, fax, images)

• To reduce the bandwidth required for transmission and to reduce storage requirements (speech, audio, video)
Philosophy
• Compression is possible by exploiting redundancies in the data and properties of human perception.

• Digital audio is a series of sample values; an image is a rectangular array of pixel values; video is a sequence of images played out at a certain rate
• Neighboring sample values are correlated
Redundancy
• Adjacent audio samples are similar (predictive encoding); samples corresponding to silence can be dropped (silence removal)
• In a digital image, neighboring samples on a scanning line are normally similar (spatial redundancy)
• In digital video, in addition to spatial redundancy, neighboring images in a video sequence may be similar (temporal redundancy)
Redundancies (contd.)
• Text files
Frequently used characters or groups of characters

• Image files
Adjacent pixels in an image (spatial redundancy)

• Audio Files
Silence (silence removal)
Neighboring samples (predictive encoding)

• Video Files
Similar neighboring images (temporal redundancy)
Human Perception factors
• A compressed version of digital audio, image or video need not represent the original information exactly, and in practice we find this acceptable.
• Perceptual sensitivity differs for different signal patterns, e.g. we do not hear all frequencies in the 20 Hz to 20 kHz range equally well (exploited by MP3).
• The human eye is less sensitive to higher spatial frequency components than to lower ones (exploited by transform coding)
• https://www.youtube.com/watch?time_continue=110&v=bh_9XFzbWV8
Model for compression

Courtesy: www.scienceblogs.com
Classification
• Lossless compression
– lossless compression for legal and medical
documents, computer programs
– exploit only data redundancy
• Lossy compression
– digital audio, image, video where some errors or
loss can be tolerated
– exploit both data redundancy and human
perception properties
• Constant bit rate versus variable bit rate
coding
Classification
• Logical Vs. Physical Compression
Physical compression acts directly on the data: it re-encodes redundant bit patterns into shorter ones, without regard to what the data means.
Logical compression, on the other hand, is carried out by logical reasoning, substituting the information with equivalent information.
E.g. voice box parameters

• Symmetric Vs. Asymmetric
In the case of symmetric compression, the same method is used to compress and to decompress the data. The same amount of work is thus needed for each of these operations.
In asymmetric compression, one of the two operations (usually compression) needs more work than the other.
Classification
• Certain compression algorithms are based on dictionaries built for a specific type of data: these are non-adaptive encoders. The frequency of letters in a text file, for example, depends on the language in which it is written.
• An adaptive encoder adapts to the data it has to compress; it does not start out with an already prepared dictionary for a given type of data.
• A semi-adaptive encoder builds a dictionary according to the data to be compressed: it first builds the dictionary by going through the file and then compresses the file.
Data Compression
• Let’s look at an example.
• Let’s imagine we had to send the following
message:

The rain in Spain lies mainly in the plain


Data Compression
• If we had to send this as it is down a wire:

The rain in Spain lies mainly in the plain


Data Compression

• That is a total of 42 characters (including 8 spaces)

The rain in Spain lies mainly in the plain

Data Compression

• Let’s replace the word “the” with the number 1 (ignoring case).

• We’ve reduced the number of characters to 38.

The rain in Spain lies mainly in the plain

1 rain in Spain lies mainly in 1 plain

the = 1

Data Compression

• Let’s replace the letters “ain” with the number 2.

• We’ve reduced the number of characters to 30.

1 rain in Spain lies mainly in 1 plain

1 r2 in Sp2 lies m2ly in 1 pl2

the = 1
ain = 2

Data Compression

• Let’s replace the letters “in” with the number 3.

• We’ve reduced the number of characters to 28.

1 r2 in Sp2 lies m2ly in 1 pl2

1 r2 3 Sp2 lies m2ly 3 1 pl2

the = 1
ain = 2
in = 3

Data Compression

• Now let’s say 1 means “the ”, that is, “the” followed by a space.

• We’ve reduced the number of characters to 26.

1 r2 3 Sp2 lies m2ly 3 1 pl2

1r2 3 Sp2 lies m2ly 3 1pl2

the = 1
ain = 2
in = 3

Data Compression

• Now let’s say 3 means “in ”, that is, “in” followed by a space.

• We’ve reduced the number of characters to 24.

1r2 3 Sp2 lies m2ly 3 1pl2

1r2 3Sp2 lies m2ly 31pl2

the = 1
ain = 2
in = 3

Data Compression

• So that’s 24 characters for a 42-character message, not bad.

The rain in Spain lies mainly in the plain

1r2 3Sp2 lies m2ly 31pl2

the = 1
ain = 2
in = 3
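
The substitution steps above can be replayed in a few lines of Python. This is only a sketch of the slide example, not a general dictionary coder: the helper names `substitute` and `restore` are made up here, the leading “The” is lower-cased so that it matches “the”, and the message is assumed to contain no digits of its own.

```python
def substitute(text, rules):
    """Apply (phrase, token) replacements in order; return the result and the length after each step."""
    lengths = [len(text)]
    for phrase, token in rules:
        text = text.replace(phrase, token)
        lengths.append(len(text))
    return text, lengths


def restore(text, dictionary):
    """Expand every token back to its phrase (assumes the message itself contains no digits)."""
    for token, phrase in dictionary.items():
        text = text.replace(token, phrase)
    return text


message = "The rain in Spain lies mainly in the plain"
# Assumption: case is ignored, so the leading "The" is treated as "the".
lowered = message[0].lower() + message[1:]

# Same order as the slides; the last two rules fold the trailing space into tokens 1 and 3.
rules = [("the", "1"), ("ain", "2"), ("in", "3"), ("1 ", "1"), ("3 ", "3")]
compressed, lengths = substitute(lowered, rules)

print(compressed)  # 1r2 3Sp2 lies m2ly 31pl2
print(lengths)     # [42, 38, 30, 28, 26, 24]
print(restore(compressed, {"1": "the ", "2": "ain", "3": "in "}))
```

Note that the receiver also needs the dictionary (the = 1, ain = 2, in = 3), which the character counts above do not include.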


Data Compression

• Let’s try a different example. Let’s say we are sending a list of jobs, where each item on the list is 10 characters long (“-” marks a padding space).
• Bookkeeper
• Teacher---
• Porter----
• Nurse-----
• Doctor----
Data Compression

• Rather than sending the padding spaces we could just say how many there are.

• We’ve gone from 50 to 42 characters:

• Bookkeeper • Bookkeeper
• Teacher--- • Teacher3-
• Porter---- • Porter4-
• Nurse----- • Nurse5-
• Doctor---- • Doctor4-
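
A small sketch of this padding rule, assuming “-” is the padding character as on the slide (the function name `encode_padding` is illustrative only):

```python
def encode_padding(field, pad="-"):
    """Replace a run of trailing pad characters with '<count><pad>'."""
    body = field.rstrip(pad)
    run = len(field) - len(body)
    return body if run == 0 else f"{body}{run}{pad}"


jobs = ["Bookkeeper", "Teacher---", "Porter----", "Nurse-----", "Doctor----"]
encoded = [encode_padding(job) for job in jobs]

print(encoded)
print(sum(map(len, jobs)), "->", sum(map(len, encoded)))  # 50 -> 42
```

The rule only pays off for runs of three or more padding characters; a run of one would actually grow by a character.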
Data Compression

• Or let’s imagine we are sending a list of house prices.
• 350000
• 600000
• 550000
• 2100000
• 3000000
Data Compression

• Now let’s use the # to indicate the number of zeros that follow.

• We’ve gone from 32 characters to 18 characters:
• 350000 • 35#4
• 600000 • 6#5
• 550000 • 55#4
• 2100000 • 21#5
• 3000000 • 3#6
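
The “#” rule can be sketched as an encoder and decoder pair. The names `encode_zeros` and `decode_zeros` are hypothetical, and the prices are treated as digit strings:

```python
def encode_zeros(digits):
    """Replace the run of trailing zeros with '#<count>'."""
    head = digits.rstrip("0")
    zeros = len(digits) - len(head)
    return f"{head}#{zeros}" if zeros else digits


def decode_zeros(code):
    """Inverse of encode_zeros: expand '#<count>' back into zeros."""
    if "#" not in code:
        return code
    head, count = code.split("#")
    return head + "0" * int(count)


prices = ["350000", "600000", "550000", "2100000", "3000000"]
encoded = [encode_zeros(p) for p in prices]

print(encoded)  # ['35#4', '6#5', '55#4', '21#5', '3#6']
print(sum(map(len, prices)), "->", sum(map(len, encoded)))  # 32 -> 18
```

As with the padding example, the marker costs two characters, so numbers with fewer than three trailing zeros gain nothing from it.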
Image Compression
Data Compression

• Let’s think about images.

• Let’s say we are trying to display the letter ‘A’


Data Compression
• We could encode this as:

• WWWBBWWW
• WWBWWBWW
• WBWWWWBW
• WBWWWWBW
• WBBBBBBW
• WBWWWWBW
• WBWWWWBW
• WWWWWWWW
Data Compression
• We could compress this, from 64 characters to 44 characters, by writing each run as a count followed by the letter (a count of 1 is omitted):

• WWWBBWWW • 3W2B3W
• WWBWWBWW • 2WB2WB2W
• WBWWWWBW • WB4WBW
• WBWWWWBW • WB4WBW
• WBBBBBBW • W6BW
• WBWWWWBW • WB4WBW
• WBWWWWBW • WB4WBW
• WWWWWWWW • 8W
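
The row encoding above can be written as a short Python sketch. It follows the slide's convention of dropping the count for runs of length 1; the function names are illustrative, and the decoder assumes the pixel symbols are not digits.

```python
import re
from itertools import groupby


def rle_encode(row):
    """Write each run as '<count><symbol>', omitting the count when it is 1."""
    pieces = []
    for symbol, group in groupby(row):
        count = len(list(group))
        pieces.append(symbol if count == 1 else f"{count}{symbol}")
    return "".join(pieces)


def rle_decode(code):
    """Expand '<count><symbol>' pairs; a missing count means 1."""
    pairs = re.findall(r"(\d*)(\D)", code)
    return "".join(symbol * int(count or 1) for count, symbol in pairs)


letter_a = [
    "WWWBBWWW", "WWBWWBWW", "WBWWWWBW", "WBWWWWBW",
    "WBBBBBBW", "WBWWWWBW", "WBWWWWBW", "WWWWWWWW",
]
encoded = [rle_encode(row) for row in letter_a]

print(encoded)
print(sum(map(len, letter_a)), "->", sum(map(len, encoded)))  # 64 -> 44
```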
Data Compression

• We call this “run-length encoding” or RLE.


Data Compression

• Now let’s add one more rule.

• Let’s imagine that if we send the number ‘0’ it means “repeat the previous line”.
Data Compression
• So now, going from 64 to 44 to 34 characters, we get:

• WWWBBWWW • 3W2B3W • 3W2B3W
• WWBWWBWW • 2WB2WB2W • 2WB2WB2W
• WBWWWWBW • WB4WBW • WB4WBW
• WBWWWWBW • WB4WBW • 0
• WBBBBBBW • W6BW • W6BW
• WBWWWWBW • WB4WBW • WB4WBW
• WBWWWWBW • WB4WBW • 0
• WWWWWWWW • 8W • 8W
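
The “0 means repeat the previous line” rule can be layered on top of the run-length codes. A minimal sketch, taking the already encoded rows from the slide as input (the names `collapse_repeats` and `expand_repeats` are made up, and the first row is assumed never to be the marker):

```python
def collapse_repeats(encoded_rows, marker="0"):
    """Replace a row that equals the row before it with the marker."""
    result = []
    previous = None
    for code in encoded_rows:
        result.append(marker if code == previous else code)
        previous = code
    return result


def expand_repeats(rows, marker="0"):
    """Inverse step: a marker copies the most recently restored row."""
    restored = []
    for code in rows:
        restored.append(restored[-1] if code == marker else code)
    return restored


rle_rows = ["3W2B3W", "2WB2WB2W", "WB4WBW", "WB4WBW", "W6BW", "WB4WBW", "WB4WBW", "8W"]
collapsed = collapse_repeats(rle_rows)

print(collapsed)
print(sum(map(len, rle_rows)), "->", sum(map(len, collapsed)))  # 44 -> 34
```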
Data Compression

• For images with large uniform areas, runs of identical pixels and repeated lines are common, so you can get large savings from RLE.
Performance Measures
• Relative complexity of the algorithm
• Memory required to implement the algorithm
• Compression ratio (size of input information/
size of compressed information)
• Redundancy
• Distortion (MSE, PSNR)
• Fidelity (subjective)
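
Three of these measures are simple to compute. The sketch below uses the standard definitions of mean squared error and peak signal-to-noise ratio; the sample values are invented for illustration, and `peak=255` assumes 8-bit samples.

```python
import math


def compression_ratio(original_size, compressed_size):
    """Size of the input divided by the size of the compressed output."""
    return original_size / compressed_size


def mse(original, reconstructed):
    """Mean squared error between two equally long sample sequences."""
    return sum((x - y) ** 2 for x, y in zip(original, reconstructed)) / len(original)


def psnr(original, reconstructed, peak=255):
    """Peak signal-to-noise ratio in dB; infinite when the reconstruction is exact."""
    error = mse(original, reconstructed)
    return math.inf if error == 0 else 10 * math.log10(peak ** 2 / error)


# Hypothetical 8-bit samples before and after lossy coding.
original = [10, 20, 30, 40]
reconstructed = [12, 18, 30, 41]

print(compression_ratio(64, 34))        # the letter 'A' example: about 1.88
print(mse(original, reconstructed))     # 2.25
print(psnr(original, reconstructed))    # about 44.6 dB
```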
Modeling and coding
Predictive Coding
Dictionary Technique
Entropy
• Amount of information I in a symbol that occurs with probability p: I = log2(1/p) bits
• Symbols that occur rarely convey a large amount of information
• Average information per symbol is called entropy H:
H = Σ pi × log2(1/pi) bits per symbol, summed over all symbols i

• Entropy is a measure of the average number of binary symbols needed to code the output of the source
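
As a quick numerical check of the entropy formula, here is a small sketch. The four-symbol probability list is an invented example, and `empirical_entropy` simply estimates the probabilities from character counts in a message.

```python
import math
from collections import Counter


def entropy(probabilities):
    """H = sum of p * log2(1/p) over all symbols, in bits per symbol."""
    return sum(p * math.log2(1 / p) for p in probabilities if p > 0)


def empirical_entropy(message):
    """Entropy of the symbol frequencies observed in a message."""
    counts = Counter(message)
    total = len(message)
    return entropy(count / total for count in counts.values())


print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75
print(entropy([0.5, 0.5]))                 # 1.0
print(empirical_entropy("The rain in Spain lies mainly in the plain"))
```

A source with four equally likely symbols would need 2 bits per symbol; the skewed distribution above needs only 1.75 on average.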
Models

• Physical model

• Probability model

• Markov model

• Composite source model


Coding
• Coding: assignment of binary sequences to elements of an alphabet

• Code: the set of binary sequences

• Codeword: an individual member of that set

• Alphabet: collection of symbols called letters

• Average number of bits per symbol is often called the rate of the code
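
The rate of a code is the probability-weighted average of its codeword lengths. A minimal sketch, using a hypothetical four-letter alphabet and two hand-picked codes that are not taken from the slides:

```python
def code_rate(probabilities, code):
    """Average number of bits per symbol: sum of p(symbol) * len(codeword)."""
    return sum(p * len(code[symbol]) for symbol, p in probabilities.items())


# Hypothetical source and codes, chosen only to illustrate the definition.
probabilities = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
fixed_length = {"a": "00", "b": "01", "c": "10", "d": "11"}
variable_length = {"a": "0", "b": "10", "c": "110", "d": "111"}

print(code_rate(probabilities, fixed_length))     # 2.0
print(code_rate(probabilities, variable_length))  # 1.75
```

Giving the shortest codeword to the most probable letter lowers the rate from 2 to 1.75 bits per symbol for this source.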
Uniquely decodable codes
