
Lossless Compression

Multimedia Systems (Module 2)

Lesson 1: Minimum Redundancy Coding based on Information Theory
  - Shannon-Fano Coding
  - Huffman Coding

Lesson 2: Adaptive Coding based on Statistical Modeling
  - Adaptive Huffman
  - Arithmetic Coding

Lesson 3: Dictionary-based Coding
  - LZW

Lossless Compression
Multimedia Systems (Module 2 Lesson 1)

Summary:
  - Compression
    - With loss
    - Without loss
  - Shannon: Information Theory
  - Shannon-Fano Coding Algorithm
  - Huffman Coding Algorithm

Sources:
  - The Data Compression Book, 2nd Ed., Mark Nelson and Jean-Loup Gailly
  - Dr. Ze-Nian Li's course material

Compression
Why Compression?

All media, whether text, audio, graphics, or video, contain redundancy.
Compression attempts to eliminate this redundancy.

What is Redundancy?
  If one representation of a media content M takes X bytes and another
  takes Y bytes (Y < X), then X is a redundant representation relative to Y.

Other forms of Redundancy
  If the representation of a medium captures content that is not
  perceivable by humans, then removing such content will not affect the
  perceived quality.
  For example, audio frequencies outside the human hearing range need not
  be captured, without any harm to the audio's quality.

Is there a representation with an optimal size Z that cannot be improved
upon? This question is tackled by information theory.

Compression
Lossless vs. Lossy Compression

[Diagram] M -> Lossless Compress -> Uncompress -> M   (the original M is recovered exactly)
[Diagram] M -> Compress with loss -> Uncompress -> M'  (M' only approximates M)

Information Theory
According to Shannon, the entropy@ of an information source S is defined as:

  H(S) = Σi pi log2(1/pi)

log2(1/pi) indicates the amount of information contained in symbol Si,
i.e., the number of bits needed to code symbol Si.
For example, in an image with a uniform distribution of gray-level
intensities, pi = 1/256 for every level, so each gray level needs
log2(256) = 8 bits, and the entropy of the image is 8.

Q: What is the entropy of a source with M symbols where each symbol is
equally likely?
Entropy, H(S) = log2 M

Q: How about an image in which half of the pixels are white and half are
black?
Entropy, H(S) = 1

@ Here is an excellent primer by Dr. Schneider on this subject.
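
To make the definition concrete, here is a minimal Python sketch (not from
the slides) that evaluates H(S) for a given probability distribution; it
reproduces the two answers above.

import math

def entropy(probabilities):
    # H(S) = sum_i p_i * log2(1/p_i), in bits per symbol
    return sum(p * math.log2(1 / p) for p in probabilities if p > 0)

print(entropy([1 / 256] * 256))   # uniform 256-level image -> 8.0 bits per symbol
print(entropy([0.5, 0.5]))        # half white, half black   -> 1.0 bit per symbol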

Information Theory
Discussion:

  Entropy is a measure of how much information is encoded in a message.
  The higher the entropy, the higher the information content.

  We could also say entropy is a measure of the uncertainty in a message.
  Information and uncertainty are equivalent concepts.

  The unit of entropy (in coding theory) is bits per symbol. Entropy gives
  the actual number of bits of information contained in a message source.
  The unit is determined by the base of the logarithm:
    2: binary (bit)
    10: decimal (digit)

  Example: If the probability of the character 'e' appearing in this slide
  is 1/16, then the information content of this character is 4 bits. So the
  character string "eeeee" has a total information content of 20 bits
  (contrast this with an 8-bit ASCII coding, which would use 40 bits to
  represent "eeeee").
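
Continuing the example above (the 1/16 probability for 'e' is the slide's
assumption), a quick check in the same style as the entropy sketch:

import math

p_e = 1 / 16                  # probability of 'e', as assumed on the slide
info_e = math.log2(1 / p_e)   # information content of a single 'e'
print(info_e, 5 * info_e)     # -> 4.0 bits, and 20.0 bits for "eeeee"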

Data Compression = Modeling + Coding

Data compression consists of taking a stream of symbols and transforming
them into codes.
  - The model is a collection of data and rules used to process input
    symbols and determine their probabilities.
  - A coder uses the model (the probabilities) to emit codes when it is
    given input symbols.

Let's take Huffman coding to demonstrate the distinction:

[Diagram] Input Symbol Stream -> Model -> Probabilities -> Encoder -> Output Code Stream

The output of the Huffman encoder is determined by the model (the
probabilities): the higher the probability, the shorter the code.
  - Model A could determine the raw probability of each symbol occurring
    anywhere in the input stream
    (pi = number of occurrences of Si / total number of symbols).
  - Model B could determine probabilities based on the last 10 symbols in
    the input stream (continuously re-computing the probabilities).
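
As a sketch of the modeling half, here is a minimal Python version of
"Model A" as described above (the function name is ours, not from the
slides):

from collections import Counter

def model_a_probabilities(symbols):
    # Model A: raw probability of each symbol over the whole input stream,
    # pi = (# of occurrences of Si) / (total number of symbols)
    counts = Counter(symbols)
    total = len(symbols)
    return {s: c / total for s, c in counts.items()}

print(model_a_probabilities("ABRACADABRA"))
# {'A': 0.4545..., 'B': 0.1818..., 'R': 0.1818..., 'C': 0.0909..., 'D': 0.0909...}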

The Shannon-Fano Encoding Algorithm

1. Calculate the frequencies of the symbols and organize them as a list.
2. Sort the list in decreasing order of frequency.
3. Divide the list into two halves, with the total frequency counts of the
   two halves being as close to each other as possible.
4. The upper half is assigned the code 0 and the lower half the code 1.
5. Recursively apply steps 3 and 4 to each of the halves, until each symbol
   has become a corresponding code leaf on a tree.

(A code sketch of these steps follows the example below.)

Example

Symbol   Count   1st division   2nd division   3rd division   Code
A        15      0              0                             00
B         7      0              1                             01
C         6      1              0                             10
D         6      1              1              0              110
E         5      1              1              1              111

Symbol   Count   Info. -log2(pi)   Code   Subtotal # of bits
A        15      1.38              00     30
B         7      2.48              01     14
C         6      2.70              10     12
D         6      2.70              110    18
E         5      2.96              111    15

Total information: sum of Count x (-log2(pi)) = 85.25 bits
Total encoded length: 89 bits

It takes a total of 89 bits to encode 85.25 bits of information.
(Pretty good, huh!)
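
Here is a minimal Python sketch of the divide-and-assign steps above (not
from the slides; the greedy cut-point search is one way to read step 3).
On the example frequencies it reproduces the codes in the table:

def shannon_fano(freqs):
    # Steps 1-2: list of (symbol, frequency) pairs sorted by decreasing frequency.
    items = sorted(freqs.items(), key=lambda kv: kv[1], reverse=True)
    codes = {}

    def split(group, prefix):
        # Step 5: recurse until each symbol is a leaf with its own code.
        if len(group) == 1:
            codes[group[0][0]] = prefix or "0"
            return
        # Step 3: find the cut where the two halves' frequency totals are closest.
        total = sum(f for _, f in group)
        best_diff, cut = float("inf"), 1
        for i in range(1, len(group)):
            upper = sum(f for _, f in group[:i])
            diff = abs(total - 2 * upper)
            if diff < best_diff:
                best_diff, cut = diff, i
        # Step 4: upper half gets 0, lower half gets 1.
        split(group[:cut], prefix + "0")
        split(group[cut:], prefix + "1")

    split(items, "")
    return codes

print(shannon_fano({"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}))
# {'A': '00', 'B': '01', 'C': '10', 'D': '110', 'E': '111'} -> 89 bits total, as in the table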

The Huffman Algorithm

1. Initialization: Put all nodes in an OPEN list L; keep it sorted at all
   times (e.g., ABCDE).
2. Repeat the following steps until the list L has only one node left:
   1. From L pick the two nodes having the lowest frequencies and create a
      parent node for them.
   2. Assign the sum of the children's frequencies to the parent node and
      insert it into L.
   3. Assign codes 0 and 1 to the two branches of the tree, and delete the
      children from L.

(A code sketch of these steps follows the example below.)

Example

[Huffman tree] The root (39) has A (count 15) on the branch labeled 1 and
an internal node (24) on the branch labeled 0; node 24 splits into node 13
(= B: 7 + C: 6) and node 11 (= D: 6 + E: 5).

Symbol   Count   Info. -log2(pi)   Code   Subtotal # of bits
A        15      1.38              1      15
B         7      2.48              000    21
C         6      2.70              001    18
D         6      2.70              010    18
E         5      2.96              011    15

Total information: 85.25 bits
Total encoded length: 87 bits
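
A minimal Python sketch of the algorithm above, using a min-heap as the
OPEN list (not from the slides; when frequencies tie, the tree and exact
codes may differ from the slide, but the code lengths and the 87-bit total
match):

import heapq
from itertools import count

def huffman_codes(freqs):
    # Step 1: put all (frequency, node) entries in the OPEN list L (a min-heap here).
    order = count()  # tie-breaker so equal frequencies never compare nodes directly
    heap = [(f, next(order), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    # Step 2: repeatedly merge the two lowest-frequency nodes into a parent node.
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(order), (left, right)))
    _, _, root = heap[0]

    # Read the codes off the tree: 0 for one branch, 1 for the other.
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):        # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                              # leaf: a symbol
            codes[node] = prefix or "0"
    walk(root, "")
    return codes

freqs = {"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}   # the example from the slide
codes = huffman_codes(freqs)
print(codes, sum(freqs[s] * len(codes[s]) for s in freqs))
# code lengths: A -> 1 bit, B/C/D/E -> 3 bits each, i.e. 15 + 21 + 18 + 18 + 15 = 87 bits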

Huffman Alg.: Discussion

Decoding for the above two algorithms is trivial as long as the coding
table (the statistics) is sent before the data. There is an overhead for
sending it, which is negligible if the data file is big.

Unique Prefix Property: no code is a prefix of any other code (all symbols
are at the leaf nodes) --> great for the decoder: decoding is unambiguous
(unique decipherability).

If prior statistics are available and accurate, then Huffman coding is
very good.

Number of bits per symbol needed for Huffman coding:      87 / 39 = 2.23
Number of bits per symbol needed for Shannon-Fano coding: 89 / 39 = 2.28
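
To illustrate why the unique prefix property makes decoding trivial, here
is a minimal sketch of a prefix-code decoder (not from the slides), using
the Shannon-Fano codes from the earlier example:

def decode(bits, codes):
    inverse = {code: sym for sym, code in codes.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:   # no code is a prefix of another, so emitting here is unambiguous
            out.append(inverse[buf])
            buf = ""
    return "".join(out)

sf_codes = {"A": "00", "B": "01", "C": "10", "D": "110", "E": "111"}
print(decode("0011101", sf_codes))   # -> "AEB"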
