Data Compression Lecture01

The document discusses data compression concepts including lossless and lossy compression. Lossless compression aims to compress data without any loss, while lossy compression allows for some loss of data in exchange for higher compression ratios. Information theory provides an understanding of the limits of data compression and how to compress data efficiently based on the probabilities of symbols.

Uploaded by

Nilotpal Pramanik

Data Compression

Introduction to Data Compression

Entropy
Variable Length Codes

Computer Vision & Biometrics Lab, Indian Institute of Information Technology, Allahabad 1

Basic Data Compression Concepts

x (original) → Encoder → y (compressed) → Decoder → x̂ (decompressed)

• Lossless compression: x = x̂
– Also called entropy coding, reversible coding.
• Lossy compression: x ≠ x̂
– Also called irreversible coding.
• Compression ratio = |x| / |y|
– |x| is the number of bits in x.


Why Compress
• Conserve storage space
• Reduce time for transmission
– Faster to encode, send, then decode than to send
the original
• Progressive transmission
– Some compression techniques allow us to send
the most important bits first so we can get a low
resolution version of some data before getting the
high fidelity version
• Reduce computation
– Use less data to achieve an approximate answer


Braille
• System to read text by feeling raised dots on
paper (or on electronic displays). Invented in
1820s by Louis Braille, a French blind man.

[Figure: Braille cells for the letters a, b, c, z; the words and, the, with, mother; and the letter groups th, ch, gh.]


Braille Example
Clear text:
Call me Ishmael. Some years ago -- never mind how
long precisely -- having little or no money in my purse,
and nothing particular to interest me on shore, I thought
I would sail about a little and see the watery part of the
world. (238 characters)
Grade 2 Braille in ASCII:
,call me ,i%mael4 ,"s ye>s ago -- n"e m9d h[ l;g
precisely -- hav+ ll or no m"oy 9 my purse1 & no?+
"picul> 6 9t]e/ me on %ore1 ,i ?"| ,i wd sail
ab a ll & see ! wat]y "p ( ! _w4 (203 characters)

Compression ratio = 238/203 = 1.17


Lossless Compression
• Data is not lost - the original is really needed.
– text compression
– compression of computer binary files
• Compression ratio typically no better than 4:1 for
lossless compression on many kinds of files.
• Statistical Techniques
– Huffman coding
– Arithmetic coding
– Golomb coding
• Dictionary techniques
– LZW, LZ77
– Sequitur
– Burrows-Wheeler Method
• Standards - Morse code, Braille, Unix compress, gzip,
zip, bzip, GIF, JBIG, Lossless JPEG

Lossy Compression
• Data is lost, but not too much.
– audio
– video
– still images, medical images, photographs
• Compression ratios of 10:1 often yield quite
high fidelity results.
• Major techniques include
– Vector Quantization
– Wavelets
– Block transforms
• Standards - JPEG, JPEG 2000, MPEG-2, H.264


Why is Data Compression Possible

• Most data from nature has redundancy
– There is more data than the actual information
contained in the data.
– Squeezing out the excess data amounts to
compression.
– However, unsqueezing is necessary to be able to
figure out what the data means.
• Information theory is needed to understand
the limits of compression and give clues on
how to compress well.


What is Information
• Analog data
– Also called continuous data
– Represented by real numbers (or complex
numbers)
• Digital data
– Finite set of symbols {a1, a2, ... , am}
– All data represented as sequences (strings) in the
symbol set.
– Example: {a,b,c,d,r} abracadabra
– Digital data can be an approximation to analog
data


Symbols
• Roman alphabet plus punctuation
• ASCII - 128 symbols (256 in 8-bit extended character sets)
• Binary - {0,1}
– 0 and 1 are called bits
– All digital information can be represented
efficiently in binary
– {a,b,c,d} fixed length representation
symbol a b c d
binary 00 01 10 11

– 2 bits per symbol


Exercise - How Many Bits Per Symbol?
• Suppose we have n symbols. How many bits
(as a function of n) are needed to
represent a symbol in binary?
– First try n a power of 2.
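
One way to check an answer to this exercise: a fixed-length binary code needs the smallest b with 2^b ≥ n, that is ⌈log2 n⌉ bits. The sketch below is only an illustration; the function name `bits_per_symbol` is ours and does not come from the lecture.

```python
def bits_per_symbol(n: int) -> int:
    """Smallest b such that 2**b >= n, i.e. ceil(log2(n)) for n >= 1."""
    if n < 1:
        raise ValueError("need at least one symbol")
    # (n - 1).bit_length() avoids floating-point rounding in log2.
    return (n - 1).bit_length()

for n in (2, 4, 5, 8, 26):
    print(n, bits_per_symbol(n))
```

Powers of two use every bit pattern; for other n some patterns are left unused, which is what the next discussion slide asks about.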


Discussion: Non-Powers of Two

• Can we do better than a fixed length
representation for non-powers of two?


Information Theory
• Developed by Shannon in the 1940’s and 50’s
• Attempts to explain the limits of communication
using probability theory.
• Example: Suppose English text is being sent
– It is much more likely to receive an “e” than a “z”.
– In some sense “z” has more information than “e”.


First-order Information
• Suppose we are given symbols {a1, a2, ... , am}.
• P(ai) = probability of symbol ai occurring in the
absence of any other information.
P(a1) + P(a2) + ... + P(am) = 1
• inf(ai) = log2(1/P(ai)) is the information of ai
in bits.

[Figure: plot of y = -log(x) for x from 0.01 to 0.99, y axis from 0 to 7.]

Example
• {a, b, c} with P(a) = 1/8, P(b) = 1/4, P(c) = 5/8
– inf(a) = log2(8) = 3
– inf(b) = log2(4) = 2
– inf(c) = log2(8/5) = .678
• Receiving an “a” has more information than
receiving a “b” or “c”.
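
The three values above can be reproduced with a few lines of Python. This is a minimal sketch; the helper name `information` is ours.

```python
from math import log2

def information(p: float) -> float:
    """Information, in bits, of a symbol that occurs with probability p."""
    return log2(1 / p)

probs = {"a": 1 / 8, "b": 1 / 4, "c": 5 / 8}
for symbol, p in probs.items():
    # Rarer symbols carry more information.
    print(symbol, round(information(p), 3))
```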


First Order Entropy

• The first order entropy is defined for a probability
distribution over symbols {a1, a2, ... , am}.
$$H = \sum_{i=1}^{m} P(a_i) \log_2\left(\frac{1}{P(a_i)}\right)$$
• H is the average number of bits required to code up a
symbol, given all we know is the probability distribution
of the symbols.
• H is the Shannon lower bound on the average number of
bits to code a symbol in this “source model”.
• Stronger models of entropy include context.


Entropy Examples
• {a, b, c} with P(a) = 1/8, P(b) = 1/4, P(c) = 5/8.
– H = 1/8 * 3 + 1/4 * 2 + 5/8 * .678 = 1.3 bits/symbol
• {a, b, c} with P(a) = P(b) = P(c) = 1/3. (worst case)
– H = 3 * (1/3) * log2(3) = 1.6 bits/symbol
• Note that a standard code takes 2 bits per symbol

symbol        a    b    c
binary code   00   01   10
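
As a quick numerical check of the two entropy values above, here is a small sketch of the first-order entropy formula. The function name `entropy` is ours; skipping zero-probability terms is the usual convention and is an assumption here, since the slides do not state it.

```python
from math import log2

def entropy(probs):
    """First-order entropy in bits per symbol.

    Terms with probability 0 are skipped (assumed convention: they contribute 0).
    """
    return sum(p * log2(1 / p) for p in probs if p > 0)

print(round(entropy([1 / 8, 1 / 4, 5 / 8]), 2))   # skewed distribution
print(round(entropy([1 / 3, 1 / 3, 1 / 3]), 2))   # uniform distribution
```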


An Extreme Case
• {a, b, c} with a 1, b 0, c 0
– H=?


Entropy Curve
• Suppose we have two symbols with probabilities
x and 1-x, respectively.
• entropy = -(x log2 x + (1-x) log2(1-x)), with maximum entropy at x = .5

[Figure: entropy (axis from 0 to 1.2) plotted against the probability of the first symbol (0 to 1).]


A Simple Prefix Code

• {a, b, c} with a 1/8, b 1/4, c 5/8.
• A prefix code is defined by a binary tree
• Prefix code property
– no output is a prefix of another
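binary tree: from the root, edge 1 leads to leaf c and edge 0 leads to an
internal node whose edges 0 and 1 lead to leaves a and b.

code
input   output
a       00
b       01
c       1

Example: ccabccbccc is coded as 1 1 00 01 1 1 01 1 1 1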
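
The code table above is enough to write a small encoder and decoder. The following is a sketch under our own naming (`CODE`, `encode`, `decode`); the decoder relies only on the prefix property, so the first codeword that matches the bits read so far is always the right one.

```python
CODE = {"a": "00", "b": "01", "c": "1"}

def encode(text, code=CODE):
    """Concatenate the codeword of every symbol."""
    return "".join(code[s] for s in text)

def decode(bits, code=CODE):
    """Read bits left to right, emitting a symbol whenever a codeword completes."""
    inverse = {v: k for k, v in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:      # prefix property: no longer codeword starts this way
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("input ends in the middle of a codeword")
    return "".join(out)

print(encode("ccabccbccc"))
print(decode(encode("ccabccbccc")))
```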


Binary Tree Terminology

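[Figure: the prefix-code tree (c = 1, a = 00, b = 01) with the root, an internal node, and a leaf labeled.]

Decoding example: 11000111100 decodes to ccabccca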


Exercise Encode/Decode

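[Figure: prefix-code tree. From the root, edge 0 leads to a and edge 1 leads to an
internal node; from that node, edge 1 leads to d and edge 0 leads to a node with
children b (edge 0) and c (edge 1).]

• Player 1: Encode a symbol string
• Player 2: Decode the string
• Check for equality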


How Good is the Code

[Figure: the prefix-code tree with leaf probabilities: a (1/8) and b (1/4) at depth 2, c (5/8) at depth 1.]

bit rate = (1/8)2 + (1/4)2 + (5/8)1 = 11/8 = 1.375 bps

Entropy = 1.3 bps
Standard code = 2 bps

(bps = bits per symbol)


Design a Prefix Code 1

• abracadabra
• Design a prefix code for the 5 symbols
{a,b,r,c,d} which compresses this string the
most.


Design a Prefix Code 2

• Suppose we have n symbols each with
probability 1/n. Design a prefix code with
minimum average bit rate.
• Consider n = 2,3,4,5,6 first.


Huffman Coding
• Huffman (1951)
• Uses frequencies of symbols in a string to build a
variable rate prefix code.
– Each symbol is mapped to a binary string.
– More frequent symbols have shorter codes.
– No code is a prefix of another.
• Example:

symbol   code
a        0
b        100
c        101
d        11

[Figure: the corresponding binary tree, with a at depth 1, d at depth 2, and b and c at depth 3.]


Variable Rate Code Example

• Example: a 0, b 100, c 101, d 11
• Coding:
– aabddcaa = 16 bits with a fixed 2-bit code
– 0 0 100 11 11 101 0 0 = 14 bits
• Prefix code ensures unique decodability.
– 00100111110100 decodes to aabddcaa


Cost of a Huffman Tree

• Let p1, p2, ... , pm be the probabilities for the
symbols a1, a2, ... ,am, respectively.
• Define the cost of the Huffman tree T to be
$$C(T) = \sum_{i=1}^{m} p_i r_i$$
where ri is the length of the path from the root
to ai.
• C(T) is the expected length of the code of a
symbol coded by the tree T. C(T) is the bit
rate of the code.


Example of Cost
• Example: a 1/2, b 1/8, c 1/8, d 1/4

[Figure: tree T with a at depth 1, d at depth 2, and b and c at depth 3.]

C(T) = 1 x 1/2 + 3 x 1/8 + 3 x 1/8 + 2 x 1/4 = 1.75
(the four terms are for a, b, c, d in that order)
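
The cost in the example above can be recomputed exactly with rational arithmetic. This is a small sketch; `cost`, `probs`, and `depths` are our names, and the depths are read off the example tree.

```python
from fractions import Fraction as F

def cost(probs, depths):
    """C(T) = sum of p_i * r_i, where r_i is the depth of symbol i in the tree."""
    return sum(probs[s] * depths[s] for s in probs)

probs = {"a": F(1, 2), "b": F(1, 8), "c": F(1, 8), "d": F(1, 4)}
depths = {"a": 1, "b": 3, "c": 3, "d": 2}
print(cost(probs, depths), float(cost(probs, depths)))
```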


Huffman Tree
• Input: Probabilities p1, p2, ... , pm for symbols
a1, a2, ... ,am, respectively.
• Output: A tree that minimizes the average
number of bits (bit rate) to code a symbol.
That is, it minimizes the bit rate
$$HC(T) = \sum_{i=1}^{m} p_i r_i$$
where ri is the length of the path from the root
to ai. This is the Huffman tree or Huffman
code.


Optimality Principle 1
• In a Huffman tree a lowest probability symbol
has maximum distance from the root.
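– If not, exchanging a lowest probability symbol with
one at maximum distance will lower the cost.

[Figure: in T the smallest probability p is at depth k and a probability q is at maximum depth h, with p < q and k < h; T’ is T with p and q exchanged.]

C(T’) = C(T) + hp - hq + kq - kp = C(T) - (h-k)(q-p) < C(T)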


Optimality Principle 2
• The second lowest probability is a sibling of
the smallest in some Huffman tree.
– If not, we can move it there not raising the cost.

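[Figure: in T the smallest probability p is at maximum depth h with sibling r, and the 2nd smallest probability q is at depth k, with q < r and k < h; T’ is T with q and r exchanged, so that q and p are siblings.]

C(T’) = C(T) + hq - hr + kr - kq = C(T) - (h-k)(r-q) < C(T)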


Optimality Principle 3
• Assuming we have a Huffman tree T whose two
lowest probability symbols are siblings at
maximum depth, they can be replaced by a new
symbol whose probability is the sum of their
probabilities.
– The resulting tree is optimal for the new symbol set.
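[Figure: in T the smallest probability p and the 2nd smallest probability q are sibling leaves at depth h; in T’ they are replaced by a single leaf of probability q+p at depth h-1.]

C(T’) = C(T) + (h-1)(p+q) - hp - hq = C(T) - (p+q)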


Optimality Principle 3 (cont’)

• If T’ were not optimal then we could find a
lower cost tree T’’. This will lead to a lower
cost tree T’’’ for the original alphabet.

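[Figure: T’ with the leaf q+p; a lower cost tree T’’ that also contains the leaf q+p; T’’’ obtained from T’’ by replacing that leaf with an internal node whose children are q and p.]

C(T’’’) = C(T’’) + p + q < C(T’) + p + q = C(T), which is a contradiction.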


Recursive Huffman Tree Algorithm

1. If there is just one symbol, a tree with one
node is optimal. Otherwise
2. Find the two lowest probability symbols with
probabilities p and q respectively.
3. Replace these with a new symbol with
probability p + q.
4. Solve the problem recursively for new symbols.
5. Replace the leaf with the new symbol with an
internal node with two children with the old symbols.


Iterative Huffman Tree Algorithm

form a node for each symbol ai with weight pi;
insert the nodes in a min priority queue ordered by probability;
while the priority queue has more than one element do
    min1 := delete-min;
    min2 := delete-min;
    create a new node n;
    n.weight := min1.weight + min2.weight;
    n.left := min1;
    n.right := min2;
    insert(n)
return the last node in the priority queue.
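
The pseudocode above maps almost line for line onto Python's `heapq` module. The sketch below is one possible rendering, not the lecture's own implementation: the function name `huffman_code`, the tuple representation of internal nodes, and the tie-breaking counter are our assumptions. Integer weights (probabilities times 10) are used to avoid floating-point noise. Ties may be broken differently than on the slides, so individual codewords can differ while the average length stays the same.

```python
import heapq
from itertools import count

def huffman_code(weights):
    """Return {symbol: codeword} for a dict of symbol weights.

    Symbols must not be tuples, because internal nodes are stored as
    (left, right) tuples.
    """
    tiebreak = count()  # keeps heap entries comparable when weights are equal
    heap = [(w, next(tiebreak), sym) for sym, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)     # min1 := delete-min
        w2, _, right = heapq.heappop(heap)    # min2 := delete-min
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (left, right)))
    code = {}

    def walk(node, prefix):
        if isinstance(node, tuple):           # internal node
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                                 # leaf
            code[node] = prefix or "0"        # single-symbol alphabet gets "0"

    walk(heap[0][2], "")
    return code

weights = {"a": 4, "b": 1, "c": 3, "d": 1, "e": 1}
code = huffman_code(weights)
total_bits = sum(weights[s] * len(code[s]) for s in weights)
print(code)
print(total_bits / sum(weights.values()), "bits per symbol")
```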


Example of Huffman Tree Algorithm (1)

• P(a) =.4, P(b)=.1, P(c)=.3, P(d)=.1, P(e)=.1

Start: a .4, b .1, c .3, d .1, e .1
Merge b (.1) and e (.1) into a node of weight .2:
a .4, (b e) .2, c .3, d .1


Example of Huffman Tree Algorithm (2)

Before: a .4, (b e) .2, c .3, d .1
Merge d (.1) and (b e) (.2) into a node of weight .3:
a .4, (d (b e)) .3, c .3


Example of Huffman Tree Algorithm (3)

Before: a .4, (d (b e)) .3, c .3
Merge c (.3) and (d (b e)) (.3) into a node of weight .6:
a .4, (c (d (b e))) .6


Example of Huffman Tree Algorithm (4)

Before: a .4, (c (d (b e))) .6
Merge a (.4) and (c (d (b e))) (.6) into the root of weight 1:
(a (c (d (b e))))


Huffman Code

average number of bits per symbol is
.4 x 1 + .1 x 4 + .3 x 2 + .1 x 3 + .1 x 4 = 2.1

symbol   code
a        0
b        1110
c        10
d        110
e        1111

[Figure: the Huffman tree (a (c (d (b e)))) with edges labeled 0 and 1.]


Optimal Huffman Code vs. Entropy

• P(a) =.4, P(b)=.1, P(c)=.3, P(d)=.1, P(e)=.1
Entropy

H = -(.4 x log2(.4) + .1 x log2(.1) + .3 x log2(.3)

+ .1 x log2(.1) + .1 x log2(.1))
= 2.05 bits per symbol

Huffman Code

HC = .4 x 1 + .1 x 4 + .3 x 2 + .1 x 3 + .1 x 4
= 2.1 bits per symbol
pretty good!


In Class Exercise
• P(a) = 1/2, P(b) = 1/4, P(c) = 1/8, P(d) = 1/16,
P(e) = 1/16
• Compute the Optimal Huffman tree and its
average bit rate.
• Compute the Entropy
• Compare
• Hint: For the tree change probabilities to be
integers: a:8, b:4, c:2, d:1, e:1. Normalize at
the end.


Quality of the Huffman Code

• The Huffman code is within one bit of the entropy
lower bound.
$$H \le HC \le H + 1$$
• Huffman code does not work well with a two symbol
alphabet.
– Example: P(0) = 1/100, P(1) = 99/100
– HC = 1 bits/symbol (a one-level tree: each symbol gets a 1-bit code)
– H = -((1/100)*log2(1/100) + (99/100)*log2(99/100))
= .08 bits/symbol


Powers of Two
• If all the probabilities are powers of two then
$$HC = H$$
• Proof by induction on the number of symbols.
Let p1 ≤ p2 ≤ ... ≤ pn be the probabilities that add up
to 1.
If n = 1 then HC = H (both are zero).
If n > 1 then p1 = p2 = 2^(-k) for some k, otherwise the
sum cannot add up to 1.
Combine the first two symbols into a new symbol of
probability 2^(-k) + 2^(-k) = 2^(-k+1).


Powers of Two (Cont.)

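By the induction hypothesis
$$
\begin{aligned}
HC(p_1+p_2, p_3, \ldots, p_n) &= H(p_1+p_2, p_3, \ldots, p_n) \\
&= -(p_1+p_2)\log_2(p_1+p_2) - \sum_{i=3}^{n} p_i \log_2(p_i) \\
&= -2^{-k+1}\log_2(2^{-k+1}) - \sum_{i=3}^{n} p_i \log_2(p_i) \\
&= -2^{-k+1}\left(\log_2(2^{-k}) + 1\right) - \sum_{i=3}^{n} p_i \log_2(p_i) \\
&= -2^{-k}\log_2(2^{-k}) - 2^{-k}\log_2(2^{-k}) - \sum_{i=3}^{n} p_i \log_2(p_i) - 2^{-k} - 2^{-k} \\
&= -\sum_{i=1}^{n} p_i \log_2(p_i) - (p_1+p_2) \\
&= H(p_1, p_2, \ldots, p_n) - (p_1+p_2)
\end{aligned}
$$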


Powers of Two (Cont.)

By the previous page,
$$HC(p_1+p_2, p_3, \ldots, p_n) = H(p_1, p_2, \ldots, p_n) - (p_1+p_2)$$
By the properties of Huffman trees (principle 3),
$$HC(p_1, p_2, \ldots, p_n) = HC(p_1+p_2, p_3, \ldots, p_n) + (p_1+p_2)$$
Hence,
$$HC(p_1, p_2, \ldots, p_n) = H(p_1, p_2, \ldots, p_n)$$


Extending the Alphabet

• Assuming independence P(ab) = P(a)P(b), so
we can lump symbols together.
• Example: P(0) = 1/100, P(1) = 99/100
– P(00) = 1/10000, P(01) = P(10) = 99/10000,
P(11) = 9801/10000.

[Figure: Huffman tree for the pairs, with 11 at depth 1, 10 at depth 2, and 01 and 00 at depth 3.]

HC = 1.03 bits/symbol (2 bit symbol)
= .515 bits/bit
Still not that close to H = .08 bits/bit
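
To see how the rate improves as blocks get longer, the sketch below computes the Huffman cost for blocks of k bits. It uses the fact that the expected codeword length of a Huffman code equals the sum of the weights created by the merges. The names `huffman_cost` and `blocks` are ours, and independence of the bits is assumed, as on the slide.

```python
import heapq
from fractions import Fraction as F
from itertools import product
from math import prod

def huffman_cost(probs):
    """Expected codeword length of a Huffman code: the sum of all merged weights."""
    heap = list(probs)
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged            # every merge adds one bit to the leaves below it
        heapq.heappush(heap, merged)
    return total

p = {"0": F(1, 100), "1": F(99, 100)}
for k in (1, 2, 3):
    # Probability of each k-bit block under independence.
    blocks = [prod(p[b] for b in block) for block in product("01", repeat=k)]
    print(k, float(huffman_cost(blocks) / k), "bits/bit")
```

Exact fractions are used so the k = 2 result can be compared directly with the 1.03 bits per pair quoted above.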


Quality of Extended Alphabet

• Suppose we extend the alphabet to symbols
of length k then

$$H \le HC \le H + 1/k$$
• Pros and Cons of Extending the alphabet
+ Better compression
- 2^k symbols
- padding needed to make the length of the input
divisible by k


Huffman Codes with Context

• Suppose we add a one symbol context. That is, in
compressing a string x_1 x_2 ... x_n we want to take into
account x_{k-1} when encoding x_k.
– New model, so entropy based on just independent
probabilities of the symbols doesn’t hold. The new entropy
model (2nd order entropy) has for each symbol a probability
for each other symbol following it.
– Example: {a,b,c}

            next
prev     a     b     c
a       .4    .2    .4
b       .1    .9     0
c       .1    .1    .8


Multiple Codes
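            next
prev     a     b     c
a       .4    .2    .4
b       .1    .9     0
c       .1    .1    .8

Code for first symbol: a 00, b 01, c 10

[Figure: one prefix-code tree per previous symbol.
prev a: a (.4) at depth 1; b (.2) and c (.4) at depth 2
prev b: a (.1) and b (.9) at depth 1
prev c: c (.8) at depth 1; a (.1) and b (.1) at depth 2]

Example: abbacc is coded as 00 00 0 1 01 0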


Average Bit Rate for Code

• P(a) = .4 P(a) + .1 P(b) + .1 P(c)
P(b) = .2 P(a) + .9 P(b) + .1 P(c)
1 = P(a) + P(b) + P(c)
• 0 = -.6 P(a) + .1 P(b) + .1 P(c)
0 = .2 P(a) - .1 P(b) + .1 P(c)
1 = P(a) + P(b) + P(c)
• P(a) = 1/7, P(b) = 4/7, P(c) = 2/7


Average Bit Rate for Code

[Figure: the three per-context trees, weighted by the stationary probabilities P(a) = 1/7, P(b) = 4/7, P(c) = 2/7 of the previous symbol.]

ABR = 1/7 (.6 x 2 + .4) + 4/7 (1) + 2/7 ( .2 x 2 +.8)

= 8/7 = 1.14 bps
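
The stationary probabilities and the average bit rate can be checked numerically. The sketch below finds the stationary distribution by repeatedly applying the transition table (power iteration, which is our choice of method; the slides solve the linear system directly). The codeword lengths in `LENGTH` are read off the ABR formula and the abbacc example, and the names `P`, `LENGTH`, and `stationary` are ours.

```python
P = {  # P[prev][next], the transition table from the context example
    "a": {"a": 0.4, "b": 0.2, "c": 0.4},
    "b": {"a": 0.1, "b": 0.9, "c": 0.0},
    "c": {"a": 0.1, "b": 0.1, "c": 0.8},
}
LENGTH = {  # codeword length of `next` in the code used after `prev`
    "a": {"a": 1, "b": 2, "c": 2},
    "b": {"a": 1, "b": 1},
    "c": {"a": 2, "b": 2, "c": 1},
}

def stationary(P, steps=500):
    """Approximate the stationary distribution by power iteration."""
    pi = {s: 1 / len(P) for s in P}
    for _ in range(steps):
        pi = {t: sum(pi[s] * P[s][t] for s in P) for t in P}
    return pi

pi = stationary(P)
# Average bit rate: weight each context's expected code length by P(prev).
abr = sum(pi[s] * sum(P[s][t] * n for t, n in LENGTH[s].items()) for s in P)
print(pi)
print(abr)
```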


Complexity of Huffman Code Design

• Time to design Huffman Code is O(n log n)
where n is the number of symbols.
– Each step consists of a constant number of priority
queue operations (2 deletemin’s and 1 insert)


Approaches to Huffman Codes

1. Frequencies computed for each input
– Must transmit the Huffman code or
frequencies as well as the compressed input
– Requires two passes
2. Fixed Huffman tree designed from training data
– Do not have to transmit the Huffman tree
because it is known to the decoder.
– H.263 video coder
3. Adaptive Huffman code
– One pass
– Huffman tree changes as frequencies change


Run-Length Coding
• Lots of 0’s and not too many 1’s.
– Fax of letters
– Graphics
• Simple run-length code
– Input
00000010000000001000000000010001001.....
– Symbols
6 9 10 3 2 ...
– Code the bits as a sequence of integers
– Problem: How long should the integers be?


Golomb Code of Order m

Variable Length Code for Integers
• Let n = qm + r where 0 ≤ r < m.
– Divide m into n to get the quotient q and
remainder r.
• Code for n has two parts:
1. q is coded in unary
2. r is coded as a fixed prefix code
Example: m = 5, code for r

r      0    1    2    3     4
code   00   01   10   110   111


Example
• n = qm + r is represented by:
$$\underbrace{11\cdots1}_{q}\,0\,\hat{r}$$
– where r̂ is the fixed prefix code for r
• Example (m = 5):

n      2     6      9       10      27
code   010   1001   10111   11000   11111010
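
The m = 5 examples above can be reproduced with a short encoder. In this sketch the remainder is coded with the truncated binary code, which matches the m = 5 codewords shown in the examples; treating that as the "fixed prefix code" for every m is our assumption, and the function name `golomb_encode` is ours.

```python
def golomb_encode(n: int, m: int) -> str:
    """Golomb code of order m (m >= 2) for a non-negative integer n."""
    if m < 2 or n < 0:
        raise ValueError("need m >= 2 and n >= 0")
    q, r = divmod(n, m)
    b = (m - 1).bit_length()      # ceil(log2(m))
    cutoff = (1 << b) - m         # this many remainders get only b-1 bits
    if r < cutoff:
        r_bits = format(r, f"0{b - 1}b")
    else:
        r_bits = format(r + cutoff, f"0{b}b")
    # q in unary (q ones, then a terminating 0), followed by the remainder code.
    return "1" * q + "0" + r_bits

print([golomb_encode(n, 5) for n in (2, 6, 9, 10, 27)])
```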


Alternative Explanation
Golomb Code of order 5
input   output
00000   1
00001   0111
0001    0110
001     010
01      001
1       000

Variable length to variable length code.


Run Length Example: m = 5

input: 00000010000000001000000000010001001.....

00000 → 1
01    → 001
00000 → 1
00001 → 0111

In this example we coded 17 bits in only 9 bits.
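
The parsing in this example can be written as a short loop over the input bits. This is a sketch using the order 5 table; `R_CODE` and `golomb_encode_bits` are our names, and zeros left over at the end of the input are simply not flushed here.

```python
R_CODE = {0: "00", 1: "01", 2: "10", 3: "110", 4: "111"}  # remainder code for m = 5

def golomb_encode_bits(bits: str, m: int = 5, r_code=R_CODE) -> str:
    """Code a 0/1 string as runs of zeros, each run ended by a 1."""
    out, run = [], 0
    for bit in bits:
        if bit == "0":
            run += 1
            if run == m:              # a full block of m zeros is coded as 1
                out.append("1")
                run = 0
        else:                         # a 1 ends a run of `run` zeros
            out.append("0" + r_code[run])
            run = 0
    return "".join(out)               # leftover zeros with no closing 1 are dropped

print(golomb_encode_bits("00000010000000001"))
```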


Choosing m
• Suppose that 0 has the probability p and 1
has probability 1-p.
• The probability of 0^n 1 is p^n (1-p). The Golomb
code of order
$$m = \left\lceil \frac{-1}{\log_2 p} \right\rceil$$
is optimal.
• Example: p = 127/128.
$$m = \left\lceil \frac{-1}{\log_2(127/128)} \right\rceil = 89$$
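
The order for a given p is a one-line computation. This is a minimal sketch; the function name `golomb_order` is ours.

```python
from math import ceil, log2

def golomb_order(p: float) -> int:
    """Order m = ceil(-1 / log2(p)) for probability p of a 0, with 0 < p < 1."""
    return ceil(-1 / log2(p))

print(golomb_order(127 / 128))
```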


Average Bit Rate for Golomb Code

$$\text{Average Bit Rate} = \frac{\text{Average output code length}}{\text{Average input code length}}$$

• m = 4 as an example. With p as the probability of 0.

$$ABR = \frac{p^4 + 3p^3(1-p) + 3p^2(1-p) + 3p(1-p) + 3(1-p)}{4p^4 + 4p^3(1-p) + 3p^2(1-p) + 2p(1-p) + (1-p)}$$

output   1      011        010        001      000
input    0000   0001       001        01       1
weight   p^4    p^3(1-p)   p^2(1-p)   p(1-p)   1-p


Comparison of GC with Entropy

[Figure: comparison of the Golomb code (GC) with the entropy; labels recovered from the plot: GC – entropy, order, entropy.]


Notes on Golomb codes

• Useful for binary compression when one symbol is
much more likely than another.
– binary images
– fax documents
– bit planes for wavelet image compression
• Need a parameter (the order)
– training
– adaptively learn the right parameter
• Variable-to-variable length code
• Last symbol needs to be a 1
– coder always adds a 1
– decoder always removes a 1


Tunstall Codes
• Variable-to-fixed length code
• Example
input   output
a       000
b       001
ca      010
cb      011
cca     100
ccb     101
ccc     110

Example: a b cca cb ccc ... is coded as 000 001 100 011 110 ...
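
Encoding with this table amounts to reading symbols until the word read so far is in the table, then emitting its 3-bit code. The sketch below uses our own names (`TABLE`, `tunstall_encode`) and separates the output codes with spaces for readability.

```python
TABLE = {"a": "000", "b": "001", "ca": "010", "cb": "011",
         "cca": "100", "ccb": "101", "ccc": "110"}

def tunstall_encode(text, table=TABLE):
    """Parse the input into table words and emit one fixed-length code per word."""
    out, word = [], ""
    for symbol in text:
        word += symbol
        if word in table:             # the input words are prefix-free
            out.append(table[word])
            word = ""
    if word:
        raise ValueError(f"leftover input {word!r} has no codeword")
    return " ".join(out)

print(tunstall_encode("abccacbccc"))
```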


Tunstall code Properties

1. No input code is a prefix of another to
assure unique encodability.
2. Minimize the number of bits per symbol.


Prefix Code Property

input   output
a       000
b       001
ca      010
cb      011
cca     100
ccb     101
ccc     110

[Figure: the input words as leaves of a tree with branches a, b, c at each internal node; the c branch is expanded twice.]

Unused output code is 111.


Use for unused code

• Consider the string “cc”, if it occurs at the end
of the data. It does not have a code.
• Send the unused code and some fixed code
for the cc.
• Generally, if there are k internal nodes in the
prefix tree then there is a need for k-1 fixed
codes.


Designing a Tunstall Code

• Suppose there are m initial symbols.
• Choose a target output length n where 2^n > m.
1. Form a tree with a root and m children with
edges labeled with the symbols.
2. If the number of leaves is > 2^n – m then halt.*
3. Find the leaf with highest probability and
expand it to have m children.** Go to 2.

* In the next step we will add m-1 more leaves.

** The probability is the product of the probabilities
of the symbols on the root to leaf path.
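
The design procedure above can be sketched with a max-heap of leaves keyed by probability. The function name `tunstall` is ours, and assigning the n-bit codes to the words in sorted order is our assumption; the procedure itself only fixes the set of input words.

```python
import heapq

def tunstall(probs, n):
    """Return {input word: n-bit code} for symbol probabilities `probs`."""
    m = len(probs)
    # heapq is a min-heap, so store negated probabilities to pop the largest.
    heap = [(-p, s) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) <= 2 ** n - m:            # step 2: halt once leaves > 2^n - m
        neg_p, word = heapq.heappop(heap)     # step 3: most probable leaf
        for s, p in probs.items():
            heapq.heappush(heap, (neg_p * p, word + s))
    words = sorted(w for _, w in heap)
    return {w: format(i, f"0{n}b") for i, w in enumerate(words)}

print(tunstall({"a": 0.7, "b": 0.2, "c": 0.1}, 3))
```

Each expansion replaces one leaf with m leaves, so the leaf count grows by m-1 per step and never exceeds 2^n.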


Example
• P(a) = .7, P(b) = .2, P(c) = .1
• n=3
[Figure: root with leaves a (.7), b (.2), c (.1).]


Example
• P(a) = .7, P(b) = .2, P(c) = .1
• n=3
[Figure: leaf a expanded; the leaves are aa (.49), ab (.14), ac (.07), b (.2), c (.1).]


Example
• P(a) = .7, P(b) = .2, P(c) = .1
• n=3

[Figure: leaf aa expanded; the leaves are aaa (.343), aab (.098), aac (.049), ab (.14), ac (.07), b (.2), c (.1).]

input   output
aaa     000
aab     001
aac     010
ab      011
ac      100
b       101
c       110


Bit Rate of Tunstall

• The length of the output code divided by the
average length of the input code.
• Let pi be the probability of, and ri the length of,
input code i (1 ≤ i ≤ s) and let n be the length
of the output code.
$$\text{Average bit rate} = \frac{n}{\sum_{i=1}^{s} p_i r_i}$$


Example
input   probability   output
aaa     .343          000
aab     .098          001
aac     .049          010
ab      .14           011
ac      .07           100
b       .2            101
c       .1            110

[Figure: the Tunstall tree with these seven leaves.]

ABR = 3/[3 (.343 + .098 + .049) + 2 (.14 + .07) + .2 + .1]

= 1.37 bits per symbol
Entropy = 1.16 bits per symbol
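
Both numbers above follow directly from the table. A small sketch, with variable names of our choosing:

```python
from math import log2

# Input words of the Tunstall code with their probabilities.
words = {"aaa": .343, "aab": .098, "aac": .049,
         "ab": .14, "ac": .07, "b": .2, "c": .1}
n = 3  # output code length in bits

avg_input_len = sum(p * len(w) for w, p in words.items())
abr = n / avg_input_len
entropy = sum(p * log2(1 / p) for p in (.7, .2, .1))
print(round(abr, 2), round(entropy, 2))
```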


Notes on Tunstall Codes

• Variable-to-fixed length code
• Error resilient
– A flipped bit will introduce just one error in the
output
– Huffman is not error resilient. A single bit flip can
destroy the code.

Computer Vision & Biometrics Lab, Indian Institute of Information Technology, Allahabad 8
7

Fiat - Q Interpersonal Relationships Questionnaire: Class A: Assertion of Needs (Identification & Expression)
No ratings yet
Fiat - Q Interpersonal Relationships Questionnaire: Class A: Assertion of Needs (Identification & Expression)
5 pages
HUM 1020 Adventure Map
No ratings yet
HUM 1020 Adventure Map
11 pages
Lecture 1
No ratings yet
Lecture 1
35 pages
HTCS501 Unit 4
No ratings yet
HTCS501 Unit 4
17 pages
L15 Compression
No ratings yet
L15 Compression
63 pages
Chapter 7
No ratings yet
Chapter 7
36 pages
Data Compression Unit-1 - 1
No ratings yet
Data Compression Unit-1 - 1
21 pages
CSEP 590 Data Compression: Course Policies Introduction To Data Compression Entropy Variable Length Codes
No ratings yet
CSEP 590 Data Compression: Course Policies Introduction To Data Compression Entropy Variable Length Codes
93 pages
Chap 2
No ratings yet
Chap 2
47 pages
EC 2214: Coding & Data Compression: Vishwakarma Institute of Technology
No ratings yet
EC 2214: Coding & Data Compression: Vishwakarma Institute of Technology
35 pages
Lecture I: Data Compression Data Encoding: Efficient Information Encoding To
No ratings yet
Lecture I: Data Compression Data Encoding: Efficient Information Encoding To
48 pages
Lec 42024
No ratings yet
Lec 42024
13 pages
Chapter 2-Compression Techniques
No ratings yet
Chapter 2-Compression Techniques
63 pages
Data Compression Explained
No ratings yet
Data Compression Explained
110 pages
Advanced Multimedia Infrastructure
No ratings yet
Advanced Multimedia Infrastructure
32 pages
Chapter10 Part1 Huffman
No ratings yet
Chapter10 Part1 Huffman
17 pages
01 EntropyLosslessCoding PDF
No ratings yet
01 EntropyLosslessCoding PDF
29 pages
Chapter 5 Data Compression
No ratings yet
Chapter 5 Data Compression
17 pages
09 Basic Compression
No ratings yet
09 Basic Compression
81 pages
Dce Easy Solution
0% (1)
Dce Easy Solution
87 pages
Main Techniques and Performance of Each Compression
No ratings yet
Main Techniques and Performance of Each Compression
23 pages
Types of Coding: - Source Coding - Code Data To More Efficiently Represent
No ratings yet
Types of Coding: - Source Coding - Code Data To More Efficiently Represent
41 pages