Huffman and Arithmetic Coding

CS 753 DIP

Module 5 Image Compression

CS 414 - Spring 2011


Outline
• Statistical Entropy Coding
– Huffman coding
– Arithmetic coding



Huffman Encoding
• Statistical encoding
• To determine the Huffman code, it is useful to construct a binary tree
• Leaves are the characters to be encoded
• Internal nodes carry the combined occurrence probability of the characters in their sub-tree



Huffman Encoding (Example)
Step 1: Sort all symbols according to their probabilities (left to right), from smallest to largest.
These are the leaves of the Huffman tree:

P(C) = 0.09, P(E) = 0.11, P(D) = 0.13, P(A) = 0.16, P(B) = 0.51



Huffman Encoding (Example)
Step 2: Build a binary tree from the sorted leaves.
Policy: always connect the two smallest nodes. For example, P(CE) and P(DA) both had probabilities smaller than P(B), so those two were connected first.

Merge sequence:
P(CE)    = 0.09 + 0.11 = 0.20
P(DA)    = 0.13 + 0.16 = 0.29
P(CEDA)  = 0.20 + 0.29 = 0.49
P(CEDAB) = 0.49 + 0.51 = 1.00



Huffman Encoding (Example)
Step 3: Label left branches of the tree with 0 and right branches with 1.

P(CEDAB) = 1.00
├─0─ P(CEDA) = 0.49
│    ├─0─ P(CE) = 0.20
│    │    ├─0─ P(C) = 0.09
│    │    └─1─ P(E) = 0.11
│    └─1─ P(DA) = 0.29
│         ├─0─ P(D) = 0.13
│         └─1─ P(A) = 0.16
└─1─ P(B) = 0.51



Huffman Encoding (Example)
Step 4: Create the Huffman code for each symbol by reading the branch labels from the root down to its leaf:

Symbol A = 011
Symbol B = 1
Symbol C = 000
Symbol D = 010
Symbol E = 001
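To make the construction concrete, here is a minimal Python sketch (not from the slides; the function name is ours) that builds the same tree with a min-heap, always merging the two lowest-probability nodes, and then reads the codes off it:

```python
import heapq
import itertools

def huffman_codes(probs):
    """probs: dict of symbol -> probability. Returns dict of symbol -> code."""
    counter = itertools.count()  # unique tie-breaker so heap tuples always compare
    heap = [(p, next(counter), sym) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)    # the two smallest nodes...
        p2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next(counter), (left, right)))  # ...merge first
    _, _, root = heap[0]

    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):          # internal node: descend both branches
            walk(node[0], prefix + "0")      # left branch labeled 0
            walk(node[1], prefix + "1")      # right branch labeled 1
        else:
            codes[node] = prefix or "0"      # leaf: record the codeword
    walk(root, "")
    return codes

print(huffman_codes({"A": 0.16, "B": 0.51, "C": 0.09, "D": 0.13, "E": 0.11}))
# -> {'C': '000', 'E': '001', 'D': '010', 'A': '011', 'B': '1'}
# (ties between equal probabilities could flip some 0/1 assignments)
```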



Huffman Decoding
• Assume the Huffman table:

Symbol  Code
X       0
Y       10
Z       11

Consider the encoded bitstream:
000101011001110
What is the decoded string?
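A small decoding sketch (an assumed helper, not part of the slides): extend the current prefix bit by bit until it matches a codeword; because no codeword is a prefix of another, the first match is always correct.

```python
def huffman_decode(bits, table):
    """bits: string of '0'/'1'. table: dict of symbol -> codeword."""
    inverse = {code: sym for sym, code in table.items()}
    decoded, prefix = [], ""
    for bit in bits:
        prefix += bit
        if prefix in inverse:        # prefix property: a match ends the codeword
            decoded.append(inverse[prefix])
            prefix = ""
    return "".join(decoded)

print(huffman_decode("000101011001110", {"X": "0", "Y": "10", "Z": "11"}))
# -> 'XXXYYZXXZY'
```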



Huffman Example
• Construct the Huffman coding tree (in class)

Symbol (S)  P(S)
A           0.25
B           0.30
C           0.12
D           0.15
E           0.18

Characteristics of the solution:

Symbol (S)  Code
A           01
B           11
C           100
D           101
E           00
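For reference, the construction sketch shown earlier reproduces exactly this table: huffman_codes({"A": 0.25, "B": 0.30, "C": 0.12, "D": 0.15, "E": 0.18}) yields A = 01, B = 11, C = 100, D = 101, E = 00 (up to tie-breaking).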



Example Encoding/Decoding

Symbol (S)  Code
A           01
B           11
C           100
D           101
E           00

Encode "BEAD" → 110001101
Decode "0101100" → ?
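Applying the decode sketch from earlier: "0101100" splits as 01 | 01 | 100, which decodes to "AAC".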



Entropy (Theoretical Limit)

H = -\sum_{i=1}^{N} p(s_i) \log_2 p(s_i)

Symbol  P(S)   Code
A       0.25   01
B       0.30   11
C       0.12   100
D       0.15   101
E       0.18   00

H = -(.25 log2 .25 + .30 log2 .30 + .12 log2 .12 + .15 log2 .15 + .18 log2 .18)
  = 2.24 bits
Average Codeword Length

L = \sum_{i=1}^{N} p(s_i) \cdot \mathrm{codelength}(s_i)

Symbol  P(S)   Code
A       0.25   01
B       0.30   11
C       0.12   100
D       0.15   101
E       0.18   00

L = .25(2) + .30(2) + .12(3) + .15(3) + .18(2)
  = 2.27 bits
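Both quantities can be checked with a few lines of Python (a sketch using the table above):

```python
from math import log2

table = {"A": (0.25, "01"), "B": (0.30, "11"), "C": (0.12, "100"),
         "D": (0.15, "101"), "E": (0.18, "00")}

H = -sum(p * log2(p) for p, _ in table.values())        # entropy
L = sum(p * len(code) for p, code in table.values())    # average codeword length
print(f"H = {H:.2f} bits, L = {L:.2f} bits")            # -> H = 2.24 bits, L = 2.27 bits
```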
Code Length Relative to Entropy

L = \sum_{i=1}^{N} p(s_i) \cdot \mathrm{codelength}(s_i)        H = -\sum_{i=1}^{N} p(s_i) \log_2 p(s_i)

• Huffman coding reaches the entropy limit when all probabilities are negative powers of 2
  – i.e., 1/2, 1/4, 1/8, 1/16, etc.

• H ≤ L ≤ H + 1



Example

Symbol  P(S)   Code
A       0.01   1
B       0.99   0

H = -(.01 log2 .01 + .99 log2 .99) = .08 bits
L = .01(1) + .99(1) = 1 bit



Group Exercise

Symbol (S)  P(S)
A           0.1
B           0.2
C           0.4
D           0.2
E           0.1

• Compute the entropy (H)
• Build the Huffman tree
• Compute the average code length
• Code "BCCADE"

(Solution on the final slide.)



Limitations
• Diverges from the lower limit when the probability of a particular symbol becomes high
  – always uses an integral number of bits per symbol

• Must send the code book with the data
  – lowers overall efficiency

• Must determine the frequency distribution
  – it must remain stable over the data set
Arithmetic Coding
• As optimal as Huffman coding with respect to compression ratio
• Better than Huffman with respect to the amount of side information that must be transmitted
  – Huffman needs to transmit the Huffman tables with the compressed data
  – Arithmetic needs to transmit only the length of the encoded string with the compressed data



Arithmetic Coding
• Each symbol is coded by considering all prior data
• Encoded data must be read from the beginning; no random access is possible
• Each real number (< 1) is represented as a binary fraction
  – 0.5 = 2^-1 (binary fraction 0.1); 0.25 = 2^-2 (binary fraction 0.01); 0.625 = 0.5 + 0.125 (binary fraction 0.101); …
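To make the interval-narrowing idea concrete, here is a minimal floating-point sketch (not from the slides; the three-symbol model is invented for illustration, and a production coder would use integer arithmetic to avoid precision loss):

```python
def build_ranges(probs):
    """Assign each symbol a sub-interval of [0, 1) proportional to its probability."""
    ranges, cum = {}, 0.0
    for sym in sorted(probs):            # sorted order so encoder and decoder agree
        ranges[sym] = (cum, cum + probs[sym])
        cum += probs[sym]
    return ranges

def arithmetic_encode(message, probs):
    """Return one real number inside the final interval that identifies `message`."""
    ranges, low, high = build_ranges(probs), 0.0, 1.0
    for sym in message:
        span = high - low
        lo, hi = ranges[sym]
        low, high = low + span * lo, low + span * hi   # narrow to the symbol's slice
    return (low + high) / 2              # any number in [low, high) would do

def arithmetic_decode(code, length, probs):
    """Recover `length` symbols; the decoder needs the length, per the slide."""
    ranges, out = build_ranges(probs), []
    for _ in range(length):
        for sym, (lo, hi) in ranges.items():
            if lo <= code < hi:
                out.append(sym)
                code = (code - lo) / (hi - lo)         # rescale for the next symbol
                break
    return "".join(out)

probs = {"A": 0.5, "B": 0.25, "C": 0.25}               # illustrative model
x = arithmetic_encode("ABCA", probs)
print(x, arithmetic_decode(x, 4, probs))               # -> 0.3515625 ABCA
```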



Other Compression Steps
• Picture processing (Source Coding)
  – Transformation from the time/spatial domain to the frequency domain (e.g., using the Discrete Cosine Transform)
  – Motion vector computation in video
• Quantization (see the sketch after this list)
  – Reduction of precision, e.g., cutting the least-significant bits
  – Quantization matrix, quantization values
• Entropy Coding
  – Huffman Coding + RLE
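As a minimal illustration of the quantization step (a sketch, not the slides' exact method): cutting the k least-significant bits of 8-bit samples reduces them to 2^(8-k) levels, and the discarded bits are where the loss occurs.

```python
def quantize(samples, k=4):
    """Zero the k least-significant bits of each 8-bit sample."""
    return [(s >> k) << k for s in samples]

print(quantize([13, 200, 97, 61]))  # -> [0, 192, 96, 48]
```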



Audio Compression and Formats (Hybrid Coding Schemes)
• MP3 (MPEG-1 Audio Layer III)
• ADPCM
• μ-law
• RealAudio
• Windows Media (.wma)

• Sun (.au)
• Apple (.aif)
• Microsoft (.wav)



Image Compression and Formats
• RLE
• Huffman
• LZW
• GIF
• JPEG / JPEG-2000 (Hybrid Coding)
• Fractals

• TIFF, PICT, BMP, etc.



Video Compression and Formats (Hybrid Coding Schemes)
• H.261 / H.263 / H.264
• Cinepak (early 1992, Apple's video codec in the QuickTime video suite)
• Sorenson (Sorenson Media; used in QuickTime and Macromedia Flash)
• Indeo (early 1992, Intel video codec)
• RealVideo (1997, RealNetworks)
• MPEG-1, MPEG-2, MPEG-4, etc.

• QuickTime, AVI, WMV (Windows Media Video)



Summary
• Important lossless (entropy) coders
  – RLE, Huffman coding, and arithmetic coding
• Important lossy (source) coders
  – Quantization
    • Differential PCM (DPCM): calculate the difference from previous values, so there are fewer distinct values to encode (see the sketch below)
    • Loss occurs during quantization of the sample values, hence loss of difference precision occurs as well
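A minimal DPCM sketch (assumed, not from the slides) showing the encode/decode pair; the differences cluster near zero, which is what leaves fewer values to encode:

```python
def dpcm_encode(samples):
    prev, diffs = 0, []
    for s in samples:
        diffs.append(s - prev)   # transmit the difference, not the raw sample
        prev = s
    return diffs

def dpcm_decode(diffs):
    prev, out = 0, []
    for d in diffs:
        prev += d                # running sum rebuilds each sample
        out.append(prev)
    return out

print(dpcm_encode([100, 102, 101, 105]))  # -> [100, 2, -1, 4]
print(dpcm_decode([100, 2, -1, 4]))       # -> [100, 102, 101, 105]
```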
Solution (Group Exercise)

Symbol  P(S)  Code
A       0.1   100
B       0.2   110
C       0.4   0
D       0.2   111
E       0.1   101

• Entropy: H = 2.1 bits
• Average code length: L = 2.2 bits
• Code "BCCADE" → 110 | 0 | 0 | 100 | 111 | 101 → 11000100111101
