
Huffman Coding

Vida Movahedi

October 2006
Contents
• A simple example
• Definitions
• Huffman Coding Algorithm
• Image Compression
A simple example
• Suppose we have a message composed from an alphabet of 5
symbols, e.g. the 10-symbol message [►♣♣♠☻►♣☼►☻]
• How can we code this message using 0/1 so that the coded
message has minimum length (for transmission or storage)?

• 5 symbols → at least ⌈log2 5⌉ = 3 bits per symbol

• For a simple fixed-length encoding, the
length of the coded message is 10*3 = 30 bits
A simple example – cont.
• Intuition: symbols that occur more frequently should have
shorter codes; but since the codewords then differ in length,
there must be a way of telling where each codeword ends

• For the Huffman code, the length of the
encoded message ►♣♣♠☻►♣☼►☻ is
3*2 + 3*2 + 2*2 + 1*3 + 1*3 = 22 bits
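As a sanity check on this arithmetic, here is a small Python sketch (added here, not part of the original slides) that counts the symbol frequencies and totals both encodings; the 2-bit and 3-bit codeword lengths are the ones derived above:

```python
from collections import Counter

message = "►♣♣♠☻►♣☼►☻"
freq = Counter(message)               # ►:3, ♣:3, ☻:2, ♠:1, ☼:1

fixed_bits = 3 * len(message)         # fixed-length code: 30 bits

code_len = {"►": 2, "♣": 2, "☻": 2, "♠": 3, "☼": 3}
huffman_bits = sum(freq[s] * code_len[s] for s in freq)   # 22 bits

print(fixed_bits, huffman_bits)       # 30 22
```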
Definitions
• An ensemble X is a triple (x, Ax, Px)
– x: value of a random variable
– Ax: set of possible values for x, Ax = {a1, a2, …, aI}
– Px: probability for each value, Px = {p1, p2, …, pI},
where P(x) = P(x = ai) = pi, pi > 0, Σi pi = 1

• Shannon information content of x
– h(x) = log2(1/P(x))

• Entropy of X
– H(X) = Σx∈Ax P(x) · log2(1/P(x))

Example (letter probabilities in English text):

i    ai   pi      h(pi)
1    a    .0575   4.1
2    b    .0128   6.3
3    c    .0263   5.2
…    …    …       …
26   z    .0007   10.4
Huffman Coding Algorithm
1. Take the two least probable symbols in the
alphabet
(these two will be given the longest codewords, of equal
length, differing only in the last digit)

2. Combine these two symbols into a single
symbol, and repeat.
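The two steps above translate directly into a priority-queue implementation. The following is a minimal sketch, not from the slides (huffman_code is a name chosen here), that repeatedly merges the two least probable entries and prepends a 0 or 1 to the codewords on each side:

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Build a Huffman code for a {symbol: probability} ensemble.

    Repeatedly pops the two least probable entries, prefixes their
    codewords with 0 and 1 respectively, and pushes the merged
    entry back. Returns {symbol: codeword}.
    """
    tie = count()  # unique tie-breaker so heapq never compares dicts
    heap = [(p, next(tie), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, group0 = heapq.heappop(heap)   # least probable
        p1, _, group1 = heapq.heappop(heap)   # second least probable
        # the two merged groups' codewords differ only in this digit
        merged = {s: "0" + c for s, c in group0.items()}
        merged.update({s: "1" + c for s, c in group1.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]
```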
Example
• Ax = {a, b, c, d, e}
• Px = {0.25, 0.25, 0.2, 0.15, 0.15}

[Huffman tree: d and e merge into 0.3; a and the 0.3 node merge
into 0.55; b and c merge into 0.45; 0.55 and 0.45 merge into the
root 1.0; each left branch is labelled 0, each right branch 1]

symbol   a      b      c      d      e
pi       0.25   0.25   0.2    0.15   0.15
code     00     10     11     010    011
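Running the huffman_code sketch from the previous section on this ensemble reproduces the result; the exact bit patterns depend on how ties are broken, but the codeword lengths 2, 2, 2, 3, 3 and the average length are forced:

```python
Px = {"a": 0.25, "b": 0.25, "c": 0.2, "d": 0.15, "e": 0.15}
code = huffman_code(Px)

for s in "abcde":
    print(s, code[s])   # lengths: a, b, c -> 2 bits; d, e -> 3 bits

avg = sum(p * len(code[s]) for s, p in Px.items())
print(avg)              # 2.3 bits/symbol, vs. entropy ≈ 2.2855 bits
```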
Disadvantages of the Huffman Code
• Changing ensemble
– If the ensemble changes, the frequencies and probabilities
change → the optimal code changes
– e.g. in text compression, symbol frequencies vary with context
– Re-computing the Huffman code means running through the entire
file in advance?!
– And saving/transmitting the code itself too?!

• Does not consider ‘blocks of symbols’
– after ‘strings_of_ch’, the next nine symbols ‘aracters_’ are
almost certainly predictable, yet whole bits are still spent on
them without conveying any new information (see the sketch below)
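One standard remedy for the second point, shown here purely as an illustration rather than anything from the original slides, is to Huffman-code blocks of symbols. Reusing the huffman_code sketch from earlier, this compares the per-symbol cost of coding single symbols versus pairs for a skewed two-symbol source:

```python
# Skewed two-symbol source, chosen here only for illustration
single = {"a": 0.9, "b": 0.1}
pairs = {x + y: single[x] * single[y] for x in single for y in single}

code1 = huffman_code(single)   # one codeword per symbol
code2 = huffman_code(pairs)    # one codeword per symbol *pair*

bits_per_symbol_1 = sum(p * len(code1[s]) for s, p in single.items())
bits_per_symbol_2 = sum(p * len(code2[s]) for s, p in pairs.items()) / 2

print(bits_per_symbol_1)   # 1.0
print(bits_per_symbol_2)   # 0.645, closer to the ≈0.469-bit entropy
```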
