3.3 Huffman Coding
18EC502 Digital Communication, Module 3
Huffman Coding
• Observations:
– Frequent symbols have short codes.
– In an optimum prefix-free code, the two code words that occur least frequently have the same length.
Entropy of Extended Source
• The entropy of the n-th order extended source is n times the entropy of the original source: H(S^n) = n H(S).
• Problem:
• Consider a discrete memoryless source whose source alphabet and symbol probabilities are given, and compute the entropy of its extended source.
• (The worked solution appeared as slide graphics.) Hence, H(S^n) = n H(S).
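The alphabet and probabilities of the worked problem were lost with the slide graphics. As an illustration only (my numbers, not necessarily the slides'), take a source with alphabet {s0, s1, s2} and probabilities {1/4, 1/4, 1/2}:

H(S) = (1/4) log2 4 + (1/4) log2 4 + (1/2) log2 2 = 3/2 bits

The second-order extended source S^2 has nine symbols si sj with probabilities p(si) p(sj): four pairs of probability 1/16, four of probability 1/8, and one of probability 1/4. Then

H(S^2) = 4(1/16) log2 16 + 4(1/8) log2 8 + (1/4) log2 4 = 1 + 3/2 + 1/2 = 3 bits = 2 H(S)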
Source Coding Theorem
• Source encoding is the process of efficiently representing the data generated by a discrete source.
• The device that performs this representation is called a source encoder.
• An example is the Morse code.
• An efficient source encoder satisfies the following two requirements:
• The code words produced by the encoder are in binary form.
• The source code is uniquely decodable.
Source Coding Theorem
• Given a discrete memoryless source of entropy H(S), the average code word length L for any distortionless source encoding scheme is bounded as L ≥ H(S).
• The coding efficiency of the encoder is η = H(S)/L, so η ≤ 1.
Data Compaction
• For efficient signal transmission, redundant information is removed with no loss of information. This is called data compaction, or lossless data compression.
• Prefix Coding
• A source code representing the output of the source has to be uniquely decodable.
• A special class of codes, known as prefix codes, which satisfy a restriction known as the prefix condition, is used.
• A prefix code is a code in which no code word is the prefix of any other code word (a mechanical check is sketched below).
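The prefix condition is easy to check mechanically. Below is a minimal Python sketch (my own, not from the slides; the function name and the example code word sets are illustrative, the second set deliberately violating the condition):

def is_prefix_free(codewords):
    # Sort lexicographically: if any word is a prefix of another, it is
    # also a prefix of its immediate successor in sorted order, so
    # checking adjacent pairs suffices.
    words = sorted(codewords)
    return not any(words[i + 1].startswith(words[i])
                   for i in range(len(words) - 1))

print(is_prefix_free(["0", "10", "110", "111"]))  # True: a prefix code
print(is_prefix_free(["0", "1", "00", "11"]))     # False: "0" prefixes "00"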
Prefix Code
• Of the candidate codes tabulated on the slide, Codes I and III are not prefix codes, but Code II is.
Prefix Code
• Decision tree for the code in the above table (figure).
• The encoded sequence 1011111000 is decoded as the source sequence s1 s3 s2 s0 s0.
• A prefix code has the important property that it is uniquely decodable.
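The code word table was lost with the slide graphic, but the decoding quoted above pins it down: assuming Code II assigns s0 = 0, s1 = 10, s2 = 110, s3 = 111, the unique parse 10|111|110|0|0 indeed gives s1 s3 s2 s0 s0. A minimal Python sketch of prefix decoding (names are my own):

def decode(bits, codebook):
    # The prefix condition guarantees a unique parse: grow a buffer one
    # bit at a time and emit a symbol as soon as the buffer matches a
    # code word (no other code word can start with it).
    inverse = {word: sym for sym, word in codebook.items()}
    symbols, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:
            symbols.append(inverse[buf])
            buf = ""
    assert buf == "", "bit stream ended in the middle of a code word"
    return symbols

code_ii = {"s0": "0", "s1": "10", "s2": "110", "s3": "111"}
print(decode("1011111000", code_ii))  # ['s1', 's3', 's2', 's0', 's0']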
Huffman Coding
• The idea is to assign to each symbol of an alphabet a sequence of bits roughly equal in length to the amount of information conveyed by the symbol in question.
• The average code word length approaches the fundamental limit set by the entropy.
• The Huffman code replaces the prescribed set of source statistics with a simpler one.
Huffman Encoding Algorithm
• The source symbols are listed in order of decreasing probability, and the two source symbols of lowest probability are assigned the bits 0 and 1 (the splitting stage).
• These two source symbols are combined into a new source symbol with probability equal to the sum of the two original probabilities.
• The new symbol is placed in the list in accordance with its probability.
• The procedure is repeated until we are left with a final list of only two source statistics, to which 0 and 1 are assigned.
Huffman Encoding Algorithm
• Huffman coding is a bottom-up approach (a Python sketch follows this list):
– Initialization: put all symbols on a list sorted according to their frequency counts.
• (These counts might not be available in advance!)
– Repeat until the list has only one symbol left:
(1) From the list, pick the two symbols with the lowest frequency counts. Form a Huffman subtree that has these two symbols as child nodes, and create a parent node.
(2) Assign the sum of the children's frequency counts to the parent and insert it into the list such that the order is maintained.
(3) Delete the children from the list.
– Assign a code word to each leaf based on the path from the root.
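A minimal Python sketch of this bottom-up procedure (my illustration, not the slides' code; a heap stands in for the sorted list, and ties may be broken differently from the slides' tree, although the average code word length is unaffected):

import heapq

def huffman_code(freqs):
    # Heap entries are (frequency, tie_breaker, node); the unique integer
    # tie breaker keeps the heap from ever comparing node objects.
    # A node is either a symbol (leaf) or a (left, right) tuple.
    heap = [(f, i, sym) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # two lowest-probability nodes
        f2, _, right = heapq.heappop(heap)
        # The parent gets the sum of the children's probabilities (step 2).
        heapq.heappush(heap, (f1 + f2, next_id, (left, right)))
        next_id += 1
    codes = {}
    def walk(node, path):                   # code word = path from the root
        if isinstance(node, tuple):
            walk(node[0], path + "0")
            walk(node[1], path + "1")
        else:
            codes[node] = path or "0"       # single-symbol edge case
    walk(heap[0][2], "")
    return codes

probs = {"s0": 0.4, "s1": 0.2, "s2": 0.2, "s3": 0.1, "s4": 0.1}
codes = huffman_code(probs)
avg = sum(p * len(codes[s]) for s, p in probs.items())  # 2.2 bits/symbol here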
Huffman Tree
• The five symbols of the alphabet of a discrete memoryless source and their probabilities are given (the probability table and the step-by-step tree construction appeared as slide graphics).
• The entropy of the discrete memoryless source is H(S) = 2.12193 bits/symbol, and the average code word length of the Huffman code is L = 2.2 bits/symbol.
• The coding efficiency is therefore η = H(S)/L = 2.12193/2.2 = 96.45%.
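The probability table is lost here, but the quoted figures match the classic five-symbol example. Assuming probabilities {0.4, 0.2, 0.2, 0.1, 0.1} with code words {00, 10, 11, 010, 011}:

L = 0.4(2) + 0.2(2) + 0.2(2) + 0.1(3) + 0.1(3) = 2.2 bits/symbol
H(S) = 0.4 log2(1/0.4) + 2 × 0.2 log2(1/0.2) + 2 × 0.1 log2(1/0.1) ≈ 2.12193 bits/symbol
η = 2.12193/2.2 ≈ 96.45%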
Example 2
Symbol   Probability   Code word
X1       0.3           11
X2       0.25          01
X3       0.20          00
X4       0.12          100
X5       0.08          1011
X6       0.05          1010
• The average code word length is
L = 2 × 0.3 + 2 × 0.25 + 2 × 0.20 + 3 × 0.12 + 4 × 0.08 + 4 × 0.05 = 2.38 bits/symbol
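For comparison (computed from the same table; the slides stop at the average length), the source entropy is

H(S) = 0.3 log2(1/0.3) + 0.25 log2(1/0.25) + 0.20 log2(1/0.20) + 0.12 log2(1/0.12) + 0.08 log2(1/0.08) + 0.05 log2(1/0.05) ≈ 2.360 bits/symbol

so the coding efficiency is η = H(S)/L ≈ 2.360/2.38 ≈ 99.2%.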
• The variance of the code word lengths about the average L, taken over the ensemble of source symbols, is
σ² = Σk pk (lk − L)²
where pk is the probability of the k-th symbol and lk is the length of its code word.
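A quick numerical check of σ² for Example 2 (a minimal Python sketch; probabilities and code word lengths are taken from the table above):

probs   = [0.3, 0.25, 0.20, 0.12, 0.08, 0.05]  # X1..X6
lengths = [2, 2, 2, 3, 4, 4]                   # code word lengths
L = sum(p * l for p, l in zip(probs, lengths))               # 2.38
var = sum(p * (l - L) ** 2 for p, l in zip(probs, lengths))  # ≈ 0.4957
print(L, var)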
Assessment Components
1. Online Quiz
2. Assignment
THANK YOU !!