Huffman
Huffman coding is a lossless data compression technique: it reduces the size of data without losing any of its information. It was developed by David A. Huffman.
Huffman coding assigns a variable-length code to each character in the file.
It is most useful for data in which some characters occur much more frequently than others: frequent characters receive short codes and rare characters receive longer ones.
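Huffman codes always follow the prefix rule: no codeword is a prefix of any other codeword, so an encoded message can be decoded in only one way.
Example (a code that breaks the prefix rule):
Symbol  Code
A       0
B       1
C       01
If the sender transmits the message 001, the receiver cannot decode it unambiguously, because the code for A (0) is a prefix of the code for C (01):
001 read as 0 | 0 | 1 gives AAB
001 read as 0 | 01 gives AC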
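A short Python sketch makes the ambiguity concrete. The helper names `is_prefix_free` and `all_parses` are illustrative, not taken from the slides: the first checks the prefix rule, and the second lists every way a bit string can be split into codewords of a given table.

```python
def is_prefix_free(codes):
    """Return True if no codeword is a prefix of another codeword."""
    words = list(codes.values())
    for i, u in enumerate(words):
        for j, v in enumerate(words):
            if i != j and v.startswith(u):
                return False
    return True


def all_parses(bits, codes):
    """List every way to split `bits` into codewords from `codes`."""
    if not bits:
        return [[]]  # one way to parse the empty string: no symbols
    results = []
    for symbol, word in codes.items():
        if bits.startswith(word):
            # Try this codeword first, then parse whatever bits remain.
            for rest in all_parses(bits[len(word):], codes):
                results.append([symbol] + rest)
    return results


codes = {"A": "0", "B": "1", "C": "01"}
print(is_prefix_free(codes))       # False: "0" is a prefix of "01"
print(all_parses("001", codes))    # [['A', 'A', 'B'], ['A', 'C']]
```

For a prefix-free table, `all_parses` can return at most one split for any bit string, which is exactly why the prefix rule removes the ambiguity.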
Example:
The relative frequencies of the six characters sum to .45 + .13 + .12 + .16 + .09 + .05 = 1.00.
A fixed-length code requires 3 bits per character (2 bits give only 4 codewords, too few for 6 characters), while the variable-length code requires 2.24 bits per character on average.
=> Memory saving: (3 - 2.24)/3 ≈ 25%
Thus, Huffman's encoding of the text uses about 25% less memory than its fixed-length encoding.
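The 2.24-bit figure can be checked with a small sketch. It relies on a standard property of Huffman's algorithm: every merge pushes all symbols beneath the new node one level deeper, so the total weighted code length equals the sum of all merged weights. The function name `huffman_cost` is an illustrative choice, and the frequencies are scaled by 100 so the arithmetic stays exact.

```python
import heapq
import math


def huffman_cost(weights):
    """Total weighted code length of a Huffman code for `weights`."""
    heap = list(weights)
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        a = heapq.heappop(heap)      # smallest weight
        b = heapq.heappop(heap)      # second smallest weight
        total += a + b               # each symbol under a and b gains one bit
        heapq.heappush(heap, a + b)
    return total


# Frequencies from the example, scaled by 100 (.45 -> 45, and so on).
freqs = [45, 13, 12, 16, 9, 5]
n = sum(freqs)                                   # 100
fixed_bits = math.ceil(math.log2(len(freqs)))    # 3 bits for 6 symbols
avg_bits = huffman_cost(freqs) / n               # 224 / 100 = 2.24
saving = (fixed_bits - avg_bits) / fixed_bits    # about 0.253
print(fixed_bits, avg_bits, round(saving * 100, 1))
```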
Solution:
Step 1: Sort the characters in ascending order of frequency.
Step 2: Combine the two minimum-frequency nodes into a new node whose frequency is their sum, and re-arrange the nodes in ascending order.
Step 3: Combine the two minimum-frequency nodes and re-arrange in ascending order.
Step 4: Combine the two minimum-frequency nodes and re-arrange in ascending order.
Step 5: Combine the two minimum-frequency nodes and re-arrange in ascending order.
Codewords:
A: 11
B: 100
C: 00
D: 01
_: 101
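Steps 1 to 5 can be sketched in Python with a min-heap standing in for the sorted list. This is an illustrative implementation, not the slides' own: `huffman_codes` is a hypothetical name, symbols are assumed to be strings, and ties are broken by insertion order. The slides' frequencies for A, B, C, D and _ are not reproduced in this text, so the usage example runs on the six frequencies from the previous example, with the hypothetical symbol names a to f.

```python
import heapq
from itertools import count


def huffman_codes(freqs):
    """Build a Huffman code for a {symbol: frequency} mapping.

    Repeatedly merges the two minimum-frequency nodes, then reads the
    codewords off the tree (0 = left child, 1 = right child).
    Assumes symbols are strings; internal nodes are (left, right) tuples.
    """
    tiebreak = count()  # equal frequencies are ordered by creation time
    heap = [(f, next(tiebreak), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # a lone symbol still needs a one-bit codeword
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # minimum frequency
        f2, _, right = heapq.heappop(heap)   # next minimum frequency
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):          # internal node
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                                # leaf: a symbol
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes


freqs = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
codes = huffman_codes(freqs)
print(codes)
```

A different tie-breaking rule or 0/1 convention can change individual codewords, but every Huffman code for the same frequencies has the same total cost (here 2.24 bits per character on average).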
Encoding:
DAD is encoded as 01 11 01 = 011101.
Decoding:
10011011011101 = 100 11 01 101 11 01 is decoded as BAD_AD.
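Encoding and decoding with the codeword table above can be sketched as follows. `CODES`, `encode` and `decode` are illustrative names. Decoding reads bits one at a time and emits a symbol as soon as the buffered bits match a codeword; this greedy approach is safe only because the code is prefix-free.

```python
# Codeword table from the solution above ("_" is the fifth symbol).
CODES = {"A": "11", "B": "100", "C": "00", "D": "01", "_": "101"}


def encode(text, codes=CODES):
    """Concatenate the codeword of every symbol in `text`."""
    return "".join(codes[ch] for ch in text)


def decode(bits, codes=CODES):
    """Greedy left-to-right decoding of a prefix-free code."""
    reverse = {word: sym for sym, word in codes.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in reverse:        # a complete codeword has been read
            out.append(reverse[buf])
            buf = ""
    if buf:
        raise ValueError(f"trailing bits do not form a codeword: {buf!r}")
    return "".join(out)


print(encode("DAD"))              # 011101
print(decode("10011011011101"))   # BAD_AD
```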
With the occurrence frequencies given and the codeword lengths obtained:
= 1.87
Character Count
M: 1
A: 4
H: 1
R: 2
S: 1
T: 1
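For the character counts above, a sketch can derive each codeword length and compare the total against a fixed-length code. `code_lengths` is a hypothetical helper that tracks only depths, since the total bit count does not depend on which child gets 0 or 1. Ties may be broken differently from a hand-drawn tree, but the total of 24 bits (versus 30 bits for a 3-bit fixed-length code over the 10 characters) is the same for every Huffman tree built from these counts.

```python
import heapq
import math
from itertools import count


def code_lengths(freqs):
    """Huffman codeword length per symbol, without building the tree.

    Each heap entry holds a group of symbols and their current depths;
    every merge adds one bit to all symbols in the two merged groups.
    """
    tiebreak = count()
    heap = [(f, next(tiebreak), {sym: 0}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, g1 = heapq.heappop(heap)
        f2, _, g2 = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**g1, **g2}.items()}
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


counts = {"M": 1, "A": 4, "H": 1, "R": 2, "S": 1, "T": 1}
lengths = code_lengths(counts)
huffman_bits = sum(counts[s] * lengths[s] for s in counts)
fixed_bits = sum(counts.values()) * math.ceil(math.log2(len(counts)))
print(lengths)
print(huffman_bits, fixed_bits)   # 24 30
```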
Step 3: Optimal Code
Example 3:
Example 4:
A file contains the following characters with the frequencies shown. If Huffman coding is used for data compression, determine:
AKTU Questions
2. Draw the Huffman tree for the following symbols, whose frequencies of occurrence in a message are stated alongside the symbols below:
M1: 0.45 M2: 0.02 M3: 0.24 M4: 0.18 M5: 0.11
Then decode the following message:
10110011011111001100101111101101100
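The question does not fix which child of a merged node receives 0, so the sketch below assumes that the lighter of the two merged nodes becomes the left (0) child. `build_tree` and `decode` are hypothetical names, and the probabilities are scaled by 100 to avoid floating-point comparisons. Decoding here walks the tree bit by bit: 0 goes left, 1 goes right, and reaching a leaf emits a symbol and returns to the root.

```python
import heapq
from itertools import count


def build_tree(freqs):
    """Huffman tree as nested (left, right) tuples; leaves are symbol strings.

    Assumed convention: the lighter merged node is the left (0) child.
    Assumes at least two symbols.
    """
    tiebreak = count()
    heap = [(f, next(tiebreak), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, lighter = heapq.heappop(heap)
        f2, _, heavier = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (lighter, heavier)))
    return heap[0][2]


def decode(bits, tree):
    """Walk from the root for each bit; emit a symbol at every leaf."""
    out, node = [], tree
    for bit in bits:
        node = node[int(bit)]         # 0 -> left child, 1 -> right child
        if isinstance(node, str):     # reached a leaf
            out.append(node)
            node = tree
    if node is not tree:
        raise ValueError("message ends in the middle of a codeword")
    return out


# Probabilities from the question, scaled by 100.
freqs = {"M1": 45, "M2": 2, "M3": 24, "M4": 18, "M5": 11}
tree = build_tree(freqs)
print(decode("10110011011111001100101111101101100", tree))
```

Under this convention the codewords are M1 = 0, M3 = 10, M4 = 111, M2 = 1100 and M5 = 1101, and the message decodes to M3 M2 M5 M4 M3 M1 M2 M3 M4 M5 M3 M2. A tree drawn with the opposite 0/1 convention gives different codewords and therefore a different decoding.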
3. Differentiate between fixed-length and variable-length encoding. Draw a Huffman tree for the following symbols, whose frequencies of occurrence in a message are stated alongside the symbols below:
A: 15, B: 6, C: 7, D: 12, E: 25, F: 4, G: 6, H: 1, I: 15
Decode the message 1110100010111011.
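For the first part of question 3, a sketch can contrast the two schemes on the given frequencies. The fixed-length codeword assignment (symbols numbered in the listed order, 4 bits each) is a hypothetical choice; a fixed-length message is decoded simply by cutting it into equal chunks, with no prefix rule needed. The Huffman total uses the fact that the total weighted code length equals the sum of all merged weights, so it does not depend on tie-breaking or on the 0/1 convention. `fixed_code`, `fixed_decode` and `huffman_total_bits` are illustrative names.

```python
import heapq
import math

freqs = {"A": 15, "B": 6, "C": 7, "D": 12, "E": 25,
         "F": 4, "G": 6, "H": 1, "I": 15}

# Fixed-length: every symbol gets the same number of bits.
width = math.ceil(math.log2(len(freqs)))          # 9 symbols -> 4 bits
fixed_code = {s: format(i, f"0{width}b") for i, s in enumerate(freqs)}


def fixed_decode(bits):
    """Cut the message into width-bit chunks and look each one up."""
    reverse = {w: s for s, w in fixed_code.items()}
    return [reverse[bits[i:i + width]] for i in range(0, len(bits), width)]


def huffman_total_bits(weights):
    """Variable-length (Huffman) total: sum of all merged weights."""
    heap = list(weights)
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        heapq.heappush(heap, merged)
    return total


n = sum(freqs.values())                            # 91 symbols in total
print(fixed_decode(fixed_code["B"] + fixed_code["I"]))   # ['B', 'I']
print("fixed:", n * width, "bits")                 # 364
print("Huffman:", huffman_total_bits(freqs.values()), "bits")  # 262
```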
When you are on the left, you are on the right. When you are on the right, you are on the wrong.