12 - Huffman Coding Algorithm
What is Encoding?
Fixed-Length encoding –
Every character is assigned a binary code with the same number of
bits.
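As a small illustration of fixed-length encoding, the sketch below assigns equal-width codes to a four-character alphabet. The function name `fixed_length_codes` and the choice to number the characters in sorted order are assumptions made for this example only.

```python
def fixed_length_codes(alphabet):
    """Assign every distinct character a binary code of the same width."""
    symbols = sorted(set(alphabet))
    # Smallest width that can distinguish len(symbols) values (at least 1 bit).
    width = max(1, (len(symbols) - 1).bit_length())
    return {s: format(i, f"0{width}b") for i, s in enumerate(symbols)}


codes = fixed_length_codes("abcd")
print(codes)  # {'a': '00', 'b': '01', 'c': '10', 'd': '11'}
```

With four characters every code is 2 bits wide, no matter how often each character occurs.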
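Variable-Length encoding –
Characters are assigned binary codes of different lengths, with shorter
codes going to more frequent characters.
Since ‘a’ occurs more frequently than ‘b’, ‘c’ and ‘d’, it is assigned the
fewest bits, followed by ‘d’, ‘b’ and ‘c’.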
This technique is the basis for many lossless data compression and
encoding schemes.
Step 1- Create a leaf node for each character and build a min heap from all
the nodes (the frequency value is used to compare two nodes in the min heap)
Step 2- Repeat Steps 3 to 5 while the heap has more than one node
Step 3- Extract the two nodes with minimum frequency, say x and y, from the
heap
Step 4- Create a new internal node z with x as its left child and y as its right
child. frequency(z) = frequency(x) + frequency(y)
Step 5- Add z to the min heap
Step 6- The single node left in the heap is the root of the Huffman tree
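The steps above can be sketched in Python with the standard `heapq` module. This is a minimal sketch, not a reference implementation: the function name `huffman_codes`, the tuple layout for nodes, the tie-breaking counter, and the convention "left edge = 0, right edge = 1" are all assumptions made for illustration.

```python
import heapq
from itertools import count


def huffman_codes(freq):
    """Return {symbol: bit string} for a {symbol: frequency} mapping."""
    tie = count()  # tie-breaker so the heap never compares nodes directly
    # Step 1: one leaf (symbol, left, right) per character, in a min heap
    # keyed by frequency.
    heap = [(f, next(tie), (sym, None, None)) for sym, f in freq.items()]
    heapq.heapify(heap)
    if not heap:
        return {}
    if len(heap) == 1:  # assumed edge case: a lone symbol gets a 1-bit code
        return {heap[0][2][0]: "0"}
    # Step 2: repeat while the heap has more than one node.
    while len(heap) > 1:
        fx, _, x = heapq.heappop(heap)  # Step 3: two minimum-frequency nodes
        fy, _, y = heapq.heappop(heap)
        z = (None, x, y)                # Step 4: x is left child, y is right
        # Push z back with frequency(x) + frequency(y) so it can be merged again.
        heapq.heappush(heap, (fx + fy, next(tie), z))
    root = heap[0][2]

    codes = {}

    def walk(node, prefix):
        sym, left, right = node
        if left is None:                # leaf: record the path taken so far
            codes[sym] = prefix
        else:
            walk(left, prefix + "0")    # assumed convention: left edge = 0
            walk(right, prefix + "1")   # assumed convention: right edge = 1

    walk(root, "")
    return codes


# Frequencies taken from the six-character file example in this section.
codes = huffman_codes({"c": 34, "d": 9, "g": 35, "u": 2, "m": 2, "a": 100})
for sym in sorted(codes, key=lambda s: len(codes[s])):
    print(sym, codes[sym])
```

For these frequencies the code lengths come out as 1, 2, 3, 4, 5 and 5 bits for a, g, c, d, u and m. The exact 0/1 patterns depend on how ties are broken and on which child is labelled 0.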
A file contains 6 unique characters. How many bits are required to store it
using Huffman encoding?
The frequency and Huffman code of each character are given (character=frequency - code):
c=34 - 011
d=9 - 0101
g=35 - 00
u=2 - 01000
m=2 - 01001
a=100 - 1
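The answer follows from multiplying each frequency by the length of its code and summing. The short check below copies the values from the list above; it assumes the frequencies are character counts and ignores the cost of storing the code table itself. The variable names are illustrative.

```python
# (frequency, Huffman code) pairs copied from the list above.
table = {
    "c": (34, "011"),
    "d": (9, "0101"),
    "g": (35, "00"),
    "u": (2, "01000"),
    "m": (2, "01001"),
    "a": (100, "1"),
}

# Huffman size: sum of frequency * code length over all characters.
total_bits = sum(f * len(code) for f, code in table.values())

# Fixed-length comparison: 6 distinct characters need 3 bits each.
total_chars = sum(f for f, _ in table.values())
fixed_bits = total_chars * 3

print(total_bits)  # 328
print(fixed_bits)  # 546
```

So the file takes 100·1 + 35·2 + 34·3 + 9·4 + 2·5 + 2·5 = 328 bits with these codes, compared with 546 bits for a 3-bit fixed-length code over the same 182 characters.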