Huffman Coding
Fixed-length and variable-length encoding are two types of encoding schemes, explained as follows-
Fixed-Length encoding - Every character is assigned a binary code that uses the same number of
bits. Thus, a string like “aabacdad” requires 64 bits (8 bytes) for storage or transmission,
assuming that each character uses 8 bits.
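A minimal Python sketch of the fixed-length scheme, assuming each character is stored as its 8-bit code point (ASCII/Latin-1). The function name `fixed_length_encode` is illustrative, not from the original.

```python
def fixed_length_encode(text: str, bits_per_char: int = 8) -> str:
    """Encode every character with the same number of bits (its code point)."""
    encoded = []
    for ch in text:
        code_point = ord(ch)
        # Assumption: every character fits in bits_per_char bits.
        if code_point >= 2 ** bits_per_char:
            raise ValueError(f"{ch!r} does not fit in {bits_per_char} bits")
        encoded.append(format(code_point, f"0{bits_per_char}b"))
    return "".join(encoded)

bits = fixed_length_encode("aabacdad")
print(len(bits))  # 64
print(bits[:8])   # 01100001 (the 8-bit code for "a")
```

Eight characters at eight bits each give the 64 bits stated above, regardless of how often each character occurs.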
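Variable-Length encoding - Characters are assigned codes with different numbers of bits, for example:
Character  Code
a          0
b          011
c          111
d          11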
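Thus, the string “aabacdad” gets encoded to 00011011111011 (0 | 0 | 011 | 0 | 111 | 11 | 0 | 11),
which uses only 14 bits compared to the 64 bits of the fixed-length encoding scheme.
Note, however, that this particular code is not prefix-free: a = 0 is a prefix of b = 011, and
d = 11 is a prefix of c = 111, so the bit string cannot be decoded unambiguously. Huffman coding
avoids this problem because no Huffman code is a prefix of another.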
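The encoding of “aabacdad” with the variable-length table can be checked with a short Python sketch. The helper names `encode` and `is_prefix_free` are illustrative; the prefix check simply tests every ordered pair of codes.

```python
from itertools import permutations

# Variable-length code table from the text above.
CODE_TABLE = {"a": "0", "b": "011", "c": "111", "d": "11"}

def encode(text: str, table: dict[str, str]) -> str:
    """Concatenate the code of each character."""
    return "".join(table[ch] for ch in text)

def is_prefix_free(table: dict[str, str]) -> bool:
    """True if no code is a prefix of another code (unambiguous decoding)."""
    return not any(b.startswith(a) for a, b in permutations(table.values(), 2))

bits = encode("aabacdad", CODE_TABLE)
print(bits, len(bits))             # 00011011111011 14
print(is_prefix_free(CODE_TABLE))  # False: "0" prefixes "011", "11" prefixes "111"
```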
Huffman coding has two main parts: first, building a Huffman tree from the character frequencies,
and second, traversing the tree to find the code for each character.
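The two parts can be sketched in Python with the standard `heapq` module. This is a minimal sketch, not a reference implementation: the names `build_huffman_tree` and `assign_codes`, the tuple-based tree, and the left = 0 / right = 1 convention are illustrative choices. Ties between equal frequencies are broken by insertion order, so other implementations may produce different but equally short codes.

```python
import heapq
from collections import Counter
from itertools import count

def build_huffman_tree(freqs):
    """Part 1: repeatedly merge the two lowest-frequency nodes.

    Leaves are characters (str); internal nodes are (left, right) tuples.
    Assumes freqs contains at least one character.
    """
    tiebreak = count()  # unique counter so the heap never compares nodes directly
    heap = [(f, next(tiebreak), ch) for ch, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # lowest frequency
        f2, _, right = heapq.heappop(heap)  # second lowest
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    return heap[0][2]

def assign_codes(node, prefix="", codes=None):
    """Part 2: traverse the tree, appending 0 on a left edge and 1 on a right edge."""
    if codes is None:
        codes = {}
    if isinstance(node, str):        # leaf
        codes[node] = prefix or "0"  # a one-character alphabet still needs 1 bit
    else:
        left, right = node
        assign_codes(left, prefix + "0", codes)
        assign_codes(right, prefix + "1", codes)
    return codes

codes = assign_codes(build_huffman_tree(Counter("aabacdad")))
print(codes)  # {'a': '0', 'd': '10', 'b': '110', 'c': '111'}
```

Unlike the variable-length table earlier, these codes are prefix-free, and “aabacdad” still encodes to 14 bits (4×1 + 2×2 + 1×3 + 1×3).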
For example, consider the string “YYYZXXYYX”: Y appears 5 times, X appears 3 times, and Z, with
the lowest frequency, appears once. So the code for Y is no longer than the code for X, and the
code for X is no longer than the code for Z. In this case Huffman coding gives Y a 1-bit code,
while X and Z both get 2-bit codes.
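To check the example for “YYYZXXYYX”, the sketch below computes only the code lengths. The helper `huffman_code_lengths` is hypothetical: instead of building an explicit tree, each heap entry carries the characters in its subtree, and every merge adds one bit to each of their codes.

```python
import heapq
from collections import Counter

def huffman_code_lengths(text):
    """Return (frequencies, code lengths) for the characters of text."""
    freqs = Counter(text)
    lengths = {ch: 0 for ch in freqs}
    # Heap entries: (subtree frequency, unique tie-breaker, characters in subtree)
    heap = [(f, i, [ch]) for i, (ch, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        f1, _, group1 = heapq.heappop(heap)
        f2, _, group2 = heapq.heappop(heap)
        for ch in group1 + group2:
            lengths[ch] += 1  # these leaves move one level deeper in the tree
        heapq.heappush(heap, (f1 + f2, next_id, group1 + group2))
        next_id += 1
    return freqs, lengths

freqs, lengths = huffman_code_lengths("YYYZXXYYX")
print(dict(freqs))  # {'Y': 5, 'Z': 1, 'X': 3}
print(lengths)      # {'Y': 1, 'Z': 2, 'X': 2}
```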
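The time complexity of assigning a code to each character according to its frequency is
O(n log n), where n is the number of distinct characters, when a min-heap (priority queue) is used
to select the two lowest-frequency nodes at each step.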
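A short worked bound for the O(n log n) claim, assuming the priority queue is a binary min-heap over n distinct characters: building the heap is O(n), each of the n - 1 merges performs two pops and one push at O(log n) each, and the final traversal visits the 2n - 1 nodes of the tree.

```latex
T(n) \;=\; \underbrace{O(n)}_{\text{build heap}}
\;+\; \sum_{k=1}^{n-1} \underbrace{3 \cdot O(\log n)}_{\text{2 pops + 1 push}}
\;+\; \underbrace{O(n)}_{\text{traverse tree}}
\;=\; O(n \log n)
```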
Characteristics of Huffman Coding
Problem
A file contains the following characters with the frequencies shown. If Huffman coding is
used for data compression, find the Huffman code for each character.
Character  Frequency
a          10
e          15
i          12
o          3
u          4
s          13
t          1
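Step-01: Arrange the characters in increasing order of frequency: t (1), o (3), u (4), a (10), i (12), s (13), e (15).
Step-02: Merge the two nodes with the lowest frequencies, t (1) and o (3), into a new node of weight 4.
Step-03: Merge u (4) and the (t, o) node of weight 4 into a node of weight 8.
Step-04: Merge the node of weight 8 with a (10) into a node of weight 18.
Step-05: Merge i (12) and s (13) into a node of weight 25.
Step-06: Merge e (15) and the node of weight 18 into a node of weight 33.
Step-07: Merge the nodes of weight 25 and 33 into the root node of weight 58, the sum of all frequencies.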
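The sequence of merges for this frequency table can be reproduced with a few lines of Python. This sketch tracks only node weights, which is enough to see which totals are created at each step; the function name `merge_steps` is illustrative.

```python
import heapq

# Frequencies from the problem table.
FREQS = {"a": 10, "e": 15, "i": 12, "o": 3, "u": 4, "s": 13, "t": 1}

def merge_steps(freqs):
    """Return (w1, w2, w1 + w2) for each merge of the two lightest nodes."""
    heap = list(freqs.values())
    heapq.heapify(heap)
    steps = []
    while len(heap) > 1:
        w1 = heapq.heappop(heap)
        w2 = heapq.heappop(heap)
        steps.append((w1, w2, w1 + w2))
        heapq.heappush(heap, w1 + w2)
    return steps

for w1, w2, total in merge_steps(FREQS):
    print(f"{w1} + {w2} = {total}")
```

The loop prints six merges (1 + 3 = 4, 4 + 4 = 8, 8 + 10 = 18, 12 + 13 = 25, 15 + 18 = 33, 25 + 33 = 58): seven characters always need n - 1 = 6 merges.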
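Assigning 0 to every left edge and 1 to every right edge of the Huffman tree, and reading the edge labels along the path from the root to each leaf, the Huffman Code for each character is-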
a = 111
e = 10
i = 00
o = 11001
u = 1101
s = 01
t = 11000
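The listed codes can be sanity-checked with a small sketch: it confirms the code is prefix-free, computes the total and average number of bits for the file, and round-trips an arbitrary test word. The helper names and the word "suite" are illustrative choices, not part of the original problem.

```python
from itertools import permutations

# Frequencies and codes from the problem above.
FREQS = {"a": 10, "e": 15, "i": 12, "o": 3, "u": 4, "s": 13, "t": 1}
CODES = {"a": "111", "e": "10", "i": "00", "o": "11001",
         "u": "1101", "s": "01", "t": "11000"}

def is_prefix_free(codes):
    """True if no code is a prefix of another code."""
    return not any(b.startswith(a) for a, b in permutations(codes.values(), 2))

def encode(text, codes):
    return "".join(codes[ch] for ch in text)

def decode(bits, codes):
    """Read bits left to right; with a prefix-free code the first match is the only one."""
    inverse = {code: ch for ch, code in codes.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    if buf:
        raise ValueError("leftover bits do not form a complete code")
    return "".join(out)

total_bits = sum(FREQS[ch] * len(CODES[ch]) for ch in FREQS)
avg_bits = total_bits / sum(FREQS.values())
print(is_prefix_free(CODES))                  # True
print(total_bits, round(avg_bits, 2))         # 146 2.52
print(decode(encode("suite", CODES), CODES))  # suite
```

With 58 characters in total, the file needs 146 bits, or about 2.52 bits per character.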