Department of Artificial Intelligence & Data Science K. K. Wagh Institute of Engineering Education and Research
Assignment No.: 3
Assignment Title: Develop a program to implement Huffman Encoding using a greedy strategy.
Objectives:
1. To construct Huffman’s coding tree using the greedy method.
2. To understand how greedy algorithms work.
3. To analyze time complexity of the algorithm.
THEORY:
Greedy Method: Among all the algorithmic approaches, the greedy method is the simplest and most
straightforward. In this approach, each decision is taken on the basis of the currently available
information, without worrying about the effect of that decision in the future. Greedy algorithms
build a solution part by part, choosing the next part in such a way that it gives an immediate benefit.
This approach never reconsiders the choices taken previously. It is mainly used to solve
optimization problems. The greedy method is easy to implement and quite efficient in most cases.
Hence, we can say that a greedy algorithm follows an algorithmic paradigm that makes the
locally optimal choice at each step in the hope of finding a globally optimal solution.
Huffman’s Tree: A Huffman tree, or Huffman coding tree, is defined as a full binary tree in which each
leaf corresponds to a letter in the given alphabet. The Huffman tree is the binary
tree with the minimum external path weight, that is, the one with the minimum
sum of weighted path lengths for the given set of leaves, where the weighted path length of a leaf is
its frequency multiplied by its depth. So the goal is to construct a tree with the
minimum external path weight.
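To make the quantity being minimized concrete, here is a minimal sketch of the external path weight computation. The function name `external_path_weight` and the three weights are hypothetical and chosen only for illustration; they are not part of the assignment's example.

```python
def external_path_weight(leaves):
    """Sum of weight * depth over all leaves, given (weight, depth) pairs."""
    return sum(weight * depth for weight, depth in leaves)

# Hypothetical weights 1, 1 and 2 placed in two different full binary trees.
heavy_leaf_near_root = [(2, 1), (1, 2), (1, 2)]  # weight 2 sits at depth 1
light_leaf_near_root = [(1, 1), (1, 2), (2, 2)]  # weight 1 sits at depth 1

print(external_path_weight(heavy_leaf_near_root))  # 6
print(external_path_weight(light_leaf_near_root))  # 7
```

Placing the heaviest leaf closest to the root gives the smaller total, which is the arrangement a Huffman tree aims for.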
Huffman coding is a lossless data compression algorithm. The idea is to assign variable-length codes
to input characters, with the lengths of the assigned codes based on the frequencies of the corresponding
characters. The most frequent character gets the shortest code and the least frequent character gets the
longest code.
There are two methods of assigning codes to characters; Huffman coding uses the second:
1. Fixed Length Method
2. Variable Length Method
Example:
Character Frequency
a 4
b 7
c 3
d 2
e 4
1. For each distinct character and its frequency in the data stream, create a leaf node.
2. Keep these nodes in a min-heap priority queue Q, ordered by frequency, ascending.
3. Remove the two nodes with the lowest frequencies from the min-heap, and create a new internal
node whose frequency is the sum of these two frequencies.
4. Make the first extracted node its left child and the second extracted node its right child, so that
the left side always holds the smaller frequency. Add this new node to the min-heap.
5. Repeat steps 3 and 4 until there is only one node left on the heap. That node is the root, and the
tree is finished when just the root node remains.
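The steps above can be sketched in Python with the standard `heapq` module. This is a minimal sketch, not a prescribed implementation: the function name `build_huffman_tree`, the nested-tuple node layout, and the insertion-order tie-breaker are assumptions made for illustration, and the input is assumed to be non-empty.

```python
import heapq
from itertools import count

def build_huffman_tree(freqs):
    """Return the root of a Huffman tree built from {character: frequency}.

    A leaf is (frequency, character); an internal node is
    (frequency, left_subtree, right_subtree).
    """
    tie = count()  # unique tie-breaker so the heap never compares subtrees
    # Steps 1-2: one leaf per character, kept in a min-heap keyed by frequency.
    heap = [(f, next(tie), (f, ch)) for ch, f in freqs.items()]
    heapq.heapify(heap)
    # Step 5: repeat until only the root remains.
    while len(heap) > 1:
        # Step 3: extract the two lowest-frequency nodes.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # Step 4: first extracted node goes left, second goes right.
        merged = (f1 + f2, left, right)
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]

root = build_huffman_tree({"a": 4, "b": 7, "c": 3, "d": 2, "e": 4})
print(root[0])  # 20, the total frequency of all characters
```

Characters with equal frequency (here a and e) may end up in either order depending on how ties are broken, so the exact codes can differ between implementations while the total encoded length stays the same.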
Step 1: Build a min-heap that contains 5 nodes (the number of unique characters from the provided
stream of data), where each node represents the root of a tree with a single node.
Step 2: Extract the two minimum frequency nodes from the min-heap. Add a new internal
node with frequency 2 + 3 = 5, which is created by joining the two extracted nodes.
Now, there are 4 nodes in the min-heap, 3 of which are the roots of trees with a single element each,
and 1 of which is the root of a tree with two elements.
Step 3: Extract the two minimum frequency nodes from the heap in the same manner.
Add a new internal node formed by joining the two extracted nodes; its frequency is
4 + 4 = 8.
The min-heap now has three nodes: one is the root of a tree with a single element,
and two are the roots of trees with multiple nodes.
Step 4: Extract the two minimum frequency nodes. Add a new internal node
formed by joining the two extracted nodes; its frequency is 5 + 7 = 12.
When creating a Huffman tree, we must ensure that the smaller of the two extracted values is always
on the left side and the larger on the right side. The image below shows the tree that has
formed so far:
Step 5: Extract the remaining two minimum frequency nodes. Add a new internal
node formed by joining the two extracted nodes; its frequency is 8 + 12 = 20.
All of the distinct characters have now been added to the tree and only the root remains in the heap.
The Huffman tree created for the specified set of characters is shown in the above image.
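The sequence of merges in Steps 2 to 5 can be reproduced with a short trace over the frequencies alone. This is a sketch for checking the arithmetic; the function name `merge_trace` is hypothetical, and only frequencies (not characters) are tracked.

```python
import heapq

def merge_trace(freqs):
    """Return the (first, second, sum) triple of every merge, in order."""
    heap = list(freqs)
    heapq.heapify(heap)
    steps = []
    while len(heap) > 1:
        first = heapq.heappop(heap)    # smallest frequency, becomes the left child
        second = heapq.heappop(heap)   # next smallest, becomes the right child
        steps.append((first, second, first + second))
        heapq.heappush(heap, first + second)
    return steps

for first, second, total in merge_trace([4, 7, 3, 2, 4]):
    print(f"{first} + {second} = {total}")
# 2 + 3 = 5
# 4 + 4 = 8
# 5 + 7 = 12
# 8 + 12 = 20
```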
Now, for each non-leaf node, assign 0 to the left edge and 1 to the right edge to create the code for each
letter.
Rules to follow for determining edge weights:
If the left edges are given weight 0, the right edges must be given weight 1.
If the left edges are given weight 1, the right edges must be given weight 0.
Either of the two aforementioned conventions may be used.
However, the same convention must be followed when decoding as well.
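Both conventions can be illustrated with a small recursive traversal. This is a sketch under stated assumptions: the function name `assign_codes` and the nested-tuple tree are hypothetical, and the tree shape is transcribed from the merges in the example, with the tie between a and e resolved so that e is the left child.

```python
def assign_codes(node, left_bit="0", prefix=""):
    """Walk the tree and return {character: code}.

    A leaf is a one-character string; an internal node is a (left, right) tuple.
    left_bit selects the convention: "0" means left=0/right=1, "1" the reverse.
    """
    right_bit = "1" if left_bit == "0" else "0"
    if isinstance(node, str):
        # A tree consisting of a single leaf still needs a one-bit code.
        return {node: prefix or left_bit}
    left, right = node
    codes = assign_codes(left, left_bit, prefix + left_bit)
    codes.update(assign_codes(right, left_bit, prefix + right_bit))
    return codes

# Root 20 = (node 8, node 12); node 8 = (e, a); node 12 = (node 5, b); node 5 = (d, c).
tree = (("e", "a"), (("d", "c"), "b"))

print(assign_codes(tree))       # {'e': '00', 'a': '01', 'd': '100', 'c': '101', 'b': '11'}
print(assign_codes(tree, "1"))  # {'e': '11', 'a': '10', 'd': '011', 'c': '010', 'b': '00'}
```

The two conventions give bitwise-complemented codes of the same lengths, so the compressed size does not depend on which one is chosen.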
Following the weighting, the modified tree yields the following codes:
Character Frequency Code
a 4 01
b 7 11
c 3 101
d 2 100
e 4 00
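As a check on the table above, the following sketch encodes and decodes a short string with these codes and compares the total size against a fixed-length encoding. The helper names `encode` and `decode` and the sample string "bad" are hypothetical; the codes and frequencies are taken from the example.

```python
codes = {"a": "01", "b": "11", "c": "101", "d": "100", "e": "00"}
freqs = {"a": 4, "b": 7, "c": 3, "d": 2, "e": 4}

def encode(text, codes):
    """Concatenate the code of every character in text."""
    return "".join(codes[ch] for ch in text)

def decode(bits, codes):
    """Read bits left to right, emitting a character whenever a full code is seen.

    This works because no code is a prefix of another code.
    """
    inverse = {code: ch for ch, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)

bits = encode("bad", codes)
print(bits)                 # 1101100
print(decode(bits, codes))  # bad

# Total bits for the whole stream: frequency * code length, summed over characters.
huffman_bits = sum(freqs[ch] * len(codes[ch]) for ch in freqs)
# A fixed-length code needs ceil(log2(5)) = 3 bits for each of the 20 characters.
fixed_bits = sum(freqs.values()) * (len(freqs) - 1).bit_length()
print(huffman_bits, fixed_bits)  # 45 60
```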
Time complexity: O(n log n), where n is the number of unique characters. If there are n nodes,
extractMin() is called 2*(n - 1) times. Each extractMin() call takes O(log n) time as it calls
minHeapify(). So, the overall complexity is O(n log n).
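The 2*(n - 1) count can be checked empirically with a small counter. This is a sketch only: `count_heap_ops` is a hypothetical helper, and Python's `heapq.heappop` stands in for extractMin().

```python
import heapq

def count_heap_ops(freqs):
    """Return (number of pops, number of pushes) performed while building the tree."""
    heap = list(freqs)
    heapq.heapify(heap)
    pops = pushes = 0
    while len(heap) > 1:
        first = heapq.heappop(heap)
        second = heapq.heappop(heap)
        pops += 2
        heapq.heappush(heap, first + second)
        pushes += 1
    return pops, pushes

print(count_heap_ops([4, 7, 3, 2, 4]))  # (8, 4): 2*(5 - 1) pops and 5 - 1 pushes
```

Each of the n - 1 merges does two extractions and one insertion, each costing O(log n), which gives the O(n log n) bound.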
If the input array is already sorted by frequency, there exists a linear time algorithm.
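One well-known way to get linear time on sorted input is the two-queue technique: leaves wait in one queue, merged nodes are appended to a second queue, and the smallest node is always at the front of one of the two. The sketch below is an illustration of that technique, not necessarily the variant the source has in mind; the function name `huffman_cost_sorted` is hypothetical, and it returns only the total encoded length rather than the tree.

```python
from collections import deque

def huffman_cost_sorted(sorted_freqs):
    """Return the total weighted path length for frequencies given in ascending order."""
    leaves = deque(sorted_freqs)  # already sorted, so the front is the smallest leaf
    merged = deque()              # merge sums are produced in non-decreasing order

    def pop_smallest():
        # The overall minimum is at the front of one of the two queues.
        if not merged or (leaves and leaves[0] <= merged[0]):
            return leaves.popleft()
        return merged.popleft()

    total = 0
    while len(leaves) + len(merged) > 1:
        combined = pop_smallest() + pop_smallest()
        merged.append(combined)
        total += combined  # the sum of all merge sums equals the weighted path length
    return total

print(huffman_cost_sorted([2, 3, 4, 4, 7]))  # 45
```

Every queue operation is O(1), so the n - 1 merges take O(n) time in total once the frequencies are sorted.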
Space complexity: O(n)
Conclusion: The Huffman algorithm is a greedy algorithm, since at every stage it picks the best
available option: the two nodes with the lowest frequencies. Greedy is an algorithmic paradigm that
builds up a solution piece by piece, always choosing the next piece that offers the most obvious and
immediate benefit. So the problems where choosing the locally optimal option also leads to a globally
optimal solution are the best fit for greedy. In general, the greedy method does not guarantee that the
solution obtained is optimal; Huffman coding is a case where it does.