LP-III Assignment No 2
Title of the Assignment: Write a program to implement Huffman Encoding using a greedy strategy.
Objective of the Assignment: Students should be able to understand Huffman Encoding and implement it
using the greedy method.
Prerequisite:
1. Basics of Python or Java programming
2. Concept of Greedy method
3. Huffman Encoding concept
---------------------------------------------------------------------------------------------------------------
Contents for Theory:
1. Greedy Method
2. Huffman Encoding
3. Example solved using Huffman encoding
---------------------------------------------------------------------------------------------------------------
What is a Greedy Method?
● A greedy algorithm is an approach for solving a problem by selecting the best option available
at the moment. It doesn't worry whether the current best result will bring the overall optimal
result.
● The algorithm never reverses an earlier decision, even if the choice was wrong. It works in a
top-down approach.
● This algorithm may not produce the best result for every problem, because it always takes the
locally best choice in the hope of reaching the globally best result.
● This algorithm can perform better than other algorithms, but not in all cases.
● As mentioned earlier, the greedy algorithm doesn't always produce the optimal solution. This is
the major disadvantage of the algorithm.
● For example, suppose we want to find the longest path in the graph below from root to leaf.
Greedy Algorithm
2. At each step, an item is added to the solution set until a solution is reached.
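The limitation described above can be sketched in Python. This is only an illustration: the tree, its node names, and its values are hypothetical, and "longest path" is assumed to mean the root-to-leaf path with the largest sum of node values.

```python
# Hypothetical tree: each node maps to (value, list of children).
tree = {
    "root": (20, ["a", "b"]),
    "a": (2, ["c", "d"]),
    "b": (3, ["e"]),
    "c": (22, []),
    "d": (10, []),
    "e": (1, []),
}

def greedy_path_sum(node):
    """Always step to the child with the largest value; never backtrack."""
    value, children = tree[node]
    total = value
    while children:
        node = max(children, key=lambda child: tree[child][0])
        value, children = tree[node]
        total += value
    return total

def best_path_sum(node):
    """Check every root-to-leaf path, for comparison with the greedy result."""
    value, children = tree[node]
    if not children:
        return value
    return value + max(best_path_sum(child) for child in children)

print(greedy_path_sum("root"))  # greedy path: 20 + 3 + 1 = 24
print(best_path_sum("root"))    # best path:   20 + 2 + 22 = 44
```

The greedy walk picks 3 over 2 at the first step and can never reach the leaf with value 22, which is why the locally best choice does not give the globally best result here.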
Huffman Encoding
● Huffman Coding is a technique of compressing data to reduce its size without losing any of the
details. It was first developed by David Huffman.
● Huffman Coding is generally useful to compress the data in which there are frequently occurring
characters.
● Huffman Coding is a famous Greedy Algorithm.
● It is used for the lossless compression of data.
● It uses variable-length encoding.
● It assigns a variable-length code to each character.
● The code length of a character depends on how frequently it occurs in the given text.
● The character which occurs most frequently gets the shortest code.
● The character which occurs least frequently gets the longest code.
● It is also known as Huffman Encoding.
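To see why shorter codes for frequent characters save space, the short sketch below compares a fixed 8-bit encoding with a variable-length one. The frequency table and the code table are hypothetical values chosen for illustration; they are not taken from the assignment.

```python
# Hypothetical frequencies and variable-length codes (assumed for illustration).
freq = {"x": 6, "y": 3, "z": 1}
codes = {"x": "0", "y": "10", "z": "11"}

def fixed_length_bits(freq, bits_per_char=8):
    """Total bits when every character uses the same number of bits."""
    return bits_per_char * sum(freq.values())

def variable_length_bits(freq, codes):
    """Total bits when each character uses its own code length."""
    return sum(count * len(codes[ch]) for ch, count in freq.items())

print(fixed_length_bits(freq))            # 8 * 10 = 80
print(variable_length_bits(freq, codes))  # 6*1 + 3*2 + 1*2 = 14
```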
Prefix Rule-
● Each character occupies 8 bits. There are a total of 15 characters in the above string. Thus, a total of
8 * 15 = 120 bits are required to send this string.
● Using the Huffman Coding technique, we can compress the string to a smaller size.
● Huffman coding first creates a tree using the frequencies of the characters and then generates a code for
each character.
● Once the data is encoded, it has to be decoded. Decoding is done using the same tree.
● Huffman Coding prevents any ambiguity in the decoding process by using the concept of a prefix code,
i.e. the code assigned to one character must not be a prefix of the code assigned to any other character.
The tree created above helps in maintaining this property.
● Huffman coding is done with the help of the following steps.
1. Calculate the frequency of each character in the string.
2. Sort the characters in increasing order of the frequency. These are stored in a priority queue Q.
4. Create an empty node z. Assign the minimum frequency to the left child of z and assign the
second minimum frequency to the right child of z. Set the value of z to the sum of these two
minimum frequencies.
5. Remove these two minimum frequencies from Q and add the sum into the list of frequencies (*
denote the internal nodes in the figure above).
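The steps above can be sketched with Python's heapq module acting as the priority queue Q. This is a minimal sketch, not a reference solution: the function name huffman_codes, the tuple representation of internal nodes, the tie-breaking counter, and the sample string are all assumptions made for illustration. Left branches are labeled 0 and right branches 1.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_codes(text):
    """Build a Huffman tree greedily and return a {character: code} table."""
    freq = Counter(text)               # step 1: frequency of each character
    if not freq:
        return {}
    tie = count()  # tie-breaker so the heap never has to compare tree nodes
    # Heap entries are (frequency, tie, node). A leaf node is a character;
    # an internal node is a (left, right) tuple.
    heap = [(f, next(tie), ch) for ch, f in freq.items()]
    heapq.heapify(heap)                # step 2: min-priority queue Q
    if len(heap) == 1:                 # only one distinct character
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # minimum frequency -> left child
        f2, _, right = heapq.heappop(heap)   # second minimum -> right child
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):    # internal node: recurse into children
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                          # leaf: record the code for this character
            codes[node] = prefix
    walk(heap[0][2], "")
    return codes

sample = "aaaabbc"                     # hypothetical input string
table = huffman_codes(sample)
encoded_bits = sum(len(table[ch]) for ch in sample)
print(table)
print(encoded_bits, "bits instead of", 8 * len(sample))
```

For the sample string the most frequent character receives a 1-bit code and the other two receive 2-bit codes, so the text needs 10 bits instead of 56. The exact bit patterns depend on how ties are broken, but the total encoded length does not.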
For sending the above string over a network, we have to send the tree as well as the above
compressed-code. The total size is given by the table below.
Without encoding, the total size of the string was 120 bits. After encoding, the size is reduced to 32
+ 15 + 28 = 75 bits.
Example:
A file contains the following characters with the frequencies as shown. If Huffman Coding is used for data
compression, determine-
Following this rule, the Huffman Code for each character is-
a = 111
e = 10
i = 00
o = 11001
u = 1101
s = 01
t = 11000
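As a quick check, the code table above can be tested for the prefix property and used for a round trip. This is a hedged sketch: the helper names is_prefix_free, encode, and decode, and the sample word "seat", are assumptions for illustration and do not come from the assignment.

```python
# Code table copied from the example above.
codes = {"a": "111", "e": "10", "i": "00", "o": "11001",
         "u": "1101", "s": "01", "t": "11000"}

def is_prefix_free(codes):
    """Return True if no code is a prefix of another code."""
    words = sorted(codes.values())
    # After sorting, a code that is a prefix of some other code is
    # immediately followed by a code that starts with it.
    return all(not nxt.startswith(cur) for cur, nxt in zip(words, words[1:]))

def encode(text, codes):
    """Concatenate the code of every character in text."""
    return "".join(codes[ch] for ch in text)

def decode(bits, codes):
    """Read bits left to right and emit a character whenever a code matches."""
    lookup = {code: ch for ch, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:
            out.append(lookup[current])
            current = ""
    if current:
        raise ValueError("leftover bits do not form a complete code")
    return "".join(out)

bits = encode("seat", codes)
print(is_prefix_free(codes))  # True
print(bits)                   # 011011111000
print(decode(bits, codes))    # seat
```

Because no code is a prefix of another, the decoder can emit a character as soon as the bits read so far match a code, without looking ahead.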
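Time Complexity- With a priority queue (min-heap), building the Huffman tree takes O(n log n) time for
n unique characters: the minimum is extracted 2 * (n - 1) times, and each extraction costs O(log n).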
Conclusion- In this way, we have explored the concept of Huffman Encoding using the greedy method.