Huffman Coding in C++

Last Updated : 16 Jul, 2024

In this article, we will learn the implementation of Huffman Coding in C++.

What is Huffman Coding?

Huffman Coding is a popular algorithm used for lossless data compression. It assigns variable-length codes to input characters, with shorter codes assigned to more frequent characters. This technique ensures that the most common characters are represented by shorter bit strings, reducing the overall size of the encoded data.

How does Huffman Coding work in C++?

Huffman Coding works by building a binary tree, called the Huffman Tree, from the input characters and their frequencies. Each leaf node represents a character, and the path from the root to that leaf determines the character's code. Because characters appear only at the leaves, no code is a prefix of another, so the encoded bit string can be decoded unambiguously.

Steps to Build Huffman Tree in C++

The input is an array of unique characters together with their frequencies of occurrence; the output is the Huffman Tree.

  1. Create a leaf node for each unique character and build a min heap of all leaf nodes. The min heap acts as a priority queue: nodes are compared by their frequency field, so the least frequent character is initially at the root of the heap.
  2. Extract two nodes with the minimum frequency from the min heap.
  3. Create a new internal node with a frequency equal to the sum of the two nodes' frequencies. Make the first extracted node its left child and the second extracted node its right child. Add this node to the min heap.
  4. Repeat steps 2 and 3 until the heap contains only one node. The remaining node is the root node and the tree is complete.
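
The four steps above can be sketched with std::priority_queue acting as the min heap. This is a minimal sketch, not the article's original listing: the names Node, HuffmanTree, and buildHuffmanTree are illustrative, and storing children as indices into a vector (with -1 meaning "no child") is an assumption made to keep memory handling simple.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical node layout: children are indices into HuffmanTree::nodes,
// and -1 marks a missing child (so leaves have left == right == -1).
struct Node {
    char symbol;  // meaningful only for leaf nodes
    int freq;
    int left;
    int right;
};

struct HuffmanTree {
    std::vector<Node> nodes;
    int root;
};

HuffmanTree buildHuffmanTree(const std::vector<std::pair<char, int>>& symbols) {
    assert(!symbols.empty());
    HuffmanTree tree;

    // Heap entries are (frequency, node index); std::greater turns the
    // default max heap into a min heap ordered by frequency.
    using Entry = std::pair<int, int>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;

    // Step 1: one leaf node per unique character.
    for (const auto& s : symbols) {
        tree.nodes.push_back({s.first, s.second, -1, -1});
        heap.push({s.second, static_cast<int>(tree.nodes.size()) - 1});
    }

    // Steps 2 to 4: merge the two least frequent nodes until one remains.
    while (heap.size() > 1) {
        Entry first = heap.top();
        heap.pop();
        Entry second = heap.top();
        heap.pop();
        tree.nodes.push_back({'\0', first.first + second.first,
                              first.second, second.second});
        heap.push({first.first + second.first,
                   static_cast<int>(tree.nodes.size()) - 1});
    }

    tree.root = heap.top().second;  // the last remaining node is the root
    return tree;
}
```

Called with the frequencies of "HUFFMAN" (H: 1, U: 1, F: 2, M: 1, A: 1, N: 1), this produces 11 nodes (6 leaves plus 5 internal nodes) and a root whose frequency is 7. Ties between equal frequencies are broken by node index here; other tie-breaking rules give differently shaped but equally good trees.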

Algorithm to Implement Huffman Coding

  • Frequency Calculation:
    • Calculate the frequency of each character in the input data.
  • Priority Queue Initialization:
    • Initialize a priority queue to store nodes of the Huffman Tree based on their frequencies.
  • Building the Huffman Tree:
    • Construct the Huffman Tree by repeatedly combining the two nodes with the lowest frequencies into a new node until only one node remains, which becomes the root of the Huffman Tree.
  • Generating Huffman Codes:
    • Traverse the Huffman Tree from the root to each leaf to generate the Huffman code for each character. Append '0' when moving to a left child and '1' when moving to a right child.
  • Encoding the Input Data:
    • Encode the input data using the generated Huffman codes to produce the compressed output.
  • Decoding the Encoded Data:
    • Decode the encoded data back to the original input using the Huffman Tree.
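
The last two stages can be illustrated on their own with a fixed code table. The sketch below assumes the table is already known (the usage note uses the one from this article's sample output). It decodes with a reverse lookup on the prefix-free codes rather than by walking the tree, which is a simplification; the names CodeTable, encode, and decode are illustrative.

```cpp
#include <cassert>
#include <map>
#include <string>

using CodeTable = std::map<char, std::string>;

// Concatenate the code of every character in the input.
std::string encode(const std::string& text, const CodeTable& codes) {
    std::string bits;
    for (char ch : text) {
        bits += codes.at(ch);  // throws std::out_of_range for unknown characters
    }
    return bits;
}

// Read bits one at a time; because no code is a prefix of another,
// the first match in the reverse table is always the right character.
std::string decode(const std::string& bits, const CodeTable& codes) {
    std::map<std::string, char> reverse;
    for (const auto& entry : codes) {
        reverse[entry.second] = entry.first;
    }

    std::string text;
    std::string current;
    for (char bit : bits) {
        current += bit;
        auto it = reverse.find(current);
        if (it != reverse.end()) {
            text += it->second;
            current.clear();
        }
    }
    assert(current.empty());  // leftover bits would mean a truncated input
    return text;
}
```

With the table M: 111, A: 110, U: 00, F: 01, N: 100, H: 101, encode("HUFFMAN", codes) yields 101000101111110100, and decode maps that bit string back to HUFFMAN.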

C++ Program to Implement Huffman Coding

The program below demonstrates how to implement Huffman coding in C++.
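
A minimal sketch of such a program, not necessarily the listing that produced the output shown below, might look like this. It strings the stages together: frequency counting, tree construction with a min heap, code generation by traversal, encoding, and tree-based decoding. All names (Node, Huffman, buildHuffman, assignCodes, encode, decode) are illustrative, and the index-based node storage is an assumption.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Children are indices into Huffman::nodes; -1 means "no child" (a leaf).
struct Node { char symbol; int freq; int left; int right; };

struct Huffman {
    std::vector<Node> nodes;
    int root = -1;
    std::map<char, std::string> codes;
};

// Walk the tree, appending '0' for a left step and '1' for a right step.
void assignCodes(Huffman& h, int index, const std::string& prefix) {
    const Node& node = h.nodes[index];
    if (node.left == -1) {
        // A single-character input has a leaf as root; give it the code "0".
        h.codes[node.symbol] = prefix.empty() ? "0" : prefix;
        return;
    }
    assignCodes(h, node.left, prefix + "0");
    assignCodes(h, node.right, prefix + "1");
}

Huffman buildHuffman(const std::string& text) {
    assert(!text.empty());
    std::map<char, int> freq;
    for (char ch : text) ++freq[ch];  // frequency calculation

    Huffman h;
    using Entry = std::pair<int, int>;  // (frequency, node index)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    for (const auto& kv : freq) {
        h.nodes.push_back({kv.first, kv.second, -1, -1});
        heap.push({kv.second, static_cast<int>(h.nodes.size()) - 1});
    }
    while (heap.size() > 1) {  // merge the two least frequent nodes
        Entry a = heap.top(); heap.pop();
        Entry b = heap.top(); heap.pop();
        h.nodes.push_back({'\0', a.first + b.first, a.second, b.second});
        heap.push({a.first + b.first, static_cast<int>(h.nodes.size()) - 1});
    }
    h.root = heap.top().second;
    assignCodes(h, h.root, "");
    return h;
}

std::string encode(const Huffman& h, const std::string& text) {
    std::string bits;
    for (char ch : text) bits += h.codes.at(ch);
    return bits;
}

// Follow one edge per bit; emit a character whenever a leaf is reached.
std::string decode(const Huffman& h, const std::string& bits) {
    std::string text;
    int index = h.root;
    for (char bit : bits) {
        const Node& node = h.nodes[index];
        if (node.left != -1) index = (bit == '0') ? node.left : node.right;
        if (h.nodes[index].left == -1) {
            text += h.nodes[index].symbol;
            index = h.root;
        }
    }
    return text;
}
```

Calling buildHuffman("HUFFMAN"), then encode and decode, round-trips the string through an 18-bit encoding. The individual codes depend on how ties between equal frequencies are broken, so they may differ from the ones in the output below while the total length stays the same.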


Output
Huffman Codes:
M 111
A 110
U 00
F 01
N 100
H 101

Original string:
HUFFMAN

Encoded string:
101000101111110100

Decoded string:
HUFFMAN

Time complexity: O(n log n), where n is the number of unique characters. With n leaf nodes, extractMin() is called 2*(n - 1) times, and each call takes O(log n) time because it calls minHeapify(). The overall complexity is therefore O(n log n).
If the input array is already sorted by frequency, the tree can be built in linear time.

Auxiliary Space: O(n)

Working example of Huffman Coding in C++

Consider the string "HUFFMAN".

1. Count Frequencies:

H: 1, U: 1, F: 2, M: 1, A: 1, N: 1

2. Build a Priority Queue:

[(1, H), (1, U), (2, F), (1, M), (1, A), (1, N)]

3. Build the Huffman Tree:

Combine U and N:

(2, UN)
     (2)
    /   \
 U(1)   N(1)

Combine H and A:

(2, HA)
     (2)
    /   \
 H(1)   A(1)

Combine M and the subtree containing H and A:

(3, MHA)
     (3)
    /   \
 M(1)   (2)
       /   \
    H(1)   A(1)

Combine F and the subtree containing U and N:

(4, UNF)
        (4)
       /   \
     (2)   F(2)
    /   \
 U(1)   N(1)

Combine the two subtrees:

(7, UNFMHA)
                 (7)
              /       \
          (3)           (4)
         /   \         /   \
      M(1)   (2)     (2)   F(2)
            /   \   /   \
         H(1) A(1) U(1) N(1)

Final Huffman Tree

Here's the final Huffman Tree for the string "HUFFMAN":

                 (7)
              /       \
          (3)           (4)
         /   \         /   \
      M(1)   (2)     (2)   F(2)
            /   \   /   \
         H(1) A(1) U(1) N(1)

Character Codes

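The codes below are the ones printed in the Output section. They result from a different but equally valid tie-breaking order than the tree drawn above, which would instead give M: 00, H: 010, A: 011, U: 100, N: 101, F: 11. Both assignments encode "HUFFMAN" in 18 bits.
  • M: 111
  • A: 110
  • U: 00
  • F: 01
  • N: 100
  • H: 101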
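
With a code table in hand, the size of the encoded output can be computed without encoding anything: it is the sum of the code lengths over all characters of the input. The helper below is a small sketch of that calculation; the name encodedBits is illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Total encoded length in bits: one code length per input character.
std::size_t encodedBits(const std::string& text,
                        const std::map<char, std::string>& codes) {
    std::size_t bits = 0;
    for (char ch : text) {
        auto it = codes.find(ch);
        assert(it != codes.end());  // every input character needs a code
        bits += it->second.size();
    }
    return bits;
}
```

For "HUFFMAN" and the codes listed above, this gives 18 bits, compared with 7 * 8 = 56 bits for one byte per character.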

Encode Data:

"HUFFMAN" -> 101000101111110100

Decode Data:

"101000101111110100" -> "HUFFMAN"

Applications of Huffman Coding

  • It is used in file compression: for example, the DEFLATE method behind ZIP and GZIP combines LZ77 with Huffman coding.
  • It can be used to transmit data efficiently over networks by reducing the amount of data to be sent.
  • It commonly serves as the entropy-coding stage in media formats such as JPEG, MP3, and MPEG.
