The Huffman coding is a popular algorithm used for lossless data compression. It works by assigning the variable-length codes to the input characters with the shorter codes assigned to the more frequent characters. This results in the prefix-free code meaning no code is a prefix of any other code. The Huffman coding guarantees an optimal prefix-free encoding minimizing the total number of bits required to encode a message.
Data Structure Organization:
The Huffman coding organizes data using the binary tree called a Huffman tree. The tree is constructed using the priority queue or a min-heap to efficiently find the two lowest frequency nodes and merge them into the new node. This process continues until all nodes are merged into a single root node forming the Huffman tree.
Root
/ \
Node1 Node2
/ \ / \
... ... ... ...
Implementation:
The Huffman coding can be implemented using various data structures such as the arrays linked lists or priority queues. One common approach is to the use a priority queue to efficiently select the two lowest frequency nodes for the merging. This implementation typically involves the following steps:
- Calculate the frequency of each input character.
- Create a leaf node for each character and add it to the priority queue.
- While there is more than one node in the priority queue:
- Remove the two nodes with the lowest frequency.
- They merge them into a new internal node with the sum of their frequencies.
- The Add the new node back to priority queue.
- The remaining node in priority queue becomes the root of the Huffman tree.
Basic Operations:
- Encoding: The Generate the Huffman codes for the each input character based on the constructed Huffman tree.The Time complexity: O(n log n) where n is the number of the unique characters.
- Decoding: The Decode a sequence of the Huffman codes back into the original message using the Huffman tree. The Time complexity: O(n) where n is the length of the encoded message.
- Compression: The Encode a message using the generated Huffman codes. The Space complexity: The Depends on the efficiency of the Huffman tree typically close to the entropy of the message.
- Decompression: The Decode a compressed message back into the original message. The Space complexity: Depends on the size of the compressed message and efficiency of the Huffman tree.
Program:
Java
// Java Program to Implement
// Huffman Coding
import java.util.PriorityQueue;
import java.util.HashMap;
// Class representing a node in the Huffman Tree
class HuffmanNode {
// Character data
char data;
// Frequency of the character
int frequency;
// Left and right child nodes
HuffmanNode left, right;
// Constructor to initialize the node
HuffmanNode(char data, int frequency) {
this.data = data;
this.frequency = frequency;
left = right = null;
}
}
// Driver Class
public class HuffmanCoding {
// Main Function
public static void main(String[] args) {
// The message to be encoded
String message = "Huffman coding is a lossless data compression algorithm.";
// Create a frequency map to count the frequency of each character
HashMap<Character, Integer> frequencyMap = new HashMap<>();
for (char c : message.toCharArray()) {
frequencyMap.put(c, frequencyMap.getOrDefault(c, 0) + 1);
}
// Create a priority queue to build the Huffman Tree
PriorityQueue<HuffmanNode> priorityQueue =
new PriorityQueue<>((a, b) -> a.frequency - b.frequency);
// Create a Huffman node for each character and add it to the priority queue
for (char c : frequencyMap.keySet()) {
priorityQueue.add(new HuffmanNode(c, frequencyMap.get(c)));
}
// Build the Huffman Tree
while (priorityQueue.size() > 1) {
// Remove the two nodes with the lowest frequency
HuffmanNode left = priorityQueue.poll();
HuffmanNode right = priorityQueue.poll();
// Create a new internal node with these two nodes
// as children and add it back to the queue
HuffmanNode newNode =
new HuffmanNode('$', left.frequency + right.frequency);
newNode.left = left;
newNode.right = right;
priorityQueue.add(newNode);
}
// The remaining node is the root of the Huffman Tree
HuffmanNode root = priorityQueue.poll();
// Print the Huffman codes for each character
System.out.println("Huffman codes:");
printCodes(root, new StringBuilder());
}
// Method to print the Huffman codes
public static void printCodes(HuffmanNode root, StringBuilder code) {
if (root == null) return;
// If this is a leaf node, print the character and its code
if (root.data != '$') {
System.out.println(root.data + ": " + code);
}
// Traverse the left subtree
if (root.left != null) {
printCodes(root.left, code.append('0'));
code.deleteCharAt(code.length() - 1);
}
// Traverse the right subtree
if (root.right != null) {
printCodes(root.right, code.append('1'));
code.deleteCharAt(code.length() - 1);
}
}
}
Output
Huffman codes:
a: 000
m: 0010
n: 0011
.: 01000
p: 010010
u: 010011
l: 0101
s: 011
: 100
f: 10100
g: 10101
r: 10110
d: 10111
H: 110000
h: 110001
c: 11001
i: 1101
t: 11100
e: 11101
o: 1111
Applications of Huffman Coding
- Data compression: The Huffman coding is widely used in the file compression algorithms like ZIP, gzip and JPEG.
- Telecommunication: The Huffman coding is used in the voice and image compression to the reduce bandwidth usage.
- Error detection and correction: The Huffman codes with the error-detecting capabilities are used in the applications like QR codes and barcodes.
Conclusion
The Huffman coding provides an efficient method for the lossless data compression by the assigning the variable-length codes to the input characters based on their frequencies. Its applications span across the various domains making it a fundamental algorithm in the computer science and telecommunications. By understanding the Huffman coding and its implementation one can appreciate its significance in the optimizing data storage and transmission.
Similar Reads
Huffman Coding in C++ In this article, we will learn the implementation of Huffman Coding in C++.What is Huffman Coding?Huffman Coding is a popular algorithm used for lossless data compression. It assigns variable-length codes to input characters, with shorter codes assigned to more frequent characters. This technique en
6 min read
Huffman Coding in C In this article, we will learn the implementation of Huffman Coding in C.What is Huffman Coding?Huffman Coding is a lossless data compression algorithm. It assigns variable-length codes to input characters, with shorter codes assigned to more frequent characters. This algorithm makes sure that the m
8 min read
Huffman Coding in Python Huffman Coding is one of the most popular lossless data compression techniques. This article aims at diving deep into the Huffman Coding and its implementation in Python. What is Huffman Coding?Huffman Coding is an approach used in lossless data compression with the primary objective of delivering r
3 min read
Coding Guidelines in Java Java is one of the most popular and widely used programming languages and platforms. A platform is an environment that helps to develop and run programs written in any programming language. Java is fast, reliable, and secure. From desktop to web applications, scientific supercomputers to gaming cons
7 min read
Java Fundamentals Coding Practice Problems Understanding Java fundamentals is the first step to becoming a proficient Java programmer. This collection of Java basic coding practice problems covers essential topics such as input/output operations, arithmetic and logical operators, type conversion, conditional statements, loops, and more. Thes
1 min read