Huffman Coding Algorithm: Data Compression and Data Retrieval

Huffman coding is a popular data compression algorithm that uses variable-length codewords to encode source symbols. It builds a prefix code by constructing a binary tree from the frequency of occurrence of each symbol, so that more frequent symbols receive shorter codewords. The algorithm works by repeatedly combining the two least frequent symbols into a new node whose frequency is their sum, until a single tree remains; the Huffman code is then read off the tree. Decoding follows the tree from root to leaf, taking the left or right branch according to each bit read, until a leaf identifies the encoded symbol.


Huffman Coding

Algorithm
Data Compression and Data Retrieval
Encoding
Decoding
Prefix-Codes
Fixed-length vs Variable-length Coding
Introduction
• Popular method for data compression.
• Developed by David Huffman in 1951, while a student in an information theory class at MIT; published in 1952.
• Builds a binary code tree.
• It is an optimal prefix code and can be viewed as a variable-length code.
• Symbols with higher probabilities are assigned shorter codewords, and symbols with lower probabilities are assigned longer codewords.
• The algorithm generates a code tree; the Huffman code is obtained by labelling the edges of that tree.
Cont…
• Purpose: construction of a minimum-redundancy code.
• Feature: shows how variable-length codes can be packed together.
• In a Huffman-encoded data stream, each character can take a variable number of bits. How do we separate one character from the next?
• The Huffman codes must be chosen so that no codeword is a prefix of another; this prefix property makes the separation unambiguous (a small code sketch follows below).
• Ex. The characters A to G occur in the original data stream with probabilities A=0.154, B=0.110, C=0.072, D=0.063 and so on.
A=1, B=01, … , G=000011
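The separation works because the code is prefix-free: the decoder can emit a symbol the moment its codeword is matched. A minimal Python sketch of this idea, using a small hypothetical three-symbol code (not the A–G code above):

# Hypothetical prefix-free code, used only to illustrate unambiguous separation.
code = {"A": "0", "B": "10", "C": "11"}

def encode(text):
    # Concatenate the codewords; no separator bits are needed.
    return "".join(code[ch] for ch in text)

def decode(bits):
    # Scan left to right; the prefix property guarantees at most one match.
    reverse = {v: k for k, v in code.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in reverse:
            out.append(reverse[buf])
            buf = ""
    return "".join(out)

bits = encode("ABCA")      # -> "010110"
print(decode(bits))        # -> "ABCA"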
Algorithm
• Input: a set of symbols and their probabilities (frequencies).
• Output: a prefix-free binary code with minimum expected codeword length.
• Algorithm:
1. Begin with a list of all symbols and their associated frequencies.
2. Find the two symbols with the lowest frequencies.
3. Create a new symbol (node) and link it to these two symbols.
4. Remove the two original symbols from the list.
5. Give the new symbol the combined frequency of the two symbols it replaces.
6. Add the new symbol to the list.
7. Repeat until only one symbol remains in the list (a code sketch of these steps follows).
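The steps above can be written compactly with a min-heap. This is a sketch, not code from the slides; the frequencies at the bottom are illustrative only.

import heapq
from itertools import count

def huffman_code(freqs):
    # Steps 1-7 above: repeatedly merge the two least-frequent nodes
    # until a single tree remains, then label left edges 0 and right edges 1.
    tiebreak = count()  # keeps heap comparisons well-defined for equal frequencies
    heap = [(f, next(tiebreak), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # two lowest-frequency nodes
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))  # merged node
    _, _, tree = heap[0]

    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):          # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                                # leaf: a source symbol
            codes[node] = prefix or "0"
    walk(tree, "")
    return codes, tree

# Illustrative frequencies (not taken from the slides):
codes, tree = huffman_code({"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5})
print(codes)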
Example
(Worked example of Huffman tree construction, shown step by step; figures omitted.)
Decoding Huffman Code
• As we read bits from the input stream, we traverse the tree starting at the root, taking the left branch when we read a 0 and the right branch when we read a 1. When we reach a leaf, we have decoded one symbol and return to the root (a code sketch follows at the end of this slide).
• Advantages:
• Easy to implement.
• Produces lossless compression.
• Disadvantages:
• Compression is a relatively slow process.
• Because codewords have variable length, the decoder cannot tell from the bits alone when it has reached the last bit of the code; an end-of-stream marker or a stored symbol count is typically used.
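A matching decoding sketch, using the (left, right) tuple tree from the construction sketch above (again an illustration, not code from the slides):

def huffman_decode(bits, tree):
    # Walk from the root: '0' goes left, '1' goes right;
    # a leaf means one symbol has been decoded, so restart at the root.
    out, node = [], tree
    for bit in bits:
        node = node[0] if bit == "0" else node[1]
        if not isinstance(node, tuple):
            out.append(node)
            node = tree
    return "".join(out)

# Tiny hand-built tree: a=00, b=01, c=1
tree = (("a", "b"), "c")
print(huffman_decode("00011", tree))   # -> "abc"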
Thank You
