
Huffman

Huffman coding is a data compression technique that uses variable-length codes for characters, allowing for efficient data compression, particularly for frequently occurring characters. It is widely applied in formats like GZIP and BZIP2, and it can save memory compared to fixed-length encoding. The document outlines the process of constructing Huffman trees and provides examples of encoding and decoding messages.

Uploaded by

Anshu Rao

Huffman Coding: Greedy Approach

Huffman coding is a lossless data compression technique: it reduces the size of data
without losing any of its details. It was developed by David A. Huffman.

Huffman coding uses a variable-length code for each character in the file.

It is also known as Huffman encoding.

Its encoding follows the prefix rule: no codeword is a prefix of any other codeword.

Huffman coding is most useful for compressing data in which some characters occur
much more frequently than others.

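Huffman code will always follow prefix code/rule

Example of a code that violates the prefix rule:
Symbol Code
A 0
B 1
C 01
Here the code for A (0) is a prefix of the code for C (01). If the sender sends the
message 001 to the receiver, an ambiguity occurs:
001 can be read as 0 0 1 = AAB or as 0 01 = AC

The receiver cannot tell which message was sent.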
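
The ambiguity can be checked mechanically. The sketch below is only an illustration: the helper names is_prefix_free and all_decodings are hypothetical, and the code table is the three-symbol example above. It tests whether any codeword is a prefix of another and lists every way the bit string 001 can be split into codewords.

```python
def is_prefix_free(code):
    """True if no codeword is a prefix of a different codeword."""
    words = list(code.values())
    return not any(
        i != j and b.startswith(a)
        for i, a in enumerate(words)
        for j, b in enumerate(words)
    )


def all_decodings(bits, code):
    """Return every way to split `bits` into codewords of `code`."""
    if not bits:
        return [""]
    results = []
    for symbol, word in code.items():
        if bits.startswith(word):
            rest = all_decodings(bits[len(word):], code)
            results += [symbol + tail for tail in rest]
    return results


ambiguous = {"A": "0", "B": "1", "C": "01"}
print(is_prefix_free(ambiguous))         # False
print(all_decodings("001", ambiguous))   # ['AAB', 'AC']
```

For a code that follows the prefix rule, all_decodings returns at most one result for any bit string.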

Applications of Huffman’s encoding


1. Huffman’s encoding is a variable-length encoding, so it typically uses fewer bits
than a fixed-length encoding.
2. Huffman coding is used in conventional compression formats like GZIP, BZIP2,
PKZIP, etc.
3. For text and fax transmissions.
4. Huffman’s code is used in transmission of data in an encoded format.
5. Huffman’s encoding is used in decision trees and game playing.

Differentiate between fixed length and variable length encoding

Example:

A fixed-length code needs 3 bits to represent 6 characters, because 2 bits give only
4 distinct codewords while 3 bits give 8.

With this method, every character in the file costs 3 bits.

How do we get 3 bits?

The relative frequencies of the characters sum to: .45 + .13 + .12 + .16 + .09 + .05 = 1.00

Each character is assigned a 3-bit codeword => 3 * 1.00 = 3 bits per character

The fixed-length code requires 3 bits per character, while the variable-length code
requires 2.24 bits per character
=> Saving of memory of approximately 25 %, since (3 - 2.24) / 3 ≈ 0.25
Thus, Huffman’s encoding of the text will use about 25% less memory than its
fixed-length encoding.
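
The figures 3 bits, 2.24 bits and 25 % can be reproduced with a short Python sketch. It is a hedged illustration, not the notes' own method: the function name huffman_weighted_length is hypothetical, and the six frequencies are scaled to whole percentages so that the arithmetic is exact. It relies on a standard property of the Huffman construction: the sum of the weights of all merged nodes equals the total of frequency * code length.

```python
import heapq
from math import ceil, log2

# Relative frequencies from the example, scaled to percent (assumption: scaling only).
percent = [45, 13, 12, 16, 9, 5]


def huffman_weighted_length(weights):
    """Sum of (weight * code length) for an optimal Huffman code.

    Each merge of the two smallest nodes adds one bit to every symbol
    beneath them, so adding up the merged weights gives the total.
    """
    heap = list(weights)
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        heapq.heappush(heap, merged)
    return total


fixed_bits = ceil(log2(len(percent)))                        # bits per character, fixed length
variable_bits = huffman_weighted_length(percent) / sum(percent)
saving = (fixed_bits - variable_bits) / fixed_bits
print(fixed_bits, variable_bits, round(saving * 100, 1))     # 3 2.24 25.3
```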

EXAMPLE 1: Consider the five-symbol alphabet {A, B, C, D, _} with the following
occurrence frequencies in a text made up of these symbols:

Symbol Frequency
A 0.35
B 0.1
C 0.2
D 0.2
_ 0.15

Solution:
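Step 1: Sort the characters by their frequency in ascending order.
B (0.1), _ (0.15), C (0.2), D (0.2), A (0.35)

Step 2: Combine the two minimum-frequency nodes and arrange in ascending order.
B + _ = 0.25, giving C (0.2), D (0.2), B_ (0.25), A (0.35)

Step 3: Combine the two minimum-frequency nodes and arrange in ascending order.
C + D = 0.4, giving B_ (0.25), A (0.35), CD (0.4)

Step 4: Combine the two minimum-frequency nodes and arrange in ascending order.
B_ + A = 0.6, giving CD (0.4), B_A (0.6)

Step 5: Combine the two remaining nodes into the root.
CD + B_A = 1.0. Labelling each left (smaller) branch 0 and each right branch 1 gives
the codes listed next.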

1. Huffman Code of each character:

A: 11
B: 100
C: 00
D: 01
_: 101
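
The construction steps above can be sketched in Python with a min-heap. This is a minimal illustration under stated assumptions, not a reference implementation: the function name huffman_codes is hypothetical, the frequencies of Example 1 are scaled to whole percentages, ties are broken by insertion order, and the first node removed from the heap is given bit 0. A different tie-breaking rule can produce different codewords of the same lengths.

```python
import heapq
from itertools import count


def huffman_codes(freqs):
    """Return {symbol: bit string} for a dict of symbol -> weight."""
    order = count()  # tie-breaker so the heap never has to compare tree nodes
    heap = [(weight, next(order), symbol) for symbol, weight in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # a single symbol still needs one bit
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)    # smallest weight  -> branch 0
        w2, _, right = heapq.heappop(heap)   # next smallest    -> branch 1
        heapq.heappush(heap, (w1 + w2, next(order), (left, right)))

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):          # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                                # leaf: a symbol
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes


# Frequencies of Example 1 in percent (assumption: scaled by 100).
example1 = {"A": 35, "B": 10, "C": 20, "D": 20, "_": 15}
print(huffman_codes(example1))
# {'C': '00', 'D': '01', 'B': '100', '_': '101', 'A': '11'}
```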

Encoding:
DAD is encoded as 01 11 01 = 011101.
Decoding:
10011011011101 is split as 100 11 01 101 11 01 and decoded as BAD_AD.
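
Encoding and decoding with this code table can be sketched as follows. The function names encode and decode are hypothetical; the table is the one obtained in Example 1. Decoding reads one bit at a time and emits a symbol as soon as the bits read so far match a codeword, which is unambiguous only because the code follows the prefix rule.

```python
CODE = {"A": "11", "B": "100", "C": "00", "D": "01", "_": "101"}


def encode(text, code=CODE):
    """Concatenate the codeword of every character in `text`."""
    return "".join(code[ch] for ch in text)


def decode(bits, code=CODE):
    """Split `bits` into codewords from left to right and map them back."""
    inverse = {word: symbol for symbol, word in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:       # a complete codeword has been read
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(out)


print(encode("DAD"))                 # 011101
print(decode("10011011011101"))      # BAD_AD
```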
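2. Length of Huffman encoded message

With the occurrence frequencies given and the codeword lengths obtained, the expected
length is

= ∑ (frequency * code length)

= 2 * 0.35 + 3 * 0.1 + 2 * 0.2 + 2 * 0.2 + 3 * 0.15
= 2.25 bits per character

Because the frequencies are relative (they sum to 1), a message of n characters needs
about 2.25 * n bits.

3. Average code length per character

= ∑ (frequency * code length) / ∑ (frequency)

= ((2 * 0.35) + (3 * 0.1) + (2 * 0.2) + (2 * 0.2) + (3 * 0.15)) / (0.35 + 0.1 + 0.2 + 0.2 + 0.15)

= 2.25 / 1.00

= 2.25 bits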
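
As a check on the arithmetic, the weighted sum can be computed directly from the frequencies and codewords of Example 1. The variable names below are illustrative assumptions. The divisor is the sum of the frequencies (1.00), not the sum of the code lengths.

```python
freq = {"A": 0.35, "B": 0.1, "C": 0.2, "D": 0.2, "_": 0.15}
code = {"A": "11", "B": "100", "C": "00", "D": "01", "_": "101"}

# Sum of frequency * code length over all symbols.
weighted = sum(freq[s] * len(code[s]) for s in freq)

# Average code length per character: divide by the total frequency.
average = weighted / sum(freq.values())

# Exact length of one concrete message, for comparison.
message_bits = sum(len(code[ch]) for ch in "BAD_AD")

print(round(weighted, 2), round(average, 2), message_bits)   # 2.25 2.25 14
```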

Example 2: Construct Huffman Tree for 'MAHARASHTRA'

Step 1: Frequency Count

Character Count
M: 1
A: 4
H: 2
R: 2
S: 1
T: 1

Step 2: Draw Huffman tree

Step 3: Optimal Code
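
The frequency count and the size of the optimal code for MAHARASHTRA can be checked with a short sketch. It counts the letters directly (H occurs twice in the word) and then adds up the merged weights of the Huffman construction, which gives the total number of bits in the encoded word. The function name huffman_total_bits is hypothetical. Because several letters share the same count, the individual codewords depend on how ties are broken, but every valid Huffman tree gives the same total.

```python
import heapq
from collections import Counter

counts = Counter("MAHARASHTRA")


def huffman_total_bits(weights):
    """Total encoded length in bits: the sum of all merged node weights."""
    heap = list(weights)
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        heapq.heappush(heap, merged)
    return total


print(dict(counts))                          # {'M': 1, 'A': 4, 'H': 2, 'R': 2, 'S': 1, 'T': 1}
print(huffman_total_bits(counts.values()))   # 27
```

For comparison, a fixed-length code for these 6 distinct letters needs 3 bits per letter, that is 11 * 3 = 33 bits for the whole word.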

Example 3:

Example 4:

A file contains the following characters with the frequencies as shown. If Huffman
coding is used for data compression, determine:

1. Huffman code for each character
2. Average code length
3. Length of Huffman encoded message (in bits)

AKTU Questions

1. Explain the Huffman algorithm. Construct the Huffman tree of MAHARASHTRA
with its optimal code.
2. Construct a Huffman tree for the given characters A, B, C, D, E, F, G, H having
frequencies 22, 5, 11, 19, 2, 11, 25, 5 respectively. What will be the code of HEAD in
binary?
3. What is a Huffman tree? Draw a Huffman tree, with the algorithm, for the following
symbols whose frequency of occurrence in a message is stated along with the symbol below:
A:6 B:7 C:12 D:6 E:25 G:4 H:10 J:35 K:15
4. Create a Huffman tree with the following data numbers:
24, 55, 13, 67, 88, 36, 17, 61, 24, 76
5. Draw the Huffman tree for the following symbols whose frequency of
occurrence in a message is stated along with the symbols below:
M1: 0.45 M2: 0.02 M3: 0.24 M4: 0.18 M5: 0.11
and decode the following message:
10110011011111001100101111101101100
6. Differentiate between fixed-length and variable-length encoding. Draw a
Huffman tree for the following symbols whose frequency of occurrence
in a message is stated along with the symbol below:
A:15, B:6, C:7, D:12, E:25, F:4, G:6, H:1, I:15
Decode the message 1110100010111011.

A networking company uses a compression technique to encode a message before
transmitting it over the network. Suppose the following piece of message is to be
sent (each character occupies 7 bits):

When you are on the left you are on the right. When you are on the right, you are on the
wrong.

Answer the following questions based on the above problem.

i. Construct the Huffman tree.
ii. Decode the following message: 101111101010111111111100
