Huffman Coding

Uploaded by

Mudassar Rehman
Copyright
© All Rights Reserved
Huffman Coding

When characters are coded using a standard code such as ASCII, each character is represented in
8 bits. Such standard codes use a fixed-length codeword for every character, regardless of how often
the character occurs, so the total code length is larger than necessary. For example, suppose
we have six characters (a, b, c, d, e, f) whose frequencies of occurrence in a message are {45, 13,
12, 16, 9, 5}. In a fixed-length coding system, three bits are enough to represent each character.
The total code length of the message is then (45+13+12+16+9+5) x 3 = 100 x 3 = 300 bits.
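
The fixed-length arithmetic above can be checked with a short Python sketch. The variable names (freq, bits_per_char, fixed_total) are illustrative and not part of the original text.

```python
# Frequencies of the six characters from the example above.
freq = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}

# Smallest number of bits that can tell len(freq) distinct symbols apart.
bits_per_char = (len(freq) - 1).bit_length()

# Every character costs the same number of bits in a fixed-length code.
fixed_total = sum(freq.values()) * bits_per_char

print(bits_per_char, fixed_total)  # 3 300
```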

Let us now encode the characters with a variable-length coding system. In this coding system, a
character with a higher frequency of occurrence is assigned fewer bits for representation, while
a character with a lower frequency of occurrence is assigned more bits. The variable-length codes
for the characters are shown in the following table.

Character   a    b    c    d    e     f
Frequency   45   13   12   16   9     5
Codeword    0    101  100  111  1101  1100

The total code length in the variable-length coding system is
1 x 45 + 3 x 13 + 3 x 12 + 3 x 16 + 4 x 9 + 4 x 5 = 224 bits. Hence the fixed-length code
requires 300 bits while the variable-length code requires only 224 bits.
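
As a quick check of the variable-length total, the sketch below multiplies each character's frequency by the length of its codeword. The dictionaries simply restate the frequencies and codewords given in this section; the names are illustrative.

```python
# Frequencies and codewords as given in this section.
freq = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
code = {"a": "0", "b": "101", "c": "100", "d": "111", "e": "1101", "f": "1100"}

# Each occurrence of a character costs len(codeword) bits.
variable_total = sum(freq[ch] * len(code[ch]) for ch in freq)

print(variable_total)  # 224
```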

Prefix (Free) Codes


We have seen that using variable-length codewords reduces the overall encoded string length.
But the question arises whether the string can still be decoded. If a were encoded as 1 instead of 0,
the encoded string “111” could be decoded either as “d” or as “aaa”, so the encoding would be ambiguous.
The key to removing this ambiguity is to use a prefix code, that is, a code in which
no codeword is a prefix of another codeword.
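
The prefix property can be tested mechanically. The following is a minimal sketch, assuming codewords are given as strings of "0" and "1"; the helper name is_prefix_free is hypothetical.

```python
def is_prefix_free(codewords):
    """Return True if no codeword is a prefix of a different codeword."""
    return not any(
        a != b and b.startswith(a)
        for a in codewords
        for b in codewords
    )

# Codewords from the table: a=0, b=101, c=100, d=111, e=1101, f=1100.
good = ["0", "101", "100", "111", "1101", "1100"]

# Same code, but with a encoded as 1 instead of 0 (the ambiguous case).
bad = ["1", "101", "100", "111", "1101", "1100"]

print(is_prefix_free(good))  # True
print(is_prefix_free(bad))   # False
```

The check compares every pair of codewords, which is quadratic in the number of codewords but perfectly adequate for a small alphabet.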

The decoding process can be represented as a binary tree whose leaves are the characters. We
interpret the binary codeword for a character as the path from the root to that character, where

• “0” means “go to the left child”
• “1” means “go to the right child”
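
One way to picture this tree in code is sketched below. It assumes a simple representation chosen only for illustration: an internal node is a two-element list [left, right] and a leaf is the character itself. The helper names build_tree and follow are hypothetical.

```python
def build_tree(code):
    """Build a decode tree from {character: codeword}."""
    root = [None, None]
    for ch, word in code.items():
        node = root
        # Walk (and create) internal nodes for all bits except the last.
        for bit in word[:-1]:
            i = int(bit)
            if node[i] is None:
                node[i] = [None, None]
            node = node[i]
        # The last bit selects the slot that holds the leaf.
        node[int(word[-1])] = ch
    return root

def follow(root, word):
    """Follow a codeword from the root: 0 = left child, 1 = right child."""
    node = root
    for bit in word:
        node = node[int(bit)]
    return node

code = {"a": "0", "b": "101", "c": "100", "d": "111", "e": "1101", "f": "1100"}
tree = build_tree(code)
print(follow(tree, "101"))  # b
```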

Greedy Algorithm for Huffman Code

According to the Huffman algorithm, the tree is built bottom-up, starting from the leaves. Initially,
there are n singleton trees in the forest, as each tree is a single leaf. The greedy strategy first finds
the two trees having the minimum frequencies of occurrence. These two trees are then merged into a
single tree whose frequency is the sum of the frequencies of the two merged trees. The whole process is
repeated until there is only one tree in the forest.

Let us consider a set of characters S = <a, b, c, d, e, f> with the following frequencies of occurrence:
P = <45, 13, 12, 16, 5, 9>. Note that e and f have frequencies 5 and 9 here, the reverse of the earlier
example, so their codewords are exchanged as well. Initially, these six characters with their
frequencies are considered as six singleton trees in the forest. The step-wise merging of these trees
into a single tree is shown in Fig. 6.3. The merging is done by repeatedly selecting the two trees with
minimum frequencies until there is only one tree in the forest.

Initial forest: a:45  b:13  c:12  d:16  e:5  f:9
[Figure: step-wise merging of the singleton trees.]
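
The sequence of merges can be traced with a short sketch that uses Python's heapq module as the priority queue. This is an illustration under stated assumptions, not the original figure: the labels such as "(e+f)" are invented here purely for printing.

```python
import heapq

freq = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 5, "f": 9}

# Each heap entry is (frequency, label); the label is only used for printing.
forest = [(n, ch) for ch, n in freq.items()]
heapq.heapify(forest)

merges = []
while len(forest) > 1:
    fx, x = heapq.heappop(forest)   # tree with minimum frequency
    fy, y = heapq.heappop(forest)   # tree with next minimum frequency
    merges.append((x, fx, y, fy, fx + fy))
    heapq.heappush(forest, (fx + fy, f"({x}+{y})"))

for x, fx, y, fy, total in merges:
    print(f"{x}:{fx} + {y}:{fy} -> {total}")
```

For these frequencies the merged trees have frequencies 14, 25, 30, 55 and finally 100, which equals the total number of characters in the message.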

Now each left branch is assigned the code “0” and each right branch is assigned the code “1”.
[Figure: the decode tree after assigning the codes to the branches.]

The binary codeword for a character is interpreted as the path from the root to that character. Hence,
the codes for the characters are as follows:

a = 0
b = 101
c = 100
d = 111
e = 1100
f = 1101

It can be seen that no code is a prefix of another code. Suppose we have the code string 01111001101.
To decode it, we traverse the tree starting from the root. The first bit is 0, so the tree is traversed
left and the traversal terminates at the leaf a. The traversal then starts again from the root. The next
bit is 1, for which the tree is traversed right. Since a leaf node has not been reached, the tree is
traversed right again for the next bit, 1, and once more for the bit after that, which reaches the leaf d.
Similarly, the tree is traversed for all the remaining bits of the code string: whenever the traversal
terminates at a leaf node, it starts again from the root for the next bit. The code string therefore
splits as 0 | 111 | 100 | 1101, and the character string after decoding is “adcf”.
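
The decoding walk can be imitated without building an explicit tree. The sketch below is a simplification, not the tree traversal itself: it accumulates bits in a buffer and emits a character as soon as the buffer equals a codeword, which is safe only because the code is prefix-free. The function name decode is hypothetical.

```python
def decode(bits, code):
    """Decode a bit string using a prefix-free {character: codeword} table."""
    lookup = {word: ch for ch, word in code.items()}
    out, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in lookup:          # equivalent to reaching a leaf
            out.append(lookup[buffer])
            buffer = ""               # equivalent to restarting at the root
    if buffer:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(out)

code = {"a": "0", "b": "101", "c": "100", "d": "111", "e": "1100", "f": "1101"}
print(decode("01111001101", code))  # adcf
```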

Algorithm HUFFMAN(n, S)

// n is the number of symbols and S is the set of characters; for each character c ∈ S, the
// frequency of occurrence is f(c) //

Q = S; // Initialize the priority queue Q with the frequencies of all the characters of set S //

for (i = 1; i <= n-1; i++) {

    z = CREATE_NODE(); // Create a node pointed to by z //

    // Delete the tree with minimum frequency from Q and store it in node x //
    x = DELETE_MIN(Q);

    // Delete the tree with the next minimum frequency from Q and store it in node y //
    y = DELETE_MIN(Q);

    z->left = x; // Place x as the left child of z //
    z->right = y; // Place y as the right child of z //

    // The value of node z is the sum of the values at node x and node y //
    f(z) = f(x) + f(y);

    // Insert z into the priority queue Q //
    INSERT(Q, z);
}

return DELETE_MIN(Q); // The single remaining tree is the root of the Huffman tree //
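
The pseudocode can be turned into a runnable sketch in Python. This is one possible rendering under stated assumptions, not a definitive implementation: heapq plays the role of the priority queue Q, an internal node is represented as a (left, right) tuple, and a running counter is added as a tie-breaker so that heap entries with equal frequencies remain comparable. The function name huffman and the helper assign are hypothetical.

```python
import heapq
from itertools import count

def huffman(freq):
    """Return {character: codeword} built by greedy merging of the two smallest trees."""
    tiebreak = count()  # assumption: insertion order breaks frequency ties
    heap = [(n, next(tiebreak), ch) for ch, n in freq.items()]
    heapq.heapify(heap)

    # n - 1 merges reduce n singleton trees to a single tree.
    for _ in range(len(freq) - 1):
        fx, _tx, x = heapq.heappop(heap)   # minimum frequency -> left child
        fy, _ty, y = heapq.heappop(heap)   # next minimum -> right child
        heapq.heappush(heap, (fx + fy, next(tiebreak), (x, y)))

    root = heap[0][2]
    codes = {}

    def assign(node, prefix):
        if isinstance(node, tuple):        # internal node: (left, right)
            assign(node[0], prefix + "0")
            assign(node[1], prefix + "1")
        else:                              # leaf: a character
            codes[node] = prefix or "0"    # single-symbol alphabet gets "0"

    assign(root, "")
    return codes

freq = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 5, "f": 9}
codes = huffman(freq)
print(codes)
print(sum(freq[ch] * len(codes[ch]) for ch in freq))  # 224
```

With these frequencies no two trees ever tie, so the resulting codewords agree with the ones listed earlier in this section (a = 0, b = 101, c = 100, d = 111, e = 1100, f = 1101).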
