Huffman Project Report

The algorithm developed for Huffman encoding takes as input a string of data symbols to be encoded, along with a vector containing the respective symbol probabilities. It calls two recursive functions to generate the Huffman dictionary and reports the average codeword length of the dictionary as output.


Huffman Coding

(EE 575: Source Coding Project)

Project Report

Submitted By:
Raza Umar

ID: g200905090
Algorithm Description

The algorithm developed for Huffman encoding takes as input a string of data symbols to be encoded, along with a vector containing the respective symbol probabilities. It calls two recursive functions to generate the Huffman dictionary and reports the average codeword length of the dictionary as output.

The central idea of the algorithm is to use cell structures in MATLAB to build the Huffman tree while keeping track of child and parent nodes. Once the tree has been built, the codeword corresponding to each input data symbol (which acts as a leaf node in the Huffman tree) can be found by simply traversing the tree from the root until that leaf node is encountered. The general structure contains cells for the input data symbol, its probability, and its original position in the string of symbols passed to the algorithm. Two additional cells have been added to the structure to keep information about the child nodes and the codeword of the current node. One instance of this structure is created for each data symbol, so M (= number of input data symbols) instances are filled with the known information and sorted in ascending order of probability. This results in M leaf nodes, corresponding to the M data symbols, arranged in ascending order of probability.
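The report's implementation uses MATLAB cell structures; as an illustrative sketch only, the same leaf-node setup can be expressed in Python with plain dictionaries (the field names `symbol`, `prob`, `org_order`, `children`, and `code` mirror the cells described above but are otherwise hypothetical):

```python
def make_leaf_nodes(symbols, probs):
    """Build one node per input symbol and sort by ascending probability."""
    nodes = [
        {"symbol": s, "prob": p, "org_order": i, "children": None, "code": ""}
        for i, (s, p) in enumerate(zip(symbols, probs))
    ]
    nodes.sort(key=lambda n: n["prob"])  # least-probable node first
    return nodes

leaves = make_leaf_nodes("abcd", [0.4, 0.1, 0.3, 0.2])
print([n["symbol"] for n in leaves])  # ['b', 'd', 'c', 'a']
```

Sorting the M leaf nodes up front means the two least-probable nodes are always at the front of the list when the tree is built.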

The Huffman tree is generated by passing this structure (with M nodes) to a recursive function, "gen_h_tree". This function combines the top two nodes (the nodes with the least probability) into one parent node. The parent node stores the two combined nodes as its child nodes, and its probability is the sum of the probabilities of the child nodes. The two child nodes are then removed from the Huffman tree, and the parent node is inserted into the tree according to its probability, so that all (M-1) nodes remain in ascending order of probability. Note that, by replacing two child nodes with one parent node, the number of nodes is reduced by one. The function is called recursively until the Huffman tree consists of only one final node, with probability 1.
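The recursive merge-and-reinsert step can be sketched in Python as follows (again only an illustration of the MATLAB approach described above; the node layout is the hypothetical dictionary form used earlier):

```python
import bisect

def gen_h_tree(nodes):
    """Recursively merge the two least-probable nodes until one root remains.

    `nodes` is a list of node dicts sorted in ascending order of probability.
    """
    if len(nodes) == 1:
        return nodes[0]                       # the final node, probability 1
    left, right = nodes[0], nodes[1]          # the two least-probable nodes
    parent = {"symbol": None,
              "prob": left["prob"] + right["prob"],
              "children": (left, right),
              "code": ""}
    rest = nodes[2:]
    # insert the parent so the remaining (M-1) nodes stay sorted by probability
    pos = bisect.bisect_left([n["prob"] for n in rest], parent["prob"])
    rest.insert(pos, parent)
    return gen_h_tree(rest)

leaves = sorted(
    ({"symbol": s, "prob": p, "children": None, "code": ""}
     for s, p in zip("abcd", [0.4, 0.1, 0.3, 0.2])),
    key=lambda n: n["prob"])
root = gen_h_tree(leaves)
print(round(root["prob"], 6))  # 1.0
```

Each call reduces the node count by one, so after M-1 merges only the root remains, exactly as the recursion terminates in the report's description.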

The Huffman dictionary is then generated by traversing this tree recursively down to the leaf nodes. Essentially, the Huffman dictionary is another structure containing cells for the input data symbol, its probability, its codeword, the length of the codeword, and the symbol's original position in the input string. Since the Huffman tree is a binary tree, each parent node contains information about its two child nodes. The child node with the lower probability is assigned bit 1, while the child node with the higher probability is assigned bit 0. These bits are concatenated along each path into a vector, which ultimately becomes the codeword of a node that has no children, i.e. a leaf node.
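The bit-assignment rule above can be sketched as a depth-first traversal; this Python fragment is an assumption-laden illustration (hand-built three-symbol tree, hypothetical field names), not the report's MATLAB code:

```python
def assign_codes(node, dictionary):
    """Depth-first walk: each leaf receives the concatenated bit string."""
    if node["children"] is None:         # leaf node: record its codeword
        dictionary[node["symbol"]] = node["code"]
        return
    lo, hi = node["children"]            # lo holds the smaller probability
    lo["code"] = node["code"] + "1"      # least-probable child gets bit 1
    hi["code"] = node["code"] + "0"      # more-probable child gets bit 0
    assign_codes(lo, dictionary)
    assign_codes(hi, dictionary)

# Hand-built tree for symbols a (p=0.5), b (0.3), c (0.2):
c = {"symbol": "c", "prob": 0.2, "children": None, "code": ""}
b = {"symbol": "b", "prob": 0.3, "children": None, "code": ""}
a = {"symbol": "a", "prob": 0.5, "children": None, "code": ""}
cb = {"symbol": None, "prob": 0.5, "children": (c, b), "code": ""}
root = {"symbol": None, "prob": 1.0, "children": (cb, a), "code": ""}

h_dict = {}
assign_codes(root, h_dict)
print(dict(sorted(h_dict.items())))  # {'a': '0', 'b': '10', 'c': '11'}
```

Note that swapping the 1/0 convention between siblings would yield an equally valid prefix code with the same lengths; the choice here follows the convention stated in the report.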

Each time a leaf node is encountered, the weighted length of its codeword is accumulated in a variable, "avglen", initialized to 0. Once all leaf nodes have been assigned their codewords, this variable holds the average codeword length of the dictionary. The codeword dictionary is then arranged according to the desired output format, e.g. the same order as the input data symbols (original order) or ascending/descending order of code length.
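The accumulation itself is a weighted sum of code lengths; a minimal worked example (the three-symbol source and codewords here are hypothetical, not from the report):

```python
# Codeword dictionary and probabilities for a hypothetical 3-symbol source.
probs = {"a": 0.5, "b": 0.3, "c": 0.2}
codes = {"a": "0", "b": "10", "c": "11"}

avglen = 0.0
for sym, code in codes.items():
    avglen += probs[sym] * len(code)  # weighted code length per leaf

print(avglen)  # 0.5*1 + 0.3*2 + 0.2*2 = 1.5 bits/symbol
```

For comparison, the source entropy here is about 1.485 bits/symbol, so the average length of 1.5 is close to optimal, as expected for a Huffman code.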

The algorithm then outputs each data symbol along with its respective codeword from the codeword dictionary.
Algorithm Flowchart

Start

Read the inputs:
1. String of input data symbols
2. Vector of respective probabilities

Fill in the structure h_tree:
• h_tree.symbol, h_tree.prob and h_tree.org_order are filled for each input data symbol
• The M nodes are sorted in ascending order of probability

Generate h_tree (repeat while the structure has more than one node):
• Combine the top two nodes to form a parent node
• The combined nodes become the two child nodes of the parent node
• The probability of the parent node is the sum of the probabilities of its child nodes
• Insert new_node:
      index = 1
      while (new_node.prob > h_tree(index).prob)
          index = index + 1
      place new_node before h_tree(index) in struct h_tree

When only one node remains, set avglen = 0 and pass h_tree to Generate h_dict.

Generate h_dict:
• If the current node is not a leaf node:
      for i = 1:2
          h_tree.child{i}.code = [h_tree.code 2-i]
          call Generate h_dict with h_tree.child{i} as input
      end for
• If the current node is a leaf node:
      copy h_tree to h_dict
      avglen = avglen + h_tree.prob * length(h_tree.code)

Display Output:
• Sort h_dict instances according to the original order of the input data symbols
• Output the symbols and their respective codewords
• Output the avglen of the codeword dictionary
