Huffman Code
Associated with each program i is a length l_i, 1 ≤ i ≤ n. All programs can be stored on the tape if and only if the sum of the lengths of the programs is at most l.
We assume that whenever a program is to be retrieved from this tape, the tape is initially positioned at
the front.
Hence, if the programs are stored in the order I = i_1, i_2, ..., i_n, the time t_j needed to retrieve program i_j is proportional to ∑_{1≤k≤j} l_{i_k}.
If all programs are retrieved equally often, then the expected or mean retrieval time (MRT) is (1/n) ∑_{1≤j≤n} t_j.
In the optimal storage on tape problem, we are required to find a permutation for the n programs so
that when they are stored on the tape in this order the MRT is minimized. This problem fits the ordering
paradigm.
Example: Let n = 3 and (l_1, l_2, l_3) = (5, 10, 3). There are n! = 6 possible orderings. These orderings and their respective d values (where d = ∑_{1≤j≤n} t_j is the total retrieval time, so MRT = d/n) are:

Ordering      d
1, 2, 3       5 + (5+10) + (5+10+3) = 38
1, 3, 2       5 + (5+3) + (5+3+10) = 31
2, 1, 3       10 + (10+5) + (10+5+3) = 43
2, 3, 1       10 + (10+3) + (10+3+5) = 41
3, 1, 2       3 + (3+5) + (3+5+10) = 29
3, 2, 1       3 + (3+10) + (3+10+5) = 34

The ordering 3, 1, 2 minimizes d and hence the MRT.
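Because n is small here, the optimum can also be checked by brute force. The following Python sketch (my own illustration, not part of the original text) enumerates all 3! orderings and confirms that 3, 1, 2 gives the smallest d:

from itertools import permutations

lengths = {1: 5, 2: 10, 3: 3}

def d(order):
    # d(order) = sum of the retrieval times t_j for this ordering
    total, prefix = 0, 0
    for i in order:
        prefix += lengths[i]
        total += prefix
    return total

best = min(permutations(lengths), key=d)
print(best, d(best))   # (3, 1, 2) 29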
A greedy approach to building the required permutation would choose the next program on the basis of some optimization measure. The next program to be stored on the tape would be one that minimizes the increase in d. We observe that the increase in d is minimized if the next program chosen is the one with the least length from among the remaining programs.
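In other words, the greedy rule is simply to store the programs in nondecreasing order of their lengths. A minimal Python sketch of this rule (illustrative code, the names are my own) that also reports d and the MRT:

def optimal_tape_order(lengths):
    # Greedy rule: store programs in nondecreasing order of length.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    d = 0        # total retrieval time, the sum of the t_j
    prefix = 0   # running sum l_{i_1} + ... + l_{i_j}
    for i in order:
        prefix += lengths[i]
        d += prefix
    return order, d, d / len(lengths)

order, d, mrt = optimal_tape_order([5, 10, 3])
print([i + 1 for i in order], d, mrt)   # [3, 1, 2] 29 9.666...

The pseudocode that follows (Algorithm Store) extends this idea to m tapes: assuming the programs are considered in this sorted order, it assigns them to the tapes in round-robin fashion.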
Algorithm Store(n, m)
// n is the number of programs and m the number of tapes.
// The programs are assumed to be indexed in nondecreasing order of length.
{
    j := 0; // Next tape to store on
    for i := 1 to n do
    {
        write ("append program", i, "to permutation for tape", j);
        j := (j + 1) mod m;
    }
}
The optimal merge pattern problem is based on the observation that two sorted files containing n and m records, respectively, can be merged together to obtain one sorted file in time O(n + m).
When more than two sorted files are to be merged together, the merge can be accomplished by
repeatedly merging sorted files in pairs.
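For reference, here is a minimal Python sketch of the basic two-file merge (illustrative, not from the original text); each record is moved to the output exactly once, which is where the O(n + m) cost comes from:

def merge(a, b):
    # Merge two sorted lists in O(len(a) + len(b)) time.
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])   # at most one of these two extends is non-empty
    out.extend(b[j:])
    return out

print(merge([1, 4, 9], [2, 3, 10]))   # [1, 2, 3, 4, 9, 10]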
Thus, if files x1, x2, x3 and x4 are to be merged, we could first merge x1 and x2 to get a file y1. Then we could merge y1 and x3 to get y2. Finally, we could merge y2 and x4 to get the desired sorted file.
Alternatively, we could first merge x1 and x2 to get y1, then merge x3 and x4 to get y2, and finally merge y1 and y2 to get the desired sorted file.
Given n sorted files, there are many ways in which to pairwise merge them into a single sorted file. Different pairings require differing amounts of computing time.
The problem we address now is that of determining an optimal way to pairwise merge n sorted files.
Example: Method 1: The files x1, x2 and x3 are three sorted files of lengths 30, 20, and 10 records, respectively. Merging x1 and x2 requires 50 record moves. Merging the result with x3 requires another 60 moves. The total number of record moves required to merge the three files this way is 110.
Method 2: If we instead merge x2 and x3 first (taking 30 moves) and then merge the result with x1 (taking 60 moves), the total number of record moves is only 90.
This suggests the greedy rule: at each step, merge the two smallest files currently available. Thus, if we have five files (x1, ..., x5) with sizes (20, 30, 10, 5, 30), our greedy rule would generate the following merge pattern:
1. Merge x4 and x3 to get z1 (15 moves)
2. Merge z1 and x1 to get z2 (35 moves)
3. Merge x2 and x5 to get z3 (60 moves)
4. Merge z2 and z3 to get the answer z4 (95 moves)
The total number of record moves is 15 + 35 + 60 + 95 = 205.
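The greedy choice is naturally implemented with a min-heap of file sizes. The following Python sketch (illustrative code, the function name is my own) reproduces the 205 record moves for the sizes above:

import heapq

def merge_cost(sizes):
    # Greedy two-way merge pattern: always merge the two smallest files.
    heap = list(sizes)
    heapq.heapify(heap)
    total_moves = 0
    while len(heap) > 1:
        a = heapq.heappop(heap)   # smallest remaining file
        b = heapq.heappop(heap)   # second smallest remaining file
        total_moves += a + b      # cost (record moves) of this merge
        heapq.heappush(heap, a + b)
    return total_moves

print(merge_cost([20, 30, 10, 5, 30]))   # 205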
A merge pattern such as the one just described will be referred to as a two-way merge pattern (each merge step involves the merging of two files). Two-way merge patterns can be represented by binary merge trees.
The figure below shows a binary merge tree representing the optimal merge pattern obtained for the above five files.
The leaf nodes are drawn as squares and represent the given five files.
These nodes are called external nodes. The remaining nodes are drawn as circles and are called internal
nodes. Each internal node has exactly two children, and it represents the file obtained by merging the
files represented by its two children. The number in each node is the length (i.e., the number of records)
of the file represented by that node.
The external node x4 is at a distance of 3 from the root node z4 (a node at level i is at a distance of i - 1 from the root). Hence, the records of file x4 are moved three times, once to get z1, once again to get z2, and finally one more time to get z4. If d_i is the distance from the root to the external node for file x_i and q_i is the length of x_i, then the total number of record moves for this binary merge tree is ∑_{1≤i≤n} d_i q_i.
This sum is called the weighted external path length of the tree.
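As a check (the arithmetic is worked out here, not taken from the original text), the merge pattern above places x4 and x3 at distance 3 and x1, x2 and x5 at distance 2, so the weighted external path length is 3·5 + 3·10 + 2·20 + 2·30 + 2·30 = 15 + 30 + 40 + 60 + 60 = 205, which agrees with the 205 record moves counted earlier.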
A well-known greedy algorithm is Huffman coding. The size of the code allocated to a character depends on the frequency of that character, which is why it is referred to as a greedy algorithm. A short code is assigned to the character with the highest frequency, and longer codes to characters with lower frequencies. Huffman coding employs variable-length encoding, which means that each character in the provided data stream is given its own variable-length code.
Prefix Rule
Essentially, this rule states that the code allocated to a character must not be a prefix of another character's code. If this rule is broken, ambiguities can appear when decoding a bit stream with the Huffman tree that has been created.
Let's look at an illustration of this rule to better comprehend it: For each character, a code is provided, such
as:
1. a-0
2. b-1
3. c - 01
Assuming that the produced bit stream is 001, it may be decoded in two ways:
1. 0 0 1 = aab
2. 0 01 = ac
Because the code of a (0) is a prefix of the code of c (01), the decoding is ambiguous; the prefix rule forbids such code assignments.
What is the Huffman Coding process?
The Huffman Code is obtained for each distinct character in primarily two steps:
o Create a Huffman Tree first using only the unique characters in the data stream provided.
o Second, we must traverse the constructed Huffman Tree to assign codes to the characters; these codes are then used to encode and decode the provided text.
The steps used to construct the Huffman tree from the characters provided are as follows.
Input:
string str = "abbcdbccdaabbeeebeab"
If Huffman Coding is employed in this case for data compression, the following information must be
determined for decoding:
The frequency of each character in the provided string must first be determined.
Character Frequency
a 4
b 7
c 3
d 2
e 4
1. Sort the characters by frequency, ascending. They are kept in a priority queue Q (a min-heap).
2. For each distinct character and its frequency in the data stream, create a leaf node.
3. Remove the two nodes with the lowest frequencies from the min-heap, and create a new internal node whose frequency is the sum of these two frequencies.
o While extracting the two lowest-frequency nodes, make the first extracted node the left child and the second extracted node the right child of the new node.
o Add this new node to the min-heap.
o The left side should always hold the smaller of the two frequencies.
4. Repeat step 3 until only one node is left in the heap, i.e., until all characters are represented by nodes in the tree. The tree is finished when just the root node remains (see the sketch below).
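Before walking through the example step by step, here is a minimal Python sketch of this construction using the standard heapq module (the function name and tree representation are my own; depending on how ties between equal frequencies are broken, some sibling 0/1 assignments may differ from the tables below, but the code lengths, and hence the compressed size, are the same):

import heapq
from collections import Counter

def huffman_codes(text):
    # Build a Huffman tree with a min-heap and return {character: code}.
    # Each heap entry is (frequency, tie_breaker, tree), where a tree is
    # either a single character (a leaf) or a [left, right] pair of subtrees.
    heap = [(f, i, ch) for i, (ch, f) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # lowest frequency
        f2, _, right = heapq.heappop(heap)   # second lowest frequency
        heapq.heappush(heap, (f1 + f2, count, [left, right]))
        count += 1
    codes = {}
    def walk(tree, code):
        if isinstance(tree, str):
            codes[tree] = code or "0"        # "or" handles a one-character alphabet
        else:                                # internal node: 0 to the left, 1 to the right
            walk(tree[0], code + "0")
            walk(tree[1], code + "1")
    walk(heap[0][2], "")
    return codes

print(huffman_codes("abbcdbccdaabbeeebeab"))
# e.g. {'a': '00', 'e': '01', 'd': '100', 'c': '101', 'b': '11'}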
Step 1: Build a min-heap containing 5 nodes (one for each unique character in the provided data stream), where each node is the root of a tree with a single node.
Step 2: Obtain the two minimum-frequency nodes from the min-heap. Add a new internal node, with frequency 2 + 3 = 5, created by joining the two extracted nodes.
o Now, there are 4 nodes in the min-heap, 3 of which are the roots of trees with a single element each,
and 1 of which is the root of a tree with two elements.
Step 3: In a similar manner, get the two minimum-frequency nodes from the heap. Add a new internal node formed by joining the two extracted nodes; its frequency in the tree is 4 + 4 = 8.
o Now the min-heap has three nodes: one node is the root of a tree with a single element, and two nodes are the roots of trees with multiple elements.
Step 4: Get the two minimum-frequency nodes. Add a new internal node formed by joining the two extracted nodes; its frequency in the tree is 5 + 7 = 12.
o When creating a Huffman tree, we must ensure that the smaller value is always on the left side and the larger value on the right side. The image below shows the tree formed so far:
Step 5: Get the next two minimum-frequency nodes. Add a new internal node formed by joining the two extracted nodes; its frequency in the tree is 12 + 8 = 20.
Continue until all of the distinct characters have been added to the tree. The Huffman tree created for the specified set of characters is shown in the image above.
Now, for each non-leaf node, assign 0 to the left edge and 1 to the right edge to create the code for each
letter.
o If we give the left edges weight 0, we should give the right edges weight 1.
o If the left edges are given weight 1, the right edges must be given weight 0.
o Either of the two conventions may be used.
o However, the same convention must be followed when decoding the tree as well.
o To obtain the Huffman code for each character from the resulting Huffman tree, we must traverse the tree until we reach the leaf node where that character is present.
o The weights along the edges must be recorded during traversal and concatenated to form the code of the character at that leaf node.
o The following example will help to further illustrate what we mean:
o To obtain the code for each character in the picture above, we must walk the entire tree (until all leaf nodes are covered).
o As a result, the tree that has been created is used to read off the code for each character. Below is a list of the codes for each character:
Character Frequency Code
a 4 01
b 7 11
c 3 101
d 2 100
e 4 00
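As a quick check of the compression achieved (this worked arithmetic is mine, not from the original text), encoding the 20-character string with these codes takes 4·2 + 7·2 + 3·3 + 2·3 + 4·2 = 45 bits, compared with 20 × 8 = 160 bits for plain 8-bit characters, or 20 × 3 = 60 bits if a fixed 3-bit code were used for the 5 distinct characters.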