Algorithm Design Paradigm-3
Paradigm - 3
Can we do better?
Huffman Code
• A variable-length code can do considerably better than a fixed-length code, by
giving frequent characters short codewords and infrequent characters long
codewords.
• Figure 16.3 shows such a code; here the 1-bit string 0 represents a, and the
4-bit string 1100 represents f. This code requires
(45 × 1 + 13 × 3 + 12 × 3 + 16 × 3 + 9 × 4 + 5 × 4) × 1,000 = 224,000 bits to
represent the file, a savings of approximately 25% over the 300,000 bits that a
3-bit fixed-length code needs for the same 100,000 characters.
• In fact, this is an optimal character code for this file.
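The arithmetic above can be checked with a short sketch. It assumes the six characters a to f have the frequencies (in thousands) and codeword lengths that appear, in order, in the cost formula; the names `freq`, `var_len`, and `total_bits` are illustrative and not part of the slides.

```python
# Frequencies in thousands of occurrences, read off the cost formula
# in order (assumed to correspond to the characters a..f).
freq = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}

# Codeword lengths of the variable-length code: a uses 1 bit,
# b, c and d use 3 bits, e and f use 4 bits.
var_len = {"a": 1, "b": 3, "c": 3, "d": 3, "e": 4, "f": 4}


def total_bits(freq, code_len, scale=1000):
    """Sum of frequency * codeword length, scaled to actual occurrences."""
    return scale * sum(freq[ch] * code_len[ch] for ch in freq)


variable_cost = total_bits(freq, var_len)
# Six characters need 3 bits each under a fixed-length code.
fixed_cost = total_bits(freq, {ch: 3 for ch in freq})
savings = 1 - variable_cost / fixed_cost

print(variable_cost, fixed_cost, f"{savings:.1%}")
```

The saving is computed relative to the fixed-length cost, which is where the "approximately 25%" figure comes from.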
Huffman Code: Greedy Approach
• Prefix codes
• In a prefix code, the codewords (bit sequences) are assigned so that the
codeword of one character is never a prefix of the codeword of any other
character.
• This is how Huffman coding makes sure that there is no ambiguity when
decoding the generated bitstream in variable-length coding.
• For example, let there be four characters a, b, c and d, with the
corresponding variable-length codes 00, 01, 0 and 1.
• This coding leads to ambiguity because the code assigned to c is a prefix of
the codes assigned to a and b.
• If the compressed bit stream is 0001, the decompressed output may be
“cccd”, “ccb”, “acd”, “cad” or “ab”.
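The ambiguity in this example can be reproduced with a small sketch. The helpers `decodings` and `is_prefix_free` are hypothetical names introduced here, and the code `prefix_free` is an illustrative prefix code chosen for contrast, not one taken from the slides.

```python
def decodings(bits, code):
    """Return every way to split `bits` into codewords of `code`."""
    if not bits:
        return [""]
    results = []
    for ch, word in code.items():
        if bits.startswith(word):
            # Consume one codeword, then decode the remainder recursively.
            results += [ch + rest for rest in decodings(bits[len(word):], code)]
    return results


def is_prefix_free(code):
    """True if no codeword is a prefix of a different codeword."""
    words = list(code.values())
    return not any(
        i != j and words[j].startswith(words[i])
        for i in range(len(words))
        for j in range(len(words))
    )


# The ambiguous code from the example: c = 0 is a prefix of a = 00 and b = 01.
ambiguous = {"a": "00", "b": "01", "c": "0", "d": "1"}

# An illustrative prefix code for the same four characters (an assumption).
prefix_free = {"a": "0", "b": "10", "c": "110", "d": "111"}

print(is_prefix_free(ambiguous), sorted(decodings("0001", ambiguous)))
print(is_prefix_free(prefix_free), decodings("0110", prefix_free))
```

With the ambiguous code the bit stream 0001 has several valid parses, while a prefix code yields exactly one parse for any bit stream it can decode.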
• Huffman invented a greedy algorithm that constructs an optimal prefix code
called a Huffman code.
• Assume that C is a set of n characters and that each character
c ∈ C is an object with an attribute c.freq giving its frequency.
• The algorithm builds the tree T corresponding to the optimal
code in a bottom-up manner. It begins with a set of |C| leaves
and performs a sequence of |C|-1 “merging” operations to
create the final tree.
• The algorithm uses a min-priority queue Q, keyed on the freq
attribute, to identify the two least-frequent objects to merge
together.
• When we merge two objects, the result is a new object whose
frequency is the sum of the frequencies of the two objects
that were merged.
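The steps above can be sketched in Python as follows. This is a minimal illustration, not the slides' own pseudocode: it uses the standard `heapq` module as the min-priority queue Q, represents an internal node as a `(left, right)` tuple, and adds a tie-breaking counter so that heap entries with equal frequencies stay comparable. The function name `huffman` and the sample frequencies are assumptions made for the example.

```python
import heapq
from itertools import count


def huffman(freq):
    """Build a Huffman tree bottom-up and return {character: codeword}."""
    tiebreak = count()  # keeps entries comparable when frequencies tie
    # Heap entries are (frequency, tiebreak, tree). A tree is either a
    # character (leaf) or a (left, right) tuple (internal node).
    heap = [(f, next(tiebreak), ch) for ch, f in freq.items()]
    heapq.heapify(heap)

    # |C| - 1 merges: each one removes the two least-frequent trees and
    # inserts a new tree whose frequency is the sum of the two.
    for _step in range(len(freq) - 1):
        f1, _t1, left = heapq.heappop(heap)
        f2, _t2, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))

    root = heap[0][2]
    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")  # left edge labelled 0
            walk(node[1], prefix + "1")  # right edge labelled 1
        else:
            # A single-character alphabet still needs a 1-bit codeword.
            codes[node] = prefix or "0"

    walk(root, "")
    return codes


# Sample frequencies in thousands (assumed to match the earlier file example).
freq = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
codes = huffman(freq)
cost = sum(freq[ch] * len(codes[ch]) for ch in freq)
print(codes, cost)
```

The exact bit patterns depend on how ties and left/right placement are resolved, so they may differ from a textbook figure; the codeword lengths and the total cost are what the greedy construction guarantees to be optimal.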
Huffman Code: Greedy Approach