Huffman Codes

Huffman coding is a greedy algorithm that assigns variable-length binary codes to characters based on their frequency. It builds a prefix tree from the character frequencies where each leaf node represents a character and the path from the root to the leaf is the character's code. This results in more frequent characters having shorter codes and less frequent characters having longer codes. The algorithm produces an optimal prefix code that compresses data by 20-90% depending on the character frequencies.


Greedy Algorithms

Huffman Codes
Huffman Codes
• Huffman codes compress data very effectively: savings of 20% to 90%
are typical, depending on the characteristics of the data being
compressed.
• We consider the data to be a sequence of characters.
• Huffman’s greedy algorithm uses a table giving how often each
character occurs (i.e., its frequency) to build up an optimal way of
representing each character as a binary string.
Huffman Codes
• Suppose we have a 100,000-character data file that we wish to store
compactly.
• We observe that the characters in the file occur with the frequencies
given by Figure 16.3.
• That is, only 6 different characters appear, and the character a occurs
45,000 times.
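
Figure 16.3 itself is not reproduced in this export. For reference, here is the frequency and codeword table the slide appears to rely on, written as Python data; the exact values are an assumption based on Figure 16.3 of CLRS, which this example follows.

# Character frequencies (in thousands) and the two codes discussed below,
# assumed to match CLRS Figure 16.3 (the figure is missing from this export).
freq_thousands = {'a': 45, 'b': 13, 'c': 12, 'd': 16, 'e': 9, 'f': 5}

fixed_length_code = {'a': '000', 'b': '001', 'c': '010',
                     'd': '011', 'e': '100', 'f': '101'}

variable_length_code = {'a': '0',   'b': '101',  'c': '100',
                        'd': '111', 'e': '1101', 'f': '1100'}

# Sanity check: the frequencies account for the whole 100,000-character file.
assert sum(freq_thousands.values()) * 1000 == 100_000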
Fixed-length code
• In a binary character code, each character is represented by a unique binary
string, which we call a codeword.
• If we use a fixed-length code, we need 3 bits to represent 6
characters:
• a = 000, b = 001, . . . , f = 101.
• This method requires 3 · 100,000 = 300,000 bits to code the entire file.

• Can we do better?
Variable-length code
• A variable-length code can do considerably better than a fixed-length
code, by giving frequent characters short codewords and infrequent
characters long codewords.
• Figure 16.3 shows such a code; here the 1-bit string 0 represents a, and the
4-bit string 1100 represents f.
• This code requires
(45·1 + 13·3 + 12·3 + 16·3 + 9·4 + 5·4) · 1,000 = 224,000 bits
to represent the file, a savings of approximately 25%. In fact, this is an
optimal character code for this file, as we shall see.
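
As a quick check of these numbers (a sketch only; the codeword lengths come from the variable-length code assumed above):

# Frequencies in thousands and variable-length codeword lengths (assumed, per Figure 16.3).
freq = {'a': 45, 'b': 13, 'c': 12, 'd': 16, 'e': 9, 'f': 5}
code_len = {'a': 1, 'b': 3, 'c': 3, 'd': 3, 'e': 4, 'f': 4}

fixed_bits = 3 * sum(freq.values()) * 1000                        # 300,000
variable_bits = sum(freq[c] * code_len[c] for c in freq) * 1000   # 224,000
print(f"savings: {1 - variable_bits / fixed_bits:.0%}")           # savings: 25%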
Prefix codes
• A prefix code is a code in which no codeword is a prefix of any other
codeword.
• For example, a code with code words {9, 55} has the prefix property; a code
consisting of {9, 5, 59, 55} does not, because "5" is a prefix of "59" and also
of "55".
Prefix codes
• Using prefix codes, a message can be transmitted as a sequence of
concatenated code words, without any out-of-band markers or (alternatively)
special markers between words to frame the words in the message.
• The recipient can decode the message unambiguously, by repeatedly finding
and removing sequences that form valid code words.
• This is not generally possible with codes that lack the prefix property, for
example {0, 1, 10, 11}: a receiver reading a "1" at the start of a code word
would not know whether that was the complete code word "1", or merely
the prefix of the code word "10" or "11"; so the string "10" could be
interpreted either as a single codeword or as the concatenation of the words
"1" then "0".
How to compute prefix codes?
Constructing a Tree
Algorithm
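
The construction and algorithm slides carry no text in this export (presumably they were diagrams and pseudocode images). As a reference, here is a minimal Python sketch of the standard greedy construction in the spirit of HUFFMAN(C) from CLRS: repeatedly merge the two least-frequent subtrees using a min-priority queue until one tree remains, then read codewords off the root-to-leaf paths.

import heapq
from collections import namedtuple

# Simple node record (an assumption; the slides' own representation is not shown).
# 'order' is a unique tie-breaker so the heap never has to compare child nodes.
Node = namedtuple("Node", ["freq", "order", "char", "left", "right"])

def huffman_tree(freq):
    """Build a Huffman tree from {char: frequency}; returns the root node."""
    heap = [Node(f, i, ch, None, None) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    order = len(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)   # the two least-frequent subtrees
        hi = heapq.heappop(heap)
        heapq.heappush(heap, Node(lo.freq + hi.freq, order, None, lo, hi))
        order += 1
    return heap[0]

def codewords(node, prefix="", table=None):
    """Read codewords off the tree: left edge = '0', right edge = '1'."""
    if table is None:
        table = {}
    if node.char is not None:      # leaf
        table[node.char] = prefix or "0"
    else:
        codewords(node.left, prefix + "0", table)
        codewords(node.right, prefix + "1", table)
    return table

freq = {'a': 45, 'b': 13, 'c': 12, 'd': 16, 'e': 9, 'f': 5}   # in thousands
print(codewords(huffman_tree(freq)))
# The codeword lengths match the variable-length code above: a:1, b/c/d:3, e/f:4.

With a binary min-heap, the loop performs |C| - 1 merges at O(lg |C|) cost each, so building the tree for an alphabet C takes O(|C| lg |C|) time, which is presumably the bound the Analysis slide derives.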
Analysis
