Huffman Code
Dr. Raghavendra Mishra
Huffman Codes
• Most character code systems (ASCII, Unicode) use fixed-length encoding
• If frequency data is available and the frequencies vary widely, variable-length encoding can save 20% to 90% of the space
• Which characters should be assigned shorter codes, and which should get longer ones?
• At first it is not obvious how decoding will happen, but it is possible if we use prefix codes

Prefix Codes
• No encoding of a character may be a prefix of the longer encoding of another character; for example, we could not encode t as 01 and x as 01101, since 01 is a prefix of 01101
• A binary tree representation generates prefix codes, provided all letters are leaves

Some Properties
• Prefix codes allow easy decoding
  – Given a: 0, b: 101, c: 100, d: 111, e: 1101, f: 1100
  – Decode 001011101 going left to right: 0|01011101 → a|0|1011101 → a|a|101|1101 → a|a|b|1101 → a|a|b|e
• An optimal code is always represented by a full binary tree (a tree in which every internal node has two children)
• A tree with C leaves has C − 1 internal nodes
• The number of bits to encode a file is
  B(T) = Σ_{c ∈ C} f(c) · dT(c)
where f(c) is the frequency of c and dT(c) is the depth of c in the tree T, which equals the length of c's codeword.

Building the Encoding Tree
The Algorithm
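The prefix-free decoding walk and the bit-count formula B(T) can be illustrated with a short Python sketch. The code table is the slides' example (a: 0, b: 101, …); the frequencies are illustrative assumptions, not given in the slides.

```python
# Example prefix code from the slides; no codeword is a prefix of another.
CODES = {"a": "0", "b": "101", "c": "100", "d": "111", "e": "1101", "f": "1100"}
DECODE = {bits: ch for ch, bits in CODES.items()}

def decode(bitstring):
    """Scan left to right, emitting a character whenever the buffered bits
    form a complete codeword; unambiguous because the code is prefix-free."""
    out, buf = [], ""
    for bit in bitstring:
        buf += bit
        if buf in DECODE:      # a complete codeword has been read
            out.append(DECODE[buf])
            buf = ""
    if buf:
        raise ValueError("input does not end on a codeword boundary")
    return "".join(out)

print(decode("001011101"))  # the slides' example: prints aabe

# B(T) = sum over c of f(c) * dT(c); dT(c) equals the codeword length.
# These frequencies are made up for illustration, not taken from the slides.
freq = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
bits_needed = sum(freq[c] * len(CODES[c]) for c in CODES)
print(bits_needed)  # 45*1 + 13*3 + 12*3 + 16*3 + 9*4 + 5*4 = 224
```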
• An appropriate data structure is a binary min-heap
• Rebuilding the heap takes O(lg n) time, and n − 1 merge steps (each extracting the two minimum-frequency trees) are performed, so the complexity is O(n lg n)
• The encoding is NOT unique; other encodings may work just as well, but none will work better

Correctness of Huffman's Algorithm
• The following results are presented without proof
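The greedy construction described above — repeatedly extracting the two minimum-frequency trees from a binary min-heap and merging them — can be sketched in Python, using heapq from the standard library as the min-heap. This is a minimal sketch under those assumptions, not the slides' exact pseudocode.

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a Huffman code for `text` via the greedy min-heap algorithm.

    n - 1 merge steps are performed, each with O(lg n) heap work,
    for O(n lg n) overall (n = number of distinct characters).
    """
    freq = Counter(text)
    # Heap entries: (frequency, tie-breaker, tree). A tree is either a
    # character (leaf) or a (left, right) pair (internal node); the
    # tie-breaker keeps tuple comparison away from the trees themselves.
    heap = [(f, i, ch) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tick = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)   # two lowest-frequency trees
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, tick, (t1, t2)))
        tick += 1
    _, _, tree = heap[0]

    # Read off codewords: left edge = 0, right edge = 1; the depth of a
    # leaf equals the length of its codeword.
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):       # internal node: recurse
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                             # leaf: record the codeword
            codes[node] = prefix or "0"   # single-symbol edge case
    walk(tree, "")
    return codes
```

Because ties in the heap can be broken either way, the returned code is one of several equally optimal encodings, matching the slides' remark that the encoding is not unique.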