Huffman Algorithm - Code Construction 2
Huffman Coding
Huffman codes can be used to compress information
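This is similar in spirit to tools like WinZip, although WinZip doesn't rely on the Huffman algorithm alone (the DEFLATE method used in ZIP files pairs LZ77 with Huffman coding)
JPEGs also use Huffman coding as part of their compression process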
The basic idea is that instead of storing each character in a file as an 8-bit ASCII value, we will instead store the more frequently occurring characters using fewer bits and less frequently occurring characters using more bits
On average, this should decrease the file size
As an example, let's take the string:
duke blue devils
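The first step is a frequency count of the characters in the string. A minimal sketch, assuming Python and its standard `collections.Counter` (the variable names `text` and `freq` are illustrative, not from the slides):

```python
from collections import Counter

text = "duke blue devils"

# Count how often each character occurs in the example string.
freq = Counter(text)

# Print the counts in the "character,frequency" style used on the slides,
# writing the space character as "sp".
for ch, n in freq.most_common():
    label = "sp" if ch == " " else ch
    print(f"{label},{n}")
```

The counts add up to 16, the length of the string, with `e` the most frequent character at 3 occurrences.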
We start with one node per character, weighted by its frequency
We then pick the two nodes with the smallest frequencies and combine them to form a new node whose frequency is the sum of the two
The selection of these nodes is the greedy part
The two selected nodes are removed from the set and replaced by the combined node
This continues until we have only 1 node left in the set
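The greedy loop can be sketched with a priority queue. This is one possible implementation, assuming Python's `heapq` and a non-empty input string; the function name `build_huffman_tree` and the tuple-based tree representation are choices made for this sketch, not something the slides prescribe.

```python
import heapq
from collections import Counter
from itertools import count

def build_huffman_tree(text):
    """Return (weight, tree); a tree is a character (leaf) or a (left, right) pair."""
    tie = count()  # tie-breaker so the heap never has to compare trees
    heap = [(w, next(tie), ch) for ch, w in Counter(text).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Greedy step: remove the two nodes with the smallest frequencies...
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        # ...and put back one combined node whose frequency is their sum.
        heapq.heappush(heap, (w1 + w2, next(tie), (t1, t2)))
    weight, _, tree = heap[0]
    return weight, tree

weight, tree = build_huffman_tree("duke blue devils")
print(weight)  # 16, the length of the string
```

Ties between equal frequencies may be broken differently here than in the worked example, so the tree shape can differ. Any tree produced by this procedure gives the same total encoded length.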
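Initial nodes, written as character,frequency (sp is the space character):
e,3 d,2 u,2 l,2 sp,2 k,1 b,1 v,1 i,1 s,1

In the steps below, a combined node is written as its frequency followed by its children in parentheses

Step 1: combine i,1 and s,1 into a node of frequency 2
e,3 d,2 u,2 l,2 sp,2 k,1 b,1 v,1 2(i s)

Step 2: combine b,1 and v,1 into a node of frequency 2
e,3 d,2 u,2 l,2 sp,2 k,1 2(b v) 2(i s)

Step 3: combine k,1 and 2(b v) into a node of frequency 3
e,3 d,2 u,2 l,2 sp,2 3(k 2(b v)) 2(i s)

Step 4: combine l,2 and sp,2 into a node of frequency 4
e,3 d,2 u,2 4(l sp) 3(k 2(b v)) 2(i s)

Step 5: combine d,2 and u,2 into a node of frequency 4
e,3 4(d u) 4(l sp) 3(k 2(b v)) 2(i s)

Step 6: combine 2(i s) and 3(k 2(b v)) into a node of frequency 5
e,3 4(d u) 4(l sp) 5(2(i s) 3(k 2(b v)))

Step 7: combine e,3 and 4(d u) into a node of frequency 7
7(e 4(d u)) 4(l sp) 5(2(i s) 3(k 2(b v)))

Step 8: combine 4(l sp) and 5(2(i s) 3(k 2(b v))) into a node of frequency 9
7(e 4(d u)) 9(4(l sp) 5(2(i s) 3(k 2(b v))))

Step 9: combine the remaining nodes 7 and 9 into the root, with frequency 16

Final tree (the left child is listed first):
16
  7
    e,3
    4
      d,2
      u,2
  9
    4
      l,2
      sp,2
    5
      2
        i,1
        s,1
      3
        k,1
        2
          b,1
          v,1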
Now we assign codes to the tree by placing a 0 on every left branch and a 1 on every right branch
A traversal of the tree from root to leaf gives the Huffman code for that particular leaf character
Note that no code is the prefix of another code, so an encoded bit stream can be decoded unambiguously
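The code assignment can be sketched as a recursive walk. The tree below is the final tree of the example written as nested `(left, right)` pairs, with its shape read off from the codes used for the encoded string. The names `TREE`, `assign_codes`, and `is_prefix_free` are illustrative.

```python
# Final tree of the example as nested (left, right) pairs.
# Leaves are single characters; " " is the space shown as "sp".
TREE = (
    ("e", ("d", "u")),                               # subtree of frequency 7
    (("l", " "), (("i", "s"), ("k", ("b", "v")))),   # subtree of frequency 9
)

def assign_codes(node, prefix=""):
    """Walk root to leaf, appending 0 for a left branch and 1 for a right one."""
    if isinstance(node, str):  # leaf: the path so far is this character's code
        return {node: prefix}
    left, right = node
    codes = assign_codes(left, prefix + "0")
    codes.update(assign_codes(right, prefix + "1"))
    return codes

def is_prefix_free(codes):
    """True if no code is a prefix of a different code."""
    values = list(codes.values())
    return not any(a != b and b.startswith(a) for a in values for b in values)

codes = assign_codes(TREE)
print(codes["e"], codes["v"])   # 00 11111
print(is_prefix_free(codes))    # True
```

Because every character sits at a leaf, no root-to-leaf path can pass through another character, which is why the prefix check succeeds.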
Reading the codes off the tree gives:
Character  Code
e          00
d          010
u          011
l          100
sp         101
i          1100
s          1101
k          1110
b          11110
v          11111
These codes are then used to encode the string
Thus, "duke blue devils" turns into:
010 011 1110 00 101 11110 100 011 00 101 010 00 11111 1100 100 1101
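As a quick check of the encoding, the sketch below concatenates the per-character codes and compares the result with 8 bits per character. The `CODES` table is taken from the encoded string above; the name `encode` is an assumption of this sketch.

```python
# Code table for the example; " " is the space character ("sp").
CODES = {
    "e": "00", "d": "010", "u": "011", "l": "100", " ": "101",
    "i": "1100", "s": "1101", "k": "1110", "b": "11110", "v": "11111",
}

def encode(text, codes):
    """Concatenate the code of every character in the text."""
    return "".join(codes[ch] for ch in text)

text = "duke blue devils"
bits = encode(text, CODES)
print(len(bits))       # 52 bits with the Huffman codes
print(8 * len(text))   # 128 bits at 8 bits per character
```

For this string the Huffman encoding needs 52 bits instead of 128, not counting the table that has to be stored alongside it.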
Uncompressing works by reading in the file bit by bit
Start at the root of the tree
If a 0 is read, head left
If a 1 is read, head right
When a leaf is reached, decode that character and start over again at the root of the tree
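The decoding walk can be sketched as follows, assuming the tree is stored as nested `(left, right)` pairs with characters at the leaves and the bits arrive as a string of `0` and `1` characters. The representation and the name `decode` are assumptions of this sketch.

```python
# Final tree of the example as nested (left, right) pairs; " " is "sp".
TREE = (
    ("e", ("d", "u")),
    (("l", " "), (("i", "s"), ("k", ("b", "v")))),
)

def decode(bits, tree):
    """Walk the tree bit by bit, emitting a character at each leaf."""
    out = []
    node = tree
    for bit in bits:
        node = node[0] if bit == "0" else node[1]  # 0: head left, 1: head right
        if isinstance(node, str):                  # reached a leaf
            out.append(node)
            node = tree                            # start over at the root
    return "".join(out)

encoded = "010 011 1110 00 101 11110 100 011 00 101 010 00 11111 1100 100 1101"
print(decode(encoded.replace(" ", ""), TREE))  # duke blue devils
```

No separators are needed between codes in the bit stream, because reaching a leaf marks the end of each code.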
Thus, we need to save Huffman table information as a header in the compressed file
This doesn't add a significant amount of size to the file for large files (which are the ones you want to compress anyway)
Or we could use a fixed universal set of codes/frequencies
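To get a feel for the header overhead, the sketch below serializes the example's code table as JSON. The JSON layout and the names `make_header` and `read_header` are purely hypothetical; the slides do not specify a header format, and real file formats use more compact encodings.

```python
import json

# Code table for the example; " " is the space character ("sp").
CODES = {
    "e": "00", "d": "010", "u": "011", "l": "100", " ": "101",
    "i": "1100", "s": "1101", "k": "1110", "b": "11110", "v": "11111",
}

def make_header(codes):
    """Serialize the code table as UTF-8 JSON (a hypothetical header format)."""
    return json.dumps(codes, separators=(",", ":")).encode("utf-8")

def read_header(header):
    """Recover the code table from the header bytes."""
    return json.loads(header.decode("utf-8"))

header = make_header(CODES)
print(read_header(header) == CODES)   # True: the table survives the round trip
print(8 * len(header), "header bits vs. 52 data bits")
```

For a 16-character string the header is far larger than the encoded data. The table size is bounded by the number of distinct characters rather than the length of the file, so the overhead becomes negligible as the file grows.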