5 Huffman Coding
Huffman Encoding/Decoding
Dr. Rubi Quiñones
Computer Science Department
Southern Illinois University Edwardsville
Course Timeline
Algorithmic analysis
• Introduction to algorithms
• Median findings and order statistics
• Time complexity
Greedy strategies
• Activity selection problems
Advanced
• Algorithm intractability
• Randomized algorithms
Greedy Approach
• Huffman Encoding
• Huffman Decoding
Before Compression
• Code lengths are assigned based on character frequencies
• These variable-length codes are known as prefix codes: no code is a prefix of another
Total bits: 174
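The 174-bit total is what a fixed-length code costs. A minimal sketch of that arithmetic, assuming the seven-character frequency table that appears in the Version 1 slides (the function name is illustrative, not from the slides):

```python
import math

# Frequencies assumed from the Version 1 table in these slides.
freqs = {"A": 10, "E": 15, "I": 12, "S": 3, "T": 4, "P": 13, "\n": 1}

def fixed_length_bits(freqs):
    """Total bits when every symbol gets a binary code of the same length."""
    bits_per_symbol = max(1, math.ceil(math.log2(len(freqs))))
    return bits_per_symbol * sum(freqs.values())

print(fixed_length_bits(freqs))  # 3 bits x 58 characters = 174
```

Seven distinct characters need 3 bits each, and the frequencies sum to 58 characters.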
Before Compression
Code Tree
[Figure: fixed-length code tree with branches labeled 0 and 1]
• A left traversal in the tree has a value of 0; a right traversal has a value of 1.
• If we want A, we traverse the tree and get 000.
• What is the character code for P? 101
We want to do better!
Introduction of Compression
19th century – what was the earliest form of compression?
Introduction of Compression
19th century – Morse Code (they didn’t know it, but they were doing compression!)
1949 – Shannon-Fano coding is introduced. It assigns codes based on the probability of each symbol occurring.
1952 – Huffman Encoding/Decoding
1977 – Lempel-Ziv
Huffman Encoding
• What is it? It’s a lossless data compression algorithm
• We want to minimize the size of the file, based on the frequencies of the symbols
• There are two parts in performing Huffman Coding
• Build a Huffman tree from input characters
• Traverse the Huffman tree and assign codes to characters
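The two parts can be sketched as follows. This is a minimal sketch of the standard priority-queue construction, which always merges the two lowest-frequency subtrees; it is not necessarily the same procedure as the hand-built versions in these slides. The function names and the tuple-based tree are assumptions made for illustration.

```python
import heapq
from itertools import count

def build_tree(freqs):
    """Part 1: repeatedly merge the two lowest-frequency nodes."""
    tie = count()  # tie-breaker so the heap never compares tree nodes
    heap = [(f, next(tie), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # An internal node is a (left, right) tuple; a leaf is the symbol.
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))
    return heap[0][2]

def assign_codes(node, prefix="", codes=None):
    """Part 2: walk the tree, 0 for a left branch and 1 for a right branch."""
    if codes is None:
        codes = {}
    if isinstance(node, tuple):
        assign_codes(node[0], prefix + "0", codes)
        assign_codes(node[1], prefix + "1", codes)
    else:
        codes[node] = prefix or "0"  # a lone symbol still gets one bit
    return codes

# Frequencies assumed from the Version 1 table in these slides.
freqs = {"A": 10, "E": 15, "I": 12, "S": 3, "T": 4, "P": 13, "\n": 1}
codes = assign_codes(build_tree(freqs))
total_bits = sum(len(codes[s]) * f for s, f in freqs.items())
print(total_bits)  # 146
```

On this table the sketch needs 146 bits, compared with 174 bits for 3-bit fixed-length codes.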
Huffman Encoding
Version 1
[Figure sequence: the Version 1 tree is built one character at a time; the same step is REPEATED for each remaining character]
Huffman Encoding
Version 1
[Figure: Version 1 code tree with root 58; left branches are labeled 0 and right branches are labeled 1]
Char  Code    Freq  Total Bits
A     1110    10    40
E     0       15    15
I     10      12    24
S     111111  3     18
T     11110   4     20
P     110     13    39
\n    111110  1     6
Total bits: 162 (vs 174 bits before compression)
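The 162-bit total is the sum of code length times frequency over all characters. A small sketch that recomputes it from the Version 1 table (the dictionary layout and function name are illustrative):

```python
# (code, frequency) pairs copied from the Version 1 table.
version1 = {
    "A": ("1110", 10), "E": ("0", 15), "I": ("10", 12), "S": ("111111", 3),
    "T": ("11110", 4), "P": ("110", 13), "\n": ("111110", 1),
}

def total_bits(table):
    """Sum of len(code) * frequency over every character in the table."""
    return sum(len(code) * freq for code, freq in table.values())

print(total_bits(version1))  # 162
```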
Huffman Encoding
Version 1 (we just did): we were assigning the left nodes ourselves.
Version 2: keep assigning left until our sum is greater than any frequency on the remaining list. If that happens, start a new tree.
[Figure: code tree with branches labeled 0 (left) and 1 (right)]
Huffman Encoding
Version 3
Assigning left and right while building the tree.
[Figure: Version 3 code tree; node values 58, 43, 30, 18, 10; leaves E (15), I (13), P (12), A (10)]
Huffman Coding
Take a look at Version 2.
Goal: to reduce the bits, we must reduce the code tree.
[Figure sequence: the Version 2 tree is built step by step]
• REPEAT: keep adding the next character to the tree.
• WAIT! 18 is now higher than anything on the remaining list…
• Now we're left with E. Where should we put it? With the lowest subtree.
• Are we done now? No, we have to combine the subtrees now!
• Fill in the left and right traversal values.
[Figure: finished Version 2 tree with branches labeled 0 and 1]
Huffman Encoding
Version 3
Adding characters by alternating left and right.
[Figure: Version 3 code tree; node values 58, 43, 30, 18, 10; leaves E (15), I (13), P (12), A (10)]
Version 1 Version 2 Version 3
Char  Code    Freq  Total Bits    Char  Code    Freq  Total Bits
A     1110    10    40            A     1111    10    40
E     0       15    15            E     0       15    15
I     10      12    24            I     10      12    24
S     111111  3     18            S     111011  3     18
T     11110   4     20            T     11100   4     20
P     110     13    39            P     110     13    39
\n    111110  1     6             \n    111010  1     6
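Whichever version is used, the resulting codes must stay prefix-free, or decoding becomes ambiguous. A minimal sketch of that check, applied to the two code columns in the comparison table (the function and variable names are illustrative):

```python
def is_prefix_free(codes):
    """True if no code is a prefix of a different code."""
    ordered = sorted(codes)
    # After sorting, any prefix relation shows up between neighbours.
    return all(not b.startswith(a) for a, b in zip(ordered, ordered[1:]))

# Code columns copied from the comparison table.
first_column = ["1110", "0", "10", "111111", "11110", "110", "111110"]
second_column = ["1111", "0", "10", "111011", "11100", "110", "111010"]

print(is_prefix_free(first_column), is_prefix_free(second_column))  # True True
```

Both columns pass the check, and both use the same code lengths, so they produce the same total number of bits.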
Huffman Encoding
There is a different end case in this example. How would you approach it?
Huffman Encoding
Char  Freq  Before Code  After Code
f     45    111          0
e     16    110          10
d     13    101          110
c     12    010          1110
a     5     000          11110
b     9     001          11111
[Figure: code tree with root 100 and internal nodes 55, 39, 26, 14; left branches are 0, right branches are 1]
Before compression: (3x5)+(3x9)+(3x12)+(3x13)+(3x16)+(3x45) = 300 bits
After compression: (5x5)+(5x9)+(4x12)+(3x13)+(2x16)+(1x45) = 234 bits
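The codes in this example come from a fully skewed tree: each character hangs off the left branch, one level deeper than the last, and the final two characters share the deepest node. A sketch under that assumption (the helper `chain_codes` is hypothetical, and the root-to-bottom character order is read off the table, not computed):

```python
def chain_codes(symbols):
    """Codes for a fully skewed tree; symbols are listed from the root down.

    Assumes at least two symbols.
    """
    codes = {}
    for depth, sym in enumerate(symbols[:-1]):
        codes[sym] = "1" * depth + "0"  # leaf on the left branch at this depth
    codes[symbols[-1]] = "1" * (len(symbols) - 1)  # deepest right leaf
    return codes

freqs = {"f": 45, "e": 16, "d": 13, "c": 12, "a": 5, "b": 9}
codes = chain_codes(["f", "e", "d", "c", "a", "b"])
before = 3 * sum(freqs.values())
after = sum(len(codes[s]) * n for s, n in freqs.items())
print(before, after)  # 300 234
```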
Huffman Decoding
Char  Before Code  After Code
a     000          11110
b     001          11111
c     010          1110
d     101          110
e     110          10
f     111          0
[Figure: code tree rooted at R; decoded leaves D and A are marked as they are reached]
Perform Huffman decoding.
1. Start at the root and follow the bits until a leaf is found.
2. If the current bit is 0, we move to the left node of the tree.
3. If the current bit is 1, we move to the right node of the tree.
4. If we encounter a leaf node during traversal, we print its character and start at the root node again.
Decode: 11011110110
• Is 110 a leaf node? Yes, it is D. Record it and restart at the root node.
• Is 11110 a leaf node? Yes, it's A. Record it and restart at the root node.
• The remaining bits, 110, reach D again, so the message decodes to DAD.
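The four decoding steps can be sketched as follows. This is a minimal sketch, assuming the code table from the example (a = 11110, b = 11111, c = 1110, d = 110, e = 10, f = 0); the nested-dictionary tree and the function name are choices made for illustration.

```python
def decode(bits, codes):
    """Decode a bit string with a prefix-code table {symbol: code}."""
    # Build the tree as nested dicts: each node maps "0"/"1" to a child.
    root = {}
    for sym, code in codes.items():
        node = root
        for bit in code[:-1]:
            node = node.setdefault(bit, {})
        node[code[-1]] = sym  # a leaf stores the symbol itself
    out, node = [], root
    for bit in bits:
        node = node[bit]           # steps 2 and 3: follow the 0 or 1 branch
        if isinstance(node, str):  # step 4: reached a leaf
            out.append(node)
            node = root            # restart at the root
    if node is not root:
        raise ValueError("bit string ends in the middle of a code")
    return "".join(out)

codes = {"a": "11110", "b": "11111", "c": "1110", "d": "110", "e": "10", "f": "0"}
print(decode("11011110110", codes))  # dad
```

Because the codes are prefix-free, the decoder never needs a separator between characters: reaching a leaf is enough to know a code has ended.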
1. How much storage does the following frequency list need before compression, assuming each character needs a 3-bit code?
B: 2, Y: 2, D: 2, S: 1, C: 1, O: 6
2. Perform Huffman coding (whichever version you want) on the frequency list above. How much memory does it consume? Does it perform better or worse?
3. Perform Huffman decoding on each of the following, using the previous frequency codes.
111101111100111010
11000111010
11000