5 Huffman Coding

The document discusses Huffman Encoding and Decoding as a lossless data compression algorithm, detailing its historical context and the steps involved in building a Huffman tree. It outlines the process of assigning variable-length codes based on character frequencies to minimize file size, and compares different versions of the algorithm. Additionally, it covers the pros and cons of Huffman coding and its greedy nature in algorithm design.

Advanced Algorithms

Huffman Encoding/Decoding
Dr. Rubi Quiñones
Computer Science Department
Southern Illinois University Edwardsville
Course Timeline

ALGORITHMIC ANALYSIS
• Introduction to algorithms
• Median finding and order statistics
• Time complexity

GREEDY STRATEGIES
• Activity selection problems
• Water connection problem, Egyptian fraction
• Huffman (de)coding
• Shelves, mice, and policemen problem

DIVIDE AND CONQUER
• Max subarray sum and nearest neighbor
• Newton’s and bisection algorithm
• Matrix multiplication, skyline, and Hanoi

DYNAMIC PROGRAMMING
• Fibonacci and path counting
• Coin row and collecting problem
• Matrix chain multiplication and longest common subsequence
• Knapsack and optimal binary trees
• Floyd-Warshall algorithm and A*

ADVANCED CONCEPTS
• Algorithm intractability
• Randomized algorithms

2
Greedy Approach

• Huffman Encoding
• Huffman Decoding

3
Before Compression
• File size is based on the lengths of the assigned codes and the character frequencies
• Variable-length codes in which no code is a prefix of another are known as prefix codes

Bits for a character = length(code) × freq, e.g., 3 × 10 = 30

4
Total bits: 174
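As a quick check, here is a minimal Python sketch (not from the slides) of this fixed-length calculation, assuming the frequency table used later in the deck (E=15, P=13, I=12, A=10, T=4, S=3, newline=1):

```python
# Fixed-length encoding: every character gets a 3-bit code,
# so the file size is simply 3 bits per occurrence.
freq = {'E': 15, 'P': 13, 'I': 12, 'A': 10, 'T': 4, 'S': 3, '\n': 1}

code_length = 3
total_bits = sum(code_length * f for f in freq.values())
print(total_bits)  # 174, matching "Total bits: 174"
```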
Before Compression

Code Tree

5
Before Compression

Code Tree

Left Traversal in tree has value of 0; Right Traversal in tree has value of 1
[Code tree figure: edges labeled 0 (left) and 1 (right)]

6
Before Compression
Code Tree

Left Traversal in tree has value of 0; Right Traversal in tree has value of 1
[Code tree figure: edges labeled 0 (left) and 1 (right)]

IF we want A, traverse, and get 000. What is the character code for P?

7
Before Compression
Code Tree

Left Traversal in tree has value of 0; Right Traversal in tree has value of 1
We want to do better!
[Code tree figure: edges labeled 0 (left) and 1 (right)]

IF we want A, traverse, and get 000. What is the character code for P? 101

8
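To make the traversal concrete, here is a small sketch; the nested-tuple tree below is a hypothetical stand-in for the slide's figure (only A = 000 and P = 101 are taken from the slide), and codes are derived by walking left (0) and right (1) from the root:

```python
# Hypothetical 3-bit code tree as nested tuples (left, right); leaves are chars.
# Chosen so that A -> 000 and P -> 101, as stated on the slide.
tree = ((('A', 'E'), ('I', 'S')), (('T', 'P'), ('\n', None)))

def assign_codes(node, prefix=''):
    """Walk the tree; a left edge appends '0', a right edge appends '1'."""
    if node is None:
        return {}
    if not isinstance(node, tuple):      # leaf: a single character
        return {node: prefix}
    left, right = node
    codes = assign_codes(left, prefix + '0')
    codes.update(assign_codes(right, prefix + '1'))
    return codes

codes = assign_codes(tree)
print(codes['A'], codes['P'])  # 000 101
```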
Introduction of Compression
19th century – what was the earliest form of compression?

9
Introduction of Compression
19th century – Morse Code (they didn’t know it, but they were doing compression!)
1949 - the introduction of Shannon Fano coding. It assigned codes based on the probability of the symbol occurring.
1952 – Huffman Encoding/Decoding
1977 – Lempel-Ziv

10
Huffman Encoding
• What is it? It’s a lossless data compression algorithm
• We want to minimize the size of the file, based on the frequencies of the symbols
• There are two parts in performing Huffman Coding
• Build a Huffman tree from input characters
• Traverse the Huffman tree and assign codes to characters

• Input: character list with its frequencies


• Output: character list with its code

11
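The slides next build the tree by hand in three different ways (versions 1-3). For reference, here is a minimal sketch, not taken from the slides, of the classical greedy construction: repeatedly merge the two lowest-frequency nodes using a priority queue, then traverse the finished tree to assign codes.

```python
import heapq
from itertools import count

def huffman_codes(freq):
    """Classical Huffman: repeatedly merge the two lowest-frequency nodes."""
    tiebreak = count()  # unique counter keeps heap comparisons off the tree tuples
    heap = [(f, next(tiebreak), ch) for ch, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # two smallest frequencies
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    root = heap[0][2]

    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):          # internal node
            walk(node[0], prefix + '0')      # left edge = 0
            walk(node[1], prefix + '1')      # right edge = 1
        else:                                # leaf: a character
            codes[node] = prefix or '0'
    walk(root, '')
    return codes

freq = {'E': 15, 'P': 13, 'I': 12, 'A': 10, 'T': 4, 'S': 3, '\n': 1}
codes = huffman_codes(freq)
print(sum(len(codes[c]) * f for c, f in freq.items()))  # total encoded bits
```

For the frequency table used in these slides this should print 146 bits, which matches the best total the deck reaches (version 2).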
Huffman Encoding
Version 1

Goal: Reduce the bits, we must reduce the code tree

12
Huffman Encoding
Version 1

Goal: Reduce the bits, we must reduce the code tree

Step 1: take the 2 chars with the lowest frequencies

13
Huffman Encoding
Version 1

Goal: Reduce the bits, we must reduce the code tree

Step 1: take the 2 chars with the lowest frequencies

Step 2: make a 2 leaf node tree from them

14
Huffman Encoding
Version 1

Goal: Reduce the bits, we must reduce the code tree

Step 1: take the 2 chars with the lowest frequencies

Step 2: make a 2 leaf node tree from them

15
Huffman Encoding
Version 1

Goal: Reduce the bits, we must reduce the code tree

Step 1: take the 2 chars with the lowest frequencies

Step 2: make a 2 leaf node tree from them

Step 3: take the next lowest freq char, and add it to the left of the tree

16
Huffman Encoding
Version 1

Goal: Reduce the bits, we must reduce the code tree

Step 1: take the 2 chars with the lowest frequencies

Step 2: make a 2 leaf node tree from them

Step 3: take the next lowest freq char, and add it to the left of the tree

17
Huffman Encoding
Version 1

Goal: Reduce the bits, we must reduce the code tree

Step 1: take the 2 chars with the lowest frequencies

Step 2: make a 2 leaf node tree from them

Step 3: take the next lowest freq char, and add it to the left of the tree

REPEAT

[Tree-so-far figure: subtree with sum 10]

18
Huffman Encoding
Version 1

Goal: Reduce the bits, we must reduce the code tree

Step 1: take the 2 chars with the lowest frequencies

Step 2: make a 2 leaf node tree from them

Step 3: take the next lowest freq char, and add it to the tree

REPEAT

[Tree-so-far figure: subtree with sum 10]

19
Huffman Encoding
Version 1

Goal: Reduce the bits, we must reduce the code tree

Step 1: take the 2 chars with the lowest frequencies

Step 2: make a 2 leaf node tree from them

Step 3: take the next lowest freq char, and add it to the tree

REPEAT

[Tree-so-far figure: subtree sums 33, 12, 10; leaf I = 12]

20
Huffman Encoding
Version 1

Goal: Reduce the bits, we must reduce the code tree

Step 1: take the 2 chars with the lowest frequencies

Step 2: make a 2 leaf node tree from them

Step 3: take the next lowest freq char, and add it to the tree

REPEAT

[Completed version 1 tree figure: root 58; internal node sums include 43, 30, and 10; leaves include E (15), P (13), and I (12)]

21
Huffman Encoding
Version 1

[Version 1 tree figure: root 58; left edges labeled 0, right edges labeled 1; leaves include E (15), I, and P]

Fill out the chart

22


Huffman Encoding
Version 1
Char  Code    Freq  Total Bits
A     1110    10    40
E     0       15    15
I     10      12    24
S     111111  3     18
T     11110   4     20
P     110     13    39
\n    111110  1     6

[Version 1 tree figure: root 58; left edges labeled 0, right edges labeled 1; leaves include E (15), I, and P]

Total bit: 162 vs 174 bits

23
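A quick sketch that recomputes the chart's total from the codes and frequencies exactly as listed above:

```python
# Recompute "Total bits" for the version 1 chart.
codes = {'A': '1110', 'E': '0', 'I': '10', 'S': '111111',
         'T': '11110', 'P': '110', '\n': '111110'}
freq = {'A': 10, 'E': 15, 'I': 12, 'S': 3, 'T': 4, 'P': 13, '\n': 1}

total = sum(len(codes[c]) * freq[c] for c in codes)
print(total)  # 162, versus 174 bits with fixed 3-bit codes
```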
Huffman Encoding
Version 1 (we just did) vs. version 2
Version 1: we were assigning left nodes ourselves.
Version 2: keep assigning left until our sum is greater than any freq on the remaining list. If that happens, start a new tree.

[Side-by-side tree figures with 0/1 edge labels; leaves include E (15), I, and P]

24
Huffman Encoding
Version 3
Assigning left and right while building the tree

[Version 3 tree figure: node sums 58, 43, 30, 18, 10; leaves E, I, P, A]

25
Huffman Coding
Take a look at version 2

Goal: Reduce the bits, we must reduce the code tree

Step 1: take the 2 chars with the lowest frequencies

Step 2: make a 2 leaf node tree from them

Step 3: take the next lowest freq char, and add it to the tree

REPEAT

26
Huffman Coding
Goal: Reduce the bits, we must reduce the code tree

Step 1: take the 2 chars with the lowest frequencies

Step 2: make a 2 leaf node tree from them

Step 3: take the next lowest freq char, and add it to the tree

REPEAT

27
WAIT! 18 is now higher than the remaining list…
Huffman Coding
Goal: Reduce the bits, we must reduce the code tree

Step 1: take the 2 chars with the lowest frequencies

Step 2: make a 2 leaf node tree from them

Step 3: take the next lowest freq char, and add it to the tree

REPEAT while the sum is still < the remaining freq integers. If it is not, go to step 1 (start a new tree).

28
Huffman Coding
Goal: Reduce the bits, we must reduce the code tree

Step 1: take the 2 chars with the lowest frequencies

Step 2: make a 2 leaf node tree from them

Step 3: take the next lowest freq char, and add it to the tree

REPEAT while the sum is still < the remaining freq integers. If it is not, go to step 1 (start a new tree).

29
Huffman Coding
Goal: Reduce the bits, we must reduce the code tree

Step 1: take the 2 chars with the lowest frequencies

Step 2: make a 2 leaf node tree from them

Step 3: take the next lowest freq char, and add it to the tree

REPEAT while the sum is still < the remaining freq integers. If it is not, go to step 1 (start a new tree).

30
Now we're left with E, where should we put it?
Huffman Coding
Goal: Reduce the bits, we must reduce the code tree

Step 1: take the 2 chars with the lowest frequencies

Step 2: make a 2 leaf node tree from them

Step 3: take the next lowest freq char, and add it to the tree

REPEAT while the sum is still < the remaining freq integers. If it is not, go to step 1 (start a new tree).

31
With the lowest subtree. Are we done now?
Huffman Coding

32
No, we have to combine them now!
Huffman Coding

[Combined version 2 trees figure: edges labeled 0 (left) and 1 (right)]

33
Fill in the left and right traversal…
Huffman Coding
Version 2

[Version 2 trees figure with the 0/1 edge labels filled in]

Total bit: 146 vs 174 bits

34
Huffman Encoding
Version 3

[Version 3 tree figure: node sums 58, 43, 30, 18, 10; leaves E, I, P, A]

Adding chars by alternating left and right

35
Version 1 vs. Version 2 vs. Version 3

Version 1:
Char  Code    Freq  Total Bits
A     1110    10    40
E     0       15    15
I     10      12    24
S     111111  3     18
T     11110   4     20
P     110     13    39
\n    111110  1     6
Total bit: 162 vs 174 bits

Version 3:
Char  Code    Freq  Total Bits
A     1111    10    40
E     0       15    15
I     10      12    24
S     111011  3     18
T     11100   4     20
P     110     13    39
\n    111010  1     6
Total bit: 163 vs 174 bits

Version 2:
Total bit: 146 vs 174 bits

36
Huffman Encoding

1. Create a traditional tree in the order they are listed
   1. Determine their code index and total bits
2. Perform Huffman encoding (version 2)
   1. Determine their code index and total bits

There is a different end case in this example. How would you approach it?

37
Huffman Encoding

Before Code   After Code
000           11110
001           11111
010           1110
101           110
110           10
111           0

[Huffman tree figure: root 100; internal node sums 55, 39, 26, 14; leaves f, e, d, c, a, b with frequencies 45, 16, 13, 12, 5, 9; left edges 0, right edges 1]

Before compression: (3x5)+(3x9)+(3x12)+(3x13)+(3x16)+(3x45) = 300 bits
After compression: (5x5)+(5x9)+(4x12)+(3x13)+(2x16)+(1x45) = 234 bits

Pros and Cons?

38
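A quick sketch that re-checks the arithmetic above, pairing each frequency with the code length implied by the slide:

```python
# (frequency, Huffman code length in bits) pairs from the slide's arithmetic
pairs = [(5, 5), (9, 5), (12, 4), (13, 3), (16, 2), (45, 1)]

before = sum(3 * f for f, _ in pairs)        # fixed 3-bit codes
after = sum(f * bits for f, bits in pairs)   # variable-length codes
print(before, after)  # 300 234
```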
What algorithms can we use to build the Huffman coding problem?
Huffman Encoding

Algorithm Huffman-v2(c)
// c is a 2-column input of characters and frequencies
1. Put all characters in a priority queue by freq
2. While (there is more than 1 node in the queue)
   1. Dequeue the first two chars using extractMin() on the min-heap
   2. Create them as new nodes under a parent whose value is the sum of their freq;
      lowest freq -> left child, highest freq -> right child
   3. Dequeue the next char, create a node for it, and add it to the tree
   4. If the freq < the sum, make it a left child.
      If the freq > the sum, make it a right child.
      If the freq == the sum, make a choice.
   5. If (sum > remaining freq in the priority queue)
      1. Create a new tree and repeat the conditions in (2)
3. // when there is 1 node left
   1. Assign it to the tree that gives the lowest sum
4. Combine all trees into 1 tree

Time complexity?
Space complexity?
What makes this greedy?

39
Huffman Decoding

Before Encode   After Encode
000             11110
001             11111
010             1110
101             110
110             10
111             0

[Decoding tree figure: root R, path 1 traversed so far]

Perform Huffman decoding.
1. Start at the root and decode until a leaf is found
2. If the current bit is 0, we move to the left node of the tree
3. If the current bit is 1, we move to the right node of the tree
4. If during traversal we encounter a leaf node,
   we print the character and start at the root node again

Decode: 11011110110. Is 1 a leaf node? No, keep going

40
Huffman Decoding

Before Code   After Code
000           11110
001           11111
010           1110
101           110
110           10
111           0

[Decoding tree figure: root R, path 11 traversed so far]

Perform Huffman decoding.
1. Start at the root and decode until a leaf is found
2. If the current bit is 0, we move to the left node of the tree
3. If the current bit is 1, we move to the right node of the tree
4. If during traversal we encounter a leaf node,
   we print the character and start at the root node again

Decode: 11011110110. Is 11 a leaf node? No, keep going

41
Huffman Decoding

Before Code   After Code
000           11110
001           11111
010           1110
101           110
110           10
111           0

[Decoding tree figure: root R, path 110 reaches leaf D]

Perform Huffman decoding.
1. Start at the root and decode until a leaf is found
2. If the current bit is 0, we move to the left node of the tree
3. If the current bit is 1, we move to the right node of the tree
4. If during traversal we encounter a leaf node,
   we print the character and start at the root node again

Decode: 11011110110. Is 110 a leaf node? Yes, it is D. Record it and restart at the root node

42
Huffman Decoding

Before Code   After Code
000           11110
001           11111
010           1110
101           110
110           10
111           0

[Decoding tree figure: root R; leaf D at 110; current path 1]

Perform Huffman decoding.
1. Start at the root and decode until a leaf is found
2. If the current bit is 0, we move to the left node of the tree
3. If the current bit is 1, we move to the right node of the tree
4. If during traversal we encounter a leaf node,
   we print the character and start at the root node again

Decode: 11011110110. Is 1 a leaf node? No, keep going

43
Huffman Decoding

Before Code   After Code
000           11110
001           11111
010           1110
101           110
110           10
111           0

[Decoding tree figure: root R; leaf D at 110; current path 11]

Perform Huffman decoding.
1. Start at the root and decode until a leaf is found
2. If the current bit is 0, we move to the left node of the tree
3. If the current bit is 1, we move to the right node of the tree
4. If during traversal we encounter a leaf node,
   we print the character and start at the root node again

Decode: 11011110110. Is 11 a leaf node? No, keep going

44
Huffman Decoding

Before Code   After Code
000           11110
001           11111
010           1110
101           110
110           10
111           0

[Decoding tree figure: root R; leaf D at 110; current path 111]

Perform Huffman decoding.
1. Start at the root and decode until a leaf is found
2. If the current bit is 0, we move to the left node of the tree
3. If the current bit is 1, we move to the right node of the tree
4. If during traversal we encounter a leaf node,
   we print the character and start at the root node again

Decode: 11011110110. Is 111 a leaf node? No, keep going

45
Huffman Decoding

Before Code   After Code
000           11110
001           11111
010           1110
101           110
110           10
111           0

[Decoding tree figure: root R; leaf D at 110; current path 1111]

Perform Huffman decoding.
1. Start at the root and decode until a leaf is found
2. If the current bit is 0, we move to the left node of the tree
3. If the current bit is 1, we move to the right node of the tree
4. If during traversal we encounter a leaf node,
   we print the character and start at the root node again

Decode: 11011110110. Is 1111 a leaf node? No, keep going

46
Huffman Decoding

Before Code   After Code
000           11110
001           11111
010           1110
101           110
110           10
111           0

[Decoding tree figure: root R; leaf D at 110; path 11110 reaches leaf A]

Perform Huffman decoding.
1. Start at the root and decode until a leaf is found
2. If the current bit is 0, we move to the left node of the tree
3. If the current bit is 1, we move to the right node of the tree
4. If during traversal we encounter a leaf node,
   we print the character and start at the root node again

Decode: 11011110110. Is 11110 a leaf node? Yes, it's A. Record it and restart at the root node

47
Huffman Decoding

Before Code   After Code
000           11110
001           11111
010           1110
101           110
110           10
111           0

[Decoding tree figure: root R; leaves D at 110 and A at 11110]

Perform Huffman decoding.
1. Start at the root and decode until a leaf is found
2. If the current bit is 0, we move to the left node of the tree
3. If the current bit is 1, we move to the right node of the tree
4. If during traversal we encounter a leaf node,
   we print the character and start at the root node again

Decode: 11011110110. ANSWER: DAD

Time complexity?

48
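For reference, a minimal decoding sketch in Python that follows exactly these steps; the code table is reconstructed from the slides (d = 110 and a = 11110 are the codes used in the DAD example, and the remaining assignments are assumed from the earlier frequency figure):

```python
# "After" prefix codes; no code is a prefix of another, so greedy matching works.
codes = {'f': '0', 'e': '10', 'd': '110', 'c': '1110', 'a': '11110', 'b': '11111'}
decode_map = {code: ch for ch, code in codes.items()}

def huffman_decode(bits: str) -> str:
    out, current = [], ''
    for bit in bits:
        current += bit                 # walk one edge: 0 = left, 1 = right
        if current in decode_map:      # reached a leaf node
            out.append(decode_map[current])
            current = ''               # restart at the root
    return ''.join(out)

print(huffman_decode('11011110110'))  # 'dad', matching the slide's DAD
```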
Challenge

1. How much storage does the following frequency list need (before compression) assuming each
character needs a 3 bit long code?
B: 2, Y: 2, D: 2, S: 1, C: 1, O: 6

2. Perform Huffman Coding (whichever version you want) on the frequency list above. How much
memory does it consume? Does it perform better or worse?

3. Perform Huffman Decoding using the previous frequency codes on the following.
111101111100111010
11000111010
11000

49
Course Timeline

ALGORITHMIC ANALYSIS
• Introduction to algorithms
• Median finding and order statistics
• Time complexity

GREEDY STRATEGIES
• Activity selection problems
• Water connection problem, Egyptian fraction
• Huffman (de)coding
• Shelves, mice, and policemen problem

DIVIDE AND CONQUER
• Max subarray sum and nearest neighbor
• Newton’s and bisection algorithm
• Matrix multiplication, skyline, and Hanoi

DYNAMIC PROGRAMMING
• Fibonacci and path counting
• Coin row and collecting problem
• Matrix chain multiplication and longest common subsequence
• Knapsack and optimal binary trees
• Floyd-Warshall algorithm and A*

ADVANCED CONCEPTS
• Algorithm intractability
• Randomized algorithms

50
