Huffman Coding
Encoding messages
Wasted space
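Unicode stored as 16 bits per character (e.g. UTF-16) uses twice as much space as ASCII stored as 8 bits per character
• inefficient for plain-text messages containing only ASCII characters
Same number of bits used to represent all characters
• but ‘a’ and ‘e’ occur more frequently than ‘q’ and ‘z’, so frequent characters could be given shorter codes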
Example: a fixed-length 2-bit code
Symbol Code
A 00
B 01
C 10
D 11
ACDBADDDDD
00101101001111111111 20 bits
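The fixed-length example can be sketched in a few lines of Python. This is only an illustration: the names `FIXED_CODE`, `encode` and `decode_fixed` are made up here and do not come from the slides.

```python
# Fixed-length code from the example: every symbol costs exactly 2 bits.
FIXED_CODE = {"A": "00", "B": "01", "C": "10", "D": "11"}

def encode(message, code):
    """Concatenate the codeword of each symbol in the message."""
    return "".join(code[symbol] for symbol in message)

def decode_fixed(bits, code, width=2):
    """Cut the bit string into width-sized chunks and look each one up."""
    reverse = {word: symbol for symbol, word in code.items()}
    return "".join(reverse[bits[i:i + width]] for i in range(0, len(bits), width))

bits = encode("ACDBADDDDD", FIXED_CODE)
print(bits, len(bits), "bits")
print(decode_fixed(bits, FIXED_CODE))
```

Because every codeword has the same width, the frequent symbol D costs as many bits as the rare symbols B and C, which is the waste the slide points at.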
Prefix property
A code has the prefix property if no character code is the prefix (start of
the code) for another character
Example:
Symbol Code
P 000
Q 11
R 01
S 001
T 10
RSTQPT
01001101100010
000 is not a prefix of 11, 01, 001, or 10
11 is not a prefix of 000, 01, 001, or 10 …
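A minimal sketch of how the prefix property could be checked, and why it lets a decoder work strictly left to right. The helper names `has_prefix_property` and `decode` are illustrative assumptions, not part of the slides.

```python
def has_prefix_property(code):
    """Return True if no codeword is a prefix of a different codeword."""
    words = list(code.values())
    return not any(a != b and b.startswith(a) for a in words for b in words)

def decode(bits, code):
    """Decode left to right, emitting a symbol as soon as a codeword matches."""
    reverse = {word: symbol for symbol, word in code.items()}
    symbols, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            symbols.append(reverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(symbols)

PREFIX_CODE = {"P": "000", "Q": "11", "R": "01", "S": "001", "T": "10"}
print(has_prefix_property(PREFIX_CODE))        # True
print(decode("01001101100010", PREFIX_CODE))   # RSTQPT
```

The decoder never has to look ahead or backtrack: with a prefix code, the first codeword that matches is the only one that can match.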
Code without prefix property
Symbol Code
P 0
Q 1
R 01
S 10
T 11
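To see why this code is a problem, the sketch below lists every way a bit string can be split into codewords. The function name `all_decodings` is a hypothetical helper introduced only for this illustration.

```python
def all_decodings(bits, code):
    """Return every symbol sequence whose encoding equals bits."""
    if not bits:
        return [""]
    results = []
    for symbol, word in code.items():
        if bits.startswith(word):
            rest = all_decodings(bits[len(word):], code)
            results += [symbol + tail for tail in rest]
    return results

AMBIGUOUS_CODE = {"P": "0", "Q": "1", "R": "01", "S": "10", "T": "11"}
print(all_decodings("01", AMBIGUOUS_CODE))     # ['PQ', 'R']
print(all_decodings("0110", AMBIGUOUS_CODE))   # five different messages
```

Even the two-bit string 01 is ambiguous (PQ or R), so a receiver cannot recover the message without extra information.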
DEAACAAAAABA
Symbol Code
A 0
B 10
C 110
D 1110
E 11110
1110111100011000000100 22 bits
Another possible code
DEAACAAAAABA
Symbol Code
A 0
B 100
C 101
D 1101
E 1111
1101111100101000001000 22 bits
Better code
DEAACAAAAABA
Symbol Code
A 0
B 100
C 101
D 110
E 111
11011100101000001000 20 bits
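The three bit counts can be reproduced with a short script. The dictionary labels `first`, `second` and `better` are just names chosen here for the three tables.

```python
from collections import Counter

MESSAGE = "DEAACAAAAABA"

CODES = {
    "first":  {"A": "0", "B": "10",  "C": "110", "D": "1110", "E": "11110"},
    "second": {"A": "0", "B": "100", "C": "101", "D": "1101", "E": "1111"},
    "better": {"A": "0", "B": "100", "C": "101", "D": "110",  "E": "111"},
}

def encoded_length(message, code):
    """Total bits = sum over symbols of (frequency * codeword length)."""
    counts = Counter(message)
    return sum(counts[symbol] * len(code[symbol]) for symbol in counts)

for name, code in CODES.items():
    print(name, encoded_length(MESSAGE, code), "bits")
```

A is by far the most frequent symbol (8 of 12 characters), so all three codes give it the 1-bit codeword; the third code wins by keeping the four rare symbols at 3 bits each.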
What code to use?
Question: Is there a systematic way to find the best code?
Answer: Yes! Build a Huffman coding tree.
Huffman coding tree
Binary tree
each leaf contains symbol (character)
label edge from node to left child with 0
label edge from node to right child with 1
Code for any symbol obtained by following path from root to the leaf
containing symbol
Code has prefix property
leaf node cannot appear on path to another leaf
note: a fixed-length code corresponds to a complete binary tree with every
leaf at the same depth, so it clearly has the prefix property
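Reading codes off a tree can be sketched as a recursive walk. The representation is an assumption made for this example: a leaf is a one-character string and an internal node is a `(left, right)` tuple. The tree shown is the one for the 20-bit code A 0, B 100, C 101, D 110, E 111.

```python
def codes_from_tree(tree, prefix=""):
    """Walk the tree: a left edge appends 0, a right edge appends 1."""
    if isinstance(tree, str):      # leaf: holds a symbol
        return {tree: prefix}
    left, right = tree
    codes = codes_from_tree(left, prefix + "0")
    codes.update(codes_from_tree(right, prefix + "1"))
    return codes

# Internal nodes are (left, right) pairs; leaves are symbols.
TREE = ("A", (("B", "C"), ("D", "E")))
print(codes_from_tree(TREE))
```

Since symbols sit only at leaves, the walk to one symbol never passes through another, which is why the resulting code has the prefix property.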
Building a Huffman tree
Symbol frequencies (_ marks a space):
Symbol Frequency
A 1
G 1
M 1
T 1
E 2
H 2
_ 3
I 3
S 5
Step 1: start with one single-node tree per symbol, weighted by its frequency
Step 2: merge A and G (1 + 1 = 2); merge M and T (1 + 1 = 2)
Steps 3 and 4: merge E and H (2 + 2 = 4)
Step 5: merge _ and I (3 + 3 = 6)
Step 6: merge the subtrees (A, G) and (M, T) (2 + 2 = 4)
Step 7: merge the two subtrees of weight 4 (4 + 4 = 8); merge the subtree of weight 6 with S (6 + 5 = 11)
Step 8: merge the subtrees of weight 11 and 8 into the root (11 + 8 = 19)
note: each merge should join the two lowest-weight subtrees available, so strictly the merge in step 6 comes before the one in step 5; the finished tree is the same
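The construction can be sketched with a priority queue: keep taking out the two lightest subtrees and put their merge back in. This is one common way to implement it, not necessarily how the slides intend it; the names `build_huffman_tree` and `code_lengths` and the tuple-based tree are assumptions for this example. Ties between equal weights may be broken differently than in the diagram, so the tree shape can differ, but the total encoded length is the same.

```python
import heapq
from collections import Counter
from itertools import count

def build_huffman_tree(frequencies):
    """Repeatedly merge the two lowest-weight subtrees until one remains."""
    tiebreak = count()  # unique counter so tuples never compare the subtrees
    heap = [(weight, next(tiebreak), symbol) for symbol, weight in frequencies.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (left, right)))
    weight, _, tree = heap[0]
    return weight, tree

def code_lengths(tree, depth=0):
    """Depth of each leaf, which equals the length of its codeword."""
    if isinstance(tree, str):
        return {tree: depth}
    left, right = tree
    return {**code_lengths(left, depth + 1), **code_lengths(right, depth + 1)}

frequencies = Counter("This is his message".upper().replace(" ", "_"))
total, tree = build_huffman_tree(frequencies)
lengths = code_lengths(tree)
cost = sum(frequencies[s] * lengths[s] for s in frequencies)
print(total, cost)  # root weight 19, 56 bits for the whole message
```

The root weight equals the message length (19 symbols), and the cost is the sum of frequency times code length over all symbols.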
Label edges
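Label each edge to a left child with 0 and each edge to a right child with 1 (node weights in parentheses):
root (19)
  0: (11)
    0: (6)
      0: _ (3)
      1: I (3)
    1: S (5)
  1: (8)
    0: (4)
      0: (2)
        0: A (1)
        1: G (1)
      1: (2)
        0: M (1)
        1: T (1)
    1: (4)
      0: E (2)
      1: H (2)
Reading the edge labels from the root down to a leaf gives that symbol's code, e.g. S = 01 and A = 1000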
Huffman code & encoded message
This is his message
S 01
E 110
H 111
_ 000
I 001
A 1000
G 1001
M 1010
T 1011
10111110010100000101000111001010001010110010110001001110
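As a check on the encoded message, the sketch below encodes the text with the table and then decodes it by walking the tree from the root, restarting at each leaf. The nested-dict tree and the names `build_tree` and `decode` are assumptions made for this illustration.

```python
HUFFMAN_CODE = {
    "S": "01", "E": "110", "H": "111", "_": "000", "I": "001",
    "A": "1000", "G": "1001", "M": "1010", "T": "1011",
}

def encode(message, code):
    """Concatenate the codeword of each symbol."""
    return "".join(code[symbol] for symbol in message)

def build_tree(code):
    """Rebuild the code tree as nested dicts keyed by '0' and '1'."""
    root = {}
    for symbol, word in code.items():
        node = root
        for bit in word[:-1]:
            node = node.setdefault(bit, {})
        node[word[-1]] = symbol  # the leaf holds the symbol itself
    return root

def decode(bits, root):
    """Follow edges from the root; on reaching a leaf, emit it and restart."""
    symbols, node = [], root
    for bit in bits:
        node = node[bit]
        if isinstance(node, str):
            symbols.append(node)
            node = root
    return "".join(symbols)

message = "This is his message".upper().replace(" ", "_")
bits = encode(message, HUFFMAN_CODE)
print(bits)
print(len(bits), "bits, versus", 4 * len(message), "with a 4-bit fixed-length code")
print(decode(bits, build_tree(HUFFMAN_CODE)))
```

With nine distinct symbols, a fixed-length code needs 4 bits per symbol (76 bits for 19 symbols); the Huffman code needs 56.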
Example 2
Symbol Probability
A 0.08
B 0.10
C 0.12
D 0.15
E 0.20
F 0.35
Solution:
Step 1: start with one leaf per symbol, weighted by its probability
Step 2: merge B and A (0.10 + 0.08 = 0.18)
Step 3: merge D and C (0.15 + 0.12 = 0.27)
Step 4: merge E with the subtree of weight 0.18 (0.20 + 0.18 = 0.38)
Step 5: merge F with the subtree of weight 0.27 (0.35 + 0.27 = 0.62)
Step 6: merge the subtrees of weight 0.62 and 0.38 into the root (1.00)
Step 7: label each edge to a left child with 0
Step 8: label each edge to a right child with 1 and read off the codes
root (1.00)
  0: (0.62)
    0: F (0.35)
    1: (0.27)
      0: D (0.15)
      1: C (0.12)
  1: (0.38)
    0: E (0.20)
    1: (0.18)
      0: B (0.10)
      1: A (0.08)
Symbol Code
A 111
B 110
C 011
D 010
E 10
F 00
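A quick way to judge the code from Example 2 is its average length per symbol, weighting each codeword length by the symbol's probability. The variable names below are chosen for this sketch only.

```python
PROBABILITY = {"A": 0.08, "B": 0.10, "C": 0.12, "D": 0.15, "E": 0.20, "F": 0.35}
CODE = {"A": "111", "B": "110", "C": "011", "D": "010", "E": "10", "F": "00"}

# Expected number of bits per symbol under the Huffman code.
average_bits = sum(PROBABILITY[s] * len(CODE[s]) for s in CODE)

# Six symbols need 3 bits each in a fixed-length code.
fixed_bits = 3

print(round(average_bits, 2), "bits per symbol on average, versus", fixed_bits)
```

The two most likely symbols, E and F, together account for 0.55 of the probability and get the 2-bit codewords, which pulls the average down to 2.45 bits.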