
CS301 Lec26

Uploaded by aishabatool.hu
Copyright © All Rights Reserved

Huffman Encoding

• Huffman coding is a method for compressing standard text documents.
• It uses a binary tree to develop codes of varying lengths for the letters used in the original message.
• Huffman coding is also part of the JPEG image compression scheme.
• The algorithm was introduced by David Huffman in 1952 as part of a course assignment at MIT.

Lecture No.26
Data Structures

Dr. Sohail Aslam

Huffman Encoding
• Huffman encoding is easiest to understand through a simple example.
• Consider encoding the 32-character phrase "traversing threaded binary trees".
• If we send the phrase as a message over a network using standard 8-bit ASCII codes, we have to send 8 × 32 = 256 bits.
• Using the Huffman algorithm, we can send the same message with only 116 bits.

Huffman Encoding
• List all the letters used, including the "space" character, along with the frequency with which each occurs in the message.
• Consider each of these (character, frequency) pairs to be a node; as we will see, they are actually leaf nodes.
• Pick the two nodes with the lowest frequency; if there is a tie, pick arbitrarily among the nodes with equal frequency.

Huffman Encoding
• Make a new node out of these two, and make the two nodes its children.
• This new node is assigned the sum of the frequencies of its children.
• Continue the process of combining the two nodes of lowest frequency until only one node, the root, remains.

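The combining loop described above can be sketched in Python, using a binary heap (`heapq`) as the pool of nodes. This is a minimal sketch under assumptions of our own: a leaf is just its character, an internal node is a `(left, right)` tuple, and ties are broken by insertion order through a counter rather than randomly, so the tree it builds may differ from the one in the lecture. The helper names `build_huffman_tree` and `encoded_length` are illustrative; `encoded_length` adds up frequency × depth over the leaves, which is the number of bits the resulting codes would use.

```python
import heapq
from collections import Counter
from itertools import count


def build_huffman_tree(freqs):
    """Merge the two lowest-frequency nodes until one root remains.

    Leaves are characters (str); internal nodes are (left, right) tuples.
    Returns (root, root_frequency). Assumes freqs is non-empty.
    """
    order = count()  # tie-breaker so heapq never has to compare two nodes
    heap = [(f, next(order), ch) for ch, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # lowest frequency
        f2, _, right = heapq.heappop(heap)   # next lowest
        # The new parent is assigned the sum of its children's frequencies.
        heapq.heappush(heap, (f1 + f2, next(order), (left, right)))
    freq, _, root = heap[0]
    return root, freq


def encoded_length(node, freqs, depth=0):
    """Bits needed for the message: sum of frequency * depth over all leaves."""
    if isinstance(node, str):
        return freqs[node] * depth
    left, right = node
    return (encoded_length(left, freqs, depth + 1)
            + encoded_length(right, freqs, depth + 1))


for text in ("traversing threaded binary trees",
             "traversing threaded binary trees\n"):
    freqs = Counter(text)
    root, total = build_huffman_tree(freqs)
    print(total, encoded_length(root, freqs))
# 32 116
# 33 122
```

Whatever tie-breaking order is used, the total bit count comes out the same (116 bits for the 32-character phrase, 122 once the newline is counted), even though individual codes may change.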
Huffman Encoding
Original text:
traversing threaded binary trees
size: 33 characters (spaces and the final newline included)

NL : 1    i : 2
SP : 3    n : 2
a  : 3    r : 5
b  : 1    s : 2
d  : 2    t : 3
e  : 5    v : 1
g  : 1    y : 1
h  : 1
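Counting the characters is the first step of the algorithm. A short Python sketch using `collections.Counter` reproduces the table above; printing space as SP and newline as NL is our own display convention chosen to match the slide.

```python
from collections import Counter

message = "traversing threaded binary trees\n"  # the phrase plus its newline
freqs = Counter(message)

# Show the counts in the slide's notation: SP for space, NL for newline.
# Sorting by display name lists NL and SP first, then the letters.
names = {" ": "SP", "\n": "NL"}
for ch in sorted(freqs, key=lambda c: names.get(c, c)):
    print(f"{names.get(ch, ch)} : {freqs[ch]}")
print("size:", sum(freqs.values()))
```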
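Huffman Encoding
[Figure: the 15 leaf nodes (e 5, r 5, a 3, t 3, SP 3, d 2, i 2, n 2, s 2, NL 1, b 1, g 1, h 1, v 1, y 1), with two frequency-1 leaves joined under a new node of frequency 2.]
The 2 is equal to the sum of the frequencies of the two child nodes.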
Huffman Encoding
There are a number of ways to combine nodes. We have chosen just one such way.
[Figure: a second pair of frequency-1 leaves is joined under another node of frequency 2.]
Huffman Encoding
[Figure: the last two frequency-1 leaves are joined, giving three nodes of frequency 2, over (NL, b), (g, h) and (v, y).]
Huffman Encoding
[Figure: two nodes of frequency 4 are added, one over d and i, the other over n and s.]
Huffman Encoding
[Figure: another node of frequency 4 joins the (NL, b) and (g, h) nodes, and a node of frequency 5 joins the (v, y) node with SP.]
Huffman Encoding
[Figure: four more nodes are added: 6 (a + t), 8 (the 4-nodes over d, i and n, s), 9 (the 4-node over NL, b, g, h plus e) and 10 (r plus the 5-node).]
Huffman Encoding
[Figure: nodes 14 (6 + 8) and 19 (9 + 10) are added.]
Huffman Encoding
[Figure: the root node 33 (14 + 19) completes the tree; its frequency equals the 33 characters of the message.]
Huffman Encoding
• Start at the root. Assign 0 to the left branch and 1 to the right branch.
• Repeat the process down the left and right subtrees.
• To get the code for a character, traverse the tree from the root to the character's leaf node and read off the 0s and 1s along the path.

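The labeling rule above amounts to a recursive walk that appends "0" on each left branch and "1" on each right branch. In this sketch the tree literal is our own hand transcription of the lecture's final tree into nested `(left, right)` tuples (space is `" "`, newline is `"\n"`), and `assign_codes` is an illustrative name, not part of any library.

```python
def assign_codes(node, prefix="", table=None):
    """Walk the tree: append '0' going left, '1' going right; a leaf's path is its code."""
    if table is None:
        table = {}
    if isinstance(node, str):          # leaf: the accumulated path is its code
        table[node] = prefix or "0"    # a tree with a single leaf still needs one bit
        return table
    left, right = node
    assign_codes(left, prefix + "0", table)
    assign_codes(right, prefix + "1", table)
    return table


# Hypothetical transcription of the lecture's tree (leaves are characters).
tree = (
    (("a", "t"), (("d", "i"), ("n", "s"))),                         # node 14
    (((("\n", "b"), ("g", "h")), "e"), ("r", (("v", "y"), " "))),   # node 19
)
codes = assign_codes(tree)
print(codes["e"], codes[" "], codes["\n"])  # 101 1111 10000
```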
Huffman Encoding
[Figure: the completed tree, with the root's left branch labeled 0 (to node 14) and its right branch labeled 1 (to node 19).]
Huffman Encoding
[Figure: the branches below nodes 14 and 19 are labeled in the same way, 0 on the left and 1 on the right.]
Huffman Encoding
[Figure: the labeling continues two levels further down: the branches below nodes 6, 8, 9 and 10, then the branches below the three nodes of frequency 4 and the node of frequency 5.]
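Huffman Encoding
[Figure: the remaining branches, below the three nodes of frequency 2, are labeled, completing the tree. Reading the labels from the root down to a leaf gives that character's code.]

33
  0: 14
    0: 6
      0: a (3)
      1: t (3)
    1: 8
      0: 4
        0: d (2)
        1: i (2)
      1: 4
        0: n (2)
        1: s (2)
  1: 19
    0: 9
      0: 4
        0: 2
          0: NL (1)
          1: b (1)
        1: 2
          0: g (1)
          1: h (1)
      1: e (5)
    1: 10
      0: r (5)
      1: 5
        0: 2
          0: v (1)
          1: y (1)
        1: SP (3)
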
Huffman Encoding

Huffman character codes

NL → 10000
SP → 1111
a  → 000
b  → 10001
d  → 0100
e  → 101
g  → 10010
h  → 10011
i  → 0101
n  → 0110
r  → 110
s  → 0111
t  → 001
v  → 11100
y  → 11101

• Notice that the code is variable length.
• Letters with higher frequencies have shorter codes.
• The tree could have been built in a number of ways; each would have yielded different codes, but the total length of the encoded message would still be minimal.
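The two observations about code length can be checked mechanically. In this small sketch the table and frequencies are copied from the slides into dictionaries of our own naming (`CODES`, `FREQS`).

```python
# Code table and frequencies from the slides (" " is SP, "\n" is NL).
CODES = {"\n": "10000", " ": "1111", "a": "000", "b": "10001", "d": "0100",
         "e": "101", "g": "10010", "h": "10011", "i": "0101", "n": "0110",
         "r": "110", "s": "0111", "t": "001", "v": "11100", "y": "11101"}
FREQS = {"\n": 1, " ": 3, "a": 3, "b": 1, "d": 2, "e": 5, "g": 1, "h": 1,
         "i": 2, "n": 2, "r": 5, "s": 2, "t": 3, "v": 1, "y": 1}

lengths = sorted({len(code) for code in CODES.values()})
print(lengths)  # [3, 4, 5]: the code is variable length

# A more frequent character never receives a longer code than a rarer one.
ok = all(len(CODES[x]) <= len(CODES[y])
         for x in CODES for y in CODES if FREQS[x] > FREQS[y])
print(ok)  # True
```

The check is "never longer" rather than strictly shorter, because e and r (frequency 5) share the 3-bit length with a and t (frequency 3).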
Huffman Encoding

Original: traversing threaded binary trees

Encoded:
t   r   a   v     e
001 110 000 11100 101

traversing  0011100001110010111001110101011010010
SP          1111
threaded    0011001111010100001001010100
SP          1111
binary      100010101011000011011101
SP          1111
trees       0011101011010111
NL          10000

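Encoding is a table lookup per character followed by concatenation. A minimal sketch, with the lecture's code table copied into a dictionary (`CODES` and `encode` are our own names):

```python
# Code table from the lecture (" " is SP, "\n" is NL).
CODES = {"\n": "10000", " ": "1111", "a": "000", "b": "10001", "d": "0100",
         "e": "101", "g": "10010", "h": "10011", "i": "0101", "n": "0110",
         "r": "110", "s": "0111", "t": "001", "v": "11100", "y": "11101"}


def encode(text, codes):
    """Replace each character by its code and concatenate the results."""
    return "".join(codes[ch] for ch in text)


bits = encode("traversing threaded binary trees\n", CODES)
print(bits[:17])   # 00111000011100101 (t, r, a, v, e)
print(len(bits))   # 122
```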
Huffman Encoding

Original: traversing threaded binary trees

With 8 bits per character, the 33 characters (including the newline) take 33 × 8 = 264 bits.

Encoded:
0011100001110010111001110101011010
0101111001100111101010000100101010
0111110001010101100001101110111110
01110101101011110000

Compressed into 122 bits, a 54% reduction (1 − 122/264 ≈ 0.54).

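The 122-bit figure can also be obtained without encoding anything: multiply each character's frequency by its code length and add. A sketch with the slide values copied into dictionaries of our own naming:

```python
# Character frequencies and Huffman code lengths taken from the slides.
FREQS = {"\n": 1, " ": 3, "a": 3, "b": 1, "d": 2, "e": 5, "g": 1, "h": 1,
         "i": 2, "n": 2, "r": 5, "s": 2, "t": 3, "v": 1, "y": 1}
CODE_LEN = {"\n": 5, " ": 4, "a": 3, "b": 5, "d": 4, "e": 3, "g": 5, "h": 5,
            "i": 4, "n": 4, "r": 3, "s": 4, "t": 3, "v": 5, "y": 5}

fixed_bits = 8 * sum(FREQS.values())                        # 8-bit ASCII
huffman_bits = sum(FREQS[c] * CODE_LEN[c] for c in FREQS)   # frequency * code length
reduction = 1 - huffman_bits / fixed_bits
print(fixed_bits, huffman_bits, f"{reduction:.0%}")  # 264 122 54%
```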
Mathematical Properties of Binary Trees

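Properties of Binary Trees

Property: A binary tree with N internal nodes has N+1 external nodes.
(The internal nodes are the tree's actual nodes; the external nodes are its empty subtrees, i.e. the null child links.)
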
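The property can be checked by counting both kinds of node in a tree. This is a minimal sketch: the `Node` class, the `count_nodes` helper and the 9-node tree are hypothetical, chosen only for illustration (the tree deliberately includes nodes with a single child). Every `None` child is treated as one external node.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def count_nodes(node):
    """Return (internal, external) counts; every None child is an external node."""
    if node is None:
        return 0, 1                      # an empty subtree counts as one external node
    left_int, left_ext = count_nodes(node.left)
    right_int, right_ext = count_nodes(node.right)
    return 1 + left_int + right_int, left_ext + right_ext


# Hypothetical 9-node tree; some nodes have only one child.
root = Node(Node(Node(), None),
            Node(Node(None, Node()), Node(Node(), Node())))
print(count_nodes(root))  # (9, 10)
```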
Properties of Binary Trees
A binary tree with N internal nodes has N+1 external nodes.
[Figure: an example binary tree rooted at A, with internal nodes: 9 and external nodes: 10; one internal node and one external node are labeled.]
Properties of Binary Trees

Property: A binary tree with N internal nodes has 2N links: N-1 links to internal nodes and N+1 links to external nodes.

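Each node has exactly two link fields, so the links can be classified by where they point. In this sketch the `Node` class, the `count_links` helper and the 9-node tree are hypothetical and serve only as an illustration; a `None` child counts as a link to an external node.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def count_links(node):
    """Return (links to internal nodes, links to external nodes) in the subtree."""
    if node is None:
        return 0, 0
    to_internal = to_external = 0
    for child in (node.left, node.right):   # each node has exactly two link fields
        if child is None:
            to_external += 1
        else:
            to_internal += 1
            sub_int, sub_ext = count_links(child)
            to_internal += sub_int
            to_external += sub_ext
    return to_internal, to_external


# Hypothetical 9-node tree (N = 9): expect N-1 = 8 and N+1 = 10.
root = Node(Node(Node(), None),
            Node(Node(None, Node()), Node(Node(), Node())))
print(count_links(root))  # (8, 10)
```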
Threaded Binary Tree
Property: A binary tree with N internal nodes has 2N links: N-1 links to internal nodes and N+1 links to external nodes.
[Figure: the example tree with its links marked; internal links: 8, external links: 10.]
Properties of Binary Trees

Property: A binary tree with N internal nodes has 2N links: N-1 links to internal nodes and N+1 links to external nodes.
• In every rooted tree, each node except the root has a unique parent.
• Every link connects a node to its parent, so there are N-1 links connecting internal nodes.
• Similarly, each of the N+1 external nodes has one link to its parent.
• Thus (N-1) + (N+1) = 2N links.
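Counting link fields runs the same argument in the other direction and also recovers the first property. Every internal node has exactly two link fields, and each external node hangs from exactly one of them:

```latex
\begin{aligned}
\text{link fields} &= 2N && \text{(two per internal node)}\\
\text{links to internal nodes} &= N - 1 && \text{(one per node other than the root)}\\
\text{links to external nodes} &= 2N - (N - 1) = N + 1
\end{aligned}
```

Since each external node is reached by exactly one link, N+1 is also the number of external nodes.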
