Huffman Coding

This document discusses Huffman coding, which is a variable-length encoding technique used for lossless data compression. It constructs a prefix code that assigns shorter code lengths to more frequent characters and longer code lengths to less frequent characters. The optimal code is represented by a full binary tree where every non-leaf node has two children. It describes how to construct a Huffman code by building a binary tree from the character frequencies using a priority queue. The time complexity of this algorithm is O(n log n).


Greedy Algorithms

Dr. Tathagata Ray


Professor, BITS Pilani, Hyderabad Campus
Huffman Codes

• Suppose there is a file with the following content:

• “there was a brown crow he was very intelligent”

• Ignoring spaces, this text has 38 characters drawn from 16 distinct letters, with the following frequencies:

Char  a  b  c  e  g  h  i  l  n  o  r  s  t  v  w  y
Freq  3  1  1  6  1  2  2  2  3  2  4  2  3  1  4  1

• We have to encode this file so that each character is represented by a unique binary string, called a codeword.

BITS Pilani, Hyderabad Campus


Codewords

• If we use a fixed-length code, we need 4 bits per character to distinguish these 16 characters (2^4 = 16).

• Thus encoding the 38 characters of the text takes 38 × 4 = 152 bits.

• Can we do better?
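
The counts above can be checked with a short Python sketch. It assumes, as the frequency table does, that spaces are not encoded; the variable names are illustrative only.

```python
from collections import Counter
from math import ceil, log2

text = "there was a brown crow he was very intelligent"

# Assumption: spaces are ignored, matching the frequency table on the slide.
freq = Counter(ch for ch in text if ch != " ")

n_symbols = len(freq)                    # number of distinct characters
n_chars = sum(freq.values())             # number of characters to encode
bits_per_char = ceil(log2(n_symbols))    # width of a fixed-length codeword
fixed_bits = n_chars * bits_per_char     # total size with a fixed-length code

print(sorted(freq.items()))
print(n_symbols, n_chars, bits_per_char, fixed_bits)  # 16 38 4 152
```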

Variable-length code

• A variable-length code can do considerably better than a fixed-length code.

• Give frequent characters short codewords and infrequent characters long codewords.

• For example, for a file of 100,000 characters over the alphabet {a, b, c, d, e, f}:

Char            a      b      c      d      e     f     Total
Freq            45000  13000  12000  16000  9000  5000  100000 chars
Fixed-length    000    001    010    011    100   101   300000 bits
Variable-length 0      101    100    111    1101  1100  224000 bits
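
The two totals in the table can be reproduced by summing frequency times codeword length for each character. The following is a minimal sketch; `total_bits` is a helper name introduced here for illustration.

```python
freq = {"a": 45000, "b": 13000, "c": 12000, "d": 16000, "e": 9000, "f": 5000}
fixed = {"a": "000", "b": "001", "c": "010", "d": "011", "e": "100", "f": "101"}
var = {"a": "0", "b": "101", "c": "100", "d": "111", "e": "1101", "f": "1100"}

def total_bits(freq, code):
    """Total encoded size: each character costs the length of its codeword."""
    return sum(freq[ch] * len(code[ch]) for ch in freq)

fixed_total = total_bits(freq, fixed)
var_total = total_bits(freq, var)
print(fixed_total, var_total)  # 300000 224000
```

For this file the variable-length code saves 76,000 bits, roughly a quarter of the fixed-length size.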

Prefix codes

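• We consider here only codes in which no codeword is also a prefix of some other codeword.

• These are called prefix codes.

• Prefix codes are simple to decode: because no codeword is a prefix of another, the codeword that begins the encoded file is unambiguous.

• For example, with the variable-length code above, 001011101 parses uniquely as 0.0.101.1101, i.e. aabe.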
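
As a rough illustration of why decoding is simple, the sketch below reads bits one at a time and emits a character as soon as the bits read so far match a codeword. The function name `decode` and the dictionary-based lookup are choices made for this example, not something prescribed by the slides.

```python
code = {"a": "0", "b": "101", "c": "100", "d": "111", "e": "1101", "f": "1100"}

def decode(bits, code):
    """Decode a bit string under a prefix code by greedy left-to-right matching."""
    inverse = {word: ch for ch, word in code.items()}
    out = []
    current = ""
    for b in bits:
        current += b
        # In a prefix code, the first match is the only possible match.
        if current in inverse:
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(out)

print(decode("001011101", code))  # aabe
```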

Decoding process

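• Decoding needs a convenient representation of the prefix code so that we can easily pick off the initial codeword.

• A binary tree whose leaves are the given characters provides such a representation: the codeword for a character is the path from the root to its leaf, where 0 means "go to the left child" and 1 means "go to the right child".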

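Tree for the variable-length code (frequencies in thousands; each child is listed under its parent with its edge label):

100
  0: a:45
  1: 55
    0: 25
      0: c:12
      1: b:13
    1: 30
      0: 14
        0: f:5
        1: e:9
      1: d:16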
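
The tree above can be sketched in Python as nested tuples, where an internal node is a `(left, right)` pair and a leaf is a one-character string. This representation and the function names are assumptions made for the example. Decoding walks from the root, going left on 0 and right on 1, and restarts at the root whenever a leaf is reached.

```python
# Internal node = (left, right); leaf = character. Mirrors the tree above.
tree = ("a", (("c", "b"), (("f", "e"), "d")))

def codewords(node, prefix=""):
    """Read each character's codeword off the tree as its root-to-leaf path."""
    if isinstance(node, str):
        return {node: prefix}
    left, right = node
    return {**codewords(left, prefix + "0"), **codewords(right, prefix + "1")}

def decode_with_tree(bits, root):
    """Follow 0 = left, 1 = right; emit a character at each leaf and restart."""
    out = []
    node = root
    for b in bits:
        node = node[0] if b == "0" else node[1]
        if isinstance(node, str):
            out.append(node)
            node = root
    return "".join(out)

print(codewords(tree))
print(decode_with_tree("001011101", tree))  # aabe
```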

Optimal code

• An optimal code for a file is always represented by a full binary tree, in which every non-leaf node has two children.

• If C is the alphabet from which the characters are drawn and all character frequencies are positive, then the tree for an optimal prefix code has exactly |C| leaves and exactly |C| - 1 internal nodes.
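
The leaf and internal-node counts can be checked on the six-character example tree. The sketch below assumes a nested-tuple representation (internal node = `(left, right)`, leaf = character) chosen for illustration.

```python
# Example tree for C = {a, b, c, d, e, f}; every internal node has two children.
tree = ("a", (("c", "b"), (("f", "e"), "d")))

def count_nodes(node):
    """Return (number of leaves, number of internal nodes) of the tree."""
    if isinstance(node, str):
        return 1, 0
    left_leaves, left_internal = count_nodes(node[0])
    right_leaves, right_internal = count_nodes(node[1])
    return left_leaves + right_leaves, left_internal + right_internal + 1

leaves, internal = count_nodes(tree)
print(leaves, internal)  # 6 5
```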

Cost of a tree

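• Given a tree T corresponding to a prefix code, the number of bits required to encode a file is given by

  $B(T) = \sum_{c \in C} c.\mathit{freq} \cdot d_T(c)$

• Here c.freq is the frequency of character c in the file and d_T(c) is the depth of c's leaf in T, which is also the length of the codeword for c.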
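
As a worked instance of this cost, the sketch below sums frequency times leaf depth over the six-character example tree, with frequencies in thousands. The nested-tuple tree and the name `tree_cost` are assumptions made for the example.

```python
freq = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}  # in thousands
tree = ("a", (("c", "b"), (("f", "e"), "d")))  # internal = (left, right)

def tree_cost(freq, node, depth=0):
    """Sum of frequency * depth over all leaves of the tree."""
    if isinstance(node, str):
        return freq[node] * depth
    return (tree_cost(freq, node[0], depth + 1)
            + tree_cost(freq, node[1], depth + 1))

cost = tree_cost(freq, tree)
print(cost)  # 224, i.e. 224,000 bits
```

Term by term this is 45·1 + 12·3 + 13·3 + 5·4 + 9·4 + 16·3 = 224, matching the variable-length total in the earlier table.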

Constructing a Huffman code

HUFFMAN(C)

n = |C|
Q = C    // min-priority queue keyed on freq
for i = 1 to n - 1
    allocate a new node z
    z.left = x = EXTRACT-MIN(Q)
    z.right = y = EXTRACT-MIN(Q)
    z.freq = x.freq + y.freq
    INSERT(Q, z)
return EXTRACT-MIN(Q)    // return root of the tree
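
One way to sketch the HUFFMAN procedure in Python is with the standard `heapq` module as the min-priority queue. The nested-tuple tree, the tie-breaking counter, and the helper `codewords` are assumptions of this sketch rather than part of the pseudocode; the input is assumed to contain at least one character.

```python
import heapq
from itertools import count

def huffman(freq):
    """Build a Huffman tree: internal node = (left, right), leaf = character."""
    tie = count()  # unique tie-breaker so the heap never compares tree nodes
    heap = [(f, next(tie), ch) for ch, f in freq.items()]
    heapq.heapify(heap)                      # build the queue Q
    for _ in range(len(freq) - 1):           # n - 1 merges
        fx, _tx, x = heapq.heappop(heap)     # first EXTRACT-MIN -> left child
        fy, _ty, y = heapq.heappop(heap)     # second EXTRACT-MIN -> right child
        heapq.heappush(heap, (fx + fy, next(tie), (x, y)))
    return heapq.heappop(heap)[2]            # root of the tree

def codewords(node, prefix=""):
    """Read codewords off the tree: 0 for a left edge, 1 for a right edge."""
    if isinstance(node, str):
        return {node: prefix}
    return {**codewords(node[0], prefix + "0"),
            **codewords(node[1], prefix + "1")}

freq = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
root = huffman(freq)
print(codewords(root))
```

On these frequencies the sketch yields a = 0, c = 100, b = 101, f = 1100, e = 1101, d = 111. With tied frequencies, a different tie-breaking rule can produce a different tree of the same cost.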

Example

• Initial queue, ordered by frequency (in thousands): f:5 e:9 c:12 b:13 d:16 a:45

• Step 1: merge f:5 and e:9 into a node of frequency 14. Queue: c:12 b:13 14 d:16 a:45

• Step 2: merge c:12 and b:13 into a node of frequency 25. Queue: 14 d:16 25 a:45

• Step 3: merge 14 and d:16 into a node of frequency 30. Queue: 25 30 a:45

• Step 4: merge 25 and 30 into a node of frequency 55. Queue: a:45 55

• Step 5: merge a:45 and 55 into the root, of frequency 100.

• In every merge, the first node extracted becomes the left child (edge 0) and the second becomes the right child (edge 1). The resulting codewords are a = 0, c = 100, b = 101, f = 1100, e = 1101, d = 111.

Time Complexity

• If we use a binary heap to implement the min-priority queue Q, building the queue from the n = |C| characters takes O(n) time.

• Each EXTRACT-MIN and INSERT operation takes O(lg n) time and the for loop runs n - 1 times, hence the total time taken is O(n lg n).
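
To make the operation counts concrete, the sketch below runs the merge loop on a list of frequencies and counts heap operations. The function name is hypothetical, and Python's `heapq` stands in for the heap-based queue.

```python
import heapq

def count_heap_operations(freqs):
    """Return (extract-min count, insert count) for the Huffman merge loop."""
    heap = list(freqs)
    heapq.heapify(heap)          # O(n) queue construction
    extracts = inserts = 0
    while len(heap) > 1:         # runs n - 1 times
        x = heapq.heappop(heap)  # O(lg n)
        y = heapq.heappop(heap)  # O(lg n)
        extracts += 2
        heapq.heappush(heap, x + y)  # O(lg n)
        inserts += 1
    return extracts, inserts

print(count_heap_operations([45, 13, 12, 16, 9, 5]))  # (10, 5)
```

For n characters the loop performs 2(n - 1) extractions and n - 1 insertions, each costing O(lg n), which gives the O(n lg n) bound.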
