0% found this document useful (0 votes)
39 views

Huffman Coding

This document discusses Huffman coding, which is a variable-length encoding technique used for lossless data compression. It constructs a prefix code that assigns shorter code lengths to more frequent characters and longer code lengths to less frequent characters. The optimal code is represented by a full binary tree where every non-leaf node has two children. It describes how to construct a Huffman code by building a binary tree from the character frequencies using a priority queue. The time complexity of this algorithm is O(n log n).

Uploaded by

f20210467
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
39 views

Huffman Coding

This document discusses Huffman coding, which is a variable-length encoding technique used for lossless data compression. It constructs a prefix code that assigns shorter code lengths to more frequent characters and longer code lengths to less frequent characters. The optimal code is represented by a full binary tree where every non-leaf node has two children. It describes how to construct a Huffman code by building a binary tree from the character frequencies using a priority queue. The time complexity of this algorithm is O(n log n).

Uploaded by

f20210467
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 15

Greedy Algorithms

Dr. Tathagata Ray


Professor, BITS Pilani, Hyderabad Campus
BITS Pilani [email protected]
Hyderabad Campus
Huffman Codes

• Suppose there is a file with the following content.

• “there was a brown crow he was very intelligent”

• In this text we have

Char a b c e g h i l n o r s t v w y
Freq 3 1 1 6 1 2 2 2 3 2 4 2 3 1 4 1

• We have to encode this file such that each character is


represented by a unique binary string called codewords.

BITS Pilani, Hyderabad Campus


Codewords

• If we use fixed length code, we require 4 bits to


represent this 16 characters.

• Thus we will encode the text using 38x4 = 152 bits.

• Can we do better?

BITS Pilani, Hyderabad Campus


Variable-length code

• Variable-length code can do considerably better than a


fixed length code.

• Give frequent characters short codewords and infrequent


character long codewords.
• For example
Char a b c d e f Total
bits
Freq 45000 13000 12000 16000 9000 5000 100000
Fixed 000 001 010 011 100 101 300000
len.
Var. 0 101 100 111 1101 1100 224,000
code

BITS Pilani, Hyderabad Campus


Prefix codes

• We consider here only codes in which no codeword is


also a prefix of some other codeword.

• These are called prefix codes.

• Prefix codes are simple to decode.

• For example 001011101 parses uniquely to 0.0.101.1101


i.e. aabe.

BITS Pilani, Hyderabad Campus


Decoding process

• Decoding needs a convenient representation for the


prefix code so that we can easily pick off the initial
codeword.

• A binary tree whose leaves are the given characters


provide such a representation.

BITS Pilani, Hyderabad Campus


100
0 1
a:45 55
0 1
25 30
0 1 0 1
c:12 b:13 14 d:16
0 1
f:5 e:9

BITS Pilani, Hyderabad Campus


Optimal code

• An optimal code for a file is always represented by a full


binary tree, in which every non leaf node has two
children.

• If C is the alphabet from which the characters are drawn


and all character frequencies are positive then the tree
for an optimal prefix code has exactly |C| leaves and
exactly |C-1| internal nodes.

BITS Pilani, Hyderabad Campus


Cost of a tree

• Given a tree T corresponding to a prefix code, the


number of bits required to encode a file is given by

• .
Depth of c in T

Frequency of character c

BITS Pilani, Hyderabad Campus


Constructing a Huffman code

HUFFMAN(C)

1. For to -1
2. allocate a new node .
3. EXTRACT-MIN()
4. EXTRACT-MIN()

5. INSERT(, )
6. return EXTRACT-MIN() //return root of the tree

BITS Pilani, Hyderabad Campus


Example
f:5 e:9 c:12 b:13 d:16 a:45

c:12 b:13 14 d:16 a:45


0 1
f:5 e:9

14 d:16 25 a:45
0 1 0 1
f:5 e:9 c:12 b:13

BITS Pilani, Hyderabad Campus


Example

14 d:16 25 a:45
0 1 0 1
f:5 e:9 c:12 b:13

30 a:45
25
0 1
0 1
c:12 b:13 14 d:16
0 1
f:5 e:9

BITS Pilani, Hyderabad Campus


Example
30 a:45
25
0 1
0 1
c:12 b:13 14 d:16
0 1
f:5 e:9

a:45 55 1
0

25 30
0 1
0 1
c:12 b:13 14 d:16
0 1
f:5 e:9
BITS Pilani, Hyderabad Campus
Example
a:45 55 1
0

25 30
0 1
0 1
c:12 b:13 14 d:16
0 1
100 f:5 e:9
0 1

a:45 55 1
0

25 30
0 1
0 1
c:12 b:13 14 d:16
0 1
f:5 e:9 BITS Pilani, Hyderabad Campus
Time Complexity

• If we use heap data structure to implement the minimum


priority queue Q then constructing the queue will take
O(n) time.

• Each extract minimum operation takes O(lg n) time and


the For loop runs for n-1 time, hence the total time taken
is O(n lg n).

BITS Pilani, Hyderabad Campus

You might also like