
Lec 05

The document discusses message encoding techniques, particularly focusing on prefix-free encoding and Huffman encoding. It explains the importance of frequency in determining the optimal encoding strategy and introduces a recursive algorithm for constructing the Huffman tree. The document also includes examples and questions to illustrate the encoding process and its efficiency.

Uploaded by

Sunaina Das

COL 351:

Analysis and Design of Algorithms


Lecture 5
Message Encoding

Bob and Alice exchange a message of n = 1000 characters over the alphabet {A, B, C, D}.

Frequency Percentage
A − 40 %
B − 10 %
C − 20 %
D − 30 %

Natural Approach (2 bits per character):
A − 00
B − 01
C − 10
D − 11

Eg: C A B A D
Encoding: 1 0 0 0 0 1 0 0 1 1
QUESTION: The natural approach uses 2 bits per character, i.e. 1000 × 2 = 2000 bits in total. Can we compress beyond 2000 bits?
Alternate Way (code-words of different lengths):
A − 0
B − 100
C − 101
D − 11

Length = 400 (1) + 100 (3) + 200 (3) + 300 (2) = 1900 bits

Question: Can there be unique encoding/decoding?
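The two totals (2000 bits for the natural approach, 1900 bits for the alternate way) can be checked with a short script. This is a minimal sketch: the names `counts`, `encoded_length` and `encode` are illustrative and do not come from the lecture.

```python
# Symbol counts in a message of n = 1000 characters (40 %, 10 %, 20 %, 30 %).
counts = {"A": 400, "B": 100, "C": 200, "D": 300}

fixed_code = {"A": "00", "B": "01", "C": "10", "D": "11"}
variable_code = {"A": "0", "B": "100", "C": "101", "D": "11"}


def encoded_length(counts, code):
    """Total number of bits: sum over symbols of count * code-word length."""
    return sum(counts[s] * len(code[s]) for s in counts)


def encode(message, code):
    """Concatenate the code-words of the characters of the message."""
    return "".join(code[ch] for ch in message)


print(encoded_length(counts, fixed_code))     # 2000
print(encoded_length(counts, variable_code))  # 1900
print(encode("CABAD", variable_code))         # 1010100011
```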
Eg: C A B A D (using the alternate way)
Encoding: 1 0 1 0 1 0 0 0 1 1

Decoding, reading left to right and emitting a symbol as soon as the bits read so far form a code-word:
101 → C
0 → A
100 → B
0 → A
11 → D

Decoding: C A B A D
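The left-to-right decoding of the example can be sketched as follows. The function name `decode` is an illustrative choice, and the sketch assumes the code is prefix-free, so the first code-word that matches the bits read so far is the only possible one.

```python
def decode(bits, code):
    """Decode a bit string left to right, assuming `code` is prefix-free."""
    inverse = {word: symbol for symbol, word in code.items()}
    symbols = []
    current = ""
    for bit in bits:
        current += bit
        if current in inverse:
            # A complete code-word has been read: emit its symbol and restart.
            symbols.append(inverse[current])
            current = ""
    if current:
        raise ValueError("bit string ends in the middle of a code-word")
    return "".join(symbols)


code = {"A": "0", "B": "100", "C": "101", "D": "11"}
print(decode("1010100011", code))  # CABAD
```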
Prefix-free Encoding

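Definition:
An encoding is prefix-free if, whenever (x1, …, xk) is a code-word, no proper prefix (x1, …, xj) with j < k is also a code-word.

Example
A − 0
B − 100
C − 101
D − 11

Binary-Tree Representation (edge 0 = left child, edge 1 = right child):

Root
├─0─ A
└─1─ •
     ├─0─ •
     │    ├─0─ B
     │    └─1─ C
     └─1─ D

Remark: Symbols can only be leaf nodes.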
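The definition can be tested mechanically. The sketch below is one possible check; the function name `is_prefix_free` and the second code `bad` are hypothetical illustrations, not from the slides. It relies on the fact that after lexicographic sorting, a word that is a prefix of some other word is also a prefix of its immediate successor.

```python
def is_prefix_free(code):
    """Return True if no code-word is a prefix of another code-word.

    Two symbols sharing the same code-word also count as a violation.
    """
    words = sorted(code.values())
    # Words starting with a given prefix are contiguous in sorted order,
    # so it is enough to compare each word with the next one.
    return all(not nxt.startswith(cur) for cur, nxt in zip(words, words[1:]))


good = {"A": "0", "B": "100", "C": "101", "D": "11"}
bad = {"A": "0", "B": "01", "C": "11"}  # hypothetical: "0" is a prefix of "01"

print(is_prefix_free(good))  # True
print(is_prefix_free(bad))   # False
```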


Encoding Problem

Given: Symbols (a1, …, an) with frequency F = ( f1, …, fn).

Aim: Find prefix-free encoding for which “encoded-message” has minimum length.

Equivalently, find a binary tree T with leaves a1, …, an that minimizes the following:

\[ \sum_{i=1}^{n} f_i \times \mathrm{depth}(a_i, T). \]
Remark 1: Each internal node in an optimal binary tree T must have exactly two children.

Remark 2: A symbol with the smallest frequency should be at the maximum depth.

Trivial approach?
Which Greedy Strategy works?

Ques. If symbols a1, …, an satisfy f1 ⩾ f2 ⩾ ⋯ ⩾ fn, then can we choose T to be the following (right-skewed) tree?

Root
├─0─ a1
└─1─ •
     ├─0─ a2
     └─1─ ⋯
          ├─0─ an−1
          └─1─ an
Ans: No!
(If there are 2^k symbols, all with identical frequencies, then T must be a balanced tree of depth k.)
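A small numeric check makes the counterexample concrete. This is a sketch with illustrative helper names (`cost`, `skewed_depths`, `balanced_depths`); it compares the right-skewed tree with the balanced tree for 2^3 = 8 symbols of equal frequency.

```python
def cost(freqs, depths):
    """Encoded length: sum of frequency * depth over all symbols."""
    return sum(f * d for f, d in zip(freqs, depths))


def skewed_depths(n):
    """Depths in the right-skewed tree: a1 at depth 1, a2 at depth 2, ...,
    with the last two symbols as siblings at depth n - 1."""
    return list(range(1, n)) + [n - 1]


def balanced_depths(k):
    """Depths in the balanced tree on 2**k leaves: every leaf at depth k."""
    return [k] * (2 ** k)


k = 3
n = 2 ** k
freqs = [1] * n  # all frequencies identical

print(cost(freqs, skewed_depths(n)))    # 35
print(cost(freqs, balanced_depths(k)))  # 24
```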
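Lemma 1. If symbols a1, …, an satisfy f1 ⩾ f2 ⩾ ⋯ ⩾ fn, then there exists an optimal tree T in which
• an and an−1 are at maximum depth, and
• they are siblings.

Proof: Suppose this is not the case for some optimal tree T.

Then we can take two siblings ai and aj in the last layer of T (they exist because every internal node has exactly two children), and perform swaps:

• ai ⟷ an
• aj ⟷ an−1

A swap is skipped if the symbol is already in place. Each swap moves a symbol of no larger frequency to a depth that is at least as large, so the cost does not increase. The resulting tree is therefore also optimal, and in it an and an−1 are siblings at maximum depth.

Main Idea: Merge the two smallest frequencies into one by adding them, and find a solution to the new, smaller problem.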
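The swap step in the proof of Lemma 1 can be written out as a one-line calculation. The notation here is an assumption made for illustration: d_i and d_n denote the depths of a_i and a_n in T, and T' is the tree after swapping a_i and a_n. Only these two terms of the cost change:

```latex
\[
\mathrm{cost}(T') - \mathrm{cost}(T)
  = \bigl(f_n d_i + f_i d_n\bigr) - \bigl(f_i d_i + f_n d_n\bigr)
  = (f_n - f_i)(d_i - d_n) \le 0,
\]
```

since f_n ⩽ f_i and d_i ⩾ d_n (a_i is at maximum depth). The same calculation applies to the swap of a_j and a_{n−1}.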
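Reducing the problem

Original problem F:
A − 40 %
B − 10 %
C − 20 %
D − 30 %

Smaller instance F* (B and C replaced by Z, with 10 % + 20 % = 30 %):
A − 40 %
Z − 30 %
D − 30 %

Optimal tree for F*:

Root
├─0─ A
└─1─ •
     ├─0─ Z
     └─1─ D

Tree for F, obtained by adding B and C as children of Z:

Root
├─0─ A
└─1─ •
     ├─0─ • (was Z)
     │    ├─0─ B
     │    └─1─ C
     └─1─ D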
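In terms of code-words, going back from the smaller instance to the original one only touches the merged symbol. A minimal sketch, assuming codes are stored as dictionaries; the function name `expand` and its parameters are illustrative:

```python
def expand(code_star, merged, left, right):
    """Turn a code for the reduced instance into a code for the original one.

    The code-word w of the merged symbol is replaced by w + "0" for `left`
    and w + "1" for `right`; all other code-words are unchanged.
    """
    code = {s: w for s, w in code_star.items() if s != merged}
    code[left] = code_star[merged] + "0"
    code[right] = code_star[merged] + "1"
    return code


code_star = {"A": "0", "Z": "10", "D": "11"}  # read off the tree for F*
print(expand(code_star, "Z", "B", "C"))
# {'A': '0', 'D': '11', 'B': '100', 'C': '101'}
```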
Algorithm (Huffman Encoding)
SOLVE({a1, …, an})

Base case: if n = 2, return the tree whose root has a1 and a2 as its two children.

1. Suppose an, an−1 have the smallest frequencies.

2. Replace an, an−1 with a new symbol ã and set f̃ = fn + fn−1.

3. T̃ = SOLVE({a1, …, an−2, ã}).

4. Add an, an−1 as children of ã in tree T̃, and return the new tree.
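The recursive procedure can be sketched directly in Python. Several details are assumptions made for this illustration and are not in the slides: the function name `solve`, the dictionary representation of frequencies and codes, the base case for two symbols, and the use of a tuple as the fresh merged symbol. Re-sorting at every level keeps the sketch short but is not efficient.

```python
def solve(freqs):
    """Recursive Huffman construction; returns {symbol: code-word}.

    `freqs` maps each symbol to its frequency and needs at least two symbols.
    """
    if len(freqs) == 2:
        # Base case (assumed): two symbols get one bit each.
        x, y = freqs
        return {x: "0", y: "1"}
    # Step 1: pick the two symbols of smallest frequency.
    a, b = sorted(freqs, key=freqs.get)[:2]
    # Step 2: replace them by a merged symbol with the summed frequency.
    merged = (a, b)  # a tuple, so it cannot clash with string symbols
    reduced = {s: f for s, f in freqs.items() if s not in (a, b)}
    reduced[merged] = freqs[a] + freqs[b]
    # Step 3: solve the smaller instance.
    code = solve(reduced)
    # Step 4: attach a and b as children of the merged symbol.
    word = code.pop(merged)
    code[a] = word + "0"
    code[b] = word + "1"
    return code


freqs = {"A": 40, "B": 10, "C": 20, "D": 30}
code = solve(freqs)
print(sum(freqs[s] * len(code[s]) for s in freqs))  # 190
```

The code-words may differ from the ones on the slides by exchanging 0 and 1 at some nodes, but the code-word lengths (1, 3, 3, 2 for A, B, C, D) and hence the cost agree.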
Correctness

Lemma 2: Suppose that in the instance F = (f1, …, fn), the indices i, j ∈ [1, n] are such that there is an optimal tree in which ai and aj are siblings. Then for the new instance F* = (F∖{fi, fj}) + f̃, where f̃ = fi + fj, we have
opt-msg-length(F) = opt-msg-length(F*) + f̃.

Proof (Part 1):

(1) Let T*opt be an optimal solution for F* = (F∖{fi, fj}) + f̃.

(2) Let T be the tree obtained by adding ai and aj as children of ã in T*opt.

(3) So, we get a solution for F with msg-length opt-msg-length(F*) + f̃, because ai and aj lie one level below ã.

(4) This implies LHS ⩽ RHS.

(Figure: construction of T, with ai and aj attached below ã in T*opt.)
Proof (Part 2):

(1) Let Topt be an optimal solution for F in which ai and aj are siblings.

(2) Let T* be the tree obtained by removing ai and aj from Topt, and marking their common parent as ã.

(3) So, we get a solution for F* with msg-length opt-msg-length(F) − f̃.

(4) This implies RHS ⩽ LHS.

(Figure: construction of T*, with ai and aj removed from Topt and their parent marked ã.)
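Lemma 2 can be checked on the running example with B and C merged into Z, so f̃ = 10 + 20 = 30. The check uses the code-word lengths from the two trees of the "Reducing the problem" slide, with costs in percentage units:

```latex
\[
\underbrace{40 \cdot 1 + 10 \cdot 3 + 20 \cdot 3 + 30 \cdot 2}_{\text{tree for } F:\ 190}
\;=\;
\underbrace{40 \cdot 1 + 30 \cdot 2 + 30 \cdot 2}_{\text{tree for } F^{*}:\ 160}
\;+\;
\underbrace{(10 + 20)}_{\tilde f \,=\, 30}
\]
```

In general, if ã is at depth d, then ai and aj at depth d + 1 contribute f_i (d + 1) + f_j (d + 1) = f̃ d + f̃, which is where the extra term f̃ comes from.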
Algorithm (Huffman Encoding)

1. Suppose an, an−1 have the smallest frequencies.

2. Replace an, an−1 with a new symbol ã and set f̃ = fn + fn−1.

3. Recursively solve {a1, …, an−2, ã}, and find an optimal tree T̃.

4. Add an, an−1 as children of ã in tree T̃, and return the new tree.

Homework: Provide an O(n log n) time implementation using min-heaps.
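One possible shape of such an implementation, using Python's standard `heapq` module, is sketched below. It is not the lecture's reference solution; the function name `huffman_code`, the tuple representation of trees, and the tie-breaking counter are assumptions made for this illustration, and symbols are assumed not to be tuples themselves. The loop performs n − 1 merges, each with O(log n) heap work.

```python
import heapq
from itertools import count


def huffman_code(freqs):
    """Build a Huffman code with a min-heap; `freqs` maps symbol -> frequency."""
    tiebreak = count()  # unique counter, so heap entries never compare trees
    # Heap entries are (frequency, tiebreak, tree);
    # a tree is either a symbol (leaf) or a pair (left, right).
    heap = [(f, next(tiebreak), s) for s, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two trees of smallest total frequency.
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (t1, t2)))
    # Read the code-words off the final tree: 0 = left edge, 1 = right edge.
    code = {}
    stack = [(heap[0][2], "")]
    while stack:
        tree, word = stack.pop()
        if isinstance(tree, tuple):
            stack.append((tree[0], word + "0"))
            stack.append((tree[1], word + "1"))
        else:
            code[tree] = word or "0"  # a lone symbol still needs one bit
    return code


freqs = {"A": 40, "B": 10, "C": 20, "D": 30}
code = huffman_code(freqs)
print(sum(freqs[s] * len(code[s]) for s in freqs))  # 190
```

The O(n log n) bound refers to building the tree; writing out all code-words as strings takes time proportional to their total length.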
