Lec 05
Message between Alice and Bob: n = 1000 characters over the alphabet {A, B, C, D}.
Frequency Percentage:
A − 40 %
B − 10 %
C − 20 %
D − 30 %
Natural Approach:
A − 00
B − 01
C − 10
D − 11
Eg: C A B A D
Encoding: 10 00 01 00 11
Message Encoding
QUESTION: The natural approach uses 2 bits per character, i.e. 2000 bits for n = 1000 characters. Can we compress beyond 2000 bits?
Alternate Way:
A − 0
B − 100
C − 101
D − 11
Length = 400 × 1 + 100 × 3 + 200 × 3 + 300 × 2 = 1900 bits
Question: Can there be unique encoding/decoding?
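The length computation above can be reproduced with a few lines of Python. This is a minimal sketch: the names `counts`, `natural`, `alternate` and `encoded_length` are illustrative and not from the lecture, and the counts assume the percentages apply exactly to n = 1000 characters.

```python
# Character counts for n = 1000, derived from the frequency percentages.
counts = {"A": 400, "B": 100, "C": 200, "D": 300}

# The two code tables from the slides.
natural = {"A": "00", "B": "01", "C": "10", "D": "11"}
alternate = {"A": "0", "B": "100", "C": "101", "D": "11"}


def encoded_length(counts, code):
    """Total number of bits: sum over symbols of count x code-word length."""
    return sum(counts[sym] * len(code[sym]) for sym in counts)


print(encoded_length(counts, natural))    # 2000
print(encoded_length(counts, alternate))  # 1900
```

The variable-length table saves bits because the most frequent symbol, A, gets the shortest code-word.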
Eg: C A B A D
Encoding (Alternate Way): 101 0 100 0 11
Decoding (reading left to right): 101 → C, 0 → A, 100 → B, 0 → A, 11 → D
Decoded message: C A B A D
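The left-to-right decoding in the example can be sketched as follows. The function names `encode` and `decode` are illustrative, and the decoder assumes that no code-word is the beginning of another code-word, which holds for this table.

```python
code = {"A": "0", "B": "100", "C": "101", "D": "11"}


def encode(message, code):
    """Concatenate the code-words of the characters in `message`."""
    return "".join(code[ch] for ch in message)


def decode(bits, code):
    """Scan left to right and emit a symbol whenever the bits read so far
    form a code-word. Assumes no code-word is a prefix of another."""
    inverse = {word: sym for sym, word in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a code-word")
    return "".join(out)


bits = encode("CABAD", code)
print(bits)                # 1010100011
print(decode(bits, code))  # CABAD
```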
Prefix-free Encoding
Definition:
An encoding in which, if (x1, …, xk) is a code-word, then NO proper prefix (x1, …, xj) with j < k is a code-word.
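Example:
A − 0
B − 100
C − 101
D − 11
Binary-Tree Representation:
[Figure: binary tree with root "Root"; the two edges out of each internal node are labelled 0 and 1, and the path from the root to a leaf spells that leaf's code-word. A is at depth 1, D at depth 2, and B and C are siblings at depth 3.]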
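Whether a given table is prefix-free can be checked mechanically. The sketch below compares every ordered pair of code-words; the function name `is_prefix_free` is illustrative, and the third table is a made-up counter-example.

```python
from itertools import permutations


def is_prefix_free(codewords):
    """True if no code-word is a prefix of a different entry in the list."""
    return not any(v.startswith(u) for u, v in permutations(codewords, 2))


print(is_prefix_free(["0", "100", "101", "11"]))  # True  (the example table)
print(is_prefix_free(["00", "01", "10", "11"]))   # True  (fixed-length code)
print(is_prefix_free(["0", "01", "11"]))          # False ("0" is a prefix of "01")
```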
Encoding Problem
Aim: Find a prefix-free encoding for which the “encoded-message” has minimum length, i.e. a binary tree T that minimises
∑_{i=1}^{n} fi × depth(ai, T),
where fi is the frequency of symbol ai and depth(ai, T) is the depth of the leaf ai in T.
Remark 1: Each internal node in an optimal binary tree T must have exactly two children.
Remark 2: A symbol with the smallest frequency should have the greatest depth.
TRIVIAL APPROACH?
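Which Greedy Strategy works?
[Figure: maximally skewed tree. a1 is a child of the root, a2 is a child one level below, and so on; an−1 and an are siblings at the deepest level.]
Is this skewed tree always optimal?
Ans: No!
(If there are 2^k frequencies, all identical, then T must be a balanced tree.)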
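A small numeric check of this counter-example, for four identical frequencies. The values 25, 25, 25, 25 and the name `message_length` are hypothetical and chosen only for illustration.

```python
def message_length(freqs, depths):
    """Sum of frequency x depth over all symbols."""
    return sum(f * d for f, d in zip(freqs, depths))


freqs = [25, 25, 25, 25]   # 2^2 identical frequencies (hypothetical values)
skewed = [1, 2, 3, 3]      # leaf depths in the skewed tree
balanced = [2, 2, 2, 2]    # leaf depths in the balanced tree

print(message_length(freqs, skewed))    # 225
print(message_length(freqs, balanced))  # 200
```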
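Which Greedy Strategy works?
Lemma 1. If symbols a1, …, an satisfy f1 ⩾ f2 ⩾ ⋯ ⩾ fn, then there exists an optimal T in which
• an and an−1 are at maximum depth, and
• they are siblings.
Proof: Take any optimal T and let ai, aj be two sibling leaves at maximum depth (they exist by Remark 1). Swap
• ai ⟷ an
• aj ⟷ an−1
Neither swap increases the message length, because the symbol that moves deeper has the smaller (or equal) frequency. After the swaps, an and an−1 are siblings at maximum depth.
[Figure: tree with sibling leaves ai and aj at the deepest level, and an, an−1 elsewhere in the tree.]
Main Idea: Add the two smallest frequencies and find a solution to the new problem.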
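The effect of the swap ai ⟷ an on the message length can be written out explicitly. The notation here is an assumption for this derivation: d_i = depth(ai, T) is the maximum depth, d_n = depth(an, T) ⩽ d_i, T′ is the tree after the swap, and cost(T) = ∑ fk × depth(ak, T). Only the two swapped leaves change depth, so:

```latex
\begin{aligned}
\mathrm{cost}(T') - \mathrm{cost}(T)
  &= \bigl(f_i\,d_n + f_n\,d_i\bigr) - \bigl(f_i\,d_i + f_n\,d_n\bigr) \\
  &= (f_i - f_n)\,(d_n - d_i) \;\le\; 0,
\end{aligned}
```

since f_i ⩾ f_n and d_n ⩽ d_i. The same computation applies to the swap aj ⟷ an−1.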
Reducing the problem
Original problem F        Smaller instance F*
A − 40 %                  A − 40 %
B − 10 %                  Z − 30 %
C − 20 %                  D − 30 %
D − 30 %
Z replaces the two smallest-frequency symbols B and C, with frequency 10 % + 20 % = 30 %.
[Figure: tree for F* with A as one child of the root and Z, D as the children of the other; the tree for F is the same tree with the leaf Z replaced by an internal node whose children are B and C.]
4. Add an, an−1 as children of ã in tree T̃, and return the new tree.
Correctness
Lemma 2: Suppose that in the instance F = (f1, …, fn) the indices i, j ∈ [1, n] are such that there is an optimal tree in which ai and aj are siblings. Then for the new instance
F* = (F∖{fi, fj}) + f̃, where f̃ = fi + fj, we have
opt-msg-length(F) = opt-msg-length(F*) + f̃.
Construction of T (shows LHS ⩽ RHS):
(1) Let T*opt be an optimal solution for F* = (F∖{fi, fj}) + f̃.
(2) T = tree obtained from T*opt by adding ai and aj as children of the leaf ã.
(3) So, we get a solution for F with msg-length: opt-msg-length(F*) + f̃.
(4) This implies LHS ⩽ RHS.
Construction of T* (shows RHS ⩽ LHS):
(1) Let Topt be an optimal solution for F in which ai and aj are siblings.
(2) T* = tree obtained by removing ai and aj from Topt, and marking the common parent as ã.
[Figure: Topt with the sibling leaves ai and aj crossed out and their common parent marked ã.]
(3) So, we get a solution for F* with msg-length: opt-msg-length(F) − f̃.
(4) This implies RHS ⩽ LHS.
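The identity in Lemma 2 can be sanity-checked on the running example by brute force. This is an illustration under stated assumptions rather than an independent proof: the hypothetical helper `opt_msg_length` tries every pair of symbols as siblings and recurses on the merged instance, which relies on the same merge argument as the lemma. Frequencies are given in percent.

```python
from functools import lru_cache
from itertools import combinations


@lru_cache(maxsize=None)
def opt_msg_length(freqs):
    """Minimum of sum(f * depth) over all binary trees, by exhaustive search.

    `freqs` is a sorted tuple. Every tree with at least two leaves has a pair
    of sibling leaves, so each pair is tried as siblings, merged, and the
    smaller instance is solved recursively. Exponential time: tiny inputs only.
    """
    if len(freqs) <= 1:
        return 0
    best = None
    for i, j in combinations(range(len(freqs)), 2):
        merged = freqs[i] + freqs[j]
        rest = [f for k, f in enumerate(freqs) if k not in (i, j)]
        cost = merged + opt_msg_length(tuple(sorted(rest + [merged])))
        if best is None or cost < best:
            best = cost
    return best


F = (10, 20, 30, 40)     # B, C, D, A
F_star = (30, 30, 40)    # Z = B + C, D, A
print(opt_msg_length(F))             # 190
print(opt_msg_length(F_star) + 30)   # 190
```

The value 190 (percent) corresponds to 1.9 bits per character, i.e. 1900 bits for n = 1000.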
Algorithm (Huffman Encoding)
4. Add an, an−1 as children of ã in tree T̃, and return the new tree.
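The merge-and-expand idea can be sketched in Python. This is an iterative, heap-based variant rather than the recursive formulation on the slide; the function name `huffman_code`, the tie-breaking counter and the handling of a single symbol are assumptions made for this sketch. Symbols are assumed to be strings and the input non-empty.

```python
import heapq
from itertools import count


def huffman_code(freqs):
    """Return {symbol: code-word}, built by repeatedly merging the two
    smallest frequencies. `freqs` maps string symbols to frequencies."""
    if len(freqs) == 1:
        # Degenerate case (assumption): give the lone symbol a 1-bit code.
        return {sym: "0" for sym in freqs}
    tie = count()  # tie-breaker so the heap never has to compare tree nodes
    heap = [(f, next(tie), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # smallest frequency
        f2, _, right = heapq.heappop(heap)   # second smallest
        # The merged node plays the role of the new symbol with f1 + f2.
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))

    code = {}

    def assign(node, prefix):
        # Internal nodes are (left, right) tuples; leaves are symbols.
        if isinstance(node, tuple):
            assign(node[0], prefix + "0")
            assign(node[1], prefix + "1")
        else:
            code[node] = prefix

    assign(heap[0][2], "")
    return code


freqs = {"A": 40, "B": 10, "C": 20, "D": 30}
code = huffman_code(freqs)
print(sum(freqs[s] * len(code[s]) for s in freqs))  # 190
```

The exact bit patterns depend on tie-breaking and on which child gets 0 or 1, so they may differ from the table in the slides; the code-word lengths (1 for A, 2 for D, 3 for B and C) and hence the total length are the same.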