Lec 05

The document discusses message encoding techniques, particularly focusing on prefix-free encoding and Huffman encoding. It explains the importance of frequency in determining the optimal encoding strategy and introduces a recursive algorithm for constructing the Huffman tree. The document also includes examples and questions to illustrate the encoding process and its efficiency.

COL 351:

Analysis and Design of Algorithms

Lecture 5
Message Encoding

Bob and Alice exchange a message of n = 1000 characters over the alphabet {A, B, C, D}.

Frequency Percentage:
A − 40 %
B − 10 %
C − 20 %
D − 30 %

Natural Approach (2 bits per character):
A − 00
B − 01
C − 10
D − 11

Eg: C A B A D
Encoding: 10 00 01 00 11
QUESTION: The natural approach uses 1000 × 2 = 2000 bits. Can we compress beyond 2000 bits?
Alternate Way (variable-length code-words):
A − 0
B − 100
C − 101
D − 11

Length = 400 (1) + 100 (3) + 200 (3) + 300 (2) = 1900

Question: Can there be unique encoding/decoding?
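
The two totals above (2000 bits and 1900 bits) can be reproduced with a few lines of Python. This is only an illustrative sketch: the dictionary names and the helper `message_length` are assumptions introduced here, and the counts are the slide percentages applied to n = 1000.

```python
# Character counts for n = 1000, taken from the frequency percentages.
counts = {"A": 400, "B": 100, "C": 200, "D": 300}

fixed_code = {"A": "00", "B": "01", "C": "10", "D": "11"}
variable_code = {"A": "0", "B": "100", "C": "101", "D": "11"}

def message_length(code, counts):
    """Total bits: sum over symbols of (count) x (code-word length)."""
    return sum(counts[s] * len(code[s]) for s in counts)

print(message_length(fixed_code, counts))     # 2000
print(message_length(variable_code, counts))  # 1900
```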
Eg: C A B A D (alternate code: A − 0, B − 100, C − 101, D − 11)
Encoding: 101 0 100 0 11
Decoding (reading left to right): C A B A D
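
The left-to-right decoding of the example can be sketched as follows. The function name `decode` and its buffering strategy are assumptions made for illustration; the approach relies on no code-word being a prefix of another, so the first match is the only possible one.

```python
def decode(bits, code):
    """Decode a bit string produced with a prefix-free code.

    Bits are read left to right; a symbol is emitted as soon as the
    buffered bits equal a code-word.
    """
    inverse = {word: symbol for symbol, word in code.items()}
    decoded, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in inverse:
            decoded.append(inverse[buffer])
            buffer = ""
    if buffer:
        raise ValueError("bit string ends in the middle of a code-word")
    return "".join(decoded)

code = {"A": "0", "B": "100", "C": "101", "D": "11"}
encoded = "".join(code[ch] for ch in "CABAD")
print(encoded)                # 1010100011
print(decode(encoded, code))  # CABAD
```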
Prefix-free Encoding

Definition:
An encoding in which if (x1, …, xk) is a code-word, then NO proper prefix of it can be a code-word.

Example:
A − 0
B − 100
C − 101
D − 11

Binary-Tree Representation (edge label 0 = left child, 1 = right child):

Root
+-0- A
+-1- •
     +-0- •
     |    +-0- B
     |    +-1- C
     +-1- D

Remark: Symbols can only be leaf nodes.
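
A minimal sketch of how the definition could be checked mechanically. The function name `is_prefix_free` is an assumption; it uses the fact that, after sorting the code-words lexicographically, a word that is a prefix of any other word is also a prefix of its immediate successor.

```python
def is_prefix_free(code):
    """Return True if no code-word is a prefix of a different code-word."""
    words = sorted(code.values())
    # Only adjacent pairs in sorted order need to be compared.
    return all(not nxt.startswith(cur) for cur, nxt in zip(words, words[1:]))

print(is_prefix_free({"A": "0", "B": "100", "C": "101", "D": "11"}))  # True
print(is_prefix_free({"A": "0", "B": "01", "C": "1"}))                # False
```

In the second call, 0 is a prefix of 01, so the code-word for A would correspond to an internal node of the tree rather than a leaf.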

Encoding Problem

Given: Symbols (a1, …, an) with frequency F = (f1, …, fn).

Aim: Find a prefix-free encoding for which the “encoded-message” has minimum length.

Equivalently, find a binary-tree T with leaves a1, …, an that minimizes the following:

$$\sum_{i=1}^{n} f_i \times \mathrm{depth}(a_i, T).$$
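
The equivalence between code-words and trees can be illustrated with a small sketch. The tree representation (a leaf is a string, an internal node is a `(left, right)` pair) and the helper `code_words` are assumptions chosen here for illustration. The depth of a leaf equals the length of its code-word, so the objective is the frequency-weighted code-word length.

```python
def code_words(tree, prefix=""):
    """Read code-words off a tree: left edge is 0, right edge is 1."""
    if isinstance(tree, str):
        return {tree: prefix}
    left, right = tree
    return {**code_words(left, prefix + "0"), **code_words(right, prefix + "1")}

freq = {"A": 40, "B": 10, "C": 20, "D": 30}   # percentages from the example
tree = ("A", (("B", "C"), "D"))               # tree of the alternate code

code = code_words(tree)
objective = sum(freq[s] * len(code[s]) for s in freq)

print(code)       # {'A': '0', 'B': '100', 'C': '101', 'D': '11'}
print(objective)  # 190
```

A value of 190 per 100 characters corresponds to the 1900 bits computed for n = 1000.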

Remark 1: Each internal node in an optimal binary-tree T must have exactly two children (otherwise its only child could take its place, shortening some code-words).

Remark 2: A symbol with the smallest frequency should be at the maximum depth.

Trivial approach?
Which Greedy Strategy works?

Ques. If symbols a1, …, an satisfy f1 ⩾ f2 ⩾ ⋯ ⩾ fn, then can we choose T to be the following (right-skewed) tree?

Root
+-0- a1
+-1- •
     +-0- a2
     +-1- ⋱
          +-0- an−1
          +-1- an

Ans: No!
(If there are 2^k frequencies, all identical, then T must be a balanced tree.)
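
A quick numeric check of this answer for k = 2, i.e. four identical frequencies. The helper `weighted_depth` and the depth lists are assumptions written out by hand: in the right-skewed tree on four leaves the depths are 1, 2, 3, 3, while in the balanced tree every leaf has depth 2.

```python
def weighted_depth(freqs, leaf_depths):
    """Objective value: sum of frequency x depth over all leaves."""
    return sum(f * d for f, d in zip(freqs, leaf_depths))

freqs = [25, 25, 25, 25]        # 2^k identical frequencies with k = 2
skewed_depths = [1, 2, 3, 3]    # right-skewed tree
balanced_depths = [2, 2, 2, 2]  # balanced tree

print(weighted_depth(freqs, skewed_depths))    # 225
print(weighted_depth(freqs, balanced_depths))  # 200
```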
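Which Greedy Strategy works?

Lemma 1. If symbols a1, …, an satisfy f1 ⩾ f2 ⩾ ⋯ ⩾ fn, then there exists an optimal T in which
• an and an−1 are at maximum depth, and
• they are siblings.

Proof: Suppose this is not the case in some optimal tree T.

Then we can take two siblings ai and aj in the last layer of T, and perform swaps:
• ai ⟷ an
• aj ⟷ an−1

Since an and an−1 have the two smallest frequencies and ai, aj are at maximum depth, neither swap increases the message length. The resulting tree is therefore still optimal, and in it an and an−1 are siblings at maximum depth.

Main Idea: Add the smallest two frequencies and find a solution to the new problem.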
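
The effect of the swap ai ⟷ an on the message length can be written out explicitly. The shorthand d(x) for depth(x, T), the name T′ for the tree after the swap, and cost(T) for the sum of fi × depth(ai, T) are notation introduced here for illustration.

```latex
\begin{aligned}
\mathrm{cost}(T') - \mathrm{cost}(T)
  &= \bigl(f_i\, d(a_n) + f_n\, d(a_i)\bigr) - \bigl(f_i\, d(a_i) + f_n\, d(a_n)\bigr) \\
  &= (f_i - f_n)\,\bigl(d(a_n) - d(a_i)\bigr) \;\le\; 0 .
\end{aligned}
```

The sign follows from fi ⩾ fn and d(an) ⩽ d(ai), because ai lies in the last layer. The swap aj ⟷ an−1 is handled the same way, using that an−1 has the smallest frequency among the remaining symbols.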
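Reducing the problem

Original problem F        Smaller instance F*
A − 40 %                  A − 40 %
B − 10 %                  Z − 30 %
C − 20 %                  D − 30 %
D − 30 %

(Z replaces B and C, with frequency 10 % + 20 % = 30 %.)

Optimal tree for F*:

Root
+-0- A
+-1- •
     +-0- Z
     +-1- D

Tree for F, obtained by giving Z the children B and C:

Root
+-0- A
+-1- •
     +-0- •
     |    +-0- B
     |    +-1- C
     +-1- D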
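
The reduction on this example can be replayed in code. This is a sketch under assumptions: trees are nested pairs with string leaves, and the helpers `leaf_depths`, `cost` and `expand` are names invented here. Expanding Z in the tree for F* gives the tree for F, and the cost grows by exactly the merged frequency 30.

```python
def leaf_depths(tree, depth=0):
    """Map each leaf (a string) to its depth; internal nodes are (left, right)."""
    if isinstance(tree, str):
        return {tree: depth}
    left, right = tree
    return {**leaf_depths(left, depth + 1), **leaf_depths(right, depth + 1)}

def cost(tree, freq):
    return sum(freq[s] * d for s, d in leaf_depths(tree).items())

def expand(tree, merged, children):
    """Replace the leaf `merged` by an internal node with the given children."""
    if isinstance(tree, str):
        return children if tree == merged else tree
    left, right = tree
    return (expand(left, merged, children), expand(right, merged, children))

freq = {"A": 40, "B": 10, "C": 20, "D": 30}
freq_star = {"A": 40, "Z": 30, "D": 30}

tree_star = ("A", ("Z", "D"))
tree = expand(tree_star, "Z", ("B", "C"))

print(tree)                                          # ('A', (('B', 'C'), 'D'))
print(cost(tree_star, freq_star), cost(tree, freq))  # 160 190
```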
Algorithm (Huffman Encoding)
SOLVE({a1, …, an})

Base case: if n = 2, return the tree whose root has a1 and a2 as children.

1. Suppose an, an−1 have the smallest frequencies.

2. Replace an, an−1 with a new symbol ã and set f̃ = fn + fn−1.

3. T̃ = SOLVE({a1, …, an−2, ã}).

4. Add an, an−1 as children of ã in tree T̃, and return the new tree.
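
A direct, unoptimized transcription of the four steps into Python might look as follows. The names `solve` and `attach`, the nested-pair tree representation, and the string used to name the merged symbol ã are all assumptions made for this sketch. It expects at least two symbols with distinct string names.

```python
def attach(tree, target, subtree):
    """Replace the leaf named `target` by `subtree`."""
    if isinstance(tree, str):
        return subtree if tree == target else tree
    left, right = tree
    return (attach(left, target, subtree), attach(right, target, subtree))

def solve(freq):
    """freq maps symbol -> frequency; returns a tree of nested (left, right) pairs."""
    if len(freq) == 2:                       # base case: two leaves under the root
        a, b = freq
        return (a, b)
    # Step 1: the two symbols with the smallest frequencies.
    x, y = sorted(freq, key=freq.get)[:2]
    # Step 2: replace them by a merged symbol (hypothetical naming scheme).
    merged = f"({x}+{y})"
    reduced = {s: f for s, f in freq.items() if s not in (x, y)}
    reduced[merged] = freq[x] + freq[y]
    # Step 3: solve the smaller instance recursively.
    tree = solve(reduced)
    # Step 4: hang x and y below the merged symbol.
    return attach(tree, merged, (x, y))

print(solve({"A": 40, "B": 10, "C": 20, "D": 30}))  # ('A', ('D', ('B', 'C')))
```

The resulting tree mirrors the one on the slides (D and the B, C subtree are exchanged) but has the same leaf depths. Re-sorting at every level keeps the sketch short at the price of efficiency.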
Correctness

Lemma 2: Suppose in the instance F = (f1, …, fn), indices i, j ∈ [1, n] are such that there is an optimal tree in which ai and aj are siblings. Then for the new instance F* = (F ∖ {fi, fj}) + f̃, where f̃ = fi + fj, we have
opt-msg-length(F) = opt-msg-length(F*) + f̃.

Proof (Part 1):

(1) Let T*_opt be an optimal solution for F* = (F ∖ {fi, fj}) + f̃.

(2) T = Tree obtained by adding ai and aj as children of ã in T*_opt.

(3) So, we get a solution for F with msg-length: opt-msg-length(F*) + f̃ (ai and aj sit one level below ã, which adds fi + fj = f̃).

(4) This implies LHS ⩽ RHS.

[Figure: Construction of T. The leaf ã of T*_opt receives the children ai and aj.]
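
Step (3) can be verified with a one-line computation. Here d denotes the depth of ã in T*_opt and cost(·) the message length of a tree; both are shorthands introduced for this sketch.

```latex
\mathrm{cost}(T)
  = \mathrm{cost}(T^{*}_{\mathrm{opt}}) - \tilde{f}\, d + (f_i + f_j)(d + 1)
  = \mathrm{cost}(T^{*}_{\mathrm{opt}}) + \tilde{f},
  \qquad \text{since } \tilde{f} = f_i + f_j .
```

The same identity, read in the other direction, gives the message length opt-msg-length(F) − f̃ when ai and aj are removed and their parent becomes the leaf ã.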
Proof (Part 2):

(1) Let Topt be an optimal solution for F in which ai and aj are siblings.

(2) T* = Tree obtained by removing ai and aj from Topt, and marking the common parent as ã.

(3) So, we get a solution for F* with msg-length: opt-msg-length(F) − f̃.

(4) This implies RHS ⩽ LHS.

[Figure: Construction of T*. The leaves ai and aj are removed from Topt and their parent becomes the leaf ã.]
Algorithm (Huffman Encoding)

1. Suppose an, an−1 have the smallest frequencies.

2. Replace an, an−1 with a new symbol ã and set f̃ = fn + fn−1.

3. Recursively solve {a1, …, an−2, ã}, and find the optimal tree T̃.

4. Add an, an−1 as children of ã in tree T̃, and return the new tree.

Homework: Provide an O(n log n) time implementation using min-heaps.
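
One possible shape for such an implementation, given only as a hedged sketch and not as the intended homework solution: it uses Python's `heapq` module as the min-heap and replaces the recursion by a loop that repeatedly merges the two smallest entries. The function name `huffman_code`, the tie-breaking counter, and the requirement that symbols are not tuples are assumptions of this sketch.

```python
import heapq
from itertools import count

def huffman_code(freq):
    """Return {symbol: code-word} for freq = {symbol: frequency}, with >= 2 symbols."""
    tiebreak = count()  # keeps heap entries comparable when frequencies are equal
    heap = [(f, next(tiebreak), s) for s, f in freq.items()]
    heapq.heapify(heap)
    # n - 1 merges, each with O(log n) heap operations.
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (t1, t2)))
    _, _, tree = heap[0]
    # Read the code-words off the tree: left edge 0, right edge 1.
    code, stack = {}, [(tree, "")]
    while stack:
        node, prefix = stack.pop()
        if isinstance(node, tuple):
            left, right = node
            stack.append((left, prefix + "0"))
            stack.append((right, prefix + "1"))
        else:
            code[node] = prefix
    return code

freq = {"A": 40, "B": 10, "C": 20, "D": 30}
code = huffman_code(freq)
print(sum(freq[s] * len(code[s]) for s in freq))  # 190
```

The code-words may differ from those on the slides, since optimal codes are not unique, but the code-word lengths (1, 3, 3, 2 for A, B, C, D) and hence the message length agree.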
