MLSP_LAB_EXP2
Grading Rubric
For each row, tick the best applicable: Below Expectation / Lacking in Some / Meets all Expectation.

Criterion (Points)
Completeness of the report
Organization of the report (5 pts)
Quality of figures (5 pts)
Building the min-heap (20 pts)
Generating the code book and compressed bit stream (25 pts)
Ability to decode the Huffman tree from its in-order traversal representation (20 pts)
Ability to recover the dataset (25 pts)
TOTAL (100 pts)
1 Introduction
Compression of a signal involves multiple kinds of approaches. Since a sample of a signal may be represented in a dimension much higher than its intrinsic dimension, one family of approaches applies operations that reduce its dimension. Principal component analysis (PCA) and singular value decomposition (SVD) are two such approaches for reducing the dimensionality of a signal toward its intrinsic dimension. Alternatively, operations like the discrete Fourier transform (DFT) and the discrete cosine transform (DCT) compact the energy of the signal by reducing the effective number of samples in the transformed domain that are needed to recover the original-domain signal. Such approaches are generally referred to as energy-compacting transforms and offer lossy compression of the signal. The other family of operations reduces the number of bits required to represent a sample. Since neither the number of samples nor their dimension is affected, these transforms typically do not lead to any loss of energy of the signal. This family of operations is known as lossless compression.
In this experiment we implement a mechanism of lossless compression of a vector representing a set of samples, using the principle of the entropy of the sample values.
2 Huffman coding
The method of Huffman coding is adopted to minimize the number of bits required to represent a sample; the lower limit is the entropy of the values which make up the samples.
Let us consider a vector X constituted of the samples X = [x_0, x_1, ..., x_i, ..., x_{N-1}], where x_i ∈ R^{D×1}. As an example, consider the string X = {'h', 'e', 'l', 'l', 'o'}. If every character is represented as 8-bit ASCII, then we have x_i ∈ B^{8×1}, where B = {0, 1} denotes the set of bits; thus X ∈ B^{8×5}, and the average number of bits per sample is 8. However, since the number of unique symbols in this case is only 4, one may say that 2 bits per symbol would suffice. This number of bits per symbol would of course vary with the constituents of X. Huffman coding is one way to evolve a reduced code representation for each sample; interestingly, the number of bits per sample is not fixed, and this variable-length representation is what reduces the average number of bits per sample.
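As a quick check on this lower bound, the following minimal Python sketch (the helper name entropy_per_symbol is ours) computes the empirical entropy of the example string:

    import math
    from collections import Counter

    def entropy_per_symbol(samples):
        # Empirical entropy in bits per sample: the lower limit on the
        # average code length achievable by any lossless symbol code.
        counts = Counter(samples)
        total = len(samples)
        return -sum((c / total) * math.log2(c / total) for c in counts.values())

    X = ['h', 'e', 'l', 'l', 'o']
    print(entropy_per_symbol(X))  # ~1.922 bits/sample, far below the 8 bits of ASCII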
[Figure 1: Step-wise construction of the Huffman tree: (a) Step 1, (b) Step 2, (c) Step 3.]
The resulting tree can be represented as in Fig. 2. Now, using this tree, we can find the bit code for each symbol, as in Table 1.
Figure 2: Huffman tree created for the dataset in the example. # denotes a non-leaf node, and the symbol present at a leaf node is mentioned appropriately, viz. x = 'l', etc.
u_i    Bit code
'h'    110
'e'    111
'l'    0
'o'    10

Table 1: Bit code for each symbol u_i generated using the Huffman tree.
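Below is a minimal Python sketch of building the Huffman tree with a min-heap (heapq) and deriving the code book; it assumes a tuple-based tree representation (a leaf is a symbol, a non-leaf is a (left, right) pair), and all names are illustrative. Note that heap tie-breaking may yield a code assignment different from Table 1, yet with the same optimal average length.

    import heapq
    from collections import Counter

    def build_huffman_tree(samples):
        # Min-heap entries are (frequency, tie_breaker, node); a node is
        # either a leaf symbol or a (left, right) pair for a non-leaf node.
        counts = Counter(samples)
        heap = [(f, i, sym) for i, (sym, f) in enumerate(counts.items())]
        heapq.heapify(heap)
        tie = len(heap)
        while len(heap) > 1:
            f1, _, left = heapq.heappop(heap)    # two least-frequent nodes
            f2, _, right = heapq.heappop(heap)
            heapq.heappush(heap, (f1 + f2, tie, (left, right)))
            tie += 1
        return heap[0][2]

    def code_book(node, prefix=''):
        # Walk the tree: append '0' on the left branch, '1' on the right.
        if not isinstance(node, tuple):          # leaf: a symbol
            return {node: prefix or '0'}
        left, right = node
        book = code_book(left, prefix + '0')
        book.update(code_book(right, prefix + '1'))
        return book

    tree = build_huffman_tree('hello')
    print(code_book(tree))  # an optimal code; tie-breaking may differ from Table 1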
2.4 Decoding the bit stream
In order to recover the original form of the dataset, stored as X̂, from the received bit stream Y, we again have to employ the Huffman tree. We start by reading one bit at a time and traversing the Huffman tree until we reach a leaf node, at which point the symbol at that leaf is emitted and the traversal restarts from the root, as sketched below. With this example, we would see the following steps being adopted.
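A minimal Python sketch of this traversal, assuming the tuple-based tree and code_book of the earlier sketch:

    def decode(bits, tree):
        # Walk the Huffman tree over the bit stream Y: go left on '0',
        # right on '1'; emit the symbol at a leaf and restart at the root.
        out, node = [], tree
        for b in bits:
            node = node[0] if b == '0' else node[1]
            if not isinstance(node, tuple):      # reached a leaf
                out.append(node)
                node = tree
        return out

    bits = ''.join(code_book(tree)[s] for s in 'hello')
    print(''.join(decode(bits, tree)))           # 'hello'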
2.5.1 Encoder
The encoder would perform the following tasks:
6. Encode the Huffman tree with its in-order traversal, also known as the code book (a sketch follows below).
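A minimal Python sketch of such a serialization, assuming the tuple-based tree of the earlier sketches and the '#' marker convention of Fig. 2; how the decoder consumes this representation is part of the exercise.

    def in_order(node):
        # In-order traversal of the Huffman tree: '#' for a non-leaf node
        # (the convention of Fig. 2), the symbol itself at a leaf.
        if not isinstance(node, tuple):
            return [node]
        left, right = node
        return in_order(left) + ['#'] + in_order(right)

    # For the tree of Fig. 2 this yields ['l', '#', 'o', '#', 'h', '#', 'e'].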
2.5.3 Decoder
The decoder would perform the following set of steps to recover the dataset:
3 Experiments
3.1 Dataset 1
3.1.1 Generation of the dataset
Generate a lower-case alphabetic and numeric string of a given length N, consisting of random characters without blank spaces, line breaks, or special characters. You can use the string and random libraries in Python to achieve this, as sketched below. This string denotes the dataset X. Store the generated string as a .txt file.
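A minimal sketch of such a generator using the string and random libraries (file names are illustrative):

    import random
    import string

    def generate_txt(n, path):
        # Random lower-case letters and digits only: no blank spaces,
        # line breaks, or special characters.
        symbols = string.ascii_lowercase + string.digits
        with open(path, 'w') as f:
            f.write(''.join(random.choice(symbols) for _ in range(n)))

    for N in (50, 100, 500, 1000, 5000):
        for t in range(10):
            generate_txt(N, f'data_N{N}_t{t}.txt')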
1. Generate t = 10 different .txt files for each of N = {50, 100, 500, 1000, 5000}.
2. Use the encoder to encode each of the files and store each as a separate .huf file.
3. Compute the compression factor for each file as sizeof(<fileName>.txt) / sizeof(<fileName>.huf), where the sizeof(·) operator returns the size of the file in bytes. Report the variation of compression factors for each N using a notched box plot (a sketch of this measurement and plot follows this list). Here the x-axis should represent N as specified above and the y-axis should represent the compression factor.
4. Use the decoder to decode each of the .huf files to obtain the decoded .txt file.
5. Measure the MSE between the original .txt file and the decoded .txt file.
6. Measure the time to encode and decode each file, and report the times separately as notched box plots for the encoder and the decoder. Here the x-axis should represent N as specified above and the y-axis should represent the computation time in seconds. Encoder and decoder times are to be shown as stacked/grouped items for each N.
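A minimal sketch of the compression-factor measurement and the notched box plot, assuming the file naming of the earlier sketch and using os.path.getsize and matplotlib:

    import os
    import matplotlib.pyplot as plt

    Ns = [50, 100, 500, 1000, 5000]
    # Compression factor per file: sizeof(.txt) / sizeof(.huf), in bytes.
    factors = [[os.path.getsize(f'data_N{N}_t{t}.txt') /
                os.path.getsize(f'data_N{N}_t{t}.huf') for t in range(10)]
               for N in Ns]

    plt.boxplot(factors, notch=True)
    plt.xticks(range(1, len(Ns) + 1), [str(N) for N in Ns])
    plt.xlabel('N')
    plt.ylabel('Compression factor')
    plt.savefig('compression_factor.png')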
3.2 Dataset 2
3.2.1 Preparation of the dataset
Here we would strive to apply Huffman coding to grayscale images of human faces of size 64 × 64 from the Olivetti faces dataset¹, with N = 400 samples. Since each image I ∈ Z^{64×64×1} has an 8-bit grayscale representation, we can alternatively write it as X ∈ B^{8×4096}, where B = {0, 1}.
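A minimal sketch of fetching the dataset and storing each image as an 8-bit grayscale .bmp, assuming scikit-learn and Pillow are available (fetch_olivetti_faces returns the 400 images as 64 × 64 floats in [0, 1]; file names are illustrative):

    import numpy as np
    from PIL import Image
    from sklearn.datasets import fetch_olivetti_faces

    faces = fetch_olivetti_faces()        # 400 images, 64 x 64, floats in [0, 1]
    for i, img in enumerate(faces.images):
        # Map back to 8-bit grayscale and store as .bmp.
        Image.fromarray((img * 255).astype(np.uint8), mode='L').save(f'face_{i:03d}.bmp')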
1. Store each image in the dataset as a .bmp file (see the sketch above) and then perform the following.
2. Use the encoder to encode each image in .bmp format and store each as a separate .huf file.
3. Compute the compression factor for each file as sizeof(<fileName>.bmp) / sizeof(<fileName>.huf), where the sizeof(·) operator returns the size of the file in bytes. Report the variation of compression factors over all images using a notched box plot. Here the y-axis should represent the compression factor.
4. Use the decoder to decode each of the .huf files to obtain the decoded .bmp file.
5. Measure the MSE between the original .bmp file and the decoded .bmp file (see the sketch after this list).
6. Measure the time to encode and decode each file, and report the times separately as notched box plots for the encoder and the decoder. Here the y-axis should represent the computation time in seconds. Encoder and decoder times are to be shown as stacked/grouped items.
7. Plot the compression factor on the x-axis and the encoder time on the y-axis over all images as a scatter plot. Repeat the same with the decoder time on the y-axis, and comment on your observations.
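A minimal sketch of the MSE measurement (an MSE of 0 confirms lossless recovery); the scatter plot is sketched in comments, assuming per-image lists of compression factors and encoder times collected during the run:

    import numpy as np
    from PIL import Image

    def mse(a_path, b_path):
        # Mean squared error between two grayscale images;
        # an MSE of 0 confirms that the coder is lossless.
        a = np.asarray(Image.open(a_path), dtype=np.float64)
        b = np.asarray(Image.open(b_path), dtype=np.float64)
        return np.mean((a - b) ** 2)

    # Scatter plot sketch, assuming lists `factors` and `enc_times`
    # were collected during encoding:
    # import matplotlib.pyplot as plt
    # plt.scatter(factors, enc_times)
    # plt.xlabel('Compression factor'); plt.ylabel('Encoder time (s)')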
¹ https://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_olivetti_faces.html