
Information Theory & Coding

Huffman and Entropy Coding

Professor Dr. A.K.M Fazlul Haque


Electronics and Telecommunication Engineering (ETE)
Daffodil International University
Basic Idea

 Fixed-length encoding: every character is represented with the same number of bits (e.g., ASCII, Unicode).
 Variable-length encoding: assign longer code words to less frequent characters and shorter code words to more frequent characters.
Huffman Coding

 Huffman codes can be used to compress information
– Like WinZip – although ZIP's DEFLATE format combines Huffman coding with LZ77 dictionary compression rather than using Huffman alone
– JPEGs also use Huffman coding as part of their compression process
Huffman Coding (Cont.)

 As an example, let's take the string:
"duke blue devils"
 We first do a frequency count of the characters:
• e:3, d:2, u:2, l:2, space:2, k:1, b:1, v:1, i:1, s:1
 Next we use a greedy algorithm to build up a Huffman tree
– We start with a node for each character

e,3 d,2 u,2 l,2 sp,2 k,1 b,1 v,1 i,1 s,1
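
As a quick sketch, this frequency count is a one-liner with Python's standard library:

    from collections import Counter

    freq = Counter("duke blue devils")
    print(freq)   # e: 3; d, u, l, space: 2; k, b, v, i, s: 1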
Huffman Coding (Cont.)

 We then pick the two nodes with the smallest frequencies and combine them to form a new node.
– The selection of these nodes is the greedy part
 The two selected nodes are removed from the set and replaced by the combined node.
 This continues until only one node is left in the set, as in the sketch and step sequence below.
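
A minimal sketch of this greedy loop, using Python's heapq as the priority queue (a counter breaks frequency ties, so the exact merge order may differ slightly from the steps shown on the next slide):

    import heapq
    from itertools import count

    freq = {'e': 3, 'd': 2, 'u': 2, 'l': 2, ' ': 2,
            'k': 1, 'b': 1, 'v': 1, 'i': 1, 's': 1}

    tie = count()   # tiebreaker so the heap never compares subtrees directly
    heap = [(f, next(tie), c) for c, f in freq.items()]
    heapq.heapify(heap)

    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)     # node with the smallest frequency
        f2, _, right = heapq.heappop(heap)    # node with the second-smallest frequency
        print(f"merge {left} ({f1}) + {right} ({f2}) -> {f1 + f2}")
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))   # combined node

    _, _, tree = heap[0]   # nested (left, right) pairs with total weight 16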
Huffman Coding (Cont.)

 Each step below merges the two lowest-frequency nodes in the forest (combined nodes are written with their member characters and total weight):

Step 0 (initial forest): e,3  d,2  u,2  l,2  sp,2  k,1  b,1  v,1  i,1  s,1
Step 1: merge i,1 and s,1 → (i s),2
Step 2: merge b,1 and v,1 → (b v),2
Step 3: merge k,1 and (b v),2 → (k b v),3
Step 4: merge l,2 and sp,2 → (l sp),4
Step 5: merge d,2 and u,2 → (d u),4
Step 6: merge (i s),2 and (k b v),3 → (i s k b v),5
Step 7: merge e,3 and (d u),4 → (e d u),7
Step 8: merge (l sp),4 and (i s k b v),5 → (l sp i s k b v),9
Step 9: merge (e d u),7 and (l sp i s k b v),9 → the root, with weight 16
Huffman Coding (Cont.)

 Now we assign codes to the tree by placing a 0 on every left branch and a 1 on every right branch.
 A traversal of the tree from root to leaf gives the Huffman code for that particular leaf character.
 Note that no code is the prefix of another code.
Huffman Coding (Cont.)

Character   Code
e           00
d           010
u           011
l           100
sp          101
i           1100
s           1101
k           1110
b           11110
v           11111
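
The traversal can be sketched as a depth-first walk that appends 0 going left and 1 going right; the tree below is hardcoded as nested (left, right) pairs to match the example (' ' stands for the space character):

    # Huffman tree for "duke blue devils" as nested (left, right) pairs.
    tree = (('e', ('d', 'u')),                                # weight-7 subtree
            (('l', ' '), (('i', 's'), ('k', ('b', 'v')))))    # weight-9 subtree

    def assign_codes(node, prefix=""):
        if isinstance(node, str):                  # leaf: emit its accumulated code
            return {node: prefix}
        left, right = node
        codes = assign_codes(left, prefix + "0")          # 0 on every left branch
        codes.update(assign_codes(right, prefix + "1"))   # 1 on every right branch
        return codes

    print(assign_codes(tree))
    # {'e': '00', 'd': '010', 'u': '011', 'l': '100', ' ': '101',
    #  'i': '1100', 's': '1101', 'k': '1110', 'b': '11110', 'v': '11111'}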
Huffman Coding (Cont.)

 These codes are then used to encode the string.
 Thus, “duke blue devils” turns into:

010 011 1110 00 101 11110 100 011 00 101 010 00 11111 1100 100 1101

 When grouped into 8-bit bytes:

01001111 10001011 11101000 11001010 10001111 11100100 1101xxxx

 Thus it takes 7 bytes of space, compared to 16 characters × 1 byte/char = 16 bytes uncompressed.
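
A quick check of those byte counts, using the code table above:

    codes = {'e': '00', 'd': '010', 'u': '011', 'l': '100', ' ': '101',
             'i': '1100', 's': '1101', 'k': '1110', 'b': '11110', 'v': '11111'}

    bits = "".join(codes[c] for c in "duke blue devils")
    print(len(bits))              # 52 bits
    print((len(bits) + 7) // 8)   # 7 bytes after padding, vs. 16 bytes uncompressed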
Huffman Coding

 Uncompressing works by reading in the file bit by bit.
– Start at the root of the tree
– If a 0 is read, head left
– If a 1 is read, head right
– When a leaf is reached, decode that character and start over again at the root of the tree (see the sketch below)
 Thus, we need to save the Huffman table information as a header in the compressed file.
– This doesn't add a significant amount of size for large files (which are the ones you want to compress anyway)
– Or we could use a fixed, universal set of codes/frequencies
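
A minimal decoding sketch; rather than walking tree nodes explicitly, it uses the unique prefix property directly, extending a bit buffer until it matches a complete code (same code table as above):

    codes = {'e': '00', 'd': '010', 'u': '011', 'l': '100', ' ': '101',
             'i': '1100', 's': '1101', 'k': '1110', 'b': '11110', 'v': '11111'}
    decode_map = {bits: ch for ch, bits in codes.items()}

    def decode(bitstring):
        out, buf = [], ""
        for bit in bitstring:
            buf += bit
            if buf in decode_map:          # prefix property: the first match is the symbol
                out.append(decode_map[buf])
                buf = ""
        return "".join(out)

    print(decode("0100111110001011111010001100101010001111111001001101"))
    # -> 'duke blue devils'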
Most important properties of Huffman Coding

 Unique Prefix Property: No Huffman code is a prefix of any other Huffman code
• For example, 101 and 1010 cannot both be Huffman codes. Why?
 Optimality: The Huffman code is a minimum-redundancy code (given an accurate data model)
• The two least frequent symbols will have the same code length, whereas symbols occurring more frequently will have shorter Huffman codes
• It has been shown that the average code length l' of an information source S is strictly less than η + 1, i.e.
η ≤ l' < η + 1
where η is the entropy of the source.
Data Compression Scheme

Input Data → Encoder (compression) → Codes / Code words → Storage or Networks → Codes / Code words → Decoder (decompression) → Output Data

B0 = # bits required before compression
B1 = # bits required after compression
Compression Ratio = B0 / B1
Compression Techniques

Coding Type        Basis                  Technique
Entropy Encoding                          Run-length Coding
                                          Huffman Coding
                                          Arithmetic Coding
Source Coding      Prediction             DPCM
                                          DM
                   Transformation         FFT
                                          DCT
                   Layered Coding         Bit Position
                                          Subsampling
                                          Sub-band Coding
                   Vector Quantization
Hybrid Coding                             JPEG
                                          MPEG
                                          H.263
                                          Many Proprietary Systems
Compression Techniques (Cont.)

 Entropy Coding
– The semantics of the information to be encoded are ignored
– Lossless compression technique
– Can be used for different media regardless of their characteristics
 Source Coding
– Takes into account the semantics of the information to be encoded
– Often a lossy compression technique
– Characteristics of the medium are exploited
 Hybrid Coding
– Most multimedia compression algorithms are hybrid techniques
Entropy Encoding

 Information theory is a discipline in applied mathematics involving the quantification of data, with the goal of enabling as much data as possible to be reliably stored on a medium and/or communicated over a channel.
 According to Claude E. Shannon, the entropy η (eta) of an information source with alphabet S = {s1, s2, ..., sn} is defined as

η = H(S) = Σ_{i=1}^{n} p_i log₂(1/p_i) = − Σ_{i=1}^{n} p_i log₂ p_i

where pi is the probability that symbol si in S will occur.


Entropy Encoding (Cont.)

 Example 1: What is the entropy of an image with a uniform distribution of gray-level intensities (i.e., pi = 1/256 for all i)?
 Example 2: What is the entropy of an image whose histogram shows that one third of the pixels are dark and two thirds are bright?
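
A short sketch of both calculations (assuming the image of Example 2 has only two intensity values, with probabilities 1/3 and 2/3):

    from math import log2

    def entropy(probs):
        # η = Σ p_i · log2(1/p_i), skipping zero-probability symbols
        return sum(p * log2(1 / p) for p in probs if p > 0)

    print(entropy([1 / 256] * 256))   # Example 1: 8.0 bits per pixel
    print(entropy([1 / 3, 2 / 3]))    # Example 2: ≈ 0.918 bits per pixel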
Entropy Encoding: Run-Length

 Data often contains sequences of identical bytes. Replacing these repeated byte sequences with the number of occurrences considerably reduces the overall data size.
 There are many variations of RLE
– One form of RLE uses a special marker byte (the M-byte) to indicate the number of occurrences of a character:
• "c"!# — the character c, then the marker "!", then the count #
– How many bytes are used above? When do you think the M-byte should be used?
• ABCCCCCCCCDEFGGG
is encoded as
ABC!8DEFGGG
– What if the string contains the "!" character?
– What is the compression ratio for this example? (See the sketch below.)
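
A minimal sketch of this marker-based RLE; the threshold of 4 is an assumption (a run written as "c!n" costs 3 bytes, so only runs of 4 or more are worth encoding), and counts above 9 or literal "!" characters in the input are not handled:

    from itertools import groupby

    MARKER = "!"      # the M-byte
    THRESHOLD = 4     # "c!n" costs 3 bytes, so encode only runs of 4+

    def rle_encode(s):
        out = []
        for ch, group in groupby(s):       # consecutive runs of one character
            n = len(list(group))
            out.append(f"{ch}{MARKER}{n}" if n >= THRESHOLD else ch * n)
        return "".join(out)

    encoded = rle_encode("ABCCCCCCCCDEFGGG")
    print(encoded)              # ABC!8DEFGGG
    print(16 / len(encoded))    # compression ratio B0/B1 = 16/11 ≈ 1.45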
Entropy Encoding: Run-Length (Cont.)

 Many variations of RLE:
 Zero-suppression: only one character, one that is repeated very often, is subject to run-length encoding; the M-byte and the number of additional occurrences are stored.
 When do you think the M-byte should be used, as opposed to using the regular representation without any encoding?
Entropy Encoding: Run-Length (Cont.)

 Many variations of RLE:
– If we are encoding black-and-white images (e.g., faxes), one such variant encodes each row as a tuple:
(row#, col# run1 begin, col# run1 end, col# run2 begin, col# run2 end, ..., col# runk begin, col# runk end)
– One tuple is produced per row, and each row may contain a different number of runs.
Entropy Encoding: Huffman Coding

 One form of variable-length coding.
 A greedy algorithm.
 Has been used in fax machines, JPEG, and MPEG.
Entropy Encoding: Huffman Coding
(Cont.)

Algorithm of Huffman Coding:

Input: A set C = {c1, c2, ..., cn} of n characters and their frequencies {f(c1), f(c2), ..., f(cn)}
Output: A Huffman tree (V, T) for C.
1. Insert all characters into a min-heap H according to their frequencies.
2. V = C; T = {}
3. for j = 1 to n − 1
4.   c = deletemin(H)
5.   c' = deletemin(H)
6.   f(v) = f(c) + f(c')   // v is a new node; add v to V
7.   Insert v into the min-heap H
8.   Add (v, c) and (v, c') to T, making c and c' children of v in T
9. end for
END
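
A hedged Python translation of this pseudocode, using heapq as the min-heap (a counter breaks frequency ties so the heap never compares node labels; internal nodes get fresh names n0, n1, ...):

    import heapq
    from itertools import count

    def huffman_tree(freqs):
        """Return (V, T) for the character -> frequency dict freqs."""
        tie = count()                           # tiebreaker for equal frequencies
        heap = [(f, next(tie), c) for c, f in freqs.items()]
        heapq.heapify(heap)                     # step 1: min-heap by frequency
        V, T = list(freqs), []                  # step 2: V = C, T = {}
        ids = count()
        for _ in range(len(freqs) - 1):         # step 3: n - 1 merges
            f1, _, c1 = heapq.heappop(heap)     # step 4: c = deletemin(H)
            f2, _, c2 = heapq.heappop(heap)     # step 5: c' = deletemin(H)
            v = f"n{next(ids)}"                 # step 6: new node v, f(v) = f(c) + f(c')
            V.append(v)
            heapq.heappush(heap, (f1 + f2, next(tie), v))   # step 7
            T += [(v, c1), (v, c2)]             # step 8: c and c' become children of v
        return V, T

    V, T = huffman_tree({'e': 3, 'd': 2, 'u': 2, 'l': 2, ' ': 2,
                         'k': 1, 'b': 1, 'v': 1, 'i': 1, 's': 1})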
