Data Compression (Pt2)
Why Data Compression?
Make optimal use of limited storage space.
When sending data over a communication line: less time to transmit and less storage required at the receiving host.
7.1 Introduction
If the compression and decompression processes induce no
information loss, then the compression scheme is lossless;
otherwise, it is lossy.
Compression ratio:

$$\text{compression ratio} = \frac{B_0}{B_1} \qquad (7.1)$$

where $B_0$ is the number of bits before compression and $B_1$ is the number of bits after compression.
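As a quick worked example (with hypothetical figures, not taken from the text): if a 1,000-bit message is stored in 250 bits after compression, then $B_0 = 1000$, $B_1 = 250$, and the compression ratio is $1000/250 = 4$, often written 4:1.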
7.1 Introduction
Compression: the process of coding that will effectively
reduce the total number of bits needed to represent certain
information.
Figure 7.1 depicts a general data compression scheme, in which
compression is performed by an encoder and decompression
is performed by a decoder.
Data Compression Methods
Data compression is about storing and sending
a smaller number of bits.
There are two major categories of data compression methods: lossless and lossy.
7.2 Basics of Information Theory
The entropy η of an information source with alphabet S = {s_1, s_2, . . . , s_n} is

$$\eta = H(S) = \sum_{i=1}^{n} p_i \log_2 \frac{1}{p_i} \qquad (7.2)$$

$$= -\sum_{i=1}^{n} p_i \log_2 p_i \qquad (7.3)$$

where p_i is the probability that symbol s_i will occur in S.

$\log_2 \frac{1}{p_i}$ indicates the amount of information (self-information, as defined by Shannon) contained in s_i, which corresponds to the number of bits needed to encode s_i.
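To make Eq. (7.3) concrete, here is a minimal Python sketch (not from the text) that computes the entropy of a source directly from symbol counts; the word HELLO, which reappears in Sect. 7.4.1, serves as hypothetical sample data.

```python
from collections import Counter
from math import log2

def entropy(message: str) -> float:
    """Entropy per symbol: H(S) = -sum(p_i * log2(p_i)), as in Eq. (7.3)."""
    counts = Counter(message)
    total = len(message)
    return -sum((c / total) * log2(c / total) for c in counts.values())

# For "HELLO": p(L) = 2/5 and p(H) = p(E) = p(O) = 1/5, so
# H(S) = 0.4*log2(2.5) + 3 * 0.2*log2(5) ≈ 1.92 bits per symbol.
print(round(entropy("HELLO"), 3))   # 1.922
```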
Data Compression: Entropy
Entropy is the measure of information content in a message.
Messages with higher entropy carry more information than messages with lower entropy.
The average information content of an entire message is the sum, over all n symbols, of each symbol's self-information weighted by its probability, i.e., the entropy of Eq. (7.2).
7.3 Run-Length Coding
• RLC is one of the simplest forms of data compression.
The basic idea is that if the information source has the property that symbols tend to form continuous groups (runs), then each such symbol and the length of its run can be coded together.
Consider a screen containing plain black text on a solid white
background.
There will be many long runs of white pixels in the blank space, and
many short runs of black pixels within the text. Let us take a
hypothetical single scan line, with B representing a black pixel and W
representing white:
WWWWWBWWWWBBBWWWWWWBWWW
If we apply the run-length encoding (RLE) data compression
algorithm to the above hypothetical scan line, we get the following:
5W1B4W3B6W1B3W
The run-length code represents the original 23 characters in only 14.
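The short Python sketch below (not from the text) is one straightforward way to implement this encoder; itertools.groupby collects each run, and the sketch reproduces the 5W1B4W3B6W1B3W result for the hypothetical scan line.

```python
import re
from itertools import groupby

def rle_encode(line: str) -> str:
    """Replace each run of identical symbols by its length followed by the symbol."""
    return "".join(f"{len(list(run))}{symbol}" for symbol, run in groupby(line))

def rle_decode(code: str) -> str:
    """Inverse operation: expand each (length, symbol) pair back into a run."""
    return "".join(symbol * int(count) for count, symbol in re.findall(r"(\d+)(\D)", code))

scan_line = "WWWWWBWWWWBBBWWWWWWBWWW"
encoded = rle_encode(scan_line)              # '5W1B4W3B6W1B3W'
assert rle_decode(encoded) == scan_line      # lossless: decoding restores the input
print(encoded, f"({len(scan_line)} characters -> {len(encoded)})")
```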
7.4 Variable-Length Coding
Variable-length coding (VLC) is one of the best-known entropy coding methods.
7.4.1 Shannon–Fano Algorithm
To illustrate the algorithm, let us suppose the symbols to be
coded are the characters in the word HELLO.
The frequency count of the symbols is
Symbol H E L O
Count 1 1 2 1
The encoding steps of the Shannon–Fano algorithm can be
presented in the following top-down manner:
1. Sort the symbols according to the frequency count of their
occurrences.
2. Recursively divide the symbols into two parts, each with
approximately the same number of counts, until all parts
contain only one symbol.
7.4.1 Shannon–Fano Algorithm
A natural way of implementing the above procedure is to
build a binary tree.
As a convention, let us assign bit 0 to its left branches and 1
to the right branches.
Initially, the symbols are sorted as LHEO.
As Fig. 7.3 shows, the first division yields two parts: L with a
count of 2, denoted as L:(2); and H, E and O with a total
count of 3, denoted as H, E, O:(3).
The second division yields H:(1) and E, O:(2).
The last division is E:(1) and O:(1).
7.4.1 Shannon–Fano Algorithm
[Fig. 7.3: coding tree for HELLO built by the Shannon–Fano algorithm, with 0 on left branches and 1 on right branches]
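A compact recursive Python sketch of this top-down procedure (not from the text) is shown below; ties in the split can be broken in more than one way, and the tie-breaking chosen here reproduces the divisions described above for HELLO.

```python
def shannon_fano(symbols, prefix=""):
    """symbols: list of (symbol, count) pairs, sorted by descending count."""
    if len(symbols) == 1:
        return {symbols[0][0]: prefix or "0"}
    total, left = sum(c for _, c in symbols), 0
    best_split, best_diff = 1, total
    # Pick the split point that makes the two parts' counts as equal as possible.
    for i, (_, count) in enumerate(symbols[:-1], start=1):
        left += count
        if abs(2 * left - total) < best_diff:
            best_split, best_diff = i, abs(2 * left - total)
    codes = {}
    codes.update(shannon_fano(symbols[:best_split], prefix + "0"))  # left branch: 0
    codes.update(shannon_fano(symbols[best_split:], prefix + "1"))  # right branch: 1
    return codes

# Symbol counts for HELLO, sorted as L, H, E, O.
print(shannon_fano([("L", 2), ("H", 1), ("E", 1), ("O", 1)]))
# {'L': '0', 'H': '10', 'E': '110', 'O': '111'}
```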
7.4.1 Shannon–Fano Algorithm
The Shannon–Fano algorithm delivers satisfactory coding results
for data compression, but it was soon outperformed and
overtaken by the Huffman coding method.
The Huffman algorithm requires prior statistical knowledge about
the information source, and such information is often not
available.
This is particularly true in multimedia applications, where future
data is unknown before its arrival, as for example in live (or
streaming) audio and video.
Even when the statistics are available, the transmission of the
symbol table could represent heavy overhead.
The solution is to use adaptive Huffman coding compression
algorithms, in which statistics are gathered and updated
dynamically as the data stream arrives.
7.4.2 Huffman Coding Algorithm
Huffman coding is based on the frequency
of occurrence of a data item.
The principle is to use fewer bits to encode the data items that occur more frequently.
Huffman Coding Algorithm
Symbol Count
A 5
B 7
C 10
D 15
E 20
F 45
Step 1: Sort the list by frequency (descending)
Symbol Count
F 45
E 20
D 15
C 10
B 7
A 5
Huffman Coding Algorithm
Step 2: Make the two lowest elements into leaves, creating a parent node whose frequency is the sum of the two elements' frequencies.
[Tree fragment: new parent node AB (12) with children A (5) and B (7)]
Step 3: The two elements are removed from the list and the new
parent node with frequency 12 is inserted into the list. The list sorted by
frequency is
Symbol Count
F 45
E 20
D 15
AB 12
C 10
Huffman Coding Algorithm
Step 4: Then, repeat the loop, combining the two lowest elements.
The two lowest elements, C (10) and AB (12), are combined into a new parent node ABC (22):
[Tree fragment: ABC (22) with children C (10) and AB (12); AB (12) with children A (5) and B (7)]
The list sorted by frequency is now
Symbol Count
F 45
ABC 22
E 20
D 15
Step 5: Repeat until there is only one element left in the list.
D (15) and E (20) are combined into DE (35); then the two lowest elements, ABC (22) and DE (35), are combined into ABCDE (57). The list is now
Symbol Count
ABCDE 57
F 45
Finally, F (45) and ABCDE (57) are combined into the root node:
Symbol Count
ABCDEF 102
Huffman Coding Algorithm
[Figure: final Huffman tree. The root ABCDEF (102) has children ABCDE (57) and F (45); ABCDE (57) splits into DE (35) and ABC (22); ABC (22) splits into C (10) and AB (12); DE (35) splits into D (15) and E (20); AB (12) splits into A (5) and B (7). Left branches are labeled 0 and right branches 1, so the path from the root to each leaf gives that symbol's codeword: the most frequent symbol F gets a 1-bit code, while A and B get 4-bit codes.]
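A heap-based Python sketch (not from the text) of the same bottom-up procedure: it repeatedly merges the two lowest-count nodes, exactly as in Steps 2–5, and then reads the codes off the final tree. The particular 0/1 labels can differ from the figure depending on tie-breaking, but the code lengths are the same.

```python
import heapq

def huffman_codes(freqs):
    """freqs: dict symbol -> count. Returns dict symbol -> bit string."""
    # Heap entries are [count, tie_breaker, tree]; a tree is either a symbol
    # (leaf) or a (left, right) pair of subtrees (internal node).
    heap = [[count, i, symbol] for i, (symbol, count) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tick = len(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)                 # lowest count
        hi = heapq.heappop(heap)                 # second-lowest count
        heapq.heappush(heap, [lo[0] + hi[0], tick, (lo[2], hi[2])])
        tick += 1

    codes = {}
    def walk(tree, prefix=""):
        if isinstance(tree, tuple):              # internal node
            walk(tree[0], prefix + "0")          # left branch: 0
            walk(tree[1], prefix + "1")          # right branch: 1
        else:                                    # leaf: a symbol
            codes[tree] = prefix or "0"
    walk(heap[0][2])
    return codes

print(huffman_codes({"A": 5, "B": 7, "C": 10, "D": 15, "E": 20, "F": 45}))
# {'F': '0', 'C': '100', 'A': '1010', 'B': '1011', 'D': '110', 'E': '111'}
# Same code lengths as in the tree above (F: 1 bit; C, D, E: 3 bits; A, B: 4 bits).
```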
7.5 Lempel-Ziv-Welch (LZW)
The Lempel-Ziv-Welch (LZW) algorithm employs an
adaptive, dictionary-based compression technique.
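To make "adaptive, dictionary-based" concrete before the details, here is a minimal Python sketch (not from the text) of an LZW encoder: the dictionary starts with single characters and grows as new strings are seen, so no symbol table needs to be transmitted in advance. The input string is a hypothetical example.

```python
def lzw_encode(message: str) -> list:
    """LZW encoding: emit dictionary indices while growing the dictionary adaptively."""
    # Start with single-character entries (here: just the characters that occur).
    dictionary = {ch: i for i, ch in enumerate(sorted(set(message)))}
    current, output = "", []
    for ch in message:
        if current + ch in dictionary:
            current += ch                                # keep extending the match
        else:
            output.append(dictionary[current])           # emit code for the longest match
            dictionary[current + ch] = len(dictionary)   # add the new string to the dictionary
            current = ch
    if current:
        output.append(dictionary[current])
    return output

print(lzw_encode("ABABBABCABABBA"))
# [0, 1, 3, 4, 1, 2, 3, 5, 0] -- 14 input symbols encoded as 9 dictionary indices
```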