
Data Compression:

With the increased emphasis on full-text databases, the problem of handling
the quantity of data becomes significant. Since the time required to search a
database depends heavily on the amount of data, for efficient operation of
an information system it is necessary both to organize the data well and to find
as efficient a representation for the data as possible. Thus there is growing
interest in data compression. Why needed?
1. The size of applications keeps growing: MP3, MPEG, TIFF, etc.
2. A fax page has about 4 million dots, so transmitting one page takes more
than 1 minute over 56 Kbps. If the data is compressed by a factor of 10, the
transmission time is reduced to about 6 seconds per page.
3. TV / motion pictures use 30 pictures (frames) per second and about 200,000
pixels per frame; color pictures require 3 bytes (24 bits) per pixel (RGB). Each
frame therefore needs 200,000 * 24 = 4.8 Mbits, and a 2-hour movie requires
216,000 pictures, so the total for such a movie is 216,000 * 4.8 Mbits ≈ 1.0368 x 10^12 bits.
This is much higher than the capacity of DVDs.
Without compression, these applications would not be feasible.
A codec is called LOSSY if data is lost during compression, and LOSSLESS if no
data is lost during compression.

There are two basic approaches:
1. Redundancy reduction (usually lossless):
   remove redundancy from the message.
2. Reduce information content (usually lossy):
   reduce the total amount of information in the message; this leads to a sacrifice of quality.
Two classes of text compression methods:
1. Symbol-wise (or statistical) methods:
   estimate probabilities of symbols (the modeling step);
   usually based on either arithmetic or Huffman coding.
2. Dictionary methods:
   replace fragments of text with a single code word (typically an index to an
   entry in the dictionary), e.g. Ziv-Lempel coding, which replaces strings of
   characters with a pointer to a previous occurrence of the string;
   no probability estimates needed.
Text Compression
[Diagram: text -> (model + encoder) -> compressed text -> (model + decoder) -> text]
Information Theory
Entropy: Shannon borrowed the definition of entropy from statistical physics
to capture the notion of how much information is contained in the whole
alphabet. For a set of possible messages S, Shannon defined entropy as,
H(S) = Σ_{s in S} p(s) log2(1/p(s)) = Σ_{s in S} p(s) i(s)
where p(s) is the probability of message s and the self-information
i(s) = log2(1/p(s)) represents the number of bits of information contained in it,
and roughly speaking the number of bits we should use to send that message.
Compression ratio: C = (average original symbol length) / (average compressed symbol length)
Example: for P(S) = {0.25, 0.25, 0.25, 0.125, 0.125},
H(S) = 3 x 0.25 x log2(1/0.25) + 2 x 0.125 x log2(1/0.125) = 1.5 + 0.75 = 2.25 bits/symbol.
Redundancy: the average codeword length minus the entropy.
Compression ratio: the ratio between the average number of bits per symbol in
the original message and the same quantity for the coded message.
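To make these definitions concrete, here is a minimal Python sketch (the names entropy and compression_ratio are illustrative, not from any particular library) that reproduces the worked example above:

import math

def entropy(probs):
    # H(S) = sum over s of p(s) * log2(1 / p(s)), in bits per symbol
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

def compression_ratio(avg_original_bits, avg_compressed_bits):
    # C = average original symbol length / average compressed symbol length
    return avg_original_bits / avg_compressed_bits

probs = [0.25, 0.25, 0.25, 0.125, 0.125]   # the worked example above
print(entropy(probs))                      # 2.25 bits per symbol
print(compression_ratio(3, 2.25))          # e.g. fixed 3-bit codes vs. a code meeting the entropy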
1: Run Length Encoding (RLE)
RLE is based on the assumption that a file has a great deal of redundancy. Data
is considered just a string of symbols. RLE is good for fax and voice.
Example: 22 characters are reduced to 14 characters:
ABBCCDDDDDDDDDEEFGGGGG => ABBCCD#9EEFG#5
(22 - 14)/22 ≈ 36% reduction
Disadvantages:
1. We are unable to distinguish compressed text in the file from
uncompressed text.
2. Any numeric value will be interpreted as the beginning of a
compressed sequence.
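The following is a minimal Python sketch of the run-length scheme used in the example above. The convention that only runs of four or more characters are written as CHAR#COUNT is an assumption chosen to reproduce the ABBCCD#9EEFG#5 output; note that the '#' marker suffers from exactly the ambiguity listed in the disadvantages if '#' or digits can occur in the data.

def rle_encode(text, threshold=4):
    # Runs of at least `threshold` repeats become CHAR + '#' + count;
    # shorter runs are copied through unchanged.
    out = []
    i = 0
    while i < len(text):
        j = i
        while j < len(text) and text[j] == text[i]:
            j += 1
        run = j - i
        if run >= threshold:
            out.append(f"{text[i]}#{run}")
        else:
            out.append(text[i] * run)
        i = j
    return "".join(out)

original = "ABBCCDDDDDDDDDEEFGGGGG"
encoded = rle_encode(original)
print(encoded)                                         # ABBCCD#9EEFG#5
print((len(original) - len(encoded)) / len(original))  # about 0.36, i.e. 36% reduction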
2: Huffman coding
There is another algorithm whose performance is slightly better than Run Length
Encoding: the famous Huffman coding. A Huffman code is built from the frequency
distribution of the symbols to be encoded; a binary tree is then constructed as follows:
1. Initially, each symbol is considered as a separate binary tree.
2. The two trees with the lowest frequencies are chosen and combined into a
single tree whose assigned frequency is the sum of the two given frequencies.
The chosen trees form the two branches of the new tree.
3. The process is repeated until only a single tree remains. Then the two
branches of every node are labeled 0 and 1 (0 on the left branch, but the
order is not important).
4. The code for each symbol can be read by following the branches from the
root to the symbol.
Huffman coding - Example
[Figure: Huffman tree built for the symbols a, b, c, d, e, f, g with probabilities
0.05, 0.05, 0.1, 0.2, 0.3, 0.2, 0.1; the merged node weights are 0.1, 0.2, 0.3,
0.4, 0.6 and 1.0, and each pair of branches is labeled 0 and 1.]
Symbol   Prob.   Codeword
a        0.05    0000
b        0.05    0001
c        0.1     001
d        0.2     01
e        0.3     10
f        0.2     110
g        0.1     111

Huffman coding - Exercise
Code the sequence (aeebcddegfced) and evaluate entropy and compression ratio.
Sol: 0000 10 10 0001 001 01 01 10 111 110 001 10 01
Aver. orig. symb. length = 3 bits
Aver. compr. symb. length = 34/13 ≈ 2.62 bits
Compression ratio C = 3 / (34/13) ≈ 1.15
H(X) = 2.5464 bits
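A minimal Python sketch of the tree-building procedure described above, using a heap to repeatedly merge the two lowest-weight trees. The exact codewords depend on how ties between equal weights are broken, so they may differ from the table, but any valid Huffman tree for these probabilities has the same average codeword length.

import heapq

def huffman_codes(probs):
    # Build a Huffman code from a {symbol: probability} map.
    # Each heap entry is (weight, tie_breaker, {symbol: codeword so far}).
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(sorted(probs.items()))]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)      # two lowest-weight trees
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}          # left branch labeled 0
        merged.update({s: "1" + c for s, c in right.items()})   # right branch labeled 1
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

probs = {"a": 0.05, "b": 0.05, "c": 0.1, "d": 0.2, "e": 0.3, "f": 0.2, "g": 0.1}
codes = huffman_codes(probs)
print(codes)   # codewords may differ from the table on ties; the lengths are equivalent
print(sum(probs[s] * len(c) for s, c in codes.items()))   # 2.6 bits/symbol, vs H(X) = 2.5464
encoded = "".join(codes[s] for s in "aeebcddegfced")
print(len(encoded))   # the table's tree uses 34 bits; other tie choices give a nearby count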
Huffman coding - Notes
1. In Huffman coding, if at any time there is more than one way to choose a
smallest pair of probabilities, any such pair may be chosen.
2. Huffman code is a variable-length code, with the more frequent symbols
being assigned shorter codes.
3. Huffman codes are good for data messages.
Lempel-Ziv Compression (LZ77):
LZ77 keeps track of the last n bytes of data seen, and when a phrase is encountered
that has already been seen, it outputs a pair of values corresponding to the
position of the phrase in the previously seen buffer of data and the length of
the phrase. The code consists of a set of triples < a, b, c >, where:
a = relative position of the longest match in the dictionary
b = length of the longest match
c = next character in the buffer beyond the longest match
Triples beginning with 0 identify new characters not previously seen.
Example: encoding the string Peter_Piper_pick (the matches below treat 'P' and 'p' as the same symbol).

No.   Output code   Decoded text
1     (0,0,P)       P
2     (0,0,e)       Pe
3     (0,0,t)       Pet
4     (2,1,r)       Peter
5     (0,0,_)       Peter_
6     (6,1,i)       Peter_Pi
7     (8,2,r)       Peter_Piper
8     (6,3,c)       Peter_Piper_pic
9     (0,0,k)       Peter_Piper_pick
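A minimal Python sketch of this triple-based scheme, using a greedy longest-match search over all previously seen text (no window-size limit, which is a simplification). The example string is written in lower case here so that the matches are exact character matches.

def lz77_encode(text):
    # Emit (offset, length, next_char) triples; offset is the distance back to the
    # longest earlier match, and offset = length = 0 marks a new character.
    triples = []
    pos = 0
    while pos < len(text):
        best_off, best_len = 0, 0
        for off in range(1, pos + 1):
            length = 0
            while (pos + length < len(text) - 1 and
                   text[pos - off + length] == text[pos + length]):
                length += 1
            if length > best_len:
                best_off, best_len = off, length
        triples.append((best_off, best_len, text[pos + best_len]))
        pos += best_len + 1
    return triples

def lz77_decode(triples):
    out = []
    for off, length, ch in triples:
        for _ in range(length):
            out.append(out[-off])    # copy from `off` positions back (may overlap)
        out.append(ch)
    return "".join(out)

text = "peter_piper_pick"
codes = lz77_encode(text)
print(codes)   # (0,0,p) (0,0,e) (0,0,t) (2,1,r) (0,0,_) (6,1,i) (8,2,r) (6,3,c) (0,0,k)
print(lz77_decode(codes) == text)   # True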
Arithmetic Coding:
Arithmetic coding is based on the concept of interval subdivision. In arithmetic
coding a source ensemble is represented by an interval between 0 and 1 on the
real number line. Each symbol of the ensemble narrows this interval: the coder
uses the probabilities of the source messages to successively narrow the
interval used to represent the ensemble.
Arithmetic Coding: Description
In the following discussion, we will use M as the size of the alphabet of the
data source, N[x] as symbol x's probability, and Q[x] as symbol x's cumulative
probability (Q[i] = N[0] + N[1] + ... + N[i]).
Assuming we know the probabilities of each symbol, we can allocate to each
symbol an interval with width proportional to its probability, such that the
intervals do not overlap. This can be done by using the cumulative probabilities
as the two ends of each interval: the interval for symbol x runs from Q[x-1] to
Q[x], and symbol x is said to own the range [Q[x-1], Q[x]).
Arithmetic Coding: Encoder example
Symbol x   Probability N[x]   [Q[x-1], Q[x])
A          0.4                [0.0, 0.4)
B          0.3                [0.4, 0.7)
C          0.2                [0.7, 0.9)
D          0.1                [0.9, 1.0)

String to encode: BCAB
Start with the interval [0, 1) and narrow it once per symbol:
B: [0.4, 0.7)
C: [0.61, 0.67)
A: [0.61, 0.634)
B: [0.6196, 0.6268)
Code sent: 0.6196 (any value inside the final interval identifies the string BCAB)
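A minimal Python sketch of the interval-narrowing step shown above (encoder only; a practical coder would also have to deal with finite precision and with signalling the end of the message).

def arithmetic_encode(message, intervals):
    # Narrow [low, high) once per symbol, using the symbol's range [Q[x-1], Q[x]).
    low, high = 0.0, 1.0
    for sym in message:
        q_lo, q_hi = intervals[sym]
        width = high - low
        high = low + width * q_hi
        low = low + width * q_lo
    return low, high               # any number in [low, high) encodes the message

intervals = {"A": (0.0, 0.4), "B": (0.4, 0.7), "C": (0.7, 0.9), "D": (0.9, 1.0)}
low, high = arithmetic_encode("BCAB", intervals)
print(low, high)   # about 0.6196 and 0.6268, as in the example (up to rounding error)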
