Unit 1 Data Compression
Compression:
Compression is the process of reducing the size of a digital file so that it occupies less space.
Types of Compression:
Lossless compression algorithms reduce the size of files without losing any information in
the file, which means that we can reconstruct the original data from the compressed file.
Lossy compression algorithms reduce the size of files by discarding the less important
information in a file, which can significantly reduce file size but also affect file quality.
Lossy vs Lossless Compression:
1. Lossy compression is the method which eliminates the data which is not noticeable, while lossless compression does not eliminate any data.
2. In lossy compression, a file cannot be restored or rebuilt in its original form, while in lossless compression a file can be restored in its original form.
3. Lossy compression reduces the size of data far more than lossless compression does.
Pros of Lossy: small file sizes, ideal for web use; lots of tools, plugins and software support it.
Pros of Lossless: no loss in quality; slight decreases in file sizes.
Common algorithms: Transform coding (lossy); Lempel-Ziv-Welch and Huffman coding (lossless).
RUN-LENGTH ENCODING
1. Encode
AAAAAABBBBBCCCCCCCCDDEEEEEFFF
Output:
6A 5B 8C 2D 5E 3F
2. Encode
WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW
Output
12W 1B 12W 3B 24W 1B 14W
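The run-length encoding shown above can be sketched in Python (the function name rle_encode is illustrative; the notes only give the input/output pairs):

```python
def rle_encode(text):
    """Run-length encode: replace each run of a repeated symbol with
    its length followed by the symbol, e.g. 'AAAB' -> '3A 1B'."""
    runs = []
    i = 0
    while i < len(text):
        j = i
        while j < len(text) and text[j] == text[i]:
            j += 1                      # extend the current run
        runs.append(f"{j - i}{text[i]}")
        i = j
    return " ".join(runs)

print(rle_encode("W" * 12 + "B" + "W" * 12 + "BBB" + "W" * 24 + "B" + "W" * 14))
# -> 12W 1B 12W 3B 24W 1B 14W
```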
DICTIONARY-BASED ALGORITHMS
The compression techniques we have seen so far replace individual symbols with variable-length codewords. In dictionary compression, variable-length substrings are replaced by short, possibly even fixed-length codewords. Compression is achieved by replacing long strings with shorter codewords.
• The text is encoded by replacing each phrase Ti with a code that acts as a pointer to the
dictionary.
LEMPEL ZIV
Lossless compression
Dictionary based compression
For very short inputs, the encoded string can be longer than the input stream
Example 1
Encode AABABBBABAABABBBABBABB with A = 0, B = 1.

POSITION      1      2      3      4      5      6      7      8      9
SEQUENCE      A      AB     ABB    B      ABA    ABAB   BB     ABBA   BB
NUMERICAL     ΦA     1B     2B     ΦB     2A     5B     4B     3A     7
CODE          0000   0011   0101   0001   0100   1011   1001   0110   0111

(In the numerical representation, the number denotes the dictionary position of the longest previously seen prefix, and Φ denotes the empty prefix. In each 4-bit code, the first 3 bits give that position and the last bit the letter. The final phrase BB is already in the dictionary at position 7, so only the position is sent, as the 4-bit number 0111.)
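The LZ78 parsing of Example 1 can be reproduced with a short sketch (lz78_encode is an illustrative name; position 0 plays the role of Φ):

```python
def lz78_encode(text):
    """LZ78: parse the input into phrases 'longest previously seen
    prefix + one new symbol'; emit (dictionary position of the
    prefix, new symbol). Position 0 means the empty prefix (Φ)."""
    dictionary = {}          # phrase -> 1-based dictionary position
    output = []
    phrase = ""
    for ch in text:
        if phrase + ch in dictionary:
            phrase += ch     # keep extending the match
        else:
            output.append((dictionary.get(phrase, 0), ch))
            dictionary[phrase + ch] = len(dictionary) + 1
            phrase = ""
    if phrase:               # input ended inside a known phrase
        output.append((dictionary[phrase],))
    return output

print(lz78_encode("AABABBBABAABABBBABBABB"))
# -> [(0,'A'), (1,'B'), (2,'B'), (0,'B'), (2,'A'), (5,'B'), (4,'B'), (3,'A'), (7,)]
```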
Example 2: 01000101110010100101
LEMPEL-ZIV-WELCH (LZW)
Common technique
Used in GIF, optionally in TIFF and PDF
Simple
High throughput in hardware
Widely used in Unix file compression
Can roughly double the effective capacity of storage hardware
Scans a file for data patterns that appear more than once
Patterns are stored in a dictionary
Example 1: Encode “TOBEORNOTTOBEORTOBEORNOT#”
There are 26 symbols in the plaintext alphabet (the capital letters A through Z, numbered 1 to 26). # is used to represent a stop code.
OUTPUT:

Extended Dictionary    CODE    Description
27: TO                 20      CODE OF FIRST LETTER T
28: OB                 15      CODE OF FIRST LETTER O
29: BE                 2       CODE OF FIRST LETTER B
30: EO                 5       CODE OF FIRST LETTER E
31: OR                 15      CODE OF FIRST LETTER O
32: RN                 18      CODE OF FIRST LETTER R
33: NO                 14      CODE OF FIRST LETTER N
34: OT                 15      CODE OF FIRST LETTER O
35: TT                 20      CODE OF FIRST LETTER T
36: TOB                27      CODE OF FIRST TWO LETTERS TO
37: BEO                29      CODE OF FIRST TWO LETTERS BE
38: ORT                31      CODE OF FIRST TWO LETTERS OR
39: TOBE               36      CODE OF FIRST THREE LETTERS TOB
40: EOR                30      CODE OF FIRST TWO LETTERS EO
41: RNO                32      CODE OF FIRST TWO LETTERS RN
42: OT#                34      CODE OF FIRST TWO LETTERS OT
Thus the code is 20,15,2,5,15,18,14,15,20,27,29,31,36,30,32,34,#
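A minimal LZW encoder sketch reproduces the table above. It assumes A to Z are numbered 1 to 26 and assigns the stop code # the value 0 (the notes only call # a stop code, so 0 is an assumption):

```python
def lzw_encode(text, alphabet):
    """LZW: the dictionary is seeded with the single-symbol alphabet.
    Each step outputs the code of the longest known prefix and adds
    that prefix extended by one more symbol as a new dictionary entry."""
    codes = dict(alphabet)               # string -> code
    next_code = max(codes.values()) + 1  # new entries start here (27)
    output = []
    current = ""
    for ch in text:
        if current + ch in codes:
            current += ch                # keep extending the match
        else:
            output.append(codes[current])
            codes[current + ch] = next_code
            next_code += 1
            current = ch
    if current:
        output.append(codes[current])
    return output

# A-Z numbered 1..26 as in the example; '#' assumed to be code 0.
alphabet = {chr(ord('A') + i): i + 1 for i in range(26)}
alphabet['#'] = 0
print(lzw_encode("TOBEORNOTTOBEORTOBEORNOT#", alphabet))
# -> [20, 15, 2, 5, 15, 18, 14, 15, 20, 27, 29, 31, 36, 30, 32, 34, 0]
```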
Example 2:
Encode wabba_wabba_wabba_wabba_w
Index Entry
1 _
2 a
3 b
4 o
5 w
Solution:
Extended Dictionary    CODE    Description
6: wa                  5       CODE OF FIRST LETTER w
7: ab                  2       CODE OF FIRST LETTER a
8: bb                  3       CODE OF FIRST LETTER b
9: ba                  3       CODE OF FIRST LETTER b
10: a_                 2       CODE OF FIRST LETTER a
11: _w                 1       CODE OF FIRST LETTER _
12: wab                6       CODE OF FIRST TWO LETTERS wa
13: bba                8       CODE OF FIRST TWO LETTERS bb
14: a_w                10      CODE OF FIRST TWO LETTERS a_
15: wabb               12      CODE OF FIRST THREE LETTERS wab
16: ba_                9       CODE OF FIRST TWO LETTERS ba
17: _wa                11      CODE OF FIRST TWO LETTERS _w
18: abb                7       CODE OF FIRST TWO LETTERS ab
19: ba_w               16      CODE OF FIRST THREE LETTERS ba_
Thus the code is 5,2,3,3,2,1,6,8,10,12,9,11,7,16,5 (the final w is sent as its own code 5)
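Decoding reverses the process: the decoder rebuilds the same dictionary from the codes alone. A sketch (lzw_decode is an illustrative name):

```python
def lzw_decode(codes, alphabet):
    """Invert LZW: rebuild the dictionary while reading codes. The
    decoder is always one entry 'behind' the encoder, so a code not
    yet in the dictionary must be previous + previous[0]."""
    entries = {code: sym for sym, code in alphabet.items()}
    next_code = max(entries) + 1
    previous = entries[codes[0]]
    result = [previous]
    for code in codes[1:]:
        if code in entries:
            current = entries[code]
        else:                       # the special KwKwK case
            current = previous + previous[0]
        result.append(current)
        entries[next_code] = previous + current[0]
        next_code += 1
        previous = current
    return "".join(result)

alphabet = {'_': 1, 'a': 2, 'b': 3, 'o': 4, 'w': 5}
print(lzw_decode([5, 2, 3, 3, 2, 1, 6, 8, 10, 12, 9, 11, 7, 16, 5], alphabet))
# -> wabba_wabba_wabba_wabba_w
```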
Example 3:
Encode geekific-geeficf
Index Entry
45 -
99 c
101 e
102 f
103 g
105 i
107 k
The initial dictionary holds codes 0 to 255 (the ASCII alphabet, which gives the index values above), so new entries start at 256.
Extended Dictionary    CODE    Description
256: ge                103     CODE OF FIRST LETTER g
257: ee                101     CODE OF FIRST LETTER e
258: ek                101     CODE OF FIRST LETTER e
259: ki                107     CODE OF FIRST LETTER k
260: if                105     CODE OF FIRST LETTER i
261: fi                102     CODE OF FIRST LETTER f
262: ic                105     CODE OF FIRST LETTER i
263: c-                99      CODE OF FIRST LETTER c
264: -g                45      CODE OF FIRST LETTER -
265: gee               256     CODE OF FIRST TWO LETTERS ge
266: ef                101     CODE OF FIRST LETTER e
267: fic               261     CODE OF FIRST TWO LETTERS fi
268: cf                99      CODE OF FIRST LETTER c
Thus the code is 103,101,101,107,105,102,105,99,45,256,101,261,99,102
HUFFMAN CODING
Huffman coding is an algorithm for compressing data with the aim of reducing its size without losing any of the details. This algorithm was developed by David Huffman. Huffman coding is typically useful when the data we want to compress has frequently occurring characters in it.
It is a lossless compression.
It is a variable-length code.
The code lengths are based on the frequencies of the characters.
Variable-length codes are assigned to input characters.
They are prefix codes.
Huffman Coding step-by-step working, i.e. creation of the Huffman Tree, is as follows:
Step-1: Calculate the frequency of each character.
Step-2: Sort all the characters in ascending order of frequency.
Step-3: Mark each unique character as a leaf node.
Step-4: Create a new internal node from the two nodes with the lowest frequencies.
Step-5: Set the frequency of the new node to the sum of the two nodes' frequencies.
Step-6: Mark the first node as the left child and the other node as the right child of the newly created node.
Step-7: Repeat steps 2 to 6 until only one node (the root) remains.
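The steps above can be sketched with Python's heapq module. Exact codes depend on how ties between equal frequencies are broken, but the total encoded length always equals the Huffman optimum:

```python
import heapq

def huffman_codes(freq):
    """Build a Huffman tree from {symbol: frequency} and return
    {symbol: code}. Repeatedly merge the two lowest-frequency nodes,
    labelling left edges '0' and right edges '1'."""
    # heap items: (frequency, tiebreaker, tree); a leaf is just a symbol
    heap = [(f, i, sym) for i, (sym, f) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # two smallest frequencies
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, count, (left, right)))
        count += 1
    codes = {}
    def walk(node, code):
        if isinstance(node, tuple):
            walk(node[0], code + "0")        # left edge gets 0
            walk(node[1], code + "1")        # right edge gets 1
        else:
            codes[node] = code or "0"
    walk(heap[0][2], "")
    return codes

# frequencies from Example 2 below ("go go gophers"-style data)
freq = {'g': 3, 'o': 3, ' ': 2, 'p': 1, 'e': 1, 'h': 1, 'r': 1, 's': 1}
codes = huffman_codes(freq)
print(codes)
```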
Example:
Suppose a data file has the following characters and the frequencies. If huffman coding is
used, calculate:
Huffman Code of each character
Average code length
Length of Huffman encoded data
Solution:
Initially, create the Huffman Tree by repeatedly merging the two lowest-frequency nodes (Steps 2 to 5: intermediate tree diagrams omitted).
The above tree is a Huffman Tree.
Now, assign weight to all the nodes.
Assign “0” to all left edges and “1” to all right edges.
The tree will become
Example 2:
SYMBOL    FREQUENCY
g         3
o         3
p         1
e         1
h         1
r         1
s         1
space     2
Step 2: add the 2 lowest frequencies and arrange in ascending order again, as shown in the tree diagram.
Steps 3 to 7: repeat, each time adding the 2 lowest frequencies and re-arranging in ascending order (tree diagrams omitted).
Step 8: add the 2 lowest frequencies to complete the tree, and assign 0 to each left edge and 1 to each right edge.
Step 9: for every symbol, read off its code from the tree, top to bottom.
Symbol    Frequency of a String    Code    Size
g         3                        00      3×2 = 6
o         3                        01      3×2 = 6
s         1                        100     1×3 = 3
e         1                        1100    1×4 = 4
h         1                        1101    1×4 = 4
p         1                        1110    1×4 = 4
r         1                        1111    1×4 = 4
space     2                        101     2×3 = 6
Total     13 characters                    37 bits
Example 3:
Example 4:
(Both are worked out as tree diagrams, omitted here.)
HUFFMAN CODING: ENTROPY AND EFFICIENCY
Entropy is the average level of information, surprise, or uncertainty inherent to the variable’s possible outcomes:
H[X] = − Σ Pi log2 Pi
Average length L = Σ Pi × Ni (where Ni is the code length of message i)
Efficiency η = H/L
Redundancy = 1 − η
Procedure
Example: Encode using Huffman coding P0=0.4, P1=0.2, P2=0.1, P3=0.2 and P4=0.1
MSG PROB
P0 0.4
P1 0.2
P3 0.2
P2 0.1
P4 0.1
Step 2: Combine (add) the last 2 probabilities and arrange in descending order again; repeat until two probabilities remain, then assign code bits back down the tree.
The resulting codes (one valid assignment, with lengths 2, 2, 2, 3, 3):
P0 = 00
P1 = 01
P3 = 10
P2 = 110
P4 = 111
H = −(1/0.3010)[(0.4×log 0.4)+(0.2×log 0.2)+(0.1×log 0.1)+(0.2×log 0.2)+(0.1×log 0.1)]
  = 2.122 bits/message
Average Length L = Σ Pi × Ni
  = (0.4×2)+(0.2×2)+(0.1×3)+(0.2×2)+(0.1×3)
  = 2.2 bits/message
Efficiency η = H/L = 2.122/2.2 = 0.9645 = 96.45%
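These quantities can be checked numerically; the probabilities and code lengths below are taken from the example above:

```python
from math import log2

def entropy(probs):
    """H = -sum(p_i * log2(p_i)), in bits per symbol."""
    return -sum(p * log2(p) for p in probs)

probs   = [0.4, 0.2, 0.2, 0.1, 0.1]   # P0, P1, P3, P2, P4
lengths = [2, 2, 2, 3, 3]             # Huffman code lengths

H = entropy(probs)                    # entropy
L = sum(p * n for p, n in zip(probs, lengths))   # average length
efficiency = H / L
print(H, L, efficiency)               # H ~ 2.122, L = 2.2, efficiency ~ 96.45%
```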
---------------------------------------------------------------------------------------------------------------------
Example3: x1=0.4,x2=0.19,x3=0.16,x4=x5=0.15
Example 4: x1=0.3,x2=0.25,x3=0.2,x4=0.12,x5=0.08 and x6=0.05
Advantages:
Efficient
Shorter codes for more frequent symbols
Higher compression ratio
Does not need any special markers to separate different codes
Easy to integrate with existing systems
Lossless
Easy to implement in hardware and software
Disadvantages:
---------------------------------------------------------------------------------------------------------------------
ARITHMETIC CODING
Arithmetic coding is a type of entropy encoding utilized in lossless data compression.
Arithmetic coding can be used in most applications of data compression. Its main
usefulness is in obtaining compression in conjunction with an adaptive model, or when the
probability of one event is close to 1.
PROCEDURE
Example:
Find Sm (the tag) for the message “babc” with probabilities a = 0.2, b = 0.5 and c = 0.3, using arithmetic coding.
Step 1: Assign the initial intervals: a = [0, 0.2), b = [0.2, 0.7), c = [0.7, 1.0).
Step 2: For b, subdivide the interval [0.2, 0.7):
Range of a = 0.2+(0.5×0.2) = 0.2+0.10 = 0.3
Range of b = 0.3+(0.5×0.5) = 0.3+0.25 = 0.55
Range of c = 0.55+(0.5×0.3) = 0.55+0.15 = 0.7
Step 3: For a, subdivide the interval [0.2, 0.3):
Range of a = 0.2+(0.1×0.2) = 0.2+0.02 = 0.22
Range of b = 0.22+(0.1×0.5) = 0.22+0.05 = 0.27
Range of c = 0.27+(0.1×0.3) = 0.27+0.03 = 0.3
Step 4: For b, subdivide the interval [0.22, 0.27):
Range of a = 0.22+(0.05×0.2) = 0.22+0.01 = 0.23
Range of b = 0.23+(0.05×0.5) = 0.23+0.025 = 0.255
Range of c = 0.255+(0.05×0.3) = 0.255+0.015 = 0.27
Step 5: For c, the final interval is [0.255, 0.27).
The tag is Sm = (0.255+0.27)/2 = 0.2625
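The interval narrowing above can be reproduced with a short sketch (arithmetic_intervals is an illustrative name; no bit-level renormalization is attempted):

```python
def arithmetic_intervals(message, model):
    """Narrow [low, high) one symbol at a time: each symbol selects its
    sub-interval of the current range, proportional to its probability."""
    low, high = 0.0, 1.0
    for sym in message:
        width = high - low
        cum = 0.0
        for s, p in model:            # model lists (symbol, probability)
            if s == sym:
                high = low + width * (cum + p)   # upper end of sub-interval
                low = low + width * cum          # lower end of sub-interval
                break
            cum += p
    return low, high

model = [('a', 0.2), ('b', 0.5), ('c', 0.3)]
low, high = arithmetic_intervals("babc", model)
tag = (low + high) / 2
print(low, high, tag)   # low ~ 0.255, high ~ 0.27, tag ~ 0.2625
```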
---------------------------------------------------------------------------------------------------------------------
PREDICTION WITH PARTIAL MATCH (PPM)
Using the history to determine the next symbol in a predictive manner is called predictive coding.
In PPM escape symbol <ESC> is used to denote that the letter to be encoded isn’t present in
the context.
The basic algorithm initially attempts to use the largest context. The size of the largest
context is predetermined.
If the symbol to be encoded has not previously been present in this context, an escape
symbol is encoded and the algorithm attempts to use the next smaller context.
If the symbol has not occurred in this context either, the size of the context is further
reduced.
This process continues until either we obtain a context that has previously been present
with this symbol, or we arrive at the conclusion that the symbol has not been encountered
previously in any context.
In this case, we use a probability of 1/M to encode the symbol, where M is the size of the
source alphabet.
Example:
Letter to be coded - the letter a of the word probability.
First attempt - see if the string proba has previously occurred, that is, if the letter a has previously occurred in the context of prob.
If not, we send an escape symbol.
Second attempt - see if the letter a has previously occurred in the context of rob.
If the string roba has not occurred previously, we send an escape symbol.
Third attempt - see if the letter a has previously occurred in the context of ob.
If the string oba has not occurred previously, we send an escape symbol.
Fourth attempt - see if the letter a has previously occurred in the context of b.
If the string ba has not occurred previously, it is concluded that a has occurred for the first time, and it is encoded separately (with probability 1/M).
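The fall-back search described above can be sketched as follows. The function names are illustrative, and the sketch only reports which context would code the symbol; it does not perform the actual arithmetic coding:

```python
from collections import defaultdict, Counter

def build_contexts(history, max_order):
    """Count, for every context of length 1..max_order, which letters
    followed it in the history seen so far."""
    contexts = defaultdict(Counter)
    for order in range(1, max_order + 1):
        for i in range(len(history) - order):
            contexts[history[i:i + order]][history[i + order]] += 1
    return contexts

def ppm_lookup(contexts, history, symbol, max_order):
    """Try the longest context first; emit <ESC> and shorten the
    context until the symbol has been seen, or fall back to the
    order -1 model with probability 1/M."""
    events = []
    for order in range(max_order, 0, -1):
        ctx = history[-order:]
        if symbol in contexts.get(ctx, {}):
            events.append(f"code '{symbol}' in context '{ctx}'")
            return events
        events.append(f"<ESC> from context '{ctx}'")
    events.append(f"code '{symbol}' with probability 1/M")
    return events

history = "this is"
contexts = build_contexts(history, 2)
print(ppm_lookup(contexts, history, " ", 2))
# -> ["code ' ' in context 'is'"]
```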
Example:
Encode the sequence
“this is the “
1. Assume that we have already encoded the initial 7 symbols “this is” (including the space).
2. Assume that the longest context length is two, that is, the orders used are -1, 0, 1 and 2.
3. Count array for the -1 order context of “this is the ” (every letter of the alphabet is counted once, i.e. equally likely):
Letter count Cumulative count
t 1 1
h 1 2
i 1 3
s 1 4
space 1 5
e 1 6
Total count 6
Since the special symbol (space) should be listed last, interchange e and space.
4. Count array for the 1 order context of “this is” (it gives how many times each letter appears after the context):

I order context t:
letter     count   cum count   Remarks
h          1       1           h appears once after t in "this is"
<ESC>      1       2
Total count: 2

I order context h:
letter     count   cum count   Remarks
i          1       1           i appears once after h in "this is"
<ESC>      1       2
Total count: 2

I order context i:
letter     count   cum count   Remarks
s          2       2           s appears twice after i in "this is"
<ESC>      1       3
Total count: 3

I order context s:
letter     count   cum count   Remarks
space      1       1           space appears once after s in "this is"
<ESC>      1       2
Total count: 2

I order context space:
letter     count   cum count   Remarks
i          1       1           i appears once after space in "this is"
<ESC>      1       2
Total count: 2
5. Count array for the 2 order context of “this is”:

II order context th:
letter     count   cum count   Remarks
i          1       1           i appears once after th in "this is"
<ESC>      1       2
Total count: 2

II order context hi:
letter     count   cum count   Remarks
s          1       1           s appears once after hi in "this is"
<ESC>      1       2
Total count: 2

II order context is:
letter     count   cum count   Remarks
space      1       1           space appears once after is in "this is"
<ESC>      1       2
Total count: 2

II order context "s space":
letter     count   cum count   Remarks
i          1       1           i appears once after "s " in "this is"
<ESC>      1       2
Total count: 2

II order context "space i":
letter     count   cum count   Remarks
s          1       1           s appears once after " i" in "this is"
<ESC>      1       2
Total count: 2
6. We assume that the word length for arithmetic coding is six. Thus l = 000000 and u = 111111.
7. As “this is” has already been encoded, the next letter to be encoded is the space in “this is the”. Its 2 order context is “is” (the last two letters encoded), in which the cumulative count of space is 1 and the total count is 2.
un-1 = decimal value of (111111) = 63
Substituting, ln = 0+{[(63-0+1)×0]/2} = 0 = 000000
and un = 0+{[(63-0+1)×1]/2}-1 = 31 = 011111