Day 20

Uploaded by

studyxubuntu
Copyright
© All Rights Reserved

What is Data Compression?

• Data compression is the representation of an
information source (e.g. a data file, a speech
signal, an image, or a video signal) as accurately
as possible using the fewest number of bits.
• Compressed data can only be understood if the
decoding method is known by the receiver.
Why Data Compression?
• Data storage and transmission cost money. This cost
increases with the amount of data available.
• This cost can be reduced by processing the data so
that it takes less memory and less transmission
time.
• Some data types consist of many chunks of
repeated data (e.g. multimedia data such as
audio, video, images, …)
• Such “raw” data can be transformed into a
compressed data representation form saving a lot
of storage and transmission costs.

• Disadvantage of data compression: compressed data
must be decompressed to be viewed (or heard), thus
extra processing is required.
Lossless and Lossy Compression Techniques
• Data compression techniques are broadly classified into
lossless and lossy.

• Lossless techniques enable exact reconstruction of the
original document from the compressed information.
   • Exploit redundancy in data
   • Applied to general data
   • Examples: Run-length, Huffman and LZW
• Lossy compression reduces a file by permanently
eliminating some information, typically detail that
human perception is unlikely to miss.
   • Exploit redundancy and human perception
   • Applied to audio, image, and video
   • Examples: JPEG and MPEG
• Lossy techniques usually achieve higher compression ratios
than lossless techniques.
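The "exact reconstruction" property of lossless compression can be checked with a short round trip. This is only a sketch: it uses Python's standard `zlib` module (DEFLATE), which is not one of the techniques named in these slides, and the sample data `raw` is a made-up, highly repetitive byte string.

```python
import zlib

# Hypothetical sample: highly repetitive data compresses well.
raw = b"ABCD" * 2500

packed = zlib.compress(raw)        # lossless compression
restored = zlib.decompress(packed)

# Lossless means the round trip returns the original bytes exactly.
print(len(raw), len(packed), restored == raw)
```

A lossy codec such as JPEG would not pass the `restored == raw` check, because some information is discarded for good.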


Classification of Lossless Compression Techniques
• Lossless techniques are classified into static, adaptive (or
dynamic), and hybrid.
• In a static method the mapping from the set of messages to the
set of codewords is fixed before transmission begins, so that a
given message is represented by the same codeword every time
it appears in the input being encoded.
• Static coding requires two passes: one pass to compute
probabilities (or frequencies) and determine the mapping,
and a second pass to encode.
• Examples: Static Huffman Coding

• In an adaptive method the mapping from the set of messages
to the set of codewords changes over time.
• All of the adaptive methods are one-pass methods; only
one scan of the message is required.
• Examples: LZW, and Adaptive Huffman Coding
• An algorithm may also be a hybrid, neither completely static
nor completely dynamic.
Run-length encoding
The following string:
BBBBHHDDXXXXKKKKWWZZZZ
can be encoded more compactly by replacing each run of repeated
characters with a single instance of the character and a number
that gives the length of the run:
B4H2D2X4K4W2Z4
Here "B4" means four B's, "H2" means two H's, etc. Compressing
a string in this way is called run-length encoding.
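A minimal sketch of this scheme in Python follows. The function names `rle_encode` and `rle_decode` are illustrative, and the decoder assumes the original text contains no digits (otherwise the counts would be ambiguous).

```python
import re
from itertools import groupby

def rle_encode(text):
    """Replace each run of a repeated character with the character and its count."""
    return "".join(f"{ch}{len(list(run))}" for ch, run in groupby(text))

def rle_decode(encoded):
    """Expand character/count pairs; assumes the original text had no digits."""
    return "".join(ch * int(n) for ch, n in re.findall(r"(\D)(\d+)", encoded))

print(rle_encode("BBBBHHDDXXXXKKKKWWZZZZ"))  # B4H2D2X4K4W2Z4
```

Note that a string with no repeated characters grows under this scheme ("ABC" becomes "A1B1C1"), which is why run-length encoding only pays off on data with long runs.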

B0 = number of bits required before compression
B1 = number of bits required after compression
Compression Ratio = B0 / B1
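As a worked example, the ratio can be computed for the run-length string above. The sketch assumes every character, including the digits of the counts, is stored in 8 bits; the slides do not state a character size, so `bits_per_symbol` is an assumption.

```python
def compression_ratio(original, encoded, bits_per_symbol=8):
    """Return B0 / B1, assuming a fixed number of bits per stored character."""
    b0 = len(original) * bits_per_symbol  # bits before compression
    b1 = len(encoded) * bits_per_symbol   # bits after compression
    return b0 / b1

# 22 characters (176 bits) shrink to 14 characters (112 bits).
ratio = compression_ratio("BBBBHHDDXXXXKKKKWWZZZZ", "B4H2D2X4K4W2Z4")
print(round(ratio, 2))  # 1.57
```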


Run-length encoding

As another example, consider the storage of a rectangular image. As
a single-colour bitmapped image, it can be stored as:
[Figure: the bitmap, 8 rows of 40 bits each]

The rectangular image can be compressed with run-length
encoding by counting identical bits as follows:
0,40
0,40
0,10 1,20 0,10
0,10 1,1 0,18 1,1 0,10
0,10 1,1 0,18 1,1 0,10
0,10 1,1 0,18 1,1 0,10
0,10 1,20 0,10
0,40

The first line says that the first line of the bitmap consists of
40 0's. The third line says that the third line of the bitmap
consists of 10 0's followed by 20 1's followed by 10 more 0's, and
so on for the other lines.
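The run lists can be expanded back into the bitmap they describe. In this sketch the representation of each row as a list of `(bit, count)` pairs and the names `rows` and `expand` are illustrative choices, not part of the slides.

```python
# Each row is a list of (bit, run length) pairs, copied from the encoding above.
side = [(0, 10), (1, 1), (0, 18), (1, 1), (0, 10)]
rows = [
    [(0, 40)],
    [(0, 40)],
    [(0, 10), (1, 20), (0, 10)],
    side,
    side,
    side,
    [(0, 10), (1, 20), (0, 10)],
    [(0, 40)],
]

def expand(row):
    """Turn one run-length encoded row back into a string of bits."""
    return "".join(str(bit) * count for bit, count in row)

bitmap = [expand(row) for row in rows]
for line in bitmap:
    print(line)
```

Printing the result shows the outline of a rectangle, 20 pixels wide and 5 pixels tall, drawn in 1's on a background of 0's.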
Run-length encoding
This compression technique is most useful where
symbols appear in long runs. It can therefore be
useful for images that have large areas where the
pixels all have the same value, such as cartoons.
Relative encoding
• Relative encoding is a transmission technique that
attempts to improve efficiency by transmitting the
difference between each value and its predecessor,
in place of the value itself.
• E.g. the sequence 1 5 1 0 6 4 3 3 0 0 3 would be
transmitted as 1 +4 -4 -1 +6 -2 -1 +0 -3 +0 +3.
• In effect the transmitter is predicting that each
value is the same as its predecessor and the data
transmitted is the difference between the predicted
and actual values.
• Differential Pulse Code Modulation (DPCM)
is an example of relative encoding.
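The example sequence can be reproduced with a short sketch. The function names are illustrative, and the first value is sent unchanged by assuming a predecessor of 0, which matches the leading 1 in the example.

```python
def delta_encode(values):
    """Send each value as its difference from the previous one (first vs. 0)."""
    deltas, previous = [], 0
    for value in values:
        deltas.append(value - previous)
        previous = value
    return deltas

def delta_decode(deltas):
    """Rebuild the values by accumulating the differences."""
    values, total = [], 0
    for delta in deltas:
        total += delta
        values.append(total)
    return values

data = [1, 5, 1, 0, 6, 4, 3, 3, 0, 0, 3]
print(delta_encode(data))  # [1, 4, -4, -1, 6, -2, -1, 0, -3, 0, 3]
```

The gain comes when neighbouring values are close together, so the differences are small numbers that need fewer bits than the values themselves.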
Huffman coding
• A variable-length code is assigned to each
distinct input character
• The most frequent characters have the shortest
codes
• The least frequent characters have longer codes
• In the first step, build a Huffman tree; in the
second step, traverse the tree to read off the
codes.
Huffman coding- Building Huffman Tree
1. Create a leaf node for each unique character and
build a min heap of all leaf nodes
2. Extract the two nodes with the minimum frequency
from the min heap
3. Create a new internal node with a frequency equal
to the sum of the two nodes' frequencies. Make the
first extracted node its left child and the other
extracted node its right child. Add this node to the
min heap
4. Repeat steps 2 and 3 until the heap contains only
one node. The remaining node is the root node and
the tree is complete.
Huffman coding- Example
Message : a 100-character string with the frequency of
each character given below

character Frequency
a 5
b 9
c 12
d 13
e 16
f 45
Huffman coding- Example(contd.)
Extract two minimum frequency nodes from min
heap. Add a new internal node with frequency 5 + 9 =
14.

character Frequency
c 12
d 13
Internal Node 14
e 16
f 45
Huffman coding- Example(contd.)
Extract two minimum frequency nodes from heap.
Add a new internal node with frequency 12 + 13 = 25

character Frequency
Internal Node 14
e 16
Internal Node 25
f 45
Huffman coding- Example(contd.)
Extract two minimum frequency nodes. Add a new
internal node with frequency 14 + 16 = 30

character Frequency
Internal Node 25
Internal Node 30
f 45
Huffman coding- Example(contd.)
Extract two minimum frequency nodes. Add a new internal
node with frequency 25 + 30 = 55

character Frequency
f 45
Internal Node 55
Huffman coding- Example(contd.)
Extract two minimum frequency nodes. Add a new
internal node with frequency 45 + 55 = 100

character Frequency
Internal Node 100
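Huffman coding- Generating the codes
Traverse the tree formed starting from the root. While
moving to the left child, write 0. While moving to the
right child, write 1. The code-word of a character is the
sequence of bits written on the path from the root to its leaf.
Huffman coding- Generating the codes(contd.)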
character code-word
f 0
e 111
d 101
c 100
b 1101
a 1100
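The tree-building steps and the traversal can be sketched with Python's `heapq` module. This is a minimal illustration, not the slides' own implementation: the tuple-based node representation and the `tiebreak` counter (used only so that heap entries with equal frequency never compare the nodes themselves) are assumptions.

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Build a Huffman tree from {symbol: frequency} and return {symbol: code}."""
    tiebreak = count()  # assumption: breaks frequency ties by insertion order
    heap = [(f, next(tiebreak), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # first extracted: left child
        f2, _, right = heapq.heappop(heap)   # second extracted: right child
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    _, _, root = heap[0]

    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):          # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                                # leaf: a symbol
            codes[node] = prefix or "0"
    walk(root, "")
    return codes

freqs = {"a": 5, "b": 9, "c": 12, "d": 13, "e": 16, "f": 45}
codes = huffman_codes(freqs)
total_bits = sum(freqs[s] * len(codes[s]) for s in freqs)
print(codes, total_bits)
```

For the example frequencies this reproduces the code table above, and the 100-character message takes 224 bits, compared with 300 bits for a fixed 3-bit code per character.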
LZW (Lempel–Ziv–Welch)
• It is lossless, meaning no data is lost when
compressing
• It is used in the GIF image format and in the Unix
file compression utility compress
• It works by reading a sequence of symbols, grouping
the symbols into strings, and converting the strings
into codes
• The strings are replaced by their corresponding
codes and so the input file is compressed.
• The efficiency of the algorithm increases as the
number of long, repetitive words in the input data
increases.
LZW ENCODING
Initialize table with single character strings
P = first input character
WHILE not end of input stream
    C = next input character
    IF P + C is in the string table
        P = P + C
    ELSE
        output the code for P
        add P + C to the string table
        P = C
END WHILE
output code for P
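LZW Encoding Example

Data word : a b a b a b a b a
Initial dictionary : 1 a, 2 b

Step 1: read a, then b. ab is not in the dictionary, so
output the code for a and add ab as entry 3.
Dictionary : 1 a, 2 b, 3 ab
Encoded word : 1
LZW Encoding Example(cont.)

Step 2: continue from b and read a. ba is not in the
dictionary, so output the code for b and add ba as entry 4.
Dictionary : 1 a, 2 b, 3 ab, 4 ba
Encoded word : 1 2
LZW Encoding Example(cont.)

Step 3: continue from a and read b. ab already available in
the dictionary, append next character. aba is not in the
dictionary, so output the code for ab and add aba as entry 5.
Dictionary : 1 a, 2 b, 3 ab, 4 ba, 5 aba
Encoded word : 1 2 3
LZW Encoding Example(cont.)

Step 4: continue from a and read b. ab already available in
the dictionary, append next character. aba already available
in the dictionary, append next character. abab is not in the
dictionary, so output the code for aba and add abab as entry 6.
Dictionary : 1 a, 2 b, 3 ab, 4 ba, 5 aba, 6 abab
Encoded word : 1 2 3 5

Step 5: continue from b and read a. ba already available in
the dictionary. Since the end of the input has been reached,
output the code for ba.
Encoded word : 1 2 3 5 4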
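The encoding procedure can be sketched in Python as follows. The function name `lzw_encode` and the `alphabet` parameter are illustrative; codes are numbered from 1 to match the dictionary in this example, whereas real implementations usually start from a 256-entry byte table.

```python
def lzw_encode(text, alphabet):
    """Encode text with LZW, starting from one table entry per alphabet symbol."""
    table = {ch: i for i, ch in enumerate(alphabet, start=1)}
    if not text:
        return []
    codes = []
    p = text[0]
    for c in text[1:]:
        if p + c in table:
            p = p + c                      # keep extending the current string
        else:
            codes.append(table[p])         # output the code for P
            table[p + c] = len(table) + 1  # add P + C with the next free code
            p = c
    codes.append(table[p])                 # flush the final string
    return codes

print(lzw_encode("ababababa", "ab"))  # [1, 2, 3, 5, 4]
```

Nine input characters are replaced by five codes, and the saving grows as longer repeated strings enter the table.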
LZW DECODING
Initialize table with single character strings
OLD = first input code
output translation of OLD
C = first character of translation of OLD
WHILE not end of input stream
    NEW = next input code
    IF NEW is not in the string table
        S = translation of OLD
        S = S + C
    ELSE
        S = translation of NEW
    output S
    C = first character of S
    add translation of OLD + C to the string table
    OLD = NEW
END WHILE
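LZW Decoding Example

Code word : 1 2 3 5 4 7
Initial dictionary : 1 a, 2 b

Code 1: output a.
Decoded word : a
Code 2: output b. Add entry 3 = ab (previous string a plus
the first character of b).
Dictionary : 1 a, 2 b, 3 ab
Decoded word : a b
Code 3: output ab. Add entry 4 = ba.
Dictionary : 1 a, 2 b, 3 ab, 4 ba
Decoded word : a b ab
LZW Decoding Example(contd.)

Code 5: not yet in the dictionary. The pending entry 5 is
ab?, where ? is the first character of the string being
decoded. That string is entry 5 itself, which starts with a,
so entry 5 = aba.
Dictionary : 1 a, 2 b, 3 ab, 4 ba, 5 aba
Decoded word : a b ab aba
LZW Decoding Example(contd.)

Code 4: output ba. Add entry 6 = abab.
Dictionary : 1 a, 2 b, 3 ab, 4 ba, 5 aba, 6 abab
Decoded word : a b ab aba ba
LZW Decoding Example(contd.)

Code 7: not yet in the dictionary. The pending entry 7 is
ba?, and by the same argument ? = b, so entry 7 = bab.
Dictionary : 1 a, 2 b, 3 ab, 4 ba, 5 aba, 6 abab, 7 bab
Decoded word : a b ab aba ba bab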
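The decoding procedure, including the case where a code arrives before its entry exists, can be sketched in Python. The function name `lzw_decode` and the `alphabet` parameter are illustrative, and codes are numbered from 1 as in the example.

```python
def lzw_decode(codes, alphabet):
    """Decode a list of LZW codes, rebuilding the table as the encoder built it."""
    table = {i: ch for i, ch in enumerate(alphabet, start=1)}
    if not codes:
        return ""
    old = codes[0]
    out = [table[old]]
    for new in codes[1:]:
        if new in table:
            s = table[new]
        else:
            # Code not defined yet: it must be the entry about to be added,
            # i.e. the previous string plus its own first character.
            s = table[old] + table[old][0]
        out.append(s)
        table[len(table) + 1] = table[old] + s[0]  # add OLD + C
        old = new
    return "".join(out)

print(lzw_decode([1, 2, 3, 5, 4, 7], "ab"))  # abababababab
```

The decoder never needs the encoder's table to be transmitted: it rebuilds the same entries one step behind the encoder, which is why codes 5 and 7 in the example have to be resolved from the previous string.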
Image Compression Standards
• Tagged Image File Format (TIFF): widely used in the
printing business as a photograph file standard; not
generally used in browsers
• Portable Network Graphics (PNG): created as an
open, patent-free alternative to GIF. Designed to transfer
images over the internet.
• Joint Photographic Experts Group (JPEG): the working
group that creates standards for still image
compression.
• JPEG 2000: image compression standard compatible
with multimedia technologies, with features such as
superior low bit-rate performance, progressive transmission
by pixel accuracy and by resolution, region-of-interest
coding, and robustness to bit errors
JPEG vs JPEG 2000

JPEG                          JPEG 2000
Created for natural images    Created for natural and computer generated images
Discrete Cosine Transform     Discrete Wavelet Transform
Video Compression Standards
Standard            Application
H.261               Video conferencing over ISDN
MPEG-1              Video on digital storage media (CD-ROM)
MPEG-2              Digital Television
H.263               Video telephony over PSTN
MPEG-4              Object-based coding, synthetic content, interactivity
H.264/MPEG-4 AVC    Improved video compression
