HGGJ Chapter Four
Multimedia Data Compression
Outline
• Introduction
• Lossless and lossy compression
• Run-Length Encoding
• Huffman Coding
• Dictionary-based coding (LZW)
• Adaptive Coding
Introduction: What is Multimedia Data Compression?
• The process of coding that effectively reduces the total number of bits needed to represent certain information.
• Refers to sending or storing a smaller number of bits.
Introduction: Why Multimedia Data Compression?
• Audio, image, and video require vast amounts of data:
– 320 x 240 x 8-bit grayscale image: 77 KB
– 1100 x 900 x 24-bit color image: 3 MB
– 640 x 480 x 24-bit x 30 frames/sec video: 27.6 MB/sec
• Low network bandwidth does not allow real-time video transmission.
• Slow storage or processing devices do not allow fast playback.
• Compression reduces storage requirements.
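These figures follow directly from width x height x bit depth (x frame rate); a quick check in Python, using decimal KB and MB:

# 320 x 240 pixels x 8 bits, converted to kilobytes
print(320 * 240 * 8 / 8 / 1000)         # 76.8   (about 77 KB)
# 1100 x 900 pixels x 24 bits, converted to megabytes
print(1100 * 900 * 24 / 8 / 1e6)        # 2.97   (about 3 MB)
# 640 x 480 pixels x 24 bits x 30 frames, in megabytes per second
print(640 * 480 * 24 * 30 / 8 / 1e6)    # 27.648 (about 27.6 MB/sec)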
Multimedia Data Compression Techniques
[Figure: overview of multimedia data compression techniques]
Lossless Techniques
• Nowadays, libraries, museums, film studios, and governments are converting more and more data and archives into digital form. Some of the data (e.g., precious books and paintings) indeed needs to be stored without any loss.
• As a start, suppose we want to encode the call numbers of the 120 million or so items in the Library (a mere 20 million, if we consider just books). Why don't we just transmit each item as a 27-bit number, giving each item a unique binary code (since 2^27 > 120,000,000)?
• The main problem is that this "great idea" requires too many bits. In fact, there exist many coding techniques that will effectively reduce the total number of bits needed to represent the above information. The process involved is generally referred to as compression.
Lossless Techniques
• In lossless data compression, the integrity of the data is preserved.
• The original data and the data after compression and decompression are exactly the same.
– In these methods, the compression and decompression algorithms are exact inverses of each other.
– No part of the data is lost in the process.
– Redundant data is removed in compression and added back during decompression.
– Lossless compression methods are normally used when we cannot afford to lose any data.
Lossless Techniques (1. Run-Length Encoding)
• Run-length encoding is probably the simplest method of compression.
• It can be used to compress data made of any combination of symbols.
• It does not need to know the frequency of occurrence of symbols, and it can be very efficient if the data is represented as 0s and 1s.
• The basic idea: if the information source we wish to compress has the property that symbols tend to form continuous groups, then instead of coding each symbol in the group individually, we can code one such symbol and the length of the group (see the sketch after this slide).
• The method can be even more efficient if the data uses only two symbols (for example, 0 and 1) in its bit pattern and one symbol is more frequent than the other.
• As an example, consider a bi-level image (one with only 1-bit black and white pixels) with monotone regions, like a fax image. This information source can be efficiently coded using run-length coding.
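A minimal sketch of this idea in Python (the function names are illustrative, not from the slides): each run of a repeated symbol is replaced by a (symbol, run length) pair.

def rle_encode(data):
    """Run-length encode a sequence into (symbol, count) pairs."""
    runs = []
    for symbol in data:
        if runs and runs[-1][0] == symbol:
            runs[-1][1] += 1              # extend the current run
        else:
            runs.append([symbol, 1])      # start a new run
    return [(symbol, count) for symbol, count in runs]

def rle_decode(runs):
    """Invert the encoding: repeat each symbol count times."""
    return "".join(symbol * count for symbol, count in runs)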
Lossless Techniques (1. Run-Length Encoding: example 1)
[Figure: run-length encoding example]
Lossless Techniques (1. Run-Length Encoding: example 2)
• Compress the following data using run-length encoding:
111234444888889
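Using the rle_encode sketch above, the runs are three 1s, one 2, one 3, four 4s, five 8s, and one 9:

>>> rle_encode("111234444888889")
[('1', 3), ('2', 1), ('3', 1), ('4', 4), ('8', 5), ('9', 1)]

So the fifteen input symbols compress to six (symbol, length) pairs.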
Lossless Techniques (2. Shannon-Fano Algorithm)
[Figure slides: the Shannon-Fano algorithm and a worked example]
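The body of these slides appears to have been figures; as a stand-in, here is a sketch of the standard Shannon-Fano procedure (all names are illustrative): sort the symbols by frequency, split the sorted list where the running total first reaches half the overall count, assign 0 to one half and 1 to the other, and recurse on each half.

def shannon_fano(freqs):
    """Build a Shannon-Fano code table from a {symbol: count} dict."""
    codes = {}

    def split(symbols, prefix):
        if len(symbols) == 1:
            codes[symbols[0]] = prefix or "0"
            return
        total = sum(freqs[s] for s in symbols)
        running, cut = 0, 1
        for i, s in enumerate(symbols[:-1]):
            running += freqs[s]
            if running >= total / 2:      # near-equal halves
                cut = i + 1
                break
        split(symbols[:cut], prefix + "0")   # higher-frequency half gets 0
        split(symbols[cut:], prefix + "1")   # lower-frequency half gets 1

    split(sorted(freqs, key=freqs.get, reverse=True), "")
    return codes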
Lossless Techniques (3. Huffman Coding)
• Assigns shorter codes to symbols that occur more frequently and longer codes to those that occur less frequently.
• A binary coding tree is used, in which the left branches are coded 0 and the right branches 1. A simple list data structure is also used.
• Algorithm (Huffman Coding), sketched in code after this slide:
1. Initialization: put all symbols on the list, sorted according to their frequency counts.
2. Repeat until the list has only one symbol left:
(a) From the list, pick the two symbols with the lowest frequency counts. Form a Huffman subtree that has these two symbols as child nodes and create a parent node for them.
(b) Assign the sum of the children's frequency counts to the parent and insert it into the list, such that the order is maintained.
(c) Delete the children from the list.
3. Assign a code word to each leaf based on the path from the root.
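A minimal Python sketch of this procedure (names are illustrative; a heap stands in for the sorted list, maintaining the same ordering invariant):

import heapq

def huffman_codes(freqs):
    """Build a Huffman code table from a {symbol: count} dict."""
    # Heap entries are (count, tiebreaker, tree), where a tree is either
    # a symbol (leaf) or a (left, right) pair (internal node).
    heap = [(count, i, sym) for i, (sym, count) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, t1 = heapq.heappop(heap)          # two lowest counts become
        n2, _, t2 = heapq.heappop(heap)          # children of a new parent
        heapq.heappush(heap, (n1 + n2, tie, (t1, t2)))
        tie += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")          # left branch coded 0
            walk(tree[1], prefix + "1")          # right branch coded 1
        else:
            codes[tree] = prefix or "0"          # path from root to leaf
    walk(heap[0][2], "")
    return codes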
Lossless Techniques (3. Huffman Coding)
• Huffman coding is a bottom-up approach.
• For example, imagine we have a text file that uses only five characters (A, B, C, D, E).
• Before we can assign bit patterns to each character, we assign each character a weight based on its frequency of use.
• In this example, assume that the frequency of the characters is as shown in the table below.
[Table: frequencies of the five characters]
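The frequency table itself did not survive in this copy; purely for illustration, here is the sketch above run on assumed counts (hypothetical values, not necessarily the slide's):

>>> huffman_codes({"A": 17, "B": 12, "C": 12, "D": 27, "E": 32})
{'A': '00', 'B': '010', 'C': '011', 'D': '10', 'E': '11'}

Note how the two least frequent characters (B and C) get codes of equal length differing only in the last bit, exactly as the optimality properties below state.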
Lossless Techniques (3. Huffman Coding: example 1)
Let us see how to encode text using the code for our five characters. Figure 15.6 shows the original and the encoded text.
[Figure: Huffman encoding]
Lossless Techniques (3. Huffman Coding: Decoding)
[Figure: Huffman decoding]
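The decoding figure is not reproduced here; the idea, sketched against the huffman_codes table above, is to accumulate bits until they match a codeword. Because the code is prefix-free, the first match is always the right one.

def huffman_decode(bits, codes):
    """Decode a bit string using a prefix-free code table."""
    inverse = {code: sym for sym, code in codes.items()}
    decoded, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in inverse:        # first match wins (prefix-free)
            decoded.append(inverse[buffer])
            buffer = ""
    return "".join(decoded)

For example, with the table above, huffman_decode("0001011", codes) yields "ABE".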
Properties of Huffman Coding
1. Unique prefix property: no Huffman code is a prefix of any other Huffman code, so the coded bitstream can be decoded unambiguously.
2. Optimality: minimum redundancy code – proved optimal for a given data model (i.e., a given probability distribution):
• The two least frequent symbols will have the same length for their Huffman codes, differing only in the last bit.
• Symbols that occur more frequently (higher probability) will have shorter Huffman codes than symbols that occur less frequently.
Dictionary-Based Coding: Lempel Ziv Encoding
Lempel Ziv Encoding: Compression
• In this phase there are two concurrent events: building an indexed dictionary and compressing a string of symbols.
• The algorithm extracts the smallest substring that cannot be found in the dictionary from the remaining uncompressed string.
• It then stores a copy of this substring in the dictionary as a new entry and assigns it an index value.
• Compression occurs when the substring, except for the last character, is replaced with the index found in the dictionary.
• The process then inserts the index and the last character of the substring into the compressed string (sketched in code after the figure below).
[Figure: an example of Lempel Ziv encoding]
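The bullets above describe the LZ78 variant of Lempel Ziv coding (a dictionary index plus an explicit last character per entry); a minimal sketch under that reading, with illustrative names:

def lz_encode(text):
    """LZ78-style encoding: emit (dictionary index, next character) pairs.
    Index 0 means 'empty prefix'; each emitted substring becomes a new entry."""
    dictionary, out, current = {}, [], ""
    for ch in text:
        if current + ch in dictionary:
            current += ch                         # keep growing the match
        else:
            out.append((dictionary.get(current, 0), ch))
            dictionary[current + ch] = len(dictionary) + 1
            current = ""
    if current:                                   # flush a trailing match
        out.append((dictionary[current[:-1]] if len(current) > 1 else 0,
                    current[-1]))
    return out

For instance, lz_encode("ABABBA") returns [(0, 'A'), (0, 'B'), (1, 'B'), (2, 'A')].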
Lempel Ziv Encoding: Decompression
[Figure: an example of Lempel Ziv decoding]
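The decoding slide is likewise a figure; the key idea, sketched to match lz_encode above, is that the decoder rebuilds exactly the same dictionary while it expands each (index, character) pair:

def lz_decode(pairs):
    """Invert lz_encode by rebuilding the dictionary on the fly."""
    dictionary = {0: ""}                      # index 0 is the empty prefix
    out = []
    for index, ch in pairs:
        entry = dictionary[index] + ch        # known prefix + explicit char
        out.append(entry)
        dictionary[len(dictionary)] = entry   # same entry the encoder added
    return "".join(out)

Round trip: lz_decode(lz_encode("ABABBA")) == "ABABBA".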
Adaptive Huffman Coding
Why Adaptive Huffman Coding?
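The bodies of these two slides did not survive in this copy; stated briefly, the standard motivation is that static Huffman coding needs the symbol statistics in advance (e.g., a first pass over the data) and must ship the code table to the decoder, whereas adaptive Huffman coding builds and updates the coding tree on the fly. Encoder and decoder apply the same updates as symbols arrive, so they stay synchronized without a pre-transmitted table.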