3 MM Compression
The Need for Compression
• Take, for example, a video signal with a resolution of
320x240 pixels, 256 colors (8 bits per pixel), and 30 frames
per second
• Raw bit rate = 320 x 240 x 8 x 30
= 18,432,000 bits per second
= 2,304,000 bytes ≈ 2.3 MB per second
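The arithmetic is easy to check with a few lines of Python; this is just the calculation above, with illustrative variable names that are not taken from the slides:

# Raw (uncompressed) data rate of the example video signal above
width, height = 320, 240          # pixels per frame
bits_per_pixel = 8                # 256 colors
frames_per_second = 30

bits_per_second = width * height * bits_per_pixel * frames_per_second
print(bits_per_second)            # 18,432,000 bits per second
print(bits_per_second / 8 / 1e6)  # ~2.3 MB per second (decimal megabytes)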
Types of Compression
Lossless Compression vs. Lossy Compression
[Diagram: a message M is compressed to a smaller message m; after uncompressing, lossless compression recovers M exactly (M' = M), while lossy compression recovers only an approximation M' of M]
• Dictionary-based coding
–The previous algorithms (both Shannon-Fano entropy coding and
Huffman coding) require statistical knowledge of the source, which is
often not available (e.g., live audio, video).
–Dictionary-based coding, such as the Lempel-Ziv (LZ) compression
techniques, does not require prior information to compress strings.
•Rather, it replaces strings of symbols with pointers to dictionary entries
Common Compression Techniques
• Compression techniques are classified into static and
adaptive (dynamic) encodings.
1. Static coding requires two passes: one pass to compute
probabilities (or frequencies) and determine the mapping, and
a second pass to encode.
• Examples: Huffman coding, entropy encoding (Shannon-Fano)
2. Adaptive coding:
–It adapts to localized changes in the characteristics of the data and
does not require a first pass over the data to calculate a probability
model. All adaptive methods are one-pass methods; only one
scan of the message is required.
–The cost paid for these advantages is that the encoder and decoder
must be more complex to keep their states synchronized, and more
computational power is needed to keep adapting the
encoder/decoder state.
–Examples: Lempel-Ziv encoding
Compression model
• Almost all data compression methods involve the use of a
model, a prediction of the composition of the data.
–When the data matches the prediction made by the model, the
encoder can usually transmit the content of the data at a lower
information cost, by making reference to the model.
–In most methods the model is separate, and because both the
encoder and the decoder need to use the model, it must be
transmitted with the data.
• In adaptive coding, the encoder and decoder are instead
equipped with identical rules about how they will alter their
models in response to the actual content of the data
–both start with a blank slate, meaning that no initial model needs to
be transmitted.
–As the data is transmitted, both encoder and decoder adapt their
models, so that unless the character of the data changes radically,
the model becomes better-adapted to the data it's handling and
compresses it more efficiently.
Huffman coding
• Developed in the 1950s by David Huffman; widely used for text
compression, multimedia codecs, and message transmission
• The problem: given a set of n symbols and their weights (or
frequencies), construct a tree structure (a binary tree for a binary
code) with the objective of reducing memory space and decoding
time per symbol
• For instance, Huffman coding is constructed based on the frequency
of occurrence of letters in text documents
[Diagram: example Huffman code tree with left branches labeled 0,
right branches labeled 1, and leaves D1, D2, D3, D4]
Codes read from the tree:
D1 = 000
D2 = 001
D3 = 01
D4 = 1
Huffman coding
• The model could determine the raw probability of each
symbol occurring anywhere in the input stream:
p_i = (# of occurrences of S_i) / (total # of symbols)
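A few lines of Python are enough to build this model from a message (standard library only; the names below are illustrative):

from collections import Counter

message = "this is an example message"
counts = Counter(message)                                  # occurrences of each symbol S_i
probs = {s: n / len(message) for s, n in counts.items()}   # p_i for each symbol
print(probs)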
How to construct Huffman coding
• Step 1: Create a forest of single-node trees, one for each symbol t1, t2, ..., tn
• Step 2: Sort the forest of trees according to falling
probabilities of symbol occurrence
• Step 3: WHILE more than one tree exists DO
–Merge the two trees t1 and t2 with the least probabilities p1 and p2
–Label their new root with the sum p1 + p2
–Associate the binary code: 1 with the right branch and 0 with the left
branch
• Step 4: Create a unique codeword for each symbol by
traversing the tree from the root to the leaf,
–concatenating all 0s and 1s encountered during the traversal
• The resulting tree has a probability of 1 at its root and the symbols
at its leaf nodes (a minimal code sketch of this procedure follows below).
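A minimal sketch of the four steps in Python, assuming only the standard heapq and collections modules (the function and variable names are illustrative, not from the slides). Rather than building explicit tree nodes, each tree in the forest is represented by the map from its symbols to their partial codewords, which grow by one bit at every merge, so the same codewords fall out at the end:

import heapq
from collections import Counter

def build_huffman_codes(freqs):
    # freqs: dict mapping symbol -> frequency or probability
    # (assumes at least two distinct symbols).
    # Steps 1-2: a forest of single-symbol trees kept in a min-heap,
    # ordered by probability (the counter i breaks ties deterministically).
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    # Step 3: repeatedly merge the two least-probable trees.
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)
        p2, _, right = heapq.heappop(heap)
        # Prepend 0 for the left branch and 1 for the right branch.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    # Step 4: the surviving tree holds the codeword of every symbol.
    return heap[0][2]

text = "this is an example of huffman coding"
codes = build_huffman_codes(Counter(text))
encoded = "".join(codes[ch] for ch in text)
print(codes)
print(len(encoded), "bits vs", 8 * len(text), "bits uncompressed")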
Example
• Consider the 7-symbol alphabet given in the following table and
construct its Huffman code.
• The Huffman encoding algorithm picks, at each step, the two
symbols (or subtrees) with the smallest probabilities to combine.

Symbol   Probability
a        0.05
b        0.05
c        0.1
d        0.2
e        0.3
f        0.2
g        0.1
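As a quick check, the sketch from the construction slide can be run directly on these probabilities (using the illustrative build_huffman_codes function defined earlier). Ties between equal probabilities mean more than one code table is valid, but every valid table has the same expected codeword length:

probs = {"a": 0.05, "b": 0.05, "c": 0.1, "d": 0.2, "e": 0.3, "f": 0.2, "g": 0.1}
codes = build_huffman_codes(probs)
print(codes)                                             # one valid optimal code table
print(sum(p * len(codes[s]) for s, p in probs.items()))  # expected bits per symbol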
Huffman code tree
[Diagram: the Huffman code tree built from the 7-symbol example above]
Entropy encoding
According to Shannon, the entropy of an information source S is
defined as:
H(S) = Σ_i p_i log2(1/p_i)
where p_i is the probability that symbol S_i occurs in S, and
log2(1/p_i) is the number of bits needed to encode symbol S_i.
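This formula is easy to evaluate for a concrete message; a small Python sketch (standard library only, illustrative names):

import math
from collections import Counter

def entropy(message):
    # H(S) in bits per symbol, with p_i estimated from symbol counts
    counts = Counter(message)
    total = len(message)
    return sum((n / total) * math.log2(total / n) for n in counts.values())

print(entropy("aab"))   # (2/3)*log2(3/2) + (1/3)*log2(3) ≈ 0.918 bits/symbol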
Exercise
• Given the following symbols and their corresponding
frequency of occurrence, find an optimal binary code for
compression:
Character: a b c d e t
Frequency: 16 5 12 17 10 25
Lempel-Ziv compression
•The problem with Huffman coding is that it requires
knowledge about the data before encoding takes place.
–Huffman coding requires the frequencies of symbol occurrence
before codewords are assigned to symbols
•Lempel-Ziv compression:
–Does not rely on prior knowledge about the data
–Rather, it builds this knowledge in the course of data
transmission/data storage
–The Lempel-Ziv algorithm (called LZ) uses a table of code words
created during data transmission;
•each time, it replaces a string of characters with a reference to a
previous occurrence of the string (see the sketch below).
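The sketch below shows one simple member of the LZ family, an LZ78-style coder, in Python (standard library only; the function names are illustrative and this is not presented as the exact variant used by any particular tool). The dictionary starts empty and is built on the fly: every output token is a pair (index of the longest previously seen phrase, next character), which is exactly the pattern/prefix form described on the next slide:

def lz78_compress(data):
    # Returns a list of (dictionary index, next character) tokens.
    dictionary = {"": 0}      # phrase -> index; index 0 is the empty phrase
    tokens = []
    phrase = ""
    for ch in data:
        if phrase + ch in dictionary:
            phrase += ch      # keep extending the current phrase
        else:
            tokens.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)
            phrase = ""
    if phrase:                # flush a trailing phrase already in the dictionary
        tokens.append((dictionary[phrase[:-1]], phrase[-1]))
    return tokens

def lz78_decompress(tokens):
    phrases = [""]            # rebuild the same dictionary on the decoder side
    out = []
    for index, ch in tokens:
        phrase = phrases[index] + ch
        phrases.append(phrase)
        out.append(phrase)
    return "".join(out)

text = "she sells sea shells by the sea shore"
tokens = lz78_compress(text)
assert lz78_decompress(tokens) == text
print(tokens)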
Lempel-Ziv Compression Algorithm
• The multi-symbol patterns are of the form C0 C1 ... Cn-1 Cn.
The prefix of a pattern consists of all the pattern symbols
except the last: C0 C1 ... Cn-1
• Example strings:
1. Aaababbbaaabaaaaaaabaabb
2. ABBCBCABABCAABCAAB
3. SATATASACITASA.