• Example: the string BBBBEEEEEEEECCCCDAAAAA contains runs of repeated symbols; run-length coding encodes it as 4B8E4C1D5A.
• Similarly, a specific pattern can be substituted by a special symbol: Abdcbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa is replaced by Abdcb¥, where ¥ stands for the run of a's. A sketch of this substitution follows.
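A minimal Python sketch of this kind of substitution (the helper name and the minimum run length are illustrative assumptions, not from the slides):

    import re

    def substitute_runs(text, symbol="¥", min_run=4):
        """Replace each run of `min_run` or more consecutive a's with `symbol`."""
        return re.sub("a{%d,}" % min_run, symbol, text)

    # substitute_runs("Abdcb" + "a" * 44)  ->  "Abdcb¥"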
LOSSLESS Compression
Pattern Substitution / LZW Compression
• The compression is based on a simple idea: build individual symbols up into strings. Each string is given a code, or index, which is used every time the string appears in the input message.
• So, the whole message consisting of symbols is broken down
into groups forming substrings, each substring having an
index. Its simplicity resides in the fact that no pre-analysis of
the message is required to create a dictionary of strings, but
rather the strings are created as the message is scanned.
• Compression is achieved when a single code (or index) is used for a whole string of symbols rather than for each symbol individually.
LOSSLESS Compression
LZW Compression : PROCESS
• Initialize the dictionary to contain all initial symbols. The
vocabulary forms the initial dictionary entries. Every entry in
the dictionary is given an index.
• While scanning the message to compress, search for the
longest sequence of symbols that has appeared as an entry in
the dictionary. Call this entry E.
• Encode E in the message by its index in the dictionary.
• Add a new entry to the dictionary, which is E followed by the
next symbol in the scan.
• Repeat the process of emitting dictionary indexes and adding new entries until the end of the message is reached.
LOSSLESS Compression
LZW Compression : Algorithm
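A minimal Python sketch of the encoding loop described in the process above (function and variable names are illustrative):

    def lzw_encode(message, alphabet):
        """LZW-encode `message`; the dictionary starts with `alphabet`."""
        # The vocabulary forms the initial dictionary entries.
        dictionary = {symbol: index for index, symbol in enumerate(alphabet)}
        output = []
        entry = message[0]                 # longest match found so far (E)
        for symbol in message[1:]:
            if entry + symbol in dictionary:
                entry += symbol            # extend the current match
            else:
                output.append(dictionary[entry])              # encode E by its index
                dictionary[entry + symbol] = len(dictionary)  # add E + next symbol
                entry = symbol             # restart the scan at this symbol
        output.append(dictionary[entry])   # encode the final match
        return output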
LOSSLESS Compression
LZW Compression : Example
• The figure illustrates a running example. Here the dictionary is first initialized with x and y (the initial symbol set), having indexes 0 and 1. The input string is shown at the top of the figure along with the iterative output of indexes produced by the algorithm. The coded output produced by the algorithm is “0 0 1 1 3 6 3 4 7 5 8 0.”
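Because LZW decoding is unique, the input string can be recovered from the index sequence and the initial dictionary {x: 0, y: 1}: it is xxyyxyxyxxyyyxyxxyxxyyx. Re-encoding it with the sketch above reproduces the coded output:

    lzw_encode("xxyyxyxyxxyyyxyxxyxxyyx", "xy")
    # -> [0, 0, 1, 1, 3, 6, 3, 4, 7, 5, 8, 0]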
LOSSLESS Compression
LZW Compression : Example
• The figure shows the final set of indexes and the dictionary code words.
LOSSLESS Compression
LZW Compression : Decompression
• Decompression works in the reverse manner. Given the encoded sequence and the initial dictionary, the original message is obtained by dictionary lookups, while the entries added during encoding are rebuilt along the way (see the sketch below).
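A matching Python sketch of the decoder, assuming the encoder above. The one subtlety, not spelled out on the slide, is an index referring to the entry created on the immediately preceding encoding step; that entry must be reconstructed as the previous entry plus its own first symbol:

    def lzw_decode(codes, alphabet):
        """Rebuild the message from LZW indexes and the initial alphabet."""
        dictionary = {index: symbol for index, symbol in enumerate(alphabet)}
        previous = dictionary[codes[0]]
        message = [previous]
        for code in codes[1:]:
            if code in dictionary:
                entry = dictionary[code]
            else:                          # index created on the previous step
                entry = previous + previous[0]
            message.append(entry)
            # Mirror the encoder: previous entry + first symbol of current entry.
            dictionary[len(dictionary)] = previous + entry[0]
            previous = entry
        return "".join(message)

    lzw_decode([0, 0, 1, 1, 3, 6, 3, 4, 7, 5, 8, 0], "xy")
    # -> "xxyyxyxyxxyyyxyxxyxxyyx"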
LOSSLESS Compression
Huffman Coding
• The goal, as mentioned earlier, is to give more frequent symbols shorter codes and less frequent symbols longer codes.
• This variable-length representation is normally computed by performing statistical analysis on the frequency of occurrence of each symbol. Huffman coding provides a way to efficiently assign a code to each symbol according to its frequency of occurrence.
• The process starts by computing or statistically determining
the probability of occurrence of each symbol. The symbols are
then organized to form the leaves of a tree, and a binary tree
is built such that the most frequent symbol is closer to the root
and the least frequent symbol is farthest from the root.
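A minimal Python sketch of this construction, assuming the frequencies are counted from the message itself. Repeatedly merging the two least frequent subtrees builds the tree bottom-up, so the rarest symbols end up farthest from the root:

    import heapq
    from collections import Counter

    def huffman_codes(message):
        """Build a Huffman code table from symbol frequencies in `message`."""
        frequencies = Counter(message)
        # Each leaf is (frequency, tie-breaker, {symbol: code-so-far}).
        heap = [(freq, i, {sym: ""})
                for i, (sym, freq) in enumerate(frequencies.items())]
        heapq.heapify(heap)
        count = len(heap)
        while len(heap) > 1:
            # Merge the two least frequent subtrees under a new parent node,
            # prefixing 0 for the left branch and 1 for the right branch.
            f1, _, left = heapq.heappop(heap)
            f2, _, right = heapq.heappop(heap)
            merged = {s: "0" + c for s, c in left.items()}
            merged.update({s: "1" + c for s, c in right.items()})
            heapq.heappush(heap, (f1 + f2, count, merged))
            count += 1
        return heap[0][2]

For "aaaabbc" this yields a -> 1, b -> 01, c -> 00: the most frequent symbol receives the shortest code.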
LOSSLESS Compression
Arithmetic Coding
• In the previous techniques, every symbol was represented individually by a code, or a group of symbols was represented by one code in the case of a run or a dictionary word.
• Thus, a whole number of bits was required to encode each symbol (or symbol group).
• Arithmetic coding overcomes this constraint by mapping an
entire message to a real number between zero and one. This
real number representing the entire message is coded as a
binary number.
• Arithmetic coding, thus, encodes a message entirely without
assigning a fixed binary code to each symbol and, thereby,
tends to produce better compression ratios.
LOSSLESS Compression
Arithmetic Coding
• Given an alphabet of n symbols, there are an infinite number
of messages that are possible.
• Each message is mapped to a unique real number in the
interval [0,1).
• The interval contains uncountably many real numbers, so any message can be coded uniquely as one number in the interval.
• The interval is first set at [0,1) for the first symbol, and then
partitioned according to the symbol probabilities.
• Depending on the first symbol of the message, an interval is chosen, which is then further partitioned according to the symbol probabilities (a small worked example follows this list).
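As a small worked illustration (the alphabet and probabilities here are assumptions, not from the slides): with two equiprobable symbols x and y (Px = Py = 0.5), the message "xy" first selects x's segment [0, 0.5); partitioning that segment again gives [0, 0.25) for x and [0.25, 0.5) for y, so choosing y leaves the final interval [0.25, 0.5). Any number in it, e.g. 0.25 = 0.01 in binary, identifies the message once its length is known.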
LOSSLESS Compression
Arithmetic Coding
1) Divide the interval [0,1) into n segments corresponding to the n symbols; the segment of each symbol has a length proportional to its probability. Each segment i has a lower bound Li and an upper bound Ui corresponding to the start and the end of the segment (Ui − Li = Pi).
2) Choose the segment that corresponds to the first symbol in
the message string. This is the new current interval with its
computed new upper and lower bounds.
3) Divide the new current interval again into n new segments with lengths proportional to the symbols' probabilities, and compute the new bounds accordingly.
LOSSLESS Compression
Arithmetic Coding
4) From these new segments, choose the one corresponding to
the next symbol in the message.
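A minimal Python sketch of steps 1) to 4), iterated over a whole message (floating point is used here for clarity; practical arithmetic coders use scaled integer arithmetic to avoid precision problems):

    def arithmetic_encode(message, probabilities):
        """Narrow [0, 1); any number in the returned interval identifies
        `message`, provided the decoder also knows the message length."""
        low, high = 0.0, 1.0
        for symbol in message:
            span = high - low
            # Partition the current interval in proportion to the symbol
            # probabilities and select the segment of the current symbol.
            for candidate, p in probabilities.items():
                if candidate == symbol:
                    high = low + span * p
                    break
                low += span * p
        return low, high

    arithmetic_encode("xy", {"x": 0.5, "y": 0.5})
    # -> (0.25, 0.5), matching the worked example above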