
Lossless Compression

Department of Software Engineering


Compression Types

Lossless and Lossy Compression


• Data compression techniques for multimedia fall into one of
two categories: lossless and lossy.
• Lossless compression, as the name suggests, produces a
compressed signal that yields the exact original signal
when decompressed.
• In lossy compression, the decompressed signal does not
correspond exactly to the original signal: lossy compression
introduces distortions between the original and the
decompressed signal.
Compression metrics
• To characterize compression, two metrics are commonly
used—compression rate and compression ratio. Both are
related, but the former relates to the transmission of data,
whereas the latter relates to storage of data.
Compression Rate
• Compression rate is an absolute term and is simply defined
as the rate of compressed data, which we imagine as being
transmitted in real time. It is expressed as bits/symbol—such
as bits/sample, bits/pixel, or even bits/second. It, thus, defines
the bit rate of the compressed signal.
Compression Ratio
• Compression ratio is a relative term and is defined as the ratio
of the size (or rate) of the original data to the size (or rate) of
the compressed data.
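The definition above can be captured in a one-line helper; this is a minimal sketch (the function name is ours, not from the text):

```python
def compression_ratio(original_size, compressed_size):
    """Ratio of the original data size (or rate) to the compressed size (or rate)."""
    return original_size / compressed_size

# A 22-byte message reduced to 10 bytes has a compression ratio of 2.2.
print(compression_ratio(22, 10))  # 2.2
```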
Rate Distortion
• Lossless compression normally assigns different codes to the
original samples or symbols by studying the statistical
distribution of those symbols in the signal. Thus, the bit rate
produced by the compressed signal is bound to vary
depending on how the signal’s symbols are distributed. This
produces a variable bit rate for the compressed signal.

• This might not pose a problem for some applications involving


offline compression, but variable rates are undesirable when
delivering or streaming real-time data. In such cases, lossy
techniques are used, where the compressed signal can be
distorted to achieve a constant bit rate and are useful for
streaming video and audio and for real-time delivery of media
across networks.
LOSSLESS Compression
• Lossless compression techniques achieve compression by
removing the redundancy in the signal.
• This is normally achieved by assigning new codes to the
symbols based on the frequency of occurrence of the symbols
in the message.
• More frequent symbols are assigned shorter codes and vice
versa. Sometimes the entire message to be coded is not
readily available and frequency of symbols cannot be
computed a priori. This happens when a source is “live,” for
example, in real-time communication systems. In such cases,
a probabilistic model is assumed and the source symbols are
assigned a model probability. The lossless compression
algorithms make use of the probabilistic models to compute
efficient codes for the symbols.
LOSSLESS Compression
Run Length Encoding
• Run length encoding is the simplest form of redundancy
removal. It removes redundancy by relying on the fact that a
string contains repeated sequences or “runs” of the same
symbol.
• The runs of the same symbol are encoded using two
entities—a count suggesting the number of repeated symbols
and the symbol itself.
• For instance, a run of five symbols aaaaa will be replaced by
the two symbols 5a.
LOSSLESS Compression
Run Length Encoding
• A sample string of symbols is as follows:

BBBBEEEEEEEECCCCDAAAAA

• It can be represented as 4B8E4C1D5A.


• The original 22-byte message (assuming a byte for each
symbol) has been reduced to 10 bytes, giving a compression
ratio of 2.2.
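The encoding above can be sketched in a few lines of Python using `itertools.groupby`; this is a minimal illustration, not a production codec (runs longer than 9 would need a delimiter or a fixed-width count field in a real byte-oriented format):

```python
from itertools import groupby

def rle_encode(message):
    """Replace each run of a repeated symbol with its count followed by the symbol."""
    return "".join(f"{len(list(group))}{symbol}"
                   for symbol, group in groupby(message))

original = "BBBBEEEEEEEECCCCDAAAAA"
encoded = rle_encode(original)
print(encoded)                       # 4B8E4C1D5A
print(len(original) / len(encoded))  # compression ratio: 2.2
```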
LOSSLESS Compression
Run Length Encoding
• Run length encoding performs best in the presence of
repetitions or redundancy of symbols in the information.
• This corresponds to a low entropy case and does result in
efficient compression of the data.
• However, if there is little repetition, or the apparent
randomness of the symbols is high, run length encoding gives
no compression and, depending on the implementation, can
even increase the size of the message.
LOSSLESS Compression
Run Length Encoding
• Run length encoding is used in a variety of tools involving
text, audio, images, and video.
• For example, when two people are having a telephone
conversation, there are gaps represented by zero values
when nobody is speaking.
• It is also not uncommon in images and video to have areas,
which have the same pixel distribution, such as in the
background.
LOSSLESS Compression
Repetition Suppression
• Repetition suppression works by reserving a specific flag
symbol for a commonly occurring pattern in a message. For
example, the long run of the letter a in

• Abdcbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

is replaced by the flag ¥:

Abdcb¥
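A minimal sketch of the idea, assuming a single reserved pattern and flag (the function name and the 44-symbol run length are our own choices; a real scheme would also have to escape any literal occurrence of the flag symbol):

```python
def suppress_repetition(message, pattern, flag):
    """Replace every occurrence of a reserved run pattern with a single flag symbol."""
    return message.replace(pattern, flag)

# Hypothetical setup: the flag '¥' stands for a long run of the letter 'a'.
run = "a" * 44
print(suppress_repetition("Abdcb" + run, run, "¥"))  # Abdcb¥
```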
LOSSLESS Compression
Pattern Substitution / LZW Compression
• The compression is based on a simple idea that attempts to
build a group of individual symbols into strings. The strings
are given a code, or an index, which is used every time the
string appears in the input message.
• So, the whole message consisting of symbols is broken down
into groups forming substrings, each substring having an
index. Its simplicity resides in the fact that no pre-analysis of
the message is required to create a dictionary of strings, but
rather the strings are created as the message is scanned.
• Compression is achieved when a single code (or index) is
used for a string of symbols rather than just the symbol.
LOSSLESS Compression
LZW Compression : PROCESS
• Initialize the dictionary to contain all initial symbols. The
vocabulary forms the initial dictionary entries. Every entry in
the dictionary is given an index.
• While scanning the message to compress, search for the
longest sequence of symbols that has appeared as an entry in
the dictionary. Call this entry E.
• Encode E in the message by its index in the dictionary.
• Add a new entry to the dictionary, which is E followed by the
next symbol in the scan.
• Repeat the process of using the dictionary indexes and
adding entries until you reach the end of the message.
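The steps above can be sketched directly in Python; the input string `xyyxyxy` and the two-symbol alphabet are our own small example, not the one in the figure:

```python
def lzw_encode(message, alphabet):
    """LZW: emit dictionary indexes while growing a dictionary of substrings."""
    # Step 1: initialize the dictionary with the initial symbol set.
    dictionary = {symbol: index for index, symbol in enumerate(alphabet)}
    current = ""   # longest sequence matched so far (entry E)
    output = []
    for symbol in message:
        candidate = current + symbol
        if candidate in dictionary:
            current = candidate                      # keep extending the match
        else:
            output.append(dictionary[current])       # encode E by its index
            dictionary[candidate] = len(dictionary)  # add E + next symbol
            current = symbol
    if current:
        output.append(dictionary[current])           # flush the final match
    return output

print(lzw_encode("xyyxyxy", ["x", "y"]))  # [0, 1, 1, 2, 2]
```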
LOSSLESS Compression
LZW Compression : Algorithm
LOSSLESS Compression
LZW Compression : Example
• The figure illustrates a running example. Here the dictionary is
first initialized with x and y (the initial symbol set) having
indexes 0 and 1. The string is shown at the top of the figure
and along with it the iterative output of indexes produced by
the algorithm. The coded output produced by the algorithm is
“0 0 1 1 3 6 3 4 7 5 8 0.”
LOSSLESS Compression
LZW Compression : Example
• (Figure: the final set of indexes and the dictionary code words)
LOSSLESS Compression
LZW Compression : Decompression
• Decompression works in the reverse manner. Given the
encoded sequence of indexes, the decoder rebuilds the same
dictionary on the fly and recovers the original message by
dictionary lookups, so the dictionary itself never needs to be
transmitted.
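A decoder sketch (the index sequence and two-symbol alphabet are our own small example); note the one special case where an index refers to the dictionary entry that is still being built:

```python
def lzw_decode(codes, alphabet):
    """Rebuild the LZW dictionary on the fly while looking up indexes."""
    dictionary = {index: symbol for index, symbol in enumerate(alphabet)}
    previous = dictionary[codes[0]]
    message = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        else:
            # Special case: the code refers to the entry being built right now.
            entry = previous + previous[0]
        message.append(entry)
        # Mirror the encoder: new entry = previous match + first symbol of this one.
        dictionary[len(dictionary)] = previous + entry[0]
        previous = entry
    return "".join(message)

print(lzw_decode([0, 1, 1, 2, 2], ["x", "y"]))  # xyyxyxy
```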
LOSSLESS Compression
Huffman Coding
• The goal, as mentioned earlier, is to assign shorter codes to
more frequent symbols and longer codes to less frequent
symbols.
• This variable-length representation is normally computed by
performing statistical analysis on the frequency of occurrence
of each symbol. Huffman coding provides a way to efficiently
assign new codes to each symbol, depending on the
frequency of occurrence of the symbol.
• The process starts by computing or statistically determining
the probability of occurrence of each symbol. The symbols are
then organized to form the leaves of a tree, and a binary tree
is built such that the most frequent symbol is closer to the root
and the least frequent symbol is farthest from the root.
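One common way to build the tree is with a min-heap of subtrees, repeatedly merging the two least frequent ones; this sketch returns the bit codes directly (exact bit patterns depend on tie-breaking, but the code lengths are optimal):

```python
import heapq
from collections import Counter

def huffman_codes(message):
    """Build a Huffman tree bottom-up and return {symbol: bit string}."""
    frequencies = Counter(message)
    # Heap entries: (frequency, tie-breaker, {symbol: code-so-far}).
    heap = [(freq, i, {symbol: ""})
            for i, (symbol, freq) in enumerate(frequencies.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        freq1, _, left = heapq.heappop(heap)    # two least frequent subtrees
        freq2, tie, right = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in left.items()}
        merged.update({s: "1" + code for s, code in right.items()})
        heapq.heappush(heap, (freq1 + freq2, tie, merged))
    return heap[0][2]

codes = huffman_codes("BBBBEEEEEEEECCCCDAAAAA")
# 'E' (8 occurrences) gets a shortest code; 'D' (1 occurrence) a longest one.
print(codes)
```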
LOSSLESS Compression
Huffman Coding
• (Figures: step-by-step construction of the Huffman tree)
LOSSLESS Compression
Arithmetic Coding
• In the previous techniques, every symbol was represented
individually by a code, or a group of symbols was represented
by one code in the case of a run or a dictionary word.
• Thus, a whole number of bits was required to encode each
symbol (or symbol group).
• Arithmetic coding overcomes this constraint by mapping an
entire message to a real number between zero and one. This
real number representing the entire message is coded as a
binary number.
• Arithmetic coding, thus, encodes a message entirely without
assigning a fixed binary code to each symbol and, thereby,
tends to produce better compression ratios.
LOSSLESS Compression
Arithmetic Coding
• Given an alphabet of n symbols, an infinite number of
messages is possible.
• Each message is mapped to a unique real number in the
interval [0,1).
• The interval contains infinitely many real numbers, so it is
possible to code any message uniquely as one number in the
interval.
• The interval is first set at [0,1) for the first symbol, and then
partitioned according to the symbol probabilities.
• Depending on the first symbol of the message, an interval is
chosen, which is then further partitioned depending on
probabilities.
LOSSLESS Compression
Arithmetic Coding
1) Divide the interval [0,1) into n segments corresponding to the
n symbols; the segment of each symbol has a length
proportional to its probability. Each segment i has a lower
bound L corresponding to the start of the segment and an
upper bound U corresponding to the end of the segment
(U − L = Pi).
2) Choose the segment that corresponds to the first symbol in
the message string. This is the new current interval with its
computed new upper and lower bounds.
3) Divide the new current interval again into n new segments
with length proportional to the symbols probabilities and
compute the new intervals accordingly
LOSSLESS Compression
Arithmetic Coding
4) From these new segments, choose the one corresponding to
the next symbol in the message.

5) Continue Steps 3 and 4 until the whole message is coded.

6) Represent the segment’s value by a binary fraction lying in the
final interval.
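Steps 1 through 6 can be sketched with ordinary floating point (the message and probabilities are our own example; practical coders use integer arithmetic with rescaling, since floats run out of precision after a few dozen symbols):

```python
def arithmetic_encode(message, probabilities):
    """Narrow [0, 1) once per symbol; any number in the final interval codes the message."""
    low, high = 0.0, 1.0
    for symbol in message:
        span = high - low
        cumulative = 0.0
        # Partition the current interval in proportion to the symbol probabilities.
        for s, p in probabilities.items():
            if s == symbol:
                high = low + span * (cumulative + p)  # end of chosen segment
                low = low + span * cumulative         # start of chosen segment
                break
            cumulative += p
    return low, high

low, high = arithmetic_encode("aab", {"a": 0.5, "b": 0.5})
print(low, high)  # interval [0.125, 0.25); the binary fraction 0.001 (= 0.125) codes "aab"
```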