Ch8c Data Compression
The quality of the coded, and later on decoded, data should be as good as possible, while still
making a cost-effective implementation possible.
The processing of the algorithm must not exceed a certain time span, i.e. each compression
technique has its own requirements, which differ from those of other techniques. One can
distinguish between the requirements of an application running in "dialogue" mode and one
running in "retrieval" mode.
Dialogue mode means an interaction among human users via multimedia information, while retrieval
mode means retrieval of multimedia information by a human user from a database, e.g. an HTML database
and HTML documents.
In dialogue mode applications, the following requirements, based on human perception,
must be considered:
End-to-end delay – should not exceed 150 ms for compression and decompression. A delay in the
range of 50 ms should be achieved to support face-to-face dialogue applications, e.g. video
conferencing. The overall end-to-end delay traditionally comprises any delay in the hardware, in the
involved communication protocol processing at the end systems, and in the data transfer from and
to the respective I/O devices.
In retrieval mode applications, the following demands apply:
Fast forward and backward data retrieval with simultaneous display should be possible. This
implies a fast search for information in multimedia databases.
Random access to single images and audio frames of a data stream should be possible, with an
access time of less than 0.5 s.
Decompression of images, video or audio should be possible without a link to other data units.
This allows random access and editing.
For both dialogue and retrieval mode, the following requirements apply:
To support scalable video in different systems, it is necessary to define a format that is independent
of frame size and video frame rate.
It must be possible to synchronize audio and video data, as well as with other media.
To make an economical solution possible, coding should be realized using software (for a cheap,
lower-quality solution) or VLSI hardware (for a high-quality solution).
It should be possible to generate data on one multimedia system and reproduce this data on another
system; the compression techniques should be compatible.
Processes the results of the previous steps and specifies the granularity of the mapping of real
numbers into integers. It results in a reduction of precision.
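The mapping of real numbers into integers can be sketched as a uniform quantizer (the step size here is an illustrative choice, not taken from any particular standard); a coarser step gives fewer levels and a larger loss of precision:

```python
def quantize(x: float, step: float) -> int:
    """Map a real value to an integer level index; precision is lost."""
    return round(x / step)

def dequantize(q: int, step: float) -> float:
    """Map a level index back to a representative real value."""
    return q * step

samples = [0.12, 0.49, 0.51, 0.87]
levels = [quantize(s, 0.25) for s in samples]
print(levels)                                  # [0, 2, 2, 3]
print([dequantize(q, 0.25) for q in levels])   # [0.0, 0.5, 0.5, 0.75]
```

Note that 0.49 and 0.51 map to the same level: the difference between them is below the chosen granularity and cannot be recovered.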
d. entropy encoding
Usually the last step (but not always).
It compresses a sequential digital data stream without data loss; e.g. a sequence of zeros in a data
stream can be compressed by specifying the number of occurrences followed by the zero itself.
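The run-length idea described above can be sketched as follows (a minimal illustration of the principle, not any specific standard's scheme): each run of identical bytes is replaced by a count followed by the byte itself.

```python
def rle_encode(data: bytes) -> list[tuple[int, int]]:
    """Encode a byte string as (count, byte value) pairs."""
    encoded = []
    i = 0
    while i < len(data):
        run = 1
        # Extend the run while the next byte repeats the current one.
        while i + run < len(data) and data[i + run] == data[i]:
            run += 1
        encoded.append((run, data[i]))
        i += run
    return encoded

def rle_decode(pairs: list[tuple[int, int]]) -> bytes:
    """Reverse the encoding losslessly."""
    return b"".join(bytes([b]) * n for n, b in pairs)

# A run of four zeros collapses to a single (count, value) pair.
print(rle_encode(b"\x00\x00\x00\x00\x07"))  # [(4, 0), (1, 7)]
```

Because decoding exactly reverses encoding, no information is lost, which is what makes this an entropy (lossless) technique.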
Uncompressed picture → picture preparation → picture processing → quantization → entropy encoding → compressed picture
DIATOMIC ENCODING
A variation of run-length encoding based on combinations of two data bytes.
This technique determines the most frequently occurring pairs of bytes.
According to an analysis of the English language, the most frequently occurring pairs
(blanks are included in the pairs) are:
"e ", " t", "th", " a", "s ", "re", "in", "he"
Replacing these pairs with special single bytes that do not occur anywhere else in the text leads to a
data reduction of more than 10%.
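A minimal sketch of the diatomic idea (the pair table and the replacement byte values below are illustrative assumptions, not a standard mapping): each frequent two-byte pair is replaced by one single byte that does not occur in the text.

```python
# Hypothetical table: frequent English pairs mapped to bytes unused in plain text.
PAIRS = {b"th": b"\x82", b"e ": b"\x80", b" t": b"\x81", b"he": b"\x83"}

def diatomic_encode(text: bytes) -> bytes:
    """Replace each known two-byte pair with its single replacement byte."""
    out = bytearray()
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in PAIRS:
            out += PAIRS[pair]
            i += 2          # consumed two bytes, emitted one
        else:
            out.append(text[i])
            i += 1
    return bytes(out)

msg = b"the theme"
enc = diatomic_encode(msg)
print(len(msg), len(enc))  # 9 6
```

Here "th" and "e " each shrink from two bytes to one, reducing nine bytes to six for this (favorable) input.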
HUFFMAN TECHNIQUE
Huffman coding is an entropy encoding algorithm used for lossless data compression.
It determines the optimal code, using the minimum number of bits, given the characters that must be
encoded together with the probabilities of their occurrence. Thus the lengths (number of bits) of the
coded characters differ within a text; the shortest code is assigned to those characters that occur most
frequently.
To determine the Huffman code, it is useful to construct a binary tree whose leaves represent the
characters that are to be encoded. Every node is labeled with the occurrence probability of the
characters belonging to its subtree. Zeros and ones are assigned to the branches (edges) of the tree. E.g.:
1) A, B, C, D and E have the following probabilities of occurrence:
P(A) = 0.16
P(B) = 0.51
P(C) = 0.09
P(D) = 0.13
P(E) = 0.11
To construct the binary tree, we first combine the two smallest probabilities; in the example above,
these are C and E:
P(C) = 0.09 and P(E) = 0.11
P(C,E) = 0.20
Next, we combine the two smallest remaining probabilities, D and A:
P(D,A) = 0.29
Then we combine the two subtrees above:
P(C,E,D,A) = 0.49
P(C,E) = 0.20
P(D,A) = 0.29
At last we combine the whole tree:
P(B,C,E,D,A) = 1
P(B) = 0.51
P(C,E,D,A) = 0.49
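The construction above can be put into code (a minimal sketch using Python's heapq; tie-breaking may differ from a hand-drawn tree, but the code lengths for this distribution are fixed): repeatedly merge the two least probable nodes, prefixing 0 to one subtree's codes and 1 to the other's.

```python
import heapq

def huffman_codes(probs: dict[str, float]) -> dict[str, str]:
    """Build a Huffman code by repeatedly merging the two least probable nodes."""
    # Heap entries: (probability, unique tie-breaker, {symbol: code-so-far}).
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)  # least probable subtree
        p2, _, c2 = heapq.heappop(heap)  # second least probable subtree
        # Assign 0 to one branch and 1 to the other, extending existing codes.
        merged = {s: "0" + c for s, c in c1.items()}
        merged |= {s: "1" + c for s, c in c2.items()}
        heapq.heappush(heap, (p1 + p2, count, merged))
        count += 1
    return heap[0][2]

codes = huffman_codes({"A": 0.16, "B": 0.51, "C": 0.09, "D": 0.13, "E": 0.11})
# B is most frequent, so it gets the shortest code (1 bit);
# the other four characters each get 3 bits for this distribution.
print({s: len(c) for s, c in sorted(codes.items())})
# {'A': 3, 'B': 1, 'C': 3, 'D': 3, 'E': 3}
```

The merge order matches the worked example: C+E (0.20), then D+A (0.29), then the two subtrees (0.49), and finally B (0.51) joins at the root, which is why B's code is a single bit.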