Application of Compression
Mohammad Hosseini
Network Systems Lab
School of Computing Science
Simon Fraser University, BC, Canada
Email: mohammad [email protected]
Abstract—Today, with the growing demands of information storage and data transfer, data compression is becoming increasingly important. Data compression is a technique used to decrease the size of data. This is very useful when huge files have to be transferred over networks or stored on a data storage device and their size exceeds the capacity of the storage, or when transmission would consume too much bandwidth. With the advent of the Internet and of mobile devices with limited resources, data compression has gained even more importance. It can be effectively used to save both storage and bandwidth, and thus to decrease download duration. Data compression can be achieved by a host of techniques. In this survey, I thoroughly discuss some important data compression algorithms, their performance evaluation, and their major applications, along with today's issues and recent research approaches.

I. INTRODUCTION

Data compression is one of the enabling technologies for multimedia applications. It would not be practical to put images, audio and video on websites if we did not use data compression algorithms. Mobile phones would not be able to provide communication clearly without data compression. With data compression techniques, we can reduce the consumption of resources such as hard disk space or transmission bandwidth. In this survey, we first introduce the concepts of lossy and lossless data compression and thoroughly discuss some of their major algorithms. Then we pick two of the most widely used compression algorithms, implement them, and compare their compression ratios as a performance factor. In the third part, we discuss some of the most significant applications of data compression algorithms in multimedia compression, namely the JPEG and MPEG coding algorithms. Finally, in the last section, we enumerate some recent issues and approaches regarding data compression, such as energy consumption.

II. DATA COMPRESSION ALGORITHMS: LOSSY AND LOSSLESS COMPRESSION

Basically, there are two types of data compression algorithms: lossless algorithms, which can reconstruct the original message exactly from the compressed message, and lossy algorithms, which can only reconstruct an approximation of the original message. Lossless algorithms are typically used for text. Lossy compression algorithms, on the other hand, remove unnecessary data permanently, so the original data cannot be completely regenerated. Lossy compression is used for image, sound and video data, since it can yield a significant reduction in file size with no significant reduction in quality.

III. LOSSLESS DATA COMPRESSION

A. Huffman Algorithm

The Huffman coding algorithm was first developed by David Huffman as part of a class assignment! The class was the first ever in the area of information theory and was taught by Robert Fano at MIT. The codes generated by this algorithm are called Huffman codes. These codes are prefix codes and are optimum for a given model (set of probabilities). The Huffman procedure is based on two observations regarding optimum prefix codes:
1) In an optimum code, symbols that occur more frequently (have a higher probability of occurrence) will have shorter codewords than symbols that occur less frequently.
2) In an optimum code, the two symbols that occur least frequently will have the same length.
It is easy to see that the first observation is correct. If symbols that occur more often had codewords that were longer than the codewords for symbols with lower probability, then swapping the two codewords would reduce the average codeword length, so such a code could not be optimal.
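To make the procedure concrete, here is a minimal sketch (my own helper, not code from the survey) that repeatedly merges the two least frequent subtrees, which is exactly the step the second observation justifies, and then reads codewords off the resulting tree:

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a Huffman code table from the symbol frequencies in `text`."""
    freq = Counter(text)
    # Each heap entry is (frequency, tie-breaker, tree); a tree is either
    # a symbol (leaf) or a [left, right] pair (internal node).
    heap = [(f, i, sym) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    if count == 1:  # degenerate case: only one distinct symbol
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)  # the two least frequent subtrees...
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, count, [t1, t2]))  # ...are merged
        count += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, list):
            walk(tree[0], prefix + "0")  # left edge emits a 0 bit
            walk(tree[1], prefix + "1")  # right edge emits a 1 bit
        else:
            codes[tree] = prefix
    walk(heap[0][2], "")
    return codes
```

On the input "aaabbc", for example, the most frequent symbol receives a one-bit codeword while the two least frequent symbols receive codewords of equal (two-bit) length, matching both observations above.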
Fig. 1. Probabilities of alphabets in the example word

Sequence   Code
B          000
C          001
AB         010
AC         011
AAA        100
AAB        101
AAC        110
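The table above assigns fixed-length (3-bit) codewords to variable-length source sequences, the structure of a Tunstall-style code. A minimal sketch of encoding and decoding with this particular table (the helper names are mine, not from the survey):

```python
# Codebook transcribed from the table above: variable-length source
# sequences map to fixed-length 3-bit codewords.
CODEBOOK = {
    "B": "000", "C": "001", "AB": "010", "AC": "011",
    "AAA": "100", "AAB": "101", "AAC": "110",
}

def encode(message):
    """Parse `message` into codebook sequences, longest match first."""
    out, i = [], 0
    while i < len(message):
        for length in (3, 2, 1):  # longest sequence in this table is 3 symbols
            seq = message[i:i + length]
            if seq in CODEBOOK:
                out.append(CODEBOOK[seq])
                i += length
                break
        else:
            # e.g. a trailing run of A's shorter than 3 has no codeword
            raise ValueError(f"no codeword covers position {i}")
    return "".join(out)

def decode(bits):
    """Invert the mapping: every 3 bits name one source sequence."""
    reverse = {code: seq for seq, code in CODEBOOK.items()}
    return "".join(reverse[bits[i:i + 3]] for i in range(0, len(bits), 3))
```

Because every codeword has the same length, the decoder needs no lookahead; the compression comes from long, probable source sequences (runs of A's here) being covered by a single 3-bit codeword.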
IV. LOSSY DATA COMPRESSION

Lossy compression is compression in which some of the information from the original message sequence is lost. This means the original sequence cannot be regenerated from the compressed sequence. Just because information is lost does not mean the quality of the output is reduced. For example, random noise has very high information content, but when present in an image or a sound file, we would typically be perfectly happy to drop it. Also, certain losses in images or sound might be completely undetectable to a human viewer (e.g., the loss of very high frequencies). For this reason, lossy compression algorithms on images can often achieve a compression ratio better by a factor of 2 than lossless algorithms, with an undetectable loss in quality. However, when quality does start degrading in an observable way, it is important to make sure it degrades in a way that is least objectionable to the viewer (e.g., dropping random pixels is probably more objectionable than dropping some color information). For these reasons, the ways most lossy compression techniques are used are highly dependent on the media being compressed. Lossy compression for sound, for example, is very different from lossy compression for images.
In this section, I will go over two significant techniques which can be applied in various contexts, and in the next sections we will see some of their major applications in the real world.

A. Vector and Scalar Quantization

Quantization is the procedure of constraining something from a relatively large or continuous set of values (such as the real numbers) to a relatively small discrete set (such as the integers).
Scalar quantization is one of the simplest and most general ideas for lossy compression. It is a mapping of an input value x into a finite number of output values y. Many of the fundamental ideas of quantization and compression are easily introduced in the simple context of scalar quantization. Fig. 5 shows examples of uniform and non-uniform quantizations. [2]
Vector quantization is a different type of quantization, which is typically implemented by selecting a set of representatives from the input space and then mapping all other points in the space to the closest representative. The representatives could be fixed for all time and part of the compression protocol, or they could be determined for each file (message sequence) and sent as part of the sequence. The most interesting aspect of vector quantization is how one selects the representatives. Typically it is implemented using a clustering algorithm that finds some number of clusters of points in the data. A representative is then chosen for each cluster, either by selecting one of the points in the cluster or by using some form of centroid for the cluster. Finding good clusters is an interesting topic on its own. [9]
Vector quantization is most effective when the variables along the dimensions of the space are correlated. Fig. 6 shows an example of possible representatives for a height-weight chart. There is clearly a strong correlation between people's height and weight, and therefore the representatives can be concentrated in areas of the space that make physical sense, with higher densities in more common regions. Using such representatives is much more effective than separately using scalar quantization on the height and weight. [4]
Vector quantization, as well as scalar quantization, can be used as part of a lossless compression technique. In particular, if in addition to sending the closest representative the coder sends the distance from the point to the representative, then the original point can be reconstructed. The distance is often referred to as the residual. In general this alone would not yield compression, but if the points are tightly clustered around the representatives, then the residuals are small and can themselves be encoded compactly.
Fig. 6. Example of vector quantization.
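As a concrete sketch of both ideas (the helper names are mine, not from the survey): a uniform scalar quantizer snaps each real input to the midpoint of its cell, while a vector quantizer maps a point to the index of its nearest representative, which here is assumed to come from some prior clustering step such as k-means:

```python
import math

def uniform_quantize(x, step):
    """Uniform scalar quantization: snap x to the midpoint of the
    width-`step` cell containing it (lossy: the offset within the
    cell is discarded)."""
    index = math.floor(x / step)   # which cell x falls into
    return (index + 0.5) * step    # reconstruction level (cell midpoint)

def vq_encode(point, representatives):
    """Vector quantization: return the index of the representative
    closest to `point` under squared Euclidean distance."""
    def dist2(p, r):
        return sum((a - b) ** 2 for a, b in zip(p, r))
    return min(range(len(representatives)),
               key=lambda i: dist2(point, representatives[i]))
```

For the height-weight example, a point such as (172, 68) would be mapped to whichever representative lies nearest along both correlated dimensions at once. Sending the residual (point minus representative) alongside the index makes the scheme lossless, as noted above.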
16 11 10 16 24 40 51 61
12 12 14 19 26 58 60 55
14 13 16 24 40 57 69 56
14 17 22 29 51 87 80 62
18 22 37 56 68 109 103 77
24 35 55 64 81 104 113 92
49 64 78 87 103 121 120 101
72 92 95 98 112 100 103 99
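The 8x8 matrix above matches the example luminance quantization table given in the JPEG standard. A minimal sketch of how such a table is used (the function names are mine, not from the text): each 8x8 block of DCT coefficients is divided elementwise by the table and rounded, which discards fine detail most aggressively in the high-frequency (lower-right) positions, where the divisors are largest:

```python
# The 8x8 table shown above (JPEG's example luminance quantization table).
Q = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

def quantize_block(dct_block, q=Q):
    """Quantize an 8x8 block of DCT coefficients: divide each coefficient
    by the matching table entry and round to the nearest integer."""
    return [[round(c / q[i][j]) for j, c in enumerate(row)]
            for i, row in enumerate(dct_block)]

def dequantize_block(qblock, q=Q):
    """Approximate reconstruction: multiply back by the table entries.
    The rounding error is the information permanently lost."""
    return [[c * q[i][j] for j, c in enumerate(row)]
            for i, row in enumerate(qblock)]
```

After quantization, most high-frequency entries become zero, and it is those long runs of zeros that the subsequent entropy-coding stage compresses so effectively.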