Course Code & Name: CEC366 Image Processing
Unit 5
Introduction
In recent years, there have been significant advancements in algorithms and
architectures for the processing of image, video, and audio signals. These
advancements have proceeded along several directions. On the algorithmic front, new
techniques have led to the development of robust methods to reduce the size of the
image, video, or audio data. Such methods are vital in many applications that
manipulate and store digital data. Informally, we refer to the process of size reduction as
a compression process. We will define this process in a more formal way later.
On the architecture front, it is now feasible to put sophisticated compression
processes on a relatively low-cost single chip; this has spurred a great deal of activity in
developing multimedia systems for the large consumer market. One of the exciting
prospects of such advancements is that multimedia information comprising image,
video, and audio has the potential to become just another data type. This usually
implies that multimedia information will be digitally encoded so that it can be
manipulated, stored, and transmitted along with other digital data types. For such data
usage to be pervasive, it is essential that the data encoding is standard across different
platforms and applications. This will foster widespread development of applications and
will also promote interoperability among systems from different vendors. Furthermore,
standardization can lead to the development of cost effective implementations, which in
turn will promote the widespread use of multimedia information. This is the primary
motivation behind the emergence of image and video compression standards.
Background
Compression is a process intended to yield a compact digital representation of a
signal. In the literature, the terms source coding, data compression, bandwidth
compression, and signal compression are all used to refer to the process of
compression. In the cases where the signal is defined as an image, a video stream, or
an audio signal, the generic problem of compression is to minimize the bit rate of their
digital representation. There are many applications that benefit when image, video, and
audio signals are available in compressed form. Without compression, most of these
applications would not be feasible!
Image, video, and audio signals are amenable to compression due to the factors
below.
• There is considerable statistical redundancy in the signal.
1. Within a single image or a single video frame, there exists significant correlation
among neighboring samples. This correlation is referred to as spatial correlation.
2. For data acquired from multiple sensors (such as satellite images), there exists
significant correlation amongst samples from these sensors. This correlation is referred
to as spectral correlation.
3. For temporal data (such as video), there is significant correlation amongst samples in
different segments of time. This is referred to as temporal correlation.
• There is considerable information in the signal that is irrelevant from a
perceptual point of view.
• Some data tends to have high-level features that are redundant across space
and time; that is, the data is of a fractal nature.
For a given application, compression schemes may exploit any one or all of the
above factors to achieve the desired compression data rate.
There are many applications that benefit from data compression technology.
Table 1.1 lists a representative set of such applications for image, video, and audio
data, as well as typical data rates of the corresponding compressed bit streams. Typical
data rates for the uncompressed bit streams are also shown.
The notion of "size" here is somewhat ambiguous and depends on the data type and the
specific compression method that is employed. For a still image, size could refer to the
bits needed to represent the entire image. For video, size could refer to the bits needed
to represent one frame of video. Many compression methods for video do not process
each frame independently; hence, a more commonly used notion of size is the number of
bits needed to represent one second of video.
Coding Efficiency
This is usually measured in bits per sample or bits per second (bps). Coding
efficiency is usually limited by the information content or entropy of the source. In
intuitive terms, the entropy of a source X provides a measure for the "randomness" of X.
From a compression theory point of view, sources with large entropy are more difficult
to compress (for example, random noise is very hard to compress).
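The entropy bound mentioned above can be sketched numerically. The two distributions below are illustrative: a uniform 8-symbol source needs the full 3 bits per sample, while a skewed source can in principle be coded with fewer.

```python
import numpy as np

# First-order entropy H(X) = -sum p(x) * log2 p(x), in bits per sample.
def entropy(probs):
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]                       # zero-probability symbols contribute nothing
    return float(-np.sum(p * np.log2(p)))

print(entropy([0.125] * 8))                # 3.0  -> uniform source, hardest to compress
print(entropy([0.5, 0.25, 0.125, 0.125])) # 1.75 -> skewed source, easier to compress
```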
Coding Complexity
Coding Delay
A complex compression process often leads to increased coding delays at the
encoder and the decoder. Coding delays can be alleviated by increasing the processing
power of the computational engine; however, this may be impractical in environments
where there is a power constraint or when the underlying computational engine cannot
be improved. Furthermore, in many applications, coding delays have to be constrained;
for example, in interactive communications.
Lossy compression
The majority of the applications in image or video data processing do not require
that the reconstructed data and the original data are identical in value. Thus, some
amount of loss is permitted in the reconstructed data. A compression process that
results in an imperfect reconstruction is referred to as a lossy compression process.
This compression process is irreversible. In practice, most irreversible compression
processes rapidly degrade the signal quality when they are repeatedly applied to the
same data.
The noise signal energy is defined as the energy measured for a hypothetical
signal that is the difference between the encoder input signal and the decoder output
signal. Note that SNR as defined here is given in decibels (dB). In the case of images
or video, PSNR (peak signal-to-noise ratio) is used instead of SNR. The calculation is
essentially the same as for SNR; however, in the numerator, instead of using the
encoder input signal, one uses a hypothetical signal with a constant strength of 255
(the maximum value of an unsigned 8-bit number, such as a pixel).
High SNR or PSNR values do not always correspond to signals with perceptually
high quality.
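The PSNR computation described in this section can be sketched as follows. This follows the standard definition for 8-bit data; the two small arrays are illustrative:

```python
import numpy as np

# PSNR in dB: the hypothetical peak signal (value 255) goes in the numerator,
# the mean squared error between encoder input and decoder output below it.
def psnr(original, decoded):
    original = np.asarray(original, dtype=float)
    decoded = np.asarray(decoded, dtype=float)
    mse = np.mean((original - decoded) ** 2)   # noise signal energy per sample
    if mse == 0:
        return float('inf')                    # identical signals
    return 10.0 * np.log10(255.0 ** 2 / mse)

a = np.array([[50, 60], [70, 80]])             # encoder input
b = np.array([[51, 59], [70, 81]])             # decoder output with small errors
print(round(psnr(a, b), 2))                    # 49.38
```

A high PSNR indicates small numerical error, but, as noted above, it does not always correspond to perceptually high quality.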
Another measure of signal quality is the mean opinion score, where the
performance of a compression process is characterized by the subjective quality of the
decoded signal.
For certain applications, only lossless compression is acceptable, and standards have
been developed for the lossless compression of such images. We discuss these standards
later. In general, even when lossy compression is allowed, the overall compression
scheme may be a combination of a lossy compression process followed by a lossless
compression process. Various image, video, and audio compression standards follow
this model, and several of the lossless compression schemes used in these standards
are described in this section. The general model of a lossless compression scheme is
as depicted in the following figure.
Data Redundancy
There are three basic types of redundancy:
• Coding: the grey levels of the image are coded in a way that uses more symbols than is necessary
• Inter-pixel: the value of any pixel can be guessed from its neighbours
• Psychovisual: some information is less important than other information in normal visual processing
Coding redundancy
• Occurs when the data used to represent the image is not utilized in an optimal manner
• Remedy: use fewer bits to represent frequent symbols
• Huffman coding: lossless
Interpixel redundancy
• Occurs because adjacent pixels tend to be highly correlated; in most images the brightness levels do not change rapidly, but change gradually
• Neighboring pixels have similar values; the correlation is due to geometry and structure
• The value of any pixel can be predicted from the values of its neighbours
• This correlation between pixels is left unused when each pixel is coded independently
• Predictive coding: lossless
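The predictive-coding idea can be sketched on a single image row. The pixel values below are illustrative, and the predictor (each pixel predicted by its left neighbour) is the simplest possible choice:

```python
import numpy as np

# Because neighbouring pixels are similar, the prediction residuals are small
# and cluster near zero, so they can be coded with fewer bits.
row = np.array([100, 101, 103, 104, 104, 106, 108])

# Encoder: keep the first pixel, then code only the differences.
residual = np.concatenate(([row[0]], np.diff(row)))
print(residual)                             # [100  1  2  1  0  2  2]

# Decoder: a running sum exactly reconstructs the row -- the scheme is lossless.
reconstructed = np.cumsum(residual)
print(np.array_equal(reconstructed, row))   # True
```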
Psychovisual redundancy
• Some information is more important to the human visual system than other types of information
• Due to properties of the human eye: the eye does not respond with equal sensitivity to all visual information (e.g. the RGB channels)
• Certain information has less relative importance; if it is eliminated, the quality of the image is relatively unaffected (the HVS is only sensitive to about 64 grey levels)
• Quantization: lossy
• Removal of high-frequency data: lossy
• Use fidelity criteria to assess the loss of information
Coding redundancy
If the gray level of an image is coded in a way that uses more code words than
necessary to represent each gray level, then the resulting image is said to contain
coding redundancy.
Interpixel redundancy
The value of any given pixel can be predicted from the values of its neighbors, so the
information carried by an individual pixel is small and its visual contribution to an
image is redundant. This is also called spatial redundancy, geometric redundancy, or
interpixel redundancy. Example: run-length coding.
• Before encoding, preprocessing is performed to prepare the image for the encoding
process, and consists of any number of operations that are application specific
• After the compressed file has been decoded, postprocessing can be performed to
eliminate some of the potentially undesirable artifacts brought about by the
compression process
• The compressor can be broken into the following stages:
1. Data reduction: Image data can be reduced by gray level and/or spatial quantization,
or can undergo any desired image improvement (for example, noise removal)
process
2. Mapping: Involves mapping the original image data into another mathematical space
where it is easier to compress the data
3. Quantization: Involves taking potentially continuous data from the mapping stage
and putting it in discrete form
4. Coding: Involves mapping the discrete data from the quantizer onto a code in an
optimal manner
• A compression algorithm may consist of all the stages, or it may consist of only one
or two of the stages
Huffman Coding
1. Find the gray level probabilities for the image by finding the histogram
2. Order the input probabilities (histogram magnitudes) from smallest to largest
3. Combine the smallest two by addition
4. GOTO step 2, until only two probabilities are left
5. By working backward along the tree, generate code by alternating assignment of 0
and 1
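The steps above can be sketched in code. This is a minimal implementation, using a heap to repeatedly combine the two smallest probabilities; the four grey-level probabilities at the end are illustrative:

```python
import heapq

# Build a Huffman code: repeatedly merge the two least probable entries,
# then walk back down the tree assigning 0 and 1 at each branch.
def huffman_code(freqs):
    # Heap entries are (probability, tie-break id, subtree); leaves are symbols.
    heap = [(p, i, sym) for i, (sym, p) in enumerate(freqs.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)            # two smallest probabilities...
        p2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next_id, (left, right)))  # ...combined by addition
        next_id += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):                  # internal node: branch on 0 / 1
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix or "0"              # degenerate single-symbol case
    walk(heap[0][2], "")
    return codes

# Four grey levels with skewed probabilities (from a normalized histogram).
codes = huffman_code({"g0": 0.5, "g1": 0.25, "g2": 0.125, "g3": 0.125})
print(codes)   # the most frequent level g0 gets the shortest code word
```

For this source the code lengths are 1, 2, 3, 3 bits, giving an average of 0.5·1 + 0.25·2 + 0.125·3 + 0.125·3 = 1.75 bits per pixel, which matches the entropy of this particular source.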
• Run-length coding (RLC) works by counting adjacent pixels with the same
gray level value called the run-length, which is then encoded and stored
• RLC can be implemented in various ways, but the first step is to define the
required parameters
• Horizontal RLC (counting along the rows) or vertical RLC (counting along the
columns) can be used
• In basic horizontal RLC, the number of bits used for the encoding depends
on the number of pixels in a row
• If the row has 2^n pixels, then the required number of bits is n, so that a run
that is the length of the entire row can be encoded
Example
Compression Achieved
The original image requires 3 bits per pixel (in total, 8 × 8 × 3 = 192 bits).
JPEG Encoding
JPEG 2000
Quantization:
After the table of transform coefficients (the T table) is created, the values are
quantized to reduce the number of bits needed for encoding.
Quantization divides each value by a constant, then drops the fraction. This is done
to optimize the number of bits and the number of 0s for each particular application.
Compression:
Quantized values are read from the table and redundant 0s are removed.
To cluster the 0s together, the table is read diagonally in a zigzag fashion. The
reason is that if the table does not have fine changes, the bottom-right corner of the
table is all 0s.
JPEG usually uses lossless run-length encoding at the compression phase.
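The quantization and zigzag read-out described above can be sketched as follows. The coefficient block and the single constant quantizer step are illustrative assumptions (JPEG actually uses a full table of quantization steps):

```python
import numpy as np

T = np.array([[91, 23,  8, 1],           # illustrative block of transform
              [18,  7,  2, 0],           # coefficients: energy concentrated
              [ 5,  2,  0, 0],           # in the top-left (low frequencies)
              [ 1,  0,  0, 0]])

q = 8                                    # assumed constant quantizer step
Q = T // q                               # divide by the constant, drop the fraction

def zigzag(block):
    # Read anti-diagonals in alternating directions, as in JPEG.
    n = block.shape[0]
    order = []
    for s in range(2 * n - 1):
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        order.extend(diag if s % 2 else diag[::-1])
    return [int(block[i, j]) for i, j in order]

print(zigzag(Q))   # [11, 2, 2, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

The trailing run of 0s is exactly what the subsequent lossless run-length stage compresses well.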
The JPEG 2000 standard is based on the wavelet transform and applies a component
transform for color images. The one-dimensional wavelet transform is applied to the
rows and columns, and the coefficients are quantized based on the image size and
number of wavelet bands utilized
• These quantized coefficients are then arithmetically coded on a bitplane basis
Transform Coding
Procedure
Transform coding is a form of block coding done in the transform domain. The
image is divided into blocks, or subimages, and the transform is calculated for each
block.
• Any of the previously defined transforms can be used, frequency (e.g.
Fourier) or sequency (e.g. Walsh/Hadamard), but it has been determined that
the discrete cosine transform (DCT) is optimal for most images
• The newer JPEG2000 algorithm uses the wavelet transform, which has
been found to provide even better compression
• After the transform has been calculated, the transform coefficients are
quantized and coded
• This method is effective because the frequency/sequency transform of
images is very efficient at putting most of the information into relatively few
coefficients, so many of the high frequency coefficients can be quantized to 0
(eliminated completely)
• This type of transform is a special type of mapping that uses spatial
frequency concepts as a basis for the mapping
• The main reason for mapping the original data into another mathematical
space is to pack the information (or energy) into as few coefficients as
possible
• The simplest form of transform coding is achieved by filtering: simply
eliminating some of the high-frequency coefficients
• However, this alone will not provide much compression, since the transform data
is typically floating point and thus 4 or 8 bytes per pixel (compared to the
original pixel data at 1 byte per pixel), so quantization and coding are applied
to the reduced data
• Quantization includes a process called bit allocation, which determines the
number of bits to be used to code each coefficient based on its importance
• Typically, more bits are used for lower frequency components where the
energy is concentrated for most images, resulting in a variable bit rate or
nonuniform quantization and better resolution
• Two particular types of transform coding have been widely explored:
1. Zonal coding
2. Threshold coding
• These two vary in the method they use for selecting the transform
coefficients to retain (using ideal filters for transform coding selects the
coefficients based on their location in the transform domain)
Zonal coding
• It involves selecting specific coefficients based on maximal variance
• A zonal mask is determined for the entire image by finding the variance
for each frequency component
• This variance is calculated by using each subimage within the image as
a separate sample and then finding the variance within this group of
subimages
• The zonal mask is a bitmap of 1s and 0s, where the 1s correspond to the
coefficients to retain and the 0s to the ones to eliminate
• As the zonal mask applies to the entire image, only one mask is required
Threshold coding
• It selects the transform coefficients to retain based on whether their values
exceed a specific threshold
• A different threshold mask is required for each block, which increases
file size as well as algorithmic complexity
• In practice, the zonal mask is often predetermined because the low
frequency terms tend to contain the most information, and hence exhibit the
most variance
• In this case we select a fixed mask of a given shape and desired
compression ratio, which streamlines the compression process
• It also saves the overhead involved in calculating the variance of each group of
subimages for compression and also eases the decompression process
• Typical masks may be square, triangular or circular and the cutoff frequency is
determined by the compression ratio
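A fixed triangular zonal mask of the kind described above can be sketched as follows; the block size and cutoff are illustrative:

```python
import numpy as np

# Zonal mask: 1s mark the retained low-frequency coefficients, 0s the
# eliminated ones. One mask is used for every block in the image.
def triangular_zonal_mask(n, keep):
    i, j = np.indices((n, n))
    return (i + j < keep).astype(int)    # triangular low-frequency zone

mask = triangular_zonal_mask(4, 2)
print(mask)
# [[1 1 0 0]
#  [1 0 0 0]
#  [0 0 0 0]
#  [0 0 0 0]]

block = np.arange(16.0).reshape(4, 4)    # stand-in for one block of coefficients
retained = block * mask                  # applying the same mask to each block
```

Because the mask is predetermined, no per-block variance calculation or per-block overhead is needed, which is the streamlining advantage mentioned above.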
IMPORTANT QUESTIONS
12 MARKS QUESTIONS
1. Explain the schematics of image compression standard JPEG.
2. Explain CCITT Image compression standard.
3. Explain lossy and lossless predictive coding with a neat sketch.
4. Briefly explain (a) variable length coding (b) Transform coding (c) Zonal coding.
5. Explain Shannon’s coding with a suitable example.
6. Explain Huffman coding with a suitable example.
7. Explain Pixel and threshold coding with a suitable example.