
Unit 4

Introduction to image compression


Images require a lot of space as image files can be very large. They also need to be
exchanged among various imaging systems. Hence, there is a need to reduce both the amount of
storage space and the transmission time. This leads us to the area of data compression. Data
compression deals with algorithms and techniques for compression of data. It is the art and science
of removing redundancies to represent information in a compact form.
It has been observed that in data compression there is no compromise on the information quality; only the amount of data used to represent the information is reduced.
Image processing applications use huge amounts of image data. They are expected to both store
and transmit huge amounts of data. Data compression becomes essential due to the following three
reasons:
Storage: The storage requirements of imaging applications are very high. The goal of data compression is to reduce the amount of memory by reducing the number of bits, while retaining enough data to reconstruct the image. The reduction of data reduces the memory requirement and the resources spent on storage.
Transmission: The transmission time of the image is directly proportional to the size of the
image. Image compression aims to reduce the transmission time by reducing the size of the image.
The reduction of data leads to easier and faster transportation of data.
Faster Computation: Reduced data simplifies the algorithm design and facilitates faster
execution of the algorithms.
Image compression model
The compressor and decompressor are known as the coder and decoder, respectively. They may be located in the same place or at the two ends of a channel. The term encoder is also used for the coder. The coder and decoder are collectively referred to as a codec. A codec may be implemented as a hardware or a software component.

The two main components of the image compression model are the encoder and the
decoder. The source or symbol encoder takes a set of symbols from the input data, removes the
redundancies and sends the data across the channel. The decoder has two parts, namely, channel
decoder and symbol (or source) decoder. If the link is noise-free, the channel encoder and decoder
can be omitted.
Compression Measures

Data compression algorithms can be viewed as a mathematical transformation for mapping a message of N1 data bits to a set of codes with N2 data bits representing the same information.
A message can be conveyed by various means. Human beings are very good at this. For
example, we use abbreviations such as ‘UK’ and ‘JPEG’ to represent ‘United Kingdom’ and ‘Joint
photographic experts group’, respectively. It can be observed that there is no compromise on the
meaning of the message. Instead, only the representation of the message is changed so that it is
more compact than the earlier form. This kind of logical substitution is called logical
compression. This is used at the highest logical level. At the image level, the transformation of
the input message to a compact representation of code is more complex. This is called physical
compression, and is accomplished using complex data compression algorithms by manipulating
the pixels.
Data compression algorithms can be visualized as the mapping of a set of message symbols to codes using some logical rules of conversion. A message is composed of a well-defined set of symbols S, which conveys something to the observer. In the imaging context, a pixel or a block of pixels is considered as a set of symbols. A code, on the other hand, is a sequence of symbols or numbers that are used to represent information. A string of codes is called a codeword. The compression ratio (CR) is defined as CR = N1/N2 and the relative redundancy is defined as RD = 1 - 1/CR.
There are three possible scenarios. In the first scenario, N2 = N1. This means that the compression ratio is 1 and the relative redundancy is 1 - 1/1 = 0, indicating that there is no redundancy in the image and the input message is reproduced exactly. In the second scenario, N2 << N1, the compression ratio approaches ∞ and the relative redundancy approaches 1. This condition implies that the image has highly redundant data and there is significant compression. In the third scenario, N2 >> N1, the compression ratio approaches 0 and the relative redundancy becomes a large negative quantity. This indicates that the transformed set has more data than the original set. Typically this is called data explosion or reverse compression. Data explosion is not under the purview of data compression.
Some of the compression measures are as follows.
Compression ratio is defined as CR = N1/N2.
This is expressed as N1:N2. It is common to quote a compression ratio such as 4:1; the interpretation is that the data of 4 pixels of the input image is represented by the data of 1 pixel in the output image.
Savings percentage is defined as (N1 - N2)/N1, expressed as a percentage.
Bit rate: Bit rate describes the rate at which bits are transferred from the sender to the receiver and indicates the efficiency of the compression algorithm by specifying how much data is transmitted in a given amount of time. It is often given as bits per second (bps), kilobits per second (Kbps), or megabits per second (Mbps). For a stored image, the bit rate specifies the average number of bits per stored pixel of the image and is given as
Bit rate = (total number of bits in the compressed image)/(total number of pixels in the image), expressed in bits per pixel (bpp).
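The following Python sketch collects these measures as small helper functions; the function names and the sample numbers are illustrative only.

```python
# Minimal sketch of the compression measures discussed above.
# n1: bits in the original representation, n2: bits in the compressed one,
# num_pixels: number of pixels in the image (names are illustrative).

def compression_ratio(n1, n2):
    return n1 / n2                      # CR = N1 / N2

def relative_redundancy(n1, n2):
    return 1 - 1 / compression_ratio(n1, n2)   # RD = 1 - 1/CR

def savings_percentage(n1, n2):
    return (n1 - n2) / n1 * 100         # percentage of data saved

def bit_rate(n2, num_pixels):
    return n2 / num_pixels              # average bits per stored pixel (bpp)

if __name__ == "__main__":
    n1, n2, pixels = 256 * 256 * 8, 131072, 256 * 256
    print(compression_ratio(n1, n2))    # 4.0  -> a 4:1 compression ratio
    print(relative_redundancy(n1, n2))  # 0.75
    print(savings_percentage(n1, n2))   # 75.0
    print(bit_rate(n2, pixels))         # 2.0 bpp
```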

Types of Redundancies
Redundancy means repetitive data. This may be data that share some common
characteristics or overlapped information. This redundancy may be present implicitly or explicitly.
For example, consider the string ‘aaaaab’
This information has redundancy where the character a is repeated five times. The message can be
conveyed simply as {(a, 5), b}, implying that the character a occurs five times and b occurs once.
This reduces the information to a compact form. This is useful when the repetition is large.
Similarly redundancy can be present in images also. Consider the image.

It can be observed that the pixel value 10 is repeated seven times. In this case, the
redundancy is explicit. The redundancy may be implicit also. Consider the image.

This image can be split into two images by combining the LSBs to form one image and the
MSBs to form another image.

The first image has all 0s and the second image has two 0s and two 1s. These redundancies
can be exploited and the image data can be reduced. The types of redundancies in images are
1) Coding Redundancy
High probabilities are associated with events that occur frequently, and low probabilities with events that occur rarely. Based on this uncertainty, information can be measured and quantified. The amount of uncertainty is called the self-information associated with the event, given by I(E) = -log2 p(E); this is known as the information content and its unit is bits. Thus the information content is inversely related to the probability of the event. When the probability of the event is 1, the information content is 0; as the probability approaches 0, the information content grows without bound. Thus the range of the information content is 0 to ∞.
In images, the grey values of pixels are not equally probable and often a value is related to that of its neighbours. So a pixel and its grey level k are modelled as a random variable rk. Each random variable is associated with a probability p(rk). Coding redundancy is caused by a poor
selection of coding technique. A coding scheme assigns a unique code for all symbols of the
message. Binary coding is a good example of a coding scheme where only two code symbols {0,1}
are used. So the message is mapped to a codeword containing 0s and 1s. There are many coding
schemes available. ASCII (American standard code for information interchange) is a 7-bit code.
Grey code is another popular code. When the mapping between the source symbols and the code
is fixed, such a code is called a block code. If every encoded string can be decoded in only one way, the code is called a uniquely decodable code. If the codewords of a block code are all distinct, it is called a non-singular code. If decoding is possible without the knowledge of the succeeding codewords, the code is known as an instantaneous code.
The Huffman code is a variable-length coding technique and uses fewer bits to encode the same information. Hence it can be concluded that variable-length coding is better than fixed binary coding. A wrong choice of code thus creates unnecessary additional bits. These extra bits are called redundancy. Thus coding redundancy is given as
Coding redundancy = Lavg - Entropy, where Lavg = Σ l(rk) p(rk) is the average number of bits used to code a pixel.
Here rk represents the grey levels, p(rk) is the probability of the pixels with grey level rk, and l(rk) is the length of the code used for rk.
Entropy of an image denotes the minimum number of bits required to code the message. Thus
coding redundancy is removed by using good coding schemes including variable length codes such
as Huffman coding and Shannon-Fano coding algorithm.
It can be observed that Huffman coding is closer to the entropy. Hence it can be concluded
that variable length coding is better than fixed binary coding.
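As a quick illustration of coding redundancy, the following Python sketch computes the entropy of a toy grey-level distribution and the redundancy of a fixed-length code versus a variable-length (Huffman-style) code; the pixel values and code lengths are made up for the example.

```python
import math
from collections import Counter

def entropy(pixels):
    """Entropy H = -sum p(rk) * log2 p(rk) of the grey-level distribution."""
    counts = Counter(pixels)
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def average_code_length(pixels, code_lengths):
    """L_avg = sum l(rk) * p(rk) for a given code-length assignment."""
    counts = Counter(pixels)
    n = len(pixels)
    return sum(code_lengths[g] * (c / n) for g, c in counts.items())

pixels = [0, 0, 0, 0, 1, 1, 2, 3]          # toy 8-pixel "image"
fixed = {0: 2, 1: 2, 2: 2, 3: 2}           # fixed 2-bit binary code
variable = {0: 1, 1: 2, 2: 3, 3: 3}        # a Huffman-style variable-length code

h = entropy(pixels)
print("entropy            :", h)                                       # 1.75 bits
print("redundancy (fixed) :", average_code_length(pixels, fixed) - h)     # 0.25
print("redundancy (varlen):", average_code_length(pixels, variable) - h)  # 0.0
```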

2) Inter pixel redundancy


A pixel of an image is not isolated; it is often related to its neighbours. Hence a pixel can be predicted using the values of its neighbours. For example, consider an image with a constant background. The visual nature of the image background is conveyed by many pixels that are not actually necessary. This is called spatial redundancy (or geometrical redundancy). Spatial redundancy may be present in a single frame (intra-frame) or among multiple frames (inter-frame).
In intra-frame redundancy, large portions of the image may have the same characteristics such as colour and intensity. This kind of redundancy is also called spatial redundancy. One way to reduce the redundancy is to use subsampling, where alternate pixels are used to reduce the number of bits.
Another way to reduce the inter-pixel dependency is to use quantization where a fixed
number of bits are used to reduce the bits. Inter-pixel dependency is solved by algorithms such as
predictive coding techniques, bit-plane algorithm, run length coding, and dictionary based
algorithms.
3) Psychovisual redundancy
Most imaging applications produce images that are observed by humans. Image data that convey little or no information to the human observer are said to be psychovisually redundant. For example, the human visual system is very sensitive to certain information such as edges and textures; in most cases the outline of the object itself conveys the structure of the object, so the human visual system attaches great importance to this kind of information compared to other information in the image. Redundant information of less importance is called psychovisual redundancy. One way to resolve this redundancy is to perform uniform quantization by reducing the number of bits. The least significant bits (LSBs) of the image do not convey much information and hence can be removed. Sometimes this may cause edge effects, which can be removed by a scheme called improved gray scale (IGS) quantization.

4) Chromatic redundancy
Chromatic redundancy refers to the presence of unnecessary colours in an image. The colour
channels of colour images are highly correlated but the human visual system cannot perceive
millions of colours. Hence the colours that are not perceived by the human visual system can be
removed without affecting the quality of the image.
Fidelity refers to the accuracy of reproduction of the data. The difference between the original and the reconstructed images is called distortion, and the amount of distortion should be assessed. Objective fidelity measures include error, SNR, and PSNR. Subjective assessment is similar to the way a customer selects a product based on factors such as picture quality, appearance, brand name, or customer care; in the same manner, image quality can be assessed using a subjective picture quality scale.

Categories of Compression Algorithms


The role of the compression algorithm is to reduce the source data to a compressed form and
decompress it to get the original data. Hence the compression algorithm should have an idea
about the symbols to be coded and their probability distribution.

Any compression algorithm has two components.


1) Modeller: The purpose of the modeller is to condition the image data for compression using
the knowledge of the data. The modeller is present on both the sender and the receiver sides.
The models can be either static (that is the models at the sender and the receiver sides do not
change) or dynamic (that is the models change depending on the change of data during the
compression or decompression process). Based on the models used, the algorithm can also
be classified as static or dynamic compression algorithm.
2) Coder: The second component is called the coder. The sender – side coder is called the
encoder. This codes the symbols independently or using the model. The receiver – side coder
is called the decoder, which decodes the message from the compressed data.
If the models at the sender and receiver sides are the same, the compression scheme is symmetric.
Otherwise it is asymmetric. Compression algorithms can be broadly classified into two types.

Lossless compression is useful in preserving information as there is no information loss. This


type of algorithm is useful in the legal and medical domains. Lossy compression algorithms compress
the data with a certain amount of error that is acceptable to the human observer. The human visual
system has many defects like colour blindness. So the loss of information is either not noticed much
or the errors are acceptable. This category of algorithms is useful in applications such as broadcast,
television, and multimedia.

Another way of classifying image compression algorithms is


1. Entropy encoding
2. Predictive coding
3. Transform coding
4. Layered coding

1) Entropy coding
The logic behind this category is that if the pixels are not uniformly distributed, then an appropriate coding scheme can be selected that encodes the information so that the average number of bits approaches the entropy. Entropy specifies the minimum average number of bits required to encode the information.
Hence the coding is based on the entropy of the source and on the possibility of occurrence of the
symbols. This leads to the idea of variable length coding. Some examples of this type of coding are
Huffman coding, arithmetic coding, and dictionary based coding.
2) Predictive coding
The idea behind predictive coding is to remove the mutual dependency between the successive
pixels and then perform the encoding. Normally samples would be very large, but the differences
would be small.

For example, let us assume that the following pixels need to be transmitted:

Pixels:      400  405  420  425
Differences:   5   15    5

It can be observed that the differences between pixels are always smaller than the original values and require fewer bits for representation. Therefore it makes sense to encode the differences rather than the original values. However, this approach may not work effectively for rapidly changing data such as {300, 4096, 128, 4096, 15}.
Hence the number of bits required to code the difference is very small for slowly varying data.
Examples of this category include differential pulse code modulation (DPCM) and delta
modulation techniques.
3) Transform coding
The energy is packed into fewer components and only these components are encoded and
transmitted. The human eye is more sensitive to the lower spatial frequencies than the higher
spatial frequencies. The idea is to remove the redundant high frequency components to create
compression. The removal of these frequency components leads to loss of information. However,
this loss of information, if tolerable, can be used for imaging and video applications. Thus the
basis of transform coding is frequency selection, information packing, and the concept of basis
images.
4) Layered coding
Layered coding is very useful in the case of layered images. Sometimes the image is
represented in the form of layers. Data structures like pyramids are useful to represent an image in
this multiresolution form. The layers of a pyramid would be sent depending on the application. At
times, these images are segmented as foreground and background and based on the needs of the
application, encoding is performed. This is also in the form of selected frequency coefficients or
selected bits of pixels of an image.

Compression Algorithms - 1
1) Lossless compression algorithms
Lossless compression algorithms preserve information and the compression process incurs
no data loss. Hence these algorithms are used in domains where reliability and preservation of
data are crucial. However the compression ratio of these algorithms is small in comparison to the
lossy compression algorithms. Some popular lossless compression algorithms are
1. Run-length coding
2. Huffman coding
3. Shannon-Fano coding
4. Arithmetic coding
5. Dictionary-based coding

1) Run-length coding
Run-length coding (RLC) exploits the repetitive nature of the image. It identifies runs of identical pixel values and encodes the image as a sequence of runs. Each row of the image is written as a sequence, and each run of black or white pixels is then represented by its length. This is called run-length coding, and it is an effective way of compressing an image. If necessary, there can be further compression by using variable-length coding to code the run lengths themselves.

Run-length coding was standardized by the Consultative Committee of the International Telegraph and Telephone (CCITT) and is used to encode binary and grey-level images. The technique scans the image row by row and identifies each run. The output run-length vector specifies the pixel value and the length of the run. Consider the image shown below.
The horizontal RLC starts from the top-left pixel, scans the image from left to right, and
generates the run length vector. For the image the run length vectors are as follows:

The maximum run length is five, which requires three bits in binary. The total number of vectors is six, and the maximum number of bits required for a run length is three. The number of bits per pixel value is one, as the pixels of the image are either 0 or 1. Therefore, the total number of bits in the encoded image is 6 x (3 + 1) = 24. The total number of bits of the original image is 5 x 5 = 25. Therefore, the compression ratio is 25/24, that is, 1.042:1.
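A minimal Python sketch of the run-length encoder described above is given below; the function name, the scan orders, and the 5x5 test image are illustrative (the image is not the one in the figure).

```python
import numpy as np

def rlc_encode(image, order="horizontal"):
    """Return (value, run_length) pairs for a binary image, scanned row-wise
    ("horizontal") or column-wise ("vertical")."""
    data = image.flatten(order="C" if order == "horizontal" else "F")
    runs, current, length = [], data[0], 1
    for v in data[1:]:
        if v == current:
            length += 1
        else:
            runs.append((int(current), length))
            current, length = v, 1
    runs.append((int(current), length))
    return runs

# 5x5 binary image (illustrative values, not the figure from the text)
img = np.array([[1, 1, 1, 1, 1],
                [1, 1, 0, 0, 0],
                [0, 0, 0, 1, 1],
                [1, 1, 1, 1, 0],
                [0, 0, 0, 0, 0]])

runs = rlc_encode(img)
print(runs)
# Bits needed: each run stores 1 bit for the value plus bits for the length.
max_len_bits = int(np.ceil(np.log2(max(r for _, r in runs) + 1)))
print("compressed bits:", len(runs) * (max_len_bits + 1), "original bits:", img.size)
```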
The scan line can be changed, and this change affects the compression ratio. Vertical scanning of the same image yields

It can be observed that this is significantly lower than the previous scheme. The scan line can also be changed to a zigzag pattern as shown below. Zigzag scanning yields

It can be observed that the compression ratio changes with the scan line.

2) Huffman coding
Huffman coding is a type of variable length coding. In Huffman coding, the coding redundancy
can be eliminated by choosing a better way of assigning the codes. The Huffman coding algorithm
is given as follows.
1) List the symbols and sort them
2) Pick two symbols having the least probabilities.
3) Create a new node. Add the probabilities of the symbols selected in step 2 and label the
new node with it.
4) Repeat steps 2 and 3 till only one node remains
5) Start assigning code 0 for the left tree and code 1 for the other branch.
6) Trace the code from the root to the leaf that represents each label.

The running time of the algorithm is O(n log n). The Huffman tree is shown.
The problem with the Huffman codes is that they are not unique. The data given can be
differently combined to yield the result shown.
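The following Python sketch builds a Huffman code table with a priority queue, following the steps listed above; the symbol probabilities are illustrative, and because Huffman codes are not unique the exact code words may differ from those built by hand.

```python
import heapq

def huffman_codes(symbol_probs):
    """Build a Huffman code table from {symbol: probability}."""
    # Each heap entry: (probability, tie_breaker, {symbol: partial_code})
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(symbol_probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, codes1 = heapq.heappop(heap)   # two least probable nodes
        p2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}   # left branch gets 0
        merged.update({s: "1" + c for s, c in codes2.items()})  # right branch gets 1
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

probs = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}   # illustrative probabilities
codes = huffman_codes(probs)
print(codes)   # one valid Huffman code table; the assignment is not unique
print("average length:", sum(len(codes[s]) * p for s, p in probs.items()))
```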

3) Huffman decoder
The procedure for decoding as implemented in the Huffman decoder is as follows:
1. Read the coded message and start from the root of the Huffman tree.
2. If the read bit is 0, move to the left branch; otherwise move to the right branch of the tree.
3. Repeat step 2 until a leaf is reached. Then output the symbol associated with that leaf and start again from the root.
4. Repeat steps 1-3 till the end of the message.

4) Shannon-Fano coding
The difference between Shannon-Fano coding and Huffman coding is that the binary tree construction is top-down in the former. The whole alphabet of symbols is present in the root. A node is then split into two halves, one corresponding to the left branch and the other to the right branch, based on the probabilities of the symbols. This process is repeated recursively and the entire tree is constructed. Then 0 is assigned to the left half and 1 to the right half.
The steps of the Shannon-Fano algorithm are as follows (a sketch is given after the list):
1. List the frequency table and sort the table on the basis of increasing frequency.
2. Divide the table into two halves such that the two groups have roughly equal total frequency.
3. Assign 0 to the upper half and 1 to the lower half.
4. Recursively repeat the process until each symbol becomes a leaf of the tree.
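A minimal Python sketch of the Shannon-Fano procedure is given below; the frequency table is illustrative and the split simply searches for the most balanced cut.

```python
def shannon_fano(symbols):
    """symbols: list of (symbol, frequency) pairs. Returns {symbol: code}."""
    symbols = sorted(symbols, key=lambda sf: sf[1])     # sort by increasing frequency
    codes = {s: "" for s, _ in symbols}

    def split(group):
        if len(group) <= 1:
            return
        total, running, cut, best_diff = sum(f for _, f in group), 0, 1, float("inf")
        # Find the cut that makes the two halves' total frequencies as equal as possible.
        for i in range(1, len(group)):
            running += group[i - 1][1]
            diff = abs(total - 2 * running)
            if diff < best_diff:
                best_diff, cut = diff, i
        upper, lower = group[:cut], group[cut:]
        for s, _ in upper:
            codes[s] += "0"                              # 0 for the upper half
        for s, _ in lower:
            codes[s] += "1"                              # 1 for the lower half
        split(upper)
        split(lower)

    split(symbols)
    return codes

freqs = [("a", 15), ("b", 7), ("c", 6), ("d", 6), ("e", 5)]   # illustrative frequencies
print(shannon_fano(freqs))
```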

Compression Algorithms – 2
1) Bit plane coding
The idea of RLC can also be extended to multilevel images. This technique splits a multilevel image into a series of bi-level images; that is, an m-bit grey level image can be represented in the form
am-1 2^(m-1) + am-2 2^(m-2) + ... + a1 2^1 + a0 2^0
The zeroth-order bit plane is generated by collecting the a0 bits of each pixel. The first-order bit plane is generated by collecting the a1 bits of each pixel. Continuing in this manner, the (m-1)-order bit plane is generated by collecting all the am-1 bits of each pixel.
Let us assume that the grey scale image is as follows.

The image A can now be divided into three planes using the MSB, the middle bit and the LSB
as follows:

The individual planes of the image can now be compressed using RLC techniques. If a plane is
completely white or black, RLC yields peak performance. However the disadvantage of this
scheme is that close neighbours in the spatial domain, say 3 and 4, having binary codes 011 and 100, differ in every bit plane. Close neighbours in the spatial domain thus do not appear together in the bit planes, which causes a problem: small changes in the grey level have a ripple effect across the planes, and the complexity of the bit planes is significantly increased.
To avoid this problem, grey code can be used instead of binary code. In grey code, the
successive codes differ by only one bit. The algorithm for generating grey code can be given as
follows.
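A short Python sketch of the standard binary-to-Gray conversion (g = b XOR (b >> 1)) and of bit-plane splitting is given below; the 2x2 test image containing the neighbouring values 3 and 4 is illustrative.

```python
import numpy as np

def binary_to_gray(values):
    """Convert integer grey levels to their Gray-code equivalents:
    g = b XOR (b >> 1), so successive codes differ in exactly one bit."""
    values = np.asarray(values, dtype=np.uint8)
    return values ^ (values >> 1)

def bit_planes(image, num_bits):
    """Split an image into num_bits bi-level images (plane 0 = LSB)."""
    return [(image >> k) & 1 for k in range(num_bits)]

img = np.array([[3, 4], [4, 3]], dtype=np.uint8)   # neighbours 3 (011) and 4 (100)
print(bit_planes(img, 3))                          # every plane changes between 3 and 4
print(bit_planes(binary_to_gray(img), 3))          # Gray codes 010 and 110 differ in one bit
```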

Once the planes are obtained separately, RLC can be applied to the individual planes.
Another coding scheme that can be used for the bit plane is constant area coding (CAC). The bit
planes have uniform regions of 1s and 0s. A constant portion of the bit plane can be uniquely coded using fewer bits. CAC divides the image into a set of blocks of size m x m, such as 8x8 or 16x16. There are three types of blocks available.
1. A block of all white pixels
2. A block of all black pixels
3. A block with mixed pixels.
The most probable block is assigned a single code of either 0 or 1. The remaining blocks are
assigned a 2-bit code. White block skipping (WBS) is a scheme where a majority of the blocks
(white) are assigned a single bit. The rest of the blocks including mixed pixel blocks are encoded
together.

2) Arithmetic coding
Arithmetic coding is another popular algorithm and is widely used, like the Huffman coding
technique. The differences between arithmetic and Huffman coding are shown in the table.
Differences between arithmetic coding and Huffman coding

1. Arithmetic coding is a complex technique suited to coding short messages; Huffman coding is a simple technique for coding characters.
2. Arithmetic coding is always optimal; Huffman coding is optimal only if the probabilities of the symbols are negative powers of two.
3. Precision is a big issue in arithmetic coding; precision is not an important factor in Huffman coding.
4. There is no slow reconstruction in arithmetic coding; reconstruction in Huffman coding is slow when the number of symbols is very large and changing rapidly.

Arithmetic coding uses a single code word for a string of characters. Consider the symbols and their probabilities given in the table.

Let us try to code the string ‘CAB’.


The arithmetic coding process is carried out as follows.
1. The first step is to divide the range 0-1 among the symbols based on their probabilities.

The first character to be encoded is C. This falls in the range 0.9-1.0, so the code must lie in this range. The width of this range is 1.0 - 0.9 = 0.1. This range of 0.1 is now divided among the symbols according to the given probabilities; each new sub-range is obtained by multiplying the corresponding probability by 0.1. The cumulative probabilities are given in the table.

2. The resultant new range is 0.90 - 1.00. This is subdivided as follows:


Now the second character to be encoded is A. This falls in the range 0.9-0.96, so the code must lie within this range. The width of this range is 0.96 - 0.9 = 0.06. This range is now divided among the symbols according to the given probabilities.

3. The resultant range is as follows:

The next character to be encoded is B. This falls in the range 0.936-0.954, and with this character the string ends. So the final code for the string 'CAB' is any value in the interval between 0.936 and 0.954. By this logic, any string can be represented.

Repeat the process to code the string 'CBAC' (using a different probability table).

The first character is 'C', which lies in the range 0.6 to 1.0. Subdividing this range of width 0.4 gives:
A: 0.6 to 0.6 + 0.3 x 0.4 = 0.72
B: 0.72 to 0.72 + 0.3 x 0.4 = 0.84
C: 0.84 to 0.84 + 0.4 x 0.4 = 1.0
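The interval-narrowing process above can be written as a short Python sketch; the probability table {A: 0.3, B: 0.3, C: 0.4} is an assumption read off from the worked 'CBAC' figures.

```python
def arithmetic_encode(message, probs):
    """Return the final (low, high) interval for the message.
    probs: {symbol: probability}; cumulative ranges follow the dict order."""
    ranges, cum = {}, 0.0
    for s, p in probs.items():
        ranges[s] = (cum, cum + p)      # cumulative sub-range of each symbol
        cum += p
    low, high = 0.0, 1.0
    for s in message:
        width = high - low
        s_low, s_high = ranges[s]
        low, high = low + width * s_low, low + width * s_high
    return low, high

# Probabilities as implied by the worked 'CBAC' example above (an assumption).
probs = {"A": 0.3, "B": 0.3, "C": 0.4}
print(arithmetic_encode("C", probs))     # (0.6, 1.0)
print(arithmetic_encode("CB", probs))    # (0.72, 0.84)
print(arithmetic_encode("CBAC", probs))  # any value in this interval codes 'CBAC'
```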
3) Dictionary-based Coding
This is also called Lempel-Ziv-Welch (LZW) coding. The idea behind this coding is to use
a dictionary to store the string patterns that have already been encountered. Indices are used to
encode the repeated patterns. The encoder reads the input string. Then it identifies the recurrent
words and outputs their indices from the dictionary. If a new word is encountered, the word is
sent as output in the uncompressed form and is entered into the dictionary as a new entry. The
advantages of the dictionary-based methods are as follows:
1. They are faster
2. These methods are not based on statistics. Thus there is no dependency of the quality
of the model on the distribution of data.
3. These methods are adaptive in nature.

Encoding
The idea is to identify the longest pattern for each collected segment of the input string and check it against the dictionary. If there is no match, the segment becomes a new entry in the dictionary. A sketch of the encoder is given below.
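This is a minimal Python sketch of an LZW encoder; it initializes the dictionary with single characters, and the test string is illustrative.

```python
def lzw_encode(data):
    """LZW encoder sketch: the dictionary starts with single characters and
    grows with every new pattern that is encountered."""
    dictionary = {chr(i): i for i in range(256)}   # initial single-character entries
    next_code = 256
    current, output = "", []
    for ch in data:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate                    # keep extending the match
        else:
            output.append(dictionary[current])     # emit index of the longest match
            dictionary[candidate] = next_code      # new dictionary entry
            next_code += 1
            current = ch
    if current:
        output.append(dictionary[current])
    return output

print(lzw_encode("ABABABA"))   # [65, 66, 256, 258] for this repetitive string
```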
2) Lossless predictive coding
Predictive coding techniques eliminate the inter-pixel dependencies by coding only the new information in each pixel, which is obtained by taking the difference between the actual and the predicted values of that pixel. The encoder takes a pixel fn of the input image. The predictor predicts the anticipated value of that pixel using past inputs (historical data), and the predicted value is rounded to the nearest integer, denoted f^n. The error is the difference between the actual and the predicted values:
en = fn - f^n
This error is sent across the channel. The same predictor is used on the decoder side to predict the value. The reconstructed image is
fn = en + f^n
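A minimal Python sketch of lossless predictive coding for one image row is shown below; it assumes the simplest possible predictor (the previous pixel), which is only one of many choices.

```python
import numpy as np

def predictive_encode(row):
    """Encode a row of pixels as prediction errors e_n = f_n - f^_n,
    using the previous pixel as the predictor (a simple assumption)."""
    row = np.asarray(row, dtype=int)
    errors = np.empty_like(row)
    errors[0] = row[0]                   # first pixel sent as-is
    errors[1:] = row[1:] - row[:-1]      # prediction = previous pixel
    return errors

def predictive_decode(errors):
    """Reconstruct the row exactly: f_n = e_n + f^_n (lossless)."""
    return np.cumsum(errors)

row = [400, 405, 420, 425]
e = predictive_encode(row)
print(e)                      # [400   5  15   5] -> small differences, fewer bits
print(predictive_decode(e))   # [400 405 420 425] -> exact reconstruction
```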

Compression Algorithms - 3
1) Lossy Compression Algorithms
Lossy compression algorithms, unlike lossless compression algorithms, incur a loss of
information. This loss is called distortion. However, this data loss is acceptable if it is tolerable.
The compression ratio of these algorithms is very large. Some popular lossy compression
algorithms are as follows:
1. Lossy predictive coding
2. Vector quantization
3. Block transform coding

1) Lossy predictive coding


Lossy predictive coding is an extension of the idea of predictive coding.
Predictive coding can also be implemented as a lossy compression scheme. Instead of taking
precautions, the highest value for 5 bits, that is, 31 can be used. This drastically reduces the
number of bits. However, loss of information increases. This is illustrated in the table below.

Here the number of bits used to transmit is the same as the original scheme, but the value 31 is
transmitted instead of 41, leading to error. This loss of information leads to an error which results
in a lossy compression scheme. This scheme requires only 6 x 6 = 36 bits.
i) Delta modulation
Delta modulation goes one step further by using only one bit to represent the quantized error value, which can be positive or negative. Here the predictor is defined as
f^n = α f'n-1
where α is called the prediction coefficient and f'n-1 is the previously reconstructed value. The first sample is transmitted without prediction so that the decoder can start from the same value. Then the error is computed as
en = fn - f^n
The error is quantized as
e'n = +ς if en > 0, and -ς otherwise
Here ς is a positive quantity. This scheme creates problems if significant transitions in the data are encountered frequently (slope overload). When ς is varied, the scheme is called adaptive delta modulation.
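The following Python sketch illustrates delta modulation with illustrative values of α and ς; note how the reconstruction lags behind a sudden jump in the data, which is the problem mentioned above.

```python
def delta_modulate(signal, alpha=1.0, zeta=4.0):
    """Delta modulation sketch: predictor f^_n = alpha * f'_{n-1}; the error is
    quantized to a single bit, transmitted as +zeta or -zeta (alpha and zeta
    are illustrative choices)."""
    reconstructed = [float(signal[0])]         # assume the first sample is known
    quantized_errors = []
    for sample in signal[1:]:
        prediction = alpha * reconstructed[-1]
        error = sample - prediction
        q = zeta if error > 0 else -zeta       # one-bit quantization
        quantized_errors.append(q)
        reconstructed.append(prediction + q)   # decoder-side reconstruction
    return quantized_errors, reconstructed

signal = [10, 12, 15, 30, 31, 30]
bits, recon = delta_modulate(signal)
print(bits)    # sequence of +/- zeta values (1 bit each)
print(recon)   # tracks slow changes, lags behind the fast jump to 30 (slope overload)
```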
2) Vector Quantization
Vector quantization (VQ) is a technique similar to scalar quantization. In scalar quantization the individual pixels are quantized. The idea of VQ is to identify the frequently occurring blocks in an image and to represent them by representative vectors. The set of all representative vectors is called the codebook, which is then used for coding the image. The structure of VQ is shown in the figure below.

The codebook formation procedure is as follows:


1. Vector quantization first partitions the input space X into K non-overlapping regions. It
then assigns a code vector for each cluster. The code vector is commonly chosen as the
centroid of the vectors of the partition.

2. It carries out a mapping process between the input vectors and the centroid vector
3. This introduces an error called the distortion measure. The distortion is commonly described as the squared Euclidean distance
d(X, Y) = Σ (xi - yi)^2, with the sum taken over i = 1, ..., M
Here X and Y are two M-dimensional vectors.


4. The codebook of vector quantization consists of all the code words. The image is then
divided into fixed size blocks (vectors), typically 4 x 4 and replaced with the best match
found in the codebook based on the minimum distortion.
For example, in the case of 4 x 4 pixel blocks and a codebook of size 256, the bit rate is (log2 256)/16 = 0.5 bpp and the corresponding compression ratio is (16 x 8)/8 = 16.
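A minimal Python sketch of VQ encoding and decoding with a random codebook is given below; the codebook, the image, and the block size are illustrative, and a real system would train the codebook (for example with a clustering algorithm).

```python
import numpy as np

def vq_encode(image, codebook, block=4):
    """Replace each block x block region by the index of the nearest code
    vector (the minimum-distortion match in the codebook)."""
    h, w = image.shape
    indices = []
    for r in range(0, h, block):
        for c in range(0, w, block):
            vec = image[r:r + block, c:c + block].reshape(-1)
            distortions = ((codebook - vec) ** 2).sum(axis=1)   # d(X, Y) per code vector
            indices.append(int(np.argmin(distortions)))
    return indices

def vq_decode(indices, codebook, shape, block=4):
    """Rebuild an approximation of the image from the transmitted indices."""
    h, w = shape
    out = np.zeros(shape)
    per_row = w // block
    for k, idx in enumerate(indices):
        r, c = (k // per_row) * block, (k % per_row) * block
        out[r:r + block, c:c + block] = codebook[idx].reshape(block, block)
    return out

rng = np.random.default_rng(0)
codebook = rng.integers(0, 256, size=(256, 16)).astype(float)   # 256 code vectors of 4x4
image = rng.integers(0, 256, size=(16, 16)).astype(float)
idx = vq_encode(image, codebook)
recon = vq_decode(idx, codebook, image.shape)
print("bit rate:", np.log2(len(codebook)) / 16, "bpp")          # 0.5 bpp as in the text
print("mean distortion:", float(((image - recon) ** 2).mean()))
```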
3) Block transform coding
Block transform coding is another popular lossy compression scheme. The process of
transform coding is as shown.

Sub image selection


The aim of this step is to reduce the correlation between adjacent pixels to an acceptable
level. This is one of the most important stages where the image is divided into a set of sub-images.
As the first step, the NxN image is decomposed to a set of sub-images of size nxn for operational
convenience. The value of n is a power of two. This is to ensure that the correlation among the
pixels is minimum. This step is necessary to reduce the transform coding error and
computational complexity. (Imagine how difficult it would be to handle matrices of size
1024x1024!) Generally sub-images would be of size 8x8 or 16x16.

Transform selection
The whole idea of transform coding is to use mathematical transforms for data compression.
Transformations such as discrete Fourier transform (DFT), discrete cosine transform (DCT) and
wavelet transforms can be used. The choice of the transforms depends on the resources and the
amount of error associated with the reconstruction process. Mathematical transforms are tools for
information packing.
An important aspect of the image is that smoother details are low frequency components.
Sharp details like edges are high frequency components. Transforms convert data into frequency
components. Then the required frequency components are selected so that the coefficients
associated with the details that are insensitive to the human eye are discarded.

Bit allocation
Transform coding is the process of truncating, quantizing, and coding the coefficients of
the transformed sub images. It is necessary to assign bits such that the compressed image will have
minimum distortions. Bit allocation should be done based on the importance of the data. The idea
of bit allocation is to reduce the distortion by optimal allocation of bits to the different classes of
data. The steps involved in bit allocation are as follows:
1. Assign predefined bits to all classes of data in the image.
2. Reduce the number of bits by one and calculate the distortion
3. Identify the data that is associated with the minimum distortion and reduce one bit from its quota
4. Find the distortion rate again
5. Compare with the target and if necessary repeat the steps 1-4 to get the optimal rate.

Zonal coding
The zonal coding process involves multiplying each transform coefficient by the corresponding element in a zonal mask, which has 1 in the locations of maximum variance and 0 elsewhere. A zonal mask is designed as part of this process. These coefficients are retained as
they convey more image information. The locations are identified based on the image models used
for source symbol encoding. The retained coefficients are quantized and coded. The number of bits
allocated may be fixed or may vary based on some optimal quantizer.

Threshold mask
Threshold coding works based on the fact that transform coefficients having the maximum
magnitude make the most contribution to the image. The threshold may be one of the following:
1. A single global threshold
2. An adaptive threshold for each sub image
3. A variable threshold as a function of the location for each coefficient in the sub
image.
4. The thresholding and quantization process can be combined.
If z(u, v) denotes the threshold mask, the thresholded transform array is T'(u, v) = T(u, v) x z(u, v).
The inverse transform of T' gives an approximation of the decompressed image. It is assumed that the coefficients of largest magnitude make the most significant contribution; their locations vary from one mask to another. The mask has 1 in the places of the retained (above-threshold) coefficients and 0 elsewhere.
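The following Python sketch ties the stages together for 8x8 sub-images: it applies a DCT to each block, keeps only the largest-magnitude coefficients through a threshold mask, and inverts the transform; the block size, the number of retained coefficients, and the test image are illustrative.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix, so a 2D DCT of a block is C @ block @ C.T."""
    k = np.arange(n)
    C = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0, :] *= np.sqrt(1 / n)
    C[1:, :] *= np.sqrt(2 / n)
    return C

def block_transform_code(image, block=8, keep=10):
    """Transform each block x block sub-image with the DCT, keep only the
    'keep' largest-magnitude coefficients (threshold mask), and invert."""
    C = dct_matrix(block)
    out = np.zeros_like(image, dtype=float)
    for r in range(0, image.shape[0], block):
        for c in range(0, image.shape[1], block):
            sub = image[r:r + block, c:c + block].astype(float)
            T = C @ sub @ C.T                                  # forward transform
            thresh = np.sort(np.abs(T), axis=None)[-keep]      # per-block threshold
            mask = (np.abs(T) >= thresh).astype(float)         # 1 where coefficient is kept
            out[r:r + block, c:c + block] = C.T @ (T * mask) @ C   # inverse transform
    return out

img = np.tile(np.linspace(0, 255, 16), (16, 1))    # smooth 16x16 test image
approx = block_transform_code(img, block=8, keep=10)
print("max reconstruction error:", np.abs(img - approx).max())
```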

Image segmentation
Image segmentation has emerged as an important phase in image based applications.
Segmentation is the process of partitioning a digital image into multiple regions and extracting a
meaningful region known as Region of Interest (ROI). Regions of interest vary with applications.
For example, if the goal of a doctor is to analyse the tumour in a computed tomography (CT)
image, then the tumour in the image is the ROI. Similarly if the image application aims to
recognize the iris in an eye image then the iris in the eye image is the required ROI. Segmentation
of ROI in real world images is the first major hurdle for effective implementation of image
processing applications as the segmentation process is often difficult. Hence, the success or
failure of the extraction of ROI ultimately influences the success of image processing
applications. No single universal segmentation algorithm exists for segmenting the ROI in all
images. Therefore, the user has to try many segmentation algorithms and pick an algorithm that
performs the best for the given requirement.

Image segmentation algorithms are based on either the discontinuity principle or the similarity principle. The idea behind the discontinuity principle is to extract regions that differ in properties such as intensity, colour, texture, or any other image statistic; abrupt changes in intensity among the regions mostly result in the extraction of edges. The idea behind the similarity principle is to group pixels based on a common property, to extract a coherent region.
Formal definition of image segmentation
An image can be partitioned into many regions R1, R2 , R3 …..Rn. For example the image R
in figure is divided into three sub regions R1 , R2 and R3 as shown in the figure. A sub region or sub
image is a portion of the whole region R. The identified sub-regions should exhibit characteristics
such as uniformity and homogeneity with respect to colour, texture, intensity or any other statistical
property. In addition the boundaries that separate the regions should be simple and clear.
The characteristics of the segmentation process are the following:
1. If the sub regions are combined, the original region can be obtained. Mathematically, it can be stated that ∪Ri = R for i = 1, 2, ..., n. For example, if the three regions R1, R2 and R3 of the figure are combined, the whole region R is obtained.
2. The sub regions Ri should be connected. In other words, a region cannot be open ended during the tracing process.
3. The regions R1, R2, ..., Rn are disjoint and do not share any common property. Mathematically, it can be stated as Ri ∩ Rj = ∅ for all i and j where i ≠ j. Otherwise there is no justification for the regions to exist separately.
4. Each region satisfies a predicate or a set of predicates, such as intensity or other image statistics; that is, the predicate P can be colour, grey scale value, texture, or any other image statistic. Mathematically this is stated as P(Ri) = TRUE.

Classification of Image segmentation algorithms


There are different ways of classifying the segmentation algorithms. Figure illustrates these
ways. One way is to classify the algorithms based on user interaction required for extracting the
ROI. Another way is to classify them based on the pixel relationships.

Based on user interactions, the segmentation algorithms can be classified into the following three
categories:
1. Manual
2. Semi-automatic
3. Automatic
The words ‘algorithm’ and ‘method’ can be used interchangeably. In the manual method,
the object of interest is observed by an expert who traces its ROI boundaries as well, with the help
of software. Hence, the decisions related to segmentation are made by the human observers. Many
software systems assist experts in tracing the boundaries and extracting them. By using the
software systems, the experts outline the object. The outline can be either an open or closed
contour.
Boundary tracing is a subjective process, and hence variations exist among the opinions of different experts in the field, leading to problems in reproducing the same results. In addition, a manual method of extraction is time consuming, highly subjective, prone to human error, and has poor intra-observer reproducibility. However, manual methods are still commonly used by experts to verify and validate the results of automatic segmentation algorithms.
Automatic segmentation algorithms are a preferred choice as they segment the structures of
the objects without any human intervention. They are preferred if the tasks need to be carried out
for a larger number of images.
Semi-automatic algorithms are a combination of automatic and manual algorithms. In semi-
automatic algorithms, human intervention is required in the initial stages. Normally the human
observer is supposed to provide the initial seed points indicating the ROI. Then the extraction
process is carried out automatically as dictated by the logic of the segmentation algorithm. Region
growing techniques are semi-automatic algorithms where the initial seeds are given by the human
observer in the region that needs to be segmented. However the program process is automatic.
These algorithms can be called assisted manual segmentation algorithms.
Another way of classifying the segmentation algorithm is to use the criterion of the pixel
similarity relationships with neighbouring pixels. The similarity relationships can be based on
colour, texture, brightness, or any other image statistics. On this basis, segmentation algorithms
can be classified as follows:
1. Contextual (region-based or global) algorithms
2. Non contextual (pixel-based or local) algorithms

Contextual algorithms group pixels together based on common properties by exploiting the
relationships that exist among the pixels. These are also known as region-based or global
algorithms. In region-based algorithms, the pixels are grouped based on some sort of similarity that
exists between them. Non-contextual algorithms are also known as pixel based or local
algorithms. These algorithms ignore the relationship that exists between the pixels or features.
Instead, the idea is to identify the discontinuities that are present in the image such as isolated lines
and edges. These are then simply grouped into a region based on some global level property.
Intensity-based thresholding is a good example of this method.

Detection of discontinuities
The three basic types of grey level discontinuities in a digital image are the following:
1. Points
2. Lines
3. Edges

1) Point detection
An isolated point is a point whose grey level is significantly different from its background in
a homogeneous area. A generic 3x3 spatial mask is shown in the figure.

The mask is superimposed onto the image and the convolution process is applied. The response of the mask at any location is given as
R = w1 f1 + w2 f2 + ... + w9 f9 = Σ wk fk
where the wk values are the mask coefficients and the fk values are the grey levels of the pixels under the mask. A threshold value T is used to identify the points. A point is said to be detected at the location on which the mask is centred if |R| ≥ T, where T is a non-negative integer. The values of a point detection mask are shown in the figure.
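A minimal Python sketch of point detection is given below; it assumes the commonly used 3x3 mask with 8 at the centre and -1 elsewhere, and the test image and threshold are illustrative.

```python
import numpy as np

def point_detect(image, T):
    """Apply a 3x3 point-detection mask and flag locations where |R| >= T."""
    mask = np.array([[-1, -1, -1],
                     [-1,  8, -1],
                     [-1, -1, -1]], dtype=float)
    h, w = image.shape
    points = np.zeros((h, w), dtype=bool)
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            R = np.sum(mask * image[r - 1:r + 2, c - 1:c + 2])   # mask response
            points[r, c] = abs(R) >= T
    return points

img = np.full((7, 7), 10.0)
img[3, 3] = 200.0                              # isolated bright point on a flat background
print(np.argwhere(point_detect(img, T=500)))   # -> [[3 3]]
```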

2) Line Detection
In line detection, four masks are used to obtain the responses R1, R2, R3 and R4 for the vertical, horizontal, +45° and -45° directions, respectively. The masks are shown in the figure.

These masks are applied to the image. The response of the mask is given as

R1 is the response for moving the mask from the left to the right of the image, R2 is the response for moving the mask from the top to the bottom of the image, R3 is the response of the mask along the +45° line, and R4 is the response of the mask with respect to the -45° line. Suppose that at a certain location in the image |Ri| > |Rj| for all j ≠ i; then that location is more likely to be associated with a line in the orientation of mask i. The final response is defined by max{R1, R2, R3, R4}, and the line is associated with that mask. A sample image and the results of the line-detection algorithm are shown in the figure.

3) Edge detection
Edges play a very important role in many image processing applications. They provide an
outline of the object. In the physical plane, edges correspond to the discontinuities in depth, surface
orientation, change in material properties, and light variations. These variations are present in the
image as grey scale discontinuities. An edge is a set of connected pixels that lies on the boundary
between two regions that differ in grey value. The pixels on an edge are called edge points.

A reasonable definition of an edge requires the ability to measure grey level transitions in a meaningful manner. Most edges are unique in space; that is, their position and orientation remain the same when viewed from different points. When an edge is detected, the unnecessary details are removed and only the important structural information is retained. In short, an edge is a local concept that represents only significant intensity transitions. An original image and its edges are shown in the figure.

An edge is typically extracted by computing the derivative of the image function. This consists of two parts: the magnitude of the derivative, which is an indication of the strength or contrast of the edge, and the direction of the derivative vector, which is a measure of the edge orientation. Some of the edge types that are normally encountered in image processing are as follows:
1. Step edge
2. Ramp edge
3. Spike edge
4. Roof edge

These are shown in the figure.

1. Step edge is an abrupt intensity change.
2. Ramp edge represents a gradual change in intensity.
3. Spike edge represents a quick change that immediately returns to the original intensity level.
4. Roof edge represents a change that is not instantaneous but occurs over a short distance.

Edge linking algorithms


Edge detectors often do not produce continuous edges. Often the detected edges are not
sharp and continuous due to the presence of noise and intensity variations. Therefore, the idea of
edge linking is to use the magnitude of the gradient operator to detect the presence of edges and
to connect it to a neighbour to avoid breaks.

Continuity is ensured by techniques such as hysteresis thresholding and edge relaxation.


Adjacent pixels (x, y) and (x', y') are linked if they have similar properties, namely the difference in their gradient magnitudes does not exceed a magnitude threshold E and the difference in their gradient directions does not exceed A, where A is the angular threshold. Edge linking is a post-processing technique that is used to link edges.

The idea of using edge detection algorithms to extract the edges and using an appropriate
threshold for combining them is known as edge elements extraction by thresholding. The
threshold selection can be static, dynamic or adaptive. Usage of edge detection, thresholding and
edge linking requires these algorithms to work interactively to ensure the continuity of the edges.
Edge relaxation
Edge relaxation is a process of re-evaluating the pixel classification using its context. Cracks are the differences between adjacent pixels. The crack edges for the horizontal direction are |f(x, y) - f(x+1, y)| and for the vertical direction are |f(x, y) - f(x, y-1)|.

Graph theoretic algorithms


The graph theoretic approach is quite popular in machine vision as the problem is stated
as the optimization of some criteria function. The idea is to construct a graph of an image where
the nodes represent the pixel corners of the graph and the edges represent the pixel cracks.
Each edge is assigned a cost based on the gradient operator. The cost of the edge is high for
improbable edges and very low for potential edges such as cracks. The problem of finding an
optimal boundary is considered as a problem of finding minimum cost paths. The components of
this algorithm are as follows.
1. Forming the graph
2. Assigning cost functions
3. Identifying the start and end points
4. Finding the minimum cost path.

Principle of thresholding
Thresholding is a very important technique for image segmentation. It produces uniform
regions based on a threshold criterion T. The thresholding operation can be thought of as an operation of the form T = T{x, y, A(x, y), f(x, y)}, where f(x, y) is the grey level of the pixel at (x, y) and A(x, y) is a local property of the image. If the thresholding operation depends only on the grey scale values, it is called global thresholding. If the neighbourhood property is also taken into account, the method is called local thresholding. If T depends on the pixel coordinates as well, it is called dynamic thresholding.
Histogram and threshold
The quality of the thresholding algorithm depends on the selection of a suitable threshold.
The selection of an appropriate threshold is a difficult process. Figures below show the effect of
the threshold value on an image.

The tool that helps to find the threshold is histogram. Histograms are of two types
1. Unimodal histogram
2. Multimodal histogram

If a histogram has one central peak, it is called unimodal histogram. On the other hand a
multimodal histogram has multiple peaks. A bimodal histogram is a special kind of histogram that
has two peaks separated by a valley. Using the valley, a suitable threshold can be selected to
segment the image. Thus the peak and the valley between the peaks are indicators for selecting the
threshold. This process is difficult for unimodal histograms and more difficult if the foreground and background pixels overlap. Some of the techniques that can be followed for selecting the
threshold values are as follows:
1. Random selection of the threshold value
2. If the ROI is brighter than the background object, find the cumulative histogram. The histogram is a probability distribution p(g) = ng/n, where ng is the number of pixels having the grey scale value g and n is the total number of pixels. The cumulative histogram is thus given by
c(g) = Σ p(i), with the sum taken over i = 0, ..., g
Set the threshold T such that c(T) = 1/p. If we are looking for dark objects on a white background, the threshold would be chosen such that c(T) = 1 - (1/p).

Selecting the threshold for bimodal images is an easy task, since the valley indicates the threshold value. The process is difficult for unimodal histograms and harder still if the background and foreground pixels overlap.
Noise can also affect histograms. Noise can create spikes, and the presence of data from two different distributions may cause problems. Data belonging to different distributions may produce a histogram with no distinct modes, which complicates the process of finding thresholds.
Global thresholding is difficult when the image contrast is very poor. The background of
the image can cause a huge problem too. In addition textures and colours tend to affect the
thresholding process. Similarly if the lighting conditions are not good, thresholding may yield poor
results.

Global thresholding algorithms


Bimodal images are those where the histograms have two distinct peaks separated by a valley
between them. The valley point is chosen as a threshold T. Then the pixels of the given image
(f(x,y)) are compared with the threshold. If the pixel values are greater than or equal to the
threshold value, the pixel is assigned a value of 1. Otherwise, it is assigned a value of 0, giving
the output threshold image g(x,y)
The threshold process is given as
g(x, y) = 1 if f(x, y) ≥ T, and g(x, y) = 0 otherwise
One of the biggest problems is choosing an appropriate threshold value.
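The following Python sketch applies the global threshold rule above and also shows one simple (heuristic) way of picking T from a bimodal histogram; the peak-suppression window and the synthetic image are illustrative.

```python
import numpy as np

def global_threshold(image, T):
    """g(x, y) = 1 where f(x, y) >= T, and 0 otherwise."""
    return (image >= T).astype(np.uint8)

def bimodal_threshold(image, bins=256):
    """Heuristic threshold selection for a clean bimodal histogram: find the
    two largest peaks and return the valley (lowest bin) between them."""
    hist, edges = np.histogram(image, bins=bins, range=(0, 256))
    p1 = int(np.argmax(hist))                          # highest peak
    masked = hist.copy()
    masked[max(0, p1 - 30):min(bins, p1 + 30)] = 0     # suppress bins near the first peak
    p2 = int(np.argmax(masked))                        # second peak
    a, b = sorted((p1, p2))
    valley = a + int(np.argmin(hist[a:b + 1]))         # lowest bin between the peaks
    return edges[valley]

rng = np.random.default_rng(1)
img = np.concatenate([rng.normal(60, 10, 500), rng.normal(180, 10, 500)])
img = np.clip(img, 0, 255).reshape(25, 40)             # synthetic bimodal "image"
T = bimodal_threshold(img)
print("selected threshold:", T)
print("foreground pixels :", int(global_threshold(img, T).sum()))
```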


Multiple thresholding
If there are more than two classes of objects in the image, multiple thresholding should be
used. This method is an extension of the simple thresholding technique. Here fi is the value of the input image pixel and t1, t2, ..., tn are the multiple threshold values. The values of the output image are given as g1, g2, ..., gn.

Adaptive Thresholding Algorithm


The adaptive algorithm is also known as the dynamic thresholding algorithm. Ideally, in dynamic thresholding, the image is divided into many overlapping sub-images. The histograms of all the sub-images are constructed and the local thresholds are obtained. Then the threshold value is obtained by interpolating the results of the sub-images. However, the problem with this approach is that it is computationally very intensive and takes a lot of time. Hence, this approach is not suitable for real applications.

Another way of applying adaptive thresholding is to split the image into many subregions and to compute the image statistics locally for each subregion. Useful local statistics for choosing the threshold include the subregion mean plus a constant c, the subregion median, and the mid-grey value (Max + Min)/2, where Max and Min correspond to the maximum and minimum pixel values of the subregion, respectively, and c is a constant.

If the subregions are large enough to cover both foreground and background areas, this approach works well. In addition, adaptive thresholding is good in situations where the image is affected by non-uniform illumination. Figure (a) shows an original image with non-uniform illumination. The result of the adaptive algorithm is shown in figure (b). It can be observed that this result is far better than the result produced by the global thresholding algorithm for figure (a).

The reason is that global thresholding algorithms do not work well if the image has
illumination problems.
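A minimal Python sketch of block-wise adaptive thresholding is given below; the block size, the local statistic (mean plus a constant c), and the synthetic ramp image are illustrative choices.

```python
import numpy as np

def adaptive_threshold(image, block=16, c=15):
    """Threshold each block x block subregion with its own local statistic
    (here the local mean plus a constant c; other statistics such as the
    median or (Max + Min)/2 can be substituted)."""
    h, w = image.shape
    out = np.zeros((h, w), dtype=np.uint8)
    for r in range(0, h, block):
        for col in range(0, w, block):
            sub = image[r:r + block, col:col + block]
            T = sub.mean() + c                                   # local threshold
            out[r:r + block, col:col + block] = (sub >= T).astype(np.uint8)
    return out

# Synthetic image with non-uniform illumination: a ramp plus two bright squares.
img = np.tile(np.linspace(0, 100, 64), (64, 1))
img[10:20, 10:20] += 40
img[40:50, 45:55] += 40
binary = adaptive_threshold(img)
print("foreground pixels:", int(binary.sum()))   # roughly the area of the two squares
```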

First-order Edge Detection Operators


Local transitions among different image intensities constitute an edge. Therefore, the aim is to measure the intensity gradients, and edge detectors can be viewed as gradient calculators. Based on differential geometry and vector calculus, the gradient operator is represented as ∇ = (∂/∂x, ∂/∂y). Applying this to the image f, one gets ∇f = (∂f/∂x, ∂f/∂y).
The difference between the pixels is quantified by the gradient magnitude. The direction of the greatest change is given by the gradient vector, which gives the direction of the edge. Since the gradient is defined for continuous functions, discrete approximations obtained by taking differences are used; in one dimension, ∂f/∂x ≈ [f(x + Δx, y) - f(x, y)]/Δx, where Δx and Δy are the movements in the x and y directions, respectively.

Roberts operator
Let f(x, y) and f(x+1, y) be neighbouring pixels. The difference between the adjacent pixels is obtained by applying the mask [1 -1] directly to the image, which gives f(x+1, y) - f(x, y).
Roberts's kernels are derivatives with respect to the diagonal elements. Hence they are called cross-gradient operators, as they are based on the cross-diagonal differences gx = f(x+1, y+1) - f(x, y) and gy = f(x+1, y) - f(x, y+1). The Roberts gradient magnitude is then sqrt(gx^2 + gy^2).
Since the magnitude calculation involves a square root operation, the common practice is to approximate the gradient with absolute values, which are simpler to implement, as |gx| + |gy|.
Prewitt operator
The Prewitt method takes the central difference of the neighbouring pixels; this difference can be represented mathematically as [f(x+1, y) - f(x-1, y)]/2.
The central difference can be obtained using the mask [-1 0 +1]. This method is very sensitive to noise; hence, to reduce the effect of noise, the Prewitt method performs some averaging. The Prewitt approximation using a 3 x 3 mask is as follows:

Sobel operator
The Sobel operator also relies on central differences. It can be viewed as an approximation of the first Gaussian derivative; this is equivalent to the first derivative of a Gaussian-blurred image, obtained by applying a 3 x 3 mask to the image, since convolution is both commutative and associative.
The edge masks can be extended to 5x5, 7x7, etc. An extended mask generally gives a smoother response. A sketch of the Sobel operator is given below.
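The sketch below applies the 3x3 Sobel masks directly (a slow but explicit loop) and uses the absolute-value approximation of the gradient magnitude; the test image with a vertical step edge is illustrative.

```python
import numpy as np

def sobel_gradients(image):
    """Apply the 3x3 Sobel masks and return the gradient magnitude
    (using the |gx| + |gy| approximation) and the gradient direction."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)   # horizontal change
    ky = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=float)   # vertical change
    h, w = image.shape
    gx, gy = np.zeros((h, w)), np.zeros((h, w))
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            window = image[r - 1:r + 2, c - 1:c + 2]
            gx[r, c] = np.sum(kx * window)
            gy[r, c] = np.sum(ky * window)
    magnitude = np.abs(gx) + np.abs(gy)          # cheap alternative to sqrt(gx^2 + gy^2)
    direction = np.arctan2(gy, gx)
    return magnitude, direction

img = np.zeros((8, 8))
img[:, 4:] = 100.0                               # vertical step edge
mag, _ = sobel_gradients(img)
print(np.argwhere(mag > 0)[:, 1])                # edge responses at columns 3 and 4
```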
Second-order Derivative Filters
Edges are considered to be present in the first derivative when the edge magnitude is large
compared to the threshold value. In the case of the second derivative, the edge pixel is present at a
location where the second derivative is zero. This means that f’’(x) has a zero crossing which can
be observed as a sign change in pixel differences. The Laplacian algorithm is one such zero-crossing
algorithm.
However, the problems of the zero-crossing algorithms are many. The problem with
Laplacian masks is that they are sensitive to noise as there is no magnitude checking – even a small
ripple causes the method to generate an edge point. Therefore, it is necessary to filter the image
before the edge detection process is applied. This method produces two-pixel-thick edges, although generally one-pixel-thick edges are preferred. However, the advantage is that there is no need for
the edge thinning process as the zero-crossings themselves specify the location of the edge points.
The main advantage is that these operators are rotationally invariant.
The second-order derivative operator is
∇² = ∂²/∂x² + ∂²/∂y²
This ∇² operator is called the Laplacian operator. The Laplacian of the 2D function f(x, y) is defined as
∇²f = ∂²f/∂x² + ∂²f/∂y²

Since the gradient is a vector, two orthogonal filters are required; however, since the Laplacian operator is a scalar, a single mask is sufficient for the edge detection process. A discrete estimate of the Laplacian is
∇²f ≈ f(x+1, y) + f(x-1, y) + f(x, y+1) + f(x, y-1) - 4 f(x, y)
The Laplacian masks are shown below. The mask shown in figure (a) is sensitive to horizontal and vertical edges; it can be observed that its elements sum to zero. To recognize diagonal edges, the mask shown in figure (b) is used, which is obtained by rotating the mask of figure (a) by 45°. The addition of these two kernels results in a variant of the Laplacian mask, shown in figure (c). Subtracting two times the mask of figure (a) from the mask of figure (b) yields another variant mask, shown in figure (d).

Canny edge detection


The Canny approach is based on optimizing the trade-off between the following performance criteria:
1. Good edge detection – The algorithm should detect only the real edge points and discard
all false edge points.
2. Good edge localization – The algorithm should have the ability to produce edge points
that are closer to the real edges.
3. Only one response to each edge – The algorithm should not produce any false, double,
or spurious edges.

The Canny edge detection algorithm is given as follows:

1. First convolve the image with a Gaussian filter to suppress noise.
2. Compute the gradient magnitude and direction, and thin the edges using non-maximum suppression.
3. Apply hysteresis thresholding with a low and a high threshold to link the edge points.
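A minimal sketch of these three steps using OpenCV's implementation is shown below; it assumes the opencv-python package is available, and the Gaussian kernel size and the hysteresis thresholds are illustrative.

```python
# A minimal sketch of the three Canny steps using OpenCV (assumes the
# opencv-python package is installed; the thresholds are illustrative).
import cv2
import numpy as np

image = np.zeros((64, 64), dtype=np.uint8)
cv2.rectangle(image, (16, 16), (48, 48), 255, -1)     # a bright square as test object

# Step 1: smooth with a Gaussian filter to suppress noise.
smoothed = cv2.GaussianBlur(image, (5, 5), sigmaX=1.4)

# Steps 2 and 3: cv2.Canny internally computes the gradient, thins the edges
# with non-maximum suppression, and applies hysteresis thresholding with the
# low and high thresholds given below.
edges = cv2.Canny(smoothed, threshold1=50, threshold2=150)
print("edge pixels:", int(np.count_nonzero(edges)))   # outline of the square
```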
