DIP Unit 4
The two main components of the image compression model are the encoder and the
decoder. The source (or symbol) encoder takes the symbols from the input data, removes the
redundancies, and passes the result to the channel encoder, which sends the data across the channel.
The decoder has two parts, namely the channel decoder and the symbol (or source) decoder. If the
channel is noise-free, the channel encoder and decoder can be omitted.
Compression Measures
Compression ratio: The compression ratio compares the size of the original data (N1) with the size
of the compressed data (N2) and is expressed as N1:N2. A common compression ratio is 4:1, which is
interpreted as 4 pixels of the input image being represented by 1 pixel in the output image.
Savings percentage is defined as the fraction of data eliminated by compression, (N1 - N2)/N1,
usually expressed as a percentage.
Bit rate: Bit rate describes the rate at which bits are transferred from the sender to the receiver and
indicates the efficiency of the compression algorithm by specifying how much data is transmitted
in a given amount of time. It is often given as bits per second (bps), kilobits per second (Kbps), or
megabits per second (Mbps). For a stored image, the bit rate specifies the average number of bits
per pixel and is given as

bit rate = (total number of bits) / (total number of pixels)   (bits per pixel)
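As a quick illustrative sketch (not part of the original notes), the following Python snippet computes these three measures for a hypothetical 256 x 256, 8-bit image compressed to an assumed size:

```python
def compression_ratio(n1_bits, n2_bits):
    """Compression ratio N1:N2, where N1 is the original and N2 the compressed size."""
    return n1_bits / n2_bits

def savings_percentage(n1_bits, n2_bits):
    """Percentage of the original data removed by compression."""
    return (n1_bits - n2_bits) / n1_bits * 100.0

def bit_rate(total_bits, num_pixels):
    """Average number of bits per stored pixel."""
    return total_bits / num_pixels

original_bits = 256 * 256 * 8     # hypothetical 256 x 256, 8-bit image
compressed_bits = 131072          # hypothetical compressed size

print(compression_ratio(original_bits, compressed_bits))   # 4.0, i.e. a 4:1 ratio
print(savings_percentage(original_bits, compressed_bits))  # 75.0 (per cent)
print(bit_rate(compressed_bits, 256 * 256))                # 2.0 bits per pixel
```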
Types of Redundancies
Redundancy means repetitive data. This may be data that share some common
characteristics or overlapped information. This redundancy may be present implicitly or explicitly.
For example, consider the string ‘aaaaab’
This information has redundancy where the character a is repeated five times. The message can be
conveyed simply as {(a, 5), b}, implying that the character a occurs five times and b occurs once.
This reduces the information to a compact form. This is useful when the repetition is large.
Similarly, redundancy can be present in images. Consider the following image.
It can be observed that the pixel value 10 is repeated seven times. In this case, the
redundancy is explicit. The redundancy may be implicit also. Consider the image.
This image can be split into two images by combining the LSBs to form one image and the
MSBs to form another image.
The first image has all 0s and the second image has two 0s and two 1s. These redundancies
can be exploited and the image data can be reduced. The types of redundancies in images are
1) Coding Redundancy
High probabilities are associated with events that occur frequently, and low probabilities with
events that occur rarely. Based on this uncertainty, information can be measured and quantified. The
amount of uncertainty associated with an event is called the self-information of the event, also
known as its information content, and its unit is bits. For an event E with probability P(E), the
self-information is I(E) = -log2 P(E). The information content is thus inversely related to the
probability: when the probability of the event is 1, the information content is 0, and as the
probability approaches 0, the information content grows without bound.
In images, the grey values of pixels are not equally probable, and a value is often related to
those of its neighbours. A pixel grey level k is therefore modelled as a random variable rk, and each
random variable is associated with a probability p(rk). Coding redundancy is caused by a poor
choice of coding technique. A coding scheme assigns a unique code to every symbol of the
message. Binary coding is a good example of a coding scheme where only two code symbols {0,1}
are used. So the message is mapped to a codeword containing 0s and 1s. There are many coding
schemes available. ASCII (American standard code for information interchange) is a 7-bit code.
Grey code is another popular code. When the mapping between the source symbols and the code
is fixed, such a code is called a block code. A code is uniquely decodable if every encoded string
can be decoded in only one way. If the codewords of a block code are distinct, it is called a
non-singular code. If decoding is possible without the knowledge of the succeeding codewords, it is
known as an instantaneous code.
The Huffman code is a variable-length coding technique and uses fewer bits to encode
the same information. Hence it can be concluded that variable-length coding is better than fixed
binary coding. A wrong choice of code therefore creates unnecessary additional bits, and these extra
bits are called redundancy. Thus coding redundancy is given as

Coding redundancy = Lavg - Entropy, where Lavg = sum over k of l(rk) p(rk)

Here rk represents the grey levels, p(rk) is the probability of the pixels having grey level rk, and
l(rk) is the length of the code assigned to rk.
Entropy of an image denotes the minimum number of bits required to code the message. Thus
coding redundancy is removed by using good coding schemes including variable length codes such
as Huffman coding and Shannon-Fano coding algorithm.
It can be observed that Huffman coding is closer to the entropy. Hence it can be concluded
that variable length coding is better than fixed binary coding.
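The relationship between entropy, average code length and coding redundancy can be sketched numerically; the probabilities and code lengths below are hypothetical and only illustrate the definitions above:

```python
import numpy as np

# Hypothetical probabilities p(rk) of four grey levels, with the code lengths l(rk)
# of a fixed 2-bit binary code and of a variable-length (Huffman-style) code.
p = np.array([0.5, 0.25, 0.125, 0.125])
fixed_lengths = np.array([2, 2, 2, 2])
variable_lengths = np.array([1, 2, 3, 3])

entropy = -np.sum(p * np.log2(p))              # minimum average bits per symbol (1.75)
l_avg_fixed = np.sum(p * fixed_lengths)        # average length of the fixed code (2.0)
l_avg_variable = np.sum(p * variable_lengths)  # average length of the variable code (1.75)

print("coding redundancy, fixed code   :", l_avg_fixed - entropy)     # 0.25 bit/symbol
print("coding redundancy, variable code:", l_avg_variable - entropy)  # 0.0 bit/symbol
```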
4) Chromatic redundancy
Chromatic redundancy refers to the presence of unnecessary colour information in an image. The
colour channels of colour images are highly correlated, and the human visual system cannot
distinguish all of the millions of colours a colour image can contain. Hence the colours that are not
perceived by the human visual system can be removed without visibly affecting the quality of the image.
Fidelity refers to the accurate reproduction of data. The difference between the original and the
reconstructed images is called distortion, and the amount of distortion must be assessed. Objective
fidelity measures include error, SNR, and PSNR. Subjective assessment works the way factors such as
appearance, brand name, or customer care guide the selection of a product: image quality is judged
by human observers using a subjective picture quality scale.
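A small sketch of the objective measures mentioned above (mean-squared error and PSNR); the peak value of 255 assumes an 8-bit image and the sample arrays are hypothetical:

```python
import numpy as np

def mse(original, reconstructed):
    """Mean squared error between the original and the reconstructed image."""
    diff = original.astype(np.float64) - reconstructed.astype(np.float64)
    return np.mean(diff ** 2)

def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB (peak = 255 for an 8-bit image)."""
    error = mse(original, reconstructed)
    if error == 0:
        return float("inf")          # identical images: no distortion at all
    return 10.0 * np.log10(peak ** 2 / error)

# Hypothetical 8-bit image and a slightly distorted reconstruction.
original = np.array([[10, 20], [30, 40]], dtype=np.uint8)
reconstructed = np.array([[12, 20], [30, 38]], dtype=np.uint8)
print(mse(original, reconstructed), psnr(original, reconstructed))
```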
1) Entropy coding
The logic behind this category is that if the pixel values are not uniformly distributed, an
appropriate coding scheme can be selected so that the average number of bits per symbol approaches
the entropy. Entropy specifies the minimum number of bits required to encode the information.
Hence the coding is based on the entropy of the source and on the probability of occurrence of the
symbols. This leads to the idea of variable length coding. Some examples of this type of coding are
Huffman coding, arithmetic coding, and dictionary based coding.
2) Predictive coding
The idea behind predictive coding is to remove the mutual dependency between successive pixels and
then perform the encoding. Normally the sample values themselves are large, but the differences
between successive samples are small.
For example, let us assume that the following pixels need to be transmitted:
Compression Algorithms - 1
1) Lossless compression algorithms
Lossless compression algorithms preserve information and the compression process incurs
no data loss. Hence these algorithms are used in domains where reliability and preservation of
data are crucial. However the compression ratio of these algorithms is small in comparison to the
lossy compression algorithms. Some popular lossless compression algorithms are
1. Run-length coding
2. Huffman coding
3. Shannon-Fano coding
4. Arithmetic coding
5. Dictionary-based coding
1) Run-length coding
Run-length coding (RLC) exploits the repetitive nature of the image. It identifies runs of identical
pixel values and encodes the image as a sequence of runs. Each row of the image is written as a
sequence, and each sequence is then represented as a run of black or white pixels together with its
length. This is called run-length coding and is an effective way of compressing an image. If
necessary, the run lengths themselves can be compressed further using variable-length coding.
In the example, the maximum run length is five, which requires three bits in binary. The total
number of runs (vectors) is six, and the number of bits per pixel value is one, as the pixels of the
image are either 0 or 1. Therefore, the total number of bits required is 6 x (3 + 1) = 24. The total
number of bits of the original image is 5 x 5 = 25. Therefore, the compression ratio is
25/24, that is, 1.042:1.
The scan line can be changed. This change affects the compression ratio. Vertical line scanning
of the same image yields
It can be observed that the compression achieved here is significantly lower than that of the
previous scheme. The scan line can also be changed to a zigzag line as shown below.
Zigzag scanning yields,
It can be observed that the compression ratio changes with the scan line.
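A minimal run-length encoder in Python, applied row by row; since the binary image of the worked example above is not reproduced here, the 5 x 5 image below is hypothetical:

```python
import numpy as np

def run_length_encode(row):
    """Encode one scan line of a bi-level image as (value, run length) pairs."""
    runs = []
    current, count = row[0], 1
    for pixel in row[1:]:
        if pixel == current:
            count += 1              # the run continues
        else:
            runs.append((current, count))
            current, count = pixel, 1
    runs.append((current, count))
    return runs

# Hypothetical 5 x 5 bi-level image, scanned with horizontal scan lines.
image = np.array([[0, 0, 0, 1, 1],
                  [0, 0, 1, 1, 1],
                  [0, 1, 1, 1, 1],
                  [1, 1, 1, 1, 0],
                  [1, 1, 1, 0, 0]])
for row in image:
    print(run_length_encode(list(row)))
```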
2) Huffman coding
Huffman coding is a type of variable length coding. In Huffman coding, the coding redundancy
can be eliminated by choosing a better way of assigning the codes. The Huffman coding algorithm
is given as follows.
1) List the symbols and sort them according to their probabilities.
2) Pick two symbols having the least probabilities.
3) Create a new node. Add the probabilities of the symbols selected in step 2 and label the
new node with it.
4) Repeat steps 2 and 3 till only one node remains
5) Assign code 0 to the left branch and code 1 to the right branch.
6) Trace the path from the root to each leaf to read off the code for that symbol.
The running time of the algorithm is O(n log n). The Huffman tree is
shown.
The problem with the Huffman codes is that they are not unique. The data given can be
differently combined to yield the result shown.
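A compact sketch of the construction steps above using Python's heapq module; the symbol probabilities are hypothetical and, as noted, different tie-breaking choices can give different but equally valid codes:

```python
import heapq

def huffman_codes(probabilities):
    """Repeatedly merge the two least probable nodes (steps 2-4 above) and
    label the left branch 0 and the right branch 1 (step 5)."""
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probabilities.items())]
    heapq.heapify(heap)
    counter = len(heap)                       # tie-breaker so heap entries stay comparable
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)     # the two nodes with the least probabilities
        p2, _, right = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in left.items()}         # 0 for the left branch
        merged.update({sym: "1" + code for sym, code in right.items()})  # 1 for the right branch
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

# Hypothetical source with four symbols.
codes = huffman_codes({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125})
print(codes)    # {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
```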
3) Huffman decoder
The procedure for decoding as implemented in the Huffman decoder is as follows:
1. Read the coded message bit by bit. Start from the root of the Huffman tree.
2. If the read bit is 0, move to the left. Otherwise move to the right of the tree.
3. Repeat these steps until a leaf is reached. Then output the symbol for that leaf and start again
from the root.
4. Repeat steps 1-3 till the end of the message.
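The decoding walk can be sketched as follows; because a Huffman code is instantaneous (prefix-free), matching the accumulated bits against the code table is equivalent to walking the tree from the root. The codes and the bit string are the hypothetical ones used in the sketch above:

```python
def huffman_decode(bitstring, codes):
    """Steps 1-4 above: read bits, and whenever a complete codeword (a leaf)
    is recognised, output its symbol and start again from the root."""
    leaves = {code: sym for sym, code in codes.items()}
    decoded, current = [], ""
    for bit in bitstring:
        current += bit
        if current in leaves:          # a leaf has been reached
            decoded.append(leaves[current])
            current = ""               # start again from the root
    return "".join(decoded)

codes = {"a": "0", "b": "10", "c": "110", "d": "111"}
print(huffman_decode("0101100111", codes))   # 'abcad'
```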
4) Shannon-Fano coding
The difference between Shannon-Fano coding and Huffman coding is that the binary tree
construction is top-down in the former. The whole alphabet of symbols is present at the root. A node
is then split into two halves according to the probabilities of its symbols, one half corresponding
to the left child and the other to the right child. This process is repeated recursively until an
entire tree is constructed. Then 0 is assigned to the left half and 1 to the right half.
The steps of the Shannon-Fano algorithm are as follows:
1. List the symbols in a frequency table and sort the table by frequency.
2. Divide the table into two halves such that the two groups have roughly equal total frequency.
3. Assign 0 to the upper half and 1 to the lower half.
4. Recursively repeat the process until each symbol becomes a leaf of the tree.
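A recursive Python sketch of the top-down splitting described above; the split point is chosen so that the two groups have roughly equal total probability, and the symbol list is hypothetical:

```python
def shannon_fano(symbols):
    """symbols is a probability-sorted list of (symbol, probability) pairs."""
    if len(symbols) == 1:
        return {symbols[0][0]: ""}
    total = sum(p for _, p in symbols)
    running, split = 0.0, 1
    for i, (_, p) in enumerate(symbols[:-1], start=1):
        running += p
        if running >= total / 2:       # the two halves are now roughly balanced
            split = i
            break
    codes = {}
    for sym, code in shannon_fano(symbols[:split]).items():
        codes[sym] = "0" + code        # 0 for the upper half
    for sym, code in shannon_fano(symbols[split:]).items():
        codes[sym] = "1" + code        # 1 for the lower half
    return codes

print(shannon_fano([("a", 0.5), ("b", 0.25), ("c", 0.15), ("d", 0.1)]))
# {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
```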
Compression Algorithms – 2
1) Bit plane coding
The idea of RLC can also be extended to multilevel images. This technique splits a multilevel
image into a series of bi-level images; that is, the grey level of an m-bit image can be represented
in the polynomial form

am-1 2^(m-1) + am-2 2^(m-2) + ... + a1 2^1 + a0 2^0
The zeroth-order bit plane is generated by collecting the a0 bits of each pixel, and the first-order
bit plane by collecting the a1 bits of each pixel. Continuing in this manner, the (m-1)th-order bit
plane is generated by collecting all the am-1 bits of each pixel.
Let us assume that the grey scale image is as follows.
The image A can now be divided into three planes using the MSB, the middle bit and the LSB
as follows:
The individual planes of the image can now be compressed using RLC techniques. If a plane is
completely white or black, RLC yields peak performance. However, the disadvantage of this
scheme is that spatially neighbouring grey levels, say 3 and 4, have the binary codes 011 and 100,
which differ in every bit position, so these close neighbours in the spatial domain are not kept
together in any of the planes. Small changes in grey level therefore have a chain effect across the
planes, and the complexity of the bit planes is significantly increased.
To avoid this problem, grey code can be used instead of binary code. In grey code, successive codes
differ in only one bit. For an m-bit value am-1 ... a1 a0, the grey code gm-1 ... g1 g0 is generated
as

gi = ai XOR ai+1 for 0 <= i <= m-2, and gm-1 = am-1
Once the planes are obtained separately, RLC can be applied to the individual planes.
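A short sketch of the bit-plane splitting and the grey-code conversion for a hypothetical 3-bit image (numpy handles the bitwise operations directly):

```python
import numpy as np

def to_gray_code(image):
    """g = b XOR (b >> 1): adjacent grey levels then differ in only one bit plane."""
    return image ^ (image >> 1)

def bit_planes(image, bits=3):
    """Split an integer image into bi-level planes, from a0 (LSB) to am-1 (MSB)."""
    return [(image >> k) & 1 for k in range(bits)]

# Hypothetical 3-bit grey-scale image containing the neighbouring levels 3 and 4.
A = np.array([[3, 3, 4, 4],
              [3, 4, 4, 5],
              [4, 4, 5, 5]], dtype=np.uint8)

for k, plane in enumerate(bit_planes(to_gray_code(A))):
    print("Gray-coded bit plane", k)
    print(plane)    # levels 3 and 4 now differ in a single plane only
```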
Another coding scheme that can be used for the bit plane is constant area coding (CAC). The bit
planes have uniform regions of 1s and 0s. A constant portion of the bit plane can be uniquely
coded using a smaller number of bits. CAC divides the image into a set of blocks of size m x m,
such as 8 x 8 or 16 x 16. There are three types of blocks available.
1. A block of all white pixels
2. A block of all black pixels
3. A block with mixed pixels.
The most probable block is assigned a single code of either 0 or 1. The remaining blocks are
assigned a 2-bit code. White block skipping (WBS) is a scheme where a majority of the blocks
(white) are assigned a single bit. The rest of the blocks including mixed pixel blocks are encoded
together.
2) Arithmetic coding
Arithmetic coding is another popular algorithm and is widely used, like the Huffman coding
technique. The differences between arithmetic and Huffman coding are shown in the table.
Differences between arithmetic coding and Huffman coding
The first character to be encoded is C. This falls in the range 0.9-1.0, so the code will start
within this range. The width of the range is 1.0 - 0.9 = 0.1. This range of 0.1 is now divided among
the symbols according to the given probabilities: each new sub-range is obtained by multiplying the
symbol probability by 0.1. The cumulative probabilities are given in the table.
The next character to be encoded is B. This falls in the range 0.936-0.954. With this character this
string ends. So the final code for string ‘CAB’ is between 0.936 and 0.954. By this logic, any
string can be represented.
B = 0.82 + 0.3 x 0.4 = 0.84
C = 0.84 + 0.4 x 0.4 = 1.0
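The interval narrowing can be sketched in a few lines of Python. The cumulative ranges below (A: 0-0.6, B: 0.6-0.9, C: 0.9-1.0) are an assumption inferred from the numbers quoted above; with them, encoding 'CAB' reproduces the interval 0.936-0.954:

```python
def arithmetic_encode_interval(message, ranges):
    """Narrow [low, high) symbol by symbol; any number inside the final
    interval identifies the whole message."""
    low, high = 0.0, 1.0
    for symbol in message:
        span = high - low
        sym_low, sym_high = ranges[symbol]
        low, high = low + span * sym_low, low + span * sym_high
    return low, high

# Assumed cumulative ranges, chosen to be consistent with the worked example above.
ranges = {"A": (0.0, 0.6), "B": (0.6, 0.9), "C": (0.9, 1.0)}
print(arithmetic_encode_interval("CAB", ranges))   # approximately (0.936, 0.954)
```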
3) Dictionary-based Coding
This is also called Lempel-Ziv-Welch (LZW) coding. The idea behind this coding is to use
a dictionary to store the string patterns that have already been encountered. Indices are used to
encode the repeated patterns. The encoder reads the input string. Then it identifies the recurrent
words and outputs their indices from the dictionary. If a new word is encountered, the word is
sent as output in the uncompressed form and is entered into the dictionary as a new entry. The
advantages of the dictionary-based methods are as follows:
1. They are faster, since encoding and decoding involve only simple dictionary look-ups.
2. These methods are not based on statistics. Thus the quality of the model does not depend on
the distribution of the data.
3. These methods are adaptive in nature.
Encoding
The idea is to collect input characters into a segment and to identify the longest pattern for that
segment that is already present in the dictionary. When extending the segment produces no match, the
extended segment becomes a new entry in the dictionary and the index of the matched pattern is
output. The algorithm is as follows:
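The original listing is not reproduced in these notes; the following Python sketch of a standard LZW encoder follows the description above (the input string is hypothetical):

```python
def lzw_encode(data):
    """Grow a dictionary of previously seen patterns and output the dictionary
    index of the longest match at each step."""
    dictionary = {chr(i): i for i in range(256)}   # initial single-character entries
    next_code = 256
    current = ""
    output = []
    for ch in data:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate                    # keep extending the match
        else:
            output.append(dictionary[current])     # emit the index of the longest match
            dictionary[candidate] = next_code      # new entry for the unseen pattern
            next_code += 1
            current = ch
    if current:
        output.append(dictionary[current])
    return output

print(lzw_encode("ababababa"))   # [97, 98, 256, 258, 257] - repeated patterns become indices
```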
2) Lossless predictive coding
Predictive coding techniques eliminate the inter-pixel dependencies by coding only the new
information in each pixel, which is obtained by taking the difference between the actual and the
predicted values of that pixel. The encoder takes a pixel fn of the input image. The predictor
estimates the anticipated value of that pixel using past inputs (previously processed pixels), and
the estimate is rounded to the nearest integer, denoted f̂n; this is the predicted value. The
prediction error is the difference between the actual and the predicted values,

en = fn - f̂n

This error is sent across the channel. The same predictor is used on the decoder side to predict the
value, and the reconstructed image is obtained as

fn = en + f̂n
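A minimal sketch of the encoder and decoder for one scan line, assuming the simplest previous-pixel predictor (f̂n = fn-1); the pixel values are hypothetical:

```python
import numpy as np

def predictive_encode(row):
    """Send the first pixel as-is, then only the prediction errors en = fn - f̂n."""
    row = np.asarray(row, dtype=np.int32)
    errors = np.empty_like(row)
    errors[0] = row[0]
    errors[1:] = row[1:] - row[:-1]      # small differences between neighbours
    return errors

def predictive_decode(errors):
    """Reconstruct exactly, using the same predictor: fn = en + f̂n."""
    return np.cumsum(errors)

row = np.array([100, 102, 103, 103, 105, 110])   # hypothetical scan line
e = predictive_encode(row)
print(e)                      # [100   2   1   0   2   5]
print(predictive_decode(e))   # [100 102 103 103 105 110]  (lossless round trip)
```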
Compression Algorithms - 3
1) Lossy Compression Algorithms
Lossy compression algorithms, unlike lossless compression algorithms, incur a loss of
information. This loss is called distortion. However, the loss is acceptable in applications that can
tolerate some distortion, and in return the compression ratio of these algorithms is very large.
Some popular lossy compression
algorithms are as follows:
1. Lossy predictive coding
2. Vector quantization
3. Block transform coding
Here the number of bits used for transmission is the same as in the original scheme, but the value 31
is transmitted instead of 41. This loss of information introduces an error, which is what makes the
scheme a lossy compression scheme. The scheme requires only 6 x 6 = 36 bits.
i) Delta modulation
Delta modulation goes one step further by using only one bit to represent the quantized error value,
which can be positive or negative. Here the predictor is defined as

f̂n = α f'n-1

where α is called the prediction coefficient and f'n-1 is the previously reconstructed value. The
quantized error is

e'n = +ς if en > 0, and -ς otherwise

where ς is a positive quantity. This scheme creates problems (slope overload) if significant
transitions in the data are encountered frequently. When ς is varied, the scheme is called adaptive
delta modulation.
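A hedged sketch of the scheme, with α = 1 and an assumed step size ς = 4; the sample values are hypothetical, and the large jump illustrates the slope-overload problem mentioned above:

```python
def delta_modulate(samples, alpha=1.0, zeta=4.0):
    """Predictor f̂n = alpha * (previous reconstructed value); the error is
    quantized to a single bit and reconstructed as +zeta or -zeta."""
    reconstructed = [float(samples[0])]        # assume the first sample is known
    bits = []
    for f in samples[1:]:
        prediction = alpha * reconstructed[-1]
        error = f - prediction
        bits.append(1 if error > 0 else 0)     # the single transmitted bit
        step = zeta if error > 0 else -zeta
        reconstructed.append(prediction + step)
    return bits, reconstructed

samples = [10, 12, 15, 25, 25, 24]             # hypothetical signal
bits, approx = delta_modulate(samples)
print(bits)     # [1, 1, 1, 1, 0]
print(approx)   # [10.0, 14.0, 18.0, 22.0, 26.0, 22.0] - staircase approximation
```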
2) Vector Quantization
Vector quantization (VQ) is a technique similar to scalar quantization. In scalar
quantization, the individual pixels are quantized. The idea of VQ is to identify the frequently
occurring blocks in an image and to represent them as representative vectors. The set of all
representative vectors is called the code book, which is then used for encoding and decoding the image. The structure of VQ
is shown in the figure below.
2. It carries out a mapping process between each input vector and the nearest centroid (code) vector.
3. This mapping introduces an error called the distortion measure. The distortion is described as
the distance (typically the squared error) between the input vector and the code vector chosen to
represent it.
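A small sketch of the mapping step, assuming the image blocks have already been flattened into vectors and that a hypothetical two-entry code book is available; the distortion reported is the average squared error of the mapping:

```python
import numpy as np

def vq_encode(blocks, codebook):
    """Map each input vector to the index of its nearest code vector and
    report the average distortion introduced by the mapping."""
    indices, distortion = [], 0.0
    for x in blocks:
        d = np.sum((codebook - x) ** 2, axis=1)   # squared distance to each code vector
        k = int(np.argmin(d))
        indices.append(k)
        distortion += d[k]
    return indices, distortion / len(blocks)

# Hypothetical 2x2 blocks flattened to 4-element vectors, and a 2-entry code book.
blocks = np.array([[10, 10, 12, 12],
                   [11, 10, 12, 13],
                   [200, 200, 198, 199]])
codebook = np.array([[10, 10, 12, 12],
                     [200, 200, 200, 200]])
indices, distortion = vq_encode(blocks, codebook)
print(indices)      # [0, 0, 1] - only these indices need to be transmitted
print(distortion)   # average squared-error distortion of the mapping
```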
Transform selection
The whole idea of transform coding is to use mathematical transforms for data compression.
Transformations such as discrete Fourier transform (DFT), discrete cosine transform (DCT) and
wavelet transforms can be used. The choice of the transforms depends on the resources and the
amount of error associated with the reconstruction process. Mathematical transforms are tools for
information packing.
An important property of images is that smooth regions correspond to low-frequency components, while
sharp details such as edges correspond to high-frequency components. Transforms convert the data into
frequency components, and the required frequency components are then selected so that the
coefficients associated with details to which the human eye is insensitive can be discarded.
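The information-packing idea can be sketched with a block DCT built from plain numpy (no transform library assumed); the 8 x 8 block is hypothetical, and only a handful of the largest-magnitude coefficients are retained before inverting the transform:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    C = np.zeros((n, n))
    for k in range(n):
        for i in range(n):
            C[k, i] = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    C[0, :] *= np.sqrt(1.0 / n)
    C[1:, :] *= np.sqrt(2.0 / n)
    return C

def block_dct_compress(block, keep=8):
    """Forward 2-D DCT, discard all but the 'keep' largest coefficients, invert."""
    C = dct_matrix(block.shape[0])
    coeffs = C @ block @ C.T                    # forward transform packs the energy
    cutoff = np.sort(np.abs(coeffs).ravel())[-keep]
    coeffs[np.abs(coeffs) < cutoff] = 0.0       # drop the insensitive detail
    return C.T @ coeffs @ C                     # approximate reconstruction

# Hypothetical smooth 8x8 block: most of its energy packs into a few coefficients.
x = np.arange(8)
block = 100.0 + 5.0 * np.add.outer(x, x)
print(np.round(block_dct_compress(block, keep=4), 1))
```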
Bit allocation
Transform coding is the process of truncating, quantizing, and coding the coefficients of
the transformed sub images. It is necessary to assign bits such that the compressed image will have
minimum distortions. Bit allocation should be done based on the importance of the data. The idea
of bit allocation is to reduce the distortion by optimal allocation of bits to the different classes of
data. The steps involved in bit allocation are as follows:
1. Assign predefined bits to all classes of data in the image.
2. Reduce the number of bits by one and calculate the distortion.
3. Identify the class of data that is associated with the minimum distortion and reduce one bit from
its quota.
4. Find the distortion rate again.
5. Compare with the target and, if necessary, repeat steps 1-4 to obtain the optimal rate.
Zonal coding
The zonal coding process involves designing a zonal mask that has 1s in the locations of maximum
variance and 0s elsewhere, and multiplying each transform coefficient by the corresponding element of
the mask. The coefficients at the 1 locations are retained because they convey most of the image
information; these locations are identified based on the image models used for source symbol
encoding. The retained coefficients are quantized and coded. The number of bits allocated may be
fixed or may vary based on some optimal quantizer.
Threshold mask
Threshold coding works based on the fact that transform coefficients having the maximum
magnitude make the most contribution to the image. The threshold may be one of the following:
1. A single global threshold
2. An adaptive threshold for each sub image
3. A variable threshold as a function of the location for each coefficient in the sub
image.
The thresholding and quantization processes can be combined as

T̂(u, v) = round[ T(u, v) / Z(u, v) ]

where Z(u, v) is the transform normalization array. The decoder recovers the coefficients
approximately as T(u, v) ≈ T̂(u, v) x Z(u, v), and the inverse transform of these coefficients gives
the decompressed image approximately. It is assumed that the coefficients of largest magnitude make
the most significant contribution; which coefficients these are varies from one sub image to another.
The thresholding mask therefore has 1 in the locations of the retained (above-threshold) coefficients
and 0 in the other places.
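Assuming the standard form of this combination (quantize by the normalization array, then denormalize at the decoder), a short sketch is given below; both arrays are hypothetical:

```python
import numpy as np

def threshold_quantize(T, Z):
    """Combined thresholding and quantization: round(T(u,v) / Z(u,v)).
    Small coefficients round to zero and need not be coded."""
    return np.round(T / Z)

def denormalize(T_hat, Z):
    """Decoder side: recover the coefficient approximately as T_hat(u,v) * Z(u,v)."""
    return T_hat * Z

# Hypothetical 4x4 transform coefficients and normalization array.
T = np.array([[620.0, 35.0, 10.0, 2.0],
              [ 40.0, 12.0,  4.0, 1.0],
              [  9.0,  5.0,  2.0, 0.5],
              [  3.0,  1.0,  0.5, 0.2]])
Z = np.array([[16.0, 11.0, 10.0, 16.0],
              [12.0, 12.0, 14.0, 19.0],
              [14.0, 13.0, 16.0, 24.0],
              [14.0, 17.0, 22.0, 29.0]])
T_hat = threshold_quantize(T, Z)
print(T_hat)                  # many entries become 0
print(denormalize(T_hat, Z))  # approximate coefficients used for reconstruction
```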
Image segmentation
Image segmentation has emerged as an important phase in image based applications.
Segmentation is the process of partitioning a digital image into multiple regions and extracting a
meaningful region known as Region of Interest (ROI). Regions of interest vary with applications.
For example, if the goal of a doctor is to analyse a tumour in a computed tomography (CT)
image, then the tumour in the image is the ROI. Similarly, if the image application aims to
recognize the iris in an eye image then the iris in the eye image is the required ROI. Segmentation
of ROI in real world images is the first major hurdle for effective implementation of image
processing applications as the segmentation process is often difficult. Hence, the success or
failure of the extraction of ROI ultimately influences the success of image processing
applications. No single universal segmentation algorithm exists for segmenting the ROI in all
images. Therefore, the user has to try many segmentation algorithms and pick an algorithm that
performs the best for the given requirement.
Based on user interactions, the segmentation algorithms can be classified into the following three
categories:
1. Manual
2. Semi-automatic
3. Automatic
The words ‘algorithm’ and ‘method’ can be used interchangeably. In the manual method,
the object of interest is observed by an expert, who also traces its ROI boundaries with the help of
software. Hence, the decisions related to segmentation are made by the human observers. Many
software systems assist experts in tracing the boundaries and extracting them. By using the
software systems, the experts outline the object. The outline can be either an open or closed
contour.
Boundary tracing is a subjective process and hence variations exist among opinions of
different experts in the field, leading to problems in reproducing the same results. In addition, a
manual method of extraction is time consuming, highly subjective, prone to human error, and has poor
intra-observer reproducibility. However, manual methods are still commonly used by experts
to verify and validate the results of automatic segmentation algorithms.
Automatic segmentation algorithms are a preferred choice as they segment the structures of
the objects without any human intervention. They are preferred if the tasks need to be carried out
for a larger number of images.
Semi-automatic algorithms are a combination of automatic and manual algorithms. In semi-
automatic algorithms, human intervention is required in the initial stages. Normally the human
observer is supposed to provide the initial seed points indicating the ROI. Then the extraction
process is carried out automatically as dictated by the logic of the segmentation algorithm. Region
growing techniques are semi-automatic algorithms where the initial seeds are given by the human
observer in the region that needs to be segmented. However the program process is automatic.
These algorithms can be called assisted manual segmentation algorithms.
Another way of classifying the segmentation algorithm is to use the criterion of the pixel
similarity relationships with neighbouring pixels. The similarity relationships can be based on
colour, texture, brightness, or any other image statistics. On this basis, segmentation algorithms
can be classified as follows:
1. Contextual (region-based or global) algorithms
2. Non contextual (pixel-based or local) algorithms
Contextual algorithms group pixels together based on common properties by exploiting the
relationships that exist among the pixels. These are also known as region-based or global
algorithms. In region-based algorithms, the pixels are grouped based on some sort of similarity that
exists between them. Non-contextual algorithms are also known as pixel based or local
algorithms. These algorithms ignore the relationship that exists between the pixels or features.
Instead, the idea is to identify the discontinuities that are present in the image such as isolated lines
and edges. These are then simply grouped into a region based on some global level property.
Intensity-based thresholding is a good example of this method.
Detection of discontinuities
The three basic types of grey level discontinuities in a digital image are the following:
1. Points
2. Lines
3. Edges
1) Point detection
An isolated point is a point whose grey level is significantly different from its background in
a homogeneous area. A generic 3x3 spatial mask is shown in the figure.
The mask is superimposed on the image and the convolution process is applied. The response of
the mask is given as

R = w1 f1 + w2 f2 + ... + w9 f9

where the wk values are the mask coefficients and the fk values are the grey levels of the pixels
under the mask. A threshold value T is used to identify the points. A point is said to be detected at
the location on which the mask is centred if |R| >= T, where T is a non-negative integer. The
coefficients of a point-detection mask are shown in the figure.
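A small sketch of the procedure, assuming the commonly used 3 x 3 point-detection mask with 8 at the centre and -1 elsewhere (the image and the threshold are hypothetical):

```python
import numpy as np

def point_detect(image, T):
    """Slide the 3x3 mask over the image and flag locations where |R| >= T."""
    mask = np.array([[-1, -1, -1],
                     [-1,  8, -1],
                     [-1, -1, -1]], dtype=float)
    rows, cols = image.shape
    detected = np.zeros((rows, cols), dtype=bool)
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            R = np.sum(mask * image[y - 1:y + 2, x - 1:x + 2])
            detected[y, x] = abs(R) >= T
    return detected

# Hypothetical homogeneous image containing one isolated bright point.
img = np.full((5, 5), 10.0)
img[2, 2] = 200.0
print(point_detect(img, T=500))   # only the isolated point is flagged
```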
2) Line Detection
In line detection, four masks are used to obtain the responses R1, R2, R3 and R4 for the vertical,
horizontal, +45° and -45° directions, respectively. The masks are shown in the figure. These masks
are applied to the image, and the response of each mask is computed in the same way as for point
detection. R1 is the response for moving the mask from the left to the right of the image, R2 is the
response for moving the mask from the top to the bottom of the image, R3 is the response of the mask
along the +45° line, and R4 is the response of the mask with respect to a line of -45°. Suppose that
at a certain line in the image |Ri| > |Rj| for all j ≠ i; then that line is more likely to be
associated with the orientation of mask i. The final maximum response is max{Ri : i = 1, ..., 4}, and
the line is associated with that mask. A sample image and the results of the line-detection algorithm
are shown in the figure.
3) Edge detection
Edges play a very important role in many image processing applications. They provide an
outline of the object. In the physical plane, edges correspond to the discontinuities in depth, surface
orientation, change in material properties, and light variations. These variations are present in the
image as grey scale discontinuities. An edge is a set of connected pixels that lies on the boundary
between two regions that differ in grey value. The pixels on an edge are called edge points.
A reasonable definition of an edge requires the ability to measure grey-level transitions in
a meaningful manner. Most edges are unique in space; that is, their position and orientation remain
the same when viewed from different points. When an edge is detected, the unnecessary details are
removed, while only the important structural information is retained. In short, an edge
is a local concept which represents only significant intensity transitions. An original image and its
edges are shown in the figure respectively.
An edge is typically extracted by computing the derivative of the image function. This
consists of two parts: the magnitude of the derivative, which is an indication of the strength
(contrast) of the edge, and the direction of the derivative vector, which is a measure of the edge
orientation. Some
of the edges that are normally encountered in image processing are as follows:
1. Step edge
2. Ramp edge
3. Spike edge
4. Roof edge
In edge linking, two neighbouring edge pixels are considered to belong to the same edge if their
gradient magnitudes and gradient directions are sufficiently close, where A is the angular threshold
on the difference in gradient direction. Edge linking is thus a post-processing technique that is
used to link the detected edge points into continuous edges.
The idea of using edge detection algorithms to extract the edges and using an appropriate
threshold for combining them is known as edge elements extraction by thresholding. The
threshold selection can be static, dynamic or adaptive. Usage of edge detection, thresholding and
edge linking requires these algorithms to work interactively to ensure the continuity of the edges.
Edge relaxation
Edge relaxation is a process of re-evaluating a pixel's edge classification using its context. Cracks
are the differences between neighbouring pixels: the horizontal crack edges are given by
|f(x, y) - f(x+1, y)| and the vertical crack edges by |f(x, y) - f(x, y-1)|.
Principle of thresholding
Thresholding is a very important technique for image segmentation. It produces uniform
regions based on a threshold criterion T. The thresholding operation can be thought of as an
operation of the form T = T{x, y, A(x, y), f(x, y)}, where f(x, y) is the grey level of the pixel at
(x, y) and A(x, y) is a local property of the image. If the thresholding operation depends only on
the grey-scale values, it is called global thresholding. If the neighbourhood property is also taken
into account, the method is called local thresholding. If T depends on the pixel coordinates as well,
it is called dynamic thresholding.
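A minimal sketch of global thresholding, where the decision depends only on the grey value f(x, y); the image and the threshold are hypothetical:

```python
import numpy as np

def global_threshold(image, T):
    """Label a pixel 1 if its grey value meets the global threshold T, else 0."""
    return (image >= T).astype(np.uint8)

# Hypothetical image: dark background (around 30) and a brighter object (around 200).
img = np.array([[28, 31, 205, 201],
                [30, 29, 198, 202],
                [33, 27, 195, 199]])
print(global_threshold(img, T=100))   # 1 marks the segmented region of interest
```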
Histogram and threshold
The quality of the thresholding algorithm depends on the selection of a suitable threshold.
The selection of an appropriate threshold is a difficult process. Figures below show the effect of
the threshold value on an image.
The tool that helps to find the threshold is histogram. Histograms are of two types
1. Unimodal histogram
2. Multimodal histogram
If a histogram has one central peak, it is called unimodal histogram. On the other hand a
multimodal histogram has multiple peaks. A bimodal histogram is a special kind of histogram that
has two peaks separated by a valley. Using the valley, a suitable threshold can be selected to
segment the image. Thus the peak and the valley between the peaks are indicators for selecting the
threshold. This process is difficult for unimodal histograms and more difficult if the foreground
and background pixels overlap. Some of the techniques that can be followed for selecting the
threshold values are as follows:
1. Random selection of the threshold value
2. If the ROI is brighter than the background, find the cumulative histogram. The histogram is a
probability distribution p(g) = ng/n, where ng is the number of pixels having the grey-scale value g
and n is the total number of pixels. The cumulative histogram is thus given by

c(g) = sum of p(i) for i = 0, 1, ..., g

Set the threshold T such that c(T) = 1/p. If we are looking for dark objects on a white background,
the threshold is chosen instead so that c(T) = 1 - (1/p).
Selecting the threshold for bimodal images is an easy task since the valley can indicate the
threshold values. The process is difficult for unimodal histograms and too difficult if the
background and foreground pixels overlap.
Noise can affect histograms. Noise can create spurious spikes, and the presence of data drawn from
two different distributions may cause problems. Data belonging to different distributions may produce
a histogram with no distinct modes, which complicates the process of finding a threshold.
Global thresholding is difficult when the image contrast is very poor. The background of
the image can cause a huge problem too. In addition textures and colours tend to affect the
thresholding process. Similarly if the lighting conditions are not good, thresholding may yield poor
results.
Another way of applying adaptive thresholding is to split the image into many subregions.
Image statistics are then calculated locally and applied to each subregion. Useful local statistics
include the mean of the subregion and measures of the form (Max + Min)/2 - c, where Max and Min
correspond to the maximum and minimum pixel values of the subregion, respectively, and c is a
constant.
If the subregions are large enough to cover the foreground and background areas then this
approach works well. In addition, adaptive thresholding is good in situations where the image is
affected by non-uniform illumination problems. Figure (a) shows an original image with non-uniform
illumination. The result of the adaptive algorithm is shown in figure (b). It can be
observed that this result is far better than the result provided by the global thresholding algorithm
for figure (a).
The reason is that global thresholding algorithms do not work well if the image has
illumination problems.
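A sketch of the subregion idea, using the local statistic (Max + Min)/2 - c mentioned above as the per-block threshold; the block size, the constant and the test image are assumptions:

```python
import numpy as np

def adaptive_threshold(image, block=8, c=2.0):
    """Threshold each block x block subregion with its own local threshold
    T = (Max + Min)/2 - c computed from that subregion only."""
    out = np.zeros(image.shape, dtype=np.uint8)
    rows, cols = image.shape
    for y0 in range(0, rows, block):
        for x0 in range(0, cols, block):
            region = image[y0:y0 + block, x0:x0 + block]
            T = (region.max() + region.min()) / 2.0 - c
            out[y0:y0 + block, x0:x0 + block] = (region >= T).astype(np.uint8)
    return out

# Hypothetical image with a left-to-right illumination ramp and a brighter band.
img = np.tile(np.linspace(0.0, 100.0, 32), (32, 1))
img[12:20, :] += 40.0
binary = adaptive_threshold(img, block=8, c=2.0)
print(binary.shape, int(binary.sum()), "pixels labelled as foreground")
```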
The differences between pixels are quantified by the gradient magnitude, and the direction of the
greatest change is given by the gradient vector, which gives the direction of the edge. Since the
gradient is defined for continuous functions, discrete approximations of the derivatives are used;
these are obtained by taking differences. In 1-D, the approximations are, for example,
∂f/∂x ≈ [f(x + ∆x, y) - f(x, y)]/∆x and ∂f/∂y ≈ [f(x, y + ∆y) - f(x, y)]/∆y, where ∆x and ∆y are the
movements in the x and y directions, respectively.
Roberts operator
Let f(x, y) and f(x+1, y) be neighbouring pixels. The difference between these adjacent pixels is
obtained by applying the mask [1 -1] directly to the image, and is defined mathematically as

g = f(x, y) - f(x+1, y)

Roberts's kernels are derivatives taken along the diagonal elements; hence they are called
cross-gradient operators, being based on the cross-diagonal differences. The Roberts approximation of
the gradient can be given mathematically as

Gx = f(x+1, y+1) - f(x, y) and Gy = f(x, y+1) - f(x+1, y)

Since the magnitude calculation involves a square root operation, the common practice is to
approximate the gradient with absolute values, which are simpler to implement:

|g| ≈ |Gx| + |Gy|
Prewitt operator
The Prewitt method takes the central difference of the neighbouring pixels; this difference can be
represented mathematically as

f(x+1, y) - f(x-1, y)

The central difference can be obtained using the mask [-1 0 +1]. This difference is very sensitive to
noise; hence, to reduce the effect of noise, the Prewitt method also performs averaging across the
neighbouring rows (or columns). The Prewitt approximation using a 3 x 3 mask is as follows:
Sobel operator
The Sobel operator also relies on central differences. It can be viewed as an approximation of the
first derivative of a Gaussian: applying the 3 x 3 Sobel mask to the image is equivalent to first
blurring the image with a small Gaussian-like filter and then taking the first derivative, since
convolution is both commutative and associative. The edge masks can be extended to 5 x 5, 7 x 7, and
so on; an extended mask generally gives better (more noise-resistant) performance.
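The standard forms of these first-derivative masks, together with a plain-numpy convolution and the absolute-value magnitude approximation, can be sketched as follows (the test image with a single vertical step edge is hypothetical):

```python
import numpy as np

ROBERTS_GX = np.array([[1, 0],
                       [0, -1]], dtype=float)      # cross-diagonal differences
ROBERTS_GY = np.array([[0, 1],
                       [-1, 0]], dtype=float)
PREWITT_GX = np.array([[-1, 0, 1],
                       [-1, 0, 1],
                       [-1, 0, 1]], dtype=float)   # central difference plus averaging
SOBEL_GX = np.array([[-1, 0, 1],
                     [-2, 0, 2],
                     [-1, 0, 1]], dtype=float)     # extra weight on the centre row

def convolve2d(image, kernel):
    """Minimal 'valid' 2-D convolution using numpy only (kernel is flipped)."""
    kh, kw = kernel.shape
    k = np.flipud(np.fliplr(kernel))
    rows, cols = image.shape
    out = np.zeros((rows - kh + 1, cols - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * k)
    return out

# Hypothetical image with a vertical step edge between two flat regions.
img = np.hstack([np.zeros((5, 3)), np.full((5, 3), 100.0)])
gx = convolve2d(img, SOBEL_GX)
gy = convolve2d(img, SOBEL_GX.T)        # the Sobel y mask is the transpose of the x mask
magnitude = np.abs(gx) + np.abs(gy)     # |g| ~ |Gx| + |Gy| approximation
print(magnitude)                        # large responses along the step edge
```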
Second-order Derivative Filters
Edges are considered to be present in the first derivative when the edge magnitude is large
compared to the threshold value. In the case of the second derivative, the edge pixel is present at a
location where the second derivative is zero. This means that f’’(x) has a zero crossing which can
be observed as a sign change in pixel differences. The Laplacian algorithm is one such zero-crossing
algorithm.
However, the problems of the zero-crossing algorithms are many. The problem with
Laplacian masks is that they are sensitive to noise as there is no magnitude checking – even a small
ripple causes the method to generate an edge point. Therefore, it is necessary to filter the image
before the edge detection process is applied. This method produces two-pixel-thick edges, although
one-pixel-thick edges are generally preferred. However, the advantage is that there is no need for an
edge-thinning process, as the zero-crossings themselves specify the locations of the edge points.
The main advantage is that these operators are rotationally invariant.
The second-order derivative operator ∇² is called the Laplacian operator. The Laplacian of the 2-D
function f(x, y) is defined as

∇²f = ∂²f/∂x² + ∂²f/∂y²

Since the gradient is a vector, two orthogonal filters are required to estimate it. However, since
the Laplacian operator is a scalar, a single mask is sufficient for the edge detection process. The
discrete Laplacian estimate is given as

∇²f ≈ f(x+1, y) + f(x-1, y) + f(x, y+1) + f(x, y-1) - 4 f(x, y)
The Laplacian masks are as shown below. The mask shown in figure (a) is sensitive to
horizontal and vertical edges. It can be observed that the sum of the elements amounts to zero. To
recognize the diagonal edges, the mask shown in figure (b) is used. This mask is obtained by
rotating the mask of figure (a) by 45°. The addition of these two kernels results in a variant of the
Laplacian mask, shown in figure (c). Subtracting twice the mask of figure (a) from the mask of
figure (b) yields another variant mask, shown in figure (d).
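A short sketch of the Laplacian masks described above (the 4-neighbour mask of figure (a) and the 8-neighbour variant obtained by adding the two kernels), together with a simple zero-crossing test; the sign convention and the test image are assumptions:

```python
import numpy as np

LAPLACIAN_A = np.array([[ 0, -1,  0],     # sensitive to horizontal and vertical edges;
                        [-1,  4, -1],     # its elements sum to zero
                        [ 0, -1,  0]], dtype=float)
LAPLACIAN_C = np.array([[-1, -1, -1],     # variant obtained by adding the 45-degree
                        [-1,  8, -1],     # rotated mask to LAPLACIAN_A
                        [-1, -1, -1]], dtype=float)

def laplacian_zero_crossings(image, kernel=LAPLACIAN_A):
    """Apply the single Laplacian mask and mark pixels whose response changes
    sign with the right or lower neighbour (a zero-crossing)."""
    rows, cols = image.shape
    lap = np.zeros((rows - 2, cols - 2))
    for y in range(lap.shape[0]):
        for x in range(lap.shape[1]):
            lap[y, x] = np.sum(image[y:y + 3, x:x + 3] * kernel)
    edges = np.zeros(lap.shape, dtype=bool)
    edges[:, :-1] |= np.sign(lap[:, :-1]) * np.sign(lap[:, 1:]) < 0   # horizontal crossing
    edges[:-1, :] |= np.sign(lap[:-1, :]) * np.sign(lap[1:, :]) < 0   # vertical crossing
    return edges

# Hypothetical image with a vertical step edge.
img = np.hstack([np.zeros((6, 4)), np.full((6, 4), 100.0)])
print(laplacian_zero_crossings(img))    # True marks the zero-crossing next to the step
```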